Gartner and Forrester on the search industry – Part 2

Jun 15, 2017 | Search

There is more work to do before I take a deep dive into the Gartner and Forrester reports. Patience! As Charlie Hull has highlighted in his blog post, Natural Language Processing (NLP) is not a ‘new’ technology, and indeed is arguably not a technology at all. The impression created in the Gartner and especially the Forrester reports is that NLP is new and game-changing. Like most aspects of search, it has a long history to consider.

In my own case this history dates back to 1983/84, when I was heading up e-publishing strategy development for Reed Publishing. Reed were partnering with IBM and the University of Waterloo on creating a computer-based version of the Oxford English Dictionary. The immediate benefit was to replace the 4 million index cards on which the OED was based. I have to admit I could not immediately understand the logic of IBM and the University of Waterloo being involved. The Reed task was to keyboard the OED into a machine-readable database. A visit to IBM Research in Hursley, UK, was eye-opening. IBM wanted a database of the English language that they could use to optimise their search applications, at that time primarily STAIRS. The OED gave them not only a definition of each word but also examples of how it was used, often in multiple contexts. Computational linguistics was a speciality of the University of Waterloo, out of which came OpenText and much more. Then I could see the bigger picture. However, this discovery of mine came some twenty years after the foundation, in 1962, of what is now the Association for Computational Linguistics!

The breakthrough period for NLP was towards the end of the 1990s. It was marked by the establishment of the Natural Language Processing Group at Stanford and by a seminal book from Christopher D. Manning and Hinrich Schütze entitled Foundations of Statistical Natural Language Processing. In 2000 MIT began to publish the journal Computational Linguistics, which remains the primary journal in the area and is helpfully open access. Since that time progress has been very rapid indeed. If you would like a readable introduction to the development and future prospects of NLP, a paper in IEEE Computational Intelligence Magazine by Erik Cambria and Bebo White is very well written.

From this very brief overview I hope you will begin to see that NLP has been a core discipline in linguistics and computer science for almost 20 years, and I’d be very confident in predicting that every search engine developed since the late 1990s has taken the opportunities it offers into account. Moreover, it is not just commercial vendors that have adopted NLP; there are open source options as well. So NLP has not suddenly appeared and formed the basis of a new generation of cognitive insight engines. This hidden development and adoption of NLP is similar to so many other developments in search. Expertise tracking is an example: the algorithms for it were developed in the 1990s. Everyone (even Microsoft) is now offering this, but not because the technology has only just become ‘available’. It is because the need for expertise identification inside organisations has become increasingly important in globally-networked collaboration environments.

As I look back to the early history of search in the 1960s I can see that we have come a long way in small steps. However, the fundamentals (for example the inverted file approach) remain the same. I have over a thousand research papers on my server and add at least a dozen each week. It’s the only way I know of judging what could be on the roadmaps of commercial and open source vendors in the next few years.

So that’s sorted out the NLP claims from Gartner and Forrester. Part 3 to come.