A history of enterprise search 1970-1979

Aug 8, 2017

Although the theme of this series of posts is enterprise search, it is only in the 1970s that products emerge which are clearly the antecedents of what we would now regard as enterprise search applications. From here on there will be significantly less focus on academic research, not because less research was being carried out but because it is well documented in a range of books. In particular, each chapter of Introduction to Information Retrieval by Manning, Raghavan and Schütze has an annotated bibliography, and the book can be downloaded as a PDF.

However, there are three academics that I must include. The first of these is Gerard Salton. He developed the SMART software application as a ‘test bed’ at Harvard University and took it with him to Cornell University, where he stayed for the rest of his career. Salton developed the vector space model (VSM), in which the cosine of the angle between a query vector and a document vector is used to rank search results by relevance. The model evolved over a number of years, and David Dubin has tried to unravel the way in which it developed, providing a good bibliography.

The second is Karen Spärck Jones, who worked in a number of departments at Cambridge University from the time of her PhD in 1964. A profile of her work whilst at Cambridge links to papers describing her research, all of which has had a major impact on information retrieval. Her overview of information retrieval research is essential reading.

The third is Stephen Robertson, a research colleague of Karen Spärck Jones who went on to work at the Microsoft Research Laboratories in Cambridge. His work has extended from the mid-1970s until quite recently, and its scope is indicated by his list of research papers. Stephen is especially noted for his development of the BM25 ranking model, which built on the work of Karen Spärck Jones on the term frequency-inverse document frequency (tf-idf) model.

If you want to choose a date to mark the beginning of enterprise search then 1970 is that date. It marked the launch by IBM of STAIRS (Storage and Information Retrieval System), an evolution of the AQUARIUS software that IBM developed to cope with the documentation for its defence of an anti-trust suit in the USA that started in 1969. STAIRS was specifically designed for multi-user time-share applications (the typical enterprise scenario) and remained on the IBM product list until the early 1990s. Jumping out of chronological order for a moment, in 1985 STAIRS was the subject of a very thorough evaluation, which raised doubts about the effectiveness of full-text indexing. A review article by David Blair, published in 1995, is a must-read for anyone with an interest in enterprise search and evaluation. It looks back at the 1985 evaluation with the benefit of substantial hindsight and, although Blair was one of the authors of the original evaluation, it comes across as an independent and unbiased assessment.

By the mid-1970s mini-computers were being adopted very widely, and many organisations and companies saw this as an opportunity to develop text and document retrieval software products for them. These included BASIS (Battelle Institute) and INQUIRE (Infodata). So far this history has been dominated by developments in the USA, but the mini-computer market also stimulated software development in the UK, including ASSASSIN (ICI), STATUS (Atomic Weapons Research Establishment), CAIRS (Leatherhead Food Research Association) and DECO (Unilever). I had a role on the development team of DECO from 1979 to 1981. These and other applications all emerged towards the end of the 1970s, and an interesting comparative review of them was published in 1984. They all evolved from specific organisational requirements and were then productised for wider use, demonstrating that you did not need to be a large academic institution or software company to develop retrieval software. These systems were accessed through networked terminals; the IBM PC was not launched until 1981. I am sure the market developed in the same way in the USA, but information on the US market at this time is very difficult to find.

As a footnote to this post on the 1970s, the first assessment of the potential role of artificial intelligence in information retrieval was published in 1976. Just over a decade later Verity, the prototype for all enterprise search applications, emerged from a company specialising in AI development. That is a story for the next post.

