A history of enterprise search 1960-1969

Condensing the progress made in the 1960s into a single post is not easy, so this is a very selective perspective. As far as algorithm developments were concerned, Bourne and Ford published a paper on stemming in 1961, Damerau reported on approaches to handling misspellings, and in 1965 Rocchio and Salton considered how best to optimise the performance of retrieval systems. This was one of the first outcomes of the SMART project, initially at Harvard and then at Cornell, which will figure significantly in the history of the 1970s. Many of the developments of the period were reported in a new Information Retrieval section of Communications of the ACM from March 1964. A year earlier Information Storage and Retrieval was launched as a peer-reviewed journal, changing its name to Information Processing and Management in 1975.

Another initiative that started in the 1960s and lasted into the 1970s was the ground-breaking work by Cyril Cleverdon, the librarian of the Cranfield Institute of Technology, UK, on the comparative efficiency of indexing systems. It was funded by the US National Science Foundation. I had the good fortune to meet Cyril early in my career and his encouragement of my career choice was along the lines of “You will never be out of a job”. How right he was.

In the 1960s advances in computer technology resulted in some very important developments in search, in terms of both research and the availability of commercial services. Not only did IBM release the 7090 range in late 1959 but it followed up quickly with the 360 range in 1965. In parallel the technology to provide remote shared access to large computer centres was developed, with J.C.R. Licklider as the early innovator, leading directly to the Internet. At this point in the history of search a chronological approach is of little value. It is more useful to be aware of a number of major projects, several of which led to commercial online services becoming available from 1965 onwards.

In terms of the impact on the underlying algorithms of search the work at System Development Corporation in the early part of the decade is of particular importance. The Wikipedia link to SDC (based largely in Los Angeles) gives no sense of the very high level of innovation in SDC, especially in the development of information retrieval. Synthex was led by Robert Simmons with the objective of developing a system that could read and understand text, answer questions and compose an answer in readable English. The name was chosen as a tribute to the Memex concept of Vannevar Bush from 1945. There was a related ProtoSynthex project. One outcome of these projects was TEXTIR, an online search system developed for the Los Angeles Police Department in 1964 that could accept questions in natural language. Further development enabled it to incorporate synonyms into a search formulation and offer search term weighting. In parallel Hal Borko was developing BOLD with a focus on the automatic classification of the text in documents. Yet another project was COLEX, the aim of which was to advance the development of using time-sharing services to provide online access to bibliographic databases.

These projects gave SDC the ability to launch the ORBIT online search service in 1967, a commercial service for information professionals and researchers which enabled them to search through large databases of abstracts of research literature. The project was led by Carlos Cuadra. Just a few months earlier the Information Sciences Group at the Lockheed Palo Alto Research Laboratories, led by Roger Summit, had launched the DIALOG online search service. The focus of this group was more towards scaling up online services and user interface development, and one of its innovations was the display of set numbers at each stage of a query, a forerunner of facet hit numbers in current search applications. However, probably the first public demonstration of computer-based information retrieval was at the 1964 World's Fair with the LIBRARY/USA demonstration.

Other major centres of information retrieval science and application development in the 1960s included the work at Harvard and then Cornell University led by Gerard Salton, though this did not come to fruition until the early 1970s. Probably the most innovative was the work of Donald Hillman at Lehigh University on searching the full text of documents (the LEADER project), but mention should also be made of the SPIRES project at Stanford University (which remains one of the pre-eminent centres of information retrieval to this day) and TIP at MIT’s Lincoln Laboratories. IBM was also very much involved in retrieval research on a global basis, and research into the use of computer applications for law research had been initiated. These and many other projects are described in detail by Bourne and Hahn in The History of Online Services 1963-1976, and in addition there is an excellent paper by Hahn based on the research for the book.

The importance of the online services to enterprise search is that they addressed the issues of scaling up the concepts developed in the 1950s and paid attention to user satisfaction, the user interface and user support. Probably the first user assessment of an online service was carried out in 1969 by Timbie and Coombs. It was not until the early 1970s that these services were available in Europe and indeed globally, a delay caused primarily by low network capacity and very high network access costs. The launch of these services also set a standard for the search experience for a generation of information professionals and researchers that was not challenged until the arrival of Alta Vista and then Google 30 years later. The online services showed that research services could be delivered on demand at the desktop. The next decade was primarily about delivering only the most relevant information.

