Elasticsearch – the definitive guide. A new book from O’Reilly Media

by | Mar 5, 2015 | Intranets, Reviews, Search

As I’m in the final stages of writing the 2nd edition of Enterprise Search, I was delighted to see that O’Reilly Media had published Elasticsearch – the definitive guide, written by Clinton Gormley and Zachary Tong, both with Elasticsearch. I downloaded the PDF version and could not believe my eyes when the page count reached 719. It makes the projected length of my own book of around 300 pages seem like a short story! The book is divided into seven sections, listed here with the number of chapters in each:

  • Getting started (11)
  • Search in depth (6)
  • Dealing with human language (7)
  • Aggregations (11)
  • Geolocation (4)
  • Modelling your data (4)
  • Administration, monitoring and deployment (3)

There is, as you might expect, a great deal of code, but it is surrounded by text of the highest clarity and accuracy. I am not a developer and the code means nothing to me, but the descriptions of the principles of information retrieval and search, and how these can be utilised in Elasticsearch, are faultless. For the same reason I’m not going to try to assess the book from a developer perspective. However, there are some more general comments that I’d like to share with you.

First, the scale of the book shows the functional power of open source search. I could not spot any functionality that was ‘missing’, and most organisations will only make use of a small percentage of the code. Both Solr and Elasticsearch have developed substantially over the last few years to meet emerging requirements from users, captured by the community and, in the case of Elasticsearch, by Elasticsearch.com. What makes search difficult to manage is the challenge of language analysis, and to see seven chapters on this topic is a good indication of the quality of the book and the software.

Second, the scale of the book illustrates why open source search may be easy to download for free, but from there on in you really need to know what you are doing, and for that you need a sound background in information retrieval concepts and practice. There is no point in giving this book to a developer who is not booked out to a project at present! Although there are many worked examples in the book, you need to be able to extend these laterally to your own organisation to understand how best to use Elasticsearch, and that requires a knowledge of the repositories to be searched and the types of query that will be used. Open source search has to be developed as a partnership between the development team and the business team. Even writing the functional specification is going to take a substantial amount of experience and formal knowledge.

Third, it is worth paying particular attention to the icons which indicate tips, suggestions and warnings. Elasticsearch is still quite young and there are various catches for the unwary. One example from the book: if you want to change the parent value of a child document, it is not sufficient to just re-index or update the child document, because the new parent document may be on a different shard. Instead, you must first delete the old child, and then index the new child. A small point but with potentially big impacts. As with any search software, understanding where a change requires a re-index is very important.
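For readers who do work with code, the parent/child catch can be illustrated with a toy model. The sketch below is not the Elasticsearch client API. It is a small Python simulation that assumes child documents are stored on whichever shard their parent ID hashes to. The names `ToyIndex`, `shard_for`, `NUM_SHARDS` and the use of CRC32 as the hash are all assumptions made for illustration only.

```python
import zlib

NUM_SHARDS = 5  # hypothetical number of primary shards


def shard_for(routing_value, num_shards=NUM_SHARDS):
    """Map a routing value to a shard number.

    CRC32 is only a stand-in here; it is an assumption, not the hash
    function that Elasticsearch itself uses.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


class ToyIndex:
    """A toy index in which children are routed by their parent ID."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [dict() for _ in range(num_shards)]

    def index_child(self, child_id, parent_id, body):
        # The child is written to the shard chosen by its parent ID.
        shard = shard_for(parent_id, len(self.shards))
        self.shards[shard][child_id] = dict(body, parent=parent_id)
        return shard

    def delete_child(self, child_id, parent_id):
        # The old parent ID is needed to find the shard holding the child.
        shard = shard_for(parent_id, len(self.shards))
        return self.shards[shard].pop(child_id, None) is not None

    def move_child(self, child_id, old_parent, new_parent, body):
        # Delete the old child first, then index it under the new parent.
        self.delete_child(child_id, old_parent)
        return self.index_child(child_id, new_parent, body)

    def copies_of(self, child_id):
        # List every shard that currently holds a copy of the child.
        return [i for i, s in enumerate(self.shards) if child_id in s]
```

In this model, calling `index_child` again with a new parent that hashes to a different shard leaves a stale copy of the child on the old shard, whereas `move_child` leaves exactly one copy. That is the behaviour the warning describes, under the simplifying assumptions stated above.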

Finally, this book is all about software development and not about search management. There is no reference to search logs and analytics, and the management section is mainly about technical performance management. Not surprisingly, search interface design is not covered at all. The index is superb but there is no entry for ‘user’ or for ‘interface’, nor (more surprisingly) for federated search.

I don’t have the expertise to judge this book as a reference handbook on Elasticsearch, though I suspect that there will not be many other books on the topic now that this one has been released by authors who are both with Elasticsearch. As a manual on the way in which information retrieval software works it is very good indeed, and any student on a computer science or information science course will find the technical explanations a great deal easier to understand than most of the reference texts on the subject. Business and IT managers should also speed read this book to get an idea of how carefully they will need to specify the functionality of the application.

Martin White