80% of corporate information is unstructured. Really?

by | Sep 11, 2018 | Digital workplace, Information Management, Search

Along with the time that knowledge workers spend searching, the second most widely used statistic in discussions of search business cases is that 80% of organizational information is unstructured. It is less a statistic than a statement, as its implications are rarely discussed in any detail.

In May 2000 Dr. Mike Lynch, the founder of Autonomy, gave a press conference. He was presenting the quarterly report on his company and stated that the volume of unstructured information within companies (in other words, anything not held in a database, such as text documents and emails) was doubling every three months. He went on to forecast that within two years 80% of valuable corporate information would be in an unstructured format. It was a number that the Autonomy press team repeated as frequently as they could. The argument was that search applications that could handle unstructured text were the future. I have seen some anecdotal information that this number came out of International Data Corporation (IDC), but I have not been able to confirm this. In 2008 Seth Grimes published a very detailed account of the history of the 80% figure, so I am going to take up the story a few years later.

In 2012 a survey of the growth of big data applications was undertaken by Unisphere Research on behalf of MarkLogic. The survey collected information from 250 companies. Of these, only 14% reported that more than 50% of their information was unstructured, and 60% put the figure somewhere between 10% and 50%. Quite some way off from 80%! In a similar survey carried out by Unisphere in 2013 the percentages were very much the same.

In the 2016 survey there is no chart to show the percentage distribution, but the Executive Summary states:
“Overall, structured data still represents the vast majority of data under management. Around 45% of the respondents said that structured data accounted for 75% or more of data under management, and another 32% indicated that structured data represented more than half of the data under management. Moreover, 55% of the respondents said that the growth of structured data was greater than the growth of unstructured data.”

It is very likely that the percentage varies widely across industry sectors, with professional services firms (to take an example for which I have no evidence) having a higher percentage than financial services businesses. In some organisations unstructured data could well be 80% of the total, but all the evidence suggests that, as indicated by Unisphere, the advent of big data and the Internet of Things will catalyze a continuing growth in structured data. A 2016 survey by the Aberdeen Group emphasised the importance of unstructured data for decision making but quoted the 80% figure with a link to a presentation given by SAS in 2014, which in turn gave the source as IDC.

In the final analysis the percentage is irrelevant. The issue is whether employees have access to all the corporate information resources they need to make informed decisions. Given recent research on levels of search satisfaction the answer is almost certainly that they do not. That, rather than mythical volume figures and the time spent searching, is why search needs to be taken seriously.

Martin White