What this blog is.... representative of my own views and experiences relating to the management and usage of data

What this blog is not...representative of the views of any employer or technology vendor (and it's not a place to find code either)

All views are my own...unless I switch on comments in which case those are yours!

Sunday 7 July 2013

Three Idiots Sat Babbling


At the start of the week there was an interesting piece in The Guardian about Big Data. There’s been a lot of this sort of thing over the past few years of course, but I’ve really been noticing a ramp-up in press attention over the past couple of months. Perhaps that has something to do with the recent release of Big Data: A Revolution That Will Transform How We Live, Work and Think by Kenneth Cukier and Viktor Mayer-Schönberger. It's certainly a provocative read and one I’ll return to in a future post, but for now I wanted to focus on another text mentioned in that Guardian article – The Minority Report by the great Philip K Dick.

This is a (very good) short story written by Dick in 1956, and it undoubtedly gets referenced in so many Big Data articles because of the 2002 (not bad) film adaptation, which definitely made analytics look sexier than it probably deserves. It was relevant to the Guardian article primarily because the PreCrime unit it revolves around so closely resembles the “Crush” (Criminal Reduction Utilising Statistical History) policing approach being adopted in various parts of the planet.

You can get a précis of the plot here (though I’d recommend reading it yourself because it is good), but I was interested in reading it to see if there was anything, on top of the basic concept of using predictive analysis to reduce crime, that relates to the business of data management as I know it fifty-seven years after the story was written.

Given the era it was written in we can surely forgive the eccentric systems architecture it describes.  Chapter 1 gives a useful summary of what the PreCrime unit looks like...

"In the gloomy half-darkness the three idiots sat babbling. Every incoherent utterance, every random syllable, was analysed, compared, reassembled in the form of visual symbols, transcribed on conventional punchcards, and ejected into various coded slots. All day long the idiots babbled, imprisoned in their special high-backed chairs, held in one rigid position by metal bands, and bundles of wiring, clamps. Their physical needs were taken care of automatically. They had no spiritual needs. Vegetable-like, they muttered and dozed and existed. Their minds were dull, confused, lost in shadow.

But not the shadows of today. The three gibbering, fumbling creatures, with their enlarged heads and wasted bodies, were contemplating the future. The analytical machinery was recording prophecies, and as the three precog idiots talked, the machinery carefully listened."

No doubt a familiar experience to anyone who’s worked in Business Intelligence or Data Warehousing environments in the past decade, but let’s look past the punchcard technology and call our “precog idiots” the equivalent of a Big Data statistical correlation engine. As the story progresses, Dick offers an insight into how the three work together…

"...the system of the three precogs finds its genesis in the computers of the middle decades of this century. How are the results of an electronic computer checked? By feeding the data to a second computer of identical design. But two computers are not sufficient. If each computer arrived at a different answer it is impossible to tell a priori which is correct. The solution, based on a careful study of statistical method is to utilise a third computer to check the results of the first two. In this manner, a so-called majority report is obtained"
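For the curious, the checking scheme in that passage is simple enough to sketch. This is only my own illustration in Python, not anything from the story: the function names majority_report and minority_reports and the example answers are made up, and the only assumption is that each of the three "computers" hands back an answer that can be compared for equality.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def majority_report(reports: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the answer given by more than half of the reports, or None."""
    if not reports:
        return None
    # most_common(1) gives the single most frequent answer and its vote count.
    answer, votes = Counter(reports).most_common(1)[0]
    # A strict majority is required; a three-way split yields no majority.
    return answer if votes > len(reports) / 2 else None


def minority_reports(reports: Sequence[Hashable]) -> List[int]:
    """Return the positions of the reports that disagree with the majority."""
    majority = majority_report(reports)
    if majority is None:
        return []
    return [i for i, report in enumerate(reports) if report != majority]


# Hypothetical answers from three independent runs of the same question.
answers = ["murder", "murder", "no murder"]
agreed = majority_report(answers)
dissenters = minority_reports(answers)
```

With two matching answers and one dissenter, agreed holds the shared answer and dissenters holds the position of the odd one out. If all three disagree there is no majority at all, which is exactly the situation a third computer is meant to make unlikely rather than impossible.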

At a stretch, we could call this in-memory parallel processing?

OK, I’m stretching the point here.  Perhaps Dick was not that much of a systems visionary?  Maybe his real strength was in predicting some of the data management issues that commonly arise today? 

1)      A Data Quality issue causes serious problems – Specifically, it's the data quality dimension of timeliness that is revealed as having dropped the hero into difficulty. It is the fact that each of the precog reports is run at a different time – and therefore uses different ‘real time’ parameters – that leads to the different result of the minority report. And that’s the simple version of the plot. Nevertheless, the lesson is clear - don't compare apples with oranges.

2)      Engage current data providers in any delivery enhancement project - Driving the plot behind the precog mistake is a power struggle between the Army, who used to impose order, and the PreCrime Unit that has usurped that role. This seems to me to perfectly reflect the challenge any new Business Analytics solution faces when having to earn the credibility required to replace existing solutions. Maybe even today there are people who claim to see little difference in Big Data solutions other than scale? Successful implementation projects will aim to bring the existing providers of information into their stakeholder engagement. These people typically have much wisdom to impart and should be encouraged to find benefit from the new solutions.

3)      Knowing what data is held and how it is reported is key – It is only because of his position in the PreCrime Unit that the hero gets placed in the tricky situation in the first place, but it is also only by understanding the nature of the data held about him that he changes his behaviour to escape the trap (sort of). Not only do I think it’s sensible data governance practice for organisations to know what data they actually have and how it is reported, I also think it is important for all of us as individuals to educate ourselves about what data is held about us, where, by whom and, critically, how we all contribute to its creation.
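Going back to the first point, the apples-and-oranges check is also easy to sketch. Again this is only a rough Python illustration under my own assumptions: the Report class, its as_of field, the comparable function and the sample timestamps are all invented for the example, and the idea is simply that each report records the point in time its inputs were taken from.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass(frozen=True)
class Report:
    """A hypothetical report that remembers when its inputs were captured."""
    source: str
    as_of: datetime
    result: str


def comparable(reports: Sequence[Report],
               tolerance: timedelta = timedelta(0)) -> bool:
    """True if all reports were run against data from (nearly) the same moment."""
    if not reports:
        return True
    times = [report.as_of for report in reports]
    # The spread between the earliest and latest snapshot must fit the tolerance.
    return max(times) - min(times) <= tolerance


# Invented example: three reports on the same question, run an hour apart.
base = datetime(2013, 7, 7, 9, 0)
staggered = [
    Report("first", base, "murder"),
    Report("second", base + timedelta(hours=1), "no murder"),
    Report("third", base + timedelta(hours=2), "murder"),
]
aligned = [Report(r.source, base, r.result) for r in staggered]

staggered_ok = comparable(staggered)
aligned_ok = comparable(aligned)
```

Here staggered_ok is False and aligned_ok is True. The check says nothing about which report is right; it only flags that a disagreement between the staggered reports might come from the timing rather than from the analysis.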

From an initial review of Cukier and Mayer-Schönberger I note that the authors are advising we all stop worrying about the causal ‘why?’ and focus instead on the ‘what?’ that the data shows us. Clearly the lesson from Dick’s The Minority Report is to continue asking ‘why’ and ‘what’ but also to start asking ‘what if’.

I’m aiming to post some thoughts on Big Data: A Revolution once I’ve finished it.  Hopefully it will be interesting to compare that vision of the future we’re living today with Dick’s vision of the future he envisaged back in 1956.
