What this blog is.... representative of my own views and experiences relating to the management and usage of data

What this blog is not...representative of the views of any employer or technology vendor (and it's not a place to find code either)

All views are my own...unless I switch on comments in which case those are yours!

Tuesday 23 July 2013

'The Hydrogen Sonata' and the ethics of Big Data


Earlier this summer we lost Iain M Banks, one of the most imaginative writers in modern literature.  Living in Edinburgh and sharing a number of good friends, I was lucky enough to have met him on several occasions over the past 25 years or so, in bars (mainly), at SF conventions (sometimes) and even bobbing about in a swimming pool. He was every bit as wonderful and generous a man as the many appreciations of his life have deservedly highlighted and I'll miss him very much.
 
But Iain's significant body of work remains to inspire, and I was reminded of a section from his latest, and last, Culture novel, The Hydrogen Sonata, when I clicked through to a link about Predictive Analytics in the Cloud from the LinkedIn Big Data group this morning.

The article - which reads exactly as most overexcited vendor press releases read and is concerned with a market I just know Iain would have had the greatest disdain for - talks about tools which, and I'll quote directly from the CEO, "can literally just say, 'Here's every event that happens in the world.' Earnings, economic indicators, seasonality, cyclicality, political events, and so on'...And they can literally model every single stock in the AMEX and the NYSE in relation to every single event, and can gain that kind of precision around it, which obviously helps them in terms of making their investments."

In The Hydrogen Sonata, Banks writes about an even grander ambition.  The story muses about the challenges encountered when trying to predict how an entire civilisation will react when confronted with evidence which may reveal their founding myths to have been a lie.  Other civilisations - including Banks' great utopia, The Culture - have attempted a particular simulation modelling technique to try and predict the reaction but these have all been found wanting:

"The Simming Problem boiled down to, How True to life was it morally justified to be?"

A longer history of the challenge is also presented:

"Long before most species made it to the stars, they would be entirely used to the idea that you never made any significant societal decision with large-scale or long-term consequences without running simulations of the future course of events, just to make sure you were doing the right thing.  Simming problems at that stage were usually constrained by not having the calculational power to run a sufficiently detailed analysis, or disagreements regarding what the initial conditions ought to be.

Later, usually round about the time when your society had developed the sort of processal tech you could call Artificial Intelligence without blushing, the true nature of the Simming Problem started to appear.

Once you could reliably model whole populations within your simulated environment, at the level of detail and complexity that meant individuals within the simulation had some sort of independent existence, the question became: how god-like, and how cruel, did you want to be?"

These considerations develop over several pages.  To simulate life, life must first be created to the extent that it recognises itself as life, leading to the thought that we might all be in a simulation ourselves.  All fantastically entertaining stuff, but it's perhaps not surprising that Banks' Culture Minds end up concluding that "Just Guessing" is ultimately as effective.

What I believe will endure about this passage is not just the wit and vision but the necessity to assess the ethics of data usage.  OK, 'simming' life is not where we are right now but larger and larger data sets are being used in increasingly 'sophisticated' models such as the one hyped by the Business Insider article. We've all been reminded of 1984 and The Minority Report in recent weeks and months thanks largely to the no-longer-secret efforts of the NSA. Mainstream commentators are falling over themselves to express concern and outrage about the uses to which data could be or, with greater alarm, 'is' being put.  Data Professionals should start developing some answers to the more common concerns.

Questions such as what individuals need to know about the data being collected about them, how it is impacting the choices available to them, who their data can be sold on to and whether attempts should be made to identify individuals by analysing a variety of aggregate data sources are all relevant and live today.  And there is surely a role for Data Governance to play here?  Data Governance should help organisations understand what they are collecting and why.  Data Governance should help organisations understand what they can legitimately - ethically - do with their data. This will require particular attention as organisations look to exploit secondary usage of data.  If nothing else, establishing a Big Data Use Assessment every time you want to use your data sets for a purpose other than the one for which they were originally intended will help reduce the risk of costly lawsuits later down the line.
 
Later in The Hydrogen Sonata one of the characters warns that "One should never mistake pattern for meaning", which is textbook Big Data best practice and another indication that data practitioners should take heed of this, the last of the great Iain M Banks' Culture novels.  Recommended now and forever.  Whichever simulation you find yourself in.
