Some guiding principles for a successful open data economy

November 1st, 2011

For some months now, I’ve been following “open data” on my Google Alerts and there is a lot going on, even in nations like Kenya. Combined with the experience and frustration I’ve gained working on Britain’s second largest public dataset – crime – and working with my excellent Chief Data Architect, this has made me realise that any government, organisation or developer starting out on this journey ought to have the following principles in place from the outset – which, I’m sorry to say, has not been our experience.

Guiding principles for open data:

  1. The data released by the government or public sector organisation must be 100% accurate, with all fields completed. If it is not, you will raise the barriers to entry for developers, who will have to spend their own precious time and money cleaning it up and will then understandably be reluctant to share the cleaned data with anyone else.
  2. No new data fields may be added until point 1. is fully complete. If you try to run before you can walk, you and your data will fall over and public confidence in the initiative will quickly ebb.
  3. The provider of the data must not have a conflict of interest in providing it – i.e. they should not be releasing public data while withholding key information that they hope to make financial gain from, nor should they be entitled to run a website that uses the data to advertise their business, nor should they have first sight and use of the data before the development community. Release the data, yes – release a competing platform simultaneously with unfair advantages, no.
  4. There must be, from the start, an open two-way conversation between data providers and developers. The easiest way to do this is to set up a forum with an ethos of recrimination-free information discovery. The much less effective way is to set up a panel of experts who meet every couple of months and have little first-hand knowledge of the data.
  5. Visibly open metrics must be kept on the number of active developers/applications/downloads – what is growing, declining, why?
  6. Government must take full responsibility for the accuracy of the data it releases. If some of it is found to be wrong, say so and say when it will be fixed. Do not change files without telling developers and the general public. They will notice.
  7. Government, public sector organisations and developers must be quick to be honest and open about any mistakes in data or applications – there will be some, this is part of the learning process. The point is that mistakes get found because the data is open, whereas they stayed hidden away when it was closed.
  8. Establish clear data governance rules from the outset. Central hub collectors of large datasets should not permit spoke feed-in organisations to freely change codes, numbers, files etc. – this has a knock-on effect on developers down the line, whose systems are then broken and have to be redesigned, again raising the cost of open data development. Data integrity needs to be something you can rely upon. Databases depend on defined rules of the form “always do this” and “never do that”.
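To make points 1 and 8 a little more concrete, here is a minimal sketch of the kind of check a data provider could run before each release. It is only an illustration: the field names, the category codes and the function names are my own assumptions and do not describe any official feed or the way UKCrimeStats works.

```python
# Hypothetical field names and category codes, chosen for illustration only.
REQUIRED_FIELDS = ("month", "force", "street", "category")
ALLOWED_CATEGORIES = {
    "anti-social-behaviour", "burglary", "robbery",
    "vehicle-crime", "violent-crime", "other-crime",
}


def is_blank(value):
    """True if a field is absent or contains only whitespace."""
    return value is None or str(value).strip() == ""


def validate_record(record):
    """Return a list of problems found in one record (empty if clean)."""
    problems = []
    # Point 1: every required field must be completed.
    for field in REQUIRED_FIELDS:
        if is_blank(record.get(field)):
            problems.append(f"missing field: {field}")
    # Point 8: codes may only come from the agreed list.
    category = record.get("category")
    if not is_blank(category) and category not in ALLOWED_CATEGORIES:
        problems.append(f"unknown category code: {category}")
    return problems


def validate_release(records):
    """Map record position -> problems, for records that break a rule."""
    report = {}
    for position, record in enumerate(records):
        problems = validate_record(record)
        if problems:
            report[position] = problems
    return report
```

A release would only be published when `validate_release` returns an empty dictionary; anything else goes back to the organisation that supplied it, rather than downstream to developers.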

UKCrimeStats goes down because of ... crime!

April 18th, 2011

It’s quite an irony that all phone and internet lines near our datacentre went down for a few days, from mid-afternoon last Friday until this morning, because of the theft of 1,000 pairs of cable!

So we had no way of getting our website UKCrimeStats back online, which is incredibly annoying in the early days of any website. My initial anger with BT turned to some sympathy when I found out why this had happened.

Cable theft is a growing problem and the root cause, it would seem – other than dishonesty – has to be the pretty spectacular recovery of copper prices since 2008, which are now near recent highs.

For all that, I find it quite incredible that this actually happened in broad daylight, if the time of my cut-off was anything to go by. Cable theft is apparently costing the UK £770m a year – but that doesn’t include the lost time and opportunities for those dependent on broadband and telecom services.

DECODING THE CRIME DATA IN ENGLAND AND WALES: DEC 2010 – FEB 2011

April 15th, 2011

We now have the first three months of data uploaded on www.ukcrimestats.com and I realise that not everyone has the patience to work their way through a lot of tables to get the big picture. So that’s why we produced this first report which contains the top 10 results for the different combinations over the first quarter of data. For more results than the top 10, go to the website and export up to 1,000 results to Excel. UKCrimeStats is the only website that can give you this kind of cumulative analysis.

Decoding the Crime Data: Dec 2010 - Feb 2011

And here are the contents:

Key Findings

Police Forces

Table 1: The 10 Highest Total Crime Rate Police Forces

Table 2: The 10 Lowest Total Crime Rate Police Forces

Neighbourhoods

Table 3: The 10 Highest Total Crime Neighbourhoods

Table 4: The 10 Highest Total Crime Rate Neighbourhoods

Table 5: The 10 Highest Total Violent Crime Neighbourhoods

Table 6: The 10 Highest Violent Crime Rate Neighbourhoods

Table 7: The 10 Highest Total Vehicle Crime Neighbourhoods

Table 8: The 10 Highest Vehicle Crime Rate Neighbourhoods

Table 9: The 10 Highest Total Robbery Crime Neighbourhoods

Table 10: The 10 Highest Robbery Crime Rate Neighbourhoods

Table 11: The 10 Highest Total Other Crime Neighbourhoods

Table 12: The 10 Highest Other Crime Rate Neighbourhoods

Table 13: The 10 Highest Total Burglary Crime Neighbourhoods

Table 14: The 10 Highest Burglary Crime Rate Neighbourhoods

Table 15: The 10 Highest Total Anti-social behaviour Crime Neighbourhoods

Table 16: The 10 Highest Anti-social behaviour Crime Rate Neighbourhoods

Streets

Table 17: The 10 Highest Total Crime Streets

Table 18: The 10 Highest Violent Crime Streets

Table 18: The 10 Highest Vehicle Crime Streets

Table 19: The 10 Highest Robbery Crime Streets

Table 20: The 10 Highest Other Crime Streets

Table 21: The 10 Highest Burglary Crime Streets

Table 22: The 10 Highest Anti-social behaviour Crime Streets

Policy Recommendations

Annex: Problems with the data
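For readers wondering why the contents list separates “Total Crime” tables from “Crime Rate” tables, the short sketch below shows how the two rankings can differ. It is an illustration under my own assumptions: the rate is taken here as crimes per 1,000 residents, and the field names `crimes` and `population` are hypothetical rather than taken from the report.

```python
def crime_rate(area, per=1000):
    """Crimes per `per` residents (assumed definition of a crime rate)."""
    if area["population"] <= 0:
        raise ValueError(f"no usable population for {area['name']}")
    return area["crimes"] * per / area["population"]


def top_n(areas, key, n=10):
    """Return the n areas with the highest value of `key`, highest first."""
    return sorted(areas, key=key, reverse=True)[:n]


def top_by_total(areas, n=10):
    """Rank by the raw number of crimes recorded."""
    return top_n(areas, key=lambda area: area["crimes"], n=n)


def top_by_rate(areas, n=10):
    """Rank by crimes relative to the number of residents."""
    return top_n(areas, key=crime_rate, n=n)
```

A busy area with many residents can top the total table while sitting well down the rate table, and a small area with few residents can do the reverse, which is why both views are worth publishing side by side.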

We’ve been very careful to be open about the shortcomings of the data. Some of the results look wrong – but this is the official police data and that is what the site reflects. Here’s one finding we have flagged up on our FAQ page, appended below:

“Why is Great Moor Street in Bolton the highest violent crime on or near street in England and Wales?

The official data feed from www.police.uk belonging to Greater Manchester Police says it is. If you’d like to check, please download the official file http://crimemapper2.s3.amazonaws.com/frontend/crime-data/2011-02/2011-02-greater-manchester-street.zip . The problem appears to be the accompanying sentence to nearly every crime on or near Great Moor Street which says “CrimeMapper has moved this record to a location that is the system’s nearest available street to the centre of the policing foot-beat where the record was originally plotted”.

We don’t know why they’ve done that, perhaps the location records were lost?

It has skewed the results but we can’t guess which street each of these crimes was on and this website only reflects the official police data. And to remove the crimes on the street from the database would be a distortion as well. If it changes, we will change it too. But this is in the hands of the police and their data record-keeping.”
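Anyone who downloads the file can gauge how much a street’s figure depends on relocated records. The sketch below is one rough way to do it, assuming each record has been read into a dictionary with a `street` field and a `context` field holding the accompanying sentence; both field names are my assumptions, so check them against the columns in the file you actually download.

```python
from collections import Counter

# Opening words of the sentence quoted in the FAQ entry above.
RELOCATION_NOTE = "CrimeMapper has moved this record"


def relocated_share(rows):
    """Return {street: fraction of its records carrying the relocation note}."""
    totals = Counter()
    moved = Counter()
    for row in rows:
        street = row["street"]
        totals[street] += 1
        # Treat a missing or empty context as "not relocated".
        if RELOCATION_NOTE in (row.get("context") or ""):
            moved[street] += 1
    return {street: moved[street] / totals[street] for street in totals}
```

A street where nearly every record carries the note is one whose ranking says more about how records were plotted than about where the crimes took place.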

For all that, we are confident that UKCrimeStats is by far and away the most accurate, open and analytical website for looking at the crime data in the UK. And as the monthly data continues to accumulate and the Police and RKH get better at improving its quality, UKCrimeStats will be able to give an even more precise picture of what’s going on.