For some months now, I’ve been following “open data” on my Google Alerts, and there is a lot going on, even in nations like Kenya. Add to that the experience and frustration I’ve gained working on Britain’s second largest public dataset – crime – alongside my excellent Chief Data Architect, and I’ve come to realise that any government, organisation or developer starting out on this journey ought to have the following principles in place from the outset – which, I’m sorry to say, has not been our experience.
Guiding principles for open data:
- The data released by the government or public sector organisation must be 100% accurate, with all fields completed. If it is not, you raise the barriers to entry for developers, who will have to spend their own precious time and money cleaning it up, and who will then understandably be reluctant to share the cleaned data with anyone else.
- No new data fields may be added until the first principle above is fully met. If you try to run before you can walk, you and your data will fall over and public confidence in the initiative will quickly ebb.
- The provider of the data must not have a conflict of interest in providing it, i.e. they should not release public data while withholding key information from which they hope to make financial gain, nor should they be entitled to run a website that uses the data to advertise their business, nor should they have first sight and use of the data before the development community. Release the data, yes; release a competing platform simultaneously with unfair advantages, no.
- There must be, from the start, an open two-way conversation between data providers and developers. The easiest way to do this is to set up a forum with an ethos of recrimination-free information discovery. The much less effective way is to set up a panel of experts who meet every couple of months and have little first-hand knowledge of the data.
- Visibly open metrics must be kept on the number of active developers, applications and downloads: what is growing, what is declining, and why?
- Government must take full responsibility for the accuracy of the data it releases. If some of it is found to be wrong, say so and say when it will be fixed. Do not change files without telling developers and the general public. They will notice.
- Government, public sector organisations and developers must be quick to be honest and open about any mistakes in data or applications. There will be some; this is part of the learning process. The point is that mistakes get found because the data is open, whereas they stayed hidden away when it was closed.
- Establish clear data governance rules from the outset. Central hub collectors of large datasets should not permit the spoke organisations that feed into them to freely change codes, numbers, files etc. This has a knock-on effect on developers down the line, whose systems are then broken and have to be redesigned, again raising the cost of open data development. Data integrity needs to be something developers can rely upon. Databases rely on defined rules of the form “always do this” and “never do that”.
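
To make a few of these principles concrete, here is a minimal sketch of the kind of check a central hub might run before publishing a file: every required field is filled in, every code comes from an agreed list, and each release gets a fingerprint so that a silently changed file can be spotted. The column names, the category list and the function names are all made up for illustration; they are not taken from the crime dataset or any real publishing system.

```python
import csv
import hashlib
import io
from typing import List

# Hypothetical schema agreed between the hub and its feed-in organisations.
REQUIRED_FIELDS = ["record_id", "month", "category"]
ALLOWED_CATEGORIES = {"burglary", "robbery", "vehicle-crime"}


def fingerprint(raw: bytes) -> str:
    """Return the SHA-256 hex digest of a released file.

    Publishing this alongside the file lets anyone detect a silent change.
    """
    return hashlib.sha256(raw).hexdigest()


def check_release(raw: bytes) -> List[str]:
    """Return a list of human-readable problems found in a CSV release."""
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8")))
    header = reader.fieldnames or []
    missing_columns = [f for f in REQUIRED_FIELDS if f not in header]
    if missing_columns:
        return ["missing columns: " + ", ".join(missing_columns)]

    problems = []
    # Line 1 of the file is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for field in REQUIRED_FIELDS:
            # Short rows yield None for absent fields; treat that as empty.
            if not (row[field] or "").strip():
                problems.append(f"line {line_no}: empty {field}")
        category = (row["category"] or "").strip()
        if category and category not in ALLOWED_CATEGORIES:
            problems.append(f"line {line_no}: unknown category {category!r}")
    return problems
```

A release would only go out when `check_release` returns an empty list, and the fingerprint would be published next to the file together with a dated note whenever it changes. That is one possible arrangement, not the only one; the point is that the rules are written down and enforced at the hub rather than discovered by developers after their systems break.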