Tim Berners-Lee and Nigel Shadbolt on the benefits of open data
TBL and Nigel Shadbolt, who are together pushing along the open data idea in government, have an article in the Times (London) of November 18 about the benefits of free data, following on from the announcement yesterday:
Data has a particular value in that you can combine it with other data to discover new things. When in 1854 John Snow took the deaths from a cholera outbreak in London and plotted them on a map, he was able to illustrate the connection between the quality of the source of water and cholera — the world changed. In March the Department for Transport released three years’ worth of data about the location of accidents involving cyclists. Within 24 hours someone had converted this data to create cycle-accident route planners that avoid the black spots.
Government data is a valuable resource that we have already paid for. We are not talking about personal data but data that tells us, for example, about the amount and type of traffic on our roads, where the accidents are, how much is spent on areas where these accidents occur. This is data that has already been collected and paid for by the taxpayer, and the internet allows it to be distributed much more cheaply than before. Governments can unlock its value by simply letting people use it. This is beginning to happen in a number of countries, notably in the US under the Obama Administration, and in June Gordon Brown asked us to advise the Government on how to make rapid progress here.
(The fun thing here being that OS would argue that its data has not been collected and paid for by the taxpayer, because it's a trading fund. Unfortunately this doesn't hold up against two points: (a) almost all of its data was collected while it was not a trading fund, and (b) half its revenues do come from the taxpayer, in the form of licences from public organisations.)
As all of this data becomes available, we have to look for the joins between it. A new set of standards for the web is emerging that allows us to link data from different sources. Everyone knows that web pages have addresses that identify them, allowing you to navigate around and find what you want. To make the web of linked open data work we also need to give identifying addresses to the objects and properties that make up the basic information in pages, spreadsheets or databases.
Think about the practical applications. If Companies House referred to companies using these new open, uniform identifiers, then other people who needed to talk about companies could use these whenever they referred to a company. If all websites that make data available about companies point to the same identifier for a company, then it’s possible to pull that data together very easily, whether it’s data about stock price, a product or a director. This is one of the core principles at the heart of the web of linked data.
None of this works unless the data is there in the first place. But when it is, innovation flourishes. Maybe someone uses the web to show schools close to you and their Ofsted reports, or the planning applications that might affect you, or the allotments available to use, or the crime rates in your area. Data is beginning to drive the Government’s websites. But without a consistent policy to make it available to others, without the use of open standards and unrestrictive licences for reuse, information stays compartmentalised and its full value is lost.
So there you have it: the free data concept is right there at the heart of government, with extra semantic web power from the person who invented it. That’s good. That’s very good.
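To make the shared-identifier point from the article a little more concrete, here is a minimal sketch in Python. It assumes three independent publishers that all key their records on the same company URI. The URI, the field names and the values are invented for illustration; they are not real Companies House identifiers or data.

```python
from collections import defaultdict

# Hypothetical identifier, used only for illustration.
ACME = "http://example.org/id/company/01234567"

# Three made-up sources that all refer to the company by the same URI.
registry = [{"id": ACME, "name": "Acme Widgets Ltd", "director": "J. Smith"}]
prices = [{"id": ACME, "stock_price": 12.5}]
products = [{"id": ACME, "product": "Widget"}]


def merge_by_identifier(*sources):
    """Combine records from several sources that share an 'id' key."""
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            key = record["id"]
            for field, value in record.items():
                if field != "id":
                    merged[key][field] = value
    return dict(merged)


combined = merge_by_identifier(registry, prices, products)
print(combined[ACME])
```

Because every source uses the same identifier, no name matching or guesswork is needed to join them. Without a common identifier, each pair of datasets would need its own matching rules.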

November 18th, 2009 at 3:28 pm
Now for Royal Mail and PAF
November 18th, 2009 at 4:04 pm
@Richard
“Now for Royal Mail and PAF”
…and the UK Meteorological Office and the output from its numerical weather prediction models.
November 18th, 2009 at 4:27 pm
… and the Environment Agency for the Digital Rivers Network, Source Protection Zones, Floodmaps etc. etc.
November 19th, 2009 at 12:39 am
This is spot on. When I saw data being published by cities across the US I was initially excited, but then hit the format wall: these data sets got published in XLS, CSV, HTML tables, etc., which makes them harder to integrate into apps. So I created http://elev.at, which aims to make published information computable.
November 19th, 2009 at 9:04 am
@ Joubel. CSV, XLS and to some extent HTML are at least machine readable and fairly standard. So long as co-ordinates or another geo-referenced key are in the data, it can be used with little or no work. Supplying such data in GIS file formats, usually SHP or TAB, means only the minority of users with appropriate GIS software can use them. Many more people have Excel.
An image of a table in a PDF is something else again.
Adrian
November 19th, 2009 at 9:07 am
From reading the reports, I am somewhat concerned that there is a distorted perception in Government that the free PSI argument is about being able to publish and view PSI over the internet, and so PSI should be published in formats and by means compatible with this to cater for website developers, when internet publication is only a secondary need.
Instead, to be useful to industry, PSI needs to be published in established, storage-efficient, industry-standard formats compatible with industry-standard software (no proprietary formats like GeoPDF). Given the size and nature of some of these information sets, publishing them in formats created for internet publication would be ridiculous and unacceptable. So, for example, PSI with a geographical component that can be thought of as vector drawing objects (points, areas and lines) should be published as ESRI Shapefiles (geographical coordinates + attribute data) plus metadata on the projection parameters for the data, not bloated, inefficient GML or XML files full of empty space.
All that is required for PSI publication are FTP sites from which the raw data in industry-standard formats can be efficiently downloaded, with front ends describing the nature of the information, how it was collected or created, by whom and when, etc. Nothing more elaborate is needed. For mapping, what is definitely not required, nor acceptable, is an internet map server simply delivering tiled images of maps.
November 19th, 2009 at 11:09 am
While I share the euphoria generated by this announcement, there is a danger that must not be overlooked.
PSI data must pass a ‘genuine government monopoly’ test before it is made available free as PSI. Otherwise the government can grow the power of the state by making the taxpayer fund the creation of data which would otherwise be created by the private sector in a normal competitive market.
Mapping is a good example;
The smallest components of map data which are genuine government monopolies should absolutely be made available free of charge as PSI. This includes things like electoral boundaries, sites of scientific interest, names of roads, postcodes etc.
But maps are already value-added products which depend on other non-government data such as aerial photography, height data, coastlines, natural features etc., over which the government has no natural monopoly.
The mapping market is now subject to vigorous competition from private sector companies such as Bartholemews, NavTeq, the GeoInformation Group (UKMaps), Intermap (NEXTMap Britain) and Getmapping (The People’s Map). This competition has generated innovation (eg Getmapping created the first national aerial photography coverage and OS had to follow suit; Intermap created a better height model and forced OS to follow suit; NavTeq and TeleAtlas created driver restriction information, and forced OS to follow with the Integrated Transport Layer).
Provided that the genuine government monopoly data is made freely available as PSI then the private sector mapping market will thrive and customers will benefit from the competition. But if OS maps are made freely available then the taxpayer will have to fund the aerial photography collection and the height data collection, because no private sector company will be able to compete. This would be nationalisation by another name and would result in the growth of the state and reduction of competition and choice.
November 19th, 2009 at 8:24 pm
@Tristram Cary
“PSI data must pass a ‘genuine government monopoly’ test before it is made available free as PSI.”
The monopoly and competition issues are important, but they are not the main basis of the argument for making access to PSI free. The main basis is that PSI creation or collection is by and large funded directly or indirectly by the state from revenue generated through general taxation. Therefore those paying for it, i.e. us taxpayers, should get access to it without having to pay for it again.
Since the released PSI will be available for commercial reuse, the businesses that produce mapping in competition with the OS as it is now, will be able to reap the benefits of free OS data like everyone else. They will be able to incorporate it into their own products and add value if they wish. The difference is that instead of them having few competitors, there will be potentially thousands, so profit margins will be forced down by the existing mapping business having to be more competitive. Then again, their costs will be lower or even nil.
I agree that data collection businesses like Intermap, may have to rethink their business plans, but Intermap also collects data for the US where federal mapping and GI is freely available and has been for many years. So if they can do it there they can here, without being greatly affected. Many if not most organisations and individuals cannot afford a license to geographic data collected by the private sector in the same way they cannot afford a license for that collected by the OS. Therefore arguments that they will see a reduction in customers in response to a freeing of OS data are largely unfounded. These private sector businesses will more likely see a reduction in their potential customer base, not their actual numbers of customers. They will need to, if they have not already, differentiate themselves and their products from freed PSI. This might be by providing higher accuracy, or higher resolution, or specialised products.
“But if OS maps are made freely available then the taxpayer will have to fund the aerial photography collection and the height data collection, because no private sector company will be able to compete.”
The taxpayer does this now; that is the whole point. However, I agree the business models of the data collection businesses may have to change, perhaps to ones based on collecting data paid for by the state and then owned by the state. They then in turn charge the state a fair price for performing the service. In essence they become contractors, just as construction companies charge the state for building a new road but don't actually own the road they build. Private sector businesses that might be collecting data similar to the to-be-released PSI are only able to charge now for data they collect because, firstly, an artificially maintained market was earlier created by the creation of trading funds and, secondly, they have to be sure of making a profit from their large investment when most customers only require a small subset of the entire dataset collected.
November 20th, 2009 at 8:45 am
@ Nick,
If I’ve understood you correctly, data collection and digitisation of OS is already mainly done by contractors – Kampsax and BlomAerofilms have been doing so over the last 10 years.
Adrian
November 20th, 2009 at 11:42 am
Great News. Does this mean I can overlay Bus Stop locations (derived data) on Google maps?
November 20th, 2009 at 5:09 pm
I would argue you always have been able to, as most Bus Stops are not present on OS maps.
November 22nd, 2009 at 5:00 pm
So what are we actually going to get access to, in specific technical terms? Or isn’t it yet established? Will we (as web developers) be able to take a postcode and turn it into a location? Will we be able to work out routes via roads? That is, will there be actual road route data, or just pictures of roads?
I haven’t used Google Maps (in a web development way) but it seems very embedded and accesses Google from users’ browsers. I’ve always preferred to do things more server (my server) side, but that doesn’t seem so possible with Google Maps. Is it going to be possible with what we’re going to have access to from OS?
Is there a chance that OS is going to be awkward, as it has been already? If its heart’s not in providing a genuinely useful service, then it can probably get in the way a bit. Is that likely or possible?
November 22nd, 2009 at 5:40 pm
I would like to see all of Canada’s Statistics Canada data opened up. They have a wealth of information and they charge in the thousands to tens of thousands for reports and raw data.
It seems like a crazy system: taxpayers pay for government employees to create it, and then to access it we have to pay again.
November 24th, 2009 at 1:10 pm
@Ed: Re. bus stops – you’ve missed the point of the derived data issue slightly. If a local authority creates a layer of bus stop locations in its GIS using an OS base map to locate the points, it becomes derived data and subject to OS’s licensing restrictions. It doesn’t matter that the bus stop is not shown on the base OS map: what matters is that we *used* the base map.
So for instance, say a new bus stop is created on Mercator Street, about 20 yards along from its junction with Peters Road. I load up our corporate GIS, use the OS Streetview layer to find the Mercator St/Peters Rd junction, then drop a marker onto my “bus stops” layer in the appropriate place. Because I used that OS map to work out where the “appropriate place” was, my bus stops layer is now OS-derived data and I can’t just give that data away to be used without restriction.
Of course if I’d made a trip out to Mercator Street armed with a GPS unit and taken my own location reading, I’d be safe. But that’s not how most local authority geodata layers are created.
November 25th, 2009 at 9:43 am
@Andy, I am well aware of what the OS claims; however, at an industry presentation in September the product management director of the OS aimed to clarify the situation, saying that if the feature did not appear on the map there was no issue with derived data.
Of course he was talking from the perspective of an industry expert and not a lawyer, so we must await the promised new framework on derived data which was to be published in October.
November 25th, 2009 at 11:37 am
@Ed: That’s interesting; I don’t suppose anyone was recording him at the time?… :-)
November 26th, 2009 at 9:34 am
I am really interested in the opportunities that arise from “open data” to provide citizen-centred views of local services and issues (who is going to build on that playing field near you), and a channel to communicate in a relevant way with my local authority.
Is there a forum or group that is discussing these opportunities ?
(We provide solutions and services to local government to help their planning processes and citizen engagement)
November 26th, 2009 at 10:49 pm
Hi, have you seen that climate researchers are moving the debate on government data:
http://www.realclimate.org/index.php/archives/2009/11/the-cru-hack-context/
“What has this got to do with CRU? The data that CRU needs for their data base comes from entities that restrict access to much of their data. And even better, since the UK has submitted an exception for additional data, some nations that otherwise would provide data without question will not provide data to the UK. I know this from experience, since my nation (Iceland) did send in such conditions and for years I had problem getting certain data from the US.”
Given the leaked emails cited by Monbiot:
http://www.monbiot.com/archives/2009/11/23/the-knights-carbonic/
http://www.eastangliaemails.com/emails.php?eid=914&filename=1219239172.txt
http://www.eastangliaemails.com/emails.php?eid=490&filename=1107454306.txt
To me it looks more like an after-the-fact, made-up excuse for the bad behaviour of some CRU people, but of course it becomes interesting for freeourdata to get involved, doesn’t it?
And for the well thought out big picture please read Judy Curry here:
http://www.climateaudit.org/?p=7826
November 27th, 2009 at 4:20 pm
A Cycle Accident map is at http://labs.timesonline.co.uk/blog/2009/03/11/uk-cycling-accidents/
which I’d not seen before.