Free Our Data: the blog

A Guardian Technology campaign for free public access to data about the UK and its citizens



PDFs are bad for open government, says Sunlight Foundation in US

Saturday, November 7th, 2009

This is always worth remembering:

Government releasing data in PDF tends to be catastrophic for Open Government advocates, journalists and our readers because of the amount of overhead it takes to get data out of it. When a government agency publishes its data and documents as PDFs, it makes us Open Government advocates and developers cringe, tear our hair out, and swear a little (just a little). Most earmark requests by members of congress are published as PDF files of scanned letters, leading the Sunlight Foundation and others to write custom parsers for each letter.

I know that a lot of the effort going on in the data.gov.uk channels is about finding effective ways of parsing data. The hope, though, has to be that very little of it involves reversing data that has been output to PDF, because turning a PDF into useful data is, in the famous quote, “about as easy as turning hamburger into cow”.

Back to the Sunlight Foundation again:

Here at Sunlight we want the government to STOP publishing bills, and data in PDFs and Flash and start publish them in open, machine readable formats like XML and XSLT. What’s most frustrating is, Government seems to transform documents that are in XML into PDF to release them to the public, thinking that that’s a good thing for citizens. Government: We can turn XML into PDFs. We can’t turn PDFs into XML.

And a word about Flash. Ah, Flash:

Flash isn’t off the hook either. Government has spent lots of time and money developing flash tools to allow citizens to view charts and graphs online, and while we’re happy the government is interested in allowing citizens to do this, Government’s primary method of disclosure should not be these visualizations, but rather publishing the APIs and datasets that allow citizens to make their own

The comments are worth reading too, such as this from Adrian Holovaty: “If I had a dollar for each hour I’ve spent trying to finagle raw data out of PDFs, I could afford Adobe Photoshop.”

And a rather alarming one from Michael Friis: “Here in Denmark Parliament publishes many ancillary documents as PNGs.” Scary, though in line with Ordnance Survey’s tendency to release its responses to FOI requests as TIFFs.

Digital engagement, widening and public data getting analysed… in private

Friday, October 30th, 2009

Stephen Timms reports that there’s been good progress in Making Public Data Public.

As the Digital Engagement blog notes:

So far our request for developers to “get excited and make things” has so far exceeded our initial expectations. Not only is the number of people signing up to the developer forum higher (currently more than 1,300), but also the discussion board is very active with a healthy list of ideas for the site and, perhaps most excitingly, a few applications are beginning to see the light of day.

And also:

Working in partnership with Guardian Professional, we held 3 developer days hosted at The Guardian‘s Kings Place offices in central London on the 14th-16th September. As an organisation they were best placed to help us undertake this task, having built a community of talented developers and opened up their API. You can have a look here at the excellent postcode paper concept and the rather wonderful traffic data visualisations here, which were just two of the many ideas for applications that emerged over the course of the camp. Ideas about their priorities for further data releases (to add to the 1,100 datasets currently on the site) were shared and important foundations for further iterations of the HMG Data site were laid.

There’s a certain irony in the fact that the sessions at the Guardian were held under such secrecy that I didn’t find out about them until the week after. More posts on that later…

Tim Berners-Lee to help UK government build single data access point

Thursday, October 29th, 2009

Computer Weekly reports that Tim Berners-Lee has been asked by the government to develop a single point of access for public data – as Stephen Timms, who has taken over where Tom Watson left off in the Cabinet Office, reports progress in “making public data public” (a concept that, when you think about it, seems a bit strange – as in “shouldn’t that have been done from the outset?”).

According to Computer Weekly, Timms told an RSA/Intellect event that

information is the “essential raw material” of a new digital society. “Government must play its part by setting a framework for new approaches to using data and ‘mashing’ data from different sources to provide new services which enhance our lives. In particular, we want government information to be accessible and useful for the widest possible spectrum of people.”

Well, minister, if that’s truly what you want, then you’ll make it free of charge, and free of copyright restrictions. It’s as simple as that. Could we suggest something like Creative Commons? The US government seems to find it amenable.

Timms said, “We are supporting Sir Tim in a major new project, aiming for a single online point of contact for government data, and to extend access to data from the wider public sector. We want this project for ‘Making Public Data Public’ to put UK businesses and other organisations at the forefront of the new semantic web, and to be a platform for developing new technologies and new services.”

Fine words. We’d like some actions to go with them. We’re hearing plenty of sticks being wielded over how people use the net – Lord Mandelson’s threats to file-sharers, for example – but the carrots for companies to build on something that really would benefit Britain, by using British data, seem to be stuck on a very slow train.

Part of the problem, of course, is that it’s almost impossible to put a figure on the opportunity cost of the lack of access to this data – whereas the music industry can much more easily point to figures it has produced (though you may argue about their provenance) to suggest precisely how much harm it’s suffering through untrammelled downloading.

It’s interesting to note, by contrast, that when we asked the Royal Mail to specify precisely how much harm it was suffering through ernestmarples.com’s use of the postcode to lat/long conversion, it robustly declined to say.

Of course there is the Cambridge trading funds report, with its analysis of the opportunity cost of the trading funds regime. But this goes much wider – the Cambridge analysis didn’t look at the Royal Mail and postcodes, for example, which have become embedded into many systems’ location processing.

Computer Weekly again:

So far, 1,300 people have signed up to the developer forum and contributed to the discussion board on what the data could be used for. The Cabinet Office also held a developers’ camp where ideas were shared.

We’ll have more about the devcamp in a future post.

Kent County Council wants you to recycle its data

Tuesday, October 20th, 2009


Have a look at http://picandmix.org.uk/:

Pic and Mix aims to increase public access to Kent-related datasets including those generated by Kent County Council (KCC). For the purposes of the pilot, we have brought together a sample of the most useful information. Where possible, it’s been provided in a format that allows it to be ‘mashed’ and customised. Please help us shape this initiative by suggesting additional data and ways in which we can improve this site. And if you do anything clever with the data, we’d like you to share that with us too!

The About page has more:

Last year, Kent County Council won Innovate08! Our idea had three elements:

  • To make publicly available information – things like crime statistics, employment information, business information – more accessible.
  • We also wanted to provide tools that would enable people to ‘pic and mix’ data to create customised information.
  • And last but not least, we wanted to provide a platform where people could share this information and discuss ways in which it could be used.

Winning Innovate08 meant we were given funding for a pilot project to see how people in Kent would respond to a resource of this kind. Our pilot project was initially launched with 25 small Kent-based businesses. With this new site we hope to get the wider community involved.

So, how could Pic and Mix benefit you? Well, there’s a lot of information out there in a lot of different places. Rather than spend ages tracking down the information you need, we want you to come to a single place – picandmix.org.uk. For example, you may be looking for a care home for an elderly relative. You might want to mix this information with GP locations and bus routes. By plotting this information on a map you will be able to see which care homes are close to a GP surgery, and the bus routes. Another example might be a security company deciding where to focus its marketing efforts. They may want to mix office premises with crime statistics and use the information to plan a campaign.
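The care-home example is the classic “mashup”: two datasets joined on location. Here is a minimal sketch in Python, assuming each dataset has already been reduced to (name, latitude, longitude) records. The function names, the record layout and the distance threshold are hypothetical, and this is not how Pic and Mix itself works. The haversine formula treats the Earth as a sphere, which is accurate enough at the scale of a county.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius of the Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def homes_near_gps(care_homes, gp_surgeries, max_km):
    """For each care home, find its nearest GP surgery; keep it if within max_km.

    Both inputs are lists of (name, lat, lon) tuples (a hypothetical layout).
    Returns a list of (home_name, gp_name, distance_km) tuples.
    """
    if not gp_surgeries:
        return []
    matches = []
    for home, hlat, hlon in care_homes:
        # Straight-line distance from this home to every surgery; keep the closest.
        gp, dist = min(
            ((name, haversine_km(hlat, hlon, glat, glon))
             for name, glat, glon in gp_surgeries),
            key=lambda pair: pair[1],
        )
        if dist <= max_km:
            matches.append((home, gp, round(dist, 2)))
    return matches
```

Calling `homes_near_gps(care_homes, gp_surgeries, 1.0)` keeps only the homes with a surgery within 1km in a straight line; layering on bus stops would just be another call of the same kind.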

Fascinating. We await developments – and news of same.

Time for local government to think harder about opening its data

Wednesday, September 30th, 2009

Chris Taggart gave a presentation earlier this month to APPSI – the Advisory Panel on Public Sector Information – about opening up local government data.

Even without the actual talk (is it online anywhere in some form?), the slides make compelling reading. Local government, of course, can sometimes be just as bad as central government (or indeed trading funds) about hanging grimly on to its data, enforcing dubious or unnecessary copyright, and basically making people’s lives harder when it should be making them easier.

You can also read my thoughts on how local government could open itself up in an article for Society Guardian here, which has attracted some useful comments – and links to interesting sites.

But now, here’s the lecture. Flash required, of course.

Do you know where your postboxes are?

Tuesday, September 15th, 2009

As an example of how getting data out there can just be plain useful, let’s return to one of the winners of the Show Us A Better Way competition (remember that?).

Prizewinner: postbox locations.

Obstacle: Royal Mail wouldn’t release the locations of its 116,000 postboxes.

Solution: Freedom of Information request.

Obstacle: incomplete geographic information in the response (a postcode, not long/lat, plus a mystical Royal Mail reference per box); no collection times.

Solution: FOI request for the collection times and a bit of data marriage.

Obstacle: still don’t know where the postboxes actually are.

Solution: crowdsource it! Get people to pinpoint the locations of what they think are the postboxes onto an OpenStreetMap map. So far about 26,000 have been done – have you done the ones near you?

Obstacle: Royal Mail says it still holds all the rights to the locations of the postboxes.

Solution: actually, you don’t really need a solution. Toothpaste is notoriously hard to put back into the tube.

And as Matthew Somerville pointed out to us, knowing the locations of the postboxes means that one might be able to do “travelling salesman” analyses on the routes – which could yield huge savings for the Royal Mail. How much does it spend on fuel and time doing collections every day? How much might it save with a proper analysis? Who knows? We won’t until we see all the postboxes put in their place.
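Matthew Somerville’s “travelling salesman” point can be sketched in a few lines. This is a minimal Python illustration, not Royal Mail’s method: it assumes postbox positions are already available as planar coordinates in kilometres (for instance eastings and northings divided by 1,000) and uses straight-line distances, whereas a real collection round follows roads and has collection-time windows. It builds a greedy nearest-neighbour round, then improves it with the standard 2-opt heuristic; all function names are hypothetical.

```python
import math


def route_length(points, order):
    """Total length of a closed round visiting points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def nearest_neighbour_tour(points, start=0):
    """Greedy round: from the current box, always go to the closest unvisited one."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        here = points[tour[-1]]
        # Sorting first makes tie-breaking deterministic (lowest index wins).
        nxt = min(sorted(unvisited), key=lambda j: math.dist(here, points[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def two_opt(points, tour):
    """Reverse segments of the round (start box fixed) while that shortens it."""
    best = tour[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                # Small tolerance so floating-point noise cannot cause endless swaps.
                if route_length(points, candidate) < route_length(points, best) - 1e-9:
                    best = candidate
                    improved = True
    return best
```

The difference in `route_length` before and after `two_opt` is the kind of saving such an analysis would look for. This naive version recomputes the whole round for every candidate swap, so it suits a single round of a few dozen boxes rather than all 116,000 at once.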

And that’s why it’s better to rely on making government data available – free, in both senses of the word – than to try to create artificial “value” from it by charging.

Price does two things: it implies that what you are pricing has value; and it puts a barrier between the thing being “sold” and its potential users. If the users don’t want it enough, they won’t ever go across the barrier. If you take down the barrier, then you get every user you could ever get. And some of them will do really useful things with your product, especially when that product is data.

Sounds like a good idea: Sir Tim Berners-Lee goes to Downing Street to talk open data

Tuesday, September 15th, 2009

Well, Sir Tim Berners-Lee (he invented the web, you know) seems to be getting stuck in. He has gone to Downing Street along with Nigel Shadbolt (whose name always reminds me of a Harry Potter character – apologies: he’s actually professor of artificial intelligence at the University of Southampton) to talk to Gordon Brown.

About what?

Mr Berners-Lee and Mr Shadbolt presented an update to Cabinet on their work advising the Government on how to make data more accessible to the public.

Gordon Brown has already spoken publicly about his aim of making the UK a world leader in opening up government information on the internet, an important element of Building Britain’s Future.

He could have asked us. We’d have told him back in 2006. Or 2007. Or 2008.

Sir Tim Berners-Lee told Cabinet about the goal of delivering a single online access point to Government information, similar to the one introduced by the Obama administration in the US.

Don’t we sort of have that already through the work of OPSI and its data portal? Sometimes it seems like the work of Carol Tullo and John Sheridan et al has just been swept down a plughole – or perhaps memory hole, a la 1984.

He also spoke about proposals to extend the “open data” approach, ensuring greater transparency in government and improving the efficiency of public services.

It would be interesting if the “efficiency of public services” meant “to stop different bits of government squabbling over the data they collect like children in a playground and instead start to share it freely, rather as we adults advise children to do so they can discover the benefits of sharing”.

But there’s a suspicion it’s really code for “cut public services while saying what’s being cut will be replaced by something else at some time in the future”.

The Government hopes the data project will benefit the UK by creating jobs, driving new economic growth and allowing the re-use of government data to encourage the development of new, innovative information-based businesses and services.

Hold on just a moment there. The government hopes all these things, does it? Is that because it’s taking the Cambridge study seriously, and looking at its potential benefits to the economy? So we’re not going to see terrible approximations like the OS’s “hybrid” strategy, then?

It is also expected to help increase the transparency of government and empower citizens to get more out of public service by tailoring it to their needs.

What I don’t like here is the description of it as a “data project” as though it were something that sat apart from what should actually be a process – and a core process at that. It shouldn’t be “what part of this data shall we release” but “is there any of this that shouldn’t be released?”

After the update from Sir Tim and Professor Shadbolt, The Prime Minister confirmed his full support for the next phase of their work.

It would be nice to know what that next phase included. Anyone seen a copy of the timetable?

You cannot charge for property searches, councils told, and you might have to pay some back

Thursday, August 6th, 2009

Interesting decision by the Information Commissioner: property search data counts as environmental information, and as such councils should make it available under the Environmental Information Regulations (EIR), the environmental counterpart to Freedom of Information.

This is pretty big – particularly for estate agents.

Thanks to the EPSIPlus forum for the pointer.

As the head of the IPSA noted:

The ICO has published two section 50 rulings today against Local Authorities in England.

East Riding of Yorkshire – The ICO has ruled Building Control and Traffic data is EIR and the Local Authority must make the data available in 35 days.

Stoke City Council – The ICO has ruled Building Control and Traffic data is EIR and the Local Authority must make the data available in 35 days.

Failure to comply by either Local Authority may result in the ICO making written certification of this fact to the High Court (or the Court of Session in Scotland) pursuant to section 54 of the Act and may be dealt with as a contempt of court. Data must be made available under the pricing terms of EIR. The ICO is not satisfied by the ‘made available under another means’ (CON29R requests) and the payment of a full Local Authority fee. This is because the Charging Regulations (CPSR) acts as a barrier to the data.

The Property Search Industry will now seek reimbursement of fees paid under duress / under protest. (emphasis added).

Now, that could get rather interesting. And for cash-strapped councils, not being able to charge for property searches (or even parts of them, but particularly the environmental data side of them) is going to make a difference. If anyone knows how much councils make from those charges, we’d be very interested to know more.

Free our data, says Lords info committee

Thursday, August 6th, 2009

Simon Dickson has picked up something we were remiss in missing: the report of the Lords Information Committee. His post on it is headlined “Free our data, says Lords info committee”.

He notes that its final report

couldn’t really have been more in favour of the free our bills [as pushed by They Work For You, which would show you details of bills in progress in committee] agenda.

The key recommendations, as listed in its press release:

(I’ve copied and pasted these from puffbox.com. All credit to Simon for what’s below, apart from any mistakes in the comments in [square brackets], which are my additions.)

  • information and documentation related to the core work of the House of Lords should be produced and made available online in an open standardised electronic format (not pdf) that enables people outside Parliament to analyse and re-use the data
  • the integration of information on Parliament’s website, eg biographical info on Members to be linked to their voting record, their register of interests, questions tabled, etc [basically, like They Work For You]
  • Bills should be presented on Parliament’s website in a way that makes the legislative process more transparent and easier to understand [=Free Our Bills]
  • an online system enabling people to sign up to receive electronic alerts and updates about particular Bills [rather like planningalerts, but for legislation]
  • a requirement on the Government to start producing Bills in an electronic format which both complies with “open standards” and is readily reusable [a bit like the Conservatives’ suggestions]
  • an online database to increase awareness of Members’ areas of expertise
  • an online debate to run in parallel with a debate in the Lords Chamber
  • greater access to Parliament for factual filming
  • a trial period during which voting in the Lords is filmed from within the voting lobbies
  • all public meetings of Lords committees to be webcast with video and audio
  • a review of the parliamentary language used in the House of Lords to make it easier for people outside the House to understand

Let’s see how it pans out. Is there time for this to be implemented before the election? Or would either of the main parties put it onto their agenda – or even manifesto?

Naughty, very naughty: Ernest Marples frees the postcodes

Saturday, July 11th, 2009

An interesting new site – ernestmarples.com – is trying to make postcodes free.

The people behind it (the whois details tell you that it’s registered to a location in SW1A 1AA, which happens to be Buckingham Palace) are Harry Metcalfe and Richard Pope.

When asked “where does the data come from?”, they insist that

We’re not saying. But, just to be clear: we don’t hold a copy of the postcode database ourselves, neither in complete form nor as part of a cache.

But their aim is clear enough:

Post codes are really useful, but the powers that be keep them closed unless you have loads of money to pay for them. Which makes it hard to build useful websites (and that makes Ernest sad).

So we are setting them free and using them to run PlanningAlerts.com and Jobcentre Pro Plus. We’re doing the same as everyone’s being doing for years, but just being open about it.

Hopefully the Government and Royal Mail will realise the value of this service and license us to offer it officially and for free. If not, and this website gets shut down, we’ll close the websites we’ve made that make use of this site’s lookup service. Permanently.

There’s a long list of people who have supported it. We’ll add our voice. The Free Our Data campaign thinks it’s a good idea to make postcodes freely available.

OS expert isn’t Max Craglia either… so who is it?

Friday, July 10th, 2009

You’ll recall the famous scene in the film Spartacus (directed of course by the same man who went on to direct 2001: A Space Odyssey) in which the Roman troops have captured the rebel slaves, and are trying to find out which of them is Spartacus, their leader.

At which one man stands up and says “I’m Spartacus!” And another, and another…

Well, the search for the identity of Ordnance Survey’s “internationally recognised expert” who looked over its calculations for its international comparison of mapping agency funding models is like that. Only in reverse. “I’m not Spartacus!” seems to be what people are saying.

A commenter suggested that the person in question might be Dr Max Craglia, of the Joint Research Centre of the European Commission, a specialist in geographic information policies. So we sent him a quick email asking if he was the one. (Don’t know who he is? Here’s his main profile, and another profile.)

“I regret I am not the expert you are looking for,” he responded, sounding more like Obi-Wan Kenobi in Star Wars than Spartacus.

We’ve noted this in a roundup of what else happened at the Activate 09 summit, organised by the Guardian and part-sponsored by Ordnance Survey.

Among the other issues raised there were whether OS’s maps are fit for purpose in a 21st-century digital economy (Tom Watson MP, formerly of the Cabinet Office, thinks not) and whether National Rail – the company owned by the train operating companies, rather than the nationalised Network Rail – should make its train running times available for free. Since that is private-sector data, it doesn’t fall under the FOD campaign’s “government-owned or -generated non-personal data” umbrella.

Then again, the reaction on Twitter also suggests that with so many government billions being poured into the private rail sector, it would make sense to demand the data for free as a quid pro quo. It’s an argument that does have merits.

So in the meantime does anyone have any more (realistic) suggestions for who OS’s Spartacus is?

David Cameron gives speech suggesting “setting data free”

Thursday, June 25th, 2009

David Cameron has given a keynote speech which continues to edge the Conservative party towards something that might look like the glimmer of the beginnings of the outline of the rough shape of a manifesto.

Part of it was to do with what he called “Setting data free”. See what you make of it.

In Britain today, there are over 100,000 public bodies producing a huge amount of information.

This ranges from school league tables to train timetables; from health outcomes to public sector job vacancies. Most of this information is kept locked up by the state. And what is published is mostly released in formats that mean the information can’t be searched or used with other applications, like online maps. This stands in the way of accountability.

(snip..)

… what about patient outcomes in the NHS? Some of the most important information you’ll ever need to know, how long your Dad will survive if he gets cancer, your chances of a good life if you have a stroke, all this is out of your hands.

Now, again, imagine if this information was in your hands. You’d be able to compare your local hospital with others, and do something about it if it wasn’t good enough. Choose another hospital. Voice your complaint to a patient group. Make change happen.

All this data which would help people in this country hold the powerful to account – it’s all locked away in some vault. And it’s only getting worse.


We’re going to set this data free. In the first year of the next Conservative Government, we will find the most useful information in twenty different areas ranging from information about the NHS to information about schools and road traffic and publish it so people can use it.

This information will be published proactively and regularly – and in a standardised format so that it can be ‘mashed up’ and interacted with.

What’s more, because there is no complete list that can tell us exactly what data the government collects, we will create a new ‘right to data’ so that further datasets can be requested by the public.

By harnessing the wisdom of the crowd, we can find out what information individuals think will be important in holding the state to account.

And to avoid bureaucrats blocking these requests, we will introduce a rule that any request will be successful unless it can be proved that it would lead to overwhelming costs or demonstrable personal privacy or national security concerns.

If we are serious about helping people exert more power over the state, we need to give them the information to do it. And as part of that process, we will review the role of the Information Commissioner to make sure that it is designed to maximise political accountability in our country.

The suggestion that councils will be obliged to publish data in a common format is one that Cameron has made before; and it’s something that Adrian Short’s mashthestate has been pushing, successfully, without the backing of any political party.

The idea of a “right to data”, though, and the presumption that data should be available, is interesting. Allied to Sir Tim Berners-Lee suggesting ways to get the data out there, it looks like all the political parties think that making data free – in the sense of untrammelled, if not yet in the sense of not-charged – is an idea whose time has come.

OS chairman’s speech: internal study shows “free” OS would cost government 500m-1bn pounds – but won’t publish

Thursday, May 14th, 2009

The following is the text – as captured in shorthand contemporaneously – of a speech by Sir Rob Margetts, chairman of Ordnance Survey on Tuesday May 12. It is not complete but does capture the major themes and quotations.

The context is that Sir Rob was explaining to an invited audience, including many existing customers of OS, how the new “hybrid” strategy had been determined as the best one for its future development. He took some pains to emphasise that the “free data” model had not been rejected out of hand; but that instead a special study had been commissioned to investigate it.

These are my shorthand notes of what was said. My own comments are at the end.

There were major issues affecting the sustainability of OS as it goes through its proposed strategy.

We examined the complete range of options very impartially and objectively. That includes the free data, utility model where you would make data available to anybody [for free]. We examined the fully commercial model.. and alternatives within that range.

Our study of the utility [free data] model was done because some hold that that is a good strategy, and some of us weren’t indifferent to it. Some [of the study team] going in thought it could be interesting.

The study was fully costed for the government, calculating the costs of change to the residual value.

We came to conclusion that the cost to government in the first five years would be between £500m and £1 billion. That wasn’t the only reason that we discarded it. We did, with outside help, a review of equivalent organisations around the world.

We wanted sustainability and high [data] quality and came to the conclusion that at nearly every organisation that had gone to free data model, the quality had declined and that users and customers were increasingly dissatisfied with the product.

And the attractiveness to staff and recruitment and retention had also reduced. We found no evidence that this model actually worked elsewhere.

Those that work had a user-pays model. We tried to understand and explain why. Think that comes to the responsiveness to needs of the organisation. [ie: the responsiveness of the organisation to needs.]

If customers are required to pay then they specify needs very clearly and give feedback on whether they have got value [for money].

Customer stimulation is a vital part of any organisation because it’s sustainable.

And of course [there’s] recruiting and retaining quality staff.. they want to work for a quality organisation and respond to real customer needs.

That’s why we didn’t pursue [the free data model] but can affirm that we looked at it in detail.

We also looked at a fully commercial model but weren’t satisfied it would fulfil the fundamental strategy [for OS].

We believe use [of geographical data] has expanded dramatically and changed.. but that potential is still considerably underexploited.

Our No. 1 aim is to improve capacity of OS to assist the exploitation of geographic information and be one of fundamental enablers of that [exploitation] in the UK for social and individual benefit.

With the proviso that by doing that we have to keep a sustainable organisation that not only covers its costs but also has enough left over… about £20m per annum.. to invest in the products that the market needs for customers, whether private individuals or business enterprises.

Commentary: Well, we’re fascinated to learn that OS found that there’s absolutely nobody out there who is making a free data model work. We have already emailed the South African mapping organisation, about which we wrote in 2007, to find out whether they were contacted by OS, and if so what they told them.

We will also pursue Freedom Of Information enquiries to find out which organisations OS spoke to and what their responses were. Since these are all free data models, there can’t be any commercial confidentiality for the foreign organisations, can there?

The “£500m – £1bn” range is extremely wide, and we’d like to see the detailed working. I asked the minister with responsibility for OS, Iain Wright, who was there, if he would order OS to release its full study. He said that if there weren’t any commercial-in-confidence implications… I wonder if we’ll see it? Again, we’ll ready some FOI requests.

There were questions at the end, and one interesting one came from Bob Barr, who pointed out that there is always the possibility of “pay to change” – that when you have a database of 460m features with (to give the statistics that Vanessa Lawrence, OS’s chief executive, read) 5,000 changes daily, why not charge those who are changing it? (We’ve looked at that model before, though I would like to see some more recent Land Registry figures.)

Here’s the question as I recorded it.

Robert Barr: “this hybrid financing.. it seems to be today that payment will be at the point of use. Usually [in other online systems] there’s a model where you pay to change the database. Doesn’t it make sense for data to be paid for where you change it?”

Peter ter Harr of OS: “This is a model we have been looking at. There are advantages and disadvantages. It’s not always the user who pays [in the current model]. There are many OS products which are free at the point of use. It’s the information provider who puts it online who pays. We have been looking at the model in various other countries. It works well in cases where it’s part of the statutory process.”

And that’s it? We really, really need to see that OS internal study, as it contradicts pretty much every study that’s been published. It’s going to be fascinating tracking it down.

One other thing: the cost to the government isn’t quite the same as the benefit to the economy, nor the eventual benefit to the government through taxation. It was the latter (actually, both) that the Cambridge study looked at. We are perfectly happy to generate tweaked versions of the “free data” model that could keep OS charging for some products (such as MasterMap) while freeing other data sets. Now that would be a truly hybrid model.

If anyone has had sight of that OS study, or any part of it, do please drop me an email at charles.arthur@gmail.com. Or upload it to Wikileaks and let us know. We think it’s so important it ought to be out there, not locked away in an OS cupboard.

February 10: come to the public debate on free data, government agencies and copyright

Friday, January 30th, 2009

On Tuesday 10 February at 5pm, the think tank Policy Exchange will be hosting a debate entitled Free Our Data? Government Agencies and Copyright, in Westminster.

Speakers will include the Technology Guardian editor, Charles Arthur, a co-founder of the Free Our Data campaign; Adam Afriyie, shadow minister for innovation; and Ed Parsons, geospatial technologist at Google and former chief technology officer at Ordnance Survey. We’re hopeful that there will be at least one other speaker, from the public sector.

If you would like to attend, email events@policyexchange.org.uk. (You’ll discover the precise venue then!)

A quick roundup to start the new year

Thursday, January 1st, 2009

Hope you’ve all come through the new year without suffering too many leap-last-year problems. I thought it would be interesting to round up a few things that I’ve seen but not really had enough brainpower to turn into anything more than notes.

First, Public Data Sets stored on Amazon Web Services. (Via Richard Allan.) An interesting idea: got public datasets? Well, why not get them stored somewhere really cheap where people can access them but you only pay per download. It’s the ultimate outsourcing, and you also get to see how many people are downloading it without the capital costs of the servers.

Public Data Sets on AWS provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community, and like all AWS services, users pay only for the compute and storage they use for their own applications. An initial list of data sets is already available, and more will be added soon.

It’s already got the Human Genome and the US Census data. The idea of hosting UK public datasets on AWS was floated in the Cambridge Economics report released with the Budget back in March. Any takers?

Second, Municipalities open their GIS systems to citizens (thanks, Gerry Gavigan, who points out that “As well as innovation and the other usual unexpected benefits, it points to the existence, alas without quantification, of financial benefits.”) The article explains:

For instance, the online burning permit sales service of the Minnesota Department of Natural Resources (DNR) allows citizens to declare precisely where they would like to burn woody debris. High precision is essential in deciding whether a permit is obtainable, as well as when and under what conditions: if there is a high fire risk in the area and day for which a user asks for a permit, the software must refuse it. The Web site, however, makes it easy to enter the location with the greatest possible resolution: users first type an address into a form to get an approximate location on the map, then zoom at will and finally click on the exact spot for which they are applying for a permit.
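The decision rule the article describes (find the exact spot the user clicked, look up the fire risk for that area on that day, refuse if it is high) can be sketched in a few lines. This is a hypothetical illustration, not the Minnesota DNR’s actual system: the area names, their simple lat/lon boxes, the dates and the risk ratings are all invented, and refusing when no rating exists is our own conservative assumption. It does show why precision matters: two clicks about 150 metres apart, either side of an area boundary, get different answers.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical fire-management areas as lat/lon boxes:
# (min_lat, max_lat, min_lon, max_lon). Real systems would use GIS polygons.
AREAS = {
    "west": (46.0, 47.0, -94.0, -93.0),
    "east": (46.0, 47.0, -93.0, -92.0),
}

# Hypothetical daily fire-risk ratings, keyed by (area, day).
FIRE_RISK = {
    ("west", date(2009, 4, 20)): "high",
    ("east", date(2009, 4, 20)): "low",
}


@dataclass(frozen=True)
class PermitRequest:
    lat: float        # the exact point the user clicked on the map
    lon: float
    burn_date: date


def area_for(lat: float, lon: float) -> Optional[str]:
    """Return the area whose box contains the point, or None.
    Longitude ranges are half-open so a boundary point belongs to one area."""
    for name, (lat0, lat1, lon0, lon1) in AREAS.items():
        if lat0 <= lat <= lat1 and lon0 <= lon < lon1:
            return name
    return None


def decide(req: PermitRequest) -> str:
    """Grant or refuse a burning permit for one place on one day."""
    area = area_for(req.lat, req.lon)
    if area is None:
        return "refused: location outside permit areas"
    risk = FIRE_RISK.get((area, req.burn_date))
    if risk is None:
        # Assumption: with no rating for that day, err on the side of refusal.
        return "refused: no risk rating for that day"
    if risk == "high":
        return "refused: high fire risk"
    return "granted"


# Two clicks roughly 150 m apart, either side of the west/east boundary:
print(decide(PermitRequest(46.5, -93.001, date(2009, 4, 20))))  # refused: high fire risk
print(decide(PermitRequest(46.5, -92.999, date(2009, 4, 20))))  # granted
```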

And, more pertinently:

The success of initiatives like OpenStreetMap or the availability of Yahoo! and Google Maps APIs may make you think that people may create services like these and many more all by themselves, without getting any bureaucrat involved. However, in order to benefit the most from digital maps and other spatial data, citizens need such data to be officially inserted in, and completely integrated with, the maps and databases public administrations use to plan roads, zoning, and everything else.

Citizens may use Web sites like those mentioned here to request services as different as bus stops, trekking permits, or new post offices. Other uses may include signalling construction abuses, damages to public property, or illegal dumpsters. We may draw our preferred public bus routes on a map in our City Council official Web site.

Of course, to make all this work in practice, public administrations should also clarify the data ownership situation. Who owns data directly and freely provided from citizens? What license should apply to those data or any derived ones? This, however, is a separate issue, not really related to open source software.

And finally, some interesting questions have been asked in Parliament. John Howell (of the Tories) has asked about Ordnance Survey income from local authorities and about the commercial use of OS vector data (and, previously, about discussions between OS and Google over mapping licences; Ed Parsons, formerly of OS and now at Google, says the minister’s answer is wrong). Meanwhile Mike Gapes, a Labour MP, has asked about London mapping payments to OS and about payments to OS for use of its data by various government authorities.

We’d be interested in any comments on what Gapes and Howell are trying to unearth here… and of course your comments on anything else. And happy new year!