Redmond Diary

By Andrew J. Brust

Testimony to the New York City Council on Intro 991, Proposed Legislation for Open Data Standards

Good afternoon. My name is Andrew Brust. I am the Chief, New Technology at twentysix New York, a consultancy specializing in application development, business intelligence and other software technologies. I am also a native New Yorker and former technology professional with the City of New York. In the mid and late 1980s, I was a programmer for the Department of Parks and Recreation and later I was the Computer Systems Director at the Department of Cultural Affairs. Thank you for allowing me to read my testimony today; I'm sure you can understand that given my career history and my current position, I have great interest in this legislation.

The language in Intro 991 seems to speak implicitly to a number of important features and advantages of the City's data sharing platform, and to the technology premise behind it. But a number of these points deserve to be called out explicitly, so I hope it's OK that I do so briefly here. Beyond those points, there are some less obvious, but equally important, capabilities that I'd be grateful if you'd consider, and I will mention them as well.

Outbound Interface and Content
Let's start with the interface the system will provide to its users and consumers. I think it's incredibly important that the system provide data in a relatively raw form that developers can work with, rather than in a full-blown end-user interface. The reason should be clear: developers will build the interfaces and integrated services that use and serve the data. Should the City or various of its agencies wish to do so as well, that's fine. But the primary mission should be to provide an information platform that developers and entrepreneurs can innovate on top of.

Of course, if the data is provided in the right format, then transformation of it from machine-readable to human-readable form should be almost trivial. Today, the Atom Syndication Format, which is a particular schema within eXtensible Markup Language, or XML, is a common format for arbitrary, structured data and it can be rendered in human-readable form by most modern Web browsers. REpresentational State Transfer, or REST, is arguably the most popular style of Web service for allowing such data to be queried. And so I would certainly recommend that Atom and REST be supported.
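
As a rough illustration of how small that transformation can be, here is a minimal Python sketch that turns an Atom feed into plain-text lines. The feed contents, the restaurant names and the render_feed function are all invented for the example, and the feed is trimmed to the few elements the sketch reads; nothing here describes an actual City feed.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Atom Syndication Format.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical, trimmed sample feed; every title, id and date is invented.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Restaurant inspections (sample)</title>
  <id>urn:example:inspections</id>
  <updated>2009-06-29T12:00:00Z</updated>
  <entry>
    <title>Example Diner</title>
    <id>urn:example:inspections:1</id>
    <updated>2009-06-01T09:30:00Z</updated>
    <summary>No violations cited</summary>
  </entry>
  <entry>
    <title>Sample Cafe</title>
    <id>urn:example:inspections:2</id>
    <updated>2009-06-15T14:00:00Z</updated>
    <summary>Re-inspection scheduled</summary>
  </entry>
</feed>"""


def render_feed(xml_text):
    """Return plain-text lines: the feed title, then one line per entry."""
    feed = ET.fromstring(xml_text)
    lines = [feed.findtext("atom:title", default="(untitled feed)", namespaces=ATOM)]
    for entry in feed.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="(untitled)", namespaces=ATOM)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM)
        # Keep only the date portion (YYYY-MM-DD) of the timestamp.
        lines.append(f"- {title} ({updated[:10]}): {summary}")
    return lines
```

Calling print("\n".join(render_feed(SAMPLE_FEED))) lists the feed title followed by one readable line per entry.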

But the reality is that other formats are important as well. On the data presentation side, these include JavaScript Object Notation, or JSON, as well as older formats like the longstanding comma separated values, or CSV, format. On the Web service side, Simple Object Access Protocol, or SOAP, is very important too. It was the first protocol supported by Web services and is still the most popular such protocol in the enterprise software development space. All of these standards should be supported. That may sound like a tall order, but with proper design, it's in no way out of reach. The best way to support multiple formats, including formats not yet introduced, is to produce the data in a single, flexible format that can easily be transformed and re-published in virtually any other format. Similarly, the query logic should facilitate development of "layers" of code around it that support specific service protocols.
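
To make the "single, flexible format" idea concrete, here is a minimal Python sketch under stated assumptions: the canonical form is simply a list of dictionaries, and each output format is a small function registered by name. The records, field names and the publish function are hypothetical, chosen only for illustration.

```python
import csv
import io
import json

# Hypothetical canonical records; the fields are invented for this sketch.
RECORDS = [
    {"id": 1, "borough": "Brooklyn", "status": "open"},
    {"id": 2, "borough": "Queens", "status": "closed"},
]


def to_json(records):
    """Serialize the canonical records as a JSON array."""
    return json.dumps(records, sort_keys=True)


def to_csv(records):
    """Serialize the canonical records as CSV with a header row."""
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Registry of output formats. Supporting a format introduced later means
# adding one entry here; the canonical data and query logic stay unchanged.
SERIALIZERS = {"json": to_json, "csv": to_csv}


def publish(records, fmt):
    """Re-publish the canonical records in the requested format."""
    try:
        serializer = SERIALIZERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return serializer(records)
```

The same registry approach could apply on the service side: a thin REST or SOAP layer would call the shared query logic and then hand the result to publish, rather than each protocol carrying its own copy of that logic.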

Reading and Writing
The system should allow writing data, in addition to reading it. City residents should be able to submit a tennis permit request through this platform, or even pay a parking ticket, or a water bill, or City income tax bill. City natives should be able to request a copy of their birth certificates, and numerous other submissions should be accepted, in addition to mere queries for information.

Back on the reading side, users and systems should be able to retrieve non-structured data, including archival photographs of specific City lots, maps, titles and deeds, audio from major speeches made by the Mayor and video of Council meetings and hearings as well. Ultimately this could more than make up for the loss of WNYC-TV. The fact is that Channel 31 was a video authority of record, and the loss of it has been significant; the data platform contemplated by this bill, if it supports rich media in addition to textual data, could bring about services that fill the gap left when WNYC was sold off, and go well beyond the services that linear, broadcast television can deliver.

License Issues
Beyond the formats, protocols and content that are produced, this system will require innovations in licensing as well. The availability of the data that this platform could produce will enable unprecedented analyses, products and services, useful for both commercial and social services pursuits. But to make possible a number of different query and data visualization services, applications will need to cache, aggregate, slice and dice the system's data. To do so, they will need to stage the data in local or hosted databases and the City should expressly permit this so as not to impede the innovation that would result.
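
For a sense of what that staging might look like in practice, here is a minimal Python sketch that caches records in a local SQLite database and then aggregates them. The pothole reports, the table layout and the function names are all hypothetical; a real application would fetch the rows from the platform rather than hard-code them.

```python
import sqlite3

# Hypothetical rows as an application might receive them from the platform:
# (report id, borough, severity). All values are invented.
FETCHED = [
    ("P-1", "Bronx", 3),
    ("P-2", "Queens", 1),
    ("P-3", "Bronx", 2),
]


def stage(conn, rows):
    """Cache fetched rows locally; re-staging the same report overwrites it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pothole ("
        "report_id TEXT PRIMARY KEY, "
        "borough TEXT NOT NULL, "
        "severity INTEGER NOT NULL)"
    )
    conn.executemany("INSERT OR REPLACE INTO pothole VALUES (?, ?, ?)", rows)
    conn.commit()


def reports_per_borough(conn):
    """Aggregate the cached data: number of reports in each borough."""
    query = (
        "SELECT borough, COUNT(*) FROM pothole "
        "GROUP BY borough ORDER BY borough"
    )
    return conn.execute(query).fetchall()
```

An application would open a connection with sqlite3.connect, call stage whenever it refreshes its copy of the data, and serve queries such as reports_per_borough from the local copy. That local copy is exactly the kind of use a license would need to permit.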

Beyond a permissive regime around the availability of data, the City will also need to allow companies to make a market, and to charge, for the value-added services they build on top of the public platform. Certainly, companies should not be charging for the mere redistribution of the data, but they should be permitted -- indeed encouraged -- to build user-friendly front-ends, interesting "mashups," innovative analyses, and inventive integrations of the platform's data.

Google Maps should be able to show where the big potholes are; Zagat should be able to indicate which restaurants have a sterling Health Department inspection record; WebMD should be able to create heatmaps showing which neighborhoods are hardest hit by an epidemic, and the New York Times ought to be able to indicate which boroughs and neighborhoods are getting the most, or least, arts funding.

Retail consultancies should be able to show which precincts are best and worst served by certain types of shops. Tourists should be able to see where the cheapest hotel rooms are and where the most availability exists. Members of this Committee should be able to see how well Verizon is living up to its commitment to deploy FiOS service to all areas of all five boroughs.

Children's Aid Society should be able to illustrate where concentrations of child homelessness and abuse exist. Food for Survival should be able to show which ethnic, geographic, economic and age groups are most susceptible to hunger. And none of these organizations should have to stop and wonder whether they are using or republishing the data in some unauthorized form.

On Being Open
Back to the technical now and, to an extent, the political. Consider carefully your use of the word "open" in the title of this legislation. I think everyone can agree that all data and infrastructure under this initiative should be useable from virtually any platform, programming language and type of device. If that's what is meant by the word "open" in Intro 991's language, then all is well. But if "open" is somehow meant to connote a requirement that Open Source technologies be used to serve or consume the data, or that any software that does so be required to comply with GPL or other Open Source licensing, then we will have a huge problem on our hands.

Rather, the City and its agencies should be permitted to implement the back-end platform as they see fit, whether they do so using Java, PHP, Ruby, C#, Visual Basic or COBOL. I would imagine that agency implementations would need to be signed off on or certified by DoITT, but as long as they produce their output and solicit their input using the correct formats, standards, protocols and interfaces, that should meet whatever litmus test may exist.

I'd like to close on an issue of civic pride. The City of New York is a unique municipal government within the State. Most cities are contained within counties. The City of New York, as you well know, comprises five counties, and provides the services that in other parts of the State are delivered by special districts, incorporated villages, towns, cities and counties. As such, our data standards system should serve as a model for each of these distinct types of government within New York State. So let's not just do this the right way. Let's do this in an unprecedentedly exemplary, creative and exciting way. Let's make this the time in history when the economy was down, but the great tradition of commerce and ingenuity in the City of New York was nonetheless invoked to bring about innovation, opportunity and a new standard in good government, adopted by other governments in New York, and other states.

Thank you again for the generosity of your time, attention and consideration. And, once again, good afternoon.

Posted by Andrew J. Brust on 06/29/2009
