Brief remarks on open data and the library as creative space

On October 23, 2014, I was honored to be part of a great panel discussion on the subject of open data as part of the University of North Carolina’s Open Access Week celebration. I want to thank Anne Gilliland, Scholarly Communications Officer at UNC Libraries, for organizing and David Hansen, Clinical Assistant Professor & Reference Librarian at UNC Law, for moderating.

At the beginning of the panel, each of us explained how open data impacts our fields. Below are my brief remarks:


I’m the digital scholarship librarian with a particular focus on digital humanities. I help UNC faculty and students incorporate technology into their research and teaching by connecting them to library resources.

So, for the purposes of this event, I want to unpack “library resources.”

The space we are in right now, the Research Hub, represents an emerging path for research libraries. It is a recognition that libraries are creative places where work is produced, not just consumed. In that sense, it fits in with this narrative that the research library is transitioning from a collection-based mission to a production-based mission.

However, it is not as if production and collection are unconnected. Our researchers use the collection to produce their work and it has always been so. Scholars are always standing on the shoulders of giants and the research lifecycle is a feedback loop where reincarnation is very real. This is as true in the humanities as it is in the sciences.

And our behavior as a library reflects that. We continue to build our collection and it is without a doubt impressive. Increasingly, though, that collection is digital. We buy – or sometimes rent – ebooks, we purchase thematic digital collections and we complement all of this by digitizing our own special collections to make them easier to use.

These collections do a reasonably good job of recreating the analog reading experience and add value in the form of full-text searching and not having to schlep hundreds of books around.

However, we are hearing from scholars who want to use emerging technology to do things like text analysis and data mining on our collection. This could involve techniques like topic modeling, sentiment analysis or named-entity recognition across millions of pages of text. Scholars are looking for patterns across decades of newspapers, entire national libraries and hours of oral history transcripts.

Unfortunately, many of our digital collections are simply not designed to be studied this way.

So I spend time thinking about text-as-data. When vendors come to visit, it has been my job to ask about how our users might use their collection. The answers are always unsatisfactory and I know the issues have more to do with how to monetize the data than with simply making it available. I know this isn’t technologically difficult because, over the summer, our overworked and understaffed tech team managed to open up four of the library’s most popular collections from Documenting the American South, making it astonishingly easy to analyze them with common tools.

As a librarian, my job is to help our users find the information they need and, I might add, in the most useful format. I am very excited about the open data movement and the open access movement to which it is connected, at least ideologically, and I’m looking forward to seeing how this discussion unfolds at UNC today and in the months to come.