Linked Data for Libraries, Archives and Museums: How to Clean, Link and Publish your Metadata

Libraries, archives and museums are facing up to the challenge of providing access to fast-growing collections while managing constrained budgets. Key to this is the creation, linking, and publishing of good-quality metadata, which allows library collections to be discovered, accessed and disseminated in a sustainable manner. In this handbook, metadata experts van Hooland and Verborgh introduce the core concepts of metadata standards and Linked Data, and show how they can be applied to existing metadata. Giving readers the tools and understanding to achieve maximum results with limited resources, this book covers such crucial topics as:
The value of metadata
Metadata creation, including architecture, data models, and standards
Metadata cleaning
Metadata reconciliation
Metadata enrichment through Linked Data and named-entity recognition
Importing and exporting metadata
Ensuring a sustainable publishing model
This handbook delivers the necessary conceptual and practical understanding to empower institutions to make the right decisions when making their resources accessible on the Web.

224 pages, Paperback

First published June 19, 2014

About the author

Seth van Hooland

3 books, 1 follower

Community Reviews

5 stars: 7 (18%)
4 stars: 14 (36%)
3 stars: 17 (44%)
2 stars: 0 (0%)
1 star: 0 (0%)
Displaying 1 - 5 of 5 reviews
Simon Mcleish
Author, 2 books, 141 followers
November 21, 2014
First posted on my blog here.

Linked Data has been a buzzword for a couple of years now, and it is a topic that consistently comes up in the questions I receive at work. This is usually due to someone reading an advocacy post somewhere, which leads them to ask "Could we fix all our resource discovery problems by releasing our metadata in linked data form?" So this is a timely publication, even though much of it is just backing up the reasons I have for being cautious about advocating such an approach myself.

The authors clearly aim Linked Data for... at a non-technical librarian. I am an IT professional who has worked in a library for most of my working life, so not their prime audience. Van Hooland and Verborgh are clearly working hard to deliver their material in the way which is most suited to their target group. So, for instance, while there is some description of how different approaches to metadata work (tabular, relational, structured, linked), this is really only enough to support a discussion of the pros and cons of the different methods. It is clear that this is not a book which advocates Linked Data for the sake of it, but one which wants to make it possible for readers to evaluate for themselves whether it is a good approach for a particular metadata collection. This is a refreshingly mature approach, as unthinking blanket promotion of the current buzzword technologies (from "Internet" and "XML" to, more recently, "social" and "cloud") is one of the main reasons why people grow disillusioned, as the race to move to the new paradigm is run whether or not it is appropriate in an individual case.

The fast-moving nature of the linked data community means, unfortunately, that parts of the book are already obsolete. Some of the case studies and useful websites discussed already lead to blank pages or errors, or to material which differs from the description in the text. This is inevitable in a book on this topic, but does reduce the usefulness and impact of the book.

Much of the book resonated strongly with my experience while working on projects considering using/producing linked data or actually creating it. It is clear that understanding and improving the metadata involved is absolutely key to a successful release of a linked data version of an existing data set, and so the main chapters are successively concerned with cleaning, reconciling, enriching, and publishing metadata. While linked data is the motivating factor of the discussion, much of it is likely to be of interest to any data set manager who is looking to improve the metadata they hold. Each chapter is accompanied by a real-world case study, which is useful as a pointer to how the more theoretical ideas can be implemented in a specific scenario. I did feel that some of the discussion which revolves around the use of specific software (for such tasks as enriching metadata) was perhaps too tied to something which is unlikely to remain a constant, but in general the case studies are an excellent part of the book. In a few years, the software discussed may no longer be available, may have changed name, or (most likely) may have been changed and updated so that the discussion of it is not applicable any more. Any book which extensively references work under development or online has this problem, of course, so this is not a criticism specific to this book.

Having an IT rather than librarianship background, I did find that some things which I was already familiar with were treated in more detail than I needed, especially the slightly heavy-handed advocacy of REST as an API architecture in the final section on the publication of data. I suspect I would have felt this even if I wasn't already familiar with REST, so this was a rare instance of the authors of the book not getting the level of their discussion correct. This is such an overwhelming part of the publishing section that other issues which may be important (such as infrastructure requirements and the use of analytics) are basically ignored, which seemed to me to make this section less valuable. (The importance which the authors give publishing is perhaps indicated by the 44 pages they give it, as opposed to the more than 180 which are used to discuss the metadata aspects of linked data.)

Did the book help me to answer the questions that people throw at me? Probably not. But it does confirm the caveats I have, which include data quality, the paucity of existing links inside the data, and the need to enrich data before publishing. The key question is why it is worth exposing a particular dataset, and the answer to this question must be to do with the value of use cases for the data and not because it's something everyone is doing. A good introduction for librarians, if falling a bit short in the final stages.
Joshua
37 reviews
August 22, 2014
A nice overview, especially for non-technical readers. The case studies at the end of each chapter offer very good walkthroughs using tools like OpenRefine and SPARQL endpoints, and they do so with real-world datasets, which is an excellent change from most technical books that only offer dumbed-down toy datasets in their examples.
dejah_thoris
1,350 reviews, 23 followers
August 16, 2016
Lots of good discussion and well thought-out case studies. The latter took a while to work through, which is why it took me so long to "read" the book. The most useful part was learning how to clean metadata using OpenRefine. There were a few typos and some of the links no longer work (like those to the custom APIs for Europeana and DPLA), which can be frustrating. I found my inability to work on some of the larger data sets more so. (My work machine is a new laptop. I followed the filtering advice, but it couldn't handle working with the recommended 3% of the Powerhouse Museum data.) Fortunately, you can follow along with the authors very well, but it would be nice to have smaller data sets available as an alternative for some of the case studies.
Paul
106 reviews, 10 followers
August 6, 2015
What a fantastic introduction to the topic. Now to read it again to see if it sinks in!