Linked Data Basic Profile 1.0 - Use Cases and Requirements

This document describes the motivation, scope, use cases, and requirements for a set of best practices and a simple approach for a Linked Data architecture, based on HTTP access to web resources that describe their state using RDF.

Status of This Document

1. Scope and Motivation

Linked Data was defined by Tim Berners-Lee with the following four rules [4Rules]:

1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).

4. Include links to other URIs, so that they can discover more things.

These four rules have proven very effective in guiding and inspiring people to publish Linked Data on the web. The amount of data, especially public data, available on the web has grown rapidly, and as a result an impressive number of extremely creative and useful “mashups” have been created using this data.

There has been much less focus on the potential of Linked Data as a model for managing data on the web. The majority of the Application Programming Interfaces (APIs) available on the Internet for creating and updating data follow a Remote Procedure Call (RPC) model rather than a Linked Data model.

If Linked Data were just another model for doing something that RPC models can already do, it would be of only marginal interest. Interest in Linked Data arises from the fact that applications with an interface defined using Linked Data can be much more easily and seamlessly integrated with each other than applications that offer an RPC interface. In many problem domains, the most important problems and the greatest value are found not in the implementation of new applications, but in the successful integration of multiple applications into larger systems.

Some of the features that make Linked Data exceptionally well suited for integration include:

A single interface – defined by the HTTP methods – that is universally understood and is constant across all applications. This is in contrast with the RPC architecture where each application has a unique interface that has to be learned and coded to.
A universal addressing scheme – provided by HTTP URLs – for both identifying and accessing all “entities”. This is in contrast with the RPC architecture where there is no uniform way to either identify or access data.
A simple yet extensible data model – provided by RDF – for describing data about a resource in a way that doesn’t require prior knowledge of the vocabulary being used.
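The third feature can be illustrated with a minimal sketch, assuming a resource's state is held as plain (subject, predicate, object) triples. The example.org URLs and the `describe` helper are hypothetical and exist only for illustration; the FOAF property URIs are real, but nothing in this document mandates them. The point is that generic code can list everything known about a resource, including an application-specific property it has never seen before.

```python
# Hypothetical data: a resource's state as (subject, predicate, object) triples.
CONTACT = "http://example.org/contacts/alice"  # assumed URL, illustration only

triples = [
    (CONTACT, "http://xmlns.com/foaf/0.1/name", "Alice"),
    (CONTACT, "http://xmlns.com/foaf/0.1/mbox", "mailto:alice@example.org"),
    # Application-specific extension from a made-up vocabulary.
    (CONTACT, "http://example.org/crm#accountManager", "http://example.org/people/carol"),
]


def describe(subject, graph):
    """Group all property values of `subject` by predicate, whatever the vocabulary."""
    description = {}
    for s, p, o in graph:
        if s == subject:
            description.setdefault(p, []).append(o)
    return description


if __name__ == "__main__":
    # No predicate is hard-coded below: new vocabularies need no code change.
    for predicate, values in describe(CONTACT, triples).items():
        print(predicate, "->", values)
```

A real implementation would more likely use an RDF library and parse a serialization such as Turtle; the list of tuples here only keeps the sketch self-contained.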

Experience implementing applications and integrating them using Linked Data has shown very promising results, but has also demonstrated that the original four rules defined by Tim Berners-Lee for Linked Data are not sufficient to guide and constrain a writable Linked Data API. As was the case with the original four rules, the need generally is not for the invention of fundamental new technologies, but rather for a series of additional rules and patterns that guide and constrain the use of existing technologies in the construction of a Basic Profile for Linked Data to achieve interoperability.

The following list illustrates a few of the issues that require additional rules and patterns:

What URLs do I post to in order to create new resources?
How do I get lists of existing resources, and how do I get basic information about them without having to access each one?
How should I detect and deal with race conditions on write?
What media-types/representations should I use?
What standard vocabularies should I use?
What primitive data types should I use?
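To make the race-condition question concrete, the following is a minimal sketch of one common HTTP technique: optimistic concurrency using entity tags, where an update carries an `If-Match` header and the server answers 412 Precondition Failed if the resource has changed since it was read. This document does not prescribe that technique. The `ResourceStore` class, its method names, and the example.org URL are hypothetical, and an in-memory dictionary stands in for a real HTTP server.

```python
import hashlib


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 Precondition Failed response."""


class ResourceStore:
    """Hypothetical in-memory server keeping one representation per URL."""

    def __init__(self):
        self._bodies = {}

    @staticmethod
    def _etag(body):
        # Derive the entity tag from the content, so any change yields a new tag.
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def put_new(self, url, body):
        """Create a resource (unconditional write) and return its ETag."""
        self._bodies[url] = body
        return self._etag(body)

    def get(self, url):
        """Return (body, etag), as a GET response with an ETag header would."""
        body = self._bodies[url]
        return body, self._etag(body)

    def put(self, url, body, if_match):
        """Update only if the caller's ETag still matches the stored state."""
        _, current = self.get(url)
        if if_match != current:
            raise PreconditionFailed(url)
        self._bodies[url] = body
        return self._etag(body)


if __name__ == "__main__":
    store = ResourceStore()
    url = "http://example.org/contacts/alice"  # assumed URL, illustration only
    store.put_new(url, "phone: 555-0100")
    _, tag_a = store.get(url)  # client A reads
    _, tag_b = store.get(url)  # client B reads the same state
    store.put(url, "phone: 555-0199", if_match=tag_a)  # A's update succeeds
    try:
        store.put(url, "phone: 555-0123", if_match=tag_b)  # B's tag is now stale
    except PreconditionFailed:
        print("stale write rejected")
```

The client whose write is rejected would typically re-read the resource, reapply its change, and retry, so neither update silently overwrites the other.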

A good goal for the Basic Profile for Linked Data would be a specification that makes it possible to define writable Linked Data APIs equivalent to the simple application APIs that are often written on the web today using the Atom Publishing Protocol (APP). APP shares some characteristics with Linked Data, such as the use of HTTP and URLs. One difference is that Linked Data relies on RDF, a flexible data model that allows for multiple representations.

2. Use Cases

This section collects a limited number of high-level use cases to illustrate the need for a Basic Profile for Linked Data, both for data consumption and for the creation and modification of data. The W3C maintains a repository of Semantic Web Case Studies and Use Cases [SEMWEB-UC]. Some of that material may apply to the Use Cases for the Basic Profile for Linked Data and will be called out in the following sections where appropriate.

2.1 Maintaining Social Contact Information

Many of us have multiple email accounts that include information about the people and organizations we interact with – names, email addresses, telephone numbers, instant messenger identities and so on. When someone’s email address or telephone number changes (or they acquire a new one), our lives would be much simpler if we could update that information in one place and have all copies of it updated automatically. In other words, those copies would all be linked to some definition of “the contact.” There might also be good reasons (like off-line email addressing) to maintain a local copy of the contact, but ideally any copies would still be linked to some central “master.”

Agreeing on a format for “the contact” is not enough, however. Even if all our email providers agreed on the format of a contact, we would still need to use each provider’s custom interface to update or replace the provider’s copy, or we would have to agree on a way for each email provider to link to the “master”. If we look outside our own personal interests, it would be even more useful if the person or organization exposed their own contact information so we could link to it.

What would work in either case is a common understanding of the resource, a small number of agreed formats, and access guidance for these resources. This would cover how to acquire a link to a contact and how to use such links to interact with the contact (including reading, updating, and deleting it), as well as how to easily create a new contact and add it to my contacts, and how a deleted contact is removed from my list of contacts. It would also be good to be able to add application-specific data about my contacts that the original design didn’t consider. Ideally we’d like to eliminate multiple copies of contacts. Additional valuable information about my contacts may be stored on separate servers, so a simple way to link this information back to the contacts is needed. Regardless of whether a contact collection is my own, shared by an organization, or all contacts known to an email provider (or to a single email account at an email provider), it would be nice if they all worked in much the same way.
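The difference between copying a contact and linking to it can be sketched in a few lines. This is only an illustration under stated assumptions: the `master` dictionary stands in for the server that owns the contact, `lookup` stands in for an HTTP GET on the contact's URL, and the example.org URL and field names are made up.

```python
# Hypothetical "master" store: one authoritative description per contact URL.
BOB = "http://example.org/people/bob"  # assumed URL, illustration only

master = {
    BOB: {"email": "bob@old.example.org"},
}

# Two address books that hold links (URLs) rather than copies of the data.
work_contacts = [BOB]
home_contacts = [BOB]


def lookup(url):
    """Dereference a link; stands in for an HTTP GET on the contact's URL."""
    return master[url]


def emails(address_book):
    """Resolve every link in an address book to the contact's current email."""
    return [lookup(url)["email"] for url in address_book]


# A single update at the master is visible through every link to it.
master[BOB]["email"] = "bob@new.example.org"
```

After the update, `emails(work_contacts)` and `emails(home_contacts)` both resolve to the new address, with no per-provider copy to keep in sync. A locally cached copy, as mentioned earlier for off-line use, would still need a way to refresh from the master.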

2.2 Keeping Track of Personal and Business Relationships

In our daily lives, we deal with many different organizations in many different relationships, and they each have data about us. However, it is unlikely that any one organization has all the information about us. Each of them typically gives us access to at least some of that information, many through websites where we are uniquely identified by some string – an account number, user ID, and so on. We have to use their applications to interact with the data about us, however, and we have to use their identifier(s) for us. If we want to build any semblance of a holistic picture of ourselves (more accurately, collect all the data about us that they externalize), we as humans must use their custom applications to find the data, copy it, and organize it to suit our needs.

Would it not be simpler if at least the Web-addressable portion of that data could be linked to consistently, so that instead of maintaining various identifiers in different formats and instead of having to manually supply those identifiers to each one’s corresponding custom application, we could essentially build a set of bookmarks to it all? When we want to examine or change their contents, would it not be simpler if there were a single consistent application interface that they all supported? Of course it would.

Our set of links would probably be a simple collection. The information held by any single organization might be a mix of simple data and collections of other data, for example, a bank account balance and a collection of historical transactions. Our bank might easily have a collection of accounts for each member of its collection of customers.

2.3 System and Software Development Tool Integration

System and software development tools typically come from a diverse set of vendors and are built on various architectures and technologies. These tools are purpose-built to meet the needs of a specific domain scenario (modeling, design, requirements, and so on). Tool vendors often view integration with other tools as a necessary evil rather than as a source of additional value for their end users. Even more of an afterthought is how these tools’ data (such as people, projects, and customer-reported problems and needs) integrates with and relates to corporate and external applications that manage data such as customers, business priorities, and market trends. The problem can be mitigated by standardizing on a small set of tools or on a set of tools from a single vendor, but this rarely happens, and when it does, it is usually only within small organizations. As these organizations grow in both size and complexity, they need to work with outsourced development and with other internal organizations that have their own tools and processes. There is a need for better support of more complete business processes (system and software development processes) that span the roles, tasks, and data addressed by multiple tools. This demand has existed for many years, and the tool vendor industry has tried several different architectural approaches to address the problem. Here are a few:

Implement an API for each application, and then, in each application, implement “glue code” that exploits the APIs of other applications to link them together.
Design a single database to store the data of multiple applications, and implement each of the applications against this database. In the software development tools business, these databases are often called “repositories.”
Implement a central “hub” or “bus” that orchestrates the broader business process by exploiting the APIs described previously.

It is fair to say that although each of those approaches has its adherents and can point to some successes, none of them is wholly satisfactory. The use of Linked Data as an application integration technology has a strong appeal [OSLC].

2.4 Library Linked Data

The W3C Library Linked Data working group cites a number of use cases in its Use Case Report [LLD-UC]. These use cases focus on the need to extract and correlate library data from disparate sources. Variants of these use cases that provide consistent formats, as well as ways to improve or update the data, would simplify both sharing this data efficiently and producing incremental updates, without the need for repeated full extraction and import of data.

2.5 Municipality Operational Monitoring

Cities, towns, counties, and other municipalities manage and run a growing number of services that produce and consume a vast amount of information. This information is used to help monitor services, predict problems, and handle logistics. To collect, produce, and analyze all this data effectively and efficiently, a fundamental set of loosely coupled standard data sources is needed. A simple, low-cost way to expose data from the diverse set of monitored services is also needed, one that integrates easily with the municipalities’ other systems that inspect and analyze the data. All these services have links to and dependencies on other data and services, so a simple and scalable linking model is key.

2.6 Healthcare

To analyze, diagnose, and propose treatment for patients, physicians require a vast amount of complex, changing, and growing knowledge. This knowledge needs to come from a number of sources, including physicians’ own subject knowledge, consultation with their network of other healthcare professionals, public health sources, food and drug regulators, and other repositories of medical research and recommendations.

Diagnosing a patient’s condition requires current data on the patient’s medications and medical history. In addition, recent pharmaceutical advisories about these medications are linked to the patient’s data. If the patient experiences adverse effects from medications, physicians need to publish information about this to an appropriate regulatory source. Other medical professionals require access to both validated and emerging effects of the medication. Similarly, if geographical patterns around outbreaks reveal new symptoms and treatments, this information needs to quickly reach a widely distributed and diverse set of medical information systems. Reporting new occurrences of an outbreak back to the regulatory agencies, including additional details of symptoms and causes, is also critical in producing the most effective treatment for future incidents.

3. Requirements

4. References

4Rules: http://www.w3.org/DesignIssues/LinkedData.html
Dublin Core: Dublin Core Metadata Initiative Terms, DCMI Recommendation, 11 October 2010. This version is http://dublincore.org/documents/2010/10/11/dcmi-terms/. The latest version is http://dublincore.org/documents/dcmi-terms/.
DC-RDF: Expressing Dublin Core metadata using the Resource Description Framework (RDF), M. Nilsson et al., 14 January 2008, http://dublincore.org/documents/2008/01/14/dc-rdf/. Latest version available at http://dublincore.org/documents/dc-rdf/.
GLD: Government Linked Data, W3C, 9 February 2012, http://www.w3.org/2011/gld/wiki/Main_Page
LinkedData: Linked Data at W3C http://www.w3.org/standards/semanticweb/data
LLD-UC: W3C Library Linked Data Use Case Report http://www.w3.org/2005/Incubator/lld/wiki/UseCaseReport
RDF: Resource Description Framework (RDF) http://www.w3.org/TR/rdf-concepts/
RDF-MT: RDF Semantics, P. Hayes, Editor, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-mt-20040210/.
RDF-REST: RDF Simple Data Interface Protocol – Level Zero, Latest version available at: http://www.w3.org/2001/sw/wiki/index.php?title=REST
REST: Representational State Transfer (REST), R. Fielding, Ph.D. dissertation, 2000.
IANA: Internet Assigned Numbers Authority (IANA) MIME Media Types http://www.iana.org/assignments/media-types/index.html.
RFC2616: Hypertext Transfer Protocol - HTTP/1.1, R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, June 1999. Available at http://www.ietf.org/rfc/rfc2616.txt.
RFC3986: Uniform Resource Identifier (URI): Generic Syntax, Berners-Lee, Fielding, Masinter, January 2005.
RFC3987: Internationalized Resource Identifiers (IRIs), Duerst, Suignard, January 2005.
SEMWEB-UC: Semantic Web Case Studies and Use Cases, W3C, 9 February 2012.
SPARQL: SPARQL Query Language for RDF, E. Prud'hommeaux, A. Seaborne, Editors, W3C Recommendation, 15 January 2008, http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/. Latest version available at http://www.w3.org/TR/rdf-sparql-query/.
SPARQL-HTTP: SPARQL 1.1 Graph Store HTTP Protocol, Latest version available at http://www.w3.org/2009/sparql/docs/http-rdf-update/
WEBARCH: Architecture of the World Wide Web, Volume One, N. Walsh, I. Jacobs, Editors, W3C Recommendation, 15 December 2004, http://www.w3.org/TR/2004/REC-webarch-20041215/. Latest version available at http://www.w3.org/TR/webarch/.

W3C Member Submission 20 March 2012
