DEPRECATED and inactive proposal, NOT recommended for implementation

Tracked Resource Set Specification

Draft 3 of May 13, 2011

STATUS: DEPRECATED and inactive proposal, NOT recommended for implementation

The Tracked Resource Set protocol allows a server to expose a set of resources in a way that allows clients to discover the exact set of resources in the set, to track all additions to and removals from the set, and to track state changes to all resources in the set. The protocol does not assume that clients will dereference the resources. The protocol is suitable for dealing with large sets containing a large number of resources, as well as highly active resource sets that undergo continual change. The protocol is HTTP-based and follows RESTful principles.

Terminology

  • Resource Set - an enumerable, finite collection of Resources
  • Resource - web resource identified by URI; the Resource Set members
  • Server - party playing the role of Resource Set provider
  • Client - party playing the role of consumer; interacts with a Server to enumerate and track Resources in the Server's Resource Set
  • Tracked Resource Set - describes the set of Resources in a Resource Set, expressed as a Base and a Change Log
  • Base - portion of a Tracked Resource Set representation that lists member Resources
  • Change Log - portion of a Tracked Resource Set representation detailing a series of Change Events
  • Change Event - describes the addition, removal, or state change of a member Resource

Overview

The Server maintains a Resource Set. A Resource Set consists of a finite, enumerable set of Resources. Each Resource is identified by a URI. The Server will have its own well-defined criteria for determining the exact set of member Resources at any point in time. However, clients need not be aware of the Server's criteria, and will instead discover a Resource Set's members by interacting with the Server using the Tracked Resource Set protocol.

The Server MUST provide an HTTP(S) URI corresponding to its Resource Set. This is referred to as the Tracked Resource Set URI. (Mechanisms for discovering Tracked Resource Set URIs are outside the scope of the Tracked Resource Set specification.)

A GET request sent to the Tracked Resource Set URI returns a representation of the state of the Resource Set. A Tracked Resource Set representation characterizes the Resource Set in terms of a Base and a Change Log: the Base provides an initial approximation of the membership of the Resource Set, and the Change Log provides a time series of adjustments describing changes to members of the Resource Set. When the Base is empty, the Change Log describes a history of how the Resource Set has grown and evolved since its inception. When the Change Log is empty, the Base is an ahistorical enumeration of the Resources in the Resource Set. This hybrid base+delta form gives the Server flexibility to structure the representation in ways that are most useful to its Clients.

The Base portion of a Tracked Resource Set representation is represented as an RDF container where each member references a Resource that was in the Resource Set at the time the Base was computed. The Change Log portion is represented as an RDF collection, where the entries correspond to Change Events arranged in reverse chronological order. A “cutoff” property of the Base identifies the most recent Change Event that is already covered by the Base portion. There must not be a gap between the Base portion and the Change Log portion of a Tracked Resource Set representation; however, the Change Log portion may contain earlier Change Event entries that would be accounted for by the Base portion.

Tracked Resource Set

An HTTP GET on a Tracked Resource Set URI returns a representation structured as follows (note: for exposition, the example snippets show the RDF information content using Turtle; the actual representation of these resources “on the wire” is ordinarily RDF/XML):

@prefix oslc_trs: <http://open-services.net/ns/core/trs#> .

<https://.../myTrackedResourceSet>
  a oslc_trs:TrackedResourceSet ;
  oslc_trs:base <https://.../myResources> ;
  oslc_trs:changeLog [
    a oslc_trs:ChangeLog ; 
    oslc_trs:changes ( ... )
  ] .

A Tracked Resource Set MUST provide references to the Base and Change Log using the oslc_trs:base and oslc_trs:changeLog predicates respectively. The Change Log MUST be a local reference, allowing Servers to include the most recent Change Events as part of the Tracked Resource Set’s HTTP response. The Base portion is always in the form of an external reference (i.e., a resource URI), which requires another HTTP GET to access but is generally only of interest to a Client during initialization.

The Server SHOULD support ETags, caching, and conditional GETs for Tracked Resource Set resources.
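
A Client can take advantage of this by remembering the ETag from its last poll and sending it back on the next one. The following is a minimal sketch using Python's standard urllib; the function name build_conditional_get, the URI https://example.org/trs, and the ETag value are hypothetical and not part of this specification.

```python
import urllib.request


def build_conditional_get(trs_uri, etag=None):
    """Build a GET for a Tracked Resource Set; conditional when an ETag is known."""
    # RDF/XML is the representation every implementation must support.
    headers = {"Accept": "application/rdf+xml"}
    if etag is not None:
        # Ask the Server to answer 304 Not Modified if nothing has changed.
        headers["If-None-Match"] = etag
    return urllib.request.Request(trs_uri, headers=headers, method="GET")


# Hypothetical URI and ETag, for illustration only.
request = build_conditional_get("https://example.org/trs", etag='"v42"')
```

When such a request is passed to urllib.request.urlopen, a 304 response typically surfaces as an urllib.error.HTTPError with code 304, which the Client can treat as "no new Change Events".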

Change Log

A Change Log provides a list of changes organized in inverse chronological order, most recent first. The following example illustrates the contents of a Change Log:

@prefix oslc_trs: <http://open-services.net/ns/core/trs#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

_:myChangeLog 
  a oslc_trs:ChangeLog ; 
  oslc_trs:changes (
    [ a oslc_trs:Creation ;
      oslc_trs:changed <https://.../WorkItem/23> ;
      oslc_trs:order "103"^^xsd:integer ;
      dcterms:identifier "2010-10-27T17:39:33.000Z#103"
    ]
    [ a oslc_trs:Modification ;
      oslc_trs:changed <https://.../WorkItem/22> ;
      oslc_trs:order "102"^^xsd:integer ;
      dcterms:identifier "2010-10-27T17:39:32.000Z#102"
    ]
    [ a oslc_trs:Deletion ;
      oslc_trs:changed <https://.../WorkItem/21> ;
      oslc_trs:order "101"^^xsd:integer ;
      dcterms:identifier "2010-10-27T17:39:31.000Z#101"
   ]) .

As shown, a Change Log provides a set of Change Event entries in a single-valued RDF collection-type property called oslc_trs:changes. An RDF collection, i.e., a linked list (reference: RDF Collections), is used in the Change Log to ensure that the entries retain their correct (inverse chronological) order.

Each Change Event has a unique identifier, dcterms:identifier, as well as a sequence number, oslc_trs:order; sequence numbers are non-negative integer values that increase over time. A Change Event entry carries the URI of the changed Resource, oslc_trs:changed, and an indication (i.e., via rdf:type) of whether the Resource was added to the Resource Set, removed from the Resource Set, or changed state while a member of the Resource Set. The first entry in the Change Log, i.e., "103" in this example, is the most recent change. As changes continue to occur, a Server MUST add new Change Events to the front of the list. The sequence number (i.e., oslc_trs:order) of newer entries MUST be greater than previous ones. The sequence numbers MAY be consecutive numbers but need not be.
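
The ordering rule can be checked mechanically. The sketch below assumes a Client has already extracted the oslc_trs:order values from a Change Log in list order (most recent first); the function name is_well_ordered is hypothetical.

```python
def is_well_ordered(orders):
    """Return True if sequence numbers are non-negative and strictly decreasing.

    `orders` holds the oslc_trs:order values in Change Log order, newest first.
    Gaps between consecutive values are allowed.
    """
    if any(order < 0 for order in orders):
        return False
    # Each entry must carry a larger sequence number than the entry after it.
    return all(newer > older for newer, older in zip(orders, orders[1:]))
```

For the example above, is_well_ordered([103, 102, 101]) holds, and so does a log with gaps such as [110, 105, 101]; a repeated number such as [103, 103, 101] does not.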

Note that the actual time of change is not included in a Change Event. Only a sequence number, representing the "sequence in time" of each change, and a unique identifier are provided. The identifier of a Change Event MUST be guaranteed unique, even in the wake of a Server roll back. A time stamp MAY be used to generate such an identifier, as in the above example, although other ways of generating a unique value are also possible.
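
One way a Server might build identifiers in the style of the example is to join a UTC time stamp and the sequence number. This is only a sketch: the function name make_event_identifier is hypothetical, the input is assumed to be a timezone-aware datetime, and uniqueness after a roll back relies on the Server's clock not repeating a time stamp for the same sequence number.

```python
from datetime import datetime, timezone


def make_event_identifier(when, order):
    """Format an identifier such as '2010-10-27T17:39:33.000Z#103'.

    `when` is assumed to be timezone-aware; it is converted to UTC first.
    """
    utc = when.astimezone(timezone.utc)
    stamp = utc.strftime("%Y-%m-%dT%H:%M:%S")
    millis = utc.microsecond // 1000  # keep millisecond precision only
    return f"{stamp}.{millis:03d}Z#{order}"
```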

A Change Log represents a series of changes to its corresponding Resource Set over some period of time. The Change Log MUST contain Change Events for every Resource creation, deletion, and modification during that period. A Server MUST report a modification event for a Resource if a GET on that Resource would return a response significantly different from the one it returned before the change. For a resource with RDF content, a modification is anything that would affect the set of RDF triples in a significant way. Unlike creations and deletions, a Server MAY safely report a modification event even in cases where there would be no significant difference in response.

The Server SHOULD NOT report unnecessary Change Events. A Client SHOULD ignore a creation or modification event for a Resource that is already a member of the Resource Set, and SHOULD ignore a deletion or modification event for a Resource that is not a member of the Resource Set.
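
For a Client that tracks membership only, tolerating unnecessary events could look like the following sketch. The ChangeEvent class, the apply_event function, and the use of a plain Python set of URIs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    kind: str     # "Creation", "Modification", or "Deletion"
    changed: str  # URI of the changed Resource


def apply_event(members, event):
    """Apply one Change Event; return True only if membership actually changed."""
    if event.kind == "Creation" and event.changed not in members:
        members.add(event.changed)
        return True
    if event.kind == "Deletion" and event.changed in members:
        members.remove(event.changed)
        return True
    # Redundant creations and deletions, and every modification, leave the
    # membership as it is.
    return False
```

Because redundant events are silently skipped, replaying the same Change Event twice leaves the local set unchanged.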

Change Log Segmentation

The Change Log in the previous example consisted of a single oslc_trs:ChangeLog resource. Typically, however, the Change Log will be very large, requiring the changes to be segmented into multiple smaller oslc_trs:ChangeLog resources:

@prefix oslc_trs: <http://open-services.net/ns/core/trs#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

_:myChangeLog
  a oslc_trs:ChangeLog ; 
  oslc_trs:changes ( ... ) ;
  oslc_trs:previous <https://.../myChangeLog/1> .

As shown, the oslc_trs:previous reference is used in this case to connect to the Change Log resource containing the next group of chronologically earlier Change Events. The most recent Change Events SHOULD be included in the Tracked Resource Set itself. This allows a Client to easily discover the most recent Change Event, and retrieve successively older Change Log resources until it encounters a Change Event that has already been processed (on a previous check). The protocol does not attach significance to where a Server breaks the Change Log into separate parts, i.e., the number of entries in an oslc_trs:ChangeLog is entirely up to the Server.
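
The backward walk through the segments can be sketched as follows. The representation is an assumption: each segment is a dict with a "changes" list (newest first) and an optional "previous" URI, and fetch_segment is a hypothetical callable that returns the segment for a URI, or None when the Server no longer has it.

```python
def events_since(first_segment, fetch_segment, sync_identifier):
    """Collect the events newer than the sync point, newest first.

    Returns None when the chain ends before the sync point is found.
    """
    newer = []
    segment = first_segment
    while segment is not None:
        for event in segment["changes"]:
            if event["identifier"] == sync_identifier:
                return newer  # everything gathered so far is unprocessed
            newer.append(event)
        previous = segment.get("previous")
        # Follow the oslc_trs:previous link, if there is one.
        segment = fetch_segment(previous) if previous else None
    return None
```

Stopping at the first already-processed event means a Client that polls often usually reads only the segment embedded in the Tracked Resource Set itself.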

Truncated Change Logs

A chain of Change Logs MAY continue all the way back to the inception of the Resource Set and contain Change Events for every change made since then. However, to avoid maintaining this ever-growing list of Change Logs indefinitely, a Server MAY truncate the log at a suitable point in the chain. This can be accomplished by removing the target of an oslc_trs:previous reference and/or removing the reference itself. In either case, Clients MUST be prepared to receive an HTTP 404 (Not Found) response when navigating the "previous" reference from a final or stale Change Log segment.

To ensure that a new Client can always get started, the Change Log MUST contain the base cutoff event of the corresponding Base, and all Change Events more recent than it. Thus the Server is only allowed to truncate Change Events older than the base cutoff event, because these duplicate information contained in the Base. When the Base has no base cutoff event (i.e., the Base enumerates the Resource Set at the start of time), the Change Log MUST contain all Change Events back to the start of time; i.e., no truncation is allowed.
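
The truncation rule can be expressed as a small helper that tells a Server which entries it has to keep. This is a sketch under assumed names: retained_events is hypothetical, and events are modeled as dicts with an "identifier" key, listed newest first.

```python
def retained_events(changes, cutoff_identifier):
    """Return the newest-first prefix of `changes` that must not be truncated.

    With no cutoff identifier the Base describes the start of time, so the
    whole Change Log has to be kept.
    """
    if cutoff_identifier is None:
        return list(changes)
    for index, event in enumerate(changes):
        if event["identifier"] == cutoff_identifier:
            # Keep the base cutoff event and everything more recent than it.
            return list(changes[: index + 1])
    raise ValueError("base cutoff event is missing from the Change Log")
```

Anything after the returned prefix duplicates information already in the Base and is safe to drop.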

To minimize the likelihood of Clients falling too far behind and losing information, it is highly RECOMMENDED that a Server retain a minimum of seven days' worth of Change Events.

Base Resources

The Base resources of a Tracked Resource Set are represented by an RDF container where each member references a Resource that was in the Resource Set at the time the Base was computed. HTTP GET on a Base URI returns an RDF container with the following structure:

@prefix oslc_trs: <http://open-services.net/ns/core/trs#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<https://.../myResources> 
  oslc_trs:cutoffIdentifier "2010-10-27T17:39:31.000Z#101" ;
  rdfs:member <https://.../WorkItem/1> ;
  rdfs:member <https://.../WorkItem/2> ;
  rdfs:member <https://.../WorkItem/3> ;
  ...
  rdfs:member <https://.../WorkItem/199> ;
  rdfs:member <https://.../WorkItem/200> .

Each Resource in the Resource Set MUST be referenced from the container using an rdfs:member predicate. The Base MAY be broken into multiple pages, in which case the standard OSLC paging (reference: Resource Paging) mechanism is used to connect one page to the next. (Note that an OSLC queryBase satisfies these requirements, which may allow a Server to use existing queryBases as Base containers, but there is no requirement that the Tracked Resource Set Base also be a queryBase.) The Tracked Resource Set protocol does not attach significance to the order in which a Server enumerates the resources in the Base or breaks the Base up into pages.

As shown above, a Base usually provides an oslc_trs:cutoffIdentifier property, whose value is the identifier (i.e., dcterms:identifier) of the most recent Change Event in the corresponding Change Log that is already reflected in the Base. This corresponds to the latest point in the Change Log from which a Client can begin incremental monitoring/updating if it wants to remain synchronized with further changes to the Resource Set. As mentioned above, the cutoff Change Event MUST appear in the non-truncated portion of the Change Log. When the oslc_trs:cutoffIdentifier is omitted, the Base enumerates the (possibly empty) Resource Set at the beginning of time.

The Base is only an approximation of the Resource Set. A Base might omit mention of a Resource that ought to have been included or include a Resource that ought to have been omitted. For each erroneously reported Resource in the Base, the Server MUST at some point include a corrective Change Event in the Change Log more recent than the base cutoff event. The corrective Change Event corrects the picture for that Resource, allowing the Client to compute the correct set of member Resources. A corrective Change Event might not appear in the Change Log that was retrieved when the Client dereferenced the Tracked Resource Set URI. The Client might only see a corrective Change Event when it processes the Change Log resource obtained by dereferencing the Tracked Resource Set URI on later occasions.

A Server MUST refer to a given resource using the exact same URI in the Base (rdfs:member reference) and every Change Event (oslc_trs:changed reference) for that resource.

Resources

This section defines the resources of the Tracked Resource Set specification. Implementations MUST support RDF/XML (i.e., application/rdf+xml) and MAY support Turtle (i.e., text/turtle or application/x-turtle) representations of these resources. Normal HTTP content negotiation is used to select the representation actually used.

Tracked Resource Set Namespace

The namespace used for resources and properties defined in this specification is http://open-services.net/ns/core/trs#, bound to the prefix oslc_trs in the examples in this document.

Resource: Tracked Resource Set

A Tracked Resource Set provides a representation of the current state of a Resource Set.

  • Name: TrackedResourceSet
  • Type URI: http://open-services.net/ns/core/trs#TrackedResourceSet

| Prefixed Name | Occurs | Value-type | Representation | Range | Description |
|---|---|---|---|---|---|
| oslc_trs:base | exactly-one | Resource | Reference | n/a | An enumeration of the Resources in the Resource Set. |
| oslc_trs:changeLog | exactly-one | Local Resource | n/a | oslc_trs:ChangeLog | A Change Log providing a time series of incremental adjustments to the Resource Set. |

A Base resource (i.e., target of the oslc_trs:base predicate) has the following properties:

| Prefixed Name | Occurs | Value-type | Representation | Range | Description |
|---|---|---|---|---|---|
| rdfs:member | zero-or-many | Resource | Reference | n/a | A member Resource of the Resource Set. |
| oslc_trs:cutoffIdentifier | zero-or-one | String | n/a | n/a | The value of dcterms:identifier of the most recent Change Log entry that is accounted for in this Base. When omitted, the Base is an enumeration at the start of time. |

Resource: Change Log

A Change Log describes which Resources have been created, modified, or deleted, and in what order.

  • Name: ChangeLog
  • Type URI: http://open-services.net/ns/core/trs#ChangeLog

| Prefixed Name | Occurs | Value-type | Representation | Range | Description |
|---|---|---|---|---|---|
| oslc_trs:changes | exactly-one | Local Resource | n/a | rdf:List | The list of Change Event entries, ordered by decreasing Change Event oslc_trs:order. Events that occurred later appear earlier in the list. |
| oslc_trs:previous | zero-or-one | Resource | Reference | oslc_trs:ChangeLog | The continuation of the Change Log, containing the next group of chronologically earlier Change Events. |

Each entry in an oslc_trs:changes list is an anonymous resource (blank node) representing a Change Event with the following properties:

| Prefixed Name | Occurs | Value-type | Representation | Range | Description |
|---|---|---|---|---|---|
| rdf:type | exactly-one | Resource | n/a | oslc_trs:Creation, oslc_trs:Modification, oslc_trs:Deletion | The type of the Change Event. |
| oslc_trs:changed | exactly-one | Resource | Reference | any | The Resource that has changed. |
| dcterms:identifier | exactly-one | String | n/a | n/a | The unique identifier for the Change Event. |
| oslc_trs:order | exactly-one | Non-negative Integer | n/a | n/a | The sequence in time of the Change Event. |

Client Behavior

This section describes one (relatively straightforward) way that a Client can use the Tracked Resource Set protocol to build and maintain its own local internal representation of a Server’s Resource Set.

Initialization procedure

A Client wishing to determine the complete collection of Resources in a Server's Resource Set, so that it can build its own local internal representation, proceeds as follows:

  • Send a GET request to the Tracked Resource Set URI to retrieve the Tracked Resource Set representation to learn the URI of the Base.
  • Use GET to retrieve successive pages of the Base, adding each of the member Resources to the Client's local internal representation of the Resource Set.
  • Invoke the Incremental Update procedure (below). The sync point event is either the Change Event named by the oslc_trs:cutoffIdentifier property (on the first page of the Base) or the beginning of time (when the Base has no oslc_trs:cutoffIdentifier property). A clever Client might run this step in parallel with the previous one, to avoid the case where initial processing takes so long that the Client can no longer catch up to the current state of the Resource Set using the Change Log.
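
The three steps above can be sketched in a few lines once the HTTP and RDF parsing are set aside. Everything about the data shapes here is an assumption: Base pages are dicts with a "members" list and, on the first page, an optional "cutoffIdentifier"; the Change Log is a flat newest-first list of event dicts; and initialize is a hypothetical name.

```python
def initialize(base_pages, change_log):
    """Build the local member set; return (members, sync_identifier)."""
    pages = list(base_pages)
    cutoff = pages[0].get("cutoffIdentifier") if pages else None

    # Step 2: every Base member goes into the local representation.
    members = set()
    for page in pages:
        members.update(page["members"])

    # Step 3: gather the events that are more recent than the cutoff event.
    newer, found = [], cutoff is None
    for event in change_log:  # newest first
        if event["identifier"] == cutoff:
            found = True
            break
        newer.append(event)
    if not found:
        raise LookupError("base cutoff event not found in the Change Log")

    # Apply them from oldest to newest.
    for event in reversed(newer):
        if event["kind"] == "Creation":
            members.add(event["changed"])
        elif event["kind"] == "Deletion":
            members.discard(event["changed"])
    sync = newer[0]["identifier"] if newer else cutoff
    return members, sync
```

The returned sync identifier is what the Client would hand to its next incremental update.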

The overall work to build the local internal representation of the Resource Set is linear in the size of the Base plus the number of Change Events that occurred after the base cutoff event. The Server can help Clients building new local internal representations of its Resource Set by providing as recent a Base as possible, because that means the Client will have to process fewer Change Events. It is entirely up to the Server how often it computes a new Base, if ever. It is also up to the Server how it computes the members of a Base, whether by enumerating its Resource Set directly (e.g., by querying an underlying database), or perhaps by coalescing its internal change log entries into a previous base.

Incremental Update procedure

Suppose now that a Client has a local internal representation of the Server's Resource Set that is accurate as of a particular sync point event known to the Client. A Client wishing to update its local internal representation of the Server's Resource Set acts as follows:

  • Send a GET request to the Tracked Resource Set URI to retrieve the Tracked Resource Set representation to learn its current Change Log.
  • Search through the chain of Change Logs from newest to oldest to find the sync point event. The incremental update fails if the Client is unable to locate the sync point.
  • Process all Change Events after the sync point event, from oldest to newest, making corresponding changes to the Client's local internal representation of the Resource Set. Record the latest event processed as the new sync point event. A clever Client might record (some number of) recently processed events for possible future undo.
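
The final step, applying the unprocessed events and recording the new sync point, might look like the sketch below. It assumes the search step has already produced the events newer than the sync point as a newest-first list of dicts; apply_updates and the bounded history deque are hypothetical choices, not something this specification prescribes.

```python
from collections import deque


def apply_updates(members, newer_events, history):
    """Apply events oldest-to-newest; return the new sync identifier.

    Returns None when there was nothing to apply, in which case the Client
    keeps its existing sync point.
    """
    sync = None
    for event in reversed(newer_events):  # input is newest first
        if event["kind"] == "Creation":
            members.add(event["changed"])
        elif event["kind"] == "Deletion":
            members.discard(event["changed"])
        history.append(event)  # remembered for a possible future undo
        sync = event["identifier"]
    return sync


# Keep only the most recent events; the bound of 1000 is arbitrary.
history = deque(maxlen=1000)
```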

When the procedure succeeds, the Client will have updated its own local internal representation of the Server's Resource Set to be an accurate reflection of the set of resources as described by the retrieved representation of the Tracked Resource Set. Of course, the Server’s actual Resource Set may have undergone additional changes since then. While the Client may never catch up to the Server, it can at least keep its local internal representation of the Resource Set almost up to date. By choosing the interval at which it polls for updates, a Client controls how long the two are allowed to drift apart. The overall work to maintain the local internal representation of the Resource Set is linear in the length of the Change Event stream.

In the (hopefully rare) situation that the Client fails to find its sync point event, one of two things is likely to have happened on the Server: either the Server has truncated its Change Log, or the Server has been rolled back to an earlier state.

A Client can detect a Server rollback when it notices the Server reusing a range of event sequence numbers that it used before but with distinct event identifiers. If the Client had been retaining a local record of previously processed events, the Client may be able to work out a substitute sync point event, undo changes to its local internal representation back to that sync point, and then pick up processing from there.
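
A possible shape for this check is sketched below. It assumes the Client keeps a dict mapping each processed sequence number to the identifier it saw, and that the Server's current events are available as a newest-first list of (order, identifier) pairs; both function names are hypothetical.

```python
def detect_rollback(processed, server_events):
    """True if the Server reuses a known sequence number with a new identifier."""
    return any(
        order in processed and processed[order] != identifier
        for order, identifier in server_events
    )


def substitute_sync_point(processed, server_events):
    """Return the identifier of the most recent event both sides agree on."""
    for order, identifier in server_events:  # newest first
        if processed.get(order) == identifier:
            return identifier
    return None  # no common event: the Client has to start over
```

After finding a substitute sync point, the Client would still need to undo its locally applied events that are newer than that point before resuming normal processing.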

Once the Incremental Update procedure fails, it is unlikely to succeed in the future. The Client has reached an impasse. The Client’s only way forward is to discard its local internal representation and start over.

References

Topic revision: r11 - 28 Nov 2012 - 22:32:16 - SteveSpeicher
 