2013 May 24 | 12:14 pm

In-progress Working document - supporting specification development for the next version of PM. This is not an official list of committed items but a live working document to help prioritize and elaborate on the specification efforts.

The purpose of this page is to collect the architectural direction, as driven by scenario priorities, for the next version of PM specifications (V1). It also records any issues left unaddressed in previous versions.

Items for Consideration for 2.0

Themes

  • Low entry point

Others to consider:

  • Specific examples are intended to be exemplary, not limiting. The assumption is that unless something is specifically excluded in normative text (MUST NOT), it is “possible”. The intent is to allow a broad range of alternative implementations and implementation decisions.
  • Re-use existing material when possible rather than defining new terms, especially vocabularies widely used in the global RDF community, like Dublin Core.
  • Make whatever is newly defined here widely re-usable by other specifications, including for potentially unforeseen scenarios.
  • Avoid using reification, even though it is allowed by Core 2.0 Appendix C
  • Use of DateTime (from XML Schema 1.0) vs DateTimeStamp (from XML Schema 1.1)
  • Linkage between Perf Mon Record and the resource its metrics “describe”

Item Details

Re-use of Other Vocabularies

Potential re-use of software estimation and measurement work from the Software Project Management working group, which they call EMS.

  1. EMS-defined metric URIs: see especially
    1. ems:Metric = a name for something that can be measured. For example, duration is the EMS metric that measures the amount of time that a project takes.
    2. ems:UnitOfMeasure = the units associated with the specific measured value of a metric. For example, project duration is typically measured in units of months or years.
    3. NASA’s QUDT vocabulary defines a substantial set of metrics, and EMS metrics have a defined relationship to QUDT. A few of the QUDT terms appear to be ripe for re-use in Performance Monitoring, like the following. Diving directly into QUDT is not trivial, so the EMS page introductory material may be useful.
      1. Dimensionless for counts, like database connections
      2. DimensionlessRatio for percentages like CPU utilization
      3. Frequency if we need to expose how often a particular metric is collected
      4. Time as an alternative to EMS for durations and timestamps. Would need to decide between them if both are viable for our scenarios.
    4. ems:TimeMetrics: see if Duration, Start/Finish times make sense to re-use
  2. Representing the value of a particular metric, e.g. CPU utilization is 95%, or an average of 200.5 database connections are in use
    1. Metric Entities gives a bit of context for EMS’s approach to things.
    2. ems:Measure = a single measured value of a metric in specific unit: contains title, name of metric, name of units of measure, numeric value
    3. EMS metric values appear to be all double (timestamps are represented as Unix time). For averages and similar values, we suspect PerfMon will need floats; note, however, that the lexical space of double encompasses the lexical spaces of integer and float, so double actually covers all numeric data. EMS recommends rendering boolean values as 0/1 for client simplicity rather than introducing another predicate.
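
As a rough illustration of the ems:Measure shape described above, the following Python sketch models a single measured value (title, metric name, unit name, numeric value) and coerces integers, floats, and booleans to one double. The class name `Measure`, its field names, the helper `make_measure`, and the example metric and unit names are all assumptions made for illustration; they are not EMS-defined terms.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Measure:
    """Hypothetical stand-in for an ems:Measure; field names are assumptions."""
    title: str
    metric: str           # name of the metric (illustrative value)
    unit_of_measure: str  # name of the unit (illustrative value)
    numeric_value: float  # always a double, following the EMS convention


def make_measure(title: str, metric: str, unit: str,
                 value: Union[bool, int, float]) -> Measure:
    """Coerce the value to a double; booleans become 1.0 or 0.0."""
    return Measure(title, metric, unit, float(value))


# The two examples from the text, plus a boolean rendered as 0/1.
cpu = make_measure("CPU utilization", "CpuUtilization", "Percent", 95)
conns = make_measure("Average DB connections", "DbConnections", "Count", 200.5)
up = make_measure("Server available", "Available", "Boolean", True)
```

Keeping a single numeric field means a client needs only one code path to read any value, which is the simplicity argument EMS makes for 0/1 booleans.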

Potential re-use of W3C Recommendation-track linked data vocabularies, or (if more applicable) the inputs they reference.

  1. W3C Government Linked Data WG - the inputs/outputs for the “statistical cube” vocabulary may be relevant. Drafts can be found on the WG wiki, and some others (that may eventually migrate to the WG wiki page) are on their vocabulary discussion wiki page.
    1. Not very mature at this point, only at First Public Working Draft status.
    2. DBpedia (an RDF extract of Wikipedia) is popular in Linked Data circles for general terms.

Use of DateTime

  • xsd10:DateTime allows omission of the time zone. The default is to interpret such a value in the time zone of the server.
  • Clients may be in a different time zone from the server, making correct interpretation of timestamps impossible without out-of-band knowledge.
  • Different clients, each in a distinct time zone, may interact with the same resource.
  • Hence, all timestamps lacking an explicit time zone facet are ambiguous in the absolute sense.
  • xsd11:DateTimeStamp is a new data type introduced to close this gap. It is the same as xsd10:DateTime except that it makes the time zone facet mandatory.
  • Other specs like SPARQL have not yet been updated to include DateTimeStamp, so the Core WG felt it was premature for us to incorporate it directly. They recommended that we continue using DateTime and have the spec require a time zone facet in prose instead.
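
The prose requirement above (DateTime values must carry a time zone) could be checked by a client or server along the following lines. This is a minimal Python sketch, not a full XML Schema validator: the function name `has_explicit_timezone` is hypothetical, and the pattern checks only the lexical shape, not value ranges.

```python
import re

# Simplified lexical pattern for an xsd:dateTime that carries an explicit
# time zone: either "Z" or a +hh:mm / -hh:mm offset. It does not validate
# ranges (for example month 13) and ignores negative years.
_WITH_TIMEZONE = re.compile(
    r"^\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$"
)


def has_explicit_timezone(lexical: str) -> bool:
    """Return True if the dateTime string ends with Z or a numeric offset."""
    return _WITH_TIMEZONE.match(lexical) is not None


# A zoneless value is legal xsd:dateTime, but ambiguous for remote clients.
print(has_explicit_timezone("2013-05-24T12:14:00"))        # no time zone
print(has_explicit_timezone("2013-05-24T12:14:00Z"))       # UTC
print(has_explicit_timezone("2013-05-24T08:14:00-04:00"))  # numeric offset
```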

Linkage

  • By their nature, Perf Mon Records are data “about” something else: an IT component in our scenarios, but it could be anything measurable.
  • We want everything we define here to be equally re-usable in other domains.
  • Defining types for all the “things” (types) that a PMR could describe is not a direct requirement of the scenarios.
  • The Reconciliation WG already has a proposal for common types like Computer System, process, disk, etc.
  • PM does need to define some way to link a PMR to what it describes in order to satisfy the scenarios, even though it does not really care what type is used as the object of the link. The alternative would be to impose an implementation requirement for multi-typing (the PMR is also of type “thing that the PMR describes”). While current implementations can tolerate that, it seems over-reaching to impose multi-typing on all implementations.
  • We have no obvious scenario where we need to access/find a PMR by starting with “the thing” and following a link to the PMR, so there is no obvious need to define a reverse predicate.
  • Given SPARQL support or equivalent, the PMR->thing link could be followed in the reverse direction without defining a specific predicate for that.
  • Core WG is still wrestling with general guidance on bidirectional linking because of the expectations it tends to create about coherency (when a link in one direction exists, many people expect that the inverse link MUST also exist or there is an error).
  • Absent a specific scenario, we will defer defining a link in the opposite direction.
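
To illustrate why a reverse predicate is not strictly needed, the sketch below follows a PMR->thing link backwards over a small in-memory set of triples, which is what a SPARQL query of the form `SELECT ?pmr WHERE { ?pmr ex:describes ex:server42 }` would do. The predicate name `ex:describes`, the resource names, and the function `records_describing` are hypothetical; this page does not define them.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical predicate linking a PMR to the resource it describes.
DESCRIBES = "ex:describes"

triples: List[Triple] = [
    ("ex:pmr1", DESCRIBES, "ex:server42"),
    ("ex:pmr2", DESCRIBES, "ex:server42"),
    ("ex:pmr3", DESCRIBES, "ex:disk7"),
]


def records_describing(graph: List[Triple], thing: str) -> List[str]:
    """Find PMRs by matching on the object of the forward link.

    No inverse predicate is stored; the forward triples are simply
    scanned in the reverse direction.
    """
    return [s for (s, p, o) in graph if p == DESCRIBES and o == thing]


print(records_describing(triples, "ex:server42"))
```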

Specification backlog

  • Averages are underspecified
    • Not all averages are created equal:
      • arithmetic, geometric, …
      • duration, number of values
      • measured, sampled
    • How does a client find out which of these combinations, and possibly others, was used to construct any particular average?
    • Current scenarios do not express this need, so the syntax does not address it. Likewise, the syntax does not address whether the metrics associated with a PMR are gauges or counters.
  • Current PMR approach does not handle large tables well
    • Because each point (ems:observes) is fully self-defining, it is well-suited for PMRs that contain a single point-in-time set of distinct values, i.e. the current scenarios.
    • For the same reason, it would duplicate information (unit of measure, metric type, … all but value) in the case where the PMR is exposing a set of such observations, for example a set of similar observations at specific timed intervals. EMS offers a fact table as a more efficient way of representing this.
      • Dimensions would be things like: status = successes, failures, total.
      • ems:Measurement is an EMS analog to a Perf Mon Record, showing how current PMR-style ems:observes and possibly-future-PMR fact tables can be exposed.
      • EMS REST API Data Model provides a bit of context