SCM Build Integration Scenarios

The purpose of these scenarios is to provide a way for different build systems and SCM systems to interoperate.

Eventually, we would want this interoperation to go both ways; we want an SCM system to be able to initiate a build by invoking a service provided by a 'foreign' build tool, and we want that build tool to be able to exchange information with the SCM system, and control some actions in that SCM system. However, defining services provided by a build system is not in the SCM scope - not all SCM systems include a build system, and even if they did, most also interoperate with third-party build systems. For this reason, the scenarios here focus on the services that must be provided by the SCM system.

For future reference by some other workgroup, here are some of the possible Build Services.

Throughout this page, the term 'configuration' is taken to mean the state of some specific SCM configuration at some specific point in time, containing a specific set of versions of directories and files.

Specifically, we want an integration to be able to:

Allow a user to select some SCM configuration
Produce a file system image (view, work area, projection, etc.) of that configuration
Get information on the configuration contents, such as file names and other properties
Check new or modified files in to SCM (such as the results of a build), including a possible bill-of-materials
Produce a baseline, apply a tag or label, etc., to the configuration, representing a completed build
Show the differences between two builds or configurations, including:
- Differences in membership of the configuration (files added, removed, moved, etc.)
- Differences in content of specific files
- Change sets added or removed in the newer build

These requirements seem to indicate the need for at least the following services:

Services offered by the SCM system

Discover available SCM services
Select a configuration
Different SCM systems provide different sets of configurations, and identify them in different ways; some of these capabilities probably need to be exposed in service discovery.
SamitMehta It would be helpful to look at the configurations most commonly requested for builds across various tools.
NickCrossley Also, the list of possible configurations might be very large, so the user needs ways to query for a configuration, or to select from a subset - for example, the current configurations on a branch, the configurations at some point in time, etc.
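As an illustration of the kind of query described above, here is a minimal sketch that narrows a large list of configurations by branch and by point in time. The `Configuration` record and the `query_configurations` function are hypothetical names invented for this example; they are not part of any SCM system's API, and a real service would run the query on the server rather than filter a list in the client.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Configuration:
    """Hypothetical summary record for one SCM configuration."""
    name: str
    branch: str
    created: datetime


def query_configurations(
    configs: Iterable[Configuration],
    branch: Optional[str] = None,
    as_of: Optional[datetime] = None,
) -> List[Configuration]:
    """Return the configurations matching the filters, newest first.

    branch: keep only configurations on this branch (None = any branch).
    as_of:  keep only configurations created at or before this time.
    """
    matches = [
        c for c in configs
        if (branch is None or c.branch == branch)
        and (as_of is None or c.created <= as_of)
    ]
    return sorted(matches, key=lambda c: c.created, reverse=True)
```

A user interface could call this with only `branch` set to show the current configurations on a branch, or with only `as_of` set to show what existed at some point in time.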
Project a given configuration to the file system
NickCrossley This needs elaboration. What options and limitations are there in file system visibility, cross-platform support, method of file system projection (actual copies vs. links, etc.), how component dependencies are represented, etc.? Note that some options might be determined by the combined capabilities of both the SCM and build systems, not just one or the other - for example, some SCM systems might prefer to represent subcomponent dependencies using symbolic links on UNIX or Linux, but some build systems might not support symbolic links.
SamitMehta Nick, I believe that the issue you are raising is that once this service is called, the resulting "projection" onto the file system may not be consistently implemented across the various SCM systems. One possible outcome could be that the response to the service can provide clues to the REST client tool about the "projection" - e.g. what dependencies exist between the components.
SamitMehta Something else that may be challenging is that most SCM tools provide client tools to "project" a configuration to the file system. Some of the SCM tools control these "projections" - e.g. permissions/ownership/dates, storage, track changes even in the projection etc. Some tools have built-in capabilities to optimize how these "projections" are updated with changes. Would these tools need to provide a different/new mechanism in support of this service? Or, does the service just invoke the built-in capabilities and wouldn't the REST API then have to be supported by the SCM client tool?
Robin Fuller It may make sense for some SCM tools to have the REST API return the actual configuration in the response stream. In this model, after all is said and done, it would be 'nice' if the SCM system's client called the REST API in order to retrieve a configuration, and we ultimately ended up with a single SCM client that was SCM-system agnostic :)
Show contents of a configuration
The build system might need to know details of the files in the configuration, so that it knows what to build and how. Such details might include:
- the change sets in the configuration
- a list of the files in the configuration
- properties of the files in the configuration, possibly including if available:
  - file modify time
  - time the file was included in the configuration
  - file encoding
  - Robin Fuller also user information as to who made changes to the file, and any comments/information related to the changes (think along the lines of the build system notifying the changers of a configuration in the case of a build failure, showing what has changed, by whom, why, and when).
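The file details listed above could be carried in a simple record. The sketch below is only an assumption about what such a record might look like; the `FileEntry` fields and the `users_to_notify` helper are hypothetical names chosen for this example. It also shows the build-failure notification idea: collect the users whose changes entered the configuration after the last good build.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical record for one file in a configuration."""
    path: str
    modify_time: datetime            # when the file content was last modified
    included_time: datetime          # when this version joined the configuration
    encoding: Optional[str] = None   # None if the SCM system does not know it
    changed_by: Optional[str] = None # user who made the change, if available
    comment: Optional[str] = None    # the user's comment on the change


def users_to_notify(
    entries: Iterable[FileEntry], last_good_build: datetime
) -> Set[str]:
    """Users whose changes were included after the last good build."""
    return {
        e.changed_by
        for e in entries
        if e.changed_by is not None and e.included_time > last_good_build
    }
```

Note that the filter uses the time the file was included in the configuration, not the file modify time, since an old version can be added to a configuration long after it was last modified.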
Check in modified or new files, optionally to be marked as 'build artifacts'
Many SCM systems are able to mark build artifacts, and this feature should be exposed. In some SCM systems, this might be related to the bill of materials (see below).
NickCrossley It is quite likely that a user would want to check in only some of the new or modified files produced by a build - for example, many users do not put intermediate .o or .obj files into SCM. In many cases the build system does not know the complete list of new or modified files; it might be aware of only the ones specifically marked as build targets. So, the SCM system itself almost certainly needs a way to find the list of new and modified files, and a way to select a user-chosen subset for checking in.
Robin Fuller I can see an argument that it's the build system's (or build configuration's) responsibility to be aware of what the build-artifact/output list is, and as such, the SCM system only needs to expose a PUT action for those resources. Alternatively, it could be possible to also mark the resources with a special flag (as build artifact) if supported, and then the build system can query for files/folders of this type, and if they are recreated successfully by the build, send a commit/put action.
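To make the subset selection discussed above concrete, here is a minimal sketch that takes the list of new or modified files (however it was found) and drops intermediate files such as .o and .obj before check-in. The function name and the default exclusion patterns are assumptions made for this example; a real integration would let the user choose the patterns or pick files individually.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Sequence

# Assumed defaults for illustration only; not taken from any SCM system.
DEFAULT_EXCLUDES = ("*.o", "*.obj")


def select_for_checkin(
    changed: Iterable[str], excludes: Sequence[str] = DEFAULT_EXCLUDES
) -> List[str]:
    """Return the changed paths whose file name matches no exclude pattern."""
    selected = []
    for path in changed:
        # Match on the final path component so "*.o" works in any directory.
        name = path.rsplit("/", 1)[-1]
        if not any(fnmatchcase(name, pattern) for pattern in excludes):
            selected.append(path)
    return selected
```

For example, given `["src/main.c", "build/main.o", "build/app", "build/util.obj"]`, the default patterns keep `src/main.c` and `build/app`.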
Check in a Bill-of-Materials
Many SCM systems have ways to store and report on the bill of materials from a build, and might represent this content in a special way. Other SCM systems might just store the bill of materials as a file. To allow for the difference, checking in a bill of materials should be represented as a different service, or at least as an optional flag on the file check in service.
Produce a baseline, apply a tag or label
NickCrossley Some SCM systems might have a way to generate the name for this new baseline, label, or tag; others might require the user to supply it; still others might have a combination of both.
Report on differences between the membership of two configurations
The results would indicate new, modified, removed, and renamed files or versions thereof. The service should provide for two explicitly identified configurations, and for a single explicitly identified configuration and one of two specially named configurations:
- the previous baseline or build
- the configuration at the start of the appropriate branch or stream
Report on change set differences in a configuration
As above, but the results are in terms of change sets added or removed, rather than in terms of files and file versions.
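The two difference reports above can be sketched as follows, assuming each configuration is available as a mapping from file path to version identifier, and each configuration's change sets as a set of identifiers. These representations and all names in the code are assumptions made for this example. Because the sketch has no stable object identity to work with, a renamed file shows up as one removal plus one addition; reporting true renames would need identity information from the SCM system.

```python
from typing import Dict, List, NamedTuple, Set


class MembershipDiff(NamedTuple):
    added: List[str]     # paths only in the newer configuration
    removed: List[str]   # paths only in the older configuration
    modified: List[str]  # paths in both, but at different versions


def diff_membership(old: Dict[str, str], new: Dict[str, str]) -> MembershipDiff:
    """Compare two configurations given as {path: version id} mappings."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and old[p] != new[p])
    return MembershipDiff(added, removed, modified)


class ChangeSetDiff(NamedTuple):
    added: List[str]
    removed: List[str]


def diff_change_sets(old: Set[str], new: Set[str]) -> ChangeSetDiff:
    """Compare the change sets included in two configurations."""
    return ChangeSetDiff(sorted(new - old), sorted(old - new))
```

The specially named configurations (the previous baseline or build, or the start of the branch or stream) would simply be resolved to one of the two inputs before calling these functions.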
Report on the differences between two files
NickCrossley Or two versions of a file, or the versions of a file in two configurations, etc. - this needs defining in more depth. What is the expected result of such a service - a context or unified diff of some form, or a GUI comparison, or something else? Is this just a compare service, or a merge service? If a merge, is the result to be checked in? Should the service allow control over some options for the comparison, such as the treatment of white space and new lines, the file encodings, whether to ignore byte order marks, etc.?
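As one possible answer to the questions above, a compare-only service could return a unified diff with a small set of options. The sketch below uses Python's standard `difflib.unified_diff`; the `compare_versions` wrapper and its `ignore_trailing_whitespace` option are hypothetical and only illustrate how a white space option might be exposed. It does not address merging, encodings, or byte order marks.

```python
import difflib
from typing import List, Sequence


def compare_versions(
    old_lines: Sequence[str],
    new_lines: Sequence[str],
    old_label: str = "old",
    new_label: str = "new",
    ignore_trailing_whitespace: bool = False,
) -> List[str]:
    """Return a unified diff of two file versions as a list of lines.

    The inputs are the file contents already split into lines without
    line terminators. An empty result means no differences were found.
    """
    if ignore_trailing_whitespace:
        old_lines = [line.rstrip() for line in old_lines]
        new_lines = [line.rstrip() for line in new_lines]
    return list(
        difflib.unified_diff(
            old_lines, new_lines,
            fromfile=old_label, tofile=new_label, lineterm="",
        )
    )
```

The labels could carry the version or configuration names, so that the same function serves for two versions of a file or for one file as seen in two configurations.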

Examples

Rational Synergy Example

This is an example of the above story using Synergy. In Synergy, a specific configuration at some point in time is represented by a versioned resource called a project version. The WVCM concept of a configuration more-or-less corresponds to the project - all possible versions of it. In common parlance, you are normally concerned with specific versions of configurations, so a project version is normally abbreviated to just project without ambiguity. In Synergy, all objects have both system- and user-defined properties; the user may read almost any property of any object (there are some security limitations possible), and may modify some of the system-defined and all of the user-defined properties of objects to which they have write access. Note that Synergy properties fall into two categories: attributes and relationships. Attributes are simply values such as strings, integers, booleans, etc. Relationships are bi-directional named links between two specific object versions. This is relevant because although both types of data are properties, the API used to set or obtain the data is different, so the caller must know which to use. For system-defined properties, we could of course hide this difference behind the API, but it would be more difficult to do so for user-defined properties; this might make it difficult to have an OSLC service that does not distinguish between the two types of property.

Discover that Synergy is an available SCM service
Identify the Solaris integration build version of my project by name myproject-solaris_7.2_integration
In Synergy, the file system projection of a project is called a work area; each project version may have at most one active work area, while different versions of the same project can have different work areas - and of course it is possible for two versions of a project to have either the same contents or different contents. Most projects in active use already have a work area, so the integration just needs to find that property. If that work area does not exist, or is not in a suitable location, the integration may create a new work area for the project, or move the existing work area. If modifying the properties of that project is undesirable (or not possible due to access constraints), the integration may copy the project to a new version, optionally retaining the exact same configuration membership. In any event, the integration may obtain an absolute path to the work area of the original or copied project.
The integration can list the contents of the work area using standard file system operations. Since one of the options in Synergy work area projection is to set the file modify time to the modify time set in the SCM system, the work area file modify times can represent the time the controlled file was last modified anywhere. However, the file modify time does not represent the time the file was added to the configuration. Suppose you have a work area for some configuration, and you then decide a particular revision or change set is bad, and you revert to an older version. If you have the file times in the work area representing the file modify time in SCM, the older version will likely have an older modify time. If this older time is set in the work area, the build system will probably not rebuild anything, and report success even though the build products no longer reflect the contents of the work area or configuration. Other than forcing rebuilds or deleting the built products before the build, there are two ways to avoid this:
- Have the time stamps in the work area reflect the time the file was projected to the work area instead of the true file modify time - and Synergy has an option to do this, settable on the project version.
- Use a build system that understands this distinction, and has rebuild dependencies reflect both the actual modify time and the 'use time' of each file. Synergy used to ship with a build system that did this (ObjectMake), but that product was based on code with a dubious license, and is no longer shipped. Existing customers may continue to use that build system with current and new releases of Synergy.
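The stale-build problem and the second remedy above can be illustrated with a small sketch. It is not a description of how ObjectMake or any other build tool is implemented; `SourceTimes`, `needs_rebuild`, and the `honour_use_time` flag are hypothetical names, and times are plain numbers for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceTimes:
    modify_time: float  # when the file content was last modified in SCM
    use_time: float     # when this version was projected into the work area


def needs_rebuild(
    source: SourceTimes, target_build_time: float, honour_use_time: bool = True
) -> bool:
    """Decide whether a target built at target_build_time is out of date.

    With honour_use_time=False this is the classic make-style rule that
    looks at the modify time only. With honour_use_time=True the target is
    also stale if the source version was swapped in after the build.
    """
    if honour_use_time:
        newest = max(source.modify_time, source.use_time)
    else:
        newest = source.modify_time
    return newest > target_build_time
```

Suppose a target was built at time 200, and at time 300 the source was reverted to an older version whose modify time is 100. `needs_rebuild(SourceTimes(100, 300), 200, honour_use_time=False)` returns `False`, so the stale target is kept; with the default `honour_use_time=True` it returns `True` and the target is rebuilt.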
The integration can find other properties of files in the project, including their type and possibly their encoding. All objects in Synergy have defined types; Synergy comes with a number of object types built in, and the user may define new types. The type of a file defines a number of indirect properties, such as the editors and other tools that Synergy will invoke by default, the permissions in the work area, the expansion of key words in the source, etc. A build system might want to look at the type of an object to determine how it is to be built, etc. In simple cases, a Synergy type corresponds to the normal file suffixes, such as .c, .java, etc., and the build system uses the file suffix directly rather than ask Synergy how to build files of that type.
After a build, files may be checked in to Synergy. As in many SCM systems, a file must be checked out before it can be checked in. By default, object versions that are not checked out are represented in the work area by read-only files; this might cause the build to fail when trying to write to build targets, if there are already earlier versions of those build targets in the work area. There are two usual ways to avoid this in Synergy:
- Have the build system check out targets before building them.
  The previous Synergy tool ObjectMake used to do this automatically; with third-party build tools the user had to modify the Ant rules, makefile, or equivalent to check out build targets explicitly.
- Change the work area properties on the project version to ask for all files to be read/write, not just checked out files.
  This is quite a common setting anyway; it allows Synergy to be used with IDEs and similar tools with no explicit Synergy integration. The user just edits files, deletes files, creates new files, etc., without regard to Synergy, and then runs a tool to synchronize the work area afterwards. During this sync, Synergy will detect all the changes made to the work area (and to the project if that has been changed at the same time) and allow the user to check out modified files, control new files, remove or restore deleted files, etc. The user may also mark new files as build artifacts (products in Synergy parlance).
In Synergy, the bill of materials is represented as a property of the project version. It could be created automatically by ObjectMake, and may be created explicitly for other tools.
After a build is complete, the normal action in Synergy is to create a baseline. A baseline captures the state of multiple project versions (actually creating a frozen copy of those project configurations, but not copying the work areas). A baseline also creates a checked-in copy of any products that are part of those projects but not already checked in. Properties of the projects are part of the copy, hence including the bill of materials. A baseline also captures the change sets that are included in the corresponding projects; this data is represented by a set of relationships between the baseline and the change set objects. Baselines are queryable objects; all this information is available by looking at properties of the baseline.
Reporting on the differences in membership, change sets, etc., between configurations is just a matter of listing the two and showing the differences in the manner desired. Synergy has some of these reports built in (also made available through Rational Change), but many users construct their own reports. If the OSLC SCM service required a new report content or format, it would be fairly straightforward for Synergy to add a new built-in report.
Synergy does have ways of displaying the difference between the content of any two object versions, both for the GUI and the CLI, though of course it is also possible to do that using the diff tool and options of your choice in the work area. The options available in the Synergy tools themselves are rather limited in current releases.

FrankSchophuizen: _An aspect not (yet) addressed on this page is that a Build system needs a context for (the projection to the file system of) a configuration in the SCM system, for example a set of applications needed to generate the build results (e.g. checkers, pre-processors, compilers, linkers), attributes (e.g. tool version, default settings, location paths, libraries and other resources).

In fact, the Build system should be a configuration management system by itself that allows adding and deleting context applications (or versions), set and modify properties of those applications (per version), define and modify build contexts (configurations) to be used to run builds with, etcetera._

-- FrankSchophuizen - 19 Oct 2009

Topic revision: r6 - 21 Oct 2009 - 18:54:33 - NickCrossley

Main.ScmBuildIntegrationStory moved from Main.ScmBuildManagementStory on 19 Oct 2009 - 09:42 by NickCrossley - put it back
