
Introduction

This page categorizes, lists, and defines the software metrics that QSM has identified as being the most useful for monitoring and controlling projects.

Size Metrics

Size metrics measure the magnitude of the product being developed in the project.

Estimating the size of a software project is often the most difficult and time-consuming step in the estimation process. Project size is a measure of the functionality that a software project will create or modify. It is emphatically not the cost nor the amount of effort required to deliver that functionality.

When estimating the size of software projects, there are three principles that should be observed.

  • Determine what sizing artifacts are available. The sizing metrics used will depend both on organizational standards and where in the development lifecycle the estimate is being generated. Early in the lifecycle only abstract measures such as high level requirements are available. Later, more concrete measures such as screens, reports, and database tables can be used. Concrete measures tend to produce more accurate size measures for estimating since they are more closely aligned with what the developers must create.
  • Use multiple sizing techniques. If the size estimates converge, it increases your confidence in the accuracy of the estimates. If not, this is a good indicator that more work remains to be done.
  • Present size as a range. A size estimate is just that: an estimate. By presenting the size estimate as a range from best case, through most likely, to worst case, you give the project estimator a set of possibilities for modeling the risk involved in the estimate and accounting for it. A wider, less precise range is more likely to contain the actual size than a single precise figure, so scale the estimate’s precision to improve its accuracy (see the sketch after this list).
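
As a minimal illustration of these principles, the Python sketch below (all names and numbers are hypothetical) represents each sizing technique’s result as a best-case/most-likely/worst-case range and performs a rough convergence check across techniques.

  # Minimal sketch, hypothetical names: a size estimate expressed as a range.
  from dataclasses import dataclass

  @dataclass
  class SizeEstimate:
      method: str        # e.g. "analogy", "function points"
      low: float         # best-case size
      likely: float      # most-likely size
      high: float        # worst-case size

  def ranges_converge(estimates, tolerance=0.25):
      """Rough convergence check: do the most-likely values from the different
      sizing techniques fall within `tolerance` of their overall average?"""
      likelies = [e.likely for e in estimates]
      avg = sum(likelies) / len(likelies)
      return all(abs(x - avg) / avg <= tolerance for x in likelies)

  estimates = [
      SizeEstimate("analogy", 40_000, 55_000, 80_000),
      SizeEstimate("function points (converted)", 45_000, 60_000, 90_000),
  ]
  print(ranges_converge(estimates))   # True -> more confidence in the size estimate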

I will post a graphic here when I learn how to integrate images with the text on the TWiki.

Early Size Metric Examples - Lifecycle Phase Inception/Concept Definition

High Level Requirements/Business Needs

High level requirements are usually the first quantification of the size and scope of the system. They are often captured by the business development group within the organization and are sometimes called “Business Level Requirements”. On Agile development projects this may involve defining the usage model, domain model, and user interface model. These requirements are usually defined at a fairly high level of abstraction.

Analogy Sizing

Analogy sizing infers the size of a new system by comparing it to the size of known (completed) systems: is the new system larger or smaller than the known systems, and by how much? Analogy sizing is designed to get you into the ballpark of the likely actual size. A use case model based only on the Briefly Described form provides a basis for estimation from historical data.

Function Point

A Function Point is a measure of business functionality delivered to a user by an information system. It is defined by IFPUG, the International Function Point Users Group, and several ISO standards.

Middle Size Metric Examples - Lifecycle Phase Elaboration/Requirements & High Level Design

Technical Requirements & Use Cases

Technical requirements, or low level requirements, usually define the system down at a functional level and also capture some of the non-functional performance and quality criteria. These requirements can cover calculations, business rules, data manipulation, technical details, etc. In some cases defining these requirements may involve UML modeling activities. Use case points can be assigned after use cases reach the Essential Outline form.

Function Point

A Function Point is a measure of business functionality delivered to a user by an information system. It is defined by IFPUG, the International Function Point Users Group, and several ISO standards.

Stories

Stories, or User Stories, are descriptions of work that can be performed in finite periods of time; in some cases they need to be implemented in less than a week. In practice, stories closely resemble the features implemented in the system. Stories are used to calculate velocity on agile development projects.

Story Point

A Story Point (http://agilefaq.net/2007/11/13/what-is-a-story-point) is a measure of the effort required to implement a user story on Scrum Agile projects. Story points are measured in units defined by the development team as a reflection of its own capability, or team velocity, and therefore should not be directly compared across teams. However, if an Agile team has a constant team velocity, its story points can be related to person-hours of effort.

Late Size Metric Examples - Lifecycle Phase Construction & Test

QSM SLOC & ESLOC Metrics

SLOC, or Source Lines Of Code, is an example of a very low-level size metric. QSM generally breaks SLOC into three buckets: newly created (new), existing but modified (modified), and existing and not modified (reused). Total SLOC is the sum of all three categories. ESLOC is Effective Source Lines Of Code; for QSM purposes, ESLOC is defined as New plus Modified. Note that this differs from the ESLOC definitions used by some other estimation vendors. ESLOC is used as the sizing input to QSM’s SLIM-Estimate estimating tool.
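
A minimal sketch of the arithmetic, using hypothetical counts; the ESLOC line follows the QSM definition (New plus Modified) described above.

  # Hypothetical SLOC counts for the three QSM buckets.
  new_sloc      = 30_000   # newly created lines
  modified_sloc = 12_000   # existing lines that were changed
  reused_sloc   = 58_000   # existing lines carried forward unmodified

  total_sloc = new_sloc + modified_sloc + reused_sloc   # 100,000
  esloc      = new_sloc + modified_sloc                 # 42,000 (QSM: New + Modified)

  print(f"Total SLOC: {total_sloc:,}  ESLOC: {esloc:,}")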

KLOC

A KLOC is one thousand lines of source code, excluding blank lines and comment-only lines. (A formal definition may exist in the IEEE standards.)

Galorath Source Lines of Code (SLOC) Metrics

The following table describes in detail precisely what is and is not included in a SLOC count. For each logical line of code:

Include: All executable lines.
Include: Non-executable declarations and compiler directives.
Exclude: Comments, banners, blank lines, and non-blank spacers.

Also, look at the means by which a line was produced:

Include: Manually programmed lines.
Include: Lines developed by the developer for use with a Source Code Generator.
Exclude: Lines generated as output from a Source Code Generator.
Include: Lines converted with automated code translators. However, these lines should be entered as pre-existing code. The user will then define how much rework must be done on the translated code through the use of rework percentages.
Include: Copied, reused, or modified lines of code. Again, these lines should be entered as pre-existing lines of code.
Exclude: Deleted lines of code.

Furthermore, look at the origin of each line:

Include: New lines developed from scratch
Include: Pre-existing lines taken from a prior version, build, or release
Include: Invocation statements or lines considered for rework evaluation from COTS or other off the shelf packages. The user should define the level of rework required for those lines which will be modified in any way.
Include: Invocation statements only for unmodified vendor supplied or special support libraries.
Include: Modified vendor supplied or special support libraries, commercial libraries, reuse libraries, or other software component libraries. The user should define the level of rework required for those lines which will be modified in any way.
Exclude: Lines which are part of an unmodified vendor supplied operating system or utility or other non-developed code.

Lastly, consider the end usage of each line:

Include: Lines which are in or part of the primary product
Include: Lines which are external to or in support of the primary product, only if they are deliverable.
Exclude: Lines which are external to or in support of the primary product, but are not deliverable, or any other non-deliverable lines.

Size Components

Size components represent standard types of components that need to be implemented in the system: for example, Screens, Reports, Interfaces, Data Conversion Routines, Forms, Extensions/Enhancements, etc. These can be further grouped into complexity bins if desired.

Schedule Metrics

Schedule metrics measure the duration of the project. Schedules are not unique to software projects, so definitions from PMI, the Project Management Institute, could be used. Project lifecycles typically define high level phases for schedule decomposition like Rational RUP: Inception, Elaboration, Construction & Test and Transition. Typical schedule metrics are the start and end dates of a task in the work breakdown structure (WBS) as well as the elapsed calendar time for the task. Schedule metrics can be measured at each level of the WBS hierarchy. In SEER-SEM, project duration is measured from the start of software requirements to the end of “operational test and evaluation”, i.e., just up to installation or production of the finished system. The concept stage is not included nor, for hardware-integrated systems, is hardware integration. Calendar time is typically measured in hours, days, weeks, months and years.
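
As a minimal illustration, the sketch below (hypothetical WBS task dates) computes elapsed calendar time for a single task from its start and end dates, one of the basic schedule metrics described above.

  # Hypothetical WBS task: elapsed calendar time from start and end dates.
  from datetime import date

  task_start, task_end = date(2009, 3, 1), date(2009, 9, 15)
  elapsed_days = (task_end - task_start).days
  print(elapsed_days, round(elapsed_days / 30.44, 1))   # 198 days, ~6.5 calendar months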

Effort Metrics

Effort metrics measure the amount of labor expended in the project. Effort is not unique to software projects, so definitions from PMI, the Project Management Institute, could be used. Effort is the application of people over calendar time, which defines the project staffing profile. Actual project staffing (the number of actual people) should be differentiated from Full Time Equivalent (FTE) staffing, which is normalized for individuals working less than or greater than full time. In other words, if two people work half time for one month, they are counted as one FTE person for that month. Effort is typically measured in person hours, person days, person weeks, person months, and person years. Effort values are typically provided for task items in a WBS hierarchy and may also be broken out by skill/role. For example, a task that will take one person ten months to complete represents ten person months of effort; another task that will take ten people one month also represents ten person months of effort.
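
A minimal sketch of the FTE arithmetic described above, using hypothetical assignments: two people working half time for one month amount to one person month of effort.

  # Hypothetical staffing assignments, each with a fraction of full time and a duration.
  assignments = [
      {"person": "A", "fraction_of_full_time": 0.5, "months": 1.0},
      {"person": "B", "fraction_of_full_time": 0.5, "months": 1.0},
  ]

  person_months = sum(a["fraction_of_full_time"] * a["months"] for a in assignments)
  print(person_months)   # 1.0 person month, i.e. one FTE for one month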

Quality Metrics

Quality metrics measure the characteristics of the defect injection and removal processes used in the project, the latent defects in the product being developed, and its subsequent reliability. A software defect is any condition where the software does not behave as it was designed to behave (a deviation from specification). This could be an unexpected program or system termination, failure to operate at all, failure to produce output, incorrect output, or any situation where the software does not behave correctly. Each individual defect is counted only once regardless of how many times it appears in the program output. For example, if a single internal calculation which appears in several outputs is incorrect, the failure is counted as a single defect. See 1.1.2. Faults vs. Failures [Jor02:c2; Lyu96:c2s2.2; Per95:c1; Pfl01:c8] (IEEE610.12-90; IEEE982.1-88) http://www.computer.org/portal/web/swebok/html/ch5. Defects can often be correlated to test procedure failures or program change requests. However, in both cases, care must be used in applying these definitions. For example, a single defect can cause many test procedure failures, and program change requests can be generated from changes in user requirements as well as from software defects.

Defects Found & Fixed

QSM estimates the number of defects to be discovered on the software project. These cover all defects across the project lifecycle: requirements defects, design defects, and coding defects. They are typically categorized into 3-5 severity levels such as cosmetic, tolerable, moderate, serious, and critical, and can be expressed as a rate or a cumulative value. QSM also estimates defects remaining, or latent defects, across the project lifecycle.

SEER-SEM estimates only defects remaining upon program delivery (to the customer). ‘Defects’ are considered to be any condition where the software does not behave as intended.

Mean Time To Defect

Mean Time To Defect, or MTTD, can be calculated from the defect metrics. All systems have a required MTTD defined by their operational runtime environment and mission profile. By projecting when the system will meet its required MTTD, we make sure that the schedule and cost estimates also support the system’s required reliability.
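
A minimal sketch of one common way to compute MTTD (an assumed definition for illustration, not a QSM-specific formula): elapsed operating or test time divided by the number of defects discovered in that period.

  # Hypothetical figures: one month of system test time and the defects found in it.
  hours_of_operation = 480.0
  defects_found = 12

  mttd_hours = hours_of_operation / defects_found
  print(mttd_hours)   # 40.0 hours between defects, on average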

Derived Metrics

Many additional metrics can be derived from the core metrics listed above. Productivity metrics are popular derived metrics that usually measure a ratio of the output (size) achieved for a given amount of input (effort and schedule) expended.

Productivity:

Productivity Ratios

These are simply the ratio of the output (size) to the input (usually effort). Typical measures are SLOC/Person Month or Function Points/Person Month. Sometimes the production rate is measured by measuring the ratio of output (size) to the input of schedule. Typical measures would be SLOC/Month or FP/Month.

QSM Productivity Index

“Productivity” embraces many important factors in software development, including:

  • Management influence
  • Software development methods
  • Software development tools, techniques and aids
  • Skills and experience of development team members
  • Availability of development computer(s)
  • Complexity of application type

Productivity is calculated using the computational form of the software production equation:

Productivity = Size / ((Effort/B)^(1/3) * Time^(4/3))

In this equation, time is in years, effort is in staff years and B is a special skills factor, a function of size. It increases slowly with size in the range 18,000 < SLOC < 100,000 as the need for special integration, testing, quality assurance, documentation and management skills increases with the complexity caused by sheer volume of code. Table 1 shows how B varies with size.

Table 1 - SLOC vs. B

Size (SLOC)    B
5-15K          0.16
20K            0.18
30K            0.28
40K            0.34
50K            0.37
>70K           0.39

We have calculated the Productivity parameter for all the systems in our data base and have found it behaves exponentially. The range of numbers spans values from a few hundred to hundreds of thousands. The values tend to cluster around certain discrete values that follow a Fibonacci-like sequence.

These numbers are not easily understood by commercial managers, so we use a simple scale of integers called the Productivity Index to represent the engineering numbers on a one-for-one basis (Table 2). These are adequate to span the universe of all software projects seen so far. The values can be extended whenever an organization becomes efficient enough to require it.

Table 2 - PI Translation

P Constant PI
754 1
987 2
1220 3
1597 4
1974 5
2584 6
3194 7
4181 8
5186 9
6765 10
10946 12
13530 13
17711 14
21892 15
28657 16
35422 17
46368 18
. .
. .
. .
9101328 40

All you need to calculate the PI is:

  • The number of new and modified source lines of code (or any other valid size measure).
  • Total Staff months of Effort for the Main Software Build phase.
  • Total elapsed calendar months for the Main Software Build phase.

Here is an example:

Size is 75,000 ESLOC, MB Effort is 95 PM, Development Time (td) is 15 months.

We use the following formula to calculate the Productivity Constant

Prod. Const = SLOC/((Effort/12/B)^(1/3)*(td/12)^(4/3))

Prod. Const. = 75000/((95/12/.39)^(1/3)*(15/12)^(4/3))

Prod. Const. = 75000/(2.73*1.35) = 20,366 (with intermediate rounding).

Then entering the table above with this number and interpolating we get approximately PI = 14.6.
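
The same calculation can be scripted. The sketch below (Python, hypothetical helper names) reproduces the worked example using the formula above and simple linear interpolation against the abridged Table 2 values.

  # (Productivity Constant, PI) pairs from Table 2 (abridged).
  from bisect import bisect_left

  PI_TABLE = [(754, 1), (987, 2), (1220, 3), (1597, 4), (1974, 5), (2584, 6),
              (3194, 7), (4181, 8), (5186, 9), (6765, 10), (10946, 12),
              (13530, 13), (17711, 14), (21892, 15), (28657, 16), (35422, 17),
              (46368, 18)]

  def productivity_constant(esloc, effort_pm, td_months, b=0.39):
      """Prod. Const. = SLOC / ((Effort/12/B)^(1/3) * (td/12)^(4/3))."""
      return esloc / (((effort_pm / 12 / b) ** (1 / 3)) * ((td_months / 12) ** (4 / 3)))

  def interpolate_pi(prod_const):
      """Linear interpolation of PI between adjacent Table 2 entries.
      Assumes prod_const falls within the range covered by the table."""
      constants = [c for c, _ in PI_TABLE]
      i = bisect_left(constants, prod_const)
      (c0, pi0), (c1, pi1) = PI_TABLE[i - 1], PI_TABLE[i]
      return pi0 + (pi1 - pi0) * (prod_const - c0) / (c1 - c0)

  pc = productivity_constant(75_000, 95, 15)        # about 20,400 without intermediate rounding
  print(round(pc), round(interpolate_pi(pc), 1))    # approximately PI = 14.6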

Interpretation of the Productivity Index

The PI is a macro measure of the total development environment. Values from 1 to 40 are adequate to describe the full range of projects. Low values generally are associated with poor working environments, poor tools and high product complexity. High values are associated with good environments, tools and management and well-understood, straightforward projects. QSM has sorted the 8,000 systems in our data base by application type. Average values in each application category have been determined along with the variability in PI values within each category.

An increase or decrease of one value on the PI scale has a significant impact on time and effort. In a typical development, compared to the industry average:

An organization one PI below the average takes 10% more time and costs approximately 30% more.

An organization one PI above the average takes 10% less time and costs approximately 30% less.

The PI is also closely coupled with product quality. A lower PI significantly increases the number of defects that will be created and need to be fixed in a system. A higher PI significantly decreases the number of defects created in a system.

It is typical to see productivity improvement over time in all application categories. Rates of improvement are slower in the more complex application categories. Improvement rates vary from a maximum of 1 PI every 1.5 years to a minimum of 1 PI every 4 to 5 years.

Galorath Technology and Complexity Calibrations

Technology and complexity are intermediate values computed by SEER-SEM and are key inputs into the effort and schedule computations.

In general, you would calibrate these values to compensate for non-linear differences between your actual and estimated effort and schedule. Typically, most differences between actual and estimate can be explained with known parameter, size, and staffing inputs. However, there occasionally are differences that cannot be explained with the standard set of SEER-SEM input descriptors.

Reasons for using the Technology/Complexity Adjustments are:

  • To make use of trends observed in historical data even if specific technology and environment details are not known
  • To make top level adjustments on key model inputs, eliminating the need to adjust parameter details
  • To adjust for your organization’s ability to productively staff up projects

Technology and Complexity Factors Inside the SEER-SEM Model

A project’s effort and duration are interrelated, as is reflected in their calculation within the model. Effort drives duration, notwithstanding productivity-related feedback between duration constraints and effort. The basic effort equation is:

K = D^0.4 * (Se/Cte)^1.2

where Se is effective size and Cte is effective technology, a composite metric that captures factors relating to the efficiency or productivity with which development can be carried out. An extensive set of people, process, and product parameters feeds into the effective technology rating. A higher rating means that development will be more productive. D is staffing complexity: a rating of the project’s inherent difficulty in terms of the rate at which staff are added to a project.

The general form of this equation should not be a surprise. In numerous empirical studies, the effort-size relationship has been seen to assume the general form y = a * size^b, with a as the linear multiplier on size and the exponent b ranging between 0.9 and 1.2 depending on the available data. Most experts feel that b > 1 is a reasonable assumption, meaning that effort increases at a proportionally faster rate than size. While SEER-SEM’s value of 1.2 is at the high end of this range, the formula above is only part of the estimating process.

Once effort is obtained, duration is solved using the following equation:

td = D^-0.2 * (Se/Cte)^0.4

The duration equation is derived from key formulaic relationships (not detailed here). Its 0.4 exponent indicates that as a project’s size increases, duration also increases, though less than proportionally. This size-duration relationship is also used in component-level scheduling algorithms with task overlaps computed to fall within total estimated project duration.
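
A minimal sketch of the two relationships above, with purely illustrative inputs; real values of D, Se, and Cte come from SEER-SEM’s parameter ratings, not from this snippet.

  # effort:   K  = D^0.4  * (Se / Cte)^1.2
  # duration: td = D^-0.2 * (Se / Cte)^0.4

  def seer_effort(d, se, cte):
      # Effort from staffing complexity D, effective size Se, effective technology Cte.
      return d ** 0.4 * (se / cte) ** 1.2

  def seer_duration(d, se, cte):
      # Duration from the same three quantities.
      return d ** -0.2 * (se / cte) ** 0.4

  # Illustrative inputs only, to show the shape of the relationships.
  print(seer_effort(12.0, 50_000, 5_000))
  print(seer_duration(12.0, 50_000, 5_000))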

Quality:

Defect Density

Defect density is the ratio of defects to the size of the output. Typical measures are Defects/KLOC or Defects/Function Point.
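
A minimal sketch with hypothetical counts:

  defects = 42               # hypothetical defect count
  kloc = 75.0                # thousands of source lines
  function_points = 600

  print(round(defects / kloc, 2))              # defects per KLOC -> 0.56
  print(round(defects / function_points, 3))   # defects per function point -> 0.07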

Comments

Enter your comments here:

Main/AndyBerner - 22 Oct 2009

Aren’t Story Points also used by Agile teams as a measure of relative size, similar to the use of function points? As Larry points out, the relationship to effort is through “team velocity” with the assumption that a particular team (thus particular team size) will be able to complete so many story points in a particular amount of time. See Mike Cohn’s book, Agile Estimating and Planning.

Main/AndyBerner - 22 Oct 2009

Function points are consistent (within tolerance) across differing teams and have some predictive value for unformed teams; they have some approximate ‘absolute value’ within a given technology and so are useful for comparison of widely varying project sizes and SDLCs. Story points have only relative value peculiar to each team. Even velocity (based on story points) is peculiar to a team; the rate of change in velocity, however, may be useful for predicting learning rates across teams if starting conditions (like size and domain experience) are about the same.

Main/SkipPletcher - 04 Dec 2009

Capers Jones makes the case that using Lines Of Code for more than a single-language assessment can lead to “professional malpractice.” I would ask, since the code is already written, why would sizing at this point provide any value for estimation? How does knowing the size of existing code help me estimate what remains to be done in the project?

Main/SkipPletcher - 04 Dec 2009
