FESTA handbook Performance Indicators



5. Performance Indicators

5.1 Introduction

During the process of developing hypotheses, it is important to choose appropriate performance indicators (PIs) that will allow answering the hypotheses, but that will also be obtainable within the budget and other limitations of the project. Many different kinds of performance indicators have been used in previous studies, and they are related to various aspects of driving. Below, a definition and description of performance indicators is given. Further, it is explained how performance indicators are related to measures, and the types of different measures that have been identified are described. Examples are provided to illustrate the concepts. An overview of the PI-Measures-Sensors table, which can be found in the D2_1_PI_Matrix_Final.xls annex of FESTA Deliverable 2.1, is given, along with background text on the different groups of performance indicators and measures. Once performance indicators and measures have been defined and linked, it is necessary to test how they will work, which in practice means testing the whole data transmission chain from sensors/devices, vehicles and/or roadside equipment to processed and uploaded data in the research database. The moment to run these tests is the so-called "piloting phase", which is further described in Chapter 6, dedicated to Experimental procedures.

5.2 Performance indicators definition

Definition: performance indicators are quantitative or qualitative indicators, derived from one or several measures, agreed on beforehand, expressed as a percentage, index, rate or other value, which is monitored at regular or irregular intervals and can be compared to one or more criteria.

Further explanations:

  • Hypotheses steer the selection of performance indicators and the criteria against which those should be compared. Hypotheses are seen as questions that can be answered with the help of measurable performance indicators.
  • Criteria can be baseline, different experimental conditions, absolute values, etc. This depends on the research questions and hypotheses.
  • New performance indicators or combinations can be developed during the course of the study. They will have to be validated in follow-up studies.
  • A denominator is necessary for a performance indicator. A denominator makes a measure comparable (per time interval/per distance/in a certain location/…). Therefore "crash" or "near-crash" in themselves should rather be considered as "events", since they only become comparable when they get a denominator, like "number of crashes per year per 100,000 inhabitants". For certain performance indicators either time or distance can be used in the denominator (e.g. number of overtaking manoeuvres, percentage of time exceeding the posted speed limit).

For performance indicators measured via rating scales and questionnaires, focus groups, interviews, etc., the "denominator" would be the time and circumstances of administering the measuring instruments, for example before the test, after having experienced the system, and so on.
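As a minimal sketch of the denominator idea (all variable names and values are hypothetical), an event count only becomes a comparable performance indicator once it is divided by a denominator such as distance driven:

```python
# Hypothetical illustration: events only become comparable PIs
# once they are normalised by a denominator such as distance.

overtaking_events = 18        # events counted for one driver (hypothetical)
distance_driven_km = 1200.0   # total distance logged for that driver

# PI: overtaking manoeuvres per 100 km
overtakings_per_100km = overtaking_events / distance_driven_km * 100
print(round(overtakings_per_100km, 2))  # 1.5
```

With a shared denominator, the PI can be compared across drivers, conditions or studies, which the raw event count cannot.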

Performance indicators are very diverse in nature. There are global performance indicators as well as detailed performance indicators, there are observed and self-reported (subjective) performance indicators, there are performance indicators calculated from continuous and from discrete data, and so on. An example of a rather global performance indicator based on continuous log data would be the mean speed on motorways, whereas an example of a performance indicator based on discrete, self-reported data would be the level of perceived usability of a function. Some performance indicators can be based either on self-reported, discrete measures or on logged data, such as for example the rate of use of a system. The participants can be asked how often they use a function, but the actual function activation and the different settings chosen by the driver can also be logged from the system.

All performance indicators are based on measures, which are combined and/or aggregated in certain ways, and which are normalised in order to allow comparisons. The measures are described below.

5.3 Measures

Five different types of measures are identified, namely Direct Measures, Derived (Pre-Processed) Measures, Events, Self-Reported Measures, and Situational Variables, which are described in more detail below. A measure does not have a "denominator". Therefore it is not in itself comparable to other instances of the same measure or to external criteria. The measure itself, however, can very well be a fraction (like speed). Several performance indicators can use the same measures as input, and the same measures can be derived from different types of sensors. An example would be speed, which can be read from the CAN bus, logged from a GPS receiver, or calculated by an external sensor registering wheel rotations.

5.3.1 Direct (raw) measures

A Direct Measure is logged directly from a sensor, without any processing before saving the data to the log file. Linear transformations like the conversion from m/s to km/h are not considered to be processing. How the sensor arrives at its output is not relevant for the classification. Longitudinal acceleration, for example, is a Direct Measure if logged directly from an accelerometer, but not if derived from the speed and time log. In this case it would be a Derived or Pre-Processed Measure, because it is not directly available from a sensor and has to be calculated from other measures, i.e. pre-processed, before logging. Further examples of Direct Measures are: raw eye movement data, the distance to the lead vehicle as measured by radar and a video film of the forward scene.

5.3.2 Derived (pre-processed) measures

A Pre-Processed Measure is not directly logged from a sensor; it is either a variable that has been filtered, for example, or a combination of two or more Direct or other Pre-Processed Measures. An example of a Pre-Processed Measure is time to collision (TTC), which is based on the distance between a vehicle and another vehicle or object, divided by their speed difference. The distance to the vehicle or object on a collision course is a Direct Measure from a radar, for example. The speed difference between the own vehicle and the other vehicle or object is another Pre-Processed Measure, based on the own speed as read from the CAN bus, for example, and the calculated speed of the other vehicle or object. Another example of a Pre-Processed Measure, based on raw eye movement data and the vehicle geometry, is the assignment of gaze to pre-defined zones that the driver looks at, e.g. mirror, windscreen and radio.
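As a sketch of how such a Pre-Processed Measure can be computed (the function and parameter names are hypothetical), TTC is the current gap divided by the closing speed, and it is only defined when the gap is actually closing:

```python
def time_to_collision(gap_m, own_speed_ms, lead_speed_ms):
    """Hypothetical TTC computation from two measures.

    gap_m: distance to lead vehicle (Direct Measure, e.g. from radar)
    own_speed_ms, lead_speed_ms: speeds in m/s (e.g. from CAN bus / derived)
    Returns TTC in seconds, or None if the gap is not closing.
    """
    closing_speed = own_speed_ms - lead_speed_ms
    if closing_speed <= 0:
        return None  # vehicles not closing in: TTC undefined
    return gap_m / closing_speed

print(time_to_collision(30.0, 25.0, 20.0))  # 6.0
```

Note how the sketch combines a Direct Measure (radar gap) with a Pre-Processed Measure (speed difference), exactly as described above.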

One major issue with derived measures is that there is to date rather little standardisation on how to calculate them. That means that derived measures calculated in one study may not match those calculated in another study. It also introduces the potential that the calculation procedure adopted may not conform to the best scientific standards and that, as a result, the findings may lack validity. The problem is currently being addressed by the Safety and Human Factors Standards Steering Committee of SAE International, which has adopted a task (TASK J2944) termed "Driving Performance Definitions" (http://standards.sae.org/wip/j2944/). The task is focused on measures related to longitudinal and lateral vehicle control.

A special case of Derived Measures comprises those that are coded by a human observer after data logging is completed. Examples might be gaze direction coding, classifications of scenarios or classifications of secondary task engagements. These Measures are considered to be "derived", because data reduction by a human observer is more than only a linear transformation, and they can be based on more than one Direct Measure. In the case of secondary task classification one might use both a video of the driver's hands and a log file of an eye tracker, and for scenario classification both a road database and a video of the forward view might be used.

5.3.3 Events

Events can be seen as singularities based on Direct Measures and/or Derived Measures or on a combination of those. They can be short in time, like a crash, or extended over a longer period of time, like an overtaking manoeuvre. One or more preconditions must be fulfilled for an event to be classified as such, that is, one or several “trigger” criteria must be exceeded. For the event “overtaking manoeuvre”, for example, the non-technical definition might be: A vehicle in a vehicle-following situation changes lanes, accelerates and passes the vehicle in front, then changes lanes back into the original lane, in front of the vehicle(s) that have been overtaken. Depending on the infrastructure design, the definition might need to be extended to motorways with more than two lanes in each direction, for example.

Several performance indicators can be related to one event type; for the overtaking manoeuvre, for example, it could be of interest to determine the number of overtakings, the duration of overtaking, the distance/time spent in the opposite lane and so on. For a more technical definition that sets the trigger criteria of when exactly an overtaking manoeuvre starts and when it ends, either the literature has to be consulted or a project-specific definition has to be developed. This can possibly be based on previous data, or, if nothing else is available, on the data from the current FOT.
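As a minimal sketch (the event data and field names are hypothetical), several performance indicators can be computed from one list of detected overtaking events:

```python
# Hypothetical overtaking events, e.g. produced by a trigger algorithm.
# Each event records its start/end time and the time spent in the
# opposite lane (all values in seconds).
events = [
    {"start": 100.0, "end": 112.5, "opposite_lane_s": 6.0},
    {"start": 340.0, "end": 355.0, "opposite_lane_s": 8.5},
    {"start": 900.0, "end": 909.0, "opposite_lane_s": 4.0},
]

# Three different PIs derived from the same event type:
n_overtakings = len(events)
mean_duration_s = sum(e["end"] - e["start"] for e in events) / n_overtakings
total_opposite_lane_s = sum(e["opposite_lane_s"] for e in events)

print(n_overtakings, round(mean_duration_s, 2), total_opposite_lane_s)
# 3 12.17 18.5
```

The point of the sketch is only that one event definition can feed many performance indicators; the trigger criteria that produce the event list remain project-specific.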

Events are very important to NDS/FOT studies, because a core type of analysis performed in almost every NDS/FOT is what can be called Event Based Analysis (EBA). The basic principle of EBA is to identify shorter driving segments called Crash Relevant Events (CREs, typically in the order of 5-10 seconds), during which the crash risk is judged to be higher compared to other driving, and then to analyse why these events occur, and/or whether their frequency or dynamics change when particular safety systems are made available to the driver. A key element of NDS/FOT success is therefore defining CREs in a proper way. If the selected events are indeed crash relevant, then extrapolation to the general driver population is both possible and credible.

While simple in theory, identifying CREs in NDS/FOT data is more difficult in practice. To begin with, the simplest measure of crash risk, i.e. actual crashes, is extremely rare, so the final database will usually contain fewer crashes than needed for statistical analysis. Surrogate events therefore have to be used. These surrogate events need to be driving situations where there is no actual crash, but where the event still unfolds in such a way that its presence can be used as an indicator of crash risk. A key challenge for all NDS/FOT studies is therefore how to couple surrogate (non-crash) events to crash causation mechanisms. Ideally, one would only use CREs that are known with certainty to be predictive of actual crash involvement. Unfortunately, while a lot of effort has gone into developing algorithms, filtering techniques etc. that allow for efficient yet relevant CRE selection, precise definitions with a clear-cut, indisputable connection to crash involvement have yet to be fully established. Selecting a suitable CRE definition is therefore a major decision point for a project, and should be treated as such. Currently, there are four main approaches to CRE definition in use.
These will be given short introductions below. When defining CREs for a project, it is further recommended to read Annex C, which contains a more detailed overview of the pros and cons of each approach, as well as some associated topics to think about.

Driver response based CRE definitions

The first approach can be called the "driver response based" approach, and it builds on the general idea that CREs can be identified from the way drivers respond to them. The most common version of this is to look for extreme vehicle kinematics. The basic assumption is that drivers prefer to travel in comfort and generally will not expose themselves to drastic events unless necessary. Thus abrupt velocity and direction changes in the vehicle may indicate unplanned and urgent responses to unexpected situations. A less common version is to use what can be called the startle response in the driver to find events. The general idea is that unexpected traffic situations that include a perceivable threat to the driver trigger a response in the form of a general tensioning of the body. This "jerk" may be used as a tell-tale that the driver did not expect the situation and that he or she perceives it as genuinely threatening. In the driver response based approach, CREs are identified based on how each driver evaluates the situation. For example, while one driver may brake hard at a certain time-to-collision threshold, another driver might not brake at all at that time-to-collision value. Hence a selection of CREs based on this approach will include the first event but not the second (since the driver did not respond, it is by definition not an event). Driver response based CRE selections will therefore reflect the normal variability in any driver population in terms of driving style, risk perception and capacity to respond. It follows that a representative selection of drivers becomes a key issue when using a driver response based CRE selection.

Safety function response based CRE definitions

When the study is an FOT, i.e. designed to assess the impact of one or more active safety functions, then a very natural approach to CRE identification is to use the function itself to detect CREs. After all, that is what the function is designed to do. For example, if an FOT is set up to assess the effects of Forward Collision Warning (FCW) on crash risk, the warnings issued can be used as event identifiers. The downside of this approach is that any CRE that occurs outside the function's detection capacity will be missing from the analysis. It thus becomes impossible to estimate the frequency of CREs which the function in principle needs to detect but in practice cannot. On the upside, a very realistic assessment of function availability and usage is obtained. Since the function can only do something when it is turned on, true availability and usage rates are automatically represented in the data set.

Driving context based CRE definitions

A third approach to CRE identification is to base it on driving context. The underlying assumption here is that too small margins equal elevated crash risk. In other words, there exist situations where the safety margins are inherently so small that the slightest mistake or variation could lead to a crash. Preventing whatever leads to these small margins will thus enhance traffic safety. The definition of what constitutes too small margins can be static, such as when lane markings are used to indicate boundaries that should not be unintentionally crossed. The definition can also be dynamic. For example, if a vehicle is closer than X to another vehicle and simultaneously closing in faster than Y, it might be considered as being in conflict regardless of whether an action is taken or not. In a driving context approach, the CRE definition is independent of how drivers resolve the situation. This means all drivers are equally covered, independently of their capacity or willingness to respond. It also means that drivers with a more aggressive driving style will contribute more events per driver to the analysis than those who are less aggressive, since they end up in small-margin situations more often. If the analysis team believes that small margins in and of themselves are predictive of crash involvement, this is acceptable, since these drivers should have a higher crash risk. If the team is dubious about this assumption, however, this approach might not be the right one for the study.

Driving history based CRE definitions

A fourth approach is to look for unusual events in a driving history perspective. The underlying assumption is that unusual events in a person's or group's driving history are unusual precisely because drivers try to avoid them. Presumably at least a certain portion of them is crash risk related. The advantage of this approach is that it will find the most unusual events that occurred during the study for each person (or group), and it is reasonable to assume that those are events which drivers would prefer to avoid in the future. The corresponding disadvantage is of course that those events may be special for other reasons than being safety-critical. Even if drivers try to avoid them, they may have little or no connection to traffic safety.

Combined approaches

Sometimes projects make combined use of the approaches above. For example, in euroFOT, after multiple versions had been tried, the CRE definition for lead vehicle conflicts that was finally settled on included the following criteria:

  • FCW warning issued (Function based).
  • Brakes applied within 5 seconds after FCW warning (Driver response based).
  • Max Brake Jerk > 10 (Driver response based).
  • Max Brake Pressure > 20 bar (Driver response based).
  • Lead vehicle = moving (Driver response based).
  • Direction indicators not in use prior to warning (Driving context based).

Coupling CREs and crash risk - the CRE causation and applicability question

The discussion above illustrates that performance indicators can be built by counting events, or by considering certain aspects of those events. However, it also illustrates that each approach to CRE definition outlined above represents a different view of how the coupling between CREs and crash risk should be made, and it is not obvious which is the best way to go. What is clear, however, is that different CRE definitions will lead to different results. In Fitch, Rakha et al. (2008), it is described how another project (Hanowski, Blanco et al., 2008) approached the same data set as Fitch and his colleagues were analysing, but with a different CRE selection method. Interestingly, while both projects found hundreds of what they judged to be relevant CREs in that data set, only 7 of the 596 CREs found by Hanowski et al. overlapped with those identified by Fitch et al. Clearly, the CRE definition chosen in a project matters.
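A combined filter of the kind listed above for euroFOT can be sketched as follows; the field names and threshold encodings are hypothetical stand-ins, not the actual euroFOT implementation:

```python
def is_lead_vehicle_cre(event):
    """Hypothetical combined CRE filter in the spirit of the euroFOT
    lead vehicle conflict criteria: all conditions must hold at once."""
    return (
        event["fcw_warning"]                      # function based
        and event["brake_delay_s"] <= 5.0         # driver response based
        and event["max_brake_jerk"] > 10.0        # driver response based
        and event["max_brake_pressure_bar"] > 20  # driver response based
        and event["lead_vehicle_moving"]          # as listed in the criteria
        and not event["indicator_on_before"]      # driving context based
    )

candidate = {
    "fcw_warning": True, "brake_delay_s": 1.2, "max_brake_jerk": 14.0,
    "max_brake_pressure_bar": 35, "lead_vehicle_moving": True,
    "indicator_on_before": False,
}
print(is_lead_vehicle_cre(candidate))  # True
```

Such a conjunction of criteria is restrictive by design: relaxing or tightening any single threshold changes which events enter the analysis, which is exactly why the CRE definition is a major project decision.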

Thus, FESTA neither can nor will provide specific trigger values for events, nor the exact measures that have to be included in the definition of a certain event; the events listed in the matrix should be seen as examples. Projects must make an explicit, conscious and well-informed decision of their own on how to use the available approaches to best fulfil their goals, and then set up the CRE detection correspondingly.

Some general recommendations can be made, though. First of all, it is important to point out that not everything can be seen in the data typically collected in NDS/FOT studies. For example, mental states do not show in video and CAN data, so a study that aims to analyse how intentions, expectations, or levels of attentiveness contribute to crashes should probably not be a pure NDS/FOT study. A mixed approach, where the NDS/FOT is complemented by additional data collection better suited to capturing drivers' intentions, routes and expectations, is probably more suitable.

Second, please note that CREs do not have to look like crashes to be relevant for analysis. This comes back to the underlying assumptions about which mechanisms are predictive of crash involvement. If, for example, it is assumed that a high level of variability in normal driving is in and of itself an accurate crash predictor (a version of the driving context based CRE selection approach above), then each event that captures the tails of the driving parameter distributions is crash relevant, even if it does not look spectacularly dangerous on video.

Third, the selection criteria have to match the time scale of the event to be analysed. This applies in particular to using physiological driver parameters for CRE selection. Many such parameters, like respiratory rate, simply change too slowly for precise correlations with traffic situation changes (i.e. capturing hard brakings by looking at respiratory rate is not likely to succeed).

Fourth, it may strengthen the credibility of the results if the CRE analysis is restricted to injury related CREs. Doing this typically includes finding out at which travel speeds and/or road types injuries occur for the crash type to be analysed, and then limiting the CRE analysis to events which fall inside those speeds and road types.

Again, it is recommended to read Annex C, which contains a more detailed overview of the pros and cons of each approach, as well as some associated topics to think about.

5.3.4 Self-reported measures

A number of performance indicators are based on Self-Reported Measures, which are gleaned from questionnaires, rating scales, interviews, focus groups, or other methods requiring introspection on the part of the participant. These subjective measures are typically not logged continuously, but rather only once or a few times during the course of a study. The measures related to Self-Reported performance indicators could be the answers to each single question or the checks on the rating scales, while the sensors would be the questionnaires or rating scales themselves. It is more difficult to make a meaningful distinction between measure and sensor for semi-structured and unstructured interviews, and especially for focus groups.

Subjective data, e.g. on acceptance and trust of a system or function, can provide valuable performance indicators, and in particular such data can be related to function usage in cases where this is within the control of the operator. Consideration should be given to tracking such acceptance and trust over time, as the levels may change with experience of the function.

In the matrix only a small number of Self-Reported Measures are included, which are those that are necessary for the computation of a performance indicator that is not solely based on self-reported measures, like for example “deviation from intended lane” or “rate of errors”.

5.3.5 Situational variables

Situational Variables can be logged like Direct Measures or computed like Derived Measures. They can also be self-reported and they can correspond to events. Their commonality is that they can be used as a differentiation basis for other performance indicators, in order to allow for a more detailed analysis. It might, for example, be of interest to compare certain performance indicators in different weather or lighting conditions, on different road types, or for different friction conditions. These Situational Variables are included in the performance indicator (PI) matrix in the measures table, but they are not linked to any specific performance indicator. In principle all kinds of measures can be used as Situational Variables, such as when analyses are performed for different speed intervals.
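As a sketch of how a Situational Variable serves as a differentiation basis (records and values are hypothetical), the same performance indicator can be computed separately per condition:

```python
from collections import defaultdict

# Hypothetical trip records: a PI input (mean speed, km/h)
# plus a Situational Variable (weather).
records = [
    {"weather": "dry",  "mean_speed_kmh": 105.0},
    {"weather": "rain", "mean_speed_kmh": 88.0},
    {"weather": "dry",  "mean_speed_kmh": 111.0},
    {"weather": "rain", "mean_speed_kmh": 92.0},
]

# Group the measure by the Situational Variable ...
by_weather = defaultdict(list)
for r in records:
    by_weather[r["weather"]].append(r["mean_speed_kmh"])

# ... and compute the PI (mean speed) per condition.
pi_per_condition = {w: sum(v) / len(v) for w, v in by_weather.items()}
print(pi_per_condition)  # {'dry': 108.0, 'rain': 90.0}
```

The same pattern applies to any differentiation basis mentioned above: road type, lighting, friction, or speed intervals.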

Data on Situational Variables are essential to collect, since they help to establish important control factors that are needed when analysing the effects observed in the FOT. Ideally, a lot of in-depth data is collected, such as:

  • Video data (video processing takes a lot of time so this should be automated as much as possible).
  • Questionnaires & travel diaries (at certain points during the test; too frequent interventions will disturb the naturalistic approach).
  • Data on surroundings (such as surrounding vehicles - headways - and traffic state).
  • Metadata; description of data and tests for evaluation, such as who drove the vehicle, which functions were studied in a particular test drive, circumstances when driving, other functions in the vehicle, date and time of the test, etc.
  • Audio data; e.g. to give test drivers the option to tell what happened and what their experiences were, making it as easy as possible for them - e.g. using a voice memo.
  • Logging of function states and messages (system state, e.g. on/off, what information is presented to the driver).

However, collecting all these data is costly and makes the analyses time-consuming. A possible solution is to collect in-depth data for part of the FOT (e.g. for part of the vehicles).

5.4 The PI-Measures-Sensors matrix

A matrix was developed that contains, in one table, performance indicators covering different aspects of research questions that might be addressed in an FOT (see the D2_1_PI_Matrix_Final.xls annex of FESTA Deliverable 2.1). These performance indicators are described with respect to different categories. For each performance indicator, the measures on which it is based are listed.

All these measures are then described in another table. Different categories are provided for description, where some are reserved for Direct Measures, others for Derived Measures and for events. Each Direct Measure points to a sensor from which the measure can be read. As mentioned above, for certain measures different sensors can be used. In this case, each of those is described as a separate measure.

A link is made between the PI and measures tables by indicating for each performance indicator which measures are needed to compute it. In this way, when the hypotheses have been generated, it should be possible to pick the appropriate performance indicator and from there proceed via the pointers to the necessary measures, and from there to the sensors. If several sensors can provide the same measure, the choice between them can be made based on budget limitations, sensor limitations or other restrictions.

Presently most measures for Self-Reported performance indicators are not included in the matrix. Instead, a direct reference is made to the appropriate questionnaire, rating scale or method needed to obtain the PI. For correct deployment of the recommended method, the user is directed to the instructions for that particular method.

Measures that describe driver characteristics are not included in the matrix itself, but in the annex to the matrix. In this annex, it is explained which instruments could be used to assess different aspects of driver characteristics (see the D2_1_PI_Matrix_Final.xls annex of FESTA Deliverable D2.1). The characteristics covered in this document are usually stable over a longer period of time.

This matrix is not meant to be exhaustive; it is only an aid for selecting performance indicators, measures and sensors. It should by no means be regarded as being limited to the performance indicators or measures entered now, and users are encouraged to expand the matrix during the course of their FOTs. Further instructions on how to work with the matrix are provided in FESTA Deliverable D2.1 (D2_1_PI_Matrix_Final.xls).

5.5 Performance Indicators per impact area

The performance indicators are split into different sub-groups, depending on which area of the traffic system they are concerned with.

5.5.1 Indicators of driving performance and safety

Driving performance is discussed and analysed in relation to traffic safety. Given that accidents are usually multicausal, the desired set of indicators should cover a number of factors. Otherwise any FOT is likely to miss essential information that is required to produce reliable and valid results.

Traffic safety is regarded as a multiplication of three orthogonal factors, namely exposure, accident risk and injury risk (Nilsson, 2004). Driver decision making and behaviour cover all these aspects. Typically, strategic decisions are highly relevant for exposure, tactical decisions for the risk of a collision, and operational decisions for the risk of injuries (Michon, 1985). Consequently, an FOT should cover all these aspects, because it is essential to cover the driver tasks and driver behaviour widely, and include decisions like whether to use the vehicle at all, route planning before the trip, timing of the trip, etc. However, as those decisions often lie outside what can be influenced with available countermeasures, the main focus is usually on driving performance while actually driving a vehicle.
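As a simple arithmetic sketch of this multiplicative view (all numbers are hypothetical, not taken from Nilsson), the expected number of injuries is the product of the three factors:

```python
# Hypothetical illustration of the multiplicative model (Nilsson, 2004):
# expected injuries = exposure x accident risk x injury risk.
exposure_million_km = 500.0       # exposure
crashes_per_million_km = 0.8      # accident risk
injuries_per_crash = 0.25         # injury risk

expected_injuries = (exposure_million_km
                     * crashes_per_million_km
                     * injuries_per_crash)
print(expected_injuries)  # 100.0
```

Because the factors multiply, a countermeasure that reduces any one of them by a given percentage reduces the expected number of injuries by the same percentage, which is why an FOT should ideally cover all three.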

The most common approach to traffic safety in NDS studies is to contrast driver behaviour in normal driving with the sequence of events leading up to conflict situations (i.e. near-crashes or crashes).

In summary, an indicator of driving performance is a behavioural variable which indicates the quality of the behaviour with respect to road safety. The behaviour is measured directly from the driver (e.g. frequency of glances to a given object) or indirectly from the vehicle (e.g. speed).

5.5.2 Indicators of system performance and influence on driver behaviour

In this part, indicators were developed that describe the actual performance of the system to be tested. These indicators are mostly related to both safety and acceptability. Here the focus is directed at the question of whether the system actually functions the way it is meant to under realistic conditions. False alarms and misses are obvious indicators in this regard. Relations exist with indicators of acceptance and trust, which examine the subjective opinion of the participants on how the system worked.

Furthermore, indicators that describe the influence of the system on the driver and the interaction between system and driver are described. They will enable assessing the driver's willingness to use the system in various situational contexts. They will also contribute to the identification of potential misuses of the system leading to incidents or conflicts. In a longitudinal perspective, they will also contribute to an analysis of the learning and appropriation phases.

The intrinsic performance of the system

The first issue is the intrinsic performance of the system studied. It is related to the precision and the reliability of the system. Does the system perform as expected? In this case we need indicators signalling any deviations, such as false alarms and misses, but also indicators about the context in which these deviations occur. Ideally, the origin of the deviation should also be identified. The identification of false alarms or misses may be based on automated sensors or may require a video recording of the driving scene. For example, in the French LAVIA (ISA) project, loss of the recommended or target speed was automatically recorded, while mismatches between the target speed and the posted speed limit were identified on the basis of a video recording of the driving scene.

The intrinsic performance of the system should be distinguished from the operational envelope of the system (i.e. the use cases for which the system was designed to work). This is important when assessing the opinion on the performance of the system: when asking the driver to assess the system performance, the limits of the system operation should be differentiated from system deviations. Two main indicators related to the operational envelope are: 1) availability of the system over driving time (percentage of the driving time the system is available, e.g. some systems are only available above a certain speed, for special road characteristics, etc.); and 2) frequency of take-over requests (the system is active but not able to provide assistance due to system limits, e.g. for ACC the maximum brake rate is limited).
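Both operational-envelope indicators can be sketched from a log of system states; the log format and field names below are hypothetical:

```python
# Hypothetical per-second log of system state during one trip.
log = (
    ["available"] * 540        # seconds with the system available
    + ["unavailable"] * 60     # e.g. below the system's minimum speed
)
takeover_requests = 3          # take-over requests counted in the trip

driving_time_s = len(log)      # 600 s of driving in total
driving_time_h = driving_time_s / 3600.0

# Indicator 1: availability over driving time (percentage)
availability_pct = log.count("available") / driving_time_s * 100

# Indicator 2: frequency of take-over requests (per hour of driving)
takeovers_per_hour = takeover_requests / driving_time_h

print(round(availability_pct, 1), round(takeovers_per_hour, 1))  # 90.0 18.0
```

Both indicators use driving time as the denominator, which makes them comparable across trips, drivers and systems.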

Both the intrinsic performance and the operational envelope are assumed to play a role in the drivers' opinion of the system.

Modes of drivers' interaction with the system

The second issue is the driver’s interaction with the system. This goes beyond the analysis of overall driving performance when using support systems: in fact 1) it is examined how drivers use and interact with the system; and 2) it is examined how this interaction may affect driving behaviour and performance.

How drivers use and interact with the system

Some support systems require or enable the driver to activate/deactivate the system, to override the system, to select one system among several available, to select or register vehicle-following or speed thresholds, and so on. In other words, using a system implies the application of a number of procedures, and these procedures should be registered and analysed. This is the case for systems such as speed limiters, cruise control, adaptive cruise control or navigation systems. These procedures may be classified as the driver’s direct or indirect interventions, depending on whether they are applied through vehicle controls (brake or accelerator) or through system controls. As for the indicators of system performance, the situational context should be taken into account. This is important for identifying potential misuses of the system leading to incidents or conflicts, as described above. In a longitudinal perspective, these indicators will also contribute to an analysis of the evolution of system usage from the learning and appropriation phases to the integration phase. Furthermore, the frequency with which the system “interferes” with the driver’s activity has to be assessed: for example, when driving with a speed limiter, how often is the system “active”, that is, effectively limiting the vehicle speed?

How this interaction may affect driving behaviour and performance
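As an illustration of the usage indicators discussed under “How drivers use and interact with the system”, a minimal Python sketch for a speed limiter is given below; the signal names (speed, limiter setting, accelerator kick-down flag) are hypothetical:

```python
def usage_indicators(speed_kph, limiter_setting_kph, accel_override):
    """Per-trip usage indicators for a speed limiter (illustrative signals).

    speed_kph           : per-sample vehicle speed
    limiter_setting_kph : per-sample registered threshold, None if system off
    accel_override      : per-sample flag for a direct intervention
                          through the accelerator (kick-down)
    """
    n = len(speed_kph)
    # The system counts as "active" when it is effectively capping speed.
    active = sum(1 for v, s in zip(speed_kph, limiter_setting_kph)
                 if s is not None and v >= s)
    # Count rising edges of the override flag as discrete interventions.
    overrides = sum(1 for prev, cur in zip(accel_override, accel_override[1:])
                    if cur and not prev)
    return {"active_time_pct": 100.0 * active / n,
            "override_count": overrides}
```

The same edge-counting pattern applies to activations, deactivations and threshold changes applied through the system controls (indirect interventions).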

For analysing the effect of the driver’s interaction with the system on driving behaviour and performance, various levels of analysis could be employed, depending on the desired granularity. Obviously, this granularity depends on the recording means available as well as on the time required for performing such analyses. For example, studying changes in glance behaviour requires video recordings and is time-consuming.

For an analysis of behavioural changes at a more global level, synthetic indicators should be conceived. These indicators are assumed to reflect changes at the tactical or strategic level of the driving task. Indicators such as “lane occupancy” and “frequency of lane change” are often used to assess changes at the tactical level. Changes at the strategic level could be reflected by changes in the itinerary chosen or changes in driving time.
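For instance, the two tactical indicators mentioned above could be derived from a per-sample lane index series along these lines (an illustrative sketch; the sampling scheme and lane coding are assumptions):

```python
from collections import Counter

def tactical_indicators(lane_index, sample_period_s=1.0):
    """Synthetic tactical-level indicators from a lane index series.

    lane_index : one (hypothetical) lane number per sample
    """
    n = len(lane_index)
    # Lane occupancy: share of driving time spent in each lane.
    occupancy_pct = {lane: 100.0 * count / n
                     for lane, count in Counter(lane_index).items()}
    # Frequency of lane change: transitions between consecutive samples.
    changes = sum(1 for a, b in zip(lane_index, lane_index[1:]) if a != b)
    hours = n * sample_period_s / 3600.0
    return {"lane_occupancy_pct": occupancy_pct,
            "lane_changes_per_hour": changes / hours}
```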


In summary, it is recommended to:

  • Classify the support systems by type and level of interaction implied by their use;
  • Classify the indicators according to the level of granularity of analysis that they permit;
  • Classify the indicators according to the means and time required for collecting and analysing them.

5.5.3 Performance indicators of environmental aspects

Exhaust emissions include many different substances, such as HC, CO, NOx, PM, CO2, CH4, NMHC, Pb, SO2, N2O and NH3. Greenhouse gases (CO2, CH4 and N2O) impose the same societal cost wherever they are emitted, while the costs of the other substances depend on the geographical location.

There are two alternatives for quantifying exhaust emissions: measurement or calculation. For measurement, there are again two alternatives: on board or in the laboratory; the laboratory alternative demands the use of logged driving patterns. Because of the high complexity and cost of measuring exhaust emissions, calculated emissions are in practice in most cases the only reasonable alternative.

Models for exhaust emissions in general include three parts: cold start emissions, hot engine emissions and evaporative emissions. The following formula is a rough description of an exhaust emission model:

Total emissions = Σ (Traffic activity × Emission factor)

Traffic activity data include at least: mileage and engine starts. Hot emission factors for one vehicle are functions of the driving pattern and vehicle parameters. Cold start emission factors are functions of the engine start temperature, trip length and average speed. Evaporative emissions are to a large extent a function of fuel quality and fuel tank temperature variations.
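A minimal sketch of the formula above, covering only the hot and cold start parts and using placeholder emission factors (the numerical values are illustrative only, not real inventory data):

```python
# Placeholder emission factors; real values come from an emission inventory
# model and depend on vehicle parameters, driving pattern, start temperature etc.
HOT_EF_G_PER_KM = {"CO2": 180.0, "NOx": 0.4}    # hot engine, grams per km
COLD_START_EF_G = {"CO2": 90.0, "NOx": 0.6}     # extra grams per engine start

def total_emissions(mileage_km, engine_starts):
    """Total emissions = sum over activities of activity x emission factor."""
    return {
        pollutant: mileage_km * HOT_EF_G_PER_KM[pollutant]
                   + engine_starts * COLD_START_EF_G[pollutant]
        for pollutant in HOT_EF_G_PER_KM
    }
```

A complete model would add an evaporative term and make the factors functions of driving pattern, start temperature, trip length and fuel quality, as described above.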

Models on a micro level, including engine simulation, should in principle be able to describe most ICT functions. This is generally not the case for models on a macro level. Micro models are often used for emission factor estimation and macro models for total emission estimation.

The conclusion about what to include as performance indicators would then be: exhaust emissions, or measures with a high correlation to exhaust emissions.

5.5.4 Indicators of traffic efficiency

The efficiency of a traffic system can be measured as, for example, traffic flow, speed and density in relation to the optimum levels of these properties given the traffic demand and the physical properties of the road network.

A combination of FOTs and traffic modelling is required to allow estimation of traffic efficiency impacts of the tested technologies. A schematic picture of the proposed methodology is shown in Figure 5.1.


Figure 5.1: FESTA Traffic efficiency estimation based on FOT results

Driver behaviour data are based on the data collected in the FOT. These driver behaviour data will, together with the system functionality[1] of the tested technology, be used as input to traffic modelling in order to aggregate the individual driver/vehicle impact into traffic efficiency effects. This requires that both driver/vehicle data of equipped vehicles and properties of the traffic system in which the vehicles have driven (henceforth referred to as Situational Variables[2]) are collected in the FOT.

The driver behaviour data required in order to estimate traffic efficiency for any type of FOT system are specified in terms of performance indicators and measures and included in the attached matrix. These data, along with the Situational Variables (which can be found in the Measures Table in the D2_1_PI_Matrix_Final.xls annex of Deliverable D2.1), should be ascertained both for the baseline case (non-equipped vehicles) and for equipped vehicles, so that comparisons can be made between the two.
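Once a performance indicator has been ascertained for both conditions, the baseline/equipped comparison can be sketched as below. This is illustrative only; a real FOT analysis would apply proper inferential statistics rather than a bare effect size:

```python
from statistics import mean, stdev

def compare_conditions(baseline, equipped):
    """Compare one performance indicator between the two conditions.

    baseline : indicator values from non-equipped (baseline) driving
    equipped : indicator values from equipped driving
    Returns the mean difference and a simple pooled effect size (Cohen's d).
    """
    diff = mean(equipped) - mean(baseline)
    pooled_sd = ((stdev(baseline) ** 2 + stdev(equipped) ** 2) / 2) ** 0.5
    return {"mean_difference": diff, "cohens_d": diff / pooled_sd}
```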

The appropriate traffic modelling approach will differ depending on the type of driving task supported by the considered technology. Michon’s (1985) hierarchical driving model can be applied to select a traffic modelling approach. To model systems that support tactical or operational driving tasks, it is appropriate to apply a traffic microsimulation model: a microsimulation model considers individual vehicles in the traffic stream and models vehicle-vehicle and vehicle-infrastructure interactions. To model systems that support strategic and some types of tactical driving tasks, it is appropriate to apply a mesoscopic traffic simulation model: a mesoscopic model considers individual vehicles but models their movements and interactions with a lower level of detail than microscopic models.

It is advisable to study traffic efficiency for a series of scenarios with varying penetration levels of the tested systems. The systems should also be studied under representative traffic volumes. This is achieved straightforwardly by running the traffic simulation model with different inputs. The situational data (both measured and modelled) will also contribute to the differences between the scenarios.
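The scenario grid described above can be sketched as a simple loop over penetration levels and traffic volumes; `simulate` stands in for a call into whichever micro- or mesoscopic model is used and is entirely hypothetical:

```python
def run_scenarios(simulate, penetration_levels, traffic_volumes):
    """Run the (external) traffic simulation over a grid of scenarios.

    simulate           : callable wrapping the traffic model; assumed to
                         return efficiency outputs (speed, travel time, ...)
    penetration_levels : fractions of equipped vehicles to study
    traffic_volumes    : representative demand levels (e.g. vehicles/hour)
    """
    results = {}
    for p in penetration_levels:
        for q in traffic_volumes:
            results[(p, q)] = simulate(penetration=p, volume=q)
    return results
```

The returned dictionary, keyed by scenario, can then be compared against the baseline scenario (zero penetration) on the efficiency indicators listed below.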

Outputs from the traffic models will be used to make comparisons of traffic efficiency for the studied scenarios. Example outputs of interest are traditional quality of service and traffic efficiency indicators such as speed, travel time, and queue length.

5.5.5 Acceptance and trust

Acceptability indicates the degree of approval of a technology by the users. It depends on whether the technology can satisfy the needs and expectations of its users and potential stakeholders. Within the framework of introducing new technologies, acceptability relates to social and individual aspects as well.

Regarding the dimension of “Acceptance and Trust”, the following subjective PIs should be focused on during FOTs:

  • Ex-ante usefulness (level of usefulness perceived by the user prior to usage): before using a system, which dimensions of usefulness occur to the future user immediately? What benefits does he or she expect from using the system?
  • Ex-post usefulness (level of usefulness perceived by the user after practice with the system): after a first use of a system, what are the user’s impressions regarding the system’s benefits? Ex-post usefulness is to be analysed in relation to the statements on the “ex-ante usefulness” indicator; the reactions to both indicators will give useful information for system acceptance. The measurement of these two indicators can be operationalised via self-designed questionnaires based on established methodological approaches (see Nielsen, 1993; Grudin, 1992). A qualitative approach such as a Focus Group with a formalised protocol and individual in-depth interviews is also appropriate.
  • Observed rate of use of the system, or of specific system parts, represents an additional indicator for system acceptance and perceived usefulness.
  • Perceived system consequences (perception of positive or negative consequences of system use) is another key indicator for system performance: the user expresses his/her impressions and attitudes regarding the potential consequences of using the system, which can be positive as well as negative. These impressions can best be collected via an interview and explored in Focus Groups, which have the advantage that group dynamics can provide additional information on the subjective norm. Construction of standardised questionnaires is possible as well (for methodological background on this indicator, see Featherman and Pavlou, 2003).
  • Motivation (level of motivation/impetus to use the system) should be connected with the indicator Behavioural intention (level of intention to use the system). Both indicators can best be investigated via self-designed questionnaires based on established methodological findings (see Armstrong, 1999; Ajzen and Fishbein, 1980).
  • Response to perceived social control/response to perceived societal expectations indicates the impact of perceived social control on the user’s behaviour. This is a more sociological indicator, which should give an indication of whether the user feels a social benefit (for example, social recognition) when using the system or, on the contrary, hesitates to use the system for fear of social disapproval (see Castells, 2002).
  • Usability/level of perceived usability concerns the user’s general capacity to interact with the system (including installation and maintenance issues; see Shackel and Richardson, 1991; Grudin, 1992).

For these indicators, the combination of in-depth interviews, Focus Groups and self-designed questionnaires based on established methodology is recommended.

5.5.6 Driver characteristics

Even though driver characteristics are not performance indicators in themselves, they are important as Situational Variables, which is why they are included in this section. The focus here is on describing the drivers that participate in the study, as opposed to selecting drivers based on certain characteristics, which is treated in Chapter 6. Drivers differ on a large variety of characteristics, all of which may influence how they drive and how they use different systems and services. These differences may be important to take into account when planning an FOT. Four categories of driver characteristics may be distinguished:

  • Demographic characteristics: gender, age, country, educational level, income, socio-cultural background, life and living situation, etc.
  • Driving experience, and driving situation and motivation: experience in years and in mileage, professional, tourist, with or without passengers and children etc.
  • Personality traits and physical characteristics: sensation seeking, locus of control, cognitive skills, physical impairments or weaknesses, etc.
  • Attitudes and intentions: attitudes towards safety, environment, technology etc.

Studies often focus on characteristics of individual drivers. However, drivers are not alone on the road. There are other road users and there may be passengers in the vehicle, which may influence the driver’s behaviour.

There are several different reasons for considering driver characteristics:

  • To make sure that the sample of drivers is representative of the target population.
  • To explain the outcomes of the FOT.
  • To improve systems and services, taking into account differences between drivers.

Driver characteristics may play different roles in FOTs:

  • Characteristics that drivers possessed before the FOT may play a role in how they behave in traffic during the FOT.
  • Although some characteristics are stable, others may change when using a system or service in the FOT. Attitudes may change radically after using a system for a longer period of time.

In general it is useful in an FOT to gather as many characteristics of drivers as practically possible. Even if no specific impacts are expected of certain characteristics, some outcomes may be explained better with more knowledge about the participants. A minimum set of data such as age, gender, income group and educational level is easy to gather from participants.

Next, information is needed about driving experience. Usually this is measured by means of self-reports. The amount of practice, i.e. the mileage of an individual driver, can be collected by asking the subject for an estimate of his/her overall mileage since licensing or of the current mileage per year. However, be aware that these self-reports are not very reliable.

For further understanding of driver behaviour, one may consider using questionnaires on attitudes, driving behaviour and personality traits. A well-known questionnaire about (self-reported) driving behaviour is the Driver Behaviour Questionnaire. Widely used personality tests include the Five Factor Model (FFM) test and the Traffic Locus of Control test (T-LOC). Special attention may be given to the personality trait of sensation seeking, which is correlated with risky driving; the Sensation Seeking Scale (SSS) measures this trait. These questionnaires are available in many different languages, but they are not always standardised, and cultural differences may play a role. Personality traits are very easy to measure, simply by administering a short questionnaire. However, the concepts and interrelations of factors are very complex, and results should be treated with caution.

When evaluating the acceptance and use of new systems in the vehicle, drivers’ acceptability of technology is important. Both social and practical aspects play a role. Technology acceptance has different dimensions, such as diffusion of technology in the drivers’ reference group, the intention of using the technology, and the context of use (both personal and interpersonal). Measuring acceptability can be realized via (existing) standardised questionnaires, in-depth interviews before and after “use” (driving), and focus groups.

5.6 Iteration

When the performance indicators have been defined, it is recommended to re-check whether these indicators are indeed capable of testing the hypotheses defined earlier, and if necessary to adjust the hypotheses or the indicators. Available resources will play a major role in determining which performance indicators to use. It is also necessary to look forward in the FESTA chain and to consider data storage and analysis. If a large number of performance indicators has been selected, or if the performance indicators require a huge amount of data to be collected, considerations about data collection and storage capacity come into play, as well as the question of how to analyse those data. For example, video data require a large storage capacity and ample resources to analyse. If there are foreseeable problems with this, it may be necessary to limit the number of performance indicators.


  1. System functionality refers to the way in which the tested FOT system works. Information on when and how the system operates can be used to create parameters for the models developed.
  2. Situational Variables are not necessarily directly relevant for Performance Indicators or Derived Measures, but must also be measured or recorded, as they provide key background information that complements the driver behaviour data and is sometimes needed to derive the driver behaviour data. Examples include light conditions and road type.