FESTA handbook Experimental procedures

6. Experimental procedures

This section of the handbook provides guidance on the overall experimental design of FOTs in order to ensure experimental rigour and scientific quality. The first section, Participants, provides advice on participant selection, including demographics, driving experience, personality and attitudes, along with consideration of sample size. The second section, Study design, provides guidance on the formulation of hypotheses, experimental design and possible confounds. The third section, Experimental environment, describes how the road environment (road type, weather conditions, etc.) plays a part in the design of an FOT and in the subsequent data analysis. The fourth section explains piloting, and the last section explains the methods of controlled and semi-controlled testing.

6.1 Participants

6.1.1 Characteristics

Depending upon the research questions, there is often a need to select a particular group of participants for inclusion in the FOT and ensure that this group is in some way representative of those drivers who will ultimately interact with the system.

The types of variables that should be taken into account include:

  • Demographic variables, such as age, gender, socio-economic variables, and permanent or temporary driver impairments
  • Driving experience, both in general and with various systems, accident history, and usual driving times and roads used
  • Personality and attitudes.

The first two of these variables are relatively easy to measure using questionnaires; the data are objective and can be verified by the experimenter. Personality and attitudes, however, deserve more attention, as there are a number of different ways in which they can be evaluated. FOTs may incorporate a battery of psychometric measures, which are generally included in order to relate psychological factors to driving behaviour. Since drivers exhibiting certain traits or attitudes are known to engage in riskier driving behaviours, it is important that the systems under investigation in FOTs are trialled amongst a range of drivers to ensure that the systems work for those who need them most.

Personality aspects that may be taken into account are:

  • Sensation seekers, who tend to drive more recklessly
  • Locus of control: drivers with an internal locus of control tend to stay directly involved in the driving task, relying on their own skills, whilst those with an external locus of control may be more likely to rely on the system and disengage from the driving task
  • Drivers’ attitudes towards road safety issues.

Personality and attitudes are known to affect the ways in which drivers interact with systems, and it may therefore be of interest to preselect certain personality types in much the same way as one would sample, e.g., young males or elderly drivers for a particular trial.

Recruiting on the basis of personality and attitudes helps ensure that a system is tested on a broad range of drivers who may interact with it very differently. It may also be appropriate because these characteristics are likely to influence behaviour directly: variations in beliefs are likely to explain differences in driver behaviour and system use. Before beginning recruitment for any FOT, researchers must consider the relationship between individual differences and the behaviour that the system is seeking to influence.

In addition to being used for selecting drivers, personality and attitude measures can be used as covariates in the analysis in order to identify differences in driver behaviour and system use between groups. It is not imperative that FOTs base their recruitment on such measures. However, including them in the experimental design provides useful insight into the manner in which individual characteristics influence behavioural adaptation to new systems.

Before deciding to recruit on a personality or attitudinal basis, researchers should consider that tightening the inclusion criteria of any study inevitably shrinks the pool of eligible participants. It may therefore be necessary to screen a large number of drivers in order to recruit a relatively small number of participants with the appropriate characteristics, particularly since certain individuals will be less inclined to volunteer to trial certain systems. For example, since speeding is a thrill-seeking behaviour, high sensation seekers may be less likely to volunteer for an ISA trial. Selecting participants on additional measures such as these will therefore inevitably increase the burden of the recruitment phase of any FOT.
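The progressive shrinking of the participant pool compounds multiplicatively: each inclusion criterion, and the willingness to volunteer, keeps only a share of the drivers who passed the previous step. The following is a minimal sketch of the resulting screening arithmetic; the function name and the pass rates in the example are hypothetical placeholders, not figures from any FOT.

```python
import math

def drivers_to_screen(target_n, pass_rates):
    """Estimate how many drivers must be screened to recruit target_n participants.

    pass_rates -- hypothetical share of drivers surviving each successive step,
                  e.g. [meets basic criteria, high sensation seeker, volunteers]
    """
    remaining_share = 1.0
    for rate in pass_rates:
        if not 0 < rate <= 1:
            raise ValueError("each pass rate must lie in (0, 1]")
        remaining_share *= rate  # every extra criterion shrinks the eligible pool
    return math.ceil(target_n / remaining_share)

# Assumed example: half of the screened drivers meet the basic criteria and a
# quarter of those qualify as high sensation seekers (volunteering ignored here).
print(drivers_to_screen(100, [0.5, 0.25]))  # → 800
```

Adding a further step with, say, a 50 % volunteering rate doubles the number of drivers to screen to 1600, which illustrates the recruitment burden described above.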

6.1.2 Sample size and power analysis

FOT studies should be able to assess the functionality of the ICT systems and their impact on driver behaviour, traffic safety, the environment, etc. When the chosen sample size is too small, it is difficult to statistically prove effects of the system that are actually present. With very large samples the chance of detecting an effect increases. However, there are two major drawbacks to using very large sample sizes:

  • Every driver/participant needs a car equipped with the system and with a data logging system, which is expensive.
  • Small effects that are statistically significant might be found, but they might not be relevant in practice given their small effect size.

The appropriate sample size[1] for an FOT depends on a number of choices that have to be made in the final set-up, for instance the number of ICT systems to be tested and the choice between a between-subjects design (two separate groups of drivers, one with and one without an ICT system, but both always with a data logger) and a within-subjects design (each participant drives for a certain amount of time with and without the ICT system).

To ensure that the chosen sample size is representative of the behaviour of a group of drivers and that real effects can be statistically proven, a power analysis is needed to calculate the required sample size. This power analysis is based on the following assumptions:

  • The FOT is based on a between-subjects design, in which different groups of drivers each drive with a different system, or at least one group drives with an ICT system and one group without an ICT system
  • The power is 80 %, i.e. the chance of statistically proving a difference between the groups when it is really there (and hence a 20 % chance of failing to prove it)
  • The alpha level is 5 % (i.e. the chance of falsely finding a significant effect)
  • Testing is two-tailed, because there is no reason to assume that either one of the groups performs better or worse than the other
  • The effect size is 0.2, which is typical for the small effects that can be expected in an FOT, where there are many more disturbing factors than in more experimental test set-ups

An effect size of 0.5 is typical for a medium-sized effect. EuroFOT analysis has indicated that it is more effective to increase the number of drivers than to extend the time period of data collection (Jamson et al., 2009).

Figure 6.1: Total sample size as a function of the statistical power and the effect size (2-sided test, alpha = 0.05, independent samples).

Figure 6.1 shows that a total sample size of 800 drivers (i.e. two groups of 400) would be needed to statistically prove small effects between the two groups. The groups are relatively large to compensate for the relatively high number of disturbing factors when trying to find effects in real traffic. If medium-sized effects are expected, groups of only 75 drivers would be sufficient. If a within-subjects design is chosen, one group of 400 drivers would be sufficient to test both the without-system and with-system conditions. In practice, recruiting the specified target sample may prove very difficult. In order to compare results between countries, it would be ideal to have the same equipment and comparable groups of drivers in all countries. This can, however, be difficult, because the penetration rates of different car makes in the vehicle fleet differ between countries, and because drivers and their 'national driving styles' differ.
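The sample sizes discussed above follow from the standard power calculation for comparing two means. The following is a minimal sketch, assuming a normal approximation to the two-sided t-test, so its results can differ somewhat from exact t-test values and from numbers read off Figure 6.1. The function names are illustrative, and the correlation `rho` between a driver's scores with and without the system is an assumed input that this handbook does not specify.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution (mean 0, SD 1)

def z_sum(alpha, power):
    """z_(1 - alpha/2) + z_(power) for a two-sided test."""
    return Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)

def n_between(d, alpha=0.05, power=0.80):
    """Drivers per group for two independent groups: n = 2 * (z_sum / d)^2."""
    return math.ceil(2 * (z_sum(alpha, power) / d) ** 2)

def n_within(d, rho, alpha=0.05, power=0.80):
    """Drivers needed when each driver drives both with and without the system.

    rho -- assumed correlation between a driver's two scores; the SD of the
           within-driver difference is sqrt(2 * (1 - rho)) between-driver SDs.
    """
    if not -1 < rho < 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    return math.ceil((z_sum(alpha, power) * math.sqrt(2 * (1 - rho)) / d) ** 2)

def power_between(d, n, alpha=0.05):
    """Approximate power with n drivers per group (ignores the opposite tail)."""
    return Z.cdf(d * math.sqrt(n / 2) - Z.inv_cdf(1 - alpha / 2))

for d in (0.2, 0.5):
    n = n_between(d)
    print(f"d = {d}: {n} per group, {2 * n} in total")
print("within-subjects, d = 0.2, rho = 0.0:", n_within(0.2, 0.0))
print("within-subjects, d = 0.2, rho = 0.5:", n_within(0.2, 0.5))
print(f"power with 400 per group, d = 0.2: {power_between(0.2, 400):.2f}")
```

Under these assumptions the sketch gives 393 drivers per group (786 in total) for d = 0.2, 63 per group for d = 0.5, and about 81 % power for two groups of 400. With rho = 0 a within-subjects design needs about 393 drivers in total, in line with the single group of about 400 mentioned above. A positive correlation, which is what usually gives within-subject designs their extra power, lowers the number further (197 drivers for rho = 0.5).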

6.2 Study design

6.2.1 Hypothesis formulation

Hypothesis formulation is described in section 4.2.4 and its subsections.

As a general rule, research practice proceeds in the following way:

  1. Formulation of the hypothesis
  2. Testing the hypothesis
  3. Acceptance or rejection of the hypothesis
  4. Replication of the results or (in the case of rejection) refinement of the hypothesis

A hypothesis is a specific statement that can be tested statistically by analysing measures and performance indicators. It is a tentative explanation for certain behaviours, phenomena or events that have occurred or will occur. It is essential for an FOT to be designed with clear hypotheses in mind in order to aid the interpretation of the results.

In formulating a hypothesis, consideration should be given to the variables under scrutiny. It is vital that the variables collected in an FOT allow the researcher to accept or reject the hypotheses. To do this, both the independent and dependent variables should be well defined at the start of the FOT. The independent variable is one that can be manipulated by the researcher. As the researcher changes the independent variable, he or she records what happens using the dependent variable(s), whose values are hypothesised to depend on the value of the independent variable. Other variables, known as controlled or constant variables, are those that the researcher wants to keep constant; they should therefore be observed as carefully as the dependent variables. Most studies have more than one controlled variable.

6.2.2 Experimental design

The two basic types of experimental design are the within-subject design (sometimes also referred to as a crossed design) and the between-subject design (sometimes also referred to as a nested design). FOTs also need to contain a control condition, in which subjects do not receive any treatment. This condition serves as the baseline: it shows how drivers behave when there is no treatment or experimental manipulation at all.

Within subject design

In a within-subject design, each subject encounters every level of treatment, i.e. experiences all experimental manipulations. For example, in an FOT evaluating navigation systems, every subject drives for some time with the system (experimental condition) and for some time without it (control condition). In this case, half of the subjects would start with the control condition and then switch to the navigation (experimental) condition, while the other half would do the reverse.
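The counterbalancing of condition order described above can be implemented as a random split of the participants into two order groups. The following is a minimal sketch; the function name and the condition labels are hypothetical.

```python
import random

def counterbalance(participant_ids, seed=None):
    """Randomly assign half of the participants to each condition order.

    Returns {participant_id: (first_condition, second_condition)}.
    With an odd number of participants, the system-first order gets one extra driver.
    """
    rng = random.Random(seed)  # seeded for a reproducible allocation
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {
        pid: ("control", "system") if i < half else ("system", "control")
        for i, pid in enumerate(ids)
    }

orders = counterbalance(["D01", "D02", "D03", "D04"], seed=42)
for driver, (first, second) in sorted(orders.items()):
    print(driver, first, "->", second)
```

Randomising who gets which order avoids confounding the order with, for example, the recruitment date. Counterbalancing spreads carry-over effects over both conditions but does not remove them.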

This type of design has two advantages: (1) fewer subjects are needed than in a between-subject design, and (2) it is more likely to detect a significant effect, given that the effect is real. The power of a within-subject design is higher than that of a between-subject design because the error variance is reduced: each subject serves as his or her own control, so individual differences between subjects do not contribute to the differences between treatment measures. A disadvantage is the risk of carry-over effects, which means that experiencing one condition may affect a subject's driving in the other condition.

Between subject design

In a between-subjects design, each subject participates in only one experimental (or control) condition. The major distinguishing feature is that each subject has a single score (with or without the system). Note that this single score can still be based on driving on various types of road over long periods of time, and can cover different aspects such as driving behaviour, workload and comfort.

The advantage here is that carry-over effects are not a problem, as each individual is measured in only one condition. The disadvantage is that the total number of subjects needed to detect effects is greater than with within-subject designs, and the more treatments a between-subject design includes, the more subjects are needed altogether. To limit confounding effects due to individual differences in a between-subject design, one should use either random assignment, in which subjects are assigned to treatments at random, or matched groups (also called matched pairs), in which one also makes sure that the groups are comparable with respect to pre-selected characteristics, such as gender and age. To do this, one needs to identify the variables to be matched across the groups, measure the matching variables for each participant, and assign the participants to groups by means of restricted random assignment to ensure a balance between groups. Alternatively, one can keep the variable constant or restrict its range. This reduces differences within each group and therefore reduces within-treatment variability.
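The restricted random assignment described above can be implemented as stratified randomisation: drivers are grouped by the matching variables, and random allocation is carried out separately within each group. The following is a minimal sketch; the data layout (dicts with an `id` field), the age bands and the group labels are assumptions made for illustration.

```python
import random
from collections import defaultdict

def age_band(age):
    """Hypothetical age bands used as a matching variable."""
    if age < 30:
        return "18-29"
    if age < 60:
        return "30-59"
    return "60+"

def stratified_assignment(drivers, stratum_of, groups=("system", "control"), seed=None):
    """Restricted random assignment: randomise separately within each stratum.

    drivers    -- list of dicts, each with an "id" key
    stratum_of -- function returning a driver's matching variables, e.g. (gender, age band)
    Within a stratum, group sizes differ by at most one.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for driver in drivers:
        strata[stratum_of(driver)].append(driver)
    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        # Random starting group, so odd-sized strata do not always favour groups[0]
        offset = rng.randrange(len(groups))
        for i, driver in enumerate(members):
            assignment[driver["id"]] = groups[(i + offset) % len(groups)]
    return assignment

drivers = [{"id": i, "gender": "F" if i % 2 else "M", "age": 20 + (7 * i) % 60}
           for i in range(40)]
allocation = stratified_assignment(drivers, lambda d: (d["gender"], age_band(d["age"])), seed=1)
```

Balancing within strata keeps the groups comparable on gender and age, but every additional matching variable multiplies the number of strata, and with it the sample pool needed to fill them.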

The main drawback of the matched pairs design lies in the sampling process. As the number of characteristics that require matching increases, a correspondingly larger sample pool is required to make adequate matching possible. A further problem is that this design assumes that the researcher actually knows which extraneous factors need to be controlled for (i.e. matched), and in some circumstances this may not be the case.

Longitudinal and Cross-Sectional Designs

One question an FOT may have to answer is whether the effect of a treatment (e.g. driving with a system) changes over time. To investigate this, longitudinal or cross-sectional designs can be employed. While longitudinal studies of this type can be very useful, they do not explain why changes may or may not have occurred. If such changes are measured in an FOT, the researchers should therefore already have a clear idea of why a positive effect might disappear after a while. One such reason could be risk compensation (because the system warns the driver, he or she may simply keep driving until a warning is given).

One of the difficulties with longitudinal studies is that it is hard to keep subjects motivated during the entire study period; participants may also move away or become ill. Because of these difficulties, other methods for investigating changes over time have been developed, and the cross-sectional design offers an alternative.

The cross-sectional design looks at changes over time by taking a number of cross-sections of the population at the same instant in time. This is obviously quicker and less costly than a longitudinal study, and there is a lower chance of ‘losing’ participants during the experiment. On the other hand, a main drawback of the cross-sectional study relates to the participants' differing previous experiences, which may affect the findings.

Baseline and treatment period

The baseline period is often squeezed by project constraints and is frequently short, especially in relation to the treatment period. Ideally, the two periods would be of equal length, so that the same variations that may occur in the treatment phase (such as seasonal effects, see 6.3.6) also have the opportunity to occur in the baseline period. The more data are available, the more robust the results.

6.2.3 Threats to validity: confounds and other interfering effects

As a general rule, the results of an empirical study should allow a clear decision on whether the hypothesised relationships between variables exist, i.e. whether the hypotheses can be accepted or have to be rejected. In the best case, the researcher is able to attribute the changes observed in the dependent variable, without any doubt, to the manipulation of the independent variable. The internal validity of an experimental or quasi-experimental study describes the extent to which this inference is unequivocally possible because the study has been designed in such a way that alternative explanations for the effects are implausible or can be excluded. The internal validity of a study increases to the extent that such alternative explanations can be ruled out. In the literature, these factors are also described as confounding variables, which need to be controlled by appropriate measures right from the beginning of a study.

Several effects described in the literature interfere with the effect of an independent variable on a dependent variable and reduce internal validity if they are not controlled by measures implemented in the experimental design. The following effects constitute threats to the internal validity of FOTs:

  • History: Unplanned events unrelated to the study might affect the relationship between independent and dependent variables. For example, during an FOT an important provision of the road code might be changed (e.g. new speed limits for certain road categories), accompanied by increased police surveillance activities.
  • Maturation: Mainly effects due to experience and learning which affect the dependent variable and are (in long-term studies) erroneously attributed to the independent variable.
  • Testing: If the behaviour of interest is sampled at different times, repeated testing may have a biasing effect, e.g. because subjects become more familiar with the test situation. For FOTs this might be relevant if subjects are tested at different times over the course of the study, but not if their behaviour is sampled continuously and more or less unobtrusively.
  • Selection: In general, participation in an FOT is voluntary, which means that the strategy for recruiting subjects can have a biasing effect. For example, offering a certain amount of money (e.g. 500 Euro) as compensation for the effort of completing the study might be an incentive for participants with a low income, whereas it might insult people with a very high income.
  • Drop-out: During an FOT one has to take into account that not all subjects will complete their participation as planned. This drop-out can bias the results of an FOT if the subjects who quit early differ systematically from those who complete the study with regard to relevant characteristics (e.g. socio-economic status, age, gender).
  • Experimenter bias: Effects on the dependent variable that result from the social interaction between the experimenter and the subjects. These might occur, for example, if at the beginning of an FOT the experimenter explains the system functions very carefully to some subjects out of sympathy, whereas he or she is careless about this with others.
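One simple way to check for the drop-out bias described above is to compare drivers who completed the FOT with those who quit early on each recorded characteristic, for example using the standardised mean difference. The following is a minimal sketch; the function name and the example ages are hypothetical, and any threshold for calling a difference "systematic" is a choice for the analyst rather than a value given in this handbook.

```python
import math
from statistics import mean, stdev

def standardised_difference(completers, dropouts):
    """Standardised mean difference (pooled-SD Cohen's d) between drivers who
    completed the FOT and drivers who dropped out, for one numeric characteristic."""
    n1, n2 = len(completers), len(dropouts)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two drivers in each group")
    pooled_var = ((n1 - 1) * stdev(completers) ** 2 +
                  (n2 - 1) * stdev(dropouts) ** 2) / (n1 + n2 - 2)
    return (mean(completers) - mean(dropouts)) / math.sqrt(pooled_var)

# Hypothetical ages: drop-outs are markedly younger than completers
completer_ages = [30, 40, 50]
dropout_ages = [20, 30, 40]
print(standardised_difference(completer_ages, dropout_ages))  # → 1.0
```

A large positive or negative value for a characteristic such as age suggests that the drivers who remain in the FOT are no longer representative of the recruited sample.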

6.3 Experimental environment

The experimental environment is a critical element of an FOT, since it determines the data that are collected and the ability to fulfil the objectives of the FOT. In general, environmental factors can be treated in several different ways. They can be:

  • Explicitly included in an FOT because there is a particular interest in data connected to that environmental factor (e.g. motorway routes for lane departure warnings)
  • Explicitly included in an FOT because these environmental factors are part of the range occurring within a normal driving scenario (e.g. night time driving)
  • Measured systematically so that the data relating to that environmental factor can be included in the post-trial data analysis (e.g. vehicle headways)
  • Recorded (in varying levels of detail), so that portions of data can be excluded from analysis (e.g. heavy rain, where all or some of the data from a particular day may be discarded; or overtaking manoeuvres where short periods of data within a larger set are discounted during a study of steady following behaviour).
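The last option, excluding recorded portions of data, amounts to filtering the logged samples before analysis. The following is a minimal sketch, assuming the log is a list of (timestamp, value) pairs and that the days and intervals to be discarded have already been identified; the function name, parameter names and example values are hypothetical.

```python
from datetime import datetime

def filter_samples(samples, excluded_days=(), excluded_intervals=()):
    """Keep only logged samples outside the excluded days and intervals.

    samples            -- iterable of (timestamp, value) pairs, timestamps as datetime
    excluded_days      -- dates whose data are discarded entirely (e.g. heavy rain)
    excluded_intervals -- inclusive (start, end) datetime pairs to discount
                          (e.g. overtaking manoeuvres in a car-following study)
    """
    days = set(excluded_days)
    intervals = list(excluded_intervals)  # may be re-scanned for every sample
    kept = []
    for ts, value in samples:
        if ts.date() in days:
            continue
        if any(start <= ts <= end for start, end in intervals):
            continue
        kept.append((ts, value))
    return kept

# Hypothetical headway samples in metres
log = [(datetime(2024, 5, 1, 8, 0, 0), 31.2),   # steady following
       (datetime(2024, 5, 1, 8, 0, 5), 12.4),   # during an overtaking manoeuvre
       (datetime(2024, 5, 2, 9, 0, 0), 40.0)]   # on a heavy-rain day
clean = filter_samples(
    log,
    excluded_days=[datetime(2024, 5, 2).date()],
    excluded_intervals=[(datetime(2024, 5, 1, 8, 0, 4), datetime(2024, 5, 1, 8, 0, 6))],
)
print(clean)
```

Keeping the exclusion lists separate from the raw log means the same data can be re-analysed later with different exclusion rules.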

6.3.1 Geographical location

In line with the above, the geographical location can be chosen because it is representative of the intended area of use for a vehicle/system (e.g. predominantly motorway environments). Alternatively, the geographical area can be chosen because it displays the characteristics needed to collect the specific data the study is interested in during the FOT (e.g. the choice of mountainous and/or northern European environments in order to collect data on the use of systems in cold environments).

The population within a particular geographical location may affect the running of the FOT; for example, certain cultural issues, population characteristics, car ownership, use of new technologies and language issues may become apparent. In addition, the characteristics of the roads and the prevailing traffic may be important, including:

  • Road type and localities present
  • Traffic patterns, such as types of journeys (e.g. commuter or tourist travel), traffic flow, traffic density, vehicle types, and frequency and sophistication of journeys
  • Other transport options, the availability and costs and the inducement or penalties to encourage particular transport mode choices
  • Legal, regulatory and enforcement environment, such as speed limits, levels of enforcement of traffic regulations (e.g. speed cameras), penalties for traffic or other violations, standardisation (e.g. compliance of road signs with international standards).

The geographical location may also have implications with regards to technical and other study issues, including infrastructure and data communication issues such as:

  • Network/beacon infrastructure for vehicle-infrastructure communication
  • Network coverage/reliability for telecommunications, especially if automatic over-the-air data transmission is used instead of manual data download
  • Localised GPS coverage issues (e.g. urban canyons, foliage cover)
  • Logistical issues: in both the validation and the experimentation phases, safe and secure access to infrastructure equipment should be ensured for validating the functions (especially in the case of Cooperative Systems), for data download (if remote access is not available) and for maintenance. Target vehicles must likewise be accessible for data download (if data are not transmitted over the air) and for maintenance.
  • The availability and quality (resolution, scope and depth of content) of electronic maps that can integrate vehicle location for situation evaluation. Moreover, in case of complex functions and especially for Cooperative Systems, high accuracy maps may be required in order to implement these functions.
  • Availability of other data, e.g. from the police, highway authorities, fleet operators, maintenance personnel.

The most important point in relation to the geographical area is that it must be chosen based specifically on the objectives of the particular FOT and, in particular, on the validity of the data being collected. There are two overall considerations:

  • Does a particular geographical aspect need to be considered because it is relevant to the types of vehicles and/or systems being studied?
  • Does a geographical aspect need to be considered to ensure that the results obtained can be generalised to the wider ‘population’ of interest (i.e. external validity)?

The starting point is to consider the overall objectives of the FOT, including the types of cars and systems that will be incorporated into the trial. The second major consideration is the generalisation of the results: geographical aspects must be included so that the data collected during a specific FOT can be generalised to the wider population of interest. The third factor to consider is whether the geographical factor is of particular interest for the data analysis. If it is desirable to analyse results according to the presence or absence of a particular factor, then the geographical environment(s) must include that factor (and possibly variations of it). Finally, it is important to note that the decision to collect data in a specific country might, due to that country's legal requirements, affect how personal data in particular can be collected, handled and shared.

6.3.2 Road type

The road type is the environmental factor that perhaps has the greatest dynamic influence on individual and collective driver behaviour, and hence on safety, mobility, traffic efficiency and the environment within an FOT. It is highly dependent on the geographical area, as discussed above.

The road type will encompass a number of variables which will influence driver use of systems, driver attitudes, driver behaviour, and driver outcomes. The FOT may want to include roads with specific characteristics, including:

  • Surfaced or unsurfaced roads
  • Minimum, average and maximum speeds of traffic
  • Number of lanes, and presence of lane marking
  • Visibility (of the environment and other traffic)
  • The types of manoeuvres that a driver will need to undertake (e.g. stopping at traffic lights, or overtaking manoeuvres)
  • Typical vehicular headways
  • Presence of safety features such as rumble strips or speed cameras.

Three main categories of road should be differentiated:

  • Urban
  • Rural
  • Motorway

Note that road classifications differ between countries and there is no standard European classification. Ideally, a map and a database of the FOT deployment region should be established in advance in order to reduce the time needed afterwards to collect this type of data (e.g. from the video recordings of the road scene). An electronic map containing at least the road types and the speed limits in force (and the locations of speed cameras) would greatly facilitate this task.

6.3.3 Traffic conditions and interactions with other road users

Traffic conditions and interactions with other road users are important considerations. A distinction needs to be made between:

  1. Traffic conditions in a general sense, which characterize a general level of constraints and which, in the same manner as the infrastructure zones, define the driving environment
  2. Other road users and their behaviour, which characterize an individual level of interaction between the driver and one or more other road users in the driver’s immediate proximity.

The traffic, as a general and contextual entity, can be characterized using several dimensions, for example:

  • Density: expressed in terms of the number of vehicles per unit length of road (e.g. vehicles per kilometre)
  • Stability: this can be within a traffic stream (in which case it is expressed in terms of the frequency of speed variations on a traffic lane in a given unit of time) or between different traffic streams (in which case it is expressed in terms of the frequency of lane changes in a given unit of time)
  • Speed: the average speed of traffic
  • Composition: types of vehicle (light vehicle, heavy vehicle, van, motorcycle) and their relative proportions in a given traffic stream.
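These dimensions can be turned into simple indicators once the relevant traffic data are available. The following is a minimal sketch; the function names, the units and in particular the 5 km/h threshold for counting a "speed variation" are assumptions for illustration, not definitions from this handbook.

```python
from collections import Counter

def density(vehicle_count, section_length_km, lanes=1):
    """Vehicles per kilometre per lane on a road section at one instant."""
    return vehicle_count / (section_length_km * lanes)

def speed_variation_rate(lane_speeds_kmh, interval_s, threshold_kmh=5.0):
    """Within-stream stability: speed variations per minute on one lane.

    lane_speeds_kmh -- lane speeds sampled every interval_s seconds
    threshold_kmh   -- assumed minimum change counted as a 'variation'
    """
    if len(lane_speeds_kmh) < 2:
        raise ValueError("need at least two speed samples")
    variations = sum(1 for a, b in zip(lane_speeds_kmh, lane_speeds_kmh[1:])
                     if abs(b - a) >= threshold_kmh)
    minutes = interval_s * (len(lane_speeds_kmh) - 1) / 60
    return variations / minutes

def lane_change_rate(lane_changes, minutes):
    """Between-stream stability: lane changes per minute."""
    return lane_changes / minutes

def composition(vehicle_types):
    """Relative proportion of each vehicle type in the traffic stream."""
    counts = Counter(vehicle_types)
    total = sum(counts.values())
    return {vtype: n / total for vtype, n in counts.items()}

print(density(60, 2.0, lanes=3))                          # vehicles/km/lane
print(speed_variation_rate([100, 100, 90, 92, 80], 30))    # variations/min
print(composition(["light vehicle", "light vehicle", "heavy vehicle", "light vehicle"]))
```

The speed dimension is simply the mean of the observed speeds, so it needs no separate helper here.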

The interactions at individual level between the driver and one or more other road users in the immediate vicinity can also be characterized using several dimensions:

  • The category to which they belong (light vehicle, heavy vehicle, van, motorcycle, pedestrians)
  • Their speed and acceleration (direction and rate)
  • Their manoeuvres and behaviour (merging into the subject’s lane or pulling out into a lane, merging from an entry slip road, braking, etc.).

Other characteristics to be taken into account are:

  • Route choice
  • Temporary road/traffic variables
  • The traffic encountered
  • Impact of road measures on driver behaviour
  • Static and dynamic variables associated with the road

6.3.4 Roads to include

When setting up and running an FOT, it is necessary to consider the extent to which specific road types need to be incorporated into the trial and hence which participants need to be selected. The basic questions to consider are:

  • Are specific road types needed to answer the research questions for that sample?
  • Would any system of interest be used on a range of different road types?
  • Is driver behaviour (in terms of safety, mobility, traffic efficiency and environmental impact) expected to differ according to the type of road being travelled on?
  • Is it necessary to compare results for different road types?
  • Is it necessary to include specific road types in order to generalise the results to a wider population?
  • Are interactions with other road users to be included in the analysis? If so, video equipment needs to be installed.

By considering the above questions, one can determine whether a range of different road types is needed, or whether the FOT can concentrate on collecting data on specific road types. In an FOT, the objective is usually to study normal driving behaviour. This means that drivers should not be encouraged to change their normal routes.

6.3.5 Weather conditions

Weather conditions are hard to predict, control for, or measure accurately in an FOT. However, weather conditions and associated factors such as ambient lighting are relevant aspects for all FOTs, irrespective of the overall purpose of the study. A well designed FOT must consider a range of weather-related issues, with a view to including, targeting or excluding particular weather conditions. In order to include weather as an experimental variable within analysis, or to specifically include or exclude data for analysis, it is necessary to use a consistent taxonomy and definition of weather conditions.

Related to how weather factors are measured is the level of accuracy of the measurement, including its location and time attributes. A further complication is that it is often combinations of weather and other dynamic and static factors that have a practical impact on an individual driver or on general traffic conditions within an FOT. Extreme weather conditions present a risk to FOTs because they often cannot be predicted and can make journeys impossible, prevent access to vehicles or, in the worst case, destroy equipment.

Data may be confounded by abnormal weather, for example snowfall increasing headways and reducing traffic speed, or bright sunshine causing glare on in-vehicle screens or momentary distraction to drivers.

There are several ways of potentially measuring weather conditions:

  • In real-time using direct measurement of the factor, e.g. vehicle sensor to measure ambient temperature (which could then be used to link the use of features to outside temperature).
  • Indirect real-time measurement using a surrogate sensor, e.g. recording the use of the windscreen wipers to indicate when it is raining.
  • Subjective rating scales (completed by the driver or another person), e.g. a driver assessment of the degree of rainfall.
  • Post-hoc data mapping – the use of weather records to estimate the weather conditions.
  • Post-hoc analysis of video data by a trained data coder.
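The indirect, surrogate-sensor option can be illustrated as follows: periods of continuous wiper activity are taken as a proxy for rain, while very short runs, such as a single wipe to clean the windscreen, are ignored. The following is a minimal sketch; the function name, the sampling layout and the 60-second minimum duration are hypothetical choices.

```python
def rain_periods(wiper_on, sample_s, min_duration_s=60):
    """Infer probable rain periods from logged wiper status (an indirect measure).

    wiper_on       -- sequence of booleans, one per logged sample
    sample_s       -- seconds between samples
    min_duration_s -- assumed minimum run length; shorter runs are not treated as rain
    Returns (start_index, end_index_exclusive) pairs.
    """
    periods = []
    start = None
    for i, on in enumerate(list(wiper_on) + [False]):  # sentinel closes a final run
        if on and start is None:
            start = i
        elif not on and start is not None:
            if (i - start) * sample_s >= min_duration_s:
                periods.append((start, i))
            start = None
    return periods

# One 20 s wipe (ignored), then 80 s of continuous wiping (flagged as rain)
print(rain_periods([False, True, False, True, True, True, True, False], sample_s=20))
```

The flagged periods can then be cross-checked against subjective ratings, weather records or video coding, since wiper use is only a proxy for rainfall.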

At a general level, there are four main considerations with regard to weather:

  • Which weather conditions are relevant?
  • Should they be ‘designed in’ or ‘designed out’ of the study?
  • Do weather conditions of interest have a macro (e.g. a rainy day) or micro (e.g. reflected glare) level impact?
  • What level of data is needed, and how is this obtained?

6.3.6 Time of day and seasonal effects

Temporal factors such as time of day and seasonal effects have a considerable impact on the planning of FOTs and on data analysis. They can make it difficult to explain the effects that are found (e.g. whether they are caused by the system under test or by seasonal circumstances). In contrast to the weather effects outlined above, temporal factors can usually be predicted, so it is usually easier to deal with them successfully. The main issues to do with the time of day, week and seasonal variations are:

  • Influence on driver state (e.g. sleepiness)
  • Disruption caused by external events, for example school opening times
  • Influence on traffic levels
  • Other temporal influences on traffic
  • Impact on vehicle occupants
  • Glare
  • Ambient light levels
  • Seasonal confounding of data collection
  • Influence on route choice
  • Pragmatics to do with drivers' work and life schedules
  • Using time of day as a surrogate: for example, time of day can be used to specify or control for traffic levels or ambient light levels.
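Using time of day as a surrogate can be as simple as tagging each trip with a coarse period before analysis, so that comparisons are made within the same period. The sketch below is illustrative only: the function name and the weekday peak windows (07:00-09:00 and 16:00-18:00) are hypothetical, and real windows would have to come from traffic counts for the test site.

```python
from datetime import datetime

# Hypothetical weekday peak windows as (start hour inclusive, end hour exclusive).
PEAK_WINDOWS = [(7, 9), (16, 18)]


def traffic_period(ts: datetime) -> str:
    """Map a trip start time to a coarse traffic-level proxy."""
    if ts.weekday() >= 5:  # Saturday (5) or Sunday (6)
        return "weekend"
    for start, end in PEAK_WINDOWS:
        if start <= ts.hour < end:
            return "peak"
    if 6 <= ts.hour < 22:
        return "off-peak"
    return "night"


print(traffic_period(datetime(2024, 3, 4, 8, 15)))  # peak (a Monday morning)
```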

Time of day and seasonal effects differ from weather issues in several ways, including:

  • Time of day and seasonal effects are much more predictable than weather conditions
  • They are often proxies – i.e. not important in themselves, but important because they result in variation of a factor of interest (e.g. traffic levels, or level of the sun above the horizon)

These two differences mean that greater emphasis should be placed on planning around the relatively predictable time of day and seasonal effects, and on considering their impact on the FOT. There are different ways to (partly) deal with seasonal effects: having a control group, or adjusting the length of the test. The latter means either keeping the FOT short, so that the baseline and treatment phases take place in the same season, or making it very long (more than a year), so that both phases include the same seasons.
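Whether a short FOT really keeps the baseline and treatment phases in the same season can be checked mechanically. The minimal sketch below assumes northern-hemisphere meteorological seasons (a hypothetical choice; locally relevant periods may be more appropriate) and reports the seasons that occur in only one phase. It is a coarse coverage check only: it does not test whether the amount of driving per season is balanced between the phases.

```python
from datetime import date
from typing import Iterable, Set

# Assumption: northern-hemisphere meteorological seasons.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasons_covered(days: Iterable[date]) -> Set[str]:
    return {SEASON_BY_MONTH[d.month] for d in days}


def season_mismatch(baseline: Iterable[date], treatment: Iterable[date]) -> Set[str]:
    """Seasons that occur in only one phase; an empty set means both phases
    cover the same seasons."""
    return seasons_covered(baseline) ^ seasons_covered(treatment)


# A short FOT whose treatment phase spills over into March:
print(season_mismatch([date(2024, 1, 10), date(2024, 2, 10)],
                      [date(2024, 2, 11), date(2024, 3, 15)]))  # {'spring'}
```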

6.4 Conducting a pilot study to test the evaluation process

A pilot study can be defined as a “small scale version, or trial run, done in preparation for the major study” (Polit et al., 2001). It precedes large-scale quantitative research and is very useful for testing the research instruments, identifying any performance problems and checking that the technical instruments adopted are reasonably durable. A pilot study is a fundamental phase: it gives early warning of practical problems or difficulties that may affect the study, and it is also needed to prepare the deployment of the FOT and to support the design of the relevant tools for the evaluation process (Saad, 1997; Saad and Dionisio, 2007). This task should be performed early in the evaluation process. It is an important step for mobilising the various teams involved in the FOT, fostering dialogue between them and promoting a common framework and consensus for the evaluation process.

The relevance of conducting a pilot study and the time it requires are often underestimated. To show the importance of this step, general reasons for conducting a pilot study are listed below (for a wider overview, see Polit et al., 2001):

  • Developing and testing the adequacy of research instruments;
  • Assessing the feasibility of the full-scale study;
  • Testing the research protocol;
  • Testing whether the sampling frame and technique are effective;
  • Verifying the likely success of proposed recruitment approach;
  • Identifying logistical problems which might occur using proposed methods;
  • Testing variability in outcomes to help determine sample size;
  • Collecting preliminary data;
  • Verifying what resources (finance, staff) are needed for a planned study;
  • Verifying the proposed data analysis techniques to uncover potential problems;
  • Testing the research questions and research plan;
  • Training the researchers, both in data analysis and in personal integrity issues.
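The point on testing variability in outcomes can be made concrete with a minimal sketch. It assumes the outcome is a continuous measure compared between two independent groups (for example, a hypothetical mean speed per trip with and without the system) and plugs the standard deviation estimated from pilot data into the usual normal-approximation formula n = 2 (z₁₋α/₂ + z₁₋β)² σ² / δ² per group. The function name is hypothetical, and within-subject designs or event-based outcomes, which are common in FOTs, would need a different calculation.

```python
import math
import statistics
from statistics import NormalDist
from typing import Sequence


def sample_size_per_group(pilot_values: Sequence[float], min_difference: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Participants per group for a two-sided comparison of two independent
    group means, using the normal approximation
        n = 2 * (z_(1-alpha/2) + z_(power))^2 * sd^2 / min_difference^2
    with the standard deviation `sd` estimated from pilot data."""
    sd = statistics.stdev(pilot_values)  # sample standard deviation (n - 1)
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / min_difference ** 2
    return math.ceil(n)


# Pilot outcome values with sd = 1.0; smallest difference worth detecting 0.5:
print(sample_size_per_group([1.0, 2.0, 3.0], min_difference=0.5))  # 63
```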

In more detail, these preliminary field tests in FOTs address three main levels of analysis, each with specific objectives.

  1. The first preliminary field tests must check the technical functioning of the data collection systems in real driving situations. They should make it possible to identify potential problems of sensor calibration or drift, and thus to establish how often maintenance is needed during the FOT. They should also validate the data collection procedure, from data acquisition through data transmission to data storage.

The technical teams involved in the FOT should be in charge of these field tests.

  2. The second level of preliminary field test mainly assesses the usability and usage of the systems under study and identifies the main critical issues associated with their use in real driving situations. This is particularly relevant for:
  • Structuring the drivers' familiarisation phase before they participate in the FOT;
  • Contributing to the design of the questionnaires for the subjective assessment of the systems;
  • Testing and/or improving the various tools developed for data processing, such as automatic identification of critical “use cases” and “scenarios” and video based identification of triggering events or categorisation of road and traffic contexts.
  • Identifying a number of critical scenarios when using the systems, scenarios that could be investigated more extensively when the data gathered from the FOT are processed and analysed.

This test requires the participation of a sufficient number of drivers (depending on the target population of the FOT) and should be performed in real driving situations. An experimental journey on the road could be designed for that purpose (depending on the hypotheses formulated). This level of analysis provides useful data for designing the relevant evaluation tools mentioned above and for estimating the time required for data processing and analysis, and thus for scheduling these phases of the FOT. It can also be seen as an opportunity to train the team(s) in charge of data processing. Finally, it is an important step for testing and/or refining some of the hypotheses formulated for the FOT. It is important to note that the drivers used in the pilot study will not be part of the final sample, and therefore most of them do not need to be naive.

Psychologists, ergonomists, and human factors experts should perform these tests in close cooperation with the team in charge of statistical analysis as well as the team in charge of developing data processing tools.

To test whether what is asked of participants is realistic, it is a good idea for the project team to pilot the procedures themselves before 'real' participants undergo the testing. Let one or more members of the project team drive an FOT vehicle, answer the questionnaires, fill in the travel diaries, etc. This is especially relevant for people working on subjective data collection and analysis.

  3. The third level tests the feasibility of the overall evaluation process, from the selection of the participants through to data collection. It is a kind of final rehearsal before the deployment of the FOT. In particular, it makes it possible to check the communication process between the various teams involved in the practical deployment of the FOT, the robustness of the technical tools designed for data collection and transmission, and the robustness of the evaluation tools used in the assessment.
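The automatic identification of triggering events, whose tools the second-level pilot is meant to test and improve, could start from something as simple as a threshold detector on longitudinal acceleration. In this minimal sketch the function name, the -4.0 m/s² threshold and the 0.3 s minimum duration are hypothetical values of the kind that pilot data would be used to tune.

```python
from typing import List, Sequence, Tuple


def hard_braking_events(accel: Sequence[float], sample_rate_hz: float,
                        threshold: float = -4.0, min_duration_s: float = 0.3
                        ) -> List[Tuple[float, float]]:
    """Return (start_time_s, peak_deceleration) for each episode in which the
    longitudinal acceleration (m/s^2) stays at or below `threshold` for at
    least `min_duration_s` seconds."""
    if threshold >= 0:
        raise ValueError("threshold must be negative (a deceleration)")
    min_samples = max(1, round(min_duration_s * sample_rate_hz))
    samples = list(accel)
    events = []
    start = None
    # A trailing 0.0 sentinel (above any negative threshold) closes an
    # episode that is still running at the end of the recording.
    for i, a in enumerate(samples + [0.0]):
        if a <= threshold:
            if start is None:
                start = i  # episode begins
        elif start is not None:
            if i - start >= min_samples:
                events.append((start / sample_rate_hz, min(samples[start:i])))
            start = None  # episode ends
    return events


accel = [0.0, -1.0, -5.0, -6.0, -4.5, -1.0, -5.0, -5.0, 0.0, -4.2, -4.1, -4.0]
print(hard_braking_events(accel, sample_rate_hz=10.0))  # [(0.2, -6.0), (0.9, -4.2)]
```

The short episode at samples 6-7 is discarded because it lasts only 0.2 s; comparing such automatic detections with video coding during the pilot is one way to check the thresholds.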

The result of the pilot can be a no-go if too many problems are still present. In this case it could be reasonable to delay the start of the data collection phase and to repeat some earlier steps. This means that there are feedback loops in the piloting process.
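Two of the technical checks described above lend themselves to small scripts whose output can feed into the go/no-go decision. The sketch below assumes (hypothetically) that sensor offsets against a reference were measured on several pilot days and that drift is roughly linear; it extrapolates a least-squares fit to estimate when a tolerance would be exceeded, as one input for setting the maintenance interval. The second function flags vehicles whose stored data cover too little of the time actually driven; the function names and the 95 % threshold are assumptions, not FESTA recommendations.

```python
from typing import Dict, List, Optional, Sequence


def days_until_out_of_tolerance(days: Sequence[float], offsets: Sequence[float],
                                tolerance: float) -> Optional[float]:
    """Fit offset = intercept + slope * day by ordinary least squares and return
    the day on which the fitted offset reaches +/- `tolerance` in the direction
    of drift. Returns None if the fitted slope is exactly zero (no drift)."""
    if len(days) != len(offsets) or len(days) < 2:
        raise ValueError("need matching checks from at least two days")
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(offsets) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("checks must come from at least two different days")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, offsets))
    slope = sxy / sxx
    if slope == 0:
        return None
    intercept = mean_y - slope * mean_x
    target = tolerance if slope > 0 else -tolerance
    return (target - intercept) / slope


def incomplete_vehicles(recorded_h: Dict[str, float], driven_h: Dict[str, float],
                        min_share: float = 0.95) -> List[str]:
    """Vehicles whose stored data cover less than `min_share` of the hours driven."""
    return sorted(v for v, driven in driven_h.items()
                  if driven > 0 and recorded_h.get(v, 0.0) / driven < min_share)


print(days_until_out_of_tolerance([0, 10, 20, 30], [0.0, 0.25, 0.5, 0.75], 1.0))
print(incomplete_vehicles({"V1": 9.8, "V2": 6.0}, {"V1": 10.0, "V2": 8.0}))  # ['V2']
```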

6.5 Controlled testing

As described in section 6.2, a power analysis is required to determine the sample size needed for an FOT. The estimated or simulated frequency of events and the penetration rate are key elements in this calculation. The analysis may show that a naturalistic FOT is not feasible, because the low frequency of events would require a very large number of vehicles or a very long experimental period.
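Once the power analysis has fixed how many events are needed, the estimated event frequency and the penetration rate translate into the driving exposure required. The minimal sketch below assumes that events occur at a constant average rate and that, for a cooperative function needing a second equipped vehicle, the usable event rate scales linearly with the penetration rate (a simplification); the function names and the example numbers are hypothetical. It gives the exposure at which the expected number of events reaches the target, not a guarantee of observing that many.

```python
import math


def vehicle_days_needed(required_events: int, base_event_rate: float,
                        penetration_rate: float = 1.0) -> int:
    """Vehicle-days of driving after which the *expected* number of events
    reaches `required_events`.

    base_event_rate: events per equipped vehicle-day if all other traffic
    were equipped too (an estimate or simulation result).
    penetration_rate: share of equipped vehicles; assumed here to scale the
    usable event rate linearly (a simplification for cooperative functions).
    """
    effective_rate = base_event_rate * penetration_rate
    if effective_rate <= 0:
        raise ValueError("event rate and penetration rate must be positive")
    return math.ceil(required_events / effective_rate)


def calendar_days_needed(required_events: int, base_event_rate: float,
                         penetration_rate: float, fleet_size: int) -> int:
    """Test duration, assuming each of `fleet_size` vehicles drives every day."""
    vehicle_days = vehicle_days_needed(required_events, base_event_rate, penetration_rate)
    return math.ceil(vehicle_days / fleet_size)


print(vehicle_days_needed(200, 0.25, 0.5))        # 1600
print(calendar_days_needed(200, 0.25, 0.5, 100))  # 16
```

If the resulting duration or fleet size is out of reach, a naturalistic design is not feasible for that function.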

In such cases, one possible option is to allow controlled or semi-controlled testing. This means that all drivers, or a certain group of them, are instructed before or during the test to behave in a certain manner. For instance, a professional driver might be instructed to simulate a car breakdown in order to trigger the car breakdown warning function in passing (uncontrolled) vehicles. In the controlled approach, the test drivers are called in for the test and asked to drive the test route according to certain arrangements. Preferably, the tests will be conducted in real traffic, although some tests may have to be organised on a closed test track. One test may include several runs of the route. Several situational variables can be fixed in advance, and the tests can be designed so that some variables are systematically controlled during data collection. Depending on the practical constraints, different levels of control, from totally naturalistic to totally controlled, can be chosen. However, controlled testing breaks with the principle of non-interference in the experiment and should be chosen only if the FOT boundary conditions and/or the power analysis do not allow a naturalistic test of the function. Controlled testing can also be used as a supplement to naturalistic FOTs.

Table 6-1 provides an overview of the differences between controlled tests and naturalistic driving studies [DRIVE C2X DOW].

Table 6-1: Complementary uses of naturalistic and controlled tests in cooperative system evaluation

6.5.1 Operationalisation of tests

Controlled testing requires a strict operationalisation process from the high-level hypothesis down to the individual tests to be performed. A three-step process is advised, in which test scenarios are refined into test scripts, and test scripts into test cases.

In controlled tests, all drivers are instructed to follow a defined test scenario. This scenario is derived from the defined hypothesis and aims to provoke a system behaviour that activates the function, so that the data needed to confirm or refute the hypothesis can be gathered. The scenario should therefore contain:

  • Functions addressed
  • Hypothesis addressed
  • Description of desired situation
  • List of desired participant types
  • List of desired vehicle types
  • List of vehicle groups (e.g. one group as broken down vehicle, one group for the passing vehicles)
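A predefined scheme for the scenario contents listed above could be a simple record type. The Python dataclass below is a hypothetical illustration whose field names mirror the list, not the format of any existing scenario tool; its `missing_fields` helper flags incomplete scenarios before they are refined further.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scenario:
    """Hypothetical predefined scheme mirroring the scenario contents."""
    name: str
    functions: List[str]          # functions addressed
    hypotheses: List[str]         # hypotheses addressed
    desired_situation: str        # description of the desired situation
    participant_types: List[str]  # desired participant types
    vehicle_types: List[str]      # desired vehicle types
    vehicle_groups: List[str]     # e.g. broken-down vehicle, passing vehicles

    def missing_fields(self) -> List[str]:
        """Names of fields that are still empty."""
        return [name for name, value in vars(self).items() if not value]


breakdown = Scenario(
    name="Car breakdown warning",
    functions=["car breakdown warning"],
    hypotheses=["H1 (hypothetical): warned drivers reduce speed earlier"],
    desired_situation="One stopped vehicle is passed by several equipped vehicles",
    participant_types=[],
    vehicle_types=["passenger car"],
    vehicle_groups=["broken_down", "passing"],
)
print(breakdown.missing_fields())  # ['participant_types']
```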

Scenarios can be described in informal text, although it may be advisable to use a predefined scheme. To continue the operationalisation, the scenarios are further refined into test scripts. A test script builds on one scenario and maps it to a given area and a given project setup. To generate the test script, each group has to be mapped onto the road network of the test site. A route is created which defines, for each group, exactly where the vehicles will drive and what timing is desired or expected.

A baseline can be created by assigning a separate control group to the test script with systems switched off.

This test script therefore contains:

  • One route for each participant group with timing information (including individual vehicle timing offset)
  • A desired state for the functions to test
  • A desired state for the logging and monitoring systems

In a final step, before the test actually starts, the test script is turned into a test case: the actual drivers and vehicles are assigned to the groups, and a date and time for the test case are fixed. One test script may be scheduled several times as a test case in order to gather enough qualified data to filter out outliers in the execution. Drivers and vehicles may change between test cases of the same test script. Figure 6-2 shows an example of the process explained in this section.

Figure 6-2: Operationalisation of test scenarios
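The step from test script to test case can be sketched with a few record types. This is a hypothetical data model, not the schema of any particular FOT tool: a `TestScript` holds one timed route per participant group plus the desired function and logging states, and `schedule` instantiates it as several `TestCase`s, each with its own date, time and driver/vehicle assignment. A baseline control group could be represented by a second script with `functions_enabled=False`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class GroupRoute:
    waypoints: List[str]    # hypothetical road-segment IDs on the test site
    start_offset_s: float   # timing offset of this group relative to test start


@dataclass
class TestScript:
    scenario_name: str
    routes: Dict[str, GroupRoute]   # one timed route per participant group
    functions_enabled: bool         # desired state of the functions to test
    log_profile: str                # desired state of logging and monitoring


@dataclass
class TestCase:
    script: TestScript
    start: datetime
    assignment: Dict[str, List[str]]  # group -> assigned "driver/vehicle" labels


def schedule(script: TestScript, starts: List[datetime],
             assignments: List[Dict[str, List[str]]]) -> List[TestCase]:
    """Instantiate one test script as several test cases; drivers and
    vehicles may differ from one case to the next."""
    if len(starts) != len(assignments):
        raise ValueError("one assignment is needed per start time")
    cases = []
    for start, assignment in zip(starts, assignments):
        unassigned = set(script.routes) - set(assignment)
        if unassigned:
            raise ValueError(f"groups without drivers/vehicles: {sorted(unassigned)}")
        cases.append(TestCase(script, start, assignment))
    return cases


script = TestScript(
    scenario_name="Car breakdown warning",
    routes={"broken_down": GroupRoute(["S1"], 0.0),
            "passing": GroupRoute(["S0", "S1", "S2"], 30.0)},
    functions_enabled=True,
    log_profile="extended",
)
cases = schedule(
    script,
    [datetime(2024, 5, 6, 10, 0), datetime(2024, 5, 6, 14, 0)],
    [{"broken_down": ["D1/V1"], "passing": ["D2/V2", "D3/V3"]},
     {"broken_down": ["D4/V1"], "passing": ["D2/V5"]}],
)
print(len(cases))  # 2
```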

6.5.2 Operationalisation tool chain

In larger FOTs, a dedicated set of tools is strongly recommended for the operationalisation process.

In a scenario editor tool, all scenarios can be entered in predefined fields. These cover the textual information needed to describe the scenario, but also define formal aspects, such as the desired number of test iterations or whether pre-validation in simulation is necessary. The scenario should also list the necessary performance indicators.

The script editor tool is map-based. It loads scenarios and maps the implicit information on what should happen to explicit routes at one specific location. One route needs to be created for each driving group. Intelligent mapping tools can use the underlying road network data in the map (e.g. OSM, GMaps) to follow the street automatically. To get a first idea of how the script will perform, a minimal real-time simulation can be used to show virtual vehicles moving along the defined routes. In this way the groups can be synchronised so that the desired situations are successfully created.

The script editor tool also creates the log profiles to be used during the test, based on the performance indicators contained in the scenarios. For this purpose, the measures needed for all performance indicators are merged. Sophisticated scripts can also contain time-bound or location-bound markers, which are executed in a vehicle once it reaches the given time or passes the given point. The test system uses these markers, for instance, to trigger:

  • a change of log profile (e.g. extended logging, when entering the test area)
  • a driver instruction
  • activation or deactivation of functions (e.g. for control group)
  • a synthetic function behaviour (e.g. turn on the broken-down vehicle warning)
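Two of the script editor's tasks can be sketched briefly: merging the measures required by the performance indicators into one log profile, and firing location-bound markers when a vehicle comes within a radius of the marked point. In this minimal example the marker radius, the action strings and all names are hypothetical, distances use the haversine formula on a spherical Earth, and time-bound markers are left out.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


def merge_log_profile(measures_per_indicator: Dict[str, Set[str]]) -> Set[str]:
    """Union of the measures needed by all performance indicators."""
    merged: Set[str] = set()
    for measures in measures_per_indicator.values():
        merged |= measures
    return merged


EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84-style coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class LocationMarker:
    lat: float
    lon: float
    radius_m: float
    action: str  # hypothetical encoding, e.g. "log_profile=extended"


def due_actions(position: Tuple[float, float], markers: List[LocationMarker],
                fired: Set[int]) -> List[str]:
    """Actions of markers the vehicle has just reached; each marker fires once."""
    actions = []
    for idx, marker in enumerate(markers):
        if idx in fired:
            continue
        if haversine_m(position[0], position[1], marker.lat, marker.lon) <= marker.radius_m:
            fired.add(idx)
            actions.append(marker.action)
    return actions


print(sorted(merge_log_profile({"mean speed": {"speed"},
                                "minimum headway": {"speed", "range"}})))  # ['range', 'speed']
markers = [LocationMarker(50.0, 8.0, 50.0, "log_profile=extended"),
           LocationMarker(50.1, 8.0, 50.0, "instruction=slow down")]
fired: Set[int] = set()
print(due_actions((50.0, 8.0), markers, fired))  # ['log_profile=extended']
print(due_actions((50.0, 8.0), markers, fired))  # []
```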

In the final step of operationalisation, the test script has to be mapped to the current situation at the test site. A test case is generated from the test script by allocating available vehicles and drivers to the groups shortly before the test starts. This should not be done further in advance, since fluctuations in the vehicle pool and in driver availability are to be expected for larger fleets. The tool chain can support this with a dedicated control connection to the vehicles.
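Allocation shortly before the test could look like the following minimal sketch, which pairs whichever drivers and vehicles are actually available and fails early if the pool is too small. The names are hypothetical, and a real tool would also respect the participant and vehicle types listed in the scenario, which this sketch ignores.

```python
from typing import Dict, List, Tuple


def allocate(group_sizes: Dict[str, int], drivers: List[str],
             vehicles: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Pair currently available drivers with available vehicles and fill the
    script's groups in order. Raises ValueError if the pool is too small."""
    needed = sum(group_sizes.values())
    available = min(len(drivers), len(vehicles))
    if needed > available:
        raise ValueError(f"need {needed} driver/vehicle pairs, only {available} available")
    pairs = iter(zip(drivers, vehicles))
    return {group: [next(pairs) for _ in range(size)]
            for group, size in group_sizes.items()}


print(allocate({"broken_down": 1, "passing": 2},
               drivers=["D1", "D2", "D3"], vehicles=["V1", "V2", "V3", "V4"]))
# {'broken_down': [('D1', 'V1')], 'passing': [('D2', 'V2'), ('D3', 'V3')]}
```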

6.5.3 Test execution

In theory, a controlled test can run unsupervised. In practice, controlled tests need live supervision to achieve an acceptable success rate. (Note that a test is considered successful if the desired scenario has been created, not necessarily if the function was triggered.)

The supervision of a controlled test is preferably managed with a test control tool. This tool displays, in real time, the status of all participating vehicles (monitoring) and the selected test case, so that the operator can monitor test progress and detect deviations from the original script.
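Detecting deviations from the script can be reduced, in its simplest form, to comparing planned and reported passage times at checkpoints. A minimal sketch, assuming each vehicle reports the time (in seconds since the test case started) at which it passed the next checkpoint; the 20 s tolerance and the function name are hypothetical. A vehicle that has not reported is flagged as well, since the operator cannot confirm that the desired scenario is being created.

```python
from typing import Dict, List


def timing_deviations(planned_s: Dict[str, float], reported_s: Dict[str, float],
                      tolerance_s: float = 20.0) -> List[str]:
    """Vehicles whose checkpoint passage deviates from the script by more than
    `tolerance_s` seconds, or that have not reported at all.
    Times are seconds since the start of the test case."""
    flagged = []
    for vehicle, planned in planned_s.items():
        reported = reported_s.get(vehicle)
        if reported is None or abs(reported - planned) > tolerance_s:
            flagged.append(vehicle)
    return sorted(flagged)


print(timing_deviations({"V1": 60.0, "V2": 90.0, "V3": 90.0},
                        {"V1": 70.0, "V2": 125.0}))  # ['V2', 'V3']
```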

A means of interacting directly with test drivers is desirable. Using the same connection as the monitoring data, the test control tool can send messages back to the vehicles. These messages can contain:

  • Textual instructions to the drivers (to be displayed on HMI)
  • Voice instructions to drivers
  • Scenario script and test case information, e.g. test name, schedule, route information
  • Trigger information for the test-system itself (log profile changes)
  • Trigger information for the system under test
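The message types listed above could be carried over the monitoring connection as small JSON documents. The schema below (the `MessageType` values and the `vehicle_id`/`kind`/`payload` fields) is a hypothetical illustration, not a standardised format; it only shows that one envelope can carry driver instructions, test case information and triggers alike, and that it survives a round trip through serialisation.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class MessageType(Enum):
    TEXT_INSTRUCTION = "text_instruction"        # shown on the HMI
    VOICE_INSTRUCTION = "voice_instruction"      # spoken to the driver
    TEST_CASE_INFO = "test_case_info"            # test name, schedule, route
    TEST_SYSTEM_TRIGGER = "test_system_trigger"  # e.g. log profile change
    SUT_TRIGGER = "sut_trigger"                  # trigger for the system under test


@dataclass
class ControlMessage:
    vehicle_id: str
    kind: MessageType
    payload: Dict[str, Any]

    def to_json(self) -> str:
        return json.dumps({"vehicle_id": self.vehicle_id,
                           "kind": self.kind.value,
                           "payload": self.payload}, sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "ControlMessage":
        data = json.loads(text)
        return ControlMessage(data["vehicle_id"], MessageType(data["kind"]), data["payload"])


msg = ControlMessage("V7", MessageType.TEXT_INSTRUCTION,
                     {"text": "Follow the route to the next checkpoint"})
wire = msg.to_json()
print(wire)
```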

It has to be decided whether driver instructions are necessary for the FOT. If so, they can be displayed either on the system HMI or on a dedicated device.


  1. For further information on how to choose the sample size, the reader should refer to FESTA Deliverable D.2.4[1].