Automated Driving Datasets

This information was collected by the CARTRE EU project in 2018, with some pointers coming from the ENABLE-S3 project. If you have datasets to add, please edit this page or e-mail
Generally, all naturalistic driving and image classification datasets are usable for automated driving studies, as they can serve as training data. Naturalistic driving data indicates how humans behave in different scenarios, and it can be used to identify testing scenarios for automation. Most such naturalistic driving datasets from around the world are already featured on the Data Catalogue. They are multi-purpose, enabling a wide set of research questions not limited to automated driving development.
This page details the publicly available datasets that were collected specifically with automated vehicles (or similar platforms) or for the development of automated vehicle functionality. To date, these datasets are quite different from the FOT datasets from large-scale user tests, for which FOT-Net/CARTRE have created an online catalogue. The following datasets can be classified as development data. Data from large-scale user tests has not yet been made widely available, largely because of the competitive development status of current prototype vehicles.


Oxford RobotCar Dataset
The University of Oxford has collected a dataset consisting of 1000 km of recorded driving in central Oxford over a period of 1.5 years. The dataset features almost 20 million images. The data is mainly intended for non-commercial academic use, and registration requires an academic e-mail address, for example one ending with .edu. Alternatively, the University can be contacted to negotiate a commercial license. [1] (W. Maddern, G. Pascoe, C. Linegar and P. Newman, "1 Year, 1000km: The Oxford RobotCar Dataset", The International Journal of Robotics Research (IJRR), 2016.)

Apollo project
Apollo is an automated driving ecosystem and open platform initiated by Baidu. It features source code, data and collaboration options. The platform offers various types of development data, e.g. annotated traffic sign videos, vehicle log data from demonstrations, training data for multi-sensor localization and scenarios for their simulation environment. [2]
ApolloScape, a part of Apollo, additionally offers training data for semantic segmentation (pixel-level classification of video frames, usually input for training neural networks). As of March 2018, the dataset contained 74 thousand video frames. [3]
Data uploaded by partners is considered private by default [4], but it can be marked public, or even restricted so that specific partners cannot access it. Sample data is available, but wider access requires negotiated licenses. Part of Apollo's business model is that partners gain wider access to the resources by contributing data and software.

Another listed source is a startup that has built advanced neural network components that enable self-driving features. It has open sourced parts of its data and software code, and it sells dashcam components that work together with the software. Users can submit data and earn community points. [5] [6]

KITTI Vision Benchmark Suite
The Karlsruhe Institute of Technology has open sourced six hours of data captured while driving in Karlsruhe. The dataset is well known for its use in vision benchmarks. Annotations and evaluation metrics are provided along with the raw data. The dataset cannot be used for commercial purposes. [7]

Cityscapes Dataset
The Cityscapes dataset features 5,000 images with high quality annotations and 20,000 images with coarse annotations from 50 different cities. The images are annotated at pixel-level and offer training material for neural network studies. When the dataset is used in studies, the users are requested to cite related dataset papers. [8]
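
To illustrate what pixel-level annotation makes possible, the following minimal sketch compares a predicted label map with a reference label map using per-class intersection-over-union (IoU), a measure commonly used for semantic segmentation. The tiny 3x4 label maps, the class numbering and the function name `per_class_iou` are hypothetical and chosen only for illustration; they are not taken from the Cityscapes tooling.

```python
from collections import Counter


def per_class_iou(predicted, reference, num_classes):
    """Return one IoU value per class, or None if the class appears in neither map."""
    intersection = Counter()
    pred_count = Counter()
    ref_count = Counter()
    for pred_row, ref_row in zip(predicted, reference):
        for p, r in zip(pred_row, ref_row):
            pred_count[p] += 1
            ref_count[r] += 1
            if p == r:
                intersection[p] += 1
    ious = []
    for c in range(num_classes):
        union = pred_count[c] + ref_count[c] - intersection[c]
        ious.append(intersection[c] / union if union else None)
    return ious


# Hypothetical label maps (assumed class ids: 0 = road, 1 = car, 2 = sky).
reference = [
    [2, 2, 2, 2],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
predicted = [
    [2, 2, 2, 2],
    [1, 0, 0, 0],  # one "car" pixel is mislabelled as "road"
    [0, 0, 0, 0],
]

ious = per_class_iou(predicted, reference, num_classes=3)
print(ious)
```

Because every pixel carries a class label, a single mislabelled pixel lowers the score of both classes involved (here "road" and "car") and leaves the third untouched.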

The Udacity open source self-driving car project
Udacity is building an open source self-driving car. The project offers example data recordings from more than ten hours of driving and annotated driving datasets, where objects in video have been marked with bounding boxes. In addition to open source tools, Udacity publishes programming challenges to further the development. The project plans to attract students from around the world. [9]

HD1K Benchmark Suite
This dataset for benchmarking optical flow algorithms (optical flow is movement measured from video) was created by the Heidelberg Collaboratory for Image Processing in close cooperation with Robert Bosch GmbH. It contains over 1000 frames of high-resolution video recorded in diverse weather, together with reference information about movement. The data is general-purpose, but it was collected with a measurement van. [10]
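
As a rough illustration of how reference movement information can be used in benchmarking, the sketch below computes the average endpoint error, a commonly used optical flow measure: the mean Euclidean distance between estimated and reference flow vectors. The 2x2 flow fields and the function name are hypothetical, and this page does not state which metrics the HD1K benchmark itself applies.

```python
import math


def average_endpoint_error(estimated, reference):
    """Mean Euclidean distance between estimated and reference (u, v) flow vectors."""
    total = 0.0
    count = 0
    for est_row, ref_row in zip(estimated, reference):
        for (eu, ev), (ru, rv) in zip(est_row, ref_row):
            total += math.hypot(eu - ru, ev - rv)
            count += 1
    if count == 0:
        raise ValueError("flow fields must contain at least one pixel")
    return total / count


# Hypothetical 2x2 flow fields; each entry is a (u, v) displacement in pixels.
reference = [
    [(1.0, 0.0), (1.0, 0.0)],
    [(0.0, 2.0), (0.0, 2.0)],
]
estimated = [
    [(1.0, 0.0), (4.0, 4.0)],  # one vector is off by (3, 4), i.e. 5 pixels
    [(0.0, 2.0), (0.0, 2.0)],
]

aee = average_endpoint_error(estimated, reference)
print(aee)
```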

Playing for data
This Darmstadt University dataset is an example of recent efforts in the academic community to extract neural network training data from computer games. In games, every pixel belongs to a known object. This reduces the need for manual annotation work, although the data is limited to the level of detail the game can generate. The dataset consists of 24,966 densely labelled frames and is compatible with the Cityscapes dataset. [11]

Berkeley DeepDrive
The consortium has released 400 hours of video, along with GPS and inertial measurement unit data. The basic license is limited to personal use. [12]

Málaga Urban Dataset
This stereo camera and laser dataset was collected on a 37 km route in urban Málaga. The files can be downloaded directly under a BSD open source license, with a request to cite a scientific paper by authors from the universities of Almería and Málaga. [13]

Update May 2020, new datasets
- Lyft Dataset with camera and LiDAR data from a vehicle fleet. The LiDARs have 40 or 64 layers. The dataset contains a semantic map and objects annotated as 3D boxes. Lyft also runs dataset-related competitions. [14]
- The nuScenes dataset by Aptiv. The dataset consists of 1000 20-second scenes. It is free for non-commercial purposes. [15]
- Waymo Open Dataset is a non-commercial and growing dataset. It consists of 10 Hz data from five LiDARs and five cameras. [16]
- Canadian Adverse Driving Conditions Dataset. The first public automated driving dataset featuring snowy conditions. It is stored in a format similar to that of the KITTI raw dataset. [17]
- Mapillary Vistas Dataset contains 25 thousand images with pixel-level annotation (semantic segmentation). [18]
- Ford AV Dataset was collected in 2017-2018 in Michigan. The dataset comes in rosbag (ROS bag file) format. [19]
- Audi Autonomous Driving Dataset (A2D2) [20]
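
To put the scene counts in the list above into perspective, the short calculation below converts the nuScenes figure (1000 scenes of 20 seconds each) into total recording time. It uses only the numbers quoted on this page; the variable names are illustrative.

```python
# Figures as quoted in the nuScenes entry above.
NUM_SCENES = 1000
SCENE_SECONDS = 20

total_seconds = NUM_SCENES * SCENE_SECONDS
total_hours = total_seconds / 3600

print(f"{total_seconds} s = {total_hours:.2f} h")
```

The result, about 5.6 hours, is of the same order as the six hours of driving in the KITTI dataset described earlier.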