This webpage distributes UAV video datasets created/assembled within the MULTIDRONE project. Anyone who uses any part of these datasets in their work is kindly asked to cite the following two papers:

  • I. Mademlis, V. Mygdalis, N. Nikolaidis, M. Montagnuolo, F. Negro, A. Messina and I. Pitas, "High-Level Multiple-UAV Cinematography Tools for Covering Outdoor Events", IEEE Transactions on Broadcasting, vol. 65, no. 3, pp. 627-635, 2019.
  • I. Mademlis, N. Nikolaidis, A. Tefas, I. Pitas, T. Wagner and A. Messina, "Autonomous UAV Cinematography: A Tutorial and a Formalized Shot-Type Taxonomy", ACM Computing Surveys, vol. 52, issue 5, pp. 105:1-105:33, 2019.

In order to access the datasets created/assembled by Aristotle University of Thessaloniki, please complete and sign this license agreement. Subsequently, email it to Prof. Ioannis Pitas so as to receive FTP credentials for downloading.

In order to access the datasets created/assembled/provided by other MULTIDRONE partners (RAI, Deutsche Welle), please complete and sign this license agreement. Subsequently, email it to Alberto Messina so as to receive FTP credentials for downloading.

AUTH Multidrone Datasets
RAI/Deutsche Welle Multidrone Datasets



AUTH Multidrone Datasets


To acquire these datasets, please complete and sign this license agreement. Subsequently, email it to Prof. Ioannis Pitas so as to receive FTP credentials for downloading.


If you are granted access, the following datasets are available (NOTE: for datasets assembled from YouTube videos, only links to the videos and the relevant annotation files, if any, are provided).

-DCROWD_VID
A dataset for visual human crowd detection was assembled from YouTube videos, licensed mainly under the Standard YouTube License. It is a collection of 53 videos selected by querying the YouTube search engine with specific keywords describing crowded events (e.g., parade, festival, marathon, protests). Non-crowded videos have also been gathered by searching for generic drone videos. No annotation is currently available.

-SHOT_TYPES
A dataset containing 46 professional and semi-professional UAV videos was assembled from YouTube material. Care was taken to include as many UAV framing shot types and UAV/camera motion types as possible, based on the UAV shot type taxonomy defined within the MULTIDRONE project. No annotation is currently available.

-Annotations_boats_Raw
A dataset for boat detection/tracking was assembled, consisting of 13 YouTube videos (resolution: 1280 x 720) at 25 frames per second. Annotations are not exhaustive, i.e., there may be unannotated objects in the given video frames. An annotation file is included along with each video file. The annotations are stored in text files with the following format:

  • frameN
  • #objects
  • x y w h
where x, y indicate the upper left corner of the bounding box and w, h describe its width and height in frame N.
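The block layout above (a frame number line, an object count line, then one `x y w h` line per object) can be read with a short parser. The following is a minimal sketch, assuming whitespace-separated values and exactly `#objects` box lines after each count; the function name `parse_frame_blocks` is hypothetical, and the layout should be checked against the actual files before relying on it.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h): upper left corner, width, height


def parse_frame_blocks(text: str) -> Dict[int, List[Box]]:
    """Parse 'frameN / #objects / one x y w h line per object' records (assumed layout)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]  # drop blank lines
    boxes: Dict[int, List[Box]] = {}
    i = 0
    while i < len(lines):
        frame = int(lines[i])        # frameN
        count = int(lines[i + 1])    # #objects
        frame_boxes: List[Box] = []
        for ln in lines[i + 2 : i + 2 + count]:
            x, y, w, h = (float(v) for v in ln.split()[:4])
            frame_boxes.append((x, y, w, h))
        boxes[frame] = frame_boxes
        i += 2 + count               # jump to the next frame record
    return boxes
```

A frame whose `#objects` is 0 maps to an empty list, so frames explicitly annotated as empty stay distinguishable from frames absent from the file.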


-Annotations_Bicycles_Raw
A dataset for bicycle detection/tracking was assembled, consisting of 7 YouTube videos (resolution: 1920 x 1080) at 25 frames per second. Annotations are not exhaustive, i.e., there may be unannotated objects in the given video frames. An annotation file is included along with each video file. The annotations are stored in the corresponding text files with the following format:

    Channel   frameN   ObjectID   x1   y1   x2   y2   0   ObjectType/View
where x1, y1, x2, y2 refer to the upper left and bottom right corners of the bounding box, ObjectID is a numerical object identifier (non-consistent, non-reliable), frameN is the video frame number, while ObjectType/View (where applicable) labels the object class and categorical pose relative to the camera (“1F” means Front View, “1B” means Back View, “1L” means Left View, “1R” means Right View, 2 means Bicycle Crowd, 5H means High-Density Human Crowd, 5L means Low-Density Human Crowd, 0 denotes irrelevant TV graphics).
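One line of this format could be decoded as follows. This is a sketch, assuming whitespace-separated fields in the order shown and a missing last field when no ObjectType/View applies; `BicycleAnnotation`, `VIEW_LABELS` and `parse_bicycle_line` are hypothetical names, and the Channel field is kept as raw text because its meaning is not described here.

```python
from dataclasses import dataclass
from typing import Optional

# ObjectType/View codes as listed in the dataset description.
VIEW_LABELS = {
    "1F": "Front View",
    "1B": "Back View",
    "1L": "Left View",
    "1R": "Right View",
    "2": "Bicycle Crowd",
    "5H": "High-Density Human Crowd",
    "5L": "Low-Density Human Crowd",
    "0": "irrelevant TV graphics",
}


@dataclass
class BicycleAnnotation:
    channel: str           # not documented on this page; kept as raw text
    frame: int
    object_id: int         # documented as non-consistent and non-reliable
    x1: float              # upper left corner
    y1: float
    x2: float              # bottom right corner
    y2: float
    label: Optional[str]   # raw ObjectType/View code, None when absent

    @property
    def description(self) -> str:
        return VIEW_LABELS.get(self.label or "", "unlabelled")


def parse_bicycle_line(line: str) -> BicycleAnnotation:
    """Parse 'Channel frameN ObjectID x1 y1 x2 y2 0 ObjectType/View' (assumed whitespace-separated)."""
    f = line.split()
    x1, y1, x2, y2 = (float(v) for v in f[3:7])
    # f[7] is the constant 0 column; the ObjectType/View code follows it, where applicable
    label = f[8] if len(f) > 8 else None
    return BicycleAnnotation(f[0], int(f[1]), int(f[2]), x1, y1, x2, y2, label)
```

Since ObjectID is documented as unreliable, it is safer to treat each line as an independent detection rather than to build tracks from the IDs.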

-Benchmark_RAI
A dataset for bicycle detection/tracking was prepared by processing/editing and annotating material made available by RAI under the “Giro 2017” MULTIDRONE dataset. It consists of two videos (resolutions: 768 x 432 and 960 x 540) at 25 frames per second, taken from the Giro d’Italia TV coverage provided by RAI. Annotations are exhaustive, i.e., all objects of a certain class present in a given frame are covered by an annotation. An annotation file is included along with each video file. The annotations are stored in text files with the following format:

  • frameN
  • #objects
  • x y w h
where x, y indicate the upper left corner of the bounding box and w, h describe its width and height in frame N.
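Because these annotations are exhaustive, a detection that overlaps no annotated box can reasonably be counted as a false positive, which makes this dataset usable for precision/recall evaluation. The sketch below uses intersection-over-union (IoU) on `x y w h` boxes with greedy one-to-one matching; the 0.5 threshold is a common convention rather than something the dataset specifies, and `iou`/`count_matches` are hypothetical helper names.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), (x, y) = upper left corner


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes in (x, y, w, h) form."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap width/height, clamped to zero when the boxes do not intersect
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def count_matches(detections: List[Box], ground_truth: List[Box],
                  threshold: float = 0.5) -> Tuple[int, int, int]:
    """Greedy matching in detection order; returns (true pos., false pos., false neg.)."""
    unmatched = list(ground_truth)
    tp = 0
    for det in detections:
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= threshold:
            unmatched.remove(best)  # each annotation may be matched only once
            tp += 1
    return tp, len(detections) - tp, len(unmatched)
```

Greedy matching depends on the order of `detections`; sorting them by descending detector confidence first is the usual choice.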

-person_detection_UAV
A visual person detection dataset has been prepared, consisting of two UHD videos (2160p - 3840 x 2160) at 25 frames per second. The dataset was shot on the AUTH campus, employing AUTH research personnel as actors. The camera was mounted on a DJI Phantom IV UAV and pointed towards the ground. The drone was either hovering or flying at low speed, while the actors were walking in random directions. The total video duration is 4 minutes and 20 seconds. An annotation text file is provided along with each video file. Each line refers to one person in a video frame and has the following format:

    number_of_frame person_id min_x min_y max_x max_y
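Since every line carries both a frame number and a person_id, the annotations can be indexed per frame (for detection) and per ID. The sketch below does both and converts the corner boxes to the `x y w h` convention used by several other datasets on this page. It assumes whitespace-separated values; whether person_id is consistent across frames, and whether max_x/max_y are inclusive (which would add 1 to the width and height), is not stated here, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Corners = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def load_person_annotations(text: str) -> Tuple[Dict[int, List[Tuple[int, Corners]]], Dict[int, List[int]]]:
    """Return (boxes per frame, frames per person_id) from
    'number_of_frame person_id min_x min_y max_x max_y' lines."""
    by_frame: Dict[int, List[Tuple[int, Corners]]] = defaultdict(list)
    by_person: Dict[int, List[int]] = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        frame, pid = int(fields[0]), int(fields[1])
        min_x, min_y, max_x, max_y = (float(v) for v in fields[2:6])
        by_frame[frame].append((pid, (min_x, min_y, max_x, max_y)))
        by_person[pid].append(frame)
    return dict(by_frame), dict(by_person)


def corners_to_xywh(box: Corners) -> Tuple[float, float, float, float]:
    """Convert (min_x, min_y, max_x, max_y) to (x, y, w, h), treating max as exclusive."""
    min_x, min_y, max_x, max_y = box
    return (min_x, min_y, max_x - min_x, max_y - min_y)
```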

-AUTHDroneSunday_VID
A dataset for visual human crowd detection was collected, in the form of 6 videos shot inside the AUTH campus using a DJI Phantom IV UAV. The videos depict a crowd of visitors during an "AUTH at Sundays" event. The video format is UHD 2160p, with a resolution of 4096 x 2160 at a rate of 25 frames per second. There are two scenes: the first contains a sparse crowd moving near exhibition stands, and the second a dense static crowd watching a presentation given by AUTH students. The second scene comprises 5 videos shot from different viewing angles. No annotation is currently available.


-uav_detection
A dataset was prepared by AUTH for visual drone detection. It consists of 12 Full HD videos (1080p - 1920 x 1080) filmed using two cameras. The cameras were pointed in the general direction of a flying DJI Phantom IV. The drone is shot against various backgrounds, including the sky, trees, buildings and roads. In 11 of the 12 videos, the two cameras are at ground level looking up at the drone, capturing a bottom view of it. In the last video, the camera is at the same elevation as the drone or higher, capturing mostly side and top views of it. The total video duration is 31 minutes. About 39K video frames were annotated for drone detection, with annotations in the following format:

    frame_number, number_of_drones, x_min, y_min, width, height
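A line of this comma-separated format can be turned into a corner-format box as sketched below. The sketch assumes at most one box per line, as the documented format suggests, and that lines for frames with no drone may omit the box fields; `parse_drone_line` is a hypothetical name.

```python
from typing import Optional, Tuple

Corners = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def parse_drone_line(line: str) -> Tuple[int, int, Optional[Corners]]:
    """Parse 'frame_number, number_of_drones, x_min, y_min, width, height'.

    Returns (frame_number, number_of_drones, box), with box None when no drone
    is annotated or the box fields are missing (an assumption about the files).
    """
    fields = [v.strip() for v in line.strip().split(",")]
    frame, n_drones = int(fields[0]), int(fields[1])
    if n_drones == 0 or len(fields) < 6:
        return frame, n_drones, None
    x_min, y_min, width, height = (float(v) for v in fields[2:6])
    return frame, n_drones, (x_min, y_min, x_min + width, y_min + height)
```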

-uav_detection_2
A dataset for drone detection was collected on the AUTH campus using one camera held by a person on the ground. In total, 11 Full HD videos were produced, showing a DJI Phantom IV against various backgrounds and at multiple sizes and views. The total duration of this dataset is 15 minutes, or about 22K frames at 25 fps. No annotation is currently available.


-landing_sites
A dataset of videos depicting potential UAV landing sites has also been captured. It consists of 2 videos (at a resolution of 4096 x 2160 pixels, with an approximate total duration of 5 minutes) captured by a DJI Phantom IV on the AUTH campus, containing potential landing sites around a point of interest (POI), or generally on the university campus. The potential landing sites are terrain locations with a small slope and no obstacles, so as to maximize the likelihood of a safe UAV landing. No annotation is currently available.


-AUTHObservatory_VID
A dataset named “AUTHObservatory_VID” was also collected by AUTH for building/Point-of-Interest detection purposes. It consists of two videos shot inside the AUTH campus using a DJI Phantom IV UAV, depicting the observatory building with its telescope dome. This is a unique building on the campus that can be considered a Point-of-Interest relative to the surrounding buildings. The video format is UHD 2160p, with a resolution of 4096 x 2160 at a rate of 25 frames per second. The view angles include a top view and a 360-degree view of the building sides from a height of 30-50 m. No annotation is currently available.


-face_deid_UAV
A dataset for face de-identification was collected, consisting of one 3840 x 2160 video shot by flying a DJI Phantom IV. The drone was flying at a height of about 3-5 meters, and its camera was pointed downwards, recording the subjects walking by and occasionally looking directly at it. The total video duration is 45 seconds, at a frame rate of 25 fps. Each face in the 1124 extracted frames is annotated with a bounding box, given by the pixel coordinates of its top left corner followed by its width and height, also in pixels. The annotations therefore have the following format:

    frame_number, number_of_faces, bounding box for each face in this frame
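Because the number of boxes on each line depends on `number_of_faces`, a parser can check that the coordinate count matches before use. This sketch assumes the values are separated by commas and/or whitespace and that each box contributes exactly four integers (x, y, width, height); the exact separators are not specified here, and `parse_face_line` is a hypothetical name.

```python
import re
from typing import List, Tuple

Box = Tuple[int, ...]  # (x, y, w, h) in pixels, top left corner first


def parse_face_line(line: str) -> Tuple[int, List[Box]]:
    """Parse 'frame_number, number_of_faces, <x y w h per face>' and validate the count."""
    # Accept commas and/or whitespace as separators (an assumption about the files)
    values = [int(float(v)) for v in re.split(r"[,\s]+", line.strip()) if v]
    frame, n_faces = values[0], values[1]
    coords = values[2:]
    if len(coords) != 4 * n_faces:
        raise ValueError(f"frame {frame}: expected {4 * n_faces} coordinates, got {len(coords)}")
    boxes = [tuple(coords[i:i + 4]) for i in range(0, len(coords), 4)]
    return frame, boxes
```

Raising on a mismatch, rather than silently truncating, makes malformed lines visible early, which matters for de-identification, where a missed face box means an unprotected face.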

-face_deidentification_UAV_mult_views
A dataset for face de-identification purposes was collected by a DJI Phantom IV UAV and consists of one 4096 x 2160 video. The UAV was flying at a height of about 3-5 meters, while the subjects were recorded from multiple viewpoints, walking by and occasionally looking directly at it. The total duration of the video is 2 minutes and 23 seconds, at a frame rate of 25 fps. No annotation is currently available.


-Annotations_eights_DW_raw
A dataset for boat detection/tracking was created using footage from DW, consisting of 3 videos (resolution: 1280 x 720) subsampled at 25 frames per second. An annotation file is included along with each video file. The annotations are stored in text files with the following format:
  • frameN
  • #objects
  • x y w h
where x, y indicate the upper left corner of the bounding box and w, h describe its width and height in frame N.


-UAV_Parkour
A UAV dataset for parkour athlete detection was assembled from 8 YouTube videos, depicting both male and female athletes performing parkour in different landscapes and under different lighting conditions. The annotations provided are stored in the following format:
  • frameN
  • #objects
  • x y w h
where x, y denote the upper left corner of the bounding box and w, h its width and height. As spectators are also depicted in the dataset videos, the annotation is 2-class, with label 0 assigned to the 'person' class and 1 to the 'athlete' class; it is not exhaustive, however, i.e., there may be unannotated objects in some frames. The labels are provided in files with the following format:
  • frameN
  • #objects
  • 0 or 1
similar to the annotation files.
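Since the label files mirror the annotation files, boxes and class labels can be joined frame by frame. This sketch assumes that the i-th label line of a frame belongs to the i-th box line of the same frame, which the description implies but does not state explicitly; `join_boxes_and_labels` and `_parse_blocks` are hypothetical names.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, ...]                      # (x, y, w, h)
CLASS_NAMES = {0: "person", 1: "athlete"}    # label meanings from the description


def _parse_blocks(text: str) -> Dict[int, List[List[str]]]:
    """Split 'frameN / #objects / one line per object' text into per-frame token rows."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    out: Dict[int, List[List[str]]] = {}
    i = 0
    while i < len(lines):
        frame, count = int(lines[i][0]), int(lines[i + 1][0])
        out[frame] = lines[i + 2 : i + 2 + count]
        i += 2 + count
    return out


def join_boxes_and_labels(box_text: str, label_text: str) -> Dict[int, List[Tuple[Box, str]]]:
    """Pair each box with its class name, assuming both files list objects in the same order."""
    boxes, labels = _parse_blocks(box_text), _parse_blocks(label_text)
    joined: Dict[int, List[Tuple[Box, str]]] = {}
    for frame, box_rows in boxes.items():
        label_rows = labels.get(frame, [])
        if len(label_rows) != len(box_rows):
            raise ValueError(f"frame {frame}: {len(box_rows)} boxes but {len(label_rows)} labels")
        joined[frame] = [
            (tuple(float(v) for v in b[:4]), CLASS_NAMES[int(lab[0])])
            for b, lab in zip(box_rows, label_rows)
        ]
    return joined
```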

-Final_bicycles
A dataset for bicycle detection/tracking was created, consisting of 6 HD videos, at 50 or 25 frames per second. An annotation file is included along with each video file. The annotations are stored in text files with the following format:
  • frameN
  • #objects
  • x y w h
where x, y indicate the upper left corner of the bounding box and w, h describe its width and height in frame N.
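Because these videos come at either 50 or 25 frames per second, the same frameN value denotes different instants in different videos. Converting frame indices to timestamps with each video's own frame rate makes them comparable. This is a minimal sketch; the page does not say whether frameN counts from 0 or 1, so that is a parameter, and the frame rate itself would have to be read from each video (for instance with OpenCV's `VideoCapture.get(cv2.CAP_PROP_FPS)`). The function names are hypothetical.

```python
def frame_to_seconds(frame_n: int, fps: float, first_frame: int = 1) -> float:
    """Timestamp in seconds of annotation frame frame_n.

    first_frame is 1 if frameN counts from 1 and 0 if it counts from 0
    (not specified on this page; check per dataset).
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return (frame_n - first_frame) / fps


def resample_frame(frame_n: int, src_fps: float, dst_fps: float, first_frame: int = 1) -> int:
    """Map a frame index from a src_fps video to the nearest frame index at dst_fps."""
    t = frame_to_seconds(frame_n, src_fps, first_frame)
    return first_frame + round(t * dst_fps)
```

For example, with 1-based numbering, frame 101 of a 50 fps video lies at 2.0 s, which corresponds to frame 51 at 25 fps.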


-Final_boats
A dataset for rowing boat detection/tracking was created, consisting of 5 HD videos, at 50 or 25 frames per second. An annotation file is included along with each video file. The annotations are stored in text files with the following format:
  • frameN
  • #objects
  • x y w h
where x, y indicate the upper left corner of the bounding box and w, h describe its width and height in frame N.


-Final_single_boats
A dataset for single boat detection/tracking was created, consisting of 5 HD videos, at 50 or 25 frames per second. An annotation file is included along with each video file. The annotations are stored in text files with the following format:
  • frameN
  • #objects
  • x y w h
where x, y indicate the upper left corner of the bounding box and w, h describe its width and height in frame N.


-UAV_BothKamp
A UAV dataset for parkour athlete detection was created by annotating the footage acquired during a MULTIDRONE experimental media production. It consists of 6 videos (1920 x 1080) at 50 frames per second. The annotations provided are not exhaustive, i.e., there may be unannotated objects in some frames, and they are stored in text files with the following format:
  • frameN
  • #objects
  • x y w h
where x, y denote the upper left corner of the bounding box and w, h its width and height.


-Aerial_Crowd_Auth
An aerial crowd detection dataset was created by annotating videos captured by two different RGB cameras (Olympus, Sony) placed about 10 m above the ground, recording a human crowd from different viewing angles. The videos were partially annotated, resulting in 563 RGB images (1920 x 1080) along with their segmentation maps, which consist of two classes ('crowd', 'non-crowd'). The segmentation maps are available as .png images, where pixels belonging to the 'crowd' class are red, while 'non-crowd' class pixels are black.
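To train or evaluate a crowd segmentation model, the red/black .png maps are usually converted to a binary mask. This sketch assumes that "red" means RGB (255, 0, 0) and "black" means (0, 0, 0), with a small tolerance in case maps have been resampled; the helper names are hypothetical, and `load_mask` relies on the third-party Pillow library, imported only inside that function. The UAV_Crowd_Seville maps use the same colour convention.

```python
from typing import Iterable, List, Sequence, Tuple


def is_crowd(pixel: Sequence[int], tol: int = 64) -> bool:
    """True for (approximately) pure red pixels, i.e. the 'crowd' class."""
    r, g, b = pixel[:3]
    return r >= 255 - tol and g <= tol and b <= tol


def crowd_mask(pixels: Iterable[Sequence[int]]) -> List[int]:
    """Map an iterable of RGB pixels to a flat 0/1 mask (1 = crowd)."""
    return [1 if is_crowd(p) else 0 for p in pixels]


def crowd_fraction(pixels: Iterable[Sequence[int]]) -> float:
    """Share of pixels labelled 'crowd' (0.0 for an empty image)."""
    mask = crowd_mask(pixels)
    return sum(mask) / len(mask) if mask else 0.0


def load_mask(path: str) -> Tuple[List[int], Tuple[int, int]]:
    """Read a segmentation .png with Pillow; returns (flat mask, (width, height))."""
    from PIL import Image  # lazy import so the helpers above work without Pillow
    img = Image.open(path).convert("RGB")
    return crowd_mask(img.getdata()), img.size
```

The flat mask is in row-major order, so pixel (x, y) sits at index `y * width + x`.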




-UAV_Crowd_Seville
A UAV crowd detection dataset was created by annotating videos captured by three different UAV cameras. The videos were partially annotated, resulting in 603 RGB images (1920 x 1080) along with their segmentation maps, which consist of two classes ('crowd', 'non-crowd'). The segmentation maps are available as .png images, where pixels belonging to the 'crowd' class are red, while 'non-crowd' class pixels are black.





RAI/Deutsche Welle Multidrone Datasets


To acquire these datasets, please complete and sign this license agreement. Subsequently, email it to Alberto Messina so as to receive FTP credentials for downloading.

If you are granted access, the following datasets are available (non-annotated).

-IGA_2017
The footage was taken during the International Horticultural Exhibition (IGA) in Berlin, June 2017. The drone used was a Mavic Pro; parts of the footage have been published in Deutsche Welle’s online format ‘Daily Drone’ (https://www.youtube.com/watch?v=MBgjr3ua554).

-WUENSDORF_2017
The footage showing a former Soviet base in the Federal State of Brandenburg, Germany, was shot with one Inspire 2 and one Mavic Pro (July 2017). The footage was used to create another Daily Drone clip https://www.youtube.com/watch?v=IIwQmGsXTNs. It shows the remains of the Soviet barracks and a Lenin statue.

-MUENCHEBERG_2017
The footage was taken in Muencheberg, Brandenburg, Germany, in October 2017, using one Inspire 2 and one Mavic Pro. In total, 29 clips were produced focusing on the MULTIDRONE Camera Motion Types taxonomy. The dataset includes the clips and the associated flight records.

-MUENCHEBERG_2018
One Inspire 2, one Mavic Pro and one Phantom 4 were used by a Deutsche Welle team to film a group of cyclists simulating a bicycle race, in Muencheberg, Brandenburg, Germany, during May 2018. The shoot was accompanied by colleagues from the University of Bristol who created simulations of such a bike race prior to the actual shooting. The parameters of these simulations such as flight altitude, camera angle, etc., were used during the recording of the race. Flight records are provided.

-NAUEN_2018
One football player and one cyclist were filmed with one Inspire 2 and one Mavic Pro in Nauen, Germany, during April 2018. The shooting focussed on a subset of the UAV Camera Motion Types identified in the MULTIDRONE UAV shot type taxonomy (Lateral Tracking Shot, Vertical Tracking Shot, Pedestal/Elevator Shot With Target, Chase/Follow Shot, Orbit). The dataset contains 19 clips and their associated flight records.

-GIRO_2017
This dataset consists of 9 clips taken from the 2017 edition of the Giro d'Italia, at 1920x1080 resolution, in MP4 format and at 25 frames per second.

-GIRO_2018
This dataset consists of 26 clips taken from the 2018 edition of the Giro d'Italia, at 1920x1080 resolution, in MP4 format and at 25 frames per second.

-ARCHIVE_2018
This dataset consists of 36 clips taken from RAI archives, depicting various shots of bikers, football players, boat racers and other outdoor sports (ski, sailing). Resolution varies from 720x576 to 1920x1080, depending on the copy stored in the archive.

-METEORA_2018
This dataset contains UAV footage filmed for Deutsche Welle's "Euromaxx – Lifestyle in Europe", in the mountains of Meteora, Greece, in August 2018. The footage mainly depicts rock climbing; it was shot with two drone models (a Mavic Air and an Inspire 2) and covers a variety of shot perspectives, movements and angles.

-WANNSEE_2018
This dataset contains UAV footage filmed by a Deutsche Welle team during the live rowing regatta "Rund um Wannsee" of 2018, one of the longest races in the world, set in the southwest of Berlin. Two drone teams were positioned along the course, a third drone was used for aerial overview shots, and two additional standard camera teams covered the rest.

-WANNSEE_2018_Test
This dataset contains UAV footage filmed by a Deutsche Welle team before the live rowing regatta "Rund um Wannsee" of 2018. Three drones were employed (an Inspire, a Mavic Air and a Mavic Pro), with flight records provided.

-CYCLISTS_2019
This is a dataset depicting a bicycle race training session in northern Italy (May 2019). The footage was filmed by RAI, using a DJI Phantom UAV flying above the bikers.

-YOUTUBE_Drone_Footage
This dataset consists of a list of links to roughly 10 hours of YouTube drone footage of soccer, rowing and cycling.