Fine-grained Incident Video Retrieval (FIVR) is the following problem: given a query video, retrieve all associated videos, considering several types of association. FIVR thus offers a single framework that contains several retrieval tasks as special cases. In this work, three association types between related videos are considered: Duplicate Scene Videos (DSV), Complementary Scene Videos (CSV), and Incident Scene Videos (ISV).
To collect the FIVR-200K dataset, we set up the following process to retrieve videos about major news events from recent years. First, we crawled Wikipedia's Current Events page to build a collection of the major news events since the beginning of 2013. Each news event is associated with a topic, headline, text, date, and hyperlinks. Five examples of the collected news events are shown in the following table.
| Headline | Date | Topic | Text | Source |
|---|---|---|---|---|
| Syrian civil war | 2013-01-01 | Armed conflicts and attacks | Fierce clashes erupt near the Aleppo ... | BBC |
| Greek debt crisis | 2015-07-07 | Business and economics | Eurozone leaders hold a crisis meeting ... | Reuters |
| Hurricane Harvey | 2017-08-29 | Disasters and accidents | The death toll from Hurricane Harvey ... | New York Times |
| Artificial intelligence | 2016-01-27 | Science and technology | A computer program called AlphaGo ... | MIT Technology Review |
| Boston Marathon Bombing | 2014-07-21 | Law and Crime | Azamat Tazhayakov, a friend of accused ... | MSN News |
We retained only news events categorized as "Armed conflicts and attacks" or "Disasters and accidents". We selected these two categories in order to find multiple YouTube videos that report on the same news event and, ultimately, to collect numerous pairs of videos that are associated with each other through the relations of interest (DSV, CSV and ISV). The news events were crawled for the period from January 1, 2013 to December 31, 2017. In total, 9,431 news events were collected, of which 4,687 were retained after filtering. The public YouTube API was then used to collect videos, with the event headlines as queries. The results were filtered to keep only videos published between the event's start date and one week after it, and with a duration of up to five minutes. This resulted in a collection of 225,960 videos (~48 videos/event).
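The two video filters described above (publication window and maximum duration) can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: the record fields `published` and `duration_s`, the function names, and the choice of inclusive boundaries are all assumptions made for the example.

```python
from datetime import date, timedelta

# Thresholds taken from the text; treating both bounds as inclusive is an assumption.
MAX_DURATION_S = 5 * 60            # "duration of up to five minutes"
WINDOW = timedelta(days=7)         # "one week after" the event start date


def keep_video(published: date, duration_s: int, event_date: date) -> bool:
    """Return True if a video passes both the date and the duration filter."""
    in_window = event_date <= published <= event_date + WINDOW
    return in_window and duration_s <= MAX_DURATION_S


def filter_videos(videos: list[dict], event_date: date) -> list[dict]:
    """Keep only the videos of one event that satisfy both filters.

    Each video is a dict with the hypothetical keys "published" (a date)
    and "duration_s" (duration in seconds).
    """
    return [
        v for v in videos
        if keep_video(v["published"], v["duration_s"], event_date)
    ]
```

For an event dated 2017-08-29, a two-minute video published on 2017-09-05 would be kept under these assumptions, while one published on 2017-09-06 or one lasting 301 seconds would be dropped.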
For the automatic selection of the query videos, we deployed a retrieval pipeline that estimated the suitability of candidate videos as benchmark queries. A video graph is constructed by adding an edge between every pair of videos with a high similarity score. The similarity between two videos is derived from the visual similarity of their content and the textual similarity of their titles. Then, the connected components of the video graph are extracted and filtered based on their size, the number of unique uploaders in the component, and the publication dates of their videos. To approximate the original version of the videos in each cluster, we chose the earliest published video as the query video. This process produced 635 candidate queries. Since annotating all of them would be overly time-consuming, we selected the top 100 as the final query set, ranked by the size of the corresponding graph component. The following figure illustrates some example queries from the FIVR-200K dataset, together with relevant videos for each association type.
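The query selection steps above can be sketched roughly as follows. This is an illustrative sketch under several assumptions, not the authors' pipeline: pairwise similarities are taken as already computed and combined into one score, the thresholds (`threshold`, `min_size`, `min_uploaders`) are hypothetical parameters, the publication-date filter on components is omitted, and the field names `uploader` and `published` are invented for the example.

```python
from collections import defaultdict


def connected_components(video_ids, edges):
    """Return the connected components of an undirected graph as lists of ids."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), []
    for start in video_ids:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:                      # iterative depth-first traversal
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def select_queries(videos, similarities, threshold, min_size, min_uploaders, top_k):
    """Pick one query per sufficiently large and diverse component.

    videos:       dict id -> {"uploader": str, "published": sortable value}
    similarities: dict (id_a, id_b) -> combined similarity score (assumed given)
    """
    # Connect videos whose similarity reaches the (hypothetical) threshold.
    edges = [pair for pair, score in similarities.items() if score >= threshold]

    candidates = []
    for component in connected_components(list(videos), edges):
        uploaders = {videos[v]["uploader"] for v in component}
        if len(component) < min_size or len(uploaders) < min_uploaders:
            continue
        # The earliest published video stands in for the "original" version.
        earliest = min(component, key=lambda v: videos[v]["published"])
        candidates.append((len(component), earliest))

    # Rank candidate queries by component size, largest first.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [query for _, query in candidates[:top_k]]
```

With `top_k=100`, the last step corresponds to keeping the 100 queries that come from the largest components.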
The dataset annotations can be found here; they include the events crawled from Wikipedia (events.json), the video annotations (annotation.json), and the YouTube video IDs (youtube_ids.txt).
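Once downloaded, the three files could be read along these lines. This is only a sketch: the helper name `load_fivr_files` is hypothetical, it assumes that all three files sit in one directory and that youtube_ids.txt holds one video ID per line, and it makes no assumption about the internal structure of the two JSON files.

```python
import json
from pathlib import Path


def load_fivr_files(root):
    """Load events.json, annotation.json and youtube_ids.txt from one directory."""
    root = Path(root)

    with open(root / "events.json", encoding="utf-8") as f:
        events = json.load(f)

    with open(root / "annotation.json", encoding="utf-8") as f:
        annotations = json.load(f)

    # Assumption: one YouTube video ID per line; blank lines are skipped.
    with open(root / "youtube_ids.txt", encoding="utf-8") as f:
        youtube_ids = [line.strip() for line in f if line.strip()]

    return events, annotations, youtube_ids
```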
The labels used for the annotation and their corresponding definitions are as follows:
If you have any comments or encounter any issues, please get in touch with Giorgos Kordopatis-Zilos.
If you use the FIVR-200K dataset in your research, please cite the following paper:
The video dataset is provided under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.