Introduction

Welcome to the VIVA traffic sign detection benchmark! This challenge uses the LISA-TS dataset, which consists of the original LISA-TS set, LISA-TS Extension Training, and LISA-TS Extension Testing. The first two are provided with annotations for training, while the last is used for testing and is therefore provided without annotations.

The challenge evaluates detectors on four superclasses:

  • Speed limit
  • Warning
  • No turn
  • Stop

As in the German Traffic Sign Detection Benchmark (GTSDB), detectors are evaluated using the area under the curve (AUC) of a precision-recall curve for each sign superclass. Bounding boxes are evaluated using the PASCAL overlap requirement of 50%. Precision-recall curves are interpolated as specified in [1]. There will be a class winner for each superclass, and the overall winner is the entry with the highest mean AUC across all four superclasses. Partial submissions covering only a subset of the superclasses are allowed but discouraged; they will not be considered for the overall leaderboard.
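
For concreteness, the PASCAL requirement counts a detection as a true positive when the intersection-over-union of the detected box and a ground truth box is at least 0.5. Below is a minimal sketch of that check, assuming boxes are given as (left, top, right, bottom) pixel coordinates; the function name and box layout are only for illustration and are not part of the toolkit:

def pascal_overlap_ok(det, gt, threshold=0.5):
    # Intersection rectangle (zero if the boxes do not overlap).
    iw = max(0, min(det[2], gt[2]) - max(det[0], gt[0]))
    ih = max(0, min(det[3], gt[3]) - max(det[1], gt[1]))
    intersection = iw * ih
    # Union = area of detection + area of ground truth - intersection.
    union = ((det[2] - det[0]) * (det[3] - det[1])
             + (gt[2] - gt[0]) * (gt[3] - gt[1])
             - intersection)
    return union > 0 and float(intersection) / union >= threshold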

Submissions

Submissions take the form of a zip-file containing detections from several runs with different parameters (so that a PR curve can be generated) for each superclass, split into folders. The structure is as follows:

results.zip
  ∟ speedLimit
    ∟ results1.csv
    ∟ results2.csv
    ∟ ...
  ∟ warning
    ∟ results1.csv
    ∟ results2.csv
    ∟ ...
  ∟ noTurn
    ∟ results1.csv
    ∟ results2.csv
    ∟ ...
  ∟ stop
    ∟ results1.csv
    ∟ results2.csv
    ∟ ...

The four folder names matter, but the zip-file and the results-files can have any names. There can be an arbitrary number of results files in each folder.
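
On Linux or macOS, a submission with this layout can be packaged with the standard zip tool, for example:

$ zip -r results.zip speedLimit warning noTurn stop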

Each results-file should be formatted in the same way as for GTSDB: a CSV-file with one detection per line, fields separated by semicolons (;). The file should contain no header. The fields must be:

  • Filename with extension (without path) of the file in which your algorithm has detected a traffic sign
  • The bounding box of the detection in the image
    • Leftmost image column of the box
    • Topmost image row of the box
    • Rightmost image column of the box
    • Bottommost image row of the box

Any further fields will be ignored.
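
For illustration, a few lines from one results-file might look like this (the filenames and coordinates below are made up):

someTestImage1.png;421;237;458;274
someTestImage1.png;610;180;652;224
someTestImage2.png;88;412;131;455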

Dataset and tools

As mentioned previously, the dataset comes in three parts, downloadable here:

  1. LISA-TS Original (7.7 GB)
  2. LISA-TS Extension (1.5 GB)
  3. LISA-TS Testing (3.6 GB)

Parts 1 and 2 are provided with ground truth annotations and may be used for training, whereas part 3 is used for testing and its annotations are therefore not included.

When using this dataset please cite the following paper:

Andreas Møgelmose, Dongran Liu, and Mohan M. Trivedi, “Traffic Sign Detection for U.S. Roads: Remaining Challenges and a Case for Tracking,” IEEE Intelligent Transportation Systems Conference (ITSC 2014), Oct. 2014.

To aid in testing and tuning your system, we provide a Python toolkit. It consists of the following scripts:

  • filterAnnotationFile.py: Used to filter out only the classes that you are interested in at a given moment. It can also randomly split the annotations into a training set and a test set when you are tuning your detector.
  • extractAnnotations.py: Based on an annotation file (either a full one, or a filtered one from the script above), this script can extract cropped training images.
  • evaluateDetections.py: Takes a single file with detection bounding boxes as specified in the section above and evaluates them against a ground truth file.
  • generatePRC.py: Takes one or more sets of detection files, evaluates them against a ground truth file and computes the AUC for each set of detection files. Will optionally generate a PRC plot.

Each script can be run with -h for further usage information, e.g.:

$ python generatePRC.py -h
usage: generatePRC.py [-h] [-gt annotations.csv]
                      [-d detections.csv [detections.csv ...]] [-t "PRC plot"]
                      [-l ["Team, algorithm" ["Team, algorithm" ...]]]
                      [-p 0.5] [-o] [-s prcPlot.png] [--noInterpolation]

Generate a precision-recall curve and compute the area under the curve (AUC)
from multiple detection results and an annotation file.

optional arguments:
  -h, --help            show this help message and exit
  -gt annotations.csv, --groundTruth annotations.csv
                        The path to the csv-file containing ground truth
                        annotations.
  -d detections.csv [detections.csv ...], --detectionPaths detections.csv [detections.csv ...]
                        Paths to multiple csv-files containing detections.
                        Each line formatted as filenameNoPath;upperLeftX;upper
                        LeftY;lowerRightX;lowerRightY. No header line. The
                        files should be produced with different parameters in
                        order to create multiple precision/recall data points.
                        This flag can be given several times to plot multiple
                        detectors against the ground truth.
  -t "PRC plot", --title "PRC plot"
                        Title put on the plot.
  -l ["Team, algorithm" ["Team, algorithm" ...]], --legend ["Team, algorithm" ["Team, algorithm" ...]]
                        Legend for each curve in the plot. Must have the same
                        number of entries as there are curves if given. If not
                        given, generic titles are used.
  -p 0.5, --pascal 0.5  Define Pascal overlap fraction.
  -o, --plot            Show plot of the computed PR curve.
  -s prcPlot.png, --savePlot prcPlot.png
                        Save the computed PR curve to the file prcPlot.png.
  --noInterpolation     By default the PR curves are interpolated according to
                        Davis & Goadrich "The Relationship Between Precision-
                        Recall and ROC Curves". If this flag is given,
                        interpolation is disabled.

Sample workflow

Imagine we are training a stop sign detector. In this particular case we choose to use only LISA-TS Extension as our data source, but the steps generalize if we want to combine it with LISA-TS Original.

The first step is to find the stop sign images and generate a training dataset and a testing dataset for tuning our detector (we cannot use LISA-TS Testing here, since we do not have access to its annotations; it is only for evaluating the final results):

$ python filterAnnotationFile.py -f stop -p stop 80 allTrainingAnnotations.csv

This will find all stop sign annotations, split them into two files with 80% in one and 20% in the other, and save the split with the prefix “stop”. On Linux we can verify this:

$ wc -l stop-split*
   946 stop-split1.csv
   237 stop-split2.csv
  1183 total

Next, we extract our training images:

$ python extractAnnotations.py crop stop-split1.csv 
None
['Filename', 'Annotation tag', 'Upper left corner X', 'Upper left corner Y', 'Lower right corner X', 'Lower right corner Y', 'Origin file', 'Origin frame number', 'Origin track', 'Origin track frame number\n']
[]
annotations/0_stop_1399493548.avi_image0.png
annotations/1_stop_1399493548.avi_image1.png
annotations/2_stop_1399493548.avi_image2.png
annotations/3_stop_1399493548.avi_image3.png
[... snip ...]
annotations/941_stop_1405372330.avi_image3.png
annotations/942_stop_1405372399.avi_image0.png
annotations/943_stop_1405372399.avi_image1.png
annotations/944_stop_1405372399.avi_image3.png
Done. Processed 946 annotations.

This will produce a folder called annotations with a lot of cropped stop signs in it.
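
Before training, the crops have to be loaded. As a small sketch of that step (assuming the OpenCV Python bindings are installed; this snippet is not part of the provided toolkit):

import glob
import cv2

# Load every crop written by extractAnnotations.py into memory.
crop_paths = sorted(glob.glob("annotations/*.png"))
crops = [cv2.imread(path) for path in crop_paths]
print("Loaded %d training crops" % len(crops))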

Now we train our detector and test it on the remaining 20% of our data (the test images can be conveniently found using extractAnnotations.py copy stop-split2.csv). We have made it output results as specified above, so now we can test it against the ground truth:

$ python evaluateDetections.py results.csv stop-split2.csv 
------
Number of annotations:	236
------
Testing with a Pascal overlap measure of: 0.50
True positives:		212
False positives:	512
False negatives (miss):	24
------
Precision:		0.2928
Recall:			0.8983
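
These figures follow directly from the counts above: precision = TP / (TP + FP) = 212 / (212 + 512) ≈ 0.2928, and recall = TP / (TP + FN) = 212 / (212 + 24) ≈ 0.8983.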

Recall looks reasonable, but the precision could be better. We adjust some thresholds and run the detector two more times to see how the system performs at different settings. Now we can make a PR curve:

$ python generatePRC.py -d results1.csv results2.csv results3.csv -gt stop-split2.csv -o
Warning: Not the same number of legend entries as curves. Using generic names.
70.5424949524

We see that we have an AUC of 70.5% with this detector. The PR curve looks like this:

[Figure: prcPlot, the PR curve for the detector above]

If we want to get rid of the warning in the output above, we can add a -l to give a proper legend. And while we’re at it, we add -t to give a proper title. Finally, we can also plot two or more detectors together by adding another -d:

$ python generatePRC.py -d results1.csv results2.csv results3.csv -d otherResults1.csv otherResults2.csv otherResults3.csv otherResults4.csv -gt stop-split2.csv -t "PR curve for stop sign detection" -l "My new great detector" "Another detector" -o

If we had time, we could generate even more result files for each detector, and the curves would be smoother.

Sample files for the above examples can be downloaded here: sampleResults.

References

[1] Davis, Jesse, and Mark Goadrich. “The Relationship Between Precision-Recall and ROC Curves.” Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006.