The 2012 Spoken Web Search Task
The task involves searching FOR audio content WITHIN audio content USING an audio content query. This task is particularly interesting for speech researchers in the area of spoken term detection or low-resource speech processing.

In the task, two sets of unmarked audio files from multiple, resource-limited languages will be provided to researchers: one consists of queries and the other of content. The task requires that each occurrence of a query within the content be identified. Both the relevant audio files and the locations of the query terms within those files must be found. No transcriptions, language tags or any other metadata are provided. The task therefore requires researchers to build a language-independent audio search system.
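One common way to approach this kind of query-by-example matching without language-specific resources is subsequence dynamic time warping (DTW) over frame-level acoustic features. The sketch below is only an illustration of that idea, not a method prescribed by the task. The function names `frame_distance` and `subsequence_dtw`, the Euclidean frame distance, and the representation of audio as lists of feature frames are all assumptions made for the example.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames (sequences of floats)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, content):
    """Find the region of `content` that best matches `query`.

    Both arguments are lists of feature frames (hypothetical input format).
    Returns (start, end, cost): inclusive content frame indices of the best
    match and the accumulated distance along the best warping path.
    """
    n, m = len(query), len(content)
    # acc[i][j]: cost of the best path aligning query[0..i] and ending at content[j]
    acc = [[math.inf] * m for _ in range(n)]
    # start[i][j]: content frame index at which that best path began
    start = [[0] * m for _ in range(n)]

    for j in range(m):
        # A match may begin at any content frame at no extra cost.
        acc[0][j] = frame_distance(query[0], content[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            # Predecessors: vertical step, plus diagonal and horizontal if j > 0.
            candidates = [(acc[i - 1][j], start[i - 1][j])]
            if j > 0:
                candidates.append((acc[i - 1][j - 1], start[i - 1][j - 1]))
                candidates.append((acc[i][j - 1], start[i][j - 1]))
            best_cost, best_start = min(candidates)
            acc[i][j] = best_cost + frame_distance(query[i], content[j])
            start[i][j] = best_start

    # A match may also end at any content frame.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    return start[n - 1][end], end, acc[n - 1][end]
```

In a full system, each content file would be scored against each query in this way, and every region whose cost falls below a threshold would be reported as a detection with its file name, start time and end time. Frame indices convert to times once a frame rate is fixed, which is again a choice left to the system builder.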

Participants will receive separate audio files in the development and evaluation sets, although both sets will cover the same languages. The queries will overlap only partially: some evaluation queries will have been seen during development, and some will not. Corpus size will be on the order of a few hundred utterances per language, amounting to no more than a few hours of audio.

Additional data are provided for system development: for each language occurring in the evaluation set, a set of development data consisting of audio and transcriptions is provided. For some languages, word pronunciation information will be available; for others it will not. Likewise, time-based alignment information will be available for some languages but not for others.

The task will be run in two different conditions:
(1) Using no audio data other than that provided for system development.
(2) Using any additional data and resources that participants might have available (as long as their use is documented).

Target group
The task is of interest to researchers in the area of speech technology (also for under-resourced languages), spoken term detection and spoken content search.

Data
Indian data set: Participants will be provided with a data set that has been kindly made available by the Spoken Web team at IBM Research, India. The audio content is spontaneous speech recorded over the phone in a live setting by low-literate users. While most of the audio content relates to farming practices, other domains are covered as well. The data set comprises audio from four different Indian languages: English, Hindi, Gujarati and Telugu. Each data item is an 8 kHz audio file ca. 4-30 secs in length. In total there are approximately 400 items, plus about 100 spoken search queries. Language labels will not be provided.

African data set: This data set consists of audio content recorded over the phone in four of the 11 official South African languages. Audio content consists of a combination of read and elicited speech. Some recording artifacts are present in the data. Each data item is an 8 kHz audio file ca. 4-10 secs in length. In total there are approximately 1,500 items, plus about 100 spoken search queries per language. Language labels, transcriptions, time alignments and pronunciation information will be provided.

Ground truth and evaluation
The ground truth is created manually and provided by the task organizers, following the principles of NIST's Spoken Term Detection (STD) evaluations.
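The exact metric and parameters used for scoring are not stated here. As an illustration of what following the NIST STD principles can look like, the sketch below computes the term-weighted value (TWV) as defined for the NIST 2006 STD evaluation, with the weighting constant beta = 999.9 used there. The function name `term_weighted_value`, the input layout, and the assumption that this task uses the same beta are all hypothetical.

```python
def term_weighted_value(counts, speech_seconds, beta=999.9):
    """Term-weighted value for one system at one decision threshold.

    counts: dict mapping each query to a tuple
            (n_true, n_correct, n_false_alarm) of occurrence counts.
    speech_seconds: total duration of the searched content, in seconds.
    beta: weight of false alarms relative to misses (999.9 in NIST STD 2006;
          assumed here, not confirmed for this task).
    """
    penalties = []
    for n_true, n_correct, n_false_alarm in counts.values():
        if n_true == 0:
            # Queries that never occur in the content are left out of the average.
            continue
        p_miss = 1.0 - n_correct / n_true
        # Non-target trials are approximated as one per second of speech,
        # minus the true occurrences.
        p_false_alarm = n_false_alarm / (speech_seconds - n_true)
        penalties.append(p_miss + beta * p_false_alarm)
    if not penalties:
        raise ValueError("no query has a true occurrence in the content")
    return 1.0 - sum(penalties) / len(penalties)
```

A perfect system scores 1.0, a system that returns nothing scores 0.0, and a system with many false alarms can score below zero. Because every query contributes equally to the average, rare queries matter as much as frequent ones.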

Task schedule
31 May: Development set release
1 July: Test set release
10 Sept: Run submission deadline
17 Sept: Results returned

Recommended reading
Fiscus, J., Ajot, J., Garofolo, J. and Doddington, G. Results of the 2006 Spoken Term Detection Evaluation. In Proc. of the 2007 Special Interest Group on Information Retrieval (SIGIR-07) Workshop on Searching Spontaneous Conversational Speech.

Metze, F., Rajput, N., Anguera, X., Davel, M., Gravier, G., van Heerden, C., Mantena, G.V., Muscariello, A., Prahallad, K., Szoke, I. and Tejedor, J. The Spoken Web Search task at MediaEval 2011. In Proc. ICASSP 2012, Kyoto, Japan, March 2012, pp. 3487-3491.

Task organizers
Florian Metze, CMU, USA
Marelie Davel, NWU, South Africa
Etienne Barnard, NWU, South Africa
Xavier Anguera, Telefonica, Spain
Guillaume Gravier, IRISA, France
Nitendra Rajput, IBM Research, India