-- KarenLivescu - 15 Dec 2005
---++ Possible databases for use at WS06

This is a listing of the databases that have been mentioned for possible use at WS06.

---+++ Audio only

   * SVitchboard (currently, this is the main database we plan to use)
      * Small-vocabulary (10- to 500-word), conversational speech extracted from the [[http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC97S62 Switchboard database]]
      * Described in [[http://www.cstr.ed.ac.uk/downloads/publications/2005/king_bartels_bilmes_svitchboard.pdf [King et al., Interspeech 2005] ]]
      * Also see the [[http://www.cstr.ed.ac.uk/research/projects/svitchboard SVitchboard project site]]
   * <nop>PhoneBook (a possible alternative in case we want to look at isolated words, or just a somewhat easier task)
      * ~8000-word vocabulary, isolated words, phonetically rich
      * Described in [[http://ieeexplore.ieee.org/iel2/3469/10215/00479283.pdf?arnumber=479283 [Pitrelli et al., ICASSP 1995] ]]
      * Also see the [[http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC95S27 PhoneBook database page]]

---+++ Audio-visual

   * AVTIMIT (MIT)
      * Multi-speaker
      * Read speech consisting of sentences from the [[http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S1 TIMIT database]]
      * Described in [[http://people.csail.mit.edu/people/saenko/hazen_icmi_04.pdf Hazen et al., ICMI 2004]]
   * AVICAR (UIUC)
      * Multi-speaker
      * Read speech consisting of isolated digits and letters, phone numbers, and sentences
      * In-car environment with varying car noise levels
      * Described in [[http://www.ifp.uiuc.edu/speech/AVICAR/downloads/documents/AVICAR.pdf Lee et al., ICSLP 2004]]
      * See the [[http://www.ifp.uiuc.edu/speech/AVICAR/ AVICAR project web site]]
      * Manual face and lip segmentations: attached at bottom of page
      * Description of the [[AVICAR Data on ws06afsr]]
   * CMU audio-visual speech processing project data (CMU)
      * 10 speakers
      * Isolated words
      * Lip tracking parameters available
      * See the [[http://amp.ece.cmu.edu/projects/AudioVisualSpeechProcessing/#Download project web site]]
   * CUAVE (Clemson)
      * 36 speakers, each reading 60 isolated digits and 60 connected digits
      * Also a number of speaker-pair recordings
      * Studio environment
      * See RelevantPapers for some example results
   * New continuous numbers database (MIT)?
      * Stay tuned...

---+++ Transcribed data

For sanity checks, or in order to do separate acoustic/pronunciation modeling experiments

   * STP ([[http://www.icsi.berkeley.edu/real/stp/ Switchboard Transcription Project]])
      * Manually transcribed (for the most part) utterances from the [[http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC97S62 Switchboard database]]
      * Transcribed at a detailed phonetic level (including diacritics, e.g. nasalization, frication)
   * [[FeatureTranscription][New data]] transcribed for the workshop

---+++ Articulatory measurement data

For detailed study of articulatory phenomena. The current thinking is that we will not use these, as they lack some useful articulatory information and are perhaps too difficult to translate to the discrete feature values that we intend to use. However, this is up for discussion.

   * [[http://www.medsch.wisc.edu/ubeam/ University of Wisconsin X-ray microbeam database]], a multi-speaker database described in [[http://www.medsch.wisc.edu/~milenkvc/pdf/ubdbman.pdf this user's handbook]]
   * [[http://www.cstr.ed.ac.uk/research/projects/artic/mocha.html MOCHA]], consisting of electromagnetic articulograph (EMA) recordings of 2 speakers

---++ Discussion area for this page:

%COMMENT{type="above"}%

-- Main.KarenLivescu - 08 Dec 2005