-- KarenLivescu - 15 Dec 2005
Transcription guidelines
Checklist
- Before starting an utterance:
- Point your web browser to the phone-to-feature mappings and the instructions on this page.
- Open the .wav file with a sampling rate of 8000 samples/sec and an offset of 1024 bytes. (To avoid doing this separately for each file, you can try checking the "use these settings for all .wav files" box when opening the first file.)
- Write down the time.
- To keep in mind during transcription:
- The detailed guidelines below.
- The boundaries in the initial .wd transcription were generated automatically and have many errors. You will probably need to modify some/many of them.
- The provided initial/final silence boundaries in all tiers were generated automatically from the .wd files; these are also likely to be wrong.
- The initial .phn transcription is an alignment with the dictionary pronunciations. It is often wrong when words are reduced. Change it with abandon.
- The final transcription should give enough information so that the speaker, by looking at the transcription, could recreate the acoustics exactly. More on this in the detailed guidelines below.
- Go for accuracy over speed. But if you are very unsure about a segment, use "?" or multiple labels (e.g. "FRIC/APP").
- Don't worry about exact boundary placement: anything within +/- 20ms is fine.
- Use the .cm tier to mark anything problematic or that should be discussed.
- Use the drop-down menus to label feature tiers, rather than typing in.
- It's OK to use some deductive reasoning. E.g. if the intended sound is [b] but the actual segment is a fricative, it is probably a LAB, FRIC and not a [v] (L-D, FRIC).
- When finishing an utterance:
- Write down the time.
- Save all transcription panes.
- Sanity checks:
- Re-skim the guidelines below to make sure the transcription follows them.
- Listen to each segment (using right-click on transcription --> "Play label") to make sure you believe the transcription.
- Stretch, get a drink...
- When doing the 2nd pass:
- Fix mistakes, not disagreements.
- Once the 2nd pass is done, you should be able to "defend" each difference between the transcriptions, i.e. "I chose this over that because...".
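The +/- 20ms boundary tolerance and the "fix mistakes, not disagreements" rule suggest a simple way to shortlist boundary differences worth a second look. The following is only a sketch: it assumes each tier has already been loaded as a list of (start, end, label) tuples with times in seconds, which is not necessarily how wavesurfer stores them, and the function names are hypothetical.

```python
TOLERANCE_S = 0.020  # +/- 20 ms, per the checklist


def boundary_times(tier):
    """Collect the distinct boundary times of a tier.

    `tier` is a list of (start, end, label) tuples, times in seconds
    (an assumed in-memory format, not a wavesurfer file format).
    """
    return sorted({t for start, end, _ in tier for t in (start, end)})


def unmatched_boundaries(first, second, tol=TOLERANCE_S):
    """Return boundaries of `first` that have no boundary of `second` within `tol`."""
    second_times = boundary_times(second)
    return [t for t in boundary_times(first)
            if not any(abs(t - u) <= tol for u in second_times)]


# Hypothetical example tiers from two passes over the same utterance.
pass1 = [(0.00, 0.10, "pcl"), (0.10, 0.25, "p"), (0.25, 0.40, "r")]
pass2 = [(0.00, 0.11, "pcl"), (0.11, 0.30, "p"), (0.30, 0.40, "r")]
print(unmatched_boundaries(pass1, pass2))
```

Boundaries that come back from such a check are only candidates for review; whether each one is a mistake or a defensible disagreement is still a judgment call.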
Detailed guidelines
In no particular order...
- Phone/feature hybrid transcriptions:
- Use a phone label when a segment's features match one of the phone labels. If they don't, mark "N/A" in the .phn tier and use the label tiers instead.
- If there are multiple "N/A" segments in a row (i.e. segments with different feature values with no matching phone labels), leave them as multiple segments rather than making one big "N/A" segment. (No good reason for this; just a convention.)
- If a phone has an unspecified feature value ([hh], [q], [r], [er], [axr], [sh], [zh], [ch], [jh]), it must be transcribed with both a phone label and feature values (for all tiers, not just the unspecified ones). E.g. an [hh] must be labeled with features since its vowel value is unspecified; it should be easy to tell what vowel shape the [hh] is in based on formants/listening. Similarly, when [q] is realized as IRR, the vowel shape is usually easy to tell and should be labeled in the .vow tier (if it's not easy to tell, label it '?'). For some phones with unspecified feature values, it may be hard to tell what the actual value is; e.g. an [r] may be rounded or unrounded; label it '?' if you can't tell.
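As a rough illustration of the hybrid rule above, the check below decides whether a segment's feature tiers must be filled in, given its .phn label. The set of phones is copied from the list above; the function name and the use of plain strings for labels are assumptions made for the sketch.

```python
# Phones listed in the guidelines as having an unspecified feature value.
UNDERSPECIFIED_PHONES = {"hh", "q", "r", "er", "axr", "sh", "zh", "ch", "jh"}


def needs_feature_labels(phn_label):
    """True if the feature tiers must be labeled for this .phn label.

    Features are required when no phone label fits ("N/A") or when the
    phone label leaves some feature value unspecified.
    """
    return phn_label == "N/A" or phn_label in UNDERSPECIFIED_PHONES


for label in ("hh", "N/A", "b"):
    print(label, needs_feature_labels(label))
```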
- The "recreation" rule:
- The final transcription should give enough information so that the speaker, by looking at the transcription, could recreate the acoustics exactly (assuming he/she could actually read the transcription).
- For example, if a word is very reduced but with a hint of the original gestures, that should be indicated somehow in the transcription. E.g. if the word is "probably" and is produced like "pry" but with a hint of labial/lateral gestures in the middle, don't transcribe it as /pcl p r ay1 ay2/. Use place=LAB or LAT and degree=APP to indicate these hints of gestures.
- [ah] vs. [ax], [ih] vs. [ix], [er] vs. [axr]: Use the schwa label ([ax], [ix], [axr]) if the segment is unstressed and 50ms or shorter.
- APP degree:
- Used for both glides ([y], [w]) and other sounds realized as approximants.
- If there is any gesture towards an intended consonant, even if small, use APP degree. E.g. if "probably" is produced almost like [p r ay] but with some evidence of lip narrowing in the middle of the [ay]-like region, mark that as APP, LAB.
- Two stop closures in a row: If you can't tell when the place of closure has changed (e.g. the [d g] sequence in "would go"), just mark the boundary in the middle.
- Baseline phonetic transcription: The initial transcription is an alignment (currently done by Karen) of the utterance to its baseform (dictionary) pronunciation. In other words, "probably" will always get the initial phonetic transcription [p r aa bcl b ax bcl b l iy] even if it was pronounced [p r ay]. This transcription is a very rough guideline; ignore it or change it as much as necessary to match what was actually spoken.
- "GLO" place: Used only for glottal stops.
- VOI vs. IRR in the .glo tier: If there are regular pitch periods, even with very low pitch, label them as VOI. Use IRR only when the pitch periods are not at regular intervals.
- Boundaries in diphthongs: Where does the boundary between 1 and 2 go?
- [aw1] should look/sound more like an [ae] (or [aa] in some dialects), [aw2] like an [uh] or [w]
- [ay1] <--> [aa], [ay2] <--> [ih] or [y]
- [ey1] <--> [eh], [ey2] <--> [ih] or [y]
- [ow1] doesn't have a non-diphthong correlate, [ow2] <--> [uh] or [w]
- [oy1] <--> [ao] or [ow1], [oy2] <--> [ih] or [y]
- Voiceless vowels: mark them as "VL", not "ASP", in the glottal tier (to differentiate from [hh])
- The vowel rule:
- If the .pl/.dg tiers are both "NONE/VOW", then there should be a vowel label in the .vow tier; otherwise, the .vow tier should be "N/A".
- The only exception is for rhoticized vowels ([er], [axr]) and syllabics ([el], [em], [en]); these get a vowel label in the .vow tier and a constriction in the .pl1/.dg1 tier ("RHO/APP" for [axr], [er]; "LAT/CLO", "LAB/CLO", or "ALV/CLO" for [el], [em], [en]).
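The vowel rule and its exception can be written down as a small consistency check. This is a sketch under stated assumptions: tier values are passed as plain strings, any .vow value other than "N/A" (including "?") counts as a vowel label, the syllabic constrictions are paired with [el], [em], [en] in the order listed above, and the function name is hypothetical. It does not handle "N/A" phone segments that happen to be rhoticized or syllabic.

```python
# Rhoticized vowels and syllabics: vowel label plus a fixed constriction.
EXCEPTIONS = {
    "er": ("RHO", "APP"),
    "axr": ("RHO", "APP"),
    "el": ("LAT", "CLO"),
    "em": ("LAB", "CLO"),
    "en": ("ALV", "CLO"),
}


def vowel_rule_ok(phn, place, degree, vow):
    """Check one segment's place/degree/vowel labels against the vowel rule."""
    has_vowel = vow != "N/A"
    if phn in EXCEPTIONS:
        # Exception: vowel label AND the listed constriction are both required.
        return has_vowel and (place, degree) == EXCEPTIONS[phn]
    if (place, degree) == ("NONE", "VOW"):
        return has_vowel
    return not has_vowel


# The .vow values used here are placeholders, not the real tier vocabulary.
print(vowel_rule_ok("iy", "NONE", "VOW", "iy"))
print(vowel_rule_ok("b", "LAB", "CLO", "iy"))
```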
- Laterals: Use "LAT" place for both light and dark [l]s. For an [l] with incomplete tongue tip closure, use "APP" in the degree tier. Example: the [l]s at the end of the words "all", "feel" would usually be marked as a "LAT/APP" segment followed by a "LAT/CLO" segment.
- Stops:
- For unaspirated (voiced or voiceless) stops, the .dg tier is CLO during the closure, then FRIC during the frication.
- For voiceless stops at the beginning of a stressed syllable, there may be aspiration. If there is a clear distinction between a frication portion and an aspiration portion, use CLO for the closure, FRIC for the frication, and APP during the aspiration. The aspiration should also be indicated as ASP in the .glo tier.
- When doing a phonetic labeling, the burst is not separated into frication and aspiration; however, if there are feature changes during the aspiration (e.g. if the lateral in the word "place" occurs during the aspiration), then label the aspiration portion with feature labels.
- If there is an utterance-initial or -final stop closure, it may not be possible to tell where the boundary between closure and silence is. In that case, mark a 100ms-long closure.
- Transitional periods between steady states: Don't label them as separate segments if they are natural/necessary transitional periods, e.g. the formant transitions between vowels and consonants. If there is an "extra" transitional sound beyond what is necessary, like "feel" --> [f iy ax l], label it as such.
- Voicing onset/offset: When in doubt, open a waveform blow-up; the onset/offset of voicing is the point at which periodicity starts/stops.
- Diphthongs realized as monophthongs: Label them as the monophthong, not as the "1" half of the diphthong, when possible. For example, an underlying [aw] that's produced as an [ae] should be marked [ae], not [aw1]. Similarly for [ey]. For [ow], there's no monophthong label corresponding to the first part, so mark it [ow1]. For [oy], the initial part of it may sound like an [ao], in which case label it as such; or it may sound more like the sound at the beginning of [ow] or "or", in which case label it [oy1].
- Here's a tricky utterance we talked about at the May 10 meeting, and the transcription I think we agreed on (at least I think we agreed on the [dh] portion, which was the tricky part).
Miscellaneous wavesurfer issues
- When saving a config file, panes (at least transcription & time axis panes) seem to get saved with heights reduced by 4 pixels. After saving a config file, open it in a text editor and reset the heights to the correct ones. Or, just edit the config file directly in a text editor rather than saving in wavesurfer.
- For pre-May14 utterances, choose sampling rate 8000 and offset of 128 bytes for the "ms98" files. For the ws96 files, choose 8000 samples/sec and 1024 bytes. Starting with the May14 utterances, use an offset of 1024 bytes for all files. Then choose the config file "featureset5_v3".
- "Save all transcriptions" doesn't seem to work. Do "Save transcription as..." in each transcription pane to save them separately (it is recommended to do this when you are done with each pane anyway).
- Random errors pop up every now and then... If something weird happens, save everything and restart wavesurfer.
- When two transcription panes have their properties set so that their boundaries move together, sometimes it only works to move the boundary in one of the transcription panes and not the other. When a boundary is moved, a marker at the old location sometimes remains in the spectrogram/waveform panes. It might go away if you click elsewhere or scroll back & forth in time. If not, open the pane properties for that transcription, uncheck and check the "extend boundaries..." option under Trans1, and click "Apply".
- To adjust spectrogram color settings, use "Spectrogram controls..." in the pane.
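To see why the header offset matters, the arithmetic can be sketched as follows. The 8000 samples/sec rate and the 128- and 1024-byte offsets come from the notes above; the sample width is NOT stated on this page, so `bytes_per_sample` is left as a required argument, and the value 2 used in the example is purely an assumption. The function names are hypothetical, and single-channel audio is assumed.

```python
RATE_HZ = 8000  # samples/sec, as given in the notes above


def duration_seconds(file_size_bytes, header_bytes, bytes_per_sample, rate_hz=RATE_HZ):
    """Audio duration implied by a file size, header offset, and sample width."""
    return (file_size_bytes - header_bytes) / bytes_per_sample / rate_hz


def offset_shift_seconds(chosen_offset, correct_offset, bytes_per_sample, rate_hz=RATE_HZ):
    """Amount of audio skipped (positive) or header read as audio (negative)
    when `chosen_offset` is used instead of `correct_offset`."""
    return (chosen_offset - correct_offset) / bytes_per_sample / rate_hz


# Assumed sample width of 2 bytes, for illustration only.
print(duration_seconds(1024 + 16000, header_bytes=1024, bytes_per_sample=2))
print(offset_shift_seconds(1024, 128, bytes_per_sample=2))
```

Under that assumed sample width, choosing 1024 where 128 is correct would skip roughly 56ms of audio, which is larger than the +/- 20ms boundary tolerance.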
-- KarenLivescu - 26 Jan 2006