-- KarenLivescu - 15 Dec 2005
Feature sets
A repository of feature sets we have discussed. The feature set discussion shifted to the feature transcription effort, and that page now has the most up-to-date feature set.
Feature set 1
Based on the vocal tract variables of articulatory phonology, and used in Livescu & Glass, HLT/NAACL-04 & ICSLP-04.
| Feature name | Values | Comments |
| LIP-LOC | 0=PRO, 1=LAB, 2=DEN | Corresponds (roughly) to the horizontal displacement of the lips. LAB (labial) is the default position of the lips; PRO (protruded) refers to rounded sounds such as [w]; DEN (dental) refers to labio-dental sounds such as [f]. Note that this definition does not allow us to represent rounded labio-dentals. |
| LIP-OPEN | 0=CLO, 1=CRIT, 2=NAR, 3=WID | The degree of opening of the lips. CLO (closed) is used for labial closures; CRIT (critical) is used for fricatives; NAR (narrow) for [w] or [uw]; and WID (wide) for all other sounds. |
| TT-LOC | 0=DEN, 1=ALV, 2=P-A, 3=RET | The location along the upper surface of the vocal tract to which the tongue tip is closest. DEN (dental) refers to the [th/dh] configuration; ALV=alveolar ([t/d], [n], [s,z], many vowels); P-A=palato-alveolar ([sh/zh], some vowels and non-coronal consonants); RET=retroflex ([r], [er]). |
| TT-OPEN | 0=CLO, 1=CRIT, 2=NAR, 3=M-N, 4=MID, 5=WIDE | Degree of opening of the tongue tip, relative to the location indicated by TT-LOC. CLO (closed) refers to full closures; CRIT (critical) to fricatives; NAR (narrow) to glides; M-N (mid-narrow) to vowels with a narrow constriction; and so on. |
| TB-LOC | 0=PAL, 1=VEL, 2=UVU, 3=PHAR | The location along the upper surface of the vocal tract to which the main "hump" of the tongue body is closest. UVU (uvular) is considered to be the default position when no particular constriction is being made with the tongue body. This set of values cannot accurately represent "fronted" velars such as the /k/ in "key" since they would fall in between "velar" and "palatal". |
| TB-OPEN | 0=CLO, 1=CRIT, 2=NAR, 3=M-N, 4=MID, 5=WIDE | The degree of opening of the tongue body, relative to the location indicated by TB-LOC. The interpretation of the values is analogous to TT-OPEN. |
| VEL | 0=CLO, 1=OPEN | Indicates whether or not the velum is open, i.e. is allowing air to flow to the nasal cavities. CLO (closed) is non-nasal; OPEN is nasal. |
| GLOT | 0=CLO, 1=CRIT, 2=WIDE | The degree of opening of the glottis. CLO (closed) is used for glottal or glottalized stops; CRIT (critical) for voiced sounds; and WIDE for voiceless sounds. |
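As a reading aid, the table above can be written down as ordered value lists, so that a value's position in its list is its numeric code. This is only a sketch in Python; the names `FS1`, `code`, `label`, and `describe` are made up for illustration and are not part of any workshop tool.

```python
# Feature set 1, copied from the table above. Each list is ordered by numeric
# code, so list position == code (e.g. TT-OPEN: 0=CLO ... 5=WIDE).
FS1 = {
    "LIP-LOC":  ["PRO", "LAB", "DEN"],
    "LIP-OPEN": ["CLO", "CRIT", "NAR", "WID"],
    "TT-LOC":   ["DEN", "ALV", "P-A", "RET"],
    "TT-OPEN":  ["CLO", "CRIT", "NAR", "M-N", "MID", "WIDE"],
    "TB-LOC":   ["PAL", "VEL", "UVU", "PHAR"],
    "TB-OPEN":  ["CLO", "CRIT", "NAR", "M-N", "MID", "WIDE"],
    "VEL":      ["CLO", "OPEN"],
    "GLOT":     ["CLO", "CRIT", "WIDE"],
}


def code(feature, name):
    """Numeric code of a symbolic value (hypothetical helper)."""
    return FS1[feature].index(name)


def label(feature, value):
    """Symbolic name of a numeric code (hypothetical helper)."""
    return FS1[feature][value]


def describe(codes):
    """Turn a dict of numeric codes into a dict of symbolic labels."""
    return {feature: label(feature, value) for feature, value in codes.items()}


# A labial closure: lips in the default (LAB) position and fully closed.
closure = {"LIP-LOC": 1, "LIP-OPEN": 0}
print(describe(closure))
```

Keeping the lists in code order also makes the ordinal comparisons used further down the page (for example "LO < NAR") plain integer comparisons.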
Feature set 2
If we make the assumption that LIP-LOC and LIP-OPEN are always synchronized; the 4 tongue features are always synchronized; GLOT and VEL are always synchronized; and there are no substitutions of feature values, then we can collapse each set of synchronous features into a single feature, whose state space of possible values may be much smaller than the product space of the component features. Feature set 2 is the result. This is the feature set used for the baseline end-to-end recognition experiment in ExperimentsResults.
| Feature name | Values  | Comments |
| G | 0=C-VO, 1=C-VL, 2=O-VO | Combination of GLOT and VEL. |
| T | 0=D-CR-U-M, 1=A-CL-U-N, 2=A-CL-U-M, 3=A-CR-U-M, 4=A-N-U-M, 5=A-MN-PA-N, 6=A-MN-PA-MN, 7=A-M-PA-M, 8=A-M-U-M, 9=A-W-V-M, 10=P-CR-PA-MN, 11=P-M-U-MN, 12=P-W-V-CL, 13=P-W-V-CR, 14=P-W-V-N, 15=P-W-U-N, 16=P-W-U-MN, 17=P-W-PH-MN, 18=R-N-U-M | Combination of TT-LOC, TT-OPEN, TB-LOC, TB-OPEN. Values are concatenated abbreviations of those four feature values, respectively. |
| L | 0=P-N, 1=P-W, 2=L-CL, 3=L-CR, 4=L-W, 5=D-CR | Combination of LIP-LOC and LIP-OPEN. Values are concatenated abbreviations of those two features' values; e.g. L=D-CR is equivalent to LIP-LOC=DEN, LIP-OPEN=CRIT. |
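To make the collapsing concrete, here is a small Python sketch that compares the number of values actually used in feature set 2 with the full product space of the component features, and that splits a collapsed L or T value back into its components. The value counts are read off the two tables. The abbreviation expansions (for example MN for M-N, PA for PAL in the tongue-body slot, PH for PHAR) are my reading of the table, not something the page states outright, apart from the L=D-CR example given above.

```python
from math import prod

# Value counts per feature, read off the feature set 1 and 2 tables.
FS1_SIZES = {"LIP-LOC": 3, "LIP-OPEN": 4, "TT-LOC": 4, "TT-OPEN": 6,
             "TB-LOC": 4, "TB-OPEN": 6, "VEL": 2, "GLOT": 3}
FS2_GROUPS = {"L": ("LIP-LOC", "LIP-OPEN"),
              "T": ("TT-LOC", "TT-OPEN", "TB-LOC", "TB-OPEN"),
              "G": ("GLOT", "VEL")}
FS2_SIZES = {"L": 6, "T": 19, "G": 3}

# ASSUMPTION: expansions of the abbreviations used inside the collapsed values.
TT_LOC = {"D": "DEN", "A": "ALV", "P": "P-A", "R": "RET"}
TB_LOC = {"PA": "PAL", "V": "VEL", "U": "UVU", "PH": "PHAR"}
LIP_LOC = {"P": "PRO", "L": "LAB", "D": "DEN"}
OPENING = {"CL": "CLO", "CR": "CRIT", "N": "NAR", "MN": "M-N", "M": "MID", "W": "WIDE"}


def decode_l(value):
    """Split a collapsed L value such as 'D-CR' into LIP-LOC and LIP-OPEN."""
    loc, opening = value.split("-")
    # LIP-OPEN spells its widest value WID rather than WIDE.
    return {"LIP-LOC": LIP_LOC[loc],
            "LIP-OPEN": OPENING[opening].replace("WIDE", "WID")}


def decode_t(value):
    """Split a collapsed T value such as 'A-MN-PA-N' into the four tongue features."""
    tt_loc, tt_open, tb_loc, tb_open = value.split("-")
    return {"TT-LOC": TT_LOC[tt_loc], "TT-OPEN": OPENING[tt_open],
            "TB-LOC": TB_LOC[tb_loc], "TB-OPEN": OPENING[tb_open]}


def product_space(name):
    """Size of the full product space of the features collapsed into `name`."""
    return prod(FS1_SIZES[f] for f in FS2_GROUPS[name])


for name in FS2_GROUPS:
    print(name, FS2_SIZES[name], "values used of", product_space(name), "possible")
print(decode_l("D-CR"))
print(decode_t("A-MN-PA-N"))
```

Under these counts the tongue feature T uses 19 of 576 possible combinations, which is where most of the reduction comes from.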
Feature set 3
Based on Chang, Wester, & Greenberg, Sp. Comm. 47:290-311, 2005 (see Papers). This is the type of feature set we might use for acoustic modeling.
| Feature name | Values | Comments |
| place | LAB, LAB-DEN, ALV, POST-ALV, VEL, DEN, GLO, RHO, FRT, CEN, BK, SIL | The first 7 refer to consonantal sounds; the next 4 to vocalic; and SIL to silence. GLOttal place refers to [hh]; RHOtic is as in [r], [er], [axr]; FRonT, CENtral, and BacK refer to front/central/back vowels. |
| manner | VOC, NAS, ST, FR, FL, SIL | |
| voicing | +, -, SIL? | Not sure if there is a SIL value for this feat and a couple more below (KL) |
| static | +, -, SIL? | - (dynamic) refers to stops, affricates, syllabic nasals, flaps (but not nasal flaps?), glides, liquids, and diphthongs |
| lip-rounding | +, -, SIL? | Pertinent to vocalic segments & glides |
| vocalic tongue height | HI, MID, LO, SIL? | |
| tense | +, -, SIL? | Also referred to as intrinsic vocalic duration |
One drawback of this feature set as used in Chang et al. is that diphthongs get a single feature vector; e.g. [ay] is low front, although the ending articulation is presumably high. Also, static/dynamic refers to entire segments rather than to instantaneous articulations, so it probably doesn't make sense for us to use it. Not clear whether tense/lax also has this issue.
Mapping from feature set 1 to feature set 3
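If we are to use different feature sets for pronunciation modeling and acoustic modeling, then we need a mapping from the former to the latter. Here's a mapping from feature set 1 to 3. In the expressions, LL, LO, TTL, TTO, TBL, TBO, and GLO abbreviate LIP-LOC, LIP-OPEN, TT-LOC, TT-OPEN, TB-LOC, TB-OPEN, and GLOT, and comparisons such as < and >= follow the numeric order of the values in the feature set 1 table: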
| Feature name | Value | Expression in terms of feature set 1 | Comments |
| place | LAB | ((LL = LAB) OR (LL = PRO)) AND (LO < NAR) | Closed or critical constriction at the lips |
| | LAB-DEN | (LL = LAB-DEN) AND (LO < NAR) | |
| | ALV | (TTL = ALV) AND (TTO < NAR) | |
| | POST-ALV | (TTL = P-A) AND (TTO < NAR) | |
| | VEL | (TBL = VEL) AND (TBO < NAR) | |
| | DEN | (TTL = DEN) AND (TTO < NAR) | |
| | GLO | (LO > NAR) AND (TTO > NAR) AND (TBO > NAR) AND (GLO = WI) | No constriction anywhere, and no voicing |
| | RHO | (TTL = RET) AND (TTO <= NAR) | |
| | FRT | (LO >= NAR) AND (TTO >= NAR) AND (TBO >= NAR) AND ((TTL = ALV) OR (TTL = P-A)) AND ((TBL = PAL) OR (TBL = VEL)) | No more than a narrow constriction anywhere, and tongue in a forward position |
| | CEN | (LO >= NAR) AND (TTO >= NAR) AND (TBO >= NAR) AND ((TTL = ALV) OR (TTL = P-A)) AND ((TBL = VEL) OR (TBL = UVU) OR (TBL = PHA)) | Will have to work on this (slight overlap with FRT) |
| | BK | (LO >= NAR) AND (TTO >= NAR) AND (TBO >= NAR) AND (TTL = P-A) AND ((TBL = UVU) OR (TBL = PHA)) | Also needs tweaking |
| | SIL | No mapping | |
| manner | VOC | (LO >= NAR) AND (TTO >= NAR) AND (TBO >= NAR) AND ((GLO = CL) OR (GLO = CR)) | No major constriction, and voicing is on |
| | NAS | (VEL = OP) AND ((LO = CL) OR (TTO = CL) OR (TBO = CL)) | A nasal has both an open velum and a closure somewhere. Nasalized vowels are not considered nasals. |
| | ST | ((LO = CL) OR (TTO = CL) OR (TBO = CL)) AND (VEL = CL) | Full closure somewhere, and not nasal (this assumes that "stop" refers only to the closure part) |
| | FR | (LO = CR) OR (TTO = CR) OR (TBO = CR) | Critical closure somewhere |
| | FL | (TTO = NAR) | This only works because flaps are defined to have a narrow tongue tip closure in feature set 1; not very convincing... |
| | SIL | ??? | |
| voicing | + | (GLO = CL) OR (GLO = CR) | Glottal stop considered voiced here |
| | - | GLO = WI | |
| static | + | No mapping for this feature | |
| | - | | |
| lip-rounding | + | LL = PRO | |
| | - | (LL = LAB) OR (LL = L-D) | |
| voc. tongue ht. | HI | (LO >= NAR) AND (TTO >= NAR) AND (TBO = M-N) | No more than a narrow constriction anywhere, and tongue body in a high position |
| | MID | (LO >= NAR) AND (TTO >= NAR) AND (TBO = MID) | |
| | LO | (LO >= NAR) AND (TTO >= NAR) AND (TBO = WI) | |
| tense | + | No obvious mapping for this feature | |
| | - | | |
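The overlap between the vowel place values noted in the comments can be checked mechanically. The Python sketch below encodes a subset of the place rules as predicates over numeric feature set 1 codes and returns every value whose rule fires, rather than only the first. The dictionaries for [s], [ae], and [uh] use the codes given in the phone-to-feature mapping table on this page; the function and variable names are illustrative only.

```python
# Numeric codes copied from the feature set 1 table.
NAR = 2                                   # openings: 0=CLO, 1=CRIT, 2=NAR, ...
TT_DEN, TT_ALV, TT_PA, TT_RET = 0, 1, 2, 3
TB_PAL, TB_VEL, TB_UVU, TB_PHAR = 0, 1, 2, 3


def no_major_constriction(f):
    """(LO >= NAR) AND (TTO >= NAR) AND (TBO >= NAR)."""
    return f["LO"] >= NAR and f["TTO"] >= NAR and f["TBO"] >= NAR


# A subset of the place rules, in table order.
PLACE_RULES = [
    ("ALV", lambda f: f["TTL"] == TT_ALV and f["TTO"] < NAR),
    ("POST-ALV", lambda f: f["TTL"] == TT_PA and f["TTO"] < NAR),
    ("VEL", lambda f: f["TBL"] == TB_VEL and f["TBO"] < NAR),
    ("DEN", lambda f: f["TTL"] == TT_DEN and f["TTO"] < NAR),
    ("FRT", lambda f: no_major_constriction(f)
        and f["TTL"] in (TT_ALV, TT_PA) and f["TBL"] in (TB_PAL, TB_VEL)),
    ("CEN", lambda f: no_major_constriction(f)
        and f["TTL"] in (TT_ALV, TT_PA) and f["TBL"] in (TB_VEL, TB_UVU, TB_PHAR)),
    ("BK", lambda f: no_major_constriction(f)
        and f["TTL"] == TT_PA and f["TBL"] in (TB_UVU, TB_PHAR)),
]


def matching_places(f):
    """Every place value whose rule fires for the FS1 vector f."""
    return [value for value, rule in PLACE_RULES if rule(f)]


# FS1 codes taken from the phone-to-feature mapping table on this page.
S = {"LO": 3, "TTL": 1, "TTO": 1, "TBL": 2, "TBO": 4}
AE = {"LO": 3, "TTL": 1, "TTO": 5, "TBL": 1, "TBO": 5}
UH = {"LO": 3, "TTL": 2, "TTO": 5, "TBL": 2, "TBO": 3}

for name, vector in (("s", S), ("ae", AE), ("uh", UH)):
    print(name, matching_places(vector))
```

With these rules [s] matches only ALV, while [ae] matches both FRT and CEN and [uh] matches both CEN and BK, so the result for vowels depends on which rule is tried first.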
Feature set 4
A modification of feature set 3, which I am suggesting for observation modeling as an alternative that fixes some drawbacks I see with FS3. -- KarenLivescu - 31 Dec 2005
| Feature name | Values | Comments |
| place | LAB, LAB-DEN, ALV, POST-ALV, VEL, DEN, GLO, RHO, FRT, CEN, BK, SIL | Same as in FS3. |
| manner | VOW, GLI, LAT, FL, FR, CLO, SIL | VOW replaces VOC and refers only to vowels; GLI refers to glides (included so as to be able to represent labial/velar glides); LAT is for laterals (didn't see how to distinguish them from some vowels without a dedicated feature value); CLO refers to the closure portion of a stop; FR refers to both fricatives and stop bursts. Nasality is now its own feature and can apply to most manners (the value NAS in FS3 is now manner=CLO, nasality=+). |
| nasality | +, - | Silence is considered to be - nasal. |
| voicing | +, - | Silence is considered to have - voicing. |
| lip-rounding | +, - | Applicable to any sound. Silence is - round. |
| vocalic tongue height | HI, MID, LO, SIL | |
Questions:
- What should [hh] be? In FS1, it is essentially a voiceless [ah]. That's OK for some [hh]s, but in words like "human" it's more like a palatal fricative.
- Should there be a dental place other than inter-dental (e.g. for dental [n]s)? This would help model things like "in the" with no [dh] but with a dental [n]. If we have such a value, there is no way to train classifiers for them from phonetic transcriptions. It would either have to happen through some EM-like training, or by bootstrapping from manual transcriptions.
- Is it OK to not have a nasal consonant manner? I.e. is it OK to lump oral stop closures and nasal stop closures in the same manner class? It makes the feature set less redundant, but is it harder to classify?
A question with an answer (just a reminder):
- Why not just use an established feature set, e.g. the one in the IPA chart or Stevens' distinctive features?
- Because those features refer to segments, not instantaneous articulations. The IPA ones are almost fine, but still have some segment-specific properties (e.g. "aspirated stop" means the end of the stop is aspiration noise).
-- KarenLivescu - 21 Jan 2006
Phone-to-feature mapping table
For now this is a mapping from phones to feature set 1. (See a few notes below.)
| Phone | Index | LIP-LOC | LIP-OPEN | TT-LOC | TT-OPEN | TB-LOC | TB-OPEN | VEL | GLOT |
| aa | 0 | 1 | 3 | 1 | 5 | 3 | 3 | 0,1 | 1 |
| ae | 1 | 1 | 3 | 1 | 5 | 1 | 5 | 0,1 | 1 |
| ah | 2 | 1 | 3 | 1 | 4 | 2 | 4 | 0,1 | 1 |
| ao | 3 | 0 | 3 | 1 | 5 | 3 | 3 | 0,1 | 1 |
| aw1 | 4 | 1 | 3 | 1 | 5 | 1 | 5 | 0,1 | 1 |
| aw2 | 5 | 0 | 2 | 2 | 5 | 2 | 3 | 0,1 | 1 |
| ax | 6 | 1 | 3 | 1 | 4 | 2 | 4 | 0,1 | 1 |
| axr | 7 | 1 | 3 | 3 | 1,2,3 | 1,2,3 | 0,1,3,4,5 | 0,1 | 1 |
| ay1 | 8 | 1 | 3 | 1 | 5 | 3 | 3 | 0,1 | 1 |
| ay2 | 9 | 1 | 3 | 1 | 3 | 0 | 3 | 0,1 | 1 |
| b | 10 | 1 | 1 | 1 | 4 | 2 | 5 | 0 | 1 |
| bcl | 11 | 1 | 0 | 1 | 4 | 2 | 5 | 0 | 1 |
| ch | 12 | 1 | 3 | 2 | 1 | 0 | 3 | 0 | 2 |
| d | 13 | 1 | 3 | 1 | 1 | 1 | 4 | 0 | 1 |
| dcl | 14 | 1 | 3 | 1 | 0 | 1 | 4 | 0 | 1 |
| dh | 15 | 1 | 3 | 0 | 1 | 2 | 4 | 0 | 1 |
| dx | 16 | 1 | 3 | 1 | 2 | 1 | 4 | 0 | 1 |
| eh | 17 | 1 | 3 | 1 | 4 | 0 | 4 | 0,1 | 1 |
| el | 18 | 1 | 3 | 1 | 0 | 2 | 2 | 0,1 | 1 |
| em | 19 | 1 | 0 | 1 | 4 | 2 | 4 | 1 | 1 |
| en | 20 | 1 | 3 | 1 | 0 | 2 | 4 | 1 | 1 |
| er | 21 | 1 | 3 | 3 | 1,2,3 | 1,2,3 | 0,1,3,4,5 | 0,1 | 1 |
| ey1 | 22 | 1 | 3 | 1 | 4 | 0 | 4 | 0,1 | 1 |
| ey2 | 23 | 1 | 3 | 1 | 3 | 0 | 3 | 0,1 | 1 |
| f | 24 | 2 | 1 | 1 | 4 | 1 | 4 | 0 | 2 |
| g | 25 | 1 | 3 | 2 | 5 | 1 | 1 | 0 | 1 |
| gcl | 26 | 1 | 3 | 2 | 5 | 1 | 0 | 0 | 1 |
| hh | 27 | 1 | 3 | 1 | 4 | 2 | 4 | 0 | 2 |
| ih | 28 | 1 | 3 | 1 | 3 | 0 | 3 | 0,1 | 1 |
| iy | 29 | 1 | 3 | 1 | 3 | 0 | 2 | 0,1 | 1 |
| jh | 30 | 1 | 3 | 2 | 1 | 0 | 4 | 0 | 1 |
| k | 31 | 1 | 3 | 2 | 5 | 1 | 1 | 0 | 2 |
| kcl | 32 | 1 | 3 | 2 | 5 | 1 | 0 | 0 | 2 |
| l | 33 | 1 | 3 | 1 | 0 | 2 | 2 | 0,1 | 1 |
| m | 34 | 1 | 0 | 1 | 4 | 2 | 4 | 1 | 1 |
| n | 35 | 1 | 3 | 1 | 0 | 2 | 4 | 1 | 1 |
| ng | 36 | 1 | 3 | 2 | 5 | 1 | 0 | 1 | 1 |
| ow1 | 37 | 0 | 3 | 2 | 5 | 2 | 3 | 0,1 | 1 |
| ow2 | 38 | 0 | 2 | 2 | 5 | 1 | 2 | 0,1 | 1 |
| oy1 | 39 | 0 | 3 | 1 | 5 | 2 | 3 | 0,1 | 1 |
| oy2 | 40 | 1 | 3 | 1 | 3 | 0 | 3 | 0,1 | 1 |
| p | 41 | 1 | 1 | 1 | 4 | 2 | 5 | 0 | 2 |
| pcl | 42 | 1 | 0 | 1 | 4 | 2 | 5 | 0 | 2 |
| r | 43 | 1 | 3 | 3 | 1,2,3 | 1,2,3 | 0,1,3,4,5 | 0,1 | 1 |
| s | 44 | 1 | 3 | 1 | 1 | 2 | 4 | 0 | 2 |
| sh | 45 | 1 | 3 | 2 | 1 | 0 | 3 | 0 | 2 |
| t | 46 | 1 | 3 | 1 | 1 | 1 | 4 | 0 | 2 |
| tcl | 47 | 1 | 3 | 1 | 0 | 1 | 4 | 0 | 2 |
| th | 48 | 1 | 3 | 0 | 1 | 2 | 4 | 0 | 2 |
| uh | 49 | 0 | 3 | 2 | 5 | 2 | 3 | 0,1 | 1 |
| uw | 50 | 0 | 2 | 2 | 5 | 1 | 2 | 0,1 | 1 |
| v | 51 | 2 | 1 | 1 | 4 | 1 | 4 | 0 | 1 |
| w | 52 | 0 | 2 | 2 | 5 | 2 | 2 | 0,1 | 1 |
| y | 53 | 1 | 3 | 1 | 3 | 0 | 2 | 0,1 | 1 |
| z | 54 | 1 | 3 | 1 | 1 | 2 | 4 | 0 | 1 |
| zh | 55 | 1 | 3 | 2 | 1 | 0 | 4 | 0 | 1 |
| epi | 56 | 1 | 0 | 1 | 3 | 2 | 3 | 0 | 2 |
| sil | 57 | 1 | 0 | 1 | 3 | 2 | 3 | 0 | 2 |
| dn | 58 | 1 | 3 | 1 | 1 | 1 | 4 | 0,1 | 1 |
| dcln | 59 | 1 | 3 | 1 | 0 | 1 | 4 | 0,1 | 1 |
| tn | 60 | 1 | 3 | 1 | 1 | 1 | 4 | 0,1 | 2 |
| tcln | 61 | 1 | 3 | 1 | 0 | 1 | 4 | 0,1 | 2 |
A few notes:
- Feature values are indicated with their numerical labels to save space--see feature definition table for the meanings of the numerical values.
- Dynamic phones (diphthongs, stops, etc.) have been broken into two parts.
- [dn], [tn], [dcln], [tcln] are "nasalizable" versions of [d, t, dcl, tcl] (typically in post-nasal position, e.g. "winter")
- [sil] and [epi] (epenthetic silence) have been given dummy feature values for now
-- KarenLivescu - 22 Dec 2005
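Some cells in the table hold several comma-separated codes (for example VEL = 0,1 for vowels). The Python sketch below parses one table row and enumerates the fully specified feature vectors it covers. It assumes that a multi-valued cell means "any of these values is allowed", which is my reading of the table rather than something stated on the page; `parse_row` and `expand` are hypothetical helper names.

```python
from itertools import product

COLUMNS = ["LIP-LOC", "LIP-OPEN", "TT-LOC", "TT-OPEN",
           "TB-LOC", "TB-OPEN", "VEL", "GLOT"]


def parse_row(row):
    """Parse one pipe-delimited row into (phone, index, {feature: [codes]})."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    phone, index = cells[0], int(cells[1])
    values = {name: [int(v) for v in cell.split(",")]
              for name, cell in zip(COLUMNS, cells[2:])}
    return phone, index, values


def expand(values):
    """All fully specified vectors allowed by a row (ASSUMES cells list alternatives)."""
    names = list(values)
    return [dict(zip(names, combo))
            for combo in product(*(values[name] for name in names))]


phone, index, values = parse_row(
    "| axr | 7 | 1 | 3 | 3 | 1,2,3 | 1,2,3 | 0,1,3,4,5 | 0,1 | 1 |")
print(phone, index, len(expand(values)))
```

Under that assumption the [axr] row covers 3 x 3 x 5 x 2 = 90 vectors, whereas a row such as [aa] covers only two (oral and nasalized).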
A first attempt at converting Buckeye phones
I started playing around with the definitions Karen made in order to see if I could get a mapping from Buckeye phones to Feature Set 4. Karen's pseudo-code inspired me to see if I could take her descriptions and compile them into a Perl program to convert the table above into feature definitions. I'm running into some inconsistencies with the mapping (some of which Karen describes above).
First: Here are the rules I'm using to map between FS 1 and FS 4. They are just slightly modified versions of her rules.
| Feature name | Value | Expression in terms of feature set 1 |
| place | LAB | ((LL = LAB) OR (LL = PRO)) AND (LO < NAR) |
| | LAB-DEN | (LL = LAB-DEN) AND (LO < NAR) |
| | ALV | (TTL = ALV) AND (TTO < NAR) |
| | PST-ALV | (TTL = P-A) AND (TTO < NAR) |
| | VEL | (TBL = VEL) AND (TBO < NAR) |
| | DEN | (TTL = DEN) AND (TTO < NAR) |
| | GLO | (LO > NAR) AND (TTO > NAR) AND (TBO > NAR) AND (GLO = WI) |
| | RHO | (TTL = RET) AND (TTO <= NAR) |
| | FRT | (LO >= NAR) AND (TTO >= NAR) AND (TBO >= NAR) AND ((TTL = ALV) OR (TTL = P-A)) AND ((TBL = PAL) OR (TBL = VEL)) |
| | CEN | (LO >= NAR) AND (TTO >= NAR) AND (TBO >= NAR) AND (((TTL = ALV) AND ((TBL = VEL) OR (TBL = UVU) OR (TBL = PHA))) OR ((TTL = P-A) AND (TBL = VEL))) |
| | BK | (LO >= NAR) AND (TTO >= NAR) AND (TBO >= NAR) AND (TTL = P-A) AND ((TBL = UVU) OR (TBL = PHA)) |
| | SIL | PHN = "h#" |
| manner | GLI | (LO >= NAR) AND (TTO >= NAR) AND (TBO >= NAR) AND ((GLO = CL) OR (GLO = CR)) AND ((TTO = NAR) OR (TBO = NAR)) |
| | LAT | (LO >= NAR) AND (TTO >= NAR) AND (TBO >= NAR) AND ((GLO = CL) OR (GLO = CR)) AND ((TTL = RET) OR (TTO = CL)) |
| | VOW | (LO >= NAR) AND (TTO >= NAR) AND (TBO >= NAR) AND ((GLO = CL) OR (GLO = CR)) |
| | CLO | (LO = CL) OR (TTO = CL) OR (TBO = CL) |
| | FR | (LO = CR) OR (TTO = CR) OR (TBO = CR) |
| | FL | (TTO = NAR) |
| | SIL | PHN = "h#" |
| nasality | + | VEL = OP |
| | - | VEL = CL |
| voicing | + | (GLO = CL) OR (GLO = CR) |
| | - | GLO = WI |
| lip_rounding | + | LL = PRO |
| | - | (LL = LAB) OR (LL = L-D) |
| voc_tongue_ht | HI | (LO >= NAR) AND (TTO >= NAR) AND (TBO = M-N) |
| | MID | (LO >= NAR) AND (TTO >= NAR) AND (TBO = MID) |
| | LO | (LO >= NAR) AND (TTO >= NAR) AND (TBO = WI) |
| | NA | (LO < NAR) OR (TTO < NAR) OR (TBO < M-N) |
| | SIL | PHN = "h#" |
Applying the following script to the ruleset above creates a mapping script:
#!/usr/bin/perl
# Generator: reads the FS1-to-FS4 rules (one "feature value expression" per
# line) and prints a second Perl script that applies them to the phone table.
# Inside the heredocs, $ and backslashes are escaped so that they reach the
# generated script intact.
print <<EOM;
#!/usr/bin/perl
print \" | #phn | place | manner | nasal | voicing | lip-rd | voc-ht | \\n";
while(<>) {
    chomp;
    # hack! choose first of all symbols
    s/,[^\\s]*//g;
    (\$PHN,\$IX,\$LL,\$LO,\$TTL,\$TTO,\$TBL,\$TBO,\$VEL,\$GLO)=split;
    \$place="XXX";
    \$manner="XXX";
    \$nasality="XXX";
    \$voicing="XXX";
    \$lip_rounding="XXX";
    \$voc_tongue_ht="XXX";
    # BEGIN AUTOMATICALLY GENERATED RULES
EOM
while(<>) {
    chomp;
    s/\s+$//;
    ($feature,$value,$mapping)=split(/\s+/,$_,3);
    # Replace symbolic FS1 values with numeric codes; $1 is the comparison operator.
    $mapping=~s/GLO ([<>]?=?) CL/\$GLO $1 0/g;
    $mapping=~s/GLO ([<>]?=?) CR/\$GLO $1 1/g;
    $mapping=~s/GLO ([<>]?=?) WI/\$GLO $1 2/g;
    $mapping=~s/LL ([<>]?=?) (DEN|LAB-DEN|L-D)/\$LL $1 2/g;
    $mapping=~s/LL ([<>]?=?) PRO/\$LL $1 0/g;
    $mapping=~s/LL ([<>]?=?) LAB/\$LL $1 1/g;
    $mapping=~s/LO ([<>]?=?) CL/\$LO $1 0/g;
    $mapping=~s/LO ([<>]?=?) CR/\$LO $1 1/g;
    $mapping=~s/LO ([<>]?=?) NAR/\$LO $1 2/g;
    $mapping=~s/LO ([<>]?=?) WI/\$LO $1 3/g;
    $mapping=~s/TTL ([<>]?=?) DEN/\$TTL $1 0/g;
    $mapping=~s/TTL ([<>]?=?) ALV/\$TTL $1 1/g;
    $mapping=~s/TTL ([<>]?=?) P-A/\$TTL $1 2/g;
    $mapping=~s/TTL ([<>]?=?) RET/\$TTL $1 3/g;
    $mapping=~s/TTO ([<>]?=?) CL/\$TTO $1 0/g;
    $mapping=~s/TTO ([<>]?=?) CR/\$TTO $1 1/g;
    $mapping=~s/TTO ([<>]?=?) NAR/\$TTO $1 2/g;
    $mapping=~s/TTO ([<>]?=?) M-N/\$TTO $1 3/g;
    $mapping=~s/TTO ([<>]?=?) MID/\$TTO $1 4/g;
    $mapping=~s/TTO ([<>]?=?) WI/\$TTO $1 5/g;
    $mapping=~s/TBL ([<>]?=?) PAL/\$TBL $1 0/g;
    $mapping=~s/TBL ([<>]?=?) VEL/\$TBL $1 1/g;
    $mapping=~s/TBL ([<>]?=?) UVU/\$TBL $1 2/g;
    $mapping=~s/TBL ([<>]?=?) (PHA|PHAR)/\$TBL $1 3/g;
    $mapping=~s/TBO ([<>]?=?) CL/\$TBO $1 0/g;
    $mapping=~s/TBO ([<>]?=?) CR/\$TBO $1 1/g;
    $mapping=~s/TBO ([<>]?=?) NAR/\$TBO $1 2/g;
    $mapping=~s/TBO ([<>]?=?) M-N/\$TBO $1 3/g;
    $mapping=~s/TBO ([<>]?=?) MID/\$TBO $1 4/g;
    $mapping=~s/TBO ([<>]?=?) WI/\$TBO $1 5/g;
    $mapping=~s/VEL ([<>]?=?) CL/\$VEL $1 0/g;
    $mapping=~s/VEL ([<>]?=?) OP/\$VEL $1 1/g;
    # Make PHN a variable, then turn the rule syntax into Perl operators.
    $mapping=~s/\bPHN\b/\$PHN/g;
    $mapping=~s/ =\s+\"/ eq \"/g;
    $mapping=~s/ = / == /g;
    $mapping=~s/ OR / \|\| /g;
    $mapping=~s/ AND / \&\& /g;
    # The "XXX" guard means the first matching rule for a feature wins.
    printf("    \$%s=\"%s\" if ((\$%s eq \"XXX\") && (%s));\n",$feature,$value,$feature,$mapping);
}
print <<EOM2;
    # END AUTOMATICALLY GENERATED RULES
    print join(" | ","",\$PHN,\$place,\$manner,\$nasality,\$voicing,\$lip_rounding,\$voc_tongue_ht,""),"\\n";
}
EOM2
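For readers who do not want to trace the generated Perl, here is a rough Python sketch of the same two steps: translating a rule expression into an executable condition, and applying the rules so that the first match for each feature wins (the role of the "XXX" guard). It is written in Python only for illustration, covers a handful of rules rather than the full set, does not handle the PHN = "h#" rules, and uses `eval` purely for brevity. The names `to_python` and `apply_rules` are made up.

```python
import re

# Symbolic-to-numeric codes, following the substitutions in the Perl script.
OPENINGS = {"CL": 0, "CR": 1, "NAR": 2, "M-N": 3, "MID": 4, "WI": 5}
CODES = {
    "GLO": {"CL": 0, "CR": 1, "WI": 2},
    "LL": {"PRO": 0, "LAB": 1, "DEN": 2, "LAB-DEN": 2, "L-D": 2},
    "LO": {"CL": 0, "CR": 1, "NAR": 2, "WI": 3},
    "TTL": {"DEN": 0, "ALV": 1, "P-A": 2, "RET": 3},
    "TTO": OPENINGS,
    "TBL": {"PAL": 0, "VEL": 1, "UVU": 2, "PHA": 3, "PHAR": 3},
    "TBO": OPENINGS,
    "VEL": {"CL": 0, "OP": 1},
}
TERM = re.compile(r"\b([A-Z]+) (<=|>=|<|>|=) ([A-Z][A-Z-]*)")


def to_python(expr):
    """Rewrite e.g. '(TTL = ALV) AND (TTO < NAR)' as a Python expression over f."""
    def repl(match):
        feat, op, val = match.groups()
        op = "==" if op == "=" else op
        return f'f["{feat}"] {op} {CODES[feat][val]}'
    return TERM.sub(repl, expr).replace(" AND ", " and ").replace(" OR ", " or ")


# A small subset of the rules from the table above, in table order.
RULES = [
    ("place", "ALV", "(TTL = ALV) AND (TTO < NAR)"),
    ("place", "DEN", "(TTL = DEN) AND (TTO < NAR)"),
    ("nasality", "+", "VEL = OP"),
    ("nasality", "-", "VEL = CL"),
    ("voicing", "+", "(GLO = CL) OR (GLO = CR)"),
    ("voicing", "-", "GLO = WI"),
]


def apply_rules(f, rules=RULES):
    """First matching rule per feature wins; unmatched features stay 'XXX'."""
    out = {feature: "XXX" for feature, _, _ in rules}
    for feature, value, expr in rules:
        if out[feature] == "XXX" and eval(to_python(expr), {"__builtins__": {}}, {"f": f}):
            out[feature] = value
    return out


# FS1 codes for [s], from the phone-to-feature mapping table.
S = {"LL": 1, "LO": 3, "TTL": 1, "TTO": 1, "TBL": 2, "TBO": 4, "VEL": 0, "GLO": 2}
print(to_python("(TTL = ALV) AND (TTO < NAR)"))
print(apply_rules(S))
```

Because only the first match is kept, the order of the rules in the table matters wherever two rules overlap, which is worth keeping in mind when judging the results.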
However, when I use the script, some of the definitions I get are nonsensical. So -- an opinion question -- should I try to get the rules right or just fix the FS4 definitions by hand?
Here's the result:
| #phn | place | manner | nasal | voicing | lip-rd | voc-ht |
| aa | CEN | VOW | - | + | - | HI |
| ae | FRT | VOW | - | + | - | LO |
| ah | CEN | VOW | - | + | - | MID |
| ao | CEN | VOW | - | + | + | HI |
| aw1 | FRT | VOW | - | + | - | LO |
| aw2 | BK | VOW | - | + | + | HI |
| ax | CEN | VOW | - | + | - | MID |
| axr | RHO | CLO | - | + | - | NA |
| ay1 | CEN | VOW | - | + | - | HI |
| ay2 | FRT | VOW | - | + | - | HI |
| b | LAB | FR | - | + | - | NA |
| bcl | LAB | CLO | - | + | - | NA |
| ch | PST-ALV | FR | - | - | - | NA |
| d | ALV | FR | - | + | - | NA |
| dcl | ALV | CLO | - | + | - | NA |
| dh | DEN | FR | - | + | - | NA |
| dx | FRT | GLI | - | + | - | MID |
| eh | FRT | VOW | - | + | - | MID |
| el | ALV | CLO | - | + | - | NA |
| em | LAB | CLO | + | + | - | NA |
| en | ALV | CLO | + | + | - | NA |
| er | RHO | CLO | - | + | - | NA |
| ey1 | FRT | VOW | - | + | - | MID |
| ey2 | FRT | VOW | - | + | - | HI |
| f | LAB-DEN | FR | - | - | - | NA |
| g | VEL | FR | - | + | - | NA |
| gcl | VEL | CLO | - | + | - | NA |
| hh | GLO | XXX | - | - | - | MID |
| ih | FRT | VOW | - | + | - | HI |
| iy | FRT | GLI | - | + | - | NA |
| jh | PST-ALV | FR | - | + | - | NA |
| k | VEL | FR | - | - | - | NA |
| kcl | VEL | CLO | - | - | - | NA |
| l | ALV | LAT | - | + | - | NA |
| m | LAB | CLO | + | + | - | NA |
| n | ALV | CLO | + | + | - | NA |
| ng | VEL | CLO | + | + | - | NA |
| ow1 | BK | VOW | - | + | + | HI |
| ow2 | BK | VOW | - | + | + | HI |
| oy1 | BK | VOW | - | + | + | HI |
| oy2 | FRT | VOW | - | + | - | HI |
| p | LAB | FR | - | - | - | NA |
| pcl | LAB | CLO | - | - | - | NA |
| r | RHO | CLO | - | + | - | NA |
| s | ALV | FR | - | - | - | NA |
| sh | PST-ALV | FR | - | - | - | NA |
| t | ALV | FR | - | - | - | NA |
| tcl | ALV | CLO | - | - | - | NA |
| th | DEN | FR | - | - | - | NA |
| uh | BK | VOW | - | + | + | HI |
| uw | FRT | GLI | - | + | + | NA |
| v | LAB-DEN | FR | - | + | - | NA |
| w | BK | GLI | - | + | + | NA |
| y | FRT | GLI | - | + | - | NA |
| z | ALV | FR | - | + | - | NA |
| zh | PST-ALV | FR | - | + | - | NA |
| epi | LAB | CLO | - | + | + | NA |
| sil | LAB-DEN | CLO | - | + | - | NA |
| dn | ALV | FR | - | + | - | NA |
| dcln | ALV | CLO | - | + | - | NA |
| tn | ALV | FR | - | + | - | NA |
| tcln | ALV | CLO | - | + | - | NA |
-- EricFoslerLussier - 02 Jan 2006
I will work on the rules. The main nonsensical ones seem to be the vowels. They seem to be very hard to convert between articulatory phonology-style features and IPA-style features. In the meantime, here's a fixed table for FS4 (a few notes below):
| #phn | place | manner | nasal | voicing | lip-rd | voc-ht |
| aa | BK | VOW | - | + | - | LO |
| ae | FRT | VOW | - | + | - | LO |
| ah | CEN | VOW | - | + | - | MID |
| ao | BK | VOW | - | + | + | LO |
| aw1 | FRT | VOW | - | + | - | LO |
| aw2 | BK | VOW | - | + | + | HI |
| ax | CEN | VOW | - | + | - | MID |
| axr | RHO | GLI | - | + | - | NA |
| ay1 | BK | VOW | - | + | - | LO |
| ay2 | FRT | VOW | - | + | - | HI |
| b | LAB | FR | - | + | - | NA |
| bcl | LAB | CLO | - | + | - | NA |
| ch | PST-ALV | FR | - | - | - | NA |
| d | ALV | FR | - | + | - | NA |
| dcl | ALV | CLO | - | + | - | NA |
| dh | DEN | FR | - | + | - | NA |
| dx | ALV | FL | - | + | - | NA |
| eh | FRT | VOW | - | + | - | MID |
| el | ALV | CLO | - | + | - | NA |
| em | LAB | CLO | + | + | - | NA |
| en | ALV | CLO | + | + | - | NA |
| er | RHO | CLO | - | + | - | NA |
| ey1 | FRT | VOW | - | + | - | MID |
| ey2 | FRT | VOW | - | + | - | HI |
| f | LAB-DEN | FR | - | - | - | NA |
| g | VEL | FR | - | + | - | NA |
| gcl | VEL | CLO | - | + | - | NA |
| hh | GLO | XXX | - | - | - | MID |
| ih | FRT | VOW | - | + | - | HI |
| iy | FRT | GLI | - | + | - | NA |
| jh | PST-ALV | FR | - | + | - | NA |
| k | VEL | FR | - | - | - | NA |
| kcl | VEL | CLO | - | - | - | NA |
| l | ALV | CLO | - | + | - | NA |
| m | LAB | CLO | + | + | - | NA |
| n | ALV | CLO | + | + | - | NA |
| ng | VEL | CLO | + | + | - | NA |
| ow1 | BK | VOW | - | + | + | HI |
| ow2 | FRT | GLI | - | + | + | NA |
| oy1 | CEN | VOW | - | + | + | HI |
| oy2 | FRT | VOW | - | + | - | HI |
| p | LAB | FR | - | - | - | NA |
| pcl | LAB | CLO | - | - | - | NA |
| r | RHO | GLI | - | + | - | NA |
| s | ALV | FR | - | - | - | NA |
| sh | PST-ALV | FR | - | - | - | NA |
| t | ALV | FR | - | - | - | NA |
| tcl | ALV | CLO | - | - | - | NA |
| th | DEN | FR | - | - | - | NA |
| uh | BK | VOW | - | + | + | HI |
| uw | BK | VOW | - | + | + | HI |
| v | LAB-DEN | FR | - | + | - | NA |
| w | BK | GLI | - | + | + | NA |
| y | FRT | GLI | - | + | - | NA |
| z | ALV | FR | - | + | - | NA |
| zh | PST-ALV | FR | - | + | - | NA |
| epi | LAB | CLO | - | + | + | NA |
| sil | LAB-DEN | CLO | - | + | - | NA |
| dn | ALV | FR | - | + | - | NA |
| dcln | ALV | CLO | - | + | - | NA |
| tn | ALV | FR | - | - | - | NA |
| tcln | ALV | CLO | - | - | - | NA |
Notes:
- axr = GLI?
- We will need more front-back/low-high values if we want to distinguish all of the vowels (ah vs. ax, uw vs. uh).
- Some other phone sets are indistinguishable (e.g. t vs. s). Some are perhaps OK left indistinguishable (e.g., r vs. axr vs. er, uh vs. ow2, ih vs. ay2).
-- KarenLivescu - 17 Jan 2006
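One way to find such indistinguishable sets systematically is to group phones by their full FS4 feature vector. The Python sketch below does this for a handful of rows copied from the fixed table above; the function name `confusable_sets` is hypothetical, and running it on the whole table would be needed to get the complete picture.

```python
from collections import defaultdict

# A few rows copied verbatim from the fixed FS4 table above.
ROWS = """
| t | ALV | FR | - | - | - | NA |
| s | ALV | FR | - | - | - | NA |
| ih | FRT | VOW | - | + | - | HI |
| ay2 | FRT | VOW | - | + | - | HI |
| r | RHO | GLI | - | + | - | NA |
| axr | RHO | GLI | - | + | - | NA |
| m | LAB | CLO | + | + | - | NA |
"""


def confusable_sets(table_text):
    """Group phones that share an identical feature vector."""
    groups = defaultdict(list)
    for line in table_text.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        phone, features = cells[0], tuple(cells[1:])
        groups[features].append(phone)
    # Keep only the vectors shared by more than one phone.
    return [phones for phones in groups.values() if len(phones) > 1]


print(confusable_sets(ROWS))
```

For these rows the groups are t/s, ih/ay2, and r/axr, while m stays on its own.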