
09.8.15.23:03: CROTHERS ON VOWEL SYSTEM FREQUENCY

While researching the last post on LCD phonology, I found John Crothers' survey of vowel systems in Greenberg et al. eds. (1978). Here are Crothers' top five vowel systems from a sample of 209 languages. He uses the term 'interior' to refer to

front rounded vowels: y ø œ

central nonlow vowels: ɨ ə

back unrounded vowels: ɯ ɤ ʌ

I'm deliberately not looking at the rest of Crothers' article to see if I recognize these systems.
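Crothers' labels below ("5 peripheral vowels, 1 interior vowel") can be reproduced mechanically from the three interior classes listed above. Here is a minimal Python sketch; the function name `describe` is my own, and treating every vowel outside those three lists as peripheral is an assumption that only holds for the symbols used in this post.

```python
# Crothers' three classes of 'interior' vowels, as listed above.
FRONT_ROUNDED = {"y", "ø", "œ"}
CENTRAL_NONLOW = {"ɨ", "ə"}
BACK_UNROUNDED = {"ɯ", "ɤ", "ʌ"}
INTERIOR = FRONT_ROUNDED | CENTRAL_NONLOW | BACK_UNROUNDED


def describe(vowels):
    """Return (peripheral_count, interior_count) for a vowel inventory.

    Assumption: any vowel not in INTERIOR counts as peripheral.
    """
    interior = [v for v in vowels if v in INTERIOR]
    peripheral = [v for v in vowels if v not in INTERIOR]
    return len(peripheral), len(interior)


print(describe(["i", "u", "ɛ", "ɔ", "a"]))       # system 1 below
print(describe(["i", "ɨ", "u", "ɛ", "ɔ", "a"]))  # system 2 below
```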

1. 55 languages (26%)

5 peripheral vowels, 0 interior vowels, symmetrical


            Front  Central  Back
High        i               u
Lower mid   ɛ               ɔ
Low                a

This system is in Spanish.

Hawaiian also has this system if length is disregarded.

Some might analyze Russian as having five vowels if ы is treated as an allophone of и, but I prefer to treat it as having six vowels.

2. 29 languages (14%)

5 peripheral vowels, 1 interior vowel


            Front  Central  Back
High        i      ɨ        u
Lower mid   ɛ               ɔ
Low                a

This system is in Russian if one regards /ɨ/ as a phoneme distinct from /i/.

3. 23 languages (11%)

3 peripheral vowels, 0 interior vowels, symmetrical


            Front  Central  Back
High        i               u
Low                a

This is the Classical Arabic system (if vowel length is ignored).

4. 14 languages (7%)

5 peripheral vowels, 2 interior vowels, symmetrical


            Front  Central  Back
High        i      ɨ        u
(Upper) mid e      ə        o
Low                a

I can't think of any major language with this system. Gong reconstructed this system (ignoring vowel length, tenseness, and retroflexion) for Tangut, I reconstructed this system for Old Japanese, and Whitman and Frellesvig reconstructed this system for Proto-Japonic.

5. 13 languages (6%)

4 peripheral vowels, 0 interior vowels, asymmetrical


            Front  Central  Back
High        i               u
Lower mid   ɛ
Low                a

I don't know of any major language with this system.
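As a quick check on the arithmetic, the percentages above follow from the raw counts and the sample size of 209. This sketch only uses figures quoted in this post; the combined share of the top five is my own sum, not a figure from Crothers.

```python
SAMPLE_SIZE = 209                 # languages in Crothers' sample
TOP_FIVE = [55, 29, 23, 14, 13]   # language counts for systems 1-5


def share(count, total=SAMPLE_SIZE):
    """Percentage of the sample, rounded to the nearest whole number."""
    return round(100 * count / total)


shares = [share(count) for count in TOP_FIVE]
combined = share(sum(TOP_FIVE))
print(shares)    # per-system percentages
print(combined)  # share of the sample covered by the top five together
```

So the five most common types together account for a little under two thirds of the sample.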

I'm surprised that these two systems

Old Chinese, pre-Tangut, and my Proto-Japonic vowels

5 peripheral vowels, 1 interior vowel, symmetrical


            Front  Central  Back
High        i               u
(Upper) mid e      ə        o
Low                a

Proto-Austronesian and Martin 1987's Proto-Japonic vowels

3 peripheral vowels, 1 interior vowel, symmetrical


            Front  Central  Back
High        i               u
Mid                ə
Low                a

are not in the top five or even in Crothers' Table 2 of top ten vowel system types.

Notice that none of the front rounded vowels common in European languages (French, German, and even Finnish and Hungarian) are in the top five or even Crothers' top ten. Assuming that Crothers' sample is representative of languages throughout the world, this tells us how atypical European languages are.

The 'biggest' languages don't necessarily have the most common vowel systems:

Mandarin has y which is not in the top ten.

Hindi and English have vowel systems larger than any of the top five above. They still don't fit any of the top five even if some vowels are treated as length variants and length is ignored.

Bengali has a symmetrical four-height system.

Portuguese has nasal vowels which are not in the top ten.

I've already mentioned that German and French have front rounded vowels which are not in the top ten. French also has nasal vowels.

Only Spanish is an exact match for any of the top five, though Japanese comes close and Russian is arguable.


09.8.14.23:59: WHAT WOULD AN LCD LANGUAGE SOUND LIKE?

If someone were to create a new Esperanto, they might want their language to be easily pronounced by speakers of most of the world's languages. They could devise a lowest common denominator (LCD) sound system consisting only of the most common consonants and vowels combining to form simple CV syllables.

Here's an LCD segmental inventory consisting only of sounds found in 90% or more of the languages in UPSID (the UCLA Phonological Segment Inventory Database):

Consonants: m: 94.24% - the only one! (Mohawk is among the tiny minority of languages without it.)

Vowels: None! But I would bet a, i, u would surpass 90% if slight variations were conflated.

I don't think a language consisting solely of m is going to be very successful. Let's try again, lowering the threshold to 80%:

Consonants:


        Bilabial       Palatal        Velar
Stop    4. p: 83.15%                  2. k: 89.36%
Nasal   1. m: 94.24%
Glide                  3. j: 83.81%

Note that the top three have different manners and places of articulation. p in fourth place is the first consonant to overlap with the first three: it shares its place with m and its manner with k. Labiovelar w (73.61%) in fifth place shares its manner with the glide j.

I'm surprised there's no t or n. This is probably because UPSID treats dentals and alveolars differently. Conflating the two classes might result in 80%+ frequencies for t and n.

Vowels:


        Front       Central     Back
High    i: 87.14%               u: 81.82%
Low                 a: 86.92%
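The thresholding above is easy to replay. This Python sketch uses only the UPSID percentages quoted in these posts (not the full database), and the name `lcd_inventory` is my own.

```python
# Percentages of UPSID languages with each segment, as quoted in these posts.
UPSID_PERCENT = {
    "m": 94.24, "k": 89.36, "j": 83.81, "p": 83.15, "w": 73.61, "b": 63.64,
    "i": 87.14, "a": 86.92, "u": 81.82,
    "ɛ": 41.24, "o": 40.13, "e": 37.47, "ɔ": 35.92,
}
VOWELS = {"i", "a", "u", "ɛ", "o", "e", "ɔ"}


def lcd_inventory(threshold):
    """Return (consonants, vowels) at or above threshold, most common first."""
    kept = [seg for seg, pct in UPSID_PERCENT.items() if pct >= threshold]
    kept.sort(key=UPSID_PERCENT.get, reverse=True)
    consonants = [seg for seg in kept if seg not in VOWELS]
    vowels = [seg for seg in kept if seg in VOWELS]
    return consonants, vowels


print(lcd_inventory(90))
print(lcd_inventory(80))
```

With these figures, the threshold would have to drop to about 73% before w joins the inventory.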

Note that

i is the vowel counterpart of j

a and k can both be pronounced with the mouth wide open

u is bilabial like m

The four next most common vowels have much lower frequencies:

4. Lower mid front ɛ (I would have expected upper mid e): 41.24%

5. Upper mid back o (oddly, not ɔ which would be at the same height as ɛ): 40.13%

6. Upper mid front e (slightly less frequent than its same-height back counterpart o): 37.47%

7. Lower mid back ɔ (less common than its same-height front counterpart ɛ): 35.92%

I suspect that conflating the figures for upper mid and lower mid vowels would result in percentages much higher than 40% but still less than 80%.
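Without per-language data I can't compute the conflated figures, but the two percentages bound them. This sketch assumes each figure is the share of UPSID languages that have the segment and that a conflated category counts languages with at least one of the pair: the lower bound is the case where every language with the rarer vowel also has the commoner one, and the upper bound is the case where no language has both. The function name is my own.

```python
def conflation_bounds(pct_a, pct_b):
    """Bounds on the share of languages having at least one of two segments."""
    lower = max(pct_a, pct_b)                      # complete overlap
    upper = min(100.0, round(pct_a + pct_b, 2))    # no overlap at all
    return lower, upper


mid_front = conflation_bounds(41.24, 37.47)  # ɛ and e
mid_back = conflation_bounds(40.13, 35.92)   # o and ɔ
print(mid_front)
print(mid_back)
```

Under those assumptions, neither conflated mid vowel can reach 80%, whatever the actual overlap is.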

Assuming that UPSID is a representative sample of the world's languages, this exercise shows how little they have in common on an exact phonetic level.


09.8.13.22:55: WHICH LA-N-GUAGES HAVE N-O N-ASALS?

I just discovered that UPSID allows me to search for languages that lack a certain type of sound. So I set the unwanted sound class to 'nasal' and got 16 results - "3.55% of all languages in UPSID."

The one that surprised me out of these 16 was Hakka. I've never heard Hakka spoken, but all descriptions I've seen include nasals. Hashimoto (1973: 88) describes Hakka initial nasals as having "a slight denasalization" yet treats them as phonological nasals /m n ŋ/. He regards Hakka final nasals as true nasals, so Hakka should not be among those 16 even if one regarded Hakka initial /m n ŋ/ as /mb nd ŋg/ (based on his phonetic notation on p. 95). In any case, the Old Chinese ancestor of Hakka definitely had nasals.

Hereafter I will regard any language with nasal allophones as 'having' nasals. The phonemic analysis of a language is open to question, but a language either has or does not have phonetic nasals.

Pirahã (South America) has [m] and [n] as allophones of /b/ and /g/. It also has nasalized vowels after /h/ and /ʔ/.

Andoke, Cubeo, Epena Pedee, and Maxakali (South America) and Klao and Kpan (Africa) have nasal allophones of voiced (prenasalized) stops before nasalized vowels. Kaingang has nasal allophones of voiced stops after nasalized vowels and in word-initial position before nasalized vowels.

The remaining 7 languages fall into four categories:

1. Apinaye and Siriono (South America) have prenasalized stops and nasalized vowels.

2. Waris (New Guinea) has prenasalized stops.

3. Barasano (South America) has nasalized vowels.

4. Lushootseed, Quileute (North America), and the central dialect of Rotokas (New Guinea) have no nasal(ized) segments.
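To make sure the groups above add up to UPSID's 16 results, here is a small tally in Python. The group labels are my own shorthand for the descriptions above, and the sample size is only the figure implied by "3.55%", not something I looked up.

```python
NASALLESS = {
    "Hakka (descriptions disagree with UPSID)": ["Hakka"],
    "nasal allophones": [
        "Pirahã", "Andoke", "Cubeo", "Epena Pedee", "Maxakali",
        "Klao", "Kpan", "Kaingang",
    ],
    "1. prenasalized stops and nasalized vowels": ["Apinaye", "Siriono"],
    "2. prenasalized stops": ["Waris"],
    "3. nasalized vowels": ["Barasano"],
    "4. no nasal(ized) segments": ["Lushootseed", "Quileute", "Rotokas (central)"],
}

total = sum(len(langs) for langs in NASALLESS.values())
# The numbered groups are the 'remaining' languages without nasal allophones.
remaining = sum(len(langs) for label, langs in NASALLESS.items() if label[0].isdigit())
implied_sample = round(total / 0.0355)  # sample size implied by 3.55%

print(total, remaining, implied_sample)
```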

Assuming that none of the above seven languages have nasal allophones, I suspect that they lost their nasals via two paths:

1. Nasals to prenasalized stops to oral stops: e.g., m > mb > b > β

Languages in categories 1 and 2 may be at the mb stage. Languages in category 4 may be at the b stage (Lushootseed and Quileute) or the β stage (central Rotokas which has [b] ~ [β] allophony). Central Rotokas voiced obstruents correspond to Aita Rotokas voiced obstruents and nasals, implying that CR has merged them into a single class of oral phonemes: e.g., *b, *m > CR /b/ [b] ~ [β].

All category 4 languages have voiced oral obstruents (from earlier nasals?). There may be no language lacking both nasals and voiced oral obstruents.

2. Nasal-vowel sequences to nonnasal-nasalized vowel sequences: e.g., ma > mã > mbã > bã

This may have occurred in the category 1 and 3 languages. If nasalized vowels ever existed in the ancestors of the category 2 and 4 languages, they lost their nasality: ã > a.


09.8.12.22:11: THE 'MISSING' BILABIALS OF MOHAWK

Most languages I know of have at least two bilabials: e.g.,

Vietnamese: ɓ, m (no p or b!)

Hawaiian: p, m, w

Average European language: p, b, m

Mandarin and many Chinese languages: p, ph, m, w (romanized as b, p, m, w in Pinyin)

Thai: p, ph, b, m, w

m appears in nearly all languages in UPSID's sample:

1. m: 94.24%

2. p: 83.15%

3. w: 73.61% (regarded by UPSID as 'labial-velar')

4. b: 63.64%

However, Mohawk has only one native bilabial (w) which Wikipedia regards as velar.

According to the Mohawk Language Standardisation Project, w is pronounced before a vowel as in English will, so I presume it is, strictly speaking, labiovelar [w] rather than a true velar [ɰ].

The sequence /wh/ is pronounced as labiodental [f]. I don't know of any language that has [f] but no bilabial oral stops.

Although it's possible that Mohawk never had any bilabials - none if we regard w as labiovelar - until contact with Westerners, I think it's more probable that it once had them and lost them.

Given that Mohawk has /t/ and /n/, I presume that pre-Mohawk had their bilabial counterparts *p and *m.

*p might have gone to zero as in Irish:

'father': Proto-Indo-European *pHteer > Proto-Celtic *ɸatiir > Ir athair (cf. Latin pater)

*m might have merged with *w.

Mandarin and many Chinese languages have partly merged Old Chinese nonemphatic *m and *w: e.g.,

'die': 亡 OC *m > Md wang (but Cantonese mong still has m-)

'king': 王 OC *w > Md wang (Cantonese wong)

Note, however, that OC *m remained unchanged before front vowels: e.g.,

'people': 民 OC *min > Md min, Cantonese man

'increasingly': 彌 OC *me > Md mi, Cantonese nei (!; also mei)

I don't know of any language in which all m became w.


09.8.11.23:59: BALANCING THE VOWEL SYSTEM OF MOHAWK

David Boxenhorn suggested that the unusual features of Mohawk phonology indicated a language in transition. What if the on-bal-en-ced vowel system of Mohawk is a halfway point between two very different balanced vowel systems?

Pre-Mohawk could have had a symmetrical four-vowel system without nasal vowels:


            front     central   back
high        *i                  *u
mid                   *ə
low                   *a
total number of vowels per column: front 1, central 2, back 1

Nasal vowels developed from *VN sequences:

*uN fused into *ũ

nonlabial vowels + *-N became *ĩ, *ə̃, *ã


            front     central   back
high        *i *ĩ               *u *ũ
mid                   *ə *ə̃
low                   *a *ã
total number of vowels per column: front 2, central 4, back 2

Four changes upset this balance:

1. Nonlabial nasal vowels *ĩ, *ə̃, *ã merged into *ə̃:


            front     central   back
high        *i                  *u *ũ
mid                   *ə *ə̃
low                   *a
total number of vowels per column: front 1, central 3, back 2

2. *ə fronted and lowered to *ɛ to differentiate it from *ə̃:


            front     central   back
high        *i                  *u *ũ
upper mid             *ə̃
lower mid   *ɛ
low                   *a
total number of vowels per column: front 2, central 2, back 2

3. *u lowered to *ɔ to differentiate it from *ũ and balance *ɛ:


            front     central   back
high        *i                  *ũ
upper mid             *ə̃
lower mid   *ɛ                  *ɔ
low                   *a
total number of vowels per column: front 2, central 2, back 2

Changes 2 and 3 could have occurred in the opposite order. (Of course, if 3 occurred first, then *ə would front to *ɛ to balance *ɔ instead of *u lowering to *ɔ to balance *ɛ.)

4. *ə̃ backed and lowered to match the other nasal vowel *ũ in labiality and the other mid vowels in height. This results in the current unbalanced vowel system (in a more precise phonetic notation than in previous posts) which has only one central vowel but three back vowels (and two lower mid back vowels!).


            front     central   back
high        i [i]               on [ũ]
upper mid
lower mid   e [ɛ]               en [ʌ̃]  o [ɔ]
low                   a [a]
total number of vowels per column: front 2, central 1, back 3
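The column totals at each stage can be checked by replaying the changes on a set of vowels. This is only a sketch of my own hypothesis: the starting vowels *i, *ə, *a, *u and the outcomes *ɛ and *ɔ are the symbols I assume for the four changes, and placing ʌ̃ in the back column follows the chart above. Nasal vowels are built with a combining tilde so that the keys stay consistent.

```python
from collections import Counter

TILDE = "\u0303"  # combining tilde marks a nasal vowel
ORAL_COLUMN = {"i": "front", "ɛ": "front", "ə": "central", "a": "central",
               "u": "back", "ɔ": "back", "ʌ": "back"}


def nasal(vowel):
    return vowel + TILDE


def totals(vowels):
    """Count vowels per column as (front, central, back)."""
    counts = Counter(ORAL_COLUMN[v.replace(TILDE, "")] for v in vowels)
    return counts["front"], counts["central"], counts["back"]


def shift(vowels, changes):
    """Apply sound changes {old: new} simultaneously; mergers collapse."""
    return {changes.get(v, v) for v in vowels}


stages = [{"i", "ə", "a", "u"}]                                  # pre-Mohawk
stages.append(stages[-1] | {nasal(v) for v in stages[-1]})       # *VN > nasal vowels
stages.append(shift(stages[-1], {nasal("i"): nasal("ə"),
                                 nasal("a"): nasal("ə")}))        # change 1
stages.append(shift(stages[-1], {"ə": "ɛ"}))                     # change 2
stages.append(shift(stages[-1], {"u": "ɔ"}))                     # change 3
stages.append(shift(stages[-1], {nasal("ə"): nasal("ʌ")}))       # change 4

history = [totals(stage) for stage in stages]
print(history)
```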

Here are two possible balanced futures for the Mohawk vowel system.

I first thought of scenario A:

1. ũ loses its labiality and becomes ɯ̃:


            front     central   back
high        i                   ɯ̃
upper mid
lower mid   ɛ                   ʌ̃ ɔ
low                   a
total number of vowels per column: front 2, central 1, back 3

2. ʌ̃ becomes central ə̃ to differentiate it from ɔ:


            front     central   back
high        i                   ɯ̃
upper mid             ə̃
lower mid   ɛ                   ɔ
low                   a
total number of vowels per column: front 2, central 2, back 2

3. ɯ̃ becomes central ɨ̃ to match central ə̃:


            front     central   back
high        i         ɨ̃
upper mid             ə̃
lower mid   ɛ                   ɔ
low                   a
total number of vowels per column: front 2, central 3, back 1

4. Nasal vowels denasalize:


            front     central   back
high        i         ɨ
upper mid             ə
lower mid   ɛ                   ɔ
low                   a
total number of vowels per column: front 2, central 3, back 1

5. ɔ raises to u:


            front     central   back
high        i         ɨ         u
upper mid             ə
lower mid   ɛ
low                   a
total number of vowels per column: front 2, central 3, back 1

6. a backs to ɑ, approaching the lower mid back slot vacated by ɔ (now u). The result is a six-vowel system without o:


            front     central   back
high        i         ɨ         u
upper mid             ə
lower mid   ɛ
low                             ɑ
total number of vowels per column: front 2, central 2, back 2

My English dialect has an o-less short vowel system: e.g., cot is [kɑt]. (But I do have [ou] in my long vowel system: e.g., coat [kout].)

7. At some further future point, ɑ could raise to ɔ, and the central vowels ɨ and ə could lower to ə and a:


            front     central   back
high        i                   u
upper mid             ə
lower mid   ɛ                   ɔ
low                   a
total number of vowels per column: front 2, central 2, back 2

Scenario B reaches the same endpoint in a much simpler manner:

1. ũ loses its nasality and becomes u:


            front     central   back
high        i                   u
upper mid
lower mid   ɛ                   ʌ̃ ɔ
low                   a
total number of vowels per column: front 2, central 1, back 3

2. ʌ̃ becomes central ə̃ to differentiate it from ɔ:


            front     central   back
high        i                   u
upper mid             ə̃
lower mid   ɛ                   ɔ
low                   a
total number of vowels per column: front 2, central 2, back 2

3. ə̃ loses its nasality and becomes ə:


            front     central   back
high        i                   u
upper mid             ə
lower mid   ɛ                   ɔ
low                   a
total number of vowels per column: front 2, central 2, back 2
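To confirm that both scenarios really end at the same six-vowel system, the steps can be replayed as ordered sound changes. This is a sketch of my two hypothetical futures only; each step is applied simultaneously to the whole inventory (which matters for the chain shift in step 7 of scenario A), and the names `shift` and `run` are my own.

```python
TILDE = "\u0303"  # combining tilde marks a nasal vowel


def nasal(vowel):
    return vowel + TILDE


def shift(vowels, changes):
    """Apply one step of sound changes {old: new} simultaneously."""
    return {changes.get(v, v) for v in vowels}


def run(vowels, steps):
    """Apply a list of steps in order and return the final inventory."""
    for changes in steps:
        vowels = shift(vowels, changes)
    return vowels


CURRENT = {"i", "ɛ", "a", "ɔ", nasal("ʌ"), nasal("u")}

SCENARIO_A = [
    {nasal("u"): nasal("ɯ")},             # 1. delabialization
    {nasal("ʌ"): nasal("ə")},             # 2. centralization
    {nasal("ɯ"): nasal("ɨ")},             # 3. centralization
    {nasal("ɨ"): "ɨ", nasal("ə"): "ə"},   # 4. denasalization
    {"ɔ": "u"},                           # 5. raising
    {"a": "ɑ"},                           # 6. backing
    {"ɑ": "ɔ", "ɨ": "ə", "ə": "a"},       # 7. chain shift
]
SCENARIO_B = [
    {nasal("u"): "u"},                    # 1. denasalization
    {nasal("ʌ"): nasal("ə")},             # 2. centralization
    {nasal("ə"): "ə"},                    # 3. denasalization
]

end_a = run(CURRENT, SCENARIO_A)
end_b = run(CURRENT, SCENARIO_B)
print(sorted(end_a), sorted(end_b))
```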

(8.12.1:57: Completely reorganized; step-by-step tables and scenario B added.)


09.8.10.23:59: THE ON-BAL-EN-CED VOWEL SYSTEM OF MOHAWK

Mohawk initially seems to have only four vowels. Its alphabet has the letters a, e, i, and o but not u. However, the letter combinations en and on represent [ʌ̃] and [ũ], not [en] and [on] or even [ẽ] and [õ]. So Mohawk actually has six vowels:

i [i]                 on [ũ]
e [e]     en [ʌ̃]     o [o]
          a [a]

There are no nasal versions of [a e i o] and no oral versions of [ʌ̃ ũ]:

Mohawk oral vowels

i [i]                   (no [u]!)
e [e]     (no [ʌ]!)     o [o]
          a [a]

Mohawk nasal vowels

(no [ĩ]!)                on [ũ]
(no [ẽ]!)   en [ʌ̃]      (no [õ]!)
            (no [ã]!)

I wonder how this unbalanced system came into being. Did pre-Mohawk have four vowels *a, *i, *u, *ʌ? (*N = unspecified pre-Mohawk syllable-final nasal consonant.)

*i          *u
      *ʌ
      *a

From four to six vowels

Pre-Mohawk   *a   *i   *ʌ   *aN *iN *ʌN   *uN      *u
Mohawk       a    i    e    en [ʌ̃]        on [ũ]   o

All nonlabial vowels (*a, *i, *ʌ) + *-N merged into en [ʌ̃].

*ʌ fronted to e to differentiate it from nasal en [ʌ̃].

*u lowered to o to differentiate it from nasal on [ũ].

Or did pre-Mohawk once have six oral vowels?

Pre-Mohawk six-vowel system

*i          *u
*e    *ʌ    *o
      *a

From six vowels to six vowels

Pre-Mohawk   *a   *i   *e *ʌ   *aN *iN *eN *ʌN   *uN *oN   *u *o
Mohawk       a    i    e       en [ʌ̃]            on [ũ]    o

All nonback vowels (*a, *i, *e, *ʌ) + *-N merged into en [ʌ̃].

Both back vowels (*u, *o) + *-N merged into on [ũ].

*e and *ʌ merged into e.

*u and *o merged into o.
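Both hypotheses can be written as correspondence tables and checked against the attested inventory. This Python sketch encodes my two guesses only; *ʌ is the nonlabial mid vowel I assume in both, and the helper name `mergers` is my own.

```python
MOHAWK = {"a", "i", "e", "en", "on", "o"}

# Pre-Mohawk source -> Mohawk reflex under each hypothesis.
FOUR_VOWEL = {
    "*a": "a", "*i": "i", "*ʌ": "e", "*u": "o",
    "*aN": "en", "*iN": "en", "*ʌN": "en", "*uN": "on",
}
SIX_VOWEL = {
    "*a": "a", "*i": "i", "*e": "e", "*ʌ": "e", "*u": "o", "*o": "o",
    "*aN": "en", "*iN": "en", "*eN": "en", "*ʌN": "en",
    "*uN": "on", "*oN": "on",
}


def mergers(correspondences):
    """Map each Mohawk vowel to its sources, keeping only real mergers."""
    sources = {}
    for source, reflex in correspondences.items():
        sources.setdefault(reflex, []).append(source)
    return {reflex: sorted(srcs) for reflex, srcs in sources.items() if len(srcs) > 1}


print(mergers(FOUR_VOWEL))
print(mergers(SIX_VOWEL))
```

The four-vowel hypothesis needs only one merger (into en), while the six-vowel hypothesis needs four, which is one reason I listed the four-vowel version first.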

Or did pre-Mohawk have some other vowel system?


09.8.9.23:39: MOHAWK ISN'T A MOHAWK WORD

Thanks to Andrew West, I got interested in Mohawk. Mohawk not only isn't based on a Mohawk word but can't be one because Mohawk has no m - or p or b. It originally had no bilabial stops - oral or nasal - though it does have m and p in loanwords and the bilabial glide w in native words. Here's one possible etymology for Mohawk. Their name for themselves is Kanien'kehá:ka 'people of the flint'. The colon is part of the spelling and indicates a long vowel.

Wikipedia transcribes the name of their language (Kanien’kéha) as [kanjʌ̃ʔˈɡɛha], even though the article also says that

The consonants /k/, /t/ and the clusters /ts kw/ are pronounced voiced before any voiced sound (i.e. a vowel or /j/). They are voiceless at the end of a word or before a voiceless sound. /s/ is voiced word initially and between vowels.

which implies that Kanien'kéha should be [ganjʌ̃ʔˈɡɛha]. But if it - and the related autonym Kanien'kehá:ka - had initial and medial [g], why don't (presumably lay) transcriptions like

Kanienkeh, Kanienkehaka, Kanien'Kahake

contain g-? Did the voicing of word-initial k (and kw, ts, t) before vowels begin after these early transcriptions? Did the transcription of /k kw t ts/ as k kw ts t in all positions persist after they developed voiced allophones before vowels and /j/?
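The voicing rule quoted above can be applied mechanically to a phonemic form. This is a sketch of my reading of Wikipedia's description, not a description of Mohawk phonology: words are given as lists of phonemes, /ts/ and /kw/ are treated as single units, "voiced sound" is limited to vowels and /j/ as in the quotation, and I assume that word-initial /s/ is voiced only before a vowel.

```python
TILDE = "\u0303"  # combining tilde on nasal vowels
ORAL_VOWELS = {"a", "e", "i", "o", "u", "ɛ", "ɔ", "ʌ"}
# Voiced counterparts; "ts" and "kw" are treated as single units.
VOICED = {"k": "g", "t": "d", "ts": "dz", "kw": "gw"}


def is_vowel(segment):
    return segment is not None and segment.replace(TILDE, "") in ORAL_VOWELS


def apply_voicing(segments):
    """Return the surface form of one word given as a list of phonemes."""
    surface = []
    for index, segment in enumerate(segments):
        before = segments[index - 1] if index > 0 else None
        after = segments[index + 1] if index + 1 < len(segments) else None
        if segment in VOICED:
            # Voiced before a vowel or /j/; voiceless elsewhere.
            voiced = is_vowel(after) or after == "j"
            surface.append(VOICED[segment] if voiced else segment)
        elif segment == "s":
            # Voiced word-initially before a vowel and between vowels.
            voiced = is_vowel(after) and (before is None or is_vowel(before))
            surface.append("z" if voiced else "s")
        else:
            surface.append(segment)
    return surface


KANIENKEHA = ["k", "a", "n", "j", "ʌ" + TILDE, "ʔ", "k", "ɛ", "h", "a"]
print("".join(apply_voicing(KANIENKEHA)))
```

Under this reading both instances of /k/ in Kanien'kéha come out as [g], while /k/ in a cluster like /kh/ stays voiceless.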

I don't know of any other language with a Mohawk-like distribution of voiced and voiceless oral obstruents. If I had never heard of Mohawk, I wouldn't think a language could lack any simple voiceless oral obstruents in initial position:

Possible voiceless initials (all clusters?): [kt kk ks kh st sk sh tk th]

Impossible voiceless initials: [k kw ts t]

Could the cluster /hl/ be voiceless [l̥] or even [ɬ]?

I don't know if [ʔ] can contrast with zero in word-initial position before a vowel. All other [ʔ]-clusters are only word-medial.

I wonder if Mohawk speakers borrowed foreign k and t as kh and th to preserve their voicelessness.


Tangut fonts by Mojikyo.org
Tangut radical font by Andrew West
All other content copyright © 2002-2009 Amritavision