For years, I was skeptical about Norman's 1994 hypothesis of pharyngeals in Old Chinese. What changed my mind was a chance encounter with a book on Maltese phonetics and phonology in the Leiden University library. I was surprised at the parallels between Maltese and Chinese vowels: e.g.,

*aa (in a nonemphatic environment?) > ie [iə] ~ [ɪɪ] (source; Comrie [1996: 687] listed the phonetic realization of ie as [iɛ])

cf. OC *a (in a nonemphatic environment according to Norman's hypothesis) > Middle Chinese *ɨə, Meixian Hakka i

the pronunciation of għi (presumably a semi-etymological spelling) as [aʕj]

cf. OC *i (in an emphatic environment according to Norman's hypothesis) > southern Middle Chinese *aj (see Pulleyblank 1984: 199), northern Middle Chinese *ej

Then I looked into classical (and a bit of modern colloquial) Arabic phonetics and found even more parallels.

But doubts remained because emphatic consonants in classical Arabic and Hebrew are restricted to acute and postvelar consonants, whereas Norman's hypothesis required OC emphatics at all points of articulation. (Strictly speaking, Arabic postvelars are not 'emphatics', but vowels behave similarly around acute emphatics and postvelars [Kaye 1987: 669].)

Then I found a lot of descriptions of modern Arabic dialects online and learned that Cairo Arabic has plain and pharyngealized versions of every consonant other than [q] (which behaves like a pharyngealized consonant). See Islam Youssef's dissertation for details. On closer observation, most of these emphatics are secondary. Similarly, back in 2002 I hypothesized that most emphatics in OC were secondary except for *q.

Nonetheless, the OC emphatic hypothesis still has many problems from both internal and comparative points of view.

I prefer proto-languages to be as nonexotic (i.e., probable) as possible. This does not mean that proto-languages have to be 'bland', but extraordinary proto-characteristics require extraordinary justification. Emphatics are rare in UPSID: e.g. (if alveolars and dentals are conflated),

Emphatic t, d, z appear in only three languages: 'Arabic' (presumably modern standard Arabic), Tamasheq (Berber), and Shilha (Berber).

Emphatic s is only in 'Arabic' and Shilha.

Emphatic k, l, r are only in Shilha. (Arabic emphatic l in ʔallaah seems to have been overlooked.)

Emphatic ŋ is only in !Xu (which has 140 other segments!).

UPSID lists no emphatic n, p, b, m, w, which would be common in an emphatic OC reconstruction. All five exist in Cairo Arabic, so they're not impossible. But I have never heard of a language with the other elements of OC's emphatic inventory:

aspirates: kh (and qh?), tsh, th, ph

voiceless sonorants: , hn, hm, hr, hl, hw

dz (I presume emphatic ts could have been an intermediate stage between emphatic s and plain ts in Hebrew - does anyone pronounce צ as [tsʕ]? According to Wikipedia, Yemenite Jews pronounce it as [sɣ] or [sʕ] without [t].)

Moreover, two of these questionable OC emphatics (*hn and *hl) hardened to Middle Chinese *th. Such fortition is not entirely surprising given how emphatic s became an affricate ts in Hebrew (but not in Chinese!). Nonetheless, I'd feel more confident if similar sound changes were observed in other languages. I don't want OC to be sui generis.

4.12.16:46: The fortition of *hn and *hl to Middle Chinese *th accounts for the nasal and lateral phonetics in graphs for some MC *th-syllables: e.g.,

嘆 MC *thanh < OC *hnars 'sigh'

its phonetic also appears in 難 MC *nan < OC *nar 'difficult'

湯 MC *thaŋ < OC *hlaŋ 'soup'

its phonetic also appears in 陽 MC *jɨaŋ < OC *laŋ 'yang; male principle'

LOOK ON THE UPSID

Some historians treat reconstructed proto-phonemes as abstract symbols representing concrete sound correspondences in actual languages. I prefer my proto-phonemes to be as phonetically realistic as possible because phonetic details have implications for both the proto-phoneme system and later sound changes. I look at attested phonemic systems and their phonetic realizations to find patterns that I can apply to my reconstructions. I want to avoid proto-sound systems that look like collections of random IPA symbols. Phonemic inventories tend to be organized as if their speakers had an unconscious sense of sonic aesthetics. And they tend to contain certain sounds more than others.

The UCLA Phonological Segment Inventory Database (UPSID) compiled by Ian Maddieson* and Kristin Precoda (what a surname for a phonetician!) shows which segments occur in 451 languages. The front page of Henning Reetz' user interface shows the top 20 consonants and top 10 vowels. See the complete list of the segments in the database for the rest. The list of all the languages in the database starts with Pirahã and Rotokas (only 11 segments each; see below) and ends with !Xu with 141 segments.

Pirahã consonants (no nasals or r, despite the name of the language)









Pirahã vowels (I'm surprised it has o instead of an u that would balance the i)




Rotokas consonants (again, no nasals, and I'm surprised it has a g but no b or d, since many languages have the opposite pattern - I presume β and ɾ are from *b and *d)







Rotokas vowels (pretty normal)






Tangut would have a large inventory if tense and retroflex vowels were counted as distinct segments. (UPSID distinguishes between oral and nasal vowels but ignores length, tenseness, and retroflexion.)

What I'd like to see is a program that could show the frequencies of combinations of segments: e.g.,

- do languages that have o tend to have e? (Pirahã would be an exception to such a tendency.)

- how many languages have the Classical Arabic pattern of [f t k b d dʒ] without [p] and [g]? (Earlier *p and *g became [f] and [dʒ].)
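
A minimal sketch of such a program, in Python. The inventories below are toy data invented for illustration (they are not taken from UPSID), and the function names are hypothetical; a real version would load the UPSID segment lists instead.

```python
from collections import Counter
from itertools import combinations

# Toy inventories, invented for illustration only; they are NOT UPSID data.
INVENTORIES = {
    "LangA": {"p", "t", "k", "i", "a", "o"},
    "LangB": {"p", "t", "k", "b", "d", "g", "i", "e", "a", "o", "u"},
    "LangC": {"f", "t", "k", "b", "d", "dʒ", "i", "a", "u"},
    "LangD": {"p", "t", "k", "m", "n", "i", "e", "a", "o", "u"},
}


def implication(inventories, if_has, then_has):
    """Return (languages with if_has, how many of those also have then_has)."""
    with_first = [inv for inv in inventories.values() if if_has in inv]
    with_both = [inv for inv in with_first if then_has in inv]
    return len(with_first), len(with_both)


def matching_pattern(inventories, required, excluded):
    """Names of languages with every required segment and no excluded one."""
    return sorted(
        name
        for name, inv in inventories.items()
        if set(required) <= inv and not (set(excluded) & inv)
    )


def pair_counts(inventories):
    """Count how many languages contain each unordered pair of segments."""
    counts = Counter()
    for inv in inventories.values():
        # Sorting first makes each pair appear in one fixed order.
        counts.update(combinations(sorted(inv), 2))
    return counts


if __name__ == "__main__":
    print(implication(INVENTORIES, "o", "e"))
    print(matching_pattern(INVENTORIES, ["f", "t", "k", "b", "d", "dʒ"], ["p", "g"]))
```

The first call answers the 'o implies e?' question as a pair of counts; the second lists the languages that fit the [f t k b d dʒ] pattern while lacking [p] and [g].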

*Ian Maddieson was the professor of my phonetics class at Berkeley in 1992. He pointed out to me that I pronounced English final -l as a velar glide [ɰ], which explained at last why I had to take special care to pronounce final Korean /l/ as a real lateral [l].

PRESYLLABLES COULD BE THE KEY!

Weera Ostapirat, an expert on Kra-Dai - he did coin the name, after all - might agree with my hypothesis about the origins of Be initials. Today I found an abstract of his article "Be's Obstruent Consonants in Kra-Dai Perspective":

The study suggests that Kra-Dai obstruent consonants may have developed into different Be sounds depending on whether they occured as a pre-Be initial or medial. When occured as a medial consonant, these obstruents were lenited. These medial lenited sounds then may have directly become modern Be spirant or approximant initials (after the preceding syllable dropped) or occluded to stops of respective articulations.

I wonder if Be d- was a hardened medial *-ð- from an even earlier medial *-t-. A fricative *-ð- stage would allow me to claim that all medial obstruents fricated:

Pre-Be       *-kh-   *-p-   *-f-   *-m-   *-th-   *-t-   *-ɗ-
voicing      *-g-    *-b-   *-v-          *-dɦ-   *-d-   *-ɗ-
frication    *-ɣ-    *-β-   *-v-          *-ɦ-    *-ð-   *-ɮ-
Be           v-                           h-      d-     l-

*dɦ occasionally weakened to ɦ in Sanskrit: e.g., hita 'placed' < *dɦita. (Most forms of 'place' retain the voiced aspirate stop: e.g., dhiiyate [dɦiijətee] 'was placed'.)

COULD PRESYLLABLES BE THE KEY?

Weera Ostapirat (2000: 57-58) has a list of Be-Siamese comparisons with some odd correspondences:

Be                   Siamese      Examples
k-                   kh-          'bitter', 'knee', 'excrement'
ŋ-                   kh-          'rice'
v-                   kh-          'hair'
v-                   p-           'year', probably 'blow'*
v-                   f-           'dream'
v-                   m-           'bear'
h-                   th-          'bowl'
ɓ-                   p-           'go', 'aunt', 'mouth'
d- (typo for ɗ-?)    t-           'low'
l-                   d- < *ʔd-    'nose', 'obtain'

Here's what I think happened in Be (based solely on that list - my only exposure to Be):

1. Root-initial kh-, p-, f-, m- lenited and merged as v- intervocalically after a presyllable that was later lost:

*CV-kh- > *CV-g- > *CV-ɣ- > v-

*CV-p- > *CV-b- > *CV-β- > v-

*CV-f- > *CV-v- > v-

*CV-m- > *CV-w̃- > v-

(did *w- also become Be v-?)

2. Similarly, root-initial th- lenited after a presyllable that was later lost:

*CV-th- > *CV-d- > *CV-ð- > h-

3. *N-kh- became ŋ-.

(*N- is an unknown nasal.)

4. *kh- not preceded by anything became k-, merging with original *k-.

(Siamese k- < *k- corresponds to Be k- in 'chicken'.)

(Did all aspirates deaspirate: e.g., did *th- not preceded by anything become t-?)

5. *ʔd- (I prefer *ɗ-) became l-.

6. After (5), Be developed a new d- (a typo for ɗ-?) from *t-. Similarly, *p- became ɓ-.

(What happened to *ʔb- [or *ɓ-]? Did it merge with *p- as ɓ-, or become something else?)
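
The steps above can be summarized as a lookup. This is only a sketch: the function and table names are hypothetical, the outcomes are copied from the numbered steps, and anything the steps leave open (such as bare *th-) deliberately returns no prediction.

```python
# Outcomes after a later-lost presyllable (steps 1 and 2).
LENITED = {"kh": "v", "p": "v", "f": "v", "m": "v", "th": "h"}

# Outcomes for bare root initials (steps 4 to 6).
# "d" follows the source list, which may have a typo for ɗ-.
BARE = {"kh": "k", "k": "k", "ʔd": "l", "ɗ": "l", "t": "d", "p": "ɓ"}


def be_initial(root_initial, prefix=None):
    """Predict a Be initial from a pre-Be root initial.

    prefix is None (bare root), "CV" (a presyllable that was later lost),
    or "N" (the unknown nasal of step 3).
    Returns None where the steps make no prediction.
    """
    if prefix == "CV":
        return LENITED.get(root_initial)
    if prefix == "N":
        # Step 3 only covers *N-kh-.
        return "ŋ" if root_initial == "kh" else None
    return BARE.get(root_initial)


if __name__ == "__main__":
    for root, prefix in [("kh", None), ("kh", "CV"), ("kh", "N"), ("th", "CV")]:
        print(prefix, root, "->", be_initial(root, prefix))
```

Keeping the three environments separate makes the central claim visible: the same root initial *kh- gives k-, v-, or ŋ- depending only on what preceded it.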

*Ostapirat listed the Be word for 'blow' as pau 21 and its Siamese cognate as vou B1. But the Siamese word for 'blow' is pau B1, so I suspect the Be word is vou 21.

4.10.0:02: Siamese has no v-; Proto-Tai *v- became Siamese f-.

FROM PRESYLLABLES TO PROTO-KRA CLUSTERS? (PART 1)

The Kra languages aren't rich in Chinese loans, but I'm still interested in them because they have 'siniform' phonology and are distantly related to Siamese. They comprise one branch of the Kra-Dai family (table adapted from Ostapirat [2000: 1]):

Kra-Dai (a.k.a. 'Tai-Kadai')
    Kra (a.k.a. Kadai) languages
    Hlai languages
    Kam-Tai languages
        Be dialects
        Tai languages
        Kam-Sui languages

Ostapirat's 2000 book Proto-Kra is a very readable introduction to the Kra languages. Its appendix contains Ostapirat's reconstructions of over 300 Proto-Kra etyma. Some of the reconstructions have strange-looking clusters:

*kɣ- *t-ɣ- (why the hyphen?*) *pɣ- *bɣ- *mɣ-

*kʒ- is particularly odd because it is the only certain *Cʒ-sequence in Ostapirat's list of reconstructions. (Proto-Kra *dʒ is a unit phoneme, and *n(ʒ)- has parentheses which may indicate uncertainty.) Perhaps *tʒ- merged with *tʃ- or *dʒ-, but why is there no *pʒ-?

I suspect that the voiced fricatives in these clusters originated from obstruents that lenited intervocalically: e.g.,

*CVk- > *CVg- > *CVɣ- > *Cɣ-

*kVS- > *kVZ- > *kVʒ- > *kʒ-

(*S and *Z are cover symbols for any Proto-Kra voiceless or voiced sibilant.)

(4.9.23:23: Cf. the Khmer clusters /tk/, /pk/, /ks/. I can't find *tk-, *pk-, *ks- in Ostapirat's Proto-Kra reconstruction.)

This is not unlike what I think happened in Tangut (though Tangut dropped the presyllables completely):

*CV-k- > *CV-g- > *CV-ɣ- > ɣ-

*CV-t- > *CV-d- > *CV-l- > l-

*CV-p- > *CV-b- > *CV-β- > *CV-v- > *v- > w-

*CV-S- > *CV-z- > z-

One pair of Proto-Kra words implies that medial *k-lenition took place in pre-Proto-Kra:

'I': PK *t-ɣu (tone A)

'we': PK *ku (tone A)

Could 'we' be from 'I' with a prefix?

*tV-ku > *tV-gu > *tV-ɣu > *t-ɣu
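
The derivation above can be replayed as a list of ordered string rewrites, which makes it easy to check that each stage feeds the next. This is a sketch only: the helper name `derive` and the way each change is encoded as a literal substring replacement are assumptions made for illustration.

```python
def derive(form, steps):
    """Return every stage of a derivation, oldest first.

    steps is an ordered list of (old, new) substrings; each rewrite is
    applied once to the output of the previous step.
    """
    stages = [form]
    for old, new in steps:
        stages.append(stages[-1].replace(old, new, 1))
    return stages


# The three changes proposed for 'I', in order.
FIRST_PERSON_STEPS = [
    ("-k", "-g"),    # the medial stop voices
    ("-g", "-ɣ"),    # the voiced stop becomes a fricative
    ("tV-", "t-"),   # the presyllable vowel is lost
]

if __name__ == "__main__":
    print(" > ".join(derive("*tV-ku", FIRST_PERSON_STEPS)))
```

Because each rewrite targets the hyphen plus consonant, an unprefixed form such as *ku passes through unchanged, matching the idea that lenition only happened medially.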

*Proto-Kra hyphenated initials 'fall off' in some of the daughter languages. Since all reflexes of *t-ɣ- contain dental stops, I wonder if *t-ɣ- is a typo for *tɣ-.

4.9.21:41: I originally proposed

*CVp- > *CVb- > *CVβ- > *CVɣ- > *Cɣ-

but now I wonder if some *kw- type clusters could partly be from sequences like *kV-p- with medial labial stop lenition. (23:24: Cf. Khmer /kp/. I can't find *kp- in Ostapirat's Proto-Kra reconstruction.)

*Cl- and/or *Cr- could partly be from sequences like *CV-t-.

FV = VI

Another language I should know more about is Bai, which is so full of Chinese-like words that some think it is a Chinese language. Bai literally looks like Chinese; it was once written in an expanded sinography with baigraphs constructed from Chinese components: e.g.,

亻+昂 ŋa 'we' (< 'person' + phonetic [Middle Chinese *ŋaŋ])

Although the study of phonetic elements in baigraphy might elucidate the phonological history of Bai, the only baigraphs I have on hand are some examples in 白语简志.

The baigraph for 'six' is the sinograph 陸 'land' (also used as a complex substitute for 六 'six' in Chinese). One might expect the Bai for 'six' to resemble 陸/六 Middle Chinese *luk, but in fact it is fv̩ (fu in Dai and Huang 1992), not lv̩ or lu. Why does Bai have an f-? I could only think of two possible answers:

1. fv̩ has nothing to do with MC *luk < Old Chinese *ruk, or related words (e.g., Written Tibetan drug or Written Burmese khrauk); 陸 is a semantogram representing the Bai translation equivalent of Chinese 'six'. Nonetheless, the rhyme and tone of fv̩ seem to be from an earlier *-u + stop sequence*, just like other Sino-Tibetan words for 'six'. Is that similarity simply coincidental?

2. fv̩ is the normal Sino-Tibetan word for 'six', and its f- may be from an earlier cluster (e.g., *pr- or *pl-; cf. Lushai paruk 'six', though Lushai is only distantly related to Bai - Lushai and pre-Bai would have added a labial prefix independently). This solution seems improbable. The glossary in 白语简志 has only a few words with f-, and none of those f-s seem to correspond to Old Chinese *pr- or *pl-. I doubt that 'six' could be the only common word in pre-Bai with a *p-liquid cluster.

I've rejected a third answer (the f- reflects an odd Chinese word for 'six') since I've never seen any Chinese word for 'six' with a labial initial.

I'd like to read Laurent Sagart's 1998 paper on loan strata in Bai and see what he thought about 'six'.

*4.7.20:03: Bai pv̩ 'belly' (with the same tone as 'six') corresponds to 腹 Middle Chinese *puk.

NINE GOLDEN CHICKENS

In the summer of 1997, I picked up Lonely Planet's Thai Hill Tribes Phrasebook so I could have a cheap (only US $4.95!) source of SE Asian language data compiled by reliable scholars.

Its Mien chapter was written by Chris Court, who was a professor at the University of Hawaii for a few years. (I lost touch with him after he left UH and haven't been able to locate him since. What happened to him?)

Mien is a language I should know more about because it contains many Chinese loans that could help me reconstruct earlier stages of Chinese. Unfortunately, I know almost nothing about Mien or its sister Hmong which interests me for the same reason. So I'm not sure if I'm interpreting Court's data correctly. Since (1) the phonetic description in the Lonely Planet book was written for tourists and (2) I have no access to any other Mien material at the moment, I have to guess the IPA equivalents of the romanization that Court used.

The lower numerals of Mien appear to be partly cognate to those of Hmong, so I assume they're native Hmong-Mien numerals. The higher numerals, on the other hand, are Chinese loanwords. The Chinese words for 'one' through 'nine' are in the Mien words for 'eleven' through 'nineteen', which are literally 'ten' + ('one' through 'nine'). The following table contains these Chinese-derived morphemes, which presumably cannot stand alone (since Mien has its own numerals for 'one' through 'nine'), plus other higher numerals:

Gloss Middle Chinese Mien orthography* Mien phonetics (?) Notes
one *ʔit yietv jiət did *i break into iə in closed syllables and ej in open syllables?
two *ɲih nyeic (but nyic in nyic ziepc 'twenty') ɲej / ɲi
three *sam faam (but faah in faah ziepc 'thirty') faam / faa did *s > f, or does f reflect a southern Chinese language with ɬ (cf. how Welsh ll [ɬ] has been Anglicized as Fl- in Floyd)?; *s > ɬ in Toisanese and in some Tai languages
four *sih feix fej
five *ŋoʔ hmmz hmm̩ hm- < *hŋ-?
six *luk luoqc luək did *u break into uə? *-k seems to have become glottal stop
seven *tshit cietv tshiət see 'one'
eight *pɛt betv pet did raise to i?
nine *kuʔ juov tɕuə see the main text of this entry
ten *dʑip ziepc tsiəp see 'one'
hundred *pæk baeqv pɛʔ did *æ raise to ɛ?
thousand *tshen cin tshin did *e raise to i?
ten thousand *muənh waanc waan MC *mu- later became *ɱv-; Sino-Vietnamese vạn [vaan] retains the *-v- while Cantonese retains the nasal: maan. Could waanc be a loan from a southern Mandarin cognate of standard Md wan?

(Middle Chinese *-ʔ and *-h represent creaky and breathy phonation, not [ʔ] and [h]. Hence MC *-ʔ does not correspond to Mien -q [ʔ].)

There might have been a chain shift: *æ > *ɛ > *e > *i > iə/ej. (4.7.9:53: Since my own evidence for this shift is in loanwords, the shift could have occurred in the donor language before the words were borrowed.)
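
A chain shift only works if every vowel moves one step at the same time; applying the links one after another would collapse all four vowels into the last one. The sketch below illustrates the difference. It assumes that the first two links of the chain are *æ and *ɛ (as the 'hundred' and 'eight' rows suggest), it covers only the closed-syllable outcome iə, and the names `CHAIN`, `shift_once`, and `shift_sequentially` are hypothetical.

```python
# One step up the proposed chain (closed-syllable outcome only).
# Assumption: the first two links are *æ and *ɛ.
CHAIN = {"æ": "ɛ", "ɛ": "e", "e": "i", "i": "iə"}


def shift_once(form):
    """Move every vowel one step along the chain, all at the same time."""
    return "".join(CHAIN.get(ch, ch) for ch in form)


def shift_sequentially(form):
    """Apply the links one after another; shown only for contrast."""
    for old, new in CHAIN.items():
        form = form.replace(old, new)
    return form


if __name__ == "__main__":
    # Middle Chinese forms from the table, without the asterisk.
    for mc in ["pæk", "pɛt", "tshen", "ʔit"]:
        print(mc, "->", shift_once(mc), "| sequential:", shift_sequentially(mc))
```

Only the vowels are meant to be compared with the Mien column; the sketch does not model the consonant changes (e.g., final *-k to glottal stop).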

The number that leapt out at me out of that list was juov [tɕuə] 'nine'. The initial is palatal like the initial of standard Mandarin jiu [tɕjow] 'nine'. Yet the rhyme is un-Mandarin, so it can't be a loan from southern Mandarin. And southern Chinese languages have a velar stop for 'nine'. I could only conclude that j- [tɕ] was a palatalized earlier Mien *k.

Tonight, I found a 1996 article by David Solnit confirming my guess. According to Solnit (Linguistics of the Tibeto-Burman Area 19.1.14), Mien j- [tɕ] could come from Proto-Hmong-Mien *kj or *k followed by *i. Thus juov [tɕuə] 'nine' could be from an earlier *kju or *kiw reflecting a southern late MC *kɨw (cf. Sino-Vietnamese cửu [kɨw]).

I found two other Chinese loanwords in Court's Mien chapter that may have undergone this Mien-internal shift:

'chicken': Mien jae [tɕɛ] < Proto-Hmong-Mien *kai (Downer 1982, quoted in Solnit)?

cf. early southern Middle Chinese *kaj (without *-j- or *i!)

and late southern Middle Chinese *kjej (cf. Sino-Vietnamese [ke])

'gold': Mien jiem [tɕiəm] < *kim with *i-breaking?

cf. Middle Chinese *kɨm

'Chicken' is puzzling for two reasons.

First, PHM *kai should become something like Mien gae [kɛ] in Solnit's scheme since *k doesn't precede a *j or *i.

(4.7.10:37: Was the Mien descendant of PHM *kai replaced by a borrowing of late southern Middle Chinese *kjej? But I'd expect MC *e to become Mien i; cf. 'thousand'.)

Second, PHM *kai has a velar initial even though the Hmong word for 'chicken' is qab (b is a tonal letter) with a uvular initial. I would expect the PHM form to have a uvular initial which might reflect a uvular initial in Old Chinese *ke [qae] 'chicken'.

(4.9.21:27: Wang Fushi [1994; cited in Sagart 1999: 192] reconstructed Proto-Hmong *qe 'chicken' with the expected uvular initial. Perhaps Mien borrowed 'chicken' after Chinese had shifted uvulars to velars.)

*Key to tonal spelling (numerical notation speculations added 4.7.10:25):

no final tonal letter: medium-high level tone (55?)

-v: high rising-falling tone (454?)

-h: mid falling tone (42?)

-x: medium-low rising tone (24?)

-z: low rising-falling tone (121?)

-c: low level tone (11?)

Mien tones seem to have these correspondences with Chinese tonal categories:

Chinese tone category    Level       Rising (< MC *-ʔ)    Departing (< MC *-h)    Entering (MC final stops)
Yin = MC *voiceless      unmarked    -v                   -x                      -v
Yang = MC *voiced        -h          -z                   -c                      -c

None of the numerals have -h except 'three' in 'thirty', but 'mule' (corresponding to 驢 MC *lɨə [Yang Level tone]) is lorh [lɔ].

4.7.10:18: The vowel correspondence *ɨə : ɔ reminds me of Korean 어 [ə] ~ [ɔ]. The rounded vowel in lorh could also reflect an MC variant *lɔ from an emphatic OC *ra. Meixian Hakka lu 'mule' could be from this MC variant. MC *lɨə is from a nonemphatic OC *ra and normally corresponds to Meixian li: e.g., the surname 呂: MC *lɨə, Meixian li.

4.7.10:34: Mien seems to be a voiced-low language like Cantonese:

Chinese tone category    Level             Rising (< MC *-ʔ)    Departing (< MC *-h)    Entering (MC final stops)
*Voiceless               unmarked = 55?    -v = 454?            -x = 24?                -v = 454?
*Voiced                  -h = 42?          -z = 121?            -c = 11?                -c = 11?

All the *voiceless tones end with a high pitch (4 or 5) whereas all the *voiced tones end with a low pitch (2 or 1). However, the initial pitches are inconsistent: one *voiceless tone starts low (24) and one *voiced tone starts high (42).
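
The pitch observation can be checked mechanically from the tonal key. The sketch below uses the speculative contour numbers given above; the dictionary and function names are hypothetical, and it assumes that -c covers both the Departing and Entering categories after *voiced initials (as in 'two' nyeic and 'ten' ziepc).

```python
# Tonal letters and contour guesses from the key ("" = no final tonal letter).
CONTOURS = {"": "55", "v": "454", "h": "42", "x": "24", "z": "121", "c": "11"}

# (MC initial voicing, Chinese tone category) -> Mien tonal letter.
# Assumption: -c serves for both Departing and Entering after *voiced initials.
MIEN_TONE = {
    ("voiceless", "level"): "",
    ("voiceless", "rising"): "v",
    ("voiceless", "departing"): "x",
    ("voiceless", "entering"): "v",
    ("voiced", "level"): "h",
    ("voiced", "rising"): "z",
    ("voiced", "departing"): "c",
    ("voiced", "entering"): "c",
}


def pitches(voicing, position):
    """Sorted distinct pitch levels at the start (0) or end (-1) of the
    contours of all tones associated with the given MC voicing."""
    return sorted(
        {
            int(CONTOURS[letter][position])
            for (v, _category), letter in MIEN_TONE.items()
            if v == voicing
        }
    )


if __name__ == "__main__":
    for voicing in ("voiceless", "voiced"):
        print(voicing, "start:", pitches(voicing, 0), "end:", pitches(voicing, -1))
```

Separating the start and end pitches shows the asymmetry directly: the final pitches split cleanly by voicing, while the initial pitches overlap.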

4.7.12:21: Suzhou has ləu 'mule', implying MC *lɔ < emphatic OC *ra, instead of li (lit.) ~ ly (colloq.), which would imply MC *lɨə < nonemphatic OC *ra. I'm surprised that literary reflexes of MC *lɨə have i instead of y because late MC had *-y whereas earlier MC had *-ɨə. 取 and 娶 (both < MC *tshuə) have the opposite pattern: tshy (lit.), tshi (colloq.). Other MC *-ɨə and *-uə sinographs in 汉语方言字汇 only have one reading in Suzhou ending with -i or -y.

