One may find the synchronic variation that I reconstructed for Old Chinese to be implausible. However, I found an even greater degree of synchronic variation in มาบกราด Maap Kraat Nyah Kur presyllables among the three elderly informants in Diffloth (1984). Informants MKle and MKlu have different forms even though they are sisters! Colored cells indicate innovative forms. The most prominent vowel is indicated with a subscript symbol specifying register: an underline for clear voice (< *voiceless initials) and a diaeresis for breathy voice (< *voiced initials).

Proto-Monic Gloss MKle MKlu MKp
*jliiŋ long chəmi̤in khəmi̤in chəmi̤in
*j-m-lɛɛʔ short chəmlɛ̤ɛʔ chəmlɛ̤ɛʔ khəmlɛ̤ɛʔ
*(cŋ)kaam husk cəŋka̱am cəŋka̱am təŋka̱am
*c-ŋ-kiəm handful cəŋki̲am təŋki̱əm təŋki̱əm
*cŋkɔɔr bark cəŋku̱ay kəŋku̱ay ~
*cŋkiər unpleasant cəŋki̱əy təŋki̱əy cəŋkɯ̱əy
*smpɔɔt to wipe səmpu̱at kəmpu̱at kəmpu̱at ~
*tmpɔh seven cəmpɔ̱h kəmpɔ̱h kəmpɔ̱h ~
*p-m-tɯl sand pəmtɯ̱y kəmtɯ̱y kəmtɯ̱y
*cmpiir pumpkin chəmpi̱i ~ khəmpi̱iy səmpɨ̱ɨy

It would be difficult to reconstruct Monic presyllabic consonants with confidence without premodern and comparative data. In one case ('seven'), none of the three informants preserved the original presyllabic initial.
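The amount of disagreement can be tallied mechanically. The following is only a minimal sketch: it uses the six rows of the table above that have exactly one form per informant, and it assumes that the presyllabic initial is whatever precedes the first schwa. The helper names `presyllable_initial` and `initials_by_gloss` are hypothetical, not from Diffloth (1984).

```python
# Forms for the informants MKle, MKlu, MKp (in that order), copied from the
# rows of the table that list exactly one form per informant.
FORMS = {
    "long": ("chəmi̤in", "khəmi̤in", "chəmi̤in"),
    "short": ("chəmlɛ̤ɛʔ", "chəmlɛ̤ɛʔ", "khəmlɛ̤ɛʔ"),
    "husk": ("cəŋka̱am", "cəŋka̱am", "təŋka̱am"),
    "handful": ("cəŋki̲am", "təŋki̱əm", "təŋki̱əm"),
    "unpleasant": ("cəŋki̱əy", "təŋki̱əy", "cəŋkɯ̱əy"),
    "sand": ("pəmtɯ̱y", "kəmtɯ̱y", "kəmtɯ̱y"),
}


def presyllable_initial(form):
    """Assumption: the presyllabic initial is everything before the first schwa."""
    return form.split("ə", 1)[0]


def initials_by_gloss(forms):
    """Map each gloss to the list of presyllabic initials, one per informant."""
    return {gloss: [presyllable_initial(f) for f in fs] for gloss, fs in forms.items()}


variation = initials_by_gloss(FORMS)
for gloss, initials in variation.items():
    status = "agree" if len(set(initials)) == 1 else "differ"
    print(gloss, initials, status)
```

In this subset no word has the same presyllabic initial for all three informants, which is the point of the comparison with Old Chinese.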

And without more information on cross-generational phonology, I cannot understand how

[m]any of these differences probably represent unsystematic and limited adaptation, on the part of very old persons, to the rapidly changing speech of younger speakers, many of whom, including their own children, have now abandoned the language and speak Thai. We simply arrived twenty years too late to record sounds which have been preserved for more than fourteen centuries. (Diffloth 1984: 320)

With the exception of a monosyllabic form of 'seven', I don't see how any of the innovative forms are more Thai-like than the more conservative forms. (Thai favors monosyllables.)

Is the variation in Phan Rang Cham due to such adaptation?

11.26.15:36: And what is one to make of the variation in Hmongic forms for 'dog' (Ratliff 2010: 206)?

Proto-Hmong-Mien *qluwX


Pa-Hng (why a final nasalized vowel?)

Baiyun ta 1 ljɔ̃ 7

Tan Trinh ta 1 ljɔ̃ 7 ~ Baiyun ka 1 ljɔ̃ 7

West Hmongic:

Xianjin (Hmong) tl̥e 3

Fuyuan (Hmjo) qlei B

North Hmongic: Jiwei (Qo Xiong) qwɯ 3

Mienic: Liangzi Mun klu 3

If not for the Pa-Hng forms, I would say that *ql- simplified to q- or kl- with assimilation in tl̥- (q- became dental like l which became voiceless like the preceding stop). Are the Pa-Hng forms stretched monosyllables - expansions of *klV and *tlV? Or do they contain *l-roots preceded by different prefixes? Or do they preserve the original disyllabic structure of their Proto-Hmong-Mien ancestor which might have been something like *qa.luwX? Would *qa- have changed to *ta- by analogy with some other word or even a compressed but now extinct form with an assimilated cluster like *tljɔ̃ 7?

11.26.20:45: If the root of 'dog' is *luwX, then Old Chinese *Cə.kˁroʔ (which has no internal etymology) could be a borrowing of a prefixed Proto-Hmong-Mien word that displaced the native word 犬 *[k]ʷʰˁ[e][n]ʔ.

What if the Old Chinese form was *təkˁroʔ from a Proto-Hmong-Mien *tV-qV-luwX with a doubly prefixed *l-root or *tV-qluwX with a singly prefixed *ql-root? Could Pa-Hng ta-l- be a simplification of an earlier *tV-ql-?

IS 'HE' A MAN IN A BOX?

For a long time I thought Khitan pronouns were unknown. Then this year I got ahold of Wu and Janhunen (2010). The authors regard the Khitan small script character

309 <ghó>

as a word possibly meaning 'he' when by itself*. Unfortunately I cannot find a full argument for that interpretation in their book.

309 resembles Chinese 士 'officer, gentleman' with an added enclosure 囗. I am reminded of 囝, a graph for the Min word for 'son' composed of 子 'son' inside 囗. (There was no contact between the Khitan in the north and the Min in the south, so 309 did not influence 囝 or vice versa.)

Other potential forms of 'he' are

309-140 <ghó.en> 'he.GEN' = 'his'? (蕭仲恭 41.1, 46.6, 耶律詳穩 35.34)

11.25.0:06: Why isn't this <ghó.on> with vowel harmony?

309-205 <ghó.de> 'he.DAT' = '(to) him'? (興宗 21.15, 24.26, 許王 39.20, 蕭仲恭 5.58, 蕭敵魯 45.11)

Why isn't this <ghó.do> with vowel harmony?

309-254 <ghó.d> 'he.PL?' = 'they'? (耶律詳穩 32.6)

Wu and Janhunen (2010: 200) identified this as a possible dative which makes sense in the context of 耶律詳穩 32. Was *-de shortened to -d after gho?

Added 11.25.23:33:

309-341 <ghó.er> 'he.ACC/INST' = '(by) him'? (道宗 17.23)

Why doesn't this suffix have harmonic variants: e.g., *<or>?

309-339 <ghó.i> 'he.GEN' = 'his'? (蕭仲恭 48.10)

<i> does not harmonize; cf.

<b.qo.i> 'son.GEN'

Did <ghó.i> and <ghó.en> both mean 'he.GEN'?

If those blocks containing 309 plus characters for common case suffixes are forms of 'he', why are they so rare?

*11.25.23:40: 309 <ghó> also occurs as a phonogram in blocks such as

309-261-261-112-341 <ghó> (道宗 22.20)

Once again <ghó> appears next to e-graphs even though o and e typically do not mix in Altaic-type languages.

11.26.0:52: The transliteration of <ghó> seems to be based on the Chinese transcription 訛 *(ng)()o (Kane 2009: 72). It does not resemble the third person pronouns *i (singular) and *a (plural) that Janhunen (2003: 18) reconstructed for Proto-Mongolic. It is not currently possible to determine whether the Khitan and/or Proto-Mongolic pronouns are innovations. I doubt that the common ancestor of Khitan and Proto-Mongolic can ever be reconstructed in detail.

CAN ANYONE EXPLAIN THE EXTRA X IN AVESTAN AND OLD PERSIAN?

Jackson (1892: 29) wrote (converting his transliteration to Hoffmann's),

In Av. [= Avestan], we sometimes find x prefixed to ṣ̌, initial or internal, apparently without etymological value: e.g., ā-xṣ̌nuš 'up to knee', cf. Skt. abhi-jñu.

Another example in Jackson (1892: 136, 193) is the desiderative present participle zixṣ̌nā̊ŋhəmna- 'wanting to know' (cf. Skt jijñāsamāna-; Av xṣ̌n : Skt jñ < Proto-Indo-European *gn).

Was this x- added by analogy with words with 'true' x-clusters corresponding to Sanskrit kṣ-: e.g., Avestan xṣ̌aϑrəm 'rule' (cf. Skt kṣatram)?

Old Persian is not descended from Avestan, but it also had this extra x in /xšnā-/ 'know' (inchoative; see Cheung 2007: 466 for examples).

Zoroastrian Middle Persian /šnās-/ 'recognize' (inchoative) lost it. (But is the `ayin in Manichaean Middle Persian <ʕšnʔs> a reflection of the extra x?)

Avestan and Persian belong to different branches of Iranian. Did the extra x independently 'grow' twice in Iranian, once in the east (Avestan?) and again in the west (Persian)?

11.24.23:27: I placed a question mark after "Avestan" since its classification as eastern is disputed. In any case, Avestan and Old Persian do not subgroup together. They are in northeastern and southwestern branches in this tree.

WHY DID KOREANS BORROW LATE MIDDLE CHINESE GRADE II VELAR-FINAL SYLLABLES IN TWO DIFFERENT WAYS? (PART 2)

(I originally meant to post this entry last night but noticed I had overlooked something essential and decided to upload a revised version. The title should be more specific since these posts are about *K- + nonhigh, nonlabial vowel + *K syllables, but I've retained the title for the sake of continuity.)

For over twenty years I had been assuming that Middle Old Chinese *ˀraK/ˀreK/ˀrəK-rhymes all merged into Late Middle Chinese (LMC) Grade II *æK which I recently revised as *ʌ̆eK. But I had known the evidence against such a merger all along!

In modern Sino-Korean, reflexes of Middle Old Chinese (MOC) *KˀreK/KˀrəK always end in -jəK (< *eK), whereas reflexes of MOC *KˀraK end in either -jəK (< *eK) or -aŋ (< *eŋ). I conclude that the northeastern LMC (NELMC) source dialect of 8th century Sino-Korean at least partly distinguished between reflexes of those two types of syllables unlike other LMC dialects. NELMC may have undergone a chain shift not found in other dialects (*ʌ̆a > *ʌ̆e > *e) without the stages in the Late Old Chinese and Early Middle Chinese columns. Hence NELMC is not descended from the prestige dialects in those columns. 
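The pattern can be checked against the eight readings listed in Part 1. This is a minimal sketch rather than a full survey: the data are only those eight premodern SK readings, and the helper names (`moc_vowel`, `sk_rhyme_type`, `reflexes_by_vowel`) are hypothetical labels of my own.

```python
from collections import defaultdict

# (sinograph, Middle Old Chinese, premodern Sino-Korean), as listed in Part 1.
READINGS = [
    ("更", "kˀraŋ", "kʌjŋ"),
    ("羹", "kˀraŋ", "kʌjŋ"),
    ("庚", "kˀraŋ", "kjəŋ"),
    ("耕", "kˀreŋ", "kjəŋ"),
    ("客", "kʰˀrak", "kʌjk"),
    ("格", "kˀrak", "kjək"),
    ("隔", "kˀrek", "kjək"),
    ("革", "kˀrək", "hjək"),
]


def moc_vowel(moc):
    """Return the nuclear vowel (a, e or ə) of a Middle Old Chinese form."""
    return next(ch for ch in moc if ch in "aeə")


def sk_rhyme_type(sk):
    """Classify a premodern SK reading as the ʌj type or the jə (< *e) type."""
    return "ʌj" if "ʌj" in sk else "jə"


def reflexes_by_vowel(readings):
    """Collect the set of SK rhyme types attested for each MOC vowel."""
    table = defaultdict(set)
    for _graph, moc, sk in readings:
        table[moc_vowel(moc)].add(sk_rhyme_type(sk))
    return dict(table)


result = reflexes_by_vowel(READINGS)
for vowel, types in result.items():
    print(vowel, sorted(types))
```

Only MOC *a turns up with both rhyme types; MOC *e and *ə have the jə type alone, at least in this small sample.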

Sinograph Gloss Middle Old Chinese Vocalization Late Old Chinese Early Middle Chinese Non-NE LMC NELMC 8th c. SK Prescriptive SK Premodern SK Modern SK
更 change *kˀraŋ *kʌ̆aŋ *keaŋ *kæŋ *kæŋ *kʌ̆eŋ *kʌjŋ kʌjŋ kʌjŋ kɛŋ
庚 seventh Heavenly Stem *keŋ kjəŋ
耕 to plow *kˀreŋ *kʌ̆eŋ *kaeŋ *kɛŋ *keŋ kjəŋ
客 guest *kʰˀrak *kʌ̆ak *kʰeak *kʰæk *kʰæk *kʰʌ̆ek *kʌjk kʰʌjk kʌjk kɛk
格 go to *kˀrak *kʌ̆ak *keak *kæk *kæk *kʌ̆ek *kʌjk kʌjk kjək
隔 obstruct *kˀrek *kʌ̆ek *kaek *kɛk *kek *kek kjək
革 hide, skin; change *kˀrək *kʌ̆ək *xʱek? *hek hjək (irregular initial*)

In the prescriptive SK of 東國正韻 Tongguk chŏngun (1446), the reflexes of MOC *KˀreK/KˀrəK (KjəK) are always distinct from reflexes of MOC *KˀraK (KʌjK). That suggests the KjəK < *KeK reflexes of MOC *KˀraK were less prestigious and hence not worthy of inclusion in Tongguk chŏngun.

Were those KjəK-readings borrowed from a LMC dialect which had *KɛK instead of *Kʌ̆eK from MOC *KˀraK?

Were those KjəK-readings considered incorrect because Old Korean *e was a poorer match for the NELMC diphthong? Perhaps *ʌ̆e (short-long) had become *ʌĕ (long-short) which would have been better approximated by Old Korean *ʌj than by *e. But at least some *e-forms persisted and their reflexes are standard today.

Were those KjəK-readings considered incorrect because of a desire to keep a clear distinction between the Early Middle Chinese 庚陌 *-æŋ/*-æk and 耕麥 *-ɛŋ/*-ɛk categories that was muddied in the actual borrowings whose descendants are in use today? Were readings like kʌjŋ for 庚 artificial creations in Tongguk chŏngun?

*11.23.23:39: There is no Korean-internal reason to borrow a *k-word like 革 with h-. I suspect the h- reflects a NELMC initial *xʱ-. My theory of emphatic origins requires a lower vowel presyllable to condition emphasis in this word:

Early Old Chinese *Cʌ-krək > MOC *kˀrək

If that presyllable were *Nʌ-, it could have dropped in mainstream Chinese after emphasis whereas it fused with the *k- in the ancestor of NELMC:

Early Old Chinese *Nʌ-krək > *Nʌ-kˁrək > *ŋkˁrək > *ŋgˁrək > *gˁrək > *gʌ̆ək > *gʌ̆ek > *ɣʌ̆ek > NELMC *xʱek

The Tongguk chŏngun reading kjək may have an artificial k- based on the prestige dialectal base of the Chinese phonological tradition.

The Tongguk chŏngun readings as a whole may be an artificial compromise between that tradition and actual NELMC-based readings already in use in Korea since the 8th century AD.

WHY DID KOREANS BORROW LATE MIDDLE CHINESE GRADE II VELAR-FINAL SYLLABLES IN TWO DIFFERENT WAYS? (PART 1)

In "Chinese Grade II, Version 2015.11.19", I wrote,

The stages [of Chinese presented here] are 'generic'; as I will demonstrate later, actual dialects could differ from this model.

Here's such a brief demonstration. What I reconstructed as Late Middle Chinese (LMC) *æ normally corresponds to Sino-Korean (SK) a. But that *æ also has other correspondences:

更 'change': MOC *kˀraŋ > LMC *kæŋ : SK kʌjŋ (not *kaŋ)

羹 'soup': MOC *kˀraŋ > LMC *kæŋ : SK kʌjŋ

庚 'seventh Heavenly Stem': MOC *kˀraŋ > LMC *kæŋ : SK kjəŋ < *keŋ (not *kaŋ or *kʌjŋ)

耕 'to plow': MOC *kˀreŋ > LMC *kæŋ : SK kjəŋ < *keŋ

客 'guest': MOC *kʰˀrak > LMC *kʰæk : SK kʌjk (no aspiration; borrowed before Korean developed a phonemic aspirated /kʰ/; not *kak)

格 'go to': MOC *kˀrak > LMC *kæk : SK kjək < *kek (not *kak or *kʌjk)

隔 'obstruct': MOC *kˀrek > LMC *kæk : SK kjək < *kek

革 'hide, skin; change': MOC *kˀrək > LMC *kæk : SK hjək < *hek (irregular initial)

(The SK forms are premodern and in IPA to facilitate comparison. *-k forms added 11.22.23:34.)

I think ʌj and jə (< Old Korean *e) reflect two different approaches to borrowing a northeastern LMC diphthong *ʌe corresponding to *æ before velars in other Middle Chinese dialects. ʌj matched the first vowel of *ʌe and approximated the second with a glide. Old Korean *e matched the second, more prominent half of *ʌe (which I could more precisely write as *ʌ̆e).

11.23.13:34: If you look carefully, you can see a pattern among the SK readings that I missed when I wrote this entry. I'll reveal that pattern next time.

HOW DID MEOW BECOME ME-NG-OW?

While looking through Thurgood's From Ancient Cham to Modern Dialects (1999) for examples of diphthongs resulting from vocalic splits*, I found two unusual-looking forms:

Western Cham maŋiau 'cat' from Proto-Chamic *miaw (p. 159)

Phan Rang Cham pimaw 'mushroom' from Proto-Chamic *bɔh maw (p. 158)

'Cat' has a -ŋ- that 'grew' in the middle of *miaw. Moreover, there is an extra -a- between m- and this -ŋ-. Why was this originally monosyllabic word stretched into two syllables when the general tendency was to compress? Examples from Thurgood (1999: 112):

Proto-Chamic *bara > Western Cham pra 'shoulder'

Proto-Chamic *bulan > Western Cham ea plan 'moon, month' (What does ea mean?)

Proto-Chamic *bulu > Western Cham plau 'body hair'

'Mushroom' disturbs me because i and ɔ are very different vowels. I fear that Old Chinese presyllabic vowels might have undergone nonrecoverable changes similar to those that occurred in this word.

11.21.23:46: I am also disturbed by Phan Rang Cham mɨyau 'cat' (p. 159). Could this word be like a pre-Western Cham *mayiau, a stretched form whose intrusive *-y- nasalized to *-ɲ- under the influence of the preceding *m-? But why would *-ɲ- back to -ŋ-?

11.22.0:19: Western Cham does have ɲ [Thurgood 1999: 274], though I do not know if word-medial -ɲi- is possible. Did *-ɲi- become -ŋi-?

CHINESE GRADE II, VERSION 2015.11.19

(More like version 2015.11.21, since I have revised this entry over the past two days.)

In my last entry, I reconstructed the Late Old Chinese reading of the Grade II word 講 'discuss, explain' (later 'speak') as *kəoŋʔ. I think I've reconstructed Grade II words with similar diphthongs before. In any case, here's how I think *-ˁr- (> Middle Chinese Grade II) and *-r- (> Middle Chinese Grade III) syllables developed between Middle Old Chinese (after emphasis had developed) and Late Middle Chinese:

Middle Old Chinese Vowel bending Vocalization Late Old Chinese Early Middle Chinese Late Middle Chinese
*-ˁre *-re *-ʌe *-ae *-ɛ(j) *-æ(j)
*-re *-rie *-ɨie *-ɨe *-ɨi
*-ˁra *-ra *-ʌa *-ea *-ɛ *-(j)æ
*-ra *-rɨa *-ɨa *-ɨə *-iə > *-ø > *-y
*-ˁroh *-ro *-ʌo *-əw *-əw
*-ro *-ruo *-ɨuo *-uo *-u
*-ˁri *-rei *-ʌej *-aej *-ɛj *-æj
*-ri *-ri *-ɨi
*-ˁrə *-rəɨ *-ʌəɰ *-aej *-ɛj *-æj
*-rə *-rɨə *-ɨə *-ɨ *-ɨi
*-ˁru *-rou *-ʌow > *-ʌew *-aew *-ɛw *-æw
*-ru *-ru *-ɨu *-u *-ɨw

The stages are 'generic'; as I will demonstrate later, actual dialects could differ from this model.

I wrote *-ˁ- before *-r- since I follow Baxter and Sagart in writing emphasis before the first consonant of a cluster: e.g., *pˁr-. But I think emphasis was a feature of all consonants in a cluster: e.g., /pˁr/ was phonetically [pˁrˁ].

Between Middle and Late Old Chinese, nonemphatic *-r- became a high central that fused with nonfront high vowels, but emphatic *-ˁr- became lower-mid back before lower vowels. (I decided to write the reflex of *-ˁr- as *-ʌ- because emphasis is associated with backing and because *-ˁrə did not become long *-əː.)

Perhaps vocalization predated vowel bending: e.g., *-re > *-ɨe > *-ɨie.

In Late Old Chinese, lower series vowels dissimilated:

*ʌa (achromatic-achromatic) > *ea (palatal-achromatic) : ae (!) in Baxter's Middle Chinese notation

*ʌə (achromatic-achromatic) > e > *ae (achromatic-palatal) : ea (!) in Baxter's Middle Chinese notation

*-ʌow (achromatic + labial + labial) > *-ʌew > *-aew (achromatic + palatal + labial) : aew in Baxter's Middle Chinese notation

In Early Middle Chinese, lower series diphthongs monophthongized

*ea >

*ae >

In Late Middle Chinese, merged into *æ.

Toward the end of the Late Middle Chinese period, a *-j- developed between velars and *æ in at least some dialects: e.g., the ancestor of Mandarin and the source of Sino-Vietnamese.

The first vowel in the achromatic-achromatic diphthong *ɨə dissimilated. Then the resulting *iə fused into a front mid labial vowel that raised:

*-ɨə > *-iə > *-ø > *-y

Go-on -o < *-ɨə, Kan-on -yo < *-iə, Sino-Korean < *-ø, and Sino-Vietnamese < *-y reflect these four stages. (However, those four types of Sinoxenic were borrowed from four different Middle Chinese dialects at four different times, so it is possible that they reflect different dialect developments: e.g., Sino-Korean could be from a conservative *-ɨə or even *-ə and Sino-Vietnamese -ư, an unrounded vowel, may directly reflect an *-ɨ from *-ɨə.)

I am not happy with the diphthongs I reconstructed. I should examine the diphthongs of Mon-Khmer languages which have undergone vocalic splits and diphthongization to get a feel for which vowel sequences are plausible. (The only such Mon-Khmer language I am familiar with is Khmer which does not have ʌa, etc.)

11.21.22:42: I originally intended to include onset development in the table above but omitted it to focus on the vowels. Here are a few examples of how *r-clusters changed over time:

Sinograph Gloss Middle Old Chinese Vowel bending Vocalization Late Old Chinese Early Middle Chinese Late Middle Chinese
snake *pˁra *pra *pʌa *pea *pæ
skin *pra *prɨa *pɨa *pua *puo > *pu *fu
to crouch *dˁreʔ *ɖreʔ *ɖʌeʔ *ɖaeʔ *ɖɛ(j)ˀ *ʈɦǽ(j)
bug *dreʔ *ɖrieʔ *ɖɨieʔ *ɖɨeʔ *ɖɨeˀ *ʈɦɨí
household *kˁra *kra *kʌa *kea *kæ *kjæ
plant, place name *kra *krɨa *kɨa *kɨə *kiə > *kø > *ky

*ɨa became *ua after labials in syllables with zero or glottal codas: *Pɨa(H) > *Pua(H).

In late Early Middle Chinese, final mid vowels in diphthongs raised (and merged with the preceding vowel if it was identical):

*uo > *uu > *u

*ɨe > *ɨi

*ie > *ii > *i (no examples in this post because *ie was only in syllables that did not have medial *-r-)
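The raising-and-merger rule just stated can be sketched as a small function. This is only an illustration under the assumption that each diphthong is written as exactly two vowel symbols; `raise_final_mid` is a hypothetical name.

```python
# Mid vowel -> corresponding high vowel, per the late Early Middle Chinese raising.
RAISE = {"o": "u", "e": "i"}


def raise_final_mid(diphthong):
    """Raise a final mid vowel, then merge two identical vowels into one."""
    if len(diphthong) != 2:
        raise ValueError("this sketch only handles two-symbol diphthongs")
    first, last = diphthong
    last = RAISE.get(last, last)
    # *uu > *u and *ii > *i; *ɨi stays a diphthong because its halves differ.
    return first if first == last else first + last


for source in ("uo", "ɨe", "ie"):
    print("*" + source, ">", "*" + raise_final_mid(source))
```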

Coronals developed retroflex allophones before rhotics:

*t(ʰ)r > *ʈ(ʰ)r > *ʈ(ʰ)

*dr > *ɖr >

*nr > *ɳr >

*(t)s(ʰ)r > *(t)ʂ(ʰ)r > *(t)ʂ(ʰ)

*(d)zr > *(d)ʐr > *(d)ʐ

Those allophones became phonemic after the rhotics were lost.
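The two steps (retroflexion before *r, then loss of *r) can be sketched as string rewrites. This is a rough illustration only: onsets are written as plain strings, aspiration is the single symbol ʰ, and the final outcomes for *dr and *nr, which are not spelled out in the list above, are assumed to parallel the other series. The function names are hypothetical.

```python
# Plain coronal -> retroflex counterpart.
RETROFLEX = {"t": "ʈ", "d": "ɖ", "n": "ɳ", "s": "ʂ", "z": "ʐ"}


def retroflex_before_r(onset):
    """Stage 1: the coronal directly before *r (ignoring ʰ) becomes retroflex."""
    if not onset.endswith("r"):
        return onset
    body = onset[:-1]
    aspiration = ""
    if body.endswith("ʰ"):
        body, aspiration = body[:-1], "ʰ"
    if not body or body[-1] not in RETROFLEX:
        return onset  # e.g. *pr: no coronal to retroflex
    return body[:-1] + RETROFLEX[body[-1]] + aspiration + "r"


def develop(onset):
    """Stage 1 plus stage 2: the *r is lost after a retroflexed coronal."""
    stage1 = retroflex_before_r(onset)
    return stage1[:-1] if stage1 != onset else onset


for onset in ("tr", "tʰr", "tsʰr", "dzr", "pr"):
    print("*" + onset, ">", "*" + retroflex_before_r(onset), ">", "*" + develop(onset))
```

Onsets without a coronal before *r (such as *pr) pass through unchanged here, since their *r developed differently.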

Final glottal stops conditioned glottalization in Early Middle Chinese which in turn led to a tone in Late Middle Chinese. (I think it might be better to translate 聲 as 'phonation' rather than as 'tone' for Early Middle Chinese.)

Labials weakened to dentilabial fricatives before *u: *pu > *fu.

Voiced obstruents became voiceless-obstruent-ɦ-clusters in Late Middle Chinese (Pulleyblank 1984).

IS SANYA MANDARIN MANDARIN?

Today I found Thurgood et al.'s A Grammatical Sketch of Hainan Cham and was astonished by Appendix C on 三亞 Sanya Mandarin. An informant read a Mandarin newspaper out loud with surprising results:

- the codas included -k and -t unlike any Mandarin variety I have ever seen but like southern Chinese languages: e.g.,

kok²⁴ 'country' (cf. LOC *kwək, standard Mandarin guó)

siet²⁴ 'snow' (cf. LOC *swɨat, standard Mandarin xuě)

but LOC *-p corresponds to -t:

zet²⁴ 'leaf' (cf. LOC *jɨap, standard Mandarin yè)

sit²⁴ 'ten' (cf. LOC *dʑɨəp, standard Mandarin shí)

- the codas included glottal stops and nasal-glottal stop clusters generally where they would be reconstructed in Old Chinese (!): e.g.,

kiuʔ⁴³ 'nine' (cf. LOC *kuʔ, standard Mandarin jiǔ)

kiaŋʔ⁴³ 'speak' (cf. LOC *kəoŋʔ, standard Mandarin jiǎng)

but see the exceptions below!

- "the numbers are all read in Cham except dates": e.g.,

六十年 as Cham naːnʔ³³ piu⁵⁵ tʰun³³ instead of Chinese *lok²⁴ sit²⁴ nien²¹

How much does this reading pronunciation correspond to the informant's spoken Mandarin? Is it possible that this reading pronunciation is derived from a conservative southern Chinese language? There seem to be at least four strata in the reading pronunciations.

The first two strata (a conservative southern Chinese language and Cham) are listed above.

The third stratum looks like Mandarin and must be recent. It is characterized by an absence of final stops (and tonal differences I will explore later):

社會主義 se³³ huj³³ tsu²¹ zi³³ 'socialism' (cf. the LOC readings *dʑiaʔ, gwas, tɕuoʔ, and ŋɨajh - of course there was no LOC word 'socialism' - and standard Mandarin shèhuì zhǔyì)

but 主席 tsiuʔ⁴³ si³³ 'chairman' (cf. the LOC readings *tɕuoʔ and *ziak) preserves the final glottal stop in the morpheme 主 'master' - though the following morpheme lacks *-k!

tə²¹ 'get' (cf. LOC *tək and standard Mandarin dé)

s(i)o²¹ 'speak' (cf. LOC *ɕwɨat and standard Mandarin shuō)
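One way to sort these readings is by what happened to the LOC stop coda. The sketch below is only a first pass under simple assumptions: tones are left out, s(i)o is entered as sio, only the final symbol of each syllable is inspected, and the function names are hypothetical.

```python
STOPS = "ptkʔ"


def stop_coda(syllable):
    """Return the final stop of a toneless syllable, or '' if there is none."""
    return syllable[-1] if syllable and syllable[-1] in STOPS else ""


def coda_outcome(loc, sanya):
    """Compare the stop coda of a LOC reading with that of the Sanya reading."""
    old, new = stop_coda(loc), stop_coda(sanya)
    if not old:
        return "no stop coda in LOC"
    if not new:
        return "lost"
    if old == new:
        return "kept"
    return f"-{old} > -{new}"


# (gloss, LOC reading, Sanya reading without tone), from the examples above.
PAIRS = [
    ("country", "kwək", "kok"),
    ("snow", "swɨat", "siet"),
    ("leaf", "jɨap", "zet"),
    ("ten", "dʑɨəp", "sit"),
    ("get", "tək", "tə"),
    ("speak", "ɕwɨat", "sio"),
    ("seat (in 'chairman')", "ziak", "si"),
]

outcomes = {gloss: coda_outcome(loc, sanya) for gloss, loc, sanya in PAIRS}
for gloss, outcome in outcomes.items():
    print(gloss, outcome)
```

Readings that come out as "kept" or "-p > -t" pattern with the conservative stratum, while "lost" points to the Mandarin-like one.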

The word 學習 sioʔ²⁴ sit²⁴ 'to study' is a combination of forms from different strata like 'chairman'. However, in 'chairman' the first syllable was more archaic, whereas in 'to study' the second syllable sit²⁴ is more archaic (cf. LOC *zɨəp and standard Mandarin without a stop coda). sioʔ²⁴ (< *xioʔ < LOC *gəuk) is like 合肥 Hefei Mandarin ɕyɐʔ with a glottal stop that is a trace of the *-k preserved in the oldest Chinese stratum. Hence I think sioʔ²⁴ is from a fourth stratum that is slightly more archaic than standard Mandarin xué which has no final stop.

IS 'WING' FROM 'BRANCH' IN CHINESE?

In my previous entry, I reconstructed Old Chinese

*Cɯ.ke > *ke 'branch' (spelled 支 or 枝 with 木 'tree'), 'limb' (spelled 肢 or 胑 with 肉 'flesh')

Could *C have been *s- if 'limb' is related to 翅 *sɯ.ke-s 'wing'?

Baxter and Sagart (2014: 140) reconstructed 翅 as Old Chinese *s-kʰe-s and *kʰe-s with an aspirated *kʰ- absent from their reconstruction of 'branch' as *ke.

Below I reconstruct a single 翅 *sɯ.ke-s 'wing' that underwent three different paths of reduction. I have included 'branch' and a possible cognate 咫 'foot (8 inches)' (< 'length of a branch'?) for comparison.

Sinograph Early Old Chinese Presyllabic vowel neutralization;
emphasis phonemic
Early *s.k-reduction Late *s.k-reduction Later reflexes
Phase 1: syncope Phase 2: cluster to aspirate Phase 1: syncope Phase 2:  cluster to fricative
*sɯ.ke-s *sə.ke-s *sə.ke-s *sə.ke-s * *ɕieh Middle Chinese *ɕieʰ; perhaps also a few modern forms like 漳浦 Zhangpu Min si unless their s- is from *tɕʰ-
* *kʰe-s *kʰe-s *tɕʰieh Most Chinese varieties: e.g., Mandarin chi
*ke-s *kieh Northwestern Middle Chinese *keʰ implied by 翅 in transcriptions of Indic ke-like syllables; no modern descendants
支枝肢胑 *sɯ.ke *sə.ke *ke *tɕie Middle Chinese *tɕie; most Chinese varieties
*kie Min forms with k-
*sɯ.ke-ʔ *sə.ke-ʔ *keʔ *tɕieʔ Middle Chinese *tɕieˀ

Early *s.k-reduction is part of the same wave of changes as *C.l-reductions 2-3 in this post. Similarly, late *s.k-reduction is part of the same wave of changes as *C.l-reductions 4-5.

If *sɯ- is part of a root 'branch', then *sɯ- was lost in 'limb, branch' prior to late *s.k-reduction whereas it was never dropped in the derived word 'wing'.

A wild possibility is that *sɯ- was lost in 'limb, branch' very early on - even before emphasis - but the resulting *ke did not become emphatic due to analogy with 'wing'. However, I would expect 'wing' to be remodeled after 'branch' rather than the other way around. In English, 'branch' is more common than 'wing'. Was the frequency the other way around in early Chinese?

WHY DOES ARABIC HAVE EMPHATICS IN LOANWORDS FROM LANGUAGES WITHOUT EMPHATICS?

Normally if a language with sound X borrows from a language without sound X, I wouldn't expect sound X to be in borrowings. So for instance Hindi-Urdu has voiced aspirates and English doesn't. Hence I wouldn't expect voiced aspirates in Hindi-Urdu loanwords from English. (If there are such loanwords with gh, etc., I'd like to know about them.)

However, Hindi-Urdu loanwords from English do have retroflex stops even though English doesn't: e.g., ḍākṭar from doctor.

The reason is that Hindi-Urdu lacks alveolar stops, and to Hindi-Urdu speakers, English alveolar stops are perceived as being closer to Hindi-Urdu retroflex stops than to Hindi-Urdu dental stops.

I have long known that Arabic has emphatics (in bold) in loanwords even from languages without emphatics: e.g.,

Latin strata > Greek strata > Aramaic ʕsr > Arabic sˁiraːtˁ 'way'

(11.17.19:59: I presume the Aramaic form had a vowel added before s to break up the initial cluster: str- > ʔVs.tˁV.r-. Did s really simplify to  in Arabic? Could the Arabic word be from Middle Persian <slt> /srat/ 'street'?)

French bicyclette > Moroccan Arabic bəqʃliːa

Spanish falta > Moroccan Arabic faːla 'error, offense'

French automobile > Moroccan Arabic tˁuːmubiːl

French déserter > Algerian Arabic zarˁtˁa

French exercice > Algerian Arabic garˁsˁ (with French /gz/ simplified to /g/)

Italian gelati > Tunisian Arabic ʒiːlaː 'ice cream'

Turkish abla > Egyptian Arabic ʔala 'older sister'

French tante > Egyptian Arabic an

Those examples are from Kossmann's chapter on borrowings in Owens (2013). I didn't see any explanation for those emphatics there, but I guess that they might have something to do with approximating foreign vowel qualities: e.g., it would make sense to borrow automobile as tˁuːmubiːl if /u/ had a lower allophone [ʊ] after emphatic /tˁ/ that was closer to foreign o than the higher allophone of /u/ after nonemphatic /t/.

Perhaps I am on the right track - at least for Moroccan Arabic (MA). Kenstowicz and Louriz wrote in their abstract:

MA has three vowel phonemes /a/ /i/ /u/ (as well as an epenthetic schwa). They take lowered and retracted allophones [ɑ], [e] and [o] respectively, when tautosyllabic with an emphatic consonant. The latter are redundant and predictable variant of the corresponding phonemes. This would lead one to predict that they should play no role in loanword adaptation. Also, since French lacks emphatic consonants, we expect that the above mentioned allophones should be absent completely from French loanwords in MA. However, consideration of French loanwords in MA shows that French /ɑ/, /e/ and /o/ can be identified with the MA allophones that appear in emphatic contexts in the native phonology.

I think this is the full article. I haven't had time to read it yet.
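The logic of the abstract can be sketched as a lookup in two directions. This is a minimal sketch based only on the abstract quoted above, not on the article itself; the function names are hypothetical, and the idea that a borrower picks the emphatic context precisely to obtain the lowered allophone is my reading of the abstract.

```python
# MA phoneme -> lowered/retracted allophone next to a tautosyllabic emphatic.
EMPHATIC_ALLOPHONE = {"a": "ɑ", "i": "e", "u": "o"}


def surface_vowel(phoneme, emphatic_context):
    """Native phonology: choose the allophone from the consonantal context."""
    return EMPHATIC_ALLOPHONE[phoneme] if emphatic_context else phoneme


def adapt_foreign_vowel(foreign):
    """Loan adaptation: return (MA phoneme, whether an emphatic is needed)."""
    for phoneme, allophone in EMPHATIC_ALLOPHONE.items():
        if foreign == allophone:
            return phoneme, True  # only reachable via an emphatic context
    if foreign in EMPHATIC_ALLOPHONE:
        return foreign, False  # plain /a i u/ need no emphatic
    raise ValueError(f"no mapping sketched for {foreign!r}")


phoneme, needs_emphatic = adapt_foreign_vowel("o")
print(phoneme, needs_emphatic, surface_vowel(phoneme, needs_emphatic))
```

On this view the emphatic consonant in a loanword is a by-product of matching the foreign vowel, not a property of the source consonant.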

Note that the three vowels match the three 'lower series' vowels *a, *e, *o of my Old Chinese reconstruction which condition emphasis unless preceded by the 'higher series' vowels: e.g.,

*ke > *kˁe 'chicken' (source of White Hmong qaib)

but *Cɯ.ke > *ke 'branch' (spelled 支 or 枝 with 木 'tree'), 'limb' (spelled 肢 or 胑 with 肉 'flesh')
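The conditioning can be written as a two-line rule. This is a simplified sketch: it assumes that a presyllabic vowel, when present, overrides the main vowel, and it counts *ʌ as a lower vowel because of the *Cʌ- presyllable I posited for 革 above. The name `becomes_emphatic` is hypothetical.

```python
# Lower series vowels; ʌ is included on the strength of *Cʌ-krək > *kˀrək.
LOWER_SERIES = set("aeoʌ")


def becomes_emphatic(main_vowel, presyllable_vowel=None):
    """Return True if the syllable would develop emphasis under this sketch."""
    if presyllable_vowel is not None:
        # A presyllable decides the outcome: lower conditions emphasis, higher blocks it.
        return presyllable_vowel in LOWER_SERIES
    return main_vowel in LOWER_SERIES


print(becomes_emphatic("e"))        # *ke 'chicken'
print(becomes_emphatic("e", "ɯ"))   # *Cɯ.ke 'branch'
print(becomes_emphatic("ə", "ʌ"))   # *Cʌ-krək 'hide, skin'
```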

An understanding of the apparent emphatic-nonemphatic mismatches in Arabic loanwords and their sources may help understand similar apparent mismatches in Old Chinese words and related forms in neighboring languages: e.g., between Old Chinese *kˁ- and White Hmong q- in 'chicken' (thought to be a Chinese > Hmong-Mien loan) and 'dog' (possibly a Hmong-Mien > Chinese loan?).

HOW DID HMONGIC AND MIENIC GET THEIR WORDS FOR 'IRON'? (PART 1)

The short answer is "from Chinese". Here's a longer answer. Thanks to Mark Alves for drawing my attention to this issue.

Baxter and Sagart's (2014: 160) reconstruction of the Old Chinese word 鐵 for 'iron' does not quite match the Proto-Hmongic and Proto-Mienic forms reconstructed by Ratliff (2010: 258):

Language Initial Vowel Coda 'Tone'
Old Chinese l̥ˁ- -i- -k D
Proto-Hmongic l̥- -u-! -w C
Proto-Mienic r̥-! -ɛ- -k D

I have rewritten Ratliff's hl- and hr- in IPA to facilitate comparison.

'Tone' is in quotation marks since Old Chinese did not have tones and it is not clear whether the other two proto-languages had them. Nonetheless these 'tonal' categories definitely have tonal reflexes in daughter languages.

The Proto-Hmongic form has a labial vowel *u where I would expect a palatal vowel.

Similarly, Proto-Hmongic has *u in *ʔjuw C 'small, young' borrowed from some reflex of Old Chinese 幼 *[ʔ](r)iw-s. I would have expected Proto-Hmongic *ʔjiw C, but there is no Proto-Hmongic rhyme *-iw in Ratliff's reconstruction.

Here's my attempt to (unconvincingly, I'll admit) bridge the phonetic gap between the Old Chinese and Proto-Hmongic forms: 

Proto-Hmong-Mien *-k words developed Proto-Hmongic tone C unlike Proto-Hmong-Mien *-t and *-p words which developed Proto-Hmongic tone D (Ratliff 2010: 31). I suspect Proto-Hmong-Mien *-k became pre-Proto-Hmongic *-x which merged with *-h, the source of Proto-Hmongic tone C:

Proto-Hmong-Mien Pre-Proto-Hmongic Proto-Hmongic
*-h *-h Tone C (accompanied by breathiness [ʰ]?)
*-k *-x
*-t *-ʔ Tone D (accompanied by a final [ʔ]?)
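The proposed merger can be sketched as a two-step mapping from Proto-Hmong-Mien coda to Proto-Hmongic tone. This only illustrates my suggestion above; the pre-Proto-Hmongic value given for *-p is an assumption made by analogy with *-t, since Ratliff only tells us that both ended up with tone D.

```python
# Proto-Hmong-Mien coda -> pre-Proto-Hmongic coda.
PRE_PROTO_HMONGIC = {
    "h": "h",
    "k": "x",
    "t": "ʔ",
    "p": "ʔ",  # assumption: *-p is treated like *-t
}

# Pre-Proto-Hmongic coda (after the *-x > *-h merger) -> Proto-Hmongic tone.
TONE = {"h": "C", "ʔ": "D"}


def proto_hmongic_tone(phm_coda):
    """Return the Proto-Hmongic tone category for a Proto-Hmong-Mien coda."""
    coda = PRE_PROTO_HMONGIC[phm_coda]
    if coda == "x":
        coda = "h"  # the proposed merger of *-x with *-h
    return TONE[coda]


for coda in ("h", "k", "t", "p"):
    print("*-" + coda, "> tone", proto_hmongic_tone(coda))
```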

(11.16.1:19: This merger has a parallel in Chinese:

Early Old Chinese Middle Old Chinese Late Old Chinese Middle Chinese Modern Chinese
*-s *-h *-h *-ʰ Tone C
*-ks *-x
*-ts *-ts > *-s (phonetically [c]?) *-s (phonetically [ɕ]?) *-jʰ

Unlike the pre-Proto-Hmongic merger, the Chinese merger involved clusters and perhaps a chain shift if Middle Old Chinese *-s really was [s]: *-ts, *-ps > *-s > *-h.)

Old Chinese *l̥ˁik 'iron' was borrowed before the merger as pre-Proto-Hmongic *l̥ix after Proto-Hmong-Mien *-ik and *-ek had become pre-Proto-Hmongic *-ɨx.

Old Chinese *ʔiwh 'young' was borrowed as pre-Proto-Hmongic *ʔiwh.

The rhyme of 'iron' merged with the rhyme of 'young' in pre-Proto-Hmongic, and the vowel assimilated to the following glide in Proto-Hmongic:

*-ix > *-iɣ > *-iɰʰ > *-iwʰ > Proto-Hmongic *-uw C

(11.16.1:26: Was Proto-Hmongic *-uw phonetically [ʊw]? Modern reflexes include [o], [ɔ], [ə], and [aw] as well as [u]. See Ratliff 2010: 135-136.)

Perhaps *-x (from an earlier *-k) generally became pre-Proto-Hmongic *-w which was then lost after certain vowels: e.g., 

*-ɨx > *-ɨwʰ > *-ɨ C (there was no *-ɨw in Proto-Hmongic).

(11.16.0:16: I am reminded of pre-Tangut *-k which shifted to *-w which was then lost after certain vowels: e.g.,

*-ak > *-aw > -a [there was no -aw in Tangut]

However, this secondary -w in Tangut was not associated with a particular tone unlike my proposed secondary *-w in Proto-Hmongic.)

The Proto-Mienic form has *r̥- instead of *l̥-. A rhotic also underlies Vietic forms for 'iron': e.g., Vietnamese sắt < *kr-. Did Proto-Mienic and Vietic borrow 'iron' from Chinese dialects in which *l̥- became *r̥-?

Next: Chronological issues.

11.16.1:10: The Vietic borrowing may reflect an archaic *kr- cluster from an even earlier *kʌ.l-:


> mainstream Chinese *l̥ˁik > *l̥ˁit > *l̥ˁeit > *tʰet

> dialect A *l̥ˁik > *l̥ˁeik (source of Proto-Tai *l̥ek 'iron' [Pittayaporn 2009: 333] and Proto-Palaungic *l̥ek 'iron' [Sidwell 2010])

> dialect B *k.rˁik > *r̥ˁik > *r̥ˁeik (source of Proto-Mienic *r̥ˁɛk)

> dialect C *k.rˁik > *k.rˁeik > *k.rˁaik (source of Vietic forms; was -ik borrowed as a palatal stop *-c?)

(11.16.1:40: The shift *-ei > *-ai is in other southern reflexes of old emphatic syllables: e.g., 雞 *kˁe > *kai, the source of Proto-Tai *kaj B 'chicken' (the tone is unexplained and may reflect an *-h from an earlier *-s suffix in the source dialect).)

The Vietic borrowing may have displaced a native cognate of Proto-Katuic *taːʔ 'iron' (Sidwell 2005) if the Vieto-Katuic hypothesis (see Alves 2005) is correct.

Proto-Katuic *taːʔ 'iron' superficially resembles but is probably not cognate to Old Khmer <teka> /ɗɛːk/ 'iron' which Jenner compared to Siamese เหล็ก /lèk/ 'iron'. Could the Khmer form be from an Old Chinese dialect in which *k- dropped without conditioning the devoicing of the following liquid?

> dialect D *lˁik > *lˁeik > *dek 

But why was Old Chinese *d- borrowed as an Old Khmer implosive /ɗ/ instead of /d/? Could dialect D have had implosives?

> dialect D *ʔlˁik > *ʔlˁeik > *ɗek? 

Are there other instances of Old Chinese *lˁ corresponding to Khmer /ɗ/? I'd like to take a second look at Jenner and Pou's "Some Chinese Loanwords in Khmer" (1973).

FALLEN PREFIXES: GSR 0011

I don't have time to do what I said I would at the end of my last entry. A simple answer grew into a long post that I can't complete now. While researching that entry, I came across Grammata serica recensa series 0011 and thought it might be fun to apply my 'extended emphatic theory' to it taking Baxter and Sagart's reconstructions as a starting point.

First, a few words about the phonetic component of 0011: 阝+左. It represents syllables of the shape LOJ. (I use capital letters to reflect generic forms.) But - at least in the later script - it contains 左 representing syllables of the shape TSAJ/TSAR*. 左 TSAJ/TSAR is too different from 阝+左 LOJ to be a phonetic within a phonetic. The Sino-Korean reading 좌 chwa for 左 and 佐 could be from a Middle Chinese *tswa which in turn could be from an Old Chinese TSOJ/TSOR. Is the 左 in 阝+左 a partial phonetic reflecting a dialect in which 左 ended in -OJ rather than -AJ? Is there a way to reconcile L- and TS-?

Another case of a possible L-/TS- alternation is 酉 *luʔ (Baxter and Sagart: *N-ruʔ) 'wine' ~ 酒 *tsuʔ 'wine'. I proposed that 酉 and 酒 are members of an Old Chinese palatal series. But the initials of 0011 do not overlap with those of my proposed palatal series.

Maybe all of this is a nonissue if 阝+左 and 左 are in fact unrelated. My ignorance of Chinese paleography is showing.

Let's move on to something I think I understand better: the words written with 阝+左:

GSR | Sinograph | Gloss | Early Old Chinese | *C.l-reduction 1 (optional) | Phonemic emphasis | *C.l-reduction 2 (optional) | *C.l-reduction 3: *sə.l(ˁ)- > *s.l(ˁ)- (optional), after all *s.l(ˁ)- > *l̥(ˁ)- | *C.l-reduction 4: *sə.l- > *s.j-, after all *s.l- > *s- | *C.l-reduction 5: *s.l- > *z- | Middle Chinese
0011d long and narrow mountain *(CV-)loj *lojʔ *lˁojʔ *lˁojʔ *dojʔ *dwajʔ *dwaˀ
0011j hanging tuft of hair *rɯ-loj *rə-loj *r-loj *ɖuoj *ɖwɨaj *ɖwie
*tV-loj-ʔ *t-loj-ʔ *t-lˁoj-ʔ *tojʔ *twajʔ *twaˀ
*(CV-)loj-ʔ *loj-ʔ *lˁoj-ʔ *dojʔ *dwajʔ *dwaˀ
0011l lazy
0011b, 0011e 墮隋 to fall
0011a to destroy
0011e, 0011f 墮隳 *sɯ-loj *sə-loj *s-loj *l̥oj *xuoj (W. dialect) *xwɨaj *xwie
0011b to shred sacrificial meat *sɯ-loj-ʔ *s-loj-ʔ *s-lˁoj-ʔ *l̥ˁoj-ʔ *tʰojʔ *tʰwaʔ *tʰwaˀ
*sɯ-loj-s *sə-loj-s *s-loj-s *l̥oj-s *xuojh (W. dialect) *xwɨajh *xwieʰ
*sə-loj-s *s-loj-s *suojh *swɨajh *swieʰ
0011i slippery
0011k beautiful *sɯ.lojʔ-s *s.lojʔ-s *s.lˁojʔ-s *l̥ˁojʔ-s *tʰojʔ *tʰwajh *tʰwaʰ
*sɯ.lojʔ *lojʔ *lˁoj-ʔ *dojʔ *dwajʔ *dwaˀ
0011c oval *s.lojʔ *s.lˁojʔ *l̥ˁojʔ *tʰojʔ *tʰwajʔ *tʰwaˀ
0011h marrow *sɯ.lojʔ *sə.lˁojʔ *s-lojʔ *sojʔ *swɨajʔ *swieˀ
0011b (place name) *sɯ.loj *sə.loj *s.juoj *zwɨaj *zwie
0011g to follow

(Thanks to David Boxenhorn for fixing the table.)

Here is a simplified table including possibilities absent from the large table. Only one path of reduction is listed per Middle Chinese reading. There are others: e.g., Middle Chinese *sə.lˁoj could be from a *s(ə).lˁoj that reduced to *lˁoj (*C.l-reduction 2) as well as a *sʌ.loj that reduced to *loj (*C.l-reduction 1).

Early Old Chinese | *C.l-reduction 1 (optional) | Phonemic emphasis | *C.l-reduction 2 (optional) | *C.l-reduction 3: *sə.l(ˁ)- > *s.l(ˁ)- (optional), after all *s.l(ˁ)- > *l̥(ˁ)- | *C.l-reduction 4: *sə.lˁ- > *s.d-, after all *s.lˁ- > *s-?; *sə.l- > *s.j-, after all *s.l- > *s- | *C.l-reduction 5: *s.d- > *dz-?; *s.j- > *z- | Middle Chinese
*sʌ.loj *sʌ.loj *sə.lˁoj
*sə.lˁoj *s.doj? *dzwaj? *dzwa
*s.lˁoj *soj *swaj *swa
*s.loj *s.lˁoj *l̥ˁoj *tʰoj *tʰwaj *tʰwa
*loj *lˁoj *doj *dwaj *dwa
*sɯ.loj *sɯ.loj *sə.loj *sə.loj *sə.loj *s.joj *zwɨaj *zwie
*s.loj *soj *swɨaj *swie
*s.loj *l̥oj *xuoj (W. dialect) *xwɨaj *xwie
*juoj *jwɨaj *jwie

And here is a text summary of what I think happened.

Originally, there were at least six roots:

*(CV-)loj 'long and narrow mountain'

*loj 'to fall' > 'hanging hair'; perhaps also 'lazy' (fallen?) and 'to destroy' > 'to shred sacrificial meat' and even 'slippery' (causing to fall?)

*sɯ.lojʔ 'beautiful' (and 'oval'?)

*sɯ.lojʔ 'marrow'

*sɯ.loj (place name; derived from one of the other roots?)

*sɯ.loj 'to follow'

The various *sɯ- may have had various earlier sources prior to Early Old Chinese: e.g., *si, *ɕə, *tsu, etc.

These words had variation in degrees of reduction: e.g., *sɯ.lojʔ (none) ~ *s.lojʔ (partial) ~ *lojʔ (full).

Sagart (1999) gave examples of such variation in modern languages: e.g., Phan Rang Cham cơ.lan ~ clan ~ lan 'road' (quoted from Alieva 1994; from disyllabic Proto-Austronesian *zalan).

Although I assume the degree of reduction was unpredictable, I also assume that sound changes regularly applied to consonant clusters and single initials, resulting in predictable outputs (though not inputs!): e.g., all *s.l- at any given time became the same thing. (However, *s.l- became three different things at different times: *l̥-, *s-, and *z-.) Once clusters fused into new initials, they left gaps to be filled by presyllable-initial sequences that reduced into new clusters: e.g., *s.l- > *l̥- followed by *sə.l- > *s.l-.
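The ordering logic can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it covers nonemphatic *s.l- and *sə.l- alone, it follows the rounds of the simplified table (round 3, round 4, then *s.j- > *z- in round 5), and the names `develop` and `reduces_in_round_3` are hypothetical.

```python
VOICELESS_L = "l\u0325"  # l̥ (l with a ring below)


def develop(onset: str, reduces_in_round_3: bool = False) -> list[str]:
    """Trace a nonemphatic onset through *C.l-reductions 3-5.

    Returns the onset before round 3 and after each of rounds 3, 4 and 5.
    """
    if onset not in ("s.l", "sə.l"):
        raise ValueError("this sketch only handles *s.l- and *sə.l-")
    history = [onset]

    # Round 3: every existing *s.l- fuses into *l̥- first; only then may
    # *sə.l- optionally reduce into the gap as a new *s.l-.
    if onset == "s.l":
        onset = VOICELESS_L
    elif reduces_in_round_3:
        onset = "s.l"
    history.append(onset)

    # Round 4: the new *s.l- fuses into *s-; a surviving *sə.l- becomes *s.j-.
    if onset == "s.l":
        onset = "s"
    elif onset == "sə.l":
        onset = "s.j"
    history.append(onset)

    # Round 5: *s.j- becomes *z-.
    if onset == "s.j":
        onset = "z"
    history.append(onset)
    return history


if __name__ == "__main__":
    for start, early in (("s.l", False), ("sə.l", True), ("sə.l", False)):
        print(" > ".join("*" + form + "-" for form in develop(start, early)))
```

The only unpredictable input is when a given word reduces; once it has a given onset at a given time, its output is fixed, which is how one root can end up with *l̥-, *s-, and *z- variants.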

The Middle Chinese forms in the final column are reflexes of variants that not only happened to survive but were also considered worthy of inclusion in the lexicographic tradition. Yet other variants must have existed in speech but were not recorded.

The variation in the Middle Chinese column does not imply that any given Middle Chinese speaker had three ways to say 'to shred sacrificial meat'. Even if such a word were still in use, each speaker probably only had one way to say it, and three of those ways were regarded as sufficiently prestigious. The term Middle Chinese as used here does not refer to a single coherent language; rather, it is a set of approved forms of heterogeneous origin.

*11.15.13:55: I was wondering why Baxter and Sagart reconstructed *-r in 左 ~ 佐 *tsˁarʔ-s 'to aid, assist'.

I see that 左 'left' (not 'to aid, assist') rhymes with an *-r word in Shijing 1.V.5.3. Starostin (1989: 567) reconstructed the rhyme words as

左 : 瑳 : 儺

*tsaːjʔ 'left' : *sʰaːjʔ : *n̥aːr (or *naːr).

I have rewritten his notation into IPA to facilitate comparison with Baxter and Sagart's reconstructions:

*tsˁa[j]ʔ : *tsʰˁarʔ : *nˁarʔ.

I assume that 儺 is to be read as *nˁarʔ which would rhyme better with the other two words than its other reading *nˁar. Do commentaries point to one reading or the other?

The brackets in *[j] indicate that the coda is uncertain: it could be *-r as well as *-j. That rhyme sequence seems to indicate that 'left' ended in *-r: *tsˁarʔ. Since 'to aid, assist' was written with the same character as 'left', I think the two words probably both had *-r.

PROTO-MIN AND SINO-VIETIC EVIDENCE FOR EROSION

To demonstrate my proposed stages of Old Chinese erosion, I present my reconstructions for the words with Proto-Min reflexes and/or borrowed forms in Vietic from Baxter and Sagart (2015: 71-72). I retained Baxter and Sagart's numbering of the examples.

Type A words

168. 節 stage 3 *Cʌ-tsik > *Cʌ-tsit > stage 4 *Cə-tˁsit > stage 5 *C-tˁsit > stage 6 *tsˁet 'joint'

> Proto-Min *ts-

> Vietnamese *ts- > Tết 'New Year festival'

Also cf.


pre-Tangut *Tʌ-tsik > 4739 1tsewr1 'id.'

whose retroflexion and lowered vowel reflect a lost coronal-initial presyllable.

Monosyllabic words with low series vowels automatically developed emphasis:

167. 斗 *toʔ > *tˁoʔ 'bushel; ladle'

> Proto-Min *t-

> Vietnamese *t- > đấu 'bushel'

169. 繭 *kenʔ > *kˁenʔ 'cocoon'

> Proto-Min *k-

> Vietnamese kén 'id.'

170. 芥 *krets > *krˁets 'mustard plant'

> Proto-Min *k-

> *kɛs or later *kɛjʰ > Vietnamese cải 'cabbage'

171. 點 *temʔ > *tˁemʔ 'black spot'

> Proto-Min *t-

> Vietnamese *t- > đốm 'spot' (irregular vowel)

172. 白 *brak > *bˁrak 'white'

> Proto-Min *b-

> Vietnamese bạc 'silver'

Conversely, monosyllabic words with high series vowels did not develop emphasis:

173. 而 *nə > *nə

cf. 乃 stage 3 *Cʌ-nəʔ > stage 4 *Cə-nˁəʔ > stage 5 *C-nˁəʔ > stage 6 *nˁəʔ

If those Middle Old Chinese monosyllabic words (167, 169-173) were sesquisyllabic or polysyllabic at an earlier stage, external comparison would be necessary to identify the phonemes preceding the surviving syllables. My theory predicts the lost vowels were originally low series: e.g.,

167. 斗 stage 3 *Cʌ-toʔ > stage 4 *Cə-tˁoʔ > stage 5 *C-tˁoʔ > stage 6 *tˁoʔ

Type B words

Early Vietic presyllables could reflect stage 4 or 5 (if an epenthetic vowel was inserted to break up an initial cluster) in Chinese borrowings.

163. 牀 stage 3 *kɯ-dzraŋ > stage 4 *kə-dzraŋ > stage 5 *k-dzraŋ 'bed'

> Proto-Min *dzh-

> Vietic *kV-ɟ- > Rục /kciːŋ 2/, Vietnamese giường 'id.'

164. 種 stage 3 *kɯ-toŋʔ > stage 4 *kə-toŋʔ > stage 5 *k-toŋʔ > stage 6 *toŋʔ 'seed'

> Proto-Min *tš- may be from stage 5 or stage 6 since *t- and *k-t- merged into that Proto-Min initial

> Vietic *kV-C- > Rục /kcoːŋ 3/ 'id.', Vietnamese giống 'species, breed, strain, race, sex, gender'

165. 箴 stage 4 *tə-qəm > stage 5 *t-qəm > stage 6 *q- > *k- > *tɕ- (palatalization)

> Proto-Min *tš- (see 164)

> Vietnamese *tV-C- > găm 'bamboo or metal needle'

11.9.0:14: Later borrowings of the same word are *tV-C- > ghim and *k- > kim. The high vowels reflect the raising and fronting of *ə to *i that in turn conditioned the palatalization of *k. Kim is a borrowing of a stage 6 form *kim prior to palatalization.

the *t- remains in Lakkia /them 1/

166. 謝 stage 2 *si-lak-s > stage 3 *sɯ-ljak-s > stage 4 *sə-ljak-s 'decline, renounce'

> Proto-Min *-dzia C; the hyphen indicates a lost presyllable

> Vietnamese *CV-ɟ- > giã 'say goodbye'

I will bridge the gaps between *sə-lj-, Proto-Min *-dz-, and Vietnamese *CV-ɟ- next time.

COMEBACK

I had a good reason to not post for the past two days - and to have been emphasizing emphasis in recent posts.

On Thursday and Friday, I participated in an academic conference for the first time since 2003. Guillaume Jacques wrote a report about it. My contribution was "Old Chinese Type A/Type B in Areal Perspective".

Maybe I should have renamed my talk "Typological" instead of "Areal". But apart from one mention of Salish, I did stick to Eurasia and Egypt which is right next door.

You can download my PowerPoint presentation here. I want to supplement it with a table of my stages of (pre)syllabic erosion in Old Chinese:

Erosion stage 1 2 3 4 5 6
Number of vowels in (pre)syllables 6 4 2 1 0
Low series vowels *Ce- *Că (= *Cʌ-) *Cə- *C- *Ø-
High series vowels *Ci- *Ci- *Cɨ̆- ( = *Cɯ-)
*Cə- *Cə-
*Cu- *Cu-
Language stage Pre-Chinese Early Old Chinese Middle Old Chinese Late Old Chinese

Notes on the phases:

1. Roots were originally disyllabic with the same six vowels in both first and second syllables. Maybe either syllable could have been stressed at this point.

2. First syllables of disyllables which were unstressed (and/or lost stress?) became presyllables with less vocalic diversity than the stressed syllables that followed them. Six vowels were reduced to an Austronesian-like four-vowel system. Presyllabic *i may have left traces in some syllables that distinguish it from other high series presyllabic vowels. With the exception of those syllables, it is impossible to determine whether a high vowel syllable had *i, *ə, or *u without non-Chinese evidence. Hence I consider this stage to be pre-Chinese.

3. Low vowel *Că- presyllables (which I have been writing as *Cʌ- on my site) conditioned emphasis:

*Că-Ca > *Că-Cˁa

All high series presyllabic vowels merged into *ɨ̆ (which I have been writing as *ɯ on my site). No emphasis developed after *Cɨ̆- presyllables:

*Cɨ̆-Ca > *Cɨ̆-Ca (no change)

4. The two presyllabic vowels merged into schwa. Emphasis was no longer predictable and became phonemic.

*Că-Cˁa > *Cə-Cˁa /CəCˁa/

*Cɨ̆-Ca > *Cə-Ca /CəCa/

5. The presyllabic vowels are lost, and presyllables become preinitials.

*Cə-Cˁa > *CCˁa

*Cə-Ca > *CCa

6. Preinitials were lost:

*CCˁa > *Cˁa

*CCa > *Ca
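Stages 3 through 6 can be sketched as a small Python function. It is a minimal illustration, assuming the phonemic notation used in my numbered examples (emphasis is left unwritten at stage 3 because it is still predictable there); the function name `erode` and its parameters are hypothetical.

```python
EMPHASIS = "\u02e4"  # ˁ


def erode(pre_consonant: str, pre_height: str, initial: str, rime: str) -> dict[int, str]:
    """Return the phonemic shape of a sesquisyllable at erosion stages 3-6."""
    if pre_height not in ("low", "high"):
        raise ValueError("pre_height must be 'low' or 'high'")
    low = pre_height == "low"

    # Stage 3: two presyllabic vowels; emphasis after *ʌ is only allophonic.
    stage3 = f"{pre_consonant}{'ʌ' if low else 'ɯ'}-{initial}{rime}"

    # Stage 4 onward: the vowels have merged into schwa, so emphasis
    # must be written on the core initial of former low-vowel words.
    core = f"{initial}{EMPHASIS if low else ''}{rime}"
    return {
        3: stage3,
        4: f"{pre_consonant}ə-{core}",  # presyllable with schwa
        5: f"{pre_consonant}-{core}",   # presyllable reduced to a preinitial
        6: core,                        # preinitial lost
    }


if __name__ == "__main__":
    for stage, form in erode("k", "high", "t", "oŋʔ").items():
        print(stage, "*" + form)
```

The sketch treats the stages as a strict sequence; it does not model the coexistence of variants at different degrees of reduction.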

Presyllables could be in various degrees of reduction at any given time: e.g., in the earliest period, unstressed *Ce (stage 1) could have been optionally pronounced as *Ci (stage 2). In Middle Old Chinese, *Cə- and its reduction *C- coexisted side by side. This is analogous to the different degrees of reduction of unstressed vowels in English ranging from spelling-like pronunciations to schwa or even zero. In a few cases, unstressed syllables can disappear entirely: e.g., because [bɪˈkʌz] ~ [bɨˈkʌz] ~ [bəˈkʌz] ~ [bkʌz] ~ cause [kʌz].

11.8.3:09: Applying my stages to those forms of because:

Stage 1/2: [bɪˈkʌz]

Stage 3: [bɨˈkʌz] (frontness neutralization)

Stage 4: [bəˈkʌz] (height neutralization)

Stage 5: [bkʌz] (loss; > [pkʌz] with voicing assimilation?)

CONSONANTAL VS. VOCALIC THEORIES OF CHINESE EMPHASIS

David Boxenhorn asked me about the implications of consonantal and vocalic theories of Chinese emphasis.

First, let me define how I interpret 'consonantal' and 'vocalic' in this context.

I regard Baxter and Sagart's (2014) reconstruction as a consonantal theory. In my understanding of their system, the locus of emphasis is restricted to the initial consonants of core syllables; there are no emphatic preinitials or presyllable initials, no emphatic vowels, and no spreading of emphasis beyond the consonants.

I advocate what could be called a vocalic theory in the sense that emphasis was ultimately conditioned by low vowels in what I call Early Old Chinese. But in Middle Old Chinese, some of those low vowels were lost, and the locus shifted to the consonant (though emphasis was phonetically present in the following vowel if not the coda). Then in Late Old Chinese, emphasis was lost, and previously predictable vocalic allophones after emphatic and nonemphatic consonants became phonemic:

Early Old Chinese (no phonemic emphasis): /Cʌpi/ [Cʌpi] > [Cˁʌˁpˁiˁ]

Middle Old Chinese (phonemic emphatic consonants): /pˁi/ [pˁiˁ] > [pˁeˁiˁ]

Late Old Chinese (no phonemic emphasis): /pei/ [pei]

So my theory could also be called consonantal or even syllabic depending on which period one is looking at and whether one is looking at phonemes or allophones.

Now back to the question.

Baxter and Sagart (2014: 69) reconstruct 36 emphatic consonants. 35 of them have nonemphatic counterparts; the 36th, *ʔʷˁ, lacks a nonemphatic counterpart *ʔʷ. Conversely, there are no nonemphatic consonants lacking emphatic counterparts. The near-total symmetry between the emphatic and nonemphatic subsets of consonants is striking; it is reminiscent of the near-total symmetry between

- the emphatic and nonemphatic subsets of the phonetic (but not phonemic!) inventory of Cairene Arabic

- the palatalized and nonpalatalized consonants in Russian

(Norman 1994, the originator of the Chinese emphatic theory, regarded Russian nonpalatalized consonants as pharyngealized: i.e., what I call 'emphatic'; in any case, the palatalized consonants are not simply nonemphatic.)

If we knew nothing about Slavic language history, we might notice how other Slavic languages have smaller sets of palatalized consonants or even no palatalized consonants at all (e.g., Serbo-Croatian), conclude that Russian is conservative, and project the Russian system back into Proto-Slavic. But that would be a mistake, as we know that palatalization in Slavic was secondary and conditioned by front vowels. The short front vowel */ĭ/ was lost, and palatalized consonant allophones that had once been before it and other front vowels were reinterpreted as phonemes:

*/Cĭ/ [Cʲɪ] > /Cʲ/ [Cʲ] (after loss of short */ĭ/)

*/Ci/ [Cʲi] > /Cʲi/ [Cʲi] (nonshort */i/ retained)

The large phonetic inventory of Cairene Arabic emphatics is due to emphatic spread from five emphatic phonemes /tˁ dˁ sˁ zˁ rˁ/ and the vowel /ɑ/ (Youssef 2014); there is no need to assume that Cairene Arabic preserved a far larger inventory of emphatics than Classical Arabic. (Note, however, that emphatic /rˁ/ and a back /ɑ/ phoneme distinct from /a/ do not exist in Classical Arabic. The origins of these two phonemes are deserving of investigation. I do not assume that all non-Classical traits of modern Arabic varieties are innovations; some could be retentions of traits conserved in the nonstandard dialects of Arabic conquerors but lost in the standard.)

The precedents of Slavic and Cairene Arabic make me hesitant to project the gigantic Old Chinese inventory back into a higher node or even Proto-Sino-Tibetan. There is, to the best of my knowledge, no attested Sino-Tibetan language with such an inventory. Emphatics have not been reported in any variety of Chinese. (Perhaps they are waiting to be detected; sometimes we are blind to the unexpected.) Was the Old Chinese consonant system the last remnant of a huge proto-system that was simplified everywhere else?

I don't think so. I have never seen so many emphatics in any other proto-language. I already wrote about Afroasiatic emphatics at some length last week, so here I will merely state that Ehret's (1995) Proto-Afroasiatic has only seven voiceless emphatics which mostly form 'triads' with nonemphatic voiced and voiceless consonants:

*p' *t' *tl' *s' *c' *k' *kʷ' vs. *p *t (no *tl) *s *c *k *kʷ vs. *b *d *dl *z *j *g *gʷ

In Proto-Afroasiatic, emphatics were ejectives which generally later became pharyngealized in Arabic. (Mehri emphatics seem to be in transition.)

Johanna Nichols' (2003) Proto-Nakho-Dagestanian (Northeast Caucasian) also has such triads:

Ejectives *t' *c' *cc' *č' *čč' *ƛ' *ƛƛ' *k' *kk' *q' *qq'
Voiceless *t *c *cc *čč *ƛƛ *k *kk *q *qq
Voiced *d - *ǯ - - *g - *G -

There is a phonetic reason for triads instead of tetrads with voiced as well as voiceless ejectives: voiced ejectives do not and cannot exist.

The presence of voiced as well as voiceless emphatics in Old Chinese indicates that Old Chinese was not like Arabic - that its pharyngealized consonants were not from earlier *ejectives.

Interestingly, Nichols does not reconstruct pharyngealized consonants in Proto-Nakho-Dagestanian even though they are present in Archi and Rutul. I have not found pharyngealized consonants in Nichols' lists of Archi and Rutul reflexes. That suggests the pharyngealized consonants of those two languages are rare and possibly secondary.

I think Old Chinese pharyngealized consonants are also secondary. But why do I think low vowels conditioned pharyngealization? The low vowel a is the syllabic counterpart of the pharyngeal approximant ʕ (Pulleyblank 1997 and Operstein 2010: 177); it is to ʕ what i, ɨ, and u are to j, ɰ, and w. So I expect an Old Chinese *a-like low vowel to condition pharyngealization in neighboring segments - much as back /ɑ/ does in Cairene Arabic - particularly given that northern neighbors of Chinese and their neighbors have harmonic systems in which vowel and consonant qualities are intertwined to some extent. I write the unstressed low vowel triggering pharyngealization as *ʌ, borrowing the symbol for the conventional interpretation of arae a 'bottom a' (ㆍ), the minimal low vowel of Middle Korean. I could have written it as *ă, but I wanted a symbol that was easy to distinguish from *a and that reflected my hypothesis that Chinese once had height harmony like Middle Korean.

My vocalic emphatic theory predicts that all Middle Old Chinese words with emphatic consonants once had emphasis-triggering low vowels. I used to think that Old Chinese *e and *o belonged to the same height class as *a (as they do in my reconstruction of Old Korean) and also triggered emphasis, but I am less certain of that. Maybe *e and *o-syllables also needed a preceding true low vowel to become emphatic:

*(Cʌ)Ce > *Cˁe (no presyllable needed) or *CʌCe > *Cˁe but *Ce > *Ce?

I could reinterpret *e and *o as *aj and *aw or *ja and *wa with *a (cf. Pulleyblank's *ə/*a two-vowel reconstructions of Old Chinese), but that has costs: e.g., it forces me to reinterpret *-ew as *-aɥ, etc.

Mid vowels aside, my theory predicts that Middle Old Chinese words of the type emphatic consonant + higher vowel (*Cˁi / *Cˁə / *Cˁu) should be from earlier *CACI (*A = low vowel and *I = high vowel) sequences. If Chinese borrowed such words from a polysyllabic language (e.g., Austronesian) or vice versa, the polysyllabic sources/borrowings should have begun with low vowels at the time of borrowing. Some Austronesian words with possible Old Chinese relatives pose problems for my theory; I will deal with them next time.

Conversely, if we assume that Old Chinese emphatic consonants are not innovations, then Old Chinese borrowings may preserve emphasis that was once present in the donors. The trouble is that there is no independent or internal evidence to suggest that Austronesian, Kra-Dai, Hmong-Mien, etc. had emphatics. If Old Chinese 狗 *Cə.kˁro 'dog' is a borrowing from Proto-Hmong-Mien *qluwX 'id.', why does it have an emphatic? My vocalic theory could account for the emphasis as being from a low presyllabic vowel and/or the low series vowel *o. (That would be the case even if the direction of borrowing were reversed.)

Lastly for now, my theory predicts that vowel heights for Old Chinese prefixes can be recovered if correlations between prefixes and emphasis can be made. I have yet to test this prediction.

On the other hand, the consonantal theory predicts no correlation between prefixes and emphasis since prefixes are invariably reconstructed with nonemphatics and can occur before both emphatic and nonemphatic-initial roots.

A CEREBRAL COUNTEREXAMPLE? OLD CHINESE 首 'HEAD'

Last night, I wrote that there was

a specific word that made me question my old uvular theory - possibly even before I saw Baxter and Sagart's uvular proposal years ago.

That word is 首 'head' which Baxter and Sagart reconstructed as *l̥uʔ. I regard that form as Middle Old Chinese.

Last week, I wrote,

the [Early] Old Chinese initial of 'head' may be from *Kl- [...] if the word is related to Proto-Austronesian *quluh and/or Proto-Tai *krawC (Pittayaporn 2009: 323) / *kləwC (Li Fang-Kuei 1977). Proto-Hmong-Mien *kləuX 'road' (Ratliff 2010: 264) is a loan from Old Chinese 道 containing 首 as a phonetic. If 道 had an initial stop, perhaps 首 did too.

According to my old uvular theory, a *q- would be sufficient to trigger emphasis. If 首 'head' is from *quluh, its later reflexes should have stop initials and lowered vowels as traces of emphasis:

*quluh > *quluʔ > *qɯluʔ > *qluʔ > *l̥ˁuʔ > *l̥ˁouʔ > *tʰouʔ > *tʰauʔ

(The similarity to 頭 Mandarin [tʰou] and Cantonese [tʰau] 'head' is coincidental; 頭 and 首 are unrelated words.)

But the actual Late Old Chinese form of 首 was *ɕuʔ with a fricative from nonemphatic *l̥- and a high vowel that was never lowered by emphasis.

Maybe 首 has nothing to do with Proto-Austronesian *quluh and originally had a *k- as in the Proto-Tai word and the Proto-Hmong-Mien borrowing of its near-homophone 道 'road'.

Or maybe they are tied together.

Baxter and Sagart do not reconstruct *q- as a preinitial or in presyllables. In the dialect reconstructed by Baxter and Sagart, *q- is not automatically emphatic, and I speculate that nonemphatic preinitial/presyllabic *q- fused with nonemphatic *l into voiceless nonemphatic *l̥.

If Proto-Tai *krawC is a loan from Old Chinese, it could be from a dialect in which the presyllabic vowel lowered after a uvular that then fronted and became emphatic (due to lower vowel-emphatic harmony):

*qul- > *qɯl- > *qʌl- > *kˁʌl- > *kˁl- > *kˁr-

That emphasis conditioned the lowering of *-u to *-aw.

Proto-Hmong-Mien *kləuX 'road' may be from yet another dialect - one which did not shift *kˁl- to *kˁr-:

*qʌluʔ > *kˁʌluʔ > *kˁluʔ > *kˁlouʔ > *kləuʔ

The rhyme *-əuʔ may be an intermediate stage in bending between *-ouʔ and *-auʔ.

The ancestors of modern Chinese words for 'road' lost the presyllable after it conditioned emphasis:

*qʌluʔ > *kˁʌluʔ > *kˁʌlˁuʔ > *lˁuʔ > *douʔ > *dəuʔ (?) > *dauʔ

Northern Min forms such as Jianyang lau have a secondary l- that is an intervocalically lenited *-d- and not a retention of an original lateral:

*kˁʌluʔ > *kˁʌlˁuʔ > *kʌduʔ > *kʌdouʔ > *kʌdəuʔ (?) > *kʌdauʔ > *kʌlauʔ > lau

(11.4.0:24: The chronology of lenition and presyllabic loss relative to vowel changes is unknown.

For convenience, I have retained the low presyllabic vowel throughout the derivation, though it may have merged with the high presyllabic vowel at some point before the presyllable was lost. Once emphasis became phonemic for the initials of core syllables, the height of presyllabic vowels could have been neutralized without any loss of information:

Stage: locus of distinction | 1: presyllabic vowel height | 2: core initial emphasis | 3: core vowel height
Low vowel presyllable | *kʌluʔ | *kəlˁuʔ | *kədouʔ
High vowel presyllable | *kɯluʔ | *kəluʔ | *kəjuʔ

I wrote "vowel" at the top of the column for stage 3 rather than "consonant and vowel" because in many cases the distinctions in stages 1 and 2 leave no traces on the initial: e.g.,

*kʌpuʔ > *kəpˁuʔ > *kəpouʔ

*kɯpuʔ > *kəpuʔ > *kəpuʔ

Emphatic *pˁ- and nonemphatic *p- have merged into nonemphatic *p- in stage 3; only the vowels are distinct.

On the other hand, emphatic *lˁ- and nonemphatic *l- did not merge with each other. The former hardened and merged with emphatic *dˁ- as nonemphatic *d-, whereas the latter weakened to *j-.)

DID BACK CONSONANTS CONDITION EMPHASIS IN OLD CHINESE?

David Boxenhorn asked me if emphasis in Old Chinese (OC) could be conditioned by presyllabic back consonants instead of low vowels: e.g.,

*Q.C- > *Cˁ- rather than *Cʌ.C- > *Cˁ-

My old answer was that both could condition emphasis, though low vowels in presyllables and syllables proper were the primary sources. I thought that

- uvular (and pharyngeal?)-initial syllables

- and perhaps also uvular-initial presyllables

conditioned emphasis (indicated below with *ˁ).

Contrast the development of uvulars with velars in this earlier reconstruction:

Low vowels: emphasis and secondary uvulars

*qa > *qˁa vs. *ka > *qˁa

*qe > *qˁe vs. *ke > *qˁe

*qo > *qˁo vs. *ko > *qˁo

*Cʌ.CV > *CˁV

e.g., *Cʌ.kV > *qˁV

High vowels: emphasis with original uvulars but no emphasis with velars

*qə > *qˁə vs. *kə > *kə

*qi > *qˁi vs. *ki > *ki

*qu > *qˁu vs. *ku > *ku

*Cɯ.CV > *CV

except: *qɯ.CV > *CˁV

However, Baxter and Sagart's OC reconstruction contrasts emphatic and nonemphatic uvulars: e.g., *q vs. *qˁ, etc. Such a distinction is rare in the world's languages; it is currently in Archi and Rutul in the Caucasus, far from China. I would rather not reconstruct an exotic distinction - at least not at an early level - so I prefer to reconstruct primary nonemphatic uvulars and secondary emphatic uvulars:

Original nonemphatic uvulars and velars retained after higher vowels:

*qə > *qə; cf. *kə > *kə

*qi > *qi; cf. *ki > *ki

*qu > *qu; cf. *ku > *ku

*Cɯ.qV > *qV; e.g., *Cɯ.kV > *kV

Secondary emphatic uvulars and velars developed after lower vowels:

*qa > *qˁa; cf. *ka > *kˁa

*qe > *qˁe; cf. *ke > *kˁe

*qo > *qˁo; cf. *ko > *kˁo

*Cʌ.qV > *qˁV; e.g., *Cʌ.kV > *kˁV
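The difference between my earlier reconstruction and the revised one can be put side by side in a short Python sketch. It covers only bare *q- and *k- syllables (no presyllables), and the function names `old_theory` and `revised_theory` are hypothetical labels for illustration.

```python
EMPHASIS = "\u02e4"  # ˁ
LOWER = {"a", "e", "o"}
HIGHER = {"ə", "i", "u"}


def _check(initial: str, vowel: str) -> None:
    if initial not in ("q", "k") or vowel not in LOWER | HIGHER:
        raise ValueError("this sketch only handles *q-/*k- plus one of the six vowels")


def old_theory(initial: str, vowel: str) -> str:
    """Earlier reconstruction: uvulars are always emphatic, and velars
    back to emphatic uvulars before lower vowels."""
    _check(initial, vowel)
    if initial == "q" or vowel in LOWER:
        return "q" + EMPHASIS + vowel
    return initial + vowel


def revised_theory(initial: str, vowel: str) -> str:
    """Revised reconstruction: vowel height alone decides emphasis,
    and the place of articulation stays as it was."""
    _check(initial, vowel)
    return initial + (EMPHASIS if vowel in LOWER else "") + vowel


if __name__ == "__main__":
    for c in ("q", "k"):
        for v in ("a", "u"):
            print(f"*{c}{v}: old *{old_theory(c, v)}, revised *{revised_theory(c, v)}")
```

In the old scheme *ka and *qa merge as *qˁa and *qu is emphatic; in the revised scheme all four combinations stay distinct, and emphasis on a uvular is always secondary.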

Such a complex system eventually broke down when most of the uvulars left gaps to be filled by emphatic velars which backed:

*q(ˁ)- > *ʔ-; new *q- from *kˁ-
*q(ʰˁ)- > *x- (usually*; phonetically [χ] before lower vowels?); new *qʰ- from *kʰˁ-

*ɢˁ- > *ɢ-; new *ɢ- from *gˁ-

*ɢ- > *ʁ- > *ɰ- > *j-; new *ɢ- from *gˁ-

The new system only had one series of uvulars and no phonemic emphasis (though emphasis may have persisted at the phonetic level):

Stage Emphasis Presyllables Uvulars Velars Vowels
Early Old Chinese Nonphonemic Present One series:
One series:
One series without diphthongs:
*a *e *o *ə *i *u
Middle Old Chinese Phonemic Lost to varying degrees in dialects; loss or merger of presyllabic vowels conditioning emphasis made emphasis phonemic Two series:
... vs. *qˁ ...
Two series:
... vs. *kˁ ...
Late Old Chinese Nonphonemic One series mostly from emphatic velars: *q- < *kˁ-
But *ɢ- is from both original uvular *ɢˁ- > *ɢ- and emphatic velar *gˁ-
One series from nonemphatic velars:
Two series with diphthongs:
lower *a *e *o *əɨ *ei *ou
(< *emphatic consonant + *a *e *o *ə *i *u)
higher *ɨa *ie *uo *ɨə *i *u
(< *nonemphatic consonant + *a *e *o *ə *i *u)

(11.3.1:09: The three stages contain roughly the same amount of complexity distributed in different ways. The locus of a binary distinction traveled rightward over time:

Early Old Chinese: low vs. high vowel presyllables

*/Cʌ.pi/ [Cʌ.pi] > [Cˁʌˁ.pˁɪˁ]

*/Cɯ.pi/ [Cɯ.pi]

Middle Old Chinese: emphatic vs. nonemphatic core syllable initials; Baxter and Sagart's reconstruction corresponds to this stage

*/pˁi/ [pˁəˁɪˁ]

*/pi/ [pi]

Late Old Chinese: lower vs. higher vowels

*/pei/ [peɪ]

*/pi/ [pi]

I originally wanted to use back consonants in the examples above, but the uvular-velar shifts would complicate the three-way contrast.)
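The last step, in which emphasis is lost and the vowel carries the distinction, can be sketched as a lookup in Python. This assumes the six input vowels *a *e *o *ə *i *u and the lower and higher Late Old Chinese reflexes listed in the table; it ignores changes to the initials themselves (e.g., the uvular-velar shifts), and the name `late_oc` is hypothetical.

```python
# Late Old Chinese vowel reflexes after emphatic initials (lower series)
LOWER_REFLEX = {"a": "a", "e": "e", "o": "o", "ə": "əɨ", "i": "ei", "u": "ou"}
# ... and after nonemphatic initials (higher series)
HIGHER_REFLEX = {"a": "ɨa", "e": "ie", "o": "uo", "ə": "ɨə", "i": "i", "u": "u"}


def late_oc(initial: str, vowel: str, emphatic: bool) -> str:
    """Map a Middle Old Chinese syllable to its Late Old Chinese shape.

    The emphasis on the initial disappears; its former presence is
    recoverable only from the height of the vowel reflex.
    """
    table = LOWER_REFLEX if emphatic else HIGHER_REFLEX
    if vowel not in table:
        raise ValueError(f"unknown vowel: {vowel!r}")
    return initial + table[vowel]


if __name__ == "__main__":
    print("*" + late_oc("p", "i", emphatic=True))
    print("*" + late_oc("p", "i", emphatic=False))
```

With *p- the two calls give the */pei/ vs. */pi/ pair shown above: same initial, different vowel.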

That's the big picture. Tomorrow I'll look at a specific word that made me question my old uvular theory - possibly even before I saw Baxter and Sagart's uvular proposal years ago.

*11.3.0:45: See Baxter and Sagart (2014: 102-105) for less common reflexes of *q(ʰˁ)-:

- Middle Chinese *ɕ- via *x- before front vowels including secondary fronted *a

- Proto-Min *kʰ-

- Middle Chinese *ʈʰ- from *qʰr- in eastern dialects

For simplicity, I have only listed reflexes of *q(ʰˁ)- without preinitials or presyllables at the Middle Old Chinese level. Such preceding elements resulted in even more reflexes: e.g., Middle Old Chinese *t.qʰ- became Late Old Chinese *tɕʰ- (see Baxter and Sagart 2014: 160; their Middle Chinese initials are almost identical to my Late Old Chinese initials).

WHY DO OPPOSITES ATTRACT? THE MEHRI DEFINITE ARTICLE Ḥ(Ə)-

Last week I mentioned the Mehri definite article a- which only occurred before emphatics and voiced consonants.

Mehri has another definite article ḥ(ə)- which only occurs before glottal stops, f-, and voiced nonemphatics:

feature | ḥ(ə)- | ʔ-, f- | b-, d-, g-, l-, m-, n-, r-, s-, w-, y-
pharyngeal(ized) | + | - | -
voiced | - | - | +

Why does voiceless pharyngeal-initial ḥ(ə)- occur before voiced nonpharyngealized consonants? This distribution is what I call diachronic detritus: the synchronically inexplicable result of an earlier sound change. Rubin (2010: 71) wrote that

Many of the nouns with the definite article ḥ(ə)- have an etymological initial ʾ- [i.e., glottal stop], which is sometimes reflected in the long ā of the definite article ḥā-.

Exceptions have y- from *y- (e.g., yūm 'days') or are due to analogy.

So ḥ(ə)- may have originally only been before *ʔ- which was lost before voiced consonants and *y-. Maybe this was the earlier distribution of articles:

1. *ʔa-

> a- before emphatic and voiced consonants

a- remains in front of ʔ- from voiced *ʕ-

> *ʔ- > zero before voiceless consonants other than *ʔ-

2. *ḥə-

> before *ʔ- (> zero in *ʔ- + voiced consonant clusters) and *y-

Why did *y-words have both *ʔa- and *ḥə-?

Was *ḥə- lost before *ʔ- + emphatic/voiceless consonant clusters?

Why is ḥə- before voiceless nonglottal f-? (I presume Mehri f is from *p- since there is no p in Mehri; was the lenition of *p due to Arabic influence?)

ḥə-f- would not be surprising if that sequence were from *ḥə-ʔw- whose cluster fused into -f-, but such a word would have a nondefinite form with (ʔ)w-, and Rubin did not note any (ʔ)w- ~ ḥə-f- alternations.

That scheme omits the plural definite article hə- ~ ha- which only occurs

- before nouns with the voiceless fricatives s- and ɬ̠- and a CCōC pattern

- in the high-frequency nouns həbɛ̄r 'the camels' and hərbāt 'the companions' (sg. ərbāt)

Are the last two forms relics of a period when h-articles were more common? They have nothing in common with the other h-article forms apart from ending in long vowel + consonant sequences.

DID *I CONDITION *A-FRONTING IN OLD CHINESE?

Two nights ago I reconstructed Old Chinese (OC) 射 with presyllabic *-i- without comment:

*mi-lak (*m-ljak?) 'to hit with an arrow' ~ *mi-lak-s (*m-ljak-s?) 'to shoot with a bow, archer'

I want to explain my reasoning here.

One mystery in OC phonology is why *a behaves in two different ways in syllables of the type *T(s)a(k/j): e.g.,

蛇 with two readings:

Early OC *mlaj 'snake' > Late OC *ʑjæj with a front vowel

Early OC *laj (second syllable of 'compliant') > Late OC *jɨaj with a nonfront vowel

Here are some approaches to the problem:

- Starostin (1989) reconstructed *a and *ia; the latter is equivalent to my *ja above.

- A few years ago, I reconstructed *a and *æ.

- Baxter and Sagart (2014) used the notation *a and *A, stating that "Our *A is not intended as a seventh vowel; it is an explicitly ad hoc notation that basically means 'a case of OC *-a which for as yet unexplained reasons becomes MC -jae instead of MC -jo [i.e., the regular reflex].' "

My fourth approach involves the frontness of *-i- moving into the following syllable:

Early OC *mi-laj > *mljaj > *nɮjaj > *ɮjaj > Late OC *ʑjæj 'snake'

Early OC *laj > *lɨaj > *ɮɨaj > *ʑɨaj > Late OC *jɨaj (second syllable of 'compliant')

There was a chain shift *nɮ- > *ɮ- > *ʑ- > *j-. Prenasalization shielded *nɮ- from weakening to *ʑ- like *ɮ-.
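A chain shift of this kind can be sketched in Python as a single simultaneous step that is applied repeatedly. This is a minimal illustration, assuming that every onset moves exactly one link per round; the names `CHAIN`, `shift_once`, and `shift` are hypothetical.

```python
# Links of the chain, from the most protected onset to the weakest.
CHAIN = ["nɮ", "ɮ", "ʑ", "j"]


def shift_once(onset: str) -> str:
    """Move an onset one link down the chain; other onsets are unchanged.

    Because the lookup uses the onset's current value only, all onsets
    shift simultaneously and never catch up with one another.
    """
    if onset in CHAIN[:-1]:
        return CHAIN[CHAIN.index(onset) + 1]
    return onset


def shift(onset: str, rounds: int) -> str:
    """Apply the chain shift a given number of rounds."""
    for _ in range(rounds):
        onset = shift_once(onset)
    return onset


if __name__ == "__main__":
    for start in ("nɮ", "ɮ"):
        print(" > ".join("*" + shift(start, n) + "-" for n in range(3)))
```

After two rounds the prenasalized onset has only reached *ʑ- while plain *ɮ- has already reached *j-, which is the one-step lag that kept 'snake' and the second syllable of 'compliant' apart.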

Normally *a-breaking is conditioned by a high-vowel presyllable. 'Compliant' may have had only one such syllable, and the remainder of the word harmonized with it:

委蛇 Early OC *Cɯ-q(r)oj-laj > Late OC *ʔuoj-jɨaj

I assume presyllables had short vowels that were reduced versions of the six in main syllables: *i, *ə, *u, *e, *a, *o. The last three merged into a low vowel I write as *ʌ. The first three merged into a high vowel I write as *ɯ except in a few cases where later fronting is a trace of *i. Perhaps at one point there was a triangular three-vowel system in presyllables somewhat like that of Pacoh: *i vs. a nonfront high vowel vs. *ʌ.

Unfortunately, I do not have any evidence for *-i- other than the fronting in later OC. Why was this fronting restricted to such a specific environment (*coronal + *a + *-k or *-j)? Why didn't fronting occur in *-ŋ final syllables which usually develop like *-k syllables? *-ŋ has more in common with *-k than *-j, yet fronting only occurred before *-k and *-j.

WHAT WAS THE RANGE OF OLD CHINESE ROOSTERS?

The Chinese character 酉 currently only represents the word for the tenth Earthly Branch conventionally translated as 'rooster' in English.

The character is a drawing of a wine vessel and is used as a semantic element in characters for words with alcoholic semantics. Moreover the words 酉 for 'tenth Earthly Branch' and 酒 'wine' rhyme. So it seems likely that 酉 was originally devised for a word 'wine vessel' that was cognate to 酒 'wine'. However, are there any texts in which 酉 means 'wine vessel'?

Premodern dictionaries list three other definitions for 酉:

1. 就 (Shuowen, c. 100 AD) which has many possible translations (e.g., 'to go to'); I don't know which one was intended

2. 飽 (Guangyun, 1008 AD) 'satiated'

3. 老 (Guangyun, 1008 AD) 'old'

How old are those definitions? Is there textual support for them?

All these definitions had *u in Early Old Chinese (EOC) like 酉 and 酒. Did 酉 represent five unrelated near-homophones that later greatly diverged in Late Old Chinese (LOC)?

1. 酉 'tenth Earthly Branch': EOC *N-ruʔ (Baxter and Sagart 2014: 372) > LOC *juʔ

2. 酉 'wine vessel', possibly cognate to 酒 EOC *tsuʔ (Baxter and Sagart 2014: 347) > LOC *tsuʔ

3. a cognate of 就 '?': EOC *[dz]u[k]s > LOC *dzuh

4. a cognate of 飽 'satiated': EOC *pʌ-ruʔ > *prˁuʔ > LOC *pɔuʔ > *pæuʔ

5. a cognate of 老 'old': EOC *Cʌ-ruʔ > *rˁuʔ > LOC *louʔ > *lauʔ

Why was 酒 EOC *tsuʔ 'wine' written with a phonetic 酉 EOC *N-ruʔ 'wine vessel'? Was a common rhyme and similar semantics sufficient to justify the choice of a phonetic with completely different initials? And what is Baxter and Sagart's justification for reconstructing a nasal prefix in 酉 EOC *N-ruʔ?

But what if my OC palatal hypothesis is correct, and 酉 'wine vessel' and 酒 were cognates with similar initials? (Also see these two follow-up posts.)

酉 'wine vessel': EOC *N-cuʔ > *ɟuʔ  > LOC *juʔ

酉 'tenth Earthly Branch' might have had *N-c- or original *ɟ-

酒 EOC *cuʔ > LOC *tsuʔ

According to the palatal hypothesis, 就 might have been EOC *N-Cu(k)-s (*C = an unknown palatal) which would have been a good phonetic match for 酉 *ɟuʔ.

However, there is no reason to believe that 飽 and 老 ever had any palatals. Nor is there any reason to reconstruct palatal prefixes. So why would 'satiated' and 'old' be written with a palatal phonetic 酉? Did the use of 酉 for *r-words reflect a dialect or dialects in which *ɟ (already lenited to *j?) and *r had converged (or even merged?): e.g., to *ʑ and *ʐ? (Starostin reconstructed Eastern Han *ʑ- as the source of Postclassic and Middle Chinese *j-.) It would not be unreasonable to write forms like 飽 *pʐˁuʔ 'satiated' and 老 *ʐˁuʔ 'old' as 酉 *ʑuʔ. Such forms would not be in early texts in which *ɟ and *r were distinct. If 酉 represented 'satiated' and 'old' in early texts, I would have to abandon this explanation.


While researching this post, I rediscovered a 2008 post in which I compared the Common Tai ʔj- : Jiamao Hlai tsh- correspondence to the *j-*ts- alternations in Middle Chinese that I derive from Old Chinese palatal *ɟ-*c-. Last year I proposed an implosive palatal stop *ʄ- as a source of later Tai  ʔj-. Norquest (2007: 13) regarded Jiamao Hlai as a "non-Hlai language which has been in close contact with Hlai", so Jiamao Hlai tsh- may reflect a Hlai obstruent at the time of borrowing. Could that early Hlai obstruent have been *ʄ-? Can *ʄ- be reconstructed at the Proto-Kra-Dai level?

According to Norquest (2007: 338), Jiamao tsh- is in borrowings that had Proto-Hlai *tɕh-; even earlier Jiamao borrowings of words with that initial phoneme have ts- from pre-Hlai *c-. Is pre-Hlai *c- a reflex of a Proto-Kra-Dai *ʄ-? Unfortunately, I can't find the only example of a Common Tai ʔj- : Jiamao Hlai tsh- correspondence that I have (CT ʔjuu B : J tshu 'to be') from Shintani (1991: 2) in Norquest (2007).

Norquest (2007: 348) reconstructed pre-Hlai *lj- as a source of Jiamao unaspirated ts- in loanwords. That reminds me of how I used to reconstruct 酉 as EOC *luʔ though its phonetic series had *ts- in Middle Chinese (e.g., 酒 Middle Chinese *tsuʔ 'wine'). The difference is that pre-Hlai *lj- was borrowed into pre-Jiamao as *lj- which hardened to *dʑ- and devoiced to ts-, whereas there is no reason to believe that the affricate of 酒 Middle Chinese *tsuʔ is the product of hardening and devoicing; those features must be projected back into the Old Chinese reading of 酒 though they conflict with all evidence pointing toward a voiced initial for its phonetic 酉.

WHY WAS OTTOMAN TURKISH ض <Ḍ> PRONOUNCED IN TWO DIFFERENT WAYS?

To answer that question, I looked at three old textbooks:

Hagopian (1907: 9): "It is generally pronounced as a hard [= emphatic] z, but sometimes as a hard d."

Barker (1854: 2): "d hard, and sometimes z"

Vaughan (1709: xxvi, 2): ’z without any reference to a stop (Oddly, ظ <ẓ> was romanized as an affricate ’dz!)

Was there a new layer of Arabic loans after 1709 in which Arabic <ḍ> was borrowed as d instead of z? I can't imagine why that would be the case, so my guess is that Vaughan left out the d-variant, and that the variation reflects two or more strata of borrowing from Arabic before 1709.

Looking at Embarki (2013: 25-26) to go back a millennium in Arabic itself, I see that Al-Khalīl (d. 786) grouped ض <ḍ> with ش <sh> and ج <j> as 'arched' (شجريه shajriyya) in Kitāb al-`ayn. That implies <ḍ> sounded something like sh and j. Could it have been a lateral fricative [ɮˤ] or affricate [dɮˤ]?

Sībawayh rejected the pronunciation of ض <ḍ> and ظ <ẓ> as [θ] as "bad" in Al-Kitāb (793). This theta is reminiscent of Proto-Semitic *θˁ which is the source of Arabic /ðˁ/ ~ /zˁ/ written as ظ <ẓ>. I assume <ẓ> *ðˤ and (a continuant pronunciation of?) <ḍ> first merged into *ðˤ, devoiced to become [θˤ], and merged with [θ].

Retsö (2013: 435, 439-440) collected data pointing toward an earlier lateral pronunciation of the consonant written as <ḍ>:

- "Traces of such an articulation are found in some modern dialects in the southern peninsula"

- The name of the pre-Islamic Arabic god Ruḍā appears in 7th c. BC cuneiform as <ru.ul.da.a.ú> with <...l.d...>

- Arabic al-qaḍī was borrowed into Spanish as alcalde with -ld-

- "In the Modern South Arabian languages we find a laterialized and glottalized apico-alveolar consonant that etymologically corresponds to Arabic /ḍ/": e.g., Mehri ź /ɬ̠ʼ/ (I presume).

Yet he concluded "there is no real evidence that the present-day [stop] realization of the ḍād is secondary". I do not understand why.

Ehret (1995: 481) reconstructed the Proto-Semitic source of this consonant as a stop *dˁ as well as a lateral ultimately going back to a Proto-Afroasiatic lateral *dl which was "probably" an affricate (i.e., [dɮ]?). I stated my objections to a stop interpretation here. Would Retsö regard Arabic /ḍ/ as a retention of a Proto-Semitic stop *dˁ?

All that reminds me of the Tangut initial that Tai Chung-pui (2008: 201) reconstructed as ld-. Tibetan transcriptions for that initial include zl- and even a single instance of c-. I suspect it was a lateral affricate [dɮ] like Proto-Afroasiatic *dl. I have yet to look into the origin of ld-.

One ld-word with external cognates is

5710 1ldiq3 'arrow' (transcribed in Tibetan as ldi(H), zliH, d-ya; see Tai Chung-pui 2008: 198)

which Guillaume Jacques (2014: 161) derived from pre-Tangut *S-lje and cognate to Japhug zdi < *l- 'arrow'. I used to think 5710 was cognate to Old Chinese 矢 *hliʔ < *sl-? 'arrow'. Now I think it might be cognate to Tibetan mda < *mla 'arrow' (via Bodman's law) and Old Chinese 射 *mi-lak (*m-ljak?) 'to hit with an arrow' ~ *mi-lak-s (*m-ljak-s?) 'to shoot with a bow, archer', as *-a often rose to -i in Tangut.

Could 矢 and 射 be the zero and *a-grades of a root *lj-K? But there is no external support for a *-j- in the root. Unlike Guillaume, I don't think there was a medial *-j- in the Tangut word for 'arrow'. No other cognates contain -j-.

Was there a consonant between *S- (which conditioned Tangut vowel tension that I write as -q) and *-l- in pre-Tangut that fused with *-l- to become an affricate [dɮ]?

Even if I believed in reconstructing *-j- in 'arrow', I could not derive [dɮ] from *-lj- before *S- as Guillaume's *S-ljo 'head' became

0124 2luq3 (transcribed in Tibetan as lu; see Tai Chung-pui 2008: 220)

with l-, not ld-.

I also wouldn't reconstruct *-j- in 'head'; I think the pre-Tangut form was *S-luH which is close to Old Chinese 首 *hluʔ < *sl-? 'head'.

However, the Old Chinese initial of 'head' may be from *Kl- rather than *sl- if the word is related to Proto-Austronesian *quluh and/or Proto-Tai *krawC (Pittayaporn 2009: 323) / *kləwC (Li Fang-Kuei 1977). Proto-Hmong-Mien *kləuX 'road' (Ratliff 2010: 264) is a loan from Old Chinese 道 containing 首 as a phonetic. If 道 had an initial stop, perhaps 首 did too.

WHAT IS THE ORIGIN OF THE VOICING CONTRAST IN ARABIC EMPHATICS?

Last night, I wrote (in haste as always),

In Afroasiatic (including Mehri), there is a three-way contrast between emphatics and voiceless and voiced nonemphatics

But as I've long known, that is certainly not accurate for Modern Standard Arabic which has a four-way contrast in its alveolar stops: /tˁ t dˁ d/. And some speakers also have a four-way contrast in their alveolar sibilants: /sˁ s zˁ z/. (Other speakers have /ðˁ/ instead of /zˁ/, but no one has a voiceless emphatic dental fricative /θˁ/.) However, there is no four-way contrast for nonalveolars.

According to Islam Youssef (2006: 13, 16), Cairene Arabic has voiced and voiceless emphatics as well as voiced and voiceless nonemphatics throughout its inventory of allophones, but has only five emphatic consonant phonemes which are all coronals: /dˤ tˤ sˤ zˤ rˤ/. The distinction between /r/ and /rˤ/ (Youssef 2006: 25) is absent from Modern Standard Arabic.

To be on the safe side, I've inserted "usually" in my statement since a voicing contrast in Afroasiatic emphatics is unusual in my extremely limited experience. If Arabic is not alone, I'd like to know.

In any case, Ehret (1995: 481-482) did not reconstruct a voicing contrast in Proto-Afroasiatic (PAA) emphatics. Here are his sources for the voiced and voiceless emphatic pairs in Arabic:

PAA *tʼ > Proto-Semitic *tˁ > A /tˁ/

PAA *dl  > Proto-Semitic ~*dˁ > A /dˁ/

Omotic has ejective reflexes. I wonder if Ehret did not reconstruct this consonant as an emphatic at the PAA level because he wanted to avoid having two lateral emphatics. His PAA reconstruction has either zero or one emphatic per consonant class.

I think the Arabic stop might be an innovation since other Semitic languages in the comparative table at David Boxenhorn's blog have fricatives. I would rather not have Proto-Semitic *dˁ lenite independently multiple times while remaining intact in Arabic.

PAA *tlʼ  > Proto-Semitic *sʼ ~ *sˁ > A /sˁ/

PAA *sʼ lost its emphasis and merged with *s in pre-Proto-Semitic and hence also in Arabic.

PAA *čʼ  > Proto-Semitic *θˁ (or *tʲʼ?) > A /ðˁ/ ~ /zˁ/

Ehret cited Omotic languages which have čʼ today.

Did A /ðˁ/ become /zˁ/ to be the emphatic counterpart of /z/ so that both voiced emphatics were alveolars?

10.29.14:41: This shift also reduced the markedness of the segment since alveolar fricatives are more common than dental fricatives.

The last three sound changes have no parallels in Old Chinese unless *lˤ- became *ɮ- before hardening to *d-. I have yet to see any Afroasiatic-Sinitic parallels in emphatic evolution despite the existence of emphatic-conditioned Semitic-Sinitic vocalic parallels. Voiceless obstruents never became voiced in Old Chinese (though the opposite often occurred).

HOW IS MEHRI LIKE PROTO-INDO-EUROPEAN (AND UNLIKE OLD CHINESE)?

I have long been bothered by the glottalic theory because I didn't know of any example of a language whose ejectives had become voiced stops. Then last night I discovered Mehri in which ejective (= Wikipedia's 'emphatic') fricatives have voiced allophones - and voiced consonants have ejective allophones (emphasis mine)!

Voiced obstruents, or at least voiced stops, devoice in pausa. In this position, both the voiced and emphatic stops are ejective, losing the three-way contrast (/kʼ/ is ejective in all positions). Elsewhere, the emphatic and (optionally) the voiced stops are pharyngealized. Emphatic (but not voiced) fricatives have a similar pattern, and in non-pre-pausal position they are partially voiced.

Rubin (2010: 14) wrote (emphasis mine),

As Johnstone also notes, it is not completely clear how the glottalic [= Wikipedia's 'emphatic'] consonants fit into the categories of voiced and voiceless. Johnstone (AAL [Afroasiatic Linguistics], p. 7) wrote that they are "perhaps best defined as partially voiced." What is certain is that the glottalic consonants pair with voiced consonants when it comes to certain morphological features, for example the appearance of the definite article (§ 4.4) and the prefix of the D/L-Stem (§6.2). 

Let's look at Rubin's coverage of those two features:

The definite article a- is found before the consonants b, d, ð, ð̣, g, ġ, j, ḳ, l, m, n, r, ṣ, ṣ̌, ṭ, w, y, z, and ź (voiced and glottalic consonants), though not all nouns beginning with those consonants take the article a-.


The definite article is also used with nouns beginning with ʾ, though only when the ʾ derives from etymological ʿ.


The definite article a- usually does not occur (or, one could say it has the shape Ø) before the consonants f, h, ḥ, k, s, ś, t, t, and x (voiceless, non-glottalic consonants). (p. 69)

The prefix a- appears only in [D/L-Stem verbs] when the initial root letter is voiced or glottalic, similar (but not identical) to the distribution of the definite article (see § 4.4). (pp. 93-94) 
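Rubin's two lists amount to a lookup on the first consonant of the noun, which can be sketched in a few lines of Python. This is only a rough illustration: `default_article` and `initial_consonant` are hypothetical helper names, the lists are copied from the quotation above, and the result is at best a default, since not every noun with a voiced or glottalic initial takes a- and the zero article is only the usual case before the voiceless consonants.

```python
import unicodedata

# Initials copied from the quotation of Rubin (2010: 69) above.
A_INITIALS = ["b", "d", "ð", "ð̣", "g", "ġ", "j", "ḳ", "l", "m", "n", "r",
              "ṣ", "ṣ̌", "ṭ", "w", "y", "z", "ź"]
ZERO_INITIALS = ["f", "h", "ḥ", "k", "s", "ś", "t", "x"]


def _nfc(text):
    return unicodedata.normalize("NFC", text)


A_SET = {_nfc(c) for c in A_INITIALS}
ZERO_SET = {_nfc(c) for c in ZERO_INITIALS}


def initial_consonant(word):
    """Return the first letter of `word` together with its diacritics."""
    decomposed = unicodedata.normalize("NFD", word)
    end = 1
    # Keep any combining marks (dot below, caron, acute ...) with the base letter.
    while end < len(decomposed) and unicodedata.combining(decomposed[end]):
        end += 1
    return _nfc(decomposed[:end])


def default_article(noun):
    """Return 'a-', 'Ø', or None if the initial is in neither list."""
    first = initial_consonant(noun)
    if first in A_SET:
        return "a-"
    if first in ZERO_SET:
        return "Ø"
    return None


# Bare initials are used as input here rather than invented Mehri nouns.
for c in ["ṭ", "t", "ṣ̌", "s", "ḥ"]:
    print(c, default_article(c))
```

Splitting off the initial together with its diacritics keeps s, ṣ, ś, and ṣ̌ apart no matter how the input happens to be encoded, which matters because the plain and dotted letters fall on opposite sides of the rule.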

Conversely, Rubin paraphrased Johnstone's observation about nonemphatics on p. 6 of Afroasiatic Linguistics:

Aspiration of most of the voiceless non-glottalic [= nonemphatic] consonants constitutes an important element in the distinction of glottalic/non-glottalic pairs. (p. 14)

Although emphatics conditioned similar vowel changes in Mehri and Old Chinese (OC), the resemblance between the two languages ends there:

- In Afroasiatic (including Mehri), there is usually a three-way contrast between emphatics and voiceless and voiced nonemphatics, but in OC, there was a six-way contrast defined by voicing, aspiration, and emphasis: e.g.,

Mehri: /tˁ t d/ : cf. Proto-Indo-European (and Archi [Northeast Caucasian] and Ubykh [Northwest Caucasian]) */tʼ t d/

OC: *tˁ *tʰˁ *dˁ *t *tʰ *d

- There is no pairing between emphatics and voiced nonemphatics in Old Chinese.

Cf. how ejectives and voiced nonejectives did merge in many Indo-European varieties.

- Aspiration plays no role in distinguishing between emphatics and voiceless nonemphatics in Old Chinese.

But it may have distinguished between ejectives and nonejectives in Proto-Indo-European: e.g., */tʼ t d/ could have been *[tʼ tʰ dʱ].

These differences between Mehri and OC do not necessarily invalidate what I could call the 'Chinese Glottalic Theory'. But they do imply that there are limits to using Mehri (or any similar language) for predicting phonetic phenomena in OC.

I wish that OC did not have unique typology. Maybe it didn't. I would not be surprised if emphasis played a role in the development of Tangut vowels.

HOW IS MEHRI LIKE OLD CHINESE?

I have known for some time about the existence of the Modern South Arabian languages (not to be confused with Old South Arabian). However, I had never read anything about them. I couldn't even name one until tonight when I discovered Rubin's The Mehri Language of Oman (2010). I immediately went to the section on vowels and found synchronic shifts reminiscent of diachronic shifts in Old Chinese:

Mehri shift Glottalic Gutturals Liquids Cf. Old Chinese
/iː/ > [aj] *Cˁi > *Cej (> southern *Caj)
/uː/ > [aw] *Cˁu > *Cow > *Caw
/eː/ > [aa] X no parallel

I have converted Rubin's notation into IPA as used in the Wikipedia article on Mehri.

Lowering in Mehri is conditioned by three classes of consonants:

Glottalic: /tʼ θʼ ɬ̠ʼ sʼ kʼ/ (no known cases of /ʃʼ/ followed by /iː uː eː/)

Guttural: /χ ʁ ħ ʕ ʔ h/

Liquid: /r l/ if "there is normally a glottalic or guttural consonant elsewhere in the root" (Rubin 2010: 29; i.e., glottalic and guttural consonants' lowering effects can spread 'through' them: e.g., [məʁrajb] 'well-known' which I presume is phonemically /məʁriːb/)

10.27.11:52: This phenomenon is reminiscent of the 'transparency' of Thai sonorants (not just liquids) in tonal development: e.g., อร่อย ʔàrɔ̀ɔy 'delicious' should in theory have an r-conditioned falling tone on its second syllable, but it has a glottal stop-conditioned low tone as if the -r- didn't exist. However, in Thai, the conditioning consonant has to precede the 'transparent' sonorant, whereas in Mehri, the consonants conditioning lowering can be in the order liquid ... glottalic/guttural as well as the reverse: e.g., [məlawtəʁ] 'killed' (masc. pl.) which I presume is phonemically /məluːtəʁ/.
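The Mehri conditioning can be stated as a small procedure, sketched here in Python. Several things in it are assumptions made for the sketch rather than claims from Rubin: the conditioning consonant is taken to stand immediately before the long vowel (which fits both examples above), the whole word rather than the root is searched for a glottalic or guttural, only /iː/ and /uː/ are handled, and words are passed in as ready-made lists of phonemes. The function name `lower` is made up.

```python
# Consonant classes as listed above (IPA values from the Wikipedia article).
GLOTTALIC = {"tʼ", "θʼ", "ɬ̠ʼ", "sʼ", "kʼ"}
GUTTURAL = {"χ", "ʁ", "ħ", "ʕ", "ʔ", "h"}
LIQUID = {"r", "l"}
LOWERED = {"iː": "aj", "uː": "aw"}


def lower(segments):
    """Apply lowering to a word given as a list of phonemes; return a string."""
    # A liquid only triggers lowering if a glottalic or guttural occurs
    # somewhere else in the word (assumed here to stand in for "root").
    has_trigger = any(s in GLOTTALIC or s in GUTTURAL for s in segments)
    out = []
    for i, seg in enumerate(segments):
        prev = segments[i - 1] if i > 0 else None
        direct = prev in GLOTTALIC or prev in GUTTURAL
        through_liquid = prev in LIQUID and has_trigger
        if seg in LOWERED and (direct or through_liquid):
            out.append(LOWERED[seg])
        else:
            out.append(seg)
    return "".join(out)


print(lower(["m", "ə", "ʁ", "r", "iː", "b"]))
print(lower(["m", "ə", "l", "uː", "t", "ə", "ʁ"]))
```

Run on the two presumed phonemic forms, the sketch gives məʁrajb and məlawtəʁ, matching the surface forms cited above; the second case only works because the search for a guttural looks both backward and forward from the liquid.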

Wikipedia calls the glottalic consonants 'emphatic' and lists pharyngealized allophones for all but /kʼ/. Thus they may be comparable to the Old Chinese pharyngealized 'emphatic' consonants that conditioned the lowering of high vowels. However, Mehri has a limited set of mostly coronal emphatics, whereas Old Chinese had an emphatic counterpart of every single nonemphatic consonant and even *ʔʷˁ which had no nonemphatic counterpart (Baxter and Sagart 2014: 69).

Early Old Chinese as reconstructed by Baxter and Sagart did not have any uvular or pharyngeal fricatives. I have hypothesized that Old Chinese had pharyngeal fricatives, but there is no strong evidence for them. I do, however, believe Late Old Chinese developed uvular fricatives accompanied by lowered vowels.

The Old Chinese counterparts of Mehri glottals and liquids did not condition lowering. I would not have expected Mehri /ʔ h/ to condition lowering since I have never heard of nonpharyngealized glottals having that effect in any other language.

HOW DID CAMISIA BECOME KAMĪZ?

I was surprised that Latin camisia 'shirt' was borrowed into Arabic as قميص qamīṣ with three features I wouldn't expect:

1. uvular q for [k]

2. a long ī for short [i] (to match an existing a ... ī vowel template? lengthening in some intermediary language?)

3. an emphatic for nonemphatic [s]

Why not borrow the word as *kamis? And why does Urdu have -z in qamīz? I assume kamīz and kamīj show different degrees of Indicization (i.e., avoidance of un-Indic q and z).

10.26.0:26: While I'm on this topic, a kamīz is half of a shalvār kamīz outfit. Why was Persian شلوار‎‎ shalvār borrowed into Arabic as سروال‎ sirwāl instead of شلوار‎‎ *shalwār? Was i ... ā an existing vowel template? Why not preserve the consonants instead of backing s and reversing l and r?

10.26.3:26: Is kamīz a direct borrowing from Portuguese camisa [kɐmizɐ]? Is qamīz a compromise between qamīṣ and kamīz?

WHY RECONSTRUCT BOTH *-J- AND *-I̯- IN PROTO-HMONG-MIEN?

After a two-day detour into Korean ... back to my favorite topic!

I've written a series of posts starting here about how I've been troubled by Ratliff's (2010) Proto-Hmong-Mien (PHM) distinction between *j *ʷ in onsets and i̯ u̯ at the beginning of rhymes. Why not simply abandon the distinction, reduce the four to two (*j *w), and rewrite *-ji̯- as *-j-? The short answer is because the distinctions indicated by her notation are real, even if that notation may not be phonetically precise. Let's look at *tshji̯əŋ 'new' (last seen here) again and break it up into an initial and a rhyme.

I have no doubt that PHM had a distinction between the initials that Ratliff reconstructed as *tsh- (3.2) and *tshj- (3.17) which have different reflexes. Initial 3.17 clearly has a palatal quality absent from initial 3.2. Ratliff already reconstructed an aspirated palatal stop *ch- (4.2).

PHM Hmongic Mienic
Yanghao Jiwei White Hmong Zongdi Fuyuan Jiongnai Pa-Hng Luoxiang Mien Mun Biao Min Zao Min
3.2. *tsh- sh- s- txh- [tsh] s- tsh- θ- ɕ- θ- tθ- s- h-
3.17. *tshj- ɕh- ɕ- tsh- [tʂh] ɕ- s- s-
4.2. *ch- tɕh- tɕh- ch- [ch] tɕ- tɕh- t- tɕh- s-/ȶ- ȶh- ts-/f-

(I have excluded the initials of 'thousand' from 3.2 since some forms look like they might be post-PHM borrowings from Chinese. I excluded the initials of 'new' from 3.17 [even though I chose 3.17 because 'new' was reconstructed with it!] for reasons I will discuss next time.)

Those PHM initials remained unchanged in Ratliff's reconstructions of Proto-Hmongic and Proto-Mienic.

If 3.17 were something other than *tshj-, it might have been *tʃh- or *tʂh- (as in White Hmong tsh [tʂh]) which are absent from Ratliff's reconstruction. A retroflex value may fit Pulleyblank's hypothesis of *ks-retroflexion.

I am hesitant to reconstruct a palatal affricate *tɕh- since I know of no language contrasting palatal affricates with stops.

Similarly, rhyme 18g (*-i̯əŋ) is clearly more palatal than rhyme 21d (*-əŋ):

PHM Hmongic Mienic
Yanghao Jiwei White Hmong Zongdi Fuyuan Jiongnai Pa-Hng Luoxiang Mien Mun Biao Min Zao Min
18g. *-i̯əŋ -i -ia -æin -en -eŋ -ĩ, -e -aŋ -(j)aŋ -aŋ/-in -jaŋ/-ɛŋ/-ɔŋ
21d. *-əŋ -aŋ -ɑŋ -o [ɔ] -oŋ ? -æ̃ ? -ɔŋ -ɔŋ

It does not help that Ratliff (2010: 161) gave only a single example of 21d, though it is a basic word (*hməŋH 'night'). I am surprised that *-əŋ is rarer than *-i̯əŋ; that may imply that 21d was more marked than 18g rather than the other way around. Or is the rarity of 21d simply random?

10.25.16:57: Are there reflexes of 18g and 21d with schwa? None of the sample of eleven languages have schwa. Of course there is no need for a proto-phoneme to be preserved intact in at least one descendant language, and a mid central vowel could easily lower to a or back to o.

Ratliff reconstructed the Proto-Hmongic reflex of 18g as *-in. Other PHM rhymes that merged with 18g in Proto-Hmongic are 18b *-im, 18c *-in, 18d *-iŋ, 18e *-i̯əm, and 18f *-i̯ən. I wonder if Proto-Hmongic *-in was phonetically something like *[iən] (cf. Southern American English diphthongization). White Hmong -ia more or less retained the original diphthong but lost the *-n, whereas Zongdi reversed the diphthong but kept the nasal: *-iən > -æin. Yanghao lost the schwa while Jiwei, Fuyuan, and Jiongnai fused *iə into a mid front vowel. Pa-Hng generally lost the schwa like Yanghao, but its word le³ for 'plum' has a front vowel. (I expected Jiwei, Fuyuan, and Jiongnai to also have front-vowel words for 'plum', but Pa-Hng is the only Hmongic language with 'plum' in Ratliff's language sample.)

WHY DOES KOREAN'S 'LARGE' CONSONANT HAVE A LOW FREQUENCY?

I love phonetic statistics. The first data set I ever saw was in Whitney's Sanskrit Grammar. The symmetry of Sanskrit phonology contrasted with the skewed distributions of phonemes in actual usage: e.g., this 5 x 5 grid of stops and nasals in which t is 665 times more common than jh!

Velars k 1.99 kh 0.13 g 0.82 gh 0.15 ṅ 0.35
Palatals c 1.26 ch 0.17 j 0.94 jh 0.01 ñ 0.22
Retroflexes ṭ 0.26 ṭh 0.06 ḍ 0.21 ḍh 0.03 ṇ 1.03
Dentals t 6.65 th 0.58 d 2.85 dh 0.83 n 4.81
Labials p 2.46 ph 0.03 b 0.46 bh 1.27 m 4.34

The numbers are tiny and are hard to read. I apologize for any transcription errors. They

give the average percentage of frequency of each sound, found by counting the number of times in which it occurred in an aggregate of 10,000 sounds of continuous text, in ten different passages, of 1,000 sounds each, selected from different epochs of the literature: namely, two from the Rig-Veda, one from the Atharva-Veda, two from different Brāhmaṇas, and one each from the Manu, Bhagavad-Gītā, Çakuntalā, Hitopadeça, and Vāsa-vadattā (J.A.O.S., vol. X. p. c1). (p. 26)
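The skew in the grid is easy to check mechanically. The following Python sketch uses the percentages exactly as I transcribed them above (so any transcription errors carry over); the dictionary `FREQ` and the helper `ratio` are names made up for the illustration.

```python
# Percentages of occurrence for the 25 stops and nasals, as transcribed above.
FREQ = {
    "k": 1.99, "kh": 0.13, "g": 0.82, "gh": 0.15, "ṅ": 0.35,
    "c": 1.26, "ch": 0.17, "j": 0.94, "jh": 0.01, "ñ": 0.22,
    "ṭ": 0.26, "ṭh": 0.06, "ḍ": 0.21, "ḍh": 0.03, "ṇ": 1.03,
    "t": 6.65, "th": 0.58, "d": 2.85, "dh": 0.83, "n": 4.81,
    "p": 2.46, "ph": 0.03, "b": 0.46, "bh": 1.27, "m": 4.34,
}


def ratio(a, b):
    """How many times more frequent sound `a` is than sound `b`."""
    return FREQ[a] / FREQ[b]


most = max(FREQ, key=FREQ.get)   # the most frequent sound in the grid
least = min(FREQ, key=FREQ.get)  # the least frequent sound in the grid
print(most, least, round(ratio(most, least)))  # t jh 665
```

The same `ratio` helper can be pointed at any other pair in the grid, such as an unaspirated stop and its aspirated counterpart.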

Last night I found Pak Chae-yŏn's 고어사전 Koŏ sajŏn (Dictionary of Old Words) which listed the number of entries beginning with each premodern Korean consonant. I have rearranged his table to facilitate discussion:

The aspirates are much less common than their unaspirated counterparts, and aspirated kh-, the initial consonant of khɯ- 'large', is the least common - even though its unaspirated counterpart k is the second most common! Why?

Glottals h- 690 Ø- (including y-, w- which are also written with <Ø>) 1234
Velars k- 4660 kh- 70
Sibilants ts- 1212 tsh- 544 s- 1715
Dentals t- 992 th- 221 n- 781 r- 112
Labials p- 1372 ph- 276 m- 881

I think those figures support the hypothesis of secondary aspiration in Middle Korean. If aspiration is from a preceding *h- or *k-, and if *kk- simplified to k-, then kh- had only one source whereas the other aspirates could have at least two - or perhaps even up to five in the case of tsh- which is twice as common as ph- and th-:

Premodern Korean initial consonant Ratio of frequency relative to frequency of kh- Number of sources *k-source(s) *h-source(s) *s-source
tsh- 7.77 5 *kts-, *ks- *hts-, *hs- *sts-
ph- 3.94 2 *kp- *hp-
th- 3.16 2 *kt- *ht-
kh- 1 1 none (*kk- > k-) *hk-
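The ratios in that table come straight from the entry counts, as the following Python sketch shows. The counts are the ones from Pak's dictionary given above; the names `ASPIRATED`, `PLAIN`, `relative_to_kh`, and `share_of_plain` are made up for the example, and the second measure (each aspirate's count divided by the count of its unaspirated counterpart) is an extra comparison added here, not a figure from the table.

```python
# Entry counts per initial consonant, from the table of premodern Korean initials above.
ASPIRATED = {"tsh": 544, "ph": 276, "th": 221, "kh": 70}
PLAIN = {"tsh": 1212, "ph": 1372, "th": 992, "kh": 4660}  # ts-, p-, t-, k-


def relative_to_kh(initial):
    """Frequency of an aspirate relative to the frequency of kh-."""
    return ASPIRATED[initial] / ASPIRATED["kh"]


def share_of_plain(initial):
    """Entries for an aspirate divided by entries for its unaspirated counterpart."""
    return ASPIRATED[initial] / PLAIN[initial]


for initial in ASPIRATED:
    print(f"{initial}-  {relative_to_kh(initial):.2f}  {share_of_plain(initial):.3f}")
```

On either measure kh- comes out last, which is what one would expect if it had fewer sources than the other aspirates.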

I propose *Cs-clusters as additional sources of tsh-, just as I proposed *ks- as a source of *tsh- in Old Chinese and in Proto-Hmong-Mien.

WHY DO KOREAN 'SEVEN' THROUGH 'NINE' END IN THE SAME CONSONANT? THE SEVEN-*UP HYPOTHESES

Last night I was thinking about Korean numerals, and it occurred to me that if ire 'seven days' was from ilgop 'seven' minus -op plus -e, then the root for 'seven' was il.

Similarly, the root for yŏdŏl (spelled 여덟 <yŏtŏlp>) 'eight', yŏdŭr-, can be extracted from yŏdŭre 'eight days'. (l and r are allophones of a single liquid phoneme.)

And the root for ahop 'nine', ah-, can be extracted from ahŭre 'nine days', whose nonharmonic -ŭre- might be by analogy with yŏdŭre.

Did early Korean have a suffix *-Up (whose vowel depended on vowel harmony) for 'seven' through 'nine'?

*nirk-up > Middle Korean nirkup > modern Korean ilgop (the shift of -u- to -o- is irregular)

*yətɯr-up > Middle Korean yətɯrp > modern Korean yŏdŏl (the final -p remains in some dialects: e.g., yadap, yədap, yədəp (see Chhoe 1978: 1047-1048 for details on locations) and yɐdɐp as well as yɐdɐl throughout Cheju)

*ah-op > Middle Korean ahop > modern Korean ahop (*o harmonized with the low vowel *a)

That seven-*up hypothesis cannot be correct because the gamma of Middle Korean nirɣwəy 'seven days' is a trace of a root-final *-p:

*nirkup-əy > *nirgubəy > *nirɣuβəy > *nirɣβəy > Middle Korean nirɣwəy > modern Korean ire

I forgot that I had more or less proposed that shift last year. (In that earlier version, I wrote *-k- instead of *-g-. I don't know why I did that. I assume the lenition of *k after *r is a case of irregular compression. If such lenition were regular, *nirkup should have become Middle Korean *nirɣup.)
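For anyone who wants to replay the derivation of 'seven days' step by step, here is a minimal Python sketch. The rules are written only to reproduce this one word: they are plain string replacements with labels of my own, not general Korean sound laws (the lenition of *k after *r in particular is irregular, as noted above), and `STEPS` and `derive` are hypothetical names.

```python
# Ordered rewrite steps for *nirkup-əy > Middle Korean nirɣwəy.
# Each step is a label plus the replacements applied at that stage.
STEPS = [
    ("voicing of *k and *p", [("k", "g"), ("p", "b")]),
    ("lenition of voiced stops", [("g", "ɣ"), ("b", "β")]),
    ("loss of *u between *ɣ and *β", [("ɣuβ", "ɣβ")]),
    ("*β > w", [("β", "w")]),
]


def derive(form):
    """Return every stage of the derivation, starting with the input form."""
    stages = [form]
    for _label, rewrites in STEPS:
        for old, new in rewrites:
            form = form.replace(old, new)
        stages.append(form)
    return stages


# The morpheme boundary of *nirkup-əy is dropped before the rules apply.
print(" > ".join(derive("nirkupəy")))
```

Feeding the bare numeral *nirkup through the same steps shows why the lenition has to be irregular: it would wrongly turn the numeral into a form with ɣ.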

In that earlier post - or this even earlier post about counting days with an erroneous explanation of the words for 'seven days', 'eight days', and 'nine days' - I did not address why 'seven' through 'nine' all end in -p. I didn't even notice that until last night, almost thirty years after I first learned to count in Korean. I think it's because the -p of 'eight' is silent in modern standard Korean. I did, however, notice the -t of the four previous numerals right away: set 'three', net 'four', tasŏt 'five', and yŏsŏt 'six'.

Here's a different seven-*up hypothesis - one in which 'seven' is not a root 'seven' followed by a suffix *-up. What if root-final *-up spread by analogy to the following two numerals which were originally *yətɯr and *ah (cf. Middle Korean ah-ʌn 'ninety')?

*yətɯr > *yətɯr-up > *yətɯr-ɯp > yətɯrp

*ah > *ah-up > *ah-op (with suffix harmony) > ahop

(10.23.14:03: Chhoe 1978: 1046 lists ahup in Kyŏngnam, Phyŏngnam, and Phyŏngbuk. Could this be a retention of *ah-up without suffix harmony? Another Kyŏngnam form also found in Chŏnnam, agop, may retain a medial velar that weakened to -h- elsewhere; if so, then the root was *ak-.)

But if that were the case, why isn't the Middle Korean word for 'nine days' *ah-ay with the lower vowel variant of the suffix *-Ay? Perhaps the actual word ahʌray 'nine days' has -ʌray by analogy with yətʌray 'eight days'.

The latter word has an unexpected mix of higher and lower class vowels unlike yətɯrp 'eight' which only has higher class vowels. How can I account for that mismatch? I think Cheju yɐdɐp 'eight' reflects an earlier *yʌtɯr 'eight'. *yʌ then monophthongized to *e outside Cheju. When Korean (i.e., non-Cheju Koreanic) developed vowel harmony, *etɯr became *etʌr and 'eight days' became *etʌr-ay. *-ʌray then spread to 'nine days'. Then *e broke to *yə (via *ye?) with the higher vowel ə, and *yətʌrp (which may have 'grown' a -p by analogy with 'seven' by this point) was harmonized again as yətɯrp (regaining the ɯ lost during the first harmonization!) while yətʌray remained nonharmonic (perhaps to maintain parallelism with ahʌray 'nine days').

10.23.13:09: Perhaps Korean vocalic history can be described as a tug-of-war between harmony and analogy followed by the breakdown of harmony. Middle Korean yətɯrp 'eight' has harmony, whereas Middle Korean yətʌray maintains its original vocalism because of analogy. But neither harmony nor analogy can account for the lowering of u in modern Korean ilgop from Middle Korean nirkup. I would expect modern Korean *ilgup.

APPENDIX: The 二中歷 Nichūreki transcriptions of c. early 12th century Korean numerals

In Nichūreki, the Korean words for 'seven' through 'nine' were transcribed as <tarikuni>, <tirikuni>, and <etari>.

The first two look like attempts to write 'seven'. Their *n- might have been *[ⁿd] or *[nᵈ] or even *[d] and was hence transcribed with <t>-initial kana. Yoshida (2008: 2) cited Takeyasu (2004) on how modern Korean denasalized onsets can be perceived as voiced stops by non-Koreans. Could denasalization have been an old phenomenon in Korean?

The third looks like an attempt to write 'eight' (even though it was glossed as 'nine'). Perhaps the Korean informant said something like, "ndirkun, ndirkun, (y)...tʌr", repeating 'seven' twice, and the Japanese scribe missed the word for 'nine' and misinterpreted 'seven, seven, eight' as 'seven, eight, nine'.

I can't explain why 'seven' was transcribed with <ni> instead of an <f>-kana. No Korean dialect in Chhoe (1978: 1050-1051) has an -n-final word for 'seven'. The only nasal-final form, Chhŏngju ilgom, has m, not n. The Chinese transcription in 雞林類事 Jilin leishi (c. 1103) is 一急 *iʔ kiʔ without a final nasal. (My reconstruction may be too innovative. A more conservative reconstruction *il kip would still lack a final nasal. In any case, the transcription also lacks an initial nasal. Perhaps Korean */ni/ was *[ɲi], and this was misheard as *[i] since I doubt 孫穆 Sun Mu's Chinese dialect had *ɲ. By coincidence, Vietnamese has the readings nhất and nhứt for 一; that nh- [ɲ] is from *ʔj-.)

The Japanese transcription of 'eight' as <etari> was probably [jetari] in Japanese. <e>/[je] could have approximated *yʌ, *yə, *ye, or *e. If my theory of harmony above is correct, the target word might have been *(y)etʌr or *yətʌr prior to the second wave of harmonization. I assume that <ta> is for *tʌ with a nonhigh vowel, though I cannot rule out the possibility that it represented *tɯ with a high vowel. (Note how the high vowel *i of 'seven' was transcribed in Japanese with a low vowel kana <ta> as well as a high vowel kana <ti>.)

NO 'CLEAR' EVIDENCE FOR MY 'NEW' PROPOSAL ABOUT HMONG-MIEN MEDIALS

The quotes around 'new' aren't sarcastic; my proposal really is new. But my attempt to apply it to the Proto-Hmong-Mien word for 'new' was not convincing.

Unfortunately, nor is this attempt to apply it to Proto-Hmong-Mien (PHM) *ntshji̯əŋ 'fresh' (Ratliff 2010: 75) which is a Chinese loan with a 'double medial'* like *tshji̯əŋ 'new'. (The scope of "like" in the preceding sentence is conveniently ambiguous, as it is not clear whether 'new' is also a Chinese loan.)

In Baxter & Sagart's (2014: 356) Old Chinese (OC) reconstruction, 清 'clear' is *tsheŋ without any medial *-j- or *-i-. Why would this have been borrowed into Proto-Hmong-Mien as *ntshji̯əŋ?

The initial nasal (which is still in Hmongic languages today) is not an insurmountable problem, as it may have been from a prefix *N- in an OC dialect that fused with *tsh-, resulting in the voiced dz- or dʑ(ʰ)- in a few Gan forms. (10.22.10:22: Sagart 2002 suggested a possible Hmong-Mien substratum in Gan, so perhaps the OC dialect with *N- was ancestral to Gan.)

However, I would have expected OC *-eŋ to have been borrowed as PHM *-eŋ, not *-ji̯əŋ. Of course OC could not have been uniform, so perhaps there was a dialect in which *e broke to *jə or *iə before *-ŋ. It would be nice if I had a list of PHM words with *-jəŋ-type sequences corresponding to Baxter and Sagart's OC *-eŋ, but I don't. Yet. Going back to the single word I have on hand, dialectal OC *-jəŋ or *-iəŋ could have been borrowed as PHM -jəŋ or -i̯əŋ with a single medial. Why a double medial?

Here are four unsatisfying explanations:

1. No preinitial

In Late OC, *e broke to *ie after nonemphatic initials like *tsh-. Then the initial *tsh- could have palatalized before *i, and PHM *-ji̯əŋ would reflect both the palatalization of the Chinese initial and the diphthong of the Chinese rhyme. (There is no PHM *-i̯eŋ, though there is an *-i̯en.) The trouble is that the palatalization of *tsh- is quite recent in modern Chinese languages, and there is no independent evidence for it at the Old Chinese level. And I am still reluctant to accept reconstructions with a distinction between *-j-, *-i̯-, and *-ji̯-.

2. PHM prefix

If I am right about PHM having two layers of medials, a Late OC *N-tshieŋ could have been borrowed into PHM as *ntshjəŋ with a primary yod. That yod was then lost (after leaving traces elsewhere) and a new yod developed due to a late PHM prefix like *T- or *Ci-. But such a prefix is an ad hoc construct purely concocted to 'solve' this problem.

3. OC prefix *k-

The phonetic of OC 清 *tsheŋ ‘clear’ is 生 *sreŋ with *s-, not *tsh-. Why was a *tsh-syllable written with an *s-phonetic? What if 'clear' originally had initial *kɯ-s-?

Mainstream OC: *kɯ-seŋ > *kɯ-sieng > *ksieng > *tshieng

In my OC reconstruction, consonants before low class vowels like *e became 'emphatic' unless preceded by presyllables with high vowels: e.g., *kɯ-. The high vowel conditioned the partial raising of the following mid vowel: *i ... e > *i ... ie > *ie. The *k- fused with *ts- to become *tsh- (cf. *k- as a source of aspiration in Korean and perhaps also Tangut).

OC dialect source of PHM loan: *N-kɯ-sieng > PHM *N-ki-siəŋ > *N-ki-sjiəŋ (presyllabic *i conditioned secondary *-j-) > *N-ksjiəŋ > *ntshjiəŋ

Here I have rewritten Ratliff's glide-vowel sequence *i̯ə as a diphthong *iə (like modern Vietnamese ia/iê [iə]) to avoid a double glide sequence *-ji̯-.
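The dialect-to-PHM chain in scenario 3 can be replayed mechanically as a sequence of ordered string rewrites. This is only a sketch of my own hypothetical stages. The stage labels, the `STAGES` table, and the `derive` helper are illustrative names, and the rewrites are tailored to this one word rather than being general sound laws.

```python
# Each stage is a label plus literal rewrites applied in order.
# The stages mirror the hypothetical chain above; nothing here is attested.
STAGES = [
    ("borrowing: presyllabic ɯ adapted as i, rhyme ieng as iəŋ",
     [("kɯ-", "ki-"), ("ieng", "iəŋ")]),
    ("presyllabic i conditions secondary -j-",
     [("ki-s", "ki-sj")]),
    ("presyllabic vowel lost",
     [("-ki-", "-k")]),
    ("N-ks- fuses into ntsh-",
     [("N-ks", "ntsh")]),
]

def derive(form):
    """Return every intermediate form, starting with the input."""
    history = [form]
    for _label, rewrites in STAGES:
        for old, new in rewrites:
            form = form.replace(old, new)
        history.append(form)
    return history

print(" > ".join("*" + f for f in derive("N-kɯ-sieng")))
```

Writing the chain this way makes it obvious that the secondary *-j- has to arise before the presyllabic vowel is lost. If the third stage preceded the second, there would be no *i left to condition the glide.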

The trouble is that Pulleyblank proposed that OC *ks- fused into *kʂ- rather than *tsh-. Then again, offhand I recall *kʂ- only being in formerly emphatic syllables, so perhaps *kˁsˁ- (phonetically [qˁsˁ]?) became *kʂ- whereas *ks- became *tsh-.

4. OC prefix *ni- (added 10.22.10:44)

If one objects to *k-, I could rewrite scenario 3 above by placing *i before *N-:

Mainstream OC: *Nɯ-tsheŋ > *Nɯ-tshieng > *tshieng

OC dialect source of PHM loan: *Nɯ-tshieng > PHM *Ni-tshiəŋ > *Ni-tshjiəŋ (presyllabic *i conditioned secondary *-j-) > *ntshjiəŋ

In this scenario, a nasal is not a Gan innovation but was once present in the ancestor of all (?) Chinese languages.

PHM did not have heterorganic prenasalized obstruent initials like *mtsh-, so *ntsh- might be the result of a merger of *ntsh- with *mtsh-. I use *N- to indicate an unknown nasal which might have been *n- or *m-. Both *n- and *m- are possible preinitials in PHM (Ratliff 2010: 14).

*What I call a 'double medial' is really part of the onset followed by part of the rhyme in Ratliff's phonemic analysis. *-ji̯- looks as if it should be pronounced [jj], though I suggested other possibilities.

HOW WOULD RATLIFF'S PROTO-HMONG-MIEN *-JI̯- HAVE BEEN PRONOUNCED?

In my last post, I mentioned a hypothetical Proto-Hmong-Mien medial cluster *-jw-. The closest actual cluster in Ratliff's (2010) reconstruction is *-ju̯- which she analyzed as the end of an onset plus the beginning of a rhyme. She also reconstructed *-ʷi̯- which has a similar analysis: labialization of the onset plus the beginning of a rhyme. These sequences could have been pronounced [jw] and [wj], and perhaps one was [ɥ]. (Does any attested language have a phonemic distinction between /jw/ and /wj/?) But how could *-ji̯- in her *tshji̯əŋ 'new' (p. 74) have been pronounced?

At first I wanted to rewrite one of the palatal medials as a preinitial or presyllable with a high front vowel: e.g., *T-tshjəŋ or *Ci-tshjəŋ. The word vaguely resembles Old Chinese 新 *sin 'new' whose *-n might be an *-ŋ that fronted after a front vowel. What if the Proto-Hmong-Mien word were borrowed as *Ki-sjəŋ from an Old Chinese dialect with a *k-preinitial or presyllable and schwa that blocked the fronting of the nasal coda? In later Proto-Hmong-Mien, *-jəŋ could have become something palatal but without *-j-, and *Ki-s- could have fused into *tshj- with a new *-j-. The trouble is that there is no Chinese-internal evidence for a velar preinitial. Although Matisoff did reconstruct Proto-Tibeto-Burman *g-sik 'new' with a *g- matching my *K-, that word ends in *-k, not *-ŋ which may be a Chinese innovation.

(I don't believe Sino-Tibetan has a Tibeto-Burman subgroup, but I provide Matisoff's form anyway as a rough composite of non-Chinese Sino-Tibetan words for 'new' such as the kə-ʃək-type forms in rGyalrongic languages [see #1700 in Nagano and Prins' list].)

ARE PREINITIALS SOURCES OF RATLIFF'S PROTO-HMONG-MIEN MEDIALS?

I've continued to ponder the problem of /j ʷ/ and /i̯ u̯/ coexisting in Ratliff's Hmong-Mien reconstructions. Could they be reinterpreted as four different glides: e.g., /j w ɥ ɰ/? But I would rather not posit exotic glides when the data does not support them.

What if Hmong-Mien has a mixture of primary and secondary /j w/?

- syllables with original /j w/ developed in certain ways

- preinitials *T- and *P- conditioned new /j w/ which developed in different ways

- the idea of a preinitial coronal stop conditioning medial /j/ is from Jaxontov's (1965) Old Chinese reconstruction:

- I reconstruct *P- as a pre-Tangut source of Tangut medial -w-

Let's suppose that Ratliff's Proto-Hmong-Mien /jæn/ (p. 157) and /i̯æn/ (p. 113) could be rewritten as *T-æn and *jæn:

- Early Proto-Hmong-Mien *jæn > Late Proto-Hmong-Mien *iæn > Proto-Hmongic *i and Proto-Mien *æn (e.g., 'he/she/it')

- Early Proto-Hmong-Mien *T-æn > Late Proto-Hmong-Mien *jæn > Proto-Hmongic *jæn and Proto-Mien *jæn (e.g., 'footprint/track')

10.20.13:13: The above scenario involves a chain shift. Original *j strengthened to *i, forming a diphthong with the following vowel. (Proto-Hmongic and Proto-Mienic preserve opposite halves of *iæ). Then a new *j developed from *T-, resulting in *jV sequences filling the gaps left by the earlier *jV which had become *iV.
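The chain shift depends on the order of the two changes, which a toy derivation can make explicit. The following is a minimal sketch under my own assumptions: the two schematic inputs are the hypothetical Early Proto-Hmong-Mien forms above, "T-" is a placeholder for an unspecified coronal preinitial, and the function names are mine.

```python
import re

# Hypothetical Early Proto-Hmong-Mien forms from the scenario above.
EARLY = {"he/she/it": "jæn", "footprint/track": "T-æn"}

def strengthen_j(form):
    """Original *j before the vowel strengthens to *i (*jæn > *iæn)."""
    return re.sub(r"^j(?=æ)", "i", form)

def preinitial_to_j(form):
    """Preinitial *T- conditions a new *j (*T-æn > *jæn)."""
    return re.sub(r"^T-", "j", form)

def derive(form, stages):
    """Apply the sound changes in the given order."""
    for stage in stages:
        form = stage(form)
    return form

# Proposed order: old *j vacates its slot before the new *j appears.
late = {g: derive(f, [strengthen_j, preinitial_to_j]) for g, f in EARLY.items()}
# Reverse order: the new *j would be strengthened too, and the words merge.
merged = {g: derive(f, [preinitial_to_j, strengthen_j]) for g, f in EARLY.items()}

print(late)
print(merged)
```

In the proposed order the two words stay distinct (*iæn vs. *jæn). In the reverse order both end up as *iæn, which is why the strengthening of original *j must have come first if this scenario is right.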

*K- is grave like *P- and could be another source of secondary *-w-:

*K-C- > *ɣ-C- > *ɰ-C- > *Cw-

I didn't think of it at first because I reconstruct pre-Tangut *K- as a source of aspiration in Tangut rather than a source of Tangut -w-.

Ratliff (2010: 12, 14) reconstructed preinitial *N-, *n-, and *m- in Proto-Hmong-Mien. If Proto-Hmong-Mien had nasal preinitials, it might have also had oral preinitials like my *K-, *T-, and *P-. Unfortunately I do not have any evidence to support such preinitials: e.g., correspondences between Ratliff's /j/ (= my secondary *-j- from *T-) and Old Chinese preinitial *t-.

Another possibility - also unsupported - is that secondary *-j- and *-w- are from *-i- and *-u- in presyllables:

*Ci.CV > *Ci.CjV > *CjV

*Cu.CV > *Cu.CwV > *CwV

Those two scenarios are not mutually exclusive: e.g., secondary *-jw- could be from a presyllabic consonant and vowel:

*Tu.CV > *Tu.CwV > *T.CwV > *CjwV
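The three schematic derivations above follow one pattern, which can be sketched as a small function over the cover symbols. Everything here is an assumption for illustration: the tables `VOWEL_GLIDE` and `ONSET_GLIDE`, the function `derive`, and the choice to collapse identical adjacent glides (so that a hypothetical *Ti.CV would end as *CjV rather than *CjjV) are mine and are not supported by data.

```python
import re

VOWEL_GLIDE = {"i": "j", "u": "w"}            # presyllabic vowel -> secondary glide
ONSET_GLIDE = {"T": "j", "P": "w", "K": "w",  # preinitial -> secondary glide
               "C": ""}                       # unspecified consonant: no glide

def derive(form):
    """Return the stages from a schematic presyllable + CV form such as 'Tu.CV'."""
    m = re.fullmatch(r"([CTPK])([iu])\.C(V)", form)
    if m is None:
        raise ValueError(f"not a schematic presyllable form: {form}")
    pre, vowel, nucleus = m.groups()
    v_glide = VOWEL_GLIDE[vowel]
    c_glide = ONSET_GLIDE[pre]
    # Stage 1: the presyllabic vowel conditions a glide in the main syllable.
    stages = [form, f"{pre}{vowel}.C{v_glide}{nucleus}"]
    # Stage 2 (only for glide-conditioning preinitials): presyllabic vowel lost.
    if c_glide:
        stages.append(f"{pre}.C{v_glide}{nucleus}")
    # Final stage: presyllable lost; an identical second glide is not doubled.
    medial = v_glide if c_glide == v_glide else c_glide + v_glide
    stages.append(f"C{medial}{nucleus}")
    return stages

for source in ("Ci.CV", "Cu.CV", "Tu.CV"):
    print(" > ".join("*" + s for s in derive(source)))
```

The sketch only reproduces the schemas. It says nothing about which real words, if any, had such presyllables.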

In any case, I remain skeptical that there were two kinds of /j w/-type sounds in early Hmong-Mien, and I continue to think that the rich variety of correspondences was in part conditioned by factors other than original *j and *w. Although I lack hard evidence for presyllables, I think Proto-Hmongic */P.ɢa/ (or *Cu.ɢa?) 'to escape' and /ɢwa/ (or /ɢʷa/?) 'duck' are more likely than a proto-phonemic distinction between /ɢʷ + a/ (labiouvular + monophthong) and /ɢ + u̯a/ (uvular + diphthong). (I am ignoring tones.) Is there any attested language with a phonetic distinction between a 'weak w' and a 'strong w'? If Ratliff's Proto-Hmongic was not such a language, then I would assume /ɢʷa/ 'to escape' and /ɢu̯a/ 'duck' were both phonetically [ɢwa] - but they developed differently, so they could not have been homophonous.

