Archives

18.11.11.23:59: NIEPODLEGŁOŚĆ

Today is the hundredth anniversary of the reestablishment of Poland.

Every November 11th is Narodowe Święto Niepodległości - 'National Holiday of Independence'.

I can account for all the morphemes in niepodległość 'independence' except one:

nie- 'not'

-pod- 'under, sub-' (cf. 'be subordinate', the translation of the verbs podlec < *-g-tei [perf.] ~ podlegać [impf.])?

-leg- 'lie'

-ość ('appended to adjectives to form names of abstract concepts'; in this case the adjective is podległy 'subordinate')

What is -ł-? Does it form adjectives from verbs? Is it related to the past suffix in podległ 'was subordinate (m. sg.)'?

11.12.2:43: Added derivation of -c < Proto-Balto-Slavic *-g-tei.

Interslavic leg-ti 'to lie down' (impf.) retains the -g- of the root and is more transparent than its Polish equivalent lec.

Do legti and lec have a perfective equivalent? Wiktionary says lec is perfective and lists no imperfective counterpart. I'm confused.

I could mechanically render niepodległość 'independence' into Interslavic as †nepodleglost, but the actual equivalents are

nezaležnost with za- instead of pod-, -g- > -ž-, and -n- instead of -l- (cf. Polish niezależność)

nezavisnost < zavisny 'dependent' (cf. Polish niezawisły 'independent', again with -ł- after the root)

samostojnost < samo- 'self' + stoj 'stand' (cf. Ukrainian самостійність with і < *o)

The third word is reminiscent of the pan-East Asian word 獨立 'alone-stand' for 'independence': Mandarin duli, Japanese dokuritsu, Korean tongnip, and Vietnamese độc lập.
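The "mechanical" rendering of niepodległość mentioned above can be sketched as a morpheme-by-morpheme substitution. This is only an illustration: the mapping table and the function name are assumptions covering this one word, not a general Polish-to-Interslavic converter.

```python
# Hypothetical morpheme correspondences, just enough for this one word.
POLISH_TO_INTERSLAVIC = {
    "nie": "ne",   # 'not'
    "pod": "pod",  # 'under, sub-'
    "leg": "leg",  # 'lie'
    "ł": "l",      # the unexplained -ł-
    "ość": "ost",  # abstract noun suffix
}


def render_mechanically(morphemes):
    """Swap each Polish morpheme for its assumed Interslavic counterpart."""
    return "".join(POLISH_TO_INTERSLAVIC[m] for m in morphemes)


segmented = ["nie", "pod", "leg", "ł", "ość"]
print("".join(segmented))              # the Polish word
print(render_mechanically(segmented))  # the unattested mechanical rendering
```

The output is the daggered form, which is exactly why it differs from the three real Interslavic equivalents listed above.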


18.7.7.14:15: THEY KANTU BE COGNATES II: A BIG MISTAKE IN TANGUT ETYMOLOGY

Middle Chinese (MC) *d- has two main Early Old Chinese (EOC) sources: *d- and *l- before the 'lower' vowels¹ *a/e/o. (I list all other sources here.²)

In theory, MC 大 *da̤(j) 'big' < EOC *lats (last seen in my last post) could have had *d- or *l- in EOC, and in fact, the word has been reconstructed in Old Chinese with both initials (*d- by Schuessler 2009 and *lˁ- [= my *l-] by Baxter and Sagart 2014). Two pieces of evidence point toward *l-:

- an alternate spelling as 世 *Hɯ-lap-s (Baxter and Sagart 2014: 109; they reconstruct *l̥ap-s)
*H- indicates a consonant that conditions aspiration or devoicing: *Hɯ-l- > *l̥-.

The use of 世 could indicate that 'big' was really *laps or that 世 was chosen to write *lats after *-ps merged with *-ts. Unfortunately there are no known cognates that could point to *-t or *-p.

- the 古丈 Guzhang subvariety of the 瓦鄉 Waxiang variety that preserves *l- in 'type A' syllables with 'lower(ed)' vowels has /lu 22/ 'big' with /l-/ (Baxter and Sagart 2014: 109).

I am guessing /u/ is from *-as and not *-ats.

For some time I thought EOC *lats (or *laps?) was cognate to Tangut

𘜶
4456 2leq3 'big'

and would have correlated the pre-Tangut *S- which conditioned -q (tenseness) with an aspirating prefix *H- that made *lats into 太 *H-lats > MC *tʰa̤j 'great'.

Now I see that a proto-Sino-Tibetan *l-word for 'big' based on those Chinese and Tangut words is impossible. The rhymes do not match. Converting Jacques' pre-Tangut reconstructions into my system, I posit three possible sources for *2leq3:

1. *Sɯ.leH

2. *Sɯ.leŋH

3. *Sɯ.laŋH

Lining up the components of the EOC and pre-Tangut forms:

| EOC | Pre-Tangut | Match |
| --- | --- | --- |
| *H (= *s?) | *S | |
| (*V?) | *ɯ | (✓?) |
| *l | *l | |
| *a | *a or *e | ✓ or ✕ |
| *t or *p | Ø or *ŋ | |
| *s | *H | |

In theory, EOC 太 'great' could have been *sɯ-lats with a high presyllabic vowel that was lost before it could trigger partial vowel raising in the main syllable.

The presyllables might match, but there is no way pre-Tangut *-e, *-eŋ, or *-aŋ can be reconciled with EOC *-ats or *-aps. So I can only regard the pre-Tangut and EOC words for 'big' as lookalikes.

¹EOC had two sets of vowels:

| quality | palatal | neutral | neutral | labial |
| --- | --- | --- | --- | --- |
| stress | + | + | - | + |
| higher | *i | *ə | *ɯ | *u |
| lower | *e | *a | *ʌ | *o |

This happens to be identical to the higher/lower eight-vowel system I reconstruct for Early Korean apart from the inclusion of stress which is irrelevant to Korean phonology. Higher/lower systems are a trait of northeast Asian languages (EOC, Mongolic, Tungusic, Korean³, and possibly Tangut - but not Tibetan to the west, Burmese to the south, or Japanese across the sea to the east).

*ɯ and *ʌ are cover symbols for 'unknown unstressed higher vowel' and 'unknown unstressed lower vowel'. They are based on the Korean higher and lower minimal vowels which really were *ɯ and *ʌ. It seems that EOC had *i as an unstressed higher vowel at an early point, and it is possible that the unstressed subsystem was triangular: *a/i/u. If that was the case, *u has left no traces of its labiality on the following syllable, whereas *i has left traces of its palatality, and *a = *ʌ has triggered partial vowel lowering and pharyngealization.

EOC *d- and *l- before the 'higher' vowels *ə/i/u have palatalized Middle Chinese reflexes *dʑ- and *j-:

時 EOC *də > MC *dʑɨ 'time'

慎 EOC *dins > MC *dʑi̤n 'careful'

受 EOC *duʔ > MC *dʑṵ 'to receive'

怡 EOC *lə > MC *jɨ 'cheerful'

引 EOC *linʔ > MC *jḭn 'to draw a bow'

誘 EOC *luʔ > MC *jṵ 'to lead, influence'

The last two examples might have had EOC *ɟ- (me), *j- (Schuessler), or *z- (Karlgren), but let's go with a currently mainstream *l- for now.

There is no such palatalization before the 'lower' vowels (unless a higher-vowel presyllable preceded during the period of height harmony; see below).
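The conditioning described in this footnote and in the main text can be written out as a small lookup. This is a minimal sketch under simplifying assumptions: it only covers bare EOC *d- and *l- followed directly by a stressed vowel, it ignores presyllables and height harmony, and the function name is made up for illustration.

```python
HIGHER_VOWELS = {"ə", "i", "u"}
LOWER_VOWELS = {"a", "e", "o"}


def mc_initial(eoc_initial, vowel):
    """Return the Middle Chinese initial for EOC *d- or *l- before a given vowel."""
    if eoc_initial not in ("d", "l"):
        raise ValueError("only EOC *d- and *l- are modeled here")
    if vowel in LOWER_VOWELS:
        # Both initials merge as MC *d- before lower vowels.
        return "d"
    if vowel in HIGHER_VOWELS:
        # Palatalization before higher vowels: *d- > *dʑ-, *l- > *j-.
        return "dʑ" if eoc_initial == "d" else "j"
    raise ValueError("unknown vowel")


# Examples taken from the lists above: (graph, EOC initial, EOC vowel)
for graph, initial, vowel in [("時", "d", "ə"), ("受", "d", "u"), ("怡", "l", "ə"), ("大", "l", "a")]:
    print(graph, "*" + initial + "-", "> MC *" + mc_initial(initial, vowel) + "-")
```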

²All other sources of MC *d- (converted from Baxter and Sagart's 2014 reconstruction):

1. EOC *nasal preinitial or *nasal-ʌ-presyllable + *t-

奠 EOC *N-ten-s > MC *de̤n 'to be fixed (v.i.)'

奠 EOC *m-ten-s > MC *de̤n 'to set forth (v.t.)'

突 EOC *mʌ-tʰut > MC *dot 'to burst through'

毒 EOC *mʌ-duk > MC *dok 'to poison'

with a longer version of the volitional prefix in 奠 EOC *m-ten-s 'to set forth'

2. EOC *Cʌ.d/l-

道 EOC *kʌ.luʔ > *kʌ.lʌuʔ > MC *da̰w 'way'

cf. Proto-Hmong-Mien *kləuʔ 'way', a borrowing from Chinese

3. EOC *Cɯ.d/l- + *a/e/o > *C.d/l- + *a/e/o

The presyllabic higher vowel was lost by the period of height harmony, so it could not trigger partial raising of the vowel of the main syllable.

4. EOC *mV- + *r- > Early Middle Old Chinese (MOC) *mr- > Late MOC *d-

There were two waves of *mV.r- simplification: this one (1) and a later one (2):

| Stage | Wave 1 | Wave 2 |
| --- | --- | --- |
| 1. EOC | *mV.r- | *mV.r- |
| 2. Early MOC | *mr- | *mV.r- |
| 3. Late MOC | *d- | *mr- |
| 4. MC | *d- | *m- |

Examples of the two waves:

Wave 1: 逮 EOC *mʌ.rəp-s > Early MOC *mrʌəts > Late MOC *dəts > MC *də̤j 'to reach to'

Like Schuessler, I would normally prefer to reconstruct *l- instead of *r-, but for the moment I want to make the Baxter-Sagart *r- work within my system. For the logic behind *r-, see Baxter and Sagart (2014: 133-134).

Wave 2: 埋 EOC *mʌ.rə > Late MOC *mrʌə > MC *mɛj  'to bury'

I could treat cases like

萏 EOC *CV-romʔ  > MC *də̤m, second syllable of MC 菡萏 *ɣə̤mdə̤m 'lotus flower'

cf. Baxter and Sagart 2014's *rˁomʔ which should normally become MC *la̰m, not *da̰m!

as examples of wave 1 with presyllabic *m-, though once again I would prefer to reconstruct *l- instead of *r-.

Baxter and Sagart (2014: 134) posit different developments in different dialects instead of two waves within the same language.
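The two-wave account in point 4 depends on rule ordering: *mr- hardens to *d- before the second round of presyllabic vowel loss creates new *mr- clusters. A rough sketch of that ordering follows, with onsets treated as plain strings. The function name and the boolean flag are illustrative assumptions, and nothing here models the rest of the syllable.

```python
def derive_onset(loses_vowel_early):
    """Trace an EOC *mV.r- onset through the four stages.

    loses_vowel_early=True  -> wave 1 (presyllabic vowel lost before Early MOC)
    loses_vowel_early=False -> wave 2 (vowel lost only before Late MOC)
    """
    onset = "*mV.r-"
    stages = [("EOC", onset)]

    # EOC > Early MOC: first wave of presyllabic vowel loss.
    if loses_vowel_early:
        onset = "*mr-"
    stages.append(("Early MOC", onset))

    # Early MOC > Late MOC: existing *mr- hardens to *d-;
    # only afterwards do the remaining *mV.r- onsets lose their vowel.
    if onset == "*mr-":
        onset = "*d-"
    elif onset == "*mV.r-":
        onset = "*mr-"
    stages.append(("Late MOC", onset))

    # Late MOC > MC: the new *mr- simplifies to *m-.
    if onset == "*mr-":
        onset = "*m-"
    stages.append(("MC", onset))
    return stages


for wave, early in ((1, True), (2, False)):
    print("wave", wave, " > ".join(form for _, form in derive_onset(early)))
```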

5. EOC *N.r- + *a/e/o

 蕩 EOC *N.raŋʔ > MC *da̰ŋ 'to beat furiously (heart)'

Again, I would normally prefer to reconstruct *l- instead of *r-, but for the moment I want to make the Baxter-Sagart *r- work within my system. This word is not in Schuessler (2009), but it would have *l- in that book's system.

³I hesitate to say 'Koreanic' since I do not know if non-Korean Koreanic languages also had higher/lower vowel systems. Korean height vowel harmony seems to be an internal innovation dating long after EOC; it may be due to contact with Jurchen to the north.


18.7.1.23:59: THEY KANTU BE COGNATES

While looking for cognates of Vietnamese trai ~ giai 'boy' outside Vietic for my last post, I discovered a Kantu noncognate ʔandrus 'male, man' (L-Thongkum 2001¹). If a layman saw these three words and were asked to pick the one word not related to the other two, they'd choose náras:

Kantu ʔandrus 'man'

Ancient Greek andrós 'man (genitive singular)'

Sanskrit náras 'man (nominative singular)'

But of course the last two are from Proto-Indo-European *ʕnḗr 'man'. The Ancient Greek nominative singular anḗr is almost identical apart from the epenthetic -a-.

The direct Sanskrit cognate of anḗr is nā́ 'man'. The loss of *ʕ- and the shift of *ḗ to ā́ are regular; the loss of *-r is not² (compare with PIE *dʰwṓr > Skt dvār 'door' which retains *-r but has a different irregularity - d- instead of dh-).

Sanskrit nár-a- is an extended version of the same word with an -a- suffix.

The Kantu word has a compressed variant ndrus. Kantu is a Katu dialect; other Katu varieties and Souei have

- shifted -s to -jh

(cf. Old Chinese *-ts, *-ps > Late Old Chinese *-s > Early Middle Chinese *-jh)³

- lost the nasal

- or added a prefix

assuming that they derive from Sidwell's (2005) Proto-Katuic *ʔndruːs 'male, man':

Katu (Triw) ʔandruːjh 'male, man'

Katu (Phuong) trus ~ padrɨjh 'boy, man'

why two different rhymes? different dialects?

Katu (An Diem) padruːjh 'boy, man'

Souei kantruah 'male, man'

¹Why isn't this word visible when I view L-Thongkum (2001) using the "build custom dictionary" option in the SEAlang Mon-Khmer database?

²This loss is regular for r̥-stems like nr̥ for nā́, but it is not regular for Sanskrit as a whole (hence the *-r-retention in 'door').

³18.7.6.22:21: An even more relevant parallel is in Vietnamese:

*-s > *-ɕ > *-jh > hỏi/ngã tone (depending on voicing of the *onset) + /j/ as in mũi 'nose'

Thavung mús 'nose' (Premsirat 2000) retains the original *-s. Ruc muːʃ  'nose' (Phu 1998) is like my intermediate stage *-ɕ between *-s and *-jh; another Ruc form, muᵊh (Phu 1998), has no final palatal segment.

In Chinese, primary and secondary *-s generally had two different reflexes which are like those of Vietnamese *-h and *-s:

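| Early Old Chinese | Middle Old Chinese | Late Old Chinese | Middle Chinese |
| --- | --- | --- | --- |
| *-s | *-s | *-h | 'departing tone' |
| *-ks | *-s | *-h | 'departing tone' |
| *-ts | *-ts | *-s | *-j + 'departing tone' |
| *-ps | *-ts | *-s | *-j + 'departing tone' |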

The general pattern of mergers (four categories into two) is clear, but the phonetic details are not: e.g., perhaps *-ks became *-x and merged with *-h from *-s at the Middle Old Chinese stage:

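| Early Old Chinese | Early Old Chinese (later) | Middle Old Chinese | Late Old Chinese | Middle Chinese |
| --- | --- | --- | --- | --- |
| *-s | *-h | *-h | *-h | 'departing tone' |
| *-ks | *-x | *-h | *-h | 'departing tone' |
| *-ts | *-ts | *-ts | *-s | *-j + 'departing tone' |
| *-ps | *-ps | *-ts | *-s | *-j + 'departing tone' |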

The two scenarios above need not be mutually exclusive, as they - and others I have not yet imagined - could represent what happened in different varieties of Old Chinese.
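Both merger scenarios can be traced coda by coda. The sketch below encodes each scenario as a chain of lookup tables, one per stage transition. The variable and function names are assumptions, and the unlabeled intermediate step of the second scenario (*-s > *-h, *-ks > *-x) is my reading of the table rather than something spelled out in the text.

```python
DEPARTING = "departing tone"
J_DEPARTING = "-j + departing tone"

# Scenario 1: *-s and *-ks merge as *-s; *-ts and *-ps merge as *-ts.
SCENARIO_1 = [
    {"-s": "-s", "-ks": "-s", "-ts": "-ts", "-ps": "-ts"},  # EOC > MOC
    {"-s": "-h", "-ts": "-s"},                              # MOC > LOC
    {"-h": DEPARTING, "-s": J_DEPARTING},                   # LOC > MC
]

# Scenario 2: *-ks passes through *-x and merges with *-h from *-s.
SCENARIO_2 = [
    {"-s": "-h", "-ks": "-x", "-ts": "-ts", "-ps": "-ps"},  # within EOC
    {"-h": "-h", "-x": "-h", "-ts": "-ts", "-ps": "-ts"},   # > MOC
    {"-h": "-h", "-ts": "-s"},                              # MOC > LOC
    {"-h": DEPARTING, "-s": J_DEPARTING},                   # LOC > MC
]


def trace(coda, scenario):
    """Return the list of forms a coda passes through in one scenario."""
    path = [coda]
    for step in scenario:
        coda = step[coda]
        path.append(coda)
    return path


for coda in ("-s", "-ks", "-ts", "-ps"):
    print(coda, "|", " > ".join(trace(coda, SCENARIO_1)), "|", " > ".join(trace(coda, SCENARIO_2)))
```

The two chains differ in their intermediate steps but end in the same two Middle Chinese categories, which is the "four into two" pattern described above.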

Late Old Chinese secondary *-s may have been phonetically *[ɕ], a simplification of *[tɕ] < *[ts] < *[ts] and *[ps]. I am unaware of any evidence for reconstructing an affricate as a source of Vietnamese *-s.

The high-frequency Early Old Chinese word  大 *lats 'big' has modern reflexes with and without [j]: e.g., Taiwanese tuā < *las and tāi < *lats. (The macron represents a Taiwanese 'departing tone' that developed after *voiced initials.) In the case of Taiwanese, tuā is native and tāi is borrowed, but I do not know if such an explanation can account for the presence or absence of [j] elsewhere.

I am not aware of evidence pointing toward a dialect in which all *-ts (and *-ps?) merged with *-s and *-ks as *-s, though there is no a priori reason for doubting that such a massive merger could happen.

High-frequency words may be subject to greater erosion, so perhaps *lats had an abbreviated variant *las that became the ancestor of standard Mandarin dà (as opposed to standard Mandarin dài < *lats).


18.6.30.21:05: A CHRONOLOGY OF ANDROGRAPHY

Looking at the many Chữ Nôm spellings of Vietnamese trai ~ giai < *plaːj 'boy' at nomfoundation.org made me realize how useful it would be to have a Chữ Nôm dictionary with characters organized by chronology and geography. Even without any dated manuscripts on hand, I can make guesses about the ages of spellings by the sound changes that they reflect. (See my last post on the history of Vietnamese *pl-.) The spellings fall into five strata listed below in approximate chronological order. (The dating of the fifth category is uncertain.)

1. *p-l-spellings reflecting *plaːj

2. *l-spellings which may reflect *plaːj or *tlaːj (i.e., postdate the shift of *pl- to *tl-)

3. a *t-l-spelling which postdates the shift of *pl- to *tl- and the shift of *s- to *t-

4. a tr-spelling which postdates the merger of *tl- and *ʈ- as tr-

5. gi-spellings which postdate the shift of *kj- to gi-; they may reflect *CV.pl-, or, more likely, an alternate (dialectal?) development of *pl- as gi-.

Stratum
Graph Semantic Phonetic 1 Phonetic 2
1

𪩭
none ba < *p- lai
⿱巴來
2

𤳆
nam 'man'
lai none
𤳇
3
⿱司來 none
< *s-
lai
4

trai < *ʈ- none
5

giai < *kj-
𪟦
nam 'man'
giai distorted as 隹 chuy¹
⿰男皆 giai < *kj-

All of the above are listed under the reading trai at nomfoundation.org (including 佳 despite the fact that the Sino-Vietnamese reading of 佳 is giai) except for the last two which are listed under the reading giai.

There is no graphic evidence for *CV.plaːj which I thought might be the source of giai: i.e., there are no characters with a phonetic component corresponding to my hypothetical presyllable. In theory, a sesquisyllable like *kV.plaːj could have been written as a combination of *kɤː and *paːj characters: e.g., ⿰居拜.

There is also no comparative evidence within Vietic for *CV.plaːj as opposed to *plaːj. I found these forms outside Vietic at the Mon-Khmer Comparative Dictionary:

Katuic branch:

Katu (An Diem) mblɑːj 'unmarried man' (Costello 1971)

Palaungic branch:

Lawa (Mae Sariang) [kuan] mblia, mbluai 'young man' (Shorto 2006)

Is the [pi] in Lawa (Bo Luang) [pi]-plia 'young man' (Shorto 2006) reduplicative?

Old Mon blāy 'young man' (Shorto 1971) is the same word, albeit without any element before bl-.

Shorto (2006) reconstructed a Mon-looking Proto-Mon-Khmer *blaːj 'young man' which I am tempted to revise as *m.blaːj on the basis of Katu and Lawa. But perhaps there is another explanation for the shared m- in Katu and Lawa. It may be significant that Sidwell (2005, 2010) did not reconstruct this word in Proto-Katuic or Proto-Palaungic.

But I doubt Vietnamese trai ~ giai is related to those non-Vietic words because it does not go back to *b- ... -j; its tone points to voiceless *p-, and its -i is from *-l which is still preserved in more conservative Vietic languages (look up 'man' at the Mon-Khmer Comparative Dictionary for examples).

There is no Chữ Nôm evidence for *-l in the spellings of 'boy' or any other words, which leads me to think that the change of *-l to *-j was complete before Vietnamese was first written. Otherwise I would expect *-l words to be at least sporadically spelled with *-w, *-t and/or *-n phonetics. (Chinese had long ago lost its *-l by the time Vietnamese was first written, so the only options for writing Vietnamese *-l were *-j, *-w, *-t, or *-n.) *-w seems unlikely since *-l must have been palatal [ʎ] or palatalized [lʲ] before becoming *-j. I cannot yet dismiss the possibility that *-l was consistently written with *-j phonetics. If Chữ Nôm reflects some feature lost before *-l became *-j, then I would have no choice but to assume that was the case.

7.1.21:52: Wiktionary has the following etymology:

From Proto-Vietic *p-laːl, from Proto-Mon-Khmer *bplaaj, an infixed form of *blaaj (“young man”); cognate with Mon ဗၠဲာ (plai, “bachelor, unmarried male past age of puberty”).

*bp- explains the *voiceless tone and even fits my hypothesis of gi- coming in part from *CV.pl-: I could even claim that Lawa (Bo Luang) [pi]-plia preserves the original presyllable. So maybe:

*bil- > *bi-p-l- (infixation; *voiceless tone on stressed syllable) > *bi.βj- > *CV.ʑ- > gi-

But the problem of Vietic having *-l instead of *-j remains, unless one reconstructs a Proto-Austroasiatic *-ʎ to account for the mismatch in codas.

| Proto-Austroasiatic | Proto-Monic | Proto-Vietic |
| --- | --- | --- |
| *-l | *-l | *-l |
| *-ʎ | *-j | *-l |
| *-j | *-j | *-j |

I would rather not reconstruct *-ʎ to save a single troubled etymology, though. And I think it's still too early to reconstruct Proto-Austroasiatic.

As for the presyllable, it is unnecessary if gi- is regarded as a dialectal development:

The sound change from *p-l and *b-l to ‹gi› is a regular sound change in Northern Vietnamese; compare giồng, giầu, giời and giun.

trai would then have to be a southern loanword in northern Vietnamese coexisting alongside native giai.

¹隹 chuy 'bird' makes no sense from a semantic or phonetic perspective. At first I thought it might be an abbreviation of 雄 hùng 'male' in 𪟦, but it's more likely to be a reshaping of the phonetic 佳 giai. 隹 is a frequent right-hand element of Chinese characters whereas 佳 is not a right-hand element in Chinese and is not common in that position in Chữ Nôm. (I don't know of any other Chữ Nôm characters with the structure ⿰X佳.)


18.6.29.14:54: FRUITS OF THE SKY

I thought Pittayaporn's Proto-Tai cluster *ɓl- was unusual until I remembered that Middle Vietnamese had /ɓl/ in words such as blái /ɓláːj/ 'fruit' whose Chữ Nôm spelling 𢁑¹ is from 巴 ba + 賴 lại. That /ɓl/ is not very old; it goes back to Proto-Vietic *pl-, and some Vietic languages preserve the original *p- to this day: e.g., Ruc pəlíː 'fruit' (Phu 1998). Vietnamese shifted *p- to b- /ɓ/ both before vowels and before *l-.² The modern Vietnamese word for 'fruit' is trái (northern /cáːj/, central-southern /ʈáːj/). nomfoundation.org lists a variant lái - is this current, and if so, where?

Another source of Middle Vietnamese /ɓl/ is *bl-: e.g., Middle Vietnamese blời 'sky' (the huyền tone written with a grave accent points to a voiced proto-initial). Cf. Ruc pləːj < *b- 'sky' (Phu 1998).

As one can see at the Wiktionary entry for blời, its modern reflexes are trời and giời. tr- is the initial I'd expect since *Cl- clusters normally merge as tr-. gi-, on the other hand, has long puzzled me since it is normally from presyllable-palatal sequences and *kj-.

It just occurred to me that trời is from 'bare' *blơi whereas giời may be from *CV.blơi with a presyllable whose vowel conditioned the lenition of the following consonant which became gi- (possibly a voiced palatal stop [ɟ] in Middle Vietnamese).  In Hanoi, retroflex tr- /ʈ/ < *Cl- became palatal ch- /c/. So perhaps *CV.bl- similarly became gi- which was palatal [ɟ] in Middle Vietnamese:

*CV.bl- > *CV.βj- > *CV.ʑ- > gi-

or if a prefix were added to *b- after it devoiced to *p- and imploded to *ɓ-:

*bl- > *pl- > *ɓl- > *CV-ɓl- > *CV-ɓj- > *CV-ʄ- > gi-

(*ʄ is a palatal implosive stop. Cf. the palatal stop pronunciation [ɟ] of gi- in Vinh, Thanh Chương, and Hà Tĩnh.)

Similar changes could be proposed for *(CV.)pl-:

*CV.pl- > *CV.bl- > *CV.βj- > *CV.ʑ- > gi-

*pl- > *ɓl- > *CV-ɓl- > *CV-ɓj- > *CV-ʄ- > gi-

*(CV.)pl-words would have upper register tones (ngang, sắc, hỏi) conditioned by *voiceless initials, whereas  *(CV.)bl-words would have lower register tones (huyền, nặng, ngã) conditioned by *voiced initials: e.g.,

*CV.pla, *CV.plaʔ, *CV.plah > gia, giá, giả 

but

*CV.bla, *CV.blaʔ, *CV.blah > già, giạ, giã
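The register split just described is a two-way lookup: voicing of the proto-initial picks the register, and the proto-final picks the tone within it. Here is a minimal sketch that reproduces the six hypothetical gia forms. The table layout and the function name are assumptions; the tone marks are added as Unicode combining characters and then composed with NFC.

```python
import unicodedata

# (proto-initial voiced?, proto-final) -> (tone name, combining diacritic)
TONE_BY_ENVIRONMENT = {
    (False, ""): ("ngang", ""),
    (False, "ʔ"): ("sắc", "\u0301"),   # combining acute
    (False, "h"): ("hỏi", "\u0309"),   # combining hook above
    (True, ""): ("huyền", "\u0300"),   # combining grave
    (True, "ʔ"): ("nặng", "\u0323"),   # combining dot below
    (True, "h"): ("ngã", "\u0303"),    # combining tilde
}


def reflex(voiced_initial, proto_final, base="gia"):
    """Return (tone name, modern spelling) for a hypothetical *CV.pla- or *CV.bla-type form."""
    tone, mark = TONE_BY_ENVIRONMENT[(voiced_initial, proto_final)]
    # The mark attaches to the final vowel letter of the base.
    return tone, unicodedata.normalize("NFC", base + mark)


for voiced in (False, True):
    print([reflex(voiced, final)[1] for final in ("", "ʔ", "h")])
```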

There are two major problems with those accounts.

First, I don't know of any comparative evidence for a presyllable in 'sky'. If there were such a presyllable, it might have to be a prefix that was a Vietnamese-internal innovation. Maybe the development of *pl-/*bl- to gi- paralleled that of *kj- to gi-:

*pl- > *pj- > *bj- > gi-

*kj- > *gj- > gi-

Second - and this applies to my no-presyllable solution as well - I don't know of any other instance of *-l- becoming *-j-, though perhaps such a stage could bridge *Cl- and Hanoi ch- /c/:

*Cl- > *tl- > *tj- > ch- /c/

Dialects that have retroflex reflexes of *Cl- had no *-j-stage:

*Cl- > *tl- > *tr- > tr- /ʈ/

l- ~ nh- /ɲ/ variation in Vietnamese (Thompson 1987: 70) suggests that *l may have once been palatal [ʎ]: e.g., hai mươi lăm ~ northern hai mươi nhăm 'twenty-five' (but năm 'five' and mươi lăm 'fifteen', not †mươi nhăm).

I don't know of any modern Vietnamese dialects that have labial reflexes of Middle Vietnamese /ɓl/, but I know almost nothing about Vietnamese dialects. If not for Middle Vietnamese or more conservative Vietic languages, I wouldn't be able to reconstruct a labial initial in 'fruit' or 'sky'.

The reflexes³ of Middle Vietnamese /ɓl/ -

| Spelling | Hanoi | Vinh, Thanh Chương, Hà Tĩnh | Huế, Saigon |
| --- | --- | --- | --- |
| gi- | [z] | [ɟ]⁴ | [j] |
| tr- | [tɕ] | [tʂ] | [tʂ] |

(IPA from the Wiktionary entries for giời and trời.)

- are quite different from those of Proto-Tai: /bl bj b mj m ɗ d l n/. In fact none match!

(Mostly written 18.6.18; revised, expanded, and finished 18.6.29.)

¹At nomfoundation.org, there are other spellings, all with two components:

Graph Semantic Phonetic 1 Phonetic 2
⿰來巴 none lai ba
⿱巴乃 quả 'fruit' ba nãi
𣛤 lai none
𣡙 lại
𣡚
𧀞 lại

The use of an n-phonetic 乃 nãi for a syllable with /l/ makes me wonder if *pn- merged with *pl-. After such a merger, one might spell *pn- (now *pl-) words with both n- and l-phonetics, and one might also spell *pl-words such as 'fruit' with both types of phonetics.

²But not *-r-: *pr- in 'squirrel' became s- [ʂ] (sóc), not †br- [ɓr]. Khmer កំប្រុក <kaṁpruk> 'squirrel' retains the original cluster (Gage 1985: 506). This could imply that

*pr- > *pr̥- > *pʂ- > s- [ʂ]

predated *p- to b- [ɓ]: i.e., that *p- in *pr- was lost before it could implode to †ɓ-. Then again, those are a lot of steps, and perhaps *pr- was also subject to implosion:

*pr- > *ɓr- > *ɓʐ- > *ʐ- > s- [ʂ].

³Strictly speaking, gi- is not a reflex of /ɓl/ if I am right about it being from *CV.bl- rather than from /ɓl/ < *pl-/*bl-.

⁴[ɟ] may be the closest living approximation of the Middle Vietnamese consonant that de Rhodes wrote as gi- in the 17th century. Cf. the Italian pronunciation of gi- as [dʒ].


18.6.27.9:57: RETROFLEXES FROM DENTALS IN ZHENGZHANG AND PAN'S OLD CHINESE RECONSTRUCTIONS

Middle Chinese has a series of retroflex initials

*ʈ- *ʈʰ- *ɖ- *ɳ-
*tʂ- *tʂʰ- *dʐ-
*ʂ- *ʐ-

which in the West are thought to come from Old Chinese *r-clusters. Contrast these two words in Baxter's Old Chinese reconstruction and their Middle Chinese reflexes in my reconstruction:

專 OC *ton > MC *tɕwɨen 'exclusively'

傳 OC *tron-s > MC *ʈwɨe̤n 'what is transmitted'

OC *t- palatalized to MC *tɕ-, whereas OC *tr- became retroflex *ʈ-.

I would reconstruct the two words as OC *Cɯ.ton and *Rɯ.ton-s with high-vowel presyllables triggering the diphthongization of the following vowel:

| Stage | Form | Notes |
| --- | --- | --- |
| 1 | *Cɯ.Con | presyllable present; no effect of presyllable on following syllable |
| 2 | *Cɯ.Cuon | highness of presyllabic vowel transferred onto beginning of next vowel |
| 3 | *Cwɨan | presyllable lost; labiality of *uo shifted to onset; *ɨa is *uo stripped of its labiality |
| 4 | *Cwɨen | *a fronted to *e before the acute coda *-n |

*Rɯ- fuses with the following consonant: *Rɯ.t- > *rt- > *tr- > *ʈ-.

But the basic pattern remains: *t- palatalizes, whereas a *t- plus *R- sequence results in *ʈ-.
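The four stages of the diphthongization can be replayed as string rewrites. This is only a sketch under narrow assumptions: it handles forms of the shape *Cɯ.Xon with no suffix, the function name is invented, and it does not model the later palatalization of *t- or the fusion of *Rɯ- with the initial.

```python
import re


def derive(form):
    """Return the four stages for a form shaped like *Cɯ.Con."""
    stage1 = form
    # Stage 2: the highness of the presyllabic vowel spreads onto the start of *o.
    stage2 = re.sub(r"on$", "uon", stage1)
    # Stage 3: presyllable lost; labiality of *uo moves to the onset as *w.
    main_syllable = stage2.split(".", 1)[1]
    stage3 = "*" + main_syllable.replace("uo", "wɨa", 1)
    # Stage 4: *a fronts to *e before the acute coda *-n.
    stage4 = re.sub(r"an$", "en", stage3)
    return [stage1, stage2, stage3, stage4]


print(" > ".join(derive("*Cɯ.Con")))
print(" > ".join(derive("*Cɯ.ton")))
```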

On the other hand, Zhengzhang Shangfang and Pan Wuyun reconstructed 'what is transmitted' without *r:

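| Sinograph | Gloss | Old Chinese: Baxter & Sagart | Old Chinese: This site | Old Chinese: Zhengzhang | Old Chinese: Pan | Middle Chinese |
| --- | --- | --- | --- | --- | --- | --- |
| 專 | exclusively | *ton | *Cɯ.ton | *tjon | *tjon | *tɕwɨen |
| 傳 | what is transmitted | *tron-s | *Rɯ.ton-s | *tons | *tons | *ʈwɨe̤n |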

The palatalization of *tj- to *tɕ- makes perfect sense. But why would *t- back to *ʈ- in 'what is transmitted' and other 'type B' syllables with short vowels in Zhengzhang and Pan's reconstructions? (Their short vowels correspond to the absence of pharyngealization in Baxter and Sagart's reconstruction and the presence of high vowels in my reconstruction.)

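| Sinograph | Gloss | Old Chinese: Baxter & Sagart | Old Chinese: This site | Old Chinese: Zhengzhang | Old Chinese: Pan | Middle Chinese |
| --- | --- | --- | --- | --- | --- | --- |
| | pig | *tra | *Rɯ.ta | *ta | *k-la | *ʈɨə |
| | center | *truŋ | *truŋ | *tuŋ | *tuŋ | *ʈuŋ |
| | to ascend | *trək | *Rɯ.tək | *tɯɡ | *tɯɡ | *ʈɨk |
| | know | *tre | *Rɯ.te | *ʔl'e | *k-le | *ʈɨe |
| | tree root | *tro | *Rɯ.to | *to | *to | *ʈuo |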

Pan's *k-l- shifting to *ʈ- is a change also found in Vietnamese.

Zhengzhang's *ʔl'- (what is *'-?) shifting to *ʈ- is a similar change.

There is no *tri in Baxter and Sagart's system or mine, so there may not be a *ti in Zhengzhang and Pan's systems.

*R could be *r- or *l-; the *R- of *Rɯ.tək 'to ascend' may have been *l- if Written Tibetan ltag-pa 'upper part' is cognate.

I don't know if Zhengzhang or Pan have a *te, but I would expect their *te to have the same Middle Chinese reflex as 'know'.

For comparison, Pan and Zhengzhang's *t- does not become retroflex in 'type A' syllables with long vowels corresponding to the presence of pharyngealization in Baxter and Sagart's reconstruction and the presence of low vowels in my reconstruction. (I use *ʌ as a symbol for 'unknown unstressed lower vowel'.)

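| Sinograph | Gloss | Old Chinese: Baxter & Sagart | Old Chinese: This site | Old Chinese: Zhengzhang | Old Chinese: Pan | Middle Chinese |
| --- | --- | --- | --- | --- | --- | --- |
| | capital city | *tˁa | *ta | *taː | *k-laː | *to |
| | western tribes | *tˁij | *Cʌ.ti | *tiːl | *tiːl | *tej |
| | island | *tˁuʔ | *Cʌ.tuʔ | *tuːwː | *tɯːwʔ | *ta̰w |
| | virtue | *tˁək | *Cʌ.tək | *tɯːg | *tɯːg | *tək |
| | son of principal wife | *tˁek | *tek | *teːg | *teːg | *tek |
| | helmet | *tˁo | *to | *toː | *toː | *təw |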

Summing up the patterns (and adding type A syllables with Middle Chinese retroflexes for completeness):

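| Syllable type | Old Chinese: Baxter & Sagart | Old Chinese: This site | Old Chinese: Zhengzhang | Old Chinese: Pan | Middle Chinese |
| --- | --- | --- | --- | --- | --- |
| B | *t- | *ti, *Cɯ.tA | *tjV | *tjV | *tɕ- |
| B | *tr- | *RtI, *Rɯ.tA, *trI | *tV | *tV | *ʈ- |
| A | *tˁr- | *RtA, *Rʌ.tI, *trA | *rtVː | *rtVː | *ʈ- |
| A | *tˁ- | *tA, *Cʌ.tI | *tVː | *tVː | *t- |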

I use *I to symbolize the stressed higher vowel series *ə *i *u and *A to symbolize the stressed lower vowel series *a *e *o.

Zhengzhang and Pan do allow *tr-type clusters to become retroflexes. But why would simple *t-initials also become retroflexes before short vowels? I have never seen that change anywhere else, and

My guess is that Zhengzhang and Pan both observed the high frequency of retroflex initials in Middle Chinese and chose to reconstruct single dentals as their sources. Phonostatistics could be suggestive. If, for instance, the proportion of *ʈ- to *tɕ- in Middle Chinese B-type syllables were three to one, then it might make sense to reconstruct their Old Chinese sources as *t- and *tj- instead of as *tr- and *t-, since simple initials are normally more common than clusters. (Whether short vowels make sense as a conditioning factor for retroflexion is another matter.)
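The phonostatistic check suggested here comes down to counting initials. A minimal sketch follows, assuming a list of Middle Chinese initials for type B syllables is available. The sample list is invented purely to show the arithmetic and is not real frequency data, and the function name is an assumption.

```python
from collections import Counter


def retroflex_to_palatal_ratio(initials):
    """Return the ratio of *ʈ- to *tɕ- tokens in a list of MC initials."""
    counts = Counter(initials)
    if counts["tɕ"] == 0:
        raise ValueError("no *tɕ- initials in the sample")
    return counts["ʈ"] / counts["tɕ"]


# Hypothetical sample, NOT real counts: three retroflexes for every palatal.
sample = ["ʈ", "ʈ", "tɕ", "ʈ", "t", "ʈ", "ʈ", "tɕ", "ʈ"]
print(retroflex_to_palatal_ratio(sample))
```

A ratio well above one would be the situation in which deriving the more frequent *ʈ- from simple *t- looks statistically attractive; whether the real ratio is anywhere near that is an open question here.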


18.6.22.14:55: PITTAYAPORN'S PROTO-TAI CONSONANTS (PART 1: *ɓl-)

I can't even remember how many series I've started and never finished this year alone. Yes, I have a short attention span. I also have excessive ambition. I keep picking - or stumbling on - extremely complex topics that I can't tackle in a day. What I think are bite-size pieces just keep growing. Let's see how small I can keep this.

Pittayawat Pittayaporn's (2009: 149) reconstruction of Proto-Tai has only one cluster with an implosive: *ɓl-. That initially struck me as unusual because I am accustomed to Southeast Asian languages with complex onsets but no clusters with implosives as first elements: e.g., Pyu, Mon, and Khmer. But then I remembered that Middle Vietnamese had bl- /ɓl/.

I don't know of any modern Vietnamese dialects that have labial reflexes of Middle Vietnamese /ɓl/, but I know almost nothing about Vietnamese dialects. If not for Middle Vietnamese, I wouldn't be able to reconstruct such a reflex.

On the other hand, there are labial reflexes in modern Tai languages that make the reconstruction of Proto-Tai *ɓl- possible. Below I cite reflexes of Proto-Tai *ɓlɯən A 'moon' (mostly from Pittayaporn (2009) and Hudak (2008), plus 扶绥 Fusui and Shan forms from the Austronesian Basic Vocabulary Database):

1. *ɓl-type reflexes

1a. /bl-/: Saek /bliən A1/ 'moon' (only Saek retains *-l-)

1b. /bj-/: (Bao Yen /bjɔːk DL1/ < Proto-Tai *ɓloːk D 'flower')

1c. /mj-/ (mentioned as a reflex in Pittayaporn 2009: 150; unable to find examples)

cf. Vietnamese *ɓ- > m- (but the reverse may have occurred in Pyu!)

2. *ɓ-type reflexes

2a. /b-/: Shangsi /bun A1/

cf. Shangsi /boy A1/ < Proto-Tai *ɓaɰ A 'leaf'

is Shangsi /oy/ [oj] or [oɥ]?

2b. /m-/: Fusui /mɯːn A1/

cf. Fusui /mɯj A1/ < Proto-Tai *ɓaɰ A 'leaf'

(Shan /mɔk DL1/ < Proto-Tai *ɓloːk D 'flower'; see 3c below for the Shan word for 'moon')

3. *ɗ-type reflexes (*ɗ- combines the implosion of *ɓ- with the place of articulation of *-l-)

3a. /ɗ-/: Wuming /ɗɯan A1/

cf. Wuming /ɗoj A1/ < Proto-Tai *ɗɤj A 'good'

3b. /d-/: Thai /dɯən A1/

cf. Thai /diː A1/ < Proto-Tai *ɗɤj A 'good'

3c. /l-/: Shan /lɤn A1/ (but cf. 'flower' in 2b above!)

cf. Shan /li A1/ [liː] < Proto-Tai *ɗɤj A 'good'

3d. /n-/: Po-ai /nɯːn A1/

cf. Po-ai /niː A1/ < Proto-Tai *ɗɤj A 'good'

cf. Vietnamese *ɗ- > n- (no evidence for the reverse in Pyu which had no /ɗ/)
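The three-way grouping of reflexes above can be expressed as a small classifier. This sketch uses only the onsets of the 'moon' forms cited in the list. The dictionary and function names are assumptions, and the tone labels and rhymes are left out.

```python
# Reflex types as grouped in the list above.
REFLEX_TYPES = {
    "*ɓl-type": {"bl", "bj", "mj"},
    "*ɓ-type": {"b", "m"},
    "*ɗ-type": {"ɗ", "d", "l", "n"},
}

# Onsets of the reflexes of Proto-Tai 'moon' cited above.
MOON_ONSETS = {
    "Saek": "bl",
    "Shangsi": "b",
    "Fusui": "m",
    "Wuming": "ɗ",
    "Thai": "d",
    "Shan": "l",
    "Po-ai": "n",
}


def reflex_type(onset):
    """Return the reflex type that an onset belongs to."""
    for label, onsets in REFLEX_TYPES.items():
        if onset in onsets:
            return label
    raise ValueError("onset not in any listed reflex type")


for variety, onset in MOON_ONSETS.items():
    print(variety, "/" + onset + "-/", reflex_type(onset))
```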

I started writing about Proto-Tai *ɓlɯən A 'moon', but I'm going to move that to a post of its own. This post started on the 17th and has taken me five days to finish. I don't want peripheral material to hold it back any longer.


18.6.21.16:43: MEITEI PHONOLOGY

Meitei is an isolate within the Sino-Tibetan family. Its speakers constitute the majority of the population in the Indian state of Manipur on the border with Burma.

I originally wanted to write a post with the characters of the Meitei script reorganized for my own convenience in the standard Indic order, but KompoZer doesn't support the Meitei range of Unicode for some reason, so I'm going to write about Meitei phonology instead based on Chelliah (2016).

Consonant phonemes

native
native initial (ideophones only)
native medial
borrowed initial/medial
borrowed only?
-
-
-
-
h
-
-
k

ŋ
-
-
g

c

-
j
-
ɟ
ɟʱ
t

n
r l
s
d

p

m
w
-
b

I am not sure whether voiced aspirates appear in native words. In the Wikipedia article on the Meitei native religion of Sanamahism, some names of native deities are spelled with -dhou after u (in the recurring name element Ebudhou),  but others have -thou after -ng. That suggests native voiceless aspirates might have voiced in intervocalic position.

I do not know the origin of voiced obstruents in medial position in native words. Two scenarios with hypothetical examples:

Scenario 1

Initial devoicing but retention of voiced series in medial position:

*gaka > kaka

*kaga > kaga

Scenario 2

Medial voicing of voiceless series; development of new medial voiceless series from something else:

*kaka > kaga

*kaXa > kaka
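The two scenarios can be compared by running the same hypothetical proto-forms through each. The sketch below is purely illustrative: it only handles the stops g/d/b and k/t/p, it treats *X as an unspecified source of new medial k as in the example above, and the function names are made up.

```python
DEVOICE = {"g": "k", "d": "t", "b": "p"}
VOICE = {voiceless: voiced for voiced, voiceless in DEVOICE.items()}


def scenario_1(form):
    """Initial devoicing; medial voiced stops are retained."""
    return DEVOICE.get(form[0], form[0]) + form[1:]


def scenario_2(form):
    """Medial voicing of voiceless stops; *X supplies a new medial voiceless k."""
    medial = []
    for segment in form[1:]:
        if segment == "X":
            medial.append("k")  # new voiceless series "from something else"
        else:
            medial.append(VOICE.get(segment, segment))
    return form[0] + "".join(medial)


for proto in ("gaka", "kaga"):
    print("scenario 1:", "*" + proto, ">", scenario_1(proto))
for proto in ("kaka", "kaXa"):
    print("scenario 2:", "*" + proto, ">", scenario_2(proto))
```

Both scenarios end with the same surface contrast between medial k and g, which is why the modern forms alone cannot decide between them.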

Although Meitei borrowed voiced aspirates from Indo-Aryan, it did not borrow retroflex consonants.

I can't find a character in the Meitei script for /cʰ/. Are /c/ and /cʰ/ both written with U+ABC6 <c>?

Vowel phonemes

Meitei has a six-vowel system identical to that of Old Chinese and pre-Tangut:

i ə u
e a o

Pyu has a similar seven-vowel system with an additional distinction between front /ä/ and nonfront (back?) /a/.

It would be asking too much for Meitei vowels to precisely line up with those of Old Chinese, pre-Tangut, or Pyu. Nonetheless it is nice to see these mostly straightforward correspondences for the Meitei numerals at Omniglot.

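| Numeral | Meitei | Old Chinese | Pre-Tangut | Pyu |
| --- | --- | --- | --- | --- |
| two | əni | *ni-s | *niX | /k.ni/ |
| three | əhum | *sum | *K.sum | /n.hom/ |
| four | məri | *s.li-s | *liX | /p.lä/ < *-e |
| five | ŋa | *C.ŋaʔ | *P.ŋa | /pə.ŋa/ |
| six | təruk | *k.ruk | *K.truk | /t.ru(k?)/ |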

Needless to say, numerals alone are insufficient evidence for a genetic relationship, as they can be borrowed: e.g., Proto-Tai has *saːm A 'three', *siː B 'four', *haː C 'five', and *krok D 'six' from Chinese (Pittayaporn 2009). (*soːŋ A  'two' is from Late Old Chinese 雙 *ʂɔŋ 'pair'.) What gives away the Chinese origin of the numerals are Chinese-internal innovations: e.g., the irregular lowering of *-u- in 'three' and the loss of *-l- in 'four'.


18.6.20.17:55: SHARP-HEAVY, ENTERING/DEPARTING, AND B/D

A generic model for tonogenesis in the Sinosphere involves four categories of final consonants.

Proto-final
Category names
Vietnamese
Chinese
Hmong-Mien Kra-Dai
*-Ø/*sonorant
ngang 'even' / huyền 'dark'
平 'level'
A
A
*-ʔ
sắc 'sharp' / nặng 'heavy'
上 'rising'
B
C
*-h
hỏi 'ask' / ngã 'fall'
去 'departing'
C
B
*nonglottal stops
sắc 'sharp' / nặng 'heavy'
入 'entering'
D
D

To demonstrate these categories, I will use hypothetical examples for simplicity.

Vietnamese has six categories. Each proto-category split in two depending on the voicing of the proto-initial consonant, and as I'd expect, all syllables with final stops developed the same tones (sắc/nặng).

*kaʔ > cá (sắc)

*kak > các (sắc)

*gaʔ > cạ (nặng)

*gak > cạc (nặng)
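The six-way Vietnamese split can be sketched as a lookup keyed on the voicing of the proto-initial and the class of the proto-final. This is only an illustration of the generic model, not an analysis of real words: the function names, the set of "voiced" letters, and the plain-string encoding of the hypothetical proto-forms are all assumptions of mine.

```python
# Sketch of the generic Vietnamese tone split described above.
# Assumption: hypothetical proto-forms are plain strings such as "kaʔ" or "gak".

VOICED_INITIALS = set("bdgɟzvmnŋlrjw")  # illustrative, not exhaustive
STOP_FINALS = set("ptck")                # nonglottal stops

# final class -> (tone after *voiceless initial, tone after *voiced initial)
TONES = {
    "open_or_sonorant": ("ngang", "huyền"),
    "glottal_stop": ("sắc", "nặng"),
    "h": ("hỏi", "ngã"),
    "nonglottal_stop": ("sắc", "nặng"),
}


def final_class(form):
    """Classify a proto-form by its last segment."""
    last = form[-1]
    if last == "ʔ":
        return "glottal_stop"
    if last == "h":
        return "h"
    if last in STOP_FINALS:
        return "nonglottal_stop"
    return "open_or_sonorant"


def vietnamese_tone(form):
    """Return the tone category predicted for a hypothetical proto-form."""
    register = 1 if form[0] in VOICED_INITIALS else 0
    return TONES[final_class(form)][register]


if __name__ == "__main__":
    for proto in ("kaʔ", "kak", "gaʔ", "gak"):
        print("*" + proto, "->", vietnamese_tone(proto))
```

Because the glottal-stop and nonglottal-stop rows hold the same pair of tones, the merger of all stop-final syllables falls out of the table rather than needing a separate rule.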

In White Hmong, syllables with initial *voiced consonants and final *glottal stops developed the same tone as syllables with initial *voiceless (!) consonants and final *nonglottal stops  (Ratliff 2010: 184).

*gaʔ > kas

*kak > kas

(-s represents a low tone.)

However, in other Sinospheric languages, syllables with final *-h and syllables with final *nonglottal stops may develop similar or identical tones:

1. In standard Mandarin, there is a weak tendency for syllables which once had final *nonglottal stops to have the same high falling tone as syllables which once had final *-h.

*kah > ku (high falling tone)

*kak > (high falling tone)

2. In Cantonese, syllables with final nonglottal stops have noncontour tones like syllables which once had final *-h.

*kah > kuː (mid level tone)

*kak > kɔːk (mid level tone)

3. Gedney (2008) has descriptions of the tone systems of 19 Tai varieties. In all of them, there is at least partial overlap between the tones of the B and D categories: e.g., in Thai, syllables of those categories almost always have the same tones:

*ka(ː)ʔ > kaː (low tone)

*ka(ː)k > ka(ː)k (low tone)

*ga(ː)ʔ > kʰaː (falling tone)

*gaːk > kʰaːk (falling tone)

but *gak > kʰak (high tone!)

I suspect there was no distinction between */Vʔ/ and */Vːʔ/.

There is a strong tendency for B tones to overlap with D tones with long vowels. Conversely, D tones with short vowels (e.g., *gak in the hypothetical Thai example) tend to go their own way.

Today I realized what might have led to similar tones in the B and D categories (and their equivalents outside Tai). Contrast these two scenarios:

Scenario 1

Stage 1
*V
*Vʔ
*Vh
*Vk
Stage 2
V + tone 1
+ tone 2
V + tone 3
Vk + tone 2

Scenario 2

Stage 1
*V
*Vʔ
*Vh
*Vk
Stage 2
*V
*V̰ʔ
*Vh
*Vk
Stage 3
*V
*V̰
*Vh
*Vk
Stage 4
V + tone 1
+ tone 2
V + tone 3
V(k) + tone 3

Scenario 1 is straightforward; all syllables with *stops develop the same tones. This is what happened in Vietnamese.

Scenario 2 is more complicated.

In stage 2, final *glottal stops condition *creaky voice which is nonphonemic (= predictable and hence nondistinctive).

In stage 3, final glottal stops are lost, and *creaky voice becomes phonemic (= unpredictable and hence distinctive). *V̰ no longer ends in a stop, so it loses its phonetic resemblance to *Vk which still ends in a stop.

In stage 4, modal and creaky-voiced syllables develop tones 1 and 2, whereas syllables ending in obstruents develop tone 3.
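Here is a minimal sketch contrasting the two scenarios. Each rime is reduced to a (phonation, final) pair, and the stages of scenario 2 are applied in order. The labels "modal" and "creaky", the function names, and the use of an empty string for an open syllable are conventions I made up for this sketch.

```python
# Sketch of scenarios 1 and 2 for the four proto-rimes *V, *Vʔ, *Vh, *Vk.
# Assumption: finals are "", "ʔ", "h", "k"; phonation is "modal" or "creaky".


def scenario1(final):
    """All syllables with final *stops share a tone (the Vietnamese type)."""
    return {"": 1, "ʔ": 2, "h": 3, "k": 2}[final]


def stage2(rime):
    _, final = rime
    # A final glottal stop conditions creaky voice (still predictable).
    return ("creaky", final) if final == "ʔ" else rime


def stage3(rime):
    phonation, final = rime
    # The glottal stop is lost; creakiness alone now carries the contrast.
    return (phonation, "") if final == "ʔ" else rime


def stage4(rime):
    phonation, final = rime
    if final in ("h", "k"):  # syllables still ending in obstruents
        return 3
    return 2 if phonation == "creaky" else 1


def scenario2(final):
    """Run a proto-rime through stages 2-4 and return its tone number."""
    rime = ("modal", final)
    for step in (stage2, stage3):
        rime = step(rime)
    return stage4(rime)


if __name__ == "__main__":
    for final in ("", "ʔ", "h", "k"):
        print("*V" + final, scenario1(final), scenario2(final))
```

The only rime whose tone differs between the two functions is *Vk: it patterns with *Vʔ in scenario 1 but with *Vh in scenario 2, because by stage 4 the former *Vʔ syllables no longer end in a stop.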

Pittayaporn (2009) posits a third scenario for Proto-Tai to which I add a top row (he does not reconstruct segmental sources for Proto-Tai tones).

Scenario 3

Stage 1
*V
*Vʔ
*Vh
*Vk
Stage 2
*V
*Vʔ
*V̰
*k
Stage 3
V + tone 1
+ tone 2
V + tone 3
Vk + tone 3

What I don't understand is why *Vh conditions creakiness (a ʔ-quality)  rather than breathiness (an h-quality).

Pittayaporn's Proto-Tai (stage 2) is somewhat like modern Burmese which distinguishes between creaky vowels and vowels followed by glottal stops:

Stage 1
*V
*Vʔ
*Vh
*Vk
Stage 2
*V
*V̰
*V̤
*Vk
Stage 3
V + tone 1
  + (tone 2)
V + tone 3
(tone 4)

I could argue that Burmese really only has two tones, low (tone 1) and high (tone 3); it is creakiness and a final glottal stop that distinguish the other two syllable types.

The difference between Burmese and Pittayaporn's Proto-Tai is that creakiness in the former has a more straightforward source (*-ʔ) than in the latter (*-h).

Here is one more scenario that mixes elements of 2 and 3 via a chain shift in Proto-Tai:

Scenario 4

Stage 1
*V
*Vʔ
*Vh
*Vk
Stage 2
*V
*V̰
*Vʔ
*Vk
Stage 3
*V + tone A
*V + tone C
*V̰
*k
Stage 4
V + tone A
+ tone C
V + tone B
Vk + tone D

Stage 1 has no phonemic phonation or tones.

Stage 2 has a distinction between modal and creaky phonation. The latter is from *glottal stop. A new glottal stop from *-h takes the place of that lost glottal stop:

*-Vh > *-Vʔ > *-V̰

Stage 3 continues the chain shift of stage 2:

*-Vʔ > *-V̰ > *-V + tone C

The result is a system resembling that of modern Burmese.

Stage 4 is fully tonal.

Scenario 4 cannot account for Thai in which tone C is still glottalized after (formerly) *voiced initials.


18.6.13.23:27: EARLY OLD CHINESE PRESYLLABIC *I

Baxter and Sagart (2014: 224) use the notation *A for an Old Chinese *a that has unexpected Middle Chinese reflexes with palatal elements: e.g.,

土 Old Chinese *tʰˁaʔ > Early Middle Chinese *tʰɔˀ 'earth'

is phonetic/semantic in

社 Old Chinese *m-tʰAʔ > Early Middle Chinese *dʑiæˀ 'sacrifice to the spirit of the soil'

instead of Early Middle Chinese †dʑɨəˀ without any palatal vowels

in fact, the phonetic series of 土 has no examples of Early Middle Chinese †-ɨə

Baxter and Sagart speculated that what they write as *A could actually reflect the effect of some unknown preinitial consonant on the vowel.

I propose that we are actually seeing vocalic transfer: the effect of some unknown first syllable vowel on a second syllable vowel.

That occurred to me as I was working out the development of 抯 'to pull out of water' which I originally intended to cite in the addendum to my last post until I decided to use more straightforward examples.

Early Old Chinese
*r(i)-tsa or *ts-r-a
*Ci-tsa-ʔ
*Ni-tsa-ʔ
Middle Old Chinese
*tsrˁa
*tsiaʔ
*N-tsiaʔ
Late Old Chinese
*tʂɤɑ
*tsiæʔ
*dziæʔ
Early Middle Chinese
*tʂæ
*tsiæ̰
*dziæ̰
Mandarin
/tʂa1/
/tɕjɛ3/
/tɕjɛ4/

Early Old Chinese: It is not clear whether the first variant has a prefix *r(i)- or an infix *-r-. *Ci- might be the same syllable as *r(i)-; if so, then there is a root *tsaʔ with only two prefixes, *ri- and *Ni-.

Middle Old Chinese: The *i in the first syllable has caused the following *a to break to *ia. No breaking occurred in *tsrˁa, either because its *i was lost before vocalic transfer (see the table below) or because it never had an *i (i.e., its *-r- was an infix).

Early vs. late *i-loss

Stage 1
*ri-tsa
*ri-tsa-ʔ
Stage 2: early *i-loss
*r-tsa
*ri-tsaʔ
Stage 3: vocalic transfer
*r-tsa
*ri-tsiaʔ
Stage 4: late *i-loss
*tsrˁa
*tsiaʔ

*tsrˁa has pharyngealization because it lacked a high vowel that would have blocked pharyngealization.

*r- and *-ts- underwent metathesis: *r-ts- > *tsr-.

Late Old Chinese:

*tsr- fused to retroflex *tʂ-.

*N-ts- fused to voiced *dz-.

Pharyngealization was lost.

*a broke to *ɤa after retroflexes.

*a fronted to *æ to assimilate with the preceding front vowel *i.

Early Middle Chinese:

*ɤa became *ea (to avoid *ɤ, a vowel absent elsewhere in the system) which then fused to *æ.

Final glottal stop *-ʔ became creaky voice (written with a subscript tilde).

Mandarin:

*æ backed to /a/.

*ts palatalized to /tɕ/ before *-i-.

The diphthong *iæ became /jɛ/.

'Deeper' phonemicizations are possible: e.g., Pulleyblank's (1984: 52) /iă/ or my /jə/.

Open syllables with *voiceless initials and nonglottalized, nonbreathy vowels developed tone 1.

Syllables with *voiceless obstruent initials and *creaky voice developed tone 3.

Syllables with *voiced obstruent initials and *creaky voice developed tone 4 via assimilation:

*dziæ̰ > *dzʱiæ̰ > *dzʱiæ̤ > *tsʱiæ̤ > /tɕjɛ4/

The *voiced initial became breathy voiced, and the *creaky voice in the following vowel became *breathy voice. Then the initial devoiced and lost its breathy voice to dissimilate from the following *breathy voiced vowel. Ultimately the vowel lost the breathiness that conditioned what is now Mandarin tone 4.

6.19.12:51: Contrast the three readings of 抯 'to pull out of water' with those of other words written with the same phonetic 且: 且/祖 'ancestor', 沮 'to leak', 沮 'marsh', and 菹 'marsh':

Word
且/祖 'ancestor' 沮 'to leak' 沮 'marsh' 菹 'marsh'
Early Old Chinese
*ts *Nɯ.tsaʔ *rɯ.tsa-s *ri.tsa *rɯ.tsa *r.tsa
Middle Old Chinese
*tsˁaʔ
*N.tsɨaʔ
*tsɨas
*tsia
*tsɨa *tsrˁa
Late Old Chinese
*tsɑʔ *dzɨaʔ
*tsɨah
*tsiæ *tsɨa *tʂɤɑ
Early Middle Chinese
*tsɔ̰ *dzɨə̰
*tsɨə̤̰ *tsiæ *tsɨə̤ *tʂæ
Mandarin
/tsu3/
/tɕy4/
/tɕjɛ1/ /tɕy1/ †/tʂu1/

Karlgren (1957: 32) regards the character 菹 as primarily representing Old Chinese *tṣi̯o 'to pickle', equivalent to my *r(ɯ).tsa.

At a pre-Early Old Chinese stage with many (mostly?) disyllabic words, there were three basic words:

*ts'ancestor'

Possibly *CV.ts if the first syllable was lost without a trace. Who knows, maybe at some earlier point *CVC or even more complex first syllables were possible.

*NV.tsaʔ 'to leak'

I use periods to indicate breaks between syllables and - in later stages - the borders between presyllables and syllables if I cannot detect a morpheme boundary. I don't know of any 'leak' word family with different prefixes before a root √tsaʔ, so I assume *NV.tsaʔ was a disyllabic root rather than √tsaʔ plus a nasal prefix.

*ri.tsa 'marsh'

I don't know of any 'marsh' word family with different prefixes before a root √tsa, so I assume *ri.tsa was a disyllabic root rather than √tsa plus a prefix *ri-.

In Early Old Chinese, the higher vowels of first syllables mostly merged into *ɯ (my symbol for 'unknown high vowel' inspired by the Middle Korean minimal vowel ㅡ /ɯ/). *i is recoverable if it triggered vocalic transfer after acute initials, but I otherwise can't tell if *ɯ was from *i, *ə, *u, or even some other vowel like *y that no longer existed in Old Chinese.

How did three words become six? 'Marsh' developed four variants:

1. Conservative: Middle Chinese *tsiæ implied by the fanqie of Jiyun (1037, after the Middle Chinese period but still based on Middle Chinese phonology) is the direct descendant of *ri.tsa.

No real Mandarin descendant. /tɕjɛ1/ is a reading generated by reading the Jiyun fanqie as Mandarin, not a naturally transmitted word. It would be convenient to have a term for this kind of artificial modern reading.

2. Depalatalized first vowel; presyllable lost after vocalic transfer: *ri.tsa > *rɯ.tsa > *tsɨa

This variant is not recorded in the phonological tradition but is reconstructible on the basis of Mandarin /tɕy1/ unless that is a reading by analogy with 且 /tɕy1/.

Windows' IME says 沮 can also be read /tɕy1/.

3. Depalatalized first vowel and suffixed: *ri.tsa > *rɯ.tsa-s

I have no idea what the final *-s is doing.

4. First vowel lost before vocalic transfer; metathesis: *ri.tsa > *r.tsa > *tsrˁa

This variant has no standard Mandarin descendant.

The phonetic series of 且 is 'mixed' in the sense that it includes characters for three types of Early Old Chinese (sesqui)syllables:

1. *(Cʌ.)CV: e.g., 且/祖 *tsa/ʔ 'ancestor'

If there was a low-vowel presyllable *Cʌ-, it wouldn't have affected the vowel in a following *a-syllable, so its presence or absence is impossible to detect. I only reconstruct it whenever it leaves a trace.

2. *Cɯ.CV: e.g., 沮 *Nɯ.tsaʔ 'to leak', 且 *Cɯ.tsa 'many'

3. *Ci.CV: e.g., 菹 *ri.tsa 'marsh', 且 *Ci.tsʰaʔ 'moreover'

'many' and 'moreover' might have a common root √(Ci.)tsa with a prefix or a root presyllable that conditioned aspiration in 'moreover':

*Ci.tsa-ʔ > *Ci.tsiaʔ > *C.tsiaʔ > *tsʰiaʔ?

The vowels of the three types (low *ʌ, high front *i, and high back *ɯ) are reminiscent of the three vowels of open presyllables in Pacoh (Watson 1964: 144): /a i u/¹. Watson's examples are:

/pa.piː/ 'to converse'

/ti.noːl/ 'post (n.)'

/ku.ceːt/ 'to die'

Vietnamese chết /cét/ 'to die' has no trace of the presyllable still in Rục ku.cíːt  'to die'.

Here are Early Old Chinese words with similar presyllables and their post-vocalic transfer Middle Old Chinese descendants:

*Cʌ.nuʔ > *nˁauʔ  'brain'

*Cʌ could be *pʌ-; cf. Proto-Austronesian *punuq 'brain'; see this post on the mismatch of the first vowels

*Ci.sak > *tsiak 'to loan, borrow'

*Ci- could be *ti-

*kɯ.dzraŋ > *k.dzrɨaŋ 'bed'

Appendix: More examples of Early Old Chinese *Ci- words

I collected these from Baxter and Sagart (2014: 223-226) and Schuessler (2009: 45, 64). Middle Old Chinese forms follow to show the aftermath of vocalic transfer.

1. 邪 *Ci.ɢa > *ɢia 'interrogative particle'

2. 者 *Ci.taʔ > *tiaʔ 'nominalizing particle'

I am not comfortable with sesquisyllabic particles in a language whose roots are typically monosyllabic. But then again, maybe in EOC, sesquisyllabic roots were the norm.

3. 奢 *si.tʰa > *stʰia   'extravagant'

but not all characters with the phonetic 者 had *i: e.g.,

*sɯ.taʔ 'to cook'

possibly cognate with Tangut

𗟞

4664 1liq1 < ?*S.ta < ??*Si.ta 'to cook'

The *i assumes cognacy with 炙; see below.

4. 車 *ki.kla > *kkia > *kʰlia 'chariot'

also *kɯ.kla > *klɨa 'id.'

cf. Proto-Indo-European *kʷékʷlo- 'wheel'

5. 寫 *Ci.saʔ > *siaʔ 'to depict'

6. 遮 *Ci.ta > *tia 'to cover'

7. 野 *mi.laʔ > *liaʔ 'open country'

also *mɯ.laʔ > *mlɨaʔ 'id.'; the Middle Chinese reading *dʑɨə̰ has an irregular *d-

*mɯ.rəʔ 'village' on the left might be phonetic; is the resemblance to Japanese mura coincidental? If the Japanese word were really related, I would expect †muro < ††-ə.

8. 昔 *Ci.sak > *siak 'in the past'

9. 柘 *Ci.taks > *tiaks 'mulberry tree'

In Proto-Min (PM), the merger of *i with *ɯ in presyllables was total before *-ak; there are no traces of *i, and both *Ci.Cak and *Cɯ.Cak have become PM *Ciok (Baxter and Sagart 2014: 226):

10. 炙 *si.tak > *tiak 'to roast'

PM *tšiok

cognate to 煮 *sɯ.taʔ 'to cook'?

is the resemblance to Japanese tak- 'to burn, to cook rice' coincidental?

11. 尺 *Ci.tʰak 'foot (measure)'

PM *tšhiok

12. 石 *Ci.dak or *Ni.Tak 'stone'

PM *džiok

13. 螫 *Ci.l̥ak > *l̥iak 'to sting'

PM *tšhiok

14. 𥼶 *Ci.l̥ak > *l̥iak 'to wash rice'

PM *tšhiok

15. 射 *mi.lak > *mliak 'to hit with an arrow'

PM *džiok

16. 借 *Ci.sak > *tsiak 'to loan, borrow'

PM *tsiok

*C- fused with *s- into *ts-

17. 席 *Ci.dzak > *ziak 'mat' with irregular *z
PM *dziok
cf. 藉 *Ci.dzak-s > *dziaks 'mat'

Unsolved mystery: Why did *-i- only leave traces in syllables ending in *-a, *-aʔ, *-as, and *-ak(s)?

¹But in closed sesquisyllables, only /ə/ is possible: e.g., /ɓəmɓar/ 'to divide by two' < /ɓar/ 'two'. I do not know yet whether Old Chinese had closed sesquisyllables.


18.6.12.23:43: REFLEXES OF PROTO-TAI *P.T- IN SAEK

Earlier today (in a table in an addendum I finished on 6.14) I mentioned the 'famous' Saek word for 'eye' (praː) which attracts attention because it's not like Thai taː or similar words in other Tai languages. Pittayaporn (2009: 323) reconstructs its Proto-Tai source as *p.ta which elegantly accounts for the p-, -r- (< *-t-), and t-.

That made me curious about whether Proto-Tai *p.t- always became pr- in Saek. Going through Pittayaporn's list of Proto-Tai reconstructions, I see that Proto-Tai *p.t- has two different reflexes:

1. pr- as in 'eye' (above) and pra:j 'die' (Pittayaporn 2009: 357)

2. t- as in tɤ: 'gizzard' (Pittayaporn 2009: 330)

The presyllable *p.- must have been lost in the ancestor of Saek 'gizzard'; it is reconstructible on the basis of Bao Yen pʰɤɰ whose aspiration is from *-r̥- < *-r- < *-t- (cf. Cao Bang tʰɤj with the same source of aspiration).

Pittayaporn (2009: 328) reconstructs Proto-Tai *p.tak 'grasshopper' even though that word has no reflexes in Saek or Bao Yen. Does it have any reflexes with p-like initials? I think he reconstructs *p.t- on the basis of forms like Cao Bang and Shangsi tʰak which have aspiration from *-r̥- < *-r- < *-t- (as in Bao Yen). Even without Saek or Bao Yen or anything labial, the pattern of initials in Cao Bang and Shangsi matches that of *p.t-words rather than *t-words:

Proto-Tai
Saek
Bao Yen
Cao Bang
Shangsi
Thai/Lao
*p.t-
pr-
pʰ-
tʰ-
tʰ-
t-
*t-
t-
t-
t-
t-
t-

If Proto-Tai 'grasshopper' were simply *tak, the Cao Bang and Shangsi reflexes would be †tak with †t-.
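The reasoning behind that reconstruction can be sketched as a lookup: given whatever initials happen to be attested, which proto-initials predict all of them? The dictionary below only restates the idealized reflexes in the table (it ignores cases like Saek 'gizzard', where the presyllable was lost), and the name `compatible_initials` is my own invention for this sketch.

```python
# Sketch: which Proto-Tai initials are compatible with a set of attested reflexes?
# Assumption: reflexes are the idealized ones from the table above, without hyphens.

LANGS = ("Saek", "Bao Yen", "Cao Bang", "Shangsi", "Thai/Lao")

REFLEXES = {
    "*p.t-": dict(zip(LANGS, ("pr", "pʰ", "tʰ", "tʰ", "t"))),
    "*t-": dict(zip(LANGS, ("t", "t", "t", "t", "t"))),
}


def compatible_initials(attested):
    """Return the proto-initials whose expected reflexes match every attested one."""
    return [
        proto
        for proto, expected in REFLEXES.items()
        if all(expected[lang] == onset for lang, onset in attested.items())
    ]


if __name__ == "__main__":
    # 'grasshopper': only Cao Bang and Shangsi forms with tʰ- are considered here.
    print(compatible_initials({"Cao Bang": "tʰ", "Shangsi": "tʰ"}))
    # A Thai/Lao t- on its own cannot distinguish the two proto-initials.
    print(compatible_initials({"Thai/Lao": "t"}))
```

With only the Cao Bang and Shangsi forms as input, *t- is excluded even though no labial reflex is attested, which is the point of the argument about 'grasshopper'.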

6.15.10:16: Old Chinese had many words of the 'gizzard' type that had variants with and without presyllables: e.g., 扶 'to crawl'.

Early Old Chinese
*Nɯ-pʰa
*pʰa
Middle Old Chinese
*N-pʰɨa
*pʰˁa
Late Old Chinese
*bua
*pʰɑ
Early Middle Chinese
*buo
*pʰɔ
Late Middle Chinese
*fʱu
*pʰo
Mandarin
/fu2/
/pʰu1/

At a stage even before Early Old Chinese, the word may have been *Ni-pʰa, *Nə-pʰa, or *Nu-pʰa with a high series vowel that was later reduced to *ɯ in an unstressed position and ultimately lost.

In Early Old Chinese, the word had developed a variant without a presyllable. *pʰa is comparable to English 'cause, a variant of because without a presyllable be-. Presyllable loss - and other forms of reduction - are not entirely mechanically predictable. Just because because could lose its be- doesn't mean that it always did, much less that all be-words had such variation: e.g., there is no monosyllabic variant †lieve of believe.

In Middle Old Chinese, the high vowel of the presyllable conditioned the warping of *a to *ɨa. The variant without a presyllable had no high vowel and was subject to developing pharyngealization. I write pharyngealization after the initial consonant, but it was a quality of the entire syllable.

In Late Old Chinese, *N-pʰ- fused into *b-. *ɨ rounded to *u after labials. Pharyngealized *a backed to *ɑ. Pharyngealization disappeared after leaving its mark on the vowel.

In Early Middle Chinese, *a raised and rounded to *o after *u. *ɑ raised and rounded to *ɔ.

In Late Middle Chinese, the vowels raised further: *uo > *u, *ɔ > *o. *b- became breathy *fʱ before *u.

In Mandarin, breathiness conditioned tone 2 before being lost. Open syllables without that breathiness or any laryngeals developed tone 1. *o raised even further to /u/.

痡 'suffering' and 鋪 'to spread out' both have two variants, one with a presyllable and one without. The bare version happens to be homophonous with the monosyllabic version of 'to crawl'.

Early Old Chinese
*Cɯ-pʰa
*pʰa
Middle Old Chinese
*pʰɨa
*pʰˁa
Late Old Chinese
*pʰua
*pʰɑ
Early Middle Chinese
*pʰuo
*pʰɔ
Late Middle Chinese
*fu
*pʰo
Mandarin
/fu1/
/pʰu1/

*pʰ-, unlike *b-, did not develop a breathy reflex in Late Middle Chinese. As a result, Late Middle Chinese *fu became Mandarin /fu1/ rather than /fu2/ with tone 2 conditioned by *breathiness.

I suspect that the sesquisyllabic (and even earlier disyllabic) versions of 痡 'suffering' and 鋪 'to spread out' had very different first halves: e.g., *kupʰa and *pipʰa, etc. The original first consonants are not recoverable, and all that can be said about the original first vowel was that it was nonlow; a low series vowel (*a *e *o) would not have conditioned the warping of *a to *ɨa. *ɯ is my symbol for an unknown high series vowel. So the 'homophony' of 痡 'suffering' and 鋪 'to spread out' is an illusion caused by my agnostic notation *Cɯ-pʰa; the two words may not have been homophonous until Middle Old Chinese.

I don't know why 鋪 'to spread out' is written with the 金 'metal' radical. The sesquisyllabic version of 'to spread out' has a more common spelling 敷 with the radicals 方 'direction' and 攵 'action with hand'¹ which make more sense. 敷 is not a spelling of the monosyllabic version *pʰa.

Schuessler (2007: 173) regards 鋪敷 'to spread out' to be cognate to 布 *pa-s 'to spread out' and 博 *pa-k 'wide'. The aspirated initial *pʰ- may be from some earlier cluster like *kp- (which is absent from Baxter and Sagart's 2014 reconstruction). Perhaps the earliest reconstructible form of 鋪 'to spread out' is *kɯ-pa. The two Middle Old Chinese forms would then both reflect the presyllable.

Stage 1: Early Old Chinese

*kɯ-pa

Stage 2: early presyllabic vowel loss
*kɯ-pa
*kpa
Stage 3: vocalic transfer
*kɯ-pɨa
*kpa
Stage 4: late presyllabic vowel loss
*kpɨa
*kpa
Stage 5: aspiration
*pʰɨa
*pʰa
Stage 6: Middle Old Chinese
*pʰɨa
*pʰˁa

In Stage 1, there is only one form of the word.

In Stage 2, the word develops a monosyllabic variant *kpa.

In Stage 3, the vowel of *kpa remains unbent since there is no presyllabic high vowel to condition the bending of *a to *ɨa.

In Stage 4, the presyllabic vowel of *kɯ-pɨa was lost.

In Stage 5, *kp- became *pʰ- - a change that probably also occurred in Middle Korean centuries later.

In Stage 6, the variant without a high vowel developed pharyngealization.
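The six stages can be sketched as ordered string rewrites, which makes it easy to see that the two Middle Old Chinese forms differ only in whether the presyllabic vowel was lost before or after vocalic transfer. The function `derive` and its `early_loss` switch are hypothetical names of mine; the rules are hard-coded for this one word and are not meant as general sound laws.

```python
# Sketch of stages 1-6 for the hypothetical *kɯ-pa 'to spread out'.
# Assumption: forms are plain strings; early_loss selects which variant to follow.


def derive(form="kɯ-pa", early_loss=False):
    """Return the list of forms at stages 1 through 6."""
    stages = [form]                        # Stage 1: Early Old Chinese
    if early_loss:                         # Stage 2: early presyllabic vowel loss
        form = form.replace("kɯ-", "k")
    stages.append(form)
    if "ɯ-" in form:                       # Stage 3: vocalic transfer (*a > *ɨa)
        form = form.replace("-pa", "-pɨa")
    stages.append(form)
    form = form.replace("kɯ-", "k")        # Stage 4: late presyllabic vowel loss
    stages.append(form)
    form = form.replace("kp", "pʰ")        # Stage 5: aspiration (*kp- > *pʰ-)
    stages.append(form)
    if "ɨ" not in form:                    # Stage 6: pharyngealization without a high vowel
        form = form.replace("pʰ", "pʰˁ")
    stages.append(form)
    return stages


if __name__ == "__main__":
    for flag in (False, True):
        print(" > ".join("*" + f for f in derive(early_loss=flag)))
```

Swapping the order of the stage 2 and stage 3 rules would make both variants undergo vocalic transfer, so the sketch also shows why the relative chronology matters.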

I forgot about the use of 布 *pa-s 'to spread out' to write 'cloth' (a borrowing from an Austroasiatic language: cf. Katu [Kantu dialect] kapaːs 'cotton', Kuy kpah 'cloth', and Sanskrit kārpāsa- 'cotton', also an AA borrowing) which fits my hypothesis of an earlier *k- in 'to spread out', a native word that happened to sound like 'cloth'. The *k-p-word was later reborrowed with disyllabic spellings:

幏布 *kæh-pɑh 'cotton' (c. 100 AD); is the first *-h for foreign *-r-, or was this spelling coined by someone who still had *kr- in 幏: *krɑh-pɑh?

古貝 *kɔˀ-pɑɕ 'cotton' (c. 430 AD)

See Schuessler (2007: 173) for further discussion, though he does not reconstruct *k- in the Old Chinese words for 'cloth' or 'to spread out'.

¹There is no Chinese word 攵 'action with hand'; the gloss refers to the use of 攵 *(r-)pʰok 'to beat' as a component in other characters. (The word 'to beat' is more commonly written 撲 which is not a component in other characters.)


18.6.11.23:59: DID SAEK SHIFT *Z- UNDER VIETNAMESE INFLUENCE?

Last night I stumbled upon this passage in Pittayaporn (2009: 296):

In Saek, *z- became /j-/ merging with PT *ˀj-, probably due to influence from North-Central Vietnamese, where original *z- has become /j-/ (Alves 2007).

Northern Vietnamese has /z/ corresponding to /j/ in central and southern Vietnamese. I think Saek would be or would have been in contact with central Vietnamese. (It's not clear if there are Saek villages in Vietnam anymore.)

One might conclude that the north preserves a /z/ that became /j/ elsewhere. This would then be parallel with Saek. But I am not sure that is the case. Here are the data:

Old Vietnamese
*kj-, *-C-
*j-, *-T-
*r-, *-s-
Middle Vietnamese spelling
gi-
d-
r-
Northern Vietnamese
/z-/
Nonnorthern Vietnamese
/j-/
/r-/

By 'northern' I mean Hanoi and Vinh (the latter is north central); 'nonnorthern' refers to Huế (at the center) and Saigon. (I don't want to say 'south' because Huế is certainly not in the south.)

Capital letters stand for obstruents with unspecified voicing: e.g., *C could be voiceless *c or voiced *ɟ.

Hyphens before consonants indicate the presence of an unspecified presyllable: e.g., *-C- represents voiceless *c or voiced *ɟ preceded by a presyllable.

Exactly what the Middle Vietnamese spellings gi- d- r- stood for is not certain. I can only say that none of those three consonants were /z-/ or /j-/. I think it's possible that gi- and d- became /j-/ without a *z-phase. But maybe Saek is evidence for such a phase.

Or is it? The /z-/ of Vietnamese postdates the 17th century and long postdates the devoicing of original *voiced obstruents (possibly by the late first millennium AD). On the other hand, Saek *z- is original. Did Saek have *z- and a full set of voiced obstruents as late as the 18th century - almost a thousand years after Vietnamese devoiced its voiced obstruents?

6.14.2:21: I don't think what I wrote above is clear. Let me try again.

Phases of Vietnamese

Vietnamese consonants can be said to have gone through five phases which I will illustrate with hypothetical examples for simplicity:

phase
presyllable
tones
lenition
devoicing
sesquisyllables
monosyllables
-voc
+voc
-voc -voc
+voc
-voc
+voc
1
+
-
-
-
*pətaː
*pədaː
*praː *taː
*daː
*saː
(*zaː)
2
+
+
-
-
*pətaː
*pədàː
*p *taː
*dàː
*saː
(*zàː)
3
+
+
+
+
*pədaː
*pədàː
*pʂ *taː
*tàː
*saː
*sàː
4
-
+
+
+
da
dà
sa
đa
đ
ta
t
5
-
+
+
+
/zaː/ ~ /jaː/ /zaː/ ~ /jaː/ /saː/ ~ /ʂaː/
/ɗaː/
/ɗàː/
/taː/
/tàː/

Phase 1: Early Old Vietnamese:

presyllables present

no tones

no lenition

phonemic voicing in obstruents

I am not sure Early Old Vietnamese ever had *(d)z-. It is perhaps telling that Early Middle Chinese 字 *dzɨʰ 'written character' was borrowed as *ɟɨːʰ (now chữ) rather than as †zɨːʰ which would have become †tữ. Later, Early Middle Chinese 字 *dzɨʰ became Late Middle Chinese 字 *tsɨ̣ and was borrowed again into Vietnamese; see phase 3 below.

Phase 2: Middle Old Vietnamese:

*-r- > *-r̥- after a voiceless initial

subphonemic tones conditioned by voicing before main vowel: *voiceless > unmarked ngang tone, *voiced > grave accent for huyền tone

tones conditioned by final consonants may date between phase 1 and phase 2

Phase 3: Late Old Vietnamese:

voicing (lenition) of medial obstruents: *-t- > *-d-

*-r̥- > *-ʂ-

devoicing of voiced obstruent initials

words formerly distinguished by obstruent voicing now distinguished only by tone which had become phonemic

Late Middle Chinese 字 *tsɨ̣ 'written character' (with a devoiced initial) was borrowed as *sɨ̣ː (now tự). (For simplicity I use a Vietnamese tone mark even for Late Middle Chinese.)

Phase 4: Middle Vietnamese:

presyllables lost

*Cʂ- > s- /ʂ/

Drag chain *s- > *t- > /ɗ/

Italicized forms are 17th century spellings; those spellings of consonants remain in use today. đ is /ɗ/, but the phonetic value of d is uncertain. [d] is the simplest interpretation, but [dʲ] and [ð] are also possible.

Phase 5: Modern Vietnamese: different reflexes of Middle Vietnamese s and d depending on dialect. s lost retroflexion in Hanoi (but not in Vinh which has /z/ like Hanoi and unlike the nonnorth dialects; Thompson 1987: 98). The picture for d is less clear. Two scenarios:

Scenario 1. All dialects shifted d to /z/, and nonnorthern dialects shifted /z/ to /j/

Phase
North
Nonnorth
4
d
5a
*z
5b
/z/
/j/

Scenario 2. d shifted in different ways; no shared /z/-phase

Phase
North
Nonnorth
4
d
5
/z/
/j/

There is no doubt that Proto-Tai *z- became /j-/ in Saek. The question is whether that shift in Saek reflects the influence of Vietnamese given scenario 1. Let's suppose scenario 1 is true. Phase 4 is in the 17th century and phase 5b perhaps starts in the middle 19th century. (The last traces of Middle Vietnamese consonantism seem to disappear after the early 19th century.) So the Saek change would have to be dated between the 17th and 19th centuries. But if the Saek change were that recent, Saek would have had *z- - and presumably other Proto-Tai voiced obstruents such as *g *d *b- - as late as the 17th or even 18th century. That doesn't seem likely given that its neighbor Vietnamese had undergone devoicing prior to borrowing from Late Middle Chinese during phase 3 (circa the 10th century).

Phases of Saek

Saek has gone through some of the same changes as Vietnamese up to phase 3, though the details differ:

phase
presyllable
tones
lenition
devoicing
sesquisyllables
monosyllables
-voc
+voc
-voc
-voc
+voc
-voc
+voc
1
+
-
-
-
*pətaː
*pədaː
*praː *taː
*daː
*saː
*zaː
2
+
-
+
-
*pəd
*pər *pr̥aː *taː
*daː
*saː
*zaː
3
-
+
+
-
*pdaː *praː *pʰraː
*taː
*dàː
*saː
*zàː
4
-
+
+
+
pr raː pʰraː taː àː saː
jàː

Phase 1: Proto-Tai:

presyllables present (rewritten here as *Cə- instead of as *C.- as in Pittayaporn's notation)

no tones

no lenition

phonemic voicing in obstruents

Phase 2:
drag chain shift: *-t- > *-d- > *-r-; contrast with Vietnamese phase 3 in which *-t- > *-d-

Phase 3:

loss of presyllabic vowels

*pər- > *pr-; *pr̥- > *pʰr-

subphonemic tones determined by initial consonant (including presyllabic consonants, unlike Vietnamese) after lenition (again, unlike Vietnamese)

To facilitate comparison with Vietnamese, I use Vietnamese tone notation: zero for tone A1 and a grave accent for tone A2.

Tones conditioned by final consonants may have developed between phase 1 and phase 3.

Phase 4:

drag chain shift: *pd- > *pr- > r-, *d- > tʰ-, *z- > j-

words formerly distinguished by initial voicing now distinguished by tone which has become phonemic

My guess is that lenition and devoicing happened independently in Vietnamese and Saek, whereas tonogenesis did not - Vietnamese phase 3 and Saek phase 3 may have been simultaneous.

Phases of Cao Bang

On 6.11, I thought Saek having *z- and other voiced consonants as late as the 18th century was improbable, but Tai languages on the Sino-Vietnamese border never underwent devoicing (Pittayaporn 2009: 110). Compare the phases of Cao Bang with those of Vietnamese and Saek:

phase
presyllable
tones
lenition
devoicing
sesquisyllables
monosyllables
-voc
+voc
-voc
-voc
+voc
-voc
+voc
1
+
-
-
-
*pətaː
*pədaː
*praː
*taː
*daː
*saː
*zaː
2
-
-
-
-
*ptaː
*pdaː *p
*taː
*daː
*saː
*zaː
3
-
+
+
-
*p *pdàː *pʂ
*taː
*dàː
*saː
*zàː
4
-
+
+
-/+
dàː pʰj
taː àː
àː

Phase 1: Proto-Tai: same as Saek phase 1

Phase 2:

loss of presyllabic vowels

*-r- > *-r̥- after a voiceless initial (as in Vietnamese and Saek)

Phase 3:

Chain shift: *pt- > *pr̥- > *pʂ-

subphonemic tones determined by voicing of consonant before vowel (contrast with Saek)

To facilitate comparison with Vietnamese, I use Vietnamese tone notation: zero for tone A1 and a grave accent for tone A2.

Tones conditioned by final consonants may have developed between phase 1 and phase 3.

Phase 4:

*pr̥- > *tr̥- > tʰ-

elimination of *voiceless-voiced clusters and chain shift: *pd- > *d- > dʱ-

*pʂ- > *pɕ- > pʰj-

*z- > *s- > tʰ-; *z- devoiced but this seems to be an anomaly; see my 6.13 entry; the fortition is reminiscent of Vietnamese (see Phan 2013 for examples of *s- > /tʰ/ in Vietnamese: e.g., *sit > thịt 'meat'¹) but probably occurred independently much later. Phan (2013: 65) regards fortition of fricatives as "common in Southeast Asia and should not be considered a shared innovation."

tone A2 still strongly associated with voiced initials but has become phonemic due to the devoicing of *z-

Finally, for reference:

Phases of Thai/Lao

Thai and Lao never underwent lenition; medial *-t- and *-d- remain as stops today.

phase
presyllable
tones
lenition
devoicing
sesquisyllables
monosyllables
-voc
+voc
-voc
-voc
+voc
-voc
+voc
1
+
-
-
-
*pətaː
*pədaː
*praː
*taː
*daː
*saː
*zaː
2
-
-
-
-
*ptaː
*pdaː *p
*taː
*daː
*saː
*zaː
3
-
+
-
-
* *ɗ *taː
*dàː
*saː
*zàː
4
-
+
-
+
taː
d pʰaː
taː àː saː sàː

Phase 1: Proto-Tai: same as Saek and Cao Bang phase 1

Phase 2:

loss of presyllabic vowels

*-r- > *-r̥- after a voiceless initial (as in Vietnamese, Saek, and Cao Bang)

Phase 3: More or less represented by Thai and Lao spelling (but Lao has no <z>; *z- corresponds to ຊ <j>)

reduction of *pC- to *t- and *ɗ- (not *d-!); was there an intermediate geminate stage *tt- and *dd-?

*-r̥- > -ʰ-

subphonemic tones determined by initial consonant (including former presyllabic consonants, unlike Vietnamese)

To facilitate comparison with Vietnamese, I use Vietnamese tone notation: zero for tone A1 and a grave accent for tone A2.

Tones conditioned by final consonants may have developed between phase 1 and phase 3.

Phase 4

drag chain shift: *ɗ- > d- > *tʰ-

words formerly distinguished by initial voicing now distinguished by tone which has become phonemic

The Vietnamese notation, though convenient, is misleading, as tones A1 and A2 have undergone splits and, in Thai, a merger.

The development of tones A1 and A2 in Thai and Lao

Stage 3 subphonemic tone
A1
A2
Stage 3 initials
*pʰ-, *s-
*ɗ-, t-
*d-, *z-
Stage 4: Thai tones
rising
mid
Stage 4: Vientiane Lao tones
rising
low
high

All of the phases above are my speculations built upon the work of Gage ("Vietnamese in Mon-Khmer Perspective", 1985) and Pittayaporn (2009). The relative chronology is only approximate; some but not all changes could be reordered with the same final results.

¹The nặng tone written with a subscript dot normally indicates a *voiced initial. It is tempting to reconstruct a change *z- > /tʰ/ as in Cao Bang. But support for *z- in native words is weak. The tone may reflect a lost voiced prefix.


18.6.10.23:59: EMPHATIC SAND

Tonight I found the section on the Middle Korean emphatic particle za at random in Lee and Ramsey (2011: 194). The earliest attestations of it I can find in Old Korean are in two 鄉歌 hyangga

毛等居叱沙

*motʌn kəs sa

'all thing EMPH'

- 慕竹旨郎歌 (c. 700)

一等沙

*hʌtʌn sa

'one EMPH'

- 禱千手觀音歌 (c. mid-8th century)
where it is spelled phonetically with Middle Chinese 沙 *ʂæ 'sand'.

It occurred to me that the 'sand' spelling of that particle¹ obviously must predate the lenition of *s to Middle Korean z.

If a *z-pronunciation had existed in Old Korean, it could have been spelled with Middle Chinese

嵯嵳𣩈㽨瘥𥰭䑘艖蒫醝䰈鹺䴾齹虘蔖䠡䣜躦𪘓 *dza

or 邪䓉耶椰瑘𥯘鎁釾𦭿𦰳斜䔑擨 *ziæ².

(There was no Middle Chinese syllable *za. This gap is not accidental. I should look into it.)

It turns out that 邪 'evil' is attested as a phonogram in Old Korean hyangga, but 俞昌均 Yu Chhang-gyun (1994: 76) interprets it as a symbol for *ra (cf. its possible Old Chinese reading *la in Schuessler 2009: 56). There have been many attempts to reconstruct the pronunciation of Old Korean. Has anyone interpreted 邪 as *sa (possibly tempted by its modern Sino-Korean reading sa) or *za? I don't have any other sets of hyangga readings on hand. Another thing to look into when I get the chance.

¹6.11.21:29: It never occurred to me to use Unicode superscript numerals for endnotes until now. No more long strings of asterisks.

It's theoretically possible that the 'sand' spelling in this text postdates the 8th century, as these poems survive in 三國遺事 Samguk yusa (1285) whose earliest surviving copy is from 1512. Even if these poems are actually from c. 700 AD, their spellings could have been altered in the centuries between then and 1512. However, I know of no other evidence pointing toward some other Ur-spelling of the emphatic particle. The 口訣 kugyŏl phonogram for *sa ~ *za is 氵 which is almost certainly an abbreviation of 沙 'sand', the most common sa-character with the left-hand component 氵 'water'. Kugyŏl manuscripts from the Koryŏ dynasty (918-1392) predate 1512; one need not worry about potential errors in their transmission.

²6.11.23:44: Nearly all of these characters are rare and therefore not likely candidates for phonograms which tended to be high-frequency characters. So one might argue that the Old Korean particle was *za but not written as such because there were no high-frequency characters with a similar reading other than 邪 *ziæ 'evil' which was already being used for *ra if Yu (1994) is correct. However, if *s had already lenited to *z in Old Korean, I would expect to see other phonogram spellings unambiguously reflecting lenition. But I know of none offhand. Although one might argue that *s lenited before other consonants, that possibility could only be confirmed if there were *(d)z-spellings of later z-words. No such spellings seem to exist.

The only *(d)z-phonograms in Yu's (1994: 75-78) catalog of phonograms in hyangga are the aforementioned 邪 *ziæ 'evil' and

齊 Middle Chinese *dzej 'equal' : Yu's Old Korean *tsjə (my *tse)

which, like 邪 *ziæ 'evil', does not represent an Old Korean syllable corresponding to a Middle Korean z-syllable. So if Old Korean already had *z-syllables, they were not written with Chinese *(d)z-characters and cannot be detected.

I could argue that in fact the dialect of Chinese known to educated Old Koreans had shifted *(d)z- to *(t)sʱ- (as in Pulleyblank's Late Middle Chinese reconstruction), so the characters above wouldn't have been appropriate for an Old Korean *za.

That Chinese dialect had a reflex of Middle Chinese *ɲ- that corresponds to z in Middle Korean Sino-Korean readings. But there was no Middle Korean Sino-Korean reading †za. So it seems Old Koreans had no good options for writing *za if they had such a syllable - and I still don't think they did.

(The questions of what that Chinese dialect's reflex of *ɲ- was and how it was borrowed into Old Korean - as *z- or as something else that became z- in Middle Korean - remain open. The simplest solution is to assume that Chinese dialect had something like the *ž- of Liao Chinese. This was borrowed into Old Korean as *z-, a consonant originally only in borrowings. Later, Middle Korean lenited *s in native words, resulting in a new /z/ that shared the fate of the old borrowed one: both /z/ soon disappeared from the Seoul dialect. [But does any Korean dialect today have a trace of /z/ in Sino-Korean words?])


18.6.9.23:59: THE PHONETIC VALUE OF MIDDLE KOREAN DOUBLE ZERO

In the earliest hangul texts from the 15th century, there were three circular letters.

ㅇ <Ø> : ㆁ <ŋ> : ㆀ <ØØ>

In modern hangul, ㅇ <Ø> has come to represent zero in initial position and /ŋ/ in coda position: e.g., 앙 <ØaØ> /aŋ/. Although ㅇ may appear with a short vertical line on top like ㆁ <ŋ> in some fonts, that line no longer distinguishes ㆁ <ŋ> from ㅇ <Ø>; the reading of ㅇ/ㆁ is now wholly dependent on its position within a syllabic block.

ㅇ <Ø> had two uses in the earliest hangul orthography for Late Middle Korean in the 15th century. It could represent initial /Ø/ as in the modern language and - unlike in the modern language - it could also represent /ɣ/ in four environments:

1. between /r/ and a vowel

2. between /z/ and a vowel

3. between /j/ and a vowel

4. between /i/ and a vowel

This /ɣ/ has disappeared in the modern standard language, though traces remain in dialects: e.g., 15th century 몰애 <morØai> /morɣaj/ 'sand' corresponds to Pukchhŏng molgɛ with -g- (cf. standard morɛ).
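The four environments above can be written as a tiny decision rule. This is a minimal sketch under my own assumptions: the function name and the phoneme-string inputs are hypothetical, and it takes for granted that the orthography has already told us a written ㅇ follows the segment in question.

```python
# Hypothetical sketch of the two readings of initial ㅇ <Ø> in 15th century
# texts, following the four environments listed above.
GAMMA_TRIGGERS = {"r", "z", "j", "i"}  # segments after which ㅇ could be /ɣ/


def read_circle(previous_phoneme, next_is_vowel=True):
    """Return 'ɣ' or '' (zero) for a written initial ㅇ.

    previous_phoneme: the phoneme before ㅇ, or None at the start of a word.
    next_is_vowel: whether a vowel follows ㅇ.
    """
    if next_is_vowel and previous_phoneme in GAMMA_TRIGGERS:
        return "ɣ"
    return ""


# 몰애 <morØai>: ㅇ stands between /r/ and a vowel, so it is read as /ɣ/.
print("mor" + read_circle("r") + "aj")
```

The printed form matches the /morɣaj/ reading given above; a word-initial ㅇ (previous_phoneme=None) comes out as zero.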

What was ㆀ <ØØ>? Lee and Ramsey (2011: 146) regard it as another spelling of Late Middle Korean /ɣ/. But why would two letters be devised for the same sound at the very beginning of a script? A clue may lie in the limited distribution of ㆀ <ØØ> which was solely used to write forms of the passive/causative suffix ᅇᅵ<ØØi> - and in one instance, the causative suffix ᅇᅮ <ØØu> (月印釋譜 Wŏrin sŏkpo 14:14) - after /j/. If the first suffix were simply /ɣi/, why not spell it as 이 <Øi> which is the spelling after /l z/? (I don't know of any instances of that suffix after /i/. The second suffix is otherwise spelled <Øu> = /ɣu/ after /l z j/.)

Yesterday afternoon it occurred to me that ㆀ <ØØ> might represent a palatal allophone [ʝ] of /ɣ/. This allophone may have been geminated [ʝʝ] if it was like /ss/ and /hh/ which were written as double consonants ㅆ ㆅ <ss hh>. There is even one case of /nn/ as ㅥ <nn> in 訓民正音諺解 Hunmin chŏngŭm ŏnhae.

There is, however, no guarantee that a double consonant necessarily represented a geminate, as ㅆ ㆅ <ss hh> could also represent /z ɦ/ in the prescriptive transcription of Sino-Korean readings. (Native /z/ had a different letter ㅿ <z>. It might be more accurate to regard the artificial voiced consonants of Sino-Korean readings as breathy voiced: e.g., Sino-Korean ㅆ <ss> was /zʱ/ or /sʱ/ and therefore distinct from ㅿ /z/.) Doubled ㄲ ㄸ ㅃ ㅉ <kk tt pp cc> could only represent /g d b dz/ in that transcription in the earliest hangul texts; their use for reinforced consonants came later.

Moreover, the circle was used to derive consonant characters for nongeminates: e.g., /β/ was written as ㅸ. So ㆀ <ØØ> could be interpreted as 'derivative of circle' for [ʝ] rather than as 'double circle' for [ʝʝ] (or geminate zero which would make no sense).

One problem with this proposal is that it cannot easily account for the one instance of ㆀ <ØØ> in the causative suffix ᅇᅮ <ØØu>. It is understandable that /ɣ/ would palatalize to [ʝ] between /j/ and /i/ in, for instance, ᄆᆡᅇᅵ<mʌi.ØØi> /mʌjɣi/ [mʌjʝi] 'to be bound to', the passive stem of /mʌj/ 'to bind'. It is slightly less understandable why /ɣ/ would palatalize to [ʝ] between /j/ and /w/ in  뮈ᅇᅯ <mui.ØØuə> /mujɣwə/ 'moving'. (/ɣw/ is an allomorph of /ɣu/ before vowel-initial suffixes like /ə/ '-ing', called the 'infinitive' [though it is not like an Indo-European infinitive].)

Perhaps 뮈ᅇᅯ <mui.ØØuə> reflects a pronunciation [mujʝɥə] in which the palatal quality of /j/ spread into the following consonants. That pronunciation might even have been common, though for most purposes a phonemic spelling 뮈워 <mui.Øuə> for /mujɣwə/ might have sufficed instead of a more precise phonetic spelling 뮈ᅇᅯ <mui.ØØuə>. I don't know if the spelling 뮈워 <mui.Øuə> is attested, but 月印千江之曲 Wŏrin ch'ŏn'gang chi kok 62 has the spelling 뮈우 <mui.Øu> /mujɣu/ for the stem.


18.6.8.16:40: FRGÁL

Slavic languages normally only have [f] in loanwords and as a positional variant of /v/ (which is why Russian names in -v have variant spellings in -ff).

As far as I know (thanks to Short 1993), Czech initial [f] can only appear

- in onomatopoetic words (e.g., foukat 'to blow')

- as a positional variant of v before voiceless consonants (e.g., vsadit 'to bet', pronounced [fsadit])

- in loanwords from non-Slavic languages (e.g., fonetický 'phonetic')

So what is the source of the f in the dish called frgál? That f- is before a voiced syllabic r and is not a variant of v-. Is it onomatopoetic or from a foreign language - perhaps Romanian, given that frgál is from Moravian Wallachia? That region isn't contiguous with modern Romania, but it was settled by Vlachs.
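The second source of initial [f], devoicing of v before a voiceless consonant, is mechanical enough to sketch. This is a rough illustration working on spelling only; the function name and the letter list are my own assumptions, not a full account of Czech voicing assimilation.

```python
# Hypothetical sketch: does an orthographic initial v- surface as [f]?
# Only spelling is checked; 'ch' is listed first because it is a digraph.
VOICELESS = ("ch", "p", "t", "ť", "k", "c", "č", "s", "š", "f")


def initial_v_is_f(word):
    """True if the word starts with v followed by a voiceless consonant."""
    lowered = word.lower()
    return lowered.startswith("v") and lowered[1:].startswith(VOICELESS)


for word in ["vsadit", "voda", "frgál"]:
    print(word, initial_v_is_f(word))
```

The rule says nothing about frgál, whose f- is spelled as such and stands before a voiced syllabic r, so that word still needs one of the other two explanations.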


18.6.6.23:50: SHIMUNEK (2017) AND DOWNES (2018)

Last night, I found the addenda and corrigenda to Andrew Shimunek's Languages of Ancient Southern Mongolia and North China (2017). I thought that would be as close as I'd get to having his book which I can't afford at $116.76 until I saw an online sampler.

It's remarkable that three books on Khitan have appeared in English within a decade - the other two being Daniel Kane's The Kitan Language and Script (2009) and Wu Yingzhe and Juha Janhunen's New Materials on the Khitan Small Script: A Critical Edition of Xiao Dilu and Yelü Xiangwen (2010 - just a year after Kane's book!).

Can a new book on Jurchen be far behind? It has been almost thirty years since Kane's The Sino-Jurchen Vocabulary of Interpreters (1989), which despite its title is a general gateway to Jurchen language studies and complements Kiyose Gisaburō's A Study of the Jurchen Language and Script - Reconstruction and Decipherment (1977), which covered the Sino-Jurchen vocabulary of the Bureau of Translators.

Not long after Imre Galambos' Translating Chinese Tradition and Teaching Tangut Culture: Manuscripts and Printed Books from Khara-Khoto (2015) comes Alan Downes' PhD dissertation "How Does Tangut Work?" (submitted 2016, revised 2018), a follow-up to his BA honors thesis "The Xixia Writing System" (2008) - and his tangut.info website which links to mine.

Alas, I haven't written about Tangut - much less Khitan or Jurchen - in a long time. If I may rephrase Downes' question, I have been trying to come up with the answer to "How Does Pyu Work?" It's coming in a series of articles and a book.

These are exciting times for the study of extinct Asian languages.


18.5.13.23:59: YAT AND ETA

Today I realized that my interpretation of the early Slavic vowel yat as [ɛː] (< *ai) sounded like the classical value of the Greek letter Η eta. Since Cyrillic is an offshoot of the Greek alphabet, one might expect yat to have been written with an eta-based Cyrillic letter. But of course eta was actually the model for the Cyrillic letter И <I> because eta had raised to [i] by the 4th century AD, long before Cyrillic was created in the late 9th century. [ɛː] was long gone in Greek, so a non-Greek letter was created for yat: Ѣ.

Ѣ looks like a derivative of the front yer letter Ь [ɪ] which in turn looks like a derivative of the Glagolitic front yer letter Ⱐ. But it is strange that a lower mid long vowel was written with a modified lower high short vowel rather than, say, with an additional stroke (like Czech ě which is nowadays used to transliterate yat). I don't see any resemblance between Ѣ and its Glagolitic counterpart Ⱑ.

5.14.23:24: According to Wikipedia, Schenker (1995) thought Ⱑ might be from Greek alpha Α. That makes a lot of sense if yat were [æ].

Modern reflexes of yat vary considerably in height from [ja] with a low vowel in eastern Bulgarian* to [i] in Ukrainian.

*Eastern Bulgarian has two reflexes of yat: [ja] and [ɛ]. The former is in stressed syllables not followed by front vowels. The latter occurs elsewhere.
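The distribution in that note amounts to a two-condition rule. Here is a minimal sketch; the function and parameter names are hypothetical, and I am assuming that 'followed by front vowels' refers to the vowel of the next syllable.

```python
# Hypothetical sketch of the two eastern Bulgarian reflexes of yat.
def eastern_bulgarian_yat(stressed, followed_by_front_vowel):
    """Return 'ja' in stressed syllables not followed by a front vowel,
    and 'ɛ' elsewhere."""
    if stressed and not followed_by_front_vowel:
        return "ja"
    return "ɛ"


print(eastern_bulgarian_yat(stressed=True, followed_by_front_vowel=False))
print(eastern_bulgarian_yat(stressed=True, followed_by_front_vowel=True))
```

The familiar alternation in бял 'white' (singular) ~ бели (plural) is the kind of pair this rule is meant to describe.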


18.5.12.23:59: PROTO-CELTIC VOICED ASPIRATES?

I've seen this Proto-Celtic word list before, but I didn't notice voiced aspirates in it until now:

*mori-steigh-(e/o-) 'sea'

*men-n-dh-e/o- (?) 'want'*

*ati-od-bher-to- (?) 'sacrifice'

Are those pre-Proto-Celtic forms? I thought Proto-Celtic lost aspiration in voiced consonants:

Proto-Indo-European *gh *dh *bh > Proto-Celtic *g *d *b
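Applying that change mechanically to the three forms shows what I would have expected the list to contain. This is only a string-replacement sketch with a hypothetical function name; the outputs are not claimed as anyone's actual Proto-Celtic reconstructions.

```python
# Hypothetical sketch: strip aspiration from voiced stops written gh, dh, bh.
def deaspirate(form):
    """Replace gh/dh/bh with g/d/b in a reconstructed form."""
    for aspirated, plain in (("gh", "g"), ("dh", "d"), ("bh", "b")):
        form = form.replace(aspirated, plain)
    return form


for form in ["*mori-steigh-(e/o-)", "*men-n-dh-e/o-", "*ati-od-bher-to-"]:
    print(form, ">", deaspirate(form))
```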

*5.14.0:42: This reminds me of Avestan mazdā- 'wisdom' < *mn̥s-dheʔ 'mind-place', though the first root is in the e-grade in Celtic.


18.5.11.23:59: CHU AND KRA-DAI (PART 2)

Here's my attempt to reconstruct the Old Chinese (OC) phonetic series of 楚 (Schuessler 2009 series 1-62, Karlgren 1957's series 88 plus 90) to make it fit Chamberlain's (2016) hypothesis from part 1.

The series has five types of Early Middle Chinese (EMC) readings (ignoring final consonants):

I. *sɨə- < *Cɯ-sa- (*kɯ-sa-?) (胥湑稰諝糈壻婿)

II. *ʂɨə- < *kɯ-sa- (疋疏蔬梳糈)

III. *tʂʰɨə- < *kʂʰɨa- < *kɯ-sa- (楚 only)

IV. *ŋæ- < *ŋgʐa- < *N-k-sa- (alternate reading of 疋 only)

V. *sej < *se (alternate reading of 壻婿 only)

The high-vowel presyllables of types I-III conditioned medial *-ɨ- which in turn conditioned the raising of *a to *ə.

The high-vowel presyllables of type I were lost after conditioning medial *-ɨ-, but they fused with *s in types II and III. *kɯ-s- that fused early became EMC *ʂ- via *kʂ-; *kɯ-s- that fused late became EMC *tʂʰ- via *kʂʰ-.

Type III *kʂʰɨaʔ might have approximated an early Kra-Dai *kraʔ, especially if it were phonetically something like [kʁaʔ].

(5.12.0:56: Or if 'Kra' were [kʐaʔ]. Cf. Polish krz [kʂ] from *kʐ- < *krʲ-. Pittayaporn 2009: 99 reconstructed *ks- as a Proto-Tai source of Proto-Southwestern Tai [and hence Siamese] *kʰr-, though he does not list any examples of Proto-Tai *ks-, and he reconstructed the Proto-Tai cognate of 'Kra' as *kraː C 'slave' with *kr- rather than *ks-. Siamese kʰaː C1 'slave' lacks the -r- that would point to medial *-s-. If *ks- became Siamese kʰr-, perhaps *kz- became *kr- and then Siamese kʰ-.)

*N- fused with *k- to form the *ŋ- of type IV.

(5.12.0:11: OC *a fronted to *æ after retroflexes.)

The *-e rhyme of type V is anomalous and unique to 壻~婿 'son-in-law'; it cannot be reconciled with the *-a rhyme of the other types.

5.12.1:03: Added all examples of each type listed in (Schuessler 2009: 59) plus 疋 as the sole example of type IV which was not listed in Schuessler.


18.5.10.23:59: CHU AND KRA-DAI (PART 1)

Chamberlain (2016) proposed that the name of the state now known as 楚 Chǔ in Mandarin is the same name as Kra as in Kra-Dai. This is an ingenious idea. But does it really work?

The rhymes certainly match. 楚 ended in *-aʔ in Old Chinese, and 'Kra' in Proto-Kra-Dai was something like *kraʔ (cf. Ostapirat's Proto-Kra *kra C 'Kra' and Pittayaporn's Proto-Tai *kraː C 'slave'; I interpret the C tone category as *-ʔ like Norquest 2016).

The trouble is the initial. If 楚 had initial *kr- in Old Chinese, it would have become Early Middle Chinese †kæʔ and Mandarin †jiǎ. But instead it became Early Middle Chinese *tʂʰɨəʔ and Mandarin chǔ [tʂʰu] with aspirated retroflex initials.

Can those initials be reconciled?

Pulleyblank (1962: 129) proposed that Old Chinese *skʰ- might have become Early Middle Chinese *tʂʰ-. Later, Pulleyblank (1965: 206) proposed Old Chinese *kʰs- as a source of Early Middle Chinese *tʂʰ-. But there is no *s in Proto-Kra-Dai *kraʔ. *s- is likely to have been in the Old Chinese reading of 楚 since nearly all readings of characters in the 疋 phonetic series began with *ʂ- or *s- in Early Middle Chinese. There is no evidence on the Chinese side directly pointing to *k- in 楚 or any other member of the 疋 phonetic series, though 疋 does have another Early Middle Chinese reading *ŋæʔ which could mechanically be derived from an Old Chinese *ŋraʔ - close to *kraʔ but with a velar nasal rather than a stop.

Next: How can I make Chamberlain's idea work?

5.11.11:56: Added reference to Pulleyblank (1965) and link to Pulleyblank (1962).


18.5.9.23:59: ARMENIAN, KOREAN, AND BURMESE APPROACHES TO KHITAN OBSTRUENTS

In my last entry, I wrote,

the Khitan transcribed Liao Chinese *t as both <t> and <d>

There are similar inconsistencies with other obstruents and to a lesser extent even in the spelling of native Khitan words: e.g., 'second' is spelled with both 162 <c> and 104 <dz> (Kane 2009: 115).

I originally thought that Liao Chinese and Khitan had different obstruent systems: e.g., LC had an unaspirated : aspirated distinction whereas Khitan had a voicing distinction. But that wouldn't explain the inconsistency in Khitan native words.

Today it occurred to me that Khitan might have had Armenian-style variation:

The major phonetic difference between dialects is in the reflexes of Classical Armenian voice-onset time. The seven dialect types have the following correspondences, illustrated with the t–d series:

Correspondence in initial position

Indo-European | *d | *dʰ | *t
Sebastia | d | dʰ | tʰ
Erevan | t | dʰ | tʰ
Istanbul | d | d | tʰ
Kharpert, Middle Armenian | d | t | tʰ
Malatya, SWA | d | tʰ | tʰ
Classical Armenian, Agulis, SEA | t | d | tʰ
Van, Artsakh | t | t | tʰ

But of course Khitan had only two obstruent series, not three.

Might the use of certain spellings correlate with certain locations and/or time periods? They would then reflect the obstruent series of different regional/chronological varieties of Khitan. The unspoken assumption of Khitan studies is that the language was homogeneous over a wide area for a long period, but that is unlikely.

Another possibility is that Khitan was like modern Korean in which unaspirated obstruents have voiced and voiceless allophones conditioned by different environments: Sino-Korean 德 /tək/ appears as

[dək] after a sonorant

[tək] elsewhere

Could 254.020 <d.ei> ~ 247.020 <t.ei> transcribing Liao Chinese 德 (Kane 2009: 253) have had a similar distribution?
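The Korean-style distribution is easy to state as a rule, which makes it testable against the Khitan spellings. A minimal sketch follows; the function name and the sonorant inventory are my own simplifications, and whether Khitan <d.ei> ~ <t.ei> really patterns this way is exactly the open question.

```python
# Hypothetical sketch of modern Korean allophony for Sino-Korean 德 /tək/:
# voiced after a sonorant (vowels, nasals, liquids), voiceless elsewhere.
SONORANTS = set("aeiouəɛʌɨ") | set("mnŋlr")  # simplified inventory


def realize_tek(previous_segment):
    """Return the phonetic form of /tək/ given the preceding segment
    (None for the start of an utterance)."""
    return "dək" if previous_segment in SONORANTS else "tək"


for previous in [None, "k", "n", "a"]:
    print(previous, realize_tek(previous))
```

If the Khitan spellings followed the same rule, <d.ei> should turn up after sonorant-final words and <t.ei> elsewhere, which could be checked inscription by inscription.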

A final possibility is that Khitan was like Burmese in which etymological voiceless consonants may be voiced in close juncture. Wheatley (2009: 729) explains that in Burmese,

[c]lose juncture is characteristic of certain grammatical environments [...] But within compounds the degree of juncture between syllables is unpredictable; the constituents of disyllabic compound nouns (other than recent loanwords) tend to be closely linked, but compound verbs vary, some with open, some with close juncture.

The above possibilities are not mutually exclusive for Khitan.


18.5.8.23:54: THE KHITAN EMPEROR SHENGZONG IN UNICODE

Today I discovered that lookalikes for all four Khitan large script characters for 聖宗皇帝 'Emperor Shengzong' (r. 982-1031) exist in Unicode:

𫝢伋皇帝

Of course it's only the first two characters that are interesting; they are unknown to nearly all literate in Chinese. The last two are identical to Chinese 皇帝 'emperor'.

'Emperor Shengzong' exemplifies how the Khitan large script to a Chinese eye is a mix of familiar and alien elements. The first two characters combine familiar elements

夕 'evening' + 卞 'hat' = 𫝢

亻 'person' + 及 'to reach' = 伋

in unfamiliar ways.

𫝢 turns out to be a variant of 升 'to rise', which in turn was a homophone of 聖 *šiŋ 'sage' in Liao Chinese aside from its tone. 𫝢/升 and 聖 were not homophones until the late first millennium AD, so the use of 𫝢 for 'sage' may date from the Liao dynasty and is probably not a carryover from the pre-Liao Parhae script hypothesized by Janhunen. Why didn't the Khitan simply recycle 聖 'sage' the way they recycled 皇帝 'emperor'? Was 聖 'sage' too complex for the Khitan large script which favored a low number of strokes per character?

In Chinese, 伋 is a name character of no known meaning. (It is the birth name of Confucius' grandson 子思 Zisi.) It would have been pronounced *ki in Liao Chinese and not 宗 *tsuŋ like 'ancestor'. So the reasoning for 伋 as 'ancestor' is unclear (though at least the 亻 'person' radical makes sense). Might a Khitan or even a Parhae word for 'ancestor' have sounded something like *ki?

(5.9.9:39, revised 14:16: Was 伋 a semantic compound invented by someone who might not have known about the rare character 伋? But I know of no semantic compounds unique to the Khitan large script. The closest instance I can think of is


'heaven'

which consists of 天 'heaven' over 土 'earth'. It is not a true semantic compound because it does not represent a word for 'heaven and earth' or 'world' (the sum of 'heaven and earth'); 土 'earth' seems to disambiguate an unknown Khitan word for 'heaven' from 天 for <tên>, a borrowing from Liao Chinese. The semantic function, if any, of 及 'to reach' in 伋 'ancestor' is less clear.

The Dictionary of Chinese Character Variants has no 伋-like variants of 宗. What I will call Janhunen's Question remains unanswered: If the Khitan wanted a script to distinguish themselves from the Chinese, why did they keep or replace characters seemingly at random? I still think the only possible answer is that they didn't do that - rather, they adapted a sister script of Chinese [Janhunen's hypothetical Parhae script]. The situation is somewhat parallel to that of Cyrillic which is related to the Latin alphabet but not derived from it; they are 'cousins', not 'daughter' and 'mother'.)

Although the shapes of 皇帝 'emperor' are uninteresting, the question of how we know their readings is worth examining. Kane (2009) reads them as <hoŋ di> (= <ghong di> in the transcription system on this site).

However, I have not found any Khitan small script phonetic spelling of the first half of 皇帝 'emperor' or any of its homophones in Chinese. I would expect such a spelling to be 340.071 <h.ong> with voiceless 340 <h> rather than voiced-initial 076 <gho>. (There is no known small script character <gh> without a vowel, and the initial of 皇 had devoiced to *x in Liao Chinese.) No spelling <h.ong> is in Qidan xiaozi yanjiu (1985: 460). Has such a spelling been found in the thirty-plus years since the publication of that book?

Kane (2009: 244) lists 247.339.339 <t.i.i> as a small script spelling of the second half of 皇帝 'emperor'. Unfortunately, he does not cite a source for this spelling, and it is not in Qidan xiaozi yanjiu (1985: 375). I presume <t.i.i> is from an inscription discovered after Qidan xiaozi yanjiu was written. The <t> of <t.i.i> does not necessarily invalidate Kane's reading di for 帝 since the Khitan transcribed Liao Chinese *t as both <t> and <d>, and they transcribed Liao Chinese *i as both <i> and <i.i>.

5.9.0:33: Why is the name character 伋 glossed in English as 'deceptive' at zdic.net?

5.9.0:49: Kane (2009: 181) also lists a second Khitan large script character ⿰歹卞 for 聖 'sage' with 歹 'bad' on the left instead of 夕 'evening' from Liu and Wang (2004: 27, character 150). That character has no Unicode lookalike; it is character 0177 in N4631 ("Proposal on Encoding Khitan Large Script in UCS") which does not seem to list 𫝢 from Kane (2009: 183). Where is 𫝢 attested? Regardless of whether 𫝢 is an error for ⿰歹卞 and hence not a real Khitan large script character, I have no doubt that ⿰歹卞 is a variant of the Chinese character 𫝢 and is a phonetic loan for 聖 'sage'.

I also think that 𫝢 / ⿰歹卞 <shing> may have been the inspiration for the vaguely similar Tangut character

𗼃

2shen3 'sage'

whose Tangraphic Sea analysis has been lost.

5.9.22:31: Are Khitan large script characters

1054 (升 + a dot on the right)

1056 (1054 with the first stroke 丿 stretching over both vertical strokes of 廾 plus a dot on the right)

in N4631 further variants of 𫝢 / ⿰歹卞 <shing>?

5.10.1:49: Chinggeltei's 關於契丹文字的特點 (1997: 110) includes 𫝢 in its list of Khitan large script characters.


18.5.7.23:59: OBLIQUE AFFRICATES IN CHINESE

Today on Wikipedia I saw that standard Mandarin 斜 xie [ɕjɛ] 'oblique' corresponded to Lower Yangtze Mandarin

colloquial [tɕia]

literary [tɕiɪ]

with affricate initials. The colloquial reading preserves an earlier -a going all the way back to Old Chinese; the literary reading has an innovative raised vowel [ɪ].

The dictionary Middle Chinese initial is *z-. Other dialects of Middle Chinese might have had *dz-. In any case, the Old Chinese word began with *sɯ-, though what was between that *sɯ- and *-a is not clear: *sɯ.ɢa, *sɯ.ja, and *sɯ.la are all possible. There is no known external comparison that could narrow down the possibilities. The character 斜 has the phonetic 余 *Cɯ.la, but the character 斜 dates from Han times, and at that point *ɢ, *j, and *l might have already merged into *j. (邪 'slant' - a homophone of 斜 in Middle Chinese - may be a pre-Han spelling of the same word. But its phonetic 牙 has a velar nasal initial *ŋ-!)

My hypothetical Middle Chinese *dz- might be from *sɯ.ɢ- > *s.ɢ- > *zɢ- > *zd- > *dz-. But it's more likely that it results from a Late Old Chinese or Middle Chinese confusion of *z- with *dz-. Japanese merged *z- and *dz- into /z/ which is now [dz] initially, [z] medially, and [ddz] when geminated.

Xiaoxuetang reports affricate initials in 斜 in

Mandarin: 天長 Tianchang [tsʰ] (the sole Mandarin example on the site)

Wu: 丹陽 Danyang [dʑiɑ] ~ [dʑiɒ], etc.

(Hui: no data; NB: this 徽 Hui is not the Mandarin-speaking Muslim 回 Hui, whose name is pronounced with a different tone)

Gan: 湖口 Hukou [dʑia], etc.

Xiang: 雙峰 Shuangfeng [dʑio], etc.

Min: 廈門 Amoy [tsʰia] (colloquial; literary [sia]), etc.

Yue: Cantonese [tsʰɛ] (where long ago I first observed this affricate initial corresponding to Middle Chinese *z-; I didn't know such an initial was in Mandarin too)

Ping: 永福 Yongfu [tsʰiə], etc.

Hakka: 梅縣 Meixian [tsʰia] (colloquial; literary [sia]), etc.

The affricate initial is represented in nearly every branch. No Jin variety on that website has an affricate reading. But all but one of the unclassified varieties have an affricate initial.

It seems that literary varieties of Middle Chinese kept *z- (> modern [s]) apart from *dz- while colloquial varieties merged them to various extents.

5.8.13:40: For comparison, let's see if the above dialects also have affricates for Middle Chinese 徐 *zɨə 'to walk slowly; a surname':

Mandarin: 天長 Tianchang [tʃʰʮ], etc.

Wu: 丹陽 Danyang [dʑyz] (sic), etc.

Hui: 旌德 Jingde [tsʰʮ], etc.

Gan: 湖口 Hukou [dzi], etc.

Xiang: 雙峰 Shuangfeng [dy] (sic) ~ [dʑy], etc.

Min: 廈門 Amoy [tsʰi] (colloquial; literary [su]), etc.

Yue: Cantonese [tsʰœy], etc.

Ping: 永福 Yongfu [tsʰy], etc.

Hakka: 梅縣 Meixian [tsʰi], etc.

The only Jin variety with a reading is the most well-known: 太原 Taiyuan [ɕy]. 徐 is a common surname, so it must be in other Jin varieties. The absence of affricates in Jin readings of 斜 'oblique' makes me guess that 徐 also lacks affricates in the rest of Jin, but I don't know.

The unclassified varieties have a mix of initials: e.g.,

富川 Fuchuan [sy]

鍾山 Zhongshan [θy]

賀州 Hezhou [ty] (cf. the stop [d] in Shuangfeng above)

道縣 Daoxian [tso]

連州 Lianzhou [tsʰɛu]

To work out what's going on with them would require studies of their individual phonologies. It is a shame that Xiaoxuetang doesn't seem to have initial, rhyme, and tonal inventories online for each variety. In theory I could extract inventories from the data, but I don't have the time to do that right now.


18.5.6.23:40: HAVE A ČĪZBURGERU: ENGLISH BORROWINGS IN LATVIAN

After mentioning Latvian datums last time with its combination of a Latin neuter suffix -um and a Latvian masculine suffix -s, I was curious to see how Baltic languages dealt with a recent influx of English loans. Baltic languages and Greek are the only modern Indo-European languages I know of that still retain ancient -s suffixes in the nominative case.

I guessed that all Latvian borrowings of English consonant-final stems would be placed in the first masculine declension like datums. And it does seem that is generally the case. See these two lists. Even sibilant-final stems are assigned to that declension: e.g., bizness (which is biznes-s and not copying the -ss of the English spelling) and finišs (< finish + -s). I might have expected them to be assigned to the second declension with -is or the third declension with -us.

The exceptions I've seen so far end in -er in English:

adapteris < adapter

menedžeris < manager

peidžeris < pager

porteris < porter

taimeris < timer

Were they assigned to the second declension by analogy with some earlier wave of -eris loans?

Not all English -er words become -eris words in Latvian: cheeseburger has become čīzburgers (with an un-English pronunciation of burger with [u] - †čīzberger would have been closer to the English original). Maybe -burger is by analogy with hamburgers, perhaps in turn influenced by Russian <gamburger>, also with [u]? No, maybe -burger is simply based on a spelling pronunciation.


18.5.5.23:59: THE GENDER OF 'DATE' IN BALTO-SLAVIC AND ROMANCE

On the same Wiktionary page as Dutch datum 'date' (masculine despite its Latin neuter ending -um!) are

Czech datum (neuter); cf. Slovak dátum (masculine; why a long á that doesn't match Czech or Latin?; its neighbor Hungarian dátum also has a long vowel)

Serbo-Croatian and Slovene datum (masculine)

Macedonian <datum> is also masculine. The shift to masculine in Slavic is understandable since consonant-final nouns are generally masculine, and Latin -um is not a Slavic suffix and hence prone to reinterpretation as the ending of a stem.

Leaving Slavic, Latvian has no neuter, and its feminine stems generally end in vowels, so masculine datums is also understandable.

However, Latvian's sister Lithuanian has feminine data (which looks like the Latin plural!) rather than masculine †datumas (see Wikipedia on Lithuanian declension).

And going back to Slavic, Polish also has feminine data, and Bulgarian, Macedonian, Belarusian, Russian, and Ukrainian have feminine <data>. Romance languages have feminine data (French date and Romanian dată) too. Wiktionary derives the Romance forms from a Late Latin data. fdb explains:

Italian, Spanish, Portuguese (etc.) data, and French date (whence English date) are all taken from Mediaeval Latin data, the plural of classical Latin datum, but reinterpreted in these languages as a singular noun. German and Dutch use the classical singular form datum.

All of these are bookish borrowings from Mediaeval or Classical Latin (so-called cultisms) and not organic descendants of the Latin words.

[Someone asks what organic descendants would look like.]

In that case one would expect *dada in Spanish, Portuguese and Italian.

Are the -um forms in Slavic and Latvian borrowings from German Datum?

5.6.0:01: English date then got borrowed into German as das Date which is presumably neuter by analogy with Datum.

5.6.0:09: Added quotation from fdb.

5.6.0:28: Danish date from English has common gender (cf. German above).

5.6.0:32: Added Romanian dată.


18.5.4.23:59: THE GENDER OF DUTCH '-ISM'S AND 'DATE'

Not in time for May Day ...

French communisme is masculine, as is its Latinized German equivalent Kommunismus with a restored Latin masculine nominative singular ending -us. So why is Dutch communisme (and other -isme words like socialisme) neuter?

Conversely, datum has a Latin neuter nominative singular ending -um and is still neuter in German. So why is Dutch datum masculine unlike, say, neuter museum which is still neuter in Dutch?

Are the genders by analogy with semantically similar words? Was there ever a time when de communisme and het datum were acceptable?

5.5.0:33: Google Books has examples of het datum from the 18th and 19th centuries. But I can't find any examples of de communisme in Dutch (as opposed to French where that is a preposition-noun sequence rather than a definite article-noun sequence).

Treffers-Daller (1994: 140) discusses French-Dutch gender mismatches and mentions Van Marle's hypothesis that French borrowings are marked and may receive the marked gender: the less frequent neuter gender (only 25% of Dutch nouns are neuter according to Tuinman 1967).

She also writes,

According to Volland (1986), many French loans obtain neuter gender when borrowed into German. About 60 percent of the borrowings keep the original gender in German, and 40 percent are allocated another gender. In most cases it is the masculine nouns who become neuter in German. It is remarkable that the same tendency for masculine words to become neuter exists in German and in Dutch.

Obviously Kommunismus is not one of those masculine words (though its -us may have made it resistant to gender shift).


18.5.3.23:59: CZECH VOWEL ASYMMETRY AGAIN

Judging from the IPA for Czech at Wikipedia, Czech vowels are phonetically as well as distributionally asymmetrical:

/iː/ [iː]                      /u uː/ [u uː]
/i/ [ɪ]
                               /o oː/ [o oː]
/e eː/ [ɛ ɛː]
               /a aː/ [a aː]

The front part of the system 'tilts downward' with the exception of /iː/ which is high.

Short /i/ is lower than long /iː/ and has no back counterpart at the same height.

/e eː/ are lower than /o oː/.

How did this system come about? /i iː/ are from earlier front *i *iː and central *ɨ *ɨː.

Was there a Ukrainian-like phase in which the central high vowels became *ɪ *ɪː? (Ukrainian has no phonemic vowel length, though.) The four front vowels in stage 2 then merged into an English-like subsystem with a higher long vowel and a lower short vowel in stage 3:

Stage 1: *i *ɨ | *iː *ɨː
Stage 2: *i *ɪ | *iː *ɪː
Stage 3: [ɪ] | [iː]
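
The three stages can be restated as a small Python sketch. This is only my hypothesis in runnable form, not an established chronology; the names STAGE_1_TO_2, stage_2_to_3 and derive are made up for illustration.

```python
# Hypothetical model of the three-stage merger proposed above.
# Stage 1 -> Stage 2: central high vowels front to *ɪ *ɪː (the "Ukrainian-like phase").
STAGE_1_TO_2 = {"i": "i", "iː": "iː", "ɨ": "ɪ", "ɨː": "ɪː"}


def stage_2_to_3(vowel: str) -> str:
    """Stage 2 -> Stage 3: merge by length, long > [iː] and short > [ɪ]."""
    return "iː" if vowel.endswith("ː") else "ɪ"


def derive(vowel: str) -> str:
    """Run a stage 1 vowel through both hypothetical stages."""
    return stage_2_to_3(STAGE_1_TO_2[vowel])


for v in STAGE_1_TO_2:
    print(f"*{v} > *{STAGE_1_TO_2[v]} > [{derive(v)}]")
```

Four stage 1 vowels go in and only two qualities come out, with height depending on length alone.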

Unlike Czech, Slovak is next door to Ukrainian, and according to the IPA at Wikipedia it has no [ɪ]; its vowel system is truly symmetrical on the phonetic level if one ignores the increasingly marginal vowel [æ]:

[i iː]          [u uː]
[e eː]          [o oː]
        [a aː]

The Slovak phonology article at Wikipedia, however, paints a more complex picture: e.g., /e eː/ [e̞ e̞ː] may be phonetically higher than /o oː/ [ɔ̝ ɔ̝ː] - the reverse of Czech. (Did the presence of low [æ] - a vowel absent from Czech - incentivize speakers to raise /e eː/ for greater contrast during its heyday in the past?) Nonetheless it seems that length is not correlated with height differences, unlike in Czech, where short and long /i/ have different heights.

Like Czech /i iː/, Slovak /i iː/ are from earlier front *i *iː and central *ɨ *ɨː. So I suspect Slovak also had a Ukrainian-like phase in which the central high vowels became *ɪ *ɪː.

But maybe at some earlier point Czech and/or Slovak had a Rusyn-like stage in which central *ɨ *ɨː coexisted with front *ɪ *ɪː. I still don't understand how Rusyn can have both central /ɨ/ and front /ɪ/ since I assume both are from *ɨ. Are they in complementary distribution? Is one native and one borrowed?

5.4.0:40: Are Czech /e eː/ lower mid because they merged with */ě/ *[ɛː]? */ě/ was historically long, but its reflexes in Czech are both long and short for reasons I don't understand:

*bělъjь > bílý /biːliː/ 'white'

*světъ > svět /svjet/ 'world'

The short reflex is /e/ which may be preceded by a secondary palatal consonant: e.g., /j/ in the case of /svjet/.


18.5.2.23:55: CZECH VOWEL ASYMMETRY

Having written about Slavic and vowels in my last two entries, I'm going to combine the two topics.

The standard Czech vowel system appears symmetrical if one only looks at vowels in isolation. Each short vowel has a long counterpart:

/a/ /i/ /u/ /e/ /o/
/aː/ /iː/ /uː/ /eː/ /oː/

And the diphthongs form a triangle:

/eu/        /ou/
      /au/

But distribution tells a more complex story.

Original *uː became /ou/ except "chiefly in noun prefixes" (Short 1993: 456): e.g., úraz 'injury' but urazit 'to injure'. Why was the prefix *u lengthened to an *uː that was later preserved in nouns? I still don't understand the backstory of length in Slavic.

Original *oː became uo and then a new /uː/ written <ů> (which I think of as <o> atop <u>); cf. Polish <ó> /u/ and Slovak <ô> /uo/ from earlier *oː. (I'd like to see a chronology of *oː-shifts in West Slavic.)

Loanwords supplied a new /oː/ and /au eu/ to balance /ou/.
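
A rough Python sketch of the two back vowel shifts just described, assuming the noun-prefix exception can be reduced to a simple flag (it can't really; Short says "chiefly"). The function name back_vowel_reflex and its noun_prefix parameter are made up for illustration.

```python
def back_vowel_reflex(proto: str, noun_prefix: bool = False) -> str:
    """Return the modern Czech reflex of an original long back vowel.

    Simplified assumption: *uː survives only when noun_prefix is True.
    """
    if proto == "uː":
        # *uː > /ou/, except "chiefly in noun prefixes"
        return "uː" if noun_prefix else "ou"
    if proto == "oː":
        # *oː > uo > new /uː/, written <ů>
        return "uː"
    # Other vowels are outside the scope of this sketch.
    return proto


print(back_vowel_reflex("uː"))
print(back_vowel_reflex("uː", noun_prefix=True))
print(back_vowel_reflex("oː"))
```

Neither branch yields /oː/, which is why loanwords had to supply the modern long /oː/.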

Those back vowel developments did not have exact front vowel parallels. *iː did not become †/ei/ (though Short 1993: 464 reports ý /ɨː/ > /ej/ in colloquial Czech), and *eː only sometimes became /iː/ (Short 1993: 464).


Tangut Yinchuan font copyright © Prof. 景永时 Jing Yongshi
Tangut character image fonts by Mojikyo.org
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2018 Amritavision