Looking at the many Chữ Nôm spellings of Vietnamese trai ~ giai < *plaːj 'boy' at nomfoundation.org made me realize how useful it would be to have a Chữ Nôm dictionary with characters organized by chronology and geography. Even without any dated manuscripts on hand, I can make guesses about the ages of spellings by the sound changes that they reflect. (See my last post on the history of Vietnamese *pl-.) The spellings fall into five strata listed below in approximate chronological order. (The dating of the fifth category is uncertain.)

1. *p-l-spellings reflecting *plaːj

2. *l-spellings which may reflect *plaːj or *tlaːj (i.e., postdate the shift of *pl- to *tl-)

3. a *t-l-spelling which postdates the shift of *pl- to *tl- and the shift of *s- to *t-

4. a tr-spelling which postdates the merger of *tl- and *ʈ- as tr-

5. gi-spellings which postdate the shift of *kj- to gi-; they may reflect *CV.pl-, or, more likely, an alternate (dialectal?) development of *pl- as gi-.
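The five strata can be restated as a small lookup keyed on the proto-initials of a spelling's phonetic components. This is only a minimal sketch: the function name `stratum` and the encoding of initials as plain strings are assumptions made here for illustration, and it covers only the patterns in the list above.

```python
# Hypothetical helper: map the proto-initials of a spelling's phonetic
# components (in graphic order) to one of the five strata listed above.
STRATA = {
    ("p", "l"): 1,  # *p-l-spellings reflecting *plaːj
    ("l",): 2,      # *l-spellings: *plaːj or *tlaːj
    ("s", "l"): 3,  # *t-l-spelling: after *pl- > *tl- and *s- > *t-
    ("ʈ",): 4,      # tr-spelling: after the merger of *tl- and *ʈ-
    ("kj",): 5,     # gi-spellings: after *kj- > gi-
}

def stratum(initials):
    """Return the stratum number, or None if the pattern is not listed."""
    return STRATA.get(tuple(initials))
```

For example, `stratum(["p", "l"])` gives 1 and `stratum(["kj"])` gives 5; any pattern outside the list returns `None` rather than a guess.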

Graph Semantic Phonetic 1 Phonetic 2

none ba < *p- lai

nam 'man'
lai none
⿱司來 none
< *s-

trai < *ʈ- none

giai < *kj-
nam 'man'
giai distorted as 隹 chuy¹
⿰男皆 giai < *kj-

All of the above are listed under the reading trai at nomfoundation.org (including 佳 despite the fact that the Sino-Vietnamese reading of 佳 is giai) except for the last two which are listed under the reading giai.

There is no graphic evidence for *CV.plaːj which I thought might be the source of giai: i.e., there are no characters with a phonetic component corresponding to my hypothetical presyllable. In theory, a sesquisyllable like *kV.plaːj could have been written as a combination of *kɤː and *paːj characters: e.g., ⿰居拜.

There is also no comparative evidence within Vietic for *CV.plaːj as opposed to *plaːj. I found these forms outside Vietic at the Mon-Khmer Comparative Dictionary:

Katuic branch:

Katu (An Diem) mblɑːj 'unmarried man' (Costello 1971)

Palaungic branch:

Lawa (Mae Sariang) [kuan] mblia, mbluai 'young man' (Shorto 2006)

Is the [pi] in Lawa (Bo Luang) [pi]-plia 'young man' (Shorto 2006) reduplicative?

Old Mon blāy 'young man' (Shorto 1971) is the same word, albeit without any element before bl-.

Shorto (2006) reconstructed a Mon-looking Proto-Mon-Khmer *blaːj 'young man' which I am tempted to revise as *m.blaːj on the basis of Katu and Lawa. But perhaps there is another explanation for the shared m- in Katu and Lawa. It may be significant that Sidwell (2005, 2010) did not reconstruct this word in Proto-Katuic or Proto-Palaungic.

But I doubt Vietnamese trai ~ giai is related to those non-Vietic words because it does not go back to *b- ... -j; its tone points to voiceless *p-, and its -i is from *-l which is still preserved in more conservative Vietic languages (look up 'man' at the Mon-Khmer Comparative Dictionary for examples).

There is no Chữ Nôm evidence for *-l in the spellings of 'boy' or any other words, which leads me to think that the change of *-l to *-j was complete before Vietnamese was first written. Otherwise I would expect *-l words to be at least sporadically spelled with *-w, *-t and/or *-n phonetics. (Chinese had long ago lost its *-l by the time Vietnamese was first written, so the only options for writing Vietnamese *-l were *-j, *-w, *-t, or *-n.) *-w seems unlikely since *-l must have been palatal [ʎ] or palatalized [lʲ] before becoming *-j. I cannot yet dismiss the possibility that *-l was consistently written with *-j phonetics. If Chữ Nôm reflects some feature lost before *-l became *-j, then I would have no choice but to assume that was the case.

7.1.21:52: Wiktionary has the following etymology:

From Proto-Vietic *p-laːl, from Proto-Mon-Khmer *bplaaj, an infixed form of *blaaj (“young man”); cognate with Mon ဗၠဲာ (plai, “bachelor, unmarried male past age of puberty”).

*bp- explains the *voiceless tone and even fits my hypothesis of gi- coming in part from *CV.pl-: I could even claim that Lawa (Bo Luang) [pi]-plia preserves the original presyllable. So maybe:

*bil- > *bi-p-l- (infixation; *voiceless tone on stressed syllable) > *bi.βj- > *CV.ʑ- > gi-

But the problem of Vietic having *-l instead of *-j remains, unless one reconstructs a Proto-Austroasiatic *-ʎ to account for the mismatch in codas.

* *-j

I would rather not reconstruct *-ʎ to save a single troubled etymology, though. And I think it's still too early to reconstruct Proto-Austroasiatic.

As for the presyllable, it is unnecessary if gi- is regarded as a dialectal development:

The sound change from *p-l and *b-l to ‹gi› is a regular sound change in Northern Vietnamese; compare giồng, giầu, giời and giun.

trai would then have to be a southern loanword in northern Vietnamese coexisting alongside native giai.

¹隹 chuy 'bird' makes no sense from a semantic or phonetic perspective. At first I thought it might be an abbreviation of 雄 hùng 'male' in 𪟦, but it's more likely to be a reshaping of the phonetic 佳 giai. 隹 is a frequent right-hand element of Chinese characters whereas 佳 is not a right-hand element in Chinese and is not common in that position in Chữ Nôm. (I don't know of any other Chữ Nôm characters with the structure ⿰X佳.)

FRUITS OF THE SKY

I thought Pittayaporn's Proto-Tai cluster *ɓl- was unusual until I remembered that Middle Vietnamese had /ɓl/ in words such as blái /ɓláːj/ 'fruit' whose Chữ Nôm spelling 𢁑¹ is from 巴 ba + 賴 lại. That /ɓl/ is not very old; it goes back to Proto-Vietic *pl-, and some Vietic languages preserve the original *p- to this day: e.g., Ruc pəlíː 'fruit' (Phu 1998). Vietnamese shifted *p- to b- /ɓ/ both before vowels and before *l-.² The modern Vietnamese word for 'fruit' is trái (northern /cáːj/, central-southern /ʈáːj/). nomfoundation.org lists a variant lái - is this current, and if so, where?

Another source of Middle Vietnamese /ɓl/ is *bl-: e.g., Middle Vietnamese blời 'sky' (the huyền tone written with a grave accent points to a voiced proto-initial). Cf. Ruc pləːj < *b- 'sky' (Phu 1998).

As one can see at the Wiktionary entry for blời, its modern reflexes are trời and giời. tr- is the initial I'd expect since *Cl- clusters normally merge as tr-. gi-, on the other hand, has long puzzled me since it is normally from presyllable-palatal sequences and *kj-.

It just occurred to me that trời is from 'bare' *blơi whereas giời may be from *CV.blơi with a presyllable whose vowel conditioned the lenition of the following consonant which became gi- (possibly a voiced palatal stop [ɟ] in Middle Vietnamese).  In Hanoi, retroflex tr- /ʈ/ < *Cl- became palatal ch- /c/. So perhaps *CV.bl- similarly became gi- which was palatal [ɟ] in Middle Vietnamese:

*CV.bl- > *CV.βj- > *CV.ʑ- > gi-

or if a prefix were added to *b- after it devoiced to *p- and imploded to *ɓ-:

*bl- > *pl- > *ɓl- > *CV-ɓl- > *CV-ɓj- > *CV-ʄ- > gi-

(*ʄ is a palatal implosive stop. Cf. the palatal stop pronunciation [ɟ] of gi- in Vinh, Thanh Chương, and Hà Tĩnh.)

Similar changes could be proposed for *(CV.)pl-:

*CV.pl- > *CV.bl- > *CV.βj- > *CV.ʑ- > gi-

*pl- > *ɓl- > *CV-ɓl- > *CV-ɓj- > *CV-ʄ- > gi-

*(CV.)pl-words would have upper register tones (ngang, sắc, hỏi) conditioned by *voiceless initials, whereas  *(CV.)bl-words would have lower register tones (huyền, nặng, ngã) conditioned by *voiced initials: e.g.,

*CV.pla, *CV.plaʔ, *CV.plah > gia, giá, giả 


*CV.bla, *CV.blaʔ, *CV.blah > già, giạ, giã
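The tone split above amounts to a two-way lookup: the voicing of the main-syllable initial picks the register, and the coda (none, *-ʔ, *-h) picks the tone within it. A minimal sketch follows, assuming proto-forms are written as in the examples above; the function name `tone_of` and the small set of voiced initials are assumptions for illustration only.

```python
# Tone as a function of (voicing of main-syllable initial, laryngeal coda).
TONES = {
    ("voiceless", ""): "ngang",
    ("voiceless", "ʔ"): "sắc",
    ("voiceless", "h"): "hỏi",
    ("voiced", ""): "huyền",
    ("voiced", "ʔ"): "nặng",
    ("voiced", "h"): "ngã",
}

# Assumption: a tiny set of voiced initials, enough for the examples here.
VOICED_INITIALS = {"b", "d", "g", "ɟ"}

def tone_of(proto):
    """Predict the Vietnamese tone of a proto-form such as '*CV.blaʔ'."""
    main = proto.lstrip("*").split(".")[-1]  # drop asterisk and presyllable
    voicing = "voiced" if main[0] in VOICED_INITIALS else "voiceless"
    coda = main[-1] if main[-1] in ("ʔ", "h") else ""
    return TONES[(voicing, coda)]
```

With this, `tone_of("*CV.plah")` gives hỏi (giả) and `tone_of("*CV.blaʔ")` gives nặng (giạ), matching the two rows above.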

There are two major problems with those accounts.

First, I don't know of any comparative evidence for a presyllable in 'sky'. If there were such a presyllable, it might have to be a prefix that was a Vietnamese-internal innovation. Maybe the development of *pl-/*bl- to gi- paralleled that of *kj- to gi-:

*pl- > *pj- > *bj- > gi-

*kj- > *gj- > gi-

Second - and this applies to my no-presyllable solution as well - I don't know of any other instance of *-l- becoming *-j-, though perhaps such a stage could bridge *Cl- and Hanoi ch- /c/:

*Cl- > *tl- > *tj- > ch- /c/

Dialects that have retroflex reflexes of *Cl- had no *-j-stage:

*Cl- > *tl- > *tr- > tr- /ʈ/

l- ~ nh- /ɲ/ variation in Vietnamese (Thompson 1987: 70) suggests that *l may have once been palatal [ʎ]: e.g., hai mươi lăm ~ northern hai mươi nhăm 'twenty-five' (but năm 'five' and mươi lăm 'fifteen', not †mươi nhăm).

I don't know of any modern Vietnamese dialects that have labial reflexes of Middle Vietnamese /ɓl/, but I know almost nothing about Vietnamese dialects. If not for Middle Vietnamese or more conservative Vietic languages, I wouldn't be able to reconstruct a labial initial in 'fruit' or 'sky'.

The reflexes³ of Middle Vietnamese /ɓl/ -

Vinh, Thanh Chương, Hà Tĩnh
Huế, Saigon

(IPA from the Wiktionary entries for giời and trời.)

- are quite different from those of Proto-Tai: /bl bj b mj m ɗ d l n/. In fact none match!

(Mostly written 18.6.18; revised, expanded, and finished 18.6.29.)

¹At nomfoundation.org, there are other spellings, all with two components:

Graph Semantic Phonetic 1 Phonetic 2
⿰來巴 none lai ba
⿱巴乃 quả 'fruit' ba nãi
𣛤 lai none
𣡙 lại
𧀞 lại

The use of an n-phonetic 乃 nãi for a syllable with /l/ makes me wonder if *pn- merged with *pl-. After such a merger, one might spell *pn- (now *pl-) words with both n- and l-phonetics, and one might also spell *pl-words such as 'fruit' with both types of phonetics.

²But not *-r-: *pr- in 'squirrel' became s- [ʂ] (sóc), not †br- [ɓr]. Khmer កំប្រុក <kaṁpruk> 'squirrel' retains the original cluster (Gage 1985: 506). This could imply that

*pr- > *pr̥- > *pʂ- > s- [ʂ]

predated *p- to b- [ɓ]: i.e., that *p- in *pr- was lost before it could implode to †ɓ-. Then again, those are a lot of steps, and perhaps *pr- was also subject to implosion:

*pr- > *ɓr- > *ɓʐ- > *ʐ- > s- [ʂ].

³Strictly speaking, gi- is not a reflex of /ɓl/ if I am right about it being from *CV.bl- rather than from /ɓl/ < *pl-/*bl-.

⁴[ɟ] may be the closest living approximation of the Middle Vietnamese consonant that de Rhodes wrote as gi- in the 17th century. Cf. the Italian pronunciation of gi- as [dʒ].

RETROFLEXES FROM DENTALS IN ZHENGZHANG AND PAN'S OLD CHINESE RECONSTRUCTIONS

Middle Chinese has a series of retroflex initials

*ʈ- *ʈʰ- *ɖ- *ɳ-
*tʂ- *tʂʰ- *dʐ-

which in the West are thought to come from Old Chinese *r-clusters. Contrast these two words in Baxter's Old Chinese reconstruction and their Middle Chinese reflexes in my reconstruction:

專 OC *ton > MC *tɕwɨen 'exclusively'

傳 OC *tron-s > MC *ʈwɨe̤n 'what is transmitted'

OC *t- palatalized to MC *tɕ-, whereas OC *tr- became retroflex *ʈ-.

I would reconstruct the two words as OC *Cɯ.ton and *Rɯ.ton-s with high-vowel presyllables triggering the diphthongization of the following vowel:

Stage 1: *Cɯ.Con (presyllable present; no effect of presyllable on following syllable)
Stage 2: *Cɯ.Cuon (highness of presyllabic vowel transferred onto beginning of next vowel)
Stage 3: *Cwɨan (presyllable lost; labiality of *uo shifted to onset; *ɨa is *uo stripped of its labiality)
Stage 4: *Cwɨen (*a fronted to *e before the acute coda *-n)
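The four stages can be replayed as an ordered list of rewrite rules. The sketch below is only a toy: the regular expressions are assumptions chosen to fit the template *Cɯ.Con, and they model the vowel developments alone, not the palatalization or retroflexion of the initial.

```python
import re

# Each rule: (description, pattern, replacement), applied in this order.
RULES = [
    # Stage 2: highness of the presyllabic vowel spreads onto the next vowel.
    ("diphthongization", r"ɯ\.(.)o", r"ɯ.\1uo"),
    # Stage 3: presyllable lost; labiality of *uo moves to the onset.
    ("presyllable loss", r"^.ɯ\.(.)uo", r"\1wɨa"),
    # Stage 4: *a fronts to *e before the acute coda *-n.
    ("fronting", r"ɨan$", "ɨen"),
]

def derive(form):
    """Return the form at every stage, starting from stage 1."""
    history = [form]
    for _description, pattern, replacement in RULES:
        form = re.sub(pattern, replacement, form)
        history.append(form)
    return history
```

Calling `derive("Cɯ.Con")` reproduces the sequence *Cɯ.Con, *Cɯ.Cuon, *Cwɨan, *Cwɨen. Because the rules are ordered, swapping any two of them changes the output, which is the point of staging the changes.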

*Rɯ- fuses with the following consonant: *Rɯ.t- > *rt- > *tr- > *ʈ-.

But the basic pattern remains: *t- palatalizes, whereas a *t- plus *R- sequence results in *ʈ-.

On the other hand, Zhengzhang Shangfang and Pan Wuyun reconstructed 'what is transmitted' without *r:

Gloss: Old Chinese (Baxter & Sagart / this site / Zhengzhang and Pan) > Middle Chinese
exclusively: *ton / *Cɯ.ton / *tjon > *tɕwɨen
what is transmitted: *tron-s / *Rɯ.ton-s / *tons > *ʈwɨe̤n

The palatalization of *tj- to *tɕ- makes perfect sense. But why would *t- back to *ʈ- in 'what is transmitted' and other 'type B' syllables with short vowels in Zhengzhang and Pan's reconstructions? (Their short vowels correspond to the absence of pharyngealization in Baxter and Sagart's reconstruction and the presence of high vowels in my reconstruction.)

Old Chinese
Middle Chinese
Baxter & Sagart
This site

*Rɯ.ta *ta

*truŋ *tuŋ

to ascend
*Rɯ.tək *tɯɡ

*Rɯ.te *ʔl'e

tree root
*Rɯ.to *to

Pan's *k-l- shifting to *ʈ- is a change also found in Vietnamese.

Zhengzhang's *ʔl'- (what is *'-?) shifting to *ʈ- is a similar change.

There is no *tri in Baxter and Sagart's system or mine, so there may not be a *ti in Zhengzhang and Pan's systems.

*R could be *r- or *l-; the *R- of *Rɯ.tək 'to ascend' may have been *l- if Written Tibetan ltag-pa 'upper part' is cognate.

I don't know if Zhengzhang or Pan have a *te, but I would expect their *te to have the same Middle Chinese reflex as 'know'.

For comparison, Pan and Zhengzhang's *t- does not become retroflex in 'type A' syllables with long vowels corresponding to the presence of pharyngealization in Baxter and Sagart's reconstruction and the presence of low vowels in my reconstruction. (I use *ʌ as a symbol for an 'unknown unstressed lower vowel'.)

Old Chinese
Middle Chinese
Baxter & Sagart
This site

capital city
*ta *taː

western tribes
*Cʌ.ti *tiːl

*Cʌ.tuʔ *tuːwː

*Cʌ.tək *tɯːg

son of principal wife
*tˁek *tek

*tˁo *to

Summing up the patterns (and adding type A syllables with Middle Chinese retroflexes for completeness):

Syllable type
Old Chinese
Middle Chinese
Baxter & Sagart
This site
*ti, *Cɯ.tA
*RtI, *Rɯ.tA, *trI
*tˁr- *RtA, *Rʌ.tI, *trA
*tA, *Cʌ.tI *tVː

I use *I to symbolize the stressed higher vowel series *ə *i *u and *A to symbolize the stressed lower vowel series *a *e *o.

Zhengzhang and Pan do allow *tr-type clusters to become retroflexes. But why would simple *t-initials also become retroflexes before short vowels? I have never seen that change anywhere else, and

My guess is that Zhengzhang and Pan both observed the high frequency of retroflex initials in Middle Chinese and chose to reconstruct single dentals as their sources. Phonostatistics could be suggestive. If, for instance, the proportion of *ʈ- to *tɕ- in Middle Chinese B-type syllables were three to one, then it might make sense to reconstruct their Old Chinese sources as *t- and *tj- instead of as *tr- and *t-, since simple initials are normally more common than clusters. (Whether short vowels make sense as a conditioning factor for retroflexion is another matter.)
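The phonostatistic check described above is just a ratio of initial counts. Here is a minimal sketch, assuming one already has a list of Middle Chinese initials for type B syllables; the function name `initial_ratio` is an assumption, and the sample list is a placeholder mirroring the hypothetical three-to-one case, not real data.

```python
from collections import Counter
from fractions import Fraction

def initial_ratio(initials, numerator="ʈ", denominator="tɕ"):
    """Ratio of syllables with one initial to syllables with another."""
    counts = Counter(initials)
    if counts[denominator] == 0:
        raise ValueError(f"no syllables with initial {denominator!r}")
    return Fraction(counts[numerator], counts[denominator])

# Placeholder input only: three retroflex initials for every palatal one.
sample = ["ʈ", "ʈ", "ʈ", "tɕ"]
ratio = initial_ratio(sample)
```

A ratio well above one would be the kind of result that might motivate reconstructing the more frequent initial from a simple consonant; whether that is the right inference is a separate question.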

