After uploading my last post, I realized that I could have transliterated Lao ຍ <ñ> and ຢ <y> in a way that more clearly indicated their origins and correspondences in Thai:

Indic consonant Lao Thai
Script Transliteration Transcription Script Transliteration Transcription
Premodern Modern Premodern Modern
ñ* (image; not in Unicode**) <ñ> ɲ <ñ> y
y <y> *y <y> *y
(none) <ỷ> (n/a) y อย <ʔy> *ʔy

The <ˀ> of ຢ <ỷ> symbolizes (1) the glottal stop of that consonant's source, Proto-Tai *ʔy, and (2) the extra length added to ຍ <y>. Could Proto-Tai *ʔy- have been an implosive stop [ʄ] like its labial and dental counterparts rather than a phonetic consonant cluster (and unit phoneme)?

I will also use ˀ in transliterations of the Thai and Lao letters for consonants derived from earlier implosives:

Indic consonant Lao Thai
Script Transliteration Transcription Script Transliteration Transcription
Premodern Modern Premodern Modern
(none) (U+0E8E and U+0E8F reserved) <ṭ̉> (d) <ṭ̉> d
<ṭ> *t (t) <ṭ> *t t
(none) <t̉> d <t̉> d
t <t> *t t <t> *t t
(none) <p̉> b <p̉> b
p <p> *p p <p> *p p

Note that in these cases, the <ˀ>-less letters are derived from the <ˀ>-letters rather than the other way around. The presence of <ˀ> in a transliteration generally indicates non-Indic origin except in <ṭ̉> which indicates non-Indic voicing in a word that originally had a voiceless retroflex stop in Sanskrit or Pali: e.g., ฎีกา <ṭ̉īkā> diikaa 'petition' from Sanskrit and Pali ṭīkā 'commentary'. (Why is ṭīkā absent from the online edition of The Pali Text Society's Pali-English Dictionary? What is the etymology of the word? The initial retroflex makes me think it's non-Indo-European unless the retroflexion is secondary.)

The Indic-based transliteration of Indic scripts allows readers to clearly see etymological connections that are lost in transcriptions of modern pronunciation: e.g., the name of HTMS (His Thai Majesty's Ship) Chakri Naruebet.

Thai script จักรีนฤเบศร
Transliteration <cakrī nṛp̉eśr>
Transcription cakkrii narɯbeet
IPA [tɕakkriː narɯbeːt]
Sanskrit cakrī 'wheel'
nṛ 'man'
peśa- 'architect'

What is the final silent ร <r> doing after นฤเบศ <nṛp̉eś> 'king' (< 'architect of men'****)? Is it based on the Sanskrit derivational suffix -ra-? There is no Sanskrit *(nṛ)peśara-. Why isn't it marked with the ทัณฑฆาต <daṇḍghāt> thanthakaat for silent letters as in the homophonous male personal name นฤเบศร์ <nṛp̉eśr̽> [narɯbeːt]?

*1.12.1:20: I am so accustomed to romanized Indic that it took me over two decades to realize that maybe the Indic palatal nasal should be romanized as ń to be consistent with palatal ś. Was the ś of Indic romanization taken from the Polish alphabet? If so, why wasn't Polish ń used for [ɲ]?

I just realized I first learned the letters ń and ź [ʑ] in 1993 from Karlgren's Chinese reconstructions. I knew nothing about Polish back then.

**1.12.1:27: Obsolete Lao letters have codepoints reserved for them in Unicode. Unfortunately, the codepoint for <ñ> is already occupied by <y>. If <ñ> is ever encoded, it may be placed somewhere in the 0EE0-0EFF range.

***1.12.2:00: Unicode has two codepoints available for Lao <ṭ̉> and <ṭ>, but I do not know if Lao ever had both letters. Wikipedia only has an image for a single obsolete Lao letter corresponding to both Thai ฎ <ṭ̉> and ฏ <ṭ>. In any case, both Lao d and t correspond to Indic ṭ:

ດີກາ <t̉īkā> diikaa 'commentary' < Sanskrit and Pali ṭīkā 'id.'

ປະຕິບັດ <paḥtip̉at> patibat 'to carry out' < Pali paṭipatti 'practice'

Did earlier Lao speakers know when to read a single letter <ṭ> as *[ɗ] or *[t], just as modern Khmer speakers know when to read Khmer ប <p> as [ɓ] or [p]?

****1.12.2:01: Is นฤเบศ <nṛp̉eś> 'king' < 'man-architect' a Sanskrit compound made in Thailand? I can't find a similar compound in Lao, Khmer, or Sanskrit.

IS THAI YUAN 'VIETNAMESE' A LOANWORD FROM LAO?

I used to think Thai ญวน <ñwn> yuan 'Vietnamese' was a direct borrowing from Khmer យួន <yuən> yuən 'id.', but the Thai spelling implies that yuan once had an initial palatal nasal *ɲ-. Was the Khmer word borrowed through a Lao intermediary between stages 4 and 5 of the following sequence?

1. Sanskrit yavana- 'Greek (i.e., Ionian) > foreigner'

2. Khmer yuən 'foreigner > Vietnamese'

3. earlier Lao *yuan 'Vietnamese'

4. later Lao ຍວນ ɲuan (after *y- > ɲ-)

5. earlier Thai *ɲuan

6. later Thai yuan (after *ɲ- > y-)

Does that sequence violate the known chronology of sound changes in Lao and Thai? Do other data indicate that Lao *y- > ɲ- predated Thai *ɲ- > y-?
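The proposed chain from stage 2 to stage 6 can be sketched as a series of string rewrites. This is only a toy illustration: the helper names `shift_initial` and `derive` are hypothetical, the adaptation of Khmer uə as Lao ua is stipulated rather than explained, and the Sanskrit-to-Khmer step is left out entirely.

```python
def shift_initial(form, old, new):
    """Replace a word-initial consonant `old` with `new`; leave other forms alone."""
    return new + form[len(old):] if form.startswith(old) else form


def derive(khmer_form="yuən"):
    """Walk a Khmer form through stages 2-6 of the proposed sequence."""
    stages = [("2. Khmer", khmer_form)]
    # Stage 3: borrowing into earlier Lao (vowel adaptation is an assumption).
    lao_early = khmer_form.replace("uə", "ua")
    stages.append(("3. earlier Lao", lao_early))
    # Stage 4: Lao *y- > ɲ-.
    lao_late = shift_initial(lao_early, "y", "ɲ")
    stages.append(("4. later Lao", lao_late))
    # Stage 5: borrowing into Thai while Thai still had ɲ-.
    thai_early = lao_late
    stages.append(("5. earlier Thai", thai_early))
    # Stage 6: Thai *ɲ- > y-.
    thai_late = shift_initial(thai_early, "ɲ", "y")
    stages.append(("6. later Thai", thai_late))
    return stages


for label, form in derive():
    print(label, form)
```

The sketch only works if the Lao shift is applied before the Thai one, which is exactly the chronological question raised above.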

This image of the premodern Lao consonant letter inventory (key here) has ຍ <ñ> and ຢ <y> (both resembling Thai ย <y>) as in the modern Lao script. I've long assumed that Lao once had a letter resembling Thai ญ <ñ> and/or Khmer ញ <ñ> that became obsolete once Lao *y- became ɲ-. (Aha, here's that letter!) Then the old <y> letter (confirmed here!) became ຍ <ñ> for both primary and secondary ɲ-, and ຢ <y> with a lengthened upper right corner was devised for a new y- that developed from old *ʔy-. But why weren't Lao *ʔy-words still written with the old <y> letter as ອຍ <ʔy>? Because the glottal stop had already been lost by that point? Thai orthography does not always indicate an original glottal stop in such words: e.g.,

Gloss | Proto-Tai (Pittayaporn 2009) | Thai script | Thai transliteration | Thai IPA | Lao script | Lao transliteration | Lao IPA
medicine | *ʔyɯə A1 | ยา (not *อยา!) | <yā> (not *<ʔyā>!) | [jaː] A1 = A2 | ຢາ | <yā> | [jaː] A1
to be in a place | *ʔyuː B1 | อยู่ | <ʔyū1> | [juː] B1 | ຢູ່ | <yū1> | [juː] B1
to roast | *ʔyɯəŋ C1 | ย่าง (not *อย้าง!) | <yāṅ1> (not *<ʔyāṅ2>!) | [jaːŋ] B2 = C1 | ຢ້າງ | <yāṅ2> | [jaːŋ] C1

(I have rewritten Pittayaporn's reconstructions for consistency with the notation in the rest of this post.)

The letters A-D indicate tonal classes corresponding to <zero>, <1>, <2>, and <zero> in transliteration. There are two subtypes of the D tone for syllables ending in stops: DS (with short vowels) and DL (with long vowels). 1 indicates a proto-voiceless initial (e.g., *ʔy-); 2 indicates a proto-voiced initial (e.g., *y-).

My impression is that original *ʔy- is written as ย <y> in Thai whenever *ʔy- and *y- have identical tonal reflexes: e.g., [jaː] < *ʔyɯə A1 'medicine' has the tone that it would have also had if it had been from *yɯə A2, and [jaːŋ] < *ʔyɯəŋ C1 'to roast' has the tone that it would have also had if it had been from *yɯəŋ B2. So those two words are spelled as if they originally had initial *y-.

However, [juː] < *ʔyuː B1 'to be in a place' has a low tone that never developed after *y-, so it retains a distinctive spelling with อย- <ʔy->. Similarly, Thai [jaːk] 'to want' < *ʔyɯək DL1 'to be hungry' also has that low tone and is accordingly written อยาก <ʔyāk>. (Compare อยาก <ʔyāk> with ยาก <yāk> [jaːk] < *y- DL2 'to be difficult' [not in Pittayaporn but reconstructed by Jonsson and Li for Proto-Southwestern Tai] which has a falling tone.)
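That impression can be stated as a small rule. The sketch below is hypothetical: the table `THAI_REFLEX` covers only the tone categories mentioned in this post, its values are descriptive labels rather than phonetic claims, and the function name `predicted_initial_spelling` is invented for illustration.

```python
# Thai reflex groups for the tone categories discussed here.
# Categories sharing a label have identical reflexes in Thai.
THAI_REFLEX = {
    "A1": "A1=A2", "A2": "A1=A2",
    "C1": "B2=C1", "B2": "B2=C1",
    "B1": "low", "DL1": "low",
    "DL2": "falling",
}


def predicted_initial_spelling(tone):
    """Predict the Thai spelling of Proto-Tai *ʔy- for a series-1 tone category."""
    if not tone.endswith("1"):
        raise ValueError("*ʔy- is a proto-voiceless initial, so the tone must be series 1")
    # Reflexes that can also arise after proto-voiced *y- (series 2).
    reflexes_after_y = {r for t, r in THAI_REFLEX.items() if t.endswith("2")}
    # If the tone could equally have come from *y-, the plain spelling is used.
    return "ย" if THAI_REFLEX[tone] in reflexes_after_y else "อย"


for tone in ("A1", "B1", "C1", "DL1"):
    print(tone, predicted_initial_spelling(tone))
```

On this rule, A1 and C1 words get ย, while B1 and DL1 words keep อย, matching the four examples above.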

As far as I know, all Thai words written with อย- <ʔy-> have a low tone (B1 or DL1). The reflexes of tones *B1 and *DL1 are identical in Thai.

GOOD ARTEMISIA DRAGONS: PROTO-HMONG-MIEN EVIDENCE FOR OLD CHINESE PREINITIALS?

Modern Old Chinese reconstructions are often characterized by a wealth of complex preinitial elements that are controversial (and hence avoided by Schuessler 2009 which attempted to present a lowest common denominator version of Old Chinese). Hypotheses about these elements can be tested by examining early Chinese loanwords in neighboring languages.

Solnit (1996: 15) listed three of Ratliff's (1995) Proto-Hmong-Mien reconstructions of potential loanwords from Chinese (or vice versa). I have added a later reconstruction of Ratliff's from ABVD, since I do not have access to her 2010 book. (I don't have US$4,269 to buy a used copy.)

Gloss Ratliff's Proto-Hmong-Mien Sinograph Old Chinese Middle Chinese
1995 2010 Pan Wuyun Zhengzhang 2003 Baxter and Sagart 2011 This site
dragon *g-roŋ A ? *[g]roŋ *b·roŋ *[mə]-roŋ *mɯ-roŋ *luoŋ
artemisia *kh-ron B *[g]roː, (*[g]ro) *roː, (*ro) (*rˁo, *ro) *(Cɯ-)ro *ləw, *luə
good *k-rVŋ C *-ʔrɔŋH *raŋ *raŋ *[r]aŋ *Cɯ-raŋ *lɨaŋ

I reconstruct high-vowel presyllables in Old Chinese to condition the partial raising of nonhigh vowels in Middle Chinese:

OC *C1ɯ-C2o > MC *C2uo- (*C2uə in open syllables)

OC *C1ɯ-C2a > MC *C2ɨa- (*C2 in open syllables)
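The two raising rules can be checked against the "This site" forms in the table with a short sketch. Everything here is a simplification for illustration: the function name `oc_to_mc` is hypothetical, *r > *l is simply read off the Middle Chinese column, the open-syllable outcome uə is taken from the 'artemisia' row, and open syllables in *a are not modelled.

```python
RAISED_CLOSED = {"o": "uo", "a": "ɨa"}  # outcomes before a coda
RAISED_OPEN = {"o": "uə"}               # attested by *Cɯ-ro > *luə only


def oc_to_mc(oc):
    """Map a simplified OC form with a high-vowel presyllable to its MC form."""
    presyllable, _, main = oc.rpartition("-")
    if not presyllable.endswith("ɯ"):
        raise ValueError("only forms with a high-vowel presyllable are modelled")
    initial, vowel, coda = main[0], main[1], main[2:]
    if initial == "r":
        initial = "l"  # OC *r- corresponds to MC *l- in the table
    table = RAISED_CLOSED if coda else RAISED_OPEN
    if vowel not in table:
        raise ValueError("no raising rule modelled for this vowel and syllable type")
    return initial + table[vowel] + coda


for form in ("mɯ-roŋ", "Cɯ-ro", "Cɯ-raŋ"):
    print("*" + form, ">", "*" + oc_to_mc(form))
```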


I don't know of any Chinese-internal reason to favor Pan's *g- over *m- for 'dragon'.

Zhengzhang's *b· reminds me of Written Tibetan Hbrug < *mbruk 'dragon'. If Written Tibetan *mbruk is from *mruk, then maybe the Old Chinese (and even the Proto-Sino-Tibetan?) word for 'dragon' should be reconstructed with *m-. (The correspondence of WT *-k to Old Chinese *-ŋ has parallels elsewhere: e.g.,

'new': Tangut 1siw < *sik, Written Burmese သစ် <sac> < *sik : Old Chinese 新 *sin < *siŋ.)

Baxter and Sagart's *m- is likely given an alternate reading of 龍 that Baxter and Sagart reconstruct as *mˤroŋ (= my *mroŋ). Also, *m- matches Thai มะโรง maroːŋ 'year of the dragon'. Thai doesn't allow presyllables with non-a vowels, so ma- is not necessarily evidence against reconstructing *mɯ- in Old Chinese.

The Hmong-Mien word may share a root with Old Chinese plus a different prefix.


I reconstruct a high-vowel presyllable to account for the second MC reading *luə with partial vowel raising. Pan and Zhengzhang would derive that reading from OC forms with short vowels, whereas Baxter and Sagart would derive it from an OC form with a nonpharyngealized initial *r-. Reconstructions in parentheses are my guesses of equivalents of my *Cɯ-ro in systems other than mine.

It is possible that 蔞 once had a velar preinitial since other sinographs of its phonetic series definitely had *k-:

鞻屨 OC *kros > MC *kuəʰ 'a kind of shoe'

But on the other hand, the three readings

OC *sroʔ > MC *ʂuəˀ 'number'

OC *sroʔ-s  > MC *ʂuəʰ 'to count'

OC *srok > MC *ʂɔk 'frequent'

of 數 in that same series had *s-, not *k-, so there is no guarantee that 蔞 had *k- (or *g-).

In what Solnit called an "old-style" reconstruction, the Proto-Hmong-Mien word would have a voiceless initial *r̥-. Such a consonant could be a reduction of an earlier *kr- (via *xr-) or *sr-.

Proto-Hmong-Mien *-n corresponds to nothing in Old Chinese.

Proto-Hmong-Mien tone B (< *-ʔ?) should correspond to Old Chinese *-ʔ, not an open syllable.

It's not impossible that the Proto-Hmong-Mien form is a borrowing of an OC *k-ro-n-ʔ with two suffixes (of unknown function!), but I wouldn't bet on it. And the reverse (OC borrowing a word ending in *-nʔ as an open syllable) is implausible. Maybe both languages built upon a common root *ro from a substratum language.


I think the Proto-Hmong-Mien and Old Chinese words are unrelated lookalikes:

- There is no compelling reason to reconstruct *k- or *ʔ- for 良.

I do not understand why Baxter and Sagart reconstructed *k.- in

朗 Old Chinese *k.rˤaŋʔ > Middle Chinese *laŋˀ 'bright'

Even if that reconstruction is correct, I cannot assume that 良 had *k- like its graphic derivative 朗.

- The vowels do not match.

- The consonants conditioning tones do not match. Proto-Hmong-Mien *-h (the source of tone C) should correspond to Old Chinese *-s, not zero.

ONE SURPRISING PHONETIC: ZHENGZHANG'S RECONSTRUCTION OF GSR 780

I looked up 詫, the Chinese source of Tangut

5311 1tʂhæ 'surprised'

from part 3 of "Bestial Marshal", in ytenx.org's online version of Zhengzhang's Old Chinese Phonology (h/t Andrew West), and that led me to his reconstructions of Grammata serica recensa phonetic series 780:

Sinograph Old Chinese Middle Chinese
Zhengzhang This site
乇(杔馲) *ʔr'aːg *trak or *rtak *ʈæk
(虴) *pr'aːɡ > *ʔr'aːg
(咤) *ʔr'aː (*tra or *rta) *ʈæ
吒(咤灹矺) *ʔr'aːgs *traks or *rtaks *ʈæʰ
(奼) *ʔr'aːgs  > *traːgs
(厇矺) *ʔr'eːɡ (*trek or *rtek) *ʈɛk
(秅) *r'aː (*dra or *rda) *ɖæ
*r'aːɡ *drak or *rdak *ɖæk
*hr'aː *tʰra or *rtʰa *ʈʰæ
(奼) *hr'aːʔ (*tʰraʔ or *rtʰaʔ) *ʈʰæˀ
*hr'aːɡs *tʰraks or *rtʰaks *ʈʰæʰ
(馲) *raːɡ (*tV-rak) *lak
(仛) *l'aːɡ (*dak) *dak
(矺) *ʔl'aːg (*tak) *tak
秅(秺)㓃 *ʔl'aːgs *taks *taʰ
(奼) *ʔl'aːgs > *taːgs
(矺) *l'aːb (*takʷ?) *tap
(托侂矺)託(飥馲魠) *l̥ʰaːɡ *tʰak *tʰak
(亳) *blaːɡ (*NpTak?) *bak

Why didn't he reconstruct this series with dental stops? What does the apostrophe signify? Does any modern language have voiceless aspirated l̥ʰ? And why did he reconstruct 虴 with *p- even though its Middle Chinese reading has a retroflex stop? I have never seen *pr- become *ʔr- (or *ʔr- become *tr-) anywhere else.

1.9.0:57: The title of this post refers to the fact that the phonetic of all of the above sinographs is identical in shape to the Khitan small script character 乇 'one' (reading unknown).

1.9.1:03: Parentheses in the sinograph column indicate that a sinograph is not in Grammata serica recensa and therefore may not have been attested before the Han Dynasty. Therefore I am reluctant to reconstruct an early Old Chinese (i.e., pre-Han) reading for a sinograph in parentheses. My Old Chinese readings in parentheses are for readings that are unique to sinographs in parentheses.

LEXICON LEONUM

Wikipedia has a map of the historical distribution of lions. I would expect words for 'lion' from outside that area to be either based on words from that area or neologisms: e.g., Navajo

náshdóítsoh bitsiijįʼ daditłʼooígíí

'wildcat-big his-head-up-to PL-be-hairy-NOMINALIZER' (Navajo Wikipedia entry).

Looking at Wiktionary translations of lion, I wondered:

- if Basque lehoi has a buffer -h- added to a loanword from Latin or Romance

- why Georgian has lomi with -m- instead of the usual -b-/-w-/-o- in the widespread l-word for 'lion'

- what -tári is in modern Greek liontári

- why Hungarian oroszlán has o instead of a like its Turkic prototype arslan

Do any other Hungarian borrowings have o corresponding to Turkic a?

- where Khmer តោ <too> tao comes from (the resemblance to Sotho and Tswana tau must be coincidental)

- where Maltese iljun and dorbies come from; neither looks like Arabic or Romance

- why Mongolian erseleng has -ng unlike its other Turkic-based word ars(a)lan

- what the intermediate steps between Sanskrit siṃha- and Written Tibetan seng-ge are

I assume that (a) the former is of substratal origin since it has no Indo-European etymology and (b) the latter reflects some Middle Indic language: e.g., Turner's Comparative Dictionary lists a Prakrit siṃgha-.

BESTIAL MARSHAL (PART 3)

0623, the second tangraph of

2ka 2tʂɨe 'lion',

has no known analysis.

Its left side is shared with only one other tangraph which is derived from it:


0631 1nieʳ 'soil, land' =

left of 0623 (second half of 2ka 2tʂɨe 'lion') +

left of 3208 1diẹ 'smoothness, levelness'

I don't know why 0631 and 3208 have the ubiquitous radical 'person' (Boxenhorn code: dex) instead of the similar-looking radical 'earth' (Boxenhorn code: ges):

dex 'person' vs. ges 'earth'

0631 and 3208 are not the only earth tangraphs without ges; others are

0183 1phio 'land'

2039 2882 2vəi 2gwi 'land' (a disyllabic word from the substratal 'red-faced' language like 2ka 2tʂɨe 'lion'?)

2107 1tsəiʳ 'land'

2370 (second half of 1tshị 2mie 'land'; 1tshị can also mean 'land' by itself)

3083 2ɣaʳ 'land'

3308 2di 'land' < Middle Chinese 地 *diʰ

4072 (first half of 1kiụ 1phəu 'land'; 1phəu can also mean 'land' by itself)

Why create an 'earth' radical if it's only in some but not all 'earth' tangraphs? And why are there so many words for 'land'?

I think the second tangraph for 'lion' might be a combination of 'land' and a radical 'run' (< Chinese 走?):


0623 (second half of 2ka 2tʂɨe 'lion') =

left of 0631 1nieʳ 'soil, land' +

right of 2402 'to run'

I abstracted the meaning 'running quickly' from the following tangraphs:

0266 2043 1kiəʳ 1dəə 'to hasten' (0266 is derived from 2043, which in turn is derived from 2402)

0325 (variant of 0266 without a 'tail' on the lower right)

2356 1dʐɨew 'to pursue' (derived from 2402)

2358 (final syllable of (1kiəʳ) 1dəə 1rieʳ 'to run' and 1lwɨə̣ 1rieʳ 'to run'; derived from 2716 1rieʳ [phonetic; see part 2] and 2043)

2402 1lwɨə̣ 'to run' (derived from 2043 and 2356)

2451 2bọ 'to flee' < northwestern Late Middle Chinese 亡 *mvo?

5263 1gwiaʳ 'to gallop' (derived from 2402)

2402 is phonetic in


5310 1lwɨə̣ 'calf' =

center of 1909 1gəuʳ 'ox' +

right of 2402 1lwɨə̣ 'to run'

0623 is presumably semantic in 5311 ('surprised by the sight of a lion'?):


5311 1tʂhæ 'surprised' < Chinese 詫 =

left of 5634 1tʂhæ 'difference' < Chinese 差 (phonetic) +

right of 0623 (second half of 2ka 2tʂɨe 'lion'; semantic?)

BESTIAL MARSHAL (PART 2)

The Combined Homophones and Tangraphic Sea analysis of the first tangraph of

2ka 2tʂɨe 'lion' is



0829 2ka =

center (Boxenhorn code hia) of 5303 1vị 'monkey (calendrical), beast, animal' +

center (Boxenhorn code dum) and right (Boxenhorn code cin) of 2716 1rieʳ 'skillful, ingenious'

Would it be better to translate 5303 in a calendrical context as something like 'beast' rather than 'monkey'? Or was the meaning of 5303 transitioning from 'animal' to 'monkey'? I am reminded of the Germanic word for 'animal' whose meaning narrowed to 'deer' in English.

Then again, 1vị might be cognate to Proto-Bodic *spra 'monkey' if it is from pre-Tangut *SI-Pra:

Presyllabic *S- conditioned tension of the main vowel (indicated with a subscript dot): *S ... V > Ṿ

Presyllabic *-I- conditioned the 'brightening' of the main vowel: *I ... a > i

Intervocalic *-P- lenited to v-

Medial *-r- disappeared without a trace before high vowels (so *r-loss must postdate the 'brightening' of *a to i)

If so, then the semantic shift was from 'monkey' to 'animal' rather than the other way around (i.e., generic to specific).
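The four changes above, and the ordering argument about *r-loss, can be sketched as composable steps. This is a toy model under stated assumptions: the function names are hypothetical, the set of high vowels is assumed, tone is ignored, and the tension dot is simply attached to the end of an open syllable.

```python
import unicodedata

HIGH_VOWELS = "iɨu"  # assumed inventory for this sketch


def brighten(pre, main):
    """Presyllabic *-I- brightens *a to i."""
    return main.replace("a", "i") if "I" in pre else main


def drop_medial_r(pre, main):
    """Medial *-r- disappears before high vowels."""
    for v in HIGH_VOWELS:
        main = main.replace("r" + v, v)
    return main


def lenite(pre, main):
    """Intervocalic *-P- lenites to v-."""
    return main.replace("P", "v")


def tense(pre, main):
    """Presyllabic *S- adds tension (dot below) to the final vowel."""
    if "S" not in pre:
        return main
    return unicodedata.normalize("NFC", main + "\u0323")


def derive(form, steps=(brighten, drop_medial_r, lenite, tense)):
    pre, main = form.split("-")
    for step in steps:
        main = step(pre, main)
    return main


print(derive("SI-Pra"))
# Applying r-loss before brightening leaves the *r in place:
print(derive("SI-Pra", steps=(drop_medial_r, brighten, lenite, tense)))
```

With the default order the output is vị; with the two steps swapped the *r survives, which is why *r-loss has to postdate brightening in this account.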

However, other pre-Tangut reconstructions are also possible: e.g., *S-wi. And the root *spra is not otherwise attested in Qiangic. Is Tangut the only Qiangic language that retained it? Or was pre-Tangut *SI-Pra an early borrowing from Tibetan?

The tangraphic component hia is in other characters for animal words:

0825 1vɨeʳ (second half of 1lɨəə 1vɨeʳ 'jackal'; derived from 5303 in Tangraphic Sea)

1454 1biə̣ 'ape' (derived from 5303 in Tangraphic Sea)

1784 1ləụ 'man' (derived from 5303 in Tangraphic Sea)

2446 2pạ < *SpaH  'macaque' (derivation unknown; a loan from Tibetan?; 'grass' on the left is presumably phonetic; cf.

2886 2pạ 'the Tangut clan name syllable pa'

with 'surname' on the right.)

0723 1lwẽ 'to jump' refers to an action of animals; it too was derived from 5303 in Tangraphic Sea:


0723 = center of 5303 + right of 4573 1bi 'light, brightness' (why?)

I just realized that hia in

0724 2niə (plural suffix)

could refer to herds.

Not all hia-tangraphs have animal semantics: e.g.,

0391 2lie 'noon'.

2716 1rieʳ 'skillful, ingenious' was translated as Chinese 利 by Li Fanwen (2008: 445), and it superficially resembles Old Chinese 利 *ris 'sharp', but a pre-Tangut *ris should have become Tangut 2riʳ, not 1rieʳ. Could

2riʳ 'talent, scholar (< 'talented one'?)'

be the true Tangut cognate of Old Chinese 利 *ris 'sharp'?

Or is my Chinese reconstruction wrong? 利 Sino-Vietnamese lợi [ləːj] 'profit' implies a borrowing from an Annamese Middle Chinese word descended from Old Chinese *Cʌ-rəjs with a low-vowel presyllable that blocked the raising and fronting of schwa. Other readings for 利 with i-like rhymes (e.g., Mandarin li) could be from Old Chinese *rəjs (sans presyllable) as well as *ris. And Tangut 1rieʳ could be from a pre-Tangut *rəj. I could then reconstruct a Proto-Sino-Tibetan *rəj as the source of Old Chinese (*Cʌ-)rəj-s and Tangut 1rieʳ.

Maybe all these words are related:

PST *√r-j

> Zero grade: *ri-s

> 利 Old Chinese *ris > Late Middle Chinese *lì > Mandarin li, Cantonese lei, etc.

> pre-Tangut *ris > Tangut 2riʳ

> Schwa grade: *r-ə-j

> 利 southern Old Chinese *Cʌ-rəj-s > Annamese Middle Chinese *lə̀j > Sino-Vietnamese lợi

> pre-Tangut *rəj > Tangut 1rieʳ

The components shared by 0829 and 2716 (Boxenhorn code: dumcin) are also in other tangraphs for words with related semantics:

0814 2Tiạ 'skillful, clever' (class III initial unknown; could be t-, th-, d-, or n-)

2785 2teew 'scheme, strategy' (i.e., something requiring cleverness)

4733 2giaa 'dexterous' (derived from 2716 in Combined Homophones and Tangraphic Sea)

4735 2dʐɨa < *N(-C)ɯ-ca-H 'sharp' (derived from 2716 in Combined Homophones and Tangraphic Sea; cognate to 5732 below?)

5732 1tʂɨaa < *Cɯ-ca-C 'to chop' (cognate to 4735 above?)

Two dumcin tangraphs for words with unrelated semantics are


0076 = variant of 0786 1thwo (second half of 2swi 1thwo 'illusion') =

left of 0330 1miee 'dream' +

right of 0785 2bɨu 'border'

3016 2dəuu 'aspiration' (something that clever people have?)

0076 is not a dumcin tangraph for etymological purposes, as it is a variant of 0786 whose components are hur (二 bam over dum) and cin. (Of course, for indexing purposes, 0076 does contain dum and cin.)

