When I was a child, the old (and possibly prewar) kana charts in my Japanese language classrooms still had the obsolete kana for

ゐ/ヰ wi

ゑ/ヱ we

and from their placement in the chart, I guessed that they stood for i and e, because at the time I had no idea Japanese ever had wi and we in native words.

Nearly twenty years later, I think the Thai alphabet chart in my Thai classroom included the obsolete letters

ฃ khɔɔ khuat 'bottle kh' for *x (now kh)

ฅ khɔɔ khon 'person kh' for *ɣ (now kh)

The links lead to entries at thai-language.com.

Lao has even more obsolete letters, but they're missing from Unicode, though spaces have been reserved for them. For example, the Lao equivalent of ฃ khɔɔ khuat for *x would be at U+0E83, after U+0E82 ຂ, the Lao equivalent of Thai ข khɔɔ khay for *kh.
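The code-point arithmetic behind that example can be sketched as follows. This is only an illustration: it assumes that the Lao block sits exactly 0x80 above the Thai block with consonants in parallel slots, and the helper names `lao_slot` and `describe` are made up for this sketch. Whether a slot shows up as assigned depends on the Unicode version bundled with your Python.

```python
import unicodedata

# Assumption for this sketch: Lao consonants occupy the slot 0x80 above
# their Thai counterparts (Thai block U+0E00.., Lao block U+0E80..).
THAI_TO_LAO_OFFSET = 0x80

THAI_KHO_KHAI = "\u0e02"   # Thai kh (for *kh)
THAI_KHO_KHUAT = "\u0e03"  # obsolete Thai 'bottle kh' (for *x)

def lao_slot(thai_char: str) -> int:
    """Return the code point where the Lao counterpart would sit."""
    return ord(thai_char) + THAI_TO_LAO_OFFSET

def describe(code_point: int) -> str:
    """Name the character at a code point, or say that none is assigned."""
    name = unicodedata.name(chr(code_point), "<nothing assigned in this Unicode version>")
    return f"U+{code_point:04X} {name}"

for thai in (THAI_KHO_KHAI, THAI_KHO_KHUAT):
    print(describe(ord(thai)), "->", describe(lao_slot(thai)))
```

The first line of output pairs Thai kh with its Lao equivalent at U+0E82; the second shows what, if anything, your Unicode data has at U+0E83.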

I saw the Lao letter for *x for the first time today in this table of Lao letters. Here's a key to the consonants in a transliteration based on their values in ancient Tai and/or Indic languages. Consonants for Indic loans are in bold. Missing consonants (e.g., ຂ *kh) are in parentheses.

velars k (kh) x g (ɣ) gh ŋ
palatals c ch j z jh ñ
'retroflexes' (really dentals) ʔḍ (ṭ) (ṭh) ḍh
dentals ʔd t th d dh n
labials ʔb p ph f b v bh m
misc. 1 y r l w ś s h
misc. 2 ʔ ɦ

The letter labelled ɦ never had such a value in any language. I use a voiced symbol to indicate that words spelled with it have tones implying a nonexistent *ɦ. Those words are loanwords combining h- with tones normally associated with *voiced initials. The modern Lao equivalent of ɦ spells words with h- < *r-: e.g., ຮູບ huup < Sanskrit or Pali ruupa 'form'. (I presume the older spelling is ຣູບ ruup with ຣ r.)

The table is missing a Lao equivalent of Thai ฬ for Indic ḷ. I assume U+0EAC represents that Lao letter.

Here's a key to the consonants' values in modern Lao:

velars k (kh) kh kh (kh) kh ŋ
palatals c s s s s ñ
'retroflexes' (really dentals) d (t) (th) th th n
dentals d t th th th n
labials b p ph f ph f ph m
misc. 1 y r l w s s s h
misc. 2 ʔ h

I have already mentioned that some Lao romanizations have x for ຊ [s] < *j. According to this Wikipedia article, ສ (originally for *s) is always romanized s. Was that always true? If the French heard a distinction between what they romanized as x ([ɕ]?) and s, I suspect that they would have romanized ສ-words which once had *ch with x:

Lao tone class


Premodern Lao spelling Colonial Lao Romanization? Modern Lao spelling Modern Lao pronunciation
high *ch U+0E89 ɕ? x? ສ (same as *s) s
low *j
*z U+0E8B
high *s s s

(Unicode code points represent obsolete Lao letters not yet in Unicode.)

Lao tone classes are determined by voicing in Proto-Tai.

If I am wrong, then perhaps voiceless *ch became s (like Vietnamese *ch > x [s]), whereas voiced *j and *z became ɕ:

Lao tone class


Premodern Lao spelling Colonial Lao Romanization? Modern Lao spelling Modern Lao pronunciation
high *ch U+0E89 s? s? ສ (same as *s) s
low *j ɕ? x?
*z U+0E8B
high *s s s

However, I think it's more likely that

- *j and *z merged into *j (as in Thai)

- *j devoiced to *ch, merging with original *ch (as in Thai)

- *ch became a fricative ɕ (romanized x)

- ɕ merged with s
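The four steps above can be run as an ordered pipeline to check that all three initials really end up as s. This is a toy sketch of the hypothesis, not a claim about actual chronology; `STAGES` and `derive` are names invented here, and only initial consonants are modeled (tone class is ignored).

```python
# Ordered sound changes from the list above; sounds not mentioned
# in a stage pass through that stage unchanged.
STAGES = [
    ("*z merges into *j", {"z": "j"}),
    ("*j devoices to *ch", {"j": "ch"}),
    ("*ch becomes the fricative ɕ", {"ch": "ɕ"}),
    ("ɕ merges with s", {"ɕ": "s"}),
]

def derive(initial):
    """Return the successive forms of an initial, skipping stages that do not affect it."""
    history = [initial]
    for _label, change in STAGES:
        current = history[-1]
        changed = change.get(current, current)
        if changed != current:
            history.append(changed)
    return history

for proto in ["ch", "j", "z", "s"]:
    print(" > ".join(derive(proto)))
```

Under this ordering the segments all merge, so only the high versus low tone class would still separate old *ch-words from old *j- and *z-words.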

I wish I could view an early Lao-French dictionary to see the romanizations of the Lao letters for *ch, *j, and *z.

4.10.23:50: Judging from the fact that ຫ້ວຍຊາຍ Houayxay is spelled ห้วยทราย in Thai, implying *hwaydraay, I guess that Lao *dr merged with *j (or *thr merged with *ch if devoicing occurred first).

4.11.00:00: No, according to Li Fang-kuei (1977: 162), "the Siamese spelling thr- (< *dr-) [of ทราย 'sand'] does not reflect PT [Proto-Tai] *dr-." 'Sand' is therefore *zai rather than *drai in his Proto-Tai reconstruction. The real Thai and Lao reflexes of PT *dr are T r- and L h- < *r- (Li 1977: 128).

Could *zai be borrowed from a southern variant of Old Chinese 沙 *sraj 'sand'?

LAO X EX ... ?

(ex in the sense of 'from')

Early Vietnamese had at least four aspirated obstruents:

*ph *th *ch *kh

Last night, I talked about Middle Vietnamese of the 17th century which had lost the palatal aspirate:

ph th (ɕ, spelled x) kh

In modern Vietnamese as spoken in Hanoi and Saigon, only th remains, though the spellings have not changed:

(f, still spelled ph) th (s, still spelled x) (x, still spelled kh)

I wonder if there is a general or even universal sequence of aspirate loss. Are palatal aspirates the first to go? Lao, like Middle Vietnamese, has lost its palatal aspirate:

ph th (s, romanized x in some systems) kh

Lao ຊ was originally *j but devoiced to *ch and then became the sound romanized by the French as x, which was somehow distinct from ສ [s], the sound romanized as s. In modern Lao, both ຊ x and ສ s are [s].

JFM Genibrel (Vocabulaire Annamite-Français, 1893: xvi) observed that Vietnamese x was "like s [which he equated with French ch], however in Cochin China it is as though there were an i between x and the following vowels" (Gregerson's 1969: 161 translation): i.e., x was still [ɕ] in southern Vietnamese over a century ago.

Perhaps the French who heard this palatal pronunciation of Vietnamese x assigned the letter x to a 19th century Lao ຊ [ɕ] that is now [s]. Can anyone confirm that Lao x was once [ɕ]?

4.9.2:39: I've been looking online for scans of old Lao-French dictionaries containing descriptions of Lao x. I haven't found any yet, but I did rediscover this article by Michel Ferlus on a Tai language in Qui Châu in Vietnam. This language has three aspirated obstruents. Guess which one it's missing.

WHAT DO GREEK AND VIETNAMESE HAVE IN COMMON?

In Greek, j- strengthened to dj- and then z-: e.g., Gr ζυγόν zygón 'yoke' (cf. Eng zygote < Gr ζυγωτός zygōtós 'yoked'), whose cognates

Skt yugam (in turn related to yoga) 'yoke'

Lat iugum 'yoke'

Eng yoke

preserve the original j-.

Similarly, early Vietnamese *j- strengthened to d(j)- and then [z] in the north (but weakened back to [j] in the south). Although it is tempting to view southern [j] as a retention of early Vietnamese *j-, there is no record of such an initial consonant in Middle Vietnamese.

(Note that Greek z had an earlier zd or dz stage without any known parallel in Vietnamese.)

Also, the voiceless aspirated stops ph- and kh- became fricatives [f x] in Greek and Vietnamese. But Vietnamese th- is still an aspirated stop unlike Greek th- which is now a fricative [θ]. Vietnamese th- is the only aspirated consonant left in the major dialects*. The fourth Vietnamese aspirate *ch- which had no Greek counterpart also became a fricative x [s].

Alexander de Rhodes, author of the Dictionarium Annamiticum Lusitanum et Latinum (1651; viewable here), wrote that Middle Vietnamese "has three aspirates just as Greek does and they are fairly well aspirated." Although Greek φ θ χ were fricatives [f θ x] in the colloquial Greek of de Rhodes' time, he knew that they were stops in classical Greek, so he meant that MV ph th kh were [pʰ tʰ kʰ] rather than [f θ x] (Gregerson 1969: 149).

*Mark Alves (2002: 4) reported that Nghi Ân still has [kʰ].

Gregerson (1969: 149) wrote that the central Vietnamese dialects of Quảng Nam and Quảng Tín (the latter now part of Quảng Nam) had [pʰ]. Maspero (1912) found [pʰ] in Saigon as late as the early 20th century.

Does any modern Vietnamese dialect preserve all three aspirated stops (ph th kh)? I've never heard of any dialect that hasn't lenited the aspirated palatal stop (or fricative) *ch to x [s].

WHAT DOES ROMANI HAVE IN COMMON WITH GREEK AND IRISH (AND EVEN CHINESE)?

I was disappointed to find that Romani doesn't have a chapter in Cardona and Jain's (2003) The Indo-Aryan Languages. The sole reference to Romani that I could find in the index was on p. 208:

In India the devoicing of murmured [= voiced aspirate] stops took place only in Dardic and Romani languages (e.g., OIA [Old Indo-Aryan] dhūma 'smoke' > European Romani thuv, bhūmi 'earth' > phuv).

English pal is from Angloromani phal < Romani phral < Sanskrit bhraatṛ 'brother' (cognate to brother, Greek phrater 'member of a phratra [tribe]', and Latin frater).

The shift of dh and bh to th and ph also occurred in Greek:

theme and thesis are cognate to do; their initials are from *dh- (preserved in Sanskrit dhaa 'to place, put')

physics is cognate to be; their initials are from *bh- (preserved in Sanskrit bhuu 'be')

I wonder if gh > kh in Romani as in Greek.

The shift of m to [v] or [w] also occurred in Irish - and much of Chinese. Vietnamese borrowed words from Chinese before and after the change: e.g.,

萬 Late Old Chinese *muanh 'ten thousand' > Late Middle Chinese *(m)wàn

Borrowed before the change: muôn (written in nom as 門 môn 'gate' [phonetic] + 萬 or 万 vạn 'ten thousand' [semantic])

Borrowed after the change: vạn, vàn

The two strata of loans combine in muôn vàn 'myriad'.

THE LAWFUL WEST: THE LAST VIETOGRAPH?

While looking up Sino-Vietnamese 西 tây 'west' at nomfoundation.org during my research for "The Watery West", I discovered the nom character 法+西, with 法 Pháp 'France'* on the left. I presume it represents tây in the narrow sense of 'French'. The nom script became extinct under French rule.

With the 17th century advent of [國語] quốc ngữ -- the modern roman-style script — Nôm literacy gradually died out. The French colonial government decreed against its use. Today, less than 100 scholars world-wide can read Nôm. Much of Việt Nam's vast, written history is, in effect, inaccessible to the 80 million speakers of the language**.

Was 法+西 'French' (?) one of the last nom characters ever created, if not the very last one? There would have been no need for it prior to French colonization.

There's also a nom graph 亻+西, with 亻 nhân 'person' on the left. Could 亻+西 have meant 'Western person'?

In Chinese, 亻+西 is also a variant of 似 'similar' and 價 'price' (without 買 'buy' on the bottom right).

似 'similar' has another variant 仏 which looks like the Japanese simplification of 佛 'Buddha'.

仏 Mandarin 'similar' shares its phonetic ㄙ with its near-homophone 私 Md 'private' (an expanded version of ㄙ). Although Schuessler (2009: 283) reconstructed 私 as Old Chinese *si*** with a simple *-i, colloquial Sino-Vietnamese tây < *səj may either point to OC *səj or a Colonial Chinese *səj < *Cʌ-si with warping of *i conditioned by a lost low-vowel prefix.

私 is an unusual graph in nom because it also has a purely Vietnamese reading riêng 'private'. Recycled sinographs in nom normally do not represent native Vietnamese words, unlike sinographs in Japanese writing: e.g., 私 can represent both Sino-Japanese shi and native Japanese watakushi 'I; private'.

riêng 'private' also has a more normal nom graph with the phonetic 貞 Sino-Vietnamese trinh 'chaste' added to 私 SV tư < Colonial Chinese *tɨ.

*法 'law' is short for 法蘭西 Md Falanxi / Ct Faatlaansai, a Chinese phonetic transcription of France.

**But "[m]uch of Việt Nam's vast written history" has always been inaccessible to the vast majority of Vietnamese because few Vietnamese were literate in nom, which was a more complex script than Chinese.

4.7.3:36: In October 1945, Ho Chi Minh declared that "nearly all Vietnamese were illiterate." (See p. 60 of this PDF.) I presume he meant that they were illiterate in the quốc ngữ alphabet. By then only a few elderly men must have been literate in nom but not in quốc ngữ.

3:50: Page 61 of that PDF gives figures of 14-20% literacy for Vietnam in 1945. For comparison, only 28% of Afghans are literate today. You can see literacy figures for all current countries here.

***I assume *siʔ in Schuessler (2007: 477) with a glottal stop is a typo.

BEWILDERING PRESYLLABIC PARALLELS

I chose bewildered for the title of last night's post just because I wanted a word that alliterated with bamboo. What is the be- doing in it? I don't know, and I became even more puzzled after I learned that wilder is a synonym of bewilder. Germanic languages have a number of verb prefixes whose functions are opaque: e.g.,

Proto-Indo-European source of prefix      | English (* = nonexistent form) | Dutch     | German
*per 'forward, through'                   | forbid                         | verbieden | verbieten
                                          | forget                         | vergeten  | vergessen
                                          | forgive                        | vergeven  | vergeben
                                          | (*forstand 'understand')       | verstaan  | verstehen
*bhi (< *ʕmbhi, source of ambi-) 'around' | (*begrip 'understand')         | begrijpen | begreifen
                                          | (*bestand 'exist')             | bestaan   | bestehen

What does forgive have to do with give (besides sharing a root)? (According to Watkins 2000: 28, for- was 'far' and forgive was once 'give away'.) What did the for- (or ver-) words once have in common? Notice that even related languages don't have the same prefixes for the same meaning: e.g., English has understand instead of *forstand corresponding to Dutch verstaan and German verstehen. And what does understanding have to do with under or standing (again, besides etymology)?

(You can read more about Germanic verb prefixes here.)

Such opaque verb prefixes are also found elsewhere in Indo-European: e.g.,

perceive < Latin percipere 'through-take'

(a hypothetical English cognate would be *forheave)

cf. conceive, deceive, receive

Russian написать 'on-write', perfective of писать 'to write'

cf. English constructions like wrote up

Sanskrit anujñaa 'permit' < 'after-know' (jñaa is cognate to 'know')

Vietnamese, Chinese, and other (South)east Asian languages were once full of prefixes. In Vietnamese, these lost prefixes have conditioned 'softened' initials: e.g.,

*taaw > đao [ɗaaw] 'knife' (unprefixed; loan from Colonial Chinese 刀 *taaw)

*CV-taaw > *CV-daaw > dao [zaaw] 'knife' (formerly prefixed; the prefix could have been from Colonial Chinese or added in Vietnamese)
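A minimal sketch of that conditioning, using only the knife pair above: a lost presyllable voices the initial, and the two initials then go their separate ways. The function name `reflex` and both lookup tables are invented for this illustration, and they cover only *t, so nothing here should be read as a general rule for other stops.

```python
# Only the pair attested in the example above; other consonants are left alone.
VOICED_AFTER_PRESYLLABLE = {"t": "d"}
# Modern outcomes as transcribed in the example: *t > [ɗ], softened *d > [z].
MODERN_INITIAL = {"t": "ɗ", "d": "z"}

def reflex(proto: str) -> str:
    """Derive a modern form from a proto-form such as 'taaw' or 'CV-taaw'."""
    _presyllable, hyphen, main = proto.rpartition("-")
    initial, rest = main[0], main[1:]
    if hyphen:  # a presyllable was present, so the initial 'softens'
        initial = VOICED_AFTER_PRESYLLABLE.get(initial, initial)
    return MODERN_INITIAL.get(initial, initial) + rest

print(reflex("taaw"))     # unprefixed
print(reflex("CV-taaw"))  # formerly prefixed
```

The presyllable itself leaves no segmental trace in the output; the voiced initial is the only evidence that it was ever there.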

The r- of Vietnamese rây 'sieve' from last night's post is probably from an *s(-like) initial that lenited after a lost *CV-prefix. (Cf. *-s- > *-r- between vowels in Latin. English was ~ were is another example of rhotacism.) The original initial of rây is hinted at in old spellings.

I don't know what the prefix of rây was, and even if I knew what it sounded like, it might make as much sense to me as the re- of receive.

Tangut also had 'softened' initials due to lost prefixes: e.g.,

*Cʌ-tek > *Cʌ-dek > *Cʌ-lek > lew 'one'

cf. Old Chinese 隻 *tek 'single'

Glancing at Guillaume Jacques' modern gDong-brgyad rGyalrong data, I have no idea why kɯ- is in words as different as

adjectives: -jaʁ 'thick' (cf. Tangut *C-la(a) > laa, lạ 'id.'; *C- might not have been *k-, since I suspect *kl- > lh- in Tangut)

numerals: -rcat 'eight' (cf. Tangut *rja > jaʳ 'id.' without any prefix; Written Tibetan brgyad 'id.' and Old Chinese *pret 'id.' have other prefixes attached to the *r-root)

nouns: -rtsɤɣ 'leopard' (cf. Tangut *rʌ-tsek > zewʳ 'id.' without a *k-prefix)

intransitive verbs: -ŋu 'be' (cf. pre-Tangut *pʌ-ŋu > ŋwəu 'id.' with a different prefix)

transitive verbs: -tu 'have' (cf. pre-Tangut *du > diu 'id.'; perhaps *d- < *N-t-?)

(4.6.0:40: Added last two examples.)

Is the kɯ- in those five rGyalrong words really one kɯ-, or two or more unrelated homonyms?

Reconstructing the semantics of affixes in earlier Asian languages or their ultimate sources* is difficult, to say the least.

*E.g., the full k-word(s), if any, that was/were reduced to kɯ- in rGyalrong.

BEWILDERED BY THE BAMBOO NUN

The majority of vietographs (made-in-Vietnam nom characters) consist of Chinese semantic elements plus Chinese phonetic elements. Since Chinese lost initial *r- long before the dawn of nom writing, there was (and is) no r- in Sino-Vietnamese, and Vietnamese words with initial r- were generally written with phonetics that had l-initial Sino-Vietnamese readings: e.g.,

ra 'go out'

semantic: 出 SV xuất 'go out' (synonymous with ra) +

phonetic: 羅 SV la 'net' (nearly homophonous with ra)

rãnh 'stream'

semantic: 氵SV thủy 'water' +

phonetic: 領 SV lãnh 'collar; to lead'

rộng 'wide'

phonetic: 弄 SV lộng 'to play' +

semantic: 廣 SV quảng 'wide'
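The pattern in these three examples is regular enough to state as a one-line rule: swap the initial r- for l- and you get the Sino-Vietnamese reading of the phonetic. The sketch below checks that against the examples; `EXAMPLES` and `expected_phonetic_reading` are hypothetical names, and the rule is only a description of these three cases, not a claim about every r-initial vietograph.

```python
import unicodedata

# (Vietnamese word, gloss, SV reading of its phonetic), from the examples above.
EXAMPLES = [
    ("ra", "go out", "la"),
    ("rãnh", "stream", "lãnh"),
    ("rộng", "wide", "lộng"),
]

def expected_phonetic_reading(word: str) -> str:
    """Swap initial r- for l-, the nearest initial available in Sino-Vietnamese."""
    word = unicodedata.normalize("NFC", word)
    if not word.startswith("r"):
        raise ValueError("this sketch only covers r-initial words")
    return "l" + word[1:]

for word, gloss, phonetic in EXAMPLES:
    predicted = expected_phonetic_reading(word)
    matches = predicted == unicodedata.normalize("NFC", phonetic)
    print(f"{word} '{gloss}': predicted {predicted}, attested {phonetic}, match={matches}")
```

Applied to rây, the same swap yields lây, which is not an attested SV reading.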

One would predict that V rây 'to strain, sift' would be written with an SV li-type phonetic. (There is no SV reading lây.) But Ngũ thiên tự (Five Thousand Characters) lists its vietograph as

semantic: 竹 SV trúc 'bamboo'

phonetic: 尼 SV ni 'nun'

Why would an r-word be written with an n-phonetic? Is it relevant that 尼 had an r-like retroflex initial *ɳ- in Middle Chinese? Or was 尼 chosen because n- was a sonorant like r-? Could 尼 reflect an original *-n- that intervocalically lenited to r-?

*CvoicelessV-nây > rây

cf. intervocalic *-nn- > -ll- in Korean 困難 *konnan > kollan 'difficulty'

Did a voiceless-initial presyllable condition a ngang tone instead of the expected huyền tone from *r-?

There are several other nom spellings of rây:

1. 篩 SV < *sr- 'sieve' doesn't sound much like rây, but is a translation vietograph (semantograph); cf. the use of 篩 to write the unrelated native Japanese word furui 'sieve'. (SV and V rây may ultimately share an Old Chinese root *s-r-[j].) huesoft also lists two SV readings not in Mineya, si and sai, which are phonetically closer to rây since they have palatal rhymes.

2. 縒 SV sai (reading from Wiktionary) < *sr- 'uneven' (phonetic loan for rây?)

3. 笪 SV đát (reading from huesoft) 'rough bamboo mat' (used as a sieve?; semantic?) atop 例 SV lệ 'rule' (phonetic)

4. 䇴 < 竹 SV trúc 'bamboo' atop 西 SV tây < *s- 'west'.

䇴, 篩, and 縒 all point toward an earlier voiceless sibilant that could have conditioned the ngang tone of rây. Here's what I think could have happened:

- Old Chinese 釃 *sre became *srəj in colloquial Colonial Chinese

(4.5.20:24: *srəj could also simply be

- the OC source of MC 篩 *ʂi 'sieve'

- a schwa-grade variant of the skeleton *s-r-j

- a reversed type variant of 篩: *Cʌ-sri > *sri > *srei > *srəj)

- The colloquial Colonial Chinese (CCC) form was borrowed into early Vietnamese as *srəj

- This was written in early nom as

篩 SV *srɨ ~ *sri ~ *sraaj

縒 SV *sraaj

䇴 with phonetic 西 SV *səj (for an early Vietnamese variant *səj without *-r-?)

- *sr- fused into *-ʂ- and softened to *-ʐ- after a presyllable

- borrowed from Chinese (4.5.20:20: e.g., the emphasis-conditioning low-vowelled presyllable in OC *Cʌ-sri above)?

- or added in Vietnamese?

- *-ʐ- became r before Middle Vietnamese. I hypothesize that the spelling 笪+例 with an l-phonetic dates after this shift.

- At some point before Middle Vietnamese, the presyllable was lost.

Maybe 尼 SV ni 'nun' was once *nri (cf. Middle Chinese *ɳi) and 竹+尼 represented *nV-rəj after its sibilant became *-r- and before it lost its presyllable.

The native Vietnamese words for 'sieve' were also written with 竹 'bamboo' in nom.

竹+寅, 木+簡 giần < *CV-ɟən

cf. Ruc cən

竹+床 sàng < *graŋ

cf. Ruc k(h)raaŋ

More comparative data is at Starling.

