In parts 1-6, I've examined the 'graphonetic gap' between Nai Pan Hla's spelling and pronunciation in Mon.

Beginning with this part, I will look at Minegishi Makoto's tables in Nai Pan Hla (1988-89) comparing

Mon Rao and Mon Ro are "two major groups of [Mon] dialects spoken in Burma today" (Diffloth 1984: 41). Nai Pan Hla focuses on Mon Rao, even though his native dialect is Mon Ro, because Mon Rao is the dialect of "the majority" (1984: 14). In general, Mon Rao dialects are farther north than Mon Ro dialects; one exception is Kawkyaik, the dialect in Shorto's dictionary, which is directly east of the southeastern Mon Ro dialects.

It would be interesting to include in this survey the Mon dialects of Thailand, which "forms a group of dialects by itself" (Diffloth 1984: 42), but for now I'm mostly going to stick with Minegishi's tables, which I'm going to break down into smaller tables, starting with one for <a>:

written coda
*voiceless initial
*voiced initial
NPH Rao Shorto
<k·>, <ṅ·>
aC ɛ̤C
<t·>, <n·>, <p·>, <m·>, <h·>, <ʔ·> ɔC
o̤C ɔ̤C
<v·> ɔ ɔ̤
<y·> oa
oa ~ ɔa o̤a
o̤a ~ ɔ̤a

So far, Nai Pan Hla's nonnative Mon Rao is closer to his native Mon Ro than Shorto's Kawkyaik Mon Rao. Nai Pan Hla has lowered his native [e̤ʔ] to [ɛ̤ʔ] and favored [o̤a] over [ɔ̤a] when speaking Mon Rao.

The raising of *-aʔ after *voiced initials to [e̤ʔ] in Mon Rao reminds me of the shift of *-ak after *voiced initials to [eəʔ] in Khmer. Diffloth (1984: 155-156) reconstructs Proto-Mon *-ɛ̤ə̯ʔ with a Khmer-like diphthong. Thai Mon dialects in Diffloth (1984) still have diphthongs:

Shorto's [ɛC] reminds me of Burmese [ɛʔ] < *-ak. (But Burmese *-aŋ became [ɪ̃], not [ɛ̃]!)

Shorto's [ɛ̤aC] reminds me of Khmer [eəC] in the same environment.

Nearly all of Shorto's pronunciations after *voiced initials are higher than those after *voiceless initials, with two exceptions:

Mon Ro [ɔa] ~ [ɔ̤a] is probably more conservative than Nai Pan Hla's nonnative Mon Rao and Shorto's [oa] ~ [o̤a]. Diffloth (1984: 249) reconstructs Proto-Mon *-ɒɛ̯ and *-ɔ̤ɛ̯ which differ in height as well as phonation.

One feature that distinguishes Mon and Burmese script from the other Indic scripts of Continental Southeast Asian languages is the digraph <ui> which never represents anything like [ui]. In Burmese, it has surprising phonetic values:

For comparison, here are the pronunciations of <ui> in Mon from Nai Pan Hla (1988-89: 16,18):

written coda
*voiceless initial
*voiced initial
<k·>, <ṅ·>
<t·>, <n·>
<p·>, <m·>, <h·>, <ʔ·> ɜ̤C
<v·> ɒ ɜ̤
<y·> oi

Shorto (1971: xviii, xx) interprets <ui> in Old and Middle Mon as /ø/ even though none of the above modern pronunciations has a front component except for <uiy·> whose final [i] presumably corresponds to *-j and could be rewritten as [j].

A quick check of Diffloth (1984) shows that his Proto-Monic (there is no in his reconstruction) generally corresponds to modern Mon <ui>¹. also matches the earlier value of <ui> in Burmese that I reconstructed seven years before I first saw Diffloth's book.

Let's go with as a placeholder for what <ui> once stood for.

As I'd expect, generally lowered after *voiceless initials and raised after *voiced initials. An exception is before velars: always warped to [aɨ] with a lowered beginning and raised ending regardless of initial. (Of course *voiced initials conditioned breathiness: [a̤ɨ].)

Judging from  [ɜ̤], maybe *ɜ would be a more precise reconstruction of *ə.

dissimilated after *voiced initials and before *iT:

*gəT > *gə̤T > *gə̤iT > *go̤iT

I suspect a similar dissimilation occurred before *-j after a new *-əj developed (from where?) to replace the old one in stage 2:

stage 1
*-əj *-?
stage 2
*-aj *-əj
stage 3
stage 4

For comparison, *əː developed much more simply in Khmer:

¹The exceptions I found ended in *-əj which corresponds to modern Mon <ai>, not <uiy·>: e.g.,

*təj > <tai> [toa] 'arm, hand'

That makes me wonder where <uiy·> came from.


Continental Southeast Asian languages typically distinguish between upper mid /e o/ and lower mid /ɛ ɔ/. Modern spoken Mon as described by Nai Pan Hla (1988-89: 12, 16-18) fits this pattern:

Mon spellings such as those above imply that the distinction is the result of reorganization. The earlier sound system implied by modern Mon spelling has no (though Diffloth 1984: 284 reconstructs *ɛː at the Proto-Monic level). However, modern Mon spelling does imply an spelled <aṁ> before velars (Nai Pan Hla 1988-89: 16, 18):

written coda *voiceless initial *voiced initial
<amk·> [ɔk]
<amṅ·> [ɔŋ]

The behavior of written Mon <aṁ> is generally quite different from that of Khmer and *ɔː in the same environment.

Reflexes of Khmer before velar codas:

written coda *voiceless initial *voiced initial
<aka'> [ɑːk]
<aṅa'> [ɑːŋ]

Reflexes of Khmer ɔː before velar codas:

written coda *voiceless initial *voiced initial
<aka> [ɑk]
<aṅa> [ɑŋ]

Khmer lowered *ɔ(ː) after *voiceless initials and broke short after *voiced initials, but Mon only developed a register distinction (which once existed in Khmer but is now gone).

Diffloth (1984: 299) reconstructed a Proto-Monic upper-lower mid vowel distinction only before certain codas:

*-k/-ŋ *-c/-ɲ







*oːP *oːw
*ɔː *ɔːʔ *ɔh *ɔːK


Proto-Monic *-c/-ɲ have been lost even from modern Mon spelling: e.g.,

Old Mon
modern Mon
*ceːc (not attested?)
ci coik great-grandchild
*ciːɲ ṅ· ~ ciṅ· ciṅ· coiŋ elephant
*puːc ~ pu pu put to gouge with a chisel
*smaːɲ smāñ· smā hman to inquire

(9.20.17:00: Reformatted the examples above into a table and added the Old Mon examples preserving final palatals.)

(No Proto-Monic *-eːɲ words have survived in modern Mon.)

The distribution of vowels seems chaotic, though one generalization can be made: length is nondistinctive before glottals. (All vowels before *-ʔ are long, and all vowels before *-h are short. That statement is true even for vowels absent in the table.) I wonder if a more orderly distribution can be reconstructed.

I have excluded the central mid vowel *ə(ː) which I will discuss in part 6.

Diffloth does not reconstruct short *o.

Diffloth does reconstruct *-t, *-n, *-j, *-r, *-l, and *-s, but does not reconstruct *e(ː) *ɛ(ː) *oː *ɔ(ː) before those codas. In modern Mon, *-r and *-l have disappeared even in spelling, and *-s has become [h] (Diffloth 1984: 295-296).

Does modern Mon spelling reflect a stage of the language in which *e(ː)/*ɛ(ː) merged into *e and *oː/*ɔ(ː) merged into *o almost everywhere except before velars?


1. Written Mon mid vowels <e> and <o> (Nai Pan Hla 1988-89: 11-12, 16-18)

1a. <e>

written coda
*voiceless initial
*voiced initial
<k·>, <ṅ·>
<t·>, <n·>, <p·>, <m·>
eC e̤C
<y·> ea
<v·> ɛ
<h·>, <ʔ·> ɛC

Unlike open <ī> [ɔe], open <e> does not lower after *voiceless initials.

<ek·> [ɔeK] is like a lowered version of <ik·> [oiC] but has the diphthong [ɔe] of open <ī> after *voiceless initials.

I suppose that *-ej > *-eaj (partial dissimilation from *j?) > [ea].

I don't know why *e lowered before *-w and final glottals. So far I haven't seen any other cases of *-Vw having the same vowel as *-VQ (see below). In part 2, we saw that after *voiced initials, <āp·> and <ām·> were [ɛ̤p] and [ɛ̤m] (with raising rather than lowering!), whereas <āv·> was respelled as <au> [ɛ̤a] (again with raising rather than lowering!).

1b. <o>

written coda
*voiceless initial
*voiced initial
<k·>, <ṅ·>, <t·>, <n·>, <p·>, <m·>, <h·> oC
<y·> oa
<v·> o
o̤ʔ ~ ɜ̤ʔ

Given that final <e> after *voiceless initials is [e], I would expect <o> in the same environment to be [o], but the actual vowel is [ao]. (Cf. Khmer in which *eː and *oː in the same environment both became diphthongs: [ae] and [ao].)

Given that final <e> after *voiced initials is [e̤], I would expect <o> in the same environment to be [o̤], but the actual vowel is [ɜ̤]. I guess that *-o recently centralized to [ɜ̤] and that a similar change is optional for /o̤ʔ/.

The breaking of *-oj to *-oaj with simplification to [oa] is similar to the breaking of *-ej to *-eaj with simplification to [ea]. A single sound change can be formulated: mid vowel + *-j > that vowel + [a].
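For what it's worth, that rule can be sketched as a string rewrite in Python. This is a toy of my own, not anything from Nai Pan Hla or Diffloth: the function name break_before_j and the use of U+0324 to mark breathy voice are assumptions, and it collapses the intermediate *-eaj/*-oaj stage into a single step.

```python
# Toy formalization (hypothetical, my own) of the proposed sound change
#   mid vowel + *-j > that vowel + [a]
# e.g. *-ej > [ea], *-oj > [oa]. Breathy voice is written as the base
# vowel plus U+0324 COMBINING DIAERESIS BELOW, as in "o\u0324".

BREATHY = "\u0324"
MID_VOWELS = {"e", "o"}


def break_before_j(rhyme: str) -> str:
    """Replace final *-j with [a] if the preceding vowel is mid."""
    if not rhyme.endswith("j"):
        return rhyme  # the rule only applies before *-j
    nucleus = rhyme[:-1]
    base = nucleus.rstrip(BREATHY)  # ignore phonation when checking height
    if base in MID_VOWELS:
        return nucleus + "a"  # keep the vowel (and its phonation), add [a]
    return rhyme  # non-mid vowels such as *a are untouched by this rule


if __name__ == "__main__":
    for r in ["ej", "oj", "o" + BREATHY + "j", "aj", "ek"]:
        print(f"*-{r} > [{break_before_j(r)}]")
```

The breathy input is only there to show that phonation is carried over unchanged; *-aj is left alone because its development to [oa] took a different path.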

I suspect that *-ow became [o] and [o̤] after earlier *-o and *-o̤ became [ao] and [ɜ̤].

2. I wish I could have been at the "Using Manchu sources" workshop in Munich to hear the Hölzls' "Chinese Kyakala: The language and its sources".

Besides being of interest from a Jurchenic perspective (Chinese Kyakala is a Jurchenic language), the online presentation introduced me to the Yiddish original of Max Weinreich's famous quotation (original script from Wikipedia):

אַ שפּראַך איז אַ דיאַלעקט מיט אַן אַרמיי און פֿלאָט

a shprakh iz a dialekt mit an armey un flot ('a language is a dialect with an army and navy')

Kyakala didn't have an army or a navy. Is it a "dialekt"?

3. Given that Hong Kong has been in the news lately, it's a good time to write about Cantonese.

Cantonese has a number of unique Chinese characters representing words absent from literary Chinese and Mandarin. Generally those characters are transparent semantophonetic compounds, but here are a couple that aren't, at least not to me:


2997 1diq4 'to sink' < <WATER> + 1zhyr3 'real' (why?)

11.21.19:24: It took me until now to supply that example because I got distracted and left this entry unfinished.

An even more mysterious example is

𗊓< 𘠣+?

3011 2my1 'fountainhead, wellspring'

whose right side <?> is unique to that character. I can't find that component in Unicode.

If those examples are 'singly' semantomysterious, here's a doubly semantomysterious case: why does the transcription character

𗊛< 𘠣+𗤒

3045 1tshew1

for the Chinese name 曹 Cao (*1'tshaw1 in the Chinese dialect known to the Tangut) look like 𘠣 <WATER> plus the right side of 𗤒 3305 1kew4 'year'? 3305 might be a partial phonetic because the rhymes only differ by grade (1-ew1 and 1-ew4).

Was <WATER> in 3045 meant to correspond to the 氵 <WATER> in Chinese 漕 <WATER.Cao> *1'tshaw1 'canal', a homophone of the name 曹 *1'tshaw1?


1. In parts 1 and 2 we saw that written Mon <a> and <ā> generally had fronter values before velar, glottal, and zero codas:

a, ɛ̤
ɔ, ɔ̤, oa, o̤a
a, ɛ̤a, ai, a̤i a, a̤, ɛ̤ (!)

The exception to this pattern was <ā> [ɛ̤] after *voiced initials and before <p·> and <m·>.

I would have expected <āv·> to be [ɛ̤w] after *voiced initials, but in fact there is no longer any <āv·>. That rhyme has been respelled as a single symbol ဴ <au> (distinct from the inherent vowel <a> and the dependent vowel ု <u>) pronounced [ɛ̤a] after *voiced initials.

The pattern above has parallels in the pronunciation of the high vowel symbols <i ī u ū> (Nai Pan Hla 1988-89: 10-11, 15-17):


written coda
*voiceless initial
*voiced initial
<k·>, <ṅ·>
<t·>, <n·>, <p·>, <m·>, <h·>
iC i̤C


written coda
*voiceless initial
*voiced initial


written coda
*voiceless initial
*voiced initial
<k·>, <ṅ·>
<t·>, <n·>, <p·>, <m·>, <y·>, <h·> uC


written coda
*voiceless initial
*voiced initial

In Mon, high vowels remain high except before velar codas and the glottal stop (but not before the glottal fricative <h·>!). This is unlike Khmer, in which high vowels almost always lower after *voiceless consonants.

There are no rhymes ending in plain [i] or [u]; high vowels must be breathy [i̤ ṳ] in open syllables. (High vowels *i and *u with modal voice broke into diphthongs [ɔe] and [ao]; cf. the similar breaking of *modal voice *iː and *uː in Khmer to [əj] and [ow].)

I wonder if [ɜK] is from an earlier *euK parallel to [oiK]. Central [ɜ] could be a compromise between front *e and back *u.

<ī> and <ū> apparently occur only in native open syllables. Do they occur in borrowed closed syllables? If they do, I imagine they are read as <i> and <u>, judging from Old Mon <jiv·> ~ <jīv·> /ɟiw/ 'Jīvaka (a physician's name)'. (In Shorto's [1971] analysis of Old Mon, /i/ and /u/ were written with both short and long vowel symbols, implying there was no length distinction.)

2. I'd like to see a guide to the history of the Mon-Burmese script. I'm familiar with the Mon-Burmese script of the 12th century Kubyaukgyi inscription and the modern script, but know nothing about the stages in between.

The Mon symbols for အ <°a> and ာ <ā> are identical to those for Burmese, but there are subtle differences between the symbols for high vowels in Mon and Burmese:

i ī °i °ī u ū
°u °ū
𑀼 𑀽 𑀉



In the Kubyaukgyi inscription, both Mon and Burmese have a circle with a stroke inside for <i>. One or both tips of the inner stroke may touch the circle. In modern Mon and Burmese, that stroke has evolved in different ways.

Mon ဣ <°i> has a variant with a redundant ဣိ <i> added. That variant could be transliterated as <°ii>.

Mon ဣဳ <°ī> is easier to remember than its Burmese counterpart; it is simply a combination of ဣ <°i> and ဳ <ī>. Did earlier Mon ever have an <°ī> like Burmese ဤ <°ī> (which is similar to Khmer ឦ <°ī>)?

Similarly, Mon ဥု~ဥူ <°ū> is easier to remember than its Burmese counterpart; it is simply a combination of ဥ <°u> and ု <u> or ူ <ū>, whereas Burmese ဦ <°ū> looks like ဥ <°u> plus ီ <ī> (why?). The logic of Mon ဥု <°ū> is like that of Khmer ឩ <°ū>, which is ឧ <°u> plus an extra vertical stroke reminiscent of ុ <u>.

3. I just found SEAlang's Old Mon page. Alas, Shorto's Old Mon dictionary has yet to be digitized.


1. Written Mon ာ <ā> has six different phonetic values according to Nai Pan Hla (1988-89: 10-11, 15, 17):

written coda
*voiceless initial
*voiced initial
a ɛ̤a
<k·>, <ṅ·>
<t·>, <n·>, <y·> aC
<p·>, <m·> ɛ̤C

<ā> in *voiced-initial open syllables is pronounced almost like Khmer <ā> [iə] in the same environment.

As with <a>, <ā> has a fronted reading before velar codas. (But <a> did not have a fronted reading after *voiceless initials.)

<ā> also has a fronted (but monophthongal) reading [ɛ̤] before labial codas. This reading is homophonous with <a> before zero and velar (not labial!) codas.

<ā> is never followed by glottal codas or <v·>. <āv·> has been respelled as ဴ <au> which is [ao] after *voiceless consonants but [ɛ̤a] after *voiced consonants.

2. Yesterday I mentioned five nôm spellings of Vietnamese người 'person':


This frequency table for five editions of Kiều lists two more spellings:

3. Looking at 獻花歌 Hŏnhwaga (The Flower-Offering Song, c. early 8th c.) made me wonder why 獻 <OFFER> is abbreviated as 献. 獻 doesn't sound like 南 <SOUTH>, which also has no semantic relevance:

xiàn [ɕjɛn˥˩]
nán [nan˧˥]

Is a vague resemblance between the bottom left of 獻 and 南 enough to justify 南 as an abbreviation of 鬳 <CAULDRON>? Ah, I see now that there are variants of 鬳 with a 南-like component (⿵冂𢆉) on the bottom. If I had to abbreviate 獻, I'd choose one of three strategies:

The all-new approach dispenses with any attempt to retain any part of the original, since neither 鬳 <CAULDRON> nor 犬 <DOG> has any obvious relationship to offering. 㧥 already exists, and 先 isn't the best phonetic, so it's not an option.

I'm surprised there is no super-simplified replacement for 獻.

4. The character 鬳 <CAULDRON> (Old Chinese *kVrek) itself doesn't make much sense to me since the supposed phonetic 虍 <TIGER.STRIPES> *qʰra according to Shuowen doesn't sound much like it. I suppose *kVr- and *qʰr- are not too distant, but the rhymes don't match at all.

5. Today I read about a Mexican restaurant named Buho in Waikiki. I assume Buho is from Spanish búho 'owl'. If búho is from Latin būbō, why did -b- become <h> (phonetically zero) rather than <b> [β]?


1. I'm going to start a new series to unfold at a glacial pace.

Yesterday [in a post to be uploaded] I mentioned how Mon vowels developed differently depending on the voicing of preceding consonants: e.g.,

*paʔ > [paʔ]

*maʔ > [mɛ̤ʔ] (vowels after *voiced consonants become breathy and are higher than after *voiceless consonants)

Examining the graphonetic gap between spelling and pronunciation can give us some idea of how Mon vowels developed. The two syllables above are spelled ပ <pa> and မ <ma>; the characters have the same inherent vowel <a> though their readings have different rhymes. Let's look at all Mon <a>-rhymes as presented by Nai Pan Hla (1988-89: 15, 17):

written coda
*voiceless initial
*voiced initial
<k·>, <ṅ·>
<t·>, <n·>, <p·>, <m·>, <h·>, <ʔ·> ɔC
<v·> ɔ ɔ̤
<y·> oa

C is shorthand for 'the coda you'd expect based on the spelling'.

Post-*voiced raising only occurred before graphic zero and velar codas.

([ɛ̤k] has a lower mid front vowel like Burmese [ɛʔ] < *ak, and [ɛ̤ŋ] has a front vowel like Burmese [ɪ̃] < *iŋ. The Burmese *VK changes, however, have nothing to do with initial voicing; they occur after all initials, voiceless or voiced. There seems to be something about velar codas that makes them front-vowel friendly. Also cf. Khmer *aK > [eəK] after voiced initials. Khmer *a does not front between voiced initials and nonback codas: *aC > [oəC].)

In all other environments, readings of Mon <a> are identical except for phonation: modal after *voiceless initials and breathy after *voiced initials.

Nonimplosive voiced obstruents had devoiced, so the pronunciations of written syllables like <kan·> and <gan·> only differ in phonation: [kɔn] and [kɔ̤n].
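The last two paragraphs amount to a simple lookup, sketched below in Python under my own assumptions: read_a_rhyme is a hypothetical name, breathy voice is marked with U+0324, the onset inventory is deliberately tiny, and the only devoicing included is <g> > [k], the one attested by <gan·> [kɔ̤n].

```python
# Hypothetical sketch (my own assumptions, not Nai Pan Hla's notation) of
# how written Mon <a> + coda is read when the coda is <t· n· p· m· h· ʔ·>:
# the vowel is [ɔ] in every case (per the table above), and the written
# initial only decides phonation and, for voiced obstruents, devoicing.

BREATHY = "\u0324"  # COMBINING DIAERESIS BELOW marks breathy voice

VOICELESS = {"k", "c", "t", "p"}
DEVOICED = {"g": "k"}          # only the pair attested by <gan·> [kɔ̤n]
VOICED_SONORANTS = {"m", "n"}  # voiced, but pronounced as written

CODAS = {"t", "n", "p", "m", "h", "ʔ"}


def read_a_rhyme(onset: str, coda: str) -> str:
    """Return a rough phonetic reading of written <onset + a + coda>."""
    if coda not in CODAS:
        raise ValueError(f"coda {coda!r} is outside this sketch")
    if onset in VOICELESS:
        return onset + "ɔ" + coda  # modal voice
    if onset in DEVOICED:
        return DEVOICED[onset] + "ɔ" + BREATHY + coda  # devoiced, breathy
    if onset in VOICED_SONORANTS:
        return onset + "ɔ" + BREATHY + coda  # breathy
    raise ValueError(f"onset {onset!r} is outside this sketch")


if __name__ == "__main__":
    print(read_a_rhyme("k", "n"))  # written <kan·>
    print(read_a_rhyme("g", "n"))  # written <gan·>
```

Velar and zero codas are deliberately rejected, since those are exactly the environments where post-*voiced raising makes the vowel itself differ, not just its phonation.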

I imagine that <av·> was pronounced something like *[ɔw] before *[w] was lost. (I don't know whether this loss predated the development of phonemic phonation, a.k.a. register.)

As for <ay·>, I think there is a parallel with French moi [mwa]:

*aj > *ɔj > *ɔe > [oa]

Thai Mon preserves a palatal vowel in [cɔɛ] < Proto-Monic *caj 'louse' (Diffloth 1984: 75). In modern written Mon, 'louse' is spelled <cai>. <ai> is an abbreviated spelling of <ay·> (Nai Pan Hla 1988-89: 2).

Wiktionary has a list of Diffloth's 1984 Proto-Monic, Proto-Mon, and Proto-Nyah Kur reconstructions. Monic is a subgroup of Austroasiatic with two divisions, Mon proper and Nyah Kur.

2. And now for the Khmer side of Mon-Khmer ... yesterday morning while reviewing Khmer, I encountered a couple of interesting words.

3. Last night while copying line 7 of the epitaph of Yelü Dilie as reproduced in Kane (2009: 191-211) to practice the Khitan small script, I found that Kane had transliterated


as <yi.il.iń> instead of <êm.il.iń> which would be the normal transliteration in his system. Kane (2009: 194) thinks it corresponds to a Khitan "name or official title" transcribed in Chinese as what is now read as yilimian and yilibi in Mandarin (which is not far from the northern dialect underlying the transcriptions). But neither <yi.il.iń> nor <êm.il.iń> resembles those Chinese transcriptions. <êm.il.iń> might be a match if the Chinese reversed the medial segments, writing Khitan *emilin as if it were *elimin. <yi.il.iń> is even less likely, as it lacks the labials in yilimian and yilibi.

4. I found four nôm spellings of Vietnamese người 'person' via nomfoundation.org's Nôm Lookup Tool. The last two are unusual:

The use of 仁 as an indirect semantic component reminds me of how 仁 rather than 人 is the Khitan large script character for ku 'person'. Was 仁 a carryover from the lost Parhae script, and if it was, what motivated the Parhae to write their word for 'person' as 仁 instead of as 人?

倘 is like many Tangut characters that have an obvious semantic component and one or more mystery components with no transparent function.

5. It's been almost three years since I switched from Mojikyo to Tangut Yinchuan, and I'm not going back except for preview thumbnails.

Today I noticed that Mojikyo always renders Tangut component 𘡕 086 as 𘢩 170, the reverse of the mistake I've been making in my handwriting.

There are two minimal pairs necessitating a distinction between the two:

6. Today I discovered that Homophones edition A has the aforementioned 𗯗 5834 2le1 'to change' (24A77) instead of 𗯖 5841 2khwuq1 'to cut' as in editions B2 and B5. Such confusions not only make me feel better about my many errors writing Tangut but may also give insight into how the script worked. Below 𗯗 5834/𗯖 5841 is 𘖵 5019 2khwuq1 'saw', which has its homophone 𗯖 5841 as a phonetic beneath component 𘨝 542 <METAL>. It is written correctly in all three editions.

