My introduction to Khmer historical phonology back in 1994 was Pinnow (1980), which posited twelve long *vowels and four short *vowels:

Pinnow's long *vowels





Pinnow's short *vowels


Pinnow takes the long vowels as basic, so he indicates brevity rather than length.

Last year I discovered Jenner and Sidwell's (2010) reconstructed vowel system:

Jenner and Sidwell's long *vowels


The two *Vːə diphthongs are Angkorian innovations. Modern [ɯːə] (my [ɨə]) is an even more recent innovation; see Sakamoto (1977) who demonstrates that many [ɨə]-words are borrowings from Thai and, to a lesser extent, Vietnamese. (He notes one highly anomalous loan from Sanskrit: រឿណរង្គ <rï̄yeṇaraṅga> [rɨənrŭəŋ] < Skt raṇaraṅga 'battlefield'.)

Jenner and Sidwell's short *vowels





Jenner and Sidwell posit eight short vowels, twice as many as Pinnow's four. They only give one example of the 'extra' vowels (from a Pinnowian perspective):

Old Khmer: <radeḥ> ~ <rddeḥ> *rədeh ~ *rɔdeh 'cart'
Modern Khmer: <radeḥ> [rɔtih]

The form in the "Pinnow?" column is my guess according to my understanding of his system.

A sample of Old Khmer <-eḥ> words in Jenner's online dictionary all ended in *-eh with short *e:

<neḥ> *neh 'this'

<peḥ> *peh 'to pluck'

<ʔseḥ> *seh 'horse'

Like 'cart', all three of the above words are still spelled with <-eḥ> in modern Khmer.

Are there any Old Khmer <-eḥ> words that were pronounced with *-eːh? Or is short *e an allophone of *eː before *-h?

The problem is that the Khmer script has never had distinct symbols for short and long e, so in theory Old Khmer <e> could have represented either *e or *eː. I can only think of two ways to reconstruct such a length distinction in Old Khmer:

1. Internal evidence: The *e that Pinnow and I reconstruct on the basis of modern Khmer corresponds to two sets of spelling patterns in Old Khmer.

2. Backward projection: If Old Khmer <e> has two sets of correspondences in modern Khmer, then those sets might be reflexes of short *e and long *eː.

(The fact that *e has different reflexes depending on the *voicing of the preceding consonant and on what follows *e is not relevant if the goal is to reconstruct a phonemic length distinction. To do that, one would ideally find two Old Khmer words spelled with the same consonant plus <-eḥ> with different reflexes in modern Khmer. One would then conclude that the Old Khmer script was incapable of indicating the length difference in the vowels of those two words.)

In scenario 1, if some modern Khmer <-eḥ> corresponds to Old Khmer <-eḥ> *-eh, then some other modern Khmer <-eḥ> might correspond to Old Khmer <-X> *-eːh.

In scenario 2, some Old Khmer <-eḥ> *-eh became modern Khmer <-eḥ>, whereas other instances of Old Khmer <-eḥ> *-eːh became modern Khmer <-Y>.

I don't know enough about Old Khmer to guess which scenario is correct, much less come up with other scenarios.

For now I am inclined to go with my allophonic hypothesis. Pinnow's/my *e has different reflexes depending on whether it was followed by *-h:

after *voiceless initial: [ə] before *-h, [ej] before palatals
after *voiced initial: [ɨ] before *-h, [eː] before palatals
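
The four reflexes listed above amount to a lookup keyed on two conditions: the *voicing of the initial and what follows *e. Here is a minimal Python sketch of that lookup. The names `E_REFLEXES` and `reflex_of_e` are hypothetical labels of mine, and the table covers only the four cells given above, not a full account of *e.

```python
# Reflexes of *e as tabulated above.
# Keys are (voicing of the earlier initial, what follows *e).
# The dictionary name and the key labels are illustrative assumptions.
E_REFLEXES = {
    ("voiceless", "h"): "ə",
    ("voiceless", "palatal"): "ej",
    ("voiced", "h"): "ɨ",
    ("voiced", "palatal"): "eː",
}


def reflex_of_e(initial_voicing: str, following: str) -> str:
    """Return the modern Khmer reflex of *e for one cell of the table."""
    try:
        return E_REFLEXES[(initial_voicing, following)]
    except KeyError:
        # Anything outside the four tabulated cells is deliberately unsupported.
        raise ValueError(
            f"no reflex tabulated for {initial_voicing!r} + {following!r}"
        )
```

Calling `reflex_of_e("voiced", "h")` returns the [ɨ] cell; any other context raises an error rather than guessing.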

That suggests that *e was phonetically (not phonemically!) different (shorter?) before *-h. The reflexes of *-eh are identical to those for *-ĭh:

before *-h
before *-ʔ and in open syllables
before *-j
after *voiceless initial

after *voiced initial


*e in *-eh might have been short, as in *-ĭh.

THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 7)

1. The Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script uses the  មូសិកទន្ត​ <mūsikadanta> 'mouse¹ tooth' diacritic ៉ <"> to represent English /æ/. That wouldn't have occurred to me since <"> in Khmer is not a vowel symbol. It has two functions:

- to indicate that a vowel after a voiced consonant symbol is pronounced as if it had been preceded by a *voiceless consonant: e.g.,

យ៉ាង <y"āṅa> [jaːŋ] 'kind'

which has the reflex of *aː normally after voiceless consonants as in

ខាង <khāṅa> [kʰaːŋ] 'side'

rather than the reflex of *aː normally after voiced consonants as in

យាង <yāṅa> [jiəŋ] 'to go (royal)'

- to indicate that ប <pa> stands for [p] rather than [ɓ], its normal value in modern Khmer: e.g.,

ប៉ី​ <p"ī> [pəj] 'flute'

cf. បី​ <pī> [ɓəj] 'three'

(Examples from Huffman 1970 added 8.30.14:48.)
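
The first function of <"> can be sketched as a series switch applied before the vowel is read. The Python below is only an illustration: the names `MUSIKADANTA`, `AA_REFLEX`, `effective_series`, and `reflex_of_aa` are hypothetical, and the two reflexes of *aː are the ones cited elsewhere in this post ([aː] as in <ṇāma> [naːm], [iə] as in <nāma> [niəm]). The second function (ប <pa> as [p] rather than [ɓ]) is not modeled.

```python
from typing import Optional

MUSIKADANTA = "\u17C9"  # the mouse-tooth diacritic, transliterated <"> in this post

# Modern reflexes of *aː by consonant series (assumption: only these two
# values, taken from the examples cited in the post).
AA_REFLEX = {"voiceless": "aː", "voiced": "iə"}


def effective_series(base_series: str, diacritic: Optional[str] = None) -> str:
    """Return the series that governs the vowel reading.

    A voiced-series consonant symbol carrying the mouse-tooth diacritic
    is read as if it belonged to the voiceless series.
    """
    if base_series == "voiced" and diacritic == MUSIKADANTA:
        return "voiceless"
    return base_series


def reflex_of_aa(base_series: str, diacritic: Optional[str] = None) -> str:
    """Return the modern reflex of *aː after a consonant symbol."""
    return AA_REFLEX[effective_series(base_series, diacritic)]
```

Under these assumptions, `reflex_of_aa("voiced", MUSIKADANTA)` gives the vowel of យ៉ាង 'kind', while `reflex_of_aa("voiced")` gives the vowel of យាង 'to go (royal)'.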

The simplicity of <"> (two short strokes) is appropriate for the fifth most common vowel in English after /ə ɪ i ɛ/. Three of those vowels have one-stroke symbols in the ASB system:

/i/ has a two-stroke symbol ី <ī>.

ASB's <"> is easier to write than my choice of two-stroke <ĕ> (modern Khmer [ae] ~ [ɛː] < *ɛː) for English /æ/.

¹8.30.15:24: Sanskrit mūṣika- 'mouse' is cognate to English mouse.

2. Last night I saw that the English Wikipedia gives two different Khmer spellings of Lon Nol:

The first is the one used by the Khmer Wikipedia which doesn't mention the second. One might expect the two spellings to be homophonous, but I would read them (perhaps erroneously) with different vowels as

Was it really possible to pronounce his personal name Nol two different ways? In case you're wondering what's going on with the gap between spelling and pronunciation, this table may help (K = voiceless obstruent, G = voiced obstruent, Ṅ = voiced sonorant, P = labial, ฿ = nonlabial):

Earlier Khmer
Modern Khmer
*GɔːC <GaCa> [KɔːC]
*ṄɔːC <*ṄaCa> [ṄɔːC]
*KɔC <KaCa'>
*GɔC <GaCa'> [Kŭə฿] ~ [KuP]
*ṄɔC <ṄaCa'> [Ṅŭə฿] ~ [ṄuP]
*KoːC <KoCa> [KaoC]
*GoːC <GoCa>
*ṄoːC <ṄoCa> [ṄoːC]
(no *CoC)
*KuːC <KūCa> [KouC]
*GuːC <GūCa>
*ṄuːC <*ṄūCa> [ṄuːC]
*KuC <KuCa> [KoC]
*GuC <GuCa>
*ṄuC <ṄuCa> [ṄuC]

Earlier Khmer has both voiceless and voiced obstruents (*K and *G) which merge into voiceless [K] in modern Khmer.

Earlier Khmer has a simple short/long vowel system whose modern Khmer reflexes diverge depending on the *voicing value of the preceding consonant (and the labiality of the final consonant after *ɔ).
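
The conditioning described here can be sketched as a lookup on the class of the earlier initial, the earlier vowel, and (for short *ɔ only) the labiality of the final. This Python sketch encodes only the cells that have a modern form in the table above; cells without one return `None` rather than a guess. All names are hypothetical, and `"N"` stands in for Ṅ to keep the keys plain ASCII.

```python
from typing import Optional

# (initial class, earlier vowel) -> modern vowel, for the cells that have
# a modern form in the table. "N" stands for the voiced sonorant class (Ṅ).
REFLEXES = {
    ("G", "ɔː"): "ɔː",
    ("N", "ɔː"): "ɔː",
    ("K", "oː"): "ao",
    ("N", "oː"): "oː",
    ("K", "uː"): "ou",
    ("N", "uː"): "uː",
    ("K", "u"): "o",
    ("N", "u"): "u",
}


def modern_initial(initial: str) -> str:
    """Earlier voiced obstruents (G) merge into voiceless K; others are unchanged."""
    return "K" if initial == "G" else initial


def modern_vowel(initial: str, vowel: str, final_is_labial: bool = False) -> Optional[str]:
    """Return the modern vowel, or None where the table gives no reflex."""
    if vowel == "ɔ" and initial in ("G", "N"):
        # Short *ɔ after a *voiced initial depends on the final consonant.
        return "u" if final_is_labial else "ŭə"
    return REFLEXES.get((initial, vowel))
```

For example, `modern_vowel("K", "oː")` yields the vowel of [KaoC], and `modern_vowel("G", "ɔ", final_is_labial=True)` yields the vowel of [KuP].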

After *voiceless consonants, labial vowels are pushed down:

After *voiced consonants, labial vowels either remain the same or are pushed up:

The bending of Khmer vowels reminds me of the bending of Old Chinese vowels. In both Khmer and Old Chinese, *vowels split into two series, 'lower' and 'higher' (though the conditioning factors were different):

modern Khmer
Late Old Chinese
modern Khmer Late Old Chinese
*o > *əw
[ou] *ou > *aw

(Above I give Khmer reflexes for *long vowels in spite of the absence of length in the first column.)

I suspect Tangut also underwent a similar vowel split, though the details are unknown.

One might expect <ṇula> to behave like *ṄuC in my first table: i.e., it should be read [nul]. But in modern Khmer script, ណ <ṇa> indicates that the following vowel is read as if it had once been preceded by voiceless *n̥. Khmer never had a retroflex /ɳ/ phoneme, so ណ <ṇa> came to be used as the virtual *voiceless counterpart of ន <na> for /n/. I emphasize the word "virtual" - vowels after ណ <ṇa> are pronounced like vowels after, say, ត <ta>, but at no time was <ṇa> ever pronounced *n̥. For instance, <ṇāma> [naːm] 'water' is a borrowing from Thai น้ำ <nā2ṁ> [naːm˥] 'water' which never had *n̥. The word was borrowed after the shift of *aː to *iə after *n. Contrast with នាម <nāma> [niəm] 'name', borrowed from Indic before the shift of *aː to [iə] after voiced *n.

Early Khmer: 'name' *naːm; 'water' not yet borrowed
*aː to [iə]: 'name' *niəm; 'water' not yet borrowed
Later Khmer: 'name' [niəm]; 'water' [naːm]

3. Last night I saw the Khmer spelling of Sisowath for the first time:

ស៊ីសុវតិ្ថ <sˌīsuvatthi> [siːsoʋat]

(I confess I've only read about Cambodian history in English for the last twenty-six years.)

<suvatthi> is from Pali suvatthi- 'well-being'. But what is <sˌī>? I was surprised by the ក្ផៀសក្រោម <kphiasa kroma> 'under dash'² ុ <ˌ> which I've never seen in an Indic loan before. It indicates that <ī> is to be read [iː] as if it had followed a *voiced consonant.

There is an identically spelled Khmer word <s^ī> [siː] 'to eat' with <trīsabda>. Why isn't it <sī> [səj] with the regular reflex of *iː after voiceless */s/? Is it a loanword like another Khmer <s^ī> [siː] 'color', a loanword from Thai สี  <sī> [siː˩˩˦] 'id.' borrowed after the shift of *iː to [əj] after *voiceless consonants was complete?

²8.30.16:07: The ុ <kphiasa kroma> identical to  ុ <u> in shape is a subscript virtual voicing reversal mark. It replaces the ៉ <mūsikadanta> and the ៊ <trīsabda> diacritics when a superscript symbol occupies their positions.

It's taken me twenty-four years to figure out that <trī> is 'three' and not 'fish' and that ៊ <trīsabda> gets its name from its resemblance to ៣ <3>.

4. Last night I saw the name Mohannad Mohanna. Are Mohannad and Mohanna unrelated? Wikipedia says Arabic مهند Muhannad (> Persian Mohannad) is  from Hind 'India' (with mu- + a CaCCaC pattern). And Wikipedia has three entries for places in Iran named  مهنا‎ Mohanna.

5. Wiktionary says Latin sidus 'star' is from Proto-Indo-European *sweyd- 'sweat'. How is that semantically possible?

6. Today I started copying the Tangut character textbook


0152 4009 5370 5449 4797

1kiq2 1dyq4 1paq 1tiq4¹ 2wyr4

'gold grain palm place writing'

a.k.a. The Golden Guide by hand.

The second character in the title (4009) is the only tangraph with component 157 (𘢜).

In Homophones A, 4009 appears with component 160 (𘢟) (16B48).

In Homophones B and D (a.k.a. B2 and B5), 4009 appears with component 157 (𘢜) (17B22).

You can see scans of the Homophones pages on Andrew West's site.

157 seems to be an abbreviated form of 160 with the <WATER>-like portion (component 036 𘠣) written as a single stroke.

The Tangraphic Sea analysis of 4009 derives 157/160 from a right-hand element which I can't find in Unicode: ⿱𘠙𘠅. 006 𘠅 is the right-hand version of 036 𘠣 <WATER>. None of the characters with ⿱𘠙𘠅 have anything to do with water, though:

Li Fanwen number and gloss
𗾵 2615: second half of 𗉴𗾵 1687 2615 2chhy3 2khu4 'minced meat' (dictionary only)
𗚭 4142: to chop; bean jam?
𘚑 4446: to break, broken (dictionary only)
𗮝 5359: minced meat
5380: fragmentary, broken; < Chn 碎
𘂉 5900: second half of 𗨦𘂉 3381 5900 2by1 2di4 'fragment' (dictionary only)

Unlike many other tangraphic elements, ⿱𘠙𘠅 has a clear semantic function: almost all of the above involve pieces or making something into pieces.

I regard dictionary-only words as candidates for loans from a substratum. 1687 2615 might be a unanalyzable disyllabic substratum synonym of native 5359.

4142 may be a phonetic loan for 'bean jam'.

Unlike 1687 2615, 4446 is not disyllabic. Could it be a native monosyllable that just so happens not to have been found in any nonlexicographic texts yet?

5359 has a straightforward graphic structure <MEAT.BREAK> and is the basis for 2615 and 4142.

5380 has an unexpected -e that may indicate that the Chinese dialect known to the Tangut already had a pronunciation of 碎 closer to modern standard Mandarin [swej] than Middle Chinese *swiʰ.

5900 may be an adjective 'broken' after 3381 'pellet'. Without any attestations not preceded by 3381, I can't tell if it can stand by itself.

7. I wouldn't have guessed that Hagadone is an Americanization of Hagedorn.

THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 6)

1. I used the inherent vowel of the Khmer script to write English /ʌ/, but the Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script uses the long vowel symbol ា <ā> instead. That surprised me because /ʌ/ isn't long.

Another surprise is that in the ASB proposal, ា <ā> does double duty for English /a/. Wouldn't that lead to ambiguous spellings? Maybe not - many of my /a/-words are /ɔ/-words in the dialect ASB is based on (e.g., pot = my /pat/ but ASB's /pɔt/). Putting such words aside, the only minimal pair I can think of is calm /kam/ : cum /kʌm/; both would be written កាម <kāma> in ASB. I would distinguish them as កាម <kāma> and កម <kama>.

ា <ā> does double duty in my system as well for final schwa: e.g., comma /kamə/ as កាមា <kāmā>. Although I use the inherent vowel for schwa in word-medial position, I can't do so in word-final position where <Ca> represents /C/.

2. Last night I saw an ad for a book by "LEUYEN PHAM" in all caps on my Kindle. It caught my eye because I had never seen the two syllables of a Vietnamese personal name run together before. The front (and only?) page of LeUyen Pham's site asks, "How Do You Pronounce LeUyen Pham?!?" but doesn't answer the question. In Vietnamese, Le Uyên (tones unknown) is pronounced [le ʔwiən] (north) ~ [le ʔwiəŋ] (south). However, I don't know how English speakers pronounce it.

3. I thought I had never seen T in the name of any Hawaiian group until last night when I saw a reference to Hui Aloha ‘Āina Tuahine in the Honolulu Star-Advertiser. Turns out I had first seen the name of that group on p. 363 of Albert J. Schütz' The Voices of Eden back in 1995.

In standard Hawaiian, t shifted to k. However, t persists in tūtū 'any relative or close friend of grandparent's generation' and Tuahine, defined by wehewehe.org as

(More commonly Tuahine). Name of a misty rain famous in Mānoa, Oʻahu, named for Kuahine, who turned to rain after the murder of her daughter, Ka-hala-o-Puna; the rain is also in other localities.

I suspect t survived in tūtū and Tuahine because they were borrowed into English which has a /t/ : /k/ distinction absent in Hawaiian which only has three kinds of stops: labial /p/, glottal /ʔ/, and a third stop whose point of articulation varies by dialect.

Does Tuahine indicate that the Mānoa dialect had [t] for that third stop?

Today Mānoa is a center for education in standard Hawaiian with [k] for that third stop. When Hawaiian was repopularized¹, words with that third stop were pronounced with [k] following their standardized spellings with k.

¹I would rather not say "revived" since Hawaiian has never died. What has been lost is the original diversity of dialects. As far as I know, the only two varieties still spoken by large numbers of people are the Niʻihau dialect native to the population of Niʻihau and the standard language learned in schools.

4. Valdemar Knudsen (1819-1898) is said to have been able to speak "the 3 Hawaiian languages" fluently. What were the three? Hawaiian, English, and Pidgin Hawaiian? (Hawaiian Creole English, now 'Pidgin', had only begun to develop during Knudsen's last years.)

The Pidgin Hawaiian article at Wikipedia says a couple of surprising things:

Emerging in the mid-nineteenth century, it was spoken mainly by immigrants to Hawaii, and mostly died out in the early twentieth century, but is still spoken in some Hawaiian communities, especially on the Big Island.

It's still alive? I thought it was extinct. Has anyone done any modern fieldwork?

Like all pidgins, Pidgin Hawaiian was a fairly rudimentary language, used for immediate communicative purposes by people of diverse language backgrounds, but who were mainly from East and Southeast Asia.

Southeast Asia? As far as I know, mass Southeast Asian immigration to Hawaii postdates the Vietnam War.

THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 5)

1. The Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script has no inherent vowels, so it has vowel symbols corresponding to my inherent vowel: <ā> for /ʌ/ and បន្តក់ <pantaka'> /bɑntɑʔ/ <'> for /ə/.

campus: ASB ក៉ម្ប់ស <k"amp'asa>; this site កែម្បស <kĕmpasa>
button: ASB ពាត់ន <bāt'ana>; this site ពតន <batana>

(8.28.1:36: Added my guesses for H&P-style forms.)

One good reason to use <'> for schwa is that it is a simple, short stroke. It would be impractical to write the most frequent vowel in English with a complex shape.

It hadn't occurred to me to use <'> as a vowel character because in Khmer proper it functions as a breve for the inherent vowel and <ā>, not as a vowel character:

បន្តក់ <pantaka'> /bɑntɑʔ/

(A hypothetical †តក <taka>​ would be †/tɑːʔ/. In theory the name of the diacritic could be written with two <'> as ˟បន្ត់ក់​ <pan'taka'>, but unstressed initial <CaC> syllables always have short vowels, so a second <'> is redundant.)

កាត់ <kāta'> /kat/ 'cut'

(The resemblance to the English word is coincidental; compare កាត <kāta> /kaːt/ 'card' without <'>.)

In Khmer proper, <'> appears atop the symbol for a syllable-final consonant following the symbol with the vowel it shortens, whereas in ASB, <'> behaves like a vowel symbol, combining with the symbol for the consonant that immediately precedes a schwa.

(8.28.1:22: Khmer examples of <'> added. បន្តក់ <pantaka'> is unreadable in ASB since ASB has no inherent vowels - it looks like ASB [p-nt-kə].)

2. Khmer ថៃ <thai> [tʰaj] must have been borrowed after Thai *d- > tʰ-, and ថៃឡង់ដ៍ <thaiḷaṅ'aṭa˟> [tʰajlɑŋ] may be an even more recent borrowing from French Thaïlande [tajlɑ̃d] with Khmer [ɑŋ] approximating French [ɑ̃]. But surely the Khmer had a word for 'Thai' predating those borrowings. Did Khmer ever have a word like †ទៃ <dai>? The only premodern Khmer word I can find in Jenner is an undated សៀម <siama> 'Siam'.

3. When looking for 'Thai' in Philip Jenner's Old Khmer dictionary last night, the only entry that appeared was ស៊ង <s"aṅa> 'two', a borrowing from Thai /sɔːŋ/ 'id.' attested in a text from 1684. The <"> indicates that <sa> by itself could not represent /sɔː/. The split of /ɔː/ to

/ɑː/ after *voiceless consonants

/ɔː/ after *voiced consonants

must have already occurred. The addition of <"> indicates that the following vowel is one normally associated with a *voiced consonant: i.e., /ɔː/ in this case. សង <saṅa> without <"> was /sɑːŋ/ < */sɔːŋ/ 'to give back'.

That split occurred after the loss of voicing in obstruents.

Thai also devoiced its obstruents, but unlike Khmer, it aspirated them: e.g., *d- > tʰ-. So I was surprised to see Thai พัน <ban> *ban (now /pʰan/) 'thousand' borrowed as Khmer ពន <bana>. Is the Khmer spelling merely a mechanical copy of the Thai spelling, or does it indicate that Thai *b had not yet shifted to pʰ-?

4. What is the etymology of the name Odoacer?

5. I think I've only spoken of 'Turkish' borrowings into Balkan languages. Marek Stachowski (2019) writes, "it is much better to call Turkish loanwords in the Balkan languages just 'Turkish', which is sufficiently clear in English." Whew. But I confess that I used the term 'Turkish' without knowing his reasoning against the term 'Ottoman Turkish'.

THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 4)

1. Exactly how the Brahmi script - the parent of all Indic scripts - developed is not clear, but one thing is for sure: the principle of inherent vowels is ingenious. Brahmi was first used to write Middle Indo-Aryan whose most common vowel was short a. Making this short a inherent to consonant symbols saved a lot of effort and space.

This state of affairs reflects a Proto-Indo-Iranian innovation: the merger of Proto-Indo-European *e/*o into *a (there was no Proto-Indo-European *a according to Beekes¹):

*eH, *oH, *ē, *ō
*ei, *oi
*eu, *ou

Whitney (1896: 26) found that nearly 20% of segments in a Sanskrit text sample were short a.

Middle Indo-Aryan (e.g., Pali) inherited that abundance of a from Sanskrit (= Old Indo-Aryan) and gained a few more short a via the loss of other vowels: e.g.,

Skt mr̥ga- > Pali maga- (also miga-!) 'deer'

(See Masica 1991: 167-168 for more examples.)

And it was at that stage that Brahmi was developed to write a-filled Middle Indo-Aryan. (Sanskrit was first written after its descendants were!)

An inherent a in the Brahmi script is a good fit for Old and Middle Indo-Aryan. But is it a good fit for English? What vowel is most frequent in English? My guess was schwa, and I was right. So in my adaptation of Khmer script to English, the inherent vowel is schwa: e.g.,

campus [kʰæmpəs] > កែម្បស <kĕmpasa>

cf. ASB ក៉ម្ប់ស <k"amp'asa>

(Example added 8.27.0:39. ASB added 8.27.20:31. I write English voiceless obstruents as Khmer unaspirated voiceless obstruents regardless of their allophonic aspiration in English: e.g., English /k/ [k] ~ [kʰ] as ក <ka>.)

I also use the inherent vowel to write /ʌ/ and syllabic consonants: e.g.,

button [ˈbʌtn̩] > ពតន <batana>

cf. ASB ពាត់ន <bāt'ana>

(8.27.20:34: ASB added.)

(8.27.0:03: Like actual Khmer, I write final consonants with <Ca> symbols. I could use virāmas, but if I can live without them in Khmer and Thai without word spacing, I can live without them in English with word spacing.

In theory I could write syllabic consonants as subscript consonants: e.g.,

button [ˈbʌtn̩] > ពត្ន <batna>

but I prefer to reserve subscript consonants for clusters of nonsyllabic consonants.)

On the other hand, the Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script has no inherent vowels because "it only adds an additional layer of complexity".

¹Like Pulleyblank, I find this situation highly improbable, and I suspect an original central *a that later polarized into front and back vowels *e and *o.

2. The first proposed etymology of culvert at Wiktionary is almost meaningless: "a dialectal word". Almost, because at least that tells us the proposer thinks the word is a borrowing from another dialect. But which dialect - and what is its derivation there?

3. Today I realized I never learned how to write Burmese rotated subscripts which only appear in  Pali loanwords:

ဌ <ṭha> > ဏ္ဌ <ṇṭha>
ဍ <ḍa> > ဏ္ဍ <ṇḍa>

Are they written like their full-size counterparts but at a different angle: e.g., is sideways ဌ <ṭha> written counterclockwise from right to left rather than counterclockwise from top to bottom like its upright version?

(8.27.22:39: I found the answer on p. 402 of John Okell's comprehensive Burmese: An Introduction to the Script [429 pages!]: both rotated subscripts are written clockwise from left to right.)

Today I also realized that there is a logic to the rotation of subscript characters:

Burmese: An Introduction to the Script doesn't mention a subscript version of ဠ <ḷa>, and most of my Burmese fonts don't have such a subscript. However, the Myanmar Text font does: its subscript <ḷa> (written counterclockwise when full-sized) rotates clockwise (cf. ဌ <ṭha>) even under normal-width characters. Is subscript <ḷa> real or artificial? I've never seen any Cḷ-clusters in Sanskrit or Middle Indo-Aryan and wouldn't expect any since ḷ originates from a lenition of intervocalic ḍ: e.g.,

Skt ṣoḍaśa > Pali soḷasa 'sixteen'

Skt garuḍa- > Pali garuḷa- 'garuda'

(Examples added 8.27+.12:48. See Masica 1991: 170 for more.)

4. The Jurchen character 右 <mei> is a lookalike of Chinese 右 <RIGHT>.

Today it occurred to me that <mei> sounds a bit like Japanese 右 migi 'right'. A mostly dubious scenario:

THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 3)

1. Indic scripts like Khmer tend to have a wealth of consonant characters. This is because Indic scripts were originally intended for Indic languages characterized by

Most of these oppositions do not exist in English.

On the other hand, in spite of that wealth of consonant characters, Indic scripts originally¹ lacked characters for several fricatives that exist in English: /ʒ z θ ð f/.

The Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script assigns 'extra' characters from an English perspective to English fricatives:

[Table of ASB, H&P, and this site's Khmer spellings (script and transliteration) for English fricatives and /w/; surviving entries: /ʒ/ ហ្យ, /ð/ ឌ².]

I've included Huffman and Proum's (1983: 31, 42; hereafter H&P) transcription and my own for comparison. English /w/ is not a fricative, but I have included it because Khmer has no /v/ : /w/ distinction.

/ɣ/: Did ASB include this for symmetry with /x/ which is extremely marginal in English? /ɣ/ reminds me of Sanskrit l̥̄ which was created to be parallel to the rare but real phoneme r̥̄. (In Khmer, the character ឮ <l̥̄> devised for that purely hypothetical Sanskrit phoneme is used for the real and really common word [lɨː] 'to hear'.)

/ʃ/: ASB's choice of ឆ <cha> reminds me of the Thai convention of borrowing English /ʃ/ as /tɕʰ/: e.g., shampoo as แชมพู <jeemabū> [tɕʰɛːmpʰuː].

Unlike H&P and ASB, I use the extinct Khmer character ឝ <śa>: cf. Hindi शैम्पू <śaimpū> 'shampoo'.

/ʒ/: I use the ត្រីសព្ទ <trīsabda> diacritic <^> to indicate that voiceless <śa> is pronounced with a voiced initial. In actual Khmer orthography, <trīsabda> over a <voiceless> consonant indicates that a following vowel is pronounced as if it had originally followed a *voiced consonant: e.g., ហ៊ាន <h^ān> [hiən] 'to dare' is pronounced as if it had developed from a (nonexistent and impossible) *ɦaːn. I could transliterate <ha> + <trīsabda> as <ɦa> rather than as <h^a>, but to imply that Khmer once had [ɦ] would go against the general historical/etymological principle of my transliteration.

/z/: H&P's ឯ <°e> may be a case of arbitrarily using a Khmer character that would otherwise go unused.

I would have expected ASB to use ឍ <ḍha> or ធ <dha> for /z/ by analogy with the other voiced aspirates for voiced fricatives. <Z> on my Khmer NIDA keyboard layout is assigned to ឍ <ḍha>.

My ស៊ <s^a> is based on the same principle as my ឝ៊ <ś^a> (see above).

/ð/: ASB's ឋ <ṭha> is surprising since this character originally did not represent a voiced consonant.

/f/: H&P's ហ្វ <hva> is carried over from an existing Khmer convention to write foreign /f/.

/w/: H&P use អ្វ <ʔva> because វ <va> is already taken for /v/. In modern Khmer, វ <va> is pronounced [ʋ] in initial position, but in earlier Khmer, it may have been pronounced [w].

¹"Originally" because characters were later devised for such fricatives: e.g., Devanagari ज़ <j.a> for [zə]. But such characters created in India postdate the spread of Indic script to Southeast Asia, so there is no Khmer analogue of Devanagari ज़ <j.a> for [zə].

²The actual printed character is ឧ <°u>, perhaps because Huffman and Proum (1983) was written on a typewriter without a ឌ <ḍa> key. Compromises were inevitable on typewriters:

The Keyboard layout for 120+ elements of Cambodian script and essential punctuation marks was a very difficult task because of the limitation to 46 keys and 96 positions of the standard typewriter.

2. Could Khitan large script character 2091

be a variant of 2050

<taulia> 'hare'?

(8.26.17:01: I found 2091 without context in N4631 which has no gloss for it. 2091 may either be distinct from 2050 [i.e., not occur in calendrical contexts where 'hare' is expected] or represent 'hare' - the actual animal - in a noncalendrical context that remains to be deciphered.)

3. Korean 표고 phyogo 'shiitake mushroom' caught my eye because it is one of a small number of native words with yo. (Perhaps the most important are 좋 choh- < 죻- /cyoh/ 'good' and 소 so < 쇼 /syo/ 'cow'.) The distribution of y is skewed in Korean. It most often precedes ŏ, and for twenty years, I have thought that many if not all native yŏ go back to an Old Korean *e (not to be confused with modern Korean ㅔ e < /əj/). But where does native yo come from?

Martin et al. (1967: 1758) suggest that phyogo may be Chinese but do not specify a Chinese source. -go /ko/ sounds like Middle Chinese 菇 *ko 'mushroom'. Wikipedia lists a number of Chinese words for 'mushroom' (given here in standard Mandarin pronunciation, though I do not know if they are all standard Mandarin words), but none are like the †piaogū [tone for first syllable unknown] that would theoretically correspond to Korean phyogo.

Here is the distribution of phy- in native words in Martin et al. (1967):

None are core words that would be likely retentions from Proto-Koreanic.

Since Korean ph- is from *kVpV- and *pVkV-, perhaps there was a constraint in some intermediate stage against clusters like *kpy- and *pky-. However, a three-consonant cluster constraint does not explain the paucity of native ky-words not beginning with kyŏ-. I'll look at those words later. Why k-? k- is the most common initial consonant letter in hangul (not counting the zero initial), whereas ㅍ ph- is the least common (not counting reinforced consonants like pp-).

