While looking in Endymion Wilkinson's Chinese History: A Manual (2000 edition - I'm three editions behind) for an English equivalent of the Chinese (and Khitan) title 開府儀同三司 for my last entry, I stumbled upon the Taiwanese word 甲 kah [kaʔ˧˨], a measurement of land, on p. 243. I was surprised to learn that it was a borrowing of Dutch akker (cognate to English acre - though a kah is actually about 2.1 acres).

I had always assumed that the Dutch had never left any linguistic traces in Taiwan. Wrong!

How many other Batavo-Taiwanese words are there? The Wikipedia entry on Taiwanese doesn't mention the existence of Dutch loans.

I just found that Wiktionary has a long English entry for 開府儀同三司. Nice.

VEXED BY FIREFOX (PART 1)

I've almost always been able to use Firefox when Chrome failed me. Until now.

What would Firefox be called in Tangut? How about


4408 1870 1my'1 1jy2 'fire fox'?

The first half (4408) has bothered me for years for two reasons.

First, why does 4408 contain what looks like a <WOOD> radical (𘡩) atop what have been thought to be two <FIRE> radicals (𘠠 and 𘧦)? Compare 13+-stroke 4408 to the 4-stroke simplicity of Chinese 火 'fire'. The Tangraphic Sea analysis is improbable: would the graph for the basic word for 'fire' really be derived from the graph for half of an apparently nonbasic word for 'fire'?


4408 1my'1 'fire' =

top and left of 4413 2pu4 'to burn, ignite' (semantic) +

right of 5082 1vi1 (second syllable of 𘓼𘍽 4555 5082 1py1 1vi1 'fire', only attested in dictionaries; could the first syllable, attested as the name of the trigram for 'fire', be cognate to 4413?; semantic)

The derivation for 4413 is unknown.

The derivation for 5082 is circular:


5082 1vi1 (second syllable of 1py1 1vi1 'fire') =

left of 5286 (second syllable of 𘄦𘍵 1772 5286 1ten4 1vi1 'intelligent'; phonetic) +

right of 4408 1my'1 'fire' (semantic)

Surely 4408 was devised before 5286.

Second, 𗜐 4408 1my'1 < *miX 'fire' has the mysterious phonetic characteristic that I call 'prime' and represent as an apostrophe which is easier to type than a true prime symbol. I represent its pre-Tangut source as *X (though I could just as easily carry over the prime notation, since I have no idea what *X was). A mi-word for 'fire' is widespread in Sino-Tibetan, but none of the cognates of pre-Tangut *miX contain any obvious segment or tone that plausibly correlates with *X. Suppose, for instance, that I proposed that pre-Tangut *X corresponds to Written Burmese -ḥ. That correspondence works for 'fire' and 'nine' but not for 'two' and 'five'. Written Burmese 'two' lacks -ḥ, and (pre-)Tangut 'five' lacks *-X/-'.

7.11.21:55: A table of the above words and more:

Li Fanwen number
Proto-Southern Qiang
Written Burmese
Written Tibetan
Old Chinese
𗜐 4408
*mu/i (H)

𗍫 4027
*(χ)nə (L)
nhac < *n̥ik kni

𘕕 5865
*khsi -
suṁḥ nhomh

𗥃 2205
*grə L

𗏁 1999
*pŋi < *pŋa? *ʁuɑ L
pïnga lnga

𗢭 3113
*ŋgiX < *ŋgiwX? *χguə


Proto-Southern Qiang reconstructions are from Evans (2001). Key to his tone symbols:

Evans did not reconstruct a tone for 'nine'. Using his notation, I would reconstruct *(H): Longxi and Mianchi have high tones, but Taoping has a mid tone which normally points to *L.

My near-total ignorance of Pyu basic vocabulary (e.g., 'fire') does raise the troubling possibility that Pyu is a non-Sino-Tibetan language with loans from Sino-Tibetan. Tai has borrowed nearly all its lower numerals (with the exception of 'one') from Chinese.

My reconstruction of pre-Tangut 'nine' implies a chain shift:

*-k > *-w > Ø

Pre-Tangut *-w was lost, and Tangut gained a new -w from the lenition of pre-Tangut *-k: e.g., in

𘈩 0100 *kʌtik > *lew 'one'
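The order of the two changes matters: if *-k had lenited to -w before the older *-w was lost, both finals would have ended up as zero. Here is a minimal sketch of that ordering. The syllables *tiw, *tik, and *ti are schematic placeholders that I made up for illustration, not reconstructions, and the function names are likewise hypothetical.

```python
def lose_final_w(form: str) -> str:
    """Stage 1: pre-Tangut *-w is lost."""
    return form[:-1] if form.endswith("w") else form


def lenite_final_k(form: str) -> str:
    """Stage 2: pre-Tangut *-k lenites to -w."""
    return form[:-1] + "w" if form.endswith("k") else form


def chain_shift(form: str) -> str:
    """Apply the stages in the proposed order: w-loss first, then k-lenition."""
    return lenite_final_k(lose_final_w(form))


def wrong_order(form: str) -> str:
    """Reverse order: *-k would merge with *-w and then vanish with it."""
    return lose_final_w(lenite_final_k(form))


# Schematic placeholder syllables (assumptions, not actual pre-Tangut forms).
for pre_tangut in ("tiw", "tik", "ti"):
    print(f"*{pre_tangut} > {chain_shift(pre_tangut)}")
```

With the proposed order, the old *-w words and the old *-k words stay distinct; with the reverse order they would all merge with open syllables.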

'Three', 'four', and 'five' all had the same tone (or, more likely, segmental source of a tone) in Proto-Lolo-Burmese, and I suspect that tone source spread from one numeral to the others. (Cf. how *-i spread from 'four' to 'five' in pre-Tangut. Or how ) If that tone corresponded to Pyu -h, then that tone source spread from 'three' to 'four' and 'five' in Proto-Lolo-Burmese (or some ancestor of PLB). But that scenario assumes Pyu is conservative, which I don't think it is.

A huge problem is that the final segments (or quasi-segments in the case of [pre-]Tangut *-X/-') line up poorly. Ideally I'd like to see a pattern like

Tangut tone 2 : Written Burmese -ḥ : Pyu -h : Written Tibetan -s : Old Chinese *-s

among the oldest languages (Proto-Southern Qiang tones are of recent origin), but there are no instances of that above. And the thought of languages adding a final *-s or *-h to some random numerals but not others bothers me.

Also disturbing is the possibility that (pre-)Tangut *-X/-' corresponds to nothing in any other language because it is a reflex of a Proto-Sino-Tibetan phonetic feature completely lost elsewhere. I'd like to think that maybe some Qiangic language (i.e., a relatively close living relative of Tangut) has something corresponding to (pre-)Tangut *-X/-'. Proto-Southern Qiang apparently isn't that language.

One more possibility is that *-X/-' is unique to Tangut because it reflects a substratum language which had it. But that hypothesis cannot be tested since we know nothing about such a substratum language (unless its traces are in the so-called 'ritual' language [see Andrew West's skeptical take], and -' does not seem to be any more prominent in that subset of the Tangut vocabulary -  in fact, -' is even less frequent in the 'ritual' numerals than in the regular ones!). And if a substratum language had -', why would its speakers impose that feature onto a language that didn't have it? I don't know anything about the English of Hmong native speakers, but I imagine that English does not have any uvular phonemes (that is, a feature in Hmong absent in English).

However, I can imagine a situation in which a speaker of a continental Altaic-type language would introduce uvulars into English because uvulars and velars are in complementary distribution in their own language (i.e., nonphonemic): e.g.,

native, English /ki/ > [ki]


native, English /ka/ > [qɑ]

(For convenience I use the symbol /k/ to represent the Altaic back consonant. One might argue that ideally I should use a symbol other than /k/ or /q/ to avoid implying that one allophone is more like the Platonic form of the phoneme than the other.)
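The complementary distribution in the two examples above can be sketched as a single realization rule. This is only a toy: the set of back vowels and the [ɑ] realization of /a/ are my assumptions for the sake of the example, not a description of any particular language.

```python
# Assumed back vowels and their surface forms; /a/ surfaces as [ɑ] in this toy.
BACK_VOWELS = {"a": "ɑ", "o": "o", "u": "u"}


def realize(phonemic: str) -> str:
    """Map a phonemic string to a surface string.

    /k/ surfaces as uvular [q] before a back vowel and as velar [k] elsewhere,
    so [q] and [k] never contrast: they are allophones of one phoneme.
    """
    surface = []
    for i, segment in enumerate(phonemic):
        following = phonemic[i + 1] if i + 1 < len(phonemic) else ""
        if segment == "k" and following in BACK_VOWELS:
            surface.append("q")
        elif segment in BACK_VOWELS:
            surface.append(BACK_VOWELS[segment])
        else:
            surface.append(segment)
    return "".join(surface)


for word in ("ki", "ka"):
    print(f"/{word}/ > [{realize(word)}]")
```

A speaker applying this rule automatically would pronounce an English /ka/ with a uvular even though English has no uvular phonemes.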

But ... when Khitan and Manchu actually did encounter [ka]-type combinations violating their phonotactics in Chinese, they borrowed them as /ka/: e.g.,

As a result, the uvular-velar distinction became phonemic as well as phonetic: e.g., these new imported /ka/ contrasted with native /qa/.

Then again, I am citing written Khitan and Manchu which may have reflected an elite, idealized pronunciation. Some Khitan and Manchu speakers learning Chinese might have pronounced uvulars before /a/. If they did, at least they had a phonotactic motivation for doing so. The phonotactic motivation, if any, for pronouncing whatever -' was in Tangut is unknown. Minimal pairs such as


3513 1my1 'sky' : 4408 1my'1 'fire'

seem to rule out a phonotactic motivation.

Could the fact that 'sky' and 'fire' had different vowels in pre-Tangut be relevant? Could I abandon *X and instead propose that

No, because there are cases of -y' from pre-Tangut *-uX and -y from pre-Tangut *-i: e.g.,


0320 1vy'1 < *NApuX or *CANpuX 'soft, weak' (cf. Japhug mpɯ < *-u 'soft')


4880 2ryr1 < *riH 'copper' (cf. Written Tibetan gri 'knife'?)

(The -r of 2ryr1 is vowel retroflexion conditioned by *r-. As 1lyr'3 'four' above demonstrates, there is no phonotactic constraint against ' coexisting with retroflexion, so I cannot claim that 2ryr1 would have ended in -y' if not for retroflexion.)

There is even a doublet for 'worm'


1888 2by1 < *mbuH and 5270 1by'1 < *mbuX

which is cognate to Written Tibetan Hbu [mbu] 'id.' See Gong Hwang-cherng's "A Hypothesis of Three Grades and Vowel Length Distinction in Tangut" (1995) for more examples. (Gong's 'long vowels' correspond to my V' 'vowel-prime' sequences.) The correct explanation for -' would have to account for such doublets.

TWENTY BLADES OF CHINESE GRASS

At the end of "An-derused", I wrote,

I was surprised to see 漢 <CHINESE> 한 Han with 艹 instead of 廿 on the top right on the cover of 最新版常用學習三千漢字 Chhoeshinphan sangyong haksŭp samchŏn hancha (Three Thousand Hanja for Everyday Study: New Edition).

I was even more surprised to look inside and see the entry for 漢 <CHINESE> 한 Han on p. 47. Each of the three thousand hanja in the book has a large entry character atop a chart showing how to write it in seven steps and one or more example words containing it. The large entry character is 漢 with 艹 (resembling the character component <GRASS> though actually having nothing to do with grass) on the top right. However,

That must be confusing to someone who does not know how to write the character. Wiktionary shows both ways to write <CHINESE>. If I were to write a book on hanja, I'd bring up the variation of <CHINESE>.

I can describe that variation in terms of Unicode:

So why don't I just type U+FA47 and U+FA9A instead of resorting to phrases like "漢 with 廿 on the top right"? Because I don't think most people have fonts that support the distinction between the two forms. The <CHINESE> hanja that you see here in fact has a third Unicode codepoint: U+6F22. Why are there three codepoints for two forms of <CHINESE>¹?

This table is my attempt to show the relationships between a few encodings and forms of <CHINESE>:

Unicode codepoint
Unicode glyph
Japanese equivalent
North Korean equivalent
South Korean equivalent in KS C 5601-1987
U+6F22 font-dependent
艹-version 廿-version (?)
廿-version 廿-version none none
艹-version none
艹-version none

The duplicate codepoints in Unicode are a byproduct of the different versions of <CHINESE> corresponding to U+6F22 in Japanese and North Korean encodings. In an 'ideal' Unicode without regard for non-Unicode encodings, there would either be two codepoints for the two versions (following a maximalist philosophy of one codepoint per form) or just one (following a minimalist philosophy of one codepoint per platonic character) but not three.
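One practical consequence of U+FA47 and U+FA9A being compatibility ideographs is that Unicode normalization folds both back into U+6F22, so the distinction can silently disappear whenever text is normalized. A minimal sketch using Python's standard unicodedata module (the variable names are mine):

```python
import unicodedata

UNIFIED = "\u6f22"                 # the ordinary codepoint for <CHINESE>
COMPATIBILITY = ["\ufa47", "\ufa9a"]  # the two compatibility codepoints

for ch in COMPATIBILITY:
    # decomposition() returns the canonical mapping as a hex string.
    mapping = unicodedata.decomposition(ch)
    folded = unicodedata.normalize("NFC", ch)
    print(f"U+{ord(ch):04X} maps to {mapping}; NFC yields U+{ord(folded):04X}")
```

So even with a font that distinguishes the forms, text that passes through a normalizing editor, database, or website may come out with only U+6F22.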

¹There are in fact at least 31 forms of <CHINESE>, but the 廿~艹 variants (and the simplified Chinese form 汉) are all that are needed for everyday purposes.

AN-DERUSED

Of course right after I finished my previous post on 顏 U+984F~顔 U+9854 for Sino-Korean 안 an <FACE>, I realized I should have checked the very first Sino-Korean dictionary, 東國正韻 Tongguk chŏngun (1447), which is also one of the earliest hangul texts. Its entry for ᅌᅡᆫ ngan (the prescriptive 15th century reading of <FACE>) has the form 顏 U+984F. Needless to say, it is absurd to draw direct lines across three vastly different periods, but I'll do so anyway:

Tongguk (1447): 顏 — Gale (1897): 顏 — Sae chajŏn (1961): 顏 (all U+984F)

Those are the three earliest texts in my survey so far. I am certain of what I have seen in them. I am less certain about these search results from titles and authors in the National Library of Korea's database (via Cambridge's list of Korean studies resources), since it's possible someone typed one form instead of the other:

Sorting the results by date reveals some obvious typos: e.g., modern items like a book with 2018 in the title dated "201" instead of "201X". And the difference between "201" and "201X" is more obvious than that between 顏 U+984F and 顔 U+9854.

It is certainly not true that 顔 only appears in post-1961 books. The earliest result for 顔 is 史鉞 Sawŏl (The Axe of History, 1506). Although I don't have time to go through the online scan of the book (there is no search function), I can believe 顔 was in it, since the earliest attestation of that form that I can find is in the Chinese rhyme dictionary Guangyun (1008).

Conversely, it is also not true that 顏 U+984F is absent from recent publications, as ... oh no. The results include anything with an in the title or author's name regardless of whether it's spelled 顏, 顔, 晏 (the surname of the author of Sawŏl), in hangul as 안, etc. I suppose that makes sense in a time when few people may know what the proper hanja is. But why does searching for 顏 U+984F and 顔 U+9854 generate different results if all that matters is the presence of a syllable an regardless of written form? I don't know. I wonder if the site developer will ever address that question.

I'm going to look at the question of 顏 U+984F~顔 U+9854 from one last angle. Here is a list of the frequency of the two forms in South Korean national newspapers according to Google. I have arranged the papers in order of circulation whenever I could find figures. The figures were partly undated, so this table cannot be interpreted as a true ranking. I just wanted a rough idea of the popularity of the various papers.

顏 U+984F 顔 U+9854 Notes
Chosun Ilbo
1.8 million
顔 U+9854 figure includes instances in the paper's Japanese edition.
JoongAng Ilbo
1.3 million
0 in spite of the fact that the paper does not have a no-hanja policy like Hankyoreh (see below).
Dong-A Ilbo
1.2 million
顏 U+984F figure excludes instances in the paper's Chinese edition.
顔 U+9854 figure includes instances in the paper's Japanese edition.
Seoul Shinmun

Kyunghyang Shinmun 350,000

Hankook Ilbo 213,200

The paper has a no-hanja policy in its Korean edition, so the figures are for 顔 U+9854 in the paper's Japanese edition.
The one instance of 顏 U+984F is in a comment in the Japanese edition and is probably a character selection error, as the rest of the comment is in postwar characters; the writer is not someone like me who insists on prewar orthography.
Kookmin Ilbo

Munhwa Ilbo

For comparison, in Asahi shinbun, 顏 U+984F appears 30 times and 顔 U+9854 appears 165,000 times. (Those figures include <FACE> for native kao as well as Sino-Japanese gan, whereas <FACE> in Korean only represents Sino-Korean an.) Kanji are alive and well in Japanese, whereas hanja are in decline in Korean. I took hanja seriously when I first started learning Korean in 1987. The newspapers were full of them then. But now Hankyoreh has zero in its Korean edition. An all-kana Japanese newspaper is unthinkable today, even though Japanese TV news reporters demonstrate it is possible to present the news orally without any kanji (not counting onscreen text).

What is missing from the figures above are a sense of proportion and the time dimension. What is the frequency of each form of <FACE> per million characters (counting hangul letter blocks as single characters) per year per publication? My guess is that the Japanese usage of both forms of <FACE> has remained constant after the postwar writing reform, whereas <FACE> in either form has become increasingly infrequent in Korean, though 顔 U+9854 has taken the lead due to

It would be interesting to see frequency figures for various types of hanja in publications over time. I suspect, for instance, that the usage of 日 il 'Japan, day' has declined but not to the same degree as <FACE> because newspapers continue to use 日 as an eye-catching abbreviation of 日本 Ilbon 'Japan', particularly in headlines. Even JoongAng Ilbo which has zero instances of <FACE> has 1,650 Google instances of 日, presumably mostly for Il 'Japan'. However, the example words for 日 in Grant's (1982: 43) A Guide to Korean Characters have the following frequencies in JoongAng Ilbo according to Google:
Those are 日常單語 ilsang tanŏ 'everyday words', so the low frequency of their spellings does not reflect the frequency of the words that those spellings represent. Here are the hangul spelling frequencies in JoongAng Ilbo according to Google:
JoongAng Ilbo has been around since 1965. I suspect Grant's example words were sometimes written in hanja in 1965 issues which are of course not Google-searchable. Here's a low-resolution image of the top of the front page of the debut issue. The largest characters - the only ones I can make out - are all hanja:

<NUMBER> 호 ho appears on the front page as 號 U+865F and as 号 U+53F7, the simplified form also used in postwar Japan. I get the impression that official standards aside - 号 U+53F7 isn't supported by KSC encoding or in the 1,800 hanja taught in secondary schools - Korean typographers and even reference book writers are not purists when it comes to hanja forms. I was surprised to see 漢 <CHINESE> 한 Han with 艹 instead of 廿 on the top right on the cover of 最新版常用學習三千漢字 Chhoeshinphan sangyong haksŭp samchŏn hancha (Three Thousand Hanja for Everyday Study: New Edition).

J-AN-US: THE TWO <FACE>S OF NAVER

Why do I care so much about minute variations like 顏~顔 for Sino-Korean 안 an <FACE>?

In TJK¹ studies, subtly different graphs are often regarded by modern scholars as separate entities. Whether such differences also reflect linguistic differences requires study.

No such study is needed to know that 顏 and 顔 are the 'same' character in one sense. But in Unicode, they are not: 顏 is U+984F and 顔 is U+9854. Unicode is not consistent about assigning variants to different codepoints. That is not necessarily a flaw. Should the VS17 and VS18 forms of 喩 U+55A9 have different codepoints? I couldn't tell them apart without laying VS18 over VS17. On the other hand, why doesn't VS19 of 囀󠄀 U+56C0 have its own codepoint? Andrew West has much more on this issue.
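The two strategies mentioned above (separate codepoints versus variation selectors on a shared codepoint) behave differently in software. A small sketch with Python's standard unicodedata module; the variable names are mine, and VS17 and VS18 are the first two codepoints of the Variation Selectors Supplement block:

```python
import unicodedata

FACE_A = "\u984f"  # 顏
FACE_B = "\u9854"  # 顔

# Separate codepoints: no normalization form unifies the two graphs.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    same = unicodedata.normalize(form, FACE_A) == unicodedata.normalize(form, FACE_B)
    print(f"{form}: identical after normalization? {same}")

# Variation sequences: one shared base codepoint plus an invisible selector.
BASE = "\u55a9"        # 喩
VS17 = "\U000E0100"
VS18 = "\U000E0101"
seq17, seq18 = BASE + VS17, BASE + VS18

print(seq17 == seq18)          # the sequences differ as strings ...
print(seq17[0] == seq18[0])    # ... but share the same base character
```

A search for the bare base character can still be made to match both variation sequences by stripping the selectors, whereas 顏 and 顔 need an explicit equivalence table.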

Back to Korean: my interest lies in determining what the de facto standard form of 안 an <FACE> is or was at different points of time.

Last night I forgot to check Gale (1897), one of the first Korean dictionaries I ever used. Page 943 has 顏 U+984F.

Today I use Naver. Its hanja dictionary treats 顏 U+984F as the 本字 ponja 'original character' of 顔 U+9854, but its entry for 顔 U+9854 is lengthier, including lists of 18 words and 5 phrases containing 顔 U+9854 without equivalents in the entry for 顏 U+984F. Clearly the dictionary regards 顔 as the principal form. Yet if I run a search on those characters throughout the entire dictionary (i.e., if I have 전체 chŏnchhe 'entire body' selected), I get

One might conclude that the 31 words containing 顏 U+984F can never be written with 顔 U+9854, that there are no phrases that can be written with 顏 U+984F, etc. But that isn't true: anything that can be written with one can be written with the other. And yet there may not be any overlap between those lists: e.g.,

If someone runs into 顏面 anmyŏn 'face' with U+984F in a text and looks it up in Naver, one will find the words
but not 顏面 anmyŏn 'face' itself!

My impression is that in South Korea 顔 U+9854 has become dominant but has not yet fully eclipsed 顏 U+984F. Otherwise I would expect Naver to be like Japanese dictionaries which have a single main entry for 顔 U+9854 and list 顏 U+984F as a variant.

I predict that the domination of 顔 U+9854 will increase over time as Koreans type fewer hanja and only use hanja that their IMEs provide for them: e.g., 顔 U+9854 but not 顏 U+984F in the case of Windows.

¹Andrew West's term for Tangut/Jurchen/Khitan, a play on CJK for Chinese/Japanese/Korean.

THE GOOD FACE OF MILLET

Leftovers from yesterday:

1. Lenition was an unspoken theme of "Fanning Red Ears of Grain". Yesterday I realized that

Dutch goeie < goede 'good'

is like

Japanese 良い yoi < yoki 'good'.

In both cases, the lenition is not regular. Not all intervocalic -d- and -k- have disappeared from those languages. Goede still exists as a formal form, and 良き yoki still exists as an archaic form. Both unlenited broeder 'brother' (religious) and lenited broer 'brother' (sibling) coexist in formal Dutch. Japanese -k- almost never lenites in noninflected forms: e.g., 時 toki 'time' has not become ˟toi. (The one exception I can think of is 垣間 kaima 'gap in a fence' < kaki-ma-mi 'fence-space' in which the noun kaki 'fence' - never ˟kai by itself - lenited.) Lenition is mandatory in inflected forms apart from archaisms, and not all hypothetical archaisms are possible: e.g., no one says ˟kakita 'wrote' instead of kaita < *kakitari. *tari didn't regularly become ta in Japanese; it is another example of reduction. All these examples demonstrate how reduction is not necessarily regular like a 'sound law'. I expect nonreductive sound changes to involve exceptionless sound laws: e.g., *dh > *d in Germanic. (Maybe not the best example since *dh > *d could be regarded as reduction [aspiration loss], but I don't know of any language in which deaspiration isn't regular. Was *dh really *ð? See Phoenix's )

I got the broeder/broer example from Phoenix who has more on Dutch (and Irish) lenition.

For even more including cases of hypercorrect -d-insertion in Dutch, see de Vaan (2018: 64-65).

I assume goei (formal goed) 'good' is a product of backformation from lenited goeie, as I don't know of any other cases of Dutch final -d [t] becoming -i.

2. I forgot to mention one other 'ao-ddity' in "Fanning Red Ears of Grain" - a Japanese name that is sui generis as far as I know:

*apa-pu 'millet-place.where.grow' > *ababu > *aβaβu > *awawu > *awau > *awɔː > 粟生 <ahafu> ao (not ˟a!)

I can't explain why the final vowel isn't long. Was *aoː confused with the much more frequent word ao 'green'?

3. When I typed Japanese 顏 kao 'face' in "Fanning Red Ears of Grain", I initially used Windows 10's Korean IME since I thought the standard modern forms of hanja were identical to the prewar kanji I prefer. But to my surprise, the IME converted 안 an (the Sino-Korean reading of 顏) to 顔 which looks exactly like the postwar kanji for kao.

When I was studying Korean, my instructor pointed out I had written a postwar Japanese-style kanji instead of the proper hanja which was identical to the corresponding prewar kanji. Perhaps that was before I decided to embrace prewar Japanese orthography in formal writing. Ever since then I've been writing hanja and prewar kanji identically without any problems. Until now.

I did a quick survey of Korean books I could easily find to see what form of <FACE> was in them. Obviously Wiktionary isn't a book, but I've included it anyway:

Author or publisher
새字典 Sae chajŏn (New Dictionary of Characters)
東亞出版社 Dong-A chhulphansa
A Korean-English Dictionary (in entry for 顏面 anmyŏn 'face')
Martin, Lee, and Chang
賢學學習玉篇 Hyŏnhak haksŭp okphyŏn (Hyŏnhak Study Jewel Book)
賢學社 Hyŏnhaksa
A Guide to Korean Characters
Bruce K. Grant
Jacob Chang-ui Kim
最新版常用學習三千漢字 Chhoeshinphan sangyong haksŭp samchŏn hancha (Three Thousand Hanja for Everyday Study: New Edition)
弘新文化社 Hongshin munhwasa
Pictorial Sino-Korean Characters Jacob Chang-ui Kim
동아現代活用玉篇 Dong-A hyŏndae hwaryong okphyŏn (Dong-A Modern Practical Jewel Book) 東亞出版社 Dong-A chhulphansa
List of 1,800 hanja taught in South Korean schools Wiktionary

<FACE> is in the KSC standard as presented in 中日朝漢字字形對照 'Chinese-Japanese-Korean Chinese character form comparison' and Wiktionary. But there are multiple versions of the KSC standard, so maybe the form changed over the years.

I wonder what the standard form is in North Korea.

Grant (1982) and Kim (1984) were my introduction to hanja. I started learning readings from their indexes in 1987. I obviously wasn't paying attention to the form of <FACE> back then.

My copy of Hyŏnhak haksŭp okphyŏn has a cover attached upside down and a partly mirror-imaged page of publication information which do not inspire confidence. Is Hyŏnhaksa still in business? I can't Google a company site.

LA SCAR

425 years ago today, Portuguese and lascarins invaded Kandy. Lascarin is from Persian لشکر lashkar.

Wikipedia's lascar article derives Persian lashkar 'army' from Arabic العسكر al-`askar 'the army'. So is lashkar an article-incorporating word like algebra or Haitian Creole lalin < la lune?

No, because the Persian word is attested in Middle Persian as <lškl> before the coming of Islam. So the direction of borrowing and analysis was the other way around: a Persian word without an article was reinterpreted as an Arabic article-noun sequence.

But why does the Arabic word have `ayn, a consonant absent from the Persian word (and Persian in general)?

Perhaps at the time of borrowing, the first vowel of Persian lashkar sounded more like Arabic /a/ after `ayn rather than Arabic /a/ after a glottal stop.

Another possibility - not necessarily exclusive - is that lashkar could not have been interpreted as al-'askar because the Persian form has no glottal stop. Persian la- sounded more like Arabic l`a, a voiced sequence without a stop, than l'a, a voiced sequence interrupted by a stop.

A question I can't answer is why the Arabic word has s instead of sh.

FANNING RED EARS OF GRAIN

Yesterday I saw part of 47 Ronin (2013). I looked up that movie, and as a result I finally learned how to spell Akō in Japanese¹: 赤穗・あかほ <RED EAR.OF.GRAIN>/<akaho>², which looks as if it should be pronounced Aka(h)o, i.e., as aka 'red' + ho 'ear of grain' with little or no sandhi. But in fact the second and third vowels have fused into a single long vowel:

*aka-po > *akabo > *akaβo > *akawo > *akao > *akɔː > akō

I didn't expect that because the normal reflex of *apo is ao: e.g.,

*kapo > *kabo > *kaβo > *kawo > 顏・かほ <kaho> kao 'face'

I would expect ō to be from *a(p)u, not *a(p)o. Is *a(p)o > ō a sound change in the Akō dialect?
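To make the expectation concrete, here is a toy model of the regular developments as I understand them: intervocalic *p weakens step by step to zero, and only the resulting au fuses into ō. The step table and function names are my own simplifications (for instance, w is kept before a), and morpheme boundaries are ignored.

```python
import re

# (old, new, vowels that may follow) for each weakening step of intervocalic *p.
LENITION_STEPS = [
    ("p", "b", "aiueo"),
    ("b", "β", "aiueo"),
    ("β", "w", "aiueo"),
    ("w", "", "iueo"),   # simplification: w survives before a
]


def lenition_chain(form: str) -> list[str]:
    """Return every stage as an intervocalic labial weakens toward zero."""
    stages = [form]
    for old, new, following in LENITION_STEPS:
        pattern = rf"(?<=[aiueo]){old}(?=[{following}])"
        form = re.sub(pattern, new, form)
        stages.append(form)
    return stages


def fuse(form: str) -> str:
    """Regular fusion as assumed here: au > ō, while ao stays two vowels."""
    return form.replace("au", "ō")


for proto in ("kapo", "akapo", "apu"):
    stages = lenition_chain(proto)
    print(" > ".join(stages), ">", fuse(stages[-1]))
```

Under these rules *aka-po comes out as akao, like kao 'face', which is exactly why the attested Akō is unexpected.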

Standard Japanese has cases of the reverse that I can't explain:

*apuŋgu > 扇ぐ・あふぐ <afugu> aogu 'to fan' (cf. Okinawan ōjun 'id.')

Compare with this (mostly) regular word from the same root:

*apuki > 扇・あふぎ <afugi>  ōgi 'fan (noun)' (cf. Okinawan ōji 'id.')

*k > g is irregular. Here's a doubly irregular word:

*ambure- > 溢れ・あふれ- <afure> 'to overflow' (cf. Okinawan andiin < anriin < *ambure- 'id.')

I don't know how *mb became f. The spelling <afure> should regularly be read as ōre.

¹I read John Allyn's 47 Ronin in English around 1986 and incredibly never encountered the name Akō in Japanese until now!

²I write all Japanese forms in prewar kanji and kana orthography. Prewar kana orthography is closer to earlier pronunciation than modern kana orthography.

HE HINDIKE EPOKHE?

243 years ago yesterday, John Adams predicted that

[t]he Second Day of July 1776, will be the most memorable Epocha, in the History of America.

His use of epocha retaining Latin -a got me wondering about the etymology of the word:

from Ancient Greek ἐποχή (epokhḗ, “a check, cessation, stop, pause, epoch of a star, i.e., the point at which it seems to halt after reaching the highest, and generally the place of a star; hence, a historical epoch”), from ἐπέχω (epékhō, “I hold in, check”), from ἐπι- (epi-, “upon”) + ἔχω (ékhō, “I have, hold”).

I then looked up the etymology of ékhō:

From Proto-Indo-European *seǵʰ-.

But wait - how can that be? PIE *s- becomes Greek h-, not zero.

Oh, duh: Sihler (2008: 170) points out that Grassmann's Law applies to the secondary aspirate h- as well as the primary aspirates (kh th ph):

ἔχ- ékh- < *hekʰ- < *seǵʰ-

(Here I assume the devoicing of the primary aspirates predates Grassmann's Law. Does it? The Proto-Greek Wikipedia page says Grassmann's Law may be post-Mycenaean. Mycenaean already had voiceless aspirates.)

Grassmann's Law does not apply to the future stem, presumably because the law must postdate deaspiration before *s:

ἕξ- héks- < *seǵʰ-s-.

I really should have known better because the same is true in Sanskrit. Compare:

budhyate < *bhudh- 'wakes'

bhotsyate < *bhudh-sya- 'will wake'

Bucknell (1994: 179) lists a variant future form bodhiṣyati with an -i- blocking -s- from conditioning the deaspiration of the preceding dh. But I have not been able to confirm this form in Monier-Williams, Whitney, or the Digital Corpus of Sanskrit.
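The relative chronology argued above (deaspiration before *s first, Grassmann's Law second) can be sketched as two ordered rules over lists of segments. This is a deliberately crude model: it lumps Greek and Sanskrit aspirates into one table, treats any later aspirate in the stem as a trigger, and ignores voicing assimilation and vowel grade, so it cannot derive bhotsyate in full. The names are mine.

```python
# Aspirates and their deaspirated counterparts; "h" deaspirates to nothing.
ASPIRATES = {"kʰ": "k", "tʰ": "t", "pʰ": "p", "bʰ": "b", "dʰ": "d", "h": ""}


def deaspirate_before_s(segments: list[str]) -> list[str]:
    """Rule 1: an aspirated stop loses its aspiration directly before s."""
    result = []
    for i, seg in enumerate(segments):
        following = segments[i + 1] if i + 1 < len(segments) else None
        if seg in ASPIRATES and seg != "h" and following == "s":
            result.append(ASPIRATES[seg])
        else:
            result.append(seg)
    return result


def grassmann(segments: list[str]) -> list[str]:
    """Rule 2: an aspirate is deaspirated if another aspirate follows it."""
    result = list(segments)
    for i, seg in enumerate(result):
        if seg in ASPIRATES and any(s in ASPIRATES for s in result[i + 1:]):
            result[i] = ASPIRATES[seg]
    return [seg for seg in result if seg]  # drop the emptied-out h


def derive(segments: list[str]) -> str:
    """Apply rule 1, then rule 2, and join the output segments."""
    return "".join(grassmann(deaspirate_before_s(segments)))


print(derive(["h", "e", "kʰ"]))        # present stem: h- is lost
print(derive(["h", "e", "kʰ", "s"]))   # future stem: h- survives
print(derive(["bʰ", "u", "dʰ"]))       # Sanskrit-style root
```

Reversing the order of the two rules would wrongly remove the h- of the future stem, which is the point of the chronology.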

As tempting as it is to regard Grassmann's Law as a shared innovation of Greek and Indo-Iranian, that's not possible. Grassmann's Law must postdate *s- > h- in Greek, a change that never happened in Proto-Indo-Iranian. (*s > h did occur later in Iranian but not in Indic.) Wikipedia's Graeco-Aryan page suggests that

Rather, it is more likely that an areal feature spread across a then-contiguous Graeco-Aryan–speaking area. That would have occurred after early stages of Proto-Greek and Proto-Indo-Iranian had developed into separate dialects but before they ceased to be in geographic contact.

While I'm on the topic of Greek h ... today I was surprised to see Hindikē for 'Indian' in Wikipedia's "India (Herodotus)" article. Until now I thought that 'India' had initial I- in Greek because it was borrowed from Old Persian Hinduš 'Indus' (after *s > h in Old Persian; cf. Sanskrit Sindhus) after Greek had lost h-. But that Wikipedia article gives the Greek spelling Ἰνδική <Indikḗ> for Hindikē, not Ἱνδική <Hindikḗ>. Google gives only seven results for Ἱνδική. One is an OCR error for Ἰνδική. Two (1 2) are Armenian dictionaries with no Greek that I can see, three (1 2 3) appear to be copies of the same Armenian dictionary, and one is a Greek Facebook post. Hindikē looks like an error for the standard form Indikē.

THE ETYMOLOGY OF CANTONESE 1LAT

Today it occurred to me that Cantonese 甩 1lat 'to lose' may be cognate to 失 1sat 'to lose' (now a bound morpheme in Cantonese):

1lat < *l̥it

1sat < *l̥it

Also belonging to this word family is

6jat < *lit 'to escape' (also now a bound morpheme in Cantonese)

What was the original root initial? Two scenarios with two subscenarios each:

A. *l- is original, and *l̥- is

A1. from a devoicing prefix + *l- or

A2. by analogy with some other voiceless/voiced sonorant-initial verb pair.

B. *l̥- is original, and *l- is

B1. from a voicing prefix + *l̥- or

B2. by analogy with some other voiceless/voiced sonorant-initial verb pair.

The B scenario seems less popular. I've never seen anyone propose anything like it, probably because of a reluctance to posit a primary voiceless lateral. Voiceless laterals are uncommon in the world's languages, though they seem common in 'Tibeto-Burman' (i.e., Sino-Tibetan minus Chinese - and even within Chinese, Taishanese has [ɬ]).

But wait - if both 1lat and 1sat go back to *l̥it, why do they have different initial consonants in Cantonese? Two scenarios:

A. 1lat is native Cantonese, whereas 1sat (with cognates throughout Chinese) is borrowed. In other words,

But how many native Cantonese words have l- as a reflex of *l̥-? There are many Cantonese words with s- from Proto-Chinese *l̥-. Are they all borrowings?

B. l- and s- are the products of reduction at different points in time. Three identical Proto-Chinese sequences could undergo three different paths of reduction:

reduction phase 1
reduction phase 2
reduction phase 3
reduction phase 4
*l̥- *l̥-

The trouble is that I cannot easily account for a fourth type of reduction also involving an *sl-type sequence that fuses into *z-. More on this problem tomorrow.

(It seems that every time I write that I'll continue tomorrow, I end up finding some other topic that eats up my time the next day. In this case I am finishing a July 4th-themed post that has to go up on July 4th. So this and other loose ends will have to wait - or, worse yet, be forgotten. I have no idea how many unfinished series there are on this blog after seventeen years.)

'BASIL' IN TANGUT

While researching the post I originally intended for today, I found this Tangut borrowing of Sanskrit arjaka 'basil' in Kychanov and Arakawa (2006: 361):


4541 0013 3985 1a? 1zar 1ka'3

I would expect the Sanskrit consonant cluster -rj- to be rendered as -ryr dz- with an epenthetic retroflex vowel -yr and dz, the usual Tangutization of Sanskrit j. (Tangut, like Tibetan and Late Middle Chinese, reflects a style of Sanskrit pronunciation with dental affricates instead of palatal stops.)

(Compare zar for rja with ryr ga for rka in


4541 0795 5091 3369 4293 1a? 2ryr4 1ga4 1ma4 1si4 for Sanskrit Arkamasi [a name]

from Sun and Tai 2012: 359. I cannot explain the g for k.)

But instead of †ryr dza, the actual Tangut form has 1zar1 with z- and vowel retroflexion. Why?

My guess is that the Tangut reflects  a rdza ka, the Tibetan version of the Sanskrit word for 'basil'. Here's what I think happened:

1. The Tangut borrowed Tibetan a rdza ka as *a rdza ka'. (I'm leaving out tones and grades for simplicity.)

2. *a rdza ka' became *a dzar ka' after *rCV became CVr (i.e., [CVʳ] with a retroflex vowel) in Tangut.

3. Medial *-dz- lenited to *-z-: *a dzar ka' > *a zar ka'.

CAN AI DECIPHER PYU?

tl;dr: I doubt it.

I ended my last entry with a teaser for what was supposed to be this entry. Today I did start writing part 6 of my 役/堤 series. Then I saw this on reddit:

Machine learning has been used to automatically translate long-lost languages - Some languages that have never been deciphered could be the next ones to get the machine translation treatment.

That took me to MIT Technology Review which links to the original paper "Neural Decipherment via Minimum-Cost Flow: from Ugaritic to Linear B".  I haven't looked at it yet. I am not a computer science person, so I almost certainly wouldn't understand it. I do understand the MIT article, so I'll make a few comments here.

The big idea behind machine translation is the understanding that words are related to each other in similar ways, regardless of the language involved.

Universal grammar?

So the process begins by mapping out these relations for a specific language. This requires huge databases of text.

There is no huge database of Pyu text. My text file of all the Pyu text that I can 'read' (not understand - just transliterate in most cases) is 50 kb.

Such a database would be possible for Pyu's distant relative Tangut. A Khitan database, though far smaller than that for Tangut, would still be bigger than the Pyu database.

The key insight enabling machine translation is that words in different languages occupy the same points in their respective parameter spaces. That makes it possible to map an entire language onto another language with a one-to-one correspondence.

If only languages had one-to-one correspondences!

The idea is that any language can change in only certain ways—for example, the symbols in related languages appear with similar distributions, related words have the same order of characters, and so on.

The general idea that language change is constrained is correct.

With these rules constraining the machine, it becomes much easier to decipher a language, provided the progenitor language is known. 

But we don't know the progenitor (ancestor) of Pyu. The reconstruction of Proto-Sino-Tibetan has barely begun. I don't even know where Pyu fits into the family.

Luo and co put the technique to the test with two lost languages, Linear B and Ugaritic. Linguists know that Linear B encodes an early version of ancient Greek and that Ugaritic, which was discovered  in 1929, is an early form of Hebrew.

But Ugaritic is not an early form of Hebrew; it's an early relative. An aunt, not a mother. Mycenaean Greek has a similar relationship to ancient Greek as we know it. No mention of progenitor languages like Proto-Semitic or Proto-Indo-European. It seems that the technique is actually dependent on better known relatives, not progenitors. And those relatives have to be close. Pyu has no known close relatives.

It would be interesting to test this technique on modern languages. Spanish could be deciphered using Italian. But Italian wouldn't help with, say, Albanian, Armenian, or Bengali. Indo-European has enormous internal diversity, and so does Sino-Tibetan.

But the big advantage of machine-based approaches is that they can test one language after another quickly without becoming fatigued. So it’s quite possible that Luo and co might tackle Linear A with a brute-force approach—simply attempt to decipher it into every language for which machine translation already operates.

The hope is that Linear A will turn out to be a close relative of some "language for which machine translation already operates". But what if it isn't? What if it's an isolate?

Pyu does not seem to be an isolate in the sense of having zero relatives. But it does seem to be an isolate within Sino-Tibetan - an Asian Albanian without close relatives among its neighbors. So I doubt a brute-force approach using Burmese, Chin, Karen, etc. is going to pay off.

A WÉI-RD READING

One last branch of the tree that started with 役小角 En no Ozunu's name:

While checking the Wiktionary entry for 堤 from "Edachi Again", I was surprised when I saw its list of Mandarin readings for the character.  As the Sesame Street song goes, "One of these things is not like the others":

  1. 'dike; base of bottle'

  2. 'dike; base of bottle'

  3. tǐ (sic; an error for dǐ) 'to stop'

  4. shí (first syllable of 堤封 shífēng, now normally tífēng 'totally')

  5. wéi (in place names; the only example I could find is premodern 洙堤郡 Zhūwéi Prefecture)

Normally multiple readings of a character have initial consonants at similar places of articulation. t- and d- are both dental and sh-, though not dental, is retroflex. w-, however, is labial. I cannot think of any other T-character with a w-reading.

I found 洙堤郡 Zhūwéi Prefecture in 集韻 Jiyun (1039). I did not find it in Scripta Sinica's text database, so I have no idea how old that place name is.

The Jiyun fanqie for 堤 in 洙堤郡 is

*win + 規 *kwie

which adds up to a Middle Chinese reading *wie. But Middle Chinese no longer even existed by 1039. And I could argue that 'Middle Chinese' in the sense of 'the language of dictionaries and rhyme tables' did not exist, at least not as a spoken language. Putting those misgivings aside, I think an 11th century reader might have pronounced 堤 in the prefecture name as something like *wi, whose initial is still hard to reconcile with the others.
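The fanqie arithmetic above can be sketched in a few lines: the upper speller contributes its onset and the lower speller its rhyme. The onset/rhyme segmentation below is my own assumption for illustration (in particular, I treat *kw- as the onset of 規 so that the result matches *wie); the function name `fanqie` is hypothetical.

```python
# Hypothetical (onset, rhyme) segmentation of the two spellers.
# Treating *kw- as the onset of 規 is an assumption, not a claim about Jiyun.
UPPER = ("w", "in")    # *win: contributes only its onset
LOWER = ("kw", "ie")   # 規 *kwie: contributes only its rhyme


def fanqie(upper, lower):
    """Combine the onset of the upper speller with the rhyme of the lower."""
    onset, _ = upper
    _, rhyme = lower
    return onset + rhyme


print("*" + fanqie(UPPER, LOWER))
```

A different segmentation (say, *k- plus *-wie) would need an extra rule to merge the two labial glides, which is one reason fanqie readings are not always mechanical.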

I'm not even sure how to read 洙 in the prefecture name. More on this problem next time.

GSR 130 AND 128

GSR 851a 役 from my last three posts looks like a semantophonetic compound of 彳 'to go' and a phonetic GSR 130a 殳, but 殳 is in fact a semantic component 'baton' (Karlgren 1957: 226), 'a kind of lance' (Schuessler 1987: 563).

The standard Mandarin reading of 殳 is shū with a high level tone normally pointing to a *voiceless initial. But other evidence points to a *voiced initial. GSR 128s 殊 'to cut off' > 'very', a homophone of 殳, transcribes Sanskrit ju in 文殊 for Mañju(śrī). And 殊 is also now shū in standard Mandarin. Why aren't 殳 and 殊 ˟shú with a high rising tone reflecting a *voiced initial?

Here's how I reconstruct the history of 殳 and 殊:

Scenario A: Primary *-d-

*CIdo > *CIduo > *duo > *dʑuo > *dʑu > *ʑu > shū

Scenario B: Secondary *d-

*NITo > *NITuo > *NTuo > *nduo > *dʑuo > *dʑu > *ʑu > shū

*N- is an unknown nasal. If Old Chinese was like Pyu, it had two possible nasal initials in presyllables: *n- and *m- (but probably not *ŋ-, unless ṅraḥ /ŋ.raH/ in PYU 20 is not an isolated oddity).

*T is an unknown dental stop: *t, *tʰ, or *d.
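To make the placeholders concrete, here is a small sketch that spells out every candidate proto-form hidden behind *NITo in Scenario B. It assumes, as suggested above, that *N is either *n or *m and that *T is *t, *tʰ, or *d; the unknown high vowel *I is left as a placeholder. The function name `expand` is hypothetical.

```python
from itertools import product

NASALS = ["n", "m"]        # assumed values of *N (by analogy with Pyu)
STOPS = ["t", "tʰ", "d"]   # possible values of *T


def expand(template="NITo"):
    """List every concrete form the template stands for (*I stays unknown)."""
    forms = []
    for nasal, stop in product(NASALS, STOPS):
        forms.append("*" + template.replace("N", nasal).replace("T", stop))
    return forms


for form in expand():
    print(form)
```

Six candidates is a reminder of how much uncertainty a compact formula like *NITo conceals.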

I can posit two parallel scenarios for 投 'to throw':

Scenario A: Primary *d-

*do > *dou > *du > *dəw > tóu

Scenario B: Secondary *d-

*NTo > *ndo > *do > *dou > *du > *dəw > tóu

Schuessler (2007: 500) links 投 to Written Tibetan Hdor-ba 'to throw away' and gtor-ba 'to throw', but I would expect Written Tibetan -r to correspond to Old Chinese *-r, not zero.

殊 'to cut off' is cognate to 誅 'to punish, kill, reprove'. I assume both 殊 and 誅 had unaspirated or voiceless-initial roots, as there is no evidence for *tʰ or *d in 誅:

*RIto > *RItuo > *Rtuo > *truo > *ʈuo > *ʈu > zhū [tʂu]

*R- might be *r- or *l-. *R- is so common in Old Chinese that I suspect it cannot simply be *r-. Written Tibetan has preinitial l- as well as r-, so Old Chinese may also have had preinitial *l-.

EDACHI AGAIN: WHAT COUNTS AS OLD JAPANESE?

Continuing from my previous entry ...

岩波古語辞典 Iwanami kogo jiten (The Iwanami Dictionary of Old Words, 1990) gives an example of Old Japanese 役 edachi 'being forced to fight or work for the government' from 古事記 Kojiki (Record of Ancient Matters, 712):


Tsutsumi ike ni edachite, Kudara no ike wo tsukuriki.

'[They] were put to work on dikes and ponds, [and they] made the Pond of Paekche.'

Here is the context from 倉野憲司 Kurano Kenji's (1991: 145) reading of the Kojiki¹:


Mata Shiragibito maiwatarikitsu. Koko wo mochite Takeuchi no sukune no mikoto hikiite, tsutsumi ike ni edachite, Kudara no ike wo tsukuriki.

'Again Shilla people came over [to Japan]. Therefore Takeuchi no sukune no mikoto led them, had them put to work on dikes and ponds, [and they] made the Pond of Paekche.'

I fear that someone might see that entry and conclude that edachi is an Old Japanese word.

Why "fear"? Notice I wrote "reading" and not "edition". Kurano's (1991: 276) edition of the Kojiki - the text upon which his reading is based - doesn't have a single hiragana, since of course hiragana did not yet exist in 712:


It has punctuation marks that almost certainly weren't in the original text. I don't know what this passage looks like in the oldest surviving manuscript (the Shinpuku Temple manuscript from 1371-72), but you can see there is no punctuation in this image of a page from that manuscript. The punctuation is hardly the biggest problem, though.

Let's look at another reading of the Kojiki by 武田祐吉 Takeda Yūkichi (1977: 137)²:


Mata Shiraki hito maiwatarikitsu. Koko wo mochite Takeuchi no sukune no mikoto, hikiite, watari no tsutsumi no ike to shite, Kudara no ike wo tsukuriki.

'Again Shilla people came over [to Japan]. Therefore Takeuchi no sukune no mikoto led them, [and they] made the Pond of Paekche as a pond of the dike of the people who crossed over [i.e., from the Korean peninsula].'

(6.29.22:12: 'pond of the dike' makes no sense to me. The original text has 堤池 <DIKE POND>.)

It has no form of the verb edachi or even the character 役 (which Takeda reads as 渡 watari '[person who] crossed over').

Those two readings are not the only possibilities. Jidaibetsu kokugo daijiten (1967: 141) mentions two more readings of 爲役 or 役:

and proposes a third:

Which of these, if any, is right? There is no evidence within the original text to know. All the readings - the 讀み下し文 yomikudashibun - are translations into a stylized Japanese that is archaic but may not be identical to Old Japanese. That's why I romanize the readings in modern pronunciation.

The key word is phonograms. Unless an Old Japanese word is attested in phonograms, its phonetic value is unknown. All the e-readings of 役 are simply educated guesses. We don't know how yomikudashi worked in 712. 役 might even have been read as something like Go-on wiyaku!

6.29.22:20: The bottom line for me is that only Old Japanese words in phonograms can be cited in phonetic transcription. Old Japanese words in semantograms should be cited without phonetic transcription: e.g., 役 as <SERVICE>, not educated guesses like edachi, etachi, etatase, etate, etashi, etc.

古事記をそのまま読む Reading the Kojiki as It Is has more on the problem of interpreting 役 (or 渡 in the manuscript that it reproduces; Takeda's reading seems to be based on a manuscript with 渡): e.g.,

しかし、「役之堤池」は、全く不可解な構文である。まず、"之"が「の」を表すとした場合、「役の堤と池」は意味をなさない。 「役」を動詞とした場合も、目的語「堤池」の前に"之"を挟むことは絶対にない。 「役」がもし正しいとすれば、考えられる唯一の可能性は「"堤池之役"の誤写」である。

However, 役之堤池 is a completely incomprehensible construction. First, if 之 represents no [a genitive marker], 'dikes and ponds of service' makes no sense. [I had considered that possibility.] Even if 役 is taken as a verb, 之 would absolutely not be between it and its object 堤池 'dikes and ponds'. If 役 is taken as correct, the only conceivable possibility is an erroneous copying of 堤池之役 'service of dikes and ponds'.

6.29.21:50: Added English translations of the readings and derivations of the Jidaibetsu readings.

¹6.29.22:23: I have converted Kurano's hiragana back into the original kanji whenever possible to facilitate comparison with the original all-kanji text.

²6.29.22:24: I have converted Takeda's hiragana and postwar simplified kanji back into the original kanji whenever possible to facilitate comparison with the original all-kanji text.

EDACHI

Continuing from yesterday's post about the unusual Japanese name 役小角 En no Ozunu:

The English Wiktionary lists edachi as a kun (native Japanese) reading of the Chinese character 役. (時代別国語大辞典 Jidaibetsu kokugo daijiten [The Great Dictionary of the National Language Categorized by Era] favors etachi.) I think edachi/etachi is not fully native. I regard it as a Chinese-Japanese hybrid *ye-(nV-)tat- 'to be forced to fight or work for the government'. The verb may be identical or similar in structure to modern 役に立つ yaku ni tatsu ~ 役立つ yakudatsu 'to be useful', lit. 'role DAT/LOC stand'. Yaku is an even earlier borrowing of Chinese 役 than e.

Before I look more into edachi/etachi, here's my take on the history of its first morpheme 役:

  1. The earliest reconstructible form for 役 is *CI-waj-k 'to do service' (Schuessler 2007: 568 reconstructs 役 *wai-k from 爲 *wai 'to do' = my *CI-waj.) *CI- may have been a causative prefix *SI-. (Cf. Baxter and Sagart's [2014: 56] causative *s-prefix.) The unknown high vowel *I is needed to account for the later vocalism (see the appendix). *-w- could be *-ɢʷ- (after Baxter and Sagart 2014), but I prefer to avoid exotic solutions if I can. See below for hard evidence for *-w- (or at least labiality).

  2. Fusion of *-aj- into *-e-: *CIwajk > *CIwek

  3. Vowel harmony-driven warping: *-e- breaks to *-ie- after a high vowel: *CIwek > *CIwiek

  4. Presyllable loss: *CIwiek > *wiek. Early Sino-Vietnamese việc was borrowed at this stage.

  5. *wi-fusion: *wiek > *ɥek

  6. *e > *a in palatal environments in some southern dialects (details unclear): ek > ak. Go-on yaku was borrowed from such a dialect before the 7th century. The earliest attestation of the Go-on reading that I know of is wiyau (sic; error for †wiyaku¹) in Ruiju myōgishō (c. 1100). It is remarkable that an un-Japanese [ɥ] or [jw] survived in spelling if not in pronunciation centuries after ak was borrowed. There is no trace of such an initial in modern Go-on yaku.

  7. *ɥ-simplification: *ɥek > *jek

  8. Fronting of coda after a palatal vowel: *jek > *jejk or even *jec (if Hashimoto is right about Middle Chinese final palatals). Kan-on eki < 7th c. Kan-on *yeki was borrowed at this stage. But note that Sino-Korean 역 yŏk which probably slightly postdates Kan-on eki has nothing pointing toward a palatal component of the coda. The Sino-Korean reading isn't 옉 ˟yek < ˟yəyk < ˟(y)eyk. Maybe the coda had no palatal component in the source dialect of Sino-Korean. Or perhaps there was a phonotactic constraint against ˟-eyk in Old Korean. I don't know of any native Korean root with 옉 yek < yəyk < (y)eyk.

  9. *e-raising in some dialects: *jek > *jik. Sino-Vietnamese dịch < *jic was borrowed at this stage.

  10. Coda loss in some dialects: *jik > *jiʔ > standard Mandarin yi.
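The main line of this history can be replayed mechanically as an ordered list of string rewrites. This is only a sketch: it follows the path from *CIwajk to the stage just before standard Mandarin, skips the dialect-specific side branches (steps 6 and 8), and the rule patterns are my own simplifications rather than a full statement of each change.

```python
# (step number in the list above, pattern, replacement)
RULES = [
    (2, "aj", "e"),      # fusion of *-aj- into *-e-
    (3, "Iwe", "Iwie"),  # *-e- breaks to *-ie- after a high vowel
    (4, "CI", ""),       # presyllable loss
    (5, "wi", "ɥ"),      # *wi-fusion
    (7, "ɥ", "j"),       # loss of labiality: *ɥ > *j
    (9, "jek", "jik"),   # *e-raising
    (10, "k", "ʔ"),      # coda loss, first stage only
]


def derive(form):
    """Apply each rule once, in order, and return every intermediate stage."""
    stages = [form]
    for _step, old, new in RULES:
        form = form.replace(old, new, 1)
        stages.append(form)
    return stages


print(" > ".join("*" + stage for stage in derive("CIwajk")))
```

Because the rules are ordered, swapping two of them (for example, applying presyllable loss before breaking) would derail the derivation, which is the point of stating the steps as a sequence.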

Back to edachi and the unusual name 役 En: Most Old Japanese speakers had no contact with Chinese speakers. The average speaker had few Chinese borrowings in their vocabulary. The elite, on the other hand, was more familiar with Chinese, and elite pronunciations of Chinese words may have been on a continuum from native speaker-like to heavily assimilated (i.e., Japanized). So the Kan-on reading (singular) of 役 was really a set of readings in En no Ozunu's time (the 7th century):

*ye is (questionably) attested in Old Japanese as a monosyllabic word 'corvee'. (But I write it with an asterisk because I reconstructed the *y-. The word is only known through the reading tradition²; there is no phonogram spelling pointing to *y-.)

The name En may have originated as *yek whose coda assimilated to the nasal of the following genitive marker (cf. the Korean rule /k n/ > [ŋ n]).

*yek nə wonduno > *yeŋ nə wonduno > En no Ozunu (with a pseudoarchaic -nu based on an erroneous reading of a phonogram for Old Japanese no).
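The assimilation step in that derivation amounts to a single context-sensitive rewrite. Here is a minimal sketch, assuming the rule is simply 'word-final k becomes ŋ before a word beginning with n'; the function name is hypothetical and the later changes (*nə > no, *wonduno > Ozunu) are not modeled.

```python
import re


def assimilate_k_before_n(phrase):
    """Turn a word-final k into ŋ when the next word begins with n."""
    return re.sub(r"k(?=\s+n)", "ŋ", phrase)


print(assimilate_k_before_n("yek nə wonduno"))
```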

¹19.6.29.21:19: wiyaku is the Ruiju myōgishō Go-on reading of 疫 'epidemic', a homophone of 役 in Chinese. The graph 疫 is a combination of 疒 'disease' (semantic) and 役 (abbreviated phonetic, itself a semantic compound of 彳 'to go' and 殳 'baton, beat' [Karlgren 1957: 226]). The Kan-on reading is eki, and the word was Japanized as *ye (now pronounced e), (questionably) attested in

according to Jidaibetsu kokugo daijiten (1967: 140) and Iwanami kogo jiten (1990: 201).

But I don't know for sure how 疫 was originally intended to be read in those texts. In fact, 倉野憲司 Kurano Kenji's (1991: 255) edition of Kojiki has 伇 (an archaic variant of 役, not 疫) which appears as 役 eyami (! < *ye-yami, a hybrid of Chinese 'epidemic' and native Japanese 'illness') in his 讀み下し文 yomikudashibun on p. 101. 武田祐吉 Takeda Yūkichi's (1977: 96) yomikudashibun of Kojiki has 伇 which he reads as e (< *ye).

²*ye appears in Man'yōshū 3847 in the semantogram combination 課役 <IMPOSE SERVICE> which has been read as edachi (< *yendati), etsuki (< *yentukɨ), and mitsuki (< *mitukɨ) 'tax' (Ōno et al. 1990: 205). (6.28.21:57: All three possibilities fit the meter.)

Appendix: Evidence for labiality in 役

Karlgren (1957: 226) reconstructed Old Chinese 役 as *di̯ĕk without any labial segment. Thirty years later, Schuessler (1987: 743) reconstructed the word as ?*ljik without any labial segment. Standard Mandarin yi and Cantonese jik have no labial segment. However, both internal and external evidence point to a labial segment.

1. Internal evidence

1.1. Mandarin:

(Do any Jin varieties have a labial vowel in this morpheme? Xiaoxuetang only lists Taiyuan 太原 ieʔ.)

1.2. Most Wu varieties at Xiaoxuetang have labial vowels: y, u, or o; 莊村 Zhuangcun has ʯʔ [ʐ̩ʷʔ]

1.3. All Xiang varieties at Xiaoxuetang have y.

1.4. Some Gan varieties at Xiaoxuetang have y or u; 平江 Pingjiang has ʯɤt [ʐ̩ʷɤt].

1.5. A few Hakka varieties at Xiaoxuetang have labial vowels:

1.6. Some Yue varieties at Xiaoxuetang have v-, y, or u.

1.7. Some Pinghua varieties at Xiaoxuetang have v-, ʋ-, y, u, or o.

1.8. Min varieties (list not exhaustive):

1.8.1. Southern: 揭陽 Jieyang uek

1.8.2. Pu-Xian: yʔ in both 莆田 Putian and 仙游 Xianyou

1.8.3. Eastern: 福安 Fuan peik with p-!

1.8.4. Northern: 石陂 Shibei ɦy (with level tone!)

Are the initial consonants of Fuan and Shibei evidence for a proto-obstruent like Baxter and Sagart *ɢʷ-?

1.8.5. Central: 明溪 Mingxi y (with departing tone!)

1.8.6. Other: 隆都 Longdu uɐk (with upper register!) and 將樂 Jiangle y

1.9. Some of Xiaoxuetang's unclassified varieties also have labial segments:

2. External evidence

2.1. Early Sino-Vietnamese việc and Muong (variety unidentified) [wiək] (Pulleyblank 1994: 83, cited by Schuessler 2007: 563)

2.2. Ruiju myōgishō (c. 1100) Go-on wiyau (sic; error for †wiyaku)

2.3. Borrowings in Tai: Saek viak D2L 'work', Siamese wiek³ (Maspero 1912: 73, cited by Schuessler 2007: 563; regards เวียก wiak as Isan - i.e., not standard Siamese - and the example implies it means 'work') Added the forms from Schuessler (2007: 563).

EN NO OZUNU

役小角 En no Ozunu, founder of 修驗道 Shugendō, was banished by the Japanese court 1320 years ago today. The spelling of his name is doubly interesting.

角 'horn' is normally read as tsuno. In a compound, I would expect ts- to voice to z-: -zuno. But I wouldn't expect a final -u. Iwanami kogo jiten (1990) says tsunu is an Edo period error for tsuno based on the misinterpretation of man'yōgana for -no as nu. So is Ozunu an Edo period misreading of 小角? (The genitive marker no between En and Ozunu is unwritten.) Or is Ozuno (another reading of 小角) a regularization of an original Ozunu reflecting a dialect in which *-o raised to -u? (Cf. forms like Hitachi Old Japanese yu [Kupchik 2011: 374] corresponding to Western Old Japanese and even modern standard Japanese yo 'night'.)

役 (Wiktionary) is normally read as yaku or eki. Both of those readings are Chinese loans. 役 has never had a nasal-final reading in Chinese. So why is 役 read En in this name? If the name is native, it shouldn't end in -n since all Japanese words in the 8th century ended in vowels. I wonder how 役 was read when he was alive.

ASADY

I was surprised to learn last night that بشار الأسد‎ Bashar al-Assad is pronounced [baʃˈʃaːr elˈʔasad] in Levantine Arabic with a geminate [ʃʃ] and a single [s]. Why isn't it written as Basshar al-Asad in English?

The Polish Wikipedia reflects the geminate [ʃʃ] and a single [s]: Baszszar al-Asad.

The Slovak Wikipedia even reflects the long [aː]: Baššár al-Asad. (But the Czech Wikipedia lacks the geminate: Bašár al-Asad.)

The Hungarian and Albanian Wikipedias have e for at least one short [a]: Bassár el-Aszad (but not ˟Eszed!) and Beshar el-Asad (but not ˟Esed!).

The Thai Wikipedia has บัชชาร อัลอะซัด <ɓăjjāra ʔălaʔaḥzăɗa> which I assume is read as [bàtsaːn ʔan ʔasát]. Alas, I don't know of any entries for Assad in the Lao, Khmer, or Burmese Wikipedias.

The Tamil Wikipedia has another drastic localization: பசார் அல்-அசத் <pacār al-acat>  [pasaːr al asat]. (Tamil has no initial [b] or final [d].)

I don't know how typical those renderings are. I wish I had time to investigate how Arabic names are localized in various languages.

The title is from my attempt to Tangutize Assad's name as


1637 5994 4541 2682 4541 1693 0804

2ba1 1shar3 1a? 2lu3 1a? 1sa4 2dy4

using conventions originally developed for Sanskrit.

I am unaware of any reasoning for choosing either tone 1 or tone 2 for transcribing Sanskrit, so I have not taken tones into consideration when choosing transcription characters from the set used for Sanskrit as compiled by Arakawa (1997).

1. 1637: Sanskrit -a is generally Tangutized as -a4, though Sanskrit ba can be Tangutized with -a1. I should look into exceptional cases of a1-transcription.

2. 5994: In theory I could have transcribed [ʃʃaː] as shy sha, but I have chosen an English-like solution with just one fricative.

Tangut has no syllables ending in [r]. The -r of my Tangut notation indicates vowel retroflexion, not an [r]-coda.

Tangut has no word spacing. Perhaps modern Tangut would have had a dot here to separate foreign names.

3. 4541: Devised to write Sanskrit a. Sanskrit a after consonants is normally transcribed as -a4, so I suspect 4541 is 1a4. But 1a1 is also possible since Sanskrit ba can be transcribed as 1637 2ba1. I do not think 1a2 or 1a3 are likely, as neither is the known reading of any other tangraph. 1a4 is attested as the reading of 𗅹 2375 'east, tail'. 1a1 is also not attested as the reading of any other tangraph, but Sanskrit a was transcribed in Chinese as *1a1, and Tangut  -a1 can correspond to Sanskrit -a. Hence 1a1 is not impossible as a reading of 4541.

I do not know why Li (2008: 721) does not list a tone for 4541 which appears in the level tone (i.e., tone 1) section of Mixed Categories of the Tangraphic Sea.

4. 2682: Isolated Sanskrit consonants are usually Tangutized as -y syllables. However for some reason l is Tangutized as either 2682 ending in -u or 3284 𗥰 2la3. I have opted for 2682 2lu3 since it sounds like the Japanese solution for writing -l: -ru.

5. 4541: See above.

6. 1693: An example of an -a4 character used to Tangutize a Sanskrit syllable. Contrast with 1637  2ba1 for Sanskrit ba. Gong reconstructed -a4 as [ja], but there is no [j] in most Sanskrit syllables transcribed as -a4. (An obvious exception is 𘁂 5314 2a4 for Sanskrit ya.)

7. 0804: An example of an y-syllable used to Tangutize a Sanskrit consonant. Contrast with 2682 above which ends in -u rather than the usual -y.

CIR: THE THREE-AXIS MODEL OF ORTHOGRAPHIC REFORM

Having recently finished reading Robbins Burling's Spellbound: Untangling English Spelling, I've been thinking about how to characterize different proposals for reforming English orthography. The CIR model has three axes:

  1. C: Continuity

  2. I: Internationality

  3. R: Regularity

I could describe a proposal in terms of these three features using this notation: [±lowercase letter of feature].

Continuity refers to whether a proposal incorporates an existing practice, either by leaving it alone or by expanding its domain.

To write all English [dʒ] as <j> is [+c] since some English [dʒ] are already written as <j>.

To write all English [ʃ] as <x> is [-c] since English [ʃ] is not written as <x>. (I am not counting foreign names like <Xi>.)

Continuity is of interest to both existing users of English (native and nonnative) and learners who would want to access literature in the prereform orthography.

Internationality refers to whether a proposal is compatible with non-English orthographic practices.

To write all English [i] as <i> instead of, say, <ee> is [+i] since <i> represents [i] in most Latin-alphabet orthographies.

To write all English [ʃ] as <s> is [-i] since no Latin-alphabet orthographies have <s> for [ʃ] with the major exception of Hungarian. (Also, <s> is [ʂ] for southern Vietnamese speakers - not [ʃ], but close.)

Internationality is of interest to learners who would benefit from an orthography using conventions they are likely to already know.

Regularity refers to whether a proposal has one symbol (or symbol sequence) per sound.

To write all English [k] as <k> is [+r].

To write English [k] as <c> before nonfront vowels and <k> before front vowels is [-r]. (But still more regular than the current spellings of [k]!)

Regularity is of interest to learners who do not want to be burdened with irregularities.
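The <c>/<k> rule above is predictable even though it is [-r], and a sketch shows how little information it needs: just the following vowel. Which vowels count as front is my assumption here, and the function name `spell_k` is hypothetical; the rule as stated says nothing about [k] before consonants or word-finally, so those cases are not handled.

```python
# Assumed set of front vowels (IPA); adjust to taste.
FRONT_VOWELS = {"i", "ɪ", "e", "ɛ", "æ"}


def spell_k(following_vowel):
    """Choose the letter for [k] from the vowel that follows it."""
    return "k" if following_vowel in FRONT_VOWELS else "c"


print(spell_k("ɪ"), spell_k("u"))
```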

Obviously those features are not really binary; there are degrees of CIR. I don't want to assign arbitrary numerical values, so maybe I could double plus and minus signs: e.g., to switch English to the Shavian alphabet would be [--i] ('doubleminus international'? - cf. Orwell's 'doubleplusgood') since no other language is written in that script.

Shavian in terms of all three features is [-c -i +r]:

The existing orthography is [+c -i -r]. It is irregular with many language-specific eccentricities.
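The bracket notation, including doubled signs, is easy to generate from three signed integers. The sketch below is one possible encoding, not part of the model itself: the class name `CIR`, the integer scale, and the rule that 0 is not allowed are all my assumptions.

```python
from dataclasses import dataclass


def _mark(value, letter):
    """Render +1 as '+c', -2 as '--c', and so on; 0 has no notation."""
    if value == 0:
        raise ValueError("a feature must be positive or negative")
    sign = "+" if value > 0 else "-"
    return sign * abs(value) + letter


@dataclass(frozen=True)
class CIR:
    c: int  # continuity
    i: int  # internationality
    r: int  # regularity

    def __str__(self):
        parts = [_mark(self.c, "c"), _mark(self.i, "i"), _mark(self.r, "r")]
        return "[" + " ".join(parts) + "]"


shavian = CIR(c=-1, i=-1, r=1)
existing = CIR(c=1, i=-1, r=-1)
print(shavian, existing)
```

Passing `i=-2` for Shavian would render the 'doubleminus international' variant as `[-c --i +r]`.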

A 'perfect' orthography that is [+c +i +r] seems impossible. To maintain continuity to some nontrivial degree, a new orthography would have to abandon internationality: e.g., reject international <i> for English-specific <ee> as the spelling of [i].

A [+c -i +r] orthography would require learning a lot of English-specific conventions, but those conventions would be consistent: e.g., <mee> and <eet> instead of <me> and <eat> (cf. <eel> and <feet> which would remain unchanged).

A [-c +i +r] approach would require all four of those [i]-example words to have new spellings: <mii>, <iit>, <iil>, <fiit>.

Maybe I should split regularity into two features, 'regularity' and 'monophoneticity/monophonemicity' (?). It is possible to have regularity without absolute one-to-one correspondences: e.g., [i] could be <ii> in closed syllables but <i> in open syllables: e.g., <iit>, <iil>, <fiit> but <mi> (since there is no [mɪ] that would be written <mi> if <i> = [ɪ]). Another example of this type of 'split' (or environment-conscious) regularity is my proposal above for <c> and <k> which I regarded as [-r].

Lastly, the CIR terminology or something like it could be used to describe any script. Invented scripts would be [-c]. Adaptations of existing scripts could be regarded as [-c] if they bear little or no relation to a previous script for a language. The modern Turkish alphabet is

The Tangut script is

[-c -i -r] scripts like Tangut are the hardest to learn because they are sui generis.

¹6.26.17:26: I inserted "almost" because of ambiguous cases like gâvur [ɟaʋur] 'infidel' which in theory could also be read as ˟[ɟaːʋur] with a long vowel as well as a palatal consonant (cf. kâfir [caːfir] 'infidel') or ˟[gaːʋur] with a long vowel. Google Translate's TTS (?) 'knows' that gâvur and kâfir both have palatal consonants but that only kâfir has a long vowel.

I'm surprised that gâvur doesn't have a long vowel. It is borrowed from Persian گاور gāvur (before the u > o shift in modern standard Persian) which does have a long vowel. And kâfir (from Arabic via Persian) demonstrates that palatals can precede long vowels in Turkish.

I confess I was tempted to derive gāvur from kāfir, but there would be no reason for Persians to change k, f, and i to g, v, and u. The earlier form of gāvur is گبر gābr which is phonetically even further from kāfir - the second consonant b is a stop, not a fricative, and there is no second vowel. gābr is from Aramaic, not Arabic.

THE BATTLE OF MANG YANG PASS

The Battle of Mang Yang Pass occurred sixty-five years ago today:

It was one of the bloodiest defeats of the French Union together with the Battle of Dien Bien Phu in 1954 and the Battle of Cao Bằng in 1950.


The ambush and destruction of GM 100 [Groupement Mobile No. 100] was considered the last significant battle of the First Indochina War. Three weeks later, on Jul. 20, 1954, a battlefield ceasefire was announced when the Geneva agreements were signed, and on Aug. 1, the armistice went into effect, sealing the end of the French Indochina and the partition of Vietnam along the 17th parallel. The last French troops left South Vietnam in April 1956, upon request from President Ngô Đình Diệm.

What kind of name is Mang Yang? Vietnamese syllables normally do not begin with Y-. Mang Yang is in Gia Lai Province which has many obviously non-Vietnamese names. The one I recognize is Pleiku which is un-Vietnamese in four ways:

  1. It begins with p-. ph- is permissible but not p- (because earlier Vietnamese *p- became b-).

  2. It begins with a consonant cluster containing -l-. All native Vietnamese *Cl-clusters became tr-. Wiktionary has a phonetic Vietnamese spelling pờ lây cu splitting the first syllable Plei in two. (Why is a huyền tone assigned to pờ?)

  3. The first syllable ends in -ei, a rhyme unknown in Vietnamese.

  4. The second syllable has a k- instead of c- for [k] before a back vowel. Vietnamese k- is normally written only before front vowels.
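The four points above can be turned into a toy checker for un-Vietnamese spellings. This is a rough sketch that works on plain lowercase spelling only: the regular expressions are my own approximations of the four criteria, they ignore tone marks and most diacritic vowels, and the function name is hypothetical.

```python
import re


def un_vietnamese_features(name):
    """Return which of the four spelling features listed above a name shows."""
    s = name.lower()
    found = []
    if re.match(r"p(?!h)", s):                    # p- not followed by h
        found.append("initial p-")
    if re.match(r"[bcdfghjkmnpqrstvwxz]+l", s):   # consonant(s) + l at the start
        found.append("initial Cl- cluster")
    if "ei" in s:                                 # rhyme unknown in Vietnamese
        found.append("rhyme -ei")
    if re.search(r"k[uoô]", s):                   # k written before a back vowel
        found.append("k- before a back vowel")
    return found


print(un_vietnamese_features("Pleiku"))
```

Running it on `"Mang Yang"` returns an empty list, since the un-Vietnamese initial Y- is not one of the four criteria.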

Wikipedia and Wiktionary derive Pleiku from Jarai Plơi Kơdưr, lit. 'village north/above'.

The Vietnamese Wikipedia says Mang Yang is Bahnar for cổng trời, lit. 'gate sky': i.e., 'sky gate'. But the dictionary of the Plei Bong-Mang Yang Bahnar dialect by the Bankers and Mơ (1979) has no words like mang 'gate' or yang 'sky'. I cannot find a word for 'gate' in the dictionary's English-Bahnar index, and the only word for 'sky' I can find using that index is plĕnh on p. 99. (There is supposed to be another word for 'sky' on p. 110, but I don't see one.) There is a yang 'spirits, nonhuman beings that affect humans' on p. 145. Perhaps that is the Yang of Mang Yang.

WHY DON'T FINAL FRICATIVES DEVOICE IN TURKISH?

In my last post, I didn't comment on the final consonants of Arabic Muḥammad and Turkish Mehmet. Turkish stops and affricates devoice in final position: e.g., Arabic kitāb > Turkish kitap 'book' (but acc. sg. kitab-!). Note, however, that the etymological -d of Mehmet does not survive in the spelling of the accusative singular: Mehmet'i [mehmedi]. (The apostrophe separates a proper name from a suffix. The rule is to keep the spellings of proper names intact regardless of actual pronunciation.)

Note also that I spoke of final stops and affricates devoicing but not fricatives: /ʒ z v/ remain voiced in final position unlike their Russian counterparts.

Tonight I realized that /z v/ are phonetically fricatives but behave like sonorants. Final /z/ in native words comes from Proto-Turkic *-r. It is a former sonorant that still behaves like one. /v/ acts as if it were /w/. I think /ʒ/ is only in borrowings like bej 'beige' and garaj 'garage'; it may retain its final voicing by analogy with /z/. And/or such borrowings postdate devoicing. (When did devoicing occur?)
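The asymmetry described above (stops and affricates devoice word-finally, /ʒ z v/ do not) can be sketched as a lookup on the last letter of a word in Turkish spelling. This is a simplification that works on orthography, not phonology, and treats only the four voiced letters below as targets; the function name is hypothetical.

```python
# Voiced stops and the voiced affricate (spelled <c>) with their voiceless partners.
# Fricatives such as <j>, <z>, <v> are deliberately absent.
DEVOICE = {"b": "p", "d": "t", "c": "ç", "g": "k"}


def devoice_final(word):
    """Devoice a word-final stop or affricate; leave everything else alone."""
    if word and word[-1] in DEVOICE:
        return word[:-1] + DEVOICE[word[-1]]
    return word


for stem in ["kitab", "mehmed", "garaj", "bej"]:
    print(stem, ">", devoice_final(stem))
```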

(I am reminded of how traditional Tangut phonology groups z- and zh-sounds with liquids in consonant class IX rather than with s- in consonant class VI and sh- in consonant class VII.)

One problem with the above analysis is that /r/ devoices in word-final position. So if /z/ is really like /r/, why doesn't it devoice like /r/? And if I understand Kornfilt (2009: 524) correctly, speakers who devoice /r/ also devoice palatal /lʲ/ and may even devoice velar /ɫ/. (Göksel and Kerslake 2005: 8-9 do not mention the devoicing of laterals.)

HOW DID MUḤAMMAD BECOME MEHMET?

Originally this post was titled "Why Doesn't Muḥammed Have Ü?". But the answer to that question is simple: Arabic /u/ was borrowed as Ottoman u both before and after Arabic pharyngeals. I mistakenly thought vowels in Ottoman borrowings from Arabic were determined only by preceding Arabic consonants.

My new title question is more difficult. According to Wikipedia, "the most common Turkish form of the Arabic name Muhammad" is Mehmed (now Mehmet):

Originally the intermediary vowels in the Arabic Muhammad were completed with an e in adoption to Turkish phonotactics, which spelled Mehemed, and the name lost the central e over time. Final devoicing of d to t is a regular process in Turkish. The prophet himself is referred to in Turkish using the archaic version, Muhammed.

I thought Mehmet was a Turkish version of Arabic Maḥmūd, but they are only related because they share the same M-Ḥ-D root. The two names are distinct in Arabic spelling: Meḥemmed (now Mehmet) and Muḥammed are both محمد <mḥmd> like Arabic Muḥammad (Buğday 2009: 220), whereas I suppose Turkish Mahmut (Ottoman Maḥmūd?; the name is not in Buğday 2009) is محمود <mḥmwd> like Arabic Maḥmūd.

Turkish Mahmut < Arabic Maḥmūd has a for the same reason that Ottoman Muḥammed has u: a neighboring Arabic pharyngeal. (Contrast with Ottoman mühimmāt < Arabic muhimmāt 'important matters' in which /u/ has no pharyngeal neighbor.)

On the other hand, Turkish Mehmet < Ottoman Meḥemmed < Arabic Muḥammad has a first e where I would expect an u before an Arabic pharyngeal. And the second e of Ottoman Meḥemmed occurs where I would expect an a after an Arabic pharyngeal.

The key word is "Arabic". Turkish doesn't have pharyngeals. Here's what I think might have happened. Turks heard Arabic [muħammæd] and borrowed it in harmonized form ("in adoption to Turkish phonotactics" as Wikipedia put it) as *Mühemmed. (I assume the borrowing of Arabic /a/ as a in the presence of pharyngeals was a learned practice only possible to those who were literate: i.e., aware of a graphic if not a phonetic distinction between Arabic glottal ه <h> and pharyngeal ح <ḥ>, both borrowed as [h] in Turkish.) The first vowel was then irregularly assimilated to the other two e: Mehemmed (written etymologically in Ottoman as <mmd>, transcribed here with vowels and unwritten gemination as Meemmed, a compromise between the pronunciation and the spelling).

The relationship between Mehmet and Muhammed is slightly like that between the Korean and Japanese words for 'Buddha' on the one hand and the Sino-Korean and Sino-Japanese morphemes for 'Buddha' on the other:

Word | Sinoxenic morpheme
부처 Puchhŏ < *put-ke < Late Old Chinese 佛 *but | Pul < northern Late Middle Chinese *fur
Hotoke < *potə-ka-i < Paekche *? < Late Old Chinese 佛 *but | Butsu < Early Middle Chinese *but

The two columns represent two kinds of borrowing. All of the above forms are based on Chinese 佛 'Buddha' (itself a borrowing from Indic Buddha). But the forms in the first column cannot be mechanically derived from Chinese like those in the second column. The former were idiosyncratically borrowed as single items and not as part of an entire lexicon complete with systematic conventions of pronunciation. (Chinese is to Korean and Japanese what Arabic was to Ottoman.)

Adding to the idiosyncrasy are suffixes absent from Chinese. Early Korean *-ke and early Japanese *-ka- seem to be a Koreanic morpheme 'ruler' which may have continental origins: cf. Khitan qa 'khan'. Japanese *-i is a noun suffix.

I cannot explain the *o in Japanese. Perhaps there was a lowering of *u in Paekche, the likely donor language. But there is no other evidence of such lowering. The general tendency in early Japanese was toward raising, not lowering: pre-Old Japanese *o became Old Japanese u, not the other way around.

WHY DOES MÜHACIR HAVE Ü?

After the ethnic cleansing of Phocaea, muhacirs settled in what is now Foça.

The Turkish word muhacir [muhadʒir] 'migrant' is from Arabic muhājir. I was surprised that the Azerbaijani counterpart is mühacir with ü. I would understand fronting a foreign u to make a word conform to vowel harmony, but mühacir is even less harmonic than muhacir (which would be ˟mühecir or ˟muhacır if it were fully harmonic).

Word | first vowel | second vowel | third vowel
muhacir | back | back | front
mühacir | front | back | front
˟mühecir (hypothetical, all front vowels) | front | front | front
˟muhacır (hypothetical, all back vowels) | back | back | back
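The backness comparison above can be checked mechanically. The following Python sketch assumes the standard Turkish split into front vowels (e i ö ü) and back vowels (a ı o u); the function names are my own and it looks only at backness, not rounding.

```python
# Classify each vowel of a Turkish/Azerbaijani word as front or back
# and test whether the word is harmonic in backness.
FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")


def backness(word: str) -> list:
    """Return 'front'/'back' for every vowel in the word, in order."""
    result = []
    for ch in word:
        if ch in FRONT_VOWELS:
            result.append("front")
        elif ch in BACK_VOWELS:
            result.append("back")
    return result


def is_backness_harmonic(word: str) -> bool:
    """True if all vowels in the word share the same backness."""
    return len(set(backness(word))) <= 1


for word in ["muhacir", "mühacir", "mühecir", "muhacır"]:
    print(word, backness(word), is_backness_harmonic(word))
```

Counting the mismatches with the first vowel makes the point of the comparison concrete: muhacir has one disharmonic vowel (i), whereas mühacir has a back a sandwiched between two front vowels.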

On the basis of these two words (dangerous!), I expected that Arabic u after nonemphatic consonants such as m was borrowed into Turkish as back u and into Azerbaijani as front ü.

And I was wrong. Buğday's The Routledge Introduction to Literary Ottoman (2009: 11) explains:

The pronunciation of short vowels in Persian and Arabic words is generally governed by which consonants appear before and after the vowels. Arabic vowel graphs are as a rule interpreted as front vowels in Ottoman (üstün = e, kesre = i, ötre = ö, ü). There is nonetheless a group of consonants that cause front vowels in their environment to shift their point of articulation and become back vowels (a, ı, o, u).

Those consonants that shift vowels from front to back are: ح ḥ, خ ḫ, ص ṣ, ض ż, ط ṭ, ظ ẓ, ع `, غ ġ, ق ḳ. The remaining consonants retain the front articulation of the vowels:

ب b, پ p, ت t, ث s, ج c, چ ç, د d, ذ z, ر r, ز z, ژ j, س s

ش ş, ف f, ك k, ل l, م m, ن n, و v, ه h, ی y
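The rule as Buğday states it can be sketched as a lookup. This is a simplification under my own assumptions: the function name `ottoman_reading` is hypothetical, ötre is reduced to ü/u (ignoring ö/o), and only the immediately neighboring consonants are checked.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize so that precomposed and decomposed diacritics compare equal."""
    return unicodedata.normalize("NFC", text)


# Romanized consonants that shift neighboring vowels to back articulation.
BACKING = {nfc(c) for c in ["ḥ", "ḫ", "ṣ", "ż", "ṭ", "ẓ", "`", "ġ", "ḳ"]}

# (front reading, back reading) for each Arabic short vowel sign.
READINGS = {
    nfc("üstün"): ("e", "a"),
    nfc("kesre"): ("i", "ı"),
    nfc("ötre"): ("ü", "u"),  # assumption: ö/o variants are left out
}


def ottoman_reading(sign: str, before: str = "", after: str = "") -> str:
    """Pick the front or back reading of a vowel sign from its neighboring consonants."""
    front, back = READINGS[nfc(sign)]
    if nfc(before) in BACKING or nfc(after) in BACKING:
        return back
    return front


print(ottoman_reading("ötre", "m", "h"))   # first vowel of mühimmāt
print(ottoman_reading("ötre", "m", "ḥ"))   # first vowel of Muḥammed
print(ottoman_reading("üstün", "m", "ḥ"))  # first vowel of Maḥmūd
```

Under these assumptions the sketch yields ü for mühimmāt, u for Muḥammed, and a for Maḥmūd, matching the learned Ottoman forms. It says nothing about colloquial outcomes like Mehmet or muhacir.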

I have long known about Turkish e for Arabic a, and that has never surprised me since [æ] is an allophone of Arabic /a/ and is the phonetic value of Persian short a.

Arabic [æ] > Persian [æ] > Turkish e

But neither Arabic nor Persian has front rounded vowels, so I didn't expect this shift:

Arabic [u] > early New Persian [u] > Turkish ü (less commonly ö and rarely o)

(Modern Persian has lowered [u] to [o].)

ö is particularly odd in `Ömer after `ayn which normally should favor a back vowel: e.g., in sā`at [saːʔat] 'clock'. (Turks could not pronounce `ayn [ʕ], but they did replicate the backness of /a/ after /ʕ/ in Arabic.) Did the first vowel front to match the frontness of the second vowel?

o in `osmān 'Uthman' is understandable since a mid [o] approximates the lowered allophone [ʊ] of /u/ after `ayn.

So although I initially thought that Turkish mücahit 'jihadi' (cf. Azerbaijani mücahid) < Arabic mujāhid was irregular, in fact it is regular, and the real question is: why isn't Turkish muhacir 'migrant' ˟mühacir with a front vowel?

Another question is: Why does the word 'jihadi' have u in Uzbek mujohid (cf. Tajik mujohid with the Tajik-internal shift o < ā) and modern Uyghur mujahit? Is there an east-west split in the way Arabic u is borrowed in Turkic? Do Uzbek and Uyghur reflect Chagatai borrowing practices? Did Chagatai and early Turkish speakers perceive Arabic /u/ in nonemphatic environments differently?

Turkish fronting of nonemphatic vowels interests me because it reminds me of the Mandarin palatal reflexes of Middle Old Chinese nonemphatic vowels: e.g.,

Gloss | Middle Old Chinese | Mandarin (sans tones)
to dwell | |
good fortune | |
3rd person poss. pron. | |

(Not all *k-nonemphatic vowel sequences have palatal reflexes in Mandarin. *k- that palatalized early became *tɕ- which in turn became [tʂ]: e.g., 支 *ke > *kie > *tɕie > *tɕi > [tʂɻ̩] 'branch'.)

Norman (1994) was the first to make the connection between Arabic emphasis/nonemphasis and what Pulleyblank called the type A/B contrast in Old Chinese (which Norman interpreted in terms of pharyngealization).

HOW DID PHOCAEA BECOME FOÇA?

Today is the centennial of the massacre at Φώκαια <Phṓkaia> /fokea/ [focea] 'Phocaea', now Turkish Foça /fotʃa/ [fotʃa]. (What would its Ottoman spelling have been? فوچا <fwčʔ>?).

I was surprised by the correspondence between Greek /kea/ [cea] and Turkish /tʃa/ [tʃa]. In theory Greek /fokea/ [focea] could have become Turkish ˟Fokea /fokea/ [focea]. But maybe the local Greek and Turkish versions of the name are closer: e.g., if the local Greek dialect had shifted *ea to [ja] and if the local Turkish dialect had merged [c] and [tʃ], etc. Or maybe I'm just seeing regular borrowing conventions at work reflecting an earlier time: e.g., if Greek /k/ had palatalized to [c] before /e/ before Turkish /k/ did, then the closest Turkish equivalent of Greek [c] at that time would be [tʃ].

Having spent so many years studying Sinoxenic - systematic Chinese borrowings in Vietnamese, Korean, and Japanese - I'm accustomed to regularity in borrowings. And unusual features are usually not random noise. They generally reflect lost features: e.g., dentals in Sino-Vietnamese reflecting old southern palatalized labials, the -l of Sino-Korean reflecting an old northern final liquid absent from any living Chinese language, etc.

Middle Chinese 必 *pit 'necessarily'
> Sino-Vietnamese tất [tət] < *sət < *psət < *ət in Annamese Middle Chinese

Ferlus (1992) reconstructed earlier Sino-Vietnamese *pz-, but I have never seen that cluster in any Mon-Khmer language

the schwa is an interesting deviation from the Chinese norm I'll explore later

> Sino-Korean phil < *pir in northern Middle Chinese

the Korean aspiration is irregular and may be due to hypercorrection

So I'd like to think there's some significance in the correspondence between Greek /kea/ [cea] and Turkish /tʃa/ [tʃa]. But maybe there isn't any. The elite of Vietnam, Korea, and Japan looked up to Chinese and wanted to closely emulate Chinese pronunciation, whereas Turks had no motivation to closely emulate the pronunciation of their Greek subjects. Greek εἰς τὴν Πόλιν [is tim bolin] 'to the city' became İstanbul, not ˟İstimbolin.

MATERNAL COMPRESSION

In "On the Origin of the Mainstream Hakka Word [oi1] 'Mother' ", W. South Coblin proposes that oi-type words for 'mother' in Hakka varieties originate from the compression of two syllables (amoi) into one (oi). Although amoi > oi at first looks like am-loss (i.e., the disappearance of the first half of the word), if the kinship prefix a- is analyzed as a zero consonant Ø- plus a rhyme -a, then oi is really Øoi with the initial of the kinship prefix Øa- and the rhyme of the root moi 'mother':

- oi

That is an example of one of three types of compression in Chinese and other languages of the region:

1. disyllabic word > loss of first syllable without any trace in the second syllable

刀 Early Old Chinese *CVtaw > Late Old Chinese *taw 'knife' (If not for Vietnamese [zaːw] with lenition of *-t- conditioned by *CV-, no first syllable would be reconstructible)

2. disyllabic word > fusion of initial consonants of both syllables + rhyme of second syllable

抱 Early Old Chinese *mʌpuʔ > Late Old Chinese *bowʔ 'to carry in the arms'

*b is a fusion of *m- and *-p-.

My formulation needs to be tweaked because the vowel of the surviving syllable has changed under the influence of the lost vowel of the previous syllable: *u has lowered to *ow.

3. disyllabic word > initial of first syllable + rhyme of second syllable ... no, I'd better reformulate that.

Coblin gives a standard Mandarin example: 不用 bú yòng lit. 'not use' > 甭 béng 'no need to' (note the neat stacked composite character). My initial formulation doesn't work; it would predict a fusion ˟bòng or ˟bèng (the latter takes into account the impossibility of -ong after labials in standard Mandarin). But the actual form has the tone of the first syllable and a rhyme that is unlike either syllable. So how about

3'. disyllabic word > initial of first syllable + fusion of rhymes of both syllables

to account for 甭 béng?
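The original formulation 3 (initial of first syllable + rhyme of second syllable) is simple enough to state as code, which makes its failure on 甭 easy to see. In this Python sketch the (initial, rhyme) tuples and the function name `fuse_initial_plus_rhyme` are my own illustrative choices, and a zero initial Ø is written as an empty string.

```python
# Formulation 3: keep the initial of the first syllable and the rhyme of the second.
# Each syllable is an (initial, rhyme) pair; "" stands for the zero initial Ø.
def fuse_initial_plus_rhyme(first: tuple, second: tuple) -> str:
    first_initial, _first_rhyme = first
    _second_initial, second_rhyme = second
    return first_initial + second_rhyme


# Hakka a-moi 'mother': Ø + a, m + oi
print(fuse_initial_plus_rhyme(("", "a"), ("m", "oi")))

# Mandarin bú yòng 'not use': b + ú, Ø + òng
print(fuse_initial_plus_rhyme(("b", "ú"), ("", "òng")))
```

The first call gives oi, as in Coblin's Hakka analysis. The second gives ˟bòng rather than the attested béng, which is why the rhyme part of the rule has to become a fusion of both rhymes.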

And 3' can be reworded to account for 抱 *bowʔ:

2'. disyllabic word > fusion of initial consonants of both syllables + fusion of rhymes of both syllables

No, wait, *-ʌ- in *mʌpuʔ isn't a rhyme - it's a vowel in the middle of a word. And I can't think of a word to describe *CʌCu > *CʌCow > *Cow. 'Umlaut' isn't right. Vowel harmony is involved, but there's also diphthongization. I've used the term 'bending' and Schuessler uses the term 'warping', but neither term acknowledges the first vowel that triggers the process. 'Harmonic bending' or 'harmonic warping'?

In any case, I've been thinking that reduction is irregular. Fusion is a type of reduction. So I expect some difficulty in trying to ... reduce reduction to a set of simple categories. I'd still like to say something other than 'anything can happen', though. There are constraints on complexity.

Let's zoom out from a single etymology toward the bigger picture of Hakka as described by Coblin. Let me try to translate his words into a tabular tree:

Early South Central Chinese
Early Southern Highlands Chinese
a subset of Tuhua/Pinghua
Common Hakka-She

土話 Tuhua 'local speech' and 平話 Pinghua 'ordinary speech' are generic terms for a set of unclassified Chinese languages. Coblin proposes that some of them may be related to his Southern Highlands Chinese group of languages which I could call 'Greater Hakka'. He reconstructs 'mother' in Early South Central Chinese as *mVi3/4, leaving aside the problem of daughter forms with tones 1 and 2 (e.g., the "[oi1]" in his title) for the time being.

TANGUT VOWELS V. 190509

Writing about the Tangut transcription of Sanskrit trailokya got me thinking about the phonetic values of Tangut vowels again. Here's my own take on the four grades influenced by Gong Xun's ideas. Only basic vowels are listed in Tangraphic Sea order, so there are no nasalized, tense, or retroflex vowels. I still have no idea what the distinction that I indicate as -' was.

Basic vowel
[ʊʶ] [u]
[ɪˁ] [ɪʶ] [ɰi]
[ɑˁ] [ɑʶ] [a]
[ɤˁ] [ɤʶ] [ɯ]
[ɛˁ] [ɛʶ] [ɰe]
[ɔˁ] [ɔʶ] [o]

I write the basic vowel /ə/ as an easy-to-type y in my abstract notation.

The grades:

I. Pharyngealized; lowered and/or backed

Pharyngealization is carried over from Jerry Norman's proposal for the Old Chinese source of Middle Chinese Grade I.

The lowered and backed allophones are similar to Arabic vowel allophones after 'emphatics' as described in Kaye (2009: 565).

Syllables with 'lower' series vowels (*a *e *o) automatically developed pharyngealization unless this was blocked by a preceding 'higher' series presyllabic vowel (*ɯ):

*Ca > *Cɑˁ but *CɯCa > *Ca

Conversely, a 'lower' series presyllable vowel (*ʌ) triggered pharyngealization in a following 'higher' series vowel (*ə *i *u):

*CʌCi > *Cɪˁ

Low /a/ cannot be lowered any further, so it is only backed.

Front /e/ is retracted to [ɛˁ]. The underlining indicates retraction. [ɪˁ] without underlining is already backer than front [i], so I do not underline it.

Back /u o/ cannot be backed any further, so they are only lowered.

II. Uvularized; lowered and/or backed

Medial /r/ in pre-Tangut pharyngealized syllables became uvular [ʁ]. This uvular medial was lost, but it colored the following vowel: e.g.,

pre-Tangut *pʰrat > *pʰʁɑˁt > pʰɑʶ = 2475 𗧑 1pha2 'to break in two'

Note that Gong Xun reconstructs uvularization in both Grades I and II:

Gong Xun

This site

In his system, a medial -ʕ- distinguishes Grade II from Grade I.

Gong has a single unmarked category corresponding to my Grade III and IV. Although it is true that Grades III and IV are in nearly complementary distribution -

Grade III: after v- (a labiovelar glide?), retroflexes, (velarized?) l-

Grade IV: elsewhere

- I still want to work out how they sounded to distinguish between the few minimal pairs that existed.

Syllables with 'higher' series vowels (*ə *i *u) automatically became Grade III or IV depending on the preceding initial unless there was a preceding 'lower' series presyllabic vowel (*ʌ):

*Ci > *Ci but *CʌCi > *Cɪˁ

Conversely, a 'higher' series presyllable vowel (*ɯ) triggered Grade III or IV in a following 'lower' series vowel (*a *e *o):

*CɯCa > *Cæ
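The four conditioning rules above (two default developments and two presyllable overrides) can be summarized as a small decision function. This Python sketch only predicts the grade pair (I/II versus III/IV); it does not model the further split by medial *-r- or by initial class, and the function name is my own.

```python
# Predict whether a pre-Tangut syllable ends up in Grades I/II (pharyngealized
# or uvularized) or Grades III/IV, following the hypotheses stated above.
LOWER_SERIES = {"a", "e", "o"}
HIGHER_SERIES = {"ə", "i", "u"}


def predicted_grades(main_vowel: str, presyllable_vowel: str = "") -> str:
    """Return 'I/II' or 'III/IV' for a main vowel and an optional presyllabic vowel."""
    if presyllable_vowel == "ʌ":   # 'lower' presyllable triggers pharyngealization
        return "I/II"
    if presyllable_vowel == "ɯ":   # 'higher' presyllable blocks it
        return "III/IV"
    if main_vowel in LOWER_SERIES:
        return "I/II"
    if main_vowel in HIGHER_SERIES:
        return "III/IV"
    raise ValueError(f"unknown main vowel: {main_vowel!r}")


print(predicted_grades("a"))        # *Ca
print(predicted_grades("a", "ɯ"))   # *CɯCa
print(predicted_grades("i"))        # *Ci
print(predicted_grades("i", "ʌ"))   # *CʌCi
```

The presyllable is checked first because, on this account, it overrides the default behavior of the main vowel.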

III. Higher and centralized

Grade III was less palatal and more velar than IV. Its palatal vowels /ɰi ɰe/ had velar glides /ɰ/ to distinguish them from the pure palatal vowels [i e] of Grade IV. The sequence /wɰ/ surfaced as [w].

IV. Higher and fronted

Grade IV was more palatal than III. It had front vowels [æ y ø] corresponding to the central or back vowels of other grades.

An exception to that pattern is [ɨ] which, though not front, was still fronter than its back counterparts in other grades.

The Grade IV equivalent of labiovelar [w] in other grades was labiopalatal [ɥ].


Unattested syllables are in parentheses.

𗳛 0244
[qɑˁ] 𗉯 1689
[qɑʶ] -
𗡝 4620
𗩰 3687
1kwa1 [qwɑˁ] 𗬶 2307
1kwa2 [qwɑʶ] 𘐈 5758
𘟖 1031
𗴫 3006
𘎫 5157
([qwɪˁ]) 𗔤 4899
[qwɪʶ] -
(1kwi3) ([kwi]) 𘅧 0576
1kwi4 [kɥi]

I regard [q] as the Grade I and II allophone of /k/.

Are the gaps in the table random or systematic? Any theory of grades should be able to answer that question.

My hypotheses above regarding the origin of the grades predict that

- lower-vowel syllables should tend to have Grades I and II

- higher-vowel syllables should tend to have Grades III and IV

if *CV monosyllables outnumbered *CV̆CV sesquisyllables.

And above we see

- there is no 1ka3 or 1kwa4

- there is no 1k(w)i1

which fits my predictions.

The absence of 1kwi3 is also not surprising, since ki-syllables should tend to have Grade IV, not Grade III. k- does not belong to the subset of initials associated with Grade III: v-, retroflexes, and l-. There are only three known k-syllables with Grade III, and two of them happen to be in the table: 1kwa3 and 1ki3. The third is 1ka'3 which must have been something like [ka] plus whatever feature was represented by -'.

PITTAYAPORN'S PROTO-TAI *-ɲ

One of the innovations of Pittayawat Pittayaporn's (2009) PhD dissertation The Phonology of Proto-Tai is his reconstruction of a Proto-Tai final palatal nasal:

Since it has been established that PT [Proto-Tai] allows palatal consonants in the coda [i.e., *-c¹ and *-j], one would also expect to find the palatal nasal occurring in coda position. Although the reconstruction of PT *-c is unequivocal, there is rather little evidence for final *-ɲ. The only potential case I have identified so far is ‘to eat’, which is reflected as /kinA1/ in all SWT [Southwestern Tai] varieties but as /kɯnA1/ in NT dialects [Northern Tai] like Wuming and Yay. We can speculate that the PT form for ‘to eat’ was *kɯɲ A but the vowel was fronted so that the PSWT [Proto-Southwestern-Tai] form for this etyma was *kin A. Therefore, I tentatively hypothesize that PT had both *-c and *-ɲ.

The reconstruction of *k- and tone category A for the Proto-Tai word 'to eat' is certain. The vowel and coda of that word are less certain.

Let's look at the 'eating' problem from a subgrouping perspective. Unlike Li Fang-Kuei whose classic model of the Tai family had only three branches (Northern, Central, and Southwestern), Pittayaporn (2009: 298) proposed four branches on the basis of clusters of innovations:

A. Most Tai languages

B. Ningming

C. Chongzuo and Shangsi

D. All of Li's Northern Tai languages (such as the displaced Saek in the southeast) plus some of his Central Tai languages

Wikipedia has a clickable version of Pittayaporn's tree.

What is 'to eat' in the four branches?

A. Siamese kin A1

B. Ningming ken A1 (not in Pittayaporn 2009; found in Hudak 2008: 121)

C. Shangsi kɤn A1

D. Yay kɯn A1 but Saek kin A1

There are two types of words for 'to eat': ones with front vowels (A, B, Saek) and ones with back vowels (C, Yay). All end in -n.

Given that -in words are found in both A and D (Saek), let's suppose those branches independently preserve a proto-rhyme *-in.

By analogy, any -in words in Siamese and Saek should correspond to -en in Ningming, -ɤn in Shangsi, and -ɯn in Yay unless complicated by other factors. But is this really the case? Compare the forms for 'to eat' with those for Pittayaporn's *lin A 'water pipe':

A. Siamese lin A2

B. Ningming (no cognate in Pittayaporn or Hudak)

C. Shangsi lin A2 (not ˟lɤn A2)

D. Yay and Saek lin A2 (not Yay ˟lɯn A2)
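The comparison of the two lists above can be done mechanically. This Python sketch uses only the forms cited in this post (tones omitted) and treats everything after the first segment as the rhyme, which is an assumption that happens to hold for these single-consonant initials.

```python
# Forms for 'to eat' and 'water pipe' as cited above, without tone labels.
TO_EAT = {"Siamese": "kin", "Ningming": "ken", "Shangsi": "kɤn", "Yay": "kɯn", "Saek": "kin"}
WATER_PIPE = {"Siamese": "lin", "Shangsi": "lin", "Yay": "lin", "Saek": "lin"}


def rhyme(form: str) -> str:
    """Strip the single-segment initial; what remains is the rhyme."""
    return form[1:]


# Languages in which the two words have different rhymes.
mismatches = sorted(
    language
    for language in WATER_PIPE
    if rhyme(TO_EAT[language]) != rhyme(WATER_PIPE[language])
)
print(mismatches)
```

If both words went back to *-in, the list of mismatches should be empty. Instead it contains Shangsi and Yay, the two languages with back vowels in 'to eat'.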

It is true that in the modern languages, 'to eat' and 'water pipe' belong to different tonal categories (A1 and A2) conditioned by the initials (*voiceless > 1, *voiced > 2). So one could try to salvage the *-in reconstruction of 'to eat' by claiming that *-i- changed before *-n in tone A1 syllables in Ningming, Shangsi, and Yay. But why would, for instance, tone A1 cause *-i- to lower and back to -ɤ- in Shangsi?

Might the original rhyme of 'to eat' be preserved in Shangsi - or Ningming or Yay? No, because the rhymes of 'to eat' in those languages do not otherwise correspond to -in in Siamese and Saek. Here are all the relevant correspondence sets, including those I already mentioned:

Pittayaporn's Proto-Tai
*-ɯɲ -in
-ɤn -ɯn -in
to eat
water pipe
-ɤn ?
*-ɤn -on
*-ɯn -ɯn ?
-ɯn -ɯn to ascend

Pittayaporn's solution is ingenious:

- It accounts for the front vowel of Siamese and Saek as the result of feature transfer: the palatality of *-ɲ shifted to the vowel *-ɯ-, causing it to independently front to -i- in two distant branches of Tai (assuming the Saek word isn't a loan).

- The shift of *-ɯɲ to -Vn in all branches fits a trend against -Vɲ rhymes in Southeast Asian languages. Khmer does have a high neutral vowel-palatal nasal sequence /ɨɲ/ (e.g., in <beñ> /pɨɲ/, the Penh of Phnom Penh), but it is exceptional. Burmese once had /-aɲ/ as its sole /-ɲ/ rhyme, and Vietnamese only has /-aɲ -eɲ -iɲ/.

There are, however, two problems with his *-ɲ:

First, it is only reconstructible in 'to eat'. Perhaps it had merged with *-n (and/or *-ŋ) after other vowels. Or 'to eat' is simply irregular, and *-c has no nasal counterpart, just as Old Chinese *-kʷ has no nasal counterpart.

Second, there is no external support for *-ɲ either within Kra-Dai or beyond it. Although Norquest (2015) reconstructs *-ɲ in Proto-Hlai, he does not reconstruct *-ɲ in Proto-Qi³ *kʰən (< my pre-Hlai *kən) 'to eat'. Blust's Proto-Austronesian *kaen [kaən] - somehow related to the Proto-Tai and Proto-Qi words - ends in *-n, not *-ɲ. The *k-word for 'to eat' probably goes back to Proto-Kra-Dai and is either inherited or borrowed from some Austronesian-type language⁴. Does Proto-Tai preserve a *-ɲ lost elsewhere?

¹The reconstruction of a Proto-Tai final palatal stop is another innovation of Pittayaporn (2009). Although no attested Tai language has /c/, reconstructing *-c accounts for correspondence set 2 in the following table:

Tai languages
Pittayaporn's Proto-Tai Saek


Sets 1-3 are from Pittayaporn (2009: 211-212). Set 4 is based on the forms for 'liver'.

Saek is an aberrant Tai language which "shows many peculiarities that cannot be reconciled within the conventional model of PT [Proto-Tai] phonology" (Pittayaporn 2009: 14).

The Be languages are generally thought to be close relatives of Tai. See Chen (2018: 18) for the placement of Be within four different proposed Kra-Dai language trees. Ostapirat has changed his mind over time; in 2000 he viewed Be as a sister of Tai but in 2015 he viewed Be as a primary branch of Kra-Dai, and as of 2017 he viewed Be as a sister of a Tai-Kam-Sui subgroup.

²Proto-Tai *ˀjen A 'tendon' has a different set of rhyme correspondences that may be conditioned by a palatal initial absent from Proto-Tai *ʰmen C 'porcupine'.

³Proto-Qi is my term for the common ancestor of the Qi subgroup of Hlai. Norquest reconstructs it but has no term for it. Other early Hlai languages had unrelated words for 'to eat'. As only the Proto-Qi word is cognate to the Proto-Tai word, it seems that pre-Hlai must have inherited the word from Proto-Kra-Dai, but only one dialect of Proto-Hlai (i.e., Proto-Qi) retained it whereas other dialects of Proto-Hlai replaced it with innovations of unknown origin: *C-ləːk in Proto-Run and *C-luːɦ elsewhere.

⁴I am deliberately vague here because I do not know if Proto-Kra-Dai is descended from Proto-Austronesian or is a sister to it (i.e., a descendant of Proto-Austro-Dai, if I may modernize Benedict's term 'Austro-Tai'). Nor do I know, if there is no genetic relationship between Kra-Dai and Austronesian, whether Proto-Kra-Dai borrowed from Proto-Austronesian, an ancestor of Proto-Austronesian, or a descendant of Proto-Austronesian.

SANSKRIT TRAILOKYA IN THE TANGUT INSCRIPTION AT JUYONGGUAN

Five years ago I rediscovered 村田治郎 Murata Jirō's 1957 book on the inscriptions of the Cloud Platform at 居庸關 Juyongguan¹ in the University of Hawaii library. I had last borrowed it around 1996. Of course my attention was drawn to the Tangut inscription. But, I confess, not for long. Soon after that I dove into the world of Tangut's distant relative Pyu. And I've been there for four years.

Then yesterday Andrew West reawakened my interest in the Juyongguan inscriptions.

Today I was looking at the Tangut inscription at Juyongguan, and the Tangut transcription


5300 3639 2770 4620

1ty4 2rer4 2lo1 1ka4

of Sanskrit trailokya 'three worlds' jumped out at me. I've used Trailokya as part of my long pen name for maybe twenty-five years now.

A few words on the transcription characters:

𘎤 5300 1ty4: The only consonant clusters possible in native Tangut words had -w- as their second element. So one strategy for transcribing Sanskrit consonant clusters was to break them up into CyC-sequences. Tangut y was a neutral vowel, and in Grade IV (indicated by my -4) it was something like [ɨ] or [ɯ].

𗣀 3639 2rer4: Tangut had no [aj]. Guillaume Jacques (2014: 206) does not even reconstruct *-aj at the pre-Tangut level. I am guessing pre-pre-Tangut *-aj became pre-Tangut *-ej (which Guillaume does reconstruct) and then Tangut -e.

Here's a possible example:

𘞪 5356 1teq4 < *Sɯ-taj² 'single' could be cognate to Jingpho tāi and Boro otay, part of a cognate set that Matisoff (2003: 262) glosses as 'single/one/whole/only'³.

(I finished the rest of the entry on 5.4.15:39, added a footnote on 5.6.19:06, and then failed to save the finished page. What follows is a new second half from 5.6.19:39.)

𗥹 2770 2lo1: For a long time, I used to think that Tangut tones might actually be phonations: tone 1 was the default phonation and tone 2 was the marked (creaky or breathy?) phonation. But the phonation hypothesis predicts that Sanskrit would be transcribed solely using Tangut characters for syllables with tone 1. There would be no reason to transcribe Sanskrit with Tangut characters for syllables with tone 2: i.e., a phonation that did not exist in Sanskrit. However, most Sanskrit Co-syllables⁴ were transcribed with Tangut characters for syllables with tone 2 (Arakawa 1999: 111).

Tone 1
Tone 2
Both tones

co, jo

to, do

pho, bo, mo

yo, ro

śo, ho

Why was tone 2 favored for Sanskrit Co-syllables?

Conversely, why was ko transcribed with a Tangut character for a syllable with tone 1?

And was there a reason to transcribe the remaining Sanskrit syllables with Tangut characters for syllables with both tones? For instance, was there something about the -lo- of trailokya that necessitated tone 2, whereas the lo in some other word was somehow different to Tangut ears and required the tone 1 character 𗓽 4710 1lo1?

𗡝 4620 1ka4: This character transcribed both Sanskrit ka and kya. Why not transcribe Sanskrit kya as ky ya (cf. 1ty4 2rer4 for trai above) or as a fanqie character for kya combining  part of a kV-character with part of a ya-character? Perhaps 1ka4 was something like [kja]. But if Grade IV (written here as -4) was characterized by [j], why could 1ka4 also represent Sanskrit ka? Was there no simple [ka] in Tangut? Were Grade I and II ka something other than [ka]: e.g., [qɑˁ] and [qɑʶ] like Middle Chinese *1ka1 and *1ka2? Why was there no Grade III ka?

Chinese and Tangut grades seem to be similar. So if the Middle Chinese transcription of Sanskrit ka was 迦 *1ka3, I would expect the Tangut transcription to be 1ka3 - a syllable that does not exist in Tangut!

To complicate matters, Grinstead (1972: 144) says 4620 could represent Sanskrit ke. 1ka4 must have sounded like Sanskrit ke as well as ka and kya. Maybe it had a front vowel: [kjæ]? 

¹This name was built into Windows 10's pinyin IME. It's interesting to see what's in and out of the IME.

Sometimes more annoying than interesting. For instance, the common character 家 jia 'house' isn't listed as a choice for jia. I've been typing 家族 jiazu 'family' and deleting the second character to type 家 jia.

At least 波 bo 'wave' is included as a choice for bo now. I recall having to type the wrong reading po to make it display in some older version of the Windows Mandarin IME. I just noticed that the bopomofo IME accepts both bo and po for 波 bo 'wave'.

²(Pre-)pre-Tangut *S- conditioned Tangut -q (my symbol for vowel tension) and (pre-)pre-Tangut *-ɯ- (perhaps a front or back high vowel like *-i- or *-u- in pre-pre-Tangut) conditioned Grade IV.

³Matisoff (2003: 262) does not gloss the Jingpho and Boro forms.

⁴Many Sanskrit Co-syllables are absent from Arakawa's data: e.g., kho, gho, cho, jho, ṭo, etc.

⁵Arakawa (1999: 111) accidentally omitted the rhyme and first tone of 𗓽 4710 1lo1, the other Tangut transcription character for Sanskrit lo in his table.

URN-ING MY PAY

1. Four years of studying Pyu are paying off. Prof. Janice Stargardt of Cambridge made me reexamine the Hpayahtaung urn inscription (PYU 20). After all my advances in Pyu phonology, grammar, and lexicography, I'm finally beginning to understand it now. Just beginning. I imagine that the Khitan Small Script Research Group felt like I did when they began to make progress in understanding Khitan in the late 70s. The decipherments of Pyu and Khitan both have a long way to go - neither is remotely as advanced as the decipherment of Tangut - but I am now beyond the level of mere isolated words and a handful of grammar rules.

I thought Pyu was totally hopeless when I first tried to wrestle with it in 2015. But I'm starting to see the light at the end of the tunnel now. I'll probably never reach the end of the tunnel, but I hope my work can help others get there.

2. I try not to have tunnel vision. Paradoxically, not focusing on Pyu is the key to understanding Pyu. It's my knowledge of other languages that has made a difference in my efforts to crack that extinct language. I don't have time to look into anything other than Pyu in depth anymore, but I can still glance at the world outside first-millennium Burma.

While Googling for spontaneous nasalization for last night's entry, I came across Rémy Viredaz' "Two unrecognized vowel phonemes in Proto-Slavic".

Even before I got to the mind-blowing part about new phonemes (p. 13), I was stunned by his phonetic interpretation of the traditional set of vowel phonemes as a symmetric system (p. 1). Imagine a Slavic conlang retaining those old phonetic values.

One of Viredaz' new phonemes accounts for the unusual -e of the Old Novgorod masculine o-declension corresponding to *-ъ in the Slavic mainstream.

Now I wonder how Magadhi got -e in the masculine a-declension corresponding to the Slavic masculine o-declension. Needless to say, an Indic version of Viredaz' solution won't work.

3. I haven't forgotten about northeast Asia. Last night I also saw Andrew Shimunek's "Phonological and literary characteristics of some pieces of Khamnigan oral folklore" which made me wonder if anyone has done a survey of what might be called phonoliterary techniques in the Altaic world. Both Khamnigan and Khitan use rhyme which is alien to Korean and Japanese. Oddly a couple of words that rhyme in Russian have Khamnigan forms that do not rhyme:

R zeljonka > Kh tʃilɔːɴqʰɔ 'green tobacco'

R kartofel' ~ kartoška > qʰɔrtʰapqʰa 'potatoes'

4. Alexander Vovin's "EOJ [Eastern Old Japanese] specific vocabulary and Ainu vocabulary from the Man'yōshū" is a handy reference that only an expert in both early Japonic and Ainu could write.

Now I'm curious about the Proto-(Mainland) Japanese and even Proto-Japonic forms underlying the EOJ and Western Old Japanese forms: e.g., what I presume would be *yuru for EOJ yuru and WOJ yuri < *yuru-i 'lily'.

INDO-BURMESE IRREGULARITIES

I'm going to retire the Jurchen day titles because they would all reappear after sixty days. And they made sense as umbrella titles under which I could write about multiple topics, but they make less sense if I'm only going to write about one thing.

John Okell and Anna Allott's Burmese/Myanmar Dictionary of Grammatical Forms - like John Okell's other works on Burmese - is a model that ought to be emulated by teachers of other languages. Saya John's Burmese learning materials are the best I have ever used for learning any language, so I can say without a doubt (and with great shame) that the poverty of my Burmese is all my own fault.

DGF - as Prof. Justin Watkins called it - is a pleasure to read. If only I could retain everything I had read in it. It's been nearly three years since I studied Burmese in Rangoon under Saya John, reading DGF for fun every morning at breakfast. (That's a redundant phrase - what other meal would I have in the morning?)

Today I looked up the organization ဒို့ အိမ် <dui. im> [do̰ ʔeĩ] Doh Eain 'Our Home' and wanted to see what DGF had to say about <dui.>, an abbreviation of ငါတုိ့ <ṅā tui.> 'I plural' = 'we'. While looking in the section, I saw an example sentence for <nāḥ> in the <n> section (<n> is after <d> and <dh> in the Burmese script) with the word

စကြဝဠာ <cakravaḷā> [sɛʔ tɕa̰ wə là] 'universe' (cakra is cognate to English wheel)

which looks like a mix of Sanskrit cakravāḍa (later cakravāla) and Pali cakkavāḷa (Pali ḷ is from ḍ, and Sanskrit l looks like a Classical Sanskrit substitution for ḷ, which isn't in the CS consonant system). What surprises me is that it is not †<cakravāḷa> [sɛʔ tɕa̰ wà la̰] with the penultimate and ultimate written vowel lengths/spoken tones reversed to match the Indic originals.

I looked for the word in John Okell's Burmese: An Introduction to the Script, whose "Mismatches" section (pp. 308-311) is still a great reference long after one has mastered all the regular patterns of the script. It wasn't there (though it is an example of the "Unwritten final consonant" type since <cakra> should theoretically be †[sa̰ tɕa̰] with an open first syllable rather than [sɛʔ tɕa̰] from earlier cakra). But what was there was

ဇိဝှာ <jivhā> [zḛĩ ʍa] 'tongue' < Pali jivhā (cognate to English tongue)

as an example of "an unwritten creaky tone". If the word had a regular spelling, it would be †<jin.vhā> or †<jim.vhā> with the nasality (<n>/<m>) and creaky tone (<.>) indicated. If the word were a regular borrowing from Pali it would be [zḭ ʍa] without nasality or the mid vowel [e] that developed after an earlier *nasal.

In both 'universe' and 'tongue', it seems that an initial short syllable was filled out with a coda (*k and either *n or *m). This filling seems to have taken place after the pattern of borrowing Indic short vowels as *creaky vowels was set, so Pali jivhā was borrowed as *dʑḭN ʍa (*N = nasal I'm uncertain about) rather than as dʑiN ʍa.

This filling has parallels in Thai: e.g., Sanskrit cakra corresponds to Indo-Thai จักร <cakra> [càk krà-] with an unwritten filler (echo) [k].

The rule for fillers seems to be that they echo following stops and are homorganic (?) nasals before following sonorants (so maybe the earlier Burmese word for 'tongue' was *dʑḭm ʍa).
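To make that tentative rule concrete, here is a toy sketch of my own in Python. It is not a claim about how the borrowing actually worked. The function name, the plain romanized input split by hand into a first syllable and a following onset, the list of stops, and the nasal table are all assumptions for illustration; only v > m is suggested by the 'tongue' example above, and other sonorants would need entries of their own.

```python
# Toy model of the tentative filler rule: a short open first syllable
# gets a coda that echoes a following stop, or a homorganic nasal
# before a following sonorant.

# Assumed, simplified set of stops in a plain romanization.
STOPS = set("kgcjtdpb")

# Assumed homorganic nasals; only v -> m is hinted at by 'tongue'.
HOMORGANIC_NASAL = {"v": "m"}


def fill_first_syllable(syllable: str, next_onset: str) -> str:
    """Return the first syllable with its predicted filler coda, if any."""
    if not next_onset:
        # Nothing follows, so nothing to echo: <ca> 'to start' stays open.
        return syllable
    first = next_onset[0]
    if first in STOPS:
        # Echo filler: copy the following stop.
        return syllable + first
    if first in HOMORGANIC_NASAL:
        # Nasal filler before a sonorant.
        return syllable + HOMORGANIC_NASAL[first]
    # No prediction for other consonants in this sketch.
    return syllable


print(fill_first_syllable("ca", "kra"))  # cakra 'wheel'
print(fill_first_syllable("ji", "vhā"))  # jivhā 'tongue'
print(fill_first_syllable("ca", ""))     # ca 'to start'
```

The sketch only restates the rule; it says nothing about why Burmese would need fillers in the first place.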

I can understand the motivation for fillers in Thai in which there is a constraint against syllables ending in short vowels. But there is no similar constraint in Burmese which has syllables ending in creaky (*short?) vowels: e.g., စ <ca> [sa̰] 'to start'. Why couldn't cakra, uh, start with †[sa̰]?

Might fillers in Indo-Burmese tell us something about how Indic vocabulary was acquired by Burmese speakers? In other words, are the fillers traces of a Mon intermediary? Old Mon did not allow open stressed syllables: e.g., <ca> 'to eat' was [caʔ] with an unwritten final [ʔ]. Unfortunately, Shorto's Old Mon dictionary doesn't have an entry for cakra- or any compounds with it; the closest entry to Burmese <cakravaḷā> is <cakkavāḷa> [cakkəwal] which matches the Pali. And Shorto has no entry for any Old Mon version of Pali jivhā 'tongue'.

Maybe echo fillers go back to Pyu: e.g., vikrama 'valor' appears in Pyu as vikrama. But Pyu had no constraint against open syllables like those of Old Mon or Thai, so I suspect the Pyu echo fillers go back to Indic itself:

The first consonant of a group—whether interior, or initial after a vowel of a preceding word—is by the grammarians either allowed or required to be doubled. (Whitney, Sanskrit Grammar, §229)

But what of nasal fillers? They have no basis in Pyu or early Indic. I think 'spontaneous' nasalizations are a later phenomenon in Indic, and even if they were contemporary with Old Burmese, would they have affected the pronunciation of 'high' languages like Sanskrit or Pali at the time?

These issues should be covered in the definitive work on Indoxenic - systematic Indic borrowings outside India. Will such a book ever exist? Does anyone know enough about both Indic and Southeast Asian languages past and present to write it? There's no similar book for Sinoxenic yet. Long ago I had hoped to write a book on 'megaloan' systems covering both Indoxenic and Sinoxenic and even Arabic loans in the Islamic world. I had no shortage of ambition back then. Now I have a shortage of time ...

21:31: I forgot to ask: what is -vāḍa/-vāla/-vāḷa? None of the meanings I can find make any sense when combined with cakra- 'wheel' to form 'universe':

From Monier-Williams' Sanskrit dictionary:

vāḍa (no entry)

vāla (said to be a later form of vāra, but I suspect it's an l-dialect variant) 'hair of an animal's tail'

From the Pali Text Society dictionary:

vāḷa 'snake, beast of prey [< Skt vyāḍa 'id.']; music (?)'

This problem isn't Indo-Burmese; it's just Indic.

THE PHOENIX ON THE DAY OF THE RED DOG

Or, in Jurchen,

<RED.nggiyan DOG DAY> fulanggiyan indahūn inenggi

Today I accepted a request from Marijn van Putten to follow me on Twitter. Here are three samples of his work:

1. What was Sibawayh's pronunciation of Arabic ج <j>?

2. "Is Qur'ān a loanword from Syriac?"

All historical linguistics students would benefit from his explanation of how to evaluate loanword proposals.

3. "The Case for Proto-Semitic and Proto-Arabic Case: A Reply to Jonathan Owens"

I was not convinced by Owens' argument for case as an innovation in Arabic, but as a nonspecialist I didn't feel competent to judge. So I am happy to see Ahmad Al-Jallad and Van Putten's critical take.

Tangut Yinchuan font copyright © Prof. 景永时 Jing Yongshi
Tangut character image fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2018 Amritavision