Archives

WHITE RAT 6.25

? qulugh ai ? sair tau nyair

'white rat year, six month, twenty five day'

Today on the Discovery Channel I saw bits of Alien Sharks featuring frilled sharks (among other types of sharks).

What is the etymology of Japanese 羅鱶 rabuka 'frilled shark'? -buka is the combining form of 鱶 fuka 'large shark', but what is 羅 ra? Is it Sino-Japanese 羅 ra 'net'? Or is 羅 ra a phonogram for something else? In any case, no native Japanese word can begin with r-.

fuka 'large shark' has nothing to do with the Chinese morpheme 'dried fish' (Mandarin xiǎng, Cantonese soeng2, etc.) that 鱶 originally represented. Why did the Japanese write their native word for 'large shark' as 鱶 'dried fish'?

WHITE RAT 6.24

? qulugh ai ? sair ? nyair

'white rat year, six month, twenty four day'

1. Long ago I thought the Taiwanese car company Yue Loong was Mandarin Yuelong (tones unknown). But it was actually 裕隆 Yùlóng 'abundant' + 'eminent'. And it's been Yulon in English since 1992.

I had heard of Yulon's sub-brands but didn't know their Mandarin names until yesterday:

In theory the Mandarin names could be spelled in generic phonograms to be closer to the English names (e.g.,

), but the actual names have better semantics.

2. Until yesterday, Yulon was the only Taiwanese automaker I had ever heard of. I learned of 福特六和 Ford Lio Ho when I saw a reference to its Mazda Isamu Genki (< Japanese 勇 Isamu [a male name] + 元気 genki 'good spirits'). I can't find a Chinese version of that name. Was Isamu Genki only written in Roman letters? How was Isamu Genki pronounced in Mandarin (which doesn't have the syllables gen or ki)?

Not counting the Mazda part: 馬自達 Mǎzìdá, whose z is [ts], not [z]. Normally Japanese names retain their original kanji in Mandarin pronunciation: 松田 Matsuda would become Sōngtián. However, in this case, Matsuda 'Mazda' was phonetically transcribed, probably because the car brand is written in katakana (i.e., without kanji) as マツダ. Windows 10's IME's first option for Matsuda is マツダ. The surname 松田 comes second. I suppose the car brand is more common. But in Google, マツダ  has 58.1 million results whereas 松田 has 69.6 million results.

8.14.0:49: I just learned that in Hong Kong, 'Mazda' is Cantonese 萬事得 Maan6 si6 dak1 'ten thousand' + 'affair' + 'get'. 萬事得 clearly wasn't coined with Mandarin in mind since it is pronounced Wànshìdé in Mandarin.

Conversely, the Mandarinization 馬自達 still works in Cantonese: Maa5 zi6 daat6 isn't far from Matsuda.

3. I had first heard of the マツダ・シャンテ Matsuda Shante Mazda Chantez as a child, long before I studied French. Now I can see that Chantez is a second person plural present indicative verb form.

8.14.22:12: And now I wouldn't pronounce the final -z. I would have when I was ten and didn't know the katakana spelling, much less French.

4. Tonight I had basa for dinner. I had eaten that fish before but had never heard of its name which is from Vietnamese ba sa (in turn from Khmer បាសាក់ <pāsāk'>  [ɓaːsak] 'Bassac', also Vietnamized as Bát Sắc and Ba Thắc).

8.14.20:14: Does the Vietnamization Ba Thắc date from a period prior to the fortition of to th [tʰ]? Was the name borrowed from a language whose name for the river was something like *ɓaːɕak? Was that language something other than Khmer (which has never had ɕ as far as I know), or was it a variety of Khmer with [ɕ] for /s/?

5. I hadn't heard of a derecho until today. Midwestern news doesn't get much coverage in Hawaii. I saw the word in an AP story on p. 4 of the Star-Advertiser.

6. The word featured in today's Star-Advertiser Japan section is 3密 sanmitsu: 'the three C's the public should avoid - closed spaces, crowded places and close contact - to prevent spread of COVID-19'.

8.14.22:50: 密 mitsu < *mit is 'close, dense'. The sanmitsu 'three mitsu' are

I don't know how old those compounds are. Even if they postdate the shifts of *-t > -tsu and *p- > h-, they are pronounced with rules dating back to when 密 had *-t and 閉 had *p-.

WHITE RAT 6.23

? qulugh ai ? sair ? nyair

'white rat year, six month, twenty three day'

1. Last night I was surprised to learn that Malaysia's Proton car brand is a Malayo-Euro hybrid:

Perusahaan 'industry' is from Malay usaha 'effort' plus the circumfix per- ... -an.

2. A lot of Asian cars have un-Asian model names, but at least some Proton model names are exceptions. Until yesterday, the only Proton I had ever heard of was the Saga, but then I learned of the company's later models:

3. Until this morning I had forgotten about Asüna, a pseudo-foreign name used by General Motors in Canada. The umlaut has the same 'othering' function in the far more famous pseudo-foreign name Häagen-Dazs. I finally learned the origin of that name tonight:

Reuben Mattus invented the phrase "Häagen-Dazs" in a quest for a brand name that he claimed was Danish-sounding; however the company's pronunciation of the name ignores the letters "ä" and "z"; letters like "ä" or digraphs like "zs" don't exist in Danish, but the similar words "hagen" and "das(s)" that also correspond to the company's pronunciation of its name mean "the chin" and "outhouse/toilet", respectively, in Scandinavian languages, with "das(s)" being coarse slang derived from German. According to Mattus, it was a tribute to Denmark's exemplary treatment of its Jews during the Second World War, and included an outline map of Denmark on early labels. Mattus felt that Denmark was also known for its dairy products and had a positive image in the United States. His daughter Doris Hurley reported in the 1999 PBS documentary An Ice Cream Show that her father sat at the kitchen table for hours saying nonsensical words until he came up with a combination he liked. The reason he chose this method was so that the name would be unique and original.

4. Tonight I also learned about Häagen-Dazs' extinct sort-of-competitor Frusen Glädjé which has a near-Swedish name.

5. The 'foreignness' of Häagen-Dazs isn't as strong in Mandarin 哈根達斯 Hāgēn-Dásī. It's not possible to replicate the flavor of an umlaut or the digraph zs in Chinese characters. There is nothing unusual about the phonograms 哈根達斯.

6. Tonight I discovered that Wikipedia has a whole article about foreign branding.

LOL: "Au Bon Pain, a bakery cafe with a French name, was founded in Boston."

Superdry's use of pseudo-Japanese has long bugged me. Turns out Superdry is British!

I should have figured Pret a Manger was British too. I used to eat there when I lived in London.

(8.13.0:51: Pret turns out to have shops in France! I never saw them in Paris or Lyon.)

Roland is Japanese!?

The "Roland" name was selected for export purposes, as Kakehashi was interested in a name that was easy to pronounce for his worldwide target markets. The name was found in a telephone directory, and Kakehashi was satisfied with the simple two-syllable word and its soft consonants. The letter "R" was chosen because it was not used by many other music equipment companies, and would therefore stand out in trade show directories and industry listings. Kakehashi did not learn of the French epic poem The Song of Roland until later.

(Added quotation 8.13.0:53.)

WHITE RAT 6.22

? qulugh ai ? sair ? nyair

'white rat year, six month, twenty two day'

1. Today Kamala Devi Harris became the Democratic nominee for vice-president of the United States. Last week I wrote about Tamil, and by coincidence her mother Shyamala Gopalan is Tamil. The Tamil Wikipedia spells Harris' name in Tamil as

கமலா தேவி ஹாரிஸ்

<kamalā tēvi hāris·>

Tamil has no <d>.

I didn't expect Sanskrit devī 'goddess' to be borrowed into Tamil with a final short vowel [i]. Tamil ி <i> looks like Devanagari long ी <ī> but is short.

I also didn't expect English short [æ] in Harris to be borrowed into Tamil as long [aː].

Oddly Gopalan has no Tamil Wikipedia entry. The Malayalam Wikipedia spells her name as

ശ്യാമള ഗോപാലൻ

<śyāmaḷa gōpalan>

I didn't expect Sanskrit śyāmalā with a dental l and long feminine ā to be borrowed into Malayalam with ḷa.

Apparently the Tamil spelling of Shyamala Gopalan is

சியாமலா கோபாலன்

<ciyāmalā kōpālaṉ·>

judging from these entries.

Tamil has no initial clusters, <ś>, or <g>.

Why do Malayalam and Tamil add different nasals to Sanskrit go-pāla- 'cow-protector'?

Topics 2-7 are leftovers from yesterday. I wanted the entry on the late John Okell to stand alone without the usual date title.

2. What is the etymology of Sanskrit cārvāka-?

3. I was surprised that the English Wikipedia entry for Mysore didn't include the Kannada spelling



Is maisūru really from Sanskrit Mahiṣāsura? I wonder if it's a folk etymology.

4. Rama and Sita were siblings!? They were in some tellings of the Rāmāyaṇa.  I should read AK Ramanujan's "Three Hundred Ramayanas: Five Examples and Three Thoughts on Translation" (1987).

5. Maybe the most important word I encountered yesterday was Nahḍa with that most Arabic of sounds, the ḍād.

6. The Wikipedia article on Nahḍa mentioned Rifa`a al-Tahtawi's تخليص الابريز في تلخيص باريز Takhlīṣ al-ibrīz fī talkhīṣ Bārīz (1834). Why was 'Paris' borrowed with a final -z? The Arabic Wikipedia's article on Paris is titled باريس Bārīs with a final s. Is Bārīs a spelling-based borrowing or was it borrowed before Paris lost its final [s] in French?

7. Yesterday was the thirty-fifth anniversary of the release of the Japanese movie オーディーン 光子帆船スターライト Odin: Kōshi hansen Sutāraito (Odin: Photon Sailer Starlight, 1985). I never paid much attention to the English title until last night when I learned that sailer isn't a misspelling of sailor. Sailer and sailor are two spellings of the same earlier word that have become associated with different (albeit related) meanings.

8. Tonight I learned of Chamberlain's (2018) term Kri-Mol for Vietic from Wikipedia. I recognize Kri, but what is Mol?

The adopted term Kri-Mol, or Kri-Molic captures the earliest essential bifurcation between Mol-Toum (Cheut, Toum-Phong, and Việt-Mường) on the one hand, and Nrong-Theun (Mlengbrou, Kri-Phoong, Thémarou, Atel-Maleng, and Ahoe-Ahlao) on the other. Mol is an autonym used by the Mường, pronounced mɔl or mɔɯ. (Use of Mol   also eliminates confusion with the Tai speaking Mường in Nghê An.) (p. 9)

I would add that Mol, unlike the borrowing Mường from Tai, is presumably a native word. (Autonyms aren't necessarily native: e.g., Nihonjin 'Japanese person' contains no Japonic morphemes.)

I confess I never heard of the Toum language until now. It doesn't have a Wikipedia entry (yet).

And what are Nrong and Theun?

The term Nrong-Theun is derived from the names of rivers, the Theun being the main one. Nrong, a tributary of the Theun, is phonemically /ɲrɔːŋ/ (called the Nam Noy in Lao) and Theun is phonemically /thɤːn/. The Theun flows from south to north, the river name changing to Kading about two-thirds of the way before emptying into the Mekong.  'Theun' is the old French spelling and is retained as it is used universally on maps and in the literature. (p. 9)

I would be more eager to adopt this new term if only Chamberlain provided a justification for it based on shared innovations. What shared innovations characterize his two subgroups Mol-Toum and Nrong-Theun? The word innovation does not appear in his 175-page paper (more like a monograph).

If Chamberlain wishes to replace Vietic with Kri-Mol, why does he use the term Vieto-Katuic?

Kri-Mol = Vietic
West (Brou)
East (Katu, etc.)

(based on Chamberlain 2018: 12)

Why not Kri-Katuic? (Can you tell I'm fond of Kri?) And why not Nrong-Mol and Nrong-Katuic for consistency with Nrong-Theun? Is it a good idea to mix river names (Nrong) with ethnonyms (Mol) and/or language names (Kri is both an autonym and a language name) when naming language clades?

9. Normally Sino-Vietnamese refers to borrowings from Chinese in Vietnamese. Chamberlain (2018: 11) uses the term in a new way (at least for me):

Vietnamese is in reality Sino-Vietnamese (there is no non-Sino variety), originally a coastal creole, with huge numbers of Sinitic vocabulary, 70 percent of the lexicon according to Phan (2010), though with core vocabulary that is essentially Austroasiatic.

If Vietnamese is (was?) a creole, does it make sense to consider it a Kri-Mol language? If Haitian Creole is not a Romance language, then Vietnamese shouldn't be a Kri-Mol language. Yet Chamberlain (2018: 12) places it in his tree under Viet-Muong.

I wrote "was?" above because Chamberlain's phrase "originally a coastal creole" could be interpreted to mean 'originally a creole but no longer a creole' or 'originally coastal but no longer only coastal'.

10. Chamberlain (2018: 162) points out that

'butterfly' is not the best word for comparative phonological purposes as it tends to be subject to expressive and reduplicative forces in many languages. English butterfly and its playful twin flutterby is a good example.

I had never heard of flutterby.

What makes 'butterfly' less stable than other zoonyms? (I guessed zoonym was a real word, and it is!)

11. How have I never heard of Anahita before? I found out about her when looking for the Wikipedia article on Nahḍa (see topic 5).

12. I just learned that Greek Páris is unrelated to the name of the city of Paris which is of Gaulish origin.

RIP SAYA JOHN

John Okell passed away sometime between the night of August 2nd and the morning of August 3rd. I had no idea he was gone until just now.

I first met him in Thailand five years ago next month. I was a student in his introductory intensive Burmese course - the two greatest weeks in all my years of study of any subject. I never learned so much so fast. I then studied Burmese with him in London and in Burma. Here in Hawaii I have been using his books for the last year to attempt to retain what he taught.

No words of mine can describe the greatness of ဆရာ <charā> [sʰəja˩] 'teacher' John.  So I have linked to this Irrawaddy profile which I read shortly after meeting him for the first time and this obituary at Frontier Myanmar.

Thank you, Saya John. I could not have worked on Pyu without what I learned from you.

WHITE RAT 6.14

? qulugh ai ? sair par ? nyair

'white rat year, six month, ten four day'

1. Today I was surprised to learn that the Sogdian script had a variant of the letter shin (U+10F45 SOGDIAN INDEPENDENT SHIN) to transcribe Chinese 所 (which had an initial ʂ- in Middle Chinese; modern standard Mandarin s- is irregular).

2. The Sogdian letter ayin (U+10F12) is quite unlike the others in shape and has no descendant in the Old Uyghur line of scripts leading to the Mongolian and Manchu scripts. Where have I seen such a spiral character before? Khmer ៚ គោមូត្រ <gomūtra> [koːmuːt] 'cow urine' first came to mind, but it has a tail and isn't coiled enough (and in some fonts isn't coiled at all). I have seen spiral characters in other Indic scripts, but they too aren't as coiled as Sogdian ayin.
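The code points in topics 1 and 2 can be checked with Python's standard unicodedata module. This is only a rough sketch: describe is a hypothetical helper name of my own, and the names printed depend on the Unicode version bundled with the Python build (the Sogdian blocks are recent additions, so an older build may report no name at all).

```python
import unicodedata


def describe(code_point):
    """Return 'U+XXXX NAME' for a code point (hypothetical helper)."""
    ch = chr(code_point)
    # Fall back to a placeholder if this Python's Unicode data predates the character.
    name = unicodedata.name(ch, "<no name in this Unicode version>")
    return f"U+{code_point:04X} {name}"


# The two code points cited in topics 1 and 2.
for cp in (0x10F45, 0x10F12):
    print(describe(cp))
```

Whatever block name the output reports for each code point is the Unicode database's classification, which is worth comparing against the labels used in the text.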

3. Today I mailed my Hawaii primary election ballot which had Ilocano instructions for getting a translated version. Ilocano is the third most spoken home language in Hawaii after English and Tagalog if Pidgin is not counted. Wikipedia has an unsourced figure of 85% for Ilocanos in the Filipino population in Hawaii.

Today I learned the term Ilocandia for "the traditional homeland of the Ilocano people".

4. Today I learned that the 'Sea Peoples' are a modern classification for peoples which had Egyptian exonyms. For years I just assumed they were so mysterious that they didn't even have exonyms!

WHITE RAT 6.8

? qulugh ai ? sair nyêm nyair

'white rat year, six month, eight day'

1. When practicing Tangut today, I came across the character


5264 1mer4 'soldier'

with rare left and right-hand components.

The left side 𘩷 (Boxenhorn code wai) is also in

There is no obvious phonetic or semantic common denominator shared by the five characters with <wai>.

The right side (Boxenhorn code dar; I can't find it in Unicode) is only in one other character:


0271 2bi'4 (second syllable of 𗡢𗡠 0702 0271 1to'4 2bi'4 'to seek')

The rare component <dar> is incorrect in the Mojikyo font versions of 0271 and 5264. Mojikyo 0271 has the more common component 𘡭 <dao> (in 32 characters) instead of <dar>, and Mojikyo 5264 has <dar> with a slanted top stroke and without a right-hand diagonal stroke.

There is no obvious phonetic or semantic common denominator shared by the two characters with <dar>.

Do you think the graphic etymology in the Tangraphic Sea for 5264 will make any sense out of this? Let's find out tomorrow.

2. Last night I played episode 43 of 科学忍者隊ガッチャマンF Gatchaman F (1979-80) on its fortieth anniversary. The world of Gatchaman is a parallel Earth with different place names. I wonder if anyone has ever compiled all those names and even tried to put them on a map.

One such name that came up in episode 43 was ニュージョーク Nyūjōku, an obvious play on ニューヨーク Nyūyōku 'New York'. In the subtitles, Nyūjōku was rendered with an umlaut as New Jörk. Is the umlaut canonical, or was that just the subtitler's idea? Normally ö corresponds to Japanese e, not o: e.g., Röntgen became レントゲン Rentogen.

I seem to encounter these stand-in names more often in Japanese than in American fiction. I just heard a reference to the country of パキスター Pakisutā 'Pakistar' in episode 5 of 宇宙戦士バルディオス Space Warrior Baldios (1980-81) which first aired forty years ago today.

WHITE RAT 5.19

? qulugh ai tau sair par ish nyair

'white rat year, five month, ten nine day'

1. Last night I couldn't post on time because my battery was out of power and I couldn't recharge. That turns out to have been for the best since I was able to enlarge the post tonight.

What would the Tangut call a battery? I'm guessing they would borrow the Chinese word 電池 'lightning pond' for 'battery' (itself a borrowing from Japanese)  in one of three ways:

1. via direct phonetic borrowing from Mandarin (either standard diànchí or its local equivalent)

2. via conversion into 'Sino-Tangut': the conventional Tangutization of early 2nd millennium Xia Chinese: e.g.,


3666 1456 1then4 1chhi2

a phonetic approximation of Xia Chinese *3then4 'lightning' and *1chhi3 'pond'.

3. via a calque such as


3665 4707 1lhaq 2jen2 'lightning pond'

which contains the word for 'pond' I wrote about last night.

2. The word featured in this week's Star-Advertiser Japan section is リア充 riajū 'people leading a full life' which is in Windows 10's IME. It's in the English Wiktionary but not the Japanese Wiktionary. The word does, however, have its own Japanese Wikipedia article. The newspaper's definition which I give above doesn't make clear that 'full' means 'in real life'. riajū is an abbreviation of リアル riaru 'real (life)' and 充実 jūjitsu 'fullness'. riajū fits the frequent four-mora formula for Japanese abbreviations. (jū is one syllable but has two moras.)

WHITE RAT 5.18

? qulugh ai tau sair par nyêm nyair

'white rat year, five month, ten eight day'

I did something unprecedented. I did almost none of my language exercises on Sunday due to an emergency. And I did none on Monday and Tuesday because of my extracted tooth. I wasn't supposed to lie down after the surgery, and I handwrite lying down. I don't have a desk with a chair. So I slept sitting up for two nights in a row and neglected my languages. Today I did nearly four times the usual amount of exercises. I would do even more if I didn't have other things to do.

The Tangut exercises for today included part of the Tangut law code (3.4.2. punishment for salt crimes). What leapt out at me was character 4707


for 2jen2 'pool, pond'.

The Tangut script is supposed to be full of semantic compounds. In theory that should make the script easy to learn. All words in the same semantic field should be written with a common component. And the components of each character should play a part in a neat mnemonic 'story'. But that bears little resemblance to reality.

Here's the 'story' of 4707 according to the Tangraphic Sea:


4707 2jen2 'pool, pond' =

top of 4693 1na1 'deep' (i.e., the grapheme of unknown function which I call the 'horned hat': 𘡊) +

all of 5088 1chhwi3 'salt'

'Deep salt'? That's not what first comes to mind when I think of pools or ponds. Neither 1na1 nor 1chhwi3 sound like 2jen2, so 'deep' and 'salt' cannot be phonetic.

What surprises me even more is the absence of the semantic element 𘠣 'water' derived from Chinese 氵 'water'. Compare 4707 with the Chinese character for its Chinese equivalent, 池 <WATER.也>, a transparent semantophonetic compound. (也 is phonetic.)

Conversely, 'water' turns up in Tangut characters for morphemes that have no obvious or inherent connection to water: e.g.,

What is 'water' doing in those characters? It serves no obvious phonetic function, as those morphemes have no phonetic common denominator in Tangut. Those last two words are key.

7.9.22:23: In Old Chinese, 也 was *Cilajʔ, and 池 was *RIlaj (with *I = a higher series vowel other than *i: *u and/or *ə). But the two have diverged considerably in modern languages: e.g., in Mandarin, 也 is yě and 池 is chí. The different rhymes reflect different minor syllable vowels:

The Mandarin spellings above are in pinyin and are not phonetic: e.g., -o, -uo, and -wo are all [wo], but [wo] is spelled o after labials, uo after other consonants, and wo by itself.

WHITE RAT 5.17

? qulugh ai tau sair par ? nyair

'white rat year, five month, ten seven day'

1. Leftover from July 4th: Seeing only the English title of Dream of the Emperor led me to think that the Korean TV show was about a Chinese emperor or one of the two rulers of the short-lived Korean Empire, but in fact the Korean title is 대왕의 꿈 Taewang-ŭi kkum 'Dream of the Great King' - specifically 武烈王 King Muyŏl of Shilla (r. 654-661). Wikipedia's Muyŏl article translates the show title as The King's Dream.

2. Yesterday I finally learned what oncology was. And I found its translation equivalents using Wikipedia's left-hand menu:

Today I learned the Thai equivalent is วิทยามะเร็ง <vidyāmaḥrĕṅa> wítthayaamareng 'study [of] cancer'. I'm guessing มะเร็ง mareng is a loan from Khmer ម្រេញ <mreña>  mrɨɲ 'cancer'. (-ɲ is not a possible Thai coda.)

3. Today I learned sofa is a borrowing from Arabic صفة‎ ṣuffa 'long seat made of stone or brick' - but not 'sofa'! Wiktionary lists five distinct Arabic words for 'sofa' (the last is Iraqi):

4. Another English furniture word of Arabic origin is mattress.

5. Wiktionary transliterates the Middle Persian ancestors of dīwan and takht as <dywʾn'> and <tʾht'>. Mackenzie (1971: xiv) calls <'> an "otiose stroke". Is <'> truly superfluous like an extra dot in some Chinese character variants?

6. Inscriptional Parthian numbers remind me of how I used to avoid writing certain numbers when I was very young: e.g., '5' is 𐭻𐭸   <4 1> (written from right to left). But the difference is that Inscriptional Parthian had no unique symbol <5> whereas I may not have wanted to write 5. (I'm not certain 5 was on my list of taboo symbols.)
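To make the <4 1> pattern concrete, here is a minimal Python sketch of how a units value could be spelled using only symbols for 1 through 4. The function name decompose_units and the largest-symbol-first ordering are my own assumptions for illustration; only the <4 1> spelling of '5' comes from the example above, and attested inscriptions may group the strokes differently.

```python
# Values of the unit symbols assumed to be available (no unique symbol for 5-9).
UNIT_SYMBOLS = (4, 3, 2, 1)


def decompose_units(n):
    """Spell a units value (1-9) additively, largest symbol first (an assumption)."""
    if not 1 <= n <= 9:
        raise ValueError("expected a units value from 1 to 9")
    parts = []
    remaining = n
    for value in UNIT_SYMBOLS:
        # Use each symbol as many times as it fits before moving to a smaller one.
        while remaining >= value:
            parts.append(value)
            remaining -= value
    return parts


print(decompose_units(5))  # [4, 1], matching <4 1> in logical order
```

The list is in logical (reading) order; on the page the symbols would run from right to left.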

7. How did Proto-Iranian Hwah- (ʔwah-?) 'dwell' become Middle Persian gyāg 'place'? I've never seen the sound change Hw- > gy- before.

8. I wonder what it was like to be a Nanjing dialect enthusiast from the West watching the rise of the Beijing dialect. I can imagine after reading what Gabelentz wrote in 1881:

Only in recent times has the northern dialect, pek-kuān-hoá ['northern officer speech'], in the form [spoken] in the capital, kīng-hoá ['capital speech'], begun to strive for general acceptance, and the struggle seems to be decided in its favor. It is preferred by the officials and studied by the European diplomats. Scholarship must not follow this practise. The Peking dialect is phonetically the poorest of all dialects and therefore has the most homophones. This is why it is most unsuitable for scientific purposes.

9. Gabelentz would have been sad to see the Beijing-based standard taught worldwide. Conversely, it is not easy to find modern Nanjing forms despite the prestige of Nanjing in the past. Xiaoxuetang does not list Nanjing forms for 南 'south' and 京 'capital', the two morphemes that make up the name Nanjing. The English Wikipedia's article on the Nanjing dialect doesn't even sketch the phonology or give a single example word, much less a sentence. Fortunately that article does link to a couple of resources on the Nanjing dialect:

WHITE RAT 5.16

? qulugh ai tau sair par ? nyair

'white rat year, five month, ten six day'

Today I had my tooth extracted. Before my appointment I looked for cognates of Tangut 𘟗 0039 2korn1 'tooth' using STEDT's 'root canal' tool which was particularly fitting (because the tooth I lost had just undergone a root canal). STEDT derives the Tangut word from Proto-Tibeto-Burman *k(w/y)aŋ 'tusk/molar'.

Even if Proto-Tibeto-Burman (in the sense of an ancestor of all non-Chinese Sino-Tibetan languages) were valid, that etymology seems unlikely given my interpretation of Jacques' (2014) sound changes in Tangut:

Potential examples

(There is no Tangut syllable 2kor1 which would have developed from pre-Tangut *Rkaŋh.)

The nasal vowel of Tangut 2korn1 (pronounced something like [kõʳ]) points to an earlier *-m rather than an earlier *-ŋ.

Perhaps the true cognates of Tangut 2korn1 are those which STEDT derives from Proto-Tibeto-Burman *gam 'jaw, chin, molar'. A couple of forms of interest at STEDT:

'eastern rGyalrong' tə swa kam 'tooth (incisor)' (Sun Hongkai 1991)

'rGyalrong' tə swa rgu 'molar' (Dai 1989)

The language labels are unfortunately not very specific.

kam looks like the pre-Tangut form, particularly if the pre-Tangut vowel was *a (*RkamH).

rgu has an r- reminiscent of the *R- of the pre-Tangut form, though I am not certain -gu is cognate to pre-Tangut *-kVmH.

The swa in both rGyalrong forms is cognate to Tangut 𘘄 0169 1shwi3 'tooth'. Tangut -i is from pre-Tangut *a. Did pre-Tangut *s- palatalize before *i: *swa > *swi > shwi? That can't account for cases of s which did not palatalize before i: e.g.,


are all read 1si4, not 1shi3. (Initial s- is associated with Grade IV and initial sh- with Grade III, so 1si3 and 1shi4 do not exist.)

The sequence of the s-k-roots for teeth in both rGyalrong forms is identical in the Tangut collocation 𘘄𘟗 1shwi3 2korn1 'teeth' in Timely Pearl 183.

Although I don't think there was a 'Proto-Tibeto-Burman' branch of Sino-Tibetan, I still find STEDT's proposed cognate sets useful.

WHITE RAT 5.15

? qulugh ai tau sair par tau nyair

'white rat year, five month, ten five day'

I've long assumed that the dav·ḥ /daʍ/ (dav·ṃḥ /ðaʍ/ with initial lenition) of Pyu

tar· dav·ḥ ~ tar· dav·ṃḥ ~ tdav·ṃḥ ~ tdaṃḥ¹ 'king'

might be cognate to Old Chinese 主 *CItoʔ 'master'. dav·ḥ can occur without tar·: e.g., yaṁ dav·ḥ 'this ?' (12.3).

Today it occurred to me that if dav·ḥ in Pyu 'king' is a noun like 'master', then tar· dav·ḥ 'king' is a noun-noun compound '?-lord', and tar· in other contexts might be that mystery noun '?'.

7.7.0:31: Some examples of tar· without a following dav·ḥ ~ dav·ṃḥ:

¹7.6.12:56: In theory, a disyllabic form †ta daṃḥ could appear in texts in the abbreviated style (i.e., the script without subscripts), but so far the disyllabic form is only found in texts in the full style with subscripts.

WHITE RAT 4.4

? qulugh ai ? sair ? nyair

'white rat year, four month, four day'

Fourth month, fourth day, four topics - all from today for once. I hope to revisit my backlog later.

1. Let's play Spot the Hanja!

2. Sino-Korean homophones. The story involving the confusion of 防水 <PROTECT WATER> pangsu 'waterproof' and 放水 <RELEASE WATER> pangsu 'drain' has been disputed.

3. I haven't had furigana fun on this blog in a while. In 光文社 Kōbunsha's short-lived Japanese translation of the American comic book Fantastic Four, Dr. Doom is called 破滅博士 which looks like it should be read Hametsu Hakase 'Dr. Destruction' but has the furigana ドクター・ドゥーム Dokutā Dūmu.

That blog gives me the impression that Dr. Doom lives in a country called 幸福王国 which looks like it should be read Kōfuku ōkoku 'Happiness Kingdom' but has the furigana ラトベリア Ratoberia 'Latveria'. But without seeing a scan of the name in the comic, I can't be sure.

4. Today I learned about לוף‎ <lwp> Luf 'Loof', an extinct kosher version of SPAM. Is Loof really derived from (meat)loaf as Wikipedia says? Although Loof is apparently not being produced anymore, the name might live on as a generic word for canned beef, as it is in this list of IDF terms. That list addresses something I've long wondered about: what is it like for an overseas volunteer to join the IDF and learn Hebrew? (4.27.1:24: This gives me a bit of an idea.)

4.27.0:51: Ghil'ad Zuckermann on Luf:

Meatloaf (pronounced in Israeli luf rather than lof) is what we were forced to eat in the army when there was no kitchen around…

WHITE RAT 2.11

? qulugh ai ? sair par ? nyair 

'white rat year, two month, ten one day'


(Back to Part III)

The fourth Tangut era with a known Tangut-language name is


0510 2342 5243 0140  1ngwyr1 1lo3 2se4 2lher1 'heaven good.fortune people joy' (1090.2.3-1098.2.3) = 'heaven['s] good fortune [and] people['s] joy'

corresponding to Xia Chinese 天祐民安 *1then4 3u3 1min4 1an1 'heaven help people peace'.

'Heaven' and 'people' are shared by both the Chinese and Tangut names, but the rest doesn't match. Such mismatches are common in Chinese and Khitan-language era names for the Khitan Empire next door.

If 1ngwyr1 1lo3 2se4 2lher1 were the only known instance of 1lo3 and 2lher1, it would be reasonable to guess that they meant 'help' and 'peace' on the basis of the Chinese name, but other contexts that indicate otherwise have also survived.

2. I just started following James (@jwa_khitan) on Twitter. Three threads:

2a. Khitanology 101.

2b. A new proposal on the origin of the Khitan large script:

I believe the Khitan large script may have its origins not in the Chinese clerical script, as the Liao histories say, but instead in the Chinese cursive and running scripts.

2c. What is the "N4631" that I refer to from time to time?


I thought two Korean words pronounced 철 chhŏl sound like possible Chinese loans, and Martin et al. (1967: 1593) independently entertained that possibility over a half century earlier.

3a. 철 chhŏl < earlier chhyŏl 'season' : cf. Sino-Korean  節 chŏl < chyŏl < *tser 'id.'

The trouble is the aspiration which is not in Sino-Korean or Chinese itself. The word may be compressed from an unrelated disyllabic native word like *hʌtser or *tsʌher.

3b. 철 chhŏl (no premodern attestations?) 'discretion': cf. Sino-Korean 哲 chhŏl 'wise'

I can't see why this couldn't be from 哲.

I didn't initially understand why Martin et al. propose 節 'season' as an alternate possible Chinese source of  Korean 'discretion'. 節 has many meanings in Chinese. Maybe 'restraint' is the relevant one.

4. What is the etymology of Qom (which has been in the news because of COVID-19)? The Q- makes me think it's not originally Persian.

5. Today I saw Manchu faššaha 'exerted' in Roth Li (2010: 87). There are only a few Manchu roots with -šš-:

I wonder what the history of that rare geminate is.

6. I lived in the UK for four years but had never heard of "home education".

7. I initially thought Fatma- in Fatmawadi was from Fatima, but I dismissed the idea because I couldn't think of an Indonesian-internal reason to drop the -i-. But David Boxenhorn made me reconsider the idea. I now think Indonesian borrowed this disyllabic variant:

The colloquial Arabic pronunciation of the name in some dialects (e.g., Syrian and Egyptian) often omits the unstressed second syllable and renders it as Fatma when romanized.

Did that variant already exist in the speech of the Arab traders who brought Islam to Nusantara?

8. Today I learned of tourmaline, whose English name apparently originates from Sinhalese. In Chinese, Japanese, and Korean it is the 電氣石 'electric stone', presumably

because it could attract and then repel hot ashes due to its pyroelectric properties.

The Vietnamese Wikipedia calls it tourmalin without the final -e of French and English tourmaline, perhaps to avoid it being pronounced. Why not Vietnamize it further as turmalin (to avoid un-Vietnamese ou) or even something like tunmalin (to avoid un-Vietnamese syllable-final -r)?

WHITE RAT 2.10

? qulugh ai ? sair par nyair 

'white rat year, two month, ten day'


(Back to Part II)

The third Tangut era with a known Tangut-language name is


0510 2865 1910 2135 1ngwyr1 1du2 2tenq4 1e'4 'heaven peace ceremony hold' (1085.12.20-1086.9.10)

corresponding to Xia Chinese 天安禮定 *1then4 1an1 2li4 3ten4  'heavenly peace [and] ceremonial settlement'.

1ngwyr1 1du2 could either be a noun compound 'peace of heaven' or a noun-adjective phrase 'peaceful heaven'.

2tenq4 1e'4 is an object-verb phrase 'holding ceremony'. 1e'4 is not an exact equivalent of Chinese 定 'settle, become/make fixed', but it is close if one thinks of 'holding' as 'holding in place'. (3.19.18:42: 1e'4 is not 'hold' in the sense of 'hold a ceremony'.)

2. Rubi in Japanese are almost always hiragana appended to kanji, but there are rare creative exceptions:

2a. Page 70 of volume III of 永野護 Nagano Mamoru's The Five Star Stories has フォーチュン fōchun 'fortune' as rubi for 希望 kibō 'hope'. The official English translation simply has "hope". fōchun may not merely be 'fortune'; it may also be a reference to the green planet Fortune scheduled to appear over four thousand years later (the story is epic in scale).

2b. Page 82 of volume III of The Five Star Stories has 同調機 <SAME TONE MACHINE> dōchōki (a neologism?) as rubi for シーケンサー shīkensā 'sequencer'. The official English translation simply has "sequencer". I assume a sequencer is some sort of gadget in the giant robots in the series. (None of these real-life shīkensā seem to be relevant.)

2c. Page 147 of volume III of The Five Star Stories has シックス shikkusu 'sixth' as rubi for VI世 rokusei 'the sixth' (in names of royalty) in the name コーラスVI世 Kōrasu Shikkusu. I expected the official English translation to have "Colus VI" or "Colus the Sixth", but it has "the sixth heir to the throne of the Colus dynasty". The color page introducing the character in the English edition has "Colus VI".

3-5 are finds from last night:

3. Jesse P. Gates' 2020 documentation of "Ghost's bride", a text in Stau, possibly one of the closer living relatives of Tangut. The very first Stau word in the story, ʁnæ 'long ago' has a potential Tangut cognate 𗂥 1926 2ne4 < *CInejH or *CInaŋH 'in past times'. Could pre-Tangut *C- have been a uvular like Stau ʁ-? The front vowel of Stau ʁnæ makes me think pre-Tangut *CInejH with a front vowel is more likely than *CInaŋH with a nonfront vowel, but on the other hand, pre-Tangut *CInaŋH is closer to Old Chinese 曩 *naŋʔ 'in past times'. Stau as recorded by Gates does not have either -ŋ or -j (and the three codas I found in his text are low in frequency: -n, -r, -v). The history of Stau has yet to be worked out as far as I know, so I don't know whether ʁnæ had a coda, much less which coda it might have had.

4. Andreas and Yadi Hölzl's "A wedding ceremony of the Kyakala in China: Language and ritual" (2019) is about "the only extant text" of a "seemingly extinct" Jurchenic language preserving features lost in Manchu: unpalatalized dental stops and [p] in the perfective converb. (Ming Jurchen had unpalatalized dental stops but had shifted Jin Jurchen p to f, so its perfective converb was presumably *-fi as in Manchu. Unfortunately, little of Ming Jurchen verbal morphology has been documented.)

I can't get over how Kyakala survived into the last century and then presumably disappeared. How many other languages recently vanished in China without a trace?

5. Also by Andreas and Yadi Hölzl: "The endangered languages of the Manchus" (2019). Note that "languages" is plural! The big surprise for me was the Lu language of the Manchus of ... Guizhou!? Does Lu still exist?

6. I was oblivious to the French name of アフランシ・シャア Afuranshi Shaa in 富野由悠季 Tomino Yoshiyuki's serial novel ガイア・ギア Gaia Gia (Gaia Gear) (1987-1991) until last night. アフランシ Afuranshi is from French affranchi 'freed' (masc. sg. past participle of affranchir).

7. 富野由悠季 Tomino Yoshiyuki's name is a built-in option in Windows 10's IME. Typing in Japanese is so tedious that anything that saves me the effort of typing two or three kanji (in this case 由悠季) helps.

8. I've thought of Manchu -ha/-he/-ho as a perfective suffix, but it also turns up in bihe with bi 'be'. Russian быть byt' 'be' is imperfective and has no perfective counterpart: i.e., no equivalent of bihe (if bihe is perfective). Maybe Russian is shackling my imagination, but I can't imagine how Manchu bi 'be' could be perfective. Being isn't an action and can't be completed.

9. Results of the tenth 創作漢字コンテスト 'kanji creation contest' (via Bitxəšï-史).

WHITE RAT 2.9

? qulugh ai ? sair ish nyair 

'white rat year, two month, nine day'


(Back to Part I)

The second Tangut era with a known Tangut-language name is


4457 0139 2leq3 2ne1 'great peace' (1075.1.20-1085.12.19)

which is a straightforward equivalent of the Chinese-language Tangut era name *3the1 1an1 'great peace'.

Tangut adjectives normally follow nouns, but 2leq3 'great' precedes them.

2ne1 'peace' sounds like Xia Chinese¹ *1ne4 'peace', but the mismatch of grades (indicated by final numbers) leads me to think the words are unrelated soundalikes.

¹20.3.3.14:08: My replacement for the awkward term Tangut period northwestern Chinese (TPNWC) that I've used for many years. Coined by analogy with Liao and Jin Chinese. I object to using Chinese names for non-Chinese empires, but I have no problem with using those Chinese names for the varieties of Chinese spoken within those empires.

2. Page 54 of volume III of 永野護 Nagano Mamoru's The Five Star Stories has something I've never seen before: French rubi for Japanese. Ne me blâmez is to the right of 私を責 'I ACC bla ...' and pas is to the right of 下 'plea ...' of the phrase 私を責めないで下さい watashi wo semenai de kudasai 'please don't blame me'.

3.9.13:00: One might get the wrong idea from the placement of the rubi that 私を責 'I ACC bla ...' means Ne me blâmez (though the negation really corresponds to -ない without rubi) and that 下 'plea ...' means pas.

3. So true:

Learning German from native speakers is great, if all you want to know is how to say this phrase or that phrase. They are also great for pronunciation help. But if you want to know why something is the way it is, a native speaker is the last person you should ask. They won’t understand why you struggle in certain areas. Herr Antrim does, because he learned German just like you are.

I studied German with both native and nonnative speakers. I can't say I detected any difference. Of course learners may not be the best judges. I can at least say that I never thought 'I wish I were being taught by a native speaker instead'.

I will also say that nonnative speakers are not necessarily able to empathize with learners either. Many (most?) nonnative speakers who become language teachers probably have a high aptitude for languages. They 'get' things that elude the rest of us. If something is incredibly obvious to you, you may not understand why it isn't equally obvious to others.

4. Via the Wikipedia article on Heavy Metal L.Gaim (1984-85), the ancestor of The Five Star Stories: a useful term 貴種流離譚 kishu ryūritan 'noble-kind flow-away-story' for stories in which a noble hero wanders a foreign land. (My initial guess was that it referred to wandering heroes who didn't initially know they were noble, but that was too specific though it does fit L.Gaim.)

5. Last night I found George van Driem's review of Thurgood and LaPolla's The Sino-Tibetan Languages (2nd edition, 2017) which is especially hard on the editors. I only have the old 2003 edition, so I can't comment on the new one.

6. Today I learned from Wikipedia that the misspelling in Guyver: Out of Standardrized (1986) is intentional. The movie is so 'out of standardized' that the standard spelling of standardized doesn't suit it.

7. Why don't French past participles always agree? Compare (examples from here):

elles [f. pl.] sont parties [f. pl.].

'they have left' (subject agreement)

les filles [f. pl.] ont acheté [m. sg.!] des cadeaux

'the girls bought some presents' (no agreement)

Voici les cadeaux [m. pl.] que les filles ont achetés [m. pl.].

'Here are the presents that the girls have bought.' (object agreement)

I imagine at some point *ont achetées (f. pl.) existed. When did it disappear? Are invariant, originally masculine singular forms like parti and acheté in the future of French?

8. There should be a word for something you took for granted for a long time but didn't realize was unusual until now. The reading of the Japanese surname 太宰 Dazai¹ falls into that category. 太 is normally read as tai or ta in Japanese, not da-. And 宰 is normally read as sai in Japanese, not zai. So 太宰 as a common noun 'great minister' in a premodern Chinese context is read as taisai.

The voicing of d- and z- in Dazai does not reflect Chinese *voiced initials because neither 太 nor 宰 had voiced initials in Chinese. There was never any nasal-final rhyme in 太, so the z- of zai is not due to a preceding *nasal or *nasal vowel. The -z- of 宰 must be rendaku, but the d- of 太 is harder to explain except perhaps by analogy with 大 dai 'great'.

¹The most famous Dazai is 太宰治 Dazai Osamu. The Dazai who inspired this post was 太宰博士 Dr. Dazai from High Speed Task Force Turboranger (1989-1990) which ended thirty years ago last week.

9. I didn't know the superhero duo バイクロッサー Baikurossā 'Byclosser' which turned thirty-five this year had baiku 'bike' in their name until I read this part of their Wikipedia entry tonight. Since 1985 I had thought of the name as 'bi-crosser', and that was intended, but there was another layer I had never detected.

WHITE RAT 2.8

? qulugh ai ? sair nyêm nyair 

'white rat year, two month, eight day'

1. TANGUT ERA NAMES I: I've long wanted a list of era names in Tangut. Andrew West has compiled them and much more in these files:

More on Tangut and other calendars on Andrew's site.

The earliest Tangut-language era names known to Andrew are


2544 2342 2shen3 1lo3 'sage['s] good fortune'

or 'fortunate sage' if I follow Li Fanwen (2008: 387) and interpret 2342 as an adjective


or 2748 2135 2chha3 1e'4 'virtue hold' = 'holding virtue'

which loosely correspond to halves of the longer Chinese-language era name 福聖承道 *4fu3 3shen3 1shin3 3thaw1 'Fortunate Sages Receive the Way' (1053.1.23-1057.2.6) used by the Tangut Empire.

2544 and 2342 are standard Tangut equivalents of Chinese 福 'good fortune' and 聖 'sage', but 2748 and 2135 are equivalent to Chinese 德 'virtue' and 持 'hold', not 道 'way' and 承 'receive'.

2. Li Fanwen (2008: 353) defines 2135 in


0020 2135 3818 'way ? -er' = 'Daoist' = Chinese 道士 'way person'

as 士 'person', but I think it retains its most common meaning 'hold' in that context, so 'Daoist' in Tangut is literally 'way holder'.

3. KANJI IN SPACE (and not in space because Earthlings took sinography to the stars - the setting of this epic is one of those alien realms populated by human lookalikes, not Terran colonists):

天照家を中心に漢字を使用している場面が複数見られるが、現在のジョーカー太陽星団では漢字は一般的に使用されておらず、ほとんど模様として認識されている。ただし漢字文化自体は現存しており、一部のキャラクターの名前には漢字表記が存在し、古文という形で学校の授業科目の一つにもなっている。 [...]


Several scenes in which kanji are used, mostly by the Amaterasu family, may be seen, but kanji are generally not used in the Joker Star Cluster at present, and are generally perceived as patterns. However, kanji culture itself does currently exist [in the FSS universe]. Some character names have kanji spellings [e.g., the protagonist is 天照帝 <HEAVEN SHINE EMPEROR> Amaterasu no mikado], and kanji is a school subject in the form of ancient writing. [...]

Kana also appear in the story.

- Wikipedia on ファイブスター物語 Faibu sutā monogatari (The Five Star Stories)

4. The FSS Wikipedia entry led me to an entry on インチキ外国語 inchiki gaikokugo 'phony foreign languages'. Which is a mouthful, so I coined pseudoxeno. The corresponding English article is about gibberish, which isn't the same thing: "speech that is (or appears to be) nonsense". Pseudoxeno is meant to sound foreign.

The article includes foreign-sounding monster names. One that just turned forty last week is コゴエンスキー Kogoensukī 'Freezensky' from Masked Rider (1979-1980), a blend of Japanese 凍え- kogoe- 'freeze' and Russian -nskij. A creature as cold as シベリア Shiberia.

WHITE RAT 2.7

? qulugh ai ? sair ? nyair 

'white rat year, two month, seven day'

1. THE UYGHUR QUESTION: Dunnell (1996: 54) wrote:

[...] Uighurs constituted diverse groups who played important but ill-documented roles in both Liao [= Khitan Empire] and Xia [= Tangut Empire] state formation. Were they refugees from Central Asian Muslim rulers or ambitious emigrants from Shazhou (Dunhuang) anxious for employment? Or were they local Alashan (i.e., Helan shan) Uighurs? Different groups of Uighurs moved through the region in the tenth through twelfth centuries, some entering the Liao elite and some the Xia elite [...] and many maintaining traditional trading connections with Tibet, Qinghai, and Central Asia.

Are the Uyghurs in the Tangut Empire better understood now, twenty-four years later?

I have never seen anyone address the possibility of Uyghur influence on Tangut, apart from Kwanten's (1989: 18) suggestion that the Tangut may have used the Uyghur script before using their own script in 1036. (I am not counting Kwanten's hypothesis that the Tangut script represents an 'Altaic'-type [but not specifically Uyghur] language.)

Yesterday I was wondering if Tangut vocalism could have been under Turkic influence: e.g., it could have had front rounded vowels (which are unlikely to have been in Proto-Sino-Tibetan). Tangut certainly could not have shared a vowel inventory with Uyghur, as it had an enormous number of vowels with distinctions without Turkic parallels: nasality, tension, retroflex, and the mysterious quality that I indicate with an apostrophe (a substitute for a prime symbol). Nonetheless perhaps at some earlier stage Tangut could have had a simpler Turkic-like vowel inventory.

The Tangut imperial family claimed descent from the Tuoba Wei [a Serbi dynasty], and "other powerful Tangut clans [...] could also claim Xianbei [= Serbi] descent" (Dunnell 1996: 45). That brings up the possibility of para-Mongolic (i.e., Serbi) influence on Tangut: e.g., height harmony (which I have suspected to have driven the development of the grade system).

Any Turkic or para-Mongolic - in a word, 'Altaic'¹ - influence on Tangut was certainly not morphological or syntactic, as Tangut has no 'Altaic'-type morphology, and its word order is typically Tibeto-Burman (i.e., Sino-Tibetan minus Chinese). Yes, Tangut has final verbs like 'Altaic', but so does, say, Pyu which had zero contact with 'Altaic' languages. What Tangut does not have is 'Altaic' syntactic features absent from Tibeto-Burman: e.g., consistent modifier-modified order.

¹I use the term 'Altaic' to indicate a linguistic area, not a language family.

2. Wikipedia:

Atypically even among the country's small educated elite, Sukarno was fluent in several languages. In addition to the Javanese language of his childhood, he was a master of Sundanese, Balinese and Indonesian, and was especially strong in Dutch. He was also quite comfortable in German, English, French, Arabic, and Japanese, all of which were taught at his HBS [Hogere Burgerschool].

Was Japanese really taught at a Hogere Burgerschool in the Dutch East Indies in the late 1910s? Was Japanese taught at any high school as a foreign language outside the Japanese Empire a century ago?

As for Arabic, I imagine it was taught as a component of his religious studies rather than as a language for active use.

3. What is Fatma- in the name of Sukarno's third wife Fatmawati? It looks like Sanskrit padma- 'lotus' with p- changed to f- to Arabize it (standard Arabic has no p) and -d changed to -t to conform to Indonesian phonotactics which do not permit syllable-final voiced stops. (I assume the d in Indonesian padma 'lotus' represents [t].) -wati is from Sanskrit -vatī 'having' (feminine nominative singular).

4. Is this serious?

Therefore, a new possibility arises that the origin of Uralic languages (and perhaps also of the Yukaghir languages) may be Liao River region.

5. A reminder that 'genetic' relationships of languages are not really genetic:

N-M178* has higher average frequency in Northern Europe than in Siberia, reaching frequencies of approximately 60% among Finns and approximately 40% among Latvians, Lithuanians & 35% among Estonians (Derenko 2007 and Lappalainen 2008).

Finnish and Estonian are Uralic, but Latvian and Lithuanian are Indo-European. I would guess that N-M178 is uncommon or absent from Hungarians even though Hungarian is Uralic.

6. Writing about the pan-East Asian word 比較 'comparison' which does not seem to exist in Vietnamese made me wonder how Vietnamese so [ʂɔ] 'compare' was written in nom. has 14 different spellings with phonetics:

variant of 芻 NomNaTongLight.ttf U+F125C
扌 'hand'
'hand' indicates a verb
走 'run'
why 'run'?
𨎆 車 'chariot'
車 shared with 較 'compare'
扌 'hand'
區 < 樞 xu
'hand' indicates a verb
区 < 枢 = 樞 xu anachronistic?; 'hand' indicates a verb
扌 'hand'

lự 'hand' indicates a verb


𢫘 卢 = 盧
車 'chariot'

anachronistic?; 車 shared with 較 'compare'; does D2 轤 exist?
𨏧 車 shared with 較 'compare'
⿰卢車 卢 = 盧 anachronistic?; 車 shared with 較 'compare'
⿰口初 NomNaTongLight.ttf U+F129D

Vietnamese s- is from *Cr-. Sometimes nom spellings created before *Cr- > s- can point to what *C was. Unfortunately none of the spellings seem to be helpful unless *C- is *s-.

Normally phonetics with unrounded vowels like ơ and ư are not used to write Vietnamese syllables with rounded vowels. C1 攄 may be a graphic error for D1 攎. I cannot explain E1 ⿰口初.

Some forms look like modern PRC-style simplified characters and may be anachronistic, though maybe they did exist in premodern Vietnam.

7. MANCHU BEFORE MANCHU?: This image of a "Manchu" couple dates from c. 1590, 46 years before the Manchu adopted the autonym Manju 'Manchu'.

What does "Tohany" in the caption mean?

More images from the Boxer Codex here.

8. Really?

According to Leiden scholars, another possibility is derivation [of Greek 'hundred'] from Proto-Indo-European *h₁ḱm̥tóm, which is a regular simplification of *dḱm̥tóm in their theory.

What are other examples of Proto-Indo-European *d becoming *h₁ [ʔ] in initial clusters?

9. Tonight's 48 Hours pronounces Frunză [ˈfrunzə] in Moldova as [ˈfɹʌnzə] by analogy with English run [ɹʌn]. Sigh.

10. I use this English circumposition all the time even though I never heard of the term circumposition until I read about Pashto grammar tonight. The Wikipedia Pashto grammar article uses the term ambiposition.

WHITE RAT 2.6

? qulugh ai ? sair ? nyair 

'white rat year, two month, six day'

1. I never tire of posting this list of variants of <SIX> in the Khitan large script that Andrew West compiled (and includes in his font that you see here):

How many other Khitan large script variants remain undiscovered? How many platonic characters (platographs?) does the Khitan large script have? Certainly fewer than the 2,000+ different forms that have been found so far. I've been guessing that the platonic inventory is no more than 1,000 characters.

Today I read that Konstantin Pozdniakov reduced Barthel's inventory of 600 rongorongo glyphs down to about 78 with 52 accounting for 99.7% of the corpus.


The other 0.3% were made up of two dozen glyphs with limited distribution, many of them hapax legomena. This analysis excluded the Santiago Staff, which contained another three or four frequent glyphs. [My figure of ~78 is from 52 + "two dozen" + "three or four".]


As Pozdniakov readily admits, his analysis is highly sensitive to the accuracy of the glyph inventory. Since he has not published the details of how he established this inventory, it is not possible for others to verify his work.

2. Wikipedia also says:

However, Sproat (2007) believes that the results from the frequency distributions are nothing more than an effect of Zipf's Law, and furthermore that neither rongorongo nor the old texts were representative of the Rapanui language, so that a comparison between them is unlikely to be enlightening.

I've known of Zipf's law for a long time, but I didn't know what it exactly was until now:

Zipf's law was originally formulated in terms of quantitative linguistics, stating that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc.: the rank-frequency distribution is an inverse relation.

If I understand that correctly, it predicts that the top three words in a list should occur in a 6 : 3 : 2 ratio. For every six occurrences of the most frequent word, there should be about three occurrences of the second most frequent word and about two occurrences of the third most frequent word. (Let's not worry about defining 'word'.)
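The 6 : 3 : 2 figure can be checked with a short Python sketch. It assumes an idealized Zipf distribution in which frequency is exactly proportional to 1/rank. The function names zipf_ratio and scale_to_six are made up for illustration, and the sample counts in the last line are invented, not taken from any real frequency list.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def zipf_ratio(n):
    """Smallest whole-number ratio for ranks 1..n under an ideal Zipf law."""
    # Ideal relative frequencies: 1, 1/2, 1/3, ...
    weights = [Fraction(1, rank) for rank in range(1, n + 1)]
    # Multiply by the least common multiple of the denominators
    # to turn the fractions into whole numbers.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (w.denominator for w in weights))
    return [int(w * scale) for w in weights]


def scale_to_six(counts):
    """Rescale observed counts so that the most frequent item equals 6."""
    top = counts[0]
    return [6 * c / top for c in counts]


print(zipf_ratio(3))                   # [6, 3, 2]
print(scale_to_six([600, 300, 200]))   # invented counts that happen to fit
```

Rescaling a real list with scale_to_six makes it easy to see at a glance how far its second and third values fall from 3 and 2.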

Here are ratios from three lists I've looked at - I've given the most common word/component a value of 6 to facilitate comparison with the 6 : 3 : 2 ratio above:

3. Before seeing the frequency list above, I had never heard of Multatuli's Max Havelaar, "the book that killed colonialism" in the words of Pramoedya Ananta Toer.

In the last chapter the author announces that he will translate the book "into the few languages I know, and into the many languages I can learn."

... which fits the multilingual reputation of the Dutch.

Incredibly Max Havelaar "was not translated into Indonesian until 1972" - long after independence! And a 1976 film adaptation "was not allowed to be shown in Indonesia until 1987." I wonder why.

4. I don't know how I managed to not see this photo of the Dornogovi inscription (1058) in the Khitan large script on Wikipedia until last night. It's been up since.

Dornogovi (ᠳᠣᠷᠤᠨᠠᠭᠣᠪᠢ <TUrUnaghUbi> in the traditional Mongolian script) is 'East Gobi'.

5. In White Rat 2.2 I forgot to mention that the Albanian version of Persian khwāja 'master' is hoxha [ˈhɔdʒa]. When I first had to memorize Enver Hoxha's name for school in 1984, I must have mispronounced it as [haksha].

6. Words I discovered via Enver Hoxha's Wikipedia entry:  Albanian gjakmarrja 'blood feud' and hakmarrja 'revenge' (banned during his early years in power) - and Serbian крвна освета 'blood feud'.

7. Last night I learned that the Defense Language Institute Foreign Language Center regards Pashto as a Category IV language: i.e., super-hard for English speakers like Arabic, Mandarin, Japanese, and Korean. What little I've seen of Pashto intimidates me in a way that Persian and Hindi do not. So I'm not surprised Pashto is the hardest Indo-European language on the DLIFLC scale.

DLIFLC used to teach Vietnamese until 2003. I've long been surprised that Vietnamese and Thai were 'just' Category III languages (see this listing) because I think they're comparable to Mandarin in difficulty apart from their writing systems. I guess the Chinese script is a factor in classifying Mandarin as Category IV.

8. As a general rule, Sino-Japanese compounds don't mix elements from different borrowing strata. But there are many exceptions. 隧道 'tunnel' is read as

I suspect sui in 隧道 suidō is by analogy with its phonetic, the more common character 遂 read sui rather than a conscious use of the Kan-on reading. Today, is required in schools, but is not. Shpika stats:


遂 has the native reading tsui which I would mistake for Sino-Japanese if I didn't know better. But tsui is from an earlier disyllabic tupi which doesn't sound like a monosyllabic Chinese reading.

9. Tonight I saw the word 比較 hikaku 'comparison' on the obi of Gundam Wing Data Collection 1 and was reminded of a question that's come to mind from time to time: why is 較 read kaku rather than kyō? Compare hikaku with Korean pigyo (not ˟pigak) and Mandarin jiào (< *-æ̤w; not ˟bǐjué < *-æwk). (The corresponding hypothetical Vietnamese word should be *tỉ giảo or *tỉ giác. Has any such word ever existed?) In Middle Chinese, 較 represented *kæ̤w 'compare' and *kæwk 'bars atop the sides of a carriage box; compete; contest'. Most 交-characters were read with *-w, not *-wk. Did hikaku originate as a hypercorrection?

(2.29.11:53: The reading giảo < *kjảːw  for 較 is interesting, as I would expect ˟giáo. Segmentally giảo looks like a late stratum loan, but its tone is typical of an early stratum loan. In theory an early borrowing of 較 would have been *kẻo [kɛ̉w]. Perhaps giảo is a middle stratum loan that tells us *ɛw > *-jaːw occurred in the source language before its 'departing tone' changed from what sounded like a *sắc tone to what sounded like a *hỏi tone to Vietnamese ears.)

10. What's with the camel case title of this article: "Type 97 ShinHoTo Chi-Ha medium tank"? I've never seen morphemic capitalization of Japanese before: 新砲塔 shinhōtō 'new cannon tower' as ShinHoTo.

Wikipedia explains what 97 ... Chi-Ha is:

Chi (チ) came from Chū-sensha (チュウセンシャ, "medium tank"). Ha and Ni, in Japanese army nomenclature, refer to model number 3 and 4, respectively from old Japanese alphabet iroha. The Type was numbered 97 as an abbreviation of the imperial year 2597, corresponding to the year 1937 in the standard Gregorian calendar. Therefore, the name "Type 97 Chi-Ha" could be translated as "1937's medium tank model 3".
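The arithmetic in that quote can be spelled out in a small Python sketch. It assumes only what the quote says plus the fact that the imperial calendar counts from 660 BC, so subtracting 660 gives the Gregorian year. The helper names are hypothetical, and taking the last two digits as the type number is a simplification that covers the Type 97 case quoted here, not a general rule for every Japanese army designation.

```python
# First seven kana of the iroha order, used here as ordinal labels.
IROHA = "イロハニホヘト"


def imperial_to_gregorian(imperial_year):
    """Convert an imperial year to a Gregorian year (epoch: 660 BC)."""
    return imperial_year - 660


def type_number(imperial_year):
    """Last two digits of the imperial year (simplifying assumption)."""
    return imperial_year % 100


def iroha_index(kana):
    """1-based position of a kana in the iroha order."""
    return IROHA.index(kana) + 1


print(imperial_to_gregorian(2597))  # 1937
print(type_number(2597))            # 97
print(iroha_index("ハ"))             # 3
print(iroha_index("ニ"))             # 4
```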

11. Tonight I uploaded the last four White Rat month 1 entries (1.26 / 1.27 / 1.28 / 1.29) culminating in my promotion of Dwight Decker's phrase déjà lu.

Wiktionary says déjà /deʒa/ is from dès /dɛ/ 'from' (< de + ex) + /ʒa/ 'already'. Why isn't the combination of /dɛ/ + /ʒa/ dèjà /dɛʒa/?

12. I confess I carelessly used the word manuscript to refer to premodern prints until recently. See Sven Osterkamp on this issue.

13. This kind of curling would be nice to see in modern Chinese character logos.

14. Interesting 18th century romanizations of Japanese names from Sven Osterkamp:

They give us hints about Japanese and Dutch pronunciation at the time.

WHITE RAT 2.5

? qulugh ai ? sair tau nyair 

'white rat year, two month, five day'

1. Last night I finally realized that the IPA symbol ɞ is a closed ɜ. And just now I realized ɵ is a closed ə. Duh. The closure makes ɞ and ɵ look like o.

2. Imagine ɵ as a modification of the hangul zero consonant letter ㅇ to transcribe, say, [ʕ] in an Arabic phrasebook for Koreans.

3. Last night I saw the Japanese word 披露 hirō 'making public' when I read this story about a 'new' 手塚治虫 Tezuka Osamu manga created with AI assistance (AI-d?). That brought to mind a question I've wondered about before: why does 披露 hirō end in a long vowel? In other environments, 露 ro has a short vowel. The Chinese rhyme class of 露 ro was regularly borrowed as *-o in Japanese:

4. Windows 10's Japanese IME offered 琥 as an option when I typed <ku>. When is 琥 read ku? The only reading of 琥 that I know of is ko as in 琥珀 kohaku 'amber', the only common word containing the character. Dictionaries do list ku as an alternate reading, but I think that's an artificial Go-on reading created via fanqie. I would expect the Go-on reading of 琥 to be *ku (from earlier *ko after *o-raising), but such a reading may not have survived.

5. I've been taking a closer look at Alan Downes' 2018 PhD dissertation How Does Tangut Work? I confess I do not understand how he got

'When differing coloured vessels are received, they should be supplied stamped.' (p. 99)

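out of this excerpt from article 1261 of the Tangut law code:

[Interlinear table: the Tangut characters and most of the glosses did not survive. The remaining glosses are 'one (by one)?', 'vessel various', and 'one after another'.]
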
1. 1ly3 (approximately [lə]) may be an unaccented form of 𘈩 0100 1lew1 'one'; cf. English a(n) from one.

Character 1 might be the first half of 1ly3 1ly3 'one by one' if the lost character 2 is also 5285. 1ly3 1ly3 'one by one' precedes nouns: e.g.,


1ly3 1ly3 1i4 1vi'1

'one one many born' = 'each and every living thing'

So 1ly3 1ly3 'one by one' would be appropriate in this context before the noun 1ka4 2gu4 'vessel'.

3-4, 5-6. Unlike Downes, I gloss polysyllabic words as single words instead of glossing individual syllables. I don't see where 'coloured' comes from. Could 2my1 2ner4 'various' also refer to variation in shape?

7-9. 'contribute come' corresponds to Downes' 'are received', and the postposition 'on' is a locative metaphor corresponding to Downes' 'when'.

10-11. 'beginning': i.e., the vessels must have been stamped from the beginning? Corresponds directly to nothing in Downes' translation.

12-13. optative prefix of inward motion + 'seal' = 'should [be] stamp[ed]'

14-15. literally 'beginning back'; corresponds to nothing in Downes' translation.

16. A perfective prefix for some unknown verb. Downes' translation supplies 'supply' as the lost verb.

6. Page 34 of volume II of 永野護 Nagano Mamoru's The Five Star Stories has an unusual case of kanji ruby for hiragana: あいだ aida 'interval' has the ruby 時間 jikan 'time' (which in turn contains the kanji 間 that can be used to write aida 'interval'). The idea is to imply that aida is an interval of time. The English translation of the passage on page 10 has "moment".

7. Bess Press publishes readers in English, Hawaiian, Marshallese, Chuukese, and CHamoru for the Hawaii market. They capitalize both letters in the CHamoru digraph CH, though Wikipedia does not: <Ch>. Wikipedia explains:

There is also a movement on Guam to capitalize both letters in a digraph such as "CH" in words like "CHamoru" (Guamanian spelling) or "CHe'lu" ['sibling'], which NMI [Northern Mariana Islands] Chamorros find silly.

I'm guessing the spelling ch is due to Spanish influence.

8. It's taken me almost four decades to realize that Bess Press might be based on Pidgin bes pres 'best press'. Until now I thought it might have been named after someone named Bess.

9. Bess Press has published a trilingual Pidgin/Okinawan/Japanese book:

Okinawan Princess: Da Legend of Hajichi Tattoos

ウチナー ヌ ウミナイビ ヌ ハジチ ヌ イファナシ

Uchinā nu uminaibi nu hajichi nu ifanashi


Okinawa no ohimesama no hajichi no densetsu

'Okinawa GEN princess GEN hajichi GEN legend'

The Okinawan title is written in ruby atop the Japanese title. That takes advantage of the match between Okinawan and Japanese word order. I am reminded of the Korean ruby on this 1940 announcement from the 大邱 Taikyū (now Taegu) court under Japanese rule.

Translator 崎原正志 Sakihara Masashi, a PhD in linguistics and a specialist in Ryukyuan and Japanese linguistics, was 38 as of September 2019, so I assume he learned Okinawan as a foreign language. I would be surprised if Okinawans born in the 80s grew up speaking Okinawan.

What I've seen of Lee Tonouchi's writing was in 'light' Pidgin in English spelling, so Sakihara was probably able to translate directly from it rather than through an English translation. Sakihara has been to Hawaii and may be familiar with Pidgin.

I assume ifanashi is Okinawan for 'legend', as it is the ruby for Japanese 伝説 densetsu 'legend'. But I can't find ifanashi in Sakihara's 2006 dictionary. The word is clearly cognate to Okinawan and Japanese hanashi 'story'. I don't know what i- is.

10. Windows 10's Japanese IME converts <tegu> (an approximation of Korean Taegu) into 大邱. Nice.

11. Is the Korean word 루비 rubi 'ruby' directly from English or via Japanese? I would guess the latter, though Martin et al. (1967: 557) derive it directly from English. (Contrast how they derive 루(우)블 ru(u)bŭl 'ruble' in the preceding entry from Russian via English.)

Martin et al. spell the word as 루(우)비 ru(u)bi with an optional long vowel. The long vowel variant is not from Japanese ルビ rubi which has short vowels. I don't know of any variant ルービ rūbi with a long vowel.

12. While looking for ルービ rūbi, I found ルービックキューブ Rūbikku kyūbu 'Rubik's Cube'. The long vowel in Rūbikku may reflect English [ˈɹuːbɪk] because the original Hungarian name has no long vowel: Rubik [ˈrubik] (not Rúbik [ˈruːbik]).

13. Top ten kanji searches as of 2.28 Japan time with their Shpika stats:

req in school
1 (嬲) <MAN.WOMAN.MAN>: 'tease'
2 (碧) <JADE.WHITE.STONE>: uncommon morpheme for 'blue'
3 (嫐) <WOMAN.MAN.WOMAN>: variant of 嬲
4 (鑫) <GOLD.GOLD.GOLD>: 'wealthy'
5 (颯) <STAND.WIND>: first half of 颯爽 'gallant'
6 (鱻) <FISH.FISH.FISH>: variant of 鮮 'fresh'
7 (輝) <LIGHT.ARMY>: 'bright'
8 (冂) 'city outskirts'; far more common as a character component
9 (壽) prewar form of 寿 'long life' still used decoratively
10 (麤) <DEER.DEER.DEER>: variant of 粗 'coarse'

I've seen 1 (嬲) and 3 (嫐) come up in discussions of kanji with funny component combinations.

4 (鑫), 6 (鱻), and 10 (麤) may interest people because of their tripled parts.

(2.28.0:07: Wiktionary says 鑫 is

A [Chinese] nickname for Kim Jong-un; the character is composed of three Kim characters (金), and Kim Jong-un is the third Kim to rule North Korea.

Is that true?)

Is 9 (壽) no longer common knowledge? It's been over seventy years since it was required in schools, but it's still around.

I don't know why people would look up 7 (輝), a very common character. 2 (碧) and 5 (颯) are slightly more understandable, but they're not rare either.

I'm puzzled by how 8 (冂) became a popular thing to look up.

14. Page 34 of volume II of 永野護 Nagano Mamoru's The Five Star Stories has 皮膚 hifu 'skin' spelled as mixed kanji-kana 皮フ <SKIN fu>. I wouldn't have expected such an abbreviation in the dialogue of a 'meight' (a creator of artificial humans), though I would expect many Japanese to have trouble writing 膚 since it mostly appears in only one word and none of its components are read fu.

15. The word featured in this week's Star-Advertiser Japan section is チンする chin suru 'to microwave'. suru is 'do', and chin is said to be the sound of a microwave indicating it's done. Maybe that's how microwaves sounded in 1988, when the word reportedly came into use. I don't remember any microwave sounding like that.

16. It just occurred to me that the Japanese radical name madare for 广 is 'ma-hanging', a reference to the various ma-graphs written with 广:

麻魔摩磨, etc.

广 itself, however, is not a phonetic for ma; the actual phonetic of those graphs is 麻 ma, and 广 'house built to depend on a cliff' is read gen. 广 is also the PRC simplification of 廣 'wide' - which is simplified in Japan as 広 and read kō.

Today I learned that 广 can also be a simplification of 庵 'hut', though I'm guessing that abbreviation is now obsolete. I didn't notice it in Andrew West's list of simplified characters from a 1935 Republic of China Ministry of Education proposal.

17. Charlamagne tha God is a guest on Stephen Colbert's show tonight. His name is hard to spell correctly - I just misspelled it as Charlemagne.

18. Is the pegative case only in one language? Is it necessary for the analysis of the Azoyú variety of the Tlapanec language?

19. The perlative case is more common than the pegative case, but I never heard of it until tonight.

WHITE RAT 2.4

? qulugh ai ? sair ? nyair

'white rat year, two month, four day'

1. Yesterday this caught my eye in the TV Tropes guide to characters in The Five Star Stories:

His full royal title is Amaterasu dis Grand Grees Eydas IV.


The "Eydas" part of the name is actually a tens' counter, so in pure numerical count he'd be Amaterasu dis Grand Grees LXXXIV.

I assume Eydas (エイダス Eidasu) is a distortion of English eighty, but the -dasu part unintentionally reminded me of Pali dasa 'ten'. (The near-homophony of -dasu with Japanese dāsu 'dozen' is presumably also unintentional. Pali asīti 'eighty' sounds nothing like Eidasu.)
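The arithmetic behind that renumbering can be sketched in a few lines of Python. This is only an illustration, assuming that Eydas contributes 80 as the TV Tropes note describes; `to_roman`, `regnal_number`, and `TENS_COUNTERS` are hypothetical names of my own, not anything from the series.

```python
def to_roman(n: int) -> str:
    """Convert an integer from 1 to 3999 into a Roman numeral."""
    pairs = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    out = []
    for value, symbol in pairs:
        # Greedily subtract the largest value that still fits.
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


# Assumption: "Eydas" is a tens counter worth eighty.
TENS_COUNTERS = {"Eydas": 80}


def regnal_number(counter: str, units: int) -> str:
    """Add the tens counter to the units and render the sum in Roman numerals."""
    return to_roman(TENS_COUNTERS[counter] + units)


print(regnal_number("Eydas", 4))  # LXXXIV
```

So "Eydas IV" is 80 + 4 = 84, which matches the LXXXIV in the quote.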

Incidentally I long thought Grees (グリース Gurīsu) was a reference to Grease since Five Star Stories creator Nagano loves music references. But Grease doesn't seem to be Nagano's kind of music. Maybe it's from Greece. (The Japanese word for 'Greece' is related though different: ギリシャ Girisha.)

2. Can you distinguish these Jurchen characters?

<fai> and <muta>

Jason Glavy's Jurchen font (shown above) has a gap beneath the 7-shaped top component of <muta>.

Kiyose (1977: 65) distinguishes 078 <fai> and 079 <muta> by their bottom right strokes:

<fai> and <muta>

Note how Kiyose writes the top components of both characters with a much narrower 7-shape.

Jin (1984: 44) has two entries for <fei>/<fai>¹ and <muta> but also says "there is no difference in character shapes" (字形沒有區別). Then he tries to distinguish them the same way Kiyose does with a long stroke in <fei/fai> and a dot in <muta>. He derives <fei>/<fai> from 扉 (Jin Chinese *fi).

I wish I could examine copies of the Sino-Jurchen vocabulary to see for myself what <fai> and <muta> look like.

¹Jin lists two readings in different parts of the entry. The reading <fei> may be influenced by féi, the modern standard Mandarin reading of 肥, the Chinese character used to transcribe <fei> in the Sino-Jurchen vocabulary. The reading <fai> may be based on two facts:

First, the Jurchen word for 'eyebrow'

<? ta> (#500)

transcribed as Ming Chinese 肥塔 *fita is cognate to Manchu faitan 'eyebrow'. There was no Ming Chinese syllable *fai, so 肥 *fi was the closest available approximation of Jurchen fai. There is no reason to believe that Jurchen fi became Manchu fai. Manchu ai generally corresponds to Jurchen ai.² I don't know of any other apparent cases of Manchu ai corresponding to Jurchen i.

Second, the Jurchen word for 'label'

<? sï> (#270)

transcribed as Ming Chinese 肥子 *fizi is a borrowing of Jin or Ming Chinese 牌子 *paizi 'label'. It's hard to tell if the borrowing predated the shift of Jin Jurchen p- to Ming Jurchen f-. It's possible that the word was borrowed during the Ming when f- was the only available Jurchen approximation of Ming Chinese p-.

²One seeming exception is Manchu aisin : Jurchen alcun or ancun? 'gold'. But I think the Manchu and Jurchen forms are different borrowings from a common source. So I would not derive the Manchu form from the Jurchen form.

3. David Boxenhorn solved the Nagamese mystery: the -m- is by analogy with Assamese (whose -m- is part of the stem and not a buffer).

4. While playing 北斗の拳 Hokuto no Ken #78 last night, I heard the name 泰山 Taizan which the subtitler correctly rendered as "Taishan". Did the subtitler have to look that up, or did they know the Mandarin name (or even know Mandarin)? The English subtitles that I saw in Hawaii broadcasts of Japanese TV shows in the 70s would have left "Taizan" untranslated, as the subtitlers were under time pressure and were probably local Japanese-Americans without any knowledge of Mandarin.

When I first encountered the name Taizan via Hokuto no Ken in 1987, I gave no thought to the -z- which is unexpected. Theoretically 泰 Tai plus 山 san 'mountain' should equal Taisan, not Taizan. Normally Sino-Japanese s may become z after a nasal or *nasal vowel: e.g.,

But there never was any nasality in 泰 Tai.

So it seems the -z- in Taizan is a case of voicing the initial of a second element of a compound (rendaku) - a morphological alteration rather than a product of phonetic conditioning.

2.27.13:57: Later last night I realized I had known an example of 'mountainous' rendaku since childhood:

火 ka 'fire' + 山 san 'mountain' = 火山 kazan 'volcano'

a word that came up in at least one of my elementary school Japanese-language textbooks. Back then I never questioned why 山 had an irregular reading -zan. I don't remember misreading 火山 as regular ˟kasan (though I've made a lot of other reading mistakes over my lifetime). And now, about forty years later, I know that 火 ka 'fire' never had any nasality that would condition the voicing of the s- of 'mountain'.

5. Sanskrit japa- 'muttering prayers' is a straightforward a-noun derived from the root jap 'mutter'. This etymology is nonsense:

It [japa-] can be further defined as ja to destroy birth, death, and reincarnation and pa meaning to destroy ones sins.

Chinese-like monosyllabic analyses of Sanskrit words are usually dubious. There is no such ja or pa. In fact, there is a -ja- '-born' (not 'destroy birth'), a pāpa- 'sin', and a -pa- '-protecting' (not pa 'destroy one's sins'). I presume the 'analysis' of japa was influenced by ja and pāpa-.

One case in which a Chinese-like monosyllabic analysis really is true is Sanskrit khaga- 'bird' from kha 'void' and ga 'go'. There is no root khag-.

6. If I didn't see the caption identifying this picture as being of

স্বামী বিবেকানন্দ

<svāmī vivekānanda>

Swami Vivekananda

[ʃami bibekanɔndo],

I wouldn't be able to identify the non-English script. This is only the second time I've seen handwritten Bengali. (The first was a sample of Rabindranath Tagore's handwriting - possibly this image.) I'm familiar with so many scripts only in typeset form. I wish I had more access to original Khitan and Jurchen texts.

7. Phrase of the day: pizza effect.

8. Tonight I started reading volume II of 永野護 Nagano Mamoru's The Five Star Stories. (Windows 10's IME has Nagano's name built in!) Nagano has his own idiosyncratic readings of kanji:

WHITE RAT 2.3

? qulugh ai ? sair ? nyair

'white rat year, two month, three day'

1. Today I offer a digital Mardi Gras feast of three entries posted at the same time. (I'll post the entries for the second half of last week later.)

2. Perhaps it was yesterday afternoon when I realized that the Khitan small script character

235 <ri> (the interpretation in Kane [2009: 62])

might be derived from Chinese 礼 (pronounced *li in Liao Chinese). That derivation won't work for Shimunek's interpretation of 235 as <ir>, a possibility Kane (2009: 62) also acknowledges.

3. These Jurchen characters look related:

<he> and <ke>

They are even more similar as printed in Kiyose (1977) which has an identically sized 人 component in both:

<he> and <ke>

I proposed that <he> might be a graphic cognate of Chinese 黑; if so, then <ke> is <he> with an elongated first stroke.

WHITE RAT 2.2

? qulugh ai ? sair ? nyair

'white rat year, two month, two day'

1. Why was Persian khwāja borrowed into Manchu (via some Turkic language?) as hojo? I can't find any other version of the word ending in -o.

2. While copying the Golden Guide last night, it finally occurred to me that Tangut 𗂅 2384 2me4 'minister' is a semantic compound of 'hand' + 'person' reminiscent of English right-hand man (though 'hand' is on the left of the Tangut character!).

3. Yesterday I saw Japanese 見れます miremasu, short for miraremasu 'can see' on p. 102 of vol. 1 of The Five Star Stories. That got me to look into the phenomenon of 「ら」抜き ra-drop discussed here.

Normally the potential is -raremasu after vowel stems and -emasu after consonant stems: e.g., kak-emasu 'can write'. mi- 'see' is a vowel stem, but miremasu looks as if it contains a (nonexistent) consonant stem mir-. I wonder if future Japanese will make more (or all) verbs consonant stems.
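The rule can be put as a rough Python sketch over romanized stems. It assumes that a stem ending in a vowel letter is a vowel stem, which holds for mi- and kak- but is not a general test; the function name `potential` and the `ra_drop` flag are my own labels.

```python
VOWELS = "aiueo"


def potential(stem: str, ra_drop: bool = False) -> str:
    """Return the polite potential of a romanized verb stem.

    Assumption: a stem ending in a vowel letter is a vowel stem.
    """
    if stem[-1] in VOWELS:
        # Vowel stem: -raremasu, or -remasu with ra-drop.
        return stem + ("remasu" if ra_drop else "raremasu")
    # Consonant stem: -emasu (ra-drop does not apply).
    return stem + "emasu"


print(potential("mi"))                # miraremasu
print(potential("mi", ra_drop=True))  # miremasu
print(potential("kak"))               # kakemasu

# The ra-drop form coincides with the potential of a
# nonexistent consonant stem mir-:
print(potential("mir") == potential("mi", ra_drop=True))  # True
```

The last line is the point: with ra-drop, nothing in the potential form itself tells mi- apart from a consonant stem mir-.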

4. Speaking of the verb miru 'see', Wiktionary derives it from Proto-Japonic *miu and says it is cognate with me 'eye'.

The reasoning behind *miu seems to be that the regular conclusive ending is *-u (cf. consonant stems like kak-u 'write') and that in Proto-Japonic, *-u was added as is to all stems, whereas in Japanese, a buffer -r- was inserted after vowel stems: *mi-u > mi-ru but *kak-u > kak-u. One could propose the reverse: *mi-ru > mir-u but *kak-ru > kak-u.
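The two analyses can be compared in a small Python sketch. Both functions are hypothetical illustrations using only the two stems mentioned here, with stem class again judged by the final letter of the romanization.

```python
VOWELS = "aiueo"


def conclusive_by_insertion(stem: str) -> str:
    """Analysis A: the ending is -u; a buffer -r- is inserted after vowel stems."""
    if stem[-1] in VOWELS:
        return stem + "r" + "u"
    return stem + "u"


def conclusive_by_deletion(stem: str) -> str:
    """Analysis B: the ending is -ru; r is deleted after consonant stems."""
    ending = "ru"
    if stem[-1] not in VOWELS:
        ending = ending[1:]  # drop the r after a consonant
    return stem + ending


for stem in ("mi", "kak"):
    print(stem, conclusive_by_insertion(stem), conclusive_by_deletion(stem))
```

Both functions return miru and kaku, so the attested forms alone cannot decide between insertion and deletion.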

I don't think there is any relationship between mi- 'see' and me < *ma-i 'eye'. I know of no other examples of Ci-verbs corresponding to *Ca-i nouns.

5. Is the Gundam robot name 笑倣江湖 <LAUGH IMITATE RIVER LAKE> Shōhō Kōko a play on the title 笑傲江湖 Shōgō kōko <LAUGH PROUD RIVER LAKE>, known in English as The Smiling, Proud Wanderer?

江湖 <RIVER LAKE> is not the sum of its parts.

6. Sven Osterkamp on Vietnamese in Japanese transcription and vice versa.

Just two interesting examples:

Vietnamese một [mot] 'one' transcribed as Japanese moru (clearly not representing a Vietnamese dialect in which ôt became [ok]).

Japanese mizu [mi(d)zɯ] 'water' transcribed as monosyllabic Vietnamese 篾 miệt [miət] (again, not based on a Vietnamese dialect with [t] > [k])

7. 篾 is a rare example of a nom character that is a semantogram: it can represent both Sino-Vietnamese miệt 'bamboo splints' and its unrelated native Vietnamese synonym giá 'id.' (Is giá 'bamboo splints' an obsolete word? I can't find it outside of

8. Last night I heard "avant-garbage" on The Goldbergs. I'll store that for future use.

9. Last night I finally learned the etymology of やおい yaoi:


yama[ba] nashi, ochi nashi, imi nashi

'no climax, no denouement, no meaning'

It sounds like a backronym.

10. Last night I learned that the Czech version of Mardi Gras is masopust. Maso is 'meat', but pust has a short vowel unlike půst 'fast'. Wiktionary says masopůst is obsolete. Why was ů shortened? Because it wasn't stressed? But Czech has lots of unstressed long vowels in noninitial syllables. (Czech stress tends to be on the first syllable.)

11. Going eastward: I was surprised to learn that the Slovakian prime minister is named Peter Pellegrini. His Italian name is pronounced like a Slovakian word, so -ni is [ɲi] and not [ni]. To Italian ears his name might sound like Pellegrigni.

12. I had heard of Pellegrini's predecessor Robert Fico. What is the etymology of that name? F seems to rule out a Slavic origin, but it doesn't sound like any non-Slavic European name that I can think of.

13. I didn't know about Nagamese Creole until today. I can't remember ever seeing -m- as a buffer between a vowel-final base and -ese before.

14. This looks like a folk etymology of the ethnonym Naga:

The term "Naga" is derived from a Burmese language word "Naka", which mean "Pierced ears" or "People with pierced ears". Piercing of ears is common tradition of the said people.

The Burmese name of the Naga is နာဂ <nāga> from Pali nāga- 'naga'. 'Ear' is နား <nāḥ> with a high tone, not နာ <nā> with a low tone, and there is no က <ka> or ဂ <ga> meaning 'pierced'.

15. SEAlang's Burmese dictionary defines Burmese နာဂ <nāga> as "a Tibet-Myanmar speaking ethnic group inhabiting the hilly north-west region along the Myanmar-India border." There's a synonym of Tibeto-Burman I've never seen before: Tibet-Myanmar.

16. Is Nefamese another Assamese-based pidgin, or is it something else?

17. It just occurred to me that Taic by analogy with Turkic and Mongolic could be used in English to avoid the confusing homophony of Tai and Thai. It's tiring to explain to nonspecialists the difference between Tai and Thai. Part of me feels like using Dai to make the connection with Kra-Dai obvious, but Dai is already an ethnonym. Maybe Daic and Kra-Daic?

18. Entries 360-371 and 376-381 in the Sino-Jurchen vocabulary of the Bureau of Translators have the format

Jurchen X : Chinese Y

Jurchen X : Chinese Z

That shows the semantic range of Jurchen words.

19. Today's Honolulu Star-Advertiser reprinted a 1963 photo with a Hawaii tourism ad translated into Japanese with the slogan:


omotta yori majika de ... yume mita yori mo utsukushii

thought than close be ... dream saw than even beautiful

'Closer than you thought ... even more beautiful than you dreamt'

Note how 思つた omotta 'thought' is spelled in the prewar manner with a full-size つ <tsu>. The postwar spelling is 思った with a reduced <tsu>.

In the ad, the kanji 近 has its prewar form with two dots at the top left instead of just one.

I suspect the slogan was translated in Hawaii by someone who still had not shifted to postwar orthography.

20. Did the Japanese sentence-final -ものを mono wo (lit. 'thing ACC') construction in


damat-te-i-reba ii mono wo

lit. 'silent-CVB-be-if good thing ACC'

'I wish I had just not said anything'

(example from this site which has an explanation and more examples)

originate as an abbreviation of mono wo followed by a verb (like 'wish')?

21. Big news from Andrew West:

In May 2015 the National Library of China acquired 18 bundles of Tangut documents in a very poor state from a book dealer in Yinchuan who had contacted eminent Tangutologist Prof. Shi Jinbo.

If only similar Khitan and Jurchen-language items could be found.

The last new Pyu-language discovery I know of was from four years ago. I'm sure there will be more.

22. Stephen Colbert just pronounced 习 Xí [ɕi] (as in Jinping) as [ʒiː]. Sigh.

WHITE RAT 2.1

? qulugh ai ? sair ? nyair

'white rat year, two month, one day'

1. I just changed the nonsense string "par juri" 'ten twenty' to "juri" 'twenty' in my transcriptions of Khitan dates. So many copy-and-paste mistakes. I copy and paste to save time, but my carelessness publicly embarrasses me until I make and upload fixes. Which takes time!

(2.25.1:20: I wrote topic 2 thinking the date was 1.30, but in fact it was 2.1 since the first Khitan month this year had only 29 days.)

2. If Khitan juri 'twenty' is from jur 'two' plus -i, then I would guess that 'thirty' might be something like guri: gur 'three' plus -i. But I don't really know, and I'm not even sure the Khitan word for 'three' is gur. And I can't explain the Khitan large script character


which doesn't look like Chinese 卅 <THIRTY>. And Chinese 卅 <THIRTY> (a combination of three 十 <TEN>s) does resemble ...

Khitan <FOUR>

which is obviously a combination of four lines.

3. Starting today, the last post on the index page is followed by a link to the previous week's page. I've also added links to previous weekly pages at the bottom of weekly pages from the 19.12.22-19.12.28 page onward. Those are no substitute for updating the archives page, I know. In the meantime one can use search engines to find older posts.

4. A friend gave me an interesting pamphlet, M. Vrdalj's Engleski sa izgovorom (Beograd: Jovan, 2006). For now I'll just comment on the title on the front cover: I would expect sa 'with' to be s before izgovorom 'pronunciation.INS.SG.' (See Wiktionary's usage notes.) And indeed s izgovorom is the title on the inside front cover and the National Library of Serbia listing in the back. So does the front cover have a typo nobody caught, or is it acceptable for sa to be used before vowels?

I expect Slavic *sŭ 'with' to become s (or z) everywhere except in certain environments where it has a buffer vowel (i.e., a remnant of *ŭ): e.g., Ukrainian

зі сестрою zi sestroju 'with [one's] sister'

found at But has

з сестрою z sestroju 'with [one's] sister'

without a buffer vowel.

Google stats:

Interslavic has both s and so, but it's not clear to me when to use so. Merunka's (2018: 51) grammar only mentions s.

5. Why is it taking so long to develop the AP Russian Language and Culture exam?

AP Russian Language and Culture is a proposed Advanced Placement course and examination, in development since 2005. [...] The program was meant to launch between 2007-2008.


A prototype exam was administered to students in 2010.

But the real exam still doesn't exist ten years later.

6. Numbers of students taking Advanced Placement language tests in 2019:

student #
Spanish language
Spanish literature

Students taking Spanish AP exams outnumber students taking all other AP language exams by a ratio of four to one.

I took the German AP exam in 1989.

7. Scott Pelley on 60 Minutes pronounced Calehr as [kjl]. What language does that name originate from? Samira Calehr's sons Shaka and Miguel were going to see their grandmother in Bali. I would expect Calehr to be pronounced something like [tʃaləhr] in Indonesian. How is the name pronounced in the Netherlands where Calehr's family lives?

The family has an interesting mix of first names. Samira is Arabic, Miguel is Spanish, and I can't tell what Shaka and Mika (Samira's third son) are.

8. Another shaka is a mysterious word from Hawaii that isn't Hawaiian (Wiktionary points out sh isn't in Hawaiian) and doesn't seem to be from any language here. I wish I could do a full-text search of the local press to see how early it appeared in print.

9. Wikipedia's Help:IPA/Malay page says:

The dental fricatives [θ, ð] are found solely in Arabic loanwords [in Malay (Malaysian and Indonesian)], but the writing is not distinguished from the Arabic loanwords containing the [s, z] sounds and these sounds must be learned separately by the speakers.

So [θ ð] are written as <s z> like [s z]. On the other hand, Wikipedia's article on Malay phonology gives redha (not reza) 'good will'¹ as an example of /ð/. It notes that

Before 1972, this sound [θ] was written ⟨th⟩ in Standard Malay (but not Indonesian).

Why wasn't <th> adopted for [θ] in the 1972 orthographies?

¹Not a meaning I can find in the dictionaries at SEAlang.

10. I just heard a McDonald's commercial ending with "made perfecter" (sic). I suppose the irony is supposed to be funny. I wonder how many English learners might take that phrase as a model.

11. My 1987 printing of vol. 1 of The Five Star Stories has the Japanese phrase


sono himitsu (hitotsu ya futatsu dewanai)

that secret (one ya two

on the second color page. English requires one to say either that secret or those secrets, whereas sono himitsu can refer to one or more secrets. The author has given the impression up to that point that there is only one secret but then reveals the surprise that there are more than just one or two. To translate sono himitsu as those secrets would give away the surprise, but that secret (not just one or two) sounds odd.

The numerals are spelled inconsistently in the original as <hi to tsu> and <2 tsu> - <2> representing futa- 'two'.

ya is hard to translate precisely. X ya Y means 'X and Y (and others)', not just 'X and Y'.

Wiktionary has dehanai as a romanization of ではない <de ha na i> dewanai. Sigh. Such transliterations are dangerous for those who don't know how the word is pronounced.
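The difference between transliterating the kana and transcribing the pronunciation can be shown with a toy Python sketch. It covers only the four kana in ではない, and it assumes the caller already knows which は is the particle; the names `KANA`, `transliterate`, and `transcribe` are hypothetical and do not describe how Wiktionary generates its romanizations.

```python
# Letter-by-letter values of the four kana in ではない.
KANA = {"で": "de", "は": "ha", "な": "na", "い": "i"}


def transliterate(kana: str) -> str:
    """Romanize each kana by its spelling value, ignoring pronunciation."""
    return "".join(KANA[k] for k in kana)


def transcribe(kana: str, particle_positions: set) -> str:
    """Romanize by pronunciation: は at a marked particle position is wa.

    Assumption: particle_positions is supplied by hand; nothing here
    detects particles automatically.
    """
    out = []
    for i, k in enumerate(kana):
        if k == "は" and i in particle_positions:
            out.append("wa")
        else:
            out.append(KANA[k])
    return "".join(out)


print(transliterate("ではない"))    # dehanai
print(transcribe("ではない", {1}))  # dewanai
```

The transliteration is recoverable from the spelling alone, whereas the transcription needs the extra knowledge that the second kana is the particle. A learner who sees only dehanai does not have that knowledge.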


Tangut Yinchuan font copyright © Prof. 景永时 Jing Yongshi
Tangut character image fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2018 Amritavision