WHITE RAT 1.22

? qulugh ai nai sair par juri ? nyair

'white rat year, head month, twenty two day'

1. Listening to the bilingual version of "Before the Next Teardrop Falls" this afternoon, I couldn't help but wonder whether the lyrics of that song might someday serve as a 'Rosetta Stone' for one of its two languages. Years of working on Pyu have made me see Rosetta Stones everywhere.

2. Tonight, while copying the hand copy of the epitaph for Empress 仁懿 Renyi (?-1076) of the Khitan Empire in 契丹小字研究 Qidan xiaozi yanjiu (Research on the Khitan Small Script), I was puzzled by a Khitan small script character that I had never seen before: 𠈌 in block 2 of line 26. But 𠈌 is in fact two instances of 仌 <o> side by side. Duh. Does <o.o> represent a long vowel?

3. Kiyose (1977: 114) and Jin (1984: 131) read the Jurchen phonogram

transcribing Ming Chinese 千 *cian 'thousand' as cen (in my pseudo-Möllendorff notation). I initially read it as ciyan since 千 was later Manchuized as ciyan, but I reconsidered because that phonogram also represents the first syllable of the Jurchen verb cognate to Manchu cendembi 'to check' in the jinshi monument of 1224 and the Sino-Jurchen vocabulary of the Bureau of Translators.

I tentatively conclude that Jurchen and Manchu had different conventions for borrowing northern Chinese *-ian:

Or did they? Did the Jurchen use one phonogram for both native cen and borrowed ciyan?

4. Tonight I heard Jack's Project's "Shy Shy Sugarman". I thought it was odd that Jack - Jack White (born Horst Nußbaum) - would sing in German under a pseudo-English name. The English Wikipedia says

White changed his name to make it easier to deal with English-speaking stars and their managers.

But he was recording under the White stage name as early as 1966, years before he worked with Anglophones in the 80s. Was the idea to make him seem like Peggy March, who really was an Anglophone despite recording in German?

March cowrote "When the Rain Begins to Fall" produced by Jack White. Small world.

WHITE RAT 1.21

? qulugh ai nai sair par juri ? nyair

'white rat year, head month, twenty one day'

1. I just noticed a dotted two-stroke version of the hiragana し <si> in Ultraman Leo #45 from forty-five years ago today. The dot is a remnant of the top stroke of the source character 之 whose strokes beneath the dot have been reduced to one. I can type dotless し <si> and 'dotted Z' 𛁄 <si> in Unicode but not dotted し. Now I wonder how many times I saw dotted し but failed to notice the dot.

2. Today I rediscovered ธีระพันธ์ เหลืองทองคำ Theraphan Luangthongkum's "A View on Proto-Karen Phonology and Lexicon" (2019). She reconstructs

Unable to reconcile the northern, central, and southern proto-forms for 'four', she refrained from reconstructing a Proto-Karen word for 'four'.

The *-t forms are unusual in Sino-Tibetan. Sino-Tibetan words for 'four' and 'five' are typically open syllables: e.g., Old Tibetan bzhi and lnga. Proto-Karen has *-t in 'seven', 'eight', and 'nine'. I would reconstruct *-t in Proto-Sino-Tibetan 'seven' and 'eight'. Did *-t spread from those numerals? If it did, why is Proto-Karen 'six' *khrowᴬ without *-t? Could *-t really have 'jumped' over 'six' to spread to 'five' and - in northern Karen - 'four'? Did *-t not take root in 'six' because *khrowt would have had an impossible double coda? (Other Sino-Tibetan languages have -k or the like in 'six', so Proto-Karen *-w is unexpected in 'six', particularly since *-ok is a permissible rhyme in Proto-Karen.)

The *-w- in Proto-Central and Proto-Southern Karen 'four' might be from an earlier *labial stop like the b- in Old Tibetan bzhi < *blʲi 'four' or the p- in Pyu plaṁ /p.lä/ 'four'.

I can't explain the *-j- in Proto-Karen 'five'.

3. Most Vietnamese names are constructed from a limited set of Sino-Vietnamese building blocks. So Vietnamese names that have non-Sino-Vietnamese elements jump out at me - like 潘{廷庭亭}¹X Phan Đình Giót, the name of a hero of 奠邊府 Điện Biên Phủ. His last words were all Chinese loanwords:

決犧牲 ... 爲黨 ... 爲民

Quyết hy sinh… vì Đảng… vì dân

lit. 'determine sacrifice ... for Party ... for people'

'Determined to sacrifice ... for the Party ... for the people'

How is Phan Đình Giót written in Chinese? Semietymologically with a phonetic transcription character for the non-Sino-Vietnamese third syllable? I wouldn't expect a Chinese writer to look up the nom character for Giót: 埣.

I don't know what 埣 giót means or even if it is the Giót in Phan Đình Giót. The character is a semantophonetic compound:

土 <EARTH> + 卒 tốt and tell me that 埣 is in the word khuôn giót. khuôn (variously spelled² 匡 ~ 困 ~ 囷 ~ 坤 ~ 𣟂) means 'model'. Does khuôn giót refer to a mold for something to do with earth: e.g., clay?

¹I can't tell whether Đình is 廷 'court', 庭 'courtyard', 亭 'pavilion', or something else.

²All spellings of khuôn have -n phonetics except for 匡 which has an -ng phonetic 王. I presume 匡 reflects a dialect in which -n backed to /ŋ/.

4. I just downloaded Andrew West's beta versions of BabelMap and BabelPad so I can see Khitan small script readings. AVG scared me when it told me I'd have to wait up to 156 and 154 minutes respectively to have them checked for viruses, but both were OKed in minutes.

5. I agree with Andrew West on Jurchen


Another possibility is that Jurchen <tai> is a deliberate merging of the two Chinese characters 'heaven' and 'great', rather than just a random extra stroke added to 太.

<tai> could then be a semantophonetic composite.

6. Tonight I was surprised to learn that Disney Tsum Tsum are called 썸썸 Ssŏmssŏm. The name sounds like a Koreanization of a hypothetical English pronunciation [sʌm sʌm] of Tsum Tsum instead of a direct borrowing from Japanese ツムツム Tsumutsumu (which should correspond to Korean 쓰무쓰무 Ssŭmussŭmu or 쯔무쯔무 Tchŭmutchŭmu).

WHITE RAT 1.20

? qulugh ai nai sair par juri nyair

'white rat year, head month, twenty day'

1. The Khitan large script character for 'twenty' looks exactly like Chinese 廿 <TWENTY> (Liao Chinese *ʐiʔ¹?) but represents a completely unrelated word.

If Khitan juri 'twenty' is jur 'two' plus -i, are the other unknown tens also X + -i: e.g., guri 'thirty', duri 'forty', etc.?

¹This is a guess based on the fanqie 人執 in-tʂ in the Liao dynasty dictionary 龍龕手鑒 Longkan shoujian (The Handy Mirror in the Dragon Shrine, 997).

2.15.14:17: Fanqie need not reflect contemporary pronunciation, so there is no guarantee that 人執 is not a relic.

My undated copy of 辭海 from the 20th century gives the fanqie 日力 rì-lì for 廿 niàn and says 廿 is pronounced like 入 rù. Maybe that makes sense in some Chinese varieties, but it certainly doesn't make sense for modern standard Mandarin. The equation of 廿 and 入 was valid for Middle Chinese in which both were pronounced *ɲip. The fanqie 日力 might be a relic from a post-Middle Chinese period when 日力 was pronounced something like Liao Chinese iʔ-l representing *ʐiʔ. 日力 cannot be a Middle Chinese fanqie since 力 ended in *-k in Middle Chinese, conflicting with the *-p of 入.
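The fanqie argument above can be checked mechanically. What follows is only a minimal sketch, assuming a simplified model of fanqie in which the upper speller contributes its initial and the lower speller its rhyme (vowel plus coda), with tone and medial details ignored. Apart from *ɲip for 廿 and 入 and the *-k of 力, the rough Middle Chinese values in the table are my assumptions rather than readings given in the post.

```python
# Approximate Middle Chinese readings split into (initial, rhyme).
# Only *ɲip for 廿/入 and the *-k of 力 come from the post; the rest are assumed.
MC = {
    "廿": ("ɲ", "ip"),
    "入": ("ɲ", "ip"),
    "日": ("ɲ", "it"),
    "力": ("l", "ik"),
    "人": ("ɲ", "in"),
    "執": ("tɕ", "ip"),
}

def reading(char):
    """Return the full (simplified) Middle Chinese syllable for a character."""
    return "".join(MC[char])

def fanqie(upper, lower):
    """Spell a syllable: initial of the upper speller + rhyme of the lower one (tone ignored)."""
    return MC[upper][0] + MC[lower][1]

for spellers, target in [("日力", "入"), ("人執", "廿")]:
    result = fanqie(*spellers)
    verdict = "matches" if result == reading(target) else "does not match"
    print(f"{spellers} -> *{result}, which {verdict} {target} *{reading(target)}")
```

Under these values 日力 yields *ɲik instead of *ɲip, while 人執 yields *ɲip, so 人執 would already have worked as a Middle Chinese spelling, which is consistent with the possibility that it is a relic.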

2. After reading Tournadre (2014), I'm going to be using 'Tibetic' the way I use 'Mongolic':

The term "Tibetic" could, however, become a useful replacement for the notion of "Tibetan dialects", which is not appropriate for various reasons.

First, the notion of "Tibetan dialects" implies the existence of a single "Tibetan language". However, the so-called "Tibetan dialects" refer in fact to various languages which do not allow mutual intelligibility.


Second, these "Tibetan dialects" are spoken not only by Tibetans per se but also by other ethnic groups such as Ladakhi, Balti, Lahuli, Sherpa, Bhutanese, Sikkimese Lhopo, etc. who do not consider themselves to be Tibetans. They do not call their language "Tibetan". In a similar way, we do not talk of Latin Languages but of Romance languages and do not think of French, Portuguese, Italian, Catalan or Romanian as various dialects of Latin.

With the recent descriptions of many new "dialects" or "languages", scholars of Tibetan linguistics have come to realize the incredible diversity of this linguistic area. The representation of a single language is no longer viable and we have to speak of a language family. In fact, the Tibetic linguistic family is comparable in size and diversity to the Romance or Germanic families.

3. Mathieu Beaudouin solved a mystery I've been wondering about for years: the difference between the locative markers 𗅁 2u1 and 𘂤 1kha1 in Tangut. I hope he next investigates how the third locative marker 𘇂 2gu1 ('medessive' in his analysis) differs from 𗅁 and 𘂤 ('inessive' and 'interessive' in his analysis).

2.14.0:34: Wiktionary doesn't have entries for medessive and interessive yet. Here's a set of flash cards for 'essives' excluding medessive.

2.17.23:12: I don't know what medessive means.

4. Last night I learned about a "security video feeds streaming platform" called 황새울 Hwangsaeul. 황새 hwangsae is Korean for 'stork'. 새 sae is 'bird', but what is 황 hwang which sounds like Sino-Korean? I can't think of any appropriate Sino-Korean morpheme hwang: e.g., 黃 hwang 'yellow' doesn't make sense since storks aren't yellow. This site (from which I got the description of Hwangsaeul that I quoted) says,

Hwangsaeul itself means "stork's nest". It is the street in Seongnam where SK Telecom offices were located.

But ul doesn't mean nest (which would be 둥지 tungji). Martin et al. (1967: 1246) define ul as 'fence, hedge, enclosure; outer rim of shoes'. So I'm not sure what Hwangsaeul means.

2.15.23:12: Martin et al. (1967: 1246) also list a native Korean word 울 ul for 'trillion' which is surprising to me since I would have thought all Korean higher numerals were borrowed from Chinese: e.g., 兆 cho 'trillion'.

울 'trillion' is not in Naver's monolingual dictionary.

5. Until I read this Wiktionary entry today, it never occurred to me to link mum's the word with mummer, much less mime. Are m-words sound-symbolic for silence because one can make an [m] sound while keeping one's lips shut?

6. Can one receive a heads up for something positive? I don't get that impression from this Wiktionary entry.

7. I've wondered about Amy Klobuchar's surname. It turns out to be Slovene even though ch is pronounced as in French in American English (despite the un-French K-!).

2.14.0:22: The Slovene Wikipedia says her great-grandfather's name was Klobučar with č [tʃ].

8. While I was copying the Sino-Jurchen vocabulary of the Ming dynasty Bureau of Translators, two graphic etymologies occurred to me.

8a. The Jurchen phonogram


may be cognate to Chinese 更. The use of <giyan> for giyan [kʲɑŋ] might be a carryover from the Parhae script that coexisted with Late Middle Chinese, since Pulleyblank reconstructed the Late Middle Chinese reading of 更 as *kjaːjŋ.

Kiyose and Jin read <giyan> as <giyen> [kʲən] in my pseudo-Möllendorff notation, but if what might be called 'Sino-Jurchen' worked like 'Sino-Manchu', northern Chinese *kjan would be borrowed as giyan, not giyen.

8b. The Jurchen phonogram


may be from 昼 (Liao and Jin Chinese *tʂiw), one of many variants of 晝 (I like the one with 日 surrounded by four lines).

2.15.23:57: An obvious problem with that derivation is the phonetic mismatch of Jurchen sh- with Chinese *tʂ-. But such a mismatch has a parallel with the mismatch of Jurchen sh- and Mongolic c- in words like Jurchen shanggiyan 'white' (cf. Written Mongolian caghan 'white').

Then again, Jin (1984) derives <shu> from a variant of 書 (Jin Chinese *ʂu) attested at Dunhuang resembling

the older version of <shu> from the Jurchen Empire. So never mind what I said.

WHITE RAT 1.19

? qulugh ai nai sair par ish nyair

'white rat year, head month, ten nine day'

1. Andrew West corrected my post about ɛ̃fini:

The Wikipedia file name (url) uses Ɛ̃fini because of a technical restriction, but the actual article title is ɛ̃fini.

I conflated the file name in the address bar with the article title within the article itself. I never thought of articles as being files before.

I've long known about that technical restriction but never understood the reason for it.

2. I've long known about the obsolete practice of German-like (but inconsistent) noun capitalization in English. But I didn't know how it started until last night.

3. I hope to see Chris Button's Derivational Dictionary of Chinese and Japanese Characters some day.

4. I didn't know thorn (Þ) was ever written with a diagonal stroke until I saw this tweet by Andrew West last night.

I didn't even know about U+A764 LATIN CAPITAL LETTER THORN WITH STROKE (Ꝥ) and U+A765 LATIN SMALL LETTER THORN WITH STROKE (ꝥ) until now when I looked at this document that Andrew and Michael Everson wrote.

5. Seeing this cartoon of Latin phrases used in English made me think of Literary Chinese four-character sayings in Japanese. Are there Arabic phrases playing a comparable role in the Islamic world?

6. I confess it's taken me years to realize that the Jurchen phonogram


is simply Chinese 太 <GREAT> with a line on top rather than Chinese 天 <HEAVEN> with a dot added. That obvious derivation was in Jin (1984: 4) all along, and I never noticed it, possibly because I encountered the character through Kane (1989: 23) and never looked it up in Jin (1984) before. Wait ... Kane (1989: 23) in fact derives <tai> from 太! I've looked at that page many times since the mid-90s. How did I overlook that!?

It's not possible to date the creation of <tai> on purely phonological grounds, as 太 was pronounced in Chinese like Jurchen tai for centuries. In theory <tai> could be a carryover from the lost Serbi script from c. the 5th century - or the mostly lost Parhae script from c. the 8th century. <tai> is not known to exist in the Khitan large script (which uses an exact lookalike of Chinese 太 as a phonogram <tai>), so <tai> is either inherited from an earlier script or is a 12th century Jurchen creation.

WHITE RAT 1.18

? qulugh ai nai sair par nyêm nyair

'white rat year, head month, ten eight day'

1. Yesterday I wrote about 文 fumi 'writing', which is often thought to be a borrowing from Chinese 文 'writing' (presumably from a reading like northwestern Middle Chinese *ɱvun), but I would not expect Chinese *-n to be borrowed as Japanese -mi.

For years I've thought the word might have been borrowed from Paekche, the language of the people who taught literacy to the early Japanese. But no such Paekche word is known. Nor is there any plausible Korean cognate for such a word.

So last night I reconsidered another old idea of mine: that fumi 'writing' might be a repurposing of fumi 'stepping'. Tonight I verified that 'writing' and 'stepping' would have had the same ancient accentuation (*high-low). But that's still not enough. If writing (marks on paper) were likened to steps on the ground, why doesn't the verb fum- 'step' do double duty for writing? The actual verb for writing is kak- 'scratch'.

2. Twenty-five years ago yesterday, the Kakuranger team fought Yamanba. The 'real' yamanba is also called a yamauba. Yama is 'mountain' and uba is 'old woman'. The variant yamanba [yamamba] seems to retain a trace of earlier *[mb]:

*yamaumba > *yamaũmba > *yamaũba > *yamamba (or yamauba)
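The yamanba chain above can be sketched as a short sequence of ordered rewrite rules. This is my rough formalization, not a claim about how the changes must be stated: each rule is a regular-expression substitution applied once in order, and the rule wordings in the comments are assumptions. A second rule list swaps in plain denasalization for the last step to produce the yamauba variant.

```python
import re

# The nasal vowel ũ (U+0169) is written as an escape to avoid normalization surprises.
NASAL_U = "\u0169"

# Hypothetical ordered rules (regex pattern, replacement) formalizing the chain.
RULES = [
    (r"u(?=mb)", NASAL_U),            # *u nasalizes before prenasalized *mb
    (NASAL_U + "mb", NASAL_U + "b"),  # *mb loses its nasal onset after the nasal vowel
    (NASAL_U, "m"),                   # the nasal vowel becomes a moraic nasal [m] before b
]

# Alternative last step: the nasal vowel simply denasalizes, giving yamauba.
ALT_RULES = RULES[:2] + [(NASAL_U, "u")]

def derive(form, rules):
    """Apply each rule once, in order, and return every stage at which the form changed."""
    stages = [form]
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
        if form != stages[-1]:
            stages.append(form)
    return stages

print(" > ".join("*" + s for s in derive("yamaumba", RULES)))
print(" > ".join("*" + s for s in derive("yamaumba", ALT_RULES)))
```

Keeping the rules as data makes it easy to test a competing ordering: swapping the second and third rules, for instance, never produces the attested yamanba [yamamba].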

uba seems to belong to a word family with oba < womba 'aunt' and oba < əpəmba 'grandmother'. *əpə- is 'great', so *-mba is presumably a contraction of the linker * and an otherwise unattested noun *pa 'old woman'. (Its resemblance to Middle Chinese 婆 *ba 'old woman' is coincidental.) But what is wo-? It can't be any of these wo listed in Martin (1987: 503):

And there is no candidate for u- in uba < *umba.

The accentuation of the ba-words has different pitches on the second syllable, possibly implying the various ba are not related:

Pitches in parentheses are for following particles.

3. I've long assumed that English girl had no cognates, but maybe it does.

4. Japanese 山茶花 <MOUNTAIN TEA FLOWER> sazanka looks as if it should be read ˟sansaka. Wikipedia thinks there was 音位転換 metathesis, but I have an alternate derivation that accounts for the -z- left unexplained by metathesis (which would produce ˟sasanka):

*sansakwa > *sansankwa > *sanzankwa > sazanka
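The sazanka derivation above can likewise be sketched as ordered rewrite steps. This is only a rough formalization under stated assumptions: the post does not say what triggered the added n, so that step is simply stipulated, and the last step bundles two changes (loss of the first n and *kw > k) so that the stages match the chain as written.

```python
import re

# Hypothetical ordered steps; each step is a list of (regex pattern, replacement) rules.
STEPS = [
    [(r"akwa$", "ankwa")],            # an n appears before final *kwa (stipulated, no mechanism given)
    [(r"ns", "nz")],                  # *s voices after a nasal
    [(r"^san", "sa"), (r"kw", "k")],  # the first n is lost and *kw simplifies to k
]

def derive(form, steps):
    """Apply each step's rules in order and record the form after every step."""
    stages = [form]
    for rules in steps:
        for pattern, replacement in rules:
            form = re.sub(pattern, replacement, form)
        stages.append(form)
    return stages

stages = derive("sansakwa", STEPS)
# Star the reconstructed stages but not the attested final form.
print(" > ".join(["*" + s for s in stages[:-1]] + [stages[-1]]))
```

The voicing step after the inserted nasal is what supplies the -z-; metathesis of *sansaka alone would only give ˟sasanka.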

The species name Camellia sasanqua is either from a variant *sasankwa (with metathesis?) or a misreading of a kana spelling ササンクワ <SA SA N KU WA> without the voicing mark.

Wikipedia says the 濁音符 voicing mark was still not used as late as 1945 in the text of the Jewel Voice Broadcast: e.g.,


<I ka WILL ni a ra su>

chin ga kokorozashi ni arazu

'not Our [imperial] will'

now written as


<I ka" WILL ni a ra su">

with voicing marks.

WHITE RAT 1.17

? qulugh ai nai sair par ? nyair

'white rat year, head month, ten seven day'

1. On Saturday morning I was listening to the music of 馬飼野康二 Makaino Kōji. He used the pseudonym Michael Korgen when foreign composers were in vogue for Japanese commercials. I love pseudonyms that vaguely sound like real names.

2. Last night I got a couple of surprises from Wikipedia's Irish orthography article:

2a. On v (in the loanword vóta from yesterday):

It occurs in a small number of words of native origin in the language such as vácarnach, vác and vrác, all of which are onomatopoeic. It also occurs in a number of alternative colloquial forms such as víog instead of bíog and vís instead of bís as cited in Niall Ó Dónaill's Foclóir Gaeilge–Béarla (Irish–English Dictionary).

2b. I had no idea /z/ existed in at least one Irish dialect:

the phoneme /z/ does exist naturally in at least one dialect, that of West Muskerry, County Cork, as the eclipsis of s.

That eclipsed s is written as <zs> in Cape Clear Irish.

s does not undergo eclipsis in standard Irish.

3. I forgot about Mazda's ɛ̃fini (IPA for French infini) division until last night. It's the only IPA brand name I've ever seen. The Wikipedia ɛ̃fini entry says IPA is "sometimes used in product naming" in Japan. What are other examples?

2.11.22:16: I just noticed the Wikipedia article for ɛ̃fini is titled Ɛ̃fini with U+0190 LATIN CAPITAL LETTER OPEN E in order to conform to Wikipedia's convention of beginning all article titles with capital letters.

4. Kiyose (1977: 114) reads Jurchen


as hau. Jin (1984: 153) reads it as xao or xou. The difference in initials is purely notational. h and x both represent the initial that I reconstruct as uvular [χ]. The difference between au and ao is also purely notational; both are roughly [ɑw].

The word is clearly a borrowing from Ming (or earlier) northern Chinese 侯 *xəw (= heo in my pseudo-Möllendorff notation) 'marquis', so I read it as heo [xəw] which is exactly like Manchu heo [xəw] 'marquis'.

侯 'marquis' is a very stable word in the north. In Middle Chinese, its rhyme shifted from *-ow to *-əw and remained unchanged in the lineage of standard Mandarin until recently, when ə rounded to o to assimilate to the following w: *xəw > hóu [xow˧˥]. Manchu heo was borrowed prior to that assimilation. In theory heo could have been borrowed into Jurchen when the Jurchen were under Parhae rule (698-926). Or it could have been borrowed from Khitan. (There's a possibility I've never seen raised before: how many Chinese loans in Jurchen had Khitan intermediaries? Cf. how early Chinese loans - 'Go-on' - in Japanese were probably borrowed from Paekche rather than directly from Chinese.)

But in any case, there would be no reason for the Jurchen to borrow what sounded like heo [xəw] to their ears as hao [χɑw] or hou [χou] (= Kiyose's hau/Jin's xao and Jin's xou).

Jin's reading xou sounds anachronistic, as it is a perfect match for modern standard Mandarin hóu [xow˧˥].

I cannot explain Kiyose's hau/Jin's xao, as there is no extant evidence for reading 侯 as *xaw in the northeast (or anywhere else) up into the Ming dynasty. The Chinese *-aw rhyme category has a distinct history and is not confused with the *-əw category in Manchu: *-aw corresponds to Manchu -oo and -ao, whereas *-əw corresponds to Manchu -eo. I assume Jurchen had the same pattern minus -oo (which is in borrowings predating *-ao > -oo in Manchu; Manchu -ao is in borrowings postdating *-ao > -oo).

If the Jurchen word for 'marquis' were hao, I would expect that word to undergo monophthongization and become Manchu hoo rather than the heo that is attested.

One could claim that Jurchen hao became extinct and that the word was reborrowed into Manchu as heo, but that still does not address the issue of why the Jurchen would borrow Chinese *xəw as hao [χɑw] instead of heo [xəw].

As tedious as this section may be, I think a lot of Jurchen character readings require this level of scrutiny and reevaluation.

2.11.20:31: Fortunately no arguments about Jurchen are likely to revolve around a single transparent loanword from Chinese, so there is little harm in reading the Jurchen character for 'marquis' as hao or hou, etc. Nonetheless I still think there is a need to reconsider even what may seem to be obvious in Jurchen reconstruction.

WHITE RAT 1.16

? qulugh ai nai sair par ? nyair

'white rat year, head month, ten six day'

1. Yesterday I was listening to the music of 岩崎文紀. I initially misread his name as Iwasaki Fuminori. His name is actually Iwasaki Yasunori. I've never seen 文 read as yasu before.

2.10.22:51: The Japanese name element yasu means 'peace' and is most commonly written

which are the first four Windows IME suggestions that are used in names.

保 doesn't represent a Chinese morpheme for 'peace(ful)', so I was surprised back in 1991 when I learned that 藤堂明保 was read Tōdō Akiyasu. But I suppose 'peace' is the intended result of safeguarding.

It's taken me a couple of days to figure out why yasu would be spelled 文 <WRITING> which is normally read fumi 'writing' in names. 文 <WRITING> is associated with 文明 <WRITING BRIGHT> 'civilization', and civilizations are supposed to be peaceful without the strife of barbarism. But I really have no idea why the composer has that unusual reading.

Japanese name laws restrict the kanji used in baby names but have no limits on how those kanji are read. lists very unusual readings of baby names with legal kanji. Some 北斗の拳 Hokuto no Ken fan (I'm guessing) named their son 北斗拳 Hotoke <NORTH DIPPER FIST> (< Hokuto Ken) which sounds like Hotoke 'Buddha'. The clipped readings ho and ke for 北 hoku and 拳 ken are nonce inventions.

A girl was named 帆都華 <SAIL CAPITAL FLOWER> Hotoke and her twin was named 都萌 <CAPITAL MOE> Tomoe with moe. ho and to are normal readings for 帆 and 都 which are not common name characters. The reading ke for 華 is rare: the one example that comes to mind is 法華經 Hokkekyō 'Dharma Flower Sutra' (i.e., the Lotus Sutra).

The creative use of kanji today may give insight into how Chinese characters were used to write non-Chinese languages in the past: e.g., what logic led to the readings of Khitan and Jurchen characters.

2. Irish parliamentary elections were held yesterday. Why does Irish stáisiún vótála 'voting station' have ú [u] instead of o or ó [o]?

2.10.2:23: vótála is the genitive singular of vótáil 'voting' from vóta 'vote' plus the verbal noun suffix -áil.

-áil also makes verbs out of nouns. Fun examples:

The Irish-language poet Cathal Ó Searcaigh in his poem "Cainteoir Dúchais" (published in 1997 in the collection Out in the Open) uses the following verbs, most of them probably nonce words:

I first learned of Harpic and used it when I lived in the UK. I might have seen Flash. I guess they're sold in Ireland too. I've never heard of Jeyes Fluid, Vim, or Windolene.

2.10.23:39: Wiktionary derives vóta from Latin vótum. But I thought v was only in modern loanwords (there was no v in the premodern Irish alphabet), so I had guessed that the word was borrowed through English rather than directly from Latin. How old are words like Vailintín 'Valentine' and Vulgáid 'Vulgate'? Are those modern respellings of old loans?

3. What should the virus in the news be called?

2.10.2:25: I've been thinking of it as the 'coronavirus', but it is just a coronavirus:

Coronavirus is the umbrella term for a large group of viruses, including ones that can cause the common cold.

Toyota was fortunate to have retired the Corona model name almost twenty years ago.

WHITE RAT 1.15

? qulugh ai nai sair par tau nyair

'white rat year, head month, ten five day'

Finally a Khitan numeral that everyone is certain about: tau 'five'. Scholars have guessed the Khitan numerals for 'one' through 'four' and 'seven' on the basis of ordinals and Mongolian, but ordinals are not necessarily like cardinals: e.g., English second is not cognate to two.

ordinal suffix type
Written Mongolian
mo ~ masqu
nigen (not cognate)
curer ~ jurer¹
cur? ~ jur?
qoyar (not cognate); cf. Janhunen's Proto-Mongolic *jiri/n
turer ~ durer¹
tur? ~ dur?
tadogh ~ todogh² C
X (= nil?)
jirghughan (not cognate)
nyêmder ~ nyêmirer
D, E
ishider ~ ishidegh
F, G
yesün (may not be cognate)

(I have only listed masculine numerals.)

And even when the two are cognate, one cannot always subtract an ordinal suffix to generate a cardinal: e.g., third³ minus -d is not three, and Khitan tadogh ~ todogh 'fifth' minus -dogh is not tau 'five'.

There is no single strategy for forming masculine ordinals in Khitan:

A. -qu may be an adjectival suffix also in liauqu 'red' and siauqu 'blue/green'.

B. -er is a masculine suffix also found in verbs.

C. -dogh may be unique to 'fifth', though it has a variant in 'ninth' (see G below).

D. -der looks like -d (like the noun plural suffix?) plus -er (see B above).

E. -irer may be from -ider (see F below) with -d-lenition.

F. -ider looks like -id-, an allomorph of -d- from -der (see D above) plus -er.

The noun plural suffix has no allomorph -id, so maybe -id- is not related.

G. -idegh combines -id- with a variant of -dogh (see C above).

2.9.23:51: Shimunek (2017: 230) analyzes 'ninth' as ishi-degh.

¹2.9.23:49: This numeral is an 'alternator' spelled with two different consonant letters. See eight approaches to alternation in these posts:

1-3 / 4 / 5-7 / 8

²Is todogh from tadogho with the a assimilating to the following labial vowels?

³It seems þridda underwent metathesis in Old English. Dutch derde 'third' apparently also independently underwent metathesis back on the continent, as Middle Dutch has dridde preserving the Tr-cluster still in German dritte 'third'.

WHITE RAT 1.14

? qulugh ai nai sair par ? nyair

'white rat year, head month, ten four day'

1. Big news, small script (from Andrew West):

This is a test page for a prototype font for the Khitan Small Script, with glyphs derived from a font designed by Jing Yongshi. The Khitan Small Script will be included in Unicode 13.0 (code charts) to be released in March 2020. The font uses OpenType features for automatic cluster formation.

I downloaded the font. It works!

2. This Khitan small script pendant looks fake to me because of the errors in it:

2.1. The


in regular script block 2 has a strangely written bottom half.

2.2. The


in regular script block 4 looks like Japanese モ and the Khitan small script character <ONE>.

2.3. The


in regular script block 5 resembles Japanese タ.

I am not an expert on Khitan small script calligraphic variation, but I have been unable to find the above characters in Starikov's (1982) catalog of variants which is not comprehensive.

2.4. The seal script text is a grammatical mess:

< di.en hong>

'lady-INS ror-GEN empe grant.PASS.CVB'

'having been granted by lady ror's empe'

2.4.1. The Khitan passive constructions I have seen have the structure


'(X) was V-ed Y'

without accusative -er.

In this case, <d.em> 'grant' is the verb. Perhaps <au.ui> 'lady' is the one doing the granting. If so, <er> is instrumental rather than accusative. But I don't know if 'by Z' is Z-er in Khitan passive constructions. Even if it is, other problems remain.

2.4.2. Presumably <hong di.en au.ui> 'emperor's lady' was intended, but the order of elements is wrong: the halves of <hong di.en> are inverted, and both halves follow rather than precede the possessed: <au.ui> 'lady'.

2.4.3. The inscription ends in a converb <ei> rather than a finite verb suffix. I would expect another clause after a converb: 'Having been granted by the emperor's lady, ...'

Whoever made this looked up words and did place the verb in the correct (final) position but doesn't fully understand Khitan grammar and is not familiar with the Khitan small script.

2.4.4. I don't understand Khitan society well enough to know whether an emperor's lady would be granting ranks (in this case, gim ngu ui shang sang gün 'imperial insignia guard superior general' in the regular script on the other side) or pendants. Or if that rank had a pendant.

3. One scenario for the spread of Semitic. I have no idea if it or any of these alternatives are correct:

There is no consensus regarding the location of the Proto-Semitic urheimat; scholars hypothesize that it may have originated in the Arabian Peninsula, the Levant, the Sahara, or the Horn of Africa.

I just like seeing scenarios expressed in map form. I wish I had seen the historical maps on Wikipedia when I was in school.

4. Today I found Richard Sproat's site. I know of Sproat as one of the trio famous for their opposition to regarding the Indus script as a script in a landmark 2004 article ("Hundreds of thousands of downloads since its publication"). I don't know why it took me so long to find his site. I've visited Steve Farmer's site from time to time and I think I've visited Michael Witzel's site at least once before. I just learned Witzel had studied Japanology at university!

5. This paper by my former student Mark Post and this presentation by Roger Blench make me think Pyu might not belong to Sino-Tibetan - a possibility that's been in the back of my head since I started working on Pyu five years ago.

Blench is blunt and right - this applies to Pyu too:

The classificatory tradition of Tibeto-Burman [i.e., Sino-Tibetan minus Chinese] studies, which can be traced back at least to Konow, is to assume affiliation based on geography and a few lexical similarities. In some cases, the argument is no more than 'my friend said so' or 'I had a brilliant student who'

Pyu was spoken in an area where Sino-Tibetan languages are now spoken, and it certainly has Sino-Tibetan vocabulary. But is that enough to make it a Sino-Tibetan language?

6. Today I finally got around to looking into the Wuhan dialect after constantly seeing the name in the news. I could go on and on about it, but for now I'll just say that it is a variety of Mandarin with a pre-Mandarin substratum lacking palatalization: e.g.,

7. Today I learned of this Chinese language vocabulary database with a radical index unlike any I've seen before (click on the 部首検索 tab, pick a radical, and then click the blue box with 部首検索 for search results).

2.8.23:48: The index contains many nontraditional radicals: e.g., the phonetic 冈 gāng which is not a radical in any list I've ever seen. The results for 冈 even include characters with similarly shaped components:

traditional radical
gāng, gǎng, gàng

gāng, gàng


𭯍 (written slightly differently)

Oddly 冈 gāng itself has no entry.

WHITE RAT 1.13

? qulugh ai nai sair par ? nyair

'white rat year, head month, ten three day'

1. I uploaded all the entries from White Rat 1.2 to this one using WinSCP thanks to David Boxenhorn:

I've been reluctant to upload partly because FileZilla has lately begun to crash almost every time I load multiple images. But I just uploaded 36 images without any issues in WinSCP. Looks like I'm going to switch.

2. David Boxenhorn also drew my attention to this tweet by Benjamin Suchard:

Proto-Semitic *ṣ́ (> Arabic ض) had some funky Old Aramaic reflex that was spelled with <q> but was distinct from /q/ < PS [Proto-Semitic] *q. Later, this sound merges with ʕ; the spelling lags behind but eventually gets updated to reflect this merger.

When I first learned years ago (from David, I think) that Aramaic ʕ was from PS *ṣ́ as well as PS *ʕ, I was baffled. But not knowing much about Semitic, I put the mystery aside. Now things are clearer. I see parallels with Chinese that I should have seen back then.

Baxter and Sagart (2014) reconstruct an Old Chinese voiceless pharyngealized lateral approximant *l̥ˁ which is similar to the PS emphatic lateral fricative *ṣ́. Who knows, maybe the Old Chinese consonant was a fricative like PS *ṣ́ in some environments and/or dialects. It is impossible to be certain about phonetic details of long-gone language stages.

Compare the development of *l̥ˁ in two types of Old Chinese dialects with the development of *ṣ́ in Semitic:

Eastern (i.e., coastal) Old Chinese: *l̥ˁ > *tʰ

cf. PS *ṣ́ > Arabic [dˤ] ~ [d̪ˤ] ~ [d̪ˠ] (modern pronunciations from Wikipedia)

Western (i.e., interior) Old Chinese: *l̥ˁ > *x ([χ]?)

cf. PS *ṣ́ > Old Aramaic [χ]? (spelled <q>) > later Aramaic [ʕ]

Eastern Old Chinese and Arabic hardened their unusual laterals (*l̥ˁ and *ṣ́) into stops, whereas Western Old Chinese and Aramaic backed them.

Western Old Chinese has no living descendants, though an important word from it marginally survives as a loanword in the east:

祆 'Ahura Mazda': Mandarin xiān [ɕjɛn˥] < Western Old Chinese *xen ([χen]?; cognate to Eastern Old Chinese 天 *tʰen 'sky'; both are from *l̥ˁin 'sky', and 祆 is written as 天 plus the religious radical 示).

What I still don't understand is how [χ] became [ʕ] in Aramaic. Maybe *ṣ́ voiced to [ɮˁ] (possibly also the pronunciation of <ḍ> in earlier Arabic) and backed to [ʁ] on the way to [ʕ]. [ʁ] was written as <q> since there was no letter for a voiced uvular <ɢ> or <ʁ>.

3. What is the origin of "the beatings will continue until morale improves"? The sentiment is old - pour encourager les autres is from Voltaire's Candide - but the exact wording seems to be recent. Google Books Ngram Viewer has no results for "beatings will continue until morale" until 1990.

4. Somehow I had gotten the mistaken idea that KOTONOHA, the journal of the 古代文字資料館 Ancient Writing Library, hadn't had an issue in a year or so. But now I see that there were twelve issues in 2019.

The second article in the final issue of 2019 (#205) is 吉池孝一 Yoshiike Kōichi's 『東アジアの諸文字と契丹文字』 (The Scripts of East Asia and the Khitan Scripts) which places the Khitan scripts in areal perspective. Yoshiike regards the Khitan scripts, the Jurchen (large) script, and the Tangut script as 擬似漢字系文字 giji kanjikei moji 'pseudosinographic scripts' as opposed to sinographic scripts directly derived from Chinese or nonsinographic scripts that are wholly unrelated (e.g., Sogdian, Phags-pa, and hangul). That might give the impression that Yoshiike agrees with Janhunen that the Khitan large script is a sister to the Chinese script, but Yoshiike agrees with the mainstream view that the Khitan large script was created in 920. Janhunen and I, on the other hand, think that the Khitan large script grew out of an earlier script.

Oddly Yoshiike views the Khitan language as a 方言 hōgen 'dialect' of Mongolian, though as he certainly must know, it is not mutually intelligible with Mongolian. (If it were, deciphering it would be much easier.)

5. A user-added word in Naver's dictionary: 핥 <> 'a shortened word of heart', added in 2014 - is that word really in use? WHITE RAT 1.12

? qulugh ai nai sair par ? nyair

'white rat year, head month, ten two day'

1. This morning I finally learned the Arabic spelling of Hoda Kotb: هدى قطب <hdá qṭb>. I was surprised by the <q>, as <q> is [ʔ] in Egyptian Arabic. Is the name from this noun? (Yes, according to "Her name Hoda means 'guidance' in Arabic and last name 'Kotb' means pole.")

Many years earlier I was surprised by Kotb, as I had first encountered her name in speech as [ˈkɒtbiː] and had assumed it was written with a final vowel. Or had I first encountered her name when she briefly spelled it Kotbe with an e?

She even changed her professional name at one point to "Kotbe" to help people understand its correct pronunciation: "COT-BEE."

"People kept saying 'buy a vowel,' so I stuck another one on," Hoda said. It didn't help, so she decided to just let her freak name flag fly.

2. The alif maqṣūrah (ى <á>) at the end of Hoda looks like the dotless final <y> of Egyptian Arabic. How did Egypt (and neighboring areas) come to have a dotless form?

3. And while I'm at it, why does Persian <y> have no dots in both isolated and final forms?

My guess is that Egypt and Persia retained the older dotless <y> even after the dots became obligatory elsewhere. But this makes it sound as if Egyptian dotless <y> might be an innovation:

The Arabic grammarians of North Africa changed the new letters, which explains the differences between the alphabets of the East and the Maghreb.

Is that passage referring to letters like Maghrebi ڢ <f> and ڧ <q>?

4. Why does Kashmiri have a ringed version of <y>: ؠ?

5. I just learned that Arabic vowel marks

were introduced, beginning some time in the latter half of the 7th century, preceding the first invention of Syriac and Hebrew vocalization.

I had assumed Hebrew vocalization had come first. Oops.

Initially, this was done by a system of red dots, said to have been commissioned in the Umayyad era by Abu al-Aswad al-Du'ali a dot above = a, a dot below = i, a dot on the line = u, and doubled dots indicated nunation. However, this was cumbersome and easily confusable with the letter-distinguishing dots, so about 100 years later, the modern system was adopted.

I don't know of any script in general use which uses color.

6. Is there a Mongolian term equivalent to rasm for dotless Mongolian script like


uqaghan <'UQAQAn> 'sense, meaning, knowledge'

from yesterday? I would write that with dots as



The presence of dots distinguishes <gh> from <q>. <A> is not as ambiguous as one might think since vowel harmony requires it to represent a in a word with q and/or gh.

7. Not rasm: a Qur'an manuscript from the 1st century AH with dots.

8. An obituary in today's Honolulu Star-Advertiser says "No koden".

It would be hard to guess the meaning of Japanese 香典 kōden 'condolence money for a funeral' on the basis of its parts: 香 'incense' and 典 ten (not den by itself!) with various meanings (and none are money-related).

典 ten is a postwar respelling of the uncommon kanji 奠 ten (also den by itself!), once again without any money-related meanings.

The -d- of -den reflects a nasalized vowel in *kaũ, the old reading of kō:

kaũten > kaũnden > kauden > kɔ̄den > kōden
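To make the ordering of those changes explicit, here is a toy Python sketch that applies each step as a plain string rewrite. The rule list and the labels in its comments are my own shorthand for the chain above, not a standard formalization; bare string replacement only works because the word is so short, and a real sound-change applier would need conditioning environments.

```python
# Ordered sound changes as (old, new) rewrites; each pair mirrors one ">" in
# kaũten > kaũnden > kauden > kɔ̄den > kōden. Labels are informal shorthand.
RULES = [
    ("ũt", "ũnd"),  # t voiced to d after the nasalized vowel, with a nasal onglide
    ("ũn", "u"),    # nasalization and the nasal segment lost
    ("au", "ɔ̄"),    # monophthongization
    ("ɔ̄", "ō"),     # raising of the long vowel
]


def derive(form: str) -> list[str]:
    """Apply RULES in order, returning every intermediate stage."""
    stages = [form]
    for old, new in RULES:
        form = form.replace(old, new)
        stages.append(form)
    return stages


print(" > ".join(derive("kaũten")))
```

Running it prints the same chain as above. Swapping the first two rules would block the voicing, since the nasal would already be gone: that is the point of writing the steps in order.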

9. I've struggled with the translation of Japanese 妖怪 yōkai. 'Supernatural creature' is too long. I guess I won't bother anymore given that the word seems to be slowly entering the English mainstream. BYU has a site devoted to 化物之繪 Bakemono no e (Illustrations of Supernatural Creatures, c. 1660). Bakemono hasn't penetrated English as much as yōkai has, but who can predict the future? I wouldn't have predicted in the 80s that I would see yōkai in English.

One recently appeared on American TV (link added):

After its appearance in an Asahi newspaper article, a photograph of BYU’s nurikabe made its way across the world wide web, inspiring new interpretations of nurikabe in art, comic books, animation, and a variety of other formats, including an appearance as an enemy combatant in the long-running Power Rangers television series.

10. What is the pen in penultimate (which I just heard on Gilmore Girls)? (It's cognate to patient, passive, and ... field!)

Wiktionary says,

the traditional English expressions for this idea were last but one and (less often) second last.

I've never heard of either.

I'm surprised there's no note about the common erroneous use of penultimate to mean 'ultimate' (which came up in the Gilmore Girls episode).

11. What is the arap in Daniel arap Moi? (I just read his obituary.) 'Son of' (presumably in Tugen), judging from this.

12. I use N3696 as a phonetic index for Jin's (1984) Jurchen character dictionary. I was having trouble finding the phonogram

for the transcription of Chinese 君 and 軍 until I went to and found that Jin's reading was [dʑyn] which is anachronistic and based on modern Mandarin jūn [tɕyn] rather than Ming Mandarin [kyn] without palatalization. I read it as gün [kyn].

The Khitan large script cognate of that character is


13. Today I realized that the versatile Jurchen phonogram

<her> ~ <u> ~ <hu> ~ <e> ~ <we> ~ <du> (Kiyose [1977: 65, 127])

might be from Chinese 右 <RIGHT>. The reading we [wə] resembles Late Old Chinese 右 *wɨəʔ. Perhaps Jurchen borrowed a version of 右 used by some other people centuries earlier to transcribe [wə] in another language.

Later, 右 was pronounced *wṵ in Early Middle Chinese and *wú in Late Middle Chinese. So in Parhae 右 could have been a phonogram for [u] (cf. the Sino-Korean reading 우 u south of Parhae).

Kiyose's <du> should be <u> if

hadu 'clothing'

is <CLOTHING u> (i.e., <hadu u>) rather than <ha du>. (The h- is unexpected, as it corresponds to zero in Manchu adu 'clothing'.)

I think Kiyose's <e> in

<RETAINER.we> buwe 'retainer'

is really <we>. <RETAINER> was originally read buwe by itself and then got a redundant <we> to indicate the last syllable.

<hu> might reflect a Late Middle Chinese reading of 右 as something like *xú: cf. Sino-Vietnamese hữu from a southern Late Middle Chinese dialect. The trouble is that I don't know of any northern dialect with a fricative in that word.

The one reading I can't explain is <her> (or <hel> or <here> or <hele>? - the Chinese transcription 黑勒 *hele is ambiguous). The Ming dynasty Bureau of Translators vocabulary has the entry

<? ?> 黑勒厄 *helee 'market' (36)

which is hard to interpret because there is no (obvious?) Manchu cognate except perhaps for heren 'corral, stable'. WHITE RAT 1.11

? qulugh ai nai sair par ? nyair

'white rat year, head month, ten one day'

1. The Khitan words for teens are all straightforward combinations of 'ten' + numeral unlike the para-Mongolic source of the '-teen' loanwords in Jurchen. The jury is still out on whether that source was a dialect of Khitan (i.e., mutually intelligible with written Khitan) or a sister of Khitan (i.e., another Serbi language). Different numerals alone are not enough to claim that source wasn't Khitan. Varieties of French with septante instead of soixante-dix for 'seventy' are still considered French.

2. The Honolulu Star-Advertiser reprinted a New York Times story mentioning a "Harmo Tang". I DuckDuckGoed (DuckDuckWent?) and found that spelling in the original, so it's not a secondary typo. "Harmo" is an unusual Mandarin name since the only permissible Xr syllable in names is er. (A famous example is 吾爾開希 Wú'érkāixī which is actually a Mandarinization of an Uyghur name ئۆركەش Örkesh, not a Mandarin name.) There are r-final words like 哪兒 nǎr 'where' and 那兒 nàr 'there', but they wouldn't be used as names or parts of names.

Could "Harmo" be a primary typo for "Hanmo" in the source article? But r and n are not close on keyboards.

Is "Harmo" an idiosyncratic spelling of standard Hamo? (Cf. the use of r in English Myanmar to indicate a final [aː]; there is no final [r] in the Burmese original [or Burmese at all].) But there is no common Mandarin name component ha.

Is "Harmo" a combination of a Mandarin surname and a non-Chinese name Mandarinized as Ha'ermo?

3. How was I unaware of Kryptos until two days ago? I never saw the text or creator Jim Sanborn's official site until today. The Puzzling site lists each of the four passages of ciphertext and the plaintext and decryption methodology for three of them. The fourth remains unsolved. Sanborn charges a $50 fee for responses to emails regarding the fourth passage.

What amazes me is how people were able to identify the borders between the passages. There are none in the ciphertext, though fortunately the first passage ends on line 2 and the second on line 12. The third ends with a question mark toward the end of the fourth line from the end.

While looking into Kryptos I learned how a Vigenère cipher works.
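For my own reference, here is a minimal Python sketch of the textbook Vigenère cipher, assuming a plain A-Z alphabet: each letter of the plaintext is shifted forward by the alphabet position of the matching key letter (A = 0, B = 1, ...), and the key repeats as often as needed. This is not necessarily the exact variant used in Kryptos, and the function name vigenere is just my own label.

```python
def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Textbook Vigenère cipher over a plain A-Z alphabet.

    Letters are uppercased; anything outside A-Z passes through
    unchanged and does not advance the position in the key.
    """
    shifts = [ord(k) - ord("A") for k in key.upper() if "A" <= k <= "Z"]
    if not shifts:
        raise ValueError("key must contain at least one letter A-Z")
    out = []
    i = 0  # position in the repeating key
    for ch in text.upper():
        if "A" <= ch <= "Z":
            shift = shifts[i % len(shifts)]
            if decrypt:
                shift = -shift  # Python's % keeps the result in 0..25
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


print(vigenere("ATTACKATDAWN", "LEMON"))                # LXFOPVEFRNHR
print(vigenere("LXFOPVEFRNHR", "LEMON", decrypt=True))  # ATTACKATDAWN
```

The same plaintext letter can come out differently depending on where it falls against the key (the first two T's above become X and F), which is what defeats simple letter-frequency counting.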

4. I've never seen the Russian name Илья Il'ya ('Elijah') spelled Illlya with three L's before.

Uppercase I and lowercase L look identical in a sans serif font. The difficulty of distinguishing them reminds me of the difficulty of distinguishing letters made up of sidün 'teeth' in the Mongolian and Manchu scripts.

An example I happened to encounter just now is


odqan <'UTQan> 'youngest'

in which <U> is a loop or belly (gedesün) like the first half of <T> and the second half of <T> is a 'tooth' like <A>.

Kara (2005: 93) gives the example


uqaghan <'UQAQAn> 'sense, meaning, knowledge'

with four letters made up of teeth: <Q> (two teeth; twice) and <A> (one tooth; twice). Final <A> and <n> can also look alike, but the readings <-AA> and <-nn> are impossible.

5. Kara (2005: 94) notes that in preclassical Mongolian orthography, <Uy> (normally for ü/ö) may represent u: e.g., ghurban 'three' was spelled as <QUyrbAn> (if I understand Kara correctly). That reminds me of how the same two letters <Uy> came to represent Manchu ū [ʊ] (not [uː]!).

There are also cases of the reverse: i.e., Mongolian ü spelled as <U> in <mUnKKA> rather than <mUynKKA> for möngke 'eternal'.

6. While copying the Sino-Jurchen vocabulary of the Ming dynasty Bureau of Translators, it occurred to me that the Jurchen phonogram


one of the first Jurchen characters I ever encountered, might be from a Parhae graphic cognate of Chinese 新 <NEW> used to write a para-Japonic cognate of Old Japanese nipi (later nii) 'new'.

But the trouble is that the Proto-Japonic word might have had *m-, judging from Okinawan mii- 'new'. *mi > ni is more likely than the reverse. Could *mi > ni have occurred twice, once in Japanese and again back on the peninsula in para-Japonic? Pushing believability even further, could *mipi have become *nii in para-Japonic centuries before Old Japanese nipi became nii?

7. Can cats recognize their own names? WHITE RAT 1.10

? qulugh ai nai sair par nyair

'white rat year, head month, ten day'

1. The Khitan large script graph 十 <TEN> looks exactly like Chinese 十 <TEN> but represents a native word par unlike Liao Chinese *shï.

I used to think the -hon/-hun of the Jurchen '-teen' words was from a Khitan or Khitan-like cognate of Proto-Mongolic *xarban with *-arba- becoming *-o- via *-aβa-, but that just won't work: Khitan preserved Proto-Serbi-Mongolic *p- instead of weakening it to *x-, and *-arba- : *-o- has no parallels in other Mongolic-Khitan comparisons.

Janhunen (2003: 399) reconstructed para-Mongolic *-kUn '-teen'. Manchu ᠵᠣᡵᡤᠣᠨ jorgon 'twelfth (month)' preserves the *stop that became a fricative in Jurchen

jirhon 'twelve' < *jir 'two' + *kon '-teen'.

(Earlier i assimilated to o in jorgon. The voicing of *-k- is irregular.)

2. Norman (2013: 216) says Manchu ᠵᠣᡵᡤᠣᠨ ᡳᠨᡝᠩᡤᡳ jorgon inenggi, literally 'twelve day', is 'the eighth day of the twelfth month'. I suppose the phrase is a contraction of *jorgon biya jakūn inenggi 'twelve month eight day'. There is no confusion with 'the 12th' which is ᠵᡠᠸᠠᠨ ᠵᡠᠸᡝ ᡳᠨᡝᠩᡤᡳ juwan juwe inenggi 'ten two day' (with the native words for 'ten' and 'two').

3. Today I learned of Assamese মান দেশ Mān desh 'Burma'. I assume Mān is from the first syllable of whatever pronunciation of မြန်မာ <mran mā> 'Burma' was current.

4. What is the etymology of the second half of Chinese 緬甸 'Burma' (read as Miǎndiàn in standard Mandarin)? I assume the first half is from <mran>. 甸 normally represents a word for 'suburb', but I think here it transcribes a foreign syllable like *den which is not in Burmese <mran mā>.

I don't know of any etymology for the word <mran mā> itself. Burmese roots are typically monosyllabic, but disyllabic <mran mā> is monomorphemic.

5. How did I not see the Plain of Jars until tonight? I never even heard of it until the end of last year.

Its Lao name is ທົ່ງໄຫຫິນ [tʰoŋ˧ haj˨˦ hin˨˦], lit. 'plain jar stone' = 'plain [of] stone jars'.

6. I wouldn't have imagined that Google Play has a page for 闘士ゴーディアン Gordian the Fighter (1979-81) in Amharic. Here's the English version.

7. I had heard of malacology but didn't know what it meant until tonight. I wondered if malaco- and mollusc were cognates. Wiktionary derives both from Indo-European *mel- 'soft'. WHITE RAT 1.9

? qulugh ai nai sair ish nyair

'white rat year, head month, nine day'

1. The Khitan large script graph


is more complex than Chinese 九 <NINE> and has no obvious Chinese graphic cognate. Could the Khitan graph have originated as a logograph for a non-Khitan word *ish in an earlier (Parhae?) script that was recycled for a (nearly) homophonous, unrelated Khitan word for 'nine'?

There is disagreement on what the Khitan word for 'nine' was and whether it is related to the Mongolic word:

For about a decade I've thought the word was is, but today I finally realized that Chinese transcriptions of words written with the character for 'nine' in the small script all contain *sh, not *s. So I now think the word was ish.

Kane and Shimunek regard the word as cognate to Written Mongol yesün, but Janhunen (2003: 399, 400) regards the reconstruction of Khitan is for 'nine' as "anachronistic". He reconstructs Proto-Mongolic *yer.sü/n 'nine'¹ with an *r in the root absent from Khitan. (But perhaps a Proto-Serbi-Mongolic *rs fused into sh in Khitan. However, I know of no other examples of Mongolic ye corresponding to Khitan i, so the match may still be loose and coincidental.)

Jurchen <NINE> uyewun is not graphically cognate to Khitan <NINE>, though it is certainly graphically cognate to Chinese <NINE> (Jin Chinese *giu):


¹I think Janhunen's *r is motivated by words for 'ninety' like Written Mongol yeren which can be analyzed as yer-en by analogy with jir-an 'sixty'², dal-an 'seventy', and nay-an 'eighty'.

²This numeral must date after the innovation of jir-ghu-ghan 'two'-'three'-(numeral suffix) (i.e., 2 x 3) for 'six'. If Jurchen and Manchu preserve the Khitan word for 'six' (or a close relative) in their words for 'sixteen' and 'sixteenth day of the first month' (nil-hun and niol-hun), an earlier word for 'sixty' might have been *nil-an. Could the Khitan word for 'sixty' written


<SIXTY> (large script) ~ <SIXTY> 266 (small script)

(masculine variants in the small script omitted)

be nilan? But there is no evidence for the readings of the tens in Khitan, let alone any evidence for a Mongolic-like X-an pattern for the tens in Khitan.

2. Children of empire are where you least expect them. Elizabeth Shepherd almost played Emma Peel on The Avengers:

She spent her first years in Burma (her first public appearances were performing Burmese dances at missionary meetings!), and came back to England to experience the wartime Blitz in London.

3. The etymology of Hawaiian hae 'flag'?

perhaps so called because a piece or torn [hae] tapa was used as a banner

4. The etymologies of the Hawaiian names of Kamehameha's British advisors:

[Isaac] Davis was given the Hawaiian name ʻAikake, after the way that the Hawaiians tried to pronounce Isaac, from /ˈaɪzək/ to /ˈaɪzɑkɛ/, Isaac"eh", to /ˈaɪkəkɛ/ (ʻAikake).

The Hawaiians gave [John Young] the name ʻOlohana based on Young's typical command "All hands (on deck)".

5. How did La Guardia learn German? Did his mother speak it?

His father, Achille La Guardia, was a Catholic from Cerignola, Italy, and his mother, Irene Luzzatto Coen, was a Jewish woman from Trieste, then part of the Austro-Hungarian Empire [...] He spoke several languages; when working at Ellis Island, he was certified as an interpreter for Italian, German, Yiddish, and Croatian.

6. The hospital that just got completed in ten days is named 火神山 Huǒshénshān 'Fire God Mountain'. Its soon-to-be completed sister is 雷神山 Léishénshān 'Thunder God Mountain'.

7. Ō NO

Alexander Zapryagaev:

An urge: please, if using “ou” to spell ō, add hyphens between o and u when they are separate! Otherwise, from recent: Inōe, Marunōchi, and, constantly in Sumō context, Shimanōmi.

I confess I use ou a lot for ō and ou, leaving it up to the reader to figure out which is [oː] and which is [oɯ], but I never write o-u [oɯ] across morpheme boundaries as ō. All three cases above involve two morphemes: no and a noun beginning with u.

8. I never thought about how Viking names were hibernized until I saw Sitric from Old Norse Sigtryggr. WHITE RAT 1.8


? qulugh ai nai sair nyêm nyair

'white rat year, head month, eight day'

1. The Khitan large script graph


is more complex than Chinese 八 <EIGHT>. Could the Khitan graph have originated as a logograph for a non-Khitan word *nyêm in an earlier (Parhae?) script that was recycled for a (nearly) homophonous, unrelated Khitan word for 'eight'?

I discuss the reconstruction of Khitan 'eight' here.

Jurchen <EIGHT> does not appear to be graphically cognate to Khitan <EIGHT>, though it might be derived from Jurchen <SEVEN>:


2. I just heard "Їдемо" by Хочу ЩЕ! on the radio. Its cover might lead some to think the song title was "Їдемо Україна/Yidemo Ukraine". Note how "Україна" is translated as "Ukraine" rather than transliterated as "Ukrayina".

Oddly when I searched for that cover, Google automatically switched to Russian even though I had input a string ("Їдемо Україна") with the Ukrainian letter Ї absent from Russian.

Is Їдемо with Ї- a dialectal form preserving the initial of Proto-Slavic *jĭdemŭ? The standard Ukrainian form is ідемо with i- like most Slavic languages.

-мо for the first person plural jumps out at me as a Ukrainian form, though I learned from De Bray (1980: ???) that -мо still marginally exists in Belarusian дамо 'we give' and ямо 'we eat' (but Wiktionary only lists ядзім with the regular -м ending).

-мо is also in Slovenian and Serbo-Croatian which do not subgroup with Ukrainian. It seems *-mŭ strengthened to -mo twice: at least once in the south (Slovenian and Serbo-Croatian) and once in the east (Ukrainian and Belarusian).

De Bray (1980: ???) also lists as a Ukrainian first person plural ending, though he gives no details on when to use it, and Wikipedia doesn't mention it.

ЩЕ [ʃtʃɛ] in the group name is closer to the second syllable of Proto-Slavic *ešče than the second syllable of Russian ещё [(j)ɪˈɕːɵ].

3. The last time I listened to a lot of Ukrainian pop music was eight years ago. Back then a lot of it was sung in Russian. Not anymore. Every song I've heard lately is in Ukrainian. So I was surprised to see a Ukrainian album with a Russian title and some Russian songs: Anna-Maria's Разные (Various).

4. Tonight I saw a TV report on a Hawaiian language celebration at Windward Mall.

5. While looking for an online reference to that event, I found this almost two weeks late:

Starting Wednesday, the University of Hawaii will offer free Hawaiian language courses to the public.

6. The oldest kanji in Japan? WHITE RAT 1.7


? qulugh ai nai sair ? nyair

'white rat year, head month, seven day'

1. The Khitan large script graph


is much more complex than Chinese 七 <SEVEN>. It just occurred to me that the former might have originated as a logograph for a non-Khitan word (*dalo or *dalu?¹) in an earlier (Parhae?) script that was recycled for a (nearly) homophonous, unrelated Khitan word for 'seven'.

Sometimes Khitan and Jurchen large script characters are graphic cognates, but not in this case - Jurchen <SEVEN> is simpler (yet still different from Chinese 七 <SEVEN>):

Maybe the Jurchen character originated as a logograph for a non-Jurchen word *nadan in an earlier (Parhae?) script that was recycled for a (nearly) homophonous, unrelated Jurchen word nadan 'seven'.

¹These guesses for 'seven' are based on Kane's (2009) and Shimunek's (2017) interpretations of


the stem of 'seventh' (not 'seven'!).

The phonetic value of 313 is uncertain. I am unaware of any alternations between 313 and, say,

<l.o> or <l.u>.

And I am also unaware of 313 (or any other small script character or character sequence) transcribing Liao Chinese *lo or *lu.

Kane and Shimunek's interpretations are based on an assumption that <da.313> corresponds to the dolo- of Written Mongol dologhan 'seven'. (The suffix -ghan is a Mongolic innovation absent in Khitan.)

I feel uneasy about writing dologhan, for as Janhunen (2003: 34) points out,

Since Written Mongol is basically a non-spoken language transmitted with the help of an abstract graphic code, it has strictly speaking no 'phonology' or 'pronunciation', though many Written Mongol grammars misleadingly include sections on such topics.

There are, however, conventions for pronouncing it, or else I wouldn't have been able to read it out loud for my Written Mongol class in graduate school. There is no consensus on how to pronounce it: e.g., Grønbech (the textbook I used long ago) reads ᠳᠣᠯᠣᠭᠠᠨ 'seven' as dologhan whereas Lessing reads it as dolughan. The spelling is ambiguous: the fourth letter could represent o or u.

Janhunen works around the problem with a strict transliteration ignoring those conventions: e.g., tuluqhav 'seven'. I confess that I am not comfortable with his system and that I have yet to work out an alternative of my own. Off the top of my head, I could transliterate the Written Mongol spelling of 'seven' as <TUlUghan>, using uppercase to indicate ambiguities:

<l>, <gh>, <a>, and <n> are unambiguous and hence in lowercase.
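The uppercase convention amounts to a tiny lookup table. Here is a Python sketch that expands <TUlUghan> into every reading it permits, assuming (as the uppercase implies) that <T> can stand for t or d and <U> for o or u. The table AMBIGUOUS and the function name readings are hypothetical, just for illustration. Filtering for an initial do-, which modern forms support, still leaves two candidates: the o/u problem in a nutshell.

```python
from itertools import product

# Hypothetical values for the uppercase (ambiguous) letters in <TUlUghan>;
# every lowercase letter is treated as standing only for itself.
AMBIGUOUS = {
    "T": ("t", "d"),
    "U": ("o", "u"),
}


def readings(translit: str) -> list[str]:
    """Expand an uppercase-for-ambiguity transliteration into all readings."""
    options = [AMBIGUOUS.get(ch, (ch,)) for ch in translit]
    return ["".join(combo) for combo in product(*options)]


all_readings = readings("TUlUghan")
print(len(all_readings))  # 8 (2 x 2 x 2)
# Keep only readings consistent with modern forms pointing to do- for the first syllable.
print([r for r in all_readings if r.startswith("do")])  # ['dologhan', 'dolughan']
```

The survivors are exactly Grønbech's dologhan and Lessing's dolughan.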

Modern forms clearly point to do- for the first syllable but do not unambiguously point to o or u, as their second vowels could be contractions of ogha or ugha:

I got those forms from Sanzheev, et al. (2015-2018), Этимологический словарь монгольских языков (An Etymological Dictionary of the Mongolic Languages) legally available for free on the site of the Institute of Oriental Studies of the Russian Academy of Sciences: vol. 1 / vol. 2 / vol. 3.

Sun's (1990) Mongolic comparative dictionary provides even more forms, but again none point decisively to o or u.

I favor dologhan on the basis of Phags-pa Mongol ꡊꡡ ꡙꡡ ꡖꡋ <do lo 'an>.

2. Last night I learned about a manga called 緋の稜線 Hi no ryōsen (The Scarlet Ridge). The Sino-Japanese word 稜線 ryōsen 'ridge' can be mechanically translated into Sino-Korean as 능선 nŭngsŏn or into the Sino-Korean-native hybrid 산등성이 sandŭngsŏngi. san is Sino-Korean 'mountain', and 등 tŭng is the native word 'back', but what is the native final element -sŏngi?

3. Last night I compiled this table of the distribution of Tangut Grade II tense vowel rhymes:





Last night, I proposed that *-aq2 merged with *-aq1. Perhaps there were similar mergers of

But why were there no mergers of

{u, a, y} do not form a natural class contrasting with {i, o}.

Why don't -eq1 and -enq1 exist? Did Grade I and II *-e(n)q rhymes merge in the opposite direction as some of the other vowels, and if so, why?

Why is enq the only possible tense nasal vowel? Is that an error in my transcription system? I would expect inq, anq, and onq as well, but they don't exist (anymore?). (-un is only in Chinese loanwords which wouldn't have tense vowels, and there is no -yn.)

4. After over a quarter century of interest in Vietnam, I somehow never heard of the San Diu or Ngái peoples until last night. How do they differ from the Chinese?

5. Last night I wrote about the second half of 鍋奉行 nabebugyō 'person who tries to control every step of the process when a group cooks hot pot at the table'. Now a few words about the first half. Wiktionary gives this etymology:

Originally a compound of Old Japanese elements 肴 (na, 'small snack, hors d'oeuvre') +‎ 瓮 (he, 'a pot or pan for holding food or beverages'). The he changes to be as an instance of rendaku (連濁).

Janhunen would like that etymology. He thinks Japonic originally had lots of monosyllabic roots before being reshaped to look 'Altaic'.

2.1.23:21: Despite the fact that the word nabe is attested in Old Japanese (as 名倍 nambəy [Man'yōshū 3824] < *na-nə-pai; *pai may be *pa-i, but I don't know of any cases of a root *pa without *-i) and not in Korean until modern times, I was surprised to see Martin (1987: 490) write, "Borrowed from/into Korean nampi?" Wiktionary says the word appears in Korean as 남비 nambi /nampi/ in 1938. That is the only form in Martin et al.'s (1967: 308) dictionary. The current form (standardized in 1988 according to Wiktionary) is 냄비 naembi /nɛmpi/ with fronting of the first vowel to assimilate to the second.

I was also surprised to see Martin (1987: 403) quoting Ōno's proposal of a possible derivation of he 'pot or pan for food or beverages' from Korean pyŏng 'bottle' - which is an unrelated borrowing from Chinese 瓶.

6. Jacques (2014: 2) regards Pumi as the closest relative of Tangut. Here is a set of correspondences that puzzle me:

Written Tibetan
ʒɛ⁵⁵ ʒə¹¹ (lĩ⁵⁵)
ʐɐ̂ 𗥃
ni¹³ nɯ¹¹ (lĩ⁵⁵)
nǒŋ < *nǐ-jôŋ
𗍫 1ny'4
n̥i⁵⁵ ȵĩ⁵⁵ tio⁵⁵
𗾞 2ny'4
nyi-ma 'sun'
mi⁵⁵ mi³⁵
̂ ~ m̥ə̂
𗼎 2mi4 (1st syl. of 'Tangut')
𗜐 1my'1


Proto-rGyalrong is a very conservative Sino-Tibetan language. Nonetheless the multiple correspondences of Proto-rGyalrong *i may indicate that its vocalism is simpler than that of the proto-language.

2.1.14:42: I used to think 𗇋 2mer4 < *RImejH 'person' was related to mi-words for 'person', and maybe it contains the same root, but its proto-vowel is not *i. So I've replaced it with 𗼎 2my4 < *2miH, the first syllable of 𗼎𗾧 2my4 2na'4 'Tangut' < 'person black'.

7. Tonight for dinner I ate



kaki soba 'oyster buckwheat noodles'

an example of the loose matching between Japanese spelling and Japanese words. kaki 'oyster' and soba 'buckwheat noodles' are both monomorphemic, yet they are each written with two kanji. 牡蠣 has as much to do with kaki as it does with the English word oyster: almost nothing.

The Chinese word 牡蠣 (Mandarin mǔ lì) may have originated as an attempt to write a conservative sesquisyllabic form for 'oyster' like Old Chinese 蠣 *mIrats in a period when a monosyllabic *m-less pronunciation of 蠣 was favored.

*mI- is an animal prefix that probably has nothing to do with the Old Chinese word *CAm(r)uʔ or *mAruʔ 'male (of animals)' written as 牡. But the animal association of the original referent of 牡 could have made it a favorable candidate for writing an animal name.

Could the unidentified minor syllable high vowel *I of *mIrats be *u as in 牡 *CAm(r)uʔ or *mAruʔ?

蕎麥 is a semiredundant compound: 麥 alone is 'wheat' and 蕎 is a kind of wheat. Neither half indicates noodles. The Japanese word 蕎麥 soba can refer to the buckwheat plant, and the spelling 蕎麥 was used without augmentation for noodles made of buckwheat. In theory a silent <NOODLES> could have been added to indicate when soba refers to noodles: 蕎麥麵 soba (noodles) vs. 蕎麥 soba (the plant).

2.1.14:55: Incredibly the earliest attestation of 牡蠣 I could find at Scripta Sinica was in 世宗實錄 Sejong shillok (Veritable Records of Sejong, 1454). That may simply reflect how 'oyster' isn't the sort of word that would tend to appear in the genres of the Scripta Sinica corpus. I doubt 牡蠣 was coined in 15th-century Korea and thereafter somehow crossed the sea to be adopted as the spelling of Japanese kaki. The word must have been around in East Asia before then.

8. I recently noticed Korean in the opening title of The King of Queens but couldn't read it until I took a screenshot tonight:

참피온 탁구장

chhamphion thakkujang

'Champion Table Tennis Facility'

That turns out to be a real business in New York City.

9. Ortodokse or Orthodhokse? The seal of the Orthodox Church of Albania has a Greek-influenced version of its Albanian name with fricatives instead of stops.

10. I've known about Dazai Osamu's Hashire! Merosu (Run, Melos!) for a long time but never saw it until tonight. It's so short!

11. I've long thought South Park would be hard to translate into Japanese. The Japanese episode titles are not much like the originals.

12. I heard "Bringing in the Sheaves" on Two and a Half Men and had to look up sheaves. WHITE RAT 1.6


? qulugh ai nai sair ? nyair

'white rat year, head month, ? day'

1. The Khitan large script has at least nine variants of <SIX>:

I don't know which one(s), if any, are considered 'correct'. Such an extreme degree of variation may imply that Janhunen (1994: 111) was right about the Khitan large script having a history.

I have followed Andrew West in choosing what I'll call the 'fuma' variant (because it contains lookalikes of the katakana フ fu and マ ma). I rely on his calendar for Khitan and Jurchen dates.

2. I've never seen what I assume to be an Italian verb embedded like this in English before (emphasis mine).

I remained folgorated by the sounds and the melodies of Giombini

Was the interview in English, or is the word an artifact of machine translation into English?

3. The Honolulu Star-Advertiser features a Japanese word once a week. This week's word was 鍋奉行 nabebugyō 'person who tries to control every step of the process when a group cooks hot pot at the table'. It is literally 'pot magistrate'; a bugyō "was a title assigned to samurai officials of the Tokugawa government in feudal Japan".

奉行 <RECEIVE EXECUTE> is an ambiguous spelling: it can either be read using later stratum (Kan-on) readings as hōkō 'receiving and executing a lord's orders' or using earlier stratum (Go-on) readings as bugyō 'magistrate' (i.e., someone who receives and executes a lord's orders).

I think 奉行 is the only common word in which 奉 is read bu. 奉 is read hō in almost all other Sino-Japanese compounds except the Go-on compound 供奉 gubu 'accompany as an attendant; court monk', a word not in Windows 10's IME.

1.31.1:00: 供奉 can also be read as a Kan-on compound kyōhō 'supply; accompany as an attendant'. Typing kyōhō in Windows 10's IME also doesn't produce 供奉, so I use the Mandarin IME to write it. I've been typing Chinese characters on computers for 25 years now, and it never gets any easier.

4. The Honolulu Star-Advertiser today says singer Olivia Thai "spoke three Chinese dialects until learning English in school". I'm guessing the three are Mandarin and her parents' native languages which aren't Mandarin.

1.31.0:16: IMDb says "her parents were born and raised in Vietnam". Five major types of Chinese are spoken in Vietnam: Cantonese, Hakka, Teochew (Chaozhou), Hoklo (Hokkien), and Hainanese. The last three are forms of Southern Min.

The Vietnamese Th- spelling for [tʰ] in her name should have made me guess her family had lived in Vietnam.

5. Looking at Xinlong Queyu ʂqa¹³ rdə⁵⁵ 'ten' made me realize my belief that pre-Tangut uvulars conditioned Grade II might be wrong. The Tangut word for 'ten' is

1084 2ghaq1

which is Grade I, not II, even though its pre-Tangut ancestor presumably had *q like its distant relative Xinlong Queyu.

Maybe my belief isn't entirely wrong. There is no Tangut rhyme *-aq2: i.e., a Grade II tense vowel. So I propose this backstory for 'ten':

*SVqa > *SVʁa > *Sʁa > *ʁʁa > *ʁʁaq > *ʁaq > *ʁaq2 > *ghaq2 > ghaq1

I've excluded the tone 1- since I don't know when it developed.

The story in words:

*-q- lenited to *-ʁ-. The minor syllable was reduced to *S-, which assimilated to *ʁ-, resulting in a tense consonant *ʁʁ- (cf. tense hh- in Middle Korean; the doubling is borrowed from the usual romanization of Korean tense consonants). The tension of this consonant spread into the vowel, and I write that tension as -q. The tension in the vowel became phonemic after tense *ʁʁ- became regular *ʁ-. Uvular-initial words with lower vowels like *a developed Grade II. Uvular *ʁ- merged with velar *gh-, so the grade was no longer predictable. Finally, *-aq2 merged with *-aq1, leaving a gap in the rhyme system.
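The story in words amounts to an ordered list of rewrite rules applied one after another. The Python sketch below reproduces the chain for 'ten'. It assumes my own ad hoc regular-expression encoding of each step, tailored to this one word and using V and S as the placeholder symbols from the chain. It is not a general model of pre-Tangut phonology:

```python
import re

# Ordered rewrite rules: (description, regex pattern, replacement).
# The patterns are hypothetical encodings fitted to this single derivation.
RULES = [
    ("*q lenites to *ʁ", r"(?<=V)q(?=a)", "ʁ"),
    ("minor syllable reduced to *S-", r"^SV", "S"),
    ("*S- assimilates to *ʁ-: tense *ʁʁ-", r"^Sʁ", "ʁʁ"),
    ("tension spreads into the vowel (written -q)", r"^(ʁʁa)$", r"\1q"),
    ("tense *ʁʁ- becomes plain *ʁ-", r"^ʁʁ", "ʁ"),
    ("uvular initial + lower vowel: Grade II", r"^(ʁaq)$", r"\g<1>2"),
    ("uvular *ʁ- merges with velar *gh-", r"^ʁ", "gh"),
    ("*-aq2 merges with *-aq1", r"aq2$", "aq1"),
]


def derive(form, rules=RULES):
    """Apply each rule once, in order, recording every form that changes."""
    stages = [form]
    for _description, pattern, replacement in rules:
        new_form = re.sub(pattern, replacement, form)
        if new_form != form:
            stages.append(new_form)
            form = new_form
    return stages


stages = derive("SVqa")
# Reconstructed stages get *, the attested Tangut form does not.
print(" > ".join(["*" + s for s in stages[:-1]] + [stages[-1]]))
# *SVqa > *SVʁa > *Sʁa > *ʁʁa > *ʁʁaq > *ʁaq > *ʁaq2 > *ghaq2 > ghaq1
```

Rule order matters. If the uvular-velar merger applied before the Grade II rule, the Grade II rule would never fire and the chain would end in *ghaq, with no grade marked at all.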

6. After over three decades I finally saw the lyrics of Falco's "Der Kommissar". The English sprinkled in it jumps out at me.

7. Tonight I found Suzuki Hiroyuki and Sonam Wangmo's "Lhagang Choyu Wordlist with the Thamkhas Dialect of Minyag Rabgang Khams (Lhagang, Dartsendo)" (2018) via the Wikipedia article on Choyu (Queyu). Here are examples of uvulars corresponding to Tangut Grade II:

I have omitted tones.

Normally Tangut Grade II derives from medial *-r- in forms with lower series vowels like *a, but there is no *-r- in the cognates of those words. Compare:

8. New word for today: anosmia (found after looking up Zicam, which was advertised on TV tonight).

WHITE RAT 1.5


? qulugh ai nai sair tau nyair

'white rat year, head month, five day'

1. Khitan large script <FOUR> doesn't look like Chinese <FOUR>, but Khitan large script <FIVE> does look like Chinese <FIVE>. Its phonetic value, however, is completely different: tau rather than ngu as in Liao Chinese.

Liao Chinese 五 *ngu 'five' is transcribed in the Khitan large script as 吾 ngu, a lookalike and soundalike of Liao Chinese 吾 *ngu 'I'. That usage of 吾 to write ngu may be a Khitan innovation, as 吾 was pronounced *ngo in the Middle Chinese known to users of predecessor scripts (Serbi and Parhae) in centuries past.

A typology of Khitan and Jurchen large script characters (types A-F below) can be built from five questions:

Does the character resemble a Chinese character?
Does the character have a Chinese-like reading?
What kind of Chinese does the reading resemble: Liao or Jin Chinese, Middle Chinese, or Late Old Chinese?
What is the probable source of the character: the Khitan or Jurchen Empire, or Parhae or Serbi?
Do the Khitan/Jurchen and Chinese characters represent semantic equivalents?
Parhae or Serbi

Examples of each type:

A. Khitan large script phonogram 吾 ngu < Liao Chinese 吾 *ngu (rather than Middle Chinese *ngo or Late Old Chinese *nga)

B. Khitan large script phonogram 何 ha < Late Middle Chinese 何 *ha (rather than Liao Chinese *ho)

C. Jurchen large script phonogram

<gai> [kaj]

< Late Old Chinese 可 *kʰajʔ (rather than Jin Chinese *ko [kʰɔ])

D. The Khitan large script phonogram 五 tau originated as a logogram for tau 'five' (which means the same thing as Liao Chinese 五 *ngu) and was then used to write any tau in the language. This exact usage cannot be a carryover from the Parhae script for some non-Khitan (Tungusic? Koreanic? Japonic?) language, since such a language wouldn't have a word like tau for 'five'. However, the general idea of using a character to write a non-Chinese word and all syllables sounding like that word is a carryover from the Parhae script.

If the Serbi word for 'five' was tau (as opposed to a more Mongolic-like *tabu) centuries ago when the lost Serbi script was in use, it is possible that 五 with the phonetic value tau is a carryover from the Serbi script. But I don't know whether *-b- was lost in 5th century Serbi - or even what the Serbi word for 'five' was. (Khitan and Mongolic mismatches in numerical words do not make me optimistic about guessing Serbi words on the basis of Khitan and Mongolic.)

E. The Jurchen large script phonogram

resembling Jin Chinese 不 *bu 'not' may be a derivative of 不 originally for a Parhae Koreanic cognate of Middle Korean ani 'not'. The character has nothing to do with negation in Jurchen; it is solely used to write Jurchen [an].

F. Khitan and Jurchen large script characters such as

Khitan <ai> and Jurchen <aniya> 'year'

that bear no resemblance to Chinese characters may be inherited from the Serbi and/or Parhae scripts rather than invented on the spot in the 10th and 12th centuries respectively. In Janhunen's scenario as I understand it, these characters are products of one or more alternate lines of evolution instead of conscious Khitan and Jurchen creations.

2. Last night I realized that Jurchen

<ge> /kə/

looked and sounded like the Old Japanese phonogram 居 (an adaptation of Middle Chinese 居 kɨə 'dwell'). Cursive forms of 居 resemble Jurchen <ge>: the bottom component 古 is abbreviated into a shape close to 土. The dot in the Jurchen form may indicate the omission of strokes in a cursive form (see Grinstead 1972: 58 on this practice in Chinese calligraphy). Is <ge> a type B character reflecting Middle Chinese pronunciation?

<ge> corresponds to 厄 in Ming Mandarin transcription after voiced segments. Maybe it was phonetically [ɣə] after voiced segments. (Cf. *k-lenition to [ɣ] in Middle Korean to the south of the Jurchen-speaking area.)

I would predict that <ge> was [kə] in initial position, but in the one text where it occurs in that position (Memorial XX; Kiyose [1977: 190]), it corresponds to Ming Mandarin 額 *ə. That might be a mistake by analogy with the reading of <ge> in other positions where it is more common.

Kiyose's transcription <ge> seems motivated by historical and comparative considerations rather than Jurchen synchronic phonetics. <ge> sometimes corresponds to standard Manchu ge: e.g.,

'husband': <> : Manchu eigen (Translators 292)

But note Translators 137:

'camel': <> : Manchu temen (not ˟temgen - but cf. Written Mongol temegen 'id.')

3. I'm still reading William C. Hannas' The Writing on the Wall: How Asian Orthography Curbs Creativity (2003). On page 230 is a great self-referential typo: psuedowords!

4. I was puzzled by -t in Tshobdun rGyalrong kə'ŋɢət 'nine' since other Sino-Tibetan languages lack it: e.g.,

But Jacques (2009: 158) explains that the -t of the Japhug cognate kɯngɯt 'nine' is by analogy with the adjacent numeral kɯrcat 'eight'. Similarly, -t must have spread from 'eight' to 'nine' in Tshobdun. I should have figured that out.

Strictly speaking, could the spread have occurred in a common ancestor of Japhug and Tshobdun? Hsiu (2020) subgroups Japhug and Tshobdun together with 'rGyalrong proper' in 'Core rGyalrong':

Core rGyalrong
rGyalrong proper
Southeastern Situ

The five branches correspond to Gates' (2012) five languages (except that Gates uses the term 'South-central' instead of 'Southeastern').

It just occurred to me that the spread could have occurred in Tangut. But Tangut 1gy'4 'nine' cannot be from *ŋgu(t)X which would have become ˟1gwy'4 with a -w- absent from the actual word for 'nine'. However, Tangut 1gy'4 'nine' could be from *ŋgotX. Perhaps Proto-Sino-Tibetan *-əw became

The trouble is that Hill (2019: 272) has established that the Old Chinese reflex of Proto-Sino-Tibetan *-əw is *-o, not *-u.

I could try to work around this problem:

In any case, Pyu and perhaps pre-Tangut upset an otherwise neat pattern. Pyu and pre-Tangut do not subgroup together, so their mid vowels are not shared innovations.

Comparative Sino-Tibetan is still in its infancy. The stories of even basic words like 'nine' are still largely unknown. I have concentrated on the problem of reconstructing its vocalism, but there are other issues as well: e.g., Chinese and Pyu point to a voiceless consonant, whereas the other languages point to a voiced consonant. (Old Burmese k- is from *g-.) I don't know how to reconcile the evidence for *k-type initials with the evidence for *g-type initials.

5. Another numerical puzzle: Hsiu's (2020) Proto-rGyalrong * 'two' looks like Tibetan dgu 'nine' (and in fact I typed 'nine' after * by reflex at first). I've never seen anything like * before. It would be nice if * matched the mysterious 'other' words for 'two' in Tangut, but no:

See Andrew West's 2011 article for more on the 'other' Tangut numerals.

The characters for the root loq 'two' contain the left and right sides of the character for the normal word for 'two'


1ny'4 (cognate to the other Proto-rGyalrong word for 'two', the pan-Sino-Tibetan word *k.nis)

which has long reminded me of a mirror-image version of the complex Chinese character for 'two', 貳.

6. I had forgotten that I had bookmarked Wang Feng's "Language Diversity and Human Diversity in Yunnan". I accidentally clicked on the bookmark today at lunch.

Slide 9: The complexity of the Bai script with its many semantic-phonetic compounds contrasts with the relative simplicity of the Khitan and Jurchen large scripts which do not seem to have any such compounds.

Slide 19: Wang's view of Bai as a sister of Chinese reminds me of Starostin's proposal that Bai is an offshoot of Chinese. But to me Bai has always seemed too different to be closely related to Chinese.

Slide 20: The shared Bai/Chinese sound changes that Wang proposes could simply reflect Chinese sound changes in Chinese loanwords in Bai. I would like to see these sound changes in probable native Bai words: i.e., Bai words without Chinese cognates. ("Probable" because those words could be borrowings from non-Chinese sources.)

Slide 25: The mismatches between the early Chinese transcription of Bai and Proto-Bai may suggest that the latter postdates the former: i.e., that Proto-Bai underwent sound changes not yet reflected in the transcription.

Slide 27: The comparison of Proto-Bai *dro4 and Chinese 石 'stone' has bothered me because Proto-Bai has a *-r- absent from Chinese. I reconstruct 石 in Old Chinese as *CiTak. There has to be an *i in the minor syllable to condition *-i- in Middle Chinese:

*CiTak > *diak > *dʑiak > *dʑiek

The theoretical Go-on reading of 石 should be jaku (from *dʑiak) but the actual reading is shaku. Perhaps shaku is from a *tɕiak that lost a minor syllable *Ni-:

*Nitak > *Nitiak > *tiak > *tɕiak

(in other dialects: *Nitiak > *Ntiak > *ndiak > *diak > *dʑiak)

But here's another scenario:

*Ridiak > *rdiak > *driak > Proto-Bai *dro4

Schuessler (2007) compared the Chinese word to Vietic, and indeed my *Ridiak does superficially resemble Vietic forms like Ruc latáː 'stone' (Nguyễn Phú Phong et al. 1998). The Ruc tone goes back to *-ʔ which isn't far from *-k. But I think the Chinese and Vietic forms are unrelated lookalikes, as the match is weak:

I had also considered the Chinese and Bai forms to be lookalikes, but given that the early transcription of Bai 'tiger' reflects a pre-Proto-Bai *la1 corresponding to Proto-Bai *lo1 (slide 25), maybe 'stone' was borrowed as pre-Proto-Bai *rdak or *drak, which became *dro4 in Proto-Bai.

Slide 27 (again): Wang regards Proto-Bai *the4 'iron' as being from "[t]he oldest layer" of Chinese loanwords. That layer cannot be very old, as *the4 resembles Late Old Chinese 鐵 *tʰet and not Early Old Chinese *HAlik or *CAl̥ik.

Bridging the Early and Late Old Chinese forms (the relative chronology is not entirely clear):

Secondary *l̥-scenario:

*HAlik > *HAlit > *HAlait > *Hlait > *l̥ait > *tʰait > *tʰeit > *tʰet

*H could have been *k: cf. Ruc klát 'iron', another version of this word which seems areal

Primary *l̥-scenario:

*CAl̥ik > *CAl̥it > *CAl̥ait > *l̥ait > *tʰait > *tʰeit > *tʰet

I favor the secondary scenario, as I'd like to think that all Old Chinese voiceless sonorants are ultimately compressions of earlier *CVC-sequences.

WHITE RAT 1.4


? qulugh ai nai sair ? nyair

'white rat year, head month, four day'

1. I have not commented on the Khitan large script characters 一二三 <ONE TWO THREE> which are self-explanatory and identical to Chinese 一二三 <ONE TWO THREE>. One might expect Khitan <FOUR> to look like Chinese 四 <FOUR>, but Khitan breaks the pattern with a near-lookalike of Chinese 卅 <THIRTY> (< <TEN> x 3).

The Khitan character has four lines and looks like a tally mark.

There is a tally mark-style variant of Chinese <FOUR>, but it is a stack of two <TWO> rather than a line with three intersecting lines: 亖.

I just realized Jurchen <FOUR> also has four lines, albeit in yet another configuration:

What I call Janhunen's question applies here:

If it was the aim to create a [Khitan] script distinct from the Chinese, why were not all [Khitan large script] characters consistently replaced or modified? (Janhunen 1994: 111)

Why create a new character for <FOUR> but not <ONE>, <TWO>, or <THREE>?

Janhunen's question also applies to the Jurchen (large) script: if it was the aim to create a Jurchen script distinct from both Chinese and Khitan, why carry over 一二 <ONE TWO> into Jurchen while only adding a stroke to 三 <THREE> and coming up with a new character for <FOUR>?

I like what I'll call Janhunen's solution: the Khitan and Jurchen large scripts are both offshoots of an earlier script or scripts that in turn are sisters of the standard Chinese script rather than deliberately engineered deviations from it. Perhaps Khitan large script <FOUR> is derived from the lost Serbi character for <FOUR>, whereas Jurchen <FOUR> is derived from a Parhae character for <FOUR>.

2. Shimunek (2017: 233) reads <FOUR> as dur or tur, assuming that it shares the same root as the ordinal numeral d/turər (m.) ~ d/turən (f.) 'fourth'. But I fear that 'four' may be to 'fourth' what English two is to second: i.e., not related.

Khitan 'fourth' is what I call an alternator: its initial consonant is spelled both <t> and <d>. The Mongolic side of Serbi-Mongolic has d- for 'four'. Is the Khitan initial /t/ or /d/ with allophony, or is it a third consonant without a character of its own? Perhaps:

Khitan small script spelling
<t> ~ <d>

Mongolic may have merged Proto-Serbi-Mongolic *t and *d into d, reducing a three-way opposition to two: d /t/ vs. t /tʰ/.

Shimunek's tur ~ dur is another example of the vowel merger that he proposed. Mongolic preserves the original vocalic distinction between 'three' and 'four':

Written Mongol
*gu-r [ɢʊr]
[ɢur] ghurban
*tö-r [tor]

Maybe Khitan 'three' was [ɢʊr] with a [ʊ] that had been demoted to an allophone of /u/ after uvulars unlike Proto-Serbi-Mongolic /ʊ/ which contrasted with /u/. Some sample syllables:

Written Mongol
*gu [ɢʊ]
/ɢu/ [ɢʊ] ghu
*gü [gu]
/gu/ [gu]

*tu [tʊ] /tu/ [tu]
*tü [tu]
/tu/ [tu]

Uvulars which were allophones of velars before low series vowels in Proto-Serbi-Mongolic became phonemic in Khitan.

3. Khitan small script character 057 from yesterday's post belongs to a 'family' of characters that may not have any phonetic common denominator:

<054 055 056 057 058 059 388>

The readings of 054, 055, 056 (a variant of 055?), 058, and 388 are unknown. I don't know why Kane assigns the mnemonic transliteration (not a reading!) mỉ to 058 which doesn't resemble any Chinese character pronounced mi. (Some mnemonic transliterations are based on graphic resemblances to unrelated Chinese characters.)

057 is <ho> and 059 is <uni>.

One frustrating thing about the Khitan small script is that similar-looking characters like these seem to have the same graphic elements for no reason. A Khitan might say the same thing about the Latin letters E and F. Or P and R. (Of course some resemblances are significant: e.g., C and G, I and J, U, V, and W.)

Khitan small script character numbers 001-378 from Chinggeltei et al. (1985) are widely used (though not universal). The numbering after that varies by scholar. I follow Wu and Janhunen's (2010) numbering which builds upon Kane's (2009) additions and has more numbers than anyone else's (459)¹. Kane assigned 379 and 380 to


which Chinggeltei et al. (1985) regarded as

a variant of 081 and a block <335.277>.

I wrote a five-part series on 380 <FORTUNE> and its variants last year. I accept Kane's interpretation of those characters.

Wu and Janhunen (2010) assigned the next available number (381) to

(function unknown²)

which corresponds to 379 in Chinggeltei (2010). Chinggeltei regards Kane's 379 as his 386 and does not assign a number to 380 (perhaps because he still regards it as a block of two characters).

¹One might think that Wu and Janhunen's (2010) list of 459 Khitan small script characters is complete, but there are at least 472 characters, and I don't know what I'm going to do about numbering the 13 characters not in Wu and Janhunen (2010).

All numbering systems for Khitan small script characters are arbitrary, but one is necessary because I need some way to refer to characters whose pronunciation is unknown without resorting to images. And I have switched to naming my images by numbers.

You can tell which images are old by their names. The earliest images are named using Kane's transcription system. So the image for 057 is called "BabelStone-small-xo.gif" rather than "BabelStone-small-057.gif" because it dates before the change in April 2014.

²Wu and Janhunen (2010) transliterate


as hong˟, but that does not mean 381 was necessarily read hong; the diacritic ˟ merely indicates a graphic resemblance to


which Wu and Janhunen (2010) read as hong. 381 and 075 could have readings as different as

057 ho and 059 uni.

Wu and Janhunen (2010: 43) list six cases of their use of the diacritic ˟ but exclude 381 hong˟.

I think the post-380 characters either represent rare syllables or are logograms for words that are not common in the known corpus (though they might have been common in everyday speech). Wu and Janhunen (2010: 40-41) identify twenty single-consonant characters. That may be a complete or nearly complete list. I would be surprised if there are more than a couple of single-consonant characters that have not been identified yet.

4. Last night I guessed that the Korean equivalent of Japanese 戰慄 senritsu 'shudder' would be 전률 <ch.ŏ.n r.yu.r> chŏllyul. But the actual word is 전율 chŏnyul. It doesn't seem possible to predict when combinations of Sino-Korean morphemes ending in /n/ and beginning with /r/ will surface as [ll] or as [n]. Compare:

韓 Han 'Korean' + 流 ryu 'flow' = 韓流 Hallyu 'wave of Korean pop culture popularity'

戰 chŏn + 慄 ryul = 戰慄 chŏnyul 'shudder'

The reading 慄 (r)yul is interesting because it corresponds to Middle Chinese *lit without any -u-like vowel. In the idealized Sino-Korean of  東國正韻 Tongguk chŏngun (Correct Rhymes of the Eastern Country, 1447), 慄 is 리ᇙ rírʔ which should correspond to modern Sino-Korean 릴 ril. But there is no such modern Sino-Korean reading: all the hanja read rírʔ in Tongguk chŏngun (栗凓慄鷅搮篥 - all with the same phonetic 栗 <CHESTNUT>) are now read 률 ryul, and the Sino-Korean reading 릴 ril does not exist. (In fact the only 릴 ril in Korean is a loan from English reel.) What happened?

There is no Korean-internal reason I can think of to shift -i- to -yu-. So maybe Korean borrowed from a Chinese variety with -yu- and the Tongguk reading is a 'correction'. Xiaoxuetang lists some modern southern varieties with labial vowels in 栗 'chestnut', but I doubt they are relevant:

There are some Hakka varieties with lut, but I think all those forms are borrowings from Cantonese.

The earliest reconstructible form of the word is *rik. I reconstruct *-k assuming that it is preserved in a loanword in the Kam-Sui language Then (lik 'chestnut') mentioned by Schuessler (2007: 352). (Later Chinese -it can be from either *-it or *-ik.)

5. 中山 Zhongshan Hakka is one of the Hakka varieties that seems to have borrowed 'chestnut' from Cantonese. Zhongshan "is one of a very few cities in China named after a person." What are the others?

6. I came across this while reading up on heterograms:

The New Persian term [گبر] Gabr (Zoroastrian) may have arisen "as a contemptuous term for the people who wrote [the Aramaic spelling] 'GBR' ' instead of [the native word] 'mard' " (Sims-Williams, personal communication; see GABR [link added] for other views), in which case it demonstrates a correct reading of the heterogram involved.

I vaguely recall that I thought there might be some connection between gabr and Arabic كافر kāfir, but Shaki (2012) points out that

[...] although Persians still fail to articulate some Arabic speech sounds properly, there is no unusual sound in kāfer [the standard Persian pronunciation of kāfir with lowering of short i to e] that would require phonetic modification. Moreover, although gabr has been sometimes used to denote infidel (kāfer) by semantic extension (e.g., Rūmī, Maṯnawī II, p. 287, v. 177; Ḥasan Rūmlū, ed. Navāʾī, I, p. 384; Eskandar Beg, I, pp. 85, 87), kāfer as a generic word could hardly refer to a specific revealed religion such as Zoroastrianism.

7. Why do I care about heterograms? I've been pondering whether to use the term to describe Tangut characters that might have originated as Khitan small script block-like representations of some other language ('Tangut B'): e.g.,

𗰗 1084 2ghaq1 'ten'

which might correspond to some Tangut B word written phonetically as


(pronunciation unknown) + (pronunciation unknown).

The Tangraphic Sea analysis of 1084 is unknown. If it were known, most would interpret it as a semantic compound analysis:

𘢰 as an abbreviation of character X having some semantic relevance to 'ten' +
𘤊 as an abbreviation of character Y having some semantic relevance to 'ten'

But I side with Kwanten's (1989) basic idea - the analysis might be understood as

𘢰 as pronounced in character X +
𘤊 as pronounced in character Y

I disagree with Kwanten on some major points. Kwanten seems to imply that Tangut was at least typologically 'Altaic', whereas Tangut (A) is clearly Sino-Tibetan. And in Kwanten's view, the Tangut script represents that 'Altaic'-type language, whereas in mine, it encodes a Sino-Tibetan language with spellings sometimes reflecting an unrelated isolate (Tangut B). To tie that example back to topic 6, 2ghaq1 has as much to do with 𗰗 as Middle Persian mard has to do with the Aramaic-based spelling <GBR>: nothing but convention.

8. Today while copying the Sino-Jurchen vocabulary of the Bureau of Translators, I came across the word

<MIRROR³.ku> 'mirror'

which Kiyose (1977: 111) read as bulunku and Jin (1984: 48) read as buneku. Kane (1989: 251) read the word as meleku in the Sino-Jurchen vocabulary of the Bureau of Interpreters.

What's going on there? Let's look at the Ming Mandarin transcriptions in the two sources:

And then let's look at attested 'modern Jurchen' forms:

Here's what I think happened:

The *m in the Interpreters transcription may be a misperception or approximation of a Jurchen [b]. (Maybe b really was [b] in that dialect rather than [p].)

Maybe b- > m- under the influence of the following [ŋ], but I doubt it. (But cf. Jurchen bonion > monion 'monkey' in which the nasal is closer to b.)

Jin's -n- is doubtful. I think he was influenced by how 弄 is now pronounced nòng in Standard Mandarin. It is an example of what I call n-eutralization: n-l merger in the direction of n. nòng may be a borrowing into Standard Mandarin from an n-eutralizing dialect. There are also l-eutralizing dialects. Compare these Mandarin dialects (I've left out the tones from Wuchang and Hefei):

弄 'play'
來 'come'
你 'you'

武昌 Wuchang (n-eutralizing)
noŋ nai
合肥 Hefei (l-eutralizing)
ləŋ lᴇ

³The first character

is not in any other known word and could be a logogram <MIRROR>.

9. Today while copying the Sino-Jurchen vocabulary of the Bureau of Translators, I came across the word

<ha.ji.ha> hajiha 'scissors'

The first character seems to be graphically cognate to the Khitan large script character <ha> and the Chinese character 何 which was pronounced *xo in Liao and Jin Chinese, not *xa which was the Late Middle Chinese pronunciation. My guess is that 何 was a phonogram for ha in the Parhae script that retained its value in the Khitan and Jurchen large scripts even though the vowel of the Chinese original had raised and rounded.

The standard Manchu word for scissors is hasaha. It is hard to reconcile hasaha with the Translators form hajiha and the Interpreters form transcribed in Ming Mandarin as 哈雜 *xatsa that Kane (1989: 582) interprets as hadza or haj(h)a. I would favor haj(h)a, as Jurchen probably did not have /dz/ or [dz]. I suppose one could reconstruct a common ancestor with a *z that became s in Manchu and j in Jurchen, but there is no other evidence for such a voiced fricative.

WHITE RAT 1.3


? qulugh ai nai sair ? nyair

'white rat year, head month, three day'

1. Shimunek (2017: 233) reads <THREE> as ɢur, assuming that it shares the same root as the ordinal numeral ɢurər (m.) ~ ɢurən (f.) 'third'. But I fear that 'three' may be to 'third' what English two is to second: i.e., not related.

His ɢur /ɢur/ corresponds to the Written Mongol root ghur. Proto-Serbi-Mongolic *gur [ɢʊr] had a uvular allophone [ɢ] of *g before the lower series vowel *u /ʊ/. In Khitan, that allophone became phonemic after *u /ʊ/, /u/, and /o/ merged into u (Shimunek 2017: 214).

That merger is similar to the merger of Jin Jurchen *u /ʊ/, /u/, and /o/ into u in the Ming Jurchen of the Bureau of Translators (Kiyose 1977: 41). (Note, however, that according to Kiyose, not all /o/ became u /u/ in the Bureau of Translators dialect. in initial syllables became e /ə/ rather than u /u/.) Oddly Kiyose's proposed merger in Jurchen  (between the 13th and 15th centuries?) long postdates the merger in Khitan (before the 10th century?). I would have expected the mergers to be more or less simultaneous as a Manchurian areal feature. The mergers require more study.

2. Last night I heard the word pleather on The Goldbergs and couldn't identify what the p-part was. I should have been thinking of a pl-part.

3. Last night I found that the unusual block in the 契丹小字研究 Qidan xiaozi yanjiu (Research on the Khitan Small Script) hand copy of the epitaph for Empress 仁懿 Renyi (?-1076) of the Khitan Empire (left) corresponds to the conventional block in the index of blocks on p. 200 (right).

<162-六-229-349> vs. <057-229-349>

The first block can't be read since it has a noncharacter: 六 is a Chinese or Khitan large script character¹, not a Khitan small script character.

The second block can be read, but how?

Kane (2009) would read it as <>. Kane assumes that some Khitan characters had inherent vowels. He mentions Nie Hongyin's suggestion of Khitan initial consonant clusters on p. 255, but does not seem to believe in the idea.

Shimunek (2017: 218-220), on the other hand, is comfortable with initial clusters, and would read the word as <>. The initial cluster /ct/ is unusual in East Asia and the 'Altaic' world, but is plausible from a global perspective. Similar clusters appear in, for instance, Russian чтение [tɕtʲenʲɪje] 'reading' and Czech čtvrt 'quarter'.

The modern Mongolic language Mongghul may not have that particular cluster (it's not in the list of clusters in Georg [2003: 293]), but Tibetan influence has led to clusters even in native words: e.g., rg- in rgon 'wide' (cf. Written Mongol örgen 'id.').

The vowel sequence a ... e violates 'Altaic' vowel harmony rules, though it is not impossible in the region: e.g., just this morning I copied the Manchu word daise-la-bu-ki 'substitute-VBLZ-CAUS-DES'. Such apparent violations need further examination.

1.28.22:48: Manchu daise 'substitute' seems to be a borrowing from a hypothetical Chinese *代子 'substitute' (a spoken word that wasn't preserved in the conservative written language?). The verbalizing suffix -la- converts nouns into verbs. (I couldn't find an appropriate abbreviation in the Leipzig glossing rules, so I made up VBLZ 'verbalizer' by analogy with NMLZ for 'nominalizer'.)

¹The function of 六 in the Khitan large script is unknown. Despite looking exactly like Chinese 六 <SIX>, it does not stand for the Khitan word for 'six', which is written with an entirely different character that has at least nine variant forms:

My guess is that 六 is a phonetic symbol pronounced something like Liao Chinese 'six' or like words for 'six' in some language of Parhae.

4. On Reba, a drawn-out pronunciation of the name Hart was described as having "two syllables": [hɑːːɹt]. I presume that was an artifact of the script, which might have had something like "Ha-art" (which is what the closed captions had). The overlong vowel was pronounced with a rising pitch.

I got the [ːː] symbol from the Wikipedia Estonian article which uses [ː] for long vowels and [ːː] for overlong vowels (pronounced with a falling pitch unlike "Ha-art").

5. Forty-five years ago today, Super Robot Mach Baron fought プレッシャーケルン Puresshākerun 'Pressure Köln'. That got me to look up Kölsch and find this unusual change:

As a typically Ripuarian phenomenon, [d] and [n] have changed into [ɡ] and [ŋ] in some cases, e.g. std. "schneiden, Wein", ksh. "schnigge, Wing".

1.28.0:01: [d] > [g] reminds me of t > k in Hawaiian.

6. When I first encountered Japanese 戰慄 senritsu, I thought it had something to do with fighting since 戰 is 'fight'. Then I learned 戰慄 meant 'shudder' and was puzzled by what 戰 was doing. Was 戰慄 originally 'shudder in battle'? Turns out that 戰慄 is a synonym compound 'shudder-shudder'. Schuessler (2007: 605) explains that

[A]s in many lgs., the word for 'war, battle' zhàn [the Mandarin reading of 戦] may be a semantic extension [of] zhàn 'tremble, fear' [...] The semantics are identical to Greek pólemos 'war' which is derived from a root 'tremble, fear' (Buck 1949; §20.13).

Apparently 'shudder' for 戦 only survives in 戦慄.

WHITE RAT 1.2


? qulugh ai nai sair ? nyair

'white rat year, head month, two day'

1. I've been writing Jurchen dates in a hybrid style (sexagenary month + numeral day) which gave each day a name that would be unique for five years and gave me an excuse to discuss Jurchen numerals. But this Khitan year I'm going to write both months and days in a mostly numerical style. I say 'mostly' because the first month is 'head month' rather than 'one month'.

There won't be another White Rat 1.2 for another sixty years, and I will be, um, gone by then, so these calendrical titles should be unique.
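The sixty-year figure is just the arithmetic of the sexagenary cycle: ten stems (paired into five colors) against twelve animals only line up again after their least common multiple. Here is a minimal sketch, assuming the common five-color scheme (wood = blue, fire = red, earth = yellow, metal = white, water = black) and taking the Gregorian year in which the lunar year begins. The function name and color labels are my own illustrative choices, not anything from a Khitan or Jurchen source.

```python
from math import gcd

ANIMALS = ["rat", "ox", "tiger", "rabbit", "dragon", "snake",
           "horse", "sheep", "monkey", "rooster", "dog", "pig"]
# Assumed five-color scheme, one color per pair of the ten heavenly stems
# (wood = blue, fire = red, earth = yellow, metal = white, water = black).
COLORS = ["blue", "red", "yellow", "white", "black"]

# Stems repeat every 10 years and animals every 12, so a given
# color-animal pair recurs only after their least common multiple.
CYCLE = 10 * 12 // gcd(10, 12)  # 60

def year_name(year: int) -> str:
    """Color-animal name of the lunar year that begins in Gregorian `year`.

    Assumes 4 CE began a cycle (甲子, 'blue rat' in this color scheme).
    """
    offset = year - 4
    stem = offset % 10      # heavenly stem index, 0..9
    branch = offset % 12    # animal index, 0..11
    return f"{COLORS[stem // 2]} {ANIMALS[branch]}"

print(year_name(2019))          # yellow pig
print(year_name(2020))          # white rat
print(year_name(2020 + CYCLE))  # white rat
```

Under these assumptions, no year from 2021 through 2079 is a white rat year, which is why each White Rat 1.2 title should only come around once in a lifetime.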

2. Shimunek (2017: 234) reconstructs Khitan 'two' as jur. It is unclear how the vowel of jur can be reconciled with the vowel of the root jir 'two' in Written Mongol jirghughan 'six' < jir 'two' x ghu 'three' + -PAn (lower numeral suffix). Shimunek does not reconstruct a Proto-Serbi-Mongolic word for 'two'. I presume that word was *j-r.

3. Yesterday it occurred to me that if fragments like this from the tomb of the first Khitan emperor (d. 926) could be dated, they might be the earliest surviving texts in the Khitan large script.

As far as I know, the earliest surviving dated Khitan large script text is the epitaph for 耶律延寧 Yelü Yanning from 986.

And the earliest dated Khitan small script text is the epitaph for 耶律宗教 Yelü Zongjiao from 1053.

宗教 now means 'religion', but I can't find any examples of the term in Scripta Sinica before the Yuan dynasty (i.e., after the Khitan Empire). So maybe the name was to be understood as a phrase 'ancestral teaching'.

Then again, Wikipedia says the expression is first attested with a narrow, concrete meaning (崇佛傳統及其弟子的教誨 'the tradition of Buddha-worship and the teachings of his disciples') in the Buddhist text 續傳燈錄 Xuzhuan denglu (The Lamp Record of Continued Biographies?) from the 10th century AD. But the seeming absence of 宗教 from 10th and 11th century secular texts makes me think the expression had not yet diffused widely when Yelü Zongjiao got his name.

4. Yesterday while copying the 契丹小字研究 Qidan xiaozi yanjiu (Research on the Khitan Small Script) hand copy of the epitaph for Empress 仁懿 Renyi (?-1076) of the Khitan Empire, I encountered the first Khitan small script block I've ever seen with this asymmetrical layout (15.1):



The trouble is that the character 六 under <c> isn't even a small script character. Is that an error in the hand copy?

5. 'Toothbrush' in Korean is 칫솔 chhissol < Sino-Korean 齒 chhi 'tooth' + -s- (genitive) + sol 'brush'. I wonder when it was coined - it's obviously modern, but is it pre- or postcolonial?

Most Korean compounds are etymologically 'balanced': a Sino-Korean morpheme is paired with another Sino-Korean morpheme, and a native morpheme is paired with a native morpheme: e.g.,

So mixed cases like chhissol stand out to me.

chhi 'tooth' is usually a bound morpheme, though Martin et al. (1967: 1653) say it is a literary word. Is it ever used outside these fixed expressions?

齒(를) 떨다

chhi-rŭl ttŏlda

'tooth(-ACC) shake (v.t.)' = 'grind teeth; stingy'

齒(가) 떨리다

chhi(-ga) ttŏllida

'tooth(-NOM) shake (passive of ttŏlda)' = 'teeth grind'

6. Japanese and Korean have pairs of transparently related intransitive and transitive verbs: e.g.,

I often rely on analogies with Japanese to function (if it can be called that) in Korean, but analogies only go so far. Notice I didn't translate ire- and tŭri-. ire- can mean 'put something in something' (i.e., make something enter something), but tŭri- does not. That meaning belongs to the unrelated Korean verb nŏh-.

A list of asymmetries like that would be useful for Japanese-speaking learners of Korean (and vice versa for Korean-speaking learners of Japanese - there may be some case where a Korean verb X and a derived verb X' correspond to a Japanese verb X and an unrelated Japanese verb Y, but I can't think of one).

7. Martin (1967: 337) reports the dialect form yŏh- for nŏh-. Do they go back to *nek-?

8. I discovered the spelling 這入る <CRAWL ENTER ru> for hair-u 'to enter' when looking up 入る (ha)ir-u 'id.' in Naver (see topic 5). 入る (ha)ir-u is ambiguous, but 這入る hair-u is not. 這入る only has 88,400 Google results. akipun explains that

もし、現代の本などで「這入る」と書いてあったら、古いスタイルで文章を書きたかった or 這って入るという意味で使っていると思います。

If 這入る is written in a modern book or the like, I think the author wanted to write in an old style or is using it to mean 'crawl and enter'.

(The "or" is not me; it's in the original.)

As one could guess from the above passage, the spelling 這入る <CRAWL ENTER ru> is etymological. The Middle Japanese collocation faf-i ir-u 'crawl-and enter-FIN' fused into a single word hair-u simply meaning 'enter'. Could the fact that the common verb wi-ru 'to exist' and ir-u 'to enter' became homophones have provided pressure for hair-u to replace ir-u? Probably not, as nonfinite forms of the two verbs are not homophonous, and the two had been homophonous for a long time before the rise of hair-u.

9. I was surprised to learn that DeBakey is Arabic (دباغي <dbʔghy>? - could Dabbāghī have been Anglicized via translation as Tanner?). Its De has nothing to do with French or Dutch de (as I should have guessed, since Bakey doesn't look French or Dutch).

Is there a word for pseudomorphemes in altered names: e.g., the O' of O'Dell < Odell which has nothing to do with Irish Ó (the name is English)?

10. Wikipedia has a useful list of don'ts with Arabic names: e.g.,

"Abdul" means "servant of the" and is not, by itself, a name. Thus for example, to address Abdul Rahman bin Omar al-Ahmad by his given name, one says "Abdul Rahman", not merely "Abdul". If he introduces himself as "Abdul Rahman" (which means "the servant of the Merciful"), one does not say "Mr. Rahman" (as "Rahman" is not a family name but part of his [theophoric] personal name); instead it would be Mr. al-Ahmad, the latter being the family name.

I've wondered if Paula Abdul's last name is an Americanization of `Abd al-something. Maybe the right word is Brazilification (?), as it turns out her father had immigrated to the US via Brazil.

11. Via the Arabic names article, a new term for today: theophoric name.

KHITAN NEW YEAR: THE 1100TH ANNIVERSARY OF THE KHITAN LARGE SCRIPT


? qulugh ai nai sair ? nyair

'white rat year, head month, one day'

This year the Khitan large script turns 1100, so I'll be using the Khitan large script for dates for the rest of the year.

The last three characters of the date are shared with Chinese (月一日), but the other four bear no resemblance to Chinese 白鼠年首 <WHITE RAT YEAR HEAD>. (Other Chinese equivalents exist, but they don't look like the Khitan large script characters either: e.g., 子 <CALENDRICAL.RAT>, 頭 <HEAD>, etc.) Why? As Janhunen (1994: 111) asked,

If it was the aim to create a [Khitan] script distinct from the Chinese, why were not all [Khitan large script] characters consistently replaced or modified?

I agree with Janhunen that the Khitan large script was not 'invented' in 920; it is an outgrowth of some earlier script, perhaps the fragmentarily attested Parhae script or the wholly lost Serbi script. And that script was a sister of the Chinese script with innovations absent from Chinese: <WHITE>, etc. I think the date 920 may refer to the revision of an earlier script for Khitan use.

The date itself may not be accurate, as the earliest dated Khitan large script text is the epitaph for 耶律延寧 Yelü Yanning from 986. Are there Khitan large script texts from 920-985 that have not yet been discovered, or was the script 'created' after 920?

Notes on the characters/words:

To tie up loose ends from last year, I've posted all the blog entries from between 1.2 and today:

YELLOW PIG 12/30

songgiyan uliya aniya

juwa juwe biya orin gusin inenggi

'yellow pig year, ten two month, thirty day'

1. Today I clicked on Andrew Hsiu's map of the Qiangic linguistic area. Tangut is to the north of it; the former Tangut capital is now known by its modern Mandarin name Yinchuan.

I care about the Qiangic linguistic area because the languages in it are Tangut-like. Whether they actually subgroup with Tangut is another matter. Jacques (2014: 2) thinks most of them do, so he places them in a 'Macro-rGyalrongic group'. The exceptions are:

2. Hsiu posits 'missing' Sino-Tibetan branches to serve as sources of Sino-Tibetan-like vocabulary in

To solidify the case for these branches, one would have to

3. When I heard the word petrol /ˈpɛtɹəl/ out loud on Magnum, P.I. tonight, I finally realized it's short for petroleum /pəˈtɹoʊliəm/. Duh. That isn't the first time it took me a long time to link two words whose relationship is obvious in spelling but not in pronunciation. I wish I could remember the last time that happened. I think it was sometime within the past few months.

Someone learning English as a foreign language and first encountering those words in print would immediately link them and face the different problem of pronouncing them differently: petrol is not /pəˈtɹoʊl/, and petroleum is not /ˈpɛtɹəliəm/.

4. Tonight - three days after I started reading William C. Hannas' The Writing on the Wall: How Asian Orthography Curbs Creativity (2003) - it finally occurred to me that literate Khitan would be interesting test subjects for his ideas about the effects of writing systems on thinking.

The Khitan had two scripts, and nobody really knows why. Andrew West's great essay on the mystery ends,

Both scripts are complex enough to require a considerable investment of time and effort to learn to read and write, so how is it possible that both scripts managed to coexist and flourish for so long ? Did the Khitan education system require students to learn both scripts, or were Khitan scholars only able to read and write one or other of the two scripts ? It makes no sense to me ...

... or me.

Let's imagine that Hannas could be sent back a thousand years to the Khitan Empire. Using his knowledge of Chinese, Japanese, and Korean, Hannas would be able to easily learn Khitan, an 'Altaic'-type language with many Chinese loanwords like Japanese and Korean. Hannas proposes that syllabic scripts without word division inhibit creativity. So in his framework, what effects would the Khitan scripts have?

A brief comparison:

Khitan script
word division?
not quite
not quite

The large script, despite its superficial similarity to the Chinese script, does not have a one-to-one correspondence between syllables and characters. Some syllables are written as two-character sequences: e.g., Han (the Chinese name 韓) as 何至 <>. Conversely, some disyllabic words are written as single characters: e.g., namur 'autumn' as 禾 (cf. Chinese 秋 <AUTUMN>).

The small script has a mixture of characters for single segments and syllables. The small script is more analytic than the large script which in turn is more analytic than the Chinese script:

small script > large script > Chinese script

And unlike either the large script or Chinese script, words are generally written as blocks - the first instance of word division in East Asia. The only exceptions to that rule are Chinese loanwords which are written as one syllable per block (not counting Khitan affixes added to those blocks): e.g., the disyllabic word hongdi 'emperor' from Liao Chinese 皇帝 *hongdi [xɔŋti] is written as two blocks

<075 037> <hong di>

rather than as a single block

<075.037> <hong.di>.

So if Hannas is right, small script users might be more inclined toward creativity than large script users, who would in turn be more inclined toward creativity than those only literate in the Chinese script.

YELLOW PIG 12/29

songgiyan uliya aniya

juwa juwe biya orin uyewun inenggi

'yellow pig year, ten two month, twenty nine day'

1. Last night I learned from the Korean Wikipedia that the eight trigrams have 二進法 ijinbŏp 'binary' equivalents.
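One way to make the trigram-to-binary correspondence concrete: treat an unbroken (yang) line as 1 and a broken (yin) line as 0, and read the three lines from bottom to top. That convention is my assumption for this sketch. Reading top to bottom, or making yin 1, would give different numbers.

```python
# Lines listed bottom to top; True = unbroken (yang), False = broken (yin).
TRIGRAMS = {
    "乾": (True, True, True),     # ☰ heaven
    "兌": (True, True, False),    # ☱ lake
    "離": (True, False, True),    # ☲ fire
    "震": (True, False, False),   # ☳ thunder
    "巽": (False, True, True),    # ☴ wind
    "坎": (False, True, False),   # ☵ water
    "艮": (False, False, True),   # ☶ mountain
    "坤": (False, False, False),  # ☷ earth
}

def to_binary(lines):
    """Binary string for a trigram: yang = 1, yin = 0, bottom line = leftmost digit."""
    return "".join("1" if yang else "0" for yang in lines)

for name, lines in TRIGRAMS.items():
    bits = to_binary(lines)
    print(name, bits, int(bits, 2))
```

Under this convention, the traditional order 乾 兌 離 震 巽 坎 艮 坤 comes out as the numbers 7 down to 0.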

2. I have no idea if this is a true explanation for the presence of 隹 <BIRD> in 進 <WALK.BIRD> for 'forward', but it's a useful memory aid:

A bird can only walk forward but not backwards, hence implying "forward".

3. How did Proto-Germanic *hw- become tsj- in West Frisian tsjil 'wheel'?

4. Wikipedia's discussion of the possible Indo-European origin of the Chinese chariot is a bit anachronistic:

However archeological evidence shows that small scale use of the chariot [in China] began around 1200 BCE in the late Shang dynasty. This corroborates the material spread of the invention from the Eurasian Grass-Steppe to the West, by Proto-Indo-Europeans (likely the Tocharians) who similarly have borne horse, agricultural, and honey making technologies through the Tarim Basin into China.

Proto-Indo-European speakers and Tocharians are not the same people. Proto-Indo-European had ceased to exist centuries before eastern Indo-European speakers might have introduced the chariot to China.

5. Today I found a Wiktionary entry for


<ro.mā> = 'Roman-GEN god principal' = 'principal god of the Romans' = 'Jupiter'

Is that a real Dzongkha expression? It looks like a nonce attempt to explain who Jupiter was rather than a name for Jupiter. I appreciate how Wiktionary contains entries for items absent from traditional dictionaries, but I draw the line at transparent phrases. And a Google search for that particular phrase only leads to that Wiktionary entry. (I'm not counting partial matches.)

It is strange that a Dzongkha description of Jupiter has an entry but that Tibetan ཕུ་བོ <> 'older brother' does not.

Oddly, STEDT doesn't have that Tibetan word either.

6. Today while copying the 契丹小字研究 Qidan xiaozi yanjiu (Research on the Khitan Small Script) hand copy of the epitaph for Empress 仁懿 Renyi (?-1076) of the Khitan Empire, I encountered the first Khitan small script block I've ever seen with three components in a row:

<244.172.339> <s.ugh.i> (12.1)

The index of blocks has a more conventional two-on-one form:

<244.172/339> <s.ugh/i>

(I use </> to indicate row breaks within a block.)
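The notation works as follows. A space separates blocks (as in <075 037> versus <075.037>), a period separates characters within a row, and a slash marks a row break within a block. The hypothetical helper below is my own illustration, not a standard tool. It parses that notation into nested lists so that the two layouts of 244, 172, and 339 can be compared directly.

```python
def parse_blocks(notation: str) -> list[list[list[str]]]:
    """Parse e.g. '<244.172/339>' into blocks -> rows -> character numbers.

    Hypothetical convention: ' ' separates blocks, '/' separates rows
    within a block, '.' separates characters within a row.
    Numbers stay strings so leading zeros like '075' survive.
    """
    body = notation.strip().strip("<>")
    return [
        [row.split(".") for row in block.split("/")]
        for block in body.split()
    ]

print(parse_blocks("<244.172.339>"))  # [[['244', '172', '339']]]
print(parse_blocks("<244.172/339>"))  # [[['244', '172'], ['339']]]
print(parse_blocks("<075 037>"))      # [[['075']], [['037']]]
```

The first result is one block with a single row of three characters. The second is one block with two characters over one. The third is two separate blocks.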

Which form is the one on the inscription?

I don't know what the word means. It was a hapax legomenon as of 1985. Have more attestations been found since?

7. Today I've been puzzled by the Sino-Tibetan word for 'horn':

I wish I knew the Pyu word for 'horn'.

Nathan reconstructs *əw for this correspondence:

OC *o : WT u : OB uiv·

I think OB uiv· was [əw], a direct preservation of Sino-Tibetan *əw that became modern ui [o] via *ow.

I reconstruct a root *rəw. That much seems certain. The rest, however ...

8. Nathan Hill (2019: 227) thinks dr- in Tibetan drug 'six' is from *kr- (cf. Old Burmese khrok· < *krəwk 'six'). *kr- > dr- would be a double assimilation in terms of place and voicing.

But Pyu has tr- (tru 'six'), and Tangut chh- in 𗤁 3200 1chhiw3 'six' may be from *Ktr- (cf. rGyalrong kətr- forms like lCogtse kətɽok; Jacques [2004: 296] reconstructed Proto-rGyalrong *kə-tɽɔk). Moreover, *kr- became khr- (Hill 2019: 221) in Tibetan khrab 'armor', so why would it become dr- in 'six'?

Might the Tibetan, Burmese, and Pyu initials all be simplifications of an earlier complex cluster *ktr-?

9. Nathan Hill (2019: 229) proposes that Written Burmese kuiy· [ko] 'body' may be a borrowing from Pali kāya- 'body' rather than a Sino-Tibetan word cognate to Tibetan sku and Old Chinese 軀 *CIkʰo (*HIko with a minor syllable initial conditioning aspiration?). But Luce (1981) lists kuiv· [kəw] as the Old Burmese spelling. Perhaps the Pali-like silent -y· in the modern spelling was an addition motivated by folk etymology. However, regarding kuiv· as native raises another unresolved issue: k- should be from *g- which doesn't match the voiceless stops in Tibetan and Chinese.

10. Burmese has [tɕ tɕʰ dʑ] but [ʃ] (not [ɕ]). What is the reason for this asymmetry?

11. The rGyalrongic Languages Database has two varieties called "Pho sul" in nearby locations: 蒲西 Puxi (a Mandarinization of Pho sul?) and a village called 斯遥吾 Siyaowu in 蒲西 Puxi. Wikipedia says there are "Phosul" varieties of both Khroskyabs and Horpa. Is there one Phosul language that has been classified two different ways or are there two Phosul languages? Jackson Sun (2000: 214) explains:

Puxi is one of the three townships in southern Rangtang County in which Shangzhai [Horpa] speakers dwell [...] Of the five villages within Puxi Township, Shangzhai is used in Dayili Village and those hamlets of Puxi and Xiaoyili Villages north of the Rangtang River, abutting Lavrung [Khroskyabs]-speaking hamlets across the river in the same villages. The latter language is distributed in Siyaowu Village also [...]

If I understand that passage correctly, a variety of Shangzhai Horpa is spoken in Puxi Village, and a variety of Khroskyabs is spoken in Siyaowu Village.

Horpa and Khroskyabs have different words for 'sleep'. Let's compare the "Pho sul" words for 'person' from the rGyalrongic Languages Database with some data from Jackson Sun (2018: 4) (sortable version at Wikipedia):

Puxi Village Phosul may be Horpa, as it has vdz- like Horpa languages, whereas Siyaowu Phosul may be Khroskyabs, as it has a palatal after v like Hbrongrdzong Khroskyabs.

The Khroskyabs and Horpa words for 'person' may be cognate to Tangut 𘓐 2541 2dzwo4 < *PIndzojH 'person'.

Jacques (2014: 206) only proposes pre-Tangut *-jok (= my *-I-ok) as a source of -jo (= my -o3 and -o4), but I wonder if pre-Tangut *-I-oj (equivalent to a nonexistent *-joj in Jacques' system) might be another source. Puxi Village Phosul and Stau -i seem like unlikely reflexes of an earlier *-ok.

12. Is Lai Yunfan's site the only website written in Wobzi?

YELLOW PIG 12/28

songgiyan uliya aniya

juwa juwe biya orin jakun inenggi

'yellow pig year, ten two month, twenty eight day'

1. Last night I got the copy of William C. Hannas' The Writing on the Wall: How Asian Orthography Curbs Creativity (2003) that I ordered on Yellow Pig 12/6. On Yellow Pig 12/1, I wrote my initial impressions based on a preview on my Kindle. I'm rereading the preview now. I'm not used to reading on paper anymore.

2. Last night I found Andrew Hsiu's Sino-Tibetan Branches Project for its Proto-rGyalrong reconstruction. Why does rGyalrong matter?

Proto-rGyalrong is an elegant marvel. It may be one of the most conservative reconstructable Sino-Tibetan meso-languages. It is clear that a reconstruction of Proto-Sino-Tibetan would definitely need to take Proto-rGyalrong into account, since Proto-Sino-Tibetan morphology, phonology, and lexicon would have looked very similar to those of Proto-rGyalrong. In order to understand how reflexes of highly eroded eastern Sino-Tibetan languages had gotten to where they are from Proto-Sino-Tibetan, it is crucial to consider Proto-rGyalrong.

Is rGyalrong the Sanskrit or Greek of Sino-Tibetan?

Hsiu's Proto-rGyalrong *k.tek 'one' is very much like my pre-Tangut *kVtek or *kAtik (formerly *kʌ-tek or *kʌ-tik in 2012 and *CV-tek in 2011).

3. Hsiu also has a page on Pyu. His 2018 Excel file incorporates data from my 2016 SEALS presentation on Pyu numerals. A paper on Pyu language history is on my to-do list.

4. I just found Hsiu's page illustrating his wave model of Sino-Tibetan. He places Pyu in his fourth wave, but I am hesitant to commit to such a detail.

5. I want to figure out where Pyu is in the comparative framework that Nathan Hill established in his landmark book The Historical Phonology of Tibetan, Burmese, and Chinese (2019).

Nathan wrote on p. 156,

Many features of [Old Chinese] loans into Vietic are not predictable on the basis of the Old Chinese source word in Baxter and Sagart's reconstruction; for example, Rục has at least -ə-, -à-, -a-, and -u- available as the vowel of the minor syllable (kəcáy 'paper', kàraŋ 'bright sunshine', kadɔːk 'nape of the neck', kumúa 'dance'), but these different vowels are not predictable on the basis of the Old Chinese forms (紙 tsyeX < *k.teʔ 'paper', 朗 langX < *k.rˤaŋʔ 'bright', 脰 duwH < *kə.dˤok-s 'neck', 舞 mjuX < *k.m(r)aʔ 'dance').

I first saw those comparisons six years ago, but it didn't occur to me until last night to compare Ruc minor syllable vowels with the minor syllable vowels that I would reconstruct for Old Chinese if I didn't know about Ruc:

gloss | Early Old Chinese | Middle Old Chinese | Late Old Chinese | Middle Chinese | Ruc | height match?
'paper' | *CIteʔ | *CItieʔ | *tɕieʔ | *tɕḭe | kəcáy | ?
'bright' | *raŋʔ | *raŋʔ | *laŋʔ | *la̰ŋ | kàraŋ | ?
'neck' | *CAdoks | *CAdoks | *doh |  | kadɔːk | yes
'dance' | *CImaʔ | *CImɨaʔ | *mɨaʔ | *mṵo | kumúa | yes

Notes on each word:

'paper': I know of no Chinese-internal evidence for the identity of *C-. Baxter and Sagart reconstruct *k- on the basis of Ruc.

A high vowel *I is needed to account for the Middle Chinese vocalism and the palatalization of *t. I don't know whether *I was *[i], *[ɨ], or *[u]. Ruc ə would seem to rule out *[u]. I don't know if Ruc has i in minor syllables; if it doesn't, Ruc ə might correspond to a Chinese *[i] or *[ɨ].

'bright': Baxter and Sagart reconstruct *k- on the basis of Ruc. I am unaware of any Chinese-internal evidence for a minor syllable. Early and Middle Old Chinese *CACa-sequences and *Ca-sequences can have the same reflexes in Late Old Chinese, so it's possible that Late Old Chinese *laŋʔ is from an earlier, Ruc-like *kAraŋʔ.

I cannot explain why Ruc kàraŋ doesn't have an acute tone corresponding to Chinese *-ʔ. Cf. the tone/*-ʔ correspondences in 'paper' and 'dance'.

'neck': Lenition in 建陽 Jianyang lo was conditioned by the vowel of a lost presyllable:

*CVd- > *CVl- > l-

I know of no Chinese-internal evidence for the identity of *C-. Baxter and Sagart reconstruct *k- on the basis of Ruc. *V had to be low *A since high *I would have conditioned the palatalization of *d.

'dance': I know of no Chinese-internal evidence for the identity of *C-. Baxter and Sagart reconstruct *k- on the basis of Ruc.

A high vowel *I is needed to account for the Middle Chinese vocalism. *CAmaʔ or *maʔ would have become Middle Chinese *mo̰, not *mṵo with a high vowel. Ruc enables me to identify *I as *u. I think Early and Middle Old Chinese had at least two kinds of high vowels in minor syllables: *i and *u. It is usually not possible to determine whether a minor syllable's high vowel was front or back, but this is a rare exception.

Another kind of rare exception involves *i before *a:

*CiCa > *Cia

*CuCa (and *CɨCa?) > *Cɨa

Contrast these two words for 'chariot' which are both written 車:

Early Old Chinese *tiqʰ(l)a > Late Old Chinese *tɕʰia > Mandarin chē

Baxter and Sagart (2014: 157) reconstruct *t.K- for cases of velars and uvulars palatalizing before nonfront vowels. But maybe such cases involved *CiK-.

Early Old Chinese *Cuq(l)a > Late Old Chinese *kɨa > Mandarin jū

Could *C- have been *t-?

See Baxter and Sagart (2014: 158) for the reasoning behind reconstructing a uvular.

The *-qʰ- ~ *-q- alternation is unexplained. If *Ci- were *ki-, perhaps *kik- > *xtɕ- > *tɕʰ-. *k-conditioned aspiration is reconstructed for Korean, and I have reconstructed it for Tangut as well.

There is no Chinese-internal evidence for a medial liquid, but if there was one, I think it would have to be *-l- which disappeared without a trace. On the other hand, Baxter and Sagart (2014) see *-r- as a possibility, but I think an *-r- would have conditioned retroflexion: *tiqʰr- would have become *tʂʰ- rather than *tɕʰ- in Late Old Chinese.

I recall that Pulleyblank thought this word might be a loan from Indo-European (cf. Proto-Indo-European *kʷékʷlos 'wheel', Tocharian B kokale 'cart, wagon', Sanskrit cakra- 'wheel'). But Baxter and Sagart's *t- doesn't match *kʷ-, though it might be the closest approximation of a foreign palatal *c- absent from Old Chinese.

Another possibility is that the Chinese forms were something like *kiqʰla and *kuqla. But why would the Chinese borrow a foreign labiovelar as a uvular if they already had labiovelar *kʷ in their own language?

Here are revised reconstructions incorporating features from Ruc:

gloss | Early Old Chinese | Middle Old Chinese | Late Old Chinese | Ruc | height match?
'paper' | *kIteʔ | *kItieʔ | *tɕieʔ | kəcáy | ?
'bright' | *kAraŋʔ | *kAraŋʔ | *laŋʔ | kàraŋ | yes
'neck' | *kAdoks | *kAdoks | *doh | kadɔːk | yes
'dance' | *kumaʔ | *kumɨaʔ | *muaʔ | kumúa | yes

The Ruc forms seem to have been borrowed between the Middle and Late Old Chinese stages. They have a mix of old and new features:

6. While using BabelMap to type Pho sul βjot 'eight' last night, I discovered the character Ꞵ (U+A7B4 LATIN CAPITAL LETTER BETA). What languages are written with it? lists none.

7. Today I discovered that both my 2012 sketch of pre-Tangut and Sofronov's 2012 reconstruction of Tangut rhymes are online at

YELLOW PIG 12/27

songgiyan uliya aniya

juwa juwe biya orin nadan inenggi

'yellow pig year, ten two month, twenty seven day'

1. Thoughts today while typing and handwriting the Sino-Jurchen vocabulary of the Ming dynasty bureau of translators:

1a. Jin (1984: 159) identified the phonogram

<gai> [kaj]

as being derived from Chinese 可 when it was read *ka (*kʰa to be more precise). But if the Jurchen script was invented c. 1119, long after 可 came to be read *kʰo in northern Chinese, how would its creator(s) know of the old reading *kʰa? This archaism hints at older roots for the Jurchen script. In Late Old Chinese, 可 was read *kʰaiʔ. Perhaps the origin of <gai> goes back to a pre-Jurchen script in which 可 or a derivative was used to write [kaj]. The trouble is that the earliest (?) of the northern scripts, the lost Serbi script, is from the 5th century AD after *-ai shifted to *-a in Chinese.

1b. Why does the Jurchen phonogram


have what looks like

<BRUSH> pi (< graph and word from Chinese 筆)

on the right side? And what is the function of the element resembling Chinese 亻 <PERSON> on the left?

1c. Jurchen aliku 'platter' was miswritten as


as if it were alin 'mountain'. Presumably the unknown correct spelling has two characters <ali.ku>. But which of these <ku> is the proper <ku>?

In theory the unknown character could even be a fifth <ku> that has not yet been discovered. There is no guarantee that all Jurchen large script characters have been found. (Almost none of the Jurchen small script characters have been found except for these six in two blocks:

. Assuming the Jurchen small script had roughly the same number of characters as the Khitan small script, I presume there were a few hundred Jurchen small script characters.)

1d. The Jurchen phonogram


resembles Chinese 千 <THOUSAND>, so for a second I thought it might have originated as a graph for a me-something word for 'thousand' resembling Jurchen minggan 'thousand' in some language in Parhae. But then I thought it might be a simplification of the right side of Liao or Jin Chinese 脉 *mai.

2. Today I was reading about Mary Callahan Erdoes. How is Erdoes pronounced in American English? It looks like an Americanization of Hungarian Erdős [ɛrdøːʃ] (as in the Erdős number). I associate oe with German ö and not Hungarian ő, but I just learned that óe with an acute accent is a historical spelling of ő in names. Was the name spelled with an acute accent as Erdóes in Hungary? I only found a single Google result for Erdóes.

3. The late Paul Erdős

would offer payments for solutions to unresolved problems. These ranged from $25 for problems that he felt were just out of the reach of the current mathematical thinking (both his and others), to several thousand dollars for problems that were both difficult to attack and mathematically significant. There are thought to be at least a thousand remaining unsolved problems, though there is no official or comprehensive list. The offers remain active despite Erdős's death[.]

What would a list of unsolved linguistic problems be like, and how much would each problem be worth? Naturally I first think of Pyu and TJK (Tangut/Jurchen/Khitan), but other possibilities include the Voynich manuscript, Linear A, rongorongo, etc.

4. Erdős had his own personal vocabulary.

5. Timothy Gowers in a review of a book by Terence Tao:

It has been said that David Hilbert was the last person to know all of mathematics

Is it possible to 'know all of linguistics'? I vote no.

6. Speaking of knowing, Hilbert's epitaph is a response to ignoramus et ignorabimus 'we do not know and we shall not know':

Wir müssen wissen. 'We must know.'

Wir werden wissen. 'We will know.'

I wish I could say we will know how the TJK scripts work. I want to believe there is some reasoning that has eluded us. But what I want and believe is not necessarily what is real.

Hilbert was speaking of mathematics. Here's a quotation in a similar vein about decipherment:

Any possible system made by a man can be solved or cracked by a man.

- Yuri Knorozov, 1998

I didn't learn of him until the year after his death. It's been over twenty years since I read Breaking the Maya Code, a gift from my Russian professor, James Brown. I should read the new edition I got a few years ago.

7. I remember the dark hour when Russian might have been eliminated from the University of Hawaii (despite Russia's Pacific presence!):

But James Brown, a professor of Russian, said just because a subject is not popular now does not mean it is not needed.

"There's something to be said about providing students with what they want, but it can become ludicrous to the point where you provide only that"; Brown said. "You end up having just one flavor of things."

Obviously I wanted (and took) Russian.

Today Russian is still around at UH. But where is Prof. Brown?

8. In the final episode of Reba, the title character coined luffle (sp.?) from loving couple. What's interesting is the [f] in the middle: it's a fricative like [v] but voiceless like [p].

YELLOW PIG 12/26

songgiyan uliya aniya

juwa juwe biya orin ninggu inenggi

'yellow pig year, ten two month, twenty six day'

1. I've been playing 宇宙からのメッセージ・銀河大戦 Uchū kara no messēji: ginga taisen (Message from Space: Galactic Wars, 1978-79) in the background while working. A name in the ending credits caught my eye: 高梨 曻 Takanashi ?. I had never seen the third character 曻 before and couldn't find it anywhere until today when I figured out that its radical according to Unicode was 曰 <SAY> rather than 日 <SUN> and was finally able to find it in Andrew West's BabelMap.

I guessed that 曻 was an alternate spelling of the common name 昇 Noboru 'rise', and Wiktionary confirms my guess. 曻 is a Japanese-only character with the same readings as 昇: Sino-Japanese shō and native Japanese noboru.

昇 is a semantic-phonetic compound <SUN.stəŋ>: the phonetic 升 (Old Chinese *stəŋ) is a drawing of a container (in Old Chinese, 'container' and 'to rise' were homophones both written as 升), and 日 <SUN> (something that rises) was added as a disambiguator.

The top element of 曻 should also be 日 <SUN>, but the character is in the 曰 <SAY> block of characters in Unicode, and I think that's a mistake.

The bottom element of 曻 is 舛 <OPPOSE> which sounded nothing like 升 in Old Chinese:

But 舛 and 升 are graphically similar, so in Japanese, 舛 (also with an optional 木 <WOOD> radical: 桝) came to be an alternative spelling for the native word masu (a unit of measurement) written 升. So 舛 came to replace 升 in 曻. And 舛 <OPPOSE> with its original meaning is so rare in Japanese that few would perceive any negative connotations in 曻.

Shpika stats (plus 漢検 Kanken levels added 1.21.19:01):

1 (!)

曰 <SAY> in modern Japanese is almost wholly in the archaic expression 曰く <SAY.ku> iwaku 'sayeth'.

(1.21.19:03: 曰 is at the highest Kanken level, which makes no sense given its relative frequency and the fact that every high school student in Japan encounters it during the required study of Literary Chinese.)

(1.21.20:11: I would expect 曰 to be a level pre-1 character. Only characters required in school can be at levels 2 or lower. Pre-1 characters are relatively common but not required, whereas level 1 characters are rare. 曰 is encountered in school but is not on the must-learn jōyō kanji list.)

I'm surprised 桝 is more common than 舛, which I've encountered in the name 舛田 Masuda (a name I learned from 舛田利雄 Masuda Toshio on the staff of Space Battleship Yamato). I've never seen 桝 before. Wiktionary says some strange things: that 桝 is a postwar simplified form of 枡 (but 桝 has more strokes!) and means 'measuring box' in Chinese (even though I thought 舛 = 升 is a Japanese-only equation).

I just learned that 升 has a new modern reading: チート chīto 'cheat', based on the coincidental similarity of the katakana to 升. There is no graphic relationship between the katakana and 升:

2. New words I encountered today:

2a. Redology (紅學 - not Erythrology?)

(often humorous) added to an ordinary English word to create a name for a (possibly non-existent) field of study.

2b. logy, the sister of ism (with an unrelated homograph)

2c. pseudepigrapha (not pseudo- ... or ... -ia!)

2d. anapodoton

2e. anacoluthon

I've known of all of those things but didn't have names for them until now.

3. I did not, however, know of the Codex Amiatinus until today.

The Codex Amiatinus is the earliest surviving complete manuscript of the Latin Vulgate version of the Christian Bible.

Although it is named after the Italian mountain where it was found, it

was produced around 700 A.D in the north-east of England, at the Benedictine monastery of Monkwearmouth–Jarrow in the Anglo-Saxon Kingdom of Northumbria and taken to Italy as a gift for Pope Gregory II in 716.

More new words (in bold):

A little space is often left between words, but the writing is in general continuous. The text is divided into sections, which in the Gospels correspond closely to the Ammonian Sections. There are no marks of punctuation, but the skilled reader was guided into the sense by stichometric, or verse-like, arrangement into cola and commata, which correspond roughly to the principal and dependent clauses of a sentence.

Today, colons and commas have different referents (and regularized English plurals).

4. I never heard of the acronym TRO until today.

YELLOW PIG 12/25

songgiyan uliya aniya

juwa juwe biya orin shunja inenggi

'yellow pig year, ten two month, twenty five day'

1. For years, I thought

<ca> (a transcription of Liao Chinese 察 *cha in line 1 of the 耶律昌允 Yelü Changyun epitaph [1062])

was unique to the Khitan large script, but today I learned that it looks like a Tang dynasty (i.e., pre-Khitan Empire) variant of the Chinese character 司. There is even a variant of the derivative character 詞 <SPEECH.司> with a <ca> lookalike on the right side. But ... 司 was pronounced *sï in Liao Chinese. Not very much like ca. So why is a lookalike of a variant of 司 a phonogram for ca? The odds of any Khitan large script character being pronounced approximately like its Chinese lookalike are low, though not zero, as there are some Khitan large script characters that have Liao Chinese-like readings (minus tones, of course): 太 tai (but also dai!), tên, shui, ngu, cï, 皇帝 hongdi, ging, sheu, ong, etc. But one must be on guard, because many other Khitan large script characters are false friends: e.g.,

2. Another variant of 詞 <SPEECH.司> is 𧥝 <SPEECH.𠃌> with the phonetic reduced to 𠃌. The current simplified character for standard Mandarin 詞 'word' is 词, but that could be reduced even further to three strokes: ⿹𠃌讠.

3. Wiktionary regards 𠃌 as a component in the Korean phonogram 㔖 <ka.k> kak, but the bottom component is in fact the hangul letter ㄱ <k> which is never written with a hook. The top part is the Chinese character 加 <ADD> which is pronounced ka in Korean.

4. Peter Golden's An Introduction to the History of the Turkic Peoples (1992) brings up the ever-vexing problem of consistency in writing different languages in the Roman alphabet: e.g., <ł> represents both Polish [w] and Armenian ղ [ʁ]. Historically, ղ was a velar lateral [ɫ] (which is what I think Tangut /l/ was).

Classical Armenian had a distinction between velar ղ /ɫ/ and 'regular' լ /l/ absent from Proto-Indo-European? How did that develop? Wikipedia's article on 'Proto-Armenian' (more like 'pre-Armenian'; cf. my 'pre-Tangut') doesn't have the answer yet. But it listed a few words with both liquids:

Velar /ɫ/-words:


I don't see any pattern that would enable me to predict when foreign l was borrowed as /ɫ/ or /l/ in Armenian.

5. I've never seen a French name written in Armenian before: Րեմի Վիրդա.

YELLOW PIG 12/24

songgiyan uliya aniya

juwa juwe biya orin duin inenggi

'yellow pig year, ten two month, twenty four day'

1. Years ago I noticed that 朽 <ROT> had an irregular Sino-Vietnamese reading: hủ instead of hửu. But I never gave that any further thought until this week.

朽 belongs to the Early Middle Chinese *-u > Late Middle Chinese *-ɨw rhyme category. Most Sino-Vietnamese readings are borrowed from a southern Late Middle Chinese dialect, so that rhyme category normally corresponds to Sino-Vietnamese -ưu. 舊 <OLD> was borrowed at least twice, first as what became cũ¹, and then as what became cựu. cũ is from southern Early Middle Chinese *gṳ, whereas cựu is from southern Late Middle Chinese *kɨ́w.

hủ has the same rhyme (tone aside) as cũ, so I regard both as Early Middle Chinese borrowings. To be more precise, hủ seems to be from a stratum of borrowing whose tones match the pattern found in Late Middle Chinese borrowings, whereas cũ is from an even earlier stratum with a different pattern of 'tonal' (strictly speaking, registral) borrowing. See the table below:

Vietnamese renderings of the *-u > *-ɨw rhyme (not all possibilities necessarily attested)

Chinese initial
Chinese 'tone' class
stratum 1
stratum 2
stratum 3
-ủ (e.g., 朽 hủ) -ửu
-ụ -ũ -ữu
*voiceless departing -ủ -ú -ứu
*voiced -ũ (e.g., cũ)

An even more surprising reading of 朽 is Cantonese nau2 in addition to the regular jau2. 朽 has a variety of different initial types across the Sinitic language family. I have omitted tones. (In hindsight I shouldn't have because tones would help refine my reconstruction of the history of the initials.)

initial type
aspirated velar stop kʰa < *-æw?
Southern Min 潮州 Chaozhou in Bangkok
Eastern Min
福安 Fu'an
velar fricative xiu
Eastern Min 福州 Fuzhou
glottal fricative
Southern Min
潮州 Chaozhou
Hakka 城廂 Chengxiang
ʃiɑu Central Min
三明 Sanming
Central Min
沙縣 Shaxian
玉林興業 Yulin Xingye
橫縣 Hengxian
富寧 Funing
仁化 Renhua
龍州 Longzhou
glottal stop
Guangxi Min
平南 Pingnan
永福 Yongfu
田東 Tiandong
寧明 Ningming
Min dialect island
中山 Zhongshan
Hakka dialect island
懷集 Huaiji
prenasalized stop
新會 Xinhui

Here's an attempt to make sense of that diversity of initials. Some of it is probably wrong because I don't have the time to work out the history of all the varieties involved.

To wrap this up, in theory 㽲 <>, 㱙 <>, and 殠 <> should have the same reading as 朽 in Vietnamese (or any other language that inherited or borrowed those Chinese morphemes). In theory their Sino-Vietnamese readings should be

But in fact the actual readings according to Mineya (1972: 67) are hữu for 㽲 and hứu for 㱙殠!

hữu has stratum 3 vocalism and a stratum 3 tone pointing to a *voiced initial.

hứu has stratum 3 vocalism. The tone could be stratum 3 if it's from a Chinese departing tone.

¹Although Vietnamese cũ 'old' is a borrowing of Chinese 舊 <OLD>, it is not written as 舊 <OLD> according to which lists nineteen different spellings. My HTML editor (KompoZer) doesn't fully support Unicode, so I can only display nine spellings:


Approximations of the remaining ten:

Those spellings have several common components:

2. For some reason I thought turmeric was ˟tulmeric until I learned the correct spelling today. Reminds me of the r > l dissimilation in Latin peregrinus > Old French pelegrin (> English pilgrim).

In Japanese, 'tulmeric' is ukon with five interesting spellings:

3. Speaking of 郁 ... last Sunday I learned that anime company founder 布川ゆうじ Nunokawa Yūji was born 布川郁司 Nunokawa Yūji. I initially misread 郁司 as Ikuji because 郁 is normally iku. However, in that instance it seems to have been read by analogy with its phonetic 有 yū. The two were of course closer in Old Chinese pronunciation:

YELLOW PIG 12/23

songgiyan uliya aniya

juwa juwe biya orin ilan inenggi

'yellow pig year, ten two month, twenty three day'

1. How did I not figure out that Alaric was 'all-ruler'?

2. Is Alaric cognate to Aldrich? quotes the Dictionary of American Family Names (2013):

English: from a Middle English personal name, Ailric, Alrich, Aldrich, etc. (Many different forms are recorded.) It represents the coalescence of at least two Old English personal names, Ælfric 'elf ruler' and Æðelric 'noble ruler'.

Did -lfr- in Ælfric really become -lr-? I assume the -d- was inserted in Middle English. Is there a single word for such transitional stops: e.g., the [t] in modern English prince [pʰɹɪnts]? (It's been maybe thirty or more years since I learned about that pronunciation in an early linguistics class.)

If forced to disambiguate between prince and prints, I might say [pʰɹɪ̃s] and [pʰɹɪntʰs]. The noninitial [tʰ] is, of course, deliberate and artificial.

3. Scott DeLancey's "Creolization in the Divergence of Tibeto-Burman" (forthcoming) distinguishes between two kinds of Tibeto-Burman languages: archaic and creoloid. I think Pyu fits the creoloid profile, as its grammar is "very reminscent of the minimal grammar which we find in creole languages" (p. 1). What does that suggest about the prehistory of Pyu? That this scenario applies to it (pp. 4-5; emphasis mine):

"[...] I am not proposing a pidgin stage in the development of any of the languages discussed here, these are not true creoles, in the sense of McWhorter 2001. [... C]ertain creole-like patterns can develop through intense language contact involving suboptimal transmission. What I am suggesting is that Proto-Bodo-Garo, Proto-Lolo-Burmese, Proto-Bodish, and probably others such as Proto-Tani, took on their grammatical shape in circumstances in which they were widely spoken by non-native speakers, as trade languages, languages of administration, soldier’s argot, or by mixed populations.

DeLancey is speaking of Lolo-Burmese on p. 15, but his words may also apply to Pyu:

an extended historical phase involving urban centers and kingdoms

The relative uniformity of Pyu in inscriptions from different locations has suggested to me that Pyu may have been a standardized literary language that was not necessarily spoken by everyone in the various Pyu cities. It may be too simplistic to regard the populations of those cities as homogeneously 'Pyu' in a linguistic, much less ethnic, sense. The inscriptions could have been in a lingua franca spoken natively only by an elite, possibly of northern origin. (The sesquisyllabic, superficially Mon-like structure of Pyu might suggest an Austroasiatic [but not specifically Mon]-speaking component in the substratum population.)

It doesn't help that nobody really knows the autonym of the Pyu; neither the Burmese exonym Pyu nor the Mon exonym Tircul has been found in Pyu inscriptions, though both are attested in Chinese historical records.

4. Bob Hudson proposes five Pyu dynasties based on archaeological evidence:

I. Big Club Man Dynasty, 1st-3rd c. AD

II. Vikrama Dynasty, 4th-5th c. AD

III. World Pillar Dynasty, 6th c. AD

IV. Prabhuvarma, Prabhudevi, Khin Ba and the square-based stupas dynasty, 7th c. AD

V. Bawbawgyi builders and inscribed bricks dynasty, 8th c. AD

Bob and I visited the Bawbawgyi four years ago.

I wish I could give Bob more linguistic evidence to back up his hypothesis.

Toward the end he quotes our colleague Arlo Griffiths (who also went to the Bawbawgyi with us):

Every time I think I see a pattern, I find a new specimen which seems to contradict it.

That's been my experience in different fields. I think if I could go back in time and learn the answers to all the mysteries of Pyu, Tangut, Jurchen, and Khitan, I'd observe patterns that fit all the evidence - both the evidence I found for my proposed patterns and their counterevidence.

5. I tend to use AD rather than CE on this site because 'common' could be interpreted as 'universal', which CE is not. But of course the D of AD is also objectionable. I wish there were another alternative, an English equivalent of 西曆 seireki 'western calendar'. But Western Calendar has a poor initialism in English: WC. And Western Era spells WE, which implies 'ours'. Another turnoff.

6. It's taken me years to notice that Khitan ei 'to have, exist' might be the source of the converb -i which "indicates the order in which the action happened: 'then, after that'" (Kane 2009: 149):

V₁-i ... V₂

'after V₁, then V₂'

Is the converb from the full verb: 'that action having existed, then ...'? I just realized the construction above could be translated as

'having V₁, ... V₂'

and ei can be translated as 'to have'.

Examples of the converb from Kane (2009: 149-150; CV = converb):

7. The bai- of baidgha- 'bury' is written as


in the Khitan small script. Today it occurred to me that bai sounds like be (< *Npai), a Japanese reading of the 061-like kanji 可 <ABLE> used to write the -be- of the debitive suffix -ube- from Old Japanese umbəy 'indeed'. So dare I say that 061 was influenced by a peninsular logogram for a Para-Japonic cognate of Japanese -ube-? No, because -ube- "represents a purely Japanese innovation based on grammaticalization" (Vovin 2008: 880) which occurred in the Old Japanese period long after Japonic split from its peninsular relatives.

8. Why was Lao ສິ້ນ <si2n> [sin˧˩] 'skirt' borrowed into English as sinh? Is nh originally a French romanization device to indicate [n] (as opposed to simple n which might be misinterpreted as a mark of nasalization on the preceding vowel)? I keep thinking -nh in Lao romanization is a palatal nasal [ɲ] as in Vietnamese, even though Lao has no words ending in [ɲ].

It seems that the Khmer equivalent of that romanization device is nn since nh was already used to represent [ɲ] as in Vietnamese: e.g., Sinn Sisamouth for ស៊ីន ស៊ីសាមុត <s'īna s'īsāmuta> [sɨn siːsaːmut].

th appears to be a romanization device for nonsilent final [t]. I used to think that th was a romanization of final <tha> [t] in Khmer, but that wouldn't explain its use in the romanization of Lao in which <tha> is not used in spelling: e.g., Bounnhang Vorachith for ບຸນຍັງ ວໍລະຈິດ <punyaṅa vŏlaḥcita> [bun˩ɲaŋ˥ wɔː˥la˧tɕit˥].

This chart of Lao romanization reminds me that -ne is another way to write final [n], presumably again to avoid a simple n which might be misinterpreted as a mark of nasalization on the preceding vowel.

9. Wikipedia uses a symbol for the Lao falling tone that I've never seen before: U+1DC6 COMBINING MACRON-GRAVE: -᷆ (the hyphen is a placeholder).

10. I've been using the Leipzig Glossing Rules for years now in my publications (though I can't claim I've been consistently applying them on my blog). I just discovered this Wikipedia list of glossing abbreviations.

YELLOW PIG 12/22

songgiyan uliya aniya

juwa juwe biya orin juwe inenggi

'yellow pig year, ten two month, twenty two day'

1. I rediscovered Peter Golden's An Introduction to the History of the Turkic Peoples (1992) to look up Burut in the index. I didn't find it, but I did find lots of references to the Tangut, Jurchen, and Khitan. I should read the whole book instead of just consulting it.

2. I've done more thinking about these 'irregular' Sino-Japanese readings:

The morphemes represented by those characters all belong to the Old Chinese *-o rhyme class. In Late Old Chinese, the *-o class split in two depending on the vowel (if any) preceding it:

*o developed in two ways: to *-ou if not preceded by any vowel or if preceded by a low vowel *A, or to *-uo if preceded by a high vowel *I.

Both subtypes are represented in the readings for the characters above:

Suppose Late Old Chinese *-ou and *-uo were borrowed into pre-Old Japanese as *-ou and *-uo. One of the defining traits of Old Japanese is *o raising to *u. So *suo, *ŋguo, and *ŋgou became suu, guu, and guu. And the 'regular' reading 愚 gu is a postraising borrowing from Late Middle Chinese in which *-uo became *-u.

That's how I see things now, which is a lot simpler than what I originally had in mind last week.
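The two-step derivation above (Late Old Chinese split of *-o, then Old Japanese *o-raising) can be spelled out as a toy Python sketch. It is only an illustration under stated assumptions, not a published model: the function names loc_o_split and oj_raising are hypothetical, forms are simplified strings, and the onsets and vowel classes are taken from the *suo, *ŋguo, and *ŋgou examples above. Tones, final *-ʔ/*-h, and the later loss of prenasalization (the *ŋg- that ends up as modern g-) are ignored.

```python
# Toy model (illustrative only) of the two changes proposed above:
#   1. Late Old Chinese split of Early Old Chinese *-o
#   2. Old Japanese raising of *o to *u in early borrowings
# Forms are simplified strings; tones, final *-ʔ/*-h, and the later loss
# of prenasalization (*ŋg- read as modern g-) are not modeled.

def loc_o_split(preceding_vowel):
    """Late Old Chinese reflex of Early Old Chinese *-o.

    preceding_vowel: None (no vowel), "A" (low vowel), or "I" (high vowel).
    """
    if preceding_vowel == "I":
        return "uo"
    if preceding_vowel in (None, "A"):
        return "ou"
    raise ValueError(f"unexpected vowel class: {preceding_vowel!r}")


def oj_raising(form):
    """Raise every *o to *u, as in pre-Old Japanese > Old Japanese."""
    return form.replace("o", "u")


# Onsets as given in the text's pre-Old Japanese forms (*s-, *ŋg-).
examples = [("s", "I"), ("ŋg", "I"), ("ŋg", "A")]

for onset, vowel_class in examples:
    borrowed = onset + loc_o_split(vowel_class)  # pre-Old Japanese form
    print(f"*{borrowed} > {oj_raising(borrowed)}")
# *suo > suu
# *ŋguo > ŋguu
# *ŋgou > ŋguu
```

The point of the mechanical output is that once raising applies, *ŋguo and *ŋgou merge, so both Late Old Chinese subtypes can end up as guu.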

1.21.19:20: APPENDIX: Shpika stats.

Early Old Chinese
Late Old Chinese

*sIroʔ/h *ʂuoʔ/h

*CIŋo *ŋuo
*CIŋo *ŋuo
*ŋoʔ/s *ŋouʔ/h

*CIŋos *ŋuoh

*sIroʔ/h *ʂuoʔ/ 1422

*CIŋos *ŋuoh

*CAŋoʔ *ŋouʔ 4861

*CIŋo *ŋuo

*ŋoʔ *ŋouʔ -

Shpika stats only apply to modern Japanese, so they are not reliable guides to kanji frequency in the past. Nonetheless, they can serve as a starting point for hypotheses.

Perhaps there was originally just one very early, pre-*o-raising borrowing *ŋgou > guu for a very common 禺-kanji (e.g., 遇 <MEET>?), and that reading spread by analogy to less frequent 禺-kanji, displacing *ŋgu > gu readings borrowed later. But the gu reading for the common kanji 愚 <FOOLISH> remained unchanged.

Shpika has separate entries for 數 <NUMBER> and 数, the modern standard form of 數. 數 is so common that analogy cannot be a factor in its reading. I think the morpheme was borrowed very early as *suo prior to *o-raising.

3. Best Hawaiian language news I've heard in a long time: Hawaii News Now reported that young native speakers of Hawaiian on Niihau have published professional-looking books in their own language. A pity I can't find a link online.

Niihau Hawaiian is to standard Hawaiian what Sibe is to Manchu: a variety preserved on the periphery while its prestigious sister is endangered.

4. Tonight I was surprised to hear Lev Parnas speak perfect American English. I had assumed he had immigrated as an adult, but he actually arrived in the US at the age of three in the 1970s, well before the independence of Ukraine.

I was also surprised to see that he didn't have his own Wikipedia page - and that he

served as a translator [I think 'interpreter' is intended] for a legal case involving Dmytro Firtash, one of Ukraine's wealthiest oligarchs with self-admitted mob connections [...] However, recordings of Parnas speaking Ukrainian and Russian evidence that he has not retained total fluency in these two languages since coming to the United States.

I would have expected "one of Ukraine's wealthiest oligarchs" to hire a professional interpreter.

5. One last surprise: learning that the Ukrainian equivalent of Russian Дмитрий Dmitrij is Дмитро Dmytro with final stress. I would have predicted ˟Dmytryj with initial stress via mechanical conversion from Russian (dangerous, I know). Does the Ukrainian final -o reflect the o of Δημήτριος <Dēmḗtrios>?

I would have predicted that the initial cluster Дм- Dm- came from an even earlier Дьм- Dĭm-, but Wikipedia says Дъм- Dŭm- also existed. I wouldn't have expected Greek η <ē> ([i] by the time the name was borrowed into Russian) to be Russified as ъ ŭ.

YELLOW PIG 12/21

songgiyan uliya aniya

juwa juwe biya orin emu inenggi

'yellow pig year, ten two month, twenty one day'

1. Last night I learned of this year's new 'Super Sentai' show, 魔進戦隊キラメイジャー Mashin sentai Kirameijā (Devil Advance Task Force Kiramager).

The super-vehicles in the show are called 魔進 mashin 'devil advance', a pun on 'machine'. 魔 ma implies 'magical'. Maybe 'magiforth' would be a better English rendering.

Kirameijā is a blend of 煌めく kirameku 'sparkle', mage, and ranger. And the last part sounds like major, perhaps unintentionally.

2. Tonight I learned of the late Betty Pat Gatliff's SKULLpture Lab. How did that pun never occur to me before? It's occurred to others.

Imagine English written in a Chinese-style script with a pictogram of a skull recycled to write the scul- of sculpture. Perhaps the stylized spelling ☠lpture already exists. Typing "☠lpture" into DuckDuckGo generates results that don't seem to have that spelling: e.g., this page which has the typo "Scu;lpture" instead.

I wonder if there are Sinospheric puns involving 髑髏 'skull'. Imagine, for instance, an independence movement called 獨髏 (with 髑 respelled as its homophone 獨 'alone', the first half of 獨立 'independence') or 髑立 (which sounds like 獨立 'independence') with a skull as its symbol.

3. Why isn't 髑髏 a choice if I type "dulou" into Windows 10's Pinyin IME? I had to type "dokuro" in Japanese to type that. I could have fished for 髑 and 髏 separately by typing "du" and "lou" into the Pinyin IME, but that would have taken longer.

4. Why did the Manchu call the Kyrgyz ᠪᡠᡵᡠᡨ Burut? Prior (2013: 29) says of Burut, "The required full study on this ethnonym [...] has yet to be produced."

5. I just heard "animoji" for the first time on The Late Show with Stephen Colbert. Who would have imagined centuries ago that Latin animatio would merge with Japanese 文字 moji 'character', a descendant of Old Japanese 文字 *mənzi (*məndzɨ?), probably a borrowing from a similar (identical?) Sino-Paekche word borrowed in turn from southern Early Middle Chinese 文字 *mən dzɨ̰.

(1.16.16:26: Scripta Sinica shows that the word 文字 is attested as early as 史記 Shiji [Records of the Historian, c. 94 BC]. Is it in any earlier texts?)

animoji has an entry in Wiktionary ... but it's for Spanish animoji defined as English animoji ... which doesn't have an entry yet!

6. How did I not discover the "citations" tab of Wiktionary until now?

YELLOW PIG 12/20

songgiyan uliya aniya

juwa juwe biya orin inenggi

'yellow pig year, ten two month, twenty day'

1. I wrote about orin 'twenty' here.

2. Tonight's episode of The Late Show with Stephen Colbert was titled "Dem Moines Dembate!" (See the title here at 0:55.) I think I hear Jen Spyra saying Moines as [mɔjnts] rather than as [mɔjn] or even [mɔjnz], as if she were making an effort to pronounce the final written <s>.

Wiktionary says Des Moines (Washington) is pronounced [dəˈmɔɪnz], whereas Des Moines (Iowa) is pronounced [dəˈmɔɪn]. I never heard of the first one before.

Dem Moines works well as a pun in English because /mm/ can be reduced to a single [m] in rapid speech. So Dem Moines and Des Moines can be homophonous.

The similarity between Dembate and debate reminds me of the allophony of prenasalized obstruents that I proposed in Old Japanese over twenty years ago: /Nk Ns Nt Np/ could be [ŋg nz nd mb] or [g z d b]. (Or even [vowel nasalization + g z d b].)

3. Minutes after I saw that, I encountered the word microburst for the first time. How micro is micro?

YELLOW PIG 12/19

songgiyan uliya aniya

juwa juwe biya oniohon inenggi

'yellow pig year, ten two month, nineteen day'

No need for a boilerplate discussion of 'nineteen' here since I already did that last month. Instead, I want to talk about Wu Yingzhe's (2014: 425) observation that 'eight' is also phonetically spelled as

<222.327.270> <ny.yê.êm> (<ê> = a front vowel unlike <e> for /ə/).

Written Mongolian naiman points to an original *a that fronted after *ny /ɲ/. Compare the fronting of *a in Khitan with the rounding of *a to *o after *ny in the language from which Jurchen borrowed niohon 'eighteen'.

Shimunek (2017: 358) reconstructs 'eight' in Proto-Serbi-Mongolic as *ñayɪma (*nyayïma in my notation). Janhunen (2003: 17) regards *-PAn (with an assimilated variant *-man after the nasal *ny-) as a suffix. I might expect that lower numeral suffix to be reduced to a single consonant in other Khitan numerals, but so far 'eight' seems to be unique; 'ten' doesn't have it.

1.21.20:14: APPENDIX: Written Mongolian suffixed lower numerals and their Khitan cognates

Proto-Serbi-Mongolic Written Mongolian
Khitan (expected)
Khitan (actual)

The Proto-Serbi-Mongolic forms are based on Janhunen (2003) and Shimunek (2017).

The absence of reflexes of *-PAn in Khitan 'three', 'four', and 'ten' could be explained away as the result of a simplification *-rb > *-r. But that doesn't account for the absence of -gh in dalu 'seven'.

I think *-PAn is a Mongolic innovation absent in Khitan. But doesn't Khitan nyêm contain a reflex of *-PAn? Maybe not. It just occurred to me that the m of naiman and nyêm could in fact be part of the root, and Mongolic -an in naiman could be by analogy with dologhan. That analogy never occurred in Khitan which never added *-PAn to 'seven' or any other numeral.

The evidence for a reading of Khitan


seems to be ... nil. nil based on Jurchen nilhun 'sixteen' is not a bad guess, but there is no guarantee that Khitan proper had the same root for 'six' as the Khitan dialect or Khitan-like language that is the source of nilhun.

In any case, I agree with Janhunen (2003: 17) that Mongolic 'six' is an innovation: jir 'two' times ghu 'three' plus *-PAn.

Janhunen (2003: 17) writes that "The absence of *.pA/n in 1 *nike/n > *nige/n, 5 *tabu/n, and 9 *yersü/n suggests that these numerals were somehow special and perhaps secondary." He does not comment on the absence of *.pA/n in 'two'.

YELLOW PIG 12/18

songgiyan uliya aniya

juwa juwe biya niuhun inenggi

'yellow pig year, ten two month, eighteen day'

1. 'Eighteen' in Jurchen dates is either niuhun 'eighteen' (in Jurchen Empire usage) or juwa jakun 'ten eight' (in Ming dynasty usage). niuhun 'eighteen' is obviously not cognate to juwa 'ten' or jakun 'eight' and is probably a loan from a para-Mongolic language (a nonstandard Khitan dialect?).

Khitan as preserved in written records has <TEN EIGHT> in both scripts rather than a special word for 'eighteen'. The Khitan words for 'ten' and 'eight' are unknown.

As far as I know (I do not have anything like a complete concordance), <TEN> never combines with any other character in the small script, so it (a) may be a true logogram for 'ten' (though the possibility of it representing a nonnumeral homophone in one or more contexts cannot be dismissed yet) or (b) have a phonetic value that does not happen to occur in any other word. If 'ten' was something like the -hun/hon suffix found in the Jurchen teen-words borrowed from para-Mongolic, I would expect it to appear at least once in another block since I cannot imagine HUn being an exotic syllable in Khitan.

I have only seen <EIGHT> in combination with <de>, the dative-locative suffix, toward the end of line 7 in the epitaph for 蕭仲公 Xiao Zhonggong (1150). Can all Khitan numerals take nominal suffixes? (<THREE> has the suffix <de> earlier in the line.) What I wrote about <TEN> applies here as well: <EIGHT> (a) may be a true logogram for 'eight' (though the possibility of it representing a nonnumeral homophone in one or more contexts cannot be dismissed yet) or (b) have a phonetic value that does not happen to occur in any other word.

Jin (1984: 200) derives the Jurchen graph <EIGHTEEN> from Chinese 十 <TEN> and 八 <EIGHT>, but I don't see any resemblance between the three beyond a cross-shaped intersection (and <EIGHTEEN> really doesn't have a vertical stroke). Thinking of <EIGHTEEN> as <TEN> with two extras at the top left and bottom right is useful for memorizing the character, though.

Janhunen (2003: 399) reconstructs the potential ultimate source of Jurchen niuhun as pre-Proto-Mongolic *nya(y)i.ku/n 'eight-teen' via *nyo.hun or *nyohon¹. The rounding of *a after *ny (the *a is preserved in Proto-Mongolic *na(y) [Janhunen 2003: 17]) seems to be an assimilation of *a to a following labial vowel. Conversely, the Proto-Mongolic first decade numeral suffix *-man < *-pA/n has a nasal due to assimilation to the initial *n- of the root *na(y)i (Janhunen 2003: 17). Maybe the Khitan word for 'eight' was something like nyai - a direct preservation of Proto-Serbi-Mongolic 'eight'? (There is no evidence for the consonant-vowel sequence yi in Khitan.) But Wu Yingzhe introduces a twist in the decipherment of 'eight' that I'll save for tomorrow, assuming I have time. Although I had other topics planned for today, midnight approaches, so I'll stop here.

¹The absence of a period before *-hon seems to be a typo.

YELLOW PIG 12/17

songgiyan uliya aniya

juwa juwe biya darhon inenggi

'yellow pig year, ten two month, seventeen day'

Out of time tonight, so I'll just make a few remarks about Jurchen 'seventeen'.

'Seventeen' in Jurchen dates is either darhon 'seventeen' (in Jurchen Empire usage) or juwa nadan 'ten seven' (in Ming dynasty usage). darhon 'seventeen' is obviously not cognate to juwa 'ten' or nadan 'seven' and is probably a loan from a para-Mongolic language (a nonstandard Khitan dialect?).

Khitan as preserved in written records has <TEN SEVEN> in both scripts rather than a special word for 'seventeen'. The Khitan word for 'seven' may have been something like dalo (Kane 2009: 115) with an l like Written Mongolian dologhan. The Jurchen word for 'seventeen' may actually be dalhon if the in the Ming Mandarin transcription of Jurchen 'seventeen' represents -l- rather than -r-. (Ming Mandarin had no means to precisely represent a consonant cluster -lh-, and there was no reason for Jurchen speakers to borrow foreign *l as r.) dalhon has no Manchu cognate, so I cannot use Manchu to decide on whether represents -l- or -r-.

Janhunen (2003: 399) reconstructs the potential source of Jurchen darhon as pre-Proto-Mongolic *dal.ku/n or *dal.u.ku/n 'seven-teen'.

YELLOW PIG 12/16

songgiyan uliya aniya

juwa juwe biya nilhun inenggi

'yellow pig year, ten two month, sixteen day'

1. 'Sixteen' in Jurchen dates is either nilhun 'sixteen' (in Jurchen Empire usage) or juwa ninggu 'ten six' (in Ming dynasty usage). nilhun 'sixteen' is obviously not cognate to juwa 'ten' or ninggu 'six' and is probably a loan from a para-Mongolic language (a nonstandard Khitan dialect?).

Khitan as preserved in written records has <TEN SIX> in both scripts rather than a special word for 'sixteen'. The Khitan word for 'six' may have been something like nil- (nilhun minus -hun 'teen'), but there is no known Khitan-internal evidence for such a reading: e.g., Khitan small script <SIX> alternating with a graph sequence <>.

The nil- of nilhun has no cognate in Mongolic which has an innovation *jir-gu.xa/n 'two (times) three' (as proposed by Janhunen [2003: 399]). nil may be from the Proto-Serbi-Mongolic word for 'six' (also Janhunen's idea, though he uses the term 'Pre-Proto-Mongolic').

The Manchu cognate of nilhun is niolhun 'sixteenth day of the first month' (not 'sixteen'). Did Jurchen and Manchu borrow the word from different sources (or the same source in different periods)? Did Jurchen reduce an original *niol [ɲɔl] to nil? Or is Manchu -o- an innovation in the Manchu line due to the influence of the following -u-?

Janhunen (2003: 399) reconstructs the potential source of Jurchen nilhun as pre-Proto-Mongolic *nil.kü/n 'six-teen'.

2. Today I learned the Serbo-Croatian word свађа svađa 'quarrel'. Wiktionary derives it from Proto-Slavic *sŭvadja. What are its cognates in other Slavic languages? I can't find anything like сважа in Russian.

3. Tonight I realized that Sino-Japanese

have 'irregular' readings by comparison with other Go-on and Kan-on: e.g.,

But I suspect the readings suu and guu are actually regular and from a layer of Go-on predating vowel raising: i.e., they were borrowed as *-ou which became -uu after raising.

4. Tonight I realized that 源氏物語 Genji Monogatari was written in roughly the same period (before 1021) as the creation of the Tangut script in 1036. I regret that so little original Tangut literature has survived. Imagine what we could learn about Tangut culture from a Tangut equivalent of Genji.

YELLOW PIG 12/15

songgiyan uliya aniya

juwa juwe biya tobohon inenggi

'yellow pig year, ten two month, fifteen day'

1. 'Fifteen' in Jurchen dates is either tobohon 'fifteen' (in Jurchen Empire usage) or juwa shunja 'ten five' (in Ming dynasty usage). tobohon 'fifteen' is obviously not cognate to juwa 'ten' or shunja 'five' and is probably a loan from a para-Mongolic language (a nonstandard Khitan dialect?).

(Khitan as preserved in written records has <TEN FIVE> in both scripts rather than a special word for 'fifteen'. The Khitan word for 'five' was tau < *tabu, not tobo. tobohon might be from a relative of Khitan in which

*CaCu > *CoCo

as there is no Jurchen-internal reason to borrow *tabu- as tobo-.)

Janhunen (2003: 399) reconstructs the potential source of Jurchen tobohon as pre-Proto-Mongolic (dare I say Proto-Serbi-Mongolic?) *tabu.ku/n 'five-teen'.

tobohon is the only one of the '-teen' loanwords in Jurchen that has a Manchu cognate: tofohon < *topokon 'fifteen'. The Jurchen b : Manchu f < *p correspondence is irregular and cannot be accounted for via Jurchen- or Manchu-internal sound change. Was the word borrowed in slightly different forms from two different dialects of a para-Mongolic language? One of those dialects might have shifted *b to *p.

Jurchen 'fifty' is susai which isn't much like shunja 'five' and nothing like Janhunen's (2003: 16) Proto-Mongolic *tabi/n.

2. Last night I ran out of time to describe a linguistic dream I had yesterday. In the dream I came up with various 'underlying' morphological forms for Sanskrit -an nominals.

There was some semblance to reality in the dream: e.g., -an masculine nominative singular present participles were 'derived' from /-ant-s/, /-s/ being the masculine nominative singular suffix preserved in, say, amr̥tas 'Amritas'.

But other parts were pure fantasy - the part that makes me cringe involved trying to explain the nominative singulars of masculine and neuter -an-stem nouns:

In that bogus fantasy scenario, masculine nouns had an 'underlying' laryngeal /H/ that only surfaced as vowel length when /-n/ was subtracted to form the nominative singular.

In the real world, the final -a of neuter nāma is from syllabic *n̥: the *n of the stem without a vowel between it and *m.

As for the final of masculine rājā, Burrow (1955: 230) says that "for phonetic reasons which are not now clear", the vowel of the suffix -an was lengthened, and "there is a tendency for the final semi-vowel of a suffix [i.e., -n] to be elided." He doesn't seem to know what's going on there; I certainly don't.

Wiktionary derives rājā from Proto-Indo-European *ʕʷrḗǵeʕ (rewritten in my preferred notation). *eʕ regularly becomes Sanskrit ā, but ... why is *-ʕ there?

3. Today I learned that Arabic فتنة <ftnh> fitna has a much broader scope of meanings than I had thought:

Lane, in his monumental Arabic-English Lexicon compiled from various traditional Arabic lexicographical sources available in Cairo in the mid-19th-century, reported that "to burn" is the "primary signification" of the verb. The verb then came to be applied to the smelting of gold and silver. It was extended to mean causing one to enter into fire and into a state of punishment or affliction. Thus, one says that something caused one to enter al-fitna, i.e. trial, affliction, etc., or more generally, an affliction whereby some good or evil quality is put to the test. Lane glosses the noun fitna as meaning a trial, a probation, affliction, distress or hardship, and says that "the sum total of its meaning in the language of the Arabs" is an affliction whereby one is tried, proved or tested.

The definitions offered by Lane match those suggested by Badawi and Haleem in their dictionary of Qur'anic usage. They gloss the triliteral root as having the following meanings: "to purify gold and silver by smelting them; to burn; to put to the test, to afflict (in particular as a means of testing someone's endurance); to disrupt the peace of a community; to tempt, to seduce, to allure, to infatuate."

The meanings of fitna as found in Classical Arabic largely carry over into Modern Standard Arabic, as evidenced by the recitation of the same set of meanings in Hans Wehr's Dictionary of Modern Written Arabic. In addition, Wehr glosses the noun fitna as also meaning "charm, charmingness, attractiveness; enchantment, captivation, fascination, enticement, temptation; infatuation, intrigue; sedition, riot, discord, dissension, civil strife."

Buckwalter & Parkinson, in their frequency dictionary of Arabic, list the noun fitna as the 1,560th most frequent word in their corpus of over 30 million words from Modern Standard Arabic and colloquial Arabic dialects. They gloss fitna as meaning "charm, allure, enchantment; unrest; riot, rebellion."

And all these years I thought fitna only meant something like 'disturbance'.

At first I couldn't understand how fitna could mean 'charm'. But then I built on the semantic chain established above:

burn > smelt > afflict > test > tempt > tempting quality = charm

It would be interesting to see the order of attestation of the various meanings in an Arabic historical dictionary. Then again, the Qur'an already has a wide range of meanings for f-t-n. Is the root attested in pre-Qur'anic Arabic, and if it is, what does it mean?

4. I just heard gnocchi pronounced with [tʃi] instead of [ki] in a fake Italian accent on The King of Queens. I'm surprised the director didn't ask for a retake.

YELLOW PIG 12/14

songgiyan uliya aniya

juwa juwe biya durhon inenggi

'yellow pig year, ten two month, fourteen day'

1. 'Fourteen' in Jurchen dates is either durhon 'fourteen' (in Jurchen Empire usage) or juwa duin 'ten four' (in Ming dynasty usage). -hon '-teen' is obviously not cognate to juwa 'ten', though dur- 'four-' does resemble Jurchen duin 'four'. Despite that resemblance, durhon is probably a loan from a para-Mongolic language (a nonstandard Khitan dialect?). (Khitan as preserved in written records has <TEN FOUR> in both scripts rather than a special word for 'fourteen'.)

Janhunen (2003: 399) reconstructs the potential source of Jurchen durhon as pre-Proto-Mongolic (dare I say Proto-Serbi-Mongolic?) *dö.r.kü/n 'four-teen'. The mismatch between the Jurchen and pre-Proto-Mongolic vowels is unexplained. Might it be evidence for a vowel shift in the language that Jurchen borrowed from? Kiyose (1977: 41) proposes that

That would incorrectly predict that *dörkün would become Jurchen ˟derhun. Was durhon borrowed after those vowel shifts?

Jurchen 'forty' is dehi < *deki which seems to be from a para-Mongolic cognate of Janhunen's (2003: 16) Proto-Mongolic *döci/n 'id.' Jurchen dehi has e (as Kiyose would predict from *ö) instead of u (which Kiyose could not predict) like durhon. Did Jurchen borrow 'forty' and 'fourteen' from different sources, or was the inconsistency of vowels already in a single source language? The correspondence of Jurchen h < *k : Proto-Mongolic c is irregular and needs explanation.

The root *dö 'four' in Proto-Mongolic 'forty' lacks the -r of Jurchen durhon 'fourteen' and may preserve the earliest form of 'four' in Serbi-Mongolic. Janhunen (2003: 17) also observes the r-less root in Proto-Mongolic *dö.tüxer 'fourth' replaced by *dörbe.düger (with the extended form of 'four', *

2. Having just mentioned 'forty', by coincidence today as I was copying the 契丹小字研究 Qidan xiaozi yanjiu (Research on the Khitan Small Script) hand copy of the epitaph for Emperor 興宗 Xingzong (1015-1054) of the Khitan Empire, I came across the block

<n.o.FORTY.ghu> 251-186-145-151

in line 25. <FORTY> is presumably a phonogram. Could it stand for something like *deki: i.e., the presumed para-Mongolic source of the Jurchen word for 'forty'?

3. Why is 'Beijing' spelled ᠪᠡᠭᠡᠵᠢᠩ begejing in the traditional Mongolian script? It looks like a transcription of a combination of 北 Middle Chinese *pək 'north' (> modern Mandarin běi) with 京 modern Mandarin jīng 'capital'. But I doubt the -ge- has anything to do with a Middle Chinese *-k that was already gone in the north before Mongolian was first written.

My guess is that bege- is merely an orthographical convention to write [pəː] by analogy with native words in which spoken [əː] corresponds to written -ege-: e.g., spoken [təːr] and written ᠳᠡᠭᠡᠷ᠎ᠡ deger-e, both from Proto-Mongolic *dexere 'top' (written g is not quite etymological).

4. I just heard the brand name DiGiorno pronounced with an un-Italian [ʒ] instead of [dʒ]. That's an example of how [ʒ] is used by English speakers to signal that a word is foreign (even if that word doesn't actually have it). Another example is in the last topic: Beijing, sometimes pronounced with [ʒ] in English even though standard Mandarin j is an affricate [tɕ], not a fricative.

Why is DiGiorno called Delissio in Canada?

5. How did Japanese 大根 <BIG ROOT> daikon come to mean 'ham' in the sense of 'actor known for an exaggerated, over-wrought style'?

How did ham come to mean that?

I just learned that ham can be an antonym of spam: 'electronic mail that is wanted; mail that is not spam or junk mail'.

YELLOW PIG 12/13

songgiyan uliya aniya

juwa juwe biya gorhon inenggi

'yellow pig year, ten two month, thirteen day'

1. It just occurred to me that Jurchen doesn't use omsho 'eleven' or jirhon 'twelve' in month names; 'twelfth month' is 'ten two month', though Jin Jurchen had 'twelve day' which was later replaced by Ming Jurchen 'ten two day'.

2. Here I use the more interesting Jin Jurchen day name. 'Thirteen' in Jurchen dates is either gorhon 'thirteen' (in Jurchen Empire usage) or juwa ilan 'ten three' (in Ming dynasty usage). gorhon is obviously not cognate to juwa 'ten' or ilan 'three' and is probably a loan from a para-Mongolic language (a nonstandard Khitan dialect?). (Khitan as preserved in written records has <TEN THREE> in both scripts rather than a special word for 'thirteen'.)
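The two day-naming conventions in these entries can be summarized as a small lookup. This Python sketch covers only days 6-15, the range whose Jurchen forms actually appear in these posts; the function name and the 'jin'/'ming' switch are illustrative, and it assumes (without evidence either way) that the 'new N' pattern for days 6-10 is shared by both usages.

```python
# Numerals as they appear in the Jurchen dates of these entries.
UNITS = {1: "emu", 2: "juwe", 3: "ilan", 4: "duin", 5: "shunja",
         6: "ninggu", 7: "nadan", 8: "jakun", 9: "uyewun", 10: "juwa"}

# Jin (Jurchen Empire) words for 11-15, probably para-Mongolic loans.
JIN_TEENS = {11: "omsho", 12: "jirhon", 13: "gorhon", 14: "durhon", 15: "tobohon"}


def day_name(day: int, usage: str = "jin") -> str:
    """Jurchen name of day 6-15 of a month in Jin or Ming usage."""
    if not 6 <= day <= 15:
        raise ValueError("this sketch only covers days 6-15")
    if usage not in ("jin", "ming"):
        raise ValueError("usage must be 'jin' or 'ming'")
    if day <= 10:
        words = ["ice", UNITS[day]]           # 'new N' (assumed for both usages)
    elif usage == "jin":
        words = [JIN_TEENS[day]]              # special '-teen' word
    else:
        words = [UNITS[10], UNITS[day - 10]]  # Ming 'ten N'
    return " ".join(words + ["inenggi"])


print(day_name(13))          # gorhon inenggi
print(day_name(13, "ming"))  # juwa ilan inenggi
```

Days 1-5 and 16-30 are left out rather than guessed.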

Janhunen (2003: 399) reconstructs the potential source of Jurchen gorhon as pre-Proto-Mongolic (dare I say Proto-Serbi-Mongolic?) *gu.r.ku/n 'three-teen'. The mismatch between the Jurchen and pre-Proto-Mongolic vowels is unexplained. Might it be evidence for a vowel shift in the language that Jurchen borrowed from?

Jurchen 'thirty' is gusin (cf. Janhunen's [2003: 16] Proto-Mongolic *guci/n 'id.') with u instead of o. Did Jurchen borrow 'thirty' and 'thirteen' from different sources, or did a single source language have *u > *o in 'thirteen' but not 'thirty'?

The root gu 'three' in 'thirty' lacks the -r of gorhon 'thirteen' and may preserve the earliest form of 'three' in Serbi-Mongolic. Janhunen (2003: 17) also observes the r-less root in Proto-Mongolic *gu.taxar 'third' replaced by *gurba.dugar (with the extended form of 'three', *

3. When I saw the title of Linda Konnerth's "The Proto-Tibeto-Burman *gV- nominalizing prefix" (2016), I immediately thought of Old Chinese *k-. She writes on p. 2,

The velar prefix is notably absent in the Southeastern branch [of Tibeto-Burman] consisting of Yi languages and Burmese. This is, however, not surprising as the Southeastern branch is generally characterized by an isolating typological profile and lack of morphological structure.

In a footnote, she adds,

The same arguably holds for Sinitic languages, where no strong evidence of the prefix has turned up so far. However, two anonymous reviewers point out that both Sagart (1999: 98-107) and Baxter and Sagart (2014: 57) discuss a reconstructed *k-prefix for Old Chinese. The problem with this evidence is that multiple functions are discussed and only one such function is to derive nouns from verbs, while other functions include deriving action verbs and stative verbs apparently from already verbal roots. There are only two examples of the proposed nominalizer *k- given by Baxter and Sagart (2014: 57), and while this is promising, we should not put too much weight on the limited evidence for the time being.

Here is how I would reconstruct those two examples:

Example 1:

方 EOC *CIpaŋ > MOC *CIpɨaŋ > EMC *puaŋ > Md fāng '(to be) square'

匡 EOC *kIpaŋ > MOC *kIpɨaŋ > *kʰpɨaŋ > EMC *kʰuaŋ > Md kuāng 'square basket'

方 isn't always a verb, and its *C might be *k, so perhaps these are simply two different spellings of *kIpaŋ which underwent different paths of reduction rather than evidence for *k-prefixation. The shift of *kp- to *kʰp- has a parallel in Khmer synchronic phonology: /kp/ is pronounced [kʰp]. (But are there any other examples in Chinese?)

Example 2:

明 EOC *mI-raŋ > MOC *mIrɨaŋ > EMC *mɨeŋ > Md míng 'to be bright'

囧 EOC *k(V)-mI-raŋ-ʔ > MOC *kmIrɨaŋʔ > EMC *kwɨeŋ > Md jiǒng 'bright window'

These words share a root *raŋ 'bright' also in 朗 *kV-raŋ-ʔ 'to be bright'. The functions of *mI-, *kV-, and *-ʔ are unknown. If the noun 囧 has the same velar prefix as the stative verb 朗, then that prefix cannot be a nominalizer.


I feel obligated to say that Pyu does not seem to have a k-nominalizer. I have identified five nominalizers in Pyu:

Only one has initial k-, but it has a disyllabic source and may have originally been a noun.

4. How did this happen?

The Chinese name Ālóng 阿龙, sometimes misread Ayi, refers to Nung (Anong).

Ālóng is a standard Mandarin reading, and I presume Ayi is also supposed to be standard Mandarin, albeit an erroneous reading. The trouble is that neither 龙 nor its full form 龍 is read yi. And I can't imagine anyone mistaking 龙 for 衣 which is read yī.

Benedict (1972: 10) proposes that Pyu might have a "rapprochement" with Nung, but I am not sure what he means by that. As he is discussing subgrouping, he may be saying that Pyu might subgroup with Nung.

YELLOW PIG 12/12

songgiyan uliya aniya

juwa juwe biya jirhon inenggi

'yellow pig year, ten two month, twelve day'

1. 'Twelve' in Jurchen dates is either jirhon 'twelve' (in Jurchen Empire usage) or juwa juwe 'ten two' (in Ming dynasty usage). I've chosen the more interesting of the two. jirhon is obviously not cognate to juwa 'ten' or juwe 'two' and is probably a loan from a para-Mongolic language (a nonstandard Khitan dialect?). (Khitan as preserved in written records has <TEN TWO> in both scripts rather than a special word for 'twelve'.)

Janhunen (2003: 399) reconstructs the potential source of Jurchen jirhon as pre-Proto-Mongolic (dare I say Proto-Serbi-Mongolic?) *jï.r.ku/n 'two-teen'. *r is a shared element of *jï.r 'two', *gu.r 'three', *dör 'four', and *pa.r 'ten' (not cognate to *-ku/n '-teen'!); the presence of *-r in 'ten' prevents me from regarding *r as a lower numeral prefix. That *-r reminds me of the unrelated lower numeral prefix g- in Tibetan gcig 'one', gnyis 'two', and gsum 'three' (but bzhi 'four'!). Are those real affixes, or has a consonant spread to adjacent numerals? ('Ten' isn't adjacent to 'two' through 'four', so maybe pre-Proto-Mongolic 'ten' is simply *par with an original root-final *-r that has nothing to do with the *-r that spread through the lower numerals.)

2. Why is the binomial name of the Japanese badger (穴熊 anaguma 'hole-bear') Meles anakuma with -k-?

In theory, ana 'hole' plus kuma 'bear' could be either anakuma or anaguma. The latter has 連濁 rendaku 'consecutive voicing': the voicing of a voiceless-initial word as the second element of a compound.

Lyman's Law rules out rendaku if a voiceless-initial second element contains a medial voiced obstruent. I view Lyman's Law as a constraint against 'overnasality' because modern Japanese voiced obstruents are from Old Japanese prenasalized obstruents:

/kamu-kaNze/ > ˟/kamu-NkaNze/ [kamuŋganze]


˟/-NkaNze/ 'wind' would have had 'too much nasality': two /NC/. The actual form is /kamu-kaNze/ with only one /NC/. The modern form still observes Lyman's law: it is kamikaze¹, not ˟kamigaze.

If Lyman's law does not apply to a compound, it's not possible to predict when rendaku occurs. That brings me back to where I started: in theory, ana 'hole' plus kuma 'bear' could be either anakuma or anaguma.
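The interaction between Lyman's Law and rendaku described above can be sketched as a function that lists the possible compounds of two romanized morphemes. This is a simplified model under stated assumptions: modern Hepburn spellings, voiced obstruents approximated by the letters b, d, g, z, j, and only the initial-voicing pairs listed in the code. Since rendaku is unpredictable when Lyman's Law does not block it, the function returns both candidates in that case rather than picking one.

```python
from __future__ import annotations

# Letters treated as voiced obstruents in Hepburn romanization (a simplification).
VOICED_OBSTRUENTS = set("bdgzj")

# Voiceless initials and their rendaku counterparts; digraphs are checked first.
RENDAKU_PAIRS = [("sh", "j"), ("ch", "j"), ("ts", "z"),
                 ("k", "g"), ("s", "z"), ("t", "d"), ("h", "b"), ("f", "b")]


def compound_candidates(first: str, second: str) -> list[str]:
    """Return the possible compounds of two morphemes under Lyman's Law."""
    plain = first + second
    if VOICED_OBSTRUENTS & set(second):
        # Lyman's Law: a voiced obstruent in the second element blocks rendaku.
        return [plain]
    for voiceless, voiced in RENDAKU_PAIRS:
        if second.startswith(voiceless):
            # Rendaku is allowed but not predictable, so both forms remain possible.
            return [plain, first + voiced + second[len(voiceless):]]
    # Vowel- or sonorant-initial elements have no voiceless initial to voice.
    return [plain]


print(compound_candidates("ana", "kuma"))   # ['anakuma', 'anaguma']
print(compound_candidates("kami", "kaze"))  # ['kamikaze']
```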

Does anakuma without rendaku exist in dialects? Is there evidence for it existing by 1844 when Coenraad Jakob Temminck named it in Siebold's Fauna japonica? If not, is the specific name from a misreading of 穴熊 <HOLE BEAR>?

¹'God' has two forms in Japanese:

The root form cannot stand by itself, but it appears in compounds like Old Japanese kamukaze 'divine wind'.

The modern Japanese form kamikaze replaces bound kamu- with free kami.

3. I just learned that Siebold [ziːbɔlt] had a daughter who had the surname 失本 Shiimoto, a Japanization of Siebold (nowadays Japanized as シーボルト Shiiboruto). I suppose m was considered close enough to b. And of course there was no way to replicate the cluster [lt] in Japanese. The big mystery to me is Shii-. I've never seen that reading for 失 <LOSE> before, and there's no shii that means 'lose' in Japanese. Shii- is usually written as 椎 <CASTANOPSIS.TREE> in names.

4. I never would have guessed Cantonese would have a Yiddish loanword: 薯嘜 syu4 mak1 'schmuck'. Its etymology is uncertain.

YELLOW PIG 12/11

songgiyan uliya aniya

juwa juwe biya omsho inenggi

'yellow pig year, ten two month, eleven day'

1. 'Eleven' in Jurchen dates is either omsho 'eleven' (in Jurchen Empire usage) or juwa emu 'ten one' (in Ming dynasty usage). I've chosen the more interesting of the two. omsho is obviously not cognate to juwa 'ten' or emu 'one' and is probably a loan from a para-Mongolic language (a nonstandard Khitan dialect?). (Khitan as preserved in written records has <TEN ONE> in both scripts rather than a special word for 'eleven'.)

Janhunen (2003: 399) reconstructs the potential source of Jurchen omsho as pre-Proto-Mongolic (dare I say Proto-Serbi-Mongolic?) *omcon 'eleven' and suggests it may be "connected" to a nominal root *onca 'special, additional' (> Written Mongolian onca 'special'). Note, however, that *omcon has *-m- and *onca has *-n-.

2. "Writing kanji on the air is even better practice than writing on paper" - but how about on the glass in the shower? I don't write kanji there, but I do write other scripts I'm studying. Being flustered when writing Tangut, Jurchen, and Khitan on the sands of Waikiki Beach motivated me to practice writing all three every day. Ironically, I just realized I missed yesterday's practice. Off to make up for it now ...

3. I was going to write about how I thought Jurchen

<STAR> osiha 'star'

might be a recycled logogram for <OX> (cf. Chinese 牛 <OX>), but I already did so back in November. Duh. I can't remember what I wrote just weeks ago.

I'm going to stop here because I spent too much time writing addenda for the last two posts.

YELLOW PIG 12/10

songgiyan uliya aniya

juwa juwe biya ice juwa inenggi

'yellow pig year, ten two month, new ten day'

1. Why was δρόμων 'dromon' borrowed as dromond in Middle English and as dromont in Old French if its stem ends in -n (the genitive singular is δρόμωνος, not δρόμωντος)?

2. I have no idea what to make of this mirror thought to have a "high probability" of being one of the hundred mirrors given to 卑彌呼 Himiko by the 魏 Wei emperor in China. I know nothing about archaeology, and even if I knew something ... this whole blog exemplifies the danger of knowing a little about something.

The mirror has the inscription 長冝□孫. The missing third character is thought to be 子. The 「中國古鏡の研究」班 Ancient Chinese Mirror Study Group (2012) translates that phrase in other mirrors as


naga-ku shison ni yoroshiku

long-ADV descendants DAT good

'[may one be] long suitable for descendants'

i.e., 'may one have descendants for a long time'?

3. While looking for common kanji pronounced so in Wikipedia's list of jōyō kanji last night, I was a bit surprised to find 塑 <MODEL> which isn't that common (though it's not rare either). Here are its Shpika stats:

塑 is well below the cutoff point of #2000 that I'd suggest for required (i.e., jōyō 'common use') kanji. I'm not surprised that Gakken's A New Dictionary of Kanji Usage (1982) lists it among 47 infrequent jōyō kanji without their own entries. (Below those 47 is a list of the 102 kanji that weren't in the 1981 jōyō list but were frequent enough to have their own entries.)

塑 turns out to be a carryover from the 1946 当用漢字 tōyō 'current use' kanji list.

I just discovered Jun Da's hanzi frequency site has a single-character search feature (but there are no rankings in the results). So to find a character's ranking, I have to look for it in each list: e.g., 塑 is #2038 in the modern list (really?).

1.5.1:33: 塑 is not in any of the 15,000 most common Japanese words on this page.

1.5.1:45: 塑 is in just one word in this list of 45,000 Japanese words ordered by frequency: 可塑 kaso (#42,376).

1.5.10:59: The top 塑 word in this list of 2,610,776 Japanese 'words' (including a lot of 'noise' toward the end) is 可塑 kaso (#37,425).

デジタル大辞泉 Digital Daijisen defines 可塑 kaso as


'The ability to make the shape of a thing in the manner one thinks. The ability to mold [塑造 sozō 'model-make'].'

Google and Weblio searches show that 塑造 is almost always accompanied by another morpheme. No wonder my Kadokawa pocket monolingual dictionary doesn't have an entry for it (but does have an entry for suffixed 可塑性 kasosei 'plasticity'). I assume the frequency lists contain 可塑 kaso because it was stripped of common affixes like -性 -sei '-ity'.

This medical article has a lot of 可塑 kaso by itself: e.g., in the title of the site


Karei: tekiō to kaso

'Aging: Adaptation and Plasticity'

and in lines like

[...] 神経細胞は [...] 可塑の強い細胞です。

[...] shinkei saibō wa [...] kaso no tsuyoi saibō desu.

nervous-system cell TOP [...] can.model ATTR strong cell be.

'neurons .... are high in plasticity.'

But I would be hesitant to use kaso by itself.

1.5.12:18: The fact that sometimes supplies a hiragana reading of 可塑 as a root of other words in parentheses (examples: 1, 2, 3, 4, 5) tells me that not all writers expect readers to recognize 塑 so even though in theory all school graduates should know it. (The readings of 可 ka and affixes like -性 -sei are trivial; only the youngest children would not know them.)

4. I was surprised to see unpaired single quotation marks before the foreign names in a Japanese trailer for Message from Space (1978):

Once as a typo, maybe, but twice?

5. I like the orthographic notes at the top of the Wikisource edition of the Japanese constitution.

YELLOW PIG 12/9

songgiyan uliya aniya

juwa juwe biya ice uyewun inenggi

'yellow pig year, ten two month, new nine day'

Last week I thought I had saved my 12.26 entry, but it was gone when I opened it, so I had to recreate it.

Tonight I discovered that half my 1.2 entry was gone when I opened it, so I'll recreate the second half later.

I thought I was constantly pressing the save button in KompoZer on 12.26 and 1.2, but either I wasn't or the button doesn't work. I'm guessing the latter, because when I closed KompoZer, I didn't get the "Save changes ... before closing" message that I get when I exit without saving. Now I'm checking to see if the latest version is in the directory after saving it. Tedious.

I've never had this problem before, possibly because in the old days I uploaded entries right after finishing them, whereas lately I've been uploading them in batches. And to upload an entry I would have to have that entry saved.

My power is out, so I can't get online to access the links I need to finish this entry. Maybe tomorrow. No, tonight - the power came back on after an hour. Here goes ...

1a. The Albanian Caucasian alphabet credited to Mesrop Mashtots (better known as the creator of the Armenian alphabet) doesn't look much like Greek, but I assume it is derived from Greek. Its ABG order certainly is. Is the order taken from this 15th century manuscript?

The Albanian Caucasian letter 'bet' (see the whole chart here) resembles Cyrillic Б, at least in its modern form. But Cyrillic was created centuries after the fifth-century Albanian Caucasian alphabet. Was there a Б-shaped variant of beta in Greek as written in the fifth century? Wikipedia likens fourth- and fifth-century Greek 'b-d uncial' to 'half-uncial' in which Latin

⟨b⟩ and ⟨d⟩ have vertical stems, identical to the modern letters

Was Greek b-d uncial beta like Б? I haven't been able to find an image.

Don't let 'bet' get your hopes up - Τ-like Caucasian Albanian letters are 'lyit' and 'cayn'. And what looks like Σ is 'kar'. I'd like to see derivations for each Caucasian Albanian letter.

I know most scripts only through their modern typeset forms. (I can't even read Fraktur or Sütterlin - the latter said to have been "taught in some German schools until the 1970s, but no longer as the primary script"!) So seeing Greek minuscule was a revelation.

1b. How would Korean speakers perceive the lenis and fortis consonants of Udi? Do they sound like Korean plain and reinforced consonants? What is the origin of the two series in Udi?

1c. This 90s alphabet for Udi is a Latin/Cyrillic hybrid. I assume vowel-hard sign digraphs like iъ represent pharyngealized vowels. I wish I had a key.

2. Having mentioned Fraktur, here are my first glimpses of

3. Kyrgyz' 'left-right' vowel system looks like a height-based system. Kirchner (1998: 346) explains that Kyrgyz has

a low archiphoneme /A/, represented by /a/, /e/, /o/, /ö/, and a high /I/, represented by /ï/, /i/, /u/, /ü/. The choice of the representatives is determined by features of the preceding syllable, e.g.
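Kirchner's two archiphonemes can be modeled as a lookup keyed on the backness and rounding of the preceding vowel. This Python sketch assumes that both features are simply copied from the vowel of the preceding syllable; that is a simplifying reading of "determined by features of the preceding syllable", not Kirchner's full statement, and it ignores any exceptions real Kyrgyz may have.

```python
# (back, round) features of Kyrgyz vowels in Kirchner's transcription.
FEATURES = {
    "a": (True, False), "ï": (True, False),
    "e": (False, False), "i": (False, False),
    "o": (True, True), "u": (True, True),
    "ö": (False, True), "ü": (False, True),
}

# Representatives of the low /A/ and high /I/ archiphonemes per feature pair.
REPRESENTATIVES = {
    "A": {(True, False): "a", (False, False): "e", (True, True): "o", (False, True): "ö"},
    "I": {(True, False): "ï", (False, False): "i", (True, True): "u", (False, True): "ü"},
}


def realize(archiphoneme: str, preceding_vowel: str) -> str:
    """Pick the representative of /A/ or /I/ after a given vowel (simplified)."""
    return REPRESENTATIVES[archiphoneme][FEATURES[preceding_vowel]]


print(realize("A", "i"))  # e
print(realize("I", "ö"))  # ü
```

The point of the sketch is that height is fixed by the archiphoneme, while only the 'left-right' (and rounding) choice is inherited.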

I wonder if the low *A and high *I that I reconstruct for Early Old Chinese and pre-Tangut presyllables harmonized with nonheight attributes of main syllables. If so, then harmony worked in both directions! In the scenario below, Early Old Chinese has nonheight attribute harmony for presyllabic vowels, whereas Middle Old Chinese has partial height harmony for main syllable vowels.

Early Old Chinese
Middle Old Chinese
Late Old Chinese



precious thing
*/CApuʔ/ [Copuʔ]

*/CApəŋ/ [Capəŋ]
*Capaəŋ *pəŋ

house wall
*/CApek/ [Cepek]

*/kAdoks/ [kodoks]

*/CIpaʔ/ [Cɨpaʔ] *Cɨpɨ *pɨaʔ

smell (v.t.)
*/CIbits/ [Cibits]

boil (v.)
*/NIputs/ [Nuputs]

*/CInəʔ/ [Cɨnəʔ] *Cɨnɨəʔ *ɲɨəʔ

*/CIpe/ [Cipe]
*/CIpors/ [Cupors]

(1.4.22:24: Added Late Old Chinese. Changed *b in Early and Middle Old Chinese 'fragrant' to *P since Late Old Chinese *b may either be from an original *-b- or a compression of *Nep- > *Np- > *mp- > *mb- > *b-.)

(1.5.20:35: Shortly after finishing this post, I realized that perhaps labiovelars conditioned rounded allophones of *A and *I: e.g.,

I forgot about labiouvulars until now:

Last night I wondered if labials might have had the same effect: e.g.,

Could there have been a simple rule to copy the presyllabic vowel in the main syllable?

I used to reconstruct a rounding of after labials between Late Old Chinese and Early Middle Chinese:

But maybe now I don't have to bother.)

4. Tonight it occurred to me that the Jurchen phonogram

<ma> (left to right: variants from the 大金得勝陀頌碑 Great Jin Victory Hill stele [1185], the Berlin copy of the Bureau of Translators vocabulary, and the form written by 山路廣明 Yamaji Hiroaki - is that last variant just an artifact of his handwriting, or is it in original Jurchen texts?)

might originate from a graph for <HEAD> in a pre-Jurchen (Parhae?) script, as its Chinese near-lookalike 元 <ORIGIN> is two lines (二) representing a head atop two legs (儿). If so, its reading might be derived from a Koreanic word cognate to early Korean *məti 'head'. But ... if that were the case, why isn't the Jurchen graph read me [mə]?

The same vocalic mismatch problem arises if I claim that the Jurchen reading ma is derived from a peninsular para-Japonic reading cognate to Old Japanese mətə 'origin'.

Moreover, neither the Koreanic nor para-Japonic hypotheses account for the absence of a second syllable in the Jurchen reading. A highly speculative workaround: the Koreanic or para-Japonic source reading was something like *mət with apocope, and the character was used to write a Jurchen open syllable since native Jurchen words could not end in -t. That still doesn't solve the vocalic mismatch problem, though.

A really crazy hypothesis: what if the reading ma is Chinese in origin?

1.5.22:45: Let's see if I can make the ma-hypothesis 'work' (note the scare quotes). When a crazy idea occurs to me, I like to follow it through to see just how much absurdity results before I give it up.

元 can be reconstructed with *ŋ- in Old Chinese. That nasal might come from an even earlier cluster, but there is almost no support for a cluster within Chinese except for the fact that 元 is phonetic in 院 'courtyard wall' which has been reconstructed with *w- (Schuessler 2007: 593) or *ɢʷ- (Baxter and Sagart 2014). I reconstruct *I-presyllables in both words to account for their later vocalism:

元 fused its presyllable with the initial, whereas 院 lost its presyllable (which could have been *mI- like that of 元).

Suppose *mɢʷɨan was reduced to something like *mɨan in some northeastern Chinese dialect. (But there is no evidence for that ever happening! Nor are there any other known cases of *mɢʷ- becoming *m-.)

What if a 元-like graph pronounced *mɨan served as a phonogram for ma in some precursor of the Jurchen script: e.g., the Parhae script?

The use of a Chinese *mɨan-graph as a phonogram for a foreign ma is parallel to the use of 萬 Early Middle Chinese *mua̤n < Old Chinese *mɨanh as a phonogram for Old Japanese ma. 

5. Why is <CURSE> read as ju in Japanese? Theoretically it should be shu (from Early Middle Chinese *ɕṳ) or shū < *siu (from Late Middle Chinese *ɕìw) and there is no ju-character that would serve as an analogical model. The most common character with the same phonetic is 祝 shuku (not juku).
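The reasoning here, and in the note on 詛 below, is an analogical guess: read an unfamiliar character like other characters sharing its phonetic. This toy Python version uses only characters and readings mentioned in this entry as a hypothetical mini-lexicon; labeling the shared components 且 and 兄 is an assumption of the sketch. It predicts so for 詛 but shuku for 呪, which is exactly why the actual reading ju is puzzling.

```python
from collections import Counter
from typing import Optional

# Hypothetical mini-lexicon: character -> (shared component, Sino-Japanese reading).
LEXICON = {
    "狙": ("且", "so"), "阻": ("且", "so"), "祖": ("且", "so"),
    "租": ("且", "so"), "粗": ("且", "so"), "組": ("且", "so"),
    "祝": ("兄", "shuku"),
}


def guess_reading(component: str) -> Optional[str]:
    """Guess a reading from the most common reading of characters sharing a component."""
    readings = Counter(reading for comp, reading in LEXICON.values() if comp == component)
    if not readings:
        return None
    return readings.most_common(1)[0][0]


print(guess_reading("且"))  # so: the expected reading of 詛
print(guess_reading("兄"))  # shuku: but 呪 is actually read ju
```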

呪 <CURSE> wasn't in the Japanese required character list until 2010, but I learned it anyway in third grade from exposure. I assume most Japanese students encounter characters long before they are formally taught them in school.

The second half of the synonym compound 呪詛 <CURSE CURSE> juso 'curse' still isn't in the required list, but people can easily read 詛 since it shares a so-phonetic with required characters that are all read so: 狙阻祖租粗組.

YELLOW PIG 12/8

Yesterday was yellow pig 12/7 despite the title of yesterday's post.

I'm going back to putting the Jurchen date at the top.

songgiyan uliya aniya

juwa juwe biya ice jakun inenggi

'yellow pig year, ten two month, new eight day'

From now on I'm going to split the date on two lines to display better in my browser.

1. I had meant to start the year by announcing that I had uploaded a week of posts (19.12.26-20.1.1) but my blog was down, so I didn't get around to uploading the posts until after I got my blog running again today. The posts should be on my front page for at least a month if not longer depending on when I decide to start deleting the oldest posts again.

2. Today while copying the Bureau of Translators Jurchen vocabulary, I came across the word

for 'sparrow-GEN' transcribed in Ming Mandarin as 失赤黑 *ʂi tʂʰi xəj.

In the Bureau of Interpreters vocabulary, the word appears solely in Ming Mandarin transcription as

舍徹 *ʂɛ tʂʰɛ

Kane interprets that as sece(he) (why not she-?). My guess is shec(ih)e.

One might expect the standard Manchu cognate of those words to be sicihe or shecihe, but the actual word is ... cecike.

Kane (1989: 115) gives other examples of J s(h) : M c and J h : M k:

I would add cases like

in which the Chinese transcription might not indicate a Jurchen -h-.

Those correspondences deserve further study. All I can say for now is that

HAPPY NEW YEAR 2020

It's still the year of the pig in traditional East Asian calendars, but it's the year of the rat (2020) if one coordinates the Chinese animal cycle with the Gregorian calendar:
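The naive alignment of the color-animal cycle with the Gregorian calendar can be computed directly. This Python sketch is an illustration, not a calendar implementation: it treats 4 CE (and hence 1984) as a blue rat (甲子) year, renders 青 as 'blue', and ignores the lunar new year entirely, which is why it labels all of 2020 a white rat year even though early 2020 still fell in the traditional yellow pig year.

```python
# Colors of the ten heavenly stems, two stems per color (青 rendered here as "blue").
STEM_COLORS = ["blue", "blue", "red", "red", "yellow",
               "yellow", "white", "white", "black", "black"]

# The twelve animals of the earthly branches, starting with the rat.
ANIMALS = ["rat", "ox", "tiger", "rabbit", "dragon", "snake",
           "horse", "sheep", "monkey", "rooster", "dog", "pig"]


def gregorian_year_name(year: int) -> str:
    """Color-animal name of a Gregorian year, ignoring the lunar new year."""
    offset = year - 4  # 4 CE starts a 60-year cycle (甲子, a blue rat year)
    return f"{STEM_COLORS[offset % 10]} {ANIMALS[offset % 12]}"


print(gregorian_year_name(2019))  # yellow pig
print(gregorian_year_name(2020))  # white rat
```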

Last night I realized that Khitan small script character 216 might be a derivative of 118 <qu>:


Let's assume 216 was <qu*> with <*> indicating 'different from <qu> in some way'. Then

<216.151> 'rat'

would be read <qu*ghu> which is close to Written Mongol qulughana 'rat'.

What if <qu*> were <qul>? <qul.ghu> is close to qulughana, but I wouldn't expect Khitan u to correspond to Written Mongol a.

That's where I left off last night. Today I realized that <ghu> might be read <ugh> after a consonant. So maybe <216.151> was read <qul.ugh> which is even closer to Written Mongol qulughana and requires no vocalic gymnastics.

The low frequency of 216 (7 times in the 契丹小字研究 Qidan xiaozi yanjiu corpus and 0 times in initial position in Wu and Janhunen 2011 [whose index is organized by initial graphs]) suggests that it probably did not represent a simple CV syllable. If it didn't represent the CVC syllable qul, it may have represented a CVCV sequence qulu, and <qulu.ugh> was read qulugh.

The <qul(u)> hypothesis could be confirmed if 216 alternated with <qu.l>, <qu.ul> (= <>?), etc.

As far as I know, 216 appears only in initial position with one exception: this block

<119.216> <dau.?>

from line 3 of the second inscription in the 萬部華嚴經塔 Wanbu Avataṁsakasūtra Pagoda in Hohhot.

2. I still practice writing Tangut, Khitan, and Jurchen (TJK) every day. Recently I added Manchu to my regimen and today I started writing Mongolian (in the traditional script - I still don't know how to handwrite Ө and Ү in Cyrillic).

All my TJK exercises begin with the date. I'm still going to date these blog entries in Jurchen since it's the thousandth anniversary of the Jurchen large script or close to it (see Kiyose [1977: 22] for three possible dates: 1119, 1121, and 1123; Kane [1989: 3] gives the date 1120, though Kane [2009: 3] gives the date 1119). Today's date in Jurchen is:

songgiyan uliya aniya juwa juwe biya ice nadan inenggi

'yellow pig year, ten two month, new seven day'.

3. Last night I learned about prothesis in Bashkir:

The prothesis is mostly unsurprising, but these correspondences are:

1.2.11:00: I forgot to mention these cases of prothesis in native words:

Without more Bashkir data, I can't test my guesses for motivations: e.g., avoiding initial l- and making monosyllables disyllabic.

The Bashkir letter ҡ <q> surprised me since I'm accustomed to қ <q> from Kazakh, etc. Why do Bashkir and Siberian Tatar have their own special ҡ <q>? Siberian Tatars were educated in (Volga) Tatar which has к <k> for /k/ (including a [q] allophone) and  къ <k"> for /q/.

4. Today I learned about the Caucasian Albanian script used to write a (near?-)ancestor of the Udi language.

I've thought Old Chinese might have had pharyngealized vowels, so I'm interested in the phonetics of Udi's pharyngealized vowels.

5. What is the etymology of Persian شمشیر <šmšyr> shamshir, first (?) attested in Middle Persian as <šmšyl>? It doesn't look Indo-European. Is it an areal word?

6. Why does the Persian word/name فرشته <fršth> fereshte < firishta sometimes appear as Farishta(h), e.g., in this 1958 Bollywood film title (फरिश्ता Phariśtā; cf. Urdu فرشته Firishta) and this list of Pashto (not Persian, I know) names? YELLOW PIG 12/6

songgiyan uliya aniya juwa juwe biya ice ninggu inenggi

'yellow pig year, ten two month, new six day'

1. Last night I looked up 'hip bone' and discovered it could also be called the innominate bone. Why 'nameless'?

2. Are 清樂 Shingaku 'Qing music' lyrics an overlooked source of data for premodern Mandarin reconstruction? In this sample from 月琴樂譜  Gekkin gakufu (Moon Guitar Sheet Music, 1877), 兒 (now ér [aɚ˧˥] in modern standard Mandarin) has the furigana ルウ <ruu>. That seems to indicate that the kana transcription is based on a dialect in which 兒 was pronounced like [ɻ̩]. (Other evidence rules out the most obvious interpretation [ruː]: e.g., no Mandarin dialect has [u] in 兒.)

The date of the text does not necessarily indicate that the [ɻ̩] pronunciation still existed in the source dialect as of 1877. The kana spelling ルウ <ruu> could have been copied from some earlier source.

ルウ <ruu> bears no resemblance to ジ <zi> [dʑi], the usual Japanese reading of 兒. Strictly speaking, the two Japanese borrowings are not from the same dialect in two different periods: <zi> is from a 7th century northwestern Chinese dialect, whereas <ruu> is from a Qing (perhaps 18th century?) Mandarin dialect. Nonetheless the latter probably underwent more or less the same changes as the former, so as a convenient fiction, here's how the sources of <zi> and <ruu> could be bridged:

Modern standard [aɚ] is from a stage 5-type form that developed a prothetic vowel:

*ɻ̩ > *əɻ > > [aɚ]

In some Mandarin varieties, only the prothetic vowel  has survived without any trace of retroflexion: e.g.,  壽縣 Shouxian [ə] and 鳳陽 Fengyang [a] for 兒.

It is tempting to derive Sino-Korean 아 a for 兒 from a Fengyang-like form, but that would be anachronistic. Fengyang [a] is probably a very recent development from *ar, whereas the earliest attested ancestor of 아 a is ᅀᆞ zʌ, borrowed from a form like stage 4 *ʐɻ̩. ᅀᆞ zʌ became ʌ in the 16th century, and ʌ then became a in the 18th century.

3. I don't understand how Korean z vanished without a trace. Lee and Ramsey (2011: 142) state that "early examples of the elision of z are all restricted to the environment _i, y, which suggests that the process of change started there." They give these examples:

In those particular cases, I can imagine /z/ being phonetically something like [ʑ] that lenited to [j] and then disappeared before /i/. But what were the intermediate stages between /z/ and zero in initial position before /ʌ/ as in 15th century /zʌ/ > 16th century /ʌ/?

I thought [ɦ] might be a possible intermediate stage by analogy with Sanskrit:

Proto-Indo-Iranian *ĵʱ > Sanskrit h [ɦ] but Avestan z

I assume there was a stage like *ʑʱ underlying both  the Sanskrit and Avestan reflexes. (No, see topic 4 below.) That stage would be like Middle Korean /z/. In some modern Indic languages, Sanskrit initial h- has disappeared in reflexes of hima- 'winter'. I don't know if that's a regular change.

4. I've been trying to work out the phonetics of Proto-Indo-Iranic¹ (PII) reflexes of Proto-Indo-European (PIE) velars.

4.1. The PIE starting point:


4.2. The first palatalization in PII



4.3. Affrication in PII (cf. the alveolar affricate reflexes of Sanskrit palatals in some modern Indic languages)


4.4. The merger of plain velars and labiovelars


4.5. The second palatalization in PII


*ɟʱ *gʱ

Velars palatalized in certain environments. Compare:

4.6. The merger of *e and *o into *a made the second palatalization phonemic:

It was no longer possible to regard *c as an allophone of /k/ before /e/, since /e/ no longer existed. (The e of later Indo-Iranic languages is not from the earlier *e that merged with *a: e.g., Sanskrit e is from PII *ai which could be from PIE *ei or *oi but not PIE *e.)

1.1.0:59: The following sections deal with post-PII developments.

4.7. Pre-Sanskrit (Proto-Indic²) stage 1


*ɟʱ *gʱ

The affricate series palatalized. I thought the absence of *ts-type affricates in Proto-Dravidian might have pressured a shift away from alveolar affricates, but the traces of Indic in the Near East - far from Dravidian - underwent stage 2 (4.8 below): e.g., the name Paršasatar from praśāstar- 'director' with ś < PII *ts-.

4.8. Pre-Sanskrit (Proto-Indic) stage 2



Voiceless *tɕ simplified to *ɕ.

The voiced affricates merged with the voiced palatals.

I don't know the order of those two changes, so I show the results of both changes in the same table instead of arbitrarily showing one change at a time in two tables.

4.9. Sanskrit (Proto-Indic)

ś [ɕ]
j [ɟ]
h [ɦ]
gh [gʱ]

*ɟʱ weakened to h [ɦ].

4.10. Proto-Iranic (continuing from 4.6)



The voiced aspirate series merged with the plain voiced series.

4.11. Avestan

j [ɟ] g

The affricates deaffricated. The change of *ts to s is roughly parallel to the change of *tɕ to ś in Sanskrit. But note that Proto-Iranic *dz became Avestan z, whereas pre-Sanskrit *dz did not become Sanskrit ź [ʑ], a sound that does not exist in Sanskrit.

The exact phonetics of c and j are unknown. They were palatal unlike s and z, so I have projected palatal stops forward into Avestan. But maybe Avestan c and j were actually affricates.

4.12. Summing up

2nd palatalization
*kʲ n/a
*gʲ n/a
j [ɟ]
*gʲʱ/*gʱ n/a
*dzʱ h [ɦ]
*k/*kʷ +
*g/*gʷ +
j [ɟ]

j [ɟ]
*gʱ/*gʷʱ +
*ɟʱ h [ɦ]
*k/*kʷ -
*g/*gʷ -
*g g
*gʱ/*gʷʱ -
*gʱ gh [gʱ]
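The Sanskrit half of the summary can also be read as a lookup table. The Python sketch below is only my paraphrase of the rows above, not a standard resource, and the names REFLEXES, sanskrit_reflex, and sources_of are hypothetical. Each key pairs a PIE series with whether the second palatalization applied (None for the 'n/a' rows). Each value gives the PII stage and the Sanskrit reflex. The *k/*kʷ reflexes (c [c] and k [k]) are my assumption, based on 4.5, 4.6, and the palatal-stop reading of j [ɟ]. The second helper makes the mergers visible: for example, both *gʲʱ and palatalized *gʱ/*gʷʱ end up as h [ɦ].

```python
from typing import List, Optional, Tuple

# (PIE series, second palatalization applied?) -> (PII stage, Sanskrit reflex).
# None marks the "n/a" rows: series already affected by the first
# palatalization, for which the second palatalization is irrelevant.
REFLEXES = {
    ("kʲ", None): ("*ts", "ś [ɕ]"),
    ("gʲ", None): ("*dz", "j [ɟ]"),
    ("gʲʱ", None): ("*dzʱ", "h [ɦ]"),
    # Assumption: palatalized *k gives Sanskrit c, read as a palatal stop.
    ("k/kʷ", True): ("*c", "c [c]"),
    ("g/gʷ", True): ("*ɟ", "j [ɟ]"),
    ("gʱ/gʷʱ", True): ("*ɟʱ", "h [ɦ]"),
    ("k/kʷ", False): ("*k", "k [k]"),
    ("g/gʷ", False): ("*g", "g [g]"),
    ("gʱ/gʷʱ", False): ("*gʱ", "gh [gʱ]"),
}


def sanskrit_reflex(pie: str, palatalized: Optional[bool]) -> str:
    """Describe the path from a PIE series to its Sanskrit reflex."""
    pii, sanskrit = REFLEXES[(pie, palatalized)]
    return f"PIE *{pie} > PII {pii} > Sanskrit {sanskrit}"


def sources_of(sanskrit: str) -> List[Tuple[str, Optional[bool]]]:
    """List every key whose Sanskrit reflex is the given one (the mergers)."""
    return [key for key, (_, skt) in REFLEXES.items() if skt == sanskrit]


print(sanskrit_reflex("gʱ/gʷʱ", True))
print(sources_of("h [ɦ]"))
```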

¹1.1.0:40: I favor the term Iranic by analogy with Turkic, Mongolic, etc. to avoid confusion with the country of Iran.

²1.1.0:57: I prefer the term Indic to Indo-Aryan, as the word Aryan is shared by both Indic and Iranic. Ironically, the name Indic is actually Iranic, as it is a Hellenization of Old Persian 𐏃𐎡𐎯𐎢𐏁 <ha i du u sha> [hi(n)duš] 'India', cognate to Sanskrit Sindhus 'Sindhu'. The Old Persian form has two Iranic innovations:

It occurs to me tonight that an Indic name for Indic would be Sindhic, but that's not going to catch on. No one is going to rename the country Sindhia either. And Hindutva advocates are probably not going to change the name of their ideology to Sindhutva. YELLOW PIG 12/5

songgiyan uliya aniya juwa juwe biya ice shunja inenggi

'yellow pig year, ten one month, new five day'

1. I checked Jan van Steenbergen's Interslavic page for updates and noticed a new item in the menu:

The Painted Bird (in Czech: Nabarvené Ptáče) a Czech-Slovak-Ukrainian film written, directed and produced by Václav Marhoul. It is based on Jerzy Kosiński’s novel The Painted Bird from 1965.


The action takes place in some unspecified East-European, Slavic-speaking country. A place that cannot directly be linked to a specific Slavic population requires a language that can instantly be recognised as Slavic but not be linked directly to any specific Slavic population either. That's why Marhoul decided to use Interslavic:

2. I just bought e-access to Vojtěch Merunka's Interslavic zonal constructed language: an introduction for English-speakers. Google says I can check a box to "Make [the book] available offline", but I can't find it.

On page 5, Merunka writes (12.31.14:03: links added),

Interslavic is also an interesting experiment of alternative history: If there was not such strong pressure from the Frankish Latin-oriented church (e.g. Wiching of Nitra and his band) against the Moravian Church in the 9th century, the invasion of the Hungarians into Central Europe and the subsquent collapse of contacts between Moravia (now a territory of both the Czech and Slovak Republics) and Bulgarian, Serbian and Kiev (later Russian) states, it is possible to imagine a hypothetic different evolution of the Slavic early Middle Age language - we have seen a similar phenomenon in the Arabic World: After the end of natural linguistic unity during the Middle Ages, the modernized universal Arabic language based on the religious language of the Qur'an still prevails. It is an artificial language which is close enough to the various contemporary spoken national dialects of Arabic that it is recognized as the standard for communication between Arabic nations and for contact with foreigners and used as an auxiliary language by both state apparatus and the media.

It would be fun to see historical fiction depicting a world where Interslavic - probably simply 'Slavic' - has the same position that modern standard Arabic has.

Page 143 presents a modified Arebica alphabet to write Interslavic.

3. 𗡠 0271 2mer4, representing the second syllable of 𗡢𗡠 0702 0271 1to'4 2mer4  'to seek, find', has a right side (Boxenhorn code: baedar) found nowhere else. I found it in Li (2008: 47) when looking up  𘅊 0273 1le1 for my last entry.

2mer4 sounds like Old and Middle Chinese *mek 'to seek'. If I were to force a relationship between the two, I could trace 2mer4 back to pre-Tangut *RImek-H with labial dissimilation:

*Pek > *Pew > *Pej > Pe

*RImek-H could be related to

𗑉 4684 1me1 < *CAmik or *mek 'eye'

cf. Tibetan mig (archaic dmyig) 'eye' (but Old Chinese has 目 *Cmuk - is *Cmikʷ possible?)

which is the word that made me discover labial dissimilation. Two scenarios:

But there are other possible pre-Tangut sources of 2mer4 that would rule out a connection with the Chinese word:

𗡢 0702 1to'4 'to seek' can appear by itself. That suggests that 𗡠 0271 2mer4 might be a formerly independent verb that only survives as the second half of a synonym compound 'seek-seek'.

4. Li (2008: 120) gives this example of 0702 as an independent verb from The Timely Pearl 292:


5098 0702 0760 1715

2ngon4 1to'4 2dzen4 1rar4

'case seek judge ?'

It corresponds to Chinese 案檢判憑 'case examine judge ?'

Nishida (1964: 215) has the translation 'to examine the case and hand down a judgment'. Nishida (1964: xii) says Burton Watson and a ヤンポルスキー (Yampolsky? - I don't know who this is, or what his preferred Anglicization of Ямпольский is) helped him with the English translations. Later, Nishida (1964: 216) has the translation 'deliver a judgment' for 判憑 in The Timely Pearl 302.

I would think then that 𘅤 1715 1rar4/憑 means 'to hand down' or 'to deliver'. But the basic meaning of 𘅤 1715 1rar4 is 'to write' (Li 2008: 285). So might the Tangut phrase in The Timely Pearl mean 'write a judgment'?

憑 can be translated many ways in Chinese, but none of those translations mean 'write' or 'hand down' or 'deliver'. Might it be 'proof': i.e., 'evidence'? If so, then there is only a vague parallel between the Tangut object-verb sequence 𗍷𘅤 'write a judgment' and the Chinese verb-object sequence 判憑 'judge evidence (?)', and mechanically equating 𘅤 with 憑 may be a mistake.

Then again, to say Burton Watson's knowledge of Chinese dwarfs mine would be an understatement, and maybe 判憑 is an idiom 'deliver/hand down a judgment' that I just failed to confirm in other sources.

I always assumed Watson had learned Japanese in the American military in WWII, but in fact he didn't know any Japanese when he arrived in Japan in 1945, and he was actually a Chinese major.

5. My DuckDuckGo search for Yampolsky led me to a video of minerva scientia pronouncing Tangut in Gong's (more or less) and Arakawa's reconstructions.

6. ElitekidMu0 comments on that video:

Fun fact: Thunder Force VI [Wikipedia], a shooting game released in 2008 by SEGA for the PS2, included the Tangut Language as the main language for the protagonist of the series, Galaxy Federation (Vastian). Another language included in the game is the Mongolian Script, used by the antagonist of the series, ORN Empire.

7. Last night I learned that Kara Ben Nemsi was meant to mean 'Carl son German' (though nemsi is really closer to نمساوي namsāwiyy/nimsāwiyy 'Austrian'; 'German' is ألماني 'almāniyy).

Karl May has a way with foreign names. I couldn't have come up with something equivalent to Old Shatterhand or Old Surehand in German.

8. I just noticed that the Old English Wikipedia (Ƿikipǣdia) is

Sēo Frēo Ƿīsdōmbōc

'the free wisdombook' (Ƿ <W> wynn is a rune borrowed into the Old English alphabet)

Are Goidelic forms like Irish seo 'this' the only living reflexes of Proto-Indo-European *só retaining s-? Greek [o] has lost h- < *s-, and English the has a th- that spread from the th-reflexes of the *t-initial oblique forms of *só.

9. I finally got around to rewriting my lost entry for 12.26 from memory. I finished right after I ordered a used hardcover copy of William C. Hannas' The Writing on the Wall: How Asian Orthography Curbs Creativity (2003).

10. Tonight I discovered the variant 槑 for 梅 <PLUM>.

11. Baxter and Sagart (2014) reconstruct 梅 <PLUM> in Old Chinese as *C.mˤə. I suspect that *C was a voiceless consonant because Vietnamese mơ 'apricot' has a ngang tone pointing to an earlier *m̥- which may be from an even earlier *C̥m- with a voiceless *C̥- that conditioned the devoicing of *m-. I would reconstruct the word in Early Old Chinese as *C̥Amə with a low first vowel that triggered the warping of *ə to *ʌə:

*C̥Amə > *C̥Amʌə > *C̥mʌə > *m̥ʌə > *mʌe > *mʌj > *mɑj > *mwɑj > *muj > *mwəj > *məj > standard Mandarin [mej]

It is possible that *C̥A- was simply completely lost after warping in (many? most? all?) dialects other than the one underlying Vietnamese *C̥m-. I have not yet found any Chinese varieties with a yinping tone pointing to *m̥-.

The *m̥- in the scenario above is of late origin. An earlier *m̥- in Old Chinese became *x- in stage 2 below, whereas newer *m̥- merged with *m-:

stage 1
stage 2
stage 3

*m̥- *x-
hǎi [xaj˧˩˧]

*C̥m- *m̥- *m-
méi [mej˧˥]

měi [mej˧˩˧]

The tones above are conditioned by both initials and final glottals: final glottal stops conditioned the falling-rising tone [˧˩˧], whereas stage 3 voiced *m- combined with the absence of a final glottal conditioned the high rising tone [˧˥]. YELLOW PIG 12/4

songgiyan uliya aniya juwa juwe biya ice duin inenggi

'yellow pig year, ten two month, new four day'

1. Tonight it occurred to me that the Jurchen and Khitan large script characters for 'four' might be graphic cognates:

One might be rotated - but which one? And did the Parhae script have both rotated and nonrotated variants of <FOUR>?

12.30.0:17: Both <FOUR>s have four strokes, so they may simply be two types of tally marks formalized as characters.

In any case, the Khitan large script character is not to be confused with Chinese 卅 <THIRTY> which is a fusion of three 十 <TEN>s.

12.30.12:50: Chinese 卅 <THIRTY> in turn should not be confused with the Jurchen phonogram <sui>:


Jin (1984: 25, 26, 180) reports the first pair of forms in the 大金得勝陀頌碑 Great Jin Victory Hill stele (1185) and the second 卅-like  pair of forms in the Berlin and Tōyō bunko copies of the Ming dynasty Bureau of Translators vocabulary from c. 1500. Without examining the original texts, I cannot be certain about minor variations such as the presence or absence of a hook in the 1185 stele.

I fear that the Bureau of Translators' forms might be unintentionally 'sinified' in the sense that unfamiliar Jurchen characters were accidentally modified by scribes more familiar with sinography. Perhaps the resemblance of <sui> to Chinese 卅 <THIRTY> in the Bureau of Translators vocabulary might be an example of sinification.

12.30.15:33: Jin (1984: 58, 76) derives Jurchen <FOUR> from the phonogram <da> which in turn he derives from Chinese 屠:


In the Jin dynasty, 屠 was pronounced *tʰu. Why base a phonogram <da> on a Chinese character pronounced *tʰu?

I don't think <da> was a Jin dynasty invention. I think its roots go back further to a period when 屠 was pronounced as *da in Late Old Chinese. (屠 was once a transcription character for -ddha in 浮屠 *bu da = Buddha.) In other words, I think <da> is potential evidence for the Jurchen large script being an heir to an old tradition of phonetic writing rather than a 12th century invention.

I don't think there is any relationship between <FOUR> and <da> beyond graphic convergence - the bottom of <da> (known only from two inscriptions) may have been remodelled after the far more common character <FOUR>.

2. Tonight while copying character 236 of the Golden Guide, I miswrote the Tangut character element 𘡛 by placing the dot too low so it intersected the stroke below it.

Nishida (1966: 242) interpreted 𘡛 as a radical for things having to do with 愛惜 aiseki 'cherish'. It just occurred to me that 𘡛 might be derived from the top of 愛 <LOVE> or the top right of 惜 <CHERISH>.

But ... what is 𘡛 doing on the top of 𘓉 0993 1lhew1 'to herd', of all things? Is 𘓉 0993 a semantic compound like <CHERISH.LIVESTOCK>?

But ... the bottom of 𘓉 0993 (Boxenhorn code: baecie) is neither 'livestock' nor short for a character for any animal. The only other character with baecie is 𘅊 0273 1le1, a character for writing surnames.

3. I was surprised by this passage (emphasis mine):

Martin Kümmel similarly proposes, based on observations from diachronic typology, that the consonants traditionally reconstructed as voiced stops were really implosive consonants, and the consonants traditionally reconstructed as aspirated stops were originally plain voiced stops, agreeing with a proposal by Michael Weiss that typologically compares the development of the stop system of the Tày language (Cao Bằng Province, Vietnam).

But then I checked Pittayaporn (2009: 110) who explains that in Cao Bằng,

I can see something similar happening in Proto-Indo-European ... except for this problem:

The ejective hypothesis, on the other hand, correctly predicts that Proto-Indo-European labial *pʼ (corresponding to *ɓ- in the implosive hypothesis) would be rare or absent.

4. I wish there were animated GIFs like the Georgian ones at for Manchu and traditional Mongolian letters. I've been using Jun Jiang's Manchu app which has animated images for Manchu syllables and words, but it doesn't seem to match the verbal (nonvisual) instructions in Roth Li's Manchu textbook, so I'd like to see a second opinion.

5. I discovered that the Old English Wikipedia has a runic viewing option. Select ᚱᚢᚾ <run> under the article title.

12.30.0:16: Try the ȝƿ and ᵹƿ viewing options too.

6. Why is Gdańsk Gduńsk in Kashubian? Is Polish a : Kashubian u a regular correspondence in some environment(s)? I don't see anything like *a > u in Stone's (1993: 765) sketch of Kashubian vowel history.

7. Another Kashubian surprise: kùńszt [kwuɲʃt] (I think) 'art' < German Kunst. Why [wu]? How did Kashubian develop [wu] in native words? Is [ɲ] instead of [n] due to assimilation with [ʃ]? Was the word borrowed from a German dialect in which 'art' was [kunʃt] instead of [kʊnst]? 'Hyperlabial' [wu] for [ʊ] seems odd to me.

Aha, I see now that Kashubian /u/ becomes [wu] "[i]nitially or after a labial or a velar" (Stone 1993: 762). So [wu] has nothing to do with German.

8. How did Proto-Slavic *sŭnŭ 'sleep' become Lower Sorbian soń with a palatal ń instead of the expected n as in the rest of Slavic: e.g., Upper Sorbian son? YELLOW PIG 12/3

<so nggiyan uliya aniya juwa juwe biya ice ilan inenggi>

'yellow pig year, ten two month, new three day'

(0. 12.29.0:15: I keep thinking the version of <ilan> above looks like Chinese 斗 <DIPPER>, but it is of course cognate to Chinese 三 <THREE>.)

1. Via Andrew West: Abraham Gross' proposal to encode the missing kana <YI> and <WU> in Unicode. That reminds me to upload my August post about <YI> and <WU>.

2. I first heard the song "Year of the Cat" as a child in 1976, and only years later¹ did I learn that it was a reference to the Vietnamese zodiac which is close to the Chinese one with two exceptions:

The terms for the Vietnamese zodiac are not the normal terms for animals: e.g., in Vietnamese, 'water buffalo' is 𤛠 trâu and 'ox' is 𤙭 ~ 𤞨  bò.

I've long assumed that the reinterpretation of 丑 sửu as water buffalo incorporated a local animal, but water buffalo exist in China too. Duh. In fact, China has seven times as many water buffalo as Vietnam. Shows you what I know about farming: nothing. So I can't explain how sửu came to refer to water buffalo.

As for 卯 mão/mẹo, was its reinterpretation as 'cat' due to a folk etymological association with 貓 ~ 猫 mèo 'cat'?

¹In an interview with Al Stewart that I heard on the radio in 1989?

3. I had never heard of screeves until today. The word sounds like it could be a native English word, but in this context it's actually a loan from Georgian მწკრივი mcʼkʼrivi 'row, series'. I wonder why it's so Anglicized. It's not as if Japanologists speak of 行 gyō 'rows (of kana sharing the same consonant: e.g., a, ka, sa)' as gheow or however an English speaker might spell it. (It would be fun to ask English speakers unfamiliar with Japanese to write gyō phonetically.)

There turns out to be another screeve which is neither native nor from Georgian. YELLOW PIG 12/2

<so nggiyan uliya aniya juwa juwe biya ice juwe inenggi>

'yellow pig year, ten two month, new two day'

1. Dept. of Ideas I Wish I Had: Alexander Zapryagaev's proposal for writing Old Japanese in hentaigana, a logical extension of the common practice of writing the extinct Japanese syllable ye (now [e]) in hiragana as the hentaigana 𛀁 to differentiate it from え e and ゑ we (also now [e]). (More in this thread by Sven Osterkamp.)

2. The reading ritsu for 立 <STAND> is in that stratum of Japanese that I feel as if I've 'always' known. I suspect I learned the reading in the early 80s when I started to read Japanese books with furigana.

When I started learning Korean in 1987, I immediately picked up on the correspondences between Sino-Korean and Sino-Japanese¹. For instance, I noticed that Sino-Korean -l regularly corresponded to Sino-Japanese -tsu or -chi and vice versa. So I should have expected ritsu to correspond to Sino-Korean 릴 ril. But the actual Sino-Korean reading of 立 is 립 rip. I learned that reading so early in my studies that I didn't even know the correspondence patterns yet. Hence the mismatch of -p and -tsu didn't bother me at all.

Not long afterward I learned Sino-Korean 잡 chap corresponding to Sino-Japanese zatsu for 雜 <MIXED>.

And then I learned the Cantonese readings of those characters: lap6 and zaap6.

The next step was learning about Chinese reconstruction. Of course all agree that 立 and 雜 originally ended in *-p in Chinese, and that Cantonese preserves that *-p.

So how did the Sino-Japanese readings of 立 and 雜 come to end in -tsu? Alexander Zapryagaev has a thread on the mystery of 立 ritsu.

¹And Mandarin, but that's not relevant here, since Mandarin lacks final stops. Without knowledge of Mandarin, I would have had a much harder time remembering which Sino-Korean words ended in -ng.

12.29.20:35: How I guessed final consonants in Sino-Korean in 1987 (before I knew anything about Cantonese or Vietnamese):

Sino-Japanese final
Sino-Korean final
vowel (usually; unpredictably occasionally in -p)
-ki, -ku
-chi, -tsu
-n or -m (unpredictable)

At the time I just memorized which Sino-Korean readings ended in -p, since there was no way to guess Sino-Korean -p on the basis of Sino-Japanese or Mandarin even in regular cases such as

十 <TEN> SJ jū : Md shi : SK 십 ship

In that particular case, *-ip was borrowed into Japanese as *-ipu which became *-iu and then -ū.
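Looking back, my 1987 guessing procedure amounts to a few suffix checks. Here is a minimal Python sketch under my own assumptions. The function name guess_sk_final and the rule list are hypothetical illustrations, not a published method. Readings are romanized in Hepburn with macrons. A final is only recognized after a preceding vowel, so monosyllables like 地 chi are not mistaken for -chi finals. Vowel-final readings are left unguessed because, in my experience, they can correspond to Sino-Korean open syllables, -ng, or -p.

```python
VOWELS = set("aeiouāēīōū")

# (romanized Sino-Japanese endings, guessed Sino-Korean final)
RULES = [
    (("ki", "ku"), "-k"),
    (("chi", "tsu"), "-l"),
    (("n",), "-n or -m"),
]

UNGUESSABLE = "none, -ng, or -p (cannot tell from Sino-Japanese)"


def guess_sk_final(sj_reading: str) -> str:
    """Guess a Sino-Korean final from a romanized Sino-Japanese reading."""
    for endings, sk_final in RULES:
        for ending in endings:
            stem = sj_reading[: -len(ending)]
            # Treat the ending as a final only if a vowel precedes it,
            # so that a monosyllabic reading such as chi is not misread.
            if sj_reading.endswith(ending) and stem and stem[-1] in VOWELS:
                return sk_final
    return UNGUESSABLE


for reading in ["gaku", "ichi", "kin", "ritsu", "zatsu", "jū"]:
    print(reading, guess_sk_final(reading))
```

As in 1987, the sketch predicts -l for 立 ritsu and 雜 zatsu, which is precisely where the guess fails: the actual readings are 립 rip and 잡 chap. For 十 jū, it correctly declines to predict the -p of 십 ship.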

Once I learned which Sino-Korean readings ended in -p and -m, I could use that knowledge to guess which Cantonese and Vietnamese readings ended in -p and -m. YELLOW PIG 12/1

(I completed this post but lost it before I could upload it, so I reconstructed it on 12.30.16:13.)

<so nggiyan uliya aniya juwa juwe biya ice inenggi>

'yellow pig year, ten two month, new day'

1. The first ten days of the month are ice 'new' in the Ming Jurchen calendar. (In Jin Jurchen, the first day was 一日 emu inenggi 'one day'. Note how the early graphs are identical to Chinese 一日 <ONE DAY>.) Jin (1984: 105) derives the graph for ice from the left side 亲 of Chinese 新 <NEW>. But I think the Jurchen graph may be more directly connected to Chinese 𢀝 <NEW>, a variant of 新 attested in the Jin dynasty dictionary 四聲篇海 Sisheng pianhai (The Four-Tone Text Sea).
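The dating formula I use at the top of these entries can be summarized as a small Python function. This is a hypothetical sketch, not a reconstruction of any Jurchen source. The numerals are the ones already used in these entries' dates, and the function only covers months 11 and 12 and days 1-9 and 21-30, the ranges that occur here. Treating days 21-25 the same way as the attested 26-29 is my assumption. Days 10-20 are deliberately rejected rather than guessed. The function follows the Ming convention (ice inenggi for the first day), not the Jin convention (emu inenggi).

```python
# Numerals as spelled in the dates of these entries.
NUMERALS = {
    1: "emu", 2: "juwe", 3: "ilan", 4: "duin", 5: "shunja",
    6: "ninggu", 7: "nadan", 8: "jakūn", 9: "uyewun",
    10: "juwa", 20: "orin", 30: "gūsin",
}


def jurchen_date(month: int, day: int,
                 year: str = "songgiyan uliya aniya") -> str:
    """Build a Ming-style Jurchen date phrase (hypothetical helper)."""
    if month not in (11, 12):
        raise ValueError("only months 11 and 12 occur in these entries")
    # Months above ten are 'ten' + units: juwa emu biya 'ten one month'.
    month_phrase = f"{NUMERALS[10]} {NUMERALS[month - 10]} biya"
    if 1 <= day <= 9:
        # Early days take ice 'new'; day 1 is plain ice (day 10 not covered).
        day_phrase = "ice" if day == 1 else f"ice {NUMERALS[day]}"
    elif 21 <= day <= 29:
        day_phrase = f"{NUMERALS[20]} {NUMERALS[day - 20]}"
    elif day == 30:
        day_phrase = NUMERALS[30]
    else:
        raise ValueError("days 10-20 are not attested here")
    return f"{year} {month_phrase} {day_phrase} inenggi"


print(jurchen_date(12, 7))
print(jurchen_date(11, 29))
```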

2. In 1998 I reviewed William C. Hannas' Asia's Orthographic Dilemma for Korean Studies. I finally got around to reading a Kindle sample of the 2003 sequel The Writing on the Wall: How Asian Orthography Curbs Creativity.

Here's my attempt to sum up Hannas' argument:

A. East Asia has a "creativity deficit" (Kindle location 146)

B. Writing "affects thought" (Kindle location 245)

C. B causes A - in other words, East Asian writing systems cause a "creativity deficit"

A and/or B could be true. But I am skeptical of C. YELLOW PIG 11/30

<so nggiyan uliya aniya juwa emu biya gūsin inenggi>

'yellow pig year, ten one month, thirty day'

gūsin 'thirty' looks like Janhunen's (2003: 397) Proto-Tungusic *gutïn from para-Mongolic or pre-Proto-Mongolic *gutïn. (The Proto-Tungusic form cannot be from Proto-Mongolic *gucin which underwent two changes: *ï > *i and *ti > *ci.) However, Proto-Tungusic *gutïn should become Jurchen gutin, not gūsin.

I propose that Jurchen gūsin may be a borrowing that replaced an earlier *gūtïn inherited from Proto-Tungusic. (The macron in Jurchen does not symbolize length; it indicates that u is [ʊ].) The source of Jurchen gūsin may be a para-Mongolic (Khitan?) dialect that shifted *c to sh (unlike the prestigious Khitan dialect preserved in the small script that retains c).

I suspect that Khitan large script


is a graphic cognate of Jurchen


(12.26.13:19: Left to right: the earliest form from Nüzhen zishu [Book of Jurchen Characters, c. early 12th c.?], variant in 慶源 Kyŏngwon inscription, 1138-1153,  進士 jinshi candidate list, 1224, Berlin copy of the Bureau of Translators vocabulary, 15th c. It is interesting that the early and late forms are more similar to each other than to the forms between them.)

and sounded something like Jurchen gūsin, though there is no evidence for its pronunciation.

2. When I was studying Russian in the late 90s, I was surprised that 'Kremlin' was Кремль <Kreml'> without an n. I asked my professor why and ... I can't remember his answer. Today I learned from Wiktionary that there is an Old East Slavic кремлинъ <kremlinŭ> with -n-. But how did that n-form enter English? Not directly, I assume.

etymonline says:

1660s, Cremelena, from Old Russian kremlinu, later kremlin (1796), from kreml' "citadel, fortress," a word perhaps of Tartar origin. Originally the citadel of any Russian town or city, now especially the one in Moscow (which enclosed the imperial palace, churches, etc.). Used metonymically for "government of the U.S.S.R." from 1933. The modern form of the word in English might be via French.

The un-Turkic initial cluster kr- makes a Tatar (not 'Tartar') origin improbable. The Russian Wiktionary derives kreml' from Proto-Indo-European *kʷrom 'fence'.

12.26.10:09: Merriam-Webster says:

1662 [...] obsolete German Kremelien the citadel of Moscow, ultimately from Old Russian kremlĭ

That gives the impression that German added the -n (but why?). YELLOW PIG 11/29

<so nggiyan uliya aniya juwa emu biya orin uyewun inenggi>

'yellow pig year, ten one month, twenty nine day'

orin uyewun 'twenty nine' is a para-Mongolian (Khitan?)-Jurchen hybrid. Compare with Written Mongolian qorin yisün 'twenty nine' containing an unrelated Mongolian word for 'nine'.

Jurchen uyewun is trisyllabic unlike any other Tungusic word for 'nine' at starling other than Negidal ijeɣin with different first and third vowels. Negidal i can correspond to Jurchen/Manchu u: e.g., N edin : J/M edun 'wind'. I have long assumed that Manchu uyun is a contraction of uyewun. That contraction already existed before Manchu got that name since the Ming dynasty Bureau of Interpreters vocabulary has disyllabic uyun (transcribed 兀容). The roughly contemporaneous trisyllabic uyewun (transcribed 兀也溫) in the Ming dynasty Bureau of Translators vocabulary may be more carefully pronounced and/or from a different dialect.

It's already Christmas in most of the world as I write this, so as a 'gift' to my readers, I'm uploading all the posts I wrote over the last month but had kept on my computer until now:

I've been too tired and busy to upload posts late at night. YELLOW PIG 11/28

<so nggiyan uliya aniya juwa emu biya orin jakūn inenggi>

'yellow pig year, ten one month, twenty eight day'

1. orin jakūn 'twenty eight' is a para-Mongolian (Khitan?)-Jurchen hybrid. Compare with Written Mongolian qorin naiman 'twenty eight' containing an unrelated Mongolian word for 'eight'.

Jurchen jakūn 'eight' has not changed much from Proto-Tungusic *japkun whose first syllable *ja looks like Proto-Japonic *ya 'eight'. Coincidence? How many other instances of Proto-Tungusic initial *j- correspond to Proto-Japonic *y-?

If one wants to link the Tungusic and Japonic words for 'eight' via borrowing, one must deal with the complication of working out a scenario of Tungusic-Japonic contact (see yesterday's post) and with the question of why Tungusic has *-pkun and Japonic doesn't. Proposing a genetic relationship eliminates the contact problem but still doesn't resolve the *-pkun problem.

It may be tempting to link early Korean *yʌtʌrp (Lee and Ramsey 2011: 160) to the Tungusic and Japonic words, but that raises even more problems: e.g., what is *tʌrp?

2. The current state of Korea-Japan relations in a slogan:

(1.2.15:51: Corrections by Kongduino.)

The verbs appear to be bare stems but are actually a-stems that have absorbed an -a ending that Martin (1992: 466) calls the 'infinitive'. But I would rather not use the term 'infinitive' for the ending of a finite verb.

The -a ending is more obvious in forms like 봐! pwa! 'look!' (< po-a) and 팔아! phar-a 'sell!'

3. I was surprised to learn from Martin et al. (1967: 870) that sa- 'buy' is also an "old-fashioned" term for 'sell (grain)', so ssar-ŭl sa-da 'rice-ACC X-STATEMENT' can be either 'buy rice' or 'sell rice'. YELLOW PIG 11/27

<so nggiyan uliya aniya juwa emu biya orin nadan inenggi>

'yellow pig year, ten one month, twenty seven day'

1. orin nadan 'twenty seven' is a para-Mongolian (Khitan?)-Jurchen hybrid. Compare with Written Mongolian qorin dologhan 'twenty seven' containing an unrelated Mongolian word for 'seven' with the numeral suffix last seen in jirghughan 'six'.

Jurchen nadan 'seven' can be projected intact all the way back to Proto-Tungusic. Proto-Tungusic *nadan looks like Proto-Japonic *nana 'seven'. Coincidence? How many other instances of Proto-Tungusic intervocalic *-d- correspond to Proto-Japonic *-n-?

What complicates a loan scenario is uncertainty over whether the two proto-languages were in contact. I think Tungusic and para-Japonic languages might have been in contact in Parhae, but that's centuries after the ancestor of Japonic spread from the Korean peninsula to the Japanese islands.

2. I just heard Muir pronounced as [mjʊɚ] which is what I'd expect for a theoretical Miur. Wiktionary lists a General American /mɪɚ/. I had never heard the name pronounced before. I thought it was homophonous with Moore in English. Wiktionary lists five (!) pronunciations for Scots muir 'moor': [møːr], [myːr], [meːr], [miːr], [mjuːr].

3. I also heard Buttigieg pronounced for the first time as [ˈbuːtɪdʒɪdʒ]. I had been mispronouncing it as [ˈbuːtɪdʒɛg], thinking gi was like Italian [dʒ]. Turns out both g's are Maltese ġ [dʒ] and ie is [ɨː] (according to Wikipedia's IPA for Maltese page) or [ɪː], [iɛ], or [iː] (according to Wikipedia's Maltese language page). In any case, ie is from ā, and so I'm not surprised to learn that Wiktionary says Buttiġieġ is from Arabic أبو الدجاج <ʔbw ʔldjʔj> ʔabū ad-dajāj, lit. 'father [of] the-poultry' with ā.

The bending of ā to ie in Maltese reminded me of the raising of Old Chinese *a to *ie and various high vowels and convinced me that Norman's pharyngeal hypothesis for Chinese was right. In my take on his hypothesis, pharyngealization pushed vowels down, whereas vowels raised in its absence. But David Boxenhorn made me think pharyngealization might not be a factor; vowel harmony alone might trigger vowel lowering and raising. And vowel harmony is a well-attested phenomenon in north Asian languages.

YELLOW PIG 11/26

<so nggiyan uliya aniya juwa emu biya orin ninggu inenggi>

'yellow pig year, ten one month, twenty six day'

1. orin ninggu 'twenty six' is a para-Mongolian (Khitan?)-Jurchen hybrid. Compare with Written Mongolian qorin jirghughan 'twenty six' containing an unrelated Mongolian word for 'six'.

Grinstead (1972: 16) noted that

ninggu 'six'

is an inverted Chinese 六 <SIX>. It is not like any of the variants of Khitan large script <SIX>:

Is the Jurchen graph a 12th century invention, or is it derived from a version of the Parhae <SIX> that the Khitan did not adopt for their large script?

The reading of Khitan <SIX> is unknown, but it might be something like Proto-Mongolic *jir-gu-xan 'two-three-NUMERAL' as reconstructed by Janhunen (2003: 17). Jishi read <SIX> as ʧirkɔ: i.e., as 'two-three'. But if Janhunen is right about *jir-gu-xan being an innovation, Khitan might retain an older Proto-Serbi-Mongolic root for 'six'.

The Khitan small script block

<085.033.288> <> (Epitaph for Empress 仁懿 Renyi, d. 1076)

might indicate that <SIX> ended in -i, given how the initial vowel of one block (here, the i of <is>) is often (but not always) the final vowel of the previous block (here, <SIX>).

2. What is the etymology of Hawaiian luakini 'large heiau [Hawaiian temple; < hei 'sacrifice' + ?] where ruling chiefs prayed and human sacrifices were offered'? It looks like a compound of lua plus kini, but I can't find any lua or kini that would transparently add up to 'sacrificial temple'.

3. Wikipedia on the Dzungar genocide:

[Qing emperor] Qianlong issued his orders multiple times as some of his officers were reluctant to carry them out. Some were punished for sparing Dzungars and allowing them to flee, such as Agui and Hadada, while others who participated in the slaughter were rewarded like Tangkelu and Zhaohui (Jaohui).

If Tangkelu is a Manchu name, it violates vowel harmony. I would expect Tangkalu or Tengkelu.

4. I wish I could look for Tangkelu in Giovanni Stary's A Dictionary of Manchu Names (2000). The book's National Library of Australia listing says it's in "Mandingo" (sic). No.

5. In actual Mandingo, "/g/ and /p/ are found in French loans." The language has /k c j t d b/, though. Are /h/ and /p/ in part or in whole from earlier *g and *p?

6. The IPA transcription of the Kazakhstani national anthem is so different from what one might think Kazakh sounds like solely on the basis of the Cyrillic or Latin alphabet: e.g.,

[jɪrlɪkˈtɪŋ dɑstɑˈnə]

Ерліктің дастаны

<Erliktiņ dastany>

Erlik-tiń dastan-y

'courage-GEN epic-3.POSS.NOM' = 'epic of courage'

One might expect the pronunciation to be something like [erliktiŋ dastanɨ] on the basis of Cyrillic and Latin alone. And if one guessed that Cyrillic і was [i], what would one guess и is? (It's [ɪj] ~ [əj] according to this chart.)

The use of ы/y for [ə] reminds me of my own choice to use y for the Tangut neutral vowel which may have been [ə] or [ə]-like in one or more grades.

The 3rd person singular possessive suffix -ы/y is missing from this table. See Mukhamedova (2016: 81) on the Kazakh X-GEN Y-POSS 'Y of X' construction.

7. Why does Glosbe align Kazakh дастан 'epic' with Dennis in translations?

8. Until now I assumed that Turkic beg was a loanword from the Middle Chinese title 伯 *pæk. That is the etymology in Clauson (1972: 322). But Wiktionary has a second etymology:

the Middle Persian title bag (also baγ or βaγ, Old Iranian baga; cf. Sanskrit भग / bhaga) meaning "lord" and "master". Peter Golden derives the word via Sogdian bġy from the same Iranian root. All Middle Iranian languages retain forms derived from baga- in the sense "god": Middle Persian bay (plur. bayān, baʾān), Parthian baγ, Bactrian bago, Sogdian βγ-, and were used as honorific titles of kings and other men of high rank in the meaning of "lord".

The problem I have with this etymology is: why was a in some Iranian language borrowed as Turkic e?

If /a/ in the Iranian source language was [æ], how can Slavic bog 'god' be a loan from Iranian? Was the Slavic word borrowed from a different Iranian source language in which /a/ was back and labial: [ɒ] or [ɔ]?

As for the Chinese etymology, the mismatch of initials (Chinese *p- vs. Turkic b-) is not a problem if the borrowing was in an early Turkic variety without p-. (Pre-Proto-Turkic *p- became Proto-Turkic *h- which was preserved in Khaladj and was lost elsewhere.)

The -g of beg might be a Turkic approximation of a Chinese (allophonic?) [ɣ]-like pronunciation of *-k. Although Old Turkic did have gh, gh could not coexist with e, but g could. And at some point, Middle Chinese *æ raised to *ɛ. Late Middle Chinese *pɛɣ was transcribed in the Tibetan version of the 千字文 Thousand Character Classic (c. 9th-10th c.?) as <peg.> which is close to Turkic beg. (However, the Turkic word is first attested in the 8th century, possibly when 伯 was closer to *pæk than *pɛɣ in western Middle Chinese.)

9. If I understand this correctly, Haddow is a Germanic/Celtic (Scots + Scots Gaelic) hybrid. Are there more common names like it?

10. Aacistak has been called "the Language Capital of the World". What is its more common name?

YELLOW PIG 11/25

<so nggiyan uliya aniya juwa emu biya orin shunja inenggi>

'yellow pig year, ten one month, twenty five day'

1. orin shunja 'twenty five' is a para-Mongolian (Khitan?)-Jurchen hybrid. Compare with Written Mongolian qorin tabun 'twenty five' containing an unrelated Mongolian word for 'five'.

The initial of 'five' in Manchu is s-, not sh-. Neither Jurchen sh- nor Manchu s- matches the t- in the rest of Tungusic.

2. Last night I thought of a Chinese character for the first time in many years: 閼. It has the same phonetic as a character that I first encountered last week: 菸.

That phonetic is a drawing of a crow: 於/烏. 烏 still represents the word for crow, but its variant 於 has come to represent a nearly homophonous locative preposition.

Normally 於/烏-graphs represent open syllables in modern languages: e.g.,

So in Cantonese, I would expect 閼 and 菸 to end either in -u [u] or -yu [y]. But they don't:

The vowels are less of an issue (see the appendix) than the codas:

In other words, 於/烏 should represent *-a(ʔ)(s) syllables but not *-t syllables or *-n syllables. Should. But clearly 於 is a phonetic in

I have not found any evidence for 菸 being read with -n before the last millennium. At some point 菸 came to represent a word 'tobacco' < 煙/烟 Old Chinese *CAʔin 'smoke' normally written with -n phonetics (垔 and 因). The top component of 菸 'tobacco' is <GRASS> which makes sense. But the bottom component 於 is a poor phonetic (and 於 is unlikely to be an abbreviation of the uncommon character 閼 which also has non-n readings). Was 菸 'smelly grass' chosen to write an unrelated and phonetically different but semantically similar word 'tobacco'?

I found 菸 via Wiktionary's entry on yen. I forgot that yen could also refer to having a desire for something.

12.22.19:22: APPENDIX: Some *-a rhymes from Old Chinese to Cantonese:

*Voiceless initials condition Cantonese tone 1 unless there are other conditioning factors:

At some point after tonogenesis, *ʔ- was lost, and zero initials became homorganic glides before high vowels:

Contrast with *ʔa > nonhigh [a] without a glide in Cantonese 閼 aat3 [aːt˧].

YELLOW PIG 11/24

<so nggiyan uliya aniya juwa emu biya orin duin inenggi>

'yellow pig year, ten one month, twenty four day'

1. orin duin 'twenty four' is a para-Mongolian (Khitan?)-Jurchen hybrid. Compare with Written Mongolian qorin dörben 'twenty four' containing an unrelated Mongolian word for 'four'. -ben is the 'feminine'¹ vowel variant of the -ban found in ghurban 'three', and both ghurban and dörben have a shared suffix -r- (Janhunen 2003: 47).

Rozycki (1983: 7, 93) regards Jurchen/Manchu duin and Written Mongolian dörben as a "[p]re-loan correspondence": "words with a phonology consistent with native Tungus stock and for which there is no evidence of loaning". I regard the vague similarity of duin and Proto-Mongolic *dö- 'four' (as reconstructed by Janhunen 2003: 47) as coincidental.

¹I use the term 'feminine' to avoid committing to a front or higher vowel interpretation of e.

2. Yesterday I forgot how to pronounce 6ix9ine which looks like it was written in the Arabic chat alphabet (in which 6 is ط <ṭ> and 9 is ص <ṣ> or ق <q>). But it's actually a stylized spelling of six nine mixing logograms with letters. The Jurchen (large) script, Korean hyangchal, and Japanese script frequently have logogram-phonogram sequences for words. Perhaps the Khitan large script did too, but it's too poorly understood for me to be certain.

How did Tekashi 6ix9ine come up with the stage name Tekashi? Is it based on Japanese Takashi?

3. I knew Ў wasn't unique to Belarusian (in which it represents /w/), but I forgot which other language was written with Ў: Uzbek. Ў has since been replaced with Oʻ. Ў/Oʻ represents mid /o/, whereas О/O represents low /ɒ/ and /o/ in Russian loans. Did Uzbeks perceive Russian /o/ [o] ~ [ɔ]² as being lower than their /o/ and closer to their /ɒ/? Does native /o/ have a high allophone [ʊ]? That would explain why it was written as Ў: i.e., as У <U> with a breve rather than as О <O> plus a diacritic.

²For some reason, Wikipedia IPA has [ɛ] for Russian /e/ and [o] (not [ɔ]) for Russian /o/ even though this diagram shows the two vowels at almost identical heights with [o] lower than [ɛ] rather than the other way around.

4. Cyrillic Ӯ (Ұ after 1957; see here for other uses of Ӯ) for Kazakh /ʊ/ reminds me of Möllendorff's Ū for Manchu /ʊ/.

The 'feminine' counterpart of Manchu /ʊ/ is /u/, but Kazakh has no /u/. It has an interesting three-way categorization of vowels: -RTR, 0RTR (neutral), and +RTR. The [-RTR] and [0RTR] counterparts of [+RTR] /ʊ/ are /ɪ/ and /ʉ/. (Kazakh has no /i/ either. If the IPA symbols are taken at face value, apparently the only high vowel is central /ʉ/; /ɪ/ and /ʊ/ are slightly lower.)

Is Kazakh /œ/ backed if not central? It is a [0RTR] vowel like /ʉ əj ə/ despite being written with a front vowel symbol like the [+RTR] vowels /ɪ jɪ e æ/.

5. I wish I had a key to the 1964-1984 Kazakh Latin alphabet used in China (and in this 1977 edition of Mao's Selected Works).

6. Last night I found Handel (2006) while trying to find where I had first encountered the idea that Korean 바람 param < Middle Korean pʌ̀rʌ̀m 'wind' was a borrowing from Old Chinese. I thought I had read it in Pulleyblank (1962), but I couldn't find it there. This 2013 post reminded me I got it from William Boltz. My apologies to Professor Boltz.

Handel discusses 'wind' on page 1015. In footnote 8, he mentions an internal etymology relating Korean 'wind' to pul- < Middle Korean pǔr- 'to blow'. Although the semantic match is perfect, the phonetic match leaves much to be desired. First, I know of no other cases of a CʌC-noun from a CuC-verb. Second, Middle Korean pǔr- is a class 5 stem in Ramsey's (1986) typology; it is a disyllabic stem /pùúr/, and if I understand Ramsey (1978: 221) correctly, it goes back to *pùrɯ́- with high series vowels and a high-low pitch pattern unlike the low-pitched low series vowels of pʌ̀rʌ̀m.

7. This part of the Wikipedia article on the Common Turkic Alphabet puzzles me:

Some handwritten letters have variant forms. For example: Čč=Jj, Ķķ=, and Ḩḩ=.

But Lithuanian Karaim, the only Turkic Latin alphabet that I know of with Č, distinguishes Č (for [tʃ]) from J (for [j]). And I find it hard to believe that two letters with such different shapes could be variants only in Turkic usage.

Of course in general Latin letter usage there are some surprising variants. Would an alien guess that B and b are the same letter? Uzbek used to have в instead of b in the 1928-40 Yaꞑalif alphabet. (I am not italicizing в since I'm not sure if the old Uzbek italic в looked like Russian italic в.)

Turns out that "[t]he small letter B is ʙ (to prevent confusion with Ь ь)". Although Ь represented palatalization in Russian, in Yaꞑalif, it seems to have stood for Soviet Turkic vowels similar to Turkish ı: e.g., Tatar [ɤ]. Uzbek had no such vowel:

[æ] [ɒ]

Nonetheless I guess ʙ remained the lowercase version of B in Uzbek for consistency with the other variants of Yaꞑalif. You can see Uzbek ʙ here.

8. I've never looked at Karakalpak before today. I confess I forgot it even existed.

It has a nearly symmetrical vowel system with palatal vowel harmony. Only e has no nonpalatal counterpart.


It also has labial harmony. If the first vowel is nonlabial, then the second vowel cannot be labial. However, if the first vowel is labial, then the second vowel may or may not be labial. In any case, vowels must match in palatality.
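The two constraints just described can be stated as a check on a pair of vowels. This is a minimal Python sketch, not anything taken from a Karakalpak grammar: the IPA inventory below is my assumption (the vowel table did not come through here), chosen only so that e is the one vowel without a nonpalatal partner, and the function name harmonic_pair is hypothetical.

```python
# Assumed Karakalpak vowel inventory in IPA (hypothetical; the original table is missing).
# Every vowel except e has a back (nonpalatal) partner: ɯ/i, a/æ, u/y, o/ø.
FRONT = {"i", "e", "æ", "y", "ø"}
BACK = {"ɯ", "a", "u", "o"}
LABIAL = {"y", "ø", "u", "o"}


def harmonic_pair(v1: str, v2: str) -> bool:
    """Check a first vowel v1 followed by a second vowel v2 against both harmony rules."""
    for v in (v1, v2):
        if v not in FRONT | BACK:
            raise ValueError(f"unknown vowel: {v!r}")
    # Palatal harmony: both vowels must be front, or both must be back.
    if (v1 in FRONT) != (v2 in FRONT):
        return False
    # Labial harmony: a nonlabial first vowel cannot be followed by a labial vowel.
    if v1 not in LABIAL and v2 in LABIAL:
        return False
    # A labial first vowel allows either a labial or a nonlabial second vowel.
    return True
```

So harmonic_pair("o", "a") passes while harmonic_pair("a", "o") fails, which is the asymmetry of labial harmony described above; extending the check to longer words vowel by vowel would be a further assumption.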

How was Karakalpak /h/ written in Cyrillic? I can't find a Cyrillic letter for it.

9. Wikipedia says that

The [irregular] /otoosan/ form [for Japanese 'father'] first appears in the early Meiji period in educational materials mandated by the 文部省 (Monbushō, "Ministry of Education").

Did /otoosan/ replace earlier /otossan/ by analogy with the long vowel of /okaasan/ 'mother'?

/okaasan/ is itself irregular; it is from /okakasan/ with irregular intervocalic /k/-loss.

Wikipedia lists Taiwanese borrowings of both words: 多桑 <MANY MULBERRY> tò-sàng and 卡桑 <kha MULBERRY> khà-sàng. Both reflect shorter Japanese forms without the honorific prefix o-.

19.12.18.xx:xx:

YELLOW PIG 11/23

<so nggiyan uliya aniya juwa emu biya orin ilan inenggi>

'yellow pig year, ten one month, twenty three day'

1. orin ilan 'twenty three' is a para-Mongolian (Khitan?)-Jurchen hybrid. Compare with Written Mongolian qorin ghurban 'twenty three' containing an unrelated Mongolian word for 'three'.

2. Yesterday I learned that Eom Ik-sang still believes a number of Korean words conventionally regarded as native are actually borrowings from Old Chinese. Even if I assume the Old Chinese forms he cites are correct, there are still issues.

Perhaps the most convincing of his proposals is

Old Chinese 風 *pljəm (Li), *plums (Zhengzhang) 'wind' : Korean 바람 param 'id.'

I would prefer to cite Middle Korean pʌ̀rʌ̀m 'wind' which is even closer to the Old Chinese reconstructions that he cites.

Although I expressed some doubts about a liquid in the Old Chinese word for 'wind' in 2013, I would favor reconstructing that word as *prəm with *-r- now.

That aside, there is one other potential problem with the comparison: I don't think anyone's Old Chinese reconstruction for 'wind' ever had the vowel *ʌ. If the Old Chinese word for 'wind' had *ə, why was it borrowed into early Korean as something like pʌ̀rʌ̀m when Korean also had the vowel ə? In other words, why isn't the Korean word for 'wind' pərəm with ə?

12.19.22:33: Was Edkins (1890: 95) the first to derive Korean param from Old Chinese 風?

param, wind; from [an unspecified - presumably Chinese -] pam. The old Chinese for wind is bam, which has changed to [Mandarin] feng.

Edkins was writing decades before Karlgren reconstructed Old Chinese. I know almost nothing about pre-Karlgren Chinese reconstructions, so I wonder what the reasoning behind pam and bam are. *pam is not a bad guess, since even in the 19th century, it was known that f- was from *p- and that 'wind' rhymed with 南 'south' (Mandarin nán and Cantonese naam4). However, *b- is a surprise, as 'wind' does not have a tone pointing to an earlier *voiced initial.

3. I've never seen anything like this use of the reflexive in Romagnol:

mè a sò 'I am' (cf. Italian [io] sono 'id.')

The reflexive seems less exotic in this case:

mè a j'ò 'I have' (cf. Italian [io] ho 'id.')

And the English and Italian translations of this last instance also have a reflexive:

mè a'm so lavê 'I washed myself' (cf. Italian [io] mi sono lavato 'id.')

4. Wikipedia:

Romagnol has an inventory of up to 20 contrastive vowels in stressed position, in comparison to Italian's 7.

Unfortunately Wikipedia doesn't list all 20 vowel phonemes. How did the 10 native vowels of Latin become 20 in Romagnol? Are some of the Romagnol vowels from Latin diphthongs?

The most interesting Romagnol vowels are these diphthongs which are unlike anything in Latin:

I assume they are phonemes, though Wikipedia represents them with phonetic brackets. /Və̯/ : /Vɐ̯/ is a fine contrast I've never seen before.

5. How did Neapolitan develop this alternation?

Did an earlier *o break to [wo] before the masculine ending *-o merged with the feminine ending *-a?

6. While I'm in languages of Italy mode, it just occurred to me that the gorgia toscana is a bit like Jurchen/Manchu in which *p > f (albeit in all environments, not just intervocalically) and *-k- > -h- (see Vovin 1997 for details).

7. I saw a commercial for the IUDs Mirena [məɹiːnə] and Kyleena [kʰajliːnə]. Those names sound like 'creative' Anglospheric girls' names. The commercial was aimed at young women. Somebody wanted the audience to think of IUDs as if they were daughters. The children that the IUDs are supposed to prevent. Creepy marketing.

YELLOW PIG 11/22

<so nggiyan uliya aniya juwa emu biya orin juwe inenggi>

'yellow pig year, ten one month, twenty two day'

1. orin juwe 'twenty two' is a para-Mongolian (Khitan?)-Jurchen hybrid. Compare with Written Mongolian qorin qoyar 'twenty two' containing an unrelated Mongolian word for 'two'. Jurchen juwe 'two' is not to be confused with Jurchen juwa 'ten'.

2. Last night when trying to figure out the Chinese character spellings for damofo and yumofo, I typed <fo> into Windows 10's Pinyin IME and was surprised to see 仸 <PERSON.夭>. 夭 ǎo/yāo/yǎo is normally not phonetic in b/p/f-graphs:

I would have guessed that 仸 was read as something like yao. Then I learned that 仸 is a variant of 佛 'Buddha'. 仸 seems to be a semantic compound with 天 <HEAVEN> slightly altered to 夭. (天 and 夭 are difficult to distinguish in a sans serif font, but in handwriting, the top stroke of 天 is written from left to right, whereas the top stroke of 夭 is written from right to left.)

3. Two elephantine surprises last night: Wiktionary notes a subtle difference between 象 <ELEPHANT> in the PRC standard and nom on the one hand and elsewhere in the Sinosphere on the other. Both versions of 象 have the same codepoint.

I am not sure that the PRC and nom really have a distinct version of 象:

4. 象 was also formerly a simplification of 像. The Wiktionary entry for 象 says it was a 1964-1986 simplification of 像. Wikipedia mentions two other characters restored in 1986: 覆 and 叠. I am skeptical:

5. When trying to type 复 in Microsoft's Bopomofo IME, I found 䲁 <FISH.wèi> wèi 'a snake-like fish' as the 64th and last choice for fù. How did 䲁 get in the list? Graphic confusion with 鮒 <FISH.fù> 'a kind of fish' which is also in the list?

6. Unidentifiable Khitan small script characters I encountered while copying the 契丹小字研究 Qidan xiaozi yanjiu (Research on the Khitan Small Script) hand copy of the epitaph for Emperor 興宗 Xingzong (1015-1054) of the Khitan Empire:

⿱⺌月 (but with a dot instead of two horizontal lines in 月; 2.21.1)

a lookalike of Chinese 七 <SEVEN> (2.24.1)

I assume they must be in the book's indices under more conventional forms - but what are those forms?

Ah, the first was a variant of 298 <co> with a narrower bottom half and a curved lower stroke:

The very block with 298 from Xingzong was even discussed in Kane (2009: 71). Duh.

The Qidan xiaozi yanjiu hand copy also has some slight variations of characters I do recognize: e.g.,

243 <HEAVEN> and 240 <TEN>

are written with 𠂉 on top instead of ハ. As a result, 243 <HEAVEN> looks like 矢 204 whose phonetic value is unknown. Could 矢 204 be interpreted as 'heaven'?

I still have no idea what 七 is. Not only is it an unusual (for Khitan) shape, but it is also the only top element in a pyramid.

7. The Cantonese-only character 乸 <jaa2.MOTHER> for naa2 'female' has an unusual phonetic 也 jaa5. The rhyme is perfect; the initial is not. 乸 has puzzled me since I first saw it some time ago, but today I just realized that a j-phonetic 也 might have been chosen because there are phonetics representing both j- and n-syllables: e.g., 襄 soeng1 (with s-!) < *sInaŋ in

That j- ~ n- alternation goes back to a single Old Chinese *n- that developed two reflexes: *n- before nonhigh vowels and palatal *ɲ- before high vowels.

也 had Old Chinese *l-, another source of Cantonese j-. *l-characters normally aren't phonetics in Cantonese n-characters.

Cantonese speakers would not know which j- are from *n- and which j- are from *l-, so whoever came up with 乸 might have thought, 'if 襄 can stand for j- and n-syllables, 也 can too', unaware that 也 jaa5 isn't from *n- (and hence shouldn't represent Cantonese n-syllables).

8. I missed Andrew West's tweet on a cursive Tangut tablet from the Baisigou pagoda.

9. Marijn van Putten on the mystery of Mehmet.

YELLOW PIG 11/21

<so nggiyan uliya aniya juwa emu biya orin emu inenggi>

'yellow pig year, ten one month, twenty one day'

1. orin emu 'twenty one' is a para-Mongolian (Khitan?)-Jurchen hybrid. Compare with Written Mongolian qorin nigen 'twenty one' containing an unrelated Mongolian word for 'one'.

2. I wish I could look more into exceptions to 'Altaic' vowel harmony. Two examples that have long stuck in my mind:

More recently I came across Manchu age 'older brother' (not ege or aga!; see Hauer and Corff [2007]: 7). Rozycki (1983: 22) regards age as somehow related to Written Mongolian aq-a¹ 'id.': "The correspondence is ancient and direction of loan impossible to ascertain." Could this be an anne-like case of intimate deformation?

I couldn't find age or other similar Manchu words like ahūn 'older brother' in Doerfer's Mongolo-Tungusica (1985), so I suppose Doerfer does not think there is any connection between the Manchu and Mongolian words.

What finally pushed me to write about Manchu age was seeing Manchu ajige 'small, little, young' (not ejige or ajiga) on Saturday night. Its root is aji-, also found in ajida 'small' and ajigan 'young, small' which are harmonic. majige 'little' is similarly nonharmonic with similar semantics. Are these cases of cute deformation? Imitating the speech of small children who have not yet mastered vowel harmony? I can't quickly find any article on L1 Turkish vowel harmony acquisition (DuckDuckGo results are often unsatisfying), but Leiwo, Kulju, and Aoyama (2006?) cover Finnish vowel harmony:

The data showed that most of Finnish 2;6-year-olds’ productions do not violate FVH [Finnish vowel harmony], suggesting early mastery of FVH. When there were errors in children's productions, they were mostly substitutions of back vowels for the front rounded vowels.

... which is the opposite of the substitution that occurred in Turkish anne! (Or centuries ago in barmis.)

Unlike Finnish or Turkish, Manchu does not have palatal harmony. Manchu age, etc. have a high series vowel e [ə] in place of its low series counterpart a. But if I 'translate' the Finnish error pattern into Manchu, I would expect substitutions of low series vowels for high series vowels. Which is the opposite of what happened in age, etc.

There is, however, a common denominator: Finnish vowel harmony errors occurred "especially in non-initial syllables and in suffixes" (Leiwo, Kulju, and Aoyama 2006: 151), and the Turkish and Manchu violations above are also in noninitial position: -mis, anne, age.
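To make the Manchu side concrete, here is a minimal Python sketch that flags romanized words mixing the two harmony series. It assumes the usual grouping of Manchu vowels (a, o, ū pattern together; e on its own; i and u are neutral), which is my assumption rather than anything argued above, and the function names are hypothetical. It only looks at vowel letters and ignores syllable position.

```python
import unicodedata

# Assumed grouping (hypothetical sketch): a, o, ū form the low series,
# e is the high series, and i, u are neutral and combine with either.
LOW_SERIES = {"a", "o", "ū"}
HIGH_SERIES = {"e"}


def harmony_series(word: str) -> set:
    """Return which non-neutral series ('low', 'high') occur in a romanized word."""
    word = unicodedata.normalize("NFC", word)  # treat ū as one precomposed letter
    found = set()
    for ch in word:
        if ch in LOW_SERIES:
            found.add("low")
        elif ch in HIGH_SERIES:
            found.add("high")
    return found


def violates_harmony(word: str) -> bool:
    """True if the word mixes low-series and high-series vowels."""
    return harmony_series(word) == {"low", "high"}


for w in ["age", "ajige", "majige", "ajida", "ajigan", "ahūn"]:
    print(w, violates_harmony(w))
```

Under these assumptions, age, ajige, and majige come out as violations, while ajida, ajigan, and ahūn do not.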

Incidentally, Aoyama Katsura is a former classmate of mine.

¹The hyphen is a device to transliterate the obligatory space in the Written Mongolian spelling <aq a>; it has no morphological or phonological significance.

3. Looking at Tangut


4440 2len4 'pavilion' (#189 in The Golden Guide)

led me to wonder: Why did Middle English pavilloun become modern English pavilion? Was -i- restored by someone who knew its Latin source pāpiliō 'butterfly'?

4. Today I started copying the epitaph for Emperor 興宗 Xingzong (1015-1054) of the Khitan Empire. I haven't gotten to line 4 yet, but I looked ahead and spotted block 24

<096.339.140> <?.i.en>

of line 17.

The only other instances of 096 that I know of are in the block

<096.339> <?.i>

in the epitaphs for Mme. 耶律 Yelü (11.20) and 蕭敵魯 Xiao Dilu (1061-1114; 30.19 and 34.14).


096 is similar in shape to 095, a lookalike of Chinese 女 <WOMAN>. 095 is more common than 096 and can occur in medial and final positions in blocks. These different distributional patterns suggest that 096 represents a more complex phonetic sequence than 095 - one that so far is only known from the beginnings of words. On the other hand, might whatever 095 represents be more complex than, say, 339, which is simply [i]?

Both 095 and 096 probably represent one or more syllables absent from Liao Chinese, as neither appears in Khitan transcriptions of Chinese. They may contain

I doubt that 095 or 096 represent single segments. I suspect that all the single-segment phonograms of the Khitan small script have been found by now.

As far as I know, as of 2016 there were 482 known small script characters including variants. Have any new ones been found lately? The only new small script texts found lately to the best of my knowledge are fragments of jade tablets from a mausoleum. If this photograph is representative, the texts are too short to be likely to contain any character that hasn't surfaced in any previously known, much longer texts.

5. Today I finally got Jun Jiang's Learn Manchu Handwriting on my iPhone. As neat as it is to see a finger trace strokes on a screen, I wish I could double-check the direction and order of strokes with another source. And I'm not yet accustomed to the wheel interface.

6. Today I also got Jun Jiang's Mongolian Words & Writing app, but I haven't tried it out yet. Users hoping to learn Mongolian Cyrillic will be disappointed since the app only covers the traditional script. I'd like to know how to write Ө <Ö> and Ү <Ü> in cursive. (The rest of the alphabet is identical to Russian, and I've been writing Russian in cursive since 1997.)

7. Jun Jiang's store doesn't have any app for Mongolian Cyrillic, but it does have these apps:

I assume those apps have the same interface as the Manchu app.

So much for my original guess that Jun Jiang might be a Manchu and Mongol specialist.

8. Wikipedia's sample of the traditional Mongolian script is (turn 90 degrees clockwise for the proper orientation - alas, that way the first line is on the right instead of the left where it should be):

ᠴᠣᠷᠢ ᠢᠢᠨ ᠭᠠᠭᠴᠠ

cori yin ghaghca

'single GEN single': i.e., 'the one and only'

ᠪᠣᠰᠤᠭ᠎ᠠ ᠪᠢᠴᠢᠭ᠌᠄

bosugh-a bicig:

'vertical script:'

ᠮᠣᠩᠭᠣᠯ ᠪᠢᠴᠢᠭ᠌

mongghol bicig

'Mongol script'

I don't know what is meant by 'one and only' since there are other vertical scripts, and even if one is only thinking of major vertical scripts written from left to right, the Mongolian script is not unique since the Manchu script is written the same way.

ghaghca has a synonym ghanca. How can that word-medial -gh- ~ -n- alternation be explained - assuming they are related words?

9. Today while double-checking the Li Fanwen number for the common Tangut character


4457 2leq3 'great'

I found these interesting characters which appear to be semantic compounds:


4445 2bi1 = 4457 2leq3 'great' + 2547 1chir2 'right'


4454 2ryr1 = 4457 2leq3 'great' + 2920 1zhyq3 'left'

2920 has the Tangraphic Sea analysis


2920 1zhyq3 'left' = all of 3485 1laq 'hand' + right of 4454 2ryr1

which cannot be taken at face value as the origin of the character - why would a character for a common word 'left' be based on a rare character 4454?

4445 and 4454 are only known as members of these compounds:


4445 0661 2bi1 2ngon4  'South Sea'


4454 0661 2ryr1 2ngon4 'North Sea'

4445 and 4454 are not the normal words for 'south' and 'north' which are


4796 1zyr4 'south' and 0942 1laq3 'north'

Although the Tangut script is thought to be full of semantic compounds, it is curious that 4445 and 4454 - glossed by Li Fanwen (2008: 706-707) as 'south' and 'north' - do not contain any components in common with 4796 and 0942, the graphs for the common words 'south' and 'north'.

Nonetheless Li's glosses make sense: 4445 has the notation


4796 0661 1zyr4 2ngon4 'southern sea'

in Homophones D and is a definition for 4796 'south' in Tangraphic Sea 89.251. And if 4454 contains 'left', the opposite of the 'right' in 4445, then 4454 must be 'north', the opposite of 4445 = 4796 'south'. But I am hesitant to gloss 4445 and 4454 simply as 'south' and 'north'. Maybe 'Great South' and 'Great North', or even 'Great Right' and 'Great Left'?

The association of 'south' with 'right' is reminiscent of Sanskrit dakṣiṇa- 'south/right'. Sanskrit uttara- 'north' can also mean 'left', but the normal word for left is vāma- which does not mean 'south'.

What were the Great South/Right and Great North/Left Seas? Were they mythical? I don't know much about how the landlocked Tangut perceived their world. How many Tangut had ever seen a sea? What is the etymology of 2ngon4 'sea'?

10. Today I saw this passage in Gorelova ( :15; I added the links):

The Mohes [靺鞨] called their tribal leader "damofo mandu" (chin. da [大] "great"), as one can see further, the Southern Shiwei [室韋], who can be identified as people of Tungusic descent, called their tribal chieftains "yumofo mandu".


The language spoken by the Mohe was Tungus-Manchu. What is important to mention is that the language of the Sushen could also be referred to as proto-Tungusic.

During the Tang era, the Mohe, similar to other peoples of northeastern Asia, were subjected to constant political and military pressure from Tang rulers. Soon after the Koguryo state of Korea had been defeated by the Tang empire (668 AD), a large portion of the Koguryo people fled into the lands of the Sumo Mohe [粟末靺鞨]. Soon a lot of towns, surrounded by defensive walls, arose there. Around 700, a new state, "Parhae" (chin. Bohai), raised from the ruins of Koguryo, was established. It was the leader of Sumo Mohe, Cicik Zhungxiang [乞乞仲象] who was considered the creator of Bohai. [...] Later, his grandson, Uazhi Da Tuyu, declared himself the emperor of Bohai, which in the course of time became highly cultured and enlightened, and widely known beyond the borders of the country. The Parhae (Bohai) state—a deserving successor of the culture and power of Koguryo and the tribal league of the Songari Mohe—flourished for 228 years until it was destroyed by the Qitans [Khitans] (926 AD) (Shavkunov, 1968; Crossley, 1997:18; Larichev, 1998:53-4).

What are the characters for damofo mandu and yumofo mandu which sound like modern Mandarin readings of old Chinese transcriptions?

I was surprised to see the Southern Shiwei described as Tungusic since their name - roughly pronounced *shirwi in Late Middle Chinese - is derived from the para-Mongolic autonym Serbi. But of course names are not reliable guides to linguistic affiliation.

Cicik Zhungxiang is a strange, not-quite-Pinyin romanization of 乞 乞仲象 Qǐqǐ Zhòngxiàng with a -k whose motivation is obscure. Assuming the Chinese pronunciation favored in Parhae was like early Sino-Korean, 乞 乞仲象 was pronounced something like *kər kər tyung syang. 乞 乞 <BEG BEG> looks like an insulting ('derographic') transcription of a non-Chinese (i.e., Mohe) name. 乞 乞仲象 is also known as 大 仲象 with a Chinese-style surname 大 <GREAT> to go along with the Chinese-style disyllabic personal name 仲 象 <SECOND.BORN ELEPHANT>.

Uazhi Da Tuyu is presumably 乞 乞仲象's son (not grandson) 大祚榮 (Mandarin: Dà Zuòróng, Korean: Tae Cho-yŏng; r. 712-719), the first king (not emperor) of Parhae. I have no idea what Uazhi is.

11. The best for last: I just discovered Andrew West's Tangraphic Sea search tool! More Tangut resources here.

Tangut Yinchuan font copyright © Prof. 景永时 Jing Yongshi
Tangut character image fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2018 Amritavision