It's still the year of the pig in traditional East Asian calendars, but it's the year of the rat (2020) if one coordinates the Chinese animal cycle with the Gregorian calendar:

Last night I realized that Khitan small script character 216 might be a derivative of 118 <qu>:


Let's assume 216 was <qu*> with <*> indicating 'different from <qu> in some way'. Then

<216.151> 'rat'

would be read <qu*ghu> which is close to Written Mongol qulughana 'rat'.

What if <qu*> were <qul>? <qul.ghu> is close to qulughana, but I wouldn't expect Khitan u to correspond to Written Mongol a.

That's where I left off last night. Today I realized that <ghu> might be read <ugh> after a consonant. So maybe <216.151> was read <qul.ugh> which is even closer to Written Mongol qulughana and requires no vocalic gymnastics.

The low frequency of 216 (7 times in the 契丹小字研究 Qidan xiaozi yanjiu corpus and 0 times in initial position in Wu and Janhunen 2011 [whose index is organized by initial graphs]) suggests that it probably did not represent a simple CV syllable. If it didn't represent the CVC syllable qul, it may have represented a CVCV sequence qulu, and <qulu.ugh> was read qulugh.

The <qul(u)> hypothesis could be confirmed if 216 alternated with <qu.l>, <qu.ul> (= <>?), etc.

As far as I know, 216 appears only in initial position with one exception: this block

<119.216> <dau.?>

from line 3 of the second inscription in the 萬部華嚴經塔 Wanbu Avataṁsakasūtra Pagoda in Hohhot.

2. I still practice writing Tangut, Khitan, and Jurchen (TJK) every day. Recently I added Manchu to my regimen and today I started writing Mongolian (in the traditional script - I still don't know how to handwrite Ө and Ү in Cyrillic).

All my TJK exercises begin with the date. I'm still going to date these blog entries in Jurchen since it's the thousandth anniversary of the Jurchen large script or close to it (see Kiyose [1977: 22] for three possible dates: 1119, 1121, and 1123; Kane [1989: 3] gives the date 1120, though Kane [2009: 3] gives the date 1119). Today's date in Jurchen is:

songgiyan uliya aniya juwa juwe biya ice nadan inenggi

'yellow pig year, ten two month, new seven day'.

3. Last night I learned about prothesis in Bashkir:

The prothesis is mostly unsurprising, but these correspondences are:

1.2.11:00: I forgot to mention these cases of prothesis in native words:

Without more Bashkir data, I can't test my guesses for motivations: e.g., avoiding initial l- and making monosyllables disyllabic.

The Bashkir letter ҡ <q> surprised me since I'm accustomed to қ <q> from Kazakh, etc. Why do Bashkir and Siberian Tatar have their own special ҡ <q>? Siberian Tatars were educated in (Volga) Tatar which has к <k> for /k/ (including a [q] allophone) and  къ <k"> for /q/.

4. Today I learned about the Caucasian Albanian script used to write a (near?-)ancestor of the Udi language.

I've thought Old Chinese might have had pharyngealized vowels, so I'm interested in the phonetics of Udi's pharyngealized vowels.

5. What is the etymology of Persian شمشیر <šmšyr> shamshir, first (?) attested in Middle Persian as <šmšyl>? It doesn't look Indo-European. Is it an areal word?

6. Why does the Persian word/name فرشته <frsth> fereshte < firishta sometimes appear as Farishta(h), e.g., in this 1958 Bollywood film title (फरिश्ता Phariśtā; cf. Urdu فرشته Firishta) and this list of Pashto (not Persian, I know) names?

YELLOW PIG 12/6

songgiyan uliya aniya juwa juwe biya ice ninggu inenggi

'yellow pig year, ten two month, new six day'

1. Last night I looked up 'hip bone' and discovered it could also be called the innominate bone. Why 'nameless'?

2. Are 清樂 Shingaku 'Qing music' lyrics an overlooked source of data for premodern Mandarin reconstruction? In this sample from 月琴樂譜  Gekkin gakufu (Moon Guitar Sheet Music, 1877), 兒 (now ér [aɚ˧˥] in modern standard Mandarin) has the furigana ルウ <ruu>. That seems to indicate that the kana transcription is based on a dialect in which 兒 was pronounced like [ɻ̩]. (Other evidence rules out the most obvious interpretation [ruː]: e.g., no Mandarin dialect has [u] in 兒.)

The date of the text does not necessarily indicate that the [ɻ̩] pronunciation still existed in the source dialect as of 1878. The kana spelling ルウ <ruu> could have been copied from some earlier source.

ルウ <ruu> bears no resemblance to ジ <zi> [dʑi], the usual Japanese reading of 兒. Strictly speaking, the two Japanese borrowings are not from the same dialect in two different periods: <zi> is from a 7th century northwestern Chinese dialect, whereas <ruu> is from a Qing (perhaps 18th century?) Mandarin dialect. Nonetheless the latter probably underwent more or less the same changes as the former, so as a convenient fiction, here's how the sources of <zi> and <ruu> could be bridged:

Modern standard [aɚ] is from a stage 5-type form that developed a prothetic vowel:

*ɻ̩ > *əɻ > > [aɚ]

In some Mandarin varieties, only the prothetic vowel  has survived without any trace of retroflexion: e.g.,  壽縣 Shouxian [ə] and 鳳陽 Fengyang [a] for 兒.

It is tempting to derive Sino-Korean 아 a for 兒 from a Fengyang-like form, but that would be anachronistic. Fengyang [a] is probably a very recent development from *ar, whereas the earliest attested ancestor of 아 a is ᅀᆞ zʌ, borrowed from a form like stage 4 *ʐɻ̩. zʌ became ʌ in the 16th century, and ʌ then became a in the 18th century.

3. I don't understand how Korean z vanished without a trace. Lee and Ramsey (2011: 142) state that "early examples of the elision of z are all restricted to the environment _i, y, which suggests that the process of change started there." They give these examples:

In those particular cases, I can imagine /z/ being phonetically something like [ʑ] that lenited to [j] and then disappeared before /i/. But what were the intermediate stages between /z/ and zero in initial position before /ʌ/ as in 15th century /zʌ/ > 16th century /ʌ/?

I thought [ɦ] might be a possible intermediate stage by analogy with Sanskrit:

Proto-Indo-Iranian *ĵʱ > Sanskrit h [ɦ] but Avestan z

I assume there was a stage like *ʑʱ underlying both  the Sanskrit and Avestan reflexes. (No, see topic 4 below.) That stage would be like Middle Korean /z/. In some modern Indic languages, Sanskrit initial h- has disappeared in reflexes of hima- 'winter'. I don't know if that's a regular change.

4. I've been trying to work out the phonetics of Proto-Indo-Iranic¹ (PII) reflexes of Proto-Indo-European (PIE) velars.

4.1. The PIE starting point:


4.2. The first palatalization in PII



4.3. Affrication in PII (cf. the alveolar affricate reflexes of Sanskrit palatals in some modern Indic languages)


4.4. The merger of plain velars and labiovelars


4.5. The second palatalization in PII


*ɟʱ *gʱ

Velars palatalized in certain environments. Compare:

4.6. The merger of *e and *o into *a made the second palatalization phonemic:

It was no longer possible to regard *c as an allophone of /k/ before /e/, since /e/ no longer existed. (The e of later Indo-Iranic languages is not from the earlier *e that merged with *a: e.g., Sanskrit e is from PII *ai which could be from PIE *ei or *oi but not PIE *e.)

1.1.0:59: The following sections deal with post-PII developments.

4.7. Pre-Sanskrit (Proto-Indic²) stage 1


*ɟʱ *gʱ

The affricate series palatalized. I thought the absence of *ts-type affricates in Proto-Dravidian might have pressured a shift away from alveolar affricates, but the traces of Indic in the Near East - far from Dravidian - underwent stage 2 (4.8 below): e.g., the name Paršasatar from praśāstar- 'director' with ś < PII *ts-.

4.8. Pre-Sanskrit (Proto-Indic) stage 2



Voiceless *tɕ simplified to *ɕ.

The voiced affricates merged with the voiced palatals.

I don't know the order of those two changes, so I show the results of both changes in the same table instead of arbitrarily showing one change at a time in two tables.

4.9. Sanskrit (Proto-Indic)

ś [ɕ]
j [ɟ]
h [ɦ]
gh [gʱ]

*ɟʱ weakened to h [ɦ].

4.10. Proto-Iranic (continuing from 4.6)



The voiced aspirate series merged with the plain voiced series.

4.11. Avestan

j [ɟ] g

The affricates deaffricated. The change of *ts to s is roughly parallel to the change of *tɕ to ś in Sanskrit. But note that Proto-Iranic *dz became Avestan z, whereas pre-Sanskrit *dz did not become Sanskrit ź [ʑ], a sound that does not exist in Sanskrit.

The exact phonetics of c and j are unknown. They were palatal unlike s and z, so I have projected palatal stops forward into Avestan. But maybe Avestan c and j were actually affricates.

4.12. Summing up

2nd palatalization
*kʲ n/a
*gʲ n/a
j [ɟ]
*gʲʱ/*gʱ n/a
*dzʱ h [ɦ]
*k/*kʷ +
*g/*gʷ +
j [ɟ]

j [ɟ]
*gʱ/*gʷʱ +
*ɟʱ h [ɦ]
*k/*kʷ -
*g/*gʷ -
*g g
*gʱ/*gʷʱ -
*gʱ gh [gʱ]

¹1.1.0:40: I favor the term Iranic by analogy with Turkic, Mongolic, etc. to avoid confusion with the country of Iran.

²1.1.0:57: I prefer the term Indic to Indo-Aryan, as the word Aryan is shared by both Indic and Iranic. Ironically, the name Indic is actually Iranic, as it is an Hellenization of Old Persian 𐏃𐎡𐎯𐎢𐏁 <ha i du u sha> [hi(n)duš] 'India', cognate to Sanskrit Sindhus 'Sindhu'. The Old Persian form has two Iranic innovations:

It occurs to me tonight that an Indic name for Indic would be Sindhic, but that's not going to catch on. No one is going to rename the country Sindhia either. And Hindutva advocates are probably not going to change the name of their ideology to Sindhutva.

YELLOW PIG 12/5

songgiyan uliya aniya juwa juwe biya ice shunja inenggi

'yellow pig year, ten two month, new five day'

1. I checked Jan van Steenbergen's Interslavic page for updates and noticed a new item in the menu:

The Painted Bird (in Czech: Nabarvené Ptáče) a Czech-Slovak-Ukrainian film written, directed and produced by Václav Marhoul. It is based on Jerzy Kosiński’s novel The Painted Bird from 1965.


The action takes place in some unspecified East-European, Slavic-speaking country. A place that cannot directly be linked to a specific Slavic population requires a language that can instantly be recognised as Slavic but not be linked directly to any specific Slavic population either. That's why Marhoul decided to use Interslavic:

2. I just bought e-access to Vojtěch Merunka's Interslavic zonal constructed language: an introduction for English-speakers. Google says I can check a box to "Make [the book] available offline", but I can't find it.

On page 5, Merunka writes (12.31.14:03: links added),

Interslavic is also an interesting experiment of alternative history: If there was not such strong pressure from the Frankish Latin-oriented church (e.g. Wiching of Nitra and his band) against the Moravian Church in the 9th century, the invasion of the Hungarians into Central Europe and the subsquent collapse of contacts between Moravia (now a territory of both the Czech and Slovak Republics) and Bulgarian, Serbian and Kiev (later Russian) states, it is possible to imagine a hypothetic different evolution of the Slavic early Middle Age language - we have seen a similar phenomenon in the Arabic World: After the end of natural linguistic unity during the Middle Ages, the modernized universal Arabic language based on the religious language of the Qur'an still prevails. It is an artificial language which is close enough to the various contemporary spoken national dialects of Arabic that it is recognized as the standard for communication between Arabic nations and for contact with foreigners and used as an auxiliary language by both state apparatus and the media.

It would be fun to see historical fiction depicting a world where Interslavic - probably simply 'Slavic' - has the same position that modern standard Arabic has.

Page 143 presents a modified Arebica alphabet to write Interslavic.

3. 𗡠 0271 2mer4, representing the second syllable of 𗡢𗡠 0702 0271 1to'4 2mer4  'to seek, find', has a right side (Boxenhorn code: baedar) found nowhere else. I found it in Li (2008: 47) when looking up  𘅊 0273 1le1 for my last entry.

2mer4 sounds like Old and Middle Chinese *mek 'to seek'. If I were to force a relationship between the two, I could trace 2mer4 back to pre-Tangut *RImek-H with labial dissimilation:

*Pek > *Pew > *Pej > Pe

*RImek-H could be related to

𗑉 4684 1me1 < *CAmik or *mek 'eye'

cf. Tibetan mig (archaic dmyig) 'eye' (but Old Chinese has 目 *Cmuk - is *Cmikʷ possible?)

which is the word that made me discover labial dissimilation. Two scenarios:

But there are other possible pre-Tangut sources of 2mer4 that would rule out a connection with the Chinese word:

𗡢 0702 1to'4 'to seek' can appear by itself. That suggests that 𗡠 0271 2mer4 might be a formerly independent verb that only survives as the second half of a synonym compound 'seek-seek'.

4. Li (2008: 120) gives this example of 0702 as an independent verb from The Timely Pearl 292:


5098 0702 0760 1715

2ngon4 1to'4 2dzen4 1rar4

'case seek judge ?'

It corresponds to Chinese 案檢判憑 'case examine judge ?'

Nishida (1964: 215) has the translation 'to examine the case and hand down a judgment'. Nishida (1964: xii) says Burton Watson and a ヤンポルスキー (Yampolsky? - I don't know who this is, or what his preferred Anglicization of Ямпольский is) helped him with the English translations. Later, Nishida (1964: 216) has the translation 'deliver a judgment' for 判憑 in Timely Pearl 302.

I would think then that 𘅤 1715 1rar4 /憑 means 'to hand down' or 'to deliver'. But the basic meaning of 𘅤 1715 1rar4 is 'to write' (Li 2008: 285). So might the Tangut phrase in The Timely Pearl mean 'write a judgment'?

憑 can be translated many ways in Chinese, but none of those translations mean 'write' or 'hand down' or 'deliver'. Might it be 'proof': i.e., 'evidence'? If so, then there is only a vague parallel between the Tangut object-verb sequence 𗍷𘅤 'write a judgment' and the Chinese verb-object sequence 判憑 'judge evidence (?)', and mechanically equating 𘅤 with 憑 may be a mistake.

Then again, to say Burton Watson's knowledge of Chinese dwarfs mine would be an understatement, and maybe 判憑 is an idiom 'deliver/hand down a judgment' that I just failed to confirm in other sources.

I always assumed Watson had learned Japanese in the American military in WWII, but in fact he didn't know any Japanese when he arrived in Japan in 1945, and he was actually a Chinese major.

5. My DuckDuckGo search for Yampolsky led me to a video of minerva scientia pronouncing Tangut in Gong's (more or less) and Arakawa's reconstructions.

6. ElitekidMu0 comments on that video:

Fun fact: Thunder Force VI [Wikipedia], a shooting game released in 2008 by SEGA for the PS2, included the Tangut Language as the main language for the protagonist of the series, Galaxy Federation (Vastian). Another language included in the game is the Mongolian Script, used by the antagonist of the series, ORN Empire.

7. Last night I learned that Kara Ben Nemsi was meant to mean 'Carl son German' (though nemsi is really closer to نمساوي‎ namsāwiyy/nimsāwiyy 'Austrian'; 'German' is ألماني 'almāniyy).

Karl May has a way with foreign names. I couldn't have come up with something equivalent to Old Shatterhand or Old Surehand in German.

8. I just noticed that the Old English Wikipedia (Ƿikipǣdia) is

Sēo Frēo Ƿīsdōmbōc

'the free wisdombook' (Ƿ <W> wynn is a rune borrowed into the Old English alphabet)

Are Goidelic forms like Irish seo 'this' the only living reflexes of Proto-Indo-European *só retaining s-? Greek [o] has lost h- < *s-, and English the has a th- that spread from the th-reflexes of the *t-initial oblique forms of *só.

9. I finally got around to rewriting my lost entry for 12.26 from memory. I finished right after I ordered a used hardcover copy of William C. Hannas' The Writing on the Wall: How Asian Orthography Curbs Creativity (2013).

10. Tonight I discovered the variant 槑 for 梅 <PLUM>.

11. Baxter and Sagart (2014) reconstruct 梅 <PLUM> in Old Chinese as *C.mˤə. I suspect that *C was a voiceless consonant because Vietnamese mai 'apricot' has a ngang tone pointing to an earlier *m̥- which may be from an even earlier *C̥m- with a voiceless *C̥- that conditioned the devoicing of *m-. I would reconstruct the word in Early Old Chinese as *C̥Amə with a low first vowel that triggered the warping of *ə to *ʌə:

*C̥Amə > *C̥Amʌə > *C̥mʌə > *m̥ʌə > *mʌe > *mʌj > *mɑj > *mwɑj > *muj > *mwəj > *məj > standard Mandarin [mej]
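The chain above can be replayed mechanically as a rough check that each stage follows from the one before it. This is only a sketch: the (old, new) pairs below are read off adjacent stages of the chain and are not a claim about how the changes should be stated or conditioned, and the names `CHANGES` and `derive` are hypothetical.

```python
# Replay the 梅 chain as ordered string rewrites.
# Assumption: each (old, new) pair is simply the difference between two
# adjacent stages in the chain above; no conditioning environments are modeled.
RING = "\u0325"            # combining ring below (voicelessness)
VL_C = "C" + RING          # voiceless *C
VL_M = "m" + RING          # voiceless *m

CHANGES = [
    ("mə", "mʌə"),          # warping after the low first vowel
    (VL_C + "A", VL_C),     # loss of the first vowel
    (VL_C + "m", VL_M),     # cluster becomes voiceless nasal
    (VL_M + "ʌə", "mʌe"),   # two changes shown as one step in the chain
    ("ʌe", "ʌj"),
    ("ʌj", "ɑj"),
    ("mɑj", "mwɑj"),
    ("wɑj", "uj"),
    ("uj", "wəj"),
    ("wəj", "əj"),
    ("əj", "ej"),
]


def derive(form, changes):
    """Apply each change once, in order, and return every stage."""
    stages = [form]
    for old, new in changes:
        if old not in form:
            raise ValueError(f"{old!r} does not occur in {form!r}")
        form = form.replace(old, new, 1)
        stages.append(form)
    return stages


stages = derive(VL_C + "Amə", CHANGES)
print(" > ".join(stages))
```

Because `derive` raises an error whenever a change has nothing to apply to, a misordered or mistyped stage shows up immediately.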

It is possible that *C̥A- was simply completely lost after warping in (many? most? all?) dialects other than the one underlying Vietnamese *C̥m-. I have not yet found any Chinese varieties with a yinping tone pointing to *m̥-.

The *m̥- in the scenario above is of late origin. An earlier *m̥- in Old Chinese became *x- in stage 2 below, whereas newer *m̥- merged with *m-:

stage 1
stage 2
stage 3

*m̥- *x-
hǎi [xaj˧˩˧]

*C̥m- *m̥- *m-
méi [mej˧˥]

měi [mej˧˩˧]

The tones above depend on final glottals and initial voicing: a final glottal stop conditioned the falling-rising tone [˧˩˧], whereas stage 3 voiced *m- together with the absence of a final glottal conditioned the high rising tone [˧˥].

YELLOW PIG 12/4

songgiyan uliya aniya juwa juwe biya ice duin inenggi

'yellow pig year, ten two month, new four day'

1. Tonight it occurred to me that the Jurchen and Khitan large script characters for 'four' might be graphic cognates:

One might be rotated - but which one? And did the Parhae script have both rotated and nonrotated variants of <FOUR>?

12.30.0:17: Both <FOUR>s have four strokes, so they may simply be two types of tally marks formalized as characters.

In any case, the Khitan large script character is not to be confused with Chinese 卅 <THIRTY> which is a fusion of three 十 <TEN>s.

12.30.12:50: Chinese 卅 <THIRTY> in turn should not be confused with the Jurchen phonogram <sui>:


Jin (1984: 25, 26, 180) reports the first pair of forms in the 大金得勝陀頌碑 Great Jin Victory Hill stele (1185) and the second 卅-like  pair of forms in the Berlin and Tōyō bunko copies of the Ming dynasty Bureau of Translators vocabulary from c. 1500. Without examining the original texts, I cannot be certain about minor variations such as the presence or absence of a hook in the 1185 stele.

I fear that the Bureau of Translators' forms might be unintentionally 'sinified' in the sense that unfamiliar Jurchen characters were accidentally modified by scribes more familiar with sinography. Perhaps the resemblance of <sui> to Chinese 卅 <THIRTY> in the Bureau of Translators vocabulary might be an example of sinification.

12.30.15:33: Jin (1984: 58, 76) derives Jurchen <FOUR> from the phonogram <da> which in turn he derives from Chinese 屠:


In the Jin dynasty, 屠 was pronounced *tʰu. Why base a phonogram <da> on a Chinese character pronounced *tʰu?

I don't think <da> was a Jin dynasty invention. I think its roots go back further to a period when 屠 was pronounced as *da in Late Old Chinese. (屠 was once a transcription character for -ddha in 浮屠 *bu da = Buddha.) In other words, I think <da> is potential evidence for the Jurchen large script being an heir to an old tradition of phonetic writing rather than a 12th century invention.

I don't think there is any relationship between <FOUR> and <da> beyond graphic convergence - the bottom of <da> (known only from two inscriptions) may have been remodelled after the far more common character <FOUR>.

2. Tonight while copying character 236 of the Golden Guide, I miswrote the Tangut character element 𘡛 by placing the dot too low so it intersected the stroke below it.

Nishida (1966: 242) interpreted 𘡛 as a radical for things having to do with 愛惜 aiseki 'cherish'. It just occurred to me that 𘡛 might be derived from the top of 愛 <LOVE> or the top right of 惜 <CHERISH>.

But ... what is 𘡛 doing on the top of 𘓉 0993 1lhew1 'to herd', of all things? Is 𘓉 0993 a semantic compound like <CHERISH.LIVESTOCK>?

But ... the bottom of 𘓉 0993 (Boxenhorn code: baecie) is neither 'livestock' nor short for a character for any animal. The only other character with baecie is 𘅊 0273 1le1, a character for writing surnames.

3. I was surprised by this passage (emphasis mine):

Martin Kümmel similarly proposes, based on observations from diachronic typology, that the consonants traditionally reconstructed as voiced stops were really implosive consonants, and the consonants traditionally reconstructed as aspirated stops were originally plain voiced stops, agreeing with a proposal by Michael Weiss that typologically compares the development of the stop system of the Tày language (Cao Bằng Province, Vietnam).

But then I checked Pittayaporn (2009: 110) who explains that in Cao Bằng,

I can see something similar happening in Proto-Indo-European ... except for this problem:

The ejective hypothesis, on the other hand, correctly predicts that Proto-Indo-European labial *pʼ (corresponding to *ɓ- in the implosive hypothesis) would be rare or absent.

4. I wish there were animated GIFs like the Georgian ones at for Manchu and traditional Mongolian letters. I've been using Jun Jiang's Manchu app which has animated images for Manchu syllables and words, but it doesn't seem to match the verbal (nonvisual) instructions in Roth Li's Manchu textbook, so I'd like to see a second opinion.

5. I discovered that the Old English Wikipedia has a runic viewing option. Select ᚱᚢᚾ <run> under the article title.

12.30.0:16: Try the ȝƿ and ᵹƿ viewing options too.

6. Why is Gdańsk Gduńsk in Kashubian? Is Polish a : Kashubian u a regular correspondence in some environment(s)? I don't see anything like *a > u in Stone's (1993: 765) sketch of Kashubian vowel history.

7. Another Kashubian surprise: kùńszt [kwuɲʃt] (I think) 'art' < German Kunst. Why [wu]? How did Kashubian develop [wu] in native words? Is [ɲ] instead of [n] due to assimilation with [ʃ]? Was the word borrowed from a German dialect in which 'art' was [kunʃt] instead of [kʊnst]? 'Hyperlabial' [wu] for [ʊ] seems odd to me.

Aha, I see now that Kashubian /u/ becomes [wu] "[i]nitially or after a labial or a velar" (Stone 1993: 762). So [wu] has nothing to do with German.
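Stone's rule is simple enough to sketch in code. The following is a minimal illustration, not a full account of Kashubian phonology: the labial and velar sets are my own assumptions rather than Stone's inventory, the function name `surface_u` is hypothetical, and each character of the input is treated as one segment.

```python
# Sketch of "/u/ becomes [wu] initially or after a labial or a velar".
# Assumptions: these consonant sets are illustrative, not Stone's list,
# and every character of the input string is a single segment.
LABIALS = set("pbmfvw")
VELARS = set("kgx")


def surface_u(phonemes):
    """Return a surface form with /u/ realized as [wu] in the stated contexts."""
    out = []
    for i, seg in enumerate(phonemes):
        after_trigger = i > 0 and phonemes[i - 1] in LABIALS | VELARS
        if seg == "u" and (i == 0 or after_trigger):
            out.append("wu")
        else:
            out.append(seg)
    return "".join(out)


print(surface_u("kuɲʃt"))   # /u/ after velar k
print(surface_u("gduɲsk"))  # /u/ after d, which is neither labial nor velar
```

Under these assumptions the [wu] of kùńszt falls out of the rule, while the u of Gduńsk (after d) is left alone, so the rule says nothing about the Polish a : Kashubian u question in topic 6.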

8. How did Proto-Slavic *sŭnŭ 'sleep' become Lower Sorbian soń with a palatal ń instead of the expected n as in the rest of Slavic: e.g., Upper Sorbian son?

YELLOW PIG 12/3

<so nggiyan uliya aniya juwa juwe biya ice ilan inenggi>

'yellow pig year, ten two month, new three day'

(0. 12.29.0:15: I keep thinking the version of <ilan> above looks like Chinese 斗 <DIPPER>, but it is of course in fact cognate to Chinese 三 <THREE>.)

1. Via Andrew West: Abraham Gross' proposal to encode the missing kana <YI> and <WU> in Unicode. That reminds me to upload my August post about <YI> and <WU>.

2. I first heard the song "Year of the Cat" as a child in 1976, and only years later¹ did I learn that it was a reference to the Vietnamese zodiac which is close to the Chinese one with two exceptions:

The terms for the Vietnamese zodiac are not the normal terms for animals: e.g., in Vietnamese, 'water buffalo' is 𤛠 trâu and 'ox' is 𤙭 ~ 𤞨  bò.

I've long assumed that the reinterpretation of 丑 sửu as water buffalo incorporated a local animal, but water buffalo exist in China too. Duh. In fact, China has seven times more water buffalo than Vietnam. Shows you what I know about farming: nothing. So I can't explain how sửu came to refer to water buffalo.

As for 卯 mão/mẹo, was its reinterpretation as 'cat' due to a folk etymological association with 貓 ~ 猫 mèo 'cat'?

¹In an interview with Al Stewart that I heard on the radio in 1989?

3. I never heard of screeves until today. The word sounds like it could be a native English word, but in this context it's actually a loan from Georgian მწკრივი mcʼkʼrivi 'row, series'. I wonder why it's so Anglicized. It's not as if Japanologists speak of 行 gyō 'rows (of kana sharing the same vowel: e.g., a, ka, sa)' as gheow or however an English speaker might spell it. (It would be fun to ask English speakers unfamiliar with Japanese to write gyō phonetically.)

There turns out to be another screeve which isn't native or from Georgian.

YELLOW PIG 12/2

<so nggiyan uliya aniya juwa juwe biya ice juwe inenggi>

'yellow pig year, ten two month, new two day'

1. Dept. of Ideas I Wish I Had: Alexander Zapryagaev's proposal for writing Old Japanese in hentaigana, a logical extension of the common practice of writing the extinct Japanese syllable ye (now [e]) in hiragana as the hentaigana 𛀁 to differentiate it from え e and ゑ we (also now [e]). (More in this thread by Sven Osterkamp.)

2. The reading ritsu for 立 <STAND> is in that stratum of Japanese that I feel as if I've 'always' known. I suspect I learned the reading in the early 80s when I started to read Japanese books with furigana.

When I started learning Korean in 1987, I immediately picked up on the correspondences between Sino-Korean and Sino-Japanese¹. For instance, I noticed that Sino-Korean -l regularly corresponded to Sino-Japanese -tsu or -chi and vice versa. So I should have expected ritsu to correspond to Sino-Korean 릴 ril. But of course, the actual Sino-Korean reading of 立 is 립 rip. I learned that reading so early in my studies that I didn't even know the correspondence patterns yet. Hence the mismatch of -p and -tsu didn't bother me at all.

Not long afterward I learned Sino-Korean 잡 chap corresponding to Sino-Japanese zatsu for 雜 <MIXED>.

And then I learned the Cantonese readings of those characters: lap6 and zaap6.

The next step was learning about Chinese reconstruction. Of course all agree that 立 and 雜 originally ended in *-p in Chinese, and that Cantonese preserves that *-p.

So how did the Sino-Japanese readings of 立 and 雜 come to end in -tsu? Alexander Zapryagaev has a thread on the mystery of 立 ritsu.

¹And Mandarin, but that's not relevant here, since Mandarin lacks final stops. Without knowledge of Mandarin, I would have had a much harder time remembering which Sino-Korean words ended in -ng.

12.29.20:35: How I guessed final consonants in Sino-Korean in 1987 (before I knew anything about Cantonese or Vietnamese):

Sino-Japanese final
Sino-Korean final
vowel (usually; unpredictably occasionally in -p)
-ki, -ku
-chi, -tsu
-n or -m (unpredictable)

At the time I just memorized which Sino-Korean readings ended in -p, since there was no way to guess Sino-Korean -p on the basis of Sino-Japanese or Mandarin even in regular cases such as

十 <TEN> SJ jū : Md shi : SK 십 ship

In that particular case, *-ip was borrowed into Japanese as *-ipu which became *-iu and then -ū.
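The guessing procedure described above can be written out as a small lookup. This is a sketch of the heuristic only, under stated assumptions: the -tsu/-chi to -l rule and the -n to -n/-m ambiguity come from the text, the -ki/-ku to -k rule is my own filling-in of a row whose Sino-Korean cell did not survive, Sino-Korean -ng is not modeled at all, and the function name `guess_sk_final` is hypothetical.

```python
# Sketch: guess a Sino-Korean final from a romanized Sino-Japanese reading.
# Assumptions: Hepburn-style input; the -ki/-ku -> -k rule is supplied by me;
# Sino-Korean -ng (guessed via Mandarin in the text) is deliberately left out.
def guess_sk_final(sj):
    """Return the set of Sino-Korean finals the heuristic would allow."""

    def ends_with(*suffixes):
        # Require something before the suffix, so that one-mora readings
        # such as "ki" or "chi" count as vowel-final.
        return any(sj.endswith(s) and len(sj) > len(s) for s in suffixes)

    if ends_with("tsu", "chi"):
        return {"-l"}
    if ends_with("ki", "ku"):
        return {"-k"}
    if sj.endswith("n"):
        return {"-n", "-m"}            # unpredictable from Japanese alone
    return {"vowel", "-p"}             # usually a vowel, occasionally -p


# The two mismatches discussed above: the guess is -l, the actual final is -p.
ACTUAL = {"ritsu": "-p", "zatsu": "-p"}
for reading, final in ACTUAL.items():
    guess = guess_sk_final(reading)
    print(reading, sorted(guess), "actual:", final, "hit" if final in guess else "miss")
```

Both 立 ritsu and 雜 zatsu come out as misses, which is the whole puzzle: the heuristic is regular enough that the exceptions stand out.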

Once I learned which Sino-Korean readings ended in -p and -m, I could use that knowledge to guess which Cantonese and Vietnamese readings ended in -p and -m.

YELLOW PIG 12/1

(I completed this post but lost it before I could upload it, so I reconstructed it on 12.30.16:13.)

<so nggiyan uliya aniya juwa juwe biya ice inenggi>

'yellow pig year, ten two month, new day'

1. The first ten days of the month are ice 'new' in the Ming Jurchen calendar. (In Jin Jurchen, the first day was 一日 emu inenggi 'one day'. Note how the early graphs are identical to Chinese 一日 <ONE DAY>.) Jin (1984: 105) derives the graph for ice from the left side 亲 of Chinese 新 <NEW>. But I think the Jurchen graph may be more directly connected to Chinese 𢀝 <NEW>, a variant of 新 attested in the Jin dynasty dictionary 四聲篇海 Sisheng pianhai (The Four-Tone Text Sea).
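The date lines that head these posts follow a pattern regular enough to sketch in code. This is a minimal illustration that uses only the romanized forms appearing in the posts. Several things are assumptions on my part: days 8-10 and 20-26 are extrapolated from the attested days, days 11-19 and months other than the eleventh and twelfth are not attested here and are rejected, and the function names are hypothetical.

```python
# Sketch: compose a Ming Jurchen date line in the romanization used in these posts.
# Assumptions: days 8-10 ("ice" + numeral) and 20-26 ("orin" + numeral) are
# extrapolated; days 11-19 and months 1-10 are unattested here and raise errors.
UNITS = {1: "emu", 2: "juwe", 3: "ilan", 4: "duin", 5: "shunja",
         6: "ninggu", 7: "nadan", 8: "jakūn", 9: "uyewun", 10: "juwa"}
MONTHS = {11: "juwa emu", 12: "juwa juwe"}   # 'ten one', 'ten two'


def jurchen_day(day):
    if day == 1:
        return "ice inenggi"                      # 'new day', no numeral
    if 2 <= day <= 10:
        return f"ice {UNITS[day]} inenggi"        # 'new N day'
    if day == 20:
        return "orin inenggi"
    if 21 <= day <= 29:
        return f"orin {UNITS[day - 20]} inenggi"  # 'twenty N day'
    if day == 30:
        return "gūsin inenggi"
    raise ValueError(f"day {day} is not attested in these posts")


def jurchen_date(year_name, month, day):
    if month not in MONTHS:
        raise ValueError(f"month {month} is not attested in these posts")
    return f"{year_name} aniya {MONTHS[month]} biya {jurchen_day(day)}"


print(jurchen_date("songgiyan uliya", 12, 7))
```

The printed line should match the 12/7 date line at the top of the page.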

2. In 1998 I reviewed William C. Hannas' Asia's Orthographic Dilemma for Korean Studies. I finally got around to reading a Kindle sample of the 2013 sequel The Writing on the Wall: How Asian Orthography Curbs Creativity.

Here's my attempt to sum up Hannas' argument:

A. East Asia has a "creativity deficit" (Kindle location 146)

B. Writing "affects thought" (Kindle location 245)

C. B causes A - in other words, East Asian writing systems cause a "creativity deficit"

A and/or B could be true. But I am skeptical of C.

YELLOW PIG 11/30

<so nggiyan uliya aniya juwa emu biya gūsin inenggi>

'yellow pig year, ten one month, thirty day'

gūsin 'thirty' looks like Janhunen's (2003: 397) Proto-Tungusic *gutïn from para-Mongolic or pre-Proto-Mongolic *gutïn. (The Proto-Tungusic form cannot be from Proto-Mongolic *gucin which underwent two changes: *ï > *i and *ti > *ci.) However, Proto-Tungusic *gutïn should become Jurchen gutin, not gusin.

I propose that Jurchen gūsin may be a borrowing that replaced an earlier *gūtïn inherited from Proto-Tungusic. (The macron in Jurchen does not symbolize length; it indicates that u is [ʊ].) The source of Jurchen gūsin may be a para-Mongolic (Khitan?) dialect that shifted *c to sh (unlike the prestigious Khitan dialect preserved in the small script that retains c).

I suspect that Khitan large script


is a graphic cognate of Jurchen


(12.26.13:19: Left to right: the earliest form from Nüzhen zishu [Book of Jurchen Characters, c. early 12th c.?], variant in 慶源 Kyŏngwon inscription, 1138-1153,  進士 jinshi candidate list, 1224, Berlin copy of the Bureau of Translators vocabulary, 15th c. It is interesting that the early and late forms are more similar to each other than to the forms between them.)

and sounded something like Jurchen gūsin, though there is no evidence for its pronunciation.

2. When I was studying Russian in the late 90s, I was surprised that 'Kremlin' was Кремль <Kreml'> without an n. I asked my professor why and ... I can't remember his answer. Today I learned from Wiktionary that there is an Old East Slavic кремлинъ <kremlinŭ> with -n-. But how did that n-form enter English? Not directly, I assume.

etymonline says:

1660s, Cremelena, from Old Russian kremlinu, later kremlin (1796), from kreml' "citadel, fortress," a word perhaps of Tartar origin. Originally the citadel of any Russian town or city, now especially the one in Moscow (which enclosed the imperial palace, churches, etc.). Used metonymically for "government of the U.S.S.R." from 1933. The modern form of the word in English might be via French.

The un-Turkic initial cluster kr- makes a Tatar (not 'Tartar') origin improbable. The Russian Wiktionary derives kreml' from Proto-Indo-European *kʷrom 'fence'.

12.26.10:09: Merriam-Webster says:

1662 [...] obsolete German Kremelien the citadel of Moscow, ultimately from Old Russian kremlĭ

That gives the impression that German added the -n (but why?).

YELLOW PIG 11/29

<so nggiyan uliya aniya juwa emu biya orin uyewun inenggi>

'yellow pig year, ten one month, twenty nine day'

orin uyewun 'twenty nine' is a para-Mongolian (Khitan?)-Jurchen hybrid. Compare with Written Mongolian qorin yisün 'twenty nine' containing an unrelated Mongolian word for 'nine'.

Jurchen uyewun is trisyllabic unlike any other Tungusic word for 'nine' at starling other than Negidal ijeɣin with different first and third vowels. Negidal i can correspond to Jurchen/Manchu u: e.g., N edin : J/M edun 'wind'. I have long assumed that Manchu uyun is a contraction of uyewun. That contraction already existed before Manchu got that name since the Ming dynasty Bureau of Interpreters vocabulary has disyllabic uyun (transcribed 兀容). The roughly contemporaneous trisyllabic uyewun (transcribed 兀也溫) in the Ming dynasty Bureau of Translators vocabulary may be more carefully pronounced and/or from a different dialect.

It's already Christmas in most of the world as I write this, so as a 'gift' to my readers, I'm uploading all the posts I wrote over the last month but had kept on my computer until now:

I've been too tired and busy to upload posts late at night.

YELLOW PIG 11/28

<so nggiyan uliya aniya juwa emu biya orin jakūn inenggi>

'yellow pig year, ten one month, twenty eight day'

1. orin jakūn 'twenty eight' is a para-Mongolian (Khitan?)-Jurchen hybrid. Compare with Written Mongolian qorin naiman 'twenty eight' containing an unrelated Mongolian word for 'eight'.

Jurchen jakūn 'eight' has not changed much from Proto-Tungusic *japkun whose first syllable *ja looks like Proto-Japonic *ya 'eight'. Coincidence? How many other instances of Proto-Tungusic initial *j- correspond to Proto-Japonic *y-?

If one wants to link the Tungusic and Japonic words for 'eight' via borrowing, one must deal with the complication of working out a scenario of Tungusic-Japonic contact (see yesterday's post) and with the question of why Tungusic has *-pkun and Japonic doesn't. Proposing a genetic relationship eliminates the contact problem but still doesn't resolve the *-pkun problem.

It may be tempting to link early Korean *yʌtʌrp (Lee and Ramsey 2011: 160) to the Tungusic and Japonic words, but that raises even more problems: e.g., what is *tʌrp?

2. The current state of Korea-Japan relations in a slogan:

(1.2.15:51: Corrections by Kongduino.)

The verbs appear to be bare stems but are actually a-stems that have absorbed an -a ending that Martin (1992: 466) calls the 'infinitive'. But I would rather not use the term 'infinitive' for the ending of a finite verb.

The -a ending is more obvious in forms like 봐! pwa! 'look!' (< po-a) and 팔아! phar-a 'sell!'

3. I was surprised to learn from Martin et al. (1967: 870) that sa- 'buy' is also an "old-fashioned" term for 'sell (grain)', so ssar-ŭl sa-da 'rice-ACC X-STATEMENT' can be either 'buy rice' or 'sell rice'.

YELLOW PIG 11/27

<so nggiyan uliya aniya juwa emu biya orin nadan inenggi>

'yellow pig year, ten one month, twenty seven day'

1. orin nadan 'twenty seven' is a para-Mongolian (Khitan?)-Jurchen hybrid. Compare with Written Mongolian qorin dologhan 'twenty seven' containing an unrelated Mongolian word for 'seven' with the numeral suffix last seen in jirghughan 'six'.

Jurchen nadan 'seven' can be projected intact all the way back to Proto-Tungusic. Proto-Tungusic *nadan looks like Proto-Japonic *nana 'seven'. Coincidence? How many other instances of Proto-Tungusic intervocalic *-d- correspond to Proto-Japonic *-n-?

What complicates a loan scenario is uncertainty over whether the two proto-languages were in contact. I think Tungusic and para-Japonic languages might have been in contact in Parhae, but that's centuries after the ancestor of Japonic spread from the Korean peninsula to the Japanese islands.

2. I just heard Muir pronounced as [mjʊɚ] which is what I'd expect for a theoretical Miur. Wiktionary lists a General American /mɪɚ/. I have never heard the name pronounced before. I thought it was homophonous with Moore in English. Wiktionary lists five (!) pronunciations for Scots muir 'moor': [møːr], [myːr], [meːr], [miːr], [mjuːr].

3. I also heard Buttigieg pronounced for the first time as [ˈbuːtɪdʒɪdʒ]. I had been mispronouncing it as [ˈbuːtɪdʒɛg], thinking gi was like Italian [dʒ]. Turns out both g's are Maltese ġ [dʒ] and ie is [ɨː] (according to Wikipedia's IPA for Maltese page) or [ɪː], [iɛ], or [iː] (according to Wikipedia's Maltese language page). In any case, ie is from ā, and so I'm not surprised to learn that Wiktionary says Buttiġieġ is from Arabic أبو الدجاج <ʔˀbw ʔldjʔj> ʔabū ad-dajāj, lit. 'father [of] the-poultry' with ā.

The bending of ā to ie in Maltese reminded me of the raising of Old Chinese *a to *ie and various high vowels and convinced me that Norman's pharyngeal hypothesis for Chinese was right. In my take on his hypothesis, pharyngealization pushed vowels down, whereas vowels raised in its absence. But David Boxenhorn made me think pharyngealization might not be a factor; vowel harmony alone might trigger vowel lowering and raising. And vowel harmony is a well-attested phenomenon in north Asian languages.

YELLOW PIG 11/26

<so nggiyan uliya aniya juwa emu biya orin ninggu inenggi>

'yellow pig year, ten one month, twenty six day'

1. orin ninggu 'twenty six' is a para-Mongolian (Khitan?)-Jurchen hybrid. Compare with Written Mongolian qorin jirghughan 'twenty six' containing an unrelated Mongolian word for 'six'.

Grinstead (1972: 16) noted that

ninggu 'six'

is an inverted Chinese 六 <SIX>. It is not like any of the variants of Khitan large script <SIX>:

Is the Jurchen graph a 12th century invention, or is it derived from a version of the Parhae <SIX> that the Khitan did not adopt for their large script?

The reading of Khitan <SIX> is unknown, but it might be something like Proto-Mongolic *jir-gu-xan 'two-three-NUMERAL' as reconstructed by Janhunen (2003: 17). Jishi read <SIX> as ʧirkɔ: i.e., as 'two-three'. But if Janhunen is right about *jir-gu-xan being an innovation, Khitan might retain an older Proto-Serbi-Mongolic root for 'six'.

The Khitan small script block

<085.033.288> <> (Epitaph for Empress 仁懿 Renyi, d. 1076)

might indicate that <SIX> ended in -i, given how the initial vowel of one block (here, the i of <is>) is often (but not always) the final vowel of the previous block (here, <SIX>).

2. What is the etymology of Hawaiian luakini 'large heiau [Hawaiian temple; < hei 'sacrifice' + ?] where ruling chiefs prayed and human sacrifices were offered'? It looks like a compound of lua plus kini, but I can't find any lua or kini that would transparently add up to 'sacrificial temple'.

3. Wikipedia on the Dzungar genocide:

[Qing emperor] Qianlong issued his orders multiple times as some of his officers were reluctant to carry them out. Some were punished for sparing Dzungars and allowing them to flee, such as Agui and Hadada, while others who participated in the slaughter were rewarded like Tangkelu and Zhaohui (Jaohui).

If Tangkelu is a Manchu name, it violates vowel harmony. I would expect Tangkalu or Tengkelu.
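The expectation above can be made explicit with a rough check. This is a minimal sketch under a simplifying assumption: a, o, and ū form one harmony class, e forms the other, and i and u are treated as neutral. The class labels and function names are invented for illustration, and the check works on romanized spellings only.

```python
import unicodedata

# Simplified harmony classes for romanized Manchu (an assumption of
# this sketch): a, o, ū vs. e; i and u are treated as neutral.
A_CLASS = {"a", "o", "\u016b"}  # \u016b is precomposed ū
E_CLASS = {"e"}


def harmony_classes(word: str) -> set:
    """Return the set of harmony classes found among a word's vowels."""
    classes = set()
    for ch in unicodedata.normalize("NFC", word.lower()):
        if ch in A_CLASS:
            classes.add("a-class")
        elif ch in E_CLASS:
            classes.add("e-class")
    return classes


def is_harmonic(word: str) -> bool:
    """A word is harmonic here if it never mixes the two classes."""
    return len(harmony_classes(word)) <= 1


for name in ("Tangkelu", "Tangkalu", "Tengkelu"):
    print(name, is_harmonic(name))
```

Under these assumptions Tangkelu mixes a with e and fails, while Tangkalu and Tengkelu pass.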

4. I wish I could look for Tangkelu in Giovanni Stary's A Dictionary of Manchu Names (2000). The book's National Library of Australia listing says it's in "Mandingo" (sic). No.

5. In actual Mandingo, "/g/ and /p/ are found in French loans." The language has /k c j t d b/, though. Are /h/ and /f/ in part or in whole from earlier *g and *p?

6. The IPA transcription of the Kazakhstani national anthem is so different from what one might think Kazakh sounds like solely on the basis of the Cyrillic or Latin alphabet: e.g.,

[jɪrlɪkˈtɪŋ dɑstɑˈnə]

Ерліктің дастаны

<Erliktiņ dastany>

Erlik-tiń dastan-y

'courage-GEN epic-3.POSS.NOM' = 'epic of courage'

One might expect the pronunciation to be something like [erliktiŋ dastanɨ] on the basis of Cyrillic and Latin alone. And if one guessed that Cyrillic і was [i], what would one guess и is? (It's [ɪj] ~ [əj] according to this chart.)

The use of ы/y for [ə] reminds me of my own choice to use y for the Tangut neutral vowel which may have been [ə] or [ə]-like in one or more grades.

The 3rd person singular possessive suffix -ы/y is missing from this table. See Mukhamedova (2016: 81) on the Kazakh X-GEN Y-POSS 'Y of X' construction.

7. Why does Glosbe align Kazakh дастан 'epic' with Dennis in translations?

8. Until now I assumed that Turkic beg was a loanword from the Middle Chinese title 伯 *pæk. That is the etymology in Clauson (1972: 322). But Wiktionary has a second etymology:

the Middle Persian title bag (also baγ or βaγ, Old Iranian baga; cf. Sanskrit भग / bhaga) meaning "lord" and "master". Peter Golden derives the word via Sogdian bġy from the same Iranian root. All Middle Iranian languages retain forms derived from baga- in the sense "god": Middle Persian bay (plur. bayān, baʾān), Parthian baγ, Bactrian bago, Sogdian βγ-, and were used as honorific titles of kings and other men of high rank in the meaning of "lord".

The problem I have with this etymology is: why was a in some Iranian language borrowed as Turkic e?

If /a/ in the Iranian source language was [æ], how can Slavic bog 'god' be a loan from Iranian? Was the Slavic word borrowed from a different Iranian source language in which /a/ was back and labial: [ɒ] or [ɔ]?

As for the Chinese etymology, the mismatch of initials (Chinese *p- vs. Turkic b-) is not a problem if the borrowing was in an early Turkic variety without p-. (Pre-Proto-Turkic *p- became Proto-Turkic *h- which was preserved in Khaladj and was lost elsewhere.)

The -g of beg might be a Turkic approximation of a Chinese (allophonic?) [ɣ]-like pronunciation of *-k. Although Old Turkic did have gh, gh could not coexist with e, but g could. And at some point, Middle Chinese *æ raised to *ɛ. Late Middle Chinese *pɛɣ was transcribed in the Tibetan version of the 千字文 Thousand Character Classic (c. 9th-10th c.?) as <peg.> which is close to Turkic beg. (However, the Turkic word is first attested in the 8th century, possibly when 伯 was closer to *pæk than *pɛɣ in western Middle Chinese.)

9. If I understand this correctly, Haddow is a Germanic/Celtic (Scots + Scots Gaelic) hybrid. Are there more common names like it?

10. Aacistak has been called "the Language Capital of the World". What is its more common name?

YELLOW PIG 11/25

<so nggiyan uliya aniya juwa emu biya orin shunja inenggi>

'yellow pig year, ten one month, twenty five day'

1. orin shunja 'twenty five' is a para-Mongolian (Khitan?)-Jurchen hybrid. Compare with Written Mongolian qorin tabun 'twenty five' containing an unrelated Mongolian word for 'five'.

The initial of 'five' in Manchu is s-, not sh-. Neither Jurchen sh- nor Manchu s- matches the t- in the rest of Tungusic.

2. Last night I thought of a Chinese character for the first time in many years: 閼. It has the same phonetic as a character that I first encountered last week: 菸.

That phonetic is a drawing of a crow: 於/烏. 烏 still represents the word for crow, but its variant 於 has come to represent a nearly homophonous locative preposition.

Normally 於/烏-graphs represent open syllables in modern languages: e.g.,

So in Cantonese, I would expect 閼 and 菸 to end either in -u [u] or -yu [y]. But they don't:

The vowels are less of an issue (see the appendix) than the codas:

In other words, 於/烏 should represent *-a(ʔ)(s) syllables but not *-t syllables or *-n syllables. Should. But clearly 於 is a phonetic in

I have not found any evidence for 菸 being read with -n before the last millennium. At some point 菸 came to represent a word 'tobacco' < 煙/烟 Old Chinese *CAʔin 'smoke' normally written with -n phonetics (垔 and 因). The top component of 菸 'tobacco' is <GRASS> which makes sense. But the bottom component 於 is a poor phonetic (and 於 is unlikely to be an abbreviation of the uncommon character 閼 which also has non-n readings). Was 菸 'smelly grass' chosen to write an unrelated and phonetically different but semantically similar word 'tobacco'?

I found 菸 via Wiktionary's entry on yen. I forgot that yen could also refer to having a desire for something.

12.22.19:22: APPENDIX: Some *-a rhymes from Old Chinese to Cantonese:

*Voiceless initials condition Cantonese tone 1 unless there are other conditioning factors:

At some point after tonogenesis, *ʔ- was lost, and zero initials became homorganic glides before high vowels:

Contrast with *ʔa > nonhigh [a] without a glide in Cantonese 閼 aat3 [aːt˧].

YELLOW PIG 11/24

<so nggiyan uliya aniya juwa emu biya orin duin inenggi>

'yellow pig year, ten one month, twenty four day'

1. orin duin 'twenty four' is a para-Mongolian (Khitan?)-Jurchen hybrid. Compare with Written Mongolian qorin dörben 'twenty four' containing an unrelated Mongolian word for 'four'. -ben is the 'feminine'¹ vowel variant of the -ban found in ghurban 'three', and both ghurban and dörben have a shared suffix -r- (Janhunen 2003: 47).

Rozycki (1983: 7, 93) regards Jurchen/Manchu duin and Written Mongolian dörben as a "[p]re-loan correspondence": "words with a phonology consistent with native Tungus stock and for which there is no evidence of loaning". I regard the vague similarity of duin and Proto-Mongolic *dö- 'four' (as reconstructed by Janhunen 2003: 47) as coincidental.

¹I use the term 'feminine' to avoid committing to a front or higher vowel interpretation of e.

2. Yesterday I forgot how to pronounce 6ix9ine which looks like it was written in the Arabic chat alphabet (in which 6 is ط <ṭ> and 9 is ص <ṣ> or ق <q>). But it's actually a stylized spelling of six nine mixing logograms with letters. The Jurchen (large) script, Korean hyangchal, and Japanese script frequently have logogram-phonogram sequences for words. Perhaps the Khitan large script did too, but it's too poorly understood for me to be certain.

How did Tekashi 6ix9ine come up with the stage name Tekashi? Is it based on Japanese Takashi?

3. I knew Ў wasn't unique to Belarusian (in which it represents /w/), but I forgot which other language was written with Ў: Uzbek. Ў has since been replaced with Oʻ. Ў/Oʻ represents mid /o/, whereas О/O represents low /ɒ/ and /o/ in Russian loans. Did Uzbeks perceive Russian /o/ [o] ~ [ɔ]² as being lower than their /o/ and closer to their /ɒ/? Does native /o/ have a high allophone [ʊ]? That would explain why it was written as Ў: i.e., as У <U> with a breve rather than as О <O> plus a diacritic.

²For some reason, Wikipedia IPA has [ɛ] for Russian /e/ and [o] (not [ɔ]) for Russian /o/ even though this diagram shows the two vowels at almost identical heights with [o] lower than [ɛ] rather than the other way around.

4. Cyrillic Ӯ (Ұ after 1957; see here for other uses of Ӯ) for Kazakh /ʊ/ reminds me of Möllendorff's Ū for Manchu /ʊ/.

The 'feminine' counterpart of Manchu /ʊ/ is /u/, but Kazakh has no /u/. It has an interesting three-way categorization of vowels: -RTR, 0RTR (neutral), and +RTR. The [-RTR] and [0RTR] counterparts of [+RTR] /ʊ/ are /ɪ/ and /ʉ/. (Kazakh has no /i/ either. If the IPA symbols are taken at face value, apparently the only high vowel is central /ʉ/; /ɪ/ and /ʊ/ are slightly lower.)

Is Kazakh /œ/ backed if not central? It is a [0RTR] vowel like /ʉ əj ə/ despite being written with a front vowel symbol like the [-RTR] vowels /ɪ jɪ e æ/.

5. I wish I had a key to the 1964-1984 Kazakh Latin alphabet used in China (and in this 1977 edition of Mao's Selected Works).

6. Last night I found Handel (2006) while trying to find where I had first encountered the idea that Korean 바람 param < Middle Korean pʌ̀rʌ̀m 'wind' was a borrowing from Old Chinese. I thought I had read it in Pulleyblank (1962), but I couldn't find it there. This 2013 post reminded me I got it from William Boltz. My apologies to Professor Boltz.

Handel discusses 'wind' on page 1015. In footnote 8, he mentions an internal etymology relating Korean 'wind' to pul- < Middle Korean pǔr- 'to blow'. Although the semantic match is perfect, the phonetic match leaves much to be desired. First, I know of no other cases of a CʌC-noun from a CuC-verb. Second, Middle Korean pǔr- is a class 5 stem in Ramsey's (1986) typology; it is a disyllabic stem /pùúr/, and if I understand Ramsey (1978: 221) correctly, it goes back to *pùrɯ́- with high series vowels and a high-low pitch pattern unlike the low-pitched low series vowels of pʌ̀rʌ̀m.

7. This part of the Wikipedia article on the Common Turkic Alphabet puzzles me:

Some handwritten letters have variant forms. For example: Čč=Jj, Ķķ=, and Ḩḩ=.

But Lithuanian Karaim, the only Turkic Latin alphabet that I know of with Č, distinguishes Č (for []) from J (for [j]). And I find it hard to believe that two letters with such different shapes could be variants only in Turkic usage.

Of course in general Latin letter usage there are some surprising variants. Would an alien guess that B and b are the same letter? Uzbek used to have в instead of b in the 1928-40 Yaꞑalif alphabet. (I am not italicizing в since I'm not sure if the old Uzbek italic в looked like Russian italic в.)

Turns out that "[t]he small letter B is ʙ (to prevent confusion with Ь ь)". Although Ь represented palatalization in Russian, in Yaꞑalif, it seems to have stood for Soviet Turkic vowels similar to Turkish ı: e.g., Tatar [ɤ]. Uzbek had no such vowel:

[æ] [ɒ]

Nonetheless I guess ʙ remained the lowercase version of B in Uzbek for consistency with the other variants of Yaꞑalif. You can see Uzbek ʙ here.

8. I've never looked at Karakalpak before today. I confess I forgot it even existed.

It has a nearly symmetrical vowel system with palatal vowel harmony. Only e has no nonpalatal counterpart.


It also has labial harmony. If the first vowel is nonlabial, then the second vowel cannot be labial. However, if the first vowel is labial, then the second vowel may or may not be labial. In any case, vowels must match in palatality.
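The two constraints can be stated as a small predicate over vowel pairs. This is a sketch, not a description of any published analysis: the nine-vowel inventory and its IPA-like symbols are my assumption (chosen so that only e lacks a nonpalatal counterpart), and the function name is hypothetical.

```python
# Assumed inventory: palatal i e æ y œ, nonpalatal ɯ a u o.
# Pairs by palatality: i/ɯ, æ/a, y/u, œ/o; e has no nonpalatal partner.
PALATAL = {"i", "e", "æ", "y", "œ"}
NONPALATAL = {"ɯ", "a", "u", "o"}
LABIAL = {"y", "œ", "u", "o"}
VOWELS = PALATAL | NONPALATAL


def allowed_pair(v1: str, v2: str) -> bool:
    """Check whether vowel v2 may follow vowel v1 under both harmonies."""
    if v1 not in VOWELS or v2 not in VOWELS:
        raise ValueError("vowel not in the assumed inventory")
    # Palatal harmony: both vowels must agree in palatality.
    same_palatality = (v1 in PALATAL) == (v2 in PALATAL)
    # Labial harmony: a labial second vowel needs a labial first vowel;
    # a nonlabial second vowel is always fine.
    labial_ok = (v1 in LABIAL) or (v2 not in LABIAL)
    return same_palatality and labial_ok


print(allowed_pair("a", "u"))  # nonlabial followed by labial
print(allowed_pair("u", "ɯ"))  # labial followed by nonlabial
```

The asymmetry is all in labial_ok: it only ever rules out a labial second vowel after a nonlabial first vowel.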

How was Karakalpak /h/ written in Cyrillic? I can't find a Cyrillic letter for it.

9. Wikipedia says that

The [irregular] /otoosan/ form [for Japanese 'father'] first appears in the early Meiji period in educational materials mandated by the 文部省 (Monbushō, "Ministry of Education").

Did /otoosan/ replace earlier /otossan/ by analogy with the long vowel of /okaasan/ 'mother'?

/okaasan/ is itself irregular; it is from /okakasan/ with irregular intervocalic /k/-loss.

Wikipedia lists Taiwanese borrowings of both words: 多桑 <MANY MULBERRY> tò-sàng and 卡桑 <kha MULBERRY> khà-sàng. Both reflect shorter Japanese forms without the honorific prefix o-.

19.12.18.xx:xx: YELLOW PIG 11/23

<so nggiyan uliya aniya juwa emu biya orin ilan inenggi>

'yellow pig year, ten one month, twenty three day'

1. orin ilan 'twenty three' is a para-Mongolian (Khitan?)-Jurchen hybrid. Compare with Written Mongolian qorin ghurban 'twenty three' containing an unrelated Mongolian word for 'three'.

2. Yesterday I learned that Eom Ik-sang still believes a number of Korean words conventionally regarded as native are actually borrowings from Old Chinese. Even if I assume the Old Chinese forms he cites are correct, there are still issues.

Perhaps the most convincing of his proposals is

Old Chinese 風 *pljəm (Li), *plums (Zhengzhang) 'wind' : Korean 바람 param 'id.'

I would prefer to cite Middle Korean pʌ̀rʌ̀m 'wind' which is even closer to the Old Chinese reconstructions that he cites.

Although I expressed some doubts about a liquid in the Old Chinese word for 'wind' in 2013, I would favor reconstructing that word as *prəm with *-r- now.

That aside, there is one other potential problem with the comparison: I don't think anyone's Old Chinese reconstruction for 'wind' ever had the vowel *ʌ. If the Old Chinese word for 'wind' had *ə, why was it borrowed into early Korean as something like pʌ̀rʌ̀m when Korean also had the vowel ə? In other words, why isn't the Korean word for 'wind' pərəm with ə?

12.19.22:33: Was Edkins (1890: 95) the first to derive Korean param from Old Chinese 風?

param, wind; from [an unspecified - presumably Chinese -] pam. The old Chinese for wind is bam, which has changed to [Mandarin] feng.

Edkins was writing decades before Karlgren reconstructed Old Chinese. I know almost nothing about pre-Karlgren Chinese reconstructions, so I wonder what the reasoning behind pam and bam is. *pam is not a bad guess, since even in the 19th century, it was known that f- was from *p- and that 'wind' rhymed with 南 'south' (Mandarin nán and Cantonese naam4). However, *b- is a surprise, as 'wind' does not have a tone pointing to an earlier *voiced initial.

3. I've never seen anything like this use of the reflexive in Romagnol:

mè a sò 'I am' (cf. Italian [io] sono 'id.')

The reflexive seems less exotic in this case:

mè a j'ò 'I have' (cf. Italian [io] ho 'id.')

And the English and Italian translations of this last instance also have a reflexive:

mè a'm so lavê 'I washed myself' (cf. Italian [io] mi sono lavato 'id.')

4. Wikipedia:

Romagnol has an inventory of up to 20 contrastive vowels in stressed position, in comparison to Italian's 7.

Unfortunately Wikipedia doesn't list all 20 vowel phonemes. How did the 10 native vowels of Latin become 20 in Romagnol? Are some of the Romagnol vowels from Latin diphthongs?

The most interesting Romagnol vowels are these diphthongs which are unlike anything in Latin:

I assume they are phonemes, though Wikipedia represents them with phonetic brackets. /Və̯/ : /Vɐ̯/ is a fine contrast I've never seen before.

5. How did Neapolitan develop this alternation?

Did an earlier *o break to [wo] before the masculine ending *-o merged with the feminine ending *-a?

6. While I'm in languages of Italy mode, it just occurred to me that the gorgia toscana is a bit like Jurchen/Manchu in which *p > f (albeit in all environments, not just intervocalically) and *-k- > -h- (see Vovin 1997 for details).

7. I saw a commercial for the IUDs Mirena [məɹiːnə] and Kyleena [kʰajliːnə]. Those names sound like 'creative' Anglospheric girls' names. The commercial was aimed at young women. Somebody wanted the audience to think of IUDs as if they were daughters. The children that the IUDs are supposed to prevent. Creepy marketing.

YELLOW PIG 11/22

<so nggiyan uliya aniya juwa emu biya orin juwe inenggi>

'yellow pig year, ten one month, twenty two day'

1. orin juwe 'twenty two' is a para-Mongolian (Khitan?)-Jurchen hybrid. Compare with Written Mongolian qorin qoyar 'twenty two' containing an unrelated Mongolian word for 'two'. Jurchen juwe 'two' is not to be confused with Jurchen juwa 'ten'.

2. Last night when trying to figure out the Chinese character spellings for damofo and yumofo, I typed <fo> into Windows 10's Pinyin IME and was surprised to see 仸 <PERSON.夭>. 夭 ǎo/yāo/yǎo is normally not phonetic in b/p/f-graphs:

I would have guessed that 仸 was read as something like yao. Then I learned that 仸 is a variant of 佛 'Buddha'. 仸 seems to be a semantic compound with 天 <HEAVEN> slightly altered to 夭. (天 and 夭 are difficult to distinguish in a sans serif font, but in handwriting, the top stroke of 天 is written from left to right, whereas the top stroke of 夭 is written from right to left.)

3. Two elephantine surprises last night: Wiktionary notes a subtle difference between 象 <ELEPHANT> in the PRC standard and nom on the one hand and elsewhere in the Sinosphere on the other. Both versions of 象 have the same codepoint.

I am not sure that the PRC and nom really have a distinct version of 象:

4. 象 was also formerly a simplification of 像. The Wiktionary entry for 象 says it was a 1964-1986 simplification of 像. Wikipedia mentions two other characters restored in 1986: 覆 and 叠. I am skeptical:

5. When trying to type 复 in Microsoft's Bopomofo IME, I found 䲁 <FISH.wèi> wèi 'a snake-like fish' as the 64th and last choice for fù. How did 䲁 get in the list? Graphic confusion with 鮒 <FISH.> 'a kind of fish' which is also in the list?

6. Unidentifiable Khitan small script characters I encountered while copying the 契丹小字研究 Qidan xiaozi yanjiu (Research on the Khitan Small Script) hand copy of the epitaph for Emperor 興宗 Xingzong (1015-1054) of the Khitan Empire:

⿱⺌月 (but with a dot instead of two horizontal lines in 月; 2.21.1)

a lookalike of Chinese 七 <SEVEN> (2.24.1)

I assume they must be in the book's indices under more conventional forms - but what are those forms?

Ah, the first was a variant of 298 <co> with a narrower bottom half and a curved lower stroke:

The very block with 298 from Xingzong was even discussed in Kane (2009: 71). Duh.

The Qidan xiaozi yanjiu hand copy also has some slight variations of characters I do recognize: e.g.,

243 <HEAVEN> and 240 <TEN>

are written with 𠂉 on top instead of ハ. As a result, 243 <HEAVEN> looks like 矢 204 whose phonetic value is unknown. Could 矢 204 be interpreted as 'heaven'?

I still have no idea what 七 is. Not only is it an unusual (for Khitan) shape, but it is also the only top element in a pyramid.

7. The Cantonese-only character 乸 <jaa2.MOTHER> for naa2 'female' has an unusual phonetic 也 jaa5. The rhyme is perfect; the initial is not. 乸 has puzzled me since I first saw it some time ago, but today I just realized that a j-phonetic 也 might have been chosen because there are phonetics representing both j- and n-syllables: e.g., 襄 soeng1 (with s-!) < *sInaŋ in

That j- ~ n- alternation goes back to a single Old Chinese *n- that developed two reflexes: *n- before nonhigh vowels and palatal *ɲ- before high vowels.

也 had Old Chinese *l-, another source of Cantonese j-. *l-characters normally aren't phonetics in Cantonese n-characters.

Cantonese speakers would not know which j- are from *n- and which j- are from *l-, so whoever came up with 乸 might have thought, 'if 襄 can stand for j- and n-syllables, 也 can too', unaware that 也 jaa5 isn't from *n- (and hence 'shouldn't' represent Cantonese n-syllables).

8. I missed Andrew West's tweet on a cursive Tangut tablet from the Baisigou pagoda.

9. Marijn van Putten on the mystery of Mehmet.

YELLOW PIG 11/21

<so nggiyan uliya aniya juwa emu biya orin emu inenggi>

'yellow pig year, ten one month, twenty one day'

1. orin emu 'twenty one' is a para-Mongolian (Khitan?)-Jurchen hybrid. Compare with Written Mongolian qorin nigen 'twenty one' containing an unrelated Mongolian word for 'one'.

2. I wish I could look more into exceptions to 'Altaic' vowel harmony. Two examples that have long stuck in my mind:

More recently I came across Manchu age 'older brother' (not ege or aga!; see Hauer and Corff [2007]: 7). Rozycki (1983: 22) regards age as somehow related to Written Mongolian aq-a¹ 'id.': "The correspondence is ancient and direction of loan impossible to ascertain." Could this be an anne-like case of intimate deformation?

I couldn't find age or other similar Manchu words like ahūn 'older brother' in Doerfer's Mongolo-Tungusica (1985), so I suppose Doerfer does not think there is any connection between the Manchu and Mongolian words.

What finally pushed me to write about Manchu age was seeing Manchu ajige 'small, little, young' (not ejige or ajiga) on Saturday night. Its root is aji-, also found in ajida 'small' and ajigan 'young, small' which are harmonic. majige 'little' is similarly nonharmonic with similar semantics. Are these cases of cute deformation? Imitating the speech of small children who have not yet mastered vowel harmony? I can't quickly find any article on L1 Turkish vowel harmony acquisition (DuckDuckGo results are often unsatisfying), but Leiwo, Kulju, and Aoyama (2006?) cover Finnish vowel harmony:

The data showed that most of Finnish 2;6-year-olds’ productions do not violate FVH [Finnish vowel harmony], suggesting early mastery of FVH. When there were errors in children's productions, they were mostly substitutions of back vowels for the front rounded vowels.

... which is the opposite of the substitution that occurred in Turkish anne! (Or centuries ago in barmis.)

Unlike Finnish or Turkish, Manchu does not have palatal harmony. Manchu age, etc. have a high series vowel e [ə] in place of its low series counterpart a. But if I 'translate' the Finnish error pattern into Manchu, I would expect substitutions of low series vowels for high series vowels. Which is the opposite of what happened in age, etc.

There is, however, a common denominator: Finnish vowel harmony errors occurred "especially in non-initial syllables and in suffixes" (Leiwo, Kulju, and Aoyama 2006: 151), and the Turkish and Manchu violations above are also in noninitial position: -mis, anne, age.

Incidentally, Aoyama Katsura is a former classmate of mine.

¹The hyphen is a device to transliterate the obligatory space in the Written Mongolian spelling <aq a>; it has no morphological or phonological significance.

3. Looking at Tangut


4440 2len4 'pavilion' (#189 in The Golden Guide)

led me to wonder: Why did Middle English pavilloun become modern English pavilion? Was -i- restored by someone who knew its Latin source pāpiliō 'butterfly'?

4. Today I started copying the epitaph for Emperor 興宗 Xingzong (1015-1054) of the Khitan Empire. I haven't gotten to line 4 yet, but I looked ahead and spotted block 24

<096.339.140> <?.i.en>

of line 17.24.

The only other instances of 096 that I know of are in the block

<096.339> <?.i>

in the epitaphs for Mme. 耶律 Yelü (11.20) and 蕭敵魯 Xiao Dilu (1061-1114; 30.19 and 34.14).


096 is similar in shape to 095, a lookalike of Chinese 女 <WOMAN>. 095 is more common than 096 and can occur in medial and final positions in blocks. These different distributive patterns suggest that 096 represents a more complex phonetic sequence than 095 - one that so far is only known from the beginnings of words. On the other hand, whatever 095 represents may be more complex than, say, 339 which is simply [i]?

Both 095 and 096 probably represent one or more syllables absent from Liao Chinese, as neither appears in Khitan transcriptions of Chinese. They may contain

I doubt that 095 or 096 represent single segments. I suspect that all the single-segment phonograms of the Khitan small script have been found by now.

As far as I know, as of 2016 there were 482 known small script characters including variants. Have any new ones been found lately? The only new small script texts found lately to the best of my knowledge are fragments of jade tablets from a mausoleum. If this photograph is representative, the texts are too short to be likely to contain any character that hasn't surfaced in any previously known, much longer texts.

5. Today I finally got Jun Jiang's Learn Manchu Handwriting on my iPhone. As neat as it is to see a finger trace strokes on a screen, I wish I could double-check the direction and order of strokes with another source. And I'm not yet accustomed to the wheel interface.

6. Today I also got Jun Jiang's Mongolian Words & Writing app, but I haven't tried it out yet. Users hoping to learn Mongolian Cyrillic will be disappointed since the app only covers the traditional script. I'd like to know how to write Ө <Ö> and Ү <Ü> in cursive. (The rest of the alphabet is identical to Russian, and I've been writing Russian in cursive since 1997.)

7. Jun Jiang's store doesn't have any app for Mongolian Cyrillic, but it does have these apps:

I assume those apps have the same interface as the Manchu app.

So much for my original guess that Jun Jiang might be a Manchu and Mongol specialist.

8. Wikipedia's sample of the traditional Mongolian script is (turn 90 degrees clockwise for the proper orientation - alas, that way the first line is on the right instead of the left where it should be):

ᠴᠣᠷᠢ ᠢᠢᠨ ᠭᠠᠭᠴᠠ

cori yin ghaghca

'single GEN single': i.e., 'the one and only'

ᠪᠣᠰᠤᠭ᠎ᠠ ᠪᠢᠴᠢᠭ᠌᠄

bosugh-a bicig:

'vertical script:'

ᠮᠣᠩᠭᠣᠯ ᠪᠢᠴᠢᠭ᠌

mongghol bicig

'Mongol script'

I don't know what is meant by 'one and only' since there are other vertical scripts, and even if one is only thinking of major vertical scripts written from left to right, the Mongolian script is not unique since the Manchu script is written the same way.

ghaghca has a synonym ghanca. How can that word-medial -gh- ~ -n- alternation be explained - assuming they are related words?

9. Today while double-checking the Li Fanwen number for the common Tangut character


4457 2leq3 'great'

I found these interesting characters which appear to be semantic compounds:


4445 2bi1 = 4457 2leq3 'great' + 2547 1chir2 'right'


4454 2ryr1 = 4457 2leq3 'great' + 2920 1zhyq3 'left'

2920 has the Tangraphic Sea analysis


2920 1zhyq3 'left' = all of 3485 1laq 'hand' + right of 4454 2ryr1

which cannot be taken at face value as the origin of the character - why would a character for a common word 'left' be based on a rare character 4454?

4445 and 4454 are only known as members of these compounds:


4445 0661 2bi1 2ngon4 'South Sea'


4454 0661 2ryr1 2ngon4 'North Sea'

4445 and 4454 are not the normal words for 'south' and 'north' which are


4796 1zyr4 'south' and 0942 1laq3 'north'

Although the Tangut script is thought to be full of semantic compounds, it is curious that 4445 and 4454 - glossed by Li Fanwen (2008: 706-707) as 'south' and 'north' - do not contain any components in common with 4796 and 0942, the graphs for the common words 'south' and 'north'.

Nonetheless Li's glosses make sense: 4445 has the notation


4796 0661 1zyr4 2ngon4 'southern sea'

in Homophones D and is a definition for 4796 'south' in Tangraphic Sea 89.251. And if 4454 contains 'left', the opposite of the 'right' in 4445, then 4454 must be 'north', the opposite of 4445 = 4796 'south'. But I am hesitant to gloss 4445 and 4454 simply as 'south' and 'north'. Maybe 'Great South' and 'Great North' or even as 'Great Right' and 'Great Left'?

The association of 'south' with 'right' is reminiscent of Sanskrit dakṣiṇa- 'south/right'. Sanskrit uttara- 'north' can also mean 'left', but the normal word for left is vāma- which does not mean 'south'.

What were the Great South/Right and Great North/Left Seas? Were they mythical? I don't know much about how the landlocked Tangut perceived their world. How many Tangut had ever seen a sea? What is the etymology of 2ngon4 'sea'?

10. Today I saw this passage in Gorelova ( :15; I added the links):

The Mohes [靺鞨] called their tribal leader "damofo mandu" (chin. da [大] "great"), as one can see further, the Southern Shiwei [室韋], who can be identified as people of Tungusic descent, called their tribal chieftains "yumofo mandu".


The language spoken by the Mohe was Tungus-Manchu. What is important to mention is that the language of the Sushen could also be referred to as proto-Tungusic.

During the Tang era, the Mohe, similar to other peoples of northeastern Asia, were subjected to constant political and military pressure from Tang rulers. Soon after the Koguryo state of Korea had been defeated by the Tang empire (668 AD), a large portion of the Koguryo people fled into the lands of the Sumo Mohe [粟末靺鞨]. Soon a lot of towns, surrounded by defensive walls, arose there. Around 700, a new state, "Parhae" (chin. Bohai), raised from the ruins of Koguryo, was established. It was the leader of Sumo Mohe, Cicik Zhungxiang [乞乞仲象] who was considered the creator of Bohai. [...] Later, his grandson, Uazhi Da Tuyu, declared himself the emperor of Bohai, which in the course of time became highly cultured and enlightened, and widely known beyond the borders of the country. The Parhae (Bohai) state—a deserving successor of the culture and power of Koguryo and the tribal league of the Songari Mohe—flourished for 228 years until it was destroyed by the Qitans [Khitans] (926 AD) (Shavkunov, 1968; Crossley, 1997:18; Larichev, 1998:53-4).

What are the characters for damofo mandu and yumofo mandu which sound like modern Mandarin readings of old Chinese transcriptions?

I was surprised to see the Southern Shiwei described as Tungusic since their name - roughly pronounced *shirwi in Late Middle Chinese - is derived from the para-Mongolic autonym Serbi. But of course names are not reliable guides to linguistic affiliation.

Cicik Zhungxiang is a strange, not-quite-Pinyin romanization of 乞乞仲象 Qǐqǐ Zhòngxiàng with a -k whose motivation is obscure. Assuming the Chinese pronunciation favored in Parhae was like early Sino-Korean, 乞乞仲象 was pronounced something like *kər kər tyung syang. 乞乞 <BEG BEG> looks like an insulting ('derographic') transcription of a non-Chinese (i.e., Mohe) name. 乞乞仲象 is also known as 大仲象 with a Chinese-style surname 大 <GREAT> to go along with the Chinese-style disyllabic personal name 仲象 <SECOND.BORN ELEPHANT>.

Uazhi Da Tuyu is presumably 乞乞仲象's son (not grandson) 大祚榮 (Mandarin: Dà Zuòróng, Korean: Tae Cho-yŏng; r. 712-719), the first king (not emperor) of Parhae. I have no idea what Uazhi is.

11. The best for last: I just discovered Andrew West's Tangraphic Sea search tool! More Tangut resources here.

YELLOW PIG 11/20

<so nggiyan uliya aniya juwa emu biya orin inenggi>

'yellow pig year, ten one month, twenty day'

1. Jurchen and Manchu orin 'twenty' sounds like Written Mongolian qorin 'id.' The pronunciation of Khitan

廿 <TWENTY> (large script)

丁 <TWENTY> (small script)

is unknown; it could have been something like qorin.

Normally Written Mongolian q corresponds to h or k in Manchu.

Rozycki (1983: 11-12) proposes four layers of borrowing into (Jurchen/)Manchu to explain the different correspondences:

Layer 1: Mongolic *q- borrowed as *k- > *x- > *Ø- (within Tungusic): e.g., orin 'twenty'

Layer 2: Mongolic *q- borrowed as *k- > *x- (within Tungusic): e.g., hoton 'city wall' (cf. Written Mongolian qoton 'id.')

Layer 3: Mongolic *q- borrowed as k-: e.g., kobkolo- 'to remove (paper stuck to a surface)' (cf. Written Mongolian qubqol- 'to peel')

Layer 4: modern Mongolic *q- > x- borrowed as h-

This model could be refined: e.g., in the early layers, the borrowing was probably from para-Mongolic (specifically Khitan) rather than from Mongolic.

There doesn't seem to be any way to distinguish between layers 2 and 4 on the basis of Manchu evidence. I suppose Rozycki assigns Manchu words to layer 2 if the borrowings are found elsewhere in Tungusic (e.g., see Doerfer [1985: 81] for hoton-type Tungusic words). Layer 2 words were borrowed into early Tungusic, whereas layer 4 words were borrowed only into (Jurchen/)Manchu.
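
Here is a rough sketch of the four layers as a lookup from the Manchu reflex of Mongolic *q- to the layers compatible with it. The dictionary and function names are my own and are not Rozycki's; the reflexes are only the ones listed above.

```python
# Manchu reflexes of Mongolic *q- in the four layers listed above.
# "" stands for zero (the initial was lost). All names are illustrative.
LAYER_REFLEX = {1: "", 2: "h", 3: "k", 4: "h"}

def candidate_layers(manchu_initial):
    """Return the layers whose expected reflex matches the Manchu initial."""
    return [layer for layer, reflex in LAYER_REFLEX.items()
            if reflex == manchu_initial]

for word, initial in [("orin", ""), ("hoton", "h"), ("kobkolo-", "k")]:
    print(word, candidate_layers(initial))
```

hoton comes back with both 2 and 4, which is the ambiguity just described: the Manchu form alone cannot separate those layers, so evidence from the rest of Tungusic is needed.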

2. The Khitan large script character 廿 <TWENTY> is identical to the standard Chinese character 廿 <TWENTY> which was pronounced *ɲip in Middle Chinese, a fusion of 二 *ɲi̤ 'two' and 十 *dʑip 'ten'. Wiktionary says the expected standard Mandarin reflex is rì, but the actual reflex is niàn because

[t]he irregular pronunciation (e.g. /nVm/ [with the nasal counterpart of the original coda /p/] dates from the Song dynasty, to avoid homophony with a vulgar word; see 入.

Let's see 入:

The regular Mandarin pronunciation [for 入 <ENTER>] as predicted from Middle Chinese is rì. The irregular sound change [to rù] is for taboo reasons - to avoid homophony with its derived vulgar meaning "to enter > to have sexual intercourse", nowadays represented by 日 (rì).

I would expect 廿 to be nhập in Vietnamese since 二 'two' is nhị and 十 'ten' is thập. Wiktionary lists five Vietnamese readings of 廿:

The normal Vietnamese word for 'twenty' is native: 𠄩𨑮 hai mươi 'two ten', which has its own contracted form hăm (with short ă instead of long a!).

3. Is it obvious to Koreans that the hangul title of the movie 독전 Tokchŏn (English title: Believer) is 毒戰 <POISON BATTLE> tokchŏn rather than 督戰 <SUPERVISE BATTLE> tokchŏn 'urging to fight harder'?

Only the second tokchŏn is in dictionaries. The first tokchŏn is a straightforward Koreanization of the title of its inspiration, the Chinese movie 毒戰 (Mandarin Dúzhàn, Cantonese Duk6 zin3; English title: Drug War).

The fact that some websites call the Korean movie 독전: 마약전쟁 Tokchŏn: mayak chŏnjaeng 'Poison Battle: Narcotic Wars' implies that Tokchŏn by itself might need clarification. In hanja that longer title looks redundant with two 戰 chŏn: 毒戰: 痲藥戰爭.

4. Naver's Korean-English dictionary gave this sentence as an example of tokchŏn:

암튼 '독전' 화이팅 할까요?

Amthŭn 'Tokchŏn' hwaithing halkkayo?

'Anyway, shall we do "Believer" fighting?'

That made me curious about the etymology of 암튼 amthŭn 'anyway'. Is it of recent origin? I couldn't find it in Martin et al.'s massive 1967 Korean-English dictionary or my old portable favorite, Dong-A's 1981 Korean-English dictionary.

I think 암튼 is an extreme example of contraction:

아무리 하려 하면 하든지

amu-ri ha-ryŏ ha-myŏn ha-dŭ-n-ji


Martin et al. (1967: 1093) derive 암 am 'surely' from


amuryŏmyŏn 'surely'

which according to Martin et al. (1967: 1073) is in turn a contraction of

아무리 하려 하면

amu-ri ha-ryŏ ha-myŏn


thŭn is a reduction of




Martin (1992: 834) translates -dŭ-n-ji as 'the uncertain fact that it has been observed that', 'whether it was (observed to be/happen)'. -ji can be dropped. That leaves hadŭn /hatɯn/. th- /tʰ/ looks like the product of syncope, metathesis, and fusion:

/hat/ > /ht/ > /th/ > /tʰ/

Metathesis is a regular process in Korean: /hC/ cannot surface as [hC].
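
The three steps can be sketched as string rewrites. This is only a toy illustration of the derivation above: the function names are mine, and the patterns are deliberately narrow (the vowel a, plain stops p t k) rather than a general account of Korean phonology.

```python
import re

def syncope(form):
    # Drop the vowel between /h/ and a following plain stop: /hat/ > /ht/.
    return re.sub(r"ha([ptk])", r"h\1", form)

def metathesis(form):
    # /hC/ cannot surface as such, so /h/ and the stop swap: /ht/ > /th/.
    return re.sub(r"h([ptk])", r"\1h", form)

def fusion(form):
    # Stop + /h/ merge into one aspirated stop: /th/ > /tʰ/.
    return re.sub(r"([ptk])h", r"\1ʰ", form)

def reduce_hat(form):
    """Apply syncope, then metathesis, then fusion."""
    return fusion(metathesis(syncope(form)))

print(reduce_hat("hatɯn"))
```

Applied to /hatɯn/, the intermediate stages are /htɯn/ and /thɯn/, matching the chain written out above.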

(12.16.0:16: The reduction of /hat/ to /tʰ/ above parallels the reduction of the first syllable of the Korean root 'to ride' between the 12th and 15th centuries:

12th c. *hʌta- > *hta- > 15th c. tha- /tʰa/

The 12th century form is preserved in Chinese transcription as 轄打 *xjaʔta in Jilin leishi. I have followed the conventional view by reconstructing *ʌ in the first syllable, but now it occurs to me that Chinese *-ja- might reflect a 12th century Korean *(y)e or *yə. Perhaps

pre-12th c. *heta- > 12th c. *h(y)eta- or *hyəta- > *hʌta- > *hta- > 15th c. tha- /tʰa/

I reconstruct *e as a front low series vowel in early Korean:




That *e later broke to (= in my modified McCune-Reischauer romanization), the most common yV-sequence in native Korean words.

In my scenario for 'to ride' above, *(y)e or *yə was reduced to *ʌ, the minimal low series vowel, before being lost. By that point Korean had developed vowel harmony, so the vowel in the first syllable had to be a low series vowel like the *a in the second syllable.)

5. More examples of metathesis in Korean:

암클 amkhŭl or 암글 amgŭl < /am(h) kɯr/

'useless knowledge, female writing, hangul'

수클 sukhŭl or 수글 sugŭl < /su(h) kɯr/

'useful knowledge, male writing, Chinese characters'

That pair of words is not only sexist but also reflects a Sinocentric worldview.

The final /h/ of /amh/ 'female' and /suh/ 'male' surfaces as aspiration following a stop which in this case is the /k/ of /kɯr/ 'writing'.

The variants with -gŭl are compounds in which 'female' and 'male' have been reinterpreted as /am/ and /su/ without /h/. /k/ voices after voiced segments: /m/, /u/, and the /n/ of han'gŭl /hankɯr/ 'great/Korean-writing'.
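
The two outcomes can be sketched as one small function. It is a simplification under my own assumptions: morphemes are given in a broad phonemic spelling, only plain p t k are handled, and the set of 'voiced' finals is just sonorants and vowels.

```python
PLAIN_TO_ASPIRATED = {"p": "pʰ", "t": "tʰ", "k": "kʰ"}
PLAIN_TO_VOICED = {"p": "b", "t": "d", "k": "g"}
# Assumption for this sketch: sonorant consonants and vowels count as voiced.
VOICED_FINALS = set("mnŋrl") | set("aeiouɯɛ")

def compound(first, second):
    """Join two morphemes and return a surface-like form."""
    onset, rest = second[0], second[1:]
    if onset in PLAIN_TO_ASPIRATED:
        if first.endswith("h"):
            # Final /h/ surfaces as aspiration on the following stop.
            return first[:-1] + PLAIN_TO_ASPIRATED[onset] + rest
        if first[-1] in VOICED_FINALS:
            # A plain stop voices after a voiced segment.
            return first + PLAIN_TO_VOICED[onset] + rest
    return first + second

print(compound("amh", "kɯr"), compound("am", "kɯr"), compound("han", "kɯr"))
```

The only difference between the amkhŭl and amgŭl outputs is whether 'female' is entered with or without its final /h/, which is the reinterpretation described above.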

Naver regards the -g-forms (amgŭl, sugŭl) as correct and states that the -kh-forms are erroneous (see here and here), though Martin et al. (1967: 1011, 1095) list only the -kh-forms. Does that indicate the reanalysis of 'female' and 'male' as being without /h/ has been completed over the past half-century? Not quite - the official standard for Korean still requires aspiration in, for instance, 암캐 amkhae 'female dog' < /amh kɛ/ (not 암개 amgae!) in which am- is still clearly 'female' (한글 맞춤법 Hangul Spelling 4.4.31 and 표준어규정 Standard Language Code 1.1.7). Perhaps the 'writing' words have lost their gendered associations.

I found amkhŭl in Martin et al. (1967: 1095) when looking in vain for amthŭn (ㅋ kh is before ㅌ th in Korean alphabetical order).

6. I was surprised to learn that 怒濤 <ANGER WAVE> (dotō) is a Japanese name for a kind of Faucaria plant.

(12.16.2:22: The same characters are the Chinese name [Mandarin nùtāo] for Faucaria paucidens.)

The Korean name for Faucaria tuberculosa is a combination of that and the kanji for the Japanese name of Faucaria tuberculosa (荒波 aranami 'wild wave') read in Sino-Korean: 怒濤荒波 nodo hwangpha.

Sino-Korean 怒濤 nodo 'angry wave' by coincidence sounds like the unrelated native Japanese word 喉 nodo 'throat' - and by another coincidence, Faucaria is from Latin fauces 'throat'.

7. I thought faucet might be related to fauces 'throat', and Wiktionary agrees, but Merriam-Webster gives a derivation I don't quite follow:

Middle English, bung, faucet, from Middle French fausset bung, perhaps from fausser to damage, from Late Latin falsare to falsify, from Latin falsus false

Falsetto turns out to be from falsus too.

The bottom of Merriam-Webster's entry for faucet led me to their Time Traveler feature showing what words were first attested in English in a given century: e.g., the 15th century (faucet, favored, feasible ...).

YELLOW PIG 11/19

<so nggiyan uliya aniya juwa emu biya oniohon inenggi>

'yellow pig year, ten one month, nineteen day'

1. Jurchen oniohon 'nineteen' is unlike either Manchu  juwan uyun 'ten nine' or Written Mongolian arban yisün 'ten nine'. It is a loan from some para-Mongolic language (presumably a nonstandard variety of Khitan) whose morpheme for '-teen' was something like *-hon (or *-kon if the Jurchen word was borrowed before the weakening of *-k- to -h- in Jurchen); yesterday's day number niuhun 'eighteen' has a high vowel harmonic variant of '-teen'. Janhunen (2003: 399) believes the Jurchen words for 'eighteen' and 'nineteen' have the same root before '-teen':

Could *o be related to Proto-Mongolic *onca 'unique'? If the root of unique is 'one', perhaps *o is 'one'.

What's not clear to me is why *a(y)i corresponds to u and o in Jurchen. Did *a(y)i reduce to a single vowel that assimilated to surrounding vowels (the *u of '-teen' and *o- 'one')?

2. I got interested in Southern American English vowels a year before I fell in love with Tangut in 1996. It's taken me 24 years to wonder if complex 'drawled' diphthongs like [æ̠ɛæ̠] in Southern American English might have parallels in Tangut. If they do, there would probably be no way to reconstruct them since no fine phonetic notation for Tangut has survived. A simple-looking Tibetan transcription of a Tangut rhyme like <e> might conceal something like [æ̠ɛæ̠] or even ဇိုင်ဂူ <zuiṅ gū> Mon [ʌ ei̯a] (Diffloth 1984: 53¹, 226).

Southern American English [æ̠ɛæ̠] goes back to *æ (and before that, *a) and <zuiṅ gū> Mon [ʌei̯a] goes back to Proto-Monic *-iəw (Diffloth 1984: 226). So I presume that similarly complex Tangut vowels also had simpler origins. I still reconstruct only six vowels in pre-Tangut: *u *i *a *ə *e *o.

¹Diffloth (1984: 53) uses an underscore to indicate "that portion of the vowel which is loudest", whereas I presume the underscore in Southern American English [æ̠ɛæ̠] is the IPA retraction symbol.

3. What kind of name is Onreitt? Onreitt Murtagh's name was so unlike those of her sisters Jean and Kate (the latter on the cover of Supertramp's Breakfast in America).

4. I just realized that methinks

Also found two other similar defective verbs: meseems and the pseudoarchaic (and obsolete) mehopes.

YELLOW PIG 11/18

<so nggiyan uliya aniya juwa emu biya niuhun inenggi>

'yellow pig year, ten one month, eighteen day'

1. Tonight Stephen Colbert made a joke about the new Finnish prime minister Sanna Marin using the pseudo-Finnish phrase Okey Bøömer.

That phrase is so un-Finnish - even un-Scandinavian:

I doubt a more pseudo-Finnish Okej Buumer would have amused as many English speakers, though.

2. How is a violin like a prison?

Votre Nicolas est au violon de la ville

'Your Nicolas is at the violin [i.e., in the prison] of the town'

- Erckmann-Chatrian, Histoire d'un paysan, 1789-1815

3. Some interesting Cantonese characters:

3a. Cantonese me1 'to carry on the back'

has several spellings:

3b. Cantonese mau1 'to squat' is spelled

YELLOW PIG 11/17

<so nggiyan uliya aniya juwa emu biya darhon inenggi>

'yellow pig year, ten one month, seventeen day'

1. Why isn't ambush embush?

2. Today I realized that Sanskrit and Okinawan represent two opposing approaches to mid vowel elimination:

Sanskrit has neutralization in two senses:

Okinawan vowels were polarized in the sense that they moved toward the points of the vowel triangle and away from the neutral center.

Both Sanskrit and Okinawan then developed new long vowels from vowel sequences:

(Normally, length in Sanskrit e and o is left unmarked because those vowels are always long, but I have marked their length here and below for clarity.)

Even later, Pali and modern Okinawan developed short e and o:

Pali shortened long ē and ō in closed syllables to avoid overlong syllables (long vowels followed by codas).

Okinawan borrowed Japanese short e and o without modification in 'English'.

Some native Okinawan words seem to have Pali-style shortening of overlong syllables:

However, unlike Pali, Okinawan does permit overlong syllables: e.g., the yn- of 'slowly' above and yn 'lightly, gently, weakly'.

YELLOW PIG 11/16

I can't decide on a title, so I'm bringing back a generic Jurchen date title since 2019 is the 1000th anniversary of the Jurchen large script:

<so nggiyan uliya aniya juwa emu biya nilhun inenggi>

'yellow pig year, ten one month, sixteen day'

1. Yesterday I ran out of time to write about the ᠣᠯᠬᠣᠨᠣᠳ <ulqunut> Olqonud, the tribe of Genghis Khan's mother Höelün. The Mongolian Wikipedia article about that tribe is titled Олхонууд <Olxonuud> with a long vowel уу <uu>. ууд <uud> looks like a plural ending, so I suppose Olqonud is 'the Olqons'.

How far back does that long vowel go? Janhunen (2003: 5) writes,

In spite of claims made to the contrary, it has been impossible to establish any quantitative correlation for the Proto-Mongolic vowels. While virtually all the Modern Mongolic idioms have distinctive long (double) vowels, these are of a secondary contractive origin. Occasional instances of irregular lengthening are observed in most of the modern languages, and in a small number of cases there would seem to be a correspondence between two peripheral languages, notably Dagur and (Huzhu) Mongghul, as in Dagur mood ‘tree, wood’ = Mongghul moodi id. < *modu/n. In spite of the seemingly perfect match, such cases are too few and involve too many counterexamples to justify any diachronic conclusion other than that of accidental irregular convergence.

Having said that, Janhunen (2003: 45) goes on to reconstruct a long vowel in *-UUd from an even earlier *-U-d. *-U- (later *-UU-) is a linker vowel of unspecified 'phonological gender' inserted between a final consonant and the plural ending *-d. *-U- is *-u- after masculine vowel stems and *-ü- after feminine vowel stems: e.g. (examples added 12.12.2:03),

*nom-ud 'books' (Janhunen 2003: 12); now Khalkha номууд <nomuud>

*cerix-üd 'soldiers' (Janhunen 2003: 64); now Khalkha цэргүүд <cergüüd>
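
The choice of linker vowel can be sketched as follows. The vowel classes are my assumption for the sketch (a o u masculine, e ö ü feminine, i neutral), and the function name is made up; it only reproduces the short-vowel *-U-d stage of the two examples above.

```python
# Vowel classes assumed for this sketch; i is treated as neutral.
MASCULINE = {"a", "o", "u"}
FEMININE = {"e", "\u00f6", "\u00fc"}  # e, o-umlaut, u-umlaut

def plural_ud(stem):
    """Attach the harmonizing linker vowel and plural *-d to a stem."""
    for ch in stem:
        if ch in MASCULINE:
            return stem + "-ud"
        if ch in FEMININE:
            return stem + "-\u00fcd"  # linker is u-umlaut
    # A stem with only neutral i gives this sketch nothing to go on.
    raise ValueError("no harmonic vowel in stem: " + stem)

print(plural_ud("nom"), plural_ud("cerix"))
```

The first harmonic vowel of the stem decides the class, so the i of *cerix is skipped and its e selects the feminine linker.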

Why would a linker vowel become long?

There is another Written Mongolian plural suffix ᠨᠤᠭᠤᠳ /ᠨᠤᠭᠦᠳ <nughut>/<nugut> which Janhunen reconstructs as *-nUUd (not *-nUgUd!). I guess <gh>/<g> is an orthographic pseudoarchaism: the logic being '-UU- is a long vowel, and long vowels in speech often correspond to <VghV>/<VgV> in writing, so -UU- should be written as <VghV>/<VgV> too': e.g. (examples added 12.12.2:21),

<yaghan nughut> jaghan nugud 'elephants', now Khalkha заанууд <zaanuud>

<cacag nugut> ceceg nügüd 'flowers', now Khalkha цэцэгнүүд <cecegnüüd>

¹Manchu moo 'tree, wood' also has a long vowel. Loanword or cognate? But I digress.

2. On Monday it took me a moment to realize that 홋카이도 <h.o.s kh.a Ø.i t.o> Hotkhaido on a sign in Honolulu stood for 'Hokkaido'. That got me thinking about the many ways kana have been transcribed in hangul. Although Japanese and Korean are typologically similar in many ways and also share a large amount of vocabulary of Chinese origin, they have very different phonological systems: e.g.,

One challenge for Korean transcribers of Japanese is distinguishing between Japanese voiceless and voiced obstruents. Here are several solutions to the problem from Wikipedia. I use /k/ and /g/ as examples:

initial /k/
noninitial /k/
1986 South Korean standard
2001 North Korean standard
Japanese colonial standard
/k/ [g]
Korean Language Society
/k/ [g]
1948 South Korean standard /k/
/k/ [g]
1963 South Korean standard
/kʰ/ /k/ [g]
Chhoe Yŏng-ae and Kim Yong-ok
/kʰ/ /k/ [g]

Japanese noninitial /k/ cannot be precisely replicated in Korean. The majority solution is to Koreanize it as /k/ even though Korean /k/ is voiced [g] in that position. The current South and North Korean standards Koreanize Japanese noninitial /k/ as voiceless /kʰ/ and /k͈/ respectively. Compare:

Japanese: /naka/ [naka] 'middle' vs. /naga/ [naga] ~ [naŋa] 'long'
majority solution: /naka/ [naga] vs. /naka/ [naga]
South Korea (1986): /nakʰa/ [nakʰa] vs. /naka/ [naga]
North Korea (2001): /nak͈a/ [nak͈a] vs. /naka/ [naga]
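
The comparison can be sketched in code for word-medial consonants only. The scheme labels and function name are mine, and the single voicing rule (plain /k/ becomes [g] between vowels) is a simplification that covers just the naka/naga pair.

```python
# Korean phoneme used for Japanese noninitial /k/ under each scheme above.
# Japanese /g/ is rendered as Korean plain /k/ in all three.
NONINITIAL_K = {"majority": "k", "south_1986": "kʰ", "north_2001": "k͈"}
VOWELS = set("aiueo")

def koreanize(segments, scheme):
    """Return (Korean phonemes, surface phones) for a list of segments."""
    phonemes = []
    for i, seg in enumerate(segments):
        if seg == "k" and i > 0:
            phonemes.append(NONINITIAL_K[scheme])
        elif seg == "g":
            phonemes.append("k")
        else:
            phonemes.append(seg)
    phones = []
    for i, seg in enumerate(phonemes):
        between_vowels = (0 < i < len(phonemes) - 1
                          and phonemes[i - 1] in VOWELS
                          and phonemes[i + 1] in VOWELS)
        # Plain /k/ is voiced [g] between vowels; /kʰ/ and /k͈/ are not.
        phones.append("g" if seg == "k" and between_vowels else seg)
    return "".join(phonemes), "".join(phones)

for scheme in NONINITIAL_K:
    print(scheme, koreanize(list("naka"), scheme), koreanize(list("naga"), scheme))
```

Under the majority solution both words come out as /naka/ [naga], while the two current standards keep them apart at the cost of turning Japanese /k/ into an aspirated or tense stop.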

Japanese initial /g/ also cannot be precisely replicated in Korean.

The majority solution is to Koreanize it as /k/ [k].

The most interesting solution is the colonial one: Japanese /g/ is transcribed as <k> with a circular diacritic. I presume <°k> was to be read as [g] even in initial position. There are two interesting things about that diacritic. First, <°> in Japanese indicates a voiceless stop [p], not voiced obstruents. Second, <°> in Japanese is placed to the top right of kana, not the top left. I suspect a circle was chosen because it was a shape that already existed in hangul unlike the Japanese voicing diacritic ゛.

Japanese noninitial /g/ can also be pronounced as [ŋ], but that nasal variant is not reflected in any of the above Koreanizations, even though Korean does have /ŋ/ [ŋ] in noninitial position: e.g., Japanese [naŋa] 'long' sounds like Korean 낭아 <n.a.Ø Ø.a> /naŋa/ [naŋa].

ANOTHER EMPRESS XUANYI (PART 2)

The Japanese Wikipedia has yet other renderings of the name of Genghis Khan's mother Höelün Üjin 'Lady Hoelun', a.k.a. Empress 宣懿 Xuanyi:

The katakana spelling looks like a transliteration of Höelün sans diacritics.

The Old Mandarin spelling 月也倫 *ɥe je lun has front vowels unlike the Secret History spelling 訶額侖 *o o lun. The first spelling seems to represent [øelyn], whereas the second spelling might represent [hoəlun]. Do the spellings represent two different Mongolian dialects: one with Turkic-style palatal harmony and another with height harmony?

vowel class
Written Mongolian
palatal harmony dialect
height harmony dialect

The first character of the Yuan shi transcription is crucial: Old Mandarin 月 *ɥe cannot stand for a simple [o] which would have been transcribed as Old Mandarin *o. And Mongolian vowel harmony dictates that vowels within a word must match in terms of 'gender': feminine [ø] must be followed by feminine [e] and [y]. Old Mandarin had no syllable *e, so 也 *je was the best available approximation of [e]. Old Mandarin had no syllable *lyn, so 倫 *lun was the best available approximation of [lyn].

The second character of the Secret History spelling 訶額侖 is also crucial: Old Mandarin 額 *o cannot stand for a simple [e] which would have been transcribed as Old Mandarin *je. Old Mandarin had no syllable *ə, so 額 *o was the best available approximation of [ə]. In theory 額 *o could even represent [o] or [ø], but the Written Mongolian spelling <a> for this vowel rules out rounded vowels which would have been spelled as <ui>. The other characters are ambiguous out of context:

The Arabic script transcription is ambiguous: <ʔwlʔwn> could represent either [øelyn] or [oəlun] - or even other possibilities that the Chinese and Mongolian spellings rule out: e.g., [ulun].

The Arabic script transcription <fwjyn> looks like a straightforward transcription of Old Mandarin 夫人 *fu žin 'lady' rather than the Mongolian borrowing of that Chinese word as üjin 'id.'

2. The English Wikipedia article on Höelün says

also had a nephew named Palchuk who married a sister of Genghis Khan (Temülün, whose name is misspelled as "Temulin")

The name Palchuk has an un-Mongolian initial p-. Earlier *p- became h- or zero in Mongolian. If Palchuk isn't Mongolian, what is it? It sounds Ukrainian. But seriously ...

The Japanese Wikipedia, on the other hand, says Genghis Khan's sister 帖木倫 Temülün married 不禿 Butu of the Ikires. Höelün was of the Olqonud, not the Ikires, so a nephew of Höelün would be likely to be of the Olqonud too.

ANOTHER EMPRESS XUANYI (PART 1)

When I refer to "Empress 宣懿 Xuanyi" on this blog, I refer to 蕭觀音 Xiao Guanyin (r. 1055-1075) of the First Khitan Empire.

But it turns out there are two other Empress Xuanyis:

Today the spelling of Höelün Üjin 'Lady Hoelun' in a 1908 edition of the Secret History of the Mongols caught my eye:


Old Mandarin *o o lun u tʂin

Where's the Old Mandarin *x- that should correspond to Middle Mongolian (MM) h-? *o looks like an error for 訶 *xo.

If the Secret History were all that remained of Mongolian, we might have to assume Höelün was Xoolun. How do we know Xoolun stood for Höelün? Even if we didn't have modern Mongolian Өэлүн <Öelün>, we could still get closer to the original via the Written Mongolian (WM) spelling ᠥᠭᠡᠯᠦᠨ <uikalun>:

Putting the MM and WM evidence together, I could reconstruct a Proto-Mongolic name *Högelün. (There is no 'Old Mongol'.) *h- goes back to an even earlier *p-.

The word üjin (WM <uijin>) 'lady' is a borrowing from Late Middle Chinese or Liao Chinese 夫人 *fuʐin 'id.' That word was also borrowed into Khitan as

<> pusin.

I'm surprised Chinese *f- wasn't similarly borrowed into pre-MM as *p- which would have become MM h-, not zero. Was *fuʐin borrowed into pre-MM as üjin without any initial consonant?

THÁNH GIÓNG (PART 2)

1. How was the name 揀 Gióng pronounced in earlier Vietnamese? The vowel is certain. The rest is not:

gi- could be from *kj-, *CVc-, or *pl-
-ng could be from *-ŋ or *-n

Nom spelling variants of Gióng might be able to narrow down the possibilities:

All three types of variants share the semantic element 扌 <HAND> since gióng means 'to beat (a drum)'. So does the name Gióng mean 'The Drumbeater', or is it an unrelated homophone written with characters originally devised for gióng 'to beat (a drum)'? (No, see below.)

1a. The initial of Gióng

Trần Quốc Vương's "The Legend of Ông Dóng from the Text to the Field" (1995) has made me rethink everything I just wrote above. Here's what I think happened now:

Trần and Cao Huy Đỉnh (1967) think Dóng is related to dông 'storm', but the vowels and tones do not match, so I think the words are unrelated. I can't find any Vietnamese word dóng other than the name, but I wonder if the name might have cognates in other Vietic languages.

1b. The coda of Gióng

The oldest spelling 扶董 points to *-ŋ. So do 𢫝𢶢. 揀 has an -n-phonetic, but is overruled by 扶董; it must be a later spelling created by someone speaking a nonnorthern dialect in which *-n shifted to [ŋ].

(Note that -on and -ong have not become homophonous in any dialect as far as I know: the distinction between the two in nonnorthern dialects is [ɔŋ] vs. [awŋ͡m] corresponding to [ɔn] vs. [awŋ͡m] in the north.)

1c. A chronology of spellings of Gióng:

Trần (1995: 27) "would like to conclude that the impact of Indra [of the Cham] on the portrayal of Phù Đổng is undeniable. In other words, Phù Đổng Thiên Vương [Heaven King] is, in fact, the Vietnamese metamorphosis of Indra."

I'd like to read an article on the Cham element in Vietnamese culture. Unfortunately recovering similar substratal elements in the Korean and Japanese cultures would seem to be more difficult given the extinction of other cultures on the peninsula and in the islands; we can't say belief or practice X is from Y if we don't even know what Y is like.

Sino-Vietnamese 天 Thiên 'heaven' in 扶董天王 Phù Đổng Thiên Vương 'Heaven King Phù Đổng' sounds like pʰatʰɛ̂ːn 'sky' in the Vietic language Thavung which unlike Vietnamese doesn't have an enormous number of Chinese borrowings. It took me almost three hours to realize that pʰatʰɛ̂ːn is a borrowing from Lao ຟ້າແຖນ [fȃː tʰɛ̆ːn] 'sky' (poetic), a synonym compound of native Lao [fȃː] 'sky' and [tʰɛ̆ːn], a Lao borrowing from Chinese. Were all Thavung words of Chinese origin borrowed recently through Lao?

THÁNH GIÓNG (PART 1)

1. Most of the characters of Vietnam's mythical past have anachronistic Sino-Vietnamese names. One exception is 聖揀 Thánh Gióng 'Sage Giong'. 聖 is of course a Sino-Vietnamese title for 'sage' and not a name. But Gióng is an indigenous name with an unusual nom spelling 揀.

The Sino-Vietnamese pronunciation of 揀 is giản with -n, not -ng. Normally Chinese characters with Sino-Vietnamese readings  ending in -n are not used to write native Vietnamese words ending in -ng. But in this case and others like it, I wonder if whoever chose -n characters for native -ng words spoke a dialect with an [n] > [ŋ] shift: i.e., a dialect in which 揀 giản sounded like giảng which is closer to Gióng. Codas in central and southern dialects have undergone a chain shift:

[ɲ] > [n] > [ŋ]

[c] > [t] > [k]
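
A chain shift moves every coda one step at the same time, which is easy to get wrong if the changes are applied one after another. Here is a minimal sketch using a single lookup; the names are mine and only the six codas above are covered.

```python
# Coda chain shift described above: each coda moves one step, all at once.
CHAIN_SHIFT = {"ɲ": "n", "n": "ŋ", "c": "t", "t": "k"}

def shift_coda(coda):
    """Apply the shift once; codas outside the chain are left unchanged."""
    return CHAIN_SHIFT.get(coda, coda)

for coda in ["ɲ", "n", "ŋ", "c", "t", "k"]:
    print(coda, ">", shift_coda(coda))
```

Because each coda is looked up only once, original [ɲ] stops at [n] instead of sliding on to [ŋ], while original [n] ends up as [ŋ], which is the step that would make 揀 giản sound like giảng.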

If 揀 originated as a nonnorthern spelling, how was Gióng spelled in the north?

Nom is usually treated as a single body of characters even though it was in use for centuries throughout Vietnam. I'd like to see that body analyzed into geographical, dialectal, and chronological strata. Nom could tell us about when and where sound changes occurred: e.g., when and where -n and -ng characters were first confused (implying [n] > [ŋ]).


Last night I somehow got the idea that the Jurchen phonogram

<gon> [kɔɴ]

might be related to Chinese 恭 <REVERENT> (old and calligraphic forms) rather than 拳 <FIST> (as I wrote about two days ago). I don't know how 恭 <REVERENT> got into my head. I thought I might have seen it when I was scrolling through Wells' "" (2011), but it's not actually there.

In Late Middle Chinese, 恭 <REVERENT> was pronounced *koŋ (cf. Sino-Korean 공 kong) - a good match for Jurchen [kɔɴ]. No need to invoke Old Chinese as I did with 拳 <FIST>. The Jurchen character could simply be based on a Parhae variant of Late Middle Chinese 恭 *koŋ.

Tonight I was wondering if a machine could be 'taught' to find potential Chinese graphic cognates for Khitan and Jurchen characters. One could start training it with Khitan and Jurchen characters identical in shape with Chinese characters: e.g.,

Then one could introduce Khitan and Jurchen characters nearly identical in shape with Chinese characters: e.g.,

Ultimately, one would then 'feed' the machine Khitan and Jurchen characters that have no obvious Chinese graphic cognates and 'ask' the machine if there are any near-matches: e.g., for Jurchen <gon>.
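
As a very rough sketch of the query step, suppose each character has been hand-labelled with a set of graphic components. That is my stand-in for whatever image features a trained model would actually learn, and the labels and index below are invented placeholders rather than real decompositions.

```python
def jaccard(a, b):
    """Overlap of two component sets: 1.0 if identical, 0.0 if disjoint."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def near_matches(query_components, chinese_index, top=3):
    """Rank Chinese characters by component overlap with the query graph."""
    scored = [(jaccard(query_components, comps), char)
              for char, comps in chinese_index.items()]
    scored.sort(key=lambda pair: -pair[0])  # stable: ties keep index order
    return [(char, round(score, 2)) for score, char in scored[:top]]

# Placeholder data: "X1" etc. and the component labels are hypothetical.
index = {"X1": {"a", "b", "c"}, "X2": {"a", "d"}, "X3": {"e"}}
print(near_matches({"a", "b"}, index))
```

Identical-shape pairs would score 1.0 and nearly identical pairs slightly less, so the first two training stages above amount to checking that the scoring ranks known cognates at the top before trusting it on characters with no obvious match.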

One could even add a phonetic dimension to the search process and get the machine to favor potential Chinese graphic cognates with readings close to the Khitan or Jurchen readings (whenever known).

BAIQUIEN

Thirty-five years ago today, I got my first exposure to 連環畫 lianhuanhua - a copy of The People's Comic Book translated by Endymion Wilkinson - at a school fundraising carnival, unaware that just a short walk away, the University of Hawaii would one day have a lianhuanhua collection.

Here's a Sinification I had to DuckDuckGo because I couldn't guess what the original was: 白求恩 Báiqiúēn.

Select the blank area below to see what the original is:

白求恩 Báiqiúēn is (Norman) Bethune.

Bái is a Chinese surname that sounds like Be-.

I was surprised by qiú [tɕʰjow] for -thu- [θuː]. I would have expected t [tʰ] or s [s] instead of q [tɕʰ] for [θ]. But eventually I realized that 求恩 <SEEK FAVOR> is a meaningful verb-object sequence as well as a loose approximation of -thune.

I had a vague memory of Wikipedia having an article on conventions for Sinifying foreign names. This wasn't it, but it was interesting nonetheless. It reminded me of the 佛菻 Fulin problem that I wrote about almost ten years ago. It also taught me the term graphic pejoratives for what I've called derography (derogatory spellings).

JURCHEN FIST - BARRIER BREAKER

I've been slowly copying Kiyose's (1977) edition of the Sino-Jurchen vocabulary of the Bureau of Interpreters. This forces me to take a good look at characters and think about how they're pronounced.

Entry 43 is gonkeu 'mountain pass', a borrowing from northeastern Chinese 關口 *gonkeu 'id.' (lit. 'barrier mouth'):

<gon keu>

Note how the style of the two characters doesn't match. I can't find the second character in Jason Glavy's Jurchen font, in Jerry You's fonts, or in N3696. It seems to have been overlooked because it doesn't have an entry in Jin's (1984) Jurchen dictionary. It is identical in shape to N4631 1734 which is in Jerry You's Khitan large script font, so I've made an image of N4631 1734 in lieu of crafting a Glavy-style image.


<gon> [kɔɴ],

a transcription character for northeastern Chinese *gon-syllables (觀冠館 as well as 關), resembles Chinese 拳 <FIST> (see old forms here) and even vaguely sounds like its Early Old Chinese reading *NI-kron. Later readings of 拳 do not have o-like vowels:

If the character were 'invented' c. 1119 according to the conventionally accepted scenario, why modify Jin Chinese 拳 *küen [kʰɥen] to represent the syllable gon [kɔɴ]? There was no shortage of  Chinese characters with *gon-like readings (e.g., the aforementioned 關觀冠館) that could have been phonetically more appropriate models for a Jurchen character <gon>. I think Jurchen <gon> was inherited from an older tradition going back to a time when 拳 had *o in Chinese:

stage 1: 拳 Early Old Chinese *NI-kron 'fist'
stage 2: Serbi script (graph shape unknown); *<gon>-like reading (and other non-Chinese-based readings?)
stage 3: Parhae script (graph shape unknown); *<gon>-like reading (and other non-Chinese-based readings?)
stage 4: Khitan large script; Jurchen large script

Following Janhunen (1994: 114), I regard "the Khitan and Jurchen 'large' scripts [...] as parallel, rather than successive, developments" of the Parhae script, so I do not think the Khitan large script <gon>-like character from the epitaph for the 太師 Grand Preceptor (1056) as written in Jin (1984: 17) is ancestral to Jurchen <gon>. The two, however, should share a Parhae ancestor.

The problem with the above scenario - besides the fact that the hypothetical Serbi and Parhae ancestors of <gon> are not attested - is the huge gap of roughly 1,400 years between Early Old Chinese c. 1000 BC and the (unattested!) Serbi script of c. 400 AD.

It's not entirely implausible, though, that some archaic nonprestige dialect of northern Chinese preserved *o as late as 400 AD. The 8th century Old Japanese phonogram 支 ki (earlier read *ke) reflects an Old Chinese *kie whose initial had palatalized to *tɕ- in prestige dialects long ago. The practice of writing *ke as 支 originated from the Korean peninsula centuries earlier and must have started at a time when some northern Chinese dialect still preserved *k-. (支 ki in Taiwanese and other Min varieties in the south preserves the original initial to this day.)

Jurchen <gon> has variants that do not look much like Chinese 拳 <FIST>:

in line 2 of the monument commemorating the victory of Emperor 太祖 Taizu of Jin over the Khitans in 1114 (1185; the earliest attested form)

~ on the bottom of line 11 of the monument recording the names of successful candidates for the degree of 進士 jinshi in 1224 (Jin [1984: 17, 199] writes this character two different ways, and without seeing a photo or rubbing of the monument, I don't know which is correct)

(12.5.1:13: Jin and Jin [1980: 301] have the form with ㄴ in their hand copy of the text of that monument. I can't even find the character in this rubbing.)

The most 拳 <FIST>-like form

is first attested as a transcription of the first syllable of 觀音 *gonin 'Guanyin' in line 11 of the monument commemorating the foundation of 永寧寺 Yongning Temple (1413).

12.6.21:06: APPENDIX 1: Modern Chinese o-reflexes of 拳 Early Old Chinese *NI-kron 'fist' from Xiaoxuetang:

東鄉 Dongxiang
kʰuon 24
Pu-Xian Min
仙游 Xianyou
Eastern Min
福清 Fuqing
德慶 Deqing
西岸 Xi'an

Their k(ʰ)- does not reflect the original root-initial *k-; it is from a later fused *g- < *ŋg- < *ŋk- < *NIk-.

I do not know which of those forms is native and which is borrowed.

Moreover, I do not know the details of the phonological histories of those varieties, so I cannot be certain that their -(u)o- directly preserves Old Chinese *-o-. Late Old Chinese *-wɨa- or Middle Chinese *-wɨe- could have fused into -(u)o- later.

I have excluded reflexes like 弋陽 Yiyang Gan ɕʰyon 13 with fricative and affricate initials because they are less conservative-sounding: i.e., because their initials are no longer velar. But who knows, maybe their rhymes are more conservative than their initials.

I initially thought Yiyang Gan ɕʰ- was a typo for Yiyang Gan tɕʰ-, but Xiaoxuetang lists three other  characters read ɕʰyon: 穿權棬. All have [tɕʰ-] in standard Mandarin. If ɕʰ- is a typo, it's not an isolated one. On the other hand, Xiaoxuetang has 124 characters read with tɕʰ- in Yiyang Gan and 0 characters read with ɕʰ- and rhymes other than -yon. So either ɕʰ- has an extremely restricted distribution or it is a typo for the readings of one homophone set (穿權棬拳).

APPENDIX 2: 12.9.22:18: A history of 拳 'fist' from Old Chinese to modern standard Mandarin:

1. The root of 'fist' is *kron 'roll', which does not seem to occur by itself and therefore has no characters. It is also in 卷 *CI-kron-ʔ 'to roll'.

2. A prefix *NI- was added to this root.

The prefix has to have a nasal initial *N- to account for the later voiced initial (see steps 6-8 below).

The prefix has to have a high vowel *-I- to account for the later vocalism (see step 4 below).

The prefix seems to be a nominalizer: 'roll' > 'rolled thing' > 'fist'.

But Baxter and Sagart's (2014: 54) *N(ə)- was not a nominalizer; it converted transitive verbs into intransitive verbs.

Perhaps *NI- was *mI-. Baxter and Sagart (2014: 55) reconstruct an *m- that converts verbs into agentive/instrumental nouns and an *m- for body parts. But neither of these *m- (= *mI- in my system?) is a good fit: an agentive/instrumental noun from 'roll' should mean 'roller', not 'fist', and the body part prefix is added to nouns, not verbs.

3. *o broke to *wa: *NI-kron > *NI-krwan

I follow Starostin (1989) who posits *o-breaking at the 'Classical Old Chinese' stage immediately after the 'Preclassical Old Chinese' stage which is the earliest stage in his reconstruction.

4. A prefix *NI- with a high vowel triggered vowel bending in the following syllable: *NI-krwan > *NIkrwɨan with *a partly bent up to match the height of the unknown high vowel *I

5. The high vowel was lost: *NIkrwɨan > *Nkrwɨan

6. *N assimilated to *k- (if it wasn't already velar): *Nkrwɨan > *ŋkrwɨan

7. *k- assimilated to *ŋ-: *ŋkrwɨan > *ŋgrwɨan

8. *ŋ- was lost: *ŋgrwɨan > *grwɨan

9. *-r- was lost: *grwɨan > *gwɨan

10. *-a- fronted: *gwɨan > *gwɨen

11. *-ɨ- fronted: *gwɨen > *gwien

12. *-wi- fused into *-ɥ-: *gwien > *gɥen

13. The level tone developed two allophones: one in syllables with *voiced initials like *g- and another in syllables with *voiceless initials like *k(ʰ)-.

14. *g- aspirated and devoiced: *gɥen > *gʱɥen > *kʱɥen > *kʰɥen; the allophones of the level tone became phonemic after *g- and *kʰ- merged into *kʰ-

15. *kʰɥ palatalized: *kʰɥen > quán [tɕʰɥɛn]

What I wrote as *e might have been [ɛ] all along, but I have chosen a simpler symbol since there was no contrast between */e/ and */ɛ/ in diphthongs.
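The segmental part of the chain above (steps 3-12 and 14) can be sketched as a list of ordered string rewrites. This is only a toy illustration: the rule table and the function name derive are my own shorthand for the steps as listed, not part of any published reconstruction tool. The tone split (step 13) and the final palatalization (step 15) are left out, and steps 2-3 onward are taken as given rather than derived.

```python
# Ordered rewrite rules for 拳 'fist', keyed by the step numbers used above.
# Each rule is (step, old substring, new substring). Illustrative names only.
RULES = [
    (3, "o", "wa"),            # *o breaks to *wa
    (4, "-krwan", "krwɨan"),   # prefix vowel bends *a to *ɨa; prefix fuses
    (5, "NI", "N"),            # the high vowel of the prefix is lost
    (6, "Nk", "ŋk"),           # *N assimilates to *k-
    (7, "ŋk", "ŋg"),           # *k- assimilates (voices) after *ŋ-
    (8, "ŋg", "g"),            # *ŋ- is lost
    (9, "gr", "g"),            # *-r- is lost
    (10, "ɨa", "ɨe"),          # *-a- fronts
    (11, "ɨe", "ie"),          # *-ɨ- fronts
    (12, "wi", "ɥ"),           # *-wi- fuses into *-ɥ-
    (14, "g", "kʰ"),           # *g- aspirates and devoices
]


def derive(form):
    """Apply every rule once, in order, and return all intermediate forms."""
    history = [form]
    for step, old, new in RULES:
        if old not in form:
            # A rule that cannot apply means the ordering is inconsistent.
            raise ValueError(f"step {step}: {old!r} not found in {form!r}")
        form = form.replace(old, new, 1)
        history.append(form)
    return history


if __name__ == "__main__":
    print(" > ".join("*" + f for f in derive("NI-kron")))
```

Because each rule must find its input in the output of the previous one, the sketch also works as a quick consistency check on the ordering of the steps.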

Some of the relative chronology is unclear: e.g., 14 and 15 must have followed 13, but I'm not sure whether 13 followed 12, which must have followed 11.

(H)EARING HIPPOS

The Wiktionary entry for ear says its Persian cognate is هوش hush 'intellect' which surprised me because I expect Persian h- to correspond to English s-: e.g., Persian هفت haft : English seven. Wiktionary reconstructs 'intellect' at the Proto-Indo-Iranic level as *Hā́wšiH 'ears; understanding' and at the Proto-Indo-European level as *h₂ṓws 'ear'. Following Beekes, I interpret Proto-Indo-European *h₂ as *ʕ, and I interpret Proto-Indo-Iranic *H as a glottal stop (cf. Beekes' /ʔ/ < *H in Avestan). None of those sounds should correspond to Persian h which is from Proto-Indo-Iranian and ultimately Proto-Indo-European *s- (e.g., 'seven' which was *septḿ̥  in Proto-Indo-European). h- in Persian hush is irregular like the h- in Greek ἵππος híppos 'horse' < Proto-Indo-European *ʔéḱwos (cf. Persian اسب‎ asb 'horse' which has no h-). (The i of híppos is also irregular.)

Could the h- of hush and híppos be by analogy with h-words with similar semantics? But what would the models be? And how did h- appear in both the Kurdish and Persian forms of the word? h- seems to be an innovation in Middle Persian hōš (Old Persian ušiy has no h-). Did Kurdish acquire h- through contact with Persian?

What got me thinking about ears was a stand-up comic on the radio joking about boxen as a plural of box. That made me check what the Old English plural of box was (boxas) and look into Old English declension in general. Here's a colorful summary. Via Wikibooks I found earan 'ears' as an example of a real Old English n-plural.

Then I started thinking about French plurals and via Wikipedia found Mickael Korvin's nouvofrancet proposal to spell all plurals with -s, among other things (like respelling the -ais of the proposal's name as -et). Is there a book like Robbins Burling's Spellbound (now only $14.67 US on Amazon!) on French spelling reform?

ADDENDUM: 12.3.23:39: Today I realized that English ear and hear are near-homophones. I'm afraid to look for a folk etymology 'deriving' one from the other. To my surprise, Wiktionary derives hear from a Proto-Indo-European compound *h₂ḱh₂owsyéti < *h₂eḱ- 'sharp' + *h₂ows- 'ear' + *-yé- (denominative suffix) + *-ti (3rd person singular suffix). The h- is all that is left of *h₂eḱ- (and it is a remnant of *ḱ-, not *h₂- which I interpret as *ʕ-).

ARIRANG KOREA

Tonight I discovered the TVK2 channel which airs content from Arirang TV whose onscreen logo alternates between English and Korean. I can't find the Korean logo online. It looks something like this:


"Something", because the letter ㅇ has the same shape and size at both ends of the logo which is almost symmetrical. So all the hangul letters are on the same line, whereas in normal hangul, <ng> would be under <r.a>: 랑, not 라ㅇ. Are linear hangul logos 'in' now, or is this logo an outlier?

If the Khitan small script had survived into modern times, how would it have been computerized? Hangul blocks represent syllables, but Khitan small script blocks represent words (including inflected forms) which are far more plentiful than syllables in any language. In pre-Unicode days, the KS X 1001 encoding of Korean only allowed for 2,350 out of 11,172 possible modern Korean hangul syllables. There must be more than 2,350 or 11,172 possible Khitan small script blocks. Unicode and sophisticated character-combining fonts can handle the Khitan small script now, but how would computers thirty years ago have handled them? Would pre-Unicode computerization have popularized linearization of the Khitan small script?
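The figure of 11,172 modern hangul syllables falls out of simple arithmetic, and Unicode composes each syllable block from its letters by a fixed formula. Here is a minimal Python sketch of both points. The function name compose and the index arguments are my own; the constants (19 initials, 21 vowels, 27 finals plus 'no final', base code point U+AC00) are the standard Unicode hangul arrangement.

```python
# Counts of modern hangul letters in each syllable slot.
N_INITIALS = 19   # leading consonants
N_VOWELS = 21     # medial vowels
N_FINALS = 28     # 27 final consonants plus the 'no final' option

HANGUL_BASE = 0xAC00  # code point of the first precomposed syllable, 가


def compose(initial, vowel, final=0):
    """Return the precomposed syllable for the given slot indexes."""
    return chr(HANGUL_BASE + (initial * N_VOWELS + vowel) * N_FINALS + final)


if __name__ == "__main__":
    print(N_INITIALS * N_VOWELS * N_FINALS)  # number of possible syllable blocks
    # ㄹ is initial no. 5, ㅏ is vowel no. 0, ㅇ is final no. 21.
    print(compose(5, 0), compose(5, 0, 21))
```

No comparable closed formula could generate Khitan small script blocks, since the inventory of characters that can fill each slot of a block is far larger and blocks can hold up to seven characters.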

Back to Korean: I saw this episode of Gangnam Insider's Picks on TVK2 which mentioned Guardian: The Lonely and Great God at 17:00. The Korean title is

쓸쓸하고 찬란하神 – 도깨비

ssŭlssŭl-ha-go chhallan-ha-shi-n - tokkaebi

'lonely-be-and resplendent-be-HON-ADN - goblin'

= 'goblin that is lonely and resplendent'

The title is written entirely in hangul except for the honorific-adnominal suffix sequence -shi-n written with the homophonous hanja 神 <GOD>. I've never seen this kind of hanja wordplay in modern Korean before. (The use of hanja to write homophonous Korean words is, of course, a core practice of the extinct hyangchhal and idu writing systems.)

Here's that show title and much more in calligraphy.

CZUCHRY

Today UPtv's GilMORE the Merrier 153-episode marathon of Gilmore Girls ended.

One of the show's stars is Matt Czuchry who "is of Ukrainian descent on his father's side." I was surprised to learn that his name is pronounced [ˈzuːkri] in English rather than [ˈtʃuːkri] which is closer to the Ukrainian pronunciation of Чухрій <Čuxrij> as [tʃuxrʲij] (where's the stress?). I suppose [z] is a spelling pronunciation of Czuchry which looks like a Polish-style romanization. Did Czuchry's paternal ancestors come from western Ukraine?

I got the Ukrainian spelling of Czuchry from the Ukrainian Wikipedia (which unfortunately does not specify the stress). The Russian Wikipedia simply Russifies the English pronunciation of his name as Зукри <Zukri> [ˈzukrʲi] instead of Russifying his Ukrainian name as Чухрий <Čuxrij> [tɕuxrʲij].

While I'm on the subject of Ukrainian names, Wikipedia has a list of "somewhat comical" Cossack surnames. My favorite is Добрийвечір <Dobryjvečir> 'good evening'. Google shows that surname is alive and well in Ukraine today.

GHO GUO: THE COUNTRY OF 309

After spending almost all week on spellings of Khitan qudugh (if that is what


represent - see part 4 of "The Qudugh Question" for a different interpretation), I'd like to prove that I can think about other Khitan small script characters.

When looking for instances of isolated

140 en

in 契丹小字研究 Qidan xiaozi yanjiu (Research on the Khitan Small Script, 1985) while writing part 5 of "The Qudugh Question", I stumbled upon a hand copy of the text on a coffin containing ... 国 in line 4. Without seeing the actual coffin (photos would do), I think 国 - an exact lookalike of the Chinese character <COUNTRY> pronounced guó in modern standard Mandarin - might be a mistake for


Kane (2009: 72) transliterated 309 as <hó>. <h> is his symbol for [ɣ]. My uvular gh [ʁ] corresponds to his velar <h>. The acute accent indicates that a vowel may have "the same, or perhaps a similar pronunciation" as its unaccented counterpart. I am more agnostic about vowels, so I don't add any accents to avoid implying that all characters that Kane transliterates with accented <ó> share a vowel that distinguishes them from the characters that he transliterates with plain <o>. In this particular case, he needs an accent to distinguish between

076 <ho> and 309 <hó> (both <gho> in my system)

in his transliteration.

I don't know yet that 309 rhymes with

021 090 125 169

which Kane transliterates as <mó ó ió qó> to distinguish them from

021 090 168 <m(o) o qo>

in his transliteration. I can't find any <io> in his system, so I do not know why he transliterates 125 as <ió> on p. 302. He also transliterates 125 as <iáu> on p. 49 to distinguish it from

362 <iau>

which may be "an allograph" (and hence would have the same vowels). The problem of allography in the Khitan small script remains to be fully solved.

Let's focus on the problem of how to read 309. The key is how 309 seems to correspond to the transcription 訛 in the name 訛里本 (= Khitan Gholbun?) in 遼史 Liao shi (History of Liao, 1344; see Kane 2009: 72 for details). In 1344, 訛 was read as something like *o in Old Mandarin. But was 14th century Old Mandarin the language underlying the choice of 訛? In Liao Chinese 訛 was read as something like *ng(w)o. ng- is unlikely for a native Khitan word, and ngw- even less likely. (Initial ŋ- is uncommon in the 'Altaic' world, and ŋw- may be unknown except in loanwords.) So that seems to rule out interpreting 訛 in terms of Liao Chinese (though one must wonder how accurately Gholbun's name was preserved by the 14th century, two centuries after the fall of the [first] Khitan Empire in 1125 - the only datable Gholbun I can find is also known as 侯古 Hougu, sixth son of Emperor 聖宗 Shengzong [b. 972; r. 982-1031]).

Perhaps Khitan 309 gho [ʁɔ] was approximated in Old Mandarin as something like *o without any initial consonant since Old Mandarin had nothing like [ʁ]. 309 could not simply be o since a character for o has already been identified:


That character represents Liao Chinese *o in loanwords. 309, on the other hand, is apparently never in Khitan small script spellings of Chinese loanwords. That implies 309 represented a sound or a syllable absent from Liao Chinese. gho fits the bill: it has an un-Liao Chinese initial gh- disqualifying it from Khitan spellings of Chinese loans, and its vowel o matches the vowel of its apparent transcription 訛 *o.

11.30.21:57: APPENDIX: Other readings of 309

Liu (2009, 2014) reads 309 as u which doesn't match 訛 *o. It is, however, homophonous with Liu's readings of

076 e ~ u ~ ulu, 172 u, 245 u, 372 u, 131 u

which I read as gho, ugh, u, o, u, u, more or less following Kane (2009). (I can't explain the differences between the various u-graphs either. 131 is the usual graph for transcribing Liao Chinese *u, but 172 and 372 also transcribe that vowel. See Kane [2009: 246-247].)

Jishi (2012) reads 309 as k'ua ([kʰwa]?) which is even further from 訛 *o. 訛 did end in Middle Chinese *-wa (cf. the Middle Chinese-derived Sino-Korean reading 와 wa of 訛), but *-wa had become *-(w)o by the Liao dynasty, and 訛 never began with a voiceless stop, aspirated or otherwise.

THE QUDUGH QUESTION (PART 5)

Here are the contexts of the two Khitan small script blocks


from Part 4:

1a. 蕭仲恭 Xiao Zhonggong 35.28:

<qatun.i 343.p.en FORTUNE₂ m.gha.379>

queen-GEN wine?-GEN fortune-GEN N write-PFV

'... wrote the queen's wine's? fortune's ?'

<343.p> may be a noun possessed by the queen and possessing 'fortune' in turn. It may be a variant spelling of <342.b> 'wine' (?; Kane 2009: 76) and <342.p> (if <342.p.en> is a genitive). (<p>/<b> alternation is common in Khitan.) 342 and 343 are similar:

But 'wine's fortune' seems like an unlikely combination of words.

<m.gha.379> is in the slot for a noun possessed by the preceding genitive of 'fortune' and an object of the verb cer 'wrote', so I expect it to be a noun.

1b. 蕭仲恭 Xiao Zhonggong 47.26:

<343.p.en FORTUNE₂ t.ugh.ii c.iu.ur.094.c>

'?-GEN fortune-GEN N V-after'

'After ?'s fortune's N V-ed,' or

'After [?] V-ed ?'s fortune's N,'

There's <343.p.en> again and in the same position before 'fortune'.

<t.ugh.ii> could be a verb ending in a converb <ii>, but 'fortune-GEN' needs something to possess, so I regard it as a noun which is either the subject or object of the following verb.

<c.iu.ur.094.c> ends in a converb <c> that I translate as 'after' (following Kane 2009: 153-154).

2. 仁懿 Renyi 5.29:

<326.041 c.l.ugh s.tumu FORTUNE₄.ń hong.ghu p.ud.z.iu TWO en b.qo ○ HEAVEN hong di>

'? ? fortune-GEN.PL Hongghu Pudziu two ? son heaven clear emperor'

'... Pudziu Hongghu of ... blessings' two ? son / the Heaven Clear Emperor'

<326.041> is a hapax legomenon.

<c.l.ugh> is also a hapax legomenon; it might be the singular of <> (for cVlughad?; Xingzong 28.7) which might have a plural ending in -ad (but I'd expect a plural ending in -ud with vowel harmony!).

It's tempting to assume <tumu> is 'ten thousand' which it can be elsewhere but not here with a preceding <s>. There do not seem to be any prefixes in Khitan, and I don't know of any numeral beginning with s-, so <s.tumu> can't be interpreted as 'X ten thousands' with <s> representing a reduced form of a numeral X. <s.tumu> doesn't have a verb ending, so it is unlikely to be the end of a clause. It may be a noun or adjective modifying 'fortune'. It could also be a variant spelling of <s.313> which occurs four times in Zhonggong. Characters 312 <tumu> and 313 <?> are identical in shape except for the location of their right-hand dots:

<p.ud.z.iu> is a unique spelling of a title for Khitan noblewomen that appears elsewhere as <p.ü.z.iu>, <b.ü.z.iu>, and <p.ü.089.iu> (more here). The name of this pudziu is Hongghu, a hapax legomenon.

<TWO en> is strange. First, I would expect <TWO.en> as a single block. Second, reading <TWO en> as 'two-GEN' = 'of two' doesn't make sense in this context: why would 'son of two' be after the name and title of a woman? Third, 'second' might make more sense, but 'second' for masculine nouns like the following <b.qo> 'son' is <> ~ <>, not <TWO>. Fourth, <TWO> without a dot is grammatically feminine, not masculine. Fifth, <en> is almost never by itself; the only other instance of isolated <en> that I know of is in a wall inscription of the 萬部華嚴經塔 Wanbu Avataṁsaka Sutra pagoda.

'Heaven Clear' following a space of respect transliterated as ○ (and converted to '/' in the translation) is the Khitan era name corresponding to Liao Chinese 清寧 *cingning (now Qingning; 1055-1064) 'Clear and Tranquil'. The emperor of that era was 道宗 Daozong, first (not second!) son of Empress Renyi (birth name 撻里 Dali, not Hongghu!), the subject of this epitaph. In other words, Daozong isn't the <b.qo> 'son' before the respectful space.

Someone else is the son of two - Pudziu Hongghu and some man. Might the mystery words before 'Pudziu Hongghu' be the man's name? I could parse the mystery phrase as

X Hongghu pudziu two-GEN son

'son of the couple - X and Pudziu Hongghu'

X might be somewhere in <326.041 c.l.ugh s.tumu>. The hapax legomena <326.041 c.l.ugh> might be a name.

THE QUDUGH QUESTION (PART 4)

In Part 3 I mentioned that these proposed characters for qudugh 'good fortune' previously regarded as two-character blocks


were attested as parts of larger blocks in 契丹小字研究 Qidan xiaozi yanjiu (Research on the Khitan Small Script, 1985):


might be a genitive plural ending which can follow final consonants (Kane 2009: 135), but <> -an looks like a genitive ending for an -a-final noun (Kane 2009: 132) ... which qudugh is not. Could <FORTUNE₂> represent a Khitan cognate of Written Mongolian aja 'fortune'? Could all of the five <FORTUNE> graphs


represent that a-final word? qudugh is attested phonetically in Chinese transcription as 胡覩古 *xutuku and 胡都 *xutu¹. Could its Khitan small script spelling be something other than a form of <FORTUNE> - a block of multiple phonetic characters?

(On to Part 5)

¹11.29.11:33: The reasoning for the interpretation of *xutu(ku) as qudugh:

THE QUDUGH QUESTION (PART 3)

(Back to Part 2)

I put off my original plans for Part 3 just minutes ago when I spotted an unusual Khitan small script block configuration in 契丹小字研究 Qidan xiaozi yanjiu (Research on the Khitan Small Script, 1985: 457):

The normal three-character block configuration has two characters on top and one on the bottom:


<013.224.327> <?> (Gu 12.1)

A less common configuration has a wide horizontal character atop two characters:


<001.251.257> <?.n.em> (Ren 13.5)

This new (to me) configuration has a tall vertical character to the left of a stack of two characters:

<335.327.054> <> (Xing 27.12)

That combination looks like the single character

<380> qudugh 'good fortune'

and is even glossed on p. 500 of Qidan xiaozi yanjiu as 福 'good fortune'.

So I now think there are at least five different versions of the single character qudugh 'good fortune':


(11.28.23:41: Transliterations added.)

1. 380, previously regarded as <335.277>

2. serial number needed, previously regarded as <335.275>¹

3. serial number needed, previously regarded as <335.276>

4. serial number needed, previously regarded as <335.278>²

5. serial number needed, previously regarded as <335.327.054> (note the proportions of the components)

(On to Part 4)

¹11.28.15:30: I've extracted this character from a larger block in Qidan xiaozi yanjiu. See part 4.

²11.28.15:30: I've extracted this character from a larger block in Qidan xiaozi yanjiu. See part 4.

THE QUDUGH QUESTION (PART 2)

Yesterday in Part 1, I mentioned that one reason I didn't think

<380> qudugh 'good fortune'

in the Khitan small script was a sequence of two characters

<335.277> <ia.?>

was that qudugh does not begin with ia- (i.e., the reading of 335). But what is the reading of 277?

Maybe 277 has no reading. In 契丹小字研究 Qidan xiaozi yanjiu (Research on the Khitan Small Script, 1985), a quartet of characters

<277 275 276 278>

transliterated by Wu and Janhunen 2010 as <LUCK LUCK₂ LUCK₃ LUCK₄> always occur in second position after 335. I suspect that the three two-character blocks

<335.275> <335.276> <335.278>

are really three characters

(note the proportions of the components)

that are variants of

<380> qudugh 'good fortune' (Wu and Janhunen's <LUCK₅>).

So I think three new serial numbers are needed for those variants. And the serial numbers 275-278 should be retired except for historical purposes ("we used to think 275-278 were characters, but we don't anymore"). I don't object to 275-278 being in Unicode, as there has to be a way to discuss elements that were regarded as characters for decades.

I initially thought that 275-278 and

<448> (function unknown)

formed a graphic family sharing 厶 on top, but now I don't think 275-278 are independent characters. 448 seems to be isolated within the Khitan small script character inventory unless it turns out to be a variant of a character without 厶 on top.

THE QUDUGH QUESTION (PART 1)

Today I was copying line 11 of the Khitan small script epitaph for Liao dynasty Empress 宣懿 Xuanyi (1040-1075) which contains the word qudugh-er 'good.fortune-ACC'. In 契丹小字研究 Qidan xiaozi yanjiu (Research on the Khitan Small Script, 1985), the word appears as a block of three characters


However, looking at the actual inscription at Wikipedia, I see that the character looks like this:

Can you spot the difference?

It's the proportions of the two elements

in the top half of the block. In Qidan xiaozi yanjiu, they are roughly the same width, whereas in the actual inscription, the left element is about 60% the width of the right element. A narrow left-hand element is characteristic of many two-element Chinese characters: e.g., 福 'good fortune' in which the left side ネ is narrower than the right side 畐. To put it another way, both ネ and 畐 are narrowed in side-by-side combination, but ネ is more compressed than its neighbor. And the proportions of


are closer to those of the single character 福 than to a two-character sequence ネ畐.

So I see for myself now that Kane (2009: 81, 99) is right: <335.277> is a single logogram 380 (or 379, as he numbered it on p. 305):


I've always thought that was a single character because of a reason I'm surprised Kane doesn't bring up. 335 in other contexts is read ia, and obviously qudugh doesn't begin with ia-.

Kane (2009: 99) suggests that 380 may be "derived from the cursive form of the Chinese character 福 fu 'good fortune, happiness'." The site (词典网 Dictionary Net) has 23 samples of 福 in cursive. If Kane is also correct about his derivation of 380 (and I think he is), 380 is a rare Khitan small script character that not only mimics the shape of a Chinese character but also represents a Khitan word (qudugh) with a meaning similar to that of the word represented by that Chinese character (Liao Chinese fuʔ; modern standard Mandarin has lost the glottal stop). Usually Khitan small script characters with Chinese lookalikes have functions completely different from those of their apparent graphic models: e.g.,

shape Khitan small script
Liao Chinese


fourth Heavenly Stem

(none; phonogram)
third Heavenly Stem




Neither 335 nor 277

looks exactly like any Chinese character. And even if they did, their functions could not be guessed on the basis of their hypothetical Chinese lookalikes.

I have no idea why 335 has the reading ia, and I have no idea what the reading of 277 is. I have no time either, so I'll have to look into 277 ... another time.


(Back to Part 2)

The blocks of Jurchen characters resembling Khitan small script blocks in 弇州山人四部稿 Yanzhou shanren sibu gao (Draft [Catalog of] the Four Categories of Yanzhou Shanren['s Library]; 16th c.) and 方氏墨譜 Fang shi mopu (Mr. Fang's Ink [Cake] Book, 1588) exemplify an extreme version of the loyalty principle from part 2: each block is a translation or borrowing of a Chinese monosyllabic word in Chinese word order:

block 1
block 2
block 3
block 4
English gloss
bright > wise

heedful-if-? virtue
Ming Chinese *1'miŋ
English gloss bright > wise prince
Jurchen duin
English gloss four
Ming Chinese
English gloss four

(I have slightly altered Kiyose's reading of the Jurchen.)

The first line would have object-verb order in regular Jurchen:

'wise prince virtue heedful-if'

This special kind of Jurchen appears to be related to the highly sinified Jurchen of Ming dynasty petitions. I hypothesize that unlike the Japanese who read Chinese in a highly stylized Japanese that still maintained Japanese syntax and morphology, the Jurchen read Chinese in a highly stylized Jurchen with Chinese syntax and little morphology.

Notes on the words (the blocks will have to wait until part 4):

Line 1

1. genggiyen 'bright, wise': cf. Manchu genggiyen 'id'.

2. wang 'king, prince': a borrowing from Chinese; cf. Manchu wang. Kiyose wrote wan, probably since Jurchen only had -n (possibly [ɴ] as in Japanese) in native words, but the Bureau of Translators vocabulary transcribes this word as 王 *1'waŋ whose *-ŋ could either point to -n [ɴ] or even -ng as in Manchu. I would be more certain about wan if this word were transcribed in Chinese with a *wan character.

3. tiko-ci-ghun 'if heedful': cf. Manchu -ci 'if'. tiko- is not cognate to Manchu yohi- 'to pay heed to', and -ghun is a verbal suffix of unknown function without any Manchu cognate.

4. de 'virtue': a Chinese borrowing. Could this have coexisted with or even replaced a Jurchen cognate of Manchu erdemu 'virtue'?

Line 2

1. duin 'four': cf. Manchu duin. (I dropped Kiyose's -w- since there is no distinction between ui and uwi in Jurchen.)

2. tulile 'outside': cf. Manchu tule 'id.' (with haplology: i.e., loss of -li- before a similar -le?).

3. hiyen 'all': a Chinese borrowing. This word is literary in Chinese and is probably not the normal Jurchen word for 'all'.

4. andahai 'guest': cf. Manchu antaha 'id.' Did Ming Jurchen shift *nt to *nd? If it did, are words like fanti 'south' and fonto 'chestnut' (the only known cases of -nt- in the Bureau of Translators vocabulary) loanwords?

THE ETYMOLOGY OF CANTONESE 'TONGUE'

Wiktionary gives two etymologies for Cantonese 脷 lei⁶ 'tongue':

From 利 (“benefit; profit”), used as a euphemism for 舌 (“tongue”), which is homophonous to 折, 蝕/蚀 (sit6, “to be at a loss”).

Alternatively, it may be from 舐 (OC *ɦljeʔ [= *mI-leʔ in my reconstruction], “to lick”), preserving the Old Chinese initial *l- (Schuessler, 2007).

Both of these proposals have issues.

Let's start with the second one which is closer to my position. 舐 *mI-leʔ should hypothetically become Cantonese ˟sei5, not lei6. I think it might be more accurate to say that lei6 is related to 舐 *mI-leʔ rather than from it. lei6 may be from a derivative of 舐 *mI-leʔ that lost its first syllable and has a nominalizing suffix *-s ('lick-NMLZ' > 'that which licks' = 'tongue'):

*mIleʔ-s > *mIlies > *məlieh > *lie̤ > 19th c. li6 > lei6

The presence of *mə- blocked the Late Old Chinese sound change *l- > *j- from applying to -l-. *mə- must have been dropped at some point after *l- > *j-.
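The ordering argument can be made concrete with a toy sketch in Python. Everything here is illustrative: the function names are my own, the rules are reduced to bare string operations, and the unprefixed form lieh is a hypothetical comparison form that I have not reconstructed for any actual word.

```python
def l_to_j(form):
    """Late Old Chinese *l- > *j-, applying only to a word-initial lateral."""
    return "j" + form[1:] if form.startswith("l") else form


def drop_prefix(form, prefix="mə"):
    """Loss of the presyllable *mə-."""
    return form[len(prefix):] if form.startswith(prefix) else form


def attested_order(form):
    # *l- > *j- first, *mə- loss second: the prefix shields the lateral.
    return drop_prefix(l_to_j(form))


def reverse_order(form):
    # *mə- loss first, *l- > *j- second: the lateral is exposed and shifts.
    return l_to_j(drop_prefix(form))


if __name__ == "__main__":
    for form in ("lieh", "məlieh"):
        print(form, attested_order(form), reverse_order(form))
```

Only the first ordering lets a prefixed form keep l- while a bare l-initial form ends up with j-, which is the outcome that the l- of lei6 requires.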

Now for the homophone avoidance taboo etymology [revised 11.24.20:06]: It makes sense in Cantonese now, but would it have made sense at the Proto-Yue level? The 脷 lei6-type word for 'tongue' is widespread throughout Yue Chinese¹ (see this map), and therefore is likely to have been in Proto-Yue. 舌 'tongue' and 折 (蝕 is a Cantonese respelling) 'to be at a loss' are homophonous in the Middle Chinese phonological tradition and were probably also homophonous in Proto-Yue.

However, Wiktionary also reports 脷 for 'tongue of an animal' in Sichuan (i.e., Sichuan Mandarin) and Hakka, though it does not specify any dialects. There is no Yue in Sichuan, so 脷 there cannot be explained away as a Yue loan. The only Hakka 脷-like word for 'tongue' that I could find in Wiktionary is 脷錢 'tongue' in 陸川 Luchuan Hakka. 脷錢 in that variety and in 柳州 Liuzhou Mandarin is a borrowing from neighboring Yue dialects and is not evidence for reconstructing the word represented by 脷 back to the common ancestor of Hakka and Mandarin as well as Yue.

But if 脷 is in one or more varieties of Sichuan Mandarin, then the word represented by 脷 would be reconstructible in Old Chinese. (The character 脷 seems to be a relatively recent invention and cannot be 'Old Chinese'.) And in the early 2nd century, Late Old Chinese 舌 *ʑɨat 'tongue' and 折 *dʑiet 'to lose' might not yet have become homophones (I don't have any rhyming data for that period), so there would be no motivation to replace 折 with 利 *lis 'profit', assuming homophonic substitution existed 1900 years ago.

Without any source to verify that 脷 is in Sichuan, I wouldn't bet on that scenario.

So I need to go back up to the Yue level and ask:

1. How old is the practice of homophonic taboo substitution? Can it be reconstructed at the Proto-Yue level?

2. How common is the practice of homophonic taboo substitution in Yue varieties?

3. Where did the practice of homophonic taboo substitution originate?

4. How did the practice of homophonic taboo substitution spread: inheritance or diffusion?

¹The earliest attestation of the character 折 representing a word 'to lose' (not quite 'to be at a loss', but close enough) is in 漢書 Hanshu (Book of Han, 111 AD).

APPENDIX: Cantonese readings of 舐

Above I wrote that the expected reading of 舐 should be ˟sei5. 粵語音韻集成電子版 A Chinese Talking Syllabary of the Cantonese Dialect: An Electronic Repository has five readings:

Now for the first. As far as I know,

APPENDIX 2: How can Old Chinese *sl- fuse in two different ways?

The two fusions occurred at different times. I don't know the relative chronology, so I provide two different scenarios below.

Scenario 1: *sl- > *l̥- occurred first.

stage 1
stage 2
stage 3
stage 4
*sIl- *sI-
*l̥- *l̥- laai2
*sIl- *sIl-

Scenario 2: *sl- > *s- occurred first.

stage 1
stage 2
stage 3
stage 4
*sIl- *sI-
*s- *s- saai2
*sIl- *sVl-
*l̥- laai2

At an even earlier stage some or all *sl- could have been *sVl-. First vowel loss in both scenarios above is unpredictable; both monosyllabic and disyllabic variants of *s(I)-leʔ existed in stage 2, just as full and abbreviated variants exist in English today (e.g., select [səˈlɛkt] ~ [slɛkt]).

SINO-TIBETAN WORDS FOR 'TONGUE'

The native Cantonese word 脷 lei⁶ 'tongue' is reminiscent of l-words for 'tongue' found elsewhere in Sino-Tibetan: e.g.,

I wish I knew the Pyu word to complete the set of the 'big five' Sino-Tibetan literary languages, but Pyu basic vocabulary is all but unknown.

To keep things simple, I have not looked at other potentially related *l-words in Chinese, much less other Sino-Tibetan *l-words for 'tongue' or 'lick' available at STEDT.

Before one jumps to the conclusion that all of the above must share an *l-root, one should note Schuessler's (2007: 467) warning:

Initial *l- is a near-universal sound symbolic feature for 'lick / tongue', hence similar words in other languages are not likely to be related, such as MK-PVM [Mon-Khmer-Proto-Viet-Muong] *laːs 'tongue' [Ferlus]; Kam-Tai: S[iamese] liaA2 < *dl- 'to lick' [cf. ], PKS [Proto-Kam-Sui] *lja² ? [Thurgood].

Proto-Kra *l-maA 'tongue' (Ostapirat 2000: 223; cf. Proto-Kam-Sui *maA 'id.' [Peiros]), Proto-Hlai *liːnʔ 'id.' (Norquest 2016 appendix: 127), Proto-Tai *liːnC 'id.' (Pittayaporn 2009: 389), and Proto-Austronesian *lidam (on the basis of only Puyuma and Rukai; Blust and Trussel 2019) also fit the pattern. (A single Proto-Kra-Dai word for 'tongue' doesn't seem to be reconstructible.)

Continental 'Altaic' words for 'tongue' have noninitial l-: Ming Jurchen ilenggu ~ ilenggi, Written Mongolian kelen, and Turkish dil. (But peripheral 'Altaic' words don't: e.g., Korean hyŏ < *he and Japanese shita.)

European examples are English lick and Latin lingua 'tongue'. (The latter, of course, has an irregular l- < *d- which became the t- in tongue. Wiktionary derives the l- of lingua by analogy with lingō 'I lick', the true Latin cognate of lick. If we ignore that inconvenient fact, we could be daring and 'reconstruct' a 'Proto-World' *lV 'tongue/lick'. No.)

Schuessler was of course warning against linking Sino-Tibetan words to non-Sino-Tibetan words which happen to share the same initials, but lookalikes do also occur within families: e.g., lick and lingua. There could, at least in theory, be two unrelated lateral roots for 'tongue' in Sino-Tibetan.

Trying to reconcile the small set of Sino-Tibetan forms that I listed at the beginning runs into all sorts of difficulties:

Prelaterals (i.e., whatever comes before the L: prefixes or first syllables of disyllabic roots?): If Old Chinese *mI- and pre-Tangut *PI- are prefixes, what are their functions? Maybe the unknown pre-Tangut labial *P- was *m-. (The high vowel *-I- in both proto-forms is needed to account for the fronting of *a.) The labials in those prelaterals clash with Burling's Proto-Lolo-Burmese alveolar *s- and Hill's pre-Tibetan velar *ɣ-.

Laterals: Chinese and Proto-Lolo-Burmese have a voiced *l-, pre-Tangut has voiceless *l̥- (pre-Tangut *Sl- would correspond to Tangut l- + vowel tension), and pre-Tibetan has both voiced *-lʲ- and voiceless *-l̥ʲ- with palatalization that might be a trace of a preceding high vowel *-I-:

*Il̥- > *Il̥ʲ- > *ɣl̥ʲ-

*Il- > *Ilʲ- > *ɣlʲ-

Vowels: Three types are in the six words at the top:


- the same three stops that might have preceded *-s in pre-Cantonese.

If one regards the various codas as suffixes, one should ideally be able to identify the functions of those suffixes. Affixation can be a dangerous pseudoexplanation for mismatching segments in forms under comparison.

This exercise shows how far we are from being able to reconstruct Proto-Sino-Tibetan. Much more work needs to be done on subgroups before the outlines of their common ancestor can emerge.

