It's still the year of the pig in traditional East Asian calendars, but it's the year of the rat (2020) if one coordinates the Chinese animal cycle with the Gregorian calendar:

Last night I realized that Khitan small script character 216 might be a derivative of 118 <qu>:


Let's assume 216 was <qu*> with <*> indicating 'different from <qu> in some way'. Then

<216.151> 'rat'

would be read <qu*ghu> which is close to Written Mongol qulughana 'rat'.

What if <qu*> were <qul>? <qul.ghu> is close to qulughana, but I wouldn't expect Khitan u to correspond to Written Mongol a.

That's where I left off last night. Today I realized that <ghu> might be read <ugh> after a consonant. So maybe <216.151> was read <qul.ugh> which is even closer to Written Mongol qulughana and requires no vocalic gymnastics.

The low frequency of 216 (7 times in the 契丹小字研究 Qidan xiaozi yanjiu corpus and 0 times in initial position in Wu and Janhunen 2011 [whose index is organized by initial graphs]) suggests that it probably did not represent a simple CV syllable. If it didn't represent the CVC syllable qul, it may have represented a CVCV sequence qulu, and <qulu.ugh> was read qulugh.

The <qul(u)> hypothesis could be confirmed if 216 alternated with <qu.l>, <qu.ul> (= <qu.lu>?), etc.

As far as I know, 216 appears only in initial position with one exception: this block

<119.216> <dau.?>

from line 3 of the second inscription in the 萬部華嚴經塔 Wanbu Avataṁsakasūtra Pagoda in Hohhot.

2. I still practice writing Tangut, Khitan, and Jurchen (TJK) every day. Recently I added Manchu to my regimen and today I started writing Mongolian (in the traditional script - I still don't know how to handwrite Ө and Ү in Cyrillic).

All my TJK exercises begin with the date. I'm still going to date these blog entries in Jurchen since it's the thousandth anniversary of the Jurchen large script or close to it (see Kiyose [1977: 22] for three possible dates: 1119, 1121, and 1123; Kane [1989: 3] gives the date 1120, though Kane [2009: 3] gives the date 1119). Today's date in Jurchen is:

songgiyan uliya aniya juwa emu biya ice nadan inenggi

'yellow pig year, ten one month, new seven day'.
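The date formula above can be sketched in a few lines. This is only an illustration: the function names and the lookup table are hypothetical, the table contains only the numerals that actually appear in the dates of these entries, and reading 11-19 as 'ten' + unit is an assumption generalized from juwa emu 'ten one'.

```python
# Numerals attested in the dates of these entries only; anything else is
# deliberately left out rather than guessed.
NUMERALS = {1: "emu", 4: "duin", 5: "shunja", 6: "ninggu", 7: "nadan", 10: "juwa"}


def jurchen_number(n):
    """Spell 1-10 directly; spell 11-19 as 'ten' + unit (assumed from 'juwa emu')."""
    if n in NUMERALS:
        return NUMERALS[n]
    if 10 < n < 20 and (n - 10) in NUMERALS:
        return f"{NUMERALS[10]} {NUMERALS[n - 10]}"
    raise KeyError(f"numeral {n} is not attested in these entries")


def jurchen_date(year_name, month, day):
    """Build 'YEAR aniya MONTH biya ice DAY inenggi' for the 'new' days 1-10."""
    if not 1 <= day <= 10:
        raise ValueError("this sketch only handles the 'ice' (new) days 1-10")
    return (
        f"{year_name} aniya {jurchen_number(month)} biya "
        f"ice {jurchen_number(day)} inenggi"
    )


print(jurchen_date("songgiyan uliya", 11, 7))
```

Unattested numerals raise an error instead of being filled in, so the sketch cannot produce a form that is not backed by the dates written here.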

3. Last night I learned about prothesis in Bashkir:

The prothesis is mostly unsurprising, but these correspondences are:

1.2.11:00: I forgot to mention these cases of prothesis in native words:

Without more Bashkir data, I can't test my guesses for motivations: e.g., avoiding initial l- and making monosyllables disyllabic.

The Bashkir letter ҡ <q> surprised me since I'm accustomed to қ <q> from Kazakh, etc. Why do Bashkir and Siberian Tatar have their own special ҡ <q>? Siberian Tatars were educated in (Volga) Tatar which has к <k> for /k/ (including a [q] allophone) and къ <k"> for /q/.

4. Today I learned about the Caucasian Albanian script used to write a (near?-)ancestor of the Udi language.

I've thought Old Chinese might have had pharyngealized vowels, so I'm interested in the phonetics of Udi's pharyngealized vowels.

5. What is the etymology of Persian شمشیر <šmšyr> shamshir, first (?) attested in Middle Persian as <šmšyl>? It doesn't look Indo-European. Is it an areal word?

6. Why does the Persian word/name فرشته <fršth> fereshte < firishta sometimes appear as Farishta(h), e.g., in this 1958 Bollywood film title (फरिश्ता Phariśtā; cf. Urdu فرشته Firishta) and this list of Pashto (not Persian, I know) names?

YELLOW PIG 12/6

songgiyan uliya aniya juwa emu biya ice ninggu inenggi

'yellow pig year, ten one month, new six day'

1. Last night I looked up 'hip bone' and discovered it could also be called the innominate bone. Why 'nameless'?

2. Are 清樂 Shingaku 'Qing music' lyrics an overlooked source of data for premodern Mandarin reconstruction? In this sample from 月琴樂譜 Gekkin gakufu (Moon Guitar Sheet Music, 1877), 兒 (now ér [aɚ˧˥] in modern standard Mandarin) has the furigana ルウ <ruu>. That seems to indicate that the kana transcription is based on a dialect in which 兒 was pronounced like [ɻ̩]. (Other evidence rules out the most obvious interpretation [ruː]: e.g., no Mandarin dialect has [u] in 兒.)

The date of the text does not necessarily indicate that the [ɻ̩] pronunciation still existed in the source dialect as of 1877. The kana spelling ルウ <ruu> could have been copied from some earlier source.

ルウ <ruu> bears no resemblance to ジ <zi> [dʑi], the usual Japanese reading of 兒. Strictly speaking, the two Japanese borrowings are not from the same dialect in two different periods: <zi> is from a 7th century northwestern Chinese dialect, whereas <ruu> is from a Qing (perhaps 18th century?) Mandarin dialect. Nonetheless the latter probably underwent more or less the same changes as the former, so as a convenient fiction, here's how the sources of <zi> and <ruu> could be bridged:

Modern standard [aɚ] is from a stage 5-type form that developed a prothetic vowel:

*ɻ̩ > *əɻ > > [aɚ]

In some Mandarin varieties, only the prothetic vowel has survived without any trace of retroflexion: e.g., 壽縣 Shouxian [ə] and 鳳陽 Fengyang [a] for 兒.

It is tempting to derive Sino-Korean 아 a for 兒 from a Fengyang-like form, but that would be anachronistic. Fengyang [a] is probably a very recent development from *ar, whereas the earliest attested ancestor of 아 a is ᅀᆞ zʌ, borrowed from a form like stage 4 *ʐɻ̩. zʌ became ʌ in the 16th century, and ʌ then became a in the 18th century.

3. I don't understand how Korean z vanished without a trace. Lee and Ramsey (2011: 142) state that "early examples of the elision of z are all restricted to the environment _i, y, which suggests that the process of change started there." They give these examples:

In those particular cases, I can imagine /z/ being phonetically something like [ʑ] that lenited to [j] and then disappeared before /i/. But what were the intermediate stages between /z/ and zero in initial position before /ʌ/ as in 15th century /zʌ/ > 16th century /ʌ/?

I thought [ɦ] might be a possible intermediate stage by analogy with Sanskrit:

Proto-Indo-Iranian *ĵʱ > Sanskrit h [ɦ] but Avestan z

I assume there was a stage like *ʑʱ underlying both the Sanskrit and Avestan reflexes. (No, see topic 4 below.) That stage would be like Middle Korean /z/. In some modern Indic languages, Sanskrit initial h- has disappeared in reflexes of hima- 'winter'. I don't know if that's a regular change.

4. I've been trying to work out the phonetics of Proto-Indo-Iranic¹ (PII) reflexes of Proto-Indo-European (PIE) velars.

4.1. The PIE starting point:


4.2. The first palatalization in PII



4.3. Affrication in PII (cf. the alveolar affricate reflexes of Sanskrit palatals in some modern Indic languages)


4.4. The merger of plain velars and labiovelars


4.5. The second palatalization in PII


*ɟʱ *gʱ

Velars palatalized in certain environments. Compare:

4.6. The merger of *e and *o into *a made the second palatalization phonemic:

It was no longer possible to regard *c as an allophone of /k/ before /e/, since /e/ no longer existed. (The e of later Indo-Iranic languages is not from the earlier *e that merged with *a: e.g., Sanskrit e is from PII *ai which could be from PIE *ei or *oi but not PIE *e.)

1.1.0:59: The following sections deal with post-PII developments.

4.7. Pre-Sanskrit (Proto-Indic²) stage 1


*ɟʱ *gʱ

The affricate series palatalized. I thought the absence of *ts-type affricates in Proto-Dravidian might have pressured a shift away from alveolar affricates, but the traces of Indic in the Near East - far from Dravidian - underwent stage 2 (4.8 below): e.g., the name Paršasatar from praśāstar- 'director' with ś < PII *ts-.

4.8. Pre-Sanskrit (Proto-Indic) stage 2



Voiceless *tɕ simplified to *ɕ.

The voiced affricates merged with the voiced palatals.

I don't know the order of those two changes, so I show the results of both changes in the same table instead of arbitrarily showing one change at a time in two tables.

4.9. Sanskrit (Proto-Indic)

ś [ɕ]
j [ɟ]
h [ɦ]
gh [gʱ]

*ɟʱ weakened to h [ɦ].

4.10. Proto-Iranic (continuing from 4.6)



The voiced aspirate series merged with the plain voiced series.

4.11. Avestan

j [ɟ] g

The affricates deaffricated. The change of *ts to s is roughly parallel to the change of *tɕ to ś in Sanskrit. But note that Proto-Iranic *dz became Avestan z, whereas pre-Sanskrit *dz did not become Sanskrit ź [ʑ], a sound that does not exist in Sanskrit.

The exact phonetics of c and j are unknown. They were palatal unlike s and z, so I have projected palatal stops forward into Avestan. But maybe Avestan c and j were actually affricates.
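The post-PII changes described in 4.7-4.11 can be replayed mechanically as ordered substitutions. The sketch below is only that: the stage labels, the `derive` helper, and the intermediate voiced affricates dʑ and dʑʱ are my own illustrative notation (the text above only spells out *tɕ), and each segment is treated as an isolated unit without phonetic environments.

```python
# Each stage maps whole segments to their outputs; segments not listed
# pass through unchanged. dʑ and dʑʱ are assumed intermediate notation.
INDIC_STAGES = [
    ("4.7 affricates palatalize", {"ts": "tɕ", "dz": "dʑ", "dzʱ": "dʑʱ"}),
    ("4.8 tɕ > ɕ; voiced affricates merge with voiced palatals",
     {"tɕ": "ɕ", "dʑ": "ɟ", "dʑʱ": "ɟʱ"}),
    ("4.9 ɟʱ weakens to ɦ", {"ɟʱ": "ɦ"}),
]
IRANIC_STAGES = [
    ("4.10 voiced aspirates merge with plain voiced",
     {"dzʱ": "dz", "ɟʱ": "ɟ", "gʱ": "g"}),
    ("4.11 affricates deaffricate", {"ts": "s", "dz": "z"}),
]

# PII segments after 4.6: affricates, palatals, velars.
PII = ["ts", "dz", "dzʱ", "c", "ɟ", "ɟʱ", "k", "g", "gʱ"]


def derive(segment, stages):
    """Apply each stage in order; a stage rewrites the segment at most once."""
    for _label, rules in stages:
        segment = rules.get(segment, segment)
    return segment


for seg in PII:
    print(f"*{seg}: Sanskrit {derive(seg, INDIC_STAGES)}, "
          f"Avestan {derive(seg, IRANIC_STAGES)}")
```

Because the rules within a stage apply in a single pass, the unknown relative order of the two changes in 4.8 does not affect the output.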

4.12. Summing up

2nd palatalization
*kʲ n/a
*gʲ n/a
j [ɟ]
*gʲʱ/*gʱ n/a
*dzʱ h [ɦ]
*k/*kʷ +
*g/*gʷ +
j [ɟ]

j [ɟ]
*gʱ/*gʷʱ +
*ɟʱ h [ɦ]
*k/*kʷ -
*g/*gʷ -
*g g
*gʱ/*gʷʱ -
*gʱ gh [gʱ]

¹1.1.0:40: I favor the term Iranic by analogy with Turkic, Mongolic, etc. to avoid confusion with the country of Iran.

²1.1.0:57: I prefer the term Indic to Indo-Aryan, as the word Aryan is shared by both Indic and Iranic. Ironically, the name Indic is actually Iranic, as it is a Hellenization of Old Persian 𐏃𐎡𐎯𐎢𐏁 <ha i du u sha> [hi(n)duš] 'India', cognate to Sanskrit Sindhus 'Sindhu'. The Old Persian form has two Iranic innovations:

It occurs to me tonight that an Indic name for Indic would be Sindhic, but that's not going to catch on. No one is going to rename the country Sindhia either. And Hindutva advocates are probably not going to change the name of their ideology to Sindhutva.

YELLOW PIG 12/5

songgiyan uliya aniya juwa emu biya ice shunja inenggi

'yellow pig year, ten one month, new five day'

1. I checked Jan van Steenbergen's Interslavic page for updates and noticed a new item in the menu:

The Painted Bird (in Czech: Nabarvené Ptáče) a Czech-Slovak-Ukrainian film written, directed and produced by Václav Marhoul. It is based on Jerzy Kosiński’s novel The Painted Bird from 1965.


The action takes place in some unspecified East-European, Slavic-speaking country. A place that cannot directly be linked to a specific Slavic population requires a language that can instantly be recognised as Slavic but not be linked directly to any specific Slavic population either. That's why Marhoul decided to use Interslavic:

2. I just bought e-access to Vojtěch Merunka's Interslavic zonal constructed language: an introduction for English-speakers. Google says I can check a box to "Make [the book] available offline", but I can't find it.

On page 5, Merunka writes (12.31.14:03: links added),

Interslavic is also an interesting experiment of alternative history: If there was not such strong pressure from the Frankish Latin-oriented church (e.g. Wiching of Nitra and his band) against the Moravian Church in the 9th century, the invasion of the Hungarians into Central Europe and the subsquent collapse of contacts between Moravia (now a territory of both the Czech and Slovak Republics) and Bulgarian, Serbian and Kiev (later Russian) states, it is possible to imagine a hypothetic different evolution of the Slavic early Middle Age language - we have seen a similar phenomenon in the Arabic World: After the end of natural linguistic unity during the Middle Ages, the modernized universal Arabic language based on the religious language of the Qur'an still prevails. It is an artificial language which is close enough to the various contemporary spoken national dialects of Arabic that it is recognized as the standard for communication between Arabic nations and for contact with foreigners and used as an auxiliary language by both state apparatus and the media.

It would be fun to see historical fiction depicting a world where Interslavic - probably simply 'Slavic' - has the same position that modern standard Arabic has.

Page 143 presents a modified Arebica alphabet to write Interslavic.

3. 𗡠 0271 2mer4, representing the second syllable of 𗡢𗡠 0702 0271 1to'4 2mer4 'to seek, find', has a right side (Boxenhorn code: baedar) found nowhere else. I found it in Li (2008: 47) when looking up 𘅊 0273 1le1 for my last entry.

2mer4 sounds like Old and Middle Chinese *mek 'to seek'. If I were to force a relationship between the two, I could trace 2mer4 back to pre-Tangut *RImek-H with labial dissimilation:

*Pek > *Pew > *Pej > Pe

*RImek-H could be related to

𗑉 4684 1me1 < *CAmik or *mek 'eye'

cf. Tibetan mig (archaic dmyig) 'eye' (but Old Chinese has 目 *Cmuk - is *Cmikʷ possible?)

which is the word that made me discover labial dissimilation. Two scenarios:

But there are other possible pre-Tangut sources of 2mer4 that would rule out a connection with the Chinese word:

𗡢 0702 1to'4 'to seek' can appear by itself. That suggests that 𗡠 0271 2mer4 might be a formerly independent verb that only survives as the second half of a synonym compound 'seek-seek'.

4. Li (2008: 120) gives this example of 0702 as an independent verb from The Timely Pearl 292:


5098 0702 0760 1715

2ngon4 1to'4 2dzen4 1rar4

'case seek judge ?'

It corresponds to Chinese 案檢判憑 'case examine judge ?'

Nishida (1964: 215) has the translation 'to examine the case and hand down a judgment'. Nishida (1964: xii) says Burton Watson and a ヤンポルスキー (Yampolsky? - I don't know who this is, or what his preferred Anglicization of Ямпольский is) helped him with the English translations. Later, Nishida (1964: 216) has the translation 'deliver a judgment' for 判憑 in Timely Pearl 302.

I would think then that 𘅤 1715 1rar4 / 憑 means 'to hand down' or 'to deliver'. But the basic meaning of 𘅤 1715 1rar4 is 'to write' (Li 2008: 285). So might the Tangut phrase in The Timely Pearl mean 'write a judgment'?

憑 can be translated many ways in Chinese, but none of those translations mean 'write' or 'hand down' or 'deliver'. Might it be 'proof': i.e., 'evidence'? If so, then there is only a vague parallel between the Tangut object-verb sequence 𗍷𘅤 'write a judgment' and the Chinese verb-object sequence 判憑 'judge evidence (?)', and mechanically equating 𘅤 with 憑 may be a mistake.

Then again, to say Burton Watson's knowledge of Chinese dwarfs mine would be an understatement, and maybe 判憑 is an idiom 'deliver/hand down a judgment' that I just failed to confirm in other sources.

I always assumed Watson had learned Japanese in the American military in WWII, but in fact he didn't know any Japanese when he arrived in Japan in 1945, and he was actually a Chinese major.

5. My DuckDuckGo search for Yampolsky led me to a video of minerva scientia pronouncing Tangut in Gong's (more or less) and Arakawa's reconstructions.

6. ElitekidMu0 comments on that video:

Fun fact: Thunder Force VI [Wikipedia], a shooting game released in 2008 by SEGA for the PS2, included the Tangut Language as the main language for the protagonist of the series, Galaxy Federation (Vastian). Another language included in the game is the Mongolian Script, used by the antagonist of the series, ORN Empire.

7. Last night I learned that Kara Ben Nemsi was meant to mean 'Carl son German' (though nemsi is really closer to نمساوي‎ namsāwiyy/nimsāwiyy 'Austrian'; 'German' is ألماني 'almāniyy).

Karl May has a way with foreign names. I couldn't have come up with something equivalent to Old Shatterhand or Old Surehand in German.

8. I just noticed that the Old English Wikipedia (Ƿikipǣdia) is

Sēo Frēo Ƿīsdōmbōc

'the free wisdombook' (Ƿ <W> wynn is a rune borrowed into the Old English alphabet)

Are Goidelic forms like Irish seo 'this' the only living reflexes of Proto-Indo-European *só retaining s-? Greek [o] has lost h- < *s-, and English the has a th- that spread from the th-reflexes of the *t-initial oblique forms of *só.

9. I finally got around to rewriting my lost entry for 12.26 from memory. I finished right after I ordered a used hardcover copy of William C. Hannas' The Writing on the Wall: How Asian Orthography Curbs Creativity (2013).

10. Tonight I discovered the variant 槑 for 梅 <PLUM>.

11. Baxter and Sagart (2014) reconstruct 梅 <PLUM> in Old Chinese as *C.mˤə. I suspect that *C was a voiceless consonant because Vietnamese mai 'apricot' has a ngang tone pointing to an earlier *m̥- which may be from an even earlier *C̥m- with a voiceless *C̥- that conditioned the devoicing of *m-. I would reconstruct the word in Early Old Chinese as *C̥Amə with a low first vowel that triggered the warping of *ə to *ʌə:

*C̥Amə > *C̥Amʌə > *C̥mʌə > *m̥ʌə > *mʌe > *mʌj > *mɑj > *mwɑj > *muj > *mwəj > *məj > standard Mandarin [mej]

It is possible that *C̥A- was simply completely lost after warping in (many? most? all?) dialects other than the one underlying Vietnamese *C̥m-. I have not yet found any Chinese varieties with a yinping tone pointing to *m̥-.

The *m̥- in the scenario above is of late origin. An earlier *m̥- in Old Chinese became *x- in stage 2 below, whereas newer *m̥- merged with *m-:

stage 1
stage 2
stage 3

*m̥- *x-
hǎi [xaj˧˩˧]

*C̥m- *m̥- *m-
méi [mej˧˥]

měi [mej˧˩˧]

The tones above are conditioned by final glottals: final glottal stops conditioned the falling-rising tone [˧˩˧], whereas stage 3 voiced *m- and the absence of a final glottal conditioned the high rising tone [˧˥].

YELLOW PIG 12/4

songgiyan uliya aniya juwa emu biya ice duin inenggi

'yellow pig year, ten one month, new four day'

1. Tonight it occurred to me that the Jurchen and Khitan large script characters for 'four' might be graphic cognates:

One might be rotated - but which one? And did the Parhae script have both rotated and nonrotated variants of <FOUR>?

12.30.0:17: Both <FOUR>s have four strokes, so they may simply be two types of tally marks formalized as characters.

In any case, the Khitan large script character is not to be confused with Chinese 卅 <THIRTY> which is a fusion of three 十 <TEN>s.

12.30.12:50: Chinese 卅 <THIRTY> in turn should not be confused with the Jurchen phonogram <sui>:


Jin (1984: 25, 26, 180) reports the first pair of forms in the 大金得勝陀頌碑 Great Jin Victory Hill stele (1185) and the second 卅-like pair of forms in the Berlin and Tōyō bunko copies of the Ming dynasty Bureau of Translators vocabulary from c. 1500. Without examining the original texts, I cannot be certain about minor variations such as the presence or absence of a hook in the 1185 stele.

I fear that the Bureau of Translators' forms might be unintentionally 'sinified' in the sense that unfamiliar Jurchen characters were accidentally modified by scribes more familiar with sinography. Perhaps the resemblance of <sui> to Chinese 卅 <THIRTY> in the Bureau of Translators vocabulary is an example of sinification.

12.30.15:33: Jin (1984: 58, 76) derives Jurchen <FOUR> from the phonogram <da> which in turn he derives from Chinese 屠:


In the Jin dynasty, 屠 was pronounced *tʰu. Why base a phonogram <da> on a Chinese character pronounced *tʰu?

I don't think <da> was a Jin dynasty invention. I think its roots go back further to a period when 屠 was pronounced as *da in Late Old Chinese. (屠 was once a transcription character for -ddha in 浮屠 *bu da = Buddha.) In other words, I think <da> is potential evidence for the Jurchen large script being an heir to an old tradition of phonetic writing rather than a 12th century invention.

I don't think there is any relationship between <FOUR> and <da> beyond graphic convergence - the bottom of <da> (known only from two inscriptions) may have been remodelled after the far more common character <FOUR>.

2. Tonight while copying character 236 of the Golden Guide, I miswrote the Tangut character element 𘡛 by placing the dot too low so it intersected the stroke below it.

Nishida (1966: 242) interpreted 𘡛 as a radical for things having to do with 愛惜 aiseki 'cherish'. It just occurred to me that 𘡛 might be derived from the top of 愛 <LOVE> or the top right of 惜 <CHERISH>.

But ... what is 𘡛 doing on the top of 𘓉 0993 1lhew1 'to herd', of all things? Is 𘓉 0993 a semantic compound like <CHERISH.LIVESTOCK>?

But ... the bottom of 𘓉 0993 (Boxenhorn code: baecie) is neither 'livestock' nor short for a character for any animal. The only other character with baecie is 𘅊 0273 1le1, a character for writing surnames.

3. I was surprised by this passage (emphasis mine):

Martin Kümmel similarly proposes, based on observations from diachronic typology, that the consonants traditionally reconstructed as voiced stops were really implosive consonants, and the consonants traditionally reconstructed as aspirated stops were originally plain voiced stops, agreeing with a proposal by Michael Weiss that typologically compares the development of the stop system of the Tày language (Cao Bằng Province, Vietnam).

But then I checked Pittayaporn (2009: 110) who explains that in Cao Bằng,

I can see something similar happening in Proto-Indo-European ... except for this problem:

The ejective hypothesis, on the other hand, correctly predicts that Proto-Indo-European labial *pʼ (corresponding to *ɓ- in the implosive hypothesis) would be rare or absent.

4. I wish there were animated GIFs like the Georgian ones at georgian-language.com for Manchu and traditional Mongolian letters. I've been using Jun Jiang's Manchu app which has animated images for Manchu syllables and words, but it doesn't seem to match the verbal (nonvisual) instructions in Roth Li's Manchu textbook, so I'd like to see a second opinion.

5. I discovered that the Old English Wikipedia has a runic viewing option. Select ᚱᚢᚾ <run> under the article title.

12.30.0:16: Try the ȝƿ and ᵹƿ viewing options too.

6. Why is Gdańsk Gduńsk in Kashubian? Is Polish a : Kashubian u a regular correspondence in some environment(s)? I don't see anything like *a > u in Stone's (1993: 765) sketch of Kashubian vowel history.

7. Another Kashubian surprise: kùńszt [kwuɲʃt] (I think) 'art' < German Kunst. Why [wu]? How did Kashubian develop [wu] in native words? Is [ɲ] instead of [n] due to assimilation with [ʃ]? Was the word borrowed from a German dialect in which 'art' was [kunʃt] instead of [kʊnst]? 'Hyperlabial' [wu] for [ʊ] seems odd to me.

Aha, I see now that Kashubian /u/ becomes [wu] "[i]nitially or after a labial or a velar" (Stone 1993: 762). So [wu] has nothing to do with German.
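Stone's rule is simple enough to state as code. This is a toy sketch under loud assumptions: the `surface` function is hypothetical, every segment is taken to be one character of a broad phonemic string, and the labial and velar sets are my own guesses at the relevant consonants rather than Stone's list.

```python
# Assumed consonant classes for illustration only (not from Stone 1993).
LABIALS = set("pbmfv")
VELARS = set("kgx")


def surface(phonemic):
    """Insert [w] before /u/ word-initially or after a labial or velar."""
    out = []
    for i, seg in enumerate(phonemic):
        prev = phonemic[i - 1] if i > 0 else None
        if seg == "u" and (prev is None or prev in LABIALS or prev in VELARS):
            out.append("w")
        out.append(seg)
    return "".join(out)


print(surface("kuɲʃt"))  # the loan from German Kunst
print(surface("tu"))     # /u/ after a dental is left alone
```

On this view the loan only needs to have entered Kashubian with /ku/; the [w] follows from the native rule.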

8. How did Proto-Slavic *sŭnŭ 'sleep' become Lower Sorbian soń with a palatal ń instead of the expected n as in the rest of Slavic: e.g., Upper Sorbian son?

