Traditional East Asian dictionaries do not explicitly state whether characters can only occur in combinations or not. At first glance, one might get the idea that both 麒 and 麟 are Chinese words, but in fact the first only occurs in the disyllabic word 麒麟 'qilin'*, whereas the second can be found as an independent word in Classical Chinese** and as a part of other words. A 'distributional dictionary' could make a three-way distinction between

- superbound (appearing solely as part of a single polysyllabic word): e.g., 麒

- bound (appearing as part of two or more polysyllabic words): e.g., 麟 in modern Mandarin

- free (able to appear as an independent word): e.g., 麟 in Classical Chinese

Even finer distinctions may be possible, but that's a start.
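
The three-way distinction above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `classify` and the tiny word lists are hypothetical, and the sketch assumes that a one-character entry in the list counts as evidence of free status.

```python
def classify(char, lexicon):
    """Classify a character by its distribution in a list of words."""
    # A character listed by itself counts as an independent word.
    if char in lexicon:
        return "free"
    # Otherwise count the distinct polysyllabic words that contain it.
    hosts = {word for word in lexicon if len(word) > 1 and char in word}
    if len(hosts) == 1:
        return "superbound"
    if len(hosts) >= 2:
        return "bound"
    return "unattested"

# Toy word lists for illustration only; a real dictionary would need a corpus.
mandarin = ["麒麟", "鳳毛麟角"]
classical = ["麒麟", "麟"]

print(classify("麒", mandarin))   # superbound
print(classify("麟", mandarin))   # bound
print(classify("麟", classical))  # free
```

The same function could be run over a Tangut or Jurchen word list, though the polysyllabic morphemes of Khitan and Jurchen would probably call for a different unit of comparison.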

Such distinctions could be carried over into a Tangut character dictionary since Tangut, like Chinese, has a large number of monosyllabic morphemes. However, the scheme might have to be altered somewhat for Khitan and Jurchen which have a large number of polysyllabic morphemes. Nonetheless, I still think it is important to know that, for example, as far as I know, Jurchen


may be superbound, as it only appears in

<> 'the name Jahudai'

whereas its homophone


has a far wider distribution: it can represent dai 'girdle' (< Chinese 帶) and the syllable dai in many words other than the name Jahudai. The two characters do not appear to be interchangeable. And even if they were interchangeable, it would be nice to know when that was the case: e.g., from the start or only from the Ming Dynasty onward.

Once we determine that two or more homophonous characters were not interchangeable, then we can try to determine why. In some cases the homophony may not turn out to be original: i.e., the two characters originally had different readings that merged over time, and the original functions of the characters blurred. Since <dai2> resembles Jin Chinese 大 *dai, I think it had always been read dai, whereas <dai1> may have originally stood for a rarer Jurchen syllable that later became dai.

*I am not counting the use of 麒 in definitions such as


'The male qilin is called the qi; the female is called the lin'

from the Book of Han. This explanation for the disyllabic word qilin is a folk etymology.

**In modern Mandarin, 麟 only occurs in morpheme combinations. I would be surprised if 麟 is a monosyllabic word in any modern Chinese language. It is possible that very early attestations of 麟 as an independent word were pronounced *grin, a contraction of 麒麟 *gərin.

THE LATE GREAT CHU

Today I downloaded the latest version of Andrew West's BabelStone Han PUA font containing 194 楚 Chu script transcription characters.

In 1127, 1350 years after the fall of the original Chu and less than a decade after the creation of the Jurchen (large) script, the Jurchen Empire established 大楚 Great Chu as a buffer between itself and the Southern Song. This puppet state only lasted a month.

How would the Jurchen have written 大楚 *Dai Cu 'Great Chu' in their then-new script?

There were two different types of Jurchen graphs for dai.

Jin and Jin (1984: 81, 136) only list a single word-final example for one type:

<> 'the name Jahudai'

The other type was much more common and used to transcribe Jin Chinese 大 *dai 'great' as well as representing the syllable dai in the native Jurchen names

<> and <> (Jin and Jin 1984: 5)

What was the original reasoning behind having two graphs for the same syllable? Were they originally nonhomophonous? My guess is that the common <dai2> was read as dai from the start, whereas the rare <dai1> was originally for some other syllable that merged with dai: e.g., *daai.

Had there also been a lost phonetic distinction between the two kinds of <cu>? Both could be used to write native words, and both even appeared side by side in

<cu1.cu2.wa.hai> 'according to'?

But only <cu2> appeared in Chinese transcriptions, so I conclude that *Dai Cu would have been written as


4.12.2:42: <cu2> could represent the monosyllabic auxiliary verb cu- 'to be able' (Jin and Jin 1984: 81, 259). Perhaps <cu2> was originally a logograph for that verb, whereas <cu1> may have been a phonogram from the beginning.

<dai2> resembles Chinese 大 *dai 'great' and could have initially been intended to write that word (and homophonous Chinese loanwords?), unlike <dai1> which might have been reserved for dai in native words.

PROTO-SINO-TIBETAN-AUSTRONESIAN *PONUQ 'BRAIN'?

Old Chinese (OC) 腦 *nuʔ 'brain' was a type A syllable* with vowel lowering. According to my theory, *u partly lowered to harmonize with a low unstressed vowel in a lost presyllable:

*Cʌ-nuʔ > *Cʌ-nouʔ > *nouʔ > *nauʔ > Mandarin nao

However, Laurent Sagart (2002: 5) regarded 腦 *nˁuʔ 'brain' as cognate to Proto-Austronesian (PAN) *punuq with a high first vowel *u. If OC had a high presyllabic vowel in 'brain', it would have matched the high main vowel, and there would have been no lowering:

*pu-nuʔ > *nuʔ > *ɲuʔ > Mandarin *rou
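
The contrast between the two derivations above can be sketched as a toy rule in Python. The function name and the string encoding are hypothetical, and the sketch models only the first step: a low presyllable vowel lowers *u to *ou, whereas a high presyllable vowel leaves *u alone.

```python
# Presyllable vowels treated as low in this sketch (only *ʌ, as in the text).
LOW_PRESYLLABLE_VOWELS = {"ʌ"}

def harmonize(presyllable_vowel, main_syllable):
    """Lower the first u of the main syllable to ou after a low presyllable vowel."""
    if presyllable_vowel in LOW_PRESYLLABLE_VOWELS:
        return main_syllable.replace("u", "ou", 1)
    # A high presyllable vowel already matches *u, so nothing changes.
    return main_syllable

print(harmonize("ʌ", "nuʔ"))  # nouʔ
print(harmonize("u", "nuʔ"))  # nuʔ
```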

Can both Laurent and I be right? PAN had only four vowels (*a *e [= *ə] *i *u), whereas OC had six (*a *e *ə *i *o *u). Laurent (2002: 8) reconstructed seven vowels in Proto-Sino-Tibetan-Austronesian (PSTAN) to account for the following correspondences in main vowels:

PSTAN Environment OC PAN
*u before labials *u
elsewhere *u
*o before labials *a
elsewhere *o
*a before *y *i *a
elsewhere *a
(everywhere) *e
*e after grave consonants *e
elsewhere *i
*i in open syllables *i
in closed syllables *i

I only reconstruct two vowels in OC presyllables: high and low *ʌ**. I have long thought each resulted from the merger of various unstressed vowels. Let's suppose that those earlier vowels were identical to the seven vowels in PSTAN final syllables:

*i *i
*u *u
*o *u

Above I assume that PAN first vowels developed more or less like second vowels. A study of OC syllable types and PAN first vowels may reveal a different course of development.

My OC presyllabic *ʌ could be from PSTAN *o, which raised to *u in PAN:

*ponuq > OC *pʌ-nuʔ and PAN *punuq 'brain'

4.11.1:10: If OC and PAN are not related, the word could be a borrowing from one into the other when the source language had *o as the first vowel.

4.11.1:35: Of course OC is not the only Sino-Tibetan language. STEDT lists nu-words for 'brain' in other languages. The Proto-Sino-Tibetan form may have ended in a *-q that

- was retained in Proto-rGyalrongic

- became *-k in some languages: e.g., Written Burmese ūḥnok

- became a glottal stop in OC

- was lost in Tangut

0118 and 0127 2no1 < *noH 'brain'

Was the mid vowel in some of these forms lowered before *-q? Jacques' (2004: 266) Proto-rGyalrongic reconstruction does not have *-uq. Maybe there was a chain shift: *-uq > *-oq > *-ɔq.

4.11.2:17: I am agnostic about PSTAN. Currently I think Austronesian is more likely to be related to Kra-Dai than to Sino-Tibetan.

If the correspondences above are valid, they do not entail a genetic relationship. They may tell us about patterns of borrowing.

Conversely, if the correspondences are due to common ancestry, exceptional forms may have been borrowed after a split (cf. how the loanword paternal has p instead of the regular Germanic f from Proto-Indo-European p).

*4.11.1:56: Type A syllables were characterized by secondary pharyngealization (a.k.a. 'emphasis') at some point. I do not know of any other Sino-Tibetan language with pharyngealization. I suspect that pharyngealization was a Chinese innovation which may have been due to contact with a substratum or neighboring language. I have omitted pharyngealization in this discussion to focus on the vowels.

**4.11.2:15: I got the symbols and from my phonetic notation for Middle Korean which had a two-class height harmony system like my Old Chinese reconstruction. I chose them because they are visually distinct from the letters for my six vowels. Their actual phonetic values may have overlapped with two of the vowels: e.g., they could have been and *a. It is easier to type than a phrase like "unaccented presyllabic higher vowel" or *ə̆ with a breve.

DO AUSTRONESIAN AND SINO-TIBETAN SHARE A WORD FOR SETARIA ITALICA?

Today I saw Laurent Sagart's "Austronesian and Sino-Tibetan words for Setaria italica and Panicum miliaceum: any connection?" (2014) and was surprised to see him mention Khitan in a paper about prehistory (emphasis mine):

There is a complication with the semantics of this comparison: certain modern authors (Li 1983:29; Hu 1984; Chai et al. 1999:9) claim jì 稷 did not mean 'Setaria italica' in early Chinese but 'Panicum miliaceum'. This view, widespread among Chinese agronomists, is based on statements by various Chinese authors from c. 1000 CE down to modern times, to the effect that jì 稷 is the same plant as 穄 *[ts][a][t]-s > tsjejH > jì ‘Panicum miliaceum’. Thus Chai et al. (1999:9) observe that in the three provinces of Shandong, Henan and Hebei, (glutinous) Panicum miliaceum varieties are today usually referred to as jì 稷.

However, this is a confusion arising from the phonetic convergence of these two words after Middle Chinese (a standard reading pronunciation from the sixth century CE, known to us through the dictionary Qie Yun 切韻, prefaced in 601 CE, and its later editions). In Modern Standard Chinese, Middle Chinese (MC) 稷 tsik and 穄 tsjejH have both evolved, quite regularly, to jì [ʨi 51]. The merger had already occurred in northern Chinese during the Khitan or Liao dynasty, which occupied parts of north China, including Hebei, from 916 to 1125 CE. Phonetic transcriptions in Khitan small script of the 11th and 12th century Chinese show that while MC final -k was still represented by a glottal stop in poetry, it had disappeared in everyday speech (Kane 2009:252sq.). Thus in everyday Chinese of the Khitan period, 'Setaria italica', MC tsik, was probably [tsi]. At the same time, the character 祭, a MC homophone of 'Panicum miliaceum' on Middle Chinese (both MC tsjejH), and the phonetic element in 'panicum', was also [tsi] (Shen 2014:318). It is significant that there are no statements equating 稷 tsik and 穄 tsjejH from time periods preceding the phonetic merger of the two forms [i.e., from before c. 1000 CE]. Thus we can be satisfied that 稷 tsik and 穄 tsjejH were distinct cereals in early Chinese times, and that (since there is no question that jì 穄 meant ‘panicum’) jì 稷 tsik must be the name of Setaria italica.

I would like to add that Kane's argument is based on Chinese-internal data: the poetry in question is in Chinese, and the loss of final glottal stop is implied by 沈括 Shen Gua in 夢溪筆談 Mengqi bitan 'Dream Pool Essays' (1088; Kane's translation):

Even now the Heshuo [= Hebei; i.e., north of the Yellow River] people pronounce 肉 [*zhiwʔ] as 揉 [*zhiw], and 贖 [*shuʔ] as 樹 [*shu].

In the Khitan small script,

[g]enerally speaking there is no consistency in the use of the graphs used to transcribe syllables which ended in stops in MC and probably a glottal stop in Song Chinese. This does not prove that Liao Chinese did not have a glottal stop in such words, just that the Kitan [= Khitan] transcription does not indicate it. (Kane 2009: 254)

For instance, the Khitan small script character

339 <i>

was used to transcribe syllables whose MC readings ended in -i and -it (both corresponding to Song *-iʔ). The one instance of a word whose MC reading ended in -ik like 稷 tsik 'Setaria italica' was written as

087 <tz>

which also transcribed the open syllables 知 *ji (MC trje) and 旨 *ji (MC tsyijX).

The Sino-Tibetan forms for Setaria italica look like a good match for Proto-Austronesian *beCeŋ (*e = [ə]) with the exception of the coda:

Probable Tibeto-Burman cognates of the Chinese word [稷 Old Chinese *[ts]ək] are Trung tɕjaʔ55 ‘millet’, Lhokpu cək ‘Setaria italica’ (van Driem, p.c. to LS, June 24, 2004; not phonologized): if the shape and semantics of this last form are confirmed, the Proto-Sino-Tibetan word for 'Setaria italica' might sound something like #tsək (pre-reconstruction).

Both Proto-Sino-Tibetan (PST) and Proto-Austronesian (PAN) had *-k and *-ŋ. I would expect the following correspondences which are in Sagart (2002: 7):

OC (and probably also PST) *-k : PAN *-k

OC (and probably also PST) *-ŋ : PAN *-ŋ

Yet Sagart also found examples of the correspondence

OC (and probably also PST) *-k : PAN *-ŋ

which has Sino-Tibetan-internal parallels: e.g.,

Tangut 1siw4 < *sik, Written Burmese sac < *sik : OC 新 *sin < *siŋ? 'new'

I presume there is morphological variation within Sino-Tibetan. But if the Sino-Tibetan and PAN forms for Setaria italica are related, how can the different codas be explained? Are they different reductions of *-ŋk, a cluster lost in ST and PAN?

Genetic scenario:

Proto-ST-AN *-ŋk > PST *-k but PAN *-ŋ

Nongenetic scenario (i.e., borrowing):

pre-PAN *-ŋk > borrowed as *-k in PST but became *-ŋ in PAN

4.10.4:40: The first vowel of PAN *beCeŋ (*e = [ə]) is consistent with my theory that presyllables with higher vowels (*i, *ə, *u) conditioned type B syllables in Old Chinese such as 稷 Old Chinese *[ts]ək.

Sagart (2002: 8) found the following correspondences between OC syllable types and PAN segments:

OC type A : PAN penultimate syllable initial voiceless stop (except *q-) or zero (i.e., no penultimate syllable)

OC type B : other PAN penultimate syllable initials including *q-

If PAN preserved Proto-ST-AN penultimate syllable initials, I do not understand why bare syllables and syllables preceded by voiceless stops developed type A with pharyngealization. And why would *q- block pharyngealization which was the default (!) development? (Normally pharyngealization is marked: i.e., nondefault.)

PSTAN *(tV)CV > OC *CˁV (type A)

PSTAN *qVCV, *sVCV, *nVCV > OC *CV (type B)
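
The correspondence can be restated as a small lookup, sketched here in Python. The function name is hypothetical, and the set of "voiceless stops other than *q-" is a simplifying assumption of the sketch (it lists only p, t, k), not a statement of Sagart's full inventory.

```python
# Assumption for illustration: a simplified set of PAN voiceless stops, excluding q.
VOICELESS_STOPS_EXCEPT_Q = {"p", "t", "k"}

def predicted_oc_type(pan_penult_initial):
    """Return the OC syllable type predicted by the PAN penultimate initial.

    Pass None when the PAN form has no penultimate syllable.
    """
    if pan_penult_initial is None or pan_penult_initial in VOICELESS_STOPS_EXCEPT_Q:
        return "A"
    # Every other initial, including q, goes with type B.
    return "B"

print(predicted_oc_type("p"))   # A, cf. *punuq and type A 腦
print(predicted_oc_type("b"))   # B, cf. *beCeŋ and type B 稷
print(predicted_oc_type("q"))   # B
print(predicted_oc_type(None))  # A
```

Written out this way, the oddity is easy to see: type A with pharyngealization is the branch taken by bare syllables, while *q- falls into the residual type B branch.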

In Semitic terms, type A is 'emphatic', and Semitic q is an 'emphatic' consonant, so I would expect it to be associated with type A.


Last night I found these sections of the History of the Liao Dynasty translated in Wittfogel and Fêng (1949: 261):


On the day mou-ch'ên [of the eleventh month in the thirteenth year of T'ung-ho [= 995 AD*] ] Korea sent ten boys to study the [Ch'i-tan [= Khitan] ] national language.


On the day kêng-ch'ên [sic for kêng-hsü] [of the third month in the fourteenth year of T'ung-ho [= 996 AD] ] Korea again sent ten boys to study the [Ch'i-tan [= Khitan] ] national language.


[On the day chia-shên of the twelfth month in the first year of K'ai-t'ai [= 1012 AD] ], Kuei Prefecture reported that its inhabitants, who had originally been moved from Silla [= Korea], were illiterate, and that schools should be set up to educate them. This request was approved by imperial decree.

I wondered which Khitan script(s) those Koreans learned: the large script, the small script, or both.

David Boxenhorn suggested that those Koreans might have tried to write their own language in the small script. That would have been easy to do, since Korean a thousand years ago

- had *CV(C) syllables like Khitan without the consonant clusters of a few centuries later (and even such clusters could have been written with sequences of small script consonant symbols)

- had roughly the same consonants as Khitan minus the uvulars

- shared most of its vowels with Khitan (*i, *e [> later Korean yŏ], *ə, *a, *u, *o)

Only the apparent absence of the vowels and (> later Korean a/ŭ) in Khitan might be a problem. Existing CV, V, and VC characters could do double duty for those vowels: e.g.,

273 <un>

could represent both Korean *ɯn as well as *un. That would parallel the current use of the Roman letter u to transcribe both Korean [ɯ] and [u]: e.g., Kim Jong-un is [kimdʑəŋɯn].

Also, dots could be added to indicate non-Khitan uses of characters, just as the Khitan added a dot to <pu> to write the Chinese syllable <fu>:


241 <pu> > 261 <fu>

4.9.3:13: David's scenario makes me wonder if the Jurchen used the small script to write their language.

When I saw this passage in Wittfogel and Fêng (1949: 253),

In 1150 a distinguished Jurchen statesman is said to have written a confidential political letter to his son in the small Ch'i-tan script; this interesting document, translated into vernacular Chinese, is preserved in the Chin Shih [= History of the Jin Dynasty] (CS [= Chin shih] 76, 2a ff.; 84, 3a ff.).

I wondered if the statesman wrote in Khitan or in Jurchen using the Khitan small script. Wittfogel and Fêng raised the possibility of the latter:

Many Chin records describe the continued use of the Ch'i-tan script during the early and middle years of the Chin dynasty. Unfortunately, they do not make it clear whether this also involved the use of the Ch'i-tan language. There must have been a number of Jurchen who spoke Ch'i-tan, but the question still arises whether such knowledge was necessary to the use of the Ch'i-tan script. In the formative period of their power the Mongols wrote their documents in the Mongol language but in the alphabetic Uighur script (Browne 28 II, 441; cf. Barthold 28, 41). The Manchus until the year 1599 wrote their documents in Mongol and used the Mongol script (KHTSL 3, 2a-b). The Jurchen may have availed themselves of either method exclusively, or of both at different periods of time, first adopting an alien language and script and later using the alien script for transcribing their own language. In the latter case the smaller script would seem particularly appropriate, for as an alphabetic system of writing it could easily be adjusted to the needs of another language, especially if this language belonged to the same Altaic complex [as Korean does!]

*4.9.2:49: Although I suspect the eleventh month of Tonghe (= T'ung-ho) 13 is in the start of 996 AD, Wittfogel and Fêng referred to 995 AD in their footnote (emphasis mine):

This record is confirmed by the Korean official history which relates that in 995 the Korean government sent ten boys to Liao [the Khitan Empire] to study the Ch'i-tan language (KRS [= Koryŏ-sa 'History of Koryo'] 3, 46). However, this effort seems to have produced very poor results. In 1010, when the Liao vanguard general sent a document written in Ch'i-tan to the Korean court, no one could read it (KRS 94, 86).

Did any of the ten boys return as men to serve the court, and if so, were any of them still at the court in 1010?

4.10.4:54: Andrew West pointed out that the eleventh month of Tonghe 13 is equivalent to 25 November-24 December 995, so Wittfogel and Fêng's date is correct.


I wanted to see 'on the tomb' from my last post in context, so I looked at the text on the lid of the epitaph for 蕭仲恭 Xiao Zhonggong as copied in Qidan xiaozi yanjiu (1985: 594):

1. 139-051-290-253 <na.gha.án.ô>

2. 188-169 <?.qó>

3. 081-140 <MONTH.en>

4. 081-348 <MONTH.e>

5. 334-262 <g.ui>

6. 071 <ong>

7. 076-020-361-140 <gho.y.én.en>

8. 251-084-205 <>

9. 052-334-361 <?.g.én>

Let's go through it block by block:

1. Kane (2009: 51, 106) translated

139-051 <na.gha> and 139-051-290 <na.gha.án>

as 'uncle' and 'maternal uncle' (cf. Written Mongolian naghachu 'maternal uncle'; Ji Shi 1982). Neither occurs alone in Qidan xiaozi yanjiu's index of Khitan small script words. Have they been found in isolation in the texts discovered in the three decades since the publication of that book?

Could 290 <án> be the plural suffix also in

311-151-290 <b.ugh.án> 'children' < 311-168 <b.qo> 'son, child'

which also has unexpected medial voicing in the plural? Is -gh- a contraction of *-qw- < *-qo-?

The final character

252 <ô>

is an error for

341 <er>

which Kane (2009: 106) regarded as the invariable (and in this case, nonharmonic) accusative-instrumental suffix ('via the maternal uncle'?). However, I would expect the genitive: 'junior tent of the maternal uncle'.

Could <er> be a plural suffix?


222-362-222-341 <ń.iau.ń.er> 'siblings' < 222-362 <ń.iau> 'sibling'?

is another plural ending in <er>, though the suffix may be <ń.er>. I don't know of any plural suffix <ń>, so I don't think <.ń.er> is a double plural suffix.

Could <er> be a plural genitive suffix if <na.gha.án> is a singular?

Could <na.gha.án.er> be a doubly marked plural like Japanese ko-domo-tachi, English child-r-en, and Dutch kind-er-en (cf. German Kind-er with only one suffix)?

2. <?.qó> is 'junior' (Kane 2009: 25). Kane interpreted this as an adjective modifying the previous noun ('junior maternal uncles'), though if that were the case, it would be in an un-'Altaic' position: i.e., following instead of preceding the noun.

Aisin Gioro read the first character

188 <?>

as <od> in 2004 and as <oji> in 2011. If it is <od>, how did it differ from


which Aisin Gioro read as <ad> ~ <od> and <od> ~ <do>?


3. 081 <MONTH>

is an error for

380 <TENT>.

Kane (2009: 25) translated blocks 1-3 as 'the tent of the junior maternal uncles'; I would add an 'of' before 'the' to correspond to the genitive suffix

140 <en>.


4. 081 <MONTH>

is an error for

082 <yw>

with a dot. Hence <yw.e> is a transcription of the Liao Chinese name 越 *Ywe.

5. Transcription of Liao Chinese 國 *gueiʔ 'state'*.

6. Transcription of Liao Chinese 王 *ong 'prince'.

Blocks 4-6 mean 越國王 'prince of the state of Yue'.

7. 076-020 <gho.y> may be a verb stem.

361 <én> could be a nominalizing suffix, though I would not expect <é> after <gho.y> if Khitan vowel harmony was like Mongolian or Manchu vowel harmony.

Is 140 <en> a genitive before 'tomb': 'on the tomb of ...'?

8. 'tomb-LOC': 'on the tomb'.

9. Kane transcribed 052 as <RECORD>, and stated that it "is only found in the word

[052-334] <> [= my <g>] 'record'

with various suffixes." However, it can occur in isolation and with characters other than 334, though it cannot occur in noninitial position (Qidan xiaozi yanjiu 1985: 201-202, 690-691). That suggests 052 is not a logogram. Aisin Gioro read it as <cu> in 2004 and <ce> in 2011.

361 <én> is a nominalizing suffix. Kane (2009: 155) translated 052-334-361 <?.g.én> as 'inscription'.

*4.8.3:48: Although the Khitan may have borrowed Liao Chinese 國 as gui [guj], I suspect the Liao Chinese pronunciation was *gueiʔ [kwəjʔ]. In Middle Chinese, 國 was *kwək, and it has developed in at least two different ways in modern Mandarin dialects:

1. *kwək > *kwəɰk > *kwəɰʔ > *kwəjʔ > [kwej] (e.g., Jinan)

2. *kwək > *kwəʔ > [kwo] (e.g., Beijing)

Forms like Linquan [kwɛ] or 13th century Phags-pa Chinese ꡂꡟꡠ <gue> may be from either *kwəjʔ or *kwəʔ with fronting of the schwa.

The Khitan borrowed from a dialect with the first path of development.

Prescriptive 15th century Sino-Korean 귁 kuyk might be a conscious compromise between actual Sino-Korean 국 kuk and Ming Mandarin [kuj].


According to my harmonic unwritten vowel hypothesis,

251-084 <n.ra> 'tomb'

in the Khitan small script was read nara without the apparent harmonic violation of Kane's (2009: 123) nera. So far, so good. But the dative-locative suffix for 'tomb' is de, not *da:

251-084-205 <> 'tomb-LOC'

This is not an isolated spelling. It occurs seven times in four texts over a span of a century:

- twice in 蕭令公 (1.10, 26.14; 1057)

- once in 許王 (2.17; 1105)

- once in 耶律撻不也 (1.10; 1115)

- thrice in 蕭仲恭 (lid 3.2, 1.8, 44.38; 1150)

I wonder if there are even earlier occurrences. Did the harmonic form *nara-da ever exist: e.g., at the time of the invention of the small script c. 925?

Here are other examples of seemingly nonharmonic dative-locative -de:

051-251-205 <> '?-DAT/LOC'? (蕭令公 12.17) instead of *ghan-da (assuming ghan is the stem though it is not attested in isolation)

071-205 <> 'prince-DAT' (蕭仲恭 4.51) instead of 071-217 <> (quoted in Kane 2009: 137; source not specified)

076-189-099-205 <> '?-DAT/LOC' (耶律撻不也 21.1) instead of *ogha(a)d-da

141-205 <> 'seven-LOC' (蕭仲恭 8.12) instead of *dolo-do

But -de is expected if Aisin Gioro's (2004, 2005) reconstruction of 'seven' as dil is correct.

248-118-205 <jal.qú.de> '?-DAT/LOC' (許王 50.17) instead of *jalqu-du

The reading <jal> is from Aisin Gioro (2004).
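
The harmonic forms that I would expect (the starred forms above) can be generated by a simple rule, sketched here in Python. The function name and the vowel-to-suffix table are assumptions inferred from the examples in this post; they are a simplification for illustration, not an established description of Khitan vowel harmony.

```python
# Assumed mapping from the last stem vowel to the harmonic dative-locative suffix.
SUFFIX_BY_LAST_VOWEL = {"a": "da", "o": "do", "u": "du", "e": "de", "i": "de"}

def expected_dative_locative(stem):
    """Attach the suffix that agrees with the last vowel of the stem."""
    for ch in reversed(stem):
        if ch in SUFFIX_BY_LAST_VOWEL:
            return stem + "-" + SUFFIX_BY_LAST_VOWEL[ch]
    raise ValueError("no vowel found in stem")

for stem in ["nara", "ghan", "dolo", "jalqu", "dil"]:
    print(expected_dative_locative(stem))
# nara-da, ghan-da, dolo-do, jalqu-du, dil-de
```

Comparing this output with the attested spellings shows how many of them have -de where the rule predicts another vowel.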

Was nara-de a harbinger of the ultimate fate of the Khitan dative-locative? If Khitan had survived, would it have an invariable -de [də], just as the Jurchen dative-locative suffixes

<do> and <du> (= Kiyose's dö)

merged into Manchu de [də]? Could such an invariable -de already have existed in late colloquial Khitan, emerging occasionally in texts that otherwise reflected harmonic allomorphy lost in speech?

4.7.0:56: Khitan had an invariable accusative-instrumental suffix -er, though the homophonous perfective suffix had -ar and -or allomorphs (Kane 2009: 131, 145-146). Would as yet undiscovered 10th century small script texts also have accusative-instrumental -ar and -or? Why did merger occur in the accusative-instrumental before the dative-locative? Was disambiguating the former from a homophonous verb suffix a factor?

Unlike Khitan, Jurchen had three allomorphs of the accusative suffix:


<ba> (written with two types of characters), <be>, <bo>

All three merged into Manchu be [bə].


On Friday I was looking for the name of Yelü Abaoji's father

244-084-051-099-222 <ń>

transcribed in Chinese as 撒剌汀 *saʔlaʔding or 撒剌的 *saʔlaʔdiʔ* in Kane (2009). Last night I found it on page 129. I also rediscovered my 2014 post on the name.

Last year I interpreted 084 as ar and read the name as Sargha(a)diń. But if 084 was ar, what was the difference, if any, between it and 123

which also represented ar?

Kane (2009: ) read 084 as ra and tentatively reconstructed an inherent vowel e in 244. Hence he read the name as Seraghadiń. The coexistence of e and a is unexpected in Mongolic or Jurchen/Manchu. There is no guarantee that Khitan vowel harmony was like Mongolic or Jurchen/Manchu vowel harmony, but the limited evidence suggests some degree of similarity. So I am skeptical that the name contained an e. However, other alternatives also have problems: e.g., Sargha(a)diń above. A zero-vowel interpretation of 244 results in Sragha(a)diń with an un-'Altaic' (and hence unlikely) initial cluster. The Chinese transcriptions cannot help us, as Liao Chinese had no *se or *sr-, so 撒剌 *saʔlaʔ- could represent Khitan Sar-, Sera-, or Sra-.

A fourth possibility is that the name was Saragha(a)diń with an unwritten first vowel. Were Khitan small script readers able to supply unwritten vowels with the aid of vowel harmony rules? Perhaps 244 was read as s, sa, or se depending on context. In this case, it was read as sa because sr- would be an impermissible initial cluster and sera- would violate vowel harmony.


244-084-254 <s.ra.d> '?' and 251-084 <n.ra> 'tomb'

which Kane read as serad and nera would be read as sarad and nara according to my harmonic hypothesis.

In these cases, the reader would have to look ahead to determine whether the vowels of 244 <s> and 251 <n> would be a or e.
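
The look-ahead involved can be sketched in Python. This is a toy model under stated assumptions: the function name is hypothetical, only a and e are treated as harmonic cues, and only the initial consonant graph is vocalized.

```python
def vocalize_initial(initial, rest):
    """Supply the unwritten vowel of an initial consonant graph by looking ahead.

    The first written a or e in the rest of the word decides the vowel.
    """
    for ch in rest:
        if ch in ("a", "e"):
            return initial + ch + rest
    # No harmonic cue found; leave the graph unvocalized.
    return initial + rest

print(vocalize_initial("s", "rad"))  # sarad
print(vocalize_initial("n", "ra"))   # nara
```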

Conversely, readers of the traditional Mongolian script keep previous vowels in mind to disambiguate later vowel letters: e.g., the second vowel letter of


<eja/en> 'lord'

has to be read as e because the first vowel is e. Although a medial a looks exactly the same as a medial e, *ejan would violate vowel harmony.

*撒剌的 is from the History of the Liao Dynasty. I don't know where Kane (2009: 129) found 撒剌汀.

15.4.4:23:40: WHY <SA> MANY?: PART 1

I have already discussed

(~~) and ,

two of the eight types of Jurchen <sa>-graphs, at length in "Jurchen Polyphony 2", "That Yu-ni- Component", and "Un-<sa>rtain-'tea' ", so I will move on to the third which is only attested in two names:

the surname <sa.hala>* (女真進士題名碑 21)

the personal name <>** (慶源郡女真國書碑 4:2)

Was this <sa> intended solely for use in names, or was it used to write other words absent from the few texts that we have on hand?

*4.5.1:31: Jin Guangping and Jin Qizong (1980: 311) and Jin Qizong (1984: 107) read the second character as xala = my hala. However, the entry for that character in Jin Qizong (1984: 129) lists gal as Jin Guangping's reading and does not include the surname as an example. To confuse matters further, the Chinese transcription of the name is 撒合烈 *saʔhoʔlieʔ with different vocalism that is not harmonic. I would expect something like *saʔhoʔlaʔ.

**4.5.1:35: I cannot explain the nonharmonic sequence -ae. If u and i were neutral vowels, I would expect *udisaa or *udisee. Could the name be of non-'Altaic' origin: i.e., from a language without vowel harmony? But what language would that be? The name is too long to be Chinese.

15.4.3:23:43: WHY <SA> MANY?: PROLOGUE

My previous entry dealt with the mys-'tea'-ry of why the Chinese character 茶 *cha (in Jin Dynasty pronunciation) 'tea' was used as the basis of the Jurchen character

<sa> (not <cha>!)

None of the Jurchen <ca>-characters look like Jin Chinese *cha-characters:

, so far known only in the word <> 'helmet'.

for <ca> elsewhere (with a possible variant in 女真進士題名碑)

(4.4.1:10: There is an obscure Chinese character 𠮮 attested in the Liao Dynasty dictionary Longkan shoujian with the reading *hua, not *cha. The dotted version vaguely resembles Liao/Jin Chinese 吞 *ten.)

I forgot to ask why a 茶 *cha-based character for sa was needed at all given the existence of seven other types of <sa> in the Jurchen large script:


Why wasn't one - or seven - <sa> enough?

15.4.2:23:59: UN-<SA>-RTAIN-'TEA'

Last Friday, I listed Jurchen


as an example of a graph whose reading seemed to be of Liao/Jin Chinese origin: i.e., based on a northeastern dialect of Chinese from the 10th century onward.

<sa> appears to be a derivative of the Chinese character 茶 'tea'.

If Janhunen is right, and if the Jurchen script is derived from the elusive Parhae script, then the readings of Parhae characters would be based on pre-Liao Chinese: e.g., a 茶-based graph would be read as something like *da (< Middle Chinese *ɖæ) or even *ra or *la (< Old Chinese *rla) if a Manchurian tradition of writing went back very far or if northeastern Middle Chinese retained an archaic liquid-initial reading of 茶.

However, <sa> has an initial fricative that matches none of the hypothetical initials of the Parhae scenario or the *ch- of the Liao/Jin Chinese reading of 茶. Nonetheless, I thought the reading <sa> was of Liao/Jin Chinese origin because <s> and *ch- are both sibilants. But why would the creator of the Jurchen script take a Chinese character pronounced *cha and use it to write Jurchen sa?

Hypothesis 1: Because the source Chinese dialect had initial *s- in 茶.

Although Japanese does have the Tō-Sō-on (i.e., post-Middle Chinese) reading sa for 茶 (e.g., in 喫茶店 kissaten 'cafe'), that is not evidence for Jin Chinese *s-, because Japanese lacked an affricate at the time of borrowing, so the s- of sa is an approximation of a Chinese affricate. There are a couple of modern Chinese languages with s- in 'tea', but they are far from the northeast, and their s- might be of recent origin: Qinglong Ping sa and Shitai Wu sʰa.

Hypothesis 2: Because the source Chinese dialect had initial *tsʰ- in 茶.

According to, some modern Mandarin varieties including Beijing (presumably the colloquial accent as opposed to the Beijing-based national standard) have tsʰ- in 茶. There was no tsʰ- in Jurchen, so the Jurchen might have perceived tsʰ- as s-. But the tsʰ- of 茶 might be of recent origin like s-.

Hypothesis 3: Because of a sound change in Khitan.

Jurchen/Manchu has both sh- and c- corresponding to Mongolic c-:

'white': Jurchen/Manchu shanggiyan : Proto-Mongolic *cagaxan (Janhunen 1996: 197)

'army': Jurchen cau(r)-, Manchu cooha : Middle Mongolian ca'ur 'to fight' (Kane 2006: 

My guess is that the sh-forms were borrowed from a nonstandard Khitan dialect (Eastern Khitan?) whose speakers were in close contact with the Jurchen, whereas the c-forms were borrowed from a more prestigious variety of Khitan.

Could Jurchen <sa> be based on a Khitan large script character whose reading shifted from cha to sha due to deaffrication in Eastern Khitan?

There are two problems with this scenario. First, I do not know of any Khitan large script character resembling 茶. The shape of <sa> is either a Jurchen innovation or a carryover from the Parhae script absent in the Khitan large script. Second, the Jurchen character was pronounced sa, not sha.

(4.3.3:10: But maybe this hurdle is not insurmountable, as

seems to have been read as both sa and shang judging from Ming Chinese transcriptions. Was <sa> ever read as sha? Conversely, was 'white' ever sanggiyan in Jurchen? Did Jurchen borrow from three kinds of Khitan dialects: one that retained c, another that weakened it to sh, and yet another that weakened it to s?)

Hypothesis 4: Because of a gap in Jin Chinese phonology.

There may not have been a *sa in Jin Chinese*, so *cha was used as the basis of <sa>**.

But even if Jin Chinese only had *saʔ with a final glottal stop, wouldn't characters with that reading (e.g., 撒薩颯卅) be a better match for *sa than 茶 *cha?

And if the Jurchen script is based on the Khitan large script (according to the mainstream view) or the Parhae script (according to Janhunen), why not carry over an existing character from one of those scripts for sa? Why create a new character for a syllable that probably existed in Khitan and whatever language the Parhae elite spoke***?

*4.3.3:18: Middle Chinese *sa became Liao/Jin Chinese *so.

I do not know whether Mandarin sa < *shai for 洒灑 can be projected back into Jin Chinese. Mandarin sa could be a borrowing from a much later dialect in which *sh- became s-.

**4.3.3:31: This kind of substitution has a weak parallel in the Old Japanese man'yōgana script. Middle Chinese 娑 *sa was a low-frequency character and prone to be misread as its phonetic 沙 *ʂæ (and in fact 娑羅双樹 can be read as shara sōju as well as sara sōju in modern Japanese). Hence the most frequent phonogram for Old Japanese sa was the high-frequency character 佐 *tsaʰ in spite of its initial. (See the frequency statistics in my 1999 dissertation and 2003 book and on Ueshiba Hiroshi's site.)

***4.3.4:06: Janhunen (1996: 152-153) doubted that Koguryo and its Parhae successor state were "likely to have been dominated by ethnic elements that would have been linguistically ancestral to the modern Koreans" and proposed that "they were dominated by people ethnically ancestral to the Jurchen": i.e., Tungusic speakers. Nonetheless the limited linguistic material available from Koguryo points to Koreanic and even Japonic rather than Tungusic.


Guillaume Jacques (2015: 220) wrote what I've been thinking for years now:

In all modern systems of [Old Chinese] reconstruction, *-r- is reconstructed for all syllables with either second division rhyme, chongniu 3 and/or retroflex initials in Middle Chinese. While it has been convincingly demonstrated that clusters in *-r- is indeed one possible origin for these syllables (Yakhontov 1961), there is no definite proof that *-r- should be reconstructed in all cases.

I used to reconstruct a lot of medial *-r- in Old Chinese until 2006 when Zev Handel's "Rethinking the medials of Old Chinese: Where are the r's?" opened my eyes to the possibility of preinitial *r-. Over the years I have wondered if those syllables had even more sources: e.g., in 2012 I wrote that *r- "might be from earlier *l- and/or *t- as well as *r-". Classical Tibetan has preinitial l- and d- as well as r-.

Guillaume's figures confirm my suspicion that there is too much noninitial *r in Old Chinese reconstructions:

As a measure of comparison, over 20% of syllables in Old Chinese as reconstructed by Baxter and Sagart (2014) contain a preinitial or a medial *r, while in Japhug and Tibetan, where consonant clusters including r are attested, we only find respectively 12% and 16% of syllables with non-initial r.

Like Classical Tibetan, Japhug has preinitial l-. Would adding the percentage of syllables with preinitial l- raise 12% and 16% to roughly 20%? Japhug l-syllables are rare and presumably of secondary or external origin (e.g., ld- is from *rl- [Jacques 2004: 314]), as original preinitial *l- became j- (Jacques 2004: 271). Perhaps the total of j-, l-, r-, and -r- syllables in Japhug would reach 20%. Do any Tibetan or Japhug preinitial (*)l- correspond to *-r- in a typical modern reconstruction of Old Chinese?


On Saturday I found this blog post by Mike Aubrey:

I hold, following Mussies that these two clusters [χθ <khth> and φθ <phth>] were pronounced /kth/ and /pth/ in the Hellenistic Period.

Mussies (1971: 51) wrote (with transliteration that I added),

-φθ- <phth> and -χθ- <khth> are misleading orthographies and represent resp. -pth- and -kth-.

Aubrey added,

Non-Alveolar Stop + Aspiration + Alveolar Aspirated Stop [i.e., a sequence like phth or khth] is both difficult to pronounce and also phonologically implausible

I do not know of any modern language that allows such sequences: e.g., in Korean /ph th/ and /kh th/ would be pronounced as [ptʰ] and [ktʰ], not *[pʰtʰ] and *[kʰtʰ]. Similarly in Sanskrit, the rule is to reduce /ChCh/ to /CCh/, though

in the manuscripts, both Vedic and later, an aspirate mute is not seldom found written double—especially, if it be one of rare occurrence: for example (RV.), akhkhalī, jájhjhatī (Whitney 1889: 53; emphasis mine).
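The /ChCh/ > /CCh/ reduction can be sketched in a few lines of Python. This is only an illustration under my own assumptions: aspirates are written as stop + h in an ASCII transliteration without accents, only ph, th, and kh are covered, and the function name is hypothetical.

```python
# Assumption: aspirated stops are transliterated as stop + "h".
ASPIRATES = ("ph", "th", "kh")


def reduce_aspirate_clusters(word: str) -> str:
    """Deaspirate the first of two adjacent aspirated stops: /ChCh/ -> /CCh/."""
    out = []
    i = 0
    while i < len(word):
        first = word[i:i + 2]
        second = word[i + 2:i + 4]
        if first in ASPIRATES and second in ASPIRATES:
            out.append(first[0])  # keep the stop, drop its aspiration
            i += 2                # the second aspirate is examined on the next pass
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


print(reduce_aspirate_clusters("ophthalmos"))  # opthalmos
print(reduce_aspirate_clusters("mokhtheo"))    # moktheo
```

The outputs correspond to the pronunciations that Mussies and Aubrey proposed, not to the attested spellings with double aspirates.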

Aubrey found examples of the spelling error πθ <pth> that his theory predicts.

Are there also examples of κθ <kth> as a misspelling for χθ <khth>?

Suppose Aubrey is right. Given that Classical Greek spelling is basically WYSIWYG (what you see is what you get [i.e., pronounce]), why were /pth/ and /kth/ properly spelled as φθ <phth> and χθ <khth> instead of *πθ <pth> and *κθ <kth>? The aspiration of the φ <ph> in ὀφθαλμός <ophthalmós> 'eye' is not etymological, as the final consonant of the root op- < *okʷ- < Proto-Indo-European *ʕʷekʷ- 'eye' is unaspirated. (4.1.0:49: Beekes derived this word from Pre-Greek *okʷt-alʸ-(m-). The resemblance between inherited *okʷ- and substratal *okʷt- is coincidental. In any case, the aspiration of φ <ph> is not original.)

Last night, I rediscovered Beekes' "Pre-Greek*: The Pre-Greek loans in Greek" to write "Making Machines". Beekes regarded φθ <phth> as a cluster in the substratum language that he calls "Pre-Greek"**. What if Pre-Greek allowed (allophonic***) aspirate sequences that were carried over into Greek**** and reflected in the spelling? Substratum-influenced pronunciations like [pʰtʰ] and [kʰtʰ] may have coexisted side by side with an inherited pronunciation [ptʰ] and [ktʰ]***** for a time. Then the latter dominated in speech, though the spellings with double aspirates persisted as the norm.

4.1.1:45: My theory implies that if the earliest Greek speakers had moved to Greece and there had been no one there, the clusters φθ <phth> and χθ <khth> would not exist (unless the double aspiration had been of purely Greek-internal origin), and πθ <pth> and κθ <kth> would have been the only possible spellings.

*4.1.1:07: Beekes used the prefix Pre- to refer to an unrelated substratum language, whereas I generally use pre- (without capitalization) to refer to a largely internal reconstruction of an unattested earlier stage of a language: e.g., pre-Tangut is ancestral to Tangut and not a substratum of Tangut.

I use Proto- to refer to the (potential) result of comparative reconstruction of the ancestor of two or more languages: e.g., Proto-Pumi-Tangut (whose existence is implied by the family tree in Jacques 2014: 2).

However, if I speak of, say, the pre-Japanese languages in the plural, I am referring to multiple substratal languages, not an earlier stage of Japanese such as Proto-Japonic.

It would be nice to have three prefixes to distinguish between the three types of earlier languages: substratal, internally reconstructed, and comparatively reconstructed.

**4.1.0:55: Beekes (2007: 12) noted that φθ <phth> was also possible in inherited words.

Although Beekes did not explicitly list χθ <khth> as a Pre-Greek cluster, it does appear in words he regarded as Pre-Greek: e.g., μοχθέω 'be weary with toil'.

***4.1.1:11: According to Beekes (2007: 5), aspiration was not phonemic in Pre-Greek.

****4.1.1:19: What if Pre-Greek had fricative allophones of stops long before the Greek aspirated stops became fricatives? Pre-Greek fricatives could have been borrowed into Greek as aspirated stops.

*****4.1.1:12: I assume that the Sanskrit constraint against aspirate sequences was also in the speech of those who brought Greek to Greece.

15.3.30:23:49: MAKING MACHINES

Seeing the Spanish word máquina 'machine' made me wonder about the origins of machine and mechanism. Those two words don't sound much alike in English, and their Japanese derivatives don't even look alike, as they are written with different kana:


<ma.shi.n> mashin ~ <ma.shi.-.n> mashīn 'machine', <mi.shi.n> mishin 'sewing machine'


<me.ka.(ni.zu.mu)> meka(nizumu) 'mechanism'

They appear to be from Latin borrowings of the same Greek word from different dialects at different periods:

newer mechanismus < Attic-Ionic mēkhanḗ

The ē of mēkhanḗ is from an ā preserved in Doric (see below and Sihler 2008: 50).

older māchina < Doric mākhanā́

Latin i is from unaccented short *a (Sihler 2008: 60), so māchina must have been borrowed as *māchana before the *a > i shift.

Watkins (2011: 52) derived mākhanā́ from Proto-Indo-European *māgh-anā (accent unspecified) 'that which enables', with a lengthened-grade form of the root *magh 'to be able', the source of English may and might.

In a Leiden-style reconstruction without *a, would *māgh-anā be something like *mēʕgh-eʕnēʕ with a root *√mʕgh? Is it worth it to reconstruct so many *ʕ to avoid *a?

But according to Wiktionary, Robert Beekes of the Leiden school derived the word from a Pre-Greek substratum in his etymological dictionary, which I haven't seen. Why couldn't mākhanā́ be from *√mʕgh?


Last night I mentioned the pan-Central Asian title

053-051 <qa.gha> 'qaghan'

as an example of a non-Chinese loanword in Khitan. I wonder if its medial -gh- indicates a late borrowing.

In native Khitan words, medial *-gh- and *-b- were lost between the vowels *a and *u:


'hundred' *jaghu > 015 <jau>; cf. Written Mongolian jaghun


'five': *tabu > 029 <tau>; cf. Written Mongolian tabun

This loss enabled the graphs for 'hundred' and 'five' in both the large and small scripts to represent the Chinese loanword


<jau.tau> < 招討 'bandit suppression commissioner'.

How many other Khitan words lost their medial consonants: i.e., how many companions did the commissioner have?

At this point, I don't know whether

- *-gh- (= */g/?*) and *-b- were lost between other vowel sequences

- *-d- was also lost (or became something else**) between vowels

In other words, I don't know the limits of lenition in Khitan. Knowing those limits would enable us to date borrowings: e.g., if *-gh- was lost in the environment *a_a, then qagha must have been borrowed after that loss, just as Liao Chinese 招討 *jautau was borrowed after the loss of *-gh- and *-b- in the environment *a_u in 'hundred' and 'five'.
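The dating argument can be made concrete with a small Python sketch. It encodes only the one environment attested above (*a_u); the regular expression and the function name are my own illustrative assumptions, not an established rule set for Khitan.

```python
import re

# Assumption: only the environment *a_u is modeled, since that is all
# that 'hundred' and 'five' attest. Other environments remain unknown.
LENITION = re.compile(r"(?<=a)(?:gh|b)(?=u)")


def lenite(pre_khitan: str) -> str:
    """Delete medial *-gh- and *-b- between *a and *u."""
    return LENITION.sub("", pre_khitan)


print(lenite("jaghu"))  # jau
print(lenite("tabu"))   # tau
print(lenite("qagha"))  # qagha (the environment *a_a is not covered)
```

If the rule were extended to *a_a and qagha still surfaced with -gh-, the word would have to postdate the loss, which is the reasoning applied to 招討 *jautau above.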

Qagha is certainly not native to Khitan, but what about words which might have -aghu- and -abu- sequences*** such as

189-151-123-348 <>**** (興宗 28.14) and 189-196-222 <a.bu.ń> (興宗 31.4)

Are these loanwords? If not, have their intervocalic obstruents been restored by analogy? Or are they of secondary origin from earlier clusters***** (e.g., *ambu > abu) or a lost series of obstruents****** (e.g., *au > abu but *abu > au)?

*3.30.0:53: In pre-Khitan, *gh and *g might have been allophones of */g/ appearing before different vowels: */ga/ was *gha, */ge/ was *ge, etc. In any case, gh and g were distinct phonemes in Khitan because /ga/ was possible in Chinese loans (like Manchu g'a):

Pre-Khitan | Khitan | IPA
*/ga/ | /gha/ | [ʁɑ]
*/ge/ | /ge/ | [gə]
([gɑ] not possible) | /ga/ | [gɑ]

**3.30.0:30: In Korean according to Alexander Vovin (2010), medial *-p-, *-t-, *-s-, and *-k- lenited to Middle Korean -β-, -r-, -z-, and -ɣ- which became -w/Ø-, -r-, -Ø-, and -Ø- in modern Korean. If Khitan was like Korean, then pre-Khitan *-d- might have lenited to a liquid in intervocalic position. But so far I have not seen any evidence for coronal lenition in Khitan. There was no z in Khitan, so if pre-Khitan *-s- lenited, it must have become something else.

***3.30.0:57: The rules for determining whether a graph was pronounced as VC or CV are still unknown, so perhaps those two blocks were read aughare or aubiń: i.e., without -gh- or -b- between a and u. Still other readings are possible since 123 may have been ra as well as ar, and 222 was ńi as well as (i)ń.

****3.30.1:05: If Khitan had Mongolian or Manchu-like vowel harmony, an e would not be expected in a word with a. Could *a be reduced to e [ə] in unaccented positions?

*****3.30.1:08: This was inspired by Vovin's derivation of Middle Korean intervocalic stops from earlier clusters which were mostly *nasal-stop sequences.

******3.30.1:15: In this scenario, voiced aspirates and voiced nonaspirates had distinct reflexes in intervocalic position but might have merged in other positions: e.g., *b(ʱ)- > b-, etc.


To Chinese eyes, the Khitan large script at first appears to be a random mix of Chinese characters and alien shapes.

Given that the Khitan large script is said to have been 'invented' c. 920 using the Chinese script as a model, one might expect it to be something like the modern Japanese script in which Chinese loans are generally written with Chinese characters and kana almost always represent non-Chinese words*:

Khitan large script characters resembling Chinese characters : Chinese loanwords

Khitan large script characters not resembling Chinese characters : native Khitan words

However, the reality is more complex:

Khitan large script characters resembling Chinese characters :

Chinese loanwords

e.g., 皇帝 (looks like Liao Chinese *hongdi 'emperor') for Khitan hongdi 'id.'

and native Khitan words

e.g., 五 (looks like Liao Chinese *ngu 'five') for Khitan tau 'id.'

Khitan large script characters not resembling Chinese characters :

native Khitan (or at least non-Chinese**) words

e.g.,  doro (?) 'seal'

and Chinese loanwords

e.g., gün 'army' for Liao Chinese 軍 *gün 'id.'

One could also hypothesize that Chinese character lookalikes were used to write Khitan syllables that had (near-)homophones in Chinese, whereas nonlookalikes were used to write non-Chinese Khitan syllables and words with un-Chinese segments and phonotactics: e.g., Khitan iri 'name' with an un-Liao Chinese -r-.

But in fact, syllables shared by Khitan and Chinese were sometimes written with nonlookalikes:

e.g., for ai (why not write it with a lookalike of Liao Chinese *ai-graphs like 愛?)

And syllables and words with un-Chinese elements were sometimes written with lookalikes:

e.g., 午 (looks like Liao Chinese *ngu 'horse (calendrical)') for Khitan iri 'name'

Did the creator(s) of the Khitan large script take the Chinese script as used in the early 10th century, keep random characters, change the sound values of some of them, and then make up new characters?

One might come up with such an explanation for Cyrillic: its inventors took the Latin alphabet, kept some letters (e.g., А), changed the sound values of some of them (e.g., В for [v] instead of [b]), and then made up new characters (e.g., Б for [b] and Г for [g]). However, that is not what happened. Both the Cyrillic and Latin alphabets are derived from the Greek alphabet. They are sisters, not daughter and mother.

If Janhunen (1994, 1996) is correct, the Khitan large script is to the Chinese script what Cyrillic is to Latin. Like Cyrillic, the Khitan large script was not invented on the spot; it was an adaptation of an existing script: the Parhae script, a Manchurian offshoot of the early Chinese script. The following seven Khitan large script characters might then be inherited from the Parhae script rather than taken from the 10th century Chinese script:

Sinograph Liao/Jin Chinese Khitan large script Khitan Jurchen large script Jurchen
*ho (< Middle Chinese *ɣɑ) ha ha
*she (< Middle Chinese *ɕjæˀ) ? sha
*sien (< Old Chinese *sˁir < *sˁər) ? shira or shïra
*gung ? (no similar Jurchen character) (*gung***)
gung gung
*ong (< Old Chinese *ɢʷaŋ) ong ong

Janhunen then proposed that the Jurchen large script was another derivative of the Parhae script rather than a direct successor of the Khitan large script.

Let's suppose the conventional wisdom is correct and that the Jurchen large script was invented c. 1120 with the then-current Chinese script as a model. Why was Jin Chinese 公 *gung 'duke' written with Jurchen 王, a lookalike of the characters for Jin Chinese *ong 'prince' and Khitan ong 'prince'?

Jin Guangping and Jin Qizong (1980: 56) proposed that Jurchen 王 gung was derived from Jin Chinese 工 *gung 'work' with an added stroke. Why not just copy 公 or 工?

Here is a wild speculation. In Old Chinese, 王 was pronounced *ɢʷaŋ. In mainstream Chinese *ɢʷ- weakened to *w-, and later, *waŋ became -ong in the northeast. What if a now long-extinct Manchurian Chinese dialect retained a stop initial for 王? Then perhaps 王 had two readings in Parhae, *gung based on the colloquial stratum of Manchurian Chinese, and *ong based on a literary stratum borrowed from mainstream Chinese. The first reading is the source of the Jurchen reading and the second is the source of the Khitan reading.

3.29.0:34: I am skeptical of the stop-retention scenario because there is no other evidence for *ɢʷ- surviving as a stop at such a late date in the northeast or anywhere else. Nor is there any evidence for *-ʷaŋ becoming *-ung in the northeast.

3.29.0:46: The Jurchen characters

for ong resemble those for ja (see my previous entry)

with two extra strokes on top.

However, Jin Qizong (1984: 236) regarded the ong-graphs as derivatives of the Khitan small script character

071 <ong>.

How would Janhunen explain that resemblance? Do the Jurchen large script and Khitan small script characters both go back to a Parhae prototype? Could the Jurchen character retain a 'roof' lost in the Khitan small script character?

*3.29.0:57: Although there is a strong tendency to write Chinese loans with Chinese characters in Japanese, some Chinese loans are in kana: e.g., サンゴ sango 'coral' (instead of 珊瑚).

Furthermore, Chinese characters do not always represent Chinese loans. In many cases they represent native Japanese words: e.g., 薔薇 for bara 'rose' as well as the much rarer borrowings shōbi and sōbi.

**3.29.1:01: Not all non-Chinese words in Khitan are native: e.g.,

053-051 <qa.gha> 'qaghan'.

may ultimately be of Xiongnu origin. (Has this word been identified in the large script?)

***3.29.1:14: Jin Qizong read two different Jurchen characters

as gung (in my notation), so in theory either could have transcribed Jin Chinese 工 *gung 'work'.

However, the second is only attested as a transcription of 宮 'palace' which was transcribed as

334-019-345 <g.iu.ung>

in Khitan.

So I suspect that the two Jurchen characters originally represented two different syllables, gung and giung, that merged into gung in the Yuan Dynasty Old Mandarin dialect of the Zhongyuan yinyun but not Phags-pa Chinese where they are still distinct as ꡂꡟꡃ <> and ꡂꡦꡟꡃ <>.


When I first became interested in Jurchen, I assumed that its (large) script was "obviously derived from the Chinese script and the Khitan large script, with many innovations of its own" (Kane 1989: 21).

Then I discovered Janhunen's (1994: 114) hypothesis which I still regard as plausible after almost twenty years:

It was the other Sinitic script [of Parhae] that, due to its firm local [i.e., Manchurian] roots, was later transmitted first to the Khitan, and then to the Jurchen. All of this means that the conventional view, according to which the Jurchen script was successive to the Khitan «large» script, cannot be correct. As graphic systems, and heirs of the Bohai [= Parhae] script, the Khitan and Jurchen «large» scripts should be viewed as parallel, rather than successive developments.

There is much more to Janhunen's argument than that, but for now I want to focus on one of its implications. If the Khitan and Jurchen large scripts are offshoots of the Parhae script developed at some point prior to the end of the Parhae state in 926, then the readings of their Chinese-based elements are likely to reflect pre-10th century Chinese phonology to some extent. Such a scenario has a precedent in Old Japanese man'yōgana whose readings contain archaisms from the Chinese learned by the Paekche centuries earlier: e.g.

支 for Old Japanese ki < *ki and *ke is closer to Late Old Chinese *kie than Middle Chinese *tɕie

止 for Old Japanese is closer to Old Chinese *təʔ than Middle Chinese *tɕɨəˀ

(But Gerald Mathias views 止 as a kungana whose reading is based on Old Japanese töma- 'stop' [my təma-]; if so, then the resemblance to Old Chinese is coincidental.)

富 for Old Japanese is closer to Late Old Chinese *puəh than Middle Chinese *puʰ

Conversely, if the Khitan and Jurchen large scripts had no deeper roots, the readings of their Chinese-based elements should be derivable purely from Liao and Jin Chinese, as there would be no way for their creators to know about earlier readings.

Jin Guangping and Jin Qizong (1980: 56-57), Kane (1989: 23), and Kiyose (2004: 93) list Jurchen characters* with readings as well as shapes of Chinese origin**:

Jurchen Jurchen reading Sinograph Liao/Jin Chinese*** Middle Chinese Old Chinese
aci *ci *tɕʰiek *tɯ-qʰjak
ging *ging *kɨeŋ *Cɯ-qraŋ or *qɯ-raŋ
gung *gung *koŋ *koŋ
hi *si *sej *sʌ-ləj
i *u < *wuo *Cɯ-ɢʷa
i *u < *wuoˀ *Cɯ-waʔ
ja *jr *tɕi < *tɕɨʰ *təs
ki *ki *gɨ *gə
ngu *ngu *ŋo *ŋʷa
sa *cha *ɖæ *rla
u *ngu *ŋoˀ *ŋaʔ
dai *da(i) *dɑjʰ *lats
fu < pu *fu *fu < *puoˀ *poʔ
jul *ju *tɕu < *tɕuo *Cɯ-to
shang *shang *ɕɨaŋ < *dʑɨaŋˀ *Cɯ-daŋʔ or *Nɯ-taŋʔ
tai *tai *tʰɑjʰ *l̥ats
ha *ho *ɣɑ *ɢaj
sha *she *ɕjæˀ *l̥jaʔ
shira (Kiyose) or shïra (Jin and Jin) *sien *sen *sˁir < *sˁər < *Cʌ-sər

Out of that incomplete sample of nineteen characters,

- eleven have readings based on Liao/Jin Chinese (green)

- five have readings that could be based on either Liao/Jin Chinese or Middle Chinese (bluish green)

- two have readings that resemble Middle Chinese (blue)

- at least one has a reading that resembles Old Chinese (yellow)

I'll discuss a less likely instance in my next entry.

The last three characters (which all have Khitan large script predecessors that look exactly like Chinese 何舍先) are hardly solid proof for Janhunen's hypothesis.

The Khitan and Jurchen may have used Liao/Jin Chinese 何 *ho for ha in their languages because there may not have been a character for *ha in Liao/Jin Chinese. (The only character read ha in the Phags-pa Chinese of the Yuan Dynasty is rare: 閜.)

Nonetheless the other two are difficult to explain if they were devised c. 1120 or perhaps even c. 920. Why write Jurchen sha with a derivative of Jin Chinese 舍 *she when Jin Chinese 沙 *sha was a closer phonetic match? And is the close match of Jurchen shira ~ shïra and Old Chinese *sˁir < *sˁər just a coincidence?

*3.28.2:50: Since this post does not deal with the Jurchen small script, I will refer to Jurchen large script characters simply as Jurchen characters.

**3.28.2:58: There are Jurchen characters with shapes of Chinese origin and native readings that are translations of Chinese: e.g.,


looks like Jin Chinese 一 *i 'one' but represented the native Jurchen word emu 'one'.

***3.28.3:15: I wrote Liao/Jin Chinese forms in an orthography resembling my transcriptions of Khitan and Jurchen to facilitate comparison. Khitan and Jurchen voiced obstruents may have been unaspirated and voiceless: e.g., Jurchen jul may have been [tɕul], a close match for Middle Chinese 朱 *tɕu(o).

15.3.26:23:49: QUINTUP-<UL> TROUBLE (PART 3)

In part 1, I proposed that Khitan small script character


might have represented <ül> because

131-366 <u.?> 'winter'
corresponds to Written Mongolian ebül 'id.'

In a generic 'Altaic' language, harmonic rules prevent the mixture of segments from two classes which I will call A and B*: e.g.,

A: a, u, ł, ɣ ...
B: e, ü, l, g ...

'Neutral' segments can occur with segments of either class A or B.

Hence <ül> should be a class B character that should only co-occur with class B and/or neutral characters within a Khitan small script word block.

I used to think that

098 and 261

represented class A <ał> and class B <(e)l>, but in fact they not only coexist with each other but even with 366 in

340-098-366-261-349-021 <x.ał.üó>** (興宗 26.6)

(021 <mó> looks like an error for the dotless verb ending 020 <ei>)

which is unexpected from an 'Altaic' perspective. I would have expected


class A *130-098-206-098-051-122 <x.ał.uł.ał.ɣ>*** or class B *340-261-366-261-349-020 <x.el.ü>.

366 can also coexist with both class A 051 <ɣa> and class B 349 <ge> in the same text (道宗):


161-366-261-051-189-123 <aú.ül.el.ɣ> (道宗 12.30)

(instead of 161-206-261-051-189-123 *<aú.uł.ał.ɣ>)

and 131-097-372-366-334-140 <u.úr.û.ül.g.en> (道宗 18.6)

That would also be unusual for an 'Altaic' language.
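To make the check explicit, here is a rough Python sketch that flags blocks mixing class A and class B characters. The class table contains only the tentative assignments mentioned in this entry and its notes (including my assumption that 206 is class A); unlisted characters are treated as neutral or unknown. The function names are hypothetical.

```python
# Tentative class assignments; every one of them is open to revision.
GRAPH_CLASS = {
    "098": "A",  # <ał>
    "051": "A",  # <ɣa>
    "206": "A",  # <uł> (assumed class A)
    "261": "B",  # <(e)l>
    "349": "B",  # <ge>
    "366": "B",  # <ül>?
}


def harmonic_classes(block):
    """Return the set of harmonic classes found in a hyphen-separated block."""
    return {GRAPH_CLASS[g] for g in block.split("-") if g in GRAPH_CLASS}


def is_harmonic(block):
    """A block is harmonic if it does not mix class A and class B characters."""
    return len(harmonic_classes(block)) <= 1


for block in ("340-098-366-261-349-021", "161-366-261-051-189-123",
              "340-261-366-261-349-020"):
    print(block, is_harmonic(block))
```

The first two blocks are the attested mixed spellings and come out as non-harmonic; the third is the class B spelling I would have expected. Reassigning 366 to a neutral class is a one-line change in the table, which makes it easy to test alternative class assignments against the corpus.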

I am conflicted.

On the one hand, Khitan has sets of suffixes implying the presence of an 'Altaic'-style harmonic system: e.g., the causative-passive suffixes (class A?) and (class B?) in the above pair of words.

On the other hand, there seem to be harmonic violations. Are those violations artifacts of incorrect class assignments (e.g., is 366 a neutral character?), or are they real and perhaps even predictable?

The earliest known small script text is dated 1053, over a century after the invention of the small script c. 925. Do all small script texts discovered so far reflect Khitan after its harmonic system began to break down? Would the very first texts in the small script have more harmonic spellings?

*3:27.2:13: I got the A/B terminology from EG Pulleyblank who used it to describe Old Chinese syllable types. Norman (1994) was the first to draw parallels between Old Chinese and Altaic syllable types. I have gone even further and proposed harmony rules for Old Chinese.

I use the terms A and B to avoid specifying the nature of the classes: e.g., front vs. back, ±RTR, etc. As Khitan is in the Manchurian linguistic area, I suspect it had RTR harmony like its neighbor Jurchen.

**3.27.2:17: This is Andrew West's reading. Qidan xiaozi yanjiu has

340-067-366-261-349-020 <x.eü.ü>

which is not only harmonic but also has the dotless verb ending 020 <ei> instead of dotted 021 <mó> which is not a verb ending. I have not seen the handwritten copy of 興宗, and the original stele is inaccessible, so I do not know who is correct.

***3.27.2:24: I assume 206 is a type A character since it is flanked by a-characters in

029-206-189 <tau.uł.a> 'hare'.

15.3.25:23:59: QUINTUP-<UL> TROUBLE (PART 2)

In part 1, I built upon Aisin Gioro's work by equating the following five Khitan small script characters and regarding the first three as variants of each other:

  013 <ul> = 050 <ul> = 206 <ul> = 228 <ul> = 366 <ul>

The second and third appear in the same word:

050-131-206 <ul.u.ul> (道宗 16.21, 20.13 [1101 AD], 蕭仲恭 33.33 [1150 AD])

Did scribes of two different inscriptions nearly fifty years apart really use two variants so close together in three instances, or did 050 and 206 have two different readings?

3.26.1:10: Was 050-131-206 for ulul (?) above related to (or at least partly homophonous with)

050-131-366-311-162 <ul.u.ul.b.c> (宣懿 18.2 [also 1101 AD])

050-131-366-311-222 <ul.u.ul.b.ń> (道宗10.25, 15.19, 28.24, 宣懿 17.11)

which have 366 instead of 206 for their second <ul>? Or did 206 and 366 have different readings?

15.3.24:23:24: QUINTUP-<UL> TROUBLE (PART 1)

In my last entry, I proposed that the rare Khitan small script character 013 might be a variant of 050 which Aisin Gioro (2008) read as <ul>. Both in turn resemble 206 which Aisin Gioro (2003) also read as <ul>. Could 206 be yet another variant of 050?


050 <ul> = 013 <ul> = 206 <ul>

If Aisin Gioro is correct, then

029-206-189 'hare'

was <tau-ul-a> = taula, and Khitan may have lost a final -i retained in Written Mongolian taulai.

How can taula be reconciled with the History of the Liao Dynasty transcription 陶里 *tauli for 'hare'? There is no guarantee that Chinese transcriptions and the Khitan small script represent the same variety of Khitan. Perhaps *ai simplified differently in different dialects of Khitan:

Proto-Khitan-Mongolic *taulai
  Proto-Khitan *taulai
    Standard taula
    Nonstandard tauli
  Proto-Mongolic *taulai
    Written Mongolian taulai

Aisin Gioro (1999, 2004) identified two more small script characters for <ul>. Why did the Khitan have five characters for the same VC sequence?

050 <ul> = 013 <ul> = 206 <ul> = 228 <ul> = 366 <ul>?

The first three may be allographs, but the last two do not resemble them. Did 050/013/206, 228, and 366 originally represent three different sequences? If Khitan were like Mongolic, an obvious two-way distinction would be between <ul> and <ül>. 366 might have been <ül> since

131-366 <u.ul> 'winter'

corresponds to Written Mongolian ebül 'id.' But what would have been a third value contrasting with <ul> and <ül>? Kane (2009: 29) wrote that Khitan "was exceptionally rich in rounded vowels." Was there a three-way contrast between front [y], back [u], and near-high [ʊ] (like Manchu ū)? Did these three characters

131 <u>, 245 <ú>, 372 <û>

represent those vowels without a following lateral? (I almost wrote [l], but /l/ may have had different allophones depending on the adjacent vowel.)

At first, one might identify 131 as ü since it preceded 366 which might have been ül. However,

226 <ü>

transcribed Liao Chinese ü, whereas the three other <u>-type characters were used to transcribe Liao Chinese *u. Were they always interchangeable, or was that interchangeability due to later mergers?

Has anyone looked at Khitan spelling over time? Spelling variation may give us clues to changes in Khitan over a two or even a three-century period. If Nova N 176 is from, say, 1200 - the eve of the fall of the Qara Khitan - its large script spelling could differ from the norm established c. 920. Moreover, some variation may be due to Jurchen speakers' perceptions of Khitan phonology: e.g., Jurchen speakers may have heard only one or two kinds of /u/ in Khitan which might have had three. (3.25.0:16: First-language influence in Khitan texts written by Jurchen speakers has yet to be explored.)


Last night, I accidentally miswrote

070-050 <w.?>

in the Qidan xiaozi yanjiu transcription of 興宗 15.19 as

070-013 <w.?>

with a slightly different and much rarer character that only appears twice in the texts in Qidan xiaozi yanjiu:

028-067-013 <> (道宗 27.9) and 013-224-327 <?> (耶律撻不也 12.1)

Is 013 in any of the texts that have been found in the three decades since the publication of Qidan xiaozi yanjiu? Could 013 be a variant of 050? Is that why Aisin Gioro did not include 013 in 契丹小字の音価推定および相関問題?

028-067 <> (a transcription of Liao Chinese 守 *sheu; could it also be a native word?)

occurs by itself. Does that imply 028-067-013 <ś.eu.?> is a suffixed form, or are they unrelated partial homophones? There are eight forms beginning with 028-067; some have known suffixes (e.g.,

028-067-273 <> ending in what may be genitive <-un> in 蕭令公 25.16 and 許王 50.5)

and others do not (e.g.,

028-067-041 <> 'dew' in 宣懿 25.14 and 許王 cover 1.5).

Aisin Gioro read 041 as <us>* and 050 as <ul> ~ <l-> for reasons unknown to me. The Khitan small script has many sequences of the same vowel in two adjacent characters, so sequences such as

028-067-041 <> and 028-067-013 <> (if 013 = 050)

look plausible. Moreover, 028-067-041 <> may be a variant spelling of

028-067-244 <> (巴拉哈達洞壁墨書 I.2.4; <s> may be a plural ending)

Unfortunately, I know of no

*028-067-261 <>

corresponding to 028-067-013 <>.

*3.24.1:56: If Aisin Gioro's reading of 041 is correct, then

028-067-041 <> 'dew'

would be less of a match for Written Mongolian sigüder(i) 'dew'. If Mongolian -der(i) is not a suffix, then perhaps the Khitan form is a reduction of an earlier sigüder(i)-like form to sheu with a plural suffix -s: i.e., 'drops of dew'. The two words may also be unrelated.


Two nights ago, I wrote,

It seems that Khitan VC characters can also double as CV characters. I've guessed that they are CV before consonants and VC before vowels, but that does not always seem to be the case.

Offhand, certain types of VC characters are less likely to have reversible readings than others: e.g., VN characters seem to have nonreversible readings with the sole definite exception of

222 <ń> for iń ~ ńi (see Kane 2009: 61 for the Chinese transcription evidence)

There are dedicated characters for some NV sequences other than ńi and ngV*: e.g.,

139 <na> and 191 <mú>

Conversely, vowel-liquid characters may have had reversible readings:

084 <ar> ~ <ra>, 098 <al> ~ <la>, 261 <el> ~ <le>?

See "<Ra>-Construction 5" and "Did Khitan Have Two Laterals?".

The VG sequence character

020 <ey> (Kane's <ei>)

represented <y> in word-initial position: e.g.,

020-084-131-344 <> '耶律 Yelü'

Could <w> also represent a VG sequence: i.e., Vw? That is unlikely because such sequences already have characters:

019 <iu>, 023 <iu> (?), 067 <eu>, 138 <iû>, 161 <au>, 164 <au> (?), 210 <aú>, 289 <iú>

(I could also transliterate them as <iw>, etc. to match <ey> instead of <ei>.)

Might some of those characters be read as wV in word-initial position? Could 019, for instance, have been read wi 'to not exist, die'? I doubt it because Chinese w- was always written with initial

070 <w>

instead of any of the above <Vu> (= <Vw>) characters. That character never appears in medial position. If Khitan had medial -w-, it must have been written in some other way: e.g., could 019 <iu> stand for -wi- after vowels? Were

262 and its variant 263

ever read as wi instead of [uj]? Was

210-262-140 <aú.ui.en> 'woman of noble rank-GEN' (耶律撻不也 16.14)

from my last entry pronounced something like awiən?

The only non-Chinese Khitan word with an initial <w> is

070-050 <w.?> (興宗 15.19)

according to Qidan xiaozi yanjiu. Its reading is open to question, as Andrew West read it as

073-? <ên.?>

Alas, I can't consult the original because only a handwritten copy remains, and I have not been able to examine a good reproduction of it.

3.23.1:35: That mystery word occurs after a space that may indicate respect. Does it have an aristocratic referent?

Could it be an error for

072-050 <?.?>

from 道宗 25.17 and 耶律撻不也 16.4? Neither of those instances was preceded by a space.

*3.23.1:27: ng was only in Chinese loanwords. Initial and occasionally final ng were written with

264 <ng>.

The absence of ng in native Khitan words is not surprising since Janhunen (2003: 6) did not reconstruct it for Proto-Mongolic which may be the descendant of a 'sister' of Khitan. However, there is no guarantee that Khitan and Proto-Mongolic lacked the same consonants: e.g., Khitan had p, but Proto-Mongolic did not. Similarly, the absence of Proto-Mongolic *w does not guarantee the absence of w in native Khitan words.


I looked through Qidan xiaozi yanjiu (1985) hoping to find examples of

311-151-290 <b.ghu.án> 'sons'

in the construction

numeral♂ + plural masculine noun (see Kane 2009: 139-142 for examples)

but I only found words other than numerals before 'sons':

1. Genitives before 'sons'

210-262-140 <aú.ui.en> 'woman of noble rank-GEN' (耶律撻不也 16.14)

295-097-311-222 192-339 <p.úr.b.iń shï.i> 'Purbin-Madame (< 氏)-GEN' (耶律撻不也 18.27)

The context implies that Purbin is a woman's name. It does not appear anywhere else in the Qidan xiaozi yanjiu corpus.

241-033-222-140 <ń.en> 'lady (婦人)-GEN' (耶律撻不也 18.32)

374 071-154 <> 'grand prince (太王)-GEN' (蕭仲恭 4.53-54)

334-345 104-289-273 <g.ung.j.iú.un> 'princess (公主)-GEN' (蕭仲恭 6.18-19)

311-168-339 <b.qo.i> 'son-GEN' (蕭仲恭 30.4)

2. Plural nouns in apposition before 'sons'

122-254 <ai.d> 'father-PL' (蕭令公 23.6, 耶律撻不也 9.19)

021-247 <mó.t> 'mother-PL' (蕭令公 24.3, 蕭仲恭 29.8)

131-111-254 <u.?.d> '?-PL' (蕭仲恭 43.36)

How did 'fathers-sons' differ semantically from a hypothetical

*122-254-140 311-151-290 <ai.d.en b.ghu.án> 'fathers' sons'?

Was the genitive suffix unnecessary after certain nouns (e.g., 'fathers' and 'mothers')?

3. Verbs before 'sons'

295-016-189-123 <> 'return-PERF' (許王 48.26)

287-098 <?.al> '?-CONV' (耶律撻不也 24.4)

287 does not occur alone. It tends to precede a-graphs, so it may have been a Ca-graph. I do not know whether it really represented a verbal root. Nor do I know the exact function of the converb -al.

4. Other words before 'sons'

191*-262-348-162 <mú.ui.e.c> '?' (仁懿 8.20)

This could be a noun in apposition. I don't know of any other attestations.

153-254-222 <j.d.iń> '?-PL-GEN'? (蕭仲恭 34.24)

If 153 (which can occur alone) is a noun, could this be a genitive plural noun? But <j.d> '?-PL' is not attested by itself.

Are numerals - masculine or otherwise - attested before 'sons' in the small script texts discovered after 1985?
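The four groups above can be tallied to make the gap explicit. This is a minimal sketch in Python, assuming a hand-built list `before_sons` that records only the words listed in this entry (one row per word, not per attestation); the category labels are my own shorthand.

```python
from collections import Counter

# One row per word found before 'sons' in this entry: (category, gloss).
# Hand-copied from the lists above; not an exhaustive corpus search.
before_sons = [
    ("genitive", "woman of noble rank-GEN"),
    ("genitive", "Purbin-Madame-GEN"),
    ("genitive", "lady-GEN"),
    ("genitive", "grand prince-GEN"),
    ("genitive", "princess-GEN"),
    ("genitive", "son-GEN"),
    ("plural apposition", "father-PL"),
    ("plural apposition", "mother-PL"),
    ("plural apposition", "?-PL"),
    ("verb", "return-PERF"),
    ("verb", "?-CONV"),
    ("other", "? (mú.ui.e.c)"),
    ("other", "?-PL-GEN?"),
]

# Count words per category; a Counter returns 0 for a missing category.
counts = Counter(category for category, _ in before_sons)
numeral_count = counts["numeral"]
total = sum(counts.values())
```

Genitives make up almost half of the thirteen words, and `numeral_count` stays at zero for the 1985 corpus.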

*3.22.4:02: Why did Kane (2009: 58) transliterate 191 as <mú>? The fact that 191 is often followed by <u>-graphs

262 <ui>, 366 <ul>, 372 <û>

may imply that it ended in <u>, but what evidence is there for an initial <m>?


In my last entry, I noted that <en>, normally a genitive suffix written in a block with a preceding noun, was isolated in 萬部華嚴經塔塔壁題字 2.8:


244-327-073 140 <ên en> (instead of *244-327-073-140 <ên.en>) 'thousand GEN (?)' before 311-290 178 378 < ku "> '?* people' (the reduplication of ku 'person' is reminiscent of Japanese hitobito 'people').

I found one other instance of isolated <en> in Qidan xiaozi yanjiu (1985):

134 140 311- <TWO en b.qo> (仁懿 6.3-5) 'two GEN son' = 'two sons'?

An apparent third instance turned out to be the upper half of a now-illegible stack:

162 345 290 140-? <c ung án en.?> 'Chong An (a Chinese name?) ?' (慶陵壁畫題字 IV)

I have wanted to look into such unusual vertical stacks for almost a year now.

Perhaps there are more examples of independent <en> in the small script texts that have been discovered over the last three decades. Could some represent a word ne? (It seems that Khitan VC characters can also double as CV characters. I've guessed that they are CV before consonants and VC before vowels, but that does not always seem to be the case.)

Are the first two instances examples of numerals followed by genitives? Why is <TWO> nonmasculine in 仁懿? What is the semantic difference between

numeral♂ + plural masculine noun (see Kane 2009: 139-142 for examples) and

numeral + genitive + singular masculine noun?

Are gender and number neutralized in the latter construction?

I thought <TWO en b.qo> might be 'two' followed by a compound noun '?-son', but I would expect <TWO> to be masculine and '?-son' to be plural:

<TWO♂ en b.ghu.án> '?-sons'.

3.21.2:11: Should <ghu> in <b.ghu.án> be interpreted as a VC character <ugh> before the vowel-initial character <án>? If so, then 'sons' was bughán which might have been from *buqo-án, implying that <b.qo> 'son' was buqo.
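My guessed rule (VC before a vowel-initial character, CV otherwise) can be written out as a small Python sketch. The table `REVERSIBLE` and the function names are hypothetical, the table lists only characters discussed in these entries, and the choice of CV at the end of a block is an assumption of mine. As noted above, the rule does not always hold.

```python
import unicodedata

# Hypothetical table of possibly reversible characters: (VC reading, CV reading).
REVERSIBLE = {
    "ghu": ("ugh", "ghu"),
    "ar": ("ar", "ra"),
    "al": ("al", "la"),
}

def starts_with_vowel(reading):
    """True if the reading begins with a vowel letter, ignoring diacritics."""
    base = unicodedata.normalize("NFD", reading)[0]
    return base in "aeiou"

def read_block(translit):
    """Join a dotted transliteration such as 'b.ghu.án' into one reading.

    A reversible character is read VC before a vowel-initial character and
    CV otherwise (including at the end of a block, which is an assumption).
    """
    parts = unicodedata.normalize("NFC", translit).split(".")
    out = []
    for i, part in enumerate(parts):
        if part in REVERSIBLE:
            vc, cv = REVERSIBLE[part]
            following = parts[i + 1] if i + 1 < len(parts) else ""
            part = vc if following and starts_with_vowel(following) else cv
        out.append(part)
    return "".join(out)

sons = read_block("b.ghu.án")   # <ghu> precedes vowel-initial <án>
```

Under these assumptions `sons` comes out as bughán, the reading proposed above.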

The phonetic difference, if any, between


011 ~ 127 <an> and 290 <án>

is unknown. Kane's acute accent for the transliteration of the latter is arbitrary.

*3.21.2:18: This is the only example of <> in Qidan xiaozi yanjiu. I assume it is an adjective modifying kuku 'people'. I thought <> might be an error for <> 'sons', but I would not expect a bare (i.e., non-genitive) plural noun before 'people' unless the meaning was 'sons [and] people'.

What is the semantic difference between kuku 'people' and


047 <ghor> ~ 047-189 <ghor.a> ~ 047-131 <ghor.u> 'people'?

Did kuku refer to individuals while the ghor-words referred to a collective?


Last night, I asked,

Is there any evidence for Sino-Khitan numerals in the Khitan small script?

Although Kane (2009) only listed Khitan native numerals in his glossary of Khitan small script vocabulary, his list of Liao Chinese borrowings in the Khitan small script includes

244-189-184 <> < Liao Chinese 三 *sam 'three' and

244-327-073 <ên> < Liao Chinese 千 *tshien 'thousand'

as parts of longer loanwords but not as independent words. Perhaps they are Sino-Khitan numeral roots but not numerals themselves. Similarly, English has the Greek and Latin root tri- 'three', but tri is not a free morpheme like three (or Sino-Korean sam 'three' or Sino-Japanese san 'three').

I checked to see if <ên> ever appeared outside Chinese loanwords in the texts in Qidan xiaozi yanjiu, and I found

- two instances of <ên> for Chinese 仙 *sien 'Taoist immortal' (道宗 6.12, 31.33)

- two instances of <ên> for Chinese 前 *tshien 'front, before' (蕭仲恭 20.24, 33.39)

- one instance of <ên> (gloss unknown; 萬部華嚴經塔塔壁題字 2.8) before

140 <en>

which might be the native Khitan genitive suffix after a noun (仙 or 前?; the latter was borrowed into Korean as a free morpheme). I assume that last <ên> is not 'thousand' since I have not seen a numeral-genitive construction in Khitan. (3.20.23:30: Now I have!)

(3.20.1:25: Maybe <en> is not a genitive suffix. Such a suffix would normally be written in the same block as the preceding noun. I will look at other cases of independent <en> next time.)

So far it seems that <ên> is not 'thousand' outside the context of

244-327-073 264-019 <ên ng.iu> < Liao Chinese 千牛 *1tshien 1ngiu 'thousand-ox'

from Kane's list corresponding to

<sien ng iu> (耶律昌允 2)

in the large script.

However, there are small script texts discovered after the publication of Qidan xiaozi yanjiu in 1985 which I haven't checked. Nonetheless, at this point I am skeptical about freestanding Sino-Khitan numerals in the small script. Moreover, I am not even certain that

<si sien ngu bai> < Liao Chinese 七千五百 *4tshi 1tshien 2ngu 4pai (耶律昌允 4)

represents Sino-Khitan numerals in the large script. Why did Liu and Wang (2004: 91) identify it as 'seven thousand five hundred'?


Kane (2009: 177) listed no Khitan large script character for 'thousand' corresponding to

207 <ming> (cf. Mongolian mingghan 'thousand', Jurchen minggan 'id.')

in the small script, even though Kane made use of Liu and Wang (2004: 91) which identified a similar large script character looking like Chinese 夹* as 'thousand' in line 4 of the 1062 epitaph for 耶律昌允 Yelü Changyun:

Liu and Wang (2004: 79-81): <si sen ŋu pe>

Kane (2009: 178-179): <sï (t)s(i)en ŋu bai>

'seven thousand five hundred' (cf. Liao Chinese 七千五百 *tsʰi tsʰien ŋu pai)

On the other hand, both Liu and Wang (2004: 91) and N4631 identified


as 'thousand' even though Kane (2009: 176) listed it as 'yellow' in calendrical contexts. Andrew West also listed

as another variant of 'yellow'. Were 'thousand' and 'yellow' homophones in Khitan? (Other evidence points to an *n-initial word for 'yellow, gold' in Khitan. See Kane 2009: 165-166. Maybe Khitan had two words for 'yellow', and the calendrical word sounded like Sino-Khitan 'thousand'.)

Did Khitan have two words for 'thousand', a borrowing from Chinese and a native word? The phrase above may be entirely in Sino-Khitan; would


<dalo (?) ming (?) tau jau>

with some unknown character for 'thousand' be its native equivalent? (Ironically the Khitan wrote their native numerals with large script characters sometimes matching Chinese numeral characters but wrote Sino-Khitan numerals with large script phonograms with almost no resemblance to Chinese numerals.**) When did the Khitan use Sino-Khitan and native numerals? Is there any evidence for Sino-Khitan numerals in the Khitan small script? I'll start to answer that last question next time.

*3.19.1:22: What looks like 夹 in Liu and Wang's (2004: 91) handwritten copy of the epitaph might correspond to

in their list of characters on p. 80. I cannot find 夹 in N4631.

**3.19.1:59: 吾, the Khitan large script character for Sino-Khitan 'five', resembles Chinese 五 'five' because it is a graphic cognate of Chinese 吾 whose phonetic is 五.

高 is the glyph for Sino-Khitan *bai 'hundred' in Liu and Wang's (2004: 91) handwritten copy of the epitaph; a variant

appears in their list of characters on p. 81.

I used to think it was significant that 高 <bai> looked like 'high' and was read <bai> because Tangut

1890 2be4 < *Nɯ-braŋ or *Cɯ-mbraŋ 'high' (cf. Japhug mbro < *mbraŋ 'id.')

had a similar reading and meaning, but now I think the resemblance is merely coincidental. The Tangut word is native. If there was a Khitan word for 'high' like *bai, I doubt that the Khitan would have borrowed it or any other basic vocabulary from the Tangut who were far to the west.


Liu and Wang (2004: 91) identified

in the 1062 epitaph for 耶律昌允 Yelü Changyun as a Khitan large script equivalent of Liao Chinese 都統 *du tuŋ 'commander-in-chief' (translated as 'fighter controller' in the small script?). The large script graphs look exactly like Chinese 弟 'younger brother' and 来 'to come'.

Liu and Wang identified 弟 in line 5 of that epitaph as 'younger brother'. Presumably 弟 is a phonetic character in 弟来 'commander-in-chief' (if that gloss is accurate).

'Younger brother' is

(= ?)

101 (and 072 <EAST>, implying 'younger brother' and 'east' were homophones?)

in the Khitan small script. Kane (2009: 47) read this as <deu> but gave no explanation for that reading which resembles that of the first character of Jurchen

<deu.un> deun 'younger brother'

I will regard the Khitan reading of the large script character 弟 and its small script equivalent 101 as unknown.

I also do not know the reading of 来 in the large script.

It may be significant that in the small script, younger brothers precede older brothers, whereas the reverse order (i.e., Chinese order) is in the Chinese-like large script (see Kane's analysis of the 1114 epitaph for 耶律習涅 Yelü Xinie):



Could the small script reflect a native Khitan word sequence while the large script reflected a borrowing from Liao Chinese 兄弟 *xiuŋ di? (Cf. native Japanese 白黑 shirokuro 'white and black' vs. borrowed Sino-Japanese 黑白 kokubyaku or kokuhaku 'black and white'.)

The first half of Liao Chinese 都統 *du tuŋ appears in line 13 of Yelü Changyun's epitaph as

<du giam> < Liao Chinese 都監 *du giam 'director-in-chief'

whose first character matches its Chinese equivalent. The Khitan large script seal version of 都 is on Andrew West's site. Is there a large script term for 都統 like

with a near-lookalike of 統? Or

with the two characters that are possibly equivalent to the near-lookalike of 統?


Last night, I couldn't figure out why Liu and Wang (2004: 87) identified

and (=)

in the large script as <c> and <u> in my transliteration.

I had forgotten about how Liao Chinese 都統 *du tuŋ 'commander-in-chief' corresponded to the native* Khitan term

<cau.j ɣur.ú>

in the small script.

Apparently Liu and Wang equated the large and small script terms:


<c.auj ɣur.ú>? = <cau.j ɣur.ú>

Although all of those small script readings can be more or less confirmed** by their use in other contexts, I am less confident about the large script readings. Maybe


are <HEAVEN ɣur.ú> 'heaven controller' and <HEAVEN BELOW ɣur.ú> 'world controller' (the world being all under heaven), but is the common character

really <auj> in all 13 occurrences in 耶律褀墓誌?

Maybe <auj> was something like *auji if <cau.j> represented *cau-ji 'fight-er' with a deverbal suffix (Kane 2009: 94; his translation is 'those who engage in battle') that was cognate to Mongolian -ci and Turkish -cI/çI*** for names of vocations.

On the other hand, if the ends of the large and small script era equivalents of the Chinese era name 統和 *tuŋ xwo do not match,


large <?.?> = <?.?> ≠ small <s.bu.o.ɣo>?

then there is no reason to expect the large and small script equivalents of Chinese 都統 *du tuŋ 'commander-in-chief' to match, and

might represent something other than <cau.j ɣur.ú>.

*3.17.1:07: 'Non-Chinese' would be a more precise term, as I cannot be sure that any non-Chinese Khitan word is native rather than a borrowing from Xiongnu or even Rouran.

Regardless of the precise origin of <cau.j ɣur.ú>, it contrasts with the loanword

<du t.uŋ>

from Chinese 都統 *du tuŋ. The first character is also a transcription of Liao Chinese 度 *du and <t.uŋ> is also a transcription of Liao Chinese 同 *tuŋ.

**3.17.1:04: Kane (2009) presented evidence for the readings of the four components of <cau.j ɣur.ú>:

022 <cau> corresponds to 炒~嘲 *cau in Chinese transcription.

337 <j> is a variant of 152 which corresponds to 只 *wu in Chinese transcription.

014 <ɣur> corresponds to 斛祿~胡虜~胡魯 *xulu in Chinese transcription. I don't know why Kane didn't read it as <xur>, as there was no ɣ in Liao Chinese.

245 <ú> corresponds to 武 *wu in Chinese transcription.

***3.17.1:17: Turkish c is voiced [dʒ]. Turkish I is a cover symbol for high vowels (i, ı, u, ü). Clauson (1962: 145) regarded voiceless ç as original.

