Last night I mentioned the pan-Central Asian title

053-051 <qa.gha> 'qaghan'

as an example of a non-Chinese loanword in Khitan. I wonder if its medial -gh- indicates a late borrowing.

In native Khitan words, medial *-gh- and *-b- were lost between the vowels *a and *u:


'hundred' *jaghu > 015 <jau>; cf. Written Mongolian jaghun


'five': *tabu > 029 <tau>; cf. Written Mongolian tabun

This loss enabled the graphs for 'hundred' and 'five' in both the large and small scripts to represent the Chinese loanword


<jau.tau> < 招討 'bandit suppression commissioner'.
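A minimal sketch of that loss as a rewrite rule, assuming it applied only in the environment *a_u and only to *-gh- and *-b- (the two cases attested above). The name `lenite` and the romanized input strings are my own illustrative choices, not an established notation.

```python
import re

# Assumed rule: delete medial gh or b when flanked by a (left) and u (right).
# Only the *a_u environment is covered; other environments are unknown.
LENITION = re.compile(r"(?<=a)(?:gh|b)(?=u)")

def lenite(proto: str) -> str:
    """Apply the hypothetical *a_u lenition rule to a romanized pre-Khitan form."""
    return LENITION.sub("", proto)

for gloss, proto in [("hundred", "jaghu"), ("five", "tabu")]:
    print(f"{gloss}: *{proto} > {lenite(proto)}")

# The two outputs together spell the syllables used for the Chinese loan.
print(lenite("jaghu") + lenite("tabu"))  # jautau
```

A form such as qagha passes through unchanged because its -gh- stands between two a's, which is outside the only environment the rule covers.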

How many other Khitan words lost their medial consonants: i.e., how many companions did the commissioner have?

At this point, I don't know whether

- *-gh- (= */g/?*) and *-b- were also lost between other pairs of vowels

- *-d- was also lost (or became something else**) between vowels

In other words, I don't know the limits of lenition in Khitan. Knowing those limits would enable us to date borrowings: e.g., if *-gh- was lost in the environment *a_a, then qagha must have been borrowed after that loss, just as Liao Chinese 招討 *jautau was borrowed after the loss of *-gh- and *-b- in the environment *a_u in 'hundred' and 'five'.
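The dating argument can be sketched as a check for consonants that survive in an environment where native words lost them. This is only an illustration: the function name `retains_lost_consonant` and the (left vowel, right vowel) pairs are hypothetical, and the *a_a environment is the unproven assumption from the paragraph above, not an established fact.

```python
import re

def retains_lost_consonant(word, environments, consonants=("gh", "b")):
    """Return True if `word` keeps gh or b in an environment where it was lost.

    `environments` is a set of (left_vowel, right_vowel) pairs in which the
    loss is assumed to have applied to native words.
    """
    alternation = "|".join(re.escape(c) for c in consonants)
    for left, right in environments:
        pattern = re.escape(left) + "(?:" + alternation + ")" + re.escape(right)
        if re.search(pattern, word):
            return True
    return False

attested = {("a", "u")}                 # supported by 'hundred' and 'five'
hypothetical = attested | {("a", "a")}  # assumption: loss also applied in *a_a

print(retains_lost_consonant("qagha", attested))      # False
print(retains_lost_consonant("qagha", hypothetical))  # True
```

A True result would only mean that the word postdates the loss or had its consonant restored in some other way; the check cannot distinguish those possibilities.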

Qagha is certainly not native to Khitan, but what about words which might have -aghu- and -abu- sequences*** such as

189-151-123-348 <>**** (興宗 28.14) and 189-196-222 <a.bu.ń> (興宗 31.4)

Are these loanwords? If not, have their intervocalic obstruents been restored by analogy? Or are they of secondary origin from earlier clusters***** (e.g., *ambu > abu) or a lost series of obstruents****** (e.g., *au > abu but *abu > au)?

*3.30.0:53: In pre-Khitan, *gh and *g might have been allophones of */g/ appearing before different vowels: */ga/ was *gha, */ge/ was *ge, etc. In any case, gh and g were distinct phonemes in Khitan because /ga/ was possible in Chinese loans (like Manchu g'a):

Pre-Khitan            Khitan   IPA
*/ga/                 /gha/    [ʁɑ]
*/ge/                 /ge/     [gə]
([gɑ] not possible)   /ga/     [gɑ]

**3.30.0:30: In Korean according to Alexander Vovin (2010), medial *-p-, *-t-, *-s-, and *-k- lenited to Middle Korean -β-, -r-, -z-, and -ɣ- which became -w/Ø-, -r-, -Ø-, and -Ø- in modern Korean. If Khitan was like Korean, then pre-Khitan *-d- might have lenited to a liquid in intervocalic position. But so far I have not seen any evidence for coronal lenition in Khitan. There was no z in Khitan, so if pre-Khitan *-s- lenited, it must have become something else.

***3.30.0:57: The rules for determining whether a graph was pronounced as VC or CV are still unknown, so perhaps those two blocks were read aughare or aubiń: i.e., without -gh- or -b- between a and u. Still other readings are possible since 123 may have been ra as well as ar, and 222 was ńi as well as (i)ń.

****3.30.1:05: If Khitan had Mongolian or Manchu-like vowel harmony, an e would not be expected in a word with a. Could *a be reduced to e [ə] in unaccented positions?

*****3.30.1:08: This was inspired by Vovin's derivation of Middle Korean intervocalic stops from earlier clusters which were mostly *nasal-stop sequences.

******3.30.1:15: In this scenario, voiced aspirates and voiced nonaspirates had distinct reflexes in intervocalic position but might have merged in other positions: e.g., *b(ʱ)- > b-, etc.


To Chinese eyes, the Khitan large script at first appears to be a random mix of Chinese characters and alien shapes.

Given that the Khitan large script is said to have been 'invented' c. 920 using the Chinese script as a model, one might expect it to be something like the modern Japanese script in which Chinese loans are generally written with Chinese characters and kana almost always represent non-Chinese words*:

Khitan large script characters resembling Chinese characters : Chinese loanwords

Khitan large script characters not resembling Chinese characters : native Khitan words

However, the reality is more complex:

Khitan large script characters resembling Chinese characters :

Chinese loanwords

e.g., 皇帝 (looks like Liao Chinese *hongdi 'emperor') for Khitan hongdi 'id.'

and native Khitan words

e.g., 五 (looks like Liao Chinese *ngu 'five') for Khitan tau 'id.'

Khitan large script characters not resembling Chinese characters :

native Khitan (or at least non-Chinese**) words

e.g.,  doro (?) 'seal'

and Chinese loanwords

e.g., gün 'army' for Liao Chinese 軍 *gün 'id.'

One could also hypothesize that Chinese character lookalikes were used to write Khitan syllables that had (near-)homophones in Chinese, whereas nonlookalikes were used to write non-Chinese Khitan syllables and words with un-Chinese segments and phonotactics: e.g., Khitan iri 'name' with an un-Liao Chinese -r-.

But in fact, syllables shared by Khitan and Chinese were sometimes written with nonlookalikes:

e.g., for ai (why not write it with a lookalike of Liao Chinese *ai-graphs like 愛?)

And syllables and words with un-Chinese elements were sometimes written with lookalikes:

e.g., 午 (looks like Liao Chinese *ngu 'horse (calendrical)') for Khitan iri 'name'

Did the creator(s) of the Khitan large script take the Chinese script as used in the early 10th century, keep random characters, change the sound values of some of them, and then make up new characters?

One might come up with such an explanation for Cyrillic: its inventors took the Latin alphabet, kept some letters (e.g., А), changed the sound values of some of them (e.g., В for [v] instead of [b]), and then made up new characters (e.g., Б for [b] and Г for [g]). However, that is not what happened. Both the Cyrillic and Latin alphabets are derived from the Greek alphabet. They are sisters, not daughter and mother.

If Janhunen (1994, 1996) is correct, the Khitan large script is to the Chinese script what Cyrillic is to Latin. Like Cyrillic, the Khitan large script was not invented on the spot; it was an adaptation of an existing script: the Parhae script, a Manchurian offshoot of the early Chinese script. The following seven Khitan large script characters might then be inherited from the Parhae script rather than taken from the 10th century Chinese script:

Sinograph Liao/Jin Chinese Khitan large script Khitan Jurchen large script Jurchen
*ho (< Middle Chinese *ɣɑ) ha ha
*she (< Middle Chinese *ɕjæˀ) ? sha
*sien (< Old Chinese *sˁir < *sˁər) ? shira or shïra
*gung ? (no similar Jurchen character) (*gung***)
gung gung
*ong (< Old Chinese *ɢʷaŋ) ong ong

Janhunen then proposed that the Jurchen large script was another derivative of the Parhae script rather than a direct successor of the Khitan large script.

Let's suppose the conventional wisdom is correct and that the Jurchen large script was invented c. 1120 with the then-current Chinese script as a model. Why was Jin Chinese 公 *gung 'duke' written with Jurchen 王, a lookalike of the characters for Jin Chinese *ong 'prince' and Khitan ong 'prince'?

Jin Guangping and Jin Qizong (1980: 56) proposed that Jurchen 王 gung was derived from Jin Chinese 工 *gung 'work' with an added stroke. Why not just copy 公 or 工?

Here is a wild speculation. In Old Chinese, 王 was pronounced *ɢʷaŋ. In mainstream Chinese *ɢʷ- weakened to *w-, and later, *waŋ became -ong in the northeast. What if a now long-extinct Manchurian Chinese dialect retained a stop initial for 王? Then perhaps 王 had two readings in Parhae, *gung based on the colloquial stratum of Manchurian Chinese, and *ong based on a literary stratum borrowed from mainstream Chinese. The first reading is the source of the Jurchen reading and the second is the source of the Khitan reading.

3.29.0:34: I am skeptical of the stop-retention scenario because there is no other evidence for *ɢʷ- surviving as a stop at such a late date in the northeast or anywhere else. Nor is there any evidence for *-ʷaŋ becoming *-ung in the northeast.

3.29.0:46: The Jurchen characters

for ong resemble those for ja (see my previous entry)

with two extra strokes on top.

However, Jin Qizong (1984: 236) regarded the ong-graphs as derivatives of the Khitan small script character

071 <ong>.

How would Janhunen explain that resemblance? Do the Jurchen large script and Khitan small script characters both go back to a Parhae prototype? Could the Jurchen character retain a 'roof' lost in the Khitan small script character?

*3.29.0:57: Although there is a strong tendency to write Chinese loans with Chinese characters in Japanese, some Chinese loans are in kana: e.g., サンゴ sango 'coral' (instead of 珊瑚).

Furthermore, Chinese characters do not always represent Chinese loans. In many cases they represent native Japanese words: e.g., 薔薇 for bara 'rose' as well as the much rarer borrowings shōbi and sōbi.

**3.29.1:01: Not all non-Chinese words in Khitan are native: e.g.,

053-051 <qa.gha> 'qaghan'

may ultimately be of Xiongnu origin. (Has this word been identified in the large script?)

***3.29.1:14: Jin Qizong read two different Jurchen characters

as gung (in my notation), so in theory either could have transcribed Jin Chinese 工 *gung 'work'.

However, the second is only attested as a transcription of 宮 'palace' which was transcribed as

334-019-345 <g.iu.ung>

in Khitan.

So I suspect that the two Jurchen characters originally represented two different syllables, gung and giung, that merged into gung in the Yuan Dynasty Old Mandarin dialect of the Zhongyuan yinyun but not Phags-pa Chinese where they are still distinct as ꡂꡟꡃ <> and ꡂꡦꡟꡃ <>.


When I first became interested in Jurchen, I assumed that its (large) script was "obviously derived from the Chinese script and the Khitan large script, with many innovations of its own" (Kane 1989: 21).

Then I discovered Janhunen's (1994: 114) hypothesis which I still regard as plausible after almost twenty years:

It was the other Sinitic script [of Parhae] that, due to its firm local [i.e., Manchurian] roots, was later transmitted first to the Khitan, and then to the Jurchen. All of this means that the conventional view, according to which the Jurchen script was successive to the Khitan «large» script, cannot be correct. As graphic systems, and heirs of the Bohai [= Parhae] script, the Khitan and Jurchen «large» scripts should be viewed as parallel, rather than successive developments.

There is much more to Janhunen's argument than that, but for now I want to focus on one of its implications. If the Khitan and Jurchen large scripts are offshoots of the Parhae script developed at some point prior to the end of the Parhae state in 926, then the readings of their Chinese-based elements are likely to reflect pre-10th century Chinese phonology to some extent. Such a scenario has a precedent in Old Japanese man'yōgana whose readings contain archaisms from the Chinese learned by the Paekche centuries earlier: e.g.

支 for Old Japanese ki < *ki and *ke is closer to Late Old Chinese *kie than Middle Chinese *tɕie

止 for Old Japanese is closer to Old Chinese *təʔ than Middle Chinese *tɕɨəˀ

(But Gerald Mathias views 止 as a kungana whose reading is based on Old Japanese töma- 'stop' [my təma-]; if so, then the resemblance to Old Chinese is coincidental.)

富 for Old Japanese is closer to Late Old Chinese *puəh than Middle Chinese *puʰ

Conversely, if the Khitan and Jurchen large scripts had no deeper roots, the readings of their Chinese-based elements should be derivable purely from Liao and Jin Chinese, as there would be no way for their creators to know about earlier readings.

Jin Guangping and Jin Qizong (1980: 56-57), Kane (1989: 23), and Kiyose (2004: 93) list Jurchen characters* with readings as well as shapes of Chinese origin**:

Jurchen Jurchen reading Sinograph Liao/Jin Chinese*** Middle Chinese Old Chinese
aci *ci *tɕʰiek *tɯ-qʰjak
ging *ging *kɨeŋ *Cɯ-qraŋ or *qɯ-raŋ
gung *gung *koŋ *koŋ
hi *si *sej *sʌ-ləj
i *u < *wuo *Cɯ-ɢʷa
i *u < *wuoˀ *Cɯ-waʔ
ja *jr *tɕi < *tɕɨʰ *təs
ki *ki *gɨ *gə
ngu *ngu *ŋo *ŋʷa
sa *cha *ɖæ *rla
u *ngu *ŋoˀ *ŋaʔ
dai *da(i) *dɑjʰ *lats
fu < pu *fu *fu < *puoˀ *poʔ
jul *ju *tɕu < *tɕuo *Cɯ-to
shang *shang *ɕɨaŋ < *dʑɨaŋˀ *Cɯ-daŋʔ or *Nɯ-taŋʔ
tai *tai *tʰɑjʰ *l̥ats
ha *ho *ɣɑ *ɢaj
sha *she *ɕjæˀ *l̥jaʔ
shira (Kiyose) or shïra (Jin and Jin) *sien *sen *sˁir < *sˁər < *Cʌ-sər

Out of that incomplete sample of nineteen characters,

- eleven have readings based on Liao/Jin Chinese (green)

- five have readings that could be based on either Liao/Jin Chinese or Middle Chinese (bluish green)

- two have readings that resemble Middle Chinese (blue)

- at least one has a reading that resembles Old Chinese (yellow)

I'll discuss a less likely instance in my next entry.

The last three characters (which all have Khitan large script predecessors that look exactly like Chinese 何舍先) are hardly solid proof for Janhunen's hypothesis.

The Khitan and Jurchen may have used Liao/Jin Chinese 何 *ho for ha in their languages because there may not have been a character for *ha in Liao/Jin Chinese. (The only character read ha in the Phags-pa Chinese of the Yuan Dynasty is rare: 閜.)

Nonetheless the other two are difficult to explain if they were devised c. 1120 or perhaps even c. 920. Why write Jurchen sha with a derivative of Jin Chinese 舍 *she when Jin Chinese 沙 *sha was a closer phonetic match? And is the close match of Jurchen shira ~ shïra and Old Chinese *sˁir < *sˁər just a coincidence?

*3.28.2:50: Since this post does not deal with the Jurchen small script, I will refer to Jurchen large script characters simply as Jurchen characters.

**3.28.2:58: There are Jurchen characters with shapes of Chinese origin and native readings that are translations of Chinese: e.g.,


looks like Jin Chinese 一 *i 'one' but represented the native Jurchen word emu 'one'.

***3.28.3:15: I wrote Liao/Jin Chinese forms in an orthography resembling my transcriptions of Khitan and Jurchen to facilitate comparison. Khitan and Jurchen voiced obstruents may have been unaspirated and voiceless: e.g., Jurchen jul may have been [tɕul], a close match for Middle Chinese 朱 *tɕu(o).

14.3.26:23:49: QUINTUP-<UL> TROUBLE (PART 3)

In part 1, I proposed that Khitan small script character


might have represented <ül> because

131-366 <u.?> 'winter'
corresponds to Written Mongolian ebül 'id.'

In a generic 'Altaic' language, harmonic rules prevent the mixture of segments from two classes which I will call A and B*: e.g.,

class A: a, u, ł, ɣ ...
class B: e, ü, l, g ...

'Neutral' segments can occur with segments of either class A or B.

Hence <ül> should be a class B character that should only co-occur with class B and/or neutral characters within a Khitan small script word block.

I used to think that

098 and 261

represented class A <ał> and class B <(e)l>, but in fact they not only coexist with each other but even with 366 in

340-098-366-261-349-021 <x.ał.üó>** (興宗 26.6)

(021 <mó> looks like an error for the dotless verb ending 020 <ei>)

which is unexpected from an 'Altaic' perspective. I would have expected


class A *130-098-206-098-051-122 <x.ał.uł.ał.ɣ>*** or class B *340-261-366-261-349-020 <x.el.ü>.
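The expectation can be sketched as a simple class check over the character numbers in a block. The class assignments below are the provisional ones discussed in this entry (366 as class B is only my hypothesis), characters not in the table are treated as unclassified, and the names `CLASS_OF`, `harmony_classes`, and `is_harmonic` are made up for illustration.

```python
# Provisional class assignments from this entry; none of them is certain.
CLASS_OF = {
    "098": "A",  # <ał>
    "206": "A",  # <uł>
    "051": "A",  # <ɣa>
    "261": "B",  # <(e)l>
    "366": "B",  # <ül> (hypothesis)
    "349": "B",  # <ge>
}

def harmony_classes(block: str) -> set:
    """Collect the harmonic classes of the classified characters in a block."""
    return {CLASS_OF[c] for c in block.split("-") if c in CLASS_OF}

def is_harmonic(block: str) -> bool:
    """A block is harmonic if its classified characters share at most one class."""
    return len(harmony_classes(block)) <= 1

for block in ("340-098-366-261-349-021",   # attested (興宗 26.6)
              "130-098-206-098-051-122",   # expected class A form
              "340-261-366-261-349-020"):  # expected class B form
    print(block, sorted(harmony_classes(block)), is_harmonic(block))
```

Under these assignments the attested block mixes both classes while the two expected forms do not. Reclassifying 366 as neutral would amount to deleting its entry from the table.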

366 can also coexist with both class A 051 <ɣa> and class B 349 <ge> in the same text (道宗):


161-366-261-051-189-123 <aú.ül.el.ɣ> (道宗 12.30)

(instead of 161-206-261-051-189-123 *<aú.uł.ał.ɣ>)

and 131-097-372-366-334-140 <u.úr.û.ül.g.en> (道宗 18.6)

That would also be unusual for an 'Altaic' language.

I am conflicted.

On the one hand, Khitan has sets of suffixes implying the presence of an 'Altaic'-style harmonic system: e.g., the causative-passive suffixes (class A?) and (class B?) in the above pair of words.

On the other hand, there seem to be harmonic violations. Are those violations artifacts of incorrect class assignments (e.g., is 366 a neutral character?), or are they real and perhaps even predictable?

The earliest known small script text is dated 1053, over a century after the invention of the small script c. 925. Do all small script texts discovered so far reflect Khitan after its harmonic system began to break down? Would the very first texts in the small script have more harmonic spellings?

*3.27.2:13: I got the A/B terminology from EG Pulleyblank who used it to describe Old Chinese syllable types. Norman (1994) was the first to draw parallels between Old Chinese and Altaic syllable types. I have gone even further and proposed harmony rules for Old Chinese.

I use the terms A and B to avoid specifying the nature of the classes: e.g., front vs. back, ±RTR, etc. As Khitan is in the Manchurian linguistic area, I suspect it had RTR harmony like its neighbor Jurchen.

**3.27.2:17: This is Andrew West's reading. Qidan xiaozi yanjiu has

340-067-366-261-349-020 <x.eü.ü>

which is not only harmonic but also has the dotless verb ending 020 <ei> instead of dotted 021 <mó> which is not a verb ending. I have not seen the handwritten copy of 興宗, and the original stele is inaccessible, so I do not know who is correct.

***3.27.2:24: I assume 206 is a type A character since it is flanked by a-characters in

029-206-189 <tau.uł.a> 'hare'.

14.3.25:23:59: QUINTUP-<UL> TROUBLE (PART 2)

In part 1, I built upon Aisin Gioro's work by equating the following five Khitan small script characters and regarding the first three as variants of each other:

  013 <ul> = 050 <ul> = 206 <ul> = 228 <ul> = 366 <ul>

The second and third appear in the same word:

050-131-206 <ul.u.ul> (道宗 16.21, 20.13 [1101 AD], 蕭仲恭 33.33 [1150 AD])

Did scribes of two different inscriptions nearly fifty years apart really use two variants so close together in three instances, or did 050 and 206 have two different readings?

3.26.1:10: Was 050-131-206 for ulul (?) above related to (or at least partly homophonous with)

050-131-366-311-162 <ul.u.ul.b.c> (宣懿 18.2 [also 1101 AD])

050-131-366-311-222 <ul.u.ul.b.ń> (道宗10.25, 15.19, 28.24, 宣懿 17.11)

which have 366 instead of 206 for their second <ul>? Or did 206 and 366 have different readings?

14.3.24:23:24: QUINTUP-<UL> TROUBLE (PART 1)

In my last entry, I proposed that the rare Khitan small script character 013 might be a variant of 050 which Aisin Gioro (2008) read as <ul>. Both in turn resemble 206 which Aisin Gioro (2003) also read as <ul>. Could 206 be yet another variant of 050?


050 <ul> = 013 <ul> = 206 <ul>

If Aisin Gioro is correct, then

029-206-189 'hare'

was <tau-ul-a> = taula, and Khitan may have lost a final -i retained in Written Mongolian taulai.

How can taula be reconciled with the History of the Liao Dynasty transcription 陶里 *tauli for 'hare'? There is no guarantee that Chinese transcriptions and the Khitan small script represent the same variety of Khitan. Perhaps *ai simplified differently in different dialects of Khitan:

Proto-Khitan-Mongolic *taulai
    Proto-Khitan *taulai
        Standard taula
        Nonstandard tauli
    Proto-Mongolic *taulai
        Written Mongolian taulai

Aisin Gioro (1999, 2004) identified two more small script characters for <ul>. Why did the Khitan have five characters for the same VC sequence?

050 <ul> = 013 <ul> = 206 <ul> = 228 <ul> = 366 <ul>?

The first three may be allographs, but the last two do not resemble them. Did 050/013/206, 228, and 366 originally represent three different sequences? If Khitan were like Mongolic, an obvious two-way distinction would be between <ul> and <ül>. 366 might have been <ül> since

131-366 <u.ul> 'winter'

corresponds to Written Mongolian ebül 'id.' But what would have been a third value contrasting with <ul> and <ül>? Kane (2009: 29) wrote that Khitan "was exceptionally rich in rounded vowels." Was there a three-way contrast between front [y], back [u], and near-high [ʊ] (like Manchu ū)? Did these three characters

131 <u>, 245 <ú>, 372 <û>

represent those vowels without a following lateral? (I almost wrote [l], but /l/ may have had different allophones depending on the adjacent vowel.)

At first, one might identify 131 as ü since it preceded 366 which might have been ül. However,

226 <ü>

transcribed Liao Chinese ü, whereas the three other <u>-type characters were used to transcribe Liao Chinese *u. Were they always interchangeable, or was that interchangeability due to later mergers?

Has anyone looked at Khitan spelling over time? Spelling variation may give us clues to changes in Khitan over a two or even a three-century period. If Nova N 176 is from, say, 1200 - the eve of the fall of the Qara Khitan - its large script spelling could differ from the norm established c. 920. Moreover, some variation may be due to Jurchen speakers' perceptions of Khitan phonology: e.g., Jurchen speakers may have heard only one or two kinds of /u/ in Khitan which might have had three. (3.25.0:16: First-language influence in Khitan texts written by Jurchen speakers has yet to be explored.)


Last night, I accidentally miswrote

070-050 <w.?>

in the Qidan xiaozi yanjiu transcription of 興宗 15.19 as

070-013 <w.?>

with a slightly different and much rarer character that only appears twice in the texts in Qidan xiaozi yanjiu:

028-067-013 <> (道宗 27.9) and 013-224-327 <?> (耶律撻不也 12.1)

Is 013 in any of the texts that have been found in the three decades since the publication of Qidan xiaozi yanjiu? Could 013 be a variant of 050? Is that why Aisin Gioro did not include 013 in 契丹小字の音価推定および相関問題?

028-067 <> (a transcription of Liao Chinese 守 *sheu; could it also be a native word?)

occurs by itself. Does that imply 028-067-013 <ś.eu.?> is a suffixed form, or are they unrelated partial homophones? There are eight forms beginning with 028-067; some have known suffixes (e.g.,

028-067-273 <> ending in what may be genitive <-un> in 蕭令公 25.16 and 許王 50.5)

and others do not (e.g.,

028-067-041 <> 'dew' in 宣懿 25.14 and 許王 cover 1.5).

Aisin Gioro read 041 as <us>* and 050 as <ul> ~ <l-> for reasons unknown to me. The Khitan small script has many sequences of the same vowel in two adjacent characters, so sequences such as

028-067-041 <> and 028-067-013 <> (if 013 = 050)

look plausible. Moreover, 028-067-041 <> may be a variant spelling of

028-067-244 <> (巴拉哈達洞壁墨書 I.2.4; <s> may be a plural ending)

Unfortunately, I know of no

*028-067-261 <>

corresponding to 028-067-013 <>.

*3.24.1:56: If Aisin Gioro's reading of 041 is correct, then

028-067-041 <> 'dew'

would be less of a match for Written Mongolian sigüder(i) 'dew'. If Mongolian -der(i) is not a suffix, then perhaps the Khitan form is a reduction of an earlier sigüder(i)-like form to sheu with a plural suffix -s: i.e., drop of dew. The two words may also be unrelated.


Two nights ago, I wrote,

It seems that Khitan VC characters can also double as CV characters. I've guessed that they are CV before consonants and VC before vowels, but that does not always seem to be the case.

Offhand certain types of VC characters are less likely to have reversible readings than others: e.g., VN characters seem to have nonreversible readings with the sole definite exception of

222 <ń> for (i)ń ~ ńi (see Kane 2009: 61 for the Chinese transcription evidence)

There are dedicated characters for some NV sequences other than ńi and ngV*: e.g.,

139 <na> and 191 <mú>

Conversely, vowel-liquid characters may have had reversible readings:

084 <ar> ~ <ra>, 098 <al> ~ <la>, 261 <el> ~ <le>?

See "<Ra>-Construction 5" and "Did Khitan Have Two Laterals?".

The VG sequence character

020 <ey> (Kane's <ei>)

represented <y> in word-initial position: e.g.,

020-084-131-344 <> '耶律 Yelü'

Could <w> also represent a VG sequence: i.e., Vw? That is unlikely because such sequences already have characters:

019 <iu>, 023 <iu> (?), 067 <eu>, 138 <iû>, 161 <au>, 164 <au> (?), 210 <aú>, 289 <iú>

(I could also transliterate them as <iw>, etc. to match <ey> instead of <ei>.)

Might some of those characters be read as wV in word-initial position? Could 019, for instance, have been read wi 'to not exist, die'? I doubt it because Chinese w- was always written with initial

070 <w>

instead of any of the above <Vu> (= <Vw>) characters. That character never appears in medial position. If Khitan had medial -w-, it must have been written in some other way: e.g., could 019 <iu> stand for -wi- after vowels? Were

262 and its variant 263

ever read as wi instead of [uj]? Was

210-262-140 <aú.ui.en> 'woman of noble rank-GEN' (耶律撻不也 16.14)

from my last entry pronounced something like awiən?

The only non-Chinese Khitan word with an initial <w> is

070-050 <w.?> (興宗 15.19)

according to Qidan xiaozi yanjiu. Its reading is open to question, as Andrew West read it as

073-? <ên.?>

Alas, I can't consult the original because only a handwritten copy remains, and I have not been able to examine a good reproduction of it.

3.23.1:35: That mystery word occurs after a space that may indicate respect. Does it have an aristocratic referent?

Could it be an error for

072-050 <?.?>

from 道宗 25.17 and 耶律撻不也 16.4? Neither of those instances was preceded by a space.

*3.23.1:27: ng was only in Chinese loanwords. Initial and occasionally final ng were written with

264 <ng>.

The absence of ng in native Khitan words is not surprising since Janhunen (2003: 6) did not reconstruct it for Proto-Mongolic which may be the descendant of a 'sister' of Khitan. However, there is no guarantee that Khitan and Proto-Mongolic lacked the same consonants: e.g., Khitan had p, but Proto-Mongolic did not. Similarly, the absence of Proto-Mongolic *w does not guarantee the absence of w in native Khitan words.


I looked through Qidan xiaozi yanjiu (1985) hoping to find examples of

311-151-290 <b.ghu.án> 'sons'

in the construction

numeral♂ + plural masculine noun (see Kane 2009: 139-142 for examples)

but I only found words other than numerals before 'sons':

1. Genitives before 'sons'

210-262-140 <aú.ui.en> 'woman of noble rank-GEN' (耶律撻不也 16.14)

295-097-311-222 192-339 <p.úr.b.iń shï.i> 'Purbin-Madame (< 氏)-GEN' (耶律撻不也 18.27)

The context implies that Purbin is a woman's name. It does not appear anywhere else in the Qidan xiaozi yanjiu corpus.

241-033-222-140 <ń.en> 'lady (婦人)-GEN' (耶律撻不也 18.32)

374 071-154 <> 'grand prince (太王)-GEN' (蕭仲恭 4.53-54)

334-345 104-289-273 <g.ung.j.iú.un> 'princess (公主)-GEN' (蕭仲恭 6.18-19)

311-168-339 <b.qo.i> 'son-GEN' (蕭仲恭 30.4)

2. Plural nouns in apposition before 'sons'

122-254 <ai.d> 'father-PL' (蕭令公 23.6, 耶律撻不也 9.19)

021-247 <mó.t> 'mother-PL' (蕭令公 24.3, 蕭仲恭 29.8)

131-111-254 <u.?.d> '?-PL' (蕭仲恭 43.36)

How did 'fathers-sons' differ semantically from a hypothetical

*122-254 311-151-290 <ai.d.en b.ghu.án> 'fathers' sons'?

Was the genitive suffix unnecessary after certain nouns (e.g., 'fathers' and 'mothers')?

3. Verbs before 'sons'

295-016-189-123 <> 'return-PERF' (許王 48.26)

287-098 <?.al> '?-CONV' (耶律撻不也 24.4)

287 does not occur alone. It tends to precede a-graphs, so it may have been a Ca-graph. I do not know whether it really represented a verbal root. Nor do I know the exact function of the converb -al.

4. Other words before 'sons'

191*-262-348-162 <mú-ui-e-c> '?' (仁懿 8.20)

This could be a noun in apposition. I don't know of any other attestations.

153-254-222 <j.d.iń> '?-PL-GEN'? (蕭仲恭 34.24)

If 153 (which can occur alone) is a noun, could this be a genitive plural noun? But <j.d> '?-PL' is not attested by itself.

Are numerals - masculine or otherwise - attested before 'sons' in the small script texts discovered after 1985?

*3.22.4:02: Why did Kane (2009: 58) transliterate 191 as <mú>? The fact that 191 is often followed by <u>-graphs

262 <ui>, 366 <ul>, 372 <û>

may imply that it ended in <u>, but what evidence is there for an initial <m>?


In my last entry, I noted that <en>, normally a genitive suffix written in a block with a preceding noun, was isolated in 萬部華嚴經塔塔壁題字 2.8:


244-327-073 140 <ên en> (instead of *244-327-073-140 <ên.en>) 'thousand GEN (?)' before 311-290 178 378 < ku "> '?* people' (the reduplication of ku 'person' is reminiscent of Japanese hitobito 'people').

I found one other instance of isolated <en> in Qidan xiaozi yanjiu (1985):

134 140 311- <TWO en b.qo> (仁懿 6.3-5) 'two GEN son' = 'two sons'?

An apparent third instance turned out to be the upper half of a now-illegible stack:

162 345 290 140-? <c ung án en.?> 'Chong An (a Chinese name?) ?' (慶陵壁畫題字 IV)

I have wanted to look into such unusual vertical stacks for almost a year now.

Perhaps there are more examples of independent <en> in the small script texts that have been discovered over the last three decades. Could some represent a word ne? (It seems that Khitan VC characters can also double as CV characters. I've guessed that they are CV before consonants and VC before vowels, but that does not always seem to be the case.)

Are the first two instances examples of numerals followed by genitives? Why is <TWO> nonmasculine in 仁懿? What is the semantic difference between

numeral♂ + plural masculine noun (see Kane 2009: 139-142 for examples) and

numeral + genitive + singular masculine noun

Are gender and number neutralized in the latter construction?

I thought <TWO en b.qo> might be 'two' followed by a compound noun '?-son', but I would expect <TWO> to be masculine and '?-son' to be plural:

<TWO♂ en b.ghu.án> '?-sons'.

3.21.2:11: Should <ghu> in <b.ghu.án> be interpreted as a VC character <ugh> before the vowel-initial character <án>? If so, then 'sons' was bughán which might have been from *buqo-án, implying that <b.qo> 'son' was buqo.
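My guess about reversible readings (VC before a vowel-initial character, CV before a consonant-initial one) can be sketched as follows. This is only an illustration of the guess, which does not always hold: the names `read_block` and `REVERSIBLE` are hypothetical, the vowel inventory is limited to the letters used in these transliterations, and a reversible character at the end of a block is simply left in its default reading.

```python
import unicodedata

# Vowel letters as they appear in the transliterations used here (assumption).
VOWELS = set("aáeêiïoóuúûü")

def starts_with_vowel(reading: str) -> bool:
    """Check the first letter after normalizing accents to composed form."""
    return unicodedata.normalize("NFC", reading)[0] in VOWELS

def read_block(chars, reversible):
    """Join a block's characters, flipping reversible ones by what follows.

    `chars` holds default transliterations; `reversible` maps a default
    transliteration to its (VC, CV) readings.
    """
    out = []
    for i, ch in enumerate(chars):
        if ch in reversible and i + 1 < len(chars):
            vc, cv = reversible[ch]
            out.append(vc if starts_with_vowel(chars[i + 1]) else cv)
        else:
            out.append(ch)  # fixed reading, or block-final: keep the default
    return "".join(out)

REVERSIBLE = {"ghu": ("ugh", "ghu")}  # hypothetical reversible character

print(read_block(["b", "ghu", "án"], REVERSIBLE))  # bughán
```

Applied to <b.ghu.án>, the rule yields the reading bughán proposed above; whether 151 was really reversible is exactly what remains unknown.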

The phonetic difference, if any, between


011 ~ 127 <an> and 290 <án>

is unknown. Kane's acute accent for the transliteration of the latter is arbitrary.

*3.21.2:18: This is the only example of <> in Qidan xiaozi yanjiu. I assume it is an adjective modifying kuku 'people'. I thought <> might be an error for <> 'sons', but I would not expect a bare (i.e., non-genitive) plural noun before 'people' unless the meaning was 'sons [and] people'.

What is the semantic difference between kuku 'people' and


047 <ghor> ~ 047-189 <ghor.a> ~ 047-131 <ghor.u> 'people'?

Did kuku refer to individuals while the ghor-words referred to a collective?


Last night, I asked,

Is there any evidence for Sino-Khitan numerals in the Khitan small script?

Although Kane (2009) only listed Khitan native numerals in his glossary of Khitan small script vocabulary, his list of Liao Chinese borrowings in the Khitan small script includes

244-189-184 <> < Liao Chinese 三 *sam 'three' and

244-327-073 <ên> < Liao Chinese 千 *tshien 'thousand'

as parts of longer loanwords but not as independent words. Perhaps they are Sino-Khitan numeral roots but not numerals themselves. Similarly, English has the Greek and Latin root tri- 'three', but tri is not a free morpheme like three (or Sino-Korean sam 'three' or Sino-Japanese san 'three').

I checked to see if <ên> ever appeared outside Chinese loanwords in the texts in Qidan xiaozi yanjiu, and I found

- two instances of <ên> for Chinese 仙 *sien 'Taoist immortal' (道宗 6.12, 31.33)

- two instances of <ên> for Chinese 前 *tshien 'front, before' (蕭仲恭 20.24, 33.39)

- one instance of <ên> (gloss unknown; 萬部華嚴經塔塔壁題字 2.8) before

140 <en>

which might be the native Khitan genitive suffix after a noun (仙 or 前?; the latter was borrowed into Korean as a free morpheme). I assume that last <ên> is not 'thousand' since I have not seen a numeral-genitive construction in Khitan. (3.20.23:30: Now I have!)

(3.20.1:25: Maybe <en> is not a genitive suffix. Such a suffix would normally be written in the same block as the preceding noun. I will look at other cases of independent <en> next time.)

So far it seems that <ên> is not 'thousand' outside the context of

244-327-073 264-019 <ên ng.iu> < Liao Chinese 千牛 *1tshien 1ngiu  'thousand-ox'

from Kane's list corresponding to

<sien ng iu> (耶律昌允 2)

in the large script.

However, there are small script texts discovered after the publication of Qidan xiaozi yanjiu in 1985 which I haven't checked. Nonetheless at this point I am skeptical about freestanding Sino-Khitan numerals in the small script. Moreover, I am not even certain that

<si sien ngu bai> < Liao Chinese 七千五百 *4tshi 1tshien 2ngu 4pai  (耶律昌允 4)

represents Sino-Khitan numerals in the large script. Why did Liu and Wang (2004: 91) identify it as 'seven thousand five hundred'?


Kane (2009: 177) listed no Khitan large script character for 'thousand' corresponding to

207 <ming> (cf. Mongolian mingghan 'thousand', Jurchen minggan 'id.')

in the small script, even though Kane made use of Liu and Wang (2004: 91) which identified a similar large script character looking like Chinese 夹* as 'thousand' in line 4 of the 1062 epitaph for 耶律昌允 Yelü Changyun:

Liu and Wang (2004: 79-81): <si sen ŋu pe>

Kane (2009: 178-179): <sï (t)s(i)en ŋu bai>

'seven thousand five hundred' (cf. Liao Chinese 七千五百 *tsʰi tsʰien ŋu pai)

On the other hand, both Liu and Wang (2004: 91) and N4631 identified


as 'thousand' even though Kane (2009: 176) listed it as 'yellow' in calendrical contexts. Andrew West also listed

as another variant of 'yellow'. Were 'thousand' and 'yellow' homophones in Khitan? (Other evidence points to an *n-initial word for 'yellow, gold' in Khitan. See Kane 2009: 165-166. Maybe Khitan had two words for 'yellow', and the calendrical word sounded like Sino-Khitan 'thousand'.)

Did Khitan have two words for 'thousand', a borrowing from Chinese and a native word? The phrase above may be entirely in Sino-Khitan; would


<dalo (?) ming (?) tau jau>

with some unknown character for 'thousand' be its native equivalent? (Ironically the Khitan wrote their native numerals with large script characters sometimes matching Chinese numeral characters but wrote Sino-Khitan numerals with large script phonograms with almost no resemblance to Chinese numerals.**) When did the Khitan use Sino-Khitan and native numerals? Is there any evidence for Sino-Khitan numerals in the Khitan small script? I'll start to answer that last question next time.

*3.19.1:22: What looks like 夹 in Liu and Wang's (2004: 91) handwritten copy of the epitaph might correspond to

in their list of characters on p. 80. I cannot find 夹 in N4631.

**3.19.1:59: 吾, the Khitan large script character for Sino-Khitan 'five', resembles Chinese 五 'five' because it is a graphic cognate of Chinese 吾 whose phonetic is 五.

高 is the glyph for Sino-Khitan *bai 'hundred' in Liu and Wang's (2004: 91) handwritten copy of the epitaph; a variant

appears in their list of characters on p. 81.

I used to think it was significant that 高 <bai> looked like 'high' and was read <bai> because Tangut

1890 2be4 < *Nɯ-braŋ or *Cɯ-mbraŋ 'high' (cf. Japhug mbro < *mbraŋ 'id.')

had a similar reading and meaning, but now I think the resemblance is merely coincidental. The Tangut word is native. If there was a Khitan word for 'high' like *bai, I doubt that the Khitan would have borrowed it or any other basic vocabulary from the Tangut who were far to the west.


Liu and Wang (2004: 91) identified

in the 1062 epitaph for 耶律昌允 Yelü Changyun as a Khitan large script equivalent of Liao Chinese 都統 *du tuŋ 'commander-in-chief' (translated as 'fighter controller' in the small script?) The large script graphs look exactly like Chinese 弟 'younger brother' and 来 'to come'.

Liu and Wang identified 弟 in line 5 of that epitaph as 'younger brother'. Presumably 弟 is a phonetic character in 弟来 'commander-in-chief' (if that gloss is accurate).

'Younger brother' is

(= ?)

101 (and 072 <EAST>, implying 'younger brother' and 'east' were homophones?)

in the Khitan small script. Kane (2009: 47) read this as <deu> but gave no explanation for that reading which resembles that of the first character of Jurchen

<deu.un> deun 'younger brother'

I will regard the Khitan reading of the large script character 弟 and its small script equivalent 101 as unknown.

I also do not know the reading of 来 in the large script.

It may be significant that in the small script, younger brothers precede older brothers, whereas the reverse order (i.e., Chinese order) is in the Chinese-like large script (see Kane's analysis of the 1114 epitaph for 耶律習涅 Yelü Xinie):



Could the small script reflect a native Khitan word sequence while the large script reflected a borrowing from Liao Chinese 兄弟 *xiuŋ di? (Cf. native Japanese 白黑 shirokuro 'white and black' vs. borrowed Sino-Japanese 黑白 kokubyaku or kokuhaku 'black and white'.)

The first half of Liao Chinese 都統 *du tuŋ appears in line 13 of Yelü Changyun's epitaph as

<du giam> < Liao Chinese 都監 *du giam 'director-in-chief'

whose first character matches its Chinese equivalent. The Khitan large script seal version of 都 is on Andrew West's site. Is there a large script term for 都統 like

with a near-lookalike of 統? Or

with the two characters that are possibly equivalent to the near-lookalike of 統?


Last night, I couldn't figure out why Liu and Wang (2004: 87) identified

and (=)

in the large script as <c> and <u> in my transliteration.

I had forgotten about how Liao Chinese 都統 *du tuŋ 'commander-in-chief' corresponded to the native* Khitan term

<cau.j ɣur.ú>

in the small script.

Apparently Liu and Wang equated the large and small script terms:


<c.auj ɣur.ú>? = <cau.j ɣur.ú>

Although all of those small script readings can be more or less confirmed** by their use in other contexts, I am less confident about the large script readings. Maybe


are <HEAVEN ɣur.ú> 'heaven controller' and <HEAVEN BELOW ɣur.ú> 'world controller' (the world being all under heaven), but is the common character

really <auj> in all 13 occurrences in 耶律褀墓誌?

Maybe <auj> was something like *auji if <cau.j> represented *cau-ji 'fight-er' with a deverbal suffix (Kane 2009: 94; his translation is 'those who engage in battle') that was cognate to Mongolian -ci and Turkish -cI/çI*** for names of vocations.

On the other hand, if the ends of the large and small script era equivalents of the Chinese era name 統和 *tuŋ xwo do not match,


large <?.?> = <?.?> ≠ small <s.bu.o.ɣo>?

then there is no reason to expect the large and small script era equivalents of Chinese 都統 *du tuŋ 'commander-in-chief' to match, and

might represent something other than <cau.j ɣur.ú>.

*3.17.1:07: 'Non-Chinese' would be a more precise term, as I cannot be sure that any non-Chinese Khitan word is native rather than a borrowing from Xiongnu or even Rouran.

Regardless of the precise origin of <cau.j ɣur.ú>, it contrasts with the loanword

<du t.uŋ>

from Chinese 都統 *du tuŋ. The first character is also a transcription of Liao Chinese 度 *du and <t.uŋ> is also a transcription of Liao Chinese 同 *tuŋ.

**3.17.1:04: Kane (2009) presented evidence for the readings of the four components of <cau.j ɣur.ú>:

022 <cau> corresponds to 炒~嘲 *cau in Chinese transcription.

337 <j> is a variant of 152 which corresponds to 只 *ji in Chinese transcription.

014 <ɣur> corresponds to 斛祿~胡虜~胡魯 *xulu in Chinese transcription. I don't know why Kane didn't read it as <x>, as there was no *ɣ in Liao Chinese.

245 <ú> corresponds to 武 *wu in Chinese transcription.

***3.17.1:17: Turkish c is voiced [dʒ]. Turkish I is a cover symbol for high vowels (i, ı, u, ü). Clauson (1962: 145) regarded voiceless ç as original.


I thank Andrew West for his solutions to the problems that I raised in "The Dissection of Khitan 'Succession'".

According to Andrew, the noninitial characters/blocks of the Khitan large and small script equivalents of the Chinese era name 統和 'uniting harmony' did not represent the same words:


large <tu.uŋ> = <tu.uŋ> (< Liao Chinese 統 *tuŋ 'unite') ≠ native <s.bu.o.ɣo>

That mismatch is a clue to the elusive answer to the question of why the Khitan had two scripts.

The first large script spelling is a phonetic borrowing of 統 *tuŋ 'unite' written as a fanqie initial-rhyme sequence whereas the second is a graphic as well as a phonetic borrowing of 統 *tuŋ 'unite'.

Janhunen might regard the second as a graphic cognate of 統 rather than a derivative. Such a cognate might have been of Manchurian (Parhae?) origin, as I was unable to find a Chinese character resembling it or its right half in Longkan shoujian, Dunhuang su zidian, or at / (see its variants of and ).

Andrew also made the following identifications:

<ging en d(u?).u tu.uŋ> (the transliteration is mine)

京之都統 'commander-in-chief of the capital' (北大王墓誌 17)

Andrew regards the third character as a variant of

which Liu and Wang (2004: 87) somehow identified as [tʂʻ] = <c> on the basis of the small script. I suppose it must be equivalent to

162 <c>.

Kane (2009: 181) interpreted it as <c(i)>. I don't understand their reasoning. Did the character have two readings, <d(u?)> and <c(i)>?

<HEAVEN BELOW tu.uŋ> (the transliteration is mine)

'commander-in-chief of the empire' (耶律褀墓誌 6)

<uŋ> (Andrew) = <u>  according to Liu and Wang (2004: 87) and Kane (2009: 181)
Once again, Liu and Wang's equation is somehow based on the small script - presumably on an equation with

131 <u>.

Since the small script characters

106 <uŋ> 345 <uŋ> 357 <úŋ>

are only for Chinese, could

~ ~

represent a Khitanized loan <tu.u> without a final velar nasal from Liao Chinese 統 *tuŋ?

I wonder if <HEAVEN> had different readings depending on what followed: a Sino-Khitan reading like <tien> before <tu.uŋ> and <tuŋ> in the large script and an unknown native reading* before <s.bu.o.ɣo> in the small script:


<ten?.tu.u(ŋ?)> : <ten?.tuŋ> ≠ <? s.bu.o.ɣo>

*3.16.0:51: Possibly <o> as proposed by Ji Shi (Kane 2009: 63), but I suspect that

<s.bu.HEAVEN.ɣo> instead of <s.bu.o.ɣo>

in the 仁先 Renxian inscription might be an error influenced by a preceding <HEAVEN> if it represents the second half of the era name. (I have not seen the Renxian inscription, so I don't know what context <s.bu.HEAVEN.ɣo> appears in.)


Last month, I discovered Pierre Marsone's translations from the annals of the History of the Liao ending in the middle of the era known as 統和 'uniting harmony' in Chinese and as

<HEAVEN ? ?> ~ <HEAVEN ?> (large script)

<HEAVEN s.bu.o.ɣo> (small script)

The first characters in both Khitan scripts are obviously equivalent to each other though not to Chinese 統 'unite':


I am not certain that the remaining Khitan characters are equivalent to each other.


Although <s.bu.o.ɣo> in the small script means 'inherited' or 'succeeded' (Kane 2009: 63, 100, 118), it is possible that one or both of the large script words means 'harmony' like Chinese or something else entirely.

It doesn't help that I can't find any other example of the large script character

with the element resembling Chinese 'thread' in the few texts I have on hand. That character does not appear to be a ligature of the other two characters which are in those texts*:

<PIG/AFFAIR ? ?> (耶律褀墓誌 5; <? ?> is presumably either the word in the era name or a homophone; I have not seen the second character without the third)

< ? ? ...> 'wrote ...' (北大王墓誌 4; unknown if the ホ-like character is a separate word or the beginning of a word)

<... ? ? ? ...> '?' (北大王墓誌 8; word boundaries unknown; unknown if the ホ-like character is a separate word)

<? ?> '?' (北大王墓誌 11, 14, 17; surrounded by various characters on both sides, so possibly a word)

<? ?> '?' (耶律褀墓誌 5, 6 [x 2], 22 [x 4], 24, 25 [x 2], 26; surrounded by various characters on both sides, so possibly a word)

I wish I could compile a list of all Khitan large script character combinations that occur twice and do not contain any known suffixes. Such combinations are likely to be new words or stems.
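A minimal sketch of how such a list could be compiled, assuming each text has already been keyed in as a sequence of character identifiers. The identifiers, the sample texts, and the suffix set below are hypothetical placeholders, not readings of any real inscription.

```python
from collections import Counter

def repeated_combinations(texts, known_suffixes, n=2, min_count=2):
    """Return n-character sequences attested at least min_count times
    that contain none of the known suffix characters."""
    counts = Counter()
    for chars in texts.values():
        for i in range(len(chars) - n + 1):
            gram = tuple(chars[i:i + n])
            # Skip any sequence containing a known suffix character.
            if not any(c in known_suffixes for c in gram):
                counts[gram] += 1
    return {gram: k for gram, k in counts.items() if k >= min_count}

# Hypothetical data: "c1", "c2", ... stand in for large script character IDs.
texts = {
    "text_A": ["c1", "c2", "c3", "c4", "c2", "c3"],
    "text_B": ["c2", "c3", "c9", "c4", "c9"],
}
KNOWN_SUFFIXES = {"c9"}  # hypothetical suffix character

result = repeated_combinations(texts, KNOWN_SUFFIXES)
print(result)
```

Raising n would catch longer candidate stems. Counting per text rather than over the whole corpus would separate combinations repeated within one epitaph from those shared across epitaphs.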

*3.13.1:45: Although there are at least two other large script characters containing the left-hand component 糹resembling Chinese 糹 'silk' (2197 and 2209 in N4631), I have not seen 糸 as an independent large script character. Moreover, I do not know of any attestations of the right-hand component as an independent large script character or as a component in any other character.

3.13.1:52: Could

  (= N4631 0965 which has 小 under ㄴ+丨 instead of a long 亅 intersecting ㄴ?)

be a variant of the phonogram <u> (= N4631 2211 which has a bottom hook?)? That wouldn't be possible if

represented something like <sbuoɣo> (division point unknown) ending in <o>.

14.3.11:23:59: A DAY 3: TONES

I will finish my survey of the forms for 女 'woman' in Chinese languages by looking at tonal categories. I will not examine tone shapes because they may be even more diverse than the vocalism: e.g., within Mandarin varieties alone, 'woman' can have tones that are high level (Jinan), high falling (Xi'an), mid falling (Wuchang), low falling (Tianchang), low falling-rising (Beijing), and low rising (Hefei). However, those are all reflexes of the same tone category (traditionally called 'rising' though its shapes vary; here I will call it 2).

In standard Mandarin, 2 is the tone that regularly developed in syllables with Middle Chinese sonorant initials (*ɳ-) and glottalization (*-ˀ):

Category | 1 | 2 | 3 | 4
Old Chinese | *-Ø/m/n/r/ŋ | *-ʔ | *-s | *-p/t/k/kʷ
Middle Chinese | *-Ø/m/n/ŋ | *-ˀ | *-ʰ | *-p/t/k
Standard Mandarin: *voiceless | 1a | 2 | 3 | 1a, 1b, 2, 3
Standard Mandarin: *voiced sonorant | 1b | 2 | 3 | 1a, 3
Standard Mandarin: *voiced obstruent | 1b | 3 | 3 | 1b, 3
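The table can be read as a lookup from initial class and Middle Chinese tone category to standard Mandarin reflexes. The sketch below assumes that cells left blank in a row repeat the value of a merged cell above them; the dictionary and function names are mine.

```python
# Standard Mandarin reflexes of Middle Chinese tone categories 1-4,
# keyed by the class of the Middle Chinese initial.
# Assumption: blank cells in the table continue the merged cell above.
MANDARIN_REFLEXES = {
    "voiceless":        {1: ["1a"], 2: ["2"], 3: ["3"], 4: ["1a", "1b", "2", "3"]},
    "voiced sonorant":  {1: ["1b"], 2: ["2"], 3: ["3"], 4: ["1a", "3"]},
    "voiced obstruent": {1: ["1b"], 2: ["3"], 3: ["3"], 4: ["1b", "3"]},
}

def mandarin_reflexes(initial_class, category):
    """Return the possible Mandarin tone labels for one cell of the table."""
    return MANDARIN_REFLEXES[initial_class][category]

# 'woman': *voiced sonorant initial with category 2 glottalization
print(mandarin_reflexes("voiced sonorant", 2))
# Category 2 after a *voiced obstruent has merged with 3
print(mandarin_reflexes("voiced obstruent", 2))
```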

I would expect 'woman' in other Chinese languages to have tone 2 or 2b. (-a indicates a tone conditioned by *voiceless initials and -b a tone conditioned by *voiced initials.) But in fact 'woman' also has tones in other categories:

Group\Tone 1a 12a 1b 12b 2 2a 23a 2b 3 3a 3b













Are forms with tones 1 and 3 evidence for reconstructing Old Chinese variants ending in zero and *-s? And are a-tones evidence for reconstructing a voiceless initial? Not necessarily:

- In standard Mandarin, 2 merged with 3 after *voiced obstruents. Maybe some varieties merged 2 with 3 after *voiced sonorants as well.

- In standard Mandarin, 2 has the same reflex after *voiceless initials and *voiced sonorants. Maybe some varieties developed 2a after those two classes of initials and 2b after *voiced obstruents.

- Some varieties merged 2a with 1a (12a) or 2b with 1b (12b). In some cases this may have only occurred after *voiced sonorants (and the results were labeled 1a and 1b).

3.12.2:13: 馬 'horse' belongs to the same Middle Chinese tonal category (*voiced sonorant + *-ˀ) as 'woman'. The notes at give some insight into its unusual tones in Wu varieties: e.g., the tone from *voiced sonorant + *-ˀ merged with 1a in 溧陽 Liyang, 3a in 衢州 Quzhou, and 3b in 崑山 Kunshan.

14.3.10:23:59: A DAY 2: RHYMES

Two days ago, I wrote about the large number of initial consonants in the forms for 女 'woman' in Chinese languages. There is even more variation among the rhymes. This table may be the widest I've ever made in HTML with fifty columns:

Group\Rhyme -i -iə -iou -iɛ -iɔ -yʮ -ye -ui -uei -øi -øy -œi -œy -ɔi -ɔy












I didn't even include the syllabic nasal rhymes -ŋ̍ and -n̩ in Hakka. More on such rhymes below.

Let me try to make some sense out of all that. I am sure what follows will have to be wrong to some extent because I am making generalizations and I do not know the histories of all the individual varieties in each group.

I do at least know the history of standard Mandarin [ny] 'woman':

*Rɯ-naʔ > *Rɯ-nɨaʔ > *Rnɨaʔ > *rnɨaʔ > *nrɨaʔ > *ɳɨaʔ > *ɳɨəˀ > *ɳio > *ɳø > *ɳy > ny
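The chain above can be replayed as an ordered list of string rewrites, which makes it easy to check that each stage follows from the previous one. This is only a sketch: the rule list is my own segmentation of the chain into one change per step, and the comments are informal glosses rather than claims about dating or conditioning.

```python
# Ordered rewrite rules (old, new); each is applied once, in sequence.
RULES = [
    ("naʔ", "nɨaʔ"),   # *a partially raised after the high-vowel presyllable
    ("Rɯ-n", "Rn"),    # presyllable vowel lost
    ("Rn", "rn"),      # *R- became *r- before *n-
    ("rn", "nr"),      # metathesis
    ("nr", "ɳ"),       # fusion into a retroflex nasal
    ("ɨaʔ", "ɨəˀ"),    # Middle Chinese stage
    ("ɨəˀ", "io"),
    ("io", "ø"),
    ("ø", "y"),
    ("ɳ", "n"),        # modern standard Mandarin initial
]

def derive(form, rules):
    """Apply each rule once, in order, and return every stage."""
    stages = [form]
    for old, new in rules:
        form = form.replace(old, new, 1)
        stages.append(form)
    return stages

stages = derive("Rɯ-naʔ", RULES)
# Reconstructed stages get an asterisk; the attested modern form does not.
print(" > ".join("*" + s for s in stages[:-1]) + " > " + stages[-1])
```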

-y is the only rhyme for 'woman' found across Chinese. Many -y forms could be borrowings from Mandarin(-like) prestige dialects with -y.

Other high vowels like -i and -u and the diphthong -iu could either be from *-y or borrowings of -y in varieties lacking -y.

Falling diphthongs could be from warped high vowels: e.g., Cantonese -œy may be from *-y. Diphthongs like *-œy could lose part of their rounding and become -oi, etc.

Rising diphthongs like -iə might be partial retentions of earlier rising diphthongs like *-ɨə.

The mid vowel rhymes may be partial retentions of the second halves of earlier rising diphthongs like *-ɨə.

The low vowel rhymes may retain the height of Old Chinese *-a. Lowering is doubtful since Chinese vowels tend to raise. Could Wenzhou na* directly reflect the second half of Old Chinese *Rɯ-naʔ?

The nasality of the initial conditioned new codas in rhymes like -ɯŋ.

The nasal was all that was left in Hakka forms such as Meixian which I presume is from an earlier *n- + high vowel sequence.

I can't explain the z-type rhymes: -ʮ ([zʷ] in IPA), -yz, and -yʮ. Normally such rhymes developed after sibilants, not nasals.

*Wenzhou na is labeled as literary at, though the form listed as colloquial (ȵy) resembles mainstream ny-forms and is hence likely to be a borrowing.


The forms for 女 'woman' in Chinese languages may be only the tip of an iceberg of lost diversity. The northwestern Chinese dialect known to the Tangut became extinct, leaving only substratal traces in the Mandarin dialects that replaced it.

I thought northwestern Mandarin m-forms for 'woman' like Xi'an mi* might be substratal retentions, yet I know of no premodern evidence for such a word.

Could those Mandarin m-forms be borrowed from the m-forms of Jin to the east? Such an m-word for 'woman' need not have anything to do with Tangut m-words for females, as m-words for 'mother' have developed independently in many languages.

女 had a retroflex nasal *ɳ- in Middle Chinese. Coblin (1994: 102) listed two other cases of Xining m- corresponding to Middle Chinese *ɳ- before i:

尼 'nun': Xining mi : Middle Chinese *ɳi

膩 'oily': Xining mi : Middle Chinese *ɳiʰ

Did *ɳ- become m- in the northwest after the Tangut period, and were those m-forms replaced by mainstream n-forms with isolated exceptions like 'woman'? The trouble is that *ɳ- > m- before i makes no phonetic sense.

Let me put aside the above 'mi-stery' and look at the earliest attestations of 'woman' in the northwest:

Tibetan transcriptions from c. a millennium ago (Coblin 1994: 156): ji, Hji, HjI**

Preinitial H- represents a homorganic nasal.

Khotanese Brahmi transcriptions (Coblin 1994: 156): jū, ś̮ī***

Tangut transcription

4706 2ju'3 'a name character' (also 'woman', a borrowing from Chinese)

also used to transcribe the first syllable of 'Jurchen'

Alas, I have not seen any Uyghur or Arabic transcription.

Those transcriptions point to something like *ndžy in early northwestern Chinese.

I use a non-IPA symbol ž to avoid committing to *ʐ, *ʒ, or *ʑ. I use j in my Tangut transcription to similarly avoid committing to *ndʐ, *ndʒ, or *ndʑ (though I think *ndʐ is most likely). I leave out prenasalization in my Tangut transcription as it is nonphonemic and possibly even optionally absent.

The ī ~ ū variation in Khotanese indicates a high vowel like or *y absent from Khotanese. I chose front rounded *y because the Tangut could have borrowed central as the vowel that I transcribe as y (following the convention of transliterating Russian ы as y).

I do not know why a Chinese word was transcribed and borrowed with the mysterious phonetic quality that I transcribe as 'prime' (-') in Tangut. If 'prime' was glottal stop or glottalization, it might correspond to the glottalization of Middle Chinese *ɳɨəˀ which might have survived into a later period. (3.10.0:31: Emmerick and Pulleyblank (1993: 56) think Khotanese transcriptions may reflect the late survival of glottalization.)

In early Tang, *ɳɨəˀ became something like *ɳɖɨəˀ and later *ɳɖjøˀ in the northwest. (I am assuming the glottalization survived even after *jø raised and fused to *y in *ndžyˀ.) I cannot tell whether the Japanese Kan-on reading jo from an Old Japanese reading ndiyə was borrowed from a northwestern *ɳɖɨəˀ or *ɳɖjøˀ; Old Japanese ə would have been the best approximation of *ø if it was present in the source of Kan-on.

*Presumably mislabeled as literary at; the more mainstream-looking form ȵy is more likely to be colloquial.

**Capital I transliterates the mysterious gigu inversé (reversed i) of Tibetan. See Hill (2010: 116).

***I think Coblin used ś̮ to transliterate the letter that Emmerick and Pulleyblank (1993: 55) transliterated as ś’ and interpreted as [ʒ].

3.10.0:30: The transcription ś̮ī is taken from nama ś̮ī, a transcription of the phrase 男女 *nam ndžy 'man and woman'. The prenasalization of *ndžy might have been difficult to hear after the nasal coda of the first syllable.

14.3.8:23:59: A DAY

Today is International Women's Day, so I thought I'd look at the development of 女 'woman' in Chinese languages.

I'll start in ... the middle. A generic Middle Chinese form might be *ɳɨəˀ:

Its retroflex initial goes back to Old Chinese *nr-.

Its diphthong goes back to Old Chinese *a which was raised due to the presence of a preceding high-vowel presyllable that was later lost:

*Cɯ-a > *Cɯ-ɨa > *Cɯ-ɨə

Its glottalization goes back to an Old Chinese final glottal stop *-ʔ.

There are two possible Old Chinese reconstructions:

*Cɯ-nraʔ with a presyllable whose initial consonant left no trace in Middle Chinese

*Rɯ-naʔ with a coronal-initial presyllable that lost its vowel, possibly became *r- before *n- (if *R- was a consonant other than *r- such as *l- or *t-), metathesized, and fused with *n:

*Rɯ-naʔ > *Rɯ-nɨaʔ > *Rnɨaʔ > *rnɨaʔ > *nrɨaʔ > *ɳɨaʔ > *ɳɨəˀ

I favor the latter reconstruction because some modern forms point to a simple root initial *n-:

Group\Initial m- t- d- nd- n- l- nz- z- ʐ- ɲ- = ȵ- ʔɲ- j- g- ŋ- Ø-

The diagnostic forms are not those with n- which is from *nr- (in turn from either *Cɯ-nr- or *Rɯ-n-). They are in fact those with nz-, z-, and ʐ- which are reflexes of *Cɯ-n- (and *C- could be *R-).

Forms such as

孝義 Xiaoyi nzu (colloquial*)

舒城 Shucheng Mandarin

石樓 Shilou Jin ʐu

may be from *nɨaʔ which lost its presyllable following the partial raising of *a and developed a palatal rather than a retroflex initial:

*Rɯ-naʔ > *Rɯ-nɨaʔ > *nɨaʔ > *ɲɨaʔ > *ɲɨəˀ > nz-/z-/ʐ-
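To see where the two developments part ways, the retroflex chain and the palatal chain can be compared stage by stage. The sketch below simply copies both chains as lists (asterisks omitted) and finds their shared prefix; the variable and function names are mine.

```python
from itertools import takewhile

# Stages up to Middle Chinese, copied from the two chains in this post.
retroflex_path = ["Rɯ-naʔ", "Rɯ-nɨaʔ", "Rnɨaʔ", "rnɨaʔ", "nrɨaʔ", "ɳɨaʔ", "ɳɨəˀ"]
palatal_path = ["Rɯ-naʔ", "Rɯ-nɨaʔ", "nɨaʔ", "ɲɨaʔ", "ɲɨəˀ"]

def shared_stages(a, b):
    """Return the stages two derivations have in common before diverging."""
    pairs = takewhile(lambda pair: pair[0] == pair[1], zip(a, b))
    return [x for x, _ in pairs]

common = shared_stages(retroflex_path, palatal_path)
print(common)  # the stages both paths pass through
print(retroflex_path[len(common)], "vs", palatal_path[len(common)])
```

The first differing pair shows the split: the presyllable either loses only its vowel or is lost as a whole.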

The palatal nasal ɲ- (> j- > zero) may or may not be a retention of *ɲ- from *Cɯ-n-.

I don't know if the glottal stop in ʔɲ- is real or just a notational device like the one that Zee (2003: 131) rejected. I doubt it is a trace of a presyllable.

In some Min varieties, *ɲ- seems to have backed to ŋ- which may have hardened to g- in 沙縣 Shaxian (via an intermediate prenasalized stage: *ŋ- > *ŋg- > g-).

Elsewhere, the retroflex initial *ɳ- became n- (> l- or nd- > d- > t-?).

3.9.0:51: I forgot to mention the m-forms in Jin and northwestern Mandarin (Coblin 1994: 101-102, 156; the Jin forms are not listed at). I thought the northwestern Mandarin forms could be substratal words related to Tangut m-words for females:

0092 1ma4 < *Cɯ-ma-C 'mother'

0960 1meq4 < *Sɯ-me 'young girl'

3168 1my'4 < *mi-ʔ 'woman'

3209 1my'4 < *mi-ʔ, first syllable of 1my'4 2ur4 'female servant'

3334 1ma4 < *Cɯ-ma-C 'female'

5162 1my4 < *mi 'mother'

Could the m-forms in Jin be borrowings from the northwest, or are they native?

*The literary form ny is a borrowing from a dialect with n- from *nr-.


Two nights ago, I transcribed the reading of Tangut

4602 'eight'

as 1ar4. One might expect its Tibetan and Chinese transcriptions to be a(r) and 阿 *a, but in fact they are

rye, ?e, na (sic!)


with nonzero, non-glottal stop initials.

Given that evidence and the fact that the word is cognate to Written Tibetan brgyad < *p-rjat and Somang wu-rját*, why don't I transcribe it as 1(r)yar4 from pre-Tangut *rjat?

- Tibetan ry- may reflect a Tangut dialect that did not simplify *ry- to y-.

- Tibetan n- may be due to misreading r- as n-.

3.8.2:16: Andrew West pointed out that na is in fact a Tibetan transcription of

4601 2na4 'second person singular suffix'

and not 4602 1ar4 in this manuscript.

- Tibetan and Chinese e and a may reflect a front low vowel [æ]; Grade IV is associated with frontness: e.g., 耶 ye and 盈 ying both have front vowels in modern Mandarin (which is not descended from the northwestern dialect known to the Tangut that became a substratum of the Mandarin dialects that replaced it).

The answer is that my 1ar4 was a mechanical conversion of Gong's 1·jar:

Gong's · (glottal stop) > my zero

I want to make my transcription as simple as possible for nonlinguists. ʔ is not understood by laypeople, and a letter like q- as a substitute for ʔ- could be misunderstood as [k].

Gong's Grade III -j- > my -4 after Class VIII initials before rhymes
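As a rough illustration of what "mechanical conversion" means here, the two rules above can be written as a tiny function. It is a sketch covering only the syllable shape of 'eight' (tone digit, glottal stop, medial -j-, rhyme); the function name is mine, and anything outside that shape is returned unchanged rather than guessed at.

```python
def simplify(gong):
    """Convert a Gong-style reading such as '1·jar' using the two rules above."""
    tone, rest = gong[0], gong[1:]
    if rest.startswith("·j"):
        # Rule 1: the initial glottal stop (·) is written as zero.
        # Rule 2: Grade III -j- after that initial becomes the suffix -4.
        return tone + rest[2:] + "4"
    return gong  # other syllable shapes are outside this sketch

print(simplify("1·jar"))
```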

Gong was not the first to reconstruct 'eight' with a glottal stop-yod cluster:

Nishida 1964: 1ˀyar

Sofronov 1968: 1·i̯ạ (I have restored a subscript dot that was accidentally omitted)

Arakawa 1997: 1'ya:r

I am suspicious of this cluster because Nishida, Sofronov, and Arakawa did not reconstruct a simple initial [j]. Is there any language that has initial [ʔj] without [j]? Is the glottal stop really necessary?

Although Li Fanwen abandoned his 1986 reconstruction in favor of Gong's mid-90s reconstruction, I think Li may have been correct when he reconstructed a simple j- instead of ʔj- in 1jǐar 'eight'.

It is true that Chinese transcriptions of Tangut syllables that Li reconstructed with j- would have been pronounced with initial *ʔ- as well as *j- in Middle Chinese. However, there is no guarantee that the *ʔ- : *j- distinction was maintained before high vowels in the post-Middle Chinese Tangut period. (It has been lost in modern Mandarin.) Moreover, even if the distinction was maintained, the Tangut native speaker author Kwyli Rirphu (骨勒茂才 'Gule Maocai') of the Timely Pearl might not have been able to hear it. His Chinese transcriptions would not necessarily be the same as those of a Chinese native speaker.

I will add y- (IPA [j]) to 'eight' and all other syllables in its fanqie chain in my database of Tangut syllables.

*Although I prefer to use Japhug as an example of a rGyalrong language, Japhug rcat has an inexplicable -c- instead of the expected *-ʑ- from the *-j- preserved in Somang.


In " 'Prime'-'Eight' Problems", I mentioned

2621 2se'4 < *Cɯ-saŋʔ-s 'to think'

as an example of a Tangut word with 'prime'.

Li Fanwen (2008: 431) regarded it as a Chinese loanword. Its pre-Tangut form is almost identical to Old Chinese 想 *Cɯ-saŋʔ 'to think'. Is this striking resemblance due to inheritance or early borrowing?

First, the resemblance is open to question.

Besides the fact that nobody but me reconstructs presyllables in 2621 and 想 to account for their later vocalism, the root initial of 想 is uncertain. We know for sure that 想 had initial *s- in Middle Chinese, but *s- had a variety of possible Old Chinese sources: e.g., *s-nasal and *s-liquid clusters (Baxter and Sagart 2014: 151).

Tangut -e'4 could be from *-je-ʔ as well as *Cɯ-...-aŋʔ, so 2se'4 might be cognate to

3469 2se4 < *sje-s  'to know' (cf. Written Tibetan shes-pa 'id.')

with a primary yod if it is from *sje-ʔ.

However, I think all of the above may be excessively cautious given Japhug sɯ-so 'to think' from *saŋ which is a better semantic match for 2621 'to think' than 3469 'to know'. Hence I follow Guillaume Jacques (2014: 180) in reconstructing 2621 with a final nasal (though not a yod or vowel length which correspond to my high-vowel presyllable and glottal stop).

Jacques' *sjaaŋ : my *Cɯ-saŋʔ-s

It is tempting to assume that Japhug retains a presyllable sɯ- lost in Tangut and Chinese, but in fact that sɯ- is a reduplication of the following syllable (Jacques 2014: 180). I am hesitant to project such reduplication back to the common ancestor of Japhug, Tangut, and Chinese (which may have been a daughter of Proto-Sino-Tibetan rather than Proto-Sino-Tibetan itself).

Second, if the resemblance is genuine, it is probably due to inheritance rather than borrowing. Tangut-Chinese contact seems to predate the founding of the Tangut Empire by only a few centuries. I know of no Tangut borrowings predating Middle Chinese. In Middle Chinese, *Cɯ-saŋʔ became *sɨaŋˀ which in turn became *2son3 with a nasal vowel and later *2so3 with an oral vowel in the northwest. (I use 2- to indicate the 'rising' tone and -3 to indicate Grade III.) Tangut borrowings of Middle Chinese words with the rhyme of 想 have Tangut rhymes 53 -o3, 56 -on1, 57 -on2, and 58 -on4, not rhyme 40 -e' which is otherwise unknown in loans from Chinese (Gong 2002: 423-424). If 2621 is from Chinese - which I doubt - it is anomalous and must be a very early loan predating the shift of *-ɨaŋˀ to 2-on3:

Middle Chinese *sɨaŋˀ > pre-Tangut *sjaŋʔ-s (with added suffix) > Tangut 2se'4

One could try to make a similar argument for


2192 1me'4 < *mjaŋ-ʔ 'corpse' and 0781 ~ 0788 2me4 < *mjaŋ-s 'to die'

as borrowings from Late Old Chinese 亡 *mɨaŋ (< Old Chinese *Cɯ-maŋ) 'to disappear, die', but a pre-Middle Chinese loan is even less likely than an early Middle Chinese loan. I prefer to derive those words from *Cɯ-maŋ-ʔ and *Cɯ-maŋ-s and view them as true cognates of Old Chinese *Cɯ-maŋ.

3.7.0:28: Borrowings of basic words like 'to die' are extremely unlikely when contact is minimal (and of course impossible when contact is nonexistent).

14.3.5:23:55: 'PRIME'-'EIGHT' PROBLEMS

Last night I forgot to include two things in my overview of the development of a-rhymes in Tangut.

First, I wrote nothing about rhymes with the mysterious attribute that I call 'prime': i.e., -a' and -ar'. I write the pre-Tangut source of 'prime' as *X. I arbitrarily write it after vowels, but I really don't know where *X was positioned.

At a glance, it seems that *aX-rhymes developed like *a-rhymes apart from the development of 'prime': e.g.,

4629 2ghi'4 < *Cɯ-KaX 'to cook'

is parallel to

4513 2dzi4 < *Nɯ-dza-s 'to eat, drink; food'

This afternoon I thought that perhaps *X was *ʔ, and that all cases of the 'rising' tone go back to *-s:

Before: *-ʔ(-s) and *-s > tone 2

Then: *-ʔ > -' + tone 1, *-ʔ-s > -' + tone 2, *-s > tone 2

But now I realize I would have to reconstruct awkward stop clusters with glottal stop in words like

3192 1la'1 < *lakX (*lakʔ?) 'thick'

which has a non-'prime' cognate

2700 1laq1 < *S-lak 'thick'.

A final stop blocked *a from raising. Japhug jaʁ < *laq 'thick' suggests that stop was velar or even uvular. (There is no Tangut-internal evidence for a distinction between velar and uvular codas. I could write the pre-Tangut coda as *-K.)

Maybe *lakX had a final geminate (*lakk) or a stop cluster without a glottal stop (*lak-t or *lak-p which assimilated to *lakk?).

I suspect that tense-rhyme words once had tense initials from *S-C-clusters: e.g., *llak 'thick'. (I write tense initials as geminates following hangul and romanization conventions for Korean.) There are no tense rhymes with 'prime': e.g., *-aq'. Perhaps there was a constraint against syllables of the type *kkakk in pre-Tangut.

Summing up my current view (leaving out presyllables, vowels, and retroflexion to focus on tones and 'prime'):

*-V(C) > 1-V

e.g., *S-lak > 2700 1laq1 'thick'

*-V(C)-s > 2-V

e.g., *Nɯ-dza-s > 4513 2dzi4 'to eat, drink; food'

*-Vʔ, *-V + sonorant + stop, *-V + stop cluster > 1-V'

e.g., *Cɯ-Kaʔ > 4629 1ghi'4 'to cook', *Cɯ-maŋX > 0330 1me'4 'dream' and *lakX > 3192 1la'1 'thick' (*X = *p, *t, *k, *ʔ)

*-Vʔ-s, *-V + sonorant + stop + -s, *-V + stop cluster + *-s > 2-V'

e.g., *Cɯ-ne-ʔ-s > 2518 2ne'4 'heart', *Cɯ-saŋʔ-s > 2621 2se'4 'to think' (cf. Old Chinese 想 *Cɯ-saŋʔ 'to think') and *rjakX-s > 0811 2ar'4 'day' (Forgot to add examples until 3.6.0:36!)
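The four-way summary boils down to two independent tests on the pre-Tangut coda: a final *-s gives tone 2, and a glottal stop, a sonorant + stop, or a stop cluster gives 'prime'. The sketch below encodes just that, with codas written as plain strings and *X kept as a placeholder stop. The sonorant and stop sets and the function name are my assumptions for illustration.

```python
SONORANTS = set("mnŋrwj")   # assumed sonorant codas
STOPS = set("ptkʔX")        # X = the unidentified source of 'prime'

def tone_and_prime(coda):
    """Map a pre-Tangut coda string to (tone, has_prime)."""
    has_s = coda.endswith("s")
    core = coda[:-1] if has_s else coda   # coda without the *-s suffix
    tone = 2 if has_s else 1
    prime = core == "ʔ" or (
        len(core) == 2
        and core[0] in (SONORANTS | STOPS)
        and core[1] in STOPS
    )
    return tone, prime

# Codas of the examples above
for word, coda in [("thick (2700)", "k"), ("to eat", "s"), ("to cook", "ʔ"),
                   ("thick (3192)", "kX"), ("to think", "ŋʔs")]:
    print(word, tone_and_prime(coda))
```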

Second, I didn't mention *a-rhymes with what I call 'primary yod' (following Bodman). *r has distinct reflexes before a-rhymes and *ja-rhymes.

Li Fanwen number | Gloss | Pre-Tangut (Jacques 2014) | Tangut (Gong) | Pre-Tangut (this site) | Tangut (this site) | External cognates
1579 | to get | *rja | 1rjiʳ | *Cɯ-ra | 1rir4 | Written Burmese <ra> 'to get'
4602 | eight | *r-jat | 1ʔjaʳ | *rjat | 1ar4 [jaʳ]? | Classical Tibetan brgyad < *p-rjat 'eight'

Guillaume Jacques' distinction between *rj- and *r-j- corresponds to my *Cɯ-r- and *rj-. I prefer my reconstruction because his medial *-j- (projected backwards from Gong's Grade III/IV -j-) usually does not correspond to anything in other languages or Tibetan transcriptions of Tangut. I think Grade III/IV generally had other sources: e.g., *Cɯ-. However, I think yod is justified in pre-Tangut if it corresponds to yod in other languages: e.g., 'eight' (which was transcribed as rye in Tibetan). Moreover, external evidence points to *r- and not *j- as the initial of the Proto-Sino-Tibetan root for 'eight'.


Yesterday I derived

4513 2dzi4 [ndzi] 'to eat, drink; food'

from *CI-ndza with a high front vowel conditioning 'brightening' (raising and fronting) of *a to i4. I could have also reconstructed *NI-dza.

Today I would instead reconstruct *Nɯ-dza with *ɯ symbolizing an unstressed high vowel.

Until now I reconstructed high front and back presyllabic vowels to condition *a in different ways:

*Cɯ-Ca > Ca3/4 (the grade is dependent on the preceding consonant)

*CI-Ca > Ci3/4 (ditto)

However, I now partly follow Guillaume Jacques (2014) who derived Tangut -a from pre-Tangut *-a-stop combinations. Here is my version of the a-rows of his table 39 on p. 206:

Stage 0 Stage 1 Stage 2 Stage 3
*(Cʌ-)...-(r)a 1-i1/2
*(Cʌ-)...-(r)aʔ(-s) *(Cʌ-...)-aH 2-i1/2
*(Cʌ-)...-(r)ap *1-aʔ1/2 1-a1/2
*(Cʌ-)...-(r)ap-s *(Cʌ-)...-(r)aS *2-aH1/2 2-a1/2
*(Cʌ-)...-(r)ar(-ʔ/s) *(Cʌ-)...-(r)ar(-H) *1/2-ar1/2 1/2-ar1/2
*(Cʌ-)...-(r)aw(-ʔ/s) ? 1/2-o1/2
*(Cʌ-)...-(r)aj(-ʔ/s) 1/2-e1/2?
*(Cʌ-)...-(r)am(-ʔ/s) 1/2-on1/2
*(Cʌ-)...-(r)an(-ʔ/s) 1/2-an1/2?
*(Cʌ-)...-(r)aŋ(-ʔ/s) 1/2-o1/2
*Cɯ-...-a *Cɯ-...-ɨa 1-i3/4
*Cɯ-...-aʔ(-s) *Cɯ-...-ɨaH 2-i3/4
*Cɯ-...-ap *Cɯ-...-ɨap *1-aʔ3/4 1-a3/4
*Cɯ-...-at *Cɯ-...-ɨat
*Cɯ-...-ak *Cɯ-...-ɨak
*Cɯ-...-ap-s *Cɯ-...-ɨaS *2-aH3/4 2-a3/4
*Cɯ-...-ar(-ʔ/s) *Cɯ-...-ɨar *1/2-ar3/4 1/2-ar3/4
*Cɯ-...-aw(-ʔ/s) ? 1/2-e3/4?
*Cɯ-...-am(-ʔ/s) 1/2-on3/4
*Cɯ-...-an(-ʔ/s) 1/2-an3/4?
*Cɯ-...-aŋ(-ʔ/s) 1/2-e3/4
*(Cʌ)-(r)aˠm(-ʔ/s) 1/2-a1/2?

Notes on stage 0

1. I assume the pre-Tangut coda inventory was similar to that of Old Chinese.

2. I assume that Tangut tones originated from final segments as in Old Chinese.

3. Unlike Old Chinese, pre-Tangut had a velarized *aˠ. This vowel only had distinct reflexes before *-m and *-ŋ.

4. I assume that *-ʔ and *-s were suffixes after consonants. In other words, there were no roots ending in two consonants. (I could be wrong if, for instance, *-nʔ was from an earlier root-final *-nTV, etc.)

5. I assume *-ʔ could not occur after stops: e.g., there was no *-k-ʔ, etc.

6. Some *-w are third person patient suffixes: e.g., *Nɯ-dza-w, the stem of 'I eat it' and 'thou eatest it'. That stem was later written as

4547 1dzo?

whose grade could be 3 or 4.

Notes on stage 1

7. *-ʔ and *-h merged into *-H. This merger has no parallel in Chinese.

8. *-p-s, *-t-s, and *-k-s merged into a sibilant *-S that could have been [ts]. This is similar to the merger of *-p-s and *-t-s (but not *-k-s!) in Old Chinese.

9. *Cɯ- conditioned the partial raising of *a to -ɨa.

Notes on stage 2

10. Presyllables might have been gone by this point. They were certainly gone by stage 3 (unless the preinitials in the Tibetan transcriptions of Tangut represent presyllables in a conservative Tangut dialect).

11. Reflexes of *a developed Grade I, whereas reflexes of *ra developed Grade II.

12. *(r)a raised to i1/2 unless followed by a consonant.

13. *ɨa raised to i3/4 (the grade is dependent on the preceding consonant) unless followed by a consonant. The presence or absence of *-r- made no difference before *ɨa.

14. There was a chain shift:

*-S > *-H > *-Ø

Stage 1 *-H conditioned tone 2 (possibly breathy voice at this point?) and disappeared.

A new stage 2 *-H from *-S blocked the raising of *a and *ɨa. Syllables with this *-H developed tone 2.
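
The ordering matters in this chain shift: the old *-H has to be out of the way before *-S becomes the new *-H, or the two would merge. Here is a minimal sketch of the Grade I *a rows only, assuming a stage 1 coda is one of "", "H", "S", or a plain stop. The function name and the output format are my own, and grades, *-r-, and the sonorant codas are left out.

```python
def derive(stage1_coda):
    """Map a stage 1 coda after *a to a stage 3 'tone-vowel' string."""
    if stage1_coda == "H":
        # Old *-H conditions tone 2 and disappears.
        tone, coda = 2, ""
    elif stage1_coda == "S":
        # *-S becomes a new *-H, which also ends up with tone 2.
        tone, coda = 2, "H"
    elif stage1_coda in ("p", "t", "k"):
        # Final stops are assumed to reduce to a glottal stop at stage 2.
        tone, coda = 1, "ʔ"
    else:
        tone, coda = 1, ""
    # Raising: *a > i only when no consonant follows at stage 2.
    vowel = "i" if coda == "" else "a"
    # Stage 3: all codas are lost, so only tone and vowel remain.
    return f"{tone}-{vowel}"

for coda in ("", "H", "S", "p"):
    print(repr(coda), "->", derive(coda))
```

Because the branches are mutually exclusive, a syllable that starts with *-S can never be treated as if it had the old *-H, which is the point of the chain.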

Notes on stage 3

15. All codas were lost. What appear to be codas in the transcription represent vowel qualities:

Stage 2 -r is [r], whereas stage 3 -r represents vowel retroflexion [ʳ].

Stage 3 -n represents nasalization [˜].

16. Nearly all syllables with (pre)initial r- developed retroflex vowels by stage 3:

*ra > rir

*rCa > Cir

For simplicity this retroflexion is not included in the stage 3 column.

17. The monophthongization of *-aw, *-aj, *-am, and *-aŋ was complete by stage 3, but I don't know if it occurred at stage 1 or 2. A close examination of the layers of Chinese borrowings may clarify the relative chronology of sound changes in Tangut.


In my last post, I mentioned

4517 1dzi3 'to eat'

as an example of a basic word which should have been in the 'level' tone volume of Tangraphic Sea but was actually in Mixed Categories. All tangraphs with initial dz- were placed in Mixed Categories due to a massive error by the compilers of the first two volumes.

The word is not only noteworthy for its location in Tangraphic Sea but also for its unusual initial-rhyme combination. Class VI initials (i.e., alveolar sibilants) and Class IX z- usually precede Grade I and Grade IV rhymes. So in theory there could be a 1dzi1 and a 1dzi4, but not a 1dzi2 or 1dzi3. However,

4517 1dzi3 'to eat' and its homophones 0382 and 4912

have the Grade III rhyme 1.10 instead of the expected Grade IV rhyme 1.11 in

0943 1110 2696 3259 4829

which are in a separate homophone group in both Mixed Categories and Homophones.

Moreover, the 'rising' tone cognate of 4517 1dzi3 has the Grade IV rhyme 2.10!

4513 2dzi4 'to eat, drink; food'

Another possible Grade IV cognate is

4581 2dzi4 'to entertain at a banquet'

The only other case of Grade III/IV alternation following the same initial that I can think of is

3408 1tsa3 'to broil, roast' ~ 0618 1tsa4 'hot'
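
The combination constraint described above (alveolar sibilants and z- normally take only Grade I and Grade IV rhymes) can be sketched as a simple check over transcriptions in this site's notation. The function names and the list of initials are hypothetical, and the parser assumes the form tone digit + syllable + grade digit.

```python
import re

# Alveolar sibilant initials (Class VI) plus Class IX z-, longest first.
ALVEOLAR_SIBILANTS = ("tsh", "dz", "ts", "s", "z")
EXPECTED_GRADES = {1, 4}

def parse(syllable):
    """Split e.g. '1dzi3' into (tone, body, grade)."""
    m = re.fullmatch(r"([12])(\D+)([1-4])", syllable)
    if m is None:
        raise ValueError(f"unrecognized transcription: {syllable!r}")
    return int(m.group(1)), m.group(2), int(m.group(3))

def has_alveolar_sibilant(body):
    for initial in ALVEOLAR_SIBILANTS:
        if body.startswith(initial):
            # Exclude sh- and zh-, which are not alveolar.
            return not body[len(initial):].startswith("h")
    return False

def is_anomalous(syllable):
    """True if an alveolar sibilant or z- precedes a Grade II/III rhyme."""
    _, body, grade = parse(syllable)
    return has_alveolar_sibilant(body) and grade not in EXPECTED_GRADES

for s in ("1dzi3", "2dzi4", "1tsa3", "1tsa4"):
    print(s, is_anomalous(s))
```

Run over a full syllable list, a check like this would show whether 4517 and 3408 really are the only cases of this kind.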

Both 'to eat' and 'hot' have external cognates ending in -a(t):

'to eat': Written Tibetan za-ba < *dz- 'to eat', Japhug ndza 'to eat'

'hot': Written Tibetan tsha < *ts- 'hot', tshad-pa < *tsat- 'heat'

Why did *a(t) develop into four different rhymes in Tangut? Normally *-a rose to -i1 unless preceded by the raising prefix *CI- or a stop coda *-p/-t/-k (most likely a suffix *-t in the case of 'hot'):

*CI-ndza-H > *2dzi4 'to eat' (*ndza would have become *1dzi1)

but the *-H-less 'level' tone counterpart is 1dzi3, not 1dzi4!

*tsa-t > 1tsa4 'hot' (*tsa would have become *1tsi1)

How can the anomalous Grade III forms be explained? Do they have some rare prefix? Are they borrowings from another dialect that had undergone different changes? It is unlikely that a basic word like 4517 'to eat' could be borrowed. Could they be archaisms? I used to reconstruct Grade III with central -ɨ- and Grade IV with front -i-. I thought *-ɨ- fronted to -i- after alveolar sibilants, but perhaps 4517 'to eat' and 3408 'to broil' had not undergone that fronting:

*CI-ndza > *CI-ndzɨa > *CI-ndzɨi > 1dzi3 [ndzɨi] (cf. 2dzi4 [ndzi] 'to eat, drink; food')

*tsa-t > *tsɨat > 1tsa3 [tsɨa]? (cf. 1tsa4 [tsia]? 'hot')

Lastly, is the rhyme of the o-stem of 4517 'to eat' Grade III like 4517 or IV like 4513?


4547 1dzo? 'to eat' = 4517 1dzi3 + 5376 1tso4

I used to think there was no phonemic distinction between -o3 and -o4, and I automatically reconstructed -o4 after alveolar sibilants. So I once reconstructed 4547 as 1dzo4 like


5854 1dzo4 'to rein in; to tie or strap something tightly' = 4829 1dzi4 + 5848 1tsho4

However, 4547 and 5854 have different fanqie (see above) in Mixed Categories implying they weren't homophonous, even though both editions of Homophones have them in the same homophone group. Moreover, their fanqie final spellers are in the same chain:


5376 1tso4 < 4839 1so4 > 5854 1dzo4

so any distinction between them cannot be in their final vowels. My guess is that the fanqie for 4547 and 5854 are to be interpreted as

1dzi3 + 1tso4 = 1dzo3 [ndzɨo]? (with the Grade III medial -ɨ- of 1dzi3 [ndzɨi]?)

1dzi4 + 1tsho4 = 1dzo4 [ndzio]?

Grade III -ɨ- might have fronted to Grade IV -i- in the speech of the compiler(s) of Homophones.
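
My guess about how to read these two fanqie can be written out as a toy function. It is only a sketch: the names are hypothetical, the parser handles just this site's notation (tone digit, initial, rhyme, grade digit), and taking the grade from the initial speller is the guess under discussion, not an established rule. Taking the tone from the final speller follows the usual fanqie convention.

```python
import re

def split(syllable):
    """Split e.g. '1tsho4' into (tone, initial, rhyme, grade)."""
    m = re.fullmatch(r"([12])([^aeiouy\d]*)([aeiouy]\D*)([1-4])", syllable)
    if m is None:
        raise ValueError(f"unrecognized transcription: {syllable!r}")
    tone, initial, rhyme, grade = m.groups()
    return int(tone), initial, rhyme, int(grade)

def fanqie(initial_speller, final_speller):
    # Initial and grade (medial) come from the initial speller (the guess).
    _, initial, _, grade = split(initial_speller)
    # Tone and rhyme vowel come from the final speller.
    tone, _, rhyme, _ = split(final_speller)
    return f"{tone}{initial}{rhyme}{grade}"

print(fanqie("1dzi3", "1tso4"))   # 4547
print(fanqie("1dzi4", "1tsho4"))  # 5854
```

Under this reading the two fanqie give 1dzo3 and 1dzo4, so the only difference between 4547 and 5854 would be the medial carried over from the initial speller.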


Last week, I asked,

Was it [the Mixed Categories volume of Tangraphic Sea] a compilation of characters that were accidentally left out of the other two volumes, or do its characters have something else in common?

If Mixed Categories were an appendix, I would not expect three of its tangraphs to be among the top 25 tangraphs in the Tangut translation of The Art of War. The table below incorporates frequency data from Kotaka (2009: 2):

[Table labels: Li Fanwen number; Tangraphic Sea; Golden Guide. Glosses, in order: to say; to do; topic marker; to fight; to be; to say; perfective prefix; to go; genitive-dative suffix; to talk; transcription tangraph; the surname Vi; the surname Li; transcription tangraph; to command]

Why would 2dzwo4 'person', 1jeq3 'go', and 2lheq4 'country' be accidentally omitted from the first two volumes of Tangraphic Sea along with all other dz-, j-, and lh-tangraphs*: e.g., basic words such as

4517 1dzi3 'to eat', 0443 1jo3 'long', 2814 2lheq4 'moon, month'

which were in Mixed Categories?

The only scenario I can conceive is this: When the Tangraphic Sea was compiled, tangraphs were sorted by initials and were then sorted by rhyme. Somebody forgot to grab the lists of dz-, j-, and lh-tangraphs when sorting by rhyme, and those tangraphs and scattered others that were accidentally omitted were listed in Mixed Categories.

Why was Mixed Categories ordered by initial class rather than by rhyme? Tonight I realized that consistency might not always have been a good thing. Dividing only 558+ tangraphs into 183 rhyme categories (97 'level' tone rhymes and 86 'rising' tone rhymes) instead of 18 categories (nine initial classes per tone) would have made Mixed Categories difficult to navigate, would have resulted in blank sections under certain rhymes (e.g., 'level' tone rhymes 96 and 97 which are unrepresented in Mixed Categories), and would have scattered the dz-, j-, and lh-tangraphs across the volume instead of conveniently concentrating them in six sections (the Class VI, VII, and IX sections under the 'level' and 'rising' tones).
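
To put numbers on that comparison, here is a quick calculation using the figures above. It treats 558 as an exact count although the text gives it as a lower bound ("558+"), and the variable names are mine.

```python
tangraphs = 558                 # lower bound from the text ("558+")
rhyme_categories = 97 + 86      # 'level' + 'rising' tone rhymes
class_categories = 9 * 2        # nine initial classes per tone

per_rhyme = tangraphs / rhyme_categories
per_class = tangraphs / class_categories

print(f"{rhyme_categories} rhyme categories: about {per_rhyme:.2f} tangraphs each")
print(f"{class_categories} class categories: about {per_class:.2f} tangraphs each")
```

An average of roughly three tangraphs per rhyme heading, against roughly thirty per class heading, gives a sense of how thinly a rhyme-ordered volume would have been populated.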

*Arakawa (1997: 22, 32, 126, 128) reconstructed the Precious Rhymes of the Tangraphic Sea 'rising' tone volume tangraphs

4781 5919 4983

as 2dzi, 2dzi, and 2ja:, but I follow Gong and transcribe them as 2tshi1, 2tshi1, and 2zha3, so they are not exceptions to the rule that dz- and j-tangraphs are in Mixed Categories.

Arakawa (1997: 32, 128) listed


as a homophone of 4983 (his 2ja:) outside Mixed Categories, but it is in fact in Mixed Categories (he listed it again as a Mixed Categories tangraph on p. 96 but not p. 131) and has a completely different tone and rhyme (1jy3).

I used to think

5780 2lhi2

was a rare example of an lh-tangraph in the 'rising' tone volume of the Precious Rhymes of the Tangraphic Sea, but it is a fanqie initial speller for zh-, not lh-, so I now transcribe it as 2zhi2.

14.3.1:2:11: ?-HEARTED GIRL FIGHTER?

I was puzzled by the Thai title


nak rop saaw hua cay mahaakaan

lit. '-er fight girl head heart ?' = '?-hearted girl fighter'?

for Brave.

Mahaakaan is spelled <mahākāḷ> and seems to ultimately* be from Sanskrit mahākāla-, lit. 'great-black', originally a form of Shiva in Hinduism and later a dharma defender in Vajrayana Buddhism.

The Royal Institute Dictionary defines mahaakaan as a drug or as a plant (Gynura pseudochina).

None of those definitions seem to fit the context of Brave. Would a Scottish princess have the heart of Mahakala who isn't associated with Thai Buddhism? I would expect something like 'brave' modifying hua cay 'heart'.

The Vietnamese title of Brave has no semantic challenges for me:

Công chúa tóc xù 'Princess Bushy Hair'

However, 'bushy' has an unexpected combination of x- (normally < *cʰ-) and a lower series tone in a native word. x- with lower series tones in Sino-Vietnamese (e.g., 蛇 xà) comes from Late Middle Chinese *tɕʰ- < *(d)ʑ-. No such devoicing with aspiration occurred in Vietnamese. I could mechanically reconstruct an earlier voiced aspirate *ɟʱ- to account for the tone of 'bushy', but I wonder if the actual source of x- plus lower series tones in native words could be *cʰ- with a voiced prefix and/or *ɟ- with a voiceless prefix conditioning aspiration.

*The retroflex letter ฬ <ḷ> is due to influence from กาฬ kaan <kāḷ> from Pali kāḷa- 'black' which in turn is from Sanskrit kāla- with a dental l-. I don't understand why "[d]ental and retroflex sounds sporadically change into one another" in Pali.

The Pali Text Society's Pali-English Dictionary (1921-1925) does not list a *mahākāḷa- with a retroflex corresponding to Sanskrit mahākāla- with a dental l.


I usually say that two out of three volumes of the Tangraphic Sea have survived, but for brevity I don't note that the early parts of the level and rising tone sections of the Mixed Categories (MC) volume are missing. This is why Andrew West's electronic version of MC begins with class IV of the 'level' tone and class V of the 'rising' tone.

The Precious Rhymes of the Tangraphic Sea (PRTS) manuscript has bits of those sections. Here is the number of tangraphs per class in each section of Mixed Categories (MC) in PRTS:

Tone/Class I II III IV V VI VII VIII IX
'level'/1 3 2 ? 5 6 106 82 22* 74
'rising'/2 1 1 7 2 6 82 72 13 74

The figures (based on Shi et al. 2000: 319-343) are not complete, but what remains leads me to doubt that they could be much higher: e.g., the list of 'level' tone class I tangraphs begins and ends on the same page.

The majority of tangraphs in MC in PRTS are in the class VI, VII, and IX sections largely containing tangraphs for syllables with dz- (VI), j- (VII), and lh- (IX). I don't know why those syllables were not listed in the 'level' or 'rising' tone volumes. dz- and j- are both voiced obstruents, but they do not form a natural class with the voiceless sonorant lh-.
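
The size of that majority can be checked with a short tally of the figures above. I am assuming that the nine figures in each row correspond to classes I through IX in order, and I count the unknown 'level' tone class III figure ("?") as zero, so the total is a lower bound. The variable names are mine.

```python
# Tangraphs per initial class (I..IX) in each section of MC in PRTS.
# None marks the figure given as "?" in the source.
counts = {
    "level":  [3, 2, None, 5, 6, 106, 82, 22, 74],
    "rising": [1, 1, 7, 2, 6, 82, 72, 13, 74],
}

DZ_J_LH = (5, 6, 8)  # zero-based positions of classes VI, VII, and IX

total = sum(n for row in counts.values() for n in row if n is not None)
focus = sum(row[i] for row in counts.values() for i in DZ_J_LH)
share = focus / total

print(f"total known tangraphs: {total}")
print(f"in the class VI, VII, and IX sections: {focus} ({share:.1%})")
```

Note that the class VI, VII, and IX sections also contain some tangraphs with other initials, so the share of dz-, j-, and lh-tangraphs proper is somewhat lower than this figure.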

Conversely, why are scattered non-dz-/j-/lh-tangraphs in MC in PRTS: e.g., why was

5405 2ma1 'the Tangut surname syllable Ma'

the sole tangraph in the 'rising' tone class I section of MC instead of in the 'rising' tone volume of PRTS?

*3.1.0:06: This figure includes the class VIII tangraph

0222 1horn1 'to roar, howl'

which was actually listed toward the end of class VII.


I don't want this blog to become a survey of Tangraphic Sea homophone groups. I want to only look at a few cases that differ enough from each other to warrant posts.

This trio in Tangraphic Sea rhyme 1.2 caught my eye because Arakawa reconstructed different initials (š-, š²-) in his Nishida-style reconstruction while reconstructing a three-way merger in his own reconstruction:

Tangraphic Sea 1.2 homophone group Tangraphic Sea circle Example Homophones A Homophones B Nishida-style reconstruction in Arakawa (1997) Arakawa (1997) Gong This site Tangraphs
8 36A38-36A51 41B42 1šĭu 1shyu 1ɕju 1shu3 A 1
9 36B67 1š²ĭu 1shu3 B 3
12 37A54-37A63 37B75-38A11 1šĭu 1ɕjwu 1shwu3 5

Each homophone group has a distinct fanqie. Their initial spellers are in three nonoverlapping chains including the example tangraphs above:

8. <> (initial transcribed in Chinese as 室 *sh- and Tibetan as sh- and (g)j-; transcribed Sanskrit ś- and possibly c-; Sofronov 1968 II: 22 and Tai 2008: 192)

9. <> (initial transcribed in Tibetan as (b)sh-, gs-, and zh(w)-; transcribed Sanskrit ś-, ṣ-, s-; Tai 2008: 192)

12. <> (initial transcribed in Chinese as 說 *shw- or 姪 *chh-; Sofronov 1968 II: 17)

The external evidence generally points to sh- for all three. Group 7 has initial chh-, so group 12 might have initial shw- by process of elimination.

If Tangut had two kinds of sh-, I would expect them to correspond to Sanskrit palatal ś- and retroflex ṣ-, but both were written with tangraphs whose initial spellers were in the fanqie chain for the initial of homophone group 9.

The fanqie final spellers for 8 and 9 are in the same chain:


group 8 < 1013 < 3003 > group 9

implying that the distinction between the two groups is in their initials. Yet Homophones A regards the two groups as homophones unlike Homophones B.

The placement of circles in the Tangraphic Sea also implies that the two groups are somehow united (they are undoubtedly similar), though circles are also missing between groups that are undeniably very different: e.g., 1.1.7 1nu1 B and 1.1.8 1ku1.

The Tangraphic Sea and both editions of Homophones agree that group 12 is distinct, and the final speller 0622 is in a separate chain from that of groups 8 and 9:


0622 1wu3 <> 2842 1shwu3

I have not found any strong transcription evidence for the -w- that Gong reconstructed for 0622 and its fanqie initial speller

1795 2wi4

However, if 2842 had shw-, then its final speller probably had -w- too.
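
Checking whether two spellers are "in the same chain" amounts to asking whether they are connected once every fanqie link is treated as an undirected edge. Here is a minimal union-find sketch using only the final speller links cited above. The function names are hypothetical, and reading each arrow simply as "linked by fanqie" is a simplifying assumption.

```python
def find(parent, x):
    """Return the representative of x's chain, adding x if it is new."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def build_chains(links):
    """Merge the two ends of every fanqie link into one chain."""
    parent = {}
    for a, b in links:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b
    return parent

def same_chain(parent, a, b):
    return find(parent, a) == find(parent, b)

# Final speller links for rhyme 1.2 as given in this post.
FINAL_SPELLER_LINKS = [
    ("group 8", "1013"), ("1013", "3003"), ("3003", "group 9"),
    ("0622", "2842"),
]

chains = build_chains(FINAL_SPELLER_LINKS)
print(same_chain(chains, "group 8", "group 9"))
print(same_chain(chains, "group 8", "0622"))
```

The same structure would work for initial spellers. With the complete fanqie data loaded, it could list every pair of homophone groups that share both chains and yet are kept apart.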


In my last entry, I found that the 'A' and 'B' homophone groups in Tangraphic Sea rhyme 1.1 had identical rhymes because their fanqie final spellers were in the same chain. That may seem like a tautological statement, but there is no guarantee that two or more rhymes (1.1a, 1.1b ...) were not conflated under a single heading. Arakawa reconstructed two types of 1.1 rhymes, 1-u and 1-u2. (All of the 'A' and 'B' pairs have 1-u in his system.)

Groups 6 (1nu1 A) and 7 (1nu1 B) also had fanqie initial spellers in the same chain, so their fanqie seem to indicate they were homophonous. Why would identical syllables be arbitrarily split into two groups?

What about the fanqie initial spellers of the other two 'A/B' groups?

Group 9 (1khu1 A) has a 'rising' tone fanqie initial speller without any known fanqie:

2782 2khi4 < Chinese 氣 *2khi3 'gas'

If the missing volume of the Tangraphic Sea is ever rediscovered, we could see if 2782 is in the same fanqie chain as 4807, the fanqie initial speller of group 10 (1khu1 B):


4807 1khi4 <> 5399 1khu4

2782 and 4807 are in the same homophone group in Homophones A, implying that they had the same initial, but they are in different homophone groups in Homophones B, implying that they had different initials as well as finals.

I thought group 10 (1khu1 B) might have had secondary kh- from an earlier g-. If 4807 'to lose' was a speller for *g-, it should have cognates with voiced initials. However, Jacques (2014: 81, 94, 250) identified its Japhug cognate as kra 'to drop' with a voiceless initial. (The mismatch in aspiration requires explanation.) On the other hand, I think 4807 is a loan from Chinese 棄 *3khi4 'to discard'. Either external connection points to kh- and not to g-.

Lastly, groups 13 (1tshu1 A) and 14 (1tshu1 B) have initial fanqie spellers in the same chain:

group 13 < 3291 < 4996 > 1319 > group 14

so they had the same initial (tsh-) as well as the same rhyme (1-u1) ... and yet groups 13 and 14 are separate in both the A and B versions of Homophones!

13: Homophones A 33B37, Homophones B 34A47 (isolated characters without homophones at the end of chapter VI)

14: Homophones A 32B15, Homophones B 33A31

I give up ... for now. I doubt this is the last time I'll try to wrestle with this problem.

Tangut fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2014 Amritavision