4859 2to1 'to end'

4859 is a relatively simple character that was supposedly derived from more complex characters, according to Combined Homophones and Tangraphic Sea 6.231:


0117 2705 5712 0737

2thew1 2ber'4 1jwa3 1chhen3

'(first half of 'finally'?) right to.end bottom'

But surely 0117 and 5712 were derived from 4859 rather than the other way around.

0117 is a particularly odd 'source' as


0117.0048 2thew1 2thwu4 'finally' (?)

is a disyllabic word apparently only in dictionaries. Does it belong to the 'ritual language' (which I think was a substratal language)? It looks like a reduplicative form.

1.19: Li Fanwen (2008: 20) even phonetically glossed 0117.0048 in Chinese as 都都 dudu as if it were a perfect reduplication, though it wasn't; there is no doubt that its two syllables belong to different rhyme categories (2.38 and 2.3). If the word is of native origin or was borrowed very early, it could be mechanically derived from *tʰopH.Pɯ.tʰoH:

*-op > -ew1 (but -ew1 also has other sources; see below)

*-H > tone 2-

*Pɯ- > -w-...4

*-o > -u
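The correspondences listed above can be applied mechanically. The following is a toy sketch in a hypothetical helper, `derive_reading`, which is not the author's tool. It assumes three things not stated in the post: aspirated *tʰ is romanized as "th", a syllable without *-H gets tone 1, and the syllables are fed in one at a time. It only checks that the listed rules yield 2thew1 2thwu4; it says nothing about whether the derivation is historically right.

```python
def derive_reading(syllable: str) -> str:
    """Apply the listed correspondences to one pre-Tangut syllable (toy sketch)."""
    s = syllable
    tone, grade, medial = "1", None, ""  # tone 1 by default is an assumption
    if s.endswith("H"):                  # *-H > tone 2
        tone, s = "2", s[:-1]
    if s.startswith("Pɯ."):              # *Pɯ- > medial -w- and Grade IV
        medial, grade, s = "w", "4", s[len("Pɯ."):]
    if s.endswith("op"):                 # *-op > -ew1
        s, grade = s[:-2] + "ew", "1"
    elif s.endswith("o"):                # *-o > -u
        s = s[:-1] + "u"
    if grade is None:
        raise ValueError(f"none of the listed rules sets a grade for {syllable!r}")
    s = s.replace("tʰ", "th")            # assumed romanization of aspiration
    onset = 2 if s.startswith("th") else 1
    # put the medial -w- right after the initial consonant
    return tone + s[:onset] + medial + s[onset:] + grade


print([derive_reading(x) for x in ("tʰopH", "Pɯ.tʰoH")])
```

Run on the two halves of *tʰopH.Pɯ.tʰoH, this should print the readings 2thew1 and 2thwu4.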

Could 2thew1 2thwu4 be a borrowing of something like *tʰop(p)ɯtʰoH? Could a single medial *-p- be the source of both final -w and medial -w-? Could tone 2 have spread from the second syllable? Or was the original medial consonant an aspirated *-pʰ- that was the source of (1) final -w and tone 2 of the first syllable and (2) medial -w- of the second syllable?

One problem with the above scenario is that both halves of 2thew1 2thwu4 are attested apart from each other in the definition for 5712 (Mixed Categories of the Tangraphic Sea 7.133):


5712 3583 4859 5712 0117 5285 ... 0048 5285

1jwa3 1ta4 2to1 1jwa3 2thew1 1ly3 ... 2thwu4 1ly3

'5712 is [as in] 4859 5712 0117 ... is 0048.'

Li Fanwen (2008: 20) translated that definition as 畢者終、竟、畢也...終也 'finish is end, finally, finish, ... is end', interpreting


4859 5712 0117

2to1 1jwa3 2thew1

as separate glosses. However, if that were the author's intent, he could have broken up the three syllables with the phrase-final particle 5285:


5712 3583 4859 5285 5712 5285 0117 5285

1jwa3 1ta4 2to1 1ly3 1jwa3 1ly3 2thew1 1ly3

'5712 is 4859, is 5712, is 0117.' = 畢者終也、竟也、畢也

Although 4859 5712 0117 could be a string of three words (there is no Tangut word for 'and'), I tentatively assume that


4859 5712 0117

2to1 1jwa3 2thew1

is a trisyllabic word ending in a bound morpheme 0117. In any case, 0117 and 0048 are in two separate parts of the definition of 5712 and therefore are probably not borrowed from a single disyllabic word, though it is hypothetically possible for an original disyllabic word to be later reanalyzed as a sequence of two morphemes: cf. Late Old Chinese 獅子 *ʂitsəʔ 'lion' (from a form like Tocharian B ṣecake) later reanalyzed as 'lion' + noun suffix.

I thought 0117 2thew1 might be a reduplication of 4859 2to1 in an X Y X' pattern, but they can't be terribly close in pre-Tangut,


4859 5712 0117

2to1 1jwa3 2thew1 < *tokH PɯNCaC KtopH?

and it would be weird for X' in such a pattern to then combine with an X'' in another word - namely,


0117.0048 2thew1 2thwu4 < *KtopH Pɯ.KtoH?

Could 4859, 0117, and 0048 share a root *to? Here is a list of possible reconstructions for each morpheme:

4859 2to1 < *taŋH, *tokH, *tojH?

*taŋH resembles Old Chinese 終 *tuŋ 'end', but the vowels don't match.

0117 2thew1 < *tʰopH, *Cʌ.tʰukH, *Cʌ.tʰikH

or *KtopH, *Kʌ.tukH, *Kʌ.tikH?

The *top-like reconstructions resemble Proto-Kuki-Chin *toop 'end', but it's unlikely a o

I reconstruct lower-vowel presyllables *Cʌ- and *Kʌ- to condition Grade I in the higher-vowel rhymes *-ukH and *-ikH; without such presyllables, those rhymes would have retained vowel height and developed into Grade IV -iw rather than Grade I -ew.

If aspiration is not original, then it is from *K-.

0048 2thwu4 < *Pɯ.tʰoH, *PtʰəH, *Pɯ.KtoH, *PKtəH, *Kɯ.PtoH, *KPtəH?

I assume medial -w- is always secondary from *P-, but I could be wrong.

I reconstruct higher-vowel presyllables to condition Grade IV in the lower-vowel rhyme *-oH.

If aspiration is secondary, then it is from *K-, and the order of this *K- relative to *P- is uncertain.

Out of all the above possibilities, I could pick a set sharing *to as a common denominator and then regard the other elements as affixes:

4859 2to1 < *to-k-H, *to-j-H?

0117 2thew1 < *K-to-p-H?

0048 2thwu4 < *Pɯ-K-to-H, *Kɯ-P-to-H?

But what would those affixes mean? And are there any other alternations of the type -u ~ -ew justifying the reconstruction of an earlier alternation *-Ø ~ *-p?

Putting diachrony aside, the synchronic meanings of 0117 and 0048 are uncertain:

Li Fanwen number
Clauson 2016
Nishida 1966
Grinstead 1972
Kychanov and Arakawa 2006
Li Fanwen 2008
遇い終わる 'to finish meeting'
заканчивать, завершать
finish, end
約束, 完結, 終
completely, finally
完 (adv.)
會見を終わる 'to end a meeting'?
約束, 終
at last, in the end
終 (adv.)

(no polysyllabic words)
заканчивать, завершать finish, end
約束, 完結, 終了, 做完
完畢, 終畢

(1.29.1:27: Filled in Nishida column. 0117 does not have its own entry in Nishida 1966, but its meaning is given in the entry for 0048.)

I think the definitions in modern (i.e., post-Clauson) dictionaries are speculative. They are not entirely groundless: the fact that 0117, 0048, and 0117-0048 appear in definitions for 'end'-words indicates that they mean something like 'end'. But 'something like' is not the same thing as certainty that they are verbs (according to Kychanov and Arakawa) or adverbs (according to Li Fanwen). It is, however, more than a simple question mark indicating we have no idea what something means.

1.24.16:01: A future Tangut dictionary could distinguish between three categories of words:

1. words whose meanings can be confirmed from context

2. words whose general semantic domain can be determined from dictionary entries

3. words whose meanings are unknown

0117, 0048, and 0117-0048 fall into the middle category. Strictly speaking, 0117 may not even be a word; it may be a bound morpheme.

A distinction between bound and free morphemes would also be a useful feature of a future Tangut dictionary. Current dictionaries are character-based, and all characters are given definitions, even though not all characters represent free morphemes. Users unfamiliar with Tangut cannot easily determine whether a given nontranscription character represents a word (i.e., a free morpheme) or only part of a word. (Transcription characters are indicated as such and by definition represent sounds, not words.)

To come: Is 0099 another member of the 'final' family?

WHAT'S SO MATERNAL ABOUT BROTHERS?

Could character structure elucidate the meaning of


0012.5873 1bu3.2kuq1

which Li Fanwen (2008: 3, 926) defined as 'brothers'?

The first character 0012 has this analysis in Tangraphic Sea 1.7.131:


0092 2750 5415 1602

1ma4 1ghu2 1bu3 2ngorn1

'mother head <bu> all' = top of 'mother' plus all of the homophonous phonetic <bu>

(I use < > to indicate that <bu> is a transliteration - only a loose phonetic approximation and not IPA.)

Of course brothers are born from mothers. But so are sisters. Why not abbreviate 'man' or, better yet, one of these more common characters to create 0012?


2447 0605

2lo3 2toq4


Could 1bu3.2kuq1 have referred to brothers sharing a mother?

1.18.19:33: But if that was the case, why isn't the top of 'mother' also in 5873? Aren't disyllabic words written with characters sharing the same component common in both Chinese and Tangut?

Combined Homophones and Tangraphic Sea A 7.203 analyzed 5873 as


5876 3936 5307 2705

2kuq1 1pha1 1ghwi2 2ber'4

'<ku> left power right' = left of the phonetic <ku> + right of 'power'

5876 2kuq1 means 'to tie', so its meaning may also be relevant. Could 0012.5873 be interpreted as


'mother' + <bu> + <ku>/'tie' + 'power'

i.e., powerful (people) with maternal ties called <bu.ku>? Why 'powerful'? Could the right-hand component just signify 'person': 'people with maternal ties called <bu.ku>'? That component appears in three of the characters for m-'people' words. However, I assumed it was self-promoting in the autonyms

𗼇 ~ 𗼎𗾧

2344 2mi4 ~ 3752 3296 2my4 2na4 'Tangut'

since it means 聖 'sage' by itself and corresponds to Sanskrit ārya 'noble' (Clauson 2016: 339). It doesn't seem to have such a function in the character


for the presumably neutral word 0607 1myr4 'people, clan'. Maybe the common denominator of 5873, 2344, 3296, and 0607 is 'kinsman', which would explain why the component



without implications of kinship was not used in 5873.

That component does, however, appear in


2447 2lo3 'elder brother'

but not


0605 2toq4 'younger brother'

which has a different component


of unknown origin and function.

TOUCHY-FEELY TOOL HARMONY

Tonight I discovered the spelling 摸摸具和 'touch touch tool harmony' for Japanese momonga 'flying squirrel' in Wikipedia. To modern Japanese eyes it looks as if it should be read momoguwa, and it turns out that in the Edo period it was read as something like momongwa. Why not spell momongwa as <> with a <mon>-graph? Was 具 still read with a prenasalized stop [ŋg] when the spelling 摸摸具和 was devised? Offhand I can't think of other cases of unwritten -n-. Or of CwV-syllables spelled as CV.CV. The two-character spelling 具和 <gu.wa> for gwa indicates that gwa in loans from Chinese (see a list here) had become ga and therefore gwa-characters (瓦畫) were no longer suitable for transcribing gwa.

1.18.12:51: According to Wikipedia, the word is first attested as momi in the Heian period; momo came later, and momongwa is from the Edo period. If -n- is short for genitive *-no-, then what is -gwa? Could it be an irregular reduction of something like *kupa? Could that reduction postdate the simplification of Sino-Japanese gwa to ga?

[Table: readings for 瓦畫 and for 'flying squirrel' at Stages 1-3]

I don't know what the *kupa in *mono-no-kupa would be, but I doubt it's 鍬 kuwa < *kupa 'hoe' or 桑 kuwa < 具波 *kupa 'mulberry'. (Wish I had 上代仮名遣辞典 A Dictionary of Old Japanese Kana Usage by 五十嵐仁一 Igarashi Jin'ichi on hand to quickly find the Old Japanese phonogram spellings - if any - of those words and remove the asterisks. I did find the combining form 具波 gupa for 'mulberry' in Man'yōshū 3350.)

I wouldn't normally expect the syllable gwa or kwa in a native Japanese word, though such syllables are not impossible in native Japonic words: e.g., Okinawan kwain < *kura- 'eat', cognate to Japanese kurau.

CLAUSON 2016: THE FRATERNAL TEST

Two days ago, I got my copy of Sir Gerard Clauson's skeleton dictionary of Tangut over twenty years after I had first read about it in Analysis of the Tangut Script.

One of the many things I like about Clauson's dictionary is that it is free of the speculative definitions found in later dictionaries. For instance, in my last entry, I used Li Fanwen's (2008: 3, 926) definition of 'brothers' for


0012.5873 1bu3.2kuq1

That definition is presumably based solely on the Tangraphic Sea definitions of 0012 and 5873:

Tangraphic Sea 1.7.131:


0012 3583 0012.5873 5285

1bu3 1ta4 1bu3.2kuq1 1ly3

'0012 TOP 0012.5873 AFF' = '0012 is [as in] 0012.5873'


2447 0605 5285

2lo3 2toq4 1ly3

'elder.brother younger.brother AFF' = '[It means] elder [and] younger brother'

4739 0213 0635.1424 1139 1279

1tsewr1 1ne4 1ny4.1thu4 1e4 1y4

'joint near relative GEN COMP' = '[It is what] closely related relatives [are] called'

Combined Homophones and Tangraphic Sea A 7.203:


2447 0605 4739 0213 0635.1424 1139 1279

2lo3 2toq4 1tsewr1 1ne4 1ny4.1thu4 1e4 1y4

'elder.brother younger.brother joint near relative GEN COMP' = '[5873 is what] elder [and] younger brothers [and] closely related relatives [are] called'

The word 0012.5873 is apparently not attested outside the entries for its characters in dictionaries.

Last night I looked up both 0012 and 5873 in Clauson, and as I had hoped, he glossed both as '?' in entries 1069 and 3120. The question marks most likely reflect Clauson's lack of access to the Tangraphic Sea, but I think they are still appropriate to some degree today because there is no guarantee that the components of Tangraphic Sea entries are precise synonyms: e.g., 'elder and younger brothers' is certainly not the same thing as 'closely related relatives'. So could 0012.5873 have been 'sibling'?

1.14.11:14: I don't think 0012.5873 was 'sibling' because I would expect 'sibling' to appear in definitions for sororal words. Perhaps 'closely related relatives' is needed to specify biological brothers as opposed to brothers in a broader, nonbiological sense: e.g., males of the same age. Were Tangut 2lo3 'elder brother' and 2toq4 'younger brother' used as nonbiological terms of address like Burmese ကို ko 'elder brother' and မောင် maũ 'younger brother'?

Unfortunately the Tangraphic Sea definitions for 2lo3 'elder brother' and 2toq4 'younger brother' have been lost. I would not expect 0012.5873 to appear in them since I think 0012.5873 was a subset (biological) of 2lo3-2toq4 'brothers' in a broader sense.

ANTHROPOGENESIS IN TANGUT

One of the first Tangut words - and characters - that I learned over twenty years ago was


2541 2dzwo4 'person'

which doesn't belong to the *m-'people' word family from my last post.

I never knew its etymology until I saw the Loloish words for 'person' in Burling (1967: 89):

Lisu tshō, Lahu chɔ̄, Akha tsɔ́hà

which are from Proto-Lolo-Burmese *tsaŋ.

Tangut -o is partly from *-a, so 2dzwo4 could be from *Pɯ-N-tsaŋH or *Nɯ-P-tsaŋH with

- *P- to condition medial -w-

- *-ɯ- to condition Grade IV

- *-N- to condition voicing of *-ts-

- *-H to condition tone 2 (the 'rising tone' - or was it really a phonation?)

According to STEDT, this word is also found in Central Naga and Bai (see forms here), so it is not an innovation of Burmo-Qiangic (Jacques' [2014: 2] proposed Sino-Tibetan subgroup containing both Lolo-Burmese and Tangut [as part of Qiangic]).

1.10: I'm glad I didn't post this right away because I realized my proposal has a problem.

Jacques' (2014: 206) pre-Tangut *-jaŋ (= my *Cɯ- ... -aŋ) became Gong's Tangut -jij (= my -e3/4), not Gong's Tangut -jo (= my -o3/4). (The initial determines whether the rhyme has Grade III or IV.)

Therefore I would expect *-jwaŋ (= my *Pɯ- ... -aŋ or *Cɯ-P- ... -aŋ) to become Gong's Tangut -jwij (= my -we3/4), not Gong's Tangut -jwo (= my -wo3/4).

2dzwo4 ends in -wo4, not -we4, so it cannot be from *Pɯ-N-tsaŋH or *Nɯ-P-tsaŋH. Or can it?

I can't find any examples of *-jwaŋ (= my *Pɯ- ... -aŋ or *Cɯ-P- ... -aŋ) in Jacques (2014). I propose that such a sequence became -wo3/4:

*Pɯ-Caŋ > *Pɯ-Cɨaŋ > *P-Cɨaŋ > *Cwɨaŋ > *Cwo3/4 and/or

*Cɯ-P-Caŋ > *Cɯ-Cwaŋ > *Cɯ-Cwɨaŋ > *Cwɨaŋ > *Cwo3/4

The medial *-w- 'encouraged' the following vowel to retain its labiality, whereas labiality was lost without *-w-:

*Cɯ-Caŋ > *Cɯ-Cɨaŋ > *Cɨaŋ > *Ciaŋ > *Cö > *Ce3/4

The *Cɯ- above is not *Pɯ-, which would have conditioned -w-.
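The three chains above can be replayed as an ordered cascade of rewrite rules. The sketch below always applies the first rule whose pattern matches, then restarts from the top of the list. With this rule order it reproduces each proposed chain stage by stage.

The following are my assumptions rather than anything stated in the post:

- the rule names;
- the regex encoding;
- placeholder letters "P" (labial) and "C" (any consonant);
- the omission of tone and grade numbers.

The sketch shows only that the proposal is internally consistent; it is not evidence for the sound change.

```python
import re

# Hypothetical ordered rules: (name, regex, replacement). "P" = labial,
# "C" = any consonant, following the notation of the chains above.
RULES = [
    ("metathesis *P-C > *Cw", r"P-C", "Cw"),
    ("breaking after a high-vowel presyllable", r"^([A-Z]ɯ-.*[^ɨ])aŋ$", r"\1ɨaŋ"),
    ("*Pɯ- > *P-", r"^Pɯ-", "P-"),
    ("loss of *Cɯ-", r"^Cɯ-", ""),
    ("medial *-w- keeps labiality: *wɨaŋ > *wo", r"wɨaŋ$", "wo"),
    ("*ɨaŋ > *iaŋ", r"ɨaŋ$", "iaŋ"),
    ("*iaŋ > *ö", r"iaŋ$", "ö"),
    ("*ö > *e", r"ö$", "e"),
]


def run_cascade(form: str, max_steps: int = 20) -> list[str]:
    """Return all stages: apply the first matching rule, restart, stop when none applies."""
    stages = [form]
    for _ in range(max_steps):
        for _name, pattern, replacement in RULES:
            new = re.sub(pattern, replacement, stages[-1])
            if new != stages[-1]:
                stages.append(new)
                break
        else:
            return stages  # no rule matched: derivation finished
    raise RuntimeError("derivation did not terminate")


for start in ("Pɯ-Caŋ", "Cɯ-P-Caŋ", "Cɯ-Caŋ"):
    print(" > ".join("*" + s for s in run_cascade(start)))
```

Because the breaking rule refuses to apply when *ɨ already precedes *-aŋ, the restart loop cannot reapply it forever. The `max_steps` guard is only a safety net.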

Unfortunately I do not know of any Chinese loanword evidence for my proposed sound change. Middle Chinese *-waŋ3 corresponds to Sino-Tangut -on1 rather than -wo3 in the one case known to me (Gong 2002: 424):

旺 MC *3waŋ3 : ST 𗼤 2340 1von1 'prosperous'

I suspect the word was *3won3 with a nasalized vowel -on in Tangut period northwestern Chinese (TPNWC), and that this form was borrowed into Tangut with -on1, a nasalized vowel rhyme that originated from something like *-om, a merger of *Cʌ- ... -um, *-am, *-em, and *-om (but not rhymes ending in the velar nasal *-ŋ!). If my proposal is correct, an earlier borrowing of 旺 might have had -wo rather than -on in Tangut. Here is a possible relative chronology:

Stage        1           2
Tangut       *-waŋ3/4    -wo3/4
             *-om1       -on1
TPNWC        *-waŋ3      -won3

At stage 1, TPNWC *3waŋ3 is a better match for Tangut *vaŋ3 (I write initial *w- as v-) than Tangut *vom1. But at stage 2, TPNWC *3won3 is a better match for Tangut 1von1 (which was how *3won3 was actually borrowed) than Tangut *vo3.

Did the sound change *-aŋ > -o spread from Chinese to Tangut? Japhug underwent the same change (Jacques 2004: 143) even though it was not in contact with Chinese until recently and its ancestor separated from that of Tangut long ago. A case of drift? Or just coincidence? The fusion of au into o is common (e.g., Sanskrit*), though the shift > > *u that would precede it isn't.

Lastly, on Monday morning in the rGyalrongic Languages Database I found some forms for 'person' that have labial + affricate initials like my pre-Tangut *Pɯ-N-tsaŋH: e.g.,

mDaH mdo βdzi

Tag gsum vdzi̤

At first I thought -i might be an unusual reflex of *-waŋ. However, I suspect that -i is from a rhyme with a lost *-t given Ri ṣe wdzit̚.

Forms like Nye dgaH brgya gcig vdzɨmi look like redundant compounds of the Pdz-word for 'person' with the m-'people' word from my last post.

I was initially hopeful that a third type of 'person' word in the database might be related to Tangut 2dzwo4 < *Pɯ-N-tsaŋH:

Rong wam kə' mcu

Wobzi vɟú

Hbrong rzong βɟuʔ

But now I think their palatal stops are hardened from what might be a *-j- still more or less present in

Khog po kə' mbju

Tsho bdun A ke' ᵐbo

Tsho bdun B kə³³ rəᴺ⁴⁴ bjo⁵⁴

Khang sar kə' rbju

rDzong Hbur kə' rmbju

Those words are reminiscent of Gong's 1bjuu = my 1bu3, the first half of


0012.5873 1bu3.2kuq1 'brothers' < *NPə.SkoH or *NɯPo.SkoH**?

a word only known from dictionaries. But I do not know of any examples of 1bu3 standing by itself, so I don't think there is any connection.

Go la thang nya lo ta' ʁap is a fourth type of rGyalrongic word for 'person' without any known Tangut cognate.

*Sanskrit au is from *āu. There was a chain shift: *āu > au > o.

**1.12.6:12: These reconstructions assume that the word is native or at least was borrowed before the sound changes that occurred between pre-Tangut and Tangut.

I suspect the word is from a non-Sino-Tibetan substratum that is the source of other unanalyzable disyllabic words in Tangut. Could it have simply meant 'brother' without any age distinction?

In Old Chinese, there was a strong tendency for both halves of disyllabic noncompound words to be of the same syllable type: AA or BB rather than AB or BA.

A-type syllables had low presyllabic vowels (*ʌ) or lower main vowels (*e *a *o) and developed Grades I or II in Middle Chinese.

B-type syllables had high presyllabic vowels (*ɯ) or higher main vowels (*i *ə *u) and developed Grades III or IV in Middle Chinese.

Tangut and Chinese seem to have undergone similar (though not identical) developments. I believe both languages underwent syllable-internal harmonization: i.e., the height of the main vowel harmonized with the height of the presyllable (if any). The presyllables were then lost, the harmonized vowels became phonemic, and the two languages developed a four-grade distinction.

Chinese disyllabic noncompound words usually had height harmony. I have never looked into whether Tangut disyllabic noncompound words also usually had height harmony. Tangut 1bu3.2kuq1 lacks height harmony; it combines a Grade III (type B) syllable with a Grade I (type A) syllable. If height harmony was the norm in Tangut between as well as within syllables, then 1bu3.2kuq1 was either a loanword from a language that lacked height harmony*** or a compound 1bu3-2kuq1****. (I use hyphens to indicate morphological boundaries and periods to indicate linked syllables without any certain morphological relationship between them.) I favor the former, as I have not found words like 1bu3 or 2kuq1 with meanings I would expect for the halves of 'brothers'. I also have not found a source for 1bu3.2kuq1. I suspect Tangut may be our only source of information on its substratum: i.e., we will never find external confirmation for a word like buku.
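The height-harmony test above can be made explicit. This is a minimal sketch using hypothetical helper names. It assumes that a syllable in this blog's romanization ends in its grade digit (e.g., 1bu3), and it maps Grades I-II to type A and III-IV to type B, as in the classification above.

```python
import re


def syllable_type(syllable: str) -> str:
    """Type A for Grades I-II, type B for Grades III-IV (final digit = grade)."""
    match = re.search(r"([1-4])$", syllable)
    if match is None:
        raise ValueError(f"no grade digit in {syllable!r}")
    return "A" if int(match.group(1)) <= 2 else "B"


def has_height_harmony(word: str) -> bool:
    """True if all syllables of a dotted or space-separated word share one type."""
    syllables = re.split(r"[.\s]+", word.strip())
    return len({syllable_type(s) for s in syllables}) == 1


print(has_height_harmony("1bu3.2kuq1"))  # Grade III + Grade I
print(has_height_harmony("2my4 2na'4"))  # Grade IV + Grade IV
```

The first call should print False and the second True. The check only restates the grade comparison; it cannot decide between the loanword and compound explanations.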

***Cf. Turkish kitap 'book' from Arabic kitāb. Kitap violates Turkish palatal vowel harmony because it contains a front vowel i followed by a nonfront vowel a.

****Cf. Finnish seinäkello 'wall clock', a compound word without palatal harmony across its halves: seinä 'wall' has a front vowel ä whereas kello 'clock' has a back vowel o. (The vowels e and i are neutral.)

THE TANGUT *M-'PEOPLE' WORD FAMILY

For a long time I have assumed that


2344 2mi4 'Tangut' (see my last post)

and the first syllable of


3752 2my4 2na'4 < *-k 'Tangut' (borrowed into Tibetan as mi-nyag)

were from *mi, cognate to Tibetan mi 'person', and that 2na'4 was from *Cɯ-nak-XH, cognate to Tibetan nag 'black' and almost homophonous with

𗰞 0176 1na4 'black'.

In short, I thought that the Tangut called themselves the '(Black) People'. I thought that 2my4 was phonetically something like [mjə], an unstressed, reduced form of the independent monosyllabic form 2mi4.

Although I still think 2my4 had some sort of nonlow nonpalatal vowel, last night I realized that 2mi4 could not go back to *mi because *i backed to Tangut -y. Tonight I think 2mi4 is from *Cɯ-meH with a mid vowel like the main vowel of Japhug tɯr-me 'person'. Cf.


4469 2shi3 < *Cɯ-sheH 'to go' : Japhug ɕe 'id.'

The final *-H symbolizes the glottal source of the second 'tone' 2- (which may have been phonation rather than a pitch). The presyllable could not have ended with *-r like Japhug tɯr- at the time Tangut developed retroflex vowels because *Cɯr-meH would have become Tangut *2mir4 with a retroflex vowel -ir, not 2mi4 with a nonretroflex vowel -i. Jacques (2014: 24) identified


3818 2mer4 < *Cɯr-mejH 'person; nominalizer'

as a cognate of Japhug tɯr-me with the expected retroflex vowel. (But what is the *-j needed to block the raising of *-e to -i? A suffix? Is *-mejH from *-meH + *-j?)

The alternation of 4469 2shi3 with


4481 1shy3 < *Cɯ-sheH 'to go' : Japhug ɕe 'id.'

is reminiscent of the -i ~ -y alternation of

𗼇 ~ 𗼎𗾧

2344 2mi4 ~ 3752 3296 2my4 2na4 'Tangut'

though the former is not part of a disyllabic word.

A monosyllabic member of the m-'people' word family with -y is


4574 1my4 < *mi 'other person'

which Jacques (2014: 145) also identified as a cognate of Japhug tɯr-me. Could this be the true direct cognate of Tibetan mi?

Another such member is


0607 1myr4 < *r-mi 'people, tribe'

I am now inclined to think there was a *-e ~ *-i alternation in the pre-Tangut *m-'people' word family:

*-e-words:
𗼇 2344 2mi4 < *Cɯ-me-H 'Tangut'
𗇋 3818 2mer4 < *Cɯ-r-me-j-H 'person; nominalizer'

*-i-words:
𘈑 0607 1myr4 < *r-mi 'people, tribe'
3752 2my4 < *mi-H- (first syllable of 'Tangut')
𘉑 4574 1my4 < *mi 'other person'

Did that alternation originate as a distinction between, say, a schwa-grade *-əj and a zero-grade *-i? The Sanskrit alternation between guṇa-grade nara- and zero-grade nṛ-, both 'man', comes to mind.

Next: What is the etymology of the most common word for 'person' in Tangut?

INSTALLATION-FREE TANGUT

After my initial post using the Tangut Yinchuan font, I was worried about how to tell readers they'd need that font to see Tangut text in subsequent posts. Thanks to Andrew West and David Boxenhorn, you may now be able to see


2344 4797 2403 2mi4 1wyr4 2di4 'Tangut script'

on this blog without installing a font on any device. I've been using images for the past decade to be able to read Tangut on my phone, but now I only need them for Khitan and Jurchen until they're added to Unicode. Unfortunately I can't view the characters online in Chrome even though they're visible in my local copy in Chrome. But they are visible online in Firefox and on my iPhone. I don't know about visibility elsewhere. I don't have time to fix this issue right now. Maybe next week.

MIYAKO IN TANGUT

Japanese names are Sinified by reading their Chinese characters (if any) in a Chinese language: e.g., 宮古 Miyako is Mandarin Gonggu, Cantonese Gunggu, etc.

How would Japanese names have been Tangutized? The Tangut only knew of Japan through Chinese written records, so they wouldn't have known how Japanese names were pronounced in Japanese, much less other Japonic languages like Miyako. Thus the Tangut would have phonetically transcribed the Tangut period northwestern Chinese readings for the characters of Japanese names: e.g., 宮古 TPNWC *1kun3 2ku1 would have been Tangutized as


1306 1034 1kon4 1kwo1

using the transcriptions of 宮 and 古 in the Forest of Categories.

Tangut had no rhyme -un3 and generally did not permit Grade III rhymes after velars, so -on4 was the best available match for TPNWC *-un3.

Although Tangut had a rhyme -u1 whose romanization on this site happens to match TPNWC *-u1, my notation is not IPA, and perhaps Tangut -wo was an attempt to approximate a TPNWC final like [ʊ], a vowel partway between [u], the vocalic counterpart of the glide w, and mid o.
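Phonetic Tangutization of this kind amounts to a lookup from each Chinese character to the Tangut character used to transcribe it. Below is a minimal sketch; the table and function names are hypothetical. The table holds only the two Forest of Categories transcriptions cited above. A real table would need far more entries, and any character outside it simply fails.

```python
# Hypothetical lookup: Chinese character -> (Li Fanwen number, Tangut reading),
# limited to the two transcriptions cited in this post.
TRANSCRIPTIONS = {
    "宮": ("1306", "1kon4"),  # TPNWC *1kun3
    "古": ("1034", "1kwo1"),  # TPNWC *2ku1
}


def tangutize(name: str) -> tuple[list[str], list[str]]:
    """Return the Li Fanwen numbers and readings for each character of `name`."""
    numbers, readings = [], []
    for char in name:
        if char not in TRANSCRIPTIONS:
            raise KeyError(f"no transcription character recorded for {char}")
        number, reading = TRANSCRIPTIONS[char]
        numbers.append(number)
        readings.append(reading)
    return numbers, readings


print(tangutize("宮古"))
```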

There are other hypothetical and less likely approaches to Tangutizing Miyako.

One is to translate the Chinese characters 宮古 'palace ancient' into their Tangut equivalents: e.g.,


1623 0429 2vaq1 2nwo4 'palace ancient' (which happens to have Tangut noun-adjective order!).

(Li Fanwen's Chinese-Tangut index has a typo; it lists 0428 as the equivalent of 古 'ancient'.)

Another - the least likely of all - is to phonetically transcribe the Japanese name:


5026 5314 2946 1mi4 2a4 1ko1

I have used the Tangut characters for transcribing the Sanskrit syllables mi, ya, and ko. 5314 was probably phonetically something like [ja].

But how would the Tangut have known that 宮古 was read Miyako?

All of the above assumes the spelling 宮古 existed during the heyday of the Tangut. But I don't know how old it is.

JAROSZ ON NEVSKY ON MIYAKO

I was planning to write a follow-up to this post using the Tangut Yinchuan font. But I ran out of time, so I'll merely link to Aleksandra Jarosz' 2015 PhD dissertation Nikolay Nevskiy's Miyakoan Dictionary reconstruction from the manuscript and its ethnolinguistic analysis: Studies on the manuscript (via Bitxəšï-史). Although it is obviously about 宮古 Miyako, its profile of Nevsky is still of interest to Tangutologists. It is no wonder that he "succeeded in deciphering the highly complicated, Chinese-character-inspired and by then largely unintelligible script of the medieval Xixia kingdom, the homeland of Tangut speakers" given that he

... was a very prolific and dedicated scholar, remembered by his colleagues and informants alike as one truly open-minded and able to grasp the cultures and languages of the subjects of his study almost intuitively. He was also a brilliant multilingual speaker, reportedly having mastered as many as sixteen Asiatic languages (apart from Japanese including Tibetan, Mongolian, Manchu, Pali, Korean and Giliak), as well as English, German, French and Latin (Kanna 2008:167). He acquired his first Orient languages as early as in the times of his Rybinsk gymnasium (post-1900), when he learned Tatar from a local family of native speakers, as well as mastered Arabic alphabet through self-study (Katō 2011:18). (p. 19)

And the Tangutologist and Khitanologist Viacheslav Zaytsev appears on page 8 and in the acknowledgements!

Next: How to write 'Miyako' in Tangut.

TANGUT AVIAN ANATOMY

About twenty-five years ago I learned the following method to convert base-10 numerals from 1 to 60 into their Chinese sexagenary equivalents. The coming Chinese new year is the 34th in the 60-year cycle. The first character of the sexagenary term is the Heavenly Stem numbered by the last digit (a final 0 stands for the tenth stem): i.e., 丁 'fourth Heavenly Stem'. The second character is the Earthly Branch for X, a number between 1 and 12:

(X + (Y * 12)) = 34

X turns out to be 10 (and Y is 2), so the second character of the sexagenary term is 酉 'tenth Earthly Branch': i.e., 'rooster'.
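The method above reduces to two remainders: the position modulo 10 picks the stem, and the position modulo 12 picks the branch, with a remainder of 0 standing for the last item. Here is a minimal sketch; the function name is my own.

```python
STEMS = "甲乙丙丁戊己庚辛壬癸"        # the ten Heavenly Stems, in order
BRANCHES = "子丑寅卯辰巳午未申酉戌亥"  # the twelve Earthly Branches, in order


def sexagenary(n: int) -> str:
    """Convert a position 1-60 in the cycle into its stem-branch term."""
    if not 1 <= n <= 60:
        raise ValueError("position must be between 1 and 60")
    stem = n % 10 or 10      # last digit; 0 means the tenth stem
    branch = n % 12 or 12    # X in X + 12Y = n, with 1 <= X <= 12
    return STEMS[stem - 1] + BRANCHES[branch - 1]


print(sexagenary(34))  # 丁酉
```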

The Tangut adopted the sexagenary cycle and somehow found Tangut equivalents for the Heavenly Stems. Last time I wrote about


0410 1vi1 'fourth Heavenly Stem'.

The reasoning behind choosing 1vi1 eludes me. (I'm assuming the Tangut terms were repurposed existing words rather than just made up.)

On the other hand, the logic behind the Tangut equivalents of the Earthly Branches is transparent: e.g., 酉 'rooster' was simply translated as


2262 1jwon3 'bird'.

(The Japanese did the same thing; they read 酉 as the native word tori 'bird'.)

2262 has three components:


Andrew West has written about the first at length here; it appears in other characters for words for birds (see below) but also has other associations.

The first and second components


appear in the second entry in the Tangraphic Sea:

𘀑 = 𘀏 + 𘤊 (< 𗿼)

3911 1pu1 'a kind of bird' = left and top of 3909 1pu1 'the name Pu' (phonetic) + left of 2262 1jwon3 'bird'.

3911 in turn is part of the analysis of 3909, the first entry in the Tangraphic Sea:

𘀏 = 𘀑 + 𘦑 (< 𗩝)

3909 1pu1 'the name Pu' = left of 3911 1pu1 'a kind of bird' (phonetic) + right of 2653 1penq 'horn'

You can see those two characters in context at Andrew West's site.

𘤏 might also be semantic for 'bird' in 3911 and even 3909 if the Pu were associated with birds and horns.

The third component, 𘪣, means 'bird', but like most Tangut semantic components (and unlike its possible inspiration, Chinese 鳥 'bird'), it cannot stand by itself. I don't know why some elements can be independent and others can't. In the case of 2262, the elements appended to 𘪣 'bird' are phonetic:

𗿼 = 𘤊𘤏 (< 𗿤) + 𘪣 (< 𘝋)

2262 1jwon3 'bird' = left and center of 2260 1jwon3 'breeding' + left of 1242 2dzwy4 'wing' (with a slight modification of the top element's bottom right corner)

2262 1jwon3 'bird' sounds like 2260 1jwon3 'breeding' and has 1242 2dzwy4 'wings'.

2260 has a circular derivation:

𗿤 = 𘤊𘤏 (< 𗿼) + 𘣑 (< 𘟢)

2260 1jwon3 'breeding' = left and center of 2262 1jwon3 'bird' (phonetic) + right of 0373 2vi1 'to copulate, mate' (semantic)

So does 1242:

𘝋 = 𘪢 (< 𘝁) + 𗟎

1242 2dzwy4 'wing' = left of 0673 2thy1 'wing' (semantic) + bottom right of 4289 2dzwy2 'winding corridor' (phonetic; is that component [Boxenhorn code: caigie] in Unicode?)

4289 is obviously from 1242 as a phonetic plus the semantic component 𘡩 'wood' (a corridor can be a wooden structure). The Tangraphic Sea confirms my guess:

𗟎 = 𘡩 (< 𗞵) + 𘝋

4289 2dzwy2 'winding corridor' = top of 4364 1rur4 'wooden framework' (semantic) + all of 1242 2dzwy4 'wing' (phonetic)

(The Boxenhorn code for the bottom right of 4289 is tok, not caigie, but the two components look alike to me.)

𗟎 4289 must postdate the less complex 𘝋 1242 'wing'. But does 𘝋 1242 'wing' postdate 𗿼 2262 'bird'? And what is the function of the right side of 1242 (stroke code EACCQBE not in N4636), which is unique to that character? If it is derived from two other characters, why weren't those characters mentioned in the Precious Rhymes of the Tangraphic Sea (the corresponding volume of the Tangraphic Sea has been lost)?

MY FIRST POST IN TANGUT YINCHUAN

I just used Andrew West's file to convert Li Fanwen numbers for Tangut characters into Unicode for the first time to type the Tangraphic Sea analysis of 𗸃 1vi1 'fourth Heavenly Stem', the first half of the sexagenary term for the Tangut year beginning on 28 January 2017:

𗸃 = 𘣟 (< 𗷰) + 𘧦 (< 𘔁)

0410 1vi1 'fourth Heavenly Stem' = left of 0613 2t-? 'to refuse, remove' + 'fire', left of 4661 1bi4 'third Heavenly Stem'

If you can't see the characters, please install Prof. 景永时 Jing Yongshi's free Tangut font at BabelStone.
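Converting Li Fanwen numbers into Unicode is essentially a table substitution. This is a minimal sketch with a hypothetical function name. The three pairs come from the analysis of 0410 above. The format of Andrew West's actual conversion file is not described here, so the table is written inline rather than parsed from that file. Numbers missing from the table are left untouched.

```python
import re

# Hypothetical partial table: Li Fanwen number -> Tangut character,
# using only pairs given in this post.
LF_TO_CHAR = {
    "0410": "𗸃",
    "0613": "𗷰",
    "4661": "𘔁",
}


def lf_to_unicode(text: str) -> str:
    """Replace each known four-digit Li Fanwen number with its character."""
    return re.sub(r"\b\d{4}\b",
                  lambda m: LF_TO_CHAR.get(m.group(0), m.group(0)),
                  text)


print(lf_to_unicode("0410 = 0613 + 4661"))  # 𗸃 = 𗷰 + 𘔁
```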

It's not surprising that 0410 shares 𘧦 'fire' with 4661 since both the third and fourth Heavenly Stems are associated with fire and were hence called 'red' in Khitan and Jurchen (and ᡶᡠᠯᡤᡳᠶᠠᠨ fulgiyan 'red' and ᡶᡠᠯᠠᡥᡡᠨ fulahūn 'reddish' in Manchu).

But why was 𘧦 'fire' combined with 𘣟 from 0613 2t-? 'to refuse, remove', which is neither (nearly) homophonous with 1vi1 'fourth Heavenly Stem' nor obviously semantically relevant to it? 𘣟 is not among the character components that Nishida (1966) was able to gloss.

𗷰 0613 2t-? 'to refuse, remove' is listed in the Tangraphic Sea as a component in at least two more characters:

𗅯 = 𘠐 (< 𗅉) + 𘣟 (< 𗷰)

2377 1ky4 'to prohibit' = 'not', left of 1906 1non2 (conjunction) (semantic) + left of 0613 2t-? 'to refuse, remove' (semantic)

𘒐 = 𘧉 (< 𘒖) + 𘣟 (< 𗷰)

1462 1lo'1 'cooperation' = 1535 1lo'1 'to gather, assemble' (semantic/phonetic) + all of 0613 2t-? 'to refuse, remove'

There may have been other derivatives of 0613 in the lost 'rising tone' volume of the Tangraphic Sea.

I can understand why 'to refuse' is in 'to prohibit', but what's it doing in 'cooperation'?

And I might expect 'not' + 'to refuse' to represent a word for 'not refuse', but I presume the character is like a double negative.

Lastly, I have no idea what the etymologies for 𘔁 1bi4 'third Heavenly Stem' and 𗸃 1vi1 'fourth Heavenly Stem' are.


I just installed Prof. 景永时 Jing Yongshi's free Tangut font which can be downloaded from BabelStone. Thanks to Prof. Jing for making it freely available and to Andrew West and Michael Everson for mapping his font onto Unicode and extending it to include more characters and even character components.

銀川 Yinchuan 'Silver River' is the modern Mandarin name for the city now on the site of the former Tangut capital. Yinchuan has a 西夏区 Xixia qu 'Western Xia (i.e., Tangut) District' in name only (population 329,310).

I should eventually add Tangut characters in that font to my database of Tangut character readings (download version 1.3.1 here).

TANGUT PHONETIC DATABASE VERSION 1.3.1

I have updated my database of Tangut readings (download version 1.3.1 here) with the following changes:

- the anomalous ren-readings combining r- with a nonretroflex rhyme -en have been replaced by len. See Jacques (2014: 184-185).

- corrected readings for

L2164 and L3965 which had a nonexistent -an' rhyme instead of -y'

Thanks to Andrew West for spotting the wrong reading of L2164.

L5566 which belongs to rhyme 70 (1.67), not rhyme 67 (1.64)

L6027 which had r in the "Nasal/w" column instead of the "Cycle" column

The first change is in version 1.3, which I did not upload. All other changes are new in version 1.3.1.

THREE TANGUT MEATS

I used to think Tangut

5865 1soq1 'three' (Tibetan transcriptions gsoH x 14, gso x 4, so x 2)

was from *k-sum (cf. Japhug rGyalrong χsɯm and Written Tibetan gsum) with a *k- that weakened to *x- and assimilated to the following *s- which conditioned vowel tension (written as -q):

*k-s- > *xs- > *ss- > s-q

The Tibetan transcriptions with g- may reflect a dialect retaining preinitial *k-.

Guillaume Jacques (2014: 197) proposed a pre-Tangut form *sə-svm (whose *v = any vowel but *i) with reduplication like Tagalog ta-tlo < Proto-Austronesian *telu.

(Might the retroflexion of Tangut

2005 21lyr' < *rliXH 'four'

be from a reduplicated *l- that merged with the preinitial *r- that conditioned vowel retroflexion written as -r? Retroflex vowels are so common in Tangut that I suspect they had sources other than *r-.)

But if *k-s- became s- plus tension (unlike other *k-obstruent sequences, which became aspirates), then how can I account for the aspirate in

3465 1chhi3 'meat' (Tibetan transcription: chi)

whose root had initial *sj- (cf. Written Tibetan sha, Written Burmese sāḥ)?

Maybe *k-s- and *k-sj- had different reflexes:

*k-s- > *x-s- > *s-s- > s-q

*k-sj- > *kʂʰ- > chh- [tʂʰ-]

But I doubt that was the case. There is no external support for *k- in 'meat'.

Perhaps the preinitial of 'meat' was *t- (cf. Japhug tɯ-ɕa), and *t-sj- fused into aspirated chh-. That proposal is not without problems, as we'll see next time.

CORDIAL COMPASSION?

Almost two weeks ago, I realized that

1483 2ne4 'compassion'

in the Tangut text

3457 0478 1483 2323 5404 4625 5302 1siw4 1sho'3 2ne4 1vy1 1la1 2me'4 0L?

'new collect compassion piety record final volume'

sounds almost exactly like

2518 2ne'4 'heart' (Tibetan transcriptions from Tai 2008: 215: gne x 4, ne x 1, nye x 1, gnyeH x 1)

The only difference between the two is the presence of the unknown phonetic quality 'prime' (transcribed as -') in 'heart'.

Are the two words related? In other words, did they have similar forms in pre-Tangut?

Before I can answer those questions, I should survey the phonetic details of 'heart' in Tangut:

- According to Arakawa's hypothesis, Tibetan preinitial g- indicates tone 1, but 'heart' has tone 2

- For twenty years I have suspected, contra everyone else, that the Tibetan preinitials might be taken literally rather than as orthographic devices for tones. Could the transcribed dialect preserve a preinitial *k- (written as g- following Tibetan spelling conventions; kn- is un-Tibetan) lost in standard Tangut? Perhaps preinitials in the transcribed dialect normally corresponded to tone 1 in standard Tangut, but 'heart' developed tone 2 in standard Tangut because it had lost its preinitial before tonogenesis.

- If Tangut grades were like Chinese grades as I interpret them, Grade IV was the most palatal. But exactly how this palatality was expressed is unclear. Did Tibetan nye ~ ne transcribe [ɲe], [nʲe], [nie], etc.?

- What Tangut sound did Tibetan final -H transcribe? The mysterious 'prime'?

On to pre-Tangut:

-e'4 with 'prime' has six sources:

*Cɯ-...-aŋX, *Cɯ-...-eŋX, *Cɯ-...-enX

*(Cɯ-)...-jaŋX, *(Cɯ-)...-jeŋX, *(Cɯ-)...-jenX

-e4 without 'prime' has only two sources (and yet is more common!):

*Cɯ-...-aŋ and *(Cɯ-)...-jaŋ

I no longer think *Cɯ-...-an(X) is a source of -e(')4.
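The source lists above can be read as a lookup from a pre-Tangut rhyme, plus the presence or absence of the unknown factor *X, to a Tangut rhyme. Here is a minimal Python sketch of that reading, not an established procedure. Assumptions: the table holds only the sources listed above plus the raising of *e to -i when *X is absent (the *2ni4 outcome discussed below); *-H gives tone 2 and its absence is taken to give tone 1; initials and grades are not modeled. `REFLEXES` and `pre_tangut_reflex` are hypothetical names.

```python
from typing import Optional

# (pre-Tangut rhyme, has *X) -> Tangut rhyme, following the lists above.
# Sources written with *j may lack the *Cɯ- presyllable; the others need it.
REFLEXES = {
    ("aŋ", True): "e'", ("eŋ", True): "e'", ("en", True): "e'",
    ("jaŋ", True): "e'", ("jeŋ", True): "e'", ("jen", True): "e'",
    ("aŋ", False): "e", ("jaŋ", False): "e",
    # Without *X nothing blocks the raising of *e, so it surfaces as i.
    ("eŋ", False): "i", ("en", False): "i",
}


def pre_tangut_reflex(rhyme: str, has_x: bool, has_h: bool,
                      has_presyllable: bool = True) -> Optional[str]:
    """Return tone + rhyme (e.g. "2e'"), or None if the sketch has no rule."""
    if not rhyme.startswith("j") and not has_presyllable:
        return None  # non-*j sources are listed only with *Cɯ-
    tangut_rhyme = REFLEXES.get((rhyme, has_x))
    if tangut_rhyme is None:
        return None  # e.g. *-an(X), which is no longer considered a source
    tone = "2" if has_h else "1"  # assumption: no *-H means tone 1
    return tone + tangut_rhyme


if __name__ == "__main__":
    print(pre_tangut_reflex("eŋ", has_x=True, has_h=True))   # 2e', as in 'heart' 2ne'4
    print(pre_tangut_reflex("eŋ", has_x=False, has_h=True))  # 2i, hence *2ni4
    print(pre_tangut_reflex("aŋ", has_x=False, has_h=True))  # 2e, the rhyme of 2ne4
```

The lookup makes the argument below easy to check: only an *-aŋ (or *-jaŋ) source without *X and with *-H yields the unprimed tone 2 rhyme of 2ne4.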

- Exterior cognates of 'heart' point to a front vowel and *-ŋ e.g., Tibetan snying.

- But they also point to *s- and not *k-.

- STEDT's Proto-Tibeto-Burman roots #251, #689, and #1385 have *s/k-, but the data on the site don't seem to support *k-.

- And if pre-Tangut had *s- in 'heart', that consonant would condition tension absent from 2ne'4 (i.e., 'heart' would be *2neq4).

Taking all of the above into account, the pre-Tangut word for 'heart' was

*kɯ-neŋXH or *k(ɯ)-njeŋXH

with a front vowel like Tibetan snying. (*-H conditioned tone 2.)

But 'conscience' could not be

*kɯ-neŋH or *k(ɯ)-njeŋH

because those forms would have developed into *2ni4, not 2ne4. (Whatever *X was blocked the raising of *e in *eŋX.)

Moreover, it is improbable that a nonbasic word 'conscience' would be derived from a basic word 'heart' via subtraction.

It is more probable that 'conscience' is an unrelated word with a different rhyme 

*Cɯ-naŋH or *(Cɯ)-njaŋH

that came to sound like 'heart'.

A VEXING VOLUME

At the beginning of "An Interesting Reading", I mentioned the Tangut text

3457 0478 1483 2323 5404 4625 5302 1siw4 1sho'3 2ne4 1vy1 1la1 2me'4 0L?

'new collect compassion piety record final volume'

whose final character has an unknown reading. 0- indicates an unknown tone. L- indicates an unknown Class IX initial (l- lh- ld- r- z- zh-). ? indicates an unknown rhyme. I am going to start using -0 for an unknown grade.

This character appears as an initial speller in this fanqie chain:


1165 1luq3 'to rub, knead' = 5302 0L?0 + 0500 1tsuq4

(There is no phonemic distinction between Grades III and IV in rhyme 62 -uq3/4; the grade is automatically determined by the initial.)


4550 1lheq4 'sorcerer' = 1165 1luq3 + 3318 1cheq3

(There is no phonemic distinction between Grades III and IV in rhyme 64 -eq3/4; the grade is automatically determined by the initial.)

I have converted the readings from Gong's reconstruction from Li Fanwen (2008) into my system. I am unaware of any transcriptive evidence for 1165 and 4550 - or even any attestations of either character outside dictionaries. How can 1165 have a known initial if the initial of its initial speller is unknown? Why not reconstruct 1165 as 1Luq3/4? (l- r- zh- would be followed by Grade III and ld- lh- z- by Grade IV.) And why does the initial of 4550 (lh-) not match the initial of its initial speller 1165 (l-)? Shouldn't 4550 be reconstructed as 0Leq3/4?
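The questions above follow from how fanqie spelling works, and they can be checked mechanically. Below is a minimal Python sketch under stated assumptions, not anyone's published method: the initial comes from the initial speller; tone and rhyme come from the final speller; and in rhymes like 62 and 64, which have no Grade III/IV contrast, the grade follows the initial (l- r- zh- take Grade III, ld- lh- z- Grade IV, as stated above). `Reading`, `fanqie`, `expand`, and `CLASS_IX_GRADE` are hypothetical names; the placeholders (0 = unknown tone or grade, L = unknown Class IX initial, ? = unknown rhyme) follow this blog's notation.

```python
from dataclasses import dataclass, replace

# Class IX initials and the grade each takes in rhymes without a
# Grade III/IV contrast (l- r- zh- -> III; ld- lh- z- -> IV).
CLASS_IX_GRADE = {"l": 3, "lh": 4, "ld": 4, "r": 3, "z": 4, "zh": 3}


@dataclass(frozen=True)
class Reading:
    tone: int     # 1 or 2; 0 = unknown
    initial: str  # "L" = unknown Class IX initial
    rhyme: str    # "?" = unknown rhyme
    grade: int    # 3 or 4; 0 = unknown

    def __str__(self) -> str:
        return f"{self.tone}{self.initial}{self.rhyme}{self.grade}"


def fanqie(initial_speller: Reading, final_speller: Reading) -> Reading:
    """Initial from the first speller; tone and rhyme from the second.

    Assumes a rhyme without a Grade III/IV contrast, so the grade is read
    off the initial; 0 means this sketch cannot determine it.
    """
    initial = initial_speller.initial
    return Reading(final_speller.tone, initial, final_speller.rhyme,
                   CLASS_IX_GRADE.get(initial, 0))


def expand(reading: Reading) -> list[str]:
    """List every concrete reading that an unknown Class IX initial allows."""
    if reading.initial != "L":
        return [str(reading)]
    return [str(replace(reading, initial=i, grade=g))
            for i, g in CLASS_IX_GRADE.items()]


if __name__ == "__main__":
    r5302 = Reading(0, "L", "?", 0)    # only its initial class is known
    r0500 = Reading(1, "ts", "uq", 4)
    r3318 = Reading(1, "ch", "eq", 3)

    r1165 = fanqie(r5302, r0500)
    print(r1165)          # 1Luq0
    print(expand(r1165))  # ['1luq3', '1lhuq4', '1lduq4', '1ruq3', '1zuq4', '1zhuq3']

    # Even granting 1165 the reading 1luq3, the chain predicts 1leq3 for 4550.
    print(fanqie(Reading(1, "l", "uq", 3), r3318))  # 1leq3
```

The last line shows why lh- in 4550 is suspicious: a strict fanqie reading carries the initial l- of 1165 over unchanged.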

How many other Tangut character readings are shaky?

AN INTERESTING READING

I've been filling holes in my Tangut character folder lately. So far I have images for 3,634 out of the 6,125 Tangut characters in Unicode 9.0: i.e., about 59% of the total. The fact that I haven't needed images for two out of five characters even after nearly eleven years of blogging about Tangut indicates how skewed the distribution of characters is. I estimate the number of distinct characters in Guillaume Jacques' index to the

3457 0478 1483 2323 5404 4625 5302 1siw4 1sho'3 2ne4 1vy1 1la1 2me'4 0L?

'new collect compassion piety record final volume'

to be about a thousand.

The reading of 5302 is unknown. It is in the section for characters without homophones in the ninth chapter of Homophones, so it must have an L-type initial (l- lh- ld- r- z- zh-). Beyond that, nothing else can be said. I know of no transcriptions of it.

The reading of the first half of

4006 5383 0TS? 2se4 'interest' (in the financial sense)

should be unknown. Yet Li Fanwen (2008: 643) lists Gong's reconstructed reading as 2tswər. This reading was not in Li Fanwen (1997: 740). Kychanov and Arakawa (2006: 367), on the other hand, list the Sofronov-style reconstruction 2?ə̣.

How does anyone know what the rhyme of 4006 is? The character does not appear as a final speller in any fanqie. I do not know of any transcriptions of it that could even give us a vague idea of what the rhyme might have been. And transcriptions would not indicate the tone which I have written as 0- for unknown.

I write the initial as capital TS- to indicate that it belonged to class VI (alveolar sibilants other than z- which I suspect might have been lateral [ɮ]). But I don't know which class VI initial it had: ts-, tsh-, dz-, or s-.

Lastly, how does anyone know what 4006 5383 means? Neither edition of Li cites any attestations outside dictionaries, and Li (1997: 740, 974) lists no definitions for either 4006 or 5383. Have Kychanov and/or Arakawa found such attestations and identified the meaning from context?

SINO-TANGUT PHONOLOGICAL PARALLELS (PART 1)

At a glance, Tangut and Tangut period northwestern Chinese (hereafter simply 'Chinese') phonology appear to be similar: 

- They had largely overlapping consonant inventories with a three-way distinction between voiceless unaspirated, voiceless aspirated, and prenasalized voiced: e.g., p- : ph- : b- [mb].

Tangut, however, had more consonants: gh-, lh-, ld-, r-, z- [ɮ].

And Chinese had an f- absent in most Tangut reconstructions (the exceptions being Nishida's and Arakawa's).

- They had six basic vowel types: u, i, a, y, e, o.

- These vowels had four types of variations ('grades').

Tangut, however, had further variations absent from Chinese: tension, retroflexion, and the mysterious quality that I write with -' and call 'prime'.

- They contrasted oral and nasal vowels.

- Their syllables had the structure C(w)V(G); they only permitted -w and perhaps -j in coda position.

Despite many common features, it would be an exaggeration to say that the two languages share a common phonology. Notice that I have not mentioned tones. There does not seem to be any correlation between the two 'tones' of Tangut and Chinese tonal categories: e.g., Chinese 龍 *1lon3 'dragon' was borrowed twice with both tones:

4897 1lon3 and 4203 2lon3

This could imply that Tangut and Chinese tones sounded very different, making one-to-one mapping between them impossible.

Or perhaps Tangut had phonations (plain vs. breathy?) instead of tones despite the use of 'tone' in the Tangut phonological tradition. If so, the Tangut could not hear Chinese tones because they had no tones of their own. (I am now skeptical of the phonation hypothesis that I came up with in the late 90s. If Tangut had phonation and Chinese didn't, why didn't the Tangut simply borrow and transcribe all Chinese tones with Tangut clear phonation?)

One last possibility, as yet unexplored, is that the Tangut were sensitive to sandhi variants of tones. Suppose, for instance, that Tangut and Chinese tones 1 and 2 were similar, and that Chinese tone 1 became tone 2 before tone 4: 龍栢 */1lon3 4pe2/ > [2lon3 4pe2] 'dragon cypress'. Then it would make sense to borrow that disyllabic word as

4203 4119 2lon3 1pi2

with the second tone while borrowing monosyllabic 龍 /1lon3/ = [1lun3] 'dragon' as

4897 1lon3

with the first tone. But why, then, was Chinese 龍栢 */1lon3 4pe2/ 'dragon cypress' transcribed (as opposed to borrowed) in the Timely Pearl as

4897 5970 1lon3 1pi2

with the first tone rather than the second? Here are five explanations:

1. The most boring, namely, that this is a random error.

2. Hypercorrection: the transcriber knew that the Chinese word for 'dragon' had tone 1 and might have assumed that tone 2 in the Tangut loan deviated from the Chinese (when in fact it reflected Chinese tone sandhi).

3. The transcription reflects a careful Chinese reading pronunciation "1lon3 ... 4pe2" without tone sandhi.

4. The transcription reflects a variant Chinese pronunciation without tone sandhi - perhaps from a dialect slightly different from the source of the Tangut borrowing.

5. The borrowing reflects a slightly earlier stage of Chinese with tone sandhi and the transcription reflects a slightly later stage without tone sandhi (and with the original first tone restored by analogy with 'dragon' in isolation?).
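The sandhi scenario supposed above can be made concrete with a short Python sketch. It encodes only the hypothetical rule from the text (Chinese tone 1 surfaces as tone 2 before tone 4) and works on readings written tone-first, as on this blog. `apply_sandhi` is a hypothetical name, and the rule checks the underlying tone of the following syllable, so one change cannot trigger another.

```python
def apply_sandhi(phrase: str) -> str:
    """Apply the hypothetical rule: tone 1 -> tone 2 before a tone 4 syllable.

    Each syllable is written tone-first (e.g. "1lon3"). The rule looks at the
    underlying tone of the next syllable, so changes do not feed each other.
    """
    syllables = phrase.split()
    surface = []
    for i, syllable in enumerate(syllables):
        tone, rest = syllable[0], syllable[1:]
        next_tone = syllables[i + 1][0] if i + 1 < len(syllables) else None
        if tone == "1" and next_tone == "4":
            tone = "2"
        surface.append(tone + rest)
    return " ".join(surface)


if __name__ == "__main__":
    print(apply_sandhi("1lon3 4pe2"))  # 2lon3 4pe2 (cf. the loan 4203 4119 2lon3 1pi2)
    print(apply_sandhi("1lon3"))       # 1lon3 (cf. the loan 4897 1lon3)
```

Under this rule, the Timely Pearl transcription 4897 5970 1lon3 1pi2 matches the underlying form rather than the surface form, which is the mismatch the five explanations above try to account for.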

The tones are not the only differences between Tangut 2lon3 1pi2 'dragon cypress' and its Chinese source lon3 4pe2. I'll explore the others in part 2.

DISSECTING A TANGUT MARRIAGE (PART 5)

If 5051 (second half of 1y4 1naq4 'marriage'; Boxenhorn code: biogeodex) could be abbreviated to resemble 2544 'sage' (Boxenhorn code: geo) in 0532 2ge4 'to marry' (Boxenhorn code: hosgeo),


why wasn't it abbreviated that way in other derivatives?

3657 1y4 (first half of 1y4 1naq4 'marriage'; Boxenhorn code: giibiogeo)

1625 2tuq4 'to mate, marry' (Boxenhorn code: fosbiogeo)

5975 1naq4 'parallel, weft' (Boxenhorn code: palbiogeo)

In other words, why do those three characters have a 'hat' (bio) absent in 0532?

I think 3657 needed a 'hat' (bio) to distinguish it from an existing character without it:

2449 2bi1 'sun' (Boxenhorn code: giigeo)

2449 must precede 3657 in the chronology of tangraphic creation.

But there are no characters with the structures





so in theory the 'hats' (bio) are redundant, though their presence does make the connection of 1625 and 5975 to 5051 more transparent.

I am reminded of the inconsistency of simplification in the postwar Japanese script:

- 獨 'alone' was simplified to 独 (with the phonetic 蜀 'the state of Shu' reduced to 虫 'bug')

- but 濁 'muddy' was not simplified to 浊 even though no such character already existed (and years later, 濁 was simplified to 浊 in the PRC).

There is no deep meaning behind the inconsistency of 独 and 濁. Perhaps there is none behind the inconsistency of

0532 without a 'hat' (bio)

on the one hand and

1625 and 5975 with 'hats' (bio)


Many Tangut marital characters from the previous parts contain

2544 2shen4 'sage' < Chinese *3shen3

and if one had never known about 2544, one might guess that it was a semantic component 'marry'. But it acquired that secondary function as an abbreviation of 5051:


5051 1naq4 = 3657 2705 2546 2705 1y4 2ber'4 1naq4 2ber'4

(first half of 1y4 1naq4 'marriage') right + 'god' right

2544 'sage' is semantic in 2546 'god', the phonetic of 5051. I have no doubt about the first half of the Tangraphic Sea analysis of 2546:


2546 1naq4 = 2544 1602 0149 0737 1naq4 2ngorn1 2wer1 1chhen3

'sage' all + 'protect' bottom

But I have doubts about the second half. 0149 must be derived from 2546 rather than the other way around. The 'person' on the right of 2546 is either simply 'person' (but why would 'god' have 'person'?) or an abbreviation of one of the 1,186 (!) tangraphs containing 'person'.

Someone (I?) should try to reconstruct a chronology of the derivation of tangraphs based on the Tangraphic Sea derivations plus common sense. Here's a sliver of that chronology:

In words: 2544 begat 2546, which in turn begat 5051 and 0149.

5051 begat 3657, 1625, 5975, and 0532 (but why does 0532 lack the 'horned hat' of the others?).

5138 begat 5138 1gu'1, first syllable of 1gu'1 1chhiw4, the name of a Tangut god (1chhiw4 is 'six').

Next: Why don't all married sages wear hats?

DISSECTING A TANGUT MARRIAGE (PART 3)

As I wrote in part 2, I thought that 5051 1naq4 was simply phonetic in its homophone 5975:


5975 1naq4 'parallel, weft' = 5938 3936 5051 3936 2ge4 1pha1 1naq4 1pha1

'classical text, warp' left + (second half of 1y4 1naq4 'marriage') left

But then I discovered that 5938, listed as the source of the left side of 5975, had a homophone

0532 2ge4 'to marry'

which the Tangraphic Sea lists as a definition for the first half of

3657 5051 1y4 1naq4 'marriage'.

Is 0532 'to marry' a metaphorical extension of 5938 'warp' (in the sense of weaving)? Li (1997: 104) defined 0532 as 'weave, marry' - to which STEDT added '(join in marriage)' - but the revision of the entry for 0532 in Li (2008) has the definition 'to marry, to unite in marriage' without any reference to weaving.

If 0532 is originally a weaving term, then could 5051 1naq4 of 3657 5051 1y4 1naq4 'marriage' also originally be a weaving term - specifically, an extended usage of 5975 1naq4 'parallel, weft'?

3657 1y4 is attested as an independent word 'marriage, matchmaker, relatives by marriage'. 3657 5051 1y4 1naq4 'marriage' is thus originally 'marriage weft' with the first half clarifying the metaphorical use of the second half which does not occur on its own in the sense of 'marriage'.

Do 5938 2ge4 < *Nɯ-Kan/ŋ ~ *Cɯ-ŋgan/ŋ 'warp'* and 5975 1naq4 < *Sɯ-naC 'weft' have cognates outside Tangut? Unfortunately, neither 'warp' nor 'weft' is in the rGyalrongic Languages Database. Both are at STEDT, but I can't find any cognates there or in Guillaume Jacques' Japhug dictionary, which lists tɤ-ʁjar 'warp' and tɯ-jlɤβ 'weft'.

*I reconstruct a presyllable with *ɯ to condition Grade IV after a velar. However, I do not know whether that presyllable had a nasal initial *N- or preceded a nasal. I also do not know if the velar stop after the nasal was originally voiced or not. In any case, Tangut g- is from *ŋg-, which may in turn have more complex origins.

DISSECTING A TANGUT MARRIAGE (PART 2)

The character for the second half of

3657 5051 1y4 1naq4 'marriage'

has two probable derivatives besides the first character:


1625 2tuq4 'to mate, marry' = *0482 3936 5051 3936 2dzen4 1pha1 1naq4 1pha1

*'to copulate' left + (second half of 1y4 1naq4 'marriage') left?


5975 1naq4 'parallel, weft' = 5938 3936 5051 3936 2ge4 1pha1 1naq4 1pha1

'classical text, warp' left + (second half of 1y4 1naq4 'marriage') left

The analysis of 1625 is my guess since it is one of the many characters whose analysis was in the lost second tone volume of the Tangraphic Sea.

0482 is the clarifier of 1625 in Homophones, so it is certain that the Tangut considered the two to be semantically related even if 0482 was not actually in the analysis of 1625.

1625 2tuq4 should go back to pre-Tangut *Sɯ-to-H:

*S- conditioned the tension of the vowel transcribed as -q.

*-ɯ- conditioned Grade IV in lower vowels (*a, *e, *o) after dentals.

I am assuming that the raising of *o to *u predated the conditioning of Grade IV.

I could be wrong. Maybe Grade IV was conditioned by a raised *o after a dental:

*S(ɯ)-to-H > *S(ɯ)-tu-H > 2tuq4

If so, maybe there was no *ɯ after *S-.

*-o raised to -u (Jacques 2014: 206); whether this occurred before or after Grade IV is uncertain.

*-H conditioned tone 2; it may ultimately be from *-ʔ or *-h (< *-s).

*Sɯ-to-H might go back to an even earlier *Sɯ-ton-H if it is cognate to forms for 'to marry' like

Somang rGyalrong ston muŋ ka-pa

Daofu sto lmo və (is v- a lenited *p preserved in Somang?)

Xinlong Queyu ste⁵⁵ rmu⁵⁵ vi¹³ (did *o front before *-n?; v- < *p-?)

and if *Cɯ-...-on merged with *Cɯ-...-o into -u3/4. That would be parallel with the merger of *Cɯ...-en and *Cɯ...-e into -i3/4, and one could propose a general rule:

*Cɯ-...mid vowel + -n > Grade III/IV high vowel

5051 must be semantic in 1625 since the two sound nothing alike. Conversely, 5051 must be phonetic in 5975. But could it be something more? I didn't think so at first. I'll explain why I changed my mind next time.

DISSECTING A TANGUT MARRIAGE (PART 1)

The two halves of

3657 5051 1y4 1naq4 'marriage'

are written similarly, so it's not surprising that they have circular derivations in the Tangraphic Sea:


3657 1y4 = 3436 2705 5051 3936 1sa'1 2ber'4 1naq4 1pha1

(second half of 1ne4 1sa'1 'close relative') right + (second half of 1y4 1naq4 'marriage') left


5051 1naq4 = 3657 2705 2546 2705 1y4 2ber'4 1naq4 2ber'4

(first half of 1y4 1naq4 'marriage') right + 'god' right

2546 is clearly phonetic in its homophone 5051. So I think the sequence of character creation was

2546 > 5051 > 3657

though I am surprised the character for a second syllable was devised before the character for a first syllable.

Why was 5051 abbreviated in 3657? Because it was no longer phonetic, so there was no longer any need to keep all of 2546 under the 'horned hat'? Because the right-hand 'person' component (Boxenhorn code: dex) is so common (it appears in one out of five Tangut characters) that it is almost expendable? In any case, 5051 doesn't appear in its entirety as a component of any character.

Next: Other instances of 'depersonalized' 5051.

EATING BEGINS WITH LOVE, NOT MARRIAGE

Thanks to Guillaume Jacques for catching my mistake. The correct fanqie for 'eat' from "The Past and Present Sound of Eating in Tangut" (Part 1 / Part 2) is


4517 1dzi3 'eat' = 4973 1dzu4 'love' + 0932 1i3 'many, more, much'

The correct initial speller 4973 is visually very similar to 5051, the erroneous initial speller that I posted which represents the second half of the disyllabic word

3657 5051 1y4 1naq4 'marriage'

I will take a closer look at this word starting tomorrow.

Alas, the enigma of the final speller remains. Why is it Grade III instead of Grade IV after dz-?

THE 'RIGHT' RHYME (PART 6) / TANGUT PHONETIC DATABASE VERSION 1.2

After this I think I'll really be done with the topic of Tangut rhyme 101 (1.93/2.86) for some time.

I have updated my database of Tangut readings (download version 1.2 here) to incorporate the changes I have proposed in this series:

- the reinterpretation of rhyme 101 as -er' instead of -ir' (see part 1)

- the reassignment of

2705 'to help; right side of character (i.e., assistant)' and 2928 'to explain, note' (with 'speech' on its right side; probably a different spelling of a specialized usage of 'to help')

from rhyme 2.54 to 2.86 (see "Explaining the 'Right' Reading")

I didn't upload version 1.1 in which I replaced a lot of symbols for nasalization with glides after mid vowels, bringing my transcription closer to Gong's reconstruction: e.g.,

-en > -ey (corresponding to Gong's -əj, -iəj, -jɨj)

-on > -ow (corresponding to Gong's -ow, -iow, -jow)

I have retained the nasalization from 1.0 in 1.2.

THE 'RIGHT' RHYME (PART 5)

I didn't expect to write a five-part series on this topic, but I forgot to mention the Tangut-internal evidence on the rhyme in part 4, so it's getting its own part.

"It" consists of alternations between rhyme 101 and other rhymes (70 -iq3/4 and 84 -ir3/4) in what superficially appear to be synonym pairs (Gong 2002: 103):


1. 5683 2er'4 ~ 5209 2iq4 'to stretch, lengthen'


2. 1928 1ler'3 ~ 5850 1liq3 'to rub'


3. 5742 1tser'4 ~ 3641 1tsir4 'to choose'

Gong reconstructed rhyme 101 as -iir which matched the i of the non-101 members of those pairs. However, the placement of rhyme 101 in the Tangraphic Sea (see part 1) and the Tibetan and Chinese transcription evidence (see parts 2 and 3) point to a nonhigh vowel. The merger of *-ir' with *-er' that I first proposed in part 1 can account for the vowel mismatch.

Those three pairs could be reconstructed in accordance with the proposals in part 4 as

1. 'to stretch, lengthen'

*r((ɯ)-s)ɯ-ʔa-X/*r((ɯ)-s)ɯ-ʔe(n/ŋ)-X ~


2. 'to rub'

*r((ɯ)-s)ɯ-la-X/*r((ɯ)-s)ɯ-le(n/ŋ)-X ~


3. 'to choose'

*rɯ-tsa-X/*rɯ-tse(n/ŋ)-X ~



*Cɯ-tsar-X/*Cɯ-tser-X ~


All three pairs require a presyllable with the high vowel *ɯ to condition the raising of *a and/or *e to their Grade III/IV reflexes.

The rhyme 101 members of pairs 1 and 2 must have had a prefix *r(ɯ)- to condition retroflexion absent in their rhyme 70 counterparts ending in -iq. If their bases had *sɯ- prefixes, then there is no need to reconstruct *ɯ after *r- since the *ɯ of *sɯ- would be sufficient to condition Grades III/IV. But the possibility of *rɯ-sɯ- cannot be ruled out.

All three pairs involve the presence or absence of the mysterious factor *X that I have arbitrarily written at the ends of syllables.

Without external evidence, there is no way to narrow down the possibilities.

And without narrower definitions, there is no way to be sure about the functions of the various affixes. (*X may not have been an affix, though for convenience I write it as if it were a suffix.)

THE 'RIGHT' RHYME (PART 4)

I originally wanted to end this series with comparative evidence for Tangut rhyme 101 (1.93/2.86) words, but I don't know of any. Which is not surprising as there are only thirteen characters with a total of five different readings ending in rhyme 101 (Arakawa 1997: 91):

Homophones initial class | Tone 1 | Tone 2
I | - | 2ber'4
V | - | 2ker'4
VI | 1tser'4 | 2tser'4
IX | 1ler'3 | -

Grades III and IV are normally in complementary distribution. If there are no Grade III/IV minimal pairs in a rhyme, I assign Grade III to Class II and VII consonants and the Class IX consonant l-. The default grade for all other initials including the Class IX consonant lh- is IV. This assignment parallels the general pattern of distribution of initials in rhymes that have Grade III and IV minimal pairs. The different distribution of l- and lh- suggests that they did not simply differ in terms of voicing. I think l- was velar [ɫ] (as in Nishida's 1ɫĭə̣r corresponding to my 1ler'3) whereas lh- was a fricative [ɬ]. (Note that Nishida reconstructed l- and ɫ- as distinct initials, whereas I think /l/ was always [ɫ] except before Grade IV rhymes where it was [l].)

Nishida (1964: 67) was the first to identify rhyme 101 as retroflex, and that classification has been carried over into the reconstructions of Arakawa, Gong, and this site. (Sofronov does not reconstruct retroflexion in Tangut vowels in any of his three reconstructions.)

Vowel retroflexion has two sources in Tangut: (pre)initial *r- and final *-r. The r- of Tibetan transcriptions of Tangut rhyme 101 syllables may directly reflect a preinitial r- preserved in a nonstandard Tangut dialect (see part 2). Preinitial *r- may have been *rɯ- with a high vowel conditioning Grade III/IV.

Nishida (1964: 67) also reconstructed tension in rhyme 101. Gong reconstructed preinitial *s- as the source of tension in Tangut vowels, though he reconstructed length rather than tension as an extra nonretroflex feature in rhyme 101: -iir. If Nishida and Gong are both right, the sources of rhyme 101 syllables would have preinitial *s(-r)- or *(r-)s-.

I can't rule out Nishida's tension, but I do not think Gong's length was present in this rhyme or any other, at least not during the Tangut imperial period. Sanskrit has phonemic vowel length that does not correlate with Gong's reconstructed vowel length in the readings of Tangut transcription characters. For now I simply acknowledge that this rhyme was somehow different from regular -er3/4, and indicate that difference with an apostrophe that I call 'prime'. I arbitrarily indicate the unknown source of 'prime' as a final *-X in pre-Tangut. The position of *-X is simply carried over from *-'; the actual conditioning factor of -' could have been anywhere in the syllable. I have never seen -'/*-X correlate to anything in any other Sino-Tibetan language. It is remotely possible that Tangut preserves something lost in the rest of its gigantic family, but I am hesitant to make such an extreme claim until I look harder. Which won't be tonight. All I can say for now is that *-X seems to have blocked raising in rhyme 40, the nonretroflex counterpart of 101:

R10 -i3 < *Cɯ...-en, *Cɯ...-eŋ

R11 -i4 < *Cɯ...-en, *Cɯ...-eŋ

R40 -e'3/4 < *Cɯ...-enX, *Cɯ...-eŋX

and perhaps *Cɯ...-anX, *Cɯ...-aŋX?

Integrating the above proposals (other than *s-) with Guillaume Jacques' sources for -i/e3/4 (in my notation; -ji(j) in his) and my hypotheses of mergers from part 1 and part 3, I have come up with up to twelve possible sources of -er'3/4:

Early pre-Tangut | Late pre-Tangut | Standard Tangut
*rɯ-CaX | *Cir'3/4 | Cer'3/4
*r(ɯ)-CukX | *Ciwr'3/4 | Cer'3/4
*rɯ-CekX | *Cewr'3/4 | Cer'3/4
*rɯ-CanX | *Cer'3/4 | Cer'3/4

Obviously not all twelve had to exist in pre-Tangut.

THE 'RIGHT' RHYME (PART 3)

In part 1, I proposed reinterpreting Tangut rhyme 101 (1.93/2.86) as -er' instead of -ir'. A mid vowel e fit the Chinese transcription *3me3 (Timely Pearl 32.2.8) for

2705 'to help; right side of character (i.e., assistant)'

better. And if rhyme 101 had e, that would be the vowel expected after y in rhyme 100 following the usual order of Tangut vowels.

In this part, I will start to look at the transcriptional evidence for rhyme 101.

1. Sanskrit transcription evidence

There isn't any. This tells us that 101 probably didn't sound like anything in Sanskrit. V'-rhymes are rare in Tangut transcriptions of Sanskrit and Vr'-rhymes seem to be nonexistent. That tells me that the unknown quality that I write with a prime symbol was absent from Sanskrit.

2. Tibetan transcription evidence

Tai (2008: 229) lists 22 transcriptions of two tangraphs with rhyme 101:

0467 1tser' 'method, art, skill, dharma'

transcribed as rtsi (x 1), rtse (x 5), rdze (x 1), rc? (x 1)

2698 2tser' 'nature, character'

transcribed as rtse (x 12), ?e (x 1)

Out of 22 transcriptions, 20 end in -e, 1 ends in -i, and 1 ends in an unknown vowel. The obvious conclusion is that the vowel of rhyme 101 was something like Tibetan e. (Why did Gong reconstruct long i instead of long e for rhyme 101?)

The preinitial r- of the transcriptions may either indicate the retroflexion of the following vowel or reflect an actual preinitial r- in the transcribed Tangut dialect corresponding to retroflexion in the standard dialect described in the Tangut phonological tradition:

Tangut dialect transcribed in Tibetan | Standard Tangut
CV (plain vowel) | CV (plain vowel)
rCV (r- + plain vowel) | CVr (retroflex vowel)

If the table above is correct, the dialect transcribed in Tibetan had fewer vowels than the standard dialect; the latter had retroflex vowels absent in the former.
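The correspondence in the table above amounts to a rewrite from the transcribed-dialect form to the standard form. A minimal Python sketch, assuming Latin transliterations like those quoted above and treating a, e, i, o, u, y as vowels; `to_standard` and `PREINITIAL_R` are hypothetical names, the spellings are Tibetan rather than standard Tangut notation (so 'prime' is not modeled), and the mapping is only as good as the table's hypothesis.

```python
import re

# Hypothetical correspondence: transcribed-dialect rCV <-> standard CVr.
# An "r" followed by at least one consonant is read as a preinitial, and the
# following vowel is rewritten as retroflex (V + r). Plain CV is unchanged.
PREINITIAL_R = re.compile(r"r([^aeiouy]+)([aeiouy]+)")


def to_standard(syllable: str) -> str:
    match = PREINITIAL_R.fullmatch(syllable)
    if match is None:
        return syllable  # CV (plain vowel) on both sides
    consonants, vowel = match.groups()
    return f"{consonants}{vowel}r"


if __name__ == "__main__":
    for form in ["rtse", "rdze", "rtsi", "tse"]:
        print(form, "->", to_standard(form))  # rtse -> tser, ..., tse -> tse
```

Applied to the most frequent transcription rtse, the rule yields tser, which matches the standard reading tser' apart from the unmodeled prime.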

I would expect the Chinese transcriptions of rhyme 101 characters other than 2705 to also contain *e, but we will see in part 3 that this is not the case.

Tangut Yinchuan font copyright © Prof. 景永时 Jing Yongshi
Tangut character image fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2017 Amritavision