Two days ago, I got my copy of Sir Gerard Clauson's skeleton dictionary of Tangut over twenty years after I had first read about it in Analysis of the Tangut Script.

One of the many things I like about Clauson's dictionary is that it is free of the speculative definitions found in later dictionaries. For instance, in my last entry, I used Li Fanwen's (2008: 3, 926) definition of 'brothers' for


0012.5873 1bu3.2kuq1

That definition is presumably based solely on the Tangraphic Sea definitions of 0012 and 5873:

Tangraphic Sea 1.7.131:


0012 3583 0012.5873 5285

1bu3 1ta4 1bu3.2kuq1 1ly3

'0012 TOP 0012.5873 AFF' = '0012 is [as in] 0012.5873'


2447 0605 5285

2lo3 2toq4 1ly3

' AFF' = '[It means] elder [and] younger brother'

4739 0213 0635.1424 1139 1279

1tsewr1 1ne4 1ny4.1thu4 1e4 1y4

'joint near relative GEN COMP' = '[It is what] closely related relatives [are] called'

Combined Homophones and Tangraphic Sea A 7.203:


2447 0605 4739 0213 0635.1424 1139 1279

2lo3 2toq4 1tsewr1 1ne4 1ny4.1thu4 1e4 1y4

' joint near relative GEN COMP' = '[5873 is what] elder [and] younger brothers [and] closely related relatives [are] called'

The word 0012.5873 is apparently not attested outside the entries for its characters in dictionaries.

Last night I looked up both 0012 and 5873 in Clauson, and as I had hoped, he glossed both as '?' in entries 1069 and 3120. The question marks most likely reflect Clauson's lack of access to the Tangraphic Sea, but I think they are still appropriate to some degree today because there is no guarantee that the components of Tangraphic Sea entries are precise synonyms: e.g., 'elder and younger brothers' is certainly not the same thing as 'closely related relatives'. So could 0012.5873 have been 'sibling'?

1.14.11:14: I don't think 0012.5873 was 'sibling' because I would expect 'sibling' to appear in definitions for sororal words. Perhaps 'closely related relatives' is needed to specify biological brothers as opposed to brothers in a broader, nonbiological sense: e.g., males of the same age. Were Tangut 2lo3 'elder brother' and 2toq4 'younger brother' used as nonbiological terms of address like Burmese ကို ko 'elder brother' and မောင် maũ 'younger brother'?

Unfortunately the Tangraphic Sea definitions for 2lo3 'elder brother' and 2toq4 'younger brother' have been lost. I would not expect 0012.5873 to appear in them since I think 0012.5873 was a subset (biological) of 2lo3-2toq4 'brothers' in a broader sense.

ANTHROPOGENESIS IN TANGUT

One of the first Tangut words - and characters - that I learned over twenty years ago was


2541 2dzwo4 'person'

which doesn't belong to the *m-'people' word family from my last post.

I never knew its etymology until I saw the Loloish words for 'person' in Burling (1967: 89):

Lisu tshō, Lahu chɔ̄, Akha tsɔ́hà

which are from Proto-Lolo-Burmese *tsaŋ.

Tangut -o is partly from *-a, so 2dzwo4 could be from *Pɯ-N-tsaŋH or *Nɯ-P-tsaŋH with

- *P- to condition medial -w-

- *-ɯ- to condition Grade IV

- *-N- to condition voicing of *-ts-

- *-H to condition tone 2 (the 'rising tone' - or was it really a phonation?)

According to STEDT, this word is also found in Central Naga and Bai (see forms here), so it is not an innovation of Burmo-Qiangic (Jacques' [2014: 2] proposed Sino-Tibetan subgroup containing both Lolo-Burmese and Tangut [as part of Qiangic]).

1.10: I'm glad I didn't post this right away because I realized my proposal has a problem.

Jacques' (2014: 206) pre-Tangut *-jaŋ (= my *Cɯ- ... -aŋ) became Gong's Tangut -jij (= my -e3/4), not Gong's Tangut -jo (= my -o3/4). (The initial determines whether the rhyme has Grade III or IV.)

Therefore I would expect *-jwaŋ (= my *Pɯ- ... -aŋ or *Cɯ-P- ... -aŋ) to become Gong's Tangut -jwij (= my -we3/4), not Gong's Tangut -jwo (= my -wo3/4).

2dzwo4 ends in -wo4, not -we4, so it cannot be from *Pɯ-N-tsaŋH or *Nɯ-P-tsaŋH. Or can it?

I can't find any examples of *-jwaŋ (= my *Pɯ- ... -aŋ or *Cɯ-P- ... -aŋ) in Jacques (2014). I propose that such a sequence became -wo3/4:

*Pɯ-Caŋ > *Pɯ-Cɨaŋ > *P-Cɨaŋ > *Cwɨaŋ > *Cwo3/4 and/or

*Cɯ-P-Caŋ > *Cɯ-Cwaŋ > *Cɯ-Cwɨaŋ > *Cwɨaŋ > *Cwo3/4

The medial *-w- 'encouraged' the following vowel to retain its labiality, whereas labiality was lost without *-w-:

*Cɯ-Caŋ > *Cɯ-Cɨaŋ > *Cɨaŋ > *Ciaŋ > *Cö > *Ce3/4

The *Cɯ- above is not *Pɯ-, which would have conditioned -w-.
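The proposal above can be restated as a toy Python function. This is only a sketch of the hypothesis, not an established sound law. The function name and its arguments are invented for illustration. "P" stands for any labial and "C" for any other consonant, as in the formulas above, and only pre-Tangut *-aŋ after a high-vowel presyllable is covered.

```python
def reflex_of_ang(presyllable: str, preinitial: str = "") -> str:
    """Return the proposed Tangut rhyme for pre-Tangut *-aŋ after *Cɯ-/*Pɯ-.

    presyllable: "Pɯ" (labial) or "Cɯ" (nonlabial), as in the formulas above.
    preinitial:  "P" if a labial stood between presyllable and root, else "".
    Both parameters are hypothetical conveniences, not standard notation.
    """
    if not presyllable.endswith("ɯ"):
        raise ValueError("only high-vowel presyllables are modeled here")
    # A labial anywhere before the root conditions medial -w-, which
    # lets the following vowel retain its labiality (-wo rather than -e).
    has_labial = presyllable.startswith("P") or preinitial == "P"
    return "-wo3/4" if has_labial else "-e3/4"


print(reflex_of_ang("Pɯ"))       # *Pɯ-Caŋ
print(reflex_of_ang("Cɯ", "P"))  # *Cɯ-P-Caŋ
print(reflex_of_ang("Cɯ"))       # *Cɯ-Caŋ
```

Whether the rhyme surfaces as Grade III or Grade IV depends on the initial, which this sketch does not model.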

Unfortunately I do not know of any Chinese loanword evidence for my proposed sound change. Middle Chinese *-waŋ3 corresponds to Sino-Tangut -on1 rather than -wo3 in the one case known to me (Gong 2002: 424):

旺 MC *3waŋ3 : ST 𗼤 2340 1von1 'prosperous'

I suspect the word was *3won3 with a nasalized vowel -on in Tangut period northwestern Chinese (TPNWC), and that this form was borrowed into Tangut with -on1, a nasalized vowel rhyme that originated from something like *-om, a merger of *Cʌ- ... -um, *-am, *-em, and *-om (but not rhymes ending in the velar nasal *-ŋ!). If my proposal is correct, an earlier borrowing of 旺 might have had -wo rather than -on in Tangut. Here is a possible relative chronology:

         Stage 1     Stage 2
Tangut   *-waŋ3/4    -wo3/4
Tangut   *-om1       -on1
TPNWC    *-waŋ3      -won3

At stage 1, TPNWC *3waŋ3 is a better match for Tangut *vaŋ3 (I write initial *w- as v-) than Tangut *vom1. But at stage 2, TPNWC *3won3 is a better match for Tangut 1von1 (which was how *3won3 was actually borrowed) than Tangut *vo3.

Did the sound change *-aŋ > -o spread from Chinese to Tangut? Japhug underwent the same change (Jacques 2004: 143) even though it was not in contact with Chinese until recently and its ancestor separated from that of Tangut long ago. A case of drift? Or just coincidence? The fusion of au into o is common (e.g., Sanskrit*), though the shift > > *u that would precede it isn't.

Lastly, on Monday morning in the rGyalrongic Languages Database I found some forms for 'person' that have labial + affricate initials like my pre-Tangut *Pɯ-N-tsaŋH: e.g.,

mDaH mdo βdzi

Tag gsum vdzi̤

At first I thought -i might be an unusual reflex of *-waŋ. However, I suspect that -i is from a rhyme with a lost *-t given Ri ṣe wdzit̚.

Forms like Nye dgaH brgya gcig vdzɨmi look like redundant compounds of the Pdz-word for 'person' with the m-'people' word from my last post.

I was initially hopeful that a third type of 'person' word in the database might be related to Tangut 2dzwo4 < *Pɯ-N-tsaŋH:

Rong wam kə' mcu

Wobzi vɟú

Hbrong rzong βɟuʔ

But now I think their palatal stops are hardened from what might be a *-j- still more or less present in

Khog po kə' mbju

Tsho bdun A ke' ᵐbo

Tsho bdun B kə³³ rəᴺ⁴⁴ bjo⁵⁴

Khang sar kə' rbju

rDzong Hbur kə' rmbju

Those words are reminiscent of Gong's 1bjuu = my 1bu3, the first half of


0012.5873 1bu3.2kuq1 'brothers' < *NPə.SkoH or *NɯPo.SkoH**?

a word only known from dictionaries. But I do not know of any examples of 1bu3 standing by itself, so I don't think there is any connection.

Go la thang nya lo ta' ʁap is a fourth type of rGyalrongic word for 'person' without any known Tangut cognate.

*Sanskrit au is from *āu. There was a chain shift: *āu > au > o.

**1.12.6:12: These reconstructions assume that the word is native or at least was borrowed before the sound changes that occurred between pre-Tangut and Tangut.

I suspect the word is from a non-Sino-Tibetan substratum that is the source of other unanalyzable disyllabic words in Tangut. Could it have simply meant 'brother' without any age distinction?

In Old Chinese, there was a strong tendency for both halves of disyllabic noncompound words to be of the same syllable type: AA or BB rather than AB or BA.

A-type syllables had low presyllabic vowels (*ʌ) or lower main vowels (*e *a *o) and developed Grades I or II in Middle Chinese.

B-type syllables had high presyllabic vowels (*ɯ) or higher main vowels (*i *ə *u) and developed Grades III or IV in Middle Chinese.

Tangut and Chinese seem to have undergone similar (though not identical) developments. I believe both languages underwent syllable-internal harmonization: i.e., the height of the main vowel harmonized with the height of the presyllable (if any). The presyllables were then lost, the harmonized vowels became phonemic, and the two languages developed a four-grade distinction.

Chinese disyllabic noncompound words usually had height harmony. I have never looked into whether Tangut disyllabic noncompound words also usually had height harmony. Tangut 1bu3.2kuq1 lacks height harmony; it combines a Grade III (type B) syllable with a Grade I (type A) syllable. If height harmony was the norm in Tangut between as well as within syllables, then 1bu3.2kuq1 was either a loanword from a language that lacked height harmony*** or a compound 1bu3-2kuq1****. (I use hyphens to indicate morphological boundaries and periods to indicate linked syllables without any certain morphological relationship between them.) I favor the former, as I have not found words like 1bu3 or 2kuq1 with meanings I would expect for the halves of 'brothers'. I also have not found a source for 1bu3.2kuq1. I suspect Tangut may be our only source of information on its substratum: i.e., we will never find external confirmation for a word like buku.
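Here is a minimal Python sketch of the height harmony test described above. It assumes my romanization, in which the last character of a syllable is its grade digit, and it treats Grades I and II as type A and Grades III and IV as type B. The function names are made up for this illustration.

```python
def syllable_type(reading: str) -> str:
    """Classify a romanized syllable as type A (Grades I/II) or B (Grades III/IV).

    Assumes the final character of the reading is the grade digit,
    e.g. '1bu3' is Grade III and '2kuq1' is Grade I.
    """
    grade = int(reading[-1])
    if grade in (1, 2):
        return "A"
    if grade in (3, 4):
        return "B"
    raise ValueError(f"unknown grade in {reading!r}")


def has_height_harmony(word: str) -> bool:
    """True if every syllable of a period- or hyphen-linked word has the same type."""
    syllables = word.replace("-", ".").split(".")
    return len({syllable_type(s) for s in syllables}) == 1


print(has_height_harmony("1bu3.2kuq1"))  # Grade III + Grade I
print(has_height_harmony("1ny4.1thu4"))  # Grade IV + Grade IV
```

Running a check like this over every disyllabic noncompound word in a dictionary would show whether height harmony really was the norm in Tangut.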

***Cf. Turkish kitap 'book' from Arabic kitāb. Kitap violates Turkish palatal vowel harmony because it contains a front vowel i followed by a nonfront vowel a.

****Cf. Finnish seinäkello 'wall clock', a compound word without palatal harmony across its halves: seinä 'wall' has a front vowel ä whereas kello 'clock' has a back vowel o. (The vowels e and i are neutral.)

THE TANGUT *M-'PEOPLE' WORD FAMILY

For a long time I have assumed that


2344 2mi4 'Tangut' (see my last post)

and the first syllable of


3752 2my4 2na'4 < *-k 'Tangut' (borrowed into Tibetan as mi-nyag)

was from *mi, a cognate to Tibetan mi 'person', and that 2na'4 was from *Cɯ-nak-XH, cognate to Tibetan nag 'black' and almost homophonous with 𗰞 0176 1na4 'black'. In short, I thought that the Tangut called themselves the '(Black) People'. I thought that 2my4 was phonetically something like [mjə], an unstressed, reduced form of the independent monosyllabic form 2mi4.

Although I still think 2my4 had some sort of nonlow nonpalatal vowel, last night I realized that 2mi4 could not go back to *mi because *i backed to Tangut -y. Tonight I think 2mi4 is from *Cɯ-meH with a mid vowel like the main vowel of Japhug tɯr-me 'person'. Cf.


4469 2shi3 < *Cɯ-sheH 'to go' : Japhug ɕe 'id.'

The final *-H symbolizes the glottal source of the second 'tone' 2- (which may have been phonation rather than a pitch). The presyllable could not have ended with *-r like Japhug tɯr- at the time Tangut developed retroflex vowels because *Cɯr-meH would have become Tangut *2mir4 with a retroflex vowel -ir, not 2mi4 with a nonretroflex vowel -i. Jacques (2014: 24) identified


3818 2mer4 < *Cɯr-mejH 'person; nominalizer'

as a cognate of Japhug tɯr-me with the expected retroflex vowel. (But what is the *-j needed to block the raising of *-e to -i? A suffix? Is *-mejH from *-meH + *-j?)

The alternation of 4469 2shi3 with


4481 1shy3 < *Cɯ-sheH 'to go' : Japhug ɕe 'id.'

is reminiscent of the -i ~ -y alternation of

𗼇 ~ 𗼎𗾧

2344 2mi4 ~ 3752 2my4 2na4

though the former is not part of a disyllabic word.

A monosyllabic member of the m-'people' word family with -y is


4574 1my4 < *mi 'other person'

which Jacques (2014: 145) also identified as a cognate of Japhug tɯr-me. Could this be the true direct cognate of Tibetan mi?

Another such member is


0607 1myr4 < *r-mi 'people, tribe'

I am now inclined to think there was a *-e ~ *-i alternation in the pre-Tangut *m-'people' word family:

*-e-words:
𗼇 2344 2mi4 < *Cɯ-me-H 'Tangut'
𗇋 3818 2mer4 < *Cɯ-r-me-j-H 'person; nominalizer'

*-i-words:
𘈑 0607 1myr4 < *r-mi 'people, tribe'
3752 2my4 < *mi-H- (first syllable of 'Tangut')
𘉑 4574 1my4 < *mi 'other person'

Did that alternation originate as a distinction between, say, a schwa-grade *-əj and a zero-grade *-i? The Sanskrit alternation between guṇa-grade nara- and zero-grade n-, both 'man', comes to mind.

Next: What is the etymology of the most common word for 'person' in Tangut?

INSTALLATION-FREE TANGUT

After my initial post using the Tangut Yinchuan font, I was worried about how to tell readers they'd need that font to see Tangut text in subsequent posts. Thanks to Andrew West and David Boxenhorn, you may now be able to see


2344 4797 2403 2mi4 1wyr4 2di4 'Tangut script'

on this blog without installing a font on any device. I've been using images for the past decade to be able to read Tangut on my phone, but now I only need them for Khitan and Jurchen until they're added to Unicode. Unfortunately I can't view the characters online in Chrome even though they're visible in my local copy in Chrome. But they are visible online in Firefox and on my iPhone. I don't know about visibility elsewhere. I don't have time to fix this issue right now. Maybe next week.

MIYAKO IN TANGUT

Japanese names are Sinified by reading their Chinese characters (if any) in a Chinese language: e.g., 宮古 Miyako is Mandarin Gonggu, Cantonese Gunggu, etc.

How would Japanese names have been Tangutized? The Tangut only knew of Japan through Chinese written records, so they wouldn't have known how Japanese names were pronounced in Japanese, much less other Japonic languages like Miyako. Thus the Tangut would have phonetically transcribed the Tangut period northwestern Chinese readings for the characters of Japanese names: e.g., 宮古 TPNWC *1kun3 2ku1 would have been Tangutized as


1306 1034 1kon4 1kwo1

using the transcriptions of 宮 and 古 in the Forest of Categories.

Tangut had no rhyme -un3 and generally did not permit Grade III rhymes after velars, so -on4 was the best available match for TPNWC *-un3.

Although Tangut had a rhyme -u1 whose romanization on this site happens to match TPNWC *-u1, my notation is not IPA, and perhaps Tangut -wo was an attempt to approximate a TPNWC final like [ʊ], a vowel partway between [u], the vocalic counterpart of the glide w, and mid o.

There are other hypothetical and less likely approaches to Tangutizing Miyako.

One is to translate the Chinese characters 宮古 'palace ancient' into their Tangut equivalents: e.g.,


1623 0429 2vaq1 2nwo4 'palace ancient' (which happens to have Tangut noun-adjective order!).

(Li Fanwen's Chinese-Tangut index has a typo; it lists 0428 as the equivalent of 古 'ancient'.)

Another - the least likely of all - is to phonetically transcribe the Japanese name:


5026 5314 2946 1mi4 2a4 1ko1

I have used the Tangut characters for transcribing the Sanskrit syllables mi, ya, and ko. 5314 was probably phonetically something like [ja].

But how would the Tangut have known that 宮古 was read Miyako?

All of the above assumes the spelling 宮古 existed during the heyday of the Tangut. But I don't know how old it is.

JAROSZ ON NEVSKY ON MIYAKO

I was planning to write a follow-up to this post using the Tangut Yinchuan font. But I ran out of time, so I'll merely link to Aleksandra Jarosz' 2015 PhD dissertation Nikolay Nevskiy's Miyakoan Dictionary reconstruction from the manuscript and its ethnolinguistic analysis: Studies on the manuscript (via Bitxəšï-史). Although it is obviously about 宮古 Miyako, its profile of Nevsky is still of interest to Tangutologists. It is no wonder that he "succeeded in deciphering the highly complicated, Chinese-character-inspired and by then largely unintelligible script of the medieval Xixia kingdom, the homeland of Tangut speakers" given that he

... was a very prolific and dedicated scholar, remembered by his colleagues and informants alike as one truly open-minded and able to grasp the cultures and languages of the subjects of his study almost intuitively. He was also a brilliant multilingual speaker, reportedly having mastered as many as sixteen Asiatic languages (apart from Japanese including Tibetan, Mongolian, Manchu, Pali, Korean and Giliak), as well as English, German, French and Latin (Kanna 2008:167). He acquired his first Orient languages as early as in the times of his Rybinsk gymnasium (post-1900), when he learned Tatar from a local family of native speakers, as well as mastered Arabic alphabet through self-study (Katō 2011:18). (p. 19)

And the Tangutologist and Khitanologist Viacheslav Zaytsev appears on page 8 and in the acknowledgements!

Next: How to write 'Miyako' in Tangut.

TANGUT AVIAN ANATOMY

About twenty-five years ago I learned the following method to convert base-10 numerals from 1 to 60 into their Chinese sexagenary equivalents. The coming Chinese new year is the 34th in the 60-year cycle. The first character of the sexagenary term is the Heavenly Stem for the second digit: i.e., 丁 'fourth Heavenly Stem'. The second character is the Earthly Branch for X, a number between 1 and 12:

(X + (Y * 12)) = 34

X turns out to be 10 (and Y is 2), so the second character of the sexagenary term is 酉 'tenth Earthly Branch': i.e., 'rooster'.
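The method above can be sketched in a few lines of Python. The function name is my own. The stem is picked by the last digit of the number (with 0 counting as the tenth stem), and the branch is what remains after removing multiples of 12, which is X in the equation above.

```python
STEMS = "甲乙丙丁戊己庚辛壬癸"         # ten Heavenly Stems
BRANCHES = "子丑寅卯辰巳午未申酉戌亥"   # twelve Earthly Branches


def sexagenary(n: int) -> str:
    """Convert a position in the 60-year cycle (1 to 60) into stem + branch."""
    if not 1 <= n <= 60:
        raise ValueError("n must be between 1 and 60")
    stem = STEMS[(n - 1) % 10]       # last digit of n; 0 maps to the tenth stem
    branch = BRANCHES[(n - 1) % 12]  # X in X + (Y * 12) = n, with X from 1 to 12
    return stem + branch


print(sexagenary(34))  # 丁酉
```

Subtracting 1 before taking the remainder and indexing from 0 is just a way to make 10, 20, etc. map to the tenth stem and 12, 24, etc. to the twelfth branch instead of to zero.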

The Tangut adopted the sexagenary cycle and somehow found Tangut equivalents for the Heavenly Stems. Last time I wrote about 𗸃 1vi1 'fourth Heavenly Stem'. The reasoning behind choosing 1vi1 eludes me. (I'm assuming the Tangut terms were repurposed existing words rather than just made up.)

On the other hand, the logic behind the Tangut equivalents of the Earthly Branches is transparent: e.g., 酉 'rooster' was simply translated as 𗿼 2262 1jwon3 'bird'. (The Japanese did the same thing; they read 酉 as the native word tori 'bird'.)

2262 has three components: 𘤊𘤏𘪣.

Andrew West has written about the first at length here; it appears in other characters for words for birds (see below) but also has other associations.

The first and second components 𘤊𘤏 appear in the second entry in the Tangraphic Sea:

𘀑 = 𘀏 + 𘤊 (< 𗿼)

3911 1pu1 'a kind of bird' = left and top of 3909 1pu1 'the name Pu' (phonetic) + left of 2262 1jwon3 'bird'.

3911 in turn is part of the analysis of 3909, the first entry in the Tangraphic Sea:

𘀏 = 𘀑 + 𘦑 (< 𗩝)

3909 1pu1 'the name Pu' = left of 3911 1pu1 'a kind of bird' (phonetic) + right of 2653 1penq 'horn'

You can see those two characters in context at Andrew West's site.

𘤏 might also be semantic for 'bird' in 3911 and even 3909 if the Pu were associated with birds and horns.

The third 𘪣 means 'bird', but like most Tangut semantic components (and unlike its possible inspiration, Chinese 鳥 'bird'), it cannot stand by itself. I don't know why some elements can be independent and others can't. In the case of 2262, the elements appended to 𘪣 'bird' are phonetic:

𗿼 = 𘤊𘤏 (< 𗿤) + 𘪣 (< 𘝋)

2262 1jwon3 'bird' = left and center of 2260 1jwon3 'breeding' + left of 1242 2dzwy4 'wing' (with a slight modification of the top element's bottom right corner)

2262 1jwon3 'bird' sounds like 2260 1jwon3 'breeding' and has 1242 2dzwy4 'wings'.

2260 has a circular derivation:

𗿤 = 𘤊𘤏 (< 𗿼) + 𘣑 (< 𘟢)

2260 1jwon3 'breeding' = left and center of 2262 1jwon3 'bird' (phonetic) + right of 0373 2vi1 'to copulate, mate' (semantic)

So does 1242:

𘝋 = 𘪢 (< 𘝁) + 𗟎

1242 2dzwy4 'wing' = left of 0673 2thy1 'wing' (semantic) + bottom right of 4289 2dzwy2 'winding corridor' (phonetic; is that component [Boxenhorn code: caigie] in Unicode?)

4289 is obviously from 1242 as a phonetic plus the semantic component 𘡩 'wood' (a corridor can be a wooden structure). The Tangraphic Sea confirms my guess:

𗟎 = 𘡩 (< 𗞵) + 𘝋

4289 2dzwy2 'winding corridor' = top of 4364 1rur4 'wooden framework' (semantic) + all of 1242 2dzwy4 'wing' (phonetic)

(The Boxenhorn code for the bottom right of 4289 is tok, not caigie, but the two components look alike to me.)
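The circular derivations above can be found mechanically. Here is a small Python sketch that records only the six Tangraphic Sea analyses quoted in this post, keyed by Li Fanwen number, and lists every pair of characters that are each derived from the other. The dictionary and function names are mine.

```python
# character -> source characters named in its Tangraphic Sea analysis
# (only the six analyses quoted in this post)
SOURCES = {
    "3911": ["3909", "2262"],
    "3909": ["3911", "2653"],
    "2262": ["2260", "1242"],
    "2260": ["2262", "0373"],
    "1242": ["0673", "4289"],
    "4289": ["4364", "1242"],
}


def circular_pairs(sources):
    """Return sorted pairs (a, b) where a is derived from b and b from a."""
    pairs = set()
    for char, parts in sources.items():
        for part in parts:
            if char in sources.get(part, []):
                pairs.add(tuple(sorted((char, part))))
    return sorted(pairs)


for a, b in circular_pairs(SOURCES):
    print(a, "<->", b)
```

This only catches two-character loops. Longer loops, if there are any, would need a proper cycle search over the whole dictionary.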

𗟎 4289 must postdate the less complex 𘝋 1242 'wing'. But does 𘝋 1242 'wing' postdate 𗿼 2262 'bird'? And what is the function of the right side of 1242 (stroke code EACCQBE not in N4636) which is unique to that character? If it is derived from two other characters, why weren't those characters mentioned in the Precious Rhymes of the Tangraphic Sea (the corresponding volume of the Tangraphic Sea has been lost)?

MY FIRST POST IN TANGUT YINCHUAN

I just used Andrew West's file to convert Li Fanwen numbers for Tangut characters into Unicode for the first time to type the Tangraphic Sea analysis of 1vi1 'fourth Heavenly Stem', the first half of the sexagenary term for the Tangut year beginning on 28 January 2017:

𗸃 = 𘣟 (< 𗷰) + 𘧦 (< 𘔁)

0410 1vi1 'fourth Heavenly Stem' = left of 0613 2t-? 'to refuse, remove' + 'fire', left of 4661 1bi4 'third Heavenly Stem'

If you can't see the characters, please install Prof. 景永时 Jing Yongshi's free Tangut font at BabelStone.

It's not surprising that 0410 shares 𘧦 'fire' with 4661 since both the third and fourth Heavenly Stems are associated with fire and were hence called 'red' in Khitan and Jurchen (and ᡠᠯᡤᡳᠶᠠᠨ fulgiyan 'red' and ᡠᠯᠠᡥᡡᠨ fulahūn 'reddish' in Manchu).

But why was 𘧦 'fire' combined with 𘣟 from 2t-? 'to refuse, remove' which is neither (nearly) homophonous with 1vi1 'fourth Heavenly Stem' nor obviously semantically relevant to it? 𘣟 is not among the character components that Nishida (1966) was able to gloss.

𗷰 0613 2t-? 'to refuse, remove' is listed in the Tangraphic Sea as a component in at least two more characters:

𗅯 = 𘠐 (< 𗅉) + 𘣟 (< 𗷰)

2377 1ky4 'to prohibit' = 'not', left of 1906 1non2 (conjunction) (semantic) + left of 0613 2t-? 'to refuse, remove' (semantic)

𘒐 = 𘧉 (< 𘒖) + 𘣟 (< 𗷰)

1462 1lo'1 'cooperation' = 1535 1lo'1 'to gather, assemble' (semantic/phonetic) + all of 0613 2t-? 'to refuse, remove'

There may have been other derivatives of 0613 in the lost 'rising tone' volume of the Tangraphic Sea.

I can understand why 'to refuse' is in 'to prohibit', but what's it doing in 'cooperation'?

And I might expect 'not' + 'to refuse' to represent a word for 'not refuse', but I presume the character is like a double negative.

Lastly, I have no idea what the etymologies for 𘔁 1bi4 'third Heavenly Stem' and 𗸃 1vi1 'fourth Heavenly Stem' are.


I just installed Prof. 景永时 Jing Yongshi's free Tangut font which can be downloaded from BabelStone. Thanks to Prof. Jing for making it freely available and to Andrew West and Michael Everson for mapping his font onto Unicode and extending it to include more characters and even character components.

銀川 Yinchuan 'Silver River' is the modern Mandarin name for the city now on the site of the former Tangut capital. Yinchuan has a 西夏区 Xixia qu 'Western Xia (i.e., Tangut) District' in name only (population 329,310).

I should eventually add Tangut characters in that font to my database of Tangut character readings (download version 1.3.1 here).

TANGUT PHONETIC DATABASE VERSION 1.3.1

I have updated my database of Tangut readings (download version 1.3.1 here) with the following changes:

- the anomalous ren-readings combining r- with a nonretroflex rhyme -en have been replaced by len. See Jacques (2014: 184-185).

- corrected readings for

L2164 and L3965 which had a nonexistent -an' rhyme instead of -y'

Thanks to Andrew West for spotting the wrong reading of L2164.

L5566 which belongs to rhyme 70 (1.67), not rhyme 67 (1.64)

L6027 which had r in the "Nasal/w" column instead of the "Cycle" column

The first change is in version 1.3 which I did not upload. All other changes are new in version 1.3.1.

THREE TANGUT MEATS

I used to think Tangut

5865 1soq1 'three' (Tibetan transcriptions gsoH x 14, gso x 4, so x 2)

was from *k-sum (cf. Japhug rGyalrong χsɯm and Written Tibetan gsum) with a *k- that weakened to *x- and assimilated to the following *s- which conditioned vowel tension (written as -q):

*k-s- > *xs- > *ss- > s-q

The Tibetan transcriptions with g- may reflect a dialect retaining preinitial *k-.

Guillaume Jacques (2014: 197) proposed a pre-Tangut form *sə-svm (whose *v = any vowel but *i) with reduplication like Tagalog ta-tlo < Proto-Austronesian *telu.

(Might the retroflexion of Tangut

2005 21lyr' < *rliXH 'four'

be from a reduplicated *l- that merged with the preinitial *r- that conditioned vowel retroflexion written as -r? Retroflex vowels are so common in Tangut that I suspect they had sources other than *r-.)

But if *k-s- became s- + tension - and *k-obstruent sequences which became aspirates - then how can I account for the aspirate in

3465 1chhi3 'meat' (Tibetan transcription: chi)

whose root had initial *sj- (cf. Written Tibetan sha, Written Burmese sāḥ)?

Maybe *k-s- and *k-sj- had different reflexes:

*k-s- > *x-s- > *s-s- > s-q

*k-sj- > *kʂʰ- > chh- [tʂʰ-]

But I doubt that was the case. There is no external support for *k- in 'meat'.

Perhaps the preinitial of 'meat' was *t- (cf. Japhug tɯ-ɕa), and *t-sj- fused into aspirated chh-. That proposal is not without problems, as we'll see next time.

CORDIAL COMPASSION?

Almost two weeks ago, I realized that

1483 2ne4 'compassion'

in the Tangut text

3457 0478 1483 2323 5404 4625 5302 1siw4 1sho'3 2ne4 1vy1 1la1 2me'4 0L?

'new collect compassion piety record final volume'

sounds almost exactly like

2518 2ne'4 'heart' (Tibetan transcriptions from Tai 2008: 215: gne x 4, ne x 1, nye x 1, gnyeH x 1)

The only difference between the two is the presence of the unknown phonetic quality 'prime' (transcribed as -') in 'heart'.

Are the two words related? In other words, did they have similar forms in pre-Tangut?

Before I can answer those questions, I should survey the phonetic details of 'heart' in Tangut:

- According to Arakawa's hypothesis, Tibetan preinitial g- indicates tone 1, but 'heart' has tone 2

- For twenty years I have suspected, contra everyone else, that the Tibetan preinitials might be taken literally rather than as orthographic devices for tones. Could the transcribed dialect preserve a preinitial *k- (written as g- following Tibetan spelling conventions; kn- is un-Tibetan) lost in standard Tangut? Perhaps preinitials in the transcribed dialect normally corresponded to tone 1 in standard Tangut, but 'heart' developed tone 2 in standard Tangut because it had lost its preinitial before tonogenesis.

- If Tangut grades were like Chinese grades as I interpret them, Grade IV was the most palatal. But exactly how this palatality was expressed is unclear. Did Tibetan nye ~ ne transcribe [ɲe], [nʲe], [nie], etc.?

- What Tangut sound did Tibetan final -H transcribe? The mysterious 'prime'?

On to pre-Tangut:

-e'4 with 'prime' has six sources:

*Cɯ-...-aŋX, *Cɯ-...-eŋX, *Cɯ-...-enX

*(Cɯ-)...-jaŋX, *(Cɯ-)...-jeŋX, *(Cɯ-)...-jenX

-e4 without 'prime' has only two sources (and yet is more common!):

*Cɯ-...-aŋ and *(Cɯ-)...-jaŋ

I no longer think *Cɯ-...-an(X) is a source of -e(')4.

- Exterior cognates of 'heart' point to a front vowel and *-ŋ: e.g., Tibetan snying.

- But they also point to *s- and not *k-.

- STEDT's Proto-Tibeto-Burman roots #251, #689, and #1385 have *s/k-, but the data on the site don't seem to support *k-.

- And if pre-Tangut had *s- in 'heart', that consonant would condition tension absent from 2ne'4 (i.e., 'heart' would be *2neq4).

Taking all of the above into account, the pre-Tangut word for 'heart' was

*kɯ-neŋXH or *k(ɯ)-njeŋXH

with a front vowel like Tibetan snying. (*-H conditioned tone 2.)

But 'conscience' could not be

*kɯ-neŋH or *k(ɯ)-njeŋH

because those forms would have developed into *2ni4, not 2ne4. (Whatever *X was blocked the raising of *e in *eŋX.)

Moreover, it is improbable that a nonbasic word 'conscience' would be derived from a basic word 'heart' via subtraction.

It is more probable that 'conscience' is an unrelated word with a different rhyme 

*Cɯ-naŋX or *(Cɯ)-njaŋX

that came to sound like 'heart'.

A VEXING VOLUME

At the beginning of "An Interesting Reading", I mentioned the Tangut text

3457 0478 1483 2323 5404 4625 5302 1siw4 1sho'3 2ne4 1vy1 1la1 2me'4 0L?

'new collect compassion piety record final volume'

whose final character has an unknown reading. 0- indicates an unknown tone. L- indicates an unknown Class IX initial (l- lh- ld- r- z- zh-). ? indicates an unknown rhyme. I am going to start using -0 for an unknown grade.
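Since my notation packs several pieces of information into each reading, here is a rough Python sketch of how a reading might be split up. It assumes only what is stated above: a leading tone digit (0 if unknown), a body containing the initial and rhyme, and an optional trailing grade digit (0 or absent if unknown). The regular expression and function name are my own and do not try to separate the initial from the rhyme.

```python
import re

# tone digit, then initial + rhyme (lazy), then an optional grade digit
READING = re.compile(r"^([012])(.+?)([0-4])?$")


def parse_reading(reading: str) -> dict:
    """Split a romanized reading into tone, body, and grade.

    Unknown tone (0) and unknown grade (0 or missing) are returned as None.
    """
    match = READING.match(reading)
    if match is None:
        raise ValueError(f"not a reading in this notation: {reading!r}")
    tone, body, grade = match.groups()
    return {
        "tone": int(tone) or None,
        "body": body,
        "grade": int(grade) if grade not in (None, "0") else None,
    }


print(parse_reading("1sho'3"))
print(parse_reading("0L?"))
print(parse_reading("0L?0"))
```

The last two calls give the same result, which is the point of adding -0: it makes the unknown grade explicit without changing what is known about the reading.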

This character appears as an initial speller in this fanqie chain:


1165 1luq3 'to rub, knead' = 5302  0L?0 + 0500 1tsuq4

(There is no phonemic distinction between Grades III and IV in rhyme 62 -uq3/4; the grade is automatically determined by the initial.)


4550 1lheq4 'sorcerer' = 1165 1luq3 + 3318 1cheq3

(There is no phonemic distinction between Grades III and IV in rhyme 64 -eq3/4; the grade is automatically determined by the initial.)

I have converted the readings from Gong's reconstruction from Li Fanwen (2008) into my system. I am unaware of any transcriptive evidence for 1165 and 4550 - or even any attestations of either character outside dictionaries. How can 1165 have a known initial if the initial of its initial speller is unknown? Why not reconstruct 1165 as 1Luq3/4? (l- r- zh- would be followed by Grade III and ld- lh- z- by Grade IV.) And why does the initial of 4550 (lh-) not match the initial of its initial speller 1165 (l-)? Shouldn't 4550 be reconstructed as 0Leq3/4?
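The two problems I just raised can be stated as a simple consistency check. The Python sketch below covers only the three characters in the fanqie chain above, with initials as given in the readings I quoted, and "L" standing for an unknown Class IX initial. The table and function names are hypothetical.

```python
# initials taken from the readings quoted above; "L" = unknown Class IX initial
INITIALS = {"5302": "L", "1165": "l", "4550": "lh"}

# character -> its initial speller in the fanqie chain above
INITIAL_SPELLER = {"1165": "5302", "4550": "1165"}


def check_initial(char: str) -> str:
    """Compare a character's reconstructed initial with its speller's initial."""
    own = INITIALS[char]
    speller = INITIALS[INITIAL_SPELLER[char]]
    if speller == "L":
        return "unsupported"  # the speller's own initial is unknown
    return "match" if own == speller else "mismatch"


for char in ("1165", "4550"):
    print(char, check_initial(char))
```

Neither reading comes out as a match: the initial of 1165 has no known support from its speller, and the initial of 4550 contradicts its speller.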

How many other Tangut character readings are shaky?

AN INTERESTING READING

I've been filling holes in my Tangut character folder lately. So far I have images for 3,634 out of the 6,125 Tangut characters in Unicode 9.0: i.e., about 59% of the total. The fact that I haven't needed images for two out of five characters even after nearly eleven years of blogging about Tangut indicates how skewed the distribution of characters is. I estimate the number of distinct characters in Guillaume Jacques' index to the

3457 0478 1483 2323 5404 4625 5302 1siw4 1sho'3 2ne4 1vy1 1la1 2me'4 0L?

'new collect compassion piety record final volume'

to be about a thousand.

The reading of 5302 is unknown. It is in the section for characters without homophones in the ninth chapter of Homophones, so it must have an L-type initial (l- lh- ld- r- z- zh-). Beyond that nothing else can be said. I know of no transcriptions of it.

The reading of the first half of

4006 5383 0TS? 2se4 'interest' (in the financial sense)

should be unknown. Yet Li Fanwen (2008: 643) lists Gong's reconstructed reading as 2tswər. This reading was not in Li Fanwen (1997: 740). Kychanov and Arakawa (2006: 367), on the other hand, list the Sofronov-style reconstruction 2?ə̣.

How does anyone know what the rhyme of 4006 is? The character does not appear as a final speller in any fanqie. I do not know of any transcriptions of it that could even give us a vague idea of what the rhyme might have been. And transcriptions would not indicate the tone which I have written as 0- for unknown.

I write the initial as capital TS- to indicate that it belonged to class VI (alveolar sibilants other than z- which I suspect might have been lateral [ɮ]). But I don't know which class VI initial it had: ts-, tsh-, dz-, or s-.

Lastly, how does anyone know what 4006 5383 means? Neither edition of Li cites any attestations outside dictionaries, and Li (1997: 740, 974) lists no definitions for either 4006 or 5383. Have Kychanov and/or Arakawa found such attestations and identified the meaning from context?

SINO-TANGUT PHONOLOGICAL PARALLELS (PART 1)

At a glance, Tangut and Tangut period northwestern Chinese (hereafter simply 'Chinese') phonology appear to be similar: 

- They had largely overlapping consonant inventories with a three-way distinction between voiceless unaspirated, voiceless aspirated, and prenasalized voiced: e.g., p- : ph- : b- [mb].

Tangut, however, had more consonants: gh-, lh-, ld-, r-, z- [ɮ].

And Chinese had an f- absent in most Tangut reconstructions (the exceptions being Nishida's and Arakawa's).

- They had six basic vowel types: u, i, a, y, e, o.

- These vowels had four types of variations ('grades').

Tangut, however, had further variations absent from Chinese: tension, retroflexion, and the mysterious quality that I write with -' and call 'prime'.

- They contrasted oral and nasal vowels.

- Their syllables had the structure C(w)V(G); they only permitted -w and perhaps -j in coda position.

Despite many common features, it would be an exaggeration to say that the two languages share a common phonology. Notice that I have not mentioned tones. There does not seem to be any correlation between the two 'tones' of Tangut and Chinese tonal categories: e.g., Chinese 龍 *2lon3 'dragon' was borrowed twice with both tones:

4897 1lon3 and 4203 2lon3

This could imply that Tangut and Chinese tones sounded very different, making one-to-one mapping between them impossible.

Or perhaps Tangut had phonations (plain vs. breathy?) instead of tones despite the use of 'tone' in the Tangut phonological tradition. The Tangut couldn't hear tones because they didn't have any. (I am now skeptical of the phonation hypothesis that I came up with in the late 90s. If Tangut had phonation and Chinese didn't, why didn't the Tangut simply borrow and transcribe all Chinese tones with Tangut clear phonation?)

One last possibility - as yet unexplored - is that the Tangut were sensitive to sandhi variants of tones. Suppose, for instance, that Tangut and Chinese tones 1 and 2 were similar, and that Chinese tone 1 became tone 2 before tone 4: 龍栢 */1lon3 4pe2/ > [2lon3 4pe2] 'dragon cypress'. Then it would make sense to borrow that disyllabic word as

4203 4119 2lon3 1pi2

with the second tone while borrowing monosyllabic 龍 /1lon3/ = [1lun3] 'dragon' as

4897 1lon3

with the first tone. But why, then, was Chinese 龍栢 */1lon3 4pe2/ 'dragon cypress' transcribed (as opposed to borrowed) in the Timely Pearl as

4897 5970 1lon3 1pi2

with the first tone rather than the second? Here are five explanations:

1. The most boring, namely, that this is a random error.

2. Hypercorrection: the transcriber knew that the Chinese word for 'dragon' had tone 1 and might have assumed that tone 2 in the Tangut loan deviated from the Chinese (when in fact it reflected Chinese tone sandhi).

3. The transcription reflects a careful Chinese reading pronunciation "1lon3 ... 4pe2" without tone sandhi.

4. The transcription reflects a variant Chinese pronunciation without tone sandhi - perhaps from a dialect slightly different from the source of the Tangut borrowing.

5. The borrowing reflects a slightly earlier stage of Chinese with tone sandhi and the transcription reflects a slightly later stage without tone sandhi (and with the original first tone restored by analogy with 'dragon' in isolation?).
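The sandhi hypothesis can be stated as a one-line rule and checked against the two Tangut forms. The following Python sketch is only an illustration: the function name `apply_sandhi` is made up here, and the rule (tone 1 becomes tone 2 before tone 4) is the supposition from above, not an established fact about Tangut period Chinese.

```python
def apply_sandhi(tones):
    """Apply the supposed rule: Chinese tone 1 surfaces as tone 2 before tone 4."""
    surface = list(tones)
    for i in range(len(tones) - 1):
        if tones[i] == 1 and tones[i + 1] == 4:
            surface[i] = 2
    return surface


# 龍栢 */1lon3 4pe2/: underlying tones 1 + 4
compound = apply_sandhi([1, 4])  # first syllable surfaces with tone 2

# 龍 /1lon3/ in isolation: nothing follows, so the rule cannot apply
isolated = apply_sandhi([1])     # tone 1 is kept
```

Under this sketch the loan 4203 2lon3 matches the surface tone of the compound, while the Timely Pearl transcription 4897 1lon3 matches the underlying tone. That mismatch is what the five explanations try to account for.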

The tones are not the only differences between Tangut 2lon3 1pi2 'dragon cypress' and its Chinese source lon3 4pe2. I'll explore the others in part 2.

DISSECTING A TANGUT MARRIAGE (PART 5)

If 5051 (second half of 1y4 1naq4 'marriage'; Boxenhorn code: biogeodex) could be abbreviated to resemble 2544 'sage'  (Boxenhorn code: geo) in 0532 2ge4 'to marry' (Boxenhorn code: hosgeo),


why wasn't it abbreviated that way in other derivatives?

3657 1y4 (first half of 1y4 1naq4 'marriage'; Boxenhorn code: giibiogeo)

1625 2tuq4 'to mate, marry' (Boxenhorn code: fosbiogeo)

5975 1naq4 'parallel, weft' (Boxenhorn code: palbiogeo)

In other words, why do those three characters have a 'hat' (bio) absent in 0532?

I think 3657 needed a 'hat' (bio) to distinguish it from an existing character without it:

2449 2bi1 'sun' (Boxenhorn code: giigeo)

2449 must precede 3657 in the chronology of tangraphic creation.

But there are no characters with the structures





so in theory the 'hats' (bio) are redundant, though their presence does make the connection of 1625 and 5975 to 5051 more transparent.

I am reminded of the inconsistency of simplification in the postwar Japanese script:

- 獨 'alone' was simplified to 独 (with the phonetic 蜀 'the state of Shu' reduced to 虫 'bug')

- but 濁 'muddy' was not simplified to 浊 even though no such character already exists (and years later, 濁 was simplified to 浊 in the PRC).

There is no deep meaning behind the inconsistency of 独 and 濁. Perhaps there is none behind the inconsistency of

0532 without a 'hat' (bio)

on the one hand and

1625 and 5975 with 'hats' (bio)


Many Tangut marital characters from the previous parts contain

2544 2shen4 'sage' < Chinese *3shen3

and if one had never known about 2544, one might guess that it was a semantic component 'marry'. But it acquired that secondary function as an abbreviation of 5051:


5051 1naq4 = 3657 2705 2546 2705 1y4 2ber'4 1naq4 2ber'4

(first half of 1y4 1naq4 'marriage') right + 'god' right

2544 'sage' is semantic in 2546 'god', the phonetic of 5051. I have no doubt about the first half of the Tangraphic Sea analysis of 2546:


2546 1naq4 = 2544 1602 0149 0737 1naq4 2ngorn1 2wer1 1chhen3

'sage' all + 'protect' bottom

But I have doubts about the second half. 0149 must be derived from 2546 rather than the other way around. The 'person' on the right of 2546 is either simply 'person' (but why would 'god' have 'person'?) or an abbreviation of one of the 1,186 (!) tangraphs containing 'person'.

Someone (I?) should try to reconstruct a chronology of the derivation of tangraphs based on the Tangraphic Sea derivations plus common sense. Here's a sliver of that chronology:

In words: 2544 begat 2546, which in turn begat 5051 and 0149.

5051 begat 3657, 1625, 5975, and 0532 (but why does 0532 lack the 'horned hat' of the others?).

5138 begat 5138 1gu'1, first syllable of 1gu'1 1chhiw4, the name of a Tangut god (1chhiw4 is 'six').
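A chronology of this kind is a dependency graph, so a topological sort gives one possible creation order. The Python sketch below is a minimal illustration using the standard `graphlib` module (Python 3.9+). The names `BEGAT`, `chronology`, and `is_consistent` are hypothetical, only the 2544 and 5051 lines above are encoded, and the 5138 line is left out.

```python
from graphlib import CycleError, TopologicalSorter

# "X begat Y" statements from the text: source -> derivatives
BEGAT = {
    "2544": ["2546"],
    "2546": ["5051", "0149"],
    "5051": ["3657", "1625", "5975", "0532"],
}


def chronology(begat):
    """Return one creation order in which every source precedes its derivatives."""
    predecessors = {}
    for source, derivatives in begat.items():
        predecessors.setdefault(source, set())
        for derivative in derivatives:
            predecessors.setdefault(derivative, set()).add(source)
    # static_order() raises CycleError if the derivations are circular
    return list(TopologicalSorter(predecessors).static_order())


def is_consistent(begat):
    """True if the derivations can be ordered without circularity."""
    try:
        chronology(begat)
        return True
    except CycleError:
        return False


order = chronology(BEGAT)
```

Feeding in Tangraphic Sea derivations as given would make `is_consistent` fail wherever two characters are derived from each other, which is where "common sense" has to break the circle.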

Next: Why don't all married sages wear hats?

DISSECTING A TANGUT MARRIAGE (PART 3)

As I wrote in part 2, I thought that 5051 1naq4 was simply phonetic in its homophone 5975:


5975 1naq4 'parallel, weft' = 5938 3936 5051 3936 2ge4 1pha1 1naq4 1pha1

'classical text, warp' left + (second half of 1y4 1naq4 'marriage') left

But then I discovered that 5938, listed as the source of the left side of 5975, had a homophone

0532 2ge4 'to marry'

which the Tangraphic Sea lists as a definition for the first half of

3657 5051 1y4 1naq4 'marriage'.

Is 0532 'to marry' a metaphorical extension of 5938 'warp' (in the sense of weaving)? Li (1997: 104) defined 0532 as 'weave, marry' - to which STEDT added '(join in marriage)' - but the revision of the entry for 0532 in Li (2008) has the definition 'to marry, to unite in marriage' without any reference to weaving.

If 0532 is originally a weaving term, then could 5051 1naq4 of 3657 5051 1y4 1naq4 'marriage' also originally be a weaving term - specifically, an extended usage of 5975 1naq4 'parallel, weft'?

3657 1y4 is attested as an independent word 'marriage, matchmaker, relatives by marriage'. 3657 5051 1y4 1naq4 'marriage' is thus originally 'marriage weft' with the first half clarifying the metaphorical use of the second half which does not occur on its own in the sense of 'marriage'.

Do 5938 2ge4 < *Nɯ-Kan/ŋ ~ *Cɯ-ŋgan/ŋ 'warp'* and 5975 1naq4 < *Sɯ-naC 'weft' have cognates outside Tangut? Unfortunately, neither 'warp' nor 'weft' is in the rGyalrongic Languages Database. Both are at STEDT, but I can't find any cognates there or in Guillaume Jacques' Japhug dictionary which lists tɤ-ʁjar 'warp' and tɯ-jlɤβ 'weft'.

*I reconstruct a presyllable with *ɯ to condition Grade IV after a velar. However, I do not know whether that presyllable had a nasal initial *N- or preceded a nasal. I also do not know if the velar stop after the nasal was originally voiced or not. In any case, Tangut g- is from *ŋg- which may in turn have more complex origins.

DISSECTING A TANGUT MARRIAGE (PART 2)

The character for the second half of

3657 5051 1y4 1naq4 'marriage'

has two probable derivatives besides the first character:


1625 2tuq4 'to mate, marry' = *0482 3936 5051 3936 2dzen4 1pha1 1naq4 1pha1

*'to copulate' left + (second half of 1y4 1naq4 'marriage') left?


5975 1naq4 'parallel, weft' = 5938 3936 5051 3936 2ge4 1pha1 1naq4 1pha1

'classical text, warp' left + (second half of 1y4 1naq4 'marriage') left

The analysis of 1625 is my guess since it is one of the many characters whose analysis was in the lost second tone volume of the Tangraphic Sea.

0482 is the clarifier of 1625 in Homophones, so it is certain that the Tangut considered the two to be semantically related even if 0482 was not actually in the analysis of 1625.

1625 2tuq4 should go back to pre-Tangut *Sɯ-to-H:

*S- conditioned the tension of the vowel transcribed as -q.

*-ɯ- conditioned Grade IV in lower vowels (*a, *e, *o) after dentals

I am assuming that the raising of *o to *u predated the conditioning of Grade IV.

I could be wrong. Maybe Grade IV was conditioned by a raised *o after a dental:

*S(ɯ)-to-H > *S(ɯ)-tu-H > 2tuq4

If so, maybe there was no *ɯ after *S-.

*-o raised to -u (Jacques 2014: 206); whether this occurred before or after Grade IV is uncertain.

*-H conditioned tone 2; it may ultimately be from *-ʔ or *-h (< *-s).

*Sɯ-to-H might go back to an even earlier *Sɯ-ton-H if it is cognate to forms for 'to marry' like

Somang rGyalrong ston muŋ ka-pa

Daofu sto lmo və  (is v- a lenited *p preserved in Somang?)

Xinlong Queyu ste⁵⁵ rmu⁵⁵ vi¹³ (did *o front before *-n?; v- < *p-?)

and if *Cɯ-...-on merged with *Cɯ-...-o into -u3/4. That would be parallel with the merger of *Cɯ...-en and *Cɯ...-e into -i3/4, and one could propose a general rule:

*Cɯ-...mid vowel + -n > Grade III/IV high vowel
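The conditioning factors listed above can be chained into a small worked derivation. The Python sketch below is only an illustration under stated assumptions: the function name `derive` and the set `DENTALS` are hypothetical, tone 1 in the absence of *-H is my assumption (only *-H > tone 2 is stated above), and the sketch covers nothing beyond mid vowels after dental initials with a *ɯ presyllable.

```python
RAISED = {"e": "i", "o": "u"}    # mid vowels raise after a *ɯ presyllable
DENTALS = {"t", "th", "d", "n"}  # assumed membership, for illustration only


def derive(prefix, initial, vowel, nasal_coda, final_h):
    """Derive a reading from pre-Tangut *{prefix}ɯ-{initial}{vowel}(n)(-H)."""
    if initial not in DENTALS or vowel not in RAISED:
        raise ValueError("sketch only covers mid vowels after dental initials")
    tone = 2 if final_h else 1              # *-H conditions tone 2; tone 1 otherwise is assumed
    tension = "q" if prefix == "S" else ""  # *S- conditions tension, written -q
    high = RAISED[vowel]                    # *-o(n) > -u and *-e(n) > -i
    grade = 4                               # *ɯ conditions Grade IV after dentals
    # nasal_coda is deliberately ignored: *-on and *-o are proposed to merge
    return f"{tone}{initial}{high}{tension}{grade}"


without_n = derive("S", "t", "o", nasal_coda=False, final_h=True)  # *Sɯ-to-H
with_n = derive("S", "t", "o", nasal_coda=True, final_h=True)      # *Sɯ-ton-H
```

Both inputs yield the same reading, which is the point of the proposed merger: the Tangut form alone cannot tell us whether the earlier form had *-n.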

5051 must be semantic in 1625 since the two sound nothing alike. Conversely, 5051 must be phonetic in 5975. But could it be something more? I didn't think so at first. I'll explain why I changed my mind next time.

DISSECTING A TANGUT MARRIAGE (PART 1)

The two halves of

3657 5051 1y4 1naq4 'marriage'

are written similarly, so it's not surprising that they have circular derivations in the Tangraphic Sea:


3657 1y4 = 3436 2705 4973 3936 1sa'1 2ber'4 1naq4 1pha1

(second half of 1ne4 1sa'1 'close relative') right + (second half of 1y4 1naq4 'marriage') left


5051 1naq4 = 3657 2705 2546 2705 1y4 2ber'4 1naq4 2ber'4

(first half of 1y4 1naq4 'marriage') right + 'god' right

2546 is clearly phonetic in its homophone 5051. So I think the sequence of character creation was

2546 > 5051 > 3657

though I am surprised the character for a second syllable was devised before the character for a first syllable.

Why was 5051 abbreviated in 3657? Because it was no longer phonetic, so there was no longer any need to keep all of 2546 under the 'horned hat'? Because the right-hand 'person' component (Boxenhorn code: dex) is so common (it appears in one out of five Tangut characters) that it is almost expendable? In any case, 5051 doesn't appear in its entirety as a component of any character.

Next: Other instances of 'depersonalized' 5051.

EATING BEGINS WITH LOVE, NOT MARRIAGE

Thanks to Guillaume Jacques for catching my mistake. The correct fanqie for 'eat' from "The Past and Present Sound of Eating in Tangut" (Part 1 / Part 2) is


4517 1dzi3 'eat' = 4973 1dzu4 'love' + 0932 1i3 'many, more, much'

The correct initial speller 4973 is visually very similar to 5051, the erroneous initial speller that I posted which represents the second half of the disyllabic word

3657 5051 1y4 1naq4 'marriage'

I will take a closer look at this word starting tomorrow.

Alas, the enigma of the final speller remains. Why is it Grade III instead of Grade IV after dz-?

THE 'RIGHT' RHYME (PART 6) / TANGUT PHONETIC DATABASE VERSION 1.2

After this I think I'll really be done with the topic of Tangut rhyme 101 (1.93/2.86) for some time.

I have updated my database of Tangut readings (download version 1.2 here) to incorporate the changes I have proposed in this series:

- the reinterpretation of rhyme 101 as -er' instead of -ir' (see part 1)

- the reassignment of

2705 'to help; right side of character (i.e., assistant)' and 2928 'to explain, note' (with 'speech' on its right side; probably a different spelling of a specialized usage of 'to help')

from rhyme 2.54 to 2.86 (see "Explaining the 'Right' Reading")

I didn't upload version 1.1 in which I replaced a lot of symbols for nasalization with glides after mid vowels, bringing my transcription closer to Gong's reconstruction: e.g.,

-en > -ey (corresponding to Gong's -əj, -iəj, -jɨj)

-on > -ow (corresponding to Gong's -ow, -iow, -jow)
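The two replacements can be written as a single substitution over readings in this site's notation. The Python sketch below is a guess at the general shape of such a conversion, not the procedure used for version 1.1: the names `nasal_to_glide`, `GLIDE_FOR`, and `NASAL_AFTER_MID` are hypothetical, and the sketch converts every -en and -on it finds, which may be broader than "a lot of" them.

```python
import re

GLIDE_FOR = {"e": "y", "o": "w"}

# n directly after e or o, followed only by optional q, r, ' marks and the grade digit
NASAL_AFTER_MID = re.compile(r"([eo])n(?=[qr']*[1-4]$)")


def nasal_to_glide(reading):
    """Rewrite -en as -ey and -on as -ow in a reading such as '1lon3'."""
    return NASAL_AFTER_MID.sub(
        lambda m: m.group(1) + GLIDE_FOR[m.group(1)], reading
    )


examples = {r: nasal_to_glide(r) for r in ["1lon3", "2shen4", "1naq4", "2ngorn1"]}
```

Readings without a nasal directly after a mid vowel, such as 1naq4 and 2ngorn1, pass through unchanged in this sketch.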

I have retained the nasalization from 1.0 in 1.2.

THE 'RIGHT' RHYME (PART 5)

I didn't expect to write a five-part series on this topic, but I forgot to mention the Tangut-internal evidence on the rhyme in part 4, so it's getting its own part.

"It" consists of alternations between rhyme 101 and other rhymes (70 -iq3/4 and 84 -ir3/4) in what superficially appear to be synonym pairs (Gong 2002: 103):


1. 5683 2er'4 ~ 5209 2iq4 'to stretch, lengthen'


2. 1928 1ler'3 ~ 5850 1liq3 'to rub'


3. 5742 1tser'4 ~ 3641 1tsir4 'to choose'

Gong reconstructed rhyme 101 as -iir which matched the i of the non-101 members of those pairs. However, the placement of rhyme 101 in the Tangraphic Sea (see part 1) and the Tibetan and Chinese transcription evidence (see parts 2 and 3) point to a nonhigh vowel. The merger of *-ir' with *-er' that I first proposed in part 1 can account for the vowel mismatch.

Those three pairs could be reconstructed in accordance with the proposals in part 4 as

1. 'to stretch, lengthen'

*r((ɯ)-s)ɯ-ʔa-X/*r((ɯ)-s)ɯ-ʔe(n/ŋ)-X ~


2. 'to rub'

*r((ɯ)-s)ɯ-la-X/*r((ɯ)-s)ɯ-le(n/ŋ)-X ~


3. 'to choose'

*rɯ-tsa-X/*rɯ-tse(n/ŋ)-X ~



*Cɯ-tsar-X/*Cɯ-tser-X ~


All three pairs require a *ɯ-vowel presyllable to condition the raising of *a and/or *e to their Grade III/IV reflexes.

The rhyme 101 members of pairs 1 and 2 must have had a prefix *r(ɯ)- to condition retroflexion absent in their rhyme 70 counterparts ending in -iq. If their bases had *sɯ- prefixes, then there is no need to reconstruct *ɯ after *r- since the *ɯ of *sɯ- would be sufficient to condition Grades III/IV. But the possibility of *rɯ-sɯ- cannot be ruled out.

All three pairs involve the presence or absence of the mysterious factor *X that I have arbitrarily written at the ends of syllables.

Without external evidence, there is no way to narrow down the possibilities.

And without narrower definitions, there is no way to be sure about the functions of the various affixes. (*X may not have been an affix, though for convenience I write it as if it were a suffix.)

THE 'RIGHT' RHYME (PART 4)

I originally wanted to end this series with comparative evidence for Tangut rhyme 101 (1.93/2.86) words, but I don't know of any. Which is not surprising as there are only thirteen characters with a total of five different readings ending in rhyme 101 (Arakawa 1997: 91):

Homophones initial class | Tone 1 | Tone 2
I | (none) | 2ber'4
V | (none) | 2ker'4
VI | 1tser'4 | 2tser'4
IX | 1ler'3 | (none)

Grades III and IV are normally in complementary distribution. If there are no Grade III/IV minimal pairs in a rhyme, I assign Grade III to Class II and VII consonants and the Class IX consonant l-. The default grade for all other initials including the Class IX consonant lh- is IV. This assignment parallels the general pattern of distribution of initials in rhymes that have Grade III and IV minimal pairs. The different distribution of l- and lh- suggests that they did not simply differ in terms of voicing. I think l- was velar [ɫ] (as in Nishida's 1ɫĭə̣r corresponding to my 1ler'3) whereas lh- was a fricative [ɬ]. (Note that Nishida reconstructed l- and ɫ- as distinct initials, whereas I think /l/ was always [ɫ] except before Grade IV rhymes where it was [l].)
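The default assignment described above amounts to a small lookup. The Python sketch below restates it; the function name `default_grade` is hypothetical, and the rule applies only to rhymes without Grade III/IV minimal pairs.

```python
GRADE_III_CLASSES = {"II", "VII"}


def default_grade(initial_class, initial):
    """Default grade for a rhyme with no Grade III/IV minimal pairs."""
    if initial_class in GRADE_III_CLASSES:
        return 3
    if initial_class == "IX" and initial == "l":
        return 3  # l- patterns with Classes II and VII
    return 4      # all other initials, including Class IX lh-


# The five rhyme 101 readings: (class, initial) -> grade
rhyme_101 = {
    ("I", "b"): default_grade("I", "b"),
    ("V", "k"): default_grade("V", "k"),
    ("VI", "ts"): default_grade("VI", "ts"),
    ("IX", "l"): default_grade("IX", "l"),
}
```

This reproduces the grades in the readings 2ber'4, 2ker'4, 1tser'4/2tser'4, and 1ler'3.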

Nishida (1964: 67) was the first to identify rhyme 101 as retroflex, and that classification has been carried over into the reconstructions of Arakawa, Gong, and this site. (Sofronov does not reconstruct retroflexion in Tangut vowels in any of his three reconstructions.)

Vowel retroflexion has two sources in Tangut: (pre)initial *r- and final *-r. The r- of Tibetan transcriptions of Tangut rhyme 101 syllables may directly reflect a preinitial r- preserved in a nonstandard Tangut dialect (see part 2). Preinitial *r- may have been *rɯ- with a high vowel conditioning Grade III/IV.

Nishida (1964: 67) also reconstructed tension in rhyme 101. Gong reconstructed preinitial *s- as the source of tension in Tangut vowels, though he reconstructed length rather than tension as an extra nonretroflex feature in rhyme 101: -iir. If Nishida and Gong are both right, the sources of rhyme 101 syllables would have preinitial *s(-r)- or *(r-)s-.

I can't rule out Nishida's tension, but I do not think Gong's length was present in this rhyme or any other, at least not during the Tangut imperial period. Sanskrit has phonemic vowel length that does not correlate with Gong's reconstructed vowel length in the readings of Tangut transcription characters. For now I simply acknowledge that this rhyme was somehow different from regular -er3/4, and indicate that difference with an apostrophe that I call 'prime'. I arbitrarily indicate the unknown source of 'prime' as a final *-X in pre-Tangut. The position of *-X is simply carried over from *-'; the actual conditioning factor of -' could have been anywhere in the syllable. I have never seen -'/*-X correlate to anything in any other Sino-Tibetan language. It is remotely possible that Tangut preserves something lost in the rest of its gigantic family, but I am hesitant to make such an extreme claim until I look harder. Which won't be tonight. All I can say for now is that *-X seems to have blocked raising in rhyme 40, the nonretroflex counterpart of 101:

R10 -i3 < *Cɯ...-en, *Cɯ...-eŋ

R11 -i4 < *Cɯ...-en, *Cɯ...-eŋ

R40 -e'3/4 < *Cɯ...-enX, *Cɯ...-eŋX

and perhaps *Cɯ...-anX, *Cɯ...-aŋX?

Integrating the above proposals (other than *s-) with Guillaume Jacques' sources for -i/e3/4 (in my notation; -ji(j) in his) and my hypotheses of mergers from part 1 and part 3, I have come up with up to twelve possible sources of -er'3/4:

Early pre-Tangut Late pre-Tangut Standard Tangut
*rɯ-CaX *Cir'3/4 Cer'3/4
*r(ɯ)-CukX *Ciwr'3/4
*rɯ-CekX *Cewr'3/4
*rɯ-CanX *Cer'3/4

Obviously not all twelve had to exist in pre-Tangut.

THE 'RIGHT' RHYME (PART 3)

In part 1, I proposed reinterpreting Tangut rhyme 101 (1.93/2.86) as -er' instead of -ir'. A mid vowel e fit the Chinese transcription *3me3 (Timely Pearl 32.2.8) for

2705 'to help; right side of character (i.e., assistant)'

better. And if rhyme 101 had e, that would be the vowel expected after y in rhyme 100 following the usual order of Tangut vowels.

In this part, I will start to look at the transcriptional evidence for rhyme 101.

1. Sanskrit transcription evidence

There isn't any. This tells us that 101 probably didn't sound like anything in Sanskrit. V'-rhymes are rare in Tangut transcriptions of Sanskrit and Vr'-rhymes seem to be nonexistent. That tells me that the unknown quality that I write with a prime symbol was absent from Sanskrit.

2. Tibetan transcription evidence

Tai (2008: 229) lists 22 transcriptions of two tangraphs with rhyme 101:

0467 1tser' 'method, art, skill, dharma'

transcribed as rtsi (x 1), rtse (x 5), rdze (x 1), rc? (x 1)

2698 2tser' 'nature, character'

transcribed as rtse (x 12), ?e (x 1)

Out of 22 transcriptions, 20 end in -e, 1 ends in -i, and 1 ends in an unknown vowel. The obvious conclusion is that the vowel of rhyme 101 was something like Tibetan e. (Why did Gong reconstruct long i instead of long e for rhyme 101?)
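The proportions can be recomputed from the per-form counts given for 0467 and 2698. The Python sketch below tallies them exactly as listed; the names `LISTED` and `tally_final_vowels` are hypothetical. Tallied this way, the counts come to 21 tokens (19 in -e, 1 in -i, 1 unknown), one short of the 22 cited from Tai, so one count in the list may be incomplete; the sketch does not try to guess which.

```python
from collections import Counter

# Tibetan transcriptions and their counts, copied from the two lists above
LISTED = {
    "0467": {"rtsi": 1, "rtse": 5, "rdze": 1, "rc?": 1},
    "2698": {"rtse": 12, "?e": 1},
}


def tally_final_vowels(listed):
    """Count transcription tokens by final vowel; a final '?' counts as unknown."""
    counts = Counter()
    for forms in listed.values():
        for form, n in forms.items():
            last = form[-1]
            counts["unknown" if last == "?" else last] += n
    return counts


counts = tally_final_vowels(LISTED)
total = sum(counts.values())
```

Either way, -e accounts for roughly nine out of ten tokens, so the conclusion about the vowel does not depend on the missing token.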

The preinitial r- of the transcriptions may either indicate the retroflexion of the following vowel or reflect an actual preinitial r- in the transcribed Tangut dialect corresponding to retroflexion in the standard dialect described in the Tangut phonological tradition:

Tangut dialect transcribed in Tibetan | Standard Tangut
CV (plain vowel) | CV (plain vowel)
rCV (r- + plain vowel) | CVr (retroflex vowel)

If the table above is correct, the dialect transcribed in Tibetan had fewer vowels than the standard dialect; the latter had retroflex vowels absent in the former.

I would expect the Chinese transcriptions of rhyme 101 characters other than 2705 to also contain *e, but we will see that is not the case in part 3.

THE 'RIGHT' RHYME (PART 1)

I wouldn't have guessed that

2705 2bir'4 'to help; right side of character (i.e., assistant)'

was transcribed in the Timely Pearl as *3me3 (32.2.8) with -e rather than -i. Why not transcribe it as, say, 彌 *1mbi4 with *-i?

It is a fact that 2705 is listed under rhyme 2.86 in the Precious Rhymes of the Tangraphic Sea.

On the other hand, it is merely a hypothesis that rhyme 2.86 was 2-ir'4: i.e., second tone Grade IV i with retroflexion and some unknown quality marked as -'. Should -ir' be -er' with a mid vowel like the transcription *3me3 of 2705?
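Since the argument turns on what each symbol in a reading like 2bir'4 stands for, here is a minimal Python sketch that unpacks one. It reflects my reading of the notation as used on this site (leading digit = tone, with 0 for unknown; final digit = grade; -n = nasalization; -q = tension; -r = retroflexion; -' = prime). The names `Reading` and `parse_reading` are hypothetical, and the pattern is only meant for simple readings like the ones quoted here.

```python
import re
from typing import NamedTuple

# tone digit, initial (possibly empty), one vowel, optional marks, grade digit
READING = re.compile(r"^([012])([a-z]*?)([uiayeo])([wnqr']*)([1-4])$")


class Reading(NamedTuple):
    tone: int
    initial: str
    vowel: str
    nasal: bool
    tense: bool
    retroflex: bool
    prime: bool
    grade: int


def parse_reading(text):
    """Split a reading such as "2bir'4" into its notational parts."""
    match = READING.match(text)
    if match is None:
        raise ValueError(f"not a simple reading: {text!r}")
    tone, initial, vowel, marks, grade = match.groups()
    return Reading(
        int(tone), initial, vowel,
        "n" in marks, "q" in marks, "r" in marks, "'" in marks,
        int(grade),
    )


right = parse_reading("2bir'4")
```

For 2bir'4 this gives tone 2, initial b-, vowel i, retroflexion and prime both present, and Grade IV. The question raised here is whether the vowel slot should hold e instead.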

In my system, there is no -er' or -ur', though all other Tangut vowel types are represented in -Vr' rhymes: -ir', -ar', -yr', -or'. Did *-ir' and *-er' merge into one rhyme while *-ur' and *-or' merged into another? It would be neat if the merged rhymes were of the same height: e.g., mid -er' and -or' or high -ir' and -ur'.

The usual order of vowels in the Tangraphic Sea is u-i-a-y-e-o (with iw/ew between e and o). This order is not followed in the -Vr' rhymes. Reinterpreting rhyme 101 (1.93/2.86) as -er' (cf. Arakawa's -yer2) would make the -Vr' rhyme order closer to the norm:

88. 1.83 -ar'1

89. 2.75 -ar'3/4

99. 2.84 -ir'1 (a merger of *-ir'1 and *-er'1?)

100. 1.92/2.85 -yr'3/4

101. 1.93/2.86 -er'3/4 (formerly written as -ir'3/4; a merger of *-ir'3/4 and *-er'3/4?)

102. 1.94 -or'1 (a merger of *-ur'1 and *-or'1?)

103. 1.95 -or'3/4 (a merger of *-ur'3/4 and *-or'3/4?)

(The absence of -Vr'2 rhymes may tell us that the phonetic quality of -' was incompatible with Grade II which was at least partly from *-r-. Retroflex vowels were conditioned by [pre]initial *r- or final *-r but not *-r-.)
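How much closer to the norm the reinterpretation brings the -Vr' rhymes can be counted. The Python sketch below counts the places where a rhyme's vowel comes earlier in the usual u-i-a-y-e-o order than the vowel of the rhyme before it. The names `CANONICAL`, `descents`, `OLD`, and `NEW` are hypothetical, and only the seven -Vr' rhymes listed above are included.

```python
CANONICAL = "uiayeo"  # usual vowel order in the Tangraphic Sea


def descents(vowels):
    """Count adjacent pairs that run against the canonical vowel order."""
    ranks = [CANONICAL.index(v) for v in vowels]
    return sum(1 for prev, cur in zip(ranks, ranks[1:]) if cur < prev)


# Vowels of rhymes 88, 89, 99, 100, 101, 102, 103
OLD = ["a", "a", "i", "y", "i", "o", "o"]  # rhyme 101 read as -ir'
NEW = ["a", "a", "i", "y", "e", "o", "o"]  # rhyme 101 read as -er'

old_breaks = descents(OLD)
new_breaks = descents(NEW)
```

With rhyme 101 read as -ir' there are two breaks in the order; with -er' only the one between 89 -ar' and 99 -ir' remains.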

The only remaining oddity in the order of -Vr' rhymes is the placement of 88-89 -ar' not only before 99 -ir' but also in the middle of the -Vr rhyme sequence:

77. -er1

78. -er2

79. -er3/4

80. 1.75/2.69 -ur1

81. 1.76/2.70 -ur4

82. 1.77/2.71 -ir1

83. 1.78 -ir2

84. 1.79/2.72 -ir3/4

85. 1.80/2.73 -ar1

86. 1.81 -ar2

87. 1.82/2.74 -ar3/4

88. 1.83 -ar'1

89. 2.75 -ar'3/4

90. 1.84/2.76 -yr1

91. 1.85 -yr2

92. 1.86/2.77 -yr3/4

93. 1.87/2.78 -ewr1

94. 1.88/2.79 -iwr4

95. 1.89/2.80 -or1

96/97. 1.90/2.81 -or2/3/4

The placement of the -er rhymes (77-79) before the -ur rhymes (80-81) instead of after the -yr rhymes (90-92) defies explanation.

Next: The transcriptive evidence for the 'right' rhyme.

EXPLAINING THE RIGHT READING

Until now I've been reading the character

2705 'to help; right side of character (i.e., assistant)'

in Tangut character analyses as 2beq4 (a reading converted from Gong's reconstruction 2bjịj in Li Fanwen's 1997 dictionary).  But last night I discovered that it should be 2bir'4 (a reading converted from Gong's reconstruction 2bjir in Li Fanwen's 1997 dictionary).

2705 and its homophone

2928 'to explain, note' (with 'speech' on its right side; probably a different spelling of a specialized usage of 'to help')

form a two-character homophone group in the first chapter of Homophones. All characters in that first chapter have readings with labial initials (p-, ph-, b-, m-). The Timely Pearl transcription of 2705 is *3me3 (32.2.8); the diacritic indicates b- rather than m-.

I have no idea how Nishida, Sofronov, and Gong determined the rhyme

N -ɛ̣ 2.54

S -ɪ̭e 2.? (no rhyme number listed on II: 307, but implicitly 2.54 since its transcription 命 is listed under 2.54 on II: , presumably a typo for -ɪ̭ẹ since -ɪ̭e 2.54 should either be 2.35 or 2.37 according to I: 137)

G -jịj 2.54

= my 2-eq4

before the rediscovery of the Precious Rhymes of the Tangraphic Sea which listed 2705 under rhyme 2.86 (-ir'). 2928 is not in PRTS, but its rhyme must be identical to that of 2705 since the two are homophones.

(8.29.0:59: Perhaps 2.54 was determined by a process of elimination. The Chinese transcription 命 *3me3 most likely reflected a Tangut reading like be4 (be3 would be anomalous in Tangut). 2705 could not have had the first tone because 2705 was not in the Tangraphic Sea's volume for first-tone characters. 2705 was thus thought to be in the lost second-tone volume [and later it indeed was found in the second-tone volume of the Precious Rhymes of the Tangraphic Sea]. But was it 2be2 [2.33], 2be'2 [2.35], 2beq4 [2.54], or 2ber4 [2.68]? 2705 was not in the homophone groups thought to be for 2be2 and 2be'2, so 2beq4 and 2ber4 were the remaining possibilities. I don't know why 2.54 was favored over 2.68. The actual rhyme turned out to be 2.86 - which was none of the above!)

All this makes me wonder how many other second tone readings have been revised in Li Fanwen's 2008 dictionary in accordance with the Precious Rhymes of the Tangraphic Sea.

FULL TETRATANGRAPHIC FORMULAE

You may have noticed in my last entry that I started quoting Tangraphic Sea four-character character analyses in full instead of converting them into A + B (+ C) formulae which are easier for me to type.  Perhaps paying attention to the exact wording of the analyses might help me formulate hypotheses about the script (which remains enigmatic to me after two decades).

David Boxenhorn saw the analysis of 'six' (which I will rewrite here in full) in "Tired Chapters of Accuracy"


3200 1chhiw4 = 3849 3130 1012 2705 1zhiw3 2mer4 1zeq4 2beq4

'(first syllable of 'sixth (month)') + palace + how-much right'

and my assessment of it as "implausible". He wrote that the implausibility

supports the theory that the analyses are mnemonic.

You know what else supports the mnemonic theory (I just thought of it!)? The analyses are all four characters. I can easily imagine a group of students reciting them. Even singing them.

And isn't there a Chinese tradition of four-character sayings?

[... I]s there any continuity between adjacent analyses, either semantic or phonetic? Something to suggest that they are not meant to be read in isolation?

I just realized that The Golden Guide consists of 5-character lines. Characters plus their 4-character analyses are also 5 characters long. They could both be recited in the same way, e.g. with the same tune!

You know, in English the most common meter is iambic pentameter, which would be great for every two lines of these works.

I should look at analyses in adjacent Tangraphic Sea entries in the new features.

I had been thinking that if the Golden Guide had a tune, it couldn't apply to the Tangraphic Sea analyses, but I had overlooked the possibility of including the analyzed character's reading in the tune.

As for meter, I have never seen a comparative study of the structures of poetry in (South)east Asian 'monosyllabic' languages. Does such a study exist? Or even, say, a study of poetic structures in what Guillaume Jacques might call the Macro-rGyalrongic world? Is there a Qiangic language today with poetry characterized by five-syllable lines? Is there a tonal pattern within and/or between lines of the Golden Guide and/or the Tangraphic Sea analyses?

In any case, I don't think the Tangraphic Sea analyses are truly etymological. In other words, I don't think they necessarily reflect the reasoning of the creator(s) of the script. Who would devise graphs for 'sixth (month)', 'palace', and 'how much', and then fuse them into 'six'?

