14.3.1:2:11: ?-HEARTED GIRL FIGHTER?

I was puzzled by the Thai title


nak rop saaw hua cay mahaakaan

lit. '-er fight girl head heart ?' = '?-hearted girl fighter'?

for Brave.

Mahaakaan is spelled <mahākāḷ> and seems to ultimately* be from Sanskrit mahākāla-, lit. 'great-black', originally a form of Shiva in Hinduism and later a dharma defender in Vajrayana Buddhism.

The Royal Institute Dictionary defines mahaakaan as a drug or as a plant (Gynura pseudochina).

None of those definitions seem to fit the context of Brave. Would a Scottish princess have the heart of Mahakala who isn't associated with Thai Buddhism? I would expect something like 'brave' modifying hua cay 'heart'.

The Vietnamese title of Brave has no semantic challenges for me:

Công chúa tóc xù 'Princess Bushy Hair'

However, 'bushy' has an unexpected combination of x- (normally < *cʰ-) and a lower series tone in a native word. x- with lower series tones in Sino-Vietnamese (e.g., 蛇 xà) comes from Late Middle Chinese *tɕʰ- < *(d)ʑ-. No such devoicing with aspiration occurred in Vietnamese. I could mechanically reconstruct an earlier voiced aspirate *ɟʱ- to account for the tone of 'bushy', but I wonder if the actual source of x- plus lower series tones in native words could be *cʰ- with a voiced prefix and/or *ɟ- with a voiceless prefix conditioning aspiration.

*The retroflex letter ฬ <ḷ> is due to influence from กาฬ kaan <kāḷ> from Pali kāḷa- 'black' which in turn is from Sanskrit kāla- with a dental l-. I don't understand why "[d]ental and retroflex sounds sporadically change into one another" in Pali.

The Pali Text Society's Pali-English Dictionary (1921-1925) does not list a *mahākāḷa- with a retroflex corresponding to Sanskrit mahākāla- with a dental l.


I usually say that two out of three volumes of the Tangraphic Sea have survived, but for brevity I don't note that the early parts of the level and rising tone sections of the Mixed Categories (MC) volume are missing. This is why Andrew West's electronic version of MC begins with class IV of the 'level' tone and class V of the 'rising' tone.

The Precious Rhymes of the Tangraphic Sea (PRTS) manuscript has bits of those sections. Here is the number of tangraphs per class in each section of Mixed Categories (MC) in PRTS:

'level'/1 3 2 ? 5 6 106 82 22* 74
'rising'/2 1 1 7 2 6 82 72 13 74

The figures (based on Shi et al. 2000: 319-343) are not complete, but what remains leads me to doubt that they could be much higher: e.g., the list of 'level' tone class I tangraphs begins and ends on the same page.

The majority of tangraphs in MC in PRTS are in the class VI, VII, and IX sections largely containing tangraphs for syllables with dz- (VI), j- (VII), and lh- (IX). I don't know why those syllables were not listed in the 'level' or 'rising' tone volumes. dz- and j- are both voiced obstruents, but they do not form a natural class with the voiceless sonorant lh-.
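The claim about the majority can be checked with simple arithmetic. This is a minimal sketch, assuming the nine columns of the table correspond to classes I through IX in order and treating the 'level' class III figure shown as "?" as missing. The variable and function names are mine.

```python
# Tangraphs per class (I-IX) in the Mixed Categories sections of PRTS,
# copied from the table above. None marks the figure shown as "?".
COUNTS = {
    "level": [3, 2, None, 5, 6, 106, 82, 22, 74],
    "rising": [1, 1, 7, 2, 6, 82, 72, 13, 74],
}
MAJOR_CLASSES = (6, 7, 9)  # classes VI, VII, and IX


def share_of_major_classes(counts):
    """Return (major, total) tangraph counts, skipping unknown figures."""
    known = [(cls, n) for cls, n in enumerate(counts, start=1) if n is not None]
    total = sum(n for _, n in known)
    major = sum(n for cls, n in known if cls in MAJOR_CLASSES)
    return major, total


for tone, counts in COUNTS.items():
    major, total = share_of_major_classes(counts)
    print(f"{tone}: {major}/{total} = {major / total:.1%}")
```

On the surviving figures, classes VI, VII, and IX account for 262 of 300 'level' tone tangraphs and 228 of 258 'rising' tone tangraphs.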

Conversely, why are there scattered non-dz-/j-/lh- tangraphs in MC in PRTS: e.g., why was

5405 2ma1 'the Tangut surname syllable Ma'

the sole tangraph in the 'rising' tone class I section of MC instead of in the 'rising' tone volume of PRTS?

*3.1.0:06: This figure includes the class VIII tangraph

0222 1horn1 'to roar, howl'

which was actually listed toward the end of class VII.


I don't want this blog to become a survey of Tangraphic Sea homophone groups. I want to only look at a few cases that differ enough from each other to warrant posts.

This trio in Tangraphic Sea rhyme 1.2 caught my eye because Arakawa reconstructed different initials (š-, š²-) in his Nishida-style reconstruction while reconstructing a three-way merger in his own reconstruction:

Tangraphic Sea 1.2 homophone group Tangraphic Sea circle Example Homophones A Homophones B Nishida-style reconstruction in Arakawa (1997) Arakawa (1997) Gong This site Tangraphs
8 36A38-36A51 41B42 1šĭu 1shyu 1ɕju 1shu3 A 1
9 36B67 1š²ĭu 1shu3 B 3
12 37A54-37A63 37B75-38A11 1šĭu 1ɕjwu 1shwu3 5

Each homophone group has a distinct fanqie. Their initial spellers are in three nonoverlapping chains including the example tangraphs above:

8. <> (initial transcribed in Chinese as 室 *sh- and Tibetan as sh- and (g)j-; transcribed Sanskrit ś- and possibly c-; Sofronov 1968 II: 22 and Tai 2008: 192)

9. <> (initial transcribed in Tibetan as (b)sh-, gs-, and zh(w)-; transcribed Sanskrit ś-, ṣ-, s-; Tai 2008: 192)

12. <> (initial transcribed in Chinese as 說 *shw- or 姪 *chh-; Sofronov 1968 II: 17)

The external evidence generally points to sh- for all three. Group 7 has initial chh-, so group 12 might have initial shw- by process of elimination.

If Tangut had two kinds of sh-, I would expect them to correspond to Sanskrit palatal ś- and retroflex ṣ-, but both were written with tangraphs whose initial spellers were in the fanqie chain for the initial of homophone group 9.

The fanqie final spellers for 8 and 9 are in the same chain:


group 8 < 1013 < 3003 > group 9

implying that the distinction between the two groups is in their initials. Yet Homophones A regards the two groups as homophones unlike Homophones B.

The placement of circles in the Tangraphic Sea also implies that the two groups are somehow united (they are undoubtedly similar), though circles are also missing between groups that are undeniably very different: e.g., 1.1.7 1nu1 B and 1.1.8 1ku1.

The Tangraphic Sea and both editions of Homophones agree that group 12 is distinct, and the final speller 0622 is in a separate chain from that of groups 8 and 9:


0622 1wu3 <> 2842 1shwu3

I have not found any strong transcription evidence for the -w- that Gong reconstructed for 0622 and its fanqie initial speller

1795 2wi4

However, if 2842 had shw-, then its final speller probably had -w- too.


In my last entry, I found that the 'A' and 'B' homophone groups in Tangraphic Sea rhyme 1.1 had identical rhymes because their fanqie final spellers were in the same chain. That may seem like a tautological statement, but there is no guarantee that two or more rhymes (1.1a, 1.1b ...) were not conflated under a single heading. Arakawa reconstructed two types of 1.1 rhymes, 1-u and 1-u2. (All of the 'A' and 'B' pairs have 1-u in his system.)

Groups 6 (1nu1 A) and 7 (1nu1 B) also had fanqie initial spellers in the same chain, so their fanqie seem to indicate they were homophonous. Why would identical syllables be arbitrarily split into two groups?

What about the fanqie initial spellers of the other two 'A/B' groups?

Group 9 (1khu1 A) has a 'rising' tone fanqie initial speller without any known fanqie:

2782 2khi4 < Chinese 氣 *2khi3 'gas'

If the missing volume of the Tangraphic Sea is ever rediscovered, we could see if 2782 is in the same fanqie chain as 4807, the fanqie initial speller of group 10 (1khu1 B):


4807 1khi4 <> 5399 1khu4

2782 and 4807 are in the same homophone group in Homophones A, implying that they had the same initial, but they are in different homophone groups in Homophones B, implying that they had different initials as well as finals.

I thought group 10 (1khu1 B) might have had secondary kh- from an earlier g-. If 4807 'to lose' was a speller for *g-, it should have cognates with voiced initials. However, Jacques (2014: 81, 94, 250) identified its Japhug cognate as kra 'to drop' with a voiceless initial. (The mismatch in aspiration requires explanation.) On the other hand, I think 4807 is a loan from Chinese 棄 *3khi4 'to discard'. Either external connection points to kh- and not to g-.

Lastly, groups 13 (1tshu1 A) and 14 (1tshu1 B) have initial fanqie spellers in the same chain:

group 13 < 3291 < 4996 > 1319 > group 14

so they had the same initial (tsh-) as well as the same rhyme (1-u1) ... and yet groups 13 and 14 are separate in both the A and B versions of Homophones!

13: Homophones A 33B37; Homophones B 34A47 (isolated characters without homophones at the end of chapter VI)

14: Homophones A 32B15; Homophones B 33A31

I give up ... for now. I doubt this is the last time I'll try to wrestle with this problem.


AN UN-NU-N DISTINCTION

Gong reconstructed Tangraphic Sea rhyme 1.1 homophone groups 6 and 7 identically even though they (1) have different fanqie, (2) are separated by circles, and (3) correspond to different (though adjacent) homophone groups in Homophones:

Tangraphic Sea homophone group Homophones Tangraphs Gong
6 17B44-17B45 1nu1
7 17B41-17B43
(The third tangraph in parentheses is in Homophones but not the Tangraphic Sea.)


Yesterday I hypothesized that the two groups might have had different initials: voiceless hn- and voiced n-. If the distinction were in the initials, then their fanqie initial spellers 3226 and 4027 should not be in the same fanqie chain. But in fact they had the same fanqie initial speller (0616):


group 6 < 3226 < 0616 > 4027 > group 7
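Checking whether two spellers share a fanqie chain amounts to asking whether they are connected through "X spells Y" links. This is a minimal sketch using a union-find structure. The function names are mine, and the only links entered are the two read off the diagram above; a real test would need the full set of fanqie from the Tangraphic Sea.

```python
# Two tangraphs are in the same fanqie chain if they are linked,
# directly or indirectly, by "speller spells tangraph" relations.
def find(parent, x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def union(parent, a, b):
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        parent[root_a] = root_b


def same_chain(links, a, b):
    """links is a list of (speller, spelled tangraph) pairs."""
    parent = {}
    for speller, spelled in links:
        union(parent, speller, spelled)
    return find(parent, a) == find(parent, b)


# Initial spellers from the diagram: 0616 spells both 3226 and 4027.
INITIAL_LINKS = [("0616", "3226"), ("0616", "4027")]

print(same_chain(INITIAL_LINKS, "3226", "4027"))  # True
```

A tangraph with no recorded link to the others ends up in its own set, so the same function would report nonoverlapping chains as distinct.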

Their fanqie final spellers are also in the same chain:

'A' groups 6, 9, 13 < < < > > 'B' groups 7, 10, 14

So how could groups 6 and 7 be distinct if they had the same initials and finals?


A U-NIQUE TANGRAPH

I added a note to my previous entry about

1520 0wu1

which is the only possible rhyme 1 syllable in the Class VIII section of the 'level' tone section of the Mixed Categories volume of the Tangraphic Sea. I wrote that its "reconstruction is problematic." Here's why.

1. Tone

I transcribed the tone of 1520 as 0 (= unknown) following Gong. I should have followed Arakawa and regarded its tone as 'level' because it is in the 'level' (= first) tone section of the Mixed Categories and its fanqie contains a final speller with the 'level' tone:


1520 0wu1 = 0434 1i3 'alas' (with a glottal stop initial) + 3044 1u1 'grave, death'

1520 is a fanqie character, so that formula doubles as its graphic analysis: 'speech' (the left side of 0434) plus all of 3044 (phonetic).

2. Medial

I don't know why Sofronov (1968 II: 379) and Gong reconstructed a medial -w- and Li Fanwen (1986: 429) reconstructed an initial w-, as neither fanqie speller contained -w-. 0434 transcribed Sanskrit i, not vi (Arakawa 1997: 110). (Sanskrit has no w.) 3044 belongs to homophone group 16 which was transcribed in Tangut period northwestern Chinese as 烏 *1u1 without w (Sofronov 1968 II: 6).

My guess is that w was reconstructed to account for the fact that Homophones lists no homophones for 1520: i.e., 1520 could not be homophonous with 3044 1u1.

However, such a w is difficult to reconcile with the fact that 1520 transcribed Sanskrit u, not vu (Arakawa 1997: 110). Moreover, 1520 combined with 'long' to form 1540 which transcribed Sanskrit ū, not vū (Grinstead 1972: 87, 184):


1540 'Sanskrit ū' = 1520 'Sanskrit u' + 0443 1jo3 'long'

(The placement of 0434 and 1520 in Grinstead's table on p. 184 implies they were for Sanskrit long ī and ū, but I think they should be in the previous row for short vowels.)

3. Final

Arakawa (1997) reconstructed 1520 with what appears to be rhyme 1.4* (1-uɦ) using Nishida's system on page 110 but with rhyme 1.1 (1-u) using Nishida's system and his own system on pages 98 and 125. Yet the final fanqie speller of 1520 has rhyme 1.1. I cannot explain this inconsistency. (Nishida 1966: 420 reconstructed 1520 with rhyme 1.1 1-u.)

4. Placement

If 1520 was 1u1 as its fanqie indicates, why was it in Mixed Categories instead of the 'level' tone volume with 3044 1u1? Is it because 1520 is a Sanskrit transcription character like others in the Class VIII section of Mixed Categories? But not all Class VIII characters in Mixed Categories transcribe Sanskrit, and there are Sanskrit transcription characters in the other two volumes of the Tangraphic Sea.

The bigger question is why the Mixed Categories volume exists at all. Was it a compilation of characters that were accidentally left out of the other two volumes, or do its characters have something else in common?

*Arakawa wrote "1.? øuɦ" without specifying a rhyme, though -uɦ can only be rhyme 4 in Nishida's system. The question of whether Sanskrit vowel transcription characters were pronounced with zero initials and/or glottal stops merits investigation.


TANGRAPHIC SEA RHYME 1.1: 'PREFACE'

It's neat that the name of the first Tangut rhyme

5085 1bu1

means 'preface' rather than, say, 'epilogue'.

It is the first of six tangraphs in homophone group 3 in the 'level' (first) tone volume of the Tangraphic Sea:

Homophone group    Transcription
6                  1nu1 A
7                  1nu1 B
9                  1khu1 A
10                 1khu1 B
13                 1tshu1 A
14                 1tshu1 B

You can see the tangraphs and fanqie spellings for all of those groups on Andrew West's site.

I discussed the tshu-groups (13, 14, and 22) in my last post, so I created that color-coded table to see them in context. I'd like to go through both surviving volumes of the Tangraphic Sea to study their organization.

Groups 1-19 follow the order of consonant classes that is also used in Homophones. The order of consonants within each class in turn follows the Chinese tradition which in turn is based on the Indian tradition: e.g.,

k-, kh-, kh- (< *g-?), g- [ŋg] (groups 8-11)

The consonant class cycle starts again in group 20, implying that groups 20-24 constitute a set distinct from the first 19 groups. Gong reconstructed -w- in groups 20-24, though there is no Tibetan, Chinese, or Sanskrit transcription evidence for a medial. (Sofronov 1968 II: 6 collected all the Tibetan and Chinese evidence for the first rhyme of the Tangraphic Sea.) Arakawa (1997: 17, 125) wrote the rhyme of groups 20-24 as -u² in Nishida's system and as -u2 in his own system.

The absence of Class II is due to chance. Rhyme 1.1 is a Grade I rhyme that cannot have Class IV and VII initials.

Circles divide homophone groups 1-6, but the correlation between circles and groups breaks down when circles do not separate groups 7-9 even though 7 is Class III and 8-9 are Class V.

Gong reconstructed groups 9 and 10 identically even though they are separated by a circle, have different fanqie, and are in different columns in a Tangut phonetic table. 10 should have had *kɦ- or even *g- if the Tangut were following Chinese consonant order. Gong pointed out that groups 9 and 10 were transcribed with the same Chinese initial *kh- in the Pearl in the Palm (1190) but that does not necessarily mean they had the same initial when the Tangraphic Sea was written. Late 12th century northwestern Chinese *kh- is from both *kh- and *g-, and late 12th century Tangut kh- may also have had two sources in earlier Tangut:

Tangraphic Sea Tangut k-
Pearl in the Palm Tangut

This hypothesis predicts that kh- and old g- (not to be confused with the prenasalized stop transcribed as g-) should have different fanqie spellers and different correspondences with consonants in other languages. I should test both predictions later.

Groups 13 and 14 may also have had different initials:

Tangraphic Sea Tangut ts-
Pearl in the Palm Tangut

I have arbitrarily used the letters A and B to distinguish between groups with uncertain differences in my transcription here. I do not intend to imply that 1khu1 A (group 9) was to 1khu1 B (group 10) what 1nu1 A (group 6) was to 1nu1 B (group 7), though perhaps there are parallels: e.g., if group 6 n- was voiceless [n̥] and group 7 n- was voiced [n], then A syllables had voiceless (aspirated) initials and B syllables originally had voiced initials that later merged.

I am baffled by the absence of a circle separating group 19 from groups 20-24. Were the latter considered to be self-evidently different (e.g., with Gong's medial -w- or with a different rhyme)?

I also do not understand why there are no m- or d- syllables with rhyme 1.1 (though rhyme 1.4 has one m- syllable and six d-syllables - the only rhyme 1.4 syllables with nonback initials).

2.24.11:18: Nor do I understand why dzu-syllables with rhyme 1.1 - and all other syllables with dz- and either tone - were placed in the Miscellaneous Characters volume of the Tangraphic Sea along with

1520 0wu1

whose reconstruction is problematic.


ADMONISHING POTATOES

The first character in the fragment I discussed last week has this fanqie in the Miscellaneous Characters volume of the Tangraphic Sea:


3543 1dzwy1 'chapter' = 0524 1dzu3 'to admonish, instruct' + 3605 1tshy1 (first syllable of 1tshy1 2on4 'potato')

Why does 1dzwy1 contain a -w- absent from its fanqie spellers?

The short answer is that my transcriptions are based on Gong's 1dzwə, 1dzju, and 1tshə.

At times like this I want to reconstruct from scratch. Let's look at the evidence for 3543.

It is a fact that 3543 is in the tone 1 / Class VI section of Miscellaneous Characters. So the tone is certain, and the initial must be ts-, tsh-, dz-, or s-.

The initial speller of 3543 has this fanqie in Miscellaneous Characters:


0524 1dzu3 = 1092 1dzy4 + 3003 1u3


1092 1dzy4

- was transcribed in Tibetan as HdziH (whose initial was [ndz])

- was transcribed in Chinese as 尼則 *nji tsy = *ndzy

- transcribed Sanskrit j (which was [dz] in the variety of Sanskrit known to the Tangut)

so its initial - and the initials of 0524 and 3543 - must have been dz- [ndz].

The final speller of 3543 has this fanqie in the 'level' tone (= tone 1) volume of the Tangraphic Sea:


3605 1tshy1 = 0984 1tshwu1 + 3732 1ky1

I think 3605 should be 1tshwy1 instead of 1tshy1. Then 3543 would also have -w-:

3543 1dzwy1 = 0524 1dzu3 + 3605 1tshwy1
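The effect of the two candidate readings of 3605 can be traced mechanically. This is a minimal sketch, assuming that a fanqie takes its initial from the first speller and its tone, medial, rhyme, and grade from the second. The parsing pattern is my own approximation of the transcription used here and lists only the initials needed for this example.

```python
import re

# Assumed shape of a transcription: tone digit, optional initial,
# optional medial -w-, rhyme letters, grade digit (e.g. 1dzwy1).
# The list of initials covers only the syllables in this example.
SYLLABLE = re.compile(r"(\d)(tsh|dz|ts|kh|k|s)?(w?)([a-z']+)(\d)")


def parse(reading):
    match = SYLLABLE.fullmatch(reading)
    if match is None:
        raise ValueError(f"cannot parse {reading!r}")
    tone, initial, medial, rhyme, grade = match.groups()
    return {"tone": tone, "initial": initial or "", "medial": medial,
            "rhyme": rhyme, "grade": grade}


def fanqie(initial_speller, final_speller):
    """Initial from the first speller; everything else from the second."""
    first, second = parse(initial_speller), parse(final_speller)
    return (second["tone"] + first["initial"] + second["medial"]
            + second["rhyme"] + second["grade"])


print(fanqie("1dzu3", "1tshy1"))   # 1dzy1: no -w- if 3605 is 1tshy1
print(fanqie("1dzu3", "1tshwy1"))  # 1dzwy1: -w- if 3605 is 1tshwy1
```

Under that assumption, the -w- of 3543 can only come from its final speller, which is why the reading of 3605 matters.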

Yet both Gong and Sofronov reconstructed 3605 without -w- and 0984 with -w-.

There is no transcriptional support for -w- in 0984 which belongs to the last of three tshu-type homophone groups in rhyme 1 of the 'level' tone (= tone 1) volume of the Tangraphic Sea:

1tshu1 group 1:

1tshu1 group 2:

1tshu1 group 3:

The third group is isolated toward the end of rhyme 1, far from the other two which are next to each other. Gong and Sofronov reconstructed a -w- in the third group and regarded the first two groups as homophonous despite their different fanqie. I suspect there was a three-way distinction that no one has reconstructed yet.

2.23.4:03: I forgot to justify reconstructing -y for 3543. Unlike the other two volumes of the Tangraphic Sea, Miscellaneous Characters is organized by tone and initial consonant class, not tone and rhyme. So the rhymes of Miscellaneous Characters entries have to be determined from their fanqie final spellers. The final speller of 3543 is 3605 which is listed under rhyme 27 of the 'level' tone (= tone 1) volume of the Tangraphic Sea. That rhyme was transcribed in Tibetan as -a, -i, -e, and -o. The corresponding tone 2 rhyme was transcribed with -u as well as the other four Tibetan vowels. This diversity of transcriptions indicates a Tangut vowel that was unlike anything in Tibetan: e.g., a schwa. (5:03: I transcribe all nonlow central and back unrounded vowels as -y.)


CHAPTERS ON SKINNY REALITY

The first character in the fragment I discussed in my last post


3543 1dzwy1 'chapter' = 3020 1ja'3 'reality' + 5334 1dzwy1 'to catch' (phonetic)

contains a component

found in only three other characters:


3020 1ja'3 'reality' = left of 3543 1dzwy1 'chapter' (circular!) + 'speech' (left of 1817 2daq1 'to know')


3523 2ja'3 'skinny, wan and sallow' = left of 3020 1ja'3 'reality' (phonetic) + all of 4675 2rer4 'toil' (why?)


3021 1ja'3 'chapter' = possibly left of 3543 1dzwy1 'chapter' + ?

I suppose their shared component is a phonogram for ja'3 that is semantic in 3543 if it is an abbreviation for 3021 1ja'3 'chapter':


3543 1dzwy1 'chapter' = 3021 'chapter'? + 5334 1dzwy1 (phonetic)

If 3021 predated 3543, then 3543 cannot be the true source of the left side of 3021 (though such a circular analysis might have appeared in the Tangraphic Sea if it had an entry for 3021). 3020 1ja'3 or 3523 2ja'3 may be an abbreviated phonetic in 3021 1ja'3.


HOW LONG DOES IT TAKE TO LEARN THE GOLDEN GUIDE?

Andrew West identified a fragment of Tangut writing practice (British Library Or. 12380/2625) that puzzled me back in 2011 as being from the preface to the Golden Guide:

Location 01B508 01B509 01B510 01B511 01B601
Li Fanwen number 3543 0113 1771 4735 2814
My transcription 1dzwy1 1shen3 2seq4 2ja3 2lhiq4
Gloss chapter to accomplish wisdom sharp month

Here is the full context of this passage from Kotaka (2005) with his translation:

1ngwy1 2di4 2gwi4 2pho'4 1lyr'3 1ny'4 1dzwy1 1shen3

lit. 'five character phrase collect four two chapter accomplish

'Collecting phrases of five characters, [we] made four-two chapters.'

1nwy1 2ja3 2lhiq4 1oq2 2zeq4

lit. 'know sharp month round finish

'The sharp of mind may finish [them] in one month.'

Kotaka thought

1lyr'3 1ny'4 'four-two'

meant 'eight', though he broke the Golden Guide into three sections with ten subsections in this 2003 article:

I. The Tangut world and its government

A. The creation of heaven and earth, the seasons, the Heavenly Stems and Earthly Branches, the changing of the days and years

B. The life and work of the court, the imperial family, and its officials

II. Names

A. Tangut surnames

B. The Tangut people and surrounding peoples; the Jurchen are excluded, leading Nishida (1997) to guess that the Golden Guide was composed before the rise of the Jurchen Empire in 1115

C. Chinese surnames

III. Tangut daily life

A. Marriage and kinship terms

B. Luxury items and clothes

C. Utensils, food, monks

D. Beasts, birds, domestic animals

E. Daily life of the masses

Although Kotaka (2005) used IOM #741 as his basis, his typeset version of the Golden Guide has

2699 instead of 1771

for 01B510. Both IOM #741 and British Library Or. 12380/2625 clearly have 1771. I translate

1seq4 2ja3

as '[those with] sharp wisdom' or 'the wise and sharp'.

Kotaka translated

2lhiq4 1oq2

as 'one month'. The first half is 'month', but what is the second half 'round' doing? Maybe this phrase means 'in about a month'. Are there any other instances of quantity words followed by 'round'?

Lastly, Kotaka translated

2zeq4

as a verb 'finish', though Kychanov and Arakawa (2006: 283) and Li (2008: 518) translated it as an adjective 'honest'. Kotaka's translation is closer to Nishida's (1966: 432) 'to do'.

The phrase

1seq4 2ja3 2lhiq4 1oq2 2zeq4

'wisdom sharp month round ?'
is parallel in structure to the following phrase:

1viq1 1lwen1 1kew4 1mi4 1chen3

'foolish obtuse year not ?'

which Kotaka translated as 'Even the stupid will take less than a year [to finish it].'

1chen3 normally means 'correct' but can also mean 'to pass'. 'Correct' is similar in meaning to 'honest' for 2zeq4. Could 1kew4 1mi4 1chen3 'year not correct' mean 'not precisely a year'?

I would then expect 'not precisely a month' in the previous line. But if 2lhiq4 2zeq4 were 'precisely a month' (lit. 'month honest'), wouldn't a modifier follow 'honest' instead of precede it: i.e., *2lhiq4 2zeq4 1oq2 'about (lit. round) an honest month' instead of 2lhiq4 1oq2 2zeq4 'honest approximate (lit. round) month'?

Maybe 2lhiq4 1oq2 2zeq4 means 'precisely a full month' if 1oq2 is 'full' as in

1oq2 1sy1 'full'

which may be a redundant compound since 1sy1 means 'full' by itself.


CLEAR FOUNTAIN TEMPLE?

I often play the game of 'guess the Chinese characters' when seeing Korean in hangul or romanization or Vietnamese in Quốc ngữ (國語 'national language'; i.e., the Vietnamese alphabet).

A Buddhist temple named Chùa Thanh Nguyên was on the front page of Sunday's Honolulu Star-Advertiser. (Yes, this post is four days late.)

I think the characters for Thanh Nguyên are 清源 'clear fountain'. There are 清源寺 'Clear Fountain Temples' in China, Korea, and Japan.

The Sino-Vietnamese reading for 寺 'temple' is tự; its native (?) equivalent is chùa, written as 廚 chù 'kitchen' (phonetic) with or without 寺 'temple' (semantic) beneath it in the nom script.

For years I have been puzzled by chùa which should go back to an earlier *ɟuə.

I would expect a word for a Buddhist temple to be a loanword. (English temple is from Latin templum.) But there is no Chinese word for 'temple' like *ɟuə. tự is from a southern Late Middle Chinese form like *sɨ̀ which in turn goes back to Late Old Chinese *zɨəʰ. *zɨəʰ is close to *ɟuə, but not close enough. If *zɨəʰ were borrowed into early Vietnamese, it might have become *ɟ(ɨ)ə(h) which would have developed into modern *chờ, *chỡ, chừa, or *chữa without a labial vowel, not chùa with a labial vowel.

Moreover, I can't find any Vietic cognates of chùa, so I am not sure it is a native word. Could its nom spelling 廚 be etymological: i.e., is it a borrowing from Early Middle Chinese 廚 *ɖuə 'kitchen'? Although the initials are not a problem (Early Middle Chinese 重 *ɖuoŋʰ 'heavy, important' corresponds to Vietnamese chuộng 'to esteem' [i.e., regard as important]), there is a vast semantic gap between 'temple' and 'kitchen'. Is a shift from 'kitchen' to 'temple' attested in another language?

While I'm on this topic, I should mention another Vietnamese word that I cannot explain. A Buddhist monk is a thầy chùa. Thầy 'master' has no Chinese source that I know of. Moreover, its huyền tone would normally indicate a *voiced initial, even though I doubt Proto-Vietic had voiced aspirated *dʱ- or *ʑ-. (Sino-Vietnamese th- followed by a huyền tone is from southern Late Middle Chinese *ʑ-.) Nom spellings of thầy contain the voiced-initial phonetic 柴 sài < *draːj with optional semantic additions 亻 'person' or 師 'master'.

Another puzzling word of this type that is certainly native is thịt 'meat' whose Vietic cognates have voiceless s- even though a nặng tone also normally indicates a *voiced initial. (Proto-Vietic *s- normally became Vietnamese t-, not th-.) Nom spellings of thịt contain the voiced-initial phonetic 舌 thiệt < *ʑiet with optional semantic additions 月 'meat' or 肉 'meat'.
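The tone argument rests on the usual split of the six tones into an upper series (ngang, sắc, hỏi; historically voiceless initials) and a lower series (huyền, ngã, nặng; historically voiced initials). This is a minimal sketch that reads the series off the Quốc ngữ tone mark. The function names are mine, and the code only looks at spelling, not at etymology.

```python
import unicodedata

# Combining marks for the five marked tones of Quốc ngữ; ngang is unmarked.
TONE_MARKS = {
    "\u0301": "sắc",    # acute
    "\u0309": "hỏi",    # hook above
    "\u0300": "huyền",  # grave
    "\u0303": "ngã",    # tilde
    "\u0323": "nặng",   # dot below
}
LOWER_SERIES = {"huyền", "ngã", "nặng"}


def tone(syllable):
    """Name the tone of one syllable from its diacritic."""
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return "ngang"


def series(syllable):
    return "lower" if tone(syllable) in LOWER_SERIES else "upper"


for word in ["xù", "thầy", "thịt", "chùa", "tóc"]:
    print(word, tone(word), series(word))
```

The puzzle words xù, thầy, and thịt all come out as lower series despite their aspirated or fricative initials, which is the mismatch discussed above.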

Could the tones of 'master' and 'meat' reflect voiced prefixes? Did those prefixes condition aspiration? Or did Proto-Vietic have *dʱ- or *ʑ-?


MOCANG, MOZANG, MIZANG, MYDZON?

While looking through Dunnell (1996) for references to the Uyghur in the Tangut Empire, I found a passage about Lady Mocang (Chinese Wikipedia entry), mother of Emperor Yizong. Mocang is how Dunnell read 没藏; the second character can also be read as zang in Mandarin. Is there any premodern note specifying the pronunciation of 藏?

Dunnell (1996: 56) also mentions another Chinese transcription pronounced Mizang in modern Mandarin in 李燾 Li Tao's 續資治通鑑長編 Xu Zizhi tongjian changbian, but I only see 没藏 Mocang/Mozang in this online edition. Alas, she does not provide characters for Mizang in the back of her book.

Although the surname Sinified as Mocang/Mozang (and Mizang?) is important in Tangut history, I don't know what its Tangut characters were. The closest matches in the surnames in Miscellaneous Characters are

3682 5164 1my1 1dzon1 and 2888 5164 2my1 1dzon1

3682 is a word for 'merit' that may not be attested outside dictionaries. Its meaning is not apparent from its Tangraphic Sea analysis:


3682 = 2216 1teq4 'swift' (semantic?) + 3513 1my1 'heaven' (phonetic and semantic?)

2888 means 'surname' and is obviously from 'person' + 'surname'; it also appears in the surname

2888 1085 2my1 1zi4

with 1085 'man'. I would not expect 'surname' in a surname.

5164 is not attested alone; it is a second syllable in six other surnames in Miscellaneous Characters:

2214 5164 2ly3 1dzon1 and 4698 5164 2ren4 1dzon1
3219 5164 1pa1 1dzon1 and 1989 5164 2vy3 1dzon1
3334 5164 1ma4 1dzon1 and 3889 5164 2be'4 1dzon1

2214, 3219, 1989, and 3889 are surname characters.

4698 is a surname and toponym character.

3334 means 'female'.

Does the Tangraphic Sea analysis of 5164 tell us something about the eight -dzon families?


5164 = 5031 1by1 'second half of Lyby, ancestor of the Black-Headed Tangut' + 1332 1de1 'to pass on' + 2132 2ew4 'achievement'

Were those families considered to be direct descendants of Lyby?

3682 1my1 and 2888 2my1 are good phonetic matches for 没 which would have been pronounced something like 4my1 in Tangut period northwestern Chinese, but 5164 1dzon1 has a voiced initial unlike 1tshon1 and 3tshon1, the Tangut period northwestern Chinese readings of 藏.

2.18.0:45: Conversely, 藏 did have a voiced initial in the eastern dialect(s) underlying the later Phags-pa Chinese reading ꡐꡃ <tsaŋ> [dzaŋ] (with both tones 1 and 3), but the final does not match Tangut -on (which may be from pre-Tangut *-am, *-em, *-om, or *-um; see Jacques 2014: 206).


SOUTHERN MOUNTAINS IN XINJIANG MANDARIN

Yesterday I rediscovered a note from last October for a future post idea. I had found this passage in Hahn (1998: 383) which happened to contain the Uyghur equivalent of Turkish adamlar 'men' from my previous post (emphasis mine):

Loanwords [in Uyghur] may contain both front and back elements, but any suffix attached to such a loanword stem takes its harmonic information only from the last syllable of the stem. This may be illustrated by means of the suffixes -lAr and -dA attached to adem 'human being' < Arabic ʕādam, polek 'Pole' < Russian poljak, šenduŋ 'Shandong' < Mandarin Shāndōng and Xunen 'Hunan' < Mandarin Húnán: ademler 'human beings', polekler 'Poles', Šenduŋda 'in Shandong', Xunende 'in Hunan'.
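Hahn's rule can be sketched as a lookup on the last vowel of the stem. This is a simplified illustration, not a full account of Uyghur harmony: I assume that e, ö, and ü count as front and a, o, and u as back, and I ignore i and consonant-conditioned suffix variants. The function names are mine.

```python
# Simplifying assumptions (mine, not Hahn's): e, ö, ü are front and
# a, o, u are back; i and suffix variants such as -tA are ignored.
FRONT, BACK = set("eöü"), set("aou")


def last_harmonic_vowel(stem):
    for ch in reversed(stem.lower()):
        if ch in FRONT or ch in BACK:
            return ch
    raise ValueError(f"no harmonic vowel in {stem!r}")


def attach(stem, suffix):
    """Fill the archiphoneme A in a suffix such as 'lAr' or 'dA'."""
    vowel = "e" if last_harmonic_vowel(stem) in FRONT else "a"
    return stem + suffix.replace("A", vowel)


for stem, suffix in [("adem", "lAr"), ("polek", "lAr"),
                     ("Šenduŋ", "dA"), ("Xunen", "dA")]:
    print(attach(stem, suffix))
```

Only the last syllable matters: Šenduŋ takes back -da despite its front first syllable, and Xunen takes front -de despite its back first syllable.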

Why does Uyghur front e correspond to standard Mandarin nonfront a? I think the key word is "standard". I don't have any data on the Mandarin spoken in Xinjiang, so I can only guess that the source dialect of those loanwords might be like other Mandarin dialects which have fronted the vowel of *-an: e.g., 南 næ̃ 'south' and 山 sæ̃ 'mountain' in Xi'an which is 2,500 km from Ürümqi.

The nasal front vowels in the Xi'an forms for 'south' and 'mountain' might reflect the pre-Mandarin substratum dialect underlying Tangut

3382 1nin1 (transcription of Chinese 南 'south' in the Tangut translation of The Art of War)

3763 1shan2 'mountain' (borrowing from Chinese 山 'id.')

The -n of my Tangut transcriptions is not a consonant [n]; it indicates vowel nasalization.

Gong reconstructed the rhyme of 3382 as -ẽ. I think -in1 might have been something like [ə̃j], [ẽj], or [ɪ̃]. In any case, it was not [ĩ] like -in4.


VOWEL HARMONY IN OLD CHINESE, PRE-TANGUT, QIANG, TIBETAN, AND XIONGNU

Back in 2002, I proposed that presyllabic vowels conditioned vowel 'warping' in Old Chinese:

low-vowel presyllable + high-vowel syllable > falling diphthong

e.g., *Cʌ-Ci > Cei

high-vowel presyllable + low-vowel syllable > rising diphthong

e.g., *Cɯ-Ca > Cɨa

Ten years later, I proposed a similar origin for Tangut grades:

low-vowel presyllable + high-vowel syllable > Grades I and II (depending on medial)

e.g., *Cʌ-Ci > Ci1 and *Cʌ-Cri > Ci2

high-vowel presyllable + low-vowel syllable > Grades III and IV (depending on initial)

e.g., *Cɯ-sha > sha3, *Cɯ-sa > sa4

In all of the above cases, harmony is from presyllable to syllable (= prefix to root in many cases) and only partial. The second vowel does not completely change to match the first.

However, in Yadu Qiang, a relative of both Tangut and Chinese, harmony is from root to affix and total according to Nate Sims:

a + = ɑ-hɑ 'one bunch' (backing of a)

a + = ɛ-pɛ 'one bowl' (raising of a)

(a 'one' is cognate to Tangut

5981 0a0 'one'.)

+ pʰu = tu-pʰu 'up-flee' = 'to flee upward'

ʁwɑkʰu + = ʁwɑkʰu-pu 'sarcasm-do' = 'to be sarcastic' (raising and rounding of ə)

Did vowel harmony develop in opposite directions in different branches of Sino-Tibetan? Lhasa Tibetan even has harmony "from affixes to stems and vice versa" (DeLancey 2003: 271): e.g.,

Hgro gyi yinukijiː̃] 'go (conjunct future)' (stem vowel raising)

zhim-po [ɕimpu] 'delicious' (suffix vowel raising)

That harmony is relatively recent, as it is not reflected in the orthography based on Classical Tibetan.

I don't know how old Yadu Qiang harmony is.

My proposed Pre-Tangut harmony must predate the 11th century when Tangut was first written (i.e., when Tangut had already lost the presyllables that conditioned harmony).

My Old Chinese harmony is even older.

I used to think that vowel harmony might have spread from Old Chinese to core 'Altaic' (Turkic, Mongolic, Tungusic), but I somehow overlooked the obvious fact that my proposed Old Chinese harmony is nothing like 'Altaic' harmony which is also root-to-affix and total: e.g.,

Turkish adam-lar 'men', Türk-ler 'Turks'

Written Mongolian aqa-nar 'older brothers', degüü-ner 'younger brothers'

Manchu sagda-sa 'old men', gege-se 'older sisters'

Korean pad-a 'receive and ...', ŏb-ŏ 'carry and ...'
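The root-to-affix direction can be sketched in a few lines for the Turkish plural above. This is only a toy illustration, not a full model of Turkish harmony: the function name and the vowel sets are my own choices, and the hyphen is kept only to match the notation of the examples.

```python
# Toy sketch of root-to-affix harmony: the LAST vowel of the root
# decides the vowel of the plural suffix (-lar after back, -ler after front).
BACK_VOWELS = set("aıouAIOU")
FRONT_VOWELS = set("eiöüEİÖÜ")

def turkish_plural(root: str) -> str:
    """Return root + harmonizing plural suffix, hyphenated as in the examples."""
    for ch in reversed(root):
        if ch in BACK_VOWELS:
            return root + "-lar"
        if ch in FRONT_VOWELS:
            return root + "-ler"
    raise ValueError(f"no vowel found in root: {root!r}")

print(turkish_plural("adam"))  # adam-lar
print(turkish_plural("Türk"))  # Türk-ler
```

The suffix never affects the root here, which is the opposite of the presyllable-to-syllable direction I proposed for Old Chinese and Pre-Tangut.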

I have not seen any evidence for vowel harmony in Xiongnu. The Xiongnu title 'crown prince' was transcribed in Old Chinese as 護于 with a mixture of syllable types (A and B in Chinese terminology). (But could 'crown prince' have been a compound noun without vowel harmony? Its second half may have been shared with the Xiongnu title for 'supreme ruler', transcribed in Old Chinese as 單于 and interpreted by Vovin as 'north-ruler'.)

If I am right about Chinese and Tangut, then

- Chinese and Altaic vowel harmony are unrelated

- Altaic vowel harmony had nothing to do with Xiongnu

Perhaps Mongolic* got it from Turkic or vice versa when speakers of both were under Xiongnu rule

- Tangut vowel harmony had nothing to do with Tangut's neighbor Uyghur

*I use 'Mongolic' here in a broad sense to refer to the ancestor of all languages related to the Mongolic languages: e.g., Khitan, which is, strictly speaking, a Para-Mongolic language that is a sister to Mongolic proper.

AN ARMY COMMANDER WHO WAS SMALL LONG AGO

(Small [smɔɰ] and ago [əʹgoɰ] almost rhyme in my dialect.)

Andrew West posted the second part of his series on two Tangut families. (Last month I wrote a response to his first part.) Part 2 concerns 小李鈐部 Xiaoli Qianbu (1191-1259) who was commemorated on a stele where his name appears as

3799 1141 1531 2805 2sew1 2li3 1ga4 2bu'4

His surname was also sinified as 昔里 Xili (*Sili in Old Mandarin; the characters are phonetic symbols normally meaning 'long ago' and 'village' or 'Chinese mile'). Andrew explained the variation:

A more plausible explanation is that their family name was originally Tangut, pronounced something like Sili [which would be meaningless in Chinese], and that after Xiaoli Qianbu moved away from Hexi he sinified it to the similar-sounding [and meaningful] Chinese characters xiǎolǐ 小李 ['small plum'; *sew li in Old Mandarin], and invented the story of his ancestors being Shatuo Turks who had been given the Tang royal family name [Li].

I think there is another possible explanation. The alternation of *sew and *si in the Old Mandarinizations of his name reminds me of how Tangut -ew often appears as -i in Tibetan transcriptions. If Tibetans heard -ew, they could have transcribed it as -eHu (which is how they transcribed similar Chinese rhymes a couple or so centuries earlier). Yet the Chinese transcriptions of Tangut in the Timely Pearl clearly indicate a final *-w: e.g.,

0100 1lew1 'one' (which shares rhyme 44 with 3799 2sew1 though the tones are different)

transcribed in Tibetan as kli (x 3), gli (x 1), gliH (x 1)

transcribed in Chinese as 婁 *1lew1

The Tibetan and Chinese transcriptions reflected two kinds of Tangut dialects, one that shifted -ew to -i* and another that retained -ew. Xili and Xiaoli may be Sinifications of the same surname in those two types of Tangut dialects:

(*2sew1 2li3 >) 2si1 2li3 > Old Mandarin 昔里 *sili (= modern standard Mandarin Xili)
2sew1 2li3 > Old Mandarin 小李 *sewli (= modern standard Mandarin Xiaoli)

Perhaps one Tangut dialect was Xiaoli's native dialect and the other was the prestige dialect.

This kind of double transcription still exists today: e.g., someone may be known in English as both Wong and Huang, Cantonese and Mandarin versions of the same family name 黄.

As for 鈐部 Qianbu (Old Mandarin *kem pu), it too has variants: 甘卜/敢不 *gam bu and 紺部 *gam pu. The *-mb- and *-mp- sequences reflect the prenasalization of the initial b- [mb] of

2805 2bu'4 'to command'

Andrew proposed that 鈐 *kem has a double function: it means 'to stamp a seal' (and is hence reminiscent of Qianbu's Mongolian title daruγači which could be interpreted as 'presser'; see part III of my series on the title) and also sounds like

1531 1ga4 'army'

plus the [m] of 2bu'4 [mb ...]. (I am not certain about how the rhyme -u'4 was pronounced.) The Old Mandarin vowel *e may reflect the frontness of Tangut Grade IV a which may have been low front [a] or [æ] (as opposed to Tangut Grade I a which may have been low back [ɑ]).

Andrew interpreted

2893 2khwe1 'great'

following Xiaoli Qianbu's name on the stele as a noun 'great man' (i.e., 'official') rather than as an adjective (which was my interpretation).

In either case I agree with Li Fanwen (2008: 474) who regarded it as a loan from Chinese 魁 *1khwe1 with a similar semantic range: 'great' ~ 'great man'. The native Tangut word for 'great' is

4457 2leq3

which may be cognate to Old Chinese 大 *lats 'big' and 太 *l̥ats 'great'.

*A more exotic possibility is something like -iɰ with an unrounded glide coda (cf. Sofronov's -eɯ for rhyme 44). Such a glide had no equivalent in either Tibetan or Chinese and would be ignored in transcriptions.

However, I would not be surprised if -w was simply lost, as Tangut had already lost all other codas. A Tangut dialect that lost -w would have a simple C(w)V syllable structure (assuming that the mysterious quality indicated by ' was not a coda).

AN AGNOSTIC APPROACH TO TANGUT GRADES

Perhaps the most distinctive feature of the complex version of my Tangut transcription system is my use of numbers for grades. I got the idea from Arakawa Shintarō, who uses syllable-final numbers (e.g., the -2 of his -eq2) to distinguish between similar rhymes (but not grades):

Arakawa's grade
This site
Tibetan transcription
Chinese transcription
-e(H), -ye (once)
*-e1 (奈碎), *-e3 (兵丙), *-e4 (丁頂), *-u3 (sic!; 局)
*-o1 (桑郎)
-eq2 I
*-e2 (擺)
-yeq2 II
-er1 -e(H)
*-e1 (嵬乃), *-e2 (冷)
-eq'2 I
*-e3 (領), *-e4 (名酩)
-yeq'2 II
(r-) -e, r- -i (once), -aH (sic; once) *-e3 (領), *-e4 (名), *-i3 (你)
-a (sic; once)
*-o1 (我餓)

Note there is no Grade I -eq in Arakawa's transcription corresponding to his Grade I -eq2.

I also use syllable-final grade numbers in my notation for Tangut period northwestern Chinese for ease of comparison with Tangut.

In theory one could even rewrite other reconstructions using grade numbers to further facilitate comparisons: e.g.,

Gong (rewritten)
Arakawa (rewritten)
This site
-a: -a3a

Numerical notation enables us to look at reconstructions in terms of categories rather than phonetic details. All three generally* agree on the categorization of rhymes 17 and 18, but disagree on whether 19 and 20 are the same grade (Gong), variants of the same grade (Arakawa), or different grades (this site).

I have changed my mind about the phonetic interpretation of the grades several times, but I am almost certain that Grade II did not have -i- or -y-. Grade II tangraphs were not used to transcribe Sanskrit CyV-syllables. As far as I know, the only Grade II tangraph used to transcribe Sanskrit was

3144 1pa2 (= Gong's 1pia and Arakawa's 1pya)

and it stood for pā, not pyā. Moreover, there was another transcription for Sanskrit pā which was Grade IV:

3425 1pa4 (= Gong's 1pja and Arakawa's 1pa:)

That tells me Grade II syllables had some phonetic quality without any Sanskrit parallel. -2 indicates that quality without specifying what it was. For years I thought that quality was lowering and backing and more recently I thought it was a medial -ɤ-, but now I am not so sure. That quality originated at least in part from pre-Tangut medial *-r-, but Tibetan transcriptions of Grade II syllables do not contain any r, so *-r- must have become something else in Tangut by the early second millennium.

As for the other grades, I once thought that Grades I, III, and IV were characterized by (partly) lowered vowels, medial -ɨ-, and medial -i-. However, my reconstruction could not account for facts such as these:

- Sanskrit i was transcribed as

0932 i3 (which I reconstructed as ʔɨi, a diphthong that does not exist in Sanskrit) instead of i4 (which I reconstructed as i).

- Sanskrit -a and -ā were often transcribed with -a4 (which I reconstructed as -ia) as well as -a1 (which I reconstructed as -a): e.g., Skt pā (not pyā!) as 3425 1pa4 (see above).

- Similarly, Sanskrit -e was often transcribed as -e4 (which I reconstructed as -ie) instead of -e1 (which I reconstructed as -e).

Until I figure out how to reconstruct the grades in a manner that fits what is known about the Tangut transcription of Sanskrit - and the Tibetan transcription of Tangut and the Chinese grade system that influenced traditional Tangut phonological analysis - I will use numbers for grades in lieu of specific phonetic symbols.

*Arakawa (1997: 128-129) distinguished between two types of rhyme 17: -a and -a2 (whose 2 does not indicate a grade). There are a few minimal pairs such as

3755 1tsha 'mixed' (< Chn 雜): 5969 1tsha2 'hollow bag' (in Arakawa's notation)

which are not distinguished in Gong's or my notation. 3755 and 5969 had different fanqie in Tangraphic Sea and were in widely separated homophone groups in Tangraphic Sea (see 23.212 and 25.122 in Andrew West's online edition) and Homophones (29B31 and 35A58). I should indicate this type of rhyme-internal distinction somehow in my database: e.g., as -a1a and -a1b. The problem with using -a and -b is that it may imply that all rhymes ending in the same letter have something in common. Does 5969 1tsha1b have something in common with, say,

4917 1tshu1b 'shovel'

which was somehow different from

0916 1tshu1a 'conceited'?

Arakawa's notation does not distinguish between those two syllables which have different fanqie and are not homophonous in Homophones (32B16 and 33B37). In Tangraphic Sea, 4917 1tshu1b immediately follows 0916 1tshu1a (see 6.171 and 6.162 of Andrew West's edition), implying that the two may have been similar in a way that 3755 1tsha1a and 5969 1tsha1b were not.

TANGUT PHONETIC DATABASE VERSION 1.0

1586 1ghiq2 'sound'

I have been posting reconstructed Tangut readings on my blog for years, but I have never provided a complete list of readings until now. Strictly speaking, the forms in version 1.0 of my Tangut phonetic database (downloadable in three formats: .htm / .pdf / .xls ) are transcriptions rather than phonetic reconstructions. The letters and numbers symbolize phonetic distinctions but do not necessarily precisely represent them.

The database has sixteen columns:

A. LFW: Tangut character number from Li Fanwen's 2008 dictionary. L stands for Li Fanwen. Characters L5995-6000 from his 1997 dictionary are also included as L5995a-L6000a.

B. Simple: Transcription for lay use. No tone numbers (see column F), -q (see column K), apostrophes (see column M), or grade numbers (see column N).

C. Complex: Full transcription for scientific use. Includes tone numbers (see column F), -q (see column K), apostrophes (see column M), and grade numbers (see column N).

Question marks indicate missing information: initials, vowels, and grades.
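The relationship between columns B and C can be sketched as a small parser. This is a minimal sketch under my own assumptions, not the procedure actually used to build the database: I assume a complex form is a tone digit, then the segments, then an optional apostrophe, then a grade digit, and that q never occurs except as the tense marker. The names `parse_complex` and `to_simple` are hypothetical.

```python
import re

# Assumed layout of a complex form: tone, segments, optional prime ('), grade.
# '?' is allowed wherever information is missing.
COMPLEX_FORM = re.compile(
    r"(?P<tone>[0-9?])(?P<body>.+?)(?P<prime>')?(?P<grade>[0-9?])"
)

def parse_complex(form: str) -> dict:
    """Split a complex transcription such as 2bu'4 into its parts."""
    m = COMPLEX_FORM.fullmatch(form)
    if m is None:
        raise ValueError(f"not a complex transcription: {form!r}")
    body = m.group("body")
    return {
        "tone": m.group("tone"),
        "segments": body.replace("q", ""),  # -q marks tenseness, not a consonant
        "tense": "q" in body,
        "prime": m.group("prime") is not None,
        "grade": m.group("grade"),
    }

def to_simple(form: str) -> str:
    """Drop tone, -q, apostrophe, and grade to get the lay transcription."""
    return parse_complex(form)["segments"]

print(to_simple("2bu'4"))   # bu
print(to_simple("1ghiq2"))  # ghi
print(to_simple("2korn1"))  # korn
```

Note that -r and -n survive in the simple form because column B only excludes tone numbers, -q, apostrophes, and grade numbers.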

D. Class: The nine initial consonant types from Homophones and/or Mixed Categories of (the Precious Rhymes of the) Tangraphic Sea:

I 重唇音 Heavy lip sounds p- ph- b- m-    
II 輕唇音 Light lip sounds   v-
III 舌頭音 Tongue head sounds t- th- d- n-  
IV 舌上音 Tongue top sounds   (j-) (n-)
V 牙音 Tooth sounds* k- kh- g- ng-
VI 齒頭音 Tooth head sounds ts- tsh- dz-    s-
VII 正齒音 True tooth sounds ch- chh- j- sh-
VIII 喉音 Throat sounds Ø- h- gh-  
IX 來日音 L- and zh-sounds**   lh- l- z- zh- r-

Voiced obstruents may have been prenasalized: e.g., b- may have been [mb], etc.

The Class II consonant v- may have been [w], [ʋ], or [v]. There may have been another Class II consonant f- which is not included in the database. Five characters with h(w)- according to Tangraphic Sea fanqie are listed as Class II in Homophones. They may have had initial f-.

There is no consensus on the reconstruction of Class IV. This database tentatively follows Gong and has two Class IV initials which are identical to Class III n- and Class VII j-. Li Fanwen (1986) reconstructed a distinct palatal nasal. Nishida (1964) and Arakawa (1999) reconstructed a series of Class IV consonants.

Class VII consonants may have been retroflex [tʂ tʂʰ dʐ ʂ], alveopalatal [tʃ tʃʰ dʒ ʃ], or palatal [tɕ tɕʰ dʑ ɕ].

Class VIII glottal stop is not written in either the simple or complex transcription. The absence of an initial consonant in the transcription (a 'zero initial') indicates an initial glottal stop [ʔ].

What appears to be initial w- in the transcription is actually a Class VIII zero initial plus medial -w-. That sequence indicates the glottal stop-glide cluster [ʔw], not a simple glide [w].

Class VIII fricatives may have been velar [x ɣ] or glottal [h ɦ]. They may have had uvular allophones [χ ʁ].

Class IX lh- may have been voiceless [l̥] or a lateral fricative [ɬ].

Class IX z- may have been a lateral fricative [ɮ].

Class IX zh- may have been retroflex [ʐ], alveopalatal [ʒ], or palatal [ʑ].

Tai (2008) has made a convincing case for a sixth Class IX consonant ld- on the basis of Tibetan transcriptions of Tangut. I have not yet included this consonant in my database. It may have been a lateral affricate [dɮ]. I speculate it may have a voiceless counterpart lt- [tɬ].

E. Rhyme: The 105 rhymes of Tangut without regard for tones (see column F).

F. Tones: Translations of Tangut tone names which are in turn translations of traditional Chinese tone names. The names may not have described the contours of the tones: e.g., the 'level' tone may not have been level.

0. Tone unknown. (There is no tone zero.)

1. 平聲 'Level' tone.

2. 上聲 'Rising' tone.

(3. Reserved for the Tangut equivalent of the Chinese 去聲 'departing' tone. No tone 3 syllables are in the Tangut primary sources that I have on hand.)

4. 入聲 'Entering' tone. Rare tone in eleven syllables from a partial page of the Precious Rhymes of the Tangraphic Sea. More examples must have been on the rest of the page.

G. Subrhyme: Rhyme number taking tone into account. For example, subrhyme 1 of tone 1 is the first tone 1 rhyme, etc. There are 97 tone 1 subrhymes and 86 tone 2 subrhymes. Most rhymes have two subrhymes, but some have only one: e.g., rhyme 6 has no tone 2 subrhyme and rhyme 23 has no tone 1 subrhyme.

H. Initial consonant: See the list for column D above.

I. Medial consonant: -w- or zero. When no consonant precedes w- in the transcription, w- represents the cluster [ʔw], not [w].

J. Vowel: There are six vowel symbols:

i y u
e a o

These may have symbolized diphthongs and/or glide-vowel sequences in different grades (see column N). Heights and other qualities may also have varied by grade.

With one exception (see below), the symbol y represents a nonfront, nonlow, unrounded vowel: e.g., [ɨ ɘ ə ɯ ɤ ʌ], etc. Cf. the use of y to romanize Russian ы.

Rhyme 105 -wya may have been [ɥa]. w and y respectively indicate the labiality and palatality of the glide [ɥ].

K. Cycle: zero, -q-, or -r-.

Tangut rhymes can be grouped into at least three cycles. The term 'cycle' refers to a somewhat fixed sequence of vowels within each cycle (ideally u, i, a, y, e, o). I have adopted Gong's groupings:

1. The basic cycle (rhymes 1-60). Unmarked.

2. The tense cycle (rhymes 61-76) indicated with -q (which is not a consonant). Tension is often indicated by a subscript dot in Tangutological literature, but Arakawa's symbol -q is easier to type and read. There is no agreement on the ending of the tense cycle: e.g., Sofronov ended it at rhyme 75, and Arakawa ended it at rhyme 79.

3. The retroflex cycle (rhymes 77-103) indicated with -r (which indicates vowel retroflexion and is not a consonant). Nearly all r-syllables are in the retroflex cycle. I note exceptions in column P.

Kychanov and Sofronov also recognize a fourth cycle (rhymes 99-105).

Rhymes 104-105 are like cycle 1 rhymes: i.e., they are neither tense nor retroflex. Kychanov and Sofronov regarded them as part of the fourth cycle.

L. Nasal/w:

Final -n indicates nasalization of the previous vowel and is not a final nasal [n].

Final -w is a glide [w]. It is in complementary distribution with -n, so I have combined the two into one column to save space. Historically both are traces of earlier codas: -n is from earlier nasals and -w is from *-k as well as *-w.

M. Prime: An apostrophe serving as an easily typed substitute for the prime symbol representing an unknown distinctive phonetic quality. Gong interpreted this quality as vowel length, though there is no correlation between prime and vowel length in Tangut transcriptions of Sanskrit.

N. Grade: Although grades are not explicitly identified in the Tangut phonological tradition to the best of my knowledge, correlations between certain Tangut rhymes and the four grades of Chinese traditional phonology have been recognized for over a century. These correlations have been interpreted as evidence for a Chinese-like system of grades in Tangut. There is no agreement on the number of grades or their phonetic qualities. Here I generally follow Gong's classification with two exceptions:

- I regard rhymes 4 and 6 as Grade II, not Grades I and III.

- I divide Gong's Grade III into Grades III and IV.

There are strong but not absolute correlations between consonant classes and grades. The typical pattern is as follows:

Grade\Class I/III/V/VIII, lh-, z- II, l- IV/VII, zh- VI, r-

I list exceptions to this pattern in the notes (column P).

O. Variant: Number of main character entry in Li Fanwen (2008) for a variant. E.g., the main entry for L0486 is L0017.

P. Notes (which I was unable to fit into the PDF version):

1. All instances of Class VIII h-syllables listed as Class II in Homophones.

2. All initial consonants followed by rhymes with unexpected grades.

3. R-syllables outside the retroflex cycle.

4. Miscellaneous and not comprehensive:

- Possible duplicate character (L6000a).

- Uncertain initial (L3358).

- Typos in Li Fanwen 2008 (L3975, L4718)

- Loan from/transcriptions of Chinese from L5995a onward

- Ghost characters in Nevsky and Sofronov.

- Important and interesting Sanskrit transcription characters.

- Variant second character for the Tangut autonym Tibetanized as Mi-nyag (L0831).

- Chinese phonetic approximations for L2583 (possibly indicating a -w- not in Gong's reconstruction) and L2955

*2.2.6:16: The Chinese term 牙音 'tooth sounds' literally translated into Tangut as

0039 1586 2korn1 1ghiq2 'tooth sound'

refers to velars which are not pronounced with the teeth.

The Chinese and Tangut terms for classes VI and VII contain different words for 'tooth': 齒 and

0169 1shwi3 'tooth'.

**2.2.6:50: Also known as 流風音 'flowing wind sounds', the name used by Nishida and Arakawa. I have not seen any Tangut term for Class IX containing a word meaning 'flow'. The term I am familiar with is

2302 5425 1586 1ly3 2zheq3 1ghiq2 'wind (l-) and zhe-sounds'

which is a Tangut equivalent of Chinese 來日音 'l- and zh-sounds'.

1ly3 'wind' may tell us that the Tangut thought laterals sounded like the wind.

2zheq3 is a transcription character without any meaning.

A STRANGE SERVANT WHO'S HALF A CALF

According to various secondary sources (Sofronov 1968 II, Li Fanwen 1986, 1997, 2008, Shi et al. 2000), Tangut 1449 'servant' has rhyme 2.40, yet its fanqie in a primary source (Mixed Categories of the Tangraphic Sea) indicates it has rhyme 2.80 -or1:


1449 2chhor1 = 5886 1chhon2  'to steal' + 1237 2vor1 'calf'
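The fanqie above can be read mechanically: the initial comes from the first speller and the tone and rhyme come from the second. Here is a minimal sketch of that reading in my complex transcription. The function names are hypothetical, the initial inventory is the one from column D of my database (without the disputed f-, ld-, and lt-), and I assume that everything after the initial, including any medial -w-, belongs to the final.

```python
# Initials from column D, longest first so that chh- wins over ch-, etc.
INITIALS = sorted(
    ["p", "ph", "b", "m", "v", "t", "th", "d", "n", "k", "kh", "g", "ng",
     "ts", "tsh", "dz", "s", "ch", "chh", "j", "sh", "h", "gh",
     "lh", "l", "z", "zh", "r"],
    key=len, reverse=True,
)

def split_syllable(form: str) -> tuple[str, str, str]:
    """Split a complex form into (tone, initial, final incl. grade)."""
    tone, rest = form[0], form[1:]
    for initial in INITIALS:
        if rest.startswith(initial):
            return tone, initial, rest[len(initial):]
    return tone, "", rest  # zero initial (glottal stop, not written)

def fanqie(initial_speller: str, final_speller: str) -> str:
    """Combine the initial of one speller with the tone and final of the other."""
    _, initial, _ = split_syllable(initial_speller)
    tone, _, final = split_syllable(final_speller)
    return tone + initial + final

# 5886 1chhon2 'to steal' + 1237 2vor1 'calf'
print(fanqie("1chhon2", "2vor1"))  # 2chhor1
```

The output has Grade I -or1 after chh- only because the final speller 'calf' has it; the sketch simply makes explicit why the fanqie conflicts with the rhyme 2.40 of the secondary sources.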

Perhaps the incorrect rhyme 2.40 originated with Sofronov (1968 II: 375) who may have been the first to identify the rhymes of the final speller 'calf' and its homophone

0387 2vor1 'the surname Vor'

as 2.40 -i̭eɯ (= my -ew3, not my -or1). Was 4 a typo for 8? In any case, the Precious Rhymes of the Tangraphic Sea lists 'calf' under rhyme 2.80. (Unfortunately, 'servant' is in the Mixed Categories section of the Precious Rhymes of the Tangraphic Sea which is organized by initial class* and not by rhyme, so its rhyme has to be inferred from its fanqie.)

Gong regarded rhyme 2.80 as Grade I. Normally Grade I rhymes do not follow Class VII initials like chh-, yet 'servant' has the unusual combination chh- + 2-or1 even though it is not a known loanword or transcription character (i.e., a character for an un-Tangut syllable).

I would expect Grade II/III/IV rhyme 2.81** (as listed in Kychanov and Arakawa 2006: 289) after chh-. Gong's reconstruction in Li Fanwen's 1997 and 2008 dictionaries contains rhyme 2.81 -jwor (= my -or2 and -or3), even though it is next to the rhyme number 2.40 for Gong's -jiw (= my -ew3).

Could the fanqie of 'servant' be an error that tells us rhymes 2.80 and 2.81 were similar? Or did 'servant' truly have an exceptional reading 2chhor1 instead of 2chhor2 or 2chhor3?

*2.1.2:18: 'Servant' is in the Mixed Categories section for syllables with tone 2 and Class VII initials (ch-, chh-, j-, sh-). For some unknown reason, j-syllables are only in Mixed Categories and not the other two volumes of the Precious Rhymes of the Tangraphic Sea. A random syllable with a Class VII initial in Mixed Categories is likely to have j-. Perhaps that is why Sofronov (1968 II: 298) reconstructed 'servant' with initial ndź- (= my j-). But 5886 'to steal', the fanqie initial speller of 'servant', has chh-. That initial can be confirmed by the Tibetan transcription chi [tɕʰi] for

3465 1chhi3 'meat'

the fanqie initial speller of 'to steal'. I do not know why 'servant' with chh- was placed in Mixed Categories instead of the Precious Rhymes of the Tangraphic Sea volume for the second tone.

**2.1.2:45: Grade II rhymes like -or2 are normally distinguished from Grade III/IV rhymes like -or3 and -or4. I do not know why they were grouped together:

Rhyme 96 1.91 1-or2 1-or3 1-or4
2.82 2-or2 2-or3 2-or4

Such clustering may indicate that Grades II-IV were more phonetically similar to each other than to Grade I: e.g., in Gong's reconstruction, Grade II had medial -i- and Grade III (= my Grades III and IV) had medial -j-. There are no rhymes combining Grade I with other grades.

There are minimal pairs of rhyme 1.91 syllables such as


4798 1mor2 'transcription character'*** : 1543 1mor4 'true'

which Gong interpreted as a distinction between Grade II mior and Grade III mjor (he did not recognize Grade IV). There are no such minimal pairs of rhyme 2.81 syllables, though that is probably due to chance. Generally there are fewer tone 2 syllables than tone 1 syllables in a rhyme category. (Nonetheless there are a few rhyme categories consisting solely of tone 2 syllables: e.g., there are no tone 1 syllables with rhyme 23.)

***2.1.2:51: According to the Tangraphic Sea, 4798 1mor2 is for transcribing mantras. However, Grade II tangraphs are almost never used in transcribing Sanskrit. That implies Grade II had a phonetic quality absent from Sanskrit. A rare exception is the transcription character

0310 2mu2

for Sanskrit mū. What Sanskrit syllable was 4798 1mor2 devised to write?

單于 *DAR-UGHA? PART V: THE SPOUSE SUPREME

Having just mentioned the Central Asian title qaɣan and Old Chinese (OC) 烏 *qˤa 'crow' in my last post, I thought I should write something about 閼氏, the transcription of the Xiongnu title for the wife of the supreme ruler (單于; see parts I-IV below).

The phonetic of 閼 is 於 *qa 'in' (a near-homophone of 烏 *qˤa 'crow'; both 烏 and 於 are drawings of crows). Baxter and Sagart (2014: 40) reconstructed 閼 as *qˤat and 氏 as *k.deʔ. It is tempting to conclude that *qˤat k.deʔ transcribed the Xiongnu prototype of the Central Asian title qatun 'queen', particularly if *k.d- had been simplified to *d- in the Han Dynasty.

However, the late Eastern Han scholar 蘇林 Su Lin phonetically glossed 閼氏 as 焉支 *ʔɨen kie, implying an alternate earlier OC reading *ke for 支. Guangyun (1008) lists the Middle Chinese reading *tɕie < OC *ke for 氏 in 閼氏. The modern Mandarin reading yānzhī for 閼氏 is a reflex of Su Lin's *ʔɨen kie. (OC *qˤat k.deʔ would have become Mandarin *èshì.)

Moreover, Old Chinese *q- had already shifted to a glottal stop by the time 閼氏 first appears in the Records of the Grand Historian (c. 109 BC) in which 'Alexandria' was transcribed as 烏弋山離. If 烏 transcribed a foreign word with a zero initial, then 閼 (which shared an initial with 烏 *qˤa) must also have transcribed a foreign word with a zero initial (or initial glottal stop or, less likely, ʕ-). In that case, Former Han 閼氏 *ʔˤat deʔ / ke could not have transcribed a Xiongnu word like qatun (or even *χatun).

Would alternate readings of 閼 help salvage the qatun hypothesis? 閼 has four Middle Chinese readings in Guangyun:

MC *ʔɑt  < OC *qˤat (my *qat, Zhengzhang's *qaːd)

MC *ʔɨat  < OC *qat (my *Cɯ-qat, Zhengzhang's *qad)

MC *ʔɨen  < OC *qran or *qren (my *Rɯ-qan or *Rɯ-qen, Zhengzhang's *qran)

MC *ʔen  < OC *qˤen (my *qen, Zhengzhang's *qeːn)

The OC readings other than OC *qˤat are not in Baxter and Sagart (2014) but are my attempts to project the MC readings back into OC using their system.

The last two MC readings are listed for 閼氏 in Guangyun.

All readings would have had an initial glottal stop during the Former Han: *ʔˤat, *ʔat, *ʔran ~ *ʔren, *ʔˤen.

The final *-t or *-n could have been an attempt to transcribe Xiongnu *l. Vovin (2002: 392) identified 閼氏 as a transcription of the Xiongnu cognate of Proto-Yeniseian *ʔal/r'it 'wife'.

If Old Chinese still had *-r (and the transcription of 'Alexandria' may indicate otherwise*), then the use of a *-t/-n graph instead of an *-r graph points to *l instead of *r in the Xiongnu word. Otherwise, *-t/-n could indicate either *l or *r in Xiongnu.

Other segments may be problematic for the *ʔal/r'it hypothesis:

Proto-Yeniseian *a does not match *e in the OC ancestors of the readings prescribed by Guangyun. But maybe the first Xiongnu vowel was higher than *a: e.g., *æ, *ɛ, or even *e.

The final *-t of the Proto-Yeniseian word matches the *d- of 氏 *deʔ but not the *k- of the reading *ke that would be the source of the readings later specified by Su Lin and Guangyun.

Moving on to semantics, did Xiongnu elevate 'wife' to 'empress' (cf. how Proto-Indo-European *gʷēn 'woman' was elevated to English queen), or did 'empress' become 'wife' (cf. the downward shift of status of English lady)? Or was the word 'wife' in Xiongnu and interpreted as 'empress' by the Chinese?

If the Xiongnu title transcribed as 閼氏 is not the source of qatun, where did qatun come from? I am not convinced that the Xiongnu title transcribed as 護于 is the source of qaɣan (see part IV), but I do favor Vovin's (2007: 184) speculation that qatun is qan 'ruler' with a Rouran feminine infix *-tu-. Alas, as Vovin noted, "[a]lmost nothing is known about the Ruan-ruan [= Rouran] language", so that hypothesis cannot be tested.

(The above statement does not rule out a Yeniseian origin for qaɣan and/or qan. Perhaps there was a Xiongnu phrase 'great ruler' that sounded like qaɣan, but that phrase was unrelated to the Xiongnu word underlying 護于 and was not an official title in the Xiongnu language [or it somehow escaped the notice of the Chinese]. The Rouran could have been the first to make qaɣan a title.)

*Baxter and Sagart (2014) reconstructed 山 corresponding to -xan- in 'Alexandria' with final *-r. I expressed my doubts here. But if they are correct, then Early Old Chinese *-r must have shifted to *-n in the Former Han dialect underlying 烏弋山離 unless the transcriber misheard 'Alexandria' as 'Alexaria'.

單于 *DAR-UGHA? PART IV: QA-UGHA

I thought of one more reason not to reconstruct Old Chinese 單于 as *dar-ʔu-ɣa. If 護于 'crown prince' (with the same second character) is a transcription of the Xiongnu prototype of the Central Asian title qaɣan, then it is unlikely that 護于 was *ʁwah-ʔu-ɣa, unless a trisyllabic Xiongnu word was compressed into a disyllabic word in Turkic and Mongolic.

Moreover, if Vovin (2007: 180) is correct, qaɣan is from the Xiongnu cognate of Proto-Yeniseian *qɛʔ 'great' plus the Xiongnu cognate of Proto-Yeniseian *qʌ̄j ~ *χʌ̄j 'ruler' with allophonic medial voicing and a Mongolic suffix -n; no *u would be expected in such a compound (unless it was a Xiongnu linking vowel). Vovin (2007: 181) suggested that medial *-w- could have been lost in 'ruler' during the two millennia separating the Xiongnu from the modern Yeniseian languages.

Vovin regarded 'ruler' as being the second half of 單于 whose first half he identified as the Xiongnu cognate of Yeniseian tɨr-words for 'lower reaches of the Yenisei, north'. (He did not provide a Proto-Yeniseian source for those words.) He translated 單于 as 'Ruler of the North' (as opposed to the Chinese in the south).

That brings to mind two questions.

First, is there another system of titles in which the 'ruler of a region' outranks a 'great ruler'?

Second, how did 'great ruler' come to outrank 'ruler of the north' (which may have been the prototype of early Turkic tarxaːn and Written Mongolian <darqan>, if not Written Mongolian <daruγ-a>)?

A third question is why neither 護于 nor 單于 contain Han Dynasty uvulars corresponding to the q of *qʌ̄j, qaɣan and <darqan>. By that time, the original Old Chinese uvular stops had been lost, and new uvulars had developed from velars: e.g.,

*q- > *ʔ- (e.g., 烏 'crow'; more here)

*Cʌ-k-, *k- + lower vowel > *q-

*ɢʷ- > *ʁw- (e.g., 護)

*Cɯ-ɢʷ- > *ɣw- (e.g., 于)

The first certain attestation of qaɣan in Chinese transcription has such secondary uvulars: 可寒 *qʰɑˀʁɑn ~ *qʰɑˀɢɑn (Book of the Later Han, 5th century AD; Baxter and Sagart reconstructed 可 and 寒 with velars in Old Chinese, though I think 可 may have been uvular all along).

Did Xiongnu *q weaken to *χ? Does the variation in later forms of the titles reflect borrowing and transcription before and after the shift or from different dialects of Xiongnu? Even if Xiongnu had both *q and *χ at different times and/or in different dialects, that still doesn't explain the voiced initial of 護于 *ʁwah-ɣwa.

I noted on Tuesday that 單于 had type B syllables corresponding to what are type A syllables from a Chinese perspective in <daruγ-a>. tarxaːn and <darqan> also have type A syllables. How did a Xiongnu BB word become an AA word in 'Altaic'? Conversely, if the word were AA in Xiongnu, why was it transcribed as BB in Chinese?

護于 has a different kind of mismatch; 護 is type A and 于 is type B, but qaɣan is a sequence of type A syllables (as is its later Chinese transcription 可寒). Was the Xiongnu title harmonized in Turkic and Mongolic? Harmonizing (*AB > AA) is understandable (and is the key to my Old Chinese reconstruction), but a complete flip-flop (*BB > AA) is not.

單于 *DAR-UGHA? PART III: THE OPPRESSOR

I thank Andrew West for pointing out the biggest problem with my already discredited Old Chinese 單于 *dar-ʔu-ɣa idea: the fact that Written Mongolian ᠳᠠᠷᠤᠭ᠋᠎ᠠ <daruɣ-a> 'chief' has an internal etymology: it is from the verb <daru-> 'to press' plus a suffix <ɣ-a> that "expresses an unfinished action which started in the past and continues into the present" (Poppe section 362; the hyphen in transliteration indicates a break in the traditional Mongolian script, not a morphemic boundary). If <yabu> is 'to go' and <yabuɣ-a> is 'someone who started going and is still going', then a <daruɣ-a> is 'someone who started pressing and is still pressing': i.e., an oppressor (Dashdondog 2010: 105).

Although I assume the Xiongnu empire included Mongolic speakers, it is unlikely that its non-Mongolic rulers would adopt a title for their supreme ruler from the Mongolic language of the ruled. Therefore the resemblance of 單于 *dar-ɦwa (as reconstructed by Baxter and Sagart 2014: 260) and <daruɣ-a> is coincidental.

Pulleyblank (1963: 257) thought that <daruɣ-a> might have been a direct borrowing from Xiongnu (unlikely for the reasons I just stated unless the derivation from 'to press' is a folk etymology) unlike early Turkic tarxaːn (borrowed into Written Mongolian as <darqan>; see Clauson 1972: 539-540) which might have been an indirect borrowing (perhaps via the Rouran). If the two words are unrelated, perhaps the Turkic form is the true successor of 單于. Although Turkic t- does not match Old Chinese *d-, Written Mongolian <darqan> may be from an unattested Turkic variant with *d-. (Clauson listed no Turkic forms with d-.)

單于 *DAR-UGHA? PART II: THE XIONGNU SON OF HEAVEN

Last night I left out one other argument why 于 could not have been something like *ʔu-ɢa in Old Chinese. If Late Old Chinese *-w- had developed from *Cu-presyllables, I would expect a lot of word families with zero ~ *-w-alternations like those of Tangut: e.g.,

1671 1ne < *Cɯ-ne 'red' : 2765 1nwe < *Pɯ-ne 'to turn red'

See Gong (2002: 45-46) for more examples. However, Gong noted that "such examples are exceptional in Chinese": e.g.,

熱 Middle Chinese *ɲiet 'hot' : 爇 Middle Chinese *ɲwiet 'to burn'

Baxter and Sagart (2014: 47) reconstructed this pair as *C.nat and *not which are equivalent to *Cɯ.nat and *Cɯ.not in my system*. What was originally an alternation of root vowels became an alternation of medials; there is no need to explain it in terms of a *Cu-presyllable: e.g.,

*Cu.not > *Cɯ.not > *ɲwiet

*Cu.nat > *Cu.nwat > *ɲwiet

There is only one zero ~ *-w-alternation in phonetic series 0097 于: 荂 has two Middle Chinese readings, *kʰwæ and *xɨə. All other 于-graphs have *w. Is it likely that all 于-readings were once *Cu-Qa (later becoming Middle Chinese *(C)w-syllables) with the sole exception of *Cɯ-qʰa which became *xɨə? It is simpler to reconstruct 0097 as a *Qʷ-series and regard the second reading of 荂 as an outlier which may be from a Middle Chinese dialect whose ancestor lost *-w- under certain conditions.

Worse yet, I know of no word families with zero ~ *-w-alternations written with phonetic series 0097: e.g., there is nothing other than the transcription 單于 suggesting that 于 'to go' had a root *ɢa. And there is no guarantee that the Xiongnu word underlying 單于 was like its much later Written Mongolian descendant ᠳᠠᠷᠤᠭ᠋᠎ᠠ <daruɣ-a> 'chief'; 于 could have transcribed Xiongnu *ɣwa with *-w-.

I would not be surprised if Xiongnu had a cluster *ɣw reduced to <ɣ> in Written Mongolian because Xiongnu *Tr- in the full title of the supreme ruler

𢴤黎孤塗單于 *Tˁraŋ rˁi kˁwa lˁa dar ɣwa 'heaven son chanyu'

was simplified to <t> in Written Mongolian ᠲᠩᠷᠢ <tngri> 'heaven'.  (The initial of 𢴤 could have been *tˁr-, *tʰˁr-, or *dˁr-. See Pulleyblank 1963: 241.)

The initial of 孤塗 *kˁwa lˁa, the Chinese transcription of the Xiongnu word for 'son', was probably [q], as Early Old Chinese *q had become a glottal stop by this point (as in Egyptian Arabic). Could Ket qalek' 'younger son, grandson' be cognate to Xiongnu *qwala? As Pulleyblank (1963: 245) wrote,

Being a word for a fundamental human relationship, it ['son'] is unlikely to be a loanword in Yenisseian and unless it is an extraordinary coincidence it creates a presumption that the Hsiung-nu [Xiongnu] belonged to that language group.

*Given that the phonetic of 熱 and 爇 is 埶 *ŋ̊et-s (= my *Cɯ-ŋ̊et-s or *Hɯ-ŋet-s) with a velar nasal, perhaps their root could also be reconstructed with a velar nasal as *ŋjat ~ *ŋjot. The *-j- would condition the palatalization of *ŋ-. There is no *-j- in Baxter and Sagart's 2014 reconstruction.

單于 *DAR-UGHA?

It would be interesting to reexamine all of the Late Old Chinese transcriptions of Xiongnu words from Pulleyblank's seminal 1962-1963 article on Old Chinese consonants.

Baxter and Sagart (2014: 260) reconstructed one of those words as

單于 Han *dar-ɦwa 'Xiongnu ruler' (< Early Old Chinese *[d]ar + *ɢʷ(r)a)

The word is better known by its modern Mandarin pronunciation chanyu (shanyu in Giles' 1892 dictionary). It survives in Written Mongolian as ᠳᠠᠷᠤᠭ᠋᠎ᠠ <daruɣ-a> 'governor'. (Has the word been found in Khitan?)

Three notes:

1. Both 單 and 于 are type B syllables. Norman (1994) proposed that the type A/B distinction in Old Chinese was one of pharyngealization: type A syllables were pharyngealized, and type B syllables weren't. Norman drew parallels with 'Altaic' languages, and I have wondered if the phenomenon spread from Chinese to 'Altaic', as it is reconstructible at the Middle Old Chinese level (or even at the Early Old Chinese level according to Baxter and Sagart): i.e., it existed in Chinese during a period long before the first attested 'Altaic' language.

It is commonly assumed that Xiongnu was a typologically 'Altaic' language. If it was, and if 單于 was an accurate transcription of the Xiongnu word for 'ruler', then that word should have contained type B (i.e., nonpharyngealized) syllables in Xiongnu as well as in Late Old Chinese. However, Written Mongolian <daruɣ-a> contains the equivalents of type A syllables. Did the word shift from type B to type A during the many centuries separating 單于 and <daruɣ-a>? Or is 單于 simply an inaccurate transcription?

2. Did 單 still end in *-r during the Han Dynasty? According to Starostin (1989), *-r became *-n in what he called 'Classical Old Chinese' prior to the Han Dynasty. I have not yet seen any Han Dynasty rhyme evidence for *-r. Nonetheless there is independent evidence for *-r-retention in some dialect(s) at a much later date: e.g., Muromachi Period (!) Japanese soroban still resembles 算盤 Old Chinese *[sˁ]orʔ-s [bˁ]an 'abacus'; the *-r- corresponds to *-n- in prestigious Middle Chinese *swanʰ ban from centuries earlier. Perhaps *-r was retained in nonprestigious dialects on the margins: e.g., the presumably eastern dialect that was the source of soroban. The transcription 單于 could be based on a northern dialect of Chinese spoken near the Xiongnu state.

3. Written Mongolian has <ugh> corresponding to *ɦw (or *ɣw?) in the Chinese transcription. The labial and fricative segments are in the opposite order.
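
The type comparison in note 1 can be spelled out as a small check. This is only a minimal sketch, assuming syllable types are written as strings of A and B; the function name and the labels 'harmonization' and 'flip-flop' are illustrative shorthand, not established terminology.

```python
def classify_shift(source: str, target: str) -> str:
    """Classify a change between syllable-type patterns such as 'AB' -> 'AA'.

    'A' = type A (pharyngealized), 'B' = type B (nonpharyngealized).
    The category labels are hypothetical names used only for illustration.
    """
    if len(source) != len(target):
        raise ValueError("patterns must have the same number of syllables")
    if source == target:
        return "no change"
    # Harmonization: every syllable ends up with a type already present in the source.
    if len(set(target)) == 1 and target[0] in source:
        return "harmonization"
    # Flip-flop: every syllable changes its type.
    if all(s != t for s, t in zip(source, target)):
        return "flip-flop"
    return "other"


# 單于 is a sequence of two type B syllables; <daruɣ-a> has type A equivalents.
print(classify_shift("BB", "AA"))
# A mixed pattern leveling to one of its own types is a different kind of change.
print(classify_shift("AB", "AA"))
```

Under these assumptions the shift that 單于 > <daruɣ-a> would require comes out as a flip-flop, whereas a mixed pattern leveling to AA counts as harmonization.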

Unlike Baxter and Sagart, I don't think Old Chinese originally had a distinction between pharyngealized and nonpharyngealized uvulars. Their nonpharyngealized *ɢʷ- corresponds to my *Cɯ-ɢʷ- (details here). *ɯ is my cover symbol for an unknown high vowel. What if 于 was *ʔu-ɢa which later underwent metathesis?

Old Chinese *ʔu-ɢa > *ʔu-ɣa > *ʔu-ɣua > *ɣua > Middle Chinese *wuo

Then 單于 would be *dar-ʔu-ɢa (or ...ɣa?) which would be a near-perfect match for Written Mongolian <daruɣ-a>.
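
The hypothetical chain for 于 can be sketched as a sequence of ordered string rewrites. This is a minimal sketch only: the rule labels and the derive function are names introduced here for illustration, and the rules simply restate the stages of the chain rather than forming an independently motivated set of sound laws.

```python
# Ordered rewrite rules: (label, pattern, replacement).
# The labels are hypothetical; the forms come from the chain for 于.
RULES = [
    ("lenition of the uvular stop", "ɢ", "ɣ"),
    ("labial spread from the presyllable", "u-ɣa", "u-ɣua"),
    ("presyllable loss", "ʔu-", ""),
    ("Middle Chinese outcome", "ɣua", "wuo"),
]


def derive(form, rules=RULES):
    """Apply each rule once, in order, and return every intermediate stage."""
    stages = [form]
    for _label, pattern, replacement in rules:
        form = form.replace(pattern, replacement)
        stages.append(form)
    return stages


print(" > ".join("*" + stage for stage in derive("ʔu-ɢa")))
```

The order matters: labial spread has to apply before the presyllable is lost, since otherwise nothing remains to supply the labial segment.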

There are several problems with the metathesis hypothesis:

First, Baxter and Sagart (2014) did not reconstruct presyllables with *ʔ-. Perhaps some of their *C- are glottal stops, but that remains to be confirmed.

Second, according to my theory of the origins of the type A/B distinction, 單 should also have a presyllable since I reconstruct them to shift *a-syllables from type A to type B:

Old Chinese *Cɯ-dar > *Cɯ-dɨar > *dɨan > *dian > *dien > Middle Chinese *dʑien > Mandarin chan ~ shan

Yet such a presyllable would correspond to zero in Written Mongolian <daruɣ-a> which I assume did not undergo apheresis.

One possibility is that the voicing of the initial is secondary and is from a nasal prefix that was absorbed before 于 lost its presyllable:

*Nɯ-tar > *Nɯ-tɨar > *Ntɨar > *dɨar

The normal reading of 單 was *Cə.tˤar (= my *Cʌ.tar) with *-t-.

The trouble with this scenario is that the presyllable had to condition vowel warping before its unstressed vowel was lost. Did the Xiongnu word have a diphthong like *ɨa in its first syllable?

Another possibility is that dental initials like *d- were normally type B unless a conditioning factor (a low-vowel presyllable?) was present (cf. how I think uvulars were normally type A unless a high-vowel presyllable was present).

Third, the labiality of 于 'to go' was probably in the root if it is cognate to Written Tibetan Hgro 'to go' and Written Burmese ကြွ <krva> 'to go (honorific)' (Gong 1994: 81) or to wa-type words for 'to go': e.g., Tangut

0676 1ve3 'to go' < *CE-wa(ŋ)

So the Old Chinese root initial must have been labiouvular rather than simply uvular.

Could Written Mongolian <ruɣ> be a simplification of a Xiongnu cluster like *rɣw that was faithfully reproduced in the Chinese transcription?

TANGUT RHYME 41: SMALL PERSON, BIG PROBLEM

I have long been troubled by my reconstruction of Tangut rhyme 41

3798 1tsen1 'small' (which looks like 'person' + 'small' but is from 'few' + 'small')

because the evidence points in different directions:

1. Internal evidence

There are only fourteen known rhyme 41 tangraphs in eight homophone groups: seven in the first ('level') tone volume of the Tangraphic Sea and one with dz- in the Mixed Categories volume:

[Table: the rhyme 41 homophone groups in the 'level' tone volume of the Tangraphic Sea and in the Mixed Categories of the Tangraphic Sea]

The low frequency of this rhyme and the absence of a 'rising' tone counterpart *2-en1 imply that it could not have been something simple like -e.

The circles divide some but not all homophone groups. The reasoning for the implied grouping of, for instance, 1phen1 and 1den1 as distinct from 1ben1 is unknown. (I would understand if 1phen1 and 1ben1 were grouped together since both had Class I [i.e., labial] initials.)

The coexistence of v- and l- with dz- indicates that rhyme 41 must have been Grade I. They do not normally coexist in any other grade:

v-, l-




(The table above only shows the general pattern. There are exceptions.)

Tangut rhymes normally form sequences with the following order:

- Grade I-IV V

- Grade I-IV V'

- Grade I-IV Vn

Rhyme 41 is where I would expect 1en1:

- Rhymes 34-37: -e1, -e2, -e3, -e4

- Rhymes 38-40: -e'1, -e'2, -e'3/e'4

- Rhyme 41: -en1?

- Rhymes 42-43: -en2, -en3/en4

Yet as we will see, no other evidence supports a nasal vowel. (My -n indicates nasalization; it is not a consonant.)
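
The positional argument can be restated as a lookup. This is a minimal sketch, assuming the -e rhymes fall into the three series in the order V, V', Vn; the dictionary and the locate function are hypothetical names used only for illustration.

```python
# Rhyme numbers of the -e group, laid out by the expected series order.
# The grouping restates the list of rhymes 34-43; it adds no new data.
E_GROUP = {
    "V": [34, 35, 36, 37],
    "V'": [38, 39, 40],
    "Vn": [41, 42, 43],
}


def locate(rhyme):
    """Return (series, position within the series) for a rhyme in the -e group."""
    for series, numbers in E_GROUP.items():
        if rhyme in numbers:
            return series, numbers.index(rhyme) + 1
    raise KeyError(f"rhyme {rhyme} is not in the -e group")


print(locate(41))
```

On this layout rhyme 41 opens the Vn series, which is the only reason internal to the rhyme order for writing it with -n.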

2. Chinese transcription evidence

As I already noted in my entry on line 104 of the Golden Guide,

1720 1ven1

was a transcription character for the Chinese Grade III (not I!) surname 隗 *2wi3 in the Tangut translation of Sunzi.

1720 also transcribed Chinese Grade I  外 *3wai1 in Sunzi and the Timely Pearl and Grade III (not I!) 偉 *2wi3 in the Ganying Pagoda inscription.

Sofronov (1968 II: 30) listed 1720 as a transcription of Chinese Grade I 磑 *1we1 ~ 3we1.

None of those sinographs would have been read with nasal vowels in the dialect known to the Tangut (unless the nasality of *ŋ-, the former initial of 外 and 磑, spread to the vowel - but 隗 and 偉 never had nasal initials).

No other rhyme 41 tangraphs were used to transcribe Chinese to the best of my knowledge.

In the Timely Pearl,

3798 1tsen1 'small'

was transcribed as 栽 *1tse1, which may indicate that the nasality of rhyme 41 was lost by the end of the 12th century in the author's dialect. I do not know of any other

3. Tibetan transcription evidence

All three Tibetan transcriptions known to me lack nasals:

dwi, dwe (Nishida 1964: 53; frequency of each unknown; note the absence of -w- in the reconstructions!)

-eH (Tai 2008: 216; initial consonant unknown)

4. Sanskrit transcription evidence

As far as I know, rhyme 41 tangraphs were never used to transcribe Sanskrit. That implies rhyme 41 was unlike anything in Sanskrit: e.g., it was not short i, long ī, or long e [eː] with or without a following nasal or nasalization. (Sanskrit has no short e.)

5. Comparative evidence

Guillaume Jacques (2014: 186) compared

3798 1tsen1 'small'

to Japhug xtɕi < 'id.' without a nasal, but noted the unexpected initial correspondence (cf. Somang kə-ktsî 'id.' with ts-). He derived rhyme 41 from pre-Tangut *-ij without a nasal. Is it possible that Japhug and Somang lost a final nasal?

6. Conclusion

For the time being, I weigh the internal evidence over the external evidence and write rhyme 41 as 1-en1, but I remain uneasy about nasality.

An alternative is to follow Gong and write rhymes 41-43 with -i or -y instead of -n:

41. -ei1 (instead of -en1; cf. Gong's -əj)

42. -ei2 (instead of -en2; cf. Gong's -iəj)

43. -ei3/-ei4 (instead of -en3/-en4; cf. Gong's -jɨj)

I would then change my -on rhymes to -ou or -ow (cf. Gong's -ow) so that there are no nasalized mid vowels.

However, I do not understand why Gong reconstructed final glides in those rhyme groups.

GSR 0289

In line 104 of the Golden Guide, the Tangut character


may have transcribed the Chinese surname 薛, which was pronounced something like *4se4 in the dialect known to the Tangut and is pronounced ɕie in the modern northwestern dialect of Xi'an. Those readings lack a labial segment present in the modern standard Mandarin reading Xuē [ɕye]. The [y] of Xuē corresponds to nothing in the prestige Middle Chinese dialects preserved in Chinese traditional phonological sources and in the reading traditions of Japan, Korea, and Vietnam:

Middle Chinese *siet

Phags-pa Chinese ꡛꡦ <see>

Sino-Japanese setsu

Sino-Korean sŏl; idealized Middle Sino-Korean syə́rʔ (this y is IPA [j], not [y])

Sino-Vietnamese tiết < *siət

Baxter and Sagart (2014)'s reconstructions of Grammata Serica Recensa series 0289 also lack labial segments:

Series member | Old Chinese (Baxter and Sagart) | Old Chinese (this site) | Middle Chinese (this site) | Gloss
0289a | *s.ŋat | *sɯ.ŋat | *siet | to control, correct, govern
0289d-e | | | | spec. of plant; place name (i.e., where the plant grows?)
0289g | *ŋ(r)at | * (was *C- = *s-?) | | concubine’s son
0289j | | | | (shoots from) tree stump

(Karlgren [1957: 89] wrote,

The alternation s- :ng- in this series is probably a trace of some Archaic [i.e., Old Chinese] initial consonant combination.

and Baxter, Sagart, and I would agree.)

Hence I thought the [y] of 薛 Xuē [ɕye] might be a local Mandarin innovation, but it isn't. Forms with labial vowels coexist with ɕie-type forms in all branches of Chinese: e.g.,

Jin: 并州 Bingzhou ɕieʔ (lit.), ɕyəʔ (colloq.)
Wu: 常山 Changshan ɕyʌʔ

Hui: 祁門 Qimen syɐ̆

Gan: 都昌 Duchang siol

Xiang: 雙峰 Shuangfeng ɕya (lit.), se (colloq.)


Southern Min: 雷城 Leicheng soi

Northern Min: 石陂 Shibei sye

Yue: 高要 Gaoyao sit (new), syt (old)

Ping: 南寧 Nanning ɬyt

Hakka: 惠州 Huizhou syet

Unclassified languages also have a mix of labial and nonlabial forms: e.g., in 富川 Fuchuan, the 七都 Qidu dialect has si but the 八都 Badu dialect has suɐi.

I don't see any obvious pattern here. In Bingzhou the labial form is colloquial (i.e., likely to be native or at least from an earlier layer of borrowing), but in Shuangfeng, it is literary. (I suppose that if the labial form is an innovation, it must originate outside Shuangfeng.) Labiality is so widespread that it must have been present in some earlier prestige dialect(s), albeit not those recorded in the mainstream phonological tradition.
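
The distribution can be tabulated mechanically. This is a minimal sketch, assuming a form counts as labial if it contains y, u, or o; that criterion, the dictionary keys, and the function names are assumptions made for illustration, while the forms are the ones listed for each variety.

```python
# Vowel letters treated as labial (rounded); this criterion is an assumption.
LABIAL_VOWELS = set("yuo")

# Readings of 薛 by variety, as listed in this entry.
FORMS = {
    "Jin (Bingzhou)": ["ɕieʔ", "ɕyəʔ"],
    "Wu (Changshan)": ["ɕyʌʔ"],
    "Hui (Qimen)": ["syɐ̆"],
    "Gan (Duchang)": ["siol"],
    "Xiang (Shuangfeng)": ["ɕya", "se"],
    "Southern (Leicheng)": ["soi"],
    "Northern (Shibei)": ["sye"],
    "Yue (Gaoyao)": ["sit", "syt"],
    "Ping (Nanning)": ["ɬyt"],
    "Hakka (Huizhou)": ["syet"],
}


def is_labial(form):
    """True if the form contains any vowel letter treated as labial."""
    return any(ch in LABIAL_VOWELS for ch in form)


def summarize(forms):
    """Label each variety as 'labial', 'nonlabial', or 'mixed'."""
    summary = {}
    for place, readings in forms.items():
        kinds = {"labial" if is_labial(r) else "nonlabial" for r in readings}
        summary[place] = "mixed" if len(kinds) > 1 else next(iter(kinds))
    return summary


for place, kind in summarize(FORMS).items():
    print(f"{place}: {kind}")
```

A tally like this only shows where the two types coexist; it says nothing about which layer (literary or colloquial, old or new) each form belongs to.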

Nonlabial forms once (?) existed in Beijing Mandarin itself. Giles (1892: 449) listed 薛 under the reading* xiē and gave xiě (with a different tone) and xuē as alternate readings. Are xiē and xiě now extinct? Do any 薛 families today call themselves Xiē and Xiě?

*I converted Giles' romanization to pinyin for ease of comparison.

THE GOLDEN GUIDE: LINE 104: TANGRAPHS 516-520

104. I have a long list of topics, and I can't make up my mind about what to write about next, so I'll fall back on the Golden Guide. Two lines to the last surname ...

Tangraph number | 516 | 517 | 518 | 519 | 520
Li Fanwen number | 1720 | 1456 | 4686 | 0881 | 4760
My transcription | 1ven1 | 1chhi2 | 1khwan4 | 2se4 | 1an1
Tangraph gloss | the surname Ven | the surname Chhi; Sanskrit chi, che? | the surname Khwan; transcription of Chinese 郡 *3khwin3 'administrative region' | the surname Se | the surname An
Word | the surname 隗 Wei (*2wi3) or 韋 Wei (*1wi3) | the surname 翟 Zhai (*4chhe2) | the surname 權 Quan (*1khwan2) | the surname 薛 Xue (*4se4) | the surname 安 An (*1an1)
Translation | Wei, Zhai, Quan, Xue, An

Now I have Kotaka's six-part series on the Golden Guide on hand, so I'll use that as reference in addition to Nie Hongyin and Shi Jinbo's article. Kotaka's notes make me realize how poorly understood the relationship between the phonologies of Tangut, Tangut period northwestern Chinese, and Sanskrit still is.

516: I suppose this analysis of 1720 somehow describes the Ven family:


1720 1ven1 = right of 1105 1khon4 'to give' + left of 5659 1ver1 'luxuriant'

Grade I 1720 appears in the Tangut translation of Sunzi as a transcription of the Chinese Grade III surname 隗 *2wi3 which had no nasality. Why wasn't 隗 transcribed as Grade III *vi3?

Nie and Shi identified 1720 as the Chinese Grade III surname 韋 *1wi3 which also had no nasality. Kotaka noted that 韋 was transcribed as

5287 1vi1 (with Grade I, not III!)

in The Forest of Categories, so he thought 1720 was unlikely to be 韋. The fact that *wi3 (with different tones) was transcribed as 1ven1 and 1vi1 may indicate that Chinese *-i3 was unlike either Tangut -en1 or -i1 and had no exact match in Tangut.

517: 1456 is a fanqie character:


1456 1chhi2 = 1796 1chhuq3 'to lure' + 4972 1chi2 'to amuse'

The Tangraphic Sea states that 1456 is for transcribing mantras. Arakawa (1997: 116) thought 1456 might represent Sanskrit chi or che, but Sanskrit ch- was normally transcribed as Tangut tsh-, not chh-. Moreover, the rhyme -i2 is not in Arakawa's table of attested Sanskrit transcriptions, implying that -i2 was somehow unlike Sanskrit short i, long ī, or long e [eː]. (Sanskrit has no short e.)

The use of 1456 1chhi2 for Chinese *4chhe2 may imply that neither Tangut -i2 nor Tangut -e2 precisely matched Chinese *-e2.

518: 4686 is a semantic compound:


4686 'administrative region' = 4719 2keq2 'boundary' + 2725 1wo2 'circle'

4686 1khwan4 is a poor vocalic match for Chinese 郡 *3khwin3 'administrative region'.

4686 appears in Sunzi as a transcription character for 權 *1khwan2 which can be a surname. The vowel types match but the grades (Tangut IV, Chinese II) don't.

519: The analysis of 0881 is unknown.

The left side may not be phonetic since I cannot find any se4-graphs containing it. Perhaps it describes the Se family: e.g., if it is short for

4773 2luq3 'silk',

the Se might have been known as sellers of silk (and Se brings to mind Latin sericum 'silk', though the similarity is probably coincidental).

The right side must be from 2888 2my1 'surname'.

Nie and Shi identified 0881 as a transcription of Chinese 薛 *4se4, but Kotaka pointed out that 薛 was transcribed as

3683 2sa4 (first syllable of 3683 2532 2sa4 1de4 'day after tomorrow'; 1de4 is 'day')

in The Forest of Categories.

520: 4760 is a fanqie character:


4760 1an1  = top of 4940 2y4 '' + bottom left and center of 4685 1an1

4940 represented either an initial glottal stop or zero. Homophones defined 4760 as a surname and its homophone 4685 as a place name, but 4760 also appeared in the place name

4760 3628 1869 1an1 1ghwan4 1po1 '安原堡 *1an1 1(ngg)wan3 1po1 Anyuan Fortress'

The correspondence of Tangut gh- to Chinese *ngg- or zero should be investigated.

MEETING TANGUT CROWS

Do non-Chinese Sino-Tibetan languages have different sets of consonants corresponding to the (labio)velars and (labio)uvulars that Baxter and Sagart (2014) reconstructed? I would like to see the comparative evidence in Pan Wuyun's "喉音考" (On laryngeals, 1997) mentioned in Baxter and Sagart's 2010 paper on uvulars. Baxter and Sagart focused on Chinese-internal evidence, though they did point out that they thought

Written Tibetan g- corresponded to Old Chinese *ɢʷ- (which I used to reconstruct as *w-)

Written Burmese ဟောင်း <hoŋḥ> 'old' might be cognate to 公 'father, ruler' (< 'elder'?) which they now reconstruct as *C.qˁoŋ (*Cə.qˁoŋ in 2010)

cf. the velar-velar correspondence of

Written Burmese ကိုး  <kuiḥ> : Old Chinese *kuʔ 'nine'

It would be especially nice if comparative evidence supported the pharyngealized/nonpharyngealized distinction in Baxter and Sagart's velars and uvulars.

I proposed that uvulars may be one source of Tangut Grade II based on these comparisons:

2750 1ghu2 < *ɢu 'head' : Old Chinese 后 *ɢˤ(r)oʔ 'sovereign', Written Tibetan mgo 'head'

4046 1khi2 < *CI-qha 'bitter' : Zhongu qʰɐⁿde 'bitter', Ronghong Qiang qʰɑ(q)

but Old Chinese *kʰˤaʔ has a velar!

Could Tangut clarify whether Old Chinese 烏 *qˤa  ~ 鴉 *qˤra 'crow' and 迓 *ŋˤ<r>ak-s (*m-[qʰ](r)ak-s?) 'to meet'* had uvulars?

Nishida (1964: 203) and Grinstead (1972: 114) glossed

1550 3110 2ka1 0jiq3  < *kra or *qa  + S-ji(-H) or *SI-ja(-H)

as 'crow', but the Chinese gloss in the Timely Pearl is 老鴟, literally 'old bird of prey' ('old' is a common noun prefix; cf. English old expressing familiarity rather than age). In any case, 2ka1 is Grade I rather than Grade II *2ka2 which I would expect for a cognate of Old Chinese *qˤ(r)a.

2ka1 can appear by itself, but 0jiq3 is a bound morpheme. (0 indicates an unknown tone.)

Other Tangut words for 'crow' cannot be cognate to 烏/鴉:

2261 0176 2on4 1na'3

2261 is also the second syllable of 1ta1 2on4 'swallow'; apparently an on was a kind of bird, and a crow was a 'black on'.

2262 2114 1jwon3 1leq2

The graph for 2114 is derived from 2262 'bird' plus 0176 'black'.

As for 迓 *ŋˤ<r>ak-s (*m-[qʰ](r)ak-s?) 'to meet', the only vague match I could find was Grade II

4040 1khu'2 'to invite'

which could be from *khru-X, *qhu-X, or *qhru-X. *qhru is the most likely since its cluster matches that of Japhug qru 'to invite'**.

*X (the pre-Tangut source of the mysterious phonemic attribute that I write with an apostrophe as a convenient substitute for a prime symbol) must be an affix***, as cognates identified by Guillaume Jacques (2014: 59) lack it:

2791 2khu4 < *Cɯ-qhru-H 'to call, invite'

3254 2khu4 < *Cɯ-qhru-H 'imperial edict'

(The presyllable *Cɯ- conditioned Grade IV and the *-H conditioned the second tone.)

I don't think any of these khu-words are related to the Chinese word, as the rhymes cannot be reconciled.

Guillaume compared 4040 to Written Burmese ကြို  <krui> 'to meet someone on arrival' which has <k-> corresponding to Japhug q-. If Japhug q- corresponded to Old Chinese *q-, does that mean *q- remained as a stop in Written Burmese clusters?

*q- > h- in ဟောင်း <hoŋḥ> 'old'

*qr- > kr- in ကြို  <krui> 'to meet someone on arrival'

*1.24.4:48: The phonetic series of 迓 is largely uvular, and 迓 may be related to 御 *m-[qʰ](r)aʔ 'to ward off', so perhaps 迓 could be reconstructed with a uvular root initial.

**1.24.4:57: Although Guillaume Jacques (2014: 58) proposed Japhug qru as a cognate, he reconstructed the pre-Tangut form of 4040 as *khjoo without a uvular or *-r-. His *-j- is a carryover from Gong Hwang-cherng's Tangut reconstruction.

***1.24.4:57: I conventionally write *-X as a suffix, but it could have been a prefix or infix.

A 'DENTAL' UVULAR SERIES

At the end of my last entry, I wrote,

Perhaps 鴉 *q(r)a once had an *m-prefix for animals that justified the choice of 牙 *m-ɢˤ<r>a 'tooth' as a phonetic but vanished without a trace.

Baxter and Sagart (2014) reconstructed most other members of Grammata Serica Recensa series 0037 牙 'tooth' with the structure *N-Qra:

Old Chinese (B&S)
Old Chinese (this site)
Middle Chinese

*m-ɢ<r>a *ŋæ

*[N]-qʰraʔ *ŋæˀ covered galleries

*m-ɢ<r>a *ŋæ shoot, sprout

*ŋ<r>ak-s *ŋæʰ to meet
kind of musical instrument
proper, refined

to walk slowly
2nd syllable of mountain name
interrogative particle

I added 0047 since it too contains 牙 'tooth' as a phonetic. (Karlgren did not recognize that. Hence he placed it in a separate series. It seems that the trend is to combine his series. I can't think of any series of his that has been split by later scholars.)

Notes on individual members:

0037a 牙 *m-ɢˤ<r>a: I long assumed that 'tooth' had *ŋ- and might have been a loan from a Southeast Asian ŋa-word for 'ivory'. But if Baxter and Sagart are correct, then the word could have spread throughout Southeast Asia after *m-ɢˁ- fused into a nasal. Or *m-ɢˁ- was borrowed as a nasal. In either case, the only non-Chinese support I know of for a medial liquid is Bahnar ŋə-la 'ivory' (from Schuessler 2007: 550). I could not find that word in SEAlang's Bahnaric data which contains other words that appear to be reflexes of Shorto's (2006) Proto-Mon-Khmer *[m]laʔ 'ivory' and *bluək 'id.' I presume Shorto considered ŋa-words to be borrowings. Perhaps Chinese traders lacking their own word for tusks called them 牙 'teeth' and Southeast Asians borrowed that word for 'tusk'. It is unlikely the semantic shift went in the other direction: i.e., Southeast Asians sold ŋa 'tusks' to the Chinese who then adopted that word for 'teeth'.

0037b is a variant of 0037a.

0037c 庌 *[N]-qʰˤraʔ 'covered galleries' belongs to the same word family as ⾑ *qʰˤraʔ (this site: *qʰraʔ) 'cover'.

0037d 芽: Is this the word for 'teeth' applied to plants? Are sprouts like teeth growing from the earth?

0037e 訝: Not listed. Synonymous with 0037f 'to meet', so I supposed it was also *ŋˤ<r>ak-s.

Can also mean 'astonished'. Could 'astonished' be *[N]-qʰˤrak-s which would be in the same word family as 虩 *qʰrak 'to fear'?

0037f 迓 *ŋˤ<r>ak-s: I would rather not reconstruct a velar word in a uvular series, but I think Baxter and Sagart did so because it belongs to a velar word family with

0699d 迎 *ŋ<r>aŋ(-s) 'to meet'

0766n' 輅 (no reconstruction given; *ŋˤ<r>ak-s since 0766 is a mostly velar series*) 'to meet'

0788a 屰 *ŋrak 'to go against'

(23:36: Also cf. Written Burmese ငြား <ŋrāḥ> 'to meet'.)

Then again, Schuessler (2009: 551) thought 0037f was cognate to

0060l 御 *m-[qʰ](r)aʔ 'to ward off'

Should all of those words be reconstructed with the same root initial, and if so, should that initial be velar or uvular?

0037g 雅 *[N-ɢ]ˤraʔ ~ *N-ɢˤraʔ: I guess Baxter and Sagart are less sure about how to reconstruct 'kind of musical instrument' than 'proper, refined' because they regard the latter as cognate to 夏 *N-ɢˤraʔ 'great' whereas the etymology of the former is unknown (at least to me), so there is no word-family evidence to favor a specific preinitial or root initial.

0047a 邪 *sə.ɢA ~ *sə.la ~ *ɢ(r)A ~ *[ɢ](r)A: Until now I would have reconstructed something like *sɯ-ŋlja to accommodate its various readings and 牙 which I would have reconstructed as *ŋra or *rŋa.

(My *ja is equivalent to Baxter and Sagart's cover symbol *A for an *a which has an unusual Middle Chinese reflex. See pages 223-224 of their book. I once considered reconstructing a seventh vowel, but as they pointed out, there is no rhyme evidence for one.)

I think they feel compelled to reconstruct 邪 *sə.la 'to walk slowly' with a liquid because it is an alternate spelling of

0062p 徐 *sə.la 'to walk slowly'

in the Classic of Poetry - or at least the text as we have it now. How old is the use of 邪 for 'walk slowly'? Could it postdate the merger of *ɢ- (my *sɯ-ɢ-) and *l- as *j-? There is no doubt that 0062 is a lateral series.

WHY DO SOME CHINESE CROWS HAVE 'TEETH'?

Last night I mentioned the near-homophony of Old Chinese

於 *Cɯ-qa (Baxter and Sagart: *[ʔ]a) > Middle Chinese *ʔɨə 'in'

烏 *qa (Baxter and Sagart: *qˤa) > Middle Chinese *ʔo 'crow'

which were written with variations of a drawing of a crow and are regarded as members of the same phonetic series (Grammata Serica Recensa 0061). Ideally I'd want them to have the same initial consonant, though Baxter and Sagart reconstruct them with different consonants while leaving the option of *q- open for 於. (Brackets indicate uncertainty; in this case, *[ʔ] means 'either *ʔ or something else that has the same Middle Chinese reflex as *ʔ': i.e., *q.)

The Middle Chinese readings of the 烏/於 phonetic series have only initial *ʔ-, so without additional evidence, there is no way to tell whether that Middle Chinese *ʔ- was from Old Chinese *ʔ- or *q-.

Transcriptions such as 烏弋山離 'Alexandria' and 烏桓 'Avar' from the Records of the Grand Historian (c. 100 BC) tell us that 烏 was something like *a toward the end of the first millennium BC, though the fine details are uncertain: e.g., was the initial consonant in the underlying dialect(s) zero, a glottal stop (with or without pharyngealization?), or even a pharyngeal fricative *ʕ- (cf. how Arabic ʕ- is borrowed as phonemic zero in English)? They do not rule out the possibility of another initial consonant at an earlier period: e.g., *q-.

I favor *q- because it enables me to regard

烏 *qa (Baxter and Sagart: *qˤa) > Middle Chinese *ʔo 'crow'

鴉 *qra (Baxter and Sagart: *qˤra) > Middle Chinese *ʔæ 'crow'

as members of the same *qa-word family.

By establishing that 牙 was a uvular series, Baxter and Sagart solved the mystery of why Middle Chinese 牙 *ŋæ is phonetic in 鴉 whose reading probably never had a nasal.

If Schuessler (2007: 83, 517) is correct, 鴉 never had *-r- (which is unlikely to be an infix in 'crow'*) and its Middle Chinese low vowel is an archaism. Hence 烏 and 鴉 could have been homophones, and Mandarin 烏鴉 wuya 'crow' would be a reduplication if pronounced in Old Chinese as *qa qa.

Moreover, *qa matches the global sound-symbolic archetype for 'crow': e.g., Sanskrit kāka-. (See Wiktionary for more examples. Thai kaa 'crow' could be a borrowing from Chinese, though it could also be the independent product of sound symbolism.)

*1.22.4:52: Old Chinese *-r- indicates double or multiple objects (Baxter and Sagart 2014: 58) and would be out of place in 'crow' unless 鴉 *qra once meant something like 'a flock of crows'.

If 鴉 'crow' had *-r-, then 烏 *qa and 鴉 *qra were a word family only in the weak sense that they were based on the same sound symbolism (cf. the English 'word family' of gl-words: gleam, etc.). I would not consider *qra to be a derivative of a root √*qa.

Perhaps 鴉 *q(r)a once had an *m-prefix for animals that justified the choice of 牙 *m-ɢˤ<r>a 'tooth' as a phonetic but vanished without a trace.

