This is all going to fall apart in part 4 ...

For a while I thought rGyalrong, a living relative of Tangut, provided evidence for reconstructing retroflex vowels. Looking at Guillaume Jacques' "Essai de comparaison des rimes du tangoute et du rGyalrong", I found that

- Tangut retroflex vowel rhymes appeared in 41 rGyalrong-Tangut cognate sets

- Tangut retroflex vowels corresponded to rGyalrong r (in any position) in 35 cases: e.g.,

rG initial of non-initial syllable r:

T 'yar 1.82 'eight' : rG kU rchat 'eight'

(None of the rG forms in the data contained word-initial r-.)

rG syllable-medial r:

T zyir 2.72 'long' : rG kU zri 'long'

rG final r:

T wer 2.71 'wing' : rG të Rar 'wing'

- Conversely, Tangut retroflex vowels did not correspond to rGyalrong r (in any position) in 6 cases: i.e.,

T chier 1.78 'right' : rG Xchha 'right'

T kier 1.78 'chew' : rG kë në ngka 'gnaw'

T kar 2.73 'separate, differentiate' : rG kë qët 'separate'

T lyïïr 1.92 'four' : rG kU Vde 'four'

T kyur 1.76 'to smoke' : rG kë së kU 'to smoke'

T tsorw 1.91 'to stab' : rG të mdzu 'to stab'

It appears that rGyalrong has preserved an earlier r that was the source of most retroflex vowels in Tangut.
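As a quick check on how strong "most" is, the tally above can be expressed as proportions. This is only a minimal sketch: the variable and function names are made up for illustration, and the counts are the ones listed above.

```python
# Counts taken from the tallies above (rGyalrong-Tangut cognate sets).
total_sets = 41      # sets containing a Tangut retroflex vowel rhyme
with_rg_r = 35       # rGyalrong cognate has r in some position
without_rg_r = 6     # rGyalrong cognate has no r

# Sanity check: the two subgroups should account for every set.
assert with_rg_r + without_rg_r == total_sets


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(f"r in rGyalrong:    {share(with_rg_r, total_sets)}%")
print(f"no r in rGyalrong: {share(without_rg_r, total_sets)}%")
```

On these figures, roughly 85% of the retroflex cognate sets have an r on the rGyalrong side.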

Next: Things are not what they seem ...

RETROFLEXION: RIGHT OR WRONG? (PART 2 OF PART 2)

Before I go on to present another reason for reconstructing retroflex vowels in Tangut, here's a correction for part 2. I wrote:

In fact, ï is the only Tangut vowel in his [Gong's] reconstruction that has no retroflex counterpart.

This is false. Gong reconstructed 121 syllables with the retroflex vowel ïr. I've mentioned them before: e.g., lyïïr 1.92 'four', thought to be cognate to other Sino-Tibetan words for 'four' without any rhotic quality: Old Chinese s-hlits, Written Tibetan bzhi (< b-lyi), and Written Burmese leH.

I just noticed that two ryïr syllables (TT0668 BUILDING and TT3543 THIRD-SON) belong to rhyme 2.37 in the nonretroflex big cycle - alongside the ten ryïy syllables mentioned in part 2. What's going on here? Does rhyme 2.37 conflate both -r and -yïy syllables? Why weren't these two ryïr syllables assigned to rhyme 2.77 like other ryïr syllables? Is this inconsistency due to the evidence pointing in different directions?

BUILDING and THIRD-SON are clearly listed under rhyme 2.37 in Precious Rhymes of the Tangraphic Sea. (There is no way to know their position, if any, in the lost second volume of the plain ol' Tangraphic Sea. The surviving first and third volumes of TS and the whole of PRTS are similar but do not completely duplicate each other.)

But where are BUILDING and THIRD-SON located in the Homophones dictionary? Are they grouped with ryïy 2.37 or ryïr 2.77 syllables? Let's find out ...

BUILDING is 47A43 (p. 47, side one, column 4, character 3).

THIRD-SON is 47A53 (p. 47, side one, column 5, character 3).

They are in a homophone group (47A31-47A53) which otherwise contains ryïy 2.37 syllables except for 47A44 (TT2387 DEMON ryiry 2.68*)!

ryïr 2.77 syllables are in a different homophone group (47B75-48A11).

Maybe the two ryïr 2.37 syllables (47A43, 47A53) should be re-reconstructed as ryïy 2.37 to match the other -yïy syllables in the homophone group 47A31-47A53. ryïr 2.37 appears to be a typo for ryïy 2.37.
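The group-membership check above can be sketched in code. This assumes that Homophones codes such as 47A43 always have the shape page + side + one-digit column + one-digit character, and that B stands for side two (only "A = side one" is stated above). The names `Loc`, `parse_loc` and `in_group` are hypothetical helpers, not part of any existing tool.

```python
import re
from typing import NamedTuple


class Loc(NamedTuple):
    """A position in the Homophones dictionary, ordered as the book is read."""
    page: int
    side: int    # 0 = side A (side one), 1 = side B (assumed to be side two)
    column: int
    char: int


# Assumption: column and character numbers are single digits, as in every
# code cited in this post.
_CODE = re.compile(r"(\d+)([AB])(\d)(\d)")


def parse_loc(code: str) -> Loc:
    """Parse a code such as '47A43' into (page, side, column, character)."""
    m = _CODE.fullmatch(code)
    if m is None:
        raise ValueError(f"unrecognized location code: {code!r}")
    page, side, column, char = m.groups()
    return Loc(int(page), "AB".index(side), int(column), int(char))


def in_group(code: str, first: str, last: str) -> bool:
    """True if `code` lies within the homophone group first..last (inclusive)."""
    return parse_loc(first) <= parse_loc(code) <= parse_loc(last)


for name, code in [("BUILDING", "47A43"), ("THIRD-SON", "47A53")]:
    print(name, in_group(code, "47A31", "47A53"), in_group(code, "47B75", "48A11"))
```

Because the tuples compare field by field, a group that runs across a page break (47B75-48A11) needs no special handling.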

* TT2387 DEMON ryiry 2.68 would be expected to be in the homophone group 47B18-47B43 consisting of ryiry 2.68 syllables except perhaps for

47B18 (TT5616 SIDE) which is reconstructed by Gong as liry 2.68 with l- and without -y- even though Sofronov (1968: 391) and Li Fanwen (1986: 438) regard it as homophonous with the others; no other 2.68 syllable ends in iry without -y- except for

TT1658 SWELLING (no retroflex vowel!) which appears to be an error

TT3752 BLACK myiiry (with a long retroflex vowel!?) which appears to be a typo for myiry with a short retroflex vowel (there is no other syllable ending in -iiry)

TT5762 SPIRIT myịry (with a tense retroflex vowel!?) which appears to be a typo for myiry with a lax retroflex vowel (could the fourth rhyme cycle consist of rhymes with tense retroflex vowels, given that the first cycle consists of rhymes with normal vowels, the second rhymes with tense vowels, and the third rhymes with retroflex vowels?)

47B34 (TT2674 TIME) which is reconstructed by Li Fanwen (1986: 438) as having initial z- even though Sofronov (1968: 337) and Gong regard it as homophonous with the others

47B35 (TT4924 OVERMUCH) which belongs to rhyme 1.74 and is homophonous with the others except for tone. Perhaps this was a tone 2 syllable to the Homophones author whereas it was a tone 1 syllable to the Tangraphic Sea author. Other ryiry 1.74 words are in the homophone group 47B44-47B48.

RETROFLEXION: RIGHT OR WRONG? (PART 2)

I'm a little skeptical again, and I'll go into what might be wrong in part 4. So much for my planned trilogy: intro, internal evidence, and external evidence.

My internal evidence is nothing more than a simple observation about Gong's Tangut reconstruction. Using Sven Osterkamp's amazing Tangut search software, I found that initial r- is almost always followed by a retroflex cycle rhyme: e.g., rer, rur, ror occur but re, ru, ro do not occur.

The only exceptions I found were of three types:

- ten instances of the syllable ryïy 2.37 (not ryïry) in the large, nonretroflex rhyme cycle

This clearly shows that Tangut was not 'monosyllabic' in the sense that each syllable was a word, as it is highly improbable that it had ten homophonous words.

- two syllables with nasal vowels: rã 2.22 and rẽ (rhyme unknown)

- two syllables with unknown rhymes: ra (rhyme unknown) as well as the aforementioned rẽ (rhyme unknown)

Why is this skewed distribution of initial r- significant? Let's rename the retroflex cycle rhymes 'the X cycle rhymes'. Why would initial r- be so strongly associated with X cycle rhymes? Could there be something r-like about those rhymes? A couple of Tibetan transcriptions - e.g., bdur 1.84, d-wir 1.86 - suggest that this may be the case (Nishida 1966: 65-66). Did Tangut undergo what I call autoretroflexion - the automatic retroflexion of vowels following r: i.e.,

r(y)V(G) > r(y)Vr(G)

(G) = y or w

Those vowels assimilated to r- by assuming its rhotic quality.
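The rule can be written out as a small string rewrite. This is a rough sketch, not a claim about how the change actually operated: the vowel inventory in `VOWELS` is an assumption based on the notation used in this blog, and nasal vowels are deliberately left out of it.

```python
import re
import unicodedata

# Assumed plain-vowel inventory in this blog's notation. Nasal vowels such as
# ã and ẽ are deliberately excluded, so the rule leaves them alone.
VOWELS = unicodedata.normalize("NFC", "aeiouïə")

# r, optional medial y, one or more plain vowels, optional glide (y or w).
_RULE = re.compile(rf"(r)(y?)([{VOWELS}]+)([yw]?)")


def autoretroflex(syllable: str) -> str:
    """Apply r(y)V(G) > r(y)Vr(G); return any other syllable unchanged."""
    s = unicodedata.normalize("NFC", syllable)
    m = _RULE.fullmatch(s)
    if m is None:
        return s
    r, y, vowel, glide = m.groups()
    return f"{r}{y}{vowel}r{glide}"


for syl in ["re", "ru", "ro", "ryiy", "rer", "ma"]:
    print(syl, ">", autoretroflex(syl))
```

Syllables that are already retroflex (rer) or that lack initial r- (ma) pass through untouched, since the pattern must match the whole syllable.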

Example: A Proto-Tibeto-Burman word for 'horse' was something like mrang (cf. Written Burmese mrang 'horse' and rGyalrong mbro 'horse'). Guillaume Jacques told me that he thinks this is the source of

TT5233 HORSE ryiry 1.74

If so, here's what might have happened:

original form: mrang

loss of initial consonant: rang

ang > yiy shift: ryiy (Guillaume says there are many examples of this shift.*)

autoretroflexion: ryiry
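The derivation above can be replayed as an ordered list of rewrite steps. The sketch below is illustrative only: the three functions are simplified stand-ins (for instance, the first one drops any single consonant before r, which is broader than anything claimed here), and the ASCII vowel set is an assumption.

```python
import re


def drop_initial_consonant(form: str) -> str:
    """Step 1: lose a single consonant standing before r (mrang > rang)."""
    return re.sub(r"^[^aeiouyr](?=r)", "", form)


def ang_to_yiy(form: str) -> str:
    """Step 2: the -ang > -yiy shift (rang > ryiy)."""
    return re.sub(r"ang$", "yiy", form)


def autoretroflex(form: str) -> str:
    """Step 3: r(y)V(G) > r(y)Vr(G) (ryiy > ryiry)."""
    return re.sub(r"^(ry?)([aeiou]+)([yw]?)$", r"\1\2r\3", form)


# The steps in the order given in the derivation above.
STEPS = [
    ("loss of initial consonant", drop_initial_consonant),
    ("ang > yiy shift", ang_to_yiy),
    ("autoretroflexion", autoretroflex),
]


def derive(form: str) -> list[tuple[str, str]]:
    """Return (stage name, form) pairs, starting with the original form."""
    stages = [("original form", form)]
    for name, step in STEPS:
        form = step(form)
        stages.append((name, form))
    return stages


for name, form in derive("mrang"):
    print(f"{name}: {form}")
```

Running it on mrang reproduces the four stages listed above, ending in ryiry.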

But how can the exceptions be explained? Perhaps ryïy 2.37 is from an earlier ryïry. ï and ïr sound similar and may have been difficult to distinguish. There is no ryïry in Gong's reconstruction. (Wrong! There is!) In fact, ï is the only Tangut vowel in his reconstruction that has no retroflex counterpart.

(Cf. how Pulleyblank [1991: 12-13] replaced his 1984 reconstruction of the retroflex vowels ar and Er for Early Middle Chinese with the diphthongs and əï, citing "a strong acoustic affinity between retroflexion and back unrounded vowels". Actually, ï in my notation [corresponding to barred i in Gong and Pulleyblank's notation] represents a central unrounded vowel, but ï sounds very much like a back unrounded U, at least to my insensitive ear. I know of no language with a phonemic distinction between ï and U.)

There are no nasalized retroflex vowels like ãr and ẽr in Gong's reconstruction. Such vowels would have been very difficult to pronounce. Therefore it is not surprising that rã and rẽ occur instead of rãr and rẽr.

The syllable that Gong reconstructed as ra with an unknown rhyme could have actually been rar 1.80 or rar 2.73 depending on the tone (1 = 'level', 2 = 'rising'). Even if it really belonged to a simple -a rhyme of the big nonretroflex cycle, it would be the only exception I couldn't explain.

(Perhaps it is a loanword which did not conform to Tangut phonotactics. I wondered if it was the pronunciation of a transcription tangraph for Sanskrit ra, but none of the ra-tangraphs match the ra-tangraph [Mojikyou 1393; no TT number].)

Needless to say, if Gong's reconstruction is wrong, then this whole post collapses.

* I briefly considered this possibility on my own before Guillaume contacted me, but rejected it because I did not know about the -ang > -yiy shift. Moreover, Gong (1995: 56, 73) proposed that earlier -ang became -o, not -yiy in Tangut, so I would have expected mrang to become ror, not ryiry.

RETROFLEXION: RIGHT OR WRONG?

The superscript r-s all over Gong Hwang-cherng's Tangut reconstructions in this blog represent vowel retroflexion. I've been skeptical about retroflex vowels in Tangut for over a decade. I'm not anymore. Why? I'll save that for my next post and lay out the background in this post.

The rhyme dictionary Tangraphic Sea was apparently modelled after the Middle Chinese (MC) rhyme dictionary 廣韻 Guangyun (Broad Rhymes). In Guangyun, rhymes were arranged in a somewhat arbitrary order from -ong to -wom within each tone: i.e.,

Level tone: -ong ... -i ... -a ... -wom

Rising tone: -ong ... -i ... -a ... -wom

Departing tone: -ong ... -i ... -a ... -wom

Entering tone: -ok ... (no counterparts to -i, -a) ... -wop

('Entering tone' syllables had final stops corresponding to nasals in the other tones. Open syllables in the three non-entering tones had no entering tone counterparts.)

Tangut (A) did not fit this model very well since it had few if any final consonants. Yet it had 105 different rhymes arranged in four 'cycles': one big cycle and three little cycles (Sofronov 1968 I.136-138; his rhyme groupings and reconstructions are cited):

Cycle I (big): rhymes 1-60: -u ... -i ... -a ... -ywo

Cycle II (small): rhymes 61-76: -ụ ... -Ị ... -ạ ... -yọ̃

Cycle III (small): rhymes 77-98: -ại ... -ụ ... -ạ ... -yụo

Cycle IV (small): rhymes 99-105: -ẹ ... -Ị ... -yọ̃
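For looking up which cycle a rhyme falls in, the boundaries above can be put into a small table. This is a minimal sketch using the 1-105 rhyme numbers of the list above, not the tone.rhyme labels such as 1.82 used elsewhere in these posts; `cycle_of` is a hypothetical helper name.

```python
from bisect import bisect_left

# Last rhyme number of each cycle, as listed above (Sofronov's groupings).
CYCLE_ENDS = [60, 76, 98, 105]
CYCLE_NAMES = ["I", "II", "III", "IV"]


def cycle_of(rhyme: int) -> str:
    """Return the cycle (I-IV) that contains rhyme number 1-105."""
    if not 1 <= rhyme <= CYCLE_ENDS[-1]:
        raise ValueError(f"rhyme number out of range: {rhyme}")
    # bisect_left finds the first cycle whose last rhyme is >= this rhyme.
    return CYCLE_NAMES[bisect_left(CYCLE_ENDS, rhyme)]


# Print each cycle with its size, derived from the boundaries alone.
starts = [1] + [end + 1 for end in CYCLE_ENDS[:-1]]
for name, start, end in zip(CYCLE_NAMES, starts, CYCLE_ENDS):
    print(f"Cycle {name}: rhymes {start}-{end} ({end - start + 1} rhymes)")
```

The sizes come out as 60, 16, 22 and 7 rhymes, which add up to the 105 rhymes mentioned above.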

(Note that the first rhyme of Guangyun may have been pronounced like -ung in the Chinese dialect known to the Tangut. That would explain why the cycles had -u roughly corresponding to Middle Chinese -ong.)

In Sofronov's reconstruction, there is a clear division between the big cycle with lax vowels (lacking subscript dots) and the small cycles with tense vowels (written with subscript dots).

In Nishida's (1964) reconstruction, the third cycle and, to a lesser extent, fourth cycle contain retroflex vowels. Nishida (1964: 63) reconstructed retroflexion on the basis of a small number of Tibetan transcriptions ending in -r.

Strangely, Nishida's retroflexion (more or less carried over into Gong's reconstruction) often does not correspond to an r in Chinese or Tibeto-Burman. Here are examples from Gong (1995: 53, 55, 63):

Tangut tserw 1.87 'joint' : pre-Old Chinese 節 tsik, Written Tibetan tsigs, Written Burmese a-chhach

Tangut borw 1.91 'bee': OC 蜂 bong (m-phong?), (mə-?)phong 'bee', WT bung 'bee'

Tangut lyïïr 1.92 'four': OC 四 s-hlits 'four', WT bzhi (< b-lyi) 'four', WB leH 'four'

Conversely, Gong (1995: 67-69) has no examples of WT -r corresponding to a Tangut retroflex vowel.

Gong (1995: 88) did, however, list one instance of Tangut retroflexion corresponding to an r elsewhere:

Tangut 'yar 'stand' : OC 立 rəp 'stand', WB rap 'stand'

(But is the correspondence of T 'y- to r- regular? I suspect it is. I'm saving another example for a future post.)

Guillaume Jacques pointed out to me that Tangut mur 1.75 'obscured'; 'dark' (probably the same word written with two different tangraphs; see one here) apparently corresponds to Tibetan mun-pa 'dark' (with no r). Gong (1995: 70) added Old Chinese 昏 hmən 'dusk' to this set. Although some Chinese -n are from -r, this is not the case with hmən 'dusk'.

If Tangut retroflexion did not (always) come from a Tibeto-Burman r, where did it come from? Or did it simply not exist in some cases? Is it an artifact of an incorrect reconstruction? Should the difference between the earlier and later rhymes in Tangraphic Sea be explained on some other basis?* Once I might have said "yes" or firmly remained agnostic. But now ...

Next: Affirmation through autoretroflexion.

* In "The Phonological Reconstruction of Tangut", Gong (1989: 35-37) wrote,

In order to solve the problem as to how these [105] rhymes were distinguished in the Wen-hai [= Tangraphic Sea] period, we face first of all with the problem what the distinguishing features of each cycle were. The fact of the matter is that the existing data cannot offer any clue to the solution of the problem ...

[Hence Gong turned to a Qiang (= Ch'iang) language. Tangut (A) is thought to be an extinct Qiangic language.]

According to Sun (1981: 30), the Northern Ch'iang dialect (the Ma-uo dialect) distinguished between long and short vowels [as in Gong's current Tangut reconstruction] as well as between retroflex and non-retroflex vowels [ditto]. With a purpose to determine whether the retroflex vowel in the Ma-uo corresponds to the third group [of Tangut rhymes] reconstructed with -r by Nishida, I have made a comparative study of the cognates. However, the result is quite discouraging. The retroflex vowels in the Ma-uo do not correspond with vowels of a special cycle [in Tangut], but correspond with vowels of all cycles except the last one, where no cognates have been discovered. It must be pointed out that for the reconstruction of retroflex vowels (or vowels with -r ending), neither internal nor comparative evidence is available. [But see my next post!] If I now, following Nishida in writing vowels with -r ending, it is only for the purpose of keeping distinction among different rhymes.

I looked for all retroflex-vowel Tangut words in Guillaume's comparison of Tangut with rGyalrong (another living Qiangic language) and found ... too much stuff to write tonight. I'll reveal my findings later.

WHY CAN'T ヒ ヒ SEMANTIC?

(The first ヒ is read as hi [its value as a Japanese kana syllabic symbol] and the second ヒ is read as bi [its Mandarin reading as a Chinese character 'dagger'], so this could be read as 'why can't he/dagger be semantic?')

Before I go on to talk about what I call autoretroflexion in Tangut, I want to explain why I've been searching for a phonetic instead of a semantic value for the tangraphic element ヒ.

First, if ヒ were a semantic element, I would expect many (not all) characters containing it to share a semantic field. This is not the case, even if cases like

TT2257 PERCEPTION byu 2.3

derived from the tangraph for its homophone

TT0012 BROTHERS byu 1.2 (as a phonetic element)

are discarded. (In such cases, ヒ has no semantic value in the derived character, since ヒ is part of a larger phonetic element.) The glosses of ヒ- tangraphs go all over the place. Here is every tenth gloss given to a ヒ-tangraph (whenever available) by Grinstead (1972: 99-106):

TT0168 AUNT (mother's sister)

TT1657 CUT

TT4446 EGG

TT1832 EAR

TT2889 AS-IF

TT4442 WINTER (cf. FROST above)

TT0935 (a tree)

TT4224 SOUND (cf. EAR above; has WATER element!)

TT2882 YOU

TT0051 MEET (cf. ASSEMBLE above)

TT3184 HOE

TT (no number) FALL-OUT (of teeth) (resembles TT1050)


Second, if the semantic interpretation of the Tangraphic Sea's tangraphic analyses is correct in all cases, then ヒ should ideally turn out to be an element taken from tangraphs of a single semantic field. That field would then be the semantic value of ヒ: e.g., if ヒ were always taken from plant tangraphs, then I would conclude that it meant 'plant' (and might have been an abbreviation of a full tangraph 'plant'). Some tangraphic elements do have clear semantic values and do generally seem to be abbreviations of full tangraphs. But ヒ is not one of them. Taken at face value, the Tangraphic Sea tells us that, for instance, ヒ is taken from


BUBBLE in FIVE (why?)


FINGERNAILS in FISHHOOK (hooks are like fingernails?) and OFFICIAL (the same one in the analysis for FINGERNAILS)

BORDER in FAR (... from the border?)



LAW in ARABLE-LAND, SURPASS (same one as in ALTHOUGH), and (understandably for once) DECIDE-LAW-CASE



EGG in EGG (twice; two different tangraphs analyzed in terms of each other)

FROST in MORNING (and vice versa; surprisingly WINTER and COLD aren't in the analyses)

among others.

If ヒ were really useful as a semantic element, the reader would have a good chance at guessing what ヒ represented. But ヒ was apparently taken from tangraphs without any single semantic feature in common. A reader seeing ヒ would have no idea whether it was from HAVING, BUBBLE, OFFICIAL, FINGERNAILS, BORDER, SURPASS, etc. I can only conclude that those tangraphs had another feature in common - a phonetic feature in Tangut B (since they share few or no phonetic features in Tangut A, except by chance in most instances*).

Next: Autoretroflexion, or why I believe in retroflex vowels now.

*The exceptions would be cases like


TT0012 BROTHERS byu 1.2 <

TT2257 PERCEPTION byu 1.2 (as a phonetic element)

(+ the element TOP from the top of MOTHER - why TOP?)

The phonetic similarity between such tangraphs is not coincidental. To answer my own question: BROTHERS may be a double-phonetic tangraph consisting of a Tangut B phonetic (TOP) over a Tangut A phonetic (PERCEPTION). Suppose that the Tangut B word for 'perception' was ABC (corresponding to its three elements). ABC was semantically equivalent to Tangut A byu 1.2 'perception'. Now suppose that the Tangut B word for 'top' was X. The structure of BROTHERS tells us that

- the Tangut B word for 'brothers' was similar to X (homophonous with 'top')

- the Tangut A word for 'brothers' was similar to byu 1.2 (and in fact was homophonous with 'perception')

The Tangraphic Sea cited

TT0245 MOTHER mya 1.2

as the source of TOP as an attempt to find some semantically relevant tangraph with the appropriate element. Perhaps the initial syllable of 'mother' in Tangut B was homophonous with the Tangut B word for 'top'.

Nishida (1966: 334) calls TOP the WOMAN radical (女部), though it occurs in tangraphs like HEAD and BRAIN and is BOTTOM upside-down. It is true that MOTHER is not the only feminine tangraph with (一+卅). According to the Tangraphic Sea, the top of MOTHER is from TT5657 MOTHER myiy 2.33 (< earlier ma?). But there are many tangraphs with TOP/MOTHER without 'upper' or maternal semantics (examples in Nishida 1966: 334-335).


I presume that TOP/MOTHER was a Tangut B phonetic in some if not all of those characters. It's likely that it was pronounced with m- in Tangut B (assuming the Tangut B word for 'mother' was an m-word like most around the world - with a few exceptions like Jpn haha 'mother' and Georgian deda 'mother' [the Georgian word for 'father' is mama!]).

WHY DON'T HORSES STAND ALONE?

Tangraphic elements can be divided into two categories, independent and dependent.

Independent elements can stand alone as tangraphs: e.g.,

TT3344 MAN dzywo 2.44

from the last post.

Dependent elements only occur as parts of tangraphs: e.g., the element HORSE (cf. Chn 馬 'horse') on the left side of

TT5233 HORSE ryiry 1.74

(cf. Written Tibetan rta 'horse': rta > ra > ryiry [with the a > i shift common in Tangut]?)

TT5240 HORSE (calendrical) gyiy 1.36

TT4789 HORSE ryar 2.74

(variant of ryiry without the a > i shift?)

TT5225 ELEPHANT byu 2.3

TT5241 CENTER ka 1.17

TT5232 MIDDLE gu 2.1

Chinese writing also has independent and dependent elements. However, the dependent elements usually turn out to be graphic variants of the independent elements: e.g., 亻 is the left-hand dependent version of the independent element 人 'man'.

This is generally not the case in Tangut writing. Dependent elements like HORSE are not graphic variants of independent elements. There is no simple one-element tangraph HORSE. The tangraph for the basic word for HORSE is

TT5233 HORSE ryiry 1.74

with an extremely common right-hand element ヒ that occurs in roughly one out of twelve tangraphs.

Why does HORSE need to be written with ヒ? Why is ヒ so common? And why does ヒ only seem to appear in the right-hand position?

I suspect that ヒ is a 'clarifier' or 'disambiguator' like Korean 乙 -l.

In Korean, the Chinese root for 'stone' (borrowed as sOk) can be written with the Chinese character 石 'stone' whereas the native word tol for 'stone' can be written as 乭, a combination of Chinese 'stone' 石 plus the clarifier 乙 -l. 乭 represents 'the word for stone that ends in -l'. (Both are almost always written in hangUl today as 석 sOk and 돌 tol.)


TT5233 HORSE ryiry 1.74

may have represented 'the word for HORSE that ended in the sound represented by ヒ'. What could that ヒ-final word for HORSE have been? The Tangut A word for 'horse' was ryiry, and it is clear that ヒ did not represent Tangut A vowel retroflexion (-r-) or final -y, for there are many Tangut A words written with ヒ which Gong did not reconstruct with either vowel retroflexion or final -y: e.g.,

TT2257 PERCEPTION byu 1.2

TT0012 BROTHERS byu 1.2

TT2882 YOU nya 2.17

TT3119 FIVE ngwə 1.27

There are words written with ヒ which Gong did reconstruct with vowel retroflexion and/or final -y: e.g.,

TT2890 BLACK mur 1.75 (borrowed from Old Chinese hmək 'black'?)

TT1850 EYE mey 1.33 (cf. Written Tibetan dmig 'eye')

TT0711 GARDEN lhery 2.66

but it's not clear whether ヒ really stood for Tangut A vowel retroflexion (-r-) and/or final -y in those cases. ヒ is so common that it appears in tangraphs representing all kinds of Tangut A syllables (semantically equivalent to Tangut B words ending in the sound indicated by ヒ).

My guess is that ヒ represented

a. a consonant that only occurred in final position in Tangut B (cf. visarga [H] and anusvaara [M] in Sanskrit which could not appear word-initially)

b. a final-position graphic variant of another phonetic symbol: cf. Greek ς, the final-position graphic variant of σ.

If the Tangut B word for 'horse' were borrowed, ヒ could have represented:

-t (if 'horse' were from Turkic at 'horse') or

-r (if 'horse' were from something like Mongolian morin; cf. Middle Korean mʌr 'horse' [> modern Korean 말 mal])


TT5233 HORSE would then have represented at or mor (written as HORSE-t or HORSE-r).

The Written Tibetan and Tangut-period northwestern Chinese words for horse (rta and mba) had no final consonants, so they could not be potential candidates unless (a) the clarifier hypothesis is wrong or (b) Tangut B added a suffix to the borrowed Tibetan or Chinese word, and it was this suffix that was represented by the clarifier. The possibility that Tangut B had an indigenous word for 'horse' cannot be ruled out.

Next: Autoretroflexion.

On Saturday I briefly wondered if ヒ were a vowel length marker like Japanese ー which also cannot appear in initial position (since one can't have a 'long zero'). This would mean that ヒ-final tangraphs represented Tangut B words with final long vowels. Since ヒ only appears in one out of twelve tangraphs, this also implies that only one out of twelve Tangut B words ended in a long vowel. This figure seems low to me, though it's not impossible: cf. the low frequency of final long vowels in Japanese if inflected forms and Chinese words are excluded.

