The Sanskrit syllable ho [ɦoː] was transcribed in Tangut with the common Grade I transcription characters

3118 1xʊ (rhyme 1) and 5595 2xwo (rhyme 51)

(Arakawa 1999: 111). 3118 makes more sense if R1 was -ʊ, which is closer to [oː] than the -u of other reconstructions. Sanskrit hu and ho have very distinct vowel signs in Indic scripts, so I doubt a Tangut speaker misread Sanskrit ho and transcribed it using 3118 as if it were hu.

Perhaps the familiarity of those high-frequency transcription characters took priority over phonetic precision when 3118 and 5595 were chosen for Sanskrit ho.

A better phonetic match for Sanskrit ho would have been

5661 1xo 'third person singular pronoun'*

without -w-. Although 5661 is a rare character, transcriptions may contain characters that are hardly used in other contexts: cf. Mandarin 伊 yi 'third person pronoun (obsolete)' which is now mostly used for transcribing foreign i in names such as 伊拉克 Yilake 'Iraq'. Perhaps whoever chose 5595 had a dialect in which xwo had become xo.

The level and rising tone names for R51 in the Precious Rhymes of the Tangraphic Sea are

2326 1tho 'tired, weary' and 4290 2thwo 'to fall into'**

I can safely reconstruct R51 as -o since

- Sanskrit -o usually corresponds to R51 in Tangut transcriptions (Arakawa 1999: 111)

- R51 is almost always transcribed as Tibetan -o (Tai 2008: 218)

As I will explain in a later post, perhaps -o was more precisely [ɔ], a vowel absent from both Sanskrit and Tibetan. [ɔ] might have been the best match for Sanskrit -o if Tangut had no [o]. And of course Tibetan o would be the best match for a Tangut [ɔ]. But for now I will continue to use the simpler symbol -o.

*5661 is apparently only in dictionaries. It may be a pronoun of the so-called 'ritual language' (which I suspect was a substratal language a.k.a. 'Tangut B').

The normal Tangut third person singular pronoun is

0388 2thia

which may be derived from the demonstrative

2019 1thia 'that' (written as 0388 plus 'water' on the left)

plus a suffix *-H that conditioned the second (i.e., 'rising') tone. Kepping (1985: 61) regarded 2019 as a demonstrative, whereas Gong (2003: 607) translated it as both a pronoun and a demonstrative. I assume the pronominal uses of 2019 are secondary.

1thia is cognate to Ronghong Qiang the 'that' (LaPolla and Huang 1996: 52).

In turn, the Tangut and Qiang forms resemble the Mandarin third person pronoun ta [tʰa]. That similarity may be coincidental: cf. how those three words happen to sound vaguely like English that. The third person pronoun of the Chinese dialect known to the Tangut is unknown. In any case, the Tangut, Qiang, and Mandarin words cannot be from a Proto-Sino-Tibetan *tha or the like since Mandarin ta is from Old Chinese *hlaj with a lateral initial and a final glide.

**Unlike previous rising tone rhyme names, 4290 2thwo has a -w- absent from its level tone counterpart 1tho. There was no Tangut syllable *2tho without -w-, so 2thwo was the closest rising tone match for level tone 1tho.

I don't know the reasoning behind rhyme names. Why was 1tho, a word known only from dictionaries (and hence a possible 'ritual'/substratum language/Tangut B word), chosen instead of, say,

1292 1to 'the surname To'

which had an exact rising tone counterpart

4859 2to 'end'?

I think it's appropriate to end this post with that character!

A DIVINE PREFACE: TANGUT RHYME 1

Tangut rhyme 1 (R1) has two names in the Precious Rhymes of the Tangraphic Sea: one for its level tone version (1.1 = 1st tone, 1st rhyme) and another for its rising tone version (2.1 = 2nd tone, 1st rhyme).

5085 1bʊ 'preface' and 3224 2bʊ 'to divine, tell fortunes' (< Chn 卜?; see below; "Divine" in the title is an English play on words and is not meant to imply 3224 is an adjective 'divine')

R1 is almost always transcribed as -u in Tibetan with a few exceptions (Tai 2008: 217):

-uH (once)

-o (twice)

-i (once)

-iH (once)

Therefore R1 must have been u-like. (I cannot explain the i-transcriptions.)

Having reconstructed Grade I i / R8 as -ɪ in my previous post, I would like to reconstruct Grade I u / R1 as its back counterpart -ʊ. The Tibetan -o transcriptions may imply a vowel like ʊ that was slightly lower than u.

However, R1 was almost always used to transcribe Sanskrit -u (Arakawa 1997: 110), implying that it may have simply been -u: i.e., an exact match of the Sanskrit vowel. Like other Grade I rhymes, -u lacks the vowels that characterize Grades II-IV: -ɤ-, -ɨ-, -i-.

But -u has three problems.

First, if pre-Tangut *i lowered to ɪ in Grade I, I would expect its back counterpart *u to lower to ʊ in Grade I. Of course there is no absolute rule of symmetry in vocalic development. Nonetheless, vowels shifting in unison are more probable than vowels each going their own way.

Second, if Tangut had -u but no -ʊ, it would have had an asymmetrical set of simple vowels:

i (R11 / Grade IV)   u? (R1 / Grade I [not IV!])
ɪ (R8 / Grade I)   (no Grade I -ʊ!)
e (R34 / Grade I) ə (R28 / Grade I) o (R51 / Grade I)
  a (R17 / Grade I)  

On the other hand, Ukrainian has a similar asymmetrical vowel system (without ə; Ukrainian ɪ is central front, reflecting its origin as a merger of central and front *i). (6.28.1:14: No such merger producing an isolated ɪ ever occurred in Tangut.)

Reconstructing -ʊ might result in another asymmetrical vowel system unless I reconstructed Grade III R2 or Grade IV R3 as -u, a possibility I have considered from time to time: e.g., in my June 20 and June 17 reconstructions.

Third, this Tangut loanword from Middle Chinese (MC) has R1 corresponding to EMC *-o:

3806 2bʊ < MC *bo 'cattail'

-ʊ is closer to MC *-o than -u. (Perhaps the ancestor of R51 was not *-o when this word was borrowed.)

Unfortunately there are no other examples of R1 loans that are unambiguously from MC. The name of the rising tone version of R1,

3224 2bʊ 'to divine, tell fortunes'

could be from Chinese 卜, but the initial is irregular (does it incorporate a Tangut voicing prefix: *b- < *N-p-?) and cannot be used to determine the age of the borrowing. (A voiced obstruent initial in a Sino-Tangut loanword indicates MC origin, since later loans have voiceless aspirates reflecting post-MC devoicing.) 2bʊ could be from a form anywhere on a spectrum from MC *pok to post-MC *pu.

Hence for now I seem to be alone in reconstructing R1 as -ʊ, though I remain open to the possibility that it was -u, the reconstruction favored by the majority of scholars.

AN INTENTION TO WHISTLE: TANGUT RHYME 8

At the end of part 2 of "G-*r-adation in Tangut", I wrote,

I will explain the reasoning behind the reconstruction of individual [Tangut] vowels and diphthongs in future posts.

I initially thought that I would start with Tangut rhyme 1, but I realized that my explanation for that rhyme was dependent on my explanation for rhyme 8. So I'm going to start with 8 and go back to 1.

Rhyme 8 (R8) has two names in the Precious Rhymes of the Tangraphic Sea: one for its level tone version (1.8 = 1st tone, 8th rhyme) and another for its rising tone version (2.7 = 2nd tone, 7th rhyme).

3100 1sɪ 'intention' and 1007 2sɪ '(to) whistle'

R8 is the Grade I member of a set of four i-type rhymes (R8-R11). I reconstruct Grade I with zero corresponding to the vowels that characterize Grades II-IV: -ɤ-, -ɨ-, -i-. I would prefer to reconstruct R8 as zero plus a simple vowel. But what was that vowel?

R8 was transcribed in Tibetan as -i (5 times), -iH (twice), -yi (once), and -ing (once) (Tai 2008: 206). This does not necessarily mean that R8 was -i, but it does tell us that R8 was something like -i.

R8 was never used to transcribe Sanskrit short -i or long -ī (Arakawa 1997: 110). Therefore R8 was -i/ī-like but not -i/ī itself, and I rule out Arakawa's reconstruction of -i for R8.

Sofronov (1968 and 2012) and Gong (1997) reconstructed R8 as short -e, a sound absent from Sanskrit. (Sanskrit e is always long [eː].) Their reconstructions are consistent with the absence of R8 in Tangut transcriptions of Sanskrit (Arakawa 1997), but not with the Tibetan i-transcriptions of R8. If R8 were -e, I would expect its Tibetan transcription to be *-e, not -i.

Given that

- Grade I in Chinese is associated with descendants of lower(ed) vowels (that were emphatic at an even earlier stage)

- the Tangut grade system was influenced by Chinese

- there was no Grade I i-type rhyme in Chinese, implying Tangut R8 was unlike anything in Chinese (and further indicating that a simple -i is improbable for R8 since Chinese certainly had -i, which was in Chinese Grade IV, not Chinese Grade I!)

I reconstruct R8 as -ɪ, a lowered version of pre-Tangut *i. -ɪ is like Tibetan -i while also being unlike anything in Sanskrit. Nishida (1964) and Li (1986) also reconstructed R8 as -ɪ.

Other proposed reconstructions of R8 are unlike Tibetan -i or anything in Sanskrit:

Hashimoto (1965): -ɛj [-eːj] (I would expect the Tibetan transcription *-e)

Huang Zhenhua (1983): -ɔi, -oi (I would expect the Tibetan transcription *-oHi)

I used to reconstruct R9 as -ɪ, but now I reconstruct it as -ɤi with the Grade II vowel -ɤ.

NAMES OF THE TANGUT CAPITAL (PART 2)

In part 1 I covered names containing 州 zhou, 府 fu, and/or 興 xing from Dunnell (1989). Here are the miscellaneous names from her article (with the exception of #4).

1. 衙頭 Yatou

衙 is 'government office' and 頭 is 'head'. Is this a Chinese term coined in the Tangut Empire and/or a translation or even a transcription of a Tangut term?

2. 牙帳 Yazhang

This superficially looks like 'tooth tent', but I wonder if 牙 is another spelling of its homophone 衙 'government office'. According to Dunnell, "In Tang and Liao usage, yazhang designated the imperial camp, or the emperor himself."

3. Erighaya/Egrigaia/Iriqai/Irigai

This is the most mysterious of all the names. It is presumably a transcription of a Tangut name. At first I thought Mongolian speakers had added the first syllable to an r-initial original because Mongolian did not permit initial r- (cf. Mongolian Orus for 'Russia'), but now I wonder if E-/I- is the mysterious E- in Etsina/Etzina, whose latter two-thirds mean 'black water':

3058 0176 2ziəəʳ 1nɨaa

I have never been able to identify a plausible Tangut word corresponding to E-.

Kychanov identified Ir- as a Mongolian inversion of Tangut ri: i.e.,

4396 2riəʳ 'room, hall, main building'.

He also identified the final syllable as, in Dunnell's (1989: 58) words, one of "various Tangut words denoting fortified settlement". (Does Kychanov's original article specify those words?) However, I do not know of any Tangut word sounding like ghaya/gaia/gai/qai with such a meaning. And the words that do have such meanings sound nothing like ghaya/gaia/gai/qai:

0289 1vɪ 'walled city'

1623 2vạ 'imperial city, imperial palace'

1869 1po 'fort' < Chn 堡

My current Tangut reconstruction does not even have the rhyme -ai. Nor have I ever seen a Tibetan transcription of Tangut indicating a rhyme -ai.

4. Calachan

Andrew West pointed out that Calachan is Marco Polo's name for the capital of the "province [not city!] called Egrigaia [...] belonging to Tangut" (The Travels of Marco Polo, p. 281). I can't see the note about Calachan in that edition, so I found it in another edition. Rashid al-Din wrote the name as Kalajān, and the name refers to Alashan (Mongolian Alasha), known in Tangut as

2xɪ̃ 1lã

I cannot explain the mismatch in the first vowels of 2xɪ̃ 1lã, Alasha (whose sha is from Chinese 山 'mountain'), and Mandarin 賀蘭 Helan < *xɔlan.

Palladius identified Calachan as "the summer residence of the Tangut kings [i.e., not the Tangut capital], which was 60 li from Ning-hia, at the foot of the Alashan Mountains. It was built by the famous Tangut king Yuen-hao, on a large scale, in the shape of a castle, in which were high terraces and magnificent buildings." Palladius stated that the Tangut name of Calachan was "apparently" Halachar. Is that name in Chinese transcription in 西夏書事 Xixia shushi? I cannot find any char-like Tangut syllable that would make sense after 2xɪ̃ 1lã.

5. 開封 Kaifeng (!)

Kaifeng was of course the capital of the Northern Song. Like Shi Jinbo, I don't think the Tangut ever used the name of the capital of their neighbor and rival. Dunnell could not find 'Kaifeng' in any Tangut sources. I don't know how that Chinese name would have been written in Tangut. A possible transcription might have been

4186 2635 1khe 2xiõ

with the character that appeared in names in part 1, bringing us full circle.

G-*R-ADATION IN TANGUT (PART 2)

While writing "Hilo in Tangraphy", I changed my mind about how to reconstruct the core Tangut vowel system. What follows is an outline of the history I reconstruct leading up to my newest diagram.

The earliest stage of pre-Tangut had only six vowels:

i ə u
e a o

Basic pre-Tangut words had the structure

presyllable + syllable


with stress on the second vowel. The six unstressed first vowels of the presyllable may have merged into a smaller set. For now let's suppose there were only two unstressed first vowels: higher *ɯ and lower *ʌ.

Medial *-r- lenited to *-ɨ-.

Under the influence of the neighboring dialect of Chinese, palatals became retroflexes followed by *-ɨ-.

Vowel harmony required the first and second vowels of a word to have matching height classes. Nonhigh vowels bent upward after *ɯ, and high vowels bent downward after *ʌ. *-ɨ- lowered and backed to *-ɤ- after *ʌ and before lower vowels not preceded by *ɯ. These changes produced a richer vowel system full of diphthongs that became unpredictable and hence phonemic after presyllables were lost:

original main vowel *i *e *ə *a *u *o
no presyllable: no change *i *e *ə *a *u *o
no presyllable + *-ɨ- *ɨi *ɤe *ɨə *ɤa *ɨu *ɤo
presyllable with *ɯ: *i *ie *ɨa *u *uo
presyllable with *ɯ + *-ɨ-: *ɨi *ɨe *ɨu *ɨo
presyllable with *ʌ: *ei *e *a *ou *o
presyllable with *ʌ + *-ɨ-: *ɤi *ɤe *ɤə *ɤa *ɤu *ɤo

*-ɨ- generally fronted to *-i- after initials other than retroflexes and *l- which may have been velar [ɫ]. *-ɨ- survives in a few words which may be archaisms and/or borrowings from dialects without *-ɨ-fronting: e.g.,

0785 1bɨu < *bru 'border'

3408 1tsɨa < *Cɯ-tsa 'to broil' (cf. Tibetan tsha < *tsa 'hot'; was the presyllabic a causative prefix?)

*u merged with *iu.

*ei and *ou monophthongized as ɪ and ʊ: i.e., as compromises between upper-mid and high vowels.

The Tangut phonological tradition categorized these vowels and diphthongs into four grades. Three (II-IV) each had a characteristic vowel, whereas rhymes of the first grade did not begin with any of those vowels:

Vowel Front Central Back
Grade i e ə a u o
IV: i i ie ia iu io
III: ɨ ɨi ɨe ɨə ɨa ɨu ɨo
II: ɤ ɤi ɤe ɤə ɤa ɤu ɤo
I: Ø ɪ e ə a ʊ o
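The grade scheme in the table above amounts to a test on the first segment of a rhyme. The Python sketch below assumes rhymes are written in the notation of that table, with or without a leading hyphen; the helper name grade and its lookup table are hypothetical illustrations of the reconstruction, not a model of how the Tangut tradition itself assigned grades.

```python
# Hypothetical helper (not from any Tangut source): assigns a grade to a
# rhyme written in the notation of the table above. Grades II-IV are marked
# by a characteristic first vowel; any other rhyme counts as Grade I.
GRADE_BY_FIRST_VOWEL = {
    "ɤ": "II",   # U+0264
    "ɨ": "III",  # U+0268
    "i": "IV",   # plain i; ɪ (U+026A) is a different character
}


def grade(rhyme: str) -> str:
    """Return 'I', 'II', 'III', or 'IV' for a core rhyme such as '-ɨo'."""
    core = rhyme.lstrip("-")
    if not core:
        raise ValueError("empty rhyme")
    # Diacritics (tildes, retroflex hooks) follow the first letter,
    # so only the first character matters for this test.
    return GRADE_BY_FIRST_VOWEL.get(core[0], "I")


if __name__ == "__main__":
    # Rhymes mentioned in these posts: R1 -ʊ, R8 -ɪ, R9 -ɤi, R11 -i, R51 -o
    for label, rhyme in [("R1", "-ʊ"), ("R8", "-ɪ"), ("R9", "-ɤi"),
                         ("R11", "-i"), ("R51", "-o")]:
        print(label, rhyme, grade(rhyme))
```

Grade I is deliberately the fallback case, mirroring its definition as the grade whose rhymes begin with none of the three characteristic vowels.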

This latest system is close to the one I've been using for the last six years but has the following differences:

- All Grade II diphthongs now share a vowel; my current diphthongs transparently share more in common than the lowered vowels of my earlier reconstruction

- The Grade I i- and u-vowels are now monophthongs like the other Grade I vowels. David Boxenhorn pointed out that my system from last week had no simple vowels; all phonetically simple vowels were phonemic diphthongs. That is highly unlikely, so I have reinterpreted Grade I as the home of simple vowels.

I will explain the reasoning behind the reconstruction of individual vowels and diphthongs in future posts.

WHAT CAN A KUNG FU POSTER TEACH US ABOUT SOUTHEAST ASIAN PHONETIC HISTORY?

Today I saw a poster for หมัดนรก ฝ่ามือพญายม Mat narok famɯɯ phayaa Yom (Hell Fist and King Yama's Palm), the Thai version of 幽靈神功 Youling shengong (Phantom Kung Fu). (Many more Thai posters for Chinese movies are at Kung Fu Movie Posters.)

The word พญา <bañā> phayaa 'king' caught my eye because it didn't look like an Indic loan even though most polysyllabic Thai words are of Indic origin. Where could it come from? My guess was Khmer, and yes, there is a Khmer word ពញា <bañā> phɲiə. But the trail doesn't stop there. It goes back to another Tai language. The online version of Headley's 1977 Khmer dictionary derives it from Lao ພະຍາ <baḥyā> phaɲaa. So where did that come from?

Here's what I think happened. The root of the word is ultimately Khmer after all: Old Khmer vrah 'divine being' which later became premodern Khmer brah. This word was borrowed into early Thai and Lao (or their common ancestor?) as *bra and added to the native Tai word *yaa 'male' (surviving in Thai royal language). *brayaa developed regularly into modern Thai พระยา <braḥyā> phrayaa 'a rank of nobility'. Meanwhile in Lao it underwent *y-nasalization, becoming *braɲaa.

That Khmer-Lao hybrid form was then borrowed into Khmer as ព្រញា <brañā>. Then Lao lost medial *-r- and the resulting *baɲaa was borrowed into Khmer as ពញា  <bañā> bɔɲaa.

This bɔɲaa (or bəɲaa with a reduced first vowel) in turn was borrowed into premodern Thai as *baɲaa, becoming modern Thai พญา <bañā> phayaa after devoicing of *b- and denasalization of *ɲ. Thai phayaa 'king' coexists alongside phrayaa 'a rank of nobility'. (Although one might prefer to simply derive the Thai form directly from Lao without a Khmer intermediary, Khmer is a more likely source than Lao since Khmer was a source of Thai court terminology.)

Finally, Khmer bəɲaa became modern phɲiə after further Khmer-internal changes: breaking of the stressed second vowel after a voiced consonant, devoicing of *b, and loss of the unstressed first vowel. The Khmer spelling ពញា  <bañā> still reflects the word's disyllabic origin even though the word is now pronounced as a monosyllable.

All of the above implies the following relative chronology of changes:

1. Khmer vr > br (before Lao and Thai borrowed *bra from Khmer)

(Is it also possible that early Tai speakers borrowed Khmer vr- as *br- even before this change in Khmer?)

2. Lao *y > ɲ (before Khmer borrowed <brañā> from Lao)

3. Lao *-r- > Ø (before Khmer borrowed Lao *baɲaa as <bañā>)

4. Khmer aa > iə after voiced consonants (before voiced obstruents were devoiced)

5. Devoicing of *b (and other voiced obstruents) in Khmer, Lao, and Thai (after aa-breaking)

Devoicing in Khmer must have occurred after voicing conditioned aa-breaking, and it could have occurred in Khmer after it had already occurred in Lao and Thai.
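A relative chronology like the numbered list above is a set of pairwise "X before Y" constraints, and any order satisfying all of them is a topological sort. The Python sketch below (3.9+, standard-library graphlib) encodes one interpretation of that list; the event labels are hypothetical, and the edge placing the Tai borrowing of *bra before Lao *y-nasalization is inferred from the narrative (*brayaa must have existed before its *y nasalized) rather than stated outright.

```python
from graphlib import CycleError, TopologicalSorter

# Each event maps to the set of events that must precede it. The labels and
# edges are an interpretation of the chronology above (hypothetical model,
# not a complete history of Khmer, Lao, or Thai).
CONSTRAINTS = {
    "Tai borrows Khmer *bra": {"Khmer vr > br"},
    # inferred: *brayaa had to exist before Lao nasalized its *y
    "Lao *y > ɲ": {"Tai borrows Khmer *bra"},
    "Khmer borrows Lao <brañā>": {"Lao *y > ɲ"},
    "Lao *-r- > Ø": {"Khmer borrows Lao <brañā>"},
    "Khmer borrows Lao *baɲaa": {"Lao *-r- > Ø"},
    "Khmer aa > iə after voiced consonants": {"Khmer borrows Lao *baɲaa"},
    "Khmer devoicing of *b": {"Khmer aa > iə after voiced consonants"},
}


def chronology(constraints):
    """Return one event order satisfying every constraint.

    Raises graphlib.CycleError if the constraints contradict each other.
    """
    return list(TopologicalSorter(constraints).static_order())


def is_consistent(constraints):
    """True if at least one order satisfies all the constraints."""
    try:
        chronology(constraints)
    except CycleError:
        return False
    return True


if __name__ == "__main__":
    for step, event in enumerate(chronology(CONSTRAINTS), start=1):
        print(step, event)

    # Also requiring devoicing to precede aa-breaking creates a cycle.
    reversed_order = {event: set(before) for event, before in CONSTRAINTS.items()}
    reversed_order["Khmer aa > iə after voiced consonants"].add("Khmer devoicing of *b")
    print(is_consistent(reversed_order))  # False
```

Because these constraints form a single chain, only one order is possible here; with looser constraints graphlib would return just one of several valid orders.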

The loss of the first vowel in Khmer must date after Lao *baɲaa was borrowed as a disyllable in Khmer. Thai *baɲaa could be from a Khmer monosyllable *bɲaa, though I cannot find any attestations of a monosyllabic Khmer spelling.

YOUNG CELERY FROM BAMBOO STREAM CITY

This morning I saw this meteorologist and learned that her mother is from 芹𡮲 Cần Thơ.

cần 'celery' is a loan from Chinese.

𡮲 thơ 'young' is a native Vietnamese word written with a made-in-Vietnam nom character: a semantophonetic compound of the Chinese characters 小 'small' (semantic) atop 詩 thơ 'poem' (phonetic). Vietnamese adjectives follow the nouns they modify.

I found the nom spelling in the Chinese Wikipedia entry for Cần Thơ and confirmed it at this wiki in the nom script.

I expected the Chinese name of Cần Thơ to be 芹詩 with the Chinese character 詩 'poem' as the closest possible substitute for the nom character 𡮲 which is not in most Chinese fonts and has no Chinese reading. Moreover, 詩 has a nom (but not Sino-Vietnamese) reading thơ that is homophonous with 𡮲 thơ. However, I was surprised to learn that the actual Chinese name is 芹苴 with 苴 'hemp'. The Sino-Vietnamese reading of 苴 is thư which is phonetically similar to 𡮲 thơ, but Chinese readings of 苴 such as Mandarin ju or Cantonese zeoi [tsɵy] don't sound like thơ. A Chinese speaker would have chosen a character with a Chinese reading resembling thơ [tʰəː]. Was the Chinese name 芹苴 coined by Vietnamese-Chinese bilinguals who knew 𡮲 thơ was not possible in Chinese, rejected 詩 thơ, and thought 苴 thư was close enough? Did the spelling 芹苴 originally appear in nom texts and spread into Chinese?

The Khmer name of Cần Thơ is ព្រែកឫស្សី prɛɛk rɨhsǝy. ព្រែក prɛɛk is 'stream' and ឫស្សី rɨhsǝy is 'bamboo'.

HILO IN TANGRAPHY

Tonight I saw this video which made me nostalgic for Hilo where I wrote much of the first two years of this blog (now offline). It defined Hilo as 'to twist'.

That made me wonder how Hilo could be written using the principles of Tangut script.

Of course the most obvious solution would be to write it phonetically: e.g., as

1xi 1lo

or

1xi 2lo

using characters used to transcribe the Sanskrit syllables hi and lo (Arakawa 1997: 110-111).

(You may have noticed that I am reconstructing Grade I rhyme 51 as -o again rather than as -ao from part 1 of "G-*r-adation in Tangut". I've revised my reconstruction, as I'll explain in part 2 of "G-*r-adation in Tangut".)

But it would be more fun to create original characters: e.g.,

?xi ?lo (tones uncertain)

consisting of two semantic compounds:


?xi (first half of 'Hilo') =

bottom left of 1408 1lhioʳ 'place' +

right of 5585 2ŋiəʳ 'to twist'


?lo (second half of 'Hilo') =

left ('hand') of 5585 2ŋiəʳ 'to twist' +

left ('earth') of 2627 2lɨə̣ 'earth'

In that imaginary case, we know why 'Hilo' would have been written with 'twist'.

In the real case I wrote about last night, I don't know why the name of the Tangut capital

2635 0707 2xiõ 1tʂɨew

was written with

Nishida radical 191 / Boxenhorn code dol

a component in eight other characters with seemingly inappropriate semantics: 'mud', 'servant', 'Xianbei', 'tick', a suffix, 'phantom', 'stupid', and 'rush used as a candle wick'. Ruling out the suffix, perhaps one of the seven others is somehow relevant to the location of the capital in an unexpected way.

A list of meanings of Hilo from the online Pukui/Elbert Hawaiian-English dictionary is just as eclectic as those eight meanings:

1. 'to twist, braid, spin; twisted, braided; threadlike; faint streak of light'

2. 'first night of the new moon'

3. 'name of a famous Polynesian navigator for whom the city and district may have been named'

(Was the navigator named after one of the other six meanings?)

4. 'Hilo grass'

5. 'a running sore'

6. 'a variety of sweet potato'

7. 'thighbone'

Each of these meanings in turn has its own associations.

Untangling the Tangut script involves reconstructing a web of such associations. Many must be culture-specific, but the only surviving evidence for them may be in ... the script! Can researchers escape the trap of circular reasoning?

