Last Friday (yes, I'm behind), I saw

新商品 'new product', lit. 'new trade item'

on packaging.

In Old Chinese, 商 was

either *sɯ-taŋ (corresponding to Baxter and Sagart 2014's *s-taŋ)

or *sɯ-laŋ (corresponding to Schuessler 2009's *lhaŋ)

and in Middle Chinese, it was *ɕɨaŋ.

It occurred to me that the palatalization of *sɯ-t- to *ɕ-

*sɯ-t- > *sɯ-tɨ-  > *stɨ- > *stɕɨ- > *ɕtɕɨ- > *ɕːɨ- > *ɕɨ-

was like what I understand to be the palatalization of *stj- to [ɕː] in Russian:

*stj- > *stɕ- > *ɕtɕ- > щ [ɕː]

Above I presume there was an intermediate *ɕtɕɨ- stage at some point in Old Chinese resembling romanizations of Russian щ as šč or shch (e.g., Хрущёв Khrushchev), but without external evidence (e.g., Old Chinese transcriptions of a foreign word with šč-), it's impossible to say when that point was.
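The resemblance between the two chains can be checked mechanically. The Python sketch below is only an illustration: it strips the reconstruction asterisk, the hyphens, and the vowel ɨ from each stage (a simplification of mine, not part of either reconstruction) and lists the consonantal stages that the Old Chinese chain shares with the Russian one.

```python
# Stages copied from the two chains above.
chinese = ["*sɯ-t-", "*sɯ-tɨ-", "*stɨ-", "*stɕɨ-", "*ɕtɕɨ-", "*ɕːɨ-", "*ɕɨ-"]
russian = ["*stj-", "*stɕ-", "*ɕtɕ-", "ɕː"]


def consonants(stage: str) -> str:
    """Drop the asterisk, hyphens, and the vowel ɨ (an illustrative simplification)."""
    for symbol in ("*", "-", "ɨ"):
        stage = stage.replace(symbol, "")
    return stage


russian_stages = {consonants(s) for s in russian}

# Keep the Chinese stages, in order, that also occur in the Russian chain.
shared = [consonants(s) for s in chinese if consonants(s) in russian_stages]
print(shared)
```

The shared stages are the middle of both chains; the Chinese chain merely has extra steps before (loss of the presyllable vowel) and after (shortening of the long fricative).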

3.14.11:45: I assume that Russian alternations such as

вместить 'to contain (perf.)' ~ вмещу 'I will contain'

can be internally reconstructed as

*vmestitĭ ~ *vmestju

to fit the pattern of

вменить 'to consider (perf.)' ~ вменю 'I will consider'

< *vmenitĭ ~ *vmenju

Ideally I'd like to find an example of initial щ- [ɕː] from *stj-, but I think initial щ [ɕː] is normally from *sk-. A possible exception I found in Preobrazhensky's Etymological Dictionary of the Russian Language is щегол 'goldfinch'; Duden says German Stieglitz 'goldfinch' is of Slavic origin.

Proto-Slavic *štjegŭlŭ? > *ščegŭlŭ

East Slavic:

Ukrainian щиголь <ščyhol'>, щоголь <ščohol'>, щоглих <ščohlix>

Belarusian щигель <ščihel'>, щиглик <ščiglik> (I have kept Preobrazhensky's spellings with щ and и instead of modern шч and і)

(why -ль as if from *-lĭ?)

(no South Slavic reflexes? I would expect Bulgarian initial щ- [št], Serbo-Croatian initial št-, and Slovene initial šč-)

West Slavic:

Czech stehlec, stehlík (with ste- rather than the regular ště- [ʃcɛ] - could this be a borrowing from some variety of German in which st- was [st] instead of [ʃt]?)

Polish szczygieł [ʂtʂɨɡʲɛw]

Upper Sorbian šćihlica [ʃtsʲihlitsa]

Lower Sorbian ščgeľc [ʂtʂgɛlts] (I have kept Preobrazhensky's spelling with ľ instead of modern l)

The reflexes of *stj- could have had parallels in Old Chinese at different stages and/or different places.

A LITTLE MISTAKE: ÍT ÓT TO BE THE PHONETIC

In my last post, I wrote that 乚 ất was the phonetic of the Vietnamese Chữ Nôm character 𡮒 ót 'a kind of fish'. After announcing that post on Twitter, I realized that the actual phonetic was 𠃝 which has two readings, ít 'little' and út 'youngest'. I didn't think of 𠃝 because 乙 appears as 乚 in 𡮒.

If the creator of 𡮒 had the reading út in mind for its phonetic 𠃝, the score of 𡮒 would be 2 + 3 + 2 + 2 = 9 - much higher than my original score of 6.

乙 is a 'Semitic phonetic': it can represent syllables with a wide range of vowels as long as those vowels are within the consonantal frame [ʔ-t]:

Neutral or achromatic vowels (neither palatal nor labial)

ướt [ʔɨət]

ất [ʔət]

ớt [ʔəːt]

𢖮 ắt [ʔat]

𢖮 át [ʔaːt]

Palatal vowels

𠃝 ít [ʔit]

𠮙 ét [ʔɛt]

Labial vowels

𠃝 út [ʔut]

𡮒 ót [ʔɔt]

All of those syllables have the sắc tone written with an acute accent. Syllables with initial glottal stops and final stops regularly develop that tone.

Such a range of vocalism for a phonetic is unusual in Chữ Nôm. In my 2003 book, I proposed that phonetics generally belong to three vowel classes: neutral, palatal, or labial.

'Semitic phonetics' are exceptions to that generalization: e.g., 曰 viết in

neutral: 曰 vất [vət], 抇 vớt [vəːt]

palatal: 𢪏 vít [vit], 𧿭 vết [vet], 𢪏 vét [vɛt]

labial: ⿰曰𡿨 vót [vɔt]
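The 'consonantal frame' idea can be stated as a small check. This Python sketch takes the IPA readings listed above for the derivatives of 乙 and 曰 and confirms that each set shares a single onset-coda frame. The helper names are my own, and the function assumes that onset and coda are each one character long, which holds for these readings but not for Vietnamese in general.

```python
# IPA readings copied from the lists above.
yi_readings = ["ʔɨət", "ʔət", "ʔəːt", "ʔat", "ʔaːt", "ʔit", "ʔɛt", "ʔut", "ʔɔt"]
yue_readings = ["vət", "vəːt", "vit", "vet", "vɛt", "vɔt"]


def frame(ipa: str) -> tuple:
    """Return (onset, coda), assuming both are single characters."""
    return ipa[0], ipa[-1]


def frames(readings: list) -> set:
    """Collect the distinct consonantal frames found in a list of readings."""
    return {frame(reading) for reading in readings}


print(frames(yi_readings))   # one frame expected for 乙
print(frames(yue_readings))  # one frame expected for 曰
```

A phonetic that was not 'Semitic' in this sense would still yield one frame here; what makes these two unusual is the number of different vowels that occur inside that frame.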

3.1.0:39: Compare the ranges of readings for 'Semitic phonetics' above with those for کت <kt> listed in Hayyim's New Persian-English Dictionary:

neutral: kat

palatal: ket

labial: kot

(Of course, Persian is not a Semitic language, but it is written in a Semitic script.)

One difference is that all of those k-t readings have no tones, whereas all of the readings for Chữ Nôm characters with the two 'Semitic phonetics' above have the same tone. Perhaps the term 'Semitic phonetic' is a misnomer if the consonantal frames are actually consonant-and-tone frames.

甘 cam is a third 'Semitic phonetic' whose derivatives below have readings with three different tones (ngang, huyền, sắc) as well as three different vowel classes:

neutral: 坩柑泔 cam [kaːm], 紺 cám [kaːm], ⿰月甘 cằm [kam], 𩚵 cơm [kəːm], 鉗 cườm [kɨəm]

palatal: 鉗 kìm [kim], ghìm [ɣim], kiềm [kiəm], kềm [kem], kèm [kɛm]

labial: 鉗 cùm [kum], 柑 cùm [kum]

Note, however, that all but one of the readings in that sample have either the ngang or huyền tones which are variants of the same proto-tone conditioned by voicing or its absence in proto-onsets. Also, only one of those characters is a made-in-Vietnam character (⿰月甘). 甘 was already a neutral and palatal phonetic in Middle Chinese because Old Chinese *a often had palatal reflexes after nonemphatic initials. An ideal example of a 'Semitic phonetic' would have many made-in-Vietnam derivatives with a wide range of vowels and tones. I should dig deeper to see if I can find one.

ÓT TO BE WRITTEN: FISHING FOR PHONETICS

The Vietnamese Chữ Nôm script represents Vietnamese syllables with existing and modified Chinese characters. The problem is that Vietnamese has many more syllables than Sino-Vietnamese, the subset of Vietnamese syllables that are Chinese character readings. For instance, Vietnamese has syllables ending in -ót, a rhyme absent from Sino-Vietnamese.

In my last two posts, I looked at Vietnamese solutions for writing the syllable lót.

I got curious about how other -ót syllables were written and found several strategies. My examples are not exhaustive, and I have omitted glosses in most cases since I am focusing on readings.

1. Overall match

⿰口脫 thót : 脫 thoát (score: 2 + 3 + 2 + 2 = 9; not a 10 only because the vowel heights don't match: o [ɔ] is higher than oa [wa], though I could be generous and say oa is like [o] + [a], and [ɔ] is between those two vowels in height)

2. Matching the onset and coda without much regard for the vowel

𡮒 ót 'a kind of fish' : 乚 ất (the unwritten onset is [ʔ]; score: 2 + 0 + 2 + 2 = 6)

mót : 蔑 miệt (score: 2 + 1 + 2 + 1 = 6; the only matching vowel quality is length*)

⿰曰𡿨 vót : 曰 viết (score: 2 + 1 + 2 + 2 = 7; the only matching vowel quality is length)

This is the consonantal skeleton or Semitic strategy. If English were written with such a strategy:

cat = drawing of a cat

Kate = <woman> + <cat>

kite = <wing> (representing flight) + <cat>

cut = <blade> + <cat>

coat = <clothes> + <cat>

coot = <bird> + <cat>

caught = <hand> + <cat>

Cf. the reverse Semitic strategy (5 below).

3. Matching the rhyme without much if any regard for the onset

3a. Glottal onset : nonglottal phonetic

𡁾 hót : 說 thuyết < *ɕ- (or *sʰ-?) (score: 0 or 1 + 2 + 2 + 2 = 6 or 7, depending on whether the aspiration of th- [tʰ] < *sʰ-? counts as a partial match for h-)

3b. *Palatal onset : nonpalatal phonetic

chót with initial [c] : 卒 tốt < *(t)s- (score: 1 + 3 + 2 + 2 = 8)

giót < *ɟ- < *CV-c- : 卒 tốt < *(t)s- (score: 0 or 1 + 3 + 2 + 2 = 7 or 8, depending on how close the initials were when 埣 was created: *CV-c- is not far from *(t)s-, whereas modern gi- [z] ~ [j] is far from t-)

xót < *ɕ- < *cʰ- : 卒 tốt < *(t)s- (score: 0 or 1 + 3 + 2 + 2 = 7 or 8, depending on how close the initials were when 埣 was created: *cʰ- is not far from *(t)s-, whereas modern x- [s] is far from t-)

⿰律𡿨 xót < *ɕ- < *cʰ- : 律 luật (score: 0 + 2 + 2 + 1 = 5)

3c. *Retroflex onset : nonretroflex phonetic

sót < *ʂ- < *Cr- : 卒 tốt < *(t)s- (score: 0 or 1 + 3 + 2 + 2 = 7 or 8, depending on whether the proto-onset was *sr- which isn't too far from *(t)s-; *(t)s- had hardened to t- by the time *Cr- fused into *ʂ-)

rót < *r- or *CV-s- (proto-onset unknown) : 卒 tốt < *(t)s- (score: 0 or 1 + 3 + 2 + 2 = 7 or 8, depending on whether the proto-onset was *CV-s-)

3d. Palatal nasal onset nh- [ɲ] : oral onset phonetic

nhót : 卒 tốt < *(t)s- (score: 0 + 3 + 2 + 2 = 7)

𦝬 nhót : 突 đột with initial [ɗ] < *t- (score: 0 + 3 + 2 + 1 = 6)

𣑵 nhót : 聿 duật with initial [z] ~ [j] < *dʲ- < *j- (score: 0 or 1 + 2 + 2 + 1 = 5 or 6, depending what the onset of 聿 was when 𣑵 was created)

3e. Lateral onset : nonlateral onset phonetic

⿰貝骨 lót : 骨 cốt (score: 0 + 3 + 2 + 2 = 7)

lót : 卒 tốt < *(t)s- (score: 1 + 3 + 2 + 2 = 8)

3f. Labial onset : nonlabial onset phonetic

𡁾 vót : 說 thuyết < *ɕ- (or *sʰ-?) (score: 0 + 2 + 2 + 2 = 6)

vót : 卒 tốt < *(t)s- (score: 0 + 3 + 2 + 2 = 7)

This character could belong to 2 or 3a depending on which part is phonetic:

⿰孛乙 ót 'back of brain' : 孛 bột 'comet' + 乙 ất 'second Heavenly Stem' (score: 0 + 3 + 2 + 1 = 6 if 孛 is phonetic or 2 + 0 + 2 + 2 = 6 if 乙 is phonetic)

Neither part is obviously semantic. The absence of any component meaning 'brain' or even 'head' is puzzling. Could this be a double phonetic compound with 孛 approximating the vowel and 乙 the rest?

4. Approximating the onset, vowel, and tone without regard for the coda

𠲿 thót : 束 thúc (score: 2 + 3 + 1 + 2 = 8)

I suspect 𠲿 was created by a speaker of a central or southern dialect in which *-t > [k]. If so, 𠲿 is really an example of strategy 1, and the score should be 9 (with a penalty solely for vowel height mismatch).

5. Approximating the vowel and tone without regard for the consonants

The reverse Semitic strategy (cf. 2 which is the Semitic strategy).

hót : 束 thúc (score: 0 + 3 + 1 + 2 = 6)

I suspect this usage of 束 started with a speaker of a central or southern dialect in which *-t > [k]. If so, 束 is really an example of strategy 3, and the score should be 7 (with penalties for the onset and vowel height mismatch). The score could be raised to 8 if the aspiration of th- [tʰ] counts as a partial match for h-.

No solution has a score of 4 for vowels simply because no phonetic has a Sino-Vietnamese reading with o [ɔ]. The maximum possible score for -ót syllables is 9 out of an ideal of 10 (= 2 + 4 + 2 + 2). The actual scores above range from 5 to 9. It is not possible to determine the median or the mode of scores for ót-characters from the data in this post because it is incomplete and only typologically rather than statistically representative: e.g., I omitted all but one strategy 1 character with a score of 9 because near-exact matches are boring.
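The stated range can be verified against the scores given in this post. The Python sketch below simply transcribes each pairing as (reading, Sino-Vietnamese reading of the phonetic, lowest score, highest score); where the post gives "x or y", both ends are kept, and the dialect-based rescorings suggested for the two 束 spellings are left out. It deliberately computes only the minimum and maximum, since the sample is not statistically representative.

```python
# (reading, phonetic reading, lowest score, highest score), transcribed from the lists above.
scores = [
    ("thót", "thoát", 9, 9),
    ("ót", "ất", 6, 6),
    ("mót", "miệt", 6, 6),
    ("vót", "viết", 7, 7),
    ("hót", "thuyết", 6, 7),
    ("chót", "tốt", 8, 8),
    ("giót", "tốt", 7, 8),
    ("xót", "tốt", 7, 8),
    ("xót", "luật", 5, 5),
    ("sót", "tốt", 7, 8),
    ("rót", "tốt", 7, 8),
    ("nhót", "tốt", 7, 7),
    ("nhót", "đột", 6, 6),
    ("nhót", "duật", 5, 6),
    ("lót", "cốt", 7, 7),
    ("lót", "tốt", 8, 8),
    ("vót", "thuyết", 6, 6),
    ("vót", "tốt", 7, 7),
    ("ót", "bột/ất", 6, 6),
    ("thót", "thúc", 8, 8),
    ("hót", "thúc", 6, 6),
]

lowest = min(low for _, _, low, _ in scores)
highest = max(high for _, _, _, high in scores)
print(lowest, highest)
```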

Until now Chữ Nôm characters and readings have been treated as a uniform, timeless body. The next phase of Chữ Nôm studies should take space and time into account: where and when do certain spellings arise, and what can they tell us about Vietnamese phonetics in a given place and period?

*I consider all Vietnamese vowels and diphthongs to be the same length for scoring purposes with the exceptions of the short vowels ă [a] and â [ə] which cannot appear in syllable-final position because all Vietnamese syllables must be bimoraic. Hypothetical *Că and *Câ-syllables would be monomoraic and therefore not permissible.

A LÓT OF BRIBES OF BONES AND SHELLS

字典𡦂喃引解 Tự Điển Chữ Nôm Dẫn Giải 'Character Dictionary of Chữ Nôm with Quotations and Explanations' lists

⿰貝骨 (not in Unicode) lót 'bribe'

as a homophone of lót 'to add a layer beneath or inside' from yesterday. (I suspect the noun is an extension of the verb: a bribe is something one pockets - put inside.)

貝 bối 'shell' on the left is the monetary radical. It's not surprising.

What is surprising is 骨 cốt 'bone' on the right with initial [k] instead of [l]. Or is it?

Using yesterday's scoring system for phonetic fidelity, ⿰貝骨 is a 7:

- the initial consonant is a 0 - [k] and [l] have nothing in common

- the vowel is a 3 - o [ɔ] and ô [o] are both back rounded and of the same length; only their height differs

- the final consonant is a 2 - a perfect match

- the tone is a 2 - a perfect match
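The vowel score of 3 follows from counting shared features. Here is a minimal Python sketch of that count; the feature labels are my own shorthand for the description above (both vowels back, rounded, and of the same length, differing only in height), not an established feature system.

```python
# Illustrative feature values for the two vowels compared above (labels are assumptions).
FEATURES = {
    "o": {"frontness": "back", "height": "lower", "rounding": "rounded", "length": "full"},   # o [ɔ]
    "ô": {"frontness": "back", "height": "higher", "rounding": "rounded", "length": "full"},  # ô [o]
}


def vowel_score(a: str, b: str) -> int:
    """One point for each of the four features that the two vowels share."""
    return sum(FEATURES[a][name] == FEATURES[b][name] for name in FEATURES[a])


# lót written with 骨 cốt: initial 0, vowel from the feature count, final 2, tone 2.
total = 0 + vowel_score("o", "ô") + 2 + 2
print(vowel_score("o", "ô"), total)
```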

Taberd lists a spelling of lót 'bribe' with a matching initial and an ironic original meaning:

律, originally for luật 'law' (bare phonetic)

I find his entry format confusing:

— 揬 | đút —, subornare

Why are the dashes in the Chữ Nôm and the Quốc Ngữ romanization on opposite sides? Why isn't the entry like this?

揬 — | đút —, subornare

đút, another word for 'bribe' (presumably an extended usage of đút 'to insert'), has two other spellings without the 扌 'hand' radical (the means of insertion):

⿰貝突 with the monetary radical plus the same phonetic 突 đột 'suddenly'

Perhaps the monetary radical is there so that the syllables of the redundant compound ⿰貝突⿰貝骨 đút lót 'bribe' would have matching radicals with this spelling: cf. Sino-Vietnamese 賄賂 hối lộ 'bribe' with double monetary radicals

賥 đút with the monetary radical plus the phonetic 卒 tốt 'to end'

Let's score those spellings:

揬 and ⿰貝突: initial 2, vowel 3, final 2, tone 1 = 8

賥: initial 1.5 (t- is closer to đ- than, say, l- which would be a 1), vowel 3, final 2, tone 2 = 8.5

Do scores correlate with textual frequency? Did writers tend to favor better phonetic matches? Probably not. I admit my scoring is arbitrary and for fun. And timely given that the


Thế vận hội Mùa đông

'World athletic meeting Season winter' = 'Olympic Winter Games'

are still going. Though not for long - they end tomorrow.

(I wanted to type a made-in-Vietnam character for mùa 'season', but my editor doesn't support CJK Unified Ideographs Extension E. And it probably never will since KompoZer's development has been frozen since 2010.)

A LÓT OF COMPROMISES: FITTING VIETNAMESE INTO A CHINESE SYLLABARY

Today I found out that one of the Chữ Nôm spellings of Vietnamese tốt 'good' (see parts 1 and 2 of my series)

䘹 = semantic 衤 y 'clothes' + phonetic 卒 tốt 'to end'

is also a Chinese character in the strict sense; it has been attested in Chinese since at least c. 2000 years ago in 楊雄 Yang Xiong's 方言 Fangyan 'Regional Speech' where it refers to *tsout 'underwear'. Did the Vietnamese recycle 䘹 for tốt 'good', or did they unintentionally recreate it? I suspect the latter, as 䘹 is a rare character; the fact that it was encoded in Unicode's Extension A block rather than the main CJK Unified Ideographs block tells me that it wasn't common enough to make it into the first wave of 20,971 characters.

字典𡦂喃引解* Tự Điển Chữ Nôm Dẫn Giải 'Character Dictionary of Chữ Nôm with Quotations and Explanations' lists a second reading for 䘹, lót 'add a layer beneath or inside', citing


lót trong áo cừu

'add.layer in coat'

from 嗣德聖製字學解義歌 Tự Đức thánh chế tự học giải nghĩa ca 'Tự Đức's Sagely Made Song for Character Study and Explaining Meanings' edited by Emperor Tự Đức sometime in the 19th century.

Normally the Vietnamese did not write l-syllables with t-characters. The other three spellings of lót in Tự Điển Chữ Nôm Dẫn Giải have l-phonetics:

1. 律 luật 'law' (bare phonetic)

2. 𢯰 = 扌 'hand' + 律 luật

3. ⿰衤律 (not in Unicode) = 衤 'clothes' + 律 luật

It is not possible to write lót with a Chinese character read as lót [lɔt] or even as lốt [lot] because no Chinese characters with those Sino-Vietnamese readings exist.

Chinese syllables with *l- ending in stops were borrowed with the nặng tone written as a subscript dot, not the sắc tone written with an acute accent. I can't think of any exceptions to this rule at the moment. So it seems a tonal match was impossible.

Tone aside, a perfect segmental match was also impossible.

lọt may not even have been a theoretically possible reading since *-ɔt does not seem to have been a rhyme in any variety of Chinese known to the Vietnamese during a millennium of Chinese rule.

The absence of lột, on the other hand, is partly accidental - there was no Chinese phonotactic rule forbidding it. lột would have ultimately come from an early Old Chinese *Cʌ-rut, with a *low vowel conditioning the lowering of *u:

*Cʌ-rut > *Cʌ-rout > *rout > *lout > *lot (> Sino-Vietnamese *lột)

luật 'law' comes from early Old Chinese *rut without a preceding *low vowel to lower its *u:

*rut > *lut > *lwit > *lwət (> Sino-Vietnamese luật)

Without any Chinese characters read as lọt or lột (or lót or lốt), the Vietnamese

- had to compromise on the tone if they were to use an l-phonetic

- had to compromise on the vowel if they were to use an l-phonetic

- had to compromise on the initial if they were to use a non-l -ốt phonetic

The four spellings of lót reflect two different kinds of compromises:

- the 律-spellings have l- at the expense of the tone and the vowel

- 䘹 has a perfectly matching rhyme at the expense of the initial

It seems the Vietnamese generally favored approximating the initial, but I would like to see statistics.

It would be fun to come up with a scoring system for how closely a Chữ Nôm character reading matches the pronunciation of its Chinese phonetic component**. Off the top of my head:


Consonants (onset and coda scored separately):

0 points - nothing in common

1 point - shared point of articulation (l- and t- as in 䘹) or shared manner of articulation

2 points - both shared point and manner of articulation

Vowels:

0 points - nothing in common

1 point each - shared frontness, height, roundness(less), or length

4 points - perfect match

Tones:

0 points - nothing in common

1 point - same register or *VQHC class

2 points - perfect match

Using that scoring system, out of a maximum score of 10 (2 for onset consonant, 4 for vowel, 2 for coda consonant, 2 for tone):

the 律-spellings have 7 points:

2 + 2 (length match; partial shared frontness and roundness) + 2 + 1 (same *VQHC class)

䘹 has 9 points:

1 (l- instead of t-) + 4 + 2 + 2

Yet spellings of the 䘹 type which compromise on the initial are less common despite their higher score. That makes me think I need a better scale for measuring onset fidelity.
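For anyone who wants to play with the scale, here is a minimal Python sketch of it. The class name `Fidelity` and its field names are my own invention; the maxima (2, 4, 2, 2) and the two worked examples are the ones given above. The component scores are entered by hand, so the sketch only enforces the ranges and does the addition.

```python
from dataclasses import dataclass

# Maximum points per component, as stated above (total 10).
MAXIMA = {"onset": 2, "vowel": 4, "coda": 2, "tone": 2}


@dataclass
class Fidelity:
    """Hand-assigned component scores for one spelling (hypothetical helper)."""
    onset: float
    vowel: float
    coda: float
    tone: float

    def __post_init__(self):
        # Reject component scores outside the ranges of the scale.
        for part, maximum in MAXIMA.items():
            value = getattr(self, part)
            if not 0 <= value <= maximum:
                raise ValueError(f"{part} score {value} is outside 0-{maximum}")

    @property
    def total(self) -> float:
        return self.onset + self.vowel + self.coda + self.tone


luat_spellings = Fidelity(onset=2, vowel=2, coda=2, tone=1)  # the 律-spellings of lót
tot_spelling = Fidelity(onset=1, vowel=4, coda=2, tone=2)    # 䘹 for lót
print(luat_spellings.total, tot_spelling.total)
```

A better onset scale could be tried by changing only the `onset` maximum in `MAXIMA` and rescoring.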

*2.23.14:10: Chữ 'character' can be spelled at least six different ways in Chữ Nôm:

1. 字

2. 𡦂 (字 doubled; this character formation strategy is rare)

3. 𡨸 (like 2 but with one 字 abbreviated as 宁)

4. ⿰ 字 + 宁 (reversal of 3)

5. ⿰ 字 + 文 'writing'

6. 𡨹 (like 4 but with 守 thủ 'to guard' instead of 字; 𡨹 is a Chữ Nôm character for giữ 'to guard' doing double duty for a phonetically similar word chữ; its phonetic 宁 is an abbreviation of 字 chữ)

I don't know which is the preferred spelling of the author (Nguyễn Quang Hồng). I picked 𡦂 to differentiate chữ from 字 tự which also means 'character'. Both words are borrowed from the same Chinese etymon at different periods.

**2.23.14:11: 𡗶 is a rare example of a Chữ Nôm character without a phonetic component: it is a compound of 天 'heaven' above 上 'above'. It is reminiscent of the Khitan large script character

for 'heaven' (with 土 'earth' on the bottom) which probably predates it. (The earliest surviving Chữ Nôm text is from 1209; the earliest known uses of Chữ Nôm from the late first millennium do not include 𡗶.)

WHAT'S SO BEAUTIFUL ABOUT A BUG'S BOTTOM?: THE ORIGIN AND ORTHOGRAPHY OF VIETNAMESE TỐT (PART 2)

I've abandoned a lot of series on this blog, but I haven't forgotten to continue what I started two weeks ago. Two sources both list nine spellings of Vietnamese tốt 'good':

Group A with phonetic 卒 tốt 'to end'

A1: bare phonetic

1. 卒

A2: tốt đẹp: 'good' in the sense of 'good-looking'

2. 䘹 with 衣 'clothes' on the left

Was 䘹 used in reference to clothes, or could it be used more broadly?

3. 𬙼 with 美 'beautiful' on the left

4. 𩫛 with 高 'high' on the left (why isn't this in group A3?)

5. 𡄰 with 善 'good (opposite of evil)' on the left (why isn't this in group A4?)

A3: tốt (dáng cao): 'good' in the sense of 'high'

6. 崪 with 山 'mountain' on the left

7. 崒 with 山 'mountain' on top

A4: tốt xấu: 'good' as the opposite of 'bad'

8. 𡨧 with a 宀 roof on top (why? - it's not by analogy with a roof in xấu 'bad', since none of the spellings of xấu contain a roof: 丑瘦臭醜.)

Group B without any phonetic: tốt đẹp: 'good' in the sense of 'good-looking'

9. 𧍉 with 虫 trùng 'bug' plus 底 để 'bottom'

This last character is like so many Tangut characters: it does not seem to be the sum of its parts. Its components neither sound like tốt nor mean anything like 'good'. I wish I could find an example of 𧍉 in context to confirm this reading.

𧍉 has a second reading which makes phonetic and semantic sense: đỉa 'leech'.

HENTAI KITSUBUN?

In premodern Japan, there was a form of Japanized Chinese now known as 變體漢文 hentai kanbun 'modified Chinese prose'.

Just as the Japanese once wrote in Chinese, the Jurchen once wrote in Khitan. There was no Jurchen script until 1119. Nonetheless, as late as 1156, 18 years after the creation of the second Jurchen script,

... it was officially ordered that in the [Jurchen Empire's] examination for copyist in the Department of National Historiography the Jurchen copyists be able to translate Kitan [= Khitan] into Jurchen, and the Kitan copyists Chinese into Kitan. Even the Jin [= Jurchen] emperor Shizong commented, "The new Jurchen script cannot match it [Khitan]." The Chinese original was first written in the Kitan small script and then annotated in or translated into the Jurchen script. (Kane 2009: 3)

Last night as I was thinking about the last known dated Khitan small script text from the Jurchen era known as

<GREAT> and <HEAVEN> (1161-1189)

in the Khitan small script, I realized that Jurchenized Khitan might be called 變體契文 hentai kitsubun 'modified Khitan prose' - or in Sino-Jurchen, something like biyanti kiwen.

How might Khitan be Jurchenized? Khitan seemed to have grammatical gender (Kane 2009: 144):

In the past tense of verbs, one can also see this distinction between the suffix <er> for males and <én> for females.


With the numerals, Wu Yingzhe has noticed an important phenomenon: in most cases, the dotted form refers to a male, and the undotted form to a female, or is non-gender specific. Dotted and undotted forms also appear with inanimate objects, strongly suggesting grammatical gender in Kitan. The whole corpus needs to be reexamined with a view to pursuing these clues, but that research has not yet been done.

Jurchen, on the other hand, did not. In theory Jurchen speakers writing in Khitan might have omitted masculine dots or added them to nonmasculine forms. (Not just feminine because I suspect there might be a neuter gender with agreement patterns blending masculine and feminine characteristics; seemingly inconsistent nouns may have been neuter.) Do inconsistencies in gender marking cluster in Jurchen-period Khitan texts? Even if that were the case, that does not necessarily mean gender problems would have been unique to Jurchen speakers. Perhaps gender was on the decline in native speaker Khitan.

Other Jurchen errors from a Khitan native speaker's perspective could have been less dramatic: e.g., incorrect case marking akin to saying Japanese-style 'X DAT become' instead of 'X NOM become' for 'become X' in Korean.

Centuries later, the Jurchen used Khitan's sister language Mongolian in writing after they had forgotten their own scripts. Has there ever been a study of Mongolian as written by Manchu speakers?

So far, the Khitan corpus has generally been treated as a single entity. But Khitan was written across a wide area for three centuries. What may appear to be inconsistencies within the corpus may turn out to be innovations and/or hentai kitsubun features correlated with specific times and/or places.

*2.22.9:24: The hypothetical Khitan neuter might be like the Romanian neuter which has no unique features:

[...] in synchronic terms, Romanian neuter nouns can also be analysed as "ambigeneric", i.e. as being masculine in the singular and feminine in the plural

Or maybe there is no neuter. The dots may not have a simple one-to-one correlation with the two genders.

KHITAN DOROGHAM 'PEACE'?

Last night I mentioned one exception to 'heaven' and 'great' matching up in the Khitan large and small scripts in Andrew West's list of era names:

[images of the matching characters in the two scripts]

The Chinese era name 大定 'great settlement' (1161-1189; Kane 2009's translation) corresponds to


in the large script and both


in the small script.

I assume the third large script character

is <gham> corresponding to


in the small script.

Did the Khitan have two era names recorded in the small script? This list mentions only one small script inscription from that era: the epitaph for the 博州防禦使 Bozhou defense commissioner (1171). I don't have access to the text. Do both small script era names appear in it, or is a single instance of the era name difficult to read? I can imagine a damaged 'great' looking like it could be 'heaven' in the small script or vice versa.

In any case, the second name element also appears in the Khitan large script equivalent of the Chinese era name 保寧 'protect tranquility' (969-979; again, Kane's translation):

<? SEAL gham>

The first character might represent a Khitan word for 'protect'. I cannot guess its reading because I don't know the Khitan small script equivalent of that era name.

I can, however, draw this equation:

定 'settlement' = 寧 'tranquility' = large <SEAL gham> = small <SEAL>

Thesaurus Linguae Sericae defines both 定 and 寧 as 'peaceful' in the synonym group 'delightful because orderly and lacking chaos'.

I conclude that <SEAL.gha(.a)m> (a joint transliteration of the large and small spellings) could have meant something like 'peace'. It may have had nothing to do with seals or rituals.

The Khitan word for 'seal' doubled as the word for 'ritual'. Jurchen doro(n), written as


<SEAL> ~ <SEAL> ~ <SEAL.un> (with clarifier)

with a character clearly related to


in the Khitan large script, also had that double meaning. Jurchen doro(n) may either be a borrowing from Khitan or be an unrelated word whose semantic scope was influenced by Khitan.

Khitan <SEAL> in <SEAL.gha(.a)m> may be a phonogram rather than a logogram. If the Khitan word <SEAL> was the source of Jurchen doro(n), it might have been read doro, and the word in question was dorogham.

Khitan dorogham looks vaguely like Written Mongolian words such as doru 'weak' and doru-ghsi 'downward' (cf. dege-gsi 'upward'*). Could they all share a root *dor 'down'?

> 'calmed down' > 'peaceful'

> 'pushed down onto paper' > 'seal'

> 'act performed to calm down', 'act of stamping a mark on circumstances' > 'ritual'

> 'strength down' > 'weak'

So perhaps <SEAL> wasn't just a phonogram; it might have been etymologically appropriate as well in <SEAL.gha(.a)m>. But what would Khitan -gham be if <SEAL> was the root?

*2.21.1:51: -gsi is an allomorph of the Mongolian '-ward' suffix after 'feminine' vowels like e.

THE RE-DISSECTION OF KHITAN 'SUCCESSION'

Last night I noticed something obvious that eluded me three years ago: the second character of the Khitan large script equivalent of the Chinese era name 統和 'uniting harmony'

looks like 統 'to unite, govern', the first character of the Chinese era name. Could the 統-like Khitan large script character have represented a Khitan borrowing of Liao Chinese 統 *tʰuŋ 'to unite, govern' or a native Khitan word meaning 'to unite' and/or 'to govern'? If so, then my earlier equation of that single character with two characters in the large script and with <s.bu.o.ɣo> in the small script would have to be changed to


'to unite/govern'? ≠ 'succession' = 'succession'

That then brings up the disturbing possibility that other Khitan large script era names may not be equivalents of Khitan small script names even if there are definite partial matches: e.g., 'heaven' and 'great' almost always match up in the two scripts in Andrew West's list of era names:

[images of the matching characters in the two scripts]

I will look at one exception to that generalization in my next post.

ECHOES OF THUNDER: CLARIFIERS IN THE JURCHEN LARGE SCRIPT

Today I saw Japanese dare 'who' spelled as 誰れ <WHO.re> in the title of the Gatchaman episode that first aired forty-five years ago today: 「総裁Xは誰れだ」. That is unusual because 誰 by itself is normally sufficient as a logogram for dare 'who'. The character has no other modern reading when it stands alone for a word. (The reading tare is archaic; no one would look at a cartoon titled 「総裁Xは誰だ」 and wonder whether 誰 should be read tare or dare.) Therefore there is no need to add a clarifier hiragana れ <re>.

The Jurchen script is full of such clarifiers: e.g., akdiyan 'thunder' appears as both


<THUNDER> and <THUNDER.an>.

If not for the word's Manchu reflex akjan and Chinese transcriptions like 阿玷 ~ 阿甸 *atjan*, we would not be able to reconstruct the probable reading of <THUNDER>.

Why did the Jurchen write akdiyan with a clarifier <an>? If Juha Janhunen is right, the Jurchen large script was not invented; rather, it was an adaptation of an existing variant of the Chinese script that was in use in the kingdom of Parhae that once ruled the Jurchen. If <THUNDER> was taken from that Parhae script, the Jurchen may have thought <THUNDER> could stand for either the lost Parhae word for 'thunder' or their own word. Adding <an> ensured that <THUNDER.an> would be read as their word akdiyan rather than as the Parhae word (which in this scenario wouldn't have ended in -an).

The Jurchen could have gotten the idea of clarifiers from their southern neighbors in Korea who used clarifiers to indicate that Chinese characters were to be read as Korean words rather than as Chinese words: e.g., in line 7 of the Old Korean poem 彗星歌 Hyeysŏngga 'Song of the Comet' (lit. 'Sweeping Star Song') by 融天師 Master Yungchhŏn in the 鄉札 hyangchhal script during the reign of Shilla's 眞平王 King Chinphyŏng (r. 579-632),

道尸 <ROAD.l> 'road' (cf. modern Korean kil)

掃尸 <SWEEP.l> 'sweep' (cf. modern Korean ssŭl)

星利 <STAR.li> 'star.?' (cf. modern Korean pyŏl.i**)

are written with the clarifiers 尸 <l> and 利 <li> to rule out Sino-Korean readings *to, *so, and *seŋ (later syŏng and now sŏng) for 道, 掃, and 星. We do not know for certain whether the modern Korean words above are the direct descendants of the Old Korean words, but they are the most likely reflexes even though in theory Old Korean could have had another word for road ending in *-l that was not ancestral to modern kil, etc. We can be far less certain about the Old Korean pronunciations of <ROAD>, <SWEEP>, and <STAR> without internal sources (i.e., alternate phonogram spellings in hyangchhal) or external sources (e.g., Chinese transcriptions).

*2.19.5:16: 阿玷 is from the Sino-Jurchen vocabulary of the Bureau of Translators (四夷館; entry 7) and  阿甸 is from the Sino-Jurchen vocabulary of the Bureau of Interpreters (會同館; entry 4).

**2.19.23:39: <STAR.li> at first appears to correspond to modern Korean 'star' followed by the nominative case marker i, but the context seems to require an accusative case marker, so either the usage of i may have changed over time or 'star' in fact was once a disyllabic word ending in *-li or *-ri with a final *-i reminiscent of Old Japanese posi 'star' which is sometimes thought to be a cognate. I need to look into this more.

HOW WAS OLD PERSIAN CUNEIFORM LIKE THE KHITAN SMALL SCRIPT?

It occurred to me today that both Old Persian cuneiform and the Khitan small script superficially resemble other scripts (Sumero-Akkadian cuneiform and Chinese characters) but operate on different principles.

Old Persian cuneiform was a syllabary with random gaps (transliterated below) plus a few logograms for 'Ahura Mazda', etc. not included in the table.


The gaps do not correspond to gaps in Old Persian phonology or phonetics: e.g., although there was no cuneiform character <ki>, Old Persian did have /ki/, and that was written as a sequence <ka.i>.
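The fallback spelling can be sketched in a few lines of Python. The sign inventory below is the commonly cited 36-sign phonetic set (3 vowels, 22 <Ca>, 4 <Ci>, 7 <Cu>), which I am supplying as an assumption since the table is not reproduced here; the function name `spell` is hypothetical. The sketch returns a dedicated sign when one exists and otherwise falls back to <Ca> plus a vowel sign, as in <ka.i> above. It does not model any other spelling conventions.

```python
# Assumed inventory: the commonly cited 36 phonetic signs of Old Persian cuneiform.
VOWELS = ["a", "i", "u"]
CA = "k g x c j t d θ p b f n m y v r l s z š ç h".split()  # consonants with a <Ca> sign
CI = "j d m v".split()                                       # consonants with a <Ci> sign
CU = "k g t d n m r".split()                                 # consonants with a <Cu> sign

SIGNS = set(VOWELS) | {c + "a" for c in CA} | {c + "i" for c in CI} | {c + "u" for c in CU}


def spell(consonant: str, vowel: str) -> list:
    """Return the transliterated sign sequence for a consonant-vowel syllable."""
    sign = consonant + vowel
    if sign in SIGNS:
        return [sign]                 # a dedicated syllabogram exists
    return [consonant + "a", vowel]   # gap: fall back to <Ca> followed by the vowel sign


def gaps() -> list:
    """List the <Ci> and <Cu> syllabograms that the inventory lacks."""
    return sorted(c + v for c in CA for v in "iu" if c + v not in SIGNS)


print(".".join(spell("k", "i")))  # the /ki/ example from the text
print(len(gaps()), "gaps")
```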

The Khitan small script (KSS) also seems to have been a syllabary with random gaps plus some possible consonant letters and a few logograms. The phonetic values of the c. 400 distinct syllabograms are not well understood and in some cases are unknown. But the picture that emerges from Kane's (2009) transliteration of the KSS is one of random gaps: e.g., unlike Old Persian, the KSS has <ki>, but no <ka> is known yet. That particular gap may reflect the absence of ka in Khitan if its phonology were like that of its surviving relative Mongolian. However, other gaps may have been random: e.g., there is no known syllabogram <ma>, though there was a syllable ma that had to be written as a letter sequence <m.a>.

The Khitan and Jurchen large scripts are mixtures of syllabograms and logograms. The Khitan large script is too poorly understood for me to say anything about gaps in it. The Jurchen large script, on the other hand, is mostly readable, and Kane's (1989: 27) table of syllabograms shows a few gaps. I have transliterated their contexts below:


The absence of <si> is not surprising since *si could have become ši or merged with ši. Jurchen could have been like Korean or Japanese which lack a distinction between /ši/ and /si/.

On the other hand, it is striking that gaps cluster in the column of <Co>-syllabograms. There is no obvious phonetic motivation for the absence of <Co>-syllabograms with the initials m- n- c- š- k- which do not constitute a coherent class of consonants. Nor is there a clear reason why there is no <de> if <te> and <ne> exist with initials at the same point of articulation. Each of those gaps could be either random or illusory - the 'missing' syllabograms could simply be one of the characters whose readings are currently unknown.

The Jurchen small script is all but unknown; the existing samples are too small for decipherment, much less the detection of gaps.

DID OLD PERSIAN HAVE UNWRITTEN FINAL CONSONANTS LIKE PYU?

It seems that Pyu sometimes had unwritten syllable-final consonants with the exception of /h/ which was always written on the line as a colon-like visarga. Some Pyu texts have subscript syllable-final consonant symbols and others don't. One Pyu text - the 'B' pillar of the Kubyaukgyi (a.k.a. Myazedi) inscription - has subscript consonants only in its first three lines and none in the remaining twenty-six. There is no obvious correlation between the presence or absence of subscript consonants and geography, date, or genre. The problem of why there were two styles of writing Pyu is reminiscent of the problem of why the Khitan had two scripts.

The Indic scripts of the Philippines originally had no means of indicating final consonants, and the Hanunó'o script is still generally written without the pamudpod vowel cancellation sign introduced in the 1950s.

Schmitt (2008: 84) suggests that Old Persian may have had a third type of situation in which some final consonants were written (/m r š/) and others were not though they

were perhaps still pronounced but in some manner phonetically reduced. Note that original Proto-Iranian *-a is written as Old Persian <-a> (i.e., [-aː]), but original *-an or *-ad is written as -<Ca> (i.e., [-a]).

I see two possibilities here:

1. *-an and *-ad merged into final short [-a] distinct from *-a which became long [-aː].

2. *-an became nasalized short [-ã] and *-ad became short [-aʔ] with a final glottal stop.

I used to think that Pyu also had unwritten nasal vowels and glottal stops that were reduced from earlier nasals and oral stops that were once written, but even the earliest texts do not always have final written consonants, even when there is more than sufficient space for them.

DOES PERSIAN 'AND' HAVE A PROTO-INDO-EUROPEAN SOURCE?

My short answer is no.

My long answer:

Persian و <wa> [væ] ~ [o] 'and' looks like a loan from the identically spelled Arabic و wa, but is in fact a convergence of an Arabic loanword with a native word *u < Old Persian utā (cf. Avestan and Vedic Sanskrit uta 'and'; the final lengthening is secondary).

Wiktionary derives utā in turn from Proto-Indo-European (PIE) *éti 'and', the source of Latin et 'and'. There are at least three problems with that etymology:

1. PIE *e should become Old Persian a, not u.

2. PIE *i should become Old Persian i (word-final -iy), not -ā.

3. There is already an Old Persian ati- 'beyond' (cf. Avestan aiti-* and Sanskrit ati- 'id.') which looks like the regular reflex of PIE *éti.

I think it's more likely that *uta was a Proto-Indo-Iranian innovation unless there are *uta-like forms elsewhere in Indo-European.

*2.16.15:21: The first -i- in Avestan aiti is epenthetic and conditioned by an i in the following syllable.

WHY WRITE 'WIND' AS 'PAGE NUMBER'?

The last of the fourteen spellings of Vietnamese gió 'wind' in the traditional Chữ Nôm script at is

𩖅 = số 'number' + 頁 hiệt 'page' (originally 'head')

số is phonetic. In Middle Vietnamese s- was retroflex [ʂ] and gi- was palatal [ɟ]*, but in modern Hanoi, they are much closer: alveolar [s] and [z] respectively. Does the spelling 𩖅 reflect a dialect like Hanoi? How far back does it go, and is it associated with a certain region? In theory the Chữ Nôm script could be a rich source of dialect history since scribes could invent characters for native Vietnamese words incorporating phonetic elements whose readings resembled those words in their dialect (but not necessarily in other dialects or even their own dialect at a different point in time).

The function of the right half of 𩖅 is obscure. The wind has nothing to do with pages or heads. But wait, I see at that 𩖅 could also write sỏ in đầu sỏ 'leader'. I think that word is a compound of the Chinese loan 頭 đầu 'head' and the native word sỏ 'head of a pig'. If so, then 𩖅 for gió 'wind' is a case of a Chữ Nôm character originally devised for one Vietnamese word being recycled to write another:

sỏ 'head of a pig' > written as 𩖅 'số-head' > 𩖅 recycled for gió 'wind'

What I still don't understand is how 頁 'head' came to represent 'page'. In Vietnamese as far as I can tell, hiệt  (ultimately going back to Old Chinese *get) means both 'head' and 'page', but in Chinese, 頁 has a second, unrelated reading for 'page' going back to Old Chinese *sɯ-lap 'leaf' (normally written 葉). In theory the 'page' reading of 頁 should exist in Vietnamese as *diệp (the reading of 葉 'leaf'), but no such reading seems to exist.

*2.15.0:20: De Rhodes (1651) said gi- "should be pronounced in the Italian manner" (translation from Gregerson 1969: 161). I interpret that to mean gi- was a palatal stop [ɟ] rather than an Italian palato-alveolar affricate [dʒ] since the former is more likely in Southeast Asia.

2.15.3:03: Added a high-vowel presyllable *sɯ- to Early Old Chinese *lap 'leaf' to account for the lack of emphasis which is normally conditioned by lower vowels such as low *a in Middle Old Chinese. The phonetic series of 葉 (Karlgren's GSR 339 + 633) points to *sɯ- in most cases.

Stages 1-4 (forms in chronological order):

世 'generation' (< 'leaf' + suffix): *sɯ-lap-s > *slap-s

葉 'leaf': *sɯ-lap > *lap

韘 'archer's thimble': *sɯ-lap > *slap

屧 'bottom inlay in shoe': *sʌ-lep > *sʌ-lˁep > *slˁep

In Stage 2, high-voweled *Cɯ- blocks emphasis in the following syllable, but low-voweled *Cʌ- conditions it.

In Stage 3, some *CV-presyllables are reduced to *C- whereas others are dropped entirely.

In Stage 4, *sl- has fused into *l̥-, whereas *sV-l- still intact at stage 3 became a new *sl-.
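The four stages can be read as an ordered set of rewrite rules, and a minimal Python sketch makes the ordering explicit. It rests on several assumptions that are not in the text: the class `Word`, the functions `show` and `derive`, and especially the per-word `stage3` label ("reduce", "drop", or "keep") are hypothetical devices, since the stages above say only that some presyllables were reduced and others dropped without giving a conditioning factor. The Stage 4 outcome for 'generation' is simply what the Stage 4 rule yields when applied mechanically.

```python
from dataclasses import dataclass

HIGH = "\u026f"            # ɯ, high presyllabic vowel
LOW = "\u028c"             # ʌ, low presyllabic vowel
EMPHASIS = "\u02c1"        # ˁ
VOICELESS_L = "l\u0325"    # l̥ (l + combining ring below)


@dataclass
class Word:
    gloss: str
    pre: str      # presyllable, e.g. "s" + HIGH
    main: str     # main syllable, e.g. "lap"
    stage3: str   # hypothetical lexical label: "reduce", "drop", or "keep"


def show(pre, main):
    """Format a form: full presyllables take a hyphen, bare consonants fuse."""
    if len(pre) > 1:
        return "*" + pre + "-" + main
    return "*" + pre + main


def derive(word):
    """Return the forms of a word at Stages 1 through 4."""
    pre, main = word.pre, word.main
    forms = [show(pre, main)]                        # Stage 1
    # Stage 2: a low presyllabic vowel conditions emphasis; a high one blocks it.
    if pre.endswith(LOW):
        main = main[0] + EMPHASIS + main[1:]
    forms.append(show(pre, main))
    # Stage 3: presyllable reduced to its consonant, dropped, or left intact.
    if word.stage3 == "reduce":
        pre = pre[0]
    elif word.stage3 == "drop":
        pre = ""
    forms.append(show(pre, main))
    # Stage 4: existing *sl- fuses into a voiceless lateral;
    # a still-intact *sV-l- becomes a new *sl-.
    if pre == "s" and main.startswith("l"):
        pre, main = "", VOICELESS_L + main[1:]
    elif len(pre) > 1 and pre.startswith("s") and main.startswith("l"):
        pre = "s"
    forms.append(show(pre, main))
    return forms


WORDS = [
    Word("generation", "s" + HIGH, "lap-s", "reduce"),
    Word("leaf", "s" + HIGH, "lap", "drop"),
    Word("archer's thimble", "s" + HIGH, "lap", "keep"),
    Word("bottom inlay in shoe", "s" + LOW, "lep", "keep"),
]

for w in WORDS:
    print(w.gloss, " > ".join(derive(w)))
```

Because Stage 4 checks for an already fused *sl- before it creates new ones, the old and new clusters never merge, which is the crucial ordering in the account above.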

But note 蝴蝶 *galep 'butterfly' in which *sɯ- or even *s- cannot be reconstructed.

WELCOMING THIS WIND (PART 2)

How did a Chinese character 這 'to welcome' which should have been read as nghiện come to be read as giá (and hence qualify as a spelling of the native Vietnamese word gió 'wind')? I'll embed my answer in a longer discussion of the words written with 這 below.

Wiktionary regards 這 as

part of the 迎 (OC [= Old Chinese] *ŋaŋ, *ŋraŋs, “to face, to meet”) word family

and cites Zhengzhang's OC reconstruction *ŋrans.

I cannot immediately reject all that. Nonetheless, I am skeptical.

First, the earliest attestation of the word I can find is in an entry in the dictionary 玉篇 Yupian compiled in the 6th century AD: i.e., during the Middle Chinese (MC) period. Is there evidence for the word in Old Chinese, or was MC *ŋɨenʰ mechanically projected back into OC as *ŋrans? The word is not in Schuessler's 1987 dictionary of early Zhou Chinese. There is a common, unspoken, and dangerous assumption that almost any native Chinese word can be traced back to early Old Chinese. (I was going to say that obvious loanwords like 佛 Middle Chinese *but 'Buddha' are thankfully exempt and that no one would reconstruct an Old Chinese 'reading' of 佛, but Zhengzhang's site has such a reconstruction: *bɯd!)

Second, the 迎 word family has two types of forms: open syllables and velar-final syllables. (I disregard *-ʔ and *-s* which may be suffixes.) In the past I have proposed that *-a was from an earlier syllabic *-ŋ, the 'zero grade' of *-aŋ. I also proposed that *-a could be the zero grade of *-an. Below I provide examples with Sanskrit parallels (citing Sanskrit zero ~ -m alternations in lieu of Sanskrit zero ~ -ṅ [ŋ̍] alternations which don't exist since Proto-Indo-European had no *ŋ̍).

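Old Chinese: zero grade *wŋ̍ 'to go' ~ full grade *waŋ-ʔ 'to go'

Sanskrit: zero grade ga-tá- < *gʷm̩-tó- 'gone' ~ full grade gám-a-ti < *gʷóm-e-ti 'goes' (Vedic)

Old Chinese: zero grade *ŋn̩-ʔ 'to talk' ~ full grade *ŋan 'speech'

Sanskrit: zero grade ha-tá- < *gʷʰn̩-tó- 'slain' ~ full grade hánti < *gʷʰénti 'slays'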

My proposal explains why these word families don't seem to have forms with a mixture of final consonants: e.g., the 迎 *√ŋ-ŋ word family does not contain words with *-t, *-p, *-m, *-j, *-r, *-w, etc. The few *-k forms could reflect a lost denasalizing suffix.

If 這 belonged to that family, it would be the sole member with *-n.

My proposal has a number of problems: e.g., no support from the rest of Sino-Tibetan and no explanation for when zero grade occurs. (In the Sanskrit past participles above, one can see that unaccented roots take zero grade.)

In any case, the fact remains that -n words are anomalous in a zero ~ velar-final series, and that fact should be explained somehow - even if the zero-grade hypothesis is wrong.

It seems that at some point in the late first millennium AD, 這 came to be used to write an unrelated, nonhomophonous word 'this' (now zhè in Mandarin). The earliest attestation of 這 for 'this' that I can find is in the Jiu Tang shu 'Old Book of Tang' (945). How did that happen?

Here's what I've pieced together from Wiktionary (which should cite its sources) with my caveats.

The word 'this' was once written as 者 and was

[d]erived from 者 (OC *tjaːʔ, “one which”), around the Tang Dynasty.

者 (OC *tjaːʔ, “one which”) > 者 (MC t͡ɕiaX, “this (possessive case)”) > 者 (MC t͡ɕiaX, “this (general demonstrative)”) > Mandarin 這 (zhè).

There are three problems with that etymology:

1. 者 'one which', unlike 'this', does not precede nouns.

but perhaps X 者 Y 'one which X Y' was reinterpreted as 'X this Y', followed by X before 者 becoming unnecessary?

2. 者 'this' has no 'possessive case' - no word in Chinese does.

That is really an analytical and terminological error that doesn't affect the validity of deriving 'this' from 'one which'.

3. 者 'one which'/'this' had a 'rising tone' in MC but 這 'this' has a 'departing tone'.

Was 者 'one which' used to write an unrelated homophone 'this'? Is the 'departing tone' of 這 'this' due to a sandhi tone (a 'departing'-like allophone of the 'rising tone'?) reinterpreted as the default tone since 'this' must always precede something (i.e., is in a sandhi context)?

There was also a word for 'this' with a 'level tone' written phonetically with a character 遮 for 'block off'. Could the 'departing tone' of 這 'this' be the etymological and colloquial tone while 'level' and 'rising tone' readings were artificial spelling pronunciations based on 遮 for 'block off' and 者 'one which'?

Wiktionary then says there was a "confusion in medieval handwriting" between 遮 'this' and 這 'to welcome' which led to 這 becoming the dominant spelling for 'this'.

Although Sino-Vietnamese readings almost entirely reflect Chinese as it was spoken during the end of the third Chinese domination of Vietnam (602-938; i.e., right before the early attestation of 這 for 'this' in Jiu Tang shu), I briefly thought giá 'this' might be from the fourth Chinese domination (1407-1427). By that time 這 was firmly in place for 'this' in Chinese, and I could see the word entering Vietnamese via the Ming occupation. The trouble is that 這 would have been something like [tʂjɛ] in Ming Chinese* which would have been borrowed into Vietnamese as *tré with a retroflex initial and a mid vowel, not giá with a palatal initial and low vowel. So I think giá is from the last days of the third Chinese domination before *a rose to a mid vowel in Chinese.

*2.14.6:28: I don't have any Ming materials on hand, so I am projecting the Yuan dynasty Phags-pa reading ꡆꡦ <jee> (interpreted by Coblin 2007: 171 as [tʂjɛ]; needless to say, the script should be rotated 90 degrees clockwise) of 這 forward into the Ming dynasty.

The Ming reading of 者 (homophonous except for its rhyme) was used to transcribe Jurchen je /tšə/: e.g., 兀者 *[utʂjɛ] for uje 'heavy' (#67 in the Bureau of Interpreters' Sino-Jurchen vocabulary, Kane 1989: 49).

According to Coblin (2003: 349), Robert Morrison romanized the early 19th century Mandarin rhyme of 這 and 者 as -ay as in May, possibly [e] in his dialect of English. (The Mandarin rhyme is a back [ɤ] in the modern standard.)

Of course all those different varieties of northern Chinese were probably not in a linear relationship across half a millennium, but they all have a nonlow vowel in common unlike giá whose low vowel is characteristic of pre-second millennium pronunciation.

Moreover I don't know if the Vietnamese would have perceived the 'departing tone' of the Ming occupiers as a sắc tone which was the standard equivalent of that tone in late first millennium borrowings. Perhaps my hypothetical *tré would have had a different tone.

WELCOMING THIS WIND (PART 1)

listed fourteen spellings of Vietnamese gió 'wind' in the traditional Chữ Nôm script. In the previous post, I already listed twelve spellings containing the phonetic 俞 du 'to consent' (or phonetics containing that phonetic: 逾 du 'to pass' and 愈 dũ 'more'). The remaining two spellings lack 俞 and could be said to belong to a Group D of miscellaneous spellings (or groups D and E with one character each):

13. 這 (see the entry for gió in Anthony Trần Văn Kiệm's Giúp đọc Nôm và Hán Việt 'Aid for Reading Nôm and Sino-Vietnamese')

14. 𩖅

I will discuss 14 later.

13 這 represented Middle Chinese *ŋɨenʰ 'to welcome'; its phonetic is 言 ngôn 'speech' atop 辶, the semantic element for motion. Sino-Vietnamese is almost entirely based on southern Late Middle Chinese, so the Sino-Vietnamese reading of 這 should have been *nghiện which is the SV reading of 這's Middle Chinese homophones such as 唁 'to offer condolences' and 彥 'handsome man'.

But the actual Sino-Vietnamese reading of 這 is giá which is obviously not far from gió 'wind'. It's understandable why gió 'wind' would have been written as 這 giá: the initial consonant gi- and the sắc tone (represented by an acute accent) match even though the vowels don't. (At least lower mid o [ɔ] is just one step up above low a.) It's less understandable how a character that looks like it should have been read something like 言 ngôn came to be read as an open syllable giá without an initial nasal. However, I think I figured out what happened, and I'll post my solution in part 2. The title of this two-part microseries hints at the answer.

2.13.0:11: Giles' Chinese-English Dictionary (1892 I: 48) lists the Sino-Vietnamese readings of 這 as the expected nghiện (converted from its notation) as well as gia (no tone indicated).

CHÂN GIÒ NƯỚNG IN CHỮ NÔM

Last night I went to a Vietnamese restaurant in search of chân giò nướng 'grilled trotters'. They weren't on the menu, but I did try to look up how that dish would have been written in the traditional Chữ Nôm script. My guess is something like


The first and third characters are straightforward made-in-Vietnam semantophonetic compounds:

chân 'foot, leg' = 足 'foot' + 眞 chân 'true'

𤓢 nướng 'to grill' = 火 'fire' + 曩 nãng 'formerly'

Although 娘 nương and 孃 nương are better phonetic matches for nướng 'to grill', both already have a left-hand element 女, and it would be awkward to place another left-hand element 火 'fire' next to it. (I suppose placing 娘 or 孃 atop the bottom version 灬 of 'fire' would have been possible, but I haven't seen any made-in-Vietnam characters with 灬.) Stripping them of 女 and replacing that element with 火 'fire' would result in

烺 which already exists and is read lãng 'bright' with l- (not n-)

爙 which already exists and is read nhưỡng 'fiery appearance; Mars' (rare) with nh- [ɲ] (not n-)

whereas 曩 nãng does have n-.

One might conclude that matching initials were a high priority when selecting phonetic components of Chữ Nôm characters. But the second character of the dish I wanted does not have a phonetic with a matching initial:

𨃝 giò 'leg of an animal' = 足 'foot' + 徒 đồ 'disciple'

The trouble is that I don't think there is any Chinese character whose Sino-Vietnamese reading combines gi- with a rounded vowel. Although the only part of 徒 đồ that precisely matches giò 'leg of an animal' is the tone, everything else is close enough:

đ- [ɗ] is not gi- [z] (northern) ~ [j] (southern) < *ʑ, but at least it's neither labial nor velar; it's in the middle zone with gi-

ô [o] is back mid rounded like o [ɔ]

I just realized giò 'leg of an animal' could in theory have been written with 由 do (d- [z] (northern) ~ [j] (southern) < *j) as a phonetic. Cf. how gió 'wind' with a different tone was written with a d-phonetic:

Group A with phonetic 俞 du 'to consent'

1. with a Vietnamese abbreviation of 風 'wind' on top: ⿱風俞 (U+2CC82)

2. with a Vietnamese abbreviation of 風 'wind' on the right: 𫖾

3. with 雨 'rain' (symbolizing weather phenomena) on top: ⿱雨俞 (U+2CC05)

Group B with phonetic 逾 du 'to pass'

4. as 逾 without modification

5. with 風 'wind' on top: 𩙋

6. with a Vietnamese abbreviation of 風 'wind' (not in Unicode) on top: 𩙌

7. with 雨 'rain' (symbolizing weather phenomena) on top: 𫕲

Group C with phonetic 愈 dũ 'more'

8. with 風 'wind' on the left: 𩙍

9. with a Vietnamese abbreviation of 風 'wind' on the left: 𫗃

10. with 月 'moon/meat' as a substitute for the Vietnamese abbreviation of 風 'wind' on the left: ⿰月愈 (not in Unicode)

11. with a Vietnamese abbreviation of 風 'wind' (not in Unicode) on the right: ⿰(𠘨+二)愈

12. with a Vietnamese abbreviation of 風 'wind' (not in Unicode) on top: 𫗄

2.12.2:04: Added all d-phonetic characters for gió 'wind' (I couldn't stop at one).

THE ZANABAZAR SQUARE SCRIPT (PART 3)

I left the retroflex sibilant out of my discussion of Zanabazar Square script retroflex characters in part 2. Unlike the other retroflexes, <ṣa> is a mirror image of a nonretroflex character <śa> as in Tibetan and not as in other Brahmic scripts such as Brahmi or Devanagari where <ṣa> and <śa> are completely different.


𑨯 𑀱
𑨋 𑀓
𑨲 𑀓𑁆𑀱

The table above includes <kṣa> which has a special character in the Zanabazar Square script which is clearly derived from <ka> though the altered lower left corner bears little resemblance to <ṣa>.

Tibetan has a transparent stack of <ka> over <ṣa>.

In Brahmi, <ka> and <ṣa> are fused into a transparent ligature.

Only now after twenty-six years do I finally see the logic in Devanagari <kṣa> which I learned as a special character. The top left loop is what's left of <ka> and the bottom of the left side is what's left of <ṣa>.

2.11.15:03: I wonder if Zanabazar's <kṣa> was influenced by Devanagari <kṣa>. Both have similar bottom left-hand corners.

THE ZANABAZAR SQUARE SCRIPT (PART 2)

Thanks to Andrew West for providing me with a WOFF version of his font for the Zanabazar Square script. I have tagged part 1 of this series to employ that font.

The Tibetan roots of the script are implied in its characters for retroflex consonants which are derived from the characters for dental consonants:

Zanabazar Transliteration
𑨙 <ta>
𑨕 <ṭha>
𑨚 <tha>
𑨖 <ḍa>
𑨛 <da>
𑨘 <ṇa>
𑨝 <na>

'Implied' because the retroflexes are derived from the dentals in several different ways rather than simple mirror-image versions of the dentals as in Tibetan:



The Zanabazar retroflexes nonetheless are not as distinct from the dentals as they would have been if they had directly descended from Brahmic retroflex characters:

𑀢 <ta>
𑀞 <ṭha>
𑀣 <tha>
𑀟 <ḍa>
𑀤 <da>
𑀡 <ṇa>
𑀦 <na>

(If the Semitic hypothesis of the origin of Brahmi is correct, <tha da na> are original* and <ṭha ḍa ṇa> may be derived from them, but the relationships between them were no longer obvious in the Brahmic scripts of Zanabazar's time after centuries of graphic evolution: e.g., Devanagari <ṭha> and <tha>.)

The Tibetan script did not incorporate any descendants of the Brahmi retroflex characters. Here is the earliest extant account of the creation of the Tibetan script (translated by Sam van Schaik):

In India the script has 50 letters. Tönmi discarded the gha [voiced aspirate] group and the ṭa [retroflex] group, which do not appear in Tibetan speech.

The consequences of discarding the gha group are visible in part 1 where I explained how the Tibetan script and its Zanabazar derivative represented voiced aspirates without relying on descendants of the Brahmi voiced aspirate series.

Later the Tibetan script was extended for transcribing Sanskrit, and retroflex letters were created by mirror-imaging dental letters instead of Tibetanizing northern Indian retroflex letters that descended from Brahmi retroflex letters.

Was Zanabazar unaware of a Brahmic script with a retroflex series completely different from its dental series? If he was aware of one, he might have decided to somewhat follow the Tibetan precedent anyway because graphically related characters are easier to learn than graphically unrelated ones. (The same logic may underlie the voiced aspirates of the Zanabazar square script.)

*2.10.0:15: Salomon (1998: 25) compares Brahmi <tha da na> to Phoenician and Aramaic <tˁ d n>. I don't see much similarity between Brahmi <da na> and Semitic <d n>, but Brahmi 𑀣 <tha> certainly does look like Phoenician 𐤈 <tˁ> and its Greek derivative Θ theta. However, coincidental overlaps between simple shapes are expected.

If Brahmi was based on a Semitic script, its <ṭa> seems to have been created ex nihilo unless Bühler's (1895) view that <ṭa> was a reduction of <ṭha> is correct.

A GOOD AMOUNT OF VARIATION: THE ORIGIN AND ORTHOGRAPHY OF VIETNAMESE TỐT (PART 1)

Thompson (1976: 116) reconstructed Proto-Viet-Muong *tʰoc 'good (beautiful)' on the basis of

(I have rewritten Thompson's segments in IPA but have retained his tonal notation.)

The trouble with reconstructing *tʰ is that there is a different set of sound correspondences also pointing to *tʰ found in 'medicine':

Thompson solved this problem by reconstructing two kinds of proto-*tʰ: one that deaspirated and one that didn't. But why would one deaspirate?

Premodern Chinese loans into Vietnamese point to a simpler solution. They have the following correspondences:

A chain shift occurred in Vietnamese after borrowing from Chinese:

*s > *t >

That shift postdates the split of Vietnamese from the other Viet-Muong languages.

Proto-Viet-Muong *s became aspirated in the ancestor of Mường Khến rather than unaspirated t as in Vietnamese. Some other Muong varieties retain *s.

Thus I reconstruct Proto-Viet-Muong *soc 'good'.

Can that word be projected back into Proto-Vietic, or is it a Proto-Viet-Muong innovation? In other words, does it exist in any non-Viet-Muong Vietic languages, and if it does, is it native to those languages (rather than a borrowing from Viet-Muong)?

The only match I could find at SEAlang is Ruc tʰóːt 'good'. This looks like a loan postdating the fortition of Proto-Viet-Muong *s and the shift of Proto-Viet-Muong *oc to ôt in Vietnamese since

The aspirated initial rules out Vietnamese as a source. Is there a nearby Muong language with a word like tʰóːt for 'good'? Could the Ruc word be a composite of a Muong word with tʰ- and a Vietnamese word with [ot]? But there do not seem to be any Muong in Quảng Bình Province where the Ruc live. Puzzling.

I suppose Ruc haːj is the native word for 'good'.

Next: The many spellings of Vietnamese tốt in Chữ Nôm.

THE ONSET OF PROTO-TAI 'NEAR'

I know from experience that interrupting a series of posts leads me to drop the series midway and forget about it. (I do remember the Golden Guide series I never finished, though. That is too big to forget.) But on the other hand I also don't want to forget topics that come up in the middle of a series. This is one such topic.

In Siamese,

(I use Tai tone terminology [A1, C1] in lieu of IPA tone letters to facilitate comparison between Tai languages.)

are a minimal pair distinguished only by tone in pronunciation. Their different vowel symbols imply an earlier segmental distinction that was lost - and that can be confirmed by other Tai languages which preserve different rhymes: e.g., Yay caj A1 'far' and caɰ C1 'near'. (All non-Lao Tai data in this post is from Pittayaporn 2009.)

The Lao cognates of 'far' and 'near' are like those of Siamese apart from lacking a medial [l]:

So far, Siamese, Lao, and Yay seem to indicate that 'far' and 'near' should be reconstructed with the same initial in their common ancestor Proto-Tai. However, other Tai languages have different initials in the two words: e.g.,

Bao Yen: kwɤj A1 'far', sɤɰ C1 'near'

Lungchow: kwaj A1 'far', kʰjaɰ C1 'near'

Therefore the two words must have had different initials in the proto-language. Pittayaporn (2009: 345) reconstructed them as sesquisyllables ('one-and-a-half syllables') *k.laj A and *k.raɰ C with a presyllable (his 'degenerate syllable') k.-. The presyllable-onset sequences *k.l- and *k.r- were distinct from the true clusters *kl- and *kr- which had different reflexes:

Pittayaporn's Tai subgroup
Bao Yen
*k.l- (only in 'far')
*kl-: e.g., 'rice seedling'
*k.r-: e.g., 'illness, fever'
kʰ- c-
*kr-: e.g., 'six'

Notice that the initials of the non-Yay reflexes of *k.raɰ C 'near' do not match those of the similar-sounding word *k.raj A 'illness, fever' in the table above:

'Near' is the only instance of Siamese kl- from *k.r-. Could the common ancestor of Siamese and Lao have irregularly altered 'near' to match the *k.l- initial of 'far'? However, no such analogy would motivate the initials of 'near' in Bao Yen and Lungchow.

Bao Yen has both kʰ- and s- as reflexes of *k.r-. Might they be reflexes of different presyllables?

Bao Yen, as its Vietnamese name implies, is spoken in northwestern Vietnam. It may be no coincidence that the kʰ- and s- reflexes of *k.r- are like the Mường Khến and northern Vietnamese reflexes of Proto-Viet-Muong *kr-:  x- (< *kʰ-) and [s].

As for Lungchow, it has three reflexes of *k.r-:

Perhaps *k.r- generally simplified to *kr- in pre-Lungchow but not in 'hard' and 'near' where it developed into kʰ(j)-. The medial -j- of kʰjaɰ C1 'near' is reminiscent of the -j- that is a reflex of *-l- in *kl-. kʰjaɰ C1 'near' looks like a compromise between *k.r- and *kl-variants of 'near' in pre-Lungchow. Could such variation go back to Proto-Tai, with some languages like Thai and Lao reflecting a version of the word with *-l- instead of *-r-?

But I think the Lungchow words for 'near' and 'hard' may actually reflect different presyllabic vowels:

A palatal presyllabic vowel conditioned -j- in 'near', whereas 'hard' had no such vowel and therefore never developed -j-.

That hypothesis might explain other cases of unexpected -j- in Lungchow. But do such cases exist? And might the s- of Bao Yen sɤɰ C1 come from the *kIr- I proposed above (as opposed to Bao Yen kʰ- < *kVr- in which V is not palatal)?

THE ZANABAZAR SQUARE SCRIPT (PART 1)

Yesterday I learned of the Zanabazar Square script from Andrew West and downloaded his font for it. In short it is like an extended version of the Tibetan script with additional characters for Sanskrit and Mongolian. If you do not have a Zanabazar Square font, you can see the characters here.

The first thing that caught my eye was that it has characters for voiced aspirated initials <gha ḍha dha bha dzha> that are not simply ligatures with <ha> like Tibetan གྷ ཌྷ དྷ བྷ ཛྷ <g.ha ḍ.ha d.ha b.ha dz.ha>. They are also not derived from the Brahmi characters for 𑀖𑀠𑀥𑀪𑀛 <gha ḍha dha bha jha>:

There is no consistent graphic method of derivation. Moreover, the base characters are a mix of voiceless unaspirated initial characters (<ka ta>) and voiced initial characters (<ḍa ba dza>). Might that hint at how Mongolians perceived Tibetan pronunciations of Sanskrit voiced aspirates? Nonetheless, those derived characters are still easier to learn than hypothetical Brahmi-based characters for <gha ḍha dha bha dzha> whose shapes bore no relation to those of characters for phonetically similar consonants: e.g., 𑀖 <gha> looks nothing like 𑀓 𑀔 𑀕 <ka kha ga>, etc. You can see many more examples in Brahmi-descended scripts here.

RETURN TO THE SILVER RIVER (PART 4)

Given that the Tangut character for the Chinese loanword for 'river'


1990 1chhwan3 'river' < Tangut period northwestern Chinese 川 1chhwan3 'river'



Unicode Tangut component 036 / Nishida radical 181 / Boxenhorn code cir

the left side of


3058 2zyr'4 'water'

I would expect the Tangut character for the native word for 'river' to contain that component. But it doesn't. 1530 1ma4 'river' is analyzed in Tangraphic Sea as


- top right (not left side!) of 3058 2zyr'4 'water'

- right side of 0632 1vi1 'ripe, cooked'

0632 has no phonetic or semantic similarity to 1530.

Of course, there is water in a river, so 3058 is not surprising, though its abbreviation as


Unicode Tangut component 185 / Nishida radical 026 / Boxenhorn code fam

is. Nishida (1966: 242) regarded that component as 'stone', as it appears in


1074 1luq1 'stone'

But this is a case where labels for components are misleading; it makes no sense to call the top of 1530 'stone'.

If the Tangut script reflects the phonetic structure of a second Tangut language - 'Tangut B' - the characters imply that the Tangut B readings of 3058, 1530, and 1074 all have a common element X, and that 'river' and 'ripe' are near-homophones:

3058 'water': X + ? + ? (the left-hand element might be semantic and have no reading)

1530 'river': X + the sounds of 0632 'ripe'

1074 'stone': X + ? + ?

Is there a language in the region in which 'river' sounds like 'ripe' preceded by a segment or syllable that would be the phonetic value of X?

2.6.1:09: I would also expect that language to have the same initial consonant (or syllable) in 'river' and 'stone'. I'm guessing no such language exists today. But did one exist in the past?

RETURN TO THE SILVER RIVER (PART 3)

3572 2ngwo1 'silver' was analyzed in Precious Rhymes of the Tangraphic Sea as


- the bottom left of 0136 2de'4 'ingot' (< Middle Chinese 鋌 *deŋˀ 'id.') +

- the right of 5722 2ngwo1 (first half of 3360 5722 0nwy0 2ngwo1 'eloquence')

Clearly 0136 is semantic and 5722 is phonetic. Case closed? Not quite.

First, why pick the bottom left of 0136 instead of the 'metal' radical


Unicode Tangut component 542 / Nishida radical 028 / Boxenhorn code tex

which is also absent from the character for another major metal in the analysis of 0136 2de'4 'ingot':


- top of 0152 1kiq2 'gold' +

- left of 3572 2ngwo1 'silver' +

- right of 2290 2lon1 'round' (but an ingot isn't round! is 0136 really 'ingot'?)

Why write the words for some metal objects with 'metal' but not others? That is quite different from the situation in Chinese where nearly all metals are written with 金 'metal' - one exception that comes to mind is 汞 gong 'mercury', a combination of 工 gong (phonetic) and 水 'water' (semantic).



Unicode Tangut component 529 / Nishida radical - / Boxenhorn code tau

is also phonetic in


5723 2ngwo1 'elephant' (only found in dictionaries) =

- bottom center of 0021 1bu2 'elephant, ox'? (the semantics of this word need closer examination)

- right of 3572 2ngwo1 'silver'

but the seven other characters containing that component are not read 2ngwo1. I would like to look into the other functions of that component after I wrap up this tetralogy on the Silver River.

RETURN TO THE SILVER RIVER (PART 2)

1990 1chhwan3 'river', the Tangut character I used to transcribe Chinese chuan 'river' in the name of the Tangut font I use, was analyzed in Tangraphic Sea as a combination of


1. the left side of 3058 2zyr'4 'water'

2. the center of 2474 2rar1 'to flow'

3. the right of 2107 1tsir1 'earth'

The left side of 'water' is no surprise.


Unicode Tangut component 036 / Nishida radical 181 / Boxenhorn code cir

is the left-hand form of the Tangut radical for 'water' - one of the few elements in the script that has an indisputable single meaning - and its presence in 'river' is similar to the presence of the Chinese radical 氵 'water' in 江 jiang 'river' (as in 江泽民 Jiang Zemin), though not in 川 chuan, which is a drawing of a river. Grinstead (1972) regarded the Tangut radical as a derivative of the Chinese radical.

The right side of 江 jiang 'river' is phonetic - in Old Chinese, 江 was *kroŋ and its phonetic 工 was *koŋ - but the remaining two components of Tangut 1chhwan3 are not phonetic: they sound nothing like 1chhwan3. And unlike the water radical,


Unicode Tangut components 101, 053 / Nishida radicals 104, - / Boxenhorn codes dai, cok

have no obvious fixed meanings.

Nishida (1966) attempted to assign meanings to as many components as possible but could not find one for his radical 104. It seems to be phonetic in twelve characters pronounced rar, but of course 1chhwan3 'river' is phonetically completely different, and sixty other characters containing it are also not pronounced rar. It could not mean 'flow' in, say:


0205 1jan3

which is a meaningless transcription character.

And no one seems to have found any function for the small, closed set of exclusively right-hand components such as Unicode 053 - could they be phonetic symbols for Tangut B final consonants akin to 音 in Old Korean pam 'night', written 夜音 <NIGHT.m> or 乙 in the made-in-Korea character 乭 tol 'stone', a ligature of the Chinese characters 石 'stone' and 乙 ŭl?

If the intent was to represent 'river' as 'water flowing through land', why not pick the 'earth' radical


Unicode Tangut component 263 / Nishida radical 210 / Boxenhorn code ges

instead of a right-hand component also found in non-'earth' characters such as


1906 1non'2 'and, also, again' and 1918 1mi4 'not'

which didn't even sound like 2107 1tsir1 'earth' (or each other)?

2.4.0:36: Ironically, the 'earth' radical isn't in


2107 1tsir1 'earth'

whose similar-looking left-hand component (Unicode Tangut component 267 / Nishida radical 211 / Boxenhorn code gii) looks like


3087 1dzew4 'waist'!

Andrew West looked at every single character containing an element resembling 3087:

This component is Nishida Tatsuo's Radical No. 211, which he calls the "sun radical" 日部 (see Seikago no kenkyū 西夏語の研究 [A Study of the Hsi-Hsia Language] page 244). However, very few characters with this component are in any way related to the sun, and so Nishida's radical name is a misnomer (by far the largest semantic group of characters with this component is the "Bird-related" group, but Nishida already has a "bird" radical). As we shall see below, unlike most Chinese radicals, Tangut radicals do not have a single fixed meaning, and so giving names to them (as Nishida and others have done) is at best not very useful, and at worst misleading.

I am one of those 'others', and I confess I give names partly out of convenience - it's easier for me to remember names than numbers. However, those names are only truly justified whenever there is a nearly one-to-one correspondence between a component and a function: e.g., 'water' usually is in water-related characters, though there are still puzzling exceptions I cannot explain like


2019 1tha4 'third person pronoun' and 2590 2vy3 'outward motion; perfective prefix'

which have no obvious aquatic connection (unless 2vy3 once meant 'out of a river'?). More on those characters in my entry on line 25 of the Golden Guide, a series I have yet to finish.

RETURN TO THE SILVER RIVER (PART 1)

I haven't posted anything in over a year. In fact this is the first time I'm using KompoZer since the end of last February when I started a post I never finished. I have lots of those. It might be interesting to complete them knowing what I know now. But in the meantime, I thought I'd start a new wave of posts with the name of Prof. 景永时 Jing Yongshi's font that freed me from the need to make a GIF every time I wanted to display a Tangut character: Tangut Yinchuan. Which in Tangut might be


3752 3296 1478 1990 2my4 2na'3 1gin4 1chhwan3

if I phonetically transcribe how Yinchuan 'Silver River' was pronounced in the Chinese dialect known to the Tangut a millennium ago. (1990 isn't just a transcription; it is a borrowing of that dialect's word for 'river'.) Otherwise I could render the name as


3752 3296 3572 1530 2my4 2na'3 2ngwo1 1ma4

with the native words for 'silver' and 'river'.

I've written about the Tangut autonym 3752 3296 before, so I'm going to look at the four characters I've added to it above. In order to not be too ambitious, I'll focus on just one character per post.

The first is the transcription character 1478 analyzed in Tangraphic Sea as


1478 1gin4 = left of 0830 1kin4 + right of 0405 1dzwyq4 'wall'

1478 1gin4 and 0830 1kin4 are nearly homophonous, so obviously


Unicode Tangut component 224 / Nishida radical 123 'together' / Boxenhorn code fol

is supposed to tell us that 1478 sounds like 0830, though how a Tangut reader would know that 0830 was the source instead of any of the 96 other characters with 224/fol is a problem:


It is also not obvious why 1478 is said to have its right side taken from 0405 1dzwyq4 'wall', which sounds nothing like 1gin4. 1gin4 does not mean 'wall'. The Tangraphic Sea defines it as a tribal and place name, giving


1478 0707 1gin4 1chew3 (a transcription of Tangut period Chinese 銀州 *1gin4 1chiw 'Silver Prefecture')

as an example. Was the Gin tribe or the Silver Prefecture associated with walls?

Some Tangut transcription characters are combinations of components of two characters: one character for the initial consonant and another for the rhyme. The last of those characters in Li Fanwen's 2008 dictionary is 6072:


6072 2pu3 = 5970 1pi2 + 3057 2zhu3

In theory, 1gin4 could have been written as a combination of a component from a g-character and a component from a 1-in4 character (the initial 1- indicates the tone which belongs to the rhyme, though I write it first following Arakawa Shintarō's convention). And in fact, such a character exists:


5622 1gin4 = 1638 1gi4 + 0494 1in4

5622 even appeared in another transcription of Tangut period Chinese 銀州 1gin3 1chiw 'Silver Prefecture':


5622 0707 1gin4 1chew3

So why were two characters


1478 and 5622

created to write the syllable 1gin4? 5622 doesn't have an entry in Tangraphic Sea, but Homophones lists 5622 and 1478 as ... nonhomophones.

Ah, I see what happened now. My readings are based on those of Gong Hwang-cherng who thought 5622 and 1478 were homophones. But Homophones is right - the Tangraphic Sea fanqie for 1478 indicates that 1478 is Grade III, not Grade IV:


1478 1gin4 = 3590 1gi'4 + 1661 1lin3

Hence from now on I will read 1478 as 1gin3 with a final -3 for Grade III.
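The initial-plus-rhyme splicing behind the character analyses and the fanqie above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name splice and the regular expression are hypothetical, and the parser naively treats everything before the first vowel letter as the initial, which is good enough for the readings quoted in this post but not a general model of Tangut syllable structure.

```python
import re

# Hypothetical parser for readings such as 1gin4 or 2zhu3:
# tone digit + initial consonant(s) + rhyme + grade digit.
# Simplifying assumption: everything before the first vowel letter is the initial.
READING = re.compile(
    r"^(?P<tone>[12])(?P<initial>[^aeiouy]*)(?P<rhyme>.+?)(?P<grade>[1-4])$"
)


def splice(initial_source: str, rhyme_source: str) -> str:
    """Take the initial from the first reading and the tone, rhyme,
    and grade from the second reading."""
    a = READING.match(initial_source)
    b = READING.match(rhyme_source)
    if a is None or b is None:
        raise ValueError("unparseable reading")
    return f"{b['tone']}{a['initial']}{b['rhyme']}{b['grade']}"


print(splice("1pi2", "2zhu3"))   # analysis of 6072
print(splice("1gi4", "1in4"))    # analysis of 5622
print(splice("1gi'4", "1lin3"))  # fanqie for 1478
```

The third call is the interesting one: the grade comes from the second speller 1lin3, so the spliced reading ends in -3 rather than -4.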

I think I understand what happened. Normally Tangut only permits three grades in a syllable with g-: I, II, and IV. But the Chinese word for 'silver' was 1gin3 with Grade III. So the Tangut had two options: they could either write the Chinese word as 1478 1gin3 with an un-Tangut combination of g- and Grade III, or they could write it as 5622 1gin4 with a slightly Tangutized pronunciation. I wouldn't be surprised if most Tangut called the Silver Prefecture 1gin4 1chew3 and if the literate among them often 'misread' 1478 as 1gin4 with Grade IV to avoid the un-Tangut combination of g- and Grade III.

What were the Grades, exactly? I still don't know, but for now I believe III was nonpalatal and IV was palatal. Tangut was like Russian which normally favors 'Grade IV' [i] over 'Grade III' [ɨ] after velars: e.g., the plural of pirog is pirogi [pʲirɐˈɡʲi] rather than *[pʲirɐˈɡɨ] with the regular plural ending [ɨ]. "Normally" but not always, because the late, great Prof. Kychanov's name contained a velar [k] followed by 'Grade III' [ɨ]. And because the Tangut could pronounce velars with Grade III if they really wanted to closely imitate Chinese which had no restrictions on velars and Grade III.

BEGINNINGS OF THE END



4859 2to1 'to end'

is a relatively simple character that was supposedly derived from more complex characters according to Combined Homophones and Tangraphic Sea 6.231:


0117 2705 5712 0737

2thew1 2ber'4 1jwa3 1chhen3

'(first half of 'finally'?) right to.end bottom'

But surely 0117 and 5712 were derived from 4859 rather than the other way around.

0117 is a particularly odd 'source' as


0117.0048 2thew1 2thwu4 'finally' (?)

is a disyllabic word apparently only in dictionaries. Does it belong to the 'ritual language' (which I think was a substratal language)? It looks like a reduplicative form.

1.19: Li Fanwen (2008: 20) even phonetically glossed 0117.0048 in Chinese as 都都 dudu as if it were a perfect reduplication, though it wasn't; there is no doubt that its two syllables belong to different rhyme categories (2.38 and 2.3). If the word is of native origin or was borrowed very early, it could be mechanically derived from *tʰopH.Pɯ.tʰoH:

*-op > -ew1 (but -ew1 also has other sources; see below)

*-H > tone 2-

*Pɯ- > -w-...4

*-o > -u
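Those four correspondences are mechanical enough to run as code. The Python below is a minimal sketch and nothing more: the function name derive and the regular expression are hypothetical scaffolding, th for *tʰ follows the transcription used in this post, and the sketch deliberately refuses anything other than the two pre-Tangut syllables under discussion.

```python
import re

# Hypothetical sketch: it knows only the four correspondences listed above
# and the single onset *tʰ- (written th- in this transcription).
PROTO_SYLLABLE = re.compile(
    r"^(?:(?P<pre>Pɯ)\.)?"  # optional presyllable *Pɯ-
    r"tʰ"                   # onset *tʰ-
    r"(?P<rhyme>op|o)"      # rhyme *-op or *-o
    r"H$"                   # final *-H
)


def derive(proto: str) -> str:
    """Mechanically turn one pre-Tangut syllable into a Tangut reading."""
    m = PROTO_SYLLABLE.match(proto)
    if m is None:
        raise ValueError(f"not covered by this sketch: {proto}")
    has_pre = m["pre"] is not None
    if m["rhyme"] == "op" and not has_pre:
        medial, rhyme, grade = "", "ew", "1"  # *-op > -ew1
    elif m["rhyme"] == "o" and has_pre:
        medial, rhyme, grade = "w", "u", "4"  # *Pɯ- > -w-...4 and *-o > -u
    else:
        raise ValueError(f"not covered by this sketch: {proto}")
    return f"2th{medial}{rhyme}{grade}"       # *-H > tone 2


print(derive("tʰopH"), derive("Pɯ.tʰoH"))
```

Fed *tʰopH and *Pɯ.tʰoH, the sketch returns 2thew1 and 2thwu4, which is all that 'mechanically derived' means here; it says nothing about whether the reconstruction is right.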

Could 2thew1 2thwu4 be a borrowing of something like *tʰop(p)ɯtʰoH? Could a single medial *-p- be the source of both final -w and medial -w-? Could tone 2 have spread from the second syllable? Or was the original medial consonant an aspirated *-pʰ- that was the source of (1) final -w and tone 2 of the first syllable and (2) medial -w- of the second syllable?

One problem with the above scenario is that both halves of 2thew1 2thwu4 are attested apart from each other in the definition for 5712 (Mixed Categories of the Tangraphic Sea 7.133):


5712 3583 4859 5712 0117 5285 ... 0048 5285

1jwa3 1ta4 2to1 1jwa3 2thew1 1ly3 ... 2thwu4 1ly3

'5712 is [as in] 4859 5712 0117 ... is 0048.'

Li Fanwen (2008: 20) translated that definition as 畢者終、竟、畢也...終也 'finish is end, finally, finish, ... is end', interpreting


4859 5712 0117

2to1 1jwa3 2thew1

as separate glosses. However, if that were the author's intent, he could have broken up the three syllables with the phrase-final particle 5285:


5712 3583 4859 5285 5712 5285 0117 5285

1jwa3 1ta4 2to1 1ly3 1jwa3 1ly3 2thew1 1ly3

'5712 is 4859, is 5712, is 0117.' = 畢者終也、竟也、畢也

Although 4859 5712 0117 could be a string of three words (there is no Tangut word for 'and'), I tentatively assume that


4859 5712 0117

2to1 1jwa3 2thew1

is a trisyllabic word ending in a bound morpheme 0117. In any case, 0117 and 0048 are in two separate parts of the definition of 5712 and therefore are probably not borrowed from a single disyllabic word, though it is hypothetically possible for an original disyllabic word to be later reanalyzed as a sequence of two morphemes: cf. Late Old Chinese 獅子*ʂitsəʔ 'lion' (from a form like Tocharian B ṣecake) later reanalyzed as 'lion' + noun suffix.

I thought 0117 2thew1 might be a reduplication of 4859 2to1 in an X Y X' pattern, but they can't be terribly close in pre-Tangut,


4859 5712 0117

2to1 1jwa3 2thew1 < *tokH PɯNCaC KtopH?

and it would be weird for X' in such a pattern to then combine with an X'' in another word - namely,


0117.0048 2thew1 2thwu4 < *KtopH Pɯ.KtoH?

Could 4859, 0117, and 0048 share a root *to? Here is a list of possible reconstructions for each morpheme:

4859 2to1 < *taŋH, *tokH, *tojH?

*taŋH resembles Old Chinese 終 *tuŋ 'end', but the vowels don't match.

0117 2thew1 < *tʰopH, *Cʌ.tʰukH, *Cʌ.tʰikH

or *KtopH, *Kʌ.tukH, *Kʌ.tikH?

The *top-like reconstructions resemble Proto-Kuki-Chin *toop 'end', but it's unlikely a o

I reconstruct lower-vowel presyllables *Cʌ- and *Kʌ- to condition Grade I in the higher-vowel rhymes *-ukH and *-ikH; without such presyllables, those rhymes would have retained vowel height and developed into Grade IV -iw rather than Grade I -ew.

If aspiration is not original, then it is from *K-.

0048 2thwu4 < *Pɯ.tʰoH, *PtʰəH, *Pɯ.KtoH, *PKtəH, *Kɯ.PtoH, *KPtəH?

I assume medial -w- is always secondary from *P-, but I could be wrong.

I reconstruct higher-vowel presyllables to condition Grade IV in the lower-vowel rhyme *-oH.

If aspiration is secondary, then it is from *K-, and the order of this *K- relative to *P- is uncertain.

Out of all the above possibilities, I could pick a set sharing *to as a common denominator and then regard the other elements as affixes:

4859 2to1 < *to-k-H, *to-j-H?

0117 2thew1 < *K-to-p-H?

0048 2thwu4 < *Pɯ-K-to-H, *Kɯ-P-to-H?

But what would those affixes mean? And are there any other alternations of the type -u ~ -ew justifying the reconstruction of an earlier alternation *-Ø ~ *-p?

Putting diachrony aside, the synchronic meanings of 0117 and 0048 are uncertain:

Li Fanwen number | Clauson 2016 | Nishida 1966 | Grinstead 1972 | Kychanov and Arakawa 2006 | Li Fanwen 2008
0117 | | 遇い終わる 'to finish meeting' | | заканчивать, завершать; finish, end; 約束, 完結, 終 | completely, finally; 完 (adv.)
0048 | | 會見を終わる 'to end a meeting'? | | 約束, 終 | at last, in the end; 終 (adv.)
0117.0048 | (no polysyllabic words) | (no polysyllabic words) | (no polysyllabic words) | заканчивать, завершать; finish, end; 約束, 完結, 終了, 做完 | 完畢, 終畢

(1.29.1:27: Filled in Nishida column. 0117 does not have its own entry in Nishida 1966, but its meaning is given in the entry for 0048.)

I think the definitions in modern (i.e., post-Clauson) dictionaries are speculative. Not entirely groundless - the fact 0117, 0048, and 0117-0048 appear in definitions for 'end'-words indicates that they mean something like 'end'. But 'something like' is not the same thing as certainty that they are verbs (according to Kychanov and Arakawa) or adverbs (according to Li Fanwen). It is, however, more than a simple question mark indicating we have no idea what something means.

1.24.16:01: A future Tangut dictionary could distinguish between three categories of words:

1. words whose meanings can be confirmed from context

2. words whose general semantic domain can be determined from dictionary entries

3. words whose meanings are unknown

0117, 0048, and 0117-0048 fall into the middle category. Strictly speaking, 0117 may not even be a word; it may be a bound morpheme.

A distinction between bound and free morphemes would also be a useful feature of a future Tangut dictionary. Current dictionaries are character-based, and all characters are given definitions, even though not all characters represent free morphemes. Users unfamiliar with Tangut cannot easily determine whether a given nontranscription character represents a word (i.e., a free morpheme) or only part of a word. (Transcription characters are indicated as such and by definition represent sounds, not words.)
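As a rough sketch of what an entry in such a dictionary might record, here is one possible layout in Python. Everything about it is hypothetical (the class names Evidence and Entry, the field names, and the choice of None for 'undecided'); it only restates the three categories and the bound/free distinction proposed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Evidence(Enum):
    CONTEXT = 1     # meaning can be confirmed from context
    DICTIONARY = 2  # only the general semantic domain is known, from dictionary entries
    UNKNOWN = 3     # meaning is unknown


@dataclass
class Entry:
    li_fanwen: str                # Li Fanwen number(s), e.g. "0117" or "0117-0048"
    reading: str
    evidence: Evidence
    domain: Optional[str] = None  # semantic domain or gloss, if any
    bound: Optional[bool] = None  # True = bound morpheme, False = free word, None = undecided


ENTRIES = [
    Entry("0117", "2thew1", Evidence.DICTIONARY, "something like 'end'"),
    Entry("0048", "2thwu4", Evidence.DICTIONARY, "something like 'end'"),
    Entry("0117-0048", "2thew1 2thwu4", Evidence.DICTIONARY, "something like 'end'", bound=False),
]

for entry in ENTRIES:
    print(entry.li_fanwen, entry.evidence.name, entry.bound)
```

Leaving bound as None for 0117 keeps the uncertainty visible instead of forcing a yes-or-no answer the evidence cannot support.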

To come: Is 0099 another member of the 'final' family?

WHAT'S SO MATERNAL ABOUT BROTHERS?

Could character structure elucidate the meaning of


0012.5873 1bu3.2kuq1

which Li Fanwen (2008: 3, 926) defined as 'brothers'?

The first character 0012 has this analysis in Tangraphic Sea 1.7.131:


0092 2750 5415 1602

1ma4 1ghu2 1bu3 2ngorn1

'mother head <bu> all' = top of 'mother' plus all of the homophonous phonetic <bu>

(I use < > to indicate that <bu> is a transliteration - only a loose phonetic approximation and not IPA.)

Of course brothers are born from mothers. But so are sisters. Why not abbreviate 'man' or, better yet, one of these more common characters to create 0012?


2447 0605

2lo3 2toq4


Could 1bu3.2kuq1 have referred to brothers sharing a mother?

1.18.19:33: But if that was the case, why isn't the top of 'mother' also in 5873? Disyllabic words written with characters sharing the same component are common in both Chinese and Tangut.

Combined Homophones and Tangraphic Sea A 7.203 analyzed 5873 as


5876 3936 5307 2705

2kuq1 1pha1 1ghwi2 2ber'4

'<ku> left power right' = left of the phonetic <ku> + right of 'power'

5876 2kuq1 means 'to tie', so its meaning may also be relevant. Could 0012.5873 be interpreted as


'mother' + <bu> + <ku>/'tie' + 'power'

i.e., powerful (people) with maternal ties called <bu.ku>? Why 'powerful'? Could the right-hand component just signify 'person': 'people with maternal ties called <bu.ku>'? That component appears in three of the characters for m-'people' words. However, I assumed it was self-promoting in the autonyms

𗼇 ~ 𗼎𗾧

2344 2mi4 ~ 3752 3296 2my4 2na4 'Tangut'

since it means 聖 'sage' by itself and corresponds to Sanskrit ārya 'noble' (Clauson 2016: 339). It doesn't seem to have such a function in the character


for the presumably neutral word 0607 1myr4 'people, clan'. Maybe the common denominator of 5873, 2344, 3296, and 0607 is 'kinsman' which would explain why the component



without implications of kinship was not used in 5873.

That component does, however, appear in


2447 2lo3 ''

but not


0605 2toq4 ''

which has a different component


of unknown origin and function.

