This description of a cryptic crossword clue reminded me of some tangraphic analyses:

15D Very sad unfinished story about rising smoke (8)

is a clue for TRAGICAL. This breaks down as follows.

15D indicates the location and direction (down) of the solution in the grid

"Very sad" is the definition

"unfinished story" gives "tal" ("tale" with one letter missing; i.e., unfinished)

"rising smoke" gives "ragic" (a "cigar" is a smoke and this is a down clue so "rising" indicates that "cigar" should be written up the page; i.e., backwards)

"about" means that the letters of "tal" should be put either side of "ragic", giving "tragical"

"(8)" says that the answer is a single word of eight letters.

There are many "code words" or "indicators" that have a special meaning in the cryptic crossword context. (In the example above, "about", "unfinished" and "rising" all fall into this category). Learning these, or being able to spot them, is a useful and necessary part of becoming a skilled cryptic crossword solver.
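The breakdown above is mechanical enough to sketch in a few lines of Python. This is only an illustration of this one clue, not a general solver; the helper names `truncate`, `reverse`, and `wrap` are my own labels for the indicators "unfinished", "rising", and "about", and the `split` parameter is an assumption, since the clue does not say where "tal" is broken.

```python
def truncate(word):
    """'unfinished': drop the final letter."""
    return word[:-1]

def reverse(word):
    """'rising' in a down clue: write the word up the page, i.e. backwards."""
    return word[::-1]

def wrap(outer, inner, split=1):
    """'about': put the letters of `outer` on either side of `inner`.

    `split` (an assumption of this sketch) is how many letters of
    `outer` go in front of `inner`.
    """
    return outer[:split] + inner + outer[split:]

story = truncate("tale")      # "tal"
smoke = reverse("cigar")      # "ragic"
answer = wrap(story, smoke)   # "t" + "ragic" + "al"
print(answer.upper(), len(answer))
```

The final length check plays the role of "(8)" in the clue.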

Tangraphs have no components equivalent to "15D" or "(8)", but "very sad" is like a semantic element of a tangraph and "unfinished story" and "rising smoke" are like cryptophonetics in tangraphs: e.g.,


5916 1xã (transcription of Chinese 漢 *xã 'Chinese') =

all of 5882 1zaʳ 'Chinese' (cryptophonetic referring to its Chinese translation 漢 *xã 'Chinese'; a semantic compound of


'small' + 'insect') +

right of 0789 2ɣʊ 'the surname Ghu' (function unknown)

High-frequency elements like ヒ (alphacode: cin) on the right of 5916 and 547 other tangraphs might be like the "code words" or "indicators" of cryptic crossword puzzles.

The indicator "about" reminds me of the term

5258 1ʔɔ̣ 'round'

used in tangraphic analyses to mean 'take the surrounding elements of the preceding character': e.g., 2634 is made up of the surrounding elements of 2639 plus the right side of 3678:


2634 1dʒwiõ 'publicize; propagate; declare; spread; to name' =

2639 2miee 'name' (semantic)

5258 (take the surrounding elements of 2639)

3678 2to 'to be born; to rise' (semantic)

2705 (take the right side of 3678)

I translate 5258 as 'frame' in analyses: e.g.,

2634 = frame of 2639 + right of 3678

I still do not know whether the analyses from the Tangraphic Sea reflect the intent of the creator(s) of the script or were (independently?) devised later as mnemonic devices.

LEDYARD ON "THE SO-CALLED JURCHEN SCRIPT"

While looking through The Korean Alphabet for Middle Korean ss-words last night, I found this passage by Gari Ledyard (1997: 54; emphasis mine):

The so-called Jurchen script was more a code than a writing system; to this day its complete decipherment is unattained and probably unattainable given the few written texts that still exist. Although what exists is often partly decipherable because of surviving Sino-Jurchen glossaries, no one yet has figured out the principle of this writing - indeed it may not have had any. If it did no more than discourage Koreans from imitating it in developing their own writing, it made a noble contribution [to the development of hangul, the Korean alphabet].

14 years later, I have yet to see anyone explain the principle(s) of the Jurchen script. When I first took a serious look at it 15 years ago, it struck me as a random imitation of Chinese characters. Its strokes were mostly Chinese, but they weren't combined into phonetic or semantic elements recycled in multiple characters. Learning one character with a certain component would not help you learn the pronunciation or meaning of any other characters sharing that component. The recurring shape 山 has no apparent recurring function in Jurchen. How could anyone learn such a nonsystem of c. 1,000 characters, excluding variants? I used to think that the Jurchen script was to sinography what the Cherokee syllabary was to the Roman alphabet - a recycling of shapes without regard for phonetics.

Some [Cherokee] symbols do resemble the Latin, Greek and even the Cyrillic scripts' letters, but the sounds are completely different (for example, the sound /a/ is written with a letter that resembles Latin D).

However, my analogy was incorrect because the Jurchen elite were literate in Chinese, whereas Sequoyah was not literate in English. Sequoyah did not know how alphabets worked, so he independently invented a syllabary. The Jurchen, on the other hand, must have understood the semantophonetic principles of sinography, so why did they create a script that had no (obvious) principle?

Juha Janhunen did not think the Jurchen actually created a script. He viewed the Jurchen script as an offshoot of a Manchurian branch of the Chinese script:

Sinography proper Manchurian sinography
(the existence of the Parhae script is still controversial)
Khitan large script Jurchen (large) script

Although I think Janhunen is correct, his view leads to more questions. What was the principle of the Khitan large script? Why does the Manchurian sinographic tradition seem to be based on different principles (if any?) from mainstream sinography? Do the Khitan and Jurchen (large) scripts seem to lack principles because they were originally designed for a third language spoken in Parhae? That third language would most likely be Koreanic (or possibly even Japonic) since Parhae was a successor to Koguryo. But I don't recall seeing anything hinting at Japonic-based phonetic elements in the Khitan and Jurchen (large) scripts and my attempts to find Koreanic-based phonetic elements have been unconvincing:

Koreanic *an 'not' in "An-certain about Oxen in Jurchen"

Koreanic *on- 'to come' in "Getting Back on the Jurchen Track"

I am more interested in the Khitan and Jurchen (large) scripts than the Khitan small script because the principles of the former are a mystery, whereas the principles of the latter are at least somewhat understood, though the details remain hazy and the phonetic values of many symbols await identification.

The stacking principle of the Khitan small script (and the Jurchen small script?) is very reminiscent of the stacking principle of hangul. I am still not certain that this similarity is just a coincidence. Could the stacking in all three scripts reflect stacking in an earlier fourth script (i.e., Parhae) rather than Khitan and/or Jurchen influence on hangul? If the occasional ligatures of the Khitan large script such as


<muɣoo> < <mu> + <ɣoo> 'snake'

predate the Khitan small script, they could be forerunners to the stacking of the Khitan small script.

SSEQUENCES (SSIC!)

While looking up 0586 in Li Fanwen (2008: 100) last night for the analysis of 1306 in line 95 of the Golden Guide, I saw the entry for 0585

2śjị 'cogon grass' (in Gong's reconstruction; mine is 2ʃɨị)

and wondered how it was pronounced in pre-Tangut if Tangut tense vowels (in rhymes 61-75 in Gong's reconstruction and mine) were conditioned by earlier *s-clusters:

*sCV > *CCV > *CC > CṾ

Was 'cogon grass' once *sʃiH? (The -ɨ- in later 2ʃɨị is nonphonemic. /i/ is [ɨi] after alveopalatals: cf. Russian ши [ʃɨ]. *-H - a glottal stop or fricative - is the source of the second tone. *-H may ultimately be from an *-s in at least some cases.)

Just as Russian SS-clusters came from earlier *SVS-sequences: e.g.,

ссора < съсора (the prerevolutionary spelling) 'quarrel'

Tangut SS-clusters could have had similar origins: e.g.,

*sʃiH < *sɯʃiH 'cogon grass'

*ɯ is my cover symbol for a pre-Tangut vowel that conditioned high vowels in Tangut.

But the simple prefix *s- that Gong proposed may be another source.

A third source might be *h(V) or *χ(V) or *x(V): cf. Ramsey's (1997: 135) emphatic prefix *hɯ- in proto-Korean on the basis of the 雞林類事 Jilin leishi (1103-1104) transcription of 'to write' as

核薩 *xəʔ s (cf. Late Middle Korean ssɯ-)

Qiang languages have χC- and xC-clusters. Ronghong Qiang has xs- corresponding to Mawo Qiang khs-:

RQ xsə : MQ khsə 'new' < *k-sə (the root is *sə, cognate to Tangut

1siw < *sik

and Old Chinese 新 *sin 'new')

Perhaps pre-Tangut *h- or *χ- or *x- could be from an even earlier *kV- that lenited to a fricative before fricatives after its vowel was lost: e.g.,

*kVSV > *kSV > *HSV > *SSV > *SS > SṾ

I derive Tangut aspirates from pre-Tangut *k-C- if they alternate with nonaspirates: e.g.,

1pị < *s-pi 'to aim at' (*s- is a transitive verb prefix)

1phi < *k-pi 'aim' (noun)

There are no fricative-initial words with such alternations since Tangut has no aspirated fricatives. Perhaps the reflexes of *kS-clusters may be found among SṾ-words with SV-cognates: e.g.,

2sie < *Cɯ-seH 'to know; knowledge'

2siẹ (rather than 2shie) < *-seH 'knowledge'

12.31.11:36: The Tangut words for 'know' are cognate to Tibetan shes- 'to know'. According to von Koerber's rule*, sh- is from *sy-. So I could rewrite the Tangut derivations as

2sie < *sjeH 'to know; knowledge'

2siẹ (rather than 2shie) < *k-sjeH 'knowledge'

I would no longer need *ɯ-prefixes to account for the upward bending of *e to ie. ie would simply be a glide-vowel sequence /je/ reanalyzed as a diphthong /ie/.

Tangut -H was probably from an *-s corresponding to the final -s of Tibetan shes- < *syes-. So pre-Tangut *sjes and pre-Tibetan *syes were identical. (The choice of j or y for the palatal glide is merely a convention.) Of course, one should not expect all pre-Tangut and pre-Tibetan forms to be identical: e.g., Tangut ʃɨạ 'seven' is not cognate to Tibetan bdun 'id.'

Could Tangut 'six' and 'seven' share a *k(V)-prefix?

1tʃhɨiw < *k(ɯ)-trik or *k(ɯ)-drik 'six' (cf. Tibetan drug; could Tangut -i- be from a *-y- < *-u- that assimilated to a front vowel *-i- in the prefix?)

1ʃɨạ < *kɯ-ʃa 'seven'

(or did ʃ < *kʃ- < *ks- < *kɯ-ʃ-? cf. Skt kṣ [kʂ] < *ks)

(or did ʃ < *ʃt- < *st-? cf. Mawo Qiang stə 'seven' and German st [ʃt] < *st)

12.31.12:09: Here are several kinds of *s-/*k-presyllables and their effects on Tangut syllables.

I. Dropped without a trace

Presyllable vowel matches height class of following vowel:

*Cɯ-Ci > Ci

*Cʌ-Ca > Ca

II. Dropped with a trace

Presyllable vowel height causes following vowel to bend:

*Cɯ-Ca > *Cɯ-Cia > Cia

*Cʌ-Ci > *Cʌ-Cəi > Cəi

III. Fused before presyllabic vowel (if any) can condition lenition

*s(V)-CV > *sCV > *CCV > *CC̣ > CṾ

*k(V)-CV > *kCV > ChV (if C is not a fricative; Sh- is not possible)

*k(V)-SV > *kSV > *xSV > *hSV > *SSV > *SS > SṾ (S is any fricative)

IV. Fused after presyllabic vowel conditioned lenition

*sV-sV > *sV-zV > *szV > *zzV > *zẓ > zṾ

*kV-tsV > *kV-dzV > *kV-zV > *kzV > *gzV > *ɣzV > *ɦzV > *zzV > *zz > zṾ

I couldn't think of cover symbols for 'lenited fricative' or 'lenited nonfricative', so I gave specific examples above.
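Types I-III above can be restated as a small Python sketch. It is only a toy model of the outcomes listed, not a tested reconstruction: the function names are hypothetical, the vowel and fricative sets are illustrative samples rather than full inventories, any presyllable vowel outside `HIGH` is treated as low, and tone digits are left out.

```python
TENSE = "\u0323"  # combining dot below, marking a tense vowel as in CṾ

# Illustrative classes only (assumptions of this sketch, not full inventories).
HIGH = {"ɯ", "i"}
FRICATIVES = {"s", "ʃ", "x"}

def drop_presyllable(pre_vowel, initial, vowel):
    """Types I and II: the presyllable is lost; a height mismatch bends the vowel."""
    if (pre_vowel in HIGH) == (vowel in HIGH):
        return initial + vowel            # Type I: dropped without a trace
    if pre_vowel in HIGH:
        return initial + "i" + vowel      # Type II: *Cɯ-Ca > Cia
    return initial + "ə" + vowel          # Type II: *Cʌ-Ci > Cəi

def fuse_prefix(prefix, initial, vowel):
    """Type III: *s- or *k- fuses with the root initial before any lenition."""
    if prefix == "s" or (prefix == "k" and initial in FRICATIVES):
        return initial + vowel + TENSE    # tense vowel: CṾ or SṾ
    if prefix == "k":
        return initial + "h" + vowel      # aspirate: ChV
    raise ValueError(f"unsupported prefix: {prefix!r}")

print(drop_presyllable("ɯ", "C", "a"), drop_presyllable("ʌ", "C", "i"))
print(fuse_prefix("s", "p", "i"), fuse_prefix("k", "p", "i"))
```

The last line mirrors the pair 'to aim at' < *s-pi and 'aim' < *k-pi cited earlier; Type IV is left out because its lenition steps depend on the specific consonants involved.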

Many consonants merged in lenition:

Consonant class (Homophones chapter) | Before lenition | After lenition
Labials (I) | *-p-, *-ph-, *-b- |
Dentals (III) | *-t-, *-th-, *-d- |
Alveolars (VI) | *-s-, *-ts-, *-tsh-, *-dz- |
Alveopalatals (VII) | *-ʃ-, *-tʃ-, *-tʃh-, *-dʒ- |
Velars (V, VIII) | *-x-, *-k-, *-kh-, *-g- |

Perhaps the glottal stop and sonorant consonants (nasals, liquids, and glides including v- /w/) did not lenite.

*I use Nathan Hill's (2011) names for Tibetan sound laws.

THE GOLDEN GUIDE: LINE 95: TANGRAPHS 471-475

95. Three out of five tangraphs are transcriptive characters not associated with any specific morpheme:

Tangraph number | 471 | 472 | 473 | 474 | 475
Li Fanwen number | 1936 | 0707 | 4660 | 3774 | 1306
My reconstructed pronunciation | 2xɛ̃ | 1tʃɨw | 1ʔiã | 2ʃɨõ | 1kiõ
Tangraph gloss | (transcription of Chinese) | district | (transcription of Chinese) | to guard | (transcription of Chinese)
Word | the surname 解 Xie (*xɛ) | the surname 周 Zhou (*tʃɨw) | the surname 燕/閆/鄢 Yan (*jã) | the surname 尚/商/賞 Shang (*ʃɨõ) or 昌/常 Chang (*tʃhɨõ) | the surname 龔/弓/宮/鞏 Gong (*kiũ) or 姜 Jiang (*kiõ)
Translation | Xie, Zhou, Yan, Shang/Chang, Gong/Jiang

471: 'High' on the left of 1936 is an abbreviated phonetic. Was there a Xie family that raised livestock?


1936 2xɛ̃ (transcription of Chinese) =

left of 2949 2xɛ̃ 'skill' +

all of 2306 1pə 'small livestock'

I am not sure that 1936 should be reconstructed with a nasal vowel. It could transcribe Chinese syllables with oral and nasal vowels:

解薤 *xɛ


Perhaps 1936 was 2xɛj with a -j (cf. Gong's reconstruction 2xiəj).

(12.30:13:30: Li Fanwen 2008: 322 phonetically glossed 1936 as 郝 *xa, but the vowel doesn't match.)

472: 0707 is a semantic compound:


0707 1tʃɨw 'district' (borrowed from Chn 州 *tʃɨw) =

bottom left of 1408 1lhiooʳ 'place, site, market, street, military formation' +

left of 2627 2lɨə̣ 'earth'

473: Were the components of 4660 meant to be reminiscent of Chn 炎/焱/焰 *jã 'flames'? 炎 and 焱 both consist of multiple 火 fires.


4660 1ʔiã (transcription of Chinese) =

bottom right of 4408 1məə 'fire' +

left of 5659 1veʳ 'flourishing, luxuriant'

I am not sure whether 4660 had a simple initial j- (as reconstructed by Arakawa) or an initial ʔ- (as reconstructed by Gong). I chose ʔ- because of its fanqie initial speller:


4660 1ʔiã (transcription of Chinese) =

0932 1ʔɨi 'many, more, much' +

1102 1kiã (transcription of Chinese)

But perhaps 0932 also had initial j-. 0932 transcribed Chinese syllables which were *ʔi and *ji in Middle Chinese. It is not clear whether the *ʔi/*ji distinction survived into Tangut period northwestern Chinese.

474: I suppose guarding enables the guarded to evade the effects of evil, but I would have expected a semantic compound like 'evil' + 'shield':


3774 2ʃɨõ 'to guard' =

left of 3551 2niõ 'evil, wicked, bad' +

center and right of 3789 1phie 'to escape, evade'

(12.30.12:21: Possibly borrowed from Chn 避 *phi 'to avoid'? But I would expect that to correspond to Tangut 1phi, not 1phie. Tangut -ie matches the -ie of Early Middle Chinese *bieh, but the initials don't match. Could Tangut ph- be from *k-b- with a native prefix *k- rather than from Tangut period NW Chn *ph- from EMC *b-?)

3774 could represent affricate-initial as well as fricative-initial Chinese syllables:



Why not transcribe those syllables with tangraphs for tʃɨõ and tʃhɨõ, syllables which existed in Tangut?

Although all Chinese syllables transcribed with 3774 had nasal vowels, Gong reconstructed it as 2ɕjow and I wonder if its rhyme was -ow with a nasal vowel. Gong's glide codas correspond to my nasal vowels in his rhyme groups VIII and XI:

Rhyme group | Rhyme | Grade | Gong | This site (nasal interpretation) | This site (glide interpretation)
VIII | 41 | I | -əj | -ẽ | -ej
VIII | 42 | II | -iəj | -ɛ̃ | -ɛj
VIII | 43a | III | -jɨj | -ɨẽ | -ɨej
VIII | 43b | IV | | -iẽ | -iej
XI | 56 | I | -ow | -õ | -ow
XI | 57 | II | -iow | -ɔ̃ | -ɔw
XI | 58a | III | -jow | -ɨõ | -ɨow
XI | 58b | IV | | -iõ | -iow

(I have excluded tense, retroflex, and long vowel rhymes for simplicity. Unlike Gong, I recognize a Grade IV distinct from Grade III.)

2ʃɨow without a nasal vowel is close to Chn 守 *ʃɨw 'to guard', but I doubt the former was borrowed from the latter because the vowels don't match. I would expect Chn *ʃɨw to correspond to ʃɨw, a syllable that exists in Tangut.

475: 1306 represented the 龔 Gong of the late Tangutologist 龔煌城 Gong Hwang-cherng in the Forest of Categories.

1306 1kiõ was not a perfect match for Chn 龔/弓/宮/鞏 *kiũ but it was the best available match other than 1kiu. There was no Tangut rhyme -iũ.

Were any Gongs or Jiangs related to Su and/or Qian families?


1306 1kiõ (transcription of Chinese) =

0586 2siu (transcription of Chinese: e.g., the surnames 蘇 *su [without *-i-!] and 宿 *siu, now both Su in modern standard Mandarin)

3277 2tshia (transcription of Chinese: e.g., the surname 錢 *tshiã, now Qian in modern standard Mandarin)

3277 only transcribed Chn 千潛賤淺錢踐 *tshiã with a nasal vowel even though it belongs to the oral vowel rhyme group IV rather than the nasal rhyme group V. Were Tangut period northwestern Chinese vowels losing nasalization?

THE GOLDEN GUIDE: LINE 94: TANGRAPHS 466-470

94. Four out of these five are transcription characters not associated with any specific morpheme:

Tangraph number | 466 | 467 | 468 | 469 | 470
Li Fanwen number | 5916 | 2152 | 2635 | 2138 | 3617
My reconstructed pronunciation | 1xã | 1ʃɨi | 1xiõ | 2bəəu | 2xwe
Tangraph gloss | (transcription of Chinese) | (transcription of Chinese) | (transcription of Chinese) | grave | (transcription of Chinese)
Word | the surname 韓 Han (*xã) | the surname 施/史時/石/師 Shi (*ʃɨĩ)? | the surname 馮/鳳/豐酆/封 Feng (*fɨũ) or 方/房 Fang (*fɨõ) or Xiang 向 (*xɨõ) | the surname 慕 Mu (*mbəu)? | the surname 惠 Hui (*xwej)
Translation | Han, Shi, Feng/Fang/Xiang, Mu, Hui.

466: 5916 has 5882 as a cryptophonetic (its Chinese translation was 漢 *xã) plus the mysterious right-hand element ヒ (alphacode cin):


5916 1xã (transcription of Chinese 漢/韓/邯 *xã) =

all of 5882 1zaʳ 'Chinese' +

right of 0789 2ɣʊ 'the surname Ghu'

Does 0789 represent a Ghu family related to the Han?

The 馬韓 Mahan confederacy in Korea was called

2bæ 1xã (cf. Tangut period NW Chn *mbæ xã)

so Korea, the 韓國 'Han country', might be known as

1xã 2lhiẹ 'Han country'

in modern Tangut. Only two strokes (cin) would distinguish 'Korea' from 'Chinese'!


1xã 'Korea' <> 1zaʳ 'Chinese'

467: Were the Shi the 'elder Nga'?


2152 1ʃɨi (transcription of Chinese) =

2888 2mə 'surname' +

1633 2pəụ 'elder' +

2075 2ŋa 'the surname Nga'

468: 2635 looks like a combination of 'earth' (indicating a geographic name? from which tangraph?) plus an element of unknown function (alphacode: dol) found in only eight other tangraphs that don't sound like xiõ.


Although Nishida and Arakawa have reconstructed Tangut f-, I am skeptical because Chinese f-syllables were transcribed with tangraphs like this one listed in chapter VIII (glottal initials) of Homophones. (Velar x- is treated as a glottal initial and may have been glottal [h].)

469: The analysis of 2138 is unknown. It looks like 'earth' (cf. the 土 'earth' in Chn 墓 'grave') plus 'hand' plus a right-hand element of unknown function (alphacode: dal) found in 80 other tangraphs:


2bəəu 'grave' is borrowed from Tangut period northwestern Chinese 墓 *mbəu 'id.' The reason for the Tangut long vowel is unknown. Could it compensate for the loss of a native Tangut suffix?


470: The analysis of 3617 is unknown:


Its left component is 'person' but I don't know what the other two (alphacodes bal and juu) are doing. The sequence baljuu does not occur anywhere else. There are no other tangraphs pronounced xwe.

12.29.1:25: Could 'person' be from 2888 'surname' as in 2152 above?

THE ROOTS OF RAWNESS

Having just written about the etymology of Zhuang sawgun 'Chinese character', I should write about the etymology of the second half of sawndip. (The saw is the same.)

Despite the spelling, ndip 'raw' is [ɗip7] without a nasal. d without n is unaspirated [t] in Zhuang spelling. This usage is a carryover from Pinyin* in which d and t respectively represent unaspirated [t] and aspirated [th]. Zhuang has no [th], though it does have a [θ] written as s. The n of nd [ɗ] differentiates it from d [t]. The 1957-1982 spelling of [ɗ] was Ƌ, which might be a mirror image of the 1957-1982 letter Ƃ [ɓ] as well as a derivative of d.

Zhuang even-numbered tones usually developed in syllables with *voiced initials, but syllables with voiced implosive initials developed the odd-numbered tones associated with *voiceless initials:

*Proto-voicing | *Proto-initial | Tones
voiceless | *p-, *t- ... | 1, 3, 5, 7
voiced | *ɓ-, *ɗ- ... | 1, 3, 5, 7
voiced | *b-, *d- ... | 2, 4, 6, 8

Tone 1 is not indicated in spelling. Tones 2-6 are indicated by silent letters following a syllable:

Tone | 1957 spelling | 1982 spelling
2 | -ƨ | -z
3 | -з | -j
4 | -ч | -x
5 | -ƽ | -q
6 | -ƅ | -h

Note how similar the 1957 letters are to the numerals 2-6 and the Cyrillic letters г (italic), з, ч, and ь. (ƽ doesn't look like any Cyrillic letter.)

h can also be an initial letter in Zhuang, but z, j, x, and q are always tonal.

Syllables ending in stops can only have tones 7 and 8 which are indicated by the spelling of the stops:

Tone Spelling
7 -p, -t, -k
8 -b, -d, -g

Tones 7 and 8 are identical to 5 and 6, but this spelling convention avoids final digraphs like -pq for -p with tone 5, etc.
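The 1982 tone-marking rules above amount to a lookup on the last letter of a syllable, which can be sketched in Python as follows. The function name `zhuang_tone` is hypothetical, the input is assumed to be a single syllable in 1982 spelling, and the special case for final -ng (a nasal, not the stop -g) is my own addition to keep nasal-final syllables out of tone 8; it is not discussed in the text above.

```python
# Silent tone letters of the 1982 spelling (tones 2-6).
TONE_LETTERS = {"z": 2, "j": 3, "x": 4, "q": 5, "h": 6}

def zhuang_tone(syllable):
    """Return the tone number (1-8) implied by the spelling of one syllable."""
    last = syllable[-1]
    # A lone letter cannot be a tone mark (h can also be an initial).
    if len(syllable) > 1 and last in TONE_LETTERS:
        return TONE_LETTERS[last]
    if syllable.endswith("ng"):
        return 1          # assumption: -ng is a nasal coda, so tone 1 is unmarked
    if last in "ptk":
        return 7          # stop-final syllable, odd-numbered tone
    if last in "bdg":
        return 8          # stop-final syllable, even-numbered tone
    return 1              # tone 1 is not indicated in spelling

for word in ["ndip", "saw", "gun", "goenz", "denj", "cih"]:
    print(word, zhuang_tone(word))
```

The sample words are all single syllables cited on this site with their tone numbers, so the output can be checked against the bracketed transcriptions.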

Li Fang-Kuei (1977: 129) reconstructed 'raw' in Proto-Tai as *dl/rip. The reflexes of PT *dl/r- in 'raw' vary from d- to ɗ- to n- to r-. Some of the sawndip spellings of ndip imply earlier phonetic similarity with Middle Chinese *ɳ- (which is r-like) and *l-:

生 'raw' + 尼 *ɳi

立 MC *lip + 生 'raw'

生 'raw' + 立 MC *lip

月 < 肉 'meat' + 立 MC *lip

立 MC *lip by itself

Other spellings have no (?) phonetic:

米 'rice' + 生 'raw' over 失 'lose'

生 'raw' + 勺 'ladle'

㐅 '?' + 力 *lɨk 'strength' (phonetic?; *-k is a grave consonant like -p)

㐅 appears in at least 13 sawndip characters. I don't know what its function is.

Sawndip may be the earliest indigenous Tai writing system. It would be interesting to reexamine existing reconstructions of Proto-Tai with sawndip evidence in mind. Although the spellings of ndip 'raw' imply an earlier liquid or even nasal, Pittayaporn (2009) reconstructed Proto-Tai 'raw' as *C̥.dip without either a liquid or a nasal. *C̥- is a presyllable with a voiceless initial. Pittayaporn compares his PT *C̥.dip with Blust's Proto-Austronesian *quDip. PAN *D is [ɖ]. Could PAN *quɖip have been borrowed into Proto-Kra-Dai as *qudrip**, simplifying to PT *C̥.dip and Norquest's (2008: 277) Proto-Hlai *Curiip and Proto-Be *Curjəp? Or did PKD inherit 'raw' from an ancestor shared with PAN or even from PAN itself?

Benedict's Austro-Tai (Austro-Kra-Dai in modern terminology?)

Proto-Austro-Tai (Proto-Austro-Kra-Dai)
- Austronesian
- Kra-Dai (including Tai)

Sagart: Kra-Dai as (Sino-)Austronesian subgroup

- Sino-Tibetan
- Proto-Austronesian
  - Non-Muic subgroups of Austronesian
  - Muic
    - Non-Kra-Dai subgroups of Muic
    - Kra-Dai

I used to be highly skeptical of a connection between Kra-Dai and Austronesian. As Pittayaporn wrote,

Benedict’s [Austro-Tai] work has been rightly criticized for its methodology and the quality of its evidence.

However, I now see

Undeniable evidence (Benedict 1942, Sagart 2004, and Ostapirat 2005) for some kind of relationship

But I have no opinion about which kind of relationship exists between Kra-Dai and Austronesian. For now, I can only recommend Pittayaporn (2009) as an overview of compression phenomena which may be relevant to compression in the histories of Chinese and Tangut.

*Not all Zhuang letters are used as in Pinyin. Exceptions:

Zh c is [ɕ] like Pinyin x, not Pinyin c [tsh]. Zhuang has no aspirates.

Zh j, x, q, z are tonal letters (see above), not consonants as in Pinyin.

Zh s is [θ], not [s] as in Pinyin.

Zhuang spelling indicates short vowels with an added -e- in closed syllables, whereas Pinyin has no devices for vowel length:

Short ae [a] oe [o]
Long a [aa] o [oo]

There is no length distinction in open syllables.

**12.28.00:11: A presyllable initial *q- is reconstructible in Proto-Kra-Dai on the basis of Buyang qaɗip 'raw' (Li Jinfang 1999 as cited in Sagart 2004: 50).

SAWGUN STRATOGRAPHY?

I was surprised to see this in Wikipedia's "Sawndip" entry (emphasis mine):

The Zhuang word for Chinese characters used in the Chinese language is sawgun (Sawndip: (史+書)倱; lit. "original writing system") (saw meaning character or book, and gun meaning the Han Chinese ethnicity, cognate to 漢)

According to Sawndip sawdenj (Sawndip Dictionary), gun [kun1] means 汉 = 漢 'Chinese', not 'original'.

The Zhuang were in contact with Cantonese speakers. In standard Cantonese, 漢 is [hɔn5] from Middle Chinese *xanh which in turn may be from Old Chinese *hnars. If gun were "cognate to" 漢, I would expect it to be hoenq [hon5], hanq [haan5], or nanq [naan5]. No Chinese language known to me has

- initial k-

- the vowel -u-

- tone 1

in 漢 'Chinese'.

Sawndip sawdenj lists three sawndip spellings for gun on p. 208:

倱 = 亻 'person'* + phonetic 昆 Ct [kwan1] < MC *kon < OC *kun

軍 Ct [kwan1] < MC/late OC *kun < OC *kur 'army'

倌 = 亻 'person' + phonetic 官 Ct [kuun1] < MC/late OC *kwan < OC *kwan 'government official'

At first I thought that the third spelling of gun might be the key to its etymology. The Zhuang may have heard Cantonese speakers refer to themselves as 官 [kuun1] 'officials' and come to call the Chinese gun 'officials'. The problem is that Ct [kuun1] has a long vowel, whereas Zh gun [kun1] has a short vowel. Vowel length is phonemic in Zhuang, so Ct [kuun1] should correspond to a Zh guen [kuun1]. Moreover, the shift of MC *wa to Ct [uu] was sometime within the last millennium. (Sino-Vietnamese from the late Tang Dynasty still has [waa] corresponding to MC *wa.) The Zhuang have been in contact with the Chinese for much longer than that, so their name for the Chinese would probably be older than a millennium.

MC (and late OC) 軍 *kun 'army' is a perfect phonetic match for Zh gun [kun1] 'Chinese', though the semantic match is loose. Did the Zhuang hear Chinese soldiers speaking about a *kun and adopt that term as the name for their occupiers and even the civilians associated with them?

The "strato" in the title is from Greek στρατός 'army' and refers to Zh gun. Of course, "graphy" is also from Greek and refers to Zh saw [θaɰ1]. I also considered the title "Stratobiblion" because I think saw is borrowed from Chinese 書 'book, script' which is also the phonetic in three of its sawndip spellings from p. 451 of Sawndip sawdenj:

史 'history' + 書 'script'

書 'script' + 青 'green' (implying 'not ripe'; cf. ndip [ɗip7] 'immature' in sawndip 'Zhuang writing', lit. 'immature writing'.)

字 'character' + 書 'script'

字 'character' by itself can also represent Zh saw. Zh sawdenj 'dictionary' is a calque of Chinese 字典 'dictionary' = 'character book'. Zh denj [teen3] sounds like a borrowing from 典 MC *t(i)enʔ 'reference book'. (Oddly, denj has no entry in Sawndip sawdenj. I presume it was written as 典 without modification.)

A fourth spelling (土 atop 卜) might be a simplification of 書 'script'. Compare 土 atop 卜 to these cursive forms of 書.

Zh saw [θaɰ1] could be an attempt to imitate Cantonese 書 [søɥ1] which has a vowel and glide absent from Zhuang. However, I doubt the word is a recent loan. Its rhyme also matches the *-aɰ that Pulleyblank reconstructed for the Old Chinese rhyme category of 書, though most would reconstruct that category as *-a.

12.27.15:09: Sino-Vietnamese has two layers of correspondences for that Chinese rhyme category:

-ư [ɨɨ] (newer layer from Late Middle Chinese: e.g., 書 thư)

-ưa [ɨə] (older layer from Early Middle Chinese; no known loan of 書 from this layer)

I am surprised Zh saw isn't sw [θɯ1]: cf. Proto-Tai *sɯ A 'writing' from Chn 書.

One might think that PT *-ɯ became Zh -aw [aɰ], but no such change is reflected in

PT *mɯ A > Zh mwz [mɯ2] 'hand'

and Zh -aw [aɰ] corresponds to PT *-aɰ:

PT *ɓaɰ A > Zh mbaw [ɓaɰ1] 'leaf'

This does not necessarily mean that all Zh -aw [aɰ] are from PT *-aɰ. Nonetheless, I suspect that Zh saw is a very old loan preserving the rhyme *-aɰ without the nonlow vowels reflected in the later Vietnamese borrowings from Middle Chinese.

One might also propose a connection between Zh saw and PT *dʑɯ B 'name', possibly from Early Middle Chinese 字 *dzɨh* 'character, name'. (PT had no *dz-.) However, Chinese *dz- corresponds to Zh c-, not Zh s-. The Sino-Zhuang version of 字 is cih [ɕi6], written as 字+之. The front vowel [i] indicates that the borrowing occurred

- after the fronting of EMC *ɨ to *i (Sino-Vietnamese chữ 'character' predates this change)

- before the shift of *dzi to *dzz̩ (Sino-Vietnamese tự 'character' reflects the latter)

Next: The Roots of Rawness

*12.27.1:05: By coincidence, the native Zhuang word goenz [kon2] 'person' vaguely resembles gun 'Chinese', but its second tone derives from an earlier voiced initial: [kon2] < *gon. It is cognate to Thai คน khon < Proto-Tai *ɣon 'person'.

TANGUT THROUGH TIBETAN (PART 4: CONCLUSION)

This concludes my comments on Andrew West's observations on Tangut in Tibetan transcription:

Although most Tibetan glosses do approximately correspond to the modern phonetic reconstructions of the corresponding Tangut characters, the correspondence is disappointingly poor, with only a very few characters showing an exact correspondence between Tangut reconstruction and Tibetan transcription (e.g.

L[i Fanwen 2008 #] 2098  "I, me"

which is reconstructed *ŋa and glossed ŋa ... which also happens to be the Tibetan word for "I, me").

This correspondence, though vague at best, was sufficient to identify most of the consonant classes in the monolingual Tangut Homophones dictionary (see part 3). However, it is certainly not sufficient to identify the 105 rhymes of the Tangraphic Sea.

In most cases the Tibetan glosses miss out what should be essential phonetic features, for example transcribing *mja as ma, *ŋwu as ŋu, *ɣjɨ̣ as rgi, *war as wa, *lew as li, and *lhjwịj as lhi.

These nonmatches belong to at least four categories:

1. Random errors (slips of the brush) and pseudoerrors caused by damage to the manuscripts preventing us from seeing vowel symbols that were once there: e.g.,

- a hole above the base consonant of a gloss could cause us to read it as <Ca> with the default vowel <a> instead of the ི <i>, ེ <e>, or ོ <o> that was once above it

- a hole below the base consonant of a gloss could cause us to read it as <Ca> with the default vowel <a> instead of the ུ <u> that was once below it

2. Nonmatches involving Tangut features that did not exist in Tibetan: e.g., there were no consonant clusters ŋw- and lhjw- in Classical Tibetan, so it's not surprising that such Tangut clusters were glossed as <ng> and <l> even though <ngw> and <lhyw> would have been ideal.

3. Nonmatches involving Tangut features that did exist in Tibetan: e.g., Tangut mja could easily have been glossed as Tibetan <mya> instead of <ma>. Why wasn't it? None of the 87 complete glosses in Tai (2008: 209-210) for syllables reconstructed with -ja by Gong have <y>, so the eight instances of <ma> for expected <mya> cannot be disregarded as a random error.

4. Nonmatches involving Tibetan letters corresponding to nothing in standard Tangut: e.g., the <r> of <rgi> for standard Tangut ɣjɨ̣.

The third category makes me think that, in Andrew's words,

the modern reconstructions of Tangut are seriously flawed (a possibility I can't reject)

At present I do not think any reconstruction of Tangut - not even any of my own - is anywhere near accurate. I expect a major overhaul of my reconstruction once I analyze the Tangut rhyme tables. I will do that after I finish the translation of the Golden Guide.

The fourth category makes me think that the glosses reflect a nonstandard variety of Tangut: e.g., <rgi> reflected nonstandard ɣjɨ̣̣ʳ with a retroflex vowel rather than standard ɣjɨ̣ with a nonretroflex vowel. Even the most accurate reconstruction of the standard dialect may not match the dialect(s) reflected in the glosses.

It does not help that I don't know what dialect(s) of Tibetan underlie the glosses. That problem would have to be solved by a Tibetologist like Nathan Hill with a background in Tangutology. Converting the glosses into any Classical Tibetan romanization system does not necessarily generate a result resembling their intended pronunciations.

What happens when we look at language A transcribed by speakers of language B and assume that we are looking at language A' transcribed by speakers of language B'?

Suppose we know that the Russian word for 'place' was место [ˈmʲɛstə] and we know how to pronounce Chinese characters in modern standard Mandarin. What if we find the transcription


滅陀 Md mietuo [mjɛthwɔ]

for what we assume to be место? We might wonder why there was no attempt to transcribe the [s] and why Rus [tə] didn't correspond to Md 特 te [thə]. (Let's assume Md [t] was already used as a transcription of Rus [d], so Md [th] was chosen as a transcription of Rus [t].)

But we didn't know that the transcription was made by a Cantonese speaker who heard Ukrainian місто [mistɔ] and intended 滅陀 to be pronounced [mitthɔ]. Cantonese [tt] corresponds to Ukrainian [st].* Md [jɛ] does happen to correspond to Rus [ʲɛ], but the scribe really intended Cantonese [i] to correspond to Ukrainian [i]!

Later misinterpretation   Russian     m   ʲɛ   s   t    ə
                          Mandarin    m   jɛ   -   th   wɔ
Original intent           Cantonese   m   i    t   th   ɔ
                          Ukrainian   m   i    s   t    ɔ
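
The contrast between the two readings can be made concrete with a rough count. The following is only a sketch: it uses the segmentation of the table above (with Mandarin [jɛ] taken from the transcription [mjɛthwɔ]), and it assumes that a segment "matches" only when the two symbols are identical, which is a far cruder criterion than any real phonetic comparison would use. The function name `exact_matches` is hypothetical.

```python
# Segment-by-segment alignments taken from the table above.
# "-" marks a position with no corresponding segment.
russian = ["m", "ʲɛ", "s", "t", "ə"]
mandarin = ["m", "jɛ", "-", "th", "wɔ"]
cantonese = ["m", "i", "t", "th", "ɔ"]
ukrainian = ["m", "i", "s", "t", "ɔ"]


def exact_matches(source, transcription):
    """Count aligned positions where the two symbols are identical.

    Assumption: identity of symbols stands in for phonetic similarity,
    so near-matches such as [t] : [th] count as mismatches.
    """
    return sum(a == b for a, b in zip(source, transcription))


print("Russian / Mandarin:", exact_matches(russian, mandarin))
print("Ukrainian / Cantonese:", exact_matches(ukrainian, cantonese))
```

Even under this crude measure, the pairing the scribe intended fits better than the pairing the later reader assumed. Near-matches such as [ʲɛ] : [jɛ] would need a weighted similarity measure, which is not attempted here.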

Is it too much to expect accuracy? Could it be that

the Tibetan scribes were content to provide a very approximate representation of Tangut, so approximate that it is hard to imagine that a Tangut speaker could have understood much that a Tibetan reading the Tibetan transcriptions of Tangut was saying[?]

On the one hand, I do find Andrew's theory attractive:

So what was the purpose of the Tibetan transcriptions? My theory is that they were intended for Tibetan monks to be able to chant in unison with their Tangut colleagues, not knowing what they were chanting or needing to chant perfectly, but just vaguely correct enough to be able to chant along without sticking out like a sore thumb. Maybe the Tibetan monks who made the transcriptions did not speak a word of Tangut, and they just wrote down what they thought they heard, which would explain why the transcriptions are so imprecise.

On the other hand (emphasis mine),

Thirdly, the Tibetan glosses utilise prefix letters (g, d, b, m and ') and superfixed letters (s, r and l) in a way that suggests they might have been intended to indicate a particular pronunciation of the corresponding Tangut character, but it is not immediately obvious what this might have been (it has been suggested that these nominally silent letters may have been intended to represent tone in Tangut, but I am not convinced), and they are used inconsistently (e.g. L1245 ·jij is glossed as either ye or g.ye). Likewise, the glosses frequently use a final letter -'a, seemingly to indicate a long vowel, but again it is used inconsistently (e.g. L1278 ·jɨ is glossed as either g.yi or g.yi'). Perhaps the oddest feature of the Tibetan transcriptions is the use of prefix letters in front of letters that do not allow prefix letters in standard Tibetan orthography, for example d.wi དཝི and g.ru' གརུའ. This feature occurs across different manuscripts, and could suggest that the scribes were actually using a formally defined orthography for transcribing Tangut, and not just putting down what they could hear, as I suggested above.

Some hypotheses about these letters:

- I long assumed that the preinitial letters represented real consonants not preserved in standard Tangut, but I am now inclined to view many of them as attempts at tonal spelling. An exception is preinitial <b>, which may indicate a Tangut medial -w- (Nie 1986). This usage may help us identify the underlying Tibetan dialect(s) of the glosses.

Apparent exceptions to Arakawa's (1999) tonal spelling interpretation (preinitial = level tone, no preinitial = rising tone), such as

1542 1ku (Gong), glossed as <gu> as well as <b.ku> and <H.ku>

2999 2swu (Gong), glossed as <b.zu> instead of <s(w)u>

(1:23: Were <b.z> and <s> pronounced with different tones in the Tibetan dialect underlying this transcription?)

may be due to tone sandhi in compounds or within phrases. All studies I have seen of the Tibetan glosses take the glosses out of context, so my hypothesis has yet to be tested.

- Preinitial <m> and <H> (Andrew's <'>) could represent prenasalization.

- Final <H> may represent a final consonant that only appears in certain phonological environments: e.g.,

*CVC >

CV if the next syllable begins with a (certain type of?) consonant

CVH if the next syllable begins with a vowel (and/or a certain type of consonant?)

- Preinitial <r> and final <r> could represent vowel retroflexion, which may also be present in words lacking retroflex vowels in standard Tangut (see example above).
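
The apparent exceptions to the tonal spelling interpretation mentioned under the first hypothesis can be found mechanically. This is a minimal sketch, not Arakawa's procedure: it assumes that a period in a gloss separates a preinitial from the rest (as in <b.ku>), that tone 1 is the level tone and tone 2 the rising tone, and it contains only the four glosses cited above. The names `GLOSSES`, `predicted_tone`, and `exceptions` are hypothetical.

```python
# (Tangut character number, tone in Gong's reconstruction, Tibetan gloss)
# Only the glosses cited in this post are included.
GLOSSES = [
    (1542, 1, "gu"),
    (1542, 1, "b.ku"),
    (1542, 1, "H.ku"),
    (2999, 2, "b.zu"),
]


def predicted_tone(gloss):
    """Tone predicted by the rule: preinitial = level (1), none = rising (2).

    Assumption: a period marks the boundary after a preinitial letter.
    """
    return 1 if "." in gloss else 2


def exceptions(glosses):
    """Return (number, gloss) pairs whose predicted tone differs from the reconstructed tone."""
    return [(num, gloss) for num, tone, gloss in glosses
            if predicted_tone(gloss) != tone]


print(exceptions(GLOSSES))
```

A check like this treats each gloss in isolation. Testing the tone sandhi hypothesis would require adding the neighboring syllables of each gloss to the data, which the cited examples do not provide.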

Future studies of the glosses may lead to the birth of Tangut dialectology.

*12.26.2:36: I am assuming that the transcriber does not know Cyrillic and is not influenced by spelling.

Cantonese still preserves the final stops of Middle Chinese. Neither Cantonese nor MC had syllables with final *-s or initial *st-. I found two instances of Indic st-type sequences transcribed as MC *-tt(h?)- in Soothill (1937):

Prakrit Kustana as 屈丹 MC *khuttan
Sanskrit Veṣṭana (or an unattested Pali *Veṭṭhana?) as 別他那 MC *bɨetthana

Unfortunately, I haven't been able to find any examples of Cantonese [VttV] corresponding to English [VstV] in the pages of Bauer and Benedict (1997) visible at Google Books. I suspect no such cases exist because borrowings from English are likely to be influenced by English spelling.

I wonder if any Cantonese speaker has ever pronounced English [VstV] as [VttV]. If not, then my место example is invalid.

Perhaps this would be a better example of the same kind of problem. Suppose we are puzzled by Russian от [ɔt] 'from' transcribed as what we think is supposed to be Mandarin 威地 weidi [wejti] ... which in fact was meant to be a Cantonese [wajtej] transcribing Ukrainian від [ʋid]. (There is no Cantonese [ʋi(t)], [vi(t)], or [wi(t)]. Cantonese syllables cannot end in voiced stops.)

Later misinterpretation   Russian     -   ɔ    t   -
                          Mandarin    w   ej   t   i
Original intent           Cantonese   w   aj   t   ej
                          Ukrainian   ʋ   i    d   -

(Tables added 12.26.13:46.)

