Today I saw boxes of chocolate-flavored Pocky (Japanese Pokkii) labeled in Thai as


Pókkîi rót chɔ́kkoolɛ́t 'Pocky flavor chocolate'



Kuulíkòʔ 'Glico'

I wonder

- how tones are assigned to Japanese loanwords in Thai

- were the tones of Pókkîi assigned by analogy with the English loanwords like ร็อคกี้ Rókkîi 'Rocky'?

- why chɔ́kkoolɛ́t has a long oo absent in English

- is the long ee of Japanese chokoreeto due to the assumption that chocolate rhymes with late?

- why Kuulíkòʔ 'Glico' (a compromise between Japanese Guriko and its Anglicized form Glico) has a long first vowel

TANGUT 'TALLY MARKS' AND PAHAWH KHMU

For nearly a century, Tangutologists have devised various systems of 'radicals' to index Tangut characters. These modern radicals do not necessarily correspond to the components implied by the analyses in Tangraphic Sea. No explicit premodern list of Tangut character components was known prior to the discovery of the book known in Mandarin as 擇要常傳同名雜字 Zeyao changchuan tongming zazi, translated by Andrew West as 'Essential Selection of Often Transmitted Homonyms and Mixed Characters'. Andrew has written an article on the publicly available pages of this book.

The most interesting aspect I have seen so far is the list of radicals (?) which may be titled

3583 0706 5865 1084 2403 0092 1ta4? 2vi1? 1soq1 2ghaq1 2di4 1ma4 'TOPIC? rhyme? three ten character mother'

The first two characters are difficult to identify. I have followed Andrew's interpretation here.

The use of a topic suffix as the first part of a title is baffling. 3583 is a common fanqie speller for Tangut rhyme 1.20 1-a4. Could the title mean 'thirty letters for the rhyme 1-a4'? But why would there be thirty ways to represent the same rhyme? There is no obvious correlation between these thirty elements and the rhyme 1-a4, and as we will soon see, some of the thirty elements do not even occur in any Tangut characters. Another possibility is that 3583 is a phonetic symbol for a word ta, as the graph can transcribe the Sanskrit syllables ta and tā. But what would that ta mean? What if ta is a non-Tangut word in the Tangut script?

The first eleven elements look like tally marks:

1-4: one dot, vertical stacks of two to four dots

5-8: one horizontal line, vertical stacks of two to four horizontal lines of the same length

9-11: one vertical line, horizontal clusters of two to three vertical lines of the same length

Tangut characters do contain single dots (1), horizontal lines (5), and vertical lines (9), but the combinations of those elements (2-4, 6-8, 10-11) are unknown in Tangut. (When two horizontal lines are vertically stacked in Tangut, the top line is shorter than the bottom line, whereas 6 has lines of equal length.)

Why include these un-Tangut 'tally marks'? Andrew wrote (emphasis mine):

At present it is a mystery to me as to what these thirty "letters" are intended to represent, and whether they represent rhymes in Tangut or some other language.

What if 3583 1ta4 is the name of a language? A name related to the source of the exonym Tangut (which does not resemble any Tangut autonym)? What if the 'tally marks' were used to write 'Ta' but not Tangut? What if Ta and Tangut had scripts with partially overlapping components?

I am reminded of Shong Lue Yang who invented the Pahawh Hmong and Pahawh Khmu scripts for Hmong and Khmu. As far as I know, Pahawh Khmu has disappeared without a trace. William Smalley (1990: 195) wrote:

All we know about the form of the Pahawh Khmu' is that Chia Koua Vang has seen it, and that the letters were like the Pahawh Hmong letters, which leaves us no way of evaluating it as a writing system. It may not have been preserved.

Hmong and Khmu have very different phonologies, so I assume that Pahawh Khmu had graphemes absent from Pahawh Hmong and vice versa. Would a list of graphemes of both of Shong Lue Yang's scripts resemble the Tangut list of 'thirty letters' in the sense that it would be a mixture of familiar and alien components?

MONOMASTICS

Old Chinese, Written Tibetan, Old Burmese, and Tangut all have the same vowels in 'one' and 'name' (hence the title). Why is Pyu the odd man out?

| Gloss | Old Chinese | Written Tibetan | Old Burmese | Tangut | Pyu |
| --- | --- | --- | --- | --- | --- |
| one | *Cɯ-tek | gcig | tac | 1lew1 < *Cʌ-tek | taṃ |
| name | *Cɯ-meŋ | ming | mañ | 2me'4 < *Cɯ-meXH | mi |

The phonetic value of the Pyu grapheme that I transliterate as ṃ is uncertain.

Here are four possible explanations for why Pyu has different vowels in those two words.

1. The heights of a and i were conditioned by different presyllabic vowels. If pre-Pyu had *low and *high presyllabic vowels in 'one' and 'name', the main vowels might have harmonized with them: e.g.,

*Cʌ-tek > taṃ (*e lowered to *a after low *ʌ)

*Cɯ-meŋ > mi (*e raised to *i after high *ɯ)

2. Matisoff (2003) reconstructed Proto-Tibeto-Burman *tyak as well as *g-t(y)ik. Putting aside the problem of whether PTB even existed (I don't think it did), one might say that Pyu taṃ is from *tyak (which in Indo-European-like terms could be an a-grade form of *tik).

3. The different vowels in Pyu reflect the presence or absence of something corresponding to the mysterious pre-Tangut feature that I write as *X and conventionally place after the vowel, though I do not know its location.

4. Pyu had asymmetrical developments of vowels after homorganic codas. Such asymmetrical development later occurred in Burmese:

-ac > [iʔ]

-añ > [e] and [ɛ] as well as [ĩ] and [i]

Mandarin also has similar instances of asymmetrical development: e.g.,

*-ak > -e, -o, -uo (generally depending on initial, but note how 樂 *lak became le whereas 洛 *lak became luo; is this apparent split due to dialect mixture?)

*-aŋ > -ang

In these particular examples, vowels before *nasals are lower than those before *stops: cf. how French -in has a vowel lower than the vowel in French -it.

However, Burmese also has an example of the opposite phenomenon:

*-ak > [ɛʔ] (which has no nasal counterpart [ɛ̃])

*-aŋ > [ĩ]

Has anyone studied asymmetrical development across languages?

8.12.4:48: A fifth possibility is that Pyu preserves a vocalic distinction lost in the other four languages.

ORIENTALISTA ÉS MONGOLISZTIKA

I saw those words in the Hungarian Wikipedia entry on Kara György (a.k.a. George Kara) and wondered

- when is foreign s borrowed as Hungarian s [ʃ], and when is it borrowed as sz [s] - do the two correspondences indicate different strata?

- why is foreign s treated differently in orientalista 'Orientalist' and mongolisztika 'Mongol studies' - or mongolista 'Mongolist' and germanisztika 'German studies', etc.?

DID FINAL *-P CONDITION LABIAL FLIGHT IN TANGUT?

So far, I have been using 'labial flight' to refer to the loss of labiality in pre-Tangut syllables with labial onsets and codas such as *mbjvm 'to fly'. The labial codas in my examples - actual or hypothetical - have been either *-m or *-w (< *-k, *-ŋ). But pre-Tangut had one more labial coda: *-p. Did it also condition labial flight?

Here is the fate of pre-Tangut *-p according to Guillaume Jacques (2014: 206):

1. *-ap > -a (lost completely)

2. *-ip, *-up > (*-əp?) > (= my -y; lost completely; I propose a merged intermediate stage *-əp)

3. *-op > -ew (lenited to -w; labial *o dissimilated to e before a labial - a vocalic variety of labial flight)

Both Guillaume and I reconstruct the same six vowels (*u *i *a *ə *e *o) for pre-Tangut. In theory, one might expect *-ep and an *-əp (that merged with *-ip and *-up?), but Guillaume only reconstructed three possible codas for *e (*-ej, *-en, *-eŋ; see p. 207) and no codas for *-ə. The unbalanced distribution of vowels and codas in his pre-Tangut deserves further study. (I never worked out all the possible combinations of vowels and codas in my pre-Tangut.)

If labial flight - of the consonantal type - is real, I would expect *Pop to become Pe1 (cf. *mew > 1me1 'eye'). *Pjop might merge with *Pjaŋ and become Cwo3 (cf. *mbjvm > 1jwon3 'to fly').

Guillaume identified only one example of *-op:

3299 1lwew1 < *P-lop 'vapor'; cf. Japhug tɤ-jlɤβ, Situ ta-jlôp < *jlɔp

The pre-Tangut prefix and onset are mine; Guillaume only reconstructed the rhyme *-op. My pre-Tangut *P- conditions Tangut medial -w-.

Next: Did syllables like *P(j)op exist in pre-Tangut?

A CHRONOLOGY OF LABIAL FLIGHT IN TANGUT

In my previous post, I made a few references to the order of changes that Guillaume Jacques (2014: 199) and I proposed. Perhaps a table would be easier to understand:

| Type of 'labial flight' | *Pj-m ('to fly') | *P-w < *P-k ('eye') | *Pjaŋ (examples?) | *PV-jvm (examples?) | *PV-jaŋ (examples?) |
| --- | --- | --- | --- | --- | --- |
| Stage 1: early pre-Tangut | *mbjvm | *mek | *Pjaŋ | *PV-jvm | *PV-jaŋ |
| Stage 2: velar coda lenition | | *meɣ > *meɰ | *Pjaɰ | | *PV-jaɰ |
| Stage 3: labialization of glide | | *mew | *Pjaw | | *PV-jaw |
| Stage 4: labial initial-coda dissimilation | *ǰwvm | *mej | *Cwaw | | |
| Stage 5: presyllable-initial fusion | | | | *Pjvm | *Pjaw |
| Stage 6: Tangut | 1jwon3 | 1me1 | Cwo3 | Pon4 | Po4 |


The five types: So far I only know of one example each for the first two ('to fly' and 'eye'). The other three are theoretical. *P represents any labial consonant.

Stage 1: *v is Guillaume's notation for a non-*i-vowel. Japhug has o and Proto-Lolo-Burmese has *a in 'to fly', so I think the pre-Tangut word might have been *mbjom or *mbjam.

Stages 2-3: The weakening of velar codas may have also occurred in the northwestern Chinese dialect known to the Tangut.

The velar glide *ɰ, which is rare (found in only 2.66% of UPSID's languages) and occurred only in coda position, shifted to the more common labial glide *w, which could also occur in other positions.

Stage 4: Dissimilation only occurred within the same syllable. Presyllabic labial onsets followed by syllables ending in labials remained intact.

I use the symbol *ǰ to represent a pre-Tangut affricate that could have been [dʑ], [dʒ], or [dʐ]. I think pre-Tangut palatals became retroflexes at some point before stage 6. *C represents the consonants *č, *čh, and *ǰ that became Class VII initials in Tangut.

The glide in *Cw- from *Pj- could have been phonetically [ɥ] if preceded by a palatal onset.

Stage 5: *PV-j- fused into *Pj-, filling the void left by *Pj- that dissimilated to *Cw-.

Stage 6: *-m nasalized the preceding vowel before being lost. *-m also conditioned the rounding of nonlabial vowels preceding it: e.g., *-am > *-om > -on [õ].

1me1 might have still ended in a glide [j].

The monophthongization of *-aw has partial parallels in the northwestern Chinese dialect known to the Tangut. See Gong (2002: 374-376) for details.
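
The derivations in the table above can be replayed mechanically as ordered rewrite rules. The following is only a minimal Python sketch under stated assumptions: it uses a toy notation (P = any labial, C = any Class VII initial, V = a presyllable vowel, v = a non-*i vowel), the function names and regular expressions are illustrative choices rather than an established tool, the intermediate *-ɣ of Stage 2 is skipped, and Stage 6 (tones, grades, and rhyme changes) is left out entirely.

```python
import re

# Toy notation (an assumption of this sketch): P = any labial,
# C = any Class VII initial, V = presyllable vowel, v = non-*i vowel.
LABIAL = r"(?:mb|m|p|b|P)"


def lenite_velar_coda(form):
    """Stage 2: final *-k / *-ŋ weaken to the velar glide *-ɰ."""
    return re.sub(r"[kŋ]$", "ɰ", form)


def labialize_glide(form):
    """Stage 3: *-ɰ > *-w."""
    return re.sub(r"ɰ$", "w", form)


def dissimilate(form):
    """Stage 4: only applies when labial onset and labial coda share a syllable."""
    if "-" in form:  # the labial sits in a presyllable, so nothing happens
        return form
    m = re.fullmatch(rf"({LABIAL})j(.*[mw])", form)
    if m:  # *Pj- > *Cw- (written ǰ for the voiced or prenasalized labials)
        onset = "ǰ" if m.group(1) in ("mb", "b") else "C"
        return f"{onset}w{m.group(2)}"
    m = re.fullmatch(rf"({LABIAL})([^j].*)w", form)
    if m:  # plain labial onset: the coda *-w becomes *-j instead
        return f"{m.group(1)}{m.group(2)}j"
    return form


def fuse_presyllable(form):
    """Stage 5: *PV-j- > *Pj-."""
    return re.sub(rf"^({LABIAL})V-j", r"\1j", form)


STAGES = [lenite_velar_coda, labialize_glide, dissimilate, fuse_presyllable]


def derive(form):
    """Return the form at Stage 1 followed by its shape after Stages 2-5."""
    history = [form]
    for change in STAGES:
        history.append(change(history[-1]))
    return history


if __name__ == "__main__":
    for start in ["mbjvm", "mek", "Pjaŋ", "PV-jvm", "PV-jaŋ"]:
        print(" > ".join("*" + step for step in derive(start)))
```

Each printed chain should agree with the corresponding column of the table through Stage 5. The new *Pj- forms survive only because fusion is ordered after dissimilation.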

7.31.0:41: I forgot to explain the grades:

- Pre-Tangut syllables with *-j- generally became Tangut Grade IV syllables. Exceptions with Grade III had Class VII initials (either primary or secondary).

In the past I have reconstructed Grade IV with a medial -i-, and Gong reconstructed Grade III (equivalent to my Grades III and IV) with a medial -j-. However, Tibetan transcriptions of Tangut do not strongly support a palatal interpretation of Grades III and IV.

- Pre-Tangut syllables with *e developed Grade I unless preceded by a high-vowel presyllable.

WAS DISSIMILATION THE MOTIVE FOR LABIAL FLIGHT IN TANGUT?

Last night, I forgot to mention why this unusual sound change proposed by Guillaume Jacques (2014: 199)

*mbj- > dʑ- (should this be dʑjw- = my jw-3?)

which could be formulated more generally as

*Pj- > Cw-3 or *Class I-j- > Class VII-w-3

might have occurred before any non-*i-vowel followed by *-m and *-aŋ.

With two exceptions of possible foreign origin below*, labials do not occur before -w in my Tangut reconstruction**:

2313 1pew4 'poor' (only in dictionaries) and 3412 2mew4 'the name Mew; transcription character for Sanskrit myak'

So I suspect that pre-Tangut had a constraint against *PVP syllables with labial onsets and codas.

Such a constraint also exists in modern Cantonese. Earlier *PVP sequences have become PVT: e.g.,

梵 Early Middle Chinese *buam > Cantonese faan

法 Early Middle Chinese *puap > Cantonese faat

In Cantonese, the coda became nonlabial, whereas in Tangut, the coda disappeared entirely (or at least became nonlabial) in

4684 1me1 ([mej]?) < *mew < *mek 'eye'

and the onset became a palatal-labial cluster in

2262 1jwon3 < *mbjvm 'bird/to fly'.

'Eye' indicates that dissimilation postdated the weakening of *-k to *-w.

Guillaume did not provide any examples of labials becoming palatals before *-aŋ. Given that *-aŋ became Tangut -o (Jacques 2014: 193), there might have been an intermediate *-aw phase that predated dissimilation:

Cwo3 < *Cwɔ < *Cwaw < *Cwaɰ < *Cwaŋ

The velar codas *-k (in 'eye') and *-ŋ may have merged into a velar glide *-ɰ that became a labial glide *-w conditioning dissimilation in labial-initial syllables.

*7.29.23:57: 2313 is a rare word without any known etymology. It may have been borrowed from my (hypothetical) substratum 'Tangut B' language after dissimilation (see above).

The name Mew written as 3412 may also be of Tangut B origin.

**7.30.0:18: Guillaume uses Gong's reconstruction which has far more -w than mine. Gong's -w corresponds to my -n (symbolizing nasalization and not a coda [n]) after o in his rhyme group XI (rhymes 56-60 and 97-98):

| Rhyme | Gong | This site |
| --- | --- | --- |
| 56 | -ow | -on1 |
| 57 | -iow | -on2 |
| 58 | -jow | -on3/-on4 |
| 59 | -ioow | -on'2 |
| 60 | -joow | -on'3/-on'4 |
| 97 | -owr | -orn1 |
| 98 | -jowr | -orn4 |

 However, Gong and I agree that his rhyme group IX (rhymes 44-49 and 93-94) had -w:

| Rhyme | Gong | This site |
| --- | --- | --- |
| 44 | -ew | -ew1 |
| 45 | -iew | -ew2 |
| 46 | -jiw | -ew3, -ew4 |
| 47 | | -iw3, -iw4 |
| 48 | -eew | -ew'1 |
| 49 | -jiiw | -iw'3, -iw'4 |
| 93 | -ewr | -ewr1 |
| 94 | -jiwr | -iwr4 |

2313 and 3412 are the only examples of labial-initial syllables in rhyme group IX.

SEVEN FROM ONE?: LABIAL FLIGHT IN TANGUT

In my last entry, I asked which meaning of

2262 1jwon3 'bird/to fly'

was older. It seems that 'to fly' might be older, since Guillaume Jacques (2014: 199) compared  2262 1dʑjow (sic; should be 1dʑjwow) = my 1jwon3 to

(the last syllable of?) Japhug nɯqambɯmbjom 'to fly'

Proto-Lolo-Burmese *(b)-yam 'to fly'

Written Burmese pyaṃ 'to fly'

and reconstructed pre-Tangut *mbjvm (in which *v could be any vowel other than *i). Although *mbj- would normally become bj- (= my b- + Grade IV), Guillaume proposed the sound change

*mbj- > dʑ- (should this be dʑjw- = my jw-3?)

before *-vm and *-aŋ. He noted there was no Tangut *bjow (= my *bon4) and few examples of -jow (= my -on3/-on4) after labials. I know of only two examples:

5954 2porn4 'luxuriant, exuberant' and 0421 2phon4 (mantra transcription character)

0421 is not for native words, so it has no pre-Tangut source.

On the one hand, if *mbj- became j-, wouldn't other *Class I (labial)-j-sequences also become Class VII initials*?

On the other hand, if Class I (labial)-j-sequences became Class VII initials, why does 5954 still have a labial initial?

I propose the following changes  to solve that conundrum:

1. *pj- > chw-3 (2331 2chwon3 'to contribute'?)

2. *(m)bj- > jw-3 (2262 1jwon3 'to fly')

3. *pV-j- > *pj- > p-4 (5954 2porn4 'luxuriant, exuberant'?)

New *Pj-sequences from old *PV-j- sequences replaced old *Pj-sequences that became *Cw-sequences (*C = Class VII initial):

| Stage 1 | Stage 2 | Stage 3 |
| --- | --- | --- |
| *Pj- | *Cw- | Cw-3 |
| *PV-j- | *Pj- | P-4 |

*phj- and *mj- would hypothetically become chhw-3 and nw-4 (via *ɲw-), but Tangut has no *chhwon3 or *nwon4.

*(m)bV-j- would hypothetically become b-4, but Tangut has no *bon4 because presyllables probably did not have voiced or prenasalized initials.
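
The proposal above depends on rule ordering: dissimilation has to run before presyllable fusion, or the new *pj- would also lose its labial. Here is a minimal Python sketch of that point. The function names, the toy spellings (V = presyllable vowel, v = non-*i vowel), and the sample inputs are illustrative assumptions, and tones and grades are ignored.

```python
import re

# Class VII outcomes of labial onsets, following changes 1 and 2 above.
CLASS_VII = {"p": "ch", "b": "j", "mb": "j"}


def dissimilate(form):
    """*Pj- > *Cw-: an old labial + *-j- onset becomes Class VII + -w-."""
    m = re.match(r"(mb|p|b)j(.*)", form)
    return f"{CLASS_VII[m.group(1)]}w{m.group(2)}" if m else form


def fuse(form):
    """*pV-j- > *pj-: the presyllable vowel is lost."""
    return re.sub(r"^pV-j", "pj", form)


def proposed_order(form):
    """Dissimilation first, then fusion (the order proposed in this post)."""
    return fuse(dissimilate(form))


def reversed_order(form):
    """Fusion first, then dissimilation (shown only for contrast)."""
    return dissimilate(fuse(form))


if __name__ == "__main__":
    for form in ["pjvm", "mbjvm", "pV-jvm"]:
        print(form, "->", proposed_order(form), "| reversed:", reversed_order(form))
```

In the proposed order, the toy form pV-jvm keeps its labial (pjvm), as 5954 does. In the reversed order it would merge with old *pj- as chwvm, which is why the two changes cannot be swapped.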

7.29.12:19: If 0421 were a native word, I could propose

4. *phV-j- > *phj- >  ph-4

but I doubt that aspirates were permissible in presyllables. I expect presyllables to only have a subset of segments that are permissible in the syllables that follow them.

*Guillaume follows Gong and reconstructs Class VII as palatal, but I prefer to regard it as retroflex. My notation is not IPA and can accommodate either interpretation: e.g., j- may be palatal [dʑ] or retroflex [dʐ].

ENGLISH FLIES FLY; TANGUT BIRDS BIRD

While looking for examples of the Tangut directional perfective prefix 1a0- 'up-', I found this example from volume 10 of the Tangut translation of the Golden Light Sutra in Li Fanwen (2008: 942):

1364 1136 5981 2262 4342 2511 1nga1-2gu1 1a0-1jwon3 2da4-2ryr4

'void-in PERFup-bird/fly PERFaway-go out/arise' = '... flew up and away into the air'

It corresponds to Chinese 空中飛騰而去, lit. 'void-in fly-rise and leave' (see the context here).

2262 1jwon3 can be either a noun 'bird' or a verb 'to fly'. Which meaning is primary? Which meaning is older? (The answer to those two questions may not be the same; a newer usage can outnumber an older one.)

5981 1a0 can also mean 'one' before nouns. Can 5981 2262 1a0 1jwon3 ever mean 'one bird' instead of 'flew'? If we did not have the Chinese edition, would it be possible to translate that line as 'a bird rose into the air'?

I would like to see more examples of PERF-V PERF-V sequences.

A COLLECTION OF DESIRABLE DIRECTIONS

Thanks to Andrew West for drawing my attention to 3349 (last seen here) in this line from the preface to the Pearl in the Palm:

1319 3349 5981 4018 1326 0478 1tshi1 2rer4 1a0 2chhi3 1ky4-1sho'2

'? ? one root/basic/book PERF-collect'

Nishida (1964: 187):


hitsuyō-na kotogara wo ikkon ni atsumeta.

'collected important matters into one root.' (my translation of his Japanese)

'All the important aspects have been gathered into this one basic text' (the English translation accompanying his Japanese translation)

The last four words are straightforward: 'collected a book'. 5981 1a0 here is 'one'* (or - by coincidence in English - 'a'!) and not the directional perfective prefix 'up'.

The first two words are more troublesome.

Nishida (and Kychanov and Arakawa 2006: 462) regarded 1319 1tshi1 as an adjective 'important' (though adjectives normally follow rather than precede nouns in Tangut!), whereas Li Fanwen (2008: 220) regarded it as a verb 'to desire, want'.

It doesn't make sense to interpret 3349 2rer4 as 'direction' after 'important' or 'to desire, want'. Nishida translated it as 'aspects' in English and treated it as the object of the verb in his Japanese translation. There is no Tangut postposition corresponding to the Japanese locative postposition ni in his translation.

I would like to see more examples of constructions like this.

*7.27.1:01: It is curious that Tangut shares a 'one' with the Qiang languages but not with Pumi which may be its closest living relative according to Jacques (2014).

MANUAL METAL ACTION?: TANGUT 2CHYR'3 'TO SHOOT'

When looking up 3468 in Li Fanwen (2008), I found his neighboring entry for


3471 2chyr'3 'to shoot' (= left of 3485 1laq1 'hand' + 'metal' (< top of 1shon3 'iron') + 5113 1vi3 'to do'?)

whose analysis is unknown. (Above is my guess. The combination of elements on the right side of 3471 is unique to that tangraph.)

Li listed 3471 as a Chinese loanword. But the closest word in the Chinese dialect known to the Tangut was 射 *3sha3 < *3zha3 < *m-lak-s 'to shoot', and it would have been borrowed as *sha3 or *zha3 (omitting unpredictable tones), not 2chyr'3 which I assume is a native word from pre-Tangut *RcəXH or *cərXH:

- *R- could be a dental stop or *l- that lenited to preinitial *r- as well as *r-; *r- and *-r conditioned retroflexion of the vowel before disappearing

- *c could have been *[c], *[tɕ], *[tʃ], or *[tʂ] (though I suspect retroflexion was a late phenomenon in Tangut)

- *-X symbolizes the source of the unknown phonetic quality that I transcribe with the prime symbol (-')

- *-H is the glottal source of the second ('rising') tone

I can't narrow down the possibilities because I can't find any strong candidate for an outside cognate. Pumi has khətʂhɑ (with tones depending on variety) 'to shoot' with aspiration absent in Tangut (my ch may have been unaspirated [tʂ]) and nothing corresponding to Tangut vowel retroflexion. Pumi may be Tangut's closest living relative (Jacques 2014); its sound correspondences with Tangut remain to be explored. (Unfortunately, there are no Pumi words ending in low back -ɑ with proposed Tangut cognates in Jacques' book.)

Does 3471 have any internal cognates in Tangut? Let's look at its (near-)homophones from Homophones A 35B43-35B54:

| Homophones | Tangraph | Li Fanwen number | Reading | Gloss |
| --- | --- | --- | --- | --- |
| 35B43 | | 1349 | 2chyr'3 | first half of 2chyr'3-2lu1 'sage' |
| 35B44 | | 1783 | | 'five' (in the 'ritual' language which I suspect was a non-Sino-Tibetan substratum language) |
| 35B45 | | 2803 | | the surname Chyr |
| 35B46 | | 3267 | | skill, artistry |
| 35B47 | | 3482 | 1chyr'3 | to pare |
| 35B51 | | 3483 | | to attack (only attested in Homophones?) |
| 35B52 | | 2321 | 2chyr'3 | afraid, scared |
| 35B53 | | 3826 | | to twine, wind, tie up; < *RcəXH or *cərXH |
| 35B54 | | 5223 | | half of 2chyr'3 1geq4 ~ 1geq4 2chyr'3 'constellation'; first half of the name of the Tangut ancestor 2chyr'3 2jwa3 |

None have anything to do with shooting.

Near-homophones without 'prime' don't have any semantic similarity to 3471:

2176 1chyr3 'to tie' (< *Rcə or *cər; cognate to 3826 above)

1359 the second half of 2phy1 2chyr3 'conceited'

Ah, I think I found the root of 3471:

5245 1chy < *cə 'to draw a bow' (only attested in dictionaries?)

3471 may be 5245 plus a prefix *R- and an affix *X (I don't know if *X is a prefix or suffix, though I conventionally write it as a suffix since I have to put it somewhere). So I can reject my earlier *cərXH since *-r is not a suffix.

Lastly, how do we know 3471 means 'to shoot'? It is apparently only attested in Homophones, where it is preceded by the clarifier

5710 1liq4 'arrow' < *S-li (cognate to Old Chinese 矢 *l̥iʔ 'arrow'?)

so I suppose it's been assumed that 3471 is a verb since Tangut has object-verb order. However, I don't know how one can be certain that 3471 means 'to shoot' and not, say, 'to pull out of a quiver'. 3471 might even be a noun like 'quiver' modified by 5710.

I am skeptical of definitions of Tangut words known only from Homophones unless they have clarifiers like 'name' which leave no room for interpretation.

LOOKED AROUND AT DECORATIONS

I was wondering if the verb 2khu'4-2rer4 'watch-direction' from my last post was a hapax legomenon. Thanks to Andrew West for pointing out that it occurs on the last page of the last ode:


3468 3457 4342 2258 3349 1vir1 1siw4 2da4-2khu'4-2rer4 '(...decoration?) new PERF-watch-direction'

Unfortunately, the surrounding characters were lost due to damage.

I suspect the character before 3468 is 5371, as 3468 is the second half of

5371 3468 1taq4 1vir1 'decoration', 'to be decorated' (see Kychanov and Arakawa 2006: 634)

and I do not know if 3468 can occur by itself*.

In any case,  4342 2258 3349 looks like a perfective verb 'looked around', and 3457 'new' modifies its object (5371?) 3468 'decoration' (?).

Nishida (1986) interpreted 4342 as 'inward', but Gong's (2003) 'away from the speaker' fits 'looked around' better.

It occurred to me today that 3349 might be a verb ('to direct'), so 2258 3349 would be a verb-verb rather than a verb-object sequence. But both Kychanov and Arakawa (2006: 313) and Li Fanwen (2008: 543) list it only as a noun. Does 2258 3349 reflect an earlier period when 3349 could also be a verb?

*Li Fanwen 2008: 562 lists no examples of 3468 in isolation other than dictionary definitions. Li Fanwen 2008: 847 gives the impression that 5371 is almost always followed by 3468; the one exception is

0542 5371 2shwo3 1taq4

which he defined as 嚴飾, interpreted as a verb 'to decorate' by Kychanov and Arakawa (2006: 429).

LOOKING IN FOUR DIRECTIONS: THE TANGUT VERB-OBJECT COMPOUND 2KHU'4-2RER4

While preparing for part 3 of "Grokking Up", I saw this phrase in Li Fanwen (2008: 374)*:

4684 2205 3349 2258 3349 1me1 1lyr'3 2rer4 2khu'4-2rer4 'eye four direction watch-direction'

It caught my eye (pun unintended!) because 2rer4 'direction' appears twice, though there is only one 'direction' in Li Fanwen's Chinese translation 目視於四方, lit. 'eye look in four direction'.

Kychanov and Arakawa (2006: 316) regard 2khu'4-2rer4 'watch-direction' as a verb 'look from side to side; look around'. Those glosses make sense in this context. Does this verb occur in other texts?

Tangut is a verb-final language, so I am surprised that a compound verb would have a verb-object structure instead of an object-verb structure. Are there other verbs of that type? Could the first four words be modifying the noun 'direction': 'the direction from which the eye watches the four directions'?

*7.25.0:25: Li Fanwen gives the source of this phrase as Tangraphic Sea 67.113 (i.e., the third entry in column 1 of side 1 [= the right side] of page 67), but it's not there. I assume 67.113 is a typo.

GROKKING UP TANGUT PERFECTIVE PREFIXES (PART 2: WRITING 'UP')

Given that the Tangut script has a reputation for being largely semantically based, it is curious that the seven characters for directional perfective prefixes do not share a common graphic denominator. Nor do they incorporate parts of characters for directions. For instance,


5981 1a0- 'up-', 'one' = left of 5951 1a0, first half of 1a0 1chwa3 'boots worn in mud' +  3654 1a0, first half of 1a0 1shy2 'monk' / a surname / kinship term prefix

does not have any components in common with, say,

1890 2be4 'high', 2612 2phu4 'up, above, over', or 2750 1ghu2 'head' (i.e., something on top)

The Tangraphic Sea analysis of 5981 (above) is circular, as its supposed sources 3654 and 5951 are in turn derived from 5981:


3654 = left of 3119 1i4 'many' + all of 5981


5951 =  left of 5981 + left of 1321 1ziq4 'boots'

Both 3654 and 5951 are phonosemantic compounds: 'person' (the left side of 'many') + a and 'boots' + a.

5981 in turn shares a phonetic left-hand component with 5951. Could that component (Boxenhorn alphacodes: cil/cur) be derived from the left-hand side of Chinese 阿 (1a1 in the northwestern dialect known to the Tangut)? That would make it a distant cousin of the Japanese katakana character ア which is also derived from the left-hand side of Chinese 阿.

7.24.13:27: Both 5981 and 

4541 1a0 (Sanskrit a)

transcribed Sanskrit long ā (Arakawa 1997: 112). However, there was also a special character


4623 2a'2 = 4541 + 0443 'long'

for Sanskrit long ā, and 4541 normally represented Sanskrit short a. Moreover, 5981, 4541, and 4623 belonged to different homophone groups in Homophones and had different fanqie in Tangraphic Sea.  I conclude that 4541 sounded most like Sanskrit a* and that 5981 differed somehow: e.g., it may have been 1a4 whereas 4541 may have been 1a1 or even 1a2 (if it had the same grade as 4623 2a'2). (-0 in the readings of 5981 and 4541 indicates an unknown grade. The grades of the fanqie final spellers of 5981 and 4541 are unknown:

0165 1ha0 [Sanskrit hi, he, hye - sic!] and 4475 1ha0 [Sanskrit ha and hā].

The use of 0165 for Sanskrit front-vowel syllables implies that its rhyme - and hence the rhyme of 5981 - was palatal: i.e., Grade IV.)

It is tempting to assume that some aspect of 2a'2 absent from 1a0 - the second tone, the 'prime' quality of the rhyme (transcribed as -'), and/or Grade II - was associated with length, but many Tangut transcriptions of Sanskrit syllables with long vowels lack most or all of those qualities: e.g.,

3948 and 3985 1ka'4 for Sanskrit kā and 5299 1ta1 for Sanskrit tā

Unlike Gong and Arakawa, I doubt that vowel length played a role in the complex Tangut vowel system. If Tangut had long vowels, they would have systematically corresponded to Sanskrit long vowels in transcriptions.

*Or to be more precise, the pronunciation of Sanskrit a known to the Tangut. In the Indian phonetic tradition, Sanskrit a was [ə], but the Tangut probably heard something like [a] because they would have borrowed [ə] as the central vowel that I transcribe as y, not a.

THE <ɃI⁝>-GINNING OF THE MYAZEDI INSCRIPTION

Writing about Tangut perfective prefixes made me wonder if Pyu had a perfective prefix. An obvious candidate for such a prefix could be transliterated as <ƀi⁝>.

The word is problematic even on the level of transliteration:

- does Pyu have a <b> : <ƀ> distinction?

- are three dots in Pyu equivalent to an overdot-'colon' sequence?

- is the overdot a nasal like anusvāra or something else? The fact that it also occurs in <tȧ> 'one' and <hrȧ> 'eight' corresponding to Old Burmese <tac> and <het> suggests that it might stand for a stop.

- is the 'colon' a fricative like visarga or something else?

Here are the first two occurrences of <ƀi⁝> in the Pyu A text of the Myazedi inscription (following Blagden's 1919 analysis):

1 ||| siri || dathagạda ƀa dọ ƀȧ: ƀi⁝ pdụ̄ sgu dạ: ƀa tva M DC
prosperity Tathagata ? ? HON? ? achieve or enter [nirvana]? establish [a religion]? ? ? ? ? thousand six hundred
2 XX hrȧ u sni: ƀi⁝ tvạ: thada ||  
twenty eight GEN? year ? elapse PAST?

The verb after the first <ƀi⁝> (if that <ƀi⁝> is a verbal prefix and not an unrelated homophone) does not seem to correspond to any of the verbs in the other three languages of the Myazedi inscription (see Appendices 1-3 below).

Maybe <pdụ̄ sgu dạ: ƀa tva> is a sequence of a verb followed by 'since'. Perhaps that verb was intransitive: e.g., 'die' (a multi-word honorific euphemism?) or 'rise'. I don't think that verb was transitive because I would expect its object to precede it, and I would not expect 'nirvana' or 'religion' to have an honorific suffix which is generally otherwise an honorific prefix (!) for people (or images of them) in this inscription. Given the large number of Indic loans in Pyu, I would be surprised if there was a native term for 'nirvana' or 'religion' (unless the latter were 'teaching'). I think <ƀȧ:> might have originally been a noun, and <ƀa dọ ƀȧ:> might be a native title for the Buddha corresponding to Old Burmese <purhā skhaṅ> (see Appendix 2) or Old Mon <kyek ... tirley> (see Appendix 3).

Could <pdụ̄ sgu dạ: ƀa tva> be a prefix-object-verb(-'since'?) sequence with 'nirvana' or 'religion' somewhere in it? (7.23.2:48: Cf. noun incorporation between directional perfective prefixes and verbs in Tangut. See Jacques 2014: 266. If Pyu did have incorporation, it is likely to have developed independently, as Pyu <ƀi⁝-> does not look like a cognate of any Tangut directional perfective prefix other than 2vy3- whose v- may or may not be from a lenited labial stop.)

Could <ƀi⁝-tvạ:-thada> be analogous in structure to Russian pro-sh-lo 'PERF-go-PAST' = 'passed'?

Does <ƀi⁝-> correspond to the Old Burmese indefinite past suffix <liy> (see Appendix 2)?

Does <-thada> end sentences, or is it a continuative suffix like Old Burmese <brī rakā> (see Appendix 2)?

APPENDIX 1: THE START OF THE PALI A TEXT (Duroiselle 1919; the glosses are mine; I don't know what anārikaṃ means or what vā [normally 'or'] is doing)

1 || śrī || buddhādikaṃ vatthuvaraṃ namitvā puññaṃ kataṃ yaṃ jinasā-
prosperity Buddha-beginning with object-excellent bowing merit work REL conquered-
2 -sanasmiṃ anārikaṃ rājakumāranāmadheyyena akkhā-   
-religion dispensation? in the name of Rājakumāra relate
3 -mi sunātha me taṃ || nibbānā lokanāthassa aṭṭhavī-  
-I hear me CORREL nirvana world-lord eight-
4 -sādhike gate sahasse pana vassānaṃ chasate vā pare ta-
-twenty-and gone thousand and years six-hundred or? before thus
5 -thā ||

Duroiselle's translation: 'Prosperity! Having bowed to the Buddha and the other (two) Excellent Objects, I shall relate the noble work of merit performed, in the Conqueror's dispensation, by Rājakumāra. Hearken to me! When one thousand six hundred and twenty-eight years had elapsed after the Nirvāṇa of the Lord of the World [...]'

APPENDIX 2: THE START OF THE OLD BURMESE TEXT


1 || śrī || namo buddhāya || purhā skhaṅ sāsana anhac ta-
prosperity honor Buddha exalted personage lord religion year one
2 -c thoṅ khrok ryā nhac chāy het nhac lon
  thousand six hundred two ten eight year elapse
3 liy brī rakā ||

Duroiselle's translation: 'Prosperity! Honour to the Buddha! One thousand six hundred and twenty-eight years of the Buddha's religion having elapsed [...]'

APPENDIX 3: THE START OF THE OLD MON TEXT


1 || śrī || [n]amo b[u]ddhāya || śrī || sās kyek buddha tirley
prosperity honor Buddha prosperity religion worshipful person Buddha lord
2 kuli ār moy lṅim turow k[l]aṃ ḅār cwas diñcām cnām  
go on go/AUX one thousand six hundred two ten eight year
3 tuy ||  

Blagden's translation: 'Prosperity! Honour to Buddha! Prosperity! After the religion of the Lord Buddha had gone on for one thousand six hundred and twenty-eight years [...]'

GROKKING UP TANGUT PERFECTIVE PREFIXES (PART 1: OVERVIEW)

Tangut has a set of directional perfective prefixes that remind me a bit of perfective prefixes of prepositional origin in Slavic and adverbs of prepositional origin in English. Arakawa's Studies on the Tangut Version of the Vajracchedikā-prajñāpāramitā (2014: 149) reproduces Nishida's (1989) list of directional perfective prefixes with the addition of a seventh prefix in a footnote:

Direction (Arakawa 2014: 149, based on Nishida 1989) | Direction (Gong 2003) | Tangraph | Arakawa reading | This site | Arakawa's (2014) notes on usage in the Tangut version of the Vajracchedikā-prajñāpāramitā summarized | Frequency in Sea of Meaning (Arakawa 2015: 18) | Arakawa's (2015) notes summarized
upward | upward | | 1a?- | 1a0- | high frequency (p. 149) | 16 | " 'upward' in many cases"
downward | downward | | 1na:- | 1na4- | low frequency (p. 149) | 12 | " 'downward' in most cases"
here, toward the speaker | here, inside | | 1kI:- | 1ky4- | low frequency (p. 150) | 40 | "might be 'inside' in some cases"
there, away from the speaker | there, outside | | 2wI:- | 2vy3- | used with adverbs indicating the past (e.g., 1pI: 2no: 'long ago' = my 1py4 2no4) and in the word 2wI: 2rar 'past' (= my 2vy3 2rar1; prefixed to a verb 'to pass'; cf. English past; p. 150) | 29 | "Probably [...] 'outside' "
upriver; inward | away from the speaker | | 2da:- | 2da4- | used with various verbs without any common denominator (p. 150) | 43 | "Here, the tendency is 'away from the speaker or agent', 'not accessible', and 'to leave, not to return' [...] In some cases, the verb following 2da:- seems to be 'unhappy'."
downriver; outward | "direction not found" | | 2rI:r- | 2ryr4- | often with verbs of speaking but also with 'to come' and 'to arrive' (p. 151) | 26 | "difficult to determine the direction"; "precedes some verbs related to vocal acts"
(not given) | towards the speaker | | 2dI:- | 2dy4- | often with verbs of taking unlawfully by force; rare (p. 152) | 4 | "so rare"; direction "uncertain"
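
As a rough illustration, the Sea of Meaning counts in the table can be tallied in a few lines of Python. The variable names and the use of my readings as dictionary keys are conveniences of this sketch, not anything from Arakawa; the numbers are simply those listed above from Arakawa (2015: 18).

```python
# Frequencies of the directional perfective prefixes in The Sea of Meaning,
# copied from the table (Arakawa 2015: 18). Keys are this site's readings.
sea_of_meaning_counts = {
    "1a0-": 16,
    "1na4-": 12,
    "1ky4-": 40,
    "2vy3-": 29,
    "2da4-": 43,
    "2ryr4-": 26,
    "2dy4-": 4,
}

total = sum(sea_of_meaning_counts.values())

# Sort from most to least frequent.
ranked = sorted(sea_of_meaning_counts.items(), key=lambda item: item[1], reverse=True)

for prefix, count in ranked:
    # Show each prefix with its raw count and its share of all prefixed forms counted.
    print(f"{prefix:7} {count:3} {count / total:6.1%}")
```

The ranking only describes that one text; it says nothing about the Vajracchedikā-prajñāpāramitā, for which no counts are available.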

(Thanks to Mahādātṛ for Arakawa's 2015 article "On the Tangut verb phrase in The Sea of Meaning Established by the Saints".)

Arakawa and Gong agree on the functions of the first two prefixes. Their interpretations of the second two partially overlap, and their views on the last three are very different.

The -0 in my reading of the first prefix indicates that its grade is unknown. The other prefixes all belong to Grade IV (indicated by -4) except for the fourth prefix, which has Grade III (indicated by -3; Grade IV cannot occur with v-). The significance of this skewing is unknown. (7.22.0:50: If Tangut and Chinese grades have similar origins, then Tangut Grades III/IV may have developed in unmarked syllables, just as Chinese Grade III and chongniu Grade IV developed in nonemphatic [i.e., unmarked] syllables. I expect affixes to tend to be phonologically unmarked. I believe Old Chinese grammatical morphemes tend to be nonemphatic.)
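
For readers who want to handle my readings mechanically, here is a minimal sketch of how the notation could be split into tone, segments, and grade. The function name, the regular expression, and the label table are my own assumptions for illustration; in particular I assume that tones are only 1 or 2 and that the final digit runs from 0 (grade unknown) to 4.

```python
import re

# Leading digit = tone, trailing digit = grade, optional final hyphen = prefix marker.
READING = re.compile(r"^([12])(.+?)([0-4])-?$")

# 0 is used on this site for 'grade unknown'; the rest are Grades I-IV.
GRADE_LABELS = {0: "unknown", 1: "I", 2: "II", 3: "III", 4: "IV"}


def parse_reading(reading: str):
    """Split a reading such as '2vy3-' into (tone, segments, grade label)."""
    match = READING.match(reading)
    if match is None:
        raise ValueError(f"not in tone-segments-grade form: {reading!r}")
    tone, segments, grade = match.groups()
    return int(tone), segments, GRADE_LABELS[int(grade)]


for prefix in ["1a0-", "1na4-", "1ky4-", "2vy3-", "2da4-", "2ryr4-", "2dy4-"]:
    print(prefix, parse_reading(prefix))
```

Run over the seven prefixes, this makes the skewing easy to see: one unknown grade, one Grade III, and five Grade IV.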

The prefixes have either a or y (phonetically a nonlow central vowel like schwa). They may have been unstressed and therefore had only the achromatic (i.e., neither palatal nor labial) subset of the Tangut vowel system.

Arakawa (2014) did not supply frequency statistics for perfective prefixes in the Vajracchedikā-prajñāpāramitā. Nonetheless, it is clear from his text that their frequencies do not match those in The Sea of Meaning: e.g., the most common prefix in the Vajracchedikā-prajñāpāramitā is 1a0-, whereas it is 2da4- in The Sea of Meaning. I do not know whether this difference is due to geography, chronology, and/or genre. It would be interesting to see how individual verbs are prefixed in those two texts and others. My dream is to have a Tangut verb dictionary containing all attested affixes with text-specific frequency data. We are still far from being able to say that we have

1a0-2tse4-2ni4 'understood' = lit. 'up-understand-PL' (= the "Grokked Up" of the post title*)

how Tangut verbs work. The outlines have been established; the details remain unclear.

*7.22.0:28: Based on Vajracchedikā-prajñāpāramitā 18.4

1a0-2tse4-2nga1 'I understood' = lit. 'up-understand-1S'

with the suffix changed.

WHAT IS THE ORIGIN OF ROHINGYA TONES?

Are the three tones described in this Unicode proposal for the Rohingya script due to Burmese influence? What conditioned each tone? The low frequency of the tonal signs suggests that tones may have arisen as compensation for lost low-frequency segments or segmental features. What is the pitch associated with the absence of a tonal sign?

Do Burmese loanwords have tones, and if so, do they retain their original tones, have Rohingya approximations of those tones, or have yet other tones?

Might Rohingya have pitch accent instead of Southeast Asian-style tones?

I just learned that Unicode has six characters called "ARABIC TONE" (08EA-08EF) corresponding to the Rohingya tone characters. Are those six characters used in Arabic-script Rohingya? (7.21.0:03: Yes. I was thinking they might have been invented for some African language.)
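
The six code points are easy to inspect with Python's standard `unicodedata` module. This is only a quick sketch: the names printed depend on the version of the Unicode database bundled with your Python, so I pass a fallback string in case a code point is unnamed there.

```python
import unicodedata

# The six "ARABIC TONE" characters mentioned above: U+08EA through U+08EF.
tone_code_points = range(0x08EA, 0x08F0)

# Look up each character's official name; fall back to "" if this Python's
# Unicode data does not know the code point.
names = [unicodedata.name(chr(cp), "") for cp in tone_code_points]

for cp, name in zip(tone_code_points, names):
    # category() should report a nonspacing mark for combining tone signs.
    print(f"U+{cp:04X}", name or "<unnamed>", unicodedata.category(chr(cp)))
```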

Wikipedia states that the Rohingya script has "a few borrowings from Roman and Burmese", but I can't find them.

WHY DOESN'T MON <ṄA> LOOK LIKE BURMESE <ṄA>?

Mon and Burmese are written in variants of the same script. Hence the Mon and Burmese spellings of 'Tenasserim' are similar:

Mon: တနၚ်သြဳ <tanaṅsrī>

Burmese: တနင်္သာရီ <tanaṅsārī> [tənɪ̀ɴθàjì]

One difference is not as great as it seems. Burmese <ṅ> is written as a superscript ɛ-like shape atop <sa>. When written on the line (with its inherent vowel restored), Burmese င <ṅa> looks like Mon ၚ <ṅa> except for the lack of a bottom stroke. (The top stroke of Mon ၚ် indicates that ၚ <ṅa> is to be read without its inherent vowel.) Burmese င <ṅa> is clearly a rounded descendant of Brahmi 𑀗 <ṅa>. The Mon character in the Myazedi inscription from nine centuries ago looks like modern Burmese င <ṅa>. When did the Mon add a stroke, and why? And does that bottom stroke have anything to do with the bottom parts attached to C-shapes in <ṅa> in other Indic scripts: e.g., Devanagari ङ?

7.20.1:36: IndoSkript shows ㄷ-shaped characters for <ṅa> from c. 100-200 AD. Then it displays a <ṅa> resembling Tibetan ང <ṅa> atop a ㅅ shape from c. 350-375 AD in Kuchar - in what is now northwestern China, quite far from the Mon. Is that ㅅ-shape relevant to the reversed S-shape at the bottom of Mon ၚ <ṅa>? That shape resembles the <ṅa> of the Manur inscription (c. 840-880 AD) in what is now Tamil Nadu. Could Mon ၚ <ṅa> originate from a stack of two <ṅa>?

WHY DOES TENASSERIM END IN M?

I wonder what the etymology of the name Tenasserim is. Seri looks like Sanskrit Śrī, but what is the first half? Is it Mon?

None of the major languages of the area have e or m in their versions of the name:

Burmese: တနင်္သာရီ <tanaṅsārī> [tənɪ̀ɴθàjì]

Mon: တနၚ်သြဳ <tanaṅsrī>

Thai: ตะนาวศรี <taḥnāvśrī> [tanaːwsǐː]

Malay: تانه ساري <tānh sāry> Tanah Sari

I think e may reflect a schwa and an epenthetic vowel between s and r. But why does the name have a final -m in Western languages?

Why does the second syllable have different codas (nasal, [w], or [h])?

VOWEL LENGTH AND INTRUSIVE NASALS IN SANSKRIT VS-STEMS

I was puzzled by Sanskrit Vs-stems when I first learned how to decline them in 1992, and I remain puzzled today. Most forms (singular and dual / plural) can be generated by adding endings to -Vs stems and applying the following rules which apply to Sanskrit in general:

1. s > ṣ after i or u and before a vowel: e.g., havis-ā > haviṣ-ā 'oblation' (inst. sg.)

2. -as > -o before voiced consonants: e.g., manas-bhyām > manobhyām 'minds' (inst./dat./abl. du.)

3. -s > -r after i or u and before voiced consonants: e.g., havis-bhyām > havirbhyām 'oblations' (inst./dat./abl. du.)

4. s > ḥ before s: e.g., manas-su > manaḥ-su 'minds' (loc. pl.)
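
The four rules above can be sketched as a small Python function. This is a toy under stated assumptions, not a sandhi engine: the function name and the consonant and vowel sets are mine, the input is assumed to be a romanized -Vs stem plus an ending, and nothing beyond the four rules is applied (so further changes in the ending itself are ignored).

```python
import unicodedata

# Romanized vowels and voiced consonants, judged by the first letter of the ending
# (so "bh" counts as voiced via "b"). These sets are assumptions of this sketch.
VOWELS = set("aāiīuūṛeo")
VOICED_CONSONANTS = set("gjḍdbṅñṇnmyrlvh")


def join_s_stem(stem: str, ending: str) -> str:
    """Attach an ending to a stem in -as/-is/-us using rules 1-4 only."""
    stem = unicodedata.normalize("NFC", stem)
    ending = unicodedata.normalize("NFC", ending)
    if not stem.endswith("s") or len(stem) < 2 or not ending:
        raise ValueError("expected a stem ending in -Vs and a non-empty ending")
    body, stem_vowel, first = stem[:-1], stem[-2], ending[0]
    if first == "s":                      # rule 4: s > ḥ before s
        return body + "ḥ" + ending
    if first in VOICED_CONSONANTS:
        if stem_vowel == "a":             # rule 2: -as > -o before voiced consonants
            return body[:-1] + "o" + ending
        if stem_vowel in "iu":            # rule 3: -s > -r after i or u
            return body + "r" + ending
    if first in VOWELS and stem_vowel in "iu":
        return body + "ṣ" + ending        # rule 1: s > ṣ after i or u before a vowel
    return stem + ending                  # otherwise the stem is unchanged


print(join_s_stem("havis", "ā"), join_s_stem("manas", "bhyām"))
print(join_s_stem("havis", "bhyām"), join_s_stem("manas", "su"))
```

Feeding it the nominative singular or the neuter plural just returns the stem plus the ending, which is the point: the forms with long vowels and nasals do not fall out of these rules.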

But those rules cannot explain a few forms:

Why do the m./f. nom sgs. have long vowels before the stem-final -s?

5. sumanās 'favorably minded' (m./f. nom sg.)

cf. sumanas 'id.' (n. nom. sg.)

Why do the n. nom./acc./voc. pls. have long vowels and an anusvāra nasal (written here as ṃ and in Whitney's grammar as ṅ) before the stem-final -s?

6. manāṃs-i 'minds' (n. nom./acc./voc. pl.) instead of *manas-i

Is 5 by analogy with mant/vant-stems that also have lengthening in the m. nom. sg.? (The feminines of mant/vant-stems are ī-stems:  paśumatī instead of *paśumān.)

sumanās; paśumān < *-ēn < *-en-s < *-ent-s? 'rich in cattle' (m. nom. sg.; is the long vowel of -mān due to Szemerényi's law?)

cf. the acc. sg.: sumanas-am; paśumant-am

Why do ant-participles and an-stems work somewhat differently?

bhavan < *-ont-s 'being' (m. nom. sg.); why isn't this *bhavā < *-ō < *-ōn < *-on-s < *-ont-s?; cf. rājā below

rājā < *-ō < *-ōn < *-on-s 'king' (m. nom. sg.); final *-n was lost after *ō but not *ē

the m. acc. sg. bhavant-am has a short vowel like sumanas-am and paśumant-am, but rājān-am has a long vowel even though Szemerényi's law can't apply to a word without *s or *laryngeals!

I think 6 and similar neuter plurals of vowel stems are by analogy with an-neuters; they all share the pattern long stem vowel + nasal + -i:

manas > manāṃs-i 'minds' (as-stem; -ns- > -ṃs-)

āsya-m > āsyān-i 'mouths' (a-stem)

vāri > vārīṇ-i 'waters' (i-stem; n > ṇ after r)

madhu > madhūn-i 'honeys' (u-stem)

nāma > nāmān-i 'names' (an-stem)

The stem-final nasal in nāmāni was restored by analogy and the regular neuter plural ending -i was added:

*ʕʷneʕʷmon-ʕ > *nōmō > (*)nāmā (attested in Vedic?) > nāmān-i (cf. nom./acc./voc. du. nāman-ī and voc. sg. nāman, but the second a in those forms isn't old; they go back to *ʕʷneʕʷmn-ʕi and *ʕʷneʕʷmn)

This restoration must have predated the split of Indo-Aryan from Iranian, since the restored nasal is present in Avestan nāmə̄n-i 'names'. However, the nasals in the Sanskrit Vs and vowel stem neuter plurals have no parallels in Avestan, so they must be Sanskrit innovations.

POSSESSING SIMILAR ENDINGS IN THE PRESENT

Last week I was puzzled by the stems of Hungarian van 'is' and megy 'goes'. Now I want to look at their present tense endings which partly overlap with possessive endings:

Person/number Present indefinite verb endings Present definite verb endings Possessive endings for singular nouns Possessive endings for plural nouns
1S -ok/-ek/-ök
e.g., lát-ok egy 'I see a ...'
-om/-em/-öm (optional for -ik verbs):
e.g., játsz-ok ~ játsz-om 'I play'
-om/-em/-öm e.g., lát-om a(z) 'I see the ...' -(V)m
e.g., órá-m 'my clock'
-(j)(a)im/-(j)(e)im e.g., órá-im 'my clocks'
2S -sz [s]
-ol/-el/-öl (-s, -sz, -z, -dz verbs)
e.g., játsz-ol 'thou playest'
-od/-ed/-öd -(V)d -(j)(a)id/-(j)(e)id
3S -Ø
-ik (-ik verbs)
e.g., játsz-ik 'he/she/it plays'
-ja/-i -(j)a/-(j)e -(j)(a)i/-(j)(e)i
1P -unk/-ünk -juk/-jük -(u)nk/-(ü)nk -(j)(a)ink/-(j)(e)ink
2P -(o)tok/-(e)tek/-(ö)tök -játok/-itek -(V)tok/-(V)tek/-(V)tök -(j)(a)itok/-(j)(e)itek
3P -(a)nak/-(e)nek -ják/-ik -(j)uk/-(j)ük -(j)(a)ik/-(j)(e)ik
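
The three-way choice in endings like -ok/-ek/-ök follows vowel harmony. Here is a deliberately simplified sketch of that choice for the 1S indefinite ending. The function name and the rule "look at the last vowel of the stem" are my assumptions for illustration only; real Hungarian has stems with i, í, or é that take back-vowel endings, and this toy gets those wrong.

```python
# Vowel classes used by this simplified sketch.
BACK = set("aáoóuú")
FRONT_ROUNDED = set("öőüű")
FRONT_UNROUNDED = set("eéií")


def first_sg_indefinite(stem: str) -> str:
    """Pick -ok/-ek/-ök from the LAST vowel of the stem (a simplification)."""
    for letter in reversed(stem.lower()):
        if letter in BACK:
            return f"{stem}-ok"
        if letter in FRONT_ROUNDED:
            return f"{stem}-ök"
        if letter in FRONT_UNROUNDED:
            return f"{stem}-ek"
    raise ValueError(f"no vowel found in stem: {stem!r}")


# Stems taken from the tables in these posts; y in gy is not a vowel.
for stem in ["lát", "játsz", "megy", "vagy"]:
    print(first_sg_indefinite(stem))
```

The hyphen in the output just mirrors the morpheme boundaries I write in the tables; it is not part of the spelling.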


1. Why is the pattern of overlap between verb and possessive endings so complex?

Person/number Possessive endings for singular nouns Possessive endings for plural nouns
1S ends in -m like present definite
2S ends in -d like present definite
3S -ja looks like present definite but presumably linking -j- + 3S possessive suffix -a -i looks like present definite but presumably an unrelated plural possessive suffix -i
1P similar to present indefinite ends in -nk like present indefinite
2P definite, indefinite, and both types of possessives all of the tVk type
3P unlike verb endings aside from plural -k -ik looks like present definite but presumably an unrelated plural possessive suffix -i + plural -k

Is the unity of the 2P endings original or the result of a merger? Can a more consistent system be reconstructed for an earlier stage?

2. Why do -ik verbs have optional indefinite endings that look like definite endings only in 1S?

3. Did 2S present indefinite -sz and -Vl originally have different functions before being reinterpreted as allomorphs for different stem types?

4. Why do -ik verbs have a special 3S ending?

5. Why do possessive endings for consonant-final plural nouns have 'bridges' that look like the third singular possessive endings for singular nouns?

kert-je-im 'my gardens' (not *kert-im); cf. kert-je 'his/her/its garden'

6. Why do definite 3S, 2P, and 3P have a jA ~ i alternation instead of a jA ~ je alternation?

7. Why doesn't definite 1P end in -nk?

8. Why do definite 2P and 3P have long á instead of short a?

BONUS: Why isn't játsz- [jaːts] spelled jác [jaːts]? I think I can answer that one myself. The spelling is etymological; játsz- is from ját- plus -sz-.

VAN MEN

What is the story behind the irregular conjugations of Hungarian van 'is' and megy 'goes'?

Number/person | Ending(s) | 'to be' < Proto-Finno-Ugric* *wole- | 'to go' < Proto-Uralic *mene-
1st singular | -ok/-ek | vagy-ok | megy-ek
2nd singular | -sz [s] | vagy-Ø | mé-sz ~ mégy-Ø
3rd singular | -Ø | van-Ø | megy-Ø
1st plural | -unk/-ünk | vagy-unk | megy-ünk
2nd plural | -tok/-tek | vagy-tok | men-tek
3rd plural | -nak/-nek | van-nak | men-nek

The list of endings is not exhaustive and only includes endings that would normally be expected for these two verbs.

1. Why do the two verbs have -gy [ɟ] even though their roots lack palatal consonants?

2. Why does that gy have different distributions in the paradigms of the two verbs: e.g., van and vagytok (not *vagy and *vantok) but megy and mentek (not *men and *megytek)?

3. Why do 'thou art' and one form of 'thou goest' have a zero ending?

4. Why do the forms of 'thou goest' have long vowels? Is length in mész compensating for a root-final consonant lost before -sz?

5. Why does 'to be' have a instead of o which is still in other forms like volt 'he/she/it was'?

6. Why does 'to be' have n instead of l which is still in other forms like volt 'he/she/it was'?

I could ask even more questions about the rest of the paradigms of those two verbs (e.g., why is the potential of 'to go' me-het with the stem reduced to an open syllable?), but I'll stop here.

*Although Proto-Finno-Ugric may not even exist (cf. Tibeto-Burman in Sino-Tibetan), I cite this form merely to indicate that the source of the Hungarian verb had *l which is still in some other forms of the verb (e.g., volt 'he/she/it was') as well as in related languages: Finnish olla and Estonian olema.

WHAT IS THE INDIC SOURCE OF THAI NATTA?

That question came to mind when I saw the name of this restaurant. I assume the Natta of Natta Thai is from the name [náttʰaː] which I've seen spelled ณัฏฐา <ṇaṭṭhā> and ณัฐฐา <ṇaṭhṭhā>. The letters ณ <ṇ>, ฏ <ṭ>, and ฐ <ṭh> are for retroflex consonants that were never in Thai and usually signal Indic origin. Yet I cannot find any Sanskrit or Pali words beginning with ṇa- other than Skt ṇakāra 'the sound ṇ' which is not relevant here. Is ณ <ṇ> a hypercorrection for น <n>? There is a Pali word naṭṭha ... but it means 'destroyed'!

Since I mentioned ณ <ṇ>, here is a question I've had for a long time: why is the Thai preposition [náʔ] spelled ณะ <ṇḥ>? Was that an attempt to dress up a native word in Indic-like guise? The use of a low-frequency letter also makes the word stand out. Was that intentional? How far back does the retroflex spelling go? Was the word ever spelled with dental น <n> as นะ <nḥ> like Lao ນະ <nḥ> [nāʔ]?

TONOGENETIC CLUES IN MIZO 'DECLENSION'?

I first heard of Mizo (as 'Lushai') back in the late 90s when I learned that Starostin had found a correlation between Mizo short vowels and Middle Chinese Grade III (going back to Old Chinese 'type B' syllables which he reconstructed with short vowels and which I reconstruct as nonemphatic). See Sagart's (1999: 42-43) summary of proposals concerning the origin of Grade III which is often reconstructed as a  medial *-j-.

I didn't look at Mizo again until tonight when I took a good look at its Wikipedia entry. Normally, I expect Asian tonal languages to be 'isolating' like Chinese, but Mizo nouns decline! Or is 'declension' an artifact of looking at Mizo through an Indo-Aryan lens and/or Mizo orthography? Would it be better to analyze the suffixed case forms as noun-postposition sequences as in DeLancey (2004)? In any case, the ergative and instrumental both end in -in but have different tones. Does that tonal alternation reflect one or more lost final consonants? Are the two -in from a single original suffix (or postposition) with or without a following glottal suffix that conditioned a different tone?

*-in > -in + tone

*-in-H > -in + a different tone

6.21.23:36: Segmental affixes may also be the source of tone changes in derived verbs (though some derivations may postdate tonogenesis and be by analogy with existing pairs of verbs).

6.21.23:57: How many tones does Mizo have? Wikipedia lists eight. But Khoi Lam Thang (2001: 40) listed five, and Lorrain (1940) in Namkung (1996: 234) listed only three! How can these different descriptions be reconciled? And where did these tones come from? Wikipedia makes it sound as if Mizo had Chinese-style tonogenesis:

Tone systems have developed independently in many of the daughter languages [daughters of which language?] largely through simplifications in the set of possible syllable-final and syllable-initial consonants. Typically, a distinction between voiceless and voiced initial consonants is replaced by a distinction between high and low tone, while falling and rising tones developed from syllable-final h and glottal stop, which themselves often reflect earlier consonants.

I hoped to see the details of this process in Khoi Lam Thang's (2001: 98) dissertation. Unfortunately, his reconstruction of Proto-Chin, the ancestor of Mizo and its sisters, lacked a tonal component.

This analysis shows that there are comparatively clearer tonal correspondences between Tedim, Mizo and Hakha. However, tone in Mara, Khumi and Kaang are split within the Patterns [established by Gordon Luce for Chin languages such as Mizo], tremendously complicated and without predictable environments. Thus, while a reconstruction of proto Northern Chin may be proposed from this data, a reconstruction of Proto Chin tone is incomplete and cannot at present be proposed. Therefore this thesis will be limited to a segmental reconstruction for Proto Chin. A Chin tonal analysis is in progress by Dr. Fraser Bennett and Ajarn Noel Mann. Their initial findings seem much closer to Luce’s Tonal Patterns.

I wonder what their final findings were.

REPLICATING GRAINS OF GOLD

I like Andrew West's English title for the Tangut text that I have been calling the Golden Guide. His latest post is about manuscript copies of the Grains and practice pieces in which characters from the Grains were written repeatedly.

He used my notation to transcribe Tangut readings with a twist: he wrote tones as superscript numerals and grades as subscript numerals: e.g., he wrote the reading of

'moon, month'

as ²lhiq₄ = lhiq with tone 2 and Grade IV. I write it as 2lhiq4 because superscript and subscript numerals are difficult for me to type and to read.
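
Converting between the two notations is mechanical, so here is a minimal sketch in Python. The function names are mine, and I am assuming that a reading always starts with a tone digit (1 or 2) and ends with a grade digit (0 to 4, where 0 is my own 'grade unknown'; I do not know whether Andrew writes a subscript zero).

```python
# Tone digits become superscripts; grade digits become subscripts.
TONE_TO_SUPERSCRIPT = str.maketrans("12", "¹²")
GRADE_TO_SUBSCRIPT = str.maketrans("01234", "₀₁₂₃₄")
# One table is enough for the way back because the two digit sets do not overlap.
WEST_TO_PLAIN = str.maketrans("¹²₀₁₂₃₄", "1201234")


def to_west(reading: str) -> str:
    """Turn a plain reading like '2lhiq4' into superscript/subscript form."""
    tone, segments, grade = reading[0], reading[1:-1], reading[-1]
    return tone.translate(TONE_TO_SUPERSCRIPT) + segments + grade.translate(GRADE_TO_SUBSCRIPT)


def from_west(reading: str) -> str:
    """Turn a reading like '²lhiq₄' back into plain digits."""
    return reading.translate(WEST_TO_PLAIN)


print(to_west("2lhiq4"))
print(from_west("²lhiq₄"))
```

Only the first and last characters are touched in `to_west`, so any digits inside the segmental part of a reading would be left alone.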

He linked to my notes on the Grains whenever they were available. I still have 96 lines left to translate and annotate. (I stopped at line 104 in January.) Now I want to finish so Andrew can add more links to his entry.

DID KHITAN AND JURCHEN SHARE A WORD FOR 'GRANDSON' (PART 2)?

I forgot to make a few points about Khitan

191 'grandson'

in my last entry, and I've thought more about the topic since, so here's a follow-up I didn't plan.

Why does 191 mean 'grandson'?

I don't know. I haven't seen Liu Fengzhu and Chengel's (2003: 18) explanation for that gloss. If I can find it, I might write a part 3.

191 occurs four times in the epitaph of Field Marshal Yelü, but none of those four occurrences unambiguously means 'grandson' (Wu and Janhunen 2010: 159, 161, 190).

191 also functions as a phonogram: e.g., in the female name

191-236-372-361 <191.ur.û.en> (Xiao Dilu 26.26; see Wu and Janhunen 2010: 106-107).

How was 191 pronounced?

Lu Yinghong & Zhou Feng (2000: 49) read it as [mu] because they regarded it as a transcription of Liao Chinese 睦 *muʔ. However, the rest of what they regarded as a transcription of a Chinese phrase is not a good match (Wu and Janhunen 2010: 107). Given that other Chinese final glottal stops may have been Khitanized as -ɣ (= -h in Kane 2009), perhaps 191 was <muɣ> (which resembles Written Mongolian omuɣ 'clan', though I am skeptical of apheresis; see below).

The fact that 191 is often followed by u-graphs (e.g., 236 <ur> above; see Qidan xiaozi yanjiu 312 and Wu and Janhunen 2010: 317 for others) suggests that its reading may have ended in -u. Perhaps the name above was something like Mu(u)ruen.

Kane (2009: 302) transcribed 191 as <mú>, but his entry for the character on p. 58 is blank, so I do not know his reasoning.

Wu and Janhunen (2010: 264) transcribed 191 as <mó>, presumably following Wu Yingzhe (2007: 46-47), which I haven't seen. Maybe by part 3 ...

Does the Khitan word written as 191 have external cognates?

If the reading of 191 began with an m-, I doubt it can be connected to Manchu omolo 'grandson', since I don't know of any cases of Khitan C- corresponding to VC- in other languages. Hence I don't think Khitan underwent apheresis. (Is there any language that lost all initial vowels?) A reading mu would make a link even more problematic since I would not expect Khitan u to correspond to Manchu o.

In part 1, I proposed that 191 may have been <om>. Such a short form - if valid - raises other issues. Manchu omolo has apparent cognates throughout Tungusic with the shape omol(g)V (Cincius 1975 2: 17-18). Therefore the word might be reconstructed at the Proto-Tungusic level. Is the word a loan from pre-Khitan (prior to monosyllabic reduction) into ((pre-)Proto-)Tungusic or vice versa? It cannot be a loan from Khitan into Jurchen or any other Tungusic language, since that scenario cannot account for final -l(g)V. Gorelova (2002: 114) analyzed Manchu omolo as omo-lo with a noun suffix -lo. That analysis seems to be synchronically correct since the plural of omolo is omosi with the plural -si replacing -lo after the root. But is it diachronically correct? Was the Proto-Tungusic root *omo- rather than *omol(g)V, or was the word reanalyzed within Manchu? The Jurchen plural

<omo.lo.shi>  (Kyŏngwŏn inscription 3:2)

 could either be analyzed as omo-lo-shi with double suffixes or as omolo-shi with a trisyllabic root that was later reanalyzed as a root-suffix sequence omo-lo by analogy with other -lo nouns in Manchu. 

Starostin's online Altaic database treats Proto-Tungusic  *omu- (sic) 'offspring, descendant, grandchild' and *umu- 'to lay eggs' as one and the same root. I reject that identity for three reasons. First, the supposed initial vowel alternation looks like an ad hoc device to tie the two roots together. Second, all evidence points to *o as the second vowel of the om-root; *u is another bridging device to make the child root look like 'to lay eggs'. Finally, *umu- was apparently reconstructed solely on the basis of Evenki umū-. A form in a single language cannot be projected back to the proto-language.

All that effort enables Starostin to connect the Tungusic omo- (not omu-!) words to various um-words elsewhere in 'Altaic':

Old Turkic umay 'name of a goddess' < 'placenta'?

If I am reading Clauson 1972: 164-165 correctly, the word is first attested in the 8th century AD as the name of a goddess "whose particular function was to look after women and children, possibly because this object [the placenta] was supposed to have magic qualities". The first attestation of the meaning 'placenta' that I can see in his entry was in the 11th century AD. I assume 'placenta' is the earlier meaning even though it is actually found later.

Written Mongolian umai 'womb'

Korean um 'sprout'

Japanese um- 'to give birth'

The Turkic and Mongolian words must share a common source; one language probably loaned the word to the other.

The semantics of the Korean word are distant from 'womb'. Um may be an -m-suffixed nominalization of an extinct verb 'to sprout'.

The Japanese word may be a chance lookalike like English womb [wum]. Is English 'Altaic'?

Starostin reconstructed Proto-Altaic *úmu 'to give birth'. According to the rules in Etymological Dictionary of the Altaic Languages (2003 1: 18), the first vowel of the reflexes of a Proto-Altaic word with the vowel sequence *u-u should be *U in Proto-Tungusic. The cover symbol *U enabled Starostin et al. to regard both the improbable *umu- and the incorrect *omu- as descendants of *úmu.

DID KHITAN AND JURCHEN SHARE A WORD FOR 'GRANDSON'?

Last weekend, I opened Wu and Janhunen (2010) at random and saw this passage about the Khitan small script character


on p. 107:

Even so, assuming that the value [mu], here romanized as mó, is approximately correct, the Khitan item for 'grandson' may perhaps be compared with Manchu omolo id., suggesting that the actual pronunciation might also have been [omo] (Wu Yingzhe 2007f: 46-47).
I wonder if 191 was [om] ~ [mo] with a reversible reading like other Khitan small script characters such as

222 [iń] ~ [ńi].

The Jurchen word for 'grandson' was omolo as in Manchu. I suspect that the character variously written as

was originally a logogram for omolo (though it is not attested alone) which later acquired a following <lo> (see my posts from 6.1 and 6.3) in the attested spellings:


(Kyŏngwŏn inscription 3:2, mid-12th century; the spelling on the left is from Jin 1984: 205 and the spelling on the right is from Jin and Jin 1980: 336)

(Deshengtuo inscription 14, 1185; Jin 1984: 205 also reports this in Yongning 12, but Jin and Jin 1980: only list the second of the next two spellings in Yongning 12.)


(Yongning temple inscription 12, 1413; the spelling on the left is from Jin 1984: 205 and the spelling on the right is from Jin and Jin 1980: 364)

(Hua-Yi yiyu Berlin ms. people section 14, before c. 1500?)

I have not seen any of the originals, so I am not certain about the details.

I have not yet been able to find an exact match for Jurchen <omo> in the Khitan large script. Characters 0170, 0204, and 0205 in N4631 are vaguely similar, but until their readings and/or meanings are known, I cannot regard them as prototypes for Jurchen <omo>.

KHITAN SMALL SCRIPT CHARACTER 346 IN QIDAN XIAOZI YANJIU

Qidan xiaozi yanjiu (1985), the foundation of current studies on the Khitan small script, only lists four instances of 346 in the texts it covers:

244-346-273 <s.?.un> (道 14.11, 24.16, 仲 17.37) and 251-346-273 <n.?.un> (許 57.33)

Are those genitives of nouns, or is <un> part of the stem? If <un> is a genitive suffix, the vowel of 346 should be u according to the present understanding of Khitan vowel harmony. So perhaps that is partly why Kane (2009) transliterated it as <uŋ> and Wu and Janhunen (2010) transliterated it as <ung₂>. The final nasal reflects the assumption that 346 is a variant of single-dotted 345 <ung> from my last post:

345 is much more common. Qidan xiaozi yanjiu lists 72 occurrences of 345 which can appear by itself (on the murals where characters are often not grouped into blocks) and in first, third, and fourth position: e.g.,

345-041 <> (興 25.3), 334-019-345 <g.iu.ung> for Liao Chinese 宮 *giung (or *güng?) (道 6.33), 048-092-261-345-341 <?> (許 61.2)

Is 346 simply a variant of 345 (Kane 2009: 77), or is it a distinct character? If it is the latter, was its reading similar to <ung> (e.g., <üng>) or was it something else with a u-vowel? 346 coexists with 345 in all three texts where it was found (道, 許, 仲). Was the number of dots on the bottom random like the dots in the three variants of Jurchen <lo>?

The fact that 346 only occurs in blocks of the type <C.346.un> suggests a deliberate choice, though it could also be an artifact of extremely limited data. Qidan xiaozi yanjiu does not list the blocks <s.ung.un> and <n.ung.un> with 345 instead of 346. Is this complementary distribution accidental or meaningful? Have any such blocks been found in the three decades following the publication of Qidan xiaozi yanjiu? The closest block with 345 is

244-345 <s.ung> 宋 'Song (dynasty)' (仁 8.13)

which might be the stem of

244-346-273 <s.?.un> (道 14.11, 24.16, 仲 17.37)

if 345 and 346 really are equivalent and if <un> is a genitive suffix.

If 244-346 is also 'Song', could 251-346 be a loan of a Liao Chinese word *nung?

AN 'ETERNAL' LINK BETWEEN THE KHITAN SMALL SCRIPT AND THE JURCHEN (LARGE) SCRIPT?

Tonight I noticed that the Jurchen (large) script character


for the transcription of Ming Chinese 永 *yüng 'eternal' resembles a cross between the Khitan small script characters

106 ~ 345 ~ 346

which are slightly different ways to transcribe Liao Chinese *-ung. (I assume 106 is an abbreviation of 345. The function, if any, of the extra dot in 346 is unknown.)

Was the Jurchen character derived from 106/345/346, or is the similarity a coincidence? Normally Jurchen characters are thought to be derivatives of Khitan large script characters or 'sisters' if not descendants of those characters. So I would expect Jurchen <üng> to be somehow related to Khitan large script characters such as these two (1692 and 0555 in N4631):

N4631 glossed 1692 as 'first' and listed the reading [tʰur] (= <tur> in my Khitan transcription). There is no semantic or phonetic resemblance to 永 *yüng 'eternal' or its (near-)homophones.

Nothing is known about 0555. Was it pronounced üng?

The Khitan small script character for Chinese transcription


that Kane (2009) transcribed as <iúng> may have been pronounced üng. It of course does not look anything like Jurchen <üng> unless one is imaginative. I doubt the Jurchen - who were literate in Khitan - overlooked it and chose a small script character with a somewhat different reading (106/345/346) as the basis for their <üng>.

6.3.1:06: Maybe I am wrong about 181 being üng. The Liao Chinese rhyme that it transcribes was also transcribed in the small script as 019-345 <iu.ung>: e.g.,

334-019-345 <g.iu.ung> for 宮

So was 181 <iung>? (I see no reason to add an acute accent, as there is no <iung> distinct from <iúng> in Kane 2009.)

Another possibility is that the Liao Chinese rhyme was -üng, and the Khitan had two strategies for writing it: a spelling reflecting a partially nativized -iung (if Khitan had no ü) and a spelling with a character specifically designed for -üng. The degree of phonetic mismatch between Liao Chinese and Khitan must have been considerable, though it eludes precise measurement.

LAST OF THE OLD JURCHEN SCREENCAPS

Yesterday I discovered a screencap of Jurchen characters that I made four years and two laptops ago. At the time I created images of all but two that I didn't upload until tonight:

<i(r)> and <lo>

Those two might be the last all-new images of 72-point characters on this site. All future images will be of 48-point characters or be derivatives of existing 72-point character images.

Although the two characters look very similar, they have completely different phonetic values.

<i(r)> was transcribed as Ming Chinese* 一兒 *ir in the Sino-Jurchen glossary (Kiyose 1977: 91) and in turn transcribed the initial *y- of Ming Chinese 永 on the Yongning Temple Stele (lines 1, 6, 8, 10, and 13). A dotless variant

appears on line 9 of that inscription.

<lo> corresponds to the Ming Chinese phonetic transcription 洛 *lo of the name

<> = 充哥洛 *cunggolo (<ge> was also transcribed as Ming Chinese 革 *ge.)

in memorial XI (Kiyose 1977: 201) and has dotless and single-dotted variants:

The dotless variant from the Yongning Temple Stele looks like the Chinese character 早 *dzaw and the Khitan large script character

whose reading is unknown. Was 早 also read <lo> in Khitan?

The two appear together - ignoring variation - in the transcriptions 


<i.üng.lo> (lines 8 and 10) and <i.üng.lo> (line 13**)

of 永樂 *yünglo in the Yongning Temple Stele inscription. That word illustrates how the presence or absence of a left-hand bend in the central stroke is the key difference between the two graphs.

*6.2.1:08: Ming Chinese reconstructions are in the same non-IPA orthography that I use for Jurchen and Khitan to facilitate comparison. The use of identical letters in different languages does not necessarily entail exact phonetic matches: e.g., Ming dz [ts] was voiceless unlike Khitan

104~354 <dz>

which may have been voiced [dz].

**6.2.0:54: The two-character transcription


of 永樂 *yünglo in line 6 is presumably an error for


attested on line 13.

Tangut fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2015 Amritavision