Amaravati: Abode of Amritas

15.3.7:23:49: Y NOT? THE INITIAL OF TANGUT 'EIGHT'

Two nights ago, I transcribed the reading of Tangut

4602 'eight'

as 1ar4. One might expect its Tibetan and Chinese transcriptions to be a(r) and 阿 *a, but in fact they are

rye, ?e, na (sic!)

耶 *1ya4, 盈 *1yeng4

with nonzero, non-glottal stop initials.

Given that evidence and the fact that the word is cognate to Written Tibetan brgyad < *p-rjat and Somang wu-rját*, why don't I transcribe it as 1(r)yar4 from pre-Tangut *rjat?

- Tibetan ry- may reflect a Tangut dialect that did not simplify *ry- to y-.

- Tibetan n- may be due to misreading r- as n-.

3.8.2:16: Andrew West pointed out that na is in fact a Tibetan transcription of

4601 2na4 'second person singular suffix'

and not 4602 1ar4 in this manuscript.

- Tibetan and Chinese e and a may reflect a front low vowel [æ]. Grade IV is associated with frontness: e.g., 耶 ye and 盈 ying both have front vowels in modern Mandarin. (Modern Mandarin is not descended from the northwestern dialect known to the Tangut; that dialect became a substratum of the Mandarin dialects that replaced it.)

The answer is that my 1ar4 was a mechanical conversion of Gong's 1·jar:

Gong's · (glottal stop) > my zero

I want to make my transcription as simple as possible for nonlinguists. ʔ is not understood by laypeople, and a letter like q- as a substitute for ʔ- could be misunderstood as [k].

Gong's Grade III -j- > my -4 after Class VIII initials before rhymes
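
The two rules above can be sketched as a small Python function. This is only an illustration for readings with Gong's · initial such as 1·jar. The name convert_gong and the assumed input shape (tone digit, ·, optional j, rhyme) are hypothetical and are not part of Gong's system or of any existing tool.

```python
def convert_gong(reading: str) -> str:
    """Convert a reading with Gong's · initial (e.g. '1·jar').

    Hypothetical helper: it covers only the two rules stated above.
    """
    tone, rest = reading[0], reading[1:]
    if not rest.startswith("·"):
        raise ValueError("this sketch only handles readings with Gong's · initial")
    rest = rest[1:]              # rule 1: · (glottal stop) > zero
    if rest.startswith("j"):     # rule 2: medial -j- > grade digit 4
        rest = rest[1:] + "4"
    return tone + rest


print(convert_gong("1·jar"))  # 1ar4
```

The function says nothing about whether the glottal stop was really there; it only shows why 1ar4 fell out of the conversion without a y-.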

Gong was not the first to reconstruct 'eight' with a glottal stop-yod cluster:

Nishida 1964: 1ˀyar

Sofronov 1968: 1·i̯ạ (I have restored a subscript dot that was accidentally omitted)

Arakawa 1997: 1'ya:r

I am suspicious of this cluster because Nishida, Sofronov, and Arakawa did not reconstruct a simple initial [j]. Is there any language that has initial [ʔj] without [j]? Is the glottal stop really necessary?

Although Li Fanwen abandoned his 1986 reconstruction in favor of Gong's mid-90s reconstruction, I think Li may have been correct when he reconstructed a simple j- instead of ʔj- in 1jǐar 'eight'.

It is true that Chinese transcriptions of Tangut syllables that Li reconstructed with j- would have been pronounced with initial *ʔ- as well as *j- in Middle Chinese. However, there is no guarantee that the *ʔ- : *j- distinction was maintained before high vowels in the post-Middle Chinese Tangut period. (It has been lost in modern Mandarin.) Moreover, even if the distinction was maintained, the Tangut native speaker author Kwyli Rirphu (骨勒茂才 'Gule Maocai') of the Timely Pearl might not have been able to hear it. His Chinese transcriptions would not necessarily be the same as those of a Chinese native speaker.

I will add y- (IPA [j]) to 'eight' and all other syllables in its fanqie chain in my database of Tangut syllables.

*Although I prefer to use Japhug as an example of a rGyalrong language, Japhug rcat has an inexplicable -c- instead of the expected *-ʑ- from the *-j- preserved in Somang.

15.3.6:23:56: TANGUT THOUGHTS FROM CHINA?

In " 'Prime'-'Eight' Problems", I mentioned

2621 2se'4 < *Cɯ-saŋʔ-s 'to think'

as an example of a Tangut word with 'prime'.

Li Fanwen (2008: 431) regarded it as a Chinese loanword. Its pre-Tangut form is almost identical to Old Chinese 想 *Cɯ-saŋʔ 'to think'. Is this striking resemblance due to inheritance or early borrowing?

First, the resemblance is open to question.

Besides the fact that nobody but me reconstructs presyllables in 2621 and 想 to account for their later vocalism, the root initial of 想 is uncertain. We know for sure that 想 had initial *s- in Middle Chinese, but *s- had a variety of possible Old Chinese sources: e.g., *s-nasal and *s-liquid clusters (Baxter and Sagart 2014: 151).

Tangut -e'4 could be from *-je-ʔ as well as *Cɯ-...-aŋʔ, so 2se'4 might be cognate to

3469 2se4 < *sje-s 'to know' (cf. Written Tibetan shes-pa 'id.')

with a primary yod if it is from *sje-ʔ.

However, I think all of the above may be excessively cautious given Japhug sɯ-so 'to think' from *saŋ, which is a better semantic match for 2621 'to think' than 3469 'to know'. Hence I follow Guillaume Jacques (2014: 180) in reconstructing 2621 with a final nasal (though without his yod or vowel length, which correspond to my high-vowel presyllable and glottal stop).

Jacques' *sjaaŋ : my *Cɯ-saŋʔ-s

It is tempting to assume that Japhug retains a presyllable sɯ- lost in Tangut and Chinese, but in fact that sɯ- is a reduplication of the following syllable (Jacques 2014: 180). I am hesitant to project such reduplication back to the common ancestor of Japhug, Tangut, and Chinese (which may have been a daughter of Proto-Sino-Tibetan rather than Proto-Sino-Tibetan itself).

Second, if the resemblance is genuine, it is probably due to inheritance rather than borrowing. Tangut-Chinese contact seems to predate the founding of the Tangut Empire by only a few centuries. I know of no Tangut borrowings predating Middle Chinese. In Middle Chinese, *Cɯ-saŋʔ became *sɨaŋˀ which in turn became *2son3 with a nasal vowel and later *2so3 with an oral vowel in the northwest. (I use 2- to indicate the 'rising' tone and -3 to indicate Grade III.) Tangut borrowings of Middle Chinese words with the rhyme of 想 have Tangut rhymes 53 -o3, 56 -on1, 57 -on2, and 58 -on4, not rhyme 40 -e' which is otherwise unknown in loans from Chinese (Gong 2002: 423-424). If 2621 is from Chinese - which I doubt - it is anomalous and must be a very early loan predating the shift of *-ɨaŋˀ to 2-on3:

Middle Chinese *sɨaŋˀ > pre-Tangut *sjaŋʔ-s (with added suffix) > Tangut 2se'4

One could try to make a similar argument for

2192 1me'4 < *mjaŋ-ʔ 'corpse' and 0781 ~ 0788 2me4 < *mjaŋ-s 'to die'

as borrowings from Late Old Chinese 亡 *mɨaŋ (< Old Chinese *Cɯ-maŋ) 'to disappear, die', but a pre-Middle Chinese loan is even less likely than an early Middle Chinese loan. I prefer to derive those words from *Cɯ-maŋ-ʔ and *Cɯ-maŋ-s and view them as true cognates of Old Chinese *Cɯ-maŋ.

3.7.0:28: Borrowings of basic words like 'to die' are extremely unlikely when contact is minimal (and of course impossible when contact is nonexistent).

15.3.5:23:55: 'PRIME'-'EIGHT' PROBLEMS

Last night I forgot to include two things in my overview of the development of a-rhymes in Tangut.

First, I wrote nothing about rhymes with the mysterious attribute that I call 'prime': i.e., -a' and -ar'. I write the pre-Tangut source of 'prime' as *X. I arbitrarily write it after vowels, but I really don't know where *X was positioned.

At a glance, it seems that *aX-rhymes developed like *a-rhymes apart from the development of 'prime': e.g.,

4629 2ghi'4 < *Cɯ-KaX 'to cook'

is parallel to

4513 2dzi4 < *Nɯ-dza-s 'to eat, drink; food'

This afternoon I thought that perhaps *X was *ʔ, and that all cases of the 'rising' tone go back to *-s:

Before: *-ʔ(-s) and *-s > tone 2

Then: *-ʔ > -' + tone 1, *-ʔ-s > -' + tone 2, *-s > tone 2

But now I realize I would have to reconstruct awkward stop clusters with glottal stop in words like

3192 1la'1 < *lakX (*lakʔ?) 'thick'

which has a non-'prime' cognate

2700 1laq1 < *S-lak 'thick'.

A final stop blocked *a from raising. Japhug jaʁ < *laq 'thick' suggests that stop was velar or even uvular. (There is no Tangut-internal evidence for a distinction between velar and uvular codas. I could write the pre-Tangut coda as *-K.)

Maybe *lakX had a final geminate (*lakk) or a stop cluster without a glottal stop (*lak-t or *lak-p which assimilated to *lakk?).

I suspect that tense-rhyme words once had tense initials from *S-C-clusters: e.g., *llak 'thick'. (I write tense initials as geminates following hangul and romanization conventions for Korean.) There are no tense rhymes with 'prime': e.g., *-aq'. Perhaps there was a constraint against syllables of the type *kkakk in pre-Tangut.

Summing up my current view (leaving out presyllables, vowels, and retroflexion to focus on tones and 'prime'):

*-V(C) > 1-V

e.g., *S-lak > 2700 1laq1 'thick'

*-V(C)-s > 2-V

e.g., *Nɯ-dza-s > 4513 2dzi4 'to eat, drink; food'

*-Vʔ, *-V + sonorant + stop, *-V + stop cluster > 1-V'

e.g., *Cɯ-Kaʔ > 4629 1ghi'4 'to cook', *Cɯ-maŋX > 0330 1me'4 'dream' and *lakX > 3192 1la'1 'thick' (*X = *p, *t, *k, *ʔ)

*-Vʔ-s, *-V + sonorant + stop + -s, *-V + stop cluster + *-s > 2-V'

e.g., *Cɯ-ne-ʔ-s > 2518 2ne'4 'heart', *Cɯ-saŋʔ-s > 2621 2se'4 'to think' (cf. Old Chinese 想 *Cɯ-saŋʔ 'to think') and *rjakX-s > 0811 2ar'4 'day' (Forgot to add examples until 3.6.0:36!)
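
The four correspondences above can be restated as a minimal Python sketch. It assumes that the suffix *-s is always written with a hyphen and that every source of 'prime' is written as a final ʔ or X before that suffix. The function name tone_and_prime is hypothetical. The sketch ignores presyllables, vowels, and retroflexion, as the summary does.

```python
from typing import Tuple


def tone_and_prime(pre_tangut: str) -> Tuple[int, bool]:
    """Return (tone, has_prime) for a pre-Tangut form such as '*Cɯ-saŋʔ-s'."""
    form = pre_tangut.lstrip("*")
    tone = 1
    if form.endswith("-s"):      # suffix *-s conditions the 'rising' tone
        tone = 2
        form = form[:-2]
    # a final ʔ or X (the cover symbol for the source of 'prime') yields 'prime'
    has_prime = form.endswith(("ʔ", "X"))
    return tone, has_prime


for form in ["*S-lak", "*Nɯ-dza-s", "*Cɯ-maŋX", "*lakX", "*Cɯ-ne-ʔ-s", "*rjakX-s"]:
    print(form, tone_and_prime(form))
```

The sketch only classifies forms as they are spelled here; it cannot decide what *X actually was.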

Second, I didn't mention *a-rhymes with what I call 'primary yod' (following Bodman). *r has distinct reflexes before a-rhymes and *ja-rhymes.

Tangraph	Li Fanwen number	Gloss	Pre-Tangut (Jacques 2014)	Tangut (Gong)	Pre-Tangut (this site)	Tangut (this site)	External cognates
	1579	to get	*rja	1rjiʳ	*Cɯ-ra	1rir4	Written Burmese <ra> 'to get'
	4602	eight	*r-jat	1ʔjaʳ	*rjat	1ar4 [jaʳ]?	Classical Tibetan brgyad < *p-rjat 'eight'

Guillaume Jacques' distinction between *rj- and *r-j- corresponds to my *Cɯ-r- and *rj-. I prefer my reconstruction because his medial *-j- (projected backwards from Gong's Grade III/IV -j-) usually does not correspond to anything in other languages or Tibetan transcriptions of Tangut. I think Grade III/IV generally had other sources: e.g., *Cɯ-. However, I think yod is justified in pre-Tangut if it corresponds to yod in other languages: e.g., 'eight' (which was transcribed as rye in Tibetan). Moreover, external evidence points to *r- and not *j- as the initial of the Proto-Sino-Tibetan root for 'eight'.

15.3.4:23:48: FROM PRE-TANGUT TO TANGUT: A-RHYMES

Yesterday I derived

4513 2dzi4 [ndzi] 'to eat, drink; food'

from *CI-ndza with a high front vowel conditioning 'brightening' (raising and fronting) of *a to i4. I could have also reconstructed *NI-dza.

Today I would instead reconstruct *Nɯ-dza with *ɯ symbolizing an unstressed high vowel.

Until now I reconstructed high front and back presyllabic vowels to condition *a in different ways:

*Cɯ-Ca > Ca3/4 (the grade is dependent on the preceding consonant)

*CI-Ca > Ci3/4 (ditto)

However, I now partly follow Guillaume Jacques (2014) who derived Tangut -a from pre-Tangut *-a-stop combinations. Here is my version of the a-rows of his table 39 on p. 206:

Stage 0	Stage 1	Stage 2	Stage 3
*(Cʌ-)...-(r)a		1-i1/2
*(Cʌ-)...-(r)aʔ(-s)	*(Cʌ-...)-aH	2-i1/2
*(Cʌ-)...-(r)as
*(Cʌ-)...-(r)ap		*1-aʔ1/2	1-a1/2
*(Cʌ-)...-(r)at
*(Cʌ-)...-(r)ak
*(Cʌ-)...-(r)ap-s	*(Cʌ-)...-(r)aS	*2-aH1/2	2-a1/2
*(Cʌ-)...-(r)at-s
*(Cʌ-)...-(r)ak-s
*(Cʌ-)...-(r)ar(-ʔ/s)	*(Cʌ-)...-(r)ar(-H)	*1/2-ar1/2	1/2-ar1/2
*(Cʌ-)...-(r)aw(-ʔ/s)	?		1/2-o1/2
*(Cʌ-)...-(r)aj(-ʔ/s)			1/2-e1/2?
*(Cʌ-)...-(r)am(-ʔ/s)			1/2-on1/2
*(Cʌ-)...-(r)an(-ʔ/s)			1/2-an1/2?
*(Cʌ-)...-(r)aŋ(-ʔ/s)			1/2-o1/2
*Cɯ-...-a	*Cɯ-...-ɨa	1-i3/4
*Cɯ-...-aʔ(-s)	*Cɯ-...-ɨaH	2-i3/4
*Cɯ-...-as
*Cɯ...-ap	*Cɯ-...-ɨap	*1-aʔ3/4	1-a3/4
*Cɯ-...-at	*Cɯ-...-ɨat
*Cɯ-...-ak	*Cɯ-...-ɨak
*Cɯ-...-ap-s	*Cɯ-...-ɨaS	*2-aH3/4	2-a3/4
*Cɯ-...-at-s
*Cɯ-...-ak-s
*Cɯ-...-ar(-ʔ/s)	*Cɯ-...-ɨar	*1/2-ar3/4	1/2-ar3/4
*Cɯ-...-aw(-ʔ/s)	?		1/2-e3/4?
*Cɯ...-aj(-ʔ/s)
*Cɯ-...-am(-ʔ/s)			1/2-on3/4
*Cɯ-...-an(-ʔ/s)			1/2-an3/4?
*Cɯ-...-aŋ(-ʔ/s)			1/2-e3/4
*(Cʌ)-(r)aˠm(-ʔ/s)			1/2-a1/2?
*(Cʌ)-(r)aˠŋ(-ʔ/s)

Notes on stage 0

1. I assume the pre-Tangut coda inventory was similar to that of Old Chinese.

2. I assume that Tangut tones originated from final segments as in Old Chinese.

3. Unlike Old Chinese, pre-Tangut had a velarized *aˠ. This vowel only had distinct reflexes before *-m and *-ŋ.

4. I assume that *-ʔ and *-s were suffixes after consonants. In other words, there were no roots ending in two consonants. (I could be wrong if, for instance, *-nʔ was from an earlier root-final *-nTV, etc.)

5. I assume *-ʔ could not occur after stops: e.g., there was no *-k-ʔ, etc.

6. Some *-w are third person patient suffixes: e.g., *Nɯ-dza-w, the stem of 'I eat it' and 'thou eats it'. That stem was later written as

4547 1dzo?

whose grade could be 3 or 4.

Notes on stage 1

7. *-ʔ and *-h merged into *-H. This merger has no parallel in Chinese.

8. *-p-s, *-t-s, and *-k-s merged into a sibilant *-S that could have been [ts]. This is similar to the merger of *-p-s and *-t-s (but not *-k-s!) in Old Chinese.

9. *Cɯ- conditioned the partial raising of *a to -ɨa.

Notes on stage 2

10. Presyllables might have been gone by this point. They were certainly gone by stage 3 (unless the preinitials in the Tibetan transcriptions of Tangut represent presyllables in a conservative Tangut dialect).

11. Reflexes of *a developed Grade I, whereas reflexes of *ra developed Grade II.

12. *(r)a raised to i1/2 unless followed by a consonant.

13. *ɨa raised to i3/4 (the grade is dependent on the preceding consonant) unless followed by a consonant. The presence or absence of *-r- made no difference before *ɨa.

14. There was a chain shift:

*-S > *-H > *-Ø

Stage 1 *-H conditioned tone 2 (possibly breathy voice at this point?) and disappeared.

A new stage 2 *-H from *-S blocked the raising of *a and *ɨa. Syllables with this *-H developed tone 2.
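
The ordering in the chain shift can be made concrete with a short Python sketch. It models only the Grade I/II a-rows of the table for four kinds of stage 1 coda (none, *-H, *-S, and a stop). The function names are hypothetical, and the change of stops to *-ʔ at stage 2 is read off the *1-aʔ entries in the table.

```python
STOPS = {"p", "t", "k"}


def stage2_coda(stage1_coda: str) -> str:
    """Map a stage 1 coda to its stage 2 reflex."""
    if stage1_coda in ("", "H"):
        return ""                # old *-H is lost before *-S becomes the new *-H
    if stage1_coda == "S":
        return "H"               # new *-H from *-S
    if stage1_coda in STOPS:
        return "ʔ"               # stops appear as *-ʔ at stage 2
    raise ValueError("coda not covered by this sketch")


def stage3_rhyme(stage1_coda: str) -> str:
    """Return tone and vowel at stage 3 for stage 1 *a plus the given coda."""
    tone = 2 if stage1_coda in ("H", "S") else 1
    # any consonant still present at stage 2 blocks the raising of *a to i
    vowel = "a" if stage2_coda(stage1_coda) else "i"
    return f"{tone}-{vowel}"


for coda in ["", "H", "S", "t"]:
    print(repr(coda), stage3_rhyme(coda))
```

If the two steps applied in the opposite order, *-S and old *-H would merge and 2-a could not be kept apart from 2-i.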

Notes on stage 3

15. All codas were lost. What appear to be codas in the transcription represent vowel qualities:

Stage 2 -r is [r], whereas stage 3 -r represents vowel retroflexion [ʳ].

Stage 3 -n represents nasalization [˜].

16. Nearly all syllables with (pre)initial r- developed retroflex vowels by stage 3:

*ra > rir

*rCa > Cir

For simplicity this retroflexion is not included in the stage 3 column.

17. The monophthongization of *-aw, *-aj, *-am, and *-aŋ was complete by stage 3, but I don't know if it occurred at stage 1 or 2. A close examination of the layers of Chinese borrowings may clarify the relative chronology of sound changes in Tangut.

15.3.3:17:09: FINE DINING: GRADING TANGUT WORDS FOR EATING

In my last post, I mentioned

4517 1dzi3 'to eat'

as an example of a basic word which should have been in the 'level' tone volume of Tangraphic Sea but was actually in Mixed Categories. All tangraphs with initial dz- were placed in Mixed Categories due to a massive error by the compilers of the first two volumes.

The word is not only noteworthy for its location in Tangraphic Sea but also for its unusual initial-rhyme combination. Class VI initials (i.e., alveolar sibilants) and Class IX z- usually precede Grade I and Grade IV rhymes. So in theory there could be a 1dzi1 and a 1dzi4, but not a 1dzi2 or 1dzi3. However,

4517 1dzi3 'to eat' and its homophones 0382 and 4912

have the Grade III rhyme 1.10 instead of the expected Grade IV rhyme 1.11 in

0943 1110 2696 3259 4829

which are in a separate homophone group in both Mixed Categories and Homophones.

Moreover, the 'rising' tone cognate of 4517 1dzi3 has the Grade IV rhyme 2.10!

4513 2dzi4 'to eat, drink; food'

Another possible Grade IV cognate is

4581 2dzi4 'to entertain at a banquet'

The only other case of Grade III/IV alternation following the same initial that I can think of is

3408 1tsa3 'to broil, roast' ~ 0618 1tsa4 'hot'

Both 'to eat' and 'hot' have external cognates ending in -a(t):

'to eat': Written Tibetan za-ba < *dz- 'to eat', Japhug ndza 'to eat'; more here
'hot': Written Tibetan tsha < *ts- 'hot', tshad-pa < *tsat- 'heat'; more here

Why did *a(t) develop into four different rhymes in Tangut? Normally *-a rose to -i1 unless it was preceded by the raising prefix *CI- or followed by a stop coda *-p/-t/-k (most likely a suffix *-t in the case of 'hot'):

*CI-ndza-H > *2dzi4 'to eat' (*ndza would have become *1dzi1)

but the *-H-less 'level' tone counterpart is 1dzi3, not 1dzi4!

*tsa-t > 1tsa4 'hot' (*tsa would have become *1tsi1)

How can the anomalous Grade III forms be explained? Do they have some rare prefix? Are they borrowings from another dialect that had undergone different changes? It is unlikely that a basic word like 4517 'to eat' could be borrowed. Could they be archaisms? I used to reconstruct Grade III with central -ɨ- and Grade IV with front -i-. I thought *-ɨ- fronted to -i- after alveolar sibilants, but perhaps 4517 'to eat' and 3408 'to broil' had not undergone that fronting:

*CI-ndza > *CI-ndzɨa > *CI-ndzɨi > 1dzi3 [ndzɨi] (cf. 2dzi4 [ndzi] 'to eat, drink; food')

*tsa-t > *tsɨat > 1tsa3 [tsɨa]? (cf. 1tsa4 [tsia]? 'hot')

Lastly, is the rhyme of the o-stem of 4517 'to eat' Grade III like 4517 or IV like 4513?

4547 1dzo? 'to eat' = 4517 1dzi3 + 5376 1tso4

I used to think there was no phonemic distinction between -o3 and -o4, and I automatically reconstructed -o4 after alveolar sibilants. So I once reconstructed 4547 as 1dzo4 like

5854 1dzo4 'to rein in; to tie or strap something tightly' = 4829 1dzi4 + 5848 1tsho4

However, 4547 and 5854 have different fanqie (see above) in Mixed Categories implying they weren't homophonous, even though both editions of Homophones have them in the same homophone group. Moreover, their fanqie final spellers are in the same chain:

5376 1tso4 < 4839 1so4 > 5854 1dzo4

so any distinction between them cannot be in their final vowels. My guess is that the fanqie for 4547 and 5854 are to be interpreted as

1dzi3 + 1tso4 = 1dzo3 [ndzɨo]? (with the Grade III medial -ɨ- of 1dzi3 [ndzɨi]?)

1dzi4 + 1tsho4 = 1dzo4 [ndzio]?

Grade III -ɨ- might have fronted to Grade IV -i- in the speech of the compiler(s) of Homophones.
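
The guessed interpretation of the two fanqie can be sketched in Python. The sketch assumes that a reading consists of a tone digit, an initial, a vowel from a, e, i, o, u, and a grade digit, and that the grade is taken from the initial speller. That last assumption is the guess made above, not an established rule of fanqie. The names parse and fanqie are hypothetical.

```python
import re

# tone digit, initial consonant(s), vowel(s), grade digit
READING = re.compile(r"([12])([^aeiou\d]+)([aeiou]+)([1-4])")


def parse(reading: str):
    """Split a reading such as '1dzi3' into (tone, initial, vowel, grade)."""
    match = READING.fullmatch(reading)
    if match is None:
        raise ValueError(f"reading not covered by this sketch: {reading}")
    return match.groups()


def fanqie(initial_speller: str, final_speller: str) -> str:
    """Combine two spellers, taking the grade from the initial speller."""
    _, initial, _, grade = parse(initial_speller)
    tone, _, vowel, _ = parse(final_speller)
    return f"{tone}{initial}{vowel}{grade}"


print(fanqie("1dzi3", "1tso4"))   # 1dzo3
print(fanqie("1dzi4", "1tsho4"))  # 1dzo4
```

Under this reading the two final spellers contribute the same tone and vowel, so the only difference between 4547 and 5854 comes from the grade of the initial speller.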

15.3.3:1:18: TOP 25 TANGRAPHS IN THE TANGUT TRANSLATION OF THE ART OF WAR

Last week, I asked,

Was it [the Mixed Categories volume of Tangraphic Sea] a compilation of characters that were accidentally left out of the other two volumes, or do its characters have something else in common?

If Mixed Categories were an appendix, I would not expect three of its tangraphs to be among the top 25 tangraphs in the Tangut translation of The Art of War. The table below incorporates frequency data from Kotaka (2009: 2):

Rank	Frequency	Tangraph	Li Fanwen number	Tangraphic Sea volume	Golden Guide	Reading	Gloss
1	622		1531	1	✓	1ga4	army
2	442		1822	1	X	1ngwu'1	to say
3	389		5113	1	✓	1vi3	to do
4	297		1542	1	✓	1ku1	therefore
5	282		1918	1	✓	1mi4	not
6	278		3583	1	✓	1ta4	topic marker
7	231		4916	1	✓	1ghwe1	to fight
8	220		2627	2	✓	2lyq3	earth
9	219		0508	2	✓	2ngwu1	to be
10	214		2541	MC	✓	2dzwo4	person
11	183		1278	2	X	2y4	to say
12	180		1326	1	✓	1ky4	perfective prefix
13	179		1045	2	✓	2daq1	speech
14	176		3844	MC	✓	1jeq3	to go
15	175		1139	1	X	1e4	genitive-dative suffix
16	174		3951	1	✓	1thu1	to talk
17	170		0089	1	✓	1chha'3	on
18	167		5388	2	✓	2bo'1	transcription tangraph
19	163		4962	2	X	2vi1	the surname Vi
20	157		1142	2	✓	2li3	the surname Li
21	153		1245	1	✓	1e4	self
22	149		3045	1	✓	1tshew1	transcription tangraph
23	145		0144	1	X	1tshwan4	whole
24	144		2937	MC	✓	2lheq4	country
25	143		2805	2	X	2bu'4	to command
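
A quick tally of the volume column can be done in Python. The list below copies the Tangraphic Sea volume of ranks 1 to 25 from the table in order; the variable names are only illustrative.

```python
from collections import Counter

# Tangraphic Sea volume for ranks 1-25, copied from the table ("MC" = Mixed Categories)
volumes = [
    "1", "1", "1", "1", "1", "1", "1", "2", "2", "MC",
    "2", "1", "2", "MC", "1", "1", "1", "2", "2", "2",
    "1", "1", "1", "MC", "2",
]

tally = Counter(volumes)
print(tally["1"], tally["2"], tally["MC"])  # 14 8 3
```

Three of the 25 most frequent tangraphs are in Mixed Categories.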

Why would 2dzwo4 'person', 1jeq3 'go', and 2lheq4 'country' be accidentally omitted from the first two volumes of Tangraphic Sea along with all other dz-, j-, and lh-tangraphs*: e.g., basic words such as

4517 1dzi3 'to eat', 0443 1jo3 'long', 2814 2lheq4 'moon, month'

which were in Mixed Categories?

The only scenario I can conceive is this: When the Tangraphic Sea was compiled, tangraphs were sorted by initials and were then sorted by rhyme. Somebody forgot to grab the lists of dz-, j-, and lh-tangraphs when sorting by rhyme, and those tangraphs and scattered others that were accidentally omitted were listed in Mixed Categories.

Why was Mixed Categories ordered by initial class rather than by rhyme? Tonight I realized that consistency might not always have been a good thing. Dividing only 558+ tangraphs into 183 rhyme categories (97 'level' tone rhymes and 86 'rising' tone rhymes) instead of 18 categories (nine initial classes per tone) would have made Mixed Categories difficult to navigate, would have resulted in blank sections under certain rhymes (e.g., 'level' tone rhymes 96 and 97 which are unrepresented in Mixed Categories), and would have scattered the dz-, j-, and lh-tangraphs across the volume instead of conveniently concentrating them in six sections (the Class VI, VII, and IX sections under the 'level' and 'rising' tones).
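
The arithmetic behind that comparison can be checked with a few lines of Python. The figure 558 is treated as a lower bound, since the count is given as 558+; the variable names are only illustrative.

```python
level_rhymes = 97
rising_rhymes = 86
initial_classes = 9
tones = 2
tangraphs = 558  # "558+": treated here as a lower bound

rhyme_categories = level_rhymes + rising_rhymes   # sorting by rhyme
class_categories = initial_classes * tones        # sorting by initial class

print(rhyme_categories, class_categories)              # 183 18
print(round(tangraphs / rhyme_categories, 2))          # 3.05
print(tangraphs / class_categories)                    # 31.0
```

Sorting by rhyme would leave about three tangraphs per category on average, whereas sorting by initial class gives about thirty-one.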

*Arakawa (1997: 22, 32, 126, 128) reconstructed the Precious Rhymes of the Tangraphic Sea 'rising' tone volume tangraphs

4781 5919 4983

as 2dzi, 2dzi, and 2ja:, but I follow Gong and transcribe them as 2tshi1, 2tshi1, and 2zha3, so they are not exceptions to the rule that dz- and j-tangraphs are in Mixed Categories.

Arakawa (1997: 32, 128) listed

4876

as a homophone of 4983 (his 2ja:) outside Mixed Categories, but it is in fact in Mixed Categories (he listed it again as a Mixed Categories tangraph on p. 96 but not p. 131) and has a completely different tone and rhyme (1jy3).

I used to think

5780 2lhi2

was a rare example of an lh-tangraph in the 'rising' tone volume of the Precious Rhymes of the Tangraphic Sea, but it is a fanqie initial speller for zh-, not lh-, so I now transcribe it as 2zhi2.

15.3.1:2:11: ?-HEARTED GIRL FIGHTER?

I was puzzled by the Thai title

นักรบสาวหัวใจมหากาฬ

nak rop saaw hua cay mahaakaan

lit. '-er fight girl head heart ?' = '?-hearted girl fighter'?

for Brave.

Mahaakaan is spelled <mahākāḷ> and seems to ultimately* be from Sanskrit mahākāla-, lit. 'great-black', originally a form of Shiva in Hinduism and later a dharma defender in Vajrayana Buddhism.

The Royal Institute Dictionary defines mahaakaan as a drug or as a plant (Gynura pseudochina).

None of those definitions seem to fit the context of Brave. Would a Scottish princess have the heart of Mahakala, who isn't associated with Thai Buddhism? I would expect something like 'brave' modifying hua cay 'heart'.

The Vietnamese title of Brave has no semantic challenges for me:

Công chúa tóc xù 'Princess Bushy Hair'

However, xù 'bushy' has an unexpected combination of x- (normally < *cʰ-) and a lower series tone in a native word. x- with lower series tones in Sino-Vietnamese (e.g., 蛇 xà) comes from Late Middle Chinese *tɕʰ- < *(d)ʑ-. No such devoicing with aspiration occurred in Vietnamese. I could mechanically reconstruct an earlier voiced aspirate *ɟʱ- to account for the tone of xù 'bushy', but I wonder if the actual source of x- plus lower series tones in native words could be *cʰ- with a voiced prefix and/or *ɟ- with a voiceless prefix conditioning aspiration.

*The retroflex letter ฬ <ḷ> is due to influence from กาฬ kaan <kāḷ> from Pali kāḷa- 'black' which in turn is from Sanskrit kāla- with a dental l-. I don't understand why "[d]ental and retroflex sounds sporadically change into one another" in Pali.

The Pali Text Society's Pali-English Dictionary (1921-1925) does not list a *mahākāḷa- with a retroflex ḷ corresponding to Sanskrit mahākāla- with a dental l.