Amaravati: Abode of Amritas

12.11.28.23:50: MYSTERIES OF THE MESSENGERS' SCRIPT

It's been twenty years since I learned my first Brahmic script* (देवनागरी Devanagari). Within three years, I learned the Tibetan, Thai, Burmese, and Khmer scripts. And from time to time I dabble in the other scripts of the subcontinental region. Lately I've been exploring the ಕನ್ನಡ ಅಕ್ಷರಮಾಲೆ Kannada script thanks to Robinson Mason. But I never got around to closely looking at any insular Southeast Asian script until I recently installed RS Wihananto's free Tuladha Jejeg font for Javanese Hanacaraka 'there were [two] messengers'.

I think of Brahmic scripts as being almost isomorphic: i.e., having a common structure masked by very different characters. (There are many exceptions to this generalization; hence "almost".) So I came to Javanese with a lot of preconceptions. Javanese in Unicode generally fits the pattern I'm used to after two decades with a few exceptions:

1. There are two characters for word-initial (i.e., 'independent') i-: <i kawi> and <i>. What is now <i> used to represent long [iː], and modern <ī> did not exist:

Phonetic value	[i]	[iː]
Earlier Javanese	<i kawi>	<i>
Modern Javanese	<i>	<ī>

How did this shift occur? I don't know, but I suspect the fact that modern Javanese has no phonemic vowel length was a factor in the use of <i> for both short [i] and long [iː] in word-initial position.

2. At first glance there seemed to be no characters for Sanskrit word-initial ū- and au-, but it turns out they are encoded as sequences:

<ū> = <u> + <ā>

<au> = <o> + <ā>

(I assume that the long "version" of <o> in The Unicode Standard, Version 6.2 refers to <au>. Sanskrit o [oː] is from short *au and Sanskrit au is from long *āu.)

3. I was similarly surprised by the apparent absence of a character for word-initial ṝ- given the presence of characters for the equally rare word-initial syllabic liquids ḷ- and ḹ- (none of the three ever appears in any real Sanskrit words and I doubt they are in any Javanese words), but it too is encoded as a sequence:

<ṝ> = <ṛ> + <ā>

4. What is the difference between <ra> and <ra> agung ('noble ra')? Both Sanskrit and modern Javanese only have one kind of /r/.

5. Why is the dependent (i.e., noninitial) vowel character <o> listed after <ā> instead of after <ai>?

6. Although The Unicode Standard, Version 6.2 states that

Vocalic liquids (ṛ and ḷ) are treated as consonant letters in Javanese; they are not independent vowels with dependent vowel equivalents, as is the case in Balinese or Devanagari.

there is a dependent vowel <ṛ> listed in addition to independent (i.e., word-initial) <ṛ>. Was dependent <ṛ> an option in noninitial position? Conversely, the absence of dependent <ṝ ḷ ḹ> implies that independent <ṝ ḷ ḹ> were obligatory in all positions including those where I'd expect dependent characters.

(11.29.8:20: Kuipers and McDermott 1996: 478 do list a dependent form of <ḷ>, but there is no special codepoint for it in Unicode; it is encoded as <pangkon> (= Sanskrit virāma) + <ḷ>. K&M's dependent <ṛ> is similarly encoded as <pangkon> + <ṛ> and is not the same as the dependent <ṛ> in Unicode, which has its own codepoint, U+A9BD. What is the difference between the two kinds of dependent <ṛ>?)

7. Not an encoding question, but ... how did r and l swap positions in lontar 'palm leaf manuscript' from ron 'leaf' and tal 'palm'?
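The sequences in points 2 and 3 can be spelled out as code points. This is a minimal sketch, not a statement of what the standard prescribes. It assumes that <u>, <ṛ>, and <o> are U+A988, U+A989, and U+A98E, and that <ā> is the vowel sign tarung, U+A9B4; those mappings are a reading of the Unicode Javanese block, and the names SEQUENCES and describe are made up for illustration.

```python
import unicodedata

# Assumed mappings from the transliterations used above to code points
# in the Unicode Javanese block (U+A980..U+A9DF).
LETTER_U = "\uA988"         # independent <u>
LETTER_PA_CEREK = "\uA989"  # independent <ṛ>
LETTER_O = "\uA98E"         # independent <o>
TARUNG = "\uA9B4"           # dependent sign assumed to be <ā>

# Each "long" independent vowel is a two-code-point sequence.
SEQUENCES = {
    "ū": LETTER_U + TARUNG,
    "au": LETTER_O + TARUNG,
    "ṝ": LETTER_PA_CEREK + TARUNG,
}

def describe(seq):
    """List each code point with its Unicode name, if the local database has one."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in seq]

for translit, seq in SEQUENCES.items():
    print(translit, describe(seq))
```

Printing the names from the local Unicode database is a quick way to check the assumed mappings against whatever version of the standard the Python build ships with.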

*I had read about Brahmic scripts - mostly Siddham - before that, but an abstract knowledge is not the same as reading and writing on a regular basis.

12.11.26.23:59: NASAL CODAS AS CLUES FOR THE STRATIFICATION OF CHINESE LOANWORDS IN RONGHONG QIANG

If I had read Huang and LaPolla's (1996) A Grammar of Qiang from beginning to end, I wouldn't have thought I was being original when I proposed in "Two Strata of Nasals in Tangut?" that Ronghong Qiang (RHQ) lɑ 'wolf' was an early loan from Chinese 狼 *lɑŋ because it had lost its final nasal. I later found the page (46) on which H&L made that very point:

There are in fact two or more layers of loans from Chinese, as there are older, harder to identify loans, such as /lup/ 'radish' (< Chinese luóbo [luopuo]) and /lɑ/ 'wolf' (< Chinese láng), and newer, more transparent loans, such as /kuntʂhɑntɑn/ 'communist party' (< Chinese gòngchǎndǎng [kʊŋtʂhɑntɑŋ]). As shown by Sun (1988), there are differences in the phonology and use between the old and the new loans.

I wish I had seen Sun 1988 too. I wouldn't be surprised if I'm reinventing the wheel below.

I assume that RHQ -n for standard Mandarin -ng indicates a middle stratum loan, whereas RHQ -ŋ indicates a late stratum loan (or at least a less assimilated one).

Early stratum: borrowing before loss of codas

Chn 麻糖 *mathɑŋ > pre-RHQ *mathɑŋ > RHQ mɑthɑ 'candy'

(with the first vowel backing due to vowel harmony; see H&L 1996: 35)

or borrowing after loss of codas but before development of new codas: i.e., during a period when pre-RHQ only had open syllables:

Chn 麻糖 *mathɑŋ > pre-RHQ *mathɑ > RHQ mɑthɑ 'candy'

Middle stratum: borrowing after development of new codas:

Chn 白糖 *pethɑŋ > RHQ pethɑn 'candy'

(with the same Chn morpheme 糖 'sugar' as 'candy' above)

(I presume pe was the local Mandarin cognate of standard bai [pai] 'white'.)

Mandarin -ng was RHQized as -n because -ŋ may be a rare coda in native words due to its low frequency as an initial: ŋ- only occurs before u (H&L 1996: 23), so native -ŋ words would either be from compounds with *ŋu(C) or words that once had *ŋ- and lost it after compounding:

*X-ŋV > Xŋ

*ŋV (V ≠ *u) > (C)V (C ≠ ŋ)

Late stratum: borrowing by bilinguals who have no trouble with -ŋ:

Chn 池塘 *tshɨthɑŋ > RHQ tshəthɑŋ 'pond'

(I presume tshɨ is the local Mandarin cognate of standard chi [tʂhɨʳ] 'pond'. RHQ has no ɨ; ə is its only central vowel.)

This stratum may only slightly postdate the middle stratum because the latter is already clearly modern; kuntʂhɑntɑn 'Communist Party' cannot be more than 90 years old (and I doubt any RHQ speakers had heard of the Communist Party of China when it was founded in 1921). Are there any doublets? Would younger speakers say kuŋtʂhɑntɑŋ?
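The three-way split above amounts to a simple rule of thumb, sketched here under the assumption that the Chinese source of the loan is already known to have ended in -ng. The function name classify_stratum and the EXAMPLES table are hypothetical; the forms and glosses are the ones cited above. The rule only looks at the last segment, so it cannot tell the two early-stratum scenarios apart.

```python
def classify_stratum(rhq_form: str) -> str:
    """Guess the stratum of an RHQ loan whose Chinese source ended in -ng.

    Surface heuristic only: it inspects the final segment of the RHQ form.
    """
    if rhq_form.endswith("ŋ"):
        return "late"    # -ŋ kept: borrowed by bilinguals
    if rhq_form.endswith("n"):
        return "middle"  # -ng RHQized as -n
    return "early"       # nasal lost altogether

# Forms and glosses as cited in this post.
EXAMPLES = {
    "lɑ": "wolf",
    "mɑthɑ": "candy",
    "pethɑn": "candy",
    "kuntʂhɑntɑn": "Communist Party",
    "tshəthɑŋ": "pond",
}

for form, gloss in EXAMPLES.items():
    print(f"{form} '{gloss}': {classify_stratum(form)}")
```

A doublet such as the hypothetical kuŋtʂhɑntɑŋ would come out as "late" under this rule, which is the kind of contrast the questions above are asking about.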

11.27.3:57: My scenario cannot account for RHQ jy keʴ 'fishing rod' from Md 魚竿 yugan [jykan]. I would have expected RHQ *jy ka(n). Could the retroflexion of jy keʴ have been an attempt to imitate a Chinese -n at a time when RHQ had no codas? I am reminded of section 3 of my earlier post about Chinese nasals as a possible source of Tangut vowel retroflexion.

12.11.25.23:27: A MODEL FOR COINCIDENTAL CODAS

In "A Cloudy Complication", I proposed that nasals and nasalized vowels in Qiangic words for 'cloud' were the results of compounding that restored a nasal that had been lost:

*-Vm > *-V > *-V-mV > -Vm ~ -Ṽ

How often would nasals be restored? Here's an extremely simplistic model of Qiangic-style codas. Suppose Pre-Quasi-Qiang (PQQ) had Benedict's (1972) 23 simple Proto-Tibeto-Burman onsets as presented in Matisoff (2003: 15) plus ʔ- (see Matisoff 2003: 11).

The 24 PQQ onsets

*p-	*t-	*ts-	*tś-	*k-	*ʔ-
*b-	*d-	*dz-	*dź-	*g-
		*s-	*ś-		*h-
		*z-	*ź-
*m-	*n-		*ń-	*ŋ-
*w-	*l-	*r-	*j-

PQQ had Matisoff's (2003: 237) eleven Proto-Tibeto-Burman codas plus -Ø.

The 12 PQQ codas

*-p	*-t			*-k	*-Ø
*-m	*-n			*-ŋ
	*-s
*-w	*-l	*-r	*-j

If every PQQ syllable had one of five vowels (*a, *i, *u, *e, *o - the core of Matisoff's 2003 Proto-Tibeto-Burman vowel inventory), there were 24 x 5 x 12 = 1,440 possible PQQ syllables.

It is unlikely that all 1,440 would actually be attested, but let's suppose they are, and that each syllable corresponds to a distinct root (i.e., no homophony - also unlikely).

If all roots could combine in any order with any other root (extremely unlikely), there would be 1,440 x 1,440 = 2.0736 million possible compounds. (Of course no real language has that many words in use, compound or otherwise.)

How many compounds would have 'restored' nasality in Quasi-Qiang (QQ) which lost all original codas and gained new codas through compounding? There would be 24 x 5 x 3 = 360 roots with historical nasal codas (*CVN) and 4 x 5 x 12 = 240 roots with nasal initials (*NVC). 360 times 240 equals 86,400 which is 4.2% of 2.0736 million (= the number of possible *CVC-*CVC compounds) and 16.7% of 518,400 (= 360 x 1,440, the number of possible *CVN-*CVC compounds).

If QQ dialect A developed nasal vowels from compounds with nasal-initial second syllables, there would be a one out of six chance of pseudohistorical nasality: e.g., words like QQA dẽ 'cloud' seemingly reflecting the final nasal of its PQQ root *dem but actually from a late PQQ compound *de-NVC after the loss of PQQ *-m.
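The figures for dialect A can be checked by enumerating the toy inventory directly. This is only a sketch of the model as stated above, with every onset, vowel, and coda equally likely; the variable names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# The 24 onsets, 5 vowels, and 12 codas of the toy PQQ inventory ("" = no coda).
ONSETS = ["p", "t", "ts", "tś", "k", "ʔ", "b", "d", "dz", "dź", "g",
          "s", "ś", "h", "z", "ź", "m", "n", "ń", "ŋ", "w", "l", "r", "j"]
VOWELS = ["a", "i", "u", "e", "o"]
CODAS = ["p", "t", "k", "", "m", "n", "ŋ", "s", "w", "l", "r", "j"]
NASAL_ONSETS = {"m", "n", "ń", "ŋ"}
NASAL_CODAS = {"m", "n", "ŋ"}

roots = list(product(ONSETS, VOWELS, CODAS))
n_roots = len(roots)                                   # 1,440
n_compounds = n_roots ** 2                             # 2,073,600

cvn = [r for r in roots if r[2] in NASAL_CODAS]        # *CVN roots: 360
nvc = [r for r in roots if r[0] in NASAL_ONSETS]       # *NVC roots: 240

restored = len(cvn) * len(nvc)                         # *CVN-*NVC compounds: 86,400
share_of_all = Fraction(restored, n_compounds)         # 1/24, about 4.2%
share_of_cvn = Fraction(restored, len(cvn) * n_roots)  # 1/6, about 16.7%

print(n_roots, n_compounds, len(cvn), len(nvc), restored)
print(share_of_all, share_of_cvn)
```

Using Fraction keeps the ratios exact, so the "one out of six" falls out as 1/6 rather than as a rounded percentage.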

There would be 120 PQQ roots ending in any given nasal and 60 roots beginning with that nasal. 120 times 60 times 3 equals 21,600 compounds of the types

*CVm + *mVC

*CVn + *nVC

*CVŋ + *ŋVC

which is 1% of 2.0736 million (= the number of possible *CVC-*CVC compounds) and 4.2% of 518,400 (= 360 x 1,440, the number of possible *CVN-*CVC compounds).

If QQ dialect B retained the onsets of second syllables of compounds as codas, there would be a one out of twenty-four chance of a pseudohistorical nasal coda: e.g., words like QQB dem 'cloud' seemingly reflecting the final nasal of its PQQ root *dem but actually from a late PQQ compound *de-mVC after the loss of PQQ *-m.
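The dialect B figures can be checked the same way, this time counting only compounds in which the lost coda and the following onset are the same nasal. As before, this is a sketch of the evenly distributed toy model, and the names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Same toy PQQ inventory as in the model above ("" = no coda).
ONSETS = ["p", "t", "ts", "tś", "k", "ʔ", "b", "d", "dz", "dź", "g",
          "s", "ś", "h", "z", "ź", "m", "n", "ń", "ŋ", "w", "l", "r", "j"]
VOWELS = ["a", "i", "u", "e", "o"]
CODAS = ["p", "t", "k", "", "m", "n", "ŋ", "s", "w", "l", "r", "j"]
SHARED_NASALS = ["m", "n", "ŋ"]  # nasals that occur both as coda and as onset

roots = list(product(ONSETS, VOWELS, CODAS))
n_roots = len(roots)

onset_counts = Counter(onset for onset, _, _ in roots)  # 60 roots per onset
coda_counts = Counter(coda for _, _, coda in roots)     # 120 roots per coda

# Compounds of the types *CVm + *mVC, *CVn + *nVC, *CVŋ + *ŋVC.
matching = sum(coda_counts[n] * onset_counts[n] for n in SHARED_NASALS)  # 21,600

n_cvn = sum(coda_counts[n] for n in SHARED_NASALS)       # 360 *CVN roots
share_of_all = Fraction(matching, n_roots ** 2)          # 1/96, about 1%
share_of_cvn = Fraction(matching, n_cvn * n_roots)       # 1/24, about 4.2%

print(matching, share_of_all, share_of_cvn)
```

Note that ń- never contributes here, since the model has no coda *-ń for it to mimic.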

This model is unrealistic because it assumes a completely even pattern of distribution: each (P)QQ consonant is as likely as any other. Does any real language exhibit such a pattern? jh is roughly 1/200 as common as k in Sanskrit continuous texts (Whitney 1924: 26) and only four Sanskrit roots have jh- (Whitney 1885: 15, 57-58): ujh, jhaṭ, jhaṇ, jhar. Phonological statistics is a field ripe for exploration.

In any case, I predict that pseudohistorical nasality is possible though rare in Qiangic and Tangut. But if there are many words like 'cloud' with nasality corresponding to historical nasal codas preserved elsewhere in Sino-Tibetan, then that cannot be due to chance, and my hypothesis is wrong.

ADDENDUM: What if nasal loss postdated compounding and final syllables (with certain onsets: e.g., consonants homorganic with the preceding coda?) blocked nasal codas from being lost?

*CVN-CVC > *CVNC > CVN ~ CṼ

In this scenario, words like Tangut

2diẹ̃ 'cloud'

could be from, say, Pre-Tangut

*Sɯ-dem-PVH > *diẹmPH > *diẹmH > *2diẹm

whereas a hypothetical *Sɯ-dem would have become *2diẹ without nasality.

Another possibility is that original nasal codas could have (partly) survived if they were followed by suffixes: e.g., Pre-Tangut *Sɯ-dem-H became Tangut 2diẹ̃. Therefore there should not be any Tangut 'level' (i.e., first) tone syllables with nasal vowels from earlier root-final nasals since their Pre-Tangut sources would have lacked the *-H suffix that conditioned the 'rising' (i.e., second) tone. All Tangut 'level' tone syllables with nasal vowels would have to be loans or former compounds.

11.26.00:50: Pre-Tangut *-H in *Sɯ-dem-H 'cloud' may correspond to the glottal stop in Caodeng rGyalrong zndimʔ ~ zndəmʔ and Benzhen rGyalrong zdiɐmʔ 'cloud' in the STEDT database.