I rediscovered this article earlier today. I hadn't seen it in seventeen years. So much has changed since then. Ostapirat (2000) introduced the term 'Kra-Dai' as a replacement for 'Tai-Kadai'. I wonder what Graham Thurgood thinks of it. I haven't talked to him since about 1999.

Graham's Tai-Kadai tree (p. 362) has no Kra clade:

Lakkja-Kam-Sui-Tai-Be-Hlai-Laha Gelao-Lachi
Lakkja-Kam-Sui-Tai-Be (Kam-Tai)
Hlai-Laha Gelao

(The hyphenated names other than Tai-Kadai, Kam-Tai, and Kam-Sui are mine. I admit my coinages are unimaginative, long, and awkward, and I have no desire to promote them. Their sole function is to clearly indicate the members of each subgroup.)

He split the Kra languages between two different branches, grouping Laha with Hlai rather than with Gelao and Lachi (Ostapirat's 'Western Kra'). I don't know how he would classify Paha (Ostapirat's 'Central Kra') and Buyang and Pubiao (Ostapirat's 'Eastern Kra').

His subgrouping is similar to that of Edmondson and Solnit (1997: 2) as reproduced in Pittayaporn (2009: 5) with one exception: Laha is grouped with Gelao, Lachi, Buyang, and Pubiao as 'Geyang' (short for Gelao-Buyang; i.e., Ostapirat's Kra). Do Edmondson and Solnit consider Paha "to be part of the Buyang dialect cluster"?

I've reproduced Ostapirat's (2000: 1) subgrouping before, but here it is again with the branches rearranged for ease of comparison and with Kra subgrouping included:

Kam-Tai
Hlai
Kra
    Central-East Kra
        Central Kra: Paha
        Eastern Kra: Buyang, Pubiao
    Southwestern Kra
        Southern Kra: Laha
        Western Kra: Gelao

According to Graham's student Norquest (2007: 15), "Ostapirat (2005) treats Lakkja as part of Kam-Sui", and perhaps Ostapirat also did so in 2000 since Lakkja is not a separate branch in the above subgrouping.

Norquest (2007: 16) has another subgrouping similar to Ostapirat (2005) except that he regarded Lakkja as distinct from Kam-Sui:

Southern Kra-Dai
Northern Kra-Dai
Southwestern Kra-Dai
Southeastern Kra-Dai: Hlai
Northeastern Kra-Dai Northwestern Kra-Dai: Kra

I wonder if Graham agrees or disagrees with Norquest. They reconstructed Proto-Hlai very differently, and their reconstructions in turn don't look like Ostapirat's Proto-Hlai: e.g.,

Gloss   Thurgood      Norquest (2007)   Ostapirat
six     *nʔom A1      *hnɔm             *(ə)num A
eight   *ɮʔou A1      *ruː              *aRu A

Norquest's and Ostapirat's reconstructions for those two numerals are closer to Proto-Austronesian. Norquest (2007: 416) even explicitly derived his *hnɔm and *ruː from Proto-Austronesian, but his account, though detailed, leaves some questions unanswered:

- Why does Proto-Hlai have *-ɔm instead of *-əm? Is the rounding an innovation in the Austronesian source language?

- Why does voiced *r- condition odd-numbered tones normally indicating an earlier voiceless initial? Such odd-numbered tones are probably the reason behind Thurgood's *ɮʔ- with a voiceless glottal stop and Ostapirat's *aR- which might have been phonetically *ʔaʁ- (also with a voiceless glottal stop) and contracted to *ʔʁ-.

Similarity to Proto-Austronesian need not entail a genetic relationship. As Graham wrote, "The borrowing of numerals is rampant in Southeast Asia" (p. 360), and shared basic vocabulary with irregular correspondences reflects "intimate contact" between Tai-Kadai (= Kra-Dai) and Austronesian (p. 362). Shared basic vocabulary with regular correspondences, OTOH, would indicate a genetic relationship.

UNPREDICTABLE EMPHATIC VOICING IN SEMITIC

Long ago, I used to assume that Proto-Semitic consonants were like Arabic consonants minus some obvious innovations: e.g., *p > f. My notion of Proto-Semitic was like early Sanskrit-ish conceptions of Indo-European: too close to a prestigious, well-known daughter language. So I was surprised to see how different Arabic was from Proto-Semitic reconstructions. Huehnergard (2011) reconstructed all Proto-Semitic emphatics as voiceless and glottalized, yet some of their Arabic - and Ugaritic and Aramaic - reflexes seem to be voiced at random (bold):

Proto-Semitic *θʼ *tʼ *tsʼ *tɬʼ *kʼ *xʼ
Arabic ðˤ ~ q ħ
Ugaritic ɣ (?) or ðˤ (?*) (and ɣ?) ?
Aramaic *ʁˁ > ʕ ħ

(I have attempted to rewrite all consonants in IPA. Huehnergard did not include Ugaritic in his article. Ugaritic reflexes are from Wikipedia's Semitic and Ugaritic language articles, which conflict. Neither article mentions a Proto-Semitic emphatic velar fricative, which was also absent from an earlier version of Huehnergard's correspondence table that David Boxenhorn wrote about. 10.12.1:16: Is *xʼ a new proposal of Huehnergard's?)

Why are some emphatics more prone to voicing than others? Contrast that randomness with the regular voicing (or lack of voicing) of glottalized consonants in Indo-European: e.g.,

Proto-Indo-European *pʼ *tʼ *kʼ *kʷʼ
Sanskrit b d j j, g
Greek g b, d, g
Latin v, g(u)
Proto-Germanic *p *t *k *k(w)

10.12.1:01: Was Proto-Semitic *tɬʼ more likely to voice because its lateral component could easily shift to voiced *l? Proto-Semitic *θʼ has no lateral component but is acoustically similar to ɬʼ. But what about voiced pronunciations of q in Arabic dialects? Is the voiced pronunciation of <q> in Persian due to Arabic influence, or did earlier Persian q voice independently?

*10.12.0:39: According to Wikipedia's Ugaritic language article, which lists a ðˤ not in the Semitic languages article.

HEBREW : ARAMAIC ʕ

The unusual-looking correspondence between Hebrew and Ugaritic ɣ that I discovered yesterday led me to rediscover the even more unusual-looking correspondence between Hebrew and Aramaic ʕ*. I had first seen the latter nine years ago but never tried to figure out the history behind it until now. My guess is that Ugaritic ɣ and Aramaic ʕ are the products of similar processes:

Proto-Semitic *θˁ > Hebrew but Ugaritic *ðˁ > *zˁ > > ɣ (?**)

Proto-Semitic *ɬˁ > Hebrew but Aramaic*** *ɮˤ > > > ʕ

I have never understood why emphatics voiced in Semitic. There is no comparable phenomenon in Chinese emphatic theory.

Are emphatics more prone to backing? The nonemphatic counterparts of those consonants did not back to velars or pharyngeals:

Proto-Semitic *θ > Hebrew ʃ but Ugaritic θ

Proto-Semitic *ɬ > Hebrew ɬ > s but Aramaic s

*10.11.0:07: The example I stumbled upon last night was

Hebrew ere  'earth' : Aramaic araʕ

in the Wikipedia entry "Tsade".

**10.11.0:50: Wikipedia's Semitic languages article derived Ugaritic ɣ partly from Proto-Semitic *θˁ, whereas Wikipedia's Ugaritic language article derived ɣ partly from Proto-Semitic *ɬˁ. Which is correct?

***10.11.1:09: The -stage in Aramaic may have been attested:

The sounds *ġ [i.e., *ɣ] and *ḫ were always represented using the pharyngeal letters ʿ [i.e., ʕ] ḥ, but they are distinguished from the pharyngeals in the [Egyptian] Demotic-script papyrus Amherst 63, written about 200 BC. This suggests that these sounds, too, were distinguished in Old Aramaic language, but written using the same letters as they later merged with.

I don't know whether the ɣ in Amherst 63 was partly from Proto-Semitic *ɬˁ; Proto-Semitic is another potential source.

Is it possible that the author of Amherst 63 spoke a dialect of Aramaic that preserved consonantal distinctions lost in mainstream dialects? (I am reminded of the possibility that the dialect[s] of Tangut in Tibetan transcription may have been more conservative in some ways than the dialects preserved elsewhere.)

N-Sˁ-R

Today is Michael Netzer's birthday, and I celebrate it by looking at the root n-sˁ-r.

Wagner's (1998: 541) entry on Hebrew n-sˁ-r listed two related forms that struck me as unusual: Ugaritic n-ǵ-r 'protect, keep, beware' and Hebrew māsˁôr (translated by Sweeney [2000: 470] as 'siege-work, rampart').

First, I wouldn't expect Ugaritic ǵ to come from sˁ: the two consonants have nothing in common (apart from being consonants, of course). I suppose ǵ is equivalent to Wikipedia's ġ [ɣ]. According to Wikipedia, "Sometimes Ugaritic ġ [ɣ] corresponds to Proto-Semitic ṣ́ [ɬˤ]". So is n-ɣ-r (?) from Proto-Semitic *n-ɬˤ-r rather than *n-sˁ-r? Is this how *ɬˤ became ɣ?

*ɬˤ > *ɮˤ > > ɣ

Such a change is vaguely parallel to this shift in Spanish:

early fricativization of palatal /ʎ/ (from Vulgar Latin -LY-, -CL-, -GL-), first into palatal /ʒ/ and ultimately into velar /x/, e.g., filiu → hijo, *oc'lu → ojo, *coag'lare → cuajar

Second, the m- of māsˁôr does not match the n- of n-sˁ-r. Could māsˁôr and n-sˁ-r share a biliteral root *sˁ-r? Hecker (2007: 135) proposed such a root *sˁ-r 'narrow' as a source of māsˁôr and n-sˁ-r words (including an Ugaritic q-sˁ-r 'short' but not Ugaritic n-ɣ-r (?)). Although I can see how 'narrow' could become 'short' (both refer to a small degree of a dimension), 'narrow' doesn't seem to have anything to do with "visual observation [which] is the primary meaning of n-sˁ-r" (Wagner 1998: 541). If not for that visual aspect*, I could strain to link 'narrow' with 'protect' and the like:

'narrow' > 'confine to a narrow space' > 'surround' > 'protect'

Hecker's biliteral roots underlying triliteral roots remind me of Blust's CVC 'roots' underlying Proto-Austronesian CVCVC 'word-bases'. Blust (1988: 1) contrasts the use of the 'root' in Semitic, Indo-European, and Austronesian linguistics.

*I am reminded of Japanese mamor- 'protect' from ma- 'eye' + mor- 'protect'.

SAEK (PART 2)

I wish I knew what the Vietic Sách called themselves. As Chamberlain (1998: 102) wrote (if I understand him correctly), the fact that they share a name with their Tai Saek neighbors "cannot be accidental".

Chamberlain mentioned two glosses for Sách:

The name Sách in Vietnamese has been translated as 'division administrative équivalente au village' which according to Ngô Đ.T. (1977) was a name "recorded from the 15th c. in historical documents." Cadière (1905: 349) translates Sách as "liste, registre, rôle d'impôt," perhaps indicating villages newly registered, or subject to tribute.

I assume those "historical documents" were in Classical Chinese, the dominant written language of premodern Vietnam. I don't know of any Sino-Vietnamese word sách meaning 'division administrative équivalente au village', but 冊 sách (northern [sac] < 17th century *ʂac < *ʂɛk, borrowed from Late Middle Chinese *tʂʰæk; now normally 'book') can mean 'list'. Did 冊 also refer to an administrative division in 15th century Vietnam?

Pre-17th century Vietnamese *ʂɛk is a good phonetic match for Lao sɛk and Thai sɛɛk 'Saek', but not for Saek thrɛɛk 'Saek'. How can I account for this mismatch? Here are two scenarios:

A: Shortly after the liberation of Vietnam from Tang rule, 冊 was *tʂʰɛk in Vietnamese - still very close to Late Middle Chinese *tʂʰæk - and this term was borrowed into Saek as thrɛɛk. Neither Saek nor Proto-Tai had any tʂʰ-, so thr- was the closest available approximation. Later, *tʂʰɛk became *ʂɛk in Vietnamese, and this deaffricated form is the source of Lao sɛk and Thai sɛɛk.

B: thrɛɛk (or a more archaic *kreek?*) was an old name of the region that the Tai Saek and Vietic Sách lived in. The name was borrowed into Vietnamese as *trɛk (or *krɛk?) and the initial cluster fused into *ʂ-. *ʂɛk 'list' was an unrelated homophone which could have been used to phonetically write the ethnonym.

Keep in mind that I don't know what the Sách word for 'Sách' (or 'Saek'!) is. That crucial information could invalidate much of what I wrote. I am particularly interested in whether the Sách word has a Cr-cluster or a presyllable-r-sequence corresponding to Saek thr-.

I am surprised that I can't find any listing for Sách (or even Chứt varieties in general) in the Southeast Asian Linguistics Archives.

*The hypothetical Proto-Tai source of Saek thrɛɛk would be *kreek in Pittayaporn's (2009) reconstruction. Although I doubt that the name existed in Proto-Tai, I think it is possible that modern Saek thrɛɛk may have undergone one or both of the following sound changes:

*kr- > thr-

*-eek > -ɛɛk
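The two changes just listed can be checked mechanically. Here is a minimal sketch, assuming both changes applied and assuming the order shown (I have no evidence for the ordering); `RULES` and `derive` are hypothetical names of my own, and the intermediate stage is an artifact of that assumed order, not an attested or reconstructed form.

```python
import re

# Hypothetical rule list: each entry is (regex pattern, replacement).
# The rules mirror the two changes listed above; their order is an assumption.
RULES = [
    (r"^kr", "thr"),   # *kr- > thr- (word-initial only)
    (r"eek$", "ɛɛk"),  # *-eek > -ɛɛk (word-final only)
]

def derive(form, rules=RULES):
    """Return every stage of the derivation, starting with the input form."""
    stages = [form]
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
        stages.append(form)
    return stages

# Start from the hypothetical Proto-Tai shape *kreek (asterisk omitted in the string).
print(" > ".join(derive("kreek")))
```

Anchoring the patterns with `^` and `$` keeps each rule from touching anything but the initial cluster or the rime.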

Scenario A above rules out an earlier *kr- ethnonym for the Saek, whereas such a name is possible in scenario B since *kr- as well as *tr- merged into *ʂ- in Vietnamese.

SAEK (PART 1)

I recently rediscovered William J. Gedney's Comparative Tai Source Book when looking for native (or at least non-Chinese) Kra-Dai numerals.

When I looked up Saek, I was surprised to learn that 'Saek' was thrɛɛk D1 (with an initial cluster from *kr- in Pittayaporn's 2009 Proto-Tai reconstruction) in Saek. Why do the Thai call the thrɛɛk แสก <sɛɛk> sɛɛk D1? The Vietnamese name for a nearby Vietic (not Tai!) group is Sách, and Vietnamese s- can be from any *Cr-cluster including *kr-. Do Thai sɛɛk D1 and Lao ແສກ <sɛk> sɛk D1 reflect Vietnamese Sách after *Cr- became s- (in Vietnamese but not in Saek)?

Early Saek *krɛɛk or *thrɛɛk > Early Vietnamese *Crɛk > Vietnamese Sách > Lao sɛk > Thai sɛɛk?

THE OTHER KRA-DAI NUMERALS: 'SIX', 'EIGHT', 'NINE', 'TEN'

I grouped 'six' through 'ten' together since they share a common first syllable in Qabiao:

Gloss                                   six        seven      eight      nine       ten
Proto-Austronesian (Blust 1999 in IPA)  *ənəm      *pitu      *walu      *siwa      *ça-puluq
Proto-Kra (Ostapirat 2000)              *x-nəm A   *t-ru A    *m-ru A    *s-ɣwa B   *pwlot D
En                                      nəm        ʔam tu     me ru      wa         θət
Qabiao                                  ma nam     ma tu      ma ʐɯ      ma ɕia     pət
Pre-Proto-Hlai (Norquest 2007)          *nɔm       *tuː       *ruː       *C-βɯːʔ    *fuːt

In my last entry on 'seven', I once projected the m of En and Qabiao 'seven' back into Proto-Kra (and even further back into Austro-Kra-Dai (!)) but now I realize that was a mistake, as Qabiao ma may have spread from 'eight' to neighboring numerals, stopping at ma 'five' (to avoid *ma ma) and pət 'ten' (to avoid homophony with ma pət 'fifty').

(10.7.0:55: Similarly, En me in 'eight' may have spread to 'seven', reduced to syllabic *m̩, and then expanded to ʔam, just as a syllabic *r̩ was expanded to er in Standard Mandarin.)
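The Qabiao scenario can be stated as a small rule: add ma to each numeral unless the result would be one of the forms to avoid. This is only a sketch of my guess, not an established account. The bare forms in `BARE` are hypothetical (they are simply the attested Qabiao forms with ma stripped off), and `spread_prefix` and `BLOCKED` are names I made up for illustration.

```python
# Hypothetical pre-spread forms: attested Qabiao numerals minus ma.
# 'five' is ma itself, and 'eight' is assumed to be the source of the prefix.
BARE = {
    "five": "ma",
    "six": "nam",
    "seven": "tu",
    "eight": "ʐɯ",
    "nine": "ɕia",
    "ten": "pət",
}

# Outputs the spread is assumed to avoid:
# *ma ma (reduplicated 'five') and ma pət (already 'fifty').
BLOCKED = {"ma ma", "ma pət"}

def spread_prefix(bare, prefix, blocked):
    """Prefix every form unless the prefixed result is in the blocked set."""
    result = {}
    for gloss, form in bare.items():
        candidate = f"{prefix} {form}"
        result[gloss] = form if candidate in blocked else candidate
    return result

for gloss, form in spread_prefix(BARE, "ma", BLOCKED).items():
    print(gloss, form)
```

Under these assumptions the output for 'six' through 'ten' matches the Qabiao row in the table, with 'five' and 'ten' left without the prefix.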

The mismatch of Proto-Austronesian *w- and Proto-Kra *m- in 'eight' has yet to be explained. Could both be from an earlier Austro-Kra-Dai (not again!) *mw-? Or is Proto-Kra *m- from a descendant of Austronesian with *w- > *m- (a sound change that I've never seen anywhere)?

Proto-Kra *x- in 'six' doesn't match anything in Proto-Austronesian, but is necessary because some Kra forms have first series tones implying a proto-voiceless initial, and Proto-Kra *ʔ-n- is not possible since it has a different reflex in Pubiao (nɦ- instead of the n̥- of 'six'). (10.7.0:45: See pp. 113, 131, 146, 154, 163, 186, 197, 211, and 245 of Ostapirat 2000 for all the Kra data for 'six'. No reflex of 'six' has a velar initial, so I don't feel compelled to reconstruct *x- as opposed to a voiceless laryngeal *H- or a presyllable *HV-.)

I don't understand why Ostapirat reconstructed *-ɣ- in 'nine'. Proto-Kra tone B should correspond to a final *-h absent from Proto-Austronesian. I would expect tone A corresponding to a Proto-Austronesian open syllable (cf. 'six', 'seven', and 'eight').

(10.7.0:51: I can only find two references to 'nine' in Ostapirat 2000: pp. 192 and 245, and have not found any other examples of *s-ɣw-. Maybe the word could be reconstructed with *s-w- since *ɣ is optional in his Proto-Central-Eastern-Kra *(ɣ)w- and Proto-Central-Eastern-Kra/Proto-Kra *k(ɣ)w- on pp. 186, 191, and 215. I can't explain why Paha has dh- in 'nine': cf. Paha θ- < *s- and v- < *(ɣ)w-. Perhaps *ɣ is needed to account for the aspiration in Paha dh-. *dz- became unaspirated *d- in Paha, whereas *s-ɣw- might have become dh- via an intermediate stage *(d)zh(w)-. The Paha reflex of the rare initial *z- is unknown. Is Paha dh- [dʱ] like Indic dh-?)

I can't explain the different codas of 'ten' in Proto-Austronesian on the one hand and Proto-Kra and pre-Proto-Hlai on the other. (10.7.1:11: The mismatch of *-q and *-t reminds me of how 冊 Middle Chinese *tʂʰɛk corresponds to Japanese satsu < *sat.)

(10.7.0:11: Proto-Kra *-w- may be a trace of an earlier *-u-. Tone D is expected for a syllable ending in a stop.)

