In my last entry, I mentioned that Thai had no [ey] or [eey]. Yet Thai spelling does have the letter sequence เ ... ย <e y> which looks like it should be pronounced [eey] but is actually pronounced [əəy].

Thai has no single spelling for [ə]/[əə]:



Open syllable: short เ ... อะ <ee ... ɔɔḥ>, long เ ... อ <ee ... ɔɔ>

Closed syllable not ending in [y]: เอิ <ee ... i> (short or long)

Syllable ending in [y]: short (does [əy] exist?), long เ ... ย <ee ... y>

There is no way to distinguish between short and long schwa in closed syllables ending in non-[y] codas. I don't know how to type a floating <i>, so I've inserted อ as a dummy consonant to prop it up.
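
The spellings listed above can be restated as a small lookup. This is only a minimal sketch: the key names and the helper functions `schwa_spelling` and `length_is_distinguished` are labels made up for illustration, not standard terminology.

```python
# Hypothetical lookup of the Thai spellings for schwa listed above, keyed by
# syllable type. "..." marks the slot for the initial consonant (in เอิ the
# dummy consonant อ fills that slot). None marks a combination for which no
# spelling is given above (it is unclear whether short [əy] exists).
SCHWA_SPELLINGS = {
    "open": {"short": "เ ... อะ", "long": "เ ... อ"},
    "closed_non_y": {"short": "เอิ", "long": "เอิ"},
    "closed_y": {"short": None, "long": "เ ... ย"},
}


def schwa_spelling(syllable_type, length):
    """Return the written frame for a schwa of the given length."""
    return SCHWA_SPELLINGS[syllable_type][length]


def length_is_distinguished(syllable_type):
    """Return True if short and long schwa are written differently."""
    frames = SCHWA_SPELLINGS[syllable_type]
    return frames["short"] != frames["long"]
```

With this table, `length_is_distinguished("closed_non_y")` is false, which is the gap described above: the same frame serves for both lengths before non-[y] codas.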

<ee ... ɔɔ> reminds me of the use of eo for Korean ㅓ[ə]. A schwa is a vowel between an e and an o (close to <ɔɔ>).

The logic of <ee ... i> eludes me. I would have expected

เอึ <ee ... ɨ> for short schwa

เอื <ee ... ɨɨ> for long schwa

since schwa has the height of <ee> and the nonpalatality and nonlabiality of <ɨ(ɨ)>.

How old are these conventions?

One might expect these Thai practices to originate from Khmer like the rest of the Thai alphabet. In modern Khmer, there is only one way to write the vowels descended from Old Khmer [əə] (there was no short [ə] in OK):

<ee ... ii> (cf. Thai <ee ... i> with short <i>)

In OK, [əə] and [ee] were both written as <ee>.

The Thai practice of writing [əəy] as <ee ... y> is probably from OK <ee ... y> for [əəy], now <ee ... iiy>: e.g.,

OK ហេយ <eehy> [həəy]

modern ហើយ <eehiiy> [haəy]

(postverbal completion marker)

5.22.00:18: Forgot to mention that the Khmer alphabet in turn is based on an Indic prototype which has no conventions to represent [ə].

Which came first, Thai <ee ... i> or Khmer <ee ... ii> for [əə]?

THE ṢAD-DEST NUMBER

Although Persian sad 'hundred' is native and never had a pharyngeal ṣ, it is written as صد <ṣd> with a letter ص <ṣ> normally reserved for Arabic borrowings with pharyngeal ṣ. Why does it have a pseudo-Arabic spelling? Did Arabic speakers perceive Persian /s/ as being like their [ṣ]?

In Southeast Asian scripts, letters for voiced aspirates are normally reserved for Indic borrowings in languages that never had such sounds, yet there are a few spellings like

Burmese <dhaaḥ> < *d- 'knife' (My HTML editor won't support Unicode Burmese!)

Thai ฆ่า <ghaa1> khâa < *g- 'to kill'

Thai เฒ่า <ɖhaw1> thâw < *th- (not even *d-!) 'aged, old' (spelled with an Indic retroflex letter! - Thai has never had retroflexes)

This spelling must postdate the devoicing of ฒ. An (older?) etymological spelling เถ้า <thaw2> also exists.

Thai ธง <dhoŋ> thoŋ < Khmer *d- 'flag'

Thai เธอ <dhəə> thəə < ?*d- 'you'

How do these nonetymological spellings arise? Are there recurring motivations across orthographies? Are there cases of pseudo-Greek spellings of Slavic words in premodern Cyrillic and/or pseudo-Greco-Latinized spellings of non-Greco-Latin words in the Roman alphabet?

5.21.18:45: Not quite in the same category: the native Thai autonym

Thai ไทย <daiy> thay < *d- 'Thai'

has an unnecessary final ย <y> as if it were from an Indic word ending in -ya. Cf.

ไถย <thaiy> thǎy < Pali theyya 'theft'

ไชย <jaiy> chay < Pali jeyya 'to be conquered'

ไสย <saiy> sǎy 'black magic' < ไสยศาสตร <saiyyaśaastr> sǎyyasàat < Pali seyya 'better' + Skt śaastra 'teaching'

เวไนย <wenaiy> wenay < Pali veneyya 'tractable'

คังไคย <gaŋgaiy> khaŋkhay < Pali gaŋgeyya 'of the Ganges'

อธิปไตย <adhipataiy> àthíppàtay < Pali adhipateyya 'sovereignty'

อาชาไนย <aajaanaiy> aachaanay < Skt or Pali aajaaneyya 'well-born'

อุปไมย <upamaiy> ùp(p)àmay < Pali upameyya 'to be compared'

I have long presumed that ไทย <daiy> 'Thai' was spelled like an Indic word to make it seem as if it were of Indic origin even though there is no Pali *deyya 'Thai'. However, why would anyone want to make 'kill' (or Burmese 'knife' or Persian 'hundred') appear 'classier' by spelling it with letters associated with a more prestigious language?

One might expect Pali eyya to correspond to Thai เ ... ยย <eyy> [eey] given the regular correspondences

Pali e : Thai เ [ee]

Pali y : Thai ย [y] (and Thai does not permit any final clusters in speech, so <yy> = [y], not [yy])

but in fact it corresponds to Thai ไ ... ย <aiy> [ay]; ยย <yy> and even ยย์ <yy> with a silent letter marker don't exist. Oddly, I can't find anything in Gedney (1947) about this. As Haas (1956: 18) noted, there is no [eey] in Thai, so one might conclude that Thai borrowed Pali eyya as [ay] since [eey] and even [ey] were not possible. However, that overlooks the likelihood that Thai borrowed Pali through Khmer. In Khmer, Pali eyya corresponds to <aiy> ~ <eyy> [ay]:

សៃយ <saiy> [say] < Pali seyya 'better'

អធិបតេយ្យ <adhipateyy> [athippatay] < Pali adhipateyya 'sovereignty'

ឧបមេយ្យ <upameyy> ~ ឧបមៃយ <upamaiy> [upamay] < Pali upameyya 'to be compared'

though normally *ay > ey after voiced consonants in Khmer

and normally no final consonant follows <ai> (Huffman 1970: 107 goes so far as to say that final consonants never follow <ai>. How many other counterexamples are there?)

I don't think -e(e)y was a possible rhyme in earlier Khmer. So perhaps

Pali ey(ya) > Khmer <eyy> ~ <aiy> ay > Thai <aiy> ay

Obviously 'Thai' wasn't borrowed from Khmer into Thai, but its spelling was influenced by Pali loans via Khmer.

The Khmer word for 'Thai' is ថៃ <thai> [thay] with <th>, indicating that the spelling must postdate the shift of *d to th in Thai. I wonder if there is an earlier word ទៃ <dai> which would be pronounced [tey] today.

APE-ATRIARCH?

I would never have guessed that Russian обезьяна 'monkey' is from Turkish or Persian abuzine 'monkey' according to Vasmer and Arabic abu zina 'отец блуда' (presumably the source of the Turkish and Persian words?) according to Ushakov. The phonetic match is vague at best. I would expect абузина.

Turkish / Persian / Arabic a b u z i n T/P e, A a
Actual Russian о б е зь я н а
Expected Russian а б у з и н а

(5.20.1:01: Although the о of обезьяна is currently an unstressed a-like unrounded [ə], presumably it was [o] at the time of borrowing. Was ь still a vowel like [ɪ] at the time of borrowing?)

Moreover, I can't find a Turkish or Persian word abuzine, though there is a Persian بوزينه buzine <buuziineh> 'monkey' without a- and this journal mentions a Turkish (e)buzine (obsolete? - I can't find it in Büyük Türkçe Sözlük). Although Arabic abuu (أبو) is 'father' (= отец), the only Arabic zina that matches блуда has a short vowel (زنا <zinaa>) unlike the formerly long ii of بوزينه <buuziineh> but like the short i of (earlier?) Persian buuzina mentioned in this journal and بُوزِنه <buuzineh> in Sen's 1821 dictionary.

Ushakov was not satisfied with Brandt's suggestion that обезьяна was altered by analogy with the prefixes о- 'about' and без- 'without' (cf. words like о-без-лес-ение 'deforestation'; лес is 'forest' as in Лес категорий, the Russian translation of 類林 The Forest of Categories from my recent Tangut posts). What's the story behind these words?

5.20.00:30: How recently did earlier Persian short i and u become modern standard Persian e and o? This 1902 grammar describes a transitional state in which

- <i> is normally [i] but is [e] before <ḥ>, <h>, and <gh> and as ezaafe with exceptions

- <u> is normally [u] but is [o] before <ḥ>, <h>, <gh>, <'>, and "at the end of all Arabic words where it is pronounced by the Persians at all": e.g. <Allahu akbar> Allaaho akbar.
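
The two rules above amount to a small decision procedure. Here is a minimal sketch of it, assuming the transliteration used in this entry; the function name `realize` and its parameters are hypothetical, and the exceptions that the grammar mentions for ezaafe are not modeled.

```python
# Sketch of the 1902 grammar's description of short <i> and <u>.
# Letters are written in the transliteration used in this entry.
LOWERING_BEFORE = {
    "i": {"ḥ", "h", "gh"},
    "u": {"ḥ", "h", "gh", "'"},
}
DEFAULT = {"i": "i", "u": "u"}
LOWERED = {"i": "e", "u": "o"}


def realize(vowel, next_letter=None, is_ezafe=False, ends_arabic_word=False):
    """Return the pronunciation of short <i> or <u> in the given context."""
    # Lowering before the listed consonants applies to both vowels.
    if next_letter in LOWERING_BEFORE[vowel]:
        return LOWERED[vowel]
    # <i> as ezaafe is [e]; the grammar's exceptions are ignored here.
    if vowel == "i" and is_ezafe:
        return "e"
    # Final <u> of Arabic words is [o], e.g. <Allahu> -> Allaaho.
    if vowel == "u" and ends_arabic_word:
        return "o"
    return DEFAULT[vowel]
```

For example, `realize("u", "'")` gives "o" but `realize("i", "'")` stays "i", since <'> appears only in the list for <u>.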

That book even has archaic English: e.g., shewn on p. 11.

BACK TO TANGUT: SPEAKING OF SCARS

Here are a couple of puzzling transcription tangraphs from p. 139 of Kepping's book on the Tangut translation of 類林 The Forest of Categories. The reconstructed readings are all mine.

0182 1vɨạ 'to allow; to say' for 陵 *lĩ 'mound'

3587 1pæ 'scar' for 項 *xø̃ 'neck' as well as 霸 *pæ 'overlord'

(This is the first tangraph image that I've made since January 1!)

They make no phonetic or even semantic sense, not even in terms of others' reconstructions: e.g.,

Tangraph    Sofronov 1968    Li Fanwen 1986    Gong Hwang-cherng    Gong's Chinese
0182        1vi̭a             1wǐ               1wjạ                 *ljĩ
3587        1pâ              1pǐa              1pia                 *xio(w)

Moreover, Li Fanwen (2008: 32, 580) doesn't list 0182 or 3587 as equivalents of 陵 or 項. Are these equations correct?

5.19.23:12: 3587 is a borrowing from Chinese 疤 *pæ 'scar', a word first attested around the same time as the Tangut script. (I can't find it in any texts before Jiyun [1037]).

0182 might be from pre-Tangut *Sɯ-wa, cognate to Old Chinese

*wat < *Cɯ-wat 'to say'

*wən < *-t-n? 'to say'

*wət-s 'to say'

(all ultimately sharing a *w-t root?)

On the other hand, the v- of 0182 may also be a lenited labial:

*Sɯ-pa > *Sɯ-βa > *Sɯ-βɨa > *Sβɨa > *ββɨa > *ββɨạ > *βɨạ > vɨạ

BACK TO TANGUT: LONG VOWELS OR BENT VOWELS?

In the previous post, I mentioned how Tangut period northwestern Chinese *ŋgaw and *(ŋ)a were transcribed with a tangraph (or tangraphs?) for 1ngaɯ (in Sofronov's reconstruction) or 1gaa (in Gong's reconstruction). If Tangut had no -aw rhyme, then -aɯ and -a(a) would be among the next best substitutes. Another Tangut substitute for Chinese *-aw was rhyme 44: Sofronov's -eɯ and Gong's -ew. For instance, Chinese 高 *kaw 'high' was transcribed as

2074 1keɯ or 1kew 'a surname' (written as PERSON + HIGH: i.e., a person with a surname sounding like the Chinese word for 'high' rather than Tangut 2bie 'high')

Sofronov reconstructed -ɯ and -i but not -w or -u after vowels. On the other hand, Gong reconstructed -w and -j but not -ɯ. Gong's system has long vowels corresponding to Sofronov's early Tangut final consonants (which become a source of -ɯ in his late Tangut reconstruction): e.g.,

Gong -aa : Sofronov early -aC > late -aɯ

The -g of Tibetan Minyag 'Tangut' corresponds to -ɯ in Sofronov's reconstruction of a Tangut autonym:

2mɪ 2nɪ̭aɯ = Gong's 2mjɨ 2njaa

A sound change

-k > > =

is reasonable. -aɯ is like an upward bent version of a. What if some of Gong's long vowels were bent vowels or vowel-glide sequences? E.g.,

Rhyme Sofronov Gong My bent proposal
5 -un > -u -uu -uu = -uw
14 -i̭eC > -i̭e -jii -ii = -ij
22 -aC > -aɯ -aa -aɨ = -aɰ
32 -əC > -əə -əɨ = -əɰ
34 -ai -ej -e
38 -aiC > -ai -eej -ei = -ej
54 -oC > -oɯ -oo -ou = -ow
56 -on -ow -õ

Gong has no simple -e, but I have changed his -ej to -e to match the other simple vowels he reconstructed for rhymes whose positions correspond to those of R34 in other rhyme groups (e.g., R1 -u, R17 -a, etc.).

I have changed Gong's -ow rhymes (R56-60, 97-98) to nasalized rhymes following Sofronov, freeing me to reconstruct R54-55 as -ow rhymes.

The vowels of R5 and R14 are already high and can't bend any higher, so they lengthen. For consistency, I could reconstruct them with -w and -j. I have omitted medial -j- from R14 since I regard it as redundant. There is no minimal pair -ii : -jii in Gong's reconstruction.
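
The pattern in the 'bent' column can be put as a rule of thumb. The following is a minimal sketch: the function name `bend` is hypothetical, and the choice of offglide for each vowel is simply read off the table above rather than derived from any independent principle.

```python
# Hypothetical restatement of the bent proposal: a non-high vowel gains a
# high offglide, while a vowel that is already high lengthens instead.
HIGH_VOWELS = {"i", "u"}

# Offglides as they appear in the table: unrounded ɨ after a and ə,
# front i after e, rounded u after o.
OFFGLIDE = {"a": "ɨ", "ə": "ɨ", "e": "i", "o": "u"}


def bend(vowel):
    """Return the bent (or lengthened) counterpart of a simple vowel."""
    if vowel in HIGH_VOWELS:
        return vowel * 2  # already high, so it can only lengthen
    return vowel + OFFGLIDE[vowel]
```

Applied to a, ə, e, o, u, i, this gives aɨ, əɨ, ei, ou, uu, ii, matching R22, R32, R38, R54, R5, and R14 in the table.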

I have long been skeptical of Gong's minimal triplet -jii : -jij : -jiij (which would be -ii : -ie : -iej in my bent proposal). Are there languages which distinguish -ii and -ij in speech? I confess that I can't distinguish between Russian армии 'armies' (nom. pl., etc.) and армий 'of armies' (gen. pl.) except in writing. Can a Russian speaker distinguish between the two in speech without any context?

My bent proposal has at least two problems:

First, it can't apply to Gong's -eew-type rhymes since they already end in glides unless I change -eew into -eɥ < -ejw. The glide -ɥ is unusual in final position. Which languages other than Cantonese (e.g., 女 nøɥ) have final -ɥ? I'd rather not reconstruct extraordinary segments without extraordinary evidence. Of course, Tangut is anything but ordinary ...

Second, there is no Tibetan transcription evidence for bent vowels or final glides. The absence of such evidence for rhymes ending in -ɯ -ɨ -ɰ is understandable since there are no Tibetan letters for those sounds, but -ei and -ou could have been transcribed as -eHi and -oHu, though they never were. Perhaps the Tangut dialect(s) transcribed by Tibetans had lost diphthongs and/or final glides, but I'd rather not use that as an excuse to reconstruct whatever I wanted in 'pre-Tibetan transcription Tangut'. Does anyone know when the Tibetan transcriptions were made? Or know of any transcriptions dating from the period when the 105-rhyme system of the Tangraphic Sea was devised? Could that 105-rhyme system already have been archaic when the Tangraphic Sea was written? Did the average 11th century Tangut speaker have a simpler system reflected in the Tibetan and Chinese transcriptions?

BACK TO TANGUT: SAUNTER SUMMER

Two weeks ago, I briefly got to see some Tangut books that were either new to me or that I hadn't seen in over a decade. Kepping's book on the Tangut translation of 類林 The Forest of Categories falls into the latter category. It has an appendix of Tangut equivalents of Chinese names from Sun Tzu as well as The Forest of Categories.

One entry toward the end of that appendix jumped out at me:

0533 1ki̭e transcribing Chinese 敖 *ŋgaw 'saunter; a surname' and 夏 *xæ 'summer; a surname' (marked with a "?")

(The number is Li Fanwen's. The Chinese reconstructions and translations are mine.)

Why would Tangut 1ki̭e approximate *(ŋ)aw and *xæ?

The short answer is that it didn't.

First, 1ki̭e is an error for 1ki̭ẹ, Sofronov's reconstruction of the reading of

0484 'to drink breast milk'

without a long horizontal bar in the left-hand radical. In David Boxenhorn's alphacode, short-barred 0484 is foadex whereas long-barred 0533 is foodex. The latter is a variant of

0555 'a surname' (alphacode: fomdex)

There are only two foo-tangraphs (0256, 0533) and both are variants of fom-tangraphs (0316, 0555).

Second, the correct reading of 0533 (and 0555) in Sofronov's reconstruction is 1ngaɯ, equivalent to Gong's and my 1gaa. 1ngaɯ and 1gaa are not far from Tangut period northwestern Chinese 敖 *ŋgaw. (Tangut had no -w.) However, neither reconstruction sounds much like 夏 *xæ.

In Li Fanwen (2008: 95), 0555 is listed as the Tangut transcription of Chinese

1. 敖 *ŋgaw (in The Forest of Categories)

2. 奡 *ŋgaw (in The Forest of Categories)

3. 囂 *ŋgaw (in Sun Tzu)

4. 閼 *(ŋ)a (in Sun Tzu)

but not 夏 *xæ. Since 夏 was marked with "?", perhaps it should be replaced by the similarly shaped graph 奡.

I don't have access to the text of Forest so I have no idea whether the tangraph representing 敖 and 奡 was

0484, 0533, or 0555.

They may have been confused in Forest. The mistakes of native Tangut authors could be instructive.

My point is not to criticize Kepping but to give an example of the difficulties that one faces when writing even a single line of a book on Tangut. Lookalike tangraphs (and sinographs!) and small typographical errors can lead a researcher astray. I would have struggled to reconcile Tangut 1ki̭e with Chinese *(ŋ)aw and *xæ if I hadn't figured out what was going on.

