Last night, I got my first Unicode Avestan font and struggled in vain to add Avestan text to "Are Camels Carriers?" in KompoZer. All I could see were question marks. Then today I switched to SeaMonkey 2.2 (thanks to David Boxenhorn) for HTML editing and was pleased to see the first word I had ever typed in the Unicode Avestan alphabet:


<a r t š u> (read from right to left)

Avestan uštra 'camel'

I've added it to my previous entry.

If you don't see anything, switch to Firefox and download the same font I did for free. It's named after

𐬀𐬵𐬎𐬭𐬀 𐬨𐬀𐬰𐬛𐬁

Ahura Mazdaa

whose name isn't in Avestan script on Wikipedia yet, so I had to type it myself instead of copying and pasting it. At the moment, I input Avestan letters one by one using Andrew West's BabelMap. Maybe someday I could try to design a custom QWERTY keyboard for Avestan. Since Avestan has 53 letters*, I'd need combinations of <s>, <shift>, and, say, <alt> to cover the four different s-letters:

𐬴 𐬳 𐬱 𐬯

s š š́ ṣ̌ [s ʃ ɕ ʂ]

(Tangut's living relative Taoping Qiang, a variety of Southern Qiang, also has this four-way distinction, though the Wikipedia entry for Southern Qiang does not list [ʃ].)

Ah, the Avestan script version of Ahura Mazdaa is in Wiktionary!

Those of you who have been reading my blog for years may remember my 2004 series of beginning lessons in Avestan (now taken offline). Unfortunately, I typed it using the non-Unicode Wistaspa font. I wish I could rewrite those lessons using Unicode, but I have many other projects that are more important: e.g., the completion of my translation of the second half of the Golden Guide. (Here's where I left off back in November.)

*Avestan has no distinction between lower and upper case.

Unicode Avestan has 54 letters because it includes an additional letter <l> for Middle Persian. Avestan had no l. Hence Avestan r may correspond to non-Avestan l as well as r: e.g. (examples from Jackson 1892: 39),

Av 𐬑𐬭𐬀𐬞𐬀𐬌𐬙𐬍 xrapaitii : Skt kalpate 'arranges' (root kḷp < Proto-Indo-European *kel?)

Av 𐬭𐬀𐬚𐬆𐬨 raθəm : Skt ratham 'wagon' (cognate to Latin rota 'wheel')

ARE CAMELS CARRIERS?

Last night, I rejected the Sanskrit roots

ukṣ 'sprinkle'

uṣ 'to burn'

as roots for Sanskrit uṣṭra 'camel' and its later Pali cognate oṭṭha.

Burrow (1955: 151) derived uṣṭra 'camel'

from [the root] vah- ['carry'] with irregular sandhi, cf. Av[estan] vaštar- 'draught animal' = [Skt] voḍhár-

I would have expected *uuḍhra < *uẓh-tra, with a long first vowel and -ḍh- instead of -ṣṭ-.

*uẓh- would be a zero-grade descendant of Proto-Indo-European *wegh 'to go, transport in a vehicle' (Watkins 2000: 95), the root of wagon and vehicle.

Since uṣṭra 'camel' and its Avestan cognate 𐬎𐬱𐬙𐬭𐬀 uštra 'id.' have no uncontroversial cognates outside Indo-Iranian, I suspected that the word might be from a substratum language. Tonight I discovered that 'camel' was on Lubotsky's 1999 list of potential substratum borrowings in Indo-Iranian:

Lubotsky pointed out that the phonological and morphological similarity of 55 loanwords [including 'camel'] in Proto-Indo-Iranian and in Sanskrit indicates that a substratum of Indo-Iranian and a substratum of Indo-Aryan represent the same language, or perhaps two dialects of the same language.

Google Books provides a snippet view of 'camel' in Lubotsky's list.

8.13.5:05: Camels play a role in The quest for the origins of Vedic culture: the Indo-Aryan migration debate. Is India the homeland of Proto-Indo-European, or did Indo-European speakers come to India? Thieme (1964) thought the absence of 'camel' in PIE ruled out Iran as a homeland for PIE and used similar logic to also rule out India (p. 112). Dhar's (1930) counterargument is that "the names of such animals [e.g., camels] as are peculiar to the East might easily be forgotten by the people [after they had left India] in the West where those animals were not found" (p. 113).

I side with Michael Witzel on this issue:

[Prehistoric i]mmigration [of Indo-European speakers into India], however, has often been denied in India especially during the past two decades, and more recently also by some western archaeologists [... W]e actually do know that one group after the other has entered the Indian subcontinent, as immigrants or as invaders, in historical times [...] Why, then, should all immigration, or even mere transhumance trickling in, be excluded in the single case of the Indo-Aryans, especially when the linguistic evidence [...] so clearly speaks for it?  Just one "Afghan" Indo-Aryan tribe that did not return to the highlands but stayed in their Panjab winter quarters in spring was needed to set off a wave of acculturation in the plains, by transmitting its 'status kit' (Ehret) to its neighbors. The vehement denial of any such possibility [...] is simply unreasonable, given the frequency of movements, large and small, into South Asia via the northwestern corridors.

Unfortunately, the denial of foreign influence in cultural history is not confined to some segments of Indology.

ʔUT-TERLY BAFFLING

The Thai word for 'camel' is

อูฐ  <ʔuuṭha> [ʔuut]

which should go back to a Sanskrit or Pali uuṭha. But the corresponding Indic words are

S uṣṭra (with a short u) which should correspond to T อุษฏร <ʔuṣṭra> *ʔut

(8.12.00:30: Thai borrowings from Indic usually drop final -a. Thai lacks final consonant clusters, final sibilants, and retroflexes in any position, so the -ṣṭr remaining after -a loss would be reduced to -t.)

P oṭṭha (with an o, not a u), which should correspond to T โอฏฐ <ʔoṭṭha> *ʔoot

(P o is regularly borrowed into Thai as <o> oo.)
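The two simplifications in the note above (drop final -a, then collapse a coda that Thai cannot have to -t), together with the regular borrowing of Indic o as long oo, can be sketched as a toy Python function. Everything in it is an illustrative assumption: the hypothetical `thai_reflex` helper, the set of coda letters treated as reducible, and the rule that a vowel-initial word receives the glottal-stop onset written with อ. It only models the two cases discussed here, not Thai borrowing in general.

```python
import re
import unicodedata

VOWELS = set("aeiou")
# Assumption: coda letters Thai cannot keep (dental/retroflex stops,
# sibilants, r, and the h of aspirates); a coda made only of these -> -t.
REDUCIBLE_CODA = set(unicodedata.normalize("NFC", "tṭdḍsṣśrh"))


def thai_reflex(indic: str) -> str:
    """Toy model of an Indic word borrowed into Thai (hypothetical helper)."""
    word = unicodedata.normalize("NFC", indic)
    # 1. Drop final -a.
    if len(word) > 1 and word.endswith("a"):
        word = word[:-1]
    # 2. Split at the last vowel into body and coda.
    last_vowel = max((i for i, ch in enumerate(word) if ch in VOWELS), default=None)
    if last_vowel is None:
        raise ValueError(f"no vowel in {indic!r}")
    body, coda = word[: last_vowel + 1], word[last_vowel + 1 :]
    # 3. A coda made only of reducible letters becomes -t.
    if coda and all(ch in REDUCIBLE_CODA for ch in coda):
        coda = "t"
    # 4. Indic o (a single o, not oo) is borrowed as long oo.
    body = re.sub(r"(?<!o)o(?!o)", "oo", body)
    # 5. Vowel-initial words get the glottal-stop onset written with อ.
    if body[0] in VOWELS:
        body = "ʔ" + body
    return body + coda


print(thai_reflex("uṣṭra"))  # ʔut
print(thai_reflex("oṭṭha"))  # ʔoot
```

Neither hypothetical output has the long uu of the attested ʔuut, which is exactly what makes the real word puzzling.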

The actual Thai word <ʔuuṭha> ʔuut is like a compromise between the two hypothetical borrowed forms. It has a u like Sanskrit-based *ʔut, but that u is long like the long oo of Pali-based *ʔoot. However, it is completely missing the letter ฏ <ṭ> present in both hypothetical forms.

The o of the Pali form is unexpected, since S u normally corresponds to P u.

I don't know what the source of the earlier Sanskrit form is. Monier-Williams' dictionary lists two possible roots. Neither sounds plausible to me:

ukṣ 'sprinkle' (with irregular -k- loss)

uṣ 'to burn'

I also don't know whether the Avestan cognate uštra 'camel' is in Zarathushtra's name or not.

FROM A SINGLE MUSTARD SEED? PART 0A

I've been asked to evaluate some proposed Chinese etymologies for a number of Japanese words including 'mustard'. But before I can get there, I have to outline my account of early Chinese phonology. Bear with me. This is relevant, even if the payoff isn't immediate.

Chinese is often regarded as a single 'language' with monosyllabic words, each with a tone and a distinct Chinese character. The classic example of tones that I saw in my first Chinese textbook back in 1980 was

ma + level tone 'mother'

ma + high rising tone 'hemp'

ma + low rising tone 'horse'

ma + falling tone 'to scold'

ma + neutral tone (question particle)

You've probably seen it before. Don't go to sleep yet.

My views about Chinese changed radically over the next decade and a half.

I was vaguely aware of Hakka and Cantonese even in the 80s, but I may not have realized how different Cantonese was until I got a book of Cantonese character readings in 1989. What surprised me was that Cantonese wasn't simply Mandarin with different character readings, but that it had words without Mandarin cognates: e.g., 'what': Ct 乜嘢 matye : Md 什么 shenme.

The following year, I tried to read colloquial Cantonese and could make little sense of it, even though I knew a few key words like 乜嘢. I would have understood more if my Mandarin were stronger, but no amount of Mandarin would enable me to guess the meanings of Cantonese-only words. (Similarly, no amount of French would enable one to guess that Spanish izquierda, a borrowing from Basque, means 'left'. French gauche 'left' is unrelated.) I started to think of Cantonese as a separate language, not just as Mandarin with different pronunciation.

In 1991, I was shocked to see Toudou Akiyasu (or someone else in a dictionary that he co-edited) speculate that Old Chinese might not have had any tones. (I don't remember the exact wording. Sorry.) How could this be? At the time, I was studying Middle Chinese poetry, which was based on tonal rhyming. If Chinese originally lacked tones, where did the tones come from?

I found out two years later when Anatole Lyovin introduced me to the idea of tonogenesis. Phonemic tones could develop to compensate for lost consonantal distinctions. In the case of Mandarin:

initial \ final | final sonorant (vowel, glide, nasal) | final glottal stop | final *-s
voiceless initial | level tone | low rising tone | falling tone
voiced initial | high rising tone | low rising tone after sonorants; falling tone elsewhere | falling tone

(This account is highly simplified, but sufficient for the purposes of this series.)
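To make the table concrete, here is a rough Python sketch that classifies an Old Chinese form by its initial and final and then looks up the Mandarin tone. The helper names, the small consonant sets, and the treatment of *h- before a sonorant as a voiceless initial are assumptions for illustration only. Like the table, it ignores syllables ending in -p, -t, -k.

```python
# Simplified tonogenesis lookup following the table above (illustrative sketch).
SONORANTS = set("mnŋlrjw")
VOICED_OBSTRUENTS = set("bdgz")  # assumption: a rough stand-in set


def classify_initial(form: str) -> str:
    form = form.lstrip("?*")
    if form[0] == "h" and len(form) > 1 and form[1] in SONORANTS:
        return "voiceless"  # voiceless sonorant such as *hm-
    if form[0] in SONORANTS:
        return "sonorant"
    if form[0] in VOICED_OBSTRUENTS:
        return "voiced obstruent"
    return "voiceless"


def classify_final(form: str) -> str:
    last = form[-1]
    if last == "ʔ":
        return "glottal stop"
    if last == "s":
        return "s"
    if last in "ptk":
        raise ValueError("stop finals are outside this simplified table")
    return "sonorant"  # vowel, glide or nasal


def mandarin_tone(form: str) -> str:
    initial, final = classify_initial(form), classify_final(form)
    if final == "s":
        return "falling"
    if initial == "voiceless":
        return "level" if final == "sonorant" else "low rising"
    # Voiced initial (sonorant or voiced obstruent).
    if final == "sonorant":
        return "high rising"
    return "low rising" if initial == "sonorant" else "falling"


for form in ["?*hma", "*mraj", "*mraʔ", "*mras"]:
    print(form, "->", mandarin_tone(form))
```

Run on the reconstructions of 'mother', 'hemp', 'horse', and 'to scold', it prints level, high rising, low rising, and falling, in that order.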

I reconstruct the Old Chinese (OC) ancestors of the first four Mandarin words above as

ma + level tone 'mother' < earlier ?*hma (not attested in OC; could be from *ma with a level tone by analogy with 爸 ba [pa] 'father', which has a voiceless initial in spite of its pinyin romanization)

ma + high rising tone 'hemp' < OC *mraj

ma + low rising tone 'horse' < OC *mraʔ (*m- is a sonorant)

ma + falling tone 'to scold' < OC *mras

I'll explain the function of the underlining later in this series.

The neutral tone question particle 吗 ma may be from an unstressed Old Chinese 無 *ma 'not' (Schuessler 2007: 373).

Professor Lyovin also suggested that Chinese words might have alternated between long and short over time:

Old Chinese long > Middle Chinese short > Mandarin long

Mandarin has only about 1,200 distinct syllables (Ramsey 1987: 41) but has many more words. Most words are polysyllabic.

In some cases, Mandarin words are even longer than their English equivalents: e.g., 为什么 weishenme 'why'. (Even if one literally translates weishenme as 'for what', 什么 shenme is still longer than its translation 'what'.)

Daniel Kane (2006: 67) stated that "the standard Chinese-English Dictionary" (现代汉英词典?) has "6,000 single-character entries" and "50,000 compound character entries". Most of the "6,000 single-character entries" represent monosyllabic roots, not words, but even if they all represented words, the "50,000 compound character entries" representing polysyllabic words would still outnumber them by more than 8 to 1.

"Polysyllabic words" are nearly always synonymous with "compound character entries". Almost all Chinese characters represent single syllables with very few exceptions: e.g., 儿 Md -r and 吋 Yingcun 'English inch' (also written as two characters 英寸; more examples here). Those syllables fit a simple template:

(C)(G)V(C) + T

C = consonant

G = glide (e.g., y or w)

V = vowel

T = tone
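The template can be expressed as a regular expression over a skeleton of C, G, and V slots. This is a minimal Python sketch, assuming syllables have already been reduced to such skeletons by hand (the hypothetical `fits_template` helper does no phonological analysis) and that the tone is passed separately as a label, or None for a toneless syllable.

```python
import re
from typing import Optional

# (C)(G)V(C): optional consonant, optional glide, obligatory vowel,
# optional final consonant. The "+ T" part is checked separately.
SEGMENTAL_TEMPLATE = re.compile(r"C?G?VC?")


def fits_template(skeleton: str, tone: Optional[str]) -> bool:
    """Return True if a C/G/V skeleton plus a tone fits (C)(G)V(C) + T."""
    return tone is not None and SEGMENTAL_TEMPLATE.fullmatch(skeleton) is not None


print(fits_template("CGVC", "level"))     # Md guang 'light': g-u-a-ng
print(fits_template("CV", "neutral"))     # Md ma (question particle)
print(fits_template("CCCVCCC", None))     # Eng strengths [str ɛ ŋθs]
```

The first two calls print True; the English word fails twice over, with too many consonants on each side of the vowel and no tone.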

Chinese languages do not have rich consonant or vowel inventories, so the number of possible syllables is limited, and the number of homophonous Chinese characters is large. Windows 7's Simplified Chinese input method lists 103 different characters pronounced Md shi + falling tone: e.g.,

是市事式世士室氏试视势示饰仕释 etc.

Granted, some of those 103 are variants, but still ...

There are no modern Chinese languages with syllables like English strengths, which has three consonants on both sides of a vowel:


[str] [ɛ] [ŋθs]

However, neighboring languages do have more complex syllable structure: e.g., Classical Tibetan


bsgrubs 'accomplished' (root sgrub)

Guillaume Jacques has compiled a table of possible initial consonant clusters in Classical Tibetan.

I studied Khmer partly to get a feel for a nearby spoken language with complex consonant clusters: e.g., the Khm- of ខ្មែរ  <khmɛr> Khmae 'Khmer' itself.

In addition to complex single syllables, Khmer also has sesquisyllables (one-and-a-half syllables) consisting of an unstressed 'half' syllable (= minor syllable) followed by a stressed main syllable: e.g.,

cvc CVC

pəmɓəj 'eight'

reduced from disyllabic prambəj < pram 'five' + ɓəj 'three' (and still spelled ប្រាំបី <praaṃpii>).

What if Old Chinese had sesquisyllables which also originated from earlier disyllables? What if many modern homophones were once nonhomophonous disyllables in (pre-)Old Chinese?

Next: The Khmerization of Old Chinese

HOLE-Y THREAD

While looking for 糸 'thread' + 六 'six', I found this made-in-Vietnam character for the syllable khoong:

糸 'thread' + 孔 khổng 'hole'

Vietnamese syllables ending in -oong are rare and are "in a few borrowed words"* (Thompson 1984-85: 66): e.g.,

ba toong [ɓa tɔŋ͡m] < French bâton 'cane'

boong [ɓɔŋ͡m] < French pont 'ship deck' (Vietnamese has no initial p-)

Nguyễn (1997: 24) has more examples, including xoong 'casserole' (but the phonetic resemblance with French is limited to x [s] and oo [ɔ]!).

I can't find a definition for khoong anywhere, and I don't know when 糸 + 孔 was used. 糸 + 孔 could be evidence for

- early adoption of the rhyme -oong: i.e., when vietography (nôm) was still in use

- late creation of vietographs: i.e., even after the rhyme -oong came into use

*I can't identify the source of

xoong ~ soong (in cải xoong ~ cải soong 'watercress'; cải is 'mustard greens')

Is it from a non-French language spoken in Vietnam?

I cannot find any vietograph for xoong at nomfoundation.org, but that site does have a listing for an undefined soong resembling 双 song 'pair' atop 二 nhị 'two'.

There are also no entries for loong or toong vietographs. But there are two entries for boong 'bell' (onomatopoeic rather than a loanword?):

葻 < 艹 thảo 'grass' atop 風 phong 'wind'

艹 plus 几 with two 丿 slashes intersecting the 乚 on the right

Were either of these ever used to write boong 'ship deck'?

TEN PEOPLE, ONE UMBRELLA, TWO READINGS

In my quest for 糸 'thread' + 六 'six', I found this variant of 紒 'to tie' (itself a variant of 結 'to tie')

糸 + 仐 (resembling 人 'person' atop 十 'ten')

which reminded me to write about the Khitan small script character  <s> resembling 仐. I thought <s> might be an abbreviation of Khitan period Chinese 舍 *ʃie 'house', but the initials (*s- and *ʃ-) didn't match. I discovered at zdic.net that 仐 is a variant of 傘 'umbrella' which would have been read *san in Khitan period Chinese. So <s> must be based on *san.

Interestingly, 仐 is not one of the variants of 傘 'umbrella' listed in the Longkan shoujing dictionary compiled in the Khitan Empire.

According to zdic.net, 仐 can also be a variant of 今 Md jin 'now', a word totally unrelated to 傘 Md san 'umbrella'.

One variant of 傘 looks like 企 'plan' atop 平 'peace'.

SIX THREADS

I happened to find this etymology for the surname Samet while I was still in a numerical state of mind after writing about 叱 'to scold' (containing 七 'seven') and 三枝 'three grasses' (emphasis mine):

German and Jewish (Ashkenazic): metonymic occupational name for a maker or seller of velvet, from Yiddish samet 'velvet' (German Samt, ultimately from Greek hexamiton, a compound of hex 'six' + mitos 'thread').

I can't help but imagine a Chinese character 糸 'thread' + 六 'six'. I can't find any such character at the Taiwanese variants dictionary, but even that massive dictionary with 106,230 entries isn't complete. No luck at the Unihan database or in nomfoundation.org's list of made-in-Vietnam characters. Sometimes it seems as if every combination of Chinese character components was already used by someone over the last three millennia, but that's not literally true.

How did 'six threads' come to mean 'velvet'? Liddell and Scott defined ἑξάμιτος hexamitos (masc.) as 'of six strands'. Hexamiton is the neuter form of that adjective.

The word samet 'velvet' also existed in Middle English.

