While researching my last post, I found this Proto-Indo-European verb conjugator. Unfortunately, I couldn't find *h1es- 'to be'. I'm surprised by its apparent absence. I wonder

- how many people actually use this part of the Verbix site

- how many PIE verbs are in the database

- whether the top ten PIE verbs might mostly reflect the order of the verbs in the database (with a bit of reranking due to the few who used this site)

The top verb on Verbix's Dutch, English, French, German, Italian, and Spanish pages is 'to be', so I would have expected *h1es- to be number one on the PIE page, but that honor goes to uid 'know', the ancestor of English wit and Sanskrit vid. Sanskrit retains the PIE alternation in the stems of present forms:

Singular: PIE *uóid- (accented) : Skt véd- < *váid-

Plural: PIE *uid- (unaccented) : Skt vid-
In both PIE and Sanskrit, the accent shifts to the ending in plural forms:

'I know': PIE *uóid-mi, Skt véd-mi

'we know': PIE *uid-més, Skt vid-más

I have no idea why.

In Beekes' (1995: 16) paradigm for *h1es-, the unaccented stem has no vowel:

Gloss PIE Sanskrit
I am *h1és-mi ás-mi
thou art *h1és-si á-si
he/she/it is *h1és-ti ás-ti
we are *h1s-més s-más
you are *h1s-th1é s-thá
they are *h1s-énti s-ánti

Sanskrit has dropped the initial consonant of the unaccented plural stem: *h1s- > s-.

(6.12.00:45: Latin has es- ~ s- whose correspondences with Sanskrit ás- ~ s- are reversed in two forms:

L sum : Skt ás-mi 'I am'

L estis : Skt s-thá 'you are'

I don't know why the Latin forms aren't esmi and stis.)

Other things I don't know:

I was going to write that I didn't know the difference between PIE *uid- 'know' and *gnh3- 'know' (the ancestor of English know), but Beekes (1995: 228) defined the latter as 'get to know'.

(6.12.00:40: Although wit is now obsolete as a verb in English, German retains both 'know'-verbs: wissen is 'to know a fact' whereas kennen is 'to be familiar with'.)

The Sanskrit descendant of *gnh3- has a present stem jaanáa- with two long vowels instead of the expected janáa- from *gn-neh3- (sic from Beekes 1995: 228; should this be *gn-néh3-?). I don't know why there's an infixed nasal.

Sanskrit vid has other paradigms besides the 'class II' paradigm above:

- class I: thematic vowel after the stem: védaami, védasi, védati, vidáamas, vidátha, vidánti

- class VI: fixed accent on a thematic vowel after the stem: vidáami, vidási, vidáti, vidáamas, vidátha, vidánti (the last three forms are indistinguishable from the previous paradigm)

- class VI with an irregular infixed nasal: vindáami, vindási, vindáti, vindáamas, vindátha, vindánti

I can understand the appeal of a thematic paradigm (which is common in Sanskrit) or a simpler fixed-accent paradigm, but why infix a nasal?

6.12.00:32: Sanskrit has three regular verb classes with infixed nasal (syllables):

- class V: root su

su-nó-mi 'I press'

su-nu-más 'we press'

but su-tá 'pressed' (no nasal)

- class VII: root yuj (cognate to Eng yoke):

yu-ná-j-mi 'I join'

yu-ñ-j-más 'we join'

but yuk-tá 'joined' (no nasal)

cf. Latin-based junction with the nasal

- class IX: root krii

krii-ṇáa-mi 'I buy'

krii-ṇii-más 'we buy'

but krii-tá 'bought' (no nasal)
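The three classes differ mainly in where the nasal sits. In classes V and IX it comes after the root; in class VII it sits inside the root, before the root-final consonant. Each class has a stronger, accented form in the singular and a weaker one in the plural. Here is a rough Python sketch using the forms above and this blog's aa/ii notation for long vowels. The homorganic-nasal table and the "ṇ after r" shortcut are simplifying assumptions, not full Sanskrit phonology, and the function names are hypothetical.

```python
# Rough sketch of the Sanskrit nasal presents (classes V, VII, IX), in the
# notation used above (aa/ii = long vowels). Strong forms: singular;
# weak forms: plural. Simplifications are flagged below.

# Assumption: in class VII the weak nasal matches the root-final consonant.
HOMORGANIC_NASAL = {"k": "ṅ", "g": "ṅ", "c": "ñ", "j": "ñ",
                    "t": "n", "d": "n", "p": "m", "b": "m"}


def nasal_stem(root, cls, strong):
    if cls == 5:  # suffix after the root: -nó- ~ -nu-
        return f"{root}-{'nó' if strong else 'nu'}"
    if cls == 7:  # infix before the (single-letter) root-final consonant
        body, final = root[:-1], root[-1]
        nasal = "ná" if strong else HOMORGANIC_NASAL[final]
        return f"{body}-{nasal}-{final}"
    if cls == 9:  # suffix after the root: -náa- ~ -nii-
        # Shortcut, not the full rule: retroflex ṇ if the root contains r.
        n = "ṇ" if "r" in root else "n"
        return f"{root}-{n}{'áa' if strong else 'ii'}"
    raise ValueError(f"class {cls} is not one of the nasal classes")


def conjugate(root, cls):
    """Return ('I ...', 'we ...') for a nasal-class root."""
    return (f"{nasal_stem(root, cls, True)}-mi",
            f"{nasal_stem(root, cls, False)}-más")


for root, cls in [("su", 5), ("yuj", 7), ("krii", 9)]:
    print(f"class {cls}:", *conjugate(root, cls))
```

The participles (su-tá, yuk-tá, krii-tá) are built on the bare root, which is why no nasal function is involved there.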

How did this "nasal present" (Beekes 1995: 231) develop? According to Beekes, it "only remained in Indo-Iranian and Hittite".

IS 'AM' 'IS-AM'?

In my last post, I mentioned that the Polish first person singular ending -ę < -o-m has a suffix -m cognate to the -m of English am. One might wonder: if Polish and English have cognate verb endings, could they also have cognate verbs? Could the Polish word for 'am' be cognate to am? The answers to both those questions are yes and yes.

The Proto-Indo-European source of am was *h1es-mi. (Asterisks indicate reconstructions.) There is an entire Wikipedia article on PIE words for 'be' listing many descendants of *h1es-mi (and saving me a lot of research and typing!). They generally fit the pattern (V)(s)(V)m(i), with V standing for any vowel: e.g.,

Language Form (V)(s)(V)m(i)
English and Old Irish am a-m
Gothic im i-m
Latin sum s-u-m
Greek eimi e-i-m-i
Albanian jam ja-m
Lithuanian esmi e-s-m-i
Latvian esmu e-s-m-u (!)*
Old Church Slavonic jesmĭ je-s-m-ĭ
Hittite ēšmi ē-š-m-i
Armenian em e-m
Avestan ahmi a-h-m-i
Sanskrit asmi a-s-m-i

Most Slavic forms for 'am' fit the formula with the exception of Polish jestem which has an unexpected -t-.

Persian hastam 'am' has an unexpected h- as well as a -t-. This h- is not a retention of PIE *h1- which "disappeared without a trace" in most IE languages (Beekes 1995: 143).

Do Polish and Persian preserve a -t- that was lost in all other Indo-European languages? No. Here's what I think independently happened in both languages. Jestem and hastam look like

Polish jest 'is' + -em (1st person singular)

Persian hast 'is' + -am (1st person singular)

Jest and hast themselves are root-suffix sequences from PIE *h1es-ti 'is' (the most frequent form?**) which was reinterpreted as a stem for other forms. Nearly all the other members of the present paradigms for Polish and Persian 'be' also look like jest and hast plus various person/number endings:

Gloss Polish Persian
thou art jest-eś hast-i
he/she/it is jest hast
we are jest-eśmy hast-im
you are jest-eście hast-id
they are są (!) hast-and

The sole exception is Polish 'they are' which is not jest-ą.
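The reanalysis can be stated as a rule. The inherited 'is' form serves as the stem, and the ordinary person/number endings attach to it. The toy sketch below, with a hypothetical dictionary layout and `build` function, uses only the forms given above (jestem, hastam, and the table). Polish 'they are' is deliberately left out because it is są, not *jest-ą.

```python
# Toy sketch of the reanalysis: the old 'is' form is the stem, and ordinary
# person/number endings attach to it ("" = no ending). Forms copied from above.
PARADIGMS = {
    "Polish": {"stem": "jest",
               "endings": {"1sg": "em", "2sg": "eś", "3sg": "",
                           "1pl": "eśmy", "2pl": "eście"}},  # 3pl is są
    "Persian": {"stem": "hast",
                "endings": {"1sg": "am", "2sg": "i", "3sg": "",
                            "1pl": "im", "2pl": "id", "3pl": "and"}},
}


def build(language):
    """Attach each ending to the reinterpreted stem."""
    entry = PARADIGMS[language]
    return {person: entry["stem"] + (f"-{ending}" if ending else "")
            for person, ending in entry["endings"].items()}


for language in PARADIGMS:
    print(language, build(language))
```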

I still have no idea where the h- of the Persian forms comes from. I doubt it's by analogy with an earlier form of 'they are' cognate to Avestan hən̨ti < *santi (cf. Skt santi, Russian sut', Polish są, Latin sunt).

*Both Latvian and Lithuanian have another form esu 'I am' which looks like the root es- plus a descendant of the PIE 'thematic' first person ending *-oH (> Latin -o in amo 'I love', etc.). Latvian esmu looks like a mix of esu and a hypothetical earlier esmi which is still in its sister Lithuanian.

**In modern Russian, most of the paradigm of jes- 'to be' is extinct with the exceptions of jest' 'is' and the much less common sut' '(they) are'. I assume that these two forms survived because they were more frequent than jesm' 'I am', etc. in earlier Russian.

I THANK, THEREFORE I AM

On Wednesday, Jim Shooter mentioned that two of his "few words of Polish" were dzień dobry (lit. 'day good'). That got me thinking about what little Polish I knew. (I mostly guess my way through Polish using my Russian.) I picked up the Polish for 'thank you' from, of all places, Jenny McCarthy's Jen-X (1997). (The great thing about linguistics is that data is everywhere. Even in celebrity junk books. But maybe not in celebrity junk science*.) I can't remember how she spelled it phonetically, but I do know the correct Polish spelling:

Dziękuję

This literally means 'I thank'. The stem is dziękuj- and the suffix -ę indicates a first person singular subject: i.e., I.

The hook under the e indicates nasality. Nasal ę goes back to pre-Proto-Slavic -o (not e!) plus -m.

The -o ending is cognate to the first person singular -o of Latin: e.g., amo 'I love' (which is still amo in Spanish today!). (Verbix has full paradigms of those verbs. Alas, Polish isn't available.)

The -m ending is cognate to the -m of English am to the west of Polish ... and to the -mi of Sanskrit asmi 'am' to the east of Polish. (Technical caveat.**)

These shared endings are strong evidence for a common ancestor of these languages. Not all similarities between languages indicate shared ancestry. People easily borrow words, but they are less likely to borrow affixes. Common affixes are not unique to the first person singular; they can be found throughout the paradigms of the Indo-European languages. It's possible that different languages can have a similar affix by chance, but systematic, large-scale correspondences probably did not develop independently. They are probably descendants of a single earlier system - that of Proto-Indo-European.

*I do not endorse Jenny McCarthy's stance on autism. The James Randi Educational Foundation gave her a Pigasus (sic) award for having "fooled the greatest number of people with the least effort". Ouch. I don't think "fooled" is the right term as it implies she intended to deceive. Obviously, I can't read her mind, but my guess is that she is sincere. However, good intentions do not justify misinformation.

**Pre-Proto-Slavic -m goes back to the Proto-Indo-European 'secondary' ending -m whereas the -m of -am and the -mi of Skt asmi 'am' go back to the Proto-Indo-European 'primary' ending -mi. I suspect that PIE -m and -mi are not only cognate to each other but also ultimately cognate to PIE Hme 'me'.

Modern Indo-European languages still have obvious descendants of PIE Hme:

English me

Spanish me 'me'

Polish mnie, mię 'me'

Even IE languages in the east at least retain m:

Persian man raa 'me' (raa indicates a direct object; man by itself is 'I')

Hindi mujh 'me'

IN-Ṭ-EGRAL YET SEPARA-ṬH-E

The Thai name ณัฏฐ์ Nat from my last post ends in the letter

ฐ <ṭha> [thɔɔ] (< > indicate an Indic-based transliteration; [ ] indicate modern Thai pronunciation)

Its complexity surprised me when I first learned the Thai alphabet at the end of 1993. But why? It's not as complex as, say, a 48-stroke Chinese character like

龘 'flight of dragons' (龍 'dragon' written thrice in the space of one character)

which has its own Wikipedia article, or the 64-stroke

𪚥 'verbose' (龍 'dragon' written four times in the space of one character)

which you might not be able to see.

What struck me as unusual was the complexity of the unattached lower part of it below the baseline. The rest of <ṭha> resembles the Thai letter for <ca> with the top of <dha> or <ra>:

ฐ จ ธ ร

<ṭha> <ca> <dha> <ra>

Until I encountered Thai, I had never seen an alphabet with complex unattached elements.

The basic 26-letter Roman alphabet has two such letters, lower case i and j; I'll stick to i here. The dot is not attached to the ı below it. However, this dot may not be absolutely integral to the letter except in Turkish:

Non-Turkish versions of the Roman alphabet I i (upper case without dot, lower case with dot)
Turkish I ı [ɯ] (without dots) İ i [i] (with dots)
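The Turkish pairing also trips up software. Python's built-in `str.lower()` and `str.upper()` follow Unicode's default, locale-independent case mappings, which pair I with i. Lowercasing İ even yields two code points: i plus U+0307 COMBINING DOT ABOVE. The minimal sketch below remaps the dotted and dotless letters by hand before the default mapping runs. The helper names `turkish_lower` and `turkish_upper` are hypothetical.

```python
# Python's str.lower()/str.upper() follow Unicode's default, locale-independent
# case mappings, which pair I with i. Turkish pairs I with ı and İ with i.
def turkish_lower(s):
    """Hypothetical helper: remap Turkish capitals first, then lowercase the rest."""
    return s.replace("I", "ı").replace("İ", "i").lower()


def turkish_upper(s):
    """Hypothetical helper: remap Turkish small letters first, then uppercase the rest."""
    return s.replace("i", "İ").replace("ı", "I").upper()


print("I".lower(), turkish_lower("I"))       # i ı
print(len("İ".lower()), turkish_lower("İ"))  # 2 i  (default gives i + U+0307)
print("i".upper(), turkish_upper("i"))       # I İ
```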

The dot is absent from Gaelic type. It took me a while to notice that Noss' 1964 Thai grammar has a dotless ı to accommodate tone marks. A bit odd, perhaps, but not as bad as, say, substituting Г for F. How many non-Turks would be able to distinguish between Turkish dotted i and dotless ı without effort? Or notice that the dot of i was missing?* And the dot is lost when accents are added:

i ì í î ï ĩ ī ĭ ǐ ȉ ȋ ỉ
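In Unicode, each accented i above decomposes under NFD normalization into ordinary dotted i (U+0069) plus a combining accent. Unicode also gives i the "Soft_Dotted" property, which means the dot is expected to disappear in rendering when an accent is placed above it. So the loss of the dot is a display rule, not a change of base letter. A short Python check:

```python
import unicodedata

# The accented forms listed above, written as code points to avoid any
# copy-paste normalization: ì í î ï ĩ ī ĭ ǐ ȉ ȋ ỉ
ACCENTED_I = "\u00ec\u00ed\u00ee\u00ef\u0129\u012b\u012d\u01d0\u0209\u020b\u1ec9"

for ch in ACCENTED_I:
    # NFD splits the precomposed letter into its base and combining mark(s).
    base, *marks = unicodedata.normalize("NFD", ch)
    print(ch, f"U+{ord(base):04X}", [unicodedata.name(m) for m in marks])
```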

None of those accents are as complex as the shape at the bottom of Thai ฐ <ṭha>. That shape disappears when <u>, <uu>, or the phinthu vowel cancellation sign are added to the bottom in Windows Thai fonts:
ฐ ฐุ ฐู ฐฺ

<ṭha> <ṭhu> <ṭhuu> <ṭh>

So the bottom part of ฐ <ṭha> isn't absolutely essential either, though as far as I know, <ṭhu> and <ṭhuu> are not in any Thai words and vowelless <ṭh> is only used in the Thai transcription of Indic words. Are Thai taught to delete the bottom of ฐ <ṭha> to make room for vowels and phinthu in their handwriting? Deletion never came up in the Thai writing textbooks I studied in or out of class.

The simpler shape at the bottom of ญ <ña> also disappears in Windows Thai fonts if signs are added beneath it:

ญ ญุ ญู ญฺ

<ña> <ñu> <ñuu> <ñ>
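The same holds for the Thai clusters above. The text still contains the full base letter (U+0E10 THO THAN or U+0E0D YO YING), followed by a combining mark, so trimming the lower part is a choice the font makes when it draws the cluster. A quick Python inspection (the dictionary labels are just this post's transliterations):

```python
import unicodedata

# Base letters and the marks written beneath them (from the lists above).
BASES = {"ṭha": "\u0e10", "ña": "\u0e0d"}
MARKS = {"u": "\u0e38", "uu": "\u0e39", "phinthu": "\u0e3a"}

for label, base in BASES.items():
    for mark_label, mark in MARKS.items():
        cluster = base + mark
        # Each cluster is two code points: a letter (Lo) and a nonspacing mark (Mn).
        details = [(f"U+{ord(c):04X}", unicodedata.name(c), unicodedata.category(c))
                   for c in cluster]
        print(label, mark_label, cluster, details)
```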

No other Thai letters have integral yet separate parts. Where did these parts come from?

The dot (tittle) of i only dates from the 11th century. It was intended "to distinguish the letter i from strokes of nearby letters." The Greek source of Roman i still has no dot: ι.

The dots of Arabic consonants were added to disambiguate consonants: e.g., one shape for three consonants

ٮ

<b> / <t> / <th>

became three different letters:

ث  ت  ب

<b> <t> <th>
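Unicode reflects the end result. The dotted letters are separate characters with no decomposition into "shape plus dots," and the undotted shape has its own code point, U+066E ARABIC LETTER DOTLESS BEH. A quick Python check:

```python
import unicodedata

# Dotless beh, then beh, teh, theh. An empty decomposition string means the
# dotted letters cannot be split into the bare shape plus dots.
for ch in ["\u066e", "\u0628", "\u062a", "\u062b"]:
    print(ch, f"U+{ord(ch):04X}", unicodedata.name(ch),
          repr(unicodedata.decomposition(ch)))
```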

What was the function of the separate parts of Thai ฐ <ṭha> and ญ <ña>? The Thai alphabet was derived from the Khmer alphabet in the 13th century, but I don't know what either alphabet looked like back then. The modern Khmer letters for <ṭha> and <ña> are quite different from their modern Thai equivalents:

Transliteration <ṭha> <ña>
Thai ฐ ญ
Khmer ឋ ញ

I wonder if the bottom part of Khmer <ña> was added to distinguish it from <baa>:

ញ ពា

<ña> <baa>

though in Thai, there is no similarity between the two:

ญ พา

<ña> <baa>

Dropping the bottom part of Thai ญ <ña> would result in a completely unambiguous letter.

Similarly, dropping the bottom part of Thai ฐ <ṭha> would also result in a completely unambiguous letter. I could call the bottom of ฐ <ṭha> the 'platform' after the name of the letter, ฐอ ฐาน <ṭha ṭhaan> [thɔɔ thaan] 'platform th'. (Thaan is borrowed from Pali ṭhaana, cognate to English stand. Each Thai consonant letter is associated with an example word.)

The original Brahmi source of <ṭha> is simply a circle: O (corresponding to Greek θ). This original shape is still reflected in modern Devanagari ठ <ṭha>. The contrast between Devanagari ठ and Thai ฐ shocked me back in 1993 and the contrast between Brahmi O and Thai ฐ is even greater. How did <ṭha> get so complicated? Why did it grow a 'platform'?

Perhaps the letters preceding <ṭha> in the Thai alphabet may provide a clue:

ฎ ฏ ฐ

<ʔḍa> <ṭa> <ṭha>

The attached bottom parts of <ʔḍa> and <ṭa> resemble the unattached part of <ṭha>. Is this a coincidence? Removing those parts would make <ʔḍa> and <ṭa> resemble <bha>:

ฎ ฏ ภ

<ʔḍa> <ṭa>  <bha>

Were the bottom parts originally only attached to <ʔḍa> and <ṭa> as disambiguators, spreading by analogy to <ṭha>, the next letter in the sequence of the alphabet? <ṭha> had no vertical stroke on the right, so there was nothing for the disambiguator to attach to. Hence it became a freestanding platform.

*6.10.2:05: I did a little experiment on Facebook yesterday. I wrote the following status:

Do you notıce anythıng odd about the way thıs status ıs wrıtten? Your answers wıll help me wıth a future blog post. Thanks ın advance!

Five people responded. All noticed the dotless ı, but not everyone noticed it immediately.

On the other hand, if the bottom of Thai ฐ <ṭha> were consistently omitted, I predict that Thai readers would notice right away. A dot is tiny compared to the shape beneath Thai ฐ <ṭha>.

6.10.2:35: There are two Devanagari letters that are distinguished solely by the presence or absence of a dot:

ड ङ

<ḍa> and <ṅa>

Their Brahmi sources look completely different: <ḍa> and <ṅa>. Did they converge in shape and require a dot to differentiate them? They look completely different in Thai and Khmer:

Transliteration <ḍa> <ṅa>
Thai ฑ ง
Khmer ឌ ង

UN-THAI-NG A ณัฏฐ์ NAT

Thai names often incorporate Sanskrit and Pali elements which may be highly disguised: e.g., ภูมิพล Phuumiphon, a name of the current Thai king, is from Skt or P bhuumi 'earth' + bala 'strength'. (The romanization Bhumibol is a compromise between the Thai and Sanskrit pronunciations.) I can usually identify the source of a Thai name, but here's a name that's been baffling me for days:


ณัฏฐ์ Nat (spelled <ṇaṭṭha> with a silencing mark over the <ṭha> to indicate it's not pronounced)

The spelling implies that the name is from an Indic ṇaṭṭha, but there is no such word.

In fact, Monier-Williams lists no Sanskrit words beginning with ṇa other than the name of the letter itself, a few dubious ṇa nouns only listed in lexicons without any textual attestations, and the attested word ṇya 'name of an ocean in the Brahma-loka'. Consonant plus a is a very unusual shape for a Sanskrit noun. Are inflected forms of such monosyllabic nouns attested? Monosyllabic noun stems in Sanskrit usually end in long vowels: jaa- 'progeny' (cf. Latin genus), dhii- 'thought', bhuu- 'earth' (cf. bhuumi 'id.' above).

The Pali Text Society lists no words with initial ṇ at all.

The Royal Institute of Thailand lists only two ณ na <ṇa> words and one ณรงค์ naroŋ <ṇaraŋga> 'campaign', an abbreviation of รณรงค์ ranaroŋ <raṇaraŋga> < Skt raṇaraŋga 'battlefield'.

The closest word I can think of is Pali naṭṭha 'destroyed' < Skt naṣṭa 'destroyed' with a dental n- rather than a retroflex ṇ-. But who would name their son 'Destroyed'? And why isn't the name spelled with dental น <n> as


นัฏฐ์ Nat (spelled <naṭṭha>)?

Skt dental n sporadically becomes P retroflex ṇ: e.g., Skt jñaana > P ñaaṇa 'knowledge' (cognate to know, gnosis, etc.). But did that ever occur in initial position? Even if ณัฏฐ์ is from some Pali ṇaṭṭha not in dictionaries, that still doesn't account for the negative semantics. Is the use of retroflex ณ <ṇ> an attempt to avoid writing 'destroyed'?

The only similar Indic word with positive semantics that I can think of is นาถ naat <naatha> < Skt or Pali naatha 'protector', but the long vowel doesn't match and the spelling is very different.

Could Nat have originated as an abbreviation of a longer name like 'X-destroyed'? But X in that type of Indic compound would be the destroyer. Why name your son 'destroyed by X'? Thai names are normally positive: e.g., I had a classmate named เศรษฐพล Setthaphon < P seṭṭha 'best' + bala 'strength' (cf. Phuumiphon above). I assume nobody would name their son บาปิฏฐ์ Baapit < P paapiṭṭha 'worst'. I was relieved to see zero hits for Baapit in Google.

FROM KARATE TO KHAARAATEE

Thai and Japanese both have short and long vowels. One would expect the following patterns of borrowing:

Japanese short vowels > Thai short vowels: e.g.,

J Okinawa : T โอะกินะวะ Okinawa

Japanese long vowels > Thai long vowels
J Toohoo : T โตโฮ Toohoo (Toho is the studio of Godzilla)

(For simplicity and ease of comparison, I've omitted glottal stops from my Thai transcription.)

However, karate with three short vowels turned up in the last post as คาราเต้ khaaraatee with three long vowels. Other mismatches:

J (Pat) Morita : T มอริตะ Mɔɔrita

J Gojira 'Godzilla' : T ก็อตซิลล่า Kɔtsilaa (there is no g, j, or dz in Thai)

J Bushidoo : T บูชิโด Buusidoo (there is no sh in Thai)

J samurai : T ซามูไร saamuurai

J ninja : T นินจา ninchaa

J Naruto : T นารุโตะ Naaruto

J Nara : T นารา Naaraa (the name of a character in Naruto) but นะระ Nara (the name of a city)

I cannot find any cases of Japanese long vowels corresponding to Thai short vowels.*
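The mismatches can be tallied mechanically. The Python sketch below assumes the transcription conventions used in this post: a doubled vowel letter is long, and any other vowel run, including a diphthong like ai, counts as one short unit. It compares vowel lengths position by position and ignores vowel quality entirely. The helper names are hypothetical.

```python
import re

# Runs of vowel letters in a romanization (ɔ and ɛ appear in the Thai forms).
VOWEL_RUN = re.compile(r"[aeiouɔɛ]+")


def lengths(word):
    """Short/long pattern, assuming a doubled vowel letter marks a long vowel.
    Any other run (a single vowel or a diphthong such as "ai") counts as short."""
    return ["long" if len(run) == 2 and run[0] == run[1] else "short"
            for run in VOWEL_RUN.findall(word.lower())]


def compare(ja, th):
    """Positions where vowel length differs, labeled from the Thai side."""
    out = []
    for i, (j, t) in enumerate(zip(lengths(ja), lengths(th))):
        if j != t:
            out.append((i, "lengthened" if t == "long" else "shortened"))
    return out


# Japanese : Thai pairs copied from the lists above.
PAIRS = [("Okinawa", "Okinawa"), ("Toohoo", "Toohoo"), ("karate", "khaaraatee"),
         ("Morita", "Mɔɔrita"), ("Gojira", "Kɔtsilaa"), ("Bushidoo", "Buusidoo"),
         ("samurai", "saamuurai"), ("ninja", "ninchaa"), ("Naruto", "Naaruto"),
         ("Nara", "Naaraa")]

for ja, th in PAIRS:
    print(f"{ja:<9} {th:<11} {compare(ja, th)}")
```

On these pairs every mismatch is a Thai lengthening. `compare("Tookyoo", "Tookiaw")` flags position 1 as shortened, which is the apparent exception taken up in the footnote below.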

One could propose that some of the mismatches are due to borrowing through English, but the Thai long vowels and even some vowel qualities are unexpected from an English perspective. I'd predict kharaatii, Mɔriita, Kɔtsila (without any long vowels), sɛ(ɛ)mura(a)i based on Anglicized pronunciations I've heard.

Do Thai learners of Japanese (and vice versa) have trouble with each other's vowel lengths? I didn't have any problems with vowel length in either language.

In the Thai Wikipedia, vowel length is preserved in Thai versions of less well-known Japanese names. I suspect that vowel length correlations are looser in 'casual' borrowings, which are likely to be filtered through English. But why didn't the Thai translator of Naruto systematically carry over vowel lengths into Thai versions of names? I don't even understand the reasoning behind retitling Naruto as


Ninchaa khaathaa Oohoohe

lit. 'ninja magic-word Ohohe'

in Thai. What's an 'Ohohe'? I don't think it means anything in Japanese or Thai and as far as I can tell, it's not the name of anyone in the series. Is it a made-up magic word that the Thai version of Naruto says?

khaathaa is from earlier Thai *gaathaa, a borrowing from Sanskrit gaathaa 'verse', cognate to Avestan gaaθaa (i.e., the gathas of Zoroastrianism).

Vowel length correlations are strong between Thai and three of its major vocabulary sources: Khmer, Sanskrit, and Pali. An exception I learned early on was

T ครู khruu < *gruu < Khmer គ្រូ  *gruu < Sanskrit or Pali guru (no long vowel) 'teacher'

Vowel length inaccuracy interests me because Gong and others have reconstructed phonemic vowel length in Tangut, though I have yet to see any correlations between Tangut vowel length and Sanskrit vowel length in Tangut transcriptions of Sanskrit. This is perhaps because the Tangut learned Sanskrit through Chinese which has no vowel length. However, didn't the Tangut also learn Sanskrit through Tibetans who did preserve Sanskrit vowel length even though their own language may have lacked it at the time? Could the Tangut vowel length distinction be reinterpreted as some other kind of distinction: e.g., a final glottal stop as implied by the notation of Arakawa's reconstruction?

*One might argue that

J Tookyoo : T โตเกียว Tookiaw

is a case of a Thai shortening of a Japanese long vowel, but I suspect that the Thai name is a transliteration of the old kana spelling トウキヤウ <toukiyau> for Tookyoo. Moreover, there is no long -iaaw or -yoo rhyme in Thai. (Yoo is possible by itself in Thai, but not after a consonant.)

Tangut fonts by Mojikyo.org
Tangut radical font by Andrew West
All other content copyright © 2002-2011 Amritavision