Tonight I was reading about the Tibetan Mongolian Buddhist Cultural Center whose director is the 8th Arjia Rinpoche. The English Wikipedia has the Tibetan spelling ཨ་རྒྱ་ <a.rgya.> but the Chinese Wikipedia has the Tibetan spelling ཨ་ཀྱ་ <a.kya.>. Which is correct? Does the r of English Arjia correspond to a Tibetan r, or is it an intrusive r like the r of Burma and Bamar from r-less Burmese ဗမာ <bamā>?

The 8th Arjia Rinpoche's title ཧོ་ཐོག་ཐུ་ <ho.thog.thu> is from Mongolian qutuɣ-tu 'holy' (lit. 'sanctity-possessing'). Why was Mongolian u borrowed as o in the first two syllables but as u in the third syllable? Was Mongolian /u/ lowered to [ʊ] in the vicinity of uvular /q/ and /ɣ/ (presumably [χ] and [ʁ] in the source variety of Mongolian)?

There is a link on the TMBCC site to the site of Sera Je Secondary School in India. The school is named after the Sera སེ་ར་ <se.ra.> Monastery but the spelling on the school's site is monosyllabic: སེར་ <ser.>. Is the latter correct?

INITIAL AND ME-DE-AL KHITAN SMALL SCRIPT 205

The Khitan small script character 205 is usually in final position where it most likely represented the dative-locative suffix <de>. It can also be a word by itself (巴拉哈達洞壁墨書 3 2-8). Here are instances of 205 in other positions:

205-334 <de.g(i)> (仁懿皇后 23-15)

Kane (2009) proposed that Khitan small script consonant characters like 334 had inherent vowels, but I can't tell if 334 was <g> or <gi> here.

From now on I am using Qidan xiaozi yanjiu numbers to refer to the characters both in the text of my blog and in future image names to reflect my partial agnosticism. 334 will still be 334 even if I change my mind about its reading.

205-339 <de.i> (道宗 22-21, 許王 31-9, 48-11, 55-5, 蕭仲恭 38-12)

This could be a noun 205 plus a genitive suffix 339 or a monomorphemic, disyllabic word.

205-131-097 <de.u.úr> (蕭仲恭 38-4, 40-33), 205-131-236-339 <de.u.ur.i> (蕭仲恭 36-42)

Are these two spellings of a noun deu(u)r without and with a genitive suffix -i?

205-261-112-361 <de.l(e).ge.én> (蕭仲恭 3-11)

Is this a verb del(e)g(e)- followed by a feminine past tense suffix -en?

028-205 <ś.de> (許王 10-15, 37-13), 028-205-339 <ś.de.i> (許王 60-5)

'Altaic'-type languages generally lack initial clusters*, so Khitan is often assumed to lack initial clusters. If this word lacked an initial cluster, its first vowel may remain unknown unless a spelling with a CV first character or a vowel second character after 028 is found.

If this word had vowel harmony, its first vowel may have been e or i. 205 is a dative-locative suffix for words with those two vowels, indicating that e and i belonged to the same vowel harmony class. However, 205 is probably not a suffix in 028-205 since I have not seen 028 as an independent word. Kane (2009: 188) transcribed this word as <śi.de> with an inherent vowel <i> in his reading of 028.

If 028 was an independent word, 205-339 might be the ablative suffix <de.i>. Is <de.i> a combination of dative-locative <de> with genitive <i>? That combination is reminiscent of Benzing's (1955: 83) Proto-Tungusic ablative *-du-ki from dative *-du plus a particle *-ki. Is this similarity coincidental or the product of the influence of Khitan or a related language on Tungusic? One must be careful about jumping to conclusions about short grammatical morphemes, as unrelated lookalikes abound: e.g., the Japanese locative de (< ni-te, gerund of ni [Vovin 2003: 46]) and the Classical Tibetan genitive -Hi < *ki.

028-205 may be a noun with a genitive suffix 339.

Without seeing the original texts, I cannot determine whether the next two (groups of?) words are sequences of two blocks or single blocks with five elements.


028-205-131-153 134 <ś.de.ú.j(i) TWO> (according to Andrew West) or

028-205 131-153-254 <ś.de ú.j(i)d> (according to Kane 2009: 188) or

028-205-131-153-254 <ś.de.ú.j(i)d> (according to Qidan xiaozi yanjiu)

(大金皇弟都統經略郎君行記 3-6)

028-205 might be the same word as 028-205 above.

254 may be a plural suffix. 131-152 (= 131-153) occurs by itself in 興宗 14-28 and with the genitive suffix 140 in 仁懿皇后. So 028-205 (-)131-153-254 could be a plural compound noun. How often are compound nouns written in single blocks?

If the final character is 134 <TWO> rather than 254, then 134 <TWO> may modify the following word, 227, which Kane speculated might mean 'appearance'. As of 2009, Aisin Gioro speculated it could have been read jisu (cf. Mongolian jisü 'appearance'). I do not know Aisin Gioro's current reading for this character.

227 lacks a plural suffix which I would expect after a numeral. Did 227 have a zero plural, or was it an inherent plural (cf. English people)? Unfortunately I do not know of any other instances in which 227 is preceded by a numeral.

The absence of a masculine dot on 134 <TWO> may imply that 227 was feminine (or neuter?).


208-205 020-332-339 <lu.de ei.nai.i>  (according to Qidan xiaozi yanjiu) or

208-205-340-332-339 <lu.de.x.nai.i> (according to Andrew West) (興宗 3-26)

I assume <lu.de> is not the dative-locative of <lu> 'dragon' which should be

208-179 <lu.dú>
with the suffix variant for <u>-stems.

020-332-339 <ei.nai.i>

seems to mix <e> with <a>, which is unlikely according to my admittedly limited understanding of Khitan vowel harmony. Perhaps 020 was <y> [j] here, and <y.nai.i> was read as something like [janaji].

If 208-205-340-332-339 <lu.de.x.nai.i> is a genitive of a compound, I would expect to find 340-332 <x.nai> elsewhere. However, the closest match I could find was

340-332-021-144 <x.nai.mó.ún>

in Andrew West's database. That corresponds to

340-289-184-144 <x.iú/da?.am.ún>

in Qidan xiaozi yanjiu. Both have unexpected mixes of <a> with <ú>. But it's likely I misunderstand how Khitan vowel harmony works.

340 <x> probably had an inherent vowel. Kane (2009: 30) transcribed 340 as <xe>. It must have had a vowel when it appeared as a word in isolation as in 慶陵壁畫題字 X-2. If that vowel was <e>, I wouldn't expect it to be followed by <a> - but 289 <da?>, 332 <nai>, and 184 <am> all contain <a>! Was 340 <xe> combined with an <a>-class word?

*An obvious exception is Middle Korean which was full of clusters like st- and even pst-. However, these clusters were short-lived products of syncope. Old Korean and other early Koreanic languages may have lacked initial clusters except in Chinese loanwords.

KHITAN LARGE SCRIPT WORD DE-VISION

The Khitan small script has two big advantages over the Khitan large script for modern scholars:

- a smaller number of characters: i.e., fewer variables - perhaps 400 small script characters excluding variants as opposed to perhaps 1,000 large script characters excluding variants

- clustering of characters in blocks corresponding to words (or at least morphemes) as opposed to no obvious word or morphemic division in the large script

The key word in the second point is "obvious". There are clues to word division in the large script if both scripts represent the same language.

Khitan is an 'Altaic'-type language (but see here!) with suffixes and vowel harmony. For example, there are four dative-locative suffixes in the small script:

Transliteration    Generally after stems with
<iú> (<da>?*)      <a>
<de>               <e> or <i>
<do>               <o>
<dú>               <u>

As we'll see below, there are exceptions to these patterns.
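
The table can be read as a lookup keyed by a stem vowel. Here is a minimal Python sketch of that reading, assuming (as my own simplification) that a single vowel of the transliterated stem decides the suffix. The names `SUFFIX_BY_VOWEL`, `plain_vowels`, `dative_locative`, and the `decide_by` switch are hypothetical, and the sketch knows nothing about the exceptions.

```python
import unicodedata

# Dative-locative suffixes keyed by the vowel class of the stem, following the table.
# <iú> is listed for <a>-stems because that is the usual transliteration of the character.
SUFFIX_BY_VOWEL = {"a": "iú", "e": "de", "i": "de", "o": "do", "u": "dú"}


def plain_vowels(stem):
    """Return the vowels of a transliterated stem in order, with diacritics stripped."""
    decomposed = unicodedata.normalize("NFD", stem.lower())
    return [ch for ch in decomposed if ch in "aeiou"]


def dative_locative(stem, decide_by="first"):
    """Pick a suffix from the first or the last vowel of the stem (an assumption)."""
    vowels = plain_vowels(stem)
    if not vowels:
        return None  # e.g. a lone consonant character such as <ś>
    key = vowels[0] if decide_by == "first" else vowels[-1]
    return SUFFIX_BY_VOWEL[key]


for stem in ("lu", "ne.ra", "jau tau"):
    print(stem, dative_locative(stem, "first"), dative_locative(stem, "last"))
```

With `decide_by="first"` the sketch gives <de> for <ne.ra>, which matches the attested <ne.ra.de> discussed below. For <jau tau> neither setting gives <de>, so the attested <jau tau de> stays an exception for a rule this simple.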

Since large script characters often represent single syllables, I would expect each of these small script characters to have a large script counterpart.

So far the only large script dative-locative suffix I know of is

<de> [tə]?

which may be graphically related to Old Chinese 時 *də. If the reading of a Khitan large script character resembles a pre-Liao Dynasty reading of a similar Chinese character, that may either be a coincidence or evidence for Janhunen's Parhae hypothesis (in which the Khitan large script was based on a Parhae script that was a sister of mainstream sinography rather than an early 10th century invention).

In theory, if we see <de> in a Khitan large script text, it may be

- a dative-locative suffix for a noun with <e> or <i>

- a phonogram for <de>

  - at the end of a word

  - in the middle of a word

  - at the beginning of a word

  - which is a monosyllabic word

If we see a string <X.de> and see <X> with other known case suffixes, then <X> is probably a noun with <e> or <i>.

Or is it? <jau tau> 'bandit suppression commissioner' has neither <e> nor <i> but is followed by <de> in the large script:

<jau tau de> (Yongning 8; Kane 2009: 174)

Its Chinese source 招討 *tʃiaw tʰaw does have an *i. Was the Khitan word [tɕiaw tʰaw] with an [i] reflected in vowel harmony but not in its spellings with <jau>? I have not found any large or small script examples of <tau> 'five' plus <de>. That may indicate <iau ... au ... e> was possible but not <au ...  e>.

Another case in which the first vowel of a word may determine vowel harmony is

<ne.ra.de> 'on the tomb' (Kane 2009: 137)

This raises the question of how a disharmonic word like <ne.ra> came to be. <ne.ra> is probably not a compound of <ne> and <ra> since it is unlikely that any Khitan word could begin with <r>.

Perhaps sporadic cases of <de> after non-<e>/<i> vowels indicate that the case markers were beginning to collapse into a single [tə] usable after any vowel. Such a suffix would have been like the Manchu dative-locative suffix -de [tə] which is a merger of Jin Jurchen

-do and -dö

(as reconstructed by Kiyose 1977: 42; Jin 1984 reconstructed -do and -du).

Is the Proto-Tungusic dative *-du (as reconstructed by Georg 2004) a borrowing from Khitan or a related language? Are its locative uses in Jurchen/Manchu due to the influence of Khitan which lacked a dative/locative distinction?

*<iú> is clearly something like <iú> in Khitan small script transcriptions of Chinese syllables with *-y, but appears where <da> would be expected after nouns. Did it have two unrelated readings?

DERUSU UZĀRA

Much of my career has involved the reconstruction of languages through scripts for other languages: e.g., Old Japanese and Old Korean through Chinese characters, Tangut through Chinese characters and the Tibetan script, etc. To understand how transcription worked in the past, it is useful to study transcriptions in the present.

Today I was surprised by how Russian Дерсу Узала Dersu Uzala [dʲɪrˈsu ʊzɐˈla] corresponded to Japanese デルス・ウザーラ Derusu Uzāra with an unexpected long vowel. Russian does not have phonemic vowel length, and these languages which do have phonemic vowel length only have short vowels in their versions of the name:

Czech Děrsu Uzala (not *Uzála; the háček indicates palatalization of the preceding D)

Finnish Dersu Uzala (not *Uzaala)

Slovak Dersu Uzala (not *Uzála)

What was Dersu Uzala's original Nanai name? Nanai has long vowels, so I thought it might be Dərsu Uzāla, but then I discovered that the name is Дэрсүү Узаала Dersüü Uzaala as well as Дэрсү Узала Dersü Uzala in Mongolian. Was the Nanai name Dərsü̅ Uzāla with two long vowels?

Is the Japanese name based on Nanai, and if so, why was only one long vowel retained? Could Derusu Uzāra be based on Tuvan actor Maxim Munzuk's pronunciation of his character's name?

I assume the Czech, Finnish, and Slovak names are transliterations of Russian without any reference to Nanai.

The first long vowel in Mongolian Дэрсүү Узаала corresponds to a stressed Russian vowel, but the second doesn't. What is the reasoning behind final stress for both halves of the name in Russian?

Closer to home, I don't understand how stress is assigned to Hawaiian and Japanese names in English in Hawaii: e.g.,

[kʰʊˈhiːow] for Kūhiō [ˌkuːhiˈoː]

[tʰəˈnakə] for Tanaka

Stress in Hawaiian does not necessarily match stress in Anglicized Hawaiian names, and Japanese has no stress.

CELTIC HEROES AT DAWN

I was recently asked about the etymology of the English word Easter and whether it had a Celtic connection. The Proto-Indo-European root of Easter is *ʕews 'shine', so I would expect Proto-Celtic *aus according to the correspondences here. However, the University of Wales has a file with a very different reconstruction:

Proto-Indo-European        *ʕe  w  s  -
Hypothetical Proto-Celtic  *a   u  s
Actual? Proto-Celtic       *wā     s  ri-

The initial consonants of Middle Irish fair 'sunrise' and Welsh gwawr 'dawn' point to Proto-Celtic *w-. Did *au irregularly become *wā?

Simply reversing *a and *u would not account for the long vowel. Does that vowel reflect a Proto-Indo-European lengthened grade ēws?

Is *-ri- a suffix?

Not far from *wāsri- 'dawn' in that list of Proto-Celtic reconstructions is *wāro- 'hero'. The latter superficially (and presumably only coincidentally) resembles Proto-Indo-European *wiHro- 'man' whose true Proto-Celtic reflex is *wiro- (> Irish fear and Welsh gŵr). What attested Celtic forms underlie the reconstruction of *wāro-?

DR. ZHIVOGO (SIC)

The Russian name Zhivago = Živago [ʐɨˈvagə] was borrowed from the Old Church Slavonic definite adjective živago '(of) the living' (masculine animate accusative-genitive singular), a contraction of živ-a 'living' plus -jego 'that which is known'. The native Russian equivalent of this adjective is živogo [ʐɨˈvovə] with a different second vowel.

Two weeks ago I wrote about the unexpected final -a of masculine animate accusative-genitive singular adjectives in Serbo-Croatian and Slovene. I was also puzzled by the penultimate vowel o in Serbo-Croatian which is like that of East Slavic:

Branch  Language             '(the) living'              Type      Cf. 'him/his'  Cf. 'whom/whose'
(n/a)   Proto-Slavic         *živ-a-jego                 aje       *jego          *kogo
South   Old Church Slavonic  živajego, živaago, živago   aje/aa/a  jego           kogo
East    Russian              živógo                      o         jegó [jɪˈvo]   kogó [kɐˈvo]
East    Belarusian           živóha                      o         jahó           kahó
East    Ukrainian            žyvóho                      o         johó           kohó
South   Serbo-Croatian       živog(a)                    o         njega          kog(a)
South   Slovene              živega                      e         njega          koga
West    Polish               żywego                      e         jego           kogo
West    Czech and Slovak     živého                      e         jeho           koho

Most languages above have transparent compressions of *aje. Old Church Slavonic favored the first vowel (aje > aa > a), whereas Slovene and West Slavic favored the second (*aje > long é in Czech and Slovak, e in Slovene and Polish). However, East Slavic and Serbo-Croatian o does not look like a compression of *aje.

One could try to explain that o as having assimilated to the following *o which then became a in Belarusian and Serbo-Croatian (and was even optionally lost in the latter).

However, my guess is that the o was by analogy with the hard pronominal declension: e.g., *kogo 'whom/whose'. Ukrainian has taken this analogy further than the others so even soft-stem adjectives have o, whereas the others have e or a ja which may be from *e:

Ukrainian -'oho (with palatalization of the stem-final consonant before o)

Belarusian -jaho < *-ego (or *-ogo before a palatalized stem-final consonant?)

Russian -ego

Serbo-Croatian -eg(a)

Moving on from morphology to semantics, is the surname Živago really a frozen genitive?

The same Old Church Slavonic word is an accusative in Luke 24: 5: 'Why do you seek the Living One (živago) among the dead?' (See the OCS text with a Russian translation here.)

That line in turn reminded me of Dutch surnames that are frozen accusatives: e.g., Den Beste.

I would like to see studies of case frequency (e.g., how frequent are accusatives relative to nominatives?) and case freezing (e.g., how often do frozen forms originate from old accusatives as in Romance?). I don't expect a simple account of the latter since multiple factors can influence speakers' choices of forms to freeze. For instance, suppose two languages lost their case system. Language A originally had a single stem for all cases and a zero ending for the nominative. On the other hand,  language B originally had one stem for the nominative and an oblique stem for all other cases.

Case        Language A  Language B
Nominative  ka          cəns
Accusative  ka-ti       cəs-e
Genitive    ka-pu       cəs-o
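
The two toy paradigms can be put into a few lines of Python. This is only a sketch under a crude assumption of my own: the form that survives the loss of case is the stem shared by the most case forms. The names `PARADIGMS`, `stem`, and `surviving_stem` are made up for illustration, and real languages certainly weigh more factors than this.

```python
from collections import Counter

# The invented paradigms from the table; hyphens separate stem and ending.
PARADIGMS = {
    "A": {"nominative": "ka", "accusative": "ka-ti", "genitive": "ka-pu"},
    "B": {"nominative": "cəns", "accusative": "cəs-e", "genitive": "cəs-o"},
}


def stem(form):
    """Return everything before the first hyphen (the whole form if there is none)."""
    return form.split("-")[0]


def surviving_stem(paradigm):
    """Assumption: the stem found in the most case forms is the one that gets frozen."""
    counts = Counter(stem(form) for form in paradigm.values())
    return counts.most_common(1)[0][0]


for name, paradigm in PARADIGMS.items():
    print(name, surviving_stem(paradigm))
```

Counting by number of case forms is the simplest possible weighting. Counting by how often each case is actually used in texts would need the frequency studies I wish I had.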

I would expect language A to have nouns based on the nominative (e.g., ka) and language B to have nouns based on the oblique (e.g., cəs instead of *cəns).

ROMANIZZAZIONE DELLA LINGUA RUSSA

This morning I saw this Italian cover for Doctor Zhivago. The style of romanization caught my eye because of its use of accents and háčeks (in bold):

Borís Leonídovič Pasternàk


The accents correspond to Russian stress. I expected all stressed vowels to bear grave accents. Was an acute accent chosen for í because it is high and more like mid-high é [e] and ó [o] than mid-low è [ɛ], ò [ɔ], and low à [a]? Why doesn't "Živago" bear an accent? Because the stress is penultimate and predictable? The Italian Wikipedia has no accent in the title Il dottor Živago but has an accent on the character's name Jùrij Andrèevič Živàgo. (Stressed ù [u] has a grave accent even though it is high like í [i].)

Do most Italians understand the function of the háček? According to Wikipedia, Zivago and Boris Leonidovich Pasternak without háčeks are the usual Italianizations of those names. Similarly, I see Chernobyl at Corriere della sera, though the Italian Wikipedia has Černobyl' with a háček and an apostrophe for the soft sign. Should encyclopedias favor scientific transcriptions over lay transcriptions? Which is a user more likely to look up, Chernobyl or Černobyl'? Does it matter if one redirects to the other?

I'm surprised there is no article on Russian romanization or transliteration in the Italian Wikipedia.

