English distinguishes between voiceless aspirated and voiced unaspirated stops and affricates in word-initial position:

Voiceless aspirated pʰ tʰ tʃʰ kʰ
Spelling p t ch k
Voiced unaspirated b d dʒ g
Spelling b d j g

The aspiration [ʰ] is not indicated in spelling. English ph is [f], English th is [θ] ~ [ð], and English kh is phonetically identical to English k [kʰ].

Many East, Southeast, and South Asian languages distinguish between voiceless unaspirated (i.e., [ʰ]-less) and aspirated stops and affricates in word-initial position:

Voiceless unaspirated p t tʃ k
Voiceless aspirated pʰ tʰ tʃʰ kʰ

([tʃ] is only an example of a palatal affricate; the details of affricates or palatal stops vary by language.)

This distinction can be romanized in several ways:

International Phonetic Alphabet p pʰ t tʰ tʃ tʃʰ k kʰ
I. No distinction p t ch k
II. Apostrophe p p' t t' ch ch' k k'
III. Medial -h- for aspirates p ph t th c(h) ch(h) k kh
IV. Initial h- for aspirates p hp t ht c hc k hk
V. Voiced letters for aspirates b p d t j ch g k

Type III c and ch are often written with extra h's as ch and chh in English to avoid the mispronunciation of c as [k].
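The five types can be sketched as a small lookup table. This is only an illustration: the names `SPELLINGS`, `build_table`, and `romanize` are my own, the input is assumed to be already segmented into IPA symbols, and the affricates in type III use the ch/chh variant rather than c/ch.

```python
# IPA for the two series; [tʃ] stands in for whatever affricate a language has.
IPA_PLAIN = ["p", "t", "tʃ", "k"]
IPA_ASPIRATED = ["pʰ", "tʰ", "tʃʰ", "kʰ"]

# For each type: (spellings of the plain series, spellings of the aspirates).
SPELLINGS = {
    "I":   (["p", "t", "ch", "k"], ["p", "t", "ch", "k"]),
    "II":  (["p", "t", "ch", "k"], ["p'", "t'", "ch'", "k'"]),
    "III": (["p", "t", "ch", "k"], ["ph", "th", "chh", "kh"]),
    "IV":  (["p", "t", "c", "k"], ["hp", "ht", "hc", "hk"]),
    "V":   (["b", "d", "j", "g"], ["p", "t", "ch", "k"]),
}


def build_table(scheme):
    """Return an IPA-to-spelling dictionary for one romanization type."""
    plain, aspirated = SPELLINGS[scheme]
    table = dict(zip(IPA_PLAIN, plain))
    table.update(zip(IPA_ASPIRATED, aspirated))
    return table


def romanize(segments, scheme):
    """Spell a list of IPA segments; symbols not in the table pass through."""
    table = build_table(scheme)
    return "".join(table.get(segment, segment) for segment in segments)


print(romanize(["tʰ", "a", "i"], "I"))
print(romanize(["tʰ", "a", "i"], "II"))
print(romanize(["tʰ", "a", "i"], "III"))
```

Type I is the only type in which the two series map to the same spellings, so it is the only one that cannot be reversed.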

Type II was the most common for Chinese and Korean until the recent rise of apostrophe-free type V romanizations: e.g.,

International Phonetic Alphabet [tɑw] [tʰɑŋ] [tɛgu] [tʰɛkkwəndo]*
II. Apostrophe Tao T'ang Taegu T'aekwŏndo
V. Voiced letters for aspirates Dao Tang Daegu Taegwondo

While many type II spellings have been replaced by type V spellings in English (e.g., [mɑw tsɤ tʊŋ]  as Mao Zedong instead of Mao Tse-tung and [kaŋnam] as Gangnam instead of Kangnam), others are too established to change: e.g., [kimtɕʰi] as kimchi but not gimchi.

Many Chinese and Korean names in English are in type I romanization: i.e., a simplified type II without apostrophes or anything else beyond the basic 26 letters. Hence T'aipei and t'aekwondo are normally written Taipei and taekwondo.

Type III is favored for nearly all Southeast and South Asian languages: e.g., Thai and Khmer (which would be Tai and Kmer in type I and T'ai and K'mer in type II).

I favor type III for Korean since it makes some phonetic processes transparent: e.g.,

/tɕoh/ + /ko/ = 좋고 [tɕokʰo] 'good and'

III. choh- + -ko = chokho (-h and -k- 'trade places')

II. choh- + -ko = chok'o (-h 'becomes' an apostrophe after -k-)

V. joh- + -go = joko (the two letters -hg- are replaced by a completely different single letter -k-)

/tɕoh/ + /ta/ =  좋다 [tɕotʰa] 'is good'

III. choh- + -ta = chotha (-h and -t- 'trade places')

II. choh- + -ta = chot'a (-h 'becomes' an apostrophe after -t-)

V. joh- + -da = jota (the two letters -hd- are replaced by a completely different single letter -t-)
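The comparison above can be sketched in a few lines. This is a toy model, not a description of Korean phonology: the names `ASPIRATE` and `join` are hypothetical, only the two stops in the examples are covered, and hyphens are left out of the input.

```python
# How each type spells the aspirate that results from h + plain stop.
ASPIRATE = {
    "III": {"k": "kh", "t": "th"},  # the h moves to the other side of the stop
    "II":  {"k": "k'", "t": "t'"},  # the h is replaced by an apostrophe
    "V":   {"g": "k", "d": "t"},    # a different letter replaces both
}


def join(stem, suffix, scheme):
    """Attach a suffix to a stem, merging stem-final h with a following stop."""
    pairs = ASPIRATE[scheme]
    if stem.endswith("h") and suffix[0] in pairs:
        return stem[:-1] + pairs[suffix[0]] + suffix[1:]
    return stem + suffix


print(join("choh", "ko", "III"))
print(join("choh", "ko", "II"))
print(join("joh", "go", "V"))
```

In type III every letter of the input survives in the output; in types II and V one or both letters of the cluster disappear.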

Burmese is sometimes romanized with type IV to avoid confusing the aspirated stop ht [tʰ] with the fricative th [θ].

All of that demonstrates the difficulty in writing the sounds of one language in a script for another.

The Khitan encountered similar problems with Chinese. They generally wrote Chinese voiceless aspirated obstruents with what I'll call 'series 1' small script characters but wrote Chinese voiceless unaspirated obstruents with both 'series 1' and 'series 2' small script characters:

Liao Chinese *p *pʰ *t *tʰ *tʂ *tʂʰ *k *kʰ
Series 1 transcription - -
Series 2 transcription - - - -

I excluded exceptions to this basic pattern requiring investigation.

I conclude that series 1 was voiceless aspirated and series 2 was voiced: i.e., that Khitan obstruents were like those of English rather than Chinese: i.e., without a voiceless unaspirated series. Just as Chinese voiceless unaspirated [t] may sound 'halfway' between an English [tʰ] and [d] (and is romanized as both t and d), Liao Chinese *t may have sounded 'halfway' between Khitan *tʰ and *d and was transcribed as both

series 1 <t> [tʰ] and series 2 <d>


<t.ei>, <d.ei>, <t.i>, <d.i>, <t.oi>

for Liao Chinese 德 *təj (corresponding to modern Mandarin [tɤ] which has been romanized as te and de).

Although modern Mongolian and Manchu have a Chinese-style distinction between voiceless unaspirated and aspirated obstruents, that may be due to long-term Chinese influence, and Khitan may have preserved an earlier English-style distinction. (Not that Khitan was ever influenced by English!)

I have wondered if Khitan had an English-style distinction for some time. I thank David Boxenhorn for independently suggesting it and inspiring me to test the hypothesis with the Khitan small script transcription data in Kane 2009.

*The [kk] of [tʰɛkkwəndo] is treated as if it were /k/ in the Korean spelling <thae.kwŏ> and in romanization.

'VIRTUAL' VARIATION IN THE SMALL KHITAN SCRIPT

Liao Chinese 德 'virtue' corresponded to at least three if not four or five different Khitan small script spellings (Kane 2009: 245, Qidan xiaozi yanjiu 1985: 621, 623):

<t.ei>, <d.ei>, <t.i>, <d.i>, <t.oi>

Kane listed the block for <t.i> with the transcription <d.i>. Both appear in the corpus in Qidan xiaozi yanjiu (許王 44-12, 道宗 34-15, 許王 7-16), though only the instance of <t.i> (whose initial may not be certain) in 許王 7-16 was identified as 德.

The <t> ~ <d> variation here and elsewhere implies that Khitan and Liao Chinese initial consonants did not quite match. Liao Chinese had a *t- ~ *tʰ- distinction, whereas Khitan may have distinguished between

- t(ʰ)- and d- (cf. English)

- t- and implosive ɗ- (cf. Vietnamese)

- t- and ejective tʼ- (cf. Nez Perce)

- t- and tense tt- (cf. Korean which also has tʰ-)

- t- and pharyngealized tˁ- (cf. Semitic; added 4.22.0:04)

- t- and preaspirated ʰt- (cf. Huautla Mazatec; 4.22.0:25)

Are there any other possibilities?

The variation in rhymes either indicates that the Chinese rhyme was something absent in Khitan or that the Khitan heard more than one version of the Chinese word. Dated Khitan texts may shed light on Chinese dialectal variation during the two centuries of the Liao Dynasty.

4.22.1:27: <t.oi> looks like an unlikely Khitanization of 德 given what we know of earlier and later stages:

Middle Chinese *tək (> borrowed into Korean as 덕 tŏk [tək])

Phags-pa Chinese ꡊꡜꡞꡗ <dhiy> [təj]

Old Mandarin *təj

Beijing de [tɤ]

However, supposing that <t.oi> is 德, perhaps MC *tək underwent these shifts in the dialect known to the Khitan:

*tək > *təɣ > *təɰ > *təj > *toj > *tøj
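The chain above can be replayed as a list of ordered rewrite rules. This is a minimal sketch under strong assumptions: the rules are written as unconditioned string replacements that happen to work for this one word, and the names `CHANGES` and `derive` are mine.

```python
# Each pair is (earlier, later); the order follows the hypothetical chain above.
CHANGES = [
    ("k", "ɣ"),    # lenition of the final stop
    ("ɣ", "ɰ"),    # fricative to approximant
    ("ɰ", "j"),    # velar glide to palatal glide
    ("əj", "oj"),  # rounding of schwa
    ("oj", "øj"),  # fronting before the palatal glide
]


def derive(form, changes):
    """Apply the changes in order and return every intermediate stage."""
    stages = [form]
    for earlier, later in changes:
        form = form.replace(earlier, later)
        stages.append(form)
    return stages


print(" > ".join("*" + stage for stage in derive("tək", CHANGES)))
```

Realistic sound changes would need conditioning environments; the point here is only that each stage differs from the previous one by a single step.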

<ei> ([əj]?), <i>, and <oi> could all be Khitanizations of *-øj.

A couple of Chinese varieties in the 小學堂 database do have front rounded vowels in 德:

Funing Eastern Min tœk

Southeastern Northern Min

Both of these varieties are spoken far to the south of former Liao territory and cannot be descended from the Chinese underlying loans in Khitan. Nonetheless they demonstrate that rounding of original schwa is possible in 德.

Unfortunately I do not know of any Chinese words rhyming with 德 that were also Khitanized with <oi>. The only other Chinese word in 德's rhyme class that Kane (2009: 245) listed is 特 corresponding to

<t.ei> and <d.ei>

in the Khitan small script.

It is also possible that <t.oi> is

- an error (doubtful; would the same error be made on both a mirror and a coin?)

- a taboo deformation of one of the other spellings (but did anyone important at the time have 德 in their name?)

- an unrelated native word for 'virtue'

- a word meaning something other than 'virtue'

BUC__REST(_)

What accounts for the variation in non-Romanian names for București [bukuˈreʃtʲ]?

1. Why does Irish Búcairist [buːkəɾʲɪʃtʲ] have a long vowel? Is it an attempt to preserve the quality of Romanian [u]? (Irish short u is [ʊ], not [u].)

2. Why does English Bucharest have ch instead of c corresponding to [k]? Was it borrowed from a language which had [x]? Why don't those languages have [k] instead?

3. Why do so many languages have a instead of u for the second vowel? Even two of Romanian's neighbors have a: Hungarian Bukarest [bukɒreʃt] and Ukrainian Бухарест <Buxarest>. Oddly the u somehow crossed the Hungarian 'a-barrier' into Slovak and (then?) Czech Bukurešť.

4. At least I'm pretty sure that [s] in many languages is a spelling pronunciation or transliteration of s sans comma. Initially I thought it was odd that Romanian's neighbor Ukrainian has [s] instead of [ʃ], but perhaps Ukrainian Бухарест <Buxarest> was borrowed from Russian, replacing an earlier name more like Serbian and Bulgarian Букурешт <Bukurešt>.

5. When did -ti become [tʲ] in Romanian? I'm surprised this sound change isn't mentioned in this long article on Romanian phonological history. Does final [t] in most languages reflect [tʲ], a sound absent in most European languages? Ukrainian and Russian Бухарест <Buxarest> have <t> even though both languages have [tʲ].

Does final [tʲ] in Czech and Slovak Bukurešť directly reflect Romanian -ti [tʲ]? In theory, an earlier Romanian [bukaresti] could have been interpreted as a Slovak genitive, dative, or locative singular from which a nominative and accusative Bukurešť could have been created by analogy with kosť 'bone'. Then that Bukurešť could have been borrowed into Czech. However, it would be simpler to assume that both postdate the -ti to [tʲ] shift in Romanian. How did Czechs and Slovaks know the name ended in [tʲ] if neither had direct contact with Romanians?  I would have expected the Slovak name to be like the Hungarian name. Was the name a learned borrowing?

Hungarian has final ty [tʲ], but as far as I know, it does not have final -sty [ʃtʲ]. Is that why Hungarian Bukarest [bukɒreʃt] ends in [t]?

Irish Búcairist [buːkəɾʲɪʃtʲ] mixes an un-Romanian a with a final cluster that looks like an attempt to mimic the Romanian original. (A hypothetical English-based name would be *Bucairiost [bʊkəɾʲɪst]. I think the vowel of the final syllable is [ɪ] because unstressed [ɛ] is not possible if I understand this article correctly.)

6. Why do these names have non-i vowels in their final syllables?

Romansh Bucaresta

Slovene Bukarešta

Lithuanian Bukareštas (with masculine nominative singular -s)

Portuguese Bucareste

Latvian Bukareste

Final -o in Japanese ブカレスト Bukaresuto indicates borrowing from a source with final -t.

I wonder if Korean once had an earlier Japanese-like borrowing that was replaced by 부쿠레슈티 Pukhureshuthi [pukʰureɕutʰi] which seems to be a straightforward transliteration of București.

A(R)JIA AND SER(A)

Tonight I was reading about the Tibetan Mongolian Buddhist Cultural Center whose director is the 8th Arjia Rinpoche. The English Wikipedia has the Tibetan spelling ཨ་རྒྱ་ <a.rgya.> but the Chinese Wikipedia has the Tibetan spelling ཨ་ཀྱ་ <a.kya.>. Which is correct? Does the r of English Arjia correspond to a Tibetan r, or is it an intrusive r like the r of Burma and Bamar from r-less Burmese ဗမာ <bamā>?

The 8th Arjia Rinpoche's title ཧོ་ཐོག་ཐུ་ <ho.thog.thu> is from Mongolian qutuɣ-tu 'holy' (lit. 'sanctity-possessing'). Why was Mongolian u borrowed as o in the first two syllables but as u in the third syllable? Was Mongolian /u/ lowered to [ʊ] in the vicinity of uvular /q/ and /ɣ/ (presumably [χ] and [ʁ] in the source variety of Mongolian)?

There is a link on the TMBCC site to the site of Sera Je Secondary School in India. The school is named after the Sera སེ་ར་ <se.ra.> Monastery but the spelling on the school's site is monosyllabic: སེར་ <ser.>. Is the latter correct?

INITIAL AND ME-DE-AL KHITAN SMALL SCRIPT 205

The Khitan small script character


is usually in final position where it most likely represented the dative-locative suffix <de>. It can also be a word by itself (巴拉哈達洞壁墨書 3 2-8). Here are instances of 205 in other positions:

205-334 <de.g(i)> (仁懿皇后 23-15)

Kane (2009) proposed that Khitan small script consonant characters like 334 had inherent vowels, but I can't tell if 334 was <g> or <gi> here.

From now on I am using Qidan xiaozi yanjiu numbers to refer to the characters both in the text of my blog and in future image names to reflect my partial agnosticism. 334 will still be 334 even if I change my mind about its reading.

205-339 <de.i> (道宗 22-21, 許王 31-9, 48-11, 55-5, 蕭仲恭 38-12)

This could be a noun 205 plus a genitive suffix 339 or a monomorphemic, disyllabic word.

205-131-097 <de.u.úr> (蕭仲恭 38-4, 40-33), 205-131-236-339 <de.u.ur.i> (蕭仲恭 36-42)

Are these two spellings of a noun deu(u)r without and with a genitive suffix -i?

205-261-112-361 <de.l(e).ge.én> (蕭仲恭 3-11)

Is this a verb del(e)g(e)- followed by a feminine past tense suffix -en?

028-205 <ś.de> (許王 10-15, 37-13), <ś.de.i> 028-205-339 (許王 60-5)

'Altaic'-type languages generally lack initial clusters*, so Khitan is often assumed to lack initial clusters. If this word lacked an initial cluster, its first vowel may remain unknown unless a spelling with a CV first character or a vowel second character after 028 is found.

If this word had vowel harmony, its first vowel may have been e or i. 205 is a dative-locative suffix for words with those two vowels, indicating that e and i belonged to the same vowel harmony class. However, 205 is probably not a suffix in 028-205 since I have not seen 028 as an independent word. Kane (2009: 188) transcribed this word as <ś> with an inherent vowel <i> in his reading of 028.

If 028 was an independent word, 205-339 might be the ablative suffix <de.i>. Is <de.i> a combination of dative-locative <de> with genitive <i>? That combination is reminiscent of Benzing's (1955: 83) Proto-Tungusic ablative *-du-ki from dative *-du plus a particle *-ki. Is this similarity coincidental or the product of the influence of Khitan or a related language on Tungusic? One must be careful about jumping to conclusions about short grammatical morphemes, as unrelated lookalikes abound: e.g., the Japanese locative de (< ni-te, gerund of ni [Vovin 2003: 46]) and the Classical Tibetan genitive -Hi < *ki.

028-205 may be a noun with a genitive suffix 339.

Without seeing the original texts, I cannot determine whether the next two (groups of?) words are sequences of two blocks or single blocks with five elements.


028-205-131-153 134 <ś.de.ú.j(i) TWO> (according to Andrew West) or

028-205 131-153-254 <ś.de ú.j(i)d> (according to Kane 2009: 188) or

028-205-131-153-254 <ś.de.ú.j(i)d> (according to Qidan xiaozi yanjiu)

(大金皇弟都統經略郎君行記 3-6)

028-205 might be the same word as 028-205 above.

254 may be a plural suffix. 131-152 (= 131-153) occurs by itself in 興宗 14-28 and with the genitive suffix 140 in 仁懿皇后. So 028-205 (-)131-153-254 could be a plural compound noun. How often are compound nouns written in single blocks?

If the final character is 134 <TWO> rather than 254, then 134 <TWO> may modify the following word


which Kane speculated might mean 'appearance'. As of 2009, Aisin Gioro speculated it could have been read jisu (cf. Mongolian jisü 'appearance'). I do not know Aisin Gioro's current reading for this character.

227 lacks a plural suffix which I would expect after a numeral. Did 227 have a zero plural, or was it an inherent plural (cf. English people)? Unfortunately I do not know of any other instances in which 227 is preceded by a numeral.

The absence of a masculine dot on 134 <TWO> may imply that 227 was feminine (or neuter?).


208-205 020-332-339 < ei.nai.i>  (according to Qidan xiaozi yanjiu) or

208-205-340-332-339 <> (according to Andrew West) (興宗 3-26)

I assume <> is not the dative-locative of <lu> 'dragon' which should be

205-179 <lu.dú>
with the suffix variant for <u>-stems.

020-332-339 <ei.nai.i>

seems to mix <e> with <a> which is unlikely according to my admittedly small understanding of Khitan vowel harmony. Perhaps 020 was <y> [j] here, and <y.nai.i> was read as something like [janaji].

If 208-205-340-332-339 <> is a genitive of a compound, I would expect to find 340-332 <x.nai> elsewhere. However, the closest match I could find was

340-332-021-144 <x.nai.mó.ún>

in Andrew West's database. That corresponds to

340-289-184-144 <x.iú/da?.am.ún>

in Qidan xiaozi yanjiu. Both have unexpected mixes of <a> with <ú>. But it's likely I misunderstand how Khitan vowel harmony works.

340 <x> probably had an inherent vowel. Kane (2009: 30) transcribed 340 as <xe>. It must have had a vowel when it appeared as a word in isolation as in 慶陵壁畫題字 X-2. If that vowel was <e>, I wouldn't expect it to be followed by <a> - but 289 <da?>, 332 <nai>, and 184 <am> all contain <a>! Was 340 <xe> combined with an <a>-class word?

*An obvious exception is Middle Korean which was full of clusters like st- and even pst-. However, these clusters were short-lived products of syncope. Old Korean and other early Koreanic languages may have lacked initial clusters except in Chinese loanwords.

KHITAN LARGE SCRIPT WORD DE-VISION

The Khitan small script has two big advantages over the Khitan large script for modern scholars:

- a smaller number of characters: i.e., fewer variables - perhaps 400 small script characters excluding variants as opposed to perhaps 1,000 large script characters excluding variants

- clustering of characters in blocks corresponding to words (or at least morphemes) as opposed to no obvious word or morphemic division in the large script

The key word in the second point is "obvious". There are clues to word division in the large script if both scripts represent the same language.

Khitan is an 'Altaic'-type language (but see here!) with suffixes and vowel harmony. For example, there are four dative-locative suffixes in the small script:

Khitan small script Transliteration Generally after stems with
<iú> (<da>?*) <a>
<de> <e> or <i>
<do> <o>
<dú> <u>

As we'll see below, there are exceptions to these patterns.
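The general pattern in the table can be written as a lookup. This is a sketch under assumptions that the post itself leaves open: it treats the first vowel of the transliterated stem as the one that governs harmony, it ignores diacritics on vowels (so ú counts as u), and the names `SUFFIX_BY_VOWEL`, `first_vowel`, and `expected_suffix` are hypothetical.

```python
import unicodedata

# Stem vowel -> transliteration of the dative-locative suffix in the table.
SUFFIX_BY_VOWEL = {"a": "iú", "e": "de", "i": "de", "o": "do", "u": "dú"}


def first_vowel(stem):
    """Return the first plain vowel letter in a transliterated stem, or None."""
    # Decompose accented letters so that ú is seen as u plus a combining mark.
    for char in unicodedata.normalize("NFD", stem):
        if char in SUFFIX_BY_VOWEL:
            return char
    return None


def expected_suffix(stem):
    """Return the suffix the general pattern predicts, or None if no vowel."""
    vowel = first_vowel(stem)
    if vowel is None:
        return None
    return SUFFIX_BY_VOWEL[vowel]


print(expected_suffix("lu"))
print(expected_suffix("jau tau"))
```

The prediction for <jau tau> is the <a>-class suffix, which is not what is attested; a lookup like this only states the default against which such exceptions stand out.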

Since large script characters often represent single syllables, I would expect each of these small script characters to have a large script counterpart.

So far the only large script dative-locative suffix I know of is

<de> [tə]?

which may be graphically related to Old Chinese 時 *də. If the reading of a Khitan large script character resembles a pre-Liao Dynasty reading of a similar Chinese character, that may either be a coincidence or evidence for Janhunen's Parhae hypothesis (in which the Khitan large script was based on a Parhae script that was a sister of mainstream sinography rather than an early 10th century invention).

In theory, if we see <de> in a Khitan large script text, it may be

- a dative-locative suffix for a noun with <e> or <i>

- a phonogram for <de>

  - at the end of a word

  - in the middle of a word

  - at the beginning of a word

  - which is a monosyllabic word

If we see a string <X de> and see <X> with other known case suffixes, then <X> is probably a noun with <e> or <i>.

Or is it? <jau tau> 'bandit suppression commissioner' has neither <e> nor <i> but is followed by <de> in the large script:

<jau tau de> (Yongning 8; Kane 2009: 174)

Its Chinese source 招討 *tʃiaw tʰaw does have an *i. Was the Khitan word [tɕiaw tʰaw] with an [i] reflected in vowel harmony but not in its spellings with <jau>? I have not found any large or small script examples of <tau> 'five' plus <de>. That may indicate <iau ... au ... e> was possible but not <au ...  e>.

Another case in which the first vowel of a word may determine vowel harmony is

<> 'on the tomb' (Kane 2009: 137)

This begs the question of how a disharmonic word


came to be. <ne.ra> is probably not a compound of <ne> and <ra> since it is unlikely that any Khitan word could begin with <r>.

Perhaps sporadic cases of <de> after non-<e>/<i> vowels indicate that the case markers were beginning to collapse into a single [tə] usable after any vowel. Such a suffix would have been like the Manchu dative-locative suffix -de [tə] which is a merger of Jin Jurchen

-do and -dö

(as reconstructed by Kiyose 1977: 42; Jin 1984 reconstructed -do and -du).

Is the Proto-Tungusic dative *-du (as reconstructed by Georg 2004) a borrowing from Khitan or a related language? Are its locative uses in Jurchen/Manchu due to the influence of Khitan which lacked a dative/locative distinction?

*<iú> is clearly something like <iú> in Khitan small script transcriptions of Chinese syllables with *-y, but appears where <da> would be expected after nouns. Did it have two unrelated readings?

DERUSU UZĀRA

Much of my career has involved the reconstruction of languages through scripts for other languages: e.g., Old Japanese and Old Korean through Chinese characters, Tangut through Chinese characters and the Tibetan script, etc. To understand how transcription worked in the past, it is useful to study transcriptions in the present.

Today I was surprised by how Russian Дерсу Узала Dersu Uzala [dʲɪrˈsu ʊzɐˈla] corresponded to Japanese デルス・ウザーラ Derusu Uzāra with an unexpected long vowel. Russian does not have phonemic vowel length, and these languages which do have phonemic vowel length only have short vowels in their versions of the name:

Czech Děrsu Uzala (not *Uzála; the háček indicates palatalization of the preceding D)

Finnish Dersu Uzala (not *Uzaala)

Slovak Dersu Uzala (not *Uzála)

What was Dersu Uzala's original Nanai name? Nanai has long vowels, so I thought it might be Dərsu Uzāla, but then I discovered that the name is Дэрсүү Узаала Dersüü Uzaala as well as Дэрсү Узала Dersü Uzala in Mongolian. Was the Nanai name Dərsü̅ Uzāla with two long vowels?

Is the Japanese name based on Nanai, and if so, why was only one long vowel retained? Could Derusu Uzara be based on Tuvan actor Maxim Munzuk's pronunciation of his character's name?

I assume the Czech, Finnish, and Slovak names are transliterations of Russian without any reference to Nanai.

The first long vowel in Mongolian Дэрсүү Узаала corresponds to a stressed Russian vowel, but the second doesn't. What is the reasoning behind final stress for both halves of the name in Russian?

Closer to home, I don't understand how stress is assigned to Hawaiian and Japanese names in English in Hawaii: e.g.,

[kʰʊˈhiːow] for Kūhiō [ˌkuːhiˈoː]

[tʰəˈnakə] for Tanaka

Stress in Hawaiian does not necessarily match stress in Anglicized Hawaiian names, and Japanese has no stress.

CELTIC HEROES AT DAWN

I was recently asked about the etymology of the English word Easter and whether it had a Celtic connection. The Proto-Indo-European root of Easter is *ʕews 'shine', so I would expect Proto-Celtic *aus according to the correspondences here. However, the University of Wales has a file with a very different reconstruction:

Proto-Indo-European *ʕe w s -
Hypothetical Proto-Celtic *a u s
Actual? Proto-Celtic *wā s ri-

The initial consonants of Middle Irish fair 'sunrise' and Welsh gwawr 'dawn' point to Proto-Celtic *w-. Did *au irregularly become *wā?

Simply reversing *a and *u would not account for the long vowel. Does that vowel reflect a Proto-Indo-European lengthened grade ēws?

Is *-ri- a suffix?

Not far from *wāsri- 'dawn' in that list of Proto-Celtic reconstructions is *wāro- 'hero'. The latter superficially (and presumably only coincidentally) resembles Proto-Indo-European *wiHro- 'man' whose true Proto-Celtic reflex is *wiro- (> Irish fear and Welsh gŵr). What attested Celtic forms underlie the reconstruction of *wāro-?

DR. ZHIVOGO (SIC)

The Russian name Zhivago = Živago [ʐɨˈvagə] was borrowed from the Old Church Slavonic definite adjective živago '(of) the living' (masculine animate accusative-genitive singular), a contraction of živ-a 'living' plus -jego 'that which is known'. The native Russian equivalent of this adjective is živogo [ʐɨˈvovə] with a different second vowel.

Two weeks ago I wrote about the unexpected final -a of masculine animate accusative-genitive singular adjectives in Serbo-Croatian and Slovene. I was also puzzled by the penultimate vowel o in Serbo-Croatian which is like that of East Slavic:

Branch | Language | '(the) living' | Type | Cf. 'him/his' | Cf. 'whom/whose'
(n/a) | Proto-Slavic | *živ-a-jego | aje | *jego | *kogo
South | Old Church Slavonic | živajego, živaago, živago | aje/aa/a | jego | kogo
East | Russian | živógo | o | jegó [jɪˈvo] | kogó [kɐˈvo]
East | Belarusian | živóha | o | jahó | kahó
East | Ukrainian | žyvóho | o | johó | kohó
South | Serbo-Croatian | živog(a) | o | njega | kog(a)
South | Slovene | živega | e | njega | koga
West | Polish | żywego | e | jego | kogo
West | Czech and Slovak | živého | e | jeho | koho

Most languages above have transparent compressions of *aje. Old Church Slavonic favored the first vowel (aje > aa > a), whereas Slovene and West Slavic favored the second (*aje > long é in Czech and Slovak, e in Slovene and Polish). However, East Slavic and Serbo-Croatian o does not look like a compression of *aje.

One could try to explain that o as having assimilated to the following *o which then became a in Belarusian and Serbo-Croatian (and even optionally lost in the latter).

However, my guess is that the o was by analogy with the hard pronominal declension: e.g., *kogo 'whom/whose'. Ukrainian has taken this analogy further than the others so even soft-stem adjectives have o, whereas the others have e or a ja which may be from *e:

Ukrainian -'oho (with palatalization of the stem-final consonant before o)

Belarusian -jaho < *-ego (or *-ogo before a palatalized stem-final consonant?)

Russian -ego

Serbo-Croatian -eg(a)

Moving on from morphology to semantics, is the surname Živago really a frozen genitive?

The same Old Church Slavonic word is an accusative in Luke 24:5: 'Why do you seek the Living One (živago) among the dead?' (See the OCS text with a Russian translation here.)

That line in turn reminded me of Dutch surnames that are frozen accusatives: e.g., Den Beste.

I would like to see studies of case frequency (e.g., how frequent are accusatives relative to nominatives?) and case freezing (e.g., how often do frozen forms originate from old accusatives as in Romance?). I don't expect a simple account of the latter since multiple factors can influence speakers' choices of forms to freeze. For instance, suppose two languages lost their case system. Language A originally had a single stem for all cases and a zero ending for the nominative. On the other hand,  language B originally had one stem for the nominative and an oblique stem for all other cases.

Case Language A Language B
Nominative ka cəns
Accusative ka-ti cəs-e
Genitive ka-pu cəs-o

I would expect language A to have nouns based on the nominative (e.g., ka) and language B to have nouns based on the oblique (e.g., cəs instead of *cəns).

ROMANIZZAZIONE DELLA LINGUA RUSSA

This morning I saw this Italian cover for Doctor Zhivago. The style of romanization caught my eye because of its use of accents and háčeks (in bold):

Borís Leonídovič Pasternàk


The accents correspond to Russian stress. I expected all stressed vowels to bear grave accents. Was an acute accent chosen for í because it is high and more like mid-high é [e] and ó [o] than mid-low è [ɛ], ò [ɔ], and low à [a]? Why doesn't "Živago" bear an accent? Because the accent is penultimate and predictable? The Italian Wikipedia has no accent in the title Il dottor Živago but has an accent on the character's name Jùrij Andrèevič Živàgo. (Stressed ù [u] has a grave accent even though it is high like í [i].)

Do most Italians understand the function of the háček? According to Wikipedia, Zivago and Boris Leonidovich Pasternak without háčeks are the usual Italianizations of those names. Similarly, I see Chernobyl at Corriere della sera, though the Italian Wikipedia has Černobyl' with a háček and an apostrophe for the soft sign. Should encyclopedias favor scientific transcriptions over lay transcriptions? Which is a user more likely to look up, Chernobyl or Černobyl'? Does it matter if one redirects to the other?

I'm surprised there is no article on Russian romanization or transliteration in the Italian Wikipedia.

THE CHARACTERS OF MINISTERS: MODIFIED-MODIFIER ORDER IN KHITAN

Let me take a brief diversion into Khitan syntax. (I say that as if any of you could stop me!)

While looking through Kane (2009) for the umpteenth time, I noticed something odd that should have caught my eye years ago. Khitan had three equivalents of Chinese 四字功臣 'four character meritorious official' (i.e., an official whose title is written with four Chinese characters):

1. <FOUR us.g.d g.ung> 'four characters meritorious official'

2. <g.ung FOUR us.g.d > 'meritorious official four characters'

3. <g.ung us.g.d FOUR> 'meritorious official characters four'

1 follows the modifier-modified order that is the norm in 'Altaic' languages and even in non-Altaic Chinese. 2 and 3, however, have un-'Altaic' order. 4-6 are structurally similar to 1-3:

4. <ONE us.g.en g.ung> 'one character-GEN meritorious official'

5. <g.ung EIGHT us.g.d> 'meritorious official eight characters'

6. <g.ung us.g.d SIX> 'meritorious official characters six'

<g.ung cin> is a Chinese loan; it is bimorphemic in Chinese but was probably monomorphemic in Khitan, so I do not ever expect to see

*<cin g.ung>

'official meritorious'. (Similar Chinese loans retain Chinese morpheme order in Vietnamese which has un-'Altaic' modified-modifier order.)

At first I thought 2 and 5 had mixed order -

modifier + modified: numeral + 'character'

modified + modifier: 'meritorious official' + (numeral + 'character')

- but then I realized 'four characters' in 2 and 'eight characters' in 5 could be analyzed as single syntactic units following nouns rather than as numeral-noun sequences.

How can the un-'Altaic' order in 2, 3, 5, and 6 be explained? Are there other Khitan phrases with modified-modifier order?

KHITAN SMALL SCRIPT CHARACTERS IN AISIN GIORO (2012) BUT NOT KANE (2009)

I have begun to compile a database of Khitan small script characters to facilitate my study of Aisin Gioro Ulhicun's Khitan reconstruction. So far it includes 473 characters in two numbering systems (Qidan xiaozi yanjiu/Kane's and Aisin Gioro's):

378 from Qidan xiaozi yanjiu

2 added to those 378 in Kane (2009)

90 in Andrew West's font that are not in Qidan xiaozi yanjiu or Kane (2009)

3 that are in Aisin Gioro (2012) but not in any of the above sources or N3820. (4.12.0:30: I can't see the Khitan characters in N3918R, but I assume they are the same as those in N3820, as the total number of characters has not changed.)

The last three are her numbers 109, 234, and 293:

Unfortunately her reconstructions of their readings are in her 2011 book 契丹語諸形態の研究 which I haven't seen.

AISIN GIORO'S RECONSTRUCTIONS OF KHITAN VOWELS

After four posts in a row about consonants, it's time to look at vowels for a change.

Aisin Gioro Ulhicun has not yet publicly released a full description of her reconstruction of Khitan phonology, but I can attempt to reverse-engineer it from the fragments in this 2012 article which has her latest reconstructions of many Khitan small script characters in the fourth column. (Some reconstructions are in Aisin Gioro's 2011 book 契丹語諸形態の研究 which I haven't seen.) Those reconstructions have eleven vowels:

i  ï u
ö ə o
æ a ã ɑ

Other vowels (e.g., ü, ɪ, e) may be in the 2011 reconstructions I have not yet seen.

The core vowels appear to be i, ə, a, u, and o. The others appear only in restricted environments to the best of my limited knowledge:

- æ is only in closed syllables (cf. English [æ] which cannot appear in word-final position)

- ã is only in 45 <qa> ~ <qã> 'khan' (= 051 <ha> in Kane 2009; see Andrew West's list for the glyphs corresponding to each of Kane's numbers which are all from Qidan xiaozi yanjiu except for the last two)

Do any known northeast Asian languages have nasal vowels?

If we didn't know about the word 'khan' in other languages, would it be possible to reconstruct a nasal vowel?

I suppose it is possible that 'khan' is the only word in Khitan with a nasal vowel because it was borrowed from a language in which *-an became *-ã, but I wouldn't bet on that.

- ɑ is only in the closed syllable 160 <tʃɑl> (= 183 <car> in Kane 2009)

Is <ɑ> a typo for <a>? If not, why not reconstruct <tʃal>? What is the evidence for a back allophone of */a/ before a final (velarized?) */l/?

- ö is only in 324 <u> ~ <ö> (= 372 <û> in Kane 2009); it transcribes Chinese *u and in native words corresponds to Mongol ö and perhaps u (see Kane 2009: 80, 99, and 105)

- ʊ is only in 313 <ʊŋ> and 320 <tʃʊŋ> (= 357 <úŋ> and 367 <źuŋ> in Kane 2009) for Chinese loanwords

I don't see why 313 (Kane 357) can't be reconstructed as <uŋ>. I don't know what the difference was - if any - between its rhyme and the rhyme of Kane's 106/345 <uŋ>; all three characters transcribed Chinese *-uŋ. Kane (2009: 77) regarded 346 as a variant of 345, though he gave no examples of their interchangeability. Kane 181 <iúŋ> for 龍 *ljuŋ or *lyŋ might have been <üŋ> (= Aisin Gioro's 158 <juŋ>).

320 (Kane 367) is probably <ywiŋ> since it transcribed Chinese 榮 *jwiŋ (which still rhymed with *-iŋ words during the Liao Dynasty; see Kane 2009: 249) and is clearly derived from 榮:

*jwiŋ shifted to *juŋ by the Yuan Dynasty and did not develop a *z-like initial until after the Yuan Dynasty, long after the fall of the Khitan. It never had an affricate initial in Chinese, so I do not know why Aisin Gioro reconstructed 320 with <tʃ->. (This section revised 4.10.23:27.)

- ï is presumably only for Chinese loans, though I wonder if it also existed in native Khitan words (Korean kŏran < *kətan 'Khitan' may imply a Khitan *qïtan, and Janhunen [2003: 5] reconstructed in pre-Proto-Mongolic.)

Only six vowels (the core five plus æ) appear in diphthongs:

Rising diphthongs: iV = /jV/?


I have included ju since it's not clear to me how iV and jV are different in Aisin Gioro's reconstruction.

Rising diphthongs: uV = /wV/?


I could have included ui if it was /wi/, but I suspect it was the high counterpart of oi /oj/ (see below) and the mirror image of ju (see above).

Falling diphthongs: Vi = /Vj/?

  əi oi
æi ai  

Falling diphthongs: Vu = /Vw/?

  əu, (jəu)  

Comparing the four tables above, a pattern emerges:

V in iV/Vi is never nonlow and front

V in uV/Vu is always nonhigh and central

Offhand I wonder if Aisin Gioro's core vowel system is compatible with a height harmony system:

high i ə u
low æ = /e/? a o

This is like the height harmony system of Middle Korean (and my reconstructions of Old Chinese and pre-Tangut). However, the limited distribution of æ makes me wonder if it was just an allophone of /a/ or /ə/.
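The height harmony idea above can be made concrete with a small check. This is only a sketch under my own assumptions: it takes the two rows of the table as the harmony classes (with æ counted as low), ignores every other symbol, and uses made-up test strings rather than attested Khitan words.

```python
# Height classes taken from the two-row table above; all other symbols are ignored.
HIGH = set("iəu")
LOW = set("æao")


def harmony_class(word):
    """Return 'high', 'low', 'mixed', or None (no core vowels) for a string."""
    classes = set()
    for ch in word:
        if ch in HIGH:
            classes.add("high")
        elif ch in LOW:
            classes.add("low")
    if not classes:
        return None
    return classes.pop() if len(classes) == 1 else "mixed"


# Made-up test strings, NOT attested Khitan words.
for s in ("tiru", "tala", "tali"):
    print(s, harmony_class(s))
```

If the system really were harmonic, native words run through a check like this should rarely come out as "mixed"; I have not tested that against actual reconstructions.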

I don't know where ö would fit. The conflicting clues for its pronunciation - back [u] or front [ø]? - reminds me of Manchu ū [ʊ] which was written like Mongolian ü.

I suppose the Chinese loan vowels ï and ʊ would fall into the high and low series.

I doubt ã and ɑ (as a phoneme distinct from /æ/ and /a/ as opposed to an allophone of /a/) ever existed.

*C(.)R-USTERS IN BLACK TAI AND BAO YEN

I concluded "S-implification in Black Tai and Bao Yen" by writing,

Without looking at the development of other Proto-Tai *C(.)r-clusters in the two languages, I cannot be confident about these reconstructions.

I already gathered all the reflexes of Proto-Tai *C(.)r-clusters in Black Tai in that post. I list them again below in a more convenient tabular format along with the corresponding reflexes in Bao Yen from Pittayaporn (2009). Unlike Pittayaporn, I distinguish between *qr- and *q.r-, and I reconstruct *c.r- instead of *cr-.  I have added reflexes of *ʰr- and *r- for comparison.

Proto-Tai *pr- *p.r- *br- *tr- *r.t- *ʰr- *c.r- *kr- *qr- *k.r- *q.r- *gr- *voiced C.r- *r-
Black Tai pʰ-/f- t- p- h- tʰ- h- s- c- h- h-
Bao Yen pʰj- pʰ- pj- kʰ- r-, (l-) r-

See "S-implification in Black Tai and Bao Yen" for more on the development of *K(.)r-clusters. Details on other types of *r-clusters follow.

Notes on Black Tai:

1. Pittayaporn (2009) has /pʰ/ corresponding to /f/ in Gedney's data in Hudak (2008). The former is more conservative:

*pr- > *pɣ- > *px- > /pʰ-/ > /f-/

2. *p.r- may have become *pr- and then *tr- after original *pr- and *tr- had been lost. This new *tr- then simplified to /t-/.

3. *br- lost all trace of its medial:

*br- > *bɣ- > *bɰ- > *bj- > *b- > /p-/

Compare with *gr- whose medial became *-j- and palatalized the preceding velar:

*gr- > *gɣ- > *gɰ- > *gj- > *kj- > /c-/

(The relative chronology of changes I do not discuss in detail is not intended to be exact: e.g., devoicing might have preceded palatalization.)

4. *r.t- merged with *tr- and perhaps became /h-/ via a dental fricative stage:

*r.t- > *tr- > *tɣ- > *tx- > *θ- > /h-/

Then again, if *kr- became a *kx- that simplified to *x-, then perhaps *tx- also simplified to *x-.

5. *voiced C.r- merged with *r-. *ʰr- and *r- may have become *x- and *ɣ- after original *x- and *ɣ- had become *kʰ- and *g- (now /kʰ/ and /k/). These new velar fricatives then backed and merged as /h/.

6. *c.r- (Pittayaporn's *cr-) may have become a third kind of *tr- dating between the other two (original *tr- and *tr- from *p.r-):

Proto-Tai *r.t-/*tr-merger *-r- > *-x- sesquisyllabic compression; *Cx- > *x- cluster assimilation Black Tai
*p.r- *p.r- *p.r- *pr- *tr- /t/
*tr- *tr- *tx- *x- /h/
*c.r- *c.r- *c.r- *tr- *tθ- /tʰ/

Cluster assimilation required one part of a cluster to become more like the other:

*pr- > *tr- (labial to dental)

*tr- > *tθ- (voiced sonorant to voiceless obstruent)

That is my attempt to find commonality between two otherwise seemingly very different paths of change.
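The derivation chains in notes 1, 3, and 4 above can be written down as data and compared mechanically. The sketch below is my own illustration, not part of any reconstruction: the chains are copied from the notes with asterisks and hyphens dropped, and "medial" is crudely defined as everything after the first symbol of a stage.

```python
# Chains copied from Black Tai notes 1, 3, and 4 (asterisks and hyphens dropped).
CHAINS = {
    "pr": ["pr", "pɣ", "px", "pʰ", "f"],
    "br": ["br", "bɣ", "bɰ", "bj", "b", "p"],
    "gr": ["gr", "gɣ", "gɰ", "gj", "kj", "c"],
    "r.t": ["r.t", "tr", "tɣ", "tx", "θ", "h"],
}


def steps(chain):
    """Pair each stage with the stage that follows it."""
    return list(zip(chain, chain[1:]))


def shared_medial_stages(a, b):
    """Count leading stages whose medial (all but the first symbol) is identical."""
    count = 0
    for x, y in zip(a, b):
        if x[1:] != y[1:]:
            break
        count += 1
    return count


for onset, chain in CHAINS.items():
    print(onset, " > ".join(chain))
print(shared_medial_stages(CHAINS["br"], CHAINS["gr"]))
```

On this crude measure the *br- and *gr- chains agree through the *-j- stage and only then part ways, which is the contrast note 3 draws.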

Notes on Bao Yen:

1. *-r- became /(ʰ)j/ after labials as well as *g-. I don't understand why this didn't happen after *k- and *q-:

*pr- > /pʰj/

*br- > /pj/

*gr- > *kj- > /c/

but *kr-, *qr- > /kʰ/ (not */kʰj/)

Maybe there was a constraint against coronals + *-j-.

2. Pittayaporn's Proto-Tai *p.r- has two kinds of reflexes in Bao Yen:

/pʰj/ (like *pr-: e.g., 'shuttle of loom')

/pʰ/ (unlike *pr-: e.g., 'cucumber')

My guess is that some *p.r-words (e.g., 'shuttle of loom') compressed into monosyllables before others (e.g., 'cucumber') in pre-Bao Yen.

Here is how original *pr- and secondary *pr- might have developed:

*pr- > *pɣ- > *pʰɣ- > *pʰɰ- > /pʰj-/

*p.r- > *pr- > *pɣ- > *px- > /pʰ-/

3. Proto-Tai *ʰrwɯ:j A became Bao Yen /wi: A1/ rather than */hwi: A1/, presumably because Bao Yen does not have initial /hw/. I don't know that for a fact; I only know that /hw/ is not in any Bao Yen word in Pittayaporn's data.

Note that Proto-Tai *hw- sans *-r- became Bao Yen /pʰ/.

4. I didn't reconstruct *kx- (from *kr- and *qr-) simplifying to *x- in Bao Yen, so I won't reconstruct *tx- (from *tr- and *r.t-) simplifying to *x-. Instead, I'll have *tx- fuse into *θ- and back to /h-/ (cf. Black Tai note 4 above).

5. Unlike Black Tai, Bao Yen did not merge *ʰr- and *(voiced C.)r-. Only the former became /h/; the latter (generally?) remained *r- (Pittayaporn found a single case of /l/ < *voiced C.r- - perhaps a loanword?).

6. I'm not happy with how I bridged *c.r- and /tʰ/ in Black Tai (note 6 above), but I can't think of any better solution, and for now I recycle it for Bao Yen.

THE ORI--IN OF MOHAWK'S ONLY AFFRICATE

I rediscovered Mohawk when looking for a language lacking m and found it mentioned in Wikipedia's article on bilabial nasals. I last wrote about Mohawk five years ago after seeing it on a stop sign. Back then I didn't mention Mohawk's only affricate /dʒ/ which apparently is quite different from other obstruents judging from Wikipedia's description of Mohawk phonology:

- it is always voiced: [dz] ~ [dʒ] (depending on dialect)

- it patterns in clusters like a sonorant rather than an obstruent

One other unusual characteristic of /dʒ/ is its ability to combine with /j/ in both initial and medial position. Is /dʒj/ pronounced [dʒj] which would be very difficult to distinguish from [dʒ] without [j]? Or is /dʒj/ pronounced as palatal [dʑ] or [ɟ]?

I wonder if /dʒ/ was originally a voiced sonorant like *r which does not exist in modern Mohawk (though Proto-Iroquoian had *ɹ). I am not confident about that solution for three reasons:

1. I don't know what /dʒ/ corresponds to in other Iroquoian languages. Is it from Proto-Iroquoian *ts?

2. I don't know of any other language in which *r(j) hardened to /dʒ(j)/: *r > > *ʒ > /dʒ/ or *r > > *dʐ > /dʒ/.

3. If *r hardened to /dʒ/, I would expect other instances of fortition. Do such instances exist?

Like Mohawk, Proto-Iroquoian lacked labials other than *w. Did pre-Proto-Iroquoian undergo lenition: e.g., did *p and *m weaken to *w?

If Proto-Iroquoian had no *m, where did Cherokee /m/ come from? Loanwords?

S-IMPLIFICATION IN BLACK TAI AND BAO YEN

I've read the introduction to Ostapirat (2000) many times, but recently this passage on p. 19 jumped out at me, and not just because of the un-PC use of "inferior":

"Kra", the autonym which originally means 'human being' [...] Cf. the related form in Black Tai /saa C1/, which has been borrowed as Vietnamese /xá/ to designate various inferior ethnic groups in Vietnam

How did Kra come to have initial [s] in Black Tai and Vietnamese? (Vietnamese x is [s].)

Black Tai is spoken in northwestern Vietnam, and I initially thought that perhaps it had undergone the shift

*Cr- > *ʂ- > s-

which had also occurred in northern Vietnamese. (Southern Vietnamese still has [ʂ].) The resulting /saa C1/ was then borrowed as xá.

However, that was not the case:


*pr- > Black Tai /pʰ-/ (Pittayaporn 2009: 140) or /f-/ (Gedney 0287, 0300) but northern Vietnamese s [s]

*br- > Black Tai /p-/ (e.g., Gedney 0647)

*tr-, *kr- (and my *qr-; see below) > Black Tai /h-/ (Pittayaporn 2009: 141, 143; e.g., Gedney 0081 and 0082) but northern Vietnamese s [s]

*cr- > Black Tai /tʰ-/ (Pittayaporn 2009: 142; e.g., Gedney 0706) but northern Vietnamese s [s]?

*gr- > Black Tai /c-/ (e.g., Gedney 0160)

*qr- (= my *q.r-; see below) > Black Tai /s-/ (Pittayaporn 2009: 144; e.g., Gedney 0124)

*C.r-sequences (i.e., sesquisyllables beginning with *CVr-)

*p.r- > Black Tai /t-/ (e.g., Gedney 0345)

*k.r- > Black Tai /s-/ (e.g., Gedney 0120)

*Unknown voiced consonant + r- > Black Tai /h-/ (e.g., Gedney 0310)

Gedney numbers refer to cognates in Hudak 2008.

Pittayaporn's Proto-Tai has no *dr-, *ɟr-, or *ɢr-; these may be accidental gaps.

Pittayaporn (2009: 337) reconstructed Proto-Tai *kraː C 'slave' which should have become Black Tai */haa C1/ but is /saa C1/ as if it were from *qraː C or *k.raː C. My guess is that pre-Black Tai retained a sesquisyllabic *k.raː C that collapsed into a monosyllabic *kraː C in other early Tai varieties.

Bao Yen  (Pittayaporn 2009), another Tai language in Vietnam, also seems to have retained *k.raː C. Compare its reflexes of *K(.)r- with those of Black Tai (Gedney in Hudak 2008). Exclamation marks indicate forms that would be irregular if they did not come from sesquisyllables.

Gloss Proto-Tai (Pittayaporn 2009) Bao Yen Black Tai
slave *kraː C (my pre-BY and pre-BT *k.raː C) /saa C1/ (!) /saa C1/ (!)
spider *krwaːw A (my pre-BY and pre-BT *k.rwaːw A) /saaw A1/ (!) /saaw A1/ (!)
to imprison *k.raŋ A /saŋ A1/ /saŋ A1/
six *krok D (my pre-BY *k.rok D) /sok DS1/ (!) /hok DS1/
to seek *kraː A (my pre-BY *k.raː A) /saa A1/ (!) /haa A1/
to sift *qrɤŋ A (my pre-BY *q.rɤŋ A) /sɤŋ A1/ (!) /hɤŋ A1/
egg *qraj A (my pre-BT *q.raj A) /kʰaj B1/ /saj B1/ (!)
mountain stream *qrwɤj C /kʰuəj C1/ /huaj C1/
to laugh *krɯəw A /kʰuə A1/ /hua A1/
fish net *kreː A /kʰɛː A1/ /hɛ A1/
mortar *grok D /cok DS2/ /cok DS2/

Unlike Pittayaporn, I distinguish between *q.r- and *qr-; his *qr- corresponds to my *q.r- whose reflexes are like those of *k.r-, and my new *qr- has reflexes like those of *kr-.

Like 'slave', 'spider' was sesquisyllabic in the ancestors of Bao Yen and Black Tai.

Pre-Bao Yen apparently retained sesquisyllabic forms of 'six', 'seek', and 'sift' that became monosyllables in pre-Black Tai and other early Tai varieties.

Conversely, pre-Black Tai retained a sesquisyllabic form of 'egg' that became a monosyllable in pre-Bao Yen and other early Tai varieties.
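The exclamation marks in the table can be reproduced mechanically. The sketch below is only an illustration under my own simplifications: onsets are reduced to the *K(.)r- part (so *krw- counts as *kr-), Pittayaporn's onsets are taken at face value, and *qr- is expected to behave like *kr- as in my revised system. The names `EXPECTED`, `ROWS`, and `irregular` are invented for this example.

```python
# Expected modern initials for monosyllabic clusters vs. sesquisyllables.
EXPECTED = {
    "Bao Yen": {"kr": "kʰ", "qr": "kʰ", "k.r": "s", "gr": "c"},
    "Black Tai": {"kr": "h", "qr": "h", "k.r": "s", "gr": "c"},
}

# (gloss, Proto-Tai onset per Pittayaporn 2009, Bao Yen initial, Black Tai initial)
ROWS = [
    ("slave", "kr", "s", "s"),
    ("spider", "kr", "s", "s"),
    ("to imprison", "k.r", "s", "s"),
    ("six", "kr", "s", "h"),
    ("to seek", "kr", "s", "h"),
    ("to sift", "qr", "s", "h"),
    ("egg", "qr", "kʰ", "s"),
    ("mountain stream", "qr", "kʰ", "h"),
    ("to laugh", "kr", "kʰ", "h"),
    ("fish net", "kr", "kʰ", "h"),
    ("mortar", "gr", "c", "c"),
]


def irregular(language):
    """List glosses whose attested initial differs from the expected reflex."""
    column = {"Bao Yen": 2, "Black Tai": 3}[language]
    return [row[0] for row in ROWS if row[column] != EXPECTED[language][row[1]]]


for lang in ("Bao Yen", "Black Tai"):
    print(lang, irregular(lang))
```

The glosses it flags are the ones I marked with exclamation marks, i.e., the ones I would derive from retained sesquisyllables.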

Here is how *K(.)r- might have simplified:

Black Tai

Proto-Tai Stage 1 Stage 2 Stage 3 Stage 4 Stage 5 Today
*kr-, *qr- *kr- *kɣ- *kx- *x- /h/
*k.r-, *q.r- *k.r- *kr- *kʂ- *ʂ- /s/
*gr- *gr- *gɣ- *gɰ- *gj- *kj- /c/

I cannot reconstruct a *kʰ-stage between *kr-, *qr- and /h-/ because Black Tai has a /kʰ-/ distinct from /h-/.

Bao Yen

Proto-Tai Stage 1 Stage 2 Stage 3 Stage 4 Stage 5 Today
*kr-, *qr- *kr- *kɣ- *kx- *kʰ- /kʰ/
*k.r-, *q.r- *k.r- *kr- *kʂ- *ʂ- /s/
*gr- *gr- *gɣ- *gɰ- *gj- *kj- /c/

The relative chronology is only approximate.

Without looking at the development of other Proto-Tai *C(.)r-clusters in the two languages, I cannot be confident about these reconstructions.

MEETING AT NIGHT: GEMINATION IN UKRAINIAN AND BELARUSIAN DECLENSION

While looking for a free online Ukrainian grammar, I found Martin Dietze's "Ukrainian Grammar Short Reference" at his (what a domain name!) with the paradigm of зустріч 'meeting':

Case/number Ukrainian Belarusian Russian
Nominative/accusative singular зустріч ніч ноч ночь
Vocative singular* зустріче ноче -
Genitive/dative/locative singular; nominative/vocative/accusative plural зустрічі ночі ночы ночи
Instrumental singular зустріччю ніччю ноччу ночью
Genitive plural зустрічей ночей начэй ночей
Dative plural зустрічам ночам начам ночам
Instrumental plural зустрічами ночами начамі ночами
Locative plural зустрічах ночах начах ночах

I've included the paradigm of ніч 'night' and the paradigm of its Belarusian and Russian cognates for comparison. (The Belarusian and Russian cognates of зустріч are сустрэча and встреча which belong to a different declension.)

Why do Ukrainian and Belarusian have stem-final gemination (in bold) in the instrumental singular? My guess is that the consonant lengthened to compensate for the loss of the vowel which is still preserved in Russian orthography:

*nočьju  > U [nʲitʃʲːu], B [notʂːu], R [notɕu]**

Compare how a vowel rather than a consonant lengthened in a similar environment in Japanese: *tiyu > *tyu > chū.

I am also reminded of Pali geminates from Sanskrit *-(C)Cy-: e.g.,

maccu- 'god of death' < mṛtyu- 'death'

maccha- < matsya- 'fish' (Does the aspiration of geminate cch indicate that Sanskrit s was aspirated [sʰ] like Korean s? See Jacques 2011 for more on aspirated fricatives.)

According to Wikipedia, there was no gemination if there were one or more consonants before *CьjV. Gemination would have resulted in a C1C2ː-cluster which doesn't exist in Ukrainian. See Wikipedia and Mayo (1993: 903) for further constraints on gemination in Ukrainian and Belarusian.
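The gemination condition described above can be sketched as a string rule over transliterated proto-forms. This is a rough illustration under my own assumptions: the vowel inventory in `VOWELS` is a simplification, the function only marks where gemination would apply and leaves blocked forms untouched (it does not model what *ьj becomes there), and "radostьju" is a schematic form in the same notation used only to show the blocking cluster.

```python
import re

# Simplified vowel inventory for transliterated proto-forms (an assumption).
VOWELS = "aeiouyěǫęьъ"

# Optional preceding symbol, a consonant, then ьj before a vowel.
PATTERN = re.compile(rf"(.?)([^{VOWELS}])ьj(?=[{VOWELS}])")


def geminate(form):
    """Turn CьjV into CːV unless another consonant directly precedes C."""
    def repl(match):
        before, cons = match.group(1), match.group(2)
        if before and before not in VOWELS:
            return match.group(0)  # cluster before C: gemination is blocked
        return before + cons + "ː"
    return PATTERN.sub(repl, form)


print(geminate("nočьju"))     # 'night', instrumental singular
print(geminate("lьju"))       # 'I pour'
print(geminate("radostьju"))  # schematic form with -st- before ьj
```

The rule says nothing about palatalization or later vowel changes (e.g., Ukrainian і in ніччю); it only captures the compensatory lengthening step.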

4.6.20:39: The Ukrainian verb лити [lɪtɪ] 'to pour' has similar stem-initial gemination: e.g.,

*lьju > ллю [lʲːu] 'I pour' (cf. Belarusian and Russian лью [lʲju])

However, the stem-initial gemination of Ukrainian ссати [ɑtɪ] and Belarusian ссаць [atsʲ] 'to suck' appears to be from *sъ- with *ъ instead of *ь; cf. Russian сосать [sɐsatʲ].

*Dietze did not list a vocative singular for зустріч, so I supplied one by analogy with ноче.

**Can Ukrainian, Belarusian, and Russian speakers tell each other apart by their pronunciations of ч /č/? I have never heard Belarusian, and have relied on Wikipedia's IPA guides (Ukrainian, Belarusian, and Russian) for the phonetic forms here. To my poor ear, Ukrainian and Russian ч sound similar, and both sound completely different from Mandarin j [tɕ] even though Wikipedia's IPA for Mandarin j is identical to its IPA for Russian ч.

FROM FATHER TO UNCLE

I think Proto-Indo-European *pʕtḗr 'father' could have become something like *poti in Proto-Slavic*, but in fact Proto-Slavic had a different word *otьcь for 'father', and according to Schenker (1993: 113), the Indo-European word for 'father' became Proto-Slavic *strъjь 'paternal uncle' (presumably from *s- + zero grade  *ptr- + -ъjь). I have three questions about *strъjь:

1. Is the initial *s- s-mobile?

2. What is the suffix *-ъjь?

3. What other languages shifted 'father' to 'uncle'? Or 'mother' to 'aunt'?

*Cf. Proto-Slavic *mati 'mother' from *méʕtēr.

ĐER(I)-ATION

I am puzzled by the Glagolitic letter Ⰼ for several reasons (not even including the derivation of its form!):

1. Why does it exist? Wikipedia lists its sound value as /dʑ/, though that phoneme did not exist in Old Church Slavonic.

2. Does it really correspond to modern Serbo-Croatian ћ ć, its voiced counterpart Serbo-Croatian ђ đ, and Macedonian ѓ ǵ, as Wikipedia implies? None of those three sounds existed in Old Church Slavonic.

3. Why does its name (đervь ~ ǵervь 'tree') have initial đ- (see the Croatian Wikipedia) or ǵ- (both sounds that did not exist in Old Church Slavonic!) if no Slavic word for 'tree' has similar initial consonants: e.g., Serbo-Croatian and Macedonian drvo (not SC *đrvo or M *ѓрво *ǵrvo)?

(4.5.0:05: Moreover, why does its name end in -ь if Slavic words for 'tree' end in -o?)

3'. And why does Old Church Slavonic have дрѣво drěvo with ě if Proto-Slavic had *dervo with e?

4. Cubberley (1993: 24) listed ћ (transliterated as ǵ/j, not ć) as the early Cyrillic equivalent of Ⰼ, and questions 1 and 2 apply to ћ as well as Ⰼ.

On the other hand, Wikipedia does not mention ћ in its article on the early Cyrillic alphabet, though its article "Tshe" (ћ) does mention that the later Cyrillic letter ћ ć was based on the earlier Cyrillic letter ћ ǵ/j. Oddly the article on "Dje" (ђ), an obvious derivative of ћ, does not mention either ћ.

THE VÖWELS OF PREKMÜRŠČINA

Two days ago, I asked about Slovene surnames with ü which I thought was un-Slovene. It's actually un-standard Slovene. I forgot that I read in December about how some Slovene dialects have front rounded vowels: e.g., that of Prekmurje, where Albina Nećak Lük is from (and where Danilo Türk's ancestors might be from; his native Maribor is "where significant immigrant communities from Prekmurje have settled"). I was reminded of that fact by the last line of the Lord's Prayer in Prekmurje Slovene on p. 273 of Francis Tapon's The Hidden Europe:

Prekmurje: nego odslobodi nas od hüdoga

Standard: temveč reši nas hudega

'but deliver us from evil'

I assume those front rounded vowels are due to Hungarian and/or German influence since Prekmurje borders Hungary and Austria. What I don't understand is how they developed in native words. In some cases, they seem to reflect nearby palatal segments: e.g.,

P odpüščamo 'we forgive' with ü before š (cf. standard odpuščamo)

P hüdoga < *hudega 'evil'? (assuming the standard form is more conservative)

Presumably ö is always conditioned (by some palatal segment?) since it is nonphonemic according to Wikipedia.

On the other hand, ü is presumably phonemic because it is unpredictable: e.g., it appears in the river name Müra (standard Mura) which contains no palatal segments other than ü.

The unusual vocalism of Prekmurje is not limited to front rounded vowels: e.g., it has au or ou from earlier *o and *ǫ: e.g.,

'God': Baug ~ Boug < *bogъ

'road': paut ~ pout < *pǫtь

I presume this shift postdated the merger of *o and *ǫ. But why did some *o break while others didn't? Is accent a factor?

THE SERBO-CROATIAN CASE -GA-ME

Vowel correspondences between Slavic languages are generally very straightforward, so I'm frustrated by how Serbo-Croatian and Slovene third person pronoun and adjective case endings don't quite line up with their equivalents elsewhere. Unexpected vowels are in red.

masculine/neuter singular Proto-Slavic Serbo-Croatian Slovene Polish Russian
'he', 'it'
accusative/genitive *jego njega njega jego jego
clitic* accusative/genitive (*go) ga ga go n/a
locative *jemь njemu njem nim nem
dative *jemu njemu jemu jemu
clitic dative (*mu) mu mu mu n/a
adjective 'new' accusative (masculine animate), genitive *nova-jego novog(a) novega nowego novogo
short genitive *nova nova n/a
locative *nově-jemь novom(e/u) novem nowym novom
dative *novu-jemu novemu nowemu novomu
short locative *nově novu n/a
short dative *novu

(I have left out masculine inanimate and neuter accusative forms for 'new' since they are regular.)

Today I realized that the unexpected Serbo-Croatian and Slovene accusative/genitive a might be by analogy with the genitive ending -a in masculine and neuter o-stems (e.g., mesto 'place', 'town') and short adjectives (e.g., *nova).

*nova-jego města > SC novog(a) m(j)esta 'of the new place', Sl novega mesta 'of the new town'

Did this analogy occur

- independently in Serbo-Croatian and Slovene?

- in a common ancestor of Serbo-Croatian and Slovene (Proto-Southwestern Slavic as opposed to Macedonian and Bulgarian)?

- in Proto-South Slavic (and if so, are a-forms attested in earlier Bulgarian and Macedonian)?

Could the unexpected Serbo-Croatian dative-locative e also be due to analogy? If so, what would be the model? Proto-Slavic  masculine and neuter o-stems had a locative ending *-e. Were forms like novome first created at a time when Serbo-Croatian masculine and neuter o-stems (e.g., mesto 'place') and short adjectives (e.g., *nově) still had an *-ě-like locative? (Those stems now have -u for both dative and locative.)

*u nově-jemь městě > *u novome m(j)est(j)e? > SC u novom(e) m(j)estu 'in the new place'

*4.3.19:00: Wiktionary does not reconstruct clitic forms for the third person pronoun, but I have done so because nearly identical clitics are attested in all three branches of Slavic. (No standard East Slavic language has them, but ho < *go and mu are in nonstandard Ukrainian [Shevelov 1993: 960].) It's possible that all three branches (or even languages within them) independently dropped the first syllables of the third person pronouns to form the clitics, but is it probable?

4.3.22:18: When I first saw Serbo-Croatian -ga, I thought of how unstressed Russian -го is pronounced [və] and assumed that Serbo-Croatian a was also the product of vowel reduction. But I later rejected that idea because as far as I knew, the reduction of *o was unique to Russian and Belarusian. (Belarusian has -га [ɣa] corresponding to Russian -го [və] and SC -ga.) However, *o-reduction is actually more widespread than I thought: it's also in Upper Carniolan Slovene and Smolyan Bulgarian. Wikipedia reports akan'e (vowel reduction to a) in Polissian Ukrainian whose eastern variety is transitional with Russian, so I assume it has Russian-style *o-reduction (rather than Belarusian-style *o and *e-reduction). Nonetheless that wider distribution doesn't necessarily mean my original guess was correct. As far as I know, akan'e is completely unknown in Serbo-Croatian.

LÜK-ING TÜRK-ISH

Tonight I rediscovered my copy of Language in the Former Yugoslav Lands. The name of the author of the chapter on Slovene caught my eye: Albina Nećak Lük. Neither ć nor ü are in Slovene. Ć is in Serbo-Croatian (and nećak is Serbo-Croatian for 'nephew' - a coincidence?), but ü is in neither Slovene nor Serbo-Croatian. Lük is from Prekmurje, where "[p]eople of different languages, Slovene, Hungarian, German, Romani, Yiddish, live in close contact for centuries". Is Lük a Hungarian or German name?

The umlaut in Lük reminded me of another name from Slovenia that puzzled me: Danilo Türk. I thought I might have already written about Türk, but Google tells me I haven't. In any case, these folks wrote about it years before I ever heard of him. The first post in that thread quotes this article.

I wonder how the average Slovene pronounces ć and ü. Are they consistently distinguished from native č and u? If not, who distinguishes them and who doesn't?

"A TIME FOR TRUTH-TELLING"

Joanne Jacobs wrote that

March 31 is my birthday, a time for truth-telling.

I tried to translate the latter phrase and came up with

1ʐɨəʳ 1tshiee 2ziọ 'truth speak time'

That got me thinking about two of the various Tangut words for 'time':

0705 1ziẹ and 4861 2ziọ

Although their characters are completely different, they are phonetically similar and exhibit an e ~ o alternation almost always otherwise found in verbs. (See below for the only other example with nouns that I know of.)

Jacques (2009) regarded that e ~ o alternation as the result of suffixation. (I have converted his reconstruction into mine; the basic principle is the same.)

0749 *CI-pha > 1phi 'to send, cause' (stem 1*; no suffix)

4568 *CI-pha-w-H > 2phio 'to send, cause' (stem 2; suffixed)

Jacques regarded *-w as a third person patient suffix with a cognate -w still in northern Qiang today.

(The function of *-H which conditioned the second tone is unknown. Not all second stems have the second tone.)

By analogy I could reconstruct the two words for 'time' as *SE-Sa and *SE-Sa-w-H.

*S- conditioned vowel tension (indicated by a subscript dot)

*-E- conditioned the raising of *-a to *-ia and later -ie; it and *-a conditioned the intervocalic lenition of an earlier sibilant *-S- to -z- (phonetically [ɮ]?)

But there are limits to analogy. The *-w in 'time' obviously cannot be a third person patient suffix. What is it? Could it be from an earlier *-k? Or is 'time' a true case of ablaut: i.e., primary vowel alternation as opposed to secondary vowel alternation?

(4.1.0:55: Yet another solution is to assume that the original root vowel was *o and the -e-form is from *-o plus a suffix *-j:

*Sɯ-So-H > 2ziọ

*Sɯ-So-j > 1zi

However, that begs the question of what *-j is. I have reconstructed the presyllabic vowel conditioning later -i- as *ɯ, since the frontness of -e is due to *-j. Unlike *E, *ɯ did not cause stressed vowels to front.)

A pair of nouns with a similar alternation** is

1591 2nie 'language' and 1824 2nwio 'word'

Is the semantic difference between those nouns comparable to that between the two words for 'time' (e.g., time as a whole vs. a point in time)? Tangut has many apparent synonyms whose distinctions are not yet fully understood.

*Jacques (2009: 3) explained the difference between stems 1 and 2:

Stem 2 is used when the verb’s subject (that is, A for a transitive verb or S for an intransitive one) is 1Sg or 2Sg and the patient is third person (Gong 2001:26). Stem 1 occurs in all other cases, including those when a 1sg or 2sg agreement suffix appears but is coreferent with the patient of the verb (Gong 2001:32-34).

**4.1.1:00: Unlike the words for 'time', 'language' and 'word' share the same tone, and 'word' has a medial -w- conditioned by a *P-prefix.

YERNAZ RAMAUTARSING

I first heard about Yernaz Ramautarsing today through Bosch Fawstin. I've been trying to find the derivation of his name.

Ramautarsing is a Hindi compound of three elements of Sanskrit origin:

राम Rām 'Rama'

औतार autār 'avatar'

सिंह Siṃh 'lion'

When I Googled Yernaz, the results not involving Ramautarsing were mostly ... Kazakh! According to, ер yer is 'hero'* and наз naz is a loan from Persian (presumably ناز nāz 'glory'). Did this Turco-Persian hybrid spread into India before arriving in Suriname where Ramautarsing was born?

*I'm surprised I can't find the Kazakh word in this entry for Proto-Turkic *ēr 'man'. (Also see Clauson 1972: 192.)

A KHIOOR-IOUS KHAN-UNDRUM

There are only three tangraphs with the element vai:


1018 1lwo 'moist' (vemvai) =

left of 0642 2lõ 'origin' (vemjolcon; phonetic) +

center of 3052 1niooʳ 'water (trigram ☵)' (cirvaigii; semantic)


3052 1niooʳ 'water (trigram ☵)' (cirvaigii) =

left of 3058 2ziəəʳ 'water' (cirzaa; semantic) +

right of 1018 1lwo 'moist' (vemvai; semantic) +

right of 5941 1diə̣ 'strip' (pargii; why?)


4754 1khiooʳ (Sanskrit transcription) (biobuxvai) =

top and left of 4807 1khi 'to lose' (biobuxpik; initial) +

center of 3052 1niooʳ 'water (trigram ☵)' (cirvaigii; rhyme)

As I wrote in my last entry, 1018 "is clearly a phonosemantic compound," though I don't understand why its semantic component is vai instead of the much more common radical cir 'water'.

3052 is probably a semantic compound. 5941 'strip' might be a reference to how trigrams look like strips, though its components par and gii are absent from the tangraphs for the seven other trigram names:

3950 1tshwiu 'heaven (☰)' (girdexgie)

3389 2ŋiõ 'mountain (☶)' (dexfei)

2777 1ŋeʳw 'thunder (☳)' (dexdukcin)

1995 2məi 'wind (☴)' (biidexdak)

4555 1pə 'fire (☲)' (qeucok)

3910 1phəu 'earth (☷)' (girges)

1976 2bie 'swamp (☱)' (baebeldexbel)

4754 is a fanqie character for Sanskrit transcription according to both the Tangraphic Sea and Homophones. I would expect it to transcribe a Sanskrit phoneme sequence khyor [kʰjoːr] corresponding to its reading 1khiooʳ. However, the only Sanskrit khyor I know of is the genitive and locative dual of sakhi- 'friend', its compounds, and a few rare -khi nouns before a voiced segment. Would the Tangut really create a special tangraph for such forms? Or am I overlooking a common verb form, dharani, or mantra with khyor?

Perhaps my slight rewriting of Gong's Tangut reconstruction is wrong. Arakawa (1997: 149) reconstructed 4754 as khya:n. Monier-Williams' Sanskrit dictionary has 156 nouns with -khyān-; 124 end in khyāna-. However, according to Arakawa (1997: 117), 4754 transcribed Sanskrit khan and khyan, not khyān! That is hard to reconcile with the Chinese transcription 娘 *ndʐɨo for 3052 which rhymes with 4754. The only way out I can see is to reconstruct a -(y)an-like reading which shifted to an -o-like reading by 1190 when 3052 was transcribed in the Pearl.

VEM-ŚA

There are only three tangraphs with the left-hand element vem (in David Boxenhorn's code):

0200 2lõ 'relative'

0642 2lõ 'origin'

1018 1lwo 'moist'

I could call this trio of lo-characters a vem-śa (vaṃśa- being Sanskrit for 'family').

Only the third has a known Tangraphic Sea analysis:


1018 1lwo 'moist' = left of 0642 2lõ 'origin' (phonetic) + center of 1niooʳ 'water (trigram)' (semantic)

It is clearly a phonosemantic compound. I should look into how many tangraphs with oral vowel readings have nasal vowel phonetics and vice versa.* I am also interested in how many tangraphs with -w-readings have -w-less phonetics and vice versa. The looseness of the 'fit' of Tangut phonetics has yet to be measured.

I presume 'relative' is phonetic in 'origin' or the other way around. It is also possible that 'origin' is semantic in 'relative' (i.e., people sharing a common origin).

The right side of 0200 (Boxenhorn code: vemqii)

is semantic; it is shared with ten other tangraphs:

Tangraph LFW2008 Boxenhorn code Reading Gloss
0199 fulqii 2siə first half of 2siə 2sa 'to connect' (i.e., to make near)
0213 fioqii 1nie relative (people who are near to one)
0915 qiitos 1thaa to haunt, make mischief (to be haunted is to have ghosts [symbolized by the right element tos] nearby, and ghosts make mischief)
1639 qiibaehae 1khwạ far (opposite of near)
1951 ciadexqii 2sa Second half of 2siə 2sa 'to connect'
1957 gaaqii variant of 0213
2217 dexbelqii 1ɣʌ near; not attested outside dictionaries? 'ritual' word? resemblance to Middle Chinese 近 *gɨnˀ 'near' coincidental?
2223 bilhasqii 2rieʳ to mend, sew (to seal holes in clothing is to connect threads - to make them near each other)
4506 banqiiqex 2ləu to burn, ignite, light (the right element is qex 'fire')
5228 haehasqii ?lẽ husband of sisters (relative and hence near)

Nearly all have glosses involving nearness (0199, 0213 = 1957, 0915, 1639, 1951, 2217, 2223, 5228). The sole exception is 4506 in which qii might be phonetic: 2ləu is vaguely like 2lõ, the reading of 0200.

Unfortunately the right side of 0642 (Boxenhorn code: vemjolcon)

is unique. It may be a compound of parts (jol and con) extracted from two tangraphs: one of 15 others with jol and one of 12 others with con.

*Gong reconstructed rhyme 2.47 as -ow which corresponds to my -õ. If Gong was correct, vem represented lo-like syllables with or without medial or final glides.

In any case, I don't know why only three lo-like syllables were written with vem while 41 others were not.

Tangut fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2013 Amritavision