For nine and a half years, I have been using 96 x 124-pixel images to represent the Tangut, Khitan, and Jurchen (TJK) scripts. These images are almost always screen captures of TJK characters in 72-point fonts in BabelPad. (In a few cases, they are screen captures of Chinese characters that I have modified by hand to represent Khitan large script characters that I don't have in a font.) This system has worked since January 2006 on five different laptops: two with Windows XP, one with Windows Vista, and two with Windows 7. However, it no longer works on my new Windows 8 machines. 72-point characters no longer fit in a 96 x 124-pixel space: e.g., the Khitan small script character for 'twenty'

was 85 x 82 in 72 point in Andrew West's font on previous machines but is now 107 x 104 on one Windows 8 machine and 129 x 125 on another. The screen resolution is 1920 x 1080 on both machines. Is there anything I can do to make characters appear at the same old size again? I am reluctant to post about TJK if future character images can't be consistent with previous ones.
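The size increases can be computed straight from the measurements above. This sketch takes the width and height ratios and compares them with Windows display-scaling levels. Two things here are assumptions to check, not findings: the list of scaling levels (100%, 125%, 150%, 200%) and the idea that display scaling explains the growth. The only fixed fact used is that a point is 1/72 inch, so a 72-point em is 96 pixels at the Windows default of 96 DPI.

```python
# Bounding boxes (width, height) in pixels of the 72-point Khitan small
# script character for 'twenty' in Andrew West's font, as measured above.
OLD_SIZE = (85, 82)
NEW_SIZES = {"Windows 8 machine 1": (107, 104), "Windows 8 machine 2": (129, 125)}

# Assumption: candidate Windows display-scaling factors (100%, 125%, 150%, 200%).
SCALING_FACTORS = [1.0, 1.25, 1.5, 2.0]


def points_to_pixels(points, dpi=96):
    """Convert a font size in points to pixels; one point is 1/72 inch."""
    return points * dpi / 72


def growth(old, new):
    """Return the (width, height) growth ratios new/old."""
    return tuple(n / o for n, o in zip(new, old))


def nearest_scaling(ratio):
    """Return the candidate scaling factor closest to an observed ratio."""
    return min(SCALING_FACTORS, key=lambda s: abs(s - ratio))


if __name__ == "__main__":
    for machine, size in NEW_SIZES.items():
        w, h = growth(OLD_SIZE, size)
        guess = nearest_scaling((w + h) / 2)
        print(f"{machine}: width x{w:.2f}, height x{h:.2f}, closest scaling {guess:.0%}")
    # A 72-point em at 96, 120 and 144 DPI (100%, 125% and 150% scaling):
    print([points_to_pixels(72, dpi) for dpi in (96, 120, 144)])
```

Ratios near 1.25 and 1.5 would be consistent with a scaling setting, but they would not prove it. A font or driver difference could also produce them.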

Ah, I see now. Windows 8 has nothing to do with this problem. Resolution is the key. My previous machines had screens set to 1024 x 768, 1280 x 800, and 1366 x 768. If I reset my current machines' resolution to one of those smaller formats, 'twenty' is 85 x 82 in 72 point. Why is it 26% and 52% larger on two different 1920 x 1080 screens? Intel should know the answer; both machines have an Intel HD Graphics Control Panel.

IN-S-ERTED IN TIME

Having mentioned Middle Korean (MK) ᄣᅢ pstay 'time' in my last post, this might be a good, um, time for a short note on John Whitman's (2012: 32) etymology which I just discovered on Wednesday:

Proto-Koreo-Japonic (PKJ) *pə(n)tə >

Proto-Korean *pət-ay > MK pstay > modern Korean ttae 'time'

I thought Whitman's split of 'time' into two parts was ingenious. *-ay is a locative suffix reanalyzed as part of the stem, so modern Korean 때에 ttae-e 'time-LOC' has the same suffix twice: i.e., it is etymologically *'(time-LOC)-LOC'. Although the sequence *pət-ay is not harmonic according to Middle Korean rules, earlier Korean may not have had any harmony.

Proto-Japonic *pə(n)tə 'interval' > Japanese hodo

I see two phonetic problems with this etymology.

First, if the PKJ form had *-n-, that consonant corresponds to nothing in MK unless *pnt- became pst- (via *pzt-?). I know of no parallel for such an unusual change.

Second, if the J form did not have -n-, there is no source for MK -s-, unless *pt- became pst-, which is not only strange but also raises the issue of how MK could also have pt- (e.g., in ᄠᅢ ptay 'dirt'). I would rather not posit a chain shift with syncope in 'time' and 'dirt' occurring in different periods to explain why both pst- and pt- exist in MK:

Stage            'time'     'dirt'
Proto-Korean     *pət-ay    *pVtay
Early syncope    *ptay      *pVtay
*s-insertion     *pstay     *pVtay
Late syncope     pstay      ptay
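The chain in the table above depends entirely on rule order. In this minimal sketch the rules are written as ordered regular-expression rewrites. The string encodings are my own simplifications for illustration, not claims about the forms: the hyphen of *pət-ay is dropped, and V stands for the unknown vowel of 'dirt'. Swapping the last two rules shows why the chronology matters.

```python
import re

# Each rule: (name, pattern, replacement). Simplified, hypothetical encoding:
# 'ə' is the syncopated vowel of 'time', 'V' the unknown vowel of 'dirt'.
CHRONOLOGY = [
    ("early syncope", r"^pət", "pt"),   # *ə is lost between initial *p and *t
    ("s-insertion", r"^pt", "pst"),     # *s is inserted into initial *pt-
    ("late syncope", r"^pVt", "pt"),    # the vowel of 'dirt' is lost later
]


def derive(form, rules):
    """Apply rules in order, returning every intermediate stage."""
    stages = [form]
    for _name, pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
        stages.append(form)
    return stages


if __name__ == "__main__":
    for gloss, proto in (("time", "pətay"), ("dirt", "pVtay")):
        print(gloss, " > ".join(derive(proto, CHRONOLOGY)))
    # Swapping s-insertion and late syncope feeds 'dirt' into s-insertion:
    swapped = [CHRONOLOGY[0], CHRONOLOGY[2], CHRONOLOGY[1]]
    print("dirt (swapped)", derive("pVtay", swapped)[-1])
```

With the rules in the table's order, 'dirt' escapes s-insertion only because its vowel is still present when that rule applies. Swapping the last two rules turns both words into pstay.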

I did not specify a syncopated vowel in 'dirt' because I have no evidence for its quality. One could hypothesize that *pət- became pst- whereas *pVt- (in which *V was a vowel other than *ə) became pt-, but I don't know why *ə would be more s-friendly (sigmaphilic?) than, say, *ʌ.

SICILIAN GEMINATION

What is the origin of initial geminates in Sicilian (which may or may not be written)? Has gemination from syntactic doubling been carried over into isolated forms: e.g.,

è bonu [ebˈboːnu] 'is good' (?) > bonu [bboːnu]?

(And is [ebb] in turn from *es b ...?)

In any case, I assume the phenomenon is a Sicilian innovation. Although nomu [nnomu] 'name' originally might have had a consonant cluster in Proto-Indo-European (*ʕʷnomn*), that cluster was gone in Latin, so no parallel with Korean 'tense' consonants from earlier clusters (e.g., pstay > ttae 'time') can be drawn.

Is gemination in words like mmàggini 'image' from earlier *VC-sequences (*imàggini > mmàggini), or does it postdate apheresis (*imàggini > *màggini > mmàggini)?

*I assume Greek o- in onoma 'name' is from PIE *ʕʷ- rather than a prothetic vowel as proposed by Cowgill and Beekes (1969). What would be the motivation for prothesis? Greek does not have a constraint against initial n-. (Does any language have such a constraint before o? In Korean, n- was lost before i and y: e.g., 李 Ri > Ni > I 'the surname Lee'.)

Another possible initial PIE cluster is *ʔn-, which would normally become Greek en-. Greek o- could be due to *e- assimilating to a following *o.

SOLVING FOR X IN MALTESE

Why does x equal [ʃ] in Maltese? That usage surprises me since it is not in English, Italian, or Sicilian (whose alphabet does not include the letter x). Is it

- a remnant of a convention that once existed on the Italian peninsula (and perhaps still exists there, albeit in a language other than Italian or Sicilian)?

- influenced by the orthography of some European language which Maltese is not in direct contact with: e.g., Portuguese and Catalan?

- a Maltese-internal innovation, possibly motivated by a one-sound-per-symbol principle (though the digraph from my last post is not compatible with that principle)?

I was also going to ask why Maltese has j for [j] unlike English or Italian, but then I learned that Sicilian has the same usage.

MYSTIFIED BY MALTESE VOWEL BENDING

I was surprised to see this in the Wikipedia article on Maltese:

/ɐɪ ɛɪ/ represented by għi, and /ɐʊ ɛʊ ɪʊ ɔɪ ɔʊ/ written għu.

I only had a vague memory of għi standing for /ɛɪ/ and għu standing for /ɔʊ/. I was particularly surprised by the equation of għu with /ɔɪ/. So I went to the source (Borg and Azzopardi-Alexander 1997: 299) and found that the passage should be rewritten as follows:

/ɐɪ ɛɪ/ represented by għi, and /ɐʊ ɔʊ/ written għu.

/ɛʊ ɪʊ ɔɪ/ are written as ew, iw, and oj.

I have long assumed that the spellings with għ are historical and point to a time when there was an 'emphatic' consonant (a voiced uvular fricative corresponding to Arabic gh?) that conditioned the bending of the following vowel before disappearing. Cf. how 'emphasis' (pharyngealization) conditioned the bending of *i and *u in Old Chinese before disappearing:

*Cˁi > *Cˁei > *Cei

*Cˁu > *Cˁou > *Cau

I wish I could confirm my guess by consulting a work on Maltese historical phonology.

I also wish I knew why għi and għu each have two readings. Those readings can't be allophones because they are between slashes: i.e., they are phonemic. Do the spellings reflect a period before a phonemic split conditioned by a factor that has now been lost?

WHAT IS THE ORIGIN OF UVULARIZATION IN QIANG? (PART 2)

In part 1, I asked,

Did MLQ [Mawo and Luhua Qiang] merge its equivalents of 'Grade I' and 'Grade II': i.e., is MLQ QVʁ from *QV with a plain vowel and *QVʶ with a uvularized vowel?

I suspect that uvularization is secondary in at least some MLQ words with uvular initials: e.g., Mawo and Luhua Qiang qaʶ 'I' whose external cognates lack any trace of a medial *-r-. Other possible examples are 'afraid/fear', 'fish', 'Chinese', and 'chisel' below.

Luhua Qiang χuʶ 'tiger' looks like a loan from modern Mandarin 虎 hu 'tiger'. Could secondary uvularization be very recent in MLQ?

On the other hand, Luhua Qiang qʰaʶ 'bitter' corresponds to Tangut Grade II

4046 1khi2 'bitter'

Normally I reconstruct medial *-r- as the source of Grade II in Tangut, so it is initially tempting to regard uvularization in the Luhua form as having been conditioned by a lost *-r-. However, once again there is no external evidence for a medial *-r- (e.g., the cognate Tibetan root is kha, not *khra), so I think some Tangut velar-initial Grade II syllables originally had uvular initials with secondary uvularized vowels:

*qʰa > *qʰVʶ > 1khi2 [kʰiʶ]

(I do not specify the vowel in the intermediate stage since I don't know if raising in Tangut preceded or followed uvularization.)

Possible Tangut cognates of other uvularized MLQ words in Evans et al. (2015) are not Grade II: i.e., they lacked medial *-r-. (Syllable-final numbers indicate Tangut grades: e.g., 1khi2 above is Grade II.)

LFW number
Mawo Qiang
Luhua Qiang

younger brother

təʶ 'brother of a man'
təʶ 'brother of a man'

baʶ 'old (of objects)'
to fear

quʶ 'afraid'
quʶ 'fear'


diʶ 'thigh'
diʶ 'thigh'

suʶ 'hemp'
suʶ 'hemp'
to know

niʶ niʶ




six-year-old sheep

nuʶ 'ram'
nuʶ-tə 'ram'

*zar or *Rza

Some of those words may be unrelated: e.g., the Tangut word for 'chisel' is probably a loan from Middle Chinese 鑿 *dzak. The sound correspondences between MLQ and Tangut are not yet known, so I am not able to easily distinguish between true cognates and mere lookalikes.

Could a uvular affix absent in Tangut have conditioned MLQ uvularization?

5.26.2:34: Mawo Qiang nuʶ and Luhua nuʶ-tə 'ram' might be from MLQ nu 'sheep' plus a uvular affix: e.g., *ʁ-nu. However, Evans (2001: 298) listed the Mawo word for 'sheep' as ȵu with a palatal initial instead of n-. I wish I had Qiangyu jianzhi on hand to check the word.

5.26.2:39: Luhua Qiang suʶ 'ten' has no Tangut cognate, but it does resemble Pyu <sū> (Krech 2012's <sav>) ~ <sau> 'ten'. Pyu had initial <sr> in native words (e.g., <srūḥ> 'relative'), so the simple <s> of 'ten' cannot be from *sr- unless there was a chain shift: *Xr- > *sr- > s-.

(Pyu has <h> in <hoḥ> 'three' corresponding to s- elsewhere: e.g., Tangut 1soq1 'three'. That may imply Pyu <s> was once something else that filled the gap left by original *s- when it lenited to *h-.)

WHAT IS THE ORIGIN OF UVULARIZATION IN QIANG? (PART 1)

I forgot to make one point in my last entry. It seems that a lot of Chinese historical phonological studies are conducted in a vacuum without much reference to other Sino-Tibetan languages, let alone general phonological typology. Even Sinoxenic (Sino-Vietnamese, Sino-Korean, Sino-Japanese: i.e., systematic borrowings of Chinese) and transcriptive data are not getting as much attention as I think they deserve. A better (I dare not say 'true' or 'correct') reconstruction of the history of Chinese should take into account the bigger picture.

One of the reasons I like Norman's pharyngeal theory for Old Chinese (OC) is that it makes sense both areally and typologically; it makes OC like its 'Altaic' neighbors (see Norman's 1994 article for details) and it allows nongenetic parallels to be drawn between OC and Semitic. It was a chance look at Maltese that convinced me Norman was right; pharyngeals conditioned vowel lowering in both languages: e.g.,

Imġarri Maltese [anté͜ik] < *antˁk 'ancient' (loan from Italian; Camilleri & Vanhove 1994: 104)

MC *tek < *tejk < OC 弔 *tˁi 'arrive' (but *tˁekʷ is also possible; is there any rhyming evidence pointing to one or the other vowel? Baxter and Sagart 2014 regard the vowel as ambiguous. My guess is that the word was originally *tekʷ with pharyngealization developing before the lower series vowel *e. If the word was *tˁikʷ, its pharyngealization would reflect a lost presyllable with a lower vowel: *Cʌ-tikʷ > Cˁʌ-tˁikʷ > *tˁikʷ.)

The vowel changes in OC are also similar to those in Khmer, though the conditioning factor in Khmer was voicing rather than pharyngealization: e.g.,

Khmer [əj] < *iː after voiceless consonants

Late OC *ej < *i after pharyngealized consonants

Last night I proposed that uvularization was conditioned by pharyngealized (= 'emphatic') initials followed by uvular allophones of */r/ in OC and Tangut. Does uvularization in Qiang have a similar origin?

The distribution of uvularized vowels in Mawo and Luhua Qiang (MLQ) as described in Evans et al. (2015) suggests that exact parallels cannot be drawn between Qiang on the one hand and OC and Tangut on the other:

1. Chinese and Tangut contrasted plain and uvularized vowels ('Grade I' and 'Grade II') after reflexes of *uvulars, whereas only uvularized vowels can occur after uvulars in MLQ (Evans et al. 2015: 24). (5.25.1:50: Did MLQ merge its equivalents of 'Grade I' and 'Grade II': i.e., is MLQ QVʁ from *QV with a plain vowel and *QVʶ with a uvularized vowel?)

2. If I understand Evans et al. (2015) correctly, MLQ permits both plain and uvularized vowels to occur in uvular QC-clusters. In theory, both QRV and QRVʁ might exist in MLQ, though I cannot find any examples in Evans et al. (2015). On the other hand, Chinese and Tangut only had uvularized vowels after reflexes of *QR-clusters.

3. Some uvularization in MLQ is due to right-to-left spreading: e.g.,

Luhua kʰɹa 'eight' + suʶ 'ten' = kʰɹaʶ-suʶ 'eighty' (Evans et al. 2015: 29)

(5.25.1:51: I think uvularization in vowels after velars is exclusively secondary in MLQ: i.e., there are no isolated monosyllabic roots combining velar initials with uvularized vowels.)

This phenomenon has no known parallel in Chinese or Tangut. (I reconstruct left-to-right emphatic spreading in those languages.)
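Right-to-left spreading of the kind in item 3 can be sketched as a simple rule over a two-morpheme compound. For illustration only, this sketch assumes two things: spreading reaches only the last vowel of the first morpheme, and ʶ marks uvularization as in the forms above. It is a toy model, not Evans et al.'s analysis.

```python
# Hypothetical vowel inventory for this sketch only.
VOWELS = set("aeiouəɨɯ")
UVULARIZED = "ʶ"


def spread_uvularization(first, second):
    """Join two morphemes; if the second contains a uvularized vowel,
    uvularize the last vowel of the first (right-to-left spreading)."""
    if UVULARIZED in second:
        for i in range(len(first) - 1, -1, -1):
            if first[i] in VOWELS:
                # Add the mark only if that vowel is not already uvularized.
                if first[i + 1:i + 2] != UVULARIZED:
                    first = first[:i + 1] + UVULARIZED + first[i + 1:]
                break
    return f"{first}-{second}"


if __name__ == "__main__":
    print(spread_uvularization("kʰɹa", "suʶ"))  # Luhua 'eight' + 'ten'
```

With Luhua kʰɹa 'eight' and suʶ 'ten' the function returns kʰɹaʶ-suʶ. A left-to-right rule of the kind I reconstruct for Chinese and Tangut would instead scan the second morpheme from its start.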

Although Tangut is more closely related to Qiang than to Chinese, Tangut areally aligns with Chinese at least as far as uvularization is concerned if my interpretation of Grade II is correct for both languages.

WAS MIDDLE CHINESE (AND TANGUT) GRADE II UVULARIZED?

Last week I finally started an entry that was more than just a link to another scholar's work. However, I ran into Internet problems and put off writing nearly all of the entry until tonight.

On the 14th I discovered Evans et al.'s "Uvular approximation as an articulatory vowel feature". Although the paper only discusses that feature in the Mawo and Luhua dialects of Northwestern Qiang, I wonder if that feature characterized Grade II in Middle Chinese (MC).

Old Chinese (OC) syllables with 'emphasis' (pharyngealization) became Grade II syllables in MC if they had a medial *-r-. Otherwise they became Grade I syllables:



In my reconstruction of OC, uvulars were only in 'emphatic' syllables. Medial *-r- had a uvular allophone *[ʀ] after 'emphatic' initials. This *[ʀ] weakened to a fricative *[ʁ] and uvularized the following vowel before disappearing in late OC:

OC *CˁʀV > *CˁʁV > *CˁʁVʶ > *CVʶ

Tangut Grade II may have had a similar origin: e.g.,

*pʰroH > *pʰˁʀoH > *pʰˁʁoH > *pʰˁʁoʶH > 0080 2pho2 [pʰoʶ²] 'snake'

There was no way to indicate uvularization in the Tibetan script, so Tangut Grades I and II were not distinguished in Tibetan transcription.

MC Grade II vowels were borrowed as nonuvularized vowels in Vietnamese, Korean, and Japanese - all languages lacking uvularization:

Vietnamese Korean Japanese
*ka (Grade I) 'song' ca [kaː] ka

*k(ɰ/j)aʶ (Grade II) 'to add' gia [zaː] < *kjaː

Grade II developed a glide after velars in the MC dialect underlying Sino-Vietnamese:

OC *KˁʀV > *KˁʁV > *KˁʁVʶ > *KɰVʶ > *KjVʶ

Sino-Korean is based on an eighth-century northeastern dialect in which that glide had not yet fronted to *-j-. Velar *-ɰ- has not left a trace in Sino-Korean. The -y- in a few Sino-Korean borrowings of Grade II syllables is due to the Korean-internal breaking of *e and does not reflect later NE MC *-j-: e.g.,

界 MC *kɰèʶj > Old Korean *kéy > Middle Korean *〮곙 *kyŏ́y > modern Korean 계 kye [ke]

(5.24.1:42: The MC 'departing' tone that I indicate with a grave accent corresponds to the Middle Korean high tone that I indicate with an acute accent. I have projected the high tone back into Old Korean, but it is possible that the OK source of the high tone had a different contour. In any case, the contours of the OK and the northeastern MC tones were probably similar.

The earliest attested MK reading for 界 is a prescriptive reading 〮갱 káy that is not ancestral to modern Korean 계 kye. The prescriptive reading is from 界 MC *kɰàʶj. I reconstructed MK *〮곙 *kyŏ́y to account for the modern form.)

Sino-Khitan is based on a later stage of that northeastern dialect in which *-ɰ- had fronted to *-j-: e.g.,

家 MC *kɰaʶ > Liao *kja(ʶ) > Khitan small script <g.ia>

Uvularization may have been lost in Liao Chinese after plain *a raised to *o, leaving a gap to be filled by uvularized *aʶ:

*aʶ > *a > *o
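The ordering argument here is the usual logic of a chain shift: uvularization could only be lost after plain *a had raised, so the changes must not feed each other. This is a minimal sketch with simplified inputs: Grade I *ka, and a Grade II *kjaʶ taken after fronting. It treats a vowel plus ʶ as a single segment. These encodings are assumptions made for illustration.

```python
def segments(form):
    """Split a form into segments, attaching the uvularization mark ʶ
    to the preceding segment."""
    segs = []
    for ch in form:
        if ch == "ʶ" and segs:
            segs[-1] += ch
        else:
            segs.append(ch)
    return segs


def apply_in_order(form, changes):
    """Apply (old, new) segment changes one after another."""
    segs = segments(form)
    for old, new in changes:
        segs = [new if s == old else s for s in segs]
    return "".join(segs)


# Proposed order: raising first, then loss of uvularization (counterfeeding).
PROPOSED_ORDER = [("a", "o"), ("aʶ", "a")]
# The reverse (feeding) order would merge the two grades as *o.
FEEDING_ORDER = [("aʶ", "a"), ("a", "o")]

if __name__ == "__main__":
    for form in ("ka", "kjaʶ"):
        print(form, apply_in_order(form, PROPOSED_ORDER), apply_in_order(form, FEEDING_ORDER))
```

Under the proposed order *ka becomes *ko and *kjaʶ becomes *kja. Under the feeding order both end in *o. The same function handles the coda chain *-uq > *-oq > *-ɔq suggested in the 'brain' post, provided *-oq > *-ɔq is listed first.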

If uvularization persisted in later stages, it must have been subphonemic. It has not been observed in any living Chinese language.

A WEB OF TANGUT CATALOGUES

Andrew West wrote,

Dozens of Tangut Buddhist manuscripts held at the Institute of Oriental Manuscripts (IOM) in Saint Petersburg have been digitized and made available online at the IDP [International Dunhuang Project] website, but there is no accompanying description or bibliographic information for any of them, and not even the title of the text is given. This makes it difficult for the handful of scholars in the world who can read Tangut to usefully browse the Tangut collection on IDP, and next to impossible for everybody else.

I am one of those scholars. Fortunately, Andrew has come to the rescue with his Web of Tangut Catalogues. Thank you, Andrew!

If only Khitan and Jurchen had as many texts as Tangut!

BIBLIOGRAPHY OF RGYALRONG STUDIES

Here's another list of recommended reading while I'm away. It was compiled by Guillaume Jacques, whose works taught me almost everything I know about rGyalrong (which is to say very little - mea culpa for not reading enough).

I would add these 1979 Linguistics of the Tibeto-Burman Area articles by Nagano Yasuhiko, who edited the online rGyalrongic languages database along with Marielle Prins:

A historical study of rGyarong initials and prefixes. 4.2: 44-68.

A historical study of rGyarong rhymes. 5.1: 37-47.

I haven't seen those articles since 2008. It would be interesting to compare his views of rGyalrong phonological history with Guillaume's.

GUILLAUME JACQUES' BLOGS - AND A BABELSTONE BONUS

I haven't written anything here for almost two weeks now. I may not blog much for the next several weeks.

If you are waiting for me to return, I recommend Guillaume Jacques' posts at these three sites:

Panchronica (in French)

Diversity Linguistics Comment (in English)

HimalCo (in English)

Oh, and Andrew West has a new post that I didn't see until just now! I see he used my simplified transcription of Tangut. Nice!

One reason I've been away is that I got a new laptop and haven't gotten around to setting up its connection with my server yet. Here goes ... if you can see this, I succeeded.

DISTRIBUTIONAL DICTIONARIES OF CHARACTERS

Traditional East Asian dictionaries do not explicitly state whether characters can only occur in combinations or not. At first glance, one might get the idea that both 麒 and 麟 are Chinese words, but in fact the first only occurs in the disyllabic word 麒麟 'qilin'*, whereas the second can be found as an independent word in Classical Chinese** and as a part of other words. A 'distributional dictionary' could make a three-way distinction between

- superbound (appearing solely as part of a single polysyllabic word): e.g., 麒

- bound (appearing as part of two or more polysyllabic words): e.g., 麟 in modern Mandarin

- free (able to appear as an independent word): e.g., 麟 in Classical Chinese

Even finer distinctions may be possible, but that's a start.
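The three-way scheme amounts to a small decision procedure over a character's attested distribution. This minimal sketch assumes that each entry records two things: the polysyllabic words the character occurs in, and whether it is attested as an independent word. The data structure, the extra "unattested" label and the sample entries are illustrative only, not a real dictionary format. The samples are built from the 麒/麟 examples above plus the idiom 鳳毛麟角.

```python
from dataclasses import dataclass, field


@dataclass
class CharacterEntry:
    char: str
    words: set = field(default_factory=set)  # polysyllabic words containing char
    free: bool = False  # attested as an independent word?


def distribution_class(entry):
    """Classify a character as free, bound, or superbound."""
    if entry.free:
        return "free"
    if len(entry.words) >= 2:
        return "bound"
    if len(entry.words) == 1:
        return "superbound"
    return "unattested"  # hypothetical fallback for an empty entry


if __name__ == "__main__":
    entries = [
        CharacterEntry("麒", {"麒麟"}),
        CharacterEntry("麟", {"麒麟", "鳳毛麟角"}),  # modern Mandarin
        CharacterEntry("麟", {"麒麟"}, free=True),   # Classical Chinese
    ]
    for e in entries:
        print(e.char, distribution_class(e))
```

The free test runs first because a free character such as Classical 麟 can also occur inside polysyllabic words. For Khitan and Jurchen, the 'words' would probably need to be morphemes rather than characters.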

Such distinctions could be carried over into a Tangut character dictionary since Tangut, like Chinese, has a large number of monosyllabic morphemes. However, the scheme might have to be altered somewhat for Khitan and Jurchen, which have a large number of polysyllabic morphemes. Nonetheless, I still think it is important to know, for example, that as far as I know Jurchen


may be superbound, as it only appears in

<> 'the name Jahudai'

whereas its homophone


has a far wider distribution: it can represent dai 'girdle' (< Chinese 帶) and the syllable dai in many words other than the name Jahudai. The two characters do not appear to be interchangeable. And even if they were interchangeable, it would be nice to know when that was the case: e.g., from the start or only from the Ming Dynasty onward.

Once we determine that two or more homophonous characters were not interchangeable, then we can try to determine why. In some cases the homophony may not turn out to be original: i.e., the two characters originally had different readings that merged over time, and the original functions of the characters blurred. Since <dai2> resembles Jin Chinese 大 *dai, I think it had always been read dai, whereas <dai1> may have originally stood for a rarer Jurchen syllable that later became dai.

*I am not counting the use of 麒 in definitions such as


'The male qilin is called the qi; the female is called the lin'

from the Book of Han. This explanation for the disyllabic word qilin is a folk etymology.

**In modern Mandarin, 麟 only occurs in morpheme combinations. I would be surprised if 麟 were a monosyllabic word in any modern Chinese language. It is possible that very early attestations of 麟 as an independent word were pronounced *grin, a contraction of 麒麟 *gərin.

THE LATE GREAT CHU

Today I downloaded the latest version of Andrew West's BabelStone Han PUA font containing 194 楚 Chu script transcription characters.

In 1127, 1350 years after the fall of the original Chu and less than a decade after the creation of the Jurchen (large) script, the Jurchen Empire established 大楚 Great Chu as a buffer between them and the Southern Song. This puppet state only lasted a month.

How would the Jurchen have written 大楚 *Dai Cu 'Great Chu' in their then-new script?

There were two different types of Jurchen graphs for dai.

Jin and Jin (1984: 81, 136) only list a single word-final example for one type:

<> 'the name Jahudai'

The other type was much more common and used to transcribe Jin Chinese 大 *dai 'great' as well as representing the syllable dai in the native Jurchen names

<> and <> (Jin and Jin 1984: 5)

What was the original reasoning behind having two graphs for the same syllable? Were they originally nonhomophonous? My guess is that the common <dai2> was read as dai from the start, whereas the rare <dai1> was originally for some other syllable that merged with dai: e.g., *daai.

Was there also a lost phonetic distinction between the two kinds of <cu>? Both could be used to write native words, and both even appeared side by side in

<cu1.cu2.wa.hai> 'according to'?

But only <cu2> appeared in Chinese transcriptions, so I conclude that *Dai Cu would have been written as


4.12.2:42: <cu2> could represent the monosyllabic auxiliary verb cu- 'to be able' (Jin and Jin 1984: 81, 259). Perhaps <cu2> was originally a logograph for that verb, whereas <cu1> may have been a phonogram from the beginning.

<dai2> resembles Chinese 大 *dai 'great' and could have initially been intended to write that word (and homophonous Chinese loanwords?), unlike <dai1> which might have been reserved for dai in native words.

PROTO-SINO-TIBETAN-AUSTRONESIAN *PONUQ 'BRAIN'?

Old Chinese (OC) 腦 *nuʔ 'brain' was a type A syllable* with vowel lowering. According to my theory, *u partly lowered to harmonize with a low unstressed vowel in a lost presyllable:

*Cʌ-nuʔ > *Cʌ-nouʔ > *nouʔ > *nauʔ > Mandarin nao

However, Laurent Sagart (2002: 5) regarded 腦 *nˁuʔ 'brain' as cognate to Proto-Austronesian (PAN) *punuq with a high first vowel *u. If OC had a high presyllabic vowel in 'brain', it would have matched the high main vowel, and there would have been no lowering:

*pu-nuʔ > *nuʔ > *ɲuʔ > Mandarin *rou

Can both Laurent and I be right? PAN had only four vowels (*a *e [= *ə] *i *u), whereas OC had six (*a *e *ə *i *o *u). Laurent (2002: 8) reconstructed seven vowels in Proto-Sino-Tibetan-Austronesian (PSTAN) to account for the following correspondences in main vowels:

PSTAN Environment OC PAN
*u before labials *u
elsewhere *u
*o before labials *a
elsewhere *o
*a before *y *i *a
elsewhere *a
(everywhere) *e
*e after grave consonants *e
elsewhere *i
*i in open syllables *i
in closed syllables *i

I only reconstruct two vowels in OC presyllables: high and low *ʌ**. I have long thought each resulted from the merger of various unstressed vowels. Let's suppose that those earlier vowels were identical to the seven vowels in PSTAN final syllables:

*i *i
*u *u
*o *u

Above I assume that PAN first vowels developed more or less like second vowels. A study of OC syllable types and PAN first vowels may reveal a different course of development.

My OC could be from PSTAN *o which raised to *u in PAN:

*ponuq > OC *pʌ-nuʔ and PAN *punuq 'brain'

4.11.1:10: If OC and PAN are not related, the word could be a borrowing from one into the other when the source language had *o as the first vowel.

4.11.1:35: Of course OC is not the only Sino-Tibetan language. STEDT lists nu-words for 'brain' in other languages. The Proto-Sino-Tibetan form may have ended in a *-q that

- was retained in Proto-rGyalrongic

- became *-k in some languages: e.g., Written Burmese ūḥnok

- became a glottal stop in OC

- was lost in Tangut

0118 and 0127 2no1 < *noH 'brain'

Was the mid vowel in some of these forms lowered before *-q? Jacques' (2004: 266) Proto-rGyalrongic reconstruction does not have *-uq. Maybe there was a chain shift: *-uq > *-oq > *-ɔq.

4.11.2:17: I am agnostic about PSTAN. Currently I think Austronesian is more likely to be related to Kra-Dai than to Sino-Tibetan.

If the correspondences above are valid, they do not entail a genetic relationship. They may tell us about patterns of borrowing.

Conversely, if the correspondences are due to common ancestry, exceptional forms may have been borrowed after a split (cf. how the loanword paternal has p instead of the regular Germanic f from Proto-Indo-European p).

*4.11.1:56: Type A syllables were characterized by secondary pharyngealization (a.k.a. 'emphasis') at some point. I do not know of any other Sino-Tibetan language with pharyngealization. I suspect that pharyngealization was a Chinese innovation which may have been due to contact with a substratum or neighboring language. I have omitted pharyngealization in this discussion to focus on the vowels.

**4.11.2:15: I got the symbols and from my phonetic notation for Middle Korean which had a two-class height harmony system like my Old Chinese reconstruction. I chose them because they are visually distinct from the letters for my six vowels. Their actual phonetic values may have overlapped with two of the vowels: e.g., they could have been and *a. It is easier to type than a phrase like "unaccented presyllabic higher vowel" or *ə̆ with a breve.

DO AUSTRONESIAN AND SINO-TIBETAN SHARE A WORD FOR SETARIA ITALICA?

Today I saw Laurent Sagart's "Austronesian and Sino-Tibetan words for Setaria italica and Panicum miliaceum: any connection?" (2014) and was surprised to see him mention Khitan in a paper about prehistory (emphasis mine):

There is a complication with the semantics of this comparison: certain modern authors (Li 1983:29; Hu 1984; Chai et al. 1999:9) claim jì 稷 did not mean 'Setaria italica' in early Chinese but 'Panicum miliaceum'. This view, widespread among Chinese agronomists, is based on statements by various Chinese authors from c. 1000 CE down to modern times, to the effect that jì 稷 is the same plant as 穄 *[ts][a][t]-s > tsjejH > jì ‘Panicum miliaceum’. Thus Chai et al. (1999:9) observe that in the three provinces of Shandong, Henan and Hebei, (glutinous) Panicum miliaceum varieties are today usually referred to as jì 稷.

However, this is a confusion arising from the phonetic convergence of these two words after Middle Chinese (a standard reading pronunciation from the sixth century CE, known to us through the dictionary Qie Yun 切韻, prefaced in 601 CE, and its later editions). In Modern Standard Chinese, Middle Chinese (MC) 稷 tsik and 穄 tsjejH have both evolved, quite regularly, to jì [ʨi 51]. The merger had already occurred in northern Chinese during the Khitan or Liao dynasty, which occupied parts of north China, including Hebei, from 916 to 1125 CE. Phonetic transcriptions in Khitan small script of the 11th and 12th century Chinese show that while MC final -k was still represented by a glottal stop in poetry, it had disappeared in everyday speech (Kane 2009:252sq.). thus in everyday Chinese of the Khitan period, 'Setaria italica', MC tsik, was probably [tsi]. At the same time, the character 祭, a MC homophone of 'Panicum miliaceum' on Middle Chinese (both MC tsjejH), and the phonetic element in 'panicum', was also [tsi] (Shen 2014:318). It is significant that there are no statements equating 稷 tsik and 穄 tsjejH from time periods preceding the phonetic merger of the two forms [i.e., from before c. 1000 CE]. Thus we can be satisfied that 稷 tsik and 穄 tsjejH were distinct cereals in early Chinese times, and that (since there is no question that jì 穄 meant ‘panicum’) jì 稷 tsik must be the name of Setaria italica.

I would like to add that Kane's argument is based on Chinese-internal data: the poetry in question is in Chinese, and the loss of final glottal stop is implied by 沈括 Shen Gua in his 夢溪筆談 Mengqi bitan 'Dream Pool Essays' (1088; Kane's translation):

Even now the Heshuo [= Hebei; i.e., north of the Yellow River] people pronounce 肉 [*zhiwʔ] as 揉 [*zhiw], and 贖 [*shuʔ] as 樹 [*shu].

In the Khitan small script,

[g]enerally speaking there is no consistency in the use of the graphs used to transcribe syllables which ended in stops in MC and probably a glottal stop in Song Chinese. This does not prove that Liao Chinese did not have a glottal stop in such words, just that the Kitan [= Khitan] transcription does not indicate it. (Kane 2009: 254)

For instance, the Khitan small script character

339 <i>

was used to transcribe syllables whose MC readings ended in -i and -it (both corresponding to Song *-iʔ). The one instance of a word whose MC reading ended in -ik like 稷 tsik 'Setaria italica' was written as

087 <tz>

which also transcribed the open syllables 知 *ji (MC trje) and 旨 *ji (MC tsyijX).

The Sino-Tibetan forms for Setaria italica look like a good match for Proto-Austronesian *beCeŋ (*e = [ə]) with the exception of the coda:

Probable Tibeto-Burman cognates of the Chinese word [稷 Old Chinese *[ts]ək] are Trung tɕjaʔ55 ‘millet’, Lhokpu cək ‘Setaria italica’ (van Driem, p.c. to LS, June 24, 2004; not phonologized): if the shape and semantics of this last form are confirmed, the Proto-Sino-Tibetan word for 'Setaria italica' might sound something like #tsək (pre-reconstruction).

Both Proto-Sino-Tibetan (PST) and Proto-Austronesian (PAN) had final *-k and *-ŋ. I would expect the following correspondences, which are in Sagart (2002: 7):

OC (and probably also PST) *-k : PAN *-k

OC (and probably also PST) *-ŋ : PAN *-ŋ

Yet Sagart also found examples of the correspondence

OC (and probably also PST) *-k : PAN *-ŋ

which has Sino-Tibetan-internal parallels: e.g.,

Tangut 1siw4 < *sik, Written Burmese sac < *sik : OC 新 *sin < *siŋ? 'new'

I presume there is morphological variation within Sino-Tibetan. But if the Sino-Tibetan and PAN forms for Setaria italica are related, how can the different codas be explained? Are they different reductions of *-ŋk, a cluster lost in ST and PAN?

Genetic scenario:

Proto-ST-AN *-ŋk > PST *-k but PAN *-ŋ

Nongenetic scenario (i.e., borrowing):

pre-PAN *-ŋk > borrowed as *-k in PST but became *-ŋ in PAN

4.10.4:40: The first vowel of PAN *beCeŋ (*e = [ə]) is consistent with my theory that presyllables with higher vowels (*i, *ə, *u) conditioned type B syllables in Old Chinese such as 稷 *[ts]ək.

Sagart (2002: 8) found the following correspondences between OC syllable types and PAN segments:

OC type A : PAN penultimate syllable initial voiceless stop (except *q-) or zero (i.e., no penultimate syllable)

OC type B : other PAN penultimate syllable initials including *q-
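Sagart's two correspondences can be stated as a single predicate on the PAN penultimate-syllable initial. This sketch takes that initial as a string, with None for a form that has no penultimate syllable. It also assumes a simplified set of PAN voiceless stops; the real PAN consonant inventory is more complex.

```python
# Simplified assumption: PAN voiceless stops relevant here.
VOICELESS_STOPS = {"p", "t", "k", "q"}


def predicted_oc_type(penult_initial):
    """Predict the OC syllable type from the PAN penultimate initial,
    following the correspondences in Sagart (2002: 8) as summarized above.
    None means the PAN form has no penultimate syllable."""
    if penult_initial is None:
        return "A"
    if penult_initial in VOICELESS_STOPS and penult_initial != "q":
        return "A"
    return "B"


if __name__ == "__main__":
    # PAN *punuq 'brain' ~ OC 腦 (type A); PAN *beCeŋ ~ OC 稷 (type B)
    print(predicted_oc_type("p"), predicted_oc_type("b"), predicted_oc_type("q"))
```

Written out this way, *q- has to be excluded explicitly from the voiceless stops. That is the exception questioned in the next paragraph.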

If PAN preserved Proto-ST-AN penultimate syllable initials, I do not understand why bare syllables and syllables preceded by voiceless stops developed type A with pharyngealization. And why would *q- block pharyngealization, which was the default (!) development? (Normally pharyngealization is marked: i.e., nondefault.)

PSTAN *(tV)CV > OC *CˁV (type A)

PSTAN *qVCV, *sVCV, *nVCV > OC *CV (type B)

In Semitic terms, type A is 'emphatic', and Semitic q is an 'emphatic' consonant, so I would expect it to be associated with type A.


Last night I found these sections of the History of the Liao Dynasty translated in Wittfogel and Fêng (1949: 261):


On the day mou-ch'ên [of the eleventh month in the thirteenth year of T'ung-ho [= 995 AD*] ] Korea sent ten boys to study the [Ch'i-tan [= Khitan] ] national language.


On the day kêng-ch'ên [sic for kêng-hsü] [of the third month in the fourteenth year of T'ung-ho [= 996 AD] ] Korea again sent ten boys to study the [Ch'i-tan [= Khitan] ] national language.


[On the day chia-shên of the twelfth month in the first year of K'ai-t'ai [= 1012 AD] ], Kuei Prefecture reported that its inhabitants, who had originally been moved from Silla [= Korea], were illiterate, and that schools should be set up to educate them. This request was approved by imperial decree.

I wondered which Khitan script(s) those Koreans learned: the large script, the small script, or both.

David Boxenhorn suggested that those Koreans might have tried to write their own language in the small script. That would have been easy to do, since Korean a thousand years ago

- had *CV(C) syllables like Khitan without the consonant clusters of a few centuries later (and even such clusters could have been written with sequences of small script consonant symbols)

- had roughly the same consonants as Khitan minus the uvulars

- shared most of its vowels with Khitan (*i, *e [> later Korean yŏ], *ə, *a, *u, *o)

Only the apparent absence of the vowels and (> later Korean a/ŭ) in Khitan might be a problem. Existing CV, V, and VC characters could do double duty for those vowels: e.g.,

273 <un>

could represent Korean *ɯn as well as *un. That would parallel the current use of the Roman letter u to transcribe both Korean [ɯ] and [u]: e.g., Kim Jong-un is [kimdʑəŋɯn].

Also, dots could be added to indicate non-Khitan uses of characters, just as the Khitan added a dot to  <pu> to write the Chinese syllable <fu>:


241 <pu> > 261 <fu>

4.9.3:13: David's scenario makes me wonder if the Jurchen used the small script to write their language.

When I saw this passage in Wittfogel and Fêng (1949: 253),

In 1150 a distinguished Jurchen statesman is said to have written a confidential political letter to his son in the small Ch'i-tan script; this interesting document, translated into vernacular Chinese, is preserved in the Chin Shih [= History of the Jin Dynasty] (CS [= Chin shih] 76, 2a ff.; 84, 3a ff.).

I wondered if the statesman wrote in Khitan or in Jurchen using the Khitan small script. Wittfogel and Fêng raised the possibility of the latter:

Many Chin records describe the continued use of the Ch'i-tan script during the early and middle years of the Chin dynasty. Unfortunately, they do not make it clear whether this also involved the use of the Ch'i-tan language. There must have been a number of Jurchen who spoke Ch'i-tan, but the question still arises whether such knowledge was necessary to the use of the Ch'i-tan script. In the formative period of their power the Mongols wrote their documents in the Mongol language but in the alphabetic Uighur script (Browne 28 II, 441; cf. Barthold 28, 41). The Manchus until the year 1599 wrote their documents in Mongol and used the Mongol script (KHTSL 3, 2a-b). The Jurchen may have availed themselves of either method exclusively, or of both at different periods of time, first adopting an alien language and script and later using the alien script for transcribing their own language. In the latter case the smaller script would seem particularly appropriate, for as an alphabetic system of writing it could easily be adjusted to the needs of another language, especially if this language belonged to the same Altaic complex [as Korean does!]

*4.9.2:49: Although I suspected that the eleventh month of Tonghe (= T'ung-ho) 13 fell at the start of 996 AD, Wittfogel and Fêng referred to 995 AD in their footnote (emphasis mine):

This record is confirmed by the Korean official history which relates that in 995 the Korean government sent ten boys to Liao [the Khitan Empire] to study the Ch'i-tan language (KRS [= Koryŏ-sa 'History of Koryo'] 3, 46). However, this effort seems to have produced very poor results. In 1010, when the Liao vanguard general sent a document written in Ch'i-tan to the Korean court, no one could read it (KRS 94, 86).

Did any of the ten boys return as men to serve the court, and if so, were any of them still at the court in 1010?

4.10.4:54: Andrew West pointed out that the eleventh month of Tonghe 13 is equivalent to 25 November-24 December 995, so Wittfogel and Fêng's date is correct.
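The year-level part of such a conversion is simple arithmetic: count from the first year of the era. Here is a minimal Python sketch. The era start years (T'ung-ho 1 = 983, K'ai-t'ai 1 = 1012) are assumptions chosen to agree with the equivalences in the Wittfogel and Fêng passages above, and the dictionary keys are hypothetical labels. The sketch converts years only. A lunar year begins in January or February, so the eleventh and twelfth months of an era year can extend into January of the next Western year, and month-level dates like 25 November-24 December 995 need a proper calendar concordance.

```python
# ASSUMPTION: first Western year of each era, consistent with the
# equivalences quoted above (T'ung-ho 13 = 995, 14 = 996; K'ai-t'ai 1 = 1012).
ERA_FIRST_YEAR = {"Tonghe": 983, "Kaitai": 1012}


def era_year_to_western(era, n):
    """Western year in which most of era year n falls (era year 1 = first year).

    This is year-level arithmetic only; late lunar months may spill into
    January of the following Western year.
    """
    return ERA_FIRST_YEAR[era] + n - 1


print(era_year_to_western("Tonghe", 13))  # 995
print(era_year_to_western("Kaitai", 1))   # 1012
```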


I wanted to see 'on the tomb' from my last post in context, so I looked at the text on the lid of the epitaph for 蕭仲恭 Xiao Zhonggong as copied in Qidan xiaozi yanjiu (1985: 594):

1. 139-051-290-253 <na.gha.án.ô>

2. 188-169 <?.qó>

3. 081-140 <MONTH.en>

4. 081-348 <MONTH.e>

5. 334-262 <g.ui>

6. 071 <ong>

7. 076-020-361-140 <gho.y.én.en>

8. 251-084-205 <>

9. 052-334-361 <?.g.én>

Let's go through it block by block:

1. Kane (2009: 51, 106) translated

139-051 <na.gha> and 139-051-290 <na.gha.án>

as 'uncle' and 'maternal uncle' (cf. Written Mongolian naghachu 'maternal uncle'; Ji Shi 1982). Neither occurs alone in Qidan xiaozi yanjiu's index of Khitan small script words. Have they been found in isolation in texts discovered in the three decades since that book was published?

Could 290 <án> be the plural suffix also in

311-151-290 <b.ugh.án> 'children' < 311-168 <b.qo> 'son, child'

which also has unexpected medial voicing in the plural? Is -gh- a contraction of *-qw- < *-qo-?

The final character

252 <ô>

is an error for

341 <er>

which Kane (2009: 106) regarded as the invariable (and in this case, nonharmonic) accusative-instrumental suffix ('via the maternal uncle'?). However, I would expect the genitive: 'junior tent of the maternal uncle'.

Could <er> be a plural suffix?


222-362-222-341 <ń.iau.ń.er> 'siblings' < 222-362 <ń.iau> 'sibling'?

is another plural ending in <er>, though the suffix may be <ń.er>. I don't know of any plural suffix <ń>, so I don't think <.ń.er> is a double plural suffix.

Could <er> be a plural genitive suffix if <na.gha.án> is a singular?

Could <na.gha.án> be a doubly marked plural like Japanese ko-domo-tachi, English child-r-en, and Dutch kind-er-en (cf. German Kind-er with only one suffix)?

2. <?.qó> is 'junior' (Kane 2009: 25). Kane interpreted this as an adjective modifying the previous noun ('junior maternal uncles'), though if that were the case, it would be in an un-'Altaic' position: i.e., following instead of preceding the noun.

Aisin Gioro read the first character

188 <?>

as <od> in 2004 and as <oji> in 2011. If it is <od>, how did it differ from


which Aisin Gioro read as <ad> ~ <od> and <od> ~ <do>?


3. 081 <MONTH>

is an error for

380 <TENT>.

Kane (2009: 25) translated blocks 1-3 as 'the tent of the junior maternal uncles'; I would add an 'of' before 'the' to correspond to the genitive suffix

140 <en>.


4. 081 <MONTH>

is an error for

082 <yw>

with a dot. Hence <yw.e> is a transcription of the Liao Chinese name 越 *Ywe.

5. Transcription of Liao Chinese 國 *gueiʔ 'state'*.

6. Transcription of Liao Chinese 王 *ong 'prince'.

Blocks 4-6 mean 越國王 'prince of the state of Yue'.

7. 076-020 <gho.y> may be a verb stem.

361 <én> could be a nominalizing suffix, though I would not expect <é> after <gho.y> if Khitan vowel harmony was like Mongolian or Manchu vowel harmony.

Is 140 <en> a genitive before 'tomb': 'on the tomb of ...'?

8. 'tomb-LOC': 'on the tomb'.

9. Kane transcribed 052 as <RECORD>, and stated that it "is only found in the word

[052-334] <> [= my <g>] 'record'

with various suffixes." However, it can occur in isolation and with characters other than 334, though it cannot occur in noninitial position (Qidan xiaozi yanjiu 1985: 201-202, 690-691). That suggests 052 is not a logogram. Aisin Gioro read it as <cu> in 2004 and <ce> in 2011.

361 <én> is a nominalizing suffix. Kane (2009: 155) translated 052-334-361 <?.g.én> as 'inscription'.

*4.8.3:48: Although the Khitan may have borrowed Liao Chinese 國 as gui [guj], I suspect the Liao Chinese pronunciation was *gueiʔ [kwəjʔ]. In Middle Chinese, 國 was *kwək, and has developed in at least two different ways in modern Mandarin dialects:

1. *kwək > *kwəɰk > *kwəɰʔ > *kwəjʔ > [kwej] (e.g., Jinan)

2. *kwək > *kwəʔ > [kwo] (e.g., Beijing)

Forms like Linquan [kwɛ] or 13th century Phags-pa Chinese ꡂꡟꡠ <gue> may be from either *kwəjʔ or *kwəʔ with fronting of the schwa.

The Khitan borrowed from a dialect with the first path of development.

Prescriptive 15th century Sino-Korean 귁 kuyk might be a conscious compromise between actual Sino-Korean 국 kuk and Ming Mandarin [kuj].


According to my harmonic unwritten vowel hypothesis,

251-084 <n.ra> 'tomb'

in the Khitan small script was read nara without the apparent harmonic violation of Kane's (2009: 123) nera. So far, so good. But the dative-locative suffix for 'tomb' is de, not *da:

251-084-205 <> 'tomb-LOC'

This is not an isolated spelling. It occurs seven times in four texts over a span of a century:

- twice in 蕭令公 (1.10, 26.14; 1057)

- once in 許王 (2.17; 1105)

- once in 耶律撻不也 (1.10; 1115)

- thrice in 蕭仲恭 (lid 3.2, 1.8, 44.38; 1150)

I wonder if there are even earlier occurrences. Did the harmonic form *nara-da ever exist: e.g., at the time of the invention of the small script c. 925?

Here are other examples of seemingly nonharmonic dative-locative -de:

051-251-205 <> '?-DAT/LOC'? (蕭令公 12.17) instead of *ghan-da (assuming ghan is the stem though it is not attested in isolation)

071-205 <> 'prince-DAT' (蕭仲恭 4.51) instead of 071-217 <> (quoted in Kane 2009: 137; source not specified)

076-189-099-205 <> '?-DAT/LOC' (耶律撻不也 21.1) instead of *ogha(a)d-da

141-205 <> 'seven-LOC' (蕭仲恭 8.12) instead of *dolo-do

But -de is expected if Aisin Gioro's (2004, 2005) reconstruction of 'seven' as dil is correct.

248-118-205 <jal.qú.de> '?-DAT/LOC' (許王 50.17) instead of *jalqu-du

The reading <jal> is from Aisin Gioro (2004).

Was nara-de a harbinger of the ultimate fate of the Khitan dative-locative? If Khitan had survived, would it have an invariable -de [də], just as the Jurchen dative-locative suffixes

<do> and <du> (= Kiyose's dö)

merged into Manchu de [də]? Could such an invariable -de already have existed in late colloquial Khitan, emerging occasionally in texts that otherwise reflected harmonic allomorphy lost in speech?

4.7.0:56: Khitan had an invariable accusative-instrumental suffix -er, though the homophonous perfective suffix had -ar and -or allomorphs (Kane 2009: 131, 145-146). Would as yet undiscovered 10th century small script texts also have accusative-instrumental -ar and -or? Why did the merger occur in the accusative-instrumental before the dative-locative? Was disambiguating the former from a homophonous verb suffix a factor?

Unlike Khitan, Jurchen had three allomorphs of the accusative suffix:


<ba> (written with two types of characters), <be>, <bo>

All three merged into Manchu be [bə].


On Friday I was looking for the name of Yelü Abaoji's father

244-084-051-099-222 <ń>

transcribed in Chinese as 撒剌汀 *saʔlaʔding or 撒剌的 *saʔlaʔdiʔ* in Kane (2009). Last night I found it on page 129. I also rediscovered my 2014 post on the name.

Last year I interpreted 084 as ar and read the name as Sargha(a)diń. But if 084 was ar, what was the difference, if any, between it and 123

which also represented ar?

Kane (2009: ) read 084 as ra and tentatively reconstructed an inherent vowel e in 244. Hence he read the name as Seraghadiń. The coexistence of e and a is unexpected in Mongolic or Jurchen/Manchu. There is no guarantee that Khitan vowel harmony was like Mongolic or Jurchen/Manchu vowel harmony, but the limited evidence suggests some degree of similarity. So I am skeptical that the name contained an e. However, other alternatives also have problems: e.g., Sargha(a)diń above. A zero-vowel interpretation of 244 results in Sragha(a)diń with an un-'Altaic' (and hence unlikely) initial cluster. The Chinese transcriptions cannot help us, as Liao Chinese had no *se or *sr-, so 撒剌 *saʔlaʔ- could represent Khitan Sar-, Sera-, or Sra-.

A fourth possibility is that the name was Saragha(a)diń with an unwritten first vowel. Were Khitan small script readers able to supply unwritten vowels with the aid of vowel harmony rules? Perhaps 244 was read as s, sa, or se depending on context. In this case, it was read as sa because sr- would be an impermissible initial cluster and sera- would violate vowel harmony.


244-084-254 <s.ra.d> '?' and 251-084 <n.ra> 'tomb'

which Kane read as serad and nera would be read as sarad and nara according to my harmonic hypothesis.

In these cases, the reader would have to look ahead to determine whether the vowels of 244 <s> and 251 <n> would be a or e.

Conversely, readers of the traditional Mongolian script keep previous vowels in mind to disambiguate later vowel letters: e.g., the second vowel letter of


<eja/en> 'lord'

has to be read as e because the first vowel is e. Although a medial a looks exactly the same as a medial e, *ejan would violate vowel harmony.
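Both reading strategies (looking ahead for Khitan, looking back for Mongolian) can be sketched as filters over candidate readings. The Python below assumes Mongolic-style harmony with a back class (a, o, u), a front class (e), and neutral i, and treats any word-initial two-consonant sequence as impermissible; as noted above, there is no guarantee that Khitan harmony worked this way. The romanizations, the function names, and the capital A standing for an ambiguous Mongolian a/e letter are illustrative conventions only.

```python
import re

# ASSUMPTION: Mongolic-style harmony classes; Khitan may have differed.
BACK, FRONT, NEUTRAL = set("aou"), set("e"), set("i")
VOWELS = BACK | FRONT | NEUTRAL


def segments(word):
    """Split a romanized word into segments, treating 'gh' as one consonant."""
    return re.findall(r"gh|.", word)


def is_harmonic(word):
    """True unless the word mixes back and front vowels."""
    vs = {s for s in segments(word) if s in VOWELS}
    return not (vs & BACK and vs & FRONT)


def has_initial_cluster(word):
    """True if the first two segments are both consonants (e.g. sr-)."""
    segs = segments(word)
    return len(segs) >= 2 and segs[0] not in VOWELS and segs[1] not in VOWELS


def read_first_graph(candidates, rest):
    """Look ahead: keep readings of the first graph that give a licit word."""
    return [c + rest for c in candidates
            if not has_initial_cluster(c + rest) and is_harmonic(c + rest)]


def resolve_medial(word, ambiguous="A"):
    """Look back: read an ambiguous a/e letter by the preceding vowel."""
    out, last = [], None
    for ch in word:
        if ch == ambiguous:
            ch = "e" if last in FRONT else "a"
        if ch in BACK | FRONT:
            last = ch  # neutral i does not reset the harmonic class
        out.append(ch)
    return "".join(out)


print(read_first_graph(["s", "sa", "se"], "raghadiń"))  # ['saraghadiń']
print(read_first_graph(["n", "na", "ne"], "ra"))        # ['nara']
print(resolve_medial("ejAn"))                           # ejen
```

Under these assumptions, 244 and 251 can only be read with a: sr- and nr- are rejected as initial clusters, and sera- and nera are rejected as disharmonic.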

*撒剌的 is from the History of the Liao Dynasty. I don't know where Kane (2009: 129) found 撒剌汀.

15.4.4:23:40: WHY <SA> MANY?: PART 1

I have already discussed

(~~) and ,

two of the eight types of Jurchen <sa>-graphs, at length in "Jurchen Polyphony 2", "That Yu-ni- Component", and "Un-<sa>rtain-'tea' ", so I will move on to the third which is only attested in two names:

the surname <sa.hala>* (女真進士題名碑 21)

the personal name <>** (慶源郡女真國書碑 4:2)

Was this <sa> intended solely for use in names, or was it used to write other words absent from the few texts that we have on hand?

*4.5.1:31: Jin Guangping and Jin Qizong (1980: 311) and Jin Qizong (1984: 107) read the second character as xala = my hala. However, the entry for that character in Jin Qizong (1984: 129) lists gal as Jin Guangping's reading and does not include the surname as an example. To confuse matters further, the Chinese transcription of the name is 撒合烈 *saʔhoʔlieʔ with different vocalism that is not harmonic. I would expect something like *saʔhoʔlaʔ.

**4.5.1:35: I cannot explain the nonharmonic sequence -ae. If u and i were neutral vowels, I would expect *udisaa or *udisee. Could the name be of non-'Altaic' origin: i.e., from a language without vowel harmony? But what language would that be? The name is too long to be Chinese.

15.4.3:23:43: WHY <SA> MANY?: PROLOGUE

My previous entry dealt with the mys-'tea'-ry of why the Chinese character 茶 *cha (in Jin Dynasty pronunciation) 'tea' was used as the basis of the Jurchen character

<sa> (not <cha>!)

None of the Jurchen <ca>-characters look like Jin Chinese *cha-characters:

, so far known only in the word <> 'helmet'.

for <ca> elsewhere (with a possible variant in 女真進士題名碑)

(4.4.1:10: There is an obscure Chinese character 𠮮 attested in the Liao Dynasty dictionary Longkan shoujian with the reading *hua, not *cha. The dotted version vaguely resembles Liao/Jin Chinese 吞 *ten.)

I forgot to ask why a 茶 *cha-based character for sa was needed at all given the existence of seven other types of <sa> in the Jurchen large script:


Why wasn't one - or seven - <sa> enough?

15.4.2:23:59: UN-<SA>-RTAIN-'TEA'

Last Friday, I listed Jurchen


as an example of a graph whose reading seemed to be of Liao/Jin Chinese origin: i.e., based on a northeastern dialect of Chinese from the 10th century onward.

<sa> appears to be a derivative of the Chinese character 茶 'tea'.

If Janhunen is right, and if the Jurchen script is derived from the elusive Parhae script, then the readings of Parhae characters would be based on pre-Liao Chinese: e.g., a 茶-based graph would be read as something like *da (< Middle Chinese *ɖæ) or even *ra or *la (< Old Chinese *rla) if a Manchurian tradition of writing went back very far or if northeastern Middle Chinese retained an archaic liquid-initial reading of 茶.

However, <sa> has an initial fricative that matches none of the hypothetical initials of the Parhae scenario or the *ch- of the Liao/Jin Chinese reading of 茶. Nonetheless, I thought the reading <sa> was of Liao/Jin Chinese origin because <s> and *ch- are both sibilants. But why would the creator of the Jurchen script take a Chinese character pronounced *cha and use it to write Jurchen sa?

Hypothesis 1: Because the source Chinese dialect had initial *s- in 茶.

Although Japanese does have the Tō-Sō-on (i.e., post-Middle Chinese) reading sa for 茶 (e.g., in 喫茶店 kissaten 'cafe'), that is not evidence for Jin Chinese *s-, because Japanese lacked an affricate at the time of borrowing, so the s- of sa is an approximation of a Chinese affricate. There are a couple of modern Chinese languages with s- in 'tea', but they are far from the northeast, and their s- might be of recent origin: Qinglong Ping sa and Shitai Wu sʰa.

Hypothesis 2: Because the source Chinese dialect had initial *tsʰ- in 茶.

According to, some modern Mandarin varieties including Beijing (presumably the colloquial accent as opposed to the Beijing-based national standard) have tsʰ- in 茶. There was no tsʰ- in Jurchen, so the Jurchen might have perceived tsʰ- as s-. But the tsʰ- of 茶 might be of recent origin like s-.

Hypothesis 3: Because of a sound change in Khitan.

Jurchen/Manchu has both sh- and c- corresponding to Mongolic c-:

'white': Jurchen/Manchu shanggiyan : Proto-Mongolic *cagaxan (Janhunen 1996: 197)

'army': Jurchen cau(r)-, Manchu cooha : Middle Mongolian ca'ur 'to fight' (Kane 2006: 

My guess is that the sh-forms were borrowed from a nonstandard Khitan dialect (Eastern Khitan?) whose speakers were in close contact with the Jurchen, whereas the c-forms were borrowed from a more prestigious variety of Khitan.

Could Jurchen <sa> be based on a Khitan large script character whose reading shifted from cha to sha due to deaffrication in Eastern Khitan?

There are two problems with this scenario. First, I do not know of any Khitan large script character resembling 茶. The shape of <sa> is either a Jurchen innovation or a carryover from the Parhae script absent in the Khitan large script. Second, the Jurchen character was pronounced sa, not sha.

(4.3.3:10: But maybe this hurdle is not insurmountable, as

seems to have been read as both sa and shang judging from Ming Chinese transcriptions. Was <sa> ever read as sha? Conversely, was 'white' ever sanggiyan in Jurchen? Did Jurchen borrow from three kinds of Khitan dialects: one that retained c, another that weakened it to sh, and yet another that weakened it to s?)

Hypothesis 4: Because of a gap in Jin Chinese phonology.

There may not have been a *sa in Jin Chinese*, so *cha was used as the basis of <sa>**.

But even if Jin Chinese only had *saʔ with a final glottal stop, wouldn't characters with that reading (e.g., 撒薩颯卅) be a better match for *sa than 茶 *cha?

And if the Jurchen script is based on the Khitan large script (according to the mainstream view) or the Parhae script (according to Janhunen), why not carry over an existing character from one of those scripts for sa? Why create a new character for a syllable that probably existed in Khitan and whatever language the Parhae elite spoke***?

*4.3.3:18: Middle Chinese *sa became Liao/Jin Chinese *so.

I do not know whether Mandarin sa < *shai for 洒灑 can be projected back into Jin Chinese. Mandarin sa could be a borrowing from a much later dialect in which *sh- became s-.

**4.3.3:31: This kind of substitution has a weak parallel in the Old Japanese man'yōgana script. Middle Chinese 娑 *sa was a low-frequency character and prone to be misread as its phonetic 沙 *ʂæ (and in fact 娑羅双樹 can be read as shara sōju as well as sara sōju in modern Japanese). Hence the most frequent phonogram for Old Japanese sa was the high-frequency character 佐 *tsaʰ in spite of its initial. (See the frequency statistics in my 1999 dissertation and 2003 book and on Ueshiba Hiroshi's site.)

***4.3.4:06: Janhunen (1996: 152-153) doubted that Koguryo and its Parhae successor state were "likely to have been dominated by ethnic elements that would have been linguistically ancestral to the modern Koreans" and proposed that "they were dominated by people ethnically ancestral to the Jurchen": i.e., Tungusic speakers. Nonetheless the limited linguistic material available from Koguryo points to Koreanic and even Japonic rather than Tungusic.


Guillaume Jacques (2015: 220) wrote what I've been thinking for years now:

In all modern systems of [Old Chinese] reconstruction, *-r- is reconstructed for all syllables with either second division rhyme, chongniu 3 and/or retroflex initials in Middle Chinese. While it has been convincingly demonstrated that clusters in *-r- is indeed one possible origin for these syllables (Yakhontov 1961), there is no definite proof that *-r- should be reconstructed in all cases.

I used to reconstruct a lot of medial *-r- in Old Chinese until 2006 when Zev Handel's "Rethinking the medials of Old Chinese: Where are the r's?" opened my eyes to the possibility of preinitial *r-. Over the years I have wondered if those syllables had even more sources: e.g., in 2012 I wrote that *r- "might be from earlier *l- and/or *t- as well as *r-". Classical Tibetan has preinitial l- and d- as well as r-.

Guillaume's figures confirm my suspicion that there is too much noninitial *r in Old Chinese reconstructions:

As a measure of comparison, over 20% of syllables in Old Chinese as reconstructed by Baxter and Sagart (2014) contain a preinitial or a medial *r, while in Japhug and Tibetan, where consonant clusters including r are attested, we only find respectively 12% and 16% of syllables with non-initial r.

Like Classical Tibetan, Japhug has preinitial l-. Would adding the percentage of syllables with preinitial l- raise 12% and 16% to roughly 20%? Japhug l-syllables are rare and presumably of secondary or external origin (e.g., ld- is from *rl- [Jacques 2004: 314]), as original preinitial *l- became j- (Jacques 2004: 271). Perhaps the total of j-, l-, r-, and -r- syllables in Japhug would reach 20%. Do any Tibetan or Japhug preinitial (*)l- correspond to *-r- in a typical modern reconstruction of Old Chinese?


On Saturday I found this blog post by Mike Aubrey:

I hold, following Mussies that these two clusters [χθ <khth> and φθ <phth>] were pronounced /kth/ and /pth/ in the Hellenistic Period.

Mussies (1971: 51) wrote (with transliteration that I added),

-φθ- <phth> and -χθ- <khth> are misleading orthographies and represent resp. -pth- and -kth-.

Aubrey added,

Non-Alveolar Stop + Aspiration + Alveolar Aspirated Stop [i.e., a sequence like phth or khth] is both difficult to pronounce and also phonologically implausible

I do not know of any modern language that allows such sequences: e.g., in Korean /ph th/ and /kh th/ would be pronounced as [ptʰ] and [ktʰ], not *[pʰtʰ] and *[kʰtʰ]. Similarly in Sanskrit, the rule is to reduce /ChCh/ to /CCh/, though

in the manuscripts, both Vedic and later, an aspirate mute is not seldom found written double—especially, if it be one of rare occurrence: for example (RV.), akhkhalī, jájhjhatī (Whitney 1889: 53; emphasis mine).
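The reduction rule is easy to state over a romanization in which each aspirate is written as a stop letter plus h. This Python sketch deaspirates the first of two adjacent aspirates (ChCh > CCh), turning ophthalmos into opthalmos and Whitney's manuscript spelling akhkhalī into akkhalī. The romanization, the set of stop letters, and the function name are assumptions; Greek accents are ignored.

```python
import re

# ASSUMPTION: aspirates are romanized as stop letter + "h" (ph, th, kh, gh, jh...).
STOPS = "bcdgjkpt"


def reduce_aspirate_sequences(word):
    """Deaspirate the first of two adjacent aspirated stops: ChCh -> CCh."""
    return re.sub(rf"([{STOPS}])h(?=[{STOPS}]h)", r"\1", word)


print(reduce_aspirate_sequences("ophthalmos"))  # opthalmos
print(reduce_aspirate_sequences("akhkhalī"))    # akkhalī
print(reduce_aspirate_sequences("jájhjhatī"))   # jájjhatī
```

If Mussies and Aubrey are right, Hellenistic φθ <phth> and χθ <khth> were pronounced like the output of this rule, not like the input.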

Aubrey found examples of the spelling error πθ <pth> that his theory predicts.

Are there also examples of κθ <kth> as a misspelling for χθ <khth>?

Suppose Aubrey is right. Given that Classical Greek spelling is basically WYSIWYG (what you see is what you get [i.e., pronounce]), why were /pth/ and /kth/ properly spelled as φθ <phth> and χθ <khth> instead of *πθ <pth> and *κθ <kth>? The aspiration of the φ <ph> in ὀφθαλμός <ophthalmós> 'eye' is not etymological, as the final consonant of the root op- < *okʷ- < Proto-Indo-European *ʕʷekʷ- 'eye' is unaspirated. (4.1.0:49: Beekes derived this word from Pre-Greek *okʷt-alʸ-(m-). The resemblance between inherited *okʷ- and substratal *okʷt- is coincidental. In any case, the aspiration of φ <ph> is not original.)

Last night, while writing "Making Machines", I rediscovered Beekes' "Pre-Greek*: The Pre-Greek loans in Greek". Beekes regarded φθ <phth> as a cluster in the substratum language that he calls "Pre-Greek"**. What if Pre-Greek allowed (allophonic***) aspirate sequences that were carried over into Greek**** and reflected in the spelling? Substratum-influenced pronunciations like [pʰtʰ] and [kʰtʰ] may have coexisted with an inherited pronunciation [ptʰ] and [ktʰ]***** for a time. Then the latter came to dominate in speech, though the spellings with double aspirates persisted as the norm.

4.1.1:45: My theory implies that if the earliest Greek speakers had moved to Greece and there had been no one there, the clusters φθ <phth> and χθ <khth> would not exist (unless the double aspiration had been of purely Greek-internal origin), and πθ <pth> and κθ <kth> would have been the only possible spellings.

*4.1.1:07: Beekes used the prefix Pre- to refer to an unrelated substratum language, whereas I generally use pre- (without capitalization) to refer to a largely internal reconstruction of an unattested earlier stage of a language: e.g., pre-Tangut is ancestral to Tangut and not a substratum of Tangut.

I use Proto- to refer to the (potential) result of comparative reconstruction of the ancestor of two or more languages: e.g., Proto-Pumi-Tangut (whose existence is implied by the family tree in Jacques 2014: 2).

However, if I speak of, say, the pre-Japanese languages in the plural, I am referring to multiple substratal languages, not an earlier stage of Japanese such as Proto-Japonic.

It would be nice to have three prefixes to distinguish between the three types of earlier languages: substratal, internally reconstructed, and comparatively reconstructed.

**4.1.0:55: Beekes (2007: 12) noted that φθ <phth> was also possible in inherited words.

Although Beekes did not explicitly list χθ <khth> as a Pre-Greek cluster, it does appear in words he regarded as Pre-Greek: e.g., μοχθέω 'be weary with toil'.

***4.1.1:11: According to Beekes (2007: 5), aspiration was not phonemic in Pre-Greek.

****4.1.1:19: What if Pre-Greek had fricative allophones of stops long before the Greek aspirated stops became fricatives? Pre-Greek fricatives could have been borrowed into Greek as aspirated stops.

*****4.1.1:12: I assume that the Sanskrit constraint against aspirate sequences was also in the speech of those who brought Greek to Greece.

15.3.30:23:49: MAKING MACHINES

Seeing the Spanish word máquina 'machine' made me wonder about the origins of machine and mechanism. Those two words don't sound much alike in English, and their Japanese derivatives don't even look alike, as they are written with different kana:


<ma.shi.n> mashin ~ <ma.shi.-.n> mashīn 'machine', <mi.shi.n> mishin 'sewing machine'


<me.ka.(> meka(nizumu) 'mechanism'

They appear to be from Latin borrowings of the same Greek word from different dialects at different periods:

newer mechanismus < Attic-Ionic mēkhanḗ

The ē of mēkhanḗ is from an *ā preserved in Doric (see below and Sihler 2008: 50).

older māchina < Doric mākhanā́

Latin i is from unaccented short *a (Sihler 2008: 60), so māchina must have been borrowed as *māchana before the *a > i shift.

Watkins (2011: 52) derived mākhanā́ from Proto-Indo-European *māgh-anā (accent unspecified) 'that which enables', with a lengthened-grade form of the root *magh 'to be able', the source of English may and might.

In a Leiden-style reconstruction without *a, would *māgh-anā be something like *mēʕgh-eʕnēʕ with a root *√mʕgh? Is it worth reconstructing so many laryngeals just to avoid *a?

But according to Wiktionary, Robert Beekes of the Leiden school derived the word from a pre-Greek substratum in his etymological dictionary which I haven't seen. Why couldn't mākhanā́ be from *√mʕgh?


Last night I mentioned the pan-Central Asian title

053-051 <qa.gha> 'qaghan'

as an example of a non-Chinese loanword in Khitan. I wonder if its medial -gh- indicates a late borrowing.

In native Khitan words, medial *-gh- and *-b- were lost between the vowels *a and *u:


'hundred' *jaghu > 015 <jau>; cf. Written Mongolian jaghun


'five': *tabu > 029 <tau>; cf. Written Mongolian tabun

This loss enabled the graphs for 'hundred' and 'five' in both the large and small scripts to represent the Chinese loanword


<jau.tau> < 招討 'bandit suppression commissioner'.

How many other Khitan words lost their medial consonants: i.e., how many companions did the commissioner have?

At this point, I don't know whether

- *-gh- (= */g/?*) and *-b- were lost between other vowel sequences

- *-d- was also lost (or became something else**) between vowels

In other words, I don't know the limits of lenition in Khitan. Knowing those limits would enable us to date borrowings: e.g., if *-gh- was lost in the environment *a_a, then qagha must have been borrowed after that loss, just as Liao Chinese 招討 *jautau was borrowed after the loss of *-gh- and *-b- in the environment *a_u in 'hundred' and 'five'.
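The dating argument can be made concrete by treating the loss of *-gh- and *-b- as a rule that applies in a list of vowel frames. In this Python sketch, only the *a_u frame is supported by 'hundred' and 'five'; adding an *a_a frame is purely hypothetical and shows what would follow if that frame were real: a form that still has its -gh- (like qagha) would have to have entered Khitan after the change. The function name and romanizations are illustrative assumptions.

```python
import re


def lenite(word, environments=(("a", "u"),), consonants=("gh", "b")):
    """Delete the given consonants between the vowels of each (left, right) frame.

    The default frame *a_u is attested ('hundred', 'five'); any other frame
    passed in is HYPOTHETICAL.
    """
    alternation = "|".join(consonants)
    for left, right in environments:
        word = re.sub(f"(?<={left})(?:{alternation})(?={right})", "", word)
    return word


print(lenite("jaghu"), lenite("tabu"))            # jau tau
print(lenite("qagha"))                            # qagha (frame *a_a not included)
print(lenite("qagha", (("a", "u"), ("a", "a"))))  # qaa, if *a_a were also a frame
```

A word that survives with its medial consonant intact in a frame where the rule applied must postdate the rule, or have had that consonant restored or created later.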

Qagha is certainly not native to Khitan, but what about words which might have -aghu- and -abu- sequences*** such as

189-151-123-348 <>**** (興宗 28.14) and 189-196-222 <a.bu.ń> (興宗 31.4)

Are these loanwords? If not, have their intervocalic obstruents been restored by analogy? Or are they of secondary origin from earlier clusters***** (e.g., *ambu > abu) or a lost series of obstruents****** (e.g., *au > abu but *abu > au)?

*3.30.0:53: In pre-Khitan, *gh and *g might have been allophones of */g/ appearing before different vowels: */ga/ was *gha, */ge/ was *ge, etc. In any case, gh and g were distinct phonemes in Khitan because /ga/ was possible in Chinese loans (like Manchu g'a):

Pre-Khitan > Khitan [IPA]:
*/ga/ > /gha/ [ʁɑ]
*/ge/ > /ge/ [gə]
(no pre-Khitan [gɑ]) > /ga/ [gɑ] (in Chinese loans)

**3.30.0:30: In Korean according to Alexander Vovin (2010), medial *-p-, *-t-, *-s-, and *-k- lenited to Middle Korean -β-, -r-, -z-, and -ɣ- which became -w/Ø-, -r-, -Ø-, and -Ø- in modern Korean. If Khitan was like Korean, then pre-Khitan *-d- might have lenited to a liquid in intervocalic position. But so far I have not seen any evidence for coronal lenition in Khitan. There was no z in Khitan, so if pre-Khitan *-s- lenited, it must have become something else.

***3.30.0:57: The rules for determining whether a graph was pronounced as VC or CV are still unknown, so perhaps those two blocks were read aughare or aubiń: i.e., without -gh- or -b- between a and u. Still other readings are possible since 123 may have been ra as well as ar, and 222 was ńi as well as (i)ń.

****3.30.1:05: If Khitan had Mongolian or Manchu-like vowel harmony, an e would not be expected in a word with a. Could *a be reduced to e [ə] in unaccented positions?

*****3.30.1:08: This was inspired by Vovin's derivation of Middle Korean intervocalic stops from earlier clusters which were mostly *nasal-stop sequences.

******3.30.1:15: In this scenario, voiced aspirates and voiced nonaspirates had distinct reflexes in intervocalic position but might have merged in other positions: e.g., *b(ʱ)- > b-, etc.


To Chinese eyes, the Khitan large script at first appears to be a random mix of Chinese characters and alien shapes.

Given that the Khitan large script is said to have been 'invented' c. 920 using the Chinese script as a model, one might expect it to be something like the modern Japanese script in which Chinese loans are generally written with Chinese characters and kana almost always represent non-Chinese words*:

Khitan large script characters resembling Chinese characters : Chinese loanwords

Khitan large script characters not resembling Chinese characters : native Khitan words

However, the reality is more complex:

Khitan large script characters resembling Chinese characters :

Chinese loanwords

e.g., 皇帝 (looks like Liao Chinese *hongdi 'emperor') for Khitan hongdi 'id.'

and native Khitan words

e.g., 五 (looks like Liao Chinese *ngu 'five') for Khitan tau 'id.'

Khitan large script characters not resembling Chinese characters :

native Khitan (or at least non-Chinese**) words

e.g.,  doro (?) 'seal'

and Chinese loanwords

e.g., gün 'army' for Liao Chinese 軍 *gün 'id.'

One could also hypothesize that Chinese character lookalikes were used to write Khitan syllables that had (near-)homophones in Chinese, whereas nonlookalikes were used to write non-Chinese Khitan syllables and words with un-Chinese segments and phonotactics: e.g., Khitan iri 'name' with an un-Liao Chinese -r-.

But in fact, syllables shared by Khitan and Chinese were sometimes written with nonlookalikes:

e.g., for ai (why not write it with a lookalike of Liao Chinese *ai-graphs like 愛?)

And syllables and words with un-Chinese elements were sometimes written with lookalikes:

e.g., 午 (looks like Liao Chinese *ngu 'horse (calendrical)') for Khitan iri 'name'

Did the creator(s) of the Khitan large script take the Chinese script as used in the early 10th century, keep random characters, change the sound values of some of them, and then make up new characters?

One might come up with such an explanation for Cyrillic: its inventors took the Latin alphabet, kept some letters (e.g., А), changed the sound values of some of them (e.g., В for [v] instead of [b]), and then made up new characters (e.g., Б for [b] and Г for [g]). However, that is not what happened. Both the Cyrillic and Latin alphabets are derived from the Greek alphabet. They are sisters, not daughter and mother.

If Janhunen (1994, 1996) is correct, the Khitan large script is to the Chinese script what Cyrillic is to Latin. Like Cyrillic, the Khitan large script was not invented on the spot; it was an adaptation of an existing script: the Parhae script, a Manchurian offshoot of the early Chinese script. The following seven Khitan large script characters might then be inherited from the Parhae script rather than taken from the 10th century Chinese script:

Sinograph Liao/Jin Chinese Khitan large script Khitan Jurchen large script Jurchen
*ho (< Middle Chinese *ɣɑ) ha ha
*she (< Middle Chinese *ɕjæˀ) ? sha
*sien (< Old Chinese *sˁir < *sˁər) ? shira or shïra
*gung ? (no similar Jurchen character) (*gung***)
gung gung
*ong (< Old Chinese *ɢʷaŋ) ong ong

Janhunen then proposed that the Jurchen large script was another derivative of the Parhae script rather than a direct successor of the Khitan large script.

Let's suppose the conventional wisdom is correct and that the Jurchen large script was invented c. 1120 with the then-current Chinese script as a model. Why was Jin Chinese 公 *gung 'duke' written with Jurchen 王, a lookalike of the characters for Jin Chinese *ong 'prince' and Khitan ong 'prince'?

Jin Guangping and Jin Qizong (1980: 56) proposed that Jurchen 王 gung was derived from Jin Chinese 工 *gung 'work' with an added stroke. Why not just copy 公 or 工?

Here is a wild speculation. In Old Chinese, 王 was pronounced *ɢʷaŋ. In mainstream Chinese, *ɢʷ- weakened to *w-, and later *waŋ became -ong in the northeast. What if a now long-extinct Manchurian Chinese dialect retained a stop initial for 王? Then perhaps 王 had two readings in Parhae: *gung, based on the colloquial stratum of Manchurian Chinese, and *ong, based on a literary stratum borrowed from mainstream Chinese. The first reading would be the source of the Jurchen reading, and the second would be the source of the Khitan reading.

3.29.0:34: I am skeptical of the stop-retention scenario because there is no other evidence for *ɢʷ- surviving as a stop at such a late date in the northeast or anywhere else. Nor is there any evidence for *-ʷaŋ becoming *-ung in the northeast.

3.29.0:46: The Jurchen characters for ong resemble those for ja (see my previous entry) with two extra strokes on top.

However, Jin Qizong (1984: 236) regarded the ong-graphs as derivatives of the Khitan small script character 071 <ong>.

How would Janhunen explain that resemblance? Do the Jurchen large script and Khitan small script characters both go back to a Parhae prototype? Could the Jurchen character retain a 'roof' lost in the Khitan small script character?

*3.29.0:57: Although there is a strong tendency to write Chinese loans with Chinese characters in Japanese, some Chinese loans are in kana: e.g., サンゴ sango 'coral' (instead of 珊瑚).

Furthermore, Chinese characters do not always represent Chinese loans. In many cases they represent native Japanese words: e.g., 薔薇 for bara 'rose' as well as the much rarer borrowings shōbi and sōbi.

**3.29.1:01: Not all non-Chinese words in Khitan are native: e.g., 053-051 <qa.gha> 'qaghan' may ultimately be of Xiongnu origin. (Has this word been identified in the large script?)

***3.29.1:14: Jin Qizong read two different Jurchen characters as gung (in my notation), so in theory either could have transcribed Jin Chinese 工 *gung 'work'.

However, the second is only attested as a transcription of 宮 'palace', which was transcribed in Khitan as

334-019-345 <g.iu.ung>.

So I suspect that the two Jurchen characters originally represented two different syllables, gung and giung, which merged into gung in the Yuan Dynasty Old Mandarin dialect of the Zhongyuan yinyun but not in Phags-pa Chinese, where they are still distinct as ꡂꡟꡃ <> and ꡂꡦꡟꡃ <>.


When I first became interested in Jurchen, I assumed that its (large) script was "obviously derived from the Chinese script and the Khitan large script, with many innovations of its own" (Kane 1989: 21).

Then I discovered Janhunen's (1994: 114) hypothesis, which I still regard as plausible after almost twenty years:

It was the other Sinitic script [of Parhae] that, due to its firm local [i.e., Manchurian] roots, was later transmitted first to the Khitan, and then to the Jurchen. All of this means that the conventional view, according to which the Jurchen script was successive to the Khitan «large» script, cannot be correct. As graphic systems, and heirs of the Bohai [= Parhae] script, the Khitan and Jurchen «large» scripts should be viewed as parallel, rather than successive developments.

There is much more to Janhunen's argument than that, but for now I want to focus on one of its implications. If the Khitan and Jurchen large scripts are offshoots of the Parhae script developed at some point prior to the end of the Parhae state in 926, then the readings of their Chinese-based elements are likely to reflect pre-10th century Chinese phonology to some extent. Such a scenario has a precedent in Old Japanese man'yōgana, whose readings contain archaisms from the Chinese learned by the Paekche centuries earlier, e.g.:

支 for Old Japanese ki < *ki and *ke is closer to Late Old Chinese *kie than Middle Chinese *tɕie

止 for Old Japanese is closer to Old Chinese *təʔ than Middle Chinese *tɕɨəˀ

(But Gerald Mathias views 止 as a kungana whose reading is based on Old Japanese töma- 'stop' [my təma-]; if so, then the resemblance to Old Chinese is coincidental.)

富 for Old Japanese is closer to Late Old Chinese *puəh than Middle Chinese *puʰ

Conversely, if the Khitan and Jurchen large scripts had no deeper roots, the readings of their Chinese-based elements should be derivable purely from Liao and Jin Chinese, as there would be no way for their creators to know about earlier readings.

Jin Guangping and Jin Qizong (1980: 56-57), Kane (1989: 23), and Kiyose (2004: 93) list Jurchen characters* with readings as well as shapes of Chinese origin**:

Jurchen Jurchen reading Sinograph Liao/Jin Chinese*** Middle Chinese Old Chinese
aci *ci *tɕʰiek *tɯ-qʰjak
ging *ging *kɨeŋ *Cɯ-qraŋ or *qɯ-raŋ
gung *gung *koŋ *koŋ
hi *si *sej *sʌ-ləj
i *u < *wuo *Cɯ-ɢʷa
i *u < *wuoˀ *Cɯ-waʔ
ja *jr *tɕi < *tɕɨʰ *təs
ki *ki *gɨ *gə
ngu *ngu *ŋo *ŋʷa
sa *cha *ɖæ *rla
u *ngu *ŋoˀ *ŋaʔ
dai *da(i) *dɑjʰ *lats
fu < pu *fu *fu < *puoˀ *poʔ
jul *ju *tɕu < *tɕuo *Cɯ-to
shang *shang *ɕɨaŋ < *dʑɨaŋˀ *Cɯ-daŋʔ or *Nɯ-taŋʔ
tai *tai *tʰɑjʰ *l̥ats
ha *ho *ɣɑ *ɢaj
sha *she *ɕjæˀ *l̥jaʔ
shira (Kiyose) or shïra (Jin and Jin) *sien *sen *sˁir < *sˁər < *Cʌ-sər

Out of that incomplete sample of nineteen characters,

- eleven have readings based on Liao/Jin Chinese (green)

- five have readings that could be based on either Liao/Jin Chinese or Middle Chinese (bluish green)

- two have readings that resemble Middle Chinese (blue)

- at least one has a reading that resembles Old Chinese (yellow)

I'll discuss a less likely instance in my next entry.

The last three characters (which all have Khitan large script predecessors that look exactly like Chinese 何舍先) are hardly solid proof for Janhunen's hypothesis.

The Khitan and Jurchen may have used Liao/Jin Chinese 何 *ho for ha in their languages because there may not have been a character for *ha in Liao/Jin Chinese. (The only character read ha in the Phags-pa Chinese of the Yuan Dynasty is rare: 閜.)

Nonetheless, the other two are difficult to explain if they were devised c. 1120 or perhaps even c. 920. Why write Jurchen sha with a derivative of Jin Chinese 舍 *she when Jin Chinese 沙 *sha was a closer phonetic match? And is the close match of Jurchen shira ~ shïra and Old Chinese *sˁir < *sˁər just a coincidence?

*3.28.2:50: Since this post does not deal with the Jurchen small script, I will refer to Jurchen large script characters simply as Jurchen characters.

**3.28.2:58: There are Jurchen characters with shapes of Chinese origin and native readings that are translations of Chinese: e.g., the character that looks like Jin Chinese 一 *i 'one' but represented the native Jurchen word emu 'one'.

***3.28.3:15: I wrote Liao/Jin Chinese forms in an orthography resembling my transcriptions of Khitan and Jurchen to facilitate comparison. Khitan and Jurchen voiced obstruents may have been unaspirated and voiceless: e.g., Jurchen jul may have been [tɕul], a close match for Middle Chinese 朱 *tɕu(o).

15.3.26:23:49: QUINTUP-<UL> TROUBLE (PART 3)

In part 1, I proposed that Khitan small script character 366 might have represented <ül> because 131-366 <u.?> 'winter' corresponds to Written Mongolian ebül 'id.'

In a generic 'Altaic' language, harmonic rules prevent the mixture of segments from two classes, which I will call A and B*: e.g.,

class A: a, u, ł, ɣ ...
class B: e, ü, l, g ...

'Neutral' segments can occur with segments of either class A or B.

Hence <ül> should be a class B character that only co-occurs with class B and/or neutral characters within a Khitan small script word block.

I used to think that 098 and 261 represented class A <ał> and class B <(e)l>, but in fact they coexist not only with each other but even with 366 in

340-098-366-261-349-021 <x.ał.üó>** (興宗 26.6)

(021 <mó> looks like an error for the dotless verb ending 020 <ei>)

which is unexpected from an 'Altaic' perspective. I would have expected

class A *130-098-206-098-051-122 <x.ał.uł.ał.ɣ>*** or class B *340-261-366-261-349-020 <x.el.ü>.

366 can also coexist with both class A 051 <ɣa> and class B 349 <ge> in the same text (道宗):


161-366-261-051-189-123 <aú.ül.el.ɣ> (道宗 12.30)

(instead of 161-206-261-051-189-123 *<aú.uł.ał.ɣ>)

and 131-097-372-366-334-140 <u.úr.û.ül.g.en> (道宗 18.6)

That would also be unusual for an 'Altaic' language.
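The co-occurrence rule can be stated mechanically: a word block is harmonic if all of its non-neutral characters belong to one class. The Python sketch below is only an illustration, assuming the tentative class assignments floated in this post (098, 051, and 206 as class A; 261, 349, and 366 as class B; 130 and 340 as the <x> characters of the expected class A and class B spellings). Those assignments are exactly what is in doubt, and treating every unlisted character as neutral is a further simplifying assumption. The names TENTATIVE_CLASS, harmony_classes, and is_harmonic are hypothetical.

```python
# Hypothetical sketch: tentative harmonic classes for a few Khitan small
# script characters, as discussed in this post. Any character not listed
# here is treated as neutral, which is itself an assumption.
TENTATIVE_CLASS = {
    "051": "A",  # <ɣa>
    "098": "A",  # <ał>
    "130": "A",  # <x> in the expected class A spelling
    "206": "A",  # <uł>; assumed class A (flanked by a-characters in 'hare')
    "261": "B",  # <(e)l>
    "340": "B",  # <x> in the expected class B spelling
    "349": "B",  # <ge>
    "366": "B",  # <ül>, if the reading proposed in part 1 is right
}


def harmony_classes(block: str) -> set:
    """Return the set of non-neutral classes in a hyphenated word block."""
    return {TENTATIVE_CLASS[c] for c in block.split("-") if c in TENTATIVE_CLASS}


def is_harmonic(block: str) -> bool:
    """A block is harmonic if it does not mix class A and class B characters."""
    return len(harmony_classes(block)) <= 1


for block in [
    "340-098-366-261-349-021",  # 興宗 26.6 (West's reading)
    "130-098-206-098-051-122",  # expected class A spelling
    "340-261-366-261-349-020",  # expected class B spelling
    "161-366-261-051-189-123",  # 道宗 12.30
]:
    print(block, sorted(harmony_classes(block)), is_harmonic(block))
```

Under these assumptions the check flags both attested spellings (興宗 26.6 and 道宗 12.30) and passes both expected spellings. Reclassifying 366 as neutral would not rescue either attested spelling, since 098 or 051 would still co-occur with class B 261; the conflict would shift to 261 rather than disappear.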

I am conflicted.

On the one hand, Khitan has sets of suffixes implying the presence of an 'Altaic'-style harmonic system: e.g., the causative-passive suffixes (class A?) and (class B?) in the above pair of words.

On the other hand, there seem to be harmonic violations. Are those violations artifacts of incorrect class assignments (e.g., is 366 a neutral character?), or are they real and perhaps even predictable?

The earliest known small script text is dated 1053, over a century after the invention of the small script c. 925. Do all the small script texts discovered so far reflect Khitan after its harmonic system began to break down? Would the very first texts in the small script have more harmonic spellings?

*3.27.2:13: I got the A/B terminology from E. G. Pulleyblank, who used it to describe Old Chinese syllable types. Norman (1994) was the first to draw parallels between Old Chinese and Altaic syllable types. I have gone even further and proposed harmony rules for Old Chinese.

I use the terms A and B to avoid specifying the nature of the classes: e.g., front vs. back, ±RTR, etc. As Khitan is in the Manchurian linguistic area, I suspect it had RTR harmony like its neighbor Jurchen.

**3.27.2:17: This is Andrew West's reading. Qidan xiaozi yanjiu has

340-067-366-261-349-020 <x.eü.ü>

which is not only harmonic but also has the dotless verb ending 020 <ei> instead of dotted 021 <mó>, which is not a verb ending. I have not seen the handwritten copy of 興宗, and the original stele is inaccessible, so I do not know who is correct.

***3.27.2:24: I assume 206 is a class A character since it is flanked by a-characters in

029-206-189 <tau.uł.a> 'hare'.

15.3.25:23:59: QUINTUP-<UL> TROUBLE (PART 2)

In part 1, I built upon Aisin Gioro's work by equating the following five Khitan small script characters and regarding the first three as variants of each other:

  013 <ul> = 050 <ul> = 206 <ul> = 228 <ul> = 366 <ul>

The second and third appear in the same word:

050-131-206 <ul.u.ul> (道宗 16.21, 20.13 [1101 AD], 蕭仲恭 33.33 [1150 AD])

Did scribes of two different inscriptions nearly fifty years apart really use two variants so close together in three instances, or did 050 and 206 have two different readings?

3.26.1:10: Was 050-131-206 for ulul (?) above related to (or at least partly homophonous with)

050-131-366-311-162 <ul.u.ul.b.c> (宣懿 18.2 [also 1101 AD])

050-131-366-311-222 <ul.u.ul.b.ń> (道宗 10.25, 15.19, 28.24, 宣懿 17.11)

which have 366 instead of 206 for their second <ul>? Or did 206 and 366 have different readings?

Tangut fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2014 Amritavision