In my last post, I remarked upon the similarity of Tangut


0645 2wuq1 < *Sʌ-ʔwə/oH 'to aid'

to the Sino-Korean reading 우 u for 祐 'to aid'. I considered and rejected the possibility that the Tangut and Chinese words were cognates: i.e., inherited from Proto-Sino-Tibetan.

But I didn't consider yet another possibility: could the Tangut word be a borrowing from Chinese? That would explain the similarity between 2wuq1 and Sino-Korean u: they were both borrowed from roughly contemporary varieties of Chinese. 2wuq1 looks like Edwin G. Pulleyblank's Early Middle Chinese 祐 *wuwʰ (= my *wuʰ) and Tangut period northwestern Chinese (TPNWC) *3wu3.

However, "looks" does not mean "sounds". My w- is [ʔw], not a true [w] like Pulleyblank's *w-. Middle Chinese *w- corresponds to Tangut v- ([v]? [ʋ]?), not w- [ʔw] in Gong's list of Chinese loans in Tangut (2002: 407-408):


0403 1von1 : 王 *wɨaŋ 'the surname Wang'


2340 1von1 : 旺 *wɨaŋʰ 'bright'

I wrote "corresponds" because 'Middle Chinese' is a Platonic entity distinct from whatever northwestern dialect the Tangut were in contact with.

On the other hand, Gong's list of Tangut transcriptions of Chinese in the Forest of Categories (2002: 436-437, 444-445) shows vacillation between v- and w- for Chinese *w-syllables (correspondence types A and E). That seems to imply that the Tangut lacked a simple initial [w]: they could only approximate a Chinese initial [w] with either v- ([v]? [ʋ]?) or w- [ʔw].

Homophones B chapter and homophone group
Li Fanwen number
Tangut reading
Tangut period NW Chinese
Early Middle Chinese
Correspondence type
𗍁  II 1

*wɨejʰ A: v- : w-
II 2


𘍵 II 3



II 9



𗍾  II 9



II 26


B: Ø- : w-
*1hun3 *wuŋ C: h- : w-
𗭴 VIII 5087
*wɨaŋ B: Ø- : w-
𗇝 VIII 4689

*wɨet D: yw- : w-
𗫖 VIII 2094

E: w- : w-
𗤭 VIII 3128


𗨂 VIII 3685

*wɨep B: Ø- : w-
VIII 3628
*wɨen F: gh- : w-
*2/3wen3 *wɨenˀ/ʰ

*wuŋ C: h- : w-

There are also four other types of correspondences:

B: Tangut Grade IV Ø-syllables may have begun with [j], and Chinese Grade III *w- may have become [ɥ], a glide absent from native Tangut words. (But see correspondence D below.)

C: Unique to transcriptions of 雄 *1hun3 (for †1wun3) which must have developed the same irregular fricative found through much of Chinese: e.g., Cantonese hoŋ and Mandarin xiong < *hjuŋ.

D: Tangut ywa [ɥa] is a special rhyme in the readings of only three characters:

𗇝 4689 1ywa4 'glittering'

𗇜 5014 1ywa4 'to go fast; quick' (only attested in the Tangraphic Sea dictionary)

𗮞 5099 1shywa3 'transcription character for Sanskrit śva'

The first two words may be borrowings from 'Tangut B', the non-Sino-Tibetan language that I think is the source of much Tangut vocabulary and possibly even reflected in the structure of the more obscure characters.

F: Tangut ghw [ɣw] might have been an attempt to approximate Chinese [w] without the initial stop of Tangut w- [ʔw]. ghw- is from Gong's reconstruction; it corresponds to w- in Sofronov and Nishida's reconstructions converted into my notation. If Sofronov and Nishida are right, the use of 3628 is simply another instance of correspondence E.
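
The six correspondence types can be kept straight with a small lookup. This is only a sketch: the labels A-F and the initials come from the table above, but the function names are made up for illustration and the list passed to `tally` is a hypothetical sample, not Gong's data.

```python
# Tangut initials used to transcribe Chinese *w-, keyed to the
# correspondence type letters used in the table above.
CORRESPONDENCE_TYPES = {
    "v": "A",
    "Ø": "B",
    "h": "C",
    "yw": "D",
    "w": "E",
    "gh": "F",
}


def correspondence_type(tangut_initial: str) -> str:
    """Return the type letter for a Tangut initial such as 'v-' or 'Ø-'."""
    initial = tangut_initial.rstrip("-")
    if initial not in CORRESPONDENCE_TYPES:
        raise ValueError(f"no correspondence type recorded for {tangut_initial!r}")
    return CORRESPONDENCE_TYPES[initial]


def tally(initials):
    """Count how often each correspondence type occurs in a list of initials."""
    counts = {}
    for initial in initials:
        letter = correspondence_type(initial)
        counts[letter] = counts.get(letter, 0) + 1
    return counts


# Hypothetical sample, only to show the shape of the result.
print(tally(["v-", "w-", "v-", "Ø-"]))
```

A tally like this over the real transcriptions would show at a glance which rhyme types favor v- over w-.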

Given that TPNWC 右 *2wu3, the phonetic of TPNWC 祐 *3wu3, was transcribed in Tangut as both 1vi3 and 2ew4, I would expect TPNWC 祐 *3wu3 (or an earlier Early Middle Chinese *wuʰ) to have been borrowed as †1vi3 or †2ew4 with initial †v- or †Ø-, not 2wuq1 with initial w-. However, the existence of correspondence pattern E (Tangut w- [ʔw] : Chinese *w-) weakens an initial-based argument against a borrowing scenario. Still, pattern E is not attested with the rhyme type of 右 and 祐. That may suggest that w- [ʔw] was inappropriate for 右 and 祐 even though it was appropriate for TPNWC 雲 *1wun3 and 員 *1wen3. TPNWC *w- could have had different allophones before different rhymes.

As I will explain in part 2, I think the rhyme of


0645 2wuq1 'to aid'

may even more strongly rule out a borrowing scenario.

THE PREHISTORY OF TANGUT 2WUQ1 'TO AID'

When looking at Andrew West's post about a Tangut hand mirror with the character


0645 2wuq1

which he translated as 祐 'to aid', it occurred to me that 2wuq1 sounds like 우 u, the Sino-Korean reading of 祐. (The Yale romanization¹ of the reading is visually even closer - wu!)

That makes the Tangut word easier to learn. I try to take advantage of soundalikes whenever I can. But are the two forms related? I don't think so, because 2wuq1 is from Pre-Tangut *Sʌ-ʔwə/oH, whereas u is ultimately from Old Chinese *wəʔ(-s).

1.13.13:49: Commentary on the reconstructions


T0. The only remotely similar words I know of are Old Chinese Pa-type words (my ignorance of the rest of Sino-Tibetan is showing):

扶 *Cɯ.P(r)a > *bɨa 'to help'

輔 *Cɯ.P(r)a-ʔ > *bɨaʔ 'to help'

*Cɯ.P- may have fused into *b-: *N-p- > *m-p- > *m-b- > *b-. Another possibility is that *-P- was *-b-, and that *C- has left no trace: *Cɯ.ba > *Cɯ.bɨa > *bɨa.

輔 may be a *ʔ-suffixed variant of 扶.

The presyllabic and main vowels don't match.

There is no guarantee that Tangut -w- is from a lenited stop *P.

A medial *-r- cannot be ruled out; if it existed, it corresponds to nothing in Tangut.
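
The fusion chain *N-p- > *m-p- > *m-b- > *b- in T0 can be replayed step by step. This is a minimal sketch, not a claim about the actual mechanism. The comment on each stage is only one possible reading, and the input `N-pa` is a schematic form without the presyllabic vowel.

```python
# Each stage of the chain as a (before, after) pair of word-initial strings.
CHAIN = [
    ("N-p", "m-p"),  # one reading: the nasal prefix takes on the labial place of *p
    ("m-p", "m-b"),  # one reading: the stop voices after the nasal
    ("m-b", "b"),    # one reading: the nasal prefix is lost
]


def apply_chain(form: str) -> list[str]:
    """Return every stage a form passes through, starting with the input."""
    stages = [form]
    for before, after in CHAIN:
        current = stages[-1]
        if current.startswith(before):
            stages.append(after + current[len(before):])
    return stages


print(" > ".join(apply_chain("N-pa")))
```

The alternative scenario in T0 (*-P- was already *-b- and *C- left no trace) would need no chain at all, which is one reason the two are hard to tell apart.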

T1. Pre-Tangut *S- conditions Tangut vowel tenseness that I indicate in my notation as -q.

T2. Pre-Tangut *-ʌ- conditions the grade of the Tangut syllable (-1). The phonetic value of -u1 was (partly) lower than [u]: e.g., [ou]. *-u (< *-ə or *-o) lowered to harmonize with the height of unaccented *-ʌ- which was later lost.

T3. I have projected Tangut [ʔw] (w- in my notation) back into pre-Tangut. But I suspect that at the pre-Tangut stage there was a sequence *-CVP- that was compressed into Tangut [ʔw]. Pre-Tangut *-ʌ- could have been in that sequence: e.g., *S(ʌ).Cʌ.PəH.

T4. The pre-Tangut vowel could be either *ə or *o; both merged into -u1 (Jacques 2014: 206).

T5. Pre-Tangut *-H is a laryngeal that conditioned Tangut tone 2 which I write Arakawa-style at the beginning of my notation. *-H could correspond to Old Chinese *-ʔ-s. My assumption that Tangut tones originated Chinese-style from laryngeals could be wrong; they may preserve Proto-Sino-Tibetan tones or have some entirely different origin. But if Tangut and Chinese developed similar grade systems (possibly via contact), they might have developed tones in similar ways as well.
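
Points T1-T5 can be combined into a toy derivation of the notation 2wuq1. This is a sketch under loud assumptions. It covers only the conditions named above (*S-, *-ʌ-, *ə/*o, *-H), and treating the absence of *S- as lax and the absence of *-H as tone 1 is a simplification made here, not something argued above.

```python
# Pre-Tangut initials and how they are written in the notation used here.
INITIAL_NOTATION = {"ʔw": "w"}


def derive_tangut(has_s_prefix: bool, presyllabic_vowel: str, initial: str,
                  main_vowel: str, has_final_h: bool) -> str:
    """Build a Tangut reading (tone + syllable + grade) from pre-Tangut parts."""
    if presyllabic_vowel != "ʌ":
        raise NotImplementedError("only *-ʌ- (T2: grade 1) is covered")
    if main_vowel not in ("ə", "o"):
        raise NotImplementedError("only *ə and *o (T4: merged into -u1) are covered")
    tone = "2" if has_final_h else "1"       # T5; tone 1 otherwise is an assumption
    tenseness = "q" if has_s_prefix else ""  # T1; lax otherwise is an assumption
    onset = INITIAL_NOTATION.get(initial, initial)  # T3: [ʔw] is written w-
    return f"{tone}{onset}u{tenseness}1"     # T2 and T4: vowel -u, grade 1


# *Sʌ-ʔwəH and *Sʌ-ʔwoH both come out as 2wuq1.
print(derive_tangut(True, "ʌ", "ʔw", "ə", True))
print(derive_tangut(True, "ʌ", "ʔw", "o", True))
```

The two calls show why the main vowel has to be written *ə/o: the output does not distinguish them.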

Old Chinese

C0. 祐 *wəʔ(-s) 'to assist' belongs to a large word family discussed at length in Schuessler (2007: 581-582). Schuessler reconstructs a Proto-Sino-Tibetan root *wəs. I don't know how he would account for the *-ʔ in the Old Chinese members of the family.

On the other hand, Matisoff (2003: 327, 591) relates 佑 (another spelling of 祐) to his Proto-Tibeto-Burman *grwak 'friend/assist'. The rhyme might work: Matisoff's Proto-Tibeto-Burman *a can be from Proto-Sino-Tibetan *ə. See Schuessler (2007: 31-32) on Tibeto-Burman -k corresponding to Old Chinese *-ʔ. I don't believe in 'Tibeto-Burman' except as a convenient term for 'non-Chinese Sino-Tibetan', so 'Tibeto-Burman' here is to be taken as the latter.

As for *gr-, see C1-C2 below.

C1. Baxter and Sagart reconstruct the 祐 word family with *[ɢ]ʷ-. The brackets indicate 'either *ɢʷ-, or something else that has the same Middle Chinese reflex as *ɢʷ-' (wording based on Baxter and Sagart 2014: 8): e.g., *N-qʷ- or *m-qʷ-. *ɢʷ- does look like Matisoff's Proto-Tibeto-Burman *g- (see C0 above). But I am suspicious of it - there is no Chinese-internal evidence that there was ever a stop in this word family. And Baxter and Sagart's system has no simple *w- which is what Schuessler and I reconstruct instead of *ɢʷ- in this word family.

C2. There could not have been an *-r- in this word because *wrəʔ-s (or Baxter and Sagart's *[ɢ]ʷrəʔ-s) would have become Middle Chinese †wiʰ (cf. 鮪 *wrəʔ / *[ɢ]ʷrəʔ > *wiˀ 'sturgeon'²), not *wuʰ (> Sino-Korean 우 u).

¹Samuel E. Martin designed the Yale romanization of Korean to be typeable on a standard US keyboard, so it has no diacritics or nonbasic Latin letters. w distinguishes labial wu [u] from nonlabial u [ɯ] (= ŭ in the modified McCune-Reischauer romanization of Korean on this site).

²The Sino-Korean reading of 鮪 should be †위 wi, but in fact it is 유 yu, presumably by analogy with 유 yu, the reading of the far more common character 有 'to exist'. There would be few opportunities to use 鮪 in Korean; the Korean word for 'sturgeon' is 鐵甲상어 chhŏlgapsangŏ 'iron armor shark'.

The suffix -ngŏ 'fish' is from Middle Chinese 魚 *ŋɨə, but in hangul it is written as < Ø.ŏ> across two syllables, so it is not associated with 魚 since character readings always only occupy single hangul blocks.

상어 sangŏ 'shark' is from Middle Chinese 鯊魚 *ʂæ ŋɨə, though it cannot be written as 鯊魚 in Korean since its parts do not correspond to syllable blocks/Sino-Korean readings:

Sinographs (aligned with Middle Chinese)
Middle Chinese
Korean (transliterated)

Sino-Korean (transliterated) s
Sinographs (aligned with Sino-Korean)

The Sino-Korean readings of 鯊 and 魚 are 사 sa and 어 ŏ, so 鯊魚 is read as saŏ. I suspect that sangŏ is an old borrowing from spoken Middle Chinese, whereas saŏ is a literary Korean creation combining the isolated readings sa and ŏ (< ngŏ).

A TR-OUBLING TR-ANSCRIPTION

蔡同榮 Chai Trong-rong passed away five years ago today. At first his name might look Vietnamese because of its tr, a letter combination not used in romanizations of the other major East Asian languages. However:

- Chai is not a Vietnamese surname. It is not even a possible Sino-Vietnamese syllable.

- Vietnamese names are usually made up of Sino-Vietnamese elements, and rồng 'dragon' is not one of them; it is a loan from Late Old Chinese 龍 *roŋ, but it is not Sino-Vietnamese in the strict sense: i.e., it is not the reading of 龍 which is long, a much later loan which may postdate rồng by a millennium.

- The Sino-Vietnamese reading of 蔡同榮 is Thái Đồng Vinh which is quite different from Chai Trong-rong.

- Nothing in Chai's background - beginning with his childhood in Japanese-ruled colonial Taiwan - points to a Vietnamese connection. (At first I thought he might be a Vietnamese immigrant to Taiwan. But he is ethnically Taiwanese.)

The tr doesn't match anything in the other forms of the name listed at Wikipedia:

Mandarin (IPA: [tsʰaj˥˩ tʰʊŋ˧˥ ɻʊŋ˧˥]):

Pinyin: Cài Tóngróng

Wade-Giles: Tsài Tóngróng (sic; the correct Wade-Giles is Tsʻai⁴ Tʻung²-jung²)

Tainan Taiwanese Hokkien (IPA: [tsʰwa˥˩ tʰɔŋ˧ ʔeŋ˨˦]):

Pe̍h-ōe-jī: Chhòa Tông-êng

Then it occurred to me that Trong has the same letters as torng, the Gwoyeu Romatzyh (GR) romanization of 同. The -r- represents a high rising tone, not a consonant [r]. Could Trong be a metathesis of torng? And was the reordering of o and r accidental or intentional?

Rong is the GR romanization of 榮, but Chai is not the GR romanization of 蔡 which is tsay with y signalling [j] preceding a high falling tone.

1.12.13:41: Later last night it occurred to me that someone unfamiliar with Chinese might have accidentally spread the r- of -rong to the preceding syllable: Tong-rong > Trong-rong. But why would Chai adopt someone else's error?

TODAY IN JURCHEN HISTORY

By coincidence, two major anniversaries today are exactly 23 years apart:

- the fall of the Northern Song capital of Bianjing (now Kaifeng) to the Jurchen in 1127:

Emperor Qinzong and his father, Emperor Huizong, were captured by the Jin army. The Northern Song dynasty came to an end.

- the assassination of Emperor Xizong in 1150


Emperor Xizong felt so depressed by the loss of his sons that he developed an addiction to alcohol and started neglecting state affairs. He also became more violent and ruthless, and started killing people indiscriminately. One of his victims was Ambaghai, a Mongol chieftain and great-granduncle of Genghis Khan.

Emperor Xizong was overthrown and murdered by his chancellor, Digunai, and other court officials in a coup d'état on 9 January 1150.

Xizong is of linguistic interest as the man attributed with the creation of the mysterious Jurchen small script whose fate may have been intertwined with his:

During the 1970s a number of gold and silver paiza with the same inscription, apparently in the small Khitan script, were unearthed in northern China. Aisin-Gioro has analysed the inscription on these paiza, and although the structure of the characters is identical to the Khitan small script she concludes that the script is not actually the Khitan small script but is in fact the otherwise unattested Jurchen small script. She argues that this small script was only used briefly during the last five years of the reign of its creator, Emperor Xizong, and when he was murdered in a coup d'état the small script fell out of use as it was less convenient to use than the earlier large script.

If Aisin Gioro Ulhicun is right, the only two surviving samples of the script are those that she has identified. I have written about the first twice before; I should finally get around to writing about the second six years later.

THE JURCHEN SCRIPT: INNOVATION OR DERIVATION? (PART 1)

(Edited 1.10.0:41 before posting.)

川崎保 Kawasaki Tamotsu's 「渤海」文字資料からみた女真文字の起源に関する一考察 ('An Observation Concerning the Origin of the Jurchen Script as Seen from Parhae Script Materials', 2014), a response to Alexander Vovin's "Did Wanyan Xiyin Invent the Jurchen Script?" (2012), contrasts two views of the Jurchen large script: 発明 hatsumei 'invention' (the ex Khitanis¹ hypothesis) and 発展 hatten 'development' (the ex Parhis² hypothesis).

To try to parallel how both terms begin with the root 発 hatsu- (hat- before t-) 'go out', I have loosely rendered 発展 hatten as 'derivation' in the title so that it has the same ending as invention.

Kawasaki first summarizes Vovin's  English-language paper in Japanese before presenting his own views.

Vovin read a Parhae stamped tile in Jurchen as

pe gorhon ni

'old thirteen GEN' = 'of Old Thirteen'

Kawasaki interpreted the first two characters as a single Jurchen character looking like Chinese 舍 'to set aside; lodging'.

I am not sure what to make of this stamp for several reasons. Here are the first two.

1. The first character on the stamp has 人 on the top rather than ス. There is no evidence that those two elements were interchangeable in the Jurchen large script, as no ス-graphs have 人-variants in Jin Qizong's dictionary (1984: 23-24). Nonetheless that does not refute Vovin's reading because 人 and ス could have been interchangeable in the earlier Parhae script. Alternately, the stamp may show an older Parhae form with 人 that was replaced by ス in Jurchen.

2. Vovin interprets

as Jurchen pe 'old'. At first this seems plausible given that (1) the Manchu word for 'old' is fe and (2)  Jin Jurchen p- corresponds to Ming Jurchen and Manchu f-. Parhae Jurchen predated Jin Jurchen and probably would also have had p-.

However, the word for 'old' is attested in the Jurchen large script as disyllabic

<pu (g)e> = pu(g)e (奧屯良弼餞飲碑 Aotun Liangbi picnic inscription 1)

and not as a monosyllabic pe. The contraction of pu(g)e to Manchu fe may have been a post-Jin innovation in some but not all varieties of Ming Jurchen. The Bureau of Translators vocabulary has disyllabic fuwe (transcribed as 弗厄 *fu ə and spelled in the Jurchen large script as in the Aotun Liangbi picnic inscription; #667) whereas the Bureau of Interpreters vocabulary has a monosyllabic transcription 佛 *fo, presumably for a form like fo in a Ming Jurchen dialect that compressed fuwe differently than the ancestor of standard Manchu. (o is labial like -uw- and mid like e [ə].)

It is unlikely that a monosyllabic Parhae Jurchen pe expanded into a Jin Jurchen pu(g)e and then recontracted into fo or fe. Parhae Jurchen pe may be an anachronism unless it is from a dialect not ancestral to the more conservative varieties of Jurchen with disyllabic pu(g)e/fuwe. Could standard Manchu fe be a descendant of the Parhae Jurchen pe-dialect (or another Parhae Jurchen dialect with the same type of *uge > -e compression)?

'Old' in Jurchen: a simplified possible family tree

(† = expected but not attested)

Proto-Jurchen *puge
Parhae Jurchen dialect 1: pe | Parhae Jurchen dialect 2: †pu(g)e
Jin Jurchen dialect 1: †pe | Jin Jurchen dialect 2: pu(g)e (Aotun 1)
Ming Jurchen dialect 1: †fe | Ming Jurchen dialect 2: fo (Bureau of Interpreters) | Ming Jurchen dialect 3: fuwe (Bureau of Translators)
Standard Manchu fe | (Did these dialects survive into the Manchu era?)

Maybe, though Manchu does have a single word with -uge: buge ~ buhe (rather than †be < *buge) 'gristle'. Are those loans from noncompressing dialects? How heterogeneous is standard Manchu?

And I would like confirmation of the sound value of


Ideally I'd like to see a polysyllabic word written with that character in the Parhae material that corresponds to a Jin Jurchen pe or Ming Jurchen/Manchu fe. Without such interlocking of both internal and external evidence, I have no way of knowing how

would have been read in Parhae. Graphic similarity does not entail phonetic similarity: e.g., Jurchen 日 'day' and 月 'month' look exactly like Jin or Ming Mandarin 日 and 月 but are pronounced completely differently: inenggi and biya instead of *ʐi and *ɥe. I am convinced that there is graphic continuity between the Parhae and Jurchen scripts. I am more agnostic about projecting Jin Jurchen values back onto an earlier script that may have been used to write other languages, related or otherwise. (In theory even Koreanic and para-Japonic speakers in Parhae could have used the Parhae script.)

I am not even sure there is graphic continuity between this particular Parhae script character and its Jurchen (near-)lookalike. I have already mentioned the problem of the different shapes of the top elements. Might the Parhae character be the source of

<?> '?' (N4631 #1355; font from

in the Khitan (not Jurchen!) large script?

To the best of my knowledge, the character

is not attested in Jin Jurchen; it is only known - at least to me - from the Ming Jurchen 永寧寺碑 Yongning Temple Stele (lines 3 and 4) where

<pe ing>

transcribes Ming Chinese 平 *pʰiŋ.

There is no evidence that

was read with final -e or any other vowel since it is only followed by -ing in the corpus. Could that character have been devised in Ming Jurchen to write  Ming Chinese *pʰ, a consonant absent from native Jurchen words (in which p [pʰ] had shifted to f)? I would prefer to read that character simply as p.

Jin Qizong (1984: 24) suggests that

is derived from Jurchen

<FORTY> dehi 'forty'

or from the aforementioned Chinese character 平 *pʰiŋ.

I would add that there is an even closer match for the shape of Jurchen <p> in the Khitan large script:

<FORTY> (北大王墓誌 Epitaph for the Grand Prince of the North 5; font from

One other potentially related Khitan large script character

<?> '?' (N4631 #2026; font from

is a lookalike for simplified Chinese 圣 'sage'. Unfortunately nothing is known about the phonetic or semantic value of 圣 in Khitan.

Although <FORTY> in the Khitan and Jurchen large scripts and what I read as p certainly look similar, I can't understand why one would decide to write a phonogram <p> as a modification of a logogram <FORTY>. Jurchen dehi has no p in it, and the Khitan word for 'forty' probably sounded something like Written Mongolian döcin 'forty' which also lacks p. Why not add a dot to, say,

be (accusative marker)

to create a phonogram for <p(e)>?

In fact a dotted phonogram derived from <be> already exists - the aforementioned


which is only attested in noninitial position after sonorants (vowels and l; Jin Qizong 1984: 100).

Might the graphic similarity between

<be> and <(g)e>

indicate that *-g- might have lenited to [ɣ] ~ [β] - the latter perhaps just after u? - justifying the choice of <be> as a basis for <(g)e>? Cf. how Proto-Koreanic *-p- lenited to [ɣ] and [β] in different dialects:

Proto-Koreanic *tupur 'two' > southeastern Old Korean 二肸 <TWO.> tuɣur but western Middle Korean 두ᄫᅳᆯ tuβur

(This is a revision of Alexander Vovin's proposal that Proto-Koreanic *-b- became [ɣ] and [β] in different dialects. Now neither he nor I follow S. Robert Ramsey's proposal to reconstruct *b in earlier Korean.)

Medial lenition is common to both Jurchen/Manchu and Korean, and a detailed comparison of the process in the two might be interesting.

I'll get to more of the problems with 'Old Thirteen' in part 2.

¹I originally wrote ex nihilo, but that could be misinterpreted as a straw man, as no one has ever claimed that Wanyan Xiyin had invented the Jurchen large script without any influence from other scripts. 'From the Khitans' is better since the orthodox view is that Wanyan Xiyin took the existing Khitan large script and arbitrarily changed it.

On the other hand, I side with Janhunen who thinks the Jurchen script is a modification of the Parhae script which was a sister of the standard Chinese script. Contrasting the two views:

Ex Khitanis

Chinese script
Khitan large script
Jurchen large script

There is no Parhae script for the Jurchen script to derive from in the orthodox view which holds that the Parhae wrote exclusively in Chinese characters.

Ex Parhis (Janhunen)

Proto-Chinese script
Standard Chinese script
Nonstandard northeastern Chinese script
Parhae script
Khitan large script
Jurchen large script

Today it occurred to me that the lost Tabghach script might fit into the above schema as the ancestor of the Khitan large script:

Ex Parhis (this site)

Proto-Chinese script
Standard Chinese script
Nonstandard northern Chinese scripts
Tabghach script? | Parhae script
Khitan large script
Jurchen large script

In the above scenario, one could speak of a  para-Mongolic (or 'Xianbeic' = Shimunek's 'Serbic'?) line of scripts (Tabghach and Khitan) and a  possibly 'Tungusic' line of scripts (Parhae and Jurchen). But this is extremely speculative, as we have no idea what the Tabghach script looked like; it might not have been what Janhunen called 'sinoform' (i.e., Chinese-like). It may have had no relationship to any of the other scripts in the diagram above.

Janhunen (1996: 153) wrote,

In view of the later ethnic situation in the border zone between Korea and Continental Manchuria, and in the absence of any contradicting evidence, the most natural assumption about the states of Koguryo and Bohai [= Parhae] is that they were dominated by people ethnically ancestral to the Jurchen. It is well known that the Bohai ruling elite was largely formed by descendants of Koguryo nobility [...] there are no indications that any significant number of people linguistically connected with the modern ethnic Koreans would, during this period, have been present outside of the United Shilla territory.

To some extent, the above conjecture about the possible Jurchen identity of the Bohai population is complicated by the fact that the Bohai people continued to be counted as a separate ethnopolitical entity even after the fall of the Bohai kingdom. Not only the [Khitan] Liao [dynasty] but also the [Jurchen] Jin [dynasty] system of ethnic administration registered the Bohai people as distinct from the Khitan and Jurchen populations.

Another complicating factor is the absence of Jurchenic elements in the Koguryo onomastic material which is split between Koreanic and Para-Japonic items.

My attempt to reconcile the above points:

- Koguryo was a multiethnic state with a Chinese-influenced Koreanic-speaking elite ruling over Tungusic (including Jurchenic), Koreanic, and Para-Japonic-speaking subjects.

- As there is no evidence for Para-Japonic in Parhae, the Para-Japonic language(s) of Koguryo may have become extinct by the time Parhae was established in 698. Or Para-Japonic speakers were in the part of Koguryo that Shilla conquered: i.e., the part that did not become part of Parhae.

- The majority population of Parhae was Tungusic-speaking; their languages were not prestigious and hence almost totally absent from written records except in the Parhae script at a local and unofficial level.

- 'Parhae' as an ethnonym could have been a cover term for various peoples speaking Jurchenic and/or para-Jurchenic languages: i.e., Tungusic languages more or less related to Jurchen.

- The Jurchen script, then, was an official version of the previously informal Parhae script originally used to record one or more relatives of Jurchen but possibly not Jurchen itself. Wanyan Xiyin may have introduced new usages or characters to adapt the script to Jurchen. This introduction may not have been by him alone; it could have paralleled what happened when the Mongolian script was adapted for Manchu five centuries later. Roth Li (2000: 13) wrote,

The process of modifying [the Mongolian] script [for Manchu] occurred over at least a decade and was not, as some Chinese sources made it appear, carried out singlehandedly by Dahai in 1632.

Dahai may have been the new Wanyan Xiyin: i.e., the man later credited with the result of a slow process involving multiple people. As far as I know, there is no contemporary documentation indicating that Wanyan Xiyin 'created' the Jurchen large script in 1119, the date given in his biography in History of the Jin Dynasty 73; that 'fact' may be a later oversimplification. The later dates from other sources (1121 in the Wanyan Xiyin inscription and 1123 in the Record of the Great Jin State) could be reconciled by viewing the 'creation' of the Jurchen script as a process spanning years before and after 1120. Cf. Wikipedia:

The date of the creation of the script (1119 or 1120) varies in different sources. Franke (1994) says that "[t]he Jurchens developed ... [the large script] ... in 1119". Kane (1989) (p. 3) quotes the Jin Shi [History of the Jin Dynasty], which states that "[i]n the eighth month of the third year of the [天輔] Tianfu period (1120), the composition of the new script was finished". The two dates can be reconciled as one may imagine that the work started in 1119 and was completed in August–September (the eighth month of the Chinese calendar) of 1120.

In fact Tianfu 3 is 1119, so Kane's (1989: 3) 1120 may be an error; his 2009 book has 1119 on p. 3. (The date of the script is on page 3 of both books!)
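
The era arithmetic is easy to check. The sketch below assumes that the first year of Tianfu corresponds to 1117, which agrees with the equation Tianfu 3 = 1119 above, and it ignores the fact that the last weeks of a lunar year spill into the following January or February. The function name is made up for illustration.

```python
TIANFU_FIRST_YEAR = 1117  # assumption: Tianfu 1 corresponds to 1117


def era_year_to_ce(era_year: int, first_year: int = TIANFU_FIRST_YEAR) -> int:
    """Convert a regnal-era year (counted from 1) to an approximate CE year."""
    if era_year < 1:
        raise ValueError("era years are counted from 1")
    return first_year + era_year - 1


print(era_year_to_ce(3))
```

An eighth-month date falls well inside the year, so the lunar-calendar caveat does not affect the 1119 vs. 1120 question.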

²My guess at the Latin ablative of Parhae. I declined Parhae like Thebae 'Thebes' and Sinae 'China' which are pluralia tantum (though of course no such concept exists in Korean or Chinese [the Korean name is the Sino-Korean reading of the Chinese place name 渤海]).

WHAT IF THE LANGJUN INSCRIPTION WERE IN JURCHEN?

It just occurred to me that the Jurchen large script was fifteen years old when the 郎君 Langjun inscription was written in Khitan in 1134. The Arkhara inscription of 1127 demonstrates that the Jurchen large script was already in use in the years between its 'creation'¹ and the Langjun inscription. So why wasn't the Langjun inscription about a Jurchen aristocrat - none other than the emperor's brother - written in his language?

It would be interesting to try to construct a Jurchen version of the inscription. At least I can guess that they would have written 'Tang dynasty' as

<ta ang> (attested in 1185 in 大 金得勝陀頌碑 26)

which partly parallels the structure of the Khitan small script spelling


from the Langjun inscription, though of course the Khitan small script fuses two characters (<ta> and <ang>) into a single block unlike the Jurchen large script characters which remain full-sized.

All that makes me wonder about the influence of Khitan writing practices on the Jurchen script - and if they can be differentiated from Parhae writing practices. Were the Parhae the first to write CVC syllables as <CV VC> sequences?

¹Not the best word, as I agree with Juha Janhunen (1994) who first proposed that the Jurchen script is actually derived from a preexisting Parhae script rather than being invented on the spot. It was Alexander Vovin who introduced me to Janhunen's idea over twenty years ago. I just found 川崎保 Kawasaki Tamotsu's 「渤海」文字資料からみた女真文字の起源に関する一考察 (2014), a response to Vovin's "Did Wanyan Xiyin Invent the Jurchen Script?" (2012).

Next: My thoughts on Kawasaki's article.

THE FIRST KHITAN SMALL SCRIPT ANALYSIS I EVER SAW

... was in 西田龍雄 Nishida Tatsuo's 『言語学を学ぶ人のために』 For People Who Study Linguistics in 1992:

229-199-140 073-163 261-303-205

<t.ang.en ki.ên>

tang.GEN Qianling.DAT

'to the Qianling tombs of the Tang dynasty'

I thought that book also had an analyzed sample of Jurchen, but I think I confused it with an article of Nishida's that I saw later in the 90s.

That three-block Khitan sample is from line two of the Langjun inscription which was once thought to be in Jurchen. Turns out that the 'Langjun' in question was Jurchen - the younger brother of the emperor of the Jurchen Jin dynasty. So this post still fits this year's focus on Jurchen, though I do have many non-Jurchen topics in a queue.

The first word has bothered me for years because I would expect the genitive suffix to be -an rather than -en after tang. But today I realize that perhaps -en obeys consonant harmony rather than vowel harmony. Jin Chinese *tʰaŋ 'Tang dynasty' ended in a velar *ŋ, and I hypothesize that in Khitan, velar consonants went with higher series vowels like e [ə] rather than lower series vowels like a which went with uvulars. I am projecting the distribution of Mongolian and Manchu velars, uvulars, and vowel series back into Khitan.
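
The consonant harmony hypothesis can be stated as a rule. This is a sketch of the hypothesis only, not of attested Khitan grammar. The velar and uvular sets are placeholders, the stem `təq` is invented, and the function names are made up for illustration.

```python
# Placeholder consonant classes; the real Khitan inventory is not assumed here.
VELARS = {"k", "g", "ŋ"}
UVULARS = {"q", "ɢ"}


def genitive_vowel(stem: str) -> str:
    """Pick the genitive vowel from the stem-final consonant, not the stem vowel."""
    final = stem[-1]
    if final in VELARS:
        return "e"  # higher series vowel goes with velars
    if final in UVULARS:
        return "a"  # lower series vowel goes with uvulars
    raise ValueError(f"no rule proposed for stems ending in {final!r}")


def genitive(stem: str) -> str:
    """Attach the genitive suffix -Vn with the vowel chosen by consonant harmony."""
    return f"{stem}-{genitive_vowel(stem)}n"


print(genitive("taŋ"))  # the attested type: a-vowel stem, velar final
print(genitive("təq"))  # invented stem: ə-vowel stem, uvular final
```

Vowel harmony alone would give the opposite suffix vowel in both cases, which is what makes such stems a useful test.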

I predict that if Khitan borrowed a noun ending in -əq - a sequence as impossible in native Khitan as Jin Chinese *-aŋ - its genitive would end in †-an with an †a to match uvular -q.

BITHE ENDEHE¹

Today while handwriting the date in Jurchen, I noticed one mistake and then made another.

First, looking at my New Year's image which I made in haste, I spotted that I had written

minggan 'thousand' (cf. Chinese 千)


topohon 'fifteen' (cf. Chinese 十五 - if only the other Jurchen '-teen' characters were as transparent!).

I don't have a Jurchen input method set up yet. That's something I ought to do to celebrate the 900th anniversary of the Jurchen large script. So I copy and paste characters from lists. I have a text file of all Jurchen characters organized by number of strokes and 'radicals'. Without bothering to take note of the number of strokes, I spotted a character with a 五 'five'-like shape at the bottom and pasted it into the image file. Unfortunately, I pasted the wrong one. I've reuploaded the image with the right one.

Then I wrote 'first month' incorrectly as

emu biya 'one month'

a calque of Chinese 一月 'one month' = 'January'. I forgot that the correct term is

niyengniye(n) biya 'spring month'

or perhaps

se biya 'year month'

attested only in Chinese transcription in the Bureau of Interpreters vocabulary as (#297). Kane (1989: 194) thinks it might be an error for the Jurchen cognate of Manchu aniya biya 'first month', lit. 'year month'.

(1.5.1:21: Is it coincidental that the characters for niyengniye(n) and se are so similar? Was one derived from the other? Did a Chinese misreading of niyengniye(n) as se result in the Bureau of Interpreters term for 'first month'? Does any attested Manchu variety have an equivalent of se biya?)

Jurchen niyengniye(n) [ɲəŋɲə(n)] 'spring' was transcribed in Chinese as

捏年 *njenjen in the Bureau of Translators' vocabulary

捏捏 *njenje in the Bureau of Interpreters' vocabulary

The presence or absence of final -n may have varied by dialect; cf. how 'horse' had -n in the Bureau of Translators' vocabulary but -Ø in the Bureau of Interpreters' vocabulary.

As for the final nasal -ng in the first syllable, I reconstruct it because both Manchu and several other Tungusic languages have it - and because Ming Mandarin did not have a syllable like *ɲəŋ. So I think 捏 *nje was an attempt to imitate the first two-thirds of Jurchen [ɲəŋ] at the expense of the third. Moreover, the Chinese may not have heard Jurchen -ng before n-, and in rapid speech, niyengniye(n) might have been simplified to [ɲəɲɲə(n)] or [ɲəɲə(n)].

¹1.5.0:08: Bithe endehe is Jurchen for 'made a mistake in writing'.

THE FIRST SYLLABLE OF THE JURCHEN WORD FOR 'NOSE'

There is disagreement not only about how the Jurchen word for 'nose' was written in the Jurchen large script but also about how it was pronounced. Jin Qizong (1984: 287) provides three reconstructions:

Jin Guangping: ʃoŋgi

Yamaji Hiroaki: šonggi

Kiyose Gisaburō: songi

Kane (1989: 314) reconstructed it as sunggi.

There is no doubt that the word is related to the root songgi- of Manchu songgiha ~ songgin 'tip of the nose'. But on the surface the Chinese transcriptions seem to represent slightly different forms:

Bureau of Translators: 雙吉 (for *šuwanggi?)

Bureau of Interpreters: 宋吉 (for *sunggi?)

I have long thought that the extant sources for Jurchen are not directly ancestral to Manchu. Do those forms contain retentions or innovations absent from Manchu? Apart from perhaps preserving an original final -i sans suffix, I think they are artifacts of Chinese transcription.

The problem was that Ming Mandarin did not have a syllable *soŋ that would have been a good match for the Jurchen syllable song. So the transcribers of the two bureaus found different solutions to this dilemma:

The Bureau of Translators solution: Ming Mandarin 雙 *ʂwaŋ approximated Jurchen o with wa (a labial glide-nonhigh vowel sequence). The syllable swaŋ did not exist in Ming Mandarin, so *ʂwaŋ was the next best approximation.

(There was a phase when early Mandarin had < *wɑ, but 雙 had *wa with a different vowel and was not affected by the shift.)

The Bureau of Interpreters solution: Ming Mandarin 宋 *suŋ approximated Jurchen o with u (a labial nonlow vowel).

It would be worthwhile to go through all the Bureau of Translators and Bureau of Interpreters transcriptions of Jurchen with an eye (or should I say an ear?) for the limitations of Ming Mandarin phonology and see if there are other cases of Manchu-like forms distorted through the lens of the Chinese finite syllabary.

That is not to say that all apparent slight differences from Manchu in those transcriptions can be explained away as inescapable compromises. Some transcriptions do represent genuine dialectal variants: e.g., in the Bureau of Interpreters vocabulary, the Jurchen word for 'sky' is transcribed as 阿瓜 pointing to agwa rather than the expected abka [apka]. The Chinese could have transcribed Jurchen [pq] as *-pu-k-, but they didn't because they heard [ɢw] (or [ɣw]? - 1.5.0:05). The Bureau of Interpreters dialect had shifted *-pk- to -gw-.

WHO KNOWS THE RIGHT WAY TO WRITE 'NOSE'?

Not me. At least not in the Jurchen large script.

The Unicode proposals I've seen for Jurchen (N3628 and N3788) have only two forms of songgi 'nose' from Jin Qizong's 1984 dictionary (pp. 267 and 257):

The dotted form is attributed to the Berlin text of the Hua-Yi yiyu; the dotless form is an unattributed variant.

I have not been able to find a third form from p. 267 of his dictionary:

He attributes it to Grube (1896), but I think it's a ghost, because I see three different forms on pages 26, 53, and 85 of Grube:

The first form is identical to Jin's Berlin form, but the others don't match the one he says is from Grube.

Kiyose (1977) has one more form from the Hua-yi yiyu:

Apart from the possible ghost, which of the five remaining forms is real? The only way to know is to examine the original texts or high-quality reproductions of them. This exercise demonstrates the limits of relying on publications with handwriting or fonts in lieu of the actual forms.

900TH ANNIVERSARY OF THE JURCHEN LARGE SCRIPT

Happy New Year!

This year is the 900th anniversary of the Jurchen large script.

I have been getting back into Jurchen over the past few weeks, and I hope to post more about it this year.

Key to the Jurchen large script characters above:

1. In the center: ju(r?)šen amba(n) bithe 'Jurchen big script'

1a. ju(r?)- 'Jur-' (red)

Could this have originally been a logogram for 'Jurchen'? This character is not attested without the following character which might have been added as a phonetic clarifier: <JURCHEN.šen>. Moreover, it is never used to write any word other than 'Jurchen'. If it originally stood for a simple CV syllable ju, I would expect to see it in more words. Hence I suspect its original reading was longer: jur or even juršen.

Unfortunately this character has no attestations outside the Hua-Yi yiyu.

I would like to say that *rš or *rc regularly became š, but the high frequency of and rc in Manchu casts doubt on that proposal. Nonetheless it is hard to believe that Mongols inserted an -r- into jürcin for no reason unless there is some analogy I'm missing (cf. -r-insertion in Tartar because of Tartarus).

I wondered if Mongolian -r- might correspond to the *-k of the Middle Old Chinese name of an early Manchurian people,  肅愼 *siwk dinh: *-k > *-g > *-ɣ > *-r? Janhunen (2004: 70) independently came up with the same idea years earlier, citing Dagur rhotacism. But other mismatches remain to be explained:

OC *s : j-

OC *d : -c- (but the affrication of Jurchen t/d to Manchu c/j would not happen for centuries, so Jurchen -c- cannot be from a stop!)

1.1.18:54: Ligeti reconstructed the Ming Jurchen reading as from Jin Jurchen jür. See Kiyose (1977: 90).

1b. -šen '-chen' (yellow)


1c. amba(n) 'big' (green)

I suspect that this was originally a logogram for amban 'big' and that the spelling <> with a phonetic clarifier <an> was a later innovation.

The Jurchen term for 'large script' is not attested, so I don't know if it contained amba or amban or something else entirely.

1d. bithe 'script' (blue)


2. Around the center: uyewun tanggū 'nine hundred'

2a. uyewun 'nine' (orange)

Logogram obviously related to Chinese 九 'nine'.

2b. tanggū 'hundred' (pink)

Logogram presumably related to Chinese 百 'hundred'.

Last night it occurred to me that Jurchen/Manchu ū could be romanized as ů to signify how [ʊ] is between [u] and [o]. But I will go with the traditional letter here.

3. The corners: juwe minggan oniyohon aniya 'two thousand nineteen year'

3a. juwe 'two' (yellow-green; top right)

Logogram identical to Chinese 二. Later written with an L-shaped second stroke that distinguished it from the Chinese form.

3b. minggan 'thousand' (blue-green; bottom right)

Logogram presumably related to Chinese 千 'thousand'.

3c. oniyohon 'nineteen' (dark purple; top left)

Logogram possibly related to Chinese  九 'nine'

-hon may be a Khitanic word for 'ten' related to Mongolian arban (< *xarban?) 'ten'. The Khitan word for 'ten' is unknown. The word for 'nineteen' in Khitan was 'ten-nine', not 'nineteen', but a related language or dialect could have had the opposite order.

If -hon is Khitanic, oniyo- should be too, but the Khitan word for 'nine' was completely different: is.

3d. aniya 'year' (light purple; bottom left)

4. Background: indahūn 'dog' (the animal for 2019 which also happens to be mine)

Logogram. The word was later spelled with a phonetic clarifier as <DOG.hūn>.

It is tempting to link this word to Japanese inu 'dog', but the mismatch between -dahūn < *-dakun and -u would have to be explained. I would expect a Japanese cognate of indahūn to be †idaku with †-d- < *-nd-. And I'm overlooking the problem of the initial consonant - Jurchen has lost an initial *ŋ- still preserved in some other Tungusic languages.

1.1.16:04: The character could be related to Chinese 犬 'dog'. If Janhunen's hypothesis that the Jurchen script is derived from a sister of the Chinese script is correct, then <DOG> might be a more elaborate descendant of the drawing of a dog ancestral to 犬.

WHY NICHOLAS WITH AN H?


Above: 'Nicholas' in Tangut as 4884 2946 5560 0493 2ni4 1ko1 1la1 2sy4

Saint Nicholas died 1,675 years ago today. I would expect the ch of Nicholas to correspond to Greek chi, but in fact it corresponds to Greek kappa in Νικόλαος <Nikólaos>. Wiktionary says the English name was borrowed from Old French Nicholas. Since the Latin form of the name is Nicolaus, why isn't the English name †Nicolas?

My guess is that Nicholas originated in France as a hyper-Hellenicism: 'Greek uses ch. This name is Greek. So it must have ch.' But it originally didn't!

Aren't there other Latin words with ch for Greek k? And/or other hyper-Hellenicisms? I can't remember. I vaguely recall Nicholas Ostler's Ad Infinitum: A Biography of Latin had something to say about this, but I don't have my copy on hand.

In any case, the h has now spread within English to the feminine Nichole which seems to date from the 1960s.

Three other mysteries involving this name:

Why does it begin with M- in parts of Eastern Europe: e.g., Lithuanian Mikalojus, Polish Mikołaj, Czech Mikoláš, Slovak Mikuláš, Hungarian Miklós, Ukrainian Миколай <Mykolaj>? A dissimilation of *ni? Analogy with 'Michael'?

Why do the Czech, Slovak, and Hungarian forms end in [ʃ] rather than [s]? Is Hungarian s [ʃ] instead of sz [s] a spelling pronunciation of Latin s? Did the Hungarian form spread westward?

Why is the Ligurian form Nichioso with -chi- [ki]? Does *-ico- regularly become -ichio- in Ligurian?

JADE LONGEVITY: NISHIDA TATSUO'S 90TH BIRTHDAY

西田龍雄 Nishida Tatsuo, Japan's greatest Tangutologist, would have turned ninety today*.

The first Tangut I literally heard was Nishida's reconstruction. Thirty years ago, I saw the movie 敦煌 Tun-huang on a flight to Japan. The Tangut characters spoke in Nishida-style Tangut; they would have pronounced


'ninety' (lit. 'nine ten')

as ŋgɨ̃¹ ɣɑ̣² - equivalent to my 1gy'4 2ghaq1. My Tangut notation is designed to be easy to type and not be precisely phonetic. If I were asked what I think 'ninety' sounded like, my guess would be something like [ŋgɨ¹ ɣɑ̣²]  which happens to be pretty close to Nishida's reconstruction. (I should write about the lack of a nasal vowel in 'nine' later. I still don't know what phonetic feature my 'prime' symbol represented.)

Four years later, I was studying linguistics in Japan. The assigned textbook was his 『言語学を学ぶ人のために』 For People Who Study Linguistics. At the time I had seen glimpses of Khitan and Jurchen in Nakanishi Akira's Writing Systems of the World, but I think it was Nishida's textbook that gave me my first linguistic introduction to those languages and scripts. (I don't have my copy on hand, so I can't check my memory.)

(1.5.23:35: When I finally got ahold of my copy, I was disappointed to learn that no Jurchen samples were in it. But it did have a Khitan small script sample.)

In the Khitan large script, 'ninety' (pronunciation unknown - possibly a cognate of Written Mongolian yeren 'id.'?) looks exactly like the Chinese anti-fraud numeral character 'nine':

To the eye of someone familiar with Chinese characters, the left side looks like 𤣩, an abbreviation of 玉 'jade' resembling 王 'king', and the right side looks like 久 'long time'.

The Khitan small script character for 'ninety' is unknown, at least to me.

One of the Jurchen (large script) characters for uyewunju 'ninety' (cf. Manchu uyunju 'id.') looks like a reversed Chinese character 上 'above, top':


All of the similar-looking Chinese characters I've mentioned - 'jade', 'king', 'long time', 'above' - suit the great Nishida, a gem and master among scholars. He was at the top of his field, and his work will endure for a long time.

*As I write this, it's still November 26th in the UTC -11:00 time zone; no one lives in the UTC -12:00 time zone.

NIEPODLEGŁOŚĆ

Today is the hundredth anniversary of the reestablishment of Poland.

Every November 11th is Narodowe Święto Niepodległości - 'National Holiday of Independence'.

I can account for all the morphemes in niepodległość 'independence' except one:

nie- 'not'

-pod- 'under, sub-' (cf. be subordinate, the translation of the verbs podlec < *-g-tei [impf.] ~ podlegać [perf.])?

-leg- 'lie'

-ość ('appended to adjectives to form names of abstract concepts'; in this case the adjective is podległy 'subordinate')

What is -ł-? Does it form adjectives from verbs? Is it related to the past suffix in podległ 'was subordinate (m. sg.)'?

11.12.2:43: Added derivation of -c < Proto-Balto-Slavic *-g-tei.

Interslavic leg-ti 'to lie down' (impf.) retains the -g- of the root and is more transparent than its Polish equivalent lec.

Do legti and lec (imperfective according to Swan's dictionary) have perfective equivalents? Wiktionary says lec is perfective and lists no imperfective counterpart. I'm confused.

I could mechanically render niepodległość 'independence' into Interslavic as †nepodleglost, but the actual equivalents are

nezaležnost with za- instead of pod-, -g- > -ž-, and -n- instead of -l- (cf. Polish niezależność)

nezavisnost < zavisny 'dependent' (cf. Polish niezawisły 'independent', again with -ł- after the root)

samostojnost < samo- 'self' + stoj 'stand' (cf. Ukrainian самостійність with і < *o)

The third word is reminiscent of the pan-East Asian word 獨立 'alone-stand' for 'independence': Mandarin duli, Japanese dokuritsu, Korean tongnip, and Vietnamese độc lập.

THEY KANTU BE COGNATES II: A BIG MISTAKE IN TANGUT ETYMOLOGY

Middle Chinese (MC) *d- has two main Early Old Chinese (EOC) sources: *d- and *l- before the 'lower' vowels¹ *a/e/o. (I list all other sources here.²)

In theory, MC 大 *da̤(j) 'big' < EOC *lats (last seen in my last post) could have had *d- or *l- in EOC, and in fact, the word has been reconstructed in Old Chinese with both initials (*d- by Schuessler 2009 and *lˁ- [= my *l-] by Baxter and Sagart 2014). Two pieces of evidence point toward *l-:

- an alternate spelling as 世 *Hɯ-lap-s (Baxter and Sagart 2014: 109; they reconstruct *l̥ap-s)
*H- indicates a consonant that conditions aspiration or devoicing: *Hɯ-l- > *l̥-.

The use of 世 could indicate that 'big' was really *laps or that 世 was chosen to write *lats after *-ps merged with *-ts. Unfortunately there are no known cognates that could point to *-t or *-p.

- the 古丈 Guzhang subvariety of the 瓦鄉 Waxiang variety that preserves *l- in 'type A' syllables with 'lower(ed)' vowels has /lu 22/ 'big' with /l-/ (Baxter and Sagart 2014: 109).

I am guessing /u/ is from *-as and not *-ats.

For some time I thought EOC *lats (or *laps?) was cognate to Tangut

4456 2leq3 'big'

and would have correlated the pre-Tangut *S- which conditioned -q (tenseness) with an aspirating prefix *H- that made *lats into 太 *H-lats > MC *tʰa̤j 'great'.

Now I see that a proto-Sino-Tibetan *l-word for 'big' based on those Chinese and Tangut words is impossible. The rhymes do not match. Converting Jacques' pre-Tangut reconstructions into my system, I posit three possible sources for *2leq3:

1. *Sɯ.leH

2. *Sɯ.leŋH

3. *Sɯ.laŋH

Lining up the components of the EOC and pre-Tangut forms:

*H (= *s?)
*t or *p
*a or *e
or *ŋ *H


✓ or ✕

In theory, EOC 太 'great' could have been *sɯ-lats with a high presyllabic vowel that was lost before it could trigger partial vowel raising in the main syllable.

The presyllables might match, but there is no way pre-Tangut *-e, *-eŋ, or *-aŋ can be reconciled with EOC *-ats or *-aps. So I can only regard the pre-Tangut and EOC words for 'big' as lookalikes.

¹EOC had two sets of vowels:




This happens to be identical to the higher/lower eight-vowel system I reconstruct for Early Korean apart from the inclusion of stress which is irrelevant to Korean phonology. Higher/lower systems are a trait of northeast Asian languages (EOC, Mongolic, Tungusic, Korean³, and possibly Tangut - but not Tibetan to the west, Burmese to the south, or Japanese across the sea to the east).

*ɯ and *ʌ are cover symbols for 'unknown unstressed higher vowel' and 'unknown unstressed lower vowel'. They are based on the Korean higher and lower minimal vowels which really were *ɯ and *ʌ. It seems that EOC had *i as an unstressed higher vowel at an early point, and it is possible that the unstressed subsystem was triangular: *a/i/u. If that was the case, *u has left no traces of its labiality on the following syllable, whereas *i has left traces of its palatality, and *a = *ʌ has triggered partial vowel lowering and pharyngealization.

EOC *d- and *l- before the 'higher' vowels *ə/i/u have palatalized Middle Chinese reflexes *dʑ- and *j-:

時 EOC *də > MC *dʑɨ 'time'

慎 EOC *dins > MC *dʑi̤n 'careful'

受 EOC *duʔ > MC *dʑṵ 'to receive'

怡 EOC *lə > MC *jɨ 'cheerful'

引 EOC *linʔ > MC *jḭn 'to draw a bow'

誘 EOC *luʔ > MC *jṵ 'to lead, influence'

The last two examples might have had EOC *ɟ- (me), *j- (Schuessler), or *z- (Karlgren), but let's go with a currently mainstream *l- for now.

There is no such palatalization before the 'lower' vowels (unless a higher-vowel presyllable preceded during the period of height harmony; see below).
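The split between the two vowel sets can be written as a small lookup. This is only a sketch of the two main correspondences stated above (EOC *d-/*l- before higher vs. lower vowels); the function name and table layout are illustrative assumptions, and presyllables, preinitials, and the other sources of MC *d- in footnote 2 are deliberately ignored.

```python
# Toy model of the rule in the text. The vowel classes and reflexes come from
# the post; the names HIGHER_VOWELS, REFLEXES and mc_initial are hypothetical.
HIGHER_VOWELS = {"ə", "i", "u"}
LOWER_VOWELS = {"a", "e", "o"}

# Middle Chinese initial, keyed by (EOC initial, vowel class).
REFLEXES = {
    ("d", "higher"): "dʑ",  # 時 *də > *dʑɨ
    ("l", "higher"): "j",   # 怡 *lə > *jɨ
    ("d", "lower"): "d",    # no palatalization before lower vowels
    ("l", "lower"): "d",    # 大 *lats > *da̤(j)
}


def mc_initial(eoc_initial: str, eoc_vowel: str) -> str:
    """Return the MC initial for a bare EOC *d- or *l- syllable."""
    if eoc_vowel in HIGHER_VOWELS:
        vowel_class = "higher"
    elif eoc_vowel in LOWER_VOWELS:
        vowel_class = "lower"
    else:
        raise ValueError(f"not one of the six EOC vowels: {eoc_vowel!r}")
    return REFLEXES[(eoc_initial, vowel_class)]


if __name__ == "__main__":
    for initial, vowel in [("d", "ə"), ("l", "u"), ("l", "a")]:
        print(f"*{initial}{vowel} -> *{mc_initial(initial, vowel)}-")
```

A syllable preceded by a higher-vowel presyllable during the period of height harmony would need an extra step before this lookup, which the sketch does not attempt.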

²All other sources of MC *d- (converted from Baxter and Sagart's 2014 reconstruction):

1. EOC *nasal preinitial or *nasal-ʌ-presyllable + *t-

奠 EOC *N-ten-s > MC *de̤n 'to be fixed (v.i.)'

奠 EOC *m-ten-s > MC *de̤n 'to set forth (v.t.)'

突 EOC *mʌ-tʰut > MC *dot 'to burst through'

毒 EOC *mʌ-duk > MC *dok 'to poison'

with a longer version of the volitional prefix in 奠 EOC *m-ten-s 'to set forth'

2. EOC *Cʌ.d/l-

道 EOC *kʌ.luʔ > *kʌ.lʌuʔ > MC *da̰w 'way'

cf. Proto-Hmong-Mien *kləuʔ 'way', a borrowing from Chinese

3. EOC *Cɯ.d/l- + *a/e/o > *C.d/l- + *a/e/o

The presyllabic higher vowel was lost by the period of height harmony, so it could not trigger partial raising of the vowel of the main syllable.

4. EOC *mV- + *r- > Early Middle Old Chinese (MOC) *mr- > Late MOC *d-

There were two waves of *mV.r- simplification: this one (1) and a later one (2):

Stage \ Simplification wave
1. EOC
2. Early MOC
3. Late MOC
4. MC

Examples of the two waves:

Wave 1: 逮 EOC *mʌ.rəp-s > Early MOC *mrʌəts > Late MOC *dəts > MC *də̤j 'to reach to'

Like Schuessler, I would normally prefer to reconstruct *l- instead of *r-, but for the moment I want to make the Baxter-Sagart *r- work within my system. For the logic behind *r-, see Baxter and Sagart (2014: 133-134).

Wave 2: 埋 EOC *mʌ.rə > Late MOC *mrʌə > MC *mɛj  'to bury'

I could treat cases like

萏 EOC *CV-romʔ  > MC *də̤m, second syllable of MC 菡萏 *ɣə̤mdə̤m 'lotus flower'

cf. Baxter and Sagart 2014's *rˁomʔ which should normally become MC *la̰m, not *da̰m!

as examples of wave 1 with presyllabic *m-, though once again I would prefer to reconstruct *l- instead of *r-.

Baxter and Sagart (2014: 134) posit different developments in different dialects instead of two waves within the same language.

5. EOC *N.r- + *a/e/o

 蕩 EOC *N.raŋʔ > MC *da̰ŋ 'to beat furiously (heart)'

Again, I would normally prefer to reconstruct *l- instead of *r-, but for the moment I want to make the Baxter-Sagart *r- work within my system. This word is not in Schuessler (2009), but it would have *l- in that book's system.

³I hesitate to say 'Koreanic' since I do not know if non-Korean Koreanic languages also had higher/lower vowel systems. Korean height vowel harmony seems to be an internal innovation dating long after EOC; it may be due to contact with Jurchen to the north.

THEY KANTU BE COGNATES

While looking for cognates of Vietnamese trai ~ giai 'boy' outside Vietic for my last post, I discovered a Kantu noncognate ʔandrus 'male, man' (L-Thongkum 2001¹). If a layman saw these three words and were asked to pick the one word not related to the other two, they'd choose nara-:

Kantu ʔandrus 'man'

Ancient Greek andrós 'man (genitive singular)'

Sanskrit náras 'man (nominative singular)'

But of course the last two are from Proto-Indo-European *ʕnḗr 'man'. The Ancient Greek nominative singular anḗr is almost identical apart from the epenthetic -a-.

The direct Sanskrit cognate of anḗr is nā́ 'man'. The loss of *ʕ- and the shift of *ḗ to ā́ are regular; the loss of *-r is not² (compare with PIE *dʰwṓr > Skt dvār 'door' which retains *-r but has a different irregularity - d- instead of dh-).

Sanskrit nár-a- is an extended version of the same word with an -a- suffix.

The Kantu word has a compressed variant ndrus. Kantu is a Katu dialect; other Katu varieties and Souei have

- shifted -s to -jh

(cf. Old Chinese *-ts, *-ps > Late Old Chinese *-s > Early Middle Chinese *-jh)³

- lost the nasal

- or added a prefix

assuming that they derive from Sidwell's (2005) Proto-Katuic *ʔndruːs 'male, man':

Katu (Triw) ʔandruːjh 'male, man'

Katu (Phuong) trus ~ padrɨjh 'boy, man'

why two different rhymes? different dialects?

Katu (An Diem) padruːjh 'boy, man'

Souei kantruah 'male, man'

¹Why isn't this word visible when I view L-Thongkum (2001) using the "build custom dictionary" option in the SEAlang Mon-Khmer database?

²This loss is regular for r̥-stems like nr̥ for nā́, but it is not regular for Sanskrit as a whole (hence the *-r-retention in 'door').

³18.7.6.22:21: An even more relevant parallel is in Vietnamese:

*-s > *-ɕ > *-jh > hỏi/ngã tone (depending on voicing of the *onset) + /j/ as in mũi 'nose'

Thavung mús 'nose' (Premsirat 2000) retains the original *-s. Ruc muːʃ  'nose' (Phu 1998) is like my intermediate stage *-ɕ between *-s and *-jh; another Ruc form, muᵊh (Phu 1998), has no final palatal segment.

In Chinese, primary and secondary *-s generally had two different reflexes which are like those of Vietnamese *-h and *-s:

Early Old Chinese
Middle Old Chinese *-s
Late Old Chinese
Middle Chinese
'departing tone'
*-j + 'departing tone'

The general pattern of mergers (four categories into two) is clear, but the phonetic details are not: e.g., perhaps *-ks became *-x and merged with *-h from *-s at the Middle Old Chinese stage:

Early Old Chinese
Middle Old Chinese *-h
Late Old Chinese
Middle Chinese
'departing tone'
*-j + 'departing tone'

The two scenarios above need not be mutually exclusive, as they - and others I have not yet imagined - could represent what happened in different varieties of Old Chinese.

Late Old Chinese secondary *-s may have been phonetically *[ɕ], a simplification of *[tɕ] < *[ts] < *[ts] and *[ps]. I am unaware of any evidence for reconstructing an affricate as a source of Vietnamese *-s.

The high-frequency Early Old Chinese word  大 *lats 'big' has modern reflexes with and without [j]: e.g., Taiwanese tuā < *las and tāi < *lats. (The macron represents a Taiwanese 'departing tone' that developed after *voiced initials.) In the case of Taiwanese, tuā is native and tāi is borrowed, but I do not know if such an explanation can account for the presence or absence of [j] elsewhere.

I am not aware of evidence pointing toward a dialect in which all *-ts (and *-ps?) merged with *-s and *-ks as *-s, though there is no a priori reason for doubting that such a massive merger could happen.

High-frequency words may be subject to greater erosion, so perhaps *lats had an abbreviated variant *las that became the ancestor of standard Mandarin dà (as opposed to standard Mandarin dài < *lats).

A CHRONOLOGY OF ANDROGRAPHY

Looking at the many Chữ Nôm spellings of Vietnamese trai ~ giai < *plaːj 'boy' made me realize how useful it would be to have a Chữ Nôm dictionary with characters organized by chronology and geography. Even without any dated manuscripts on hand, I can make guesses about the ages of spellings by the sound changes that they reflect. (See my last post on the history of Vietnamese *pl-.) The spellings fall into five strata listed below in approximate chronological order. (The dating of the fifth category is uncertain.)

1. *p-l-spellings reflecting *plaːj

2. *l-spellings which may reflect *plaːj or *tlaːj (i.e., postdate the shift of *pl- to *tl-)

3. a *t-l-spelling which postdates the shift of *pl- to *tl- and the shift of *s- to *t-

4. a tr-spelling which postdates the merger of *tl- and *ʈ- as tr-

5. gi-spellings which postdate the shift of *kj- to gi-; they may reflect *, or, more likely, an alternate (dialectal?) development of *pl- as gi-.

Graph Semantic Phonetic 1 Phonetic 2

none ba < *p- lai

nam 'man'
lai none
⿱司來 none
< *s-

trai < *ʈ- none

giai < *kj-
nam 'man'
giai distorted as 隹 chuy¹
⿰男皆 giai < *kj-

All of the above are listed under the reading trai (including 佳 despite the fact that the Sino-Vietnamese reading of 佳 is giai) except for the last two which are listed under the reading giai.

There is no graphic evidence for *CV.plaːj which I thought might be the source of giai: i.e., there are no characters with a phonetic component corresponding to my hypothetical presyllable. In theory, a sesquisyllable like *kV.plaːj could have been written as a combination of *kɤː and *paːj characters: e.g., ⿰居拜.

There is also no comparative evidence within Vietic for *CV.plaːj as opposed to *plaːj. I found these forms outside Vietic at the Mon-Khmer Comparative Dictionary:

Katuic branch:

Katu (An Diem) mblɑːj 'unmarried man' (Costello 1971)

Palaungic branch:

Lawa (Mae Sariang) [kuan] mblia, mbluai 'young man' (Shorto 2006)

Is the [pi] in Lawa (Bo Luang) [pi]-plia 'young man' (Shorto 2006) reduplicative?

Old Mon blāy 'young man' (Shorto 1971) is the same word, albeit without any element before bl-.

Shorto (2006) reconstructed a Mon-looking Proto-Mon-Khmer *blaːj 'young man' which I am tempted to revise as *m.blaːj on the basis of Katu and Lawa. But perhaps there is another explanation for the shared m- in Katu and Lawa. It may be significant that Sidwell (2005, 2010) did not reconstruct this word in Proto-Katuic or Proto-Palaungic.

But I doubt Vietnamese trai ~ giai is related to those non-Vietic words because it does not go back to *b- ... -j; its tone points to voiceless *p-, and its -i is from *-l which is still preserved in more conservative Vietic languages (look up 'man' at the Mon-Khmer Comparative Dictionary for examples).

There is no Chữ Nôm evidence for *-l in the spellings of 'boy' or any other words, which leads me to think that the change of *-l to *-j was complete before Vietnamese was first written. Otherwise I would expect *-l words to be at least sporadically spelled with *-w, *-t and/or *-n phonetics. (Chinese had long ago lost its *-l by the time Vietnamese was first written, so the only options for writing Vietnamese *-l were *-j, *-w, *-t, or *-n.) *-w seems unlikely since *-l must have been palatal [ʎ] or palatalized [lʲ] before becoming *-j. I cannot yet dismiss the possibility that *-l was consistently written with *-j phonetics. If Chữ Nôm reflects some feature lost before *-l became *-j, then I would have no choice but to assume that was the case.

7.1.21:52: Wiktionary has the following etymology:

From Proto-Vietic *p-laːl, from Proto-Mon-Khmer *bplaaj, an infixed form of *blaaj (“young man”); cognate with Mon ဗၠဲာ (plai, “bachelor, unmarried male past age of puberty”).

*bp- explains the *voiceless tone and even fits my hypothesis of gi- coming in part from * I could even claim that Lawa (Bo Luang) [pi]-plia preserves the original presyllable. So maybe:

*bil- > *bi-p-l- (infixation; *voiceless tone on stressed syllable) > *bi.βj- > *CV.ʑ- > gi-

But the problem of Vietic having *-l instead of *-j remains, unless one reconstructs a Proto-Austroasiatic *-ʎ to account for the mismatch in codas.

* *-j

I would rather not reconstruct *-ʎ to save a single troubled etymology, though. And I think it's still too early to reconstruct Proto-Austroasiatic.

As for the presyllable, it is unnecessary if gi- is regarded as a dialectal development:

The sound change from *p-l and *b-l to ‹gi› is a regular sound change in Northern Vietnamese; compare giồng, giầu, giời and giun.

trai would then have to be a southern loanword in northern Vietnamese coexisting alongside native giai.

¹隹 chuy 'bird' makes no sense from a semantic or phonetic perspective. At first I thought it might be an abbreviation of 雄 hùng 'male' in 𪟦, but it's more likely to be a reshaping of the phonetic 佳 giai. 隹 is a frequent right-hand element of Chinese characters whereas 佳 is not a right-hand element in Chinese and is not common in that position in Chữ Nôm. (I don't know of any other Chữ Nôm characters with the structure ⿰X佳.)

FRUITS OF THE SKY

I thought Pittayaporn's Proto-Tai cluster *ɓl- was unusual until I remembered that Middle Vietnamese had /ɓl/ in words such as blái /ɓláːj/ 'fruit' whose Chữ Nôm spelling 𢁑¹ is from 巴 ba + 賴 lại. That /ɓl/ is not very old; it goes back to Proto-Vietic *pl-, and some Vietic languages preserve the original *p- to this day: e.g., Ruc pəlíː 'fruit' (Phu 1998). Vietnamese shifted *p- to b- /ɓ/ both before vowels and before *l-.² The modern Vietnamese word for 'fruit' is trái (northern /cáːj/, central-southern /ʈáːj/). lists a variant lái - is this current, and if so, where?

Another source of Middle Vietnamese /ɓl/ is *bl-: e.g., Middle Vietnamese blời 'sky' (the huyền tone written with a grave accent points to a voiced proto-initial). Cf. Ruc pləːj < *b- 'sky' (Phu 1998).

As one can see at the Wiktionary entry for blời, its modern reflexes are trời and giời. tr- is the initial I'd expect since *Cl- clusters normally merge as tr-. gi-, on the other hand, has long puzzled me since it is normally from presyllable-palatal sequences and *kj-.

It just occurred to me that trời is from 'bare' *blơi whereas giời may be from *CV.blơi with a presyllable whose vowel conditioned the lenition of the following consonant which became gi- (possibly a voiced palatal stop [ɟ] in Middle Vietnamese).  In Hanoi, retroflex tr- /ʈ/ < *Cl- became palatal ch- /c/. So perhaps * similarly became gi- which was palatal [ɟ] in Middle Vietnamese:

* > *CV.βj- > *CV.ʑ- > gi-

or if a prefix were added to *b- after it devoiced to *p- and imploded to *ɓ-:

*bl- > *pl- > *ɓl-CV-ɓl- > *CV-ɓj- > *CV- > gi-

(*ʄ is a palatal implosive stop. Cf. the palatal stop pronunciation [ɟ] of gi- in Vinh, Thanh Chương, and Hà Tĩnh.)

Similar changes could be proposed for *(CV.)pl-:

* > * > *CV.βj- > *CV.ʑ- > gi-

*pl- > *ɓl-CV-ɓl- > *CV-ɓj- > *CV- > gi-

*(CV.)pl-words would have upper register tones (ngang, sắc, hỏi) conditioned by *voiceless initials, whereas  *(CV.)bl-words would have lower register tones (huyền, nặng, ngã) conditioned by *voiced initials: e.g.,

*CV.pla, *CV.plaʔ, *CV.plah > gia, giá, giả 


*CV.bla, *CV.blaʔ, *CV.blah > già, giạ, giã
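The six outcomes above follow mechanically from two features: the voicing of the proto-initial picks the register, and the final (none, *-ʔ, *-h) picks the tone within it. A minimal sketch of that mapping, assuming the hypothetical *CV.pl-/*CV.bl- forms just given and a single-vowel rhyme; the function name and the use of Unicode combining marks are my own illustrative choices.

```python
import unicodedata

# (voicing of proto-initial, final) -> (tone name, combining diacritic).
# The pairings are the ones stated in the text; the table layout is assumed.
TONES = {
    ("voiceless", ""): ("ngang", ""),
    ("voiceless", "ʔ"): ("sắc", "\u0301"),   # acute accent
    ("voiceless", "h"): ("hỏi", "\u0309"),   # hook above
    ("voiced", ""): ("huyền", "\u0300"),     # grave accent
    ("voiced", "ʔ"): ("nặng", "\u0323"),     # dot below
    ("voiced", "h"): ("ngã", "\u0303"),      # tilde
}


def gi_reflex(proto: str) -> str:
    """Map a form like '*CV.plaʔ' to the gi- spelling predicted in the text."""
    main = proto.lstrip("*").split(".")[-1]  # main syllable, e.g. 'plaʔ'
    onset, rest = main[:2], main[2:]
    if onset not in ("pl", "bl") or not rest:
        raise ValueError(f"expected a *(CV.)pl-/bl- form, got {proto!r}")
    voicing = "voiceless" if onset == "pl" else "voiced"
    final = rest[-1] if rest[-1] in ("ʔ", "h") else ""
    vowel = rest[:-1] if final else rest
    _tone_name, mark = TONES[(voicing, final)]
    # Compose vowel + combining mark into a single precomposed letter.
    return unicodedata.normalize("NFC", "gi" + vowel + mark)


if __name__ == "__main__":
    for form in ["*CV.pla", "*CV.plaʔ", "*CV.plah",
                 "*CV.bla", "*CV.blaʔ", "*CV.blah"]:
        print(form, ">", gi_reflex(form))
```

The sketch says nothing about whether the presyllable existed; it only restates the register split that any *(CV.)pl-/*(CV.)bl- account would have to produce.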

There are two major problems with those accounts.

First, I don't know of any comparative evidence for a presyllable in 'sky'. If there were such a presyllable, it might have to be a prefix that was a Vietnamese-internal innovation. Maybe the development of *pl-/*bl- to gi- paralleled that of *kj- to gi-:

*pl- > *pj- > bj- > gi-

*kj- > *gj- > gi-

Second - and this applies to my no-presyllable solution as well - I don't know of any other instance of *-l- becoming *-j-, though perhaps such a stage could bridge *Cl- and Hanoi ch- /c/:

*Cl- > *tl- > *tj- > ch- /c/

Dialects that have retroflex reflexes of *Cl- had no *-j-stage:

*Cl- > *tl- > *tr- > tr- /ʈ/

l- ~ nh- /ɲ/ variation in Vietnamese (Thompson 1987: 70) suggests that *l may have once been palatal [ʎ]: e.g., hai mươi lăm ~ northern hai mươi nhăm 'twenty-five' (but năm 'five' and mươi lăm 'fifteen', not †mươi nhăm).

I don't know of any modern Vietnamese dialects that have labial reflexes of Middle Vietnamese /ɓl/, but I know almost nothing about Vietnamese dialects. If not for Middle Vietnamese or more conservative Vietic languages, I wouldn't be able to reconstruct a labial initial in 'fruit' or 'sky'.

The reflexes³ of Middle Vietnamese /ɓl/ -

Vinh, Thanh Chương, Hà Tĩnh
Huế, Saigon

(IPA from the Wiktionary entries for giời and trời.)

- are quite different from those of Proto-Tai: /bl bj b mj m ɗ d l n/. In fact none match!

(Mostly written 18.6.18; revised, expanded, and finished 18.6.29.)

¹There are other spellings, all with two components:

Graph Semantic Phonetic 1 Phonetic 2
⿰來巴 none lai ba
⿱巴乃 quả 'fruit' ba nãi
𣛤 lai none
𣡙 lại
𧀞 lại

The use of an n-phonetic 乃 nãi for a syllable with /l/ makes me wonder if *pn- merged with *pl-. After such a merger, one might spell *pn- (now *pl-) words with both n- and l-phonetics, and one might also spell *pl-words such as 'fruit' with both types of phonetics.

²But not *-r-: *pr- in 'squirrel' became s- [ʂ] (sóc), not †br- [ɓr]. Khmer កំប្រុក <kaṁpruk> 'squirrel' retains the original cluster (Gage 1985: 506). This could imply that

*pr- > *pr̥- > *pʂ- > s- [ʂ]

predated *p- to b- [ɓ]: i.e., that *p- in *pr- was lost before it could implode to †ɓ-. Then again, those are a lot of steps, and perhaps *pr- was also subject to implosion:

*pr- > *ɓr- > *ɓʐ- > *ʐ- > s- [ʂ].

³Strictly speaking, gi- is not a reflex of /ɓl/ if I am right about it being from * rather than from /ɓl/ < *pl-/*bl-.

⁴[ɟ] may be the closest living approximation of the Middle Vietnamese consonant that de Rhodes wrote as gi- in the 17th century. Cf. the Italian pronunciation of gi- as [dʒ].

RETROFLEXES FROM DENTALS IN ZHENGZHANG AND PAN'S OLD CHINESE RECONSTRUCTIONS

Middle Chinese has a series of retroflex initials

*ʈ- *ʈʰ- *ɖ- *ɳ-
*tʂ- *tʂʰ- *dʐ-

which in the West are thought to come from Old Chinese *r-clusters. Contrast these two words in Baxter's Old Chinese reconstruction and their Middle Chinese reflexes in my reconstruction:

專 OC *ton > MC *tɕwɨen 'exclusively'

傳 OC *tron-s > MC *ʈwɨe̤n 'what is transmitted'

OC *t- palatalized to MC *tɕ-, whereas OC *tr- became retroflex *ʈ-.

I would reconstruct the two words as OC *Cɯ.ton and *Rɯ.ton-s with high-vowel presyllables triggering the diphthongization of the following vowel:

Stage 1
*Cɯ.Con presyllable present
no effect of presyllable on following syllable
Stage 2
*Cɯ.Cuon highness of presyllabic vowel transferred onto beginning of next vowel
Stage 3
*Cwɨan presyllable lost
labiality of *uo shifted to onset; *ɨa is *uo stripped of its labiality
Stage 4
*Cwɨen *a fronted to *e before the acute coda *-n
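
The four stages above can be traced mechanically. The following is only a minimal Python sketch under my own assumptions: the function name is hypothetical, forms are plain strings of the shape '*Cɯ.Con', and nothing beyond the four stages in the table (for instance, palatalization of the initial) is modeled.

```python
def vocalic_transfer_stages(form):
    """Return the four stages for a hypothetical form such as '*Cɯ.Con'."""
    presyllable, main = form.lstrip("*").split(".")
    # Stage 1: presyllable present, no effect on the following syllable.
    stage1 = f"*{presyllable}.{main}"
    # Stage 2: the highness of the presyllabic vowel spreads onto the start of *o.
    main2 = main.replace("o", "uo", 1)
    stage2 = f"*{presyllable}.{main2}"
    # Stage 3: the presyllable is lost; the labiality of *uo shifts to the onset,
    # leaving *ɨa.
    main3 = main2.replace("uo", "wɨa", 1)
    stage3 = f"*{main3}"
    # Stage 4: *a fronts to *e before the acute coda *-n.
    stage4 = stage3[:-2] + "en" if stage3.endswith("an") else stage3
    return [stage1, stage2, stage3, stage4]


for stage in vocalic_transfer_stages("*Cɯ.Con"):
    print(stage)
```

The sketch treats each stage as a single string rewrite, which is enough to show that the order of the stages matters: the labial glide in stage 3 only exists because stage 2 first created *uo.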

*Rɯ- fuses with the following consonant: *Rɯ.t- > *rt- > *tr- > *ʈ-.

But the basic pattern remains: *t- palatalizes, whereas a *t- plus *R- sequence results in *ʈ-.

On the other hand, Zhengzhang Shangfang and Pan Wuyun reconstructed 'what is transmitted' without *r:

Gloss                  Old Chinese                                       Middle Chinese
                       Baxter & Sagart   This site     Zhengzhang, Pan
exclusively            *ton              *Cɯ.ton       *tjon             *tɕwɨen
what is transmitted    *tron-s           *Rɯ.ton-s     *tons             *ʈwɨe̤n

The palatalization of *tj- to *tɕ- makes perfect sense. But why would *t- back to *ʈ- in 'what is transmitted' and other 'type B' syllables with short vowels in Zhengzhang and Pan's reconstructions? (Their short vowels correspond to the absence of pharyngealization in Baxter and Sagart's reconstruction and the presence of high vowels in my reconstruction.)

Old Chinese
Middle Chinese
Baxter & Sagart
This site

*Rɯ.ta *ta

*truŋ *tuŋ

to ascend
*Rɯ.tək *tɯɡ

*Rɯ.te *ʔl'e

tree root
*Rɯ.to *to

Pan's *k-l- shifting to *ʈ- is a change also found in Vietnamese.

Zhengzhang's *ʔl'- (what is *'-?) shifting to *ʈ- is a similar change.

There is no *tri in Baxter and Sagart's system or mine, so there may not be a *ti in Zhengzhang and Pan's systems.

*R could be *r- or *l-; the *R- of *Rɯ.tək 'to ascend' may have been *l- if Written Tibetan ltag-pa 'upper part' is cognate.

I don't know if Zhengzhang or Pan have a *te, but I would expect their *te to have the same Middle Chinese reflex as 'know'.

For comparison, Pan and Zhengzhang's *t- does not become retroflex in 'type A' syllables with long vowels corresponding to the presence of pharyngealization in Baxter and Sagart's reconstruction and the presence of low vowels in my reconstruction. (I use *ʌ as a symbol for an 'unknown unstressed lower vowel'.)

Old Chinese
Middle Chinese
Baxter & Sagart
This site

capital city
*ta *taː

western tribes
*Cʌ.ti *tiːl

*Cʌ.tuʔ *tuːwː

*Cʌ.tək *tɯːg

son of principal wife
*tˁek *tek

*tˁo *to

Summing up the patterns (and adding type A syllables with Middle Chinese retroflexes for completeness):

Syllable type
Old Chinese
Middle Chinese
Baxter & Sagart
This site
*ti, *Cɯ.tA
*RtI, Rɯ.tA, *trI
*tˁr- *RtA, *Rʌ.tI, *trA
*tA, *Cʌ.tI *tVː

I use *I to symbolize the stressed higher vowel series *ə *i *u and *A to symbolize the stressed lower vowel series *a *e *o.

Zhengzhang and Pan do allow *tr-type clusters to become retroflexes. But why would simple *t-initials also become retroflexes before short vowels? I have never seen that change anywhere else.

My guess is that Zhengzhang and Pan both observed the high frequency of retroflex initials in Middle Chinese and chose to reconstruct single dentals as their sources. Phonostatistics could be suggestive. If, for instance, the proportion of *ʈ- to *tɕ- in Middle Chinese B-type syllables were three to one, then it might make sense to reconstruct their Old Chinese sources as *t- and *tj- instead of as *tr- and *t-, since simple initials are normally more common than clusters. (Whether short vowels make sense as a conditioning factor for retroflexion is another matter.)

PITTAYAPORN'S PROTO-TAI CONSONANTS (PART 1: *ɓl-)

I can't even remember how many series I've started and never finished this year alone. Yes, I have a short attention span. I also have excessive ambition. I keep picking - or stumbling on - extremely complex topics that I can't tackle in a day. What I think are bite-size pieces just keep growing. Let's see how small I can keep this.

Pittayawat Pittayaporn's (2009: 149) reconstruction of Proto-Tai has only one cluster with an implosive: *ɓl-. That initially struck me as unusual because I am accustomed to Southeast Asian languages with complex onsets but no clusters with implosives as first elements: e.g., Pyu, Mon, and Khmer. But then I remembered that Middle Vietnamese had bl- /ɓl/.

I don't know of any modern Vietnamese dialects that have labial reflexes of Middle Vietnamese /ɓl/, but I know almost nothing about Vietnamese dialects. If not for Middle Vietnamese, I wouldn't be able to reconstruct such a reflex.

On the other hand, there are labial reflexes in modern Tai languages that make the reconstruction of Proto-Tai *ɓl- possible. Below I cite reflexes of Proto-Tai *ɓlɯən A 'moon', mostly from Pittayaporn (2009) and Hudak (2008), plus 扶绥 Fusui and Shan forms from the Austronesian Basic Vocabulary Database:

1. *ɓl-type reflexes

1a. /bl-/: Saek /bliən A1/ 'moon' (only Saek retains *-l-)

1b. /bj-/: (Bao Yen /bjɔːk DL1/ < Proto-Tai *ɓloːk D 'flower')

1c. /mj-/ (mentioned as a reflex in Pittayaporn 2009: 150; unable to find examples)

cf. Vietnamese *ɓ- > m- (but the reverse may have occurred in Pyu!)

2. *ɓ-type reflexes

2a. /b-/: Shangsi /bun A1/

cf. Shangsi /boy A1/ < Proto-Tai *ɓaɰ A 'leaf'

is Shangsi /oy/ [oj] or [oɥ]?

2b. /m-/: Fusui /mɯːn A1/

cf. Fusui /mɯj A1/ < Proto-Tai *ɓaɰ A 'leaf'

(Shan /mɔk DL1/ < Proto-Tai *ɓloːk D 'flower'; see 3c below for the Shan word for 'moon')

3. *ɗ-type reflexes (*ɗ- combines the implosion of *ɓ- with the place of articulation of *-l-)

3a. /ɗ-/: Wuming /ɗɯan A1/

cf. Wuming /ɗoj A1/ < Proto-Tai *ɗɤj A 'good'

3b. /d-/: Thai /dɯən A1/

cf. Thai /diː A1/ < Proto-Tai *ɗɤj A 'good'

3c. /l-/: Shan /lɤn A1/ (but cf. 'flower' in 2b above!)

cf. Shan /li A1/ [liː] < Proto-Tai *ɗɤj A 'good'

3d. /n-/: Po-ai /nɯːn A1/

cf. Po-ai /niː A1/ < Proto-Tai *ɗɤj A 'good'

cf. Vietnamese *ɗ- > n- (no evidence for the reverse in Pyu which had no /ɗ/)

I started writing about Proto-Tai *ɓlɯən A 'moon', but I'm going to move that to a post of its own. This post started on the 17th and has taken me five days to finish. I don't want peripheral material to hold it back any longer.

MEITEI PHONOLOGY

Meitei is an isolate within the Sino-Tibetan family. Its speakers constitute the majority of the population in the Indian state of Manipur on the border with Burma.

I originally wanted to write a post with the characters of the Meitei script reorganized for my own convenience in the standard Indic order, but KompoZer doesn't support the Meitei range of Unicode for some reason, so I'm going to write about Meitei phonology instead based on Chelliah (2016).

Consonant phonemes

native initial (ideophones only)
native medial
borrowed initial/medial
borrowed only?




r l



I am not sure whether voiced aspirates appear in native words. In the Wikipedia article on the Meitei native religion of Sanamahism, some names of native deities are spelled with -dhou after u (in the recurring name element Ebudhou),  but others have -thou after -ng. That suggests native voiceless aspirates might have voiced in intervocalic position.

I do not know the origin of voiced obstruents in medial position in native words. Two scenarios with hypothetical examples:

Scenario 1

Initial devoicing but retention of voiced series in medial position:

*gaka > kaka

*kaga > kaga

Scenario 2

Medial voicing of voiceless series; development of new medial voiceless series from something else:

*kaka > kaga

*kaXa > kaka
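
To make the contrast concrete, here is a minimal Python sketch of the two scenarios applied to the hypothetical forms above. The function names and the tiny stop inventory are my own assumptions for illustration; *X stands for the unknown source of the new medial voiceless series, as in the examples.

```python
# Hypothetical stop pairs; only velars appear in the examples above.
DEVOICE = {"g": "k", "d": "t", "b": "p"}
VOICE = {voiceless: voiced for voiced, voiceless in DEVOICE.items()}


def scenario_1(form):
    """Initial devoicing; the voiced series is retained medially."""
    return DEVOICE.get(form[0], form[0]) + form[1:]


def scenario_2(form):
    """Medial voicing of the voiceless series; *X then supplies new medial k."""
    head, rest = form[0], form[1:]
    voiced_rest = "".join(VOICE.get(ch, ch) for ch in rest)
    # *X > k must apply after voicing, or the new k would be voiced as well.
    return head + voiced_rest.replace("X", "k")


print([scenario_1(f) for f in ("gaka", "kaga")])
print([scenario_2(f) for f in ("kaka", "kaXa")])
```

Both scenarios end with the same surface forms (kaka and kaga), which is why the modern distribution alone cannot decide between them.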

Although Meitei borrowed voiced aspirates from Indo-Aryan, it did not borrow retroflex consonants.

I can't find a character in the Meitei script for /cʰ/. Are /c/ and /cʰ/ both written with U+ABC6 <c>?

Vowel phonemes

Meitei has a six-vowel system identical to that of Old Chinese and pre-Tangut:


Pyu has a similar seven-vowel system with an additional distinction between front /ä/ and nonfront (back?) /a/.

It would be asking too much for Meitei vowels to precisely line up with those of Old Chinese, pre-Tangut, or Pyu. Nonetheless it is nice to see these mostly straightforward correspondences for the Meitei numerals at Omniglot.

Old Chinese
/p.lä/ < *-e
*C.ŋaʔ *P.ŋa /pə.ŋa/

Needless to say, numerals alone are insufficient evidence for a genetic relationship, as they can be borrowed: e.g., Proto-Tai has *saːm A 'three', *siː B 'four', *haː C 'five', and *krok D 'six' from Chinese (Pittayaporn 2009). (*soːŋ A 'two' is from Late Old Chinese 雙 *ʂɔŋ 'pair'.) What gives away the Chinese origin of the numerals are Chinese-internal innovations: e.g., the irregular lowering of *-u- in 'three' and the loss of *-l- in 'four'.

SHARP-HEAVY, ENTERING/DEPARTING, AND B/D

A generic model for tonogenesis in the Sinosphere involves four categories of final consonants.

Category names
Hmong-Mien Kra-Dai
ngang 'even' / huyền 'dark'
平 'level'
sắc 'sharp' / nặng 'heavy'
上 'rising'
hỏi 'ask' / ngã 'fall'
去 'departing'
*nonglottal stops
sắc 'sharp' / nặng 'heavy' 入 'entering'

To demonstrate these categories, I will use hypothetical examples for simplicity.

Vietnamese has six categories. Each proto-category split in two depending on the voicing of the proto-initial consonant, and as I'd expect, all syllables with final stops developed the same tones (sắc/nặng).

*kaʔ > cá (sắc)

*kak > các (sắc)

*gaʔ > cạ (nặng)

*gak > cạc (nặng)
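
The Vietnamese pattern amounts to a two-way lookup: the final picks a pair of tones, and the voicing of the proto-initial picks one member of the pair. A minimal Python sketch, with several assumptions of mine: the function name, the small set of voiced initials, and the use of a final *-h as the source of hỏi/ngã are all only for illustration with hypothetical forms like those above.

```python
# Tone pairs: (after *voiceless initial, after *voiced initial).
NGANG_HUYEN = ("ngang", "huyền")
SAC_NANG = ("sắc", "nặng")
HOI_NGA = ("hỏi", "ngã")

VOICED_INITIALS = set("bdgɟ")  # hypothetical inventory for the examples
FINAL_STOPS = set("ʔptck")     # glottal and nonglottal stops pattern together


def vietnamese_tone(proto):
    """Map a hypothetical proto-syllable such as 'kaʔ' to a tone name."""
    initial, final = proto[0], proto[-1]
    if final in FINAL_STOPS:
        pair = SAC_NANG
    elif final == "h":
        pair = HOI_NGA
    else:
        pair = NGANG_HUYEN
    return pair[1] if initial in VOICED_INITIALS else pair[0]


for proto in ("kaʔ", "kak", "gaʔ", "gak"):
    print(proto, vietnamese_tone(proto))
```

The point of the sketch is that the lookup never needs to know whether the final stop was glottal or not, which is exactly what distinguishes Vietnamese from the languages discussed next.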

In White Hmong, syllables with initial *voiced consonants and final *glottal stops developed the same tone as syllables with initial *voiceless (!) consonants and final *nonglottal stops  (Ratliff 2010: 184).

*gaʔ > kas

*kak > kas

(-s represents a low tone.)

However, in other Sinospheric languages, syllables with final *-h and syllables with final *nonglottal stops may develop similar or identical tones:

1. In standard Mandarin, there is a weak tendency for syllables which once had final *nonglottal stops to have the same high falling tone as syllables which once had final *-h.

*kah > ku (high falling tone)

*kak > (high falling tone)

2. In Cantonese, syllables with final nonglottal stops have noncontour tones like syllables which once had final *-h.

*kah > kuː (mid level tone)

*kak > kɔːk (mid level tone)

3. Gedney (2008) has descriptions of the tone systems of 19 Tai varieties. In all of them, there is at least partial overlap between the tones of the B and D categories: e.g., in Thai, syllables of those categories almost always have the same tones:

*ka(ː)ʔ > kaː (low tone)

*ka(ː)k > ka(ː)k (low tone)

*ga(ː)ʔ > kʰaː (falling tone)

*gaːk > kʰaːk (falling tone)

but *gak > kʰak (high tone!)

I suspect there was no distinction between */Vʔ/ and */Vːʔ/.

There is a strong tendency for B tones to overlap with D tones with long vowels. Conversely, D tones with short vowels (e.g., *gak in the hypothetical Thai example) tend to go their own way.

Today I realized what might have led to similar tones in the B and D categories (and their equivalents outside Tai). Contrast these two scenarios:

Scenario 1

Stage 1
Stage 2
V + tone 1
+ tone 2
V + tone 3
Vk + tone 2

Scenario 2

Stage 1
Stage 2
*V̰ʔ *Vh
Stage 3
*V̰ *Vh
Stage 4
V + tone 1
+ tone 2
V + tone 3
V(k) + tone 3

Scenario 1 is straightforward; all syllables with *stops develop the same tones. This is what happened in Vietnamese.

Scenario 2 is more complicated.

In stage 2, final *glottal stops condition *creaky voice which is nonphonemic (= predictable and hence nondistinctive).

In stage 3, final glottal stops are lost, and *creaky voice becomes phonemic (= unpredictable and hence distinctive). *V̰ no longer ends in a stop, so it loses its phonetic resemblance to *Vk which still ends in a stop.

In stage 4, modal and creaky-voiced syllables develop tones 1 and 2, whereas syllables ending in obstruents develop tone 3.

Pittayaporn (2009) posits a third scenario for Proto-Tai to which I add a top row (he does not reconstruct segmental sources for Proto-Tai tones).

Scenario 3

Stage 1
Stage 2
*Vʔ *V̰ *k
Stage 3
V + tone 1
+ tone 2
V + tone 3
Vk + tone 3

What I don't understand is why *Vh conditions creakiness (a ʔ-quality)  rather than breathiness (an h-quality).

Pittayaporn's Proto-Tai (stage 2) is somewhat like modern Burmese which distinguishes between creaky vowels and vowels followed by glottal stops:

Stage 1
Stage 2
*V̰ *V̤ *Vk
Stage 3
V + tone 1
  + (tone 2)
V + tone 3
(tone 4)

I could argue that Burmese really only has two tones, low (tone 1) and high (tone 3); it is creakiness and a final glottal stop that distinguish the other two syllable types.

The difference between Burmese and Pittayaporn's Proto-Tai is that creakiness in the former has a more straightforward source (*-ʔ) than in the latter (*-h).

Here is one more scenario that mixes elements of 2 and 3 via a chain shift in Proto-Tai:

Scenario 4

Stage 1
Stage 2
*V̰ *Vʔ *Vk
Stage 3
*V + tone A
*V + tone C
*V̰ *k
Stage 4
V + tone A
+ tone C
V + tone B
Vk + tone D

Stage 1 has no phonemic phonation or tones.

Stage 2 has a distinction between modal and creaky phonation. The latter is from *glottal stop. A new glottal stop from *-h takes the place of that lost glottal stop:

*-Vh > *-Vʔ > *-V̰

Stage 3 continues the chain shift of stage 2:

*-Vʔ > *-V̰ > *-V + tone C

The result is a system resembling that of modern Burmese.

Stage 4 is fully tonal.
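
The chain shift in scenario 4 can be checked by applying each stage as a simultaneous mapping over rime types. This is only a minimal Python sketch; the dictionary and function names are my own, and the rime labels are schematic strings rather than real forms.

```python
CREAKY = "V\u0330"  # V with a combining tilde below, i.e. creaky voice

# Each stage maps every rime type at once, so *Vh > *Vʔ does not feed
# *Vʔ > creaky voice within the same stage.
STAGE_1_TO_2 = {"Vʔ": CREAKY, "Vh": "Vʔ"}
STAGE_2_TO_3 = {"V": "V + tone A", CREAKY: "V + tone C", "Vʔ": CREAKY}
STAGE_3_TO_4 = {CREAKY: "V + tone B", "Vk": "Vk + tone D"}


def derive(rime):
    """Return the history of a stage 1 rime through stages 2 to 4."""
    history = [rime]
    for mapping in (STAGE_1_TO_2, STAGE_2_TO_3, STAGE_3_TO_4):
        history.append(mapping.get(history[-1], history[-1]))
    return history


for rime in ("V", "Vʔ", "Vh", "Vk"):
    print(" > ".join(derive(rime)))
```

Because each stage is a simultaneous mapping, the four stage 1 rime types stay distinct throughout and end up as the four tone categories A, C, B, and D.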

Scenario 4 cannot account for Thai in which tone C is still glottalized after (formerly) *voiced initials.

EARLY OLD CHINESE PRESYLLABIC *I

Baxter and Sagart (2014: 224) use the notation *A for an Old Chinese *a that has unexpected Middle Chinese reflexes with palatal elements: e.g.,

土 Old Chinese *tʰˁaʔ > Early Middle Chinese *tʰɔˀ 'earth'

is phonetic/semantic in

社 Old Chinese *m-tʰAʔ > Early Middle Chinese *dʑiæˀ 'sacrifice to the spirit of the soil'

instead of Early Middle Chinese †dʑɨəˀ without any palatal vowels

in fact, the phonetic series of 土 has no examples of Early Middle Chinese †-ɨə

Baxter and Sagart speculated that what they write as *A could actually reflect the effect of some unknown preinitial consonant on the vowel.

I propose that we are actually seeing vocalic transfer: the effect of some unknown first syllable vowel on a second syllable vowel.

That occurred to me as I was working out the development of 抯 'to pull out of water' which I originally intended to cite in the addendum to my last post until I decided to use more straightforward examples.

Early Old Chinese
*r(i)-tsa or *ts-r-a *Ci-tsa-ʔ *Ni-tsa-ʔ
Middle Old Chinese
Late Old Chinese
*tʂɤɑ *tsiæʔ
Early Middle Chinese
*tʂæ *tsiæ̰

Early Old Chinese: It is not clear whether the first variant has a prefix *r(i)- or an infix *-r-. *Ci- might be the same syllable as *r(i)-; if so, then there is a root *tsaʔ with only two prefixes, *ri- and *Ni-.

Middle Old Chinese: The *i in the first syllable has caused the following *a to break to *ia. No breaking occurred in *tsrˁa, either because its *i was lost before vocalic transfer (see the table below) or because it never had an *i (i.e., its *-r- was an infix).

Early vs. late *i-loss

Stage 1: *ri-tsa, *ri-tsa-ʔ
Stage 2 (early *i-loss): *r-tsa, *ri-tsaʔ
Stage 3 (vocalic transfer): *r-tsa, *ri-tsiaʔ
Stage 4 (late *i-loss): *tsrˁa, *tsiaʔ

*tsrˁaʔ has pharyngealization because it lacked a high vowel that would have blocked pharyngealization.

*r- and *-ts- underwent metathesis: *r-ts- > *tsr-.
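
The difference between the two variants comes down to whether *i was lost before or after vocalic transfer. Here is a minimal Python sketch of that ordering, assuming a hypothetical function name and plain string forms; it covers only the four stages of the table above and writes the suffix without a hyphen.

```python
def derive(root="tsa", early_loss=False):
    """Trace *ri- plus a root through the four stages of early vs. late *i-loss."""
    stages = ["*ri-" + root]                      # Stage 1
    presyllable = "r" if early_loss else "ri"     # Stage 2: early *i-loss (or not)
    stages.append(f"*{presyllable}-{root}")
    if "i" in presyllable:                        # Stage 3: vocalic transfer
        root = root.replace("a", "ia", 1)         # *a breaks to *ia after *i
    stages.append(f"*{presyllable}-{root}")
    if presyllable == "r":
        # No high vowel to block pharyngealization; *r-ts- metathesizes to *tsr-.
        final = "*" + root[:2] + "rˁ" + root[2:]
    else:
        # Stage 4: late *i-loss removes the presyllable after it left its mark.
        final = "*" + root
    stages.append(final)
    return stages


print(derive("tsa", early_loss=True))
print(derive("tsaʔ", early_loss=False))
```

Running both calls side by side shows that the same presyllable *ri- yields either a retroflex-type onset or a broken vowel, depending only on when its vowel disappeared.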

Late Old Chinese:

*tsr- fused to retroflex *tʂ-.

*N-ts- fused to voiced *dz-.

Pharyngealization was lost.

*a broke to *ɤa after retroflexes.

*a fronted to *æ to assimilate with the preceding front vowel *i.

Early Middle Chinese:

*ɤa became *ea (to avoid *ɤ, a vowel absent elsewhere in the system) which then fused to *æ.

Final glottal stop *-ʔ became creaky voice (written with a subscript tilde).


backed to /a/.

*ts palatalized to /tɕ/ before *-i-.

The diphthong *iæ became /jɛ/.

'Deeper' phonemicizations are possible: e.g., Pulleyblank's (1984: 52) /iă/ or my /jə/.

Open syllables with *voiceless initials and nonglottalized, nonbreathy vowels developed tone 1.

Syllables with *voiceless obstruent initials and *creaky voice developed tone 3.

Syllables with *voiced obstruent initials and *creaky voice developed tone 4 via assimilation:

*dziæ̰ > *dzʱiæ̰ > *dzʱiæ̤ > *tsʱiæ̤ > /tɕjɛ4/

The *voiced initial became breathy voiced, and the *creaky voice in the following vowel became *breathy voice. Then the initial devoiced and lost its breathy voice to dissimilate from the following *breathy voiced vowel. Ultimately the vowel lost the breathiness that conditioned what is now Mandarin tone 4.

6.19.12:51: Contrast the three readings of 抯 'to pull out of water' with those of other words written with the same phonetic 且: 且/祖 'ancestor', 沮 'to leak', 沮 'marsh', and 菹 'marsh':

且/祖 'ancestor' 沮 'to leak' 沮 'marsh' 菹 'marsh'
Early Old Chinese
*ts *Nɯ.tsaʔ *rɯ.tsa-s *ri.tsa *rɯ.tsa *r.tsa
Middle Old Chinese
*tsɨa *tsrˁa
Late Old Chinese
*tsɑʔ *dzɨaʔ
*tsiæ *tsɨa *tʂɤɑ
Early Middle Chinese
*tsɔ̰ *dzɨə̰
*tsɨə̤̰ *tsiæ *tsɨə̤ *tʂæ
/tɕjɛ1/ /tɕy1/ †/tʂu1/

Karlgren (1957: 32) regards the character 菹 as primarily representing Old Chinese *tṣi̯o 'to pickle', equivalent to my *r(ɯ).tsa.

At a pre-Early Old Chinese stage with many (mostly?) disyllabic words, there were three basic words:


Possibly *CV.ts if the first syllable was lost without a trace. Who knows, maybe at some earlier point *CVC or even more complex first syllables were possible.

*NV.tsaʔ 'to leak'

I use periods to indicate breaks between syllables and - in later stages - the borders between presyllables and syllables if I cannot detect a morpheme boundary. I don't know of any 'leak' word family with different prefixes before a root √tsaʔ, so I assume *NV.tsaʔ was a disyllabic root rather than √tsaʔ plus a nasal prefix.

*ri.tsa 'marsh'

I don't know of any 'marsh' word family with different prefixes before a root √tsa, so I assume *ri.tsa was a disyllabic root rather than √tsa plus a prefix *ri-.

In Early Old Chinese, the higher vowels of first syllables mostly merged into *ɯ (my symbol for 'unknown high vowel' inspired by the Middle Korean minimal vowel ㅡ /ɯ/). *i is recoverable if it triggered vocalic transfer after acute initials, but I otherwise can't tell if *ɯ was from *i, *ə, *u, or even some other vowel like *y that no longer existed in Old Chinese.

How did three words become six? 'Marsh' developed four variants:

1. Conservative: Middle Chinese *tsiæ implied by the fanqie of Jiyun (1037, after the Middle Chinese period but still based on Middle Chinese phonology) is the direct descendant of *ri.tsa.

No real Mandarin descendant. /tɕjɛ1/ is a reading generated by reading the Jiyun fanqie as Mandarin, not a naturally transmitted word. It would be convenient to have a term for this kind of artificial modern reading.

2. Depalatalized first vowel; presyllable lost after vocalic transfer: *ri.tsa > *rɯ.tsa > *tsɨa

This variant is not recorded in the phonological tradition but is reconstructible on the basis of Mandarin /tɕy1/ unless that is a reading by analogy with 且 /tɕy1/.

Windows' IME says 沮 can also be read /tɕy1/.

3. Depalatalized first vowel and suffixed: *ri.tsa > *rɯ.tsa-s

I have no idea what the final *-s is doing.

4. First vowel lost before vocalic transfer; metathesis: *ri.tsa > *r.tsa > *tsrˁa

This variant has no standard Mandarin descendant.

The phonetic series of 且 is 'mixed' in the sense that it includes characters for three types of Early Old Chinese (sesqui)syllables:

1. *(Cʌ.)CV: e.g., 且/祖 *tsa/ʔ 'ancestor'

If there was a low-vowel presyllable *Cʌ-, it wouldn't have affected the vowel in a following *a-syllable, so its presence or absence is impossible to detect. I only reconstruct it whenever it leaves a trace.

2. *Cɯ.CV: e.g., 沮 *Nɯ.tsaʔ 'to leak', 且 *Cɯ.tsa 'many'

3. *Ci.CV: e.g., 菹 *ri.tsa 'marsh', 且 *Ci.tsʰaʔ 'moreover'

'many' and 'moreover' might have a common root √(Ci.)tsa with a prefix or a root presyllable that conditioned aspiration in 'moreover':

*Ci.tsa-ʔ > *Ci.tsiaʔ > *C.tsiaʔ > *tsʰiaʔ?

The vowels of the three types (low *ʌ, high front *i, and high back *ɯ) are reminiscent of the three vowels of open presyllables in Pacoh (Watson 1964: 144): /a i u/¹. Watson's examples are:

/pa.piː/ 'to converse'

/ti.noːl/ 'post (n.)'

/ku.ceːt/ 'to die'

Vietnamese chết /cét/ 'to die' has no trace of the presyllable still in Rục ku.cíːt  'to die'.

Here are Early Old Chinese words with similar presyllables and their post-vocalic transfer Middle Old Chinese descendants:

*Cʌ.nuʔ > *nˁauʔ  'brain'

*Cʌ could be *pʌ-; cf. Proto-Austronesian *punuq 'brain'; see this post on the mismatch of the first vowels

*Ci.sak > *tsiak 'to loan, borrow'

*Ci- could be *ti-

*kɯ.dzraŋ > *k.dzrɨaŋ 'bed'

Appendix: More examples of Early Old Chinese *Ci- words

I collected these from Baxter and Sagart (2014: 223-226) and Schuessler (2009: 45, 64). Middle Old Chinese forms follow to show the aftermath of vocalic transfer.

1. 邪 *Ci.ɢa > *ɢia 'interrogative particle'

2. 者 *Ci.taʔ > *tiaʔ 'nominalizing particle'

I am not comfortable with sesquisyllabic particles in a language whose roots are typically monosyllabic. But then again, maybe in EOC, sesquisyllabic roots were the norm.

3. 奢 *si.tʰa > *stʰia   'extravagant'

but not all characters with the phonetic 者 had *i: e.g.,

*sɯ.taʔ 'to cook'

possibly cognate with Tangut


4664 1liq1 < ?*S.ta < ??*Si.ta 'to cook'

The *i assumes cognacy with 炙; see below.

4. 車 *ki.kla > *kkia > *kʰlia 'chariot'

also *kɯ.kla > *klɨa 'id.'

cf. Proto-Indo-European *kʷékʷlo- 'wheel'

5. 寫 *Ci.saʔ > *siaʔ 'to depict'

6. 遮 *Ci.ta > *tia 'to cover'

7. 野 *mi.laʔ > *liaʔ 'open country'

also *mɯ.laʔ > *mlɨaʔ 'id.'; the Middle Chinese reading *dʑɨə̰ has an irregular *d-

*mɯ.rəʔ 'village' on the left might be phonetic; is the resemblance to Japanese mura coincidental? If the Japanese word were really related, I would expect †muro < ††-ə.

8. 昔 *Ci.sak > *siak 'in the past'

9. 柘 *Ci.taks > *tiaks 'mulberry tree'

In Proto-Min (PM), the merger of *i with *ɯ in presyllables was total before *-ak; there are no traces of *i, and both *Ci.Cak and *Cɯ.Cak have become PM *Ciok (Baxter and Sagart 2014: 226):

10. 炙 *si.tak > *tiak 'to roast'

PM *tšiok

cognate to 煮 *sɯ.taʔ 'to cook'?

is the resemblance to Japanese tak- 'to burn, to cook rice' coincidental?

11. 尺 *Ci.tʰak 'foot (measure)'

PM *tšhiok

12. 石 *Ci.dak or *Ni.Tak 'stone'

PM *džiok

13. 螫 *Ci.l̥ak > *l̥iak 'to sting'

PM *tšhiok

14. 𥼶 *Ci.l̥ak > *l̥iak 'to wash rice'

PM *tšhiok

15. 射 *mi.lak > *mliak 'to hit with an arrow'

PM *džiok

16. 借 *Ci.sak > *tsiak 'to loan, borrow'

PM *tsiok

*C- fused with *s- into *ts-

17. 席 *Ci.dzak > *ziak 'mat' with irregular *z
PM *dziok
cf. 藉 *Ci.dzak-s > *dziaks 'mat'

Unsolved mystery: Why did *-i- only leave traces in syllables ending in *-a, *-aʔ, *-as, and *-ak(s)?

¹But in closed sesquisyllables, only /ə/ is possible: e.g., /ɓəmɓar/ 'to divide by two' < /ɓar/ 'two'. I do not know yet whether Old Chinese had closed sesquisyllables.

REFLEXES OF PROTO-TAI *P.T- IN SAEK

Earlier today (in a table in an addendum I finished on 6.14) I mentioned the 'famous' Saek word for 'eye' (praː) which attracts attention because it's not like Thai taː or similar words in other Tai languages. Pittayaporn (2009: 323) reconstructs its Proto-Tai source as *p.ta which elegantly accounts for the p-, -r- (< *-t-), and t-.

That made me curious about whether Proto-Tai *p.t- always became pr- in Saek. Going through Pittayaporn's list of Proto-Tai reconstructions, I see that Proto-Tai *p.t- has two different reflexes:

1. pr- as in 'eye' (above) and praːj 'die' (Pittayaporn 2009: 357)

2. t- as in tɤː 'gizzard' (Pittayaporn 2009: 330)

The presyllable *p.- must have been lost in the ancestor of Saek 'gizzard'; it is reconstructible on the basis of Bao Yen pʰɤɰ whose aspiration is from *-r̥- < *-r- < *-t- (cf. Cao Bang tʰɤj with the same source of aspiration).

Pittayaporn (2009: 328) reconstructs Proto-Tai *p.tak 'grasshopper' even though that word has no reflexes in Saek or Bao Yen. Does it have any reflexes with p-like initials? I think he reconstructs *p.t- on the basis of forms like Cao Bang and Shangsi tʰak which have aspiration from *-r̥- < *-r- < *-t- (as in Bao Yen). Even without Saek or Bao Yen or anything labial, the pattern of initials in Cao Bang and Shangsi matches that of *p.t-words rather than *t-words:

Bao Yen
Cao Bang
pʰ- tʰ-

If Proto-Tai 'grasshopper' were simply *tak, the Cao Bang and Shangsi reflexes would be †tak with †t-.

6.15.10:16: Old Chinese had many words of the 'gizzard' type that had variants with and without presyllables: e.g., 扶 'to crawl'.

Early Old Chinese
*Nɯ-pʰa *pʰa
Middle Old Chinese
Late Old Chinese
*bua *pʰɑ
Early Middle Chinese
*buo *pʰɔ
Late Middle Chinese

At a stage even before Early Old Chinese, the word may have been *Ni-pʰa, *Nə-pʰa, or *Nu-pʰa with a high series vowel that was later reduced to *ɯ in an unstressed position and ultimately lost.

In Early Old Chinese, the word had developed a variant without a presyllable. *pʰa is comparable to English 'cause, a variant of because without a presyllable be-. Presyllable loss - and other forms of reduction - are not entirely mechanically predictable. Just because because could lose its be- doesn't mean that it always did, much less that all be-words had such variation: e.g., there is no monosyllabic variant †lieve of believe.

In Middle Old Chinese, the high vowel of the presyllable conditioned the warping of *a to *ɨa. The variant without a presyllable had no high vowel and was subject to developing pharyngealization. I write pharyngealization after the initial consonant, but it was a quality of the entire syllable.

In Late Old Chinese, *N-pʰ- fused into *b-. *ɨ rounded to *u after labials. Pharyngealized *a backed to *ɑ. Pharyngealization disappeared after leaving its mark on the vowel.

In Early Middle Chinese, *a raised and rounded to *o after *u. *ɑ raised and rounded to *ɔ.

In Late Middle Chinese, the vowels raised further: *uo > *u, *ɔ > *o. *b- became breathy *fʱ before *u.

In Mandarin, breathiness conditioned tone 2 before being lost. Open syllables without that breathiness or any laryngeals developed tone 1. *o raised even further to /u/.

痡 'suffering' and 鋪 'to spread out' both have two variants, one with a presyllable and one without. The bare version happens to be homophonous with the monosyllabic version of 'to crawl'.

Early Old Chinese
*Cɯ-pʰa *pʰa
Middle Old Chinese
Late Old Chinese
*pʰua *pʰɑ
Early Middle Chinese
*pʰuo *pʰɔ
Late Middle Chinese

*pʰ-, unlike *b-, did not develop a breathy reflex in Late Middle Chinese. As a result, Late Middle Chinese *fu became Mandarin /fu1/ rather than /fu2/ with tone 2 conditioned by *breathiness.

I suspect that the sesquisyllabic (and even earlier disyllabic) versions of 痡 'suffering' and 鋪 'to spread out' had very different first halves: e.g., *kupʰa and *pipʰa, etc. The original first consonants are not recoverable, and all that can be said about the original first vowel was that it was nonlow; a low series vowel (*a *e *o) would not have conditioned the warping of *a to *ɨa. *ɯ is my symbol for an unknown high series vowel. So the 'homophony' of 痡 'suffering' and 鋪 'to spread out' is an illusion caused by my agnostic notation *Cɯ-pʰa; the two words may not have been homophonous until Middle Old Chinese.

I don't know why 鋪 'to spread out' is written with the 金 'metal' radical. The sesquisyllabic version of 'to spread out' has a more common spelling 敷 with the radicals 方 'direction' and 攵 'action with hand'¹ which make more sense. 敷 is not a spelling of the monosyllabic version *pʰa.

Schuessler (2007: 173) regards 鋪敷 'to spread out' to be cognate to 布 *pa-s 'to spread out' and 博 *pa-k 'wide'. The aspirated initial *pʰ- may be from some earlier cluster like *kp- (which is absent from Baxter and Sagart's 2014 reconstruction). Perhaps the earliest reconstructible form of 鋪 'to spread out' is *kɯ-pa. The two Middle Old Chinese forms would then both reflect the presyllable.

Stage 1: Early Old Chinese


Stage 2: early presyllabic vowel loss
Stage 3: vocalic transfer
*kɯ-pɨa *kpa
Stage 4: late presyllabic vowel loss
*kpɨa *kpa
Stage 5: aspiration
*pʰɨa *pʰa
Stage 6: Middle Old Chinese
*pʰɨa *pʰˁa

In Stage 1, there is only one form of the word.

In Stage 2, the word develops a monosyllabic variant *kpa.

In Stage 3, the vowel of *kpa remains unbent since there is no presyllabic high vowel to condition the bending of *a to *ɨa.

In Stage 4, the presyllabic vowel of *kɯ-pɨa was lost.

In Stage 5, *kp- became *pʰ- - a change that probably also occurred in Middle Korean centuries later.

In Stage 6, the variant without a high vowel developed pharyngealization.
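
The stages from 3 onward can be written as an ordered list of rewrite rules applied to both variants. This is a minimal Python sketch under my own assumptions: the rule list and function name are hypothetical, and the starting forms are the two stage 2 variants described above (*kɯ-pa and its monosyllabic variant *kpa).

```python
# Ordered rules for stages 3 to 6; each takes a form and returns a form.
RULES = [
    # Stage 3: a presyllabic high vowel bends *a to *ɨa.
    ("vocalic transfer", lambda f: f.replace("ɯ-pa", "ɯ-pɨa")),
    # Stage 4: the presyllabic vowel is lost after leaving its mark.
    ("late presyllabic vowel loss", lambda f: f.replace("ɯ-", "")),
    # Stage 5: *kp- becomes aspirated *pʰ-.
    ("aspiration", lambda f: f.replace("kp", "pʰ")),
    # Stage 6: only the variant without a high vowel is pharyngealized.
    ("pharyngealization", lambda f: f if "ɨ" in f else f.replace("pʰ", "pʰˁ")),
]


def derive(form):
    """Return the form after each of stages 3 to 6, preceded by the input."""
    history = [form]
    for _name, rule in RULES:
        history.append(rule(history[-1]))
    return history


print(" > ".join(derive("*kɯ-pa")))
print(" > ".join(derive("*kpa")))
```

Swapping the first two rules would make the two variants merge, since *kɯ-pa would lose its vowel before it could bend *a; the sketch therefore also illustrates why vocalic transfer has to precede late presyllabic vowel loss.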

I forgot about the use of 布 *pa-s 'to spread out' to write 'cloth' (a borrowing from an Austroasiatic language: cf. Katu [Kantu dialect] kapaːs 'cotton', Kuy kpah 'cloth', and Sanskrit kārpāsa- 'cotton', also an AA borrowing) which fits my hypothesis of an earlier *k- in 'to spread out', a native word that happened to sound like 'cloth'. The *k-p-word was later reborrowed with disyllabic spellings:

幏布 *kæh-pɑh 'cotton' (c. 100 AD); is the first *-h for foreign *-r-, or was this spelling coined by someone who still had *kr- in 幏: *krɑh-pɑh?

古貝 *kɔˀ-pɑɕ 'cotton' (c. 430 AD)

See Schuessler (2007: 173) for further discussion, though he does not reconstruct *k- in the Old Chinese words for 'cloth' or 'to spread out'.

¹There is no Chinese word 攵 'action with hand'; the gloss refers to the use of 攵 *(r-)pʰok 'to beat' as a component in other characters. (The word 'to beat' is more commonly written 撲 which is not a component in other characters.)

DID SAEK SHIFT *Z- UNDER VIETNAMESE INFLUENCE?

Last night I stumbled upon this passage in Pittayaporn (2009: 296):

In Saek, *z- became /j-/ merging with PT *ˀj-, probably due to influence from North-Central Vietnamese, where original *z- has become /j-/ (Alves 2007).

Northern Vietnamese has /z/ corresponding to /j/ in central and southern Vietnamese. I think Saek would be or would have been in contact with central Vietnamese. (It's not clear if there are Saek villages in Vietnam anymore.)

One might conclude that the north preserves a /z/ that became /j/ elsewhere. This would then be parallel with Saek. But I am not sure that is the case. Here are the data:

Old Vietnamese
*kj-, *-C-
*j-, *-T-
*r-, *-s-
Middle Vietnamese spelling
Northern Vietnamese
Nonnorthern Vietnamese

By 'northern' I mean Hanoi and Vinh (the latter is north central); 'nonnorthern' refers to Huế (at the center) and Saigon. (I don't want to say 'south' because Huế is certainly not in the south.)

Capital letters stand for obstruents with unspecified voicing: e.g., *C could be voiceless *c or voiced *ɟ.

Hyphens before consonants indicate the presence of an unspecified presyllable: e.g., *-C- represents voiceless *c or voiced *ɟ preceded by a presyllable.

Exactly what the Middle Vietnamese spellings gi- d- r- stood for is not certain. I can only say that none of those three consonants were /z-/ or /j-/. I think it's possible that gi- and d- became /j-/ without a *z-phase. But maybe Saek is evidence for such a phase.

Or is it? The /z-/ of Vietnamese postdates the 17th century and long postdates the devoicing of original *voiced obstruents (possibly by the late first millennium AD). On the other hand, Saek *z- is original. Did Saek have *z- and a full set of voiced obstruents as late as the 18th century - almost a thousand years after Vietnamese devoiced its voiced obstruents?

6.14.2:21: I don't think what I wrote above is clear. Let me try again.

Phases of Vietnamese

Vietnamese consonants can be said to have gone through five phases which I will illustrate with hypothetical examples for simplicity:

-voc -voc
*praː *taː
*p *taː
*pʂ *taː
/zaː/ ~ /jaː/ /zaː/ ~ /jaː/ /saː/ ~ /ʂaː/

Phase 1: Early Old Vietnamese:

presyllables present

no tones

no lenition

phonemic voicing in obstruents

I am not sure Early Old Vietnamese ever had *(d)z-. It is perhaps telling that Early Middle Chinese 字 *dzɨʰ 'written character' was borrowed as *ɟɨːʰ (now chữ) rather than as †zɨːʰ which would have become †tữ. Later, Early Middle Chinese 字 *dzɨʰ became Late Middle Chinese 字 *tsɨ̣ and was borrowed again into Vietnamese; see phase 3 below.

Phase 2: Middle Old Vietnamese:

*-r- > *-r̥- after a voiceless initial

subphonemic tones conditioned by voicing before main vowel: *voiceless > unmarked ngang tone, *voiced > grave accent for huyền tone

tones conditioned by final consonants may date between phase 1 and phase 2

Phase 3: Late Old Vietnamese:

voicing (lenition) of medial obstruents: *-t- > *-d-

*-r̥- > *-ʂ-

devoicing of voiced obstruent initials

words formerly distinguished by obstruent voicing now distinguished only by tone which had become phonemic

Late Middle Chinese 字 *tsɨ̣ 'written character' (with a devoiced initial) was borrowed as *sɨ̣ː (now tự). (For simplicity I use a Vietnamese tone mark even for Late Middle Chinese.)

Phase 4: Middle Vietnamese:

presyllables lost

*Cʂ- > s- /ʂ/

Drag chain *s- > *t- > /ɗ/

Italicized forms are 17th century spellings; those spellings of consonants remain in use today. đ is /ɗ/, but the phonetic value of d is uncertain. [d] is the simplest interpretation, but [dʲ] and [ð] are also possible.
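As a rough sketch, the changes of phases 2-4 above can be run as ordered rewrite rules over toy forms. Everything here is an assumption made for illustration: the function name derive, the spelling of presyllables as a consonant plus ə, and the restriction to a handful of consonants. Tones are ignored, and ə is assumed to occur only in presyllables.

```python
import re

RING = "\u0325"  # combining ring below, as in voiceless r̥
DEVOICE = {"b": "p", "d": "t", "ɟ": "c", "g": "k"}

# (phase, pattern, replacement), applied top to bottom.
# Toy rules only: no tones, and ə is assumed to occur only in presyllables.
RULES = [
    (2, r"^([ptkcs])r", r"\1r" + RING),            # *-r- > *-r̥- after a voiceless initial
    (3, r"(?<=ə)t", "d"),                           # lenition of medial *-t- > *-d-
    (3, "r" + RING, "ʂ"),                           # *-r̥- > *-ʂ-
    (3, r"^[bdɟg]", lambda m: DEVOICE[m.group()]),  # devoicing of voiced initials
    (4, r"^[^ə]ə", ""),                             # presyllables lost
    (4, r"^[ptkc]ʂ", "ʂ"),                          # *Cʂ- > s- /ʂ/
    (4, r"^t", "ɗ"),                                # chain: *t- > /ɗ/ runs first ...
    (4, r"^s", "t"),                                # ... so that new t from *s- stays t
]

def derive(form, up_to=4):
    """Apply every rule belonging to phases <= up_to, in order."""
    for phase, pattern, repl in RULES:
        if phase <= up_to:
            form = re.sub(pattern, repl, form)
    return form

for word in ["praː", "taː", "saː", "kətaː"]:
    print(word, "->", derive(word))
```

The order of the rules matters. Initial devoicing (phase 3) runs before presyllable loss (phase 4), so a medial *-d- that becomes initial only in phase 4 escapes devoicing and surfaces as Middle Vietnamese d. Within the chain, *t- > /ɗ/ has to run before *s- > *t-, or the two would merge.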

Phase 5: Modern Vietnamese: different reflexes of Middle Vietnamese s and d depending on dialect. s lost retroflexion in Hanoi (but not in Vinh, which has /z/ like Hanoi and unlike the nonnorthern dialects; Thompson 1987: 98). The picture for d is less clear. Two scenarios:

Scenario 1. All dialects shifted d to /z/, and nonnorthern dialects shifted /z/ to /j/


Scenario 2. d shifted in different ways; no shared /z/-phase


There is no doubt that Proto-Tai *z- became /j-/ in Saek. The question is whether that shift in Saek reflects the influence of Vietnamese given scenario 1. Let's suppose scenario 1 is true. Phase 4 is in the 17th century and phase 5b perhaps starts in the middle 19th century. (The last traces of Middle Vietnamese consonantism seem to disappear after the early 19th century.) So the Saek change would have to be dated between the 17th and 19th centuries. But if the Saek change were that recent, Saek would have had *z- - and presumably other Proto-Tai voiced obstruents such as *g *d *b- - as late as the 17th or even 18th century. That doesn't seem likely given that its neighbor Vietnamese had undergone devoicing prior to borrowing from Late Middle Chinese during phase 3 (circa the 10th century).

Phases of Saek

Saek has gone through some of the same changes as Vietnamese up to phase 3, though the details differ:

*praː *taː
*pər *pr̥aː *taː
*pdaː *praː *pʰraː
pr raː pʰraː taː àː saː

Phase 1: Proto-Tai:

presyllables present (rewritten here as *Cə- instead of as *C.- as in Pittayaporn's notation)

no tones

no lenition

phonemic voicing in obstruents

Phase 2:

drag chain shift: *-t- > *-d- > *-r-; contrast with Vietnamese phase 3 in which *-t- > *-d-

Phase 3:

loss of presyllabic vowels

*pər- > *pr-; *pr̥- > *pʰr-

subphonemic tones determined by initial consonant (including presyllabic consonants, unlike Vietnamese) after lenition (again, unlike Vietnamese)

To facilitate comparison with Vietnamese, I use Vietnamese tone notation: zero for tone A1 and a grave accent for tone A2.

Tones conditioned by final consonants may have developed between phase 1 and phase 3.

Phase 4:

drag chain shift: *pd- > *pr- > r-, *d- > tʰ-, *z- > j-

words formerly distinguished by initial voicing now distinguished by tone which has become phonemic

My guess is that lenition and devoicing happened independently in Vietnamese and Saek, whereas tonogenesis did not - Vietnamese phase 3 and Saek phase 3 may have been simultaneous.

Phases of Cao Bang

On 6.11, I thought Saek having *z- and other voiced consonants as late as the 18th century was improbable, but Tai languages on the Sino-Vietnamese border never underwent devoicing (Pittayaporn 2009: 110). Compare the phases of Cao Bang with those of Vietnamese and Saek:

*pdaː *p
*p *pdàː *pʂ
dàː pʰj
taː àː

Phase 1: Proto-Tai: same as Saek phase 1

Phase 2:

loss of presyllabic vowels

*-r- > *-r̥- after a voiceless initial (as in Vietnamese and Saek)

Phase 3:

Chain shift: *pt- > *pr̥- > *pʂ-

subphonemic tones determined by voicing of consonant before vowel (contrast with Saek)

To facilitate comparison with Vietnamese, I use Vietnamese tone notation: zero for tone A1 and a grave accent for tone A2.

Tones conditioned by final consonants may have developed between phase 1 and phase 3.

Phase 4:

*pr̥- > *tr̥- > tʰ-

elimination of *voiceless-voiced clusters and chain shift: *pd- > *d- > dʱ-

*pʂ- > *pɕ- > pʰj-

*z- > *s- > tʰ-; *z- devoiced but this seems to be an anomaly; see my 6.13 entry; the fortition is reminiscent of Vietnamese (see Phan 2013 for examples of *s- > /tʰ/ in Vietnamese: e.g., *sit > thịt 'meat'¹) but probably occurred independently much later. Phan (2013: 65) regards fortition of fricatives as "common in Southeast Asia and should not be considered a shared innovation."

tone A2 still strongly associated with voiced initials but has become phonemic due to the devoicing of *z-

Finally, for reference:

Phases of Thai/Lao

Thai and Lao never underwent lenition; medial *-t- and *-d- remain as stops today.

*pdaː *p
* *ɗ *taː
d pʰaː
taː àː saː sàː

Phase 1: Proto-Tai: same as Saek and Cao Bang phase 1

Phase 2:

loss of presyllabic vowels

*-r- > *-r̥- after a voiceless initial (as in Vietnamese, Saek, and Cao Bang)

Phase 3: More or less represented by Thai and Lao spelling (but Lao has no <z>; *z- corresponds to ຊ <j>)

reduction of *pC- to *t- and *ɗ- (not *d-!); was there an intermediate geminate stage *tt- and *dd-?

*-r̥- > -ʰ-

subphonemic tones determined by initial consonant (including former presyllabic consonants, unlike Vietnamese)

To facilitate comparison with Vietnamese, I use Vietnamese tone notation: zero for tone A1 and a grave accent for tone A2.

Tones conditioned by final consonants may have developed between phase 1 and phase 3.

Phase 4:

drag chain shift: *ɗ- > *d- > tʰ-

words formerly distinguished by initial voicing now distinguished by tone which has become phonemic

The Vietnamese notation, though convenient, is misleading, as tones A1 and A2 have undergone splits and, in Thai, a merger.

The development of tones A1 and A2 in Thai and Lao

Stage 3 subphonemic tone
Stage 3 initials
*pʰ-, *s-
*ɗ-, t-
*d-, *z-
Stage 4: Thai tones
Stage 4: Vientiane Lao tones

All of the phases above are my speculations built upon the work of Gage ("Vietnamese in Mon-Khmer Perspective", 1985) and Pittayaporn (2009). The relative chronology is only approximate; some but not all changes could be reordered with the same final results.

¹The nặng tone written with a subscript dot normally indicates a *voiced initial. It is tempting to reconstruct a change *z- > /tʰ/ as in Cao Bang. But support for *z- in native words is weak. The tone may reflect a lost voiced prefix.

EMPHATIC SAND

Tonight I found the section on the Middle Korean emphatic particle za at random in Lee and Ramsey (2011: 194). The earliest attestations of it I can find in Old Korean are in two 鄉歌 hyangga:


*motʌn kəs sa

'all thing EMPH'

- 慕竹旨郎歌 (c. 700)


*hʌtʌn sa

'one EMPH'

- 禱千手觀音歌 (c. mid-8th century)

where it is spelled phonetically with Middle Chinese 沙 *ʂæ 'sand'.

It occurred to me that the 'sand' spelling of that particle¹ obviously must predate the lenition of *s to Middle Korean z.

If a *z-pronunciation had existed in Old Korean, it could have been spelled with Middle Chinese

嵯嵳𣩈㽨瘥𥰭䑘艖蒫醝䰈鹺䴾齹虘蔖䠡䣜躦𪘓 *dza

or 邪䓉耶椰瑘𥯘鎁釾𦭿𦰳斜䔑擨 *ziæ².

(There was no Middle Chinese syllable *za. This gap is not accidental. I should look into it.)

It turns out that 邪 'evil' is attested as a phonogram in Old Korean hyangga, but 俞昌均 Yu Chhang-gyun (1994: 76) interprets it as a symbol for *ra (cf. its possible Old Chinese reading *la in Schuessler 2009: 56). There have been many attempts to reconstruct the pronunciation of Old Korean. Has anyone interpreted 邪 as *sa (possibly tempted by its modern Sino-Korean reading sa) or *za? I don't have any other sets of hyangga readings on hand. Another thing to look into when I get the chance.

¹6.11.21:29: It never occurred to me to use Unicode superscript numerals for endnotes until now. No more long strings of asterisks.

It's theoretically possible that the 'sand' spelling in this text postdates the 8th century, as these poems survive in 三國遺事 Samguk yusa (1285) whose earliest surviving copy is from 1512. Even if these poems are actually from c. 700 AD, their spellings could have been altered in the centuries between then and 1512. However, I know of no other evidence pointing toward some other Ur-spelling of the emphatic particle. The 口訣 kugyŏl phonogram for *sa ~ *za is 氵 which is almost certainly an abbreviation of 沙 'sand', the most common sa-character with the left-hand component 氵 'water'. Kugyŏl manuscripts from the Koryŏ dynasty (918-1392) predate 1512; one need not worry about potential errors in their transmission.

²6.11.23:44: Nearly all of these characters are rare and therefore not likely candidates for phonograms, which tended to be high-frequency characters. So one might argue that the Old Korean particle was *za but not written as such because there were no high-frequency characters with a similar reading other than 邪 *ziæ 'evil', which was already being used for *ra if Yu (1994) is correct. However, if *s had already lenited to *z in Old Korean, I would expect to see other phonogram spellings unambiguously reflecting lenition. But I know of none offhand. Although one might argue that *s lenited earlier than other consonants, that possibility could only be confirmed if there were *(d)z-spellings of later z-words. No such spellings seem to exist.

The only *(d)z-phonograms in Yu's (1994: 75-78) catalog of phonograms in hyangga are the aforementioned 邪 *ziæ 'evil' and

齊 Middle Chinese *dzej 'equal' : Yu's Old Korean *tsjə (my *tse)

which, like 邪 *ziæ 'evil', does not represent an Old Korean syllable corresponding to a Middle Korean z-syllable. So if Old Korean already had *z-syllables, they were not written with Chinese *(d)z-characters and cannot be detected.

I could argue that in fact the dialect of Chinese known to educated Old Koreans had shifted *(d)z- to *(t)sʱ- (as in Pulleyblank's Late Middle Chinese reconstruction), so the characters above wouldn't have been appropriate for an Old Korean *za.

That Chinese dialect had a reflex of Middle Chinese *ɲ- that corresponds to z in Middle Korean Sino-Korean readings. But there was no Middle Korean Sino-Korean reading †za. So it seems Old Koreans had no good options for writing *za if they had such a syllable - and I still don't think they did.

(The questions of what that Chinese dialect's reflex of *ɲ- was and how it was borrowed into Old Korean - as *z- or as something else that became z- in Middle Korean - remain open. The simplest solution is to assume that Chinese dialect had something like the *ž- of Liao Chinese. This was borrowed into Old Korean as *z-, a consonant originally only in borrowings. Later, Middle Korean lenited *s in native words, resulting in a new /z/ that shared the fate of the old borrowed one: both /z/ soon disappeared from the Seoul dialect. [But does any Korean dialect today have a trace of /z/ in Sino-Korean words?])

THE PHONETIC VALUE OF MIDDLE KOREAN DOUBLE ZERO

In the earliest hangul texts from the 15th century, there were three circular letters.

ㅇ <Ø> : ㆁ <ŋ> : ㆀ <ØØ>

In modern hangul, ㅇ <Ø> has come to represent zero in initial position and /ŋ/ in coda position: e.g., 앙 <ØaØ> /aŋ/. Although ㅇ may appear with a short vertical line on top like ㆁ <ŋ> in some fonts, that line no longer distinguishes ㆁ <ŋ> from ㅇ <Ø>; the reading of ㅇ/ㆁ is now wholly dependent on its position within a syllabic block.
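The positional reading of the modern circle can be checked mechanically. Unicode canonical decomposition (NFD) splits a precomposed syllable block into separate initial, vowel and final jamo, and the initial and final circles get different code points. This is only a small sketch: the function name circle_readings and the "Ø"/"ŋ" labels are mine, and it covers modern precomposed syllables only, not the 15th-century letters.

```python
import unicodedata

CHOSEONG_IEUNG = "\u110b"   # circle as the initial letter of a block
JONGSEONG_IEUNG = "\u11bc"  # circle as the final letter of a block

def circle_readings(text):
    """List (syllable, position, reading) for each circle in modern hangul text."""
    found = []
    for syllable in text:
        # NFD yields initial, vowel and (if present) final jamo in that order.
        for jamo in unicodedata.normalize("NFD", syllable):
            if jamo == CHOSEONG_IEUNG:
                found.append((syllable, "initial", "Ø"))
            elif jamo == JONGSEONG_IEUNG:
                found.append((syllable, "coda", "ŋ"))
    return found

print(circle_readings("앙"))
```

For 앙 this reports one initial circle read as zero and one coda circle read as /ŋ/, matching <ØaØ> /aŋ/.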

ㅇ <Ø> had two uses in the earliest hangul orthography for Late Middle Korean in the 15th century. It could represent initial /Ø/ as in the modern language and, unlike in the modern language, could also represent /ɣ/ in four environments:

1. between /r/ and a vowel

2. between /z/ and a vowel

3. between /j/ and a vowel

4. between /i/ and a vowel

This /ɣ/ has disappeared in the modern standard language, though traces remain in dialects: e.g., 15th century 몰애 <morØai> /morɣaj/ 'sand' corresponds to Pukchhŏng molgɛ with -g- (cf. standard morɛ).
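The four environments above amount to a simple context check. The following is a hedged sketch on phonemic strings, not a claim that every ㅇ in these contexts was /ɣ/. The vowel inventory and the function name circle_could_be_gamma are my assumptions for illustration.

```python
# Assumed Late Middle Korean vowel symbols for this sketch (not from the text above).
VOWELS = {"a", "ə", "o", "u", "ɨ", "i", "ʌ"}
# Phonemes after which the circle could stand for /ɣ/ (environments 1-4).
GAMMA_LEFT = {"r", "z", "j", "i"}

def circle_could_be_gamma(prev, nxt):
    """True if a circle between prev and nxt is in one of the four /ɣ/ environments."""
    return prev in GAMMA_LEFT and nxt in VOWELS

# <morØai> 'sand': the circle stands between /r/ and /a/.
print(circle_could_be_gamma("r", "a"))
# A word-initial circle has no preceding phoneme and can only be zero.
print(circle_could_be_gamma(None, "a"))
```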

What was ㆀ <ØØ>? Lee and Ramsey (2011: 146) regard it as another spelling of Late Middle Korean /ɣ/. But why would two letters be devised for the same sound at the very beginning of a script? A clue may lie in the limited distribution of ㆀ <ØØ> which was solely used to write forms of the passive/causative suffix ᅇᅵ<ØØi> - and in one instance, the causative suffix ᅇᅮ <ØØu> (月印釋譜 Wŏrin sŏkpo 14:14) - after /j/. If the first suffix were simply /ɣi/, why not spell it as 이 <Øi> which is the spelling after /l z/? (I don't know of any instances of that suffix after /i/. The second suffix is otherwise spelled <Øu> = /ɣu/ after /l z j/.)

Yesterday afternoon it occurred to me that ㆀ <ØØ> might represent a palatal allophone [ʝ] of /ɣ/. This allophone may have been geminated [ʝʝ] if it was like /ss/ and /hh/ which were written as double consonants ㅆ ㆅ <ss hh>. There is even one case of /nn/ as ㅥ <nn> in 訓民正音諺解 Hunmin chŏngŭm ŏnhae.

There is, however, no guarantee that a double consonant necessarily represented a geminate, as ㅆ ㆅ <ss hh> could also represent /z ɦ/ in the prescriptive transcription of Sino-Korean readings. (Native /z/ had a different letter ㅿ <z>. It might be more accurate to regard the artificial voiced consonants of Sino-Korean readings as breathy voiced: e.g., Sino-Korean ㅆ <ss> was /zʱ/ or /sʱ/ and therefore distinct from ㅿ /z/.) Doubled ㄲ ㄸ ㅃ ㅉ <kk tt pp cc> could only represent /g d b dz/ in that transcription in the earliest hangul texts; their use for reinforced consonants came later.

Moreover, the circle was used to derive consonant characters for nongeminates: e.g., /β/ was written as ㅸ. So ㆀ <ØØ> could be interpreted as 'derivative of circle' for [ʝ] rather than as 'double circle' for [ʝʝ] (or geminate zero which would make no sense).

One problem with this proposal is that it cannot easily account for the one instance of ㆀ <ØØ> in the causative suffix ᅇᅮ <ØØu>. It is understandable that /ɣ/ would palatalize to [ʝ] between /j/ and /i/ in, for instance, ᄆᆡᅇᅵ<mʌi.ØØi> /mʌjɣi/ [mʌjʝi] 'to be bound to', the passive stem of /mʌj/ 'to bind'. It is slightly less understandable why /ɣ/ would palatalize to [ʝ] between /j/ and /w/ in  뮈ᅇᅯ <mui.ØØuə> /mujɣwə/ 'moving'. (/ɣw/ is an allomorph of /ɣu/ before vowel-initial suffixes like /ə/ '-ing', called the 'infinitive' [though it is not like an Indo-European infinitive].)

Perhaps 뮈ᅇᅯ <mui.ØØuə> reflects a pronunciation [mujʝɥə] in which the palatal quality of /j/ spread into the following consonants. That pronunciation might even have been common, though for most purposes a phonemic spelling 뮈워 <mui.Øuə> for /mujɣwə/ might have sufficed instead of a more precise phonetic spelling 뮈ᅇᅯ <mui.ØØuə>. I don't know if the spelling 뮈워 <mui.Øuə> is attested, but 月印千江之曲 Wŏrin ch'ŏn'gang chi kok 62 has the spelling 뮈우 <mui.Øu> /mujɣu/ for the stem.

FRGÁL

Slavic languages normally only have [f] in loanwords and as a positional variant of /v/ (which is why Russian names in -v have variant spellings in -ff).

As far as I know (thanks to Short 1993), Czech initial [f] can only appear

- in onomatopoetic words (e.g., foukat 'to blow')

- as a positional variant of v before voiceless consonants (e.g., vsadit 'to bet', pronounced [fsadit])

- in loanwords from non-Slavic languages (e.g., fonetický 'phonetic')

So what is the source of the f in the dish called frgál? That f- is before a voiced syllabic r and is not a variant of v-. Is it onomatopoetic or from a foreign language - perhaps Romanian, given that frgál is from Moravian Wallachia? That region isn't contiguous with modern Romania, but it was settled by Vlachs.

SHIMUNEK (2017) AND DOWNES (2018)

Last night, I found the addenda and corrigenda to Andrew Shimunek's Languages of Ancient Southern Mongolia and North China (2017). I thought that would be as close as I'd get to having his book which I can't afford at $116.76 until I saw an online sampler.

It's remarkable that three books on Khitan have appeared in English within a decade - the other two being Daniel Kane's The Kitan Language and Script (2009) and Wu Yingzhe and Juha Janhunen's New Materials on the Khitan Small Script: A Critical Edition of Xiao Dilu and Yelü Xiangwen (2010 - just a year after Kane's book!).

Can a new book on Jurchen be far behind? It has been almost thirty years since Kane's The Sino-Jurchen Vocabulary of Interpreters (1989), which despite its title is a general gateway to Jurchen language studies and complements Kiyose Gisaburō's A Study of the Jurchen Language and Script - Reconstruction and Decipherment (1977), which covered the Sino-Jurchen vocabulary of the Bureau of Translators.

Not long after Imre Galambos' Translating Chinese Tradition and Teaching Tangut Culture: Manuscripts and Printed Books from Khara-Khoto (2015) comes Alan Downes' PhD dissertation "How Does Tangut Work?" (submitted 2016, revised 2018), a follow-up to his BA honors thesis "The Xixia Writing System" (2008) - and his website which links to mine.

Alas, I haven't written about Tangut - much less Khitan or Jurchen - in a long time. If I may rephrase Downes' question, I have been trying to come up with the answer to "How Does Pyu Work?" It's coming in a series of articles and a book.

These are exciting times for the study of extinct Asian languages.

Tangut Yinchuan font copyright © Prof. 景永时 Jing Yongshi
Tangut character image fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2018 Amritavision