The native Cantonese word 脷 lei⁶ 'tongue' is reminiscent of l-words for 'tongue' found elsewhere in Sino-Tibetan: e.g.,

I wish I knew the Pyu word to complete the set of the 'big five' Sino-Tibetan literary languages, but Pyu basic vocabulary is all but unknown.

To keep things simple, I have not looked at other potentially related *l-words in Chinese, much less other Sino-Tibetan *l-words for 'tongue' or 'lick' available at STEDT.

Before one jumps to the conclusion that all of the above must share an *l-root, one should note Schuessler's (2007: 467) warning:

Initial *l- is a near-universal sound symbolic feature for 'lick / tongue', hence similar words in other languages are not likely to be related, such as MK-PVM [Mon-Khmer-Proto-Viet-Muong] *laːs 'tongue' [Ferlus]; Kam-Tai: S[iamese] liaA2 < *dl- 'to lick' [cf. ], PKS [Proto-Kam-Sui] *lja² ? [Thurgood].

Proto-Kra *l-maA 'tongue' (Ostapirat 2000: 223; cf. Proto-Kam-Sui *maA 'id.' [Peiros]), Proto-Hlai *liːnʔ 'id.' (Norquest 2016 appendix: 127), Proto-Tai *liːnC 'id.' (Pittayaporn 2009: 389), and Proto-Austronesian *lidam (on the basis of only Puyuma and Rukai; Blust and Trussel 2019) also fit the pattern. (A single Proto-Kra-Dai word for 'tongue' doesn't seem to be reconstructible.)

Continental 'Altaic' words for 'tongue' have noninitial l-: Ming Jurchen ilenggu ~ ilenggi, Written Mongolian kelen, and Turkish dil. (But peripheral 'Altaic' words don't: e.g., Korean hyŏ < *he and Japanese shita.)

European examples are English lick and Latin lingua 'tongue'. (The latter, of course, has an irregular l- < *d- which became the t- in tongue. Wiktionary derives the l- of lingua by analogy with lingō 'I lick', the true Latin cognate of lick. If we ignore that inconvenient fact, we could be daring and 'reconstruct' a 'Proto-World' *lV 'tongue/lick'. No.)

Schuessler was of course warning against linking Sino-Tibetan words to non-Sino-Tibetan words which happen to share the same initials, but lookalikes do also occur within families: e.g., lick and lingua. There could, at least in theory, be two unrelated lateral roots for 'tongue' in Sino-Tibetan.

Trying to reconcile the small set of Sino-Tibetan forms that I listed at the beginning runs into all sorts of difficulties:

Prelaterals (i.e., whatever comes before the L: prefixes or first syllables of disyllabic roots?): If Old Chinese *mI- and pre-Tangut *PI- are prefixes, what are their functions? Maybe the unknown pre-Tangut labial *P- was *m-. (The high vowel *-I- in both proto-forms is needed to account for the fronting of *a.) The labials in those prelaterals clash with Burling's Proto-Lolo-Burmese alveolar *s- and Hill's pre-Tibetan velar *ɣ-.

Laterals: Chinese and Proto-Lolo-Burmese have a voiced *l-, pre-Tangut has voiceless *l̥- (pre-Tangut *Sl- would correspond to Tangut l- + vowel tension), and pre-Tibetan has both voiced *-lʲ- and voiceless *-l̥ʲ- with palatalization that might be a trace of a preceding high vowel *-I-:

*Il̥- > *Il̥ʲ > *ɣl̥ʲ-

*Il- > *Ilʲ > *ɣlʲ-

Vowels: Three types are in the six words at the top:


- the same three stops that might have preceded *-s in pre-Cantonese.

If one regards the various codas as suffixes, one should ideally be able to identify the functions of those suffixes. Affixation can be a dangerous pseudoexplanation for mismatching segments in forms under comparison.

This exercise shows how far we are from being able to reconstruct Proto-Sino-Tibetan. Much more work needs to be done on subgroups before the outlines of their common ancestor can emerge.

BACKLOGS AND RECOMMENDED READING: DMITRIEV, PHAN AND DE SOUSA, FERLUS, KING

I don't like interrupting series because I rarely get back to them - two examples being my Golden Guide posts (which I stopped almost five years ago!) and a series on Mon that I started in September but didn't post until today. (I've posted the Mon series on my front page above yesterday's post even though my other September posts have long since fallen off.)

The Mon series should make up for my dearth of original content today. I don't have time to say much about today's finds:

Phan's 2013 PhD dissertation set me straight six years ago, but it was good to see a short, clear demonstration of the differences between Ct and SV.

Is a slide about velar softening missing?

Pyu makes a cameo appearance on slide 3 (in which Chenla is further northwest than I'd expect)

the people we call today Vietnamese were even more recent arrivals in the Red River Delta as previous thought, probably arriving from the 10th century AD onward, and that the migration (or movement) of Viet-Muong people generally has been from south to north and not reverse. (p. 4)

But this claim of late arrival clashes with the fact that Vietnamese is full of layers of Chinese loanwords going perhaps as far back as the end of the Dong Son period. Those words were acquired during a millennium of Chinese rule in what is now northern Vietnam - a region that Schliesinger regards as a purely Tai area until the 10th century. Vietnamese could not have gotten all those loans via Tai because Vietnamese has far more Chinese loanwords of various ages than the Tai of northern Vietnam.

WHAT IS THE RELATIONSHIP BETWEEN THE KHITAN SMALL SCRIPT AND THE JURCHEN LARGE SCRIPT? (PART 2: THE LOYALTY PRINCIPLE)

(Back to Part 1)

In the Khitan small script, there is a nearly one-to-one relationship between words and character blocks: e.g., the trisyllabic word taulia 'hare' is written as a single block of three characters:


Exceptions are polysyllabic Chinese loanwords which are written with one block per syllable: e.g., the name

<340.339.303 244.357> <h.i.ing s.ung> Hingsung (Xing 2.2)

from Liao Chinese 興宗 *1hing 1tsung 'flourishing ancestor'. (Khitan had no /ts/ in its native phonological inventory, so Chinese /ts/ was often approximated as /s/.)

In theory the name Hingsung could have been written as one five-character block

<340.339.303.244.357> <h.i.ing.s.ung>

since neither /xiŋ/ nor /suŋ/ is a word in Khitan, but the loyalty principle of imitating the original Chinese spelling with separated syllables overruled the normal lexical principle of one block per word.
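The two competing principles can be stated as a small procedure. The following is only a minimal sketch in Python: the function names `to_blocks` and `render` and the boolean `chinese_loan` flag are illustrative assumptions, not anything attested in the scholarship, and the only data used is the transcription <h.i.ing s.ung> given above.

```python
from typing import List

# One syllable = the list of small script character transcriptions that spell it.
Syllable = List[str]


def to_blocks(syllables: List[Syllable], chinese_loan: bool) -> List[List[str]]:
    """Group the characters of one word into blocks.

    chinese_loan is a hypothetical flag: True for a polysyllabic Chinese
    loanword, False for any other word.
    """
    if chinese_loan:
        # Loyalty principle: one block per syllable, imitating the Chinese spelling.
        return [list(syllable) for syllable in syllables]
    # Lexical principle: one block for the whole word.
    return [[char for syllable in syllables for char in syllable]]


def render(blocks: List[List[str]]) -> str:
    """Format blocks in the transcription style used above: dots join the
    characters inside a block, spaces separate blocks."""
    return "<" + " ".join(".".join(block) for block in blocks) + ">"


hingsung = [["h", "i", "ing"], ["s", "ung"]]

print(render(to_blocks(hingsung, chinese_loan=True)))   # <h.i.ing s.ung>
print(render(to_blocks(hingsung, chinese_loan=False)))  # <h.i.ing.s.ung>
```

The first line of output is the attested spelling; the second is the unattested five-character block that the lexical principle alone would predict.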

The loyalty principle has no equivalent in the Khitan large script (KLS). There is no strict one-to-one correlation between Chinese characters and KLS characters:

Liao Chinese pronunciation
Khitan pronunciation



皇帝 *1'hong¹ 3ti 皇帝 EMPEROR₁ EMPEROR₂
(a name)

何至 ha an

夫坐 sho oi

Strictly speaking, 'mountain' and 'commander' are probably parts of Chinese borrowings in Khitan rather than Khitan words.

I have not seen the Chinese borrowing bai 'hundred' outside the 耶律昌允 Yelü Changyun KLS inscription (1062); the usual word is native jau.

The fact that the name 'Han' and 'commander' are written with two KLS characters may indicate that either the KLS had no phonograms <han> and <shoi> for the monosyllables han and shoi or that the KLS may have had logograms pronounced han and shoi which were inappropriate for 'Han' and 'commander' because they stood for other words. A study of multiple-character KLS spellings for Chinese monosyllables may enable us to guess which monosyllables did not have phonograms in the KLS.

"May", because there is at least one case of a Chinese monosyllable with both one- and two-character KLS spellings: 上 *3shang corresponds to


<shang> ~ <sha.ang> ~ <sha.ang>

in lines 3, 1, and 17 of Yelü Changyun. There is also a KLS 北 <shang> used to write Liao Chinese 尚 *3shang. Perhaps 仲 and 北 are morpheme-specific logograms corresponding to the Liao Chinese homophones 上 and 尚.
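The proposed study of multiple-character spellings amounts to a simple filter. Here is a rough sketch in Python, using only the three spellings transcribed above; the dictionary layout and the function name `lacks_single_character_spelling` are assumptions made for illustration, and a real survey would need the full KLS corpus.

```python
from typing import Dict, List

# Attested KLS spellings of Chinese monosyllables, as transcribed in the text:
# each spelling is the list of readings of the KLS characters used.
spellings: Dict[str, List[List[str]]] = {
    "han": [["ha", "an"]],
    "shoi": [["sho", "oi"]],
    "shang": [["shang"], ["sha", "ang"]],
}


def lacks_single_character_spelling(attested: Dict[str, List[List[str]]]) -> List[str]:
    """Return the monosyllables that are only ever spelled with two or more
    KLS characters, i.e. candidates for syllables without a phonogram."""
    return sorted(
        syllable
        for syllable, forms in attested.items()
        if all(len(form) > 1 for form in forms)
    )


candidates = lacks_single_character_spelling(spellings)
print(candidates)  # ['han', 'shoi']
```

The filter drops shang because a one-character spelling is attested beside the two-character one, which is exactly why the conclusion has to stay at "may": a syllable on the candidate list might simply have a phonogram that has not turned up yet, or one reserved for another word.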

The KLS does have a character 上 which looks exactly like Liao Chinese 上 *3shang, but KLS 上 represents the syllable ha instead of shang. KLS can be disorienting from the perspective of someone accustomed to the Chinese script because so many KLS characters do not function like their Chinese lookalikes: e.g.,

Did the Khitan randomly decide to retain Chinese-like readings for some Chinese characters (山, 皇, 帝) and assign arbitrary non-Chinese readings to others (高 etc. in the list above)? I don't think so. I think the un-Chinese readings of 高 etc. originate from the use of those characters as semantograms for non-Chinese languages. 高 etc. may also be simplifications of more complex Chinese characters: e.g., 至 could be a 'katakana' phonogram reduction of a semantogram like









(Some of those characters may postdate the creation of the KLS and therefore be disqualified as potential cognates of KLS 至 <an>.)

Once again I am out of time, so I didn't get to write about the Jurchen text from part 1, much less come even remotely close to answering the title question. What I originally thought might be a single post just gets longer.

¹Yesterday it occurred to me that I could differentiate between 'yin' and 'yang' tones in my tonal notation by marking yang tones with '. I project the absence of non-1 yang tones (2'-, 3'-, 4'-) in modern Mandarin back into Liao Chinese, but I could be wrong.

WHAT IS THE RELATIONSHIP BETWEEN THE KHITAN SMALL SCRIPT AND THE JURCHEN LARGE SCRIPT? (PART 1)

The short and oversimplified answer is that there isn't any.

The real answer is more complicated.

The defining characteristic of the Khitan small script is how its characters are combined into blocks. For that reason, Shimunek (2017) calls it the 'assembled script' to avoid committing to the term 'small script'¹.

Kiyose (1977: 27-28) proposed that the elusive Jurchen small script is nothing more than the Jurchen large script characters combined into Khitan small script-like blocks. The known examples of these Jurchen blocks are in 弇州山人四部稿 Yanzhou shanren sibu gao (Draft [Catalog of] the Four Categories of Yanzhou Shanren['s Library]; 16th c.) and 方氏墨譜 Fang shi mopu (Mr. Fang's Ink [Cake] Book, 1588) and on a 牌子 paizi (travel pass).

Here is Kiyose's (1984) decipherment of the eight blocks in Yanzhou and Fang shi which can be seen at Wikipedia:

block #

bright > wise
prince (< Chn)


virtue (< Chn)



all (< Chn)



'When a wise prince is heedful of virtue /

'Foreigners from the four quarters come as guests'

(tr. by Kiyose 1984: 84)

Unlike most Khitan small script blocks, the blocks in that text are purely vertical: e.g., <tiqo.ci.ghun> and <an.da.hai> are vertical stacks of three characters - an arrangement never found in the Khitan small script. (But two-element vertical stacks like <gen.giyen> and <tuli.le> are occasionally found in the Khitan small script.)

Making images of those blocks and their components took so long that I don't have time to write about the words they represent or how they're strung together! Next time ...

In the meantime, I thank Jason Glavy for making the font that is the basis of nearly all my 600+ Jurchen images. (Seven of the eight images in the transcription of the Yanzhou/Fang shi text above are modifications of characters from his font; only <duwin> 'four' is unaltered.) I couldn't have written any of my many posts about the Jurchen script over the last eight years without his font.

¹The terms 'small script' and 'large script' are only known from Chinese sources. 遼史 Liao shi (History of the Liao Dynasty) vol. 64 says that 耶律迭剌 Yelü Diela


'was able to learn their [the Uyghur] spoken language and script. Then he created (a script) of smaller Khitan characters which, although few in number, covered everything.' (tr. by Kane 1989: 2)

That passage hints at the possibility of the Khitan small script being somehow influenced by Uyghur and indicates that the small script had 'few' characters. The 'assembled script' has characters combining into words as in the Uyghur script and has fewer characters than the other Khitan script (the 'linear script'), so I am certain that the 'assembled script' is the small script (and that the 'linear script' is the large script).

Contrast the "few" characters of that passage with the description of the creation of a Chinese-like first Khitan script with "several thousand characters" in 新五代史 Xin wu dai shi (New History of the Five Dynasties) vol. 72:


'He [阿保機 Abaoji, the first Khitan emperor] employed many Chinese, who taught them [the Khitan] how to write by altering characters in the clerical script, adding here and cutting there. They created a script of several thousand characters, replacing the contracts made by making notches on wood.' (tr. by Kane 2009: 167)

That earlier script must be the large script which has over a thousand known characters and resembles Chinese more strongly than the small script.

Unfortunately, these passages from Liao shi vol. 2 using the term 'large script' do not give any specifics:

五年春正月乙丑,始制契丹大字。

'Fifth year: spring, first month, yichou day: work began on the creation of the Kitan large script.'

九月 [...] 壬寅,大字成,詔頒行之。

'Ninth month [...] renyin day: The large script was completed. It was implemented by imperial edict.' (tr. by Kane 2009: 167)

There was no Khitan script before the fifth year of Abaoji's reign, so the large script must be the earliest Khitan script - the "script of several thousand characters" mentioned in Xin wu dai shi.

JURCHEN LARGE SCRIPT CHARACTER DERIVATIONS: <STAR>, <GIYA>, <HOTO>, <LE>

The first of these occurred to me last night; the rest are from today.

Jurchen reading
Jurchen gloss
Jurchen etymology
cognate sinograph
Chinese gloss
source reading
~ osiha
< Proto-Tungusic *xōsī (Vovin 1996 class handout)

Para-Japonic cognate of Proto-Japonic *osi or *usi 'cow'

giya [kʲa]
< post-Early Middle Chinese 街 *kja

post-Early Middle Chinese *kja

cf. 土 'earth'
a word related to the source of Manchu hoton 'city'
~ le
Middle Chinese *lḛj

A. The Jurchen logogram <STAR> may be a Parhae cognate of standard Chinese 牛 <COW> which was once used to write a para-Japonic cognate of Proto-Japonic *osi or *usi 'cow' and later borrowed to write an unrelated Jurchen soundalike osiha 'star'. That borrowing must postdate the loss of *x- in pre-Jurchen.

(The resemblance between Proto-Tungusic *xōsī 'star' and Japanese hoshi 'id.' is fortuitous. Japanese h- goes back to *p-, and Proto-Tungusic *p- became p- [later f-] rather than zero in Jurchen.)

The second form of <STAR> with ㇓ on the left and a hook on the bottom is from Grube (1896: 1). Without access to the Berlin manuscript that he used, I cannot verify how accurate his handwritten form of <STAR> is.

B. The Jurchen phonogram <giya> may be a Parhae cognate of standard Chinese 家 <HOUSE> (post-Early Middle Chinese *kja) used to write the syllable giya [kʲa]. Jurchen speakers borrowed giya 'street' from post-Early Middle Chinese 街 *kja 'id.' (via Sino-Parhae?; cf. Sino-Korean 街 ka) but wrote it with a version of 家 which was homophonous with 街 in the Chinese known to the Jurchen. (That variety of Chinese had merged the rhymes of 街 and 家, whereas modern standard Mandarin 街 jiē reflects a variety that had not merged those rhymes. 街 and 家 are not homophonous in any modern variety of Mandarin at 小學堂 Xiaoxuetang: compare their readings here and here.)

C. The Jurchen phonogram <hoto> containing 土 <EARTH> may have originated as a logogram <CITY> for an areal word attested in Koguryŏ place names (as 忽 *hot), Manchu (hoton, a loan from Mongolian), and Mongolian qota(n). The logogram could have originated in the Parhae script (to represent the *hot-word from Koguryŏ) or in the completely lost Northern Wei script (to represent a Serbi cognate of Mongolian qota[n]). This logogram may be original to the Parhae or Northern Wei precursor of the Jurchen script and therefore lack a cognate in the standard Chinese script.

D. The Jurchen phonogram <le> [lə] may be a cognate of standard Chinese 礼 <CEREMONY>. In Liao and Jin Chinese, 礼 was pronounced *li which would have been a less than optimal match for Jurchen [lə]. The use of a cognate of 礼 to represent the syllable /lə/ probably predates the shift of *-ej to *-i in northeastern Chinese: i.e., it may go back to the Parhae script or perhaps even the Northern Wei script.

Hiragana れ <re> and katakana ㇾ <re> are respectively derived from a cursive form of 礼 and the right side of 礼, so I regard them as potential 'relatives' of <le>.

The second form of <le> with 天 on the left and a hook on the bottom is from the Berlin copy of the Ming dynasty Bureau of Translators vocabulary and could be a mistake for the correct form with 夫 on the left. The Jurchen script in extant copies of the vocabulary was probably written by Chinese scribes and hence may contain nonnative errors. It might be difficult to differentiate between genuine Ming Jurchen innovations and scribal errors. I assume that the dots of the Ming Jurchen characters

<DAY> inenggi and <MOON> biya

are genuine innovations, but the replacement of 夫 with 天 in <le> may not be.

SINO-PARHAE INFLUENCE ON THE JURCHEN SCRIPT?

Janhunen (1994: 133) proposed that the Jurchen script was a descendant of an "old local system of writing" rather than a 12th century creation as commonly assumed.

An obvious candidate for a concretely identifiable historical entity that had the potential to create a written language in pre-Liao Manchuria is the Bohai 渤海 [= Parhae] kingdom (698-926).


The Khitan and Jurchen "large" scripts were likewise not true "inventions" but, rather, natural stages in an evolutionary process that extended backwards through the Bohai script to some early northern variety of the Chinese script. [...] There is also the possibility that the Korean state of Koguryeo 高句麗 (-668), often regarded as the direct precursor of Bohai, was somehow involved. The influence of United Shilla 新羅 (668-918), a contemporary of Bohai, appears somewhat less likely, but cannot be completely ruled out. (pp. 114-115)

If Janhunen's hypothesis is correct, I might expect some peninsular features in the Jurchen script. Unfortunately, little is known about the languages of the peninsula prior to the invention of hangul in the 15th century, and very little is known about languages outside of Shilla. So it is dangerous to project Shilla features onto the rest of the peninsula, and a greater leap still to assume such features might have reached Parhae in the north.

Nonetheless let's suppose that one of those features - the *-r (> modern -l) that characterizes Sino-Korean (i.e., Sino-Shilla) - was present in the Chinese known in Parhae. It was certainly present in the Chinese of the capital in northwestern China, but whether the feature also existed in northeastern China and Parhae next door is open to question. Let's answer the question in the affirmative for now. If Sino-Parhae had *-r readings for Chinese characters - and its local characters - I would expect *-r local characters to appear in the Jurchen script as symbols for CVr(V) syllable (sequence)s.

In Sino-Korean, 失 <LOSE> is pronounced 실 shil < *sir. Jin (1984: 14) derived the Jurchen phonogram


<šir> (the earlier form is on the left)

from ... 失 <LOSE>. The -r of the Jurchen reading could reflect a Chinese or Sino-Parhae *-r.

How many other Jurchen CVr(V)-characters resemble Chinese characters with Sino-Korean -l (< *-r) readings?

I am not saying that Jurchen has Sino-Korean features. I use Sino-Korean as the only available proxy for Sino-Parhae: how Chinese characters might have been pronounced in Parhae to the north of Old Korean-speaking Shilla.

I am also not saying that all Jurchen CVr(V)-characters must be derived from Chinese characters with Sino-Korean -l (< *-r) readings.
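The first screening step, checking whether a proposed Chinese source character has a Sino-Korean reading in -l, can be done mechanically from the hangul spelling. Below is a minimal sketch in Python that relies on the arithmetic layout of precomposed hangul syllables in Unicode, in which the final consonant index is the offset from U+AC00 modulo 28 and index 8 is ㄹ. The function name `has_final_l` is an assumption for illustration, and the only data are the two Sino-Korean readings cited in this post: 失 실 shil and 保 보 po.

```python
HANGUL_BASE = 0xAC00      # first precomposed hangul syllable, 가
HANGUL_COUNT = 11172      # number of precomposed hangul syllables
FINALS_PER_VOWEL = 28     # "no final" plus 27 final consonants and clusters
RIEUL_INDEX = 8           # position of final ㄹ (-l) in that series


def has_final_l(syllable: str) -> bool:
    """Return True if a single precomposed hangul syllable ends in plain ㄹ."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < HANGUL_COUNT:
        raise ValueError(f"not a precomposed hangul syllable: {syllable!r}")
    return offset % FINALS_PER_VOWEL == RIEUL_INDEX


# Sino-Korean readings cited in this post.
readings = {"失": "실", "保": "보"}

for sinograph, hangul in readings.items():
    print(sinograph, hangul, has_final_l(hangul))
```

Only plain final ㄹ is tested, not clusters such as ㄺ, which is enough here because Sino-Korean readings do not end in clusters. A hit only flags a character as worth a closer look: the graphic resemblance to a Jurchen CVr(V)-character still has to be judged by eye.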

In theory Parhae characters for native Koreanic *CVr(V) words could have been recycled for Jurchen CVr(V)-sequences.

Koreanic need not be the only source of CVr(V)-readings in Jurchen. If Vovin (2012) is correct and Jurchen was already written in Parhae two or more centuries before the establishment of the Jurchen Empire, Parhae characters (渤海字? 渤字?) could have functioned as semantograms for Jurchen words:

Chinese character X meaning Y : Parhae character X' for Jurchen word CVr(V) meaning Y'

Semantograms for Khitan CVr(V) words could have been reused for unrelated Jurchen CVr(V) words and syllable (sequence)s.

Going beyond Jurchen and Khitan, I have already proposed that the Jurchen character

<HORSE> mori(n)

is related to Chinese 保 <PROTECT> (Sino-Korean 보 po) which does not have a Sino-Korean reading ending in -r but which could have represented a para-Japonic (i.e., the peninsular sister of pelagic Japonic) morpheme (sequence?) mor(-i) 'protect(-INF)' (cf. the use of 保 for mori 'protecting' in Japanese names).

A very wild possibility is that some Jurchen CVr(V)-characters may be derived from Parhae characters representing Amuric (!) morphemes. Fortescue's 2016 Proto-Amuric ('Proto-Nivkh' at Wiktionary) reconstruction has *-r(V) and *-ʀ(V)-final roots.

To sum up my thoughts, I present a table of possible sources for Jurchen CVr(V)-readings:

source character
recycled for
Chinese *-r characters
Sino-Parhae *CVr
Jurchen CVr(V)
Parhae characters
Jurchen CVr(V)?
Khitan CVr(V)??
Koreanic CVr(V)???
para-Japonic CVrV????
Amuric CVr(V)?????

My assumption here and elsewhere is that Jurchen character readings are not random in the way that Cherokee character readings appear to be random: e.g., Sequoyah assigned the reading a rather than dV to Ꭰ. (Cherokee <da de di do du dv> are Ꮣ Ꮥ Ꮧ Λ¹ Ꮪ Ꮫ.) I would like all Jurchen character readings to be derived either from Chinese or some non-Chinese language's approximate semantic equivalent of a Chinese morpheme (e.g., a para-Japonic *mor-i 'protect-INF' as a translation of Middle Chinese 保 *pa̰w 'protect').

Incredulity is no argument, I know, but I just can't bring myself to believe that 完顏希尹 Wanyan Xiyin did what Sequoyah did on a mass scale: take character shapes from existing scripts (the Chinese and Khitan large scripts) and assign hundreds of them to Jurchen morphemes and syllable (sequence)s at random. Sequoyah was illiterate until he invented his own script and may have never known English, but Xiyin

was fascinated by Chinese classics, and collected a large library when Jurchens seized and looted the capital of the Northern Song dynasty, Bianjing (present-day Kaifeng), in the Jin–Song Wars.

That happened either during the first siege of Bianjing in 1126 or the second in 1127 - years after c. 1119-1120, when Xiyin was said to have created the script. In theory Xiyin could have been illiterate until (or even after?) the siege and just liked the idea of having books he couldn't read, but I doubt that. The Jurchen had lived under literate rulers familiar with Chinese culture for centuries. Surely 阿骨打 Aguda, the founder of the empire, would have assigned a literate man to 'create' a script for his new state.

I put 'create' in quotes, since I think Xiyin standardized an existing script. Maybe standardized should also be in quotes, since the Jurchen script has a lot of variation. This variation may imply that the script has a lot of history behind it. (The Tangut script is most likely a true invention without a history, and it has far less variation.)

Some of that variation postdates Xiyin's time: e.g., the dots of

<DAY> inenggi and <MOON> biya

are not in the manuscript thought to be the earliest example of the Jurchen script. I am not counting the Parhae tiles that Vovin (2012) regarded as even earlier examples. If one counts those tiles, then that manuscript may be the earliest example of post-Xiyin written Jurchen.

Variation in the Jurchen script - and the Khitan scripts - is an issue deserving of much attention. Jin (1984) has already done some basic work by identifying which texts characters appear in. The next step is to create a visual chronology organizing characters by date: e.g.,

c. 1500
<DAY> inenggi
<MOON> biya
<COOKED> uru

Note how the earlier version of <šir> is closer to Jin's (1984) proposed Chinese source character 失 <LOSE>. The newer version of <šir> has a lookalike of the Khitan small script character 051 <qa>

atop a half-height 人, whereas the older version and 失 have a full-height 人 shape.

I included the Jurchen character <COOKED> because its later version looks exactly like 失 <LOSE>. The absence of a dot on the bottom of the late version could simply be a mistake in the Berlin copy of the Ming dynasty Bureau of Translators vocabulary. I have no idea why the shape of 失 <LOSE> - with or without a dot - was read as uru. Korean ilh- < ìrh- 'lose' and Old Japanese usinap- 'lose' only have one segment matching uru, and Proto-Amuric bək(ə)z- 'lose' doesn't match at all. Was there a Khitan root ur(u)- 'lose'? Might Khitan large script character 1511 <?>

have been read ur(u)-?

¹11.17.23:52: In 1834, Samuel Worcester inverted Sequoyah's Λ <do> to Ꮩ to differentiate it from Ꭺ <go>.

Tangut Yinchuan font copyright © Prof. 景永时 Jing Yongshi
Tangut character image fonts by Mojikyo.org
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2019 Amritavision