字典𡦂喃引解 Tự Điển Chữ Nôm Dẫn Giải 'Character Dictionary of Chữ Nôm with Quotations and Explanations' lists

⿰貝骨 (not in Unicode) lót 'bribe'

as a homophone of lót 'to add a layer beneath or inside' from yesterday. (I suspect the noun is an extension of the verb: a bribe is something one pockets - put inside.)

貝 bối 'shell' on the left is the monetary radical. It's not surprising.

What is surprising is 骨 cốt 'bone' on the right with initial [k] instead of [l]. Or is it?

Using yesterday's scoring system for phonetic fidelity, ⿰貝骨 is a 7:

- the initial consonant is a 0 - [k] and [l] have nothing in common

- the vowel is a 3 - o [ɔ] and ô [o] are both back rounded and of the same length; only their height differs

- the final consonant is a 2 - a perfect match

- the tone is a 2 - a perfect match
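The vowel score is just a count of shared features, so it can be sketched in a few lines of Python. This assumes one point per shared feature and treats four shared features as the perfect match worth 4. The feature labels given to o and ô here (including calling both 'long') are illustrative assumptions rather than part of the scale:

```python
from typing import NamedTuple


class Vowel(NamedTuple):
    frontness: str
    height: str
    rounded: bool
    length: str


# Assumed feature values, for illustration only.
O = Vowel("back", "open-mid", True, "long")        # o [ɔ] as in lót
O_CIRC = Vowel("back", "close-mid", True, "long")  # ô [o] as in cốt


def vowel_score(a: Vowel, b: Vowel) -> int:
    """One point per shared feature; all four shared is the perfect match (4)."""
    return sum(x == y for x, y in zip(a, b))


# Initial, final, and tone scores are hand-assigned, copied from the list above.
total = 0 + vowel_score(O, O_CIRC) + 2 + 2
print(vowel_score(O, O_CIRC), total)  # prints: 3 7
```

The other three components stay hand-entered because they are judgment calls, not something this sketch tries to compute.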

Taberd lists a spelling of lót 'bribe' with a matching initial and an ironic original meaning:

律, originally for luật 'law' (bare phonetic)

I find his entry format confusing:

— 揬 | đút —, subornare

Why are the dashes in the Chữ Nôm and the Quốc Ngữ romanization on opposite sides? Why isn't the entry like this?

揬 — | đút —, subornare

đút, another word for 'bribe' (presumably an extended usage of đút 'to insert'), has two other spellings without the 扌 'hand' radical (the means of insertion):

⿰貝突 with the monetary radical plus the same phonetic 突 đột 'suddenly'

With this spelling, the syllables of the redundant compound ⿰貝突⿰貝骨 đút lót 'bribe' would have matching radicals; cf. Sino-Vietnamese 賄賂 hối lộ 'bribe' with double monetary radicals

賥 with the monetary radical plus the phonetic 卒 tốt 'to end'

Let's score those spellings:

揬 and ⿰貝突: initial 2, vowel 3, final 2, tone 1 = 8

賥: initial 1.5 (t- is closer to đ- than, say, l- which would be a 1), vowel 3, final 2, tone 2 = 8.5

Do scores correlate with textual frequency? Did writers tend to favor better phonetic matches? Probably not. I admit my scoring is arbitrary and for fun. And timely given that the


Thế vận hội Mùa đông

'World athletic meeting Season winter' = 'Olympic Winter Games'

are still going. Though not for long - they end tomorrow.

(I wanted to type a made-in-Vietnam character for mùa 'season', but my editor doesn't support CJK Unified Ideographs Extension E. And it probably never will since KompoZer's development has been frozen since 2010.)

A LÓT OF COMPROMISES: FITTING VIETNAMESE INTO A CHINESE SYLLABARY

Today I found out that one of the Chữ Nôm spellings of Vietnamese tốt 'good' (see parts 1 and 2 of my series)

䘹 = semantic 衤 y 'clothes' + phonetic 卒 tốt 'to end'

is also a Chinese character in the strict sense; it is attested in Chinese as early as c. 2,000 years ago in 楊雄 Yang Xiong's 方言 Fangyan 'Regional Speech', where it refers to *tsout 'underwear'. Did the Vietnamese recycle 䘹 for tốt 'good', or did they unintentionally recreate it? I suspect the latter, as 䘹 is a rare character; the fact that it was encoded in Unicode's Extension A block rather than the main CJK Unified Ideographs block tells me that it wasn't common enough to make it into the first wave of 20,902 characters.
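Which block a character landed in is easy to check from its code point. A rough Python sketch: the ranges are the ones given in the Unicode Standard's Blocks.txt for the blocks mentioned in these posts, while the function name cjk_block and the fallback label 'other' are made up for illustration. Python's unicodedata module has no block lookup, so the table is hard-coded:

```python
# (start, end, name) for the CJK ideograph blocks mentioned in these posts.
CJK_BLOCKS = [
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0x2B820, 0x2CEAF, "CJK Unified Ideographs Extension E"),
]


def cjk_block(char: str) -> str:
    """Return the name of the listed block containing char, or 'other'."""
    code_point = ord(char)
    for start, end, name in CJK_BLOCKS:
        if start <= code_point <= end:
            return name
    return "other"


for char in "卒䘹":
    print(f"U+{ord(char):04X}", char, cjk_block(char))
```

The phonetic 卒 falls in the main block and 䘹 in Extension A, which is all the argument above relies on.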

字典𡦂喃引解* Tự Điển Chữ Nôm Dẫn Giải 'Character Dictionary of Chữ Nôm with Quotations and Explanations' lists a second reading for 䘹, lót 'to add a layer beneath or inside', citing


lót trong áo cừu

'add.layer in coat'

from 嗣德聖製字學解義歌 Tự Đức thánh chế tự học giải nghĩa ca 'Tự Đức's Sagely Made Song for Character Study and Explaining Meanings' edited by Emperor Tự Đức sometime in the 19th century.

Normally the Vietnamese did not write l-syllables with t-characters. The other three spellings of lót in Tự Điển Chữ Nôm Dẫn Giải have l-phonetics:

1. 律 luật 'law' (bare phonetic)

2. 𢯰 =扌 'hand' + 律 luật

3. ⿰衤律 (not in Unicode) = 衤 'clothes' + 律 luật

It is not possible to write lót with a Chinese character read as lót [lɔt] or even as lốt [lot] because no Chinese characters with those Sino-Vietnamese readings exist.

Chinese syllables with *l- ending in stops were borrowed with the nặng tone written as a subscript dot, not the sắc tone written with an acute accent. I can't think of any exceptions to this rule at the moment. So it seems a tonal match was impossible.

Tone aside, a perfect segmental match was also impossible.

lọt may not even have been a theoretically possible reading since *-ɔt does not seem to have been a rhyme in any variety of Chinese known to the Vietnamese during a millennium of Chinese rule.

The absence of lột, on the other hand, is partly accidental - there was no Chinese phonotactic rule forbidding it. lột would have ultimately come from an early Old Chinese *Cʌ-rut, with a *low vowel conditioning the lowering of *u:

*Cʌ-rut > *Cʌ-rout > *rout > *lout > *lot (> Sino-Vietnamese *lột)

luật 'law' comes from early Old Chinese *rut without a preceding *low vowel to lower its *u:

*rut > *lut > *lwit > *lwət (> Sino-Vietnamese luật)

Without any Chinese characters read as lọt or lột (or lót or lốt), the Vietnamese

- had to compromise on the tone if they were to use an l-phonetic

- had to compromise on the vowel if they were to use an l-phonetic

- had to compromise on the initial if they were to use a non-l -ốt phonetic

The four spellings of lót reflect two different kinds of compromises:

- the 律-spellings have l- at the expense of the tone and the vowel

- 䘹 has a perfectly matching rhyme at the expense of the initial

It seems the Vietnamese generally favored approximating the initial, but I would like to see statistics.

It would be fun to come up with a scoring system for how closely a Chữ Nôm character reading matches the pronunciation of its Chinese phonetic component**. Off the top of my head:


Consonants (onset or coda):

0 points - nothing in common

1 point - shared point of articulation (l- and t- as in 䘹) or shared manner of articulation

2 points - both shared point and manner of articulation


Vowels:

0 points - nothing in common

1 point each - shared frontness, height, roundness(less), or length

4 points - perfect match


Tones:

0 points - nothing in common

1 point - same register or *VQHC class

2 points - perfect match

Using that scoring system, out of a maximum score of 10 (2 for onset consonant, 4 for vowel, 2 for coda consonant, 2 for tone):

the 律-spellings have 7 points:

2 + 2 (length match; partial shared frontness and roundness) + 2 + 1 (same *VQHC class)

䘹 has 9 points:

1 (l- instead of t-) + 4 + 2 + 2

Yet spellings of the 䘹 type which compromise on the initial are less common despite their higher score. That makes me think I need a better scale for measuring onset fidelity.
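The tallies above are easy to mechanize. Here is a minimal Python sketch that only checks hand-assigned component scores against the scale's maxima and sums them; it does not derive the scores from phonetic features. The names MAX_POINTS and fidelity are hypothetical, and floats are allowed on the assumption that half points may be wanted for finer distinctions:

```python
# Maximum points per component on the 10-point scale described above.
MAX_POINTS = {"onset": 2, "vowel": 4, "coda": 2, "tone": 2}


def fidelity(scores: dict[str, float]) -> float:
    """Sum hand-assigned component scores after checking them against the scale."""
    if scores.keys() != MAX_POINTS.keys():
        raise ValueError("expected exactly these components: " + ", ".join(MAX_POINTS))
    for part, points in scores.items():
        if not 0 <= points <= MAX_POINTS[part]:
            raise ValueError(f"{part} score {points} is outside 0..{MAX_POINTS[part]}")
    return sum(scores.values())


# Component scores copied from the text; they are judgments, not computed here.
luat_spellings = {"onset": 2, "vowel": 2, "coda": 2, "tone": 1}  # 律-type spellings
tot_spelling = {"onset": 1, "vowel": 4, "coda": 2, "tone": 2}    # 䘹

print(fidelity(luat_spellings), fidelity(tot_spelling))  # prints: 7 9
```

A better onset scale would only change the numbers fed in, not the tally itself.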

*2.23.14:10: Chữ 'character' can be spelled at least six different ways in Chữ Nôm:

1. 字

2. 𡦂 (字 doubled; this character formation strategy is rare)

3. 𡨸 (like 2 but with one 字 abbreviated as 宁)

4. ⿰ 字 + 宁 (reversal of 3)

5. ⿰ 字 + 文 'writing'

6. 𡨹 (like 4 but with 守 thủ 'to guard' instead of 字; 𡨹 is a Chữ Nôm character for giữ 'to guard' doing double duty for a phonetically similar word chữ; its phonetic 宁 is an abbreviation of 字 chữ)

I don't know which is the preferred spelling of the author (Nguyễn Quang Hồng). I picked 𡦂 to differentiate chữ from 字 tự which also means 'character'. Both words are borrowed from the same Chinese etymon at different periods.

**2.23.14:11: 𡗶 is a rare example of a Chữ Nôm character without a phonetic component: it is a compound of 天 'heaven' above 上 'above'. It is reminiscent of the Khitan large script character

for 'heaven' (with 土 'earth' on the bottom) which probably predates it. (The earliest surviving Chữ Nôm text is from 1209; the earliest known uses of Chữ Nôm from the late first millennium do not include 𡗶.)

WHAT'S SO BEAUTIFUL ABOUT A BUG'S BOTTOM?: THE ORIGIN AND ORTHOGRAPHY OF VIETNAMESE TỐT (PART 2)

I've abandoned a lot of series on this blog, but I haven't forgotten to continue what I started two weeks ago.

nomfoundation.org and hvdic.thivien.net both list nine Chữ Nôm spellings of Vietnamese tốt 'good':

Group A with phonetic 卒 tốt 'to end'

A1: bare phonetic

1. 卒

A2: tốt đẹp: 'good' in the sense of 'good-looking'

2. 䘹 with 衤 'clothes' on the left

Was 䘹 used in reference to clothes, or could it be used more broadly?

3. 𬙼 with 美 'beautiful' on the left

4. 𩫛 with 高 'high' on the left (why isn't this in group A3?)

5. 𡄰 with 善 'good (opposite of evil)' on the left (why isn't this in group A4?)

A3: tốt (dáng cao): 'good' in the sense of 'high'

6. 崪 with 山 'mountain' on the left

7. 崒 with 山 'mountain' on top

A4: tốt xấu: 'good' as the opposite of 'bad'

8. 𡨧 with a 宀 roof on top (why? - it's not by analogy with a roof in xấu 'bad', since none of the spellings of xấu contain a roof: 丑瘦臭醜.)

Group B without any phonetic: tốt đẹp: 'good' in the sense of 'good-looking'

9. 𧍉 with 虫 trùng 'bug' plus 底 để 'bottom'

This last character is like so many Tangut characters: it does not seem to be the sum of its parts. Its components neither sound like tốt nor mean anything like 'good'. I wish I could find an example of 𧍉 in context to confirm this reading.

𧍉 has a second reading which makes phonetic and semantic sense: đỉa 'leech'.

HENTAI KITSUBUN?

In premodern Japan, there was a form of Japanized Chinese now known as 變體漢文 hentai kanbun 'modified Chinese prose'.

Just as the Japanese once wrote in Chinese, the Jurchen once wrote in Khitan. There was no Jurchen script until 1119. Nonetheless, as late as 1156, 18 years after the creation of the second Jurchen script,

... it was officially ordered that in the [Jurchen Empire's] examination for copyist in the Department of National Historiography the Jurchen copyists be able to translate Kitan [= Khitan] into Jurchen, and the Kitan copyists Chinese into Kitan. Even the Jin [= Jurchen] emperor Shizong commented, "The new Jurchen script cannot match it [Khitan]." The Chinese original was first written in the Kitan small script and then annotated in or translated into the Jurchen script. (Kane 2009: 3)

Last night as I was thinking about the last known dated Khitan small script text from the Jurchen era known as

<GREAT SEAL.gha.am> and <HEAVEN SEAL.gha.am> (1161-1189)

in the Khitan small script, I realized that Jurchenized Khitan might be called 變體契文 hentai kitsubun 'modified Khitan prose' - or in Sino-Jurchen, something like biyanti kiwen.

How might Khitan be Jurchenized? Khitan seems to have had grammatical gender (Kane 2009: 144):

In the past tense of verbs, one can also see this distinction between the suffix <er> for males and <én> for females.


With the numerals, Wu Yingzhe has noticed an important phenomenon: in most cases, the dotted form refers to a male, and the undotted form to a female, or is non-gender specific. Dotted and undotted forms also appear with inanimate objects, strongly suggesting grammatical gender in Kitan. The whole corpus needs to be reexamined with a view to pursuing these clues, but that research has not yet been done.

Jurchen, on the other hand, did not. In theory Jurchen speakers writing in Khitan might have omitted masculine dots or added them to nonmasculine forms. (Not just feminine because I suspect there might be a neuter gender with agreement patterns blending masculine and feminine characteristics; seemingly inconsistent nouns may have been neuter.) Do inconsistencies in gender marking cluster in Jurchen-period Khitan texts? Even if that were the case, that does not necessarily mean gender problems would have been unique to Jurchen speakers. Perhaps gender was on the decline in native speaker Khitan.

Other Jurchen errors from a Khitan native speaker's perspective could have been less dramatic: e.g., incorrect case marking akin to saying Japanese-style 'X DAT become' instead of 'X NOM become' for 'become X' in Korean.

Centuries later, the Jurchen used Khitan's sister language Mongolian in writing after they had forgotten their own scripts. Has there ever been a study of Mongolian as written by Manchu speakers?

So far, the Khitan corpus has generally been treated as a single entity. But Khitan was written across a wide area for three centuries. What may appear to be inconsistencies within the corpus may turn out to be innovations and/or hentai kitsubun features correlated with specific times and/or places.

*2.22.9:24: The hypothetical Khitan neuter might be like the Romanian neuter which has no unique features:

[...] in synchronic terms, Romanian neuter nouns can also be analysed as "ambigeneric", i.e. as being masculine in the singular and feminine in the plural

Or maybe there is no neuter. The dots may not have a simple one-to-one correlation with the two genders.

KHITAN DOROGHAM 'PEACE'?

Last night I mentioned one exception to 'heaven' and 'great' matching up in the Khitan large and small scripts in Andrew West's list of era names.

The Chinese era name 大定 'great settlement' (1161-1189; Kane 2009's translation) corresponds to


in the large script and both

<GREAT SEAL.gha.am> and <HEAVEN SEAL.gha.am>

in the small script:

I assume the third large script character

is <gham> corresponding to


in the small script.

Did the Khitan have two era names recorded in the small script? This list mentions only one small script inscription from that era: the epitaph for the 博州防禦使 Bozhou defense commissioner (1171). I don't have access to the text. Do both small script era names appear in it, or is a single instance of the era name difficult to read? I can imagine a damaged 'great' looking like it could be 'heaven' in the small script or vice versa.

In any case, the second name element also appears in the Khitan large script equivalent of the Chinese era name 保寧 'protect tranquility' (969-979; again, Kane's translation):

<? SEAL gham>

The first character might represent a Khitan word for 'protect'. I cannot guess its reading because I don't know the Khitan small script equivalent of that era name.

I can, however, draw this equation:

定 'settlement' = 寧 'tranquility' = large <SEAL gham> = small <SEAL gha.am>

Thesaurus Linguae Sericae defines both 定 and 寧 as 'peaceful' in the synonym group 'delightful because orderly and lacking chaos'.

I conclude that <SEAL.gha(.a)m> (a joint transliteration of the large and small spellings) could have meant something like 'peace'. It may have had nothing to do with seals or rituals.

The Khitan word for 'seal' doubled as the word for 'ritual'. Jurchen doro(n), written as


<SEAL> ~ <SEAL> ~ <SEAL.un> (with clarifier)

with a character clearly related to


in the Khitan large script, also had that double meaning. Jurchen doro(n) may either be a borrowing from Khitan or be an unrelated word whose semantic scope was influenced by Khitan.

Khitan <SEAL> in <SEAL.gha(.a)m> may be a phonogram rather than a logogram. If the Khitan word <SEAL> was the source of Jurchen doro(n), it might have been read doro, and the word in question was dorogham.

Khitan dorogham looks vaguely like Written Mongolian words such as doru 'weak' and doru-ghsi 'downward' (cf. dege-gsi 'upward'*). Could they all share a root *dor 'down'?

> 'calmed down' > 'peaceful'

> 'pushed down onto paper' > 'seal'

> 'act performed to calm down', 'act of stamping a mark on circumstances' > 'ritual'

> 'strength down' > 'weak'

So perhaps <SEAL> wasn't just a phonogram; it might have been etymologically appropriate as well in <SEAL.gha(.a)m>. But what would Khitan -gham be if <SEAL> was the root?

*2.21.1:51: -gsi is an allomorph of the Mongolian '-ward' suffix after 'feminine' vowels like e.

THE RE-DISSECTION OF KHITAN 'SUCCESSION'

Last night I noticed something obvious that eluded me three years ago: the second character of the Khitan large script equivalent of the Chinese era name 統和 'uniting harmony'

looks like 統 'to unite, govern', the first character of the Chinese era name. Could the 統-like Khitan large script character have represented a Khitan borrowing of Liao Chinese 統 *tʰuŋ 'to unite, govern' or a native Khitan word meaning 'to unite' and/or 'to govern'? If so, then my earlier equation of that single character with two characters in the large script and with <s.bu.o.ɣo> in the small script would have to be changed to


'to unite/govern'? ≠ 'succession' = 'succession'

That then brings up the disturbing possibility that other Khitan large script era names may not be equivalents of Khitan small script names even if there are definite partial matches: e.g., 'heaven' and 'great' almost always match up in the two scripts in Andrew West's list of era names.

I will look at one exception to that generalization in my next post.

ECHOES OF THUNDER: CLARIFIERS IN THE JURCHEN LARGE SCRIPT

Today I saw Japanese dare 'who' spelled as 誰れ <WHO.re> in the title of the Gatchaman episode that first aired forty-five years ago today: 「総裁Xは誰れだ」. That is unusual because 誰 by itself is normally sufficient as a logogram for dare 'who'. The character has no other modern reading when it stands alone for a word. (The reading tare is archaic; no one would look at a cartoon titled 「総裁Xは誰だ」 and wonder whether 誰 should be read tare or dare.) Therefore there is no need to add a clarifier hiragana れ <re>.

The Jurchen script is full of such clarifiers: e.g., akdiyan 'thunder' appears as both



If not for the word's Manchu reflex akjan and Chinese transcriptions like 阿玷 ~ 阿甸 *atjan*, we would not be able to reconstruct the probable reading of <THUNDER>.

Why did the Jurchen write akdiyan with a clarifier <an>? If Juha Janhunen is right, the Jurchen large script was not invented; rather, it was an adaptation of an existing variant of the Chinese script that was in use in the kingdom of Parhae that once ruled the Jurchen. If <THUNDER> was taken from that Parhae script, the Jurchen may have thought <THUNDER> could stand for either the lost Parhae word for 'thunder' or their own word. Adding <an> ensured that <THUNDER.an> would be read as their word akdiyan rather than as the Parhae word (which in this scenario wouldn't have ended in -an).

The Jurchen could have gotten the idea of clarifiers from their southern neighbors in Korea who used clarifiers to indicate that Chinese characters were to be read as Korean words rather than as Chinese words: e.g., in line 7 of the Old Korean poem 彗星歌 Hyeysŏngga 'Song of the Comet' (lit. 'Sweeping Star Song') by 融天師 Master Yungchhŏn in the 鄉札 hyangchhal script during the reign of Shilla's 眞平王 King Chinphyŏng (r. 579-632),

道尸 <ROAD.l> 'road' (cf. modern Korean kil)

掃尸 <SWEEP.l> 'sweep' (cf. modern Korean ssŭl)

星利 <STAR.li> 'star.?' (cf. modern Korean pyŏl.i**)

are written with the clarifiers 尸 <l> and 利 <li> to rule out Sino-Korean readings *to, *so, and *seŋ (later syŏng and now sŏng) for 道, 掃, and 星. We do not know for certain whether the modern Korean words above are the direct descendants of the Old Korean words, but they are the most likely reflexes even though in theory Old Korean could have had another word for road ending in *-l that was not ancestral to modern kil, etc. We can be far less certain about the Old Korean pronunciations of <ROAD>, <SWEEP>, and <STAR> without internal sources (i.e., alternate phonogram spellings in hyangchhal) or external sources (e.g., Chinese transcriptions).

*2.19.5:16: 阿玷 is from the Sino-Jurchen vocabulary of the Bureau of Translators (四夷館; entry 7) and 阿甸 is from the Sino-Jurchen vocabulary of the Bureau of Interpreters (會同館; entry 4).

**2.19.23:39: <STAR.li> at first appears to correspond to modern Korean 'star' followed by the nominative case marker i, but the context seems to require an accusative case marker, so either the usage of i may have changed over time or 'star' in fact was once a disyllabic word ending in *-li or *-ri with a final *-i reminiscent of Old Japanese posi 'star' which is sometimes thought to be a cognate. I need to look into this more.

