It occurred to me today that both Old Persian cuneiform and the Khitan small script superficially resemble other scripts (Sumero-Akkadian cuneiform and Chinese characters) but operate on different principles.

Old Persian cuneiform was a syllabary with random gaps (transliterated below) plus a few logograms for 'Ahura Mazda', etc. not included in the table.


The gaps do not correspond to gaps in Old Persian phonology or phonetics: e.g., although there was no cuneiform character <ki>, Old Persian did have /ki/, and that was written as a sequence <ka.i>.
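The fallback spelling described above can be sketched in Python. This is only a minimal illustration: the sets of consonants with dedicated <Ci> and <Cu> signs are my assumption about the standard sign list (they are not taken from the table in this post), every consonant is assumed to have a <Ca> sign, and the function name `spell` is hypothetical.

```python
# Assumed inventory: consonants that have a dedicated <Ci> or <Cu> sign.
# Every consonant is assumed to have a <Ca> sign.
CI_SIGNS = {"d", "j", "m", "v"}
CU_SIGNS = {"d", "g", "k", "m", "n", "r", "t"}


def spell(consonant: str, vowel: str) -> str:
    """Transliterate a CV syllable, filling gaps with a <Ca.V> sequence."""
    if vowel not in ("a", "i", "u"):
        raise ValueError(f"unsupported vowel: {vowel!r}")
    if vowel == "a":
        return f"<{consonant}a>"
    signs = CI_SIGNS if vowel == "i" else CU_SIGNS
    if consonant in signs:
        return f"<{consonant}{vowel}>"
    # No dedicated sign exists: write the <Ca> sign plus the vowel sign.
    return f"<{consonant}a.{vowel}>"


print(spell("k", "i"))  # gap, so the fallback sequence is used
print(spell("k", "u"))  # dedicated sign
```

Under those assumptions the gap for /ki/ comes out as the sequence <ka.i>, while /ku/ gets its own sign.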

The Khitan small script (KSS) also seems to have been a syllabary with random gaps plus some possible consonant letters and a few logograms. The phonetic values of the c. 400 distinct syllabograms are not well understood and in some cases are unknown. But the picture that emerges from Kane's (2009) transliteration of the KSS is one of random gaps: e.g., unlike Old Persian, the KSS has <ki>, but no <ka> is known yet. That particular gap may reflect the absence of ka in Khitan if its phonology were like that of its surviving relative Mongolian. However, other gaps may have been random: e.g., there is no known syllabogram <ma>, though there was a syllable ma that had to be written as a letter sequence <m.a>.

The Khitan and Jurchen large scripts are mixtures of syllabograms and logograms. The Khitan large script is too poorly understood for me to say anything about gaps in it. The Jurchen large script, OTOH, is mostly readable, and Kane's (1989: 27) table of syllabograms shows a few gaps. I have transliterated their contexts below:


The absence of <si> is not surprising since *si could have become ši or merged with ši. Jurchen could have been like Korean or Japanese which lack a distinction between /ši/ and /si/.

On the other hand, it is striking that gaps cluster in the column of <Co>-syllabograms. There is no obvious phonetic motivation for the absence of <Co>-syllabograms with the initials m- n- c- š- k- which do not constitute a coherent class of consonants. Nor is there a clear reason why there is no <de> if <te> and <ne> exist with initials at the same point of articulation. Each of those gaps could be either random or illusory - the 'missing' syllabograms could simply be one of the characters whose readings are currently unknown.

The Jurchen small script is all but unknown; the existing samples are too small for decipherment, much less the detection of gaps.

DID OLD PERSIAN HAVE UNWRITTEN FINAL CONSONANTS LIKE PYU?

It seems that Pyu sometimes had unwritten syllable-final consonants with the exception of /h/ which was always written on the line as a colon-like visarga. Some Pyu texts have subscript syllable-final consonant symbols and others don't. One Pyu text - the 'B' pillar of the Kubyaukgyi (a.k.a. Myazedi) inscription - has subscript consonants only in its first three lines and none in the remaining twenty-six. There is no obvious correlation between the presence or absence of subscript consonants and geography, date, or genre. The problem of why there were two styles of writing Pyu is reminiscent of the problem of why the Khitan had two scripts.

The Indic scripts of the Philippines originally had no means of indicating final consonants, and the Hanunó'o script is still generally written without the pamudpod vowel cancellation sign introduced in the 1950s.

Schmitt (2008: 84) suggests that Old Persian may have had a third type of situation in which some final consonants were written (/m r š/) and others were not though they

were perhaps still pronounced but in some manner phonetically reduced. Note that original Proto-Iranian *-a is written as Old Persian <-a> (i.e., [-aː]), but original *-an or *-ad is written as -<Ca> (i.e., [-a]).

I see two possibilities here:

1. *-an and *-ad merged into final short [-a] distinct from *-a which became long [-aː].

2. *-an became nasalized short [-ã] and *-ad became short [-aʔ] with a final glottal stop.

I used to think that Pyu also had unwritten nasal vowels and glottal stops that were reduced from earlier nasals and oral stops that were once written, but even the earliest texts do not always have final written consonants, even when there is more than sufficient space for them.

DOES PERSIAN 'AND' HAVE A PROTO-INDO-EUROPEAN SOURCE?

My short answer is no.

My long answer:

Persian و <wa> [væ] ~ [o] 'and' looks like a loan from the identically spelled Arabic و wa, but is in fact a convergence of an Arabic loanword with a native word *u < Old Persian utā (cf. Avestan and Vedic Sanskrit uta 'and'; the final lengthening is secondary).

Wiktionary derives utā in turn from Proto-Indo-European (PIE) *éti 'and', the source of Latin et 'and'. There are at least three problems with that etymology:

1. PIE *e should become Old Persian a, not u.

2. PIE *i should become Old Persian i (word-final -iy), not -ā.

3. There is already an Old Persian ati- 'beyond' (cf. Avestan aiti-* and Sanskrit ati- 'id.') which looks like the regular reflex of PIE *éti.
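The first and third problems can be made concrete with a small Python sketch. It uses only the correspondences named above (*e > a, *i > i) plus an unchanged *t, which is my assumption; it is not a full set of sound laws, and the function name `expected_reflex` is hypothetical.

```python
import unicodedata

# Only the correspondences listed above, plus an assumed unchanged *t.
PIE_TO_OLD_PERSIAN = {"e": "a", "i": "i", "t": "t"}


def expected_reflex(pie_form: str) -> str:
    """Map a PIE form to its expected Old Persian shape, ignoring accents."""
    decomposed = unicodedata.normalize("NFD", pie_form.lstrip("*"))
    # Drop combining marks such as the acute accent on *é.
    segments = [ch for ch in decomposed if not unicodedata.combining(ch)]
    return "".join(PIE_TO_OLD_PERSIAN[ch] for ch in segments)


print(expected_reflex("*éti"))
```

Under those assumptions *éti comes out as ati, which matches the attested ati- 'beyond' rather than utā.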

I think it's more likely that *uta was a Proto-Indo-Iranian innovation unless there are *uta-like forms elsewhere in Indo-European.

*2.16.15:21: The first -i- in Avestan aiti is epenthetic and conditioned by an i in the following syllable.

WHY WRITE 'WIND' AS 'PAGE NUMBER'?

The last of the fourteen spellings of Vietnamese gió 'wind' in the traditional Chữ Nôm script at nomfoundation.org is

𩖅 = số 'number' + 頁 hiệt 'page' (originally 'head')

số is phonetic. In Middle Vietnamese s- was retroflex [ʂ] and gi- was palatal [ɟ]*, but in modern Hanoi, they are respectively much closer as alveolar [s] and [z]. Does the spelling 𩖅 reflect a dialect like Hanoi? How far back does it go, and is it associated with a certain region? In theory the Chữ Nôm script could be a rich source of dialect history since scribes could invent characters for native Vietnamese words incorporating phonetic elements whose readings resembled those words in their dialect (but not necessarily in other dialects or even their own dialect at a different point in time).

The function of the right half of 𩖅 is obscure. The wind has nothing to do with pages or heads. But wait, I see at hvdic.thivien.net that 𩖅 could also write sỏ in đầu sỏ 'leader'. I think that word is a compound of the Chinese loan 頭 đầu 'head' and the native word sỏ 'head of a pig'. If so, then 𩖅 for gió 'wind' is a case of a Chữ Nôm character originally devised for one Vietnamese word being recycled to write another:

sỏ 'head of a pig' > written as 𩖅 'số-head' > 𩖅 recycled for gió 'wind'

What I still don't understand is how 頁 'head' came to represent 'page'. In Vietnamese as far as I can tell, hiệt (ultimately going back to Old Chinese *get) means both 'head' and 'page', but in Chinese, 頁 has a second, unrelated reading for 'page' going back to Old Chinese *sɯ-lap 'leaf' (normally written 葉). In theory the 'page' reading of 頁 should exist in Vietnamese as *diệp (the reading of 葉 'leaf'), but no such reading seems to exist.

*2.15.0:20: De Rhodes (1651) said gi- "should be pronounced in the Italian manner" (translation from Gregerson 1969: 161). I interpret that to mean gi- was a palatal stop [ɟ] rather than an Italian palato-alveolar affricate [dʒ] since the former is more likely in Southeast Asia.

2.15.3:03: Added a high-vowel presyllable *sɯ- to Early Old Chinese *lap 'leaf' to account for the lack of emphasis which is normally conditioned by lower vowels such as low *a in Middle Old Chinese. The phonetic series of 葉 (Karlgren's GSR 339 + 633) points to *sɯ- in most cases.

Forms listed in order of development from Stage 1 to Stage 4:

世 'generation' (< 'leaf' + suffix): *sɯ-lap-s > *slap-s
葉 'leaf': *sɯ-lap > *lap
韘 'archer's thimble': *sɯ-lap > *slap
屧 'bottom inlay in shoe': *sʌ-lep > *sʌ-lˁep > *slˁep

In Stage 2, high-voweled *Cɯ- blocks emphasis in the following syllable, but low-voweled *Cʌ- conditions it.

In Stage 3, some *CV-presyllables are reduced to *C- whereas others are dropped entirely.

In Stage 4, *sl- has fused into *l̥-, whereas *sV-l- still intact at stage 3 became a new *sl-.
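The Stage 2 rule is the only one of the three that is fully deterministic as stated, so it can be sketched in Python. This is a rough illustration, not a full model: it assumes forms are written as presyllable, hyphen, main syllable, that the main syllable begins with a single consonant letter, and that forms without a presyllable are left untouched. The name `stage2` is hypothetical.

```python
EMPHASIS = "ˁ"  # the emphasis mark used in the reconstructions above
LOW_PRESYLLABLE_VOWEL = "ʌ"


def stage2(form: str) -> str:
    """Add emphasis to the main syllable if the presyllable vowel is low."""
    star = "*" if form.startswith("*") else ""
    body = form[len(star):]
    if "-" not in body:
        return form  # no presyllable: outside the scope of this sketch
    presyllable, main = body.split("-", 1)
    if presyllable.endswith(LOW_PRESYLLABLE_VOWEL):
        # Low-voweled *Cʌ- conditions emphasis on the following initial.
        main = main[0] + EMPHASIS + main[1:]
    # High-voweled *Cɯ- blocks emphasis, so such forms pass through unchanged.
    return star + presyllable + "-" + main


print(stage2("*sʌ-lep"))
print(stage2("*sɯ-lap"))
```

The Stage 3 split between reduced and dropped presyllables is not predictable from the forms alone, so it is left out of the sketch.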

But note 蝴蝶 *galep 'butterfly' in which *sɯ- or even *s- cannot be reconstructed.

WELCOMING THIS WIND (PART 2)

How did a Chinese character 這 'to welcome' which should have been read as nghiện come to be read as giá (and hence qualify as a spelling of the native Vietnamese word gió 'wind')? I'll embed my answer in a longer discussion of the words written with 這 below.

Wiktionary regards 這 as

part of the 迎 (OC [= Old Chinese] *ŋaŋ, *ŋraŋs, “to face, to meet”) word family

and cites Zhengzhang's OC reconstruction *ŋrans.

I cannot immediately reject all that. Nonetheless, I am skeptical.

First, the earliest attestation of the word I can find is in an entry in the dictionary 玉篇 Yupian compiled in the 6th century AD: i.e., during the Middle Chinese (MC) period. Is there evidence for the word in Old Chinese, or was MC *ŋɨenʰ mechanically projected back into OC as *ŋrans? The word is not in Schuessler's 1987 dictionary of early Zhou Chinese. There is a common, unspoken, and dangerous assumption that almost any native Chinese word can be traced back to early Old Chinese. (I was going to say that obvious loanwords like 佛 Middle Chinese *but 'Buddha' are thankfully exempt and that no one would reconstruct an Old Chinese 'reading' of 佛, but Zhengzhang's site has such a reconstruction: *bɯd!)

Second, the 迎 word family has two types of forms: open syllables and velar-final syllables. (I disregard *-ʔ and *-s* which may be suffixes.) In the past I have proposed that *-a was from an earlier syllabic *-ŋ, the 'zero grade' of *-aŋ. I also proposed that *-a could be the zero grade of *-an. Below I provide examples with Sanskrit parallels (citing Sanskrit zero ~ -m alternations in lieu of Sanskrit zero ~ -ṅ [ŋ̍] alternations which don't exist since Proto-Indo-European had no *ŋ̍).

Old Chinese zero grade ~ full grade, with Sanskrit zero grade ~ full grade parallels:

*wŋ̍ 'to go' ~ *waŋ-ʔ 'to go'; cf. ga-tá- < *gʷm̩-tó- 'gone' ~ gám-a-ti < *gʷóm-e-ti 'goes' (Vedic)
*ŋn̩-ʔ 'to talk' ~ an 'speech'; cf. ha-tá- < *gʷʰn̩-tó- 'slain' ~ hánti < *gʷʰénti 'slays'

My proposal explains why these word families don't seem to have forms with a mixture of final consonants: e.g., the 迎 *√ŋ-ŋ word family does not contain words with *-t, *-p, *-m, *-j, *-r, *-w, etc. The few *-k forms could reflect a lost denasalizing suffix.

If 這 belonged to that family, it would be the sole member with *-n.

My proposal has a number of problems: e.g., no support from the rest of Sino-Tibetan and no explanation for when zero grade occurs. (In the Sanskrit past participles above, one can see that unaccented roots take zero grade.)

In any case, the fact remains that -n words are anomalous in a zero ~ velar-final series, and that fact should be explained somehow - even if the zero-grade hypothesis is wrong.

It seems that at some point in the late first millennium AD, 這 came to be used to write an unrelated, nonhomophonous word 'this' (now zhè in Mandarin). The earliest attestation of 這 for 'this' that I can find is in the Jiu Tang shu 'Old Book of Tang' (945). How did that happen?

Here's what I've pieced together from Wiktionary (which should cite its sources) with my caveats.

The word 'this' was once written as 者 and was

[d]erived from 者 (OC *tjaːʔ, “one which”), around the Tang Dynasty.

者 (OC *tjaːʔ, “one which”) > 者 (MC t͡ɕiaX, “this (possessive case)”) > 者 (MC t͡ɕiaX, “this (general demonstrative)”) > Mandarin 這 (zhè).

There are three problems with that etymology:

1. 者 'one which', unlike 'this', does not precede nouns.

but perhaps X 者 Y 'one which X Y' was reinterpreted as 'X this Y', followed by X before 者 becoming unnecessary?

2. 者 'this' has no 'possessive case' - no word in Chinese does.

That is really an analytical and terminological error that doesn't affect the validity of deriving 'this' from 'one which'.

3. 者 'one which'/'this' had a 'rising tone' in MC but 這 'this' has a 'departing tone'.

Was 者 'one which' used to write an unrelated homophone 'this'? Is the 'departing tone' of 這 'this' due to a sandhi tone (a 'departing'-like allophone of the 'rising tone'?) reinterpreted as the default tone since 'this' must always precede something (i.e., is in a sandhi context)?

There was also a word for 'this' with a 'level tone' written phonetically with a character 遮 for 'block off'. Could the 'departing tone' of 這 'this' be the etymological and colloquial tone while 'level' and 'rising tone' readings were artificial spelling pronunciations based on 遮 for 'block off' and 者 'one which'?

Wiktionary then says there was a "confusion in medieval handwriting" between 遮 'this' and 這 'to welcome' which led to 這 becoming the dominant spelling for 'this'.

Although Sino-Vietnamese readings almost entirely reflect Chinese as it was spoken during the end of the third Chinese domination of Vietnam (602-938; i.e., right before the early attestation of 這 for 'this' in Jiu Tang shu), I briefly thought giá 'this' might be from the fourth Chinese domination (1407-1427). By that time 這 was firmly in place for 'this' in Chinese, and I could see the word entering Vietnamese via the Ming occupation. The trouble is that 這 would have been something like [tʂjɛ] in Ming Chinese* which would have been borrowed into Vietnamese as *tré with a retroflex initial and a mid vowel, not giá with a palatal initial and low vowel. So I think giá is from the last days of the third Chinese domination before *a rose to a mid vowel in Chinese.

*2.14.6:28: I don't have any Ming materials on hand, so I am projecting the Yuan dynasty Phags-pa reading ꡆꡦ <jee> (interpreted by Coblin 2007: 171 as [tʂjɛ]; needless to say, the script should be rotated 90 degrees clockwise) of 這 forward into the Ming dynasty.

The Ming reading of 者 (homophonous with 這 except for its tone) was used to transcribe Jurchen je /tšə/: e.g., 兀者 *[utʂjɛ] for uje 'heavy' (#67 in the Bureau of Interpreters' Sino-Jurchen vocabulary, Kane 1989: 49).

According to Coblin (2003: 349), Robert Morrison romanized the early 19th century Mandarin rhyme of 這 and 者 as -ay as in May, possibly [e] in his dialect of English. (The Mandarin rhyme is a back [ɤ] in the modern standard.)

Of course all those different varieties of northern Chinese were probably not in a linear relationship across half a millennium, but they all have a nonlow vowel in common unlike giá whose low vowel is characteristic of pre-second millennium pronunciation.

Moreover I don't know if the Vietnamese would have perceived the 'departing tone' of the Ming occupiers as a sắc tone which was the standard equivalent of that tone in late first millennium borrowings. Perhaps my hypothetical *tré would have had a different tone.

WELCOMING THIS WIND (PART 1)

nomfoundation.org listed fourteen spellings of Vietnamese gió 'wind' in the traditional Chữ Nôm script. In the previous post, I already listed twelve spellings containing the phonetic 俞 du 'to consent' (or phonetics containing that phonetic: 逾 du 'to pass' and 愈 dũ 'more'). The remaining two spellings lack 俞 and could be said to belong to a Group D of miscellaneous spellings (or groups D and E with one character each):

13. 這 (see the entry for gió in Anthony Trần Văn Kiệm's Giúp đọc Nôm và Hán Việt 'Aid for Reading Nôm and Sino-Vietnamese')

14. 𩖅

I will discuss 14 later.

13 這 represented Middle Chinese *ŋɨenʰ 'to welcome'; its phonetic is 言 ngôn 'speech' atop 辶, the semantic element for motion. Sino-Vietnamese is almost entirely based on southern Late Middle Chinese, so the Sino-Vietnamese reading of 這 should have been *nghiện which is the SV reading of 這's Middle Chinese homophones such as 唁 'to offer condolences' and 彥 'handsome man'.

But the actual Sino-Vietnamese reading of 這 is giá which is obviously not far from gió 'wind'. It's understandable why gió 'wind' would have been written as 這 giá: the initial consonant gi- and the sắc tone (represented by an acute accent) match even though the vowels don't. (At least lower mid o [ɔ] is just one step up above low a.) It's less understandable how a character that looks like it should have been read something like 言 ngôn came to be read as an open syllable giá without an initial nasal. However, I think I figured out what happened, and I'll post my solution in part 2. The title of this two-part microseries hints at the answer.
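The initial-and-tone comparison can be sketched in Python by decomposing each syllable and looking for its tone mark. This is a minimal sketch that relies on standard Unicode decomposition of Vietnamese letters; the function names `tone` and `initial_and_tone_match` are hypothetical.

```python
import unicodedata

# Combining marks that encode Vietnamese tones in decomposed (NFD) text.
TONE_MARKS = {
    "\u0300": "huyền",  # grave accent
    "\u0301": "sắc",    # acute accent
    "\u0303": "ngã",    # tilde
    "\u0309": "hỏi",    # hook above
    "\u0323": "nặng",   # dot below
}


def tone(syllable: str) -> str:
    """Return the tone name of a Vietnamese syllable."""
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return "ngang"  # no tone mark


def initial_and_tone_match(a: str, b: str, initial: str) -> bool:
    """True if both syllables share the given initial and the same tone."""
    return a.startswith(initial) and b.startswith(initial) and tone(a) == tone(b)


print(tone("gió"), tone("giá"))
print(initial_and_tone_match("gió", "giá", "gi"))
print(initial_and_tone_match("gió", "nghiện", "gi"))
```

The expected reading nghiện fails on both counts, which is why only giá could have served as a phonetic spelling of gió.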

2.13.0:11: Giles' Chinese-English Dictionary (1892 I: 48) lists the Sino-Vietnamese readings of 這 as the expected nghiện (converted from its notation) as well as gia (no tone indicated).

CHÂN GIÒ NƯỚNG IN CHỮ NÔM

Last night I went to a Vietnamese restaurant in search of chân giò nướng 'grilled trotters'. They weren't on the menu, but I did try to look up how that dish would have been written in the traditional Chữ Nôm script. My guess is something like


The first and third characters are straightforward made-in-Vietnam semantophonetic compounds:

chân 'foot, leg' = 足 'foot' + 眞 chân 'true'

𤓢 nướng 'to grill' = 火 'fire' + 曩 nãng 'formerly'

Although 娘 nương and 孃 nương are better phonetic matches for nướng 'to grill', both already have a left-hand element 女, and it would be awkward to place another left-hand element 火 'fire' next to it. (I suppose placing 娘 or 孃 atop the bottom version 灬 of 'fire' would have been possible, but I haven't seen any made-in-Vietnam characters with 灬.) Stripping them of 女 and replacing that element with 火 'fire' would result in

烺 which already exists and is read lãng 'bright' with l- (not n-)

爙 which already exists and is read nhưỡng 'fiery appearance; Mars' (rare) with nh- [ɲ] (not n-)

whereas 曩 nãng does have n-.

One might conclude that matching initials were a high priority when selecting phonetic components of Chữ Nôm characters. But the second character of the dish I wanted does not have a phonetic with a matching initial:

𨃝 giò 'leg of an animal' = 足 'foot' + 徒 đồ 'disciple'

The trouble is that I don't think there is any Chinese character whose Sino-Vietnamese reading combines gi- with a rounded vowel. Although the only part of 徒 đồ that precisely matches giò 'leg of an animal' is the tone, everything else is close enough:

đ- [ɗ] is not gi- [z] (northern) ~ [j] (southern) < *ʑ, but at least it's neither labial nor velar; it's in the middle zone with gi-

ô [o] is back mid rounded like o [ɔ]

I just realized giò 'leg of an animal' could in theory have been written with 由 do (d- [z] (northern) ~ [j] (southern) < *j) as a phonetic. Cf. how gió 'wind' with a different tone was written with a d-phonetic:

Group A with phonetic 俞 du 'to consent'

1. with a Vietnamese abbreviation of 風 'wind' on top: ⿱風俞 (U+2CC82)

2. with a Vietnamese abbreviation of 風 'wind' on the right: 𫖾

3. with 雨 'rain' (symbolizing weather phenomena) on top: ⿱雨俞 (U+2CC05)

Group B with phonetic 逾 du 'to pass'

4. as 逾 without modification

5. with 風 'wind' on top: 𩙋

6. with a Vietnamese abbreviation of 風 'wind' (not in Unicode) on top: 𩙌

7. with 雨 'rain' (symbolizing weather phenomena) on top: 𫕲

Group C with phonetic 愈 dũ 'more'

8. with 風 'wind' on the left: 𩙍

9. with a Vietnamese abbreviation of 風 'wind' on the left: 𫗃

10. with 月 'moon/meat' as a substitute for the Vietnamese abbreviation of 風 'wind' on the left: ⿰月愈 (not in Unicode)

11. with a Vietnamese abbreviation of 風 'wind' (not in Unicode) on the right: ⿰(𠘨+二)愈

12. with a Vietnamese abbreviation of 風 'wind' (not in Unicode) on top: 𫗄
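Several entries in the list are written as component descriptions such as ⿱風俞 rather than as single characters. A minimal Python sketch of how such a simple description can be read mechanically follows. It handles only the two layout operators that appear above and only three-character sequences; the name `describe` is hypothetical.

```python
# The two Ideographic Description Characters used in the list above.
IDC_LAYOUT = {
    "\u2ff0": "left to right",   # ⿰
    "\u2ff1": "above to below",  # ⿱
}


def describe(ids: str) -> tuple:
    """Split a simple description into (layout, first part, second part)."""
    if len(ids) != 3 or ids[0] not in IDC_LAYOUT:
        raise ValueError(f"not a simple two-component description: {ids!r}")
    return IDC_LAYOUT[ids[0]], ids[1], ids[2]


print(describe("\u2ff1\u98a8\u4fde"))  # 'wind' above the phonetic
print(describe("\u2ff0\u6708\u6108"))  # 'moon/meat' to the left of the phonetic
```

An ad hoc description with a parenthesized component, like the one in item 11, is deliberately rejected with a ValueError rather than guessed at.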

2.12.2:10: Added all d-phonetic characters for gió 'wind' (I couldn't stop at one).

Tangut Yinchuan font copyright © Prof. 景永时 Jing Yongshi
Tangut character image fonts by Mojikyo.org
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2018 Amritavision