Amaravati: Abode of Amritas

20.1.25.23:55: KHITAN NEW YEAR: THE 1100TH ANNIVERSARY OF THE KHITAN LARGE SCRIPT

? qulugh ai nai sair ? nyair

'white rat year, head month, one day'

This year the Khitan large script turns 1100, so I'll be using the Khitan large script for dates for the rest of the year.

The last three characters of the date are shared with Chinese (月一日), but the other four bear no resemblance to Chinese 白鼠年首 <WHITE RAT YEAR HEAD>. (Other Chinese equivalents exist, but they don't look like the Khitan large script characters either: e.g., 子 <CALENDRICAL.RAT>, 頭 <HEAD>, etc.) Why? As Janhunen (1994: 111) asked,

If it was the aim to create a [Khitan] script distinct from the Chinese, why were not all [Khitan large script] characters consistently replaced or modified?

I agree with Janhunen that the Khitan large script was not 'invented' in 920; it is an outgrowth of some earlier script, perhaps the fragmentarily attested Parhae script or the wholly lost Serbi script. And that script was a sister of the Chinese script with innovations absent from Chinese: <WHITE>, etc. I think the date 920 may refer to the revision of an earlier script for Khitan use.

The date itself may not be accurate, as the earliest dated Khitan script text is the epitaph for 耶律延寧 Yelü Yanning from 986. Are there Khitan large script texts from 920-985 that have not yet been discovered, or was the script 'created' after 920?

Notes on the characters/words:

<WHITE>: Reading unknown.
<RAT>: See "Happy New Year 2020" for an explanation of the reading.
<HEAD>: A pictograph of a headdress?
<ONE>: Shimunek (2017: 234) reads this as mas, but does not explain his reasoning.

To tie up loose ends from last year, I've posted all the blog entries from between 1.2 and today:

20.1.24.23:59: YELLOW PIG 12/30

songgiyan uliya aniya

juwa juwe biya orin gusin inenggi

'yellow pig year, ten two month, thirty day'

1. Today I clicked on Andrew Hsiu's map of the Qiangic linguistic area. Tangut is to the north of it; the former Tangut capital is now known by its modern Mandarin name Yinchuan.

I care about the Qiangic linguistic area because the languages in it are Tangut-like. Whether they actually subgroup with Tangut is another matter. Jacques (2014: 2) thinks most of them do, so he places them in a 'Macro-rGyalrongic group'. The exceptions are:

Ersuic (Jacques [2004] regards this as a sister of Macro-rGyalrongic)
Naic (Jacques [2004] regards this as a sister of Macro-rGyalrongic)
Baima (no close relation to Bai or Macro-rGyalrongic)
Bai (no close relation to Baima or Macro-rGyalrongic)

2. Hsiu posits 'missing' Sino-Tibetan branches to serve as sources of Sino-Tibetan-like vocabulary in

Austroasiatic

"Contrary to Starosta's (2005) proposal that Sino-Tibetan, Austroasiatic, and Austronesian are all related (the 'Altaic of southern China,' so to speak), I consider similarities between these language families to be due to intensive contact in southern China during the Middle Neolithic. [...] Could Proto-Austroasiatic have started out as a Middle or Late Neolithic creole in Lingnan that mixed elements of Sino-Tibetan and Austronesian with a non-related native substratum? The creoloid typology of Austroasiatic, especially in its grammar, seems to suggest so, in addition to the lexical evidence."

Hmong-Mien

"Proto-Hmong-Mien *dzjuŋH 'seven' is shared by [the Sino-Tibetan branches] Bodish, Idu-Taraon, Miju, and Meyor." [Links mine.]

Is that the only word from a lost branch in Hmong-Mien? I think it's just a random vague lookalike of Tibetan bdun, Miju nɯn, and Hsiu's Proto-Idu-Taraon *ɣoŋ < *roŋ, which may not be related to each other. STEDT lists weŋ (cognate to the Idu-Taraon word?) for Meyor.
Roger Blench and my former student Mark Post (2011) regard Miju as an isolate, Idu-Taraon (their 'Mishmic') as a non-Sino-Tibetan language family, and Meyor as an East Bodish language.

Kra-Dai

Jiamao

Hsiu considers Jiamao to be an independent branch of Kra-Dai, but usually Jiamao is regarded as an aberrant member of the Hlai branch, and Norquest (2016) regards Jiamao as an isolate heavily influenced by Hlai. I find Norquest's arguments persuasive at first glance.

To solidify the case for these branches, one would have to

demonstrate the resemblances are not by chance
demonstrate that the words could not be from known branches because of

geography
features (innovations/retentions) absent from other potential donors

3. When hearing the word petrol /ˈpɛtɹəl/ out loud on Magnum, P.I. tonight I finally realized it's short for petroleum /pəˈtɹoʊliəm/. Duh. That isn't the first time it took me a long time to link two words whose relationship is obvious in spelling but not in pronunciation. I wish I could remember the last time that happened. I think it was sometime within the past few months.

Someone learning English as a foreign language and first encountering those words in print would immediately link them and face the different problem of pronouncing them differently: petrol is not /pəˈtɹoʊl/, and petroleum is not /ˈpɛtɹəliəm/.

4. Tonight - three days after I started reading William C. Hannas' The Writing on the Wall: How Asian Orthography Curbs Creativity (2003) - it finally occurred to me that literate Khitan would be interesting test subjects for his ideas about the effects of writing systems on thinking.

The Khitan had two scripts, and nobody really knows why. Andrew West's great essay on the mystery ends,

Both scripts are complex enough to require a considerable investment of time and effort to learn to read and write, so how is it possible that both scripts managed to coexist and flourish for so long ? Did the Khitan education system require students to learn both scripts, or were Khitan scholars only able to read and write one or other of the two scripts ? It makes no sense to me ...

... or me.

Let's imagine that Hannas could be sent back a thousand years to the Khitan Empire. Using his knowledge of Chinese, Japanese, and Korean, Hannas would be able to easily learn Khitan, an 'Altaic'-type language with many Chinese loanwords like Japanese and Korean. Hannas proposes that syllabic scripts without word division inhibit creativity. So in his framework, what effects would the Khitan scripts have?

A brief comparison:

Khitan script	syllabic?	alphabetic?	word division?
large	not quite	no	no
small	no	not quite	mostly

The large script, despite its superficial similarity to the Chinese script, does not have a one-to-one correspondence between syllables and characters. Some syllables are written as two-character sequences: e.g., Han (the Chinese name 韓) as 何至 <ha.an>. Conversely, some disyllabic words are written as single characters: e.g., namur 'autumn' as 禾 (cf. Chinese 秋 <AUTUMN>).

The small script has a mixture of characters for single segments and syllables. The small script is more analytic than the large script which in turn is more analytic than the Chinese script:

small script > large script > Chinese script

And unlike either the large script or Chinese script, words are generally written as blocks - the first instance of word division in East Asia. The only exceptions to that rule are Chinese loanwords which are written as one syllable per block (not counting Khitan affixes added to those blocks): e.g., the disyllabic word hongdi 'emperor' from Liao Chinese 皇帝 *hongdi [xɔŋti] is written as two blocks

<075 037> <hong di>

rather than as a single block

<075.037> <hong.di>.

So if Hannas is right, small script users might be more inclined toward creativity than large script users, who in turn would be more inclined toward creativity than those only literate in the Chinese script.

20.1.23.23:51: YELLOW PIG 12/29

songgiyan uliya aniya

juwa juwe biya orin uyewun inenggi

'yellow pig year, ten two month, twenty nine day'

1. Last night I learned from the Korean Wikipedia that the eight trigrams have 二進法 ijinbŏp 'binary' equivalents.

2. I have no idea if this is a true explanation for the presence of 隹 <BIRD> in 進 <WALK.BIRD> for 'forward', but it's a useful memory aid:

A bird can only walk forward but not backwards, hence implying "forward".

3. How did Proto-Germanic *hw- become tsj- in West Frisian tsjil 'wheel'?

4. Wikipedia's discussion of the possible Indo-European origin of the Chinese chariot is a bit anachronistic:

However archeological evidence shows that small scale use of the chariot [in China] began around 1200 BCE in the late Shang dynasty. This corroborates the material spread of the invention from the Eurasian Grass-Steppe to the West, by Proto-Indo-Europeans (likely the Tocharians) who similarly have borne horse, agricultural, and honey making technologies through the Tarim Basin into China.

Proto-Indo-European speakers and Tocharians are not the same people. Proto-Indo-European had ceased to exist centuries before eastern Indo-European speakers might have introduced the chariot to China.

5. Today I found a Wiktionary entry for

རོ་མཱན་གྱི་ལྷ་གཙོ་བོ་

<ro.mān.gyi.lha.gtso.bo.> = 'Roman-GEN god principal' = 'principal god of the Romans' = 'Jupiter'

Is that a real Dzongkha expression? It looks like a nonce attempt to explain who Jupiter was rather than a name for Jupiter. I appreciate how Wiktionary contains entries for items absent from traditional dictionaries, but I draw the line at transparent phrases. And a Google search for that particular phrase only leads to that Wiktionary entry. (I'm not counting partial matches.)

It is strange that a Dzongkha description of Jupiter has an entry but that Tibetan ཕུ་བོ <phu.bo> 'older brother' does not.

Oddly STEDT doesn't have that Tibetan word either.

6. Today while copying the 契丹小字研究 Qidan xiaozi yanjiu (Research on the Khitan Small Script) hand copy of the epitaph for Empress 仁懿 Renyi (?-1076) of the Khitan Empire, I encountered the first Khitan small script block I've ever seen with three components in a row:

<244.172.339> <s.ugh.i> (12.1)

The index of blocks has a more conventional two-on-one form:

<244.172/339> <s.ugh/i>

(I use </> to indicate row breaks within a block.)

Which form is the one on the inscription?

I don't know what the word means. It was a hapax legomenon as of 1985. Have more attestations been found since?
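The block notation used in this entry (a space between blocks, <.> between components in a row, </> for a row break) can be sketched as a small parser. This is only a minimal illustration of the notation as I use it here; the function name parse_blocks is hypothetical, and the component numbers are treated as opaque strings.

```python
def parse_blocks(notation):
    """Parse a transcription like '<244.172/339>' into nested lists.

    Result: list of blocks -> list of rows -> list of component IDs.
    Assumes one pair of angle brackets around the whole transcription.
    """
    inner = notation.strip().strip("<>")
    blocks = []
    for block in inner.split():            # whitespace separates blocks
        rows = [row.split(".") for row in block.split("/")]  # "/" = row break
        blocks.append(rows)
    return blocks


# Three components in a single row (the hand copy of the epitaph).
print(parse_blocks("<244.172.339>"))
# Two components over one (the form in the index of blocks).
print(parse_blocks("<244.172/339>"))
# A Chinese loanword written as two one-syllable blocks.
print(parse_blocks("<075 037>"))
```

The two competing forms of the block then differ only in the number of rows, which makes the question of which one is on the inscription easy to state: one row of three or two rows of two and one.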

7. Today I've been puzzled by the Sino-Tibetan word for 'horn':

Old Chinese 角 *C[k]rok 'horn', 'corner'

*[k] is in brackets because Baxter and Sagart (2014) could not determine on the basis of Chinese-internal evidence whether the stop was velar or uvular. rGyalrong forms like Tshobdun tə˥qrɯ˧ 'horn' point to *-q-.

Written Tibetan ru 'horn', gru 'corner'

did a prefix *k- voice to g- before r-?

Old Burmese khruiv· > Written Burmese khyui [dʑo] 'horn'

khr- is from *kr-

pre-Tangut *KAkru > 𗅡 3517 1khwy2 'horn'

Li Fanwen (2008: 329) says 𗅡 1981 is an 訛體 erroneous form of 𗅡 3517, but I can't see any difference between the two in his font. Both have the same Unicode codepoint (U+17161).
see below for my reasoning for *K-
the *-A- and *-r- are needed to condition Grade II (indicated by final -2 in my transcription) in a higher vowel (*u) syllable

I wish I knew the Pyu word for 'horn'.

Nathan reconstructs *əw for this correspondence:

OC *o : WT u : OB uiv·

I think OB uiv· was [əw], a direct preservation of Sino-Tibetan *əw that became modern ui [o] via *ow.

I reconstruct a root *rəw. That much seems certain. The rest, however ...

The *C- in Old Chinese 'horn' is carried over from Baxter and Sagart (2014). Why did they reconstruct it?

I know of no phonetic series evidence or Chinese-internal comparative evidence for *C-: e.g., 建陽 Jianyang Min has k- instead of an h- resulting from intervocalic lenition (cf. 狗 Jianyang hou˨˩ < *CAkoʔ 'dog').

What is the function of the *-k in Old Chinese?
What is the function of the g- in Written Tibetan?
What is the function of the *k- in pre-Burmese?
Why is the initial of modern spoken Burmese 'horn' voiced [dʑ] instead of the regular reflex of khr-: voiceless aspirated [tɕʰ]?
Why does Tangut have an aspirated initial? I think the aspiration may be a trace of an earlier aspirating prefix: *KAk- > *xk- > kh-.

I think pre-Tangut *KA- is the same prefix that I can't explain in Old Chinese (*C-), Written Tibetan (g-), and pre-Burmese (*k-).
On the other hand, rGyalrong forms often have minor syllables with t-: e.g., Tshobdun tə˥qrɯ˧.

8. Nathan Hill (2019: 227) thinks dr- in Tibetan drug 'six' is from *kr- (cf. Old Burmese khrok· < *krəwk 'six'). *kr- > dr- would be a double assimilation in terms of place and voicing.

But Pyu has tr- (tru 'six') and Tangut chh- in 𗤁 3200 1chhiw3 'six' may be from *Ktr- (cf. rGyalrong kətr-forms like lCogtse kətɽok; Jacques [2004: 296] reconstructed Proto-rGyalrong *kə-tɽɔk). Moreover, *kr- became khr- (Hill 2019: 221) in Tibetan khrab 'armor', so why would it become dr- in 'six'?

Might the Tibetan, Burmese, and Pyu initials all be simplifications of an earlier complex cluster *ktr-?

9. Nathan Hill (2019: 229) proposes that Written Burmese kuiy· [ko] 'body' may be a borrowing from Pali kāya- 'body' rather than a Sino-Tibetan word cognate to Tibetan sku and Old Chinese 軀 *CIkʰo (*HIko with a minor syllable initial conditioning aspiration?). But Luce (1981) lists kuiv· [kəw] as the Old Burmese spelling. Perhaps the Pali-like silent -y· in the modern spelling was an addition motivated by folk etymology. However, regarding kuiv· as native raises another unresolved issue: k- should be from *g- which doesn't match the voiceless stops in Tibetan and Chinese.

10. Burmese has [tɕ tɕʰ dʑ] but [ʃ] (not [ɕ]). What is the reason for this asymmetry?

11. The rGyalrongic Languages Database has two varieties called "Pho sul" in nearby locations: 蒲西 Puxi (a Mandarinization of Pho sul?) and a village called 斯遥吾 Siyaowu in 蒲西 Puxi. Wikipedia says there are "Phosul" varieties of both Khroskyabs and Horpa. Is there one Phosul language that has been classified two different ways or are there two Phosul languages? Jackson Sun (2000: 214) explains:

Puxi is one of the three townships in southern Rangtang County in which Shangzhai [Horpa] speakers dwell [...] Of the five villages within Puxi Township, Shangzhai is used in Dayili Village and those hamlets of Puxi and Xiaoyili Villages north of the Rangtang River, abutting Lavrung [Khroskyabs]-speaking hamlets across the river in the same villages. The latter language is distributed in Siyaowu Village also [...]

If I understand that passage correctly, a variety of Shangzhai Horpa is spoken in Puxi Village, and a variety of Khroskyabs is spoken in Siyaowu Village.

Horpa and Khroskyabs have different words for 'sleep'. Let's compare the "Pho sul" words for 'person' from the rGyalrongic Languages Database with some data from Jackson Sun (2018: 4) (sortable version at Wikipedia):

Phosul (Puxi Village) vdzi
Phosul (Siyaowu Village) vʝu
Hbrongrdzong Khroskyabs vɟoʔ
Stau (a.k.a. Rtau; a Horpa language) vdzi

Puxi Village Phosul may be Horpa, as it has vdz- like Horpa languages, whereas Siyaowu Phosul may be Khroskyabs, as it has a palatal after v like Hbrongrdzong Khroskyabs.

The Khroskyabs and Horpa words for 'person' may be cognate to Tangut 𘓐 2541 2dzwo4 < *PIndzojH 'person'.

Jacques (2014: 206) only proposes pre-Tangut *-jok (= my *-I-ok) as a source of -jo (= my -o3 and -o4), but I wonder if pre-Tangut *-I-oj (equivalent to a nonexistent *-joj in Jacques' system) might be another source. Puxi Village Phosul and Stau -i seem like unlikely reflexes of an earlier *-ok.

12. Is Lai Yunfan's site the only website written in Wobzi?

20.1.22.23:59: YELLOW PIG 12/28

songgiyan uliya aniya

juwa juwe biya orin jakun inenggi

'yellow pig year, ten two month, twenty eight day'

1. Last night I got the copy of William C. Hannas' The Writing on the Wall: How Asian Orthography Curbs Creativity (2003) that I ordered on Yellow Pig 12/6. On Yellow Pig 12/1, I wrote my initial impressions based on a preview on my Kindle. I'm rereading the preview now. I'm not used to reading on paper anymore.

2. Last night I found Andrew Hsiu's Sino-Tibetan Branches Project for its Proto-rGyalrong reconstruction. Why does rGyalrong matter?

Proto-rGyalrong is an elegant marvel. It may be one of the most conservative reconstructable Sino-Tibetan meso-languages. It is clear that a reconstruction of Proto-Sino-Tibetan would definitely need to take Proto-rGyalrong into account, since Proto-Sino-Tibetan morphology, phonology, and lexicon would have looked very similar to those of Proto-rGyalrong. In order to understand how reflexes of highly eroded eastern Sino-Tibetan languages had gotten to where they are from Proto-Sino-Tibetan, it is crucial to consider Proto-rGyalrong.

Is rGyalrong the Sanskrit or Greek of Sino-Tibetan?

Hsiu's Proto-rGyalrong *k.tek 'one' is very much like my pre-Tangut *kVtek or *kAtik (formerly *kʌ-tek or *kʌ-tik in 2012 and *CV-tek in 2011).

The low series vowel of Tangut 1lew1 'one' is either original (< *e) or secondary (< *A-i): i.e., conditioned by a preceding *A.
The l- of 1lew1 'one' is from a *-t- that lenited intervocalically. The lost preceding vowel could have been an unknown low series vowel *A that conditioned the lowering of a following *i or it could have been a high or low series vowel *V that was lost after lenition but before presyllabic vowels conditioned the warping of main vowels.

3. Hsiu also has a page on Pyu. His 2018 Excel file incorporates data from my 2016 SEALS presentation on Pyu numerals. A paper on Pyu language history is on my to-do list.

4. I just found Hsiu's page illustrating his wave model of Sino-Tibetan. He places Pyu in his fourth wave, but I am hesitant to commit to such a detail.

5. I want to figure out where Pyu is in the comparative framework that Nathan Hill established in his landmark book The Historical Phonology of Tibetan, Burmese, and Chinese (2019).

Nathan wrote on p. 156,

Many features of [Old Chinese] loans into Vietic are not predictable on the basis of the Old Chinese source word in Baxter and Sagart's reconstruction; for example, Rục has at least -ə-, -à-, -a-, and -u- available as the vowel of the minor syllable (kəcáy 'paper', kàraŋ 'bright sunshine', kadɔːk 'nape of the neck', kumúa 'dance'), but these different vowels are not predictable on the basis of the Old Chinese forms (紙 tsyeX < *k.teʔ 'paper', 朗 langX < *k.rˤaŋʔ 'bright', 脰 duwH < *kə.dˤok-s 'neck', 舞 mjuX < *k.m(r)aʔ 'dance').

I first saw those comparisons six years ago, but it didn't occur to me until last night to compare Ruc minor syllable vowels with the minor syllable vowels that I would reconstruct for Old Chinese if I didn't know about Ruc:

sinograph	gloss	Early Old Chinese	Middle Old Chinese	Late Old Chinese	Middle Chinese	Ruc	height match?
紙	paper	*CIteʔ	*CItieʔ	*tɕieʔ	*tɕḭe	kəcáy	?
朗	bright	*raŋʔ	*raŋʔ	*laŋʔ	*la̰ŋ	kàraŋ	?
脰	neck	*CAdoks	*CAdoks	*doh	*do̤w	kadɔːk	yes
舞	dance	*CImaʔ	*CImɨaʔ	*mɨaʔ	*mṵo	kumúa	yes
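The "height match?" column can be sketched as a simple comparison of vowel height classes. This is only a toy model under my own assumptions: *I stands for any high vowel and *A for any low vowel, Ruc a/à count as low and u as high, and Ruc ə is left undetermined (hence "?"). The names OC_HEIGHT, RUC_HEIGHT, and height_match are hypothetical.

```python
import unicodedata

# Cover symbols in the reconstructions: *I = high vowel, *A = low vowel.
OC_HEIGHT = {"I": "high", "A": "low"}
# Assumed height classes for Ruc minor-syllable vowels; schwa is undetermined.
RUC_HEIGHT = {"a": "low", "à": "low", "u": "high", "ə": None}


def height_match(oc_vowel, ruc_vowel):
    """Return 'yes', 'no', or '?' for one row of the table."""
    ruc_vowel = unicodedata.normalize("NFC", ruc_vowel)
    oc = OC_HEIGHT.get(oc_vowel)      # None if no minor syllable is reconstructed
    ruc = RUC_HEIGHT.get(ruc_vowel)
    if oc is None or ruc is None:
        return "?"
    return "yes" if oc == ruc else "no"


# (gloss, Old Chinese minor-syllable vowel, Ruc minor-syllable vowel)
rows = [
    ("paper", "I", "ə"),
    ("bright", None, "à"),  # no minor syllable reconstructed without Ruc
    ("neck", "A", "a"),
    ("dance", "I", "u"),
]

for gloss, oc_vowel, ruc_vowel in rows:
    print(gloss, height_match(oc_vowel, ruc_vowel))
```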

Notes on each word:

'paper': I know of no Chinese-internal evidence for the identity of *C-. Baxter and Sagart reconstruct *k- on the basis of Ruc.

A high vowel *I is needed to account for the Middle Chinese vocalism and the palatalization of *t. I don't know whether *I was *[i], *[ɨ], or *[u]. Ruc ə would seem to rule out *[u]. I don't know if Ruc has i in minor syllables; if it doesn't, Ruc ə might correspond to a Chinese *[i] or *[ɨ].

'bright': Baxter and Sagart reconstruct *k- on the basis of Ruc. I am unaware of any Chinese-internal evidence for a minor syllable. Early and Middle Old Chinese *CACa-sequences and *Ca-sequences can have the same reflexes in Late Old Chinese, so it's possible that Late Old Chinese *laŋʔ is from an earlier, Ruc-like *kAraŋʔ.

I cannot explain why Ruc kàraŋ doesn't have an acute tone corresponding to Chinese *-ʔ. Cf. the tone/*-ʔ correspondences in 'paper' and 'dance'.

'neck': Lenition in 建陽 Jianyang lo was conditioned by the vowel of a lost presyllable:

*CVd- > *CVl- > l-

I know of no Chinese-internal evidence for the identity of *C-. Baxter and Sagart reconstruct *k- on the basis of Ruc. *V had to be low *A since high *I would have conditioned the palatalization of *d.
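The ordering of the two changes in *CVd- > *CVl- > l- matters: lenition has to apply while the presyllable vowel is still there. A minimal sketch with ordered rewrite rules, assuming ASCII cover symbols (C for the unknown consonant; A, I, V for the presyllable vowel) and modeling only the initial, not the rhyme, so the output is not the actual Jianyang form. RULES and derive are hypothetical names.

```python
import re

# Ordered rules: (label, regex pattern, replacement).
RULES = [
    # *d lenites to *l only after a presyllable vowel.
    ("intervocalic lenition", r"^(C[AIV])d", r"\1l"),
    # The presyllable is lost afterwards.
    ("presyllable loss", r"^C[AIV]", ""),
]


def derive(form):
    """Return the form at each stage, applying the rules in order."""
    stages = [form]
    for _label, pattern, replacement in RULES:
        stages.append(re.sub(pattern, replacement, stages[-1]))
    return stages


print(" > ".join(derive("CAdoks")))  # with a presyllable: d lenites
print(" > ".join(derive("doks")))    # without one: d is unaffected
```

Reversing the two rules would leave d- intact in both cases, which is why the lenited initial is evidence for the lost presyllable.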

'dance': I know of no Chinese-internal evidence for the identity of *C-. Baxter and Sagart reconstruct *k- on the basis of Ruc.

A high vowel *I is needed to account for the Middle Chinese vocalism. *CAmaʔ or *maʔ would have become Middle Chinese *mo̰, not *mṵo with a high vowel. Ruc enables me to identify *I as *u. I think Early and Middle Old Chinese had at least two kinds of high vowels in minor syllables: *i and *u. It is usually not possible to determine whether a minor syllable's high vowel was front or back, but this is a rare exception.

Another kind of rare exception involves *i before *a:

*CiCa > *Cia

*CuCa (and *CɨCa?) > *Cɨa

Contrast these two words for 'chariot' which are both written 車:

Early Old Chinese *tiqʰ(l)a > Late Old Chinese *tɕʰia > Mandarin chē

Baxter and Sagart (2014: 157) reconstruct *t.K- for cases of velars and uvulars palatalizing before nonfront vowels. But maybe such cases involved *CiK-.

Early Old Chinese *Cuq(l)a > Late Old Chinese *kɨa > Mandarin jū

Could *C- have been *t-?

See Baxter and Sagart (2014: 158) for the reasoning behind reconstructing a uvular.

The *-qʰ- ~ *-q- alternation is unexplained. If *Ci- were *ki-, perhaps *kik- > *xtɕ- > *tɕʰ-. *k-conditioned aspiration is reconstructed for Korean, and I have reconstructed it for Tangut as well.

There is no Chinese-internal evidence for a medial liquid, but if there was one, I think it would have to be *-l- which disappeared without a trace. On the other hand, Baxter and Sagart (2014) see *-r- as a possibility, but I think an *-r- would have conditioned retroflexion: *tiqʰr- would have become *tʂʰ- rather than *tɕʰ- in Late Old Chinese.

I recall that Pulleyblank thought this word might be a loan from Indo-European (cf. Proto-Indo-European *kʷékʷlos 'wheel', Tocharian B kokale 'cart, wagon', Sanskrit cakra- 'wheel'). But Baxter and Sagart's *t- doesn't match *kʷ-, though it might be the closest approximation of a foreign palatal *c- absent from Old Chinese.

Another possibility is that the Chinese forms were something like *kiqʰla and *kuqla, with *ki- and *ku- instead of *ti- and *Cu-. But why would the Chinese borrow a foreign labiovelar as a uvular if they already had labiovelar *kʷ in their own language?

Here are revised reconstructions incorporating features from Ruc:

sinograph	gloss	Early Old Chinese	Middle Old Chinese	Late Old Chinese	Ruc	height match?
紙	paper	*kIteʔ	*kItieʔ	*tɕieʔ	kəcáy	?
朗	bright	*kAraŋʔ	*kAraŋʔ	*laŋʔ	kàraŋ	yes
脰	neck	*kAdoks	*kAdoks	*doh	kadɔːk	yes
舞	dance	*kumaʔ	*kumɨaʔ	*muaʔ	kumúa	yes

The Ruc forms seem to have been borrowed between the Middle and Late Old Chinese stages. They have a mix of old and new features:

old (i.e., like Middle Old Chinese):

minor syllable retention
*r-retention
*-k-retention

new (i.e., like Late Old Chinese):

*t-palatalization after *I
*-e > -aj (southern breaking; a dialectal feature not in my generic Late Old Chinese reconstruction)
*-ɨa > *-ua after labials (a change I thought came later in Early Middle Chinese)

6. While using BabelMap to type Pho sul βjot 'eight' last night, I discovered the character Ꞵ (U+A7B4 LATIN CAPITAL LETTER BETA). What languages are written with it? eki.ee lists none.
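For anyone without BabelMap, a character's code point and Unicode name can be checked with Python's standard unicodedata module. A minimal sketch (the helper name describe is hypothetical; names for newer characters depend on the Unicode version bundled with the Python build):

```python
import unicodedata


def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


print(describe("\uA7B4"))  # the Latin capital beta
print(describe("β"))       # the Greek letter, for comparison
```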

7. Today I discovered that both my 2012 sketch of pre-Tangut and Sofronov's 2012 reconstruction of Tangut rhymes are online at orientalstudies.ru.

20.1.21.17:34: YELLOW PIG 12/27

songgiyan uliya aniya

juwa juwe biya orin nadan inenggi

'yellow pig year, ten two month, twenty seven day'

1. Thoughts today while typing and handwriting the Sino-Jurchen vocabulary of the Ming dynasty bureau of translators:

1a. Jin (1984: 159) identified the phonogram

<gai> [kaj]

as being derived from Chinese 可 when it was read *ka (*kʰa to be more precise). But if the Jurchen script was invented c. 1119, long after 可 came to be read *kʰo in northern Chinese, how would its creator(s) know of the old reading *kʰa? This archaism hints at older roots for the Jurchen script. In Late Old Chinese, 可 was read *kʰaiʔ. Perhaps the origin of <gai> goes back to a pre-Jurchen script in which 可 or a derivative was used to write [kaj]. The trouble is that the earliest (?) of the northern scripts, the lost Serbi script, is from the 5th century AD after *-ai shifted to *-a in Chinese.

1b. Why does the Jurchen phonogram

<giyau>

have what looks like

<BRUSH> pi (< graph and word from Chinese 筆)

on the right side? And what is the function of the element resembling Chinese 亻 <PERSON> on the left?

1c. Jurchen aliku 'platter' was miswritten as

<ali.in>

as if it were alin 'mountain'. Presumably the unknown correct spelling has two characters <ali.ku>. But which of these <ku> is the proper <ku>?

In theory the unknown character could even be a fifth <ku> that has not yet been discovered. There is no guarantee that all Jurchen large script characters have been found. (Almost none of the Jurchen small script characters have been found except for six in two blocks. Assuming the Jurchen small script had roughly the same number of characters as the Khitan small script, I presume there were a few hundred Jurchen small script characters.)

1d. The Jurchen phonogram

<me>

resembles Chinese 千 <THOUSAND>, so for a second I thought it might have originated as a graph for a me-something word for 'thousand' resembling Jurchen minggan 'thousand' in some language in Parhae. But then I thought it might be a simplification of the right side of Liao or Jin Chinese 脉 *mai.

2. Today I was reading about Mary Callahan Erdoes. How is Erdoes pronounced in American English? It looks like an Americanization of Hungarian Erdős [ɛrdøːʃ] (as in the Erdős number). I associate oe with German ö and not Hungarian ő, but I just learned that óe with an acute accent is a historical spelling of ő in names. Was the name spelled with an acute accent as Erdóes in Hungary? I only found a single Google result for Erdóes.

3. The late Paul Erdős

would offer payments for solutions to unresolved problems. These ranged from $25 for problems that he felt were just out of the reach of the current mathematical thinking (both his and others), to several thousand dollars for problems that were both difficult to attack and mathematically significant. There are thought to be at least a thousand remaining unsolved problems, though there is no official or comprehensive list. The offers remain active despite Erdős's death[.]

What would a list of unsolved linguistic problems be like, and how much would each problem be worth? Naturally I first think of Pyu and TJK (Tangut/Jurchen/Khitan), but other possibilities include the Voynich manuscript, Linear A, rongorongo, etc.

4. Erdős had his own personal vocabulary.

5. Timothy Gowers in a review of a book by Terence Tao:

It has been said that David Hilbert was the last person to know all of mathematics

Is it possible to 'know all of linguistics'? I vote no.

6. Speaking of knowing, Hilbert's epitaph is a response to ignoramus et ignorabimus 'we do not know and we shall not know':

Wir müssen wissen. 'We must know.'

Wir werden wissen. 'We will know.'

I wish I could say we will know how the TJK scripts work. I want to believe there is some reasoning that has eluded us. But what I want and believe is not necessarily what is real.

Hilbert was speaking of mathematics. Here's a quotation in a similar vein about decipherment:

Any possible system made by a man can be solved or cracked by a man.

- Yuri Knorozov, 1998

I didn't learn of him until the following year after his death. It's been over twenty years since I read Breaking the Maya Code, a gift from my Russian language professor Prof. James Brown. I should read the new edition I got a few years ago.

7. I remember the dark hour when Russian might have been eliminated from the University of Hawaii (despite Russia's Pacific presence!):

But James Brown, a professor of Russian, said just because a subject is not popular now does not mean it is not needed.

"There's something to be said about providing students with what they want, but it can become ludicrous to the point where you provide only that," Brown said. "You end up having just one flavor of things."

Obviously I wanted (and took) Russian.

Today Russian is still around at UH. But where is Prof. Brown?

8. In the final episode of Reba, the title character coined luffle (sp.?) from loving couple. What's interesting is the [f] in the middle: it's a fricative like [v] but voiceless like [p].

20.1.20.23:24: YELLOW PIG 12/26

songgiyan uliya aniya

juwa juwe biya orin ninggu inenggi

'yellow pig year, ten two month, twenty six day'

1. I've been playing 宇宙からのメッセージ・銀河大戦 Uchū kara no messēji: ginga taisen (Message from Space: Galactic Wars, 1978-79) in the background while working. A name in the ending credits caught my eye: 高梨曻 Takanashi ?. I had never seen the third character 曻 before and couldn't find it anywhere until today when I figured out that its radical according to Unicode was 曰 <SAY> rather than 日 <SUN> and was finally able to find it in Andrew West's BabelMap.

I guessed that 曻 was an alternate spelling of the common name 昇 Noboru 'rise', and Wiktionary confirms my guess. 曻 is a Japanese-only character with the same readings as 昇: Sino-Japanese shō and native Japanese noboru.

昇 is a semantic-phonetic compound <SUN.stəŋ>: the phonetic 升 (Old Chinese *stəŋ) is a drawing of a container (in Old Chinese, 'container' and 'to rise' were homophones both written as 升), and 日 <SUN> (something that rises) was added as a disambiguator.

The top element of 曻 should also be 日 <SUN>, but the character is in the 曰 <SAY> block of characters in Unicode, and I think that's a mistake.

The bottom element of 曻 is 舛 <OPPOSE> which sounded nothing like 升 in Old Chinese:

舛 *CI[tʰ]o[n]ʔ vs. 升 *stəŋ

the initial of 舛 could have been a *CIC- sequence that later simplified to *tʰ-

But 舛 and 升 are graphically similar, so in Japanese, 舛 (also with an optional 木 <WOOD> radical: 桝) came to be an alternative spelling for the native word masu (a unit of measurement) written 升. So 舛 came to replace 升 in 曻. And 舛 <OPPOSE> with its original meaning is so rare in Japanese that few would perceive any negative connotations in 曻.

Shpika stats (plus 漢検 Kanken levels added 1.21.19:01):

kanji	Aozora	news	Twitter	Wikipedia	Kanken
日	5	1	2	2	10
木	118	244	201	142	10
曰	1397	3535	2229	1753	1 (!)
昇	1578	618	1138	879	3
升	2095	2122	2482	2370	pre-2
桝	3393	2845	2713	2893	pre-1
舛	5491	1665	2667	2989	pre-1
曻	-	-	-	4698	-

曰 <SAY> in modern Japanese is almost wholly in the archaic expression 曰く <SAY.ku> iwaku 'sayeth'.

(1.21.19:03: 曰 is at the highest Kanken level, which makes no sense given its relative frequency and the fact that every high school student in Japan encounters it during the required study of Literary Chinese.)

(1.21.20:11: I would expect 曰 to be a level pre-1 character. Only characters required in school can be at levels 2 or lower. Pre-1 characters are relatively common but not required, whereas level 1 characters are rare. 曰 is encountered in school but is not on the must-learn jōyō kanji list.)

I'm surprised 桝 is more common than 舛 which I've encountered in the name 舛田 Masuda (a name I learned from 舛田利雄 Masuda Toshio on the staff of Space Battleship Yamato). I've never seen 桝 before. Wiktionary says some strange things: that 桝 is a postwar simplified form of 枡 (but 桝 has more strokes!) and means 'measuring box' in Chinese (even though I thought 舛 = 升 is a Japanese-only equation).

I just learned that 升 has a new modern reading: チート chīto 'cheat', based on the coincidental similarity of the katakana to 升. There is no graphic relationship between the katakana and 升:

チ <chi> is derived from 千 <THOUSAND>
ー (vowel length marker) is derived from the right side of 引 <PULL> (turned ninety degrees)
ト <to> is derived from 止 <STOP>

2. New words I encountered today:

2a. Redology (紅學 - not Erythrology?)

That is a real field of study, though Wiktionary's second definition of -ology is

(often humorous) added to an ordinary English word to create a name for a (possibly non-existent) field of study.

2b. logy, the sister of ism (with an unrelated homograph)

2c. pseudepigrapha (not pseudo- ... or ... -ia!)

2d. anapodoton

2e. anacoluthon

I've known of all of those things but didn't have names for them until now.

3. I did not, however, know of the Codex Amiatinus until today.

The Codex Amiatinus is the earliest surviving complete manuscript of the Latin Vulgate version of the Christian Bible.

Although it is named after the Italian mountain where it was found, it

was produced around 700 A.D in the north-east of England, at the Benedictine monastery of Monkwearmouth–Jarrow in the Anglo-Saxon Kingdom of Northumbria and taken to Italy as a gift for Pope Gregory II in 716.

More new words (in bold):

A little space is often left between words, but the writing is in general continuous. The text is divided into sections, which in the Gospels correspond closely to the Ammonian Sections. There are no marks of punctuation, but the skilled reader was guided into the sense by stichometric, or verse-like, arrangement into cola and commata, which correspond roughly to the principal and dependent clauses of a sentence.

Today, colons and commas have different referents (and regularized English plurals).

4. I never heard of the acronym TRO until today.

20.1.19.23:29: YELLOW PIG 12/25

songgiyan uliya aniya

juwa juwe biya orin shunja inenggi

'yellow pig year, ten two month, twenty five day'

1. For years, I thought

<ca> (a transcription of Liao Chinese 察 *cha in line 1 of the 耶律昌允 Yelü Changyun epitaph [1062])

was unique to the Khitan large script, but today I learned that it looks like a Tang dynasty (i.e., pre-Khitan Empire) variant of the Chinese character 司. There is even a variant of the derivative character 詞 <SPEECH.司> with a <ca> lookalike on the right side. But ... 司 was pronounced *sï in Liao Chinese. Not very much like ca. So why is a lookalike of a variant of 司 a phonogram for ca? The odds of any Khitan large script character being pronounced approximately like its Chinese lookalike are low, though not zero, as there are some Khitan large script characters that have Liao Chinese-like readings (minus tones, of course): 太 tai (but also dai!), 天 tên, 水 shui, 吾 ngu, 之 cï, 皇帝 hongdi, 京 ging, 守 sheu, 王 ong, etc. But one must be on guard, because many other Khitan large script characters are false friends: e.g.,

上 is read ha (cf. Liao Chinese *shang)
仲 is read shang (cf. Liao Chinese *zhung)
五 is read tau (cf. Liao Chinese *ngu)
高 is read tau (cf. Liao Chinese *gau)
氷 is read mu (cf. Liao Chinese *bing 'ice')

was the Khitan word for 'ice' cognate to Written Mongol mösün?

2. Another variant of 詞 <SPEECH.司> is 𧥝 <SPEECH.𠃌> with the phonetic reduced to 𠃌. The current simplified character for standard Mandarin 詞 cí 'word' is 词, but that could be reduced even further to three strokes: ⿹𠃌讠.

3. Wiktionary regards 𠃌 as a component in the Korean phonogram 㔖 <ka.k> kak, but the bottom component is in fact the hangul letter ㄱ <k> which is never written with a hook. The top part is the Chinese character 加 <ADD> which is pronounced ka in Korean.

4. Peter Golden's An Introduction to the History of the Turkic Peoples (1992) brings up the ever-vexing problem of consistency in writing different languages in the Roman alphabet: e.g., <ł> represents both Polish [w] and Armenian ղ [ʁ]. Historically, ղ was a velar lateral [ɫ] (which is what I think Tangut /l/ was).

Classical Armenian had a distinction between velar ղ /ɫ/ and 'regular' լ /l/ absent from Proto-Indo-European? How did that develop? Wikipedia's article on 'Proto-Armenian' (more like 'pre-Armenian'; cf. my 'pre-Tangut') doesn't have the answer yet. But it lists a few words with both liquids:

Velar /ɫ/-words:

ałaxin 'slave girl' from Hurrian al(l)a(e)ḫḫenne
pełem 'dig, excavate' from Urartian pile 'canal', Hurrian pilli

Wiktionary derives this from Proto-Indo-European *bel 'to dig', citing Sanskrit bila- 'hole, pit', but a regular Sanskrit reflex of *bel should have a < *e

ułt 'camel' from Hurrian uḷtu 'camel'

Wiktionary has a dot under the t, but Wikipedia's table of Hurrian consonants shows only one kind of /l/ and /t/
Wiktionary suggests that Proto-Indo-Iranic *uštras 'camel' could be connected to this word, but the l : š correspondence and -r- need to be explained

/l/-words:

salor ~ šlor 'plum' from Hurrian *sāll-orə or Urartian *šaluri (cf. Akkadian šallūru 'plum')
xarxarel 'to destroy' from Urartian harhar-š- 'to destroy'

was Urartian /š/ lateral [ɬ]?

I don't see any pattern that would enable me to predict when foreign l was borrowed as /ɫ/ or /l/ in Armenian.

5. I've never seen a French name written in Armenian before: Րեմի Վիրդա.

20.1.12-20.1.18

Tangut Yinchuan font copyright © Prof. 景永时 Jing Yongshi
Tangut character image fonts by Mojikyo.org
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2019 Amritavision