Archives

19.11.22.23:55: SINO-TIBETAN WORDS FOR 'TONGUE'

The native Cantonese word 脷 lei⁶ 'tongue' is reminiscent of l-words for 'tongue' found elsewhere in Sino-Tibetan: e.g.,

I wish I knew the Pyu word to complete the set of the 'big five' Sino-Tibetan literary languages, but Pyu basic vocabulary is all but unknown.

To keep things simple, I have not looked at other potentially related *l-words in Chinese, much less other Sino-Tibetan *l-words for 'tongue' or 'lick' available at STEDT.

Before one jumps to the conclusion that all of the above must share an *l-root, one should note Schuessler's (2007: 467) warning:

Initial *l- is a near-universal sound symbolic feature for 'lick / tongue', hence similar words in other languages are not likely to be related, such as MK-PVM [Mon-Khmer-Proto-Viet-Muong] *laːs 'tongue' [Ferlus]; Kam-Tai: S[iamese] liaA2 < *dl- 'to lick' [cf. ], PKS [Proto-Kam-Sui] *lja² ? [Thurgood].

Proto-Kra *l-maA 'tongue' (Ostapirat 2000: 223; cf. Proto-Kam-Sui *maA 'id.' [Peiros]), Proto-Hlai *liːnʔ 'id.' (Norquest 2016 appendix: 127), Proto-Tai *liːnC 'id.' (Pittayaporn 2009: 389), and Proto-Austronesian *lidam (on the basis of only Puyuma and Rukai; Blust and Trussel 2019) also fit the pattern. (A single Proto-Kra-Dai word for 'tongue' doesn't seem to be reconstructible.)

Continental 'Altaic' words for 'tongue' have noninitial l-: Ming Jurchen ilenggu ~ ilenggi, Written Mongolian kelen, and Turkish dil. (But peripheral 'Altaic' words don't: e.g., Korean hyŏ < *he and Japanese shita.)

European examples are English lick and Latin lingua 'tongue'. (The latter, of course, has an irregular l- < *d- which became the t- in tongue. Wiktionary derives the l- of lingua by analogy with lingō 'I lick', the true Latin cognate of lick. If we ignore that inconvenient fact, we could be daring and 'reconstruct' a 'Proto-World' *lV 'tongue/lick'. No.)

Schuessler was of course warning against linking Sino-Tibetan words to non-Sino-Tibetan words which happen to share the same initials, but lookalikes do also occur within families: e.g., lick and lingua. There could, at least in theory, be two unrelated lateral roots for 'tongue' in Sino-Tibetan.

Trying to reconcile the small set of Sino-Tibetan forms that I listed at the beginning runs into all sorts of difficulties:

Prelaterals (i.e., whatever comes before the L: prefixes or first syllables of disyllabic roots?): If Old Chinese *mI- and pre-Tangut *PI- are prefixes, what are their functions? Maybe the unknown pre-Tangut labial *P- was *m-. (The high vowel *-I- in both proto-forms is needed to account for the fronting of *a.) The labials in those prelaterals clash with Burling's Proto-Lolo-Burmese alveolar *s- and Hill's pre-Tibetan velar *ɣ-.

Laterals: Chinese and Proto-Lolo-Burmese have a voiced *l-, pre-Tangut has voiceless *l̥- (pre-Tangut *Sl- would correspond to Tangut l- + vowel tension), and pre-Tibetan has both voiced *-lʲ- and voiceless *-l̥ʲ- with palatalization that might be a trace of a preceding high vowel *-I-:

*Il̥- > *Il̥ʲ- > *ɣl̥ʲ-

*Il- > *Ilʲ- > *ɣlʲ-

Vowels: Three types are in the six words at the top:

Codas:

- the same three stops that might have preceded *-s in pre-Cantonese.

If one regards the various codas as suffixes, one should ideally be able to identify the functions of those suffixes. Affixation can be a dangerous pseudoexplanation for mismatching segments in forms under comparison.

This exercise shows how far we are from being able to reconstruct Proto-Sino-Tibetan. Much more work needs to be done on subgroups before the outlines of their common ancestor can emerge.


19.11.21.22:01: BACKLOGS AND RECOMMENDED READING: DMITRIEV, PHAN AND DE SOUSA, FERLUS, KING

I don't like interrupting series because I rarely get back to them - two examples being my Golden Guide posts (which I stopped almost five years ago!) and a series on Mon that I started in September but didn't post until today. (I've posted the Mon series on my front page above yesterday's post even though my other September posts have long since fallen off.)

The Mon series should make up for my dearth of original content today. I don't have time to say much about today's finds:

Phan's 2013 PhD dissertation set me straight six years ago, but it was good to see a short, clear demonstration of the differences between Ct and SV.

Is a slide about velar softening missing?

Pyu makes a cameo appearance on slide 3 (in which Chenla is further northwest than I'd expect)

the people we call today Vietnamese were even more recent arrivals in the Red River Delta as previous thought, probably arriving from the 10th century AD onward, and that the migration (or movement) of Viet-Muong people generally has been from south to north and not reverse. (p. 4)

But this claim of late arrival clashes with the fact that Vietnamese is full of layers of Chinese loanwords going perhaps as far back as the end of the Dong Son period. Those words were acquired during a millennium of Chinese rule in what is now northern Vietnam - a region that Schliesinger regards as a purely Tai area until the 10th century. Vietnamese could not have gotten all those loans via Tai because Vietnamese has far more Chinese loanwords of various ages than the Tai of northern Vietnam.


19.9.21.23:41: THE MON GRAPHONETIC GAP (PART 7)

(Posted 19.11.21.)

In parts 1-6, I've examined the 'graphonetic gap' between Nai Pan Hla's spelling and pronunciation in Mon.

Beginning with this part, I will look at Minegishi Makoto's tables in Nai Pan Hla (1988-89) comparing

Mon Rao and Mon Ro are "two major groups of [Mon] dialects spoken in Burma today" (Diffloth 1984: 41). Nai Pan Hla focuses on Mon Rao even though his native dialect is Mon Ro because Mon Rao is the dialect of "the majority" (1984: 14). In general Mon Rao dialects are further north than Mon Ro dialects; one exception is Kawkyaik, the dialect in Shorto's dictionary, which is directly east of the southeastern Mon Ro dialects.

It would be interesting to include the Mon dialects of Thailand, which "forms a group of dialects by itself" (Diffloth 1984: 42), in this survey, but for now I'm mostly going to stick with Minegishi's tables, which I'm going to break down into smaller tables, starting with one for <a>:

written coda
*voiceless initial
*voiced initial
NPH Rao
Shorto
Ro
NPH Rao Shorto
Ro
Ø
ɛ̤ʔe̤ʔ
<k·>, <ṅ·>
aC
ɛC
aC ɛ̤C
ɛ̤aC
ɛ̤C
<t·>, <n·>, <p·>, <m·>, <h·>, <ʔ·> ɔC
ɔ̤C
o̤C ɔ̤C
<v·> ɔ ɔ̤
ɔ̤
<y·> oa
oa ~ ɔa o̤a
o̤a ~ ɔ̤a

So far, Nai Pan Hla's nonnative Mon Rao is closer to his native Mon Ro than Shorto's Kawkyaik Mon Rao. Nai Pan Hla has lowered his native [e̤ʔ] to [ɛ̤ʔ] and favored [o̤a] over [ɔ̤a] when speaking Mon Rao.

The raising of *-aʔ after *voiced initials to [e̤ʔ] in Mon Rao reminds me of the shift of *-ak after *voiced initials to [eəʔ] in Khmer. Diffloth (1984: 155-156) reconstructs Proto-Mon *-ɛ̤ə̯ʔ with a Khmer-like diphthong. Thai Mon dialects in Diffloth (1984) still have diphthongs:

Shorto's [ɛC] reminds me of Burmese [ɛʔ] < *-ak. (But Burmese *-aŋ became [ɪ̃], not [ɛ̃]!)

Shorto's [ɛ̤aC] reminds me of Khmer [eəC] in the same environment.

Nearly all of Shorto's pronunciations after *voiced initials are higher than after *voiceless initials with two exceptions:

Mon Ro [ɔa] ~ [ɔ̤a] is probably more conservative than Nai Pan Hla's nonnative Mon Rao and Shorto's [oa] ~ [o̤a]. Diffloth (1984: 249) reconstructs Proto-Mon *-ɒɛ̯ and *-ɔ̤ɛ̯ which differ in height as well as phonation.

(On to part 8 someday?)


19.9.20.23:40: THE MON GRAPHONETIC GAP (PART 6)

(Posted 19.11.21.)

(Back to Part 5)

One feature that distinguishes Mon and Burmese script from the other Indic scripts of Continental Southeast Asian languages is the digraph <ui> which never represents anything like [ui]. In Burmese, it has surprising phonetic values:

For comparison, here are the pronunciations of <ui> in Mon from Nai Pan Hla (1988-89: 16,18):

written coda
*voiceless initial
*voiced initial
<k·>, <ṅ·>
aɨC
a̤ɨC
<t·>, <n·>
ɒC
o̤iC
<p·>, <m·>, <h·>, <ʔ·> ɜ̤C
<v·> ɒ ɜ̤
<y·> oi
o̤i

Shorto (1971: xviii, xx) interprets <ui> in Old and Middle Mon as /ø/ even though none of the above modern pronunciations has a front component except for <uiy·> whose final [i] presumably corresponds to *-j and could be rewritten as [j].

A quick check of Diffloth (1984) shows that his Proto-Monic (there is no in his reconstruction) generally corresponds to modern Mon <ui>¹. also matches the earlier value of <ui> in Burmese that I reconstructed seven years before I first saw Diffloth's book.

Let's go with *ə as a placeholder for what <ui> once stood for.

As I'd expect, *ə generally lowered after *voiceless initials and raised after *voiced initials. An exception is before velars: *ə always warped to [aɨ] with a lowered beginning and raised ending regardless of initial. (Of course *voiced initials conditioned breathiness: [a̤ɨ].)

Judging from [ɜ̤], maybe *ɜ would be a more precise reconstruction of *ə.

*ə dissimilated after *voiced initials and before *iT:

*gəT > *gə̤T > *gə̤iT > *go̤iT

I suspect a similar dissimilation occurred before *-j after a new *-əj developed (from where?) to replace the old one in stage 2:

stage 1
*-əj *-?
stage 2
*-aj *-əj
stage 3
*-ɔɛ
*-əe
stage 4
[oa]
[oi]

For comparison, *əː developed much more simply in Khmer:

¹The exceptions I found ended in *-əj which corresponds to modern Mon <ai>, not <uiy·>: e.g.,

*təj > <tai> [toa] 'arm, hand'

That makes me wonder where <uiy·> came from.

(On to Part 7)


19.9.19.22:59: THE MON GRAPHONETIC GAP (PART 5)

(Posted 19.11.21.)

(Back to Part 4)

Continental Southeast Asian languages typically distinguish between upper mid /e o/ and lower mid /ɛ ɔ/. Modern spoken Mon as described by Nai Pan Hla (1988-89: 12, 16-18) fits this pattern:

Mon spellings - such as those above - imply that the distinction is the result of reorganization. The earlier sound system implied by modern Mon spelling has no (though Diffloth 1984: 284 reconstructs *ɛː at the Proto-Monic level). However, modern Mon spelling does imply an spelled <aṁ> before velars (Nai Pan Hla 1988-89: 16, 18):

| written coda | *voiceless initial | *voiced initial |
| --- | --- | --- |
| <amk·> | [ɔk] | [ɔ̤k] |
| <amṅ·> | [ɔŋ] | [ɔ̤ŋ] |

The behavior of written Mon <aṁ> is generally quite different from that of Khmer and *ɔː in the same environment.

Reflexes of Khmer before velar codas:

| written coda | *voiceless initial | *voiced initial |
| --- | --- | --- |
| <aka'> | [ɑːk] | [ɔːk] |
| <aṅa'> | [ɑːŋ] | [ɔːŋ] |

Reflexes of Khmer ɔː before velar codas:

| written coda | *voiceless initial | *voiced initial |
| --- | --- | --- |
| <aka> | [ɑk] | [ʊək] |
| <aṅa> | [ɑŋ] | [ʊəŋ] |

Khmer lowered *ɔ(ː) after *voiceless initials and broke short after *voiced initials, but Mon only developed a register distinction (which once existed in Khmer but is now gone).

Diffloth (1984: 299) reconstructed a Proto-Monic upper-lower mid vowel distinction only before certain codas:

vowel\coda
*-ʔ
*-h
*-k/-ŋ *-c/-ɲ
*-p/m
*-w
*eː



*eːC

*e

*eh




*ɛː
*ɛːʔ


*ɛːP


*ɛh



*oː
*oːʔ
*oːK
*oːP *oːw
*ɔː *ɔːʔ *ɔh *ɔːK
*ɔːP



*ɔK


Proto-Monic *-c/-ɲ have been lost even from modern Mon spelling: e.g.,

| Proto-Monic | Old Mon | modern Mon | IPA | gloss |
| --- | --- | --- | --- | --- |
| *ceːc | (not attested?) | ci | coik | great-grandchild |
| *ciːɲ | ṅ· ~ ciṅ· | ciṅ· | coiŋ | elephant |
| *puːc | ~ pu | pu | put | to gouge with a chisel |
| *smaːɲ | smāñ· | smā | hman | to inquire |

(9.20.17:00: Reformatted the examples above into a table and added the Old Mon examples preserving final palatals.)

(No Proto-Monic *-eːɲ words have survived in modern Mon.)

The distribution of vowels seems chaotic, though one generalization can be made: length is nondistinctive before glottals. (All vowels before *-ʔ are long, and all vowels before *-h are short. That statement is true even for vowels absent in the table.) I wonder if a more orderly distribution can be reconstructed.

I have excluded the central mid vowel *ə(ː) which I will discuss in part 6.

Diffloth does not reconstruct short *o.

Diffloth does reconstruct *-t, *-n, *-j, *-r, *-l, and *-s, but does not reconstruct *e(ː) *ɛ(ː) *oː *ɔ(ː) before those codas. In modern Mon, *-r and *-l have disappeared even in spelling, and *-s has become [h] (Diffloth 1984: 295-296).

Does modern Mon spelling reflect a stage of the language in which *e(ː)/*ɛ(ː) merged into *e and *oː/*ɔ(ː) merged into *o almost everywhere except before velars?

(On to Part 6)


19.9.18.23:19: THE MON GRAPHONETIC GAP (PART 4)

(Posted 19.11.21.)

(Back to Part 3)

1. Written Mon mid vowels <e> and <o> (Nai Pan Hla 1988-89: 11-12, 16-18)

1a. <e>

| written coda | *voiceless initial | *voiced initial |
| --- | --- | --- |
| Ø | e | e̤ |
| <k·>, <ṅ·> | ɔeC | ɔ̤eC |
| <t·>, <n·>, <p·>, <m·> | eC | e̤C |
| <y·> | ea | e̤a |
| <v·> | ɛ | ɛ̤ |
| <h·>, <ʔ·> | ɛC | ɛ̤C |

<e> in open syllables does not lower after *voiceless initials unlike <ī> [ɔe].

<ek·> [ɔeK] is like a lowered version of <ik·> [oiC] but has the diphthong [ɔe] of open <ī> after *voiceless initials.

I suppose that *-ej > *-eaj (partial dissimilation from *j?) > [ea].

I don't know why *e lowered before *-w and final glottals. So far I haven't seen any other cases of *-Vw having the same vowel as *-VQ (see below). In part 2, we saw that after *voiced initials, <āp·> and <ām·> were [ɛ̤p] and [ɛ̤m] (with raising rather than lowering!) whereas <āv·> was respelled as <au> [ɛ̤a] (again with raising rather than lowering!).

1b. <o>

written coda
*voiceless initial
*voiced initial
Ø
ao
ɜ̤
<k·>, <ṅ·>, <t·>, <n·>, <p·>, <m·>, <h·> oC
o̤C
<y·> oa
o̤a
<v·> o
<ʔ·>
o̤ʔ ~ ɜ̤ʔ

Given that final <e> after *voiceless initials is [e], I would expect <o> in the same environment to be [o], but the actual vowel is [ao]. (Cf. Khmer in which *eː and *oː in the same environment both became diphthongs: [ae] and [ao].)

Given that final <e> after *voiced initials is [e̤], I would expect <o> in the same environment to be [o̤], but the actual vowel is [ɜ̤]. I guess *-o recently centralized to [ɜ̤], and that a similar change is optional for /o̤ʔ/.

The breaking of *-oj to *-oaj with simplification to [oa] is similar to the breaking of *-ej to *-eaj with simplification to [ea]. A single sound change can be formulated: mid vowel + *-j > that vowel + [a].
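That single change can be sketched in a few lines of Python. This is only an illustration under my own assumptions: syllables are plain IPA strings, breathiness is written with the combining diaeresis below (U+0324), and the function name is a label I made up. It also jumps straight from *-Vj to [Va] and skips the intermediate *-eaj/*-oaj stage.

```python
import re

BREATHY = "\u0324"  # combining diaeresis below, marks breathy voice (e̤, o̤)

# A final mid vowel (optionally breathy) followed by *-j.
MID_VOWEL_J = re.compile(rf"([eo]{BREATHY}?)j$")

def break_before_j(syllable: str) -> str:
    """Rewrite final mid vowel + *-j as that vowel + [a]; other syllables pass through."""
    return MID_VOWEL_J.sub(r"\1a", syllable)

for proto in ["ej", "oj", "e" + BREATHY + "j", "o" + BREATHY + "j", "aj"]:
    print("*-" + proto, ">", "[" + break_before_j(proto) + "]")
```

Note that *-aj is deliberately left alone here: <ay·> went down a different path, so a low vowel should not be caught by the mid-vowel rule.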

I suspect that *-ow became [o] and [o̤] after earlier *-o and *o̤ became [ao] and [ɜ̤].

2. I wish I could have been at the "Using Manchu sources" workshop in Munich to hear the Hölzls' "Chinese Kyakala: The language and its sources".

Besides being of interest from a Jurchenic perspective (Chinese Kyakala is a Jurchenic language), the online presentation introduced me to the Yiddish original of Max Weinreich's famous quotation (original script from Wikipedia):

אַ שפּראַך איז אַ דיאַלעקט מיט אַן אַרמיי און פֿלאָט

a shprakh iz a dialekt mit an armey un flot

Kyakala didn't have an army or a navy. Is it a "dialekt"?

3. Given how Hong Kong is in the news lately, it's a good time to write about Cantonese.

Cantonese has a number of unique Chinese characters representing words absent from literary Chinese and Mandarin. Generally those characters are transparent semantophonetic compounds, but here are a couple that aren't, at least not to me:

𗋭<𘠣+𘞌

2997 1diq4 'to sink' < <WATER> + 1zhyr3 'real' (why?)

11.21.19:24: It took me until now to supply that example because I got distracted and left this entry unfinished until now.

An even more mysterious example is

𗊓< 𘠣+?

3011 2my1 'fountainhead, wellspring'

whose right side <?> is unique to that character. I can't find its right side in Unicode.

If those examples are 'singly' semantomysterious, here's a doubly semantomysterious case: why does the transcription character

𗊛< 𘠣+𗤒

3045 1tshew1

for the Chinese name 曹 Cao (*1'tshaw1 in the Chinese dialect known to the Tangut)  look like 𘠣  <WATER> plus the right side of  𗤒  3305 1kew4 'year'? 3305 might be a partial phonetic because the rhymes only differ by grade (1-ew1 and 1-ew4).

Was <WATER> in 3045 meant to correspond to the 氵 <WATER> in Chinese 漕 <WATER.Cao> *1'tshaw1 'canal', a homophone of the name 曹 *1'tshaw1?

(On to Part 5)


19.9.17.23:19: THE MON GRAPHONETIC GAP (PART 3)

(Posted 19.11.21.)

1. In parts 1 and 2 we saw that written Mon <a> and <ā> generally had fronter values before velar, glottal, and zero codas:

| vowel\coda | velar/glottal/zero | elsewhere |
| --- | --- | --- |
| <a> | a, ɛ̤ | ɔ, ɔ̤, oa, o̤a |
| <ā> | a, ɛ̤a, ai, a̤i | a, a̤, ɛ̤ (!) |

The exception to this pattern was <ā> [ɛ̤] after *voiced initials and before <p·> and <m·>.

I would have expected <āv·> to be [ɛ̤w] after *voiced initials, but in fact there is no longer any <āv·>. That rhyme has been respelled as a single symbol ဴ <au> (distinct from the inherent vowel <a> and the dependent vowel ု <u>) pronounced [ɛ̤a] after *voiced initials.

The pattern above has parallels in the pronunciation of the high vowel symbols <i ī u ū> (Nai Pan Hla 1988-89: 10-11, 15-17):

<i>

| written coda | *voiceless initial | *voiced initial |
| --- | --- | --- |
| Ø | ɔeʔ | i̤ʔ |
| <k·>, <ṅ·> | oiC | o̤iC |
| <t·>, <n·>, <p·>, <m·>, <h·> | iC | i̤C |

<ī>

written coda
*voiceless initial
*voiced initial
Ø
ɔe

<u>

| written coda | *voiceless initial | *voiced initial |
| --- | --- | --- |
| Ø | aoʔ | ṳʔ |
| <k·>, <ṅ·> | ɜC | ɜ̤C |
| <t·>, <n·>, <p·>, <m·>, <y·>, <h·> | uC | ṳC |

<ū>

written coda
*voiceless initial
*voiced initial
Ø
ao

In Mon, high vowels remain high except before velars and glottal stop (but not the glottal fricative <h·>!). This is unlike Khmer in which high vowels almost always lower after *voiceless consonants.

There are no rhymes ending in [i] and [u]; high vowels must be breathy [i̤ ṳ] in open syllables. (High vowels *i and *u with modal voice broke into diphthongs [ɔe] and [ao]; cf. the similar breaking of modal-voice *iː and *uː in Khmer to [əj] and [ow].)

I wonder if [ɜK] is from an earlier *euK parallel to [oiK]. Central [ɜ] could be a compromise between front *e and back *u.

<ī> and <ū> are apparently only in native open syllables. Do they exist in borrowed closed syllables? If they do, I imagine they are read as if they were <i> and <u> in borrowed closed syllables judging from Old Mon <jiv·> ~ <jīv·> /ɟiw/ 'Jīvaka (a physician's name)'. (In Shorto's [1971] analysis of Old Mon, /i/ and /u/ were written with both short and long vowel symbols, implying there was no length distinction.)

2. I'd like to see a guide to the history of the Mon-Burmese script. I'm familiar with the Mon-Burmese script of the 12th century Kubyaukgyi inscription and the modern script, but know nothing about the stages in between.

The Mon symbols for အ <°a> and ာ <ā> are identical to those for Burmese, but there are subtle differences between the symbols for high vowels in Mon and Burmese:

transliteration
i ī °i °ī u ū
°u °ū
Brahmi
𑀺
𑀻
𑀇
𑀈
𑀼 𑀽 𑀉
𑀊
Mon


ဣ~ဣိ
ဣဳ


ဥု~ဥူ
Burmese




In the Kubyaukgyi inscription, both Mon and Burmese have a circle with a stroke inside for <i>. One or both tips of the inner stroke may touch the circle. In modern Mon and Burmese, that stroke has evolved in different ways.

Mon ဣ <°i> has a variant with a redundant ဣိ <i> added. That variant could be transliterated as <°ii>.

Mon ဣဳ <°ī> is easier to remember than its Burmese counterpart; it is simply a combination of ဣ <°i> and ဳ <ī>. Did earlier Mon ever have an <°ī> like Burmese ဤ <°ī> (which is similar to Khmer ឦ <°ī>)?

Similarly, Mon ဥု~ဥူ <°ū> is easier to remember than its Burmese counterpart; it is simply a combination of ဥ <°u> and ု <u> or ူ <ū>, whereas Burmese ဦ <°ū> looks like ဥ <°u> plus ီ <ī> (why?). The logic of Mon ဥု <°ū> is like that of Khmer ឩ <°ū> which is ឧ <°u> plus an extra vertical stroke reminiscent of ុ <u>.

3. I just found SEAlang's Old Mon page. Alas, Shorto's Old Mon dictionary has yet to be digitized.

(On to Part 4)


19.9.16.23:07: THE MON GRAPHONETIC GAP (PART 2)

(Posted 19.11.21.)

(Back to Part 1)

1. Written Mon ာ <ā> has six different phonetic values according to Nai Pan Hla (1988-89: 10-11, 15, 17):

| written coda | *voiceless initial | *voiced initial |
| --- | --- | --- |
| Ø | a | ɛ̤a |
| <k·>, <ṅ·> | aiC | a̤iC |
| <t·>, <n·>, <y·> | aC | a̤C |
| <p·>, <m·> | aC | ɛ̤C |

<ā> in *voiced-initial open syllables is pronounced almost like Khmer <ā> [iə] in the same environment.

As with <a>, <ā> has a fronted reading before velar codas. (But <a> did not have a fronted reading after *voiceless initials.)

<ā> also has a fronted (but monophthongal) reading [ɛ̤] before labial codas. This reading is homophonous with <a> before zero and velar (not labial!) codas.

<ā> is never followed by glottal codas or <v·>. <āv·> has been respelled as ဴ <au> which is [ao] after *voiceless consonants but [ɛ̤a] after *voiced consonants.

2. Yesterday I mentioned five nôm spellings of Vietnamese người 'person':

㝵𠊚𠊛(⿰㝵仁)倘

This frequency table for five editions of Kiều lists two more spellings:

3. Looking at 獻花歌 Hŏnhwaga (The Flower-Offering Song, c. early 8th c.) made me wonder why 獻 <OFFER> is abbreviated as 献. 獻 doesn't sound like 南 <SOUTH> which also has no semantic relevance:

| sinograph | Mandarin | Cantonese | Sino-Japanese | Sino-Korean | Sino-Vietnamese |
| --- | --- | --- | --- | --- | --- |
| 獻 | xiàn [ɕjɛn˥˩] | [hiːn˧] | ken | hŏn | hiến |
| 南 | nán [nan˧˥] | [naːm˨˩] | nan | nam | nam |

Is a vague resemblance between the bottom left of 獻 and 南 enough to justify 南 as an abbreviation of 鬳 <CAULDRON>? Ah, I see now that there are variants of 鬳 with a 南-like (⿵冂𢆉) on the bottom. If I had to abbreviate 獻, I'd choose one of three strategies:

The all-new approach dispenses with any attempt to retain any part of the original, since neither 鬳 <CAULDRON> nor 犬 <DOG> has any obvious relationship to offering. 㧥 already exists, and 先 isn't the best phonetic, so it's not an option.

I'm surprised there is no super-simplified replacement for 獻.

4. The character 鬳 <CAULDRON> (Old Chinese *kVrek) itself doesn't make much sense to me since the supposed phonetic 虍 <TIGER.STRIPES> *qʰra according to Shuowen doesn't sound much like it. I suppose *kVr- and *qʰr- are not too distant, but the rhymes don't match at all.

5. Today I read about a Mexican restaurant named Buho in Waikiki. I assume Buho is from Spanish búho 'owl'. If búho is from Latin būbō, why did -b- become <h> (phonetically zero) rather than <b> [β]?

(On to Part 3)


19.9.15.23:59: THE MON GRAPHONETIC GAP (PART 1)

(Posted 19.11.21.)

1. I'm going to start a new series to unfold at a glacial pace.

Yesterday [in a post to be uploaded] I mentioned how Mon vowels developed differently depending on the voicing of preceding consonants: e.g.,

*paʔ > [paʔ]

*maʔ > [mɛ̤ʔ] (vowels after *voiced consonants become breathy and are higher than after *voiceless consonants)

Examining the graphonetic gap between spelling and pronunciation can give us some idea of how Mon vowels developed. The two syllables above are spelled ပ <pa> and မ <ma>; the characters have the same inherent vowel <a> though their readings have different rhymes. Let's look at all Mon <a>-rhymes as presented by Nai Pan Hla (1988-89: 15, 17):

| written coda | *voiceless initial | *voiced initial |
| --- | --- | --- |
| Ø | aʔ | ɛ̤ʔ |
| <k·>, <ṅ·> | aC | ɛ̤C |
| <t·>, <n·>, <p·>, <m·>, <h·>, <ʔ·> | ɔC | ɔ̤C |
| <v·> | ɔ | ɔ̤ |
| <y·> | oa | o̤a |

C is shorthand for 'the coda you'd expect based on the spelling'.

Post-*voiced raising only occurred before graphic zero and velar codas.

(19.9.16.20:01: [ɛ̤k] has a lower mid front vowel like Burmese [ɛʔ] < *ak, and [ɛ̤ŋ] has a front vowel like Burmese [ɪ̃] < *iŋ. The Burmese *VK changes, however, have nothing to do with initial voicing; they occur after all initials, voiceless or voiced. There seems to be something about velar codas that makes them front-vowel friendly. Also cf. Khmer *aK > [eəK] after voiced initials. Khmer *a does not front between voiced initials and nonback codas: *aC > [oəC].)

In all other environments, readings of Mon <a> are identical except for phonation: modal after *voiceless initials and breathy after *voiced initials.

Nonimplosive voiced obstruents had devoiced, so the pronunciations of written syllables like <kan·> and <gan·> only differ in phonation: [kɔn] and [kɔ̤n].
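To make the spelling-to-sound logic concrete, here is a rough Python sketch of how readings of written <a> could be looked up. It is a toy under stated assumptions: the names (`INITIALS`, `CODAS`, `A_VOWEL`, `read_a`) are mine, only the four initials and a few codas from the examples above are covered, and the [aʔ] reading after *voiceless initials with zero coda is taken from *paʔ > [paʔ].

```python
BREATHY = "\u0324"  # combining diaeresis below: breathy phonation

# Written initial -> (spoken onset, was it a *voiced initial?).
# Assumption: only the initials used in the examples above are listed.
INITIALS = {"p": ("p", False), "k": ("k", False),
            "m": ("m", True), "g": ("k", True)}  # <g> has devoiced to [k]

# Written coda -> (spoken coda, coda class). A zero coda surfaces as a glottal stop.
CODAS = {"": ("ʔ", "zero"), "k": ("k", "velar"), "n": ("n", "other")}

# Vowel quality of written <a> by coda class: (after *voiceless, after *voiced).
A_VOWEL = {"zero": ("a", "ɛ"), "velar": ("a", "ɛ"), "other": ("ɔ", "ɔ")}

def read_a(initial: str, coda: str = "") -> str:
    """Return a modern reading for written initial + <a> + coda."""
    onset, voiced = INITIALS[initial]
    final, coda_class = CODAS[coda]
    vowel = A_VOWEL[coda_class][1 if voiced else 0]
    if voiced:
        vowel += BREATHY  # *voiced initials condition breathy register
    return onset + vowel + final

for spelling in [("p", ""), ("m", ""), ("k", "n"), ("g", "n")]:
    print("<" + spelling[0] + "a" + spelling[1] + ">", "->", "[" + read_a(*spelling) + "]")
```

The point of splitting onset, vowel, and coda into separate lookups is that the register (breathy vs. modal) comes from the historical initial, while the vowel quality depends on both the initial and the coda class.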

I imagine that <av·> was pronounced something like *[ɔw] before *[w] was lost. (I don't know whether this loss predated the development of phonemic phonation, a.k.a. register.)

As for <ay·>, I think there is a parallel with French moi [mwa]:

*aj > *ɔj > *ɔe > [oa]

Thai Mon preserves a palatal vowel in [cɔɛ] < Proto-Monic *caj 'louse' (Diffloth 1984: 75). In modern written Mon, 'louse' is spelled <cai>. <ai> is an abbreviated spelling of <ay·> (Nai Pan Hla 1988-89: 2).
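The proposed chain for <ay·> can be replayed as ordered string rewrites. This is a minimal sketch, not a claim about the actual chronology: the stage list simply copies the chain above, `derive` is a hypothetical helper name, and the final [coa] is what the [oa] reading of <ai> predicts for <cai> rather than a form quoted from a source.

```python
# Ordered stages of the proposed development of *-aj (written <ay·> / <ai>).
STAGES = [("aj", "ɔj"), ("ɔj", "ɔe"), ("ɔe", "oa")]

def derive(form: str) -> list[str]:
    """Return every stage of a form as the rhyme changes apply in order."""
    history = [form]
    for old, new in STAGES:
        current = history[-1]
        if current.endswith(old):
            history.append(current[: -len(old)] + new)
    return history

print(" > ".join(derive("caj")))  # 'louse'
```

On this view Thai Mon [cɔɛ] sits roughly at the third stage, before the final element lowered to [a].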

Wiktionary has a list of Diffloth's 1984 Proto-Monic, Proto-Mon, and Proto-Nyah Kur reconstructions. Monic is a subgroup of Austroasiatic with two divisions, Mon proper and Nyah Kur.

2. And now for the Khmer side of Mon-Khmer ... yesterday morning while reviewing Khmer, I encountered a couple of interesting words.

3. Last night while copying line 7 of the epitaph of Yelü Dilie as reproduced in Kane (2009: 191-211) to practice the Khitan small script, I found that Kane had transliterated

270-302-222

as <yi.il.iń> instead of <êm.il.iń> which would be the normal transliteration in his system. Kane (2009: 194) thinks it corresponds to a Khitan "name or official title" transcribed in Chinese as what is now read as yilimian and yilibi in Mandarin (which is not far from the northern dialect underlying the transcriptions). But neither <yi.il.iń> nor <êm.il.iń> resembles those Chinese transcriptions. <êm.il.iń> might be a match if the Chinese reversed the medial segments, writing Khitan *emilin as if it were *elimin. <yi.il.iń> is even less likely, as it lacks the labials in yilimian and yilibi.

4. I found four nôm spellings of Vietnamese người 'person' via nomfoundation.org's Nôm Lookup Tool. The last two are unusual:

The use of 仁 as an indirect semantic component reminds me of how 仁 rather than 人 is the Khitan large script character for ku 'person'. Was 仁 a carryover from the lost Parhae script, and if it was, what motivated the Parhae to write their word for 'person' as 仁 instead of as 人?

倘 is like many Tangut characters that have an obvious semantic component and one or more mystery components with no transparent function.

5. It's been almost three years since I switched from Mojikyo to Tangut Yinchuan, and I'm not going back except for preview thumbnails.

Today I noticed that Mojikyo always renders Tangut component 𘡕 086 as 𘢩 170 - the reverse of the mistake I've been making in my handwriting.

There are two minimal pairs necessitating a distinction between the two:

6. Today I discovered that Homophones edition A has the aforementioned 𗯗 5834 2le1 'to change' (24A77) instead of 𗯖 5841 2khwuq1 'to cut' as in editions B2 and B5. Such confusions not only make me feel better about my many errors writing Tangut but also may give insight into how the script worked. Below 𗯗 5834/𗯖 5841 is 𘖵 5019 2khwuq1 'saw' which has its homophone 𗯖 5841 as phonetic beneath component 𘨝 542 <METAL>. It is written correctly in all three editions.

(On to Part 2)

19.11.20.23:23: WHAT IS THE RELATIONSHIP BETWEEN THE KHITAN SMALL SCRIPT AND THE JURCHEN LARGE SCRIPT? (PART 2: THE LOYALTY PRINCIPLE)

(Back to Part 1)

In the Khitan small script, there is a nearly one-to-one relationship between words and character blocks: e.g., the trisyllabic word taulia 'hare' is written as a single block of three characters:

<tau.li.a>

Exceptions are polysyllabic Chinese loanwords which are written with one block per syllable: e.g., the name

<340.339.303 244.357> <h.i.ing s.ung> Hingsung (Xing 2.2)

from Liao Chinese 興宗 *1hing 1tsung 'flourishing ancestor'. (Khitan had no /ts/ in its native phonological inventory, so Chinese /ts/ was often approximated as /s/.)

In theory the name Hingsung could have been written as one five-character block

<340.339.303.244.357> <h.i.ing.s.ung>

since neither /xiŋ/ nor /suŋ/ is a word in Khitan, but the loyalty principle of imitating the original Chinese spelling with separated syllables overruled the normal lexical principle of one block per word.
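The interplay of the two principles can be sketched as a tiny Python function. Everything here is a simplification of my own: a word is a list of syllables, each syllable is a list of character transliterations, and whether a word counts as a Chinese loan is passed in by hand (`to_blocks` and `chinese_loan` are hypothetical names).

```python
def to_blocks(syllables, chinese_loan=False):
    """Group character transliterations into blocks.

    syllables: list of syllables, each a list of character transliterations.
    """
    if chinese_loan:
        # Loyalty principle: one block per syllable of the Chinese original.
        return [".".join(chars) for chars in syllables]
    # Lexical principle: one block for the whole word.
    return [".".join(ch for chars in syllables for ch in chars)]

print(to_blocks([["tau"], ["li"], ["a"]]))                         # native taulia 'hare'
print(to_blocks([["h", "i", "ing"], ["s", "ung"]], chinese_loan=True))  # Hingsung
print(to_blocks([["h", "i", "ing"], ["s", "ung"]]))                # the unattested one-block spelling
```

The third call shows what the lexical principle alone would have produced for Hingsung: a single five-character block.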

The loyalty principle has no equivalent in the Khitan large script (KLS). There is no strict one-to-one correlation between Chinese characters and KLS characters:

gloss
Chinese
Liao Chinese pronunciation
KLS
transliteration
Khitan pronunciation
mountain

*1shan
MOUNTAIN
shan
hundred

*4pai

bai
bai
emperor
皇帝 *1'hong¹ 3ti 皇帝 EMPEROR₁ EMPEROR₂
hongdi
(a name)

*1'han
何至 ha an
han
commander

*3shoi
夫坐 sho oi
shoi

Strictly speaking, 'mountain' and 'commander' are probably parts of Chinese borrowings in Khitan rather than Khitan words.

I have not seen the Chinese borrowing bai 'hundred' outside the 耶律昌允 Yelü Changyun KLS inscription (1062); the usual word is native jau.

The fact that the name 'Han' and 'commander' are written with two KLS characters may indicate either that the KLS had no phonograms <han> and <shoi> for the monosyllables han and shoi or that the KLS had logograms pronounced han and shoi which were inappropriate for 'Han' and 'commander' because they stood for other words. A study of multiple-character KLS spellings for Chinese monosyllables may enable us to guess which monosyllables did not have phonograms in the KLS.

"May", because there is at least one case of a Chinese monosyllable with both one- and two-character KLS spellings: 上 *3shang corresponds to

~~

<shang> ~ <sha.ang> ~ <sha.ang>

in lines 3, 1, and 17 of Yelü Changyun. There is also a KLS 北 <shang> used to write Liao Chinese 尚 *3shang. Perhaps 仲 and 北 are morpheme-specific logograms corresponding to the Liao Chinese homophones 上 and 尚.

The KLS does have a character 上 which looks exactly like Liao Chinese 上 *3shang, but KLS 上 represents the syllable ha instead of shang. KLS can be disorienting from the perspective of someone accustomed to the Chinese script because so many KLS characters do not function like their Chinese lookalikes: e.g.,

Did the Khitan randomly decide to retain Chinese-like readings for some Chinese characters (山, 皇, 帝) and assign arbitrary non-Chinese readings to others (高 etc. in the list above)? I don't think so. I think the un-Chinese readings of 高 etc. originate from the use of those characters as semantograms for non-Chinese languages. 高 etc. may also be simplifications of more complex Chinese characters: e.g., 至 could be a 'katakana' phonogram reduction of a semantogram like

侄厔𦤵䑒咥姪庢挃洷𤞂𦤷晊桎𦤶𦤹𦤺眰祬秷𦤻

𦤼䘭臷臸蛭𢰙𦤿䑓輊𦥁𧠫𫇎䬹銍𥔊𦥂𦥄𦥅𦥆𦥇

𦥈𦥉𦥊𦥋𦥌𨖹𦥎𦥎𦥏𫇑𨆧𪗻𥒓𦤸𦤽𦤾𦥀𦥃𦥍𪏀

𫇏𫇐𬛱𬛳𬛶𬛷𮍢

致𠊷𡍶㨖㮹㴛𤸓緻𦟔𦥐𧤡𧩼䞃䦯𩋩

室𠋤𢯶𧫡𩋡鰘

窒㗧䏄膣螲

𡌥𡏀𣖭

𦤳, etc.

(Some of those characters may postdate the creation of the KLS and therefore be disqualified as potential cognates of KLS 至 <an>.)

Once again I am out of time, so I didn't get to write about the Jurchen text from part 1, much less come even remotely close to answering the title question. What I originally thought might be a single post just gets longer.

¹Yesterday it occurred to me that I could differentiate between 'yin' and 'yang' tones in my tonal notation by marking yang tones with '. I project the absence of non-1 yang tones (2'-, 3'-, 4'-) in modern Mandarin back into Liao Chinese, but I could be wrong.
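For anyone who wants to process forms in this notation, here is a small Python sketch of a parser. The pattern is my own reading of the notation (optional asterisk, a tone digit 1-4, an optional ' for yang, then lowercase segments), and `parse_syllable` is a hypothetical name, so treat it as a starting point only.

```python
import re

# Optional reconstruction asterisk, tone digit 1-4, optional ' for yang, then segments.
SYLLABLE = re.compile(r"^\*?([1-4])('?)([a-z]+)$")

def parse_syllable(text: str) -> dict:
    """Split a tone-prefixed syllable into tone, register, and segments."""
    match = SYLLABLE.match(text)
    if match is None:
        raise ValueError(f"not in tone-prefixed notation: {text!r}")
    tone, yang_mark, segments = match.groups()
    return {
        "tone": int(tone),
        "register": "yang" if yang_mark else "yin",
        "segments": segments,
    }

for s in ["*1'han", "*1shan", "*4pai", "3ti"]:
    print(s, parse_syllable(s))
```

A stricter version could flag yang marks on tones 2-4, since I assume those did not exist in Liao Chinese.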


19.11.19.23:15: WHAT IS THE RELATIONSHIP BETWEEN THE KHITAN SMALL SCRIPT AND THE JURCHEN LARGE SCRIPT? (PART 1)

The short and oversimplified answer is that there isn't any.

The real answer is more complicated.

The defining characteristic of the Khitan small script is how its characters are combined into blocks. For that reason, Shimunek (2017) calls it the 'assembled script' to avoid committing to the term 'small script'¹.

Kiyose (1977: 27-28) proposed that the elusive Jurchen small script is nothing more than the Jurchen large script characters combined into Khitan small script-like blocks. The known examples of these Jurchen blocks are in 弇州山人四部稿 Yanzhou shanren sibu gao (Draft [Catalog of] the Four Categories of Yanzhou Shanren['s Library]; 16th c.) and 方氏墨譜 Fang shi mopu (Mr. Fang's Ink [Cake] Book, 1588) and on a 牌子 paizi (travel pass).

Here is Kiyose's (1984) decipherment of the eight blocks in Yanzhou and Fang shi which can be seen at Wikipedia:

row | block # | block | transliteration | meaning | Chinese | meaning
1 | 1 | | gen.giyen | bright | | bright > wise
1 | 2 | | wan | prince (< Chn) | | prince
1 | 3 | | tiqo.ci.ghun | heedful-if-? | | heedful
1 | 4 | | de | virtue (< Chn) | | virtue
2 | 5 | | duwin | four | | four
2 | 6 | | tuli.le | outside | | foreigner
2 | 7 | | hiyen | all (< Chn) | | all
2 | 8 | | an.da.hai | guest | | guest

'When a wise prince is heedful of virtue /

'Foreigners from the four quarters come as guests'

(tr. by Kiyose 1984: 84)

Unlike most Khitan small script blocks, the blocks in that text are purely vertical: e.g., <tiqo.ci.ghun> and <an.da.hai> are vertical stacks of three characters - an arrangement never found in the Khitan small script. (But two-element vertical stacks like <gen.giyen> and <tuli.le> are occasionally found in the Khitan small script.)

Making images of those blocks and their components took so long that I don't have time to write about the words they represent or how they're strung together! Next time ...

In the meantime, I thank Jason Glavy for making the font that is the basis of nearly all my 600+ Jurchen images. (Seven of the eight images in the transcription of the Yanzhou/Fang shi text above are modifications of characters from his font; only <duwin> 'four' is unaltered.) I couldn't have written any of my many posts about the Jurchen script over the last eight years without his font.

¹The terms 'small script' and 'large script' are only known from Chinese sources. 遼史 Liao shi (History of the Liao Dynasty) vol. 64 says that 耶律迭剌 Yelü Diela

能習其言與書,因制契丹小字,數少而該貫。

'was able to learn their [the Uyghur] spoken language and script. Then he created (a script) of smaller Khitan characters which, although few in number, covered everything.' (tr. by Kane 1989: 2)

That passage hints at the possibility of the Khitan small script being somehow influenced by Uyghur and indicates that the small script had 'few' characters. The 'assembled script' has characters combining into words as in the Uyghur script and has fewer characters than the other Khitan script (the 'linear script'), so I am certain that the 'assembled script' is the small script (and that the 'linear script' is the large script).

Contrast the "few" characters of that passage with the description of the creation of a Chinese-like first Khitan script with "several thousand characters" in 新五代史 Xin wu dai shi (New History of the Five Dynasties) vol. 72:

多用漢人,漢人教之以隸書之半增損之,作文字數千,以代刻木之約。

'He [阿保機 Abaoji, the first Khitan emperor] employed many Chinese, who taught them [the Khitan] how to write by altering characters in the clerical script, adding here and cutting there. They created a script of several thousand characters, replacing the contracts made by making notches on wood.' (tr. by Kane 2009: 167)

That earlier script must be the large script which has over a thousand known characters and resembles Chinese more strongly than the small script.

Unfortunately, these passages from Liao shi vol. 2 using the term 'large script' do not give any specifics:

五年春正月乙丑,始制契丹大字。

'Fifth year: spring, first month, yichou day: work began on the creation of the Kitan large script.'

九月 [...] 壬寅,大字成,詔頒行之。

'Ninth month [...] renyin day: The large script was completed. It was implemented by imperial edict.' (tr. by Kane 2009: 167)

There was no Khitan script before the fifth year of Abaoji's reign, so the large script must be the earliest Khitan script - the "script of several thousand characters" mentioned in Xin wu dai shi.


19.11.18.22:06: JURCHEN LARGE SCRIPT CHARACTER DERIVATIONS: <STAR>, <GIYA>, <HOTO>, <LE>

The first of these occurred to me last night; the rest are from today.

Jurchen | Jurchen reading | Jurchen gloss | Jurchen etymology | cognate sinograph | Chinese gloss | source reading
~ | osiha | star | < Proto-Tungusic *xōsī (Vovin 1996 class handout) | 牛 | cow | Para-Japonic cognate of Proto-Japonic *osi or *usi 'cow'
 | giya [kʲa] | street | < post-Early Middle Chinese 街 *kja | 家 | house | post-Early Middle Chinese *kja
 | hoto | - | - | cf. 土 'earth' | | a word related to the source of Manchu hoton 'city'
~ | le | - | - | 礼 | ceremony | Middle Chinese *lḛj

A. The Jurchen logogram <STAR> may be a Parhae cognate of standard Chinese 牛 <COW> which was once used to write a para-Japonic cognate of Proto-Japonic *osi or *usi 'cow' and later borrowed to write an unrelated Jurchen soundalike osiha 'star'. That borrowing must postdate the loss of *x- in pre-Jurchen.

(The resemblance between Proto-Tungusic *xōsī 'star' and Japanese hoshi 'id.' is fortuitous. Japanese h- goes back to *p-, and Proto-Tungusic *p- became p- [later f-] rather than zero in Jurchen.)

The second form of <STAR> with ㇓ on the left and a hook on the bottom is from Grube (1896: 1). Without access to the Berlin manuscript that he used, I cannot verify how accurate his handwritten form of <STAR> is.

B. The Jurchen phonogram <giya> may be a Parhae cognate of standard Chinese 家 <HOUSE> (post-Early Middle Chinese *kja) used to write the syllable giya [kʲa]. Jurchen speakers borrowed giya 'street' from post-Early Middle Chinese 街 *kja 'id.' (via Sino-Parhae?; cf. Sino-Korean 街 ka) but wrote it with a version of 家 which was homophonous with 街 in the Chinese known to the Jurchen. (That variety of Chinese had merged the rhymes of 街 and 家, whereas modern standard Mandarin 街 jiē reflects a variety that had not merged those rhymes. 街 and 家 are not homophonous in any modern variety of Mandarin at 小學堂 Xiaoxuetang: compare their readings here and here.)

C. The Jurchen phonogram <hoto> containing 土 <EARTH> may have originated as a logogram <CITY> for an areal word attested in Koguryŏ place names (as 忽 *hot), Manchu (hoton, a loan from Mongolian), and Mongolian qota(n). The logogram could have originated in the Parhae script (to represent the *hot-word from Koguryŏ) or in the completely lost Northern Wei script (to represent a Serbi cognate of Mongolian qota[n]). This logogram may be original to the Parhae or Northern Wei precursor of the Jurchen script and therefore lack a cognate in the standard Chinese script.

D. The Jurchen phonogram <le> [lə] may be a cognate of standard Chinese 礼 <CEREMONY>. In Liao and Jin Chinese, 礼 was pronounced *li which would have been a less than optimal match for Jurchen [lə]. The use of a cognate of 礼 to represent the syllable /lə/ probably predates the shift of *-ej to *-i in northeastern Chinese: i.e., it may go back to the Parhae script or perhaps even the Northern Wei script.

Hiragana れ <re> and katakana ㇾ <re> are respectively derived from a cursive form of 礼 and the right side of 礼, so I regard them as potential 'relatives' of <le>.

The second form of <le> with 天 on the left and a hook on the bottom is from the Berlin copy of the Ming dynasty Bureau of Translators vocabulary and could be a mistake for the correct form with 夫 on the left. The Jurchen script in extant copies of the vocabulary was probably written by Chinese scribes and hence may contain nonnative errors. It might be difficult to differentiate between genuine Ming Jurchen innovations and scribal errors. I assume that the dots of the Ming Jurchen characters

<DAY> inenggi and <MOON> biya

are genuine innovations, but the replacement of 夫 with 天 in <le> may not be.


19.11.17.21:07: SINO-PARHAE INFLUENCE ON THE JURCHEN SCRIPT?

Janhunen (1994: 133) proposed that the Jurchen script was a descendant of an "old local system of writing" rather than a 12th century creation as commonly assumed.

An obvious candidate for a concretely identifiable historical entity that had the potential to create a written language in pre-Liao Manchuria is the Bohai 渤海 [= Parhae] kingdom (698-926).

[...]

The Khitan and Jurchen "large" scripts were likewise not true "inventions" but, rather, natural stages in an evolutionary process that extended backwards through the Bohai script to some early northern variety of the Chinese script. [...] There is also the possibility that the Korean state of Koguryeo 高句麗 (-668), often regarded as the direct precursor of Bohai, was somehow involved. The influence of United Shilla 新羅 (668-918), a contemporary of Bohai, appears somewhat less likely, but cannot be completely ruled out. (pp. 114-115)

If Janhunen's hypothesis is correct, I might expect some peninsular features in the Jurchen script. Unfortunately, little is known about the languages of the peninsula prior to the invention of hangul in the 15th century, and very little is known about languages outside of Shilla. So it is dangerous to project Shilla features onto the rest of the peninsula, and a greater leap still to assume such features might have reached Parhae in the north.

Nonetheless let's suppose that one of those features - the *-r (> modern -l) that characterizes Sino-Korean (i.e., Sino-Shilla) - was present in the Chinese known in Parhae. It was certainly present in the Chinese of the capital in northwestern China, but whether the feature also existed in northeastern China and Parhae next door is open to question. Let's answer the question in the affirmative for now. If Sino-Parhae had *-r readings for Chinese characters - and its local characters - I would expect *-r local characters to appear in the Jurchen script as symbols for CVr(V) syllable (sequence)s.

In Sino-Korean, 失 <LOSE> is pronounced 실 shil < *sir. Jin (1984: 14) derived the Jurchen phonogram

~

<šir> (the earlier form is on the left)

from ... 失 <LOSE>. The -r of the Jurchen reading could reflect a Chinese or Sino-Parhae *-r.

How many other Jurchen CVr(V)-characters resemble Chinese characters with Sino-Korean -l (< *-r) readings?

I am not saying that Jurchen has Sino-Korean features. I use Sino-Korean as the only available proxy for Sino-Parhae: how Chinese characters might have been pronounced in Parhae to the north of Old Korean-speaking Shilla.

I am also not saying that all Jurchen CVr(V)-characters must be derived from Chinese characters with Sino-Korean -l (< *-r) readings.

In theory Parhae characters for native Koreanic *CVr(V) words could have been recycled for Jurchen CVr(V)-sequences.

Koreanic need not be the only source of CVr(V)-readings in Jurchen. If Vovin (2012) is correct and Jurchen was already written in Parhae two or more centuries before the establishment of the Jurchen Empire, Parhae characters (渤海字? 渤字?) could have functioned as semantograms for Jurchen words:

Chinese character X meaning Y : Parhae character X' for Jurchen word CVr(V) meaning Y'

Semantograms for Khitan CVr(V) words could have been reused for unrelated Jurchen CVr(V) words and syllable (sequence)s.

Going beyond Jurchen and Khitan, I have already proposed that the Jurchen character

<HORSE> mori(n)

is related to Chinese 保 <PROTECT> (Sino-Korean 보 po) which does not have a Sino-Korean reading ending in -r but which could have represented a para-Japonic (i.e., the peninsular sister of pelagic Japonic) morpheme (sequence?) mor(-i) 'protect(-INF)' (cf. the use of 保 for mori 'protecting' in Japanese names).

A very wild possibility is that some Jurchen CVr(V)-characters may be derived from Parhae characters representing Amuric (!) morphemes. Fortescue's 2016 Proto-Amuric ('Proto-Nivkh' at Wiktionary) reconstruction has *-r(V) and *-ʀ(V)-final roots.

To sum up my thoughts, I present a table of possible sources for Jurchen CVr(V)-readings:

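source character | representing | recycled for | example
Chinese *-r characters | Sino-Parhae *CVr | Jurchen CVr(V) | <šir>?
Parhae characters | Jurchen CVr(V)? | Jurchen CVr(V) | ?
Parhae characters | Khitan CVr(V)?? | Jurchen CVr(V) | ?
Parhae characters | Koreanic CVr(V)??? | Jurchen CVr(V) | ?
Parhae characters | para-Japonic CVrV???? | Jurchen CVr(V) | <HORSE>?
Parhae characters | Amuric CVr(V)????? | Jurchen CVr(V) | ?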

My assumption here and elsewhere is that Jurchen character readings are not random in the way that Cherokee character readings appear to be random: e.g., Sequoyah assigned the reading a rather than dV to Ꭰ. (Cherokee <da de di do du dv> are Ꮣ Ꮥ Ꮧ Λ¹ Ꮪ Ꮫ.) I would like all Jurchen character readings to be derived either from Chinese or some non-Chinese language's approximate semantic equivalent of a Chinese morpheme (e.g., a para-Japonic *mor-i 'protect-INF' as a translation of Middle Chinese 保 *pa̰w 'protect').

Incredulity is no argument, I know, but I just can't bring myself to believe that 完顏希尹 Wanyan Xiyin did what Sequoyah did on a mass scale: take character shapes from existing scripts (the Chinese and Khitan large scripts) and assign hundreds of them to Jurchen morphemes and syllable (sequence)s at random. Sequoyah was illiterate until he invented his own script and may have never known English, but Xiyin

was fascinated by Chinese classics, and collected a large library when Jurchens seized and looted the capital of the Northern Song dynasty, Bianjing (present-day Kaifeng), in the Jin–Song Wars.

That happened either during the first siege of Bianjing in 1126 or the second in 1127 - years after c. 1119-1120, when Xiyin was said to have created the script. In theory Xiyin could have been illiterate until (or even after?) the siege and just liked the idea of having books he couldn't read, but I doubt that. The Jurchen had lived under literate rulers familiar with Chinese culture for centuries. Surely 阿骨打 Aguda, the founder of the empire, would have assigned a literate man to 'create' a script for his new state.

I put 'create' in quotes, since I think Xiyin standardized an existing script. Maybe standardized should also be in quotes, since the Jurchen script has a lot of variation. This variation may imply that the script has a lot of history behind it. (The Tangut script is most likely a true invention without a history, and it has far less variation.)

Some of that variation postdates Xiyin's time: e.g., the dots of

<DAY> inenggi and <MOON> biya

are not in the manuscript thought to be the earliest example of the Jurchen script. I am not counting the Parhae tiles that Vovin (2012) regarded as even earlier examples. If one counts those tiles, then that manuscript may be the earliest example of post-Xiyin written Jurchen.

Variation in the Jurchen script - and the Khitan scripts - is an issue deserving of much attention. Jin (1984) has already done some basic work by identifying which texts characters appear in. The next step is to create a visual chronology organizing characters by date: e.g.,

transliteration
1100s
1200s
c. 1500
<šir>
<DAY> inenggi
<MOON> biya
<COOKED> uru
?

Note how the earlier version of <šir> is closer to Jin's (1984) proposed Chinese source character 失 <LOSE>. The newer version of <šir> has a lookalike of the Khitan small script character 051 <qa>


atop a half-height 人, whereas the older version and 失 have a full-height 人 shape.

I included the Jurchen character <COOKED> because its later version looks exactly like 失 <LOSE>. The absence of a dot on the bottom of the late version could simply be a mistake in the Berlin copy of the Ming dynasty Bureau of Translators vocabulary. I have no idea why the shape of 失 <LOSE> - with or without a dot - was read as uru. Korean ilh- < ìrh- 'lose' and Old Japanese usinap- 'lose' only have one segment matching uru, and Proto-Amuric bək(ə)z- 'lose' doesn't match at all. Was there a Khitan root ur(u)- 'lose'? Might Khitan large script character 1511 <?>


have been read ur(u)-?

¹11.17.23:52: In 1834, Samuel Worcester inverted Sequoyah's Λ <do> to Ꮩ to differentiate it from Ꭺ <go>.


19.11.16.23:54: JURCHEN PSEUDORADICAL GRAPHEMES

Janhunen (2012: 109) wrote that the Khitan large and Jurchen (large) scripts had "no functionally relevant 'radical' components".

Here's what he meant. This trio of Jurchen characters have different elements on the left resembling Chinese radicals:

graph
reading
gloss
cf. Chinese radical
cf. Chinese phonetic

un
(phonogram)
亻 <PERSON>
干 <kan>
-
~ hūlha
thief
火 <FIRE>

The non-干 elements of the Jurchen characters have no apparent semantic or even phonetic function:

The shared component 干 has no apparent semantic or phonetic function either.

Many Jurchen (large script) characters appear to be random combinations of Chinese elements (often with slight alterations) without any apparent logic linking their graphic structure to their Jurchen pronunciations or the meanings of the Jurchen morphemes that they represent.

Are Jurchen (large script) characters randomly constructed? I find that hard to believe given that the Jurchen elite was literate in Khitan and Chinese. I can imagine an illiterate script inventor coming up with unsystematic recyclings and alterations of existing shapes as a new script. (That is precisely how the Cherokee script was developed.) But whoever created the Jurchen script was the heir of traditions of literacy - traditions that may have gone back to Parhae if not earlier. (Janhunen [2012: 109] suggests a link between the Parhae script and the even earlier Serbi script that was totally lost.)

I propose that the structure of Jurchen characters tells us nothing about Jurchen itself because it reflects another language. Let's call that language X. Here's my xenogenetic scenario for the development of Jurchen characters.

The challenge, then, is to identify X, A, and B on the basis of the shape of the Jurchen character and its reading C. D is of no relevance.

Suppose that English were written in a Jurchen-like script based on the Japanese usage of Chinese characters.

In that scenario, Japanese is language X. The Chinese character 香 representing A = Middle Chinese *xɨaŋ meaning B = 'fragrance' was used to write an unrelated Japanese morpheme C = ka meaning B = 'fragrance'.

English speakers then used 香 to write syllables that sounded like C: e.g., C' = car. Any attempt to see an automobile in 香 would be doomed.¹

In short,

Can a similar formula be written with Jurchen un and hūlha as C'?

11.17.0:31: 香 <FRAGRANCE> looks like 禾 <GRAIN> atop 日 <SUN> but is actually an abbreviation of 𪏽: 黍 <MILLET> atop 甘 <SWEET> - the sweet smell of millet. Jurchen characters have relatively low stroke counts and could be abbreviations of more complex Chinese originals, so a Jurchen character that looks like Chinese <E.F> could be a reduction of <G.H> (just as 香 <GRAIN.SUN> is a reduction of 𪏽 <MILLET.SWEET>).

In Japanese, 香 was reduced to 𛀠 as a now-obsolete hiragana for ka.


19.11.15.23:59: THE ORIGIN OF THE JURCHEN CHARACTER FOR 'CLOUD'

The Jurchen character for tugi 'cloud'


looks like Japanese 広 <WIDE>, the postwar simplification of 廣, plus a dot. But last night I realized it may be related to Chinese 云 <CLOUD>:

What I don't know is whether Jurchen <CLOUD> is a consciously altered Chinese character or the product of an alternate line of gradual evolution from an early form of 云 <CLOUD>. The first possibility fits the standard view of the Jurchen large script as a 12th century alteration of the Khitan large script (and perhaps also Chinese characters). The second possibility fits Janhunen's (1994) hypothesis: the Jurchen script is a descendant of the Parhae script, an organic offshoot of the Chinese script.

I continue to favor the second possibility because of the question Janhunen (1994) raised for the Khitan large script: if the Khitan goal was to differentiate their script from Chinese, why did they retain some Chinese characters as is? The question also applies to the Jurchen script to a lesser extent. The Jurchen script shares far fewer characters with the Chinese script, but nonetheless a handful of key obvious lookalikes remain:

Altered forms of all four appear only in late Jurchen.


19.11.14.23:59: THE ORIGIN OF THE NAME NARA

1. When I first became interested in Korean in 1987, I instantly fell in love with the idea of what Leon Serafim would later call Koreo-Japonic: the idea that Korean and Japanese were related. (It would be at least a couple of years before I would learn of Altaic from Roy Andrew Miller's books.) Unfortunately at the time I was a high school sophomore and had no idea what linguistics was, much less how historical linguistics worked. So I uncritically accepted claims like the derivation of Nara from Korean 나라 nara 'country'.

Now I know that the earliest attested form of the Korean word is 15th century 나랗 nàráh < *narak 'country'. The hypothetical final *-k matches up nicely with the coda of the phonogram 樂 Middle Chinese *lak in these old spellings for Nara:

乃樂 ~ 那樂 ~ 諾樂 ~ 寧樂

(Old Japanese Ca-syllables were not usually written with *-k phonograms.)

But sound matches alone do not an etymology make. The semantics also have to make sense. And naming a place in Japan 'Country' makes no sense to me. I don't know of any parallels anywhere else.

Wikipedia on the Korean hypothesis:

American linguist Christopher I. Beckwith infers the Korean narak derives from the late Middle Old Chinese 壌 (*nrak, earth), from early *narak, and has no connection with Goguryoic and Japanese na.

I have no idea how Beckwith reconstructs Middle Old Chinese *nrak. I know of no evidence for *-r- or *-k in 壌 (the Japanese postwar simplification of 壤). Here's how I reconstruct the word:

I don't know of anyone other than Beckwith who reconstructs the word with *-r- or *-k.

壤 is, incidentally, the yang of 平壤 Pyongyang 'Level Earth'.

More from Wikipedia:

There is the idea that Nara is akin to Tungusic na. In some Tungusic languages such as Orok (and likely Goguryeo language [let's not go into the issue of whether a 'Koguryo language' existed]), na means earth, land or the like. Some have speculated about a connection between these Tungusic words and Old Japanese nawi, an archaic and somewhat obscure word that appears in the verb phrases nawi furu and nawi yoru ('an earthquake occurs, to have an earthquake').

Two problems:

11.15.22:25: Can the Korean etymology be saved? The phonetic match appears to be perfect. So any salvage work has to be done on the semantic end:

A. If *narak meant 'country', maybe the Japanese name is only the second half of a longer name *X narak 'the land of X'. But there is no evidence for any longer name.

B. *narak did not mean 'country'. The Koreanic root attested with the meaning 'country' from Middle Korean onward could have had different meanings in earlier (and extinct) Koreanic languages: e.g., 'flat land' (cf. various Japanese nar- 'flat' words and the proposals to link them to Nara). Or *narak is not the root that became 'country' in Korean proper but some unrelated root - might Japanese 楢 nara 'oak' (one proposed source of the name Nara) be from a Koreanic *narak 'oak'? (But there is no attested Koreanic word for 'oak' like *narak - currently, oaks are 참나무 chham namu 'true trees', and no Korean names for types of oaks contain anything like *narak. Could chham namu be a later replacement for an older word *narak that only survives in Japanese? And might *narak have been a Koreanic translation of Japanese kashi 'oak'? Today there is a nearby city named 橿原 Kashihara 'oak plain'. Perhaps the whole region was once called Kashi. If *narak meant 'flat land', it could even be a translation of Old Japanese para [now hara] 'plain'. Wild idea: there were two unrelated Koreanic words *narak 'oak' and *narak 'plain', so *narak sufficed as a translation of the local name Kasipara 'oak plain'.)

In any case, I think the unusual spellings of Nara pointing to a final *-k make a Koreanic origin likely, though the underlying *narak may not have meant 'country'.

How did Nara get a Koreanic name? Two possibilities:

A. The name may date long before Nara became a capital - back to the period when 古墳 kofun were built there. I suspect kofun were a Japanese innovation triggered by the Koreanic influence that was certainly on the rise in those days, and Koreanic speakers brought it back home:

In recent years, South Korea has begun to allocate more resources toward archaeology, and keyhole tombs [i.e., kofun] have been found around the Yeongsan River basin, during the mid-Baekje [= 百濟 Paekche] Era. The keyhole tombs that have thus far been discovered on the Korean peninsula, were built between the 5th and the 6th centuries AD. [The earliest kofun in Japan date from the 3rd century AD.] There remains question over whether the tombs were made for Japanese aristocrats loyal to Baekje, Japanese merchants who controlled the region [Is there any evidence that any Japanese had such power in Paekche? Has anyone claimed the tombs are evidence for 任那 Mimana?], or a class independent from both Baekje and Yamato Japan.

Or how about Paekche aristocrats who had adopted a Japanese fashion?

B. Nara was named after a Japanese word by someone who may not have even been aware that the word was a borrowing from Koreanic. The trouble with this hypothesis is that it fails to explain the spellings of Nara pointing to an un-Japanese *-k.

I still doubt Japanese nawi 'earthquake' has anything to do with Tungusic na 'earth'. Just to illustrate the dangers of shared-monosyllable pseudoetymologies, I could claim that nation and nature are 'cognate' to Jurchen na 'earth'. But the initial n- of those two words goes back to an earlier Latin gn- without any parallel in Tungusic which lacks initial consonant clusters.

2. The Jurchen word for 'frost' was transcribed in Ming Chinese as

塞馬吉 *sə ma ki ~ *saj ma ki (Bureau of Translators #9)

塞忙吉 *sə maŋ ki ~ *saj maŋ ki (Bureau of Interpreters #8)

(塞 has two possible readings: *sə and *saj.)

Looking at Kane's (1989: 136) reconstruction of the Jurchen word for 'frost' as semanggi got me thinking about a simple notation for vowel classes. The three vowels in that word could be symbolized as HLN:

Manchu vowel harmony permits H or L vowels to coexist with N but usually doesn't tolerate mixed H/L roots. Jurchen seems to have even more vowel harmony than Manchu, so I agree with Jin (1984: 193) that the Jurchen word for 'frost' was saimanggi (LLN; I treat ai as an L-glide sequence /aj/ and not as an LN vowel sequence).
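The H/L/N notation is mechanical enough to sketch in code. The following is only a rough illustration: the two example words fix e as H, a as L, and i as N, but the assignment of o and ū to L and of u to N is my assumption here, as are the names VOWEL_CLASS, vowel_pattern, and mixes_h_and_l. A vowel directly followed by i is read as vowel plus glide, following the treatment of ai as /aj/.

```python
import re
import unicodedata

# Assumed vowel classes: e = H, a = L, i = N come from the two example words;
# o and ū as L and u as N are assumptions made for this sketch only.
VOWEL_CLASS = {"e": "H", "a": "L", "o": "L", "ū": "L", "i": "N", "u": "N"}

# A non-i vowel directly followed by i counts as one nucleus (vowel + glide /j/),
# so ai is a single L unit rather than an LN sequence.
NUCLEUS = re.compile(r"[aeouū]i|[aeiouū]")


def vowel_pattern(word: str) -> str:
    """Return the H/L/N pattern of a romanized word, e.g. 'HLN'."""
    # NFC normalization keeps ū as one precomposed character.
    text = unicodedata.normalize("NFC", word.lower())
    return "".join(VOWEL_CLASS[m.group()[0]] for m in NUCLEUS.finditer(text))


def mixes_h_and_l(word: str) -> bool:
    """True if the word contains both an H and an L vowel (N is ignored)."""
    pattern = vowel_pattern(word)
    return "H" in pattern and "L" in pattern


for form in ("semanggi", "saimanggi"):
    label = "mixed H/L" if mixes_h_and_l(form) else "harmonic"
    print(form, vowel_pattern(form), label)
```

Under those assumptions, semanggi comes out as a mixed H/L form and saimanggi as a harmonic one, which is the contrast that favors Jin's reading.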

(11.15.0:35: H/L terminology is also useful for my version of Chinese historical phonology: I claim that 壤 started out as *CInaŋ HL but harmonized to *CInɨaŋ HH. I regard the diphthong ɨa as the H counterpart of the L monophthong a.)

A similar kind of notation for languages with front/back harmony like Turkish would use the letters F, B, and N: e.g., the Arabic nonharmonic loanword kitap 'book' is FB.

3. Tonight I saw tzuris in this movie review which made me realize that tsuris (the only spelling I had ever seen until now) doesn't seem to have a singular in English. But it does have a singular in Yiddish. I should have guessed Yiddish got it from Hebrew since it has no German cognate.

11.15.0:18: What I don't understand is how Yiddish tsores became tsuris in English. (Wiktionary reports o-variants in English.)


19.11.13.23:59: THE ORIGIN OF THE JURCHEN CHARACTER FOR 'PERSON'

How did I not notice the similarity between the Khitan large script character <ku> 'person' (left) and the Jurchen (large) script character <niyalma> 'person' (right) until now?

Khitan 仁 <ku> looks exactly like Chinese 仁 <HUMANE>. Why didn't the Khitan simply write ku with a lookalike of Chinese 人 <PERSON>?

人 <PERSON> and 仁 <HUMANE> have been homophonous since at least the early first millennium AD¹. Someone decided to use 仁 <HUMANE> to represent a non-Chinese word for 'person' because 仁 <HUMANE> is homophonous with 人 <PERSON> in Chinese. (And 仁 <HUMANE> also contains the left-hand variant 亻 of 人 <PERSON>.)

(11.14.1:26: Maybe it's remotely relevant that both 仁 and 人 have the kun [native] reading hito in Japanese. The most famous instance of 仁 hito is 裕仁 Hirohito. So famous it needs no explanatory link! Perhaps some non-Chinese language of Manchuria also had the same reading for both 仁 and 人.)

(11.14.1:41: 仁 can even mean 人 'person' in Chinese itself. See noun definitions 2 and 3 in the ROC's 教育部重編國語辭典修訂本.)

I've been deliberately vague about that "someone" who spoke a "non-Chinese" language because I do not know who chose 仁 for 'person' and when. Here are three possibilities:

A. That someone was Serbi, and 仁 originally stood for the Serbi word for 'person' - possibly a cognate of Khitan ku 'person'. But the Serbi script is unattested, so no one even knows if it was what Janhunen (1994: 441) would call 'Sinoform': i.e., Chinese-like in appearance, much less if it was ancestral to the Khitan large script (a possibility raised by Shimunek [2017: 211]).

B. That someone was from Parhae, and 仁 originally stood for the word for 'person' in some language of Parhae. That language could have been

Among the three entities formed by the Xianbei [= Serbi, including the ancestors of the Khitan], Fuyu and Yilou [= the ancestors of the Jurchen?], the Fuyu are the most obscure. If they were not Tungusic [like the Jurchen], they may have been Amuric. If they were not Amuric, they may have been another Palaeo-Asiatic entity, unconnected with the extant ethnic corpus of Manchuria. [See Janhunen 1996: 235 for more speculation about such entities in early Manchuria.]

The descendants of the Fuyu (Korean: 夫餘 Puyŏ) lived in Parhae.

C. That someone was Khitan.

C fits the standard account of the origin of the Khitan large script: that it was a Khitan 'invention' without any precedents beyond the standard Chinese script. B is my expansion of Janhunen's (1994) Parhae hypothesis, and A is built upon Shimunek's (2017) Serbi hypothesis.

Why does the Jurchen character for 'person' have an extra stroke added to 仁? Two possibilities:

A. The stroke might have been added in Parhae times to distinguish a semantogram for a non-Chinese word for 'person' from 仁 <HUMANE> which might have been used to write the Chinese morpheme 'humane' in some non-Chinese language. This type of strategy was productive in the Vietnamese nôm script: the optional extra stroke nháy 'blink' differentiates native Vietnamese 買 mới 'new' from Sino-Vietnamese 買 mãi 'to buy'. See Handel (2018: 151) for more examples of nháy. (I'm surprised nomfoundation.org doesn't have the nháy version of 買 mới.)

B. The stroke was added by a Jurchen in the 12th century to distinguish Jurchen niyalma from Khitan ku. There was no need to distinguish niyalma from 仁 which didn't exist in the 12th century (or later) Jurchen script. (But if the Jurchen were really interested in differentiating their large script from the Khitan large script [still in use in the Jurchen Empire], why do the two scripts still share characters: e.g., 一 <ONE> and 二 <TWO>?)

Lastly, here's a wild idea: Khitan 仁 <ku> may in fact be a distortion of a four-stroke variant of 人 <PERSON>: 人 plus two lines on the right². Grinstead (1972: 56-57) thinks that variant underlies the Tangut element 𘢌 <PERSON>. If so, then Khitan 仁 <ku> and Tangut 𘢌 <PERSON> are cognates. (The issue of the potential relationship between the Khitan large script and the Tangut script remains unexplored.)

¹It is unclear whether the homophony of 人 <PERSON> and 仁 <HUMANE> goes back any further than that, and it is also unclear whether the two words are related. Is their later similarity merely the result of the convergence of two unrelated etyma? See the discussion in Schuessler (2007: 440-441).

²Unfortunately this variant is not only absent from Unicode but also absent from the ROC variants dictionary (though that dictionary does have a similar variant with three lines on the right). It is attested as recently as a 1963 South Korean movie ad that I wrote about in September. I think I've also seen it in a Hong Kong comic book or movie poster from the 70s.


19.11.12.23:59: LARGE STONE EGO

神武 <GOD MARTIAL> Jinmu, the legendary first emperor of Japan best known by his disyllabic Sino-Japanese name, has a longer native Old Japanese name

Kamu Yamatə Ipare-m-biko-n-ə sumera-mi-kətə

God Yamato Iware-GEN-prince-be-ATTR august-HON-act

'The August Agent [i.e., Emperor] Prince of Divine Yamato Iware'

spelled

神日本磐余彥天皇

<GOD SUN ORIGIN LARGE.STONE I ELEGANT HEAVEN EMPEROR>

in Nihon shoki (Chronicles of Japan, 720). Seeing that name again today for the first time in a long time made me wonder why the -re of Ipare corresponds to 余 <I>. There is no Old Japanese word re 'I', and at no point in Chinese language history up to the 8th century does 余 ever sound like re:

*CIla > *CIlɨa > *lɨa > *jɨa > *jɨə > *jə > *jø

I wonder if the Old Japanese place name 磐余 Ipare has nothing to do with 磐 ipa 'large rock'. The spelling might reflect a folk etymology for a pre-Japanese indigenous toponym.

11.13.1:07: It just occurred to me that if one only knew Iware, the modern Japanese form of Ipare, one might guess that 磐余 is <iwa ware> with both characters simultaneously representing the wa of Iware.  The trouble, however, is that 'large rock' was ipa in Old Japanese, not iwa.

Old Japanese 'I' was ware more or less as in modern Japanese (there could have been subtle phonetic differences that can't be reconstructed). But I can't think of any other examples of a semantogram for Old Japanese XY (here, ware) also serving as a phonogram for Old Japanese Y  (here, re).


19.11.11.23:59: THE ORIGIN OF THE JURCHEN CHARACTER FOR 'HORSE'

In 1994 I first encountered Eric Grinstead's (1972: 16) explanation for the Jurchen characters for 'horse':

The Ruzhen [= Jurchen] language is like Manchu, which is a Tungus language, but some words could have been borrowed from Mongol. To take a very common word, and one characteristic of Mongol culture, the word for 'horse', we find the Ruzhen word to be something like 'mu-lin' (Grube, no. 138). The Mongol word for 'horse' is 'morin', not greatly different. In Grube's vocabulary we find a binome (of two characters),

,

which we will operate on according to the rules of deliberate alteration [from Chinese]. Adding a stroke this time, one to each character, we get 保列, pronounced in modern Chinese 'baolie'. This is reasonably close for a guess.

What bothered me was why a derivative of 12th century northern Chinese 保 *pɔw was used to write Jurchen mo- with a nasal initial. There was no shortage of Chinese *mo-characters forcing a scribe to fall back on a *p-character.

Tonight I realized why. Follow the logic here:

11.12.23:40: Jin (1984: 215) proposed that the (first) Jurchen character for 'horse' could be from a Khitan large script character in line 11 of the 蕭孝忠 Xiao Xiaozhong inscription (but not in N4631!):

< <?>

Could the character on the right have been a Khitan phonogram <mor(i)>?

The closest characters I can find in N4631 are


1217 <?> 1220 <sam>

Is 1217 a different interpretation of the character that Jin saw in Xiao Xiaozhong? I haven't seen that inscription myself.

1220 is unrelated (and nobody ever said it was related); it is a phonetic transcription of Liao Chinese 三 *sam 'three'. (No one really knows how the Khitan large script character 三 0113 <THREE> was pronounced.)

Why 1220 is pronounced sam is unknown. Did it originate as a logogram for a non-Chinese word *sam - in Khitan, in some language of Parhae, or even in Serbi (if the Khitan large script is a [partial?] offshoot of the lost Serbi script; see Shimunek 2017: 211)? (The Khitan large script could have three strata: Serbi, Parhae, and Khitan-only innovations.)


19.11.10.23:59: KORNICKI'S "WHY ARE THERE SO MANY DIFFERENT SCRIPTS IN EAST ASIA?" (2018)

1. Peter Kornicki (author of a recent book I want to read) asks:

You don’t have to learn a new script when you learn Norwegian, Czech, or Portuguese, let alone French, so why does every East Asian language require you to learn a new script as well?

That is also true of most major South and mainland Southeast Asian languages. It would be fun to see Kornicki at a roundtable with experts on South and mainland Southeast Asian languages on an expanded version of his question.

Asia is the continent of scripts. Contrast with the Americas where the Latin alphabet has a near-total monopoly. (Two exceptions that come to mind are Cherokee and Canadian Aboriginal syllabics. I am unaware of any non-Latin scripts actively being used in Central and South America. Here's a map of the world color-coded by script types.)

2. Also by Kornicki: "How did a Japanese book come to be reprinted in Philadelphia in 1855?"

3. Before I found Kornicki's article, I was going to title this entry "Each Forehead" after this spelling of Nukata that I found in Osterkamp (2008: 222):

各田 <EACH FIELD> (normally 額田 <FOREHEAD FIELD>)

nuka 'forehead' has been abbreviated as 各 (normally read kaku) before 田 ta 'field'.

Wikipedia lists another unusual spelling of Nukata:

農多  <FARM FIELD>

農 is normally read - rarely nu - but never nuka. So 農多 looks like it should be read Nuta, not Nukata. I can't think of any other 'underwritten' case like this.

The reduction of 額 to 各 makes me wonder if Khitan large script and Jurchen characters - and/or their Parhae prototypes - were similarly reduced from more complex Chinese characters (which would explain why Andrew West observed that Khitan large script and Jurchen characters have "only half the number of strokes as traditional CJK characters on average"). Such reduction would make the logic behind their readings very difficult to recover. If not for the full spelling 額田, I would have a hard time figuring out why 各 is read Nuka in 各田.

Osterkamp's article also deals with silent characters in place names: e.g.,

Speaking of repetition, here no is written twice:

野 is not a silent character; it is read no 'field' and redundantly represents the second syllable of the name. 角野 is reminiscent of Old Korean semantogram-phonogram spellings like

in which 音 represents -m. (夜音 is assumed to represent an Old Korean ancestor of later Korean pam 'night', but it might represent an unrelated, extinct -m word for 'night'.)

4. Looking at this 1605 copy of the 倭玉篇 Wagokuhen (c. 1489), I found a variant 晜 <SUN.YOUNGER BROTHER> of 昆 ani <SUN.COMPARE> 'older brother'. Why is 弟 <YOUNGER BROTHER> on the bottom?

(11.11.0:23: I suppose I could make up a story about an 晜 older brother being like a sun to a 弟 younger brother, but ... no.

Karlgren [1940: 231] cannot explain the structure of the standard graph 昆. 比 <COMPARE> is a drawing of two people. 昆 represents a variety of unrelated words pronounced *CA{q/k}u{n/r}: 'elder brother', 'descendants', 'afterwards', and 'numerous'.

Might the irregular aspiration in the [kʰ] of Mandarin 晜/昆 kun reflect a lost presyllable?)

Also found the erroneous reading seu for 昇 <RISE>, evidence for the merger of -eu and -you by 1605.

5. A CHAM CULTURAL SUBSTRATUM? Wish I could be at Nhung Tuyet Tran's Nov. 15 talk "Articulating Sinic Values at the Interstices of Empire: Literary Sinitic, Vernacular Vietnamese, and Neo-Confucianism in the Cham Heartland" (emphasis mine):

In 1718, in the coastal city of Quy Ninh, in what is now Vietnam’s south central coast, a group of students reprinted the "Guide for Young Learners by Category and Rhyme (指南幼學備品協韻)" in honour of their teacher. [...] More than a simple dictionary, I suggest that the bi-lingual glosses reflect the influence of Cham cultural patterns and habits in its articulation of orthodox Confucian values.

6. Alas, no abstract up yet for Sujung Kim's forthcoming talk "The Old Man and the Sea: Shinra Myōjin and Buddhist Networks of the East Asian ‘Mediterranean' " (2 March 2020). 新羅明神 Shinra Myōjin 'Shilla bright deity' is the guardian of 三井寺 Mii-dera a.k.a. 園城寺 Onjōji. See Shinra Myōjin save 円珍 Enchin (814-891) here.

7. I just realized that the normal reading Shiragi for 新羅 is the opposite of 和泉 Izumi from topic 3. 和泉 is 'overwritten' with a first character that has no phonetic value, whereas Shiragi is 'underwritten' as 新羅 <shin ra> without a character 城 <FORTRESS> for -gi 'fortress'. For some reason, the Japanese called Shilla Sira (now Shiragi) 'Shilla-fortress' rather than simply 'Shilla', though they adopted the spelling of the Shilla autonym 新羅.

8. While Shilla unified most of the Korean peninsula, 渤海 Parhae ruled its northern part and much of Manchuria. I wonder what this was about ...

This term [名神 myōjin 'famous deities'] 'is first attested in the Shoku Nihongi [Continued Chronicles of Japan], where offerings from the kingdom of Bohai (Balhae [= Parhae]) are stated to have been offered to "the eminent shrines (名神社 myōjin-sha) in each province [of Japan]" in the year 730 (Tenpyō 2).

9. Did 小高句麗 Little Koguryo exist to the southwest of Parhae? 日野開三郎 Hino Kaizaburō first proposed it in his PhD dissertation 小高句麗国の研究 Studies of the Country of Little Koguryo (1958). Oddly Little Koguryo has no Japanese Wikipedia entry.


19.11.9.23:57: TJK IN KORNICKI (2018)

I was happy to see Tangut, Jurchen, and Khitan discussed - and not just in brief mentions - in Peter Francis Kornicki's Languages, Scripts, and Chinese Texts in East Asia (2018). Search for 'Tangut', 'Jurchen', and 'Khitan' in the Google Books preview to see what I mean.

Kornicki's decision to avoid the sinocentric names Liao and Jin for the Khitan and Jurchen states reminds me of my own preference for 'Khitan Empire' and 'Jurchen Empire'. I have, however, been using Liao and Jin to refer to the ruling dynasties of those empires, not the empires themselves. But Kornicki has inspired me to start referring to the dynasties by their family names (cf. the Yi dynasty of Korea).

conventional term | my term for the state | my term for the dynasty
Liao dynasty | (First) Khitan Empire | (First) Yelü dynasty
Western Liao ~ Qara Khitai | Second Khitan Empire | (Second) Yelü dynasty
Western Xia ~ Xixia | Tangut Empire | Ngwimi dynasty
Jin dynasty | Jurchen Empire | Wanyan dynasty

Unfortunately, the reconstruction of the names of the Khitan and Jurchen ruling dynasties is still uncertain, so for now I use their Mandarinized versions. I'm more certain about the name 𗼨𗆟 Ngwimi, so I don't need to use the Mandarin version Weiming.

2. Speaking of names, last night I learned that 石田ひかり Ishida Hikari, an actress I haven't seen in over thirty years, has now married and taken the name 訓覇 Kurube which isn't in O'Neill's Japanese Names.

訓 is normally read kun. O'Neill doesn't include kuru as one of its name readings. I have wondered if  Chinese *-n characters representing Japanese CVrV sequences reflect (via Sino-Paekche) late survivals of an Old Chinese *-r that had not shifted to *-n. Unfortunately, there is no Chinese-internal evidence pointing to *-r instead of *-n in 訓.

覇 is normally read ha < *pa. Until now I would have regarded he < *pe as a hypothetical Go-on reading created for modern dictionaries by analogy with actual Go-on readings in the same rhyme class, but now I think there must have been a Go-on *pe since the name 訓覇 (prewar 訓霸) Kurube is attested in 和名類聚抄 Wamyō ruijūshō (Japanese names [for things], classified and annotated, 938), presumably before the creation of artificial readings for dictionaries and about four or five centuries after the importation of Go-on.

3. I still don't know why the Sino-Korean reading of 霸 is 패 phae < *phay. The expected reading is pha (for some reason p- cannot precede the rhyme -a in Sino-Korean, though pak, pan, pal, etc. are possible), and the idealized reading in 東國正韻 Tongguk chŏngun (Correct Rhymes of the Eastern Country, 1448) is pá, not y.

Although it is tempting to link phay to the front vowel of Go-on *pe from topic 2 above, the latter reading reflects a Middle Chinese *æ̤ from Old Chinese *-raks without anything *-y-like. Korean *-y in 霸 seems to be a Korean-internal innovation. Crazy idea: *-y is the noun suffix *-y added to *p(h)a 'tyrant' and reinterpreted as part of the reading.

(I write *(h) in parentheses since I don't know whether the initial was aspirated before or after *-y addition.)

4. I checked 小學堂 Xiaoxuetang to see if any Chinese variety there had a *phay-like reading of 霸. None did. But I did find this bizarre triplet of readings in 三林塘 Sanlintang Shanghai:

p/tɕ uo/a/i 35/53

I think that's supposed to be read as

puo³⁵ / pa³⁵ / tɕi⁵³

I'm guessing that the first two are the same morpheme in two different strata (borrowed and native) and that the third is an unrelated native synonym.

5. I wish I could help make khitan.info and jurchen.info, hypothetical companions to Alan Downes' tangut.info.

6. Timothy Michael O'Neill's Ideography and Chinese Language Theory: A History (2016) looks interesting.

The introduction introduced me to 譱 <LAMB.SPEAK.SPEAK> (using the glosses of Joseph de Prémare, SJ, for the components), an old form of modern 善 for classical 'to regard something as good, beautiful'.

7. I just realized that Japanese 匹 hiki < *piki (which does not match Middle Chinese *pʰit) might reflect (via Sino-Paekche) the late survival of an Old Chinese *-k that had not shifted to *-t after *i. Unfortunately, there is no Chinese-internal evidence pointing to *-k instead of *-t in 匹.


19.11.8.23:59: TANGUT.INFO

Today I couldn't use my computer, so instead of writing part 2 of "Do Korean and Japanese Share a Copula?", I started reading Alan Downes' PhD dissertation "How Does Tangut Work?" on my Kindle. I downloaded it over a year ago but never gave it the attention it deserves until now. Unfortunately now that I can use my computer again, I can't give it the attention it deserves on my blog right now.

All I can do is link once again to his site tangut.info and recommend his Tangut Character Lookup tool on the front page. If you don't know any Li Fanwen numbers for Tangut characters, try inputting a random number between 1 and 6000 in the "Li Fanwen Number" field or inputting English into the "Meaning" or "Keyword" fields. When you reach the results page, select "Show Concordances" and be amazed. Right now Downes' concordance is limited to part of the Tangut law code, but imagine an even bigger future concordance encompassing a larger number of texts. And - once Khitan and Jurchen are in Unicode - similar tools for those languages.

I wish Downes applied his computational powers to the Khitan and Jurchen scripts. I don't know of any computational analysis of Khitan since the efforts of Starikov's team in Russia (1964-1986)*, and Jurchen might be terra incognita.

*11.9.0:12:

This was a significant step towards an actual decipherment of the [Khitan small] script, but, unfortunately, the work was discontinued before it had proceeded to the level of phonological reconstruction.

- Wu and Janhunen (2010: 22)


19.11.7.23:59: DO KOREAN AND JAPANESE SHARE A COPULA? (PART 1)

1. Bjarke Frellesvig (2001) proposed that Korean and Japanese had a shared copula, and Alexander Vovin (2008: 546-547, 2010: 73-76) proposed that this copula was a Korean borrowing in Japanese rather than an inheritance from a common proto-language.

There are two or three phonological problems with either version of the shared origin hypothesis (inheritance or borrowing).

(11.8.22:19: What follows deals with Vovin's  [2008: 547] formulation

Middle Korean írò- < Old Korean *ito > Western Old Japanese

converted above into my notation. Frellesvig's formulation is far more complex, and I cannot do it justice in a short post.)

First, the Middle Korean copula is írò- which could be from an earlier rò- or tò- with *-t-lenition. Both versions of the shared origin hypothesis require *t, though there is no Korean-internal evidence for it; the only evidence is the t of Old Japanese 'be'. But I'm OK with using foreign data to resolve ambiguities in reconstruction, so I don't see this as an issue. I do, however, see the next two issues as harder to overlook.

Second, the Korean form has an í- that corresponds to nothing in Japanese. Why doesn't the Japanese form begin with i-? Could Middle Korean írò- be a redundant compound of í- (also attested as a copula) with *tò, a root 'be' (?) possibly shared with tʌ̀ɣòy- ~ tʌ̀βʌ̀y < *tʌ̀pʌ̀y < *tò-pʌ̀y? 'become'? Cf. other redundant copulas like Classical Japanese tar- 'be' < 'be' + ar- 'be'.

Maybe Paekche, the Koreanic language most in contact with Japanese and hence the most likely source for Koreanic loans in Japanese, had an uncompounded root *tò- (or had reduced a disyllabic root *ítò- to *tò-).

Third, the o in the Korean form should correspond to Japanese o or u < *o, not ə.

So in short, the two copulas only have a single consonant t in common (and even that t is shaky in Korean - if not for the proposed Japanese etymology, there would be no reason to favor reconstructing *t instead of *r).

11.8.22:44: Here's a table showing how the Korean and Japanese forms line up:

Middle Korean | í | r | ò
earlier Korean | | *t |
Old Japanese | - | t | ə

There are only two Old Japanese verbs with infinitives in -ə, 'be' and its homophone 'say'. The Middle Korean infinitive is -a ~ depending on the vowel of the preceding verb root. Could 'be' be a direct borrowing of an inflected Paekche *t-ə 'be' + infinitive suffix? If Frellesvig (2001) is right, and 'be' and 'say' are etymologically one and the same, then tə 'say' would also be a direct borrowing of a Paekche infinitive.

That solution is not without its problems. First, no Paekche verb paradigms are known, so we don't know what the Paekche infinitive suffix was. Second, the Old Korean infinitive suffix seems to have been -a, not -ə; the latter developed after Korean developed vowel harmony.

2. You'd think that I'd know the songs of Kojiki well after having typed them all out for use as data in my PhD dissertation. No. It doesn't help that that was over twenty years ago. So this morning I had to use this search engine to identify poem 31:

伊能知能

inəti-nə

life-POSS

麻多祁牟比登波

mata-k-em-u pitə-pa

complete-ATTR-TENT-ATTR person-TOP

多多美許母

tatamikəmə

幣具理能夜麻能

peŋguri-n-ə yama-nə

Heguri-be-ATTR mountain-POSS

久麻加志賀波袁

kuma-kasi-ŋga-pa-wo

bear-oak-POSS-leaf-ACC

宇受爾佐勢

unzu-ni sas-e

headdress-be-INF stick-IMP

曾能古

sənə ko

that child

Here's the full context.

Who converted the all-phonogram 8th century script into the modern mixed kanji-kana orthography that appears in the July 1958 issue of 日本古典文學大系月報 Nihon koten bungaku taikei geppō (Japanese Classical Literature Series Monthly)?

命の 全けむ人は 畳薦 平群の山の 熊白檮が葉を 髻華に插せ その子

Chamberlain's translation (1932: 266):

'Let those whose life may be complete stick

[in their hair] as a head-dress the leaves

of the bear-oak from Mount Heguri, -

those children!'

11.8.23:39: I forgot the whole point of quoting that poem: commenting on the choices in the modernized orthography.

2a. kuma-kasi 'bear oak' as 熊白檮 <BEAR WHITE STUMP>

kuma is 'bear', but kasi 'oak' doesn't map in a straightforward way onto <WHITE STUMP>. I suppose oaks are characterized by white stumps.

Almost thirty years ago I asked H. Mack Horton if I had to be a botanist to be a Japanese literature scholar. I can't remember his answer. I'm obviously neither.

2b. unzu 'headdress' as 髻華 <MIZURA FLOWER>

The word unzu itself has no internal structure, so it doesn't split up into mizura (< Old Japanese mindura), an ancient Japanese male hairstyle, and 'flower' (Old Japanese pana).

The mizura (see it here) didn't exist in China, so its name was spelled semantically with various repurposed characters:

11.9.20:09: Another name for the mizura is agemaki <  aŋgəy-maki 'raise-coil', written 總角 <ALL CORNER>.

3. The Japanese Wikipedia article on 漢文訓読 kanbun kundoku 'reading literary Chinese as Japanese' says that Khitan and Uyghur had their own versions of that. Is that a reference to Hong Mai's anecdote about how Khitan children read Chinese in Khitan word order? (See Kane 2009: 130.) What is the evidence for Uyghurized Chinese?


19.11.6.23:59: THE DAMAGED TEXT OF YELÜ DILIE'S EPITAPH: LINE 14

1. Looking at Andrew West's 2011 photo of the Khitan small script epitaph of 耶律迪烈 Yelü Dilie (1092), I see that the □ in Kane's (2009: 198) edition corresponds to damage in the third from last block in line 14, though Kane transliterates  □ as □, whereas he transliterates ⌧ as [damaged]. Unfortunately, all instances of Kane's ⌧ correspond to the tops of lines 33-38 which are hard to see in the photo. So I still can't figure out the difference, if any, between Kane's symbols ⌧ and □. For a moment I thought he might have changed his symbol for damage from □ to ⌧, but both symbols coexist in lines 33, 36, and 38 of his edition.

2. Sahaptin has long versions of all vowels except /ɨ/. Is there a historical reason for that?

3. Today I finally got around to looking at Valerie Henitiuk's Worlding Sei Shônagon: The Pillow Book in Translation (2012) when looking for Lone Takeuchi's A Study of Classical Japanese Tense and Aspect (1987). Wish I had the time to compare all 48 translation samples. Not that Google Books' preview would let me.

4. Michal Biran's "The Non-Han Dynasties" (2017) is a nice, short overview of the TJK (Tangut/Jurchen/Khitan) empires and their Mongol and Manchu successors.

5. Two lines of poetry caught my eye in Alexander Vovin's A Descriptive and Comparative Grammar of Western Old Japanese:

5a. 那賀那加佐麻久阿佐阿米能疑理爾多多牟叙

<na ŋga na ka sa ma ku a sa a məy nə ŋgɨ ri ni ta ta mu nzə>

na-ŋga nak-as-am-aku asa-aməy-nə ri-ni tat-am-u-nzə

you-POSS cry-HON-TENT-NML morning-rain-COMP fog-LOC rise-TENT-ATTR PT

'your weeping will rise into fog like the morning rain' (Kojiki song 4; analysis and tr. by Vovin 2008: 846)

Does the spelling 疑理 <ŋgɨ ri> indicate that /kɨri/ 'fog' was pronounced [ŋgɨri] with nasalization of /k/ spreading from the preceding comparative suffix /nə/ with a nasalized vowel [ə̃]?

5b. 吾妹子之阿乎偲良志

<I YOUNGER.SISTER CHILD si a wo LONG.FOR ra si>

wa-ŋg-imo-ko si a-wo sinop-urasi

I-POSS-beloved-DIM PT I-ACC long.for-SUP

'It seems that my beloved longs for me' (Man'yōshū XII: 3145; analysis and tr. by Vovin 2008: 681)

The modern Japanese reflex of sinop- is shinob- < *sinomb-. Is this another case of nasality spreading into a following consonant?

sinop- > *sinõp- > *sinõmb- > *sinomb-?

11.7.21:47: *sinomb- was later confused with the unrelated verb *sinomb- < sinəmbɨ- 'to conceal, to endure'. Both verbs are written with sinographs originally representing Chinese morphemes with different meanings:

偲 <PERSON.THINK> might have been interpreted as a semantic compound 'thinking of a person' appropriate for writing sinop- 'to long for'.

忍 <ENDURE> does match sinəmbɨ- in the sense of 'endure' but not 'conceal'. Is 'endure' an extended usage of 'conceal' ('conceal' > *'conceal discomfort' > 'endure')?

忍 in the sense of 'conceal' is well-known as the nin of 忍者 ninja 'concealer' and in 忍び shinobi 'art of the ninja' (lit. 'concealing').

6. My copy of 古代歌謡集 Kodai kayōshū (A Collection of Ancient Songs, 1958) has a newsletter (日本古典文學大系月報 Nihon koten bungaku taikei geppō, Japanese Classical Literature Series Monthly) in postwar simplified Japanese orthography with at least one exception: 關聯 kanren 'connection' instead of postwar 関連. A slip?

11.7.22:12: I'm not counting how the title 日本古典文學大系 is consistently spelled that way instead of as postwar 日本古典文学大系. Or the subtly different prewar forms of 錄 and 卷 instead of postwar 録 and 巻 in the title section. (Postwar 巻 appears in the body on p. 1. I can't find 錄/録 in the body.)

7. Origin story of the day: Japanologist Hugh Cortazzi's in his own words. (Also found that when looking up Lone Takeuchi.)

8. I might have seen an aphid on my phone screen last night. Wikipedia says 'aphid' in Korean is 진딧물 <c.i.n t.i.s m.u.r> chindinmul which looks like chindi- + genitive -s + -mul. (Genitive -s assimilates to the following nasal: /sm/ > nm.) Martin et al. (1967: 1545) define 진디 chindi by itself as 'aphid' and chindinmul as 'a nest of aphides' (not 'aphids'; 11.7.0:22: aphides is the plural of Latin aphis; the stem is aphid-).
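The jamo-by-jamo spelling above can be reproduced mechanically, since precomposed Hangul syllables are laid out arithmetically in Unicode. What follows is only a minimal sketch: the `ROMAN` table is a hypothetical partial mapping that covers just the letters of 진딧물 in the transliteration style used here, and the function names are my own.

```python
# Decompose precomposed Hangul syllables (U+AC00..U+D7A3) into jamo.
# Each syllable's code point is 0xAC00 + (lead * 21 + vowel) * 28 + tail.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAILS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
         "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
         "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]

# Hypothetical, deliberately partial romanization: only the jamo of 진딧물.
ROMAN = {"ㅈ": "c", "ㅣ": "i", "ㄴ": "n", "ㄷ": "t",
         "ㅅ": "s", "ㅁ": "m", "ㅜ": "u", "ㄹ": "r"}


def decompose(syllable):
    """Return the jamo of one precomposed Hangul syllable as a list."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 11172:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    lead, rest = divmod(index, 588)   # 588 = 21 vowels * 28 tails
    vowel, tail = divmod(rest, 28)
    # Drop the empty string that stands for "no final consonant".
    return [j for j in (LEADS[lead], VOWELS[vowel], TAILS[tail]) if j]


def spell(word):
    """Join jamo with dots and syllables with spaces; unmapped jamo pass through."""
    return " ".join(".".join(ROMAN.get(j, j) for j in decompose(s)) for s in word)


print(spell("진딧물"))  # c.i.n t.i.s m.u.r
```

The sketch only shows the orthographic layer; the assimilation of /sm/ to nm is a pronunciation rule and is not modeled.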

mul looks like a shortened (or unsuffixed?) form of 무리 muri 'group'. I suspect Japanese 群れ mure 'group of animals' < *mura-i is a borrowing of a Koreanic *mur plus a Japanese filler vowel *a and noun suffix *-i.

(11.7.0:13: Sakihara's 2006 Okinawan dictionary has no entry for a cognate of *mura-. If there are no Ryukyuan cognates of *mura-, then *mura- may be a borrowing from Koreanic [specifically Paekche?] into mainland Japanese after it split from Ryukyuan.)

Martin et al. derive chindi 'aphid' from <c.i.n t.ŭ k.i> 진드기 ~ <c.i.n t.ŭ.k Ø.i> 진득이 chindŭgi 'tick, mite, louse' with irregular -g-loss. They in turn derive chindŭgi from chindŭk-i with a noun suffix -i but do not define chindŭk. I suspect that chindŭk is the same chindŭk that is in

Was chindŭgi originally 'clinger'?


19.11.5.23:59: THE DAMAGED TEXT OF YELÜ DILIE'S EPITAPH

1. I'm still slowly copying out the Khitan small script epitaph of 耶律迪烈 Yelü Dilie (1092) as published in Kane (2009: 191-211) because I don't have access to any photos. (11.6.1:00: I do now!)

I forgot to ask last week - what is the difference, if any, between ⌧ and □ in the printed text? ⌧ is transliterated as [damaged] whereas □ is transliterated as □. But aren't both forms of damage?

What caught my eye was what appeared to be a rare eight-character block <⌧⌧⌧⌧⌧⌧⌧eu> at the start of line 33. I don't remember seeing any blocks with more than seven characters. (Here are the standard block layouts. Other possibilities are vertical stacks, 'diamonds', and 'pyramids'.)

2. Last night I saw Jennifer Taylor (née Bini) on Two and a Half Men. Bini is a name reduced to a b- (e.g., Iacobo) plus a diminutive suffix -ini. Are there any other surnames of that type?

3. I would have never guessed that Touchet was pronounced [ˈtuːʃi]. The -t seems nonetymological (the original name was Sahaptin tu-se) and presumably was added by analogy with rhyming French words ending in -t. I'm guessing French [tuʃe] was then Anglicized as [ˈtuːʃi] with final [i] in place of [e] (cf. how karate was borrowed as [kʰəˈɹɑti]). American English has [ej] and [i] but not [e].

4. I also would have never guessed that Muth in America was pronounced [mjuːθ]. I would have guessed [muːt] as in German or an Anglicized [mʌθ].

5. Proof Hollywood German has only one gender: Der Waffle Haus. GAHHHH!!

Or should I say Vancouver German? Dead like Me was filmed in Vancouver.


19.11.4.23:14: PRE-TANGUT *O BEFORE *CORONALS

1. Yesterday, I mentioned how pre-Tangut *o shifted to *y before *-r but not in open syllables in my interpretation of Jacques (2004: 206). Compare:

A similar *y-shift occurs before *-t (which is subsequently lost):

Other codas do not have y-reflexes in Tangut:

If *-oŋ and *-on existed, I don't know what their reflexes are.

If *-op never became *-yw, I can say that *o became *y before coronals *-t and *-r (and *-n, the nasal counterpart of *-t?). But why would *o lose its labiality and become achromatic (nonlabial and nonpalatal) y before coronals?

Perhaps the key lies in the fact that *i also underwent *y-shift in even more environments than *o: before *-Ø and *-p as well as before *coronals.

*vowel\*coda | *-Ø | *-k | *-ŋ | *-j | *-t | *-n | *-r | *-p | *-m
*i | -y | -ew | ? | ? | -y | ? | -yr | -y | -en
*o | -u | -o < *-ow | ? | -o | -y | ? | -yr | -ew | -on

Given how *o fronted before coronals in Lhasa Tibetan

I think this might have happened in Tangut:

pre-Tangut | *o-fronting | *ø > *e | *eT > *iT | Tangut
*-ot | *-øt | *-et | *-it | -y
*-it | *-it | *-it | *-it | -y
*-i | *-i | *-i | *-i | -y
*-ir | *-ir | *-ir | *-ir | -yr
*-or | *-ør | *-er | *-ir | -yr

*o-fronting in Tangut is like *o-fronting in Lhasa Tibetan, though the conditioning codas differ.

It is interesting that *o did not front (i.e., become palatal) before the palatal (and hence dorsal and noncoronal) coda *-j. *o was preserved before final glides: *-j and *-w < *-k.

The *ø that resulted from *o-fronting has no relation to Grade IV o which I think might have been [ø]. Compare:

pre-Tangut | *o-fronting | *ø > *e | Grading | *eT > *iT | Tangut
*CoT | *CøT | *CeT | *CeT1 | *CiT1 | Cy1
*CICoT | *CICøT | *CICeT | *CICeT4 | *CICiT4 | Cy4
*Cok | *Cok | *Cok | *Cok1 | *Cok1 | Co1
*CICok | *CICok | *CICok | *CICok4 [CICøk] | *CICok4 [CICøk] | Co4 [Cø]

(I leave out Grades II and III for simplicity.)

In the above scenario, grades developed after *ø > *e but before *e > *i > y:

By the time *e raised to high *i, height-driven grading was over, so *e1 would not become *i4; it became *i1, retaining its grade.

*ø > *e may have something to do with the *o > *e shift before *-p. The lack of parallelism between *-op and *-om might indicate that the latter no longer had a labial coda by the time *o dissimilated to *e before *-p (or did *-p already lenite to -w?).

Did *-om become a nasal vowel when *-op > *-ep/*-ew? Or did *-om become *-on after *o-fronting (so this new *-on never became *-øn)?

Later, *-eT (*T = *t or *r) merged with *-iT. That resulted in a large number of *-i(T) syllables which were all subject to the *i > y shift. I don't know whether that shift predated or postdated coda loss. (The -r in the Tangut column is not a coda.)

¹-r in unstarred Tangut forms indicates vowel retroflexion unlike pre-Tangut *-r which is a true liquid coda.

In yesterday's tables, none of the Tangut forms ended in -r because all of the forms had a preceding *S- at the pre-Tangut level. That *S- conditioned vowel tension (written as -q) which could not coexist with retroflexion: *SCVr > CVq (not ˣCVrq).
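The ordered changes proposed in topic 1 can be sketched as a small rewrite-rule cascade. This is only an illustration under stated assumptions: rhymes are written without asterisks, *-n is left out because its behavior is unknown, grading and vowel tension are ignored, and the placement of the i > y shift before coda loss is an arbitrary choice, since the relative order of those two steps is not known.

```python
import re

# Ordered rewrite rules over bare rhymes (asterisks omitted).
# T = coronal coda t or r; a final r that survives stands for retroflexion.
RULES = [
    ("o-fronting", r"o(?=[tr]$)", "ø"),   # o > ø before coronal codas only
    ("ø > e", r"ø", "e"),                 # the fronted vowel unrounds
    ("eT > iT", r"e(?=[tr]$)", "i"),      # e raises before T
    ("i > y", r"i(?=[tr]?$)", "y"),       # i > y in open syllables and before T
    ("t-loss", r"t$", ""),                # assumed here to follow i > y
]


def derive(rhyme):
    """Return every stage of the derivation, starting with the input rhyme."""
    stages = [rhyme]
    for _name, pattern, replacement in RULES:
        rhyme = re.sub(pattern, replacement, rhyme)
        stages.append(rhyme)
    return stages


for rhyme in ["ot", "or", "it", "ir", "i", "oj"]:
    print(" > ".join(derive(rhyme)))
```

The rhyme oj passes through unchanged because the palatal glide is not a coronal coda in this sketch; its later development to -o is not modeled.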

2. This is a strong statement (via Joanne Jacobs; emphasis mine):

Once a week, children wear a vest that includes a pocket for a listening device officials refer to as a “word pedometer.” The device, made by the nonprofit LENA, syncs with an online program that counts how many words the children hear each day, but it does not recognize which words are exchanged. The system works with any language, and it can differentiate between words broadcast by a TV or computer and those spoken by a person.

If the online program "does not recognize which words are exchanged", how does it 'know' what is and isn't a word? My fear is that it doesn't 'know'; it might be a syllable detector which indeed would work "with any language", and the word counts might be the number of syllables divided by some figure for the average number of syllables in an English word. Which might not be the average number of syllables of a word in some other language. And let's not even get into the issue of what counts as a word.
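To make the worry concrete, here is a minimal sketch of the feared scenario, not a description of how LENA's software actually works (which I don't know). All figures are hypothetical: an English-based average of 1.5 syllables per word, and a second language averaging 3 syllables per word.

```python
def estimated_words(syllables, assumed_syllables_per_word):
    """Estimate a word count from a syllable count and an assumed average."""
    return syllables / assumed_syllables_per_word


# Hypothetical figures chosen only for illustration.
ENGLISH_AVERAGE = 1.5        # assumed average built into the estimator
OTHER_LANGUAGE_AVERAGE = 3.0  # assumed true average for another language

true_words = 1000
syllables_heard = true_words * OTHER_LANGUAGE_AVERAGE

estimate = estimated_words(syllables_heard, ENGLISH_AVERAGE)
print(f"true words: {true_words}, estimated words: {estimate:.0f}")
```

Under those assumptions a child who heard 1000 words would be credited with 2000, so the count would be inflated by the ratio of the two averages.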

3. What is the invertive case of Kabardian? Is it the same thing as the adverbial case?

4. I just heard Kelly Clarkson pronounce Bebe Rexha as [ˈbiːbi ˈɹɛksə]. Ouch. Will Albanian xh = [dʒ] ever become common knowledge? Will it become fashionable to pronounce xh as [ʒ], a consonant that doesn't even exist in Albanian? Some English speakers seem to think [ʒ] is the foreign sound (e.g., Beijing as [bejˈʒɪŋ]), presumably due to its low frequency in English and its presence in French.


19.11.4.23:14: PRE-TANGUT *O BEFORE *CORONALS

1. Yesterday, I mentioned how pre-Tangut *o shifted to *y before *-r but not in open syllables in my interpretation of Jacques (2004: 206). Compare:

A similar *y-shift occurs before *-t (which is subsequently lost):

Other codas do not have y-reflexes in Tangut:

If *-oŋ and *-on existed, I don't know what their reflexes are.

If *-op never became *-yw, I can say that *o became *y before coronals *-t and *-r (and *-n, the nasal counterpart of *-t?). But why would *o lose its labiality and become achromatic (nonlabial and nonpalatal) y before coronals?

Perhaps the key lies in the fact that *i also underwent *y-shift in even more environments than *o: before *-Ø and *-p as well as before *coronals.

*vowel\*coda
*-Ø
*-k
*-ŋ
*-j
*-t
*-n
*r
*-p
*-m
*i
-y
-ew
?
?
-y
?
-yr
-y
-en
*o
-u
-o < *-ow
?
-o
-y
?
-yr
-ew
-on

Given how *o fronted before coronals in Lhasa Tibetan

I think this might have happened in Tangut:

pre-Tangut
*o-fronting
> *e
*eT > *iT
Tangut
*-ot
*-øt
*-et
*-it
-y
*-it
*-it
*-it
*-i
*-i
*-i
*-i
*-ir
*-ir
*-ir
*-ir
-yr
*-or
*-ør
*-er

*o-fronting in Tangut is like *o-fronting in Lhasa Tibetan, though the conditioning codas differ.

It is interesting that *o did not front (i.e., become palatal) before the palatal (and hence dorsal and noncoronal) coda *-j. *o was preserved before final glides: *-j and *-w < *-k.

The that resulted from*o-fronting has no relation to Grade IV o which I think might have been [ø]. Compare:

pre-Tangut
*o-fronting
> *e
Grading
*eT > *iT
Tangut
*CoT *CøT *CeT
*CeT1
*CiT1 Cy1
*CICoT
*CICøT
*CICeT
*CICeT4
*CICiT4 Cy4
*Cok
*Cok
*Cok
*Cok1
*Cok1 Co1
*CICok *CICok *CICok
*CICok4 [CICøk]
*CICok4 [CICøk] Co4 [Cø]

(I leave out Grades II and III for simplicity.)

In the above scenario, grades developed after > *e but before *e > *i > y:

By the time *e raised to high *i, height-driven grading was over, so *e1 would not become *i4; it became *i1, retaining its grade.

*ø > *e may have something to do with the *o > *e shift before *-p. The lack of parallelism between *-op and *-om might indicate that the latter no longer had a labial coda by the time *o dissimilated to *e before *-p (or did *-p already lenite to -w?).

Did *-om become a nasal vowel when *-op > *-ep/*-ew? Or did *-om become *-on after *o-fronting (so this new *-on never became *-øn)?

Later, *-eT (*T = *t or *r) merged with *-iT. That resulted in a large number of *-i(T) syllables which were all subject to the *i > y shift. I don't know whether that shift predated or postdated coda loss. (The -r in the Tangut column is not a coda.)

¹-r in unstarred Tangut forms indicates vowel retroflexion unlike pre-Tangut *-r which is a true liquid coda.

In yesterday's tables, none of the Tangut forms ended in -r because all of the forms had a preceding *S- at the pre-Tangut level. That *S- conditioned vowel tension (written as -q) which could not coexist with retroflexion: *SCVr > CVq (not ˣCVrq).

2. This is a strong statement (via Joanne Jacobs; emphasis mine):

Once a week, children wear a vest that includes a pocket for a listening device officials refer to as a “word pedometer.” The device, made by the nonprofit LENA, syncs with an online program that counts how many words the children hear each day, but it does not recognize which words are exchanged. The system works with any language, and it can differentiate between words broadcast by a TV or computer and those spoken by a person.

If the online program "does not recognize which words are exchanged", how does it 'know' what is and isn't a word? My fear is that it doesn't 'know'; it might be a syllable detector which indeed would work "with any language", and the word counts might be the number of syllables divided by some figure for the average number of syllables in an English word. Which might not be the average number of syllables of a word in some other language. And let's not even get into the issue of what counts as a word.
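A minimal sketch of that worry follows. It assumes, purely hypothetically, that the counter detects syllables and divides by a fixed average word length; nothing here is based on how the actual device works. The function name and both averages (1.5 and 3.0 syllables per word) are made-up illustrative values, not measurements of any language.

```python
def estimate_words(syllable_count, avg_syllables_per_word):
    """Hypothetical estimator: words = syllables / average syllables per word."""
    return syllable_count / avg_syllables_per_word

# Illustrative assumptions only, not measured figures.
ENGLISH_LIKE_AVERAGE = 1.5   # calibration the counter is imagined to use
LONGER_WORD_AVERAGE = 3.0    # a language whose words are on average longer

syllables_heard = 3000

# What the counter would report if it always used the English-like figure.
reported = estimate_words(syllables_heard, ENGLISH_LIKE_AVERAGE)

# What the same syllable stream would amount to in the longer-word language.
expected = estimate_words(syllables_heard, LONGER_WORD_AVERAGE)

overcount_factor = reported / expected
print(reported, expected, overcount_factor)
```

Under these assumed figures, the same stream of syllables would be reported as twice as many words as it really contains in the second language, even though the child heard exactly the same amount of speech.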

3. What is the invertive case of Kabardian? Is it the same thing as the adverbial case?

4. I just heard Kelly Clarkson pronounce Bebe Rexha as [ˈbiːbi ˈɹɛksə]. Ouch. Will Albanian xh = [dʒ] ever become common knowledge? Will it become fashionable to pronounce xh as [ʒ], a consonant that doesn't even exist in Albanian? Some English speakers seem to think [ʒ] is the foreign sound (e.g., Beijing as [bejˈʒɪŋ]), presumably due to its low frequency in English and its presence in French.


19.11.3.23:59: ABC COMPRESSION

1. Today I found the Korean-language YouTube channel of Oliver Ssaem. Ssaem is a compressed slang form of Korean sŏnsaengnim 'teacher'. I'll call that ABC compression: i.e., compression at both ends that turns a trisyllabic word into a monosyllable. Here's a case of Tangut ABC compression from Jacques (2014: 126; somewhat modified here; he reconstructs disyllabic *S-kar-u):

𗕐 1252 1kyq4 'to frighten' < *SI-kar-u

The A (*SI-) has left traces as -q4: i.e., tension (written as -q) and Grade IV (high and perhaps palatal?: y4 = [jɨ]?).

The C (*-u) has fused with *-ar into *-or which ultimately became -y. Normally *-r conditions vowel retroflexion, but Tangut vowels cannot be both tense and retroflex at the same time. It seems that if the conditioning factors for both tenseness and retroflexion coexist - as they do in this word - tenseness dominates. *-r must have been lost after *-o > *-u. Otherwise *-o after *-r-loss would have raised to *-u and become -wy, not -y.

11.4.17:28:

What I think actually happened to rhymes of *S-r syllables (omitting how *I conditioned Grade IV for simplicity)

pre-Tangut | *o-fusion | *y-shift | *o-raising | Tangut
*S-aru | *-or | *-yr | *-yr | -yq
*S-or | *-or | *-yr | *-yr | -yq
*S-o | *-o | *-o | *-u | -uq
*S-ur | *-ur | *-wyr | *-wyr | -wyq
*S-u | *-u | *-wy | *-wy | -wyq

*-o shifted to y (which was some sort of achromatic vowel: neither palatal nor labial) before *-r. *-u underwent a similar shift to *-wy with or without a following *-r. Following y-shift, pre-Tangut had no *-u, so *-o raised to fill that gap.

By the final stage (Tangut), *S- was lost after conditioning vowel tension (written as -q) at some earlier point, and *-r was lost. I have deliberately ignored the *S- > -q shift until the Tangut stage, since (1) I cannot date that change and (2) my interest here lies in the relative chronology of *o-raising and *r-loss.

*r-loss normally left behind retroflex vowels (which I still write with final -r for convenience), but in the above cases, *-r left no trace when preceded by *S-V because Tangut does not permit tense retroflex vowels: i.e., ˣVrq.

What would have happened if *o-raising occurred after *S-tension and *r-loss and before *y-shift

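pre-Tangut | *o-fusion | *S-tension/*r-loss | *o-raising | *y-shift/Tangut | correct?
*S-aru | *S-or | *-oq | *-uq | -wyq | no
*S-or | *S-or | *-oq | *-uq | -wyq | no
*S-o | *S-o | *-oq | *-uq | -wyq | no
*S-ur | *S-ur | *-uq | *-uq | -wyq | yes
*S-u | *S-u | *-uq | *-uq | -wyq | yes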

In the above incorrect scenario, the *-r that blocked *-o from *y-shift disappeared early, so *S-or and *S-o merged into *-oq which merged with *-uq and underwent *y-shift.

What would have happened if *o-raising occurred after *S-tension and *r-loss and after *y-shift

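pre-Tangut | *o-fusion | *S-tension/*r-loss | *y-shift | *o-raising/Tangut | correct?
*S-aru | *S-or | *-oq | *-oq | -uq | no
*S-or | *S-or | *-oq | *-oq | -uq | no
*S-o | *S-o | *-oq | *-oq | -uq | yes
*S-ur | *S-ur | *-uq | *-wyq | -wyq | yes
*S-u | *S-u | *-uq | *-wyq | -wyq | yes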

That's closer to what I think actually happened, but still not quite right.
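The three orderings compared above can be replayed mechanically. The following is only a sketch under stated assumptions: the function names are made up for illustration, rhymes are plain strings without the leading *-, tension is tracked as a flag that surfaces as -q, and *S-tension/*r-loss is placed last in the 'actual' ordering simply because its date is deliberately left open above.

```python
# Each sound change maps (rhyme, tense) -> (rhyme, tense).

def o_fusion(rhyme, tense):
    # *-aru fuses into *-or.
    return ("or" if rhyme == "aru" else rhyme), tense

def y_shift(rhyme, tense):
    # *o > y only before *-r; *u > wy with or without a following *-r.
    shifted = {"or": "yr", "ur": "wyr", "u": "wy"}
    return shifted.get(rhyme, rhyme), tense

def o_raising(rhyme, tense):
    # Bare *-o raises to *-u.
    return ("u" if rhyme == "o" else rhyme), tense

def s_tension_r_loss(rhyme, tense):
    # *S- conditions tension; *-r is lost without leaving retroflexion.
    return (rhyme[:-1] if rhyme.endswith("r") else rhyme), True

def derive(rhyme, changes):
    """Apply the changes in order and return the Tangut-style rhyme."""
    tense = False
    for change in changes:
        rhyme, tense = change(rhyme, tense)
    return "-" + rhyme + ("q" if tense else "")

ORDERS = {
    "actual": [o_fusion, y_shift, o_raising, s_tension_r_loss],
    "raising before y-shift": [o_fusion, s_tension_r_loss, o_raising, y_shift],
    "raising after y-shift": [o_fusion, s_tension_r_loss, y_shift, o_raising],
}

PRE_TANGUT = ["aru", "or", "o", "ur", "u"]  # rhymes of *S- syllables

for name, changes in ORDERS.items():
    print(name, {r: derive(r, changes) for r in PRE_TANGUT})
```

Only the first ordering keeps *S-aru and *S-or (-yq) apart from *S-o (-uq); the other two merge them, which is why they fail.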

I'd like to work out a full relative chronology of Tangut sound changes.

2. John McWhorter uses Hmong as an example of a language with lots of tones. Dananshan Miao (= Hmong) has eight tones. I think of Kam as the record-holder with nine tones, but I just learned that

Preliminary work on the Wobe language of Liberia and Côte d'Ivoire and the Chatino languages of southern Mexico suggests that some dialects may distinguish as many as fourteen tones, but many linguists believe that many of these will turn out to be sequences of tones or prosodic effects.

Wikipedia describes the fourteen tones of Wobé (the height classes are mine):

height class | level | rising | falling | rising-falling
V | 55 | 35 | 51 | -
IV | 44 | 34 | 41 | -
III | 33 | 25 | 31 | -
II | 22 | 24 | 21 | 231
I | - | 23 | - | -

Wikipedia describes a ten-tone variety of Chatino (converted to a 4 = high/1 = low scale; again, the height classes are mine):

height class | level | rising | falling
IV | 4 | - | 43
III | 3 | 34 | 32
II | 2 | 23 | 31
I | 1 | - | 21

I wonder what the frequency of each tone is.

(11.4.16:52: I forgot to mention this hypothesis:

tone languages are less likely to develop in dry environments because dry air deprives the vocal cords of the suppleness required to produce subtle differences in tone.)

3. When did ants first arrive in Hawaii? Ants were first scientifically recorded in Hawaii in 1879. Yet there are several native words for 'ant' which appear to be related to each other:

naonao, nonanona, ʻānonanona

There was a newspaper, Ka Nonanona 'The Ant' (1841-5), named after this saying:

E hele ʻoe i ka ʻānonanona, e nānā i kona ʻaoʻao e hoʻonaʻauao iho.

'Go to the ant, study her ways and learn.'

Did Europeans (unintentionally, of course) bring ants to Hawaii? Or did Polynesians do so long before Captain Cook?

None of the Hawaiian words for 'ant' have an etymology even though words for 'ant' are reconstructible at the Proto-Oceanic and higher levels. I suppose they are unique to Hawaiian.

I'm reminded of the native word for horse, lio. Horses definitely weren't here before contact with Europeans. So how did they get a native name? I forgot this etymology at wehewehe.org:

a shortening of ʻīlio, formerly a generic name for quadrupeds

I prefer that proposal to one identifying the word as an extended usage of lio 'tight, taut, as a rope, or of hair or horse's ears pulled back tightly'.


19.11.2.23:59: MCWHORTER'S SIMPLIFICATION HYPOTHESIS

1. Is this why Tangut is simpler than Japhug?

When a language seems especially telegraphic, usually another factor has come into play: Enough adults learned it at a certain stage in its history that, given the difficulty of learning a new language after childhood, it became a kind of stripped-down "schoolroom" version of itself. Because all languages, are, to some extent, busier than they need to be, this streamlining leaves the language thoroughly complex and nuanced, just lighter on the bric-a-brac that so many languages pant under.

I'm surprised McWhorter thinks Mandarin has tense:

Mandarin can mark tense but often doesn’t

2. When machines judge machines (emphasis mine):

A Google AI research team recently published the paper Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges, proposing a universal neural machine translation (NMT) system trained on over 25 billion examples that can handle 103 languages.

[...]

There are a couple of limitations in the experimental results. First, it is not clear which points in the figures correspond to which languages, so it is hard to get finer-grained takeaways about which types of languages are benefitting from this type of training. Second, there are no qualitative results or translation examples, only results measured using automatic measures such as BLEU score. Because of this it is hard to tell which of these systems have reached a practical level.

(11.3.21:15: What is BLEU?

BLEU (bilingual evaluation understudy) is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Quality is considered to be the correspondence between a machine's output and that of a human: "the closer a machine translation is to a professional human translation, the better it is" – this is the central idea behind BLEU.)

3. Surprising kanji readings of the day:

3a. 日暮 Nippori (even though 暮 isn't -pori or hori < *p- elsewhere). The spelling 日暮里 has a clarifier 里 ri at the end to indicate that 日暮 isn't read higure or higurashi with normal readings for both 日 and 暮.

(11.3.20:55: Is the reading pori for 暮 rooted in the normal reading bo for 暮? Is 日暮 Nippori a clipped version of 日暮里, whose three parts in isolation would be read as nichi, bo, and ri?)

3b. 麻布 Azabu (even though 麻 is normally asa, not aza). I can't think of any other case of voicing being ignored in the middle of a reading to write a similar-sounding name.


19.11.1.22:41: FAN VS. HAN

Words for 'outer' groups of people can be hard to explain and translate. Using a local example, is haole always pejorative?

Shao-yun Yang's "Fan and Han: The Origins and Uses of a Conceptual Dichotomy in Mid-Imperial China, ca. 500-1200" (2014) is of interest to TJK (Tangut/Jurchen/Khitan) scholars:

However, the dichotomy eventually became ethnic in the Kitan [= Khitan] Liao and Western Xia [= Tangut Empire], where Han reverted to being an ethnonym for the "Chinese." Our understanding of the word Fan as used in the Kitan empire remains incomplete, but one of its uses was as a synonym for Kitan. Similarly, the primary use of Fan in the Xia was as a synonym for Mi, the ruling Tangut people’s most common self-appellation. [...] Meanwhile, the Jin  [= Jurchen Empire] revived the use of Han as an ethnonym for the Chinese in the North China Plain, but banned the use of Fan as an appellation for the ruling Jurchen and their language in 1191¹—possibly as a way of asserting the political legitimacy of Jurchen rule over north China.

The article makes me think how dangerous it can be to be wedded to the simplistic tag translations that I and pretty much everyone else use for Chinese morphemes: e.g., <BARBARIAN> for 蕃 ~ 番 Fan and <CHINESE> for 漢 Han. There is an unspoken assumption that the semantics of Chinese morphemes are just as 'unchanging' as character forms, but this is not so - meanings varied over time and space. In this respect, Chinese is no different from any other language or language family. What is different about Chinese is the illusion of semantic stability implied by the graphic stability of character forms. Chinese has the world's most conservative orthography. It would be nice to see a dictionary that goes beyond the simple dichotomy of premodern/literary vs. modern/colloquial (Mandarin) and says that, for instance, 蕃 ~ 番 meant this in one time and place, that in another time and place, etc.

¹11.2.22:29: I don't think it's a coincidence that the ban occurred during this period described by Kane (2009: 3-4):

In 1191 the Jin emperor Zhangzong ordered that Jurchen should be directly translated into Chinese [rather than via Khitan]. Clerks of the Department of National Historiography who knew only the Kitan script[s] were dismissed. In 1192, the position of Kitan secretary was abolished in all ministries. [...] The Kitan script[s were] abolished by the Jin Emperor Zhangzong in 1191-1192.

The abolishing of Khitan (a 'Fan' language) was part of the trend of distancing the Jurchen from the 'Fan' category.


19.10.31.22:41: XIAOXUETANG READINGS OR TRANSLATIONS?

I've been looking in 小學堂 Xiaoxuetang for w/v-readings of 扔 <THROW> cognate to Cantonese wing1. So far the only such reading I've found is 斗門 Doumen Yue veŋ1. No w/v-readings in Hakka or Pinghua varieties.

Is "readings" the right word? Looking at the Pinghua entry for 扔 <THROW>, I see pronunciations for morphemes not cognate to the pan-Chinese¹ root 扔 *ɲ̊iŋ:

As far as I know, no Pinghua varieties are written. So I wonder how the above data were elicited. Were speakers asked to translate Mandarin 扔 reng1 'to throw' into their native languages?

In 1992 I had difficulty eliciting Taiwanese readings for Chinese characters from a Taiwanese speaker. In some cases she only knew the Mandarin reading for a character. That seems inevitable if one is only educated in Mandarin and isn't taught to read lower-frequency and literary characters in one's own language. I wouldn't expect her to know the Taiwanese reading of, say, the literary perfective particle 矣. How did Xiaoxuetang get Pinghua readings of 矣?

11.1.22:23:22: The ROC Ministry of Education Taiwanese dictionary lists ah as the reading of 矣, but I think ah is a native Taiwanese word 'anterior aspect particle' (as defined by Philip T. Lin [2015]) written semantically with a character 矣 <PERFECTIVE> originally for an unrelated particle whose Taiwanese pronunciation should be something like i. (Xiaoxuetang says the reading in 漳州 Zhangzhou, a close mainland relative of Taiwanese, is i.) ah is to i what Pinghua pʰi, sa, and tiu are to Mandarin reng1 and other reflexes of pan-Chinese *ɲ̊iŋ: a (loose) semantic equivalent rather than a cognate. I would say that ah is probably a Taiwanese reading of 矣 which I imagine has another i-like reading used when literary Chinese texts are read out loud in Taiwanese (something that probably doesn't happen much anymore - hence the absence of an i-like reading in the government dictionary). I doubt scholars in premodern Taiwan read 矣 as ah.

I wonder how old the practice of writing ah as 矣 is. A historical dictionary of Taiwanese orthography showing all the competing spellings and their dates would be fun.

I was surprised that the Maryknoll Taiwanese dictionary (now in Excel as well as PDF format!) doesn't have an entry for the particle ah.

¹I use the term 'pan-Chinese' for a form inferred from modern forms rather than based on philological data like 'Middle Chinese' or 'Old Chinese'.


19.10.30.23:59: KIRWANDAN = KINYARWANDA?

1. From one IMDb biography for Stephanie Katherine Grant:

There aren't many thirteen year olds who speak Greek, French and Kirwandan

Is Kirwandan another name for Kinyarwanda? I wouldn't have guessed how <rw> is pronounced in Kinyarwanda.

(10.31.20:55: Lovely, I might be able to answer my question if I paid $480/year so I could see the Ethnologue entry for Rwanda.)

2. I was surprised by this entry in Chalmers and Dealy's English and Cantonese Dictionary, Volume 2 (1907: 809) for two reasons.

X and Y, (algebraic symbols) 天元 t‘in uen, 天地 t‘in tiʾ.

First, the method of marking tones:

tone | sonorant-final syllable | stop-final syllable
1 | italics | italics
2 | ʿitalics | -
3 | italicsʾ | italicsʾ
4 | roman | -
5 | ʿroman | -
6 | romanʾ | roman (no ʾ!)

(10.31.0:33: I can't find volume 1 which presumably had a key to the tone system, so I've had to reverse-engineer the notation for entries in volume 2.)

Second, how 'X' is 天元 <HEAVEN ORIGIN> and 'Y' is 天地 <HEAVEN EARTH>. Who came up with those equivalents?

3. I was also surprised to see that Eitel's A Chinese dictionary in the Cantonese dialect, Part 1 (1877: xiii) says Cantonese has an u : uu distinction in addition to an a : aa distinction. Only the latter survives a half-century later (and strictly speaking, its members are distinguished by quality as well as length: [ɐ] : [aː] - but were they only distinguished by quality in Eitel's time?).


Tangut Yinchuan font copyright © Prof. 景永时 Jing Yongshi
Tangut character image fonts by Mojikyo.org
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2018 Amritavision