Amaravati: Abode of Amritas

19.10.12.23:49: KHITAN CHICKENS, STABLES, AND ALTERNATORS

(Posted 19.10.16.)

1. I've long been bothered by the Khitan word for 'chicken', written as

(~?)

<t.qo.a> (~ <CHICKEN>?)

I have put the large script character in parentheses, since I do not know of any evidence for its pronunciation. It is parsimonious to assume that it was pronounced like <t.qo.a> in the small script.

How was <t.qo.a> pronounced? That's half of my problem. The other half is how that pronunciation relates to other words for 'chicken' in continental 'Altaic' (which I regard as a language area, not a language family).

Kane (2009: 88) reads <t.qo.a> as teqoa with an inherent vowel e. This reading has a nonharmonic e-a sequence that is unusual for a continental 'Altaic' language.

Shimunek (2017: 372) reads the small script spelling as <t.aq.a>. It is certain that the second symbol stood for something absent from Chinese, though what that something was is uncertain.

Neither interpretation is a simple match for other words for 'chicken' in the area:

Old Uyghur takığuː (from Clauson [1972: 468]; see that entry for other Turkic forms)
Middle Mongolian takiya (from Shimunek [2017: 372]; see that entry for other Mongolic forms)
Jurchen tiho
Manchu coko

Here's an attempt to make sense out of most of that, taking Kane's interpretation as an endpoint in Khitan:

1. The earliest form of the word was something like *taqɯʁu. I don't know whether it originates from Turkic, Serbi-Mongolic, or a third language in contact with them (Ruanruan? Xiongnu?).

2. The vowels metathesized in pre-Khitan: *tɯqaʁu.

3. Medial *ʁ lenited to zero: *tɯqaʁu > *tɯqau.

4. *au metathesized: *tɯqau > *tɯqua. (I thought all au in Khitan were either from *aCu or in Chinese loanwords, but what of au in taulia 'hare' corresponding to Mongolian taulai? Maybe only root-final *au metathesized.)

5. In an unwritten eastern Khitan dialect, *tɯqau or *tɯqua became *tiqo.

6. That *tiqo was borrowed into pre-Jurchen which then lenited intervocalic *q to h: *tiqo > tiho. (It's also possible that pre-Jurchen borrowed Khitan *ɯ as *i, so maybe the Khitan source form retained *ɯ.)

(10.13.1:23: I briefly considered the possibility that *au monophthongized within Jurchen: *tiqau > tiho. But *au should correspond to Manchu oo which is not in Manchu coko. So I think the common Khitan source of the Jurchen and Manchu words already ended in -o.)

7. In the written Khitan dialect, *tɯqua became teqoa with high vowels lowering under the influence of a.

8. One Jurchen dialect shifted *tiqo to *tyoqo. *ty then palatalized to c, resulting in Manchu coko *[tɕʰɔqʰɔ] (later [tʂʰɔqʰɔ] with a Mandarin-like retroflex [tʂ]). (But I cannot explain why Manchu has -k- instead of -h-. Manchu intervocalic -k- should be from a *cluster, not a simple *-q-.)
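As a rough check on the ordering of steps 2-7, here is a minimal Python sketch that applies them as string rewrites to *taqɯʁu. The forms are the ones given above; the rule lists, the `derive` helper, and the encoding of each change as plain string replacement (rather than real phonological conditioning) are my own illustrative assumptions.

```python
from typing import Callable, List, Tuple

# Each rule is (label, rewrite function). The labels refer to the numbered
# steps above; the rewrites are deliberately crude string operations.
Rule = Tuple[str, Callable[[str], str]]

WRITTEN_DIALECT: List[Rule] = [
    ("2. vowel metathesis", lambda w: w.replace("taqɯ", "tɯqa")),
    ("3. medial ʁ > zero", lambda w: w.replace("ʁ", "")),
    ("4. root-final au > ua", lambda w: w[:-2] + "ua" if w.endswith("au") else w),
    ("7. lowering before a", lambda w: w.replace("ɯ", "e").replace("ua", "oa")),
]

EASTERN_DIALECT: List[Rule] = [
    # Step 5 allows either *tɯqau or *tɯqua as the input.
    ("5. eastern dialect", lambda w: w.replace("ɯ", "i").replace("ua", "o").replace("au", "o")),
    # Step 6: the only q in the word is intervocalic, so a blanket
    # replacement is enough for this one form.
    ("6. pre-Jurchen q > h", lambda w: w.replace("q", "h")),
]

def derive(form: str, rules: List[Rule]) -> List[str]:
    """Apply the rules in order and return every intermediate form."""
    history = [form]
    for _label, rewrite in rules:
        form = rewrite(form)
        history.append(form)
    return history

written = derive("taqɯʁu", WRITTEN_DIALECT)
eastern = derive("tɯqua", EASTERN_DIALECT)
print(" > ".join(written))
print(" > ".join(eastern))
```

The sketch only confirms that the steps, applied in the stated order, get from *taqɯʁu to teqoa and from *tɯqua to tiho. It says nothing about whether the changes themselves are plausible, and step 8 (Manchu coko) is left out because I cannot explain its -k-.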

Shimunek (2017: 372) reconstructs Proto-Serbi-Mongolic *tʰakʰɪɣa 'chicken'. Presumably *-ɪɣa was reduced to -a in his Khitan.

Shimunek (2017: 372) thinks Middle Korean ᄃᆞᆰ <tărk> [tʌrk] 'chicken' is also part of the same word family.

Old Japanese təri 'bird' has been linked to that Korean word (perhaps most recently by Francis-Ratte [2016: 211] who regards them as genetic cognates).

I suppose one could regard Korean and Japanese r as attempts to imitate a foreign *ʁ. I can't explain the Korean and Japanese vowels.

2. <t.qo.a> 'chicken' is an example of what I call a stable in Khitan. I propose grouping obstruent-initial words in Khitan into two categories, stables and alternators. Stables are always written consistently with the same type of consonant, whereas alternators alternate.

class	stable	alternator
K	<ku> 'person'	<k.ai> ~ <x.ai> 'open' < 開 *kʰaj
X	<x.s> 'region'	<k.ai> ~ <x.ai> 'open' < 開 *kʰaj
G	<go.er> 'tent'	(none)
C	<c.i.is> 'blood'	<c.ur.er> ~ <j.ur.er> 'second'
J	<ju.un> 'summer'	<c.ur.er> ~ <j.ur.er> 'second'
T	<t.qo.a> 'chicken'	<t.ur.er> ~ <d.ur.er> 'fourth'
D	<da.lV?> 'seven'	<t.ur.er> ~ <d.ur.er> 'fourth'
P	<p.o(.o)> 'monkey'	<p.u> ~ <b.u> 'to be' (Kane 2009: 156)
B	<b.qo> 'son'	<p.u> ~ <b.u> 'to be' (Kane 2009: 156)

(19.10.16.0:57: Chinese loanwords seem to generally be stable, though there are exceptions like the syllable <k.ai> ~ <x.ai> 'open' in the table above.)
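The stable/alternator split can be stated mechanically: a word is a stable if all of its attested spellings begin with the same consonant letter, and an alternator otherwise. Here is a minimal Python sketch using a few rows from the table. The `ATTESTED` dictionary and the `classify` function are hypothetical names of mine, and taking the first letter of the transliteration as "the initial consonant" is a simplifying assumption.

```python
# Attested small script spellings (transliterated) for a few words
# from the table above.
ATTESTED = {
    "person": ["ku"],
    "open": ["k.ai", "x.ai"],
    "blood": ["c.i.is"],
    "second": ["c.ur.er", "j.ur.er"],
    "chicken": ["t.qo.a"],
    "fourth": ["t.ur.er", "d.ur.er"],
    "son": ["b.qo"],
    "to be": ["p.u", "b.u"],
}

def classify(spellings):
    """Return 'stable' if every spelling shares one initial letter."""
    initials = {spelling[0] for spelling in spellings}
    return "stable" if len(initials) == 1 else "alternator"

for gloss, spellings in ATTESTED.items():
    print(f"{gloss}: {classify(spellings)}")
```

A word with only one attested spelling is trivially a stable under this definition, so the label is only as good as the corpus behind it.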

I think the stables mostly have unaspirated-aspirated oppositions:

series	velar	palatal	dental	labial
1	k	c	t	p
1	/kʰ/	/cʰ/	/tʰ/	/pʰ/
2	g	j	d	b
2	/k/	/c/	/t/	/p/

I've omitted x /x/ which doesn't fit into the above paradigm. The shift of k > x is also in Jurchen and Manchu (under Khitan influence?).

The alternators are a mystery:

Are the alternations specific to certain environments: e.g., do /aspirates/ deaspirate between voiced segments? (But why don't all /aspirates/ deaspirate?) 'Second', 'fourth', and 'to be' are unlikely to have /aspirates/ since their Mongolian cognates begin with unaspirated j-, d-, and b-. But ... why would /nonaspirates/ surface as [aspirates]? Was Khitan like English with word-initial [aspiration]?
Could the alternators have a third obstruent series without any unique spellings? The small script was made under Uyghur influence. The Old Uyghur script had only two consonant series (though its Aramaic source had three). Are the two series of the script an artifact of Uyghur? Might Mongolic have reduced three series of consonants to two?

3. I didn't see Andrew West's latest post until just now. Two points:

3a.

This sequence of three characters 没蜜施 does not make any sense as Chinese, but are here used to transcribe the Old Uighur word bolmïš "to have become" (from bol- "to be, to become" plus perfect participle -mïš) or bulmïš "to have received" (from bul- "to find, to get, to receive" plus perfect participle -mïš) which both occur in the titles of nine Uighur khans between 747 and 848, as recorded in the Old Book of Tang (舊唐書 Jiù Tángshū) and New Book of Tang (新唐書 Xīn Tángshū).

Why were bol- and bul- both written with the same Chinese character 没 (now read with m- in most Chinese varieties, not b-!)?

In the Tang prestige dialect, 没 was pronounced something like *mbor.

Idealized Middle Sino-Korean 모ᇙ〮 <morʔ·> preserves *-r.
Kan-on botsu < Old Japanese Kan-on *mbot preserves the stop part of the initial, though it lacks a final liquid because it was borrowed before *-t > *-r.
Old Uyghur transcriptions of Chinese render Tang prestige Chinese *mb- as <m> or, once in the case of 穆, <p>. (Unfortunately I do not know of any Uyghur transcription of 没.)
Some Chinese varieties may still preserve the *stop part of the initial (unless they denasalized the initial independently):

Toisanese and a few Yue varieties have mb-
Some Min varieties and the Hakka of Hong Kong have b-

The Tang prestige dialect no longer had *b-, so *mb- was the closest available approximation of Old Uyghur b-.

The Tang prestige dialect had no *-l, so *-r was the closest available approximation of Old Uyghur -l.

没 *mbor is the best possible approximation of Old Uyghur bol- 'to be'. But why wasn't Old Uyghur bul- 'to find' transcribed as *mbur? Because the Tang prestige dialect lacked that syllable: earlier *mut had become *mvur instead of *mbur. *mb- could not occur before *u in the Tang prestige dialect. So 没 *mbor had to do double duty for Old Uyghur bol- 'to be' and Old Uyghur bul- 'to find'.

3b.

There is some confusion over the two words bolmïš "to have become" and bulmïš "to have received" as they are both written the same in the Old Uighur and Old Turkic scripts, and are both transcribed as 没蜜施 or 没密施 mò mì shī in Chinese (没 is pronounced mut⁶ in Cantonese), which suggests to me that they should be the same word.

Clauson (1972: 332) says early Turkic bol- and bul- were "normally indistinguishable graphically". I assume the different vowels have been projected backward from the modern languages: e.g., modern Turkish has olmak 'to be' (with irregular b-loss) and bulmak 'to find'.

3c. Why did 蜜 *mbir ~ 密 *mbɨir transcribe an Old Uyghur open syllable mï? In theory, 麋糜靡 *mbɨi would have been better fits, but 蜜 and 密 are more frequent characters, so maybe they came first to the transcriber's mind. (No, in fact, 糜靡 are more frequent than 蜜 in Jun Da's premodern corpus!)

4. Japanese amefurashi 'sea hare' has a surprising spelling (in addition to boring ones: アメフラシ, 雨降らし, 雨降):

雨虎 <RAIN TIGER>

ame-fur-ash-i is literally 'rain-fall-CAUS-ing'. ame obviously corresponds to 雨 <RAIN>, but furashi has nothing to do with 虎 <TIGER>. (It is a coincidence that 虎 is pronounced fu2 in Cantonese.)

Are there any native Chinese, Korean, or Vietnamese words for 'sea hare'? Do the words I found in Wikipedia count, or are they loan translations of Latin lepus marinus (via sea hare or some similar European term)?

C 海兔 'sea hare'
K 바다 토끼 pada thokki 'sea hare'

tho may be borrowed from Chinese 兔

V thỏ biển, lit. 'hare sea' with Vietnamese modified-modifier order

thỏ is borrowed from Chinese 兔

There are five nôm spellings for biển 'sea' at nomfoundation.org:

graph	radical	phonetic	phonetic reading	vowel	tone	coda	points
㴜	氵<WATER>	扁	biển	✓✓	✓✓	✓	5
𣷷	氵<WATER>	变	biến	✓✓	✓	✓	4
𤅶	氵<WATER>	變	biến	✓✓	✓	✓	4
汴	氵<WATER>	卞	biện	✓✓	✕	✓	3
𣷭	氵<WATER>	彼	bỉ	✓	✓✓	✕	3

The points indicate the quality of matches: i.e., the number of check marks.

㴜 is the most straightforward; the phonetic 扁 is a perfect match.

𣷷𤅶 have a phonetic with a mismatching tone in the same *register as the tone of biển.

汴 has a phonetic with a mismatching tone in a register differing from that of the tone of biển.

𣷭 has a phonetic with a matching tone, but the vowel of 彼 is a monophthong instead of a diphthong, and 彼 has no -n. Did 𣷭 originate as an error for 𣷷?
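The point totals in the table can be recomputed by counting check marks. The sketch below does that in Python. It assumes that the blank vowel and coda cells share the marks of the first row (✓✓ and ✓), which is what the totals imply; the `MARKS` dictionary and `points` function are names I made up for illustration.

```python
# Check marks per criterion for each nôm graph: 2 = ✓✓, 1 = ✓, 0 = ✕.
# Blank cells in the table are assumed to repeat the first row's value.
MARKS = {
    "㴜": {"vowel": 2, "tone": 2, "coda": 1},  # phonetic 扁 biển
    "𣷷": {"vowel": 2, "tone": 1, "coda": 1},  # phonetic 变 biến
    "𤅶": {"vowel": 2, "tone": 1, "coda": 1},  # phonetic 變 biến
    "汴": {"vowel": 2, "tone": 0, "coda": 1},  # phonetic 卞 biện
    "𣷭": {"vowel": 1, "tone": 2, "coda": 0},  # phonetic 彼 bỉ
}

def points(marks):
    """Total score = number of check marks across all criteria."""
    return sum(marks.values())

for graph, marks in MARKS.items():
    print(graph, points(marks))
```

Note that the two 3-point graphs fail in different ways: 汴 loses its points on tone alone, whereas 𣷭 loses them on vowel and coda.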

19.10.11.14:55: EYE OF THE ARAB

(Posted 19.10.16.)

That's what I thought عين العرب `Ayn al-`Arab meant at first. It's actually 'Spring of the Arab' - the water sort of spring. عين `ayn has multiple meanings. `Ayn al-`Arab is also known as Kobanî or Kobane:

Nobody disputes that the town [of Kobane] is a relatively new settlement. Before the 20th century, it was just a water meadow where even great commanders like Saladin used to feed the horses of his army. For a long time, it was referred to as Arab Punarı (“Arab Spring” in Turkish).

Muhsin Kızılkaya, a writer of Kurdish origin, told private Turkish broadcaster CNN Türk on Oct. 13 that Kobane was not even a small village at the turn of the century. “The Germans set a small station there while building the Baghdad Railway. A new settlement was developed around the construction and locals called it Kobane, in reference to the German ‘company’ that built a road in the area,” he said.

The rendering of “company” as “Kobane” seems logical at first glance, considering the fact that both Kurds and Arabs adapt many Western words by changing the letter “m” to “b.”

Really? Initially I assumed that the p of German Kompanie was Arabized as b since standard Arabic has no p. Kurdish has p, and both Kurdish and Arabic have m, so there would be no motivation for changing m to b.

(The correspondence of mp to b reminds me of how Japanese b is from *mp: e.g., in 旅人 tabibito < *tambimpitə 'traveller'. See yesterday's entry.)

The Hurriyet Daily News article points out another problem:

Historically, however, the “company theory” sounds weak, as Germans use the word “Gesellschaft” for business companies. “Kompanie,” on the other hand, refers to military units. [See Wiktionary.]

Others have suggested that the middle part of the name Kobane could come from the German word “bahn” (road). In fact, Anatolische Eisenbahn, a German company, built the landmark Baghdad Railway, which some historians see as one of the causes of the First World War.

But if -bane is from German Bahn, what is Ko-?

I'm amazed that such a recent name has no certain etymology.

19.10.10.23:51: TANGUT DATABASE 4.0

1. Version 4.0 of my Tangut database includes a new Unicode column and has corrected data for 11 entries thanks to Andrew West. Details in the changelog on sheet 2.

2. I never gave any thought to the d in Japanese 仲人 nakōdo 'matchmaker' (cf. naka 'middle' and hito 'person') and 狩人・猟人 karyūdo 'hunter' (cf. kari 'hunting' and hito 'person') until I learned today that 旅人 <TRIP PERSON> could be read as tabyūdo 'traveller' (just one of six possible readings!). Wiktionary gives the following derivation:

*/tapiputo/ → */tabibuto/ → /tabiudo/ → /tabjuːdo/

Here's my derivation:

Stage 1: *tambi nə pitə 'trip GEN person'

Stage 2: *tambinpitə

The genitive marker *nə is reduced to *n.

Stage 3: *tambimpitə

*n assimilates to the following *p. (This stage and the previous one were probably simultaneous, as I doubt there was ever a period in which */np/ and */mp/ were distinct. I think *n assimilated as soon as the following vowel was lost.)

Stage 4: *tambimbitə

The earliest attested form from Man'yōshū (but written semantographically as 客人 <GUEST PERSON>; if not for the reading tradition, I would never have guessed that 客 was read tabi).

*p voices to *b after voiced *m.

The regular reflex of this word survives in modern Japanese as tabibito, the most common reading of 旅人.

Stage 5: *tambimbũto

*ə rounds to *o.

Irregular assimilation of *i to the preceding prenasalized labial stop *mb.

Stage 6: *tambiũndo

The nasalization spreads from the vowel to the following stop.

Stage 7: *tabjuːdo

Voiced stops lose prenasalization.

*i is reduced to *j and its length is transferred to the following *u.

Stage 8: [tabjɯːdo]

*u loses its rounding.

(10.11.14:55: Stages 2-3 added and stage 4 edited.)

3. If Hawaiian mōʻī 'king' is "of recent origin", not in print until 1832, where did it come from? wehewehe.org proposes a link to ʻī 'supreme'. I would expect mōʻī to be a noun-adjective phrase 'supreme mō'. But what is mō? The short form of moku 'district'?

It seems mōʻī got 'promoted' over time: 19th century attestations mean 'temple image', 'lord of images', and 'a rank of chiefs who could succeed to the government but who were of lower rank than chiefs descended from the god Kāne'. Cf. how the Xiongnu title 'crown prince' transcribed in Late Old Chinese as 護于 *ɣwah-wɨa (phonetically [ʁwɑχwɨa]¹?) may be the source of the later Altaic title qaghan 'supreme ruler' (see Vovin 2007 on its etymology).

¹I speculate that uvular [χ] is an allophone of final /h/ in 'type A' syllables which are characterized by lower, backer vowels (like [ɑ] for /a/) and uvulars. The use of *ɣ for what I think was [ʁ] is out of habit and in accordance with tradition.

It is interesting that the second syllable of 護于 *ɣwah-wɨa is a 'type B' syllable which is characterized by higher, less back vowels (like *ɨa < *a). That suggests the original Xiongnu word had a mix of vowel types and that the Xiongnu language did not have vowel harmony like Altaic (or, I think, Early Old Chinese). A very un-Altaic *ʁwɑχwa² with two different vowels may have been simplified to qaghan (i.e., two syllables with the same vowel) in Turkic and Serbi-Mongolic (e.g., Khitan qagha) via Ruanruan.

²Late Old Chinese had no syllable *wa, so 于 *wɨa might have been an approximation of a Xiongnu *wa.

4. 寄席 <GATHER SEAT> yose 'traditional Japanese verbal entertainment theater' is an interesting case of an abbreviation in speech but not in writing. Naver regards it as an abbreviation of 寄せ席 yoseseki 'gather-seat'. seki 'seat' is no longer pronounced in yose, but its character 席 remains in spelling.

19.10.9.22:26: KHITAN SMALL SCRIPT CHARACTERS 108, 110, AND 111

(Posted 19.10.11.)

Could the low-frequency Khitan small script characters

110 and 111

be variants of the high-frequency character

254 <d>?

If 110 is a variant of 254, then perhaps the low-frequency character

108

is a variant of

107 ~ 347 <oi>.

Let's see if any instances of 110 and 111 are in environments matching those of 254.

In 契丹小字研究 Research on the Khitan Small Script (1985), 110 appears only in second position, unlike 254, which has no such restriction. Is that significant? Could that imply that 110 is a vowel character that must follow an initial consonant character?

text	block	transliteration	254 match?
Xuan 19.24, Zhong 6.6, 13.27, 41.26, 46.4	021.110	mo.110	none?
Zhong 41.12, Zhong 44.44	021.110.140	mo.110.en	Dao 25.13

<mo.110.en> looks like a genitive of <mo.110>. If 110 is <d>, then the above words are mod and mod-en. Could mod in turn be a variant of 021.247 <mo.t> 'woman-PL'? Both -d and -t are attested as plural suffixes after vowel-final nouns (Kane 2009: 138-140).

text	block	transliteration	254 match?
Xu 51.44	111	111	Gu 6.30
Zhong 28.9	028.111.339.100	sh.111.i.en	no

111 is followed by 241 <pu> and resembles

374 <tai>.

Could 111 241 be <tai pu> for Liao Chinese 太傅 *tʰajfu 'Grand Tutor'?

254 appears by itself where 254.122 <d.ai> for Liao Chinese 大 *taj would be expected. I suspect 254 is an error for 254.122 and not a true standalone character like 374 (and 111?).

028.111.339.100 is the spelling on pp. 607 and 705 of Research on the Khitan Small Script, but p. 178 has 028.110.339.100.

108 is always in second or third position. Is it in any environments where 107 and/or 347 are attested?

text	block	transliteration	107 match?	347 match?
Xing 4.13, Dao 7.24, 19.8, Xuan 6.20	131.108	u.108	none?	none?
Dao 28.11	021.108.261.112.020	mo.108.l.ge.ei
Xuan 27.26	162.327.108.140	c.ie.108.en
Gu 15.6, 15.8	104.327.108	j.ie.108
Zhong 43.36	131.108.254	u.108.d

<u.108.d> could be a plural of <u.108>.

<mo.108.l.ge.ei> looks like a verb + -lge- causative/passive + converb -ei sequence. Interpreting 108 as <oi> would work nicely there: moilgei? But <oi> would result in awkward vowel sequences elsewhere: e.g., cieoien? Could 108 represent a CV syllable absent from Chinese loanwords?
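The positional claims about 110 and 108 are easy to recheck against the blocks in the tables above. Here is a minimal Python sketch; the `BLOCKS` list contains only the numeric block spellings cited in this post (not the full corpus), and `positions` is a hypothetical helper name.

```python
# Numeric block spellings cited in the tables above
# (character codes separated by dots).
BLOCKS = [
    "021.110",
    "021.110.140",
    "131.108",
    "021.108.261.112.020",
    "162.327.108.140",
    "104.327.108",
    "131.108.254",
]

def positions(char, blocks):
    """Return the set of 1-based positions at which char occurs."""
    found = set()
    for block in blocks:
        for index, code in enumerate(block.split("."), start=1):
            if code == char:
                found.add(index)
    return found

print("110:", sorted(positions("110", BLOCKS)))
print("108:", sorted(positions("108", BLOCKS)))
```

On this small sample, 110 turns up only in second position and 108 only in second or third position, as stated above. A real test would need every block in Research on the Khitan Small Script.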

19.10.8.20:44: ÖCALAN

(Posted 19.10.11.)

1. What is the etymology of the surname of Kurdish leader Abdullah Öcalan? It doesn't look Kurdish since it has the vowel ö absent from Kurdish. The vowel ö is characteristic of Turkish (and Wiktionary identifies the name as Turkish), but the vowel sequence ö-a violates Turkish vowel harmony. (Hypothetical harmonic names would be Ocalan and Öcelen.) Such violations are possible in compounds and loanwords, but I cannot find any Turkish words ö, öc, or calan that would enable me to analyze the name as ö-calan or öc-alan. (There is a Turkish word alan.)
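The harmony violation can be checked mechanically. Below is a minimal Python sketch of Turkish front/back (palatal) harmony only; it ignores rounding harmony, compounds, and loanwords. The function name `is_backness_harmonic` is my own, not any library's.

```python
# Turkish vowels grouped by backness.
FRONT = set("eiöü")
BACK = set("aıou")

def is_backness_harmonic(word):
    """True if all vowels in the word are front or all are back."""
    # Caveat: str.lower() maps "I" to "i", not Turkish dotless "ı",
    # so words with capital I would need special handling.
    vowels = [ch for ch in word.lower() if ch in FRONT or ch in BACK]
    return all(v in FRONT for v in vowels) or all(v in BACK for v in vowels)

for name in ["Öcalan", "Ocalan", "Öcelen"]:
    print(name, is_backness_harmonic(name))
```

Öcalan fails because front ö is followed by back a, whereas the hypothetical Ocalan (all back) and Öcelen (all front) both pass.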

2. "Areal developments in the history of Iranic: West vs. East" (2018) by Martin Joachim Kümmel looks like a handy all-in-one-place reference for the big picture of Iranic historical phonology.

I'm going to start using 'Iranic' instead of 'Iranian' in linguistic contexts

[t]o avoid confusion with terms related to the country or territory of Iran (especially in recent geneticist papers speaking of prehistoric "Iranian" populations almost certainly not "Iranian" in the linguistic sense) (Kümmel 2018, slide 3)

Iranic is consistent with my use of Turkic to avoid confusion with Turkish referring to the country and dominant language of Turkey.

3. By analogy, maybe I could use Taic to avoid confusion with Thai referring to the country and dominant language of Thailand. Or perhaps better yet, Daic for consistency with Kra-Dai (it is odd that most speak of a 'Tai' rather than a 'Dai' branch of Kra-Dai, though there is a Kra branch).

4. Back to Kurdish: how did Sorani and Kurmanji develop h- in hesp 'horse' in this table? That seems to be an independent Kurdish innovation that has nothing to do with the equally mysterious h- in Greek ἵππος <híppos>. The Proto-Indo-European initial consonant of 'horse' was *ʔ-, not the *s- that became h- in Greek and Iranic.

*s-weakening occurred independently in those two branches, as it is absent from Indic and cannot be reconstructed at the Proto-Indo-Iranic level (see Kümmel 2018, slide 14):

Proto-Indo-European *s
Greek h	Proto-Indo-Iranic
	Sanskrit s	Proto-Iranic *h

Pyu also has *s-weakening: e.g., hi 'to die' (cf. Tangut 𗢏 3072 2si4 < *CIseH 'id.').

Typing "Proto-Indo-Iranic" and "Proto-Iranic" feels weird. But are such terms any worse than Indic instead of Indian?

5. In that same table are Sorani erz ~ erd, Kurmanji erd, and Zaza erd 'earth'. Is that word a northwestern Iranic innovation that has nothing to do with English earth? Wiktionary lists no Iranic reflexes of Proto-Indo-European *ʔer- 'earth'.

6. How did I never encounter the word glossonym before?

Among some Yazidis, the glossonym Ezdîkî is used for Kurmanji to signify an attempt to erase their affiliation to Kurds.

7. Another new word for today: Kurdification.

Kurdification is a cultural change in which non-ethnic Kurds or/and non-ethnic Kurdish area or/and non-Kurdish languages becomes Kurdish.

I don't see how languages can become Kurdish as opposed to being replaced by Kurdish.

8. I don't understand the phonetic logic behind this Kurdish rule:

After /ɫ/, /t/ is palatalized to [tʲ]. An example is the Central Kurdish word gâlta ('joke'), which is pronounced as [gɑːɫˈtʲæ].

Is /t/ dissimilating after velarized /ɫ/? I think of velarization and palatalization as being opposites. But palatals are between dentals and velars in terms of point of articulation, so maybe this rule is assimilatory.

9. I also don't get this Kurdish rule which involves palatalization next to a velar:

When preceding /ŋ/, /s, z/ are palatalized to /ʒ/.

I guess that happens because [ʒ] (I think phonetic brackets were intended) is closer to /ŋ/ than /s, z/.

Is there any language in which /s z/ become velar [x ɣ] next to a velar?

10. This Sorani allophony reminds me of how Middle Chinese *æ corresponds to Mandarin [ə] in 生 shēng [ʂəŋ] (I should continue my series on 生):

The vowel [æ] is sometimes pronounced as [ə] (the sound found in the first syllable of the English word "above"). This sound change takes place when [æ] directly precedes [w] or when it is followed by the sound [j] (like English "y") in the same syllable.

The environment is completely different, though. And I don't understand why glides are schwa-friendly, though I admit I find [əw əj] easier to pronounce than [æw æj].

11. I am baffled by Kümmel's use of [ʆ] as well as [ɕ] for Sanskrit in slide 14 of his 2018 PowerPoint. I have never seen anyone use [ʆ] for Sanskrit (or any language, really). Is [ʆ] an allophone of /ś/ before /r/?

12. I've never seen the letters ḧ ẍ before. They are optional symbols for [ħ ɣ] in the Hawar and universal extended alphabets for Kurdish. (The Hawar section of the Wikipedia article seems to have the Arabic equivalents of ḧ ẍ reversed. I assume the IPA at the bottom of the article is correct, and that the diaereses are added to indicate voiced versions of h [h] and x [x].)

13. Trying to relate Mandan in North Dakota to ... Welsh seems so random. I didn't know there were deep inland variants of the Welsh-in-America myth. Or that there were this many variants:

In all, at least thirteen real tribes, five unidentified tribes, and three unnamed tribes have been suggested as "Welsh Indians."

Chris Harvey/Languagegeek patiently proves "That Mandan Is Not Welsh".

Just seven years after Harvey wrote that article, Mandan became extinct when Edwin Benson died in 2016.

14. Backer is bigger in Mandan:

Mandan, like many other North American languages, has elements of sound symbolism in its vocabulary. A /s/ sound often denotes smallness/less intensity, /ʃ/ denotes medium-ness, /x/ denotes largeness/greater intensity

Is there a similar gradation for stops: /t/ 'less' vs. /k/ 'more'? (There is no /tʃ/.)

19.10.7.23:42: THE AMERICANIZATION OF MICHÁLKA

(Posted 19.10.11.)

1. It's hard to predict how non-English names are pronounced in American English. For instance, I wouldn't have guessed that Czech Michálka [ˈmɪxaːlka] would be pronounced as [miːˈʃɑːkə]. I'm not too surprised by the stress moving to the original long vowel (unstressed long vowels are difficult for English speakers). [miː] may reflect a Czech variety with [i] instead of [ɪ] (English has no nonfinal short [i], so [i] had to be lengthened to [iː]). But how did ch end up as [ʃ]? By analogy with French? Did <l> become silent by analogy with the silent <l> of words like <talk>?

2. Czech vowels look phonologically symmetrical, but that symmetry is lost in phonetic notation (source):

long	short	long	short	long	short
/iː/ [iː]				/uː/ [uː]	/u/ [u]
	/i/ [ɪ]
				/oː/ [oː]	/o/ [o]
/eː/ [ɛː]	/e/ [ɛ]
		/aː/ [aː]	/a/ [a]

Why are /iː i/ not phonetically the same height? (They are both equally high in Eastern Moravian Czech. The [miːˈʃɑːkə] may reflect a Czech variety with /i/ [i] instead of /i/ [ɪ].)

What motivates /i eː e/ being lower than /u oː o/?

Slovak, on the other hand, at first glance seems to have a nearly symmetrical system apart from the nearly extinct vowel /æ/ which has no long counterpart (source):

long	short	long	short	long	short
/iː/ [iː]	/i/ [i]			/uː/ [uː]	/u/ [u]
/eː/ [ɛː]	/e/ [ɛ]			/oː/ [oː]	/o/ [o]
	/æ/ [æ]	/aː/ [aː]	/a/ [a]

But in closer notation, /eː e/ are higher than /oː o/ - the reverse of Czech!

long	short	long	short	long	short
/iː/ [i̞ː]	/i/ [i̞]			/uː/ [u̞ː]	/u/ [u̞]
/eː/ [e̞ː]	/e/ [e̞]
				/oː/ [ɔ̝ː]	/o/ [ɔ̝]
	/æ/ [æ]	/aː/ [aː]	/a/ [a]

Can Czech and Slovak native speakers tell each other apart from the minor differences in their vowels?

3. Why is Japanese sazo(kashi) 'certainly' spelled 嘸(かし) with 嘸 <MOUTH.NOT>? 嘸 has represented several morphemes over time in Chinese. The earliest one known to me is Late Old Chinese *mɨaʔ 'surprised' in 漢書 Book of Han (111 AD). But none mean 'certainly'. Did someone in Japan think that 嘸 would be appropriate for sazo 'certainly' because 無 <NOT> would convey 'no (doubt)' or 'no (choice)': i.e., inevitability and hence certainty?

口 <MOUTH> is a common radical in Chinese grammatical words, though I can't think of any parallels in Japanese off the top of my head. (The one made-in-Japan kanji with 口 <MOUTH> that comes to mind is 噺 hanashi 'story' which isn't a grammatical morpheme or even an abstract one like sazo 'certainly'. I just found that Wiktionary has a list of made-in-Japan kanji - the only other one with 口 <MOUTH> is 囎, a phonetic symbol for the first syllable of the placename 囎唹 Soo).

Shpika stats, Kanken levels, and Jun Da's general Chinese ranks:

kanji	Aozora	news	Twitter	Wikipedia	Kanken	Jun Da
口	89	295	187	177	10	224
無 = 无	56	237	55	257	7	73
噺	2510	2353	2792	2674	pre-1	-
嘸 = 呒	3541	-	-	7132	1	5223
唹	5650	-	-	4776	1	-
囎	6097	-	-	4810	-	-

Windows 10's Japanese IME does not include 嘸 as a choice for sazo, but it does include 嘸かし for sazokashi after さぞかし.

Google frequencies:

さぞかし: 5.34 million
嘸かし: 18,500

さぞかし outnumbers 嘸かし by 289 to 1.
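The ratio is just the quotient of the two Google counts, rounded to the nearest whole number. A quick Python check, with variable names of my own choosing:

```python
# Google hit counts quoted above.
kana_hits = 5_340_000   # さぞかし
kanji_hits = 18_500     # 嘸かし

ratio = kana_hits / kanji_hits
print(f"{ratio:.1f}")   # unrounded ratio to one decimal place
print(round(ratio))     # nearest whole number
```

Google's counts are rough estimates, so only the order of magnitude of the ratio matters.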

4. Wiktionary lists so as the on reading for 囎 even though by definition a made-in-Japan kanji cannot have borrowed Chinese readings. It can, however, have Chinese readings, as a Chinese reader could read it like a component that could be interpreted as a phonetic: e.g.,

噺 hanashi as xīn like 新 xīn
囎 so as zèng like 贈 zèng

in Mandarin.

Wiktionary also lists a kun reading shō. If 囎 is only for 囎唹 Soo, when is shō used?

19.10.6.23:54: THE EARLY LIFE OF 生 (PART 1)

(Posted 19.10.11.)

1. Continuing from "The Two Lives of 生" with a new title:

For a long time - all the way into the late Old Chinese period - 生 belonged to the 耕 rhyme category.

In 詩經 Shijing about three millennia ago, 友生 'friend' rhymed with 平 'peace', another 耕 rhyme word (translation by James Legge):

矧伊人矣、不求友生。

And shall a man,

Not seek to have his friends?

神之聽之、終和且平。

Spiritual beings will then hearken to him;

He shall have harmony and peace.

I reconstruct 生 as Old Chinese *sIreŋ and 平 as Old Chinese *CIPreŋ.

In modern Cantonese, 生 and 平 have pairs of readings that don't rhyme at all:

sinograph	colloquial	literary
生	saang1	sang1
平	peng4	ping4

How did that happen? Stick with me for future parts.

2. I found the Hong Kong Characters section of the 漢語多功能字庫 Multi-function Chinese Character Database when looking for Cantonese 𢫏 kam2 from the previous post. I don't know why I never explored that database before. I was aware of it but just never clicked. I don't have time to look around, so I'm just going to look at the 33-stroke Hong Kong characters:

𡤻 lyun4 '?' < 女 <FEMALE> + 鸞 lyun4
鱻 sin1 < 魚 <FISH> x 3 'fresh' (I guessed this was a variant of 鮮 sin1 'fresh', and I was right)

鱻 appears in this list of variant characters. Two others of interest in that list:

騐 for 驗 yim6 'to test' (even though 念 nim6 has n-!)
亱 for 夜 ye6 'night' (even though 但 daan3 sounds nothing like ye6!)

Both appear to be graphic errors (facilitated by the rhyming of 驗 and 念).

龗 ling4 'dragon' < 霝 ling4 + 龍 <DRAGON>

This character goes back to 説文 Shuowen and is not Cantonese-specific, so I don't understand why it's in a Hong Kong character list.
Is ling4 an ablaut variant of 龍 lung4 'dragon', or is it merely a different spelling of 靈 ling4 'spirit' used for 'spirit' as a reference to dragons? (Although I cite Cantonese readings here, the ablaut would be at the Old Chinese level and involve an alternation of *e and *o, not i and u.)

Is 𡤻 a character for girls' names?

3. When I tried to type 靈 líng 'spirit' (Jun Da frequency for simplified 灵: #730) for the previous topic, Windows 10's Microsoft Pinyin IME did not include it in its 94 choices for Mandarin ling (excluding the lin-graphs after those 94). I don't understand why some common characters don't appear in the first batch of options. The first four options are (with Jun Da rankings)

1. 零 líng 'zero' (#1498)

2. 令 lìng 'order' (#267)

3. 另 lìng 'other' (#620)

4. 霛 líng (variant of 靈; #8145)

I've never even seen 霛 before.
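For comparison, here is what a purely frequency-based ordering of those four candidates plus 靈 would look like, using the Jun Da ranks quoted above (靈 takes the rank of simplified 灵). This is a hypothetical ordering of mine in Python, not a description of how Microsoft Pinyin IME actually ranks its choices.

```python
# Jun Da frequency ranks quoted above (lower rank = more frequent).
JUN_DA_RANK = {
    "零": 1498,
    "令": 267,
    "另": 620,
    "霛": 8145,
    "靈": 730,   # rank of simplified 灵
}

# Sort candidates from most to least frequent.
by_frequency = sorted(JUN_DA_RANK, key=JUN_DA_RANK.get)
print(" ".join(by_frequency))
```

Under that ordering 靈 would be the third choice and 霛 the last, so whatever the IME is doing, it is not sorting by these ranks alone.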

Another case like this is 家 jiā 'home' (Jun Da #55) which is not in Microsoft Pinyin IME's 89 choices for Mandarin jia (excluding the ji-graphs after those 89).

To type 靈, I typed <lingjing> (see topic 4), got 靈境, and deleted the second character 境.

To type 家, I typed <jiazu> 'family', got 家族, and deleted the second character 族.

I shouldn't have to do that.

4. I meant to type <jingling> for 精靈 jīnglíng 'spirit' and ended up discovering another word 靈境 língjìng 'spiritual territory' instead.

Taishūkan's 新漢和辞典 New Sino-Japanese Dictionary defines 靈境 <SPIRIT TERRITORY> as 靈地 <SPIRIT EARTH>: こうごうしい土地 kōgōshii tochi 'godly land'.

kōgōshii has a long ō common in Chinese loanwords like 皇后 kōgō 'empress', but if it were a Chinese loanword, it wouldn't be spelled in kana as こうごうしい. It has a partly semantographic spelling 神神しい <GOD GOD si i> revealing its true etymology. The Japanese root kamu- 'god' was compressed into kō:

*kamu > *kau > *kɔː > kō [koː]

The gō of kōgōshii is the same kō but with the voicing characteristic of the initial consonants of second elements of compounds; cf. 神神 <GOD GOD> kamigami 'gods' < kami < *kamu-i 'god'.

5. Korean 간직 kanjik 'storing away' sounds like a Chinese loanword but has no obvious Chinese etymology. Martin et al. (1967: 41) suggest a Sino-Korean source *看直 'look straight' which would be pronounced kanjik in Korean. It's a perfect phonetic match, but I don't see how it can semantically fit.

6. Today I learned that no get means 'does not have' in West African Pidgin English as well as in Pidgin (Hawaii Creole English/HCE):

25% of young pipo no get one single friend

I say pipo too.

HCE has about 600,000 speakers (Sakoda and Siegel 2003: 1), whereas West African Pidgin English has about 75 million! The largest French-based creole seems to be Haitian Creole with 12 million speakers.

Here's "The absolute beginners' guide to [West African] Pidgin" by speaker Kobby Ankomah-Graham:

Pidgin is defined by its practicality. Fluency will reduce how much you have to pay for cab fares or market tomatoes. [...] Advertising in Pidgin – once unthinkable – is now commonplace.

That's not yet the case for HCE in Hawaii, though we do have HCE greeting cards in stores.

Tangut Yinchuan font copyright © Prof. 景永时 Jing Yongshi
Tangut character image fonts by Mojikyo.org
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2019 Amritavision