WHITE RAT 2.7
? qulugh ai ? sair ? nyair
'white rat year, two month, seven day'
1. THE UYGHUR QUESTION: Dunnell (1996: 54) wrote:
[...] Uighurs constituted diverse groups who played important but ill-documented roles in both Liao [= Khitan Empire] and Xia [= Tangut Empire] state formation. Were they refugees from Central Asian Muslim rulers or ambitious emigrants from Shazhou (Dunhuang) anxious for employment? Or were they local Alashan (i.e., Helan shan) Uighurs? Different groups of Uighurs moved through the region in the tenth through twelfth centuries, some entering the Liao elite and some the Xia elite [...] and many maintaining traditional trading connections with Tibet, Qinghai, and Central Asia.
Are the Uyghurs in the Tangut Empire better understood now, twenty-four years later?
I have never seen anyone address the possibility of Uyghur influence on Tangut, apart from Kwanten's (1989: 18) suggestion that the Tangut may have used the Uyghur script before using their own script in 1036. (I am not counting Kwanten's hypothesis that the Tangut script represents an 'Altaic'-type [but not specifically Uyghur] language.)
Yesterday I was wondering if Tangut vocalism could have been under Turkic influence: e.g., it could have had front rounded vowels (which are unlikely to have been in Proto-Sino-Tibetan). Tangut certainly could not have shared a vowel inventory with Uyghur, as it had an enormous number of vowels with distinctions without Turkic parallels: nasality, tension, retroflexion, and the mysterious quality that I indicate with an apostrophe (a substitute for a prime symbol). Nonetheless perhaps at some earlier stage Tangut could have had a simpler Turkic-like vowel inventory.
The Tangut imperial family claimed descent from the Tuoba Wei [a Serbi dynasty], and "other powerful Tangut clans [...] could also claim Xianbei [= Serbi] descent" (Dunnell 1996: 45). That brings up the possibility of para-Mongolic (i.e., Serbi) influence on Tangut: e.g., height harmony (which I have suspected to have driven the development of the grade system).
Any Turkic or para-Mongolic - in a word, 'Altaic'¹ - influence on Tangut was certainly not morphological or syntactic, as
Tangut has no 'Altaic'-type morphology, and its word order is typically
Tibeto-Burman (i.e., Sino-Tibetan minus Chinese). Yes, Tangut has final
verbs like 'Altaic', but so does, say, Pyu which had zero contact with
'Altaic' languages. What Tangut does not have is 'Altaic'
syntactic features absent from Tibeto-Burman: e.g., consistent
¹I use the term 'Altaic' to indicate a
linguistic area, not a language family.
Atypically even among the country's small educated elite, Sukarno was fluent in several languages. In addition to the Javanese language of his childhood, he was a master of Sundanese, Balinese and Indonesian, and was especially strong in Dutch. He was also quite comfortable in German, English, French, Arabic, and Japanese, all of which were taught at his HBS [Hogere Burgerschool].
Was Japanese really taught at a Hogere Burgerschool in the Dutch East Indies in the late 1910s? Was Japanese taught at any high school as a foreign language outside the Japanese Empire a century ago?
As for Arabic, I imagine it was taught as a component of his
religious studies rather than as a language for active use.
3. What is Fatma- in the name of Sukarno's third wife Fatmawati? It looks
like Sanskrit padma- 'lotus' with p- changed to f-
to Arabize it (standard Arabic has no p) and -d changed
to -t to conform to Indonesian phonotactics which do not permit
syllable-final voiced stops. (I assume the d in Indonesian padma
'lotus' represents [t].) -wati is from Sanskrit -vatī
'having' (feminine nominative singular).
4. Is this serious?
Therefore, a new possibility arises that the origin of Uralic languages (and perhaps also of the Yukaghir languages) may be Liao River region.
5. A reminder that 'genetic' relationships of languages are not really genetic:
N-M178* has higher average frequency in Northern Europe than in Siberia, reaching frequencies of approximately 60% among Finns and approximately 40% among Latvians, Lithuanians & 35% among Estonians (Derenko 2007 and Lappalainen 2008).
Finnish and Estonian are Uralic, but Latvian and Lithuanian are Indo-European. I would guess that N-M178 is uncommon or absent from Hungarians even though Hungarian is Uralic.
6. Writing about the pan-East Asian word 比較 'comparison' which does not seem to exist in Vietnamese made me wonder how Vietnamese so [ʂɔ] 'compare' was written in nom. nomfoundation.org has 14 different spellings with phonetics:
||variant of 芻||NomNaTongLight.ttf U+F125C
||'hand' indicates a verb
||車 shared with 較 'compare'
||區 < 樞 xu
||'hand' indicates a verb|
||抠||区 < 枢 = 樞 xu||anachronistic?; 'hand' indicates a verb|
||慮 lự||'hand' indicates a verb
||𢫘||卢 = 盧 lô
||anachronistic?; 車 shared with 較 'compare'; does D2 轤 exist?|
||𨏧||盧 lô||車 shared with 較 'compare'|
||⿰卢車||卢 = 盧 lô||anachronistic?; 車 shared with 較 'compare'|
||⿰口初||口||初 sơ||NomNaTongLight.ttf U+F129D|
Vietnamese s- is from *Cr-. Sometimes nom spellings
created before *Cr- > s- can point to what *C
was. Unfortunately none of the spellings seem to be helpful unless *C-
Normally phonetics with unrounded vowels like ơ and ư are not used to write Vietnamese syllables with rounded vowels. C1 攄 may be a graphic error for D1 攎. I cannot explain E1 ⿰口初.
Some forms look like modern PRC-style simplified characters and may
be anachronistic, though maybe they did exist in premodern Vietnam.
7. MANCHU BEFORE MANCHU?: This image of a "Manchu" couple dates from c. 1590, 46 years before the Manchu adopted the autonym Manju 'Manchu'.
What does "Tohany" in the caption mean?
More images from the Boxer Codex here.
According to Leiden scholars, another possibility is derivation [of Greek 'hundred'] from Proto-Indo-European *h₁ḱm̥tóm, which is a regular simplification of *dḱm̥tóm in their theory.
What are other examples of Proto-Indo-European *d becoming *h₁ [ʔ] in initial clusters?
9. Tonight's 48 Hours pronounces Frunză [ˈfrunzə] in Moldova as [ˈfɹʌnzə] by analogy with English run [ɹʌn]. Sigh.
10. I use this
English circumposition all the time even though I never heard of
the term circumposition until I read about Pashto grammar
tonight. The Wikipedia Pashto grammar article uses the term ambiposition.
WHITE RAT 2.6
? qulugh ai ? sair ? nyair
'white rat year, two month, six day'
1. I never tire of posting this list of variants of <SIX> in the Khitan large script that Andrew West compiled (and includes in his font that you see here):
How many other Khitan large script variants remain undiscovered? How
many platonic characters (platographs?) does
the Khitan large script have? Certainly less than the 2,000+ different
forms that have been found so far. I've been guessing that the
platonic inventory is no more than 1,000 characters.
Today I read that Konstantin Pozdniakov reduced Barthel's inventory of 600 rongorongo glyphs down to about 78 with 52 accounting for 99.7% of the corpus.
The other 0.3% were made up of two dozen glyphs with limited distribution, many of them hapax legomena. This analysis excluded the Santiago Staff, which contained another three or four frequent glyphs. [My figure of ~78 is from 52 + "two dozen" + "three or four".]
As Pozdniakov readily admits, his analysis is highly sensitive to the accuracy of the glyph inventory. Since he has not published the details of how he established this inventory, it is not possible for others to verify his work.
2. Wikipedia also says:
However, Sproat (2007) believes that the results from the frequency distributions are nothing more than an effect of Zipf's Law, and furthermore that neither rongorongo nor the old texts were representative of the Rapanui language, so that a comparison between them is unlikely to be enlightening.
I've known of Zipf's law for a long time, but I didn't know exactly what it was until now:
Zipf's law was originally formulated in terms of quantitative linguistics, stating that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc.: the rank-frequency distribution is an inverse relation.
If I understand that correctly, it predicts that the top three words
in a list should occur in a 6 : 3 : 2 ratio. For every six occurrences
of the most frequent word, there should be about three occurrences of
the second most frequent word and about two occurrences of the third
most frequent word. (Let's not worry about defining 'word'.)
Here are ratios from three lists I've looked at - I've given the most common word/component a value of 6 to facilitate comparison with the 6 : 3 : 2 ratio above:
English TV and movie scripts: 6 : 5 : 4 (you : I : to)
The text of the Dutch novel Max Havelaar: 6 : 3.4 : 3.1 (de 'the' [m./f./pl.] : en 'and' : het 'the' [n. sg.])
David Boxenhorn's counts of Tangut character components: 6 : 2.96 : 2.78 (𘢌 dex : 𘠁 bae : 𘠢 cin)
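To make the comparison with 6 : 3 : 2 concrete, here is a minimal Python sketch. It assumes the idealized form of Zipf's law quoted above (frequency inversely proportional to rank) and uses only the three ratios listed here; the function names are my own and hypothetical, not from any source.

```python
def zipf_expected(top_value, ranks):
    """Expected values under idealized Zipf's law: value at rank r = top_value / r."""
    return [top_value / r for r in ranks]


def deviations(observed, expected):
    """Observed minus expected, rounded to two decimal places."""
    return [round(o - e, 2) for o, e in zip(observed, expected)]


# The three ratios listed above, each scaled so that the top item is 6.
observed = {
    "English TV and movie scripts": [6, 5, 4],
    "Max Havelaar": [6, 3.4, 3.1],
    "Tangut character components": [6, 2.96, 2.78],
}

expected = zipf_expected(6, [1, 2, 3])  # the 6 : 3 : 2 ratio

for name, ratio in observed.items():
    print(name, ratio, deviations(ratio, expected))
```

In all three lists the third item is more frequent than the idealized law predicts, and only the Tangut second item comes close to the predicted 3.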
3. Before seeing the frequency list above, I had never heard of Multatuli's Max Havelaar,
"the book that killed colonialism" in the words of Pramoedya
In the last chapter the author announces that he will translate the book "into the few languages I know, and into the many languages I can learn."
... which fits the multilingual reputation of the Dutch.
Incredibly Max Havelaar "was not translated into Indonesian
until 1972" - long after independence! And a 1976 film adaptation "was
not allowed to be shown in Indonesia until 1987." I wonder why.
4. I don't know how I managed to not see this photo of
the Dornogovi inscription (1058) in the Khitan large script on Wikipedia
until last night. It's been up since.
(ᠳᠣᠷᠤᠨᠠᠭᠣᠪᠢ <TUrUnaghUbi> in the traditional Mongolian script) is
5. In White Rat 2.2 I forgot to mention that the Albanian version of Persian khwāja 'master' is hoxha [ˈhɔdʒa]. When I first had to memorize Enver Hoxha's name for school in 1984, I must have mispronounced it as [haksha].
6. Words I discovered via Enver
Hoxha's Wikipedia entry: Albanian gjakmarrja
'blood feud' and hakmarrja 'revenge' (banned during his
early years in power) - and Serbian крвна освета
7. Last night I learned that the Defense Language Institute Foreign Language Center regards Pashto as a Category IV language: i.e., super-hard for English speakers like Arabic, Mandarin, Japanese, and Korean. What little I've seen of Pashto intimidates me in a way that Persian and Hindi do not. So I'm not surprised Pashto is the hardest Indo-European language on the DLIFLC scale.
used to teach Vietnamese until 2003. I've long been surprised that
Vietnamese and Thai were 'just' Category III languages (see
because I think they're comparable to Mandarin in difficulty apart from
their writing systems. I guess the Chinese script is a factor in
classifying Mandarin as Category IV.
8. As a general rule, Sino-Japanese compounds don't mix elements from different borrowing strata. But there are many exceptions. 隧道 'tunnel' is read as
zui-dō (Go-on stratum + Go-on stratum)
sui-dō (Kan-on stratum + Go-on stratum)
the Kan-on tō reading of 道 is uncommon, though a
very well-known example is 神道 Shintō
'god way', a wholly Kan-on compound (the hypothetical Go-on reading
would be *Jindō).
I suspect sui in 隧道 suidō is by analogy with its
phonetic, the more common character 遂 read sui rather than a
conscious use of the Kan-on reading. Today, 遂 is required in
schools, but 隧
is not. Shpika
遂 has the native reading tsui which I would mistake for
Sino-Japanese if I didn't know better. But tsui is from an
earlier disyllabic tupi which doesn't sound like a monosyllabic
9. Tonight I saw the word 比較 hikaku 'comparison' on the obi of Gundam Wing Data Collection 1 and was reminded of a question that's come to mind from time to time: why is 較 read kaku rather than kyō? Compare hikaku with Korean pigyo (not ˟pigak) and Mandarin bǐjiào (< *-æ̤w; not ˟bǐjué < *-æwk). (The corresponding hypothetical Vietnamese word should be *tỉ giảo or *tỉ giác. Has any such word ever existed?) In Middle Chinese, 較 represented *kæ̤w 'compare' and *kæwk 'bars atop the sides of a carriage box; compete; contest'. Most 交-characters were read with *-w, not *-wk. Did hikaku originate as a hypercorrection?
(2.29.11:53: The reading giảo < *kjảːw
for 較 is interesting, as I would expect ˟giáo. Segmentally giảo
looks like a late stratum loan, but its tone is typical of an early
stratum loan. In theory an early borrowing of 較 would have been *kẻo
[kɛ̉w]. Perhaps giảo is a middle stratum loan that tells us *ɛw
occurred in the source language before its 'departing tone' changed
from what sounded like a *sắc tone to what sounded like a *hỏi tone
to Vietnamese ears.)
10. What's with the camel case title of this article: "Type 97 ShinHoTo Chi-Ha medium tank"? I've never seen morphemic capitalization of Japanese before: 新砲塔 shinhōtō 'new cannon tower' as ShinHoTo.
explains what 97 ... Chi-Ha is:
Chi (チ) came from Chū-sensha (チュウセンシャ, "medium tank"). Ha and Ni, in Japanese army nomenclature, refer to model number 3 and 4, respectively from old Japanese alphabet iroha. The Type was numbered 97 as an abbreviation of the imperial year 2597, corresponding to the year 1937 in the standard Gregorian calendar. Therefore, the name "Type 97 Chi-Ha" could be translated as "1937's medium tank model 3".
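The arithmetic behind that designation can be sketched in a few lines of Python. This assumes the 660-year offset implied by the quotation (2597 corresponds to 1937) and the traditional iroha order i, ro, ha, ni, ...; the function names are hypothetical and only for illustration.

```python
# Offset implied by the quotation: imperial year 2597 = Gregorian 1937.
IMPERIAL_OFFSET = 660

# First syllables of the iroha order, used here as model numbers.
IROHA = ["i", "ro", "ha", "ni", "ho", "he", "to"]


def imperial_to_gregorian(imperial_year):
    """Convert a Japanese imperial year to a Gregorian (AD) year."""
    return imperial_year - IMPERIAL_OFFSET


def type_number(imperial_year):
    """Type designation: the last two digits of the imperial year."""
    return imperial_year % 100


def iroha_ordinal(syllable):
    """1-based position of a syllable in the iroha order."""
    return IROHA.index(syllable.lower()) + 1


print(type_number(2597), imperial_to_gregorian(2597), iroha_ordinal("Ha"))
```

So "Type 97 Chi-Ha" decodes as type 97 (from 2597, i.e., 1937), medium tank, model 3, and a Chi-Ni would be model 4.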
11. Tonight I uploaded the last four White Rat month 1 entries (1.26 / 1.27
/ 1.28 / 1.29) culminating in my promotion of Dwight Decker's phrase déjà
Wiktionary says déjà
/deʒa/ is from dès
/dɛ/ 'from' (< de + ex) + jà
/ʒa/ 'already'. Why isn't the combination of /dɛ/ + /ʒa/ dèjà
12. I confess I carelessly used the word manuscript to refer to premodern prints until recently. See Sven Osterkamp on this issue.
13. This kind of curling would be nice to see in modern Chinese character logos.
14. Interesting 18th century romanizations of Japanese names from Sven
朽木昌綱 Kuchiki Masatsuna as K: Masatsna
吉雄幸作 Yoshio Kōsaku as Jo. Koozack
楢林重兵衛 Narabayashi Jūbei as NLB ziubij
堀門十郎 Hori Monjūrō as FoLi Monsuro
They give us hints about Japanese and Dutch pronunciation at the
WHITE RAT 2.5
? qulugh ai ? sair tau nyair
'white rat year, two month, five day'
1. Last night I finally realized that the IPA symbol ɞ is a closed ɜ. And just now I realized ɵ is a closed ə. Duh. The closure makes ɞ and ɵ look like o.
2. Imagine ɵ as a modification of the hangul zero consonant letter ㅇ to transcribe, say, [ʕ] in an Arabic phrasebook for Koreans.
3. Last night I saw the Japanese word 披露 hirō 'making public' when I read this story about a 'new' 手塚治虫 Tezuka Osamu manga created with AI assistance (AI-d?). That brought to mind a question I've wondered about before: why does 披露 hirō end in a long vowel? In other environments, 露 ro has a short vowel. The Chinese rhyme class of 露 ro was regularly borrowed as *-o in Japanese:
庫 ku, 素 su, 布 fu
originally borrowed as *ko, *so, *po before *o-raising and *p-weakening
ideally I'd like to give an example of ru < *ro
for maximum parallelism, but I can't think of one
all originally borrowed as *ro after *o-raising; their readings have remained stable since c. the 7th century apart from any subphonemic changes in the pronunciations of /r/ and /o/
4. Windows 10's Japanese IME offered 琥 as an option when I typed
<ku>. When is 琥 read ku? The only reading of 琥 that I
know of is ko as in 琥珀 kohaku 'amber', the only common
word containing the character. Dictionaries do list ku
as an alternate reading, but I think that's an artificial Go-on reading
created via fanqie. I would expect the Go-on reading of 琥 to be *ku
(from earlier *ko after *o-raising), but such a reading
may not have survived.
5. I've been taking a closer look at Alan Downes' 2018 PhD dissertation How Does Tangut Work? I confess I do not understand how he got
'When differing coloured vessels are received, they should be supplied stamped.' (p. 99)
out of this excerpt from article 1261 of the Tangut law code:
one (by one)?
| one after another
1. 1ly3 (approximately [lə]) may be an unaccented form
of 𘈩 0100 1lew1 'one'; cf.
English a(n) from one.
Character 1 might be the first half of 1ly3 1ly3 'one by one' if the lost character 2 is also 5285. 1ly3 1ly3 'one by one' precedes nouns: e.g.,
1ly3 1ly3 1i4 1vi'1
'one one many born' = 'each and every living thing'
So 1ly3 1ly3 'one by one' would be appropriate in this context before the noun 1ka4 2gu4 'vessel'.
3-4, 5-6. Unlike Downes, I gloss polysyllabic words as single words instead of glossing individual syllables. I don't see where 'coloured' comes from. Could 2my1 2ner4 'various' also refer to variation in shape?
7-9. 'contribute come' corresponds to Downes' 'are received', and the postposition 'on' is a locative metaphor corresponding to Downes' 'when'.
10-11. 'beginning': i.e., the vessels must have been stamped from
the beginning? Corresponds directly to nothing in Downes' translation.
12-13. optative prefix of inward motion + 'seal' = 'should [be] stamp[ed]'
14-15. literally 'beginning back'; corresponds to nothing in Downes'
16. A perfective prefix for some unknown verb. Downes' translation
supplies 'supply' as the lost verb.
6. Page 34 of volume II of 永野護 Nagano Mamoru's The Five Star Stories has an unusual case of kanji ruby for hiragana: あいだ aida 'interval' has the ruby 時間 jikan 'time' (which in turn contains the kanji 間 that can be used to write aida 'interval'). The idea is to imply that aida is an interval of time. The English translation of the passage on page 10 has "moment".
7. Bess Press publishes readers in English, Hawaiian, Marshallese, Chuukese, and CHamoru for the Hawaii market. They capitalize both letters in the CHamoru digraph CH, though Wikipedia does not: <Ch>. Wikipedia explains:
There is also a movement on Guam to capitalize both letters in a digraph such as "CH" in words like "CHamoru" (Guamanian spelling) or "CHe'lu" ['sibling'], which NMI [Northern Mariana Islands] Chamorros find silly.
I'm guessing the spelling ch is due to Spanish influence.
8. It's taken me almost four decades to realize that Bess Press might be based on Pidgin bes pres 'best press'. Until now I thought it might have been named after someone named Bess.
9. Bess Press has published a trilingual Pidgin/Okinawan/Japanese book:
Okinawan Princess: Da Legend of Hajichi Tattoos
ウチナー ヌ ウミナイビ ヌ ハジチ ヌ イファナシ
Uchinā nu uminaibi nu hajichi nu ifanashi
Okinawa no ohimesama no hajichi no densetsu
'Okinawa GEN princess GEN hajichi GEN legend'
The Okinawan title is written in ruby atop the Japanese title. That
takes advantage of the match between Okinawan and Japanese word order.
I am reminded of the Korean ruby on this
1940 announcement from the 大邱 Taikyū (now Taegu) court under Japanese
Translator 崎原正志 Sakihara Masashi, a PhD in linguistics and a
specialist in Ryukyuan and Japanese linguistics, was 38 as of September
2019, so I assume he learned Okinawan as a foreign language. I would be
surprised if Okinawans born in the 80s grew up speaking Okinawan.
What I've seen of Lee Tonouchi's writing was in 'light' Pidgin in English spelling, so Sakihara was probably able to translate directly from it rather than through an English translation. Sakihara has been to Hawaii and may be familiar with Pidgin.
I assume ifanashi is Okinawan for 'legend', as it is the
ruby for Japanese 伝説 densetsu 'legend'. But I can't find ifanashi
in Sakihara's 2006 dictionary. The word is clearly cognate to Okinawan
and Japanese hanashi 'story'. I don't know what i- is.
10. Windows 10's Japanese IME converts <tegu> (an
approximation of Korean Taegu) into 大邱. Nice.
11. Is the Korean word 루비 rubi 'ruby' directly from English or via Japanese? I would guess the latter, though Martin et al. (1967: 557) derive it directly from English. (Contrast how they derive 루(우)블 ru(u)bŭl 'ruble' in the preceding entry from Russian via English.)
Martin et al. spell the word as 루(우)비 ru(u)bi with an optional long vowel. The long vowel variant is not from Japanese ルビ rubi which has short vowels. I don't know of any variant ルービ rūbi with a long vowel.
12. While looking for ルービ rūbi, I found ルービックキューブ Rūbikku kyūbu 'Rubik's Cube'. The long vowel in Rūbikku may reflect English [ˈɹuːbɪk] because the original Hungarian name has no long vowel: Rubik [ˈrubik] (not Rúbik [ˈruːbik]).
13. Top ten kanji searches at kanji.jitenon.jp as of 2.28 Japan time with their Shpika stats:
||req in school
||<JADE.WHITE.STONE> uncommon morpheme for
||<WOMAN.MAN.WOMAN>: variant of 嬲|
||<STAND.WIND>: first half of 颯爽 'gallant'
||<FISH.FISH.FISH>: variant of 鮮 'fresh'|
||'city outskirts'; far more common as a character
||prewar form of 寿 'long life' still used decoratively|
||<DEER.DEER.DEER>: variant of 粗 'coarse'
I've seen 1 (嬲) and 3 (嫐) come up in discussions of kanji with funny component combinations.
4 (鑫), 6 (鱻), and 10 (麤) may interest people because of their tripled parts.
says 鑫 is
A [Chinese] nickname for Kim Jong-un; the character is composed of three Kim characters (金), and Kim Jong-un is the third Kim to rule North Korea.
Is that true?)
Is 9 (壽) no longer common knowledge? It's been over seventy years
since it was required in schools, but it's still around.
I don't know why people would look up 7 (輝), a very common character. 2 (碧) and 5 (颯) are slightly more understandable, but they're not rare either.
I'm puzzled by how 8 (冂) became a popular thing to look up.
14. Page 34 of volume II of 永野護 Nagano Mamoru's
Five Star Stories has 皮膚 hifu
'skin' spelled as mixed kanji-kana 皮フ <SKIN fu>. I wouldn't have
expected such an abbreviation in the dialogue of a 'meight' (a creator
of artificial humans), though I would expect many Japanese to have
trouble writing 膚 since it mostly appears in only one word and none of
its components are read fu.
15. The word featured in this week's Star-Advertiser
Japan section is チンする chin suru 'to microwave'. suru is
'do', and chin is said to be the
sound of a microwave indicating it's done.
Maybe that's how microwaves sounded in 1988 when the word came into use
according to zokugo-dict.com. I don't remember any microwave sounding
16. It just occurred to me that the Japanese radical name madare for 广 is 'ma-hanging', a reference to the various ma-graphs written with 广:
广 itself, however, is not a phonetic for ma; the actual phonetic of those graphs is 麻 ma, and 广 'house built to depend on a cliff' is read gen. 广 is also the PRC simplification of 廣 'wide' - which is simplified in Japan as 広 and read kō.
Today I learned that 广 can also be a simplification of 庵 'hut', though I'm guessing that abbreviation is now obsolete. I didn't notice it in Andrew West's list of simplified characters from a 1935 Republic of China Ministry of Education proposal.
17. Charlamagne tha God is a guest on Stephen Colbert's show tonight. His name is hard to spell correctly - I just misspelled it as Charlemagne.
18. Is the pegative case only in one language? Is it necessary for the analysis of the Azoyú variety of the Tlapanec language?
19. The perlative case is more common than the pegative case, but I never heard of it until tonight.
WHITE RAT 2.4
? qulugh ai ? sair ? nyair
'white rat year, two month, four day'
1. Yesterday this caught my eye in the
TV Tropes guide to characters in The Five Star Stories:
His full royal title is Amaterasu dis Grand Grees Eydas IV.
The "Eydas" part of the name is actually a tens' counter, so in pure numerical count he'd be Amaterasu dis Grand Grees LXXXIV.
I assume Eydas (エイダス Eidasu) is a distortion of
English eighty, but the -dasu part unintentionally
reminded me of Pali dasa
'ten'. (The near-homophony of -dasu with Japanese dāsu
'dozen' is presumably also unintentional. Pali asīti 'eighty'
sounds nothing like Eidasu.)
Incidentally I long thought Grees (グリース Gurīsu) was
a reference to Grease since Five Star Stories creator
Nagano loves music references. But Grease doesn't seem to be
Nagano's kind of music. Maybe it's from Greece. (The Japanese
word for 'Greece' is related though different: ギリシャ Girisha.)
2. Can you distinguish these Jurchen characters?
<fai> and <muta>
Jason Glavy's Jurchen font (shown above) has a gap beneath the 7-shaped top component of <muta>.
Kiyose (1977: 65) distinguishes 078 <fai> and 079 <muta> by their bottom right strokes:
<fai> and <muta>
Note how Kiyose writes the top components of both characters with a
much narrower 7-shape.
Jin (1984: 44) has two entries for <fei>/<fai>¹ and <muta> but also says "there is no difference in character shapes" (字形沒有區別). Then he tries to distinguish them the same way Kiyose does with a long stroke in <fei/fai> and a dot in <muta>. He derives <fei>/<fai> from 扉 (Jin Chinese *fi).
I wish I could examine copies of the Sino-Jurchen vocabulary to see
for myself what <fai> and <muta> look like.
¹Jin lists two readings in different parts of the entry. The reading <fei> may be influenced by féi, the modern standard Mandarin reading of 肥, the Chinese character used to transcribe <fei> in the Sino-Jurchen vocabulary. The reading <fai> may be based on two facts:
First, the Jurchen word for 'eyebrow'
<? ta> (#500)
transcribed as Ming Chinese 肥塔 *fita is cognate to Manchu faitan
'eyebrow'. There was no Ming Chinese syllable *fai, so 肥 *fi
was the closest available approximation of Jurchen fai. There
is no reason to believe that Jurchen fi became Manchu fai.
Manchu ai generally corresponds
to Jurchen ai.² I don't know of any other apparent cases of
Manchu ai corresponding to Jurchen i.
Second, the Jurchen word for 'label'
<? sï> (#270)
transcribed as Ming Chinese 肥子 *fizi is a borrowing of Jin or Ming Chinese 牌子 *paizi 'label'. It's hard to tell if the borrowing predated the shift of Jin Jurchen p- to Ming Jurchen f-. It's possible that the word was borrowed during the Ming when f- was the only available Jurchen approximation of Ming Chinese p-.
²One seeming exception is Manchu aisin
: Jurchen alcun or ancun?
'gold'. But I think the Manchu and Jurchen forms are different
borrowings from a common source. So I would not derive the Manchu form
from the Jurchen form.
3. David Boxenhorn solved the Nagamese mystery: the -m- is by analogy with Assamese (whose -m- is part of the stem and not a buffer).
4. While playing 北斗の拳 Hokuto no Ken #78 last night, I heard the name 泰山 Taizan which the subtitler correctly rendered as "Taishan". Did the subtitler have to look that up, or did they know the Mandarin name (or even know Mandarin)? The English subtitles that I saw in Hawaii broadcasts of Japanese TV shows in the 70s would have left "Taizan" untranslated, as the subtitlers were under time pressure and were probably local Japanese-Americans without any knowledge of Mandarin.
When I first encountered the name Taizan via Hokuto no Ken in 1987, I gave no thought to the -z- which is unexpected. Theoretically 泰 Tai plus 山 san 'mountain' should equal Taisan, not Taizan. Normally Sino-Japanese s may become z after a nasal or *nasal vowel: e.g.,
見 ken + 參 san = 見參 kenzan 'going
to a superior and meeting them' (humilific; see here for a longer
Variant readings are kezan, gezan, genzan, and genzō. I don't know how 見 came to be read gen. k- > g- is highly irregular in Japanese.
2.27.14:01: Strictly speaking, not Japanese: the inexplicable (at least to me) case of Okinawan gani 'crab' (cf. Japanese kani 'id.') comes to mind.
One cannot reconstruct Proto-Japonic *g- which was retained in Okinawan but devoiced in Japanese simply to explain this single pair of words.
正 syaũ + 三 samu = 正三 Shōzō (name of one of the scripters of the Hokuto cartoon)
三 -zō is an irregular reading of 三 san < samu only in noninitial position in male names; cf. the -zō reading of 參 san < samu in 見參 genzō
三 -zō has even spread into names that never had *nasality in their first elements: e.g., 浩三 ~ 巧三 Kōzō < kau + samu as well as 幸三 ~ 耕三 Kōzō < kaũ + samu.
But there never was any nasality in 泰 Tai.
So it seems the -z- in Taizan is a case of voicing the initial of a second element of a compound (rendaku) - a morphological alteration rather than a product of phonetic conditioning.
2.27.13:57: Later last night I realized I had known an example of 'mountainous' rendaku since childhood:
火 ka 'fire' + 山 san 'mountain' = 火山 kazan 'volcano'
a word that came up in at least one of my elementary school
Japanese-language textbooks. Back then I never questioned why 山 had an
irregular reading -zan. I don't remember misreading 火山 as
regular ˟kasan (though I've made a lot of other reading
mistakes over my lifetime). And now, about forty years later, I know
that 火 ka 'fire' never had any nasality that would condition
the voicing of the s- of 'mountain'.
5. Sanskrit japa- 'muttering prayers' is a straightforward a-noun derived from the root jap 'mutter'. This etymology is nonsense:
It [japa-] can be further defined as ja to destroy birth, death, and reincarnation and pa meaning to destroy ones sins.
Chinese-like monosyllabic analyses of Sanskrit words are usually
dubious. There is no such ja or pa. In fact, there is a
-ja- '-born' (not 'destroy birth'), a pāpa-
'sin', and a -pa- '-protecting' (not
pa 'destroy one's sins'). I presume the 'analysis' of japa
was influenced by ja and pāpa-.
One case in which a Chinese-like monosyllabic analysis really is
true is Sanskrit khaga- 'bird' from kha 'void' and ga
'go'. There is no root khag-.
6. If I didn't see the caption identifying this picture as being of
I wouldn't be able to identify the non-English script. This is only
the second time I've seen handwritten Bengali. (The first was a sample
of Tagore's handwriting - possibly this
image.) I'm familiar with so many scripts only in typeset form. I
wish I had more access to original Khitan and Jurchen texts.
7. Phrase of the day: pizza effect.
8. Tonight I started reading volume II of 永野護 Nagano Mamoru's The Five Star Stories. (Windows 10's IME has Nagano's name built in!) Nagano has his own idiosyncratic readings of kanji:
南 minami 'south' as Sazando (p. 16)
北 kita 'north' as Nouzu (p. 17)
Sazando (< Sazan < Southern) and Nouzu (< Nōsu < North) are names of two of the five stars in The Five Star Stories
神様 kami-sama 'god-HON' as Maitorēya 'something greater than a god' (p. 20)
WHITE RAT 2.3
? qulugh ai ? sair ? nyair
'white rat year, two month, three day'
1. Today I offer a digital Mardi Gras feast of three entries posted
at the same time. (I'll post the entries for the second half of last
2. Perhaps it was yesterday afternoon when I realized that the Khitan small script character
235 <ri> (the interpretation in Kane [2009: 62])
might be derived from Chinese 礼 (pronounced *li in Liao
Chinese). That derivation won't work for Shimunek's interpretation of
235 as <ir>, a possibility Kane (2009: 62) also acknowledges.
3. These Jurchen characters look related:
<he> and <ke>
They are even more similar as printed in Kiyose (1977) which has an identically sized 人 component in both:
<he> and <ke>
I proposed that <he> might be a graphic cognate of Chinese 黑;
if so, then <ke> is <he> with an elongated first stroke.
WHITE RAT 2.2
? qulugh ai ? sair ? nyair
'white rat year, two month, two day'
1. Why was Persian khwāja
borrowed into Manchu (via some Turkic language?) as hojo?
I can't find any other
version of the word ending in -o.
2. While copying the Golden Guide last night, it finally occurred to me that Tangut 𗂅 2384 2me4 'minister' is a semantic compound of 'hand' + 'person' reminiscent of English right-hand man (though 'hand' is on the left of the Tangut character!).
3. Yesterday I saw Japanese 見れます miremasu, short for miraremasu 'can see' on p. 102 of vol. 1 of The Five Star Stories. That got me to look into the phenomenon of 「ら」抜き ra-drop discussed here.
Normally the potential is -raremasu after vowel stems and -emasu
after consonant stems: e.g., kak-emasu 'can write'. mi-
'see' is a vowel stem, but miremasu looks as if it contains a
(nonexistent) consonant stem mir-. I wonder if future Japanese
will make more (or all) verbs consonant stems.
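The regular pattern and the ra-drop variant can be sketched in a few lines of Python. This is only a toy model over romanized stems: the function name `potential` and the `ra_drop` flag are made up for illustration, and the sketch ignores irregular verbs such as suru and kuru as well as w-stems.

```python
VOWELS = set("aeiou")


def potential(stem: str, ra_drop: bool = False) -> str:
    """Return a polite potential form for a romanized verb stem.

    Simplified assumption: a stem ending in a vowel is a vowel stem,
    and any other stem is a consonant stem.
    """
    if stem[-1] in VOWELS:
        # Vowel stems: -raremasu, or -remasu with ra-drop.
        suffix = "remasu" if ra_drop else "raremasu"
    else:
        # Consonant stems: -emasu.
        suffix = "emasu"
    return stem + suffix


standard = potential("mi")                  # mi-raremasu
dropped = potential("mi", ra_drop=True)     # mi-remasu
consonant = potential("kak")                # kak-emasu
reanalyzed = potential("mir")               # hypothetical stem mir- + -emasu
```

In this sketch the ra-dropped form of mi- is identical to the output for the hypothetical consonant stem mir-, which is why miremasu looks as if it contains such a stem.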
4. Speaking of the verb miru 'see', Wiktionary derives it from Proto-Japonic *miu and says it is cognate with me 'eye'.
The reasoning behind *miu seems to be that the regular
conclusive ending is *-u (cf. consonant stems like kak-u
'write') and that in Proto-Japonic, *-u was added as is to all
stems, whereas in Japanese, a buffer -r- was inserted after
vowel stems: *mi-u > mi-ru but *kak-u
> kak-u. One could propose the reverse: *mi-ru > mir-u
but *kak-ru > kak-u.
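The two competing derivations can be compared side by side in a small Python sketch. Both functions are hypothetical illustrations over romanized stems, not a reconstruction anyone has published in this form: the first assumes an original ending *-u with a buffer -r- inserted after vowel stems, and the second assumes an original ending *-ru with *r lost after consonant stems.

```python
VOWELS = set("aeiou")


def conclusive_by_insertion(stem: str) -> str:
    """Hypothesis 1: ending *-u; -r- is inserted after a vowel stem."""
    if stem[-1] in VOWELS:
        return stem + "r" + "u"
    return stem + "u"


def conclusive_by_deletion(stem: str) -> str:
    """Hypothesis 2: ending *-ru; *r is deleted after a consonant stem."""
    if stem[-1] in VOWELS:
        return stem + "ru"
    return stem + "u"  # *kak-ru > kak-u


forms_1 = [conclusive_by_insertion(s) for s in ("mi", "kak")]
forms_2 = [conclusive_by_deletion(s) for s in ("mi", "kak")]
```

Both hypotheses produce the same attested forms miru and kaku, so the attested forms alone cannot decide between them.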
I don't think there is any relationship between mi- 'see'
and me < *ma-i 'eye'. I know of no other examples of
Ci-verbs corresponding to *Ca-i nouns.
5. Is the Gundam robot name 笑倣江湖 <LAUGH IMITATE RIVER LAKE> Shōhō Kōko a play on the title 笑傲江湖 Shōgō kōko <LAUGH PROUD RIVER LAKE>, known in English as The Smiling, Proud Wanderer?
江湖 <RIVER LAKE> is not the sum of its parts.
6. Sven Osterkamp on Vietnamese in Japanese transcription and vice versa.
Just two interesting examples:
Vietnamese một [mot] 'one' transcribed as
Japanese moru (clearly not representing a Vietnamese
dialect in which -ôt became [ok]).
Japanese mizu [mi(d)zɯ] 'water' transcribed as
monosyllabic Vietnamese 篾 miệt [miət] (again,
not based on a Vietnamese dialect with [t] > [k])
7. 篾 is a rare example of a nom character that is a semantogram: it
can represent both Sino-Vietnamese miệt 'bamboo splints' and
its unrelated native Vietnamese synonym giá 'id.' (Is giá
'bamboo splints' an obsolete word? I can't find it outside of nomfoundation.org.)
8. Last night I heard "avant-garbage" on The Goldbergs. I'll
store that for future use.
9. Last night I finally learned the
etymology of やおい yaoi:
yama[ba] nashi, ochi nashi, imi nashi
'no climax, no denouement, no meaning'
It sounds like a backronym.
10. Last night I learned that the Czech version of Mardi Gras is masopust. Maso is
'meat', but pust has a short vowel unlike půst
'fast'. Wiktionary says masopůst is obsolete. Why was ů
shortened? Because it wasn't stressed? But Czech has lots of unstressed
long vowels in noninitial syllables. (Czech stress tends to be on the
11. Going eastward: I was surprised to learn that the
Slovakian prime minister is named Peter Pellegrini.
His Italian name is pronounced like a Slovakian word, so -ni is
[ɲi] and not [ni]. To Italian ears his name might sound like Pellegrigni.
12. I had heard of Pellegrini's predecessor Robert Fico.
What is the etymology of that name? F seems to rule out a Slavic
origin, but it doesn't sound like any non-Slavic European name that I
can think of.
13. I didn't know about Nagamese Creole
until today. I can't remember ever seeing -m- as a buffer
between a vowel-final base and -ese before.
14. This looks like a folk etymology of the ethnonym Naga:
The term "Naga" is derived from a Burmese language word "Naka", which mean "Pierced ears" or "People with pierced ears". Piercing of ears is common tradition of the said people.
The Burmese name of the Naga is နာဂ <nāga> from Pali nāga- 'naga'. 'Ear' is နား <nāḥ> with a high tone, not နာ <nā> with a low tone, and there is no က <ka> or ဂ <ga> meaning 'pierced'.
15. SEAlang's Burmese dictionary defines Burmese နာဂ <nāga> as "a Tibet-Myanmar speaking ethnic group inhabiting the hilly north-west region along the Myanmar-India border." There's a synonym of Tibeto-Burman
I've never seen before: Tibet-Myanmar.
16. Is Nefamese
another Assamese-based pidgin, or is it something else?
17. It just occurred to me that Taic by analogy with Turkic
and Mongolic could be used in English to avoid the
confusing homophony of Tai and Thai. It's tiring to
explain to nonspecialists the difference between Tai and Thai.
Part of me feels like using Dai to make the connection with Kra-Dai
obvious, but Dai
is already an ethnonym. Maybe Daic and Kra-Daic?
18. Entries 360-371 and 376-381 in the Sino-Jurchen vocabulary of the Bureau of Translators have the format
Jurchen X : Chinese Y
Jurchen X : Chinese Z
That shows the semantic range of Jurchen words.
19. Today's Honolulu Star-Advertiser reprinted a 1963 photo
with a Hawaii tourism ad translated into Japanese with the slogan:
omotta yori majika de ... yume mita yori mo utsukushii
thought than close be ... dream saw than even beautiful
'Closer than you thought ... even more beautiful than you dreamt'
Note how 思つた omotta 'thought' is spelled in the prewar
manner with a full-size つ <tsu>. The postwar spelling is 思った
with a reduced <tsu>.
In the ad, the kanji 近 has its prewar form with two dots at the top
left instead of just one.
I suspect the slogan was translated in Hawaii by someone who still
had not shifted to postwar orthography.
20. Did the Japanese sentence-final -ものを mono wo (lit. 'thing ACC') construction in
damat-te-i-reba ii mono wo
lit. 'silent-CVB-be-if good thing ACC'
'I wish I had just not said anything'
(example from this site which has an explanation and more examples)
originate as an abbreviation of mono wo followed by a verb (like 'wish')?
21. Big news from Andrew West:
In May 2015 the National Library of China acquired 18 bundles of Tangut documents in a very poor state from a book dealer in Yinchuan who had contacted eminent Tangutologist Prof. Shi Jinbo.
If only similar Khitan and Jurchen-language items could be found.
The last new Pyu-language discovery I know of was from four years
ago. I'm sure there will be more.
22. Stephen Colbert just pronounced 习 Xí [ɕi] (as in Jinping) as [ʒiː]. Sigh.
126.96.36.199:23: WHITE RAT 2.1
? qulugh ai ? sair ? nyair
'white rat year, two month, one day'
1. I just changed the nonsense string "par juri" 'ten twenty' to "juri" 'twenty' in my transcriptions of Khitan dates. So many copy-and-paste mistakes. I copy and paste to save time, but my carelessness publicly embarrasses me until I make and upload fixes. Which takes time!
(2.25.1:20: I wrote topic 2 thinking the date was 1.30, but in fact
it was 2.1 since Khitan
months can have only 29 days.)
2. If Khitan juri 'twenty' is from jur 'two' plus -i, then I would guess that 'thirty' might be something like guri: gur 'three' plus -i. But I don't really know, and I'm not even sure the Khitan word for 'three' is gur. And I can't explain the Khitan large script character
which doesn't look like Chinese 卅 <THIRTY>. And Chinese 卅 <THIRTY> (a combination of three 十 <TEN>s) does resemble ...
which is obviously a combination of four lines.
3. Starting today, the last post on the index page is followed by a link to the previous week's page. I've also added links to previous weekly pages at the bottom of weekly pages from the 19.12.22-19.12.28 page onward. Those are no substitute for updating the archives page, I know. In the meantime one can use search engines to find older posts.
4. A friend gave me an interesting pamphlet, M. Vrdalj's Engleski sa izgovorom (Beograd: Jovan, 2006). For now I'll just comment on the title on the front cover: I would expect sa 'with' to be s before izgovorom 'pronunciation.INS.SG.' (See Wiktionary's usage notes.) And indeed s izgovorom is the title on the inside front cover and the National Library of Serbia listing in the back. So does the front cover have a typo nobody caught, or is it acceptable for sa to be used before vowels?
I expect Slavic *sŭ 'with' to become s (or z) everywhere except in certain environments where it has a buffer vowel (i.e., a remnant of *ŭ): e.g., Ukrainian
зі сестрою zi sestroju 'with [one's] sister'
found at ukrainianlanguage.org.uk. But ukrainiangrammar.com has
з сестрою z sestroju 'with [one's] sister'
without a buffer vowel.
"з сестрою": about 1,770,000
"зі сестрою": about 247,000
Interslavic has both s and so, but it's not clear to me when to use
so. Merunka's (2018: 51) grammar only mentions s.
5. Why is it taking so long to develop the
AP Russian Language and Culture exam?
AP Russian Language and Culture is a proposed Advanced Placement course and examination, in development since 2005. [...] The program was meant to launch between 2007-2008.
A prototype exam was administered to students in 2010.
But the real exam still doesn't exist ten years later.
6. Numbers of students taking Advanced Placement language tests in 2019:
Students taking Spanish AP exams outnumber students taking all other AP language exams by a ratio of four to one. I took the German AP exam in 1989.
7. Scott Pelley on 60 Minutes pronounced Calehr as [kjl]. What language does that name originate from? Samira Calehr's sons Shaka and Miguel were going to see their grandmother in Bali. I would expect Calehr to be pronounced something like [tʃaləhr] in Indonesian. How is the name pronounced in the Netherlands where Calehr's family lives?
The family has an interesting mix of first names. Samira is
Arabic, Miguel is Spanish, and I can't tell what Shaka
and Mika (Samira's third son) are.
8. Another shaka is a mysterious word from Hawaii that isn't Hawaiian (Wiktionary points out sh isn't in Hawaiian) and doesn't seem to be from any language here. I wish I could do a full-text search of the local press to see how early it appeared in print.
9. Wikipedia's Help:IPA/Malay page says:
The dental fricatives [θ, ð] are found solely in Arabic loanwords [in Malay (Malaysian and Indonesian)], but the writing is not distinguished from the Arabic loanwords containing the [s, z] sounds and these sounds must be learned separately by the speakers.
So [θ ð] are written as <s z> like [s z]. On the other hand, Wikipedia's
article on Malay phonology gives redha (not reza)
'good will'¹ as an example of /ð/. It notes that
Before 1972, this sound [θ] was written ⟨th⟩ in Standard Malay (but not Indonesian).
Why wasn't <th> adopted for [θ] in the 1972 orthographies?
¹Not a meaning I can find in the dictionaries at SEAlang.
10. I just heard a McDonald's commercial ending with "made perfecter" (sic). I suppose the irony is supposed to be funny. I wonder how many English learners might take that phrase as a model.
11. My 1987 printing of vol. 1 of The Five Star Stories has the Japanese phrase
sono himitsu (hitotsu ya futatsu dewanai)
that secret (one ya two not.be)
on the second color page. English requires one to say either that secret or those secrets, whereas sono himitsu can refer to one or more secrets. The author has given the impression up to that point that there is only one secret but then reveals the surprise that there are more than just one or two. To translate sono himitsu as those secrets would give away the surprise, but that secret (not just one or two) sounds odd.
The numerals are spelled inconsistently in the original as <hi to tsu> and <2 tsu> - <2> representing futa- 'two'.
ya is hard to translate precisely. X ya Y means 'X
and Y (and others)', not just 'X and Y'.
has dehanai as a romanization of ではない <de ha na i>
dewanai. Sigh. Such transliterations are dangerous for
those who don't know how the word is pronounced.