1. Is this why Tangut is simpler than Japhug?

When a language seems especially telegraphic, usually another factor has come into play: Enough adults learned it at a certain stage in its history that, given the difficulty of learning a new language after childhood, it became a kind of stripped-down "schoolroom" version of itself. Because all languages are, to some extent, busier than they need to be, this streamlining leaves the language thoroughly complex and nuanced, just lighter on the bric-a-brac that so many languages pant under.

I'm surprised McWhorter thinks Mandarin has tense:

Mandarin can mark tense but often doesn’t

2. When machines judge machines (emphasis mine):

A Google AI research team recently published the paper Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges, proposing a universal neural machine translation (NMT) system trained on over 25 billion examples that can handle 103 languages.


There are a couple of limitations in the experimental results. First, it is not clear which points in the figures correspond to which languages, so it is hard to get finer-grained takeaways about which types of languages are benefitting from this type of training. Second, there are no qualitative results or translation examples, only results measured using automatic measures such as BLEU score. Because of this it is hard to tell which of these systems have reached a practical level.

(11.3.21:15: What is BLEU?

BLEU (bilingual evaluation understudy) is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Quality is considered to be the correspondence between a machine's output and that of a human: "the closer a machine translation is to a professional human translation, the better it is" – this is the central idea behind BLEU.)
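To make "correspondence between a machine's output and that of a human" concrete, here is a minimal sketch of a BLEU-style score in Python. It is not the reference implementation: the function names are my own, it scores one sentence against a single reference, and it only goes up to bigrams by default (an assumption made to keep the example short; standard BLEU is computed over a whole corpus with n-grams up to 4).

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference, with each
    n-gram's count clipped to its count in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, reference, max_n=2):
    """Geometric mean of the modified precisions for n = 1..max_n,
    multiplied by a brevity penalty for short candidates.
    max_n=2 is an illustrative default, not the standard setting."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity = 1.0 if c > r else math.exp(1 - r / c)
    return brevity * math.exp(log_mean)


reference = "the cat is on the mat".split()
print(modified_precision(["the"] * 4, reference, 1))  # clipping: 2 of 4 count
print(bleu(reference, reference))
```

The clipping step is why a degenerate output like "the the the the" cannot score well just by repeating a common word. It also shows why the complaint quoted above matters: a score like this says how much a translation overlaps with a reference, not whether it is usable.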

3. Surprising kanji readings of the day:

3a. 日暮 Nippori (even though 暮 isn't -pori or hori < *p- elsewhere). The spelling 日暮里 has a clarifier 里 ri at the end to indicate that 日暮 isn't read higure or higurashi with normal readings for both 日 and 暮.

(11.3.20:55: Is the reading pori for 暮 rooted in the normal reading bo for 暮? Is 日暮 Nippori a clipped version of 日暮里, whose three parts in isolation would be read as nichi, bo, and ri?)

3b. 麻布 Azabu (even though 麻 is normally asa, not aza). I can't think of any other case of voicing being ignored in the middle of a reading to write a similar-sounding name.

FAN VS. HAN

Words for 'outer' groups of people can be hard to explain and translate. Using a local example, is haole always pejorative?

Shao-yun Yang's "Fan and Han: The Origins and Uses of a Conceptual Dichotomy in Mid-Imperial China, ca. 500-1200" (2014) is of interest to TJK (Tangut/Jurchen/Khitan) scholars:

However, the dichotomy eventually became ethnic in the Kitan [= Khitan] Liao and Western Xia [= Tangut Empire], where Han reverted to being an ethnonym for the "Chinese." Our understanding of the word Fan as used in the Kitan empire remains incomplete, but one of its uses was as a synonym for Kitan. Similarly, the primary use of Fan in the Xia was as a synonym for Mi, the ruling Tangut people’s most common self-appellation. [...] Meanwhile, the Jin  [= Jurchen Empire] revived the use of Han as an ethnonym for the Chinese in the North China Plain, but banned the use of Fan as an appellation for the ruling Jurchen and their language in 1191¹—possibly as a way of asserting the political legitimacy of Jurchen rule over north China.

The article makes me think how dangerous it can be to be wedded to the simplistic tag translations that I and pretty much everyone else use for Chinese morphemes: e.g., <BARBARIAN> for 蕃 ~ 番 Fan and <CHINESE> for 漢 Han. There is an unspoken assumption that the semantics of Chinese morphemes are just as 'unchanging' as character forms, but this is not so - meanings varied over time and space. In this respect, Chinese is no different from any other language or language family. What is different about Chinese is the illusion of semantic stability implied by the graphic stability of character forms. Chinese has the world's most conservative orthography. It would be nice to see a dictionary that goes beyond the simple dichotomy of premodern/literary vs. modern/colloquial (Mandarin) and says that, for instance, 蕃 ~ 番 meant this in one time and place, that in another time and place, etc.

¹11.2.22:29: I don't think it's a coincidence that the ban occurred during this period described by Kane (2009: 3-4):

In 1191 the Jin emperor Zhangzong ordered that Jurchen should be directly translated into Chinese [rather than via Khitan]. Clerks of the Department of National Historiography who knew only the Kitan script[s] were dismissed. In 1192, the position of Kitan secretary was abolished in all ministries. [...] The Kitan script[s were] abolished by the Jin Emperor Zhangzong in 1191-1192.

The abolishing of Khitan (a 'Fan' language) was part of the trend of distancing the Jurchen from the 'Fan' category.

XIAOXUETANG READINGS OR TRANSLATIONS?

I've been looking in 小學堂 Xiaoxuetang for w/v-readings of 扔 <THROW> cognate to Cantonese wing1. So far the only such reading I've found is 斗門 Doumen Yue veŋ1. No w/v-readings in Hakka or Pinghua varieties.

Is "readings" the right word? Looking at the Pinghua entry for 扔 <THROW>, I see pronunciations for morphemes not cognate to the pan-Chinese¹ root 扔 *ɲ̊iŋ:

As far as I know, no Pinghua varieties are written. So I wonder how the above data were elicited. Were speakers asked to translate Mandarin 扔 reng1 'to throw' into their native languages?

In 1992 I had difficulty eliciting Taiwanese readings for Chinese characters from a Taiwanese speaker. In some cases she only knew the Mandarin reading for a character. That seems inevitable if one is only educated in Mandarin and isn't taught to read lower-frequency and literary characters in one's own language. I wouldn't expect her to know the Taiwanese reading of, say, the literary perfective particle 矣. How did Xiaoxuetang get Pinghua readings of 矣?

11.1.22:23:22: The ROC Ministry of Education Taiwanese dictionary lists ah as the reading of 矣, but I think ah is a native Taiwanese word 'anterior aspect particle' (as defined by Philip T. Lin [2015]) written semantically with a character 矣 <PERFECTIVE> originally for an unrelated particle whose Taiwanese pronunciation should be something like i. (Xiaoxuetang says the reading in 漳州 Zhangzhou, a close mainland relative of Taiwanese, is i.) ah is to i what Pinghua pʰi, sa, and tiu are to Mandarin reng1 and other reflexes of pan-Chinese *ɲ̊iŋ: a (loose) semantic equivalent rather than a cognate. I would say that ah is probably a Taiwanese reading of 矣 which I imagine has another i-like reading used when literary Chinese texts are read out loud in Taiwanese (something that probably doesn't happen much anymore - hence the absence of an i-like reading in the government dictionary). I doubt scholars in premodern Taiwan read 矣 as ah.

I wonder how old the practice of writing ah as 矣 is. A historical dictionary of Taiwanese orthography showing all the competing spellings and their dates would be fun.

I was surprised that the Maryknoll Taiwanese dictionary (now in Excel as well as PDF format!) doesn't have an entry for the particle ah.

¹I use the term 'pan-Chinese' for a form inferred from modern forms rather than based on philological data like 'Middle Chinese' or 'Old Chinese'.

KIRWANDAN = KINYARWANDA?

1. From one IMDb biography for Stephanie Katherine Grant:

There aren't many thirteen year olds who speak Greek, French and Kirwandan

Is Kirwandan another name for Kinyarwanda? I wouldn't have guessed how <rw> is pronounced in Kinyarwanda.

(10.31.20:55: Lovely, I might be able to answer my question if I paid $480/year so I could see the Ethnologue entry for Rwanda.)

2. I was surprised by this entry in Chalmers and Dealy's English and Cantonese Dictionary, Volume 2 (1907: 809) for two reasons.

X and Y, (algebraic symbols) 天元 t‘in uen, 天地 t‘in tiʾ.

First, the method of marking tones:

sonorant-final syllable
stop-final syllable
roman (no ʾ!)

(10.31.0:33: I can't find volume 1 which presumably had a key to the tone system, so I've had to reverse-engineer the notation for entries in volume 2.)

Second, how 'X' is 天元 <HEAVEN ORIGIN> and 'Y' is 天地 <HEAVEN EARTH>. Who came up with those equivalents?

3. I was also surprised to see that Eitel's A Chinese dictionary in the Cantonese dialect, Part 1 (1877: xiii) says Cantonese has an u : uu distinction in addition to an a : aa distinction. Only the latter survives a half-century later (and strictly speaking, its members are distinguished by quality as well as length: [ɐ] : [aː] - but were they only distinguished by quality in Eitel's time?).

KATAHDIN

1. I wondered where the name Katahdin (as in Katahdin sheep) came from:

From Abenaki Ktaden, Ktaaden ("great mountain"), from kta-, gita- ("great") (compare Penobscot kta- ("great")) + aden ("mountain") (whence also monadnock).

The name caught my eye because of the unusual consonant sequence <hd>. I mistakenly syllabified the name as ka-ta-hdin, not ka-tah-din.

2. I had never heard of Abenaki before. It has an asymmetrical vowel system: e.g., its sole nasal vowel is [ɔ̃] which has no oral counterpart. Would an Abenaki speaker have trouble with nasalizing other vowels? With oral [ɔ]?

I've never heard of a noble/ignoble distinction before:

Although there may be occasional exceptions, noble words pertain to living things, and inanimate objects are considered to be ignoble words.

3. The Wiktionary entry for 力 <POWER> draws parallels between these readings:

That got me thinking about the multiple Sino-Korean rhymes corresponding to the Middle Chinese rhyme class of 力 <POWER>:


idealized SK
ʔɯ́k rɯ́k
gɯ́k dík
actual SK
ŏk ryŏk
chik < t-
actual SK rhyme type
jiki < *nd-
choku < ty-
shoku < sy-

The idealized SK readings are from 東國正韻 Tongguk chŏngun (Correct Rhymes of the Eastern Country, 1448).

In Old Chinese, all five of the above words rhymed in *-ək, and in Middle Chinese, all five belonged to the rhyme category that I reconstruct as *-ɨk. Does that mean that ancient Koreans heard Middle Chinese *-ɨk five times and rendered it in three different ways? No. 'Middle Chinese' is a theoretical construct, a generic 6th century dialect that never existed. At least some actual Middle Chinese dialects probably split Old Chinese *-ək in different ways depending on the initial class.

So should I conclude, for example, that actual Sino-Korean borrowed from a Middle Chinese dialect with a five-way split of *-ək? Not necessarily. Although Sino-Korean is relatively uniform, it does have multiple strata reflecting one or more source dialects at one or more points of time. Two obvious candidate source dialects are the prestige dialect of the distant Tang capital and the nearby dialect of the northeast. My guess is that the five rhyme types are from a mixture of two or more strata, and none of the strata have a five-way split.

POWER BRANCH

1. Before getting to the title, let me branch off from the first of yesterday's topics and link to Victor Mair's post about 梘 in Taiwan.

2. Cantonese 攰 <BRANCH.POWER> gui6 'tired' has an unusual phonetic 支 <BRANCH> which normally has four types of phonetic values:

None of the above readings have a labial vowel. -(e)i phonetics do not normally occur in graphs for -ui syllables.

If I were to design a character for gui6, it would be


since 貴 gwai3 is phonetic in 憒潰瞶繢聵䙡䠿䫭 kui2.

3. Cantonese 扔 wing1 'to throw' has an unusual phonetic 乃 which normally has two types of phonetic values:

wing1 has -ing like the jing-type but has a w- which is unusual (or even unique?) for a *dental phonetic series.

The first tone indicates a *voiceless initial, so wing1 cannot go back to voiced *n- or *w-.

I don't have time to go through the comparative Yue data for 扔 right now. At a glance the picture looks complex.

4. Why is 賽 <COMPETE> coi3 in Cantonese instead of the expected soi3 (cf. Middle Chinese *səjʰ and Mandarin sài)?

WOODEN SIGHT

1. Cantonese gaan2 'soap' is written as 梘 <WOOD.SEE>. 木 muk6 'wood' isn't an obvious semantic component to me (though I just learned that soap can be made from potash [i.e., wood ash] - whose name is from pot ash¹!). 見 gin3 'to see' is from Old Chinese *kens, and its derivatives normally end in -in < *-en in Cantonese. Most Middle Chinese readings of 梘 ended in *-en:

So I'm surprised that 見 gin3 is phonetic in a character for an -aan syllable.

What is the etymology of gaan2 'soap'? I don't know of a Mandarin cognate. ('Soap' is 肥皂 féizào in Mandarin. féi is 'fat' and soap was made from 皂莢 zàojiá 'Chinese honey locust'.) Meixian Hakka 鹼 gian2 'soap' sounds like a cognate.

鹼 <SOAP> should theoretically be Meixian Hakka †giam2. As *-m > -n is neither a morphological nor a regular phonological change in Hakka, it seems that gian2 'soap' was written semantically as 鹼 <SOAP>. Could Meixian Hakka have gotten the word from a language with *-m/*-n merger?

I would expect the Cantonese cognate of Meixian Hakka gian2 'soap' to be †gin2, not gaan2. (Cantonese has no -ia-.) Could the graph 梘 also be borrowed from that language? What if 見 in that language was pronounced like Meixian Hakka gian3 'to see'? Summing up my hypothetical scenario:

¹Which is pronounced [ˈpɒtæʃ]. Until now I assumed it was [ˈpowtæʃ].

2. Tonight 60 Minutes had a segment on pandas.

2a. 0:06: Scott Pelley pronounced 熊猫 xióngmāo [ɕjʊŋ˧˥ maw ˥] 'panda' as [ʃɔŋmaw]. [ɔ] reflects written Pinyin <o> rather than the spoken vowel [ʊ]. I don't understand why Pinyin has <o> for [ʊ].

2b. 3:40:

The common name, "panda," means "bamboo eater."

That might make most viewers think panda is 'Chinese' for 'bamboo eater', but I know of no such Chinese phrase, and the etymology seems unclear to me. (I don't find the Tibetan etymology convincing: pho nya has no d.)

3. On God Friended Me tonight there was a character named Rakesh. Sanskrit Rākeśas is a name for Shiva that looks like a compound of rāka- + īśas 'lord'. The trouble is that there is no word rāka- attested outside dictionaries which define it as 'quiver; wealth, money; sun'. Could one or more of those definitions be a guess for the rāka- in Rākeśas which might have originated as 'lord' plus some non-Sanskrit word (or placename)?

