MCWHORTER'S SIMPLIFICATION HYPOTHESIS
I'm surprised McWhorter thinks Mandarin has tense:
When a language seems especially telegraphic, usually another factor has come into play: Enough adults learned it at a certain stage in its history that, given the difficulty of learning a new language after childhood, it became a kind of stripped-down "schoolroom" version of itself. Because all languages are, to some extent, busier than they need to be, this streamlining leaves the language thoroughly complex and nuanced, just lighter on the bric-a-brac that so many languages pant under.
Mandarin can mark tense but often doesn’t
MACHINES JUDGE MACHINES
(Emphasis mine.)
A Google AI research team recently published the paper Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges, proposing a universal neural machine translation (NMT) system trained on over 25 billion examples that can handle 103 languages.
There are a couple of limitations in the experimental results. First, it is not clear which points in the figures correspond to which languages, so it is hard to get finer-grained takeaways about which types of languages are benefitting from this type of training. Second, there are no qualitative results or translation examples, only results measured using automatic measures such as BLEU score. Because of this it is hard to tell which of these systems have reached a practical level.
(11.3.21:15: What is BLEU?
BLEU (bilingual evaluation understudy) is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Quality is considered to be the correspondence between a machine's output and that of a human: "the closer a machine translation is to a professional human translation, the better it is" – this is the central idea behind BLEU.)
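To make that definition a bit more concrete, here is a minimal Python sketch of the core BLEU computation: clipped ("modified") n-gram precision, a geometric mean over 1- to 4-grams, and a brevity penalty. It assumes pre-tokenized sentences, uniform weights, and no smoothing. The function names are my own (hypothetical), and standard BLEU is computed over a whole test corpus rather than one sentence at a time, so treat this as an illustration, not a reference implementation:

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU for one tokenized candidate against tokenized references."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        if not cand_counts:
            return 0.0  # candidate too short to have any n-grams of this size
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, one zero precision makes the geometric mean zero
        log_precision_sum += math.log(clipped / sum(cand_counts.values()))
    # Brevity penalty: compare with the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_n)


print(sentence_bleu("the cat sat on the mat".split(), ["the cat sat on the mat".split()]))
```

Clipping is what keeps a candidate like "the the the the" from scoring well just by repeating a common reference word, and the brevity penalty keeps very short candidates from scoring well on precision alone. For real evaluations, an established implementation such as NLTK's nltk.translate.bleu_score module or sacreBLEU would be a safer choice.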
3. Surprising kanji readings of the day:
3a. 日暮 Nippori (even though 暮 isn't -pori or hori < *p- elsewhere). The spelling 日暮里 has a clarifier 里 ri at the end to indicate that 日暮 isn't read higure or higurashi with the normal readings of 日 and 暮.
(11.3.20:55: Is the reading pori for 暮 rooted in the normal reading bo for 暮? Is 日暮 Nippori a clipped version of 日暮里, whose three parts in isolation would be read as nichi, bo, and ri?)
3b. 麻布 Azabu (even though 麻 is normally asa, not aza). I can't think of any other case of voicing being ignored in the middle of a reading to write a similar-sounding name.
220.127.116.11:41: FAN VS. HAN
Words for 'outer' groups of people can be hard to explain and translate. To take a local example: is haole always pejorative?
Shao-yun Yang's "Fan and Han: The Origins and Uses of a Conceptual Dichotomy in Mid-Imperial China, ca. 500-1200" (2014) is of interest to TJK (Tangut/Jurchen/Khitan) scholars:
However, the dichotomy eventually became ethnic in the Kitan [= Khitan] Liao and Western Xia [= Tangut Empire], where Han reverted to being an ethnonym for the "Chinese." Our understanding of the word Fan as used in the Kitan empire remains incomplete, but one of its uses was as a synonym for Kitan. Similarly, the primary use of Fan in the Xia was as a synonym for Mi, the ruling Tangut people’s most common self-appellation. [...] Meanwhile, the Jin [= Jurchen Empire] revived the use of Han as an ethnonym for the Chinese in the North China Plain, but banned the use of Fan as an appellation for the ruling Jurchen and their language in 1191¹—possibly as a way of asserting the political legitimacy of Jurchen rule over north China.
The article makes me think how dangerous it can be to be wedded to the simplistic tag translations that I and pretty much everyone else use for Chinese morphemes: e.g., <BARBARIAN> for 蕃 ~ 番 Fan and <CHINESE> for 漢 Han. There is an unspoken assumption that the semantics of Chinese morphemes are just as 'unchanging' as character forms, but this is not so: meanings varied over time and space. In this respect, Chinese is no different from any other language or language family. What is different about Chinese is the illusion of semantic stability implied by the graphic stability of character forms. Chinese has the world's most conservative orthography. It would be nice to see a dictionary that goes beyond the simple dichotomy of premodern/literary vs. modern/colloquial (Mandarin) and says that, for instance, 蕃 ~ 番 meant this in one time and place, that in another time and place, etc.
¹11.2.22:29: I don't think it's a coincidence that the ban occurred during this period described by Kane (2009: 3-4):
In 1191 the Jin emperor Zhangzong ordered that Jurchen should be directly translated into Chinese [rather than via Khitan]. Clerks of the Department of National Historiography who knew only the Kitan script[s] were dismissed. In 1192, the position of Kitan secretary was abolished in all ministries. [...] The Kitan script[s were] abolished by the Jin Emperor Zhangzong in 1191-1192.
The abolition of Khitan (a 'Fan' language) was part of the trend of distancing the Jurchen from the 'Fan' category.
18.104.22.168:41: XIAOXUETANG READINGS OR TRANSLATIONS?
I've been looking in 小學堂 Xiaoxuetang for w/v-readings of 扔 <THROW> cognate to Cantonese wing1. So far the only such reading I've found is 斗門 Doumen Yue veŋ1. I found no w/v-readings in Hakka or Pinghua varieties.
Is "readings" the right word? Looking at the Pinghua entry for 扔 <THROW>, I see pronunciations for morphemes not cognate to the pan-Chinese¹ root 扔 *ɲ̊iŋ:
兩江 Liangjiang pʰi < *final stop (does this have any cognates in other varieties/languages?)
五通 Wutong sa < *final stop
青龍 Qinglong tiu1 (and similar t-forms in other
As far as I know, no Pinghua varieties are written. So I wonder how the above data were elicited. Were speakers asked to translate Mandarin 扔 reng1 'to throw' into their native languages?
In 1992 I had difficulty eliciting Taiwanese readings for Chinese characters from a Taiwanese speaker. In some cases she only knew the Mandarin reading for a character. That seems inevitable if one is only educated in Mandarin and isn't taught to read lower-frequency and literary characters in one's own language. I wouldn't expect her to know the Taiwanese reading of, say, the literary perfective particle 矣. How did Xiaoxuetang get Pinghua readings of 矣?
11.1.22:23:22: The ROC Ministry of Education Taiwanese dictionary lists ah as the reading of 矣, but I think ah is a native Taiwanese word 'anterior aspect particle' (as defined by Philip T. Lin) written semantically with a character 矣 <PERFECTIVE> originally for an unrelated particle whose Taiwanese pronunciation should be something like i. (Xiaoxuetang says the reading in 漳州 Zhangzhou, a close mainland relative of Taiwanese, is i.) ah is to i what Pinghua pʰi, sa, and tiu are to Mandarin reng1 and other reflexes of pan-Chinese *ɲ̊iŋ: a (loose) semantic equivalent rather than a cognate. I would say that ah is probably a Taiwanese reading of 矣, which I imagine has another i-like reading used when literary Chinese texts are read out loud in Taiwanese (something that probably doesn't happen much anymore, hence the absence of an i-like reading in the government dictionary). I doubt scholars in premodern Taiwan read 矣 as ah.
I wonder how old the practice of writing ah as 矣 is. A historical dictionary of Taiwanese orthography showing all the competing spellings and their dates would be fun.
I was surprised that the Maryknoll Taiwanese dictionary (now in Excel as well as PDF format!) doesn't have an entry for the particle ah.
¹I use the term 'pan-Chinese' for a form inferred from modern forms rather than one based on philological data like 'Middle Chinese' or 'Old Chinese'.
22.214.171.124:59: KIRWANDAN = KINYARWANDA?
1. From one IMDb biography for Stephanie Katherine Grant:
There aren't many thirteen year olds who speak Greek, French and Kirwandan
Is Kirwandan another name for Kinyarwanda? I wouldn't have guessed how <rw> is pronounced in Kinyarwanda.
(10.31.20:55: Lovely. I might be able to answer my question if I paid $480/year so I could see the Ethnologue entry for Rwanda.)
2. I was surprised by this entry in Chalmers and Dealy's English and Cantonese Dictionary, Volume 2 (1907) for two reasons:
X and Y, (algebraic symbols) 天元 t‘in uen, 天地 t‘in tiʾ.
First, the method of marking tones:
(Table of tone marks not preserved; the only surviving cell reads "roman (no ʾ!)".)
(10.31.0:33: I can't find volume 1, which presumably had a key to the tone system, so I've had to reverse-engineer the notation for entries in volume 2.)
Second, the equivalents: 'X' is 天元 <HEAVEN ORIGIN> and 'Y' is 天地 <HEAVEN EARTH>. Who came up with those?
3. I was also surprised to see that Eitel's A Chinese Dictionary in the Cantonese Dialect, Part 1 (1877: xiii) says Cantonese has an u : uu distinction in addition to an a : aa distinction. Only the latter survives a half-century later (and, strictly speaking, those two vowels are distinguished by quality as well as length: [ɐ] : [aː]; but were they only distinguished by quality in Eitel's time?).
1. I wondered where the name Katahdin (as in Katahdin sheep) came from:
From Abenaki Ktaden, Ktaaden ("great mountain"), from kta-, gita- ("great") (compare Penobscot kta- ("great")) + aden ("mountain") (whence also monadnock).
The name caught my eye because of the unusual consonant sequence <hd>. I mistakenly syllabified the name as ka-ta-hdin, not ka-tah-din.
2. I had never heard of Abenaki before. It has an asymmetrical vowel system: e.g., its sole nasal vowel is [ɔ̃], which has no oral counterpart. Would an Abenaki speaker have trouble nasalizing other vowels? With oral
I've never heard of a noble/ignoble distinction before:
Although there may be occasional exceptions, noble words pertain to living things, and inanimate objects are considered to be ignoble words.
The Wiktionary entry for 力 <POWER> draws parallels between these pairs:
Go-on riki and Cantonese lik
Kan-on ryoku and Sino-Korean 력 ryŏk
That got me thinking about the multiple Sino-Korean rhymes corresponding to the Middle Chinese rhyme class of 力 <POWER>:
(Table not preserved. Surviving header: "actual SK rhyme type"; surviving cells: chik < t-, jiki < *nd-, choku < ty-, shoku < sy-.)
The idealized SK readings are from 東國正韻 Tongguk chŏngun (Correct Rhymes of the Eastern Country, 1448).
In Old Chinese, all five of the above words rhymed in *-ək, and in Middle Chinese, all five belonged to the rhyme category that I reconstruct as *-ɨk. Does that mean that ancient Koreans heard Middle Chinese *-ɨk five times and rendered it in three different ways? No. 'Middle Chinese' is a theoretical construct, a generic 6th-century dialect that never existed. At least some actual Middle Chinese dialects probably split Old Chinese *-ək in different ways depending on the initial class.
So should I conclude, for example, that actual Sino-Korean borrowed from a Middle Chinese dialect with a five-way split of *-ək? Not necessarily. Although Sino-Korean is relatively uniform, it does have multiple strata reflecting one or more source dialects at one or more points in time. Two obvious candidate source dialects are the prestige dialect of the distant Tang capital and the nearby dialect of the northeast. My guess is that the five rhyme types are from a mixture of two or more strata, and that none of the strata has a five-way split.
1. Before getting to the title, let me branch off from the first of yesterday's topics and link to Victor Mair's post about 梘 in Taiwan.
2. Cantonese 攰 <BRANCH.POWER> gui6 'tired' has an unusual phonetic 支 <BRANCH> which normally has four types of phonetic values:
zi-type < *k-: e.g., 支吱枝肢鳷㩼㩽㽻䓩䧴 zi1
ci-type < *kʰ-?: e.g., 翄翅 ci3
kei-type < *g-: e.g., 岐歧蚑跂㟚㩽䡋 kei4
gei-type < *gr-: e.g., 伎忮技芰妓鬾䰙 gei6
None of the above readings have a labial vowel. -(e)i phonetics do not normally occur in graphs for -ui syllables.
If I were to design a character for gui6, it would be
since 貴 gwai3 is phonetic in 憒潰瞶繢聵䙡䠿䫭 kui2.
3. Cantonese 扔 wing1 'to throw' has an unusual phonetic 乃 which normally has two types of phonetic values:
naai-type: e.g., 奶 naai1, naai5; 疓 naai3, naai4
jing-type < *n-: e.g.,
扔 wing1 has -ing like the jing-type but has a w- which is unusual (or even unique?) for a *dental phonetic series.
Tone 1 indicates a *voiceless initial, so wing1 cannot go back to voiced *n- or *w-.
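The inference above rests on the regular split of the Middle Chinese tones by initial voicing: in the six-tone Jyutping numbering used in this post, tones 1-3 (the upper register) normally continue syllables with voiceless initials, and tones 4-6 (the lower register) syllables with voiced initials; checked syllables in -p/-t/-k take tones 1, 3, or 6 under the same split. Here is a minimal Python sketch of that check, assuming plain Jyutping syllables ending in a tone digit. The function name is hypothetical, and the rule deliberately ignores changed tones and other known exceptions:

```python
# Jyutping tones 1-3 belong to the upper (yin) register, which normally
# continues Middle Chinese voiceless initials; tones 4-6 belong to the
# lower (yang) register, which normally continues voiced initials.
UPPER_REGISTER = {1, 2, 3}
LOWER_REGISTER = {4, 5, 6}


def expected_initial_voicing(syllable: str) -> str:
    """Guess the voicing of the earlier initial from a Jyutping tone digit.

    Only the regular correspondence is encoded; changed tones and
    irregular words are not handled.
    """
    if not syllable or not syllable[-1].isdigit():
        raise ValueError(f"expected a Jyutping syllable ending in a tone digit: {syllable!r}")
    tone = int(syllable[-1])
    if tone in UPPER_REGISTER:
        return "voiceless"
    if tone in LOWER_REGISTER:
        return "voiced"
    raise ValueError(f"not a six-tone Jyutping tone number: {tone}")


# Readings mentioned in this post.
for reading in ["wing1", "jing4", "zi1", "ci3", "kei4", "gei6"]:
    print(reading, expected_initial_voicing(reading))
```

Run on the readings in this post, the check flags wing1 as going back to a voiceless initial and jing4 as going back to a voiced one, which is the mismatch noted above. The 支-series readings come out as voiceless for zi1 and ci3 and voiced for kei4 and gei6, in line with the *k-, *kʰ-, *g-, and *gr- sources proposed for them.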
I don't have time to go through the comparative Yue data for 扔 right now. At a glance the picture looks complex.
4. Why is 賽 <COMPETE> coi3 in Cantonese instead of the expected †soi3 (cf. Middle Chinese *səjʰ and Mandarin sài)?
126.96.36.199:59: WOODEN SIGHT
1. Cantonese gaan2 'soap' is written as 梘 <WOOD.SEE>. 木 muk6 'wood' isn't an obvious semantic component to me (though I just learned that soap can be made from potash [i.e., wood ash], whose name is from pot ash¹!). 見 gin3 'to see' is from Old Chinese *kens, and its derivatives normally end in -in < *-en in Cantonese. Most Middle Chinese readings of 梘 ended in *-en:
*kenˀ 'pipe for conveying water' (alternative spelling of 筧 with 竹 <BAMBOO>; 筧 were made of bamboo, and I suppose 梘 were made out of wood)
*kenʰ 'bolt' (?), 'coffin cover'
belongs to the rhyme 霰 *senʰ 'sleet' with an anomalous phonetic 散 normally for *san-syllables
*ɣenʰ 'to inspect' (?) (< prefixed 'to see'?; another word for 'to inspect', 檢 *kɨemˀ, also has 木 <WOOD> on the left)
*ʔæw 'wooden axe handle' (?; has a variant with 日 instead of 目; the right side of that variant is similar to 皃 *mæwʰ 'appearance' minus the top dot, but it would be odd to write an *ʔ-syllable with an *m-phonetic)
So I'm surprised that 見 gin3 is phonetic in a character for an -aan syllable.
What is the etymology of gaan2 'soap'? I don't know of a Mandarin cognate. ('Soap' is 肥皂 féizào in Mandarin. féi is 'fat', and soap was made from 皂莢 zàojiá 'Chinese honey locust'.) Meixian Hakka 鹼 gian2 'soap' sounds like a cognate.
鹼 <SOAP> should theoretically be Meixian Hakka †giam2. As *-m > -n is neither a morphological nor a regular phonological change in Hakka, it seems that gian2 'soap' was written semantically as 鹼 <SOAP>. Could Meixian Hakka have gotten the word from a language with an *-m/*-n merger?
I would expect the Cantonese cognate of Meixian Hakka gian2 'soap' to be †gin2, not gaan2. (Cantonese has no -ia-.) Could the graph 梘 also be borrowed from that language? What if 見 in that language was pronounced like Meixian Hakka gian3 'to see'? Summing up my hypothetical scenario:
there was a language (not necessarily Meixian Hakka or even Hakka) in which 'soap' was *gian2 < *giam2
that word was written in that language as 梘 <WOOD.*gian3>, recycling a character for (nearly) homophonous but unrelated words
the word was borrowed into Cantonese as 梘 gaan2 even though the phonetic no longer fit very well (見 is gin3 in Cantonese)
¹Which is pronounced [ˈpɒtæʃ]. Until now I assumed it was [ˈpowtæʃ].
2. Tonight 60 Minutes had a segment on pandas.
2a. 0:06: Scott Pelley pronounced 熊猫 xióngmāo [ɕjʊŋ˧˥ maw˥] 'panda' as [ʃɔŋmaw]. [ɔ] reflects written Pinyin <o> rather than the spoken vowel [ʊ]. I don't understand why Pinyin has <o> for
The common name, "panda," means "bamboo eater."
That might make most viewers think panda is Chinese for 'bamboo eater', but I know of no such Chinese phrase, and the etymology seems unclear to me. (I don't find the Tibetan etymology convincing: pho nya has no d.)
3. On God Friended Me tonight there was a character named Rakesh.
Sanskrit Rākeśas is a name for Shiva that looks like a compound of rāka- + īśas 'lord'. The trouble is that there is no rāka- attested outside dictionaries, which define it as 'quiver; wealth, money; sun'. Could one or more of those definitions be a guess for the rāka- in Rākeśas, which might have originated as 'lord' plus some non-Sanskrit word (or placename)?