Amaravati: Abode of Amritas

07.4.27.23:49: LOK-ATION, LOK-ATION, KƏ-LOK-ATION

David Boxenhorn had another explanation for the different Old Chinese readings of 谷 'valley':

Another possibility is that areal *lUK was borrowed at more than one time or place, once as an emphatic [*lok] and once as a non-emphatic [*lok].

I had assumed that 谷 'valley' was originally a Chinese word that might have been inherited from Proto-Sino-Tibetan. Regardless of the origin of 谷 'valley', some emphatic/nonemphatic sets may have originated from alternate Sinifications of a foreign word. Suppose that there was a foreign word 'valley' that was *lOk [lɔk], a syllable that did not exist in my reconstruction of Old Chinese. Old Chinese speakers could have borrowed it as

1. *lOk [lɔk] without any change at all, even though Old Chinese had no *O - would speakers import an alien vowel just for one word? Probably not. The other two possibilities both conform to the phonotactics of my Old Chinese reconstruction:
2. *lok even though the *o associated with nonemphatic consonants did not quite match the original *O

3. *lok [lˁɔˁqˁ] even though the emphasis wasn't in the original

Although 谷 'valley' may turn out to be native after all, this pattern could potentially apply to any foreign word which did not quite fit the Old Chinese phonological system:

1. borrow the word as is
2. approximate the consonant(s), even if the vowel doesn't quite match
3. approximate the vowel, even if the (non)emphatic quality of the syllable doesn't match the original

Of course, David's explanation cannot explain emphatic/nonemphatic sets of presumably native words such as

髀 'thigh' (cf. Written Tibetan dpyi 'hip'; no non-Sino-Tibetan cognates listed in Schuessler [2007: 164])
Middle Chinese *be' < Old Chinese *N-pe' [mˁpˁɛˁʔˁ]
Middle Chinese *bie' (Jiyun) < Old Chinese *N-pe'
Middle Chinese *pie' < Old Chinese *pe'
Middle Chinese *pi' < Old Chinese *pi'
Middle Chinese *mWih [mɰiʰ] (Jiyun) < OC *?r-N-pi'-s
(but normally OC *N-p- does not become MC *m-! Is this from a nonstandard dialect which reduced OC *nasal-stop clusters to MC *nasals instead of MC *voiced stops?)
(Mandarin bi [no longer an independent word; in 髀骨 bigu 'thighbone' with 骨 gu 'bone'] could have come from either of the first two readings.)

Other examples written with the same nonemphatic phonetic 卑 OC *pe (I'll skip the MC readings) are

椑 'inmost coffin'
OC *bek [bˁɛˁqˁ]
OC *bek
螷 'long and narrow bivalve'
OC *r-beng' [ʀˁbˁɛˁɴˁʔˁ]
OC *r-be [ʀˁbˁɛˁ] (cf. the homophone 蜱 'oyster')
(Does *r- signify a pair: i.e., the two parts of a bivalve shell? See Sagart [1999: 115] for *r- [his infix *-r-] in words for double or multiple objects: e.g.,
眼 *r-ngən' 'eyeball' < 眼 *ngən' 'bulge, knob'
腔 *r-khong 'body cavities [esp. lungs]' < 空 *khong 'empty')
OC *be [bˁɛˁ]
OC *be
鞞 'scabbard'
OC *peng' [pˁɛˁɴˁʔˁ]
OC *pe'
also used to write 鼙 OC *be [bˁɛˁ] 'a small hand drum'

I am not aware of any external cognates for these words, so I assume they are native.

In all these cases, I assume that the roots were nonemphatic like the phonetic 卑 *pe in their graphs:

髀 'thigh': *pe' ~ *pi'

椑 'inmost coffin': *bek

螷 'long and narrow bivalve': *be

鞞 'scabbard': *pe(ng)'

The emphatic readings reflected a lost presyllable that triggered harmony in the root syllable:

*Cə-P- [Cˁʌˁ-P] > *Cə-P- [Cˁʌˁ-Pˁ]

Perhaps this presyllable was a remnant of an earlier full syllable that was not a prefix, but the first half of an earlier disyllabic root: e.g.,

髀 'thigh': **√pape'

椑 'inmost coffin': **√tabek

螷 'long and narrow bivalve': **√tsabe

鞞 'scabbard': **√kape(ng)'

(The initial syllables are arbitrary and are almost certainly wrong. I assume that *a was a vowel in the earliest 'emphasizing' syllables.)

If the presyllable was a prefix, it might have been the count noun prefix *kə- [qˁʌˁ] since all four of these nouns are count nouns.

Other instances of *kə- [qˁʌˁ] causing roots to 'emphasize' are

谷 'valley': *kə-lok > *kə-lok > *klok, *lok (alongside *lok)

鞻 'shoe': *ro > *kə-ro > *ro (alongside *ro)
also cf. 鞻, 屨 kros 'shoe' < *k-ro-s (with emphatic consonantal prefix [reduced from *kə- [qˁʌˁ]] assimilating to the following nonemphatic syllable?)

域 *wək 'boundary tracts' > 國 *kə-wək > *kwək 'state' (i.e., 'bounded thing')

In these cases, the presyllable survived in one later Chinese reading (谷 MC *kok, 鞻 / 屨 MC *küöh, 域 *kwək), whereas emphasis is the only remaining trace of the presyllable in 髀 'thigh', etc. There may be cases in which presyllables disappeared without a trace: e.g., instances in which a presyllable and the following syllable were both (non)emphatic. Cf. English cause [kʌz] < because: given only the former, there is no way anyone can reconstruct the be- of the disyllabic original.

Next: 於 *'a of the 虎 tiger.

07.4.26.23:46: STILL STUCK IN THE VALLEY

Last night, I came up with an overly complex explanation for the three readings of 谷:

*lok 'valley'
*k-lok [qˁlˁɔˁqˁ] 'valley'
*kə-lok [qˁʌˁ-lˁɔˁqˁ] in the Xiongnu title 谷蠡 *kə-lokrey

Here's a simpler explanation:

There was an areal word *lUK with a back rounded vowel and a velar coda meaning 'valley': e.g., Written Tibetan lung-pa 'valley' (-pa is a WT noun suffix.) For other examples, see Schuessler (2007: 259).
In Old Chinese, nonemphatic *lok 'valley' became emphatic after the (count noun?) prefix *kə- [qˁʌˁ]:
*kə-lok [qˁʌˁ-lok] > *kə-lok [qˁʌˁ-lˁɔˁqˁ] 'a valley'
This sesquisyllable was then reduced in two ways:
1. By losing its first minimal vowel:
*kə-lok [qˁʌˁ-lˁɔˁqˁ] > *k-lok [qˁ-lˁɔˁqˁ] (> Middle Chinese *kok > Md gu)
2. By losing its presyllable:
*kə-lok [qˁʌˁ-lˁɔˁqˁ] > *lok [lˁɔˁqˁ] (> Middle Chinese *lok > Md lu)
These two reduced forms existed side by side with the original *lok (> Middle Chinese *yüök > Md yu) and possibly even *kə-lok.

In modern Chinese languages, 谷 has reading pronunciations derived from mainstream Middle Chinese *kok (< Old Chinese *k-lok). (See this list.) However, it is possible that there are modern colloquial words for 'valley' derived from the other OC readings (*kə-lok, *lok, *lok). The Middle Chinese rhyme dictionary Guangyun (1008) lists both MC *kok and MC *yüök for 'valley'.

Although Guangyun defines MC *lok as part of the Xiongnu title 谷蠡, I assume that whoever coined that transcription pronounced 谷 'valley' as *kə-lok or *lok in his (northern) Old Chinese dialect, and that speakers of other dialects used their approximation of his dialectal pronunciation for 谷 in that title while still using their own words for 'valley'.

Similarly, whoever coined the transcription 吐谷渾 (MC *tho' yüök Gon*) for the Tuyuhun probably had a *y-initial word for 'valley' in his (northwestern) early Middle Chinese dialect, and speakers of other dialects used their approximation of his dialectal pronunciation for 谷 in that name while still using their own words for 'valley'.

My assumption in both cases is that the men who coined the transcriptions 谷蠡 and 吐谷渾 had their everyday words for 'valley' in mind. They did not deliberately decide to create peculiar readings of 谷 solely for use in those transcriptions. Those readings only seem peculiar because later varieties of Chinese favored readings based on OC *k-lok.

Perhaps similar reasoning can be applied to other unusual readings in transcriptions. They may turn out to reflect dialectal variants of ordinary words.

*Judging from the Middle Chinese transcription 吐谷渾 *tho' yüök Gon of their name, the Tuyuhun (now known by the modern Mandarin reading of that transcription) might not have had vowel harmony in their language. It is, however, possible that the MC transcription represented a compound *töyök-Gon whose components could be nonharmonic. Middle Chinese did not have a syllable like *t(h)ö, so 吐 *tho' could have represented a foreign *tö [tʰø]. (Chinese voiceless aspirates were used to transcribe [phonetically aspirated] voiceless obstruents in 'Altaic' languages other than Korean and Japanese.) The glottal stop in 吐 *tho' probably corresponded to nothing in the original name. There was no Middle Chinese syllable *tho that would have been a better match. Moreover, 吐 'spit' may have been a derograph chosen for its insulting meaning as well as its sound. Transcribers may have used derogatory sinographs even if they were not exact phonetic matches for foreign syllables.

07.4.25.23:59: STUCK IN THE VALLEY

One might think that

谷 *lok 'valley' > 谷 *k-lok [qˁlˁɔˁqˁ] 'valley'

from my last post is evidence for prefixes triggering rightward emphatic harmonization.

But not all nonemphatic stems became emphatic when the concrete noun prefix *k- was added (Sagart 1999: 98-99):

嶸 *weng 'distant' > 坰 *k-weng 'outlying parts, far from the capital'
方 *pang 'square, regular' > 匡 *k-phang 'square basket'
(the aspiration of *ph eludes explanation)

I am fairly sure that the root of 谷 'valley' was originally nonemphatic. Its phonetic series (Karlgren 1957: 306, 310) is nonemphatic with the exception of 谷 'valley' itself:

欲 *lok 'desire, wish'
(obviously cognate to the following word)
慾 *lok 'lust, passion'
浴 *lok 'bathe, wash'
(cf. 溶 *long 'much water' below)
鵒 *lok, second half of
鴝鵒 / 鸜鵒 *golok 'mynah bird' (a disyllabic word with harmony)

裕 *loks 'ample, abundant'; also used to write 欲 *lok 'desire, wish'
(cf. the following word; 'ample' < 'containing much'?)
容 *long 'contain'
溶 *long 'much water' (cf. 浴 *lok 'bathe, wash' above; 湧 / 涌 *long-' 'bubble up', 洶 *'-hlong 'rush [as water]', 洪 *gong < ?*N-k-long 'great [waters]'* [Schuessler 2007: 443])
蓉 *long, second half of
芙蓉 *balong 'lotus' (a disyllabic word with harmony)
(Could -ng in 容溶蓉 have come from an earlier stop-nasal cluster *-k-N?)

If phonetic elements were originally intended to write only emphatics or nonemphatics, then

a. 谷 symbolized the nonemphatic syllable *lok and its emphatic reading *k-lok [qˁlˁɔˁqˁ] was secondary
or
b. 谷 symbolized the emphatic syllable *lok, and the other members of its phonetic series (欲慾浴鵒裕容溶蓉) were also originally emphatic until a lost nonemphatic prefix *Cə- made them nonemphatic: e.g.,
*Cə-lok [Cɯ-lˁɔˁqˁ] > *Cə-lok [Cɯ-lok] > *lok

I prefer (a) since it requires fewer changes than (b).

But how can the change of 谷 *lok from nonemphatic to emphatic be explained? Moreover, 谷 *lok / *k-lok [qˁlˁɔˁqˁ] 'valley' has another emphatic reading *Cə-lok for the Xiongnu title 谷蠡 *Cə-lokrey (a foreign loanword lacking harmony). Although Sagart did not specify the initial consonant of *Cə-lok, I suspect that it was *k [qˁ]. The prefix *kə- would trigger rightward emphatic spreading before dropping:

*kə-lok [qˁʌˁ-lok] > *kə-lok [qˁʌˁ-lˁɔˁqˁ] > *lok [lˁɔˁqˁ]

Later, a reduced form of that prefix, *k-, attached to *lok and became emphatic to match its stem:

*k-lok [k-lˁɔˁqˁ] > *klok [qˁlˁɔˁqˁ]

*klok became the normal word for 'valley' while *lok, the original word, and *lok, the variant with secondary emphasis, became marginalized. In modern standard Mandarin, 谷 is normally read gu < *klok, and the other two readings yu < *lok and lu < *kə-lok are only used in the foreign names 吐谷渾 Tuyuhun (a Middle Chinese transcription of the name of a Central Asian tribe [Wikipedia entry]) and the aforementioned Xiongnu title 谷蠡 (now pronounced yuli).

This is convoluted, but still preferable to a scenario in which several different roots and parts of words lost their emphasis.

(This is a false choice. There is a better alternative.)

Next: I'm stumped!

*07.4.26.8:13: I am not sure that 洪 *gong 'great (waters)' is really from *N-k-long. Maybe it is simply a bare root *gong. Schuessler suggested an alternative cognate 浩 *gu' 'vast (rising waters)' which he linked to 高 *kaw 'high'. Maybe they share a root *√k-w:

*N-k-w-' (zero grade): 浩 *gu' 'vast (rising waters)'
*k-a-w (a-grade): 高 *kaw 'high'
?*N-k-a-w-ng: 洪 *gong 'great (waters)'

I have also considered analyzing 洪 *gong 'great (waters)' (also written 鴻) as an 'o-grade' derivative *N-k-w-o-ng of a root *√w-ng in

弘 *w-ə-ng 'vast; enlarge'
宏 *r-w-ə-ng 'resounding; great'

泓 ?*'-r-w-əng 'deep water' (no attestations before Late Old Chinese?)
洸 *k-w-a-ng 'rushing water'
廣 *k-w-a-ng-' 'wide'
擴 *kh-w-a-k < ?*s-k-w-a-ng-C 'extend'

07.4.24.23:59: CAN PREFIXES OVERPOWER STEMS?

In the previous post, I wrote,

Since I did not want to reconstruct pairs of prefixes (e.g., *N-/*N-) with different harmonizing effects, in 2005 I proposed that leftward harmonization applied to single-consonant prefixes.

But in fact I do reconstruct pairs of prefixes (e.g., *N-/*N-) in Old Chinese. Here's how I interpret Sagart's (1999: 75) first two examples of the intransitive nasal prefix¹:

nonemphatic *N- in 復 *N-phuk [mphuk] 'return' < 覆 *phuk 'turn over'

emphatic *N- in 現 *N-ken-s [ɴˁqˁɛˁnˁsˁ] 'to appear' < 見 *ken-s 'see'²

(*N-/*N- represents a nasal that assimilates to the following consonant. The original point of articulation of this nasal prefix is unknown.)

So how do I reconcile this contradiction?

The issue is whether such pairs of prefixes (e.g., *N-/*N-) were primary or secondary: i.e., were they originally different, or were their differences due to emphatic harmonization? When I wrote that "I did not want to reconstruct pairs of prefixes", I was referring to primary pairs of prefixes.

Given the fact that words with *N-/*N- may either be emphatic or nonemphatic, at least two interpretations are possible:

- Rightward harmonization: There are two separate primary prefixes *N- and *N- with the same function which cause the following stem to harmonize.
- Leftward harmonization: There is one prefix *N- which harmonizes with the following stem, resulting in two secondary prefixes: *N- before nonemphatic stems and *N- before emphatic stems.

If the rightward interpretation is correct, there could be cases of

nonemphatic *N- + emphatic stem = nonemphatic *N-stem (stem deemphasizes to match nonemphatic prefix)
emphatic *N- + nonemphatic stem = emphatic *N-stem (stem emphasizes to match emphatic prefix)

However, all twelve of Sagart's (1999: 75-77) examples of *N- prefixation involve stems and derivatives that have identical emphatic values:

emphatic *stem > emphatic *N-stem
nonemphatic *stem > nonemphatic *N-stem

(But elsewhere, Sagart [1999: 92, 114] proposed
剛 *kang [qˁɑˁɴˁ] 'hard, strong'³ > 彊 *N-kang [ɴkang] 'strong, violent'
寶 *pu' [pˁʊˁʔˁ] 'precious' > 婦 *N-pu' [mpu'] 'wife' ['precious one'])

*N- generally assimilated to following consonants in terms of emphasis as well as articulation. This is what I would expect since suffixes assimilate to stems in Uralic and 'Altaic' languages. Suffixes do not 'overpower' stems: i.e., they do not cause stems to conform to them. Instead, it is the stems that 'overpower' affixes.

Was *N- unique? There are cases of affixation which are apparently accompanied by emphasis status reversal (Sagart 1999: 106, 175):

nonemphatic *stem > emphatic *prefix-stem
谷 *lok 'valley' > 谷 *k-lok [qˁlˁɔˁqˁ] 'valley'

emphatic *stem > nonemphatic *prefix-stem
孽 *ngat [ɴˁɑˁtˁ] 'stump of a tree' > 孽 *r-ngat 'stump of a tree'

Next: But is correlation causation?

¹I assume that *[m] and *[ɴˁ] have a common origin since they are identical in all but two aspects (point of articulation and emphasis), though it is possible that two or more nasal prefixes with the same function but different points of articulation later merged into a single prefix.

²Sagart reconstructed the root as *^aken (his *^a = my emphasis), whereas I think it might have been *ken' with a final glottal stop. I cannot find any derivatives of this root ending in a bare nasal in Schuessler (2007: 304, 530), and so I suspect that *-n-s should really be reconstructed as *-n'-s:

見 *ken'-s 'see'
現 *N-ken'-s 'appear'
俔 *khen'-s, *N-khen' 'look like'

³07.4.25.1:11: Sagart actually cited the sinograph 鋼 for *kang. However, 鋼 *kang meant 'steel' (< 'the hard thing') whereas 'hard, strong' was 剛 *kang. (But there may be some loan usage of 鋼 for 'hard, strong' that I am not aware of.)

I wonder if 鋼 'steel' was actually *k-kang with Sagart's concrete noun prefix *k- forming a noun out of 'hard'. But 'steel' is a mass noun, whereas Sagart's (1999: 107) *k- stood

for actions and objects that are well-delimited in time and space, and hence usually concrete and countable. (Emphasis mine.)

Perhaps 'steel' had another noun-deriving prefix which disappeared without a trace, or was simply an instance of zero derivation.

Sagart (1999: 107) also proposed that the

disappearance of [Old Chinese] *k- between the Old Chinese and Middle Chinese periods deprived Chinese of a means of distinguishing between count and mass nouns. This may have been a factor in the rise of numeral classifiers in Chinese during the same period.

Is it a coincidence that the generic classifier 個 *kays [qˁɑˁyˁsˁ] contains a *k-? I am tempted to regard Mandarin 一個X yi ge X [i kə ...] 'one ...' as a survival of an earlier *it k-X. Was *k- reduced from an earlier 個 *kays? I don't think so, because:

1. Early classifiers appeared after nouns, not before them: e.g., 馬三匹 *mra' səm phit⁴ 'horse three (classifier)' = 'three horses' in Zuozhuan.
2. The proposed long form 個 *kays did not appear until Late Old Chinese. I would expect the long form to predate the short form, not the other way around.

⁴07.4.25.1:01: The Japanese reading of 匹 is hiki < *piki with an unexpected -k-. I wonder if this reflects a nonstandard Middle Chinese reading *phik preserving a final *-k lost in mainstream Old Chinese: *phit < *phik. Cf. Old Chinese 虱 *srit 'louse' which is probably from an earlier *srik: cf. Written Tibetan shig < *hryik 'louse' (Schuessler 2007: 73).

07.4.23.23:59: SYLLABLES TO THE RIGHT, CONSONANTS TO THE LEFT

If Old Chinese only had rightward emphatic harmonization (see my previous post), one would predict that roots would tend to harmonize with single-consonant prefixes:

*C-CV > *C-CV (root becomes emphatic like prefix *C-)

*C-CV > *C-CV (root becomes nonemphatic like prefix *C-)

just as syllables tended to harmonize with preceding syllables:

*CV.CV > *CV.CV (second syllable became emphatic like first syllable *C-)
*CV.CV > *CV.CV (second syllable became nonemphatic like first syllable *C-)

(. = a syllabic break that may or may not correspond to a morphemic boundary, which I indicate with a hyphen [-])

One would also predict that most of the words with a given prefix would either be emphatic or nonemphatic. Some of the small samples of prefixed roots in Sagart (1999) seem to violate these predictions (e.g., intransitive *N- occurred in almost equal numbers of emphatic and nonemphatic words) while others don't (e.g., agentive noun *m- only occurred in nonemphatic words). However, it is very dangerous to draw firm conclusions from small samples (e.g., a single example of a stative verb prefix *k-). Large samples of prefixed roots might reveal different patterns. (Statistics added c. 4.24.00:40.)

Affix	Number of emphatic words	Number of nonemphatic words
Causative *s-	2	4
Denominative *s-	2	1
Directional *s-	2	3
Inchoative *s-	1	2
Noun-deriving *s-	1	5
Intransitive *N-	5	7
Verbal *m-	3	1
Agentive noun *m-	0	3
Animal *m-	1	4
(function[s?] unclear) *p-	3	5
Stative verb *t-	4	6
Involuntary physiological verb *t-	2	1
Misc. intransitive *t-	0	4
Noun *t-	1	4
Action verb *k-	0	3
Stative verb *k-	0	1
Noun *k-	5	2
(function[s?] unclear) *'-	6	9
Repeated action *-r-	6	1
Multilocational action / collective participant *-r-	4	5
Multiple object *-r-	3	1
Intensifier *-r-	6	1
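
The counts in the table can be tallied with a short Python sketch. It only restates the figures above as per-affix sample sizes and emphatic shares; the names `counts` and `emphatic_share` are my own, and no significance test is attempted because most of the samples are far too small for one.

```python
# Counts copied from the table above: (affix, emphatic words, nonemphatic words).
counts = [
    ("Causative *s-", 2, 4),
    ("Denominative *s-", 2, 1),
    ("Directional *s-", 2, 3),
    ("Inchoative *s-", 1, 2),
    ("Noun-deriving *s-", 1, 5),
    ("Intransitive *N-", 5, 7),
    ("Verbal *m-", 3, 1),
    ("Agentive noun *m-", 0, 3),
    ("Animal *m-", 1, 4),
    ("(function[s?] unclear) *p-", 3, 5),
    ("Stative verb *t-", 4, 6),
    ("Involuntary physiological verb *t-", 2, 1),
    ("Misc. intransitive *t-", 0, 4),
    ("Noun *t-", 1, 4),
    ("Action verb *k-", 0, 3),
    ("Stative verb *k-", 0, 1),
    ("Noun *k-", 5, 2),
    ("(function[s?] unclear) *'-", 6, 9),
    ("Repeated action *-r-", 6, 1),
    ("Multilocational action / collective participant *-r-", 4, 5),
    ("Multiple object *-r-", 3, 1),
    ("Intensifier *-r-", 6, 1),
]


def emphatic_share(emph, non):
    """Return the fraction of emphatic words in one affix sample."""
    return emph / (emph + non)


# Pooled totals across all affixes (this mixes unrelated functions,
# so it is only a rough orientation figure).
total_emph = sum(e for _, e, _ in counts)
total_non = sum(n for _, _, n in counts)

for label, emph, non in counts:
    share = emphatic_share(emph, non)
    print(f"{label:<55} n={emph + non:2d}  emphatic={share:.0%}")

print(f"pooled: {total_emph} emphatic, {total_non} nonemphatic")
```

The pooled totals come to 57 emphatic and 73 nonemphatic words, but no single affix has more than 15 examples, which is why the per-affix shares should be read with caution.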

Since I did not want to reconstruct pairs of prefixes (e.g., *N-/*N-) with different harmonizing effects, in 2005 I proposed that leftward harmonization applied to single-consonant prefixes. Now I would restate that hypothesis as follows:

Old Chinese syllables had to be either emphatic or nonemphatic from onset to coda (if any). It would be difficult to pronounce a cluster of an emphatic consonant followed by a nonemphatic consonant, or vice versa. Therefore any consonantal affix had to harmonize with the root syllable:

prefix (leftward harmonization):
emphatic prefix + nonemphatic root: *C-CV > *C-CV
nonemphatic prefix + emphatic root: *C-CV > *C-CV
infix (bidirectional harmonization):
emphatic infix + nonemphatic root: *C-C-V > *C-C-V
nonemphatic infix + emphatic root: *C-C-V > *C-C-V
suffix (rightward harmonization):
nonemphatic root + emphatic suffix: *CV-C > *CV-C
emphatic root + nonemphatic suffix: *CV-C > *CV-C

Harmonization within a syllable was bidirectional, but harmonization across syllabic boundaries was unidirectional (rightward): the first syllable of a disyllabic word usually determined the emphatic setting for the following syllable.
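
The restated hypothesis can be sketched as a toy model in Python. This is an idealization under my own assumptions, not a reconstruction tool: the `Syllable` class and both function names are hypothetical, each syllable is reduced to one emphasis value for its root plus one per consonantal affix, and the 'usually' of cross-syllable harmony is treated as if it were exceptionless.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Syllable:
    # True = emphatic, False = nonemphatic (toy representation).
    root_emphatic: bool
    # Underlying emphasis of each consonantal affix attached to this syllable.
    affix_emphatic: List[bool] = field(default_factory=list)


def harmonize_within(syl: Syllable) -> Syllable:
    """Consonantal affixes take on the emphasis of their root syllable."""
    return Syllable(syl.root_emphatic,
                    [syl.root_emphatic] * len(syl.affix_emphatic))


def harmonize_word(syllables: List[Syllable]) -> List[Syllable]:
    """Apply rightward harmony: the first syllable sets the value for the rest."""
    if not syllables:
        return []
    setting = syllables[0].root_emphatic
    return [
        Syllable(setting, [setting] * len(s.affix_emphatic))
        for s in syllables
    ]


# Emphatic presyllable + nonemphatic root: the root becomes emphatic.
sesquisyllable = harmonize_word([Syllable(True), Syllable(False)])

# Emphatic consonantal prefix + nonemphatic root: the prefix deemphasizes.
prefixed = harmonize_within(Syllable(False, [True]))
```

In this sketch a full (pre)syllable imposes its setting on what follows, while a bare consonantal affix always yields to its root, which is the asymmetry the hypothesis relies on.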

Next: Snakes and snails, oh ma!

07.4.22.23:59: THE BUTTERFLY CASE: PARTTR-ES

In the early 50s, the late Sinologist George Kennedy wrote "The Butterfly Case" (1955) and an unpublished 1954¹ sequel summarized in William G. Boltz' (2000: 31) "Monosyllabicity and the Chinese Script". Both articles dealt with the disyllabic Chinese word for 'butterfly':

蝴蝶

Old Chinese *galep [ɢˁɑˁlˁɛˁpˁ] > Md hudie

In Boltz' words, Kennedy

asked why, if the early Chinese scribes were able to produce pictographs of horses [馬], oxen [牛], sheep [羊], and birds [鳥], they never produced a pictograph of a butterfly? The Chinese word for butterfly has always been written with what Kennedy calls ... phonetic loan characters, i.e., rebus usages, one character per syllable. Such usages are, according to the basic hypothesis of how writing systems develop, used to write words that are not readily depictable. Kennedy asks what is so undepictable about a butterfly, and inter alia, a bat (bianfu [蝙蝠]), a cicada (zhiliao [蜘蟟/知了]), or a spider (zhizhu [蜘蛛]), none of which is written with a character that is in origin a pictograph, all of which are written with phonetic loans². His answer is, of course, that the butterfly and all of these others are bisyllabic words, and his conclusion is that "words for which one might expect pictorial representation³, but which do not have it, are found to be polysyllabic words today, and have presumably always been polysyllabic." And the further clear implication is that [conversely -AMR] bisyllabic words did not get written with single depictively realistic graphs, consistent with our [Boltz'] earlier hypothesis that monosyllabicity⁴ was a required condition for the first appearance of writing.

(I have added sinographs and replaced Boltz' Gwoyeu Romatzyh romanization with toneless Pinyin.)

Words such as 蝴蝶 'butterfly' which have always been polysyllabic can be used to test the 'emphatic Sinitic' hypothesis. In theory, there could be four types of disyllabic Old Chinese words:

1. emphatic first syllable + emphatic second syllable
e.g., 蝴蝶
OC *galep [ɢˁɑˁlˁɛˁpˁ] 'butterfly'
2. nonemphatic first syllable + nonemphatic second syllable
e.g., 蜘蛛
OC *rterto or *tretro 'spider'
(a reduplicative word with e-o vowel alternation)
3. emphatic first syllable + nonemphatic second syllable
e.g., 蝙蝠
OC *penpuk [pˁɛˁnˁpuk] 'bat'
(rhymes too dissimilar to regard this word as reduplicative)
4. nonemphatic first syllable + emphatic second syllable

e.g., 蜘蟟/知了
?OC *trerew' or *rterew' [...rˁɛˁwˁʔ] 'cicada'
(I assume the word is old, though it has no early attestations. I favor the first reconstruction as it could be a partial reduplication of a base *rew'. The whole word could simply be onomatopoetic. Cf. how some describe the noise of a cicada as a "trill" in English.)

Since the ratio of 'emphatic' syllables to 'nonemphatic' syllables is roughly 45 : 55, the four types would have the following distribution if there were no 'emphatic harmony':

1. emphatic first + emphatic second: 20.25% (= .45 * .45)
2. nonemphatic first + nonemphatic second: 30.25% (= .55 * .55)
3. emphatic first + nonemphatic second: 24.75% (= .45 * .55)
4. nonemphatic first + emphatic second: 24.75% (= .55 * .45)

In my forthcoming paper written in 2005, I found that 82.9% of the disyllables in Pulleyblank's (1991) lexicon had 'emphatic harmony': i.e., they belonged to types 1 and 2. 12.5% belonged to type 3 and only 4.6% belonged to type 4.
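
The comparison between the no-harmony expectation and the observed shares can be worked through in a few lines of Python. This is just a sketch: the 45 : 55 ratio and the observed percentages come from this post, the variable names are mine, and the calculation assumes that the two syllables of a word would be independent if there were no harmony.

```python
# Expected shares of the four disyllable types if the emphasis of each
# syllable were independent of the other (the no-harmony assumption).
p_emph = 0.45        # share of emphatic syllables (from the post)
p_non = 1 - p_emph   # share of nonemphatic syllables

expected = {
    1: p_emph * p_emph,  # emphatic + emphatic
    2: p_non * p_non,    # nonemphatic + nonemphatic
    3: p_emph * p_non,   # emphatic + nonemphatic
    4: p_non * p_emph,   # nonemphatic + emphatic
}

# Observed shares reported for the disyllables in Pulleyblank's (1991) lexicon.
# Types 1 and 2 are only reported together, as the harmonic class.
observed_harmonic = 0.829
observed_type3 = 0.125
observed_type4 = 0.046

expected_harmonic = expected[1] + expected[2]
ratio_3_to_4 = observed_type3 / observed_type4

print(f"harmonic (types 1+2): expected {expected_harmonic:.2%}, "
      f"observed {observed_harmonic:.1%}")
print(f"type 3: expected {expected[3]:.2%}, observed {observed_type3:.1%}")
print(f"type 4: expected {expected[4]:.2%}, observed {observed_type4:.1%}")
print(f"observed type 3 : type 4 ratio = {ratio_3_to_4:.2f}")
```

Under independence, types 3 and 4 would be equally common, so any asymmetry between them is a separate fact that the harmony tendency by itself does not predict.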

Obviously there was a strong, though not total, tendency toward 'harmony', but what could account for the exceptions, and why did one class of exceptions (type 3) outnumber the other (type 4) by a three to one ratio?

I hypothesized that the skewed distribution of types 3 and 4 was due to markedness and rightward 'harmonization':

- Since 'nonemphatic' syllables are unmarked, it's not surprising that some of them would 'resist' becoming marked ('emphatic'). Hence type 3 words are relic forms which had not undergone 'harmonization': e.g.,
蝙蝠 OC *penpuk [pˁɛˁnˁpuk] 'bat'
(instead of the expected *penpuk [pˁɛˁnˁpˁʊˁqˁ])
- Conversely, since 'emphatic' syllables are marked, there would be a tendency to make them unmarked ('nonemphatic'), and so there would be very few relics of type 4: e.g.,
蜘蟟/ 知了 ?OC *trerew' or *rterew' [...rˁɛˁwˁʔ] 'cicada'
(instead of the expected *trerew' [trerewʔ])
(Was this an onomatopoetic exception to 'harmony'?)

Someday I'd like to confirm this statistical tendency by examining disyllables in another lexical source. I expect to find similar results.

I don't expect to make the same big mistake I made in my 2005 paper. I claimed that OC 'syllabic' (i.e., 'emphatic') harmony had no parallels in Arabic. Wrong!

Next: Syllables to the right, consonants to the left.

¹Boltz called the unpublished paper a "sequel" even though it was presented to the American Oriental Society before "The Butterfly Case" was published in 1955. I assume that the sequel was written after "The Butterfly Case", even though it came out first.

²Strictly speaking, most of these cases involve semantophonetic characters combining various rebus symbols with the semantic element 虫 'bug'. The exceptions are

'butterfly' which can be written as 胡蝶 without 虫 'bug' in the first sinograph.
(胡 OC *ga [ɢˁɑˁ] originally meant 'dewlap' and was also used to write a homophone 'barbarian' which might have been the source of one Tangut word for 'barbarian' [TT4464].)
'cicada' which has a pure rebus spelling 知了 'know-complete'.

³The emblems of the superheroes Batman and Spider-Man illustrate (literally!) how easy it would have been for the Chinese to draw pictographs of bats and spiders.

I vaguely recall an early Chinese drawing of a bat in Kennedy's 1955 article, but I don't have his Selected Works with me and can't confirm that.

⁴07.4.23.1:24: 'Monosyllabicity' in this context obviously does not mean 'all words are one syllable long' since 'butterfly' is obviously disyllabic.

Boltz (2000: 13-14) wrote,

What this comes down to is a hypothesis that, once the use of pictographs to stand for words, not things, has been achieved, that is, once pictographs have become invested with a stable phonetic value conventionally associated with a fixed meaning, the probability of discovering the rebus principle - and this is the crux of the hypothesis - goes up with the number of monosyllabic words in the language. The more monosyllables, the better the chances that some of them will be words that can be written effectively with pictographs [as opposed to words that cannot be easily visualized -AMR] ... Notice that the claim is not that the languages in question must be absolutely monosyllabic [as no language lacks polysyllabic words], but rather only that there be a large enough number of monosyllabic words to allow the rebus principle to be likely of discovery. By the same token, the evidence of monosyllabicity in the earliest Mayan, Sumerian, and Chinese texts is not necessarily evidence of a wholly monosyllabic language, but only of monosyllabicity for that part of the language that got written down. On the other hand, virtually all the circumstantial evidence points to a general monosyllabicity overall and the conclusion that these were in fact largely monosyllabic languages is therefore not unwarranted.

In a footnote, Boltz quoted from Coe's Breaking the Maya Code (1992: 51):

... [Mayan] roots are overwhelmingly monosyllabic, with the CVC pattern dominant, but these are highly inflected.

Similarly, I would note that Old Chinese roots were overwhelmingly monosyllabic, though they were not inflected.

Of course, Egyptian was not 'monosyllabic' in this sense and is an exception to this trend. Boltz (2000: 4-5) did not consider Egyptian writing to be "an invention of writing ex nihilo":

The independent origin of the invention of writing in Egypt seems to be not quite absolute, because it is widely acknowledged that, while Egyptian writing does not in its own intrinsic orthographic form show any influence from Mesopotamia, the idea of writing was likely to have come to the Egyptians from Mesopotamia such that whatever conditions or factors shaped the invention of writing in Egypt, they included a knowledge of writing as a conceptual debt owed to the Mesopotamians ... Mayan, Sumerian and Chinese, on the other hand, are the three linguistic environments where writing arose, we can safely say, without any knowledge of or influence from any other writing system. In all three cases the language at the time writing was invented was predominantly monosyllabic ... the first graphs to appear that qualify as writing ... stand for monosyllabic words. Writing might have appeared somewhere in the form of (picto-)graphs that stood for words of more than one syllable, but we do not know of any such instance ... The obvious question is whether the monosyllabicity is simply a chance coincidence or whether it might have been a factor in the emergence and successful development of writing in world history.

Boltz thought that monosyllabicity was significant, but he also did not deny that

there were likely to have been both punctual events and enduring conditions among the non-linguistic factors that played a role in the appearance of writing. (p. 16)

Hence monosyllabicity may have been a necessary condition for graphogenesis, but not the sole condition.

All this implies that 'syllables' are not just arbitrary constructs of Western linguistics. Three unrelated, non-Western civilizations built their writing systems around syllabic cores.