08.2.9.23:59: CRYPTOGENDER

is a term I made up after seeing page 687 of Karin Ryding's A Reference Grammar of Modern Standard Arabic:

cryptofeminine: a feminine noun not overtly marked for feminine gender

cryptomasculine: a masculine noun not overtly marked for masculine gender

I have always struggled with grammatical gender partly because my first languages were genderless: Pijin, English, Japanese. (I am excluding natural gender.)

The first language with gender that I studied was German, which I would describe as heavily cryptogendered.  It seemed even more cryptic than it actually was because I don't remember being taught any generalizations other than:

-e tends to correlate with feminine (but beware of das Ende!)

-chen is neuter (so the neuter noun Mädchen 'girl', often cited as an example of how unpredictable German gender is, turns out to be perfectly regular!)

I may have picked up some other rules by inference: e.g., after memorizing a bunch of feminine nouns ending in, say, -heit, one might guess correctly that -heit nouns are feminine.

Years later, I studied less cryptogendered languages: Latin, Sanskrit, and Russian.  It was a relief to look at a noun and be able to guess its gender in many (but still not all!) cases.  The Russian rules are so simple that I can summarize them (adapted from Pulkina, et al., Russian: A Practical Grammar with Exercises, pp. 22-23, 27) here:

if a noun refers to a person, it has natural gender

except for ditja 'child' which is neuter

for all other nouns, gender depends on the ending:

a nonpalatalized consonant or (palatal!) -j: masculine

a palatalized consonant: masculine or feminine

this is the main cryptogendered category for native nouns

-(j)a : feminine

but -mja: neuter

cf. ditja 'child' above; I suspect these -ja neuters come from old nasal-final neuters: compare imja 'name' with Skt naaman, Lat nomen

-(j)o, -e : neuter

borrowed indeclinable animates have biological gender

borrowed indeclinable inanimates are neuter regardless of their endings

except for kofe 'coffee' which is masculine
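The rules summarized above can be sketched as a small decision procedure. This is only a rough illustration under several assumptions that are mine, not Pulkina et al.'s: nouns are given in Latin transliteration, a final apostrophe stands for the soft sign (a palatalized consonant), the caller says whether a noun denotes a person or borrowed animate and whether it is a borrowed indeclinable, and the 'coffee' exception is spelled kofe (Russian кофе). The function name and parameters are hypothetical.

```python
def guess_gender(noun, natural_gender=None, indeclinable_loan=False):
    """Return the set of genders ('m', 'f', 'n') allowed by the summary above.

    noun: nominative singular in Latin transliteration (assumed: ' = soft sign).
    natural_gender: 'm' or 'f' for nouns denoting persons or borrowed animates.
    indeclinable_loan: True for borrowed indeclinable nouns.
    """
    if noun == "ditja":                  # the one person noun that is neuter
        return {"n"}
    if natural_gender is not None:       # persons and borrowed animates
        return {natural_gender}
    if indeclinable_loan:                # borrowed inanimates ignore their endings
        return {"m"} if noun == "kofe" else {"n"}
    if noun.endswith("mja"):             # imja 'name' and its class
        return {"n"}
    if noun.endswith("a"):               # covers both -a and -ja
        return {"f"}
    if noun.endswith(("o", "e")):        # covers -o, -jo, -e
        return {"n"}
    if noun.endswith("'"):               # palatalized consonant: the cryptogendered class
        return {"m", "f"}
    if noun[-1] not in "aeiouy":         # nonpalatalized consonant or -j
        return {"m"}
    return set()                         # ending not covered by the summary


print(guess_gender("okno"))      # 'window'
print(guess_gender("tetrad'"))   # my own example; the rules alone cannot decide
```

The only case that returns two genders is the palatalized-consonant class, which is exactly the category described above as cryptogendered. Apart from okno, imja, ditja and kofe, the test words are my own illustrations.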

Proto-Indo-European doesn't seem to have been very cryptogendered, and pre-PIE may not have had any gender at all:

That this system [of suffixing adjectives to indicate the feminine] once came into being in a not too distant past, and that the feminine gender is recent, seems to be obvious.  But because traces of this system remain in almost all of the languages, it must go back to PIE ... Whether Hittite was separated from the other languages before or after the appearance of the feminine is of less importance, though interesting enough.  It is a priori probable that Hittite lost this category because that language lost so much ...

There are also indications that the neuter is recent.  These are those words which either did not or hardly ever appeared in the ergative.  The adjective confirms that the difference between masculine and neuter exclusively lies in the nominative-accusative (for which the neuter had the old absolutive form, while the masculine-feminine changed an older opposition, ergative : absolute, into nominative : accusative). 

- Robert SP Beekes, Comparative Indo-European Linguistics, p. 174

Here's my understanding of the development of the IE gender system:

pre-PIE: ergative/absolutive | PIE: nominative/accusative | gender
nouns capable of taking the ergative case* (animates?) | nouns with a nominative/accusative distinction | masculine, feminine
nouns incapable of taking the ergative case (inanimates?) | nouns without a nominative/accusative distinction | neuter

Even now, English still preserves the lack of a nominative/accusative distinction in the neuter third person singular pronoun: he/him, she/her, but it/it.

Strangely, Russian has regained it: оно (nom.), его (acc./gen.; same as the acc./gen. for он 'he').  (The distinction is still absent from neuter nouns: e.g., окно [nom./acc.] 'window'.)

Russian also has an animate/inanimate distinction within the masculine category (as opposed to the pre-PIE animates which became later masculines and feminines).

According to Wikipedia, Polish goes even further: it has a three-way distinction within the masculine category:

animacy | personhood | nom. sg. | acc. sg. | nom. pl. | gloss
animate | personal | nowy student | nowego studenta | nowi studenci | new student(s)
animate | nonpersonal | nowy pies | nowego psa | nowe psy | new dog(s)
inanimate | | nowy stół | nowy stół | nowe stoły | new table(s)

I've been asked why ancient (Indo-European) languages are so complicated. I could turn the question around and ask, why are modern IE languages so complicated?  Sanskrit has nothing like the tripartite split of the masculine and it has fewer cryptogender nouns than some modern European languages.  Here's an outline of the Sanskrit system:

masculine | feminine | neuter
-a-s, -aa-s | -aa, -ii(-s), -uu(-s) | -a-m
(m./f.) -i-s, -u-s | | -i, -u
(m./f.) -ṛ (based on biological gender) | | (only when -ṛ nouns used to qualify neuter nouns; see Whitney, Sanskrit Grammar, p. 140)
(all genders) consonant-final root stems: e.g., pad 'foot' (m.), vaac 'voice' (f.), hṛd 'heart' (n.)
(m./f.) -aas/-as- (is uṣaas/uṣas- 'dawn' the only f.?) | | -as-, -is-, -us-
-aa/-an-, -ii/-in- | (feminines in -ii; see above) | -a/-an-, -i/-in-
-a(a)n/-a(n)t- | | -at/-a(n)t-

The above chart does not cover adjectives or oddities like rathiis 'charioteer' which looks like a feminine but is masculine.

2.10.00:50: Some of the above cryptogender categories are not so cryptic if forms other than the nom. sg. are examined: e.g., it's impossible to tell from the nom. sg. that agnis 'fire' is m. and gatis 'road' is f.  However, they have distinct inst. sgs. and gatis can optionally take f.-only endings in the dat., abl./gen., and loc. sg. (cryptogender forms are in bold):

sg. | 'fire' (m.) | 'road' (f.) | cf. 'girl' (f.)
nom. | agnis | gatis | kanyaa
acc. | agnim | gatim | kanyaam
inst. | agninaa** | gatyaa | kanyayaa
dat. | agnaye | gataye, gatyai | kanyayai
abl./gen. | agnes | gates, gatyaas | kanyayaas
loc. | agnau | gatau, gatyaam | kanyayaam
voc. | agne | gate | kanye

*PIE nom. *-s may have originated as an ergative ending (Beekes 1995: 194).  In Beekes' PIE reconstruction, the nom. sg. and gen. sg. of *-o-stems are identical.  This is not surprising, since "[i]n many languages the genitive serves as ergative ... It seems very probable, then, that the genitive originally functioned as an ergative."  I wonder if the reverse could be true: i.e., that the ergative case could have originally represented the 'possession' of an action by an animate actor.

In Sanskrit, the gen. sg. has an added -ya:

amṛta-s 'Amritas' (nom. sg.)

amṛta-s-ya 'of Amritas' (gen. sg.)

Although Pulleyblank has tried to link Sino-Tibetan to IE, as far as I know, he has never tried to link the Written Tibetan ergative ending -s to PIE *-s.  This resemblance is purely fortuitous.

WT -s is actually one out of four allomorphs of the ergative ending. All but one of these allomorphs are identical to the allomorphs of the genitive plus -s:

environment | genitive | ergative
after -d (< *-t), -b (< *-p), -s | -kyi | -kyi-s
after -r, -l, -n, -m | -gyi | -gyi-s
after -g (< *-k), -ng | -gi | -gi-s
after vowels | -Hi | -s
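The table above can be read as a lookup keyed on the last segment of the stem. Here is a minimal sketch of that reading, assuming stems are given in a plain Latin transliteration with the five vowels a e i o u; the function names and the example stems (bod, rgyal, rang, nga) are my own illustrations, not taken from Beyer.

```python
def genitive(stem):
    """Pick the WT genitive allomorph from the stem-final segment."""
    last = stem[-1]
    if last == "g":                 # covers both -g and -ng
        return "-gi"
    if last in "dbs":
        return "-kyi"
    if last in "rlnm":
        return "-gyi"
    if last in "aeiou":
        return "-Hi"
    raise ValueError("stem-final segment not covered by the table: " + stem)


def ergative(stem):
    """Ergative = genitive + -s, except after vowels, where it is just -s."""
    gen = genitive(stem)
    return "-s" if gen == "-Hi" else gen + "-s"


for stem in ["bod", "rgyal", "rang", "nga"]:
    print(stem, genitive(stem), ergative(stem))
```

The single special case in `ergative` is the one allomorph that is not simply the genitive plus -s.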

I presume all these forms came from an original *-ki(-s):

*-k voiced after a voiced consonant (though this doesn't explain how *-k-k- became -g-g-). 

*i became yi unless preceded by two velars.

*-k- lenited to -H- intervocalically.

In early Tibetan, the ergative ending after vowels may appear as -His (Beyer 1992: 190).  Thus -s must be a contraction of -Hi-s. Another variant in verse is -yis (Beyer 1992: 190, 265), possibly from

*-Hyi-s < *-gyi-s < *-gi-s < *-ki-s

The WT genitive resembles the later pronunciations (early Md *khi, late Middle Chinese *gɰi) of the Old Chinese third person possessive pronoun 其 *gə. However, OC *ə corresponds to WT a, not WT i.

**Although the -naa of agninaa (m.) sets it apart from gatyaa (f.), the same inst. sg. ending also appears in the neuter: vaariṇaa 'water'. (The retroflexion of ṇ is automatic after a preceding r and has nothing to do with gender.)


08.2.8.20:36: ÖÜTSPEECH (explanation*)

I am always interested in transcriptions of languages for nonlinguists, because they may give me insights about which sounds are perceived to be similar.

For example, Thai [ɨ] and [ə] are transcribed as ü and ö (even though German also has a [ə]!**) at clickthai.de. This reminds me of how Sino-Vietnamese ư, and less frequently, ơ may have been attempts to imitate Southern Middle Chinese *y and possibly *ø:

'fish': 魚 SV ngư [ŋɨɨ] < late SMC *ŋy

cf. Cantonese jy

'that which': 所 SV sở [ʂəə] < early SMC *ʂyø (or *ʂɨə?)

but other Chinese languages have a back vowel: e.g., Cantonese so, Md suo

Sino-Korean also has a back vowel: so instead of

I plan to look into this mismatch later.

if the SV form were based on late SMC, it would have been sử [ʂɨɨ] and rhyme with 'fish' above (except for the tone)

It is, however, also possible that SV preserved SLMC unrounded vowels.

*'Öütspeak' is based on Aussprache 'pronunciation'. Öü öf cöürse refers to the two vowel symbols that I discuss in this post.

Öü - if pronounced as in German - sounds like Dutch ui [œy] as in uitspraak, the Dutch equivalent of Aussprache.

**This is because German [ə] is written as e, which is the symbol clickthai.de uses to represent Thai [e]. Thai also has a long [əə] unlike German.

When I was trying to devise an orthography for Pijin in high school, I didn't know the IPA even existed. I did, however, know the German alphabet. So I recycled the German letter ö to write [ə].

Another German letter I used in my early Pijin orthography was ä for Pijin [æ]. (The German value of ä is [ɛ], so clickthai.de uses it to represent Thai [ɛ].)

My current Pijin orthography is based on a maximally un-English pronunciation with only five vowels, so I no longer need any vowel symbols other than a e i o u. a corresponds to all English achromatic (nonpalatal and nonlabial) vowels and e corresponds to all English nonhigh front vowels.


08.2.7.3:19: A 家 *KRA-ZY IDEA (PART 1)

In two recent posts, I mentioned two sinographs that happen to be homophonous:

豭 Old Chinese *k-ra > Middle Chinese *kæ > Md jia 'pig'

家 Old Chinese *kra > Middle Chinese *kæ > Md jia 'house'

Both graphs share the element

豕 MC *ɕjeʔ > Md shi 'pig'

Its OC reading is unknown, but it certainly couldn't be anything like *kra. It could have ended in *-eʔ or *-ajʔ. Schuessler reconstructed *hleʔ or *hlajʔ, which resembles Mon-Khmer words for 'pig'. I don't know of any evidence for a liquid initial other than this remote possibility*.

While I was out on Saturday, I realized that 豕 'pig' might have had two readings in early OC:

- the one that became MC *ɕjeʔ (a loan from MK?)

- OC *k-ra or just *ra

sharing a root with 豬 OC *t-ra 'pig' (< ?*tɯ-ra)

For a long time, I couldn't understand why 家 'house' was written as 宀 'roof' atop 豕 'pig'. Not all houses contained pigs.** But if 'house' was homophonous with 'pig', then it would have made sense to write 'house' as 'roof' (semantic) + 'pig' (phonetic).

When I came home, I found that Shuowen stated that 豭 'pig' was an abbreviated phonetic in 家. Although Shuowen dates over a millennium after the creation of sinography and is not necessarily reliable, I was glad to see that Xu Shen and I agreed that 'pig' was somehow related to 'house'.

The abbreviated phonetic explanation is rather strange, though, because the obvious phonetic in 豭 'pig' is its right side 叚 OC *kra 'borrow' (now written as 假 with 'person' on the left). Why wasn't 'house' written as 'roof' + 'borrow'? Why use the semantic part of a phonetic as an abbreviated phonetic?

I don't think 'pig' was either semantic or part of a larger phonetic - it itself was the phonetic.

Next: Given that 'pig' was a prefix-root sequence *k-ra, was its homophone 'house' also bimorphemic?

*豕 MC *ɕjeʔ could be phonetic as well as semantic in 豚 MC *don 'young pig' if both words had lateral root initials and liquid codas in OC:

豕 OC *hlalʔ > *hlajʔ > MC *ɕjeʔ

豚 OC *lur > MC *don

They would share a root *l-l ~ *l-r. However, I know of no other instances of an *a ~ *u root vowel alternation.

A slightly more likely cognate of 豕 is 'swine':

豨 OC *ʔhləl(ʔ) > *ʔhləj(ʔ) > MC *xɨj(ʔ)

but its other (!) phonetic 希 is presumably OC *xəj; there is no evidence for reconstructing 希 with a lateral

the only way out is to assume that the graph 豨 was devised after *ʔhl- > *x-, which would have to be a very early sound change

*a ~ *ə vowel alternation is much better attested. It has been advocated by Pulleyblank for years. Examples are in Schuessler (2007: 103).

**I think the hypersimplified graph 宀 'roof' + 人 'person' (U+219BC) makes more sense as a semantic compound for 'house'. Houses are generally meant for 人 people, not 豕 pigs.

宀+ 人 is from a list of proposed simplifications for discussion in the second table of the "Second Draft Plan for the Simplification of Sinography" (1977). As far as I know, these proposed simplifications were not used, whereas the simplifications in the first table were actually used between January and July 1978. How many of these short-lived graphs are in Unicode?

宀+人 may be in Unicode because it is also a traditional character for an obscure, post-MC word Md rong meaning 'long hair' (source). It could go back to an unattested MC *ɲuowŋʔ which in turn would be from an OC *noŋʔ. It may be cognate to other *n-words for hair: e.g.,

*nə 'whiskers of an animal' (may be from *no)

*s-nə 'bearded'

*s-no 'beard'


08.2.6.00:46: TREE TALES (explanation*)

The character spotlighted in "My Own House on the Water" seemed to be a heavily disguised version of this graph. Its parts make only a little less sense than this Japanese compound:

眼 梶木

'eye paper mulberry tree'

What is it? Hint: It's not a tree, though it does have eyes. Answer here. Scroll over the white space below for more.

Jpn mekajiki 'swordfish' sounds like

me 'eye' + kaji 'paper mulberry' + ki 'tree'

梶木 kajiki is purely phonetic, but 眼 me may be semantic. According to Koujien, 'eye' refers to the large eyes of the swordfish compared to kajiki 'marlin'.

Another spelling of mekajiki with different semantic graphs is 眼旗魚 'eye-flag-fish'.

旗魚 'flag-fish' is the Chinese word for 'marlin'. Its spelling has been recycled for an unrelated Japanese translation equivalent kajiki.

*Meant to sound like 'tree tails'. The graph 梶 Md wei consists of 木 'tree' (semantic) + 尾 'tail' (phonetic). It means 樹杪 Md shumiao 'treetop' and is probably cognate to its homophone 尾 'tail' (i.e., the end of something).

杪 miao 'tip of a branch; end of a period' in 樹杪 shumiao 'treetop' could be cognate to 梢 shao 'tip of a branch, treetop (< 'tip of a tree')'. (I will investigate the strange m- ~ sh- alternation later.) 梢 can also mean 'end of something' (e.g., a result) or 'rudder' (< 'tip of a ship'?).

In Japanese, 'rudder' and 'paper mulberry' are both kaji < *kandi, so this may have something to do with how 梶 'treetop' ended up representing a specific plant rather than just part of one:

梢 'treetop / rudder'

> similar in meaning to 梶 'treetop'

> has another meaning corresponding to Jpn kaji 'rudder'

> which sounds like Jpn kaji 'paper mulberry'

It's not clear why 梢 wasn't used to write Jpn kaji 'paper mulberry'.

(The normal spelling of Jpn kaji 'rudder' is 舵, originally for OC *lajʔ, which is unrelated to the m- ~ sh- 'tip' words or 尾 'tail' and 梶 'treetop'.)


08.2.6.00:22: AUNTIZENS

Who are they? They're called 아티즌 athijUn in Korean. I learned that word from the quiz atop this list of Korean words which aren't in dictionaries. Scroll over the white space below for the definition or go here for an explanation in Japanese. Hint: Netizen is 네티즌 nethijUn in Korean.

athijUn is from 아줌마 ajumma 'auntie' plus nethijUn. It refers to middle-aged women who use the Internet. Other similar coinages are:

악티즌 akthijUn (< Sino-Korean 惡 ak 'evil'): someone who slanders others online

욕티즌 yokthijUn (< Sino-Korean 辱 yok 'disgrace'): someone who attacks others in comments sections

노티즌 nothijUn (< Sino-Korean 老 no 'old'): elderly people who are online


08.2.5.23:50: LANGUAGE OF THE LUCKY LAND (where's that?*)

I just discovered that Wikipedia is available in 閩東語 Eastern Min: cdo.wikipedia.org. The cdo (< 'Chinese Dong [East]'?) pages are written in the strange-looking 平話字 Bàng-uâ-cê 'everyday speech character' romanization with double-dot subscripts to accommodate superscript tones: e.g., Mìng-dĕ̤ng-ngṳ̄, the EM pronunciation of the name of the language.

I would avoid subscripts by using tonal spelling:

BUC Bàng-uâ-cê : my Bhang wah czeh

cf. Phing huah zih in my Md tonal spelling (四聲拼音 'Sisheng Pinyin')

-h- (-z- after c-) indicates a yang tone after obstruents, which are not typically yang

final -h indicates a 'departing' tone

BUC Mìng-dĕ̤ng-ngṳ̄ : my Ming dëng nguu

cf. Miin dong yuu in my Md tonal spelling

doubled vowels indicate 'rising' tones

these tonal names are currently purely conventional; in Fuzhou Eastern Min, the 'rising' tone is actually mid level

the Fuzhou cognate of 閩 Md Miin is Ming with a 'level' tone instead of a 'rising' tone; Jiyun contains both 'level' and 'rising' tone Middle Chinese readings for 閩

Fuzhou has merged all final nasals as -ng; similarly, all final stops have become -k in Fuzhou, whereas Md has lost them entirely:

Middle Chinese | Fuzhou | Mandarin
*-ŋ | -ng | -ng
*-n, *-m | -ng | -n
*-k, *-t, *-p | -k | (none)
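The mergers in the table above can be sketched as a simple mapping from Middle Chinese codas to their reflexes. This is only an illustration of the table, not a full account of either language; the names `CODA_REFLEXES`, `reflexes` and `merged_in` are hypothetical, and an empty string stands for a lost coda.

```python
# Middle Chinese coda -> (Fuzhou reflex, Mandarin reflex), as in the table above.
CODA_REFLEXES = {
    "ŋ": ("ng", "ng"),
    "n": ("ng", "n"),
    "m": ("ng", "n"),
    "k": ("k", ""),
    "t": ("k", ""),
    "p": ("k", ""),
}


def reflexes(mc_coda):
    """Return the Fuzhou and Mandarin reflexes of one MC coda."""
    fuzhou, mandarin = CODA_REFLEXES[mc_coda]
    return {"Fuzhou": fuzhou, "Mandarin": mandarin}


def merged_in(language):
    """Group MC codas by their reflex in one language to show the mergers."""
    groups = {}
    for coda, (fuzhou, mandarin) in CODA_REFLEXES.items():
        reflex = fuzhou if language == "Fuzhou" else mandarin
        groups.setdefault(reflex, []).append(coda)
    return groups


print(merged_in("Fuzhou"))
print(merged_in("Mandarin"))
```

Grouping by reflex makes the asymmetry visible: Fuzhou keeps two coda classes (nasal and stop) with one member each, while Mandarin keeps two nasals and no stops.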

(I am ignoring the distinction between /k/ and /ʔ/ in Fuzhou tonal sandhi and initial assimilation. The distinction is neutralized in citation forms. The origin of the distinction is not clear to me.)

The BUC represents underlying phonemic forms, so it ignores complex phonetic phenomena: e.g., 福州 Hók-ciŭ 'Fuzhou' looks like it should be pronounced as [houʔ24 tsiu55] but it is actually pronounced [huʔ21 tsiu55] with a falling tone and no [o] in the first syllable. But 福 would be pronounced [houʔ24] with a rising tone and an [o] in isolation.

Two phonetic phenomena in Fuzhou may have parallels in Tangut:

1. The lenition of initial consonants after vowels is reminiscent of my proposal of Tangut lenition conditioned by lost presyllables:

*CV-k- > ɣ- (cf. Fuzhou /V k/ > [V Ø] [was there an earlier *ɣ-stage?])

*CV-ts- > z- (cf. Fuzhou /V ts/ > [V ʒ])

*CV-t- > l- (cf. Fuzhou /V t/ > [V l])

*CV-p- > w- (cf. Fuzhou /V p/ > [V β])

Tangut w may have been [v] or [ʋ] which are close to Fuzhou [β], and there could have been a *[β]-stage

Tangut lenition is like Alexander Vovin's proposal of medial lenition in Korean except that

- in Korean, presyllables were not involved, and the vowels conditioning lenition were not lost

- *p lenited to β in Middle Korean, not w; β later became modern w

2. The tense and lax vowels of Tangut might have been phonetically parallel to the 'close' and 'open' rhymes of Fuzhou.

*福州 Fuzhou consists of 福 'good luck' + 州 'district'.


08.2.4.22:26: MY OWN HOUSE ON THE WATER

is a reference to a sinograph that is a compound of

自 Md zi < MC *dzih < OC *sbits 'self'

家 Md jia < MC *kæ < OC *kra 'house'

水 Md shui < MC *ɕwiʔ < OC *hlulʔ 'water'

None of these components are phonetic. None seem to be semantic either. The structure of this graph is as opaque as some tangraphs. What could it possibly mean? Scroll over the blank lines below for the answer.

自 + 家 + 水 is a variant of 藥 Md yao < MC *jɨak < OC *lakw 'medicine'.

自 'self' may be a deliberate distortion of 白 Md bai < MC *bæk < OC *brak 'white'.

水 'water' may be a deliberate distortion of 木 Md mu < MC *mok < OC *mok 'tree, wood'.

家 'house' has a roof 宀 that may be a deliberate distortion of 艹 'grass'. But ㄠ and ㄠ, the only remaining parts of 藥, do not resemble the 豕 'pig' under the roof.

Next: A 家自 idea.


08.2.3.23:59: BORROWINGS WITHOUT BEARDS

In "Lotsa *luk", I reconstructed three Old Chinese readings of 濼 'medicinal grass':

*lakw

*hlakw

*Cʌ-lekw

The first reading matches 藥 OC *lakw 'medicinal herb, medicine'. Jiyun* has two more Middle Chinese readings of 藥 with nonmedicinal meanings:

MC *ɕɨak < OC *hlakw 芍藥熱皃 ?'peony appearing hot' (see the bottom right corner of this graphic); cognate to 灼 OC *t-lakw 'burn' and the rest of the *l-kw 'shine' word family?

MC *lɨak < OC *Cɯ-lakw 勺藥調味和 ??'peony flavors harmonizing' (this can't be right; see these two pages)

Vietnamese thuốc [thuək] 'medicine' seems to be borrowed from

MC or late OC *ɕɨak < OC *hlakw

which would be homophonous with the first of the two nonmedicinal meanings of 藥 and the second reading of 濼 'medicinal grass'. The correspondence of Viet th- to MC *ɕ- is regular and is parallel to the correspondence of Viet t- to MC *s-**.

There are two problems with this etymology:

1. MC *ɕɨak 'medicine' is not attested in the rhyme dictionary tradition (though MC *ɕɨak is attested as a reading of 藥)

2. MC *-ɨa- corresponds to Viet -ươ- [ɨə], not -uô- [uə] (The tonal correspondence is regular.)

The first problem is not fatal, since the rhyme dictionaries are not complete catalogs of every single word in MC.

(2.4.00:20: Moreover, OC *hl- could be from a prefixed *lakw. Until recently, I assumed that only *s- could devoice root-initial liquids, but now I suspect that *k- could also have the same effect. I will explore this idea in a later post.)

The second problem (a 'beardless' u appearing where a 'bearded' ư is expected) is also found in this etymology:

著 'chopstick': Viet đũa [ɗuə] < earlier Viet *duəh : MC *ɖɨəh or late OC *ɖɨah

The correspondence of late OC *ɖ to Viet *d also occurs in

池 'pond': Viet đìa [ɗiə] < earlier Viet *diə : MC *ɖiə or late OC *ɖia (Pulleyblank 1984: 209)

Why aren't the Vietnamese words for 'medicine' and 'chopstick' thước and đữa with -ư- [ɨ] instead of thuốc and đũa with -u-? Perhaps these words were borrowed during an early period when Vietnamese did not have the diphthong *ɨə, which "must have been an innovation ... and not something inherited from Austroasiatic" (Pulleyblank [1984], "The Old Chinese Origins of Type A and B Syllables", p. 85). Vietnamese *uə would have been the closest available approximation of Chinese *ɨə. (Cf. how a 'bearded' ư represents Vietnamese [ɨ].)

This implies that all Vietnamese borrowings from Chinese with -ươ-/-ưa [ɨə] either postdate the development of that diphthong in Vietnamese or were originally borrowed with *a and later subjected to diphthongization: e.g., 唐 Đường [ɗɨəŋ] 'Tang Dynasty' < MC *daŋ. (A nondiphthongized reading Đàng [ɗaŋ] also exists. There is no Chinese-internal evidence for an MC reading with a diphthong.)

*Thanks to Sven Osterkamp for introducing me to Waseda University's online version of Jiyun.

** *s- became *t- in Vietnamese after MC *s-words were borrowed. Viet th- corresponding to MC *ɕ- could be the result of a similar change:

*ɕ- borrowed as *sh- > *th-

but an aspirated s is unusual, though such a fricative does exist in modern Burmese.

(Burmese sh- is from an earlier *tsh- [romanized as hts- in Judson's grammar of 1842], which in turn is from an earlier palatal stop *ch- or palatal affricate *tɕh-.)


Tangut fonts by Mojikyo.org
All other content copyright © 2002-2008 Amritavision