I have uploaded an Excel file of the initial phonemes of entries in Calvert Watkins' (2000) The American Heritage Dictionary of Indo-European Roots.

Following Beekes (1995: 125), I treat i- and u- as consonants y- and w-.

The other initial vowels originated as sequences of laryngeals and vowels. I didn't have time to sort through the various origins of initial vowels, but as a crude approximation, I will treat

a- as ʕ-

e- as ʔ-

o- as ʕʷ- (though some o- once had ʔ-)

using Beekes' (1995: 148) interpretation of the laryngeals.

The top three initials are voiceless: *s-, *k-, *p-.

The next three are labial: *w-, *bh-, *m-. They are much more common than their palatal and dental counterparts *y-, *dh-, *n-.

*ʕ- is even more common than *d- and *t-, which is surprising since ʕ is rare in the world's languages. It occurs in only 2% of UPSID. ʕ- is also more common than d and t in Arabic. If a language has ʕ, is it likely to have a lot of ʕ?

In Arabic, ʔ is slightly more common than ʕ, but in Watkins (2000), *ʕ- is almost twice as common as *ʔ-!

Labiovelars and the unusual labialized pharyngeal *ʕʷ- (not even listed in UPSID!)* are near the bottom and *b- is at the bottom.

*7.24.00:16: Shuswap not only has ʕʷ but also has glottalized ʕʷˀ.

THE DISTRIBUTION OF INITIALS IN AUSTRONESIAN COGNATE SETS

At the top of the cognate set pages of Robert Blust's Austronesian Comparative Dictionary is a bar graph showing how many sets share a given initial phoneme. The distribution of initial consonants is extremely skewed:

Top seven initials

1. b (944)

2. zero (i.e., vowel-initial; 696)

3. q (382)

4. p (305)

5. k (253)

6. l (228)

7. t (227)

Bottom seven initials

7. m (39)

6. n (36)

4-5. c and ŋ (tie; both 35)

3. z (25)

2. N (8)

1. ñ (7)

These figures conflate reconstructions at different levels: e.g., Proto-Austronesian, Proto-Malayo-Polynesian, Proto-Oceanic (which has an *o- absent from PAN and PMP). Nonetheless, three oddities stand out amidst that messy aggregate:

- The voiced stop b is about 2.5 or more times as common as any voiceless stop or any other voiced stop. Why? (Cf. Proto-Indo-European *b- which was marginal!)

- Uvular q is 50% more common than velar k, even though k is a more common sound worldwide. Why?

- All nasals are in the bottom seven. PAN has only one cognate set with initial *ñ-. Why? (Cf. Proto-Turkic which lacked initial *m-.)

I should explain why I find those frequencies unusual.

Compare the top and bottom seven of the ACD with those of English, Spanish, Sanskrit, and Arabic.

Voiceless stops are usually more common than their voiced counterparts:

English: p > b, t > d, c (even without adding k) > g

Spanish: p > b, c > g, but d > t (though not by 2.5 times! - could the preposition de account for a significant amount of the frequency of d?)

Sanskrit: p > b, t > d, c > j, k > g

Arabic: k > j from *g, but

d is slightly ahead of t (excluding taa' marbuuṭa for t)

b > f from *p (though not by 2.5 times! - could the preposition bi account for a significant amount of the frequency of b?)

In UPSID, k occurs in 89% of languages, but q occurs in 11% of languages. That leads me to think that k should be more common than q, though in Arabic, the reverse is true. Nonetheless, the q-k gap in Arabic is only about 30%, whereas it is 50% on the Austronesian cognate set pages.
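The ratios behind these comparisons can be rechecked from the ACD counts quoted above. This is only a rough sketch: the dictionary and function names are my own, and the numbers are the cognate-set counts from the top-seven list, not textual frequencies.

```python
# Cognate-set counts per initial, copied from the top-seven list above.
acd_counts = {
    "b": 944, "zero": 696, "q": 382, "p": 305,
    "k": 253, "l": 228, "t": 227,
}

def ratio(a, b):
    """How many times more cognate sets initial `a` has than initial `b`."""
    return acd_counts[a] / acd_counts[b]

# b against each voiceless stop in the top seven, then q against k.
for a, b in [("b", "q"), ("b", "p"), ("b", "k"), ("b", "t"), ("q", "k")]:
    print(f"{a}/{b} = {ratio(a, b):.2f}")
```

On these counts q is about 1.5 times as common as k, and b is roughly 2.5 times as common as q, its nearest rival among the stops.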

Nasals are usually very common:

n is the second most common consonant in English, flanked by t at #1 and h at #3 (because of an and the?). m is way behind n, yet is still ahead of p.

n is the third most common consonant in Spanish. (Because of the n in indefinite articles? But l is not at the top in spite of the l in definite articles! And adding ñ to the n total couldn't be a big boost.) m lags far behind n but is still far from the bottom.

n and m are more common than any consonants in Sanskrit other than y (which is slightly ahead of m but not n), v, r, and t (the most common because it appears in pronouns and suffixes?).

n and m are the third and fourth most common letters in Arabic.

I am aware that I am comparing apples, oranges, and even some inedible objects here. Frequency in cognate sets mixing various levels of reconstruction is not the same as textual frequency (the basis of the English, Spanish, Sanskrit, and Arabic figures) or frequency in UPSID (i.e., the percentage of languages that have a particular phoneme). It is possible for a language to have only a small number of words with a phoneme that appears in very common words: e.g., English /ð/ in the.

TSOU FŊ-

I never expected to write a trilogy about fN-clusters. This trilogy is rather unbalanced, since this final part is much shorter than the first two.

Back in part 40 of my Uchinaaguchi series in May, I mentioned the initial clusters of Tsou. But I left out one cluster: fŋ-. There are no fP-clusters, so there may be a constraint against them. On the other hand, the existence of ft- leads me to think that there may be no constraint against a hypothetical fn-.

Proto-Austronesian did not have any *f- or initial clusters, so I assume fŋ- is from an earlier *CVŋ-.

7.22.00:33: Looking at Blust's Austronesian Comparative Dictionary, I see that Tsou f- is from PAN *b- and that fC-clusters do descend from PAN *bVC-sequences, though not necessarily in a straightforward manner: e.g.,

Ts fkorə 'a plant' < PAN *baNaR 'a thorny vine'

Ts fkuu 'a star visible in the twelfth lunar month: the Dipper' < PAN *banaq 'a constellation: the Pleiades'

Did PAN medial nasals become Ts -k-, and if so, how? Is Tsou fŋ- from *bVŋ-, or does it have a more complex origin?

AN OLD WORD WITH A *PNEW ETYMOLOGY

Yesterday, I wrote,

The only English example of fn- I know of is the invented word fnord.

f (and Dutch and German v) in Germanic languages is from Proto-Indo-European *p. So I would expect Germanic words with fn- to come from PIE *pn-. The only Indo-European pn-root I could think of was Greek pneu-, but I couldn't think of any fn-cognates in Germanic and assumed that modern Germanic fn-words must be innovations.

Andrew West came to my rescue last night and pointed out that the Old English word for 'sneeze' was fnēosan. Do the first four letters look familiar? JP Mallory (1997: 82) reconstructed (emphasis mine)

*pneu- '±snort [...] ON [Old Norse] fnȳsa 'puff, snort', OE fnēosan 'sneeze', Grk πνέω 'breathe'. Probably onomatopoetic in origin and only in Greek has it become the ordinary word for 'breathe'.

(What does the ± symbol represent?)

Nikolayev's database of Indo-European etymology has more.

What happened to the old Germanic fn-words for snorting and sneezing?

Proto-Indo-European *pneu-
Proto-Germanic *fniu-s-ia- ~ *fnū-s-a- ~ *fnu-s-, *fnu-z-an-
Old Norse fnȳsa; Middle High German phnūsen; Middle Dutch fniesen; Old English fnēosan
Icelandic fnýsa 'to sneeze'; Swedish fnysa 'to snort', nysa 'to sneeze'; Norwegian and Danish fnyse 'to snort', nyse 'to sneeze'; German niesen 'to sneeze'; Dutch niezen 'to sneeze'; English sneeze

In the mainland Scandinavian languages, f-less variants coexist alongside fn-words but with different meanings.

fn- was simplified to n- in German and Dutch, whereas f- assimilated to the following alveolar n- and became alveolar s- in English.
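The two treatments of fn- can be written as one-line rewrite rules. The following is a schematic sketch only: the function name is hypothetical, and the outputs are mechanical results of changing the initial cluster, not attested later forms, since the real words also changed in other ways.

```python
import re

def change_initial_fn(word, reflex):
    """Replace a word-initial fn- with the given reflex (e.g. 'n' or 'sn')."""
    return re.sub(r"^fn", reflex, word)

# German/Dutch type: fn- > n-
print(change_initial_fn("fniesen", "n"))
# English type: fn- > sn-
print(change_initial_fn("fnēosan", "sn"))
```

Only the initial cluster is touched; an fn sequence elsewhere in a word would be left alone because the pattern is anchored with `^`.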


Although Icelandic has a reputation for being conservative, even it is not immune to innovation. fnýsa retained fn-, but the cluster fn- in fnykr 'stench' had variants n- ~ sn- ~ fr-. (The first is reminiscent of mainland Scandinavian fn- > n- and the second is reminiscent of English fn- > sn-.)

According to the Wiktionary entry for sneeze (emphasis mine)

The infrequency of the “fn” combination coupled with the visual similarity of an “f” and “ſ” (long “s”) assisted in ultimately turning “fneeze” into “ſneeze (sneeze)”.

'Sneeze' is not a predominantly literary concept, so I suspect that its written form had nothing to do with the change. I can imagine misreadings of low-frequency words spreading from the literate to the illiterate, but I can't imagine the literate misreading an everyday word.

Andrew West explained the rules for long s.

FM-

Is there a language with initial fm-? I don't know of any. It would be nice if there were an online database of all attested consonant clusters in human languages.

Initial clusters of the type SN- are very common but FN- and XN- seem to be rare.

Japhug rGyalrong, a living relative of Tangut, has many initial XC-clusters including initial XN-clusters (Jacques 2004: 45, 47, 333):

ɣm(b)- < *km(b)-; ʁm- < *qm-; ʁmb- < ?; ʁmbɣ- < ?
ɣn(d)- < *kn(d)-; ʁn- (corresponds to Tibetan gny- in borrowings); ʁnd- < *qnd-
ɣndʑ- < *kndʑ-

ʁɲ- (corresponds to Tibetan gny- in borrowings)

and a large set of initial fC-clusters but no fN-clusters (Jacques 2004: 34, 295, 333-334):

ft- (not from proto-rGyalrong *pt-*; corresponds to Tib bt- [and in one case, Tib rt-!] in borrowings)

fts- < *pts-; fs- < *ps-; fsr- (not from proto-rGyalrong *psr-; corresponds to Tibetan bsr- in borrowings)
ftɕ- < *ptɕ-; fɕ- < *ptɕ-
ftʂ- < *ptr-

fk- < *pə-k- (*pk- became pɣ-!**)

Note that some f-clusters found in English and other European languages (fl-, fr-, fj- [as in fjord]) are missing. Everyday clusters to a Japhug speaker are exotic to Europeans, and vice versa.

Czech and Slovak have fň- in 'to whimper'

Cz fňukat

Sl fňukať

which I suspect is onomatopoetic in origin.

The only Dutch word with fn- is fnuiken 'to disable'. I don't know where this fn- or the somewhat more common fn- of Danish, Norwegian, and Swedish comes from.

The only English example of fn- I know of is the invented word fnord.

In addition to fC-clusters, Japhug has many βC-clusters but no βN-clusters***, whereas Slavic languages have lots of vN-clusters including vm- as well as vn-. Why is vm- more common than fm-? In the case of Slavic, v- < *w-, whereas f- is almost entirely in loanwords from languages without fm-.

*Odd that proto-rGyalrong had no *pt-. Such a cluster would have become an unattested *fc- (Jacques 2004: 297 - but why did labials condition palatalization of dentals?). Perhaps pre-Japhug originally borrowed Tibetan bt- and bsr- as *pt- and *psr- which later became ft- and fsr-.

**Could *pk- have become *fk-, following the pattern of the other *pC-clusters? The -ɣ- of pɣ- would be due to intervocalic lenition:

*pə-k- > *pə-g- > *pə-ɣ- > pɣ-
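That chain can be replayed as a list of ordered rewrite rules. This is a minimal sketch of the hypothesis in this footnote, not an established account of Japhug: the rule names and the function are my own, and the reconstruction asterisks are left off the strings.

```python
import re

# Ordered rules for the hypothesized chain; each applies to the output of the last.
RULES = [
    ("voicing after the schwa", r"(?<=ə-)k", "g"),
    ("spirantization", r"(?<=ə-)g", "ɣ"),
    ("schwa loss", r"ə-", ""),
]

def derive(form):
    """Return every stage of the derivation, starting with the input form."""
    stages = [form]
    for _name, pattern, replacement in RULES:
        form = re.sub(pattern, replacement, form)
        stages.append(form)
    return stages

print(" > ".join(derive("pə-k-")))
```

The order matters: if the schwa were lost first, k would no longer follow a vowel and the lenition rules would have nothing to apply to.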

***Japhug has f and β instead of f and v (both labiodental) or ɸ and β (both bilabial).

KHITAN SMALL SCRIPT 11: BAFFLED BY <BO.HU.ÁN>

Words for 'children' have unusual plurals in some languages:

English child-r-en (two obsolete plural endings)

Dutch kind-er-en (one obsolete plural ending followed by a productive ending)

cf. German Kind-er with just one ending

Russian дети (plural) is unrelated to ребёнок (singular) (full paradigm)

In Japonic, plurals have become singulars:

Jpn ko-domo 'children' > 'child' as well as 'children'; new plural kodomo-tachi

Proto-Ryukyuan *ko-ra 'child' > Okinawan kwaa 'child' (proposed by Leon Serafim)

(Thorpe 1983: 271 reconstructed PR *kuwa.)

The Khitan words for 'child' and 'children' follow none of the above patterns:


<bo.qo> 'child' ~ <bo.hu.án> 'children'

The plural not only has a different second (root?) syllable <hu> but also a final syllable <án> that does not match any of the Khitan plural endings listed in Kane (2009: 138-142).

Written Mongolian baɣacuud 'children' (< baɣa 'small') matches the <h> *[ɣ] of the Khitan plural, but not the <q> of the Khitan singular.

What's going on here? Three solutions:

Solution 1: The root of <bo.qo> and <bo.hu.án> is <bo>.

Problem 1: If <hu.án> is a plural suffix, what is <qo> in <bo.qo>?

Problem 2: If the root is <bo>, then it probably can't be related to WM baɣa unless <bo> is a contraction of an earlier *baw < *baɣa.

Solution 2: Toyoda read 'child' as *baɣa and 'children' as *baɣacu. These nicely match the WM forms and <án> does look like Chinese 出 *chu.

Problem 1: If <qo> was *ɣa, then the word for 'dog' was *niɣa with a *-ɣ- corresponding to -q- of WM noqai 'dog'. The price for one WM match is another WM mismatch.

Problem 2: The suffix <én> from part 10 could be changed to <án>, suggesting that they "must have been close, morphologically and/or phonetically" (Kane 2009: 70). Perhaps the character transcribed as <án> had two readings:

<chu> (based on Chn 出 *chu 'go out')

<án> (based on a Khitan root *an 'go out'?)

Solution 3: The <q> ~ <h> *[ɣ] alternation in 'child' ~ 'children' is consonant gradation.

Problem: As far as I know, this phenomenon has not yet been observed in any other Khitan words. Is consonant gradation attested in any Mongolic or even any 'Altaic' language?

KHITAN SMALL SCRIPT 10: A PROPER BIRD?

I realized today that the final Khitan small script phonogram 361 <én> which appeared at the bottom of the polygram from yesterday's entry might be derived from Chinese 焉 *yen which looks like 正 'proper' atop the bottom half of 鳥 'bird' but in fact can be translated as preposition + third person pronoun. Examples from Lin Yutang, Morohashi 1992 and Pulleyblank 1995:

在焉 'is in it'

往焉 'go there', 'go to him'

學焉 'learn from him'

大焉 'bigger than it'

焉 in preverbal position is interrogative:

焉往 'where ... go?'

焉有 'how is there ...?' (lit. 'how exist')

焉能 ~ 焉可 'how can ...?'

焉得 'how could ...?' (lit. 'how get')

焉知 'how does one know?' (lit. 'how know')

焉用 'what is the need to use ...?' (lit. 'how use')

It is equivalent to a hypothetical sequence

於 'in, at, to, from, than' + the third person object pronoun 之 'him, her, it, them'.

In Old Chinese, 焉 was *ʔan. The first two-thirds match 於 *ʔa but *-n is unlike 之 *tə, so 焉 cannot be a fusion of

於 *ʔa + 之 *tə

Is the *-n of 焉 *ʔan a remnant of an older third person pronoun paradigm with accusative *tə and locative *nV? (I doubt the third person locative was *nə because there was a second person pronoun 而 *nə, part of a larger set of *n-pronouns.) If an alternation between *t ~ *n forms seems unlikely, English has I and me in the same paradigm, and the two are different even at the Proto-Indo-European level:

*ʔeg 'I' : *ʔme 'me'

Abstract grammar words are difficult to visualize. They have no archetypal image that can be drawn. So they are written phonetically with graphs for homophonous words that can be drawn. Hence the third person object pronoun 之 is a drawing of a foot originally intended to represent *tə 'to go' and 焉 is a drawing of a bird called *ʔan sharing its bottom with 鳥 *tewʔ 'bird'.  Similarly, 於 *ʔa is a drawing of a crow (Old Chinese *ʔa, now written as 烏, a variant of the same drawing). And 不 *pə(t) 'not' is a drawing of a soaring bird representing *pə 'to soar'.

I think the resemblance between 焉 and 正 'proper' is due to scribes replacing the original top half of 焉 with a similar but unrelated line pattern 正 - originally incorporating a drawing of a foot 止 representing *təʔ 'to stop' (derived from 之 *tə 'to go' plus a suffix?).

Morohashi (1993: 480) explains 正 *teŋ-s 'proper' as a combination of a horizontal line 一 and a foot 止: "The original meaning [of 正] was to go straight forward. By extension, it was used to mean 'proper'." *teŋ-s 'proper' was derived from *teŋ 'to go' plus a suffix *-s. *teŋ 'to go' is now written as 征 with the motion radical 彳. Could 之 *tə 'to go' and 征 *teŋ 'to go' be cognates?

There are occasional zero ~ *-ŋ alternations in Old Chinese: e.g.,

于 *wa ~ 往 *waŋʔ 'to go'

於 *ʔa 'in' ~ 央 *ʔaŋʔ 'center' < 'reach the center' < 'be in'

無 *ma 'to not exist' ~ 亡 *maŋ 'to die' (i.e., to cease to exist)

而 *nə ~ 乃 *nə(ŋ)ʔ ~ 汝 *naʔ ~ 爾 *neʔ or *najʔ 'you' (did *-e irregularly develop from *-aj?)

(For the reconstruction of *-ŋ in 乃 *nə(ŋ)ʔ, see Sagart 1991: 61-62)

but most are not accompanied by vowel alternations. For other examples, see Schuessler (2007: 76). For an explanation involving an OC *-g that I don't reconstruct, see Gong (1995: 57-59).

登 *təŋ 'to rise', 等 *təŋʔ 'step of a stair' (and 'to wait' in post-OC), 待 *dəʔ < *?N-t- 'to wait'

could be cognate to 之 *tə 'to go' though the semantic match is extremely loose. (The latter two even happen to incorporate a drawing of a 止 foot into their phonetic 寺 which was once written as 止+寸 instead of 土+寸.)

There are also OC *ə ~ *e alternations in closed syllables (Schuessler 2007: 11): e.g.,

*krets ~ 届 *krəts 'limit'

*grek ~ 𦑜 *grək 'wing, feather'

but *tə is not a closed syllable. Are there any other cases of *-ə ~ *-eŋ alternation? I have already mentioned the closest other alternation

而 *nə ~ 爾 *neʔ 'you'

which does not involve *-ŋ. *nə may simply be an unstressed variant of the other second person pronouns.

Chinese 鳥 *tieu 'bird' may be the source of the Khitan small script character

072 <deu>

Chinese unaspirated *t was borrowed into Khitan as <d> (which could have been unaspirated *[t]).

