I would like to see statistics on Hmong tonal frequency. In the Romanized Popular Alphabet, the mid tone is unmarked, leading me to think that it is the most frequent tone, but I don't know if that is really the case.

Although Hmong and Chinese are unrelated, they do share similar tonal systems (albeit with different values for each tonal category). For a long time I assumed that the mid tone corresponded to the Chinese 'upper level' category, which may be the most common Chinese tone. However, according to this table, the mid tone actually corresponds to the Chinese 'upper departing' category ("tone 5").

If White Hmong (WH; Hmoob Dawb) tones originated in the same way as Chinese tones, they should have the following sources:

Register | 'Level': final sonorant | 'Rising': final glottal stop | 'Departing': final fricative | 'Entering': final stop (non-glottal)
'Upper': voiceless initial consonant | tone -b (high) | tone -v (mid rising) | unmarked tone (mid) | tone -s (low)
'Lower': voiced initial consonant | tone -j (high falling) | tone -s (low) | tone -g (mid falling, breathy) | tone -m (low creaky) or -d (long low rising)

(5.13.00:11: The 'upper' tones are all nonlow in WH except for tone -s. The 'lower' tones are all falling [i.e., becoming low] and/or low in WH. Hence WH is a mostly voiced-low tonal language.

The table for Green Hmong [Moob Leeg] would be identical except that -g would be in the 'lower rising' as well as 'lower departing' categories. This merger is also common in Chinese.)

The breathiness of tone -g may be a trace of a lost final *-h.
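The table amounts to a lookup from an etymological cell (initial-consonant register, final-consonant category) to a WH tone letter. Here is a minimal Python sketch of that lookup, run in reverse to find which cells feed a given tone. WH_SOURCES and sources_of are hypothetical names, "" stands for the unmarked mid tone, and the Green Hmong option assumes that -g simply takes over the 'lower rising' cell from -s.

```python
# Hypothesized Chinese-style sources of White Hmong (WH) tones, transcribed
# from the table above. Keys are (register, category); values are tuples of
# RPA tone letters, with "" standing for the unmarked mid tone.
WH_SOURCES = {
    ("upper", "level"): ("b",),         # high
    ("upper", "rising"): ("v",),        # mid rising
    ("upper", "departing"): ("",),      # unmarked (mid)
    ("upper", "entering"): ("s",),      # low
    ("lower", "level"): ("j",),         # high falling
    ("lower", "rising"): ("s",),        # low
    ("lower", "departing"): ("g",),     # mid falling, breathy
    ("lower", "entering"): ("m", "d"),  # low creaky or long low rising
}


def sources_of(tone: str, green: bool = False) -> list[tuple[str, str]]:
    """List the (register, category) cells that would yield an RPA tone letter.

    With green=True, the 'lower rising' cell holds -g instead of -s
    (an assumption about how the Green Hmong merger fills the table).
    """
    table = dict(WH_SOURCES)  # copy so the WH table itself is never modified
    if green:
        table[("lower", "rising")] = ("g",)
    # Tuple membership compares whole strings, so "" matches only the mid tone.
    return [cell for cell, tones in table.items() if tone in tones]


print(sources_of("s"))              # [('upper', 'entering'), ('lower', 'rising')]
print(sources_of("g", green=True))  # [('lower', 'rising'), ('lower', 'departing')]
```

The reverse lookup makes the overlap visible: -s is the only tone fed by both an 'upper' and a 'lower' cell, which is why it is the one low tone in the 'upper' row.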

In my last entry, I reported that Golston and Yang (2000) found low -s tones in nearly all White Hmong (WH) loans from toneless French. One might hypothesize:

Nearly all loans from nontonal languages have -s tones in WH.

And then one might predict that loans from toneless English should also mostly have -s tones. However, G&Y found that Anglo-Hmong loans have four different tones out of eight (RPA spellings are my guesses):

-m tone: final unstressed syllable

xov fam < sofa

-j tone: stressed syllable with long nucleus (a tense vowel, diphthong, or vowel-[ɹ] sequence)

Khes Maj < Kmart

-v tone: syllable ending in a voiceless consonant (the same environment that conditioned all non-b, -j tones in Hmong?)

mbav < bus (WH has no initial b-; mb- is the closest substitute)

-s tone: syllable ending in a vowel or nasal (the same environment that conditioned -b, -j tones in Hmong?)

Khes Maj < Kmart

Do WH speakers hear more tones in English than in French?

The creaky -m tone surprised me because English has no phonemic creaky voice. Yet I can easily pronounce sofa with creaky voice in the last syllable.

One might expect Thai speakers to perceive English 'tones' in a similar manner, but they don't. According to Gandour (1979), Anglo-Thai loans are full of high and mid tones - the very tones that are absent from Anglo-Hmong loans! And Anglo-Thai falling and low tones do not correspond to Anglo-Hmong falling and low tones:

Tone | Anglo-Thai | Anglo-Hmong
Falling | final syllables of polysyllabic words ending in sonorants: wii (mid) saa (falling) 'visa' | stressed syllables with a long nucleus: Maj in Khes Maj 'Kmart'
Low | final syllables of polysyllabic words ending in voiceless stops (in Thai): hɔt (high) dɔɔk (low) 'hot dog' | syllables ending in sonorants: Khes in Khes Maj 'Kmart'

I wish I could find examples of the same English word in both Thai and WH for tonal comparison.

5.13.00:54: According to Nacaskul (1979: 161),

The high tone seems to be the favourite tone for English loanwords [in Thai ...] It is also noticeable that the rising tone hardly ever occurs in English loanwords in Thai [with only three exceptions ending in diphthongs, whereas the Hmong rising tone corresponds to syllables ending in voiceless consonants].

5.13.1:10: Bickner (1986: 35) pointed out

that the route travelled by a particular word [i.e., borrowing through speech or writing] will influence its pronunciation in Thai. For words which entered the language through speech, it is important to consider [tonal] contour shape as well as several seemingly minor details of the phonology of the different Thai tones in order to understand the pattern of tone assignment [e.g., the "glottal constriction ... characteristic of the Thai high tone" that is absent in the Hmong high tone]

The Biblical Hmong words in my last post were probably borrowed through both speech and writing - possibly with a conscious attempt to maintain tonal uniformity - whereas the Anglo-Hmong loans were probably borrowed through speech.

5.13.1:21: Kenstowicz and Suchato (2004: 25) performed an experiment in which native speakers of Thai assigned high or mid tones to nonsense English monosyllables ending in nasals or nasal-stop combinations:

Coda | High tone | Mid tone
Syllable ending in nasal | 202 | 842
Syllable ending in nasal + stop | 593 | 451

I wonder if the participants would have chosen other tones if they had been given more options.
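Assuming the figures in the Kenstowicz and Suchato table are counts of responses (the table itself does not say), converting them to proportions makes the contrast explicit. Here is a minimal Python sketch; RESPONSES and high_tone_share are illustrative names.

```python
# Counts from Kenstowicz and Suchato (2004: 25) as given in the table above,
# assumed to be numbers of responses.
RESPONSES = {
    "nasal": {"high": 202, "mid": 842},
    "nasal + stop": {"high": 593, "mid": 451},
}


def high_tone_share(counts: dict[str, int]) -> float:
    """Proportion of high-tone responses among high + mid responses."""
    return counts["high"] / (counts["high"] + counts["mid"])


for coda, counts in RESPONSES.items():
    total = counts["high"] + counts["mid"]
    print(f"{coda}: {counts['high']}/{total} high = {high_tone_share(counts):.1%}")
# nasal: 202/1044 high = 19.3%
# nasal + stop: 593/1044 high = 56.8%
```

Both rows total 1,044, so the figures compare directly: adding a stop after the nasal roughly triples the share of high-tone responses.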

White Hmong (WH; Hmoob Dawb) has eight tones but also has borrowings from toneless languages. Since all WH syllables must have a tone, foreign words that did not originally have tones acquire them. Golston and Yang's (2000) paper "White Hmong Loanword Phonology" examines tonal assignment in words such as

'Damascus': Fr Damas [dama] > WH Das Mas [da ma] (s = low tone)

'Sinai': Fr Sinaï [sinai] > WH Xis Nais [si nai] (initial [s] is written as WH x; WH s is [ʂ])

'Timothy': Fr Timothée [timɔte] > WH Tis Mos Tes [ti mɔ te]

'David': Fr David [david] > WH Das Vis [da vi]

'Sarah': Fr Sarah [saʁa] > WH Xas Las [sa la]

'Jacob': Fr Jacob [ʒakɔb] > WH Yas Kos [ja kɔ]

(The WH spellings are my guesses based on G&Y's phonetic transcription.)

Although one might think that the silent -s of French Damas was borrowed as WH -s for a low tone, nearly all French syllables were borrowed with s-tones regardless of their original spellings.

At least four exceptions have other tones:

-b (high) tone

'Mary': WH Mab Liab [ma lia] (not *Mas Lias; appears to be from Maria, not Fr Marie; not listed by G&Y as an exception)

'Peter': WH Pob Zeb [pɔ ʒe] (not *Pos Zes; does not appear to be from Pierre; WH z is [ʐ], not [z], just as WH s is [ʂ], not [s])

'Job': WH Yob [jɔ] (not *Yos)

-v (mid rising) tone

'Ruth': WH Luv [lu] (not *Lus; not listed by G&Y as an exception)

G&Y regard Yob as a possible case of spelling-driven tone assignment. Note, however, that Jacob also ends in a -b but was not borrowed as WH *Yas Kob.

'Mary' and 'Peter' may have anomalous tones because they could be loans from a language other than French. I cannot identify a source for 'Peter', as I don't know of any European equivalent of 'Peter' like Po(d)re. I wonder if there are two or more layers of Christian WH vocabulary: a French layer and an even older non-French layer. The Hmong might have heard about Mary (and Peter?) before the whole of the Bible was translated into Hmong.

The -v tone in 'Ruth' has parallels in Hmong borrowings from English that I'll look at next time.

SEELTERSK: FONETIK, FONOLOGIE, STAVERING (Saterland Frisian: phonetics, phonology, spelling)

I thank Dwight Decker for introducing me to Sater(land) Frisian (Seeltersk; hereafter SF - initials he would like). Here are a few things about it that caught my eye:

Three degrees of vowel length

Most SF vowels are either short or long, but high vowels may be 'semilong'. Acute accents distinguish long high vowels from semilong high vowels in spelling (stavering).

Short | Semilong | Long
i [ɪ] | ie [iˑ] | íe [iː]
u [ʊ] | uu [uˑ] | úu [uː]
(but uu in uui [uːi]; there is no semilong [uˑi])

I wonder what conditioned the semilong vowels.
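The uui exception means SF high-vowel spellings cannot be read letter by letter: uu must be checked for a following i before it is taken as semilong. Here is a minimal Python sketch of longest-match transcription, limited to the high vowels in the table above. HIGH_VOWELS and high_vowels_to_ipa are hypothetical names, and the sketch ignores diphthong spellings such as äuw, whose u it would wrongly read as [ʊ].

```python
import unicodedata

# Saterland Frisian high-vowel spellings and their values, from the table above.
HIGH_VOWELS = {
    "uui": "uːi",  # uu before i is long: there is no semilong [uˑi]
    "íe": "iː",
    "úu": "uː",
    "ie": "iˑ",
    "uu": "uˑ",
    "i": "ɪ",
    "u": "ʊ",
}
LONGEST = max(len(g) for g in HIGH_VOWELS)


def high_vowels_to_ipa(spelling: str) -> str:
    """Transcribe high-vowel graphemes, always trying the longest match first.

    Any other character is copied unchanged.
    """
    # NFC turns a decomposed "i" + combining acute into the single letter "í".
    text = unicodedata.normalize("NFC", spelling)
    out, i = [], 0
    while i < len(text):
        for size in range(LONGEST, 0, -1):
            grapheme = text[i:i + size]
            if grapheme in HIGH_VOWELS:
                out.append(HIGH_VOWELS[grapheme])
                i += len(grapheme)
                break
        else:  # no grapheme matched: copy one character
            out.append(text[i])
            i += 1
    return "".join(out)


for g in ["i", "ie", "íe", "u", "uu", "úu", "uui"]:
    print(g, "->", high_vowels_to_ipa(g))
```

Checking the three-letter uui before the two-letter uu is what keeps [uːi] from being misread as semilong [uˑ] plus [ɪ].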

The only other language I can think of with three degrees of vowel length is Estonian, which has 'overlong' rather than 'semilong' vowels. Unlike SF, Estonian lacks a three-way distinction in its orthography: long and overlong vowels are spelled identically.

Wikipedia lists a few other examples of languages with three degrees of vowel length:

One of the very few languages to have three lengths, independent of vowel quality or syllable structure, is Mixe. An example from Mixe is [poʃ] "guava", [poˑʃ] "spider", [poːʃ] "knot". Similar claims have been made for Yavapai and Wichita.

Could Tangut's rich vowel system have had such a distinction?

The nonhigh SF long vowels are generally written doubled without acute accents. Exceptions are

oa [ɔː] (not oo, which is for [oː]; there is no corresponding short *[o])

öä [œː] (not öö); abbreviated to ö in öi [œːi] since there is no short *[œi]

(2:24: Were these historically opening diphthongs *[oa] and *[œɛ] that monophthongized?)

This SF course implies that üü may also have three lengths; it says that ie, uu, üü (without acute accents!) "are sometimes pronunciated long, sometimes shorter."

The course also mentions that long vowels may be written as single vowels in open syllables (as in Dutch) in "Dr. Fort's spelling".

-u vs. -uw

SF has orthographic syllables ending in both -u and -uw: e.g.,

Dau 'dew'

häuw 'hit, thrust'

What is the phonetic difference between them, if any? w is [w] after /u/, so are they [u] and [uw]? Or is -w required after some diphthongs but not others? Are there minimal pairs of the same vowel or diphthong before zero and -w?

The aforementioned SF course teaches that -w is part of the diphthong: äuw [ew] (Wikipedia: [ɛu]). Is this use of -w arbitrary, or was äuw originally *[ɛ(u)v] or *[ɛ(u)ʋ]?

s vs. z

Why is /s/ spelled both s- and z- in initial position? The SF course states that "Initial s is always sharp like in English sister [s]". Does the z-spelling reflect a lost earlier *[z]? Is the absence of minimal pairs of s- and z-words accidental?

MORE MŪṢ-TERIES

I forgot to mention mūṣ-tery 7 last night - which is arguably really muṣ-tery 1: why is √muṣ listed as a variant of √maṣ 'hurt' in Monier-Williams? CuC roots do not alternate with CaC roots.

Mūṣ-tery 8: According to Monier-Williams, √maṣ 'hurt' was "prob. invented to serve as the source of the words" with maṣ- (= mash- in MW's romanization) 'powder; ink'. But why doesn't this artificial verb mean 'powderize' rather than 'hurt' (< 'crush' < 'crush into powder'?)? I would expect a more transparent relationship between an artificial verb and the words that inspired it.

Mūṣ-tery 3 revisited: I asked,

Is this verb [√mūṣ- 'steal'] attested, or was the root invented on the basis of the 'stealer' interpretation of mūṣ- 'mouse'?

Turner (entry 10222) derived Hindi मूसना mūs-nā 'to steal' from Skt mūṣ-a-ti 'steals'. (Obviously he meant that the two shared a root √mūṣ, not that Hindi -nā is from Skt -ati.) Since it's highly unlikely that the Hindi verb is based on an artificial root, I assume that √mūṣ was a real root - perhaps a colloquial variant of earlier √muṣ influenced by mūṣ- 'mouse'. How many other marginal Sanskrit words have firm descendants in later Indo-Aryan?

MŪṢ-TERIES

In "More Ra-ts", I mentioned that the Sanskrit root for 'rat, mouse' is mūṣ-, cognate to English mouse. I first learned a suffixed derivative mūṣ-ika- 'rat, mouse'. The word survives today in Hindi as mūs; other modern descendants are listed in entry 10258 of Turner.

The earliest attestation of the word I can find is in the Rigveda (i, 105, 8):

mūṣ-o na śiśn-ā vy-ad-anti

rat-NOM-PL as tail-INST-SG devour-3PL

Mūṣ-tery 1: What does this mean? Griffith translated this as 'as rats devour the weaver's threads' but I don't see any 'weaver's threads'. I do see the instrumental singular (not plural!) of 'tail'. 'As the rats devour with a tail?' That doesn't make sense.

'Tail' is the only part of the phrase that has no cognate in English.

na, literally/cognate to 'not', came to mean 'although not being' (Monier-Williams).

vy- 'apart' is cognate to vice (that which is apart - separated - from that which is correct?).

ad is cognate to eat; *e became a in Sanskrit.

Mūṣ-tery 2: Are any other forms of mūṣ- attested: e.g., is its nominative singular mūṭ? (-ṣ is not permissible before a pause.)

Mūṣ-tery 3: Monier-Williams glossed mūṣ- as 'stealer, thief'. Is the word cognate to √muṣ 'steal'? That seems unlikely as Sanskrit ū is from a Proto-Indo-European vowel-laryngeal sequence *uH whereas short u is from PIE *u without a laryngeal. My understanding is that laryngeals are integral parts of roots and can't be inserted: e.g.,

*mus > *mu-H-s!?

(2:45: The Dhātupāṭha listed √mūṣ with a long vowel as 'steal'. Is this verb attested, or was the root invented on the basis of the 'stealer' interpretation of mūṣ- 'mouse'?)

Mūṣ-tery 4: √muṣ 'steal' can be conjugated as a member of three different verb classes: e.g., 'steals' could be

I (earliest attested class): moṣ-a-ti (not muṣ-a-ti, which is class VI; see below)

IX (second oldest but most common?): muṣ-ṇā-ti

VI (newest): muṣ-a-ti

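The class traits are guṇa-grade moṣ- plus -a- in class I, -ṇā- after the root in class IX, and unstrengthened muṣ- plus -a- in class VI.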

-ti '-s' is cognate to archaic English -th.

I always thought of VI as the easiest class: the stem is more stable in the present than in classes I or IX. So I'm not surprised it's the newest. What I don't understand is the function(s?) of the elements between the roots and the endings: -a- in I and VI and -ṇā- in IX.

It's interesting that the verb started out in huge class I, then moved to the smaller classes IX and VI.

(2:36: According to Whitney [1924: 263, 267], there are "less than twenty" class IX verbs "in use through the whole life of the language" as opposed to "over two hundred" class I verbs and roughly fifty class VI verbs during the same period.)

The first class IX verb I learned was √krī 'buy' (e.g., krī-ṇā-ti 'buys') - and to steal is to take without buying.

I hope to return to class IX when/if I write about Korean verbs again.

Mūṣ-tery 5: √muṣ 'steal' has no attested future (not counting grammatical texts). Was it really impossible to say 'will steal' with √muṣ as opposed to its synonym √cur?

Mūṣ-tery 6: Monier-Williams listed a derived noun muṣ 'theft' with the note "MW" where the abbreviation of an attestation is expected. "MW." is not in the printed list of abbreviations but the online edition says it is short for

Monier-Williams' Sanskrit-English Dictionary, 1st edition with marginal notes

Was Monier-Williams citing his own work, or was this abbreviation added by later editors?

Monier-Williams finished the new edition of his dictionary just days before he passed away in 1899.

MORE RA-TS

My entry on Vietnamese chuột < *juət 'rat' and related words reminded David Boxenhorn of "Chua the rat" from Kipling's The Jungle Book. That name is derived from the common New Indo-Aryan word for 'rat' and/or 'mouse':

cūha 4899 *cūha ʻrat, mouseʼ.

S. cūho m. ʻratʼ; L. cūhā m. ʻratʼ, cūhī f. ʻmouseʼ; P. cūhā m., °hī f. ʻrat, mouseʼ; N. cuhā ʻmouseʼ; B. cuyā ʻrat, mouseʼ; Or. cūā ʻmouseʼ; H. cūhā, cūā m., cūhī, °hiyā f. ʻrat, mouseʼ, G. cuvɔ m.; M. ċuhā, ċuvā m. ʻsharp-witted personʼ. - Turner (1962-6: 267; key to abbreviations)

I don't know where this word came from. It has no Sanskrit cognate. The basic Sanskrit root for 'rat' is mūṣ-, cognate to English mouse. Sanskrit lexicons list caṇḍu- 'rat' and cikura-, cikka-, cuñcu-, cucundarī, and chucchundara- 'muskrat' (cf. Kipling's "Chuchundra, the musk-rat" and entry 2661 in Burrow and Emeneau's Dravidian Etymological Dictionary), but none look like good matches for *cūha, as I wouldn't expect ū to correspond to a or i, or *h to correspond to k, kk, ñc, or various NT-type clusters.* The only Munda forms I can find (in Ho and Remo), apart from Santali cũnd 'muskrat' (from B&E 2661), don't resemble *cūha either.

Some other c/ts-words for 'rat' in (South)east Asia:

Thai ไจ้   <cai2> cai 'rat (calendrical)'

Lao ໄຈ້ <cai2> cai 'rat (calendrical)'

Old Chinese 子 *tsəʔ 'rat (calendrical)'

Korean 쥐 cwi 'rat'

(I recall that Hashimoto Mantarou proposed that this word was borrowed from Chinese 鼠 'rat', which generally has a fricative initial [e.g., Mandarin shu], but southern languages like Taiwanese tshi still have affricate initials.)

The Thai and Lao words may be borrowed from a southern Old Chinese variant of 子 'rat' with a presyllable (prefix?) that conditioned vowel lowering:

*Cʌ-tsəʔ > *Cʌ-tsʌɰʔ > *tsʌjʔ

Thai and Lao c- is the closest available equivalent of Chinese ts-.

Thai and Lao written tone 2 is from *-ʔ.

5.10.00:13: The Thai/Lao/OC words have nothing to do with the Korean or Indian words: a shared consonant type is not sufficient evidence for a connection.

*I would expect medial *h to be from a voiced aspirate (gh, jh, ḍh, dh, bh) or a fricative (ś, ṣ, s).

NƆ̣-T RELATED

Yesterday I mentioned Proto-Tai *hnuu 'rat' which vaguely resembles Tangut

3907 2nɔ̣ 'rat' (not the calendrical term which is 3859 1xwɨi)

One might think they are related not only to each other but also to

Proto-Kam-Sui:
*kh-noC (Edmondson and Yang 1988 in Schuessler 2007: 471; tone C may be from *-ʔ)

*hnoC (Peiros)

*hnuC (Thurgood)

Proto-Mon-Khmer *kni (Shorto 2006) and its descendants: e.g., Mon ဂၞိ <gni> nɔeˀ (Shorto 1962)

Old Chinese *hnaʔ (Schuessler 2007: 471)

Japanese ne < *na(-)i or *ne (calendrical term; the regular word is nezumi)

However, the only solid set of cognates is probably Proto-Tai and Proto-Kam-Sui. The vowels of the other forms mostly don't match. If one proposed rules to explain the vocalic discrepancy, those rules should also apply to other cognates and/or loans.

The source of Jpn ne is unknown; it is probably not a truncation of the regular word nezumi, since -zumi cannot be interpreted as a suffix or suffix sequence.

Taiwanese tshi for 'rat' may point to Old Chinese *th rather than *hnaʔ.

And Tangut 2nɔ̣ may be from pre-Tangut *SnroH or *SnraŋH; both have a medial *-r- absent in the others and there is no evidence for a Tangut infix *-r-.* (A Tangut *-ŋ : Old Chinese zero correspondence may not be a problem: see Schuessler 1997: 76-77.)

*5.9.1:05: An unknown coronal obstruent *S- conditioned the tense vowel of 2nɔ̣. The subscript dot indicates tenseness.

Pre-Tangut *-r- conditioned the lowering of *o (possibly from *-aŋ) to ɔ.

There are no known alternations between Tangut words reconstructible with and without *-r-. For now I assume that Tangut medial *-r- is a root consonant.

An unknown final glottal *-H conditioned the second (i.e., 'rising') tone of 2nɔ̣.

HUNTING FOR RATS


Vietnamese chuột < *juət 'rat (regular word)'

Thai ชวด <jawaɗa>* chuat < *juat 'rat (calendrical)'

Seeing these forms at Andrew West's BabelPad page made me wonder how widespread this word was. It's also in Khmer:

ជូត <juuta> cuut < *juut 'rat (calendrical)'

The regular words for 'rat' outside Vietnamese are

Thai หนู <hnuu> nuu and Lao ໜູ  <hnuu> nuu < Proto-Tai *hnuu*

Khmer កណ្ដុរ <kaṇḍura> kɑndao ~ kɑndol < *kɔnɗur

-o ~ -l < *-r is odd; normally *-r becomes zero.

Is Surin Khmer knʌr 'rat' related?

I was surprised to find a chuột-like word for 'rat' in only one other language in the SEAlang Mon-Khmer Languages Project dictionary: Thanh Hoa Muong cuot. Ferlus did not reconstruct it at the Proto-Vietic level: i.e., in the ancestor of Vietnamese and Muong. It's not in Proto-Tai'o-Matic either.

The fact that chuột is the regular word for 'rat' in Vietnamese makes me think that the word was originally Vietnamese and spread to Thai and Lao via Khmer.

But that scenario doesn't explain why Thai chuat and Lao suat don't have -uu- like Khmer cuut; their -ua- matches Vietnamese -uô- [uə].

And why would Vietnamese -uô- [uə] be borrowed as -uu- in Khmer when Khmer had a perfect phonetic match -uə-? I would expect the Khmer word for 'rat' to be a homophone of ជួត cuət 'wrap around the head, wear a turban'.

Is there any other word with a similar distribution (Thai, Lao, Khmer, Vietnamese, Muong, but few/no smaller languages)?

*I keep changing my mind about how to transliterate Thai and Lao. I originally wrote <jwɗ> for both but decided to write the inherent vowels for maximum compatibility with Sanskrit and Pali. However, those vowels are often not meaningful for native words: e.g., native ชวด/ຊວດ <jawaɗa> < *juat 'rat' was never trisyllabic *jawaɗa, though borrowed นคร <nagara> nakhɔɔn 'city' is from trisyllabic Sanskrit/Pali nagara- 'id.'

Final -t in Thai and Lao is written as ด/ດ <ɗ>; cf. the -d of Classical Tibetan corresponding to Old Chinese *-t: e.g., CT brgyad : OC *pret 'eight'.

POGAN-ISM

Until this morning, I assumed that Białystok was a purely Polish name: 'white slope'. (Is there a slope in Białystok?) But then I saw this etymology in Wikipedia:

The linguist A. P. Nepokupnyj proposes that the language source for Białystok is Yotvingian. Names with the -stok suffix as a second element of a hydronym are localized in the basin of the upper Narew.

I looked up Yotvingian and found this sad story:

Until the 1970s, Yotvingian was chiefly known from toponyms and medieval Russian sources. But in the 1970s a monument with Yotvingian writing was discovered by accident. In Belarussia, a young man named Zinov, an amateur collector, bought a manuscript titled Pogańskie gwary z Narewu ("Pagan speeches of Narew") from a priest. It was written partly in Polish, and partly in an unknown, "pagan" language. Unfortunately, Zinov had an argument with his mother, who burned the priceless manuscript in a rage. However before the manuscript was destroyed, Zinov had made notes of it which he subsequently sent to the renowned Baltist Vladimir Toporov. Even though Zinov's notes were riddled with errors, it has been proven beyond doubt that the notes are indeed a copy of an authentic Yotvingian text. This short Yotvingian–Polish dictionary (of just 215 words), Pogańskie gwary z Narewu, appears to have been written by some Polish priest in order to preach to Yotvingians in their mother tongue.

What if Zinov hadn't taken notes? How much less would be known?

The title of the book puzzles me because Polish for 'pagan' is pogańskie (nominative plural), with o where Latin pāgānus has a. The left side of the Wikipedia page on paganism lists other o-words for 'paganism' besides Polish pogaństwo:

West Slavic:

Czech pohanství (why not -stvo?*)

Slovak pohanstvo

South Slavic:

Croatian poganstvo (but Serbian паганизам without о and with the borrowed suffix -изам from Latin -ismus instead of the native suffix -ство - I find it ironic that Croatian seems to have more Slavic elements despite Croatia's greater affinity with the West)

Slovene poganstvo

Baltic:
Lithuanian pagonybė

Samogitian paguonībė

Finno-Ugric:
Hungarian pogányság

All of these words are based on a common Latin prototype pāgān(ism)us without o. Why do the borrowings have o in two different locations? Are there other words with non-Latin o corresponding to Latin a?

*22:49: If I understand pages 50 and 51 of Janda and Townsend's grammar correctly, Czech has a two-way opposition I haven't seen elsewhere in Slavic:

-stvo is for animate(-related?) collectives (hence svinstvo 'filthiness' - a condition reminiscent of svině 'swine'?)

-ství is for other abstractions

The declension of -ství is surprising: all singular cases are -ství except for instrumental singular -stvím, and the plural has less syncretism than the singular: -ství (nom./gen./acc./voc.), -stvím (dat.), -stvích (loc.), -stvími (inst.).
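The syncretism claim can be checked by counting distinct endings in each number. Here is a minimal Python sketch using only the endings listed above; the case abbreviations and the names SINGULAR, PLURAL, and distinct_endings are illustrative.

```python
# Endings of Czech -ství nouns as given above (after Janda and Townsend).
CASES = ["nom", "gen", "dat", "acc", "voc", "loc", "inst"]

SINGULAR = {case: "-ství" for case in CASES}
SINGULAR["inst"] = "-stvím"  # the only singular case that differs

PLURAL = {"nom": "-ství", "gen": "-ství", "acc": "-ství", "voc": "-ství",
          "dat": "-stvím", "loc": "-stvích", "inst": "-stvími"}


def distinct_endings(paradigm: dict[str, str]) -> int:
    """Number of different endings across the cases of one number."""
    return len(set(paradigm.values()))


print("singular:", distinct_endings(SINGULAR))  # singular: 2
print("plural:", distinct_endings(PLURAL))      # plural: 4

# Plural cases whose ending coincides with the instrumental singular.
print([case for case, ending in PLURAL.items() if ending == SINGULAR["inst"]])  # ['dat']
```

Seven cases share two singular endings but four plural endings, and the instrumental singular -stvím also coincides with the dative plural.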

