Earlier today (in a table in an addendum I finished on 6.14) I mentioned the 'famous' Saek word for 'eye' (praː) which attracts attention because it's not like Thai taː or similar words in other Tai languages. Pittayaporn (2009: 323) reconstructs its Proto-Tai source as *p.ta which elegantly accounts for the p-, -r- (< *-t-), and t-.

That made me curious about whether Proto-Tai *p.t- always became pr- in Saek. Going through Pittayaporn's list of Proto-Tai reconstructions, I see that Proto-Tai *p.t- has two different reflexes:

1. pr- as in 'eye' (above) and pra:j 'die' (Pittayaporn 2009: 357)

2. t- as in tɤ: 'gizzard' (Pittayaporn 2009: 330)

The presyllable *p.- must have been lost in the ancestor of Saek 'gizzard'; it is reconstructible on the basis of Bao Yen pʰɤɰ whose aspiration is from *-r̥- < *-r- < *-t- (cf. Cao Bang tʰɤj with the same source of aspiration).

Pittayaporn (2009: 328) reconstructs Proto-Tai *p.tak 'grasshopper' even though that word has no reflexes in Saek or Bao Yen. Does it have any reflexes with p-like initials? I think he reconstructs *p.t- on the basis of forms like Cao Bang and Shangsi tʰak which have aspiration from *-r̥- < *-r- < *-t- (as in Bao Yen). Even without Saek or Bao Yen or anything labial, the pattern of initials in Cao Bang and Shangsi matches that of *p.t-words rather than *t-words:

Bao Yen: pʰ-
Cao Bang: tʰ-

If Proto-Tai 'grasshopper' were simply *tak, the Cao Bang and Shangsi reflexes would be †tak with †t-.

6.15.10:16: Old Chinese had many words of the 'gizzard' type that had variants with and without presyllables: e.g., 扶 'to crawl'.

Early Old Chinese: *Nɯ-pʰa, *pʰa
Middle Old Chinese:
Late Old Chinese: *bua, *pʰɑ
Early Middle Chinese: *buo, *pʰɔ
Late Middle Chinese:

At a stage even before Early Old Chinese, the word may have been *Ni-pʰa, *Nə-pʰa, or *Nu-pʰa with a high series vowel that was later reduced in an unstressed position and ultimately lost.

In Early Old Chinese, the word had developed a variant without a presyllable. *pʰa is comparable to English 'cause, a variant of because without a presyllable be-. Presyllable loss - and other forms of reduction - is not entirely mechanically predictable. Just because because could lose its be- doesn't mean that it always did, much less that all be-words had such variation: e.g., there is no monosyllabic variant †lieve of believe.

In Middle Old Chinese, the high vowel of the presyllable conditioned the warping of *a to *ɨa. The variant without a presyllable had no high vowel and was subject to developing pharyngealization. I write pharyngealization after the initial consonant, but it was a quality of the entire syllable.

In Late Old Chinese, *N-pʰ- fused into *b-. *ɨ rounded to *u after labials. Pharyngealized *a backed to *ɑ. Pharyngealization disappeared after leaving its mark on the vowel.

In Early Middle Chinese, *a raised and rounded to *o after *u. *ɑ raised and rounded to *ɔ.

In Late Middle Chinese, the vowels raised further: *uo > *u, *ɔ > *o. *b- became breathy *fʱ before *u.

In Mandarin, breathiness conditioned tone 2 before being lost. Open syllables without that breathiness or any laryngeals developed tone 1. *o raised even further to /u/.
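The changes from Late Old Chinese onward can be sketched as ordered string rewrites. This is only a toy illustration in Python: the rule lists are transcribed from the prose above, the names `derive` and `mandarin` are hypothetical, and plain substring replacement is a crude stand-in for real conditioning environments.

```python
# Toy model of the changes described above, from Late Old Chinese onward.
# Each stage is a list of (old, new) substring rewrites applied in order.
STAGES = [
    ("Early Middle Chinese", [("ua", "uo"), ("ɑ", "ɔ")]),
    ("Late Middle Chinese", [("uo", "u"), ("ɔ", "o"), ("bu", "fʱu")]),
]


def derive(form):
    """Return the form at each stage, starting with the Late Old Chinese input."""
    history = [form]
    for _stage, rules in STAGES:
        for old, new in rules:
            form = form.replace(old, new)
        history.append(form)
    return history


def mandarin(form):
    """Breathiness conditions tone 2, otherwise tone 1; *o raises to u."""
    tone = "2" if "ʱ" in form else "1"
    return form.replace("ʱ", "").replace("o", "u") + tone


for start in ("bua", "pʰɑ"):
    steps = derive(start)
    print(" > ".join(steps), ">", mandarin(steps[-1]))
```

The two inputs are the Late Old Chinese forms of 'to crawl' with and without the former presyllable; the tone digit is appended to the Mandarin form for convenience.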

痡 'suffering' and 鋪 'to spread out' both have two variants, one with a presyllable and one without. The bare version happens to be homophonous with the monosyllabic version of 'to crawl'.

Early Old Chinese: *Cɯ-pʰa, *pʰa
Middle Old Chinese:
Late Old Chinese: *pʰua, *pʰɑ
Early Middle Chinese: *pʰuo, *pʰɔ
Late Middle Chinese:

*pʰ-, unlike *b-, did not develop a breathy reflex in Late Middle Chinese. As a result, Late Middle Chinese *fu became Mandarin /fu1/ rather than /fu2/ with tone 2 conditioned by *breathiness.

I suspect that the sesquisyllabic (and even earlier disyllabic) versions of 痡 'suffering' and 鋪 'to spread out' had very different first halves: e.g., *kupʰa and *pipʰa, etc. The original first consonants are not recoverable, and all that can be said about the original first vowel was that it was nonlow; a low series vowel (*a *e *o) would not have conditioned the warping of *a to *ɨa. *ɯ is my symbol for an unknown high series vowel. So the 'homophony' of 痡 'suffering' and 鋪 'to spread out' is an illusion caused by my agnostic notation *Cɯ-pʰa; the two words may not have been homophonous until Middle Old Chinese.

I don't know why 鋪 'to spread out' is written with the 金 'metal' radical. The sesquisyllabic version of 'to spread out' has a more common spelling 敷 with the radicals 方 'direction' and 攵 'action with hand'¹ which make more sense. 敷 is not a spelling of the monosyllabic version *pʰa.

Schuessler (2007: 173) regards 鋪敷 'to spread out' as cognate to 布 *pa-s 'to spread out' and 博 *pa-k 'wide'. The aspirated initial *pʰ- may be from some earlier cluster like *kp- (which is absent from Baxter and Sagart's 2014 reconstruction). Perhaps the earliest reconstructible form of 鋪 'to spread out' is *kɯ-pa. The two Middle Old Chinese forms would then both reflect the presyllable.

Stage 1 (Early Old Chinese): *kɯ-pa
Stage 2 (early presyllabic vowel loss): *kɯ-pa, *kpa
Stage 3 (vocalic transfer): *kɯ-pɨa, *kpa
Stage 4 (late presyllabic vowel loss): *kpɨa, *kpa
Stage 5 (aspiration): *pʰɨa, *pʰa
Stage 6 (Middle Old Chinese): *pʰɨa, *pʰˁa

In Stage 1, there is only one form of the word.

In Stage 2, the word develops a monosyllabic variant *kpa.

In Stage 3, the vowel of *kpa remains unbent since there is no presyllabic high vowel to condition the bending of *a to *ɨa.

In Stage 4, the presyllabic vowel of *kɯ-pɨa was lost.

In Stage 5, *kp- became *pʰ- - a change that probably also occurred in Middle Korean centuries later.

In Stage 6, the variant without a high vowel developed pharyngealization.
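The six stages can be walked through mechanically. The following Python sketch is only an illustration of the hypothesis above: the function names are hypothetical, the `lose_early` flag stands for the unpredictable choice made at Stage 2, and substring replacement is a simplification of the real environments.

```python
def early_loss(form):
    # Stage 2 (optional): the presyllabic vowel drops before it can affect *a.
    return form.replace("ɯ-", "")


def vocalic_transfer(form):
    # Stage 3: a surviving presyllabic high vowel bends *a to *ɨa.
    return form.replace("pa", "pɨa") if "ɯ-" in form else form


def late_loss(form):
    # Stage 4: any remaining presyllabic vowel is lost.
    return form.replace("ɯ-", "")


def aspirate(form):
    # Stage 5: *kp- becomes *pʰ-.
    return form.replace("kp", "pʰ")


def pharyngealize(form):
    # Stage 6: only the variant without a high vowel is pharyngealized.
    return form if "ɨ" in form else form.replace("pʰ", "pʰˁ")


def derive_spread(form, lose_early):
    """Return the form at Stages 1 through 6."""
    history = [form]
    if lose_early:
        form = early_loss(form)
    history.append(form)
    for change in (vocalic_transfer, late_loss, aspirate, pharyngealize):
        form = change(form)
        history.append(form)
    return history


print(derive_spread("kɯ-pa", lose_early=False))
print(derive_spread("kɯ-pa", lose_early=True))
```

Both runs start from the same Stage 1 form; the only difference is whether the presyllabic vowel survives long enough to condition *ɨa.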

I forgot about the use of 布 *pa-s 'to spread out' to write 'cloth' (a borrowing from an Austroasiatic language: cf. Katu [Kantu dialect] kapaːs 'cotton', Kuy kpah 'cloth', and Sanskrit kārpāsa- 'cotton', also an AA borrowing) which fits my hypothesis of an earlier *k- in 'to spread out', a native word that happened to sound like 'cloth'. The *k-p-word was later reborrowed with disyllabic spellings:

幏布 *kæh-pɑh 'cotton' (c. 100 AD); is the first *-h for foreign *-r-, or was this spelling coined by someone who still had *kr- in 幏: *krɑh-pɑh?

古貝 *kɔˀ-pɑɕ 'cotton' (c. 430 AD)

See Schuessler (2007: 173) for further discussion, though he does not reconstruct *k- in the Old Chinese words for 'cloth' or 'to spread out'.

¹There is no Chinese word 攵 'action with hand'; the gloss refers to the use of 攵 *(r-)pʰok 'to beat' as a component in other characters. (The word 'to beat' is more commonly written 撲 which is not a component in other characters.)

DID SAEK SHIFT *Z- UNDER VIETNAMESE INFLUENCE?

Last night I stumbled upon this passage in Pittayaporn (2009: 296):

In Saek, *z- became /j-/ merging with PT *ˀj-, probably due to influence from North-Central Vietnamese, where original *z- has become /j-/ (Alves 2007).

Northern Vietnamese has /z/ corresponding to /j/ in central and southern Vietnamese. I think Saek would be or would have been in contact with central Vietnamese. (It's not clear if there are Saek villages in Vietnam anymore.)

One might conclude that the north preserves a /z/ that became /j/ elsewhere. This would then be parallel with Saek. But I am not sure that is the case. Here are the data:

Old Vietnamese
*kj-, *-C-
*j-, *-T-
*r-, *-s-
Middle Vietnamese spelling
Northern Vietnamese
Nonnorthern Vietnamese

By 'northern' I mean Hanoi and Vinh (the latter is north central); 'nonnorthern' refers to Huế (at the center) and Saigon. (I don't want to say 'south' because Huế is certainly not in the south.)

Capital letters stand for obstruents with unspecified voicing: e.g., *C could be voiceless *c or voiced *ɟ.

Hyphens before consonants indicate the presence of an unspecified presyllable: e.g., *-C- represents *c or voiced *ɟ preceded by a presyllable.

Exactly what the Middle Vietnamese spellings gi- d- r- stood for is not certain. I can only say that none of those three consonants were /z-/ or /j-/. I think it's possible that gi- and d- became /j-/ without a *z-phase. But maybe Saek is evidence for such a phase.

Or is it? The /z-/ of Vietnamese postdates the 17th century and long postdates the devoicing of original *voiced obstruents (possibly by the late first millennium AD). On the other hand, Saek *z- is original. Did Saek have *z- and a full set of voiced obstruents as late as the 18th century - almost a thousand years after Vietnamese devoiced its voiced obstruents?

6.14.2:21: I don't think what I wrote above is clear. Let me try again.

Phases of Vietnamese

Vietnamese consonants can be said to have gone through five phases which I will illustrate with hypothetical examples for simplicity:

-voc -voc
*praː *taː
*p *taː
*pʂ *taː
/zaː/ ~ /jaː/ /zaː/ ~ /jaː/ /saː/ ~ /ʂaː/

Phase 1: Early Old Vietnamese:

presyllables present

no tones

no lenition

phonemic voicing in obstruents

I am not sure Early Old Vietnamese ever had *(d)z-. It is perhaps telling that Early Middle Chinese 字 *dzɨʰ 'written character' was borrowed as *ɟɨːʰ (now chữ) rather than as †zɨːʰ which would have become †tữ. Later, Early Middle Chinese 字 *dzɨʰ became Late Middle Chinese 字 *tsɨ̣ and was borrowed again into Vietnamese; see phase 3 below.

Phase 2: Middle Old Vietnamese:

*-r- > *-r̥- after a voiceless initial

subphonemic tones conditioned by voicing before main vowel: *voiceless > unmarked ngang tone, *voiced > grave accent for huyền tone

tones conditioned by final consonants may date between phase 1 and phase 2

Phase 3: Late Old Vietnamese:

voicing (lenition) of medial obstruents: *-t- > *-d-

*-r̥- > *-ʂ-

devoicing of voiced obstruent initials

words formerly distinguished by obstruent voicing now distinguished only by tone which had become phonemic

Late Middle Chinese 字 *tsɨ̣ 'written character' (with a devoiced initial) was borrowed as *sɨ̣ː (now tự). (For simplicity I use a Vietnamese tone mark even for Late Middle Chinese.)

Phase 4: Middle Vietnamese:

presyllables lost

*Cʂ- > s- /ʂ/

Drag chain *s- > *t- > /ɗ/

Italicized forms are 17th century spellings; those spellings of consonants remain in use today. đ is /ɗ/, but the phonetic value of d is uncertain. [d] is the simplest interpretation, but [dʲ] and [ð] are also possible.

Phase 5: Modern Vietnamese: different reflexes of Middle Vietnamese s and d depending on dialect. s lost retroflexion in Hanoi (but not in Vinh which has /z/ like Hanoi and unlike the nonnorthern dialects; Thompson 1987: 98). The picture for d is less clear. Two scenarios:

Scenario 1. All dialects shifted d to /z/, and nonnorthern dialects shifted /z/ to /j/


Scenario 2. d shifted in different ways; no shared /z/-phase


There is no doubt that Proto-Tai *z- became /j-/ in Saek. The question is whether that shift in Saek reflects the influence of Vietnamese given scenario 1. Let's suppose scenario 1 is true. Phase 4 is in the 17th century and phase 5b perhaps starts in the middle 19th century. (The last traces of Middle Vietnamese consonantism seem to disappear after the early 19th century.) So the Saek change would have to be dated between the 17th and 19th centuries. But if the Saek change were that recent, Saek would have had *z- - and presumably other Proto-Tai voiced obstruents such as *g *d *b- - as late as the 17th or even 18th century. That doesn't seem likely given that its neighbor Vietnamese had undergone devoicing prior to borrowing from Late Middle Chinese during phase 3 (circa the 10th century).

Phases of Saek

Saek has gone through some of the same changes as Vietnamese up to phase 3, though the details differ:

*praː *taː
*pər *pr̥aː *taː
*pdaː *praː *pʰraː
pr raː pʰraː taː àː saː

Phase 1: Proto-Tai:

presyllables present (rewritten here as *Cə- instead of as *C.- as in Pittayaporn's notation)

no tones

no lenition

phonemic voicing in obstruents

Phase 2:
drag chain shift: *-t- > *-d- > *-r-; contrast with Vietnamese phase 3 in which *-t- > *-d-

Phase 3:

loss of presyllabic vowels

*pər- > *pr-; *pr̥- > *pʰr-

subphonemic tones determined by initial consonant (including presyllabic consonants unlike Vietnamese) after lenition (again, unlike Vietnamese)

To facilitate comparison with Vietnamese, I use Vietnamese tone notation: zero for tone A1 and a grave accent for tone A2.

Tones conditioned by final consonants may have developed between phase 1 and phase 3.

Phase 4:

drag chain shift: *pd- > *pr- > r-, *d- > tʰ-, *z- > j-

words formerly distinguished by initial voicing now distinguished by tone which has become phonemic

My guess is that lenition and devoicing happened independently in Vietnamese and Saek, whereas tonogenesis did not - Vietnamese phase 3 and Saek phase 3 may have been simultaneous.

Phases of Cao Bang

On 6.11, I thought Saek having *z- and other voiced consonants as late as the 18th century was improbable, but Tai languages on the Sino-Vietnamese border never underwent devoicing (Pittayaporn 2009: 110). Compare the phases of Cao Bang with those of Vietnamese and Saek:

*pdaː *p
*p *pdàː *pʂ
dàː pʰj
taː àː

Phase 1: Proto-Tai: same as Saek phase 1

Phase 2:

loss of presyllabic vowels

*-r- > *-r̥- after a voiceless initial (as in Vietnamese and Saek)

Phase 3:

Chain shift: *pt- > *pr̥- > *pʂ-

subphonemic tones determined by voicing of consonant before vowel (contrast with Saek)

To facilitate comparison with Vietnamese, I use Vietnamese tone notation: zero for tone A1 and a grave accent for tone A2.

Tones conditioned by final consonants may have developed between phase 1 and phase 3.

Phase 4:

*pr̥- > *tr̥- > tʰ-

elimination of *voiceless-voiced clusters and chain shift: *pd- > *d- > dʱ-

*pʂ- > *pɕ- > pʰj-

*z- > *s- > tʰ-; *z- devoiced but this seems to be an anomaly; see my 6.13 entry; the fortition is reminiscent of Vietnamese (see Phan 2013 for examples of *s- > /tʰ/ in Vietnamese: e.g., *sit > thịt 'meat'¹) but probably occurred independently much later. Phan (2013: 65) regards fortition of fricatives as "common in Southeast Asia and should not be considered a shared innovation."

tone A2 still strongly associated with voiced initials but has become phonemic due to the devoicing of *z-

Finally, for reference:

Phases of Thai/Lao

Thai and Lao never underwent lenition; medial *-t- and *-d- remain as stops today.

*pdaː *p
* *ɗ *taː
d pʰaː
taː àː saː sàː

Phase 1: Proto-Tai: same as Saek and Cao Bang phase 1

Phase 2:

loss of presyllabic vowels

*-r- > *-r̥- after a voiceless initial (as in Vietnamese, Saek, and Cao Bang)

Phase 3: More or less represented by Thai and Lao spelling (but Lao has no <z>; *z- corresponds to ຊ <j>)

reduction of *pC- to *t- and *ɗ- (not *d-!); was there an intermediate geminate stage *tt- and *dd-?

*-r̥- > -ʰ-

subphonemic tones determined by initial consonant (including former presyllabic consonants unlike Vietnamese)

To facilitate comparison with Vietnamese, I use Vietnamese tone notation: zero for tone A1 and a grave accent for tone A2.

Tones conditioned by final consonants may have developed between phase 1 and phase 3.

Phase 4

drag chain shift: *ɗ- > d- > *tʰ-

words formerly distinguished by initial voicing now distinguished by tone which has become phonemic

The Vietnamese notation, though convenient, is misleading, as tones A1 and A2 have undergone splits and, in Thai, a merger.

The development of tones A1 and A2 in Thai and Lao

Stage 3 subphonemic tone
Stage 3 initials
*pʰ-, *s-
*ɗ-, t-
*d-, *z-
Stage 4: Thai tones
Stage 4: Vientiane Lao tones

All of the phases above are my speculations built upon the work of Gage ("Vietnamese in Mon-Khmer Perspective", 1985) and Pittayaporn (2009). The relative chronology is only approximate; some but not all changes could be reordered with the same final results.

¹The nặng tone written with a subscript dot normally indicates a *voiced initial. It is tempting to reconstruct a change *z- > /tʰ/ as in Cao Bang. But support for *z- in native words is weak. The tone may reflect a lost voiced prefix.

EMPHATIC SAND

Tonight I found the section on the Middle Korean emphatic particle za at random in Lee and Ramsey (2011: 194). The earliest attestations of it I can find in Old Korean are in two 鄉歌 hyangga:


*motʌn kəs sa

'all thing EMPH'

- 慕竹旨郎歌 (c. 700)


*hʌtʌn sa

'one EMPH'

- 禱千手觀音歌 (c. mid-8th century)
where it is spelled phonetically with Middle Chinese 沙 *ʂæ 'sand'.

It occurred to me that the 'sand' spelling of that particle¹ obviously must predate the lenition of *s to Middle Korean z.

If a *z-pronunciation had existed in Old Korean, it could have been spelled with Middle Chinese

嵯嵳𣩈㽨瘥𥰭䑘艖蒫醝䰈鹺䴾齹虘蔖䠡䣜躦𪘓 *dza

or 邪䓉耶椰瑘𥯘鎁釾𦭿𦰳斜䔑擨 *ziæ².

(There was no Middle Chinese syllable *za. This gap is not accidental. I should look into it.)

It turns out that 邪 'evil' is attested as a phonogram in Old Korean hyangga, but 俞昌均 Yu Chhang-gyun (1994: 76) interprets it as a symbol for *ra (cf. its possible Old Chinese reading *la in Schuessler 2009: 56). There have been many attempts to reconstruct the pronunciation of Old Korean. Has anyone interpreted 邪 as *sa (possibly tempted by its modern Sino-Korean reading sa) or *za? I don't have any other sets of hyangga readings on hand. Another thing to look into when I get the chance.

¹6.11.21:29: It never occurred to me to use Unicode superscript numerals for endnotes until now. No more long strings of asterisks.

It's theoretically possible that the 'sand' spelling in this text postdates the 8th century, as these poems survive in 三國遺事 Samguk yusa (1285) whose earliest surviving copy is from 1512. Even if these poems are actually from c. 700 AD, their spellings could have been altered in the centuries between then and 1512. However, I know of no other evidence pointing toward some other Ur-spelling of the emphatic particle. The 口訣 kugyŏl phonogram for *sa ~ *za is 氵 which is almost certainly an abbreviation of 沙 'sand', the most common sa-character with the left-hand component 氵 'water'. Kugyŏl manuscripts from the Koryŏ dynasty (918-1392) predate 1512; one need not worry about potential errors in their transmission.

²6.11.23:44: Nearly all of these characters are rare and therefore not likely candidates for phonograms which tended to be high-frequency characters. So one might argue that the Old Korean particle was *za but not written as such because there were no high-frequency characters with a similar reading other than 邪 *ziæ 'evil' which was already being used for *ra if Yu (1994) is correct. However, if *s had already lenited to *z in Old Korean, I would expect to see other phonogram spellings unambiguously reflecting lenition. But I know of none offhand. Although one might argue that *s lenited before other consonants, that possibility could only be confirmed if there were *(d)z-spellings of later z-words. No such spellings seem to exist.

The only *(d)z-phonograms in Yu's (1994: 75-78) catalog of phonograms in hyangga are the aforementioned 邪 *ziæ 'evil' and

齊 Middle Chinese *dzej 'equal' : Yu's Old Korean *tsjə (my *tse)

which, like 邪 *ziæ 'evil', does not represent an Old Korean syllable corresponding to a Middle Korean z-syllable. So if Old Korean already had *z-syllables, they were not written with Chinese *(d)z-characters and cannot be detected.

I could argue that in fact the dialect of Chinese known to educated Old Koreans had shifted *(d)z- to *(t)sʱ- (as in Pulleyblank's Late Middle Chinese reconstruction), so the characters above wouldn't have been appropriate for an Old Korean *za.

That Chinese dialect had a reflex of Middle Chinese *ɲ- that corresponds to z in Middle Korean Sino-Korean readings. But there was no Middle Korean Sino-Korean reading †za. So it seems Old Koreans had no good options for writing *za if they had such a syllable - and I still don't think they did.

(The questions of what that Chinese dialect's reflex of *ɲ- was and how it was borrowed into Old Korean - as *z- or as something else that became z- in Middle Korean - remain open. The simplest solution is to assume that Chinese dialect had something like the *ž- of Liao Chinese. This was borrowed into Old Korean as *z-, a consonant originally only in borrowings. Later, Middle Korean lenited *s in native words, resulting in a new /z/ that shared the fate of the old borrowed one: both /z/ soon disappeared from the Seoul dialect. [But does any Korean dialect today have a trace of /z/ in Sino-Korean words?])

THE PHONETIC VALUE OF MIDDLE KOREAN DOUBLE ZERO

In the earliest hangul texts from the 15th century, there were three circular letters.

ㅇ <Ø> : ㆁ <ŋ> : ㆀ <ØØ>

In modern hangul, ㅇ <Ø> has come to represent zero in initial position and /ŋ/ in coda position: e.g., 앙 <ØaØ> /aŋ/. Although ㅇ may appear with a short vertical line on top like ㆁ <ŋ> in some fonts, that line no longer distinguishes ㆁ <ŋ> from ㅇ <Ø>; the reading of ㅇ /ㆁ is now wholly dependent on its position within a syllabic block.

ㅇ <Ø> had two uses in the earliest hangul orthography for Late Middle Korean in the 15th century. It could represent initial /Ø/ as in the modern language and - unlike the modern language - also represented /ɣ/ in four environments:

1. between /r/ and a vowel

2. between /z/ and a vowel

3. between /j/ and a vowel

4. between /i/ and a vowel

This /ɣ/ has disappeared in the modern standard language, though traces remain in dialects: e.g., 15th century 몰애 <morØai> /morɣaj/ 'sand' corresponds to Pukchhŏng molgɛ with -g- (cf. standard morɛ).

What was ㆀ <ØØ>? Lee and Ramsey (2011: 146) regard it as another spelling of Late Middle Korean /ɣ/. But why would two letters be devised for the same sound at the very beginning of a script? A clue may lie in the limited distribution of ㆀ <ØØ> which was solely used to write forms of the passive/causative suffix ᅇᅵ<ØØi> - and in one instance, the causative suffix ᅇᅮ <ØØu> (月印釋譜 Wŏrin sŏkpo 14:14) - after /j/. If the first suffix were simply /ɣi/, why not spell it as 이 <Øi> which is the spelling after /l z/? (I don't know of any instances of that suffix after /i/. The second suffix is otherwise spelled <Øu> = /ɣu/ after /l z j/.)

Yesterday afternoon it occurred to me that ㆀ <ØØ> might represent a palatal allophone [ʝ] of /ɣ/. This allophone may have been geminated [ʝʝ] if it was like /ss/ and /hh/ which were written as double consonants ㅆ ㆅ <ss hh>. There is even one case of /nn/ as ㅥ <nn> in 訓民正音諺解 Hunmin chŏngŭm ŏnhae.

There is, however, no guarantee that a double consonant necessarily represented a geminate, as ㅆ ㆅ <ss hh> could also represent /z ɦ/ in the prescriptive transcription of Sino-Korean readings. (Native /z/ had a different letter ㅿ <z>. It might be more accurate to regard the artificial voiced consonants of Sino-Korean readings as breathy voiced: e.g., Sino-Korean ㅆ <ss> was /zʱ/ or /sʱ/ and therefore distinct from ㅿ /z/.) Doubled ㄲ ㄸ ㅃ ㅉ <kk tt pp cc> could only represent /g d b dz/ in that transcription in the earliest hangul texts; their use for reinforced consonants came later.

Moreover, the circle was used to derive consonant characters for nongeminates: e.g., /β/ was written as ㅸ. So ㆀ <ØØ> could be interpreted as 'derivative of circle' for [ʝ] rather than as 'double circle' for [ʝʝ] (or geminate zero which would make no sense).

One problem with this proposal is that it cannot easily account for the one instance of ㆀ <ØØ> in the causative suffix ᅇᅮ <ØØu>. It is understandable that /ɣ/ would palatalize to [ʝ] between /j/ and /i/ in, for instance, ᄆᆡᅇᅵ<mʌi.ØØi> /mʌjɣi/ [mʌjʝi] 'to be bound to', the passive stem of /mʌj/ 'to bind'. It is slightly less understandable why /ɣ/ would palatalize to [ʝ] between /j/ and /w/ in 뮈ᅇᅯ <mui.ØØuə> /mujɣwə/ 'moving'. (/ɣw/ is an allomorph of /ɣu/ before vowel-initial suffixes like /ə/ '-ing', called the 'infinitive' [though it is not like an Indo-European infinitive].)

Perhaps 뮈ᅇᅯ <mui.ØØuə> reflects a pronunciation [mujʝɥə] in which the palatal quality of /j/ spread into the following consonants. That pronunciation might even have been common, though for most purposes a phonemic spelling 뮈워 <mui.Øuə> for /mujɣwə/ might have sufficed instead of a more precise phonetic spelling 뮈ᅇᅯ <mui.ØØuə>. I don't know if the spelling 뮈워 <mui.Øuə> is attested, but 月印千江之曲 Wŏrin ch'ŏn'gang chi kok 62 has the spelling 뮈우 <mui.Øu> /mujɣu/ for the stem.

FRGÁL

Slavic languages normally only have [f] in loanwords and as a positional variant of /v/ (which is why Russian names in -v have variant spellings in -ff).

As far as I know (thanks to Short 1993), Czech initial [f] can only appear

- in onomatopoetic words (e.g., foukat 'to blow')

- as a positional variant of v before voiceless consonants (e.g., vsadit 'to bet', pronounced [fsadit])

- in loanwords from non-Slavic languages (e.g., fonetický 'phonetic')
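The second source in the list above, v pronounced [f] before a voiceless consonant, is regular enough to sketch in Python. This is a rough illustration only: it works on spelling rather than real phonology, the set of voiceless letters is my own approximation (c also covers the digraph ch), and `initial_v` is a hypothetical name.

```python
# Letters that spell voiceless consonants in Czech (approximation; "c" also
# catches the digraph "ch"). The letter h is left out because Czech h is voiced.
VOICELESS = set("ptťkfsšcč")


def initial_v(word):
    """Return the word with initial v respelled as f where it would devoice."""
    if len(word) > 1 and word[0] == "v" and word[1] in VOICELESS:
        return "f" + word[1:]
    return word


for w in ("vsadit", "voda", "vrgál"):
    print(w, "->", initial_v(w))
```

The last input, vrgál, is a made-up form for testing purposes: because r is voiced, the rule leaves its v- alone.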

So what is the source of the f in the dish called frgál? That f- is before a voiced syllabic r and is not a variant of v-. Is it onomatopoetic or from a foreign language - perhaps Romanian, given that frgál is from Moravian Wallachia? That region isn't contiguous with modern Romania, but it was settled by Vlachs.

SHIMUNEK (2017) AND DOWNES (2018)

Last night, I found the addenda and corrigenda to Andrew Shimunek's Languages of Ancient Southern Mongolia and North China (2017). I thought that would be as close as I'd get to having his book which I can't afford at $116.76 until I saw an online sampler.

It's remarkable that three books on Khitan have appeared in English within a decade - the other two being Daniel Kane's The Kitan Language and Script (2009) and Wu Yingzhe and Juha Janhunen's New Materials on the Khitan Small Script: A Critical Edition of Xiao Dilu and Yelü Xiangwen (2010 - just a year after Kane's book!).

Can a new book on Jurchen be far behind? It has been almost thirty years since Kane's The Sino-Jurchen Vocabulary of Interpreters (1989) which despite its title is a general gateway to Jurchen language studies as well as complementing Kiyose Gisaburō's A Study of the Jurchen Language and Script - Reconstruction and Decipherment (1977) which covered the Sino-Jurchen vocabulary of the Bureau of Translators.

Not long after Imre Galambos' Translating Chinese Tradition and Teaching Tangut Culture: Manuscripts and Printed Books from Khara-Khoto (2015) comes Alan Downes' PhD dissertation "How Does Tangut Work?" (submitted 2016, revised 2018), a follow-up to his BA honors thesis "The Xixia Writing System" (2008) - and his website which links to mine.

Alas, I haven't written about Tangut - much less Khitan or Jurchen - in a long time. If I may rephrase Downes' question, I have been trying to come up with the answer to "How Does Pyu Work?" It's coming in a series of articles and a book.

These are exciting times for the study of extinct Asian languages.

YAT AND ETA

Today I realized that my interpretation of the early Slavic vowel yat as [ɛː] (< *ai) sounded like the classical value of the Greek letter Η eta. Since Cyrillic is an offshoot of the Greek alphabet, one might expect yat to have been written with an eta-based Cyrillic letter. But of course eta was actually the model for the Cyrillic letter И <I> because eta had raised to [i] by the 4th century AD, long before Cyrillic was created in the late 9th century. [ɛː] was long gone in Greek, so a non-Greek letter was created for yat: Ѣ.

Ѣ looks like a derivative of the front yer letter Ь [ɪ] which in turn looks like a derivative of the Glagolitic front yer letter Ⱐ. But it is strange that a lower mid long vowel was written with a modified lower high short vowel rather than, say, with an additional stroke (like Czech ě which is nowadays used to transliterate yat). I don't see any resemblance between Ѣ and its Glagolitic counterpart Ⱑ.

5.14.23:24: According to Wikipedia, Schenker (1995) thought Ⱑ might be from Greek alpha Α. That makes a lot of sense if yat were [æ].

Modern reflexes of yat vary considerably in height from [ja] with a low vowel in eastern Bulgarian* to [i] in Ukrainian.

*Eastern Bulgarian has two reflexes of yat: [ja] and [ɛ]. The former is in stressed syllables not followed by front vowels. The latter occurs elsewhere.

PROTO-CELTIC VOICED ASPIRATES?

I've seen this Proto-Celtic word list before, but I didn't notice voiced aspirates in it until now:

*mori-steigh-(e/o-) 'sea'

*men-n-dh-e/o- (?) 'want'*

*ati-od-bher-to- (?) 'sacrifice'

Are those pre-Proto-Celtic forms? I thought Proto-Celtic lost aspiration in voiced consonants:

Proto-Indo-European *gh *dh *bh > Proto-Celtic *g *d *b
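To see what the three forms would look like if that rule were applied mechanically, here is a small Python sketch. It is not a claim about the correct Proto-Celtic reconstructions: it only strips the h from gh, dh, bh in the spellings quoted above, and `deaspirate` is a hypothetical name.

```python
def deaspirate(form):
    """Apply *gh *dh *bh > *g *d *b to a romanized reconstruction."""
    for aspirate, plain in (("gh", "g"), ("dh", "d"), ("bh", "b")):
        form = form.replace(aspirate, plain)
    return form


for form in ("*mori-steigh-(e/o-)", "*men-n-dh-e/o-", "*ati-od-bher-to-"):
    print(form, ">", deaspirate(form))
```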

*5.14.0:42: This reminds me of Avestan mazdā- 'wisdom' < *mn̥s-dheʔ 'mind-place', though the first root is in the e-grade in Celtic.

CHU AND KRA-DAI (PART 2)

Here's my attempt to reconstruct the Old Chinese (OC) phonetic series of 楚 (Schuessler 2009 series 1-62, Karlgren 1957's series 88 plus 90) to make it fit Chamberlain's (2016) hypothesis from part 1.

The series has five types of Early Middle Chinese (EMC) readings (ignoring final consonants):

I. *sɨə- < *Cɯ-sa- (*kɯ-sa-?) (胥湑稰諝糈壻婿)

II. *ʂɨə- < *kɯ-sa- (疋疏蔬梳糈)

III. *tʂʰɨə- < *kʂʰɨa- < *kɯ-sa- (楚 only)

IV. *ŋæ- < *ŋgʐa- < *N-k-sa- (alternate reading of 疋 only)

V. *sej < *se (alternate reading of 壻婿 only)

The high-vowel presyllables of types I-III conditioned medial *-ɨ- which in turn conditioned the raising of *a to *ə.

The high-vowel presyllables of type I were lost after conditioning medial *-ɨ-, but they fused with *s in types II and III. *kɯ-s- that fused early became EMC *ʂ- via *kʂ-; *kɯ-s- that fused late became EMC *tʂʰɨə- via *kʂʰ-.

Type III *kʂʰɨaʔ might have approximated an early Kra-Dai *kraʔ, especially if it were phonetically something like [kʁaʔ].

(5.12.0:56: Or if 'Kra' were [kʐaʔ]. Cf. Polish krz [kʂ] from *kʐ- < *krʲ-. Pittayaporn 2009: 99 reconstructed *ks- as a Proto-Tai source of Proto-Southwestern Tai [and hence Siamese] *kʰr-, though he does not list any examples of Proto-Tai *ks-, and he reconstructed the Proto-Tai cognate of 'Kra' as *kraː C 'slave' with *kr- rather than *ks-. Siamese kʰaː C1 'slave' lacks the -r- that would point to medial *-s-. If *ks- became Siamese kʰr-, perhaps *kz- became *kr- and then Siamese kʰ-.)

*N- fused with *k- to form the *ŋ- of type IV.

(5.12.0:11: OC *a fronted to *æ after retroflexes.)

The *-e rhyme of type V is anomalous and unique to 壻~婿 'son-in-law'; it cannot be reconciled with the *-a rhyme of the other types.

5.12.1:03: Added all examples of each type listed in (Schuessler 2009: 59) plus 疋 as the sole example of type IV which was not listed in Schuessler.

CHU AND KRA-DAI (PART 1)

Chamberlain (2016) proposed that the name of the state now known as 楚 Chǔ in Mandarin is the same name as Kra as in Kra-Dai. This is an ingenious idea. But does it really work?

The rhymes certainly match. 楚 ended in *-aʔ in Old Chinese, and 'Kra' in Proto-Kra-Dai was something like *kraʔ (cf. Ostapirat's Proto-Kra *kra C 'Kra' and Pittayaporn's Proto-Tai *kraː C 'slave'; I interpret the C tone category as *-ʔ like Norquest 2016).

The trouble is the initial. If 楚 had initial *kr- in Old Chinese, it would have become Early Middle Chinese †kæʔ and Mandarin †jiǎ. But instead it became Early Middle Chinese *tʂʰɨəʔ and Mandarin chǔ [tʂʰu] with aspirated retroflex initials.

Can those initials be reconciled?

Pulleyblank (1962: 129) proposed that Old Chinese *skʰ- might have become Early Middle Chinese *tʂʰ-. Later, Pulleyblank (1965: 206) proposed Old Chinese *kʰs- as a source of Early Middle Chinese *tʂʰ-. But there is no *s in Proto-Kra-Dai *kraʔ. *s- is likely to have been in the Old Chinese reading of 楚 since nearly all readings of characters in the 疋 phonetic series began with *ʂ- or *s- in Early Middle Chinese. There is no evidence on the Chinese side directly pointing to *k- in 楚 or any other member of the 疋 phonetic series, though 疋 does have another Early Middle Chinese reading *ŋæʔ which could mechanically be derived from an Old Chinese *ŋraʔ - close to *kraʔ but with a velar nasal rather than a stop.

Next: How can I make Chamberlain's idea work?

5.11.11:56: Added reference to Pulleyblank (1965) and link to Pulleyblank (1962).

ARMENIAN, KOREAN, AND BURMESE APPROACHES TO KHITAN OBSTRUENTS

In my last entry, I wrote,

the Khitan transcribed Liao Chinese *t as both <t> and <d>

There are similar inconsistencies with other obstruents and to a lesser extent even in the spelling of native Khitan words: e.g., 'second' is spelled with both 162 <c> and 104 <dz> (Kane 2009: 115).

I originally thought that Liao Chinese and Khitan had different obstruent systems: e.g., LC had an unaspirated : aspirated distinction whereas Khitan had a voicing distinction. But that wouldn't explain the inconsistency in Khitan native words.

Today it occurred to me that Khitan might have had Armenian-style variation:

The major phonetic difference between dialects is in the reflexes of Classical Armenian voice-onset time. The seven dialect types have the following correspondences, illustrated with the t–d series:

Correspondence in initial position

Indo-European *d
*dʰ *t
Erevan t
Istanbul d
Kharpert, Middle Armenian d
Malatya, SWA
Classical Armenian, Agulis, SEA t
Van, Artsakh t

But of course Khitan had only two obstruent series, not three.

Might the use of certain spellings correlate with certain locations and/or time periods? They would then reflect the obstruent series of different regional/chronological varieties of Khitan. The unspoken assumption of Khitan studies is that the language was homogeneous over a wide area for a long period, but that is unlikely.

Another possibility is that Khitan was like modern Korean in which unaspirated obstruents have voiced and voiceless allophones conditioned by different environments: Sino-Korean 德 /tək/ appears as

[dək] after a sonorant

[tək] elsewhere

Could 254.020 <d.ei> ~ 247.020 <t.ei> transcribing Liao Chinese 德 (Kane 2009: 253) have had a similar distribution?
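The Korean-style rule above can be written as a tiny function. This is only a sketch: the sonorant set, the stop-to-voiced mapping, and the name surface_form are my own illustrative assumptions, and nothing here is a claim about actual Khitan phonology.

```python
# Toy inventory; these sets are illustrative assumptions, not a full
# description of Korean (or Khitan) phonology.
SONORANTS = set("aeiouəmnŋlr")
VOICED = {"p": "b", "t": "d", "k": "g"}


def surface_form(syllable: str, preceding: str = "") -> str:
    """Voice an initial plain stop if the preceding segment is a sonorant."""
    after_sonorant = bool(preceding) and preceding[-1] in SONORANTS
    if syllable and after_sonorant and syllable[0] in VOICED:
        return VOICED[syllable[0]] + syllable[1:]
    return syllable


print(surface_form("tək", preceding="n"))  # after a sonorant: dək
print(surface_form("tək"))                 # elsewhere: tək
```

If Khitan worked this way, <d> spellings should cluster after sonorant-final syllables and <t> spellings elsewhere, which is something that could be counted in the inscriptions.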

A final possibility is that Khitan was like Burmese in which etymological voiceless consonants may be voiced in close juncture. Wheatley (2009: 729) explains that in Burmese,

[c]lose juncture is characteristic of certain grammatical environments [...] But within compounds the degree of juncture between syllables is unpredictable; the constituents of disyllabic compound nouns (other than recent loanwords) tend to be closely linked, but compound verbs vary, some with open, some with close juncture.

The above possibilities are not mutually exclusive for Khitan.

THE KHITAN EMPEROR SHENGZONG IN UNICODE

Today I discovered that lookalikes for all four Khitan large script characters for 聖宗皇帝 'Emperor Shengzong' (r. 982-1031) exist in Unicode:


Of course it's only the first two characters that are interesting; they are unknown to nearly everyone literate in Chinese. The last two are identical to Chinese 皇帝 'emperor'.

'Emperor Shengzong' exemplifies how the Khitan large script to a Chinese eye is a mix of familiar and alien elements. The first two characters combine familiar elements

夕 'evening' + 卞 'hat' = 𫝢

亻 'person' + 及 'to reach' = 伋

in unfamiliar ways.

𫝢 turns out to be a variant of 升 'to rise', which in turn was a homophone of 聖 *šiŋ 'sage' in Liao Chinese aside from its tone. 𫝢/升 and 聖 were not homophones until the late first millennium AD, so the use of 𫝢 for 'sage' may date from the Liao dynasty and is probably not a carryover from the pre-Liao Parhae script hypothesized by Janhunen. Why didn't the Khitan simply recycle 聖 'sage' the way they recycled 皇帝 'emperor'? Was 聖 'sage' too complex for the Khitan large script which favored a low number of strokes per character?

In Chinese, 伋 is a name character of no known meaning. (It is the birth name of Confucius' grandson 子思 Zisi.) It would have been pronounced *ki in Liao Chinese and not 宗 *tsuŋ like 'ancestor'. So the reasoning for 伋 as 'ancestor' is unclear (though at least the 亻 'person' radical makes sense). Might a Khitan or even a Parhae word for 'ancestor' have sounded something like *ki?

(5.9.9:39, revised 14:16: Was 伋 a semantic compound invented by someone who might not have known about the rare character 伋? But I know of no semantic compounds unique to the Khitan large script. The closest instance I can think of is


which consists of 天 'heaven' over 土 'earth'. It is not a true semantic compound because it does not represent a word for 'heaven and earth' or 'world' (the sum of 'heaven and earth'); 土 'earth' seems to disambiguate an unknown Khitan word for 'heaven' from 天 for <tên>, a borrowing from Liao Chinese. The semantic function, if any, of 及 'to reach' in 伋 'ancestor' is less clear.

The Dictionary of Chinese Character Variants has no 伋-like variants of 宗. What I will call Janhunen's Question remains unanswered: If the Khitan wanted a script to distinguish themselves from the Chinese, why did they keep or replace characters seemingly at random? I still think the only possible answer is that they didn't do that - rather, they adapted a sister script of Chinese [Janhunen's hypothetical Parhae script]. The situation is somewhat parallel to that of Cyrillic which is related to the Latin alphabet but not derived from it; they are 'cousins', not 'daughter' and 'mother'.)

Although the shapes of 皇帝 'emperor' are uninteresting, the question of how we know their readings is worth examining. Kane (2009) reads them as <hoŋ di> (= <ghong di> in the transcription system on this site).

However, I have not found any Khitan small script phonetic spelling of the first half of 皇帝 'emperor' or any of its homophones in Chinese. I would expect such a spelling to be 340.071 <> with voiceless 340 <h> rather than voiced-initial 076 <gho>. (There is no known small script character <gh> without a vowel, and devoiced to *x in Liao Chinese.) No spelling <> is in Qidan xiaozi yanjiu (1985: 460). Has such a spelling been found in the thirty-plus years since the publication of that book?

Kane (2009: 244) lists 247.339.339 <t.i.i> as a small script spelling of the second half of 皇帝 'emperor'. Unfortunately, he does not cite a source for this spelling, and it is not in Qidan xiaozi yanjiu (1985: 375). I presume <t.i.i> is from an inscription discovered after Qidan xiaozi yanjiu was written. The <t> of <t.i.i> does not necessarily invalidate Kane's reading di for 帝 since the Khitan transcribed Liao Chinese *t as both <t> and <d>, and they transcribed Liao Chinese *i as both <i> and <i.i>.

5.9.0:33: Why is the name character 伋 glossed in English as 'deceptive' at

5.9.0:49: Kane (2009: 181) also lists a second Khitan large script character ⿰歹卞 for 聖 'sage' with 歹 'bad' on the left instead of 夕 'evening' from Liu and Wang (2004: 27, character 150). That character has no Unicode lookalike; it is character 0177 in N4631 ("Proposal on Encoding Khitan Large Script in UCS") which does not seem to list 𫝢 from Kane (2009: 183). Where is 𫝢 attested? Regardless of whether 𫝢 is an error for ⿰歹卞 and hence not a real Khitan large script character, I have no doubt that ⿰歹卞 is a variant of the Chinese character 𫝢 and is a phonetic loan for 聖 'sage'.

I also think that 𫝢 / ⿰歹卞 <shing> may have been the inspiration for the vaguely similar Tangut character


2shen3 'sage'

whose Tangraphic Sea analysis has been lost.

5.9.22:31: Are Khitan large script characters

1054 (升 + a dot on the right)

1056 (1054 with the first stroke 丿 stretching over both vertical strokes of 廾 plus a dot on the right)

in N4631 further variants of 𫝢 / ⿰歹卞 <shing>?

5.10.1:49: Chinggeltei's 關於契丹文字的特點 (1997: 110) includes 𫝢 in its list of Khitan large script characters.

OBLIQUE AFFRICATES IN CHINESE

Today on Wikipedia I saw that standard Mandarin 斜 xié [ɕjɛ] 'oblique' corresponded to Lower Yangtze Mandarin

colloquial [tɕia]

literary [tɕiɪ]

with affricate initials. The colloquial reading preserves an earlier -a going all the way back to Old Chinese; the literary reading has an innovative raised vowel [ɪ].

The dictionary Middle Chinese initial is *z-. Other dialects of Middle Chinese might have had *dz-. In any case, the Old Chinese word began with *sɯ-, though what was between that *sɯ- and *-a is not clear: *sɯ.ɢa, *sɯ.ja, and *sɯ.la are all possible. There is no known external comparison that could narrow down the possibilities. The character 斜 has the phonetic 余 *Cɯ.la, but the character 斜 dates from Han times, and at that point *ɢ, *j, and *l might have already merged into *j. (邪 'slant' - a homophone of 斜 in Middle Chinese - may be a pre-Han spelling of the same word. But its phonetic 牙 has a velar nasal initial *ŋ-!)

My hypothetical Middle Chinese *dz- might be from *sɯ.ɢ- > *s.ɢ- > *zɢ- > *zd- > *dz-. But it's more likely that it results from a Late Old Chinese or Middle Chinese confusion of *z- with *dz-. Japanese merged *z- and *dz- into /z/ which is now [dz] initially, [z] medially, and [ddz] when geminated.

Xiaoxuetang reports affricate initials in 斜 in

Mandarin: 天長 Tianchang [tsʰ] (the sole Mandarin example on the site)

Wu: 丹陽 Danyang [dʑiɑ] ~ [dʑiɒ], etc.

(Hui: no data; NB: this 徽 Hui is not the Mandarin-speaking Muslim 回 Hui, whose name is pronounced with a different tone)

Gan: 湖口 Hukou [dʑia], etc.

Xiang: 雙峰 Shuangfeng [dʑio], etc.

Min: 廈門 Amoy [tsʰia] (colloquial; literary [sia]), etc.

Yue: Cantonese [tsʰɛ] (where long ago I first observed this affricate initial corresponding to Middle Chinese *z-; I didn't know such an initial was in Mandarin too)

Ping: 永福 Yongfu [tsʰiə], etc.

Hakka: 梅縣 Meixian [tsʰia] (colloquial; literary [sia]), etc.

The affricate initial is represented in nearly every branch. No Jin variety on that website has an affricate reading. But all but one of the unclassified varieties have an affricate initial.

It seems that literary varieties of Middle Chinese kept *z- (> modern [s]) apart from *dz- while colloquial varieties merged them to various extents.

5.8.13:40: For comparison, let's see if the above dialects also have affricates for Middle Chinese 徐 *zɨə 'to walk slowly; a surname':

Mandarin: 天長 Tianchang [tʃʰʮ], etc.

Wu: 丹陽 Danyang [dʑyz] (sic), etc.

Hui: 旌德 Jingde [tsʰʮ], etc.

Gan: 湖口 Hukou [dzi], etc.

Xiang: 雙峰 Shuangfeng [dy] (sic) ~ [dʑy], etc.

Min: 廈門 Amoy [tsʰi] (colloquial; literary [su]), etc.

Yue: Cantonese [tsʰœy], etc.

Ping: 永福 Yongfu [tsʰy], etc.

Hakka: 梅縣 Meixian [tsʰi], etc.

The only Jin variety with a reading is the most well-known: 太原 Taiyuan [ɕy]. 徐 is a common surname, so it must be in other Jin varieties. The absence of affricates in Jin readings of 斜 'oblique' makes me guess that 徐 also lacks affricates in the rest of Jin, but I don't know.

The unclassified varieties have a mix of initials: e.g.,

富川 Fuchuan [sy]

鍾山 Zhongshan [θy]

賀州 Hezhou [ty] (cf. the stop [d] in Shuangfeng above)

道縣 Daoxian [tso]

連州 Lianzhou [tsʰɛu]

To work out what's going on with them would require studies of their individual phonologies. It is a shame that Xiaoxuetang doesn't seem to have initial, rhyme, and tonal inventories online for each variety. In theory I could extract inventories from the data, but I don't have the time to do that right now.

HAVE A ČĪZBURGERU: ENGLISH BORROWINGS IN LATVIAN

After mentioning Latvian datums last time with its combination of a Latin neuter suffix -um and a Latvian masculine suffix -s, I was curious to see how Baltic languages dealt with a recent influx of English loans. Baltic languages and Greek are the only modern Indo-European languages I know of that still retain ancient -s suffixes in the nominative case.

I guessed that all Latvian borrowings of English consonant-final stems would be placed in the first masculine declension like datums. And it does seem that is generally the case. See these two lists. Even sibilant-final stems are assigned to that declension: e.g., bizness (which is biznes-s and not copying the -ss of the English spelling) and finišs (< finish + -s). I might have expected them to be assigned to the second declension with -is or the third declension with -us.

The exceptions I've seen so far end in -er in English:

adapteris < adapter

menedžeris < manager

peidžeris < pager

porteris < porter

taimeris < timer

Were they assigned to the second declension by analogy with some earlier wave of -eris loans?
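The pattern so far can be summed up as a rule of thumb: consonant-final stems take -s, but stems in -er take -is. Here is a minimal sketch of that rule; the function name latvian_nominative and its exceptions parameter are my own hypothetical additions (the latter for any -er word that turns out not to follow the pattern), and the input is assumed to be a stem already respelled in Latvian orthography.

```python
def latvian_nominative(stem: str, exceptions: frozenset = frozenset()) -> str:
    """Guess the nominative singular of a consonant-final English loan stem.

    `stem` is assumed to be already respelled in Latvian orthography.
    `exceptions` is a hypothetical set of -er stems that take -s anyway.
    """
    if stem.endswith("er") and stem not in exceptions:
        return stem + "is"  # second declension: adapter -> adapteris
    return stem + "s"       # first declension: datum -> datums


for stem in ["datum", "biznes", "finiš", "adapter", "menedžer", "taimer"]:
    print(stem, "->", latvian_nominative(stem))
```

The rule is descriptive only; it says nothing about why -er stems ended up in the second declension.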

Not all English -er words become -eris words in Latvian: cheeseburger has become čīzburgers (with an un-English pronunciation of burger with [u] - †čīzberger would have been closer to the English original). Maybe -burger is by analogy with hamburgers, perhaps in turn influenced by Russian <gamburger>, also with [u]? No, maybe -burger is simply based on a spelling pronunciation.

THE GENDER OF 'DATE' IN BALTO-SLAVIC AND ROMANCE

On the same Wiktionary page as Dutch datum 'date' (masculine despite its Latin neuter ending -um!) are

Czech datum (neuter); cf. Slovak dátum (masculine; why a long á that doesn't match Czech or Latin?; its neighbor Hungarian dátum also has a long vowel)

Serbo-Croatian and Slovene datum (masculine)

Macedonian <datum> is also masculine. The shift to masculine in Slavic is understandable since consonant-final nouns are generally masculine, and Latin -um is not a Slavic suffix and hence prone to reinterpretation as the ending of a stem.

Leaving Slavic, Latvian has no neuter, and its feminine stems generally end in vowels, so masculine datums is also understandable.

However, Latvian's sister Lithuanian has feminine data (which looks like the Latin plural!) rather than masculine †datumas (see Wikipedia on Lithuanian declension).

And going back to Slavic, Polish also has feminine data, and Bulgarian, Macedonian, Belarusian, Russian, and Ukrainian have feminine <data>. Romance languages have feminine data (French date and Romanian dată) too. Wiktionary derives the Romance forms from a Late Latin data. fdb explains:

Italian, Spanish, Portuguese (etc.) data, and French date (whence English date) are all taken from Mediaeval Latin data, the plural of classical Latin datum, but reinterpreted in these languages as a singular noun. German and Dutch use the classical singular form datum.

All of these are bookish borrowings from Mediaeval or Classical Latin (so-called cultisms) and not organic descendants of the Latin words.

[Someone asks what organic descendants would look like.]

In that case one would expect *dada in Spanish, Portuguese and Italian.

Are the -um forms in Slavic and Latvian borrowings from German Datum?

5.6.0:01: English date then got borrowed into German as das Date which is presumably neuter by analogy with Datum.

5.6.0:09: Added quotation from fdb.

5.6.0:28: Danish date from English has common gender (cf. German above).

5.6.0:32: Added Romanian dată.

THE GENDER OF DUTCH '-ISM'S AND 'DATE'

Not in time for May Day ...

French communisme is masculine, as is its Latinized German equivalent Kommunismus with a restored Latin masculine nominative singular ending -us. So why is Dutch communisme (and other -isme words like socialisme) neuter?

Conversely, datum has a Latin neuter nominative singular ending -um and is still neuter in German. So why is Dutch datum masculine unlike, say, neuter museum which is still neuter in Dutch?

Are the genders by analogy with semantically similar words? Was there ever a time when de communisme and het datum were acceptable?

5.5.0:33: Google Books has examples of het datum from the 18th and 19th centuries. But I can't find any examples of de communisme in Dutch (as opposed to French where that is a preposition-noun sequence rather than a definite article-noun sequence).

Treffers-Daller (1994: 140) discusses French-Dutch gender mismatches and mentions Van Marle's hypothesis that French borrowings are marked and may receive the marked gender: the less frequent neuter gender (only 25% of Dutch nouns are neuter according to Tuinman 1967).

She also writes,

According to Volland (1986), many French loans obtain neuter gender when borrowed into German. About 60 percent of the borrowings keep the original gender in German, and 40 percent are allocated another gender. In most cases it is the masculine nouns who become neuter in German. It is remarkable that the same tendency for masculine words to become neuter exists in German and in Dutch.

Obviously Kommunismus is not one of those masculine words (though its -us may have made it resistant to gender shift).

CZECH VOWEL ASYMMETRY AGAIN

Judging from the IPA for Czech at Wikipedia, Czech vowels are phonetically as well as distributionally asymmetrical:

/iː/ [iː]

/u uː/ [u uː]
/i/ [ɪ]

/o oː/ [o oː]
/e eː/ [ɛ ɛː]

/a aː/ [a aː]

The front part of the system 'tilts downward' with the exception of /iː/ which is high.

Short /i/ is lower than long /iː/ and has no back counterpart at the same height.

/e eː/ are lower than /o oː/.

How did this system come about? /i iː/ are from earlier front *i *iː and central *ɨ *ɨː.

Was there a Ukrainian-like phase in which the central high vowels became *ɪ *ɪː? (Ukrainian has no phonemic vowel length, though.) The four front vowels in stage 2 then merged into an English-like subsystem with a higher long vowel and a lower short vowel in stage 3:

Stage 1

Stage 2

Stage 3

Unlike Czech, Slovak is next door to Ukrainian, and according to the IPA at Wikipedia it has no [ɪ]; its vowel system is truly symmetrical on the phonetic level if one ignores the increasingly marginal vowel [æ]:

[i iː]

[u uː]
[e eː]
[o oː]

[a aː]

The Slovak phonology article at Wikipedia, however, paints a more complex picture: e.g., /e eː/ [e̞ e̞ː] may be phonetically higher than /o oː/ [ɔ̝ ɔ̝ː] - the reverse of Czech. (Did the presence of low [æ] - a vowel absent from Czech - incentivize speakers to raise /e eː/ for greater contrast during its heyday in the past?) Nonetheless it seems that length is not correlated with height differences unlike Czech where short and long /i/ have different heights.

Like Czech /i iː/, Slovak /i iː/ are from earlier front *i *iː and central *ɨ *ɨː. So I suspect Slovak also had a Ukrainian-like phase in which the central high vowels became *ɪ *ɪː.

But maybe at some earlier point Czech and/or Slovak had a Rusyn-like stage in which central *ɨ *ɨː coexisted with front *ɪ *ɪː. I still don't understand how Rusyn can have both central /ɨ/ and front /ɪ/ since I assume both are from *ɨ. Are they in complementary distribution? Is one native and one borrowed?

5.4.0:40: Are Czech /e eː/ lower mid because they merged with */ě/ *[ɛː]? */ě/ was historically long, but its reflexes in Czech are both long and short for reasons I don't understand:

*bělъjь > bílý /biːliː/ 'white'

*světъ > svět /svjet/ 'world'

The short reflex is /e/ which may be preceded by a secondary palatal consonant: e.g., /j/ in the case of /svjet/.

CZECH VOWEL ASYMMETRY

Having written about Slavic and vowels in my last two entries, I'm going to combine the two topics together.

The standard Czech vowel system appears symmetrical if one only looks at vowels in isolation. Each short vowel has a long counterpart:


And the diphthongs form a triangle:




But distribution tells a more complex story.

Original *uː became /ou/ except "chiefly in noun prefixes" (Short 1993: 456): e.g., úraz 'injury' but urazit 'to injure'. Why was the prefix *u lengthened to an *uː later preserved in nouns? I still don't understand the backstory of length in Slavic.

Original *oː became uo and then a new /uː/ written <ů> (which I think of as <o> atop <u>); cf. Polish <ó> /u/ and Slovak <ô> /uo/ from earlier *oː. (I'd like to see a chronology of *oː-shifts in West Slavic.)

Loanwords supplied a new /oː/ and /au eu/ to balance /ou/.

Those back vowel developments did not have exact front vowel parallels. *iː did not become †/ei/ (though Short 1993: 464 reports ý /ɨː/ > /ej/ in colloquial Czech), and *eː only sometimes became /iː/ (Short 1993: 464).

INDEPENDENT VOWEL SYMBOLS IN THE INDIC SCRIPTS OF THE PHILIPPINES

Indic scripts typically have two kinds of vowel symbols:

- dependent vowel symbols attached to/in 'orbit' around consonant symbols

- independent vowel symbols

Depending on the script, vowels may be written with dependent vowel symbols plus a carrier <°a>, independent vowel symbols, or a mix of the two.

The Indic scripts of the Philippines generally only have three independent vowel symbols each, and on closer observation, some of those symbols are derived from others:

Baybayin for Tagalog on central Luzon in the north: three truly independent symbols <°a °i °u>

Hanunoo on southern Mindoro in the center: independent <°a °u>; <°i> looks like <°a> plus a stroke on the bottom right (unlike either the dependent vowel <i> on the top or the dependent vowel <u> on the bottom)

Buhid on southern Mindoro in the center: independent <°a °u>; <°i> looks like <°a> plus a stroke on the bottom (like the dependent vowel <u> rather than the dependent vowel <i> on the top)

Tagbanwa on Palawan in the southwest: <°a °i> have the same basic shape with different extra strokes: one on the bottom for <°a> and another on top for <°i> (neither stroke matches the dependent vowel <u> on the bottom or the dependent vowel <i> on the top); only <°u> is not derived from another symbol

Kulitan for Kapampangan on central Luzon in the north: independent <°i °u>; <°a> looks like <°u> plus an extra stroke on the bottom left (unlike the dependent vowel <u> on the bottom); <°e °o> look like <°a°i> and <°a°u>, reflecting their apparent origin as "monophthongized diphthongs".

Tagalog is the most conservative; it alone preserves three completely different vowel symbols that still resemble their Indic prototypes.

It is not surprising that the Mindoro scripts have the same innovation (replacing <°i> with a <°a>-derivative).

Tagbanwa and Kulitan seem to have each gone their own way. Tagbanwa is isolated by the sea, but Kulitan is next door to Baybayin.

WHAT HAPPENED TO UKRAINIAN NOMINATIVE PLURAL ADJECTIVES?

I almost 'corrected' Ukrainian <zorjani> 'stellar (nom. pl.)' to †<zoryany> with a <y> ending that I expected by analogy with Russian <ye> and Belarusian <yja> < *-ye after 'hard' (nonpalatalized) stems. But the nominative plural ending is <i> regardless of stem type. Compare:

stem type
'soft' (palatalized)
m. nom. sg.
nom. pl.
m. nom. sg.
nom. pl.

Did <i> spread by analogy through all adjective paradigms despite the fact that hard stems outnumber soft stems (which would have led me to guess that <y> would win out)? Did the higher frequency and lower markedness of <i> in Ukrainian help it to defeat its less palatal competitor <y>?

5.1.0:07: Added table.

5.1.22:22: Maybe Ukrainian shares an areal feature with Polish which has soft nowi 'new (m. pers. nom. pl.)' instead of †nowy. (But the non-m. pers. nom. pl. is still hard nowe rather than soft †nowie.)

Slovak, another neighbor of Ukrainian, has a mixed pattern like Polish: soft noví 'new (m. anim. nom. pl.)' ~ hard nové (other nom. pl.). A consistently hard paradigm would have †nový ~ nové and a consistently soft paradigm would have noví ~ †novie. (Both í and ý are /iː/, but in the past I assume ý was something like /ɨː/. No long /ieː/ exists.)

So does Czech: noví 'new (m. anim. nom. pl.)' instead of †nový. (As in Slovak, both í and ý are /iː/, but in the past I assume ý was something like /ɨː/.) Unlike any of the above languages, Czech has three types of nominative plurals:

1. soft noví 'new (m. anim. nom. pl.)'

2. hard nové 'new (m. inanim. + fem. nom. pl.)' instead of soft †noví < *-ie

3. hard nová 'new (neut. anim. nom. pl.)' instead of soft †noví < *-ie < *-a̋

Interslavic doesn't have a 'soft' e, so the non-m. anim. nom. pl. has to be hard:

soft novi 'new (m. anim. nom. pl.)'

hard nove 'new (other nom. pl.)'

This two-way distinction is hard for me to grasp since I'm accustomed to Russian having a single form for both categories.

STAR WARS IN SLAVIC

Having just linked to the Belarusian Wikipedia's entry on Star Wars, I was surprised by how Star was translated as <Zornyja> which isn't cognate to the 'star' word in most of the other Slavic titles for the movie:

South Slavic

Bosnian zvijezda 'star'

Croatian Zvjezdani 'stellar'

Serbian zvezda 'star'

Slovenian zvezd 'of the stars'

Bulgarian <Mežduzvezdni> 'interstellar'

Macedonian <zvezdite> 'the stars'

West Slavic

Polish Gwiezdne 'stellar'

Silesian Gwjezdne 'stellar' (did an author of this article translate the title?)

Slovak Hviezdne 'stellar'

East Slavic

Russian <zvëzdnye> 'stellar'

The exceptions are Ukrainian <Zorjani> 'stellar' and Czech Star (Wars) (untranslated!).

I was expecting a Belarusian adjective derived from <zvjazda> 'star' (the name of this newspaper that I've seen online) - something like Interslavic zvězdne. <Zvjazdnyja>?

Wiktionary derives Belarusian <zorka> 'star' from Proto-Slavic *zorja. But is the word attested outside East Slavic? The only cognate I know of is Ukrainian <zirka> 'star' whose <i> is unexpected; normally *o > <i> before *ь or *ъ, not *a. (The Ukrainian adjective <Zorjani> 'stellar' preserves *o.)

4.30.1:30: Filled out the list of equivalents of Star and added the final note about <Zorjani>.

4.30.21:21: I might as well survey the second half of the title in Slavic as well. I'm going to guess that it's some cognate of Belarusian <vojny> 'wars' almost everywhere: cf. Interslavic vojny 'wars'. I seem to recall an exception other than the untranslated Wars in Czech - ah, it was Serbo-Croatian!

South Slavic

Serbo-Croatian ratovi 'wars' (but would vojne be theoretically possible?)

Slovenian vojna 'war'

Bulgarian <vojni> 'wars'

Macedonian <vojna> 'war'

West Slavic

Polish and Silesian wojny 'wars'

Slovak vojny 'wars'

East Slavic

Ukrainian <vijny> 'wars' (nom. pl. of <vijna>; as with <zirka>, why did *o become <i> even without a following *ь or *ъ?)

4.30.23:23: Duh, the word was *vojьna in Proto-Slavic. And I suppose <zirka> 'star' is from an earlier *zorьka or *zorъka.

Russian <vojny> 'wars'

Serbo-Croatian rat turns out to be the cognate of Ancient Greek ἔρις éris 'strife' ... and English earnest!? I see the word is in East Slavic as well, but not West Slavic, so vojna is the best choice for Interslavic since it's understood across the entire family.

TABLES AND FALCONS: THE FATE OF FINAL *L IN SLAVIC

Polish kiełbasa /kʲewbasa/ from my last two entries is spelled with ł but is no longer pronounced with an [l].

Standard Polish once had three kinds of phonetic laterals, but only two survive today: a palatal allophone before /i/ and a dental allophone elsewhere.

Earlier phonetic
Current phonetic
Current phonemic
Example (from de Bray 1980: 261)
łapa 'paw'
lato 'summer'
list 'letter'

The reflexes of Polish laterals seem straightforward: old hard *l becomes /w/ and old soft *lʲ becomes /l/.

Hence *stolъ 'table' and *sokolъ 'falcon' became Polish stół /stuw/ and sokół /sokuw/.

(I can't predict when *o became ó /u/.)

What does not seem straightforward to me is the fate of syllable-final *l in Ukrainian, Belarusian, and Serbo-Croatian.

There is a tendency toward shifting syllable-final *-l to /w w o/ in those languages: e.g.,

Ukrainian /stojaw/ 'stood' (masc. sg.) < *-l

Belarusian /stajaw/ 'stood' (masc. sg.) < *-l

Serbo-Croatian /stajao/ 'stood' (masc. sg.) < *-l

The best-known example might be Serbo-Croatian /beograd/ (cf. English Belgrade reflecting earlier *l).

Nonetheless, 'table' and 'falcon' may retain *-l:

Ukrainian /stil/, /sokil/

Belarusian /stol/, /sokal/

Serbo-Croatian /sto/ (Serbia) ~ /stol/ (Croatia), /soko/ (Bosnia, Serbia) ~ /sokol/ (Croatia)

(Countries are from Wiktionary entries.)

In Belarusian, word-final *l remains except in the past tense masculine singular (Mayo 1993: 893). (Did it erode there due to high frequency?)

The situation in Ukrainian seems similar, though I know of one case of /w/ < *l that is not a past tense masculine singular: /piw/ < *polъ 'half'.

Could /l/ retention in Croatian stol 'table' be motivated by avoiding homophony with 'hundred' which is /sto/ across Slavic? That doesn't explain Croatian sokol 'falcon', though. Browne and Alt (n.d.: 20) write,

In adjectives and nouns it [*l > o] is widespread though some words avoid it: masculine singular nominative mio [< *mil] 'nice', feminine mila, but ohol 'haughty', feminine ohola.

I assume borrowings postdating *l-shifts retain final -l in Serbo-Croatian: e.g., hotel (not †hoteo).

Ukrainian and Belarusian seem to favor borrowing foreign -l-words with /lʲ/:

U /hotelʲ/, B /hatelʲ/ 'hotel'

U /alkoholʲ/, B /alkaholʲ/ 'alcohol'

but U <mark hemill> and B <mark hèmil>, both /mark hemil/ 'Mark Hamill'. (The B form is from the B Wikipedia entry for the original Star Wars [Зорныя войны. Эпізод IV: Новая надзея].)

4.29.21:57: Added Mayo on Belarusian, Ukrainian /piw/, Browne and Alt quotation, and everything after that.

IRREGULARITIES IN 'KIELBASA' REVISITED

Yesterday I discovered in de Bray's (1980: 258) book on West Slavic that Polish kiełbasa /kʲewbasa/ is in fact the regular reflex of an earlier *kl̩basa (cf. Slovak klbása ~ klobása). I assume his hard *l̩ goes back to Proto-Slavic *ъl.

But I still don't know how to account for the front vowels of

Ukrainian ківбаса <kivbasa> < *kilbasa

Belarusian кілбаса <kilbasa> ~ келбаса <kelbasa>

Are they borrowings of forms resembling Polish kiełbasa or pre-Polish (proto-West Slavic?) *kl̩basa? If they are from *kl̩basa, their front vowels could have been inserted to avoid /klb/-clusters that are not possible in East Slavic.

My guess is that Belarusian келбаса <kelbasa> is a borrowing from Polish kiełbasa, whereas Belarusian кілбаса <kilbasa> is an older form with an epenthetic vowel.

Ukrainian ківбаса <kivbasa> was presumably borrowed as *kilbasa before *l > <v> /w/. I don't think it's from Polish since

- the height of the first vowel doesn't match

- Polish ł apparently became [w] in the standard language only in the early twentieth century (Wikipedia); Morfill (1884: 1) says it is "a very strong l", not [w].

- Polish ł is still [ɫ̪] and not [w] in eastern dialects of Polish in contact with Ukrainian (Wikipedia)

A recent borrowing from the modern standard pronunciation of kiełbasa would be †<kevbasa> and a borrowing from a pre-20th century standard pronunciation or an eastern dialectal pronunciation would be †<kelbasa>.

IRREGULARITIES IN 'KIELBASA'

Wiktionary derives Polish kiełbasa /kʲewbasa/ and its relatives from a Proto-Slavic *kъlbasa, in turn borrowed from some Turkic word similar to modern Turkish külbastı 'roasted meat', lit. 'ash-pressed'. Irregularity within Slavic implies that the word was borrowed more than once.

The Polish word and nonstandard forms like

Ukrainian ківбаса <kivbasa>

Belarusian кілбаса <kilbasa> ~ келбаса <kelbasa>

have front vowels /i e/ that I would not expect from Proto-Slavic *ъ.

At first I thought that maybe the Polish and Belarusian forms were from an earlier Ukrainian

*külbasa < *kölbasa < *kolbasa (cf. standard Ukrainian ковбаса <kovbasa>) < *kъlbasa

but *o only raises to і in standard Ukrainian before a lost weak jer (*ъ or *ь) which wasn't in this word. Maybe the <kivbasa> dialect worked differently.

My current guess is that the /i e/ vowels in Polish, Ukrainian, and Belarusian reflect attempts to imitate Turkic ü and are not from *o or *ъ.

The Belarusian forms have /l/ instead of /w/ < *l corresponding to Ukrainian <v> /w/ < *l and Polish ł < *l. This suggests that the Belarusian borrowings postdate the shift of *l to /w/ in Belarusian. But maybe I misunderstand when *l becomes /w/ in Belarusian.

A SHARED *SHCH-IFT IN CHINESE AND RUSSIAN

Last Friday (yes, I'm behind), I saw

新商品 'new product', lit. 'new trade item'

on packaging.

In Old Chinese, 商 was

either *sɯ-taŋ (corresponding to Baxter and Sagart 2014's *s-taŋ)

or *sɯ-laŋ (corresponding to Schuessler 2009's *lhaŋ)

and in Middle Chinese, it was *ɕɨaŋ.

It occurred to me that the palatalization of *sɯ-t- to *ɕ-

*sɯ-t- > *sɯ-tɨ- > *stɨ- > *stɕɨ- > *ɕtɕɨ- > *ɕːɨ- > *ɕɨ-

was like what I understand to be the palatalization of *stj- to [ɕː] in Russian:

*stj- > *stɕ- > *ɕtɕ- > щ [ɕː]

Above I presume there was an intermediate *ɕtɕɨ-stage at some point in Old Chinese resembling romanizations of Russian щ as šč or shch (e.g., Хрущёв Khrushchev), but without external evidence (e.g., Old Chinese transcriptions of a foreign word with šč-), it's impossible to say when that point was.

3.14.11:45: I assume that Russian alternations such as

вместить 'to contain (perf.)' ~ вмещу 'I will contain'

can be internally reconstructed as

*vmestitĭ ~ *vmestju

to fit the pattern of

вменить 'to consider (perf.)' ~ вменю 'I will consider'

< *vmenitĭ ~ *vmenju
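The chain *stj- > *stɕ- > *ɕtɕ- > [ɕː] above can be replayed as ordered string rewrites. This is only a minimal sketch: the function name `derive`, the rule list, and the flat notation are assumptions made for illustration, and the last stage is schematic rather than a phonetic transcription of вмещу.

```python
# Ordered sound changes copied from the chain *stj- > *stɕ- > *ɕtɕ- > [ɕː].
RUSSIAN_CHAIN = [("stj", "stɕ"), ("stɕ", "ɕtɕ"), ("ɕtɕ", "ɕː")]

def derive(form, changes):
    """Return every stage of `form` as each change applies in order."""
    stages = [form]
    for old, new in changes:
        form = form.replace(old, new)
        stages.append(form)
    return stages

# Internally reconstructed 1sg. form (final vowel only, the infinitive is ignored).
stages = derive("vmestju", RUSSIAN_CHAIN)
print(" > ".join(stages))

# A stem without *st is untouched by the same rules.
print(" > ".join(derive("vmenju", RUSSIAN_CHAIN)))
```

The second call shows why *vmenju serves as the control: none of the three rules finds a match in it, so only the *stj form ends up with [ɕː].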

Ideally I'd like to find an example of initial щ- [ɕː] from *stj-, but I think initial щ [ɕː] is normally from *sk-. A possible exception I found in Preobrazhensky's Etymological Dictionary of the Russian Language is щегол 'goldfinch'; Duden says German Stieglitz 'goldfinch' is of Slavic origin.

Proto-Slavic *štjegŭlŭ? > *ščegŭlŭ

East Slavic:

Ukrainian щиголь <ščyhol'>, щоголь <ščohol'>, щоглих <ščohlix>

Belarusian щигель <ščihel'>, щиглик <ščiglik> (I have kept Preobrazhensky's spellings with щ and и instead of modern шч and і)

(why -ль as if from *-lĭ?)

(no South Slavic reflexes? I would expect Bulgarian initial щ- [št], Serbo-Croatian initial št-, and Slovene initial šč-)

West Slavic:

Czech stehlec, stehlík (with ste- rather than the regular ště- [ʃcɛ] - could this be a borrowing from some variety of German in which st- was [st] instead of [ʃt]?)

Polish szczygieł [ʂtʂɨɡʲɛw]

Upper Sorbian šćihlica [ʃtsʲihlitsa]

Lower Sorbian ščgeľc [ʂtʂgɛlts] (I have kept Preobrazhensky's spelling with ľ instead of modern l)

The reflexes of *stj- could have had parallels in Old Chinese at different stages and/or different places.

A LITTLE MISTAKE: ÍT ÓT TO BE THE PHONETIC

In my last post, I wrote that 乚 ất was the phonetic of the Vietnamese Chữ Nôm character 𡮒 ót 'a kind of fish'. After announcing that post on Twitter, I realized that the actual phonetic was 𠃝 which has two readings, ít 'little' and út 'youngest'. I didn't think of 𠃝 because 乙 appears as 乚 in 𡮒.

If the creator of 𡮒 had the reading út in mind for its phonetic 𠃝, the score of 𡮒 would be 2 + 3 + 2 + 2 = 9 - much higher than my original score of 6.

乙 is a 'Semitic phonetic': it can represent syllables with a wide range of vowels as long as those vowels are within the consonantal frame [ʔ-t]:

Neutral or achromatic vowels (neither palatal nor labial)

ướt [ʔɨət]

ất [ʔət]

ớt [ʔəːt]

𢖮 ắt [ʔat]

𢖮 át [ʔaːt]

Palatal vowels

𠃝 ít [ʔit]

𠮙 ét [ʔɛt]

Labial vowels

𠃝 út [ʔut]

𡮒 ót [ʔɔt]

All of those syllables have the sắc tone written with an acute accent. Syllables with initial glottal stops and final stops regularly develop that tone.

Such a range of vocalism for a phonetic is unusual in Chữ Nôm. In my 2003 book, I proposed that phonetics generally belong to three vowel classes: neutral, palatal, or labial.

'Semitic phonetics' are exceptions to that generalization: e.g., 曰 viết in

neutral: 曰 vất [vət], 抇 vớt [vəːt]

palatal: 𢪏 vít [vit], 𧿭 vết [vet], 𢪏 vét [vɛt]

labial: ⿰曰𡿨 vót [vɔt]
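The test for a 'Semitic phonetic' can be sketched as a check that the derivatives of one phonetic cover all three vowel classes. The class membership below is an assumption read off the lists above (the first vowel symbol of a reading decides its class, so [ɨə] counts as neutral), and the names `vowel_class` and `classes_covered` are hypothetical.

```python
# Assumed class membership, read off the lists above.
VOWEL_CLASS = {}
for symbols, name in (("ɨəa", "neutral"), ("ieɛ", "palatal"), ("uoɔ", "labial")):
    for symbol in symbols:
        VOWEL_CLASS[symbol] = name

def vowel_class(ipa):
    """Classify an IPA reading by its first vowel symbol."""
    for ch in ipa:
        if ch in VOWEL_CLASS:
            return VOWEL_CLASS[ch]
    raise ValueError(f"no known vowel in {ipa!r}")

def classes_covered(readings):
    """Return the set of vowel classes found among a phonetic's derivatives."""
    return {vowel_class(reading) for reading in readings}

# Readings of characters built on 乙 and on 曰, as listed above.
AT_READINGS = ["ʔɨət", "ʔət", "ʔəːt", "ʔat", "ʔaːt", "ʔit", "ʔɛt", "ʔut", "ʔɔt"]
VIET_READINGS = ["vət", "vəːt", "vit", "vet", "vɛt", "vɔt"]

for phonetic, readings in (("乙", AT_READINGS), ("曰", VIET_READINGS)):
    print(phonetic, sorted(classes_covered(readings)))
```

A phonetic that obeyed the three-class generalization would return a single class here; both 乙 and 曰 return all three.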

3.1.0:39: Compare the ranges of readings for 'Semitic phonetics' above with those for کت <kt> listed in Hayyim's New Persian-English Dictionary:

neutral: kat

palatal: ket

labial: kot

(Of course, Persian is not a Semitic language, but it is written in a Semitic script.)

One difference is that all of those k-t readings have no tones, whereas all of the readings for Chữ Nôm characters with the two 'Semitic phonetics' above have the same tone. Perhaps the term 'Semitic phonetic' is a misnomer if the consonantal frames are actually consonant-and-tone frames.

甘 cam is a third 'Semitic phonetic' whose derivatives below have readings with three different tones (ngang, huyền, sắc) as well as three different vowel classes:

neutral: 坩柑泔 cam [kaːm], 紺 cám [kaːm], ⿰月甘 cằm [kam], 𩚵 [kəːm], 鉗 cườm [kɨəm]

palatal: 鉗 kìm [kim], ghìm [ɣim], kiềm [kiəm], kềm [kem], kèm [kɛm]

labial: 鉗 cùm [kum], 柑 cùm [kum]

Note, however, that all but one of the readings in that sample have either the ngang or huyền tones which are variants of the same proto-tone conditioned by voicing or its absence in proto-onsets. Also, only one of those characters is a made-in-Vietnam character (⿰月甘). 甘 was already a neutral and palatal phonetic in Middle Chinese because Old Chinese *a often had palatal reflexes after nonemphatic initials. An ideal example of a 'Semitic phonetic' would have many made-in-Vietnam derivatives with a wide range of vowels and tones. I should dig deeper to see if I can find one.

ÓT TO BE WRITTEN: FISHING FOR PHONETICS

The Vietnamese Chữ Nôm script represents Vietnamese syllables with existing and modified Chinese characters. The problem is that Vietnamese has many more syllables than Sino-Vietnamese, the subset of Vietnamese syllables that are Chinese character readings. For instance, Vietnamese has syllables ending in -ót, a rhyme absent from Sino-Vietnamese.

In my last two posts, I looked at Vietnamese solutions for writing the syllable lót.

I got curious about how other -ót syllables were written and found several strategies. My examples are not exhaustive, and I have omitted glosses in most cases since I am focusing on readings.

1. Overall match

⿰口脫 thót : 脫 thoát (score: 2 + 3 + 2 + 2 = 9; not a 10 only because the vowel heights don't match: o [ɔ] is higher than oa [wa], though I could be generous and say oa is like [o] + [a], and [ɔ] is between those two vowels in height)

2. Matching the onset and coda without much regard for the vowel

𡮒 ót 'a kind of fish' : 乚 ất (the unwritten onset is [ʔ]; score: 2 + 0 + 2 + 2 = 6)

mót : 蔑 miệt (score: 2 + 1 + 2 + 1 = 6; the only matching vowel quality is length*)

⿰曰𡿨 vót : 曰 viết (score: 2 + 1 + 2 + 2 = 7; the only matching vowel quality is length)

This is the consonantal skeleton or Semitic strategy. If English were written with such a strategy:

cat = drawing of a cat

Kate = <woman> + <cat>

kite = <wing> (representing flight) + <cat>

cut = <blade> + <cat>

coat = <clothes> + <cat>

coot = <bird> + <cat>

caught = <hand> + <cat>

Cf. the reverse Semitic strategy (5 below).
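The shared <cat> component in the hypothetical English spellings stands for a consonantal frame. A minimal sketch of that idea follows; the broad transcriptions and the name `skeleton` are assumptions added for illustration and are not part of the spelling scheme above.

```python
# Rough broad transcriptions (an assumption; accents differ).
TRANSCRIPTIONS = {
    "cat": "kæt", "Kate": "keɪt", "kite": "kaɪt", "cut": "kʌt",
    "coat": "koʊt", "coot": "kuːt", "caught": "kɔːt",
}
VOWEL_SYMBOLS = set("æeɪaʌoʊuɔː")  # vowel letters plus the length mark

def skeleton(ipa):
    """Drop vowel symbols, keeping only the consonantal frame."""
    return "".join(ch for ch in ipa if ch not in VOWEL_SYMBOLS)

frames = {word: skeleton(ipa) for word, ipa in TRANSCRIPTIONS.items()}
for word, frame in frames.items():
    print(f"{word}: {frame}")
```

All seven words reduce to the same frame, which is why a single phonetic plus a disambiguating semantic component would be enough to tell them apart in writing.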

3. Matching the rhyme without much if any regard for the onset

3a. Glottal onset : nonglottal phonetic

𡁾 hót : 說 thuyết < *ɕ- (or *sʰ-?) (score: 0 or 1 + 2 + 2 + 2 = 6 or 7, depending on whether the aspiration of th- [tʰ] < *sʰ-? counts as a partial match for h-)

3b. *Palatal onset : nonpalatal phonetic

chót with initial [c] : 卒 tốt < *(t)s- (score: 1 + 3 + 2 + 2 = 8)

giót < *ɟ- < *CV-c- : 卒 tốt < *(t)s- (score: 0 or 1 + 3 + 2 + 2 = 7 or 8, depending on how close the initials were when 埣 was created: *CV-c- is not far from *(t)s-, whereas modern gi- [z] ~ [j] is far from t-)

xót < *ɕ- < *cʰ- : 卒 tốt < *(t)s- (score: 0 or 1 + 3 + 2 + 2 = 7 or 8, depending on how close the initials were when 埣 was created: *cʰ- is not far from *(t)s-, whereas modern x- [s] is far from t-)

⿰律𡿨 xót < *ɕ- < *cʰ- : 律 luật (score: 0 + 2 + 2 + 1 = 5)

3c. *Retroflex onset : nonpalatal phonetic

sót < *ʂ- < *Cr- : 卒 tốt < *(t)s- (score: 0 or 1 + 3 + 2 + 2 = 7 or 8, depending on whether the proto-onset was *sr- which isn't too far from *(t)s-; *(t)s- had hardened to t- by the time *Cr- fused into *ʂ-)

rót < *r- or *CV-s- (proto-onset unknown) : 卒 tốt < *(t)s- (score: 0 or 1 + 3 + 2 + 2 = 7 or 8, depending on whether the proto-onset was *CV-s-)

3d. Palatal nasal onset nh- [ɲ] : oral onset phonetic

nhót : 卒 tốt < *(t)s- (score: 0 + 3 + 2 + 2 = 7)

𦝬 nhót : 突 đột with initial [ɗ] < *t- (score: 0 + 3 + 2 + 1 = 6)

𣑵 nhót : 聿 duật with initial [z] ~ [j] < *dʲ- < *j- (score: 0 or 1 + 2 + 2 + 1 = 5 or 6, depending what the onset of 聿 was when 𣑵 was created)

3e. Lateral onset : nonlateral onset phonetic

⿰貝骨 lót : 骨 cốt (score: 0 + 3 + 2 + 2 = 7)

lót : 卒 tốt < *(t)s- (score: 1 + 3 + 2 + 2 = 8)

3f. Labial onset : nonlabial onset phonetic

𡁾 vót : 說 thuyết < *ɕ- (or *sʰ-?) (score: 0 + 2 + 2 + 2 = 6)

vót : 卒 tốt < *(t)s- (score: 0 + 3 + 2 + 2 = 7)

This character could belong to 2 or 3a depending on which part is phonetic:

⿰孛乙 ót 'back of brain' : 孛 bột 'comet' + 乙 ất 'second Heavenly Stem' (score: 0 + 3 + 2 + 1 = 6 if 孛 is phonetic or 2 + 0 + 2 + 2 = 6 if 乙 is phonetic)

Neither part is obviously semantic. The absence of any component meaning 'brain' or even 'head' is puzzling. Could this be a double phonetic compound with 孛 approximating the vowel and 乙 the rest?

4. Approximating the onset, vowel, and tone without regard for the coda

𠲿 thót : 束 thúc (score: 2 + 3 + 1 + 2 = 8)

I suspect 𠲿 was created by a speaker of a central or southern dialect in which *-t > [k]. If so, 𠲿 is really an example of strategy 1, and the score should be 9 (with a penalty solely for vowel height mismatch).

5. Approximating the vowel and tone without regard for the consonants

The reverse Semitic strategy (cf. 2 which is the Semitic strategy).

hót : 束 thúc (score: 0 + 3 + 1 + 2 = 6)

I suspect this usage of 束 started with a speaker of a central or southern dialect in which *-t > [k]. If so, 束 is really an example of strategy 3, and the score should be 7 (with penalties for the onset and vowel height mismatch). The score could be raised to 8 if the aspiration of th- [tʰ] counts as a partial match for h-.

No solution has a score of 4 for vowels simply because no phonetic has a Sino-Vietnamese reading with o [ɔ]. The maximum possible score for -ót syllables is 9 out of an ideal of 10 (= 2 + 4 + 2 + 2). The actual scores above range from 5 to 9. It is not possible to determine the median or the mode of scores for ót-characters from the data in this post because it is incomplete and only typologically rather than statistically representative: e.g., I omitted all but one strategy 1 character with a score of 9 because near-exact matches are boring.
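The arithmetic behind the ceiling of 9 and the range of 5 to 9 can be checked with a short sketch. The dictionary holds only a subset of the unambiguous (onset, vowel, coda, tone) scores listed above; the variable names are assumptions for illustration.

```python
# (onset, vowel, coda, tone) components copied from a subset of the examples above.
SAMPLE = {
    "thót : 脫 thoát": (2, 3, 2, 2),
    "mót : 蔑 miệt": (2, 1, 2, 1),
    "vót : 曰 viết": (2, 1, 2, 2),
    "xót : 律 luật": (0, 2, 2, 1),
    "nhót : 卒 tốt": (0, 3, 2, 2),
    "nhót : 突 đột": (0, 3, 2, 1),
    "lót : 骨 cốt": (0, 3, 2, 2),
    "thót : 束 thúc": (2, 3, 1, 2),
}
IDEAL = (2, 4, 2, 2)       # perfect onset, vowel, coda, tone
BEST_VOWEL_FOR_OT = 3      # no Sino-Vietnamese reading has o [ɔ]

totals = {pair: sum(parts) for pair, parts in SAMPLE.items()}
ceiling = sum(IDEAL) - (IDEAL[1] - BEST_VOWEL_FOR_OT)

print("lowest:", min(totals.values()))
print("highest:", max(totals.values()))
print("ceiling for -ót:", ceiling, "of", sum(IDEAL))
```

Because the sample was chosen to show types of solutions, only the extremes and the ceiling are meaningful here; a median or mode computed from it would not be.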

Until now Chữ Nôm characters and readings have been treated as a uniform, timeless body. The next phase of Chữ Nôm studies should take space and time into account: where and when do certain spellings arise, and what can they tell us about Vietnamese phonetics in a given place and period?

*I consider all Vietnamese vowels and diphthongs to be the same length for scoring purposes with the exceptions of the short vowels ă [a] and â [ə] which cannot appear in syllable-final position because all Vietnamese syllables must be bimoraic. Hypothetical *Că and *Câ-syllables would be monomoraic and therefore not permissible.

A LÓT OF BRIBES OF BONES AND SHELLS

字典𡦂喃引解 Tự Điển Chữ Nôm Dẫn Giải 'Character Dictionary of Chữ Nôm with Quotations and Explanations' lists

⿰貝骨 (not in Unicode) lót 'bribe'

as a homophone of lót 'to add a layer beneath or inside' from yesterday. (I suspect the noun is an extension of the verb: a bribe is something one pockets - put inside.)

貝 bối 'shell' on the left is the monetary radical. It's not surprising.

What is surprising is 骨 cốt 'bone' on the right with initial [k] instead of [l]. Or is it?

Using yesterday's scoring system for phonetic fidelity, ⿰貝骨 is a 7:

- the initial consonant is a 0 - [k] and [l] have nothing in common

- the vowel is a 3 - o [ɔ] and ô [o] are both back rounded and of the same length; only their height differs

- the final consonant is a 2 - a perfect match

- the tone is a 2 - a perfect match

Taberd lists a spelling of lót 'bribe' with a matching initial and an ironic original meaning:

律, originally for luật 'law' (bare phonetic)

I find his entry format confusing:

— 揬 | đút —, subornare

Why are the dashes in the Chữ Nôm and the Quốc Ngữ romanization on opposite sides? Why isn't the entry like this?

揬 — | đút —, subornare

đút, another word for 'bribe' (presumably an extended usage of đút 'to insert'), has two other spellings without the 扌 'hand' radical (the means of insertion):

⿰貝突 with the monetary radical plus the same phonetic 突 đột 'suddenly'

貝 is there so that the syllables of the redundant compound ⿰貝突⿰貝骨 đút lót 'bribe' would have matching radicals with this spelling: cf. Sino-Vietnamese 賄賂 hối lộ 'bribe' with double monetary radicals

賥 đút with the monetary radical plus the phonetic 卒 tốt 'to end'

Let's score those spellings:

揬 and ⿰貝突: initial 2, vowel 3, final 2, tone 1 = 8

賥: initial 1.5 (t- is closer to đ- than, say, l- which would be a 1), vowel 3, final 2, tone 2 = 8.5
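The scores above are sums of four components. A minimal sketch of that bookkeeping follows; the function name `fidelity` and the range check are assumptions for illustration, and the maximum of 4 for vowels is taken from the ideal of 10 (= 2 + 4 + 2 + 2) in the scoring system.

```python
# Maximum points per component (vowel maximum assumed from the ideal score of 10).
MAX_POINTS = {"initial": 2, "vowel": 4, "final": 2, "tone": 2}

def fidelity(initial, vowel, final, tone):
    """Sum the four component scores after checking each against its maximum."""
    parts = {"initial": initial, "vowel": vowel, "final": final, "tone": tone}
    for name, value in parts.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} score {value} is outside 0..{MAX_POINTS[name]}")
    return sum(parts.values())

spellings = {
    "⿰貝骨 lót": fidelity(0, 3, 2, 2),
    "揬 đút": fidelity(2, 3, 2, 1),
    "賥 đút": fidelity(1.5, 3, 2, 2),   # half point: t- is closer to đ- than l- is
}

# List the spellings from best to worst phonetic match.
for spelling, total in sorted(spellings.items(), key=lambda item: -item[1]):
    print(spelling, total)
```

Fractional components such as 1.5 are allowed so that judgment calls like the one for t- versus đ- can be recorded without changing the scale.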

Do scores correlate with textual frequency? Did writers tend to favor better phonetic matches? Probably not. I admit my scoring is arbitrary and for fun. And timely given that the


Thế vận hội Mùa đông

'World athletic meeting Season winter' = 'Olympic Winter Games'

are still going. Though not for long - they end tomorrow.

(I wanted to type a made-in-Vietnam character for mùa 'season', but my editor doesn't support CJK Unified Ideographs Extension E. And it probably never will since KompoZer's development has been frozen since 2010.)

