REFLEXES OF PROTO-TAI *P.T- IN SAEK
Earlier today (in a table in
an addendum I finished on 6.14) I mentioned the 'famous' Saek word for
'eye' (praː) which attracts attention because it's not like
Thai taː or similar words in other Tai languages. Pittayaporn
(2009: 323) reconstructs its Proto-Tai source as *p.ta which
elegantly accounts for the p-, -r- (< *-t-), and t-.
That made me curious about whether Proto-Tai *p.t- always became pr- in Saek. Going through Pittayaporn's list of Proto-Tai reconstructions, I see that Proto-Tai *p.t- has two different reflexes:
1. pr- as in 'eye' (above) and praːj 'die' (Pittayaporn 2009: 357)
2. t- as in tɤː 'gizzard' (Pittayaporn 2009: 330)
The presyllable *p.- must have been lost in the ancestor of
Saek 'gizzard'; it is reconstructible on the basis of Bao Yen pʰɤɰ
whose aspiration is from *-r̥- < *-r- < *-t-
(cf. Cao Bang tʰɤj with the same source of aspiration).
Pittayaporn (2009: 328) reconstructs Proto-Tai *p.tak
'grasshopper' even though that word has no reflexes in Saek or Bao Yen.
Does it have any reflexes with p-like initials? I think he
reconstructs *p.t- on the basis of forms like Cao Bang and
Shangsi tʰak which have aspiration from *-r̥-
< *-r- < *-t-
(as in Bao Yen). Even without Saek or Bao Yen or anything labial, the
pattern of initials in Cao Bang and Shangsi matches that
of *p.t-words rather than *t-words:
If Proto-Tai 'grasshopper' were simply *tak, the Cao Bang and Shangsi reflexes would be †tak with †t-.
6.15.10:16: Old Chinese had many words of the 'gizzard' type that
had variants with and without presyllables: e.g., 扶 'to crawl'.
|Early Old Chinese |Middle Old Chinese |Late Old Chinese |Early Middle Chinese |Late Middle Chinese
At a stage even before Early Old Chinese, the word may have been *Ni-pʰa, *Nə-pʰa, or *Nu-pʰa with a high series vowel that was later reduced to *ɯ in an unstressed position and ultimately lost.
In Early Old Chinese, the word had developed a variant without a
presyllable. *pʰa is comparable to English 'cause, a
variant of because without a presyllable be-.
Presyllable loss - and other forms of reduction - are not entirely
mechanically predictable. Just because because could lose its be-
doesn't mean that it always did, much less that all be-words
had such variation: e.g., there is no monosyllabic variant †lieve of believe.
In Middle Old Chinese, the high vowel of the presyllable conditioned
the warping of *a to *ɨa.
The variant without a presyllable had no high vowel and was subject to
developing pharyngealization. I write pharyngealization after the
initial consonant, but it was a quality of the entire syllable.
In Late Old Chinese, *N-pʰ- fused into *b-. *ɨ
rounded to *u after labials. Pharyngealized *a backed to
*ɑ. Pharyngealization disappeared after leaving its mark on the
vowel.
In Early Middle Chinese, *a raised and rounded to *o after *u. *ɑ raised and rounded to *ɔ.
In Late Middle Chinese, the vowels raised further: *uo > *u,
*ɔ > *o. *b- became breathy *fʱ before *u.
In Mandarin, breathiness conditioned tone 2 before being lost. Open syllables without that breathiness or any laryngeals developed tone 1. *o raised even further to /u/.
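The segmental changes from Late Old Chinese through Late Middle Chinese can be sketched as ordered string rewrites. This is only an illustration: the Middle Old Chinese starting forms N-pʰɨa and pʰˁa (with ˁ standing for pharyngealization) are assumptions inferred from the prose, the function names are hypothetical, and tones and the Mandarin stage are left out.

```python
import re

def late_old_chinese(form):
    form = form.replace("N-pʰ", "b")           # *N-pʰ- fused into *b-
    form = re.sub(r"(?<=[bpm])ɨ", "u", form)   # *ɨ rounded to *u after labials
    form = form.replace("ˁa", "ˁɑ")            # pharyngealized *a backed to *ɑ
    return form.replace("ˁ", "")               # pharyngealization then disappeared

def early_middle_chinese(form):
    form = form.replace("ua", "uo")            # *a raised and rounded after *u
    return form.replace("ɑ", "ɔ")              # *ɑ raised and rounded to *ɔ

def late_middle_chinese(form):
    form = form.replace("uo", "u").replace("ɔ", "o")  # further raising
    return re.sub(r"^b(?=u)", "fʱ", form)      # *b- became breathy *fʱ before *u

def derive(middle_old_chinese):
    """Return the Late OC, Early MC and Late MC forms, in that order."""
    forms = []
    form = middle_old_chinese
    for stage in (late_old_chinese, early_middle_chinese, late_middle_chinese):
        form = stage(form)
        forms.append(form)
    return forms

# Assumed starting forms: one variant with a presyllable, one without.
print(derive("N-pʰɨa"))
print(derive("pʰˁa"))
```

The order of the rules inside each stage matters: *uo must simplify to *u before the initial can see a following *u and become *fʱ-.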
痡 'suffering' and 鋪 'to spread out' both have two variants, one with
a presyllable and one without. The bare version happens to be
homophonous with the monosyllabic version of 'to crawl'.
|Early Old Chinese |Middle Old Chinese |Late Old Chinese |Early Middle Chinese |Late Middle Chinese
*pʰ-, unlike *b-, did not develop a breathy reflex in Late Middle Chinese. As a result, Late Middle Chinese *fu became Mandarin /fu1/ rather than /fu2/ with tone 2 conditioned by *breathiness.
I suspect that the sesquisyllabic (and even earlier disyllabic)
versions of 痡 'suffering' and 鋪 'to spread out' had very different
first halves: e.g., *kupʰa and *pipʰa,
etc. The original first consonants are not recoverable, and all that
can be said about the original first vowel is that it was nonlow; a
low series vowel (*a *e *o) would not have conditioned the
warping of *a to *ɨa. *ɯ
is my symbol for an unknown high series vowel. So the 'homophony' of 痡
'suffering' and 鋪 'to spread out' is an illusion caused by my agnostic
notation *Cɯ-pʰa; the two words may not have been homophonous
until Middle Old Chinese.
I don't know why 鋪 'to spread out' is written with the 金 'metal' radical. The sesquisyllabic version of 'to spread out' has a more common spelling 敷 with the radicals 方 'direction' and 攵 'action with hand'¹ which make more sense. 敷 is not a spelling of the monosyllabic version *pʰa.
Schuessler (2007: 173) regards 鋪敷 'to spread out' as cognate with 布 *pa-s 'to spread out' and 博 *pa-k 'wide'. The aspirated initial *pʰ- may be from some earlier cluster like *kp- (which is absent from Baxter and Sagart's 2014 reconstruction). Perhaps the earliest reconstructible form of 鋪 'to spread out' is *kɯ-pa. The two Middle Old Chinese forms would then both reflect the presyllable.
|Stage 1: Early Old Chinese |Stage 2: early presyllabic vowel loss |Stage 3: vocalic transfer |Stage 4: late presyllabic vowel loss |Stage 5: aspiration |Stage 6: Middle Old Chinese
In Stage 1, there is only one form of the word.
In Stage 2, the word develops a monosyllabic variant *kpa.
In Stage 3, the high presyllabic vowel of *kɯ-pa bends *a to *ɨa, giving *kɯ-pɨa. The vowel of *kpa remains unbent since there is no presyllabic high vowel to condition the bending of *a to *ɨa.
In Stage 4, the presyllabic vowel of *kɯ-pɨa is lost.
In Stage 5, *kp- became *pʰ- - a change that
probably also occurred in Middle Korean centuries later.
In Stage 6, the variant without a high vowel developed pharyngealization.
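The six stages can be walked through mechanically. The following is a minimal sketch under stated assumptions: the intermediate forms are read off the stage descriptions, ˁ is used as a stand-in symbol for pharyngealization written after the initial, and all function names are hypothetical.

```python
def stage2(forms):
    """Early presyllabic vowel loss: a variant *kpa arises beside *kɯ-pa."""
    return forms + [f.replace("kɯ-p", "kp") for f in forms]

def stage3(forms):
    """Vocalic transfer: a presyllabic high vowel bends *a to *ɨa."""
    return [f[:-1] + "ɨa" if "ɯ-" in f and f.endswith("a") else f
            for f in forms]

def stage4(forms):
    """Late presyllabic vowel loss in the surviving sesquisyllable."""
    return [f.replace("kɯ-p", "kp") for f in forms]

def stage5(forms):
    """Aspiration: *kp- becomes *pʰ-."""
    return [f.replace("kp", "pʰ") for f in forms]

def stage6(forms):
    """The variant without a high vowel develops pharyngealization."""
    return [f if "ɨ" in f else f.replace("pʰ", "pʰˁ", 1) for f in forms]

def derive(start="kɯ-pa"):
    """Return the list of variant forms at each of the six stages."""
    history = [[start]]                      # Stage 1: only one form
    for step in (stage2, stage3, stage4, stage5, stage6):
        history.append(step(history[-1]))
    return history

for number, forms in enumerate(derive(), start=1):
    print(f"Stage {number}:", " ~ ".join("*" + f for f in forms))
```

The point of the ordering is that the early variant *kpa loses its high vowel before Stage 3 and so escapes bending, while the late variant keeps the vowel just long enough to be bent.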
I forgot about the use of 布 *pa-s 'to spread out' to write 'cloth' (a borrowing from an Austroasiatic language: cf. Katu [Kantu dialect] kapaːs 'cotton', Kuy kpah 'cloth', and Sanskrit kārpāsa- 'cotton', also an AA borrowing) which fits my hypothesis of an earlier *k- in 'to spread out', a native word that happened to sound like 'cloth'. The *k-p-word was later reborrowed with disyllabic spellings:
幏布 *kæh-pɑh 'cotton' (c. 100 AD); is the first *-h for foreign *-r-, or was this spelling coined by someone who still had *kr- in 幏: *krɑh-pɑh?
古貝 *kɔˀ-pɑɕ 'cotton' (c. 430 AD)
See Schuessler (2007: 173) for further discussion, though he does not reconstruct *k- in the Old Chinese words for 'cloth' or 'to spread out'.
¹There is no Chinese word 攵 'action with hand'; the gloss refers to the use of 攵 *(r-)pʰok 'to beat' as a component in other characters. (The word 'to beat' is more commonly written 撲 which is not a component in other characters.)
DID SAEK SHIFT *Z- UNDER VIETNAMESE INFLUENCE?
Last night I found this passage in Pittayaporn (2009: 296):
In Saek, *z- became /j-/ merging with PT *ˀj-, probably due to influence from North-Central Vietnamese, where original *z- has become /j-/ (Alves 2007).
Northern Vietnamese has /z/ corresponding to /j/ in central and
southern Vietnamese. I think Saek would be or would have been in
contact with central Vietnamese. (It's not clear if there are Saek
villages in Vietnam anymore.)
One might conclude that the north preserves a /z/ that became /j/ elsewhere. This would then be parallel with Saek. But I am not sure that is the case. Here are the data:
|Middle Vietnamese spelling
By 'northern' I mean Hanoi and Vinh (the latter is north central);
'nonnorthern' refers to Huế (at the center) and Saigon. (I don't want
to say 'south' because Huế is certainly not in the south.)
Capital letters stand for obstruents with unspecified voicing: e.g.,
*C could be voiceless *c or voiced *ɟ.
Hyphens before consonants indicate the presence of an unspecified
presyllable: e.g., *-C- represents voiceless *c or voiced *ɟ
preceded by a presyllable.
Exactly what the Middle Vietnamese spellings gi- d- r- stood for is not certain. I can only say that none of those three consonants were /z-/ or /j-/. I think it's possible that gi- and d- became /j-/ without a *z-phase. But maybe Saek is evidence for such a phase.
Or is it? The /z-/ of Vietnamese postdates the 17th century and long postdates the devoicing of original *voiced obstruents (possibly by the late first millennium AD). On the other hand, Saek *z- is original. Did Saek have *z- and a full set of voiced obstruents as late as the 18th century - almost a thousand years after Vietnamese devoiced its voiced obstruents?
6.14.2:21: I don't think what I wrote above is clear. Let me try again.
Phases of Vietnamese
Vietnamese consonants can be said to have gone through five phases
which I will illustrate with hypothetical examples for simplicity:
||/zaː/ ~ /jaː/||/zaː/ ~ /jaː/||/saː/ ~ /ʂaː/
Phase 1: Early Old Vietnamese:
phonemic voicing in obstruents
I am not sure Early Old Vietnamese ever had *(d)z-. It is perhaps telling that Early Middle Chinese 字 *dzɨʰ 'written character' was borrowed as *ɟɨːʰ (now chữ) rather than as †zɨːʰ which would have become †tữ. Later, Early Middle Chinese 字 *dzɨʰ became Late Middle Chinese 字 *tsɨ̣ and was borrowed again into Vietnamese; see phase 3 below.
Phase 2: Middle Old Vietnamese:
*-r- > *-r̥- after a voiceless initial
subphonemic tones conditioned by voicing before main vowel: *voiceless > unmarked ngang tone, *voiced > grave accent for huyền tone
tones conditioned by final consonants may date between phase 1 and phase 2
Phase 3: Late Old Vietnamese:
voicing (lenition) of medial obstruents: *-t- > *-d-
*-r̥- > *-ʂ-
devoicing of voiced obstruent initials
words formerly distinguished by obstruent voicing now distinguished only by tone which had become phonemic
Late Middle Chinese 字 *tsɨ̣ 'written character' (with a devoiced initial) was borrowed as *sɨ̣ː (now tự). (For simplicity I use a Vietnamese tone mark even for Late Middle Chinese.)
Phase 4: Middle Vietnamese:
*Cʂ- > s- /ʂ/
Drag chain *s- > *t- > /ɗ/
Italicized forms are 17th century spellings; those spellings of consonants remain in use today. đ is /ɗ/, but the phonetic value of d is uncertain. [d] is the simplest interpretation, but [dʲ] and [ð] are also possible.
Phase 5: Modern Vietnamese: different reflexes of Middle Vietnamese s and d depending on dialect. s lost retroflexion in Hanoi (but not in Vinh which has /z/ like Hanoi and unlike the nonnorth dialects; Thompson 1987: 98). The picture for d is less clear. Two scenarios:
Scenario 1. All dialects shifted d to /z/, and
nonnorthern dialects shifted /z/ to /j/
Scenario 2. d shifted in different ways; no shared
There is no doubt that Proto-Tai *z- became /j-/ in Saek. The question is whether that shift in Saek reflects the influence of Vietnamese given scenario 1. Let's suppose scenario 1 is true. Phase 4 is in the 17th century and phase 5b perhaps starts in the middle 19th century. (The last traces of Middle Vietnamese consonantism seem to disappear after the early 19th century.) So the Saek change would have to be dated between the 17th and 19th centuries. But if the Saek change were that recent, Saek would have had *z- - and presumably other Proto-Tai voiced obstruents such as *g *d *b- - as late as the 17th or even 18th century. That doesn't seem likely given that its neighbor Vietnamese had undergone devoicing prior to borrowing from Late Middle Chinese during phase 3 (circa the 10th century).
Phases of Saek
Saek has gone through some of the same changes as Vietnamese up to phase 3, though the details differ:
Phase 1: Proto-Tai:
presyllables present (rewritten here as *Cə- instead of as *C.- as in Pittayaporn's notation)
phonemic voicing in obstruents
drag chain shift: *-t- > *-d- > *-r-; contrast with Vietnamese phase 3 in which *-t- > *-d-;
loss of presyllabic vowels
*pər- > *pr-; *pr̥- > *pʰr-
subphonemic tones determined by initial consonant (including presyllabic consonants, unlike Vietnamese) after lenition (again, unlike Vietnamese)
To facilitate comparison with Vietnamese, I use Vietnamese tone notation: zero for tone A1 and a grave accent for tone A2.
Tones conditioned by final consonants may have developed between phase 1 and phase 3.
drag chain shift: *pd- > *pr- > r-, *d- > tʰ-, *z- > j-
words formerly distinguished by initial voicing now distinguished by tone which has become phonemic
My guess is that lenition and devoicing happened independently in Vietnamese and Saek, whereas tonogenesis did not - Vietnamese phase 3 and Saek phase 3 may have been simultaneous.
Phases of Cao Bang
On 6.11, I thought Saek having *z- and other voiced
consonants as late as the 18th century was improbable, but Tai
languages on the Sino-Vietnamese border never underwent
devoicing (Pittayaporn 2009: 110). Compare the phases of Cao Bang with
those of Vietnamese and Saek:
Phase 1: Proto-Tai: same as Saek phase 1
loss of presyllabic vowels
*-r- > *-r̥- after a voiceless initial (as in Vietnamese and Saek)
Chain shift: *pt- > *pr̥- > *pʂ-
subphonemic tones determined by voicing of consonant before vowel (contrast with Saek)
To facilitate comparison with Vietnamese, I use Vietnamese tone notation: zero for tone A1 and a grave accent for tone A2.
Tones conditioned by final consonants may have developed between phase 1 and phase 3.
*pr̥- > *tr̥- > tʰ-
elimination of *voiceless-voiced clusters and chain shift: *pd- > *d- > dʱ-
*pʂ- > *pɕ- > pʰj-
*z- > *s- > tʰ-; *z- devoiced but this seems to be an anomaly; see my 6.13 entry; the fortition is reminiscent of Vietnamese (see Phan 2013 for examples of *s- > /tʰ/ in Vietnamese: e.g., *sit > thịt 'meat'¹) but probably occurred independently much later. Phan (2013: 65) regards fortition of fricatives as "common in Southeast Asia and should not be considered a shared innovation."
tone A2 still strongly associated with voiced initials but has become phonemic due to the devoicing of *z-
Finally, for reference:
Phases of Thai/Lao
Thai and Lao never underwent lenition; medial *-t- and *-d-
remain as stops today.
Phase 1: Proto-Tai: same as Saek and Cao Bang phase 1
loss of presyllabic vowels
*-r- > *-r̥- after a voiceless initial (as in Vietnamese, Saek, and Cao Bang)
Phase 3: More or less represented by Thai and Lao spelling
(but Lao has no <z>; *z- corresponds to ຊ <j>)
reduction of *pC- to *t- and *ɗ- (not *d-!); was there an intermediate geminate stage *tt- and *dd-?
*-r̥- > -ʰ-
subphonemic tones determined by initial consonant (including former presyllabic consonants, unlike Vietnamese)
To facilitate comparison with Vietnamese, I use Vietnamese tone notation: zero for tone A1 and a grave accent for tone A2.
Tones conditioned by final consonants may have developed between phase 1 and phase 3.
drag chain shift: *ɗ- > d- > *tʰ-
words formerly distinguished by initial voicing now distinguished by tone which has become phonemic
The Vietnamese notation, though convenient, is misleading, as tones A1 and A2 have undergone splits and, in Thai, a merger.
The development of tones A1 and A2 in Thai and Lao
|Stage 3 subphonemic tone |Stage 3 initials |Stage 4: Thai tones |Stage 4: Vientiane Lao tones
All of the phases above are my speculations built upon the work of
Gage ("Vietnamese in Mon-Khmer Perspective", 1985) and Pittayaporn
(2009). The relative chronology is only approximate; some but not all
changes could be reordered with the same final results.
¹The nặng tone written with a subscript dot normally indicates a *voiced initial. It is tempting to reconstruct a change *z- > /tʰ/ as in Cao Bang. But support for *z- in native words is weak. The tone may reflect a lost voiced prefix.
Tonight I found the section on the Middle Korean emphatic particle za
at random in Lee and Ramsey (2011: 194). The earliest attestations of
it I can find in Old Korean are in two 鄉歌 hyangga
毛等居叱沙 *motʌn kəs sa
'all thing EMPH'
- 慕竹旨郎歌 (c. 700)
- 禱千手觀音歌 (c. mid-8th century)
where it is spelled phonetically with Middle Chinese 沙 *ʂæ 'sand'.
It occurred to me that the 'sand' spelling of that particle¹ obviously
must predate the lenition of *s to Middle Korean z.
If a *z-pronunciation had existed in Old Korean, it could have been spelled with Middle Chinese
嵯嵳𣩈㽨瘥𥰭䑘艖蒫醝䰈鹺䴾齹虘蔖䠡䣜躦𪘓 *dza or 邪䓉耶椰瑘𥯘鎁釾𦭿𦰳斜䔑擨 *ziæ².
(There was no Middle Chinese syllable *za. This gap is not
accidental. I should look into it.)
It turns out that 邪 'evil' is attested as a phonogram in Old Korean hyangga, but 俞昌均 Yu Chhang-gyun (1994: 76) interprets it as a symbol for *ra (cf. its possible Old Chinese reading *la in Schuessler 2009: 56). There have been many attempts to reconstruct the pronunciation of Old Korean. Has anyone interpreted 邪 as *sa (possibly tempted by its modern Sino-Korean reading sa) or *za? I don't have any other sets of hyangga readings on hand. Another thing to look into when I get the chance.
¹6.11.21:29: It never occurred to me to use Unicode superscript numerals for endnotes until now. No more long strings of asterisks.
It's theoretically possible that the 'sand' spelling in this text
postdates the 8th century, as these poems survive in 三國遺事 Samguk yusa
(1285) whose earliest surviving copy is from 1512. Even if these poems
are actually from c. 700 AD, their spellings could have been altered in
the centuries between then and 1512. However, I know of no other
evidence pointing toward some other Ur-spelling of the emphatic
particle. The 口訣 kugyŏl phonogram for *sa ~ *za
is 氵 which is almost certainly an abbreviation of 沙 'sand', the most
common sa-character with the left-hand component 氵 'water'. Kugyŏl
manuscripts from the Koryŏ dynasty (918-1392) predate 1512; one need
not worry about potential errors in their transmission.
²6.11.23:44: Nearly all of these characters are rare and therefore not likely candidates for phonograms which tended to be high-frequency characters. So one might argue that the Old Korean particle was *za but not written as such because there were no high-frequency characters with a similar reading other than 邪 *ziæ 'evil' which was already being used for *ra if Yu (1994) is correct. However, if *s had already lenited to *z in Old Korean, I would expect to see other phonogram spellings unambiguously reflecting lenition. But I know of none offhand. Although one might argue that *s lenited before other consonants, that possibility could only be confirmed if there were *(d)z-spellings of later z-words. No such spellings seem to exist.
The only *(d)z-phonograms in Yu's (1994: 75-78) catalog of
phonograms in hyangga are the aforementioned 邪 *ziæ and
齊 Middle Chinese *dzej 'equal' : Yu's Old Korean *tsjə (my *tse)
which, like 邪 *ziæ 'evil', does not represent an Old Korean syllable corresponding to a Middle Korean z-syllable. So if Old Korean already had *z-syllables, they were not written with Chinese *(d)z-characters and cannot be detected.
I could argue that in fact the dialect of Chinese known to educated Old Koreans had shifted *(d)z- to *(t)sʱ- (as in Pulleyblank's Late Middle Chinese reconstruction), so the characters above wouldn't have been appropriate for an Old Korean *za.
That Chinese dialect had a reflex of Middle Chinese *ɲ- that
corresponds to z in Middle Korean Sino-Korean readings. But
there was no Middle Korean Sino-Korean reading †za. So it seems
Old Koreans had no good options for writing *za if they had
such a syllable - and I still don't think they did.
(The questions of what that Chinese dialect's reflex of *ɲ-
was and how it was borrowed into Old Korean - as *z- or as
something else that became z- in Middle Korean - remain open.
The simplest solution is to assume that Chinese dialect had something
like the *ž- of Liao Chinese. This was borrowed into Old Korean
as *z-, a consonant originally only in borrowings. Later,
Middle Korean lenited *s
in native words, resulting in a new /z/ that shared the fate of the old
borrowed one: both /z/ soon disappeared from the Seoul dialect. [But
does any Korean dialect today have a trace of /z/ in Sino-Korean words?])
THE PHONETIC VALUE OF MIDDLE KOREAN DOUBLE ZERO
In the earliest hangul texts from the 15th century, there were three
circle-based letters:
ㅇ <Ø> : ㆁ <ŋ> : ㆀ <ØØ>
In modern hangul, ㅇ <Ø> has come to represent zero in initial position and /ŋ/ in coda position: e.g., 앙 <ØaØ> /aŋ/. Although ㅇ may appear with a short vertical line on top like ㆁ <ŋ> in some fonts, that line no longer distinguishes ㆁ <ŋ> from ㅇ <Ø>; the reading of ㅇ /ㆁ is now wholly dependent on its position within a syllabic block.
ㅇ <Ø> had two uses in the earliest hangul orthography for Late
Middle Korean in the 15th century. It could represent initial /Ø/ as in
the modern language and - unlike the modern language - also represented
/ɣ/ in four environments:
1. between /r/ and a vowel
2. between /z/ and a vowel
3. between /j/ and a vowel
4. between /i/ and a vowel
This /ɣ/ has disappeared in the modern standard language, though traces remain in dialects: e.g., 15th century 몰애 <morØai> /morɣaj/ 'sand' corresponds to Pukchhŏng molgɛ with -g- (cf. standard morɛ).
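The four environments can be stated as a simple check. This is a minimal sketch, not a reading rule: it only reports whether a written ㅇ sits in one of the environments where it could stand for /ɣ/, and the function name and the segment notation are hypothetical.

```python
# Segments after which 15th century ㅇ could stand for /ɣ/ before a vowel,
# as listed above.
GAMMA_CONTEXTS = {"r", "z", "j", "i"}

def may_be_gamma(previous_segment, followed_by_vowel):
    """True if ㅇ is in an environment where it could represent /ɣ/.

    Elsewhere (e.g. at the start of a word) ㅇ can only be zero.
    """
    return followed_by_vowel and previous_segment in GAMMA_CONTEXTS

# 몰애 <morØai> 'sand': ㅇ stands between /r/ and the vowel /a/.
print(may_be_gamma("r", True))
# Word-initial ㅇ has no preceding segment, so it can only be zero.
print(may_be_gamma("", True))
```

A positive result does not guarantee /ɣ/; it only marks the spelling as ambiguous between /ɣ/ and zero.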
What was ㆀ <ØØ>? Lee and Ramsey (2011: 146) regard it as
another spelling of Late Middle Korean /ɣ/.
But why would two letters be devised for the same sound at the very
beginning of a script? A clue may lie in the limited distribution of ㆀ
<ØØ> which was solely used to write forms of the
passive/causative suffix ᅇᅵ <ØØi> - and in one instance, the
causative suffix ᅇᅮ <ØØu> (月印釋譜 Wŏrin sŏkpo 14:14) -
after /j/. If the first suffix were
simply /ɣi/, why not spell it as 이 <Øi> which is the spelling
after /l z/? (I don't know of any instances of that suffix after /i/.
The second suffix is otherwise spelled <Øu> = /ɣu/ after /l z j/.)
Yesterday afternoon it occurred to me that ㆀ <ØØ> might represent a palatal allophone [ʝ] of /ɣ/. This allophone may have been geminated [ʝʝ] if it was like /ss/ and /hh/ which were written as double consonants ㅆ ㆅ <ss hh>. There is even one case of /nn/ as ㅥ <nn> in 訓民正音諺解 Hunmin chŏngŭm ŏnhae.
There is, however, no guarantee that a double consonant necessarily represented a geminate, as ㅆ ㆅ <ss hh> could also represent /z ɦ/ in the prescriptive transcription of Sino-Korean readings. (Native /z/ had a different letter ㅿ <z>. It might be more accurate to regard the artificial voiced consonants of Sino-Korean readings as breathy voiced: e.g., Sino-Korean ㅆ <ss> was /zʱ/ or /sʱ/ and therefore distinct from ㅿ /z/.) Doubled ㄲ ㄸ ㅃ ㅉ <kk tt pp cc> could only represent /g d b dz/ in that transcription in the earliest hangul texts; their use for reinforced consonants came later.
Moreover, the circle was used to derive consonant characters for nongeminates: e.g., /β/ was written as ㅸ. So ㆀ <ØØ> could be interpreted as 'derivative of circle' for [ʝ] rather than as 'double circle' for [ʝʝ] (or geminate zero which would make no sense).
One problem with this proposal is that it cannot easily account for the one instance of ㆀ <ØØ> in the causative suffix ᅇᅮ <ØØu>. It is understandable that /ɣ/ would palatalize to [ʝ] between /j/ and /i/ in, for instance, ᄆᆡᅇᅵ <mʌi.ØØi> /mʌjɣi/ [mʌjʝi] 'to be bound to', the passive stem of /mʌj/ 'to bind'. It is slightly less understandable why /ɣ/ would palatalize to [ʝ] between /j/ and /w/ in 뮈ᅇᅯ <mui.ØØuə> /mujɣwə/ 'moving'. (/ɣw/ is an allomorph of /ɣu/ before vowel-initial suffixes like /ə/ '-ing', called the 'infinitive' [though it is not like an Indo-European infinitive].)
Perhaps 뮈ᅇᅯ <mui.ØØuə> reflects a pronunciation [mujʝɥə] in which the palatal quality of /j/ spread into the following consonants. That pronunciation might even have been common, though for most purposes a phonemic spelling 뮈워 <mui.Øuə> for /mujɣwə/ might have sufficed instead of a more precise phonetic spelling 뮈ᅇᅯ <mui.ØØuə>. I don't know if the spelling 뮈워 <mui.Øuə> is attested, but 月印千江之曲 Wŏrin ch'ŏn'gang chi kok 62 has the spelling 뮈우 <mui.Øu> /mujɣu/ for the stem.
Slavic languages normally only have [f] in loanwords and as a
positional variant of /v/ (which is why Russian names in -v
have variant spellings in -ff).
- in onomatopoetic words (e.g., foukat 'to blow')
- as a positional variant of v before voiceless consonants (e.g., vsadit 'to bet', pronounced [fsadit])
- in loanwords from non-Slavic languages (e.g., fonetický 'phonetic')
So what is the source of the f in the dish called frgál? That f- is before a voiced syllabic r and is not a variant of v-. Is it onomatopoetic or from a foreign language - perhaps Romanian, given that frgál is from Moravian Wallachia? That region isn't contiguous with modern Romania, but it was settled by Vlachs.
SHIMUNEK (2017) AND DOWNES (2018)
Last night, I found the addenda and corrigenda to Andrew Shimunek's Languages of Ancient Southern Mongolia and North China (2017). I thought that would be as close as I'd get to having his book which I can't afford at $116.76 until I saw an online sampler.
It's remarkable that three books on Khitan have appeared in English within a decade - the other two being Daniel Kane's The Kitan Language and Script (2009) and Wu Yingzhe and Juha Janhunen's New Materials on the Khitan Small Script: A Critical Edition of Xiao Dilu and Yelü Xiangwen (2010 - just a year after Kane's book!).
Can a new book on Jurchen be far behind? It has been almost thirty years since Kane's The Sino-Jurchen Vocabulary of Interpreters (1989) which despite its title is a general gateway to Jurchen language studies as well as complementing Kiyose Gisaburō's A Study of the Jurchen Language and Script - Reconstruction and Decipherment (1977) which covered the Sino-Jurchen vocabulary of the Bureau of Translators.
Not long after Imre Galambos' Translating Chinese Tradition and Teaching Tangut Culture: Manuscripts and Printed Books from Khara-Khoto (2015) comes Alan Downes' PhD dissertation "How Does Tangut Work?" (submitted 2016, revised 2018), a follow-up to his BA honors thesis "The Xixia Writing System" (2008) - and his tangut.info website which links to mine.
Alas, I haven't written about Tangut - much less Khitan or Jurchen -
in a long time. If I may rephrase Downes' question, I have been trying
to come up with the answer to "How Does Pyu Work?" It's coming in a
series of articles and a book.
These are exciting times for the study of extinct Asian languages.
YAT AND ETA
Today I realized that my interpretation of the early Slavic vowel yat as [ɛː] (< *ai) sounded like the classical value of the Greek letter Η eta. Since Cyrillic is an offshoot of the Greek alphabet, one might expect yat to have been written with an eta-based Cyrillic letter. But of course eta was actually the model for the Cyrillic letter И <I> because eta had raised to [i] by the 4th century AD, long before Cyrillic was created in the late 9th century. [ɛː] was long gone in Greek, so a non-Greek letter was created for yat: Ѣ.
Ѣ looks like a derivative of the front yer letter Ь [ɪ] which in turn looks like a derivative of the Glagolitic front yer letter Ⱐ. But it is strange that a lower mid long vowel was written with a modified lower high short vowel rather than, say, with an additional stroke (like Czech ě which is nowadays used to transliterate yat). I don't see any resemblance between Ѣ and its Glagolitic counterpart Ⱑ.
5.14.23:24: According to Wikipedia, Schenker (1995) thought Ⱑ might be from Greek alpha Α. That makes a lot of sense if yat were [æ].
Modern reflexes of yat vary considerably in height from [ja] with a
low vowel in eastern Bulgarian* to [i] in Ukrainian.
*Eastern Bulgarian has two reflexes of yat: [ja] and [ɛ]. The former is in stressed syllables not followed by front vowels. The latter occurs elsewhere.
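The Eastern Bulgarian distribution stated above is a two-condition rule, sketched here. The function and parameter names are hypothetical, and the example words in the comments (бял 'white', plural бели) are supplied for illustration rather than taken from the entry.

```python
def eastern_bulgarian_yat(stressed, next_vowel_is_front):
    """Return the Eastern Bulgarian reflex of yat for one syllable.

    [ja] appears in stressed syllables not followed by front vowels;
    [ɛ] appears everywhere else.
    """
    if stressed and not next_vowel_is_front:
        return "ja"
    return "ɛ"

# бял 'white': stressed, no following front vowel.
print(eastern_bulgarian_yat(stressed=True, next_vowel_is_front=False))
# бели 'white (plural)': stressed, but followed by front [i].
print(eastern_bulgarian_yat(stressed=True, next_vowel_is_front=True))
```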
PROTO-CELTIC VOICED ASPIRATES?
I've seen this Proto-Celtic word list before, but I didn't notice voiced aspirates in it until now:
*mori-steigh-(e/o-) 'sea'
*men-n-dh-e/o- (?) 'want'*
*ati-od-bher-to- (?) 'sacrifice'
Are those pre-Proto-Celtic forms? I thought Proto-Celtic lost
aspiration in voiced consonants:
Proto-Indo-European *gh *dh *bh > Proto-Celtic *g *d *b
*5.14.0:42: This reminds me of Avestan mazdā- 'wisdom' < *mn̥s-dheʔ 'mind-place', though the first root is in the e-grade in Celtic.
CHU AND KRA-DAI (PART 2)
The series has five types of Early Middle Chinese (EMC) readings (ignoring final consonants):
The high-vowel presyllables of types I-III conditioned medial *-ɨ- which in turn conditioned the raising of *a to *ə.
I. *sɨə- < *Cɯ-sa- (*kɯ-sa-?) (胥湑稰諝糈壻婿)
II. *ʂɨə- < *kɯ-sa- (疋疏蔬梳糈)
III. *tʂʰɨə- < *kʂʰɨa- < *kɯ-sa- (楚 only)
IV. *ŋæ- < *ŋgʐa- < *N-k-sa- (alternate reading of 疋 only)
V. *sej < *se (alternate reading of 壻婿 only)
The high-vowel presyllables of type I were lost after conditioning medial *-ɨ-, but they fused with *s in types II and III. *kɯ-s- that fused early became EMC *ʂ- via *kʂ-; *kɯ-s- that fused late became EMC *tʂʰɨə- via *kʂʰ-.
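The five types are distinguished by the shape of the Early Middle Chinese reading alone, so a reading can be sorted into its type by inspecting its beginning. A rough sketch, assuming readings are given as plain strings without the asterisk or final consonants; the function name is hypothetical.

```python
def reading_type(emc):
    """Classify an EMC reading of the series into types I-V."""
    if emc.startswith("tʂʰ"):
        return "III"   # late fusion of *kɯ-s- via *kʂʰ-
    if emc.startswith("ʂ"):
        return "II"    # early fusion of *kɯ-s- via *kʂ-
    if emc.startswith("ŋ"):
        return "IV"    # *N- fused with *k-
    if emc.startswith("se"):
        return "V"     # anomalous *-e rhyme
    if emc.startswith("sɨə"):
        return "I"     # presyllable lost after conditioning *-ɨ-
    raise ValueError(f"not a known reading type: {emc}")

for reading in ("sɨə", "ʂɨə", "tʂʰɨə", "ŋæ", "sej"):
    print(reading, reading_type(reading))
```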
Type III *kʂʰɨaʔ might have approximated an early Kra-Dai *kraʔ, especially if it were phonetically something like [kʁaʔ].
(5.12.0:56: Or if 'Kra' were [kʐaʔ]. Cf. Polish krz [kʂ] from *kʐ- < *krʲ-. Pittayaporn 2009: 99 reconstructed *ks- as a Proto-Tai source of Proto-Southwestern Tai [and hence Siamese] *kʰr-, though he does not list any examples of Proto-Tai *ks-, and he reconstructed the Proto-Tai cognate of 'Kra' as *kraː C 'slave' with *kr- rather than *ks-. Siamese kʰaː C1 'slave' lacks the -r- that would point to medial *-s-. If *ks- became Siamese kʰr-, perhaps *kz- became *kr- and then Siamese kʰ-.)
*N- fused with *k- to form the *ŋ- of type IV.
(5.12.0:11: OC *a fronted to *æ after retroflexes.)
The *-e rhyme of type V is anomalous and unique to 壻~婿
'son-in-law'; it cannot be reconciled with the *-a rhyme of the
5.12.1:03: Added all examples of each type listed in (Schuessler 2009: 59) plus 疋 as the sole example of type IV which was not listed in Schuessler.
18.104.22.168:59: CHU AND KRA-DAI (PART 1)
Chamberlain (2016) proposed that the name of the state now known as 楚 Chǔ in Mandarin is the same name as Kra as in Kra-Dai. This is an ingenious idea. But does it really work?
The rhymes certainly match. 楚 ended in *-aʔ in Old Chinese, and 'Kra' in Proto-Kra-Dai was something like *kraʔ (cf. Ostapirat's Proto-Kra *kra C 'Kra' and Pittayaporn's Proto-Tai *kraː C 'slave'; I interpret the C tone category as *-ʔ like Norquest 2016).
The trouble is the initial. If 楚 had initial *kr- in Old Chinese, it would have become Early Middle Chinese †kæʔ and Mandarin †jiǎ. But instead it became Early Middle Chinese *tʂʰɨəʔ and Mandarin chǔ [tʂʰu] with aspirated retroflex initials.
Can those initials be reconciled?
Pulleyblank (1962: 129) proposed that Old Chinese *skʰ- might have become Early Middle Chinese *tʂʰ-. Later, Pulleyblank (1965: 206) proposed Old Chinese *kʰs- as a source of Early Middle Chinese *tʂʰ-. But there is no *s in Proto-Kra-Dai *kraʔ. *s- is likely to have been in the Old Chinese reading of 楚 since nearly all readings of characters in the 疋 phonetic series began with *ʂ- or *s- in Early Middle Chinese. There is no evidence on the Chinese side directly pointing to *k- in 楚 or any other member of the 疋 phonetic series, though 疋 does have another Early Middle Chinese reading *ŋæʔ which could mechanically be derived from an Old Chinese *ŋraʔ - close to *kraʔ but with a velar nasal rather than a stop.
Next: How can I make Chamberlain's idea work?
5.11.11:56: Added reference to Pulleyblank (1965) and link to Pulleyblank (1962).
ARMENIAN, KOREAN, AND BURMESE APPROACHES TO KHITAN OBSTRUENTS
In my last entry, I wrote,
the Khitan transcribed Liao Chinese *t as both <t> and <d>
There are similar inconsistencies with other obstruents and to a
lesser extent even in the spelling of native Khitan words: e.g.,
'second' is spelled with both 162 <c> and 104 <dz> (Kane
I originally thought that Liao Chinese and Khitan had different obstruent systems: e.g., LC had an unaspirated : aspirated distinction whereas Khitan had a voicing distinction. But that wouldn't explain the inconsistency in Khitan native words.
Today it occurred to me that Khitan might have had Armenian-style variation:
The major phonetic difference between dialects is in the reflexes of Classical Armenian voice-onset time. The seven dialect types have the following correspondences, illustrated with the t–d series:
Correspondence in initial position
dʰ tʰ Erevan t
dʰ tʰ Istanbul d
tʰ Kharpert, Middle Armenian d
tʰ Malatya, SWA
tʰ Classical Armenian, Agulis, SEA t
tʰ Van, Artsakh t tʰ
But of course Khitan had only two obstruent series, not three.
Might the use of certain spellings correlate with certain locations
and/or time periods? They would then reflect the obstruent series of
different regional/chronological varieties of Khitan. The unspoken
assumption of Khitan studies is that the language was homogeneous over
a wide area for a long period, but that is unlikely.
Another possibility is that Khitan was like modern Korean in which unaspirated obstruents have voiced and voiceless allophones conditioned by different environments: Sino-Korean 德 /tək/ appears as
[dək] after a sonorant
Could 254.020 <d.ei> ~ 247.020 <t.ei> transcribing Liao
Chinese 德 (Kane 2009: 253) have had a similar distribution?
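The Korean-style possibility can be sketched as a single allophonic rule. This is only an illustration: the sonorant set, the stop-to-voiced mapping, and the function name surface_initial are my assumptions, and nothing here is attested Khitan data.

```python
# Hypothetical sketch of Korean-style allophony: an unaspirated stop is
# voiced after a sonorant and stays voiceless elsewhere.
SONORANTS = set("aeiouəɨmnŋlr")   # assumed, simplified inventory
VOICED = {"p": "b", "t": "d", "k": "ɡ"}  # assumed stop pairs


def surface_initial(prev_segment, syllable):
    """Return the surface form of `syllable` given the preceding segment.

    `prev_segment` is "" when the syllable is utterance-initial.
    """
    first = syllable[0]
    if prev_segment in SONORANTS and first in VOICED:
        return VOICED[first] + syllable[1:]
    return syllable


print(surface_initial("n", "tək"))  # after a sonorant
print(surface_initial("", "tək"))   # utterance-initial
```

If the Khitan <d> ~ <t> spellings followed a rule like this, the choice of character should be predictable from the preceding segment in each inscription.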
A final possibility is that Khitan was like Burmese in which etymological voiceless consonants may be voiced in close juncture. Wheatley (2009: 729) explains that in Burmese,
[c]lose juncture is characteristic of certain grammatical environments [...] But within compounds the degree of juncture between syllables is unpredictable; the constituents of disyllabic compound nouns (other than recent loanwords) tend to be closely linked, but compound verbs vary, some with open, some with close juncture.
The above possibilities are not mutually exclusive for Khitan.
THE KHITAN EMPEROR SHENGZONG IN UNICODE
Today I discovered that lookalikes for all four Khitan large script characters for 聖宗皇帝 'Emperor Shengzong' (r. 979-1031) exist in Unicode:
Of course it's only the first two characters that are interesting; they are unknown to nearly all literate in Chinese. The last two are identical to Chinese 皇帝 'emperor'.
'Emperor Shengzong' exemplifies how the Khitan large script to a Chinese eye is a mix of familiar and alien elements. The first two characters combine familiar elements
夕 'evening' + 卞 'hat' = 𫝢
亻 'person' + 及 'to reach' = 伋
in unfamiliar ways.
𫝢 turns out to be a variant of 升 'to rise', which in turn was a homophone of 聖 *šiŋ 'sage' in Liao Chinese aside from its tone. 𫝢/升 and 聖 were not homophones until the late first millennium AD, so the use of 𫝢 for 'sage' may date from the Liao dynasty and is probably not a carryover from the pre-Liao Parhae script hypothesized by Janhunen. Why didn't the Khitan simply recycle 聖 'sage' the way they recycled 皇帝 'emperor'? Was 聖 'sage' too complex for the Khitan large script which favored a low number of strokes per character?
In Chinese, 伋 is a name character of no known meaning. (It is the birth name of Confucius' grandson 子思 Zisi.) It would have been pronounced *ki in Liao Chinese and not 宗 *tsuŋ like 'ancestor'. So the reasoning for 伋 as 'ancestor' is unclear (though at least the 亻 'person' radical makes sense). Might a Khitan or even a Parhae word for 'ancestor' have sounded something like *ki?
(5.9.9:39, revised 14:16: Was 伋 a semantic compound invented by
someone who might
not have known about the rare character 伋? But I know of no
semantic compounds unique to the Khitan large script. The closest
instance I can think of is
which consists of 天 'heaven' over 土 'earth'. It is not a true
semantic compound because it does not represent a word for 'heaven and
earth' or 'world' (the sum of 'heaven and earth'); 土 'earth' seems to
disambiguate an unknown Khitan word for 'heaven' from 天 for
a borrowing from Liao Chinese. The semantic function, if any, of 及 'to
reach' in 伋 'ancestor' is less clear.
The Dictionary of Chinese
Character Variants has no 伋-like variants of 宗.
What I will call Janhunen's
Question remains unanswered: If the Khitan
wanted a script to distinguish themselves from the Chinese, why did
they keep or replace characters seemingly at random? I still think the
only possible answer is that they didn't do that - rather, they adapted
a sister script of Chinese [Janhunen's hypothetical Parhae script]. The
situation is somewhat parallel to that of Cyrillic which is related to
the Latin alphabet but not derived from it; they are 'cousins', not
'daughter' and 'mother'.)
Although the shapes of 皇帝 'emperor' are uninteresting, the question of how we know their readings is worth examining. Kane (2009) reads them as <hoŋ di> (= <ghong di> in the transcription system on this site).
However, I have not found any Khitan small script phonetic spelling
of the first half of 皇帝
'emperor' or any of its homophones in Chinese. I
would expect such a spelling to be 340.071 <h.ong> with voiceless
340 <h> rather than voiced-initial 076 <gho>. (There is no
known small script character <gh> without a vowel, and *ɣ
devoiced to *x in Liao Chinese.) No spelling <h.ong> is
in Qidan xiaozi yanjiu (1985: 460). Has such a spelling been
found in the thirty-plus years since the publication of that book?
Kane (2009: 244) lists 247.339.339 <t.i.i> as a small script spelling of the second half of 皇帝 'emperor'. Unfortunately, he does not cite a source for this spelling, and it is not in Qidan xiaozi yanjiu (1985: 375). I presume <t.i.i> is from an inscription discovered after Qidan xiaozi yanjiu was written. The <t> of <t.i.i> does not necessarily invalidate Kane's reading di for 帝 since the Khitan transcribed Liao Chinese *t as both <t> and <d>, and they transcribed Liao Chinese *i as both <i> and <i.i>.
5.9.0:33: Why is the name character 伋 glossed in English as 'deceptive' at zdic.net?
5.9.0:49: Kane (2009: 181) also lists a second Khitan large script character ⿰歹卞 for 聖 'sage' with 歹 'bad' on the left instead of 夕 'evening' from Liu and Wang (2004: 27, character 150). That character has no Unicode lookalike; it is character 0177 in N4631 ("Proposal on Encoding Khitan Large Script in UCS") which does not seem to list 𫝢 from Kane (2009: 183). Where is 𫝢 attested? Regardless of whether 𫝢 is an error for ⿰歹卞 and hence not a real Khitan large script character, I have no doubt that ⿰歹卞 is a variant of the Chinese character 𫝢 and is a phonetic loan for 聖 'sage'.
I also think that 𫝢 / ⿰歹卞 <shing> may have been the
inspiration for the vaguely similar Tangut character
whose Tangraphic Sea analysis has been lost.
5.9.22:31: Are Khitan large script characters
1054 (升 + a dot on the right)
1056 (1054 with the first stroke 丿 stretching over both vertical strokes of 廾 plus a dot on the right)
in N4631 further variants of 𫝢 / ⿰歹卞 <shing>?
5.10.1:49: Chinggeltei's 關於契丹文字的特點 (1997: 110) includes
𫝢 in its list of Khitan large script characters.
OBLIQUE AFFRICATES IN CHINESE
Today on Wikipedia
I saw that standard Mandarin 斜 xie [ɕjɛ] 'oblique' corresponded
with affricate initials. The colloquial reading preserves an earlier -a going all the way back to Old Chinese; the literary reading has an innovative raised vowel [ɪ].
The dictionary Middle Chinese initial is *z-. Other
dialects of Middle Chinese might have had *dz-. In any case,
the Old Chinese word began with *sɯ-, though what was between
that *sɯ- and *-a is not clear: *sɯ.ɢa, *sɯ.ja,
and *sɯ.la are all possible. There is no known external
comparison that could narrow down the possibilities. The character 斜
has the phonetic 余 *Cɯ.la, but the character 斜 dates from Han
times, and at that point *ɢ, *j, and *l might have
already merged into *j. (邪 'slant' - a homophone of 斜 in Middle
Chinese - may be a pre-Han spelling of the same word. But its phonetic
牙 has a velar nasal initial *ŋ-!)
My hypothetical Middle Chinese *dz- might be from *sɯ.ɢ-
> *s.ɢ- > *zɢ- > *zd- > *dz-.
But it's more likely that it results from a Late Old Chinese or Middle
Chinese confusion of *z- with *dz-. Japanese merged *z-
and *dz- into /z/ which is now [dz] initially, [z] medially,
and [ddz] when geminated.
Xiaoxuetang reports affricate initials in 斜 in
Mandarin: 天長 Tianchang [tsʰ] (the sole Mandarin example on the site)
Wu: 丹陽 Danyang [dʑiɑ] ~ [dʑiɒ], etc.
(Hui: no data; NB: this 徽 Hui is not the Mandarin-speaking Muslim 回 Hui, whose name is pronounced with a different tone)
Gan: 湖口 Hukou [dʑia], etc.
Xiang: 雙峰 Shuangfeng [dʑio], etc.
Min: 廈門 Amoy [tsʰia] (colloquial; literary [sia]), etc.
Yue: Cantonese [tsʰɛ] (where long ago I first observed this affricate initial corresponding to Middle Chinese *z-; I didn't know such an initial was in Mandarin too)
Ping: 永福 Yongfu [tsʰiə], etc.
Hakka: 梅縣 Meixian [tsʰia] (colloquial; literary [sia]), etc.
The affricate initial is represented in nearly every branch. No Jin variety on that website has an affricate reading. But all but one of the unclassified varieties have an affricate initial.
It seems that literary varieties of Middle Chinese kept *z- (> modern [s]) apart from *dz- while colloquial varieties merged them to various extents.
5.8.13:40: For comparison, let's see if the above dialects also have affricates for Middle Chinese 徐 *zɨə 'to walk slowly; a surname':
Mandarin: 天長 Tianchang [tʃʰʮ], etc.
Wu: 丹陽 Danyang [dʑyz] (sic), etc.
Hui: 旌德 Jingde [tsʰʮ], etc.
Gan: 湖口 Hukou [dzi], etc.
Xiang: 雙峰 Shuangfeng [dy] (sic) ~ [dʑy], etc.
Min: 廈門 Amoy [tsʰi] (colloquial; literary [su]), etc.
Yue: Cantonese [tsʰœy], etc.
Ping: 永福 Yongfu [tsʰy], etc.
Hakka: 梅縣 Meixian [tsʰi], etc.
The only Jin variety with a reading is the most well-known: 太原 Taiyuan [ɕy]. 徐 is a common surname, so it must be in other Jin varieties. The absence of affricates in Jin readings of 斜 'oblique' makes me guess that 徐 also lacks affricates in the rest of Jin, but I don't know.
Unclassified varieties have a mix of initials: e.g.,
富川 Fuchuan [sy]
鍾山 Zhongshan [θy]
賀州 Hezhou [ty] (cf. the stop [d] in Shuangfeng above)
道縣 Daoxian [tso]
連州 Lianzhou [tsʰɛu]
To work out what's going on with them would require studies of their individual phonologies. It is a shame that Xiaoxuetang doesn't seem to have initial, rhyme, and tonal inventories online for each variety. In theory I could extract inventories from the data, but I don't have the time to do that right now.
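As a small first step toward that kind of extraction, the initials of the unclassified readings of 徐 listed above can be sorted mechanically. This is a rough sketch: the three-way classification, the segment lists, and the name initial_type are my own assumptions, not anything provided by Xiaoxuetang.

```python
from collections import Counter

# Readings of 徐 in the unclassified varieties, copied from the list above.
XU_READINGS = {
    "Fuchuan": "sy",
    "Zhongshan": "θy",
    "Hezhou": "ty",
    "Daoxian": "tso",
    "Lianzhou": "tsʰɛu",
}

# Assumed segment classes; affricates must be checked before plain stops
# because they also begin with t or d.
AFFRICATES = ("ts", "dz", "tɕ", "dʑ", "tʃ")
STOPS = ("t", "d")
FRICATIVES = ("s", "z", "θ", "ɕ")


def initial_type(reading):
    """Classify the initial of an IPA reading as affricate, stop, or fricative."""
    if reading.startswith(AFFRICATES):
        return "affricate"
    if reading.startswith(STOPS):
        return "stop"
    if reading.startswith(FRICATIVES):
        return "fricative"
    return "other"


types = {place: initial_type(ipa) for place, ipa in XU_READINGS.items()}
tally = Counter(types.values())
print(types)
print(tally)
```

The same function could be run over the readings of 斜 to check whether a variety treats the two words alike.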
HAVE A ČĪZBURGERU: ENGLISH BORROWINGS IN LATVIAN
After mentioning Latvian datums last time with its combination of a Latin neuter suffix -um and a Latvian masculine suffix -s, I was curious to see how Baltic languages dealt with a recent influx of English loans. Baltic languages and Greek are the only modern Indo-European languages I know of that still retain ancient -s suffixes in the nominative case.
I guessed that all Latvian borrowings of English consonant-final
stems would be placed in the first
masculine declension like datums. And it does seem that is
generally the case. See these two
Even sibilant-final stems are assigned to that declension: e.g., bizness
(which is biznes-s and not copying the -ss of the
English spelling) and finišs (< finish + -s).
I might have expected them to be assigned to the second declension
with -is or the third declension with -us.
The exceptions I've seen so far end in -er in English:
adapteris < adapter
menedžeris < manager
peidžeris < pager
porteris < porter
taimeris < timer
Were they assigned to the second declension by analogy with some earlier wave of -eris loans?
Not all English -er words become -eris words in Latvian: cheeseburger has become čīzburgers (with an un-English pronunciation of burger with [u] - †čīzberger would have been closer to the English original). Maybe -burger is by analogy with hamburgers, perhaps in turn influenced by Russian <gamburger>, also with [u]? No, maybe -burger is simply based on a spelling pronunciation.
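The pattern in the examples above can be summed up as a rule of thumb. This is only a sketch of what I have seen so far, not a description of Latvian grammar: the function name latvian_nominative and the exception list are my assumptions, and the inputs are stems already respelled in Latvian orthography.

```python
# Assumed exception: -burger loans stay in the first declension.
FIRST_DECLENSION_EXCEPTIONS = ("burger",)


def latvian_nominative(stem):
    """Guess the nominative singular of a consonant-final English loan.

    Stems in -er get second-declension -is unless they end in an assumed
    exception; everything else gets first-declension -s.
    """
    if stem.endswith("er") and not stem.endswith(FIRST_DECLENSION_EXCEPTIONS):
        return stem + "is"
    return stem + "s"


for stem in ["adapter", "menedžer", "taimer", "biznes", "finiš", "čīzburger"]:
    print(stem, "->", latvian_nominative(stem))
```

Any -er loan that turns up with plain -s outside the -burger family would show that the exception list is too short.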
THE GENDER OF 'DATE' IN BALTO-SLAVIC AND ROMANCE
On the same Wiktionary page as Dutch datum 'date' (masculine despite its Latin neuter ending -um!) are
Czech datum (neuter); cf. Slovak dátum (masculine; why a long á that doesn't match Czech or Latin?; its neighbor Hungarian dátum also has a long vowel)
Serbo-Croatian and Slovene datum (masculine)
is also masculine. The shift to masculine in Slavic is understandable
since consonant-final nouns are generally masculine, and Latin -um
is not a Slavic suffix and hence prone to reinterpretation as the ending
of a stem.
Leaving Slavic, Latvian has no neuter, and its feminine stems generally end in vowels, so masculine datums is also understandable.
However, Latvian's sister Lithuanian has feminine data
(which looks like the Latin plural!) rather than masculine †datumas
on Lithuanian declension).
And going back to Slavic, Polish also has feminine data, and Bulgarian, Macedonian, Belarusian, Russian, and Ukrainian have feminine <data>. Romance languages have feminine data (French date and Romanian dată) too. Wiktionary derives the Romance forms from a Late Latin data. fdb explains:
Italian, Spanish, Portuguese (etc.) data, and French date (whence English date) are all taken from Mediaeval Latin data, the plural of classical Latin datum, but reinterpreted in these languages as a singular noun. German and Dutch use the classical singular form datum.
All of these are bookish borrowings from Mediaeval or Classical Latin (so-called cultisms) and not organic descendants of the Latin words.
[Someone asks what organic descendants would look like.] In that case one would expect *dada in Spanish, Portuguese and Italian.
Are the -um forms in Slavic and Latvian borrowings from
5.6.0:01: English date then got borrowed into German as das Date which is presumably neuter by analogy with Datum.
5.6.0:09: Added quotation from fdb.
5.6.0:28: Danish date from English has common gender (cf. German above).
5.6.0:32: Added Romanian dată.
THE GENDER OF DUTCH '-ISM'S AND 'DATE'
Not in time for May Day ...
is masculine, as is its Latinized German equivalent Kommunismus
with a restored Latin masculine nominative singular ending -us.
So why is Dutch communisme
(and other -isme words like socialisme)
has a Latin neuter nominative singular ending -um and is still
neuter in German. So why is Dutch datum
masculine unlike, say, neuter museum
which is still neuter in Dutch?
Are the genders by analogy with semantically similar words? Was there ever a time when de communisme and het datum were acceptable?
5.5.0:33: Google Books has examples of het datum from the 18th and 19th centuries. But I can't find any examples of de communisme in Dutch (as opposed to French where that is a preposition-noun sequence rather than a definite article-noun sequence).
Treffers-Daller (1994: 140) discusses French-Dutch gender mismatches and mentions Van Marle's hypothesis that French borrowings are marked and may receive the marked gender: the less frequent neuter gender (only 25% of Dutch nouns are neuter according to Tuinman 1967).
She also writes,
According to Volland (1986), many French loans obtain neuter gender when borrowed into German. About 60 percent of the borrowings keep the original gender in German, and 40 percent are allocated another gender. In most cases it is the masculine nouns who become neuter in German. It is remarkable that the same tendency for masculine words to become neuter exists in German and in Dutch.
Obviously Kommunismus is not one of those masculine words (though its -us may have made it resistant to gender shift).
CZECH VOWEL ASYMMETRY AGAIN
Judging from the IPA for Czech at Wikipedia, Czech vowels are phonetically as well as distributionally asymmetrical:
||/u uː/ [u uː]
|/o oː/ [o oː]
|/e eː/ [ɛ ɛː]
|/a aː/ [a aː]
The front part of the system 'tilts downward' with the exception of /iː/ which is high.
Short /i/ is lower than long /iː/ and has no back counterpart at the
How did this system come about? /i iː/ are from earlier front *i
*iː and central *ɨ *ɨː.
Was there a Ukrainian-like phase in which the central high vowels became *ɪ *ɪː? (Ukrainian has no phonemic vowel length, though.) The four front vowels in stage 2 then merged into an English-like subsystem with a higher long vowel and a lower short vowel in stage 3:
Unlike Czech, Slovak is next door to Ukrainian, and according to the IPA at Wikipedia it has no [ɪ]; its vowel system is truly symmetrical on the phonetic level if one ignores the increasingly marginal vowel [æ]:
|[e eː]||[o oː]
The Slovak phonology article at Wikipedia, however, paints a more complex picture: e.g., /e eː/ [e̞ e̞ː] may be phonetically higher than /o oː/ [ɔ̝ ɔ̝ː] - the reverse of Czech. (Did the presence of low [æ] - a vowel absent from Czech - incentivize speakers to raise /e eː/ for greater contrast during its heyday in the past?) Nonetheless it seems that length is not correlated with height differences unlike Czech where short and long /i/ have different heights.
Like Czech /i iː/, Slovak /i iː/ are from earlier front *i *iː and central *ɨ *ɨː. So I suspect Slovak also had a Ukrainian-like phase in which the central high vowels became *ɪ *ɪː.
But maybe at some earlier point Czech and/or Slovak had a Rusyn-like stage in which central *ɨ *ɨː coexisted with front *ɪ *ɪː. I still don't understand how Rusyn can have both central /ɨ/ and front /ɪ/ since I assume both are from *ɨ. Are they in complementary distribution? Is one native and one borrowed?
5.4.0:40: Are Czech /e eː/ lower mid because they merged with */ě/
*[ɛː]? */ě/ was historically long, but its reflexes in Czech are both
long and short for reasons I don't understand:
*bělъjь > bílý /biːliː/ 'white'
*světъ > svět /svjet/ 'world'
The short reflex is /e/ which may be preceded by a secondary palatal consonant: e.g., /j/ in the case of /svjet/.
CZECH VOWEL ASYMMETRY
The standard Czech vowel system appears symmetrical if one only
looks at vowels in isolation. Each short vowel has a long counterpart:
And the diphthongs form a triangle:
But distribution tells a more complex story.
Original *uː became /ou/ except "chiefly in noun prefixes"
(Short 1993: 456): e.g., úraz 'injury' but urazit 'to
injure'. Why was the prefix *u lengthened to an *uː
later preserved in nouns? I still don't understand the backstory of
length in Slavic.
Loanwords supplied a new /oː/ and /au eu/ to balance /ou/.
Those back vowel developments did not have exact front vowel parallels. *iː did not become †/ei/ (though Short 1993: 464 reports ý /ɨː/ > /ej/ in colloquial Czech), and *eː only sometimes became /iː/ (Short 1993: 464).
INDEPENDENT VOWEL SYMBOLS IN THE INDIC SCRIPTS OF THE PHILIPPINES
- dependent vowel symbols attached to/in 'orbit' around consonant symbols
- independent vowel symbols
Depending on the script, vowels may be written with dependent vowel
symbols plus a carrier <°a>, independent vowel symbols, or a mix
of the two.
The Indic scripts of the Philippines generally only have three independent vowel symbols each, and on closer observation, some of those symbols are derived from others:
Baybayin for Tagalog on central Luzon in the north: three truly independent symbols <°a °i °u>
Hanunoo on southern Mindoro in the center: independent <°a °u:>; <°i> looks like <°a> plus a stroke on the bottom right (unlike either the dependent vowel <i> on the top or the dependent vowel <u> on the bottom)
Buhid on southern Mindoro in the center: independent <°a °u>; <°i> looks like <°a> plus a stroke on the bottom (like the dependent vowel <u> rather than the dependent vowel <i> on the top)
Tagbanwa on Palawan in the southwest: <°a °i> have the same basic shape with different extra strokes: one on the bottom for <°a> and another on top for <°i> (neither stroke matches the dependent vowel <u> on the bottom or the dependent vowel <i>); only <°u> is not derived from another symbol
Kulitan on central Luzon in the north: independent <°i °u>; <°a>
looks like <°u> plus an extra stroke on the bottom left (unlike
the dependent vowel <u> on the bottom); <°e °o> look like
<°a°i> and <°a°u>, reflecting their apparent origin as "monophthongized
Tagalog is the most conservative; it alone preserves three completely different vowel symbols that still resemble their Indic prototypes.
It is not surprising that the Mindoro scripts have the same innovation (replacing <°i> with a <°a>-derivative).
Tagbanwa and Kulitan seem to have each gone their own way. Tagbanwa is isolated by the sea, but Kulitan is next door to Baybayin.
WHAT HAPPENED TO UKRAINIAN NOMINATIVE PLURAL ADJECTIVES?
I almost 'corrected' Ukrainian <zorjani> 'stellar (nom. pl.)' to †<zoryany> with a <y> ending that I expected by analogy with Russian <ye> and Belarusian <yja> < *-ye after 'hard' (nonpalatalized) stems. But the nominative plural ending is <i> regardless of stem type. Compare:
||m. nom. sg.
||m. nom. sg.
Did <i> spread by analogy through all adjective paradigms despite the fact that hard stems outnumber soft stems (which would have led me to guess that <y> would win out)? Did the higher frequency and lower markedness of <i> in Ukrainian help it to defeat its less palatal competitor <y>?
5.1.0:07: Added table.
5.1.22:22: Maybe Ukrainian shares an areal feature with Polish which has soft nowi 'new (m. pers. nom. pl.)' instead of †nowy. (But the non-m. pers. nom. pl. is still hard nowe rather than soft †nowie.)
Slovak, another neighbor of Ukrainian, has a mixed pattern like
Polish: soft noví 'new (m. anim. nom. pl.)' ~ hard nové
(other nom. pl.). A consistently hard paradigm would have †nový
~ nové and a consistently soft paradigm would have noví
~ †novie. (Both í and ý are /iː/, but in the
past I assume ý was something like /ɨː/. No long /ieː/ exists.)
So does Czech: noví 'new (m. anim. nom. pl.)' instead of †nový. (As in Slovak, both í and ý are /iː/, but in the past I assume ý was something like /ɨː/.) Unlike any of the above languages, Czech has three types of nominative plurals:
1. soft noví 'new (m. anim. nom. pl.)'
2. hard nové 'new (m. inanim. + fem. nom. pl.)' instead of soft †noví < *-ie
3. hard nová 'new (neut. nom. pl.)' instead of soft †noví < *-ie < *-a̋
Interslavic doesn't have a 'soft' e, so the non-m. anim. nom. pl. has to be hard:
soft novi 'new (m. anim. nom. pl.)'
hard nove 'new (other nom. pl.)'
This two-way distinction is hard for me to grasp since I'm accustomed to Russian having a single form for both categories.
STAR WARS IN SLAVIC
Having just linked to the Belarusian Wikipedia's entry on Star Wars, I was surprised by how Star was translated as <Zornyja> which isn't cognate to the 'star' word in most of the other Slavic titles for the movie:
Bosnian zvijezda 'star'
Croatian Zvjezdani 'stellar'
Serbian zvezda 'star'
Slovenian zvezd 'of the stars'
Bulgarian <Mežduzvezdni> 'interstellar'
Macedonian <zvezdite> 'the stars'
Polish Gwiezdne 'stellar'
Silesian Gwjezdne 'stellar' (did an author of this article translate the title?)
Slovak Hviezdne 'stellar'
Russian <zvëzdnye> 'stellar'
The exceptions are Ukrainian <Zorjani> 'stellar' and Czech Star
I was expecting a Belarusian adjective derived from <zvjazda>
'star' (the name of this
newspaper that I've seen online) - something like Interslavic zvězdne.
4.30.1:30: Filled out the list of equivalents of Star and added the final note about <Zorjani>.
4.30.21:21: I might as well survey the second half of the title in Slavic as well. I'm going to guess that it's some cognate of Belarusian <vojny> 'wars' almost everywhere: cf. Interslavic vojny 'wars'. I seem to recall an exception other than the untranslated Wars in Czech - ah, it was Serbo-Croatian!
Serbo-Croatian ratovi 'wars' (but would vojne be theoretically possible?)
Slovenian vojna 'war'
Bulgarian <vojni> 'wars'
Macedonian <vojna> 'war'
Polish and Silesian wojny 'wars'
Slovak vojny 'wars'
Ukrainian <vijny> 'wars' (nom. pl. of <vijna>; as with <zirka>, why did *o become <i> even without a following *ь or *ъ?)
4.30.23:23: Duh, the word was *vojьna in Proto-Slavic. And I suppose <zirka> 'star' is from an earlier *zorьka or *zorъka.
Russian <vojny> 'wars'
Serbo-Croatian rat turns out to be the cognate of Ancient Greek ἔρις éris 'strife' ... and English earnest!? I see the word is in East Slavic as well, but not West Slavic, so vojna is the best choice for Interslavic since it's understood across the entire family.
TABLES AND FALCONS: THE FATE OF FINAL *L IN SLAVIC
Polish kiełbasa /kʲewbasa/ from my last two entries is spelled with ł but is no longer pronounced with an [l].
Standard Polish once had three kinds of phonetic laterals, but only
two survive today: a palatal allophone before /i/ and a dental
||Example (from de Bray 1980:
The reflexes of Polish laterals seem straightforward: old hard *l becomes /w/ and old soft *lʲ becomes /l/.
Hence *stolъ 'table' and *sokolъ 'falcon' became
Polish stół /stuw/ and sokół /sokuw/.
(I can't predict when *o became ó /u/.)
What does not seem straightforward to me is the fate of syllable-final *l in Ukrainian, Belarusian, and Serbo-Croatian. There is a tendency toward shifting syllable-final *-l to /w w o/ in those languages: e.g.,
Ukrainian /stojaw/ 'stood' (masc. sg.) < *-l
Belarusian /stajaw/ 'stood' (masc. sg.) < *-l
Serbo-Croatian /stajao/ 'stood' (masc. sg.) < *-l
The best-known example might be Serbo-Croatian /beograd/ (cf.
English Belgrade reflecting earlier *l).
Nonetheless, 'table' and 'falcon' may retain *-l:
Ukrainian /stil/, /sokil/
Belarusian /stol/, /sokal/
Serbo-Croatian /sto/ (Serbia) ~ /stol/ (Croatia), /soko/ (Bosnia, Serbia) ~ /sokol/ (Croatia)
(Countries are from Wiktionary entries.)
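The forms above can be tabulated to show where final *l survives. This is just a bookkeeping sketch over the forms cited in this entry; the dictionary layout and the name final_l_status are my own assumptions.

```python
# Phonemic forms cited above; lists hold regional variants.
FORMS = {
    "Ukrainian": {"stood": ["stojaw"], "table": ["stil"], "falcon": ["sokil"]},
    "Belarusian": {"stood": ["stajaw"], "table": ["stol"], "falcon": ["sokal"]},
    "Serbo-Croatian": {
        "stood": ["stajao"],
        "table": ["sto", "stol"],
        "falcon": ["soko", "sokol"],
    },
}


def final_l_status(variants):
    """Report whether final *l is retained, shifted, or varies by variant."""
    kept = [form.endswith("l") for form in variants]
    if all(kept):
        return "retained"
    if not any(kept):
        return "shifted"
    return "varies"


summary = {
    language: {gloss: final_l_status(forms) for gloss, forms in words.items()}
    for language, words in FORMS.items()
}
for language, row in summary.items():
    print(language, row)
```

Laid out this way, the past tense column is shifted everywhere, while the two nouns are retained in East Slavic and vary within Serbo-Croatian.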
In Belarusian, word-final *l remains except in the past tense masculine singular (Mayo 1993: 893). (Did it erode there due to high frequency?)
The situation in Ukrainian seems similar, though I know of one case
of /w/ < *l that is not a past tense masculine singular: /piw/
< *polъ 'half'.
Could /l/ retention in Croatian stol 'table' be motivated by
avoiding homophony with 'hundred' which is /sto/ across Slavic? That
doesn't explain Croatian sokol 'falcon', though. Browne and Alt
(n.d.: 20) write,
In adjectives and nouns it [*l > o] is widespread though some words avoid it: masculine singular nominative mio [< *mil] 'nice', feminine mila, but ohol 'haughty', feminine ohola.
I assume borrowings postdating *l-shifts retain final -l in Serbo-Croatian: e.g., hotel (not †hoteo).
Ukrainian and Belarusian seem to favor borrowing foreign -l-words with /lʲ/:
U /hotelʲ/, B /hatelʲ/ 'hotel'
U /alkoholʲ/, B /alkaholʲ/ 'alcohol'
but U <mark hemill> and B <mark hèmil>, both /mark hemil/ 'Mark Hamill'. (The B form is from the B Wikipedia entry for the original Star Wars [Зорныя войны. Эпізод IV: Новая надзея].)
4.29.21:57: Added Mayo on Belarusian, Ukrainian /piw/, Browne and Alt quotation, and everything after that.
IRREGULARITIES IN 'KIELBASA' REVISITED
Yesterday I discovered in de Bray's (1980: 258) book on West Slavic that Polish kiełbasa /kʲewbasa/ is in fact the regular reflex of an earlier *kl̩basa (cf. Slovak klbása ~ klobása). I assume his hard *l̩ goes back to Proto-Slavic *ъl.
But I still don't know how to account for the front vowels of
Ukrainian ківбаса <kivbasa> < *kilbasa
Belarusian кілбаса <kilbasa> ~ келбаса <kelbasa>
Are they borrowings of forms resembling Polish kiełbasa or
pre-Polish (proto-West Slavic?) *kl̩basa? If they are from *kl̩basa,
their front vowels could have been inserted to avoid /klb/-clusters
that are not possible in East Slavic.
My guess is that Belarusian келбаса <kelbasa> is a borrowing from Polish kiełbasa, whereas Belarusian кілбаса <kilbasa> is an older form with an epenthetic vowel.
Ukrainian ківбаса <kivbasa> was presumably borrowed as *kilbasa before *l > <v> /w/. I don't think it's from Polish since
- the height of the first vowel doesn't match
- Polish ł apparently became [w] in the standard language only in the early twentieth century (Wikipedia); Morfill (1884: 1) says it is "a very strong l", not [w].
- Polish ł is still [ɫ̪] and not [w] in eastern dialects of Polish in contact with Ukrainian (Wikipedia)
A recent borrowing from the modern standard pronunciation of kiełbasa
would be †<kevbasa> and a borrowing from a pre-20th century
standard pronunciation or an eastern dialectal pronunciation would be
IRREGULARITIES IN 'KIELBASA'
Wiktionary derives Polish kiełbasa /kʲewbasa/ and its relatives from a Proto-Slavic *kъlbasa, in turn borrowed from some Turkic word similar to modern Turkish külbastı 'roasted meat', lit. 'ash-pressed'. Irregularity within Slavic implies that the word was borrowed more than once.
The Polish word and nonstandard forms like
Ukrainian ківбаса <kivbasa>
Belarusian кілбаса <kilbasa> ~ келбаса <kelbasa>
have front vowels /i e/ that I would not expect from Proto-Slavic *ъ.
*külbasa < *kölbasa < *kolbasa (cf. standard Ukrainian ковбаса <kovbasa>) < *kъlbasa
but *o only raises to і in standard Ukrainian before a lost weak jer (*ъ or *ь) which wasn't in this word. Maybe the <kivbasa> dialect worked differently.
My current guess is that the /i e/ vowels in Polish, Ukrainian, and Belarusian reflect attempts to imitate Turkic ü and are not from *o or *ъ.
The Belarusian forms have /l/ instead of /w/ < *l corresponding to Ukrainian <v> /w/ < *l and Polish ł < *l. This suggests that the Belarusian borrowings postdate the shift of *l to /w/ in Belarusian. But maybe I misunderstand when *l becomes /w/ in Belarusian.
A SHARED *SHCH-IFT IN CHINESE AND RUSSIAN
Last Friday (yes, I'm behind), I saw
新商品 'new product', lit. 'new trade item'
In Old Chinese, 商 was
either *sɯ-taŋ (corresponding to Baxter and Sagart 2014's *s-taŋ)
or *sɯ-laŋ (corresponding to Schuessler 2009's *lhaŋ)
and in Middle Chinese, it was *ɕɨaŋ.
It occurred to me that the palatalization of *sɯ-t- to *ɕ-
*sɯ-t- > *sɯ-tɨ- > *stɨ- > *stɕɨ- > *ɕtɕɨ- > *ɕːɨ- > *ɕɨ-
was like what I understand to be the palatalization of *stj- to [ɕː] in Russian:
*stj- > *stɕ- > *ɕtɕ- > щ [ɕː]
Above I presume there was an intermediate *ɕtɕɨ-stage at some point in Old Chinese resembling romanizations of Russian щ as šč or shch (e.g., Хрущёв Khrushchev), but without external evidence (e.g., Old Chinese transcriptions of a foreign word with šč-), it's impossible to say when that point was.
3.14.11:45: I assume that Russian alternations such as
вместить 'to contain (perf.)' ~ вмещу 'I will contain'
can be internally reconstructed as
*vmestitĭ ~ *vmestju
to fit the pattern of
вменить 'to consider (perf.)' ~ вменю 'I will consider'
< *vmenitĭ ~ *vmenju
Ideally I'd like to find an example of initial щ- [ɕː] from *stj-,
but I think initial щ [ɕː] is normally from *sk-. A possible example
I found in Preobrazhensky's Etymological Dictionary of the Russian
Language is щегол 'goldfinch'; Duden says
German Stieglitz 'goldfinch' is of Slavic origin.
Proto-Slavic *štjegŭlŭ? > *ščegŭlŭ
Ukrainian щиголь <ščyhol'>, щоголь <ščohol'>, щоглих <ščohlix>
Belarusian щигель <ščihel'>, щиглик <ščiglik> (I have kept Preobrazhensky's spellings with щ and и instead of modern шч and і)
(why -ль as if from *-lĭ?)
(no South Slavic reflexes? I would expect Bulgarian initial щ- [št], Serbo-Croatian initial št-, and Slovene initial šč-)
Czech stehlec, stehlík (with ste- rather than the regular ště- [ʃcɛ] - could this be a borrowing from some variety of German in which st- was [st] instead of [ʃt]?)
Polish szczygieł [ʂtʂɨɡʲɛw]
Upper Sorbian šćihlica [ʃtsʲihlitsa]
Lower Sorbian ščgeľc [ʂtʂgɛlts] (I have kept Preobrazhensky's spelling with ľ instead of modern l)
The reflexes of *stj- could have had parallels in Old Chinese at different stages and/or different places.
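The overlap between the Chinese and Russian chains given earlier in this entry can be checked mechanically. This is a minimal sketch: the stage lists are copied from the two chains, while stripping the Chinese medial *ɨ before comparing is my own simplifying assumption.

```python
# Stages copied from the two chains earlier in this entry (asterisks omitted).
CHINESE = ["sɯ-t", "sɯ-tɨ", "stɨ", "stɕɨ", "ɕtɕɨ", "ɕːɨ", "ɕɨ"]
RUSSIAN = ["stj", "stɕ", "ɕtɕ", "ɕː"]


def strip_medial(stage, medial="ɨ"):
    """Remove the medial vowel so that only the onset is compared."""
    return stage.replace(medial, "")


chinese_onsets = {strip_medial(stage) for stage in CHINESE}

# Russian stages whose onset also appears somewhere in the Chinese chain,
# kept in their Russian order.
shared = [stage for stage in RUSSIAN if stage in chinese_onsets]
print(shared)
```

Only the starting points differ: Russian begins with *stj- while my Chinese chain begins with a presyllable and develops its own medial.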
A LITTLE MISTAKE: ÍT ÓT TO BE THE PHONETIC
In my last post, I wrote that 乚 ất was the phonetic of the Vietnamese Chữ Nôm character 𡮒 ót 'a kind of fish'. After announcing that post on Twitter, I realized that the actual phonetic was 𠃝 which has two readings, ít 'little' and út 'youngest'. I didn't think of 𠃝 because 乙 appears as 乚 in 𡮒.
If the creator of 𡮒 had the reading út in mind for its
phonetic 𠃝, the score of 𡮒 would be 2 + 3 + 2 + 2 = 9 - much higher
than my original score of 6.
乙 is a 'Semitic phonetic': it can represent syllables with a wide range of vowels as long as those vowels are within the consonantal frame [ʔ-t]:
Neutral or achromatic vowels (neither palatal nor labial)
㲸 ướt [ʔɨət]
乙 ất [ʔət]
艺 ớt [ʔəːt]
𢖮 ắt [ʔat]
𢖮 át [ʔaːt]
𠃝 ít [ʔit]
𠮙 ét [ʔɛt]
𠃝 út [ʔut]
𡮒 ót [ʔɔt]
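The consonantal frame can be verified over the readings just listed. This is a small sketch using only the IPA given above; the function name nucleus and its default arguments are my assumptions.

```python
# IPA for the readings listed above, in the same order.
IPA_READINGS = ["ʔɨət", "ʔət", "ʔəːt", "ʔat", "ʔaːt", "ʔit", "ʔɛt", "ʔut", "ʔɔt"]


def nucleus(ipa, onset="ʔ", coda="t"):
    """Return the vowel between onset and coda, or None if the frame fails."""
    if not (ipa.startswith(onset) and ipa.endswith(coda)):
        return None
    return ipa[len(onset):len(ipa) - len(coda)]


nuclei = [nucleus(ipa) for ipa in IPA_READINGS]
fits_frame = all(n is not None for n in nuclei)
print(fits_frame, nuclei)
```

Every reading fits [ʔ-t], and no two of them share a nucleus, which is what makes the frame look like a consonantal root.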
All of those syllables have the sắc tone written with an
Syllables with initial glottal stops and final stops regularly develop
Such a range of vocalism for a phonetic is unusual in Chữ Nôm. In my 2003 book, I proposed that phonetics generally belong to three vowel classes: neutral, palatal, or labial.
'Semitic phonetics' are exceptions to that generalization: e.g., 曰 viết in
neutral: 曰 vất [vət], 抇 vớt [vəːt]
palatal: 𢪏 vít [vit], 𧿭 vết [vet], 𢪏 vét [vɛt]
labial: ⿰曰𡿨 vót [vɔt]
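The idea of a consonantal frame can be checked mechanically. Here is a minimal sketch that takes the IPA readings listed above for the 乙 and 曰 families, strips out the vowels, and reports the frame and vowel class of each syllable. The function names and the rule that a diphthong is classified by its first vowel symbol are my own assumptions for illustration, not part of the 2003 classification itself.

```python
# Vowel classes; classifying a diphthong by its FIRST vowel is an assumption.
VOWEL_CLASS = {
    "ɨ": "neutral", "ə": "neutral", "a": "neutral",
    "i": "palatal", "e": "palatal", "ɛ": "palatal",
    "u": "labial", "o": "labial", "ɔ": "labial",
}
LENGTH_MARK = "ː"

# Readings and transcriptions copied from the lists above.
AT_FAMILY = {
    "ướt": "ʔɨət", "ất": "ʔət", "ớt": "ʔəːt", "ắt": "ʔat", "át": "ʔaːt",
    "ít": "ʔit", "ét": "ʔɛt", "út": "ʔut", "ót": "ʔɔt",
}
VIET_FAMILY = {
    "vất": "vət", "vớt": "vəːt", "vít": "vit",
    "vết": "vet", "vét": "vɛt", "vót": "vɔt",
}

def split_syllable(ipa):
    """Return (consonantal frame, vowel class) for one IPA syllable."""
    core = ipa.replace(LENGTH_MARK, "")
    vowel_positions = [i for i, ch in enumerate(core) if ch in VOWEL_CLASS]
    first, last = vowel_positions[0], vowel_positions[-1]
    frame = core[:first] + "-" + core[last + 1:]
    return frame, VOWEL_CLASS[core[first]]

def summarize(readings):
    """Collect the set of frames and the set of vowel classes in a family."""
    frames, classes = set(), set()
    for ipa in readings.values():
        frame, vowel_class = split_syllable(ipa)
        frames.add(frame)
        classes.add(vowel_class)
    return frames, classes

for name, family in (("乙", AT_FAMILY), ("曰", VIET_FAMILY)):
    frames, classes = summarize(family)
    print(name, sorted(frames), sorted(classes))
```

Each family comes out with a single frame ([ʔ-t] or [v-t]) but all three vowel classes, which is what makes these phonetics 'Semitic'. Tone is deliberately ignored in this sketch.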
3.1.0:39: Compare the ranges of readings for 'Semitic phonetics'
above with those for کت <kt> listed in Hayyim's
(Of course, Persian is not a Semitic language, but it is written in
a Semitic script.)
One difference is that all of those k-t readings have no tones, whereas all of the readings for Chữ Nôm characters with the two 'Semitic phonetics' above have the same tone. Perhaps the term 'Semitic phonetic' is a misnomer if the consonantal frames are actually consonant-and-tone frames.
甘 cam is a third 'Semitic phonetic' whose derivatives below have readings with three different tones (ngang, huyền, sắc) as well as three different vowel classes:
neutral: 坩柑泔 cam [kaːm], 紺 cám [kaːm], ⿰月甘 cằm [kam], 𩚵 cơm [kəːm], 鉗 cườm [kɨəm]
palatal: 鉗 kìm [kim], ghìm [ɣim], kiềm [kiəm], kềm [kem], kèm [kɛm]
labial: 鉗 cùm [kum], 柑 cùm [kum]
Note, however, that all but one of the readings in that sample have either the ngang or huyền tones which are variants of the same proto-tone conditioned by voicing or its absence in proto-onsets. Also, only one of those characters is a made-in-Vietnam character (⿰月甘). 甘 was already a neutral and palatal phonetic in Middle Chinese because Old Chinese *a often had palatal reflexes after nonemphatic initials. An ideal example of a 'Semitic phonetic' would have many made-in-Vietnam derivatives with a wide range of vowels and tones. I should dig deeper to see if I can find one.
ÓT TO BE WRITTEN: FISHING FOR PHONETICS
The Vietnamese Chữ Nôm script represents Vietnamese syllables with existing and modified Chinese characters. The problem is that Vietnamese has many more syllables than Sino-Vietnamese, the subset of Vietnamese syllables that are Chinese character readings. For instance, Vietnamese has syllables ending in -ót, a rhyme absent from Sino-Vietnamese.
In my last two posts, I looked at Vietnamese solutions for writing the syllable lót.
I got curious about how other -ót syllables were written and
found several strategies. My examples are not exhaustive, and I have
omitted glosses in most cases since I am focusing on readings.
1. Overall match
⿰口脫 thót : 脫 thoát (score: 2 + 3 + 2 + 2 = 9; not a 10 only because the vowel heights don't match: o [ɔ] is higher than oa [wa], though I could be generous and say oa is like [o] + [a], and [ɔ] is between those two vowels in height)
2. Matching the onset and coda without much regard for the vowel
𡮒 ót 'a kind of fish' : 乚 ất (the unwritten onset is [ʔ]; score: 2 + 0 + 2 + 2 = 6)
㩢 mót : 蔑 miệt (score: 2 + 1 + 2 + 1 = 6; the only matching vowel quality is length*)
⿰曰𡿨 vót : 曰 viết (score: 2 + 1 + 2 + 2 = 7; the only matching vowel quality is length)
This is the consonantal skeleton or Semitic strategy. If English were written with such a strategy:
cat = drawing of a cat
Kate = <woman> + <cat>
kite = <wing> (representing flight) + <cat>
cut = <blade> + <cat>
coat = <clothes> + <cat>
coot = <bird> + <cat>
caught = <hand> + <cat>
Cf. the reverse Semitic strategy (5 below).
3. Matching the rhyme without much if any regard for the onset
𡁾 hót : 說 thuyết < *ɕ- (or *sʰ-?) (score: 0 or 1 + 2 + 2 + 2 = 6 or 7, depending on whether the aspiration of th- [tʰ] < *sʰ-? counts as a partial match for h-)
3b. *Palatal onset : nonpalatal phonetic
卒 chót with initial [c] : 卒 tốt < *(t)s- (score: 1 + 3 + 2 + 2 = 8)
埣 giót < *ɟ- < *CV-c- : 卒 tốt < *(t)s- (score: 0 or 1 + 3 + 2 + 2 = 7 or 8, depending on how close the initials were when 埣 was created: *CV-c- is not far from *(t)s-, whereas modern gi- [z] ~ [j] is far from t-)
悴 xót < *ɕ- < *cʰ- : 卒 tốt < *(t)s- (score: 0 or 1 + 3 + 2 + 2 = 7 or 8, depending on how close the initials were when 悴 was created: *cʰ- is not far from *(t)s-, whereas modern x- [s] is far from t-)
⿰律𡿨 xót < *ɕ- < *cʰ- : 律 luật (score: 0 + 2 + 2 + 1 = 5)
3c. *Retroflex onset : nonpalatal phonetic
卒 sót < *ʂ- < *Cr- : 卒 tốt < *(t)s- (score: 0 or 1 + 3 + 2 + 2 = 7 or 8, depending on whether the proto-onset was *sr- which isn't too far from *(t)s-; *(t)s- had hardened to t- by the time *Cr- fused into *ʂ-)
捽 rót < *r- or *CV-s- (proto-onset unknown) : 卒 tốt < *(t)s- (score: 0 or 1 + 3 + 2 + 2 = 7 or 8, depending on whether the proto-onset was *CV-s-)
3d. Palatal nasal onset nh- [ɲ] : oral onset phonetic
踤 nhót : 卒 tốt < *(t)s- (score: 0 + 3 + 2 + 2 = 7)
𦝬 nhót : 突 đột with initial [ɗ] < *t- (score: 0 + 3 + 2 + 1 = 6)
𣑵 nhót : 聿 duật with initial [z] ~ [j] < *dʲ- < *j- (score: 0 or 1 + 2 + 2 + 1 = 5 or 6, depending what the onset of 聿 was when 𣑵 was created)
3e. Lateral onset : nonlateral onset phonetic
⿰貝骨 lót : 骨 cốt (score: 0 + 3 + 2 + 2 = 7)
䘹 lót : 卒 tốt < *(t)s- (score: 1 + 3 + 2 + 2 = 8)
3f. Labial onset : nonlabial onset phonetic
𡁾 vót : 說 thuyết < *ɕ- (or *sʰ-?) (score: 0 + 2 + 2 + 2 = 6)
啐 vót : 卒 tốt < *(t)s- (score: 0 + 3 + 2 + 2 = 7)
This character could belong to 2 or 3a depending on which part is phonetic:
⿰孛乙 ót 'back of brain' : 孛 bột 'comet' + 乙 ất 'second Heavenly Stem' (score: 0 + 3 + 2 + 1 = 6 if 孛 is phonetic or 2 + 0 + 2 + 2 = 6 if 乙 is phonetic)
Neither part is obviously semantic. The absence of any component meaning 'brain' or even 'head' is puzzling. Could this be a double phonetic compound with 孛 approximating the vowel and 乙 the rest?
4. Approximating the onset, vowel, and tone without regard for the coda
𠲿 thót : 束 thúc (score: 2 + 3 + 1 + 2 = 8)
I suspect 𠲿 was created by a speaker of a central or southern dialect in which *-t > [k]. If so, 𠲿 is really an example of strategy 1, and the score should be 9 (with a penalty solely for vowel height mismatch).
5. Approximating the vowel and tone without regard for the onset or coda
The reverse Semitic strategy (cf. 2 which is the Semitic strategy).
束 hót : 束 thúc (score: 0 + 3 + 1 + 2 = 6)
I suspect this usage of 束 started with a speaker of a central or southern dialect in which *-t > [k]. If so, 束 is really an example of strategy 3, and the score should be 7 (with penalties for the onset and vowel height mismatch). The score could be raised to 8 if the aspiration of th- [tʰ] counts as a partial match for h-.
No solution has a score of 4 for vowels simply because no phonetic has a Sino-Vietnamese reading with o [ɔ]. The maximum possible score for -ót syllables is 9 out of an ideal of 10 (= 2 + 4 + 2 + 2). The actual scores above range from 5 to 9. It is not possible to determine the median or the mode of scores for ót-characters from the data in this post because it is incomplete and only typologically rather than statistically representative: e.g., I omitted all but one strategy 1 character with a score of 9 because near-exact matches are boring.
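The arithmetic behind that ceiling and that range can be sketched as follows. The component tuples are copied from a subset of the entries above, in the order initial + vowel + final + tone. The variable names are hypothetical, and entries with alternative scores carry every alternative.

```python
# Component maxima from "an ideal of 10 (= 2 + 4 + 2 + 2)".
IDEAL = {"initial": 2, "vowel": 4, "final": 2, "tone": 2}

# No phonetic has a Sino-Vietnamese reading with o [ɔ], so the vowel tops out at 3.
BEST_FOR_OT = dict(IDEAL, vowel=3)

# (character, reading, alternative component tuples): a SUBSET of the entries above.
ENTRIES = [
    ("⿰口脫", "thót", [(2, 3, 2, 2)]),
    ("㩢", "mót", [(2, 1, 2, 1)]),
    ("⿰曰𡿨", "vót", [(2, 1, 2, 2)]),
    ("⿰律𡿨", "xót", [(0, 2, 2, 1)]),
    ("𣑵", "nhót", [(0, 2, 2, 1), (1, 2, 2, 1)]),
    ("𠲿", "thót", [(2, 3, 1, 2)]),
]

ideal_total = sum(IDEAL.values())
best_ot_total = sum(BEST_FOR_OT.values())

# Every distinct total reachable under any alternative scoring.
totals = sorted({sum(parts) for _, _, alternatives in ENTRIES for parts in alternatives})

print(ideal_total)               # 10
print(best_ot_total)             # 9
print(min(totals), max(totals))  # 5 9
```

Since the sample is a hand-picked subset, the minimum and maximum are meaningful here but a median or mode would not be, for the reason given above.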
Until now Chữ Nôm characters and readings have been treated as a uniform, timeless body. The next phase of Chữ Nôm studies should take space and time into account: where and when do certain spellings arise, and what can they tell us about Vietnamese phonetics in a given place and period?
*I consider all Vietnamese vowels and diphthongs to be the same length for scoring purposes with the exceptions of the short vowels ă [a] and â [ə] which cannot appear in syllable-final position because all Vietnamese syllables must be bimoraic. Hypothetical *Că and *Câ-syllables would be monomoraic and therefore not permissible.
A LÓT OF BRIBES OF BONES AND SHELLS
字典𡦂喃引解 Tự Điển Chữ Nôm Dẫn Giải 'Character Dictionary of Chữ Nôm with Quotations and Explanations' lists
⿰貝骨 (not in Unicode) lót 'bribe'
as a homophone of lót 'to add a layer beneath or inside'
from yesterday. (I suspect the noun is an extension of the verb: a
bribe is something one pockets - put inside.)
貝 bối 'shell' on the left is the monetary radical. It's not surprising.
What is surprising is 骨 cốt 'bone' on the right with initial [k] instead of [l]. Or is it?
Using yesterday's scoring system for phonetic fidelity, ⿰貝骨 is a 7:
- the initial consonant is a 0 - [k] and [l] have nothing in common
- the vowel is a 3 - o [ɔ] and ô [o] are both back rounded and of the same length; only their height differs
- the final consonant is a 2 - a perfect match
- the tone is a 2 - a perfect match
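That breakdown is just a sum of four components. A minimal sketch of it follows, assuming the component maxima are initial 2, vowel 4, final 2, and tone 2 (the ideal of 10 = 2 + 4 + 2 + 2). The class and field names are hypothetical, and the points themselves are still assigned by hand.

```python
from dataclasses import dataclass

# Assumed maximum points per component (ideal total = 2 + 4 + 2 + 2 = 10).
MAX_POINTS = {"initial": 2, "vowel": 4, "final": 2, "tone": 2}

@dataclass(frozen=True)
class FidelityScore:
    """Hand-assigned phonetic fidelity points for one character/phonetic pair."""
    initial: float
    vowel: float
    final: float
    tone: float

    def __post_init__(self):
        # Reject points outside the allowed range for each component.
        for name, cap in MAX_POINTS.items():
            value = getattr(self, name)
            if not 0 <= value <= cap:
                raise ValueError(f"{name} must be between 0 and {cap}, got {value}")

    @property
    def total(self):
        return self.initial + self.vowel + self.final + self.tone

# ⿰貝骨 lót : 骨 cốt, using the points listed above.
lot_bribe = FidelityScore(initial=0, vowel=3, final=2, tone=2)
print(lot_bribe.total)  # 7
```

The fields are floats rather than integers so that half points for partial matches can be recorded too.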
Taberd lists a spelling of lót 'bribe' with a matching initial and an ironic original meaning:
律, originally for luật 'law' (bare phonetic)
I find his entry format confusing:
— 揬 | đút —, subornare
Why are the dashes in the Chữ Nôm and the Quốc Ngữ
romanization on opposite sides? Why isn't the entry like this?
揬 — | đút —, subornare
揬 đút, another word for 'bribe' (presumably an extended usage of đút 'to insert'), has two other spellings without the 扌 'hand' radical (the means of insertion):
⿰貝突 with the monetary radical plus the same phonetic 突 đột 'suddenly'
The syllables of the redundant compound ⿰貝突⿰貝骨 đút lót 'bribe' would have matching radicals with this spelling; cf. Sino-Vietnamese 賄賂 hối lộ 'bribe' with double monetary radicals.
賥 đút with the monetary radical plus the phonetic 卒 tốt 'to end'
Let's score those spellings:
揬 and ⿰貝突: initial 2, vowel 3, final 2, tone 1 = 8
賥: initial 1.5 (t- is closer to đ- than, say, l- which would be a 1), vowel 3, final 2, tone 2 = 8.5
Do scores correlate with textual frequency? Did writers tend to favor better phonetic matches? Probably not. I admit my scoring is arbitrary and for fun. And timely given that the
Thế vận hội Mùa đông
'World athletic meeting Season winter' = 'Olympic Winter Games'
are still going. Though not for long - they end tomorrow.
(I wanted to type a made-in-Vietnam character for mùa 'season', but my editor doesn't support CJK Unified Ideographs Extension E. And it probably never will since KompoZer's development has been frozen since 2010.)