10.10.30.23:31: THE GOLDEN GUIDE: LINE 87: TANGRAPHS 431-435
87. Tangut posts are the most exhausting to write. At this rate - one line a week - it will take me over two years to finish the Golden Guide! Maybe I can do one line a day when I'm on vacation.
Tangraph number | 431 | 432 | 433 | 434 | 435 |
Tangraph | |||||
Li Fanwen number | 0830 | 1478 | 4455 | 3799 | 5055 |
My reconstructed pronunciation | 1kĩ | 1gĩ | 2tha | 2sew | 1tʃɨi |
Tangraph gloss | (transcription of Chinese) | (transcription of Chinese) | (transcription of Chinese) | small (< Chn 小) | (transcription of Chinese) |
Word | the surname 金 Jin (*kĩ) | the surname 嚴 Yan (*ŋgɨã) | the surname 陶 Tao (*thaw) | the surname 蕭 Xiao (*siew) | the surname 甄 Zhen (*tʃɨĩ) |
Translation | Kin, Gin, Tha, Sew, Chi |
431: Were the Jin in the Tangut Empire related to the Wang and the Tangut Rer?
=+
0830 1kĩ 'the Chinese surname 金 Jin (*kĩ)' (folcin) =
0403 1wõ 'the Chinese surname Wang' (folcok) +
2796 2rieʳ 'the Tangut surname Rer' (bilhascin)
432: I would have expected 1giã instead of 1gĩ as a Tangut approximation of Tangut period NW Chinese 嚴 *ŋgɨã. The absence of 1giã is an accidental gap since 1kiã is possible. Grade III -ɨ- is possible after velars in Tangut period NW Chinese but not in Tangut.
Were the Yan in the Tangut Empire related to the Jin?
=+
1478 1gĩ 'the Chinese surname 嚴 Yan (*ŋgɨã)' (folfal) =
0830 1kĩ 'the Chinese surname 金 Jin (*kĩ)' (folcin) +
0405 1dzwiə̣ 'wall' (falhul)
433: 4455, a Chinese transcription tangraph, must be related to its homophone 4456 2tha, the first half of 2tha-2xa 'wild goose' (whose second half looks like 'water' + 'hand' + 'bird'):
<>?
4456 is usually read as 2lɨẹ 'large'. This reading/meaning has its own number 4457. Since the Tangut period NW Chinese word for 'large' was 大 *tha, 4457 could also be used to write the syllable 2tha.
Then again, the right side of 4456/4457 is gii, an abbreviation of 2262 2dʒwɨõ 'bird', so I wonder if 4456 was originally meant for 2tha-2xa 'wild goose' and then recycled for 2lɨẹ 'large'.
434: 3799 'small' may have been chosen to follow 4455 2tha since the latter was homophonous with Chinese 大 *tha 'big'.
Although 小 'small' was Middle Chinese *siewʔ with Grade IV *-i-, it or its Tangut period NW Chinese descendant was borrowed into Tangut as Grade I 2sew rather than as Grade IV 2siew. Gong reconstructed the Tangut period NW Chinese rhyme of 小 'small' without his Grade IV *-j- as *-ew, matching the rhyme of 2sew.
3799 'small' (analysis unknown) looks like 'earth' (indicating an intention to use it in place name transcriptions?) plus 'small' (itself a derivative of 小?):
=+
435: 1836 must be phonetic in 5055. Is 4902 really relevant? Or were the Zhen of the Tangut Empire famous for their speech?
=+
5055 1tʃɨi 'the Chinese surname 甄 Zhen (*tʃɨĩ)' (biofuncin) =
4902 1ŋwəəu 'speech; word' (biodexbelcin) +
1836 1tʃɨẽ 'correct' (< Chn 正) (funham)
10.10.30.22:12: EMPHATIC FREQUENCY IN ARABIC ROOTS
I had forgotten about this table of consonant frequency in Arabic roots until David Boxenhorn reminded me. Below I've listed emphatics in red and their nonemphatic counterparts in black from most to least common:
I would expect nonemphatics to be more common than emphatics (e.g., d > ḍ) but the following pairs are inexplicable counterexamples:
Emphatic | Nonemphatic | Ratio (Roots) | Ratio (Qur'an) |
ʕ | ʔ | 65 : 35 | (15 : 85) |
q | k | 59 : 41 | 40 : 60 |
ħ | h | 60 : 40 | 19 : 81 |
ṭ | t | 57 : 43 | 11 : 89 |
The ratios for the running text in the Qur'an are in accordance with my prediction: nonemphatics outnumber emphatics. The Qur'anic ratio for ʕ : ʔ is in parentheses since the figure for ʔ includes the use of alif for aa.
Unlike Hebrew tsadi, the four emphatics in the table above have single sources in Proto-Semitic. According to this table, Arabic retains the PS consonant inventory with six exceptions:
PS *p > A f
PS *θ̣ > A ð̣ (I assume the ẓ pronunciation is newer)
PS *ʃ, *s > A s (hence Arabic s-l-m corresponds to Hebrew sh-l-m)
PS *ɬ > A ʃ (cf. Old Chinese nonemphatic *hl > Middle Chinese ɕ)
PS *ɬˁ > A ḍ (ɮˁ in Muhammad's time; > ʕ in Aramaic?; cf. Old Chinese emphatic *hl > Middle Chinese aspirated stop *th)
PS *g > A j (ɡʲ in Muhammad's time)
Last night, I was puzzled by the high frequency of Hebrew emphatic s relative to nonemphatic s. David Boxenhorn pointed out that Hebrew s has a triple origin:
Proto-Semitic | Hebrew | Arabic |
*θ | צ s | ظ ð or z |
*s | ص s | |
*ɬ | ض d |
See his post for a full chart of Semitic consonants.
If the Hebrew s : s ratio is 68 : 32, would the Arabic ð / z + s + d : s ratio be similar? In fact, the Arabic ratio in the Qur'an is the other way around: 43 : 57. Why is Hebrew s more common than the corresponding trio of Arabic consonants?
In modern standard Hebrew, s is a nonemphatic affricate [ts]. The title refers to the name צדי tsadi for its letter צ s.
The shift of s to ts caught my eye years ago because it reminded me of Starostin's (1989) partial derivation of Middle Chinese *tsh from Old Chinese aspirated *sh. Starostin's *sh accounts for some puzzling phonetic series with fricatives and affricates in Middle Chinese like
妻 OC *shəj > MC *tshej 'wife', phonetic in 棲 OC *səj > MC *sej 'bird's nest'
心 OC *səm > MC *sim 'heart', phonetic in 沁 OC *shəms > MC *tshimh 'to ooze'
生 OC *sreŋ > MC *ʂɨeŋ 'life', phonetic in 靑 OC *sheŋ > MC *tsheŋ 'green'
(I am ignoring cases in which Starostin reconstructed *sh in non-*s-phonetic series: e.g., his 千 OC *shiin which I reconstruct as OC *s-hnin since it belongs to an *n-series.)
Although the shift of OC *sh to MC *tsh does resemble Hebrew s > ts at first glance, there are key differences:
First, nonemphatic OC *sh also shifted to MC *tsh.
Second, Hebrew s and ts are not aspirates, whereas OC *sh and *sh, and MC *tsh are all aspirates.
Third and most importantly, there is no doubt about Hebrew s, whereas the existence of OC *sh and *sh is dubious. Nonemphatic sh is a very rare sound and I know of no language with emphatic sh. UPSID lists only three languages with nonemphatic sh: Burmese and its neighbor Karen and Mazahua in Mexico. I would add Korean to that list, though the aspiration of Korean s is nondistinctive and never romanized whereas s and sh are distinct phonemes in the other three languages. Burmese sh is from earlier aspirated palatal *ch. Perhaps Karen sh has a similar origin. I am reluctant to reconstruct unusual sounds in earlier languages.
I would prefer to account for MC *s- ~ *tsh-series with OC clusters.
Schuessler (2007: 61-62) derived the MC *tsh- in these phonetic series from OC *k-s-, whereas the affrication of Hebrew s did not involve a cluster.
Schuessler's proposal conflicts with Pulleyblank's (1991: 67) derivation of MC *ʂ- from OC *k-s-.
Pulleyblank (1965: 206-208) has an extensive discussion of the *k-s-problem. Pulleyblank's earlier solution involved a contrast between unaspirated and aspirated clusters:
OC *ks- > MC *ʂ-: e.g., 殺 MC *ʂɛt < OC *ksat 'to kill', cognate to 摋 MC *sat < OC *sat 'to slap', Written Tibetan gsod-pa 'to kill' (< root s-d), and Written Burmese သတ် sat 'to kill'.
OC *khs- > MC *tʂh-: e.g., 刹 MC *tʂhæt < late OC *khsat, transcribing Sanskrit kṣat [kʂət] and kṣet [kʂeet] and sharing a phonetic 杀 with 殺 'kill' and 摋 'slap'
Since I don't know of any language with a phonemic contrast between Cs- and Chs-clusters*, perhaps I could combine Starostin's and Pulleyblank's ideas:
OC *s- > MC *s-
OC *sh- > MC *tsh-
OC *k-s- > MC *ʂ- (Cf. Sanskrit kṣ- < *ks-.)
OC *k-sh- > MC *tʂh-
These rules would also apply to OC emphatic *(k-)s(h)-. But I don't know of any languages with k-aspirated sh-clusters.
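The four combined rules above can be sketched as a small lookup applied to the start of a syllable. This is only an illustration: the string encoding (a hyphen after the *k- prefix, "sh" for aspirated s) and the function name are my own assumptions, and the rhyme is passed through unchanged even though real rhymes also shifted between OC and MC.

```python
# Ordered longest-first so that "k-sh" is tried before "k-s" and "sh" before "s".
# The encoding of initials as ASCII-ish strings is a hypothetical convention.
RULES = [
    ("k-sh", "tʂh"),  # OC *k-sh- > MC *tʂh-
    ("k-s", "ʂ"),     # OC *k-s-  > MC *ʂ-
    ("sh", "tsh"),    # OC *sh-   > MC *tsh-
    ("s", "s"),       # OC *s-    > MC *s-
]

def shift_initial(oc_syllable: str) -> str:
    """Replace the OC initial with its MC reflex; leave the rest untouched."""
    for oc, mc in RULES:
        if oc_syllable.startswith(oc):
            return mc + oc_syllable[len(oc):]
    # Initials outside the s-type set are not covered by these rules.
    return oc_syllable

for form in ["səj", "shəj", "k-sat", "k-shat"]:
    print(form, ">", shift_initial(form))
```

The longest-first ordering matters: if "s" were tested first, every form would be left unchanged.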
*The aspiration in Khmer clusters like khm- /km/ is nondistinctive. There is no contrast in Khmer between km- and khm-. Khmer stops must be aspirated before nasals (exception: kŋ-, not khŋ-).
Chinese historical linguists have known for at least 70 years that Old Chinese had two kinds of syllables. However, there has been no consensus about what the difference between these two types was. In 1994, Jerry Norman proposed that the distinction involved pharyngealization:
Some interpretations / notations of the Old Chinese A/B distinction
Karlgren 1940 | Pulleyblank 1962 | Yakhontov 1965 | Pulleyblank | Starostin 1989 | Norman 1994 | Ferlus as reported in Sagart 1999 | Schuessler 2007 | This site | |
Type A | *CV | *CV | *CV | *Cv́ | *CVV | *CˁV | *CCV | *Cv̂ | *CV |
Type B | *CjV | *CVV | *C-CV | *Cv̀ | *CV | *CV | *CV | *CV | *CV |
Notice how Pulleyblank and Starostin assigned vowel length in opposite ways.
I use small v instead of V beneath diacritics.
Although I was agnostic about the distinction for years, I adopted Norman's view a decade ago, though I use the term 'emphasis' and view emphasis as a syllabic property rather than a property of the initial consonant. I write emphasis with underlining, though I could also write *CV as *CˁV.
Type B syllables slightly outnumber Type A syllables in the Qieyun rhyme dictionary by a ratio of 52 : 48 (Wang 1957 in Norman 1994: 401). Norman (1994: 402) pointed out that "many of the most common grammatical function words of Early Chinese, which would normally be expected to have a simple phonological form" are Type B. Thus the B : A ratio in running text as opposed to dictionaries may be something like 6 : 4 rather than 52 : 48. I wish I had exact numbers for a text (and better yet, numbers for texts from different periods). Such figures imply that Type B must have been less marked (simpler) than Type A. It's highly improbable that grammatical words - which tend to be unstressed and reduced - would predominantly contain long vowels or complex and/or unusual initials.
Pharyngealized consonants are rare in the languages of the world. Only eight languages in UPSID contain them. The best-known languages with such consonants are Semitic: Arabic and Biblical Hebrew (BH). Ehret (1995) reconstructed Proto-Afroasiatic (PAA) ejectives as sources of Semitic pharyngealized consonants (hereafter 'emphatics').
Norman (1994) was the first to see parallels between Old Chinese and Arabic phonology. How far do these parallels go? One difference that has long bothered me is the absence of labial and nasal emphatics in standard Arabic and BH. Such emphatics are also rare outside Semitic. UPSID contains only a single emphatic nasal in Xu! (ŋ) and no languages with labial emphatics.
According to Ehret (1995), the PAA ejective *p'- became nonemphatic pre-Proto-Semitic *b- which became Arabic b- (and, I presume, BH b-). Old Chinese, on the other hand, has emphatic versions of (nearly?) all nonemphatic consonants including labials. Such an extensive emphatic inventory also exists in Cairene Arabic (Youssef 2006). Some Type A/B labial-initial minimal pairs:
逋 *pa 'to escape' : 簠 *pa 'ritual vessel'
柏 *phrak 'cypress' : 碧 *phrak 'green precious stone'
袍 *bu 'robe' : 枹 *bu 'drumstick'
忙 *maŋ 'flurried' : 亡 *maŋ 'to die'
Each pair above is written with the same phonetics. Some phonetics can be used for both Type A and B syllables, whereas others can only be used to write one type or the other: e.g., 巴 is only for *Pra whereas 便 is only for *Pen.
This is unlike Semitic alphabets which have distinct, noninterchangeable letters for emphatics and nonemphatics.
Some Old Chinese words and word families have Type A and B variants: e.g.,
編 *pen ~ *pen 'plait'
無 *ma 'to not exist', 莫 *mak 'don't ...!'
These alternations also have no parallel in Semitic.
Are the 52 : 48 or hypothetical 6 : 4 nonemphatic : emphatic ratios in Old Chinese similar to nonemphatic : emphatic ratios in Semitic? As a quick test, let's look at the frequencies of nonemphatic : emphatic ratios of letter pairs in Arabic and Hebrew:
Qur'anic Arabic (source: islamtutor.com)
t | d | ð or z | s | k (emphatic q) | |
Nonemphatic | ت | د | ذ or ز | س | ك |
Emphatic | ط | ض | ظ | ص | ق |
NE : E ratio | 89 : 11 | 78 : 22 | 85 : 15 or 65 : 35 | 74 : 26 | 60 : 40 |
Arabic emphatic l, found only in the name of Allah, is not included, since it does not have a distinct letter.
Most of the ratios are quite extreme compared to Old Chinese.
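For anyone who wants to recompute such ratios from a text of their own, here is a minimal sketch. It assumes the text is a plain Unicode string and simply counts code points; the function name and the toy strings in the usage lines are made up for illustration and are not the figures from the sources cited here.

```python
from collections import Counter

def pair_ratio(text: str, nonemphatic: str, emphatic: str) -> tuple[int, int]:
    """Return the nonemphatic : emphatic ratio as rounded percentages.

    Each argument may contain several letters so that variant shapes
    (such as Hebrew final forms) can be counted together.
    """
    counts = Counter(text)
    ne = sum(counts[ch] for ch in nonemphatic)
    e = sum(counts[ch] for ch in emphatic)
    total = ne + e
    if total == 0:
        raise ValueError("neither letter group occurs in the text")
    ne_pct = round(100 * ne / total)
    return ne_pct, 100 - ne_pct

# Toy inputs, not real corpus data:
print(pair_ratio("تتتط", "ت", "ط"))
print(pair_ratio("ססצץץ", "ס", "צץ"))
```

Passing letter groups rather than single letters is a design choice that lets one call cover a pair such as ذ or ز against ظ.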
I presume the Arabic ratios are similar to those found by the earliest frequency analysts:
The first known recorded explanation of frequency analysis (indeed, of any kind of cryptanalysis) was given in the 9th century by Al-Kindi, an Arab polymath, in A Manuscript on Deciphering Cryptographic Messages. It has been suggested that close textual study of the Qur'an first brought to light that Arabic has a characteristic letter frequency.
Torah Hebrew (source: Haralick 1998; I do not endorse a 'Torah code' and am solely interested in letter frequency; if anyone can find a non-'code' source for letter frequencies, I would be grateful.)
t | k (emphatic q) | s | |
Nonemphatic | ת | כ | ס |
Emphatic | ט | ק | צץ |
NE : E ratio | 90 : 10 | 72 : 28 | 32 : 68 |
The t ratio is like that of Arabic. The k : q ratio is roughly 3 : 1 compared to 6 : 4 in Arabic. I am surprised by the s ratio which is the opposite of what I'd expect. Why is nonemphatic s so uncommon? This 'coder' found a 36 : 64 ratio of s : s in Ezekiel.
The equivalent Old Chinese ratios for consonant pairs are much more balanced: e.g., OC *t was extremely common compared to Arabic and Hebrew t. This implies that emphasis had a very different origin in Sinitic (which I proposed here) or that the Old Chinese emphatic hypothesis is completely wrong.
My first exposure to Lithuanian was in Baldi (1983) which I bought at the Stanford bookstore in 1993. On page 94, Baldi mentioned the myth
that native speakers of Lithuanian were capable of conversing with Brahmin speakers of Sanskrit, each in his own language, with almost complete mutual intelligibility. Such an assertion is, of course, wildly untrue, but it does underscore the conservative nature of the Baltic languages.
Although Baltic was first attested in the 14th century,
"[t]he reputation for conservatism that the Baltic languages [Lithuanian and Latvian] enjoy is well deserved, and indeed in many instances it can be shown that they reflect the Indo-European system more closely than do [the much earlier attested!] Greek, Latin, or Sanskrit, the three languages on which much of reconstructed Proto-Indo-European is based."
Baldi points out that modern Lithuanian gyvas [giivas] is close to Proto-Indo-European *gwiiwos 'alive' compared to Old Irish biu, Latin vivus, Greek bíos, Gothic qius (cognate to English quick, as in the quick and the dead = 'the living and the dead'), Old Church Slavonic zhivŭ, and Sanskrit jiivas. Only the initial consonant of the Lithuanian and Sanskrit forms is different. (The accentuation is also different - L gývas vs. Skt jiivás - but I will not compare Lithuanian accent and Sanskrit accent here.)
The Lithuanian word that really jumped out at me was vyras [viiras] 'man' which is segmentally homophonous with Sanskrit viiras 'hero' (< 'manly man'?): cf. Irish fear, Latin vir, English were (as in werewolf).
Lithuanian numerals also resemble their Sanskrit cognates, though to a lesser degree. ('Zero' and 'one' are not cognate.) I have included Lithuanian's neighbor Russian for comparison.
Gloss / English cognate | Lithuanian | Sanskrit | Russian |
two | du | dva | dva |
three | trys [triis] | tri | tri |
four | keturi | catur | četyre |
five | penki | pañca | pjat' |
six | šeši | ṣaṣ | šest' |
seven | septyni | sapta | sem' |
eight | aštuoni | aṣṭa | vosem' |
nine | devyni | nava | devjat' |
ten | dešimt | daśa | desjat' |
On Saturday, I pointed out how common Sanskrit a is. Here we can see that Sanskrit a corresponds to Lithuanian and Russian e. Both those modern languages still preserve a vowel that was lost long ago in the ancestor of Sanskrit.
Russian has its share of innovations: e.g., the loss of a nasal in 'five'.
This is not to say that Lithuanian is always more conservative. Like Russian, Lithuanian has d- instead of an n- for 'nine' that is preserved in many other Indo-European languages. This d- is probably due to the influence of the adjacent numeral 'ten' with d-.
All three eastern languages have shifted an original *k to an s-type consonant in 'ten'. Latin decem [dekem] and Greek deka preserve this *k.
Now let's look at a sample of Lithuanian numeral declension. I was disappointed to see a lack of obvious parallels with Sanskrit: e.g., 'three'. Lithuanian has no neuter, so I have not included the Sanskrit neuter forms, which are identical to the masculine except for nom./acc. triiṇi.
Case | Masculine | Feminine | ||
Lithuanian | Sanskrit | Lithuanian | Sanskrit | |
Nominative | trys | trayas | trys | tisras |
Accusative | tris | triin | tris | |
Instrumental | trimis | tribhis | trimis | tisṛbhis |
Dative | trims | tribhyas | trims | tisṛbhyas |
Ablative | (none) | (none) | ||
Genitive | trijų | trayaaṇaam | trijų | tisṛṇaam |
Locative | trijuose | triṣu | trijose | tisṛṣu |
Lithuanian has no ablative case.
The only difference between the masculine and feminine Lithuanian paradigms is a -u- in the masculine locative. How many Lithuanian second language speakers confuse the masculine and feminine locatives?
The Sanskrit masculine/neuter and feminine paradigms have different stems: m./n. tri- and f. tisr-. I have no idea what the -s- is doing in the feminine. Beekes (1995: 212) projects these two stems back to Proto-Indo-European: m./n. *tri- (nom. *treies) and f. *tisr- (nom. *tisres). Sanskrit has maintained the original m. and f. nominatives apart from shifting *e to a.
Although Lithuanian has lost the feminine stem, its accusative tris is closer to PIE m. *trins than Sanskrit m. triin, which might be by analogy with masculine i-stem nouns: e.g., agni- 'fire', agniin 'fires'.
According to Beekes (1995: 118), L -m- and Skt -bh- in the instrumental and dative may reflect Proto-Indo-European dialect variation or may have originally been *-m- in the dative and *-bh- in the instrumental.
L gen. trijų [trijuu] < *u + nasal is closer to PIE *treiom than the Skt forms with -(aa)ṇaam.
I don't know what the origin of the aforementioned locatives in Lithuanian is. I presume they are innovations since Beekes reconstructed m. loc. *trisu (with a simple *-s- that became retroflex -ṣ- in Sanskrit).
Other Lithuanian declension patterns are as opaque to me as that of 'three'. Lithuanian is not to Sanskrit what Polish is to Russian. I can guess the gender, case, and number of a Polish word by thinking of its Russian cognate, but I can't do that with Lithuanian.
10.10.27.23:59: PG-LESS VS. BK-LESS
My recent posts about vowel systems got me thinking about statistical tendencies in consonant systems. Unfortunately, I don't have any numbers on hand for the kinds of consonant systems that I want to discuss.
If one were designing a language from scratch, and could only have five of these six consonants
p | t | k |
b | d | g |
one might think that randomly deleting a consonant would be OK. But I've only seen two out of the six possibilities:
Egyptian Arabic, Middle Mongolian, Manchu, Japanese (excluding onomatopoeia and borrowings)
(*p > fricative) | t | k |
b | d | g |
Dutch, Czech, Slovak, Ukrainian, Belarusian (excluding onomatopoeia and borrowings)
p | t | k |
b | d | (*g > fricatives) |
Thai, Lao, Khmer, and Vietnamese are also missing g, which became kh in Thai and Lao and k in Khmer and Vietnamese. But their b and d are not like Indo-European b and d. Khmer and Vietnamese b and d are implosive [ɓ] and [ɗ] and Thai and Lao b and d used to be implosives:
Premodern | Modern | |
Thai and Lao | *b, *d, *g; *ɓ, *ɗ | ph, th, kh; b, d |
Khmer | p, t, k, ɓ, ɗ | |
Vietnamese | ɓ, ɗ, k, ɓ, ɗ |
Implosive ɠ is rare. It is only in five African languages (1.11%) in UPSID.
If two of the six consonants are missing, I predict they would be p and g, as in standard Arabic:
(*p > f) | t | k |
b | d | (*g > j) |
Let's call standard Arabic a PG-less language. Are there any BK-less languages with the opposite pattern?
p | t | (gap!) |
(gap!) | d | g |
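The labels used in this post (PG-less, BK-less, and so on) can be generated mechanically from an inventory. The sketch below is only an illustration: the function name, the representation of an inventory as a set of strings, and the p b t d k g ordering of letters in the label are my assumptions.

```python
# The six stops under discussion, in the order used for the labels.
STOPS = ("p", "b", "t", "d", "k", "g")

def gap_label(inventory: set[str]) -> str:
    """Name an inventory by the plain stops it lacks, e.g. 'PG-less'."""
    missing = [c.upper() for c in STOPS if c not in inventory]
    if not missing:
        return "complete"
    return "".join(missing) + "-less"

# The six logically possible five-stop systems:
one_gap = [gap_label(set(STOPS) - {c}) for c in STOPS]
print(one_gap)

# Standard Arabic as described above, and the hypothetical opposite pattern:
print(gap_label({"t", "k", "b", "d"}))
print(gap_label({"p", "t", "d", "g"}))
```

Only the presence or absence of the six plain stops is checked, so segments like ejectives or kp/gb in an inventory do not affect the label.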
13 languages in UPSID do not have k:
Bandjalang is PTK-less. It has no voiceless stops.
Berta is PTK-less with ejective p and k but no regular p or k. Its θ may have been from *t.
Cherokee is PBTK-less according to UPSID but PBDG-less according to Wikipedia.
It's not clear if Djirbal is truly PTK-less since "[RMW] Dixon does not specify that stops are actually phonetically voiced in all positions." Voicing in Mbamaram is also uncertain.
Hupa and Kwaio are PBDKG-less. Kwaio *p b d k g may have become ɸ mb nd x ŋg.
Jomang stops are voiceless when geminated. So although it is PTK-less, it does have pp tt kk.
Kewa is PBK-less. Could *p b k have become ɸ w x?
Klao is KG-less but has kp and gb.
Usan is K-less. Did *k become a glottal stop as in Hawaiian?
Vanimo is KG-less. Did *k and *g merge into ɦ?
Yidiny is PTK-less with optional "partially voiced" allophones of /b d g/ in word-initial position.
None of the thirteen are BK-less, though Kewa comes close. UPSID does not contain all the languages of the world, so it's still possible that BK-less languages may exist.
10.10.26.00:26: BROWN'S (1979) "VOWEL LENGTH IN THAI"
When David Boxenhorn and I were recently discussing trends in vowel length, I remembered this article that I read back in the 90s in Brown's (1985) compilation From Ancient Thai to Modern Dialects:
It has long been noted that, except for the vowel /a/, vowel length distinction carries very little functional load in Thai. Minimal pairs are not as hard to find as most writers seem to indicate, but the tendency towards complementary distribution is unmistakable.
Brown counted the number of vowels in ancient Thai words ending in consonants other than a glottal stop*. The more common member of a vowel length pair is in bold. Va diphthongs have no length distinction before consonants.
ia 167 | ɨa 109 | ua 163 |
ii 53 | ɨɨ 58 | uu 60 |
i 178 | ɨ 86 | u 222 |
ee 65 | əə 85 | oo 132 |
e 103 | ə 9 | o 195 |
ɛɛ 127 | aa 430 | ɔɔ 209 |
ɛ 58 | a 438 | ɔ 98 |
A pattern can be seen more clearly if the lower-frequency vowels are deleted. Short vowels are more common among paired vowels with three exceptions in bold:
ia 167 | ɨa 109 | ua 163 |
i 178 | ɨ 86 | u 222 |
e 103 | əə 85 | o 195 |
ɛɛ 127 | aa 430 / a 438 | ɔɔ 209 |
I have not deleted aa since it is almost as frequent as a. Among non-a vowels, only mid ə and lower mid ɛ and ɔ are mostly long in a CVC environment.
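The comparison of short and long counts can be checked with a few lines of code. The numbers are Brown's counts as given in the table above; the data layout and function name are my own, so treat this as a sketch rather than Brown's procedure.

```python
# (short count, long count) for each vowel quality, from Brown's table above.
COUNTS = {
    "i": (178, 53),
    "ɨ": (86, 58),
    "u": (222, 60),
    "e": (103, 65),
    "ə": (9, 85),
    "o": (195, 132),
    "ɛ": (58, 127),
    "a": (438, 430),
    "ɔ": (98, 209),
}

def long_dominant(counts: dict[str, tuple[int, int]]) -> list[str]:
    """Return the vowel qualities whose long member outnumbers the short one."""
    return [v for v, (short, long_) in counts.items() if long_ > short]

print(long_dominant(COUNTS))  # ['ə', 'ɛ', 'ɔ']
```

The /a/ pair stays on the short side, but only by 8 tokens out of 868.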
Brown speculated that Thai monosyllabic words were originally disyllabic. Medial consonants were lost, "flood[ing] the system with long vowels and clusters":
*CVCVC > *CVVC
Brown reconstructed three types of vowels at this stage: short, long, and *Va-diphthongs. *aa could be regarded as either a long *a or as a *Va-diphthong. Brown regarded *aa as a "pivot" between long vowels and diphthongs.
*ia | *ɨa | *ua |
*ii | *ɨɨ | *uu |
*i | *ɨ | *u |
*ee | space! | *oo |
*e | *o | |
*ea | *aa | *oa |
*a |
(I have compressed Brown's three tables into one.)
The low clusters *ea and *oa fused into lower mid long vowels ɛɛ and ɔɔ which initially had no low counterparts:
ia | ɨa | ua |
ii | ɨɨ | uu |
i | ɨ | u |
ee | space! | oo |
e | o | |
ɛɛ | aa | ɔɔ |
space! | a | space! |
Then "the [short] /ɛ-ɔ/ space [...] started to fill in from without (borrowings and new formations) and from within (cross-overs of existing words from the long space [/ɛɛ/-/ɔɔ/])."
Moreover, "[a]n uninhabited /əə/ space was created somewhere along the way [wasn't it there all along? -A] and slowly started to fill in [with what? -A]; but other than that about the only change has been a gradual wearing down of clusters and long vowels. This change has gone all the way in some dialects (like White Thai, see Gedney 1964), but in modern Thai it has done little more than change the statistics of long-short occurrences in different environments."
Brown believed short ə was of recent origin, but did not explicitly state its sources. An example of a ə word is เงิน ŋən 'silver' which is probably a loanword from Old Chinese 銀 *ŋrən 'id.' Perhaps this word was originally borrowed as *ŋəən with a long vowel which later wore down.
There are no longer any spaces in the modern Thai vowel system:
ia | ɨa | ua |
ii | ɨɨ | uu |
i | ɨ | u |
ee | əə | oo |
e | ə | o |
ɛɛ | aa | ɔɔ |
ɛ | a | ɔ |
The above may give the impression that Thai long vowels are shortening. Brown even wrote that "now it [Thai vowel length] appears to me to be all waning and no waxing". However, he acknowledged the existence of counterexamples to the waning trend:
*ʔday > daay 'can' (I have changed Brown's *ʔn- to *ʔd-)
*kaw > kaaw 'nine' (cf. Cantonese kaw which still has a short vowel)
These words are still spelled with short vowels in the Thai script.
Haas (1956) discussed cases of vowel length variation: e.g., แก้ว kɛw 'crystal' was "sometimes" kɛɛw. Only the long vowel version is listed at the Thai Dictionary Project site, implying that the short vowel version is now extinct.
Both Brown and Haas described how modern Thai vowel length is partly correlated with tones. I will not examine those correlations here.
Thai is remotely related to the Kra languages. Ostapirat (2000: 217) reconstructed a simple six-vowel system without length for Proto-Kra:
*i | *ə | *u |
*e | *a | *o |
Brown's pre-Thai system looks almost identical minus long vowels and diphthongs:
*i | *ɨ | *u |
*e | *a | *o |
Ostapirat's bibliography does not include Brown (1979) or the compilation it appeared in, so I wonder if Ostapirat was aware of Brown's six-vowel system.
*In Thai, vowels have no length distinctions before zero (i.e., in open syllables) and glottal stops. There are no minimal pairs like
-V vs. -VV
-Vʔ vs. -VVʔ
I regard diphthongs as single Vs. I interpret vowels in open syllables and before glottal stops as
/-VV/ vs. /-V/ = [-VV] vs. [-Vʔ]
The final glottal stop is predictable after a short vowel.
10.10.25.1:05: VISUALIZING JAPANESE VOWEL FREQUENCY - THEN AND NOW
In 1999, I counted the frequency of phonemes in the Old Japanese poetry in Kojiki (712 AD). Here are the vowel phonemes arranged from left to right by frequency according to the statistics in my 2003 book:
a is number one as in Sanskrit or Belarusian, though it is not as huge, since OJ /a/ can only be from *a. Proto-Japonic *e and *o have sometimes raised to OJ /i/ and /u/, whereas *e and *o merged with a in Skt and Br*.
Front and back vowels grouped together in Skt and Br, whereas in OJ, high front and back vowels (/i u/) precede mid front and back vowels (/o e/). /o/ may slightly outnumber /e/ since it is partly from earlier *ə after labials: e.g., /wo/ < *wə.
Unlike most scholars working on Old Japanese, I regard one of the traditional eight 'vowels' ('type B e') as a vowel-glide sequence /əy/. If regarded as a distinct vowel, /əy/ would be almost as tiny as high central /ɨ/ at the end. (Compare the last place ranking of OJ /ɨ/ with the middle ranking of Br ы [ɨ].) I have added the figure for /əy/ to /ə/, but subtracting /əy/ would make no visible difference.
I don't have any figures for modern Japanese, but I can create a very rough ranking by converting OJ vowels into their modern equivalents:
OJ /a i e o/ > no change
OJ /u/ > ɯ (but romanized as u)
OJ /ə/ except before -y > o
OJ /əy/ > e
OJ /ɨ/ > i
The above mergers result in a five-vowel system:
Ignore the dot of i when comparing it to o.
The actual frequency of o may be even higher since Japanese has developed many -ou from earlier /apu/ and -(y)ou from earlier /epu/. But I doubt that o is as common as a, though it may be a close second.
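The conversion of Old Japanese counts into a rough modern ranking, using the merger rules listed above, can be sketched as follows. The counts in the usage example are placeholders that I made up to show the mechanics; they are not the Kojiki figures, and the function name is hypothetical.

```python
from collections import Counter

# OJ phoneme -> modern vowel, following the rules listed above.
# /u/ is kept as "u" (its romanization) even though it is phonetically [ɯ].
MERGERS = {
    "a": "a", "i": "i", "e": "e", "o": "o",
    "u": "u",
    "ə": "o",   # /ə/ except before -y
    "əy": "e",  # the vowel-glide sequence counted separately
    "ɨ": "i",
}

def modernize(oj_counts: dict[str, int]) -> dict[str, int]:
    """Add up OJ counts under the modern vowel each phoneme merged into."""
    modern = Counter()
    for vowel, n in oj_counts.items():
        modern[MERGERS[vowel]] += n
    return dict(modern)

# Placeholder counts only, NOT the real OJ statistics:
toy = {"a": 5, "i": 4, "u": 3, "ə": 3, "o": 2, "e": 1, "əy": 1, "ɨ": 1}
print(modernize(toy))
```

This ignores later developments such as the new -ou sequences mentioned above, so it can only underestimate modern o.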
*Strictly speaking, in Belarusian, *e became ja ~ ʲa, not a.
10.10.24.15:12: VISUALIZING BELARUSIAN VOWEL FREQUENCY
Unlike Russian vowels, Belarusian vowels are written phonemically, not etymologically*. Here are the vowels of Belarusian arranged from left to right by frequency:
а [a] is in a class of its own, followed by front vowels, the high central vowel ы [ɨ], and back vowels. Thus Belarusian has the same a > front > back pattern as Sanskrit.
э [e] is slightly more common than і [i], though this is hard to see. (Disregard the dot atop і [i] when comparing its height to that of э [e].)
Similarly, у [u] is slightly more common than о [o].
I have used only one Belarusian letter per vowel. In the Belarusian alphabet, vowels preceded by j- or palatalized consonants with ʲ- have separate letters:
Belarusian letter without j- | Belarusian letter before j- or ʲ- |
а [a] | я [ja] ~ [ʲa] |
э [e] | е [je] ~ [ʲe] |
і [i] ~ [ʲi] (corresponding to Russian и) | (no [ji]; [ʲi] is і) |
ы [ɨ] | (no [jɨ]) |
у [u] | ю [ju] ~ [ʲu] |
о [o] | ё [jo] ~ [ʲo] |
My frequency graphic uses the non-j-letters to represent the vowels: e.g., а represents the [a] in both а [a] and я [ja]. No figure was listed for ё [jo], so о [o] may be smaller than it should be.
*Written Russian е and о are respectively pronounced like и /i/ and а /a/ when unstressed, whereas Belarusian is pronounced as written: e.g.,
Russian spelling | Russian phonemics | Belarusian spelling | Belarusian phonemics | |
sister (nom. sg.) | сестра | /sʲistrá/ | сястра | /sʲastrá/ |
wall (nom. sg.) | стена | /stʲiná/ | сцяна | /scʲaná/ |
him/it (acc. sg.) | его | /jivó/ | яго | /jaɣó/ |
window (nom. sg.) | окно | /aknó/ | акно | /aknó/ |
head (nom. sg.) | голова | /galavá/ | галава | /ɣalavá/ |
table (gen. sg.) | стола | /stalá/ | стала | /stalá/ |
Stressed vowels are in bold.
Belarusian at first looks like phonemically spelled Russian, but the differences between the two languages are not just orthographic: e.g., 'language' is мова in Belarusian but язык in Russian. Even cognate words are subtly different. Only two out of the six Belarusian words above are homophonous with their Russian cognates. Shifts of *e and *o to /a/ have boosted the number of /a/ in Belarusian as well as in Sanskrit.
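The count of homophonous cognates can be verified directly from the phonemic columns of the table above. The pairs below are copied from that table and keyed by the Russian spelling; the variable names are my own.

```python
# Russian spelling -> (Russian phonemics, Belarusian phonemics), from the table above.
PAIRS = {
    "сестра": ("sʲistrá", "sʲastrá"),
    "стена": ("stʲiná", "scʲaná"),
    "его": ("jivó", "jaɣó"),
    "окно": ("aknó", "aknó"),
    "голова": ("galavá", "ɣalavá"),
    "стола": ("stalá", "stalá"),
}

# Words whose Russian and Belarusian phonemic forms are identical.
homophones = [word for word, (ru, be) in PAIRS.items() if ru == be]
print(homophones)  # ['окно', 'стола']
```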