Here's another example of what happens when we reconstruct proto-languages without knowing all the facts - which is what we do all the time!
Only the Indo-Aryan branch of Indo-European has voiced aspirates like dh. Suppose we didn't even know they existed. We would see the following basic pattern among dental/alveolar stops:
|Proto-Indo-European||Old Irish||Gothic||Latin||Greek||Old Church Slavonic||Hittite||Avestan|
The traditional Proto-Indo-European reconstructed sources of these stops are *t, *d, *dh which are identical to their Sanskrit reflexes. Would anyone have come up with *dh for *C3 if Sanskrit were unknown? Would *C3 have been reconstructed as, say, *ð, which would have
- devoiced, hardened, and merged with t in Hittite
- devoiced to θ and hardened to Greek th (which later lenited to θ)
- shifted to f in Latin (cf. my Hawaii pronunciation of bathe as [bejf]).
- hardened and merged with d elsewhere - causing a chain shift in Germanic (ð > d > t > θ)?
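The Germanic chain shift in the last bullet only preserves the three-way contrast if the steps do not feed each other. Here is a rough Python sketch of that point. The function names and the bare segment string "tdð" are hypothetical illustrations, not reconstructed words.

```python
# Hypothetical illustration of the chain shift ð > d > t > θ.
# Each segment is mapped once, so the three series stay distinct.
CHAIN_SHIFT = str.maketrans({"ð": "d", "d": "t", "t": "θ"})


def shift_simultaneously(segments: str) -> str:
    """Apply all three steps at once (no step feeds another)."""
    return segments.translate(CHAIN_SHIFT)


def shift_sequentially(segments: str) -> str:
    """Apply the steps one after another in the order ð > d, d > t, t > θ.

    In this order each output feeds the next rule, so all three
    series collapse into θ.
    """
    for old, new in (("ð", "d"), ("d", "t"), ("t", "θ")):
        segments = segments.replace(old, new)
    return segments


print(shift_simultaneously("tdð"))
print(shift_sequentially("tdð"))
```

The simultaneous version stands in for a chain whose steps happened in an order that avoided mergers (t > θ first, then d > t, then ð > d); the sequential version shows what the opposite order would have done.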
I recall that David Stampe suggested *ð for *C3 in a class I took from him. Are there any languages whose dh < *ð?
11.6.1:33: Some UPSID statistics of interest:
1. A t-d-ð system without θ is unlikely, but not impossible. UPSID lists 12 languages with ð but without θ. 5 (Cubeo, Dahalo, Koiari, Nganasan, Tacana) have t-d-ð and the remaining 7 have t-ð.
2. All but one of the UPSID languages with dh are on the Indian subcontinent, and none contain ð. (The outlier is Igbo. The UPSID and Wikipedia descriptions of Igbo consonants do not match. Wikipedia doesn't list any voiced aspirates.) dh is in only 9 languages in UPSID, whereas ð is in 22 languages. Is it likely that ð would have hardened to a rarer dh? 106 languages have th, so PIE *ð > Greek th would be a shift from marked to less marked.
There is an unspoken assumption that we have sufficient extant material to reconstruct lost proto-languages. This is clearly false. Suppose Latin were completely lost. It would not be possible to reconstruct the Latin case system from modern Romance languages, not even Romanian, which still has a case system: e.g., 'wolf':
(Romanian only distinguishes nom./acc. from gen./dat. if the definite article suffix is attached: lup-ul [nom./acc.], lup-ului [gen./dat.; -ului < Lat illui], lup-ule [voc.])
Similarly, it would not be possible to reconstruct the Sanskrit case system from modern Indo-Aryan languages: e.g., 'village':
I presume Bengali graam- is a borrowing from Sanskrit. Does Bengali have a descendant of graam- corresponding to Hindi gããv < graam-?
Hindi has lost all case distinctions in the singular of this particular masculine noun (though others have a two-way distinction).
The Bengali endings -ke and -er are not derivable from Sanskrit. Bengali locative -e looks too good to be a true preservation.
Proto-languages are approximations at best, and their accuracy is dependent on what happened to survive as well as the skill of historical linguists. No one can reconstruct what is completely gone.
The above came to mind as I realized I had left out one other (highly unlikely) scenario from last night's list: what if Tangut vowel length reflected final voiced stops that were lost in all other Sino-Tibetan languages? E.g.,
|Proto-Sino-Tibetan||Tangut||gDong-brgyad rGyalrong||Written Tibetan||Written Burmese||Old Chinese|
|*-ag, *-ad, *-ab?||-aa||-a||-a||-a||*-a|
But is it really likely that Tangut is the only one out of hundreds of ST languages that preserved any remnant of this feature? Moreover, even if it did, it would still be impossible to reconstruct which voiced stop was the source of length in any given morpheme since no other ST language has a trace of them.
11.5.3:40: Reconstructing PST *-aa as a source of Tangut -aa and non-Tangut -a would be reasonable based solely on the five languages in the table. However, if each and every correspondence pattern between the hundreds of ST languages were the basis of a proto-rhyme, PST would have an impossibly huge number of rhymes. Proto-languages are still languages and are subject to the same constraints as attested languages. I am hesitant to reconstruct proto-traits on the basis of a single language (Tangut) whose phonetics remain uncertain: e.g., Gong's and my -aa corresponds to Arakawa's -a' ([aʔ]?), Sofronov's -aɯ, Nishida's -ɑw, etc.
I mentioned final consonants as a source of Tangut vowel length at the end of my last post. Here's a quick test of that hypothesis. Guillaume Jacques has compiled a list of gDong-brgyad rGyalrong (GBR)-Tangut cognates. If the simplest version of my hypothesis is correct, Tangut long vowel rhymes should only correspond to GBR rhymes ending in consonants. But that is clearly not the case because there are counterexamples: e.g.,
|Correspondence type||GBR gloss||GBR||Tangut||Tangut gloss if different||Notes (WT = Written Tibetan; OC = Old Chinese)|
|1||to cook||kɤ sqa||1ɣii|| ||ɣ- < *Cɯ-q-; Cognate to 1ɣɪ̣ 'to cook' with a short vowel|
|2a||girl||tɯ me||1miəə||woman||Cognate to 1miẹ 'woman' with a short vowel|
|2b||tail||tɤ jme||1miee|| ||cf. OC 尾 *məjʔ|
|2c||four||kɯ βde||1lɨəəʳ|| ||cf. WT bzhi < *blyi, OC 四 *shli(t)s|
|3a||clear (water)||kɯ mgri||1gii||clear|| |
|3c||name||tɤ rmi||2miee|| ||cf. WT ming, OC 名 *meŋ|
|4a||to sit||kɤ mdzɯ||2dzəəu|| ||OC 坐 *dzojʔ has the same initial but its rhyme is too different|
|4b||soft||kɯ mpɯ||1vəə|| ||v- < *Cɯ-p-|
|4c||to steal||kɤ mɯ rkɯ < *-u||1kwiəəʳ|| ||cf. WT rku-, OC 寇 *khos|
|5||dream||tɯ jmŋo < Proto-rGyalrong *-aŋ||1miee|| ||cf. WT rmang lam, OC 夢 *məŋs|
|6||to welcome||kɤ qru||1khʊʊ|| ||kh- < *qh- (< *qʁ- < *qr-?)|
(The table lists all correspondence types but not all examples of each type.)
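The quick test can be sketched in Python using only the pairs in the table. This is a rough sketch: the vowel inventories, the "doubled vowel letter = long vowel" convention, and all names are assumptions made for illustration, and the check looks only at the attested surface GBR forms, not at what their ancestors may have ended in.

```python
import re

# (GBR form, Tangut form) pairs copied from the table above.
PAIRS = [
    ("kɤ sqa", "1ɣii"),
    ("tɯ me", "1miəə"),
    ("tɤ jme", "1miee"),
    ("kɯ βde", "1lɨəəʳ"),
    ("kɯ mgri", "1gii"),
    ("tɤ rmi", "2miee"),
    ("kɤ mdzɯ", "2dzəəu"),
    ("kɯ mpɯ", "1vəə"),
    ("kɤ mɯ rkɯ", "1kwiəəʳ"),
    ("tɯ jmŋo", "1miee"),
    ("kɤ qru", "1khʊʊ"),
]

# Assumption: these are the only vowel letters in the GBR forms above.
GBR_VOWELS = set("aeiouɯɤ")
# Assumption: a Tangut long vowel is written as a doubled vowel letter.
LONG_VOWEL = re.compile(r"([aeiouəɨʊɪɛ])\1")


def has_long_vowel(tangut: str) -> bool:
    return LONG_VOWEL.search(tangut) is not None


def ends_in_consonant(gbr: str) -> bool:
    return gbr[-1] not in GBR_VOWELS


# A counterexample is a Tangut long vowel facing a vowel-final GBR form.
counterexamples = [
    (gbr, tangut)
    for gbr, tangut in PAIRS
    if has_long_vowel(tangut) and not ends_in_consonant(gbr)
]

print(len(counterexamples), "of", len(PAIRS), "pairs are counterexamples")
```

Every pair in the table comes out as a counterexample, which is to be expected since the table was built to list them.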
OC *-ʔ and *-s may or may not be parts of roots.
The roots of 'name' and 'dream' originally had a final consonant *-ŋ lost in both GBR and Tangut. Final *-ŋ does not necessarily guarantee a Tangut long vowel: e.g., the Tangut cognates of GBR mbro < *-aŋ are 1rieʳ and 2riaʳ with short vowels.
Some possible explanations for the counterexamples:
1. Tangut had some other feature instead of vowel length: e.g., Arakawa reconstructed final -' (phonetically [ʔ]). Tangut could have kept this feature whereas GBR lost it. Note that Arakawa's -' does not consistently correspond to OC *-ʔ.
2. Tangut retained a vowel length distinction lost in the other languages.
3. Tangut long vowels are partly or wholly from suffixes unique to Tangut. Hence 1ɣii 'to cook' and 1ɣɪ̣ 'to cook' may share a common root *qi with different affixes:
*Cɯ-qi-C > 1ɣii (why not 1ɣ̣ɪɪ?)
*Sɯ-qi > 1ɣɪ̣
Similarly, 1miəə and 1miẹ 'woman' may share a root *m-:
*mə-C > 1miəə
*Sɯ-me > 1miẹ
I am not convinced Tangut had long vowels since Sanskrit long vowels were transcribed with Tangut short vowels plus the tangraph 'long' (Grinstead 1972: 68) rather than with Tangut 'long' vowels. That implies that Tangut 'long' vowels were distinguished by some other feature.
10.11.2.23:59: A FORM-AL BOW
(Dedicated to Petri Kallio.)
This Finnic sound change reminds me of one of the origins of modern Japanese long vowels:
The velar nasal *ŋ was vocalized to a semivowel in various positions (*joŋsi "bow" → jousi, *suŋi "summer" → suvi). In some cases further loss occurred (*müŋä "backside" → Estonian möö-, Finnish myö-).
Further examples can be found on pp. 232-233 of this paper by Petri.
Many Japanese long vowels are in borrowings from Chinese. These Sino-Japanese (SJ) long vowels have several types of sources: e.g., *V(p)u-sequences:
SJ *-a(p)u > -ou [oo]
SJ *-i(p)u > -yuu [jɯɯ]
SJ *-e(p)u > -you [joo]
SJ *-o(p)u > -ou [oo]
SJ *-uu > -uu [ɯɯ] (there was no *-upu)
Middle Chinese (MC) readings ending in *-ŋ are another source:
MC *-aŋ > SJ *-aũ > *-au > -ou [oo]
(There was no MC *-iŋ.)
MC *-uŋ > SJ *-uũ > -uu [ɯɯ]
MC *-eŋ > SJ *-eĩ > -ei [ee]
MC *-oŋ > SJ *-oũ > -ou [oo]
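The four MC *-ŋ developments above can be restated as two small steps: vocalize *-ŋ to a nasalized glide whose quality depends on the preceding vowel, then drop the nasalization (with *-au further becoming -ou). A minimal Python sketch, assuming single-vowel rhymes only; the function and table names are my own labels, not established terminology.

```python
TILDE = "\u0303"  # combining tilde, used here to mark nasalization

# Glide that *-ŋ turns into after each vowel, per the four lines above.
# There is deliberately no entry for "i" (there was no MC *-iŋ).
GLIDE_AFTER = {"a": "u", "u": "u", "o": "u", "e": "i"}

# Modern spelling -> modern pronunciation, as given in brackets above.
PHONETIC = {"ou": "oo", "uu": "ɯɯ", "ei": "ee"}


def nasal_stage(mc_rhyme: str) -> str:
    """MC *-Vŋ -> earlier SJ *-V + nasalized glide."""
    vowel, coda = mc_rhyme[:-1], mc_rhyme[-1:]
    if coda != "ŋ" or vowel not in GLIDE_AFTER:
        raise ValueError(f"no rule for MC *-{mc_rhyme}")
    return vowel + GLIDE_AFTER[vowel] + TILDE


def modern_spelling(mc_rhyme: str) -> str:
    """Earlier SJ nasalized rhyme -> modern SJ spelling."""
    oral = nasal_stage(mc_rhyme).replace(TILDE, "")  # denasalize
    return "ou" if oral == "au" else oral            # *-au > -ou


for rhyme in ("aŋ", "uŋ", "eŋ", "oŋ"):
    spelling = modern_spelling(rhyme)
    print(f"MC *-{rhyme} > SJ *-{nasal_stage(rhyme)} > -{spelling} [{PHONETIC[spelling]}]")
```

Note that *-aŋ and *-oŋ end up with the same modern rhyme, so the sketch cannot be run backwards: the merger erases which MC vowel was there.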
When I saw Proto-Finnic *joŋsi > Finnish jousi 'bow', I was reminded of modern SJ compounds pronounced youshi whose components were once like *yoŋ and *si: e.g.,
SJ 用紙 youshi 'form' < *yoũ (< MC *juoŋh 'use') + *si (< MC *tɕieʔ 'paper')
SJ 容姿 youshi 'appearance' < *yoũ (< MC *juoŋ 'appearance') + *si (< MC *tsi 'appearance')
However, note that the Finnic change is word-internal, whereas MC *-ŋ in medial or final position ends up being reflected in SJ as vowel length: e.g., the SJ reading for 'bow' in isolation is kyuu < MC *kɨwŋ.
11.3.1:06: The vocalization of a nasal coda (possibly *-ŋ) to a semivowel may also have occurred in Tangut if Gong's reconstruction of rhyme groups VII and XI is correct:
|Gong's Tangut rhyme group||Pre-Tangut||Gong's reconstruction||This site|
Compare the middle two columns with
MC *-eŋ > SJ *-eĩ > -ei [ee]
MC *-oŋ > SJ *-oũ > -ou [oo]
Perhaps these Tangut rhymes were like the earlier SJ rhymes with both nasalization and semivowels: -ẽj, -õw?
Gong reconstructed many Tangut long-vowel rhymes which I have more or less carried over into my reconstruction. These rhymes may also originate from lost consonantal codas:
*-VC > *-VG > -VV
10.11.1.21:41: THE GOLDEN GUIDE: LINE 89: TANGRAPHS 441-445
89. The first two surnames are uncommon in Chinese but may have been common in the Tangut Empire.
The first three tangraphs also represent three Chinese loanwords: 西 'west', 川 'river', and 凡 'ordinary', which were homophonous with the surnames 息, 傳, 范.
|Li Fanwen number||4293||1990||2052||5267||4710|
|My reconstructed pronunciation||1si||1tʃhwɨã||1xwiã||1lɨẽ||1lo|
|Tangraph gloss||west; (transcription of Chinese)||river; (transcription of Chinese)||ordinary; Sanskrit; (transcription of Chinese)||to tie; to take over; to contact; (transcription of Chinese)||(transcription of Chinese)|
|Word||the surname 息 Xi (*si)||the surname 傳 Chuan (*tʃhwɨã)||the surname 范 Fan (*fɨã)||the surname 廉 Lian (*liẽ)||the surname 羅 Luo (*lo)|
|Translation||Si, Chhwan, Hwan, Len, Lo|
441: Is the choice of 'wood' for a phonetic meant to imply 'Root West', the name of the indigenous Tangut religion?
4293 1si 'west' (boxdiljeu) =
4250 1si 'wood' (boxdexdexcok; phonetic) +
3226 1niəə 'to shine upon' (diljeu; cognate to Old Chinese 日 *nit 'sun'?; semantic - the setting sun?; why is jeu 'eight' on the right?)
442: 1990 is a semantic compound:
1990 1tʃhwɨã 'river' (cirdaicok) =
3058 2ziəəʳ 'water' (cirzaa) +
2474 2raʳ 'to flow' (dexdaidex) +
2107 1tsəiʳ 'earth' (giigircok)
443: The graphic analysis of 2052 makes no phonetic or semantic sense and may be arbitrary:
2052 1xwiã 'ordinary; Sanskrit' (biipikbel) =
1995 2məi 'the 巽 wind trigram ☴' (biidexdak) +
3695 2ziuʳ 'broom' (gempik; 'grass' + 'hand') +
1976 2bie 'gold; the 兌 marsh trigram ☱' (baebeldexbel)
11.2.2:40: 1xwiã 'Sanskrit' is from Chinese *fwɨã < Late Old Chinese *bramh < Skt Brahma.
444: 5267 may be a fanqie tangraph, even though the rhyme of 0535 is oral rather than nasal:
5267 1lɨẽ (transcription of Chinese) (pekcox) =
1661 1lɨĩ (transcription of Chinese) (bospek) +
0535 1ʃɨe 'according to' (bumguxcox)
445: Could the structure of 4710 be loosely based on its Chinese near-homophone 廊 *lõ 'porch; corridor'? The graphic analysis makes no phonetic or semantic sense and may be arbitrary:
4710 1lo (transcription of Chinese) (biobaepikheu) =
5045 1kwĩ 'gentleman' (< Chinese 君 *kwĩ) (biofeodex) +
3508 2bi 'prime minister' (baepikpik) +
5464 2ʒɛʳ 'to live, reside' (tiiheu)
10.10.31.21:29: THE GOLDEN GUIDE: LINE 88: TANGRAPHS 436-440
88. All five tangraphs below are Chinese transcription tangraphs.
|Li Fanwen number||5491||2771||1227||4329||5737|
|My reconstructed pronunciation||1xʊ||1phɛ||2ʃɨew||1xwĩ||1tshwe|
|Tangraph gloss||(transcription of Chinese)|
|Word||the surname 胡 Hu (*xəu)||the surname 白 Bai (*phɛ)||the surname 邵 Shao (*ʃɨew)||the surname 封 Feng (*fɨũ)||the surname 崔 Cui (*tshwe)|
|Translation||Hu, Phe, Shew, Hwin, Tshwe|
436: 5491 has a circular analysis:
5491 1xʊ 'the Chinese surname 胡 Hu (*xəu)' (halbilfir) =
4093 2xʊ 'a kind of tree' (boxhalbilfir) +
2796 2rieʳ 'the Tangut surname Rer' (bilhascin)
I don't understand why Chinese 胡 *xəu wasn't borrowed as 1xəu, a syllable which does exist in Tangut. 1xʊ is Grade II unlike Chinese 胡 *xəu and 1xəu which are both Grade I.
Were the Chinese Hu of the Tangut Empire somehow connected to the Tangut Rer?
4093 (analysis unknown) must consist of 'wood' atop its nearly homophonous phonetic 5491.
437: Were the Chinese Phe of the Tangut Empire (like the Hu above) somehow connected to the Tangut Rer?
2771 1phɛ 'the Chinese surname 白 Bai (*phɛ)' (dexhoecin) =
3366 1bɛ 'the Tangut surname Be' (dexhoe; phonetic) +
2796 2rieʳ 'the Tangut surname Rer' (bilhascin)
The radical hoe in 2771 and 3366 representing Pɛ-syllables may be based on Chinese 馬 *mbæ 'horse'.
438: 1227 2ʃɨew (analysis unknown) has no (near-)homophones with shared radicals. Could it be a semantic compound referring to characteristics of the Shao, or does it have cryptophonetics: components from tangraphs whose Chinese translations sounded like 2ʃɨew?
439: 4329 1xwĩ bears almost no phonetic resemblance to Chinese *fɨũ. Tangut had no f- or -ɨũ. Although the Tangut could have borrowed *f- as ph- (as Koreans do), they approximated it with xw-. -ĩ was the closest approximation of -ɨũ since Tangut -ɨ- could not follow a velar and Tangut -ũ could not follow a high vowel.
4329 1xwĩ 'the surname 封 Feng (*fɨũ)' (boxpikpik) =
4342 2dia 'perfective marker' (boxjaltun) +
5751 1dii 'to divide, distribute' (pikpik; 'hand' x 2)
The function of 4342 is unknown. Why would a perfective marker originating from a prefix indicating movement away from the speaker have box 'wood' on top and tun 'skin' on the bottom right?
5751 may be a cryptophonetic. Its Chinese translation was 分 *fɨũ, which was homophonous with the surname 封 *fɨũ in the dialect known to the Tangut.
440: 5737 is a fanqie tangraph. Note how bio in 4824 is reduced to half its width in 5737.
5737 1tshwe 'the surname 崔 Cui (*tshwe)' (pikbiohan) =
5760 1tshəu 'wide, thick' (pikquu) +
4824 1lwe 'rich, wealthy' (biohanpax)
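A fanqie tangraph like 5737 spells its reading with the initial of one source tangraph and the rhyme of the other. Here is a minimal Python sketch of that combination. The list of onset letters, the treatment of -w- as part of the rhyme, and the choice to take the tone from the rhyme source are all assumptions for illustration, and the function names are hypothetical.

```python
# Assumption: letters that can belong to an initial in the readings used here.
# Medial w is deliberately left out so that it stays with the rhyme.
ONSET_LETTERS = set("ptkbdgmnlrszxhvfʃʒɣŋ")


def split_syllable(syllable: str) -> tuple[str, str, str]:
    """Split e.g. '1tshəu' into (tone, initial, rhyme) = ('1', 'tsh', 'əu')."""
    tone, body = syllable[0], syllable[1:]
    i = 0
    while i < len(body) and body[i] in ONSET_LETTERS:
        i += 1
    return tone, body[:i], body[i:]


def fanqie(initial_source: str, rhyme_source: str) -> str:
    """Combine the initial of one reading with the tone and rhyme of another."""
    _, initial, _ = split_syllable(initial_source)
    tone, _, rhyme = split_syllable(rhyme_source)
    return tone + initial + rhyme


# 5737 = initial of 5760 1tshəu + rhyme of 4824 1lwe
print(fanqie("1tshəu", "1lwe"))

# 5267 = initial of 1661 + rhyme of 0535; the attested reading of 5267
# has a nasal rhyme, so this output is only a near match.
print(fanqie("1lɨĩ", "1ʃɨe"))
```

The second call yields an oral rhyme where the attested reading of 5267 is nasal, which is the mismatch noted under 444.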