When I was walking across the University of Hawaii campus with Bob Blust around 1996, he asked me if I thought Sino-Tibetan reconstruction methodology was fundamentally different from the 'standard' reconstruction methodology of Indo-European or of his specialty, Austronesian. I answered no, and he seemed to like that answer. But now I'm not so sure.

Last week, David Boxenhorn told me that Proto-Indo-European (PIE) reconstructions

- are adequate to express what people want to say about language relationships

- "are largely reinterpretations of the same data"

- "have the advantage of being well-known"

I don't think the same can be said for Sino-Tibetan (ST) or even Chinese. At least not yet.

All language reconstructions symbolize language relationships. To reconstruct a proto-language is to assume a relationship between its descendants. Conversely, my unwillingness to reconstruct, say, 'Altaic' indicates I don't think there is a genetic relationship between the 'Altaic' languages.

The 'distance' between a reconstructed proto-language and its presumed descendants is a gauge of innovation. And shared innovations are my preferred basis of subgrouping.

So far I know of only one large-scale Proto-Sino-Tibetan (PST) reconstruction: the 494-word list of Coblin (1986). Superficially it looks like a mixture of Chinese and Tibetan, which is what a nonspecialist might expect given the name 'Sino-Tibetan'. (It was largely based on Chinese and Tibetan data; many other languages were cited only once or twice.) But is such a blend sufficient to represent the relationships between the many ST languages? (Ethnologue lists 460 ST languages; whatever number one uses, there is no doubt that the vast majority are not in the same branches of ST as Chinese and Tibetan.) The internal complexity within ST may be so great that ST might be comparable to 'Nostratic' (whose existence is debatable) rather than Indo-European.

Nearly a decade later, Gong (1994) published a shorter list of Proto-Sino-Tibetan reconstructions which, as Coblin (2011) noted, were

identical in most of its details with his Old Chinese [reconstructions]. This then points to a tacit conclusion about the nature of Sino-Tibetan and early Chinese, i.e., that Common Sino-Tibetan [i.e., Proto-Sino-Tibetan] was, phonologically at least, virtually the same language as Old Chinese.

There is no Indo-European or Austronesian reconstruction like Coblin's or Gong's PST which strongly resembles a daughter language. While it is true that the earliest version of Schleicher's fable looks like Sanskrit, no modern PIE reconstruction could be mistaken for Sanskrit at a glance. It is not a priori impossible that one branch of a language family could be extremely conservative: e.g., Baltic. Even so, the degree of conservatism (Sinocentrism?) in Gong's PST reconstruction is exceptional.

Why are PST reconstructions so Sino and so Tibetan (and not so much Kiranti)? The answer lies in another difference between IE and ST reconstruction.

There are several key languages for PIE reconstruction, and all are well-known - down to the most minute phonetic detail in the case of Sanskrit. Some are even of considerable time depth.

On the other hand, nearly all ST languages are attested only from the last millennium, with the obvious major exception of Chinese, whose script is somewhat opaque (more on that problem below).

The temptation to follow Coblin and Gong's examples and stick to languages with early written records (Chinese, Tibetan, Burmese, and in Gong's case, Tangut) for ST reconstruction is strong. However, the possibility that languages without such records could be of considerable value for reconstruction cannot be denied. Age does not entail archaism: e.g., Sanskrit long ago lost *e and *o which partly survive in Italian today:

'seven': PIE *septm > Skt sapta but It sette

'night': PIE *nokʷt- > Skt nakta- but It notte

(Of course, there's no need to use Italian for PIE reconstructions since we have Latin with septem and nox. Nonetheless, the point remains that older is not better in every way.)
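The vowel comparison above can be spelled out mechanically. The following is only a toy sketch: the forms are the ones cited in this post, in this post's spelling, and the helper names (`first_vowel`, `preserves_root_vowel`) are invented for illustration. It checks nothing more than whether a daughter form keeps the first vowel of the PIE form.

```python
# Toy check of "age does not entail archaism", using only the forms cited above.
# The dictionary layout and helper names are illustrative assumptions.
COGNATES = {
    "seven": {"PIE": "septm", "Sanskrit": "sapta", "Italian": "sette"},
    "night": {"PIE": "nokʷt-", "Sanskrit": "nakta-", "Italian": "notte"},
}

VOWELS = set("aeiou")


def first_vowel(form: str) -> str:
    """Return the first plain vowel letter in a form."""
    for ch in form:
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel in {form!r}")


def preserves_root_vowel(gloss: str, language: str) -> bool:
    """True if the daughter form has the same first vowel as the PIE form."""
    forms = COGNATES[gloss]
    return first_vowel(forms[language]) == first_vowel(forms["PIE"])


for gloss in COGNATES:
    for language in ("Sanskrit", "Italian"):
        print(gloss, language, preserves_root_vowel(gloss, language))
```

For both words the much older Sanskrit form comes out as False and the modern Italian form as True, which is all the example is meant to show.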

If Austronesianists were unaware of the Formosan languages, their Proto-Austronesian would really be Proto-Malayo-Polynesian: i.e., Austronesian minus Formosan. Could there be Sino-Tibetan analogues of Formosan languages whose value for reconstruction has yet to be recognized?

It would be unthinkable to reconstruct Proto-Austronesian today without reference to Formosan, which is part of a core body of data that any Proto-Austronesian reconstruction must account for. Proto-Austronesian reconstructions are, to use David's words, "largely reinterpretations of [this] same data". Similarly, PIE reconstructions are "largely reinterpretations" of a fixed set of languages: Sanskrit, Greek, Latin, etc.

Coblin and Gong's PST reconstructions are not based on the same data. Coblin cast a wider net than Gong, Gong included Tangut and Coblin did not, and even the Old Chinese that they use is different: Coblin used Li Fang-Kuei's reconstruction whereas Gong used his own reconstruction which is similar but not identical to Li's.

The reconstruction of Chinese is a problem unlike anything in Indo-European. Despite a wealth of evidence, there is no consensus on Old or Middle Chinese phonology; different scholars' reconstructions of Old Chinese can look like different languages, and conversion between reconstructions is not always easy.

Some reconstructions of 道 'road' (now dao in Mandarin):

Karlgren (1957): *d'ôg

Li Fang-Kuei: *dəgwx (from Schuessler 1987: 115)

Schuessler (1987): *gləwʔ

Starostin (1989): *lhūʔ

Gong (1994): *'ləmx

Baxter and Sagart (2014): *[kə.l]ˤuʔ

This site: *Cʌ-luʔ (*C- might be *q-)

Reconstructions of PIE, Proto-Austronesian, etc. are not constrained by a wealth of philological material, whereas the structure of Chinese characters, rhymes in poetry and dictionaries, and the rhyme tables combine into one gigantic phonetic algebra problem whose solutions vary depending on one's use of clues scattered throughout East Asia: modern Chinese languages, loanwords in non-Chinese languages, and transcriptions of Chinese in non-Chinese scripts and vice versa. Simply applying the comparative method is not enough; it cannot take us as far back as Old Chinese, just as applying the comparative method to modern Romance languages cannot restore all the details of Latin. One must also master traditional Chinese phonology and try to make sense of it in modern terms while also constantly looking outside the Chinese box for guidance.

The usual starting point for reconstruction is Middle Chinese. Like Pulleyblank (1994: 164), I believe in

[g]etting Middle Chinese right. What I mean by this, of course, is not that we should wait until we are one hundred percent certain about every detail of the phonological systems underlying the Qièyùn and the rhyme tables before looking at Old Chinese, simply that we should get it as right as we possibly can. That means, in my view, eliminating basic errors in Karlgren's system that inevitably get projected back and distort our views on the earlier stages of the language.

Those basic errors may not only be projected backward into Old Chinese but also outward into reconstructions of other languages recorded in Chinese characters during the Middle Chinese period. An error in PIE reconstruction has no similar chain reaction effects; Nostratic and the like aside, PIE is as far as Indo-Europeanists go, and the reconstruction of neighboring proto-languages is not dependent on PIE.

We are far from a PST reconstruction that has wide acceptance. Neither Coblin's nor Gong's reconstruction is frequently cited - unlike PIE reconstructions that have even made it into the American Heritage Dictionary. There will be no "well-known" PST reconstruction until the controversies in Chinese reconstruction settle down or Chinese ceases to play a crucial role. The second scenario is particularly unlikely.

In short, ST/Chinese reconstruction methodology is different from the IE/Austronesian 'norm', and in a way the stakes are higher, as reconstructed Chinese is also the key to the reconstruction of other languages. None of that means that the basic principles of linguistics (e.g., the regularity of sound change) fail to apply in the East. The differences are necessitated by the nature of the extant written evidence, not some inherent exotic quality of Sino-Tibetan speech. If early Indo-European had its own tradition of phonological analysis and were in a complex script, and if early Uralic and Basque and extinct languages like Etruscan were transcribed in that script, then what I wrote about Chinese would also apply to Indo-European.

WHY DO I RECONSTRUCT THE TANGUT LIGHT LABIAL AS V-?

In my last entry, I wrote that the lip-rounding of Tangut shibilants

was sufficient for them to behave like Class II v- and Class IX l- which may have been partway between [ɫ] and [w]

but did not explain why I reconstructed the Class II initial* as labiodental v- instead of bilabial w- which would be the obvious choice given the lip-rounding of the shibilants. Below I list three arguments for v-. In isolation, they may not be convincing, but together ... I'll admit I'm not 100% convinced myself, so I've followed them with three counterarguments.

1. It's Class II, not Class VIII

I'll start with the most obvious argument.

The Tangut had adopted the traditional Chinese classification of initial consonants. In the Chinese phonological tradition, *w- was considered to be a glottal initial (喉音). In Tangut, glottal initials are Class VIII. Like Gong, I reconstruct ʔw- but not w- in Class VIII. This fanqie shows how ʔw- was analyzed as ʔ- + w-:


2094 1ʔwĩ = 3003 2ʔɨu + 0209 1lwĩ

That fanqie would not make sense if the initial of 2094 were w- without a glottal stop.
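The way that fanqie works can be sketched in a few lines. This is only an illustration under my own assumptions: syllables are split by hand into tone, initial, and final (with the medial -w- counted as part of the final), and the `Syllable` class and `fanqie` function are hypothetical names, not anything from the Tangut sources. The first speller contributes its initial; the second contributes its final and tone.

```python
from typing import NamedTuple


class Syllable(NamedTuple):
    """A hand-segmented syllable; the segmentation is an assumption."""
    tone: int      # tone number as written before the syllable (1 or 2)
    initial: str   # initial consonant only
    final: str     # everything after the initial, including any medial -w-

    def __str__(self) -> str:
        return f"{self.tone}{self.initial}{self.final}"


def fanqie(upper: Syllable, lower: Syllable) -> Syllable:
    """Combine the upper speller's initial with the lower speller's final and tone."""
    return Syllable(tone=lower.tone, initial=upper.initial, final=lower.final)


upper = Syllable(2, "ʔ", "ɨu")   # 3003 2ʔɨu
lower = Syllable(1, "l", "wĩ")   # 0209 1lwĩ

target = fanqie(upper, lower)
print(target)  # 1ʔwĩ
```

The result matches 2094 1ʔwĩ only because the upper speller starts with ʔ-. A target with plain w- would need an upper speller whose own initial was w-.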

Gong reconstructed w- in Class II. However, I prefer to reconstruct v- because that class corresponds to 'light labials' (labiodentals) in the Chinese phonological tradition.

There is, however, no guarantee that all 'light labials' were labiodental in the Chinese dialect known to the Tangut. In fact, I think that one 'light labial' in that dialect was *w-, as I'll explain below. So an argument based on labels is the weakest of all.

2. Tibetan transcription

Tibetan has a letter for w- but no letter for v-. If Tangut had w-, it should have been transcribed as w-. However, there are many different Tibetan transcriptions of the Class II initial (Tai 2008: 177-178):

Tibetan transcription Frequency
d-w- 19
w- 16
yw- 5
b-w- 4
ww- 2
b-, b-?-, d-?-, ny-, H-?, H-bh-, wy-, yww- 1 each

Nishida (1964: 82) also found a transcription wh-. Unfortunately its frequency is unknown.
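To get a rough sense of how mixed the transcriptions are, the counts from Tai's table can be tallied. This is just arithmetic on the figures listed above; the variable names are mine, and Nishida's wh- is left out because its frequency is unknown.

```python
# Frequencies of Tibetan transcriptions of the Class II initial,
# copied from the table above (Tai 2008: 177-178).
counts = {
    "d-w-": 19,
    "w-": 16,
    "yw-": 5,
    "b-w-": 4,
    "ww-": 2,
}

# Transcriptions attested once each, spelled as in the table.
singletons = ["b-", "b-?-", "d-?-", "ny-", "H-?", "H-bh-", "wy-", "yww-"]
for transcription in singletons:
    counts[transcription] = 1

total = sum(counts.values())
share_plain_w = counts["w-"] / total

print(f"{len(counts)} distinct transcriptions, {total} tokens")
print(f"plain w- accounts for {share_plain_w:.1%} of tokens")
```

On these figures, plain w- makes up under a third of the 54 tokens, which is the pattern the rest of this section tries to explain.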

d- may indicate a tone.

See section 5 below for yw- and wy-.

b-C- sequences transcribed Tangut Cw- sequences, so b-w- may have been equivalent to ww-. The doubling of w- and the transcriptions with stops (b-, H-bh-) may indicate that the Class II initial had more friction than w-: i.e., that it was v-. (b- and H-bh- normally transcribed Tangut b- which may have been prenasalized [mb]. H- before an obstruent indicated prenasalization in Tibetan.)

All of the above assumes that Tibetan w- was [w] in the dialect(s) of the transcribers. It is possible that w- was [v] in those dialects and that ww-, etc. were attempts to write [w]! But such a scenario cannot account for the data in the next section.

3. Chinese transcription

Both the Class II initial and Class VIII ʔ- (followed by rhyme 1 -u) were used to transcribe the Chinese 'light labial' traditionally known as 微 (Gong 2002: 436-437, 444). I interpret this to mean that Tangut v- and ʔu were the closest available approximations of Tangut period northwestern Chinese *w-. v- wasn't bilabial, but at least it lacked a glottal stop. Conversely, ʔu was close to w- apart from the glottal stop.

Moreover, the Class II syllable

2467 1vɨạ 'flower'

was transcribed as Chinese *fɨa in the Pearl (with a diacritic indicating the Tangut initial wasn't simply f-) and conversely,

1360 1va 'to hide'

transcribed Chinese 發 *fɨa in the Forest of Categories (Gong 2002: 437). These transcriptions make more sense if the Class II initial was v-. If. Here's why I'm not so sure.

4. A problematic fanqie

This Tangraphic Sea fanqie may imply that the Class II initial was w-:


0072 1võ (Class II initial in Homophones) = 1097 2ʔu (Class VIII initial in Homophones!) + 0365 1kwõ

(12.8.0:22: 0072 'to wish' is probably a borrowing from Chinese 望 *wɨõ which never had a glottal stop.)

But perhaps 0072 had different initials in the Tangraphic Sea and Homophones dialects. The Tangraphic Sea initial ʔw- could contain a prefix plus root initial v- (/w/ which became a fricative in initial position). If 0072 had a Class II initial in both dialects, maybe that initial was a labiodental glide ʋ- which is more u-like (i.e., has less turbulence) than v-.

5. Tibetan transcription revisited

The yw-, wy-, and yww-transcriptions before front vowels make me wonder if the Class II initial was palatalized vʲ- or even labiopalatal ɥ- in at least some environments. Could the 'vigilant' initials associated with Grade III be palatal?

ɥ- tɕ- tɕh- dʑ- ɕ- ʑ- λ- (instead of v/ʋ- tʂ- tʂh- dʐ- ʂ- ʐ- ɫ-)

The problem with the palatal interpretation of the 'vigilant' initials is that it clashes with the fact that Grade III is nonpalatal. Maybe nonpalatals became palatal in the dialect(s) transcribed in Tibetan.

6. Sanskrit transcription

Sanskrit v- is transcribed with Tangut ph- and b- as well as v-. If Tangut had w- instead of v-, b- would be an understandable approximation of v-. But what about ph-?

I suspect that the b- and ph-transcriptions reflect Sanskrit borrowed through Chinese. Late Old Chinese and Early Middle Chinese did not have *v-, so Sanskrit v- was transcribed as *b- which later became *ph- in the Chinese dialect known to the Tangut.

Part of the variation may be in the Sanskrit itself, as Sanskrit v "is often confounded and interchanged with the labial consonant b" (Monier-Williams 1899: 910).

In any case, Tangut-internal variation (v-ariation?) may cloud our picture of the Class II initial(s?).

*There is no consensus on the number of Class II initials. Sofronov (1968 II: 69-70) identified five fanqie chains for Class II initials on the basis of the surviving two-thirds of the Tangraphic Sea. It is possible that further chains may be in the lost 'rising' tone volume. In any case, it is possible to reconstruct up to five different Class II initials, assuming that each fanqie chain has a unique initial. Nishida (1964: 84) reconstructed four Class II initials (f- v- ɱv- w-) whereas I only reconstruct v-. The focus of this post is on the phonetic quality of v-, not on the number of Class II initials. I should argue against reconstructing multiple Class II initials elsewhere.

