Amaravati: Abode of Amritas

09.8.1.23:59: CMPRSSN N PHN RNG CHM

Last night I mentioned Sagart (1999: 15) citing Phan Rang Cham (Alieva 1994) in Vietnam as an example of a language with coexisting forms in various stages of contraction: e.g.,

Proto-Chamic *dahlaʔ > PRC tahlaʔ ~ thlaʔ ~ hlaʔ ~ laʔ 'I' (the resemblance to Old Chinese 余 *la must be coincidental)

After posting "Hidden Jin-ius", I found an article by Graham Thurgood on Phan Rang Cham citing this even more extreme example of variation from Blood (1962):

'new': perèw ~ prèw ~ phirèw ~ phrèw ~ firèw ~ frèw

According to Thurgood (2005: 3), Blood

notes that the scholars tend to maintain the full forms in speech, but, typically, non-scholars modify the first syllable, reducing its vocalism, subjecting it to assimilation, or losing it entirely.

I presume "scholars" refers to Phan Rang Cham who are well-versed in tradition rather than Phan Rang Cham who have Vietnamese or Western academic credentials but might not be conversant in their native language's conservative variety.

Thurgood (2005: 4) thinks ph ~ f variation is due to the influence of Vietnamese (in which *ph > [f], still spelled ph). But what accounts for p ~ ph variation, which has no parallel in Vietnamese? (Vietnamese doesn't even allow p- as an initial consonant.)

English has a few cases of similar variation - e.g.,

because: [bikʌz] ~ [bɪkʌz] ~ [bəkʌz] ~ [pkʌz] ~ [kəz] ~ [kz̩]

and: [ænd] ~ [æn] ~ [ən] ~ [n̩]

- but most words cannot be reduced to such a degree. Moreover, English has not yet developed the tones of Phan Rang Cham which are/were segmentally conditioned (Thurgood 2005: 6):

final glottal stop?	no	yes
*voiceless obstruent initial > voiceless obstruent initial + modal voice	mid tone	falling tone
*voiced obstruent initial > voiceless obstruent initial + breathy voice	low tone	rising tone

A grave accent indicates breathy voice in Thurgood's transcription. Thurgood transcribed Alieva's tahlaʔ as tə̀hlaʔ with a grave accent; the breathy voice in that word reflects an earlier voiced initial *d-.

Here's how four hypothetical Proto-Chamic syllables would have developed tones in Phan Rang Cham:

final glottal stop?	no	yes
*voiceless obstruent initial > voiceless obstruent initial + modal voice	*ta > ta + mid tone	*taʔ > taʔ + falling tone
*voiced obstruent initial > voiceless obstruent initial + breathy voice	*da > tà + low tone	*daʔ > tàʔ + rising tone
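
The table above amounts to a two-way lookup on initial voicing and final glottal stop. Here is a minimal sketch of that lookup, assuming a toy inventory: the function name `prc_reflex`, the three-stop devoicing map, and the vowel set are my own illustrative choices, not anything from Thurgood. It only handles hypothetical CV(ʔ) syllables like the four above.

```python
# Toy model of the tone table above. All names and inventories here are
# illustrative assumptions, not Thurgood's formalism.
DEVOICE = {"b": "p", "d": "t", "g": "k"}  # assumed voiced > voiceless stops
VOWELS = set("aeiouə")                    # assumed vowel set
GRAVE = "\u0300"                          # combining grave = breathy voice

def prc_reflex(proto):
    """Map a hypothetical Proto-Chamic CV(ʔ) syllable to (PRC form, tone)."""
    syllable = proto.lstrip("*")
    initial, rest = syllable[0], syllable[1:]
    has_glottal = rest.endswith("ʔ")
    if initial in DEVOICE:
        # *voiced initial > voiceless initial + breathy voice
        initial = DEVOICE[initial]
        rest = "".join(ch + GRAVE if ch in VOWELS else ch for ch in rest)
        tone = "rising" if has_glottal else "low"
    else:
        # *voiceless initial > voiceless initial + modal voice
        tone = "falling" if has_glottal else "mid"
    return initial + rest, tone

for proto in ["*ta", "*taʔ", "*da", "*daʔ"]:
    form, tone = prc_reflex(proto)
    print(proto, ">", form, "+", tone, "tone")
```

The breathy vowel is built with a combining grave accent, so it displays like the tà in the table even though it is two code points.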

I predict that Phan Rang Cham may eventually lose its glottal stops (and even breathy phonation?) but retain all four tones under the influence of tonal Vietnamese. Compare these four hypothetical syllables:

language	Proto-Chamic	Current Phan Rang Cham	Future Phan Rang Cham?
phonation	no	yes	yes or no
tones	no	yes	yes
glottal stops	yes	yes	no
voiceless obstruent initial	*ta	ta + mid tone	?ta + mid tone
voiceless obstruent initial + final glottal stop	*taʔ	taʔ + falling tone	?ta + falling tone
voiced obstruent initial	*da	tà + low tone	?tà or ta + low tone
voiced obstruent initial + final glottal stop	*daʔ	tàʔ + rising tone	?tà or ta + rising tone

8.2.0:32: Tonal contours could also change in the future, but I have kept them intact for simplicity.

09.7.31.23:59: HIDDEN JIN-IUS

Last night I forgot to mention one other example of an important variety of a big language. Back in 1996, I considered the 晉 Jin dialects to be Mandarin dialects rather than members of a separate branch of Chinese. So I didn't believe that there was anything really special about Jin. But I was wrong.

Jin has what Sagart (1999: 117) interprets as l-infixation, a modern remnant of Old Chinese (OC) *-r-infixation. Wikipedia provides the following examples from an unspecified Jin dialect:

'hop': 蹦 pəŋ > pəʔ ləŋ
'drag': 拖 tʰuɤ > tʰəʔ luɤ
'scrape': 刮 kua > kuəʔ la
'street': 巷 xɒ̃ > xəʔ lɒ̃
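
Whichever direction the history ran, the short and long forms above are related by a simple string operation: remove -əʔ from the end of the first syllable and l- from the start of the second. Here is a minimal sketch of that relation; the function name `contract` is hypothetical, and the rule is only checked against these four Wikipedia examples, not against Jin as a whole.

```python
def contract(long_form):
    """Derive the short form from a two-syllable Cəʔ lV(C) form.

    Purely mechanical: drops the -əʔ of the first syllable and the l-
    of the second. Makes no claim about which form is historically older.
    """
    first, second = long_form.split()
    if not (first.endswith("əʔ") and second.startswith("l")):
        raise ValueError("expected a form shaped like 'pəʔ ləŋ'")
    return first[:-2] + second[1:]

for long_form in ["pəʔ ləŋ", "tʰəʔ luɤ", "kuəʔ la", "xəʔ lɒ̃"]:
    print(long_form, "~", contract(long_form))
```

Note that the medial -u- of 'scrape' stays with the first syllable (kuəʔ), so the rule recovers kua without any special case.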

I wonder if in fact these are noncontracted and contracted descendants of OC words with liquids:

'hop': 蹦 OC ?*pʌ-ləŋs or ?*pʌ-rəŋs > pəʔ ləŋ ~ pəŋ
'drag': 拖 OC ?*tʌ-hlaj > tʰəʔ luɤ ~ tʰuɤ
'scrape': 刮 OC ?*kʷʌ-rat or ?*kʷʌ-r-lat > kuəʔ la ~ kua

'street': 巷 OC ?*N-kʌ-roŋs > xəʔ lɒ̃ ~ xɒ̃

Sagart (1999: 15) cites Phan Rang Cham (Alieva 1994) as an example of a language with coexisting forms in various stages of contraction: e.g.,

tahlaʔ ~ thlaʔ ~ hlaʔ ~ laʔ 'I' (the resemblance to Old Chinese 余 *la must be coincidental)

If I didn't know about these Jin forms, I would only reconstruct OC liquids with certainty in compressed versions of the last three words:

'hop': 蹦 OC ?*p(l/r)əŋs
'drag': 拖 OC *hlaj
'scrape': 刮 OC *kʷrat or *kʷrlat; its phonetic is 舌 *m-lat
'street': 巷 OC *groŋs

蹦 'hop' is not attested in Old Chinese and I cannot even find it in Middle Chinese (MC). The modern form could be from OC *pəŋs, *pləŋs, or *prəŋs. (I reconstruct *-s presuming that the word has the same tone category in Jin and in standard Mandarin.)

The OC presyllable ?*kʷʌ- in 刮 'scrape' has an impermissible initial in Sagart's (1999) system. I wonder if *kʷʌ-r- is a compression of an earlier *kor- with a nonhigh vowel conditioning 'emphasis'.

I reconstructed OC ?*N-kʌ- instead of *gʌ- to adhere to Sagart's (1999: 110) constraint against voiced stop initials in OC presyllables.

Sagart (1999: 132) mentions another potentially conservative feature of a Jin dialect. 孝義 Xiaoyi has

- a "glottal break" -ˀ- in the middle of Tone B syllables corresponding to MC and OC *-ʔ reconstructed as the source of Tone B

- "a weak -ʰ coda" at the end of Tone C syllables corresponding to MC *-h < OC *-s reconstructed as the source of Tone C

The MC and OC codas were reconstructed long before Xiaoyi was discovered within the last twenty years. (Guo Jianrong 1989 observed final -ʰ, but it's not clear whether Guo also discovered the "glottal break" described in Sagart [1999: 132].) Xiaoyi could be the Chinese equivalent of Hittite:

Hittite preserves some very archaic features lost in other Indo-European languages. For example, Hittite has retained two of three laryngeals (h2 and h3 word-initially). These sounds, whose existence had been hypothesized by Ferdinand de Saussure on the basis of vowel quality in other Indo-European languages in 1879, were not preserved as separate sounds in any attested Indo-European language until the discovery of Hittite.

8.1.0:06: It may not be necessary to reconstruct liquids at the OC level for all Jin words with zero ~ -Vʔ-l- alternation. Perhaps some alternations are of modern origin by analogy with genuine preservations of OC contracted and uncontracted forms: e.g.,

'hop': 蹦 pəŋ > pəʔ ləŋ (earlier liquid is uncertain; word itself may be of post-OC/MC origin)

by analogy with

'drag': 拖 tʰuɤ ~ tʰəʔ luɤ < OC *thlaj ~ *tʌ-hlaj

8.1.1:08: I am intrigued by the Jin prefix 入 zəʔ- which has a verbalizing function in

鬼 'ghost, devil' > 入鬼 'fool around'

I doubt that the semantics of the graph 入 'enter' are relevant, unless

'enter devil(hood)' > 'fool around'

If 入 is simply a phonetic symbol, zəʔ- may be from OC *nɯ-. Could that be a longer form of the prefix that Sagart (1999: 74) reconstructed with an uncertain nasal as OC *N-? However, Sagart's OC *N- derives intransitive verbs from transitive verbs instead of verbs from nouns. Moreover, Schuessler (2007: 19) reconstructs *m- for the detransitivizing prefix, specifying a labial articulation. (Sagart [1999: 79] reconstructs *N- and *m- as separate prefixes.) Perhaps the prefix 入- is a Jin innovation (a grammaticalization of 'enter'?) without any OC source.

There seem to have been other nasal prefixes in OC: e.g., some that apparently did not change the meaning of a word:

'raised wooden platform': 棓 MC *phəw-ʔ < OC *phəw-ʔ ~ MC *bəw < OC *N-phəw

'screen' (noun): 蔀 MC *phəwʔ < OC *phəwʔ ~ MC *bəwʔ < OC *N-phəwʔ

'overthrow, lay prostrate': 棓 MC *phəwh < OC *phək-s ~ MC *bək < OC *N-phək

But I cannot quickly find any example of a verbalizing OC nasal prefix. I don't think this pair of words is related even though they are written with the same graph 拒 since they have a different root initial:

'troops drawn up in squares' 拒 MC *kuəʔ < OC *kʷa-ʔ (root initial *kʷ-)

probably cognate to 'carpenter's square': 巨 MC *kuə < OC *kʷa

'to oppose': 拒 MC *gɨəh < OC *gaʔ < ?*N-ka-ʔ (root initial *k- if *g- < *N-k-)

09.7.30.23:56: THE RIAU VALUE OF BIG LANGUAGES

One may get the impression from "Picking Passengers for the Linguistic Lifeboat" that I want to save select exotica and let the rest go ignored. Using my approach, one could pick one Indo-European language, one Uralic language, and Basque for the European lifeboat. But could, say, German really represent everything from Portuguese to Russian? And Finnish and Hungarian are quite different. Not that any of those languages will perish any day soon, but I would like to save everything, including varieties of big languages - and so-called 'languages'.

Chinese is generally regarded as a single 'language' but I view it as a family of languages. Some family members are better known than others. Mandarin and Cantonese are the most famous. But 贛 Gan (spoken by up to 48 million - more than Polish) and 湘 Xiang (spoken by 36 million - more than Romanian or Dutch) are obscure despite their large numbers of speakers and a very famous native speaker of Xiang - Mao Zedong.

I once knew a Gan-speaking linguist who was working on ... sigh ... standard Mandarin syntax, even though Gan syntax is far less studied.

Xiang is becoming Mandarinized 'New Xiang' and may eventually merge with Mandarin (Norman 1988: 209). 'Old Xiang' - Xiang without Mandarinization - retains voiced stops and affricates from Middle Chinese. Chengbu Xiang 桃 dao 'peach' and 洞 doŋ 'cave' (Norman 1988: 207) are living fossils that are basically identical to their MC ancestors (except for tonal contours); they keep an original *d that has shifted to [t] and/or [tʰ] in many other Chinese languages:

	桃 'peach'	洞 'cave'
Middle Chinese	*daw	*doŋ
Chengbu Xiang (an 'Old Xiang' dialect)	dao (probably [daw])	doŋ
Changsha Xiang (a 'New Xiang' dialect)	taɤ	toŋ
Mandarin	[tʰaw]	[tʊŋ]
Cantonese	[tʰow]	[tʊŋ]

A potentially even more exciting variant of a big language is Riau Indonesian, "a language without a name". Although what little I've heard about David Gil's work has yet to convince me, he might have stumbled upon a linguistic jackpot if he's right:

At present, my main research interest is in the Riau dialect of Indonesian: the more I work on this language, the more "exotic" it seems to me to become, and the more it leads me towards the belief that languages may differ from one another to a much greater extent than is commonly assumed. I am now beginning work on a book which, on the one hand, will provide an easy and accessible description of Riau Indonesian, and at the same time will lay the foundations for a theory of universal grammar that is non-Eurocentric in orientation, taking the Riau dialect of Indonesian (rather than Latin or English) as its point of departure.

09.7.29.23:54: KOREAN AND TRANS-NEW GUINEA ARE NA-T RELATED

I didn't do a good job of saying what I wanted to say last night about Proto-Chadic and Proto-Trans-New-Guinea (PTNG) correspondences, so I'm going to try again.

It's easy to find lookalikes between random languages. For instance, PTNG *na 'I' looks like Korean na 'I'. One could say that

PTNG *n : Korean n
PTNG *a : Korean a

If there is a relationship (genetic or contact) between these two languages, these correspondences would be predictive. They would tell me that if there were a Korean word with n and/or a meaning X, there could be a PTNG word with *n and/or *a meaning X (or something similar: X'), and vice versa. But if there is no relationship, these correspondences would not enable me to predict forms in one language on the basis of the other.

The *n-correspondence has no predictive power for pronouns. (Unfortunately, I have no other PTNG reconstructions on hand.)

Given PTNG *ni 'we', I would expect Korean to have n- in 'we', but Korean has uri for 'we'.

Given Korean nO 'thou', I would expect PTNG to have *n- in 'thou', but PTNG has ga for 'thou'.

Given Korean nOhUi 'you', I would expect PTNG to have *n- in 'you', but PTNG has gi for 'you'.

Note that Korean and PTNG have entirely different pluralization strategies. Korean has suppletion in the first person (na and uri are unrelated) and -hUi suffixation in the second person. PTNG has a ~ i ablaut: a for singular and i for plural.

So far all proposed correspondences have been X : X correspondences. X : Y correspondences are also possible, but are they meaningful? E.g.,

PTNG *g : Korean n (in 'thou' and 'you')

Sound changes can be viewed as attempts to explain X : Y correspondences. Why does Korean n correspond to both PTNG *n and PTNG *g? One could claim that

Proto-TNG-Korean *n > PTNG *n, Korean n
Proto-TNG-Korean *ŋ > PTNG *g, Korean n

In the latter change, the PTNG and Korean consonants only preserve some characteristics of the original.

One could even try to explain the vowel correspondences of the second person pronouns:

Proto-TNG-Korean *O > PTNG *a, Korean O

Proto-TNG-Korean *OhUi > PTNG *i, Korean OhUi

The latter is particularly hard to test since I can't think of any other Korean words ending in -OhUi.

One could go so far as to take two languages that have absolutely nothing in common and write a gigantic number of correspondences and 'sound changes' for them: e.g., if I want Korean na and Latin ego to be related, I could say Latin e- is a prefix (meaning what?*) and that

Proto-Korean-Latin *ŋ > Korean n, Latin g

Proto-Korean-Latin *ɒ > Korean a, Latin o

But a vast list of such changes for each and every word would have no predictive value since each word would have its own set of changes with little overlap. And an absurd number of proto-sounds would be needed to make the changes 'work': e.g.,

Proto-Korean-Language X *n1 > Korean n, Language X consonant 1

Proto-Korean-Language X *n5 > Korean n, Language X consonant 5

Proto-Korean-Language X *n10 > Korean n, Language X consonant 10

etc.

Truly related languages have a far more limited set of changes applying to a plausible number of phonemes. No language has ten kinds of n. Proto-languages are assumed to have the same constraints as modern ones - and all attested ancient languages.
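
The growth in proto-sounds described above can be counted: each distinct partner that a Korean sound has in the other language needs its own proto-sound. Here is a minimal sketch, assuming the words have already been segmented and aligned one-to-one by hand; the function name `partners_by_segment` is hypothetical, and the data are just the 'I' and 'thou' pairs from this post.

```python
from collections import defaultdict

def partners_by_segment(aligned_words):
    """Collect, for each Korean segment, the set of segments it is
    matched with in the other language.

    aligned_words: list of (korean_segments, other_segments) pairs,
    where both lists have the same length.
    """
    partners = defaultdict(set)
    for korean, other in aligned_words:
        if len(korean) != len(other):
            raise ValueError("segments must be aligned one-to-one")
        for k, o in zip(korean, other):
            partners[k].add(o)
    return dict(partners)

aligned = [
    (["n", "a"], ["n", "a"]),  # 'I': Korean na, PTNG *na
    (["n", "O"], ["g", "a"]),  # 'thou': Korean nO, PTNG *ga
]
partners = partners_by_segment(aligned)
proto_sounds_needed = sum(len(p) for p in partners.values())
print(partners)
print("proto-sounds needed:", proto_sounds_needed)
```

With just two words, Korean n already has two partners (n and g), which is why two proto-sounds (*n and *ŋ) had to be proposed for it. With unrelated languages, I would expect this count to keep climbing as more words are added instead of leveling off.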

*Chopping off unexplainable material and calling it an affix can be a dangerous copout method. An 'affix' that has no known meaning and is only posited to make the rest look like a word in another language** is not an affix at all.

**In the above case, chopping e- off of Latin ego 'I' results in go, which is a CV syllable like Korean na 'I'.

09.7.28.23:59: IS SEMITIC M-T TOO?

Last night, I wrote about the M-T (sounds like 'empty'!) pattern of Indo-European and Uralic personal pronouns. David Boxenhorn suggested that Semitic shared this pattern:

Hebrew 'ani 'I', 'atta 'thou' (m.), 'att 'thou' (f.)

Arabic 'anaa 'I', 'inta 'thou' (m.), 'inti 'thou' (f.)

I'd describe this as a medial N-T pattern, not an initial M-T pattern. Is there any Semitic-internal evidence for a prefix 'V- and for *m > n intervocalically (which I've never seen anywhere)?

These look too different from the IE and Uralic forms to be loanwords. Christopher Ehret (1995: 362-363) reconstructs very similar pronouns at the Proto-Afroasiatic level:

Gloss	Proto-Afroasiatic	Proto-Semitic	Egyptian	Proto-Cushitic	Proto-Chadic	Proto-Omotic
I	(ʔ)ân-, (ʔ)în-	*ʔ-n	(no cognates!)	*ʔâni	nV (na?)	Proto-North Omotic *in-
thou	(ʔ)ânt-, (ʔ)înt-	*ʔ-n-t	(no cognates!)	*ʔânt-	(no cognate)	Proto-Omotic *int-

I conclude that Afroasiatic pronouns have always been different from those of IE/Uralic. I wonder if one could reconstruct a pre-PAA pronoun stem *ʔ-n- with *-t- added as a suffix for the second person.

7.29.1:39: Frederik Kortlandt (my host at Leiden University) reviewed Ehret's book.

7.29.2:11: If Proto-Chadic for 'I' were *na, it would resemble the *na 'I' reconstructed for Proto-Trans-New Guinea. Chance resemblances are far more likely between short words than long words. Although PTNG was reconstructed largely on the basis of pronouns, just one lookalike pronoun would not be sufficient to claim that Chadic is related to PTNG. Chadic pronouns as a whole would have to resemble those of PTNG, but they don't according to Ehret (1995: 155-156, 198, 362):

Gloss	Proto-Chadic	Proto-Trans-New Guinea
I	nV (na?)	*na
we	(unknown)	*ni
thou	(unknown)	*ga
you (pl.)	*kun	*gi
he	*sV	(y)a, ua (also 'she', 'it' in PTNG)
they	West Chadic *sun	*i

If the vowel of PC *nV was not *a, then not even one member of these pronoun sets matches.
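
The point that chance resemblances are far more likely between short words can be illustrated with a rough calculation. The sketch below assumes a made-up inventory of 20 consonants and 5 vowels and assumes every sound is equally likely and independent of its neighbors. Neither assumption holds for real languages, so the numbers only show the trend. The function name `chance_match_probability` is hypothetical.

```python
from fractions import Fraction

def chance_match_probability(shape, n_consonants=20, n_vowels=5):
    """Probability that two random words of the same C/V shape are
    identical, under a toy model of uniform, independent segments.

    shape: a string of 'C' and 'V', e.g. 'CV' or 'CVCV'.
    The default inventory sizes are illustrative assumptions.
    """
    probability = Fraction(1)
    for slot in shape:
        if slot == "C":
            probability /= n_consonants
        elif slot == "V":
            probability /= n_vowels
        else:
            raise ValueError("shape may only contain 'C' and 'V'")
    return probability

for shape in ["CV", "CVC", "CVCV"]:
    print(shape, chance_match_probability(shape))
```

Under these toy assumptions, a CV word like *na is a hundred times more likely to have an exact chance lookalike than a CVCV word is.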

PC may have alternated between *-V and *-un to mark singular or plural whereas PTNG alternated between *-a and *-i. One could propose that

Proto-Chadic-TNG (!) *-un > *-yn > *-ỹ > PTNG *-i (but preserved in PC)

Proto-Chadic-TNG *g- > PC *k- (but preserved in PTNG)

Proto-Chadic-TNG *s- > PTNG *Ø- (but preserved in PC)

Such sound changes imply the correspondences

PC *-un : PTNG *-i
PC *k- : PTNG *g-

PC *s- : PTNG *Ø-

I doubt such correspondences can be found in nonpronominal vocabulary in the two languages. PC (and PAA) are not related to PTNG - if PTNG even existed at all.

09.7.27.23:59: PICKING PASSENGERS FOR THE LINGUISTIC LIFEBOAT

Last night, I wrote,

Given the extreme improbability of all 5,000 languages being documented before it's too late, we need a sampling method to choose which ones to focus our limited resources on.

As a historical linguist, I'm most interested in what languages can tell me about their past.

When Otto Dempwolff reconstructed Proto-Austronesian, the ancestor of over 1,200 languages, he looked at "certain critical aspects of the historical phonology of over 100 Austronesian languages" and ultimately concluded that just three languages, Tagalog, Toba Batak (in Sumatra), and Javanese

could represent all Austronesian languages for purposes of reconstructing an adequate ancestral phonology ... Dempwolff actually used hundreds of witnesses [presumably the "over 100" mentioned above plus bits and pieces of hundreds more] but for practical reasons he carried out the reconstruction as if it were based only on Tagalog, Toba Batak, and Javanese. (Blust 1990: 137)

What are the equivalents of those three languages for other language families? It's those key languages that I want on my linguistic lifeboat. (Note that not all key languages have to be famous: Toba Batak isn't as well known as Tagalog or Javanese.)

If I wanted to reconstruct, say, Proto-Slavic, I'd want one West Slavic language (e.g., Polish), one East Slavic language (e.g., Russian), and one South Slavic language (e.g., Serbo-Croatian). I couldn't reconstruct Proto-Slavic just using languages from a single branch like East Slavic (Russian, Belarusian, and Ukrainian).

Right now we have a vague, impressionistic, and possibly erroneous classification of most of the world's languages. It's not as if nothing is known about the 5,000 languages of the world. Nor is it likely that we'll find many more new languages in the future. We have to use what little we know about the 4,000+ poorly documented languages to choose which ones we want to thoroughly study while they're still alive.

I'd want my 'Noah's ark' to have at least one representative of each language family. Papua New Guinea may be the most linguistically diverse part of the world, with 841 different languages (one out of six languages in the world is spoken in PNG!) which may belong to over sixty language families and perhaps a dozen or so isolates (i.e., one-member language families). PNG is important because its current situation may once have been the universal situation. Perhaps there were many more thousands of little languages spoken everywhere on Earth before the spread of Indo-European, Bantu, Chinese, etc.

7.28.1:08: The actual number of language families in PNG may be fewer than sixty if the Trans-New Guinea hypothesis is correct. Ross united nearly 500 Papuan languages into a 'Trans-New Guinea' (TNG) family and classified the rest into 22 other families and "9-13 isolates". However, TNG is not as well-established as Indo-European or Austronesian because

it is based on a single parameter, pronouns, and therefore must remain tentative. Although pronouns are conservative elements in a language [not necessarily; see below], they are both short and utilise a reduced set of the language's phonemic inventory. Both phenomena greatly increase the possibility of chance resemblances, especially when they are not confirmed by lexical similarities.

The problem is that pronouns can be borrowed: e.g., the Japanese male first-person pronoun boku is from Middle Chinese 僕 *bo(w)k 'servant'. But Malcolm Ross, the latest proponent of TNG,

argues that open-class pronoun systems, where borrowings are common, are found in hierarchical cultures such as those of Southeast Asia and Japan, where pronouns indicate details of relationship and social status rather than simply being [closed-set] grammatical pro-forms as they are in the more egalitarian New Guinea societies.

Moreover, although a similar pronominal argument can be made for Indo-Uralic, IU remains controversial in spite of non-pronominal evidence for it. Both Indo-European and Uralic have the 'me-thee' pattern:

Indo-European

Sanskrit maa 'me', tvaam 'thee'

Russian menja 'me', tebja 'thee'

English me, thee (th < *t-)

Proto-Uralic *mina, *mun 'I', *tina, *tun 'you'

Mari məj, məń, təj, təń

Udmurt mon, tun

Komi me, te

(Further Uralic forms can be found by looking up equivalents of Finnish minä, sinä [see below] in this table.)

Note that the two big Uralic languages seem to have different halves of the pattern:

Finnish minä, sinä (si < *ti-; see here)

Hungarian én, te

Without knowing anything about the history of Finnish, one might doubt it was a Uralic language on the basis of sinä (which in fact fits the pattern). This illustrates the danger of superficial comparisons.

09.7.26.23:59: FIELDWORKERS AND THE LINGUISTIC LOTTERY

In The Rise and Fall of Languages, RMW Dixon asks,

Why bother? These are insignificant languages spoken by insignificant peoples ... How can these languages tell us anything we don't know already from studying the rich resources of French, German, Spanish, English and Russian, perhaps throwing in for good measure Finnish, Turkish, Hebrew, Arabic, Hindi, Japanese, Chinese and Swahili? (p. 117)

He counters,

This form of response reflects several illusions. The first is that people with a limited material culture must have a proportionately threadbare language. The reverse tends to be the case. (p. 117)

He does not specify any illusions past the first. By "the reverse", does he mean that "people with a limited material culture" actually have languages with inversely proportionate wealth? This can sound condescendingly noble savage-y: "These people seem so poor, but in reality they are richer than Western man!" I think Dixon means that "people with a limited material culture" actually have rich languages without any reference to proportionality.

In my experience, there is no correlation between the complexity of a language and its speakers' civilizational level. (Yes, "civilizational". I don't play the game of we're-all-equal.) There are simpler languages, and there are more complex languages, but there are no 'primitive' or 'advanced' languages - such terms are rooted in value judgments and are not very useful for linguistics. Calling a language 'primitive' or 'advanced' tells me something about attitudes toward the speakers of that language, but it tells me almost nothing about the characteristics of that language. I can predict that "people with a limited material culture" will not have vocabulary for nuclear physics, but I can make no predictions beyond that. Although many make a big deal out of how some 'primitive' language lacks this or that modern word, one should remember that no language had words for nuclear physics not too long ago. All languages are capable of expansion.

Dixon lists a number of traits of small languages which may seem exotic to speakers of bigger ones. He concludes,

Every language has its own genius, its own points of interest, certain things that can be said in it more clearly than in other languages. And only by describing every possible language - investigating the correlations between their categories, and the ways in which these change - can we hope to achieve a reasonable understanding of what human language is, how it can be structured, and the ways in which it evolves. (p. 127)

As much as I am interested in small languages, I think he is going too far. There is a logical problem: if we don't know much about the thousands of languages in the world, how can we be sure that "[e]very language has its own genius"? Having looked at many languages both big and small, I see a lot of redundancy from a typological perspective. Every language is unique, but some are more unique than others. And if you look at languages in a given part of the world, you'll see a lot of similarity due to 'genetic' and/or areal influences. Languages tend to be like their neighbors. At times, the languages of northeast Asia that I studied - Old Turkic, Mongolian, Manchu, Korean, and Japanese - felt like the same language with different words because their grammar was so similar even though I consider them to be unrelated. I had similar feelings when I studied Southeast Asian languages. The 5,000 languages do not have 5,000 totally different major points of interest.

If one is doing fieldwork in hopes of adding something significant to human linguistic knowledge beyond documentation (which is a valid and sufficient end in itself), of finding some heretofore unseen grammatical characteristic exemplifying the creativity and flexibility of the human mind, one is playing the lottery. You could luck out and work on an unusual language like Pirahã which may make linguists question their assumptions, but it's likely you'll end up working on a language that shares characteristics of its more well-known neighbors.

Given the extreme improbability of all 5,000 languages being documented before it's too late, we need a sampling method to choose which ones to focus our limited resources on.

7.27.0:11: Languages are created by humans and are as diverse as humans. Paraphrasing Dixon, one might say every human has their own genius, their own points of interest, certain things that they can do but others can't. But what if there's an emergency and we can't save everyone? Who do we choose? The sad thing is that we'll make bad choices. We'll choose the seemingly exciting over quiet, hidden treasures. And we may never realize whom - or what - we overlooked.