I vaguely recall seeing a sample of Gin* somewhere online and wanted to write about it, but forgot until David Boxenhorn mentioned this Wikipedia article that I hadn't seen in a while. Was I going to comment on the phoneme list in the Chinese Wikipedia? I'm not sure. In any case, I have 京语简志 Jingyu** jianzhi (A Brief Account of the Gin Language) now.

Gin is spoken in China, which is of course north of Vietnam, so I expected it to be like northern Vietnamese. Not quite. The table of initials below is from page 7 of Jingyu jianzhi with altered notation, added colors, and a final column for phonemes in Hanoi Vietnamese but not in Gin:

Point of articulation | Stops | Nasals | Continuants | Hanoi
Labials | p- pʰ- ɓ- | m- | f- v- |
Dentals | t- tʰ- ɗ- | n- | ɬ- l- |
Alveolars | ts- tsʰ- | | j- | /c/
Velars | k- kʰ- | | ɣ- | /x/
Labiovelars | kʷ- kʰʷ- | | |
Glottals | ʔ- | | |
Labioglottals | ʔʷ- | | |

Red indicates initials only in loanwords from Cantonese.

Words with p-, pʰ-, and tsʰ- must have been borrowed after the following sound changes:

*p- > ɓ-

*pʰ- > f-

*(t)sʰ- > tʰ-
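The chronology argument can be made concrete with a toy rewrite system. This is only a sketch under stated assumptions. It models word-initial position only and applies just the three changes listed above. The input forms (`pa`, `pʰa`, `tsʰa`, ...) are hypothetical placeholders, not attested Gin or Cantonese words, and the function names are illustrative.

```python
import re

# The three initial sound changes listed above, as (pattern, replacement) pairs.
# Only word-initial position is modeled; "(?!ʰ)" keeps *p- from matching *pʰ-.
SOUND_CHANGES = [
    (re.compile(r"^p(?!ʰ)"), "ɓ"),   # *p-  > ɓ-
    (re.compile(r"^pʰ"), "f"),       # *pʰ- > f-
    (re.compile(r"^t?sʰ"), "tʰ"),    # *(t)sʰ- > tʰ-
]


def evolve(form: str) -> str:
    """Run a (hypothetical) pre-change form through every change in order."""
    for pattern, replacement in SOUND_CHANGES:
        form = pattern.sub(replacement, form)
    return form


def must_be_late(form: str) -> bool:
    """True if the form still has an initial that the changes would have altered,
    i.e. it can only have entered the language after the changes."""
    return any(pattern.match(form) for pattern, _ in SOUND_CHANGES)


if __name__ == "__main__":
    # Hypothetical forms, used only to illustrate the logic.
    for old in ["pa", "pʰa", "tsʰa", "sʰa", "ta"]:
        print(f"*{old} > {evolve(old)}")
    for attested in ["pa", "ɓa", "tsʰa", "tʰa"]:
        print(attested, "late loan" if must_be_late(attested) else "could be old")
```

Any form that `must_be_late` flags keeps an initial these changes would have removed. That is why words with p-, pʰ-, and tsʰ- are dated after the changes.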

Earlier Vietnamese had some sort of voiceless lateral that merged with voiced l-. Cantonese loanwords with ɬ- must have been borrowed after that merger.

Standard Cantonese lacks ɬ-. Judging from the loanword ɬiu 'small' corresponding to standard Cantonese 小 siu (< Middle Chinese *siewˀ), I assume the variety of Cantonese known to Gin speakers shifted *s- to ɬ-.

Green indicates phonemes whose pronunciations differ from their Hanoi counterparts:

Gin ts- : Hanoi ch- /c/

Gin j- : Hanoi d-, gi-, r- /z/ < Middle Vietnamese /d ɟ r/

Gin kʰ- : Hanoi kh- /x/

Gin ts- may reflect the influence of Cantonese, which has ts- but not c-.

Gin j- is like central and southern Vietnamese rather than northern Vietnamese. I suspect it developed independently under pressure from Cantonese, which lacks z-.

Gin kʰ- is a retention of a stop lost in Vietnamese (though preserved in Vietnamese spelling, which was devised in the 17th century). Younger speakers weaken kʰ(ʷ)- to h(ʷ)-, not x(ʷ)-. Cantonese has h- but lacks x-. (Standard Cantonese also lacks hʷ-, but perhaps the dialect known to the Gin has it.)

Blue indicates unit phonemes corresponding to Hanoi consonant-/w/ sequences: e.g.,

Gin kʷ- : Hanoi qu- /kw/

Hanoi allows /w/ to combine with nonback, nonlabial consonants, but such clusters correspond to single consonants in Gin: e.g.,

Gin ɗaːn kiət : Hanoi đoàn kết /ɗwaːn ket/ 'to unite' < Chn 團結

The unit phonemes of Gin are like the labiovelars of Cantonese, which does not permit [w] after nonvelars. One could analyze Gin [kw], etc. as sequences. However, that would complicate the canonical syllable shape (/C(w)V(C)/ instead of /CV(C)/), and one would have to add the caveat that /w/ could only follow velars and glottals.

*Gin is the PRC standard romanization of the autonym [kin] 'Gin'. The use of G for unaspirated [k] is a carryover from pinyin.

**The Vietnamese autonym is Kinh, which is homophonous with Sino-Vietnamese 京 kinh 'capital'. Hence the name was Mandarinized as 京 Jing (as in 北京 Beijing 'Northern Capital'). 语 yu is 'language'.

SIXTY CALVES

Today is the first day of the new year in the Far Eastern calendar (a term David Boxenhorn and I devised).

Andrew West's site displays the current date and time according to the Far Eastern calendar in Old Mandarin, Middle Mongolian, Manchu, Jurchen, Khitan (both scripts), and Tangut. Details here.

This new year is 甲午 (31st) in the Far Eastern sexagenary cycle but will be Jaya 'victory' (28th) in the Indian sexagenary cycle. Wikipedia user Śiva wrote,

I find it very surprising that there is no link between this article and the Sexagenary cycle used in East Asia. They seem to work on the same principle, and I wouldn't be surprised if one was the source for the other (as is the case with the mapping between planets and days of the week).

The Far Eastern cycle is a combination of the ten-stem and twelve-branch cycles, whereas the Indian cycle does not contain any other cycles (though it is divided into three sets of twenty years).
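The stem-branch mechanics are easy to check. Here is a sketch with illustrative names. The year mapping assumes 4 CE = 甲子 and ignores the fact that the traditional year begins at the lunar new year. Both sub-cycles advance together, so the combined cycle repeats after lcm(10, 12) = 60 terms, and position 31 comes out as 甲午.

```python
from math import gcd

STEMS = "甲乙丙丁戊己庚辛壬癸"          # ten heavenly stems
BRANCHES = "子丑寅卯辰巳午未申酉戌亥"    # twelve earthly branches

# Stems and branches advance together, so the combined cycle repeats after
# lcm(10, 12) terms rather than 10 * 12.
CYCLE_LENGTH = 10 * 12 // gcd(10, 12)   # 60


def cycle_name(position: int) -> str:
    """Stem-branch name for a 1-based position in the sexagenary cycle."""
    if not 1 <= position <= CYCLE_LENGTH:
        raise ValueError(f"position must be in 1..{CYCLE_LENGTH}")
    i = position - 1
    return STEMS[i % 10] + BRANCHES[i % 12]


def cycle_position(year: int) -> int:
    """Position of a Gregorian year, assuming 4 CE = 甲子 (position 1).
    Ignores the lunar-new-year boundary of the traditional year."""
    return (year - 4) % CYCLE_LENGTH + 1


if __name__ == "__main__":
    print(CYCLE_LENGTH, cycle_name(31), cycle_position(2014))
```

Ten and twelve are both even, so a stem and a branch of different parity never meet. That is why only 60 of the 120 conceivable pairs occur. The Indian cycle has no such internal structure, so there is nothing analogous to compute for Jaya.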

My guess is that the two cycles developed independently from each other and other systems of sixty around the world. Sixty is a highly composite number and hence more attractive than a prime number such as forty-three.
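To put a number on "highly composite": a highly composite number has more divisors than every smaller positive integer. The brute-force check below (adequate for numbers this small; the function names are illustrative) confirms that 60 qualifies, with 12 divisors, while 43 has only 2.

```python
def divisor_count(n: int) -> int:
    """Number of positive divisors of n (brute force is fine for small n)."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


def is_highly_composite(n: int) -> bool:
    """True if n has more divisors than every smaller positive integer."""
    return all(divisor_count(m) < divisor_count(n) for m in range(1, n))


if __name__ == "__main__":
    for n in (43, 60):
        print(n, divisor_count(n), is_highly_composite(n))
```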

Did the two cycles coexist in Southeast Asia? According to Diller (2000), the early Tai had a Chinese-like sexagenary cycle combining cycles of ten and twelve. The SEAlang dictionaries list two terms for the Indian cycle in Thai and Lao and one term in Khmer:

From Skt mahācakra- 'great wheel':

Thai มหาจักร <mhācakr> mahaacak

Lao ມະຫາຈັກ <mḥhācak> mahaacak

(Khmer មហាចក្រ <mahācakra> meaʔhaacak ~ mɔhaacak means 'emperor')

From Skt bṛhaspaticakra- 'Jupiter wheel; cycle of sixty years':

Thai พฤหัสบดีจักร <bṛhasɓɗīcakr> ph(a)rɯhatsabɔɔdiicak

Lao ພະຣືຫັດສະບໍດີຈັກ <bḥrhatsḥɓɔ̄ɗīcak> pharɯɯhatsabɔɔdiicak

Khmer ព្រហស្បត្ណិចក្រ <brahaspatṇicakra> prɔhoahcak (why silent <ṇ>?)

I was not able to find any Thai, Lao, or Khmer equivalents of Sanskrit saṃvatsara- 'year' (< vatsa- 'calf' < 'yearling'? - hence "Calves" in the title). Do the sixty year names of the Indian sixty-year cycle have Thai, Lao, and Khmer equivalents: e.g.,

Thai ชัย <jay> chay

Lao ຊັຍ <jay> say

Khmer ជ័យ <jăya> cey


When writing "A Silent Sacrifice in Thai(y)", I thought I had seen a Thai word with an unmarked silent <ṇ>, but I couldn't find any such word. (It would be neat if someone listed all the possible unmarked silent letters in Thai.) I did, however, rediscover this word with a similar-looking letter in the section on silent letters in Haas' (1956) The Thai System of Writing:

เพชฌฆาต <bejjhghāt> phetchakhaat 'executioner' < Pali vajjha- 'to be slain' + ghātaka- 'slayer'

This word was actually printed as

เพชณฆาต <bejṇghāt>

in my edition, which stated that its <ṇ> (sic!) was an unmarked silent letter. Note how <jh> and <ṇ> are identical except for the position of the second loop. Were these typos the reason why I thought there was a Thai word with an unmarked silent <ṇ>?

This word is of interest to me for several reasons other than that typo.

1. Thai ph- < *b- : Pali v-

The initial consonant is <b> rather than the expected <w> corresponding to Sanskrit and Pali v-. The Khmer word for 'executioner' also has <b>:

ពេជ្ឈឃាត <bejjhaghāta> ~ ពេជ្ឈឃាដ <bejjhaghāṭa> (with a nonetymological spelling of the final consonant) pɨccheaʔkhiet

My guess is that the Khmer heard an Indic language speaker with b- for v- (like modern Bengali*), borrowed the word as *bejjhaghaat, which the Thai in turn borrowed as *betjagaat. (Thai never had voiced aspirates or syllables ending in an obstruent *-j.) Later, all the voiced obstruents devoiced and aspirated in Thai: phetchakhaat.

Today I realized that even though early Thai did have <v> *v, that consonant never appeared in Indic borrowings, implying that the source of the Indic borrowings (i.e., premodern Khmer) had *w rather than *v. Khmer never had a /w/ : /v/ distinction, so Khmer speakers borrowed Indic v as Khmer <v> *w. (Maybe I should transliterate that Khmer letter as <w> to match my reconstructed *w rather than Indic.)

Modern Khmer has v like Lao and Vietnamese. Did one language shift *w to *v, and did that change spread through the region? (The change must have occurred in Lao after <v> *v devoiced to f.)

2. Thai e : Pali a

The first vowel of phetchakhaat is irregular; it should be a or o**. Another word with this correspondence is

เพชร <bejr> ~ เพ็ชร <bĕjr> ~ เพ็ชร์ <bĕjr̽> phet < Sanskrit vajra- 'diamond' (cf. Pali vajira- 'id.')

Gedney (1947: 348) speculated that its a assimilated to a following i in some intermediary language (presumably Indic, as such a change is alien to Khmer as well as Thai). However, there is no i following the first a of Pali vajjha-. Did a sporadically front before palatals in some Indic language?

3. Is Thai <jh> really silent in 'executioner'?

Many Thai medial consonant letters in Indic loans have what Haas (1956: 59) called 'double function': they simultaneously represent the coda of one syllable and the onset of another syllable ending in -a. If 'executioner' were spelled as

เพชฆาต <bejghāt> phetchakhaat

its <j> would have double function: it would represent the final -t of the syllable phet and the initial ch- of the syllable -cha-.

However, in reality, it is spelled with <jh>. Although Haas thought its <j> had double function followed by an unmarked silent <jh>, I think <j> has a single function: to represent the final -t of the syllable phet. I view <jh> as representing the syllable cha.

For comparison, modern Lao spelling has no silent letters and hence no double function consonant in this word; <t> represents -t and <j> represents s-. (There is no affricate /ts/ in Lao.)

ເພັດຊະຄາດ <bĕtjḥgāt> phetsakhaat

Back to Thai: If the letter after <j> really were <ṇ> as printed in Haas' book, then that <ṇ> would be silent, as there is no [n] in phetchakhaat.

*1.31.0:41: Unlike other Indic scripts, the modern Bengali script has no letter for v. The Bengali script, or some immediate ancestor, must once have had such a letter. When did it disappear? And how long was it used after *v hardened to b in Bengali? There is an unassigned codepoint (U+09B5) where I would expect v in the Bengali block of Unicode. Will it ever be filled?
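The gap is easy to see programmatically. Unicode's Indic blocks follow a parallel layout inherited from ISCII, so a Devanagari letter's Bengali counterpart sits at the same offset within the Bengali block. The sketch below uses hypothetical helper names. It maps DEVANAGARI LETTER VA (U+0935) to U+09B5 and asks Python's bundled Unicode database for that slot's name. In the Unicode data shipped with current Python releases, the slot has none.

```python
import unicodedata

# Indic blocks are laid out in parallel, so a letter's Bengali slot is its
# Devanagari code point plus the offset between the two block starts.
DEVANAGARI_START, BENGALI_START = 0x0900, 0x0980


def bengali_slot(devanagari_cp: int) -> int:
    return devanagari_cp - DEVANAGARI_START + BENGALI_START


def describe(cp: int) -> str:
    return f"U+{cp:04X} {unicodedata.name(chr(cp), '(unassigned)')}"


if __name__ == "__main__":
    va = 0x0935  # DEVANAGARI LETTER VA
    print(describe(va))
    print(describe(bengali_slot(va)))
```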

**1.31.2:11: Bengali ɔ and o may correspond to Sanskrit a, so I have long wondered if Thai o for Indic a reflects a Bengali innovation. Compare:

Sanskrit jana- 'person', janma- 'birth'

Bengali জন -jon (classifier for humans), জন্ম jɔnmo- 'birth'

Thai ชน <jn> chon < *jon 'person', ชนม์ <jnm̽> chon < *jon 'birth'

Khmer ជន <jana> cʊən < *jɔn 'person', ជន្ម <janma> cʊən < *jɔn 'birth'

When the Thai borrowed Indic vocabulary through Khmer, why didn't they borrow Khmer *ɔ as Thai *ɔ? I think it's because Thai lacked short ɔ at the time. Proto-Southwestern Tai only had short *o and long ɔ (Pittayaporn 2009: 197). Indic borrowings may have added long oo to the Thai inventory. I assume short ɔ in modern Thai is (a) secondary in native words and (b) found in borrowings such as ชอล์ก <jʔl̽k> chɔk 'chalk'.

A SILENT SACRIFICE IN THAI(Y)

I didn't know until this morning that "ghoti can be a silent word". Using the same logic, I tried to construct a 'silent word' in Thai using silent letters:

ยรรษหิทธุณ <yrrṣhiddhuṇ> yan 'sacrifice', normally spelled ยัญ <yañ> < Pali yañña-

ย <y>: silent in ไทย <daiy> Thay 'Thai'

ร <r>: silent in เกียรติ <kīerti> kiat 'glory' < Skt kīrti-

รร <rr> can represent a

ษ <ṣ>: silent in ลักษณ์ <lakṣṇ̽> lak 'sign' < Skt lakṣaṇa-

ห <h>: silent in พรหม <brhm> Phrom 'Brahma' < Skt brahma-

ิ <i>: silent in ชาติ <jāti> chaat 'nation' < Skt and Pali jāti- 'people'

ท <d>: silent in จันทร์ <candr̽> can 'moon' < Skt candra-

ธ <dh>: silent in พุทธ <buddh> Phut 'Buddha' < Skt and Pali Buddha-

ุ <u>: silent in เหตุ <hetu> heet 'cause' < Skt hetu-

ณ <ṇ>: silent in ลักษณ์ <lakṣṇ̽> lak 'sign' < Skt lakṣaṇa-
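Here is the silly spelling assembled piece by piece from the list above. The sketch simply concatenates the silent letters, written as escapes to avoid any ambiguity in the Thai text. It also builds the matching transliteration and prints each code point's Unicode name. The variable names are illustrative.

```python
import unicodedata

# (Thai letters, transliteration) for each piece, in the order of the list above.
PIECES = [
    ("\u0E22", "y"),          # ย
    ("\u0E23\u0E23", "rr"),   # รร
    ("\u0E29", "\u1E63"),     # ษ <ṣ>
    ("\u0E2B", "h"),          # ห
    ("\u0E34", "i"),          # vowel sign i
    ("\u0E17", "d"),          # ท
    ("\u0E18", "dh"),         # ธ
    ("\u0E38", "u"),          # vowel sign u
    ("\u0E13", "\u1E47"),     # ณ <ṇ>
]

SPELLING = "".join(thai for thai, _ in PIECES)
TRANSLIT = "".join(latin for _, latin in PIECES)

if __name__ == "__main__":
    print(SPELLING, f"<{TRANSLIT}>")
    for ch in SPELLING:
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```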

Many Thai silent letters are marked with the diacritic การันต์ <kārnt̽> kaaran, which I transliterate as <˟>. The word การันต์ <kārnt̽> itself ends in a การันต์ <kārnt̽>. However, I tried to avoid letters with kaaran when creating my silly spelling of yan. Is there a word in which an unmarked letter normally used for [n] (ญ <ñ>, ณ <ṇ>, น <n>) is silent?

(What is the etymology of การันต์ <kārnt̽> kaaran? It looks like it should be from a Sanskrit or Pali kāranta-, but I can't find any such word. Is it from kārānta- < kāra-anta- 'letter-end' (ending in a letter, referring to how kaaran normally appears in word-final position*)?)

*My impression is that word-medial kaaran was rare until a recent wave of transliterations from European languages, as in จอห์น <cʔh̽n> cɔɔn 'John'.

One example of an old word-medial kaaran is สาส์น <sās̽n> saan 'document' < Pali sāsana- 'message'.

SÁRA ÉS SARRA

Shortly after being puzzled by the silent s of Ghosn, I was stumped by the S of Hungarian Sára 'Sarah'. One might assume that Sára is [sara], but it is actually [ʃaːrɒ]. Why isn't it *Szára [saːrɒ]? I don't know of any other 'ver-sh-ion' of 'Sarah' with an initial [ʃ].

Another Hungarian Biblical S-name of this type is Sét [ʃeːt] 'Seth' instead of *Szét [seːt]. (But the Hungarian name of the Egyptian god Set is Széth [seːt].)
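The S/Sz puzzle rests on Hungarian spelling conventions: s is [ʃ] and sz is [s]. The toy transcriber below covers only the letters in the names discussed here. Its table is a hypothetical minimal subset, not a full Hungarian grapheme-to-phoneme system. It tries digraphs before single letters, so it derives Sára [ʃaːrɒ] as well as the hypothetical *Szára [saːrɒ].

```python
import unicodedata

# Toy Hungarian spelling-to-IPA table covering only the letters in these names
# (th = [t] as in old-style spellings like Széth). Digraphs are tried first.
HU_TO_IPA = {
    "sz": "s", "zs": "ʒ", "th": "t",
    "s": "ʃ", "z": "z", "t": "t", "r": "r", "m": "m",
    "a": "ɒ", "\u00e1": "aː", "e": "ɛ", "\u00e9": "eː", "o": "o", "\u00f3": "oː",
}


def to_ipa(word: str) -> str:
    """Transcribe a word using longest-match lookup in HU_TO_IPA."""
    word = unicodedata.normalize("NFC", word.lower())
    out, i = [], 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in HU_TO_IPA:
                out.append(HU_TO_IPA[chunk])
                i += size
                break
        else:
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return "".join(out)


if __name__ == "__main__":
    for name in ("Sára", "Szára", "Sét", "Széth", "Mózes"):
        print(name, f"[{to_ipa(name)}]")
```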

Are Sára and Sét 'eye loans' from a European language whose S was [s]? Are there non-Biblical borrowings in Hungarian with s [ʃ] corresponding to a foreign [s]? Or is S the standard Magyarization of Hebrew שׂ <ś> and שׁ <š>?

(S [ʃ] isn't the only Hungarian consonant corresponding to Hebrew שׁ <š>. 'Moses' is Mózes [moːzɛʃ], with a -z- corresponding to Hebrew שׁ <š>. Its final s [ʃ] corresponds to nothing in Hebrew מֹשֶׁה <mošeh>. It is surely from the form of the name in some other European language, like German Moses [moːzəs]. I guess the final -s was first added in Greek Μωυσής <Mōusḗs>.)

és [eːʃ] is Hungarian for 'and', but its spelling reminds me of Spanish es 'is'. (Latin speakers could in theory have misunderstood es as '(thou) art', though I imagine they would have been able to grasp its correct meaning from context.) So the title is 'Sára and Sarra' or 'Sára is Sarra' (if one ignores the acute accent on és).

According to Wikipedia, Sarra is the ISO 259 romanization of Hebrew שָׂרָה <śārāh>. Why is r doubled even though the Hebrew original has no dagesh?

The doubling also appears in Russian Сарра <Sarra>, the Biblical Sarah, as opposed to Russian Сара <Sara>, the personal name. Surprisingly, there are no Belarusian or Ukrainian Wikipedia articles on the Biblical Sarah. So I checked the entries on Abraham in those languages and found Сара <Sara> in all three of them: Belarusian in classical orthography, Belarusian in postclassical orthography, and Ukrainian.

WHERE HAS THE S GHOSN(E)?

Carlos Ghosn's name in Arabic is كارلوس غصن <kārlws ghṣn>. I have seen his name for years, but have never heard it in English. His surname is the Arabic word for 'branch'. I assume its Lebanese Arabic pronunciation is [ɣosˁn] (corresponding to a standard [ɣusˁn]). So I'm surprised that some English speakers pronounce it like gone. These spellings on Wikipedia reflect similar pronunciations:

Hebrew גוהן <gwhn> (why ה <h>?)

Japanese ゴーン Gōn

Mandarin 戈恩 Gē'ēn [kɤ˥ ɤn˥] (Mandarin does not have the syllables [go], [on], or [gon])

Russian Гон <Gon>

OTOH, the Azerbaijani spellings are Gosn, Ğosn [ɣosn], and even Qosn!

What is the origin of the s-less pronunciation of Ghosn? Is his name pronounced [gɔ̃] in French? Other bearers of the name spell it Ghosne in French; is that spelling pronounced [gɔn]?

I will close with a joke:

When I was in school they told us that ghoti could be pronounced like fish, so is Ghosn pronounced like fission?

Ghoti has its own Wikipedia entry.

WHAT'S IN THE ARABIAN GH-UL-F?

Current standard alphabets are usually encoded in uninterrupted ranges in Unicode: e.g.,

Latin A-Z: U+0041-U+005A

Cyrillic А-Я: U+0410-U+042F

Hebrew א-ת: U+05D0-U+05EA

Thai ก-ฮ: U+0E01-U+0E2E

Korean ᄀ-ᄒ: U+1100-U+1112
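These ranges can be checked directly. The sketch below (labels and helper name are illustrative) counts the code points in each range. The Cyrillic range's 32 letters include Й but not Ё (U+0401), and the Hebrew range's 27 include the five final forms. It also checks two letters that fall outside their script's core range: Czech Č (U+010C) and Ukrainian Ї (U+0407).

```python
import unicodedata

# Core ranges quoted in the text, as inclusive (first, last) code points.
CORE_RANGES = {
    "Latin": (0x0041, 0x005A),
    "Cyrillic": (0x0410, 0x042F),
    "Hebrew": (0x05D0, 0x05EA),
    "Thai": (0x0E01, 0x0E2E),
    "Korean choseong": (0x1100, 0x1112),
}


def in_core(ch: str, label: str) -> bool:
    first, last = CORE_RANGES[label]
    return first <= ord(ch) <= last


if __name__ == "__main__":
    for label, (first, last) in CORE_RANGES.items():
        print(f"{label}: {last - first + 1} code points")
    for ch, label in (("\u010C", "Latin"), ("\u0407", "Cyrillic")):
        print(unicodedata.name(ch), "in core range:", in_core(ch, label))
```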


Granted, these ranges are only 'complete' from the perspective of the biggest language using each script: e.g., 26 letters are sufficient for English, but not for, say, Czech (čeština), whose Č is U+010C, outside the A-Z range. Similarly, Ukrainian (українська) Ї is U+0407, before the Russocentric U+0410-U+042F range. In theory, the Latin and Cyrillic ranges could have integrated noncore letters (in red):

АӐӒӔБВГЃҐҒҔӶӺД ... instead of АБВГД ...

Or they could have been more selective about integrating noncore letters, like the Devanagari block, which has some noncore (i.e., non-Sanskrit) characters integrated with the core range

अ आ इ ई उ ऊ ऋ ऌ ऍ ऎ ए ऐ ऑ ऒ ओ औ क ख ग घ ङ च छ ज झ ञ ट ठ ड ढ ण त थ द ध न प ...

and others placed in two groups at the end (क़ ख़ ग़ ... and ॻ ॼ ...). (1.29.0:11: Oddly, the core Sanskrit syllabic consonant ॠ <ṝ> and the theoretical Sanskrit syllabic consonant ॡ <ḹ> come after the first group of noncore characters at the end.)
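To see exactly where ॠ and ॡ sit relative to the two groups of noncore letters, this sketch prints the code points and names at the edges of each group (the constant names are illustrative). One practical wrinkle is that U+0958-U+095F are composition exclusions. NFC normalization therefore keeps क़ ख़ ग़ ... decomposed as a base letter plus nukta (U+093C), and that is how they usually appear in running text.

```python
import unicodedata

FIRST_GROUP = range(0x0958, 0x0960)     # क़ ख़ ग़ ... (QA .. YYA)
VOCALIC_RR, VOCALIC_LL = 0x0960, 0x0961  # ॠ, ॡ
SECOND_GROUP = range(0x097B, 0x0980)    # ॻ ॼ ... (GGA .. BBA)


def show(cp: int) -> str:
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"


if __name__ == "__main__":
    for cp in (FIRST_GROUP[0], FIRST_GROUP[-1], VOCALIC_RR, VOCALIC_LL,
               SECOND_GROUP[0], SECOND_GROUP[-1]):
        print(show(cp))
    # U+0958..U+095F are composition exclusions, so NFC leaves them decomposed.
    print(unicodedata.normalize("NFC", chr(0x0958)) == "\u0915\u093C")
```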

Not all noncore characters are equally noncore. I regard Ukrainian Ї as more core than, say, U+A69F COMBINING CYRILLIC LETTER IOTIFIED E. (I have no idea what the latter is for.* It wasn't added to Unicode until 6.1.0 in January 2012. You can see it and other presumably very noncore Cyrillic letters here.)

If I had to create a Cyrillic block for a Unicode-type encoding without any regard for compatibility with previous encodings, I might group all modern Cyrillic letters into one block and place all the others into a block for obsolete letters. It turns out that all of the letters in my sample of Cyrillic with noncore integration are still in use and would remain together, but COMBINING CYRILLIC LETTER IOTIFIED E would still be far away from them.

Back in the real world, the actual Arabic block in Unicode puzzles me. I could understand

- a block of Classical Arabic letters only with modern non-CA letters in another block, and obsolete letters in yet another block

- a block of all current Perso-Arabic letters with obsolete letters in yet another block

but not a block of Classical Arabic letters interrupted by "Additions for early Persian and Azerbaijani" (U+063B-U+063F) and tatweel (U+0640) between غ <gh> and ف <f>. The five added letters aren't even variants of <gh> and <f>, and tatweel isn't a letter at all.
If the five added letters have to be in the core block, why weren't they grouped with ك <k> and ي <y>? It's as if the Latin core range were interrupted by obscure versions** of K and Y but not by common letters like Czech Č, etc.

(The Latin alphabet has no tatweel, so I chose a hyphen to approximate its odd position.)

Tatweel, being a much-needed character, was understandably in Unicode 1.1.0 (June 1993), though I don't understand its placement. And I do not understand why the codepoints U+063B-U+063F were reserved until they were finally filled in Unicode 5.1.0 (March 2008). I would have placed tatweel before or after the core block and placed the five obsolete letters in a different block.
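Python's bundled Unicode database makes the interruption easy to inspect. This sketch (the `dump` helper is a hypothetical name) lists every slot from ghain (U+063A) to feh (U+0641). The names show that the five additions are variants of keheh (Persian kāf) and Farsi yeh, that is, of k and y letters, with tatweel right after them.

```python
import unicodedata


def dump(start: int, end: int) -> list[str]:
    """Code point and character name for each slot in start..end inclusive."""
    return [f"U+{cp:04X} {unicodedata.name(chr(cp), '(unassigned)')}"
            for cp in range(start, end + 1)]


if __name__ == "__main__":
    # Ghain (U+063A) through feh (U+0641); U+063B-U+063F and tatweel sit between.
    print("\n".join(dump(0x063A, 0x0641)))
```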

*1.29.0:16: Of course Michael Everson and his coauthors would know:

A69F COMBINING CYRILLIC LETTER IOTIFIED E can be found already in some of the earliest Old East Slavic mss., like the Putjatina mineja of the 11th cent.

**1.29.1:02: For perfect parallelism with U+063B-U+063F, I should have picked obsolete Latin letters, but I mostly didn't - I just grabbed K and Y-like letters from the Latin Extended Additional and Latin Extended-C blocks:

U+1E30 Ḱ LATIN CAPITAL LETTER K WITH ACUTE (for Saanich, Macedonian transliteration, and Proto-Indo-European)

U+1E32 Ḳ LATIN CAPITAL LETTER K WITH DOT BELOW (Arabic transliteration)

U+1E34 Ḵ LATIN CAPITAL LETTER K WITH LINE BELOW (I have no idea; not even in eki.ee's database)


U+1E8E Ẏ LATIN CAPITAL LETTER Y WITH DOT ABOVE (again, I have no idea; not even in eki.ee's database)

1.29.1:09: ADDENDUM: What is the logic behind the ordering of Hebrew letters in the Alphabetic Presentation Forms block? The first and second letters are variants of yod interrupted by HEBREW POINT JUDEO-SPANISH VARIKA, the third is a variant of ayin, the fourth is a variant of alef, etc.

AN EXALTED JUDGE BEYOND EXPLANATION

When I looked at the Wikipedia entry on Mongolic languages while adding to my entry on vowel length in Khitan and Shira Yughur last Friday night, the part about Moghol caught my eye:

unclear whether there are speakers left

I went to the entry on Moghol and was surprised to see khoda 'God' (< Persian خدا khodā) instead of a form of Allah. But I should have expected Persian influence on a language spoken in Afghanistan.

So I guess the first half of Moghol daidān deksh 'God the Exalted' (Weiers 2003) is from Persian (and ultimately Arabic) دیان dayyān 'judge', though I can't explain why Moghol -d- corresponds to the second -y- of the Perso-Arabic original. And where is deksh 'exalted' from? Its un-Mongolic (but Persian-like) position after the noun makes me think the phrase daidān deksh was borrowed as a whole from Persian. However, I can't find anything similar to deksh in Persian or Mongolian.

THEIR VILLAGE CAN ALSO BE WARM

Andrew West's post on the Hphags-pa tombstones of Quanzhou made me revisit his Hphags-pa pages. One of the sample words on his "Description" page is

ꡠꡘ ꡁꡦ ꡖꡟꡊ <er khÿ Hud>* 'Christians'

corresponding to Classical Mongolian ᠡᠷᠺᠡᠭᠦᠳ Erkegüd.

According to Igor de Rachewiltz (2006: 72),

Yelikewen 也里可温 is the standard Chinese transcription of mmo. [Middle Mongolian] erke'ün (pmo. [Proto-Mongolic] *erkegün, pl. erkegüd < mtu. [Middle Turkic] ärkägün < syr. [Syriac] < gr. [Greek]) "Nestorian Christian".

What is the Greek source of this word?

The title of this post refers to the Chinese characters for 也里可温 Yelikewen:

ye 'also' (in modern Mandarin)

li 'village'

ke 'can'

wen 'warm'

In 13th-century northeastern Chinese, 也里可温 was pronounced *jelikʰoun. *je and *kʰo were the available syllables closest to Middle Mongolian e and ke.

The Erke'üd / Yelikewen were said to worship 長生天 'Long Life Heaven' (Kim 2004: 140).

*I use Coblin's transliteration of ꡦ as <ÿ> since this letter is grouped with semivowels and is used to write Mongolian ü and ö.

Capital H for ꡖ is a carryover from my transliteration of Written Tibetan འ.

