In A History of the Korean Language, Lee and Ramsey (2011: 304) wrote,

The distinctive Japanese-style boxed lunch, for example, which was called by its Japanese name, o-bentō, at the end of World War II, briefly metamorphosed into 변또 pyŏntto*, but by the 1980s it had become a 도시락 toshirak, with a native name.

(I have added hangul and altered the romanizations to fit the style I use on this site.)

When was toshirak first attested? It's not in Yu Chang-don's (1964) dictionary of Yi Dynasty Korean, though an older form 도슭 tosŭk /tosŭrk/ appeared in 靑丘永言 Chhŏnggu yŏngŏn (1728). Its entry in Martin et al.'s (1967) dictionary defined it as

a small willow basket (for carrying food), a lunch basket. Syn. 벤또 [pentto], 밥-동구리 [pap-tongguri]**

indicating that it was already in use as an equivalent of (o-)bentō by the mid-60s.

According to the Korean Wikipedia, the North Korean word for boxed lunch is 곽밥 kwakpap, which I presume is from kwak 'box' and pap 'rice'. When was kwakpap coined? Kwak sounds as if it should be a Sino-Korean word, but the closest Sino-Korean word is kwak 'outer coffin', which is only a loose semantic match. Naver lists a North Korean variant papkwak 'rice box' with the elements in reverse order.

*1.19.0:39: Naver's Korean monolingual dictionary lists 변또 pyŏntto, 벤또 pentto, 벤토 pentho, and 변토 pyŏntho as "errors" for toshirak.

Korean does not allow -nt-, so -ntt- and -nth- are the only available approximations of Japanese -nt-.

Pyŏn may be the Sino-Korean reading of the first character for the prewar Japanese spelling 辨當 of bentō. (The modern spelling is 弁当.) Why wasn't the word completely Sino-Koreanized as *변당 pyŏndang?

**1.19.1:29: Oddly, Martin et al. have no entry for pap-tongguri, so it's not clear whether it refers to a container for lunch and/or the lunch in the container. That dictionary also lacks an entry for tongguri, which Naver's Korean monolingual dictionary defines as

'a box made out of tightly woven bamboo stalks or willow branches; used for carrying food and made up of matching top and bottom halves'

Naver regards pap-tongguri as an "error" for toshirak, and the word only has 152 Google results, so I assume it is now obsolete.

Other "errors" for toshirak are 도실기 toshilgi (1,340 Google results) and 밥도시락 pap-toshirak (39,400 Google results; rejected as redundant?).

Naver lists three dialectal equivalents of toshirak:

동고량 tonggoryang (Cheju; related to tongguri, though Naver divides it as tong-goryang?)

밥두구래기 pap-tuguraegi (South Hamgyŏng; 'rice' + another cognate of tongguri?; tuguraegi by itself is equivalent to standard 뚝배기 ttukpaegi 'unglazed earthen bowl')

펵개 phyŏkkae (North Hamgyŏng; etymology unknown)

Are the Hamgyŏng words still current, or have they been phased out in favor of kwakpap and/or papkwak? How outdated is the information about North Korean dialects in Naver's dictionary?

PRINCE PRINTS MINCES MENCHI

What is the origin of the name of the Japanese dish menchi katsu? According to the English Wikipedia,

Menchi and katsu are phonologically modified versions of the words "mince" and "cutlet".

Mince and cutlet would normally be Japanized today as minsu and katto(retto). But of course such modern conventions didn't exist in the past.

I think the identification of katsu is correct. According to the Japanese Wikipedia, katsuretsu for 'cutlet' is attested in 1899, and tonkatsu 'pork cutlet' is attested in 1911. And the Japanese Wikipedia reports minsu mīto katsuretsu 'mince meat cutlet' (with the expected minsu) in the Meiji era.

I was initially skeptical about equating menchi (also minchi in the Kinki area) with mince. I had never heard of English [ɪ] being borrowed as Japanese e or of English [s] being borrowed as ch. But then I realized that the phonetic mismatch wasn't as great as I had thought. [ɪ] is between [e] and [i] (hence the menchi ~ minchi variation in Japanese), and /ns/ can be pronounced as [nts]: e.g., prince and prints - or mince and mints - can be homophones. [nts] is close to nchi, though it is even closer to -ntsu.

1.18.1:18: The Japanese Wikipedia has a discussion of the etymology of menchi. It mentions a western Japanese word minchi 'ground meat'. Is minchi a borrowing from mince? Or was there a native word *menti that became minchi in the west (with nonfinal *e > i raising) but became menchi in the east? (Cf. western imo 'potato' from an *emo preserved in Akita ni-ndo-emo 'potato', lit. 'two-times-potato'. Proto-Ainu *emo 'potato' was borrowed from an early eastern Japanese dialect retaining *e [Vovin 2010: 35].) I doubt that, though it would be neat if variation in a modern loanword happened to mirror variation due to much earlier sound changes.

I looked up the Japanese Wikipedia entry on 挽肉 hikiniku 'minced meat' (which can also be accessed via the title ミンチ minchi and the disambiguation page for メンチ menchi). On its left side were links to Russian and Belarusian articles titled Фарш Farsh. I presume farsh is from French farce, but I don't understand why it ends in -sh instead of -s. The Japanese versions of that French word are farusu and farushi (< farci(e) 'stuffed'). (Unlike Russian or Belarusian, Japanese does not permit [si], so foreign [si] is borrowed into Japanese as shi.)

PROTO-TAI *J-, *ˀJ-, OR *ʄ-?

Last week I wrote about the Thai and Lao reflexes of Proto-Tai *ˀj- (written as *ʔy- in that post). Li (1977) reconstructed a distinction between Proto-Tai *j- and *ʔj-, but Pittayaporn (2009) only reconstructed *ˀj-:

the reconstruction of PT *j- is dubious as most etyma are found only in SWT [southwestern Tai]. Etyma that are found outside of SWT either show irregularities in the correspondence, or can be identified as loans.

For instance, Pittayaporn regarded 'paternal grandmother' (e.g., Thai yaa B2, Lao ɲaa B2) as a loan from Mon-Khmer (cf. Proto-Mon-Khmer *jaʔ 'grandmother' in Shorto 2006)*.

Is there any language that has ˀj- without j-? In UPSID, Bolyu (a.k.a. Lai) is the only language that has ˀj without j, but Edmondson (1995) listed j-words as well as ʔj-words in Bolyu, and Edmondson (1996) distinguished between /j-/ and /ʔj-/ in Bolyu. Edmondson (1995) gives me the impression that ʔj- is less common than j-, as I didn't find any ʔj-words until the 19th page of his list, which was organized by English gloss. Gong's (1997) Tangut reconstruction has ʔj- without j-, but I prefer to reconstruct j-: e.g.,

1je (genitive suffix) = Gong's 1ʔjij

(mostly transcribed as ye in Tibetan; less common transcriptions are g-ye, g-yi, g-yeh, and yi)

I wonder if Proto-Tai ˀj- is from an earlier *j-. Could a glottal stop have been added to *j- because it is similar to *i, which would automatically be preceded by a glottal stop if no other consonant preceded it?

What does Pittayaporn's Proto-Tai *ˀj- correspond to in other Kra-Dai proto-languages? Norquest (2007: 262) listed the following correspondences:

Proto-South-Kra-Dai *C-c-
Pre-Hlai *C-ɟ- > Proto-Hlai *tɕ-
Proto-Be *C-j-
Proto-Southwest Tai *ʔj-

(1.17.0:07: Oops, I missed a second set of correspondences for Proto-Southwest Tai *ʔj- on p. 278. I'll get to it later today. It doesn't affect my arguments in this post.)

Norquest (2007: 250) proposed Proto-South-Kra-Dai as the ancestor of Hlai, Be, and Tai. See the family diagram in "Dating Proto-Kra-Dai".

Given that Norquest's (2007: 262) Proto-South-Kra-Dai *C-p- and *C-t- correspond to Pittayaporn's (2009: 70) Proto-Tai implosives *ɓ- and *ɗ-, could Norquest's Proto-South-Kra-Dai *C-c- correspond to a Proto-Tai palatal implosive *ʄ- (= Pittayaporn's *ˀj-) that later became j- with first-series tones conditioned by voiceless initials?

(1.16.23:57: I just noticed that someone rewrote Pittayaporn's *ˀj- as an implosive *ʄ- in the Wikipedia article on Proto-Tai.)

Rephrasing my earlier question, is there any language that has ʄ- without j-? All ten languages with ʄ in UPSID also have j. Are there languages whose j- is in part or whole from an earlier *ʄ-?

Norquest (2007: 276-277) reconstructed *Cid- as a source of Proto-Southwest Tai *j-:

Proto-South-Kra-Dai *Cid-
Pre-Hlai *Ciɾj- > Proto-Hlai *ɾj-
Proto-Be *[C-]r-
Proto-Southwest Tai *j-

I can only find a single example of a Hlai-PST *j- correspondence in Norquest (2007: 277):

'bad': PSKD *Cidaːk > Proto-Hlai *ɾjaːk : PST *jaːk (no Be cognate)

If Proto-Southwest Tai - and Proto-Tai - never had *j-, how would that affect Norquest's reconstructions?

*I would expect the Tai tone for 'paternal grandmother' to be C (< *-ʔ) rather than B (< *-h).

Ostapirat (2000: 233) reconstructed Proto-Kra *ja C 'grandmother' with a C tone matching Proto-Mon-Khmer *-ʔ. Was 'grandmother' borrowed into Proto-Kra-Dai, or was it independently borrowed by Proto-Kra and Proto-Tai?

Does the more specific meaning of the Tai word imply that the ancestors of the Tai borrowed the word from Mon-Khmer-speaking fathers?

DUNIN-MARCINKIEVIČ'S LETTER FOR BELARUSIAN [W]

If one sees the letter ў in a modern Cyrillic text, that text is most likely to be in Belarusian.

Usage of ў

Language | IPA | Time period
(Ukrainian) | ? | ѵ̆ (forerunner of ў); late 16th to early 17th centuries
Romanian | | Before 1837 (mentioned in Русалка днѣстровая; see below)
Ukrainian | [w] | 1837 (in Русалка днѣстровая; was it ever used again?)
Belarusian | | 1870-
Siberian Yupik | | 1937- (1940-1941 examples with transliteration)
Dungan | [uː]* | 1940-
Uzbek | [o]** | 1940-1992

In 2003, a monument to the letter ў was erected in Belarus. Ў and its Latin equivalent ŭ are the only survivors of an earlier diversity of representations: e.g., Latin u, ú, ǔ, and w̆. I am not sure whether "plain 〈u〉, or with added accent, haček, or caret" in the Wikipedia article on Cyrillic short u refers to Cyrillic у у́ у̌ and у̂. That article mentions that Vincent Dunin-Marcinkievič used an "italicized" Latin u for Belarusian [w], whereas the Wikipedia article on the Belarusian Latin alphabet states that he used Latin u "in cursive". In either case, how would he have distinguished between his letters for [u] and [w] in his handwriting?

This reminds me of how Albanian ll, nj, and y [ɫ ɲ y] were once written as italic l, n, and u.

In the Sacred Books of the East, Sanskrit palatal c, ch, j, and jh were written as italic k, kh, g, and gh to reflect their velar origins. Sir Monier Monier-Williams criticized that practice in the introduction to his Sanskrit dictionary:

... the philological advantage thought to be gained by thus exhibiting the phonetic truth of the interchange of gutturals and palatals appears to me to be completely outweighed by the disadvantage of representing by similar symbols sounds differing so greatly in actual pronunciation. For instance, to represent such common words as 'chinna' ['cut off'] by 'khinna' and 'jaina' [Jain] by 'gaina' seems to be as objectionable as to write 'Khina' for 'China' and 'Gapan' for 'Japan.' The plan of using Italics is no safeguard, seeing that in printing popular books and papers the practice of mixing up Roman and Italic letters in the same word is never adhered to, so that it is now common to find the important Indian sect of Jains printed and pronounced 'Gains.'

When did that unfortunate pronunciation of Jains become extinct?

Has anyone proposed using italic or cursive letters as distinct letters in alphabets devised in the 20th and 21st centuries? In theory one could represent such letters using Unicode mathematical alphabetic symbols, though in practice the mixture of letters would be awkward: e.g., vo𝓊k for Belarusian воўк/voŭk 'wolf'.
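One practical obstacle can be shown concretely. The following is a minimal Python sketch, assuming U+1D4CA MATHEMATICAL SCRIPT SMALL U as the stand-in for a cursive u (the constant and function names are made up for illustration). Unicode treats such symbols as font variants of ordinary letters. Compatibility normalization (NFKC), which some text-processing pipelines apply, therefore turns vo𝓊k into plain vouk. Latin ŭ survives, because its decomposition into u plus a combining breve is canonical rather than a compatibility mapping.

```python
import unicodedata

# Belarusian Latin 'wolf' with precomposed ŭ (U+016D LATIN SMALL LETTER U WITH BREVE).
PLAIN = "vo\u016dk"
# Hypothetical spelling using MATHEMATICAL SCRIPT SMALL U (U+1D4CA) as a separate letter.
SCRIPT = "vo\U0001D4CAk"


def describe(word: str) -> list[str]:
    """Return the Unicode character name of each code point in word."""
    return [unicodedata.name(ch) for ch in word]


def survives_nfkc(word: str) -> bool:
    """True if NFKC normalization leaves the word unchanged."""
    return unicodedata.normalize("NFKC", word) == word


if __name__ == "__main__":
    print(describe(SCRIPT))
    # ŭ decomposes canonically (u + U+0306) and is recomposed by NFKC.
    print(survives_nfkc(PLAIN))
    # The script u has only a <font> compatibility mapping, so NFKC folds it to u.
    print(unicodedata.normalize("NFKC", SCRIPT))
```

The cursive letter would therefore be fragile in exactly the places where a distinct letter needs to stay distinct, such as search and identifiers.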

*1.16.0:12: According to Wikipedia, Dungan Cyrillic у represents [ʊ] ([ʊː]?), whereas ў represents [uː]. Dungan Cyrillic у corresponds to standard Mandarin -ou which is syllable-final. Since all (?) other syllable-final monophthongs are long in Dungan, I suspect that Cyrillic у is [ʊː].

**1.16.0:33: I have followed the Russian Wikipedia which equates Cyrillic ў and о with [o] and [ɑ], whereas the English Wikipedia lists very different IPA equivalents for those letters: [ɘ, ɤ, ø] and [ɒ, o]. None of the phonetic values of Cyrillic ў are close to [u] for Cyrillic у. So why was ў chosen for [o] or [ɘ, ɤ, ø] instead of, say, о̆? Was it more practical to recycle an existing Cyrillic letter? Or was the vowel represented by ў originally closer to [u], as implied by how Ўзбек corresponds to English Uzbek and Russian Узбек? (But Turkish for 'Uzbek' is Özbek!)

THE O-RDER OF STANKIEVIČ'S BELARUSIAN ALPHABETS

Until tonight I never noticed that Jan Stankievič's 1962 Roman alphabet for Belarusian began with the letter o according to the English Wikipedia and both Belarusian Wikipedias:

o a e b c ć č d f g h ch i j k l ł m n ń p r [s]* ś š t v u ŭ dz dź dž z ź ž

(The three articles probably draw upon each other, so they may not be independent witnesses.) I wish I could see his 1962 article to verify this order. I am surprised because o may be the least common vowel in Belarusian.

Belarusian vowel frequency based on Cyrillic data at pravapis.org

(I have combined frequencies for V/jV pairs: а/я <a/ja>, э/е <e/je>, and у/ю <u/ju>. The true frequency of /o/ is somewhat higher because ё <jo> is missing from the data. The non-Belarusian letter и <i> is included, but I presume it is only in Russian words, so I have excluded it from my calculations.)

Vowel | Taraškievica | Post-1933 Soviet orthography | Average
/a/ | 48.5% | 47.8% | 48.15%
/e/ | 11.7% | 11.6% | 11.65%
/i/ | 11.5% | 12.5% | 12.0%
/o/ | 8.4% | 9.1% | 8.75%
/u/ | 9.3% | 9.2% | 9.25%
/y/ | 10.6% | 9.8% | 10.2%
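As a quick check on the table, the short Python sketch below re-adds each column, recomputes the averages, and ranks the vowels. The function names are just for illustration. It uses only the percentages shown above, so it inherits the caveats about missing ё and excluded и.

```python
# Percentages copied from the table: (Taraškievica, post-1933 Soviet orthography).
FREQUENCIES = {
    "a": (48.5, 47.8),
    "e": (11.7, 11.6),
    "i": (11.5, 12.5),
    "o": (8.4, 9.1),
    "u": (9.3, 9.2),
    "y": (10.6, 9.8),
}


def column_totals(freqs):
    """Sum each orthography's column; both should come to about 100%."""
    tarask = sum(t for t, _ in freqs.values())
    soviet = sum(s for _, s in freqs.values())
    return tarask, soviet


def averages(freqs):
    """Average the two orthographies for each vowel."""
    return {v: (t + s) / 2 for v, (t, s) in freqs.items()}


def least_frequent(freqs, column):
    """Vowel with the lowest share in one column (0 = Taraškievica, 1 = Soviet)."""
    return min(freqs, key=lambda v: freqs[v][column])


if __name__ == "__main__":
    print(column_totals(FREQUENCIES))
    avg = averages(FREQUENCIES)
    print(sorted(avg, key=avg.get, reverse=True))
    print(least_frequent(FREQUENCIES, 0), least_frequent(FREQUENCIES, 1))
```

Both columns come to 100% (up to floating-point rounding), and the averages agree with the last column. The ranking by average is a, i, e, y, u, o. /o/ is the least frequent vowel in both orthographies, narrowly behind /u/ in the post-1933 data.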

Earlier unstressed o became a in Belarusian, which is why "[p]robably the most distinguising feature of Belarusian letter frequency is the abundance of letter "a" - more than 16%!" So why did Stankievič put the remaining o of Belarusian in first place?

1.15.1:11: According to the same three Wikipedias, Stankievič proposed moving о to first place in the Belarusian Cyrillic alphabet as well, so I have changed "Alphabet" to "Alphabets" in the title of this post:

о а э б ґ г х д е ё я дз дж з ж і й к л м н п р с ш т в у ў ф ь ц ч ы ю

<o a e b g h ch d je jo ja dz dž z ž i j k l m n p r s š t v u ŭ f ' c č y ju>

Letters in bold have been moved from their standard locations.

Letters with similar sound values are often grouped together. However, э <e> follows а <a> instead of being grouped with е <je>. And I have no idea why ь <'> is between ф <f> and ц <c>.

The current order (which excludes ґ <g>) is

а б в г д е ё ж з і й к л м н о п р с т у ў ф х ц ч ш ы ь э ю я

<a b v h d je jo ž z i j k l m n o p r s t u ŭ f ch c č š y ' e ju ja>

It is identical to the Russian order with the following exceptions:

- і instead of и for <i>

- the addition of ў <ŭ>

- the absence of щ <šč> and the hard sign ъ

I assume Stankievič's name would still be Станкевіч in his Cyrillic alphabet, but I don't know how he would spell it in his Latin alphabet.

*1.15.1:07: The letter s was missing in all three articles. I assume it was in Stankievič's Latin alphabet because his Cyrillic alphabet has с <s> and I can't imagine how he would write /s/ in the Latin alphabet without s. I have placed s before ś by analogy with his z that precedes ź.

AN *-AW-KWARD ABSENCE

The standard dialect of Tangut recorded in its dictionary tradition had 105 rhymes, which apparently all ended in vowels or semivowels. One might think that it would have been easy for the Tangut to imitate Chinese *-aw. Yet it seems that there was no such rhyme in Tangut. The Tangut had two strategies for transcribing Chinese *-aw:

1. Imitate the vowel

Gong (2002: 454-456) found that some Tangut transcriptions of Chinese *-aw syllables in the Forest of Categories ended in -a without -w:

Rhyme 17 -a: 悼濤陶道皓操草老

Rhyme 21 -ɨaa:

Rhyme 22 -aa: 熬奡

2. Imitate the semivowel

Conversely, he found that other Tangut transcriptions of Chinese *-aw syllables in that same text ended in -w without -a-:

Rhyme 44 -ew: 澡曹騷高

These strategies were not unique to the Forest of Categories. For instance, Andrew West brought the transcription character


5044 1la (tangraph for surnames) =

top of 4940 2iə (tangraph for surnames*) +

all of 5881 1la 'small, little' (phonetic)

to my attention. It transcribed Chinese 老 *law in the Tangut translation of Sunzi as well as in the Forest of Categories.

Why did Tangut lack -aw? Tangut -w was in part from *-k, and Tangut phonotactics forbade -w after central vowels (a, ə). I think pre-Tangut *-w assimilated with the preceding vowel, becoming a velar semivowel that either lengthened that vowel or was lost if that vowel was tense (since Tangut lacked long tense vowels):

pre-Tangut *nɨak-H > *nɨaɰ-H > 2nɨaa (second syllable of 2miə 2nɨaa 'Tangut', borrowed into Tibetan before *-k-loss as mi-nyag)

pre-Tangut *S-lak > *lạɰ > 1lạ 'hand, arm' (cf. Written Tibetan lag and Written Burmese lak 'id.')
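The two-step change proposed above can be written as ordered rewrite rules. The following is a toy Python sketch, not a reconstruction tool. It ignores tones (the -H suffix and the 1/2 tone numbers) and prefixes such as *S-, and it takes tenseness as an input flag rather than deriving it. It also assumes that *-k and *-w behave alike after the central vowels a and ə, as the two examples suggest. Syllable, velarize, resolve_glide, render, and evolve are hypothetical names made up for illustration.

```python
import unicodedata
from dataclasses import dataclass, replace

# Toy model of the proposed pre-Tangut change: tones and prefixes are ignored,
# and tenseness is supplied as an input flag.
CENTRAL_VOWELS = {"a", "ə"}
DOT_BELOW = "\u0323"  # combining dot below, marking a tense vowel


@dataclass(frozen=True)
class Syllable:
    onset: str
    medial: str  # e.g. "ɨ" in *nɨak; "" if none
    vowel: str
    tense: bool
    coda: str  # "", "k", "w", ...


def velarize(s: Syllable) -> Syllable:
    """Assumed step 1: *-k and *-w after a central vowel become *-ɰ."""
    if s.vowel in CENTRAL_VOWELS and s.coda in {"k", "w"}:
        return replace(s, coda="ɰ")
    return s


def resolve_glide(s: Syllable) -> Syllable:
    """Step 2: *-ɰ lengthens a lax vowel but is simply lost after a tense one,
    since there were no long tense vowels."""
    if s.coda != "ɰ":
        return s
    vowel = s.vowel if s.tense else s.vowel * 2
    return replace(s, vowel=vowel, coda="")


def render(s: Syllable) -> str:
    """Spell the syllable, putting a dot below a tense vowel (NFC-composed)."""
    vowel = s.vowel + DOT_BELOW if s.tense else s.vowel
    return unicodedata.normalize("NFC", s.onset + s.medial + vowel + s.coda)


def evolve(s: Syllable) -> str:
    return render(resolve_glide(velarize(s)))


if __name__ == "__main__":
    print(evolve(Syllable("n", "ɨ", "a", False, "k")))  # *nɨak
    print(evolve(Syllable("l", "", "a", True, "k")))    # *S-lak, vowel tense
    print(evolve(Syllable("l", "", "a", False, "w")))   # hypothetical *law
    print(evolve(Syllable("l", "", "e", False, "w")))   # lew: e is not central
```

The four calls print nɨaa, lạ, laa, and lew. The third is the *law > laa development assumed in the discussion of 老索 below. The last shows why a rhyme like -ew (rhyme 44) is unaffected.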

Is it possible that the Chinese transcription 老索 *lawso of a Tangut name reflects a Tangut dialect that had retained *-w after *a? If so, then 老 *law may correspond to standard Tangut laa or lạ (tone unknown). However, as far as I know, neither laa nor lạ is attested in Tangut surnames. Andrew suggested that 老索 *lawso may be a transcription of an otherwise unattested Tangut surname

5044 2670 *1la 2so

whose halves are attested in other surnames.

If standard Tangut speakers transcribed Chinese *law as 1la, then perhaps they would also have transcribed nonstandard Tangut law as 1la, not knowing that pre-Tangut *law became standard Tangut laa. Hence *Laso could be a standardized form of a nonstandard Tangut surname Lawso whose pronunciation (tones aside) might have been closer to its Chinese transcription 老索 *lawso. Perhaps the attested surname

4788 5044 2lew 1la

might have been *Lewlaw (a reduplicative form?) in a nonstandard Tangut dialect.

OLD CORD

Andrew West blogged about a stele that

commemorates (in Chinese) the life of the Tangut official Laosuo 老索 and four generations of his family, who lived in Baoding throughout the Yuan dynasty.

The Chinese transcription 老索 (literally 'old cord'; roughly *lawso in Yuan Dynasty pronunciation) of a Tangut name is interesting because it contains *-aw, a rhyme that is absent from recent Tangut reconstructions (i.e., Gong's, Arakawa's, Sofronov's, and mine). Was 老 *law an attempt to transcribe a Tangut syllable with some other rhyme: e.g., lew? Or did 老 *law reflect a syllable law in a later (nonstandard?) dialect of Tangut? Lastly, could 老 *law be a Chinese prefix 'old' added to a Tangut name So? We know nothing about Tangut dialectology, much less how Tangut changed in the centuries between the compilation of dictionaries such as the Tangraphic Sea in the 11th century and the last known Tangut inscription from 1502. I hope to find hints of sound changes in the Tangut transcriptions of Sanskrit on the dharani pillars: i.e., spellings that would not have been used in earlier periods.

Tangut fonts by Mojikyo.org
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2013 Amritavision