[EDIT][TITLE] Pikachu's Summer Festival
I need help getting the Kanji and Romaji Names for Pikachu's Summer Festival. Here is the screencap.
Well, my rule is that I can find the word in the dictionary. Try looking up the 'word' "Summer Festival" in dictionary.com for instance, and you will have a hard time finding it because it's not one word in English. In German it would be "Sommerfest", which is one word, and can thus be found in the dictionary. Perhaps not a linguistically satisfying answer for you, but well, I'm not a linguist. ^^;
But now a question for you, what do you think about the 'noun phrases' in the following title: Natsuiro no Sunadokei. Why isn't it "Natsu Iro"? (summer color) And why isn't it "Suna Dokei"? (Sand + Clock = Hourglass)
Oh, and one more question, could you give me an example of a noun phrase in Japanese where spaces are definitely needed?
Ehe, if you want a 'reason' why that h-game adaptation is spaced like that, it's because the fansubbers gave the title like that in the filename, no doubt. You'll notice the database is far from consistent on spacing issues.
As you brought up German, it's worth mentioning that it does pretty well on endless compounding: Donaudampfschiffahrtsgesellschaftskapitaenskajuetenschluesselloch, which I got from a website, or even just neunhundertneunundneunzigtausendneunhundertneunundneunzig.
As for 'dictionary words' that could do with a little breaking, I can't manage anything quite as good as the German offhand, but just whacking ちょう into an online dictionary and seeing what came up:
長距離電話会社 【ちょうきょりでんわがいしゃ】 (n) long-distance phone company
超低金利金融政策 【ちょうていきんりきんゆうせいさく】 (n) ultra-loose monetary policy
超電導磁気浮上式鉄道 【ちょうでんどうじきふじょうしきてつどう】 (n) superconducting maglev train
Or if we allow proper names, I quite liked this one:
朝鮮民主主義人民共和国 【ちょうせんみんしゅしゅぎじんみんきょうわこく】 (n) Democratic People's Republic of Korea (North Korea); DPRK; (P)
Include technical and scientific terms, and you can go on forever.
This isn't entirely idle banter, as anime is full of stuff that's clumsily translated into English as 'Great Dragon Iron Fist Fireball Attack' and equally silly phrases. Though not 'dictionary words', they're really no different to sensible noun phrases you would find in the dictionary.
Rar
Ok, so there are indeed some entries in dictionaries which need some spacing if you're going to romanize them. Taking the first one of your examples, I would romanize that one as follows:
長距離電話会社 --> Chou-Kyori Denwagaisha = long-distance phone company
I'm pretty sure 'Denwagaisha' should be written as one single word here, otherwise it would have been 'Denwa Kaisha' instead. And Chou-Kyori and Denwagaisha need spacing between them because when you say it out loud in Japanese there's a small pause or something after the Kyori. At least you can definitely 'feel' that it's not one long word.
Hmm, it's kinda hard for me to explain why, it probably has something to do with 'pitch accents' or how the 'word' sounds and flows in Japanese, but I can't think of 'Natsumatsuri' as anything but one single word. 'Natsu Matsuri' just looks wrong to my eyes, just like 'Denwa Gaisha', or 'Natsu Iro' would. Perhaps I'll ask a Japanese linguist about this tomorrow, I've gotten kind of curious. ^^
Rar wrote: If we're being picky, I'd use caps for non-particles and break the noun phrase in the romanisation: Pikachuu no Natsu Matsuri - and you missed using pretty colours!
Well, I went with Rar's, except I used Pikachu, since that is the way it is spelled in English literature and it probably falls under the loanword clause (ironically). So I put in: 'Pikachu no Natsu Matsuri'. Anyone who thinks otherwise can try to convince Rar to change it.
egg wrote: except I used Pikachu, since that is the way it is spelled in English literature
I was wracking my brains then trying to remember if Pikachu was a character in Wuthering Heights or Great Expectations.
Er.. but seriously, isn't the name always given as ピカチュウ in Japanese rather than in roman letters? No need to drop the long vowel just because the American versions do. Most Pokemon names are changed completely, and romaji fields should still have the Japanese transcription rather than the American translation.
Note this is different to this case, where a roman spelling is provided for a loanword/name, but lines can get blurry.
Back to Rafal briefly, seems to me some elements of this discussion are reasonably constant across languages while others are less so. The wikip rules for breaking words seem to boil down to minimal semantic units vs. rhythm of speech. Obviously if you take either too religiously you get silly results.
A bluebird is not always blue bird, and as mentioned a 砂時計 is not as such 砂・時・計, though there is some sense in breaking at every character in many japanese noun phrases, as most kanji do have a semantic involvement.
On the other hand, if you try to go purely on spoken usage you run into the problem that a language is anything but a set-in-stone standard. What might 'feel right' to you may be seen as outright wrong by other speakers of the language; the intonation and pauses you might think natural are by no means universal.
Anyway, dictionaries are quite a useful snapshot of how you might define 'words' at a moment in time in a certain dialect, and I suggested something broadly similar to what you did, it's nice arguing against my own proposition to find breaking cases.
As for your splitting of 長距離電話会社, I think you shouldn't have been a chicken and just done the shortest one.
Anyway, it seems reasonable, but I don't like Prefix-Word much, though people are used to the Word-suffix format. And as this is just dog transcription, being able to parse it matters as much as some abstract 'correctness' - 'correct' is to write it in Japanese.
Anyway, I think the ultimate solution is some flashy title handling with markup, but for the moment it's just a case of trying to give sensible options. You'll see why there are no guidelines as of yet, I hope: this is like the problem of mapping coastlines, the closer you look the longer a job it gets.
Rar
"Try to limit word to shortest possible combination of kanji with on-yomi without leaving one dangling kanji and treat each kanji with kun-yomi as separate word, unless there's consonant shift" should work, I think.
So "natsu iro" and "natsu matsuri" should be split, and "sunadokei" or "choukyori" should be not.
All right, I did a little research, asked around and my findings were:
- Like Rar said earlier, it's hard to define what exactly a word is in Japanese and there's no absolute and definite way to find out. However, currently one common and somewhat accepted method is indeed the 'dictionary entry'. The entries which Rar found, btw, are not found in any of the 'great' J-J dictionaries (Koujien, Daijirin and Daijisen) and are thus not regarded as single words according to this method.
- Most Japanese people think of "Natsumatsuri" as one word.
- Both my own J-E dictionary and my two kanji dictionaries (the New Nelson and Kanji & Kana: A Handbook of the Japanese Writing System) write 夏祭り as one single word in romaji.
The second book also has an interesting paragraph about romanization and how to define what a 'word' is in Japanese:
"The only real problem in romanizing Japanese text, in which there are no spaces between words, is in deciding where one word ends and the next begins. There are no universal rules for this, but, as a basic principle, components which are perceived to be independent units are written separately: Hon o sagashite iru n desu. Hyphenation is used for various suffixes and other word units that one does not want to run together but does not want to write separately: Toukyou-to, Minato-ku, Endou-san. For readability, long compounds are broken up into smaller units: Nihon Shoki, kaigai ryokou, minshu shugi."
- Google search results:
夏祭り "natsu matsuri" = 90 hits
夏祭り natsumatsuri = 2880 hits
The rows. suggestion is quite fun, well worth listing as an idea. Though I can think of some cases where dakuten are added but perhaps a space would still be warranted, noun phrases beginning 二人 for instance.
More interestin' stuffs from rafal, I'm sure you can find some much longer 'words' in the j-j dicts than my hasty internet check managed though, if you have a good poke. Really, I'd prefer to see what these textbooks say on the topic of prefixes, something I've seen very little on as opposed to the established word-suffix.
Finally, one point that needs making from experience with wikipedia, google fights that don't involve hundreds or at least tens of thousands of results are almost entirely worthless when trying to prove something. The internet is a very poor sample of language use, but is very good at propagating errors widely. Particularly trying to get an idea of how japanese use romaji from the web is a pointless exercise (read: however they damn well like, they read their own scripts so it's not like romaji correctness matters). And final point, google.nl and google.something_else will return different results, if you want japanese pages co.jp is probably slightly preferable.
Rar
Rar wrote: More interestin' stuffs from rafal, I'm sure you can find some much longer 'words' in the j-j dicts than my hasty internet check managed though, if you have a good poke.
Perhaps, but I don't see why you can't use this as a general rule though. Long entries can always be looked at separately.
Btw, 長距離電話(ちょうきょりでんわ) does appear in the Daijirin and is romanized as "Choukyoridenwa" in my J-E romaji dictionary and as "Choukyori Denwa" in the New Nelson. I personally would say the second one looks better, for the reasons I have already mentioned (so this would indeed be an exception). The hyphen I used earlier isn't really needed.
Rar wrote: Really, I'd prefer to see what these textbooks say on the topic of prefixes, something I've seen very little on as opposed to the established word-suffix.
I can't find much of it either, but I think it depends on the prefix and the word. For instance for the politeness prefix 御 ("o", or "go") hyphenation is usually preferred when the prefix is not considered an integral part of the word itself: O-genki desu ka?. And when the prefix has become a part of the word as in nouns like 'otaku', 'ojou-san' or 'gohan', the general rule is to write them without a hyphen.
Rar wrote: Finally, one point that needs making from experience with wikipedia, google fights that don't involve hundreds or at least tens of thousands of results are almost entirely worthless when trying to prove something. The internet is a very poor sample of language use, but is very good at propagating errors widely. Particularly trying to get an idea of how japanese use romaji from the web is a pointless exercise (read: however they damn well like, they read their own scripts so it's not like romaji correctness matters). And final point, google.nl and google.something_else will return different results, if you want japanese pages co.jp is probably slightly preferable.
I've used google.nl/.com/.co.jp and they all gave similar results, so I did think about that before posting those results (the default I use is .nl).
As for 'proving' anything with this, well there isn't really anything to prove as there are no universal rules for this. All you can do is look at what literature and linguists have to say about this, how other people do it and draw your own conclusions and create your own rules from there. I'm just seeing that in most (all?) Hepburn based Japanese (text)books and romaji dictionaries 夏祭り is romanized as one word, so I'm more inclined to follow their example and write it as one single word as well.
Sorry for the double post, forgot to reply to rowaasr's post. ^^;
rowaasr13 wrote: "Try to limit word to shortest possible combination of kanji with on-yomi without leaving one dangling kanji and treat each kanji with kun-yomi as separate word, unless there's consonant shift" should work, I think.
This doesn't seem like a very good idea, for instance with this 'rule' you'd end up writing the Japanese word for 'wheelchair' 車椅子(くるまいす) as 'Kuruma Isu'. Or the word 仲間(なかま) as 'Naka Ma'. I think nobody with any knowledge of Japanese would want to romanize these words like that.
I propose to just write everything that can be found in the dictionary as one word, unless the word is very long or hard to read (for instance the earlier mentioned 'Choukyori Denwa').
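To make that proposal a bit more concrete, here is a rough Python sketch of how the rule could be applied mechanically. Everything in it is an assumption for illustration only: the tiny DICTIONARY set stands in for a real J-J or J-E dictionary, MAX_WORD_LENGTH is an arbitrary cutoff for 'very long or hard to read', and romanize_with_spaces is a made-up helper name. It uses greedy longest-match, which is just one possible way to read the rule.

```python
# Hypothetical stand-in for a real dictionary: romaji headwords that are
# assumed to exist as single entries (taken from examples in this thread).
DICTIONARY = {
    "natsumatsuri", "natsuiro", "sunadokei", "kurumaisu", "nakama",
    "choukyori", "denwa", "choukyoridenwa",
}

# Assumed cutoff: entries longer than this count as "very long or hard
# to read" and get broken into shorter dictionary words where possible.
MAX_WORD_LENGTH = 12


def romanize_with_spaces(romaji, dictionary=DICTIONARY, max_len=MAX_WORD_LENGTH):
    """Split an unspaced romaji string using greedy longest-match lookup."""
    words = []
    i = 0
    while i < len(romaji):
        # Try the longest candidate first, but never longer than max_len.
        for j in range(min(len(romaji), i + max_len), i, -1):
            if romaji[i:j] in dictionary:
                words.append(romaji[i:j])
                i = j
                break
        else:
            # Nothing in the dictionary matches here: keep the rest unsplit.
            words.append(romaji[i:])
            break
    return " ".join(word.capitalize() for word in words)


print(romanize_with_spaces("natsumatsuri"))    # Natsumatsuri
print(romanize_with_spaces("kurumaisu"))       # Kurumaisu
print(romanize_with_spaces("choukyoridenwa"))  # Choukyori Denwa
```

With these assumptions 'Natsumatsuri' and 'Kurumaisu' stay whole because they are short dictionary entries, while 'Choukyoridenwa' goes over the cutoff and falls back to 'Choukyori Denwa'. Greedy matching can still pick a bad split in other cases, so long entries would need to be looked at separately anyway.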