Change encoding to UTF-8 [DENIED]
Moderator: AniDB
It would be good if the encoding were changed to UTF-8. That would allow adding Japanese names as synonyms, as well as other official synonyms in other languages, like "ΑΠΗΕΝΤΟ ΣΟΜΑ" for Argento Soma or "Пламенный лабиринт" for Labyrinth Of Flames. (BTW, the last one already has that synonym, but because of the encoding mess you can only see it after manually passing it through several re-encodings.)
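A minimal Python sketch of how that kind of encoding mess typically arises. It assumes the Cyrillic synonym was stored as UTF-8 bytes and later read as Windows-1251 (`cp1251`); which wrong codec was actually involved is unknown, so the codec choice is an illustrative assumption. One wrong decode scrambles the title, and reversing that exact step recovers it.

```python
# Sketch: a UTF-8 title read with the wrong codec, then repaired.
# Assumption: the wrong codec is cp1251 (Windows Cyrillic); the real one may differ.

TITLE = "Пламенный лабиринт"  # synonym for Labyrinth Of Flames


def store_as_utf8(text: str) -> bytes:
    """Encode the title the way a UTF-8 site would store it."""
    return text.encode("utf-8")


def misread(raw: bytes, wrong_codec: str = "cp1251") -> str:
    """Decode stored UTF-8 bytes with the wrong codec, producing mojibake."""
    return raw.decode(wrong_codec)


def repair(garbled: str, wrong_codec: str = "cp1251") -> str:
    """Undo one wrong decode: get the original bytes back, then decode as UTF-8."""
    return garbled.encode(wrong_codec).decode("utf-8")


if __name__ == "__main__":
    garbled = misread(store_as_utf8(TITLE))
    print(garbled != TITLE)          # True: the wrong codec scrambles the title
    print(repair(garbled) == TITLE)  # True: one reverse "recode" restores it
```

If the data went through several such wrong steps, each one has to be undone in reverse order, which would explain needing several recodes.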
Last edited by rowaasr13 on Sun Sep 28, 2003 9:59 pm, edited 1 time in total.
I guess as long as fewer than roughly 1% of all the users who enter filenames understand Kanji at all, this request is kind of moot. With so much information still missing for files (hashes, aspect ratios, groups, sources for the RAWs, etc.), I don't think there should be an option to add something that, once entered, could be checked by no more than one other person from that 1%.
It would be like those people running around with Kanji on their T-shirts or tattoos who were told "This means this and that", and in the end it means something totally different. We could end up with insults and similar stuff in the database and nobody would notice.
Not that the Japanese don't run around with silly stuff in Roman letters on their shirts, but I guess that is why a Japanese AniDB ... forget that, anime titles are often silly enough that they wouldn't even notice, I guess ^_^
While the number of users who'll find Kanji titles interesting is certainly quite small, this feature is IMO too easy not to implement.
Either in the HTML template:
Code:
<head>
...
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
Or in Perl:
Code:
print "Content-type: text/html; charset=UTF-8\n\n";
The database itself doesn't have to worry about UTF-8 encoded titles, since they are just sequences of octets that pass through like any other 8-bit text. However, adding the charset directive only solves the display issue; I don't know if it's equally simple to, e.g., search the database for UTF-8 encoded titles. (Shouldn't be a problem though, if browsers submit form values as UTF-8.)
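On the search question, UTF-8 has a useful property: every byte inside a multi-byte character is 0x80 or above, and lead bytes are distinguishable from continuation bytes, so a UTF-8 encoded search term can only match at character boundaries. A plain byte-substring search (exact, case-sensitive) therefore finds the same titles a character-level search would. A minimal sketch, assuming titles are stored as raw UTF-8 bytes; `TITLES` and `search_titles` are hypothetical names.

```python
from typing import List

# Hypothetical title store: raw UTF-8 bytes, as an 8-bit clean database would keep them.
TITLES = [
    "Argento Soma".encode("utf-8"),
    "ΑΠΗΕΝΤΟ ΣΟΜΑ".encode("utf-8"),
    "Labyrinth Of Flames".encode("utf-8"),
    "Пламенный лабиринт".encode("utf-8"),
]


def search_titles(query: str) -> List[str]:
    """Return stored titles containing `query`, comparing UTF-8 bytes directly.

    Exact (case-sensitive) matching only; the query is assumed to arrive as UTF-8.
    """
    needle = query.encode("utf-8")
    return [title.decode("utf-8") for title in TITLES if needle in title]


if __name__ == "__main__":
    print(search_titles("лабиринт"))  # ['Пламенный лабиринт']
    print(search_titles("Soma"))      # ['Argento Soma']
```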
At least IE does just that.
IE submits everything in the currently selected encoding: just look at my first post, which I typed after selecting UTF-8 in IE. If I hadn't done that, it would have been badly scrambled.
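Why the selected encoding matters: the same title becomes different bytes depending on the page charset, and when a character can't be represented at all, browsers fall back to numeric character references such as `&#1055;`. This sketch mimics that fallback with Python's `xmlcharrefreplace` error handler; `form_bytes` is a hypothetical helper, and it only approximates what a real browser does.

```python
import html

TITLE = "Пламенный лабиринт"


def form_bytes(value: str, page_charset: str) -> bytes:
    """Encode a form value roughly the way a browser would for a page in `page_charset`.

    Characters the charset cannot represent become &#NNNN; references
    (an approximation of browser behaviour, not an exact model).
    """
    return value.encode(page_charset, errors="xmlcharrefreplace")


if __name__ == "__main__":
    as_utf8 = form_bytes(TITLE, "utf-8")
    as_cp1251 = form_bytes(TITLE, "cp1251")
    as_latin1 = form_bytes(TITLE, "latin-1")
    print(as_utf8 == as_cp1251)  # False: same text, different bytes
    print(as_latin1[:14])        # b'&#1055;&#1083;'
    # References are plain ASCII, so they survive any charset and decode back:
    print(html.unescape(as_latin1.decode("latin-1")) == TITLE)  # True
```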
(Should be Unicode entities, thus encoding-independent.)
rowaasr13 wrote: like "ΑΠΗΕΝΤΟ ΣΟΜΑ" for Argento Soma or "Пламенный лабиринт" for Labyrinth Of Flames.
Interesting idea, though.
I think that info would fit best into a special field, like "Original title", instead of being "just another synonym".
For the search issue (and Chii's "!stitle" as well), it might be helpful to explicitly include a "romanized" version of the original title.
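The "Original title" plus "romanized" suggestion could be modelled roughly like this. It is a hypothetical schema (the field names are illustrative, not AniDB's): searches run over the romanization and the original script alike, so a user who can't type Kanji still finds the entry.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnimeTitle:
    """Hypothetical title record with a dedicated original-title field."""

    main: str                     # main (ASCII) title
    original: str = ""            # title in its original script, may be non-ASCII
    romanized: str = ""           # ASCII romanization of `original`, used for search
    synonyms: List[str] = field(default_factory=list)

    def names(self) -> List[str]:
        """All non-empty names, original script included."""
        return [n for n in (self.main, self.original, self.romanized, *self.synonyms) if n]

    def matches(self, query: str) -> bool:
        """Case-insensitive substring match against every stored name."""
        q = query.casefold()
        return any(q in name.casefold() for name in self.names())


def search(records: List[AnimeTitle], query: str) -> List[str]:
    """Return the main titles of all records matching `query`."""
    return [r.main for r in records if r.matches(query)]


if __name__ == "__main__":
    records = [
        AnimeTitle(main="Argento Soma", original="ΑΠΗΕΝΤΟ ΣΟΜΑ", romanized="Argento Soma"),
        AnimeTitle(main="Labyrinth Of Flames", synonyms=["Пламенный лабиринт"]),
    ]
    print(search(records, "argento"))    # ['Argento Soma']
    print(search(records, "ПЛАМЕННЫЙ"))  # ['Labyrinth Of Flames']
```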
*DENIED*
As we're having serious encoding problems on the AniDB client side, AniDB will even switch to _plain_ ASCII (meaning chars 1-127) soon.
All existing entries will be converted; unknown chars will be replaced with "_". Some special handling is done for common non-ASCII chars like äöüßéáèà...
So better not start adding any non-ASCII titles.
BYe!
EXP
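The planned conversion (special handling for common accented chars, "_" for everything else) might look roughly like this. The replacement table is an assumption for illustration, since the post doesn't say what äöüßéáèà map to: German umlauts and ß are expanded, other accented Latin letters lose their accents via Unicode decomposition, and anything left over becomes "_".

```python
import unicodedata

# Hypothetical special cases; the post doesn't say how AniDB maps these.
SPECIAL = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def _is_plain_ascii(text: str) -> bool:
    """True if every char is in the range 1-127."""
    return all(0 < ord(c) < 128 for c in text)


def to_plain_ascii(text: str) -> str:
    """Convert text to chars 1-127, replacing anything unmappable with '_'."""
    out = []
    for ch in text:
        if ch in SPECIAL:
            out.append(SPECIAL[ch])
        elif _is_plain_ascii(ch):
            out.append(ch)
        else:
            # NFKD splits "é" into "e" + a combining accent; drop the accent.
            decomposed = unicodedata.normalize("NFKD", ch)
            base = "".join(c for c in decomposed if not unicodedata.combining(c))
            out.append(base if base and _is_plain_ascii(base) else "_")
    return "".join(out)


if __name__ == "__main__":
    print(to_plain_ascii("Mädchen"))    # Maedchen
    print(to_plain_ascii("Pokémon"))    # Pokemon
    print(to_plain_ascii("Пламенный"))  # _________
```

Expanding umlauts rather than stripping them keeps "Mädchen" from collapsing to "Madchen"; whether AniDB's converter does the same is not stated.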
What kind of problems do you have exactly? Considering that UTF-8 leaves ASCII (chars below 128) completely unchanged, there shouldn't be anything serious. Most OSes I know have either built-in or widely available libraries for UTF-8 output, so that shouldn't be an issue either. UTF-8 fits well in URLs for GET requests (as percent-escaped bytes) and has no restrictions at all for POST (I remember the client will use HTTP-based requests, right?)
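To make the URL point concrete: UTF-8 text goes into a GET query string as percent-escaped bytes, so the URL itself stays pure ASCII, and the server side recovers the original string. A minimal sketch with Python's standard `urllib.parse`; the parameter name `title` is a hypothetical example, not the AniDB client's actual API.

```python
from urllib.parse import parse_qs, urlencode


def build_query(title: str) -> str:
    """Client side: put a title into a GET query string as percent-escaped UTF-8."""
    return urlencode({"title": title}, encoding="utf-8")


def read_query(query: str) -> str:
    """Server side: recover the title from the query string."""
    return parse_qs(query, encoding="utf-8")["title"][0]


if __name__ == "__main__":
    print(build_query("Argento Soma"))  # title=Argento+Soma (only the space is escaped)
    q = build_query("Пламенный лабиринт")
    print(q.isascii())                  # True: the URL itself stays pure ASCII
    print(read_query(q) == "Пламенный лабиринт")  # True
```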
In case you really want all chars below 128, you can use UTF-7: it will work just as well, but will take more space. It won't affect ASCII-only names at all (well, almost: not many anime have a + in their title).
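A quick check of the UTF-7 suggestion with Python's built-in `utf-7` codec: every output byte is below 128, plain ASCII letters pass through unchanged, "+" is escaped as "+-", and non-ASCII text costs more bytes than in UTF-8. (Per RFC 2152, "~" and "\" are also base64-encoded, so "+" is not quite the only exception.)

```python
def utf7_report(title: str) -> dict:
    """Encode a title as UTF-7 and compare it with UTF-8."""
    utf7 = title.encode("utf-7")
    return {
        "utf7": utf7,
        "all_below_128": all(b < 128 for b in utf7),  # pure 7-bit output
        "round_trips": utf7.decode("utf-7") == title,
        "utf7_len": len(utf7),
        "utf8_len": len(title.encode("utf-8")),
    }


if __name__ == "__main__":
    print("Argento Soma".encode("utf-7"))  # b'Argento Soma': ASCII letters pass through
    print("A+B".encode("utf-7"))           # b'A+-B': '+' is the shift char, so it's escaped
    r = utf7_report("Пламенный лабиринт")
    print(r["all_below_128"], r["round_trips"], r["utf7_len"] > r["utf8_len"])  # True True True
```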
I think the problem here is not the encoding but using double-byte chars within the programs. But don't .NET and Java both support double-byte char values natively? And I'm pretty sure I've seen a Delphi program or two that saved text files as double-byte streams... So it should be doable, ne?
In .NET, strings are always stored as Unicode (UTF-16), thus 2 bytes per char. There shouldn't be a problem, as even the char type is a 2-byte Unicode character. You'd only run into problems if you started handling strings as byte arrays (which is a really stupid idea, as the String class provides all the services an array does).
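The byte-array pitfall is easy to reproduce. This sketch uses Python rather than .NET, with the hypothetical `utf16_units` standing in for `String.Length`: character count, UTF-16 code units and UTF-8 bytes already differ for a Cyrillic title, and slicing the bytes cuts a letter in half. Strictly, a .NET `char` is a UTF-16 code unit, so a character outside the Basic Multilingual Plane occupies two of them.

```python
TITLE = "Пламенный лабиринт"


def utf16_units(text: str) -> int:
    """Number of 16-bit code units, i.e. what .NET's String.Length reports."""
    return len(text.encode("utf-16-le")) // 2


def first_bytes(text: str, n: int) -> bytes:
    """Treat the string as a UTF-8 byte array and keep only the first n bytes."""
    return text.encode("utf-8")[:n]


def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(len(TITLE), utf16_units(TITLE), len(TITLE.encode("utf-8")))  # 18 18 35
    print(is_valid_utf8(first_bytes(TITLE, 3)))  # False: cut in the middle of "л"
    print(utf16_units("\U0001D11E"))             # 2: outside the BMP, a surrogate pair
```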
Well,
it's all nice and good, but it's just a pain to work with.
We've lost way too much time trying to get UTF-8 support working on the entire data paths in AniDB (db<->cgi<->bot<->client<->misc),
so we've decided to drop non-ASCII support entirely for now.
As plain ASCII is valid UTF-8, this will always allow us to step up to complete UTF-8 support once everyone can work with it.
BYe!
EXP
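The upgrade path relies on a simple property: any byte string made only of chars 1-127 decodes to identical text as ASCII and as UTF-8, so an all-ASCII database could later be relabelled as UTF-8 without converting anything. A small check (the `reads_same_as_utf8` name is hypothetical):

```python
def reads_same_as_utf8(data: bytes) -> bool:
    """True if `data` is plain ASCII, and therefore reads identically as UTF-8."""
    try:
        return data.decode("ascii") == data.decode("utf-8")
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(reads_same_as_utf8(bytes(range(1, 128))))         # True: all of chars 1-127
    print(reads_same_as_utf8("Pokémon".encode("latin-1")))  # False: byte 0xE9 isn't ASCII
```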
kidan wrote: Aren't you using SOAP for the protocol stuff? SOAP should be able to handle multibyte charsets.
No, we're using a good old hand-made plaintext protocol.
But the encoding isn't really a protocol problem; we could just make sure all data is UTF-8 encoded before it's passed to the protocol level.
BYe!
EXP
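Making sure all data is UTF-8 encoded before it reaches the protocol level can be as small as one encode/decode pair at the boundary, so the plaintext protocol only ever handles bytes. A minimal sketch; the newline-terminated framing and the `to_wire`/`from_wire` names are assumptions for illustration, not AniDB's actual protocol.

```python
# Hypothetical line-based framing: one UTF-8 encoded line per message.


def to_wire(line: str) -> bytes:
    """Encode one outgoing protocol line as UTF-8 (newline-terminated)."""
    if "\n" in line:
        raise ValueError("a protocol line must not contain a newline")
    # Safe framing: UTF-8 never uses byte 0x0A inside a multi-byte character.
    return line.encode("utf-8") + b"\n"


def from_wire(raw: bytes) -> str:
    """Decode one incoming line; strict decoding rejects anything that isn't UTF-8."""
    if raw.endswith(b"\n"):
        raw = raw[:-1]
    return raw.decode("utf-8")  # raises UnicodeDecodeError on bad input


if __name__ == "__main__":
    msg = to_wire("Пламенный лабиринт")
    print(from_wire(msg) == "Пламенный лабиринт")        # True
    print(to_wire("Argento Soma") == b"Argento Soma\n")  # True: ASCII is untouched
```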