Character encoding

Post by **MaxVT** » Sat Dec 18, 2004 4:37 pm

The UDP spec says: character encoding = unknown.

What's that supposed to mean? One may assume ASCII or 8859-1 or whatever, but what about other languages? Japanese, say? Or anything but English, for that matter?

AniDB is working just fine, but you will have to deal with those issues sometime... and choosing UTF-8 or UCS-2 now is much better than trying to deal with assorted character translations later.

Post by **PetriW** » Sat Dec 18, 2004 5:08 pm

The TCP API uses ISO-8859-1, BUT sometimes it uses UTF-8. What you should support instead is standard ASCII plus HTML-encoded characters. Maybe support ISO-8859-1 as well, since there are many entries which use characters above 127, but if you do, be prepared to switch the character conversion to UTF-8 without notice.
Due to issues exp had with Perl, the decision for Unicode was to use HTML-encoded characters.

(Note: the TCP API has changed encoding 4 times so far.)
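
For anyone writing a client against this, here is a minimal Python sketch of the tolerant approach described above: try UTF-8 first, fall back to ISO-8859-1, then resolve HTML-encoded characters. The function name `decode_field` is made up for illustration, and nothing here is taken from the AniDB API itself; it only assumes the server sends bytes in one of those two encodings.

```python
import html


def decode_field(raw: bytes) -> str:
    """Decode a raw field whose encoding is not declared (hypothetical helper)."""
    try:
        # Strict UTF-8 rejects most ISO-8859-1 text that contains bytes > 127.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a code point, so this cannot fail.
        text = raw.decode("iso-8859-1")
    # Turn HTML character references (&#12354;, &eacute;) into real characters.
    return html.unescape(text)


if __name__ == "__main__":
    print(decode_field(b"caf\xe9"))      # ISO-8859-1 input
    print(decode_field(b"&#12354;"))     # HTML-encoded Japanese character
```

This is a heuristic, not a guarantee: a few ISO-8859-1 byte sequences also happen to be valid UTF-8 and will be read as UTF-8, so a declared encoding would still be the better fix.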