Character encoding

Want to help out? Need help accessing the AniDB API? This is the place to ask questions.

Moderator: AniDB

Locked
MaxVT
Posts: 2
Joined: Sat Dec 18, 2004 4:24 pm

Post by MaxVT »

The UDP spec says: character encoding = unknown.

What's that supposed to mean? One may assume ASCII or ISO-8859-1 or whatever, but what about other languages? Japanese? Anything but English, for that matter?

AniDB is working just fine, but you will have to deal with these issues sometime... and choosing UTF-8 or UCS-2 now is much better than trying to deal with assorted character translations later.
PetriW
AniDB Staff
Posts: 1522
Joined: Sat May 24, 2003 2:34 pm

Post by PetriW »

The TCP API uses ISO-8859-1, BUT sometimes it uses UTF-8. What you should support instead is standard ASCII plus HTML-encoded characters. You may also want to support ISO-8859-1, since there are many entries which use characters >127, but if you do, be prepared to switch the character conversion to UTF-8 without notice.
Due to issues exp had with Perl, the decision for Unicode was to use HTML-encoded characters.

(Note: the TCP API has changed encoding 4 times so far.)
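
For client authors, here is a rough Python sketch of the defensive approach described above. It is not taken from any AniDB client or spec; the function names decode_api_text and encode_api_text are made up for illustration. It assumes a response arrives as raw bytes, tries UTF-8 first, falls back to ISO-8859-1, and then expands HTML-encoded characters. On the way out it sends plain ASCII with everything else as numeric HTML character references.

```python
import html


def decode_api_text(raw: bytes) -> str:
    """Hypothetical helper: turn raw API bytes into a Python string.

    Assumption: the payload is either UTF-8 or ISO-8859-1, and may
    contain HTML-encoded characters such as &#12354; or &eacute;.
    """
    try:
        # Pure ASCII is valid UTF-8, so this also covers the ASCII case.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this never fails.
        text = raw.decode("iso-8859-1")
    # Expand named and numeric HTML character references.
    return html.unescape(text)


def encode_api_text(text: str) -> bytes:
    """Hypothetical helper: send ASCII, with anything else HTML-encoded."""
    # "xmlcharrefreplace" turns non-ASCII characters into &#NNNN; references.
    return text.encode("ascii", "xmlcharrefreplace")
```

Keep in mind the UTF-8-first check is only a heuristic: a few ISO-8859-1 byte sequences also happen to be valid UTF-8, so an occasional title may still come out wrong. The encoder also leaves a literal "&" untouched, which may or may not be what the server expects.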