When I enter Russian characters in the 'search' field, I get no results, even if I copy and paste (Ctrl+C, Ctrl+V) the title from the anime page.
For example, Last Exile: I can find it using the Japanese (ラストエグザイル) and even the Arabic (لمنفى الأخير) titles, but not by searching for the Russian title (Изгнанник).
http://tracker.anidb.info/view.php?id=665
No results when searching for anime by Russian title
Confirmed:
http://anidb.info/perl-bin/animedb.pl?s ... rch=%D0%B6
Greek also has the same problem:
http://anidb.info/perl-bin/animedb.pl?s ... rch=%CF%83
My suspicion (based on knowing how crappy non-ASCII string handling is in Perl) was that at some point it's treating the input as Latin-1 and lowercasing it*, but a quick glance over adbs_animelist.pm and adbs_all_misc.pm shows nothing obviously doing that - unless PostgreSQL is fucking up ILIKE.
Rar
*Reasoning:
>>> for c in (u"し", u"σ", u"ж"):
...     utf8 = c.encode('utf8')
...     lcbork = utf8.decode('latin1').lower()
...     tripdone = lcbork.encode('latin1').decode('utf8','replace')
...     print " ".join(repr(s) for s in (c, utf8, lcbork, tripdone))
...
u'\u3057' '\xe3\x81\x97' u'\xe3\x81\x97' u'\u3057'
u'\u03c3' '\xcf\x83' u'\xef\x83' u'\ufffd'
u'\u0436' '\xd0\xb6' u'\xf0\xb6' u'\ufffd'
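For anyone following along on Python 3, the same round trip can be sketched roughly as below. The helper name `latin1_lowercase_roundtrip` is made up for illustration, and it only models the suspected bug, not anything confirmed in the AniDB code. It shows why し survives while σ and ж do not: their lead bytes 0xCF and 0xD0 are uppercase letters in Latin-1 (Ï and Ð), so lowercasing turns them into 0xEF and 0xF0, and the result is no longer valid UTF-8.

```python
def latin1_lowercase_roundtrip(text: str) -> str:
    """Model the suspected bug (an assumption, not the real AniDB code path):
    UTF-8 bytes are misread as Latin-1, lowercased, then read back as UTF-8."""
    utf8 = text.encode("utf-8")
    # Each byte becomes one Latin-1 character; lower() shifts 0xC0-0xDE up by 0x20.
    misread = utf8.decode("latin-1").lower()
    # Undecodable sequences are replaced with U+FFFD instead of raising.
    return misread.encode("latin-1").decode("utf-8", "replace")


for ch in ("し", "σ", "ж"):
    print(repr(ch), ch.encode("utf-8").hex(), repr(latin1_lowercase_roundtrip(ch)))
```

The Japanese character passes through unchanged only because none of its bytes happen to fall in the Latin-1 uppercase range.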
Right, we've done a bit more testing, and it turns out that it works* on dev:
<EXP[BUSY]> well, main difference between dev and main atm should be locale
<EXP[BUSY]> C on main, en_GB.UTF-8 on dev
Changing the locale should fix it; this won't be done immediately, though, as it means quite a bit of downtime.
Rar
*Where works=doesn't fail as much.
These two queries return different results:
show=animelist&adb.search=%D0%96
show=animelist&adb.search=%D0%B6
So it's clearly still only as Unicode-aware as the usual Unicode 'support': treating UTF-8 as funny ASCII and being careful not to fuck with the top bit.
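The only difference between those two queries is case: %D0%96 is Ж and %D0%B6 is ж. Here is a rough Python sketch of why a byte-oriented, ASCII-only case fold keeps them apart while a Unicode-aware one does not. It illustrates the behaviour described above under that assumption; it is not the actual AniDB or PostgreSQL code path.

```python
from urllib.parse import unquote_to_bytes

upper_query = unquote_to_bytes("%D0%96")  # UTF-8 bytes for Ж
lower_query = unquote_to_bytes("%D0%B6")  # UTF-8 bytes for ж

# bytes.lower() only maps ASCII A-Z, so bytes with the top bit set are left alone.
ascii_fold_matches = upper_query.lower() == lower_query.lower()

# Decoding first lets str.lower() apply Unicode case mapping.
unicode_fold_matches = (
    upper_query.decode("utf-8").lower() == lower_query.decode("utf-8").lower()
)

print(ascii_fold_matches, unicode_fold_matches)  # False True
```

With the ASCII-only fold the search is effectively case-sensitive for anything outside A-Z, which matches the two queries returning different results.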