A little change for Search [tracked]

old granted and denied feature requests

Moderator: AniDB

OnegaiNL
Posts: 80
Joined: Wed Oct 20, 2004 4:38 pm

Post by OnegaiNL »

On Google, when you search for something you get this, for example:

you search for 'piks'
and you get results with something above them which says 'Did you mean: pics'

it would be nice for AniDB to have something like that too. Example for AniDB:

you search for 'Lolicon Angel' (you actually want to find Lolikon Angel)
it doesn't find it because the other names listed for Lolikon Angel don't contain 'Lolicon Angel'
it would be nice if it said 'Did you mean: Lolikon Angel' or something :)

i don't know if it is hard to create since i can't script or anything lol
exp
Site Admin
Posts: 2438
Joined: Tue Oct 01, 2002 9:42 pm
Location: Nowhere

Post by exp »

dunno,

does anyone know if postgres supports some kind of fuzzy text searching?

BYe!
EXP
fahrenheit
AniDB Staff
Posts: 438
Joined: Thu Apr 08, 2004 1:43 am
Location: Portugal

Post by fahrenheit »

hmm, best matches I think would not be too difficult to implement: say one searches for ABCDEFGH and it doesn't find it, but nevertheless there is one ABCDEF and one BCDEFGH, so it displays those as possible matches.
A possible match with 90% similarity or more could be used as "did you mean BCDEFGH?"
dunno, postgres is way out of my league..
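
The "best match above a similarity threshold" idea can be sketched outside the database. This is only an illustration, not how AniDB works: it assumes the titles fit in memory, uses Python's standard `difflib` ratio as the similarity measure, and the `TITLES` list and the `did_you_mean` helper are hypothetical names.

```python
import difflib

# Hypothetical title list; a real site would load these from its database.
TITLES = ["Lolikon Angel", "Ghost in the Shell", "Rurouni Kenshin", "Akira"]


def did_you_mean(query, titles=TITLES, cutoff=0.9):
    """Return the most similar title if it reaches the cutoff, else None."""
    # Compare case-insensitively but hand back the title as originally written.
    by_lowercase = {title.lower(): title for title in titles}
    matches = difflib.get_close_matches(
        query.lower(), list(by_lowercase), n=1, cutoff=cutoff
    )
    return by_lowercase[matches[0]] if matches else None


print(did_you_mean("Lolicon Angel"))
print(did_you_mean("ABCDEFGH", ["ABCDEF", "BCDEFGH"]))
```

With this particular measure, ABCDEFGH scores about 0.93 against BCDEFGH and about 0.86 against ABCDEF, so only BCDEFGH clears a 90% cutoff. Every title is compared on every search, which is the kind of cost a real implementation would have to watch.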
nich
Posts: 33
Joined: Sat Feb 08, 2003 12:38 am

Post by nich »

exp wrote:[...]does anyone know if postgres supports some kind of fuzzy text searching?[...]
Well... a quick look at Google and I found something about "fuzzy search" on the PostgreSQL developers' page. Couldn't find anything else of relevance on that page, though.

Interesting that the guy listed as having implemented fuzzy search is (apparently) also one of the main developers of OpenFTS (which gmni implemented in the animereactor forum and which, so I believe, is one of the main reasons it's taking so long :P ).

Maybe you could take a look at it and check if there's anything useful for anidb.

-nich
wahaha
AniDB Staff
Posts: 1497
Joined: Sun Nov 17, 2002 3:33 pm

Post by wahaha »

I suppose a good (and not too costly) way would be to have some algorithm that maps a string to an integer that doesn't change much on small differences. Alas, I don't know any such formula. ^^;

Something simpler, but still somewhat useful, would be to generate one "dumbed down" list of all titles, which eliminates most ambiguities like "k->c", "m->n", etc...
The search terms could then be dumbed down in the same manner and compared to that list with a simple string comparison.
One could either apply a very aggressive conversion, so that many results are returned which can then be sorted with a more sophisticated (and time-intensive) method, for example the Levenshtein distance; or one could apply a moderate conversion and simply output all matches.
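
A minimal sketch of the "dumbed down" comparison, followed by a Levenshtein sort of whatever matches. The two replacement rules are just the ones named above, and the function names `dumb_down`, `levenshtein` and `suggest` are made up for illustration; a real rule set would need tuning against actual titles.

```python
def dumb_down(text):
    """Reduce a title to a crude key so that near-identical spellings collide."""
    text = text.lower()
    # Illustrative rules only, taken from the post: k->c and m->n.
    for src, dst in (("k", "c"), ("m", "n")):
        text = text.replace(src, dst)
    # Ignore spaces and punctuation as well.
    return "".join(ch for ch in text if ch.isalnum())


def levenshtein(a, b):
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def suggest(query, titles):
    """Titles whose dumbed-down key equals the query's key, closest spelling first."""
    key = dumb_down(query)
    hits = [t for t in titles if dumb_down(t) == key]
    return sorted(hits, key=lambda t: levenshtein(query.lower(), t.lower()))


print(suggest("Lolicon Angel", ["Lolikon Angel", "Akira"]))
```

The keys for all titles could be computed once and stored, so the per-search cost is one key computation plus an ordinary equality lookup; the Levenshtein step only runs on the few titles that share a key.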
Jarudin
Posts: 7
Joined: Tue Jan 11, 2005 5:19 pm

Post by Jarudin »

You could 'enhance' the search by allowing wildcards, which would also solve a problem I run into quite often.

My problem: for instance, you search for "Ghost shell" expecting to find "Ghost in the shell", but apparently a -space- is not treated as an -and- or an -or-, or as anything that would allow multiple wildcards (in this case " in the " would be the wildcard part). This could be solved easily (if you work with SQL, that is).

Then you could allow a -space- to be a multi-char wildcard (0 or more chars in between) and ? to be a single-char wildcard (0 or 1 chars in between).

THEN (still going :p ) you could (when no search results are returned) replace each char in turn with a -?- to make that char the wildcard, and just keep going till you find a result.

Examples:
"Ghost Shell" would give "did you mean: Ghost in the shell?" (So it's like "%Ghost%shell%" (note the extra % there))
"Ra?m??r? Sekitan" would give "did you mean: Raimuiro Sekitan?" (might have misspelled that, you get the point :p )
"Rourouni Kenshin" would give "did you mean: Rurouni Kenshin?"
"Gnundam" would give "did you mean: Mobile Suit Gundam Wing or .. etc.."

Also, the -space- could be taken as an -or- when no search results are found using -space- as an -and-:
"Pokemon Advance" would give "did you mean: Pokemon?"

However, this might be -somewhat- time consuming (I don't know how much one full search costs); then again, it could save a lot of searches as well :)
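
The wildcard rules and the "replace each char in turn" fallback can be sketched like this. It is an illustration under assumptions: it uses Python regular expressions rather than SQL (SQL's `_` matches exactly one character, not zero or one), the titles are held in a hypothetical in-memory `TITLES` list spelled as in the examples above, and the function names are made up.

```python
import re

# Hypothetical data, spelled as in the examples above.
TITLES = ["Ghost in the Shell", "Raimuiro Sekitan", "Rurouni Kenshin",
          "Mobile Suit Gundam Wing"]


def to_regex(query):
    """Space = any run of chars (0 or more), '?' = zero or one char."""
    parts = []
    for ch in query:
        if ch == " ":
            parts.append(".*")
        elif ch == "?":
            parts.append(".?")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def search(query, titles=TITLES):
    """Substring-style search, like wrapping the pattern in % ... %."""
    pattern = to_regex(query)
    return [t for t in titles if pattern.search(t)]


def search_with_fallback(query, titles=TITLES):
    """If nothing matches, turn each char into '?' in turn until something does."""
    hits = search(query, titles)
    if hits:
        return hits
    for i, ch in enumerate(query):
        if ch in " ?":
            continue  # already a wildcard
        hits = search(query[:i] + "?" + query[i + 1:], titles)
        if hits:
            return hits
    return []


print(search("Ghost Shell"))
print(search_with_fallback("Rourouni Kenshin"))
print(search_with_fallback("Gnundam"))
```

The fallback runs up to one extra full search per character of the query, which is where the time cost mentioned above comes from.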
exp
Site Admin
Posts: 2438
Joined: Tue Oct 01, 2002 9:42 pm
Location: Nowhere

Post by exp »

hmmm,

strange, thought I did replace ' ' with '%'.
but it seems that I commented that part out at some point.
I'll change that back -> ONTODO

about the other things, they're way too heavy on the DB in my opinion.

BYe!
EXP
pelican
AniDB Staff
Posts: 234
Joined: Wed Aug 11, 2004 11:19 pm

Post by pelican »

exp wrote:strange, thought I did replace ' ' with '%'.
but it seems that I commented that part out at some point.
I'll change that back -> ONTODO
Er, wouldn't that eliminate the `feature' of trailing spaces for word boundaries, as you explain every time someone notes that bug? :)
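
To make the trade-off concrete, here is a small sketch using SQLite from Python. It assumes the search is a plain `LIKE '%term%'` query, which is a guess and not something confirmed in this thread; the table, the titles and the `search` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE anime (title TEXT)")
# Made-up titles, chosen only to show the effect.
conn.executemany(
    "INSERT INTO anime VALUES (?)",
    [("Akira",), ("Akira Special",), ("Akirameru",), ("Ghost in the Shell",)],
)


def search(term, spaces_as_wildcards):
    """Substring search; optionally turn every space into the % wildcard."""
    if spaces_as_wildcards:
        term = term.replace(" ", "%")
    rows = conn.execute(
        "SELECT title FROM anime WHERE title LIKE ? ORDER BY title",
        ("%" + term + "%",),
    )
    return [row[0] for row in rows]


# Trailing space kept literal: only titles where "akira" ends a word match.
print(search("akira ", spaces_as_wildcards=False))
# Trailing space turned into %: the word boundary is gone.
print(search("akira ", spaces_as_wildcards=True))
# The case the replacement is meant to fix.
print(search("ghost shell", spaces_as_wildcards=True))
```

Under that assumption, replacing spaces with '%' makes "ghost shell" find "Ghost in the Shell", but a trailing space stops acting as a word boundary. One possible compromise would be to replace only the spaces between words and leave leading and trailing ones alone.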
egg
Posts: 769
Joined: Tue Nov 11, 2003 7:17 am

Post by egg »

pelican wrote:Er, wouldn't that eliminate the `feature' of trailing spaces for word boundaries, as you explain every time someone notes that bug? :)
Shhhhh... You aren't supposed to tell him that. :wink:
kaoru
Posts: 7
Joined: Tue Nov 19, 2002 4:02 pm
Location: ehrgeiz

Post by kaoru »

aye.

i push this thread again, because i am a nervous person and in my dark room i often mistype.
so fuzzy logic please help me.
Rar
AniDB Staff
Posts: 1471
Joined: Fri Mar 12, 2004 2:41 pm
Location: UK

Post by Rar »

This isn't practical. Handling a one-letter error or transposition is doable, at a performance cost, but for really smart searching you need to delegate to the huge companies who already have the tech for this set up, i.e. let the search engines in.
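
A rough sketch of the one-letter case: generate every string that is one deletion, transposition, replacement or insertion away from the query and look each one up. It assumes titles are matched whole and case-insensitively, and the names `one_edit_variants` and `correct` are hypothetical.

```python
import string


def one_edit_variants(word, alphabet=string.ascii_lowercase + " "):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    replaces = {left + c + right[1:]
                for left, right in splits if right for c in alphabet}
    inserts = {left + c + right for left, right in splits for c in alphabet}
    return deletes | transposes | replaces | inserts


def correct(query, titles):
    """Exact match if there is one, otherwise titles reachable with a single edit."""
    index = {title.lower(): title for title in titles}
    query = query.lower()
    if query in index:
        return [index[query]]
    return sorted(index[v] for v in one_edit_variants(query) if v in index)


print(correct("Gnudam", ["Gundam", "Akira"]))
print(correct("Lolicon Angel", ["Lolikon Angel"]))
```

The performance cost shows up in the number of candidates: with this 27-character alphabet, a query produces a few dozen candidate strings per character, each needing its own lookup, and allowing two edits would multiply that again.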

Rar
n4rut0
Posts: 61
Joined: Wed Jun 08, 2005 11:53 pm
Location: Dominican Rep.

Post by n4rut0 »

Jarudin wrote:[...]you search for "Ghost shell" expecting to find "Ghost in the shell" but apparently, a -space- is not seen as an -and- or -or- possibility[...]
In reply to this guy's quote: spaces work in a pretty fun way in AniDB. Try searching for " akira" with a space before it, or "akira" without spaces, and you go directly to akira countdown... but when you put "akira " with a space after it ^^ it gives you 2 different results, one of them being Samurai 7 (because it has akira in the name too). Another nice example is ninja scroll.

Try " ninja scroll" and it takes you to the first ninja scroll it finds in the database (Basilisk). If you type "ninja scroll" it gives you 5 results... if you type "ninja  scroll" with two spaces you get nothing ^^ and with "ninja scroll " with a space after it you get the movies ^^ I don't know if that's intentional, but it's kinda fun how you can play with spaces in this database.