A little change for Search [tracked]
Moderator: AniDB
At Google, when you search for something you get this, for example:
you search for 'piks'
and you get results and something above them which says 'Did you mean: pics'
It would be nice if AniDB had something like that too. Example for AniDB:
you search for 'Lolicon Angel' (you actually want to find Lolikon Angel)
it doesn't find it because the other names of Lolikon Angel don't contain 'Lolicon Angel'
it would be nice if it said 'Did you mean: Lolikon Angel' or something
I don't know if it is hard to create since I can't script or anything lol
-
- AniDB Staff
- Posts: 438
- Joined: Thu Apr 08, 2004 1:43 am
- Location: Portugal
Hmm, best matches: I think it would not be too difficult to implement. Say one searches for ABCDEFGH and it doesn't find it; nevertheless there is one ABCDEF and one BCDEFGH, so it displays them as possible matches.
One possible match with 90% similarity could be used as "did you mean BCDEFGH?"
Dunno, postgres is way out of my league..
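To make the "90% similarity" idea concrete, here is a minimal sketch in Python rather than in the database. It uses the standard library's difflib ratio as the similarity measure, which is only one possible choice; the function name suggest and the cutoff parameter are made up for illustration and are not anything AniDB actually uses.

```python
from difflib import SequenceMatcher

def suggest(query, titles, cutoff=0.9):
    """Return titles whose similarity to the query is at least cutoff, best first.

    Hypothetical helper: the similarity is difflib's ratio, i.e.
    2 * matching characters / total characters of both strings.
    """
    scored = []
    for title in titles:
        ratio = SequenceMatcher(None, query.lower(), title.lower()).ratio()
        if ratio >= cutoff:
            scored.append((ratio, title))
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored]

titles = ["ABCDEF", "BCDEFGH"]
print(suggest("ABCDEFGH", titles))              # ['BCDEFGH']
print(suggest("ABCDEFGH", titles, cutoff=0.8))  # ['BCDEFGH', 'ABCDEF']
```

With this measure BCDEFGH scores about 93% and ABCDEF about 86%, so only the first clears a 90% cutoff and could be offered as "did you mean", while a lower cutoff would list both as possible matches.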
exp wrote: [...]does anyone know if postgres supports some kind of fuzzy text searching?[...]
Well... a quick look at Google and I found something about "fuzzy search" on the PostgreSQL developers' page. Couldn't find anything else of relevance on that page, though.
Interesting that the guy listed as having implemented fuzzy search is (apparently) also one of the main developers of OpenFTS (which gmni implemented in the animereactor forum and which, so I believe, is one of the main reasons it's taking so long).
Maybe you could take a look at it and check if there's anything useful for anidb.
-nich
I suppose a good (and not too costly) way would be to have some algorithm that associates a string with an integer that doesn't move much on small differences. Alas, I don't know any such formula. ^^;
Something simpler, but still somewhat useful, would be to generate one "dumbed down" list of all titles, which eliminates most ambiguities like "k->c", "m->n", etc...
The search terms could then be dumbed down in the same manner and compared to that list with a simple string comparison.
One could either apply a very excessive conversion, so that many results are returned which can then be sorted with a more sophisticated (and time-intensive) method, for example the Levenshtein distance; or one could apply a moderate conversion and simply output all matches.
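A rough sketch of the "dumbed down" comparison in Python, assuming only the two substitutions mentioned above (k->c, m->n); a real rule set would need more, and the names dumb_down, levenshtein and search are invented for this example. Matches are found by plain string comparison of the dumbed-down forms and then sorted by Levenshtein distance to the original query.

```python
def dumb_down(text):
    """Collapse easily confused spellings (hypothetical, deliberately tiny rule set)."""
    text = text.lower()
    for src, dst in (("k", "c"), ("m", "n")):
        text = text.replace(src, dst)
    return text

def levenshtein(a, b):
    """Classic edit distance: insertions, deletions and substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]

def search(query, titles):
    """Titles whose dumbed-down form equals the query's, closest spelling first."""
    key = dumb_down(query)
    hits = [t for t in titles if dumb_down(t) == key]
    return sorted(hits, key=lambda t: levenshtein(query.lower(), t.lower()))

print(search("Lolicon Angel", ["Lolikon Angel", "Akira"]))  # ['Lolikon Angel']
```

In practice the dumbed-down form of every title would be computed once and stored, so that a search only has to dumb down the query and do an exact lookup.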
You could 'enhance' the search by allowing wildcards, which would also solve a problem I run into quite often.
My problem: for instance, you search for "Ghost shell" expecting to find "Ghost in the shell", but apparently a -space- is not seen as an -and- or -or- possibility, or as something that would allow multiple wildcards (in this case " in the " would be the wildcard part). This could be solved easily (if you work with SQL, that is).
Then you could allow a -space- to be a multi-char wildcard (0 or more chars in between) and ? to be a single-char wildcard (0 or 1 chars in between).
THEN (still going :p ) you could (when no search results are returned) replace each char, one by one, with a -?- to make that char the wildcard, and just keep going till you find a result.
Examples:
"Ghost Shell" would give "did you mean: Ghost in the shell?" (So it's like "%Ghost%shell%" (note the extra % there))
"Ra?m??r? Sekitan" would give "did you mean: Raimuiro Sekitan?" (might have misspelled that, you get the point :p )
"Rourouni Kenshin" would give "did you mean: Rurouni Kenshin?"
"Gnundam" would give "did you mean: Mobile Suit Gundam Wing or .. etc.."
Also, the -space- could be taken as an -or- when no search results are found using -space- as an -and-
"Pokemon Advance" would give "did you mean: Pokemon?"
However, this might be -somewhat- time consuming (I don't know how much one full search costs); then again, it could save a lot of searching as well.
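The wildcard scheme above could look roughly like this in Python. This is only a sketch: it uses regular expressions instead of SQL LIKE (LIKE's _ matches exactly one char, not "0 or 1"), the function names are invented, and the title list is made up from the examples in this post, including the possibly misspelled "Raimuiro Sekitan". The -or- fallback for spaces is left out.

```python
import re

def to_regex(query):
    """Translate a query: space -> any run of chars, ? -> 0 or 1 chars."""
    parts = []
    for ch in query.strip():
        if ch == " ":
            parts.append(".*")
        elif ch == "?":
            parts.append(".?")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def find(query, titles):
    """Titles containing the query pattern anywhere (like %...% in SQL)."""
    rx = to_regex(query)
    return [t for t in titles if rx.search(t)]

def did_you_mean(query, titles):
    """Plain search first; if empty, turn one char at a time into a ? wildcard."""
    hits = find(query, titles)
    if hits:
        return hits
    for i, ch in enumerate(query):
        if ch in " ?":
            continue  # already a wildcard
        hits = find(query[:i] + "?" + query[i + 1:], titles)
        if hits:
            return hits
    return []

titles = ["Ghost in the Shell", "Raimuiro Sekitan",
          "Rurouni Kenshin", "Mobile Suit Gundam Wing"]
print(did_you_mean("Ghost Shell", titles))       # ['Ghost in the Shell']
print(did_you_mean("Ra?m??r? Sekitan", titles))  # ['Raimuiro Sekitan']
print(did_you_mean("Rourouni Kenshin", titles))  # ['Rurouni Kenshin']
print(did_you_mean("Gnundam", titles))           # ['Mobile Suit Gundam Wing']
```

Note that the fallback runs one extra search per character of the query in the worst case, which is where the time cost mentioned above would come from.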
Jarudin wrote: You could 'enhance' the search by allowing wildcards, which would also solve a problem I run into quite often. [...]
As a reply to this guy: spaces work in a pretty fun way in AniDB. Try searching for " akira" with a space before, or "akira" without spaces, and you go directly to akira countdown... but when you put "akira " with a space after ^^ it gives you 2 different answers, one of them being Samurai 7 (because it has akira in the name too). Another nice example is ninja scroll:
try " ninja scroll" and it takes you to the first ninja scroll it sees in the database (Basilisk); if you type "ninja scroll" it gives you 5 answers... if you type "ninja scroll" with two spaces you get nothing ^^ and with "ninja scroll " with a space after you get the movies ^^ I don't know if that's something premeditated, but it's kinda fun how you can play with spaces in this database.