re-using kowai

titoum
Posts: 29
Joined: Mon Sep 13, 2004 10:48 am
Location: Plop

Post by titoum »

hi

i'm currently writing an aom client in java, based on the udp api. login and retrieving information are going smoothly, but i was wondering whether it's possible to use the *.data file provided by aom.

the other way i'm looking at is to import my mylist and ask other big users to provide me their lists, so that i have a big base anime database to start with, and then grow it incrementally through the users' searches.

tks in advance
Der Idiot
AniDB Staff
Posts: 1227
Joined: Fri Mar 21, 2003 10:19 am

Post by Der Idiot »

no it's not possible. aom dumps are encrypted
PetriW
AniDB Staff
Posts: 1522
Joined: Sat May 24, 2003 2:34 pm

Post by PetriW »

In order to use the information in aom you need to develop a TCP API application.
titoum
Posts: 29
Joined: Mon Sep 13, 2004 10:48 am
Location: Plop

Post by titoum »

tks for your quick answer PetriW & Der Idiot

so i was wondering...

would it be possible to have just an extract with the anime info, without eps, so just anime id | name like we get through the udp api ?

then, when a user searches in it, the client would use the udp connection to get more info, and the base could also be updated incrementally by bringing the db up to the latest anime id ?
Rar
AniDB Staff
Posts: 1471
Joined: Fri Mar 12, 2004 2:41 pm
Location: UK

Post by Rar »

What you're envisioning is essentially not practical. At any rate, step #1 is getting a working client app that uses the UDP interface, so concentrate on that.

Rar
epoximator
AniDB Staff
Posts: 379
Joined: Sun Nov 07, 2004 11:05 am

Post by epoximator »

yeah, join the hall of shame
titoum
Posts: 29
Joined: Mon Sep 13, 2004 10:48 am
Location: Plop

Post by titoum »

Rar wrote:What you're envisioning is essentially not practical. At any rate, step #1 is getting a working client app that uses the UDP interface, so concentrate on that.

Rar
#1 yeah, it's already connecting and retrieving mylist stats and anime info...

i was just thinking further ahead. anyway, i hope to have it finished this week or next, depending on my workload :)

#2 why not practical ?
anidb has more than 4000 anime, so if you start with only their names & ids, your client only has to fetch the data the user is actually interested in (eps, files..).

for example :

you have the anime bleach; if the user looks at it => retrieve the available episodes => if he picks one ep in particular => get its files.
and store all of those to avoid querying them again.

in the end that's less information to fetch and store than taking it all, no ?

it's just a thought i had, and i didn't look deeply to see if it's possible.
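
to make the idea concrete, here is a rough sketch in python (not the java client itself). everything in it is an assumption for illustration: `LazyCatalog` is a made-up name, and `fetch_episodes` / `fetch_files` are hypothetical stand-ins for whatever udp requests the client would really send.

```python
from typing import Callable, Dict, List


class LazyCatalog:
    """Starts from a plain {anime id: name} map and fetches details on demand."""

    def __init__(
        self,
        names: Dict[int, str],
        fetch_episodes: Callable[[int], List[int]],
        fetch_files: Callable[[int], List[str]],
    ) -> None:
        self.names = dict(names)
        # Hypothetical callables; a real client would issue UDP requests here.
        self._fetch_episodes = fetch_episodes
        self._fetch_files = fetch_files
        self._episodes: Dict[int, List[int]] = {}
        self._files: Dict[int, List[str]] = {}

    def episodes(self, aid: int) -> List[int]:
        # Only ask the server the first time the user looks at this anime.
        if aid not in self._episodes:
            self._episodes[aid] = self._fetch_episodes(aid)
        return self._episodes[aid]

    def files(self, eid: int) -> List[str]:
        # Same idea one level down: files are fetched per episode, once.
        if eid not in self._files:
            self._files[eid] = self._fetch_files(eid)
        return self._files[eid]
```

the point is only that each anime or episode costs one request the first time the user opens it and nothing after that.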
exp
Site Admin
Posts: 2438
Joined: Tue Oct 01, 2002 9:42 pm
Location: Nowhere

Post by exp »

well, the problem here is quite simple: the UDP API is not meant for things like this. this is the domain of the TCP API.
It is of course possible to query some particular data via the UDP API. However, if you were to write something like AoM, where you want users to basically have all the data anidb offers at their fingertips, then the UDP API just won't do.
The key issue here is server load. Having every client query only those few bits of data which it requires at the moment is good in the short run. However, once the number of users increases, you get to a point where it becomes impossible to handle all those small requests.
In such a scenario, it is a lot less work for the server to just throw a large binary database dump at the client and be done with it, instead of giving it out bit by bit. Which is why we have the TCP API in the first place.

However, the TCP API is not public and we usually don't hand TCP API access out to any newcomers, so you should start your work with the UDP API first.

BYe!
EXP
titoum
Posts: 29
Joined: Mon Sep 13, 2004 10:48 am
Location: Plop

Post by titoum »

oki tks for your complete answer :)

gonna put it in the wiki when done
MostAwesomeDude
Posts: 38
Joined: Fri Jun 01, 2007 11:02 am

Post by MostAwesomeDude »

exp wrote:However, the TCP API is not public and we usually don't hand TCP API access out to any newcomers, so you should start your work with the UDP API first.
At risk of being pedantic (something I am constantly accused of), there is a public TCP API. Of course, you probably would not call it that, but...

Do not forget that all information exposed through AOM is also available through the webserver, and that parsing HTML is definitely a viable alternative to attempting to break the private TCP API, even if you think it's tedious. cURLing AniDB for data and then parsing it into a displayable format is no worse than stripping, unescaping, and formatting a UDP packet, and has the added advantage of not being a "bannable offense;" that is, retrieving a mylist through HTTP results in less traffic than through UDP, even if the user fits the best-case scenario (in this case, if he still has all of the original files registered in his mylist.) (Proof of this is left as an exercise to the reader.)
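
For comparison, the UDP side of that trade-off might look something like the sketch below. It rests on assumptions that are not spelled out in this thread: that a reply is a status line followed by data lines, that fields are separated by `|` (as in the "anime id | name" example earlier), and that line breaks inside a field arrive as `<br />`. `parse_reply` and `unescape_field` are made-up names, and the sample reply string is invented for illustration.

```python
from typing import List, Tuple


def unescape_field(field: str) -> str:
    # Assumption: line breaks inside a field are sent as "<br />".
    return field.replace("<br />", "\n")


def parse_reply(raw: str) -> Tuple[int, str, List[List[str]]]:
    """Split a reply into (return code, message, rows of fields)."""
    header, _, body = raw.partition("\n")
    code_text, _, message = header.partition(" ")
    rows = [
        [unescape_field(field) for field in line.split("|")]
        for line in body.splitlines()
        if line  # skip empty trailing lines
    ]
    return int(code_text), message, rows


# Invented sample reply, just to exercise the parser.
code, message, rows = parse_reply(
    "230 ANIME\n1|Example Title|line one<br />line two\n"
)
```

Stripping, unescaping and splitting is about all there is to it, which is the sense in which HTML parsing is "no worse", just more tedious.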

(Caveat programmer: There is no reason why repeated HTTP requests might not be blocked. The idea behind using HTTP in this case is to reduce the total number of requests needed to assemble a complete image. So, you can't download the entire DB, and it will still be easier to send individual ANIME requests instead of show=anime&aid=xxxx requests.)

You should also try and construct a static cache, if you are thinking of replacing AOM. (I'm going off my own "replace-AOM" notes here, can't ya tell?) There are many pages which simply do not change on AniDB, even if they are dynamically generated. Even mylists can be fully cached with state and no expiry, thanks to the format of the MYLIST command. Start with a gestalt mylist, and then keep it updated as you modify it. Add in a user-controlled option to invalidate a node or entry in the cache, and you're set.
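
A minimal sketch of such a cache, assuming entries never expire on their own and are only dropped when the user asks. `StaticCache`, `seed` and the `loader` callable are hypothetical names; `loader` stands in for whatever request actually retrieves one entry.

```python
from typing import Any, Callable, Dict, Hashable, Optional


class StaticCache:
    """Cache with no expiry: entries stay until explicitly invalidated."""

    def __init__(
        self,
        loader: Callable[[Hashable], Any],
        seed: Optional[Dict[Hashable, Any]] = None,
    ) -> None:
        self._loader = loader
        # Start from a full snapshot (a "gestalt") if one is available.
        self._entries: Dict[Hashable, Any] = dict(seed or {})

    def get(self, key: Hashable) -> Any:
        # Load on first use only; later calls are served locally.
        if key not in self._entries:
            self._entries[key] = self._loader(key)
        return self._entries[key]

    def put(self, key: Hashable, value: Any) -> None:
        """Record a local modification so the cache stays in step."""
        self._entries[key] = value

    def invalidate(self, key: Hashable) -> None:
        """User-controlled: forget one entry so the next get() reloads it."""
        self._entries.pop(key, None)
```

Seeding from a snapshot means a freshly imported mylist costs no requests at all until the user invalidates something.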

One final thing: Consider the ability to import and cache CSV mylist exports.
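
Something along these lines would do for the import, using Python's standard `csv` module. The column names (`fid`, `aid`, `state`) and the sample rows are pure guesses for illustration; the real export layout would have to be checked against an actual file.

```python
import csv
import io
from typing import Dict, TextIO


def import_mylist_csv(stream: TextIO, key_column: str = "fid") -> Dict[str, Dict[str, str]]:
    """Read a CSV export with a header row into {key: row} form."""
    entries: Dict[str, Dict[str, str]] = {}
    for row in csv.DictReader(stream):
        # key_column is an assumed name; adjust it to the real header.
        entries[row[key_column]] = dict(row)
    return entries


# Invented sample data standing in for a real export file.
sample = io.StringIO("fid,aid,state\n10,1,on hdd\n11,1,on cd\n")
entries = import_mylist_csv(sample)
```

The resulting dictionary can be used directly as the starting contents of the cache.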

(Sorry for the myriad parenthetical asides. This is how I sound in real life, too. Honest. Also, I'm sorry that you have to do this in Java -- since I switched to Python a few months back, I can't stand anything else!)

~ C.
titoum
Posts: 29
Joined: Mon Sep 13, 2004 10:48 am
Location: Plop

Post by titoum »

hi,

thks for your long answer :)

my app is getting bigger and bigger ^^

i've done a cache for the mylist, don't worry about that, and it can also be completed with mylists other than your own.

with threads and stuff to make it as quick as possible.

i'm just enjoying swt @ the moment with row coloring and also doing my work lol

for the add function, i will first just parse the mylist cache to see if the file is already known; if not, try to add => if ok => fetch the info + add to the cache and serialize it on the hdd
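
roughly like this, sketched in python rather than java. all the names are made up: `add_remote` and `fetch_info` are hypothetical stand-ins for the real udp calls, and json is just one possible way to serialize the cache.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict


def add_file(
    cache: Dict[str, Any],
    file_key: str,
    add_remote: Callable[[str], bool],
    fetch_info: Callable[[str], Any],
    cache_path: str,
) -> bool:
    """Return True only if the file was newly added and cached."""
    if file_key in cache:
        return False  # already known, no request needed
    if not add_remote(file_key):
        return False  # the server refused the add
    cache[file_key] = fetch_info(file_key)
    # Serialize the whole cache to disk after every successful add.
    with open(cache_path, "w", encoding="utf-8") as handle:
        json.dump(cache, handle)
    return True


# Usage with stand-in callables instead of real network calls.
cache_file = os.path.join(tempfile.mkdtemp(), "mylist_cache.json")
mylist_cache: Dict[str, Any] = {}
added = add_file(mylist_cache, "file-1", lambda key: True, lambda key: {"aid": 1}, cache_file)
```

rewriting the whole file on each add is the simplest thing that works; a bigger cache would probably want something smarter.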

once this is done, maybe i will throw out a first jar :p

would it be a problem if it's using webstart ? because including the swt stuff in the jar seems annoying.
fahrenheit
AniDB Staff
Posts: 438
Joined: Thu Apr 08, 2004 1:43 am
Location: Portugal

Post by fahrenheit »

MostAwesomeDude wrote:Do not forget that all information exposed through AOM is also available through the webserver, and that parsing HTML is definitely a viable alternative to attempting to break the private TCP API, even if you think it's tedious.
You know, the reason there is a udp api is to prevent stuff like what you are suggesting.
Yeah, you could curl the entire site for the data you want, but you only need to think a bit to figure out why that isn't the best approach: the amount of unnecessary data you are requesting is just too big. For example, if you need some data about an anime, you have to get all the html code, then parse said html to extract the data you need, and only then do you have it. Not practical.

On a side note, you can get banned if you try to request more than X from the http server, so instead of doing this by brute force it's much sweeter to do it the gentle way, like webaom does, through the udp api.

Also, if you want something that isn't provided by the udp api, it's usually faster to ask epox, and he will do it in way less time than it takes for something to happen on the http server (days vs months/years).

have fun
MostAwesomeDude
Posts: 38
Joined: Fri Jun 01, 2007 11:02 am

Post by MostAwesomeDude »

cURL or wget the entire site? Oh, God no. That would be evil. I was thinking more along the lines of certain useful pages for which there is no analog in the UDP API (and for which there will never be, due to server load.) Pages like http://anidb.info/perl-bin/animedb.pl?s ... t&uid=xxxx, for example.

I mentioned gestalts -- this is the best way to do things. AOM does this already, with the kowai data dumps; however, AOM also has a leg up, since there is a way to incrementally retrieve massive amounts of data through the TCP API. (Also, kowai, for the purposes of discussion, will be off-limits to UDP API devs.) So, absent a gestalt, the only way to proceed with a UDP version of AOM is to be massively evil to the server, which is not cool.

Fortunately, we're not completely without tools. AniDB is, like any database, massively static, with almost all changes being addition of data, and next to no removals. Thus, incremental retrieval of data from the main database will be fine, as long as we cache all of it and never invalidate the cache. Cache flushes should be incremental (because the alternative is to invalidate an entire table at one time, and that's not good when your table is several thousand UDP requests' worth of data!)
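
As a sketch of what "incremental and never invalidated" could mean in practice: the function below fills in missing anime ids a few at a time and leaves everything already cached alone. The names `update_incrementally`, `fetch_anime`, `latest_id` and `budget` are all hypothetical, and capping each run with a request budget is my own assumption about how to stay gentle on the server.

```python
from typing import Any, Callable, Dict, List


def update_incrementally(
    cache: Dict[int, Any],
    fetch_anime: Callable[[int], Any],
    latest_id: int,
    budget: int,
) -> List[int]:
    """Fetch at most `budget` missing ids, lowest first; return the ids fetched."""
    fetched: List[int] = []
    for aid in range(1, latest_id + 1):
        if len(fetched) >= budget:
            break  # stop early; the next run continues where this one ended
        if aid in cache:
            continue  # cached entries are never re-requested
        cache[aid] = fetch_anime(aid)
        fetched.append(aid)
    return fetched
```

Calling it repeatedly with a small budget spreads those several thousand requests over many sessions instead of hitting the server all at once.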

The mylist, however, is sliiightly different. First, it's small. My mylist is about 2% of AniDB, and spans three pages in a web browser. This is a cURLable amount. All of the data in a mylist is self-contained -- you do not need to issue an ANIME or EPISODE for each item in the list. (You might end up doing that anyway, if the user requests it, but try to stay away from that kind of thinking.) More importantly, the mylist can change quite easily. It should be very simple to invalidate the cached mylist, and load another gestalt, or to use a new gestalt to update the cached mylist (that's what gestalts are designed for, after all...)
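
Using a new gestalt to update the cached mylist can be as simple as a dictionary diff. This is only a sketch: `apply_gestalt` is a made-up name, and representing both the cache and the gestalt as `{entry id: state}` dictionaries is an assumption about how the data would be held.

```python
from typing import Any, Dict, List, Tuple


def apply_gestalt(
    cached: Dict[int, Any], gestalt: Dict[int, Any]
) -> Tuple[List[int], List[int], List[int]]:
    """Make `cached` match `gestalt` in place; return (added, changed, removed)."""
    # Work out the differences before touching the cache.
    added = sorted(k for k in gestalt if k not in cached)
    changed = sorted(k for k in gestalt if k in cached and cached[k] != gestalt[k])
    removed = sorted(k for k in cached if k not in gestalt)
    for key in removed:
        del cached[key]
    cached.update(gestalt)
    return added, changed, removed
```

Returning the three lists lets the UI refresh only the rows that actually changed.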

(A mylist gestalt could also be obtained with the "mylist export" feature of the database, which can create a CSV. Whether or not this option is more taxing to the server [it probably is, XD] doesn't matter; it's just important that we support that option, since it is a valid snapshot of the mylist.)

You are right about unnecessary data. Parsing HTML is a chore that takes a day to write and two seconds to execute. It's slow, arduous work. Unfortunately, there's no other option, since the only good, valid gestalts that could power an AOM replacement are the kowai data dumps, which are not accessible. (Well, actually, they're not inaccessible, but the time which it would take to reverse the TCP API and the code that controls them and reimplement it is much greater than the time that it would take to write out gestalt importing and caching code. I should know; I've already written code to import a mylist from an HTTP or CSV dump!) The other option is to ask for an implementation in the UDP API, but it will not happen, since the UDP API was not designed to handle large amounts of data. *cough*1400bytemtu*cough* On top of that, implementing retrieval of gestalts (a TCP thing) in the UDP API defeats the purpose of both APIs.

~ C.
titoum
Posts: 29
Joined: Mon Sep 13, 2004 10:48 am
Location: Plop

Post by titoum »

i have about 10% of anidb in mylist...

you can parse the *.js for your list, maybe that's way faster than parsing html ?

like fahrenheit said, why curl everything... once you have your list, and every 5ms you add a file + add it to your cache, do you need more?
MostAwesomeDude
Posts: 38
Joined: Fri Jun 01, 2007 11:02 am

Post by MostAwesomeDude »

Depends on how much of AniDB you are trying to emulate. I'm not suggesting downloading the entire database -- the only viable way to do that is to break kowai.