Export of AniDB data, not just mylist data?

Locked
ncmaothvez
Posts: 8
Joined: Mon Sep 12, 2005 5:37 pm

Post by ncmaothvez » Sun Jul 30, 2006 2:34 pm

Hi!

Wasn't sure if this question should go under Support or Development; it's related to some of my personal AniDB dev projects, though. Feel free to move it if it's too OT.

The July 29th update threw a monkey wrench in the works for me. I have a few non-public AniDB tools that do offline processing of data from the Advanced Search page. Now that the number of returned search results has been limited to 250, these tools no longer work.

The data I need, once every 24-48h, is a list of all animes in the AniDB database that have at least 10 votes (in other words, all titles that have a calculated rating) and for each of those animes I want:
The (English) title
The aid#
The number of votes
The current rating
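
For illustration, the wanted record and the "at least 10 votes" filter can be sketched as below. This is only a minimal sketch: the class name, field names and sample rows are made up here and are not part of any AniDB interface.

```python
from dataclasses import dataclass

MIN_VOTES = 10  # threshold from the post: at least 10 votes means a calculated rating


@dataclass(frozen=True)
class AnimeRating:
    """One row of the wanted export (hypothetical field names)."""
    aid: int        # the aid#
    title: str      # the (English) title
    votes: int      # the number of votes
    rating: float   # the current rating


def rated_only(records):
    """Keep only the titles that have enough votes for a calculated rating."""
    return [r for r in records if r.votes >= MIN_VOTES]


# Made-up sample rows, purely for illustration.
sample = [
    AnimeRating(aid=1, title="Example A", votes=250, rating=8.12),
    AnimeRating(aid=2, title="Example B", votes=4, rating=0.0),
]
rated_aids = [r.aid for r in rated_only(sample)]
print(rated_aids)
```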

I've looked at AOM but, as far as I can tell, AOM cannot export data from any of the 'Anime browser' sub-tabs and its database is encrypted. I believe I also read somewhere that UDP based clients are not allowed to do a full DB download.

Is it somehow possible to export the above listed data from AniDB? I guess not; it would probably be an invitation to AniDB ripoff attempts. Just want to make sure I'm not missing anything.

/N

Rar
AniDB Staff
Posts: 1471
Joined: Fri Mar 12, 2004 2:41 pm
Location: UK

Post by Rar » Sat Aug 05, 2006 9:09 pm

No, screen scraping is not okay, and your bot will be automatically banned. AniDB provides an interface for automated retrieval of select data. It's well documented on the wiki and in this forum, *and* there's example code in most languages. See http://wiki.anidb.info/w/UDP_API_Definition - if you need a feature that's not there yet, there's a dev page too.

Rar

ncmaothvez
Posts: 8
Joined: Mon Sep 12, 2005 5:37 pm

Post by ncmaothvez » Sun Aug 06, 2006 1:49 pm

The API does look interesting, but I see two problems:

1. It's not possible to retrieve the aid# for the anime with the highest aid#. This means that one has to:

a. Guess the highest AID#.

b. Repeatedly send the ANIME command, incrementing the 'aid' parameter by one each time and caching every aid# that returns a 330 reply, until it can be assumed that data for all animes in the DB has been retrieved.

If the '30 seconds between commands' rule set by the API is followed then it would take 2 to 3 days to load all data. For my application this is OK but I suspect this would put an unwanted load on the server.

2. The note about anti leech protection in the API def says that a full DB download using the API will get you banned.
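
The "2 to 3 days" estimate in point 1 can be checked with quick arithmetic. The sketch below only does the math; the function names are made up, and the implied aid range is inferred from the 30 second rule and the 2 to 3 day figure, not taken from AniDB. It does not send any commands, and point 2 still applies.

```python
SECONDS_PER_COMMAND = 30          # the '30 seconds between commands' rule
SECONDS_PER_DAY = 24 * 60 * 60


def commands_in(days):
    """How many commands fit in `days` at one command per 30 seconds."""
    return days * SECONDS_PER_DAY // SECONDS_PER_COMMAND


def days_needed(highest_aid):
    """Days needed to request every aid from 1 up to `highest_aid`."""
    return highest_aid * SECONDS_PER_COMMAND / SECONDS_PER_DAY


# 2 to 3 days of polling corresponds to this many ANIME commands:
low, high = commands_in(2), commands_in(3)
print(low, high)
```

In other words, the estimate assumes the highest aid# is somewhere in the range the two numbers span.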

Is there really no way to get the data I want? :(


Just out of curiosity: why is screen scraping not okay? Is it for server load reasons? Copyright reasons? I can fully understand that it would be really bad if one made a custom bot that continuously loads pages as fast as possible 24/7. If it's not a bot but instead Internet Explorer (or any other browser) that loads 10 to 20 pages, waiting 30 to 60 seconds between each page load, and then waits 24 hours until the next batch load, then the server load would probably be even less than if a human manually loaded the webpages in the browser.

Besides, what's the difference between having Internet Explorer automatically load and save 20 pages (assuming it's done slowly) and me manually loading the same 20 pages in Internet Explorer and saving them? Just curious.

Regards,
/N

Rar
AniDB Staff
Posts: 1471
Joined: Fri Mar 12, 2004 2:41 pm
Location: UK

Post by Rar » Sun Aug 06, 2006 2:09 pm

You misunderstand. You do not actually want a dump of all titles and ratings every day, that's just a slightly idiotic means to an end. What you want is a... list? of top titles by ranking? Saying what you're actually trying to do would probably help. Ask for a command to get that from the api, then you're doing one request per 24 hours. Screen scraping is bad because it's generally used by people who don't understand what the problem is, when there are better alternatives available.

Rar

ncmaothvez
Posts: 8
Joined: Mon Sep 12, 2005 5:37 pm

Post by ncmaothvez » Sun Aug 06, 2006 2:57 pm

Good point, Rar.

For the sake of completeness, and since I'll be referring to this post from the command request, I'm including parts of a message I sent to Der Idiot, which explains a bit more about the actual application.

No need to reply unless someone wants to add anything.
...
For the past year or so I've been collecting statistics on the animes in the database for my own personal use and amusement. The main reason for doing so has been to see which animes are currently the most 'hot' ones by looking at the trend of the ratings, weighted against the most popular anime, over a 30 day period.

In the past I used the Advanced Search page to get a list, once per day, of all animes in the database with more than 10 votes and then processed and imported the data to an Excel graph. Looking at a graph with all the ratings gives me a much better view (I think) of the ratings, since spam voting and other malicious behaviour can be much more easily seen in a graph than by just looking at a number. Also, it's quite nice to see how the popularity of the animes changes over several months.

Since the Adv. Search page now has a 250 items limit, I can no longer get all the data I want using the old method.
...
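
One way to read the "trend over a 30 day period" idea is sketched below. The message does not say how the weighting against the most popular anime works, so the formula here (rating change over the window, scaled by the title's vote count relative to the most-voted title) is purely an assumption, as are all names and sample numbers.

```python
WINDOW_DAYS = 30  # the 30 day period mentioned in the message


def rating_change(daily_ratings, window=WINDOW_DAYS):
    """Rating change between the first and last sample of the most recent window."""
    recent = daily_ratings[-window:]
    if len(recent) < 2:
        return 0.0  # not enough samples to speak of a trend
    return recent[-1] - recent[0]


def hot_scores(history, votes):
    """Score each title; the weighting is an assumed interpretation.

    history: aid -> list of daily ratings, oldest first
    votes:   aid -> current vote count
    """
    max_votes = max(votes.values())  # the most popular (most-voted) title
    return {
        aid: rating_change(ratings) * votes[aid] / max_votes
        for aid, ratings in history.items()
    }


# Made-up daily samples for two titles.
history = {1: [7.0, 7.2, 7.5], 2: [8.0, 8.0, 7.9]}
votes = {1: 50, 2: 100}
scores = hot_scores(history, votes)
hottest = max(scores, key=scores.get)
print(hottest, scores)
```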

exp
Site Admin
Posts: 2438
Joined: Tue Oct 01, 2002 9:42 pm
Location: Nowhere

Post by exp » Mon Aug 07, 2006 3:40 am

now that helps a lot :o)

actually, what you're doing is an offline implementation of a new anidb feature I've already started work on :P
at some point anidb will be able to display monthly and yearly graphs for the anime statistics: ep, file, user and group count.
i've already added the backend code which takes daily samples of these stats but I'll wait another 1-2 months until enough data has been accumulated to make any meaningful graphs.

i think rather than doing this on your own, it would be better to create a new topic here where you describe in detail what kind of features your current offline solution offers and how an anidb online version may even include some more interesting stuff.
that way all users would benefit from it.

BYe!
EXP

PS: btw. just to underline this, our main problem with automated parsing of the anidb webpages is that even though it may only create a minimal load when you look at your own scripts, anidb unfortunately has tons of ppl who try to grab some data. And if you take it all together it is a serious problem.
another issue is the fact that we, for various reasons, believe that making the information stored in the anidb database publicly available is not beneficial (refer to some of my older posts for reasons).

the idea you outlined above of requesting each anime separately via the UDP api would definitely result in an automated ban (just as a warning :P)

exp
Site Admin
Posts: 2438
Joined: Tue Oct 01, 2002 9:42 pm
Location: Nowhere

Post by exp » Mon Aug 07, 2006 3:44 am

oh btw. the fact that the backend has already started the data collection also means that you should speak up soon if you believe that any other stats beyond:
- eps added for the anime
- files added for the anime
- groups subbing the anime
- users with at least one file for the anime

are needed.

BYe!
EXP

ncmaothvez
Posts: 8
Joined: Mon Sep 12, 2005 5:37 pm

Post by ncmaothvez » Mon Aug 07, 2006 5:01 pm

Exp, looks like you've been reading my mind :wink: I've actually been toying with the idea of posting a request for this feature.

OK, I'm dropping this thread and I'll continue in a new one here
/N
