Corbin's crazy ideas for OpenAniDB


MostAwesomeDude
Posts: 38
Joined: Fri Jun 01, 2007 11:02 am

Corbin's crazy ideas for OpenAniDB

Post by MostAwesomeDude »

The object of the game is simple. I'm bored, and bored me usually writes strange code that probably violates rules, ordinances, or FCC guidelines. Thus, I'm gonna write out ideas, and you guys are gonna shoot 'em down. (Or approve 'em, but that's unlikely.)

Who I'm looking for: Staffers, since they know the rules; and anybody who's tried out my client. (Right now, that's me and like four staffers.)

Simple ideas:
- Regexp/substitutions for file save path. Feasible, simple, and not done mostly because of laziness. Done.

- Add version numbers to filenames and display data. Not quite straightforward, but possible. Would be one night's work.

- Windows binaries. I finally got my hands on a Windows machine, yay? But, nobody's asked for binaries. If you want 'em, I'll make 'em! Done.

More complex:
- Targeted notifies. Will require writing a notify handling thread. Would only grab notifies related to files in the cache.

- Implement producer list. I don't see a need for it, but then again I'm just one person. Would require work equal to when groups were implemented; that is, three hours. Meh.

- Per-user mylists. In theory, somebody out there needs two mylists for one local account. This would solve that problem. However, in practice, nobody's asked for it.

- More hashes. TTH, MD5, SHA1, CRC32. CRC32 would be very useful to creqers, and could be used offline with a regexp that pulls a release group's CRC32 out from the filename. The others have been discussed at one time or another as being used to uniquely identify files in AniDB proper, and so I should probably implement them.
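The offline CRC32 check mentioned in the last item could look roughly like the sketch below. It assumes the common (but not universal) convention of putting the CRC32 as eight hex digits in square brackets near the end of the filename; the pattern and the helper names are illustrative and are not taken from OpenAniDB.

```python
import re
import zlib

# Assumption: the CRC32 appears as exactly eight hex digits inside square
# brackets, e.g. "[Group] Show - 01 [1A2B3C4D].mkv".
CRC_IN_NAME = re.compile(r"\[([0-9A-Fa-f]{8})\]")


def crc_from_filename(name):
    """Return the last bracketed 8-hex-digit tag in name (uppercase), or None."""
    matches = CRC_IN_NAME.findall(name)
    return matches[-1].upper() if matches else None


def crc_of_bytes(data):
    """CRC32 of data as eight uppercase hex digits."""
    return "%08X" % (zlib.crc32(data) & 0xFFFFFFFF)


def name_matches_content(name, data):
    """True only if the filename carries a CRC32 tag and it matches data."""
    expected = crc_from_filename(name)
    return expected is not None and expected == crc_of_bytes(data)
```

Taking the last match is a guess at what is least likely to collide with a group tag that happens to be eight hex characters long.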

New territory (no known UDP client does this, so feel free to veto these with "what the **** is your major malfunction, son?"):
- Per-file hashing, filing, and renaming settings, via popup. This would be, as far as I know, a first; when each file is up for renaming and addition to the mylist, a popup can be used to customize settings for each file. This is one of those crazy 2AM ideas. Trashed; bad UI design.

- Keep track of files in the local system. This would be a true endeavor, but the benefit of having every previously hashed file on record would be useful for mylist double-checking and backup list generation, for example.

- All notifies. This is what I wanted the client to be, at one point. The client becomes aware of all data passing through it, and eventually acquires a sizable portion of the database, sort of like an incremental kowai. Requires a notify handling thread. Trashed; violates leeching rules.

- Avdump. Do I need source? Nah. It would be sorta fun to write, and I'm tired of writing so much GUI stuff anyway. Which, of course, leads to...

- Integrated creqs. This was just something that came into my head from nowhere, but a bit of thinking told me that it's not impossible, which is the important thing.

- Replace AOM, lawl. Just kidding, just kidding! Not possible, since I have neither TCP permission nor kowai decryption nor dev blessing nor anything else required for that. I'm leaving this here to remind me not to attempt it. Trashed; violates kowai rules.

- Support for other online anime databases. Prior to a weeaboo on 4chan mentioning this, I had no idea that there were other places, but I suppose that eventual support for other places may be in the distant future... Trashed; needs more information.

I'm falling asleep, but this should be more than enough craziness for one post. Of course, if you've got more suggestions that out-crazy me, then send 'em in!

~ C.
Last edited by MostAwesomeDude on Sat Aug 25, 2007 11:16 pm, edited 1 time in total.
Rar
AniDB Staff
Posts: 1471
Joined: Fri Mar 12, 2004 2:41 pm
Location: UK

Re: Corbin's crazy ideas for OpenAniDB

Post by Rar »

MostAwesomeDude wrote: - Targeted notifies. Will require writing a notify handling thread. Would only grab notifies related to files in the cache.
Notifies are already 'targeted' to some degree by the selections users make when asking for notifications. Not sure what additional filtering would be useful without looking into it - but a decent notification interface, including that-second-desktop-popups was what the UDP interface was originally for.
MostAwesomeDude wrote: - Implement producer list. I don't see a need for it, but then again I'm just one person. Would require work equal to when groups were implemented; that is, three hours. Meh.
I am probably the primary user of this. :D
MostAwesomeDude wrote: - Per-user mylists. In theory, somebody out there needs two mylists for one local account. This would solve that problem. However, in practice, nobody's asked for it.
A few do, mostly due to pron-shame or something stupid. In more demand is better operations between the mylists of different users.
MostAwesomeDude wrote: - More hashes. TTH, MD5, SHA1, CRC32. CRC32 would be very useful to creqers, and could be used offline with a regexp that pulls a release group's CRC32 out from the filename. The others have been discussed at one time or another as being used to uniquely identify files in AniDB proper, and so I should probably implement them.
MD5 and SHA1 *do* uniquely identify files, and are used in some places *cough*winny*cough*. 2.5 has hashlib, I use pycrypto.
MostAwesomeDude wrote: - Per-file hashing, filing, and renaming settings, via popup. This would be, as far as I know, a first; when each file is up for renaming and addition to the mylist, a popup can be used to customize settings for each file. This is one of those crazy 2AM ideas.
Blocking-popup? Sounds annoying, software that needs hand-holding.
MostAwesomeDude wrote: - Keep track of files in the local system. This would be a true endeavor, but the benefit of having every previously hashed file on record would be useful for mylist double-checking and backup list generation, for example.
Less epic if you can persuade people to move/rename through the app; otherwise it'd require re-hashing (though PetriW did look at using some NTFS annotation feature in the past to avoid that).
MostAwesomeDude wrote: - All notifies. This is what I wanted the client to be, at one point. The client becomes aware of all data passing through it, and eventually acquires a sizable portion of the database, sort of like an incremental kowai. Requires a notify handling thread.
I don't have room for a sizable portion of anidb on my computer... (joke, clearly you only want one mylist locally). Anyway, we discussed why I don't think this is practical in a previous thread.
MostAwesomeDude wrote: - Avdump. Do I need source? Nah. It would be sorta fun to write, and I'm tired of writing so much GUI stuff anyway. Which, of course, leads to...
It's GPL. Epoxi somehow took various portable things and made them un-portable, but that's fixable if you're a C-wrangler. At any rate, you can just use subprocess and pipe to and from.
MostAwesomeDude wrote: - Integrated creqs. This was just something that came into my head from nowhere, but a bit of thinking told me that it's not impossible, which is the important thing.
Wouldn't hurt, if done well.
MostAwesomeDude wrote: - Replace AOM, lawl. Just kidding, just kidding! Not possible, since I have neither TCP permission nor kowai decryption nor dev blessing nor anything else required for that. I'm leaving this here to remind me not to attempt it.
Depends what you mean by 'replace'. Of the things I use AOM for, you (and others) have done the 80% which is hashing files and adding them to mylist. Other tasks, like db-searching, are feasible by other means as well.
MostAwesomeDude wrote: - Support for other online anime databases. Prior to a weeaboo on 4chan mentioning this, I had no idea that there were other places, but I suppose that eventual support for other places may be in the distant future...
Probably a lot of work for little gain, unless you're thinking of stuff outside the ones we already link.

Rar
MostAwesomeDude
Posts: 38
Joined: Fri Jun 01, 2007 11:02 am

Re: Rar

Post by MostAwesomeDude »

Two days later, after some sleep...
Rar wrote:
MostAwesomeDude wrote: - Targeted notifies. Will require writing a notify handling thread. Would only grab notifies related to files in the cache.
Notifies are already 'targeted' to some degree by the selections users make when asking for notifications. Not sure what additional filtering would be useful without looking into it - but a decent notification interface, including that-second-desktop-popups was what the UDP interface was originally for.
Notifies, I think, wouldn't hurt a lightweight client like mine.
Rar wrote:
MostAwesomeDude wrote: - Implement producer list. I don't see a need for it, but then again I'm just one person. Would require work equal to when groups were implemented; that is, three hours. Meh.
I am probably the primary user of this. :D
I should do this, if for nothing else besides completeness.
Rar wrote:
MostAwesomeDude wrote: - Per-user mylists. In theory, somebody out there needs two mylists for one local account. This would solve that problem. However, in practice, nobody's asked for it.
A few do, mostly due to pron-shame or something stupid. In more demand is better operations between the mylists of different users.
Agreed, although I wonder how much of that should be client-side and how much should be server-side. (Probably all client-side, since you have to auth as a user before you can have that user's mylist data.)
Rar wrote:
MostAwesomeDude wrote: - More hashes. TTH, MD5, SHA1, CRC32. CRC32 would be very useful to creqers, and could be used offline with a regexp that pulls a release group's CRC32 out from the filename. The others have been discussed at one time or another as being used to uniquely identify files in AniDB proper, and so I should probably implement them.
MD5 and SHA1 *do* uniquely identify files, and are used in some places *cough*winny*cough*. 2.5 has hashlib, I use pycrypto.
So fields 11 and 12 in fcode do uniquely identify? Sweet! That will make PyCrypto unneeded for hashing, since MD5 is gobs faster. Even without hashlib, there is still built-in MD5 and SHA1. I authored the Python version of TTH for SHA1, as well. That only leaves CRC32.
Rar wrote:
MostAwesomeDude wrote: - Per-file hashing, filing, and renaming settings, via popup. This would be, as far as I know, a first; when each file is up for renaming and addition to the mylist, a popup can be used to customize settings for each file. This is one of those crazy 2AM ideas.
Blocking-popup? Sounds annoying, software that needs hand-holding.
Agreed. My ideas at 2AM suck. *grin*
Rar wrote:
MostAwesomeDude wrote: - Keep track of files in the local system. This would be a true endeavor, but the benefit of having every previously hashed file on record would be useful for mylist double-checking and backup list generation, for example.
Less epic if you can persuade people to move/rename through the app; otherwise it'd require re-hashing (though PetriW did look at using some NTFS annotation feature in the past to avoid that).
Alternate Data Streams, probably. There's also metadata in ReiserFS, but nobody except ricers use it. ADS is fun, but not exactly portable. However, you could always use file sizes to identify a set of mylist possibilities. I remember talking about this kind of thing way back when I started, and this should definitely be backburnered pending better algorithms for file detection and sorting.
Rar wrote:
MostAwesomeDude wrote: - All notifies. This is what I wanted the client to be, at one point. The client becomes aware of all data passing through it, and eventually acquires a sizable portion of the database, sort of like an incremental kowai. Requires a notify handling thread.
I don't have room for a sizable portion of anidb on my computer... (joke, clearly you only want one mylist locally). Anyway, we discussed why I don't think this is practical in a previous thread.
I really wrote this? It's like DBZ. "Oh, no, Goku, it's a new challenger that has all of our abilities combined into one super fighter, and his power level is greater than all of ours combined!" Kowai would be a fun thing to have, but trying to build it incrementally is stupid for reasons obvious. Maybe kowai via BitTorrent... Just kidding. Stupid idea, gone.
Rar wrote:
MostAwesomeDude wrote: - Avdump. Do I need source? Nah. It would be sorta fun to write, and I'm tired of writing so much GUI stuff anyway. Which, of course, leads to...
It's GPL. Epoxi somehow took various portable things and made them un-portable, but that's fixable if you're a C-wrangler. At any rate, you can just use subprocess and pipe to and from.
I love C. It's so procedural and directly modifying memory, I just love it. Seriously, though, I would be modifying avdump if I were to use it, since it still has some deficiencies. Either way, it leads to...
Rar wrote:
MostAwesomeDude wrote: - Integrated creqs. This was just something that came into my head from nowhere, but a bit of thinking told me that it's not impossible, which is the important thing.
Wouldn't hurt, if done well.
Huh, this one actually makes sense. This is a good idea.
Rar wrote:
MostAwesomeDude wrote: - Replace AOM, lawl. Just kidding, just kidding! Not possible, since I have neither TCP permission nor kowai decryption nor dev blessing nor anything else required for that. I'm leaving this here to remind me not to attempt it.
Depends what you mean by 'replace'. Of the things I use AOM for, you (and others) have done the 80% which is hashing files and adding them to mylist. Other tasks, like db-searching, are feasible by other means as well.
Again, this goes back to kowai not being open, but after seriously thinking about it, there's no reason to need kowai, anyway. I mean, stubbed tables for keyword searching would be fun, but kowai doesn't have those; they would have to be generated. The thing that I want from AOM is a full mylist table. Minimal-csv is not sufficient; I need to write a full-csv template that will give all of the needed info for a mylist import.
Rar wrote:
MostAwesomeDude wrote: - Support for other online anime databases. Prior to a weeaboo on 4chan mentioning this, I had no idea that there were other places, but I suppose that eventual support for other places may be in the distant future...
Probably a lot of work for little gain, unless you're thinking of stuff outside the ones we already link.
I haven't done any more reading into this, nor do I intend to. /b/ wouldn't know what to do with it anyway, and I have had exactly zero requests for this. (To be fair, I don't get very many requests at all!) So, no on this unless somebody has a specific proposal.
Rar wrote:Rar
Arigatou gozaimasu for your input.
~ C.
Rar
AniDB Staff
Posts: 1471
Joined: Fri Mar 12, 2004 2:41 pm
Location: UK

Re: Rar

Post by Rar »

MostAwesomeDude wrote:
Rar wrote:MD5 and SHA1 *do* uniquely identify files, and are used in some places *cough*winny*cough*. 2.5 has hashlib, I use pycrypto.
So fields 11 and 12 in fcode do uniquely identify? Sweet! That will make PyCrypto unneeded for hashing, since MD5 is gobs faster. Even without hashlib, there is still built-in MD5 and SHA1. I authored the Python version of TTH for SHA1, as well. That only leaves CRC32.
Brief comment while I read the rest:
from zlib import crc32
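
Combining that with hashlib, all three checksums can be computed in a single read of the file. This is a minimal sketch that assumes Python 2.5 or later for hashlib; the function name and the returned dictionary layout are made up for illustration. TTH is left out because the standard library has no Tiger implementation.

```python
import hashlib
import zlib


def hash_file(path, chunk_size=1 << 20):
    """Read path once and return its size, CRC32, MD5 and SHA1 (hex strings)."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    crc = 0
    size = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            md5.update(chunk)
            sha1.update(chunk)
            # zlib.crc32 accepts the running value as its second argument.
            crc = zlib.crc32(chunk, crc)
            size += len(chunk)
    return {
        "size": size,
        # Mask so the result is an unsigned 32-bit value on every Python version.
        "crc32": "%08x" % (crc & 0xFFFFFFFF),
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
    }
```

Reading in chunks keeps memory use flat for large video files, and feeding every hash from the same chunk means the disk is only read once.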

Rar
Der Idiot
AniDB Staff
Posts: 1227
Joined: Fri Mar 21, 2003 10:19 am

Re: Corbin's crazy ideas for OpenAniDB

Post by Der Idiot »

MostAwesomeDude wrote: - Avdump. Do I need source? Nah. It would be sorta fun to write, and I'm tired of writing so much GUI stuff anyway. Which, of course, leads to...
Rar wrote: It's GPL. Epoxi somehow took various portable things and made them un-portable, but that's fixable if you're a C-wrangler. At any rate, you can just use subprocess and pipe to and from.
which doesn't work too well for unicode characters in filenames, as my gui for avdump shows. on the other hand i sucking fuck.

on another note the tcp api isn't completely closed. those who want access to it just have to show enough effort to get access granted. we got 2 other people working on some java tcp client atm.
PetriW
AniDB Staff
Posts: 1522
Joined: Sat May 24, 2003 2:34 pm

Post by PetriW »

Avdump has a call to open files by file handle if I remember correctly, this does not have any unicode file name issues. However, it would of course require another program to open the file in question then call avdump dll with the handle.

And yes, the tcp api isn't closed, if you show a good enough skill you could be allowed to access it. ;) The problem of course is that you have to make your client somewhat hard to reverse engineer and keep the data files encrypted... which increases the workload significantly and carries a significant hit on performance.


ADS works pretty good, most people do have access to it through windows being kind of popular. If ADS isn't available you could use the eMule way of keeping track of files: filename + filesize. Not safe but may be better than forcing a rehash all the time.
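
The filename + filesize fallback can be sketched as a small cache that is consulted before rehashing. The class and method names here are hypothetical, and as noted above the key is not safe: two different files with the same name and size would be confused.

```python
import os


class HashCache:
    """Remember digests by (base filename, size in bytes) to avoid rehashing."""

    def __init__(self):
        self._known = {}

    @staticmethod
    def _key(path):
        # Only the base name is used, so a file moved to another directory
        # is still recognised; a renamed file is not.
        return (os.path.basename(path), os.path.getsize(path))

    def remember(self, path, digest):
        """Store the digest computed for the file at path."""
        self._known[self._key(path)] = digest

    def lookup(self, path):
        """Return the stored digest for a matching name and size, or None."""
        return self._known.get(self._key(path))
```

A caller would try `lookup()` first and fall back to a full hash, followed by `remember()`, only on a miss.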
MostAwesomeDude
Posts: 38
Joined: Fri Jun 01, 2007 11:02 am

Re: Corbin's crazy ideas for OpenAniDB

Post by MostAwesomeDude »

Der Idiot wrote:
MostAwesomeDude wrote: - Avdump. Do I need source? Nah. It would be sorta fun to write, and I'm tired of writing so much GUI stuff anyway. Which, of course, leads to...
Rar wrote: It's GPL. Epoxi somehow took various portable things and made them un-portable, but that's fixable if you're a C-wrangler. At any rate, you can just use subprocess and pipe to and from.
which doesn't work too well for unicode characters in filenames, as my gui for avdump shows. on the other hand i sucking fuck.
"Sucking fuck?" Man, I thought I was hard on myself... Anyway, Unicode is not a big problem; I can always alter the program to respect Unicode filenames. Alternately, I can open the file in Python and use an mmap or pipe to transfer stuff to the avdump code (libavdump, anyone? Anyone? Bueller?)
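A rough sketch of the "open it in Python, pipe it across" approach: Python opens the file, so the Unicode path never reaches the child's command line, and the child reads the bytes from its standard input. Whether avdump can read from stdin at all is an assumption that nothing in this thread confirms, so the command is a parameter, and `STAND_IN_CMD` is a placeholder that only counts bytes.

```python
import subprocess
import sys

# Placeholder child process: reads stdin and prints the byte count.
# It stands in for a real tool; it is NOT avdump's command line.
STAND_IN_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(str(len(sys.stdin.buffer.read())))",
]


def pipe_file_through(cmd, path, timeout=60):
    """Open path in Python and hand it to cmd as stdin; return cmd's stdout."""
    with open(path, "rb") as source:
        result = subprocess.run(
            cmd,
            stdin=source,            # child inherits the open handle
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    if result.returncode != 0:
        raise RuntimeError("child failed: %r" % result.stderr)
    return result.stdout
```

Tools that need to seek within the file may not work from a pipe-style stdin, which is one reason a handle-based library call could be the better long-term route.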
PetriW wrote:Avdump has a call to open files by file handle if I remember correctly, this does not have any unicode file name issues. However, it would of course require another program to open the file in question then call avdump dll with the handle.

And yes, the tcp api isn't closed, if you show a good enough skill you could be allowed to access it. ;) The problem of course is that you have to make your client somewhat hard to reverse engineer and keep the data files encrypted... which increases the workload significantly and carries a significant hit on performance.

ADS works pretty good, most people do have access to it through windows being kind of popular. If ADS isn't available you could use the eMule way of keeping track of files: filename + filesize. Not safe but may be better than forcing a rehash all the time.
Closing TCP code would be a cinch. Just write the code in C, put it in a UPXed DLL, and then use Ctypes to call it. However, I explicitly did want data files to be readable, or at the very least exportable, which means that any such TCP support would be optional and not a dependency for operation.

More importantly, I don't need TCP access; I need to write a "full-csv" template to complement the existing minimal-csv. The idea is that if I am going to support CSV mylist import, I need to have all the data points I require in the CSV, because otherwise I'm just back to leeching the UDP, and we all don't want that.

I have to track down a DLL for ADS access at some point if I want it. The eMule way of doing things calls to me, although the two data points that I prefer are a CRC32 and filesize, since those do not change and together should isolate unique mylist entries. (If there's a collision, let the user select the correct entry.)
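
The CRC32 + filesize lookup could be sketched like this, keeping every entry that shares a key so that a collision can be handed to the user. The class name and the use of plain entry ids are illustrative assumptions, not the client's actual data model.

```python
from collections import defaultdict


class LocalFileIndex:
    """Map (crc32 hex, size in bytes) to the mylist entries that match it."""

    def __init__(self):
        self._by_key = defaultdict(list)

    def add(self, crc32_hex, size, entry_id):
        """Record that entry_id has this CRC32 and size."""
        self._by_key[(crc32_hex.lower(), size)].append(entry_id)

    def candidates(self, crc32_hex, size):
        """Return all entries sharing the key; more than one means a collision."""
        return list(self._by_key.get((crc32_hex.lower(), size), []))
```

An empty result means the file is unknown, one result is a match, and several results are the case where the user picks the correct entry.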

@PetriW: If we're doing ADS, I want to be able to read and write the same metadata that you're reading and writing; anything else just seems silly/stupid on my part. I saw the other topic where you talked about it; have you decided on a metadata format of some sort?

@Der Idiot: I fixed that bug from way back when, with Windows and login. (I was passing shared memory inappropriately.) There's also a binary .zip in my folder if you want. http://locke.aweenet.net/~simpson/oadb if you're still interested.

Looks like my roadmap is pretty clear for now. Add more hashes and settings to control them; add avdump/write a libavdump shared object; find an ADS library of some sort and figure out how to do it on both Linux and Windows. Oh, and fix bug reports and acknowledge user requests. Now if only I had bug reports and users... :roll:

Thanks very much, guys.
~ C.
PetriW
AniDB Staff
Posts: 1522
Joined: Sat May 24, 2003 2:34 pm

Post by PetriW »

Well, the ADS thing in aom.neverarriving has been done since like forever but since it's not actually out yet it can still be changed.

May want to know that aom will not accept ADS data provided by another application as it would be possible to submit bad data to AniDB, which we don't want. ;)

As for the structure, poke me when you need it and I'll send it. ;) Unless I document it on the wiki first. Don't have it in cleartext atm since I write the files through a xml thingie.
MostAwesomeDude
Posts: 38
Joined: Fri Jun 01, 2007 11:02 am

Post by MostAwesomeDude »

PetriW wrote:Well, the ADS thing in aom.neverarriving has been done since like forever but since it's not actually out yet it can still be changed.

May want to know that aom will not accept ADS data provided by another application as it would be possible to submit bad data to AniDB, which we don't want. ;)

As for the structure, poke me when you need it and I'll send it. ;) Unless I document it on the wiki first. Don't have it in cleartext atm since I write the files through a xml thingie.
Structure it however you want. If you don't want tags I write to be trusted (good call, by the way,) then just put a signature on valid tags that I can't duplicate. Off the top of my head, take a hash with a secret salt, then prepend it to the top of the structure, or something like that. I don't have any problem with my app's output being untrusted, but you should find something that somebody malicious couldn't duplicate. (Of course, you would have to be pretty malicious to attack AniDB... :? )
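
One way to sketch the "hash with a secret, prepended to the structure" idea is shown below. It swaps the plain salted hash for HMAC-SHA256 from the standard library, which is the usual construction for a keyed signature; the function names and byte layout are invented for illustration and say nothing about what AOM actually does.

```python
import hashlib
import hmac

MAC_SIZE = hashlib.sha256().digest_size  # 32 bytes


def sign_tag(secret, payload):
    """Return payload with an HMAC-SHA256 of it prepended."""
    mac = hmac.new(secret, payload, hashlib.sha256).digest()
    return mac + payload


def verify_tag(secret, blob):
    """Return the payload if its prepended MAC is valid, otherwise None."""
    mac, payload = blob[:MAC_SIZE], blob[MAC_SIZE:]
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    return payload if hmac.compare_digest(mac, expected) else None
```

A client without the secret can still read the payload but cannot produce a blob that verifies. Note that the scheme is only as strong as the secrecy of the key embedded in the trusted client.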
PetriW
AniDB Staff
Posts: 1522
Joined: Sat May 24, 2003 2:34 pm

Post by PetriW »

It already does that and more actually. ;)
Rar
AniDB Staff
Posts: 1471
Joined: Fri Mar 12, 2004 2:41 pm
Location: UK

Post by Rar »

It doesn't, because it's not released. Winners ship, losers brag about the many features of their vapourware.

Rar
MostAwesomeDude
Posts: 38
Joined: Fri Jun 01, 2007 11:02 am

Post by MostAwesomeDude »

Rar wrote:It doesn't, because it's not released. Winners ship, losers brag about the many features of their vapourware.

Rar
Guess I'd better "code moar."
~ C.
PetriW
AniDB Staff
Posts: 1522
Joined: Sat May 24, 2003 2:34 pm

Post by PetriW »

Rar wrote:It doesn't, because it's not released. Winners ship, losers brag about the many features of their vapourware.
If you want a release with that in it, talk with epoximator, not me. ;) Nah, well if epoxi can't do it I have to find some other solution I guess.
There won't be an aom.5 replacement anytime soon but I do kind of want to release another preview, but without sqlite encryption I can't do it and I've not really reached the point where I'm willing to pay 300 euros for it yet (to get DISqlite).
epoximator
AniDB Staff
Posts: 379
Joined: Sun Nov 07, 2004 11:05 am

Post by epoximator »

i don't buy that. i have sqlite on low pri because i know you won't release anything anytime soon anyway. (i've already updated it twice, no?)
PetriW
AniDB Staff
Posts: 1522
Joined: Sat May 24, 2003 2:34 pm

Post by PetriW »

Well, if you don't want to do it don't, I'll solve it some other way. Probably less effort for me anyway since I don't have to import a dll (but heavier on the wallet :().

The first of the previous dlls was public; the second was never made public because I couldn't get the bugs sorted out (they were in the first too). It turned out sqlite itself was buggy, which is why I asked for a new dll again now that it has been fixed. Also note that I waited until I actually had something to show, and until the java team needed it too, before asking again; I don't just ask for them to make your life a misery.

Anyway, as said, if you don't want to do it tell me so I can solve it some other way, it is after all your own time.