How about open-source?
Moderator: AniDB
This isn't really a bug report; maybe I should have posted it in requests, I wasn't sure.
Anyway, all I wanted to do is suggest that you release the code for the client under the GPL (or an open source licence of your choice) and maybe register a project on SourceForge so that more people can help you develop it. After all, the whole of AniDB is founded on the community concept (at least I think so), so why not apply it to the client as well? I'm sure there are many people who are willing to help out.
I am myself a big fan of open source, however there are some major drawbacks which will most likely prevent something like that from happening in the near future:
1) working with multiple ppl on one program is always a problem. ATM there's exactly one person who works on each software part.
getting more ppl on any of those projects will be a hassle. if those ppl are not dedicated enough we lose more time getting them up to speed than we gain later.
2) we want to gather as many users on anidb as possible; the db lives only due to users who add data. the creation of anidb offshoots would therefore be a major problem for us. however, that's exactly what would happen within a short period of time should we make all the code public.
this does not only affect the cgi but the client also. as the client more or less needs a complete local copy of the anidb data to do its work, the corresponding API is not public, and for the time being that won't change.
All client developers agreed on this, so you won't see any complete source for any of them. However, they are of course free to open some of their source to the public if they think that might be a good idea.
All this doesn't mean that we can't use your help though.
If you're dedicated enough I am sure we can work something out.
Currently AniDB is divided into the following parts:
- cgi (perl (mod_perl/postgresql)) [EXP]
- api (perl) [EXP]
- irc bot (perl) [EXP]
- client I (delphi) [BennieB]
- client II (C# .NET) [Hisoka]
the api daemon and the ircbot might need to be ported to another language though.
BYe!
EXP
Irony
Ironically enough, I'm feeling slightly tempted to make an AniDB clone because the API is NOT public. Luckily I'm too lazy :)
Reason? I can get the actual database in various ways. That is not a problem [1].
But I CANNOT choose which client to use, and at least currently the AOM is promising, albeit about 10x slower than my experimental PostgreSQL+Python-based minimalist client, and lacking in features.
Alternatives as I see them?
- Beg for AoM to get more features+speed (especially speed being possibly difficult if there are architectural issues),
- ask for API (apparently no-go for open-source client and I don't want to bother with closed source exactly for these reasons),
- 'steal' or 're-use' database, or
- start new one from scratch (somewhat pointless).
I'd like exp to at least reconsider his API policy. I don't even care much about spreading a client, but as getting the DB is currently the easiest thing on the planet [1], I don't see how his threat model works.
For now, I'm just dealing with the inadequacies of the AOM by using its database directly and having AOM update the results to AniDB, but I find this solution somewhat insipid.
-F
[1] In the current protocol version (28), the client is authenticated using an MD5 digest of something. I'm too lazy to bother with SoftICE or equivalent to make my own key generation version (probably a few hours' work), as there are currently at least 4 other points of weakness in this chain (I'll leave them as an exercise to the reader in case exp really wants to pretend this data is 'safe'). For the record, yes, I've worked in the security field in the past :-)
I am well aware of the fact that the API login scheme is by no means secure. But it's impossible to achieve real security in a scenario like this one anyway. So why bother?
The point is simply that a misbehaving client can cause the API, and with it anidb, quite a lot of harm. So I'd like to ensure a certain level of quality before granting a client permission to connect to the API.
And everyone should profit from a client, so i generally don't think much of private clients.
Another point is of course that anidb depends heavily on the number of active users. That's why an anidb clone could hurt us pretty badly.
So we're of course not inclined to give out any easy way to leech the entire anidb database as a starting point for a cloned page. (again, i know that it is pretty easy to get the db data, but it still takes more time than simply using a public API would.)
BYe!
EXP
Ok, I'll bite on the cons to provoke discussion:
- If you provide an SDK that uses the API, the existing clients themselves are also erroneous if they can cause harm to the database using it (and if it's just due to users using clients wrongly, then it can also be done with current clients). As long as the module providing the API is even slightly robust, there is little danger that I can see. Care to elaborate on the damage? (Ok, because I got through the 'protection' layer using a simple trick I could in theory do the damage too, but I'm really curious ;>)
- Creating a clone would probably be a few weeks' effort (at least, if some reasonable subset of functionality were desired). Of those 4 different ways to get at the data, none takes more than a few hours (and actually two are in the minutes category and one in seconds..).
Note that there is also a potential BENEFIT from a public SDK: OSS software tends to develop more rapidly (for example, compare the development of emule and edonkey). I for one would probably even contribute to some OSS client, but can't be bothered to jump through the hoops of exchanging mystic handshakes to get at some sort of CSS for the sake of a feature or two..
-F
Ironically enough, once upon a time I was in the 'let's keep this client+server CSS' boat regarding a certain program I wrote; making the client OSS increased the userbase, and eventually I even made the server OSS and didn't get hurt. No details, to protect the guilty.

fingon wrote:
(...) Note that there is also a potential BENEFIT from a public SDK: OSS software tends to develop more rapidly (for example, compare the development of emule and edonkey). I for one would probably even contribute to some OSS client, but can't be bothered to jump through the hoops of exchanging mystic handshakes to get at some sort of CSS for the sake of a feature or two..

I'd volunteer for an OSS client project, too, as long as I could be of any help (which pretty much limits it to C, C++ or Perl and to the non-GUI parts for Windows software). So, it's two of us now.
Make that 3  
 
Although I'm not posting much, I'm a regular AniDB user, and on that basis I have to agree with exp: forking an AniDB clone would be harmful for everyone, since efforts like this need a lot of user participation. Just think what would happen if eDonkey and its forks each used their own protocol.
Nevertheless, giving out the API can't be a bad idea. fingon has a point: if one wants to clone AniDB, all the means to do it are already there, so releasing the API would not benefit him any further.
I think that if the implementation of the API is secure enough, there won't be any problems.
Anyway, I would like to hear exp's thoughts on Open Source development; if he's completely against it, I don't think we'll ever convince him.

fingon wrote:
Beg for AoM to get more features+speed (especially speed being possibly difficult if there are architectural issues

Yeah, the reason AOM is slow is because of sqlite; it's horribly slow for anything beyond the most basic...
I'm currently looking at solutions to completely replace sqlite. The fact that a 3 table join on indexed columns takes minutes to run is a pretty good indication of the quality of sqlite.
I hope you don't find the client running slow otherwise though; it should be fast for everything except database read/write operations.
And well, I should point out that it's an ALPHA; I have hardly even bothered with optimizations yet.
fingon wrote:
Ok, I'll bite on the cons to provoke discussion:
- If you provide an SDK that uses the API, the existing clients themselves are also erroneous if they can cause harm to the database using it (and if it's just due to users using clients wrongly, then it can also be done with current clients). As long as the module providing the API is even slightly robust, there is little danger that I can see. Care to elaborate on the damage? (Ok, because I got through the 'protection' layer using a simple trick I could in theory do the damage too, but I'm really curious ;>)

No SDK is provided, only an API specification (html).
And there are LOTS of requirements in that API spec, and a client can do quite some harm to the server by not following the specs closely
(e.g. server load/availability, bulk creqs, bulk adds, spamming, ...).

fingon wrote:
- Creating a clone would probably be a few weeks' effort (at least, if some reasonable subset of functionality were desired). Of those 4 different ways to get at the data, none takes more than a few hours (and actually two are in the minutes category and one in seconds..).

well, even that will keep some ppl away.
and to say it bluntly, within a few weeks you'd never get anywhere near the functionality of anidb.

fingon wrote:
Note that there is also a potential BENEFIT from a public SDK: OSS software tends to develop more rapidly (for example, compare the development of emule and edonkey). I for one would probably even contribute to some OSS client, but can't be bothered to jump through the hoops of exchanging mystic handshakes to get at some sort of CSS for the sake of a feature or two..

well, the ppl _seriously_ interested in helping out could as well form a group and work on a closed source client.
if they're not willing to commit themselves to such a large project they'd prolly not contribute much anyway.
btw. I do see that OSS has its advantages, i just don't see that those outweigh the danger a public API would pose to AniDB.

fingon wrote:
Ironically enough, once upon a time I was in the 'let's keep this client+server CSS' boat regarding a certain program I wrote; making the client OSS increased the userbase, and eventually I even made the server OSS and didn't get hurt. No details, to protect the guilty.

well who knows, maybe the anidb api will be public someday, but not atm.

BYe!
EXP
exp wrote:
No SDK is provided, only an API specification (html).
And there are LOTS of requirements in that API spec, and a client can do quite some harm to the server by not following the specs closely
(e.g. server load/availability, bulk creqs, bulk adds, spamming, ...).

I'd guess you could just have the server side ban anyone doing more than X ops/hour from one IP; doesn't sound like rocket science to me.
Also, the access isn't anonymous anyway, so I don't see the point. Sure, they might at best cause some annoyance once, but then they'd be gone.
And as for the deterrence argument: even with the current client authentication scheme I could spam requests easily enough with unauthenticated code. How to do it is left as an exercise to the reader.

exp wrote:
fingon wrote:
- Creating a clone would probably be a few weeks' effort (at least, if some reasonable subset of functionality were desired). Of those 4 different ways to get at the data, none takes more than a few hours (and actually two are in the minutes category and one in seconds..).
well, even that will keep some ppl away.
and to say it bluntly, within a few weeks you'd never get anywhere near the functionality of anidb.

I'd probably get the functionality _I_ need or even use (which is, as I said, a reasonable subset). Simple HowTo: grab Plone (http://www.plone.org), write a db schema, write a minimal edit/submission workflow and web pages, write an API server on top of the Zope db (or use a separate RDBMS such as PostgreSQL). I doubt it'd take more than two weeks, but I don't personally care for a fork of the project anyway, so it's an academic point only.

exp wrote:
fingon wrote:
Note that there is also a potential BENEFIT from a public SDK: OSS software tends to develop more rapidly (for example, compare the development of emule and edonkey). I for one would probably even contribute to some OSS client, but can't be bothered to jump through the hoops of exchanging mystic handshakes to get at some sort of CSS for the sake of a feature or two..
well, the ppl _seriously_ interested in helping out could as well form a group and work on a closed source client.
if they're not willing to commit themselves to such a large project they'd prolly not contribute much anyway.
btw. I do see that OSS has its advantages, i just don't see that those outweigh the danger a public API would pose to AniDB.

That's the major problem: most people in the world (from my experience with a few projects, both CSS and OSS) are not _seriously_ interested. They usually care only about some changes to the existing status quo within projects, and the more they need to work to get changes done, the less likely they are to do anything useful.
-F
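A minimal sketch of the "ban anyone doing more than X ops/hour from one IP" idea from the post above. Nothing here is taken from the actual AniDB API server: the class name `PerIpRateLimiter`, its parameters, and the permanent-ban policy are all hypothetical, chosen only to illustrate a sliding-window limit.

```python
import time
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Hypothetical sliding-window limiter: at most max_ops per window per IP."""

    def __init__(self, max_ops, window_seconds=3600.0, clock=time.monotonic):
        self.max_ops = max_ops
        self.window = window_seconds
        self.clock = clock                  # injectable so the limiter is testable
        self._events = defaultdict(deque)   # ip -> timestamps of accepted operations
        self.banned = set()                 # IPs that exceeded the limit at least once

    def allow(self, ip):
        """Return True if the operation may proceed, False if it is refused."""
        if ip in self.banned:
            return False
        now = self.clock()
        events = self._events[ip]
        # Forget operations that have fallen out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.max_ops:
            # Assumed policy: one violation bans the address for good.
            self.banned.add(ip)
            return False
        events.append(now)
        return True
```

A server loop would call `allow(ip)` once per incoming request and drop the request when it returns False. A real deployment would probably want bans that expire and a limit keyed on the logged-in user as well as the IP, since the access is not anonymous.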
PetriW wrote:
fingon wrote:
Beg for AoM to get more features+speed (especially speed being possibly difficult if there are architectural issues
Yeah, the reason AOM is slow is because of sqlite; it's horribly slow for anything beyond the most basic...
I'm currently looking at solutions to completely replace sqlite. The fact that a 3 table join on indexed columns takes minutes to run is a pretty good indication of the quality of sqlite.
I hope you don't find the client running slow otherwise though; it should be fast for everything except database read/write operations.
And well, I should point out that it's an ALPHA; I have hardly even bothered with optimizations yet.

Yes, the db operations are what annoy me most (my Python+PostgreSQL "create all tables from scratch from the original gzipped database file" run takes about 1/4 of the time of the AOM bootup, as a matter of fact).
Have you experimented with EXPLAIN to see if it really uses indexes in those joins? At least in my experience SQLite is one of the faster free SQL databases when used correctly (its write performance sucks in sync mode, but that can be fixed in a few alternate ways).
-F
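A small sketch of the EXPLAIN check suggested above, using Python's standard `sqlite3` module against a current SQLite 3 (the thread predates it, so treat this as an illustration rather than what AOM did). The three tables and their index names are invented for the example and are not AOM's real schema.

```python
import sqlite3

# Hypothetical three-table schema with indexes on the join columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE anime   (aid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE episode (eid INTEGER PRIMARY KEY, aid INTEGER, epno INTEGER);
    CREATE TABLE file    (fid INTEGER PRIMARY KEY, eid INTEGER, size INTEGER);
    CREATE INDEX episode_aid ON episode (aid);
    CREATE INDEX file_eid    ON file (eid);
""")

query = """
    SELECT anime.title, episode.epno, file.size
    FROM anime
    JOIN episode ON episode.aid = anime.aid
    JOIN file    ON file.eid = episode.eid
    WHERE anime.aid = ?
"""

# EXPLAIN QUERY PLAN reports one row per table access; the last column is a
# human-readable description that says whether an index is used.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (1,)).fetchall()
plan_details = [row[-1] for row in plan]
for detail in plan_details:
    print(detail)
```

A line containing `SCAN` for one of the joined tables would point to a full table scan, which is the usual cause of a join on supposedly indexed columns taking minutes; a line containing `SEARCH ... USING INDEX` means the index is actually being used.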
fingon wrote:
Have you experimented with EXPLAIN to see if it really uses indexes in those joins?

It's not an issue anymore; AOM now only uses very simple queries. I'll try it out on some queries that run abnormally slow though.

fingon wrote:
At least in my experience SQLite is one of the faster free SQL databases when used correctly (its write performance sucks in sync mode, but that can be fixed in a few alternate ways).

If you have hints on how to improve write performance beyond using transactions, you are welcome to tell me.

fingon wrote:
Yes, the db operations are what annoy me most (my Python+PostgreSQL "create all tables from scratch from the original gzipped database file" run takes about 1/4 of the time of the AOM bootup, as a matter of fact).

The latest private build of AOM takes 5 seconds to start on a P4 2.8 GHz; here are some current debug times, FYI:
[2004-01-16 09:56:32] ~ Starting TAniDB.InitCache
[2004-01-16 09:56:32] ~ Starting TAniDBAnimes.UpdateAnimes
[2004-01-16 09:56:32] ~ TAniDBAnimes.UpdateAnimes - size verified at 0 milliseconds
[2004-01-16 09:56:32] ~ TAniDBAnimes.UpdateAnimes - query complete at 46 milliseconds
[2004-01-16 09:56:33] ~ TAniDBAnimes.UpdateAnimes - update complete at 765 milliseconds
[2004-01-16 09:56:33] ~ TAniDBAnimes.UpdateAnimes - refresh complete at 765 milliseconds
[2004-01-16 09:56:33] ~ TAniDBAnimes.UpdateAnimes - 615 milliseconds spent updating titles and genres
[2004-01-16 09:56:33] ~ Starting TAniDBEpisodes.UpdateEpisodes
[2004-01-16 09:56:33] ~ TAniDBEpisodes.UpdateEpisodes - size verified at 0 milliseconds
[2004-01-16 09:56:33] ~ TAniDBEpisodes.UpdateEpisodes - query complete at 141 milliseconds
[2004-01-16 09:56:33] ~ TAniDBEpisodes.UpdateEpisodes - update complete at 311 milliseconds
[2004-01-16 09:56:33] ~ TAniDBEpisodes.UpdateEpisodes - refresh complete at 328 milliseconds
[2004-01-16 09:56:33] ~ Starting TAniDBFiles.UpdateFiles
[2004-01-16 09:56:33] ~ TAniDBFiles.UpdateFiles - size verified at 0 milliseconds
[2004-01-16 09:56:34] ~ TAniDBFiles.UpdateFiles - query complete  at 890 milliseconds
[2004-01-16 09:56:35] ~ TAniDBFiles.UpdateFiles - update complete at 1641 milliseconds
[2004-01-16 09:56:35] ~ TAniDBFiles.UpdateFiles - refresh complete at 1671 milliseconds
[2004-01-16 09:56:35] ~ Starting TAniDBMylist.UpdateEntries
[2004-01-16 09:56:35] ~ TAniDBMylist.UpdateEntries - size verified at 0 milliseconds
[2004-01-16 09:56:35] ~ TAniDBMylist.UpdateEntries - query complete at 46 milliseconds
[2004-01-16 09:56:35] ~ TAniDBMylist.UpdateEntries - update complete at 514 milliseconds
[2004-01-16 09:56:35] ~ TAniDBMylist.UpdateEntries - refresh complete at 546 milliseconds
[2004-01-16 09:56:35] ~ TAniDBMylist.UpdateEntries - 446 milliseconds spent updating interface
[2004-01-16 09:56:36] ~ TAniDB.InitCache - done after running for 3687 milliseconds

PetriW wrote:
fingon wrote:
Have you experimented with EXPLAIN to see if it really uses indexes in those joins?
It's not an issue anymore; AOM now only uses very simple queries. I'll try it out on some queries that run abnormally slow though.
fingon wrote:
At least in my experience SQLite is one of the faster free SQL databases when used correctly (its write performance sucks in sync mode, but that can be fixed in a few alternate ways).
If you have hints on how to improve write performance beyond using transactions, you are welcome to tell me.

There are 3 ways I've optimized the SQLite program or two I've written: transactions, asynchronous writes and query optimization.
At least with earlier versions of SQLite, the default mode used fsync on every transaction or so on UNIX, and something even more disgusting on Windows, which made it abysmally slow. I'm not aware of the current state of the project, but check pager.c / sqlitepager_set_safety_level (those high default safety levels are only meaningful for industrial databases; if you're doing something in a "for fun" style, you probably care more about speed).
YMMV, it worked for me in the past.
-F
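A hedged sketch of the two write-side optimizations named above, transactions and asynchronous writes, using Python's standard `sqlite3` module with SQLite 3. It assumes that `PRAGMA synchronous` is the SQL-level equivalent of the pager safety level mentioned in the post; the `files` table and the `bulk_insert` helper are hypothetical.

```python
import sqlite3


def bulk_insert(path, rows, synchronous="OFF"):
    """Insert all rows in one transaction; return the resulting row count.

    synchronous="OFF" skips the fsync after each commit, trading durability
    on power loss or OS crash for speed.
    """
    if synchronous not in {"OFF", "NORMAL", "FULL"}:
        raise ValueError("unsupported synchronous level: %r" % (synchronous,))
    conn = sqlite3.connect(path)
    try:
        # Must be issued outside a transaction; the value was validated above.
        conn.execute(f"PRAGMA synchronous = {synchronous}")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS files "
            "(fid INTEGER PRIMARY KEY, name TEXT, size INTEGER)"
        )
        # "with conn" wraps the whole batch in a single transaction, so there
        # is one commit instead of one per inserted row.
        with conn:
            conn.executemany(
                "INSERT INTO files (fid, name, size) VALUES (?, ?, ?)", rows
            )
        return conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
    finally:
        conn.close()
```

For a local cache that can be rebuilt from the server, as in a "for fun" client, losing the last few writes after a crash is usually an acceptable price; for data that cannot be recreated, `NORMAL` or `FULL` is the safer choice.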
