A collection of simple Python scripts for building anime and user datasets from https://myanimelist.net/.
MyAnimeList Anime Dataset up to May 7, 2018 [This may take some time to load]
For the latest dataset, click here [constantly updating]
NOTE: This page contains many GIFs, so it may take a while to load. Please be patient.
This script downloads the anime dataset from MyAnimeList using Jikan, an unofficial MyAnimeList REST API.
- animeID: id of anime as in anime url https://myanimelist.net/anime/1
- name: title of anime
- premiered: season and year of the premiere, in the default format (season year)
- genre: list of genres
- type: type of anime (for example TV, Movie, etc.)
- episodes: number of episodes
- studios: list of studios
- source: source of the anime (for example original, manga, game, etc.)
- scored: score of the anime
- scoredBy: number of members who scored the anime
- members: number of members who added the anime to their list
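Since `premiered` is stored as a single "season year" string, it often helps to split it before analysis. The following is a minimal sketch, not part of the repository's scripts; the function name `parse_premiered` and the sample value are hypothetical, and it assumes the field is either empty or exactly two space-separated tokens.

```python
from typing import Optional, Tuple


def parse_premiered(value: str) -> Optional[Tuple[str, int]]:
    """Split a 'season year' string into (season, year).

    Returns None when the value is empty or does not have the
    assumed two-token shape.
    """
    parts = value.strip().split()
    if len(parts) != 2 or not parts[1].isdigit():
        return None
    season, year = parts
    return season, int(year)


# Illustrative value only, not taken from the dataset.
print(parse_premiered("Spring 1998"))
```

Rows where the function returns None can be kept as-is or dropped, depending on the analysis.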
- Before starting, I recommend watching this video: Google Sheets and Python. That concept is the base here; I just integrated it with Heroku.
- First, visit this link to create a project in the Google Cloud Resource Manager.
- Click CREATE PROJECT, then give it a name. If the GIF below is low quality, click here.
- Now enable the Google Sheets API for your project.
- Next, get the credentials file. If the GIF below is low quality, click here.
- Add client_secret.json and grant access to the spreadsheet. The spreadsheet needs a header row, which you must add yourself. Watch how to do that here.
- Before deploying to Heroku, you need to create an app. If the GIF below is low quality, click here.
- Finally, push to heroku master and start the worker dyno. Watch how to do that here.
NOTE: If the worker doesn't start, you can start it manually using the following command: heroku ps:scale worker=1
- Final Product:
python getAnime.py starting_index ending_index [output_file.csv]
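To make the command-line shape above concrete, here is a rough sketch of how the three arguments could be parsed and turned into anime page URLs. It is not the code in getAnime.py: the helper names, the default output filename `anime.csv`, and the choice to treat `ending_index` as inclusive are all assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse: starting_index ending_index [output_file.csv]."""
    parser = argparse.ArgumentParser(
        description="Download anime records for a range of IDs."
    )
    parser.add_argument("starting_index", type=int)
    parser.add_argument("ending_index", type=int)
    # Hypothetical default; the real script may use a different name.
    parser.add_argument("output_file", nargs="?", default="anime.csv")
    args = parser.parse_args(argv)
    if args.starting_index > args.ending_index:
        parser.error("starting_index must not exceed ending_index")
    return args


def anime_urls(start, end):
    """Build one anime page URL per ID (end assumed inclusive)."""
    return [f"https://myanimelist.net/anime/{i}" for i in range(start, end + 1)]


args = parse_args(["1", "3"])
for url in anime_urls(args.starting_index, args.ending_index):
    print(url)
```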
This script downloads the user dataset from MyAnimeList using the Kuristina API.
- userID: MAL user ID
- animeID: id of anime as in anime url https://myanimelist.net/anime/ID
- score: score given by the user to the anime with id = animeID (if the user hasn't scored the anime, this field is 0).
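Because an unscored entry is stored as 0 rather than left blank, averages computed over the raw `score` column will be skewed downward. A minimal sketch of filtering those rows out is shown here; it assumes the CSV has a header row with the three column names listed above, and the sample rows are made up for illustration.

```python
import csv
import io

# Made-up sample rows; real files come from getUser.py.
SAMPLE = """userID,animeID,score
1001,1,9
1001,5,0
1002,1,7
"""


def rated_rows(handle):
    """Return only the rows where the user actually gave a score."""
    reader = csv.DictReader(handle)
    return [row for row in reader if int(row["score"]) != 0]


rows = rated_rows(io.StringIO(SAMPLE))
print(len(rows))
```

With a real file, replace the `io.StringIO` object with `open("User.csv", newline="")`.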
python getUser.py UserList.txt [User.csv]
NOTE: Make sure you have a UserList.txt file containing the names of the users. If you don't have one, use the scraper (scrape from club or scrape from post).
For this you need the topic ID. Go to MAL -> Community -> Forums -> Select a forum
For example, for the following forum links, the ID is the number after topicid=:
https://myanimelist.net/forum/?topicid=1699126 -> 1699126
https://myanimelist.net/forum/?topicid=1696289 -> 1696289
After getting the topic ID, you can use the createUserListFromPost script.
python createUserListFromPost.py topicID [UserList.txt]
For this you need the club ID. Go to MAL -> Community -> Clubs -> Select a club
For example, for the following club links, the ID is the number after cid=:
https://myanimelist.net/clubs.php?cid=72250 -> 72250
https://myanimelist.net/clubs.php?cid=32683 -> 32683
After getting the club ID, you can use the createUserListFromClub script.
python createUserListFromClub.py clubID [UserList.txt]
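Both the topic ID and the club ID are ordinary query parameters (`topicid` and `cid`), so they can be read from a URL instead of copied by hand. The helper below is a small sketch using only the standard library; `query_id` is a hypothetical name and is not part of the scripts in this repository.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def query_id(url: str, key: str) -> Optional[int]:
    """Return the integer value of query parameter `key`, or None."""
    values = parse_qs(urlparse(url).query).get(key)
    if not values or not values[0].isdigit():
        return None
    return int(values[0])


print(query_id("https://myanimelist.net/forum/?topicid=1699126", "topicid"))
print(query_id("https://myanimelist.net/clubs.php?cid=72250", "cid"))
```

The returned number can then be passed as the `topicID` or `clubID` argument of the corresponding script.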
- Scraping Locally ✔
- Scraping using Heroku ✔
- Creating Heroku Deploy Button ⌛