This repository contains all the Python code I used to gather data from a novel translation archive website. It has 5 Python files: 4 supporting modules and 1 main script.
- logger
  This file contains a `Logger` class that records any warnings and errors raised at runtime and writes them to a file called `logs.log`.
- pool
  This file has a function called `create_pool()`, which `Proxer` calls to generate a set (a pool) of random proxies and headers. The pool is later passed as a parameter to the HTTP GET request so that the bot looks more like a normal user and is less likely to get blocked.
- proxer
  The `Proxer` class in this file keeps track of the IP and header rotation for `main`, makes the HTTP request, and returns the HTML response.
- novelparser
  Contains the `BeautifulSoup` code that finds the desired information, including the maximum number of pages, the titles on each page, and the details of each novel.
- main
  Contains a loop that fetches the HTML response using `Proxer().open_site()` and parses it using various functions from `NovelParser`.
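
To make the pool and rotation idea more concrete, here is a minimal sketch of how a pool of proxies and headers could be built and cycled through. It is not the code in `pool` or `proxer`: the proxy addresses, the user agent strings, the `size` and `seed` parameters, and the `request_kwargs()` helper are all assumptions made up for illustration, and the pool is a list here rather than a set so it can be rotated in order.

```python
import random
from itertools import cycle

# Hypothetical placeholder values; the real pool contents are not shown in this README.
PROXIES = ["203.0.113.10:8080", "203.0.113.11:3128", "203.0.113.12:8000"]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]


def create_pool(size=3, seed=None):
    """Return a list of (proxy, headers) pairs drawn at random.

    The signature is an assumption; the real create_pool() may differ.
    """
    rng = random.Random(seed)
    pool = []
    for _ in range(size):
        proxy = rng.choice(PROXIES)
        headers = {"User-Agent": rng.choice(USER_AGENTS)}
        pool.append((proxy, headers))
    return pool


def request_kwargs(proxy, headers, timeout=10):
    """Build keyword arguments in the shape requests.get() accepts."""
    return {
        "proxies": {"http": f"http://{proxy}", "https": f"http://{proxy}"},
        "headers": headers,
        "timeout": timeout,
    }


# Rotate through the pool: each next() call yields the following pair
# and wraps around to the start once the pool is exhausted.
rotation = cycle(create_pool(size=3, seed=0))
proxy, headers = next(rotation)
kwargs = request_kwargs(proxy, headers)
# With the third-party requests library installed, a fetch would then be:
#   response = requests.get(url, **kwargs)
```

Keeping the request arguments in a separate helper means the rotation logic can be checked without touching the network.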