December 01, 2024

Data Collection of Reviews

A Python crawler was designed to collect the review data.

Crawler Programme

In this part, I first use external Python libraries such as BeautifulSoup4 and DrissionPage to build the crawler, and use the multiprocessing module to crawl in several processes at once, which improves crawling efficiency.
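The multi-process pattern can be sketched roughly as follows. This is a minimal illustration and not the actual crawler: the function names, the `?page=N` URL pattern, and the shape of the returned dictionary are all assumptions, and `crawl_page` is only a stand-in for the step where DrissionPage would load a page and BeautifulSoup4 would parse it.

```python
from multiprocessing import Pool


def build_page_urls(base_url: str, pages: int) -> list[str]:
    """Return one URL per review page (the ?page=N pattern is an assumption)."""
    return [f"{base_url}?page={n}" for n in range(1, pages + 1)]


def crawl_page(url: str) -> dict:
    """Stand-in for the real work on a single page.

    In a real crawler this is where the page would be fetched and its
    review fields extracted; here it only returns an empty result.
    """
    return {"url": url, "reviews": []}


def crawl_all(urls: list[str], workers: int = 4) -> list[dict]:
    """Crawl every URL in parallel, one page per task."""
    with Pool(processes=workers) as pool:
        # map() preserves input order, so results line up with `urls`.
        return pool.map(crawl_page, urls)


# When run as a script, call crawl_all() under `if __name__ == "__main__":`
# so that worker processes do not re-run the entry point on import.
```

Splitting the work by page keeps each task independent, so a slow or failed page does not block the others. The worker count of 4 is arbitrary and would need tuning against the target site's rate limits.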

Finally, the crawler is connected to a MongoDB database, where the crawled data is stored. The form of the crawled data is shown in the following figure: fig2.png
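The storage step might look something like the sketch below. The field names (`category`, `text`, `replies`) are hypothetical and do not necessarily match the real schema shown in the figure. The `collection` argument is assumed to be a pymongo `Collection`, passed in by the caller so that the helper itself does not depend on a running database.

```python
def to_document(category: str, review: dict) -> dict:
    """Shape one crawled review as a document (field names are hypothetical)."""
    return {
        "category": category,
        "text": review.get("text", ""),
        "replies": review.get("replies", []),
    }


def save_reviews(collection, category: str, reviews: list[dict]) -> int:
    """Insert reviews into `collection` and return how many were stored."""
    docs = [to_document(category, r) for r in reviews]
    if not docs:
        # pymongo's insert_many rejects an empty list, so skip the call.
        return 0
    result = collection.insert_many(docs)
    return len(result.inserted_ids)


# Possible usage with pymongo (connection string and names are assumptions):
#   from pymongo import MongoClient
#   collection = MongoClient("mongodb://localhost:27017")["reviews"]["cellphone"]
#   save_reviews(collection, "cellphone", crawled_reviews)
```

Inserting in batches with `insert_many` avoids one round trip per review, which matters when several crawler processes are writing at the same time.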

Data Statistics

The statistics of the crawled data are shown below: fig6.png

Deficiency

One awkward point is that there are no replies in the lipstick and running shoes databases, and only a few replies in the ricecooker database.

Given this deficiency, only the cellphone, laptop and camera databases were used for processing.
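This selection could be expressed as a simple filter over per-category reply counts. The sketch below is only illustrative: the function name and the `min_replies` threshold are assumptions, since the text does not state an exact cut-off for "only a few replies".

```python
def select_categories(reply_counts: dict[str, int], min_replies: int) -> list[str]:
    """Keep categories whose reply count reaches `min_replies`, in input order."""
    return [name for name, count in reply_counts.items() if count >= min_replies]
```

With a threshold set above the ricecooker reply count, this would keep cellphone, laptop and camera and drop the other three categories.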