Blog #1 – Learning Python Web Scraping Basics

Learning Python Web Scraping

I am currently looking into learning Python web scraping, as it seems fairly easy to get started with and is also a useful skill. As a basis for a fun project, I believe it will let me increase the complexity at my own pace. From what I have read so far, the Beautiful Soup library appears to be the recommended place to start, and I have experimented with it after following some YouTube tutorials. Beautiful Soup is an HTML and XML parsing library that builds a parse tree from a document, which you can then navigate and search to extract data.
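To make the idea of a parse tree more concrete, here is a minimal sketch that parses a small, made-up HTML string (the markup, `id` and class names are hypothetical) with Beautiful Soup's `"html.parser"` backend, then navigates and searches the resulting tree. It assumes the `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, hard-coded so the example runs offline.
HTML = """
<html>
  <head><title>Example page</title></head>
  <body>
    <h1 id="main-heading">Learning scraping</h1>
    <p class="intro">Beautiful Soup builds a tree from this markup.</p>
    <ul>
      <li><a href="/one">First link</a></li>
      <li><a href="/two">Second link</a></li>
    </ul>
  </body>
</html>
"""

# "html.parser" is Python's built-in parser, so no extra parser package is needed.
soup = BeautifulSoup(HTML, "html.parser")

# Navigate down the tree by tag name and read the tag's text.
title = soup.title.string

# Search the tree: find() returns the first matching tag, find_all() returns all of them.
heading = soup.find("h1", id="main-heading").get_text()
intro = soup.find("p", class_="intro").get_text()
links = [a["href"] for a in soup.find_all("a")]

# Navigate back up the tree: each <a> sits inside an <li>.
parent = soup.find("a").parent.name

print(title)    # Example page
print(heading)  # Learning scraping
print(links)    # ['/one', '/two']
print(parent)   # li
```

Both directions matter: `find()`/`find_all()` search downwards, while attributes such as `.parent` let you move back up from a match to its surrounding elements.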

Using Beautiful Soup together with the urllib library, I have managed to write some HTML parsing code. On most HTML pages, the content most worth extracting is text, tables, and any embedded XML. Next, I would like to learn how to scrape content rendered by JavaScript, download and store scraped data, format that data, and build a crawler. So far I have seen, and reproduced, examples that use PyQt to scrape JavaScript-rendered pages.
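Here is a rough sketch of the urllib plus Beautiful Soup approach, covering two of the pieces mentioned above: pulling a table out of a page and storing it as CSV. The helper names `fetch_html`, `extract_table` and `rows_to_csv` are hypothetical, the User-Agent string is an assumption, and only the first `<table>` on a page is handled.

```python
import csv
import io
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page with urllib (requires network access)."""
    # Some sites reject urllib's default User-Agent; this header value is an assumption.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0 (learning-scraper)"})
    with urlopen(req, timeout=10) as resp:
        # Use the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_table(html: str) -> list[list[str]]:
    """Return the first <table> as a list of rows of cell text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Header (<th>) and data (<td>) cells are both kept, in document order.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialise rows as CSV text (the csv module uses \\r\\n line endings)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Offline demo on a hard-coded table instead of a live page.
    sample = (
        "<table><tr><th>Library</th><th>Type</th></tr>"
        "<tr><td>Beautiful Soup</td><td>parser</td></tr></table>"
    )
    print(rows_to_csv(extract_table(sample)))
```

Only `fetch_html` needs a network connection; the parsing and CSV steps can be tried offline on a string, as in the `__main__` block. To keep the data, write the returned CSV text to a file opened with `newline=""`.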

From now on I will be following the book Web Scraping with Python by Ryan Mitchell, as it appears to offer a sensible learning progression with good content. I have also heard of some other well-known Python web scraping libraries, such as mechanize, Scrapy, Selenium and scrapemark. Strictly speaking, Scrapy is a framework rather than a library, and it is very useful if you are looking to develop a website crawler. If anyone knows of any useful learning resources on Python web scraping, feel free to leave a comment.

Useful Resources:
