KevsRobots Learning Platform
50% Percent Complete
By Kevin McAleer, 2 Minutes
In this lesson, we will introduce the concept of web scraping, which is a method of extracting information from websites. Python offers several libraries for web scraping, including Beautiful Soup and requests.
Web scraping is the process of extracting information directly from a web page. It involves making a request to a web page, downloading its HTML content, and parsing that content to extract the information you need.
requests
The requests
library allows you to send HTTP requests using Python. You can use it to download web pages.
import requests
# Make a request to a web page
response = requests.get('https://www.example.com')
# Print the status code (200 means success)
print(response.status_code)
# Print the first 500 characters of the HTML content
print(response.text[:500])
Beautiful Soup
Beautiful Soup is a library for parsing HTML and XML documents. It provides methods and Pythonic idioms for iterating, searching, and modifying the parse tree.
from bs4 import BeautifulSoup
import requests
# Make a request to a web page
response = requests.get('https://www.example.com')
# Create a Beautiful Soup object
soup = BeautifulSoup(response.text, 'html.parser')
# Find the title tag
title_tag = soup.find('title')
# Print the text of the title tag
print(title_tag.text)
In this lesson, you’ve learned about web scraping with Python. We’ve covered how to use the requests library to download web pages and the Beautiful Soup library to parse HTML and extract information. Web scraping is a powerful tool for gathering data from the internet.
You can use the arrows ← →
on your keyboard to navigate between lessons.