Web scraping is a useful technique for extracting data from websites, and Python offers powerful libraries like BeautifulSoup and requests for this purpose. In this tutorial, we'll create a simple web scraper to extract information from a website and save it to a CSV file.
First, let's install the necessary libraries:
pip install beautifulsoup4 requests
Now, let's create our scraper script, scraper.py:
import requests
from bs4 import BeautifulSoup
import csv
# URL of the website to scrape (example.com is a placeholder;
# adjust the URL and the selectors below for your target site)
url = 'https://example.com'
# Send a GET request to the URL
response = requests.get(url)
# Parse the HTML content of the page
soup = BeautifulSoup(response.text, 'html.parser')
# Find the elements containing the data we want to scrape
items = soup.find_all('div', class_='item')
# Open a CSV file to write the scraped data
# Open a CSV file to write the scraped data
with open('data.csv', 'w', newline='') as csvfile:
    fieldnames = ['title', 'price']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    # Write the header row
    writer.writeheader()

    # Loop through each item and extract the relevant data
    for item in items:
        title = item.find('h2').text.strip()
        price = item.find('span', class_='price').text.strip()

        # Write the data to the CSV file
        writer.writerow({'title': title, 'price': price})

print('Scraping complete. Data saved to data.csv')
In this script, we send a GET request to the specified URL, parse the HTML content with BeautifulSoup, and extract data from elements matching specific tags and CSS classes. We then write the extracted data to a CSV file.
To run the scraper, simply execute the script:
python scraper.py
The scraped data will be saved to a CSV file named data.csv in the same directory as the script.
Keep in mind that web scraping should be done responsibly and in accordance with the website's terms of service (and its robots.txt). It's also important to handle errors and edge cases gracefully: network requests can fail, and expected elements may be missing from the page.
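As a sketch of that error handling, here is one way to harden the request and extraction steps. The function name, the 10-second timeout, and the skip-on-missing-element policy are illustrative choices, not requirements:

```python
import requests
from bs4 import BeautifulSoup

def extract_items(html):
    """Extract (title, price) pairs, skipping malformed items.

    find() returns None when an element is missing, so we guard
    before touching .text to avoid an AttributeError.
    """
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for item in soup.find_all('div', class_='item'):
        title_tag = item.find('h2')
        price_tag = item.find('span', class_='price')
        if title_tag is None or price_tag is None:
            continue  # skip incomplete items instead of crashing
        rows.append((title_tag.text.strip(), price_tag.text.strip()))
    return rows

if __name__ == '__main__':
    url = 'https://example.com'
    try:
        # A timeout prevents hanging forever; raise_for_status()
        # turns 4xx/5xx responses into exceptions we can catch.
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.RequestException as exc:
        raise SystemExit(f'Request failed: {exc}')
    for title, price in extract_items(response.text):
        print(title, price)
```

Separating the parsing logic into its own function also makes it easy to test against saved HTML without hitting the network.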