
This repository demonstrates web scraping in Python with the Beautiful Soup library. Web scraping collects large amounts of data from the web automatically, which is especially useful for data science and data analysis.

YousefAmerAwad/Web-Scraping

Web-Scraping

Introduction

Imagine you have to pull a large amount of data from websites as quickly as possible. How would you do it without visiting each page and copying the data by hand? Web scraping is the answer: it automates the job, making it easier and faster.

Web scraping is an automated method for extracting large amounts of data from websites. Data on websites is usually unstructured, because it is embedded in HTML that is meant for display. Web scraping collects this data and stores it in a structured form, such as a table or a CSV file. Websites can be scraped in several ways, including online services, APIs, or your own code. In this project, we'll see how to implement web scraping with Python.

There is no universal solution for web scraping, because the way data is laid out is usually specific to each website. To scrape a site, you first need to understand its HTML structure.
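Here is what "understanding the structure" means in practice. The minimal sketch below parses a small, made-up HTML snippet with Beautiful Soup and locates elements by tag name and class. The markup, the class names `job-card`, `job-title` and `company`, and the helpers `text_or_empty` and `extract_jobs` are all hypothetical. On a real site you would find the actual tags and classes with your browser's developer tools ("Inspect").

```python
from bs4 import BeautifulSoup

# Hypothetical markup: tag and class names are invented for illustration
# and are not taken from Wuzzuf or Oasis Cars.
SAMPLE_HTML = """
<div class="job-card">
  <h2 class="job-title"><a href="/jobs/1">Data Scientist</a></h2>
  <span class="company">Acme Analytics</span>
</div>
<div class="job-card">
  <h2 class="job-title"><a href="/jobs/2">Data Analyst</a></h2>
  <span class="company">Example Corp</span>
</div>
"""


def text_or_empty(tag) -> str:
    """Return the stripped text of a tag, or "" if the element was not found."""
    return tag.get_text(strip=True) if tag is not None else ""


def extract_jobs(html: str) -> list[dict[str, str]]:
    """Return one dict per job card, using the assumed class names above."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no extra install
    jobs = []
    for card in soup.find_all("div", class_="job-card"):
        jobs.append({
            "title": text_or_empty(card.find("h2", class_="job-title")),
            "company": text_or_empty(card.find("span", class_="company")),
        })
    return jobs


if __name__ == "__main__":
    for job in extract_jobs(SAMPLE_HTML):
        print(job)
```

The code searches inside each card rather than collecting all titles and all companies separately. This keeps the fields of one listing together, even when one of them is missing from the page.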

For a more detailed introduction to web scraping with Beautiful Soup, see the Real Python tutorial: https://realpython.com/beautiful-soup-web-scraper-python/

Project Description

In this project we will scrape data from two websites.

  • Wuzzuf: an online employment platform that helps people find jobs in Egypt and the Middle East. We will search for available data science jobs. For each job we will extract the job title, company name, location, skills, requirements and salary. Then we will save this information in a CSV file.
    The URL of the page: https://wuzzuf.net/search/jobs/?a=hpb&q=data%20science

  • Oasis Cars: a company founded in Qatar in 1997 that buys and sells new and used cars. It is a trusted name among locals and expatriate communities. For each car, we will extract Model, Year, Show_Room, Mileage, Specs and Price.
    URL: https://oasiscars.com/Cars/List
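The Wuzzuf URL above appears to carry the search term in its `q` query parameter, with the space in "data science" encoded as `%20`. The sketch below uses only the standard library to rebuild the same URL for any search term. The helper name `wuzzuf_search_url` is hypothetical. The `a=hpb` parameter is simply copied from the original URL, and its meaning is not documented here.

```python
from urllib.parse import quote, urlencode

WUZZUF_SEARCH = "https://wuzzuf.net/search/jobs/"


def wuzzuf_search_url(query: str) -> str:
    """Build a search URL with the same shape as the one in the project description."""
    # "a": "hpb" is copied verbatim from the original URL (assumed to be harmless).
    params = {"a": "hpb", "q": query}
    # quote_via=quote encodes spaces as %20; the default (quote_plus) would use "+".
    return f"{WUZZUF_SEARCH}?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    print(wuzzuf_search_url("data science"))
```

Generating the URL from a parameter makes it easy to scrape other job titles without editing the string by hand.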

In this project, we use Python for scraping because it is easy to use and has a library, Beautiful Soup, that is built for this task.
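The sketch below shows a minimal version of the fetch-and-parse step. It assumes the `requests`, `beautifulsoup4` and `lxml` packages are installed. The helper name `fetch_soup`, the timeout value and the User-Agent string are illustrative assumptions, not taken from the project's code.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url: str, parser: str = "lxml", timeout: float = 10.0) -> BeautifulSoup:
    """Download a page and return its Beautiful Soup parse tree."""
    # Assumed header: some sites reject clients without a browser-like User-Agent.
    headers = {"User-Agent": "Mozilla/5.0 (compatible; web-scraping-demo)"}
    page = requests.get(url, headers=headers, timeout=timeout)
    # Fail loudly on 4xx/5xx instead of silently parsing an error page.
    page.raise_for_status()
    # page.content is the raw HTML as bytes; Beautiful Soup detects the encoding.
    return BeautifulSoup(page.content, parser)
```

For example, `fetch_soup("https://oasiscars.com/Cars/List")` returns the parse tree of the car listing page, ready for `find()` and `find_all()` calls. Passing `parser="html.parser"` switches to Python's built-in parser if `lxml` is not available.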

Libraries used for Web Scraping

  • BeautifulSoup: a Python package for parsing HTML and XML documents. It builds a parse tree from the page, which makes it easy to navigate the document and extract data.
  • requests: fetches web pages over HTTP. Calling requests.get(url) returns a response object whose .content attribute holds the page's HTML. Once we have the HTML, we can parse it for the data we want to analyze.
  • lxml: a fast, feature-rich library for processing XML and HTML. Beautiful Soup can use it as its parser.
  • itertools: from this module we import the zip_longest function. It makes an iterator that aggregates elements from each of the iterables. Iteration continues until the longest iterable is exhausted, and missing values from the shorter iterables are filled with fillvalue (None by default).
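The sketch below shows how `zip_longest` is typically used in this kind of scraper. Each field is collected into its own list, and the lists are then transposed into CSV rows. The helper `rows_to_csv` and the sample values are hypothetical.

```python
import csv
import io
from itertools import zip_longest


def rows_to_csv(header: list[str], columns: list[list[str]]) -> str:
    """Transpose per-field lists into rows and serialize them as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    # zip_longest pads shorter columns with "", so no row is dropped;
    # plain zip() would stop at the shortest list.
    writer.writerows(zip_longest(*columns, fillvalue=""))
    return buffer.getvalue()


if __name__ == "__main__":
    titles = ["Data Scientist", "Data Analyst"]
    companies = ["Acme"]  # one value missing, as can happen on a real page
    print(rows_to_csv(["title", "company"], [titles, companies]))
```

To save the result, write the returned text to a file opened with `open("jobs.csv", "w", newline="", encoding="utf-8")`. Note that `zip_longest` only pads at the end of the shorter lists. If a field is missing from a listing in the middle of the page, every later value in that column shifts up one row. Collecting each listing's fields together avoids that misalignment.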
