An Interest In:
Web News this Week
- April 1, 2024
- March 31, 2024
- March 30, 2024
- March 29, 2024
- March 28, 2024
- March 27, 2024
- March 26, 2024
Importing Data From The Web Into Python
Throughout your journey as a Data Scientist, you will find yourself regularly dealing with data. Sometimes, these data are readily available, while other times, you have to source for and gather the data yourself.
Your data can be gathered from various sources, but more often than not, you would get these data from the web.
Now imagine you found a website that has this gigantic enormous data that you find very useful. Unfortunately, there is no way you can download the contents on this website onto your device for analysis.
Manually collating the data from the website would cost you a great amount of time. Fortunately, you can seamlessly import these data using some Python packages.
Importing data using urlretrieve
Import the function urlretrieve from the urllib.request subpackage.
Assign the url of the website to a variable - url is used as example here.
Use the function urlretrieve to save this file locally. Pass two arguments to the function - the url of the website (which has been assigned to the variable url) and the name you wish to save the file as.
The data is now saved as a file on your device, which you can manage and wrangle as you wish.
Importing data using urlopen and Request
To fully understand how this works, you need to have a basic understanding of HTTP requests. But worry not, even if you do not understand requests, you can follow the steps below and import data from the web.
Import the functions urlopen and Request from the subpackage urllib.request.
Specify the url.
Package the request by calling Request on the url.
Send the request and catch the response.
The response gotten from your request is an object. To extract the content of the html, call the read method on the response object.
You can then print, wrangle and manage the content of the webpage.
Importing Data With requests
Then here comes the almighty requests package. It is an easier and more recommended way of performing the same import performed with urllib above.
Call requests' get method on the url. This packages the request, sends it and catches the response. All with one command. Pretty cool, right?
The response is an HTTP object. To access the contents of the response, call the text attribute on the object.
Note that there are several other actions you could take with the packages used above, like interacting with an API. However, for the context of this article, we are only concerned with using them for importing data from a webpage.
Woohoo! You can now easily import data from the web with Python.
The data imported however are HTML contents, with html tags, and other html attributes. They are therefore not quite ready for use or analysis.
To make them ready for use, you have to format them using a package called BeautifulSoup. This will be discussed in a follow-up article.
Till then, keep importing data with these packages and doing wonders with Python.
Original Link: https://dev.to/olayinkaatobiloye/importing-data-from-the-web-into-python-3h0n
Dev To
An online community for sharing and discovering great ideas, having debates, and making friendsMore About this Source Visit Dev To