PolarSPARC
Hands-on with Python Requests
Bhaskar S | 07/03/2021
Overview
requests is an elegant, human-friendly, and popular Python library for making HTTP requests.
Installation
The installation is assumed to be on a Linux desktop running Ubuntu 20.04 LTS. To install the requests Python module, open a terminal window and execute the following command:
$ pip3 install requests
On successful installation, we should be ready to start using Python requests.
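A quick way to confirm that the module is importable is to print its version from Python. This is only a minimal check and not part of the original steps; the version printed depends on what pip3 installed on the local machine.

```python
import requests

# Print the installed version of the requests module; the exact value
# depends on the local installation
print('requests version: %s' % requests.__version__)
```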
Hands-on Python requests
The following is a simple Python script (first.py) that makes a GET request to the Hacker News site:
    #
    # @Author: Bhaskar S
    # @Blog:   https://www.polarsparc.com
    # @Date:   03 Jul 2021
    #

    import logging
    import requests

    logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


    def main():
        url = 'http://news.ycombinator.com/newest'
        logging.info('URL to GET: %s' % url)
        res = requests.get(url)
        logging.info('Type of res: %s' % type(res))
        logging.info('URL: %s, Status code: %d, Content length: %d' %
                     (res.url, res.status_code, len(res.content)))


    if __name__ == '__main__':
        main()
Some aspects of first.py above need a little explanation.
get(url) :: makes an HTTP GET request to the specified url and returns an object of type requests.models.Response
res.status_code :: the HTTP status code from the server. A code of 200 indicates success, 301 a permanent redirect, 401 unauthorized, 403 forbidden, 404 not found, 500 an internal server error, etc.
res.content :: the response content (in bytes) from the server
res.url :: the final URL of the response, after any redirects have been followed
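In practice, a script usually branches on the status code rather than just logging it. The following is a minimal sketch of that idea; the helper names classify_status and fetch, and the 10 second timeout, are assumptions made for illustration, while requests.codes and raise_for_status() are provided by the requests module.

```python
import requests


def classify_status(status_code):
    """Map an HTTP status code to a coarse category (hypothetical helper)."""
    if 200 <= status_code < 300:
        return 'success'
    if 300 <= status_code < 400:
        return 'redirect'
    if 400 <= status_code < 500:
        return 'client error'
    if 500 <= status_code < 600:
        return 'server error'
    return 'other'


def fetch(url, timeout=10):
    """GET the url and raise requests.HTTPError on a 4xx/5xx response."""
    res = requests.get(url, timeout=timeout)
    res.raise_for_status()
    return res


if __name__ == '__main__':
    # requests.codes provides readable names for the numeric codes
    for code in (requests.codes.ok, requests.codes.moved_permanently,
                 requests.codes.not_found, 500):
        print('%d -> %s' % (code, classify_status(code)))
```

Calling fetch() needs network access; the loop under the main guard only exercises the classification and runs offline.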
To run the Python script first.py, execute the following command:
$ python3 first.py
The following would be a typical output:
    2021-07-03 11:05:57,422 - URL to GET: http://news.ycombinator.com/newest
    2021-07-03 11:05:57,967 - Type of res: <class 'requests.models.Response'>
    2021-07-03 11:05:57,967 - URL: https://news.ycombinator.com/newest, Status code: 200, Content length: 41978
Interestingly, the final URL is 'https://news.ycombinator.com/newest' rather than the requested 'http://news.ycombinator.com/newest'.
The illustration below shows the same request made from a Chrome browser with the developer tools open:
As is evident from the illustration above, there is an HTTP redirection (301) involved.
By default, Python requests follows redirects for all the HTTP verbs except HEAD.
The following is a simple Python script (second.py) that makes the same GET request to the Hacker News site and shows the redirection:
    #
    # @Author: Bhaskar S
    # @Blog:   https://www.polarsparc.com
    # @Date:   03 Jul 2021
    #

    import logging
    import requests

    logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


    def main():
        url = 'http://news.ycombinator.com/newest'
        logging.info('URL to GET: %s' % url)
        res = requests.get(url)
        logging.info('Unicode content size: %d, Encoding: %s, Headers: %s' %
                     (len(res.text), res.encoding, res.headers))
        if res.history:
            for his in res.history:
                logging.info('History: status: %d, headers: %s' %
                             (his.status_code, his.headers))


    if __name__ == '__main__':
        main()
Some aspects of second.py above need a little explanation.
res.text :: the response content from the server, decoded to a Unicode string
res.encoding :: the encoding used to decode the raw bytes of the response into res.text
res.headers :: the HTTP response headers from the server, as a case-insensitive dictionary
res.history :: a list of requests.models.Response objects, one for each redirect (ordered from the oldest to the most recent) followed before reaching the target URL
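The relationship between res.history and the final response can be summarized as a list of hops. The following is a small sketch under the assumption that only the status code and URL of each hop are of interest; the function names redirect_chain and show_redirects and the timeout value are hypothetical.

```python
import requests


def redirect_chain(res):
    """Return (status_code, url) pairs for every hop, ending with the final response."""
    hops = [(his.status_code, his.url) for his in res.history]
    hops.append((res.status_code, res.url))
    return hops


def show_redirects(url, timeout=10):
    """Fetch the url (needs network access) and print each hop."""
    res = requests.get(url, timeout=timeout)
    for status, hop_url in redirect_chain(res):
        print('%d %s' % (status, hop_url))
```

For the Hacker News URL used above, show_redirects() would be expected to list the 301 hop followed by the final response, although the exact result depends on the server at the time of the call.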
To run the Python script second.py, execute the following command:
$ python3 second.py
The following would be a typical output:
    2021-07-03 12:08:01,099 - URL to GET: http://news.ycombinator.com/newest
    2021-07-03 12:08:01,654 - Unicode content size: 41126, Encoding: utf-8, Headers: {'Server': 'nginx', 'Date': 'Sat, 03 Jul 2021 16:08:01 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'Cache-Control': 'private; max-age=0', 'X-Frame-Options': 'DENY', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '1; mode=block', 'Referrer-Policy': 'origin', 'Strict-Transport-Security': 'max-age=31556900', 'Content-Security-Policy': "default-src 'self'; script-src 'self' 'unsafe-inline' https://www.google.com/recaptcha/ https://www.gstatic.com/recaptcha/ https://cdnjs.cloudflare.com/; frame-src 'self' https://www.google.com/recaptcha/; style-src 'self' 'unsafe-inline'", 'Content-Encoding': 'gzip'}
    2021-07-03 12:08:01,654 - History: status: 301, headers: {'Server': 'nginx', 'Date': 'Sat, 03 Jul 2021 16:08:01 GMT', 'Content-Type': 'text/html', 'Content-Length': '178', 'Connection': 'keep-alive', 'Location': 'https://news.ycombinator.com/newest'}
To disable the default redirect handling, pass allow_redirects=False. The following simple Python script (third.py) makes the same GET request to the Hacker News site with redirects disabled:
    #
    # @Author: Bhaskar S
    # @Blog:   https://www.polarsparc.com
    # @Date:   03 Jul 2021
    #

    import logging
    import requests

    logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


    def main():
        url = 'http://news.ycombinator.com/newest'
        logging.info('URL to GET: %s' % url)
        res = requests.get(url, allow_redirects=False)
        logging.info('Unicode content: %s' % res.text)
        logging.info('Status code: %d, Location: %s' %
                     (res.status_code, res.headers['Location']))
        if res.history:
            for his in res.history:
                logging.info('History: status: %d, headers: %s' %
                             (his.status_code, his.headers))


    if __name__ == '__main__':
        main()
Some aspects of third.py above need a little explanation.
allow_redirects=False :: flag that disables the default redirection behavior, so the 301 response itself is returned and res.history stays empty
res.headers['Location'] :: the URL to redirect to, as indicated by the server
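With automatic redirects disabled, the caller decides whether to follow the Location header. The following is a sketch of one way to do that by hand, assuming a cap of 5 hops is acceptable; next_location and follow_manually are hypothetical names, while res.is_redirect and requests.TooManyRedirects come from the requests module.

```python
from urllib.parse import urljoin

import requests


def next_location(res):
    """Return the absolute URL a redirect response points to, or None."""
    if not res.is_redirect:
        return None
    # The Location header may be relative, so resolve it against res.url
    return urljoin(res.url, res.headers['Location'])


def follow_manually(url, max_hops=5, timeout=10):
    """Follow redirects one hop at a time (needs network access)."""
    for _ in range(max_hops):
        res = requests.get(url, allow_redirects=False, timeout=timeout)
        target = next_location(res)
        if target is None:
            return res
        print('%d -> %s' % (res.status_code, target))
        url = target
    raise requests.TooManyRedirects('more than %d redirects' % max_hops)
```

Checking res.is_redirect before reading the header also avoids a KeyError on responses that carry no Location header, such as a plain 200.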
To run the Python script third.py, execute the following command:
$ python3 third.py
The following would be a typical output:
    2021-07-03 13:36:37,936 - URL to GET: http://news.ycombinator.com/newest
    2021-07-03 13:36:38,134 - Unicode content: <html>
    <head><title>301 Moved Permanently</title></head>
    <body bgcolor="white">
    <center><h1>301 Moved Permanently</h1></center>
    <hr><center>nginx</center>
    </body>
    </html>
    2021-07-03 13:36:38,134 - Status code: 301, Location: https://news.ycombinator.com/newest
Until now, we have been exploring the HTTP GET method. The other commonly used methods are POST, PUT, and DELETE. The following simple Python script (fourth.py) demonstrates these common HTTP methods by making requests to the simple HTTP request/response site https://httpbin.org:
    #
    # @Author: Bhaskar S
    # @Blog:   https://www.polarsparc.com
    # @Date:   03 Jul 2021
    #

    import logging
    import requests

    logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


    def http_get():
        url = 'https://httpbin.org/get'
        logging.info('[GET] URL: %s' % url)
        res = requests.get(url)
        logging.info('[GET] Status code: %d' % res.status_code)
        logging.info('[GET] Content: %s' % res.text)
        logging.info('[GET] Headers: %s' % res.headers)


    def http_post():
        url = 'https://httpbin.org/post'
        payload = {'abc': '123', 'def': '456'}
        logging.info('[POST] URL: %s' % url)
        res = requests.post(url, data=payload)
        logging.info('[POST] Status code: %d' % res.status_code)
        logging.info('[POST] Content: %s' % res.text)
        logging.info('[POST] Headers: %s' % res.headers)


    def http_put():
        url = 'https://httpbin.org/put'
        payload = {'abc': '789'}
        logging.info('[PUT] URL: %s' % url)
        res = requests.put(url, data=payload)
        logging.info('[PUT] Status code: %d' % res.status_code)
        logging.info('[PUT] Content: %s' % res.text)
        logging.info('[PUT] Headers: %s' % res.headers)


    def http_delete():
        url = 'https://httpbin.org/delete'
        logging.info('[DELETE] URL: %s' % url)
        res = requests.delete(url)
        logging.info('[DELETE] Status code: %d' % res.status_code)
        logging.info('[DELETE] Content: %s' % res.text)
        logging.info('[DELETE] Headers: %s' % res.headers)


    if __name__ == '__main__':
        http_get()
        http_post()
        http_put()
        http_delete()
Some aspects of fourth.py above need a little explanation.
post(url, data=payload) :: makes a POST request to the specified URL; the specified data (a Python dictionary) is form-encoded and sent as the payload in the body of the request
put(url, data=payload) :: makes a PUT request to the specified URL; the specified data (a Python dictionary) is form-encoded and sent as the payload in the body of the request
delete(url) :: makes a DELETE request to the specified URL
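To see what a dictionary passed through data= looks like on the wire, the request can be built without being sent. The following sketch uses requests.Request and its prepare() method for that purpose and, for comparison, prepares the same dictionary through the json= parameter; the variable names form_req and json_req are chosen only for illustration.

```python
import json

import requests

payload = {'abc': '123', 'def': '456'}

# Build the requests without sending them, so no network access is needed
form_req = requests.Request('POST', 'https://httpbin.org/post', data=payload).prepare()
json_req = requests.Request('POST', 'https://httpbin.org/post', json=payload).prepare()

# data= with a dictionary produces a form-encoded body
print(form_req.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form_req.body)                     # abc=123&def=456

# json= serializes the dictionary to JSON and sets the matching Content-Type
print(json_req.headers['Content-Type'])  # application/json
print(json.loads(json_req.body))
```

The form-encoded body is 15 characters long, which matches the Content-Length of 15 that httpbin echoes back for the POST request.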
To run the Python script fourth.py, execute the following command:
$ python3 fourth.py
The following would be a typical output:
    2021-07-03 14:20:11,307 - [GET] URL: https://httpbin.org/get
    2021-07-03 14:20:11,426 - [GET] Status code: 200
    2021-07-03 14:20:11,427 - [GET] Content: {
      "args": {},
      "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.22.0",
        "X-Amzn-Trace-Id": "Root=1-60e0aa5b-38ffcad8663ebd196c350476"
      },
      "origin": "173.71.122.117",
      "url": "https://httpbin.org/get"
    }
    2021-07-03 14:20:11,427 - [GET] Headers: {'Date': 'Sat, 03 Jul 2021 18:20:11 GMT', 'Content-Type': 'application/json', 'Content-Length': '308', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
    2021-07-03 14:20:11,427 - [POST] URL: https://httpbin.org/post
    2021-07-03 14:20:11,557 - [POST] Status code: 200
    2021-07-03 14:20:11,558 - [POST] Content: {
      "args": {},
      "data": "",
      "files": {},
      "form": {
        "abc": "123",
        "def": "456"
      },
      "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Content-Length": "15",
        "Content-Type": "application/x-www-form-urlencoded",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.22.0",
        "X-Amzn-Trace-Id": "Root=1-60e0aa5b-7ceaddef2d5d20555ac9f775"
      },
      "json": null,
      "origin": "173.71.122.117",
      "url": "https://httpbin.org/post"
    }
    2021-07-03 14:20:11,558 - [POST] Headers: {'Date': 'Sat, 03 Jul 2021 18:20:11 GMT', 'Content-Type': 'application/json', 'Content-Length': '498', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
    2021-07-03 14:20:11,558 - [PUT] URL: https://httpbin.org/put
    2021-07-03 14:20:11,677 - [PUT] Status code: 200
    2021-07-03 14:20:11,677 - [PUT] Content: {
      "args": {},
      "data": "",
      "files": {},
      "form": {
        "abc": "789"
      },
      "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Content-Length": "7",
        "Content-Type": "application/x-www-form-urlencoded",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.22.0",
        "X-Amzn-Trace-Id": "Root=1-60e0aa5b-7d8ec6be57aaee1e3f250932"
      },
      "json": null,
      "origin": "173.71.122.117",
      "url": "https://httpbin.org/put"
    }
    2021-07-03 14:20:11,677 - [PUT] Headers: {'Date': 'Sat, 03 Jul 2021 18:20:11 GMT', 'Content-Type': 'application/json', 'Content-Length': '477', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
    2021-07-03 14:20:11,677 - [DELETE] URL: https://httpbin.org/delete
    2021-07-03 14:20:11,790 - [DELETE] Status code: 200
    2021-07-03 14:20:11,791 - [DELETE] Content: {
      "args": {},
      "data": "",
      "files": {},
      "form": {},
      "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Content-Length": "0",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.22.0",
        "X-Amzn-Trace-Id": "Root=1-60e0aa5b-5bf7b2721da78f1344ff15b6"
      },
      "json": null,
      "origin": "173.71.122.117",
      "url": "https://httpbin.org/delete"
    }
    2021-07-03 14:20:11,791 - [DELETE] Headers: {'Date': 'Sat, 03 Jul 2021 18:20:11 GMT', 'Content-Type': 'application/json', 'Content-Length': '402', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
Often we need to interact with HTTP-based REST API services using GET (to query for resources), POST (to create a new resource), PUT (to update an existing resource), or DELETE (to delete a resource). REST services typically take a JSON payload and respond with a JSON payload. The following simple Python script (fifth.py) demonstrates the POST and PUT methods by making API requests to the fake JSON API service at https://jsonplaceholder.typicode.com/:
    #
    # @Author: Bhaskar S
    # @Blog:   https://www.polarsparc.com
    # @Date:   03 Jul 2021
    #

    import logging
    import requests

    logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


    def api_post():
        url = 'https://jsonplaceholder.typicode.com/posts'
        headers = {'Content-Type': 'application/json'}
        json = {'title': 'Learning Python, 5th',
                'body': 'An in-depth introductory Python book',
                'userId': 3}
        logging.info('[POST] URL: %s' % url)
        res = requests.post(url, headers=headers, json=json)
        logging.info('[POST] Status code: %d' % res.status_code)
        logging.info('[POST] Response: %s' % res.json())


    def api_put():
        url = 'https://jsonplaceholder.typicode.com/posts/1'
        headers = {'Content-Type': 'application/json'}
        json = {'id': 101,
                'title': 'Learning Python, 5th',
                'body': 'A comprehensive, in-depth introduction to the core Python language',
                'userId': 3}
        logging.info('[PUT] URL: %s' % url)
        res = requests.put(url, headers=headers, json=json)
        logging.info('[PUT] Status code: %d' % res.status_code)
        logging.info('[PUT] Response: %s' % res.json())


    if __name__ == '__main__':
        api_post()
        api_put()
Some aspects of fifth.py above need a little explanation.
res.json() :: parses the JSON response body from the server and returns it as a Python object (a dictionary in this example)
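Since res.json() fails when the body is not valid JSON (for example, an HTML error page), it can help to guard the call. The following is a minimal sketch of such a guard; parse_json and create_post are hypothetical helpers, and the 10 second timeout is an arbitrary choice.

```python
import requests


def parse_json(res, default=None):
    """Return the decoded JSON body, or default when the body is not valid JSON."""
    try:
        return res.json()
    except ValueError:
        # res.json() raises a ValueError subclass when decoding fails
        return default


def create_post(title, body, user_id, timeout=10):
    """POST a new resource to the fake API (needs network access)."""
    url = 'https://jsonplaceholder.typicode.com/posts'
    res = requests.post(url, json={'title': title, 'body': body, 'userId': user_id},
                        timeout=timeout)
    return res.status_code, parse_json(res, default={})
```

Note that the json= parameter already sets the Content-Type header to application/json, so create_post() does not pass an explicit headers dictionary.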
To run the Python script fifth.py, execute the following command:
$ python3 fifth.py
The following would be a typical output:
    2021-07-03 16:46:25,555 - [POST] URL: https://jsonplaceholder.typicode.com/posts
    2021-07-03 16:46:25,658 - [POST] Status code: 201
    2021-07-03 16:46:25,659 - [POST] Response: {'title': 'Learning Python, 5th', 'body': 'An in-depth introductory Python book', 'userId': 3, 'id': 101}
    2021-07-03 16:46:25,659 - [PUT] URL: https://jsonplaceholder.typicode.com/posts/1
    2021-07-03 16:46:25,756 - [PUT] Status code: 200
    2021-07-03 16:46:25,757 - [PUT] Response: {'id': 1, 'title': 'Learning Python, 5th', 'body': 'A comprehensive, in-depth introduction to the core Python language', 'userId': 3}
The following is a simple Python script (sixth.py) that makes a GET request to the PolarSPARC site:
    #
    # @Author: Bhaskar S
    # @Blog:   https://www.polarsparc.com
    # @Date:   03 Jul 2021
    #

    import logging
    import requests

    logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)


    def main():
        url = 'https://www.polarsparc.com/'
        logging.info('URL to GET: %s' % url)
        res = requests.get(url)
        logging.info('URL: %s, Status code: %d, Content: %s' %
                     (res.url, res.status_code, res.content))


    if __name__ == '__main__':
        main()
To run the Python script sixth.py, execute the following command:
$ python3 sixth.py
The following would be a typical output:
    2021-07-03 20:35:52,452 - URL to GET: https://www.polarsparc.com/
    2021-07-03 20:35:52,701 - URL: https://www.polarsparc.com/, Status code: 406, Content: b'Not Acceptable! Not Acceptable!
    An appropriate representation of the requested resource could not be found on this server. This error was generated by Mod_Security.