Understanding HyperText Transfer Protocol (HTTP)
Bhaskar S | 02/15/2009 |
We live in an Internet age where the World Wide Web (WWW) is part of our lives. Have you ever wondered how the Web works? What happens when you type a web address in your browser and press Enter?
HTTP is an application-level, request-response communication protocol of the Web. When we type a web address like “www.polarsparc.com” in our Web Browser, the browser makes a request to the Web Server for the web content of PolarSPARC. The Web Server sends the web content back as a response, and the Web Browser renders that content. The following diagram illustrates this scenario:
For convenience, we will refer to the Web Browser as the Client and the Web Server as the Server.
The Web Server is a container for various web resources such as text, images, videos, etc. Every web resource on a Web Server can be identified uniquely using a Uniform Resource Identifier (or URI for short). When a URI is used as an address to locate a web resource on a Web Server, it is known as a Uniform Resource Locator (or URL for short). For example, to locate the original HTTP specification from the Internet Engineering Task Force (IETF), we would use the URL http://www.rfc-editor.org/rfc/rfc1945.txt.
The format of any URI is as follows:
protocol://[user-id:password@]host[:port][/resource-path][?query-string][#fragment]
where,
protocol :: is the protocol used to address the resource, such as http or ftp
user-id:password :: is optional & specifies the authentication credentials to access the resource. If omitted, it means anonymous access
host :: specifies the server that can serve the resource
port :: is optional & defaults to the well-known port of the protocol (80 for http)
resource-path :: is optional & specifies the name of the resource
query-string :: is optional & provides additional context information for accessing the resource. It is specified as a string of name and value pairs like: name1=value1&name2=value2&name3=value3, etc
fragment :: is optional & specifies the location within the resource
For example, if we once again look at the URI http://www.rfc-editor.org/rfc/rfc1945.txt, the protocol is http, the host is www.rfc-editor.org, the port is not specified and hence defaults to 80, and the resource-path is /rfc/rfc1945.txt.
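The breakdown above can be checked with a few lines of Python. This is a minimal sketch using the standard `urllib.parse` module; the first URI is hypothetical (the credentials, port 8080, query string, and fragment are made up to exercise every component), and only the second one is the article's own example.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical URI that fills in every optional component; only the host and
# resource-path come from the article.
uri = ("http://user:secret@www.rfc-editor.org:8080/rfc/rfc1945.txt"
       "?name1=value1&name2=value2#section-5")

parts = urlsplit(uri)
print("protocol      :", parts.scheme)
print("user-id       :", parts.username)
print("password      :", parts.password)
print("host          :", parts.hostname)
print("port          :", parts.port)
print("resource-path :", parts.path)
print("query-string  :", parse_qsl(parts.query))  # list of (name, value) pairs
print("fragment      :", parts.fragment)

# The article's example has no explicit port, so urlsplit reports None and
# the client falls back to the default for the protocol (80 for http).
simple = urlsplit("http://www.rfc-editor.org/rfc/rfc1945.txt")
port = simple.port or 80
print("default port  :", port)
```

Note that `urlsplit` only splits the string into its parts; it does not contact the server or check that the resource exists.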
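It is clear that the Web Browser is the one that initiates action by sending an HTTP request to the Web Server. What does the HTTP request message look like? An HTTP request has the following structure:
<HTTP Method> <URI> <HTTP Version>
<Request Header Name><:><Request Header Value>
<Blank Line>
[Optional Body of Data]
<HTTP Method> indicates the action the client wants the server to perform on the resource identified by <URI>. The following table lists and describes the HTTP methods: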
HTTP Method | Description |
---|---|
GET | This is one of the most common methods. It is used to request a resource from the server |
HEAD | This is similar to the GET method except that the server returns only meta information about the requested resource and not the actual resource. This method can be used to check whether a resource is present or to determine the type of the resource |
POST | This is the other most commonly used method. It is used to send the [Optional Body of Data] to the server. This method is used when the data in user form(s) needs to be submitted to the server for further processing |
PUT | This method is used to create a new resource on the server (or replace an existing one) at the path provided by <URI> |
DELETE | This method is used to tell the server to delete the resource specified by <URI> |
OPTIONS | This method is used to get the various features supported by the server or the features supported for the given resource |
TRACE | This method asks the server to echo the received request back to the client (a loop-back response). It is usually used for diagnostics, to discover what the various intermediaries between the client and the server, such as proxies and firewalls, do to a request. Most web servers disable this method due to security vulnerabilities |
<Request Header Name><:><Request Header Value> indicates the HTTP request header(s) that the client can send to the server. The following table lists and describes the most commonly used request headers from the client:
Request Header | Description |
---|---|
Accept | This header indicates the types of data the client can handle. Here are some examples: Accept: */* -- Indicates it can accept any type of data response; Accept: text/html -- Indicates it can only accept html data response |
Accept-Encoding | This header indicates the types of data encoding (compression types) the client can handle. Here is an example: Accept-Encoding: gzip -- Indicates it can accept gzipped data |
Authorization | This header carries the user authentication credentials to the server. The most common type is Basic Auth. Here is an example: Authorization: Basic STa59v3wUPNb |
Host | This header indicates the server and port from where the resource <URI> is being requested. Here is an example: Host: www.polarsparc.com |
If-Modified-Since | This header tells the server to send the content for the requested resource <URI> only if it has been modified since the specified date and time. Here is an example: If-Modified-Since: Sat, 21 Feb 2009 16:50:00 GMT |
User-Agent | This header identifies the client from which the request is sent. Here is an example: User-Agent: Mozilla/5.0 (X11; Linux i686, en-US) Gecko/20090209 |
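To see how the method, URI, and request headers fit together on the wire, here is a minimal sketch that assembles a raw request message. The function name `build_request` and the `sketch-client` User-Agent value are hypothetical; the layout assumed is a request line, one header per line, a blank line, and then the optional body, with every line ending in CRLF.

```python
def build_request(method, uri, headers, body=b"", version="HTTP/1.1"):
    """Assemble a raw HTTP request: request line, headers, blank line, optional body.

    build_request is a hypothetical helper for illustration only.
    """
    lines = [f"{method} {uri} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; an empty line separates the headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


request = build_request(
    "GET",
    "/",
    {
        "Host": "www.polarsparc.com",
        "Accept": "text/html",
        "Accept-Encoding": "gzip",
        "User-Agent": "sketch-client",  # hypothetical client name
    },
)
print(request.decode("ascii"))
```

A POST request would pass the form data as `body` and would normally add a Content-Length header giving the size of that body in bytes.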
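When a Web Browser sends an HTTP request, the Web Server has to send an HTTP response. What does the HTTP response message look like? An HTTP response has the following structure:
<HTTP Version> <Status Code> <Reason Phrase>
<Response Header Name><:><Response Header Value>
<Blank Line>
[Optional Resource Content]
<Status Code> indicates the result code of processing a client request. <Reason Phrase> is a short textual description of the status code. The following table lists and describes some commonly used status codes: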
Status Code | Reason Phrase | Description |
---|---|---|
100 | Continue | Indicates that the server has received the initial part of the request (the headers) and that the client should continue sending the rest of the request and expect a final response from the server |
200 | OK | Indicates the client request was processed successfully |
201 | Created | Indicates that the resource corresponding to HTTP PUT was successfully created |
301 | Moved Permanently | Indicates the resource indicated by <URI> has been moved to a new location and hence has a new resource path |
304 | Not Modified | Indicates that the content has not changed since the last time it was requested by the client |
400 | Bad Request | Indicates that the client request is malformed and could not be understood by the server |
401 | Unauthorized | Indicates that the client needs to authenticate before accessing the resource |
404 | Not Found | Indicates that the resource indicated by <URI> was not found |
500 | Internal Server Error | Indicates an internal error on the server |
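The first digit of a status code gives its class (1xx informational, 2xx success, 3xx redirection, 4xx client error, 5xx server error). The sketch below looks up the reason phrase for each code in the table using Python's standard `http.HTTPStatus` enumeration; the helper name `describe` and the class labels are illustrative choices, not part of HTTP itself.

```python
from http import HTTPStatus

# Class labels keyed by the first digit of the status code.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return 'code reason-phrase (class)' for a known status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    return f"{status.value} {status.phrase} ({CLASSES[code // 100]})"


for code in (100, 200, 201, 301, 304, 400, 401, 404, 500):
    print(describe(code))
```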
<Response Header Name><:><Response Header Value> indicates the HTTP response header(s) that the server sends to the client. The following table lists and describes the most commonly used response headers from the server:
Response Header | Description |
---|---|
Content-Type | This header indicates the type of [Optional Resource Content]. Here is an example: Content-Type: text/xml -- Indicates the type of resource content in the response to be XML |
Content-Encoding | This header indicates the type of resource content encoding (compression types). Here is an example: Content-Encoding: gzip -- Indicates that the resource content is gzipped |
Content-Length | This header indicates the length of the resource content in bytes. Here is an example: Content-Length: 1245 |
Expires | This header indicates the time after which the resource content is to be considered stale. It is basically used to invalidate any cached copy. Here is an example: Expires: Sat, 21 Feb 2009 16:30:00 GMT |
Last-Modified | This header indicates the last time the resource content was modified. Here is an example: Last-Modified: Sat, 21 Feb 2009 16:50:00 GMT |
Location | This response header indicates a redirect and provides a new location for the resource. Here is an example: Location: www.polarsparc.com/unknown/location/res_one |
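On the client side, a response is taken apart in the reverse order: status line first, then the headers, then the content. The following is a minimal sketch, assuming a complete response is already in memory; `parse_response` is a hypothetical helper, and the sample response is made up from the header examples in the table above.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into version, status code, reason, headers, body.

    Hypothetical helper: it ignores chunked transfer and repeated headers
    (a later header with the same name overwrites an earlier one).
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The reason phrase may contain spaces, so split at most twice.
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(code), reason, headers, body


# Made-up response for illustration; Content-Length matches the 5-byte body.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/xml\r\n"
    b"Content-Length: 5\r\n"
    b"Last-Modified: Sat, 21 Feb 2009 16:50:00 GMT\r\n"
    b"\r\n"
    b"<ok/>"
)

version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)
print(headers["content-type"], headers["content-length"])
print(body)
```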
Having explored both the HTTP request and the HTTP response messages, the following diagram illustrates the web interaction:
With this we have covered the basics of the HTTP request-response protocol.
We can try some of the HTTP requests using the telnet command to see HTTP in action. Open a new terminal and type the following HTTP request:
$ telnet www.yahoo.com 80
GET / HTTP/1.1
Accept: text/html
Host: www.yahoo.com
User-Agent: telnet
<Press Enter twice>
The following is the HTTP response from www.yahoo.com:
HTTP/1.1 200 OK
Date: Sun, 22 Feb 2009 03:46:12 GMT
P3P: policyref="http://info.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC GOV"
Cache-Control: private
Vary: User-Agent
X-XRDS-Location: http://open.login.yahooapis.com/openid20/www.yahoo.com/xrds
Last-Modified: Sun, 22 Feb 2009 03:04:24 GMT
Accept-Ranges: bytes
Content-Length: 9490
Connection: close
Content-Type: text/html; charset=utf-8
[<html content here>]
<!-- pbt 1235271720 -->Connection closed by foreign host.
Let's try another HTTP request as follows:
$ telnet www.w3c.org 80
GET / HTTP/1.1
Accept: text/html
Host: www.w3c.org
User-Agent: telnet
<Press Enter twice>
The following is the HTTP response from www.w3c.org:
HTTP/1.1 301 Moved Permanently
Date: Sun, 22 Feb 2009 03:43:07 GMT
Server: Apache/2
Location: http://www.w3.org/
Content-Length: 226
Connection: close
Content-Type: text/html; charset=iso-8859-1
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://www.w3.org/">here</a>.</p>
</body></html>
Connection closed by foreign host.
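The same exchange can be scripted instead of typed into telnet. The sketch below opens a TCP connection with Python's standard `socket` module and sends the request text by hand; the function names `build_get_request` and `http_get` and the User-Agent value are hypothetical. The `Connection: close` header is an addition not present in the telnet sessions above; it asks the server to close the connection after responding so that the read loop knows when the response is complete.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the same kind of GET request that was typed into telnet."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Accept: text/html\r\n"
        "User-Agent: python-socket-sketch\r\n"  # hypothetical client name
        "Connection: close\r\n"                 # assumption: ask the server to close when done
        "\r\n"
    ).encode("ascii")


def http_get(host: str, port: int = 80, path: str = "/") -> bytes:
    """Send one GET request over a plain TCP socket and return the raw response."""
    chunks = []
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `http_get("www.w3c.org")` would repeat the second telnet exchange; what comes back depends on the live server, so no output is shown here.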
This should give you an idea of how HTTP works!