Understanding HyperText Transfer Protocol (HTTP)

We live in an Internet age where the World Wide Web (WWW) is part of our lives. Have you ever wondered how the Web works? What happens when you type a web address into your browser and press the Enter key? At a very high level, the Web Browser requests information from the Web Server corresponding to the web address, and the Web Server responds by sending the appropriate information. Behind the scenes, the Web Browser and the Web Server communicate with each other using the HyperText Transfer Protocol (HTTP for short).

HTTP is the application-level, request-response communication protocol of the Web. When we type a web address like “www.polarsparc.com” into our Web Browser, the browser makes a request to the Web Server for the web content of PolarSPARC. The Web Server sends the web content back as a response, and the Web Browser renders it. The following diagram illustrates this scenario:

The Web Server is a container for various web resources such as text, images, videos, etc. Every web resource on a Web Server can be identified uniquely using a Uniform Resource Identifier (or URI for short). When a URI is used as an address to locate a web resource on a Web Server, it is known as a Uniform Resource Locator (or URL for short). For example, to locate the original HTTP protocol specification published by the Internet Engineering Task Force (IETF), we would use the URL http://www.rfc-editor.org/rfc/rfc1945.txt.

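A URL has the following general format:

protocol://user-id:password@host:port/resource-path?query-string#fragment

where,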

protocol :: is the protocol used to access the resource, such as http or ftp
user-id:password :: is optional and specifies the authentication credentials used to access the resource. If omitted, access is anonymous
host :: specifies the server that can serve the resource
port :: is optional and defaults to 80 for http
resource-path :: is optional and specifies the path of the resource on the server
query-string :: is optional and provides additional context information for accessing the resource. It is specified as a string of name and value pairs such as name1=value1&name2=value2&name3=value3
fragment :: is optional and specifies a location within the resource

For example, if we once again look at the URL http://www.rfc-editor.org/rfc/rfc1945.txt, the protocol is http, the host is www.rfc-editor.org, the port is not specified and hence defaults to 80, and the resource-path is /rfc/rfc1945.txt.
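
The breakdown above can be reproduced with Python's standard urllib.parse module. This is a minimal sketch: the helper name split_url, the dictionary keys, and the second URL (www.example.com with made-up credentials) are assumptions chosen for illustration, not something the protocol prescribes.

```python
from urllib.parse import urlsplit, parse_qsl


def split_url(url, default_port=80):
    """Break a URL into the components described in the text."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "user-id": parts.username,            # None when omitted
        "password": parts.password,           # None when omitted
        "host": parts.hostname,
        "port": parts.port or default_port,   # fall back to 80 for http
        "resource-path": parts.path,
        "query-string": dict(parse_qsl(parts.query)),
        "fragment": parts.fragment,
    }


rfc = split_url("http://www.rfc-editor.org/rfc/rfc1945.txt")
print(rfc["protocol"], rfc["host"], rfc["port"], rfc["resource-path"])
# prints: http www.rfc-editor.org 80 /rfc/rfc1945.txt

# Hypothetical URL that exercises every optional component.
full = split_url(
    "http://alice:secret@www.example.com:8080/docs/page.html"
    "?name1=value1&name2=value2#section2"
)
print(full["port"], full["query-string"], full["fragment"])
```

Falling back to port 80 is only appropriate for the http protocol; other protocols have their own defaults.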

It is clear that the Web Browser is the one that initiates the exchange by sending an HTTP request to the Web Server. What does an HTTP request message look like? An HTTP request has the following structure:

<HTTP Method> <URI> <HTTP Version> <CR><LF>
<Request Header Name-1><:> <Request Header Value-1> <CR><LF>
<Request Header Name-2><:> <Request Header Value-2> <CR><LF>
...<CR><LF>
...<CR><LF>
<Request Header Name-n><:> <Request Header Value-n> <CR><LF>
<CR><LF>
[Optional Body of Data]
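
To make the layout concrete, here is a small sketch that serializes a request in exactly this shape. The function name build_request and its parameters are hypothetical; the request line, the header lines, the blank line, and the optional body follow the structure shown above.

```python
def build_request(method, uri, headers, body=b"", version="HTTP/1.1"):
    """Serialize an HTTP request: request line, headers, blank line, body."""
    crlf = "\r\n"
    lines = [f"{method} {uri} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # The empty line (a bare CRLF) marks the end of the headers.
    head = crlf.join(lines) + crlf + crlf
    return head.encode("ascii") + body


request = build_request(
    "GET",
    "/rfc/rfc1945.txt",
    {"Host": "www.rfc-editor.org", "Accept": "*/*"},
)
print(request.decode("ascii"))
```

The body is kept as bytes because it may carry binary data, while the request line and headers are plain ASCII text.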

<HTTP Method> can be one of the following: GET, HEAD, POST, PUT, DELETE, OPTIONS, or TRACE. These are sometimes referred to as HTTP Verbs. The following table describes each of the HTTP methods:

HTTP Method	Description
GET	This is one of the most common methods. It is used to request a resource from the server
HEAD	This is similar to the GET method except that the server returns only meta information about the requested resource and not the resource itself. This method can be used to check whether a resource is present or to determine the type of the resource
POST	This is the other most commonly used method. It is used to send the [Optional Body of Data] to the server, for example when the data in a user form needs to be submitted to the server for further processing
PUT	This method is used to create (or replace) the resource on the server whose path is provided by <URI> and whose content is [Optional Body of Data]
DELETE	This method is used to tell the server to delete the resource specified by <URI>. The server can decline this request from a client and not delete the resource
OPTIONS	This method is used to discover the features supported by the server or the features supported for the given resource
TRACE	This method is usually used to discover information about the various intermediaries between the client and the server, such as proxies and firewalls. The server loops the received request back in its response, which is useful for diagnostics. Most web servers disable this method due to security vulnerabilities
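
As an illustration of the POST row, the sketch below builds a form submission. It assumes the common application/x-www-form-urlencoded encoding for form data; the function name build_form_post, the /signup path, the www.example.com host, and the field names are all hypothetical.

```python
from urllib.parse import urlencode


def build_form_post(uri, host, fields):
    """Build a POST request whose body carries URL-encoded form fields."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {uri} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


post = build_form_post(
    "/signup", "www.example.com", {"name": "Alice", "city": "New York"}
)
print(post.decode("ascii"))
```

A GET request for the same URI would carry no body at all, which is the practical difference between the two most common methods.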

<Request Header Name><:><Request Header Value> indicates the HTTP request header(s) that the client can send to the server. The following table lists and describes the most commonly used request headers from the client:

Request Header	Description
Accept	This header indicates the types of data the client can handle. Here are some examples: Accept: */* -- indicates the client can accept any type of data in the response; Accept: text/html -- indicates the client can only accept HTML data in the response
Accept-Encoding	This header indicates the types of data encoding (compression types) the client can handle. Here is an example: Accept-Encoding: gzip -- indicates the client can accept gzipped data
Authorization	This header carries the user authentication credentials to the server. The most common type is Basic authentication. Here is an example: Authorization: Basic STa59v3wUPNb
Host	This header indicates the server and port from which the resource <URI> is being requested. Here is an example: Host: www.polarsparc.com
If-Modified-Since	This header tells the server to send the content for the requested resource <URI> only if it has been modified since the given date and time, typically the last time the client requested it. Here is an example: If-Modified-Since: Sat, 21 Feb 2009 16:50:00 GMT
User-Agent	This header identifies the client software from which the request is sent. Here is an example: User-Agent: Mozilla/5.0 (X11; Linux i686, en-US) Gecko/20090209
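
Two of these header values have a fixed encoding that is easy to get wrong by hand: a Basic Authorization value is the Base64 encoding of "user-id:password", and If-Modified-Since uses the HTTP date format. The following sketch builds both with the standard library; the helper names basic_auth and http_date and the alice/secret credentials are assumptions for illustration.

```python
import base64
from datetime import datetime, timezone
from email.utils import format_datetime


def basic_auth(user, password):
    """Return a Basic Authorization value: Base64 of 'user:password'."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def http_date(moment):
    """Format an aware datetime in the HTTP date format (always GMT)."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


headers = {
    "Host": "www.polarsparc.com",
    "Accept": "text/html",
    "Accept-Encoding": "gzip",
    "Authorization": basic_auth("alice", "secret"),  # hypothetical credentials
    "If-Modified-Since": http_date(
        datetime(2009, 2, 21, 16, 50, tzinfo=timezone.utc)
    ),
}
for name, value in headers.items():
    print(f"{name}: {value}")
```

Note that Base64 is an encoding, not encryption, so Basic credentials are readable by anyone who can see the request.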

When a Web Browser sends an HTTP request, the Web Server has to send an HTTP response. What does an HTTP response message look like? An HTTP response has the following structure:

<HTTP Version> <Status Code> <Reason Phrase> <CR><LF>
<Response Header Name-1><:> <Response Header Value-1> <CR><LF>
<Response Header Name-2><:> <Response Header Value-2> <CR><LF>
...<CR><LF>
...<CR><LF>
<Response Header Name-n><:> <Response Header Value-n> <CR><LF>
<CR><LF>
[Optional Resource Content]
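
Reading a response is the reverse of the layout above: split at the first empty line, then take the status line and the headers apart. The sketch below shows one way to do it; the function name parse_response and the returned dictionary keys are hypothetical, and the sample message is a shortened version of the 301 response shown in Output.2.

```python
def parse_response(raw):
    """Split a raw HTTP response into status line, headers, and content."""
    head, _, content = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return {
        "version": parts[0],
        "status": int(parts[1]),
        "reason": parts[2] if len(parts) > 2 else "",
        "headers": headers,
        "content": content,
    }


raw = (
    b"HTTP/1.1 301 Moved Permanently\r\n"
    b"Server: Apache/2\r\n"
    b"Location: http://www.w3.org/\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b"<html>moved</html>"
)
response = parse_response(raw)
print(response["status"], response["reason"], response["headers"]["location"])
# prints: 301 Moved Permanently http://www.w3.org/
```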

<Status Code> indicates the result code of processing a client request. <Reason Phrase> is the human readable interpretation of the result code. The following table describes some of the common codes and phrases:

Status Code	Reason Phrase	Description
100	Continue	Indicates that the server has received the initial part of the request and the client should continue sending the rest and then expect a final response from the server
200	OK	Indicates the client request was processed successfully
201	Created	Indicates that the resource was successfully created, for example in response to an HTTP PUT
301	Moved Permanently	Indicates the resource indicated by <URI> has been moved to a new location and hence has a new resource path
304	Not Modified	Indicates that the content has not changed since the last time it was requested by the client
400	Bad Request	Indicates that the client request is malformed
401	Unauthorized	Indicates that the client needs to authenticate before accessing the resource
404	Not Found	Indicates that the resource indicated by <URI> was not found
500	Internal Server Error	Indicates an internal error on the server
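
Python's standard http module already knows the reason phrase for each status code, and the first digit of a code tells its general class. The sketch below combines the two; the CLASSES labels and the describe helper are assumptions added for illustration, with the grouping by first digit being standard HTTP convention rather than something listed in the table.

```python
from http import HTTPStatus

# First digit of the status code -> general class of the response.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return 'code phrase (class)' for a known status code."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase} ({CLASSES[status.value // 100]})"


for code in (100, 200, 201, 301, 304, 400, 401, 404, 500):
    print(describe(code))
```

A client can usually branch on the class alone and only look at the exact code when it needs to handle a specific case such as 304 or 401.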

<Response Header Name><:><Response Header Value> indicates the HTTP response header(s) that the server sends to the client. The following table lists and describes the most commonly used response headers from the server:

Response Header	Description
Content-Type	This header indicates the type of [Optional Resource Content]. Here is an example: Content-Type: text/xml -- indicates that the resource content in the response is XML
Content-Encoding	This header indicates the type of resource content encoding (compression type). Here is an example: Content-Encoding: gzip -- indicates that the resource content is gzipped
Content-Length	This header indicates the length of the resource content in bytes. Here is an example: Content-Length: 1245
Expires	This header indicates the time after which the resource content is to be considered stale, so that any cached copy is invalidated. Here is an example: Expires: Sat, 21 Feb 2009 16:30:00 GMT
Last-Modified	This header indicates the last time the resource content was modified. Here is an example: Last-Modified: Sat, 21 Feb 2009 16:50:00 GMT
Location	This response header indicates a redirect and provides the new <URI> the client needs to use for the requested resource. Here is an example: Location: http://www.polarsparc.com/unknown/location/res_one
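
A client typically acts on these headers before it touches the content. The sketch below checks Content-Length, undoes a gzip Content-Encoding, and tests the Expires time. The helper names decode_content and is_stale and the sample XML payload are assumptions for illustration, and real caching rules involve more headers than Expires alone.

```python
import gzip
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def decode_content(headers, content):
    """Verify Content-Length, then undo gzip Content-Encoding if present."""
    expected = int(headers["Content-Length"])
    if len(content) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(content)}")
    if headers.get("Content-Encoding") == "gzip":
        content = gzip.decompress(content)
    return content


def is_stale(headers, now):
    """Return True once the Expires time has passed."""
    return now > parsedate_to_datetime(headers["Expires"])


compressed = gzip.compress(b"<doc>hello</doc>")  # made-up XML payload
headers = {
    "Content-Type": "text/xml",
    "Content-Encoding": "gzip",
    "Content-Length": str(len(compressed)),
    "Expires": "Sat, 21 Feb 2009 16:30:00 GMT",
}
print(decode_content(headers, compressed))
print(is_stale(headers, datetime(2009, 2, 21, 17, 0, tzinfo=timezone.utc)))
```

Content-Length counts the bytes actually sent, so it is checked against the compressed content, before decompression.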

Output.1

HTTP/1.1 200 OK 
Date: Sun, 22 Feb 2009 03:46:12 GMT 
P3P: policyref="http://info.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC GOV" 
Cache-Control: private 
Vary: User-Agent 
X-XRDS-Location: http://open.login.yahooapis.com/openid20/www.yahoo.com/xrds 
Last-Modified: Sun, 22 Feb 2009 03:04:24 GMT 
Accept-Ranges: bytes 
Content-Length: 9490 
Connection: close 
Content-Type: text/html; charset=utf-8 

[<html content here>]

<!-- pbt 1235271720 -->Connection closed by foreign host.

Output.2

HTTP/1.1 301 Moved Permanently 
Date: Sun, 22 Feb 2009 03:43:07 GMT 
Server: Apache/2 
Location: http://www.w3.org/ 
Content-Length: 226 
Connection: close 
Content-Type: text/html; charset=iso-8859-1 

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> 
<html><head> 
<title>301 Moved Permanently</title> 
</head><body> 
<h1>Moved Permanently</h1> 
<p>The document has moved <a href="http://www.w3.org/">here</a>.</p> 
</body></html> 
Connection closed by foreign host.
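
Raw exchanges like the ones above can also be produced from a program by writing the request text to a TCP connection and reading until the server closes it. This is only a sketch under stated assumptions: the function name fetch_raw is hypothetical, www.example.com is a placeholder host, and the request must include "Connection: close" so that the server ends the connection after responding.

```python
import socket


def fetch_raw(host, port, request, timeout=10):
    """Send raw request bytes and return everything the server sends back."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(request)
        while True:
            data = conn.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


# Example usage (placeholder host; requires network access):
# request = (
#     b"GET / HTTP/1.1\r\n"
#     b"Host: www.example.com\r\n"
#     b"Connection: close\r\n"
#     b"\r\n"
# )
# print(fetch_raw("www.example.com", 80, request).decode("iso-8859-1"))
```

The returned bytes contain the status line, the response headers, the empty line, and the resource content, in the same order as the listings above.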
