The HyperText Transfer Protocol (HTTP), the Web’s application-layer protocol, is at the heart of the Web. HTTP defines how Web clients request Web pages from Web servers and how servers transfer Web pages to clients.
When a user requests a Web page (for example, clicks on a hyperlink), the browser sends HTTP request messages for the objects in the page to the server. The server receives the requests and responds with HTTP response messages that contain the objects. HTTP uses TCP as its underlying transport protocol. The HTTP client first initiates a TCP connection with the server. Once the connection is established, the browser and the server processes access TCP through their socket interfaces.
HTTP Message Format
There are two types of HTTP messages, request messages and response messages, both of which are discussed below.
HTTP Request Message
Below we provide a typical HTTP request message:
GET /somedir/page.html HTTP/1.1 Request line
Host: www.someschool.edu Header line (method field, URL field, HTTP version field)
Connection: close Doesn’t bother with persistent connection
User-agent: Mozilla/4.0 User agent
Accept-language: fr Receive a French version of the object
HTTP defines methods to indicate the desired action to be performed on the identified resource. What this resource represents, whether pre-existing data or data that is generated dynamically, depends on the implementation of the server. Often, the resource corresponds to a file or the output of an executable residing on the server. The HTTP/1.0 specification defined the GET, HEAD and POST methods and the HTTP/1.1 specification added five new methods: OPTIONS, PUT, DELETE, TRACE and CONNECT. By being specified in these documents, their semantics are well-known and can be depended on. Any client can use any method and the server can be configured to support any combination of methods.
HTTP Response Message
HTTP/1.1 200 OK
Date: Sat, 07 Jul 2007 12:00:15 GMT
Server: Apache/1.3.0 (Unix)
Last-Modified: Sun, 6 May 2007 09:23:24 GMT
In HTTP/1.0 and since, the first line of the HTTP response is called the status line and includes a numeric status code (such as “404”) and a textual reason phrase(such as “Not Found”). The way the user agent handles the response depends primarily on the code, and secondarily on the other response header fields. Custom status codes can be used, for if the user agent encounters a code it does not recognize, it can use the first digit of the code to determine the general class of the response.
A Web cache—also called a proxy server—is a network entity that satisfies HTTP requests on the behalf of an origin Web server. The Web cache has its own disk storage and keeps copies of recently requested objects in this storage. As shown in Figure, a user’s browser can be configured so that all of the user’s HTTP requests are first directed to the Web cache. Once a browser is configured, each browser request for an object is first directed to the Web cache. As an example, suppose a browser is requesting the object http://www.someschool.edu/campus.gif. Here is what happens:
- The browser establishes a TCP connection to the Web cache and sends an HTTP request for the object to the Web cache.
2.The Web cache checks to see if it has a copy of the object stored locally. If it does, the Web cache returns the object within an HTTP response message to
the client browser.
- If the Web cache does not have the object, the Web cache opens a TCP connection to the origin server, that is, to www.someschool.edu. The Web cache then sends an HTTP request for the object into the cache-to-server TCP connection. After receiving this request, the origin server sends the object within an HTTP response to the Web cache.
- When the Web cache receives the object, it stores a copy in its local storage and sends a copy, within an HTTP response message, to the client browser (over the existing TCP connection between the client browser and the Web cache).