April 29, 2013 07:53 pm GMT

HTTP: The Protocol Every Web Developer Must Know Part 2

In my previous article, we covered some of HTTP’s basics, such as the URL scheme, status codes and request/response headers. With that as our foundation, we will look at the finer aspects of HTTP, like connection handling, authentication and HTTP caching. These topics are fairly extensive, but we’ll cover the most important bits.

HTTP Connections

A connection must be established between the client and server before they can communicate with each other, and HTTP uses the reliable TCP transport protocol to make this connection. By default, web traffic uses TCP port 80. A TCP stream is broken into IP packets, and TCP ensures that the data in those packets is delivered complete and in the correct order, retransmitting any packets that are lost along the way. HTTP is an application layer protocol over TCP, which is over IP.

HTTPS is a secure version of HTTP, inserting an additional layer between HTTP and TCP called TLS or SSL (Transport Layer Security or Secure Sockets Layer, respectively). HTTPS communicates over port 443 by default, and we will look at HTTPS later in this article.

An HTTP connection is identified by <source-IP, source-port> and <destination-IP, destination-port>. On a client, an HTTP application is identified by an <IP, port> tuple. Completing a transaction between two endpoints is a multi-step process and involves the following:

resolve IP address from host name via DNS
establish a connection with the server
send a request
wait for a response
close connection
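The steps above can be sketched with Python's standard socket module. This is a minimal illustration, not production code: the function name fetch_once is hypothetical, and the sketch assumes the server closes the connection once the response is complete (which the Connection: close header asks it to do).

```python
import socket

def fetch_once(host, port=80, path="/"):
    """Perform a single HTTP transaction over a fresh TCP connection."""
    # 1. Resolve the host name to an address (DNS lookup for real host names).
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]

    # 2. Establish a TCP connection with the server.
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(5)
        sock.connect(sockaddr)

        # 3. Send the request. "Connection: close" asks the server to close
        #    the connection after the response, so we know when to stop reading.
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))

        # 4. Wait for the response: read until the server closes its side.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)

    # 5. Leaving the "with" block has closed our side of the connection.
    return b"".join(chunks)
```

The return value is the raw response, status line and headers included; a real client would go on to parse it.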

The server is responsible for always responding with the correct headers and responses.

In HTTP/1.0, all connections were closed after a single transaction. So, if a client wanted to request three separate images from the same server, it made three separate connections to the remote host. As you can see from the above diagram, this can introduce a lot of network delays, resulting in a sub-optimal user experience.

To reduce connection-establishment delays, HTTP/1.1 introduced persistent connections: long-lived connections that stay open until the client or the server closes them. Persistent connections are the default in HTTP/1.1, and making a single-transaction connection requires the client to set the Connection: close request header. This tells the server to close the connection after sending the response.
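The rule just described can be written down as a small decision function. This is a sketch only: the name connection_persists is hypothetical, and the HTTP/1.0 branch assumes the widely deployed Connection: keep-alive opt-in, which this article does not cover.

```python
def connection_persists(http_version, request_headers):
    """Decide whether a connection stays open after the current response."""
    # Header names are case-insensitive, so normalize them first.
    headers = {name.lower(): value for name, value in request_headers.items()}

    # The Connection header may carry several comma-separated tokens.
    tokens = {
        token.strip().lower()
        for token in headers.get("connection", "").split(",")
        if token.strip()
    }

    if http_version == "HTTP/1.1":
        # Persistent by default; the client opts out with "Connection: close".
        return "close" not in tokens

    # HTTP/1.0: closed after one transaction, unless the client opted in
    # with the keep-alive extension (an assumption, see the note above).
    return "keep-alive" in tokens
```

For example, an HTTP/1.1 request with no Connection header keeps the connection open, while the same request with Connection: close does not.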

In addition to persistent connections, browsers/clients also employ a technique, called parallel connections, to minimize network delays. The age-old concept of parallel connections involves creating a pool of connections (generally capped at six connections per host). If there are six assets that the client needs to download from a website, the client makes six parallel connections to download those assets, resulting in a faster turnaround. This is a huge improvement over serial connections, where the client only downloads an asset after completing the download of the previous one.
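To get a feel for the difference, here is a small sketch that compares serial and parallel downloads using a thread pool capped at six workers. The download function is a hypothetical stand-in that sleeps instead of touching the network, so the numbers only illustrate the idea.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONNECTIONS = 6  # the typical cap mentioned above


def download(asset):
    """Hypothetical stand-in for fetching one asset over its own connection."""
    time.sleep(0.05)  # pretend this is network latency
    return f"contents of {asset}"


def download_serially(assets):
    # One asset at a time: each download waits for the previous one.
    return [download(asset) for asset in assets]


def download_in_parallel(assets):
    # Up to MAX_CONNECTIONS downloads in flight at once; map() keeps
    # the results in the same order as the input list.
    with ThreadPoolExecutor(max_workers=MAX_CONNECTIONS) as pool:
        return list(pool.map(download, assets))


def timed(func, *args):
    """Return (result, elapsed seconds) for a call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

With six assets, the serial version takes roughly six times the simulated latency, while the parallel version takes about as long as a single download.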

Parallel connections, in combination with persistent connections, are today’s answer to minimizing network delays and creating a smooth experience on the client. For an in-depth treatment of HTTP connections, refer to the Connections section of the HTTP spec.

Server-side Connection Handling

The server mostly listens for incoming connections and processes them when it receives a request. The operations involve:

establishing a socket to start listening on port 80 (or some other port)
receiving the request and parsing the message
processing the request and building the response
setting response headers
sending the response to the client
closing the connection if a Connection: close request header was found
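The operations above can be sketched with plain sockets. This is a deliberately tiny illustration under several assumptions: the function names are hypothetical, requests carry no body, the server handles one client at a time, and port 8080 is an arbitrary choice.

```python
import socket


def parse_request(raw):
    """Split the request head into method, target, version and headers."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


def handle_one_request(conn):
    """Serve one request; return True if the connection should stay open."""
    # Receive the request (assumes no request body).
    raw = b""
    while b"\r\n\r\n" not in raw:
        chunk = conn.recv(4096)
        if not chunk:
            return False  # the client went away
        raw += chunk
    method, target, version, headers = parse_request(raw)

    # Process the request and build the response body.
    body = f"You asked for {target}\n".encode("utf-8")

    # Set the response headers.
    close = headers.get("connection", "").lower() == "close"
    lines = [
        "HTTP/1.1 200 OK",
        "Content-Type: text/plain; charset=utf-8",
        f"Content-Length: {len(body)}",
    ]
    if close:
        lines.append("Connection: close")

    # Send the response to the client.
    conn.sendall("\r\n".join(lines).encode("ascii") + b"\r\n\r\n" + body)
    return not close


def serve(port=8080):
    """Listen on a port and handle clients one after another."""
    with socket.create_server(("127.0.0.1", port)) as listener:
        while True:
            conn, _addr = listener.accept()
            with conn:  # closes the connection when the loop ends
                while handle_one_request(conn):
                    pass
```

Calling serve() and visiting http://127.0.0.1:8080/ in a browser should show the echoed path. Real servers add concurrency, timeouts and error handling on top of this loop.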

Of course, this is not an exhaustive list of operations. Most applications/websites need to know who makes a request in order to create more customized responses. This is the realm of identification and authentication.

Identification and Authentication

It is almost mandatory to know who connects to a server for tracking an app’s or site’s usage and the general interaction patterns of users. The premise of identification is to tailor the response in order to provide a personalized experience; naturally, the server must know who a user is in order to provide that functionality.

There are a few different ways a server can collect this information, and most websites use a hybrid of these approaches:

Request headers: From, Referer, User-Agent – We saw these headers in Part 1.
Client-IP – the IP address of the client
Fat URLs – storing state of the current user by modifying the URL and redirecting to a different URL on each click; each click essentially accumulates state.
Cookies – the most popular and non-intrusive approach.

Cookies allow the server to attach arbitrary information to outgoing responses via the Set-Cookie response header. Each Set-Cookie header sets one name=value pair, optionally followed by attributes separated by semicolons (;), as in Set-Cookie: session-id=12345ABC; Path=/. To set several cookies, the server sends several Set-Cookie headers, one per cookie.

A server can also restrict the cookies to a specific domain and path, and it can make them persistent with an expires value. Cookies are automatically sent by the browser for each request made to a server, and the browser ensures that only the domain- and path-specific cookies are sent in the request. The request header Cookie: name=value [; name2=value2] is used to send these cookies to the server.
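Python's standard http.cookies module can illustrate both directions of this exchange. The helper names below are hypothetical, and example.com is a placeholder domain.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, domain=None, path=None, expires=None):
    """Build the Set-Cookie header line a server would send for one cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    if domain:
        cookie[name]["domain"] = domain    # restrict to a domain
    if path:
        cookie[name]["path"] = path        # restrict to a path
    if expires:
        cookie[name]["expires"] = expires  # make the cookie persistent
    return cookie.output()


def parse_cookie_header(header_value):
    """Parse the value of a Cookie request header into a dictionary."""
    jar = SimpleCookie()
    jar.load(header_value)
    return {name: morsel.value for name, morsel in jar.items()}


# Server side: one Set-Cookie header for one cookie.
print(build_set_cookie("session-id", "12345ABC", domain="example.com", path="/"))

# Server side, on a later request: read what the browser sent back.
print(parse_cookie_header("session-id=12345ABC; username=nettuts"))
```

Note the asymmetry: the Cookie request header can carry several name=value pairs at once, while each Set-Cookie response header describes a single cookie.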

The best way to identify a user is to require them to sign up and log in, but implementing this feature requires some effort by the developer, as well as the user.

Techniques like OAuth simplify this type of feature, but they still require user consent in order to work properly. Authentication plays a large role here, and it is probably the only way to identify and verify the user.

Authentication

HTTP does support a rudimentary form of authentication called Basic Authentication, as well as the more secure Digest Authentication.

In Basic Authentication, the server initially denies the client’s request with a WWW-Authenticate response header and a 401 Unauthorized status code. On seeing this header, the browser displays a login dialog, prompting for a username and password. This information is sent in a base-64 encoded format in the Authorization request header. The server can now validate the request and allow access if the credentials are valid. Some servers might also send an Authentication-Info header containing additional authentication details.
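The encoding step is simple enough to sketch in a few lines. The function names are hypothetical; the header format (the word Basic followed by the base-64 encoding of username:password) comes from the Basic Authentication spec.

```python
import base64


def basic_authorization(username, password):
    """Build the value of the Authorization header for Basic Authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def parse_basic_authorization(header_value):
    """Server side: recover the username and password from the header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    # Only the first colon separates the two parts; passwords may contain colons.
    username, _, password = decoded.partition(":")
    return username, password
```

Because base-64 is an encoding and not encryption, anyone who can read the header can recover the password by running the second function.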

A corollary to Basic Authentication is Proxy Authentication. Instead of a web server, the authentication challenge is issued by an intermediate proxy. The proxy sends a Proxy-Authenticate header with a 407 Proxy Authentication Required status code. In return, the client is supposed to send the credentials via the Proxy-Authorization request header.

Digest Authentication is similar to Basic and uses the same handshake technique with the WWW-Authenticate and Authorization headers, but instead of sending the credentials in an easily reversible encoding, Digest sends a hash computed from the username and password (commonly with the MD5 or KD digest functions). Although Digest Authentication is supposed to be more secure than Basic, websites typically use Basic Authentication because of its simplicity. To mitigate the security concerns, Basic Auth is used in conjunction with SSL.
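As a rough sketch of what the client computes, the function below follows the MD5 calculation from the Digest Authentication spec (RFC 2617) for the qop=auth case. The details here (realm, nonce, nc, cnonce) come from that spec and are not covered in this article, and the function names are hypothetical.

```python
import hashlib


def md5_hex(text):
    """Hex-encoded MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the 'response' value a client sends in the Authorization header."""
    # HA1 covers the secret: who the user is, in which realm.
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    # HA2 covers the request being made.
    ha2 = md5_hex(f"{method}:{uri}")
    # The final hash ties both to the server's nonce and the client's counters,
    # so a captured response cannot simply be replayed for another request.
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Example inputs in the style of the spec's sample exchange.
response = digest_response(
    username="Mufasa",
    realm="testrealm@host.com",
    password="Circle Of Life",
    method="GET",
    uri="/dir/index.html",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
    nc="00000001",
    cnonce="0a4f113b",
)
print(response)  # 32 hexadecimal characters; the password itself is never sent
```

The server, which knows the password (or HA1), repeats the same calculation and compares the results.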

TutsPlus - Code