I spoke at our local PUG about HTTP for the November meetup. The overall goal of the talk was to give the group an overview of the HTTP protocol and lay out some ideas on how developers can leverage it for better application development. It was a fun night with lots of questions and interaction.
One of the more specific points I tried to drive home was how to use request-lines & response codes, with few or no headers or content body, as viable communication between clients and servers. The request-line is the command a client sends to the server requesting a resource, and the response code is the answer the server gives back (aka the server status). Knowing how to tweak these can help minimize bandwidth usage as well as speed up delivery times for users on slow connections. This is especially true for client-side applications that use polling techniques and may initiate several request/response cycles in a minute, as well as for high-traffic websites that need to conserve bandwidth.
One example: a client-side application needs to retrieve data from a server asynchronously, but the data is also processed asynchronously. So, as a responsible programmer, you queue up a task to be taken care of by a worker later on rather than holding the connection open and blocking other possible connections. In doing so you return a task ID and a resource where the client-side application can check whether the task has completed. The client-side application then begins polling that URL, sending each request with the standard browser headers. The server-side application responds with a few headers of its own & possibly a JSON body indicating the status of the task. Each round trip is probably a couple of kilobytes. That’s not a lot, right? What if you multiply that by 5 to 10 requests per minute on a task that could take several minutes to complete? This is probably an edge case and not worth considering, right?
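To put rough numbers on that round trip, here is a sketch in Python. Every header value, the session cookie, and the JSON body below are representative assumptions, not measurements; real browsers often send longer User-Agent, Accept, and Cookie headers, so actual polls can easily be larger:

```python
# Sketch: rough size of one conventional polling round trip.
# All header values and the JSON body are representative assumptions.
request = (
    "GET /path/to/task/resource HTTP/1.1\r\n"
    "Host: www.domain.com\r\n"
    "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36\r\n"
    "Accept: application/json, text/javascript, */*; q=0.01\r\n"
    "Accept-Language: en-US,en;q=0.5\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "X-Requested-With: XMLHttpRequest\r\n"
    "Cookie: session=0123456789abcdef0123456789abcdef\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)
response = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Apache\r\n"
    "Date: Mon, 19 Nov 2012 20:00:00 GMT\r\n"
    "Content-Type: application/json; charset=utf-8\r\n"
    "Cache-Control: no-cache\r\n"
    "Content-Length: 33\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
    '{"task": 42, "status": "pending"}'
)
round_trip = len(request) + len(response)
print(round_trip, "bytes per poll, before TCP/IP overhead")
```

Multiply that by 5 to 10 polls a minute, over several minutes, across every connected client, and it adds up.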
What if instead your request looked like this:
HEAD http://www.domain.com/path/to/task/resource HTTP/1.1
(Oh yeah, the x- prefix on custom headers was deprecated, so leave it off any headers you do add.)
Then, if the task had not yet completed, the response could look like this (202 Accepted is the standard status for a request that has been accepted for processing but not completed; 1xx codes like 100 Continue are reserved for interim protocol responses, so they can't serve as the final reply):
HTTP/1.1 202 Accepted
Now you’re talking about less than a kilobyte, probably in the low hundreds of bytes, for the total size of the round trip.
This is not possible in a browser at the moment, as there is no way for a client-side application to tell most browsers to leave out the standard headers. However, you do have some control over what the server sends in the response, and you can minimize the amount of data sent back.
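Outside of a browser, though, the whole exchange can be exercised end to end. Here is a minimal sketch in Python; the URL path, the three-poll task, and the use of 202 Accepted for "still processing" are assumptions made up for the demo, standing in for your own worker's status endpoint:

```python
# Sketch: poll a task-status resource with HEAD requests, reading only
# the status code. The server answers 202 until the "task" finishes.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

DONE_AFTER = 3            # pretend the task finishes on the 3rd poll
polls = {"count": 0}

class TaskStatusHandler(BaseHTTPRequestHandler):
    def do_HEAD(self):
        polls["count"] += 1
        # 202 Accepted while processing, 200 OK once the task is done
        code = 200 if polls["count"] >= DONE_AFTER else 202
        self.send_response(code)
        self.end_headers()    # status line + minimal headers, no body

    def log_message(self, *args):   # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), TaskStatusHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

codes = []
for _ in range(DONE_AFTER):
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    conn.request("HEAD", "/path/to/task/resource")
    codes.append(conn.getresponse().status)
    conn.close()

server.shutdown()
print(codes)   # [202, 202, 200]
```

Each poll here is a handful of lines on the wire: a request-line plus Host header out, a status line plus a few headers back, and no body in either direction.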
In Nginx, you can clear most, if not all, headers through the HttpHeadersMoreModule:
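Something like the following, assuming Nginx was built with the headers-more module; the location path and the particular headers cleared are examples, and which headers you can safely drop depends on your application:

```nginx
# Sketch: strip optional response headers on the polling endpoint.
# Requires nginx compiled with HttpHeadersMoreModule (ngx_headers_more).
server {
    listen 80;
    location /path/to/task/ {
        more_clear_headers 'Server' 'X-Powered-By' 'Content-Type';
    }
}
```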
The HTTP specification actually says that some of these headers SHOULD be returned with the response, and Apache has decided to treat that SHOULD as “always must”. Still, with a few workarounds you can shave off about 100 bytes on a response that has no content body.
Start by doing things that are considered security best practices anyway, for example making the Server header as terse as possible. Add the following to your main Apache configuration:
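```apache
# In the main Apache configuration (e.g. httpd.conf):
# "Prod" trims the Server header down to the product name only.
ServerTokens Prod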
This will remove most of the server signature and leave only the product name, Apache.
While you’re at it, you should do the same for the X-Powered-By header added by PHP. Change expose_php to the following in your php.ini:
expose_php = off
These are considered security best practices because, by removing the version numbers of the technologies you’re using, you are not advertising which security vulnerabilities you have yet to patch. It will also trim about 60 bytes from the headers.
Beyond that, you can use PHP’s header() and header_remove() functions to set and remove headers. By default, if no content is sent back, the headers Apache sends are about 300 bytes, and you can shave off about 100 of those by doing what’s mentioned above. In the end it’s the responsibility of the application to set headers and content only when absolutely necessary, and to use status codes where appropriate.
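As a sketch of that last point, in PHP; the $task object and its isDone() method are hypothetical stand-ins for your own task lookup:

```php
<?php
// Sketch: answer a status poll with a bare status code and no body.
// $task and isDone() are hypothetical stand-ins for your task lookup.
header_remove('X-Powered-By');    // in case expose_php is still on
if (!$task->isDone()) {
    http_response_code(202);      // 202 Accepted: still processing
    exit;                         // no body for the pending case
}
header('Content-Type: application/json');
echo json_encode(array('status' => 'complete'));
```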
Looking to the future, we must mention SPDY. SPDY is an experimental communications protocol that may become a large part of the next version of HTTP. To give you an idea of what SPDY is looking to accomplish, these are the goals from the project website:
To allow many concurrent HTTP requests to run across a single TCP session.
To reduce the bandwidth currently used by HTTP by compressing headers and eliminating unnecessary headers.
To define a protocol that is easy to implement and server-efficient. We hope to reduce the complexity of HTTP by cutting down on edge cases and defining easily parsed message formats.
To make SSL the underlying transport protocol, for better security and compatibility with existing network infrastructure. Although SSL does introduce a latency penalty, we believe that the long-term future of the web depends on a secure network connection. In addition, the use of SSL is necessary to ensure that communication across existing proxies is not broken.
To enable the server to initiate communications with the client and push data to the client whenever possible.
This is both incredibly exciting and terrifying. Exciting because we’re witnesses/contributors to the evolution of a technology. Terrifying because of the implications this may have on existing applications.
I don’t fully agree with everything the project is trying to accomplish. For example, making SSL the “underlying transport protocol” for everything is overkill for the many websites that serve up static content (think your average WordPress site). However, features like header compression & concurrent connections, if they’re opt-in, could lead to exciting new ways of writing client-side applications.
The possibilities are endless, and I only hope I was able to give the members of our local group a means with which to learn more about HTTP.