Fundamentals
Request execution The most essential function of HttpClient is to execute HTTP methods. Execution of an HTTP method involves one or several HTTP request / HTTP response exchanges, usually handled internally by HttpClient. The user is expected to provide a request object to execute, and HttpClient is expected to transmit the request to the target server and return a corresponding response object, or throw an exception if execution was unsuccessful. Quite naturally, the main entry point of the HttpClient API is the HttpClient interface that defines the contract described above. Here is an example of the request execution process in its simplest form:
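The following is a minimal sketch, assuming the stock DefaultHttpClient implementation; the target URL is a placeholder:

HttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet("http://localhost/");
HttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
if (entity != null) {
    InputStream instream = entity.getContent();
    try {
        // process the response content
    } finally {
        // closing the stream releases the underlying connection
        instream.close();
    }
}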
HTTP request All HTTP requests have a request line consisting of a method name, a request URI and an HTTP protocol version. HttpClient supports out of the box all HTTP methods defined in the HTTP/1.1 specification: GET, HEAD, POST, PUT, DELETE, TRACE and OPTIONS. There is a special class for each method type: HttpGet, HttpHead, HttpPost, HttpPut, HttpDelete, HttpTrace, and HttpOptions. The Request-URI is a Uniform Resource Identifier that identifies the resource upon which to apply the request. HTTP request URIs consist of a protocol scheme, host name, optional port, resource path, optional query, and optional fragment. HttpClient provides a number of utility methods to simplify creation and modification of request URIs. A URI can be assembled programmatically:

URI uri = URIUtils.createURI("http", "www.google.com", -1, "/search",
    "q=httpclient&btnG=Google+Search&aq=f&oq=", null);
HttpGet httpget = new HttpGet(uri);
System.out.println(httpget.getURI());

stdout >
http://www.google.com/search?q=httpclient&btnG=Google+Search&aq=f&oq=

The query string can also be generated from individual parameters:

List<NameValuePair> qparams = new ArrayList<NameValuePair>();
qparams.add(new BasicNameValuePair("q", "httpclient"));
qparams.add(new BasicNameValuePair("btnG", "Google Search"));
qparams.add(new BasicNameValuePair("aq", "f"));
qparams.add(new BasicNameValuePair("oq", null));
URI uri = URIUtils.createURI("http", "www.google.com", -1, "/search",
    URLEncodedUtils.format(qparams, "UTF-8"), null);
HttpGet httpget = new HttpGet(uri);
System.out.println(httpget.getURI());

stdout >
http://www.google.com/search?q=httpclient&btnG=Google+Search&aq=f&oq=
HTTP response An HTTP response is a message sent by the server back to the client after having received and interpreted a request message. The first line of that message consists of the protocol version followed by a numeric status code and its associated textual phrase.
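As a sketch of how the status line can be inspected, one can construct a response locally using BasicHttpResponse from HttpCore (the status code and reason phrase below are chosen for illustration):

HttpResponse response = new BasicHttpResponse(HttpVersion.HTTP_1_1,
    HttpStatus.SC_OK, "OK");

System.out.println(response.getProtocolVersion());
System.out.println(response.getStatusLine().getStatusCode());
System.out.println(response.getStatusLine().getReasonPhrase());
System.out.println(response.getStatusLine().toString());

stdout >
HTTP/1.1
200
OK
HTTP/1.1 200 OK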
Working with message headers An HTTP message can contain a number of headers describing properties of the message such as the content length, content type and so on. HttpClient provides methods to retrieve, add, remove and enumerate headers. The most efficient way to obtain all headers of a given type is by using the HeaderIterator interface. HttpClient also provides convenience methods to parse HTTP messages into individual header elements.
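A sketch exercising these methods on a locally constructed response; the Set-Cookie values are arbitrary examples, and the HeaderIterator and HeaderElementIterator parts correspond to the enumeration and element-parsing facilities mentioned above:

HttpResponse response = new BasicHttpResponse(HttpVersion.HTTP_1_1,
    HttpStatus.SC_OK, "OK");
response.addHeader("Set-Cookie",
    "c1=a; path=/; domain=localhost");
response.addHeader("Set-Cookie",
    "c2=b; path=\"/\", c3=c; domain=\"localhost\"");

// Retrieve individual headers or all headers of a given name
Header h1 = response.getFirstHeader("Set-Cookie");
System.out.println(h1);
Header[] hs = response.getHeaders("Set-Cookie");
System.out.println(hs.length);

// Enumerate all headers of a given type
HeaderIterator it = response.headerIterator("Set-Cookie");
while (it.hasNext()) {
    System.out.println(it.next());
}

// Parse header values into individual elements and their parameters
HeaderElementIterator elemIt = new BasicHeaderElementIterator(
    response.headerIterator("Set-Cookie"));
while (elemIt.hasNext()) {
    HeaderElement elem = elemIt.nextElement();
    System.out.println(elem.getName() + " = " + elem.getValue());
    NameValuePair[] params = elem.getParameters();
    for (int i = 0; i < params.length; i++) {
        System.out.println(" " + params[i]);
    }
}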
HTTP entity HTTP messages can carry a content entity associated with the request or response. Entities can be found in some requests and in some responses, as they are optional. Requests that use entities are referred to as entity enclosing requests. The HTTP specification defines two entity enclosing methods: POST and PUT. Responses are usually expected to enclose a content entity. There are exceptions to this rule, such as responses to the HEAD method and 204 No Content, 304 Not Modified and 205 Reset Content responses. HttpClient distinguishes three kinds of entities, depending on where their content originates:

streamed: The content is received from a stream, or generated on the fly. In particular, this category includes entities being received from HTTP responses. Streamed entities are generally not repeatable.

self-contained: The content is in memory or obtained by means that are independent from a connection or other entity. Self-contained entities are generally repeatable. This type of entity is mostly used for entity enclosing HTTP requests.

wrapping: The content is obtained from another entity.

This distinction is important for connection management when streaming out content from an HTTP response. For request entities that are created by an application and only sent using HttpClient, the difference between streamed and self-contained is of little importance. In that case, it is suggested to consider non-repeatable entities as streamed, and those that are repeatable as self-contained.
Repeatable entities An entity can be repeatable, meaning its content can be read more than once. This is only possible with self-contained entities (like ByteArrayEntity or StringEntity).
Using HTTP entities Since an entity can represent both binary and character content, it has support for character encodings (to support the latter, i.e. character content). The entity is created when executing a request with enclosed content or when the request was successful and the response body is used to send the result back to the client. To read the content from the entity, one can either retrieve the input stream via the HttpEntity#getContent() method, which returns a java.io.InputStream, or one can supply an output stream to the HttpEntity#writeTo(OutputStream) method, which will return once all content has been written to the given stream. When the entity has been received with an incoming message, the HttpEntity#getContentType() and HttpEntity#getContentLength() methods can be used for reading the common metadata such as the Content-Type and Content-Length headers (if they are available). Since the Content-Type header can contain a character encoding for text mime-types like text/plain or text/html, the HttpEntity#getContentEncoding() method is used to read this information. If the headers aren't available, a length of -1 will be returned, and null for the content type. If the Content-Type header is available, a Header object will be returned. When creating an entity for an outgoing message, this metadata has to be supplied by the creator of the entity.
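A small sketch using StringEntity to illustrate the metadata and content access methods; the message text and the UTF-8 charset are arbitrary choices:

StringEntity myEntity = new StringEntity("important message", "UTF-8");

System.out.println(myEntity.getContentType());
System.out.println(myEntity.getContentLength());
System.out.println(EntityUtils.toString(myEntity));
System.out.println(EntityUtils.toByteArray(myEntity).length);

stdout >
Content-Type: text/plain; charset=UTF-8
17
important message
17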
Ensuring release of low level resources In order to ensure proper release of system resources one must close the content stream associated with the entity. Please note that the HttpEntity#writeTo(OutputStream) method is also required to ensure proper release of system resources once the entity has been fully written out. If this method obtains an instance of java.io.InputStream by calling HttpEntity#getContent(), it is also expected to close the stream in a finally clause. When working with streaming entities, one can use the EntityUtils#consume(HttpEntity) method to ensure that the entity content has been fully consumed and the underlying stream has been closed. There can be situations, however, when only a small portion of the entire response content needs to be retrieved and the performance penalty for consuming the remaining content and making the connection reusable is too high; in that case one can simply terminate the request by calling the HttpUriRequest#abort() method. The connection will not be reused, but all low-level resources held by it will be correctly deallocated.
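A sketch of the recommended pattern, assuming a response object obtained from an earlier execute call; the stream is closed in a finally clause so that the underlying connection is always released:

HttpEntity entity = response.getEntity();
if (entity != null) {
    InputStream instream = entity.getContent();
    try {
        // do something useful with the content
    } finally {
        // closing the stream ensures the connection is released
        instream.close();
    }
}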
Consuming entity content The recommended way to consume the content of an entity is by using its HttpEntity#getContent() or HttpEntity#writeTo(OutputStream) methods. HttpClient also comes with the EntityUtils class, which exposes several static methods to more easily read the content or information from an entity. Instead of reading the java.io.InputStream directly, one can retrieve the whole content body in a string / byte array by using the methods from this class. However, the use of EntityUtils is strongly discouraged unless the response entities originate from a trusted HTTP server and are known to be of limited length. In some situations it may be necessary to be able to read entity content more than once. In this case entity content must be buffered in some way, either in memory or on disk. The simplest way to accomplish that is by wrapping the original entity with the BufferedHttpEntity class. This will cause the content of the original entity to be read into an in-memory buffer. In all other ways the entity wrapper will behave exactly like the original one.
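Two short sketches, both assuming a response obtained earlier and a length limit chosen for illustration. The first reads the body into a String with EntityUtils, guarded by a length check since the content is buffered in memory:

HttpEntity entity = response.getEntity();
if (entity != null) {
    long len = entity.getContentLength();
    if (len != -1 && len < 2048) {
        System.out.println(EntityUtils.toString(entity));
    } else {
        // stream the content rather than buffering it in memory
    }
}

The second makes the content repeatable by wrapping the original entity:

HttpEntity entity = response.getEntity();
if (entity != null) {
    // the content is read into an in-memory buffer and becomes repeatable
    entity = new BufferedHttpEntity(entity);
}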
Producing entity content HttpClient provides several classes that can be used to efficiently stream out content through HTTP connections. Instances of those classes can be associated with entity enclosing requests such as POST and PUT in order to enclose entity content into outgoing HTTP requests. HttpClient provides several classes for the most common data containers such as string, byte array, input stream, and file: StringEntity, ByteArrayEntity, InputStreamEntity, and FileEntity. Please note that InputStreamEntity is not repeatable, because it can only read from the underlying data stream once. Generally it is recommended to implement a custom HttpEntity class which is self-contained, instead of using the generic InputStreamEntity. FileEntity can be a good starting point.
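A sketch using FileEntity to post the contents of a file; the file name, content type and target URL are placeholders:

File file = new File("somefile.txt");
FileEntity entity = new FileEntity(file, "text/plain; charset=\"UTF-8\"");

HttpPost httppost = new HttpPost("http://localhost/action.do");
httppost.setEntity(entity);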
Dynamic content entities Often HTTP entities need to be generated dynamically based on a particular execution context. HttpClient provides support for dynamic entities by means of the EntityTemplate entity class and the ContentProducer interface. Content producers are objects which produce their content on demand, by writing it out to an output stream. They are expected to be able to produce their content every time they are requested to do so. So entities created with EntityTemplate are generally self-contained and repeatable.

ContentProducer cp = new ContentProducer() {
    public void writeTo(OutputStream outstream) throws IOException {
        Writer writer = new OutputStreamWriter(outstream, "UTF-8");
        writer.write("<response>");
        writer.write("  <content>");
        writer.write("    important stuff");
        writer.write("  </content>");
        writer.write("</response>");
        writer.flush();
    }
};
HttpEntity entity = new EntityTemplate(cp);
HttpPost httppost = new HttpPost("http://localhost/handler.do");
httppost.setEntity(entity);
HTML forms Many applications frequently need to simulate the process of submitting an HTML form, for instance, in order to log in to a web application or submit input data. HttpClient provides the special entity class UrlEncodedFormEntity to facilitate the process.

List<NameValuePair> formparams = new ArrayList<NameValuePair>();
formparams.add(new BasicNameValuePair("param1", "value1"));
formparams.add(new BasicNameValuePair("param2", "value2"));
UrlEncodedFormEntity entity = new UrlEncodedFormEntity(formparams, "UTF-8");
HttpPost httppost = new HttpPost("http://localhost/handler.do");
httppost.setEntity(entity);

This UrlEncodedFormEntity instance will use the so-called URL encoding to encode the parameters and produce the following content:

param1=value1&param2=value2
Content chunking Generally it is recommended to let HttpClient choose the most appropriate transfer encoding based on the properties of the HTTP message being transferred. It is possible, however, to inform HttpClient that chunk coding is preferred by setting HttpEntity#setChunked() to true. Please note that HttpClient will use this flag as a hint only. This value will be ignored when using HTTP protocol versions that do not support chunk coding, such as HTTP/1.0.
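A brief sketch of requesting chunk coding on an outgoing entity; the message text and target URL are placeholders:

StringEntity entity = new StringEntity("important message", "UTF-8");
entity.setChunked(true);
HttpPost httppost = new HttpPost("http://localhost/handler.do");
httppost.setEntity(entity);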
Response handlers The simplest and the most convenient way to handle responses is by using the ResponseHandler interface. This approach completely relieves the user from having to worry about connection management. When using a ResponseHandler, HttpClient will automatically take care of ensuring release of the connection back to the connection manager regardless of whether the request execution succeeds or causes an exception.

HttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet("http://localhost/");

ResponseHandler<byte[]> handler = new ResponseHandler<byte[]>() {
    public byte[] handleResponse(
            HttpResponse response) throws ClientProtocolException, IOException {
        HttpEntity entity = response.getEntity();
        if (entity != null) {
            return EntityUtils.toByteArray(entity);
        } else {
            return null;
        }
    }
};

byte[] response = httpclient.execute(httpget, handler);
HTTP execution context Originally HTTP has been designed as a stateless, request/response oriented protocol. However, real world applications often need to be able to persist state information through several logically related request-response exchanges. In order to enable applications to maintain a processing state HttpClient allows HTTP requests to be executed within a particular execution context, referred to as HTTP context. Multiple logically related requests can participate in a logical session if the same context is reused between consecutive requests. HTTP context functions similarly to java.util.Map<String, Object>. It is simply a collection of arbitrary named values. An application can populate context attributes prior to request execution or examine the context after the execution has been completed. In the course of HTTP request execution HttpClient adds the following attributes to the execution context:

'http.connection': HttpConnection instance representing the actual connection to the target server.

'http.target_host': HttpHost instance representing the connection target.

'http.proxy_host': HttpHost instance representing the connection proxy, if used.

'http.request': HttpRequest instance representing the actual HTTP request.

'http.response': HttpResponse instance representing the actual HTTP response.

'http.request_sent': java.lang.Boolean object representing the flag indicating whether the actual request has been fully transmitted to the connection target.

For instance, in order to determine the final redirect target, one can examine the value of the http.target_host attribute after the request execution, as in the sketch below.
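A minimal sketch, assuming the DefaultHttpClient implementation and a public URL that issues a redirect; the printed target depends on where the request is actually redirected:

DefaultHttpClient httpclient = new DefaultHttpClient();
HttpContext localContext = new BasicHttpContext();
HttpGet httpget = new HttpGet("http://www.google.com/");

HttpResponse response = httpclient.execute(httpget, localContext);

HttpHost target = (HttpHost) localContext.getAttribute(
    ExecutionContext.HTTP_TARGET_HOST);
System.out.println("Final target: " + target);

HttpEntity entity = response.getEntity();
if (entity != null) {
    EntityUtils.consume(entity);
}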
Exception handling HttpClient can throw two types of exceptions: java.io.IOException in case of an I/O failure such as a socket timeout or a socket reset, and HttpException that signals an HTTP failure such as a violation of the HTTP protocol. Usually I/O errors are considered non-fatal and recoverable, whereas HTTP protocol errors are considered fatal and cannot be automatically recovered from.
HTTP transport safety It is important to understand that the HTTP protocol is not well suited for all types of applications. HTTP is a simple request/response oriented protocol which was initially designed to support static or dynamically generated content retrieval. It has never been intended to support transactional operations. For instance, the HTTP server will consider its part of the contract fulfilled if it succeeds in receiving and processing the request, generating a response and sending a status code back to the client. The server will make no attempts to roll back the transaction if the client fails to receive the response in its entirety due to a read timeout, a request cancellation or a system crash. If the client decides to retry the same request, the server will inevitably end up executing the same transaction more than once. In some cases this may lead to application data corruption or inconsistent application state. Even though HTTP has never been designed to support transactional processing, it can still be used as a transport protocol for mission critical applications provided certain conditions are met. To ensure HTTP transport layer safety the system must ensure the idempotency of HTTP methods on the application layer.
Idempotent methods The HTTP/1.1 specification defines an idempotent method as follows: "Methods can also have the property of 'idempotence' in that (aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request." In other words, the application ought to ensure that it is prepared to deal with the implications of multiple executions of the same method. This can be achieved, for instance, by providing a unique transaction id and by other means of avoiding execution of the same logical operation. Please note that this problem is not specific to HttpClient. Browser based applications are subject to exactly the same issues related to HTTP method non-idempotency. HttpClient assumes non-entity enclosing methods such as GET and HEAD to be idempotent and entity enclosing methods such as POST and PUT to be not.
Automatic exception recovery By default HttpClient attempts to automatically recover from I/O exceptions. The default auto-recovery mechanism is limited to just a few exceptions that are known to be safe:

HttpClient will make no attempt to recover from any logical or HTTP protocol errors (those derived from the HttpException class).

HttpClient will automatically retry those methods that are assumed to be idempotent.

HttpClient will automatically retry those methods that fail with a transport exception while the HTTP request is still being transmitted to the target server (i.e. the request has not been fully transmitted to the server).

HttpClient will automatically retry those methods that have been fully transmitted to the server, but the server failed to respond with an HTTP status code (the server simply drops the connection without sending anything back). In this case it is assumed that the request has not been processed by the server and the application state has not changed. If this assumption may not hold true for the web server your application is targeting, it is highly recommended to provide a custom exception handler.
Request retry handler In order to enable a custom exception recovery mechanism one should provide an implementation of the HttpRequestRetryHandler interface.

DefaultHttpClient httpclient = new DefaultHttpClient();

HttpRequestRetryHandler myRetryHandler = new HttpRequestRetryHandler() {

    public boolean retryRequest(
            IOException exception,
            int executionCount,
            HttpContext context) {
        if (executionCount >= 5) {
            // Do not retry if over max retry count
            return false;
        }
        if (exception instanceof NoHttpResponseException) {
            // Retry if the server dropped connection on us
            return true;
        }
        if (exception instanceof SSLHandshakeException) {
            // Do not retry on SSL handshake exception
            return false;
        }
        HttpRequest request = (HttpRequest) context.getAttribute(
                ExecutionContext.HTTP_REQUEST);
        boolean idempotent = !(request instanceof HttpEntityEnclosingRequest);
        if (idempotent) {
            // Retry if the request is considered idempotent
            return true;
        }
        return false;
    }

};

httpclient.setHttpRequestRetryHandler(myRetryHandler);
Aborting requests In some situations HTTP request execution may fail to complete within the expected time frame due to high load on the target server or too many concurrent requests issued on the client side. In such cases it may be necessary to terminate the request prematurely and unblock the execution thread blocked in an I/O operation. HTTP requests being executed by HttpClient can be aborted at any stage of execution by invoking the HttpUriRequest#abort() method. This method is thread-safe and can be called from any thread. When an HTTP request is aborted, its execution thread blocked in an I/O operation is guaranteed to unblock by throwing an InterruptedIOException.
HTTP protocol interceptors An HTTP protocol interceptor is a routine that implements a specific aspect of the HTTP protocol. Usually protocol interceptors are expected to act upon one specific header or a group of related headers of the incoming message, or to populate the outgoing message with one specific header or a group of related headers. Protocol interceptors can also manipulate content entities enclosed with messages, transparent content compression / decompression being a good example. Usually this is accomplished by using the 'Decorator' pattern where a wrapper entity class is used to decorate the original entity. Several protocol interceptors can be combined to form one logical unit. Protocol interceptors can collaborate by sharing information - such as a processing state - through the HTTP execution context. Protocol interceptors can use the HTTP context to store a processing state for one request or several consecutive requests. Usually the order in which interceptors are executed should not matter as long as they do not depend on a particular state of the execution context. If protocol interceptors have interdependencies and therefore must be executed in a particular order, they should be added to the protocol processor in the same sequence as their expected execution order. Protocol interceptors must be implemented as thread-safe. Similarly to servlets, protocol interceptors should not use instance variables unless access to those variables is synchronized. This is an example of how a local context can be used to persist a processing state between consecutive requests:
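A sketch, assuming DefaultHttpClient; a request interceptor reads a counter stored in the shared context and adds it as a custom 'Count' header (an arbitrary example header) to every outgoing request:

DefaultHttpClient httpclient = new DefaultHttpClient();

HttpContext localContext = new BasicHttpContext();
AtomicInteger count = new AtomicInteger(1);
localContext.setAttribute("count", count);

httpclient.addRequestInterceptor(new HttpRequestInterceptor() {

    public void process(
            final HttpRequest request,
            final HttpContext context) throws HttpException, IOException {
        AtomicInteger count = (AtomicInteger) context.getAttribute("count");
        request.addHeader("Count", Integer.toString(count.getAndIncrement()));
    }

});

HttpGet httpget = new HttpGet("http://localhost/");
for (int i = 0; i < 10; i++) {
    HttpResponse response = httpclient.execute(httpget, localContext);
    HttpEntity entity = response.getEntity();
    if (entity != null) {
        EntityUtils.consume(entity);
    }
}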
HTTP parameters The HttpParams interface represents a collection of immutable values that define a runtime behavior of a component. In many ways HttpParams is similar to HttpContext. The main distinction between the two lies in their use at runtime. Both interfaces represent a collection of objects that are organized as a map of keys to object values, but serve distinct purposes: HttpParams is intended to contain simple objects: integers, doubles, strings, collections and objects that remain immutable at runtime. HttpParams is expected to be used in the 'write once - read many' mode. HttpContext is intended to contain complex objects that are very likely to mutate in the course of HTTP message processing. The purpose of HttpParams is to define a behavior of other components. Usually each complex component has its own HttpParams object. The purpose of HttpContext is to represent an execution state of an HTTP process. Usually the same execution context is shared among many collaborating objects.
Parameter hierarchies In the course of HTTP request execution the HttpParams of the HttpRequest object are linked together with the HttpParams of the HttpClient instance used to execute the request. This enables parameters set at the HTTP request level to take precedence over HttpParams set at the HTTP client level. The recommended practice is to set common parameters shared by all HTTP requests at the HTTP client level and selectively override specific parameters at the HTTP request level.
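A sketch, assuming DefaultHttpClient and the CoreProtocolPNames parameter name constants from HttpCore; client-level defaults are set first and two of them are then overridden on the individual request:

DefaultHttpClient httpclient = new DefaultHttpClient();
httpclient.getParams().setParameter(
    CoreProtocolPNames.PROTOCOL_VERSION, HttpVersion.HTTP_1_0);
httpclient.getParams().setParameter(
    CoreProtocolPNames.HTTP_CONTENT_CHARSET, "UTF-8");

HttpGet httpget = new HttpGet("http://www.google.com/");
// request-level parameters take precedence over client-level ones
httpget.getParams().setParameter(
    CoreProtocolPNames.PROTOCOL_VERSION, HttpVersion.HTTP_1_1);
httpget.getParams().setParameter(
    CoreProtocolPNames.USE_EXPECT_CONTINUE, Boolean.FALSE);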
HTTP parameters beans The HttpParams interface allows for a great deal of flexibility in handling configuration of components. Most importantly, new parameters can be introduced without affecting binary compatibility with older versions. However, HttpParams also has a certain disadvantage compared to regular Java beans: HttpParams cannot be assembled using a DI framework. To mitigate this limitation, HttpClient includes a number of bean classes that can be used to initialize HttpParams objects using standard Java bean conventions.
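A sketch using the HttpProtocolParamBean helper from HttpCore to populate an HttpParams instance; the chosen values are arbitrary:

HttpParams params = new BasicHttpParams();

HttpProtocolParamBean paramsBean = new HttpProtocolParamBean(params);
paramsBean.setVersion(HttpVersion.HTTP_1_1);
paramsBean.setContentCharset("UTF-8");
paramsBean.setUseExpectContinue(true);

System.out.println(params.getParameter(CoreProtocolPNames.PROTOCOL_VERSION));
System.out.println(params.getParameter(CoreProtocolPNames.HTTP_CONTENT_CHARSET));
System.out.println(params.getParameter(CoreProtocolPNames.USE_EXPECT_CONTINUE));
System.out.println(params.getParameter(CoreProtocolPNames.USER_AGENT));

stdout >
HTTP/1.1
UTF-8
true
null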
HTTP request execution parameters These are parameters that can impact the process of request execution:

'http.protocol.version': defines the HTTP protocol version used if not set explicitly on the request object. This parameter expects a value of type ProtocolVersion. If this parameter is not set, HTTP/1.1 will be used.

'http.protocol.element-charset': defines the charset to be used for encoding HTTP protocol elements. This parameter expects a value of type java.lang.String. If this parameter is not set, US-ASCII will be used.

'http.protocol.content-charset': defines the charset to be used per default for content body coding. This parameter expects a value of type java.lang.String. If this parameter is not set, ISO-8859-1 will be used.

'http.useragent': defines the content of the User-Agent header. This parameter expects a value of type java.lang.String. If this parameter is not set, HttpClient will automatically generate a value for it.

'http.protocol.strict-transfer-encoding': defines whether responses with an invalid Transfer-Encoding header should be rejected. This parameter expects a value of type java.lang.Boolean. If this parameter is not set, invalid Transfer-Encoding values will be ignored.

'http.protocol.expect-continue': activates the Expect: 100-Continue handshake for entity enclosing methods. The purpose of the Expect: 100-Continue handshake is to allow the client that is sending a request message with a request body to determine if the origin server is willing to accept the request (based on the request headers) before the client sends the request body. The use of the Expect: 100-continue handshake can result in a noticeable performance improvement for entity enclosing requests (such as POST and PUT) that require the target server's authentication. The Expect: 100-continue handshake should be used with caution, as it may cause problems with HTTP servers and proxies that do not support the HTTP/1.1 protocol. This parameter expects a value of type java.lang.Boolean. If this parameter is not set, HttpClient will attempt to use the handshake.

'http.protocol.wait-for-continue': defines the maximum period of time in milliseconds the client should spend waiting for a 100-continue response. This parameter expects a value of type java.lang.Integer. If this parameter is not set, HttpClient will wait 3 seconds for a confirmation before resuming the transmission of the request body.
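These parameter names can be passed to setParameter directly as strings, or via the corresponding CoreProtocolPNames constants from HttpCore; a brief sketch with arbitrarily chosen values:

HttpParams params = new BasicHttpParams();
// equivalent ways of setting the same parameter
params.setParameter("http.protocol.version", HttpVersion.HTTP_1_1);
params.setParameter(CoreProtocolPNames.PROTOCOL_VERSION, HttpVersion.HTTP_1_1);
params.setParameter(CoreProtocolPNames.USE_EXPECT_CONTINUE, Boolean.TRUE);
params.setIntParameter("http.protocol.wait-for-continue", 3000);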