Fundamentals
Request execution
The most essential function of HttpClient is to execute HTTP methods. Execution of an
HTTP method involves one or several HTTP request / HTTP response exchanges, usually
handled internally by HttpClient. The user is expected to provide a request object to
execute, and HttpClient is expected to transmit the request to the target server and return a
corresponding response object, or throw an exception if execution was unsuccessful.
Quite naturally, the main entry point of the HttpClient API is the HttpClient
interface that defines the contract described above.
Here is an example of the request execution process in its simplest form:
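A minimal sketch of such an exchange, assuming the DefaultHttpClient implementation; the URI is arbitrary and error handling is omitted:
HttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet("http://localhost/");
HttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
if (entity != null) {
    // Consume the response body so the underlying connection can be released
    EntityUtils.consume(entity);
}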
HTTP request
All HTTP requests have a request line consisting of a method name, a request URI, and
an HTTP protocol version.
HttpClient supports out of the box all HTTP methods defined in the HTTP/1.1
specification: GET, HEAD,
POST, PUT, DELETE,
TRACE and OPTIONS. There is a special
class for each method type: HttpGet,
HttpHead, HttpPost,
HttpPut, HttpDelete,
HttpTrace, and HttpOptions.
The Request-URI is a Uniform Resource Identifier that identifies the resource upon
which to apply the request. HTTP request URIs consist of a protocol scheme, host
name, optional port, resource path, optional query, and optional fragment.
HttpClient provides a number of utility methods to simplify creation and
modification of request URIs.
URI can be assembled programmatically:
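For instance, a sketch using the URIUtils helper (the host and query values are arbitrary):
URI uri = URIUtils.createURI("http", "www.google.com", -1, "/search",
    "q=httpclient&btnG=Google+Search&aq=f&oq=", null);
HttpGet httpget = new HttpGet(uri);
System.out.println(httpget.getURI());
stdout >
http://www.google.com/search?q=httpclient&btnG=Google+Search&aq=f&oq=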
Query string can also be generated from individual parameters:
List<NameValuePair> qparams = new ArrayList<NameValuePair>();
qparams.add(new BasicNameValuePair("q", "httpclient"));
qparams.add(new BasicNameValuePair("btnG", "Google Search"));
qparams.add(new BasicNameValuePair("aq", "f"));
qparams.add(new BasicNameValuePair("oq", null));
URI uri = URIUtils.createURI("http", "www.google.com", -1, "/search",
    URLEncodedUtils.format(qparams, "UTF-8"), null);
HttpGet httpget = new HttpGet(uri);
System.out.println(httpget.getURI());
stdout >
http://www.google.com/search?q=httpclient&btnG=Google+Search&aq=f&oq=
HTTP response
An HTTP response is a message sent by the server back to the client after having
received and interpreted a request message. The first line of that message consists
of the protocol version followed by a numeric status code and its associated textual
phrase.
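For instance, a sketch using a locally constructed BasicHttpResponse rather than a real server response:
HttpResponse response = new BasicHttpResponse(HttpVersion.HTTP_1_1,
    HttpStatus.SC_OK, "OK");
System.out.println(response.getProtocolVersion());
System.out.println(response.getStatusLine().getStatusCode());
System.out.println(response.getStatusLine().getReasonPhrase());
System.out.println(response.getStatusLine().toString());
stdout >
HTTP/1.1
200
OK
HTTP/1.1 200 OK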
Working with message headers
An HTTP message can contain a number of headers describing properties of the
message such as the content length, content type and so on. HttpClient provides
methods to retrieve, add, remove and enumerate headers.
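For instance, a sketch again using a locally constructed response (the Set-Cookie values are arbitrary):
HttpResponse response = new BasicHttpResponse(HttpVersion.HTTP_1_1,
    HttpStatus.SC_OK, "OK");
response.addHeader("Set-Cookie",
    "c1=a; path=/; domain=localhost");
response.addHeader("Set-Cookie",
    "c2=b; path=\"/\", c3=c; domain=\"localhost\"");
Header h1 = response.getFirstHeader("Set-Cookie");
System.out.println(h1);
Header h2 = response.getLastHeader("Set-Cookie");
System.out.println(h2);
Header[] hs = response.getHeaders("Set-Cookie");
System.out.println(hs.length);
stdout >
Set-Cookie: c1=a; path=/; domain=localhost
Set-Cookie: c2=b; path="/", c3=c; domain="localhost"
2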
The most efficient way to obtain all headers of a given type is by using the
HeaderIterator interface.
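Continuing the sketch above:
HeaderIterator it = response.headerIterator("Set-Cookie");
while (it.hasNext()) {
    System.out.println(it.next());
}
stdout >
Set-Cookie: c1=a; path=/; domain=localhost
Set-Cookie: c2=b; path="/", c3=c; domain="localhost"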
It also provides convenience methods to parse HTTP messages into individual header
elements.
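Continuing the same sketch, BasicHeaderElementIterator can be used to break the header values down into elements and their parameters:
HeaderElementIterator it = new BasicHeaderElementIterator(
    response.headerIterator("Set-Cookie"));
while (it.hasNext()) {
    HeaderElement elem = it.nextElement();
    System.out.println(elem.getName() + " = " + elem.getValue());
    NameValuePair[] params = elem.getParameters();
    for (int i = 0; i < params.length; i++) {
        System.out.println(" " + params[i]);
    }
}
stdout >
c1 = a
 path=/
 domain=localhost
c2 = b
 path=/
c3 = c
 domain=localhost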
HTTP entity
HTTP messages can carry a content entity associated with the request or response.
Entities can be found in some requests and in some responses, as they are optional.
Requests that use entities are referred to as entity enclosing requests. The HTTP
specification defines two entity enclosing methods: POST and
PUT. Responses are usually expected to enclose a content
entity. There are exceptions to this rule, such as responses to the
HEAD method and 204 No Content,
304 Not Modified, and 205 Reset Content
responses.
HttpClient distinguishes three kinds of entities, depending on where their content
originates:
streamed:
The content is received from a stream, or generated on the fly. In
particular, this category includes entities being received from HTTP
responses. Streamed entities are generally not repeatable.
self-contained:
The content is in memory or obtained by means that are independent
from a connection or other entity. Self-contained entities are generally
repeatable. This type of entity is mostly used for entity
enclosing HTTP requests.
wrapping:
The content is obtained from another entity.
This distinction is important for connection management when streaming out content
from an HTTP response. For request entities that are created by an application and
only sent using HttpClient, the difference between streamed and self-contained is of
little importance. In that case, it is suggested to consider non-repeatable entities
as streamed, and those that are repeatable as self-contained.
Repeatable entities
An entity can be repeatable, meaning its content can be read more than once.
This is only possible with self-contained entities (like
ByteArrayEntity or
StringEntity).
Using HTTP entities
Since an entity can represent both binary and character content, it has
support for character encodings (to support the latter, i.e. character
content).
An entity is created when executing a request that encloses content, or when
a request succeeds and the response body is used to send the result back
to the client.
To read the content from the entity, one can either retrieve the input stream
via the HttpEntity#getContent() method, which returns
an java.io.InputStream, or one can supply an output
stream to the HttpEntity#writeTo(OutputStream) method,
which will return once all content has been written to the given stream.
When the entity has been received with an incoming message, the
HttpEntity#getContentType() and
HttpEntity#getContentLength() methods can be used
to read the common metadata from the Content-Type and
Content-Length headers (if they are available). The
Content-Type header can also carry a character encoding for
text mime-types like text/plain or text/html, while the
HttpEntity#getContentEncoding() method reads the
Content-Encoding header, which describes any content coding
applied to the entity. If the headers are not available, a length of -1 will be
returned, and null for the content type. If the Content-Type
header is available, a Header object will be
returned.
When creating an entity for an outgoing message, this metadata has to be
supplied by the creator of the entity.
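For instance, a sketch using StringEntity (the message text is arbitrary):
StringEntity myEntity = new StringEntity("important message", "UTF-8");
System.out.println(myEntity.getContentType());
System.out.println(myEntity.getContentLength());
System.out.println(EntityUtils.toString(myEntity));
System.out.println(EntityUtils.toByteArray(myEntity).length);
stdout >
Content-Type: text/plain; charset=UTF-8
17
important message
17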
Ensuring release of low level resources
In order to ensure proper release of system resources one must close the content
stream associated with the entity.
Please note that the HttpEntity#writeTo(OutputStream)
method is also required to ensure proper release of system resources once the
entity has been fully written out. If this method obtains an instance of
java.io.InputStream by calling
HttpEntity#getContent(), it is also expected to close
the stream in a finally clause.
When working with streaming entities, one can use the
EntityUtils#consume(HttpEntity) method to ensure that
the entity content has been fully consumed and the underlying stream has been
closed.
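A sketch of the recommended pattern (the URI is arbitrary):
HttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet("http://localhost/");
HttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
if (entity != null) {
    InputStream instream = entity.getContent();
    try {
        // do something useful with the content
    } finally {
        // Closing the content stream releases the low level resources
        instream.close();
    }
}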
There can be situations, however, when only a small portion of the entire response
content needs to be retrieved and the performance penalty for consuming the
remaining content and making the connection reusable is too high. In such cases
one can simply terminate the request by calling the
HttpUriRequest#abort() method.
The connection will not be reused, but all low-level resources held by it will be
correctly deallocated.
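For instance, a sketch where only the first couple of bytes are of interest (the URI is arbitrary):
HttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet("http://localhost/");
HttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
if (entity != null) {
    InputStream instream = entity.getContent();
    int byteOne = instream.read();
    int byteTwo = instream.read();
    // The rest of the content is not needed; abort instead of consuming it
    httpget.abort();
}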
Consuming entity content
The recommended way to consume content of an entity is by using its
HttpEntity#getContent() or
HttpEntity#writeTo(OutputStream) methods. HttpClient
also comes with the EntityUtils class, which exposes several
static methods to more easily read the content or information from an entity.
Instead of reading the java.io.InputStream directly, one can
retrieve the whole content body in a string / byte array by using the methods from
this class. However, the use of EntityUtils is
strongly discouraged unless the response entities originate from a trusted HTTP
server and are known to be of limited length.
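A sketch of guarded use of EntityUtils, buffering the body only when its length is known and small (the URI and the 2048 byte limit are arbitrary):
HttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet("http://localhost/");
HttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
if (entity != null) {
    long len = entity.getContentLength();
    if (len != -1 && len < 2048) {
        System.out.println(EntityUtils.toString(entity));
    } else {
        // Stream the content out instead of buffering it in memory
    }
}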
In some situations it may be necessary to be able to read entity content more than
once. In this case entity content must be buffered in some way, either in memory or
on disk. The simplest way to accomplish that is by wrapping the original entity with
the BufferedHttpEntity class. This will cause the content of
the original entity to be read into an in-memory buffer. In all other ways the entity
wrapper will behave like the original one.
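A sketch of such buffering (the URI is arbitrary):
HttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet("http://localhost/");
HttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
if (entity != null) {
    // Wrapping buffers the content so it can be read more than once
    entity = new BufferedHttpEntity(entity);
}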
Producing entity content
HttpClient provides several classes that can be used to efficiently stream out
content through HTTP connections. Instances of those classes can be associated with
entity enclosing requests such as POST and PUT
in order to enclose entity content into outgoing HTTP requests. HttpClient provides
several classes for most common data containers such as string, byte array, input
stream, and file: StringEntity,
ByteArrayEntity,
InputStreamEntity, and
FileEntity.
Please note that InputStreamEntity is not repeatable, because it
can only read from the underlying data stream once. Generally it is recommended to
implement a custom HttpEntity class which is
self-contained instead of using the generic InputStreamEntity.
FileEntity can be a good starting point.
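For instance, a sketch enclosing the content of a file into a POST request (the file name, content type, and URI are arbitrary):
File file = new File("somefile.txt");
FileEntity entity = new FileEntity(file, "text/plain; charset=\"UTF-8\"");
HttpPost httppost = new HttpPost("http://localhost/action.do");
httppost.setEntity(entity);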
Dynamic content entities
Often HTTP entities need to be generated dynamically based on a particular
execution context. HttpClient provides support for dynamic entities by means of the
EntityTemplate class and the
ContentProducer interface. Content producers
are objects which produce their content on demand, by writing it out to an
output stream. They are expected to be able to produce their content every time
they are requested to do so. So entities created with
EntityTemplate are generally self-contained and
repeatable.
ContentProducer cp = new ContentProducer() {
    public void writeTo(OutputStream outstream) throws IOException {
        Writer writer = new OutputStreamWriter(outstream, "UTF-8");
        writer.write("<response>");
        writer.write("  <content>");
        writer.write("    important stuff");
        writer.write("  </content>");
        writer.write("</response>");
        writer.flush();
    }
};
HttpEntity entity = new EntityTemplate(cp);
HttpPost httppost = new HttpPost("http://localhost/handler.do");
httppost.setEntity(entity);
HTML forms
Many applications frequently need to simulate the process of submitting an
HTML form, for instance, in order to log in to a web application or submit input
data. HttpClient provides the special entity class
UrlEncodedFormEntity to facilitate the
process.
List<NameValuePair> formparams = new ArrayList<NameValuePair>();
formparams.add(new BasicNameValuePair("param1", "value1"));
formparams.add(new BasicNameValuePair("param2", "value2"));
UrlEncodedFormEntity entity = new UrlEncodedFormEntity(formparams, "UTF-8");
HttpPost httppost = new HttpPost("http://localhost/handler.do");
httppost.setEntity(entity);
This UrlEncodedFormEntity instance will use
so-called URL encoding to encode the parameters and produce the following
content:
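param1=value1&param2=value2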
Content chunking
Generally it is recommended to let HttpClient choose the most appropriate
transfer encoding based on the properties of the HTTP message being transferred.
It is possible, however, to inform HttpClient that the chunk coding is preferred
by setting HttpEntity#setChunked() to true. Please note
that HttpClient will use this flag as a hint only. This value will be ignored
when using HTTP protocol versions that do not support chunk coding, such as
HTTP/1.0.
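A sketch of requesting chunk coding for an outgoing entity (the message text and URI are arbitrary):
StringEntity entity = new StringEntity("important message", "UTF-8");
entity.setChunked(true);
HttpPost httppost = new HttpPost("http://localhost/action.do");
httppost.setEntity(entity);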
Response handlers
The simplest and the most convenient way to handle responses is by using the
ResponseHandler interface. This approach completely
relieves the user from having to worry about connection management. When using a
ResponseHandler, HttpClient will automatically
take care of ensuring release of the connection back to the connection manager
regardless of whether the request execution succeeds or causes an exception.
HttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet("http://localhost/");
ResponseHandler<byte[]> handler = new ResponseHandler<byte[]>() {
public byte[] handleResponse(
HttpResponse response) throws ClientProtocolException, IOException {
HttpEntity entity = response.getEntity();
if (entity != null) {
return EntityUtils.toByteArray(entity);
} else {
return null;
}
}
};
byte[] response = httpclient.execute(httpget, handler);
HTTP execution context
Originally HTTP was designed as a stateless, request-response oriented protocol.
However, real world applications often need to be able to persist state information
through several logically related request-response exchanges. In order to enable
applications to maintain a processing state HttpClient allows HTTP requests to be
executed within a particular execution context, referred to as HTTP context. Multiple
logically related requests can participate in a logical session if the same context is
reused between consecutive requests. An HTTP context functions similarly to a
java.util.Map<String, Object>: it is
simply a collection of arbitrary named values. Applications can populate context
attributes prior to a request execution or examine the context after the execution has
been completed.
In the course of HTTP request execution HttpClient adds the following attributes to
the execution context:
'http.connection':
HttpConnection instance representing the
actual connection to the target server.
'http.target_host':
HttpHost instance representing the connection
target.
'http.proxy_host':
HttpHost instance representing the connection
proxy, if used.
'http.request':
HttpRequest instance representing the
actual HTTP request.
'http.response':
HttpResponse instance representing the
actual HTTP response.
'http.request_sent':
java.lang.Boolean object representing the flag
indicating whether the actual request has been fully transmitted to the
connection target.
For instance, in order to determine the final redirect target, one can examine the
value of the http.target_host attribute after the request
execution:
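A sketch of this technique (the URI is arbitrary):
DefaultHttpClient httpclient = new DefaultHttpClient();
HttpContext localContext = new BasicHttpContext();
HttpGet httpget = new HttpGet("http://www.google.com/");
HttpResponse response = httpclient.execute(httpget, localContext);
HttpEntity entity = response.getEntity();
if (entity != null) {
    EntityUtils.consume(entity);
}
// The attribute holds the host of the final, possibly redirected, target
HttpHost target = (HttpHost) localContext.getAttribute(
    ExecutionContext.HTTP_TARGET_HOST);
System.out.println("Final target: " + target);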
Exception handling
HttpClient can throw two types of exceptions:
java.io.IOException in case of an I/O failure such as a
socket timeout or a socket reset, and HttpException that
signals an HTTP failure such as a violation of the HTTP protocol. Usually I/O errors are
considered non-fatal and recoverable, whereas HTTP protocol errors are considered fatal
and cannot be automatically recovered from.
HTTP transport safety
It is important to understand that the HTTP protocol is not well suited for all
types of applications. HTTP is a simple request/response oriented protocol which was
initially designed to support static or dynamically generated content retrieval. It
has never been intended to support transactional operations. For instance, the HTTP
server will consider its part of the contract fulfilled if it succeeds in receiving
and processing the request, generating a response and sending a status code back to
the client. The server will make no attempts to roll back the transaction if the
client fails to receive the response in its entirety due to a read timeout, a
request cancellation or a system crash. If the client decides to retry the same
request, the server will inevitably end up executing the same transaction more than
once. In some cases this may lead to application data corruption or inconsistent
application state.
Even though HTTP has never been designed to support transactional processing, it
can still be used as a transport protocol for mission critical applications provided
certain conditions are met. To ensure HTTP transport layer safety the system must
ensure the idempotency of HTTP methods on the application layer.
Idempotent methods
The HTTP/1.1 specification defines an idempotent method as follows:
Methods can also have the property of "idempotence" in
that (aside from error or expiration issues) the side-effects of N > 0
identical requests is the same as for a single request
In other words, the application ought to ensure that it is prepared to deal with
the implications of multiple executions of the same method. This can be achieved, for
instance, by providing a unique transaction id and by other means of avoiding
execution of the same logical operation.
Please note that this problem is not specific to HttpClient. Browser-based
applications are subject to exactly the same issues related to non-idempotent
HTTP methods.
HttpClient assumes non-entity enclosing methods such as GET and
HEAD to be idempotent and entity enclosing methods such as
POST and PUT not to be.
Automatic exception recovery
By default HttpClient attempts to automatically recover from I/O exceptions. The
default auto-recovery mechanism is limited to just a few exceptions that are known
to be safe.
HttpClient will make no attempt to recover from any logical or HTTP
protocol errors (those derived from
HttpException class).
HttpClient will automatically retry those methods that are assumed to be
idempotent.
HttpClient will automatically retry those methods that fail with a
transport exception while the HTTP request is still being transmitted to the
target server (i.e. the request has not been fully transmitted to the
server).
HttpClient will automatically retry those methods that have been fully
transmitted to the server, but the server failed to respond with an HTTP
status code (the server simply drops the connection without sending anything
back). In this case it is assumed that the request has not been processed by
the server and the application state has not changed. If this assumption may
not hold true for the web server your application is targeting it is highly
recommended to provide a custom exception handler.
Request retry handler
In order to enable a custom exception recovery mechanism one should provide an
implementation of the HttpRequestRetryHandler
interface.
DefaultHttpClient httpclient = new DefaultHttpClient();

HttpRequestRetryHandler myRetryHandler = new HttpRequestRetryHandler() {

    public boolean retryRequest(
            IOException exception,
            int executionCount,
            HttpContext context) {
        if (executionCount >= 5) {
            // Do not retry if over max retry count
            return false;
        }
        if (exception instanceof NoHttpResponseException) {
            // Retry if the server dropped connection on us
            return true;
        }
        if (exception instanceof SSLHandshakeException) {
            // Do not retry on SSL handshake exception
            return false;
        }
        HttpRequest request = (HttpRequest) context.getAttribute(
                ExecutionContext.HTTP_REQUEST);
        boolean idempotent = !(request instanceof HttpEntityEnclosingRequest);
        if (idempotent) {
            // Retry if the request is considered idempotent
            return true;
        }
        return false;
    }

};

httpclient.setHttpRequestRetryHandler(myRetryHandler);
Aborting requests
In some situations HTTP request execution fails to complete within the expected time
frame due to high load on the target server or too many concurrent requests issued on
the client side. In such cases it may be necessary to terminate the request prematurely
and unblock the execution thread blocked in an I/O operation. HTTP requests being
executed by HttpClient can be aborted at any stage of execution by invoking the
HttpUriRequest#abort() method. This method is thread-safe
and can be called from any thread. When an HTTP request is aborted, its execution thread,
even if blocked in an I/O operation, is guaranteed to unblock by throwing an
InterruptedIOException.
HTTP protocol interceptors
An HTTP protocol interceptor is a routine that implements a specific aspect of the HTTP
protocol. Usually protocol interceptors are expected to act upon one specific header or
a group of related headers of the incoming message or populate the outgoing message with
one specific header or a group of related headers. Protocol interceptors can also
manipulate content entities enclosed with messages, transparent content compression /
decompression being a good example. Usually this is accomplished by using the
'Decorator' pattern where a wrapper entity class is used to decorate the original
entity. Several protocol interceptors can be combined to form one logical unit.
Protocol interceptors can collaborate by sharing information - such as a processing
state - through the HTTP execution context. Protocol interceptors can use HTTP context
to store a processing state for one request or several consecutive requests.
Usually the order in which interceptors are executed should not matter as long as they
do not depend on a particular state of the execution context. If protocol interceptors
have interdependencies and therefore must be executed in a particular order, they should
be added to the protocol processor in the same sequence as their expected execution
order.
Protocol interceptors must be implemented as thread-safe. Similarly to servlets,
protocol interceptors should not use instance variables unless access to those variables
is synchronized.
This is an example of how local context can be used to persist a processing state
between consecutive requests:
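A sketch of such a setup: a shared AtomicInteger counter is kept in a local context, and a request interceptor reads and increments it for every request executed within that context (the "Count" header and the URI are arbitrary):
DefaultHttpClient httpclient = new DefaultHttpClient();
HttpContext localContext = new BasicHttpContext();
AtomicInteger count = new AtomicInteger(1);
localContext.setAttribute("count", count);

httpclient.addRequestInterceptor(new HttpRequestInterceptor() {

    public void process(
            final HttpRequest request,
            final HttpContext context) throws HttpException, IOException {
        AtomicInteger count = (AtomicInteger) context.getAttribute("count");
        request.addHeader("Count", Integer.toString(count.getAndIncrement()));
    }

});

HttpGet httpget = new HttpGet("http://localhost/");
for (int i = 0; i < 10; i++) {
    HttpResponse response = httpclient.execute(httpget, localContext);
    HttpEntity entity = response.getEntity();
    if (entity != null) {
        EntityUtils.consume(entity);
    }
}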
HTTP parameters
The HttpParams interface represents a collection of immutable values that define the
runtime behavior of a component. In many ways HttpParams is
similar to HttpContext. The main distinction between the
two lies in their use at runtime. Both interfaces represent a collection of objects that
are organized as a map of keys to object values, but serve distinct purposes:
HttpParams is intended to contain simple
objects: integers, doubles, strings, collections and objects that remain
immutable at runtime.
HttpParams is expected to be used in the 'write
once - read many' mode. HttpContext is intended
to contain complex objects that are very likely to mutate in the course of HTTP
message processing.
The purpose of HttpParams is to define a
behavior of other components. Usually each complex component has its own
HttpParams object. The purpose of
HttpContext is to represent an execution
state of an HTTP process. Usually the same execution context is shared among
many collaborating objects.
Parameter hierarchies
In the course of HTTP request execution HttpParams
of the HttpRequest object are linked together with
HttpParams of the
HttpClient instance used to execute the request.
This enables parameters set at the HTTP request level to take precedence over
HttpParams set at the HTTP client level. The
recommended practice is to set common parameters shared by all HTTP requests at the
HTTP client level and selectively override specific parameters at the HTTP request
level.
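A sketch of this layering (the parameter choices and URI are arbitrary):
DefaultHttpClient httpclient = new DefaultHttpClient();
// Defaults shared by all requests executed with this client
httpclient.getParams().setParameter(CoreProtocolPNames.PROTOCOL_VERSION,
    HttpVersion.HTTP_1_0);
httpclient.getParams().setParameter(CoreProtocolPNames.HTTP_CONTENT_CHARSET,
    "UTF-8");

HttpGet httpget = new HttpGet("http://www.google.com/");
// Request level settings take precedence for this request only
httpget.getParams().setParameter(CoreProtocolPNames.PROTOCOL_VERSION,
    HttpVersion.HTTP_1_1);
httpget.getParams().setParameter(CoreProtocolPNames.USE_EXPECT_CONTINUE,
    Boolean.FALSE);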
HTTP parameters beans
HttpParams interface allows for a great deal of
flexibility in handling configuration of components. Most importantly, new
parameters can be introduced without affecting binary compatibility with older
versions. However, HttpParams also has a certain
disadvantage compared to regular Java beans:
HttpParams cannot be assembled using a DI
framework. To mitigate the limitation, HttpClient includes a number of bean classes
that can be used in order to initialize HttpParams
objects using standard Java bean conventions.
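A sketch using HttpProtocolParamBean (the chosen values are arbitrary):
HttpParams params = new BasicHttpParams();
HttpProtocolParamBean paramsBean = new HttpProtocolParamBean(params);
paramsBean.setVersion(HttpVersion.HTTP_1_1);
paramsBean.setContentCharset("UTF-8");
paramsBean.setUseExpectContinue(true);

System.out.println(params.getParameter(CoreProtocolPNames.PROTOCOL_VERSION));
System.out.println(params.getParameter(CoreProtocolPNames.HTTP_CONTENT_CHARSET));
System.out.println(params.getParameter(CoreProtocolPNames.USE_EXPECT_CONTINUE));
System.out.println(params.getParameter(CoreProtocolPNames.USER_AGENT));
stdout >
HTTP/1.1
UTF-8
true
null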
HTTP request execution parameters
These are parameters that can impact the process of request execution:
'http.protocol.version':
defines HTTP protocol version used if not set explicitly on the request
object. This parameter expects a value of type
ProtocolVersion. If this parameter is not
set HTTP/1.1 will be used.
'http.protocol.element-charset':
defines the charset to be used for encoding HTTP protocol elements. This
parameter expects a value of type java.lang.String.
If this parameter is not set US-ASCII will be
used.
'http.protocol.content-charset':
defines the charset to be used by default for content body coding. This
parameter expects a value of type java.lang.String.
If this parameter is not set ISO-8859-1 will be
used.
'http.useragent':
defines the content of the User-Agent header. This
parameter expects a value of type java.lang.String.
If this parameter is not set, HttpClient will automatically generate a value
for it.
'http.protocol.strict-transfer-encoding':
defines whether responses with an invalid
Transfer-Encoding header should be rejected. This
parameter expects a value of type java.lang.Boolean.
If this parameter is not set invalid Transfer-Encoding
values will be ignored.
'http.protocol.expect-continue':
activates Expect: 100-Continue handshake for the entity
enclosing methods. The purpose of the Expect:
100-Continue handshake is to allow the client that is sending
a request message with a request body to determine if the origin server is
willing to accept the request (based on the request headers) before the
client sends the request body. The use of the Expect:
100-continue handshake can result in a noticeable performance
improvement for entity enclosing requests (such as POST
and PUT) that require the target server's authentication.
Expect: 100-continue handshake should be used with
caution, as it may cause problems with HTTP servers and proxies that do not
support HTTP/1.1 protocol. This parameter expects a value of type
java.lang.Boolean. If this parameter is not set
HttpClient will attempt to use the handshake.
'http.protocol.wait-for-continue':
defines the maximum period of time in milliseconds the client should spend
waiting for a 100-continue response. This parameter
expects a value of type java.lang.Integer. If this
parameter is not set HttpClient will wait 3 seconds for a confirmation
before resuming the transmission of the request body.
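For instance, a sketch setting a few of these parameters at the client level through their CoreProtocolPNames constants (the chosen values are arbitrary):
DefaultHttpClient httpclient = new DefaultHttpClient();
HttpParams params = httpclient.getParams();
params.setParameter(CoreProtocolPNames.USER_AGENT, "MyAgent/1.0");
params.setParameter(CoreProtocolPNames.USE_EXPECT_CONTINUE, Boolean.FALSE);
params.setParameter(CoreProtocolPNames.WAIT_FOR_CONTINUE, Integer.valueOf(2000));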