2010-05-08 05:22:29 -04:00
|
|
|
<?xml version="1.0" encoding="UTF-8"?>
|
|
|
|
<!DOCTYPE preface PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
|
|
|
|
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
|
|
|
|
<!--
|
|
|
|
====================================================================
|
|
|
|
Licensed to the Apache Software Foundation (ASF) under one
|
|
|
|
or more contributor license agreements. See the NOTICE file
|
|
|
|
distributed with this work for additional information
|
|
|
|
regarding copyright ownership. The ASF licenses this file
|
|
|
|
to you under the Apache License, Version 2.0 (the
|
|
|
|
"License"); you may not use this file except in compliance
|
|
|
|
with the License. You may obtain a copy of the License at
|
|
|
|
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
|
|
|
|
Unless required by applicable law or agreed to in writing,
|
|
|
|
software distributed under the License is distributed on an
|
|
|
|
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
|
|
|
KIND, either express or implied. See the License for the
|
|
|
|
specific language governing permissions and limitations
|
|
|
|
under the License.
|
|
|
|
====================================================================
|
|
|
|
-->
|
|
|
|
<chapter id="caching">
|
|
|
|
<title>HTTP Caching</title>
|
|
|
|
|
|
|
|
<section id="generalconcepts">
|
|
|
|
<title>General Concepts</title>
|
|
|
|
|
2011-01-10 14:02:33 -05:00
|
|
|
<para>HttpClient Cache provides an HTTP/1.1-compliant caching layer to be
|
|
|
|
used with HttpClient--the Java equivalent of a browser cache. The
|
2013-08-14 14:41:58 -04:00
|
|
|
implementation follows the Chain of Responsibility design pattern, where the
|
|
|
|
caching HttpClient implementation can serve a drop-in replacement for
|
|
|
|
the default non-caching HttpClient implementation; requests that can be
|
|
|
|
satisfied entirely from the cache will not result in actual origin requests.
|
|
|
|
Stale cache entries are automatically validated with the origin where possible,
|
|
|
|
using conditional GETs and the If-Modified-Since and/or If-None-Match request
|
|
|
|
headers.
|
2011-01-10 14:02:33 -05:00
|
|
|
</para>
|
|
|
|
|
|
|
|
<para>
|
|
|
|
HTTP/1.1 caching in general is designed to be <emphasis>semantically
|
|
|
|
transparent</emphasis>; that is, a cache should not change the meaning of
|
|
|
|
the request-response exchange between client and server. As such, it should
|
2013-08-14 14:41:58 -04:00
|
|
|
be safe to drop a caching HttpClient into an existing compliant client-server
|
2011-01-10 14:02:33 -05:00
|
|
|
relationship. Although the caching module is part of the client from an
|
|
|
|
HTTP protocol point of view, the implementation aims to be compatible with
|
|
|
|
the requirements placed on a transparent caching proxy.
|
|
|
|
</para>
|
|
|
|
|
2013-08-14 14:41:58 -04:00
|
|
|
<para>Finally, caching HttpClient includes support the Cache-Control
|
2011-01-10 14:02:33 -05:00
|
|
|
extensions specified by RFC 5861 (stale-if-error and stale-while-revalidate).
|
|
|
|
</para>
|
2010-05-08 05:22:29 -04:00
|
|
|
|
2013-08-14 14:41:58 -04:00
|
|
|
<para>When caching HttpClient executes a request, it goes through the
|
2010-05-08 05:22:29 -04:00
|
|
|
following flow:</para>
|
|
|
|
|
|
|
|
<orderedlist>
|
|
|
|
<listitem>
|
|
|
|
<para>Check the request for basic compliance with the HTTP 1.1
|
|
|
|
protocol and attempt to correct the request.</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>Flush any cache entries which would be invalidated by this
|
|
|
|
request.</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>Determine if the current request would be servable from cache.
|
|
|
|
If not, directly pass through the request to the origin server and
|
|
|
|
return the response, after caching it if appropriate.</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>If it was a a cache-servable request, it will attempt to read it
|
|
|
|
from the cache. If it is not in the cache, call the origin server and
|
|
|
|
cache the response, if appropriate.</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>If the cached response is suitable to be served as a response,
|
|
|
|
construct a BasicHttpResponse containing a ByteArrayEntity and return
|
|
|
|
it. Otherwise, attempt to revalidate the cache entry against the
|
|
|
|
origin server.</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>In the case of a cached response which cannot be revalidated,
|
|
|
|
call the origin server and cache the response, if appropriate.</para>
|
|
|
|
</listitem>
|
|
|
|
</orderedlist>
|
|
|
|
|
2013-08-14 14:41:58 -04:00
|
|
|
<para>When caching HttpClient receives a response, it goes through the
|
2010-05-08 05:22:29 -04:00
|
|
|
following flow:</para>
|
|
|
|
|
|
|
|
<orderedlist>
|
|
|
|
<listitem>
|
2011-01-10 14:02:33 -05:00
|
|
|
<para>Examining the response for protocol compliance</para>
|
2010-05-08 05:22:29 -04:00
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>Determine whether the response is cacheable</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>If it is cacheable, attempt to read up to the maximum size
|
|
|
|
allowed in the configuration and store it in the cache.</para>
|
|
|
|
</listitem>
|
|
|
|
|
|
|
|
<listitem>
|
|
|
|
<para>If the response is too large for the cache, reconstruct the
|
|
|
|
partially consumed response and return it directly without caching
|
|
|
|
it.</para>
|
|
|
|
</listitem>
|
|
|
|
</orderedlist>
|
|
|
|
|
2013-08-14 14:41:58 -04:00
|
|
|
<para>It is important to note that caching HttpClient is not, itself,
|
|
|
|
a different implementation of HttpClient, but that it works by inserting
|
|
|
|
itself as an additonal processing component to the request execution
|
|
|
|
pipeline.</para>
|
2010-05-08 05:22:29 -04:00
|
|
|
</section>
|
|
|
|
|
|
|
|
<section id="rfc2616compliance">
|
|
|
|
<title>RFC-2616 Compliance</title>
|
|
|
|
|
2012-08-22 14:31:52 -04:00
|
|
|
<para>We believe HttpClient Cache is <emphasis>unconditionally
|
2011-01-10 14:02:33 -05:00
|
|
|
compliant</emphasis> with <ulink
|
2010-05-08 05:22:29 -04:00
|
|
|
url="http://www.ietf.org/rfc/rfc2616.txt">RFC-2616</ulink>. That is,
|
2012-08-22 14:31:52 -04:00
|
|
|
wherever the specification indicates MUST, MUST NOT, SHOULD, or SHOULD NOT
|
|
|
|
for HTTP caches, the caching layer attempts to behave in a way that satisfies
|
|
|
|
those requirements. This means the caching module won't produce incorrect
|
|
|
|
behavior when you drop it in. </para>
|
2010-05-08 05:22:29 -04:00
|
|
|
</section>
|
|
|
|
|
|
|
|
<section>
|
|
|
|
<title>Example Usage</title>
|
|
|
|
|
2013-08-14 14:41:58 -04:00
|
|
|
<para>This is a simple example of how to set up a basic caching HttpClient.
|
2010-05-08 05:22:29 -04:00
|
|
|
As configured, it will store a maximum of 1000 cached objects, each of
|
|
|
|
which may have a maximum body size of 8192 bytes. The numbers selected
|
|
|
|
here are for example only and not intended to be prescriptive or
|
|
|
|
considered as recommendations.</para>
|
|
|
|
|
2010-11-03 12:09:07 -04:00
|
|
|
<programlisting><![CDATA[
|
2013-08-14 14:41:58 -04:00
|
|
|
CacheConfig cacheConfig = CacheConfig.custom()
|
|
|
|
.setMaxCacheEntries(1000)
|
|
|
|
.setMaxObjectSize(8192)
|
|
|
|
.build();
|
|
|
|
RequestConfig requestConfig = RequestConfig.custom()
|
|
|
|
.setConnectTimeout(30000)
|
|
|
|
.setSocketTimeout(30000)
|
|
|
|
.build();
|
2014-02-04 05:21:02 -05:00
|
|
|
CloseableHttpClient cachingClient = CachingHttpClients.custom()
|
2013-08-14 14:41:58 -04:00
|
|
|
.setCacheConfig(cacheConfig)
|
|
|
|
.setDefaultRequestConfig(requestConfig)
|
|
|
|
.build();
|
|
|
|
|
|
|
|
HttpCacheContext context = HttpCacheContext.create();
|
2010-11-03 12:09:07 -04:00
|
|
|
HttpGet httpget = new HttpGet("http://www.mydomain.com/content/");
|
2013-08-14 14:41:58 -04:00
|
|
|
CloseableHttpResponse response = cachingClient.execute(httpget, context);
|
|
|
|
try {
|
|
|
|
CacheResponseStatus responseStatus = context.getCacheResponseStatus();
|
|
|
|
switch (responseStatus) {
|
|
|
|
case CACHE_HIT:
|
|
|
|
System.out.println("A response was generated from the cache with " +
|
|
|
|
"no requests sent upstream");
|
|
|
|
break;
|
|
|
|
case CACHE_MODULE_RESPONSE:
|
|
|
|
System.out.println("The response was generated directly by the " +
|
|
|
|
"caching module");
|
|
|
|
break;
|
|
|
|
case CACHE_MISS:
|
|
|
|
System.out.println("The response came from an upstream server");
|
|
|
|
break;
|
|
|
|
case VALIDATED:
|
|
|
|
System.out.println("The response was generated from the cache " +
|
|
|
|
"after validating the entry with the origin server");
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
} finally {
|
|
|
|
response.close();
|
2010-11-03 12:09:07 -04:00
|
|
|
}
|
|
|
|
]]>
|
|
|
|
</programlisting>
|
2010-05-08 05:22:29 -04:00
|
|
|
</section>
|
2011-01-10 14:02:33 -05:00
|
|
|
|
|
|
|
<section id="configuration">
|
|
|
|
<title>Configuration</title>
|
|
|
|
|
2013-08-14 14:41:58 -04:00
|
|
|
<para>The caching HttpClient inherits all configuration options and parameters
|
|
|
|
of the default non-caching implementation (this includes setting options like
|
|
|
|
timeouts and connection pool sizes). For caching-specific configuration, you can
|
|
|
|
provide a <classname>CacheConfig</classname> instance to customize behavior
|
|
|
|
across the following areas:</para>
|
2011-01-10 14:02:33 -05:00
|
|
|
|
|
|
|
<para><emphasis>Cache size.</emphasis> If the backend storage supports these limits,
|
|
|
|
you can specify the maximum number of cache entries as well as the maximum cacheable
|
|
|
|
response body size.</para>
|
|
|
|
|
|
|
|
|
|
|
|
<para><emphasis>Public/private caching.</emphasis> By default, the caching module
|
|
|
|
considers itself to be a shared (public) cache, and will not, for example, cache
|
|
|
|
responses to requests with Authorization headers or responses marked with
|
|
|
|
"Cache-Control: private". If, however, the cache is only going to be used by one
|
|
|
|
logical "user" (behaving similarly to a browser cache), then you will want to turn
|
|
|
|
off the shared cache setting.</para>
|
|
|
|
|
|
|
|
<para><emphasis>Heuristic caching.</emphasis>Per RFC2616, a cache MAY cache
|
|
|
|
certain cache entries even if no explicit cache control headers are set by the
|
|
|
|
origin. This behavior is off by default, but you may want to turn this on if you
|
|
|
|
are working with an origin that doesn't set proper headers but where you still
|
|
|
|
want to cache the responses. You will want to enable heuristic caching, then
|
|
|
|
specify either a default freshness lifetime and/or a fraction of the time since
|
|
|
|
the resource was last modified. See Sections 13.2.2 and 13.2.4 of the HTTP/1.1
|
|
|
|
RFC for more details on heuristic caching.</para>
|
|
|
|
|
|
|
|
<para><emphasis>Background validation.</emphasis> The cache module supports the
|
|
|
|
stale-while-revalidate directive of RFC5861, which allows certain cache entry
|
|
|
|
revalidations to happen in the background. You may want to tweak the settings
|
|
|
|
for the minimum and maximum number of background worker threads, as well as the
|
|
|
|
maximum time they can be idle before being reclaimed. You can also control the
|
|
|
|
size of the queue used for revalidations when there aren't enough workers to
|
|
|
|
keep up with demand.</para>
|
|
|
|
</section>
|
|
|
|
|
|
|
|
<section id="storage">
|
|
|
|
<title>Storage Backends</title>
|
|
|
|
|
2013-08-14 14:41:58 -04:00
|
|
|
<para>The default implementation of caching HttpClient stores cache entries and
|
2011-01-10 14:02:33 -05:00
|
|
|
cached response bodies in memory in the JVM of your application. While this
|
|
|
|
offers high performance, it may not be appropriate for your application due to
|
|
|
|
the limitation on size or because the cache entries are ephemeral and don't
|
|
|
|
survive an application restart. The current release includes support for storing
|
2012-08-22 14:31:52 -04:00
|
|
|
cache entries using EhCache and memcached implementations, which allow for
|
2011-01-10 14:02:33 -05:00
|
|
|
spilling cache entries to disk or storing them in an external process.</para>
|
|
|
|
|
|
|
|
<para>If none of those options are suitable for your application, it is
|
|
|
|
possible to provide your own storage backend by implementing the HttpCacheStorage
|
2013-08-14 14:41:58 -04:00
|
|
|
interface and then supplying that to caching HttpClient at construction time. In
|
2011-01-10 14:02:33 -05:00
|
|
|
this case, the cache entries will be stored using your scheme but you will get to
|
|
|
|
reuse all of the logic surrounding HTTP/1.1 compliance and cache handling.
|
|
|
|
Generally speaking, it should be possible to create an HttpCacheStorage
|
|
|
|
implementation out of anything that supports a key/value store (similar to the
|
|
|
|
Java Map interface) with the ability to apply atomic updates.</para>
|
|
|
|
|
2013-08-14 14:41:58 -04:00
|
|
|
<para>Finally, with some extra efforts it's entirely possible to set up
|
|
|
|
a multi-tier caching hierarchy; for example, wrapping an in-memory caching
|
|
|
|
HttpClient around one that stores cache entries on disk or remotely in memcached,
|
|
|
|
following a pattern similar to virtual memory, L1/L2 processor caches, etc.
|
2011-01-10 14:02:33 -05:00
|
|
|
</para>
|
|
|
|
</section>
|
2010-05-08 05:22:29 -04:00
|
|
|
</chapter>
|