diff --git a/pom.xml b/pom.xml index 0d7db6c0b..d5b09fdb1 100644 --- a/pom.xml +++ b/pom.xml @@ -193,7 +193,6 @@ com.agilejava.docbkx docbkx-maven-plugin - 2.0.8 diff --git a/src/docbkx/advanced.xml b/src/docbkx/advanced.xml new file mode 100644 index 000000000..e984bff93 --- /dev/null +++ b/src/docbkx/advanced.xml @@ -0,0 +1,223 @@ + + + + + Advanced topics +
+ Custom client connections + In certain situations it may be necessary to customize the way HTTP messages get + transmitted across the wire beyond what is possible using HTTP parameters in + order to be able to deal with non-standard, non-compliant behaviours. For instance, for web + crawlers it may be necessary to force HttpClient into accepting malformed response heads + in order to salvage the content of the messages. + Usually the process of plugging in a custom message parser or a custom connection + implementation involves several steps: + + + Provide a custom LineParser / + LineFormatter interface implementation. + Implement message parsing / formatting logic as required. + + + + Provide a custom OperatedClientConnection + implementation. Replace default request / response parsers, request / response + formatters with custom ones as required. Implement different message writing / + reading code if necessary. + + + + Provide a custom ClientConnectionOperator + interface implementation in order to create connections of the new class. Implement + different socket initialization code if necessary. + + + + Provide a custom ClientConnectionManager + interface implementation in order to create connection operators of the new + class. + + + +
+
+ Stateful HTTP connections + While the HTTP specification assumes that session state information is always embedded in + HTTP messages in the form of HTTP cookies and therefore HTTP connections are always + stateless, this assumption does not always hold true in real life. There are cases when + HTTP connections are created with a particular user identity or within a particular + security context and therefore cannot be shared with other users and can be reused by + the same user only. Examples of such stateful HTTP connections are + NTLM authenticated connections and SSL connections with client + certificate authentication.
+ User token handler + HttpClient relies on the UserTokenHandler interface to + determine if the given execution context is user specific or not. The token object + returned by this handler is expected to uniquely identify the current user if the + context is user specific or to be null if the context does not contain any resources + or details specific to the current user. The user token will be used to ensure that + user specific resources will not be shared with or reused by other users. + The default implementation of the UserTokenHandler + interface uses an instance of the Principal class to represent a state object for HTTP + connections, if it can be obtained from the given execution context. + DefaultUserTokenHandler will use the user principal of + connection based authentication schemes such as NTLM or that of + the SSL session with client authentication turned on. If both are unavailable, a null + token will be returned. + Users can provide a custom implementation if the default one does not satisfy + their needs: +
+
+ User token and execution context + In the course of HTTP request execution HttpClient adds the following user + identity related objects to the execution context: + + + + 'http.user-token': + Object instance representing the actual user identity, usually + expected to be an instance of the Principal + interface + + + + One can find out whether or not the connection used to execute the request was + stateful by examining the content of the local HTTP context after the request has + been executed. +
+ Persistent stateful connections + Please note that persistent connections that carry a state object can be reused + only if the same state object is bound to the execution context when requests + are executed. So, it is really important to ensure that either the same context is + reused for execution of subsequent HTTP requests by the same user or the user + token is bound to the context prior to request execution. +
+
+ +
+ +
diff --git a/src/docbkx/authentication.xml b/src/docbkx/authentication.xml new file mode 100644 index 000000000..3ffb8900a --- /dev/null +++ b/src/docbkx/authentication.xml @@ -0,0 +1,318 @@ + + + + + HTTP authentication + HttpClient provides full support for authentication schemes defined by the HTTP standard + specification. HttpClient's authentication framework can also be extended to support + non-standard authentication schemes such as NTLM and + SPNEGO. +
+ User credentials + Any process of user authentication requires a set of credentials that can be used to + establish user identity. In the simplest form user credentials can be just a user name / + password pair. UsernamePasswordCredentials represents a set of + credentials consisting of a security principal and a password in clear text. This + implementation is sufficient for standard authentication schemes defined by the HTTP + standard specification. + + NTCredentials is a Microsoft Windows specific implementation + that includes, in addition to the user name / password pair, a set of additional Windows + specific attributes such as the name of the user domain, as in a Microsoft Windows network + the same user can belong to multiple domains with a different set of + authorizations. + +
+
+ Authentication schemes + The AuthScheme interface represents an abstract + challenge-response oriented authentication scheme. An authentication scheme is expected + to support the following functions: + + + Parse and process the challenge sent by the target server in response to + request for a protected resource. + + + Provide properties of the processed challenge: the authentication scheme type + and its parameters, such as the realm this authentication scheme is applicable to, + if available + + + Generate authorization string for the given set of credentials and the HTTP + request in response to the actual authorization challenge. + + + Please note that authentication schemes may be stateful, involving a series of + challenge-response exchanges. + HttpClient ships with several AuthScheme + implementations: + + + + Basic: + Basic authentication scheme as defined in RFC 2617. This authentication + scheme is insecure, as the credentials are transmitted in clear text. + Despite its insecurity Basic authentication scheme is perfectly adequate if + used in combination with the TLS/SSL encryption. + + + Digest: + Digest authentication scheme as defined in RFC 2617. Digest authentication + scheme is significantly more secure than Basic and can be a good choice for + those applications that do not want the overhead of full transport security + through TLS/SSL encryption. + + + NTLM: + NTLM is a proprietary authentication scheme developed by Microsoft and + optimized for Windows platforms. NTLM is believed to be more secure than + Digest. This scheme is supported only partially and requires an external + NTLM engine. For details please refer to the + NTLM_SUPPORT.txt document included with HttpClient + distributions. + + + +
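The remark that Basic transmits credentials in clear text can be illustrated with a minimal, JDK-only sketch. It does not use HttpClient's BasicScheme; the class name BasicAuthSketch and its methods are hypothetical and exist only for this illustration. It builds the Basic authorization value described in RFC 2617 and shows that the encoding is trivially reversible.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative only: builds and decodes a Basic authorization value with the JDK. */
final class BasicAuthSketch {

    /** Encodes "user:password" in the given charset, then Base64-encodes the bytes. */
    static String basicHeaderValue(String user, String password, Charset charset) {
        byte[] raw = (user + ":" + password).getBytes(charset);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    /** Reverses the encoding; no secret is needed, so Basic alone gives no confidentiality. */
    static String decode(String headerValue, Charset charset) {
        String token = headerValue.substring("Basic ".length());
        return new String(Base64.getDecoder().decode(token), charset);
    }

    public static void main(String[] args) {
        String value = basicHeaderValue("Aladdin", "open sesame", StandardCharsets.US_ASCII);
        System.out.println("Authorization: " + value);
        System.out.println("Decoded: " + decode(value, StandardCharsets.US_ASCII));
    }
}
```

Anyone who can observe the header can recover the password, which is why Basic is usually paired with TLS/SSL.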
+
+ HTTP authentication parameters + These are parameters that can be used to customize the HTTP authentication process and + behaviour of individual authentication schemes: + + + + 'http.protocol.handle-authentication': + defines whether authentication should be handled automatically. This + parameter expects a value of type java.lang.Boolean. + If this parameter is not set HttpClient will handle authentication + automatically. + + + 'http.auth.credential-charset': + defines the charset to be used when encoding user credentials. This + parameter expects a value of type java.lang.String. If + this parameter is not set US-ASCII will be used. + + + +
+
+ Authentication scheme registry + HttpClient maintains a registry of available authentication schemes using the + AuthSchemeRegistry class. The following schemes are + registered by default: + + + + Basic: + Basic authentication scheme + + + Digest: + Digest authentication scheme + + + + Please note that the NTLM scheme is NOT registered by + default. For details on how to enable NTLM support please refer to + the NTLM_SUPPORT.txt document included with HttpClient + distributions.
+
+ Credentials provider + Credentials providers are intended to maintain a set of user credentials and to be + able to produce user credentials for a particular authentication scope. Authentication + scope consists of a host name, a port number, a realm name and an authentication scheme + name. When registering credentials with the credentials provider one can provide a wild + card (any host, any port, any realm, any scheme) instead of a concrete attribute value. + The credentials provider is then expected to be able to find the closest match for a + particular scope if the direct match cannot be found. + HttpClient can work with any physical representation of a credentials provider that + implements the CredentialsProvider interface. The default + CredentialsProvider implementation called + BasicCredentialsProvider is a simple implementation backed by + a java.util.HashMap. + +
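The closest-match lookup described above can be sketched with plain JDK types. This is not the code of BasicCredentialsProvider or AuthScope: the Scope class, the score weights (host 8, port 4, realm 2, scheme 1) and the use of null as the wild card are all assumptions made for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative closest-match lookup; Scope is a hypothetical stand-in, not HttpClient's AuthScope. */
final class ScopeMatchSketch {

    /** A null field means "any" (wild card). */
    static final class Scope {
        final String host;
        final Integer port;
        final String realm;
        final String scheme;

        Scope(String host, Integer port, String realm, String scheme) {
            this.host = host;
            this.port = port;
            this.realm = realm;
            this.scheme = scheme;
        }
    }

    /** Scores one attribute: 0 for a wild card, weight for an exact match, -1 for a conflict. */
    private static int score(Object registered, Object requested, int weight) {
        if (registered == null || requested == null) {
            return 0;
        }
        return registered.equals(requested) ? weight : -1;
    }

    /** Returns -1 if the scopes conflict; otherwise a higher value means a closer match. */
    static int match(Scope registered, Scope requested) {
        // Assumed weights: the host counts most, the scheme least.
        int[] parts = {
            score(registered.host, requested.host, 8),
            score(registered.port, requested.port, 4),
            score(registered.realm, requested.realm, 2),
            score(registered.scheme, requested.scheme, 1)
        };
        int total = 0;
        for (int part : parts) {
            if (part < 0) {
                return -1;
            }
            total += part;
        }
        return total;
    }

    /** Returns the value registered under the closest matching scope, or null if none matches. */
    static String lookup(Map<Scope, String> store, Scope requested) {
        String best = null;
        int bestScore = -1;
        for (Map.Entry<Scope, String> entry : store.entrySet()) {
            int s = match(entry.getKey(), requested);
            if (s > bestScore) {
                bestScore = s;
                best = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<Scope, String> store = new LinkedHashMap<>();
        store.put(new Scope(null, null, null, null), "fallback credentials");
        store.put(new Scope("example.com", 443, null, null), "example.com credentials");
        System.out.println(lookup(store, new Scope("example.com", 443, "users", "basic")));
        System.out.println(lookup(store, new Scope("other.org", 80, "users", "basic")));
    }
}
```

A fully wild-carded entry acts as a fallback because it never conflicts, while a more specific entry wins whenever it applies.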
+
+ HTTP authentication and execution context + HttpClient relies on the AuthState class to keep track of + detailed information about the state of the authentication process. HttpClient creates + two instances of AuthState in the course of HTTP request + execution: one for target host authentication and another one for proxy authentication. + In case the target server or the proxy require user authentication the respective + AuthState instance will be populated with the + AuthScope, AuthScheme and + Credentials used during the authentication process. + The AuthState can be examined in order to find out what kind of + authentication was requested, whether a matching + AuthScheme implementation was found and whether the + credentials provider managed to find user credentials for the given authentication + scope. + In the course of HTTP request execution HttpClient adds the following authentication + related objects to the execution context: + + + + 'http.authscheme-registry': + AuthSchemeRegistry instance representing the actual + authentication scheme registry. The value of this attribute set in the local + context takes precedence over the default one. + + + 'http.auth.credentials-provider': + CredentialsProvider instance representing the actual + credentials provider. The value of this attribute set in the local context + takes precedence over the default one. + + + 'http.auth.target-scope': + AuthState instance representing the actual target + authentication state. The value of this attribute set in the local context + takes precedence over the default one. + + + 'http.auth.proxy-scope': + AuthState instance representing the actual proxy + authentication state. The value of this attribute set in the local context + takes precedence over the default one. + + + + The local HttpContext object can be used to customize + the HTTP authentication context prior to request execution or examine its state after + the request has been executed: +
+
+ Preemptive authentication + HttpClient does not support preemptive authentication out of the box, because if + misused preemptive authentication can lead to significant + security issues, such as sending user credentials in clear text to an unauthorized third + party. Therefore, users are expected to evaluate potential benefits of preemptive + authentication versus security risks in the context of their specific application + environment and are required to add support for preemptive authentication using standard + HttpClient extension mechanisms such as protocol interceptors. + This is an example of a simple protocol interceptor that preemptively introduces an + instance of BasicScheme to the execution context, if no + authentication has been attempted yet. Please note that this interceptor must be added + to the protocol processing chain before the standard authentication interceptors. +
+
diff --git a/src/docbkx/connmgmt.xml b/src/docbkx/connmgmt.xml new file mode 100644 index 000000000..6e668e51b --- /dev/null +++ b/src/docbkx/connmgmt.xml @@ -0,0 +1,806 @@ + + + + + Connection management + HttpClient has complete control over the process of connection initialization and + termination as well as I/O operations on active connections. However, various aspects of + connection operations can be controlled using a number of parameters.
+ Connection parameters + These are parameters that can influence connection operations: + + + + 'http.socket.timeout': + defines the socket timeout (SO_TIMEOUT) in + milliseconds, which is the timeout for waiting for data (or, put differently, + the maximum period of inactivity between two consecutive data packets). A timeout + value of zero is interpreted as an infinite timeout. This parameter expects + a value of type java.lang.Integer. If this parameter + is not set read operations will not time out (infinite timeout). + + + 'http.tcp.nodelay': + determines whether Nagle's algorithm is to be used. Nagle's algorithm + tries to conserve bandwidth by minimizing the number of segments that are + sent. When applications wish to decrease network latency and increase + performance, they can disable Nagle's algorithm (that is, enable + TCP_NODELAY). Data will be sent earlier, at the cost + of an increase in bandwidth consumption. This parameter expects a value of + type java.lang.Boolean. If this parameter is not set, + TCP_NODELAY will be enabled (no delay). + + + 'http.socket.buffer-size': + determines the size of the internal socket buffer used to buffer data + while receiving / transmitting HTTP messages. This parameter expects a value + of type java.lang.Integer. If this parameter is not + set HttpClient will allocate 8192 byte socket buffers. + + + 'http.socket.linger': + sets SO_LINGER with the specified linger time in + seconds. The maximum timeout value is platform specific. Value 0 implies + that the option is disabled. Value -1 implies that the JRE default is used. + The setting only affects the socket close operation. If this parameter is + not set value -1 (JRE default) will be assumed. + + + 'http.connection.timeout': + determines the timeout in milliseconds until a connection is established. + A timeout value of zero is interpreted as an infinite timeout. This + parameter expects a value of type java.lang.Integer. 
+ If this parameter is not set connect operations will not time out (infinite + timeout). + + + 'http.connection.stalecheck': + determines whether stale connection check is to be used. Disabling stale + connection check may result in a noticeable performance improvement (the + check can cause up to 30 millisecond overhead per request) at the risk of + getting an I/O error when executing a request over a connection that has + been closed at the server side. This parameter expects a value of type + java.lang.Boolean. For performance critical + operations the check should be disabled. If this parameter is not set the + stale connection check will be performed before each request execution. + + + 'http.connection.max-line-length': + determines the maximum line length limit. If set to a positive value, any + HTTP line exceeding this limit will cause a + java.io.IOException. A negative or zero + value will effectively disable the check. This parameter expects a value of + type java.lang.Integer. If this parameter is not set, + no limit will be enforced. + + + 'http.connection.max-header-count': + determines the maximum HTTP header count allowed. If set to a positive + value, the number of HTTP headers received from the data stream exceeding + this limit will cause a java.io.IOException. + A negative or zero value will effectively disable the check. This parameter + expects a value of type java.lang.Integer. If this + parameter is not set, no limit will be enforced. + + + 'http.connection.max-status-line-garbage': + defines the maximum number of ignorable lines before we expect an HTTP + response's status line. With HTTP/1.1 persistent connections, the problem + arises that broken scripts could return a wrong + Content-Length (there are more bytes sent than + specified). Unfortunately, in some cases, this cannot be detected after the + bad response, but only before the next one. So HttpClient must be able to + skip those surplus lines this way. 
This parameter expects a value of type + java.lang.Integer. 0 disallows all garbage/empty lines before the status + line. Use java.lang.Integer#MAX_VALUE for unlimited + number. If this parameter is not set unlimited number will be + assumed. + + + +
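Several of the socket-level parameters above correspond to options on java.net.Socket. The following is a minimal sketch of that correspondence using only the JDK; it is not how HttpClient applies its parameters internally, and the class SocketOptionsSketch and its configure method are hypothetical names chosen for this illustration.

```java
import java.io.IOException;
import java.net.Socket;

/** Illustrative only: applies timeout, no-delay and linger settings to an unconnected socket. */
final class SocketOptionsSketch {

    /**
     * soTimeoutMs   plays the role of 'http.socket.timeout' (0 means infinite),
     * tcpNoDelay    plays the role of 'http.tcp.nodelay',
     * lingerSeconds plays the role of 'http.socket.linger' (0 disables, -1 keeps the JRE default).
     */
    static Socket configure(int soTimeoutMs, boolean tcpNoDelay, int lingerSeconds) throws IOException {
        Socket socket = new Socket();            // created unconnected
        socket.setSoTimeout(soTimeoutMs);        // SO_TIMEOUT
        socket.setTcpNoDelay(tcpNoDelay);        // TCP_NODELAY
        if (lingerSeconds >= 0) {
            // A value of 0 switches SO_LINGER off; a positive value enables it.
            socket.setSoLinger(lingerSeconds > 0, lingerSeconds);
        }
        return socket;
    }

    public static void main(String[] args) throws IOException {
        Socket socket = configure(5000, true, -1);
        try {
            System.out.println("SO_TIMEOUT  = " + socket.getSoTimeout());
            System.out.println("TCP_NODELAY = " + socket.getTcpNoDelay());
        } finally {
            socket.close();
        }
    }
}
```

A connect timeout in the spirit of 'http.connection.timeout' would be passed to Socket#connect(SocketAddress, int) when the socket is actually connected.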
+
+ Connection persistence + The process of establishing a connection from one host to another is quite complex and + involves multiple packet exchanges between two endpoints, which can be quite time + consuming. The overhead of connection handshaking can be significant, especially for + small HTTP messages. One can achieve a much higher data throughput if open connections + can be re-used to execute multiple requests. + HTTP/1.1 states that HTTP connections can be re-used for multiple requests by + default. HTTP/1.0 compliant endpoints can also use a similar mechanism to explicitly + communicate their preference to keep connection alive and use it for multiple requests. + HTTP agents can also keep idle connections alive for a certain period of time in case a + connection to the same target host may be needed for subsequent requests. The ability to + keep connections alive is usually referred to as connection persistence. HttpClient fully + supports connection persistence.
+
+ HTTP connection routing + HttpClient is capable of establishing connections to the target host either directly + or via a route that may involve multiple intermediate connections also referred to as + hops. HttpClient differentiates connections of a route into plain, tunnelled and layered. + The use of multiple intermediate proxies to tunnel connections to the target host is + referred to as proxy chaining. + Plain routes are established by connecting to the target or the first and only proxy. + Tunnelled routes are established by connecting to the first proxy and tunnelling through a + chain of proxies to the target. Routes without a proxy cannot be tunnelled. Layered + routes are established by layering a protocol over an existing connection. Protocols can + only be layered over a tunnel to the target, or over a direct connection without + proxies.
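The two constraints stated above (no tunnelling without a proxy, layering only over a tunnel or a direct connection) can be written down as a small consistency check. This is a sketch with hypothetical names (RouteShapeSketch, check); it is not the validation code of HttpRoute.

```java
/** Illustrative only: checks a route description against the two rules stated in the text. */
final class RouteShapeSketch {

    /** Returns null if the combination is consistent, otherwise a short reason. */
    static String check(int proxyCount, boolean tunnelled, boolean layered) {
        if (proxyCount < 0) {
            return "proxy count cannot be negative";
        }
        if (tunnelled && proxyCount == 0) {
            return "routes without a proxy cannot be tunnelled";
        }
        if (layered && proxyCount > 0 && !tunnelled) {
            return "protocols can only be layered over a tunnel or a direct connection";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(check(0, false, true));   // direct and layered: consistent
        System.out.println(check(1, true, true));    // tunnelled through one proxy and layered: consistent
        System.out.println(check(0, true, false));   // tunnelled without a proxy: rejected
        System.out.println(check(1, false, true));   // layered over a plain proxy route: rejected
    }
}
```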
+ Route computation + The RouteInfo interface represents information about a + definitive route to a target host involving one or more intermediate steps or hops. + HttpRoute is a concrete implementation of + RouteInfo, which cannot be changed (is + immutable). RouteTracker is a mutable + RouteInfo implementation used internally by + HttpClient to track the remaining hops to the ultimate route target. + RouteTracker can be updated after a successful execution + of the next hop towards the route target. HttpRouteDirector + is a helper class that can be used to compute the next step in a route. This class + is used internally by HttpClient. + HttpRoutePlanner is an interface representing a + strategy to compute a complete route to a given target based on the execution + context. HttpClient ships with two default + HttpRoutePlanner implementations. + ProxySelectorRoutePlanner is based on + java.net.ProxySelector. By default, it will pick up the + proxy settings of the JVM, either from system properties or from the browser running + the application. The DefaultHttpRoutePlanner implementation does + not make use of any Java system properties, nor of system or browser proxy settings. + It computes routes based exclusively on HTTP parameters described below.
+
+ Secure HTTP connections + HTTP connections can be considered secure if information transmitted between two + connection endpoints cannot be read or tampered with by an unauthorized third party. + The SSL/TLS protocol is the most widely used technique to ensure HTTP transport + security. However, other encryption techniques could be employed as well. Usually, + HTTP transport is layered over the SSL/TLS encrypted connection. +
+
+
+ HTTP route parameters + These are parameters that can influence route computation: + + + + 'http.route.default-proxy': + defines a proxy host to be used by default route planners that do not make + use of JRE settings. This parameter expects a value of type + HttpHost. If this parameter is not set direct + connections to the target will be attempted. + + + + + 'http.route.local-address': + defines a local address to be used by all default route planners. On + machines with multiple network interfaces, this parameter can be used to + select the network interface from which the connection originates. This + parameter expects a value of type + java.net.InetAddress. If this parameter is not + set a default local address will be used automatically. + + + + + 'http.route.forced-route': + defines a forced route to be used by all default route planners. Instead + of computing a route, the given forced route will be returned, even if it + points to a completely different target host. This parameter expects a value + of type HttpRoute. + + + +
+
+ Socket factories + HTTP connections make use of a java.net.Socket object + internally to handle transmission of data across the wire. They, however, rely on + SocketFactory interface to create, initialize and + connect sockets. This enables the users of HttpClient to provide application specific + socket initialization code at runtime. PlainSocketFactory is the + default factory for creating and initializing plain (unencrypted) sockets. + The process of creating a socket and that of connecting it to a host are decoupled, so + that the socket could be closed while being blocked in the connect operation. + +
+ Secure socket layering + LayeredSocketFactory is an extension of the + SocketFactory interface. Layered socket factories + are capable of creating sockets that are layered over an existing plain socket. + Socket layering is used primarily for creating secure sockets through proxies. + HttpClient ships with SSLSocketFactory that implements SSL/TLS layering. Please note + HttpClient does not use any custom encryption functionality. It is fully reliant on + standard Java Cryptography (JCE) and Secure Sockets (JSSE) extensions.
+
+ SSL/TLS customization + HttpClient makes use of SSLSocketFactory to create SSL connections. + SSLSocketFactory allows for a high degree of + customization. It can take an instance of + javax.net.ssl.SSLContext as a parameter and use + it to create custom configured SSL connections. + + Customization of SSLSocketFactory implies a certain degree of familiarity with the + concepts of the SSL/TLS protocol, a detailed explanation of which is out of scope + for this document. Please refer to the Java Secure Socket Extension for a detailed description of + javax.net.ssl.SSLContext and related + tools. +
+
+ Hostname verification + In addition to the trust verification and the client authentication performed on + the SSL/TLS protocol level, HttpClient can optionally verify whether the target + hostname matches the names stored inside the server's X.509 certificate, once the + connection has been established. This verification can provide additional guarantees + of authenticity of the server trust material. The X509HostnameVerifier interface + represents a strategy for hostname verification. HttpClient ships with three + X509HostnameVerifier implementations. Important: hostname verification should not be confused with + SSL trust verification. + + + + StrictHostnameVerifier: + The strict hostname verifier works the same way as Sun Java 1.4, Sun + Java 5, Sun Java 6. It's also pretty close to IE6. This implementation + appears to be compliant with RFC 2818 for dealing with wildcards. The + hostname must match either the first CN, or any of the subject-alts. A + wildcard can occur in the CN, and in any of the subject-alts. + + + + + BrowserCompatHostnameVerifier: + The hostname verifier that works the same way as Curl and Firefox. The + hostname must match either the first CN, or any of the subject-alts. A + wildcard can occur in the CN, and in any of the subject-alts. The only + difference between BrowserCompatHostnameVerifier + and StrictHostnameVerifier is that a wildcard + (such as "*.foo.com") with + BrowserCompatHostnameVerifier matches all + subdomains, including "a.b.foo.com". + + + + + AllowAllHostnameVerifier: + This hostname verifier essentially turns hostname verification off. + This implementation is a no-op, and never throws the + javax.net.ssl.SSLException. + + + + By default HttpClient uses the BrowserCompatHostnameVerifier + implementation. One can specify a different hostname verifier implementation if + desired. +
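The wildcard difference between the strict and the browser-compatible verifier can be sketched as a single matching function with a flag. This is a simplified, JDK-only illustration and not the code of either verifier; real verifiers apply further rules that are omitted here. The class WildcardMatchSketch and the allowNestedSubdomains flag are hypothetical names.

```java
import java.util.Locale;

/** Illustrative only: matches a host name against a certificate name that may start with "*.". */
final class WildcardMatchSketch {

    /**
     * With allowNestedSubdomains == false the wildcard covers exactly one label
     * (the strict behaviour described above); with true it covers any depth
     * (the browser-compatible behaviour).
     */
    static boolean matches(String host, String pattern, boolean allowNestedSubdomains) {
        String h = host.toLowerCase(Locale.ROOT);
        String p = pattern.toLowerCase(Locale.ROOT);
        if (!p.startsWith("*.")) {
            return h.equals(p);                        // no wildcard: exact match only
        }
        String suffix = p.substring(1);                // "*.foo.com" -> ".foo.com"
        if (!h.endsWith(suffix) || h.length() == suffix.length()) {
            return false;                              // wrong domain, or nothing in front of the suffix
        }
        String covered = h.substring(0, h.length() - suffix.length());
        return allowNestedSubdomains || covered.indexOf('.') < 0;
    }

    public static void main(String[] args) {
        System.out.println(matches("a.foo.com", "*.foo.com", false));
        System.out.println(matches("a.b.foo.com", "*.foo.com", false));
        System.out.println(matches("a.b.foo.com", "*.foo.com", true));
    }
}
```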
+
+
+ Protocol schemes + The Scheme class represents a protocol scheme such as "http" or + "https" and contains a number of protocol properties such as the default port and the + socket factory to be used to create java.net.Socket instances + for the given protocol. The SchemeRegistry class is used to maintain + a set of Schemes HttpClient can choose from when trying to + establish a connection by a request URI: +
+
+ HttpClient proxy configuration + Even though HttpClient is aware of complex routing schemes and proxy chaining, it + supports only simple direct or one hop proxy connections out of the box. + The simplest way to tell HttpClient to connect to the target host via a proxy is by + setting the default proxy parameter: + + One can also instruct HttpClient to use the standard JRE proxy selector to obtain proxy + information: + + Alternatively, one can provide a custom HttpRoutePlanner + implementation in order to have complete control over the process of HTTP route + computation: +
+
+ HTTP connection managers +
+ Connection operators + Operated connections are client side connections whose underlying socket or its + state can be manipulated by an external entity, usually referred to as a connection + operator. The OperatedClientConnection interface extends the + HttpClientConnection interface and defines + additional methods to manage the connection socket. The + ClientConnectionOperator interface represents a + strategy for creating OperatedClientConnection + instances and updating the underlying socket of those objects. Implementations will + most likely make use of SocketFactory instances to create + java.net.Socket instances. The + ClientConnectionOperator interface enables the + users of HttpClient to provide a custom strategy for connection operators as well as + an ability to provide an alternative implementation of the + OperatedClientConnection interface.
+
+ Managed connections and connection managers + HTTP connections are complex, stateful, thread-unsafe objects which need to be + properly managed to function correctly. HTTP connections can only be used by one + execution thread at a time. HttpClient employs a special entity to manage access to + HTTP connections called the HTTP connection manager and represented by the + ClientConnectionManager interface. The purpose of + an HTTP connection manager is to serve as a factory for new HTTP connections, manage + persistent connections and synchronize access to persistent connections making sure + that only one thread can have access to a connection at a time. + Internally HTTP connection managers work with instances of + OperatedClientConnection, but they hand out + instances of ManagedClientConnection to the service + consumers. ManagedClientConnection acts as a wrapper + for an OperatedClientConnection instance that manages + its state and controls all I/O operations on that connection. It also abstracts away + socket operations and provides convenience methods for opening and updating sockets + in order to establish a route. + ManagedClientConnection instances are aware of + their link to the connection manager that spawned them and of the fact that they + must be returned back to the manager when no longer in use. + ManagedClientConnection classes also implement the + ConnectionReleaseTrigger interface that can be + used to trigger the release of the connection back to the manager. Once the + connection release has been triggered the wrapped connection gets detached from the + ManagedClientConnection wrapper and the + OperatedClientConnection instance is returned + back to the manager. Even though the service consumer still holds a reference to the + ManagedClientConnection instance, it is no longer + able to execute any I/O operation or change the state of the + OperatedClientConnection either intentionally or + unintentionally. 
+ This is an example of acquiring a connection from a connection manager: + + The connection request can be terminated prematurely by calling + ClientConnectionRequest#abortRequest() if necessary. + This will unblock the thread blocked in the + ClientConnectionRequest#getConnection() method. + BasicManagedEntity wrapper class can be used to ensure + automatic release of the underlying connection once the response content has been + fully consumed. HttpClient uses this mechanism internally to achieve transparent + connection release for all responses obtained from + HttpClient#execute() methods: + +
+
+ Simple connection manager + SingleClientConnManager is a simple connection manager that + maintains only one connection at a time. Even though this class is thread-safe it + ought to be used by one execution thread only. + SingleClientConnManager will make an effort to reuse the + connection for subsequent requests with the same route. It will, however, close the + existing connection and reopen it for the given route, if the route of the persistent + connection does not match that of the connection request. If the connection has + already been allocated, + java.lang.IllegalStateException is thrown. + SingleClientConnManager is used by HttpClient by + default.
+
+ Pooling connection manager + ThreadSafeClientConnManager is a more complex + implementation that manages a pool of client connections and is able to service + connection requests from multiple execution threads. Connections are pooled on a per + route basis. A request for a route for which the manager already has persistent + connections available in the pool will be serviced by leasing a connection from + the pool rather than creating a brand new connection. + ThreadSafeClientConnManager maintains a maximum limit of + connections on a per route basis and in total. By default this implementation will + create no more than 2 concurrent connections per given route and no more than 20 + connections in total. For many real-world applications these limits may prove too + constraining, especially if they use HTTP as a transport protocol for their + services. Connection limits, however, can be adjusted using HTTP parameters. + This example shows how the connection pool parameters can be adjusted: +
+
+ Connection manager shutdown + When an HttpClient instance is no longer needed and is about to go out of scope it + is important to shut down its connection manager to ensure that all connections kept + alive by the manager get closed and system resources allocated by those connections + are released. + +
+
+
+ Connection management parameters + These are parameters that can be used to customize standard HTTP connection manager + implementations: + + + + 'http.conn-manager.timeout': + defines the timeout in milliseconds used when retrieving an instance of + ManagedClientConnection from the + ClientConnectionManager. This parameter + expects a value of type java.lang.Long. If this + parameter is not set connection requests will not time out (infinite + timeout). + + + + + 'http.conn-manager.max-per-route': + defines the maximum number of connections per route. This limit is + interpreted by client connection managers and applies to individual manager + instances. This parameter expects a value of type + ConnPerRoute. + + + + + 'http.conn-manager.max-total': + defines the maximum number of connections in total. This limit is + interpreted by client connection managers and applies to individual manager + instances. This parameter expects a value of type + java.lang.Integer. + + + +
+
+ Multithreaded request execution + When equipped with a pooling connection manager such as ThreadSafeClientConnManager, + HttpClient can be used to execute multiple requests simultaneously using multiple + threads of execution. + ThreadSafeClientConnManager will allocate connections based on + its configuration. If all connections for a given route have already been leased, a + request for a connection will block until a connection is released back to the pool. One + can ensure the connection manager does not block indefinitely in the connection request + operation by setting 'http.conn-manager.timeout' to a positive value. + If the connection request cannot be serviced within the given time period, + ConnectionPoolTimeoutException will be thrown. + +
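The blocking behaviour and the effect of a positive timeout can be sketched with a plain `java.util.concurrent.Semaphore`. The names `TimedLease` and `LeaseTimeoutException` are hypothetical stand-ins chosen for this illustration (the latter plays the role of ConnectionPoolTimeoutException); the sketch assumes a single route and does not manage real connections.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Illustration of a connection request that blocks, optionally with a timeout. */
class TimedLease {
    /** Hypothetical stand-in for a pool timeout exception. */
    static class LeaseTimeoutException extends RuntimeException {
        LeaseTimeoutException(String message) {
            super(message);
        }
    }

    private final Semaphore permits;

    TimedLease(int maxConnections) {
        // fair semaphore: waiting threads are served in arrival order
        this.permits = new Semaphore(maxConnections, true);
    }

    /** Blocks until a connection is free; a positive timeout bounds the wait. */
    void lease(long timeoutMillis) {
        try {
            if (timeoutMillis > 0) {
                if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    throw new LeaseTimeoutException(
                            "No connection available within " + timeoutMillis + " ms");
                }
            } else {
                permits.acquire(); // no timeout set: wait indefinitely
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a connection", ex);
        }
    }

    /** Releases a connection back to the pool, waking one waiting thread if any. */
    void release() {
        permits.release();
    }
}
```

A worker thread would call `lease(...)` before executing its request and `release()` in a `finally` block afterwards, so that a failed request does not leak a permit.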
+
+ Connection eviction policy + One of the major shortcomings of the classic blocking I/O model is that the network + socket can react to I/O events only when blocked in an I/O operation. When a connection + is released back to the manager, it can be kept alive; however, it is unable to monitor + the status of the socket and react to any I/O events. If the connection gets closed on + the server side, the client side connection is unable to detect the change in the + connection state and react appropriately by closing the socket on its end. + HttpClient tries to mitigate the problem by testing whether the connection is 'stale', + that is, no longer valid because it was closed on the server side, prior to using the + connection for executing an HTTP request. The stale connection check is not 100% + reliable and adds 10 to 30 ms overhead to each request execution. The only feasible + solution that does not involve a one thread per socket model for idle connections is a + dedicated monitor thread used to evict connections that are considered expired due to a + long period of inactivity. The monitor thread can periodically call the + ClientConnectionManager#closeExpiredConnections() method to + close all expired connections and evict closed connections from the pool. It can also + optionally call the ClientConnectionManager#closeIdleConnections() + method to close all connections that have been idle over a given period of time. + +
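The monitor thread idea can be sketched without any HttpClient classes. In the sketch below `IdleConnectionTracker` is a hypothetical name, connections are represented only by an identifier and the time they became idle, and "closing" a connection is reduced to removing its entry. The periodic `evictIdle` pass stands in for the calls to the connection manager described above.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration of evicting idle pool entries from a dedicated monitor thread. */
class IdleConnectionTracker {
    // connection id -> time in milliseconds at which it was released back to the pool
    private final Map<String, Long> idleSince = new LinkedHashMap<String, Long>();

    synchronized void markIdle(String connectionId, long nowMillis) {
        idleSince.put(connectionId, nowMillis);
    }

    synchronized int size() {
        return idleSince.size();
    }

    /** Drops every entry idle for longer than maxIdleMillis; returns how many were evicted. */
    synchronized int evictIdle(long nowMillis, long maxIdleMillis) {
        int evicted = 0;
        for (Iterator<Long> it = idleSince.values().iterator(); it.hasNext();) {
            if (nowMillis - it.next() > maxIdleMillis) {
                it.remove(); // a real pool would also close the underlying socket here
                evicted++;
            }
        }
        return evicted;
    }

    /** Starts a daemon thread that runs an eviction pass every periodMillis. */
    Thread startMonitor(final long periodMillis, final long maxIdleMillis) {
        Thread monitor = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        Thread.sleep(periodMillis);
                        evictIdle(System.currentTimeMillis(), maxIdleMillis);
                    }
                } catch (InterruptedException ex) {
                    // interrupted: stop monitoring
                }
            }
        }, "idle-connection-monitor");
        monitor.setDaemon(true);
        monitor.start();
        return monitor;
    }
}
```

Interrupting the returned thread stops the monitor, which would typically happen at the same point where the connection manager itself is shut down.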
+
+ Connection keep alive strategy + The HTTP specification does not specify how long a persistent connection may be and + should be kept alive. Some HTTP servers use a non-standard Keep-Alive + header to communicate to the client the period of time in seconds they intend to keep + the connection alive on the server side. HttpClient makes use of this information if + available. If the Keep-Alive header is not present in the response, + HttpClient assumes the connection can be kept alive indefinitely. However, many HTTP + servers are configured to drop persistent connections after a certain period + of inactivity in order to conserve system resources, quite often without informing the + client. In case the default strategy turns out to be too optimistic, one may want to + provide a custom keep-alive strategy. + +
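The decision a custom keep-alive strategy has to make can be sketched as a small parsing routine. This is a JDK-only illustration with hypothetical names (`KeepAliveParser`, `keepAliveMillis`); it assumes the header value has the commonly seen form `timeout=5, max=100` and falls back to a caller-supplied duration when the server gives no usable hint.

```java
/** Illustration of deriving a keep-alive duration from a Keep-Alive header value. */
class KeepAliveParser {
    /**
     * Returns the keep-alive duration in milliseconds taken from the "timeout"
     * element of the header value, or fallbackMillis if the header is absent
     * or carries no usable timeout.
     */
    static long keepAliveMillis(String keepAliveHeaderValue, long fallbackMillis) {
        if (keepAliveHeaderValue == null) {
            return fallbackMillis;
        }
        for (String element : keepAliveHeaderValue.split(",")) {
            String[] pair = element.trim().split("=", 2);
            if (pair.length == 2 && pair[0].trim().equalsIgnoreCase("timeout")) {
                try {
                    // servers state the timeout in seconds
                    return Long.parseLong(pair[1].trim()) * 1000L;
                } catch (NumberFormatException ignore) {
                    // malformed value: keep looking, then fall back
                }
            }
        }
        return fallbackMillis;
    }

    public static void main(String[] args) {
        System.out.println(keepAliveMillis("timeout=5, max=100", 30000L));
        System.out.println(keepAliveMillis(null, 30000L));
    }
}
```

Passing a finite fallback, rather than treating a missing header as "indefinitely", is the more pessimistic policy the paragraph above suggests for servers that silently drop idle connections.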
+
diff --git a/src/docbkx/fundamentals.xml b/src/docbkx/fundamentals.xml index 374ba698d..2e70df36a 100755 --- a/src/docbkx/fundamentals.xml +++ b/src/docbkx/fundamentals.xml @@ -503,54 +503,42 @@ byte[] response = httpclient.execute(httpget, handler); - - <literal>http.connection</literal> - + 'http.connection': HttpConnection instance representing the actual connection to the target server. - - <literal>http.target_host</literal> - + 'http.target_host': HttpHost instance representing the connection target. - - <literal>http.proxy_host</literal> - + 'http.proxy_host': HttpHost instance representing the connection proxy, if used - - <literal>http.request</literal> - + 'http.request': HttpRequest instance representing the actual HTTP request. - - <literal>http.response</literal> - + 'http.response': HttpResponse instance representing the actual HTTP response. - - <literal>http.request_sent</literal> - + 'http.request_sent': java.lang.Boolean object representing the flag indicating whether the actual request has been fully transmitted to the connection target. @@ -889,9 +877,7 @@ null - - <literal>http.protocol.version</literal> - + 'http.protocol.version': defines HTTP protocol version used if not set explicitly on the request object. This parameter expects a value of type ProtocolVersion. If this parameter is not @@ -900,9 +886,7 @@ null - - <literal>http.protocol.element-charset</literal> - + 'http.protocol.element-charset': defines the charset to be used for encoding HTTP protocol elements. This parameter expects a value of type java.lang.String. If this parameter is not set US-ASCII will be @@ -911,9 +895,7 @@ null - - <literal>http.protocol.content-charset</literal> - + 'http.protocol.content-charset': defines the charset to be used per default for content body coding. This parameter expects a value of type java.lang.String. 
If this parameter is not set ISO-8859-1 will be @@ -922,9 +904,7 @@ null - - <literal>http.useragent</literal> - + 'http.useragent': defines the content of the User-Agent header. This parameter expects a value of type java.lang.String. If this parameter is not set, HttpClient will automatically generate a value @@ -933,9 +913,7 @@ null - - <literal>http.protocol.strict-transfer-encoding</literal> - + 'http.protocol.strict-transfer-encoding': defines whether responses with an invalid Transfer-Encoding header should be rejected. This parameter expects a value of type java.lang.Boolean. @@ -945,9 +923,7 @@ null - - <literal>http.protocol.expect-continue</literal> - + 'http.protocol.expect-continue': activates Expect: 100-Continue handshake for the entity enclosing methods. The purpose of the Expect: 100-Continue handshake is to allow the client that is sending @@ -966,9 +942,7 @@ null - - <literal>http.protocol.wait-for-continue</literal> - + 'http.protocol.wait-for-continue': defines the maximum period of time in milliseconds the client should spend waiting for a 100-continue response. This parameter expects a value of type java.lang.Integer. If this diff --git a/src/docbkx/httpagent.xml b/src/docbkx/httpagent.xml new file mode 100644 index 000000000..ed70f761b --- /dev/null +++ b/src/docbkx/httpagent.xml @@ -0,0 +1,203 @@ + + + + + HTTP client service +
+ HttpClient facade + The HttpClient interface represents the most essential + contract for HTTP request execution. It imposes no restrictions or particular details on + the request execution process and leaves the specifics of connection management, state + management, authentication and redirect handling up to individual implementations. This + should make it easier to decorate the interface with additional functionality such as + response content caching. + DefaultHttpClient is the default implementation of the + HttpClient interface. This class acts as a facade to + a number of special purpose handler or strategy interface implementations responsible + for handling a particular aspect of the HTTP protocol such as redirect or + authentication handling or making decisions about connection persistence and keep alive + duration. This enables users to selectively replace the default implementation of those + aspects with custom, application specific ones. + + DefaultHttpClient also maintains a list of protocol + interceptors intended for processing outgoing requests and incoming responses and + provides methods for managing those interceptors. New protocol interceptors can be + introduced to the protocol processor chain or removed from it if needed. Internally + protocol interceptors are stored in a simple java.util.ArrayList. + They are executed in the same natural order as they are added to the list. + + DefaultHttpClient is thread safe. It is recommended that the + same instance of this class is reused for multiple request executions. When an instance + of DefaultHttpClient is no longer needed and is about to go out + of scope the connection manager associated with it must be shut down by calling the + ClientConnectionManager#shutdown() method. + +
+
+ HttpClient parameters + These are parameters that can be used to customize the behaviour of the default HttpClient + implementation: + + + + 'http.protocol.handle-redirects': + defines whether redirects should be handled automatically. This parameter + expects a value of type java.lang.Boolean. If this + parameter is not set, HttpClient will handle redirects automatically. + + + + + 'http.protocol.reject-relative-redirect': + defines whether relative redirects should be rejected. The HTTP specification + requires the location value to be an absolute URI. This parameter expects a + value of type java.lang.Boolean. If this parameter is + not set, relative redirects will be allowed. + + + + + 'http.protocol.max-redirects': + defines the maximum number of redirects to be followed. The limit on the + number of redirects is intended to prevent infinite loops caused by broken + server side scripts. This parameter expects a value of type + java.lang.Integer. If this parameter is not set, + no more than 100 redirects will be allowed. + + + + + 'http.protocol.allow-circular-redirects': + defines whether circular redirects (redirects to the same location) should + be allowed. The HTTP spec is not sufficiently clear whether circular + redirects are permitted, therefore optionally they can be enabled. This + parameter expects a value of type java.lang.Boolean. + If this parameter is not set, circular redirects will be disallowed. + + + + + 'http.connection-manager.factory-class-name': + defines the class name of the default + ClientConnectionManager implementation. + This parameter expects a value of type + java.lang.String. If this parameter is not set, + SingleClientConnManager will be used per + default. + + + + + 'http.virtual-host': + defines the virtual host name to be used in the Host + header instead of the physical host name. This parameter expects a value of + type HttpHost. If this parameter is not set, the name or + IP address of the target host will be used.
+ + + + + 'http.default-headers': + defines the request headers to be sent per default with each request. This + parameter expects a value of type + java.util.Collection containing + Header objects. + + + + + 'http.default-host': + defines the default host. The default value will be used if the target + host is not explicitly specified in the request URI (relative URIs). This + parameter expects a value of type HttpHost. + + + +
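How 'http.protocol.max-redirects' and 'http.protocol.allow-circular-redirects' interact can be shown with a toy redirect follower. Everything here is an assumption made for illustration: `RedirectFollower` is a hypothetical class, the server is replaced by a map from a location to the location it redirects to, and no HTTP traffic takes place.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustration of a redirect limit combined with circular redirect detection. */
class RedirectFollower {
    /**
     * Follows redirects starting at 'start' and returns the final location.
     * 'redirects' maps a location to the location it redirects to.
     */
    static String follow(String start, Map<String, String> redirects,
                         int maxRedirects, boolean allowCircular) {
        Set<String> visited = new HashSet<String>();
        visited.add(start);
        String current = start;
        int count = 0;
        while (redirects.containsKey(current)) {
            if (count >= maxRedirects) {
                throw new IllegalStateException(
                        "Maximum redirects (" + maxRedirects + ") exceeded");
            }
            String next = redirects.get(current);
            // Set.add returns false when the location has been seen before
            if (!allowCircular && !visited.add(next)) {
                throw new IllegalStateException("Circular redirect to " + next);
            }
            current = next;
            count++;
        }
        return current;
    }
}
```

Even with circular redirects allowed, the redirect limit still terminates a loop, which is the purpose the parameter description gives for it.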
+
+ Automatic redirect handling + HttpClient handles all types of redirects automatically, except those explicitly + prohibited by the HTTP specification as requiring user intervention. See Other (status + code 303) redirects on POST and PUT requests are converted to + GET requests as required by the HTTP specification.
+
+ HTTP client and execution context + The DefaultHttpClient treats HTTP requests as immutable objects + that are never supposed to change in the course of request execution. Instead, it + creates a private mutable copy of the original request object, whose properties can be + updated depending on the execution context. Therefore the final request properties such + as the target host and request URI can be determined by examining the content of the + local HTTP context after the request has been executed. + +
+
diff --git a/src/docbkx/index.xml b/src/docbkx/index.xml index e81b1465a..387f7fb3b 100755 --- a/src/docbkx/index.xml +++ b/src/docbkx/index.xml @@ -62,5 +62,10 @@ - + + + + + + diff --git a/src/docbkx/preface.xml b/src/docbkx/preface.xml index 8a01c5a3d..a6e42efef 100755 --- a/src/docbkx/preface.xml +++ b/src/docbkx/preface.xml @@ -48,7 +48,7 @@ Client-side HTTP transport library based on HttpCore + url="http://hc.apache.org/httpcomponents-core/index.html">HttpCore diff --git a/src/docbkx/statemgmt.xml b/src/docbkx/statemgmt.xml new file mode 100644 index 000000000..f4f93d0a5 --- /dev/null +++ b/src/docbkx/statemgmt.xml @@ -0,0 +1,392 @@ + + + + + HTTP state management + Originally HTTP was designed as a stateless, request / response oriented protocol that + made no special provisions for stateful sessions spanning across several logically related + request / response exchanges. As HTTP protocol grew in popularity and adoption more and more + systems began to use it for applications it was never intended for, for instance as a + transport for e-commerce applications. Thus, the support for state management became a + necessity. + Netscape Communications, at that time a leading developer of web client and server + software, implemented support for HTTP state management in their products based on a + proprietary specification. Later, Netscape tried to standardise the mechanism by publishing + a specification draft. Those efforts contributed to the formal specification defined through + the RFC standard track. However, state management in a significant number of applications is + still largely based on the Netscape draft and is incompatible with the official + specification. All major developers of web browsers felt compelled to retain compatibility + with those applications greatly contributing to the fragmentation of standards + compliance. +
+ HTTP cookies + A cookie is a token or short packet of state information that the HTTP agent and the + target server can exchange to maintain a session. Netscape engineers used to refer to it + as a "magic cookie" and the name stuck. + HttpClient uses the Cookie interface to represent an + abstract cookie token. In its simplest form an HTTP cookie is merely a name / value pair. + Usually an HTTP cookie also contains a number of attributes such as version, a domain + for which it is valid, a path that specifies the subset of URLs on the origin server to + which this cookie applies, and the maximum period of time the cookie is valid for. + The SetCookie interface represents a + Set-Cookie response header sent by the origin server to the HTTP + agent in order to maintain a conversational state. + The SetCookie2 interface extends SetCookie with + Set-Cookie2 specific methods. + The ClientCookie interface extends the + Cookie interface with additional client specific + functionality such as the ability to retrieve original cookie attributes exactly as they were + specified by the origin server. This is important for generating the + Cookie header because some cookie specifications require that the + Cookie header should include certain attributes only if they were + specified in the Set-Cookie or Set-Cookie2 + header.
+ Cookie versions + Cookies compatible with the Netscape draft specification but non-compliant with the + official specification are considered to be of version 0. Standard compliant cookies + are expected to have version 1. HttpClient may handle cookies differently depending + on the version. + Here is an example of re-creating a Netscape cookie: + + Here is an example of re-creating a standard cookie. Please note that a standard + compliant cookie must retain all attributes as sent by the origin server: + + Here is an example of re-creating a Set-Cookie2 compliant + cookie. Please note that a standard compliant cookie must retain all attributes as + sent by the origin server: + +
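The distinction between version 0 and version 1 cookies can be tried out with the JDK's own `java.net.HttpCookie` class, used here purely as a stand-in for HttpClient's cookie classes, which it does not replace. The class name `CookieVersions`, the cookie name and value, and the `.example.com` domain are all made up for this sketch.

```java
import java.net.HttpCookie;

/** Illustration of version 0 (Netscape) and version 1 (standard) cookies. */
class CookieVersions {
    /** Re-creates a Netscape draft style cookie (version 0). */
    static HttpCookie netscapeCookie() {
        HttpCookie cookie = new HttpCookie("name", "value");
        cookie.setVersion(0);
        cookie.setDomain(".example.com");
        cookie.setPath("/");
        return cookie;
    }

    /** Re-creates a standard cookie (version 1) with its domain and path attributes. */
    static HttpCookie standardCookie() {
        HttpCookie cookie = new HttpCookie("name", "value");
        cookie.setVersion(1);
        cookie.setDomain(".example.com");
        cookie.setPath("/");
        return cookie;
    }

    /** Re-creates a Set-Cookie2 style cookie, which additionally carries a port list. */
    static HttpCookie setCookie2Cookie() {
        HttpCookie cookie = standardCookie();
        cookie.setPortlist("80,8080");
        return cookie;
    }

    public static void main(String[] args) {
        // toString() renders the cookie as it would appear in a Cookie request header;
        // the rendering differs between version 0 and version 1
        System.out.println(netscapeCookie());
        System.out.println(standardCookie());
    }
}
```

Printing the two cookies shows that the version influences how the attributes are written out, which is the reason the original attributes of a standard cookie need to be retained.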
+
+
+ Cookie specifications + The CookieSpec interface represents a cookie management + specification. A cookie management specification is expected to enforce: + + + rules of parsing Set-Cookie and optionally + Set-Cookie2 headers. + + + rules of validation of parsed cookies. + + + formatting of the Cookie header for a given host, port and path + of origin. + + + HttpClient ships with several CookieSpec + implementations: + + + + Netscape draft: + This specification conforms to the original draft specification published + by Netscape Communications. It should be avoided unless absolutely necessary + for compatibility with legacy code. + + + + + RFC 2109: + Older version of the official HTTP state management specification + superseded by RFC 2965. + + + + + RFC 2965: + The official HTTP state management specification. + + + + + Browser compatibility: + This implementation strives to closely mimic the (mis)behavior of common web + browser applications such as Microsoft Internet Explorer and Mozilla + Firefox. + + + + + Best match: + 'Meta' cookie specification that picks up a cookie policy based on the + format of cookies sent with the HTTP response. It basically aggregates all + of the above implementations into one class. + + + + It is strongly recommended to use the Best Match policy and let + HttpClient pick up an appropriate compliance level at runtime based on the execution + context.
+
+ HTTP cookie and state management parameters + These are parameters that can be used to customize HTTP state management and the behaviour of + individual cookie specifications: + + + + 'http.protocol.cookie-datepatterns': + defines valid date patterns to be used for parsing the non-standard + expires attribute. Only required for compatibility + with non-compliant servers that still use expires defined + in the Netscape draft instead of the standard max-age + attribute. This parameter expects a value of type + java.util.Collection. The collection + elements must be of type java.lang.String compatible + with the syntax of java.text.SimpleDateFormat. If + this parameter is not set the choice of a default value is + CookieSpec implementation specific. + Please note this parameter applies + + + + + 'http.protocol.single-cookie-header': + defines whether cookies should be forced into a single + Cookie request header. Otherwise, each cookie is + formatted as a separate Cookie header. This parameter + expects a value of type java.lang.Boolean. If this + parameter is not set the choice of a default value is CookieSpec + implementation specific. Please note this parameter applies to strict cookie + specifications (RFC 2109 and RFC 2965) only. Browser compatibility and + Netscape draft policies will always put all cookies into one request + header. + + + + + 'http.protocol.cookie-policy': + defines the name of a cookie specification to be used for HTTP state + management. This parameter expects a value of type + java.lang.String. If this parameter is not set + the best match policy will be used. + + + +
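What a collection of date patterns is used for can be sketched with `java.text.SimpleDateFormat` alone. The class `ExpiresParser` and the two patterns used below are assumptions chosen for illustration, not the defaults of any particular CookieSpec implementation; the sketch simply tries each pattern in turn until one of them parses the expires value.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Collection;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Illustration of parsing an 'expires' attribute against a list of date patterns. */
class ExpiresParser {
    /** Returns the parsed date, or null if none of the patterns matches the value. */
    static Date parseExpires(String value, Collection<String> patterns) {
        for (String pattern : patterns) {
            // Locale.US keeps day and month names independent of the platform locale
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
            format.setTimeZone(TimeZone.getTimeZone("GMT"));
            try {
                return format.parse(value);
            } catch (ParseException ignore) {
                // this pattern does not match: try the next one
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Collection<String> patterns = Arrays.asList(
                "EEE, dd MMM yyyy HH:mm:ss zzz",  // RFC 1123 style
                "EEE, dd-MMM-yyyy HH:mm:ss z");   // dash separated style seen in the wild
        System.out.println(parseExpires("Thu, 01-Jan-1970 00:00:00 GMT", patterns));
    }
}
```

A fresh `SimpleDateFormat` is created per call because instances of that class are not thread safe.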
+
+ Cookie specification registry + HttpClient maintains a registry of available cookie specifications using + CookieSpecRegistry class. The following specifications are + registered per default: + + + + compatibility: + Browser compatibility (lenient policy). + + + + + netscape: + Netscape draft. + + + + + rfc2109: + RFC 2109 (outdated strict policy). + + + + + rfc2965: + RFC 2965 (standard conformant strict policy). + + + + + best-match: + Best match meta-policy. + + + +
+
+ Choosing cookie policy + Cookie policy can be set at the HTTP client and overridden on the HTTP request level + if required. + +
+
+ Custom cookie policy + In order to implement a custom cookie policy one should create a custom implementation + of CookieSpec interface, create a + CookieSpecFactory implementation to create and + initialize instances of the custom specification and register the factory with + HttpClient. Once the custom specification has been registered, it can be activated the + same way as the standard cookie specifications. + +
+
+ Cookie persistence + HttpClient can work with any physical representation of a persistent cookie store that + implements the CookieStore interface. The default + CookieStore implementation called + BasicCookieStore is a simple implementation backed by a + java.util.ArrayList. Cookies stored in a + BasicCookieStore object are lost when the container object + gets garbage collected. Users can provide more complex implementations if + necessary. + +
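The idea of a simple list-backed store can be sketched with JDK classes only. `ListCookieStore` is a hypothetical class that does not implement HttpClient's CookieStore interface, and it stores `java.net.HttpCookie` objects as a stand-in for HttpClient cookies. It shows the two behaviours one would expect from such a store: a newer cookie replaces an older one with the same identity, and an expired cookie is dropped.

```java
import java.net.HttpCookie;
import java.util.ArrayList;
import java.util.List;

/** Illustration of an in-memory cookie store backed by an ArrayList. */
class ListCookieStore {
    private final List<HttpCookie> cookies = new ArrayList<HttpCookie>();

    /** Adds the cookie, replacing any stored cookie with the same name, domain and path. */
    synchronized void addCookie(HttpCookie cookie) {
        // HttpCookie.equals() compares name, domain and path
        cookies.remove(cookie);
        if (!cookie.hasExpired()) {
            cookies.add(cookie);
        }
    }

    /** Returns a snapshot of the stored cookies. */
    synchronized List<HttpCookie> getCookies() {
        return new ArrayList<HttpCookie>(cookies);
    }

    /** Removes all stored cookies. */
    synchronized void clear() {
        cookies.clear();
    }
}
```

Because the list lives only in memory, its content disappears with the store object; a more complex implementation could write the same data to disk or to a database.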
+
+ HTTP state management and execution context + In the course of HTTP request execution HttpClient adds the following state management + related objects to the execution context: + + + + 'http.cookiespec-registry': + CookieSpecRegistry instance representing the actual + cookie specification registry. The value of this attribute set in the local + context takes precedence over the default one. + + + + + 'http.cookie-spec': + CookieSpec instance representing the actual + cookie specification. + + + + + 'http.cookie-origin': + CookieOrigin instance representing the actual + details of the origin server. + + + + + 'http.cookie-store': + CookieStore instance representing the actual + cookie store. The value of this attribute set in the local context takes + precedence over the default one. + + + + The local HttpContext object can be used to customize + the HTTP state management context prior to request execution or to examine its state after + the request has been executed: + +
+
+ Per user / thread state management + One can use an individual local execution context in order to implement per user (or + per thread) state management. Cookie specification registry and cookie store defined in + the local context will take precedence over the default ones set at the HTTP client + level. + +
+