Transactions And Concurrency

The most important point about Hibernate and concurrency control is that it is very easy to understand. Hibernate directly uses JDBC connections and JTA resources without adding any additional locking behavior. We highly recommend that you spend some time with the JDBC, ANSI, and transaction isolation specifications of your database management system. Hibernate only adds automatic versioning; it does not lock objects in memory or change the isolation level of your database transactions. Basically, use Hibernate as you would use direct JDBC (or JTA/CMT) with your database resources.

However, in addition to automatic versioning, Hibernate also offers a (minor) API for pessimistic locking of rows, using the SELECT FOR UPDATE syntax. This API is discussed later in this chapter.

We start the discussion of concurrency control in Hibernate with the granularity of Configuration, SessionFactory, and Session, as well as database and long application transactions.

Session and transaction scopes

A SessionFactory is an expensive-to-create, threadsafe object intended to be shared by all application threads. It is created once, usually on application startup, from a Configuration instance.

A Session is an inexpensive, non-threadsafe object that should be used once, for a single business process, a single unit of work, and then discarded. A Session will not obtain a JDBC Connection (or a Datasource) unless it is needed, so you may safely open and close a Session even if you are not sure that data access will be needed to serve a particular request. (This becomes important as soon as you implement some of the following patterns using request interception.)

To complete this picture you also have to think about database transactions. A database transaction has to be as short as possible, to reduce lock contention in the database. Long database transactions will prevent your application from scaling to highly concurrent load.
What is the scope of a unit of work? Can a single Hibernate Session span several database transactions, or is this a one-to-one relationship of scopes? When should you open and close a Session, and how do you demarcate the database transaction boundaries?

Unit of work

First, don't use the session-per-operation antipattern, that is, don't open and close a Session for every simple database call in a single thread! Of course, the same is true for database transactions. Database calls in an application are made in a planned sequence; they are grouped into atomic units of work. (Note that this also means that auto-commit after every single SQL statement is useless in an application; this mode is intended for ad-hoc SQL console work. Hibernate disables auto-commit mode immediately, or expects the application server to do so.)

The most common pattern in a multi-user client/server application is session-per-request. In this model, a request from the client is sent to the server (where the Hibernate persistence layer runs), a new Hibernate Session is opened, and all database operations are executed in this unit of work. Once the work has been completed (and the response for the client has been prepared), the session is flushed and closed. You would also use a single database transaction to serve the client's request, starting and committing it when you open and close the Session. The relationship between the two is one-to-one, and this model is a perfect fit for many applications.

The challenge lies in the implementation: not only do the Session and transaction have to be started and ended correctly, they also have to be accessible to data access operations. The demarcation of a unit of work is ideally implemented using an interceptor that runs when a request hits the server and before the response is sent (e.g. a servlet filter). We recommend binding the Session to the thread that serves the request, using a ThreadLocal variable.
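The thread-bound approach can be sketched as follows. This is a minimal illustration under stated assumptions, not Hibernate code: WorkContext is a hypothetical stand-in for a Session so that the example runs on the JDK alone, and handleRequest plays the role an interceptor would play around one request.

```java
import java.util.ArrayList;
import java.util.List;

class ThreadBoundContext {

    /** Hypothetical stand-in for a Session: records the operations of one unit of work. */
    static final class WorkContext {
        final List<String> operations = new ArrayList<>();
        boolean closed = false;
    }

    // One slot per thread; each request thread sees only its own context.
    private static final ThreadLocal<WorkContext> CURRENT = new ThreadLocal<>();

    /** Returns the context bound to this thread, opening one on first use. */
    static WorkContext current() {
        WorkContext ctx = CURRENT.get();
        if (ctx == null) {
            ctx = new WorkContext();
            CURRENT.set(ctx);
        }
        return ctx;
    }

    /** Ends the unit of work: unbinds the context from the thread and marks it closed. */
    static void close() {
        WorkContext ctx = CURRENT.get();
        CURRENT.remove();
        if (ctx != null) {
            ctx.closed = true;
        }
    }

    /** What an interceptor would do around a single request. */
    static WorkContext handleRequest(String operation) {
        WorkContext ctx = current();
        try {
            // Data access code anywhere on this thread reaches the same context.
            current().operations.add(operation);
            return ctx;
        } finally {
            close(); // always unbind, even if the work fails
        }
    }
}
```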
This allows easy access (like accessing a static variable) in all code that runs in this thread. Depending on the database transaction demarcation mechanism you choose, you might also keep the transaction context in a ThreadLocal variable. The implementation patterns for this are known as ThreadLocal Session and Open Session in View. You can easily extend the HibernateUtil helper class shown earlier in this documentation to implement this. Of course, you'd have to find a way to implement an interceptor and set it up in your environment. See the Hibernate website for tips and examples.

Application transactions

The session-per-request pattern is not the only useful concept you can use to design units of work. Many business processes require a whole series of interactions with the user, interleaved with database accesses. In web and enterprise applications it is not acceptable for a database transaction to span a user interaction. Consider the following example:

The first screen of a dialog opens; the data seen by the user has been loaded in a particular Session and database transaction. The user is free to modify the objects.

The user clicks "Save" after 5 minutes and expects the modifications to be made persistent; the user also expects to have been the only person editing this information, so that no conflicting modification can occur.

We call this unit of work, from the point of view of the user, a long-running application transaction. There are many ways to implement this in your application.

A first naive implementation might keep the Session and database transaction open during user think time, with locks held in the database to prevent concurrent modification and to guarantee isolation and atomicity. This is of course an anti-pattern, since lock contention would not allow the application to scale with the number of concurrent users.

Clearly, we have to use several database transactions to implement the application transaction.
In this case, maintaining isolation of business processes becomes the partial responsibility of the application tier. A single application transaction usually spans several database transactions. It will be atomic if only one of these database transactions (the last one) stores the updated data and all others simply read data (e.g. in a wizard-style dialog spanning several request/response cycles). This is easier to implement than it might sound, especially if you use Hibernate's features:

Automatic Versioning - Hibernate can do automatic optimistic concurrency control for you; it can automatically detect whether a concurrent modification occurred during user think time.

Detached Objects - If you decide to use the already discussed session-per-request pattern, all loaded instances will be in detached state during user think time. Hibernate allows you to reattach the objects and persist the modifications; the pattern is called session-per-request-with-detached-objects. Automatic versioning is used to isolate concurrent modifications.

Long Session - The Hibernate Session may be disconnected from the underlying JDBC connection after the database transaction has been committed, and reconnected when a new client request occurs. This pattern is known as session-per-application-transaction and makes even reattachment unnecessary. Automatic versioning is used to isolate concurrent modifications.

Both session-per-request-with-detached-objects and session-per-application-transaction have advantages and disadvantages; we discuss them later in this chapter in the context of optimistic concurrency control.

Considering object identity

An application may concurrently access the same persistent state in two different Sessions. However, an instance of a persistent class is never shared between two Session instances.
Hence there are two different notions of identity:

Database Identity: foo.getId().equals( bar.getId() )
JVM Identity: foo==bar

For objects attached to a particular Session (i.e. in the scope of a Session) the two notions are equivalent, and Hibernate guarantees that JVM identity matches database identity. However, while the application might concurrently access the "same" (persistent identity) business object in two different sessions, the two instances will actually be "different" (JVM identity). Conflicts are resolved using automatic versioning at flush/commit time, following an optimistic approach.

This approach leaves Hibernate and the database to worry about concurrency. It also provides the best scalability, since guaranteeing identity only within single-threaded units of work requires no expensive locking or other means of synchronization. The application never needs to synchronize on any business object, as long as it sticks to a single thread per Session. Within a Session the application may safely use == to compare objects.

However, an application that uses == outside of a Session might see unexpected results. This might occur even in some unexpected places, for example, if you put two detached instances into the same Set. Both might have the same database identity (i.e. they represent the same row), but JVM identity is by definition not guaranteed for instances in detached state. The developer has to override the equals() and hashCode() methods in persistent classes and implement an application-specific notion of object equality. There is one caveat: never use the database identifier to implement equality; use a business key, a combination of unique, usually immutable, attributes. The database identifier will change if a transient object is made persistent. If the transient instance (usually together with detached instances) is held in a Set, changing the hashcode breaks the contract of the Set.
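A minimal sketch of business-key equality, assuming a hypothetical persistent class Book whose ISBN is unique and immutable. Neither the class nor its properties come from this chapter; they only illustrate the rule that equals() and hashCode() ignore the database identifier.

```java
import java.util.Objects;

/** Hypothetical persistent class; the ISBN serves as the business key. */
class Book {
    private Long id;            // database identifier, null while the object is transient
    private final String isbn;  // business key: unique and immutable
    private String title;

    Book(String isbn, String title) {
        this.isbn = Objects.requireNonNull(isbn, "isbn");
        this.title = title;
    }

    Long getId() { return id; }
    void setId(Long id) { this.id = id; }
    String getIsbn() { return isbn; }
    String getTitle() { return title; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Book)) return false;
        // Compare the business key only; the identifier is deliberately ignored.
        return isbn.equals(((Book) other).isbn);
    }

    @Override
    public int hashCode() {
        // Derived from the business key, so it stays constant when the id is assigned.
        return isbn.hashCode();
    }
}
```

With this implementation, a transient instance and a detached instance that carry the same ISBN count as one element of a Set, and assigning the identifier later does not move the object to a different hash bucket.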
Attributes for business keys don't have to be as stable as database primary keys; you only have to guarantee stability as long as the objects are in the same Set. See the Hibernate website for a more thorough discussion of this issue. Also note that this is not a Hibernate issue, but simply how Java object identity and equality have to be implemented.

Common issues

Never use the anti-patterns session-per-user-session or session-per-application (of course, there are rare exceptions to this rule). Note that some of the following issues might also appear with the recommended patterns; make sure you understand the implications before making a design decision:

A Session is not thread-safe. Things which are supposed to work concurrently, like HTTP requests, session beans, or Swing workers, will cause race conditions if a Session instance is shared. If you keep your Hibernate Session in your HttpSession (discussed later), you should consider synchronizing access to your HttpSession. Otherwise, a user who clicks reload fast enough may use the same Session in two concurrently running threads.

An exception thrown by Hibernate means you have to roll back your database transaction and close the Session immediately (discussed later in more detail). If your Session is bound to the application, you have to stop the application. Rolling back the database transaction doesn't put your business objects back into the state they were in at the start of the transaction. This means the database state and the business objects do get out of sync. Usually this is not a problem, because exceptions are not recoverable and you have to start over after rollback anyway.

The Session caches every object that is in persistent state (watched and checked for dirty state by Hibernate). This means it grows endlessly until you get an OutOfMemoryError if you keep it open for a long time or simply load too much data.
One solution for this is to call clear() and evict() to manage the Session cache, but you most likely should consider a stored procedure if you need mass data operations. Some solutions are shown in . Keeping a Session open for the duration of a user session also means a high probability of stale data.

Database transaction demarcation

Database (or system) transaction boundaries are always necessary. No communication with the database can occur outside of a database transaction (this seems to confuse many developers who are used to the auto-commit mode). Always use clear transaction boundaries, even for read-only operations. Depending on your isolation level and database capabilities this might not be required, but there is no downside if you always demarcate transactions explicitly.

A Hibernate application can run in non-managed (i.e. standalone, simple Web or Swing applications) and managed J2EE environments. In a non-managed environment, Hibernate is usually responsible for its own database connection pool. The application developer has to set transaction boundaries manually, in other words, begin, commit, or roll back database transactions explicitly. A managed environment usually provides container-managed transactions, with the transaction assembly defined declaratively in deployment descriptors of EJB session beans, for example. Programmatic transaction demarcation is then no longer necessary; even flushing the Session is done automatically.

However, it is often desirable to keep your persistence layer portable. Hibernate offers a wrapper API called Transaction that translates into the native transaction system of your deployment environment. This API is actually optional, but we strongly encourage its use unless you are in a CMT session bean.
Usually, ending a Session involves four distinct phases:

flush the session
commit the transaction
close the session
handle exceptions

Flushing the session has been discussed earlier; we'll now have a closer look at transaction demarcation and exception handling in both managed and non-managed environments.

Non-managed environment

If a Hibernate persistence layer runs in a non-managed environment, database connections are usually handled by Hibernate's pooling mechanism. The session/transaction handling idiom is: open a Session, begin a transaction, do the work, and commit; if a RuntimeException is thrown, roll back the transaction and rethrow the exception; in all cases, close the Session in a finally block.

You don't have to flush() the Session explicitly - the call to commit() automatically triggers the synchronization.

A call to close() marks the end of a session. The main implication of close() is that the JDBC connection will be relinquished by the session.

This idiom is portable and runs in both non-managed and JTA environments.

You will very likely never see this idiom in business code in a normal application; fatal (system) exceptions should always be caught at the "top". In other words, the code that executes Hibernate calls (in the persistence layer) and the code that handles RuntimeException (and usually can only clean up and exit) are in different layers. This can be a challenge to design yourself, and you should use J2EE/EJB container services whenever they are available. Exception handling is discussed later in this chapter.

Note that you should select org.hibernate.transaction.JDBCTransactionFactory (which is the default).

Using JTA

If your persistence layer runs in an application server (e.g. behind EJB session beans), every datasource connection obtained by Hibernate will automatically be part of the global JTA transaction. Hibernate offers two strategies for this integration.

If you use bean-managed transactions (BMT), Hibernate will tell the application server to start and end a BMT transaction if you use the Transaction API. So, the transaction management code is identical to the non-managed environment.
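The control flow of the session/transaction handling idiom (begin, do the work, commit, roll back on a RuntimeException, always close) can be sketched without any Hibernate dependency. UnitOfWork and Tx below are hypothetical stand-ins for Session and Transaction, introduced only so the example runs on a plain JDK; they are not Hibernate types.

```java
import java.util.ArrayList;
import java.util.List;

class TransactionIdiom {

    /** Hypothetical stand-in for a transaction handle. */
    interface Tx {
        void commit();
        void rollback();
    }

    /** Hypothetical stand-in for a session-like resource. */
    interface UnitOfWork {
        Tx beginTransaction();
        void close();
    }

    /** Runs the work inside one transaction: commit on success, roll back on failure. */
    static void execute(UnitOfWork session, Runnable work) {
        Tx tx = null;
        try {
            tx = session.beginTransaction();
            work.run();
            tx.commit();
        } catch (RuntimeException e) {
            if (tx != null) {
                tx.rollback();
            }
            throw e; // let a higher layer report the failure
        } finally {
            session.close(); // always release the resource
        }
    }

    /** Recording implementation used to observe the order of calls. */
    static final class RecordingUnitOfWork implements UnitOfWork {
        final List<String> events = new ArrayList<>();

        @Override
        public Tx beginTransaction() {
            events.add("begin");
            return new Tx() {
                @Override public void commit() { events.add("commit"); }
                @Override public void rollback() { events.add("rollback"); }
            };
        }

        @Override
        public void close() {
            events.add("close");
        }
    }
}
```

Rethrowing from the catch block keeps the error handling in a higher layer, which matches the advice that fatal exceptions are caught at the "top" rather than in persistence code.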
With CMT, transaction demarcation is done in session bean deployment descriptors, not programmatically. If you don't want to manually flush and close the Session yourself, just set hibernate.transaction.flush_before_completion to true, hibernate.connection.release_mode to after_statement or auto, and hibernate.transaction.auto_close_session to true. Hibernate will then automatically flush and close the Session for you. The only thing left is to roll back the transaction when an exception occurs. Fortunately, in a CMT bean, even this happens automatically, since an unhandled RuntimeException thrown by a session bean method tells the container to set the global transaction to rollback. This means you do not need to use the Hibernate Transaction API at all in CMT.

Note that you should choose org.hibernate.transaction.JTATransactionFactory in a BMT session bean, and org.hibernate.transaction.CMTTransactionFactory in a CMT session bean, when you configure Hibernate's transaction factory. Remember to also set hibernate.transaction.manager_lookup_class.

If you work in a CMT environment and use automatic flushing and closing of the session, you might also want to use the same session in different parts of your code. Typically, in a non-managed environment you would use a ThreadLocal variable to hold the session, but a single EJB request might execute in different threads (e.g. a session bean calling another session bean). If you don't want to bother passing your Session instance around, the SessionFactory provides the getCurrentSession() method, which returns a session that is bound to the JTA transaction context. This is the easiest way to integrate Hibernate into an application! The "current" session always has auto-flush, auto-close, and auto-connection-release enabled (regardless of the above property settings).
Our session/transaction management idiom is then reduced to almost nothing: all you have to do in a managed environment is call SessionFactory.getCurrentSession(), do your data access work, and leave the rest to the container. Transaction boundaries are set declaratively in the deployment descriptors of your session bean. The lifecycle of the session is completely managed by Hibernate.

There is one caveat to the use of the after_statement connection release mode. Due to a silly limitation of the JTA spec, it is not possible for Hibernate to automatically clean up any unclosed ScrollableResults or Iterator instances returned by scroll() or iterate(). You must release the underlying database cursor by calling ScrollableResults.close() or Hibernate.close(Iterator) explicitly from a finally block. (Of course, most applications can easily avoid using scroll() or iterate() at all from the CMT code.)

Exception handling

If the Session throws an exception (including any SQLException), you should immediately roll back the database transaction, call Session.close(), and discard the Session instance. Certain methods of Session will not leave the session in a consistent state. No exception thrown by Hibernate can be treated as recoverable. Ensure that the Session will be closed by calling close() in a finally block.

The HibernateException, which wraps most of the errors that can occur in a Hibernate persistence layer, is an unchecked exception (it wasn't in older versions of Hibernate). In our opinion, we shouldn't force the application developer to catch an unrecoverable exception at a low layer. In most systems, unchecked and fatal exceptions are handled in one of the first frames of the method call stack (i.e. in higher layers) and an error message is presented to the application user (or some other appropriate action is taken). Note that Hibernate might also throw other unchecked exceptions which are not a HibernateException.
These are, again, not recoverable and appropriate action should be taken.

Hibernate wraps SQLExceptions thrown while interacting with the database in a JDBCException. In fact, Hibernate will attempt to convert the exception into a more meaningful subclass of JDBCException. The underlying SQLException is always available via JDBCException.getCause(). Hibernate converts the SQLException into an appropriate JDBCException subclass using the SQLExceptionConverter attached to the SessionFactory. By default, the SQLExceptionConverter is defined by the configured dialect; however, it is also possible to plug in a custom implementation (see the javadocs for the SQLExceptionConverterFactory class for details). The standard JDBCException subtypes are:

JDBCConnectionException - indicates an error with the underlying JDBC communication.
SQLGrammarException - indicates a grammar or syntax problem with the issued SQL.
ConstraintViolationException - indicates some form of integrity constraint violation.
LockAcquisitionException - indicates an error acquiring a lock level necessary to perform the requested operation.
GenericJDBCException - a generic exception which did not fall into any of the other categories.

Optimistic concurrency control

The only approach that is consistent with high concurrency and high scalability is optimistic concurrency control with versioning. Version checking uses version numbers, or timestamps, to detect conflicting updates (and to prevent lost updates). Hibernate provides three possible approaches to writing application code that uses optimistic concurrency. The use cases we show are in the context of long application transactions, but version checking also has the benefit of preventing lost updates in single database transactions.
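The principle behind version checking can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions rather than Hibernate's implementation: VersionedStore, Row, and StaleRowException are hypothetical names, and the update method mimics an UPDATE statement whose WHERE clause includes the version that the caller originally read.

```java
import java.util.HashMap;
import java.util.Map;

class VersionedStore {

    /** One row: a payload plus the version number used for conflict detection. */
    static final class Row {
        final String value;
        final int version;

        Row(String value, int version) {
            this.value = value;
            this.version = version;
        }
    }

    /** Hypothetical exception signalling that the row changed since it was read. */
    static final class StaleRowException extends RuntimeException {
        StaleRowException(String message) {
            super(message);
        }
    }

    private final Map<Long, Row> table = new HashMap<>();

    synchronized void insert(long id, String value) {
        table.put(id, new Row(value, 0));
    }

    synchronized Row read(long id) {
        return table.get(id);
    }

    /**
     * Mimics "UPDATE ... SET value = ?, version = version + 1
     * WHERE id = ? AND version = ?": the write succeeds only if
     * the stored version is still the one the caller read.
     */
    synchronized void update(long id, int expectedVersion, String newValue) {
        Row current = table.get(id);
        if (current == null || current.version != expectedVersion) {
            throw new StaleRowException("row " + id + " was modified concurrently");
        }
        table.put(id, new Row(newValue, expectedVersion + 1));
    }
}
```

If two users read the same row and both try to save, the first update increments the version and the second fails instead of silently overwriting the first, which is exactly the lost update that versioning is meant to prevent.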
Application version checking

In an implementation without much help from Hibernate, each interaction with the database occurs in a new Session, and the developer is responsible for reloading all persistent instances from the database before manipulating them. This approach forces the application to carry out its own version checking to ensure application transaction isolation. This approach is the least efficient in terms of database access. It is the approach most similar to entity EJBs.

The version property is mapped using <version>, and Hibernate will automatically increment it during flush if the entity is dirty.

Of course, if you are operating in a low-data-concurrency environment and don't require version checking, you may use this approach and just skip the version check. In that case, last commit wins will be the default strategy for your long application transactions. Keep in mind that this might confuse the users of the application, as they might experience lost updates without error messages or a chance to merge conflicting changes.

Clearly, manual version checking is only feasible in very trivial circumstances and not practical for most applications. Often not only single instances, but complete graphs of modified objects have to be checked. Hibernate offers automatic version checking with either long Session or detached instances as the design paradigm.

Long session and automatic versioning

A single Session instance and its persistent instances are used for the whole application transaction. Hibernate checks instance versions at flush time, throwing an exception if concurrent modification is detected. It's up to the developer to catch and handle this exception (common options are to give the user the opportunity to merge changes or to restart the business process with non-stale data).

The Session is disconnected from any underlying JDBC connection when waiting for user interaction. This approach is the most efficient in terms of database access.
The application need not concern itself with version checking or with reattaching detached instances, nor does it have to reload instances in every database transaction.

A persistent instance such as foo still knows which Session it was loaded in. Session.reconnect() obtains a new connection (or you may supply one) and resumes the session. The method Session.disconnect() will disconnect the session from the JDBC connection and return the connection to the pool (unless you provided the connection). After reconnection, to force a version check on data you aren't updating, you may call Session.lock() with LockMode.READ on any objects that might have been updated by another transaction. You don't need to lock any data that you are updating.

If the explicit calls to disconnect() and reconnect() are too onerous, you may instead use hibernate.connection.release_mode.

This pattern is problematic if the Session is too big to be stored during user think time, e.g. an HttpSession should be kept as small as possible. As the Session is also the (mandatory) first-level cache and contains all loaded objects, we can probably use this strategy only for a few request/response cycles. This is indeed recommended, as the Session will soon also have stale data.

Also note that you should keep the disconnected Session close to the persistence layer. In other words, use an EJB stateful session bean to hold the Session and don't transfer it to the web layer (or even serialize it to a separate tier) to store it in the HttpSession.

Detached objects and automatic versioning

Each interaction with the persistent store occurs in a new Session. However, the same persistent instances are reused for each interaction with the database. The application manipulates the state of detached instances originally loaded in another Session and then reattaches them using Session.update(), Session.saveOrUpdate(), or Session.merge().
Again, Hibernate will check instance versions during flush, throwing an exception if conflicting updates occurred.

You may also call lock() instead of update() and use LockMode.READ (performing a version check, bypassing all caches) if you are sure that the object has not been modified.

Customizing automatic versioning

You may disable Hibernate's automatic version increment for particular properties and collections by setting the optimistic-lock mapping attribute to false. Hibernate will then no longer increment versions if the property is dirty.

Legacy database schemas are often static and can't be modified. Or, other applications might also access the same database and not know how to handle version numbers or even timestamps. In both cases, versioning can't rely on a particular column in a table. To force a version check without a version or timestamp property mapping, by comparing the state of all fields in a row, set optimistic-lock="all" in the <class> mapping. Note that this conceptually only works if Hibernate can compare the old and new state, i.e. if you use a single long Session and not session-per-request-with-detached-objects.

Sometimes concurrent modification can be permitted as long as the changes that have been made don't overlap. If you set optimistic-lock="dirty" when mapping the <class>, Hibernate will only compare dirty fields during flush.

In both cases, with dedicated version/timestamp columns or with full/dirty field comparison, Hibernate uses a single UPDATE statement (with an appropriate WHERE clause) per entity to execute the version check and update the information. If you use transitive persistence to cascade reattachment to associated entities, Hibernate might execute unnecessary updates. This is usually not a problem, but on-update triggers in the database might be executed even when no changes have been made to detached instances.
You can customize this behavior by setting select-before-update="true" in the <class> mapping, forcing Hibernate to SELECT the instance to ensure that changes did actually occur before updating the row.

Pessimistic Locking

It is not intended that users spend much time worrying about locking strategies. It's usually enough to specify an isolation level for the JDBC connections and then simply let the database do all the work. However, advanced users may sometimes wish to obtain exclusive pessimistic locks, or re-obtain locks at the start of a new transaction.

Hibernate will always use the locking mechanism of the database, never lock objects in memory!

The LockMode class defines the different lock levels that may be acquired by Hibernate. A lock is obtained by the following mechanisms:

LockMode.WRITE is acquired automatically when Hibernate updates or inserts a row.
LockMode.UPGRADE may be acquired upon explicit user request using SELECT ... FOR UPDATE on databases which support that syntax.
LockMode.UPGRADE_NOWAIT may be acquired upon explicit user request using a SELECT ... FOR UPDATE NOWAIT under Oracle.
LockMode.READ is acquired automatically when Hibernate reads data under Repeatable Read or Serializable isolation level. It may be re-acquired by explicit user request.
LockMode.NONE represents the absence of a lock. All objects switch to this lock mode at the end of a Transaction. Objects associated with the session via a call to update() or saveOrUpdate() also start out in this lock mode.

The "explicit user request" is expressed in one of the following ways:

A call to Session.load(), specifying a LockMode.
A call to Session.lock().
A call to Query.setLockMode().

If Session.load() is called with UPGRADE or UPGRADE_NOWAIT, and the requested object was not yet loaded by the session, the object is loaded using SELECT ... FOR UPDATE.
If load() is called for an object that is already loaded with a less restrictive lock than the one requested, Hibernate calls lock() for that object. Session.lock() performs a version number check if the specified lock mode is READ, UPGRADE or UPGRADE_NOWAIT. (In the case of UPGRADE or UPGRADE_NOWAIT, SELECT ... FOR UPDATE is used.) If the database does not support the requested lock mode, Hibernate will use an appropriate alternate mode (instead of throwing an exception). This ensures that applications will be portable.
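The idea of degrading to an alternate lock mode can be sketched as a small function that picks the locking clause appended to a SELECT. Everything here is an assumption made for illustration: Level is a hypothetical enum that only borrows the names used in the text, the capability flags are invented parameters, and the degradation order (NOWAIT, then plain FOR UPDATE, then no clause) is not taken from Hibernate, which is only said to use "an appropriate alternate mode".

```java
class LockClauseSketch {

    /** Hypothetical lock levels that mirror the names used in the text. */
    enum Level { NONE, READ, UPGRADE, UPGRADE_NOWAIT }

    /**
     * Returns the clause to append to a SELECT for the requested level.
     * Unsupported requests degrade instead of failing (assumed order:
     * FOR UPDATE NOWAIT, then FOR UPDATE, then no clause at all).
     */
    static String selectSuffix(Level requested, boolean supportsForUpdate, boolean supportsNoWait) {
        boolean wantsRowLock = requested == Level.UPGRADE || requested == Level.UPGRADE_NOWAIT;
        if (!wantsRowLock || !supportsForUpdate) {
            return ""; // NONE and READ need no clause; without FOR UPDATE support, read plainly
        }
        if (requested == Level.UPGRADE_NOWAIT && supportsNoWait) {
            return " for update nowait";
        }
        return " for update";
    }
}
```

Returning a weaker clause rather than throwing keeps calling code identical across databases, which is the portability argument made above.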