From de1d2a6247fa6d950e5e53021a0896c7820f6623 Mon Sep 17 00:00:00 2001 From: Erik-Berndt Scheper Date: Wed, 24 Nov 2010 13:58:54 +0100 Subject: [PATCH] HHH-5371 - Document sql table partitioning for audit tables --- .../main/docbook/en-US/Envers_Reference.xml | 1 + ...ibernate_Envers_-_Easy_Entity_Auditing.xml | 1 + .../docbook/en-US/content/partitioning.xml | 358 ++++++++++++++++++ 3 files changed, 360 insertions(+) create mode 100644 documentation/envers/src/main/docbook/en-US/content/partitioning.xml diff --git a/documentation/envers/src/main/docbook/en-US/Envers_Reference.xml b/documentation/envers/src/main/docbook/en-US/Envers_Reference.xml index 01579e0356..ab6f22776a 100644 --- a/documentation/envers/src/main/docbook/en-US/Envers_Reference.xml +++ b/documentation/envers/src/main/docbook/en-US/Envers_Reference.xml @@ -54,6 +54,7 @@ + diff --git a/documentation/envers/src/main/docbook/en-US/Hibernate_Envers_-_Easy_Entity_Auditing.xml b/documentation/envers/src/main/docbook/en-US/Hibernate_Envers_-_Easy_Entity_Auditing.xml index 7c312de837..c4149d724d 100644 --- a/documentation/envers/src/main/docbook/en-US/Hibernate_Envers_-_Easy_Entity_Auditing.xml +++ b/documentation/envers/src/main/docbook/en-US/Hibernate_Envers_-_Easy_Entity_Auditing.xml @@ -54,6 +54,7 @@ + diff --git a/documentation/envers/src/main/docbook/en-US/content/partitioning.xml b/documentation/envers/src/main/docbook/en-US/content/partitioning.xml new file mode 100644 index 0000000000..86ff8a06d1 --- /dev/null +++ b/documentation/envers/src/main/docbook/en-US/content/partitioning.xml @@ -0,0 +1,358 @@ + + + + +%BOOK_ENTITIES; +]> + + + Audit table partitioning + +
+ + Benefits of audit table partitioning + + + Because audit tables tend to grow indefinitely they can quickly become really large. When the audit tables have grown + to a certain limit (varying per RDBMS and/or operating system) it makes sense to start using table partitioning. + SQL table partitioning offers a lot of advantages including, but certainly not limited to: + + + + Improved query performance by selectively moving rows to various partitions (or even purging old rows) + + + + + Faster data loads, index creation, etc. + + + + + +
+ +
+ + Suitable columns for audit table partitioning + + Generally SQL tables must be partitioned on a column that exists within the table. As a rule it makes sense to use + either the end revision or the end revision timestamp column for + partioning of audit tables. + + + End revision information is not available for the default AuditStrategy. + + + + Therefore the following Envers configuration options are required: + + + org.hibernate.envers.audit_strategy = + org.hibernate.envers.strategy.ValidityAuditStrategy + + + org.hibernate.envers.audit_strategy_validity_store_revend_timestamp = + true + + + + Optionally, you can also override the default values following properties: + + + org.hibernate.envers.audit_strategy_validity_end_rev_field_name + + + org.hibernate.envers.audit_strategy_validity_revend_timestamp_field_name + + + + For more information, see . + + + + + + The reason why the end revision information should be used for audit table partioning is based on the assumption that + audit tables should be partionioned on an 'increasing level of interestingness', like so: + + + + + + + A couple of partitions with audit data that is not very (or no longer) interesting. + This can be stored on slow media, and perhaps even be purged eventually. + + + + + Some partitions for audit data that is potentially interesting. + + + + + One partition for audit data that is most likely to be interesting. + This should be stored on the fastest media, both for reading and writing. + + + + + + +
+ +
+ + Audit table partitioning example + + In order to determine a suitable column for the 'increasing level of interestingness', + consider a simplified example of a salary registration for an unnamed agency. + + + + Currently, the salary table contains the following rows for a certain person X: + + + Salaries table + + + + + + Year + Salary (USD) + + + + + 2006 + 3300 + + + 2007 + 3500 + + + 2008 + 4000 + + + 2009 + 4500 + + + +
+
+ + + The salary for the current fiscal year (2010) is unknown. The agency requires that all changes in registered + salaries for a fiscal year are recorded (i.e. an audit trail). The rationale behind this is that decisions + made at a certain date are based on the registered salary at that time. And at any time it must be possible + reproduce the reason why a certain decision was made at a certain date. + + + + The following audit information is available, sorted on in order of occurrence: + + + Salaries - audit table + + + + + + + + + Year + Revision type + Revision timestamp + Salary (USD) + End revision timestamp + + + + + 2006 + ADD + 2007-04-01 + 3300 + null + + + 2007 + ADD + 2008-04-01 + 35 + 2008-04-02 + + + 2007 + MOD + 2008-04-02 + 3500 + null + + + 2008 + ADD + 2009-04-01 + 3700 + 2009-07-01 + + + 2008 + MOD + 2009-07-01 + 4100 + 2010-02-01 + + + 2008 + MOD + 2010-02-01 + 4000 + null + + + 2009 + ADD + 2010-04-01 + 4500 + null + + + +
+
+ +
+ + Determining a suitable partitioning column + + To partition this data, the 'level of interestingness' must be defined. + Consider the following: + + + + For fiscal year 2006 there is only one revision. It has the oldest revision timestamp + of all audit rows, but should still be regarded as interesting because it is the latest modification + for this fiscal year in the salary table; its end revision timestamp is null. + + + Also note that it would be very unfortunate if in 2011 there would be an update of the salary for fiscal + year 2006 (which is possible in until at least 10 years after the fiscal year) and the audit + information would have been moved to a slow disk (based on the age of the + revision timestamp). Remember that in this case Envers will have to update + the end revision timestamp of the most recent audit row. + + + + + There are two revisions in the salary of fiscal year 2007 which both have nearly the same + revision timestamp and a different end revision timestamp. + On first sight it is evident that the first revision was a mistake and probably uninteresting. + The only interesting revision for 2007 is the one with end revision timestamp null. + + + + + Based on the above, it is evident that only the end revision timestamp is suitable for + audit table partitioning. The revision timestamp is not suitable. + + +
+ +
+ + Determining a suitable partitioning scheme + + A possible partitioning scheme for the salary table would be as follows: + + + + end revision timestamp year = 2008 + + + This partition contains audit data that is not very (or no longer) interesting. + + + + + end revision timestamp year = 2009 + + + This partition contains audit data that is potentially interesting. + + + + + end revision timestamp year >= 2010 or null + + + This partition contains the most interesting audit data. + + + + + + + This partitioning scheme also covers the potential problem of the update of the + end revision timestamp, which occurs if a row in the audited table is modified. + Even though Envers will update the end revision timestamp of the audit row to + the system date at the instant of modification, the audit row will remain in the same partition + (the 'extension bucket'). + + + + And sometime in 2011, the last partition (or 'extension bucket') is split into two new partitions: + + + + end revision timestamp year = 2010 + + + This partition contains audit data that is potentially interesting (in 2011). + + + + + end revision timestamp year >= 2011 or null + + + This partition contains the most interesting audit data and is the new 'extension bucket'. + + + + + +
+ +
+
+