HHH-6036: Moving the "partitioning" chapter of Envers docs
Commit 6c7ec667a0 (parent 61bd52fb34)
@@ -540,4 +540,332 @@ public class ExampleListener implements RevisionListener {
</section>
</section>

<section id="envers-partitioning">
<title>Advanced: Audit table partitioning</title>

<section id="envers-partitioning-benefits">

<title>Benefits of audit table partitioning</title>

<para>
Because audit tables tend to grow indefinitely, they can quickly become very large. When the audit tables have grown
to a certain size (which varies per RDBMS and/or operating system), it makes sense to start using table partitioning.
SQL table partitioning offers many advantages, including, but certainly not limited to:
<orderedlist>
<listitem>
<para>
Improved query performance by selectively moving rows to various partitions (or even purging old rows).
</para>
</listitem>
<listitem>
<para>
Faster data loads, index creation, etc.
</para>
</listitem>
</orderedlist>
</para>

</section>

<section id="envers-partitioning-columns">

<title>Suitable columns for audit table partitioning</title>
<para>
Generally, SQL tables must be partitioned on a column that exists within the table. As a rule, it makes sense to use
either the <emphasis>end revision</emphasis> or the <emphasis>end revision timestamp</emphasis> column for
partitioning audit tables.
<note>
<para>
End revision information is not available for the default AuditStrategy.
</para>

<para>
Therefore, the following Envers configuration options are required:
</para>
<para>
<literal>org.hibernate.envers.audit_strategy</literal> =
<literal>org.hibernate.envers.strategy.ValidityAuditStrategy</literal>
</para>
<para>
<literal>org.hibernate.envers.audit_strategy_validity_store_revend_timestamp</literal> =
<literal>true</literal>
</para>

<para>
Optionally, you can also override the default values of the following properties:
</para>
<para>
<literal>org.hibernate.envers.audit_strategy_validity_end_rev_field_name</literal>
</para>
<para>
<literal>org.hibernate.envers.audit_strategy_validity_revend_timestamp_field_name</literal>
</para>

<para>
For more information, see <xref linkend="configuration"/>.
</para>
</note>
</para>
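<para>
As a minimal sketch, the two required settings and the two optional column-name overrides can be collected in a plain <literal>java.util.Properties</literal> object. How these properties are handed to Hibernate (for example through <literal>hibernate.properties</literal> or <literal>persistence.xml</literal>) depends on your setup and is not shown. The class and method names, and the column names <literal>END_REV</literal> and <literal>END_REV_TSTMP</literal>, are hypothetical illustration values, not Envers defaults.
</para>
<programlisting><![CDATA[
```java
import java.util.Properties;

// Minimal sketch: gathers the Envers settings from the note above in a
// java.util.Properties object. Passing them on to Hibernate is not shown.
final class EnversPartitioningConfig {

    // The two settings the note lists as required for partitioning.
    static Properties validityStrategyProperties() {
        Properties props = new Properties();
        props.setProperty("org.hibernate.envers.audit_strategy",
                "org.hibernate.envers.strategy.ValidityAuditStrategy");
        props.setProperty("org.hibernate.envers.audit_strategy_validity_store_revend_timestamp",
                "true");
        return props;
    }

    // Optional column-name overrides. END_REV and END_REV_TSTMP are
    // hypothetical example values, not the Envers defaults.
    static Properties withCustomColumnNames(Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty("org.hibernate.envers.audit_strategy_validity_end_rev_field_name",
                "END_REV");
        props.setProperty("org.hibernate.envers.audit_strategy_validity_revend_timestamp_field_name",
                "END_REV_TSTMP");
        return props;
    }

    public static void main(String[] args) {
        withCustomColumnNames(validityStrategyProperties())
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```
]]></programlisting>
<para>
Keeping the required settings separate from the optional overrides makes it easy to enable partition-friendly auditing without renaming any columns.
</para>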

<para>
The end revision information should be used for audit table partitioning because of the assumption that
audit tables should be partitioned on an 'increasing level of interestingness', like so:
</para>

<para>
<orderedlist>
<listitem>
<para>
A couple of partitions with audit data that is not very (or no longer) interesting.
This can be stored on slow media, and perhaps even be purged eventually.
</para>
</listitem>
<listitem>
<para>
Some partitions for audit data that is potentially interesting.
</para>
</listitem>
<listitem>
<para>
One partition for audit data that is most likely to be interesting.
This should be stored on the fastest media, both for reading and writing.
</para>
</listitem>
</orderedlist>
</para>

</section>

<section id="envers-partitioning-example">

<title>Audit table partitioning example</title>
<para>
To determine a suitable column for the 'increasing level of interestingness',
consider a simplified example of salary registration at an unnamed agency.
</para>

<para>
Currently, the salary table contains the following rows for a certain person X:

<table frame="topbot">
<title>Salaries table</title>
<tgroup cols="2">
<colspec colname="c1" colwidth="1*"/>
<colspec colname="c2" colwidth="1*"/>
<thead>
<row>
<entry>Year</entry>
<entry>Salary (USD)</entry>
</row>
</thead>
<tbody>
<row>
<entry>2006</entry>
<entry>3300</entry>
</row>
<row>
<entry>2007</entry>
<entry>3500</entry>
</row>
<row>
<entry>2008</entry>
<entry>4000</entry>
</row>
<row>
<entry>2009</entry>
<entry>4500</entry>
</row>
</tbody>
</tgroup>
</table>
</para>

<para>
The salary for the current fiscal year (2010) is unknown. The agency requires that all changes in registered
salaries for a fiscal year are recorded (i.e. an audit trail). The rationale behind this is that decisions
made at a certain date are based on the salary registered at that time, and at any time it must be possible
to reproduce the reason why a certain decision was made at a certain date.
</para>

<para>
The following audit information is available, sorted in order of occurrence:

<table frame="topbot">
<title>Salaries - audit table</title>
<tgroup cols="5">
<colspec colname="c1" colwidth="1*"/>
<colspec colname="c2" colwidth="1*"/>
<colspec colname="c3" colwidth="1*"/>
<colspec colname="c4" colwidth="1*"/>
<colspec colname="c5" colwidth="1*"/>
<thead>
<row>
<entry>Year</entry>
<entry>Revision type</entry>
<entry>Revision timestamp</entry>
<entry>Salary (USD)</entry>
<entry>End revision timestamp</entry>
</row>
</thead>
<tbody>
<row>
<entry>2006</entry>
<entry>ADD</entry>
<entry>2007-04-01</entry>
<entry>3300</entry>
<entry>null</entry>
</row>
<row>
<entry>2007</entry>
<entry>ADD</entry>
<entry>2008-04-01</entry>
<entry>35</entry>
<entry>2008-04-02</entry>
</row>
<row>
<entry>2007</entry>
<entry>MOD</entry>
<entry>2008-04-02</entry>
<entry>3500</entry>
<entry>null</entry>
</row>
<row>
<entry>2008</entry>
<entry>ADD</entry>
<entry>2009-04-01</entry>
<entry>3700</entry>
<entry>2009-07-01</entry>
</row>
<row>
<entry>2008</entry>
<entry>MOD</entry>
<entry>2009-07-01</entry>
<entry>4100</entry>
<entry>2010-02-01</entry>
</row>
<row>
<entry>2008</entry>
<entry>MOD</entry>
<entry>2010-02-01</entry>
<entry>4000</entry>
<entry>null</entry>
</row>
<row>
<entry>2009</entry>
<entry>ADD</entry>
<entry>2010-04-01</entry>
<entry>4500</entry>
<entry>null</entry>
</row>
</tbody>
</tgroup>
</table>
</para>
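<para>
The end revision timestamp is what makes the agency's requirement, stated above, practical to satisfy. The salary registered for a fiscal year on a given date is the audit row whose revision timestamp is on or before that date and whose end revision timestamp is either null or after it. The following is a minimal in-memory sketch of that lookup over the rows above. It is not Envers' actual query. It assumes a row is valid from its revision timestamp (inclusive) until its end revision timestamp (exclusive). The names <literal>SalaryAudit</literal>, <literal>AuditRow</literal> and <literal>salaryAsOf</literal> are hypothetical.
</para>
<programlisting><![CDATA[
```java
import java.time.LocalDate;
import java.util.List;
import java.util.Optional;

// Hypothetical in-memory model of the salaries audit table above.
final class SalaryAudit {

    // One audit row; endRevTimestamp == null marks the current row.
    record AuditRow(int fiscalYear, String revType, LocalDate revTimestamp,
                    int salary, LocalDate endRevTimestamp) {

        // Assumed validity rule: [revTimestamp, endRevTimestamp), open-ended when null.
        boolean validOn(LocalDate date) {
            boolean started = !revTimestamp.isAfter(date);
            boolean notEnded = endRevTimestamp == null || endRevTimestamp.isAfter(date);
            return started && notEnded;
        }
    }

    static final List<AuditRow> ROWS = List.of(
            new AuditRow(2006, "ADD", LocalDate.of(2007, 4, 1), 3300, null),
            new AuditRow(2007, "ADD", LocalDate.of(2008, 4, 1), 35, LocalDate.of(2008, 4, 2)),
            new AuditRow(2007, "MOD", LocalDate.of(2008, 4, 2), 3500, null),
            new AuditRow(2008, "ADD", LocalDate.of(2009, 4, 1), 3700, LocalDate.of(2009, 7, 1)),
            new AuditRow(2008, "MOD", LocalDate.of(2009, 7, 1), 4100, LocalDate.of(2010, 2, 1)),
            new AuditRow(2008, "MOD", LocalDate.of(2010, 2, 1), 4000, null),
            new AuditRow(2009, "ADD", LocalDate.of(2010, 4, 1), 4500, null));

    // Salary registered for the given fiscal year, as known on the given date.
    static Optional<Integer> salaryAsOf(int fiscalYear, LocalDate date) {
        return ROWS.stream()
                .filter(row -> row.fiscalYear() == fiscalYear && row.validOn(date))
                .map(AuditRow::salary)
                .findFirst();
    }
}
```
]]></programlisting>
<para>
For example, <literal>salaryAsOf(2008, LocalDate.of(2009, 8, 15))</literal> yields 4100, while the same lookup on 2010-03-01 yields 4000. This is exactly the salary a decision on each of those dates would have been based on.
</para>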

<section id="envers-partitioning-example-column">

<title>Determining a suitable partitioning column</title>
<para>
To partition this data, the 'level of interestingness' must be defined.
Consider the following:
<orderedlist>
<listitem>
<para>
For fiscal year 2006 there is only one revision. It has the oldest <emphasis>revision timestamp</emphasis>
of all audit rows, but should still be regarded as interesting because it is the latest modification
for this fiscal year in the salary table; its <emphasis>end revision timestamp</emphasis> is null.
</para>
<para>
Also note that it would be very unfortunate if in 2011 there were an update of the salary for fiscal
year 2006 (which is possible until at least 10 years after the fiscal year) and the audit
information had been moved to a slow disk (based on the age of the
<emphasis>revision timestamp</emphasis>). Remember that in this case Envers will have to update
the <emphasis>end revision timestamp</emphasis> of the most recent audit row.
</para>
</listitem>
<listitem>
<para>
There are two revisions of the salary for fiscal year 2007, which have nearly the same
<emphasis>revision timestamp</emphasis> but a different <emphasis>end revision timestamp</emphasis>.
At first sight, it is evident that the first revision was a mistake and probably uninteresting.
The only interesting revision for 2007 is the one whose <emphasis>end revision timestamp</emphasis> is null.
</para>
</listitem>
</orderedlist>

Based on the above, it is evident that only the <emphasis>end revision timestamp</emphasis> is suitable for
audit table partitioning. The <emphasis>revision timestamp</emphasis> is not suitable.
</para>
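<para>
The argument above can also be checked mechanically. The sketch below is a hypothetical illustration, not Envers code. It reuses the fiscal years and timestamps from the audit table and collects the partition keys that would hold the current rows (those whose end revision timestamp is null). It does this once when partitioning by the year of the revision timestamp and once when partitioning by the year of the end revision timestamp.
</para>
<programlisting><![CDATA[
```java
import java.time.LocalDate;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical comparison of two candidate partitioning columns.
final class PartitionColumnComparison {

    record AuditRow(int fiscalYear, LocalDate revTimestamp, LocalDate endRevTimestamp) {}

    // Fiscal year, revision timestamp and end revision timestamp from the audit table.
    static final List<AuditRow> ROWS = List.of(
            new AuditRow(2006, LocalDate.of(2007, 4, 1), null),
            new AuditRow(2007, LocalDate.of(2008, 4, 1), LocalDate.of(2008, 4, 2)),
            new AuditRow(2007, LocalDate.of(2008, 4, 2), null),
            new AuditRow(2008, LocalDate.of(2009, 4, 1), LocalDate.of(2009, 7, 1)),
            new AuditRow(2008, LocalDate.of(2009, 7, 1), LocalDate.of(2010, 2, 1)),
            new AuditRow(2008, LocalDate.of(2010, 2, 1), null),
            new AuditRow(2009, LocalDate.of(2010, 4, 1), null));

    // Candidate 1: partition key is the year of the revision timestamp.
    static String byRevisionYear(AuditRow row) {
        return String.valueOf(row.revTimestamp().getYear());
    }

    // Candidate 2: partition key is the year of the end revision timestamp ("null" if open).
    static String byEndRevisionYear(AuditRow row) {
        LocalDate end = row.endRevTimestamp();
        return end == null ? "null" : String.valueOf(end.getYear());
    }

    // Partition keys that would end up holding the current (most interesting) rows.
    static Set<String> keysHoldingCurrentRows(Function<AuditRow, String> partitionKey) {
        Set<String> keys = new TreeSet<>();
        for (AuditRow row : ROWS) {
            if (row.endRevTimestamp() == null) {
                keys.add(partitionKey.apply(row));
            }
        }
        return keys;
    }
}
```
]]></programlisting>
<para>
Partitioning by revision timestamp spreads the current rows over the 2007, 2008 and 2010 partitions, so the oldest partition would still contain the current 2006 salary. Partitioning by end revision timestamp keeps all current rows under a single key.
</para>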

</section>

<section id="envers-partitioning-example-scheme">

<title>Determining a suitable partitioning scheme</title>
<para>
A possible partitioning scheme for the salaries audit table would be as follows:
<orderedlist>
<listitem>
<para>
<emphasis>end revision timestamp</emphasis> year = 2008
</para>
<para>
This partition contains audit data that is not very (or no longer) interesting.
</para>
</listitem>
<listitem>
<para>
<emphasis>end revision timestamp</emphasis> year = 2009
</para>
<para>
This partition contains audit data that is potentially interesting.
</para>
</listitem>
<listitem>
<para>
<emphasis>end revision timestamp</emphasis> year >= 2010 or null
</para>
<para>
This partition contains the most interesting audit data.
</para>
</listitem>
</orderedlist>
</para>
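<para>
How partitions are declared differs per RDBMS, so no DDL is shown here. As a hedged sketch of the routing rule alone, the following Java assigns an audit row to one of three hypothetical partitions (<literal>P_2008</literal>, <literal>P_2009</literal>, <literal>P_CURRENT</literal>) based only on its end revision timestamp. It also counts how the seven rows of the salaries audit table would be distributed.
</para>
<programlisting><![CDATA[
```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical routing for the three-partition scheme above.
final class SalaryAuditPartitioning {

    // Chooses a partition using only the end revision timestamp.
    static String partitionFor(LocalDate endRevTimestamp) {
        if (endRevTimestamp == null || endRevTimestamp.getYear() >= 2010) {
            return "P_CURRENT"; // the 'extension bucket'
        }
        int year = endRevTimestamp.getYear();
        if (year == 2009) return "P_2009";
        if (year == 2008) return "P_2008";
        // The example scheme defines no partition for earlier years.
        throw new IllegalArgumentException("no partition defined for end revision year " + year);
    }

    // End revision timestamps of the seven audit rows, in table order.
    static final List<LocalDate> END_REV_TIMESTAMPS = Arrays.asList(
            null, LocalDate.of(2008, 4, 2), null,
            LocalDate.of(2009, 7, 1), LocalDate.of(2010, 2, 1), null, null);

    // Number of rows that each partition would receive.
    static Map<String, Integer> rowsPerPartition(List<LocalDate> endRevTimestamps) {
        Map<String, Integer> counts = new TreeMap<>();
        for (LocalDate end : endRevTimestamps) {
            counts.merge(partitionFor(end), 1, Integer::sum);
        }
        return counts;
    }
}
```
]]></programlisting>
<para>
Applied to the example rows, <literal>P_2008</literal> and <literal>P_2009</literal> each receive one superseded row. The remaining five rows land in <literal>P_CURRENT</literal>: the four current rows plus the 2008 revision that was superseded in February 2010.
</para>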

<para>
This partitioning scheme also handles the potential problem of updating the
<emphasis>end revision timestamp</emphasis>, which happens whenever a row in the audited table is modified.
Even though Envers will update the <emphasis>end revision timestamp</emphasis> of the audit row to
the system date at the instant of modification, the audit row will remain in the same partition
(the 'extension bucket').
</para>
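<para>
The following is a simplified model of this behavior. It assumes, as described above, that a modification sets the end revision timestamp of the current audit row to the modification date and appends a new row with a null end revision timestamp. It is not Envers' implementation. The class names and partition names are hypothetical, as is the 3400 salary used in the scenario below.
</para>
<programlisting><![CDATA[
```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical model of an end revision timestamp update.
final class EndRevisionUpdate {

    // Mutable audit row, so that the end revision timestamp can be set later.
    static final class AuditRow {
        final int fiscalYear;
        final LocalDate revTimestamp;
        final int salary;
        LocalDate endRevTimestamp; // null while this row is the current one

        AuditRow(int fiscalYear, LocalDate revTimestamp, int salary) {
            this.fiscalYear = fiscalYear;
            this.revTimestamp = revTimestamp;
            this.salary = salary;
        }
    }

    // Routing rule of the scheme above (hypothetical partition names);
    // older years are simply mapped to one partition per year here.
    static String partitionFor(LocalDate endRevTimestamp) {
        if (endRevTimestamp == null || endRevTimestamp.getYear() >= 2010) {
            return "P_CURRENT";
        }
        return "P_" + endRevTimestamp.getYear();
    }

    // Simplified modification: close the current row for the fiscal year
    // at 'now', then append a new current row with the new salary.
    static AuditRow modify(List<AuditRow> rows, int fiscalYear, int newSalary, LocalDate now) {
        for (AuditRow row : rows) {
            if (row.fiscalYear == fiscalYear && row.endRevTimestamp == null) {
                row.endRevTimestamp = now;
            }
        }
        AuditRow current = new AuditRow(fiscalYear, now, newSalary);
        rows.add(current);
        return current;
    }
}
```
]]></programlisting>
<para>
Take the 2011 scenario mentioned earlier: modifying the fiscal year 2006 salary on 2011-03-01 closes the 2006 audit row with end revision timestamp 2011-03-01. Both that row and the newly added one are still routed to <literal>P_CURRENT</literal>, so no row has to move between partitions.
</para>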

<para>
Sometime in 2011, the last partition (or 'extension bucket') is split into two new partitions:
<orderedlist>
<listitem>
<para>
<emphasis>end revision timestamp</emphasis> year = 2010
</para>
<para>
This partition contains audit data that is potentially interesting (in 2011).
</para>
</listitem>
<listitem>
<para>
<emphasis>end revision timestamp</emphasis> year >= 2011 or null
</para>
<para>
This partition contains the most interesting audit data and is the new 'extension bucket'.
</para>
</listitem>
</orderedlist>
</para>
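<para>
A split only moves rows whose partition differs between the old and the new layout. The sketch below assumes one partition per calendar year below the extension bucket, as in the example, and uses hypothetical partition names. It computes which end revision timestamps leave the extension bucket when the bucket's lower bound moves from 2010 to 2011.
</para>
<programlisting><![CDATA[
```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of splitting the extension bucket.
final class ExtensionBucketSplit {

    // Assumes one partition per calendar year below the extension bucket.
    static String partitionFor(LocalDate endRevTimestamp, int extensionFromYear) {
        if (endRevTimestamp == null || endRevTimestamp.getYear() >= extensionFromYear) {
            return "P_CURRENT";
        }
        return "P_" + endRevTimestamp.getYear();
    }

    // End revision timestamps whose partition changes when the bucket's
    // lower bound moves from oldFromYear to newFromYear.
    static List<LocalDate> rowsMovedBySplit(List<LocalDate> endRevTimestamps,
                                            int oldFromYear, int newFromYear) {
        List<LocalDate> moved = new ArrayList<>();
        for (LocalDate end : endRevTimestamps) {
            if (!partitionFor(end, oldFromYear).equals(partitionFor(end, newFromYear))) {
                moved.add(end);
            }
        }
        return moved;
    }
}
```
]]></programlisting>
<para>
Only rows with an end revision timestamp in 2010 move (to <literal>P_2010</literal>). Rows with a null end revision timestamp, or one in 2011, stay in <literal>P_CURRENT</literal>, so the rows that Envers may still update are not touched by the split.
</para>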

</section>

</section>
</section>

</chapter>