HHH-6036: Moving the "partitioning" chapter of Envers docs

2025-03-04 08:49:22 +00:00 · 2011-03-31 13:53:47 +02:00 · 2011-03-31 13:53:47 +02:00 · 6c7ec667a0
commit 6c7ec667a0
parent 61bd52fb34
1 changed files with 328 additions and 0 deletions
--- a/documentation/src/main/docbook/devguide/en-US/Envers.xml
+++ b/documentation/src/main/docbook/devguide/en-US/Envers.xml
@@ -540,4 +540,332 @@ public class ExampleListener implements RevisionListener {
        </section>
    </section>

+    <section id="envers-partitioning">
+        <title>Advanced: Audit table partitioning</title>
+
+        <section id="envers-partitioning-benefits">
+
+            <title>Benefits of audit table partitioning</title>
+
+            <para>
+                Because audit tables tend to grow indefinitely, they can quickly become very large. Once the audit tables have grown
+                past a certain size (which varies per RDBMS and/or operating system), it makes sense to start using table partitioning.
+                SQL table partitioning offers many advantages, including, but certainly not limited to:
+                <orderedlist>
+                    <listitem>
+                        <para>
+                            Improved query performance by selectively moving rows to various partitions (or even purging old rows)
+                        </para>
+                    </listitem>
+                    <listitem>
+                        <para>
+                            Faster data loads, index creation, etc.
+                        </para>
+                    </listitem>
+                </orderedlist>
+            </para>
+
+        </section>
+
+        <section id="envers-partitioning-columns">
+
+            <title>Suitable columns for audit table partitioning</title>
+            <para>
+                Generally, SQL tables must be partitioned on a column that exists within the table. As a rule, it makes sense to use
+                either the <emphasis>end revision</emphasis> or the <emphasis>end revision timestamp</emphasis> column for
+                partitioning audit tables.
+                <note>
+                    <para>
+                        End revision information is not available for the default AuditStrategy.
+                    </para>
+
+                    <para>
+                        Therefore the following Envers configuration options are required:
+                    </para>
+                    <para>
+                        <literal>org.hibernate.envers.audit_strategy</literal> =
+                        <literal>org.hibernate.envers.strategy.ValidityAuditStrategy</literal>
+                    </para>
+                    <para>
+                        <literal>org.hibernate.envers.audit_strategy_validity_store_revend_timestamp</literal> =
+                        <literal>true</literal>
+                    </para>
+
+                    <para>
+                        Optionally, you can also override the default values of the following properties:
+                    </para>
+                    <para>
+                        <literal>org.hibernate.envers.audit_strategy_validity_end_rev_field_name</literal>
+                    </para>
+                    <para>
+                        <literal>org.hibernate.envers.audit_strategy_validity_revend_timestamp_field_name</literal>
+                    </para>
+
+                    <para>
+                        For more information, see <xref linkend="configuration"/>.
+                    </para>
+                </note>
+            </para>
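For illustration, the settings listed in the note above can be collected in a `java.util.Properties` object, which could then be handed to Hibernate (for example via `Configuration.addProperties(...)`) or written to `hibernate.properties`. This is a minimal sketch with no Hibernate dependency; the class name `EnversPartitioningSettings` and the overridden column names `REV_END` and `REV_END_TSTMP` are hypothetical examples, not Envers defaults.

```java
import java.util.Properties;

/**
 * Sketch: the Envers settings needed to partition audit tables on the
 * end revision timestamp. Class name and custom column names are hypothetical.
 */
public class EnversPartitioningSettings {

    public static Properties partitioningProperties() {
        Properties props = new Properties();
        // Required: use the validity strategy so that audit rows carry end revision information.
        props.setProperty("org.hibernate.envers.audit_strategy",
                "org.hibernate.envers.strategy.ValidityAuditStrategy");
        // Required: also store the end revision timestamp (the partitioning column).
        props.setProperty("org.hibernate.envers.audit_strategy_validity_store_revend_timestamp",
                "true");
        // Optional overrides; "REV_END" and "REV_END_TSTMP" are hypothetical column names.
        props.setProperty("org.hibernate.envers.audit_strategy_validity_end_rev_field_name",
                "REV_END");
        props.setProperty("org.hibernate.envers.audit_strategy_validity_revend_timestamp_field_name",
                "REV_END_TSTMP");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings in key=value form, e.g. for hibernate.properties.
        partitioningProperties().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Only the first two properties are needed for partitioning; the two overrides can be dropped if the default column names suit the schema.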
+
+            <para>
+                The reason why the end revision information should be used for audit table partitioning is the assumption that
+                audit tables should be partitioned on an &apos;increasing level of interestingness&apos;, like so:
+            </para>
+
+            <para>
+                <orderedlist>
+                    <listitem>
+                        <para>
+                            A couple of partitions with audit data that is not very (or no longer) interesting.
+                            This can be stored on slow media, and perhaps even be purged eventually.
+                        </para>
+                    </listitem>
+                    <listitem>
+                        <para>
+                            Some partitions for audit data that is potentially interesting.
+                        </para>
+                    </listitem>
+                    <listitem>
+                        <para>
+                            One partition for audit data that is most likely to be interesting.
+                            This should be stored on the fastest media, both for reading and writing.
+                        </para>
+                    </listitem>
+                </orderedlist>
+            </para>
+
+
+        </section>
+
+        <section id="envers-partitioning-example">
+
+            <title>Audit table partitioning example</title>
+            <para>
+                In order to determine a suitable column for the &apos;increasing level of interestingness&apos;,
+                consider a simplified example of a salary registration for an unnamed agency.
+            </para>
+
+            <para>
+                Currently, the salary table contains the following rows for a certain person X:
+
+                <table frame="topbot">
+                    <title>Salaries table</title>
+                    <tgroup cols="2">
+                        <colspec colname="c1" colwidth="1*"/>
+                        <colspec colname="c2" colwidth="1*"/>
+                        <thead>
+                            <row>
+                                <entry>Year</entry>
+                                <entry>Salary (USD)</entry>
+                            </row>
+                        </thead>
+                        <tbody>
+                            <row>
+                                <entry>2006</entry>
+                                <entry>3300</entry>
+                            </row>
+                            <row>
+                                <entry>2007</entry>
+                                <entry>3500</entry>
+                            </row>
+                            <row>
+                                <entry>2008</entry>
+                                <entry>4000</entry>
+                            </row>
+                            <row>
+                                <entry>2009</entry>
+                                <entry>4500</entry>
+                            </row>
+                        </tbody>
+                    </tgroup>
+                </table>
+            </para>
+
+            <para>
+                The salary for the current fiscal year (2010) is not yet known. The agency requires that all changes in registered
+                salaries for a fiscal year are recorded (i.e. an audit trail). The rationale behind this is that decisions
+                made at a certain date are based on the salary registered at that time, and it must always be possible to
+                reproduce the reason why a certain decision was made at a certain date.
+            </para>
+
+            <para>
+                The following audit information is available, sorted in order of occurrence:
+
+                <table frame="topbot">
+                    <title>Salaries - audit table</title>
+                    <tgroup cols="5">
+                        <colspec colname="c1" colwidth="1*"/>
+                        <colspec colname="c2" colwidth="1*"/>
+                        <colspec colname="c3" colwidth="1*"/>
+                        <colspec colname="c4" colwidth="1*"/>
+                        <colspec colname="c5" colwidth="1*"/>
+                        <thead>
+                            <row>
+                                <entry>Year</entry>
+                                <entry>Revision type</entry>
+                                <entry>Revision timestamp</entry>
+                                <entry>Salary (USD)</entry>
+                                <entry>End revision timestamp</entry>
+                            </row>
+                        </thead>
+                        <tbody>
+                            <row>
+                                <entry>2006</entry>
+                                <entry>ADD</entry>
+                                <entry>2007-04-01</entry>
+                                <entry>3300</entry>
+                                <entry>null</entry>
+                            </row>
+                            <row>
+                                <entry>2007</entry>
+                                <entry>ADD</entry>
+                                <entry>2008-04-01</entry>
+                                <entry>35</entry>
+                                <entry>2008-04-02</entry>
+                            </row>
+                            <row>
+                                <entry>2007</entry>
+                                <entry>MOD</entry>
+                                <entry>2008-04-02</entry>
+                                <entry>3500</entry>
+                                <entry>null</entry>
+                            </row>
+                            <row>
+                                <entry>2008</entry>
+                                <entry>ADD</entry>
+                                <entry>2009-04-01</entry>
+                                <entry>3700</entry>
+                                <entry>2009-07-01</entry>
+                            </row>
+                            <row>
+                                <entry>2008</entry>
+                                <entry>MOD</entry>
+                                <entry>2009-07-01</entry>
+                                <entry>4100</entry>
+                                <entry>2010-02-01</entry>
+                            </row>
+                            <row>
+                                <entry>2008</entry>
+                                <entry>MOD</entry>
+                                <entry>2010-02-01</entry>
+                                <entry>4000</entry>
+                                <entry>null</entry>
+                            </row>
+                            <row>
+                                <entry>2009</entry>
+                                <entry>ADD</entry>
+                                <entry>2010-04-01</entry>
+                                <entry>4500</entry>
+                                <entry>null</entry>
+                            </row>
+                        </tbody>
+                    </tgroup>
+                </table>
+            </para>
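The end revision timestamps in this table are not entered by hand: with the ValidityAuditStrategy, recording a new revision of an entity also updates the previous audit row of that same entity. The Java sketch below replays the example revisions in order of occurrence and reproduces the last column of the table. It is an in-memory model under simplifying assumptions (one audited entity per fiscal year, dates instead of full timestamps); `SalaryAuditTrail`, `AuditRow` and `record` are hypothetical names, not Envers code.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of an audit table maintained with end revision timestamps. */
public class SalaryAuditTrail {

    /** One audit row; a null endRevTimestamp marks the current revision. */
    public static final class AuditRow {
        public final int year;               // audited entity: the fiscal year
        public final String revType;         // "ADD" or "MOD"
        public final LocalDate revTimestamp;
        public final int salary;
        public LocalDate endRevTimestamp;    // set when a later revision of the same year arrives

        AuditRow(int year, String revType, LocalDate revTimestamp, int salary) {
            this.year = year;
            this.revType = revType;
            this.revTimestamp = revTimestamp;
            this.salary = salary;
        }
    }

    private final List<AuditRow> rows = new ArrayList<>();
    private final Map<Integer, AuditRow> latestByYear = new HashMap<>();

    /** Records a change to the salary of one fiscal year at the given date. */
    public void record(int year, LocalDate when, int salary) {
        AuditRow previous = latestByYear.get(year);
        if (previous != null) {
            // Close the previous row: the extra update of the most recent audit row.
            previous.endRevTimestamp = when;
        }
        AuditRow row = new AuditRow(year, previous == null ? "ADD" : "MOD", when, salary);
        rows.add(row);
        latestByYear.put(year, row);
    }

    public List<AuditRow> rows() {
        return rows;
    }

    /** Replays the revisions of the example in order of occurrence. */
    public static SalaryAuditTrail example() {
        SalaryAuditTrail trail = new SalaryAuditTrail();
        trail.record(2006, LocalDate.parse("2007-04-01"), 3300);
        trail.record(2007, LocalDate.parse("2008-04-01"), 35);
        trail.record(2007, LocalDate.parse("2008-04-02"), 3500);
        trail.record(2008, LocalDate.parse("2009-04-01"), 3700);
        trail.record(2008, LocalDate.parse("2009-07-01"), 4100);
        trail.record(2008, LocalDate.parse("2010-02-01"), 4000);
        trail.record(2009, LocalDate.parse("2010-04-01"), 4500);
        return trail;
    }

    public static void main(String[] args) {
        for (AuditRow r : example().rows()) {
            System.out.println(r.year + " " + r.revType + " " + r.revTimestamp + " "
                    + r.salary + " " + r.endRevTimestamp);
        }
    }
}
```

Running `main` prints one line per audit row, matching the table above, with `null` for the rows that are still current.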
+
+            <section id="envers-partitioning-example-column">
+
+                <title>Determining a suitable partitioning column</title>
+                <para>
+                    To partition this data, the &apos;level of interestingness&apos; must be defined.
+                    Consider the following:
+                    <orderedlist>
+                        <listitem>
+                            <para>
+                                For fiscal year 2006 there is only one revision. It has the oldest <emphasis>revision timestamp</emphasis>
+                                of all audit rows, but should still be regarded as interesting because it is the latest modification
+                                for this fiscal year in the salary table; its <emphasis>end revision timestamp</emphasis> is null.
+                            </para>
+                            <para>
+                                Also note that it would be very unfortunate if the salary for fiscal year 2006 were updated in 2011
+                                (which is possible until at least 10 years after the fiscal year) after its audit
+                                information had been moved to a slow disk (based on the age of the
+                                <emphasis>revision timestamp</emphasis>). Remember that in this case Envers has to update
+                                the <emphasis>end revision timestamp</emphasis> of the most recent audit row.
+                            </para>
+                        </listitem>
+                        <listitem>
+                            <para>
+                                There are two revisions for the salary of fiscal year 2007, which have nearly the same
+                                <emphasis>revision timestamp</emphasis> but different <emphasis>end revision timestamps</emphasis>.
+                                At first sight, it is evident that the first revision (a salary of 35, corrected to 3500 one day later)
+                                was a mistake and is probably uninteresting. The only interesting revision for 2007 is the one whose
+                                <emphasis>end revision timestamp</emphasis> is null.
+                            </para>
+                        </listitem>
+                    </orderedlist>
+
+                    Based on the above, it is evident that only the <emphasis>end revision timestamp</emphasis> is suitable for
+                    audit table partitioning; the <emphasis>revision timestamp</emphasis> is not.
+                </para>
+
+            </section>
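The argument above can be checked against the example data. The Java sketch below applies the same age-based rule ("move rows older than a cutoff to slow media") once to the revision timestamp and once to the end revision timestamp, treating a null end revision timestamp as "still current, never old". The cutoff date 2009-01-01 and the class name `PartitionColumnComparison` are hypothetical choices for illustration.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical comparison of the two candidate partitioning columns on the example data. */
public class PartitionColumnComparison {

    public static final class Row {
        public final int year;
        public final LocalDate revTimestamp;
        public final LocalDate endRevTimestamp; // null = current revision
        public final int salary;

        public Row(int year, String revTimestamp, String endRevTimestamp, int salary) {
            this.year = year;
            this.revTimestamp = LocalDate.parse(revTimestamp);
            this.endRevTimestamp = endRevTimestamp == null ? null : LocalDate.parse(endRevTimestamp);
            this.salary = salary;
        }
    }

    // Rows of the "Salaries - audit table" above.
    public static final List<Row> ROWS = List.of(
            new Row(2006, "2007-04-01", null, 3300),
            new Row(2007, "2008-04-01", "2008-04-02", 35),
            new Row(2007, "2008-04-02", null, 3500),
            new Row(2008, "2009-04-01", "2009-07-01", 3700),
            new Row(2008, "2009-07-01", "2010-02-01", 4100),
            new Row(2008, "2010-02-01", null, 4000),
            new Row(2009, "2010-04-01", null, 4500));

    /** Rows an age-based rule would treat as old when keyed on the revision timestamp. */
    public static List<Row> oldByRevisionTimestamp(LocalDate cutoff) {
        List<Row> old = new ArrayList<>();
        for (Row r : ROWS) {
            if (r.revTimestamp.isBefore(cutoff)) {
                old.add(r);
            }
        }
        return old;
    }

    /** Same rule keyed on the end revision timestamp; current rows (null) are never old. */
    public static List<Row> oldByEndRevisionTimestamp(LocalDate cutoff) {
        List<Row> old = new ArrayList<>();
        for (Row r : ROWS) {
            if (r.endRevTimestamp != null && r.endRevTimestamp.isBefore(cutoff)) {
                old.add(r);
            }
        }
        return old;
    }

    public static void main(String[] args) {
        LocalDate cutoff = LocalDate.parse("2009-01-01"); // hypothetical cutoff
        System.out.println("By revision timestamp: " + oldByRevisionTimestamp(cutoff).size() + " rows");
        System.out.println("By end revision timestamp: " + oldByEndRevisionTimestamp(cutoff).size() + " rows");
    }
}
```

Keyed on the revision timestamp, the rule selects three rows, including the still-current rows for 2006 and 2007; keyed on the end revision timestamp, it selects only the mistaken 2007 entry of 35.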
+
+            <section id="envers-partitioning-example-scheme">
+
+                <title>Determining a suitable partitioning scheme</title>
+                <para>
+                    A possible partitioning scheme for the salary table would be as follows:
+                    <orderedlist>
+                        <listitem>
+                            <para>
+                                <emphasis>end revision timestamp</emphasis> year = 2008
+                            </para>
+                            <para>
+                                This partition contains audit data that is not very (or no longer) interesting.
+                            </para>
+                        </listitem>
+                        <listitem>
+                            <para>
+                                <emphasis>end revision timestamp</emphasis> year = 2009
+                            </para>
+                            <para>
+                                This partition contains audit data that is potentially interesting.
+                            </para>
+                        </listitem>
+                        <listitem>
+                            <para>
+                                <emphasis>end revision timestamp</emphasis> year >= 2010 or null
+                            </para>
+                            <para>
+                                This partition contains the most interesting audit data.
+                            </para>
+                        </listitem>
+                    </orderedlist>
+                </para>
+
+                <para>
+                    This partitioning scheme also addresses the potential problem of updating the
+                    <emphasis>end revision timestamp</emphasis>, which happens whenever a row in the audited table is modified.
+                    Even though Envers updates the <emphasis>end revision timestamp</emphasis> of the audit row to
+                    the system date at the instant of modification, the audit row remains in the same partition
+                    (the &apos;extension bucket&apos;).
+                </para>
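The routing rule of this scheme, and the claim that an update of the end revision timestamp keeps a row in the extension bucket, can be sketched in a few lines of Java. In practice the database evaluates the range partitioning itself; this sketch only mirrors the rule. The partition names and the modification date 2010-11-30 are hypothetical.

```java
import java.time.LocalDate;

/** Hypothetical routing rule for the example scheme, keyed only on the end revision timestamp. */
public class SalaryAuditPartitioning {

    public static final String P_2008 = "SALARY_AUD_2008"; // not very (or no longer) interesting
    public static final String P_2009 = "SALARY_AUD_2009"; // potentially interesting
    public static final String P_EXT = "SALARY_AUD_EXT";   // most interesting: the extension bucket

    public static String partitionFor(LocalDate endRevTimestamp) {
        // Current rows (null) and rows ended in 2010 or later go to the extension bucket.
        if (endRevTimestamp == null || endRevTimestamp.getYear() >= 2010) {
            return P_EXT;
        }
        switch (endRevTimestamp.getYear()) {
            case 2009:
                return P_2009;
            case 2008:
                return P_2008;
            default:
                // The example scheme defines no partition for earlier years.
                throw new IllegalArgumentException(
                        "No partition for end revision year " + endRevTimestamp.getYear());
        }
    }

    public static void main(String[] args) {
        // A current row (end revision timestamp null) lives in the extension bucket ...
        String before = partitionFor(null);
        // ... and stays there when a modification in 2010 sets its end revision timestamp.
        String after = partitionFor(LocalDate.parse("2010-11-30"));
        System.out.println(before + " -> " + after); // SALARY_AUD_EXT -> SALARY_AUD_EXT
    }
}
```

Because the system date during 2010 always falls in the extension bucket's range, closing a current row never moves it to another partition.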
+
+                <para>
+                    Then, sometime in 2011, the last partition (the &apos;extension bucket&apos;) is split into two new partitions:
+                    <orderedlist>
+                        <listitem>
+                            <para>
+                                <emphasis>end revision timestamp</emphasis> year = 2010
+                            </para>
+                            <para>
+                                This partition contains audit data that is potentially interesting (in 2011).
+                            </para>
+                        </listitem>
+                        <listitem>
+                            <para>
+                                <emphasis>end revision timestamp</emphasis> year >= 2011 or null
+                            </para>
+                            <para>
+                                This partition contains the most interesting audit data and is the new &apos;extension bucket&apos;.
+                            </para>
+                        </listitem>
+                    </orderedlist>
+                </para>
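The yearly split can be modelled as moving the lower bound of the extension bucket up by one year, turning its oldest year into a partition of its own. In a real database this is a DDL operation (for example Oracle's `ALTER TABLE ... SPLIT PARTITION`); the Java sketch below only models the bookkeeping, and the class name and the partition names `Y2008`, `EXT` etc. are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical yearly range scheme on the end revision timestamp with an extension bucket. */
public class YearlyPartitionScheme {

    private final List<Integer> closedYears = new ArrayList<>(); // ascending, one partition per year
    private int extensionFromYear;                                // extension bucket: >= this year, or null

    public YearlyPartitionScheme(int firstYear, int extensionFromYear) {
        if (extensionFromYear <= firstYear) {
            throw new IllegalArgumentException("extensionFromYear must be after firstYear");
        }
        for (int y = firstYear; y < extensionFromYear; y++) {
            closedYears.add(y);
        }
        this.extensionFromYear = extensionFromYear;
    }

    /** Partition name for a row with the given end revision timestamp. */
    public String partitionFor(LocalDate endRevTimestamp) {
        if (endRevTimestamp == null || endRevTimestamp.getYear() >= extensionFromYear) {
            return "EXT";
        }
        int year = endRevTimestamp.getYear();
        if (!closedYears.contains(year)) {
            throw new IllegalArgumentException("No partition for end revision year " + year);
        }
        return "Y" + year;
    }

    /** Splits the extension bucket: its oldest year becomes a partition of its own. */
    public void splitExtensionBucket() {
        closedYears.add(extensionFromYear);
        extensionFromYear++;
    }

    public List<String> partitionNames() {
        List<String> names = new ArrayList<>();
        for (int y : closedYears) {
            names.add("Y" + y);
        }
        names.add("EXT");
        return names;
    }

    public static void main(String[] args) {
        YearlyPartitionScheme scheme = new YearlyPartitionScheme(2008, 2010);
        System.out.println(scheme.partitionNames()); // [Y2008, Y2009, EXT]
        scheme.splitExtensionBucket();               // sometime in 2011
        System.out.println(scheme.partitionNames()); // [Y2008, Y2009, Y2010, EXT]
    }
}
```

After the split, rows whose end revision timestamp falls in 2010 (such as the 2008 row ended on 2010-02-01) move to the new 2010 partition, while current rows with a null end revision timestamp stay in the new extension bucket.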
+
+            </section>
+
+        </section>
+    </section>
+
 </chapter>