SOLR-11635: CDCR Source configuration example in the ref guide leaves out important settings

Erick Erickson 2017-11-10 09:50:24 -08:00
parent b5571031ca
commit 6e3d082395
1 changed files with 29 additions and 30 deletions

@ -167,10 +167,21 @@ The Source and Target configurations differ in the case of the data centers bein
=== Source Configuration === Source Configuration
Here is a sample of a Source configuration file, a section in `solrconfig.xml`. The presence of the <replica> section causes CDCR to use this cluster as the Source and should not be present in the Target collections. Details about each setting are after the two examples: Here is a sample of a Source configuration file, a section in `solrconfig.xml`. The presence of the <replica> section causes CDCR to use this cluster as the Source and should not be present in the Target collections. Details about each setting are after the two examples. The Source example has buffering disabled; the default is enabled:
[source,xml] [source,xml]
---- ----
<updateRequestProcessorChain name="cdcr-processor-chain">
<processor class="solr.CdcrUpdateProcessorFactory"/>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
<requestHandler name="/update" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="update.chain">cdcr-processor-chain</str>
</lst>
</requestHandler>
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler"> <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
<lst name="replica"> <lst name="replica">
<str name="zkHost">10.240.18.211:2181,10.240.18.212:2181</str> <str name="zkHost">10.240.18.211:2181,10.240.18.212:2181</str>
@ -191,6 +202,12 @@ Here is a sample of a Source configuration file, a section in `solrconfig.xml`.
<lst name="updateLogSynchronizer"> <lst name="updateLogSynchronizer">
<str name="schedule">1000</str> <str name="schedule">1000</str>
</lst> </lst>
<!-- optional -->
<lst name="buffer">
<str name="defaultState">DISABLED</str>
</lst>
</requestHandler> </requestHandler>
<!-- Modify the <updateLog> section of your existing <updateHandler> <!-- Modify the <updateLog> section of your existing <updateHandler>
@ -200,6 +217,8 @@ Here is a sample of a Source configuration file, a section in `solrconfig.xml`.
<str name="dir">${solr.ulog.dir:}</str> <str name="dir">${solr.ulog.dir:}</str>
<!--Any parameters from the original <updateLog> section --> <!--Any parameters from the original <updateLog> section -->
</updateLog> </updateLog>
<!-- Other configuration options such as autoCommit should still be present -->
</updateHandler> </updateHandler>
---- ----
@ -212,6 +231,7 @@ Target instance must configure an update processor chain that is specific to CDC
[source,xml] [source,xml]
---- ----
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler"> <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
<!-- recommended for Target clusters -->
<lst name="buffer"> <lst name="buffer">
<str name="defaultState">disabled</str> <str name="defaultState">disabled</str>
</lst> </lst>
@ -235,6 +255,9 @@ Target instance must configure an update processor chain that is specific to CDC
<str name="dir">${solr.ulog.dir:}</str> <str name="dir">${solr.ulog.dir:}</str>
<!--Any parameters from the original <updateLog> section --> <!--Any parameters from the original <updateLog> section -->
</updateLog> </updateLog>
<!-- Other configuration options such as autoCommit should still be present -->
</updateHandler> </updateHandler>
---- ----
@ -274,12 +297,14 @@ The number of updates to send in one batch. The optimal size depends on the size
Expert: Non-leader nodes need to synchronize their update logs with their leader node from time to time in order to clean deprecated transaction log files. By default, such a synchronization process is performed every minute. The schedule of the synchronization can be modified with an “updateLogSynchronizer” list as follows: Expert: Non-leader nodes need to synchronize their update logs with their leader node from time to time in order to clean deprecated transaction log files. By default, such a synchronization process is performed every minute. The schedule of the synchronization can be modified with an “updateLogSynchronizer” list as follows:
TIP: If the updateLogSynchronizer element is omitted from the Source cluster, transaction logs may accumulate on non-leaders.
`schedule`:: `schedule`::
The delay in milliseconds for synchronizing the update logs. The default is `60000`. The delay in milliseconds for synchronizing the update logs. The default is `60000`.
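As a minimal sketch, the list below reuses the `1000` millisecond value from the Source sample above, which makes synchronization far more frequent than the `60000` default. It is an illustration rather than a recommended setting, and it belongs inside the `/cdcr` request handler:

[source,xml]
----
<!-- Illustrative value copied from the Source sample; tune for your cluster -->
<lst name="updateLogSynchronizer">
  <str name="schedule">1000</str>
</lst>
----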
==== The Buffer Element ==== The Buffer Element
When buffering updates, the update logs will store all the updates indefinitely. It is recommended to disable buffering on both the Source and Target clusters during normal operation as when buffering is enabled the Update Logs will grow without limit. Leaving buffering enabled is intended for special maintenance periods. The buffer can be disabled at startup with a “buffer” list and the parameter “defaultState” as follows: When buffering updates, the update logs will store all the updates indefinitely. It is best to disable buffering on both the Source and Target clusters during normal operation as when buffering is enabled the Update Logs will grow without limit. Enabling buffering is intended for special maintenance periods. Buffering can be disabled at startup with a “buffer” list and the parameter “defaultState” as follows:
`defaultState`:: `defaultState`::
The state of the buffer at startup. The default is `enabled`. The state of the buffer at startup. The default is `enabled`.
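A minimal sketch of the list described here, using the same `disabled` value as the Target sample above. It is placed inside the `/cdcr` request handler:

[source,xml]
----
<!-- Starts the cluster with buffering off instead of the default "enabled" -->
<lst name="buffer">
  <str name="defaultState">disabled</str>
</lst>
----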
@ -293,7 +318,7 @@ Buffering is designed to augment maintenance windows. The following points shoul
* During normal operation, the Update Logs will automatically accrue on the Source data center if the Target data center is unavailable; it is not necessary to enable buffering for CDCR to handle routine network disruptions. * During normal operation, the Update Logs will automatically accrue on the Source data center if the Target data center is unavailable; it is not necessary to enable buffering for CDCR to handle routine network disruptions.
** For this reason, monitoring disk usage on the Source data center is recommended as an additional check that the Target data center is receiving updates. ** For this reason, monitoring disk usage on the Source data center is recommended as an additional check that the Target data center is receiving updates.
* Buffering should _not_ be enabled on the Target data center as Update Logs would accrue without limit. * Buffering should _not_ be enabled on the Target data center as Update Logs would accrue without limit.
* If buffering is enabled then disabled, the Update Logs will be removed when their contents have been sent to the Target data center. This process may take some time. * If buffering is enabled then disabled, the Update Logs will be removed when their contents have been sent to the Target data center. This process may take some time and is triggered by additional updates to the Source cluster.
** Update Log cleanup is not triggered until a new update is sent to the Source data center. ** Update Log cleanup is not triggered until a new update is sent to the Source data center.
==== ====
@ -630,33 +655,7 @@ As usual, it is good to start small. Sync a single cloud and monitor for a perio
* Before starting, stop or pause the indexers. This is best done during a small maintenance window. * Before starting, stop or pause the indexers. This is best done during a small maintenance window.
* Stop the SolrCloud instances at the Source * Stop the SolrCloud instances at the Source
* Include the CDCR request handler configuration in `solrconfig.xml` as in the below example. * Upload the modified `solrconfig.xml` to ZooKeeper on both Source and Target as appropriate, see the examples above.
+
[source,xml]
----
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
<lst name="replica">
<str name="zkHost">${TargetZk}</str>
<str name="Source">${SourceCollection}</str>
<str name="Target">${TargetCollection}</str>
</lst>
<lst name="replicator">
<str name="threadPoolSize">8</str>
<str name="schedule">10</str>
<str name="batchSize">2000</str>
</lst>
<lst name="updateLogSynchronizer">
<str name="schedule">1000</str>
</lst>
</requestHandler>
<updateRequestProcessorChain name="cdcr-processor-chain">
<processor class="solr.CdcrUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
----
+
* Upload the modified `solrconfig.xml` to ZooKeeper on both Source and Target
* Sync the index directories from the Source collection to Target collection across to the corresponding shard nodes. `rsync` works well for this. * Sync the index directories from the Source collection to Target collection across to the corresponding shard nodes. `rsync` works well for this.
+ +
For example, if there are 2 shards on collection1 with 2 replicas for each shard, copy the corresponding index directories from For example, if there are 2 shards on collection1 with 2 replicas for each shard, copy the corresponding index directories from