SOLR-10883: Ref guide: Escape replacement substitutions; add .adoc file checks to the top-level validate target

2025-02-22 01:56:16 +00:00 · 2017-06-21 18:21:40 -04:00 · 2017-06-21 18:21:40 -04:00 · d3f9059d29
commit d3f9059d29
parent 5dcd6263cb
13 changed files with 82 additions and 43 deletions
--- a/build.xml
+++ b/build.xml
@@ -138,7 +138,7 @@
        'java', 'jflex', 'py', 'pl', 'g4', 'jj', 'html', 'js',
        'css', 'xml', 'xsl', 'vm', 'sh', 'cmd', 'bat', 'policy',
        'properties', 'mdtext',
-        'template',
+        'template', 'adoc',
      ];
      def invalidPatterns = [
        (~$/@author\b/$) : '@author javadoc tag',
@@ -170,10 +170,15 @@
      def javaCommentPattern = ~$/(?sm)^\Q/*\E(.*?)\Q*/\E/$;
      def xmlCommentPattern = ~$/(?sm)\Q<!--\E(.*?)\Q-->\E/$;
      def lineSplitter = ~$/[\r\n]+/$;
+      def singleLineSplitter = ~$/\n\r?/$;
      def licenseMatcher = Defaults.createDefaultMatcher();
      def validLoggerPattern = ~$/(?s)\b(private\s|static\s|final\s){3}+\s*Logger\s+\p{javaJavaIdentifierStart}+\s+=\s+\QLoggerFactory.getLogger(MethodHandles.lookup().lookupClass());\E/$;
      def packagePattern = ~$/(?m)^\s*package\s+org\.apache.*;/$;
      def xmlTagPattern = ~$/(?m)\s*<[a-zA-Z].*/$;
+      def sourceHeaderPattern = ~$/\[source\b.*/$;
+      def blockBoundaryPattern = ~$/----\s*/$;
+      def blockTitlePattern = ~$/\..*/$;
+      def unescapedSymbolPattern = ~$/(?<=[^\\]|^)([-=]>|<[-=])/$; // SOLR-10883
      
      def isLicense = { matcher, ratDocument ->
        licenseMatcher.reset();
@@ -197,6 +202,33 @@
          }
      }

+      def checkForUnescapedSymbolSubstitutions = { f, text ->
+        def inCodeBlock = false;
+        def underSourceHeader = false;
+        def lineNumber = 0;
+        singleLineSplitter.split(text).each {
+          ++lineNumber;
+          if (underSourceHeader) { // This line is either a single source line, or the boundary of a code block
+            inCodeBlock = blockBoundaryPattern.matcher(it).matches();
+            if ( ! blockTitlePattern.matcher(it).matches()) { // Keep underSourceHeader=true
+              underSourceHeader = false;
+            }
+          } else {
+            if (inCodeBlock) {
+              inCodeBlock = ! blockBoundaryPattern.matcher(it).matches();
+            } else {
+              underSourceHeader = sourceHeaderPattern.matcher(it).matches();
+              if ( ! underSourceHeader) {
+                def unescapedSymbolMatcher = unescapedSymbolPattern.matcher(it);
+                if (unescapedSymbolMatcher.find()) {
+                  reportViolation(f, 'Unescaped symbol "' + unescapedSymbolMatcher.group(1) + '" on line #' + lineNumber);
+                }
+              }
+            }
+          }
+        }
+      }
+
      ant.fileScanner{
        fileset(dir: baseDir){
          extensions.each{
@@ -244,6 +276,9 @@
        if (f.toString().endsWith('.xml') || f.toString().endsWith('.xml.template')) {
          checkLicenseHeaderPrecedes(f, '<tag>', xmlTagPattern, xmlCommentPattern, text, ratDocument);
        }
+        if (f.toString().endsWith('.adoc')) {
+          checkForUnescapedSymbolSubstitutions(f, text);
+        }
      };
      
      if (found) {
--- a/solr/CHANGES.txt
+++ b/solr/CHANGES.txt
@@ -472,6 +472,10 @@ Other Changes

 * SOLR-10834: Fixed tests and test configs to stop using numeric uniqueKey fields (hossman)

+* SOLR-10883: Ref guide: Escape replacement substitutions, e.g. => to right arrow, so that they are
+  rendered visibly in the PDF.  Also add .adoc file checks to the top-level validate target, including
+  for the invisible substitutions PDF problem.  (Steve Rowe)
+
 ==================  6.6.1 ==================

 Bug Fixes
--- a/solr/solr-ref-guide/src/charfilterfactories.adoc
+++ b/solr/solr-ref-guide/src/charfilterfactories.adoc
@@ -43,8 +43,8 @@ Example:
 Mapping file syntax:

 * Comment lines beginning with a hash mark (`#`), as well as blank lines, are ignored.
-* Each non-comment, non-blank line consists of a mapping of the form: `"source" => "target"`
-** Double-quoted source string, optional whitespace, an arrow (`=>`), optional whitespace, double-quoted target string.
+* Each non-comment, non-blank line consists of a mapping of the form: `"source" \=> "target"`
+** Double-quoted source string, optional whitespace, an arrow (`\=>`), optional whitespace, double-quoted target string.
 * Trailing comments on mapping lines are not allowed.
 * The source string must contain at least one character, but the target string may be empty.
 * The following character escape sequences are recognized within source and target strings:
@@ -54,14 +54,14 @@ Mapping file syntax:
 [cols="20,30,20,30",options="header"]
 |===
 |Escape Sequence |Resulting Character (http://www.ecma-international.org/publications/standards/Ecma-048.htm[ECMA-48] alias) |Unicode Character |Example Mapping Line
-|`\\` |`\` |U+005C |`"\\" => "/"`
-|`\"` |`"` |U+0022 |`"\"and\"" => "'and'"`
-|`\b` |backspace (BS) |U+0008 |`"\b" => " "`
-|`\t` |tab (HT) |U+0009 |`"\t" => ","`
-|`\n` |newline (LF) |U+000A |`"\n" => "<br>"`
-|`\f` |form feed (FF) |U+000C |`"\f" => "\n"`
-|`\r` |carriage return (CR) |U+000D |`"\r" => "/carriage-return/"`
-|`\uXXXX` |Unicode char referenced by the 4 hex digits |U+XXXX |`"\uFEFF" => ""`
+|`\\` |`\` |U+005C |`"\\" \=> "/"`
+|`\"` |`"` |U+0022 |`"\"and\"" \=> "'and'"`
+|`\b` |backspace (BS) |U+0008 |`"\b" \=> " "`
+|`\t` |tab (HT) |U+0009 |`"\t" \=> ","`
+|`\n` |newline (LF) |U+000A |`"\n" \=> "<br>"`
+|`\f` |form feed (FF) |U+000C |`"\f" \=> "\n"`
+|`\r` |carriage return (CR) |U+000D |`"\r" \=> "/carriage-return/"`
+|`\uXXXX` |Unicode char referenced by the 4 hex digits |U+XXXX |`"\uFEFF" \=> ""`
 |===
 ** A backslash followed by any other character is interpreted as if the character were present without the backslash.

@@ -96,8 +96,8 @@ The table below presents examples of HTML stripping.
 |===
 |Input |Output
 |`my <a href="www.foo.bar">link</a>` |my link
-|`<br>hello<!--comment-->` |hello
-|`hello<script><!-- f('<!--internal--></script>'); --></script>` |hello
+|`<br>hello<!--comment-\->` |hello
+|`hello<script><!-- f('<!--internal-\-></script>'); -\-></script>` |hello
 |`if a<b then print a;` |if a<b then print a;
 |`hello <td height=22 nowrap align="left">` |hello
 |`a<b &#65 Alpha&Omega` Ω |a<b A Alpha&Omega Ω
--- a/solr/solr-ref-guide/src/collections-api.adoc
+++ b/solr/solr-ref-guide/src/collections-api.adoc
@@ -1666,7 +1666,7 @@ Assigns leaders in a collection according to the preferredLeader property on act
 |===
 |Key |Type |Required |Description
 |collection |string |Yes |The name of the collection to rebalance preferredLeaders on.
-|maxAtOnce |string |No |The maximum number of reassignments to have queue up at once. Values <=0 are use the default value Integer.MAX_VALUE. When this number is reached, the process waits for one or more leaders to be successfully assigned before adding more to the queue.
+|maxAtOnce |string |No |The maximum number of reassignments to have queue up at once. Values \<=0 are use the default value Integer.MAX_VALUE. When this number is reached, the process waits for one or more leaders to be successfully assigned before adding more to the queue.
 |maxWaitSeconds |string |No |Defaults to 60. This is the timeout value when waiting for leaders to be reassigned. NOTE: if maxAtOnce is less than the number of reassignments that will take place, this is the maximum interval that any _single_ wait for at least one reassignment. For example, if 10 reassignments are to take place and maxAtOnce is 1 and maxWaitSeconds is 60, the upper bound on the time that the command may wait is 10 minutes.
 |===

--- a/solr/solr-ref-guide/src/command-line-utilities.adoc
+++ b/solr/solr-ref-guide/src/command-line-utilities.adoc
@@ -158,7 +158,7 @@ This can be useful to create a chroot path in ZooKeeper before first cluster sta
 [[CommandLineUtilities-Setaclusterproperty]]
 === Set a cluster property

-This command will add or modify a single cluster property in `clusterprops.json`. Use this command instead of the usual getfile -> edit -> putfile cycle.
+This command will add or modify a single cluster property in `clusterprops.json`. Use this command instead of the usual getfile \-> edit \-> putfile cycle.

 Unlike the CLUSTERPROP command on the <<collections-api.adoc#CollectionsAPI-clusterprop,Collections API>>, this command does *not* require a running Solr cluster.

--- a/solr/solr-ref-guide/src/faceting.adoc
+++ b/solr/solr-ref-guide/src/faceting.adoc
@@ -675,9 +675,9 @@ Intervals must begin with either '(' or '[', be followed by the start value, the

 For example:

-* (1,10) -> will include values greater than 1 and lower than 10
-* [1,10) -> will include values greater or equal to 1 and lower than 10
-* [1,10] -> will include values greater or equal to 1 and lower or equal to 10
+* (1,10) \-> will include values greater than 1 and lower than 10
+* [1,10) \-> will include values greater or equal to 1 and lower than 10
+* [1,10] \-> will include values greater or equal to 1 and lower or equal to 10

 The initial and end values cannot be empty.

--- a/solr/solr-ref-guide/src/filter-descriptions.adoc
+++ b/solr/solr-ref-guide/src/filter-descriptions.adoc
@@ -76,7 +76,7 @@ This filter converts alphabetic, numeric, and symbolic Unicode characters which

 *Arguments:*

-`preserveOriginal`:: (boolean, default false) If true, the original token is preserved: "thé" -> "the", "thé"
+`preserveOriginal`:: (boolean, default false) If true, the original token is preserved: "thé" \-> "the", "thé"

 *Example:*

@@ -1487,7 +1487,7 @@ There are two ways to specify synonym mappings:
 +
 * A comma-separated list of words. If the token matches any of the words, then all the words in the list are substituted, which will include the original token.
 +
-* Two comma-separated lists of words with the symbol "=>" between them. If the token matches any word on the left, then the list on the right is substituted. The original token will not be included unless it is also in the list on the right.
+* Two comma-separated lists of words with the symbol "\=>" between them. If the token matches any word on the left, then the list on the right is substituted. The original token will not be included unless it is also in the list on the right.

 `ignoreCase`:: (optional; default: `false`) If `true`, synonyms will be matched case-insensitively.

@@ -1671,41 +1671,41 @@ Note: although this filter produces correct token graphs, it cannot consume an i

 The rules for determining delimiters are determined as follows:

-* A change in case within a word: "CamelCase" -> "Camel", "Case". This can be disabled by setting `splitOnCaseChange="0"`.
+* A change in case within a word: "CamelCase" \-> "Camel", "Case". This can be disabled by setting `splitOnCaseChange="0"`.

-* A transition from alpha to numeric characters or vice versa: "Gonzo5000" -> "Gonzo", "5000" "4500XL" -> "4500", "XL". This can be disabled by setting `splitOnNumerics="0"`.
+* A transition from alpha to numeric characters or vice versa: "Gonzo5000" \-> "Gonzo", "5000" "4500XL" \-> "4500", "XL". This can be disabled by setting `splitOnNumerics="0"`.

-* Non-alphanumeric characters (discarded): "hot-spot" -> "hot", "spot"
+* Non-alphanumeric characters (discarded): "hot-spot" \-> "hot", "spot"

-* A trailing "'s" is removed: "O'Reilly's" -> "O", "Reilly"
+* A trailing "'s" is removed: "O'Reilly's" \-> "O", "Reilly"

-* Any leading or trailing delimiters are discarded: "--hot-spot--" -> "hot", "spot"
+* Any leading or trailing delimiters are discarded: "--hot-spot--" \-> "hot", "spot"

 *Factory class:* `solr.WordDelimiterGraphFilterFactory`

 *Arguments:*

-`generateWordParts`:: (integer, default 1) If non-zero, splits words at delimiters. For example:"CamelCase", "hot-spot" -> "Camel", "Case", "hot", "spot"
+`generateWordParts`:: (integer, default 1) If non-zero, splits words at delimiters. For example:"CamelCase", "hot-spot" \-> "Camel", "Case", "hot", "spot"

-`generateNumberParts`:: (integer, default 1) If non-zero, splits numeric strings at delimiters:"1947-32" ->*"1947", "32"
+`generateNumberParts`:: (integer, default 1) If non-zero, splits numeric strings at delimiters:"1947-32" \->*"1947", "32"

-`splitOnCaseChange`:: (integer, default 1) If 0, words are not split on camel-case changes:"BugBlaster-XL" -> "BugBlaster", "XL". Example 1 below illustrates the default (non-zero) splitting behavior.
+`splitOnCaseChange`:: (integer, default 1) If 0, words are not split on camel-case changes:"BugBlaster-XL" \-> "BugBlaster", "XL". Example 1 below illustrates the default (non-zero) splitting behavior.

-`splitOnNumerics`:: (integer, default 1) If 0, don't split words on transitions from alpha to numeric:"FemBot3000" -> "Fem", "Bot3000"
+`splitOnNumerics`:: (integer, default 1) If 0, don't split words on transitions from alpha to numeric:"FemBot3000" \-> "Fem", "Bot3000"

-`catenateWords`:: (integer, default 0) If non-zero, maximal runs of word parts will be joined: "hot-spot-sensor's" -> "hotspotsensor"
+`catenateWords`:: (integer, default 0) If non-zero, maximal runs of word parts will be joined: "hot-spot-sensor's" \-> "hotspotsensor"

-`catenateNumbers`:: (integer, default 0) If non-zero, maximal runs of number parts will be joined: 1947-32" -> "194732"
+`catenateNumbers`:: (integer, default 0) If non-zero, maximal runs of number parts will be joined: 1947-32" \-> "194732"

-`catenateAll`:: (0/1, default 0) If non-zero, runs of word and number parts will be joined: "Zap-Master-9000" -> "ZapMaster9000"
+`catenateAll`:: (0/1, default 0) If non-zero, runs of word and number parts will be joined: "Zap-Master-9000" \-> "ZapMaster9000"

-`preserveOriginal`:: (integer, default 0) If non-zero, the original token is preserved: "Zap-Master-9000" -> "Zap-Master-9000", "Zap", "Master", "9000"
+`preserveOriginal`:: (integer, default 0) If non-zero, the original token is preserved: "Zap-Master-9000" \-> "Zap-Master-9000", "Zap", "Master", "9000"

 `protected`:: (optional) The pathname of a file that contains a list of protected words that should be passed through without splitting.

 `stemEnglishPossessive`:: (integer, default 1) If 1, strips the possessive `'s` from each subword.

-`types`:: (optional) The pathname of a file that contains *character => type* mappings, which enable customization of this filter's splitting behavior. Recognized character types: `LOWER`, `UPPER`, `ALPHA`, `DIGIT`, `ALPHANUM`, and `SUBWORD_DELIM`.
+`types`:: (optional) The pathname of a file that contains *character \=> type* mappings, which enable customization of this filter's splitting behavior. Recognized character types: `LOWER`, `UPPER`, `ALPHA`, `DIGIT`, `ALPHANUM`, and `SUBWORD_DELIM`.
 +
 The default for any character without a customized mapping is computed from Unicode character properties. Blank lines and comment lines starting with '#' are ignored. An example file:
 +
--- a/solr/solr-ref-guide/src/language-analysis.adoc
+++ b/solr/solr-ref-guide/src/language-analysis.adoc
@@ -1409,7 +1409,7 @@ Swedish å, ä, ö are in fact the same letters as Norwegian and Danish å, æ,

 In that situation almost all Swedish people use a, a, o instead of å, ä, ö. Norwegians and Danes on the other hand usually type aa, ae and oe instead of å, æ and ø. Some do however use a, a, o, oo, ao and sometimes permutations of everything above.

-There are two filters for helping with normalization between Scandinavian languages: one is `solr.ScandinavianNormalizationFilterFactory` trying to preserve the special characters (æäöå) and another `solr.ScandinavianFoldingFilterFactory` which folds these to the more broad ø/ö->o etc.
+There are two filters for helping with normalization between Scandinavian languages: one is `solr.ScandinavianNormalizationFilterFactory` trying to preserve the special characters (æäöå) and another `solr.ScandinavianFoldingFilterFactory` which folds these to the more broad ø/ö\->o etc.

 See also each language section for other relevant filters.

@@ -1444,7 +1444,7 @@ It's a semantically less destructive solution than `ScandinavianFoldingFilter`,
 [[LanguageAnalysis-ScandinavianFoldingFilter]]
 ==== Scandinavian Folding Filter

-This filter folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o. It also discriminate against use of double vowels aa, ae, ao, oe and oo, leaving just the first one.
+This filter folds Scandinavian characters åÅäæÄÆ\->a and öÖøØ\->o. It also discriminate against use of double vowels aa, ae, ao, oe and oo, leaving just the first one.

 It's a semantically more destructive solution than `ScandinavianNormalizationFilter`, but can in addition help with matching raksmorgas as räksmörgås.

--- a/solr/solr-ref-guide/src/response-writers.adoc
+++ b/solr/solr-ref-guide/src/response-writers.adoc
@@ -259,7 +259,7 @@ Solr has an optional Ruby response format that extends its JSON output in the fo
 * \ and ' are the only two characters escaped.
 * Unicode escapes are not used. Data is written as raw UTF-8.
 * nil used for null.
-* => is used as the key/value separator in maps.
+* \=> is used as the key/value separator in maps.

 Here is a simple example of how one may query Solr using the Ruby response format:

--- a/solr/solr-ref-guide/src/solr-control-script-reference.adoc
+++ b/solr/solr-ref-guide/src/solr-control-script-reference.adoc
@@ -529,7 +529,7 @@ Use the `zk upconfig` command to upload one of the pre-configured configuration
 |-n <name> a|
 Name of the configuration set in ZooKeeper. This command will upload the configuration set to the "configs" ZooKeeper node giving it the name specified.

-You can see all uploaded configuration sets in the Admin UI via the Cloud screens. Choose Cloud -> Tree -> configs to see them.
+You can see all uploaded configuration sets in the Admin UI via the Cloud screens. Choose Cloud \-> Tree \-> configs to see them.

 If a pre-existing configuration set is specified, it will be overwritten in ZooKeeper.

@@ -571,7 +571,7 @@ Use the `zk downconfig` command to download a configuration set from ZooKeeper t
 [cols="20,40,40",options="header"]
 |===
 |Parameter |Description |Example
-|-n <name> |Name of config set in ZooKeeper to download. The Admin UI Cloud -> Tree -> configs node lists all available configuration sets. |`-n myconfig`
+|-n <name> |Name of config set in ZooKeeper to download. The Admin UI Cloud \-> Tree \-> configs node lists all available configuration sets. |`-n myconfig`
 |-d <configset dir> a|
 The path to write the downloaded configuration set into. If just a name is supplied, `$SOLR_HOME/server/solr/configsets` will be the parent. An absolute path may be supplied as well.

--- a/solr/solr-ref-guide/src/solr-jdbc-apache-zeppelin.adoc
+++ b/solr/solr-ref-guide/src/solr-jdbc-apache-zeppelin.adoc
@@ -44,7 +44,7 @@ For most installations, Apache Zeppelin configures PostgreSQL as the JDBC interp
 [[SolrJDBC-ApacheZeppelin-CreateaNotebook]]
 == Create a Notebook

-.Click Notebook -> Create new note
+.Click Notebook \-> Create new note
 image::images/solr-jdbc-apache-zeppelin/zeppelin_solrjdbc_4.png[image,width=517,height=400]

 .Provide a name and click "Create Note"
--- a/solr/solr-ref-guide/src/the-terms-component.adoc
+++ b/solr/solr-ref-guide/src/the-terms-component.adoc
@@ -89,7 +89,7 @@ Specifies the minimum document frequency to return in order for a term to be inc
 Example: `terms.mincount=5`

 `terms.maxcount`::
-Specifies the maximum document frequency a term must have in order to be included in a query response. The default setting is -1, which sets no upper bound. Results are inclusive of the maxcount (that is, <= maxcount).
+Specifies the maximum document frequency a term must have in order to be included in a query response. The default setting is -1, which sets no upper bound. Results are inclusive of the maxcount (that is, \<= maxcount).
 +
 Example: `terms.maxcount=25`

--- a/solr/solr-ref-guide/src/uploading-structured-data-store-data-with-the-data-import-handler.adoc
+++ b/solr/solr-ref-guide/src/uploading-structured-data-store-data-with-the-data-import-handler.adoc
@@ -134,8 +134,8 @@ Request parameters can be substituted in configuration with placeholder `${datai
 ----
 <dataSource driver="org.hsqldb.jdbcDriver"
            url="${dataimporter.request.jdbcurl}"
-	    user="${dataimporter.request.jdbcuser}"
-	    password="${dataimporter.request.jdbcpassword}" />
+            user="${dataimporter.request.jdbcuser}"
+            password="${dataimporter.request.jdbcpassword}" />
 ----

 These parameters can then be passed to the `full-import` command or defined in the `<defaults>` section in `solrconfig.xml`. This example shows the parameters with the full-import command:
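
Note on the new `.adoc` check: the Groovy closure `checkForUnescapedSymbolSubstitutions` added to build.xml above walks each file line by line, skips `[source]` lines and `----` delimited code blocks, and reports the first unescaped `->`, `=>`, `<-` or `<=` on any other line. The sketch below approximates that logic in plain Java so it can be tried outside Ant. It is not part of the commit: the class name `UnescapedSymbolCheck` and the method `findViolations` are hypothetical, it returns messages instead of calling `reportViolation`, and it assumes lines are split on `\r?\n` rather than using the commit's `singleLineSplitter`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch of the .adoc check; names are not from the commit.
final class UnescapedSymbolCheck {
    // Same regular expressions as the Groovy patterns in build.xml.
    static final Pattern SOURCE_HEADER = Pattern.compile("\\[source\\b.*");
    static final Pattern BLOCK_BOUNDARY = Pattern.compile("----\\s*");
    static final Pattern BLOCK_TITLE = Pattern.compile("\\..*");
    // An arrow-like symbol that is not preceded by a backslash.
    static final Pattern UNESCAPED_SYMBOL =
        Pattern.compile("(?<=[^\\\\]|^)([-=]>|<[-=])");

    static List<String> findViolations(String text) {
        List<String> violations = new ArrayList<>();
        boolean inCodeBlock = false;
        boolean underSourceHeader = false;
        int lineNumber = 0;
        // Assumption: split on \r?\n so every line, including blank ones, is counted.
        for (String line : text.split("\r?\n", -1)) {
            ++lineNumber;
            if (underSourceHeader) {
                // The line after a [source] header either opens a block or is a single source line.
                inCodeBlock = BLOCK_BOUNDARY.matcher(line).matches();
                // A block title (".Title") may sit between the header and the block.
                if (!BLOCK_TITLE.matcher(line).matches()) {
                    underSourceHeader = false;
                }
            } else if (inCodeBlock) {
                // Stay in the block until the closing ---- line.
                inCodeBlock = !BLOCK_BOUNDARY.matcher(line).matches();
            } else {
                underSourceHeader = SOURCE_HEADER.matcher(line).matches();
                if (!underSourceHeader) {
                    Matcher m = UNESCAPED_SYMBOL.matcher(line);
                    if (m.find()) { // only the first symbol on a line is reported
                        violations.add("Unescaped symbol \"" + m.group(1)
                            + "\" on line #" + lineNumber);
                    }
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "a -> b",        // flagged
            "a \\-> b",      // escaped, not flagged
            "[source,xml]",
            "----",
            "x => y",        // inside a code block, not flagged
            "----",
            "c <= d");       // flagged
        // Prints:
        // Unescaped symbol "->" on line #1
        // Unescaped symbol "<=" on line #7
        findViolations(sample).forEach(System.out::println);
    }
}
```

As in the Groovy closure, text inside backticks gets no special treatment, which is why the ref guide changes in this commit escape the symbols even in monospaced examples.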