diff --git a/solr/core/src/java/doc-files/tutorial.html b/solr/core/src/java/doc-files/tutorial.html
index 80c677f74d3..25942620993 100755
--- a/solr/core/src/java/doc-files/tutorial.html
+++ b/solr/core/src/java/doc-files/tutorial.html
@@ -478,34 +478,52 @@ in subsequent searches.
 named text_general, which has defaults appropriate for all languages.

- If you know your textual content is English, as is the case for the example documents in this tutorial, - and you'd like to apply English-specific stemming and stop word removal, as well as split compound words, you can use the text_en_splitting fieldType instead. - Go ahead and edit the schema.xml under the solr/example/solr/conf directory, - and change the type for fields text and features from text_general to text_en_splitting. - Restart the server and then re-post all of the documents, and then these queries will show the English-specific transformations: + If you know your textual content is English, as is the case for the example + documents in this tutorial, and you'd like to apply English-specific stemming + and stop word removal, as well as split compound words, you can use the + text_en_splitting fieldType instead. + Go ahead and edit the schema.xml in the + solr/example/solr/conf directory, + to use the text_en_splitting fieldType for + the text and + features fields like so: +

+
+   <field name="features" type="text_en_splitting" indexed="true" stored="true" multiValued="true"/>
+   ...
+   <field name="text" type="text_en_splitting" indexed="true" stored="false" multiValued="true"/>
+
+

+ Stop and restart Solr after making these changes and then re-post all of + the example documents using + java -jar post.jar *.xml. + Now queries like the ones listed below will demonstrate English-specific + transformations:

@@ -514,30 +532,56 @@ in subsequent searches.

Analysis Debugging

-

There is a handy analysis - debugging page where you can see how a text value is broken down into words, - and shows the resulting tokens after they pass through each filter in the chain. -

- -This - shows how "Canon Power-Shot SD500" would be indexed as a value in the name field. Each row of - the table shows the resulting tokens after having passed through the next TokenFilter in the analyzer for the name field. - Notice how both powershot and power, shot are indexed. Tokens generated at the same position - are shown in the same column, in this case shot and powershot. -

-

Selecting verbose output - will show more details, such as the name of each analyzer component in the chain, token positions, and the start and end positions - of the token in the original text. -

-

Selecting highlight matches - when both index and query values are provided will take the resulting terms from the query value and highlight - all matches in the index value analysis. -

+ There is a handy analysis + debugging page where you can see how a text value is broken down into words, + and view the resulting tokens after they pass through each filter in the chain. +

-Here - is an example of stemming and stop-words at work. -

+ This + url shows the tokens that + would be created from "Canon Power-Shot SD500" using the + text_en_splitting type. Each row of + the table shows the resulting tokens after having passed through the next + TokenFilter in the analyzer. + Notice how both powershot and + power, shot + are indexed. Tokens generated at the same position + are shown in the same column, in this case + shot and + powershot. (Compare the previous output with + the tokens produced using the text_general field type.) +
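The position behavior described above (power and shot indexed separately, with powershot sharing a position with shot) can be illustrated with a small standalone sketch. This is not Solr's analyzer code: the function name `split_and_catenate` is hypothetical, and the logic is a simplified assumption that only lowercases, splits on non-alphanumeric characters, and appends the joined form at the position of the last part. It does no stemming, stop word removal, or letter/digit splitting, so its output will not match the analysis page exactly.

```python
import re


def split_and_catenate(text):
    """Toy illustration only (not Solr code): return (position, token) pairs.

    Each whitespace-separated word is lowercased and split on
    non-alphanumeric characters. When a word yields several parts,
    the joined form is also emitted at the position of the last part.
    """
    tokens = []
    position = 0
    for word in text.split():
        parts = [p.lower() for p in re.split(r"[^A-Za-z0-9]+", word) if p]
        for part in parts:
            position += 1
            tokens.append((position, part))
        if len(parts) > 1:
            # The catenated token shares the position of the final part.
            tokens.append((position, "".join(parts)))
    return tokens


tokens = split_and_catenate("Canon Power-Shot SD500")
for position, token in tokens:
    print(position, token)
```

In this sketch, shot and powershot both come out at position 3, which mirrors the "same column" layout the analysis page uses for tokens generated at the same position.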

+

+ Selecting verbose output + will show more details, such as the name of each analyzer component in the + chain, token positions, and the start and end positions of the token in + the original text. +

+

+ Selecting highlight matches + when both index and query values are provided will take the resulting + terms from the query value and highlight + all matches in the index value analysis. +

+

+ Other interesting examples: +

+ +