[[analysis-overview]] == Text analysis overview ++++ Overview ++++ Text analysis enables {es} to perform full-text search, where the search returns all _relevant_ results rather than just exact matches. If you search for `Quick fox jumps`, you probably want the document that contains `A quick brown fox jumps over the lazy dog`, and you might also want documents that contain related words like `fast fox` or `foxes leap`. [discrete] [[tokenization]] === Tokenization Analysis makes full-text search possible through _tokenization_: breaking a text down into smaller chunks, called _tokens_. In most cases, these tokens are individual words. If you index the phrase `the quick brown fox jumps` as a single string and the user searches for `quick fox`, it isn't considered a match. However, if you tokenize the phrase and index each word separately, the terms in the query string can be looked up individually. This means they can be matched by searches for `quick fox`, `fox brown`, or other variations. [discrete] [[normalization]] === Normalization Tokenization enables matching on individual terms, but each token is still matched literally. This means: * A search for `Quick` would not match `quick`, even though you likely want either term to match the other * Although `fox` and `foxes` share the same root word, a search for `foxes` would not match `fox` or vice versa. * A search for `jumps` would not match `leaps`. While they don't share a root word, they are synonyms and have a similar meaning. To solve these problems, text analysis can _normalize_ these tokens into a standard format. This allows you to match tokens that are not exactly the same as the search terms, but similar enough to still be relevant. For example: * `Quick` can be lowercased: `quick`. * `foxes` can be _stemmed_, or reduced to its root word: `fox`. * `jump` and `leap` are synonyms and can be indexed as a single word: `jump`. To ensure search terms match these words as intended, you can apply the same tokenization and normalization rules to the query string. For example, a search for `Foxes leap` can be normalized to a search for `fox jump`. [discrete] [[analysis-customization]] === Customize text analysis Text analysis is performed by an <>, a set of rules that govern the entire process. {es} includes a default analyzer, called the <>, which works well for most use cases right out of the box. If you want to tailor your search experience, you can choose a different <> or even <>. A custom analyzer gives you control over each step of the analysis process, including: * Changes to the text _before_ tokenization * How text is converted to tokens * Normalization changes made to tokens before indexing or search