[[token-graphs]]
=== Token graphs

When a <<analyzer-anatomy-tokenizer,tokenizer>> converts text into a stream of
tokens, it also records the following:

* The `position` of each token in the stream
* The `positionLength`, the number of positions that a token spans

Using these, you can create a
https://en.wikipedia.org/wiki/Directed_acyclic_graph[directed acyclic graph],
called a _token graph_, for a stream. In a token graph, each position represents
a node. Each token represents an edge or arc, pointing to the next position.

image::images/analysis/token-graph-qbf-ex.svg[align="center"]

[[token-graphs-synonyms]]
==== Synonyms

Some <<analyzer-anatomy-token-filters,token filters>> can add new tokens, like
synonyms, to an existing token stream. These synonyms often span the same
positions as existing tokens.

In the following graph, `quick` and its synonym `fast` both have a position of
`0`. They span the same positions.

image::images/analysis/token-graph-qbf-synonym-ex.svg[align="center"]

[[token-graphs-multi-position-tokens]]
==== Multi-position tokens

Some token filters can add tokens that span multiple positions. These can
include tokens for multi-word synonyms, such as using "atm" as a synonym for
"automatic teller machine."

However, only some token filters, known as _graph token filters_, accurately
record the `positionLength` for multi-position tokens. These filters include:

* <<analysis-synonym-graph-tokenfilter,`synonym_graph`>>
* <<analysis-word-delimiter-graph-tokenfilter,`word_delimiter_graph`>>

In the following graph, `domain name system` and its synonym, `dns`, both have a
position of `0`. However, `dns` has a `positionLength` of `3`. Other tokens in
the graph have a default `positionLength` of `1`.


image::images/analysis/token-graph-dns-synonym-ex.svg[align="center"]

[[token-graphs-token-graphs-search]]
===== Using token graphs for search

<<analysis-index-search-time,Indexing>> ignores the `positionLength` attribute
and does not support token graphs containing multi-position tokens.

However, queries, such as the <<query-dsl-match-query,`match`>> or
<<query-dsl-match-query-phrase,`match_phrase`>> query, can use these graphs to
generate multiple sub-queries from a single query string.

.*Example*
[%collapsible]
====
A user runs a search for the following phrase using the `match_phrase` query:

`domain name system is fragile`

During <<analysis-index-search-time,search analysis>>, `dns`, a synonym for
`domain name system`, is added to the query string's token stream. The `dns`
token has a `positionLength` of `3`.

image::images/analysis/token-graph-dns-synonym-ex.svg[align="center"]

The `match_phrase` query uses this graph to generate sub-queries for the
following phrases:

[source,text]
------
dns is fragile
domain name system is fragile
------

This means the query matches documents containing either `dns is fragile` _or_
`domain name system is fragile`.
====

[[token-graphs-invalid-token-graphs]]
===== Invalid token graphs

The following token filters can add tokens that span multiple positions but
only record a default `positionLength` of `1`:

* <<analysis-synonym-tokenfilter,`synonym`>>
* <<analysis-word-delimiter-tokenfilter,`word_delimiter`>>

This means these filters will produce invalid token graphs for streams
containing such tokens.

In the following graph, `dns` is a multi-position synonym for `domain name
system`. However, `dns` has the default `positionLength` value of `1`, resulting
in an invalid graph.

image::images/analysis/token-graph-dns-invalid-ex.svg[align="center"]

Avoid using invalid token graphs for search. Invalid graphs can cause unexpected
search results.
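
To see why a wrong `positionLength` matters, it can help to walk a token graph
by hand. The following Python sketch is illustrative only and is not how
{es} or Lucene implement query generation. It assumes a hypothetical token
representation of `(term, position, positionLength)` tuples, with `dns` placed
at position `0`, and a hypothetical `phrases` helper that lists every phrase
spelled by a path from the first position to the last.

```python
from collections import defaultdict

# Hypothetical token representation: (term, position, positionLength).
# This mirrors the valid graph from the `match_phrase` example.
VALID_GRAPH = [
    ("domain", 0, 1),
    ("dns", 0, 3),
    ("name", 1, 1),
    ("system", 2, 1),
    ("is", 3, 1),
    ("fragile", 4, 1),
]

# The same stream, but every token, including `dns`, keeps the default
# positionLength of 1 (assumed behavior of a non-graph filter).
INVALID_GRAPH = [(term, position, 1) for term, position, _ in VALID_GRAPH]


def phrases(tokens):
    """Return every phrase spelled by a path from position 0 to the end."""
    # Each token is an edge from its position to position + positionLength.
    edges = defaultdict(list)
    for term, position, position_length in tokens:
        edges[position].append((term, position + position_length))
    end = max(position + length for _, position, length in tokens)

    def walk(node):
        # Reaching the final position completes one path.
        if node == end:
            return [[]]
        return [
            [term] + rest
            for term, target in edges[node]
            for rest in walk(target)
        ]

    return sorted(" ".join(path) for path in walk(0))


print(phrases(VALID_GRAPH))
print(phrases(INVALID_GRAPH))
```

With the valid graph, the sketch yields `dns is fragile` and
`domain name system is fragile`, the same phrases listed in the example above.
With the invalid graph, `dns` only reaches position `1`, so the sketch yields
`dns name system is fragile` instead of `dns is fragile`. This is one way an
invalid graph can lead to unexpected search results.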