[[analysis-letter-tokenizer]]
=== Letter tokenizer
++++
<titleabbrev>Letter</titleabbrev>
++++

The `letter` tokenizer breaks text into terms whenever it encounters a
character that is not a letter. It works reasonably well for most European
languages but performs poorly for some Asian languages, where words are not
separated by spaces.

[discrete]
=== Example output

[source,console]
---------------------------
POST _analyze
{
  "tokenizer": "letter",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
---------------------------

/////////////////////

[source,console-result]
----------------------------
{
  "tokens": [
    {
      "token": "The",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "QUICK",
      "start_offset": 6,
      "end_offset": 11,
      "type": "word",
      "position": 1
    },
    {
      "token": "Brown",
      "start_offset": 12,
      "end_offset": 17,
      "type": "word",
      "position": 2
    },
    {
      "token": "Foxes",
      "start_offset": 18,
      "end_offset": 23,
      "type": "word",
      "position": 3
    },
    {
      "token": "jumped",
      "start_offset": 24,
      "end_offset": 30,
      "type": "word",
      "position": 4
    },
    {
      "token": "over",
      "start_offset": 31,
      "end_offset": 35,
      "type": "word",
      "position": 5
    },
    {
      "token": "the",
      "start_offset": 36,
      "end_offset": 39,
      "type": "word",
      "position": 6
    },
    {
      "token": "lazy",
      "start_offset": 40,
      "end_offset": 44,
      "type": "word",
      "position": 7
    },
    {
      "token": "dog",
      "start_offset": 45,
      "end_offset": 48,
      "type": "word",
      "position": 8
    },
    {
      "token": "s",
      "start_offset": 49,
      "end_offset": 50,
      "type": "word",
      "position": 9
    },
    {
      "token": "bone",
      "start_offset": 51,
      "end_offset": 55,
      "type": "word",
      "position": 10
    }
  ]
}
----------------------------

/////////////////////

The above sentence would produce the following terms:

[source,text]
---------------------------
[ The, QUICK, Brown, Foxes, jumped, over, the, lazy, dog, s, bone ]
---------------------------

[discrete]
=== Configuration

The `letter` tokenizer is not configurable.
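
For illustration, the splitting rule of the `letter` tokenizer (a term is a
maximal run of letters, and any other character ends it) can be approximated
in a few lines of code. The following Python sketch is not the Lucene
implementation. It assumes that Python's `str.isalpha()`, which accepts the
Unicode general categories Lu, Ll, Lt, Lm and Lo, is a close enough stand-in
for the tokenizer's letter test, and it ignores details such as a maximum
token length. Offsets are Python string indices, so they match the
`_analyze` example only because that text is ASCII. The `Token` type and the
`letter_tokenize` function are hypothetical names chosen for this sketch.

[source,python]
----------------------------
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    """Hypothetical token record mirroring the _analyze response fields."""
    token: str
    start_offset: int
    end_offset: int
    position: int


def letter_tokenize(text: str) -> List[Token]:
    """Split text into maximal runs of letters (illustrative sketch only).

    Assumption: str.isalpha() approximates the tokenizer's letter test.
    Every non-letter character (digit, space, hyphen, apostrophe, ...)
    ends the current token and is itself discarded.
    """
    tokens: List[Token] = []
    start: Optional[int] = None  # offset where the current letter run began
    for i, ch in enumerate(text):
        if ch.isalpha():
            if start is None:
                start = i  # a new run of letters begins here
        elif start is not None:
            # A non-letter closes the current run: emit it as a token.
            tokens.append(Token(text[start:i], start, i, len(tokens)))
            start = None
    if start is not None:
        # Flush a run that extends to the end of the input.
        tokens.append(Token(text[start:], start, len(text), len(tokens)))
    return tokens


if __name__ == "__main__":
    sample = "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
    print([t.token for t in letter_tokenize(sample)])
    # Japanese written without spaces: every character is a letter,
    # so the whole phrase comes back as a single token.
    print([t.token for t in letter_tokenize("東京タワーへ行く")])
----------------------------

Run on the example sentence, the sketch yields the same eleven terms and
offsets as the `_analyze` request. The second call hints at why the
tokenizer struggles with languages written without spaces: with no
non-letter characters to split on, the sketch returns the whole phrase as
one term.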