[[analysis-fingerprint-analyzer]] === Fingerprint Analyzer The `fingerprint` analyzer implements a https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth#fingerprint[fingerprinting algorithm] which is used by the OpenRefine project to assist in clustering. Input text is lowercased, normalized to remove extended characters, sorted, deduplicated and concatenated into a single token. If a stopword list is configured, stop words will also be removed. [float] === Definition It consists of: Tokenizer:: * <> Token Filters (in order):: 1. <> 2. <> 3. <> (disabled by default) 4. <> [float] === Example output [source,js] --------------------------- POST _analyze { "analyzer": "fingerprint", "text": "Yes yes, Gödel said this sentence is consistent and." } --------------------------- // CONSOLE ///////////////////// [source,js] ---------------------------- { "tokens": [ { "token": "and consistent godel is said sentence this yes", "start_offset": 0, "end_offset": 52, "type": "fingerprint", "position": 0 } ] } ---------------------------- // TESTRESPONSE ///////////////////// The above sentence would produce the following single term: [source,text] --------------------------- [ and consistent godel is said sentence this yes ] --------------------------- [float] === Configuration The `fingerprint` analyzer accepts the following parameters: [horizontal] `separator`:: The character to use to concate the terms. Defaults to a space. `max_output_size`:: The maximum token size to emit. Defaults to `255`. Tokens larger than this size will be discarded. `stopwords`:: A pre-defined stop words list like `_english_` or an array containing a list of stop words. Defaults to `_none_`. `stopwords_path`:: The path to a file containing stop words. See the <> for more information about stop word configuration. [float] === Example configuration In this example, we configure the `fingerprint` analyzer to use the pre-defined list of English stop words: [source,js] ---------------------------- PUT my_index { "settings": { "analysis": { "analyzer": { "my_fingerprint_analyzer": { "type": "fingerprint", "stopwords": "_english_" } } } } } GET _cluster/health?wait_for_status=yellow POST my_index/_analyze { "analyzer": "my_fingerprint_analyzer", "text": "Yes yes, Gödel said this sentence is consistent and." } ---------------------------- // CONSOLE ///////////////////// [source,js] ---------------------------- { "tokens": [ { "token": "consistent godel said sentence yes", "start_offset": 0, "end_offset": 52, "type": "fingerprint", "position": 0 } ] } ---------------------------- // TESTRESPONSE ///////////////////// The above example produces the following term: [source,text] --------------------------- [ consistent godel said sentence yes ] ---------------------------