From 4d88aaedc58a6aded0b5dc546469a0f9b8bf513b Mon Sep 17 00:00:00 2001 From: Mark Payne Date: Sun, 13 Dec 2015 10:13:27 -0500 Subject: [PATCH] NIFI-1258: Added a new function named getDelimitedField to the Expression Language and put together a guide that walks through how to add a new function Signed-off-by: Aldrin Piri --- nifi-commons/nifi-expression-language/README | 105 +++++++++++ .../language/antlr/AttributeExpressionLexer.g | 2 + .../antlr/AttributeExpressionParser.g | 6 +- .../attribute/expression/language/Query.java | 39 ++++ .../functions/GetDelimitedFieldEvaluator.java | 174 ++++++++++++++++++ .../expression/language/TestQuery.java | 61 ++++++ .../asciidoc/expression-language-guide.adoc | 39 ++++ 7 files changed, 423 insertions(+), 3 deletions(-) create mode 100644 nifi-commons/nifi-expression-language/README create mode 100644 nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/GetDelimitedFieldEvaluator.java diff --git a/nifi-commons/nifi-expression-language/README b/nifi-commons/nifi-expression-language/README new file mode 100644 index 0000000000..6281dcae57 --- /dev/null +++ b/nifi-commons/nifi-expression-language/README @@ -0,0 +1,105 @@ +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to You under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+See the License for the specific language governing permissions and +limitations under the License. + + + +This document is intended to provide a walk-through of what is necessary +in order to add a new function to the Expression Language. Doing so requires +a handful of steps, so we will outline each of those steps here, in the order +that they must be done. While this documentation is fairly verbose, it is often +the case that reading the documentation takes longer than performing the tasks +outlined by the documentation. + + +1) In order to make the nifi-expression-language Maven module compile in your IDE, you may need to add the ANTLR-generated sources to your IDE's classpath. + This can be done using Eclipse, as follows: + - Right-click on the nifi-expression-language project + - Go to "Properties" on the context menu + - Go to the "Java Build Path" item in the left tree and choose the "Source" tab. + - Click "Add Folder..." + - Add the target/generated-sources/antlr3 folder. If this folder does not exist, first build the project from Maven and then + right-click on the nifi-expression-language project in Eclipse and click Refresh. + - Click OK to close all dialogs. + +2) Add the method name to the Tokens for the Lexer + - Open the src/main/antlr3/org/apache/nifi/attribute/expression/language/antlr/AttributeExpressionLexer.g file + - Add the function name to the list of tokens in this file. These functions are grouped by the number of arguments + that they take. This grouping mechanism could probably be made better, perhaps grouping by the type of function + provided. However, for now, it is best to keep some sort of structure, at least. If the function has optional + arguments, the function should be grouped by the maximum number of arguments that it takes (for example, the + substring function can take 1 or 2 arguments, so it is grouped with the '2 argument functions'). 
+ The syntax to use is: + + <Token Name> : '<function name>'; + + The Token Name should be all-caps and words should be separated by underscores. The Token Name is what will be used to + identify the token when ANTLR parses an Expression. The function name should use camel case starting with a lower-case + letter. This is the name of the function as it will be referenced in the Expression Language. + - Save the AttributeExpressionLexer.g file + +3) Add the method to the grammar + - Open the src/main/antlr3/org/apache/nifi/attribute/expression/language/antlr/AttributeExpressionParser.g file + - Starting around line 75, the functions are defined, grouped by the type of value returned. We can add the new function + into the grammar here. Please see the ANTLR documentation for syntax on the grammar used. Note that this is ANTLR 3, NOT ANTLR 4. + The idea here is to spell out the syntax that should be used for the function. So generally, we do this by specifying the function name, + "LPAREN!" (which indicates a left parenthesis and the ! indicates that we do not want this passed to us when obtaining the parsed tokens), + and then a list of arguments that are separated by "COMMA!" (again, indicating a comma character and that we do not want the token passed + to us when we are looking at parsed tokens). We then end with the matching "RPAREN!". + - Save this file. + +4) Rebuild via Maven + - In order to make sure that we now can reference the tokens that are generated for our new function, we need to rebuild via Maven. + We can do this by building just the nifi-expression-language project, rather than rebuilding the entire NiFi code base. + - If necessary, right-click on the nifi-expression-language project in your IDE and refresh / update project from new Maven build. + This is generally necessary when using Eclipse. 
+ +5) Add the logic for the function + - In the src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions package directory, we will need to create a new + class that is capable of implementing the logic of the new function. Create a class using the standard naming convention of + <function name>Evaluator that extends the appropriate abstract evaluator. If the function will return a String, the evaluator should extend + StringEvaluator. If the function will return a boolean, the evaluator should extend BooleanEvaluator. There are also evaluators for Date + and Number return types. + - Generally the constructor for the evaluator will take an Evaluator for the "Subject" and an Evaluator for each argument. The subject is the + value that the function will be evaluated against. The substring function, for instance, takes a subject of type String. Thinking in terms of + Java, the "subject" is the object on which the function is being called. It is important to take Evaluator objects and not just a String, + for instance, as we have to ensure that we determine the actual values to use dynamically at runtime. + - Implement the functionality as appropriate by implementing the abstract methods provided by the abstract Evaluator that is being extended by + your newly created Evaluator. + - The Evaluator need not be thread-safe. The existing Evaluators are numerous and provide great examples for understanding the API. + +6) Add the logic to the query parser + - Generally, when using ANTLR, the preferred method to parse the input is to use a Tree Walker. However, this is far less intuitive for many + Java developers (including those of us who wrote the Expression Language originally). As a result, we instead use ANTLR to tokenize and parse the + input and then obtain an Abstract Syntax Tree and process this "manually" in Java code. This occurs in the Query class. 
+ - We can add the function into our parsing logic by updating the #buildFunctionEvaluator method of the org.apache.nifi.attribute.expression.language.Query class. + A static import will likely need to be added to the Query class in order to reference the new token. The token can then be added to the existing + 'case' statement, which will return a new instance of the Evaluator that was just added. + +7) Add Unit Tests! + - Unit tests are critical for the Expression Language. These expressions can be used throughout the entire application and it is important that each function + perform its task properly. Otherwise, incorrect routing decisions could be made, or data could become corrupted as a result. + - Each function should have its battery of unit tests added to the TestQuery class. This class includes a convenience method named #verifyEquals that is + used to ensure that the Expression returns the same value, regardless of how it is compiled and evaluated. + +8) Add Documentation! + - The documentation for each function is provided in the nifi-docs module, under src/main/asciidoc/expression-language-guide.adoc. + The format of the document is crucial to maintain, as this document is not only rendered as HTML in the NiFi Documentation page, but the + CSS classes that are used in the rendered docs are also made use of by the NiFi UI. When a user is entering an Expression Language expression and + presses Ctrl+Space, the UI provides auto-completion information as well as inline documentation for each function. This information is pulled + directly from the HTML that is generated from this expression-language-guide file. + - Rebuild NiFi and run the application. Add an UpdateAttribute Processor to the graph and add a new property. For the value, type the Expression Language + opening tokens ${ and then press Ctrl+Space to ensure that the function and its documentation are presented as expected. Most functions that are added + will require a Subject. 
In order to see the function, then, you will need to provide a subject, such as typing "${myVariable:" (without the quotes) + and then pressing Ctrl+Space. This step is important, as it is quite easy to make a mistake when creating the documentation using a free-form text editor, + and this will ensure that users receive a consistent, high-quality experience when using the new function. + diff --git a/nifi-commons/nifi-expression-language/src/main/antlr3/org/apache/nifi/attribute/expression/language/antlr/AttributeExpressionLexer.g b/nifi-commons/nifi-expression-language/src/main/antlr3/org/apache/nifi/attribute/expression/language/antlr/AttributeExpressionLexer.g index 80581f5c01..d56a27b192 100644 --- a/nifi-commons/nifi-expression-language/src/main/antlr3/org/apache/nifi/attribute/expression/language/antlr/AttributeExpressionLexer.g +++ b/nifi-commons/nifi-expression-language/src/main/antlr3/org/apache/nifi/attribute/expression/language/antlr/AttributeExpressionLexer.g @@ -157,6 +157,8 @@ SUBSTRING : 'substring'; REPLACE : 'replace'; REPLACE_ALL : 'replaceAll'; +// 5 arg functions +GET_DELIMITED_FIELD : 'getDelimitedField'; // STRINGS STRING_LITERAL diff --git a/nifi-commons/nifi-expression-language/src/main/antlr3/org/apache/nifi/attribute/expression/language/antlr/AttributeExpressionParser.g b/nifi-commons/nifi-expression-language/src/main/antlr3/org/apache/nifi/attribute/expression/language/antlr/AttributeExpressionParser.g index 7c37530a4d..780d8c5b95 100644 --- a/nifi-commons/nifi-expression-language/src/main/antlr3/org/apache/nifi/attribute/expression/language/antlr/AttributeExpressionParser.g +++ b/nifi-commons/nifi-expression-language/src/main/antlr3/org/apache/nifi/attribute/expression/language/antlr/AttributeExpressionParser.g @@ -79,7 +79,7 @@ oneArgString : ((SUBSTRING_BEFORE | SUBSTRING_BEFORE_LAST | SUBSTRING_AFTER | SU (TO_RADIX LPAREN! anyArg (COMMA! anyArg)? RPAREN!); twoArgString : ((REPLACE | REPLACE_ALL) LPAREN! anyArg COMMA! anyArg RPAREN!) 
| (SUBSTRING LPAREN! anyArg (COMMA! anyArg)? RPAREN!); - +fiveArgString : GET_DELIMITED_FIELD LPAREN! anyArg (COMMA! anyArg (COMMA! anyArg (COMMA! anyArg (COMMA! anyArg)?)?)?)? RPAREN!; // functions that return Booleans zeroArgBool : (IS_NULL | NOT_NULL | IS_EMPTY | NOT) LPAREN! RPAREN!; @@ -95,11 +95,11 @@ oneArgNum : ((INDEX_OF | LAST_INDEX_OF) LPAREN! anyArg RPAREN!) | (TO_DATE LPAREN! anyArg? RPAREN!) | ((MOD | PLUS | MINUS | MULTIPLY | DIVIDE) LPAREN! anyArg RPAREN!); -stringFunctionRef : zeroArgString | oneArgString | twoArgString; +stringFunctionRef : zeroArgString | oneArgString | twoArgString | fiveArgString; booleanFunctionRef : zeroArgBool | oneArgBool; numberFunctionRef : zeroArgNum | oneArgNum; -anyArg : NUMBER | numberFunctionRef | STRING_LITERAL | zeroArgString | oneArgString | twoArgString | booleanLiteral | zeroArgBool | oneArgBool | expression; +anyArg : NUMBER | numberFunctionRef | STRING_LITERAL | zeroArgString | oneArgString | twoArgString | fiveArgString | booleanLiteral | zeroArgBool | oneArgBool | expression; stringArg : STRING_LITERAL | zeroArgString | oneArgString | twoArgString | expression; functionRef : stringFunctionRef | booleanFunctionRef | numberFunctionRef; diff --git a/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/Query.java b/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/Query.java index 2c27e4d2d3..b3a364ae1c 100644 --- a/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/Query.java +++ b/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/Query.java @@ -50,6 +50,7 @@ import org.apache.nifi.attribute.expression.language.evaluation.functions.Equals import org.apache.nifi.attribute.expression.language.evaluation.functions.EqualsIgnoreCaseEvaluator; import org.apache.nifi.attribute.expression.language.evaluation.functions.FindEvaluator; 
import org.apache.nifi.attribute.expression.language.evaluation.functions.FormatEvaluator; +import org.apache.nifi.attribute.expression.language.evaluation.functions.GetDelimitedFieldEvaluator; import org.apache.nifi.attribute.expression.language.evaluation.functions.GreaterThanEvaluator; import org.apache.nifi.attribute.expression.language.evaluation.functions.GreaterThanOrEqualEvaluator; import org.apache.nifi.attribute.expression.language.evaluation.functions.HostnameEvaluator; @@ -138,6 +139,7 @@ import static org.apache.nifi.attribute.expression.language.antlr.AttributeExpre import static org.apache.nifi.attribute.expression.language.antlr.AttributeExpressionParser.FALSE; import static org.apache.nifi.attribute.expression.language.antlr.AttributeExpressionParser.FIND; import static org.apache.nifi.attribute.expression.language.antlr.AttributeExpressionParser.FORMAT; +import static org.apache.nifi.attribute.expression.language.antlr.AttributeExpressionParser.GET_DELIMITED_FIELD; import static org.apache.nifi.attribute.expression.language.antlr.AttributeExpressionParser.GREATER_THAN; import static org.apache.nifi.attribute.expression.language.antlr.AttributeExpressionParser.GREATER_THAN_OR_EQUAL; import static org.apache.nifi.attribute.expression.language.antlr.AttributeExpressionParser.HOSTNAME; @@ -1288,6 +1290,43 @@ public class Query { case NOT: { return addToken(new NotEvaluator(toBooleanEvaluator(subjectEvaluator)), "not"); } + case GET_DELIMITED_FIELD: { + if (argEvaluators.size() == 1) { + // Only a single argument - the index to return. + return addToken(new GetDelimitedFieldEvaluator(toStringEvaluator(subjectEvaluator), + toNumberEvaluator(argEvaluators.get(0), "first argument of getDelimitedField")), "getDelimitedField"); + } else if (argEvaluators.size() == 2) { + // two arguments - index and delimiter. 
+ return addToken(new GetDelimitedFieldEvaluator(toStringEvaluator(subjectEvaluator), + toNumberEvaluator(argEvaluators.get(0), "first argument of getDelimitedField"), + toStringEvaluator(argEvaluators.get(1), "second argument of getDelimitedField")), + "getDelimitedField"); + } else if (argEvaluators.size() == 3) { + // 3 arguments - index, delimiter, quote char. + return addToken(new GetDelimitedFieldEvaluator(toStringEvaluator(subjectEvaluator), + toNumberEvaluator(argEvaluators.get(0), "first argument of getDelimitedField"), + toStringEvaluator(argEvaluators.get(1), "second argument of getDelimitedField"), + toStringEvaluator(argEvaluators.get(2), "third argument of getDelimitedField")), + "getDelimitedField"); + } else if (argEvaluators.size() == 4) { + // 4 arguments - index, delimiter, quote char, escape char + return addToken(new GetDelimitedFieldEvaluator(toStringEvaluator(subjectEvaluator), + toNumberEvaluator(argEvaluators.get(0), "first argument of getDelimitedField"), + toStringEvaluator(argEvaluators.get(1), "second argument of getDelimitedField"), + toStringEvaluator(argEvaluators.get(2), "third argument of getDelimitedField"), + toStringEvaluator(argEvaluators.get(3), "fourth argument of getDelimitedField")), + "getDelimitedField"); + } else { + // 5 arguments - index, delimiter, quote char, escape char, strip escape/quote chars flag + return addToken(new GetDelimitedFieldEvaluator(toStringEvaluator(subjectEvaluator), + toNumberEvaluator(argEvaluators.get(0), "first argument of getDelimitedField"), + toStringEvaluator(argEvaluators.get(1), "second argument of getDelimitedField"), + toStringEvaluator(argEvaluators.get(2), "third argument of getDelimitedField"), + toStringEvaluator(argEvaluators.get(3), "fourth argument of getDelimitedField"), + toBooleanEvaluator(argEvaluators.get(4), "fifth argument of getDelimitedField")), + "getDelimitedField"); + } + } default: throw new AttributeExpressionLanguageParsingException("Expected a Function-type 
expression but got " + tree.toString()); } diff --git a/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/GetDelimitedFieldEvaluator.java b/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/GetDelimitedFieldEvaluator.java new file mode 100644 index 0000000000..e5695a8330 --- /dev/null +++ b/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/GetDelimitedFieldEvaluator.java @@ -0,0 +1,174 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.nifi.attribute.expression.language.evaluation.functions; + +import java.util.Map; + +import org.apache.nifi.attribute.expression.language.evaluation.Evaluator; +import org.apache.nifi.attribute.expression.language.evaluation.QueryResult; +import org.apache.nifi.attribute.expression.language.evaluation.StringEvaluator; +import org.apache.nifi.attribute.expression.language.evaluation.StringQueryResult; +import org.apache.nifi.attribute.expression.language.evaluation.literals.BooleanLiteralEvaluator; +import org.apache.nifi.attribute.expression.language.evaluation.literals.StringLiteralEvaluator; +import org.apache.nifi.attribute.expression.language.exception.AttributeExpressionLanguageException; + +public class GetDelimitedFieldEvaluator extends StringEvaluator { + private final Evaluator<String> subjectEval; + private final Evaluator<Long> indexEval; + private final Evaluator<String> delimiterEval; + private final Evaluator<String> quoteCharEval; + private final Evaluator<String> escapeCharEval; + private final Evaluator<Boolean> stripCharsEval; + + public GetDelimitedFieldEvaluator(final Evaluator<String> subject, final Evaluator<Long> index) { + this(subject, index, new StringLiteralEvaluator(",")); + } + + public GetDelimitedFieldEvaluator(final Evaluator<String> subject, final Evaluator<Long> index, final Evaluator<String> delimiter) { + this(subject, index, delimiter, new StringLiteralEvaluator("\"")); + } + + public GetDelimitedFieldEvaluator(final Evaluator<String> subject, final Evaluator<Long> index, final Evaluator<String> delimiter, + final Evaluator<String> quoteChar) { + this(subject, index, delimiter, quoteChar, new StringLiteralEvaluator("\\\\")); + } + + public GetDelimitedFieldEvaluator(final Evaluator<String> subject, final Evaluator<Long> index, final Evaluator<String> delimiter, + final Evaluator<String> quoteChar, final Evaluator<String> escapeChar) { + this(subject, index, delimiter, quoteChar, escapeChar, new BooleanLiteralEvaluator(false)); + } + + public GetDelimitedFieldEvaluator(final Evaluator<String> subject, final Evaluator<Long> index, final Evaluator<String> delimiter, + final 
Evaluator<String> quoteChar, final Evaluator<String> escapeChar, final Evaluator<Boolean> stripChars) { + this.subjectEval = subject; + this.indexEval = index; + this.delimiterEval = delimiter; + this.quoteCharEval = quoteChar; + this.escapeCharEval = escapeChar; + this.stripCharsEval = stripChars; + } + + @Override + public QueryResult<String> evaluate(final Map<String, String> attributes) { + final String subject = subjectEval.evaluate(attributes).getValue(); + if (subject == null || subject.isEmpty()) { + return new StringQueryResult(""); + } + + final Long index = indexEval.evaluate(attributes).getValue(); + if (index == null) { + throw new AttributeExpressionLanguageException("Cannot evaluate getDelimitedField function because the index (which field to obtain) was not specified"); + } + if (index < 1) { + return new StringQueryResult(""); + } + + final String delimiter = delimiterEval.evaluate(attributes).getValue(); + if (delimiter == null || delimiter.isEmpty()) { + throw new AttributeExpressionLanguageException("Cannot evaluate getDelimitedField function because the delimiter was not specified"); + } else if (delimiter.length() > 1) { + throw new AttributeExpressionLanguageException("Cannot evaluate getDelimitedField function because the delimiter evaluated to \"" + delimiter + + "\", but only a single character is allowed."); + } + + final String quoteString = quoteCharEval.evaluate(attributes).getValue(); + if (quoteString == null || quoteString.isEmpty()) { + throw new AttributeExpressionLanguageException("Cannot evaluate getDelimitedField function because the quote character " + + "(which character is used to enclose values that contain the delimiter) was not specified"); + } else if (quoteString.length() > 1) { + throw new AttributeExpressionLanguageException("Cannot evaluate getDelimitedField function because the quote character " + + "(which character is used to enclose values that contain the delimiter) evaluated to \"" + quoteString + "\", but only a single character is allowed."); + } + + final 
String escapeString = escapeCharEval.evaluate(attributes).getValue(); + if (escapeString == null || escapeString.isEmpty()) { + throw new AttributeExpressionLanguageException("Cannot evaluate getDelimitedField function because the escape character " + + "(which character is used to escape the quote character or delimiter) was not specified"); + } else if (escapeString.length() > 1) { + throw new AttributeExpressionLanguageException("Cannot evaluate getDelimitedField function because the escape character " + + "(which character is used to escape the quote character or delimiter) evaluated to \"" + escapeString + "\", but only a single character is allowed."); + } + + Boolean stripChars = stripCharsEval.evaluate(attributes).getValue(); + if (stripChars == null) { + stripChars = Boolean.FALSE; + } + + final char quoteChar = quoteString.charAt(0); + final char delimiterChar = delimiter.charAt(0); + final char escapeChar = escapeString.charAt(0); + + // ensure that quoteChar, delimiterChar, escapeChar are all different. + if (quoteChar == delimiterChar) { + throw new AttributeExpressionLanguageException("Cannot evaluate getDelimitedField function because the quote character and the delimiter are the same"); + } + if (quoteChar == escapeChar) { + throw new AttributeExpressionLanguageException("Cannot evaluate getDelimitedField function because the quote character and the escape character are the same"); + } + if (delimiterChar == escapeChar) { + throw new AttributeExpressionLanguageException("Cannot evaluate getDelimitedField function because the delimiter and the escape character are the same"); + } + + // Iterate through each character in the subject, trying to find the field index that we care about and extracting the chars from it. 
+ final StringBuilder fieldBuilder = new StringBuilder(); + final int desiredFieldIndex = index.intValue(); + final int numChars = subject.length(); + + boolean inQuote = false; + int curFieldIndex = 1; + boolean lastCharIsEscape = false; + for (int i = 0; i < numChars; i++) { + final char c = subject.charAt(i); + + if (c == quoteChar && !lastCharIsEscape) { + // we found a quote character that is not escaped. Flip the value of 'inQuote' + inQuote = !inQuote; + if (!stripChars && curFieldIndex == desiredFieldIndex) { + fieldBuilder.append(c); + } + } else if (c == delimiterChar && !lastCharIsEscape && !inQuote) { + // We found a delimiter that is not escaped and we are not in quotes - or we ran out of characters so we consider this + // the last character. + final int indexJustFinished = curFieldIndex++; + if (indexJustFinished == desiredFieldIndex) { + return new StringQueryResult(fieldBuilder.toString()); + } + } else if (curFieldIndex == desiredFieldIndex) { + if (c != escapeChar || !stripChars) { + fieldBuilder.append(c); + } + } + + lastCharIsEscape = (c == escapeChar) && !lastCharIsEscape; + } + + if (curFieldIndex == desiredFieldIndex) { + // we have run out of characters and we are on the desired field. Return the characters from this field. + return new StringQueryResult(fieldBuilder.toString()); + } + + // We did not find enough fields. Return an empty string. 
+ return new StringQueryResult(""); + } + + @Override + public Evaluator<?> getSubjectEvaluator() { + return subjectEval; + } + +} diff --git a/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java b/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java index 131bcde9e6..c42931ea96 100644 --- a/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java +++ b/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java @@ -1156,6 +1156,67 @@ public class TestQuery { verifyEquals("${allMatchingAttributes('a.*'):contains('2'):equals('true'):and( ${literal(true)} )}", attributes, true); } + @Test + public void testGetDelimitedField() { + final Map<String, String> attributes = new HashMap<>(); + + attributes.put("line", "Name, Age, Title"); + + // Test "simple" case - comma separated with no quoted or escaped text + verifyEquals("${line:getDelimitedField(2)}", attributes, " Age"); + verifyEquals("${line:getDelimitedField(2, ',')}", attributes, " Age"); + verifyEquals("${line:getDelimitedField(2, ',', '\"')}", attributes, " Age"); + verifyEquals("${line:getDelimitedField(2, ',', '\"', '\\\\')}", attributes, " Age"); + + // test with a space in column + attributes.put("line", "First Name, Age, Title"); + verifyEquals("${line:getDelimitedField(1)}", attributes, "First Name"); + verifyEquals("${line:getDelimitedField(1, ',')}", attributes, "First Name"); + verifyEquals("${line:getDelimitedField(1, ',', '\"')}", attributes, "First Name"); + verifyEquals("${line:getDelimitedField(1, ',', '\"', '\\\\')}", attributes, "First Name"); + + // test quoted value + attributes.put("line", "\"Name (Last, First)\", Age, Title"); + verifyEquals("${line:getDelimitedField(1)}", attributes, "\"Name (Last, First)\""); + verifyEquals("${line:getDelimitedField(1, ',')}", attributes, "\"Name 
(Last, First)\""); + verifyEquals("${line:getDelimitedField(1, ',', '\"')}", attributes, "\"Name (Last, First)\""); + verifyEquals("${line:getDelimitedField(1, ',', '\"', '\\\\')}", attributes, "\"Name (Last, First)\""); + + // test non-standard quote char + attributes.put("line", "_Name (Last, First)_, Age, Title"); + verifyEquals("${line:getDelimitedField(1)}", attributes, "_Name (Last"); + verifyEquals("${line:getDelimitedField(1, ',', '_')}", attributes, "_Name (Last, First)_"); + + // test escape char + attributes.put("line", "Name (Last\\, First), Age, Title"); + verifyEquals("${line:getDelimitedField(1)}", attributes, "Name (Last\\, First)"); + + attributes.put("line", "Name (Last__, First), Age, Title"); + verifyEquals("${line:getDelimitedField(1, ',', '\"', '_')}", attributes, "Name (Last__"); + + attributes.put("line", "Name (Last_, First), Age, Title"); + verifyEquals("${line:getDelimitedField(1, ',', '\"', '_')}", attributes, "Name (Last_, First)"); + + // test escape for enclosing chars + attributes.put("line", "\\\"Name (Last, First), Age, Title"); + verifyEquals("${line:getDelimitedField(1)}", attributes, "\\\"Name (Last"); + + // get non existing field + attributes.put("line", "Name, Age, Title"); + verifyEquals("${line:getDelimitedField(12)}", attributes, ""); + + // test escape char within quotes + attributes.put("line", "col 1, col 2, \"The First, Second, and \\\"Last\\\" Column\", Last"); + verifyEquals("${line:getDelimitedField(3):trim()}", attributes, "\"The First, Second, and \\\"Last\\\" Column\""); + + // test stripping chars + attributes.put("line", "col 1, col 2, \"The First, Second, and \\\"Last\\\" Column\", Last"); + verifyEquals("${line:getDelimitedField(3, ',', '\"', '\\\\', true):trim()}", attributes, "The First, Second, and \"Last\" Column"); + + attributes.put("line", "\"Jacobson, John\", 32, Mr."); + verifyEquals("${line:getDelimitedField(2)}", attributes, " 32"); + } + private void verifyEquals(final String expression, final Map 
attributes, final Object expectedResult) { Query.validateExpression(expression, false); assertEquals(String.valueOf(expectedResult), Query.evaluateExpressions(expression, attributes, null)); diff --git a/nifi-docs/src/main/asciidoc/expression-language-guide.adoc b/nifi-docs/src/main/asciidoc/expression-language-guide.adoc index 9593dbfdc7..00ce0bb4f3 100644 --- a/nifi-docs/src/main/asciidoc/expression-language-guide.adoc +++ b/nifi-docs/src/main/asciidoc/expression-language-guide.adoc @@ -739,6 +739,45 @@ then the following Expressions will result in the following values: +[.function] +=== getDelimitedField + +*Description*: [.description]#Parses the Subject as a delimited line of text and returns just a single field + from that delimited text.# + +*Subject Type*: [.subject]#String# + +*Arguments*: + + - [.argName]#_index_# : [.argDesc]#The index of the field to return. A value of 1 will return the first field, + a value of 2 will return the second field, and so on.# + - [.argName]#_delimiter_# : [.argDesc]#Optional argument that provides the character to use as a field separator. + If not specified, a comma will be used. This value must be exactly 1 character.# + - [.argName]#_quoteChar_# : [.argDesc]#Optional argument that provides the character that can be used to quote values + so that the delimiter can be used within a single field. If not specified, a double-quote (") will be used. This value + must be exactly 1 character.# + - [.argName]#_escapeChar_# : [.argDesc]#Optional argument that provides the character that can be used to escape the Quote Character + or the Delimiter within a field. If not specified, a backslash (\) is used. This value must be exactly 1 character.# + - [.argName]#_stripChars_# : [.argDesc]#Optional argument that specifies whether or not quote characters and escape characters should + be stripped. 
For example, if we have a field value "1, 2, 3" and this value is true, we will get the value `1, 2, 3`, but if this + value is false, we will get the value `"1, 2, 3"` with the quotes. The default value is false. This value must be either `true` + or `false`.# + +*Return Type*: [.returnType]#String# + +*Examples*: If the "line" attribute contains the value _"Jacobson, John", 32, Mr._ + and the "altLine" attribute contains the value _Jacobson, John|32|Mr._ + then the following Expressions will result in the following values: + +.GetDelimitedField Examples +|====================================================================== +| Expression | Value +| `${line:getDelimitedField(2)}` | _(space)_32 +| `${line:getDelimitedField(2):trim()}` | 32 +| `${line:getDelimitedField(1)}` | "Jacobson, John" +| `${line:getDelimitedField(1, ',', '"', '\\', true)}` | Jacobson, John +| `${altLine:getDelimitedField(1, '\|')}` | Jacobson, John +|======================================================================
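
The quoting, escaping, and stripping rules documented for getDelimitedField can be tried outside NiFi with a minimal standalone sketch like the one below. It is not the evaluator from this patch: the class name DelimitedFieldSketch and the method getField are hypothetical, it takes plain chars instead of Evaluator arguments, and it skips the checks that the delimiter, quote, and escape characters are distinct. It also assumes that the last field should be returned when the line ends while the requested index is current.

```java
// Hypothetical standalone helper; it mirrors the documented rules of
// getDelimitedField but is not part of NiFi.
final class DelimitedFieldSketch {

    /**
     * Returns the field at the given 1-based index, or "" if the line is
     * null/empty, the index is below 1, or the line has too few fields.
     * When strip is true, unescaped quote characters and all escape
     * characters are dropped from the returned field.
     */
    static String getField(String line, int index, char delimiter,
                           char quote, char escape, boolean strip) {
        if (line == null || line.isEmpty() || index < 1) {
            return "";
        }

        StringBuilder field = new StringBuilder();
        boolean inQuote = false;          // inside an unescaped quoted region
        boolean lastCharIsEscape = false; // previous char was an active escape
        int current = 1;                  // 1-based index of the field being read

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);

            if (c == quote && !lastCharIsEscape) {
                // Unescaped quote: toggle quoting, keep the char unless stripping.
                inQuote = !inQuote;
                if (!strip && current == index) {
                    field.append(c);
                }
            } else if (c == delimiter && !lastCharIsEscape && !inQuote) {
                // Unescaped, unquoted delimiter: the current field just ended.
                if (current == index) {
                    return field.toString();
                }
                current++;
            } else if (current == index) {
                // Ordinary char (or escaped quote/delimiter) of the wanted field.
                if (c != escape || !strip) {
                    field.append(c);
                }
            }

            // An escape char escapes the next char unless it was itself escaped.
            lastCharIsEscape = (c == escape) && !lastCharIsEscape;
        }

        // End of input: the wanted field may be the last one on the line.
        return current == index ? field.toString() : "";
    }

    public static void main(String[] args) {
        String line = "\"Jacobson, John\", 32, Mr.";
        // prints "[ 32]"
        System.out.println("[" + getField(line, 2, ',', '"', '\\', false) + "]");
        // prints "[Jacobson, John]"
        System.out.println("[" + getField(line, 1, ',', '"', '\\', true) + "]");
    }
}
```

The two calls in main correspond to the `${line:getDelimitedField(2)}` and `${line:getDelimitedField(1, ',', '"', '\\', true)}` rows of the examples table, with the defaults (comma, double quote, backslash, no stripping) passed explicitly.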