[Docs] Batch07: adding examples to string functions (#16862)

* Lower,Upper,Lpad,Rpad,Parse_long

* up to REGEXP_EXTRACT

* batch 07 ready for review

* updated definitions in scalar

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* rpad and lpad

* addressing comments

* minor fixes

* improving examples based on suggestions

* matched -> matches

* correcting typo

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
This commit is contained in:
Edgar Melendrez 2024-08-21 15:08:25 -07:00 committed by GitHub
parent 338da67bc6
commit 725695342c
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
2 changed files with 230 additions and 34 deletions

View File

@ -1506,11 +1506,13 @@ SELECT
FROM "taxi-trips"
LIMIT 1
```
Returns the following:
| `max_temperature` | `natural_log_max_temp` |
| -- | -- |
| `76` | `4.330733340286331` |
</details>
[Learn more](sql-scalar.md#numeric-functions)
@ -1552,19 +1554,65 @@ Looks up the expression in a registered query-time lookup table.
## LOWER
`LOWER(expr)`
**Function type:** [Scalar, string](sql-scalar.md#string-functions)
Returns the expression in lowercase.
* **Syntax:** `LOWER(expr)`
* **Function type:** Scalar, string
<details><summary>Example</summary>
The following example converts the `OriginCityName` column from the `flight-carriers` datasource to lowercase.
```sql
SELECT
"OriginCityName" AS "origin_city",
LOWER("OriginCityName") AS "lowercase"
FROM "flight-carriers"
LIMIT 1
```
Returns the following:
| `origin_city` | `lowercase` |
| -- | -- |
`San Juan, PR` | `san juan, pr` |
</details>
[Learn more](sql-scalar.md#string-functions)
## LPAD
`LPAD(<CHARACTER>, <INTEGER>, [<CHARACTER>])`
Returns a string of size `length` from `expr`. When the length of `expr` is less than `length`, left pads `expr` with `chars`, which defaults to the space character. Truncates `expr` to `length` if `length` is shorter than the length of `expr`.
**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `LPAD(expr, length[, chars])`
* **Function type:** Scalar, string
<details><summary>Example</summary>
The following example left pads the value of `OriginStateName` from the `flight-carriers` datasource to return a total of 11 characters.
```sql
SELECT
"OriginStateName" AS "origin_state",
LPAD("OriginStateName", 11, '+') AS "add_left_padding"
FROM "flight-carriers"
LIMIT 3
```
Returns the following:
| `origin_state` | `add_left_padding` |
| -- | -- |
| `Puerto Rico` | `Puerto Rico` |
| `Massachusetts` | `Massachuset` |
| `Florida` | `++++Florida` |
</details>
[Learn more](sql-scalar.md#string-functions)
Returns the leftmost number of characters from an expression, optionally padded with the given characters.
## LTRIM
@ -1794,11 +1842,31 @@ Parses `expr` into a `COMPLEX<json>` object. This operator deserializes JSON val
## PARSE_LONG
`PARSE_LONG(<CHARACTER>, [<INTEGER>])`
Converts a string into a long(BIGINT) with the given radix, or into DECIMAL(base 10) if a radix is not provided.
**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:**`PARSE_LONG(string[, radix])`
* **Function type:** Scalar, string
<details><summary>Example</summary>
The following example converts the string representation of the binary, radix 2, number `1100` into its long (BIGINT) equivalent.
```sql
SELECT
'1100' AS "binary_as_string",
PARSE_LONG('1110', 2) AS "bigint_value"
```
Returns the following:
| `binary_as_string` | `bigint_value` |
| -- | -- |
| `1100` | `14` |
</details>
[Learn more](sql-scalar.md#string-functions)
Converts a string into a BIGINT with the given base or into a DECIMAL data type if the base is not specified.
## PERCENT_RANK
@ -1810,11 +1878,33 @@ Returns the relative rank of the row calculated as a percentage according to the
## POSITION
`POSITION(<CHARACTER> IN <CHARACTER> [FROM <INTEGER>])`
Returns the one-based index position of a substring within an expression, optionally starting from a given one-based index. If `substring` is not found, returns 0.
**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax**: `POSITION(substring IN expr [FROM startingIndex])`
* **Function type:** Scalar, string
Returns the one-based index position of a substring within an expression, optionally starting from a given one-based index.
<details><summary>Example</summary>
The following example returns the one-based index of the substring `PR` in the `OriginCityName` column from the `flight-carriers` datasource starting from index 5.
```sql
SELECT
"OriginCityName" AS "origin_city",
POSITION('PR' IN "OriginCityName" FROM 5) AS "index"
FROM "flight-carriers"
LIMIT 2
```
Returns the following:
| `origin_city` | `index` |
| -- | -- |
| `San Juan, PR` | `11` |
| `Boston, MA` | `0` |
</details>
[Learn more](sql-scalar.md#string-functions)
## POWER
@ -1872,28 +1962,90 @@ Returns the rank with gaps for a row within a window. For example, if two rows t
## REGEXP_EXTRACT
`REGEXP_EXTRACT(<CHARACTER>, <CHARACTER>, [<INTEGER>])`
Apply regular expression `pattern` to `expr` and extract the Nth capture group. If `N` is unspecified or zero, returns the first substring that matches the pattern. Returns `null` if there is no matching pattern.
**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `REGEXP_EXTRACT(expr, pattern[, N])`
* **Function type:** Scalar, string
Applies a regular expression to the string expression and returns the _n_th match.
<details><summary>Example</summary>
The following example uses regular expressions to find city names inside the `OriginCityName` column from the `flight-carriers` datasource by matching what comes before the comma.
```sql
SELECT
"OriginCityName" AS "origin_city",
REGEXP_EXTRACT("OriginCityName", '([^,]+)', 0) AS "pattern_match"
FROM "flight-carriers"
LIMIT 1
```
Returns the following:
| `origin_city` | `pattern_match` |
| -- | -- |
| `San Juan, PR` | `San Juan`|
</details>
[Learn more](sql-scalar.md#string-functions)
## REGEXP_LIKE
`REGEXP_LIKE(<CHARACTER>, <CHARACTER>)`
Returns `true` if the regular expression `pattern` finds a match in `expr`. Returns `false` otherwise.
**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `REGEXP_LIKE(expr, pattern)`
* **Function type:** Scalar, string
Returns true or false signifying whether the regular expression finds a match in the string expression.
<details><summary>Example</summary>
The following example returns `true` when the `OriginCityName` column from `flight-carriers` has a city name containing a space.
```sql
SELECT
"OriginCityName" AS "origin_city",
REGEXP_LIKE("OriginCityName", '[A-Za-z]+\s[A-Za-z]+') AS "pattern_found"
FROM "flight-carriers"
LIMIT 2
```
Returns the following:
| `origin_city` | `pattern_found` |
| -- | -- |
| `San Juan, PR` | `true` |
| `Boston, MA` | `false` |
</details>
[Learn more](sql-scalar.md#string-functions)
## REGEXP_REPLACE
`REGEXP_REPLACE(<CHARACTER>, <CHARACTER>, <CHARACTER>)`
Replaces all occurrences of a regular expression in a string expression with a replacement string. Refer to capture groups in the replacement string using `$group` syntax. For example: `$1` or `$2`.
**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `REGEXP_REPLACE(expr, pattern, replacement)`
* **Function type:** Scalar, string
<details><summary>Example</summary>
The following example matches three consecutive words, where each word is its own capture group, and replaces the matched words with the word in the second capture group punctuated with exclamation marks.
```sql
SELECT
'foo bar baz' AS "original_string",
REGEXP_REPLACE('foo bar baz', '([A-Za-z]+) ([A-Za-z]+) ([A-Za-z]+)' , '$2!') AS "modified_string"
```
Returns the following:
| `original_string` | `modified_string` |
| -- | -- |
| `foo bar baz` | `bar!` |
</details>
[Learn more](sql-scalar.md#string-functions)
Replaces all occurrences of a regular expression in a string expression with a replacement string. The replacement
string may refer to capture groups using `$1`, `$2`, etc.
## REPEAT
@ -2041,11 +2193,34 @@ Returns the number of the row within the window starting from 1.
## RPAD
`RPAD(<CHARACTER>, <INTEGER>, [<CHARACTER>])`
Returns a string of size `length` from `expr`. When the length of `expr` is less than `length`, right pads `expr` with `chars`, which defaults to the space character. Truncates `expr` to `length` if `length` is shorter than the length of `expr`.
**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `RPAD(expr, length[, chars])`
* **Function type:** Scalar, string
Returns the rightmost number of characters from an expression, optionally padded with the given characters.
<details><summary>Example</summary>
The following example right pads the value of `OriginStateName` from the `flight-carriers` datasource to return a total of 11 characters.
```sql
SELECT
"OriginStateName" AS "origin_state",
RPAD("OriginStateName", 11, '+') AS "add_right_padding"
FROM "flight-carriers"
LIMIT 3
```
Returns the following:
| `origin_state` | `add_right_padding` |
| -- | -- |
| `Puerto Rico` | `Puerto Rico` |
| `Massachusetts` | `Massachuset` |
| `Florida` | `Florida++++` |
</details>
[Learn more](sql-scalar.md#string-functions)
## RTRIM
@ -2586,12 +2761,33 @@ For more information, see [UNNEST](./sql.md#unnest).
## UPPER
`UPPER(expr)`
**Function type:** [Scalar, string](sql-scalar.md#string-functions)
Returns the expression in uppercase.
* **Syntax:** `UPPER(expr)`
* **Function type:** Scalar, string
<details><summary>Example</summary>
The following example converts the `OriginCityName` column from the `flight-carriers` datasource to uppercase.
```sql
SELECT
"OriginCityName" AS "origin_city",
UPPER("OriginCityName") AS "uppercase"
FROM "flight-carriers"
LIMIT 1
```
Returns the following:
| `origin_city` | `uppercase` |
| -- | -- |
`San Juan, PR` | `SAN JUAN, PR` |
</details>
[Learn more](sql-scalar.md#string-functions)
## VAR_POP
`VAR_POP(expr)`

View File

@ -105,11 +105,11 @@ String functions accept strings and return a type appropriate to the function.
|`LOOKUP(expr, lookupName[, replaceMissingValueWith])`|Looks up `expr` in an existing [query-time lookup table](lookups.md) that has the name `lookupName`. Returns the optional constant `replaceMissingValueWith` when `expr` is null or when the lookup does not contain a value for `expr`.<br /><br />You can query lookups directly using the [`lookup` schema](sql.md#from).|
|`LOWER(expr)`|Returns `expr` in all lowercase.|
|`UPPER(expr)`|Returns `expr` in all uppercase.|
|`LPAD(expr, length[, chars])`|Returns a string of `length` from `expr` left-padded with `chars`. If `length` is shorter than the length of `expr`, the result is `expr` which is truncated to `length`. The result is null if either `expr` or `chars` is null. If `chars` is an empty string, no padding is added; however, `expr` may be trimmed if necessary.|
|`RPAD(expr, length[, chars])`|Returns a string of `length` from `expr` right-padded with `chars`. If `length` is shorter than the length of `expr`, the result is `expr` which is truncated to `length`. The result is null if either `expr` or `chars` is null. If `chars` is an empty string, no padding is added; however, `expr` may be trimmed if necessary.|
|`PARSE_LONG(string[, radix])`|Parses a string into a long (BIGINT) with the given radix or 10 (decimal) if a radix is not provided.|
|`LPAD(expr, length[, chars])`|Returns a string of `length` from `expr`. If `expr` is shorter than `length`, left pads `expr` with `chars`, which defaults to space characters. If `expr` exceeds `length`, truncates `expr` to equal `length`. If `chars` is an empty string, no padding is added. Returns `null` if either `expr` or `chars` is null.|
|`RPAD(expr, length[, chars])`|Returns a string of `length` from `expr`. If `expr` is shorter than `length`, right pads `expr` with `chars`, which defaults to space characters. If `expr` exceeds `length`, truncates `expr` to equal `length`. If `chars` is an empty string, no padding is added. Returns `null` if either `expr` or `chars` is null.|
|`PARSE_LONG(string[, radix])`|Parses a string into a long (BIGINT) with the given radix, or 10 (decimal) if a radix is not provided.|
|`POSITION(substring IN expr [FROM startingIndex])`|Returns the index of `substring` within `expr` with indexes starting from 1. The search begins at `startingIndex`. If `startingIndex` is not specified, the default is 1. If `substring` is not found, returns 0.|
|`REGEXP_EXTRACT(expr, pattern[, index])`|Apply regular expression `pattern` to `expr` and extract a capture group or `NULL` if there is no match. If `index` is unspecified or zero, returns the first substring that matched the pattern. The pattern may match anywhere inside `expr`. To match the entire string, use the `^` and `$` markers at the start and end of your pattern. When `druid.generic.useDefaultValueForNull = true`, it is not possible to differentiate an empty-string match from a non-match (both return `NULL`).|
|`REGEXP_EXTRACT(expr, pattern[, index])`|Apply regular expression `pattern` to `expr` and extract a capture group or `NULL` if there is no match. If `index` is unspecified or zero, returns the first substring that matches the pattern. The pattern may match anywhere inside `expr`. To match the entire string, use the `^` and `$` markers at the start and end of your pattern. When `druid.generic.useDefaultValueForNull = true`, it is not possible to differentiate an empty-string match from a non-match (both return `NULL`).|
|`REGEXP_LIKE(expr, pattern)`|Returns whether `expr` matches regular expression `pattern`. The pattern may match anywhere inside `expr`; if you want to match the entire string instead, use the `^` and `$` markers at the start and end of your pattern. Similar to [`LIKE`](sql-operators.md#logical-operators), but uses regexps instead of LIKE patterns. Especially useful in WHERE clauses.|
|`REGEXP_REPLACE(expr, pattern, replacement)`|Replaces all occurrences of regular expression `pattern` within `expr` with `replacement`. The replacement string may refer to capture groups using `$1`, `$2`, etc. The pattern may match anywhere inside `expr`; if you want to match the entire string instead, use the `^` and `$` markers at the start and end of your pattern.|
|`REPLACE(expr, substring, replacement)`|Replaces instances of `substring` in `expr` with `replacement` and returns the result.|