Start a PPL query with a `search` command to reference a table to search from. You can have the commands that follow in any order.
In the following example, the `search` command refers to an `accounts` index as the source, then uses `fields` and `where` commands for the conditions:
```sql
search source=accounts
| where age > 18
| fields firstname, lastname
```
In the below examples, we represent required arguments in angle brackets `< >` and optional arguments in square brackets `[ ]`.
{: .note }
## search
Use the `search` command to retrieve a document from an index. You can only use the `search` command as the first command in the PPL query.
### Syntax
```sql
search source=<index> [boolean-expression]
```
Field | Description | Required
:--- | :--- |:---
`search` | Specify search keywords. | Yes
`index` | Specify which index to query from. | No
`bool-expression` | Specify an expression that evaluates to a boolean value. | No
*Example 1*: Get all documents
To get all documents from the `accounts` index:
```sql
search source=accounts;
```
| account_number | firstname | address | balance | gender | city | employer | state | age | email | lastname |
`int` | Retain the specified number of duplicate events for each combination. The number must be greater than 0. If you do not specify a number, only the first occurring event is kept and all other duplicates are removed from the results. | `string` | No | 1
`keepempty` | If true, keep the document if any field in the field list has a null value or a field missing. | `nested list of objects` | No | False
Use the `parse` command to parse a text field using regular expression and append the result to the search result.
### Syntax
```sql
parse <field><regular-expression>
```
Field | Description | Required
:--- | :--- |:---
field | A text field. | Yes
regular-expression | The regular expression used to extract new fields from the given test field. If a new field name exists, it will replace the original field. | Yes
The regular expression is used to match the whole text field of each document with Java regex engine. Each named capture group in the expression will become a new ``STRING`` field.
The example shows how to create new field `host` for each document. `host` will be the hostname after `@` in `email` field. Parsing a null field will return an empty string.
*Example 3*: Filter and sort be casted parsed field
The example shows how to sort street numbers that are higher than 500 in address field.
```sql
os> source=accounts | parse address '(?<streetNumber>\d+) (?<street>.+)' | where cast(streetNumber as int) > 500 | sort num(streetNumber) | fields streetNumber, street ;
fetched rows / total rows = 3/3
```
| streetNumber | street
:--- | :--- |
| 671 | Bristol Street
| 789 | Madison Street
| 880 | Holmes Lane
### Limitations
A few limitations exist when using the parse command:
- Fields defined by parse cannot be parsed again. For example, `source=accounts | parse address '\d+ (?<street>.+)' | parse street '\w+ (?<road>\w+)' ;` will fail to return any expressions.
- Fields defined by parse cannot be overridden with other commands. For example, when entering `source=accounts | parse address '\d+ (?<street>.+)' | eval street='1' | where street='1' ;``where` will not match any documents since `street` cannot be overridden.
- The text field used by parse cannot be overridden. For example, when entering `source=accounts | parse address '\d+ (?<street>.+)' | eval address='1' ;``street` will not be parse since address is overridden.
- Fields defined by parse cannot be filtered/sorted after using them in the `stats` command. For example, `source=accounts | parse email '.+@(?<host>.+)' | stats avg(age) by host | where host=pyrami.com ;``where` will not parse the domain listed.
Use the `stats` command to aggregate from search results.
The following table lists the aggregation functions and also indicates how each one handles null or missing values:
Function | NULL | MISSING
:--- | :--- |:---
`COUNT` | Not counted | Not counted
`SUM` | Ignore | Ignore
`AVG` | Ignore | Ignore
`MAX` | Ignore | Ignore
`MIN` | Ignore | Ignore
### Syntax
```
stats <aggregation>... [by-clause]...
```
Field | Description | Required | Default
:--- | :--- |:---
`aggregation` | Specify a statistical aggregation function. The argument of this function must be a field. | Yes | 1000
`by-clause` | Specify one or more fields to group the results by. If not specified, the `stats` command returns only one row, which is the aggregation over the entire result set. | No | -
*Example 1*: Calculate the average value of a field
To calculate the average `age` of all documents:
```sql
search source=accounts | stats avg(age);
```
| avg(age)
:--- |
| 32.25
*Example 2*: Calculate the average value of a field by group
To calculate the average age grouped by gender:
```sql
search source=accounts | stats avg(age) by gender;
Use the `where` command with a bool expression to filter the search result. The `where` command only returns the result when the bool expression evaluates to true.
### Syntax
```sql
where <boolean-expression>
```
Field | Description | Required
:--- | :--- |:---
`bool-expression` | An expression that evaluates to a boolean value. | No
*Example 1*: Filter result set with a condition
To get all documents from the `accounts` index where `account_number` is 1 or gender is `F`:
The `ad` command applies the Random Cut Forest (RCF) algorithm in the ML Commons plugin on the search result returned by a PPL command. Based on the input, the plugin uses two types of RCF algorithms: fixed in time RCF for processing time-series data and batch RCF for processing non-time-series data.
`shingle_size` | A consecutive sequence of the most recent records. The default value is 8. | No
`time_decay` | Specifies how much of the recent past to consider when computing an anomaly score. The default value is 0.001. | No
`time_field` | Specifies the time filed for RCF to use as time-series data. Must be either a long value, such as the timestamp in miliseconds, or a string value in yyyy-MM-dd HH:mm:ss.| Yes
For `cluster-number`, enter the number of clusters you want to group your data points into.
*Example*
The example shows how to classify three Iris species (Iris setosa, Iris virginica and Iris versicolor) based on the combination of four features measured from each sample: the length and the width of the sepals and petals.