diff --git a/docs/reference/getting-started.asciidoc b/docs/reference/getting-started.asciidoc
index 382d881a46e..a0466290696 100755
--- a/docs/reference/getting-started.asciidoc
+++ b/docs/reference/getting-started.asciidoc
@@ -16,6 +16,7 @@ Here are a few sample use-cases that Elasticsearch could be used for:
 For the rest of this tutorial, you will be guided through the process of getting Elasticsearch up and running, taking a peek inside it, and performing basic operations like indexing, searching, and modifying your data. At the end of this tutorial, you should have a good idea of what Elasticsearch is, how it works, and hopefully be inspired to see how you can use it to either build sophisticated search applications or to mine intelligence from your data.
 --
 
+[[getting-started-concepts]]
 == Basic Concepts
 
 There are a few concepts that are core to Elasticsearch. Understanding these concepts from the outset will tremendously help ease the learning process.
@@ -103,6 +104,7 @@ You can monitor shard sizes using the {ref}/cat-shards.html[`_cat/shards`] API.
 
 With that out of the way, let's get started with the fun part...
 
+[[getting-started-install]]
 == Installation
 
 [TIP]
@@ -266,6 +268,7 @@ As mentioned previously, we can override either the cluster or node name. This c
 
 Also note the line marked http with information about the HTTP address (`192.168.8.112`) and port (`9200`) that our node is reachable from. By default, Elasticsearch uses port `9200` to provide access to its REST API. This port is configurable if necessary.
 
+[[getting-started-explore]]
 == Exploring Your Cluster
 
 [float]
@@ -278,6 +281,7 @@ Now that we have our node (and cluster) up and running, the next step is to unde
 * Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes
 * Execute advanced search operations such as paging, sorting, filtering, scripting, aggregations, and many others
 
+[[getting-started-cluster-health]]
 === Cluster Health
 
 Let's start with a basic health check, which we can use to see how our cluster is doing. We'll be using curl to do this but you can use any tool that allows you to make HTTP/REST calls. Let's assume that we are still on the same node where we started Elasticsearch on and open another command shell window.
@@ -336,6 +340,7 @@ ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master
 
 Here, we can see our one node named "PB2SGZY", which is the single node that is currently in our cluster.
 
+[[getting-started-list-indices]]
 === List All Indices
 
 Now let's take a peek at our indices:
@@ -356,6 +361,7 @@ health status index uuid pri rep docs.count docs.deleted store.size pri.store.si
 
 Which simply means we have no indices yet in the cluster.
 
+[[getting-started-create-index]]
 === Create an Index
 
 Now let's create an index named "customer" and then list all the indexes again:
@@ -382,6 +388,7 @@ The results of the second command tells us that we now have one index named cust
 
 You might also notice that the customer index has a yellow health tagged to it. Recall from our previous discussion that yellow means that some replicas are not (yet) allocated. The reason this happens for this index is because Elasticsearch by default created one replica for this index. Since we only have one node running at the moment, that one replica cannot yet be allocated (for high availability) until a later point in time when another node joins the cluster. Once that replica gets allocated onto a second node, the health status for this index will turn to green.
 
+[[getting-started-query-document]]
 === Index and Query a Document
 
 Let's now put something into our customer index. We'll index a simple customer document into the customer index, with an ID of 1 as follows:
@@ -446,6 +453,7 @@ And the response:
 
 Nothing out of the ordinary here other than a field, `found`, stating that we found a document with the requested ID 1 and another field, `_source`, which returns the full JSON document that we indexed from the previous step.
 
+[[getting-started-delete-index]]
 === Delete an Index
 
 Now let's delete the index that we just created and then list all the indexes again:
@@ -492,6 +500,7 @@ If we study the above commands carefully, we can actually see a pattern of how w
 
 This REST access pattern is so pervasive throughout all the API commands that if you can simply remember it, you will have a good head start at mastering Elasticsearch.
 
+[[getting-started-modify-data]]
 == Modifying Your Data
 
 Elasticsearch provides data manipulation and search capabilities in near real time. By default, you can expect a one second delay (refresh interval) from the time you index/update/delete your data until the time that it appears in your search results. This is an important distinction from other platforms like SQL wherein data is immediately available after a transaction is completed.
@@ -552,6 +561,7 @@ POST /customer/_doc?pretty
 
 Note that in the above case, we are using the `POST` verb instead of PUT since we didn't specify an ID.
 
+[[getting-started-update-documents]]
 === Updating Documents
 
 In addition to being able to index and replace documents, we can also update documents. Note though that Elasticsearch does not actually do in-place updates under the hood. Whenever we do an update, Elasticsearch deletes the old document and then indexes a new document with the update applied to it in one shot.
@@ -596,6 +606,7 @@ In the above example, `ctx._source` refers to the current source document that i
 
 Elasticsearch provides the ability to update multiple documents given a query condition (like an `SQL UPDATE-WHERE` statement). See {ref}/docs-update-by-query.html[`docs-update-by-query` API]
 
+[[getting-started-delete-documents]]
 === Deleting Documents
 
 Deleting a document is fairly straightforward. This example shows how to delete our previous customer with the ID of 2:
@@ -611,6 +622,7 @@ See the {ref}/docs-delete-by-query.html[`_delete_by_query` API] to delete all do
 
 It is worth noting that it is much more efficient to delete a whole index instead of deleting all documents with the Delete By Query API.
 
+[[getting-started-batch-processing]]
 === Batch Processing
 
 In addition to being able to index, update, and delete individual documents, Elasticsearch also provides the ability to perform any of the above operations in batches using the {ref}/docs-bulk.html[`_bulk` API]. This functionality is important in that it provides a very efficient mechanism to do multiple operations as fast as possible with as few network roundtrips as possible.
@@ -643,6 +655,7 @@ Note above that for the delete action, there is no corresponding source document
 
 The Bulk API does not fail due to failures in one of the actions. If a single action fails for whatever reason, it will continue to process the remainder of the actions after it. When the bulk API returns, it will provide a status for each action (in the same order it was sent in) so that you can check if a specific action failed or not.
 
+[[getting-started-explore-data]]
 == Exploring Your Data
 
 [float]
@@ -706,6 +719,7 @@ yellow open bank l7sSYV2cQXmu6_4rJWVIww 5 1 1000 0 12
 
 Which means that we just successfully bulk indexed 1000 documents into the bank index (under the `_doc` type).
 
+[[getting-started-search-API]]
 === The Search API
 
 Now let's start with some simple searches. There are two basic ways to run searches: one is by sending search parameters through the {ref}/search-uri-request.html[REST request URI] and the other by sending them through the {ref}/search-request-body.html[REST request body]. The request body method allows you to be more expressive and also to define your searches in a more readable JSON format. We'll try one example of the request URI method but for the remainder of this tutorial, we will exclusively be using the request body method.
@@ -843,6 +857,7 @@ to clutter the docs with it:
 
 It is important to understand that once you get your search results back, Elasticsearch is completely done with the request and does not maintain any kind of server-side resources or open cursors into your results. This is in stark contrast to many other platforms such as SQL wherein you may initially get a partial subset of your query results up-front and then you have to continuously go back to the server if you want to fetch (or page through) the rest of the results using some kind of stateful server-side cursor.
 
+[[getting-started-query-lang]]
 === Introducing the Query Language
 
 Elasticsearch provides a JSON-style domain-specific language that you can use to execute queries. This is referred to as the {ref}/query-dsl.html[Query DSL]. The query language is quite comprehensive and can be intimidating at first glance but the best way to actually learn it is to start with a few basic examples.
@@ -907,6 +922,7 @@ GET /bank/_search
 // CONSOLE
 // TEST[continued]
 
+[[getting-started-search]]
 === Executing Searches
 
 Now that we have seen a few of the basic search parameters, let's dig in some more into the Query DSL. Let's first take a look at the returned document fields. By default, the full JSON document is returned as part of all searches. This is referred to as the source (`_source` field in the search hits). If we don't want the entire source document returned, we have the ability to request only a few fields from within source to be returned.
@@ -1066,6 +1082,7 @@ GET /bank/_search
 // CONSOLE
 // TEST[continued]
 
+[[getting-started-filters]]
 === Executing Filters
 
 In the previous section, we skipped over a little detail called the document score (`_score` field in the search results). The score is a numeric value that is a relative measure of how well the document matches the search query that we specified. The higher the score, the more relevant the document is, the lower the score, the less relevant the document is.
@@ -1102,6 +1119,7 @@ Dissecting the above, the bool query contains a `match_all` query (the query par
 
 In addition to the `match_all`, `match`, `bool`, and `range` queries, there are a lot of other query types that are available and we won't go into them here. Since we already have a basic understanding of how they work, it shouldn't be too difficult to apply this knowledge in learning and experimenting with the other query types.
 
+[[getting-started-aggregations]]
 === Executing Aggregations
 
 Aggregations provide the ability to group and extract statistics from your data. The easiest way to think about aggregations is by roughly equating it to the SQL GROUP BY and the SQL aggregate functions. In Elasticsearch, you have the ability to execute searches returning hits and at the same time return aggregated results separate from the hits all in one response. This is very powerful and efficient in the sense that you can run queries and multiple aggregations and get the results back of both (or either) operations in one shot avoiding network roundtrips using a concise and simplified API.
@@ -1305,6 +1323,7 @@ GET /bank/_search
 
 There are many other aggregations capabilities that we won't go into detail here. The {ref}/search-aggregations.html[aggregations reference guide] is a great starting point if you want to do further experimentation.
 
+[[getting-started-conclusion]]
 == Conclusion
 
 Elasticsearch is both a simple and complex product. We've so far learned the basics of what it is, how to look inside of it, and how to work with it using some of the REST APIs. Hopefully this tutorial has given you a better understanding of what Elasticsearch is and more importantly, inspired you to further experiment with the rest of its great features!
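
Every hunk in this patch does the same thing: it places a `[[getting-started-...]]` anchor on the line directly above a `==` or `===` heading. A minimal sketch of how that convention could be checked across the file follows. It is not part of the patch; the function name `headings_missing_anchor` and both regular expressions are hypothetical, and it assumes an anchor always sits immediately above its heading, as it does in the hunks here.

```python
import re

# Assumption: an anchor is a line of the form [[some-id]] placed directly
# above the heading, as in the hunks of this patch.
ANCHOR = re.compile(r"^\[\[[A-Za-z0-9_.:-]+\]\]$")
# Assumption: only level-2 ("==") and level-3 ("===") headings are of interest.
HEADING = re.compile(r"^(={2,3}) (.+)$")


def headings_missing_anchor(text):
    """Return the titles of headings that have no anchor on the line above."""
    missing = []
    previous = ""
    for line in text.splitlines():
        match = HEADING.match(line)
        if match and not ANCHOR.match(previous.strip()):
            missing.append(match.group(2))
        previous = line
    return missing


sample = """\
[[getting-started-concepts]]
== Basic Concepts

Some text.

=== Cluster Health
"""

# "Basic Concepts" has an anchor above it; "Cluster Health" does not.
print(headings_missing_anchor(sample))
```

Headings preceded by `[float]` rather than an anchor would also be reported by this sketch, so its output is a list of candidates to review, not a list of errors.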