mirror of https://github.com/apache/lucene.git
SOLR-13129: nested docs: add more/better documentation in Solr ref-guide
parent 2afa6cb00a
commit c2a6772f1e
solr
CHANGES.txt
solr-ref-guide/src
blockjoin-faceting.adoc
indexing-and-basic-data-operations.adoc
indexing-nested-documents.adoc
json-facet-api.adoc
json-faceting-domain-changes.adoc
other-parsers.adoc
searching-nested-documents.adoc
searching.adoc
transforming-and-indexing-custom-json.adoc
transforming-result-documents.adoc
uploading-data-with-index-handlers.adoc
|
@ -295,7 +295,7 @@ New Features
|
||||||
|
|
||||||
* SOLR-12639: Umbrella JIRA for adding support HTTP/2 (Cao Manh Dat)
|
* SOLR-12639: Umbrella JIRA for adding support HTTP/2 (Cao Manh Dat)
|
||||||
|
|
||||||
* SOLR-12768: Improved nested document support, and enabled in the default schema with the presence of _nest_path_.
|
* SOLR-12768, SOLR-13129: Improved nested document support, and enabled in the default schema with the presence of _nest_path_.
|
||||||
When this field is present, certain things happen automatically. An internal URP is automatically used to populate
|
When this field is present, certain things happen automatically. An internal URP is automatically used to populate
|
||||||
it. The [child] (doc transformer) will return a hierarchy with relationships; no params needed. The relationship
|
it. The [child] (doc transformer) will return a hierarchy with relationships; no params needed. The relationship
|
||||||
path is indexed for use in queries (can be disabled if not needed). Also, child documents needn't provide a uniqueKey
|
path is indexed for use in queries (can be disabled if not needed). Also, child documents needn't provide a uniqueKey
|
||||||
|
|
|
@ -41,7 +41,7 @@ This example shows how you could add this search components to `solrconfig.xml`
|
||||||
|
|
||||||
This component can be added into any search request handler. This component works with distributed search in SolrCloud mode.
|
This component can be added into any search request handler. This component works with distributed search in SolrCloud mode.
|
||||||
|
|
||||||
Documents should be added in children-parent blocks as described in <<uploading-data-with-index-handlers.adoc#nested-child-documents,indexing nested child documents>>. Examples:
|
Documents should be added in children-parent blocks as described in <<indexing-nested-documents.adoc#indexing-nested-documents,indexing nested child documents>>. Examples:
|
||||||
|
|
||||||
.Sample document
|
.Sample document
|
||||||
[source,xml]
|
[source,xml]
|
||||||
|
|
|
@ -1,5 +1,16 @@
|
||||||
= Indexing and Basic Data Operations
|
= Indexing and Basic Data Operations
|
||||||
:page-children: introduction-to-solr-indexing, post-tool, uploading-data-with-index-handlers, uploading-data-with-solr-cell-using-apache-tika, uploading-structured-data-store-data-with-the-data-import-handler, updating-parts-of-documents, detecting-languages-during-indexing, de-duplication, content-streams, reindexing
|
:page-children: introduction-to-solr-indexing, +
|
||||||
|
post-tool, +
|
||||||
|
uploading-data-with-index-handlers, +
|
||||||
|
indexing-nested-documents, +
|
||||||
|
uploading-data-with-solr-cell-using-apache-tika, +
|
||||||
|
uploading-structured-data-store-data-with-the-data-import-handler, +
|
||||||
|
updating-parts-of-documents, +
|
||||||
|
detecting-languages-during-indexing, +
|
||||||
|
de-duplication, +
|
||||||
|
content-streams, +
|
||||||
|
reindexing
|
||||||
|
|
||||||
// Licensed to the Apache Software Foundation (ASF) under one
|
// Licensed to the Apache Software Foundation (ASF) under one
|
||||||
// or more contributor license agreements. See the NOTICE file
|
// or more contributor license agreements. See the NOTICE file
|
||||||
// distributed with this work for additional information
|
// distributed with this work for additional information
|
||||||
|
@ -27,6 +38,8 @@ This section describes how Solr adds data to its index. It covers the following
|
||||||
|
|
||||||
* *<<transforming-and-indexing-custom-json.adoc#transforming-and-indexing-custom-json,Transforming and Indexing Custom JSON>>*: Index any JSON of your choice
|
* *<<transforming-and-indexing-custom-json.adoc#transforming-and-indexing-custom-json,Transforming and Indexing Custom JSON>>*: Index any JSON of your choice
|
||||||
|
|
||||||
|
* *<<indexing-nested-documents.adoc#indexing-nested-documents,Indexing Nested Documents>>*: Detailed information about indexing and schema configuration for nested documents.
|
||||||
|
|
||||||
* *<<uploading-data-with-solr-cell-using-apache-tika.adoc#uploading-data-with-solr-cell-using-apache-tika,Uploading Data with Solr Cell using Apache Tika>>*: Information about using the Solr Cell framework to upload data for indexing.
|
* *<<uploading-data-with-solr-cell-using-apache-tika.adoc#uploading-data-with-solr-cell-using-apache-tika,Uploading Data with Solr Cell using Apache Tika>>*: Information about using the Solr Cell framework to upload data for indexing.
|
||||||
|
|
||||||
* *<<uploading-structured-data-store-data-with-the-data-import-handler.adoc#uploading-structured-data-store-data-with-the-data-import-handler,Uploading Structured Data Store Data with the Data Import Handler>>*: Information about uploading and indexing data from a structured data store.
|
* *<<uploading-structured-data-store-data-with-the-data-import-handler.adoc#uploading-structured-data-store-data-with-the-data-import-handler,Uploading Structured Data Store Data with the Data Import Handler>>*: Information about uploading and indexing data from a structured data store.
|
||||||
|
|
|
@ -0,0 +1,151 @@
|
||||||
|
= Indexing Nested Child Documents
|
||||||
|
// Licensed to the Apache Software Foundation (ASF) under one
|
||||||
|
// or more contributor license agreements. See the NOTICE file
|
||||||
|
// distributed with this work for additional information
|
||||||
|
// regarding copyright ownership. The ASF licenses this file
|
||||||
|
// to you under the Apache License, Version 2.0 (the
|
||||||
|
// "License"); you may not use this file except in compliance
|
||||||
|
// with the License. You may obtain a copy of the License at
|
||||||
|
//
|
||||||
|
// http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
//
|
||||||
|
// Unless required by applicable law or agreed to in writing,
|
||||||
|
// software distributed under the License is distributed on an
|
||||||
|
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||||
|
// KIND, either express or implied. See the License for the
|
||||||
|
// specific language governing permissions and limitations
|
||||||
|
// under the License.
|
||||||
|
|
||||||
|
Solr supports indexing nested documents, described here, and ways to <<searching-nested-documents.adoc#searching-nested-documents,search and retrieve>> them very efficiently.
|
||||||
|
By way of example, nested documents in Solr can be used to bind a blog post (parent document) with comments (child documents)
|
||||||
|
-- or products as parent documents and sizes, colors, or other variations as child documents. +
|
||||||
|
The parent together with all of its children is referred to as a nested document or "block", which explains some of the nomenclature of related features.
|
||||||
|
At query time, the <<other-parsers.adoc#block-join-query-parsers,Block Join Query Parsers>> can search these relationships,
|
||||||
|
and the `<<transforming-result-documents.adoc#child-childdoctransformerfactory,[child]>>` Document Transformer can attach child documents to the result documents.
|
||||||
|
In terms of performance, indexing the relationships between documents usually yields much faster queries than an equivalent "query time join",
|
||||||
|
since the relationships are already stored in the index and do not need to be computed.
|
||||||
|
However, nested documents are less flexible than query time joins, as they impose rules that some applications may not be able to accept.
|
||||||
|
Nested documents may be indexed via either the XML or JSON data syntax, and are also supported by <<using-solrj.adoc#using-solrj,SolrJ>> with javabin.
|
||||||
|
|
||||||
|
[NOTE]
|
||||||
|
====
|
||||||
|
.Limitation
|
||||||
|
With the exception of in-place updates, the whole block must be updated or deleted together, not separately. For some applications this may result in a large amount of extra indexing and thus may be a deal-breaker.
|
||||||
|
====
|
||||||
|
|
||||||
|
== Schema Configuration
|
||||||
|
|
||||||
|
* The schema must include an indexed field `\_root_`. Solr automatically populates this with the value of the top/parent ID. +
|
||||||
|
`<field name="\_root_" type="string" indexed="true" stored="false" docValues="false" />`
|
||||||
|
* `\_nest_path_` is populated by Solr automatically with the path of the document in the hierarchy for non-root documents. This field is optional. +
|
||||||
|
`<fieldType name="\_nest_path_" class="solr.NestPathField" />
|
||||||
|
<field name="\_nest_path_" type="_nest_path_" />`
|
||||||
|
* `\_nest_parent_` is populated by Solr automatically to store the ID of each document's parent document (if there is one). This field is optional. +
|
||||||
|
`<field name="\_nest_parent_" type="string" indexed="true" stored="true"/>`
|
||||||
|
* Nested documents are very much documents in their own right even if certain nested documents hold different information from the parent.
|
||||||
|
Therefore:
|
||||||
|
** a field can only be configured one way no matter what sort of document uses it
|
||||||
|
** it may be infeasible to use `required`
|
||||||
|
** even child documents need a unique ID
|
||||||
|
* Even though child documents are provided as field values syntactically and with SolrJ, it's a matter of syntax and it isn't an actual field in the schema.
|
||||||
|
Consequently, the field need not be defined in the schema and probably shouldn't be as it would be confusing.
|
||||||
|
There is no child document field type, at least not yet.
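The field declarations above could also be expressed as a Schema API request body. The following is a minimal sketch, assuming the Schema API's `add-field-type` and `add-field` commands are used and that none of these fields exist yet (many default configSets already define `\_root_`, in which case that entry would have to be dropped). The function name is hypothetical.

```python
import json

def nested_schema_commands():
    """Build a Schema API body declaring the nested-document fields (a sketch)."""
    return {
        # Field type backing _nest_path_, as declared in the list above
        "add-field-type": {"name": "_nest_path_", "class": "solr.NestPathField"},
        "add-field": [
            # Required: Solr fills this with the ID of the root document
            {"name": "_root_", "type": "string", "indexed": True,
             "stored": False, "docValues": False},
            # Optional: path of each non-root document in the hierarchy
            {"name": "_nest_path_", "type": "_nest_path_"},
            # Optional: ID of each document's direct parent
            {"name": "_nest_parent_", "type": "string", "indexed": True,
             "stored": True},
        ],
    }

body = json.dumps(nested_schema_commands(), indent=2)
print(body)
```

The body would be sent as JSON to the collection's schema endpoint; sending it is left out so the sketch runs without a Solr instance.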
|
||||||
|
|
||||||
|
=== Rudimentary Root-only Schemas
|
||||||
|
|
||||||
|
These schemas do not contain any other nested related fields apart from `\_root_`.
|
||||||
|
Many schemas in existence are this way simply because default configSets are this way, even if the application isn't using nested documents.
|
||||||
|
If an application uses nested documents with such a schema, keep in mind that some related features aren't as effective since there is less information. Mainly, the <<searching-nested-documents.adoc#child-doc-transformer,[child]>> transformer returns matching children in a flat list (not nested) and it's attached to the parent using the special field name `\_childDocuments_`.
|
||||||
|
|
||||||
|
With such a schema, typically you should have a field that differentiates a root doc from any nested children.
|
||||||
|
However, this isn't strictly necessary, so long as it's possible to write a query that selects only root documents.
|
||||||
|
Such a query is needed for the <<other-parsers.adoc#block-join-query-parsers,block join query parsers>> and <<searching-nested-documents.adoc#child-doc-transformer,[child]>> doc transformer to function.
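As an illustration of why a root-selecting query matters, the sketch below builds block join query strings around one. It assumes root documents carry `content_type:parentDocument`, as in the examples on this page; the helper names are hypothetical and the child and parent queries are made up for illustration.

```python
def parents_of(root_filter, child_query):
    """Query for parents whose children match child_query."""
    return '{!parent which="%s"}%s' % (root_filter, child_query)

def children_of(root_filter, parent_query):
    """Query for children whose parents match parent_query."""
    return '{!child of="%s"}%s' % (root_filter, parent_query)

# Assumption: this filter matches root documents and nothing else
ROOTS = "content_type:parentDocument"

print(parents_of(ROOTS, "comments:features"))
print(children_of(ROOTS, "title:lucene"))
```

Both parsers take the same root filter; only the direction of the relationship differs.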
|
||||||
|
|
||||||
|
=== XML Examples
|
||||||
|
|
||||||
|
Here are two documents and their child documents.
|
||||||
|
They illustrate two styles of adding child documents: the first is associated via a labelled field (preferred),
|
||||||
|
and the second is done in the classic way now referred to as an "anonymous" or "unlabelled" child document.
|
||||||
|
This field label relationship is available to the URP chain in Solr but is ultimately discarded unless the special fields are defined.
|
||||||
|
|
||||||
|
[source,xml]
|
||||||
|
----
|
||||||
|
<add>
|
||||||
|
<doc>
|
||||||
|
<field name="ID">1</field>
|
||||||
|
<field name="title">Solr adds block join support</field>
|
||||||
|
<field name="content_type">parentDocument</field>
|
||||||
|
<field name="comments">
|
||||||
|
<doc>
|
||||||
|
<field name="ID">2</field>
|
||||||
|
<field name="content">SolrCloud supports it too!</field>
|
||||||
|
</doc>
|
||||||
|
</field>
|
||||||
|
</doc>
|
||||||
|
<doc>
|
||||||
|
<field name="ID">3</field>
|
||||||
|
<field name="title">New Lucene and Solr release is out</field>
|
||||||
|
<field name="content_type">parentDocument</field>
|
||||||
|
<doc>
|
||||||
|
<field name="ID">4</field>
|
||||||
|
<field name="comments">Lots of new features</field>
|
||||||
|
</doc>
|
||||||
|
</doc>
|
||||||
|
</add>
|
||||||
|
----
|
||||||
|
|
||||||
|
In this example, we have indexed the parent documents with the field `content_type`, which has the value "parentDocument".
|
||||||
|
We could have also used a boolean field, such as `isParent`, with a value of "true", or any other similar approach.
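For applications that assemble the XML update message programmatically, a sketch using Python's standard `xml.etree.ElementTree` might look like the following. It builds only the first (labelled) document; the `comments` label and `content` child field follow the JSON example on this page, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

def add_field(doc, name, value):
    """Append <field name="...">value</field> to a <doc> element."""
    field = ET.SubElement(doc, "field", name=name)
    field.text = value
    return field

def build_update():
    add = ET.Element("add")
    parent = ET.SubElement(add, "doc")
    add_field(parent, "ID", "1")
    add_field(parent, "title", "Solr adds block join support")
    add_field(parent, "content_type", "parentDocument")
    # Labelled child: the child <doc> is wrapped in a <field> carrying the label
    label = ET.SubElement(parent, "field", name="comments")
    child = ET.SubElement(label, "doc")
    add_field(child, "ID", "2")
    add_field(child, "content", "SolrCloud supports it too!")
    return add

xml_body = ET.tostring(build_update(), encoding="unicode")
print(xml_body)
```

An anonymous child would instead be appended directly to the parent `<doc>` without the wrapping `<field>`.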
|
||||||
|
|
||||||
|
=== JSON Examples
|
||||||
|
|
||||||
|
This example closely mirrors the XML example above.
|
||||||
|
Again, the field labelled relationship is preferred.
|
||||||
|
The labelled relationship here is an array of child documents; a single child document could also be given without the array brackets.
|
||||||
|
For the anonymous relationship, note the special `\_childDocuments_` key whose contents must be an array of child documents.
|
||||||
|
|
||||||
|
[source,json]
|
||||||
|
----
|
||||||
|
[
|
||||||
|
{
|
||||||
|
"ID": "1",
|
||||||
|
"title": "Solr adds block join support",
|
||||||
|
"content_type": "parentDocument",
|
||||||
|
"comments": [{
|
||||||
|
"ID": "2",
|
||||||
|
"content": "SolrCloud supports it too!"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"ID": "3",
|
||||||
|
"content": "New filter syntax"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"ID": "4",
|
||||||
|
"title": "New Lucene and Solr release is out",
|
||||||
|
"content_type": "parentDocument",
|
||||||
|
"_childDocuments_": [
|
||||||
|
{
|
||||||
|
"ID": "5",
|
||||||
|
"comments": "Lots of new features"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
----
|
||||||
|
|
||||||
|
.Root-Only Mode
|
||||||
|
[NOTE]
|
||||||
|
In Root-only schemas, these two documents will result in the same docs being indexed (Root-only schemas do not honor nested relationships).
|
||||||
|
When queried, child docs will be appended to the `\_childDocuments_` field/key.
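A JSON update like the one above could be prepared from a client as sketched below. This is only an outline under stated assumptions: the base URL, the `techproducts` collection name, and the helper name are placeholders, and the request is built but not sent so the sketch runs without a Solr instance.

```python
import json
import urllib.request

# One parent with a labelled child, taken from the JSON example above
docs = [{
    "ID": "1",
    "title": "Solr adds block join support",
    "content_type": "parentDocument",
    "comments": [{"ID": "2", "content": "SolrCloud supports it too!"}],
}]

def build_update_request(base_url, collection, documents):
    """Prepare a JSON update request for the given collection (hypothetical helper)."""
    url = f"{base_url}/solr/{collection}/update?commit=true"
    return urllib.request.Request(
        url,
        data=json.dumps(documents).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_update_request("http://localhost:8983", "techproducts", docs)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send the update.
```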
|
||||||
|
|
||||||
|
=== Important: Maintaining Integrity with Updates and Deletes
|
||||||
|
|
||||||
|
Nested documents (children and all) can simply be replaced by adding a new document with more or fewer child documents, as an application desires. This aspect is no different from updating any normal document, except that Solr takes care to ensure that all related child documents of the existing version get deleted.
|
||||||
|
|
||||||
|
Do *not* add a root document that has the same ID as a child document. _This will violate integrity assumptions that Solr expects._
|
||||||
|
|
||||||
|
To delete a nested document, you can delete it by the ID of the root document.
|
||||||
|
If you try to use an ID of a child document, nothing will happen since only root document IDs are considered.
|
||||||
|
If you use Solr's delete-by-query APIs, you *have to be careful* to ensure that no children remain of any documents that are being deleted. _Doing otherwise will violate integrity assumptions that Solr expects._
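The two deletion styles can be sketched as JSON update commands. The sketch assumes that `\_root_` holds the root ID on every document of a block (as described under Schema Configuration), so that a delete-by-query on `\_root_` removes the root together with all of its descendants; the function names are hypothetical.

```python
import json

def delete_block_by_root_id(root_id):
    """Delete-by-id command; only root document IDs are considered."""
    return {"delete": {"id": root_id}}

def delete_block_by_root_query(root_id):
    """Delete-by-query command matching on _root_, so no children are left behind."""
    return {"delete": {"query": f'_root_:"{root_id}"'}}

print(json.dumps(delete_block_by_root_id("1")))
print(json.dumps(delete_block_by_root_query("1")))
```

A delete-by-query that matches only some documents of a block, for example by a field that exists solely on the root, is the situation the warning above is about.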
|
|
@ -757,7 +757,7 @@ Most stat facet functions (`avg`, `sumsq`, etc.) allow users to perform math com
|
||||||
|
|
||||||
=== uniqueBlock() and Block Join Counts
|
=== uniqueBlock() and Block Join Counts
|
||||||
|
|
||||||
When a collection contains <<uploading-data-with-index-handlers.adoc#nested-child-documents, Block Join child documents>>, the `blockChildren` and `blockParent` <<json-faceting-domain-changes.adoc#block-join-domain-changes, domain changes>> can be useful when searching for parent documents and you want to compute stats against all of the affected children documents (or vice versa).
|
When a collection contains <<indexing-nested-documents.adoc#indexing-nested-documents, Nested Documents>>, the `blockChildren` and `blockParent` <<json-faceting-domain-changes.adoc#block-join-domain-changes, domain changes>> can be useful when searching for parent documents and you want to compute stats against all of the affected children documents (or vice versa).
|
||||||
But if you only need to know the _count_ of all the blocks that exist in the current domain, a more efficient option is the `uniqueBlock()` aggregate function.
|
But if you only need to know the _count_ of all the blocks that exist in the current domain, a more efficient option is the `uniqueBlock()` aggregate function.
|
||||||
|
|
||||||
Suppose we have products with multiple SKUs, and we want to count products for each color.
|
Suppose we have products with multiple SKUs, and we want to count products for each color.
|
||||||
|
|
|
@ -177,7 +177,7 @@ NOTE: While a `query` domain can be combined with an additional domain `filter`,
|
||||||
|
|
||||||
== Block Join Domain Changes
|
== Block Join Domain Changes
|
||||||
|
|
||||||
When a collection contains <<uploading-data-with-index-handlers.adoc#nested-child-documents, Block Join child documents>>, the `blockChildren` or `blockParent` domain options can be used to transform an existing domain containing one type of document, into a domain containing the documents with the specified relationship (child or parent of) to the documents from the original domain.
|
When a collection contains <<indexing-nested-documents.adoc#indexing-nested-documents, Nested Documents>>, the `blockChildren` or `blockParent` domain options can be used to transform an existing domain containing one type of document, into a domain containing the documents with the specified relationship (child or parent of) to the documents from the original domain.
|
||||||
|
|
||||||
Both of these options work similarly to the corresponding <<other-parsers.adoc#block-join-query-parsers,Block Join Query Parsers>> by taking in a single String query that exclusively matches all parent documents in the collection. If `blockParent` is used, then the resulting domain will contain all parent documents of the children from the original domain. If `blockChildren` is used, then the resulting domain will contain all child documents of the parents from the original domain.
|
Both of these options work similarly to the corresponding <<other-parsers.adoc#block-join-query-parsers,Block Join Query Parsers>> by taking in a single String query that exclusively matches all parent documents in the collection. If `blockParent` is used, then the resulting domain will contain all parent documents of the children from the original domain. If `blockChildren` is used, then the resulting domain will contain all child documents of the parents from the original domain.
|
||||||
|
|
||||||
|
|
|
@ -24,7 +24,7 @@ Many of these parsers are expressed the same way as <<local-parameters-in-querie
|
||||||
|
|
||||||
== Block Join Query Parsers
|
== Block Join Query Parsers
|
||||||
|
|
||||||
There are two query parsers that support block joins. These parsers allow indexing and searching for relational content that has been <<uploading-data-with-index-handlers.adoc#nested-child-documents, indexed as nested documents>>.
|
There are two query parsers that support block joins. These parsers allow indexing and searching for relational content that has been <<indexing-nested-documents.adoc#indexing-nested-documents, indexed as Nested Documents>>.
|
||||||
|
|
||||||
The example usage of the query parsers below assumes these two documents and each of their child documents have been indexed:
|
The example usage of the query parsers below assumes these two documents and each of their child documents have been indexed:
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,202 @@
|
||||||
|
= Searching Nested Child Documents
|
||||||
|
// Licensed to the Apache Software Foundation (ASF) under one
|
||||||
|
// or more contributor license agreements. See the NOTICE file
|
||||||
|
// distributed with this work for additional information
|
||||||
|
// regarding copyright ownership. The ASF licenses this file
|
||||||
|
// to you under the Apache License, Version 2.0 (the
|
||||||
|
// "License"); you may not use this file except in compliance
|
||||||
|
// with the License. You may obtain a copy of the License at
|
||||||
|
//
|
||||||
|
// http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
//
|
||||||
|
// Unless required by applicable law or agreed to in writing,
|
||||||
|
// software distributed under the License is distributed on an
|
||||||
|
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||||
|
// KIND, either express or implied. See the License for the
|
||||||
|
// specific language governing permissions and limitations
|
||||||
|
// under the License.
|
||||||
|
|
||||||
|
This section presents techniques which can be used for searching deeply nested documents,
|
||||||
|
showcasing how more complex queries can be constructed using some of Solr's query parsers and Doc Transformers.
|
||||||
|
These features require `\_root_` and `\_nest_path_` to be declared in the schema. +
|
||||||
|
Please refer to the <<indexing-nested-documents.adoc#indexing-nested-documents, Indexing Nested Documents>>
|
||||||
|
section for more details about schema and index configuration.
|
||||||
|
|
||||||
|
|
||||||
|
[NOTE]
|
||||||
|
This section does not showcase faceting on nested documents. For nested document faceting, please refer to the
|
||||||
|
<<blockjoin-faceting#blockjoin-faceting, Block Join Faceting>> section.
|
||||||
|
|
||||||
|
== Query Examples
|
||||||
|
|
||||||
|
For the upcoming examples, assume the following documents have been indexed:
|
||||||
|
|
||||||
|
[source,json]
|
||||||
|
----
|
||||||
|
[
|
||||||
|
{
|
||||||
|
"ID": "1",
|
||||||
|
"title": "Cooking Recommendations",
|
||||||
|
"tags": ["cooking", "meetup"],
|
||||||
|
"posts": [{
|
||||||
|
"ID": "2",
|
||||||
|
"title": "Cookies",
|
||||||
|
"comments": [{
|
||||||
|
"ID": "3",
|
||||||
|
"content": "Lovely recipe"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"ID": "4",
|
||||||
|
"content": "A-"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"ID": "5",
|
||||||
|
"title": "Cakes"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"ID": "6",
|
||||||
|
"title": "For Hire",
|
||||||
|
"tags": ["professional", "jobs"],
|
||||||
|
"posts": [{
|
||||||
|
"ID": "7",
|
||||||
|
"title": "Search Engineer",
|
||||||
|
"comments": [{
|
||||||
|
"ID": "8",
|
||||||
|
"content": "I am interested"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"ID": "9",
|
||||||
|
"content": "How large is the team?"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"ID": "10",
|
||||||
|
"title": "Low level Engineer"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
----
|
||||||
|
|
||||||
|
=== Child Doc Transformer
|
||||||
|
|
||||||
|
Can be used to enrich query results with the documents' descendants. +
|
||||||
|
For a detailed explanation of this transformer, see the section <<transforming-result-documents.adoc#child-childdoctransformerfactory, [child] - ChildDocTransformerFactory>>.
|
||||||
|
|
||||||
|
For example, let us examine this query:
|
||||||
|
`q=ID:1,
|
||||||
|
fl=ID,[child childFilter=/comments/content:recipe]`. +
|
||||||
|
The Child Doc Transformer can be used to enrich matching docs with comments that match a particular filter. +
|
||||||
|
In this particular query, the child Filter will only match the first comment of doc(ID:1),
|
||||||
|
therefore only that particular comment will be appended to the result. The path prefix in front of the field name in `childFilter` is a special syntax feature.
|
||||||
|
|
||||||
|
[source,json]
|
||||||
|
----
|
||||||
|
{ "response":{"numFound":1,"start":0,"docs":[
|
||||||
|
{
|
||||||
|
"ID": "1",
|
||||||
|
"title": "Cooking Recommendations",
|
||||||
|
"tags": ["cooking", "meetup"],
|
||||||
|
"posts": [{
|
||||||
|
"ID": "2",
|
||||||
|
"title": "Cookies",
|
||||||
|
"comments": [{
|
||||||
|
"ID": "3",
|
||||||
|
"content": "Lovely recipe"
|
||||||
|
}]
|
||||||
|
}]
|
||||||
|
}]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
----
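Because the transformer returns a hierarchy, a client has to walk the child labels to reach the nested documents. The sketch below does that for a response shaped like the one above; the `collect` helper is hypothetical and the response is trimmed to the fields it needs.

```python
import json

# Trimmed copy of the response shown above
response = json.loads("""
{"response": {"numFound": 1, "start": 0, "docs": [
  {"ID": "1", "posts": [
    {"ID": "2", "title": "Cookies",
     "comments": [{"ID": "3", "content": "Lovely recipe"}]}
  ]}
]}}
""")

def collect(docs, labels):
    """Follow child labels level by level and return the documents at the end."""
    current = list(docs)
    for label in labels:
        found = []
        for doc in current:
            value = doc.get(label, [])
            # A label may hold one child document or a list of them
            found.extend(value if isinstance(value, list) else [value])
        current = found
    return current

comments = collect(response["response"]["docs"], ["posts", "comments"])
print([c["content"] for c in comments])
```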
|
||||||
|
|
||||||
|
=== Children Query Parser
|
||||||
|
|
||||||
|
Can be used to retrieve children of a matching document. +
|
||||||
|
For a detailed explanation of this parser, see the section <<other-parsers.adoc#block-join-children-query-parser, Block Join Children Query Parser>>.
|
||||||
|
|
||||||
|
For example, let us examine this query:
|
||||||
|
`q={!child of='_nest_path_:/posts'}title:"Search Engineer"`. +
|
||||||
|
The `'of'` filter matches all posts, identifying the level of the hierarchy whose documents act as parents.
|
||||||
|
The second part of the query selects the parents whose children we wish to return. +
|
||||||
|
In this example, all comments of posts which had "Search Engineer" in their `title` field will be returned.
|
||||||
|
|
||||||
|
[source,json]
|
||||||
|
----
|
||||||
|
{ "response":{"numFound":2,"start":0,"docs":[
|
||||||
|
{
|
||||||
|
"ID": "8",
|
||||||
|
"content": "I am interested"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"ID": "9",
|
||||||
|
"content": "How large is the team?"
|
||||||
|
}
|
||||||
|
]}
|
||||||
|
}
|
||||||
|
----
|
||||||
|
|
||||||
|
=== Parents Query Parser
|
||||||
|
|
||||||
|
Can be used to retrieve parents of a child document. +
|
||||||
|
For a detailed explanation of this parser, see the section <<other-parsers.adoc#block-join-parent-query-parser,Block Join Parent Query Parser>>.
|
||||||
|
|
||||||
|
For example, let us examine this query:
|
||||||
|
`q={!parent which='-_nest_path_:* \*:*'}title:"Search Engineer"`. +
|
||||||
|
The `'which'` filter returns all root documents.
|
||||||
|
The second part of this query is a filter to match some child documents.
|
||||||
|
This query returns the root-level parent of each matching child document (the `which` filter matches only root documents).
|
||||||
|
In this case, the matching child documents are those with `Search Engineer` in their `title` field.
|
||||||
|
|
||||||
|
[source,json]
|
||||||
|
----
|
||||||
|
{ "response":{"numFound":1,"start":0,"docs":[{
|
||||||
|
"ID": "6",
|
||||||
|
"title": "For Hire",
|
||||||
|
"tags": ["professional", "jobs"]
|
||||||
|
}
|
||||||
|
]}
|
||||||
|
}
|
||||||
|
----
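To make the semantics of this query concrete, the following sketch emulates it in plain Python over a trimmed copy of the sample documents. It is not how Solr evaluates block joins; it uses a simple substring test where Solr would run a phrase query, and the helper names are hypothetical.

```python
def iter_descendants(doc):
    """Yield every nested child document below doc, at any depth."""
    for value in doc.values():
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, dict):
                yield child
                yield from iter_descendants(child)

def roots_with_matching_descendant(roots, field, phrase):
    """IDs of root documents that have a descendant whose field contains phrase."""
    return [root["ID"] for root in roots
            if any(phrase in str(d.get(field, "")) for d in iter_descendants(root))]

# Trimmed copy of the sample documents from the start of this section
roots = [
    {"ID": "1", "title": "Cooking Recommendations", "tags": ["cooking", "meetup"],
     "posts": [{"ID": "2", "title": "Cookies",
                "comments": [{"ID": "3", "content": "Lovely recipe"}]},
               {"ID": "5", "title": "Cakes"}]},
    {"ID": "6", "title": "For Hire", "tags": ["professional", "jobs"],
     "posts": [{"ID": "7", "title": "Search Engineer",
                "comments": [{"ID": "8", "content": "I am interested"}]},
               {"ID": "10", "title": "Low level Engineer"}]},
]

print(roots_with_matching_descendant(roots, "title", "Search Engineer"))
```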
|
||||||
|
|
||||||
|
=== Combining Block Join Query Parsers with Child Doc Transformer
|
||||||
|
|
||||||
|
The combination of these two features enables seamless creation of powerful queries. +
|
||||||
|
For example, one can query for posts that are under a page tagged as a job and contain the words "Search Engineer".
|
||||||
|
The comments for matching posts can also be fetched, all done in a single Solr Query.
|
||||||
|
|
||||||
|
For example, let us examine this query:
|
||||||
|
`q=+{!child of='-\_nest_path_:* \*:*'}+tags:"jobs" &fl=*,[child]
|
||||||
|
&fq=\_nest_path_:/posts`. +
|
||||||
|
This query returns all posts and their comments, which had "Search Engineer" in their title,
|
||||||
|
and are indexed under a page tagged with "jobs".
|
||||||
|
The comments are appended to the matching posts, since the ChildDocTransformer is specified under the `fl` parameter.
|
||||||
|
|
||||||
|
[source,json]
|
||||||
|
----
|
||||||
|
{ "response":{"numFound":2,"start":0,"docs":[
|
||||||
|
{
|
||||||
|
"ID": "7",
|
||||||
|
"title": "Search Engineer",
|
||||||
|
"comments": [{
|
||||||
|
"ID": "8",
|
||||||
|
"content": "I am interested"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"ID": "9",
|
||||||
|
"content": "How large is the team?"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"ID": "10",
|
||||||
|
"title": "Low level Engineer"
|
||||||
|
}]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
----
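When a query like this is sent over HTTP, characters such as `+`, spaces, and quotes need URL encoding; an unencoded `+` would be read as a space. The sketch below encodes the parameters with Python's standard library. The parameter values are a reading of the query above with the AsciiDoc escapes removed, which is an assumption.

```python
from urllib.parse import urlencode, parse_qs

params = [
    # Children of root documents tagged "jobs" ...
    ("q", "+{!child of='-_nest_path_:* *:*'}+tags:\"jobs\""),
    # ... restricted to documents at the /posts level
    ("fq", "_nest_path_:/posts"),
    # Return all fields and attach descendants to each match
    ("fl", "*,[child]"),
]

query_string = urlencode(params)
print(query_string)
```

Decoding the string again with `parse_qs` yields the original values, which is a quick way to check that nothing was lost in encoding.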
|
|
@ -10,6 +10,7 @@
|
||||||
spell-checking, +
|
spell-checking, +
|
||||||
query-re-ranking, +
|
query-re-ranking, +
|
||||||
transforming-result-documents, +
|
transforming-result-documents, +
|
||||||
|
searching-nested-documents, +
|
||||||
suggester, +
|
suggester, +
|
||||||
morelikethis, +
|
morelikethis, +
|
||||||
pagination-of-results, +
|
pagination-of-results, +
|
||||||
|
@ -69,6 +70,7 @@ This section describes how Solr works with search requests. It covers the follow
|
||||||
** <<learning-to-rank.adoc#learning-to-rank,Learning To Rank>>: How to use LTR to run machine learned ranking models in Solr.
|
** <<learning-to-rank.adoc#learning-to-rank,Learning To Rank>>: How to use LTR to run machine learned ranking models in Solr.
|
||||||
|
|
||||||
* <<transforming-result-documents.adoc#transforming-result-documents,Transforming Result Documents>>: Detailed information about using `DocTransformers` to add computed information to individual documents
|
* <<transforming-result-documents.adoc#transforming-result-documents,Transforming Result Documents>>: Detailed information about using `DocTransformers` to add computed information to individual documents
|
||||||
|
* <<searching-nested-documents.adoc#searching-nested-documents,Searching Nested Documents>>: Detailed information about constructing nested and hierarchical queries.
|
||||||
* <<suggester.adoc#suggester,Suggester>>: Detailed information about Solr's powerful autosuggest component.
|
* <<suggester.adoc#suggester,Suggester>>: Detailed information about Solr's powerful autosuggest component.
|
||||||
* <<morelikethis.adoc#morelikethis,MoreLikeThis>>: Detailed information about Solr's similar results query component.
|
* <<morelikethis.adoc#morelikethis,MoreLikeThis>>: Detailed information about Solr's similar results query component.
|
||||||
* <<pagination-of-results.adoc#pagination-of-results,Pagination of Results>>: Detailed information about fetching paginated results for display in a UI, or for fetching all documents matching a query.
|
* <<pagination-of-results.adoc#pagination-of-results,Pagination of Results>>: Detailed information about fetching paginated results for display in a UI, or for fetching all documents matching a query.
@@ -777,106 +777,6 @@ curl 'http://localhost:8983/api/collections/techproducts/update/json' -H 'Conten
 ====
 --
 
-== Indexing Nested Documents
-
-The following is an example of indexing nested documents:
-
-[.dynamic-tabs]
---
-[example.tab-pane#v1nested]
-====
-[.tab-label]*V1 API*
-[source,bash]
-----
-curl 'http://localhost:8983/solr/techproducts/update/json/docs?split=/|/orgs'\
-    -H 'Content-type:application/json' -d '{
-  "name": "Joe Smith",
-  "phone": 876876687,
-  "orgs": [
-    {
-      "name": "Microsoft",
-      "city": "Seattle",
-      "zip": 98052
-    },
-    {
-      "name": "Apple",
-      "city": "Cupertino",
-      "zip": 95014
-    }
-  ]
-}'
-----
-====
-
-[example.tab-pane#v2nested]
-====
-[.tab-label]*V2 API Standalone Solr*
-[source,bash]
-----
-curl 'http://localhost:8983/api/cores/techproducts/update/json?split=/|/orgs'\
-    -H 'Content-type:application/json' -d '{
-  "name": "Joe Smith",
-  "phone": 876876687,
-  "orgs": [
-    {
-      "name": "Microsoft",
-      "city": "Seattle",
-      "zip": 98052
-    },
-    {
-      "name": "Apple",
-      "city": "Cupertino",
-      "zip": 95014
-    }
-  ]
-}'
-----
-====
-
-[example.tab-pane#v2nestedcloud]
-====
-[.tab-label]*V2 API SolrCloud*
-[source,bash]
-----
-curl 'http://localhost:8983/api/collections/techproducts/update/json?split=/|/orgs'\
-    -H 'Content-type:application/json' -d '{
-  "name": "Joe Smith",
-  "phone": 876876687,
-  "orgs": [
-    {
-      "name": "Microsoft",
-      "city": "Seattle",
-      "zip": 98052
-    },
-    {
-      "name": "Apple",
-      "city": "Cupertino",
-      "zip": 95014
-    }
-  ]
-}'
-----
-====
---
-
-With this example, the documents indexed would be, as follows:
-
-[source,json]
-----
-{
-  "name":"Joe Smith",
-  "phone":876876687,
-  "_childDocuments_":[
-    {
-      "name":"Microsoft",
-      "city":"Seattle",
-      "zip":98052},
-    {
-      "name":"Apple",
-      "city":"Cupertino",
-      "zip":95014}]}
-----
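The mapping from the request body to the indexed block shown in the two listings above can be sketched in Python. This is only an illustration of the input and output shapes, not Solr's implementation; `split_children` is a hypothetical helper name, and the sketch assumes a single child path (`orgs`) one level below the root.

```python
import json


def split_children(doc, child_key):
    """Return a copy of `doc` with the objects under `child_key`
    moved into an anonymous `_childDocuments_` list."""
    # Parent keeps every field except the one holding the children.
    parent = {k: v for k, v in doc.items() if k != child_key}
    children = doc.get(child_key, [])
    if isinstance(children, dict):  # tolerate a single child object
        children = [children]
    parent["_childDocuments_"] = [dict(child) for child in children]
    return parent


person = {
    "name": "Joe Smith",
    "phone": 876876687,
    "orgs": [
        {"name": "Microsoft", "city": "Seattle", "zip": 98052},
        {"name": "Apple", "city": "Cupertino", "zip": 95014},
    ],
}

block = split_children(person, "orgs")
print(json.dumps(block, indent=2))
```

The printed structure has the same fields as the "documents indexed" listing: the parent fields plus one `_childDocuments_` entry per organization.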
-
 == Tips for Custom JSON Indexing
 
 . Schemaless mode: This handles field creation automatically. The field guessing may not be exactly as you expect, but it works. The best thing to do is to setup a local server in schemaless mode, index a few sample docs and create those fields in your real setup with proper field types before indexing
@@ -124,7 +124,7 @@ A default style can be configured by specifying an `args` parameter in your `sol
 
 === [child] - ChildDocTransformerFactory
 
-This transformer returns all <<uploading-data-with-index-handlers.adoc#nested-child-documents,descendant documents>> of each parent document matching your query in a flat list nested inside the matching parent document. This is useful when you have indexed nested child documents and want to retrieve the child documents for the relevant parent documents for any type of search query.
+This transformer returns all <<indexing-nested-documents.adoc#indexing-nested-documents,descendant documents>> of each parent document matching your query in a flat list nested inside the matching parent document. This is useful when you have indexed nested child documents and want to retrieve the child documents for the relevant parent documents for any type of search query.
 
 [source,plain]
 ----
@@ -133,10 +133,10 @@ fl=id,[child parentFilter=doc_type:book childFilter=doc_type:chapter limit=100]
 
 Note that this transformer can be used even though the query itself is not a <<other-parsers.adoc#block-join-query-parsers,Block Join query>>.
 
-When using this transformer, the `parentFilter` parameter must be specified, and works the same as in all Block Join Queries. Additional optional parameters are:
+When using this transformer, the `parentFilter` parameter must be specified _unless_ the schema declares `\_nest_path_`. It works the same as in all Block Join Queries. Additional optional parameters are:
 
 `childFilter`::
-A query to filter which child documents should be included. This can be particularly useful when you have multiple levels of hierarchical documents. The default is all children.
+A query to filter which child documents should be included. This can be particularly useful when you have multiple levels of hierarchical documents. The default is all children. This query supports a special syntax to match nested doc patterns so long as `\_nest_path_` is defined in the schema and the query contains a `/` preceding the first `:`. Example: `childFilter=/comments/content:recipe` Further details of this are experimental.
 
 `limit`::
 The maximum number of child documents to be returned per parent document. The default is `10`.
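As a rough illustration of how the parameters in this hunk combine, the following Python sketch assembles the `[child]` transformer string for the `fl` parameter. The helper name `child_transformer` and the `q=*:*` query are assumptions made for the example, and the sketch does no quoting, so filter values that contain spaces would need extra handling.

```python
from urllib.parse import urlencode


def child_transformer(parent_filter=None, child_filter=None, limit=None):
    """Build a `[child ...]` transformer string from optional settings."""
    parts = ["child"]
    if parent_filter is not None:
        # May be omitted when the schema declares _nest_path_.
        parts.append(f"parentFilter={parent_filter}")
    if child_filter is not None:
        parts.append(f"childFilter={child_filter}")
    if limit is not None:
        # When omitted, the documented default of 10 children applies.
        parts.append(f"limit={limit}")
    return "[" + " ".join(parts) + "]"


# Same transformer as in the hunk header above.
fl = "id," + child_transformer("doc_type:book", "doc_type:chapter", 100)

# URL-encode for use in a request; q=*:* is only a placeholder query.
query_string = urlencode({"q": "*:*", "fl": fl})
print(fl)
print(query_string)
```

With a schema that declares `\_nest_path_`, `child_transformer(child_filter="/comments/content:recipe")` produces a transformer without `parentFilter`, which matches the relaxed requirement added in this commit.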
@@ -541,105 +541,3 @@ In addition to the `/update` handler, there is an additional CSV specific reques
 |===
 
 The `/update/csv` path may be useful for clients sending in CSV formatted update commands from applications where setting the Content-Type proves difficult.
-
-== Nested Child Documents
-
-Solr supports indexing nested documents such as a blog post parent document and comments as child documents -- or products as parent documents and sizes, colors, or other variations as child documents.
-The parent with all children is referred to as a "block" and it explains some of the nomenclature of related features.
-At query time, the <<other-parsers.adoc#block-join-query-parsers,Block Join Query Parsers>> can search these relationships,
-and the `[child]` <<transforming-result-documents.adoc#transforming-result-documents,Document Transformer>> can attach child documents to the result documents.
-In terms of performance, indexing the relationships between documents usually yields much faster queries than an equivalent "query time join",
-since the relationships are already stored in the index and do not need to be computed.
-However, nested documents are less flexible than query time joins as it imposes rules that some applications may not be able to accept.
-
-.Note
-[NOTE]
-====
-A big limitation is that the whole block of parent-children documents must be updated or deleted together, not separately.
-In other words, even if a single child document or the parent document is changed, the whole block of parent-child documents must be indexed together.
-_Solr does not enforce this rule_; if it's violated, you may get sporadic query failures or incorrect results.
-====
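One way a client might respect the whole-block rule in the note above is to rebuild the complete parent-plus-children structure whenever a single child changes, and then resend all of it. The Python sketch below only prepares that payload; `updated_block` is a hypothetical helper, and it assumes anonymous children stored one level deep under `_childDocuments_`.

```python
import copy


def updated_block(block, child_id, changes, child_key="_childDocuments_"):
    """Return a full copy of `block` with one child modified.

    The result still contains the parent and every sibling, so the
    entire block can be indexed again in a single update.
    """
    new_block = copy.deepcopy(block)  # leave the caller's data untouched
    for child in new_block.get(child_key, []):
        if child.get("id") == child_id:
            child.update(changes)
            return new_block
    raise KeyError(child_id)


stored = {
    "id": "3",
    "title": "New Lucene and Solr release is out",
    "content_type": "parentDocument",
    "_childDocuments_": [
        {"id": "4", "comments": "Lots of new features"},
    ],
}

to_send = updated_block(stored, "4", {"comments": "Lots of new features!"})
print(to_send)
```

The point of the design is that the function never returns the changed child alone: what goes back to Solr is always the parent together with all of its children.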
-
-Nested documents may be indexed via either the XML or JSON data syntax, and is also supported by <<using-solrj.adoc#using-solrj,SolrJ>> with javabin.
-
-=== Schema Notes
-
-* The schema must include an indexed, non-stored field `\_root_`. The value of that field is populated automatically and is the same for all documents in the block, regardless of the inheritance depth.
-* Nested documents are very much documents in their own right even if certain nested documents hold different information from the parent.
-Therefore:
-** the schema must be able to represent the fields of any document
-** it may be infeasible to use `required`
-** even child documents need a unique `id`
-* You must include a field that identifies the parent document as a parent; it can be any field that suits this purpose, and it will be used as input for the <<other-parsers.adoc#block-join-query-parsers,block join query parsers>>.
-* If you associate a child document as a field (e.g., comment), that field need not be defined in the schema, and probably
-shouldn't be as it would be confusing. There is no child document field type.
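The rule in the list above that even child documents need a unique `id` can be checked on the client before a block is sent. The following Python sketch is one possible pre-flight check, not part of Solr; `iter_docs` and `check_block_ids` are hypothetical names, and the sketch assumes that every JSON object found as a field value (or inside a list value) is a child document.

```python
def iter_docs(doc):
    """Yield `doc` and every nested child document, depth-first."""
    yield doc
    for value in doc.values():
        candidates = value if isinstance(value, list) else [value]
        for item in candidates:
            if isinstance(item, dict):  # assumed to be a child document
                yield from iter_docs(item)


def check_block_ids(block):
    """Return a list of problems: documents without an id, or ids used twice."""
    problems = []
    seen = set()
    for doc in iter_docs(block):
        doc_id = doc.get("id")
        if doc_id is None:
            problems.append(f"missing id: {doc!r}")
        elif doc_id in seen:
            problems.append(f"duplicate id: {doc_id}")
        else:
            seen.add(doc_id)
    return problems


labelled = {
    "id": "1",
    "title": "Solr adds block join support",
    "content_type": "parentDocument",
    "comment": {"id": "2", "comments": "SolrCloud supports it too!"},
}
print(check_block_ids(labelled))
```

The check covers both labelled children (such as `comment`) and anonymous ones under `_childDocuments_`, because it walks every object-valued field.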
-
-=== XML Examples
-
-For example, here are two documents and their child documents.
-It illustrates two styles of adding child documents; the first is associated via a field "comment" (preferred),
-and the second is done in the classic way now referred to as an "anonymous" or "unlabelled" child document.
-This field label relationship is available to the URP chain in Solr but is ultimately discarded.
-Solr 8 will save the relationship.
-
-[source,xml]
-----
-<add>
-  <doc>
-    <field name="id">1</field>
-    <field name="title">Solr adds block join support</field>
-    <field name="content_type">parentDocument</field>
-    <field name="content">
-      <doc>
-        <field name="id">2</field>
-        <field name="comments">SolrCloud supports it too!</field>
-      </doc>
-    </field>
-  </doc>
-  <doc>
-    <field name="id">3</field>
-    <field name="title">New Lucene and Solr release is out</field>
-    <field name="content_type">parentDocument</field>
-    <doc>
-      <field name="id">4</field>
-      <field name="comments">Lots of new features</field>
-    </doc>
-  </doc>
-</add>
-----
-
-In this example, we have indexed the parent documents with the field `content_type`, which has the value "parentDocument".
-We could have also used a boolean field, such as `isParent`, with a value of "true", or any other similar approach.
-
-=== JSON Examples
-
-This example is equivalent to the XML example above.
-Again, the field labelled relationship is preferred.
-The labelled relationship here is one child document but could have been wrapped in array brackets.
-For the anonymous relationship, note the special `\_childDocuments_` key whose contents must be an array of child documents.
-
-[source,json]
-----
-[
-  {
-    "id": "1",
-    "title": "Solr adds block join support",
-    "content_type": "parentDocument",
-    "comment": {
-      "id": "2",
-      "comments": "SolrCloud supports it too!"
-    }
-  },
-  {
-    "id": "3",
-    "title": "New Lucene and Solr release is out",
-    "content_type": "parentDocument",
-    "_childDocuments_": [
-      {
-        "id": "4",
-        "comments": "Lots of new features"
-      }
-    ]
-  }
-]
-----