lucene/solr/example/films/README.md

We have a movie data set in JSON, Solr XML, and CSV formats.
All three formats contain the same data, so you can use any one of them to index documents into Solr.

The data was fetched from Freebase, and its license is in the films-LICENSE.txt file.

This data consists of the following fields:
 * "id" - unique identifier for the movie
 * "name" - Name of the movie
 * "directed_by" - The person(s) who directed the making of the film
 * "initial_release_date" - The earliest official initial film screening date in any country
 * "genre" - The genre(s) that the movie belongs to

 Steps:
   * Start Solr:
     ```
     bin/solr start
     ```

   * Create a "films" core:
   
     ```
     bin/solr create -c films
     ```

   * Set the schema for two fields whose types Solr would otherwise guess differently than we'd like:
   
      ```
      curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
        "add-field" : {
          "name":"name",
          "type":"text_general",
          "multiValued":false,
          "stored":true
        },
        "add-field" : {
          "name":"initial_release_date",
          "type":"pdate",
          "stored":true
        }
      }'
      ```

   * Now let's index the data, using one of these three commands:

     - JSON: `bin/post -c films example/films/films.json`
     - XML: `bin/post -c films example/films/films.xml`
     - CSV:
       ```
       bin/post \
         -c films \
         example/films/films.csv \
         -params "f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|"
       ```

   * Let's get searching!
     - Search for 'Batman':
     
       http://localhost:8983/solr/films/query?q=name:batman

       * If you get an error about the name field not existing, you haven't yet indexed the data.
       * If you get no error but zero results, the _name_ field type override probably wasn't set before the data
         was first indexed, so the field ended up as a "string" type, which requires an exact, case-sensitive match.
         It's easiest to reset the environment and try again, making sure that each step executes successfully.

     - Show all 'Superhero' movies:
     
       http://localhost:8983/solr/films/query?q=*:*&fq=genre:%22Superhero%20movie%22

     - Let's see the distribution of genres across all the movies. See the facet section of the response for the counts:
     
       http://localhost:8983/solr/films/query?q=*:*&facet=true&facet.field=genre
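
The three query URLs above can also be built programmatically. The following is a minimal Python 3 sketch, assuming Solr is running locally on the default port with the "films" core. The helper names `build_query_url` and `pair_facets` are ours, not part of Solr, and the sample facet counts are made up purely to show the flat `[value, count, ...]` shape of the facet section.

```python
from urllib.parse import quote, urlencode

# Assumption: Solr runs locally on the default port with a "films" core.
BASE_URL = "http://localhost:8983/solr/films/query"


def build_query_url(**params):
    """Return a /query URL, leaving '*' and ':' unescaped for readability."""
    return BASE_URL + "?" + urlencode(params, safe="*:", quote_via=quote)


def pair_facets(flat):
    """Turn a flat [value, count, value, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


batman_url = build_query_url(q="name:batman")
superhero_url = build_query_url(q="*:*", fq='genre:"Superhero movie"')
facet_url = build_query_url(q="*:*", facet="true", **{"facet.field": "genre"})

# Made-up counts, only to illustrate the shape of the genre facet list.
sample_genre_facets = ["Drama", 3, "Comedy", 2]

if __name__ == "__main__":
    for url in (batman_url, superhero_url, facet_url):
        print(url)
    print(pair_facets(sample_genre_facets))
```

Passing `safe="*:"` keeps the generated URLs identical to the ones shown in the steps; the default escaping would produce equivalent but less readable URLs.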

Exploring the data further:

  * Increase the MAX_ITERATIONS value, put in your Freebase API_KEY, and run the film_data_generator.py script
    using Python 3. Then re-index Solr with the new data.
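
When re-indexing from CSV, the multi-valued _genre_ and _directed_by_ columns still rely on the `f.<field>.split` and `f.<field>.separator` parameters from the CSV indexing command. The sketch below is a rough Python 3 model of what that splitting does to one row; it is not Solr's CSV loader. The sample row and the `split_multi_valued` helper are invented for illustration.

```python
import csv
import io

# Hypothetical one-row sample; the values are invented, not taken from films.csv.
SAMPLE_CSV = """id,name,directed_by,genre,initial_release_date
example_1,Example Film,Jane Doe|John Roe,Drama|Comedy,2001-01-01
"""

# Mirrors f.genre.separator=| and f.directed_by.separator=| from the CSV command.
SEPARATORS = {"genre": "|", "directed_by": "|"}


def split_multi_valued(row, separators=SEPARATORS):
    """Return a copy of the row with the listed fields split into lists."""
    doc = dict(row)
    for field, separator in separators.items():
        if field in doc:
            doc[field] = doc[field].split(separator)
    return doc


docs = [split_multi_valued(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]

if __name__ == "__main__":
    for doc in docs:
        print(doc)
```

Fields that are not listed in `SEPARATORS`, such as _name_, stay single strings.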

FAQ:

  * Why override the schema of the _name_ and _initial_release_date_ fields?

    Without those overrides, Solr would have guessed the _name_ field to be a multi-valued string type
    and _initial_release_date_ to be a multi-valued pdate type. For this data set it makes more sense
    for the movie name to be a single-valued, full-text searchable field,
    and for the release date to be single-valued as well.
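
To see why the field type matters for the 'Batman' search, here is a deliberately simplified Python 3 model of the two matching behaviors. It is a sketch under our own assumptions, not Solr's analysis chain: the "string" case is modeled as exact equality, and the "text_general" case as lowercasing plus splitting on non-word characters. Both function names and the example title are ours.

```python
import re


def string_field_matches(stored, query):
    """Model of a "string" field: the whole value must match exactly."""
    return stored == query


def text_field_matches(stored, query):
    """Rough model of a full-text field: lowercase, tokenize, match one token."""
    tokens = re.findall(r"\w+", stored.lower())
    return query.lower() in tokens


if __name__ == "__main__":
    title = "Batman Begins"  # example title, used only for illustration
    print(string_field_matches(title, "batman"))
    print(text_field_matches(title, "batman"))
```

Under this model, a lowercase single-word query only finds the title when the field is treated as full text, which matches the zero-results symptom described in the search step.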

  * How do I clear and reset my environment?

    See the script below.

  * Is there an easy-to-copy/paste script to do all of the above?

Here you go:

```
# Stop Solr, then remove old logs and the existing "films" core
bin/solr stop
rm -f server/logs/*.log
rm -Rf server/solr/films/

# Start Solr again and recreate the core
bin/solr start
bin/solr create -c films

# Override the guessed field types before indexing
curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{
    "add-field" : {
        "name":"name",
        "type":"text_general",
        "multiValued":false,
        "stored":true
    },
    "add-field" : {
        "name":"initial_release_date",
        "type":"pdate",
        "stored":true
    }
}'

# Index the data
bin/post -c films example/films/films.json
```
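
If you would rather drive the schema step from Python 3 than from curl, note that a Python dict cannot hold the repeated "add-field" key used above. As we understand the Schema API, it also accepts a list of field definitions under a single "add-field" key, and the sketch below assumes that form. The `add_field_payload` and `post_schema` helpers are our own names, and `post_schema` is only defined here, not called, because it needs a running Solr.

```python
import json
import urllib.request

# Assumption: same local Solr and "films" core as in the script above.
SCHEMA_URL = "http://localhost:8983/solr/films/schema"

FIELDS = [
    {"name": "name", "type": "text_general", "multiValued": False, "stored": True},
    {"name": "initial_release_date", "type": "pdate", "stored": True},
]


def add_field_payload(fields):
    """Serialize the field definitions as a single add-field command."""
    return json.dumps({"add-field": fields}, indent=2)


def post_schema(payload, url=SCHEMA_URL):
    """POST the payload to the Schema API and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = add_field_payload(FIELDS)

if __name__ == "__main__":
    print(payload)  # call post_schema(payload) once Solr is running
```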