Commit Graph

73 Commits

Author SHA1 Message Date
David Pilato c65db1a008 Fix mapping examples in documentation
Closes #179
(cherry picked from commit 7700340)
(cherry picked from commit ad68433)
(cherry picked from commit f83ffd2)
2015-10-29 19:22:43 +01:00
David Pilato a8b1a6fe15 update documentation with release 3.0.2 2015-10-28 19:19:00 +01:00
David Pilato 4edacab6cc Create branch for elasticsearch 2.x.x 2015-10-23 17:20:54 +02:00
David Pilato 543018a60a Create branch for elasticsearch 2.1.x 2015-10-23 16:58:50 +02:00
Björn Ali Göransson 88ca607058 Improved introduction, added hello world
New paragraph

Abbreviate the 1st paragraph

More concise phrasing

Rename heading

Remove repeated "Now," from Hello World

Person is also a document

Rephrasing of last paragraph in Hello, World

Move installation to being above Hello, world

Fix code backticks accidentally left out of the move

Closes #155
2015-10-23 16:06:17 +02:00
David Pilato 602dd04f8b [Docs] Mapper 3.0 show ES 2.0 functionality that does not exist
Closes #159.
2015-10-23 15:56:48 +02:00
David Pilato 64983c705c update documentation with release 3.0.1 2015-10-01 00:22:10 +02:00
David Pilato f2eb0a3b2b Latest version is 3.0.0 2015-09-30 17:26:56 +02:00
David Pilato f18f1550fa update documentation with release 2.7.1 2015-09-30 12:18:01 +02:00
David Pilato 7ce41e13f5 update documentation with release 3.0.0 2015-08-27 14:40:43 +02:00
David Pilato f31d37b6bb Update README for 3.0.0-SNAPSHOT
Related to #151
2015-08-25 13:48:37 +02:00
Clinton Gormley 7f1926dcbf Changed 1.x version to 1.7 2015-07-17 17:36:57 +02:00
Tanguy Leroux 07b475bb48 update documentation with release 2.7.0 2015-07-17 15:11:40 +02:00
David Pilato 91caf3cf3b update documentation with release 2.6.0 2015-06-11 13:22:53 +02:00
David Pilato 06d84cd502 Create branch es-1.6 2015-06-11 13:13:09 +02:00
David Pilato 7e2a9dbf0c update documentation with release 2.5.0 2015-03-31 17:59:01 +02:00
David Pilato e08ebe9efa create `es-1.5` branch 2015-03-16 16:52:08 -07:00
David Pilato d4d54fe744 update documentation with release 2.4.3 2015-02-23 16:56:39 +01:00
David Pilato 4ffa06d773 [Doc] highlighting example is incorrect
Closes #107.
2015-02-23 11:10:50 +01:00
David Pilato ec0de9c57d [Test] Now use fully qualified names for fields
We were requesting fields by their short names, but elasticsearch no longer allows short names: fields must be requested by their fully qualified names.

```java
SearchResponse response = client().prepareSearch("test")
        .addField("content_type")
        .addField("name")
        .execute().get();
```

We now need to use:

```java
SearchResponse response = client().prepareSearch("test")
        .addField("file.content_type")
        .addField("file.name")
        .execute().get();
```

Closes #102.
2015-02-18 20:36:25 +01:00
David Pilato 400910e53e update documentation with release 2.4.2 2015-02-11 23:22:02 +01:00
David Pilato 77081e3dbf [Doc] copy_to using attachment field type
If you want to use the [copy_to](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#copy-to)
feature, you need to define it on each sub-field you want to copy to another field:

```javascript
PUT /test/person/_mapping
{
  "person": {
    "properties": {
      "file": {
        "type": "attachment",
        "path": "full",
        "fields": {
          "file": {
            "type": "string",
            "copy_to": "copy"
          }
        }
      },
      "copy": {
        "type": "string"
      }
    }
  }
}
```

In this example, the extracted content will also be copied to the `copy` field.

Closes #97.
(cherry picked from commit f4f6b57)
(cherry picked from commit 5878a62)
2015-02-11 23:13:56 +01:00
David Pilato 931be57da9 [test] Add standalone runner
It can sometimes be useful to have a standalone runner to see exactly how Tika extracts content from a given file.

You can run the `StandaloneRunner` class with:

*  `-u file://URL/TO/YOUR/DOC`
*  `--size` to set the extracted size (defaults to the mapper attachment size)
*  a `BASE64` encoded binary

Example:

```sh
StandaloneRunner BASE64Text
StandaloneRunner -u /tmp/mydoc.pdf
StandaloneRunner -u /tmp/mydoc.pdf --size 1000000
```

It produces something like:

```
## Extracted text
--------------------- BEGIN -----------------------
This is the extracted text
---------------------- END ------------------------
## Metadata
- author: null
- content_length: null
- content_type: application/pdf
- date: null
- keywords: null
- language: null
- name: null
- title: null
```
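For the `BASE64` form, the file has to be encoded before it is passed on the command line. A minimal sketch, assuming the runner expects the standard Base64 alphabet as produced by `java.util.Base64.getEncoder()`; `Base64Argument` is a hypothetical helper, not part of the plugin:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

// Hypothetical helper (not part of the plugin): builds the BASE64 argument
// for StandaloneRunner from a local file. Assumes the standard Base64
// alphabet with padding (not the URL-safe variant).
class Base64Argument {

    // Encode raw bytes into a single-line Base64 string.
    static String encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Read the whole file into memory and encode it.
    static String encodeFile(Path path) throws IOException {
        return encode(Files.readAllBytes(path));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Base64Argument /tmp/mydoc.pdf
        System.out.println(encodeFile(Paths.get(args[0])));
    }
}
```

The printed value can then be given to `StandaloneRunner` in place of `BASE64Text`. Very large files produce very long arguments, so `-u` is likely the more practical option for them.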

Closes #99.
(cherry picked from commit 720b3bf)
(cherry picked from commit 990fa15)
2015-02-09 17:45:07 +01:00
tlrx a5ed51533c update documentation with release 2.4.1 2014-11-05 20:38:24 +01:00
David Pilato 03b47d5a4c update documentation with release 2.4.0 2014-10-08 18:50:20 +02:00
David Pilato eef6b61806 Create branch es-1.4 for elasticsearch 1.4.0 2014-09-12 16:08:59 +02:00
David Pilato 34fe111a2b update documentation with release 2.3.2 2014-09-01 09:53:26 +02:00
David Pilato cc1a43b5c3 update documentation with release 2.3.1 2014-08-18 21:52:53 +02:00
David Pilato 08454d72f6 update documentation with release 2.2.1 2014-08-18 21:39:31 +02:00
David Pilato 587e6d3da2 Docs: make the welcome page more obvious
Closes #79.
2014-08-18 12:38:03 +02:00
David Pilato ad986eb2fc Add support for multi-fields
Now that https://github.com/elasticsearch/elasticsearch/pull/6867 is merged into the elasticsearch core code (branch 1.x, es 1.4),
we can support multi-fields in the mapper attachment plugin.

```
DELETE /test
PUT /test
{
  "settings": {
    "number_of_shards": 1
  }
}
PUT /test/person/_mapping
{
  "person": {
    "properties": {
      "file": {
        "type": "attachment",
        "path": "full",
        "fields": {
          "file": {
            "type": "string",
            "fields": {
              "store": {
                "type": "string",
                "store": true
              }
            }
          },
          "content_type": {
            "type": "string",
            "fields": {
              "store": {
                "type": "string",
                "store": true
              },
              "untouched": {
                "type": "string",
                "index": "not_analyzed",
                "store": true
              }
            }
          }
        }
      }
    }
  }
}

PUT /test/person/1?refresh=true
{
  "file": "IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg=="
}

GET /test/person/_search
{
  "fields": [
    "file.store",
    "file.content_type.store"
  ],
  "aggs": {
    "store": {
      "terms": {
        "field": "file.content_type.store"
      }
    },
    "untouched": {
      "terms": {
        "field": "file.content_type.untouched"
      }
    }
  }
}
```

It gives:

```js
{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "test",
            "_type": "person",
            "_id": "1",
            "_score": 1,
            "fields": {
               "file.store": [
                  "\"God Save the Queen\" (alternatively \"God Save the King\"\n"
               ],
               "file.content_type.store": [
                  "text/plain; charset=ISO-8859-1"
               ]
            }
         }
      ]
   },
   "aggregations": {
      "store": {
         "doc_count_error_upper_bound": 0,
         "buckets": [
            {
               "key": "1",
               "doc_count": 1
            },
            {
               "key": "8859",
               "doc_count": 1
            },
            {
               "key": "charset",
               "doc_count": 1
            },
            {
               "key": "iso",
               "doc_count": 1
            },
            {
               "key": "plain",
               "doc_count": 1
            },
            {
               "key": "text",
               "doc_count": 1
            }
         ]
      },
      "untouched": {
         "doc_count_error_upper_bound": 0,
         "buckets": [
            {
               "key": "text/plain; charset=ISO-8859-1",
               "doc_count": 1
            }
         ]
      }
   }
}
```
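The two aggregations differ because `file.content_type.store` is analyzed while `file.content_type.untouched` is `not_analyzed`. The sketch below is only an approximation of the default analysis (it is not Lucene's `StandardAnalyzer`): it splits on runs of non-alphanumeric characters and lowercases, which happens to reproduce the `store` buckets for this particular value:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Rough approximation of analyzed vs not_analyzed terms for the
// content type value. NOT Lucene's StandardAnalyzer: it only splits on
// non-alphanumeric runs and lowercases each token.
class ContentTypeTerms {

    static List<String> analyzed(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.split("[^A-Za-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // not_analyzed keeps the whole value as a single term.
    static List<String> notAnalyzed(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        String contentType = "text/plain; charset=ISO-8859-1";
        // Sorted, in the same order as the `store` buckets above
        System.out.println(new TreeSet<>(analyzed(contentType)));
        // prints: [1, 8859, charset, iso, plain, text]
        System.out.println(notAnalyzed(contentType));
        // prints: [text/plain; charset=ISO-8859-1]
    }
}
```

This is why aggregating on the `untouched` sub-field is the one to use when you want the full content type as a single bucket.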

Note that the shorter definition works as well:

```
DELETE /test
PUT /test
{
  "settings": {
    "number_of_shards": 1
  }
}
PUT /test/person/_mapping
{
  "person": {
    "properties": {
      "file": {
        "type": "attachment"
      }
    }
  }
}
PUT /test/person/1?refresh=true
{
  "file": "IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg=="
}

GET /test/person/_search
{
  "query": {
    "match": {
      "file": "king"
    }
  }
}
```

gives:

```js
{
   "took": 53,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.095891505,
      "hits": [
         {
            "_index": "test",
            "_type": "person",
            "_id": "1",
            "_score": 0.095891505,
            "_source": {
               "file": "IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg=="
            }
         }
      ]
   }
}
```
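The `file` value indexed in both examples is plain Base64. Decoding it, as in this small sketch using `java.util.Base64`, shows the text handed to Tika, and why a `match` query on `king` finds the document (analysis lowercases `King`):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Decodes the base64 `file` value used in the examples above.
// The bytes are plain ASCII, so decoding them as UTF-8 is safe here.
class DecodeSample {

    static final String SAMPLE =
        "IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg==";

    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode(SAMPLE));
        // prints: "God Save the Queen" (alternatively "God Save the King"
    }
}
```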

Closes #57.

(cherry picked from commit 432d7c0)
2014-07-26 00:27:28 +02:00
David Pilato 663d4eaddb Update to elasticsearch 1.4.0
Closes #77.

(cherry picked from commit c58516f)
2014-07-26 00:26:41 +02:00
David Pilato eaccd4383d Deprecate `content` in favor of `_content`
When we want to force some values, we set them using `_field`, where `field` is the name of the field we want to force:

```
{
  "file": {
    "_name": "myfilename.txt"
  }
}
```

But to set the content itself, we use the `content` field name:

```
{
  "file": {
    "content": "VGhpcyBpcyBhbiBlbGFzdGljc2VhcmNoIG1hcHBlciBhdHRhY2htZW50IHRlc3Qu",
    "_name": "myfilename.txt"
  }
}
```

For consistency, we now use `_content` instead:

```
{
  "file": {
    "_content": "VGhpcyBpcyBhbiBlbGFzdGljc2VhcmNoIG1hcHBlciBhdHRhY2htZW50IHRlc3Qu",
    "_name": "myfilename.txt"
  }
}
```
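A client building such documents only has to Base64-encode the raw bytes and place them under `_content`. A minimal sketch of that, where `AttachmentDoc` is a hypothetical helper and the JSON is assembled by hand, so `fieldName` and `name` are assumed to need no JSON escaping (a real client would use a JSON library):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical client-side helper: builds the attachment JSON using the
// `_content` and `_name` keys. Hand-built JSON: fieldName and name are
// assumed to contain no characters that need escaping.
class AttachmentDoc {

    static String build(String fieldName, byte[] content, String name) {
        String encoded = Base64.getEncoder().encodeToString(content);
        return "{\n"
            + "  \"" + fieldName + "\": {\n"
            + "    \"_content\": \"" + encoded + "\",\n"
            + "    \"_name\": \"" + name + "\"\n"
            + "  }\n"
            + "}";
    }

    public static void main(String[] args) {
        byte[] text = "This is an elasticsearch mapper attachment test."
            .getBytes(StandardCharsets.UTF_8);
        // Prints the same document as the `_content` example above
        System.out.println(build("file", text, "myfilename.txt"));
    }
}
```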

Closes #73.

(cherry picked from commit 2e6be20)
2014-07-25 18:15:37 +02:00
David Pilato 51a8f6f1a0 Fix doc typo
(cherry picked from commit f70eb1d)
2014-06-03 10:13:12 +02:00
David Pilato 94cf141108 Use `_language` field instead of `language`
When we want to force a language instead of using Tika language detection, we set the `language` field in documents.

To be consistent with the other forced fields, `_content_type` and `_name`, we should prefix the `language` field with an underscore `_`.

So `language` becomes `_language`.

We deprecate `language` in version 2.1.0 and remove it in 2.3.0.
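Clients still sending `language` could rename the key before indexing. A minimal sketch of such a migration on the attachment object; `ForcedFieldMigration` is hypothetical and not part of the plugin, and it assumes that an explicit `_language` should win over a deprecated `language` when both are present:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side helper: renames the deprecated forced field
// `language` to `_language`. If `_language` is already present, the
// deprecated `language` entry is dropped. Other keys are copied as-is.
class ForcedFieldMigration {

    static Map<String, Object> migrate(Map<String, Object> attachment) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : attachment.entrySet()) {
            String key = e.getKey();
            if ("language".equals(key)) {
                if (attachment.containsKey("_language")) {
                    continue; // explicit `_language` takes precedence
                }
                key = "_language";
            }
            result.put(key, e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> file = new LinkedHashMap<>();
        file.put("_name", "myfilename.txt");
        file.put("language", "en");
        System.out.println(migrate(file).keySet());
        // prints: [_name, _language]
    }
}
```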

Closes #68.

(cherry picked from commit 2f46343)
2014-06-03 10:10:49 +02:00
David Pilato 7c1c2011bc Update to elasticsearch 1.3.0
Closes #67.
(cherry picked from commit d3eaac9)
2014-06-03 09:49:41 +02:00
David Pilato c0e7795f1f Update to elasticsearch 1.2.0
Closes #66.
(cherry picked from commit fb3b288)
2014-06-03 09:49:13 +02:00
David Pilato 7f8143ff12 Add highlighting documentation
Closes #54.
(cherry picked from commit efdf8ef)
2014-06-03 09:35:05 +02:00
David Pilato 8855bd7ddc Fix typo for JSON fields
(cherry picked from commit 63c60b8)
2014-06-03 09:34:51 +02:00
David Pilato e95bb18edb Create branches according to elasticsearch versions
We create branches:

* es-0.90 for elasticsearch 0.90
* es-1.0 for elasticsearch 1.0
* es-1.1 for elasticsearch 1.1
* master for elasticsearch master

Before releasing, we also check that we don't depend on an elasticsearch SNAPSHOT version.
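The check itself is simple. A minimal sketch in Java, not the actual release script: `SnapshotCheck` is hypothetical, it assumes Maven-style `-SNAPSHOT` version suffixes, and it takes the dependency versions as a map rather than reading the `pom.xml`:

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical pre-release check: refuse to release if any dependency
// version is a Maven SNAPSHOT. Reading versions from pom.xml is out of
// scope; they are passed in as artifact -> version.
class SnapshotCheck {

    static boolean isSnapshot(String version) {
        return version.endsWith("-SNAPSHOT");
    }

    static void ensureNoSnapshots(Map<String, String> dependencies) {
        for (Map.Entry<String, String> dep : dependencies.entrySet()) {
            if (isSnapshot(dep.getValue())) {
                throw new IllegalStateException(
                    "Cannot release: " + dep.getKey() + " is " + dep.getValue());
            }
        }
    }

    public static void main(String[] args) {
        // Passes silently for a released elasticsearch version
        ensureNoSnapshots(Collections.singletonMap("elasticsearch", "1.1.0"));
    }
}
```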

Add links to each version in the documentation.
2014-03-28 17:47:38 +01:00
David Pilato 839c4dab16 prepare for next development iteration 2014-03-25 19:02:16 +01:00
David Pilato 74d882110d prepare release elasticsearch-mapper-attachments-2.0.0 2014-03-25 18:47:56 +01:00
Richard Louapre 3d15cb0484 Add language detection option
Based on PR #45, we add a new language detection option using the language detection feature available in Tika:
https://tika.apache.org/1.4/detection.html#Language_Detection

By default, language detection is disabled (`false`) because it can be costly.
This default can be changed with the `index.mapping.attachment.detect_language` setting.
It can also be set for each indexed document using the `_detect_language` parameter.
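The precedence between the two knobs can be sketched as below. This is an illustration of the rule described above, not the plugin's code: `DetectLanguageOption` is hypothetical, and `null` stands for "not set":

```java
// Hypothetical sketch of the precedence: a per-document
// `_detect_language` value wins, then the index-level
// `index.mapping.attachment.detect_language` setting, then false.
class DetectLanguageOption {

    static boolean resolve(Boolean perDocument, Boolean indexSetting) {
        if (perDocument != null) {
            return perDocument;
        }
        if (indexSetting != null) {
            return indexSetting;
        }
        return false; // disabled by default because detection can be costly
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, null));  // false
        System.out.println(resolve(null, true));  // true
        System.out.println(resolve(false, true)); // false
    }
}
```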

Closes #45.
Closes #44.
2014-03-25 18:26:09 +01:00
David Pilato 621995d0b4 Upgrade to Tika 1.5
Closes #56.
2014-03-19 23:20:29 +01:00
David Pilato 9d0b700b05 Add plugin release semi-automatic script
Closes #58.
2014-03-19 23:04:09 +01:00
David Pilato 7fc31c89f7 prepare release elasticsearch-mapper-attachments-2.0.0.RC1 2014-01-15 23:37:44 +01:00
David Pilato b877f1bd4f Update to elasticsearch 1.0.0.RC1
Closes #48.
2014-01-14 14:51:32 +01:00
David Pilato f8f647dea9 update headers 2014-01-13 22:31:14 +01:00
David Pilato 3f3fd74ee1 prepare release elasticsearch-mapper-attachments-1.9.0 2013-08-20 18:57:35 +02:00
David Pilato 62cc54a7c8 Update readme with release dates 2013-08-20 16:15:18 +02:00