Commit Graph

44 Commits

Author SHA1 Message Date
David Pilato 587e6d3da2 Docs: make the welcome page more obvious
Closes #79.
2014-08-18 12:38:03 +02:00
David Pilato ad986eb2fc Add support for multi-fields
Now that https://github.com/elasticsearch/elasticsearch/pull/6867 is merged into the elasticsearch core code (branch 1.x, es 1.4),
we can support multi-fields in the mapper attachments plugin.

```
DELETE /test
PUT /test
{
  "settings": {
    "number_of_shards": 1
  }
}
PUT /test/person/_mapping
{
  "person": {
    "properties": {
      "file": {
        "type": "attachment",
        "path": "full",
        "fields": {
          "file": {
            "type": "string",
            "fields": {
              "store": {
                "type": "string",
                "store": true
              }
            }
          },
          "content_type": {
            "type": "string",
            "fields": {
              "store": {
                "type": "string",
                "store": true
              },
              "untouched": {
                "type": "string",
                "index": "not_analyzed",
                "store": true
              }
            }
          }
        }
      }
    }
  }
}

PUT /test/person/1?refresh=true
{
  "file": "IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg=="
}

GET /test/person/_search
{
  "fields": [
    "file.store",
    "file.content_type.store"
  ],
  "aggs": {
    "store": {
      "terms": {
        "field": "file.content_type.store"
      }
    },
    "untouched": {
      "terms": {
        "field": "file.content_type.untouched"
      }
    }
  }
}
```

It gives:

```js
{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "test",
            "_type": "person",
            "_id": "1",
            "_score": 1,
            "fields": {
               "file.store": [
                  "\"God Save the Queen\" (alternatively \"God Save the King\"\n"
               ],
               "file.content_type.store": [
                  "text/plain; charset=ISO-8859-1"
               ]
            }
         }
      ]
   },
   "aggregations": {
      "store": {
         "doc_count_error_upper_bound": 0,
         "buckets": [
            {
               "key": "1",
               "doc_count": 1
            },
            {
               "key": "8859",
               "doc_count": 1
            },
            {
               "key": "charset",
               "doc_count": 1
            },
            {
               "key": "iso",
               "doc_count": 1
            },
            {
               "key": "plain",
               "doc_count": 1
            },
            {
               "key": "text",
               "doc_count": 1
            }
         ]
      },
      "untouched": {
         "doc_count_error_upper_bound": 0,
         "buckets": [
            {
               "key": "text/plain; charset=ISO-8859-1",
               "doc_count": 1
            }
         ]
      }
   }
}
```
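The `store` aggregation returns six buckets while `untouched` returns one. This is because the analyzed sub-field is tokenized, while the `not_analyzed` one is indexed as a single term. The Python sketch below mimics this with a regex tokenizer. That tokenizer is an assumption and only approximates elasticsearch's standard analyzer. `approx_standard_tokens` and `terms_buckets` are hypothetical names.

```python
import re
from collections import Counter


def approx_standard_tokens(text):
    # Rough approximation: keep runs of ASCII letters/digits, lowercased.
    # Lucene's StandardTokenizer follows Unicode segmentation rules and can
    # differ on other inputs.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def terms_buckets(values, analyzed=True):
    # Each value stands for one document; a term is counted once per document.
    counts = Counter()
    for value in values:
        terms = approx_standard_tokens(value) if analyzed else [value]
        counts.update(set(terms))
    # doc_count descending, ties broken by term (matching the response above).
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


content_type = "text/plain; charset=ISO-8859-1"
print([key for key, _ in terms_buckets([content_type])])
# ['1', '8859', 'charset', 'iso', 'plain', 'text']
print([key for key, _ in terms_buckets([content_type], analyzed=False)])
# ['text/plain; charset=ISO-8859-1']
```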

Note that using a shorter definition works as well:

```
DELETE /test
PUT /test
{
  "settings": {
    "number_of_shards": 1
  }
}
PUT /test/person/_mapping
{
  "person": {
    "properties": {
      "file": {
        "type": "attachment"
      }
    }
  }
}
PUT /test/person/1?refresh=true
{
  "file": "IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg=="
}

GET /test/person/_search
{
  "query": {
    "match": {
      "file": "king"
    }
  }
}
```

gives:

```js
{
   "took": 53,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.095891505,
      "hits": [
         {
            "_index": "test",
            "_type": "person",
            "_id": "1",
            "_score": 0.095891505,
            "_source": {
               "file": "IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg=="
            }
         }
      ]
   }
}
```

Closes #57.

(cherry picked from commit 432d7c0)
2014-07-26 00:27:28 +02:00
David Pilato 663d4eaddb Update to elasticsearch 1.4.0
Closes #77.

(cherry picked from commit c58516f)
2014-07-26 00:26:41 +02:00
David Pilato eaccd4383d Deprecate `content` in favor of `_content`
When we want to force some values, we need to set them using `_field`, where `field` is the name of the field we want to force:

```
{
  "file": {
    "_name": "myfilename.txt"
  }
}
```

But to set the content itself, we use the `content` field name.

```
{
  "file": {
    "content": "VGhpcyBpcyBhbiBlbGFzdGljc2VhcmNoIG1hcHBlciBhdHRhY2htZW50IHRlc3Qu",
    "_name": "myfilename.txt"
  }
}
```

For consistency, we now use `_content` instead:

```
{
  "file": {
    "_content": "VGhpcyBpcyBhbiBlbGFzdGljc2VhcmNoIG1hcHBlciBhdHRhY2htZW50IHRlc3Qu",
    "_name": "myfilename.txt"
  }
}
```
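Here is a minimal Python sketch of building such a document from raw file bytes with the underscore-prefixed keys. `build_attachment_doc` is a hypothetical helper and is not part of the plugin.

```python
import base64
import json


def build_attachment_doc(raw, name=None, content_type=None):
    # Hypothetical helper: `raw` is the file's bytes. The content goes under
    # `_content` (base64-encoded), and forced values use `_name` and
    # `_content_type`.
    value = {"_content": base64.b64encode(raw).decode("ascii")}
    if name is not None:
        value["_name"] = name
    if content_type is not None:
        value["_content_type"] = content_type
    return {"file": value}


doc = build_attachment_doc(
    b"This is an elasticsearch mapper attachment test.", name="myfilename.txt"
)
print(json.dumps(doc, indent=2))
```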

Closes #73.

(cherry picked from commit 2e6be20)
2014-07-25 18:15:37 +02:00
David Pilato 51a8f6f1a0 Fix doc typo
(cherry picked from commit f70eb1d)
2014-06-03 10:13:12 +02:00
David Pilato 94cf141108 Use `_language` field instead of `language`
When we want to force a language instead of using Tika language detection, we set the `language` field in documents.

To be consistent with the other forced fields, `_content_type` and `_name`, we should prefix the `language` field with an underscore `_`.

So `language` becomes `_language`.

We first deprecate `language` in version 2.1.0 and will remove it in 2.3.0.
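During the deprecation window, both keys are accepted. Below is a hypothetical Python sketch of that precedence: it prefers `_language` and falls back to the deprecated `language` key with a warning. The plugin itself is written in Java, and its actual handling may differ.

```python
import warnings


def resolve_forced_language(field_value):
    # Hypothetical sketch: return the forced language, or None if not forced.
    if "_language" in field_value:
        return field_value["_language"]
    if "language" in field_value:
        # Deprecated key: still honoured, but flagged to the caller.
        warnings.warn(
            "`language` is deprecated, use `_language` instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return field_value["language"]
    return None  # no forced language: Tika detection may apply
```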

Closes #68.

(cherry picked from commit 2f46343)
2014-06-03 10:10:49 +02:00
David Pilato 7c1c2011bc Update to elasticsearch 1.3.0
Closes #67.
(cherry picked from commit d3eaac9)
2014-06-03 09:49:41 +02:00
David Pilato c0e7795f1f Update to elasticsearch 1.2.0
Closes #66.
(cherry picked from commit fb3b288)
2014-06-03 09:49:13 +02:00
David Pilato 7f8143ff12 Add highlighting documentation
Closes #54.
(cherry picked from commit efdf8ef)
2014-06-03 09:35:05 +02:00
David Pilato 8855bd7ddc Fix typo for JSON fields
(cherry picked from commit 63c60b8)
2014-06-03 09:34:51 +02:00
David Pilato e95bb18edb Create branches according to elasticsearch versions
We create branches:

* es-0.90 for elasticsearch 0.90
* es-1.0 for elasticsearch 1.0
* es-1.1 for elasticsearch 1.1
* master for elasticsearch master

We also check before releasing that we don't depend on an elasticsearch SNAPSHOT version.
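Here is a minimal sketch of such a pre-release check in Python. It assumes the dependency versions are already available as a name-to-version map. That input shape and the sample versions are hypothetical; the real release script may read them differently, for example from `pom.xml` properties.

```python
def snapshot_dependencies(dependencies):
    # Return, sorted, the names of dependencies pinned to a SNAPSHOT version.
    return sorted(
        name
        for name, version in dependencies.items()
        if version.upper().endswith("-SNAPSHOT")
    )


# Hypothetical sample input.
deps = {"elasticsearch": "1.4.0-SNAPSHOT", "tika-parsers": "1.5"}
blocking = snapshot_dependencies(deps)
if blocking:
    print("Refusing to release, SNAPSHOT dependencies: " + ", ".join(blocking))
```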

Add links to each version in the documentation
2014-03-28 17:47:38 +01:00
David Pilato 839c4dab16 prepare for next development iteration 2014-03-25 19:02:16 +01:00
David Pilato 74d882110d prepare release elasticsearch-mapper-attachments-2.0.0 2014-03-25 18:47:56 +01:00
Richard Louapre 3d15cb0484 Add language detection option
Based on PR #45, we add a new language detection option using the language detection feature available in Tika:
https://tika.apache.org/1.4/detection.html#Language_Detection

By default, language detection is disabled (`false`) as it can come with a performance cost.
This default value can be changed with the `index.mapping.attachment.detect_language` setting.
It can also be set per indexed document using the `_detect_language` parameter.
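Here is a Python sketch of how the two levels could combine. It assumes that a per-document `_detect_language` value, when present, takes precedence over the index setting. `should_detect_language` and `_as_bool` are hypothetical names.

```python
def _as_bool(value):
    # Settings may arrive as strings ("true"/"false") or as booleans.
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


def should_detect_language(index_settings, field_value):
    # Index-level default, False when the setting is absent.
    default = index_settings.get("index.mapping.attachment.detect_language", False)
    # Assumed precedence: a per-document `_detect_language` overrides it.
    # A plain base64 string value carries no per-document option.
    if isinstance(field_value, dict) and "_detect_language" in field_value:
        return _as_bool(field_value["_detect_language"])
    return _as_bool(default)
```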

Closes #45.
Closes #44.
2014-03-25 18:26:09 +01:00
David Pilato 621995d0b4 Upgrade to Tika 1.5
Closes #56.
2014-03-19 23:20:29 +01:00
David Pilato 9d0b700b05 Add plugin release semi-automatic script
Closes #58.
2014-03-19 23:04:09 +01:00
David Pilato 7fc31c89f7 prepare release elasticsearch-mapper-attachments-2.0.0.RC1 2014-01-15 23:37:44 +01:00
David Pilato b877f1bd4f Update to elasticsearch 1.0.0.RC1
Closes #48.
2014-01-14 14:51:32 +01:00
David Pilato f8f647dea9 update headers 2014-01-13 22:31:14 +01:00
David Pilato 3f3fd74ee1 prepare release elasticsearch-mapper-attachments-1.9.0 2013-08-20 18:57:35 +02:00
David Pilato 62cc54a7c8 Update readme with release dates 2013-08-20 16:15:18 +02:00
David Pilato 8c340535d2 Add content_length metadata
We now generate a `content_length` field based on the file size.
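Attachments are sent base64-encoded, so the file size is the length of the decoded bytes. Here is a minimal Python sketch that assumes `content_length` is that decoded byte count. `content_length` as a function name is hypothetical. The sample value reuses the base64 string from the multi-fields example.

```python
import base64


def content_length(encoded):
    # Decode strictly (reject non-alphabet characters) and count the bytes.
    return len(base64.b64decode(encoded, validate=True))


sample = "IkdvZCBTYXZlIHRoZSBRdWVlbiIgKGFsdGVybmF0aXZlbHkgIkdvZCBTYXZlIHRoZSBLaW5nIg=="
print(content_length(sample))
# 55
```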
Closes #26.
2013-08-20 16:03:31 +02:00
Frédéric Camblor 019d0f9a26 Don't reject full document in case of invalid metadata
From the original PR #17 by @fcamblor

If you try to index a document with invalid metadata, the full document is rejected.

For example:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html lang="fr">
<head>
<title>Hello</title>
<meta name="date" content="">
<meta name="Author" content="kimchy">
<meta name="Keywords" content="elasticsearch,cool,bonsai">
</head>
<body>World</body>
</html>
```

has a non-parseable date (its `date` meta tag is empty).

This fix adds a new option that ignores parsing errors, `"index.mapping.attachment.ignore_errors": true` (defaults to `true`).
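Below is a hypothetical Python sketch of this behaviour. With `ignore_errors` enabled, an unparseable `date` value drops only that metadata field instead of rejecting the whole document. Parsing ISO 8601 with `datetime.fromisoformat` is an assumption, since the plugin's accepted date formats may differ. The function names are illustrative.

```python
from datetime import datetime


def parse_metadata_date(raw, ignore_errors=True):
    # ISO 8601 only (assumption). Returns None for bad input when ignoring errors.
    try:
        return datetime.fromisoformat(raw)
    except ValueError:
        if ignore_errors:
            return None
        raise  # strict mode: propagate so the caller can reject the document


def extract_metadata(meta, ignore_errors=True):
    # Copy metadata, dropping a date that cannot be parsed.
    out = {}
    for key, value in meta.items():
        if key == "date":
            parsed = parse_metadata_date(value, ignore_errors)
            if parsed is not None:
                out[key] = parsed
        else:
            out[key] = value
    return out


meta = {"date": "", "Author": "kimchy", "Keywords": "elasticsearch,cool,bonsai"}
print(extract_metadata(meta))
# {'Author': 'kimchy', 'Keywords': 'elasticsearch,cool,bonsai'}
```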

Closes #17, #38.
2013-08-20 12:26:49 +02:00
David Pilato d2e2fb5cdf Upgrade Tika to 1.4.
Closes #36.
2013-08-14 16:57:42 +02:00
David Pilato c0663277bc prepare for next development iteration 2013-08-07 10:02:02 +02:00
David Pilato 0a454efe18 prepare release elasticsearch-mapper-attachments-1.8.0 2013-08-07 09:52:29 +02:00
David Pilato d054f9a1e7 Mapper 1.7.0 does not work with elasticsearch 0.90.3
FastByteArrayInputStream was removed in 0.90.3.
Closes #34.
2013-08-07 09:47:12 +02:00
Shay Banon 7e58416506 release 1.7 2013-02-26 16:06:39 +01:00
David Pilato 942b87b763 Move to Elasticsearch 0.21.0.Beta1
Due to refactoring in 0.21.x we have to update this plugin
Closes #24.
2013-02-23 12:13:51 +01:00
Martijn van Groningen 69f8bdea03 Master is now 0.20 2012-12-21 15:17:02 +01:00
Martijn van Groningen a163fdad0f Prepare 1.6.0 release 2012-09-28 12:00:12 +02:00
Martijn van Groningen 0a17fe2e44 Release 1.5 2012-09-19 11:33:06 +02:00
Martijn van Groningen 5c649ad226 Upgraded Tika, TestNG, hamcrest, log4j and surefire plugin.
Closes #12
2012-09-19 10:55:58 +02:00
Shay Banon 65043c0692 add license and repo 2012-06-10 22:14:18 +02:00
Shay Banon 66b96cb994 release 1.4.0 2012-03-25 20:10:46 +02:00
Shay Banon c1df26e4e9 upgrade to tika 1.1 2012-03-25 20:00:45 +02:00
Shay Banon 4482a5de67 release 1.3.0 2012-03-07 22:02:49 +02:00
Shay Banon 744e3772a5 update readme 2012-03-07 21:56:48 +02:00
Shay Banon 9882a2937b update readme 2012-03-04 11:59:22 +02:00
Shay Banon 8d2a02e7d1 release 1.2.0 2012-02-15 22:43:48 +02:00
Shay Banon 802b795289 release 1.1.0 supporting elasticsearch 0.19 2012-02-07 17:00:47 +02:00
Shay Banon f97157da15 move to elasticsearch 0.19.0 snap and use some of its features 2012-01-31 14:38:41 +02:00
Benjamin Devèze ac0be37f09 Fix typo 2012-01-04 11:54:30 +01:00
Shay Banon c4a1275475 first commit 2011-12-05 14:05:14 +02:00