Tika may "hang up" the client application

Tika sometimes crashes while parsing certain files. In those cases it may throw a generic runtime error (a Throwable) rather than a TikaException.
However, AttachmentMapper.java has no “catch” clause for Throwable:

        String parsedContent;
        try {
            // Set the maximum length of strings returned by the parseToString method, -1 sets no limit
            parsedContent = tika().parseToString(new FastByteArrayInputStream(content), metadata, indexedChars);
        } catch (TikaException e) {
            throw new MapperParsingException("Failed to extract [" + indexedChars + "] characters of text for [" + name + "]", e);
        }

As a result, a failure inside tika() may “hang up” the whole application.
(We have some PDF files that "hang up" the Elasticsearch client when you try to parse them with the mapper-attachment plugin.)

We propose the following fix:

        String parsedContent;
        try {
            // Set the maximum length of strings returned by the parseToString method, -1 sets no limit
            parsedContent = tika().parseToString(new FastByteArrayInputStream(content), metadata, indexedChars);
        } catch (Throwable e) {
            throw new MapperParsingException("Failed to extract [" + indexedChars + "] characters of text for [" + name + "]", e);
        }

(just replace “TikaException” with “Throwable” – it works for our cases)
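
To illustrate why the narrower catch clause is not enough, here is a minimal, self-contained sketch. It does not use Tika or Elasticsearch: ParseFailure, MappingFailure and Parser are hypothetical stand-ins for TikaException, MapperParsingException and tika().parseToString(...), and the simulated NoClassDefFoundError is only an assumed example of a non-TikaException failure.

```java
public class ThrowableCatchDemo {

    /** Hypothetical stand-in for the checked TikaException. */
    static class ParseFailure extends Exception {
        ParseFailure(String message) { super(message); }
    }

    /** Hypothetical stand-in for the unchecked MapperParsingException. */
    static class MappingFailure extends RuntimeException {
        MappingFailure(String message, Throwable cause) { super(message, cause); }
    }

    /** Hypothetical stand-in for the parseToString call. */
    interface Parser {
        String parse(byte[] content) throws ParseFailure;
    }

    /** Mirrors the original code: only the parser's own exception type is handled. */
    static String parseNarrow(Parser parser, byte[] content) {
        try {
            return parser.parse(content);
        } catch (ParseFailure e) {
            throw new MappingFailure("Failed to extract text", e);
        }
    }

    /** Mirrors the proposed fix: every Throwable is wrapped and reported. */
    static String parseBroad(Parser parser, byte[] content) {
        try {
            return parser.parse(content);
        } catch (Throwable e) {
            throw new MappingFailure("Failed to extract text", e);
        }
    }

    public static void main(String[] args) {
        // Assumed failure mode: the parser dies with an Error, not a ParseFailure.
        Parser broken = content -> {
            throw new NoClassDefFoundError("simulated missing dependency");
        };

        try {
            parseNarrow(broken, new byte[0]);
        } catch (MappingFailure e) {
            System.out.println("narrow catch: wrapped");
        } catch (Throwable t) {
            // The Error bypasses the narrow catch clause and reaches the caller raw.
            System.out.println("narrow catch: escaped as " + t.getClass().getSimpleName());
        }

        try {
            parseBroad(broken, new byte[0]);
        } catch (MappingFailure e) {
            System.out.println("broad catch: wrapped, cause " + e.getCause().getClass().getSimpleName());
        }
    }
}
```

Note that catching Throwable also intercepts serious errors such as OutOfMemoryError, so this is a trade-off: the caller always gets a mapping error, but a failing JVM is reported the same way as a bad document.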

Thank you!
Closes #21.
David Pilato 2013-08-20 17:28:11 +02:00
parent 0fff26f2bf
commit d6aa2f0615
1 changed files with 1 additions and 2 deletions

@@ -19,7 +19,6 @@
 package org.elasticsearch.index.mapper.attachment;
-import org.apache.tika.exception.TikaException;
 import org.apache.tika.metadata.Metadata;
 import org.elasticsearch.common.io.stream.BytesStreamInput;
 import org.elasticsearch.common.logging.ESLogger;
@@ -356,7 +355,7 @@ public class AttachmentMapper implements Mapper {
         try {
             // Set the maximum length of strings returned by the parseToString method, -1 sets no limit
             parsedContent = tika().parseToString(new BytesStreamInput(content, false), metadata, indexedChars);
-        } catch (TikaException e) {
+        } catch (Throwable e) {
             throw new MapperParsingException("Failed to extract [" + indexedChars + "] characters of text for [" + name + "]", e);
         }