From 18bd29715a35e9e0a36766eb74a23613dc7e4622 Mon Sep 17 00:00:00 2001
From: markharwood
Date: Thu, 14 May 2020 11:51:59 +0100
Subject: [PATCH] Lucene-9336: Changes.txt and migrate.md addition for RegExp enhancements (#1515)

Added notes for new \w \s etc support
---
 lucene/CHANGES.txt | 4 ++++
 lucene/MIGRATE.md  | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/lucene/CHANGES.txt b/lucene/CHANGES.txt
index 29c7d324ef0..2f87a5d9e0b 100644
--- a/lucene/CHANGES.txt
+++ b/lucene/CHANGES.txt
@@ -60,6 +60,10 @@ API Changes
 
 Improvements
 
+* LUCENE-9336: RegExp query now supports \w \W \d \D \s \S expressions.
+  This is a break with previous behaviour where these were (mis)interpreted
+  as literally the characters w W d etc. (Mark Harwood)
+
 * LUCENE-8757: When provided with an ExecutorService to run queries across
   multiple threads, IndexSearcher now groups small segments together, up to
   250k docs per slice. (Atri Sharma via Adrien Grand)
diff --git a/lucene/MIGRATE.md b/lucene/MIGRATE.md
index db188bed6ff..0956c8e0013 100644
--- a/lucene/MIGRATE.md
+++ b/lucene/MIGRATE.md
@@ -1,5 +1,9 @@
 # Apache Lucene Migration Guide
 
+## RegExp certain regular expressions now match differently (LUCENE-9336)
+
+The commonly used regular expressions \w \W \d \D \s and \S now work the same way [Java Pattern](https://docs.oracle.com/javase/tutorial/essential/regex/pre_char_classes.html#CHART) matching works. Previously these expressions were (mis)interpreted as searches for the literal characters w, d, s etc.
+
 ## NGramFilterFactory "keepShortTerm" option was fixed to "preserveOriginal" (LUCENE-9259)
 
 The factory option name to output the original term was corrected in accordance with its Javadoc.
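
To make the LUCENE-9336 behaviour change concrete, here is a minimal sketch that contrasts the two interpretations of `\d`. It deliberately uses only `java.util.regex.Pattern`, the reference semantics the migration note points to, and not Lucene's own `RegExp` class. The class `PredefinedClassDemo` and both helper methods are hypothetical names introduced for illustration, and the "literal" helper only emulates the old interpretation by dropping the backslash; it is not how Lucene implemented it.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Lucene.
class PredefinedClassDemo {

    // New interpretation: \w \W \d \D \s \S are predefined character
    // classes, exactly as java.util.regex.Pattern defines them.
    static boolean matchesAsClass(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Emulation (an assumption for illustration) of the old interpretation:
    // strip the backslash so the letter is matched as a literal character.
    static boolean matchesAsLiteral(String regex, String input) {
        String literal = regex.replaceAll("\\\\([wWdDsS])", "$1");
        return Pattern.matches(literal, input);
    }

    public static void main(String[] args) {
        String regex = "\\d+"; // the expression \d+

        System.out.println(matchesAsClass(regex, "123"));   // true
        System.out.println(matchesAsClass(regex, "ddd"));   // false
        System.out.println(matchesAsLiteral(regex, "123")); // false
        System.out.println(matchesAsLiteral(regex, "ddd")); // true
    }
}
```

Queries that relied on the old literal matching would need to drop the backslash (for example `d+` in place of `\d+`) to keep matching the letter itself.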