Updated formula evaluator documentation due to bugzilla 45768

git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@700304 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Josh Micich 2008-09-29 23:12:53 +00:00
parent d14991ed02
commit ae3ce0ae5d
3 changed files with 110 additions and 149 deletions

View File

@ -81,84 +81,45 @@
</section> </section>
<section><title>Walkthrough of an "evaluate()" implementation.</title> <section><title>Walkthrough of an "evaluate()" implementation.</title>
<p>So here is the fun part - lets walk through the implementation of the excel <p>So here is the fun part - lets walk through the implementation of the excel
function... <strong>SQRT()</strong> </p> function... <strong>NOT()</strong> </p>
<section><title>The Code</title> <section><title>The Code</title>
<source> <source>
public class Sqrt extends NumericFunction { public final class Not implements Function {
private static final ValueEvalToNumericXlator NUM_XLATOR = public Eval evaluate(Eval[] args, int srcCellRow, short srcCellCol) {
new ValueEvalToNumericXlator((short) if (args.length != 1) {
( ValueEvalToNumericXlator.BOOL_IS_PARSED return ErrorEval.VALUE_INVALID;
| ValueEvalToNumericXlator.EVALUATED_REF_BOOL_IS_PARSED }
| ValueEvalToNumericXlator.EVALUATED_REF_STRING_IS_PARSED boolean boolArgVal;
| ValueEvalToNumericXlator.REF_BOOL_IS_PARSED try {
| ValueEvalToNumericXlator.STRING_IS_PARSED ValueEval ve = OperandResolver.getSingleValue(args[0], srcCellRow, srcCellCol);
)); Boolean b = OperandResolver.coerceValueToBoolean(ve, false);
boolArgVal = b == null ? false : b.booleanValue();
protected ValueEvalToNumericXlator getXlator() { } catch (EvaluationException e) {
return NUM_XLATOR; return e.getErrorEval();
} }
public Eval evaluate(Eval[] operands, int srcRow, short srcCol) {
double d = 0;
ValueEval retval = null;
switch (operands.length) {
default:
retval = ErrorEval.VALUE_INVALID;
break;
case 1:
ValueEval ve = singleOperandEvaluate(operands[0], srcRow, srcCol);
if (ve instanceof NumericValueEval) {
NumericValueEval ne = (NumericValueEval) ve;
d = ne.getNumberValue();
}
else if (ve instanceof BlankEval) {
// do nothing
}
else {
retval = ErrorEval.NUM_ERROR;
}
}
if (retval == null) {
d = Math.sqrt(d);
retval = (Double.isNaN(d)) ? (ValueEval) ErrorEval.VALUE_INVALID : new NumberEval(d);
}
return retval;
}
return BoolEval.valueOf(!boolArgVal);
}
} }
</source> </source>
</section> </section>
<section><title>Implementation Details</title> <section><title>Implementation Details</title>
<ul> <ul>
<li>The first thing to realise is that classes already exist, even for functions that are not yet implemented. <li>The first thing to realise is that classes already exist, even for functions that are not yet implemented.
Just that they extend from DefaultFunctionImpl whose behaviour is to return an ErrorEval.FUNCTION_NOT_IMPLEMENTED value.</li> Just that they extend from NotImplementedFunction whose behaviour is to return an ErrorEval.FUNCTION_NOT_IMPLEMENTED value.</li>
<li>In order to implement SQRT(..), we need to: a. Extend from the correct Abstract super class; b. implement the evaluate(..) method</li> <li>In order to implement NOT(..), we need to implement the interface 'Function' which has a method 'evaluate(..)'</li>
<li>Hence we extend SQRT(..) from the predefined class NumericFunction</li> <li>Since NOT(..) takes a single argument, we verify the length of the operands array else set the return value to ErrorEval.VALUE_INVALID</li>
<li>Since SQRT(..) takes a single argument, we verify the length of the operands array else set the return value to ErrorEval.VALUE_INVALID</li>
<li>Next we normalize each operand to a limited set of ValueEval subtypes, specifically, we call the function <li>Next we normalize each operand to a limited set of ValueEval subtypes, specifically, we call the function
<code>singleOperandEvaluate(..)</code> to do conversions of different value eval types to one of: NumericValueEval, <code>OperandResolver.getSingleValue</code> to do conversions of different value eval types to one of: NumericValueEval,
BlankEval and ErrorEval. The conversion logic is configured by a ValueEvalToNumericXlator instance which BlankEval and ErrorEval. The conversion logic is performed by OperandResolver.coerceValueToBoolean()
is returned by the Factory method: <code>getXlator(..)</code> The flags used to create the ValueEvalToNumericXlator
instance are briefly explained as follows:
BOOL_IS_PARSED means whether this function treats Boolean values as 1,
REF_BOOL_IS_PARSED means whether Boolean values in cell references are parsed or not.
So also, EVALUATED_REF_BOOL_IS_PARSED means if the operand was a RefEval that was assigned a
Boolean value as a result of evaluation of the formula that it contained.
eg. SQRT(TRUE) returns 1: This means BOOL_IS_PARSED should be set.
SQRT(A1) returns 1 when A1 has TRUE: This means REF_BOOL_IS_PARSED should be set.
SQRT(A1) returns 1 when A1 has a formula that evaluates to TRUE: This means EVALUATED_REF_BOOL_IS_PARSED should be set.
If the flag is not set for a particular case, that case is ignored (treated as if the cell is blank) _unless_
there is a flag like: STRING_IS_INVALID_VALUE (which means that Strings should be treated as resulting in VALUE_INVALID ErrorEval)
</li> </li>
<li>Next perform the appropriate Math function on the double value (if an error didnt occur already).</li> <li>Next perform the appropriate java operation (if an error didnt occur already).</li>
<li>Finally before returning the NumberEval wrapping the double value that <li>Finally return the BoolEval wrapping primitive boolean result. Note - in the case of numeric results
you computed, do one final check to see if the double is a NaN, (or if it is "Infinite") you should check for NaN and Infinity, because exel likes these translated into specific error codes like
If it is return the appropriate ErrorEval instance. Note: The OpenOffice.org error codes VALUE_INVALID, NUM_ERROR, DIV_ZERO etc.
should NOT be preferred. Instead use the excel specific error codes like VALUE_INVALID, NUM_ERROR, DIV_ZERO etc. (Thanks to Avik for bringing this issue up early!)</li>
(Thanks to Avik for bringing this issue up early!) The Oo.o ErrorCodes will be removed (if they havent already been :)</li>
</ul> </ul>
</section> </section>
<section><title>Modelling Excel Semantics</title> <section><title>Modelling Excel Semantics</title>
@ -170,8 +131,8 @@ public class Sqrt extends NumericFunction {
Because when you use TRUE in referenced cells with arithmetic functions, it evaluates to blank - meaning it is not evaluated - as if it was string or a blank cell. Because when you use TRUE in referenced cells with arithmetic functions, it evaluates to blank - meaning it is not evaluated - as if it was string or a blank cell.
eg. "=SUM(1,A1)" when A1 is TRUE evaluates to 1. eg. "=SUM(1,A1)" when A1 is TRUE evaluates to 1.
This behaviour changes depending on which function you are using. eg. SQRT(..) that was This behaviour changes depending on which function you are using. eg. SQRT(..) that was
described earlier treats a TRUE as 1 in all cases. This is why the configurable ValueEvalToNumericXlator described earlier treats a TRUE as 1 in all cases. The various conversion logic has been refactored into common places like the following classes:
class had to be written. OperandResolver, TextFunction, NumericFunction, MultiOperandNumericFunction and FinanceFunction.
</p> </p>
<p>Note that when you are extending from an abstract function class like <p>Note that when you are extending from an abstract function class like
NumericFunction (rather than implementing the interface o.a.p.hssf.record.formula.eval.Function directly) NumericFunction (rather than implementing the interface o.a.p.hssf.record.formula.eval.Function directly)
@ -186,18 +147,16 @@ public class Sqrt extends NumericFunction {
</section> </section>
<section><title>Testing Framework</title> <section><title>Testing Framework</title>
<p>Automated testing of the implemented Function is easy. <p>Automated testing of the implemented Function is easy.
The source code for this is in the file: o.a.p.h.record.formula.GenericFormulaTestCase.java The source code for this is in the file: o.a.p.h.record.formula.TestFormulasFromSpreadsheet.java
This class has a reference to the test xls file (not /a/ test xls, /the/ test xls :) This class has a reference to the Excel test sample file 'FormulaEvalTestData.xls'. In this file,
which may need to be changed for your environment. Once you do that, in the test xls,
locate the entry for the function that you have implemented and enter different tests locate the entry for the function that you have implemented and enter different tests
in a cell in the FORMULA row. Then copy the "value of" the formula that you entered in the in a cell in the FORMULA row. Then copy the "value of" the formula that you entered in the
cell just below it (this is easily done in excel as: cell just below it (this is easily done in excel as:
[copy the formula cell] > [go to cell below] > Edit > Paste Special > Values > "ok"). [copy the formula cell] > [go to cell below] > Edit > Paste Special > Values > "ok").
You can enter multiple such formulas and paste their values in the cell below and the You can enter multiple such formulas and paste their values in the cell below and the
test framework will automatically test if the formula evaluation matches the expected test framework will automatically test if the formula evaluation matches the expected
value (Again, hard to put in words, so if you will, please take time to quickly look value (Please take time to quickly look at the code and the currently entered tests in the
at the code and the currently entered tests in the patch attachment "FormulaEvalTestData.xls" file "FormulaEvalTestData.xls").
file).
</p> </p>
</section> </section>
</body> </body>

View File

@ -20,13 +20,13 @@
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd"> <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">
<document> <document>
<header> <header>
<title>Formula Evaluation</title> <title>Formula Evaluation</title>
<authors> <authors>
<person email="amoweb@yahoo.com" name="Amol Deshmukh" id="AD"/> <person email="amoweb@yahoo.com" name="Amol Deshmukh" id="AD"/>
</authors> </authors>
</header> </header>
<body> <body>
<section><title>Introduction</title> <section><title>Introduction</title>
<p>The POI formula evaluation code enables you to calculate the result of <p>The POI formula evaluation code enables you to calculate the result of
formulas in Excels sheets read-in, or created in POI. This document explains formulas in Excels sheets read-in, or created in POI. This document explains
@ -66,34 +66,33 @@
FileInputStream fis = new FileInputStream("c:/temp/test.xls"); FileInputStream fis = new FileInputStream("c:/temp/test.xls");
HSSFWorkbook wb = new HSSFWorkbook(fis); HSSFWorkbook wb = new HSSFWorkbook(fis);
HSSFSheet sheet = wb.getSheetAt(0); HSSFSheet sheet = wb.getSheetAt(0);
HSSFFormulaEvaluator evaluator = new HSSFFormulaEvaluator(sheet, wb); HSSFFormulaEvaluator evaluator = new HSSFFormulaEvaluator(wb);
// suppose your formula is in B3 // suppose your formula is in B3
CellReference cellReference = new CellReference("B3"); CellReference cellReference = new CellReference("B3");
HSSFRow row = sheet.getRow(cellReference.getRow()); HSSFRow row = sheet.getRow(cellReference.getRow());
HSSFCell cell = row.getCell(cellReference.getCol()); HSSFCell cell = row.getCell((int)cellReference.getCol());
evaluator.setCurrentRow(row);
HSSFFormulaEvaluator.CellValue cellValue = evaluator.evaluate(cell); HSSFFormulaEvaluator.CellValue cellValue = evaluator.evaluate(cell);
switch (cellValue.getCellType()) { switch (cellValue.getCellType()) {
case HSSFCell.CELL_TYPE_BOOLEAN: case HSSFCell.CELL_TYPE_BOOLEAN:
System.out.println(cellValue.getBooleanValue()); System.out.println(cellValue.getBooleanValue());
break; break;
case HSSFCell.CELL_TYPE_NUMERIC: case HSSFCell.CELL_TYPE_NUMERIC:
System.out.println(cellValue.getNumberValue()); System.out.println(cellValue.getNumberValue());
break; break;
case HSSFCell.CELL_TYPE_STRING: case HSSFCell.CELL_TYPE_STRING:
System.out.println(cellValue.getStringValue()); System.out.println(cellValue.getStringValue());
break; break;
case HSSFCell.CELL_TYPE_BLANK: case HSSFCell.CELL_TYPE_BLANK:
break; break;
case HSSFCell.CELL_TYPE_ERROR: case HSSFCell.CELL_TYPE_ERROR:
break; break;
// CELL_TYPE_FORMULA will never happen // CELL_TYPE_FORMULA will never happen
case HSSFCell.CELL_TYPE_FORMULA: case HSSFCell.CELL_TYPE_FORMULA:
break; break;
} }
</source> </source>
<p>Thus using the retrieved value (of type <p>Thus using the retrieved value (of type
@ -119,36 +118,42 @@ switch (cellValue.getCellType()) {
FileInputStream fis = new FileInputStream("/somepath/test.xls"); FileInputStream fis = new FileInputStream("/somepath/test.xls");
HSSFWorkbook wb = new HSSFWorkbook(fis); HSSFWorkbook wb = new HSSFWorkbook(fis);
HSSFSheet sheet = wb.getSheetAt(0); HSSFSheet sheet = wb.getSheetAt(0);
HSSFFormulaEvaluator evaluator = new HSSFFormulaEvaluator(sheet, wb); HSSFFormulaEvaluator evaluator = new HSSFFormulaEvaluator(wb);
// suppose your formula is in B3 // suppose your formula is in B3
CellReference cellReference = new CellReference("B3"); CellReference cellReference = new CellReference("B3");
HSSFRow row = sheet.getRow(cellReference.getRow()); HSSFRow row = sheet.getRow(cellReference.getRow());
HSSFCell cell = row.getCell(cellReference.getCol()); HSSFCell cell = row.getCell((int)cellReference.getCol());
evaluator.setCurrentRow(row);
if (cell!=null) { if (cell!=null) {
switch (<strong>evaluator.evaluateFormulaCell</strong>(cell)) { int valueType = evaluator.evaluateFormulaCell(cell);
if (valueType == -1) {
// Cell was not a formula cell
// but we can read the plain value from the cell just the same as a formula result
valueType = cell.getCellType();
}
switch (valueType) {
case HSSFCell.CELL_TYPE_BOOLEAN: case HSSFCell.CELL_TYPE_BOOLEAN:
System.out.println(cell.getBooleanCellValue()); System.out.println(cell.getBooleanCellValue());
break; break;
case HSSFCell.CELL_TYPE_NUMERIC: case HSSFCell.CELL_TYPE_NUMERIC:
System.out.println(cell.getNumberCellValue()); System.out.println(cell.getNumericCellValue());
break; break;
case HSSFCell.CELL_TYPE_STRING: case HSSFCell.CELL_TYPE_STRING:
System.out.println(cell.getStringCellValue()); System.out.println(cell.getRichStringCellValue().getString());
break; break;
case HSSFCell.CELL_TYPE_BLANK: case HSSFCell.CELL_TYPE_BLANK:
break; break;
case HSSFCell.CELL_TYPE_ERROR: case HSSFCell.CELL_TYPE_ERROR:
System.out.println(cell.getErrorCellValue()); System.out.println(HSSFErrorConstants.getText(cell.getErrorCellValue()));
break; break;
// CELL_TYPE_FORMULA will never occur
case HSSFCell.CELL_TYPE_FORMULA: case HSSFCell.CELL_TYPE_FORMULA:
break; throw new IllegalStateException("Result-type 'formula' cannot occur");
} }
} }
</source> </source>
</section> </section>
@ -163,36 +168,36 @@ if (cell!=null) {
FileInputStream fis = new FileInputStream("/somepath/test.xls"); FileInputStream fis = new FileInputStream("/somepath/test.xls");
HSSFWorkbook wb = new HSSFWorkbook(fis); HSSFWorkbook wb = new HSSFWorkbook(fis);
HSSFSheet sheet = wb.getSheetAt(0); HSSFSheet sheet = wb.getSheetAt(0);
HSSFFormulaEvaluator evaluator = new HSSFFormulaEvaluator(sheet, wb); HSSFFormulaEvaluator evaluator = new HSSFFormulaEvaluator(wb);
// suppose your formula is in B3 // suppose your formula is in B3
CellReference cellReference = new CellReference("B3"); CellReference cellReference = new CellReference("B3");
HSSFRow row = sheet.getRow(cellReference.getRow()); HSSFRow row = sheet.getRow(cellReference.getRow());
HSSFCell cell = row.getCell(cellReference.getCol()); HSSFCell cell = row.getCell((int)cellReference.getCol());
evaluator.setCurrentRow(row);
if (cell!=null) { if (cell!=null) {
switch (<strong>evaluator.evaluateInCell</strong>(cell).getCellType()) { switch (evaluator.evaluateInCell(cell).getCellType()) {
case HSSFCell.CELL_TYPE_BOOLEAN: case HSSFCell.CELL_TYPE_BOOLEAN:
System.out.println(cell.getBooleanCellValue()); System.out.println(cell.getBooleanCellValue());
break; break;
case HSSFCell.CELL_TYPE_NUMERIC: case HSSFCell.CELL_TYPE_NUMERIC:
System.out.println(cell.getNumberCellValue()); System.out.println(cell.getNumericCellValue());
break; break;
case HSSFCell.CELL_TYPE_STRING: case HSSFCell.CELL_TYPE_STRING:
System.out.println(cell.getStringCellValue()); System.out.println(cell.getRichStringCellValue().getString());
break; break;
case HSSFCell.CELL_TYPE_BLANK: case HSSFCell.CELL_TYPE_BLANK:
break; break;
case HSSFCell.CELL_TYPE_ERROR: case HSSFCell.CELL_TYPE_ERROR:
System.out.println(cell.getErrorCellValue()); System.out.println(cell.getErrorCellValue());
break; break;
// CELL_TYPE_FORMULA will never occur // CELL_TYPE_FORMULA will never occur
case HSSFCell.CELL_TYPE_FORMULA: case HSSFCell.CELL_TYPE_FORMULA:
break; break;
} }
} }
</source> </source>
</section> </section>
@ -201,13 +206,12 @@ if (cell!=null) {
<source> <source>
FileInputStream fis = new FileInputStream("/somepath/test.xls"); FileInputStream fis = new FileInputStream("/somepath/test.xls");
HSSFWorkbook wb = new HSSFWorkbook(fis); HSSFWorkbook wb = new HSSFWorkbook(fis);
for(int sheetNum = 0; sheetNum &lt; wb.getNumberOfSheets(); sheetNum++) { HSSFFormulaEvaluator evaluator = new HSSFFormulaEvaluator(wb);
for(int sheetNum = 0; sheetNum < wb.getNumberOfSheets(); sheetNum++) {
HSSFSheet sheet = wb.getSheetAt(sheetNum); HSSFSheet sheet = wb.getSheetAt(sheetNum);
HSSFFormulaEvaluator evaluator = new HSSFFormulaEvaluator(sheet, wb);
for(Iterator rit = sheet.rowIterator(); rit.hasNext();) { for(Iterator rit = sheet.rowIterator(); rit.hasNext();) {
HSSFRow r = (HSSFRow)rit.next(); HSSFRow r = (HSSFRow)rit.next();
evaluator.setCurrentRow(r);
for(Iterator cit = r.cellIterator(); cit.hasNext();) { for(Iterator cit = r.cellIterator(); cit.hasNext();) {
HSSFCell c = (HSSFCell)cit.next(); HSSFCell c = (HSSFCell)cit.next();
@ -218,6 +222,7 @@ for(int sheetNum = 0; sheetNum &lt; wb.getNumberOfSheets(); sheetNum++) {
} }
} }
wb.write(new FileOutputStream("/somepath/changed.xls")); wb.write(new FileOutputStream("/somepath/changed.xls"));
</source> </source>
</section> </section>
</section> </section>
@ -226,12 +231,12 @@ wb.write(new FileOutputStream("/somepath/changed.xls"));
<section><title>Performance Notes</title> <section><title>Performance Notes</title>
<ul> <ul>
<li>Generally you should have to create only one HSSFFormulaEvaluator <li>Generally you should have to create only one HSSFFormulaEvaluator
instance per sheet, but there really is no overhead in creating instance per workbook, but there really is no overhead in creating
multiple HSSFFormulaEvaluators per sheet other than that of the multiple HSSFFormulaEvaluators per workbook other than that of the
HSSFFormulaEvaluator object creation. HSSFFormulaEvaluator object creation.
</li> </li>
<li>Also note that HSSFFormulaEvaluator maintains a reference to <li>Also note that HSSFFormulaEvaluator maintains a reference to
the sheet and workbook, so ensure that the evaluator instance the workbook, so ensure that the evaluator instance
is available for garbage collection when you are done with it is available for garbage collection when you are done with it
(in other words don't maintain long lived reference to (in other words don't maintain long lived reference to
HSSFFormulaEvaluator if you don't really need to - unless HSSFFormulaEvaluator if you don't really need to - unless

View File

@ -144,22 +144,19 @@ public class HSSFFormulaEvaluator {
/** /**
* If cell contains formula, it evaluates the formula, * If cell contains formula, it evaluates the formula, and saves the result of the formula. The
* and saves the result of the formula. The cell * cell remains as a formula cell. If the cell does not contain formula, this method returns -1
* remains as a formula cell. * and leaves the cell unchanged.
* Else if cell does not contain formula, this method leaves *
* the cell unchanged. * Note that the type of the <em>formula result</em> is returned, so you know what kind of
* Note that the type of the formula result is returned, * cached formula result is also stored with the formula.
* so you know what kind of value is also stored with
* the formula.
* <pre> * <pre>
* int evaluatedCellType = evaluator.evaluateFormulaCell(cell); * int evaluatedCellType = evaluator.evaluateFormulaCell(cell);
* </pre> * </pre>
* Be aware that your cell will hold both the formula, * Be aware that your cell will hold both the formula, and the result. If you want the cell
* and the result. If you want the cell replaced with * replaced with the result of the formula, use {@link #evaluateInCell(HSSFCell)}
* the result of the formula, use {@link #evaluateInCell(HSSFCell)}
* @param cell The cell to evaluate * @param cell The cell to evaluate
* @return The type of the formula result (the cell's type remains as HSSFCell.CELL_TYPE_FORMULA however) * @return -1 for non-formula cells, or the type of the <em>formula result</em>
*/ */
public int evaluateFormulaCell(HSSFCell cell) { public int evaluateFormulaCell(HSSFCell cell) {
if (cell == null || cell.getCellType() != HSSFCell.CELL_TYPE_FORMULA) { if (cell == null || cell.getCellType() != HSSFCell.CELL_TYPE_FORMULA) {