mirror of https://github.com/apache/poi.git
added vision
git-svn-id: https://svn.apache.org/repos/asf/jakarta/poi/trunk@352075 13f79535-47bb-0310-9956-ffa450edef68
parent
e19352347b
commit
d2a7221a5d
|
@ -0,0 +1,604 @@
|
|||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.0//EN" "dtd/document-v10.dtd">
|
||||
|
||||
<document>
|
||||
<header>
|
||||
<title>POI 2.0 Vision Document</title>
|
||||
<authors>
|
||||
<person name="Andrew C. Oliver" email="acoliver2@users.sourceforge.net"/>
|
||||
<person name="Marcus W. Johnson" email="mjohnson@apache.org"/>
|
||||
<person name="Glen Stampoultzis" email="gstamp@iprimus.com.au"/>
|
||||
<person name="Nicola Ken Barozzi" email="barozzi@nicolaken.com"/>
|
||||
</authors>
|
||||
</header>
|
||||
|
||||
<body>
|
||||
|
||||
<s1 title="Preface">
|
||||
<p>
|
||||
This is the POI 2.0 cycle vision document. The vision has not
changed and this document is not out of date, but the structure of
the project has changed a bit. We're not going to change the vision
document to reflect this (however proper that may be) because it
would only involve deletion. There is no purpose in providing less
information, provided we give clarification.
|
||||
</p>
|
||||
<p>
|
||||
This document was created before the POI components for
|
||||
<link href="http://xml.apache.org/cocoon">Apache Cocoon</link>
|
||||
were accepted into the Cocoon project itself. It was also
|
||||
written before POI was accepted into Jakarta. So while the
|
||||
vision hasn't changed, some of the components are actually now
|
||||
part of other projects. We'll still be working on them on the
|
||||
same timeline roughly (minus the overhead of coordination with
|
||||
other groups), but they are no longer technically part of the
|
||||
POI project itself.
|
||||
</p>
|
||||
</s1>
|
||||
|
||||
<s1 title="1. Introduction">
|
||||
<s2 title="1.1 Purpose of this document">
|
||||
<p>
|
||||
The purpose of this document is to
|
||||
collect, analyze and define high-level requirements, user needs,
|
||||
and features of the second release of the POI project software.
|
||||
The POI project currently consists of the following components:
|
||||
the HSSF Serializer, the HSSF library and the POIFS library.
|
||||
</p>
|
||||
<ul>
|
||||
<li>
|
||||
The HSSF Serializer is a set of Java classes whose main
|
||||
class supports the Serializer interface from the Cocoon
|
||||
2 project and outputs the serialized data in a format
|
||||
compatible with the spreadsheet program Microsoft Excel
|
||||
'97.
|
||||
</li>
|
||||
<li>
|
||||
The HSSF library is a set of classes for reading and
|
||||
writing the Microsoft Excel 97 file format using pure Java.
|
||||
</li>
|
||||
<li>
|
||||
The POIFS library is a set of classes for reading and
|
||||
writing Microsoft's OLE 2 Compound Document format using
|
||||
pure Java.
|
||||
</li>
|
||||
</ul>
|
||||
<p>By the completion of this release cycle the POI project will also
|
||||
include the HSSF Generator and the HDF library.
|
||||
</p>
|
||||
<ul>
|
||||
<li>The HSSF Generator will be responsible for using HSSF to read
|
||||
in the XLS (Excel 97) file format and create SAX events. The HSSF
|
||||
Generator will support the applicable interfaces specified by the
|
||||
Apache Cocoon 2 project.
|
||||
</li>
|
||||
<li>The HDF library will provide a set of high level interfaces
|
||||
for reading and writing the Microsoft Word 97 file format using pure
|
||||
Java.</li>
|
||||
</ul>
|
||||
|
||||
</s2>
|
||||
|
||||
|
||||
<s2 title="1.2 Project Overview">
|
||||
<p>
|
||||
The first release of the POI project
|
||||
was an astounding success. This release seeks to build on that
|
||||
success by:
|
||||
</p>
|
||||
<ul>
|
||||
<li>
Refactoring POIFS into input and
output classes and adding an event-driven API for reading.
</li>
<li>
Refactoring HSSF for greater
performance and adding an event-driven API for reading.
</li>
<li>
Extending HSSF by adding the ability to read and write formulas.
</li>
<li>
Extending HSSF by adding the ability to read and write
user-defined styles.
</li>
<li>
Creating a Cocoon 2 Generator for HSSF using the same tags
as the HSSF Serializer.
</li>
<li>
Creating a new library (HDF) for reading and writing the
Microsoft Word DOC format.
</li>
<li>
Refactoring the HSSFSerializer into a separate extensible
POIFSSerializer and HSSFSerializer.
</li>
<li>
Providing the ability to create Excel charts (write only).
</li>
|
||||
</ul>
|
||||
</s2>
|
||||
</s1>
|
||||
<s1 title="2. User Description">
|
||||
<s2 title="2.1 User/Market Demographics">
|
||||
<p>
|
||||
There are a number of enthusiastic
|
||||
users of XML, UNIX and Java technology. Furthermore, the Microsoft
|
||||
solution for outputting Office Document formats often involves
|
||||
actually manipulating the software as an OLE Server. This method
|
||||
provides extremely low performance, extremely high overhead and is
|
||||
only capable of handling one document at a time.
|
||||
</p>
|
||||
<ol>
|
||||
<li>
|
||||
Our intended audience for the HSSF
|
||||
Serializer portion of this project is developers writing reports or
|
||||
data extracts in XML format.
|
||||
</li>
|
||||
<li>
|
||||
Our intended audience for the HSSF
|
||||
library portion of this project is ourselves as we are developing
|
||||
the HSSF Serializer and anyone who needs to read and write Excel
spreadsheets in a non-XML Java environment, or who has specific
needs not addressed by the Serializer.
|
||||
</li>
|
||||
<li>
|
||||
Our intended audience for the
|
||||
POIFS library is ourselves as we are developing the HSSF and HDF
|
||||
libraries and anyone wishing to provide other libraries for
|
||||
reading/writing other file formats utilizing the OLE 2 Compound
|
||||
Document Format in Java.
|
||||
</li>
|
||||
<li>
|
||||
Our intended audience for the HSSF
|
||||
Generator is developers who need to export Excel spreadsheets to
|
||||
XML in a non-proprietary environment.
|
||||
</li>
|
||||
<li>
|
||||
Our intended audience for the HDF
|
||||
library is ourselves, as we will be developing an HDF Serializer in a
|
||||
later release, and anyone wishing to add .DOC file processing and
|
||||
creation to their projects.
|
||||
</li>
|
||||
</ol>
|
||||
</s2>
|
||||
<s2 title="2.2. User environment">
|
||||
<p>
|
||||
The users of this software shall be
|
||||
developers in a Java environment on any operating system, or power
|
||||
users who are capable of XML document generation/deployment.
|
||||
</p>
|
||||
</s2>
|
||||
<s2 title="2.3. Key User Needs">
|
||||
<p>
|
||||
The HSSF library currently requires a
|
||||
full object representation to be created before reading values. This
|
||||
results in very high memory utilization. We need to reduce this
|
||||
substantially for reading. It would be preferable to do this for
|
||||
writing, but it may not be possible due to the constraints imposed by
|
||||
the file format itself. Memory utilization during read is our top
|
||||
user complaint.
|
||||
</p>
|
||||
<p>
|
||||
The POIFS library currently requires a
|
||||
full object representation to be created before reading values. This
|
||||
results in very high memory utilization. We need to reduce this
|
||||
substantially for reading.
|
||||
</p>
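<p>
The difference between a full object representation and an
event-driven read can be sketched as follows. This is a minimal
illustration only, not the HSSF or POIFS API: the CellListener
interface, the RECORDS array and the method names are all
hypothetical, and a plain array stands in for a record stream. The
point is that the event style hands each value to a callback and
then forgets it, so memory use no longer grows with the size of
the file.
</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

public class EventReadSketch {

    /** Hypothetical callback, invoked once per cell as it is decoded. */
    interface CellListener {
        void onCell(int row, int column, double value);
    }

    /** Hypothetical stand-in for a record stream: each entry is {row, column, value}. */
    static final double[][] RECORDS = {
        {0, 0, 1.5}, {0, 1, 2.5}, {1, 0, 4.0}
    };

    /** Full-object style: every cell is materialized before the caller sees any of them. */
    static List<double[]> readAll(double[][] records) {
        List<double[]> cells = new ArrayList<>();
        for (double[] record : records) {
            cells.add(record.clone());
        }
        return cells;
    }

    /** Event style: each cell is handed to the listener and not retained by the reader. */
    static void readEvents(double[][] records, CellListener listener) {
        for (double[] record : records) {
            listener.onCell((int) record[0], (int) record[1], record[2]);
        }
    }

    /** Sums all values using the event style; only the running total is kept. */
    static double sumWithEvents(double[][] records) {
        double[] total = {0.0};
        readEvents(records, (row, column, value) -> total[0] += value);
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println("cells held in memory (full-object): " + readAll(RECORDS).size());
        System.out.println("sum via events: " + sumWithEvents(RECORDS));
    }
}
```
]]></source>
<p>
A caller that only needs an aggregate, or a handful of cells, can
use the listener form and never hold the whole sheet at once.
</p>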
|
||||
<p>
|
||||
The HSSF library currently ignores
|
||||
formula cells and identifies them as "UnknownRecord" at the
|
||||
lower level of the API. We must provide a way to read and write
|
||||
formulas. This is now the top requested feature.
|
||||
</p>
|
||||
<p>
|
||||
The HSSF library currently does not support
|
||||
charts. This is a key requirement of some users who wish to use HSSF
|
||||
in a reporting engine.
|
||||
</p>
|
||||
<p>
|
||||
The HSSF Serializer currently does not
|
||||
provide serialization for cell styling. Users will want stylish
|
||||
spreadsheets to result from their XML.
|
||||
</p>
|
||||
<p>
|
||||
There is currently no way to generate
|
||||
the XML from an XLS that is consistent with the format used by the
|
||||
HSSF Serializer.
|
||||
</p>
|
||||
<p>
|
||||
There should be a way to read and write
|
||||
the DOC file format using pure Java.
|
||||
</p>
|
||||
|
||||
</s2>
|
||||
<s2 title="2.4. Alternatives and Competition">
|
||||
<p>
|
||||
Alternatives to using HSSF to manipulate Excel files include:
|
||||
</p>
|
||||
<ol>
|
||||
<li>Buy the $10,000 Formula 1 library
|
||||
(<link href="http://www.f1j.com/">www.tidestone.com</link>)
|
||||
now owned by Actuate, and accept its crude API and limitations.
|
||||
</li>
|
||||
<li>Give up XML, Java, and operating system independence, and
|
||||
write Visual Basic code in a Microsoft Windows based environment.
|
||||
</li>
|
||||
<li>Try writing output in Microsoft's poorly documented XHTML
|
||||
for Office format.
|
||||
</li>
|
||||
</ol>
|
||||
<p>
|
||||
There is also a decent library for
|
||||
reading Excel documents written by Andy Khan called xlReader
|
||||
(<link href="http://www.sourceforge.net/projects/xlrd">http://www.sourceforge.net/projects/xlrd</link>).
|
||||
It does not provide write ability.
|
||||
</p>
|
||||
<p>
|
||||
There are a number of Perl and C alternatives.
|
||||
None are consistent.
|
||||
</p>
|
||||
</s2>
|
||||
</s1>
|
||||
<s1 title="3. Project Overview">
|
||||
<s2 title="3.1. Project Perspective">
|
||||
<p>
|
||||
The produced code shall be licensed by
|
||||
the Apache License as used by the Cocoon 2 project (APL 1.1) and
|
||||
maintained at <link href="http://poi.sourceforge.net/">http://poi.sourceforge.net</link>
and <link href="http://sourceforge.net/projects/poi">http://sourceforge.net/projects/poi</link>.
|
||||
It is our hope to at some point integrate with the various Apache
|
||||
projects (xml.apache.org and jakarta.apache.org), at which point we'd
|
||||
turn the copyright over to them.
|
||||
</p>
|
||||
</s2>
|
||||
<s2 title="3.2. Project Position Statement">
|
||||
<p>
|
||||
For developers in a Java and/or XML
|
||||
environment this project will provide all the tools necessary for
|
||||
outputting XML data in the Microsoft Excel format. This project seeks
|
||||
to make the use of Microsoft Windows based servers unnecessary for
|
||||
file format considerations and to fully document the OLE 2 Compound
|
||||
Document format. The project aims not only to provide the tools for
|
||||
serializing XML to Excel and Word file formats and the tools for
|
||||
writing to those file formats from Java, but also to provide the
|
||||
tools for later projects to convert other OLE 2 Compound Document
|
||||
formats to pure Java APIs.
|
||||
</p>
|
||||
</s2>
|
||||
<s2 title="3.3. Summary of Capabilities">
|
||||
<p>
|
||||
HSSF Serializer for Apache Cocoon 2
|
||||
</p>
|
||||
<table>
|
||||
<tr>
|
||||
<td>
|
||||
Benefit
|
||||
</td>
|
||||
<td>
|
||||
Supporting Features
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>
|
||||
Ability to serialize styles from XML spreadsheets.
|
||||
</td>
|
||||
<td>
|
||||
HSSFSerializer will support styles.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>
|
||||
Ability to read and write formulas in XLS files.
|
||||
</td>
|
||||
<td>
|
||||
HSSF will support reading/writing formulas.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>
|
||||
Ability to output in MS Word on any platform using Java.
|
||||
</td>
|
||||
<td>
|
||||
The project will develop an API that outputs in Word format
|
||||
using pure Java.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>
|
||||
Enhance performance for reading and writing XLS files.
|
||||
</td>
|
||||
<td>
|
||||
HSSF will undergo a number of performance enhancements. HSSF
|
||||
will include a new event-based API for reading XLS files. POIFS
|
||||
will support a new event-based API for reading OLE2 CDF files.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>
|
||||
Ability to generate XML from XLS files
|
||||
</td>
|
||||
<td>
|
||||
The project will develop an HSSF Generator.
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>
|
||||
The ability to generate charts
|
||||
</td>
|
||||
<td>
|
||||
HSSF will provide low level support for chart records as well
|
||||
as high level API support for generating charts. The ability
|
||||
to read chart information will not initially be provided.
|
||||
</td>
|
||||
</tr>
|
||||
|
||||
</table>
|
||||
</s2>
|
||||
<s2 title="3.4. Assumptions and Dependencies">
|
||||
<ul>
|
||||
<li>
|
||||
The HSSF Serializer and Generator
|
||||
will support the Gnumeric 1.0 XML tag language.
|
||||
</li>
|
||||
<li>
|
||||
The HSSF Generator and HSSF
|
||||
Serializer will be mutually validating. It should be possible to
|
||||
have an XLS file created by the Serializer run through the Generator
|
||||
and the output back through the Serializer (via the Cocoon pipeline)
|
||||
and get the same file or a reasonable facsimile (no one cares if it
differs by the order of the binary records in some minor,
visually undetectable manner).
|
||||
</li>
|
||||
<li>
|
||||
The HSSF Generator will run on any
|
||||
Java 2 supporting platform with Apache Cocoon 2 installed along with
|
||||
the HSSF and POIFS APIs.
|
||||
</li>
|
||||
<li>
|
||||
The HSSF Serializer will run on
|
||||
any Java 2 supporting platform with Apache Cocoon 2 installed along
|
||||
with the HSSF and POIFS APIs.
|
||||
</li>
|
||||
<li>
|
||||
The HDF API requires a Java 2
|
||||
implementation and the POIFS API.
|
||||
</li>
|
||||
<li>
|
||||
The HSSF API requires a Java 2
|
||||
implementation and the POIFS API.
|
||||
</li>
|
||||
<li>
|
||||
The POIFS API requires a Java 2
|
||||
implementation.
|
||||
</li>
|
||||
|
||||
</ul>
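<p>
The "mutually validating" assumption above amounts to a round-trip
check: Serializer, then Generator, then Serializer again should
yield the same file. The sketch below shows only the shape of such
a check. Everything in it is hypothetical: serialize and generate
are toy stand-ins that use a tab-separated string in place of an
XLS file, not the real Cocoon components, and cells are assumed to
contain no tabs or newlines.
</p>
<source><![CDATA[
```java
public class RoundTripSketch {

    /** Hypothetical stand-in for the Serializer: rows of cells to an opaque "file" string. */
    static String serialize(String[][] rows) {
        StringBuilder file = new StringBuilder();
        for (String[] row : rows) {
            file.append(String.join("\t", row)).append('\n');
        }
        return file.toString();
    }

    /** Hypothetical stand-in for the Generator: "file" string back to rows of cells. */
    static String[][] generate(String file) {
        String[] lines = file.split("\n");
        String[][] rows = new String[lines.length][];
        for (int i = 0; i < lines.length; i++) {
            // limit -1 keeps trailing empty cells within a row
            rows[i] = lines[i].split("\t", -1);
        }
        return rows;
    }

    /** True when Serializer, Generator, Serializer reproduces the first file exactly. */
    static boolean roundTrips(String[][] rows) {
        String first = serialize(rows);
        String second = serialize(generate(first));
        return first.equals(second);
    }

    public static void main(String[] args) {
        String[][] sheet = {{"name", "total"}, {"widgets", "42"}};
        System.out.println("round trip preserved the file: " + roundTrips(sheet));
    }
}
```
]]></source>
<p>
A real check would compare at the level the text allows for, that
is, tolerate differences in record order that are not visible to
the user, rather than require byte-for-byte equality as this toy does.
</p>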
|
||||
</s2>
|
||||
</s1>
|
||||
<s1 title="4. Project Features">
|
||||
<p>
|
||||
Enhancements to the POIFS API will
|
||||
include:
|
||||
</p>
|
||||
<ul>
|
||||
<li>
|
||||
An event driven API for reading
|
||||
POIFS Filesystems.
|
||||
</li>
|
||||
<li>
|
||||
A low-level API for
|
||||
creating/manipulating POI filesystems.
|
||||
</li>
|
||||
<li>
|
||||
Code improvements supporting
|
||||
greater separation between read and write structures.
|
||||
</li>
|
||||
</ul>
|
||||
<p>
|
||||
Enhancements to the HSSF API will
|
||||
include:
|
||||
</p>
|
||||
<ul>
|
||||
<li>
|
||||
An event driven API for reading
|
||||
XLS files.
|
||||
</li>
|
||||
<li>
|
||||
Performance improvements.
|
||||
</li>
|
||||
<li>
|
||||
Formula support (read/write)
|
||||
</li>
|
||||
<li>
|
||||
Support for user-defined data
|
||||
formats
|
||||
</li>
|
||||
<li>
|
||||
Better documentation of the file
|
||||
format and structure.
|
||||
</li>
|
||||
<li>
|
||||
An API for creation of charts.
|
||||
</li>
|
||||
</ul>
|
||||
<p>
|
||||
The HSSF Generator will include:
|
||||
</p>
|
||||
<ul>
|
||||
<li>
|
||||
A set of classes supporting the
|
||||
Cocoon 2 Generator interfaces providing a method for reading XLS
|
||||
files and outputting SAX events.
|
||||
</li>
|
||||
<li>
|
||||
The same tag format used by the
|
||||
HSSFSerializer in any given release.
|
||||
</li>
|
||||
</ul>
|
||||
<p>
|
||||
The HDF API will include:
|
||||
</p>
|
||||
<ul>
|
||||
<li>
|
||||
An event driven API for reading
|
||||
DOC files.
|
||||
</li>
|
||||
<li>
|
||||
A set of high and low level APIs
|
||||
for reading and writing DOC files.
|
||||
</li>
|
||||
<li>
|
||||
Documentation of the DOC file
|
||||
format or enhancements to existing documentation.
|
||||
</li>
|
||||
</ul>
|
||||
</s1>
|
||||
<s1 title="5. Other Product Requirements">
|
||||
<s2 title="5.1. Applicable Standards">
|
||||
<p>
|
||||
All Java code will be 100% pure Java.
|
||||
</p>
|
||||
</s2>
|
||||
<s2 title="5.2. System Requirements">
|
||||
<p>
|
||||
The minimum system requirements for the POIFS API are:
|
||||
</p>
|
||||
<ul>
|
||||
<li>64 Mbytes memory</li>
|
||||
<li>Java 2 environment</li>
|
||||
<li>Pentium or better processor (or equivalent on other platforms)</li>
|
||||
</ul>
|
||||
<p>
|
||||
The minimum system requirements for the HSSF API are:
|
||||
</p>
|
||||
<ul>
|
||||
<li>64 Mbytes memory</li>
|
||||
<li>Java 2 environment</li>
|
||||
<li>Pentium or better processor (or equivalent on other platforms)</li>
|
||||
<li>POIFS API</li>
|
||||
</ul>
|
||||
<p>
|
||||
The minimum system requirements for the HDF API are:
|
||||
</p>
|
||||
<ul>
|
||||
<li>64 Mbytes memory</li>
|
||||
<li>Java 2 environment</li>
|
||||
<li>Pentium or better processor (or equivalent on other platforms)</li>
|
||||
<li>POIFS API</li>
|
||||
</ul>
|
||||
|
||||
<p>
|
||||
The minimum system requirements for the HSSF Serializer are:
|
||||
</p>
|
||||
<ul>
|
||||
<li>64 Mbytes memory</li>
|
||||
<li>Java 2 environment</li>
|
||||
<li>Pentium or better processor (or equivalent on other platforms)</li>
|
||||
<li>Cocoon 2</li>
|
||||
<li>HSSF API</li>
|
||||
<li>POIFS API</li>
|
||||
</ul>
|
||||
</s2>
|
||||
<s2 title="5.3. Performance Requirements">
|
||||
<p>
|
||||
All components must perform well enough
|
||||
to be practical for use in a webserver environment (especially
|
||||
the "killer trio": Cocoon2/Tomcat/Apache combo).
|
||||
</p>
|
||||
</s2>
|
||||
<s2 title="5.4. Environmental Requirements">
|
||||
<p>
|
||||
The software will run primarily in
|
||||
developer environments. We should make some allowances for
|
||||
not-highly-technical users to write XML documents for the HSSF
|
||||
Serializer. All other components will assume intermediate Java 2
|
||||
knowledge. No XML knowledge will be required except for using the
|
||||
HSSF Serializer. As much documentation as is practical shall be
provided for all components as XML is relatively new, and the
|
||||
concepts introduced for writing spreadsheets and to POI filesystems
|
||||
will be brand new to Java and many Java developers.
|
||||
</p>
|
||||
</s2>
|
||||
</s1>
|
||||
<s1 title="6. Documentation Requirements">
|
||||
<s2 title="6.1 POI Filesystem">
|
||||
<p>
|
||||
The filesystem as read and written by
|
||||
POI shall be fully documented and explained so that the average Java
|
||||
developer can understand it.
|
||||
</p>
|
||||
</s2>
|
||||
<s2 title="6.2. POI API">
|
||||
<p>
|
||||
The POI API will be fully documented
|
||||
through Javadoc. A walkthrough of using the high level POI API shall
|
||||
be provided. No documentation outside of the Javadoc shall be
|
||||
provided for the low-level POI APIs.
|
||||
</p>
|
||||
</s2>
|
||||
<s2 title="6.3. HSSF File Format">
|
||||
<p>
|
||||
The HSSF File Format as implemented by
|
||||
the HSSF API will be fully documented. No documentation will be
|
||||
provided for features that are not supported by HSSF API that are
|
||||
supported by the Excel 97 File Format. Care will be taken not to
|
||||
infringe on any "legal stuff". Additionally, we are
|
||||
collaborating with the fine folks at OpenOffice.org on
|
||||
*free* documentation of the format.
|
||||
</p>
|
||||
</s2>
|
||||
<s2 title="6.4. HSSF API">
|
||||
<p>
|
||||
The HSSF API will be documented by
|
||||
Javadoc. A walkthrough of using the high level HSSF API shall be
|
||||
provided. No documentation outside of the Javadoc shall be provided
|
||||
for the low level HSSF APIs.
|
||||
</p>
|
||||
</s2>
|
||||
<s2 title="6.5 HDF API">
|
||||
<p>
|
||||
The HDF API will be documented by
|
||||
Javadoc. A walkthrough of using the high level HDF API shall be
|
||||
provided. No documentation outside of the Javadoc shall be provided
|
||||
for the low level HDF APIs.
|
||||
</p>
|
||||
</s2>
|
||||
<s2 title="6.6 HSSF Serializer">
|
||||
<p>
|
||||
The HSSF Serializer will be documented
|
||||
by Javadoc.
|
||||
</p>
|
||||
</s2>
|
||||
<s2 title="6.7 HSSF Generator">
|
||||
<p>
|
||||
The HSSF Generator will be documented
|
||||
by Javadoc.
|
||||
</p>
|
||||
</s2>
|
||||
<s2 title="6.8 HSSF Serializer Tag language">
|
||||
<p>
|
||||
The XML tag language along with
|
||||
function and usage shall be fully documented. Examples will be
|
||||
provided as well.
|
||||
</p>
|
||||
</s2>
|
||||
</s1>
|
||||
<s1 title="7. Terminology">
|
||||
<s2 title="7.1 Filesystem">
|
||||
<p>
|
||||
filesystem shall refer only to the POI formatted archive.
|
||||
</p>
|
||||
</s2>
|
||||
<s2 title="7.2 File">
|
||||
<p>
|
||||
file shall refer to the embedded data stream within a
|
||||
POI filesystem. This will be the actual embedded document.
|
||||
</p>
|
||||
</s2>
|
||||
</s1>
|
||||
</body>
|
||||
</document>
|
||||
|