diff --git a/src/documentation/content/xdocs/hdf/book.xml b/src/documentation/content/xdocs/hdf/book.xml deleted file mode 100644 index 10cd38c830..0000000000 --- a/src/documentation/content/xdocs/hdf/book.xml +++ /dev/null @@ -1,12 +0,0 @@ - - - - - - - - - - - - diff --git a/src/documentation/content/xdocs/hdf/docoverview.xml b/src/documentation/content/xdocs/hdf/docoverview.xml deleted file mode 100644 index dd177d6ad2..0000000000 --- a/src/documentation/content/xdocs/hdf/docoverview.xml +++ /dev/null @@ -1,94 +0,0 @@ - - - - -
- HDF - Word file format - - - -
- - -
The Word 97 File Format in semi-plain English - -

The purpose of this document is to give a brief high level overview of the - HDF document format. This document does not go into in-depth technical - detail and is only meant as a supplement to the Microsoft Word 97 Binary - File Format freely available at Wotsit.org.

-

The OLE file format is not discussed in this document. It is assumed that - the reader has a working knowledge of the POIFS API.

- -
Word file structure -

A Word file is made up of the document text and data structures - containing formatting information about the text. Of course, this is a - very simplified illustration. There are fields and macros and other - things that have not been considered. At this stage, HDF is mainly - concerned with formatted text.

-
-
Reading Word files -

HDF begins reading a Word file at the File Information - Block (FIB). This structure records the locations and sizes - of a document's text and data structures. The FIB is located at the - beginning of the main stream.

-
Text -

The document's text is also located in the main stream. Its starting - location is given as FIB.fcMin and its length is given in characters by - FIB.ccpText. These two values alone are not enough to extract the text, - because Unicode text may be intermingled with ASCII - text. That brings us to the piece table.

-

The piece table is used to divide the text into non-Unicode and Unicode - pieces. Its offset and size are given in FIB.fcClx and FIB.lcbClx - respectively. The piece table may contain Property Modifiers (prm). - These are for complex (fast-saved) files and are skipped. Each text piece - contains an offset into the main stream where the text for that piece begins. - If the piece does not use Unicode, a flag bit is set in the stored file offset. - In that case you have to clear the bit and divide by 2 to get the real file - offset.
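
The offset handling for a text piece can be sketched as follows. This is a minimal illustration, not HDF's actual code: the class and method names are hypothetical, and it assumes (from a reading of the Word 97 spec) that the flag is bit 30 (0x40000000) of the stored offset and that a set flag marks 8-bit, non-Unicode text.

```java
// Hypothetical helper; names are illustrative and not part of HDF.
final class PieceOffsetSketch {
    /** Assumption: bit 30 of the stored offset flags 8-bit (non-Unicode) text. */
    static final int EIGHT_BIT_FLAG = 0x40000000;

    /** A piece is treated as Unicode when the flag bit is clear. */
    static boolean isUnicode(int storedFc) {
        return (storedFc & EIGHT_BIT_FLAG) == 0;
    }

    /** Returns the real main-stream offset for a piece's stored offset. */
    static int realOffset(int storedFc) {
        if (isUnicode(storedFc)) {
            return storedFc;                      // stored value is already the offset
        }
        return (storedFc & ~EIGHT_BIT_FLAG) / 2;  // clear the flag, then halve
    }

    /** Number of bytes occupied by a piece holding charCount characters. */
    static int byteLength(int storedFc, int charCount) {
        return isUnicode(storedFc) ? charCount * 2 : charCount;
    }

    public static void main(String[] args) {
        int stored = EIGHT_BIT_FLAG | 0x1000;
        System.out.println("unicode: " + isUnicode(stored));
        System.out.println("offset: 0x" + Integer.toHexString(realOffset(stored)));
    }
}
```

Keeping the flag test in one place means callers only ever see real offsets and byte lengths, whichever encoding a piece uses.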

-
-
Text Formatting -
Stylesheet -

All text formatting is based on styles contained in the StyleSheet. - The StyleSheet is a data structure containing, among other things, style - descriptions. Each style description can contain a paragraph style and - a character style or simply a character style. Each style description - is stored in the file in compressed form. Basically these are deltas - from another style.

-

Eventually, you have to chain back to the nil style which is an - imaginary style with certain implied values.
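
The chaining described above can be sketched as a walk over base-style indices. This is only an illustration under stated assumptions: the class, the method, and the `baseOf` array are hypothetical, and the nil style is assumed to be marked by the index 0x0FFF as in the Word 97 spec.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper; names are illustrative and not part of HDF.
final class StyleChainSketch {
    /** Assumption: this index value marks the imaginary nil style. */
    static final int ISTD_NIL = 0x0FFF;

    /**
     * Returns the style indices to apply, ordered from the style closest
     * to nil down to the requested style. baseOf[i] holds the base style
     * index of style i (hypothetical input extracted from the StyleSheet).
     */
    static int[] resolutionOrder(int[] baseOf, int istd) {
        Deque<Integer> chain = new ArrayDeque<>();
        int current = istd;
        while (current != ISTD_NIL) {
            if (current < 0 || current >= baseOf.length) {
                throw new IllegalArgumentException("unknown style index " + current);
            }
            if (chain.size() >= baseOf.length) {
                throw new IllegalStateException("cycle in base-style chain");
            }
            chain.push(current);        // push puts the newest entry at the front
            current = baseOf[current];
        }
        int[] order = new int[chain.size()];
        int i = 0;
        for (int style : chain) {       // iterates root-most style first
            order[i++] = style;
        }
        return order;
    }

    public static void main(String[] args) {
        int[] baseOf = {ISTD_NIL, 0, 1};
        System.out.println(java.util.Arrays.toString(resolutionOrder(baseOf, 2)));
    }
}
```

Applying each style's deltas in the returned order, starting from the nil style's implied values, would yield the full uncompressed style.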

-
-
Paragraph and Character styles -

Paragraph and Character formatting properties for a document's text are - stored on file as deltas from some base style in the Stylesheet. The - deltas are used to create a complete uncompressed style in memory.

-

Uncompressed paragraph styles are represented by the Paragraph - Properties (PAP) data structure. Uncompressed character styles are - represented by the Character Properties (CHP) data structure. The styles - for the document text are stored in compressed format in the - corresponding Formatted Disk Pages (FKP). A compressed PAP is referred - to as a PAPX and a compressed CHP is a CHPX. The FKP locations are - stored in the bin table. There are separate bin tables for CHPXs and - PAPXs. The bin tables' locations and sizes are stored in the FIB.

-

An FKP is a 512-byte OLE page. It contains the offsets of the beginning - and end of each paragraph/character run in the main stream and the - compressed properties for that interval. The compressed PAPX is based on - its base style in the StyleSheet. The compressed CHPX is based on the - enclosing paragraph's base style in the Stylesheet.
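
Reading the run boundaries out of one page might look like the sketch below. It is a hypothetical illustration, not HDF's implementation. It assumes, from a reading of the Word 97 spec, that the last byte of the page holds the run count, that the page starts with (run count + 1) little-endian 4-byte offsets, and that a bin table page number maps to a file offset of page number times 512.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper; names are illustrative and not part of HDF.
final class FkpSketch {
    static final int PAGE_SIZE = 512;

    /** Assumption: a bin table page number maps to pageNumber * 512. */
    static int pageOffset(int pageNumber) {
        return pageNumber * PAGE_SIZE;
    }

    /** Assumption: the last byte of the page is the number of runs. */
    static int runCount(byte[] page) {
        return page[PAGE_SIZE - 1] & 0xFF;
    }

    /**
     * Returns runCount + 1 main-stream offsets; run i spans
     * [result[i], result[i + 1]).
     */
    static int[] runBoundaries(byte[] page) {
        if (page.length != PAGE_SIZE) {
            throw new IllegalArgumentException("an FKP must be 512 bytes");
        }
        int runs = runCount(page);
        if ((runs + 1) * 4 > PAGE_SIZE - 1) {
            throw new IllegalArgumentException("run count too large: " + runs);
        }
        ByteBuffer buf = ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN);
        int[] offsets = new int[runs + 1];
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = buf.getInt(i * 4);   // absolute read, 4 bytes per offset
        }
        return offsets;
    }

    public static void main(String[] args) {
        byte[] page = new byte[PAGE_SIZE];
        ByteBuffer buf = ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(0, 0x600).putInt(4, 0x610);
        page[PAGE_SIZE - 1] = 1;              // one run
        System.out.println(java.util.Arrays.toString(runBoundaries(page)));
    }
}
```

The compressed properties for each run sit elsewhere in the same page; locating them is left out here because their layout differs between CHPX and PAPX pages.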

-
-
Uncompressing styles and other data structures -

All compressed properties (CHPX, PAPX, SEPX) contain a grpprl. A grpprl - is an array of sprms. A sprm defines a delta from some base property. - There is a table of possible sprms in the Word 97 spec. Each sprm is a - two-byte opcode followed by a parameter. The parameter size depends on - the sprm. Each sprm describes an operation that should be performed on - the base style. After every sprm in the grpprl has been applied to the base - style, you will have the style for the paragraph, character run, - section, etc.
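
Walking a grpprl and applying it to a base style could be sketched like this. The example is a simplified illustration and not HDF's code: the class, the `CharProps` type, and the method names are hypothetical. It assumes, from a reading of the Word 97 spec, that the top three bits of the two-byte sprm code select the parameter size and that 0x0835, 0x0836, and 0x4A43 are the codes for bold, italic, and font size in half-points.

```java
// Hypothetical helper; names are illustrative and not part of HDF.
final class SprmSketch {
    // Assumed sprm codes, taken from the Word 97 spec's sprm table.
    static final int SPRM_C_F_BOLD = 0x0835;
    static final int SPRM_C_F_ITALIC = 0x0836;
    static final int SPRM_C_HPS = 0x4A43;

    /** Tiny stand-in for a CHP; a real one has many more fields. */
    static final class CharProps {
        boolean bold;
        boolean italic;
        int halfPoints;

        CharProps(boolean bold, boolean italic, int halfPoints) {
            this.bold = bold;
            this.italic = italic;
            this.halfPoints = halfPoints;
        }
    }

    /** Parameter size in bytes, assumed to be encoded in the top 3 bits. */
    static int parameterSize(int code, byte[] grpprl, int paramStart) {
        switch (code >>> 13) {
            case 0: case 1: return 1;
            case 2: case 4: case 5: return 2;
            case 3: return 4;
            case 7: return 3;
            // Variable size: a length byte followed by that many bytes.
            // The spec lists special cases that this sketch ignores.
            default: return 1 + (grpprl[paramStart] & 0xFF);
        }
    }

    /** Only 0 (off) and 1 (on) are handled; other values keep the base value. */
    static boolean toggle(boolean current, int param) {
        if (param == 0) return false;
        if (param == 1) return true;
        return current;
    }

    /** Applies every sprm in the grpprl to a copy of the base properties. */
    static CharProps apply(CharProps base, byte[] grpprl) {
        CharProps out = new CharProps(base.bold, base.italic, base.halfPoints);
        int pos = 0;
        while (pos + 2 < grpprl.length) {
            int code = (grpprl[pos] & 0xFF) | ((grpprl[pos + 1] & 0xFF) << 8);
            int paramStart = pos + 2;
            int size = parameterSize(code, grpprl, paramStart);
            if (paramStart + size > grpprl.length) {
                break;                        // truncated grpprl: stop quietly
            }
            switch (code) {
                case SPRM_C_F_BOLD:
                    out.bold = toggle(out.bold, grpprl[paramStart] & 0xFF);
                    break;
                case SPRM_C_F_ITALIC:
                    out.italic = toggle(out.italic, grpprl[paramStart] & 0xFF);
                    break;
                case SPRM_C_HPS:
                    out.halfPoints = (grpprl[paramStart] & 0xFF)
                            | ((grpprl[paramStart + 1] & 0xFF) << 8);
                    break;
                default:
                    break;                    // unknown sprm: skip its parameter
            }
            pos = paramStart + size;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] grpprl = {0x35, 0x08, 0x01, 0x43, 0x4A, 0x18, 0x00};
        CharProps result = apply(new CharProps(false, false, 20), grpprl);
        System.out.println(result.bold + " " + result.italic + " " + result.halfPoints);
    }
}
```

Because the parameter size is derived from the code itself, the loop can step over sprms it does not understand without losing its place in the array.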

-
-
-
-
- -
- diff --git a/src/documentation/content/xdocs/hdf/index.xml b/src/documentation/content/xdocs/hdf/index.xml deleted file mode 100644 index d36d2d078b..0000000000 --- a/src/documentation/content/xdocs/hdf/index.xml +++ /dev/null @@ -1,34 +0,0 @@ - - - - -
- Jakarta POI - HDF -Java APIs with XML manipulate MS-Word - Overview - - - - - -
- - -
Overview - -

HDF is the name of our port of the Microsoft Word 97 (-2002) file format to - pure Java.

-

HDF is still in early development. It is in the - scratchpad section of the - CVS. Source code in the org.apache.poi.hdf.extractor tree is - legacy code. Source in the org.apache.poi.hdf.model - tree is the old legacy code refactored into an object model. Check the How-To - page for detailed examples on using HDF. -

-

- We are looking for developers!!! If you are interested in helping with HDF, - familiarize yourself with the source code and just start coding. Make sure - you read the guidelines for - getting involved.

-
- -
diff --git a/src/documentation/content/xdocs/hdf/projectplan.xml b/src/documentation/content/xdocs/hdf/projectplan.xml deleted file mode 100644 index ffee7c3d5d..0000000000 --- a/src/documentation/content/xdocs/hdf/projectplan.xml +++ /dev/null @@ -1,367 +0,0 @@ - - - - - -

HWPF Milestones

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- Milestones - - Target Date - - Owner -
- Read in a Word document -with minimum formatting -(no lists, tables, footnotes, -endnotes, headers, footers) -and write it back out with the -result viewable in Word -97/2000 - - 07/11/2003 - - Ryan -
- Add support for Lists and -Tables - - 8/15/2003 - -   -
- HWPF 1.0-alpha release with -documentation and examples - - 8/18/2003 - - Praveen/Ryan -
- Add support for Headers, -Footers, endnotes, and -footnotes - - 8/31/2003 - - ? -
- Add support for forms and -mail merge - - September/October 2003 - - ? -
-

HWPF Task Lists

-

Read in a Word document with minimum formatting (no lists, tables, footnotes, -endnotes, headers, footers) and write it back out with the result viewable in Word 97/2000

- - - - - - - - - - - - - - - - - - - - - -
- Task - - Target Date - - Owner -
- Create classes to read and -write low level data -structures with test cases - - 7/10/2003 - - Ryan -
- Create classes to read and -write FontTable and Font -names with test case - - 7/10/2003 - - Praveen -
- Final test - - 7/11/2003 - - Ryan -
-

Develop a user-friendly API so it is fun and easy to read and write Word documents -with Java.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
- Task - - Target Date - - Owner -
- Develop a way for SPRMS to -be compressed and -uncompressed - - - - -
- Override CHPAbstractType -with a concrete class that -exposes attributes with -human readable names - - - - -
- Override PAPAbstractType -with a concrete class that -exposes attributes with -human readable names - - - - -
- Override SEPAbstractType -with a concrete class that -exposes attributes with -human readable names - - - - -
- Override DOPAbstractType -with a concrete class that -exposes attributes with -human readable names - - - - -
- Override TAPAbstractType -with a concrete class that -exposes attributes with -human readable names - - - - -
- Override TCAbstractType -with a concrete class that -exposes attributes with -human readable names - - - - -
- Develop a VerifyIntegrity -class for testing so it is easy -to determine if a Word -Document is well-formed. - - - - -
- Develop general intuitive -API to tie everything together - - - - -
-

Add support for lists and tables

- - - - - - - - - - - - - - - - -
- Task - - Target Date - - Owner -
- Add data structures for -reading and writing list data -with test cases. - - - - -
- Add data structures for -reading and writing tables -with test cases. - - - - -
-

HWPF 1.0-alpha release with documentation and examples

- - - - - - - - - - - - - - - - - - - - - -
- Task - - Target Date - - Owner -
- Document the user model -API - - - - -
- Document the low level -classes - - - - -
- Come up with detailed How-To’s - - - - -
- -