From 935a1711b369b4cdd7a8d573337388b140f3f9e3 Mon Sep 17 00:00:00 2001
From: David Goodger <goodger@python.org>
Date: Wed, 29 Jan 2003 04:20:19 +0000
Subject: [PATCH] Added PEP 305, CSV file API

---
 pep-0000.txt |   8 +-
 pep-0305.txt | 230 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 236 insertions(+), 2 deletions(-)
 create mode 100644 pep-0305.txt

diff --git a/pep-0000.txt b/pep-0000.txt
index 30f970ca9..246651bbf 100644
--- a/pep-0000.txt
+++ b/pep-0000.txt
@@ -105,7 +105,8 @@ Index by Category
  S   301  Package Index and Metadata for Distutils     Jones
  S   302  New Import Hooks                             JvR
  S   303  Extend divmod() for Multiple Divisors        Bellman
- S   304  Controlling generation of bytecode files     Montanaro
+ S   304  Controlling Generation of Bytecode Files     Montanaro
+ I   305  CSV File API                                 Montanaro, Altis, Wells
 
  Finished PEPs (done, implemented in CVS)
 
@@ -299,7 +300,8 @@ Numerical Index
  S   301  Package Index and Metadata for Distutils     Jones
  S   302  New Import Hooks                             JvR
  S   303  Extend divmod() for Multiple Divisors        Bellman
- S   304  Controlling generation of bytecode files     Montanaro
+ S   304  Controlling Generation of Bytecode Files     Montanaro
+ I   305  CSV File API                                 Montanaro, Altis, Wells
  SR  666  Reject Foolish Indentation                   Creighton
 
 
@@ -320,6 +322,7 @@ Owners
     Aahz                     aahz@pobox.com
     Ahlstrom, James C.       jim@interet.com
     Althoff, Jim             james_althoff@i2.com
+    Altis, Kevin             altis@semi-retired.com
     Ascher, David            davida@activestate.com
     Barrett, Paul            barrett@stsci.edu
     Baxter, Anthony          anthony@interlink.com.au
@@ -372,6 +375,7 @@ Owners
     Stein, Greg              gstein@lyra.org
     Tirosh, Oren             oren at hishome.net
     Warsaw, Barry            barry@zope.com
+    Wells, Cliff             LogiplexSoftware@earthlink.net
     Wilson, Greg             gvwilson@ddj.com
     Wouters, Thomas          thomas@xs4all.net
     Yee, Ka-Ping             ping@lfw.org
diff --git a/pep-0305.txt b/pep-0305.txt
new file mode 100644
index 000000000..3f90443d5
--- /dev/null
+++ b/pep-0305.txt
@@ -0,0 +1,230 @@
+PEP: 305
+Title: CSV File API
+Version: $Revision$
+Last-Modified: $Date$
+Author: Skip Montanaro <skip@pobox.com>,
+        Kevin Altis <altis@semi-retired.com>,
+        Cliff Wells <LogiplexSoftware@earthlink.net>
+Status: Draft
+Type: Informational
+Content-Type: text/x-rst
+Created: 26-Jan-2003
+Post-History: 
+
+
+Abstract
+========
+
+The Comma Separated Values (CSV) file format is the most common import
+and export format for spreadsheets and databases.  Although many CSV
+files are simple to parse, the format is not formally defined by a
+stable specification and is subtle enough that parsing lines of a CSV
+file with something like ``line.split(",")`` is bound to fail.  This
+PEP defines an API for reading and writing CSV files which should make
+it possible for programmers to select a CSV module which meets their
+requirements.
+
+
+Existing Modules
+================
+
+Three widely available modules enable programmers to read and write
+CSV files:
+
+- Object Craft's CSV module [1]_
+
+- Cliff Wells's Python-DSV module [2]_
+
+- Laurence Tratt's ASV module [3]_
+
+Each has a different API, making it somewhat difficult for programmers
+to switch between them.  A larger problem may be that they interpret
+some of the CSV corner cases differently, so even after surmounting
+the differences in the module APIs, the programmer must also deal
+with semantic differences between the packages.
+
+
+Rationale
+=========
+
+By defining common APIs for reading and writing CSV files, we make it
+easier for programmers to choose an appropriate module to suit their
+needs, and make it easier to switch between modules if their needs
+change.  This PEP also forms a set of requirements for creation of a
+module which will hopefully be incorporated into the Python
+distribution.
+
+
+Module Interface
+================
+
+The module supports two basic APIs, one for reading and one for
+writing.  The reading interface is::
+
+    reader(fileobj [, dialect='excel2000']
+                   [, quotechar='"']
+                   [, delimiter=',']
+                   [, skipinitialspace=False])
+
+The reader constructor takes a file-like object opened for reading as
+its sole required parameter and returns an iterable reader object.  It
+also accepts four optional parameters (discussed below).  Readers are
+typically used as follows::
+
+    csvreader = csv.reader(file("some.csv"))
+    for row in csvreader:
+        process(row)
+
+The writing interface is similar::
+
+    writer(fileobj [, dialect='excel2000']
+                   [, quotechar='"']
+                   [, delimiter=',']
+                   [, skipinitialspace=False])
+
+A writer object is a wrapper around a file-like object opened for
+writing.  It accepts the same four optional parameters as the reader
+constructor.  Writers are typically used as follows::
+
+    csvwriter = csv.writer(file("some.csv", "w"))
+    for row in someiterable:
+        csvwriter.write(row)
+
+
+Optional Parameters
+-------------------
+
+Both the reader and writer constructors take four optional keyword
+parameters:
+
+- dialect is an easy way of specifying a complete set of format
+  constraints for a reader or writer.  Most people will know what
+  application generated a CSV file or what application will process
+  the CSV file they are generating, but not the precise settings
+  necessary.  The only dialect defined initially is "excel2000".  The
+  dialect parameter is interpreted in a case-insensitive manner.
+
+- quotechar specifies a one-character string to use as the quoting
+  character.  It defaults to '"'.
+
+- delimiter specifies a one-character string to use as the field
+  separator.  It defaults to ','.
+
+- skipinitialspace specifies how to interpret whitespace which
+  immediately follows a delimiter.  It defaults to False, which means
+  that whitespace immediately following a delimiter is part of the
+  following field.
+
+When a dialect is given together with one or more of the other
+optional parameters, the dialect is applied first and the other
+parameters then override its settings.  This makes it easy to choose
+a dialect and adjust individual settings.  For example, if a CSV file
+was generated by Excel 2000 using single quotes as the quote
+character and TAB as the delimiter, you could create a reader like::
+
+    csvreader = csv.reader(file("some.csv"), dialect="excel2000",
+                           quotechar="'", delimiter='\t')
+
+Other details of how Excel generates CSV files would be handled
+automatically.
+
+
+Testing
+=======
+
+TBD.
+
+
+
+Issues
+======
+
+- Should a parameter control how consecutive delimiters are
+  interpreted?  Our thought is "no".  Consecutive delimiters should
+  always denote an empty field.
+
+- What about Unicode?  Is it sufficient to pass a file object obtained
+  from codecs.open()?  For example::
+
+    csvreader = csv.reader(codecs.open("some.csv", "r", "cp1252"))
+
+    csvwriter = csv.writer(codecs.open("some.csv", "w", "utf-8"))
+
+  In the first example, text would be assumed to be encoded as cp1252.
+  Should the system be aggressive in converting to Unicode or should
+  Unicode strings only be returned if necessary?
+
+  In the second example, the file will take care of automatically
+  encoding Unicode strings as utf-8 before writing to disk.
+
+- What about alternate escape conventions?  When Excel exports a file,
+  it appears only the field delimiter needs to be escaped.  It
+  accomplishes this by quoting the entire field, then doubling any
+  quote characters which appear in the field.  It also quotes a field
+  if the first character is a quote character.  It would seem we need
+  to support two modes: escape-by-quoting and escape-by-prefix.  In
+  addition, for the second mode, we'd have to specify the escape
+  character (presumably defaulting to a backslash character).
+
+- Should there be a "fully quoted" mode for writing?  What about
+  "fully quoted except for numeric values"?
+
+- What about end-of-line?  If I generate a CSV file on a Unix system,
+  will Excel properly recognize the LF-only line terminators?
+
+- What about conversion to other file formats?  Is the list-of-lists
+  output from the csvreader sufficient to feed into other writers?
+
+- What about an option to generate list-of-dict output from the reader
+  and accept list-of-dicts by the writer?  This makes manipulating
+  individual rows easier since each one is independent, but you lose
+  field order when writing and have to tell the writer object the
+  order the fields should appear in the file.
+
+- Are quote character and delimiters limited to single characters?  I
+  had a client not that long ago who wrote their own flat file format
+  with a delimiter of ":::".
+
+- How should rows of different lengths be handled?  The options seem
+  to be:
+
+  * raise an exception when a row is encountered whose length differs
+    from the previous row
+
+  * silently return short rows
+
+  * allow the caller to specify the desired row length and what to do
+    when rows of a different length are encountered: ignore, truncate,
+    pad, raise exception, etc.
+
+
+References
+==========
+
+.. [1] csv module, Object Craft
+   (http://www.object-craft.com.au/projects/csv)
+
+.. [2] Python-DSV module, Wells
+   (http://sourceforge.net/projects/python-dsv/)
+
+.. [3] ASV module, Tratt
+   (http://tratt.net/laurie/python/asv/)
+
+There are many references to other CSV-related projects on the Web.  A
+few are included here.
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   End:
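
The Abstract's warning about ``line.split(",")`` and the dialect-override
behaviour described under Optional Parameters can be illustrated with the
``csv`` module that ships with current Python, which grew out of this
proposal.  This is a minimal sketch, not the API drafted in the PEP text:
it assumes the shipped module, whose default dialect is named "excel"
(not "excel2000") and whose writer method is ``writerow`` (not ``write``).
The sample data is made up for illustration.

```python
import csv
import io

# Made-up sample row: one quoted field contains the delimiter, another
# contains a doubled quote character.
line = '1,"Smith, John","He said ""hi"""\n'

# Naive splitting cuts the quoted field apart at its embedded comma.
naive = line.rstrip("\n").split(",")

# A CSV-aware reader honours the quoting rules instead.
parsed = next(csv.reader(io.StringIO(line)))

# Choose a dialect first, then override individual settings.
# "excel" is the shipped module's dialect name (an assumption relative
# to the "excel2000" name used in the draft).
tabbed = "a\t'b\tc'\td\n"
overridden = next(csv.reader(io.StringIO(tabbed), dialect="excel",
                             quotechar="'", delimiter="\t"))

# Writing: a field containing the delimiter is quoted on output.
out = io.StringIO()
csv.writer(out).writerow(["x", "y,z"])
written = out.getvalue()

print(naive)
print(parsed)
print(overridden)
print(repr(written))
```

The naive split yields four fragments where the row has three fields,
which is the failure mode the Abstract describes; the reader returns the
three fields with quoting removed.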