2003-01-28 23:20:19 -05:00
|
|
|
|
PEP: 305
|
2003-01-28 23:23:26 -05:00
|
|
|
|
Title: CSV File API
|
2003-01-28 23:20:19 -05:00
|
|
|
|
Version: $Revision$
|
|
|
|
|
Last-Modified: $Date$
|
2008-12-29 22:48:55 -05:00
|
|
|
|
Author: Kevin Altis <altis@semi-retired.com>,
|
|
|
|
|
Dave Cole <djc@object-craft.com.au>,
|
|
|
|
|
Andrew McNamara <andrewm@object-craft.com.au>,
|
|
|
|
|
Skip Montanaro <skip@pobox.com>,
|
|
|
|
|
Cliff Wells <LogiplexSoftware@earthlink.net>
|
2022-02-27 17:46:36 -05:00
|
|
|
|
Discussions-To: csv@python.org
|
2004-08-11 17:00:25 -04:00
|
|
|
|
Status: Final
|
2003-01-29 08:36:59 -05:00
|
|
|
|
Type: Standards Track
|
2003-01-28 23:20:19 -05:00
|
|
|
|
Content-Type: text/x-rst
|
|
|
|
|
Created: 26-Jan-2003
|
2022-08-24 18:40:18 -04:00
|
|
|
|
Python-Version: 2.3
|
2003-02-13 14:01:17 -05:00
|
|
|
|
Post-History: 31-Jan-2003, 13-Feb-2003
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Abstract
|
|
|
|
|
========
|
|
|
|
|
|
|
|
|
|
The Comma Separated Values (CSV) file format is the most common import
|
|
|
|
|
and export format for spreadsheets and databases. Although many CSV
|
|
|
|
|
files are simple to parse, the format is not formally defined by a
|
|
|
|
|
stable specification and is subtle enough that parsing lines of a CSV
|
2003-02-09 23:03:49 -05:00
|
|
|
|
file with something like ``line.split(",")`` is eventually bound to
|
|
|
|
|
fail. This PEP defines an API for reading and writing CSV files. It
|
|
|
|
|
is accompanied by a corresponding module which implements the API.
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
|
|
|
|
|
2003-01-30 07:53:40 -05:00
|
|
|
|
To Do (Notes for the Interested and Ambitious)
|
|
|
|
|
==============================================
|
|
|
|
|
|
|
|
|
|
- Better motivation for the choice of passing a file object to the
|
2003-03-22 01:38:51 -05:00
|
|
|
|
constructors. See
|
2017-06-11 15:02:39 -04:00
|
|
|
|
https://mail.python.org/pipermail/csv/2003-January/000179.html
|
2003-01-30 07:53:40 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
- Unicode. ugh.
|
2003-01-30 07:53:40 -05:00
|
|
|
|
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
Application Domain
|
|
|
|
|
==================
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
This PEP is about doing one thing well: parsing tabular data which may
|
|
|
|
|
use a variety of field separators, quoting characters, quote escape
|
|
|
|
|
mechanisms and line endings. The authors intend the proposed module
|
|
|
|
|
to solve this one parsing problem efficiently. The authors do not
|
2003-03-22 01:38:51 -05:00
|
|
|
|
intend to address any of these related topics:
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-03-22 01:38:51 -05:00
|
|
|
|
- data interpretation (is a field containing the string "10" supposed
|
|
|
|
|
to be a string, a float or an int? is it a number in base 10, base
|
|
|
|
|
16 or base 2? is a number in quotes a number or a string?)
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-03-22 01:38:51 -05:00
|
|
|
|
- locale-specific data representation (should the number 1.23 be
|
|
|
|
|
written as "1.23" or "1,23" or "1 23"?) -- this may eventually be
|
|
|
|
|
addressed.
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-03-22 01:38:51 -05:00
|
|
|
|
- fixed width tabular data - can already be parsed reliably.
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Rationale
|
|
|
|
|
=========
|
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
Often, CSV files are formatted simply enough that you can get by
|
|
|
|
|
reading them line-by-line and splitting on the commas which delimit
|
|
|
|
|
the fields. This is especially true if all the data being read is
|
2016-07-11 11:14:08 -04:00
|
|
|
|
numeric. This approach may work for a while, then come back to bite
|
2003-02-09 23:03:49 -05:00
|
|
|
|
you in the butt when somebody puts something unexpected in the data
|
|
|
|
|
like a comma. As you dig into the problem you may eventually come to
|
|
|
|
|
the conclusion that you can solve the problem using regular
|
2016-07-11 11:14:08 -04:00
|
|
|
|
expressions. This will work for a while, then break mysteriously one
|
2003-02-09 23:03:49 -05:00
|
|
|
|
day. The problem grows, so you dig deeper and eventually realize that
|
|
|
|
|
you need a purpose-built parser for the format.
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-01-31 16:49:32 -05:00
|
|
|
|
CSV formats are not well-defined and different implementations have a
|
|
|
|
|
number of subtle corner cases. It has been suggested that the "V" in
|
|
|
|
|
the acronym stands for "Vague" instead of "Values". Different
|
|
|
|
|
delimiters and quoting characters are just the start. Some programs
|
2003-02-09 23:03:49 -05:00
|
|
|
|
generate whitespace after each delimiter which is not part of the
|
|
|
|
|
following field. Others quote embedded quoting characters by doubling
|
|
|
|
|
them, others by prefixing them with an escape character. The list of
|
|
|
|
|
weird ways to do things can seem endless.
|
2003-01-31 16:49:32 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
All this variability means it is difficult for programmers to reliably
|
|
|
|
|
parse CSV files from many sources or generate CSV files designed to be
|
|
|
|
|
fed to specific external programs without a thorough understanding of
|
|
|
|
|
those sources and programs. This PEP and the software which accompany
|
|
|
|
|
it attempt to make the process less fragile.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Existing Modules
|
|
|
|
|
================
|
|
|
|
|
|
|
|
|
|
This problem has been tackled before. At least three modules
|
|
|
|
|
currently available in the Python community enable programmers to read
|
|
|
|
|
and write CSV files:
|
|
|
|
|
|
|
|
|
|
- Object Craft's CSV module [2]_
|
|
|
|
|
|
|
|
|
|
- Cliff Wells' Python-DSV module [3]_
|
|
|
|
|
|
|
|
|
|
- Laurence Tratt's ASV module [4]_
|
|
|
|
|
|
|
|
|
|
Each has a different API, making it somewhat difficult for programmers
|
|
|
|
|
to switch between them. More of a problem may be that they interpret
|
|
|
|
|
some of the CSV corner cases differently, so even after surmounting
|
|
|
|
|
the differences between the different module APIs, the programmer has
|
|
|
|
|
to also deal with semantic differences between the packages.
|
2003-01-31 16:49:32 -05:00
|
|
|
|
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
|
|
|
|
Module Interface
|
|
|
|
|
================
|
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
This PEP supports three basic APIs, one to read and parse CSV files,
|
|
|
|
|
one to write them, and one to identify different CSV dialects to the
|
|
|
|
|
readers and writers.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Reading CSV Files
|
|
|
|
|
-----------------
|
|
|
|
|
|
|
|
|
|
CSV readers are created with the reader factory function::
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-02-02 21:07:37 -05:00
|
|
|
|
obj = reader(iterable [, dialect='excel']
|
2003-01-31 16:49:32 -05:00
|
|
|
|
[optional keyword args])
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
A reader object is an iterator which takes an iterable object
|
|
|
|
|
returning lines as the sole required parameter. If it supports a
|
|
|
|
|
binary mode (file objects do), the iterable argument to the reader
|
|
|
|
|
function must have been opened in binary mode. This gives the reader
|
|
|
|
|
object full control over the interpretation of the file's contents.
|
|
|
|
|
The optional dialect parameter is discussed below. The reader
|
|
|
|
|
function also accepts several optional keyword arguments which define
|
|
|
|
|
specific format settings for the parser (see the section "Formatting
|
|
|
|
|
Parameters"). Readers are typically used as follows::
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
|
|
|
|
csvreader = csv.reader(file("some.csv"))
|
|
|
|
|
for row in csvreader:
|
|
|
|
|
process(row)
|
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
Each row returned by a reader object is a list of strings or Unicode
|
|
|
|
|
objects.
|
|
|
|
|
|
|
|
|
|
When both a dialect parameter and individual formatting parameters are
|
|
|
|
|
passed to the constructor, first the dialect is queried for formatting
|
|
|
|
|
parameters, then individual formatting parameters are examined.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Writing CSV Files
|
|
|
|
|
-----------------
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
Creating writers is similar::
|
|
|
|
|
|
|
|
|
|
obj = writer(fileobj [, dialect='excel'],
|
2003-01-31 16:49:32 -05:00
|
|
|
|
[optional keyword args])
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
|
|
|
|
A writer object is a wrapper around a file-like object opened for
|
2003-02-09 23:03:49 -05:00
|
|
|
|
writing in binary mode (if such a distinction is made). It accepts
|
|
|
|
|
the same optional keyword parameters as the reader constructor.
|
2003-01-29 09:09:45 -05:00
|
|
|
|
|
|
|
|
|
Writers are typically used as follows::
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
|
|
|
|
csvwriter = csv.writer(file("some.csv", "w"))
|
|
|
|
|
for row in someiterable:
|
2003-02-09 23:03:49 -05:00
|
|
|
|
csvwriter.writerow(row)
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-01-29 09:09:45 -05:00
|
|
|
|
To generate a set of field names as the first row of the CSV file, the
|
|
|
|
|
programmer must explicitly write it, e.g.::
|
|
|
|
|
|
|
|
|
|
csvwriter = csv.writer(file("some.csv", "w"), fieldnames=names)
|
|
|
|
|
csvwriter.write(names)
|
|
|
|
|
for row in someiterable:
|
|
|
|
|
csvwriter.write(row)
|
|
|
|
|
|
2003-01-31 16:49:32 -05:00
|
|
|
|
or arrange for it to be the first row in the iterable being written.
|
|
|
|
|
|
2003-01-29 09:09:45 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
Managing Different Dialects
|
|
|
|
|
---------------------------
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
Because CSV is a somewhat ill-defined format, there are plenty of ways
|
|
|
|
|
one CSV file can differ from another, yet contain exactly the same
|
|
|
|
|
data. Many tools which can import or export tabular data allow the
|
|
|
|
|
user to indicate the field delimiter, quote character, line
|
|
|
|
|
terminator, and other characteristics of the file. These can be
|
|
|
|
|
fairly easily determined, but are still mildly annoying to figure out,
|
|
|
|
|
and make for fairly long function calls when specified individually.
|
2003-01-30 07:11:27 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
To try and minimize the difficulty of figuring out and specifying a
|
|
|
|
|
bunch of formatting parameters, reader and writer objects support a
|
|
|
|
|
dialect argument which is just a convenient handle on a group of these
|
|
|
|
|
lower level parameters. When a dialect is given as a string it
|
|
|
|
|
identifies one of the dialects known to the module via its
|
|
|
|
|
registration functions, otherwise it must be an instance of the
|
|
|
|
|
Dialect class as described below.
|
2003-01-31 16:49:32 -05:00
|
|
|
|
|
2003-01-29 09:09:45 -05:00
|
|
|
|
Dialects will generally be named after applications or organizations
|
2003-02-09 23:03:49 -05:00
|
|
|
|
which define specific sets of format constraints. Two dialects are
|
|
|
|
|
defined in the module as of this writing, "excel", which describes the
|
|
|
|
|
default format constraints for CSV file export by Excel 97 and Excel
|
|
|
|
|
2000, and "excel-tab", which is the same as "excel" but specifies an
|
|
|
|
|
ASCII TAB character as the field delimiter.
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-01-31 16:49:32 -05:00
|
|
|
|
Dialects are implemented as attribute only classes to enable users to
|
2003-02-09 23:03:49 -05:00
|
|
|
|
construct variant dialects by subclassing. The "excel" dialect is a
|
|
|
|
|
subclass of Dialect and is defined as follows::
|
|
|
|
|
|
|
|
|
|
class Dialect:
|
|
|
|
|
# placeholders
|
|
|
|
|
delimiter = None
|
|
|
|
|
quotechar = None
|
|
|
|
|
escapechar = None
|
|
|
|
|
doublequote = None
|
|
|
|
|
skipinitialspace = None
|
|
|
|
|
lineterminator = None
|
|
|
|
|
quoting = None
|
2003-01-30 07:11:27 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
class excel(Dialect):
|
2003-01-30 07:11:27 -05:00
|
|
|
|
delimiter = ','
|
2003-02-02 07:25:23 -05:00
|
|
|
|
quotechar = '"'
|
|
|
|
|
doublequote = True
|
2003-01-30 07:11:27 -05:00
|
|
|
|
skipinitialspace = False
|
|
|
|
|
lineterminator = '\r\n'
|
2003-01-30 08:34:29 -05:00
|
|
|
|
quoting = QUOTE_MINIMAL
|
2003-01-30 07:11:27 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
The "excel-tab" dialect is defined as::
|
2003-01-30 07:11:27 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
class exceltsv(excel):
|
2003-01-30 07:11:27 -05:00
|
|
|
|
delimiter = '\t'
|
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
(For a description of the individual formatting parameters see the
|
|
|
|
|
section "Formatting Parameters".)
|
|
|
|
|
|
|
|
|
|
To enable string references to specific dialects, the module defines
|
|
|
|
|
several functions::
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-01-30 07:11:27 -05:00
|
|
|
|
dialect = get_dialect(name)
|
2003-02-09 23:03:49 -05:00
|
|
|
|
names = list_dialects()
|
|
|
|
|
register_dialect(name, dialect)
|
|
|
|
|
unregister_dialect(name)
|
2003-01-29 09:09:45 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
``get_dialect()`` returns the dialect instance associated with the
|
|
|
|
|
given name. ``list_dialects()`` returns a list of all registered
|
|
|
|
|
dialect names. ``register_dialects()`` associates a string name with
|
|
|
|
|
a dialect class. ``unregister_dialect()`` deletes a name/dialect
|
2017-03-24 17:11:33 -04:00
|
|
|
|
association.
|
2003-01-29 09:09:45 -05:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Formatting Parameters
|
|
|
|
|
---------------------
|
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
In addition to the dialect argument, both the reader and writer
|
|
|
|
|
constructors take several specific formatting parameters, specified as
|
2003-03-22 01:38:51 -05:00
|
|
|
|
keyword parameters. The formatting parameters understood are:
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-03-22 01:38:51 -05:00
|
|
|
|
- ``quotechar`` specifies a one-character string to use as the quoting
|
|
|
|
|
character. It defaults to '"'. Setting this to None has the same
|
|
|
|
|
effect as setting quoting to csv.QUOTE_NONE.
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-03-22 01:38:51 -05:00
|
|
|
|
- ``delimiter`` specifies a one-character string to use as the field
|
|
|
|
|
separator. It defaults to ','.
|
2003-01-30 07:11:27 -05:00
|
|
|
|
|
2003-03-22 01:38:51 -05:00
|
|
|
|
- ``escapechar`` specifies a one-character string used to escape the
|
|
|
|
|
delimiter when quotechar is set to None.
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-03-22 01:38:51 -05:00
|
|
|
|
- ``skipinitialspace`` specifies how to interpret whitespace which
|
|
|
|
|
immediately follows a delimiter. It defaults to False, which means
|
|
|
|
|
that whitespace immediately following a delimiter is part of the
|
|
|
|
|
following field.
|
2003-01-29 09:09:45 -05:00
|
|
|
|
|
2003-03-22 01:38:51 -05:00
|
|
|
|
- ``lineterminator`` specifies the character sequence which should
|
|
|
|
|
terminate rows.
|
2003-01-31 16:49:32 -05:00
|
|
|
|
|
2003-03-22 01:38:51 -05:00
|
|
|
|
- ``quoting`` controls when quotes should be generated by the writer.
|
2003-03-22 01:39:54 -05:00
|
|
|
|
It can take on any of the following module constants:
|
2003-01-30 07:11:27 -05:00
|
|
|
|
|
2003-03-22 01:38:51 -05:00
|
|
|
|
* csv.QUOTE_MINIMAL means only when required, for example, when a
|
|
|
|
|
field contains either the quotechar or the delimiter
|
2003-01-30 07:11:27 -05:00
|
|
|
|
|
2003-03-22 01:38:51 -05:00
|
|
|
|
* csv.QUOTE_ALL means that quotes are always placed around fields.
|
2003-01-30 07:11:27 -05:00
|
|
|
|
|
2003-03-22 01:38:51 -05:00
|
|
|
|
* csv.QUOTE_NONNUMERIC means that quotes are always placed around
|
|
|
|
|
nonnumeric fields.
|
2003-02-02 07:25:23 -05:00
|
|
|
|
|
2003-03-22 01:38:51 -05:00
|
|
|
|
* csv.QUOTE_NONE means that quotes are never placed around fields.
|
2003-01-30 07:11:27 -05:00
|
|
|
|
|
2003-03-22 01:38:51 -05:00
|
|
|
|
- ``doublequote`` controls the handling of quotes inside fields. When
|
|
|
|
|
True two consecutive quotes are interpreted as one during read, and
|
|
|
|
|
when writing, each quote is written as two quotes.
|
2003-01-29 09:09:45 -05:00
|
|
|
|
|
2003-01-28 23:20:19 -05:00
|
|
|
|
When processing a dialect setting and one or more of the other
|
2003-02-09 23:03:49 -05:00
|
|
|
|
optional parameters, the dialect parameter is processed before the
|
|
|
|
|
individual formatting parameters. This makes it easy to choose a
|
|
|
|
|
dialect, then override one or more of the settings without defining a
|
|
|
|
|
new dialect class. For example, if a CSV file was generated by Excel
|
|
|
|
|
2000 using single quotes as the quote character and a colon as the
|
|
|
|
|
delimiter, you could create a reader like::
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-01-31 16:49:32 -05:00
|
|
|
|
csvreader = csv.reader(file("some.csv"), dialect="excel",
|
2003-02-09 23:03:49 -05:00
|
|
|
|
quotechar="'", delimiter=':')
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
|
|
|
|
Other details of how Excel generates CSV files would be handled
|
2003-02-09 23:03:49 -05:00
|
|
|
|
automatically because of the reference to the "excel" dialect.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Reader Objects
|
|
|
|
|
--------------
|
|
|
|
|
|
|
|
|
|
Reader objects are iterables whose next() method returns a sequence of
|
|
|
|
|
strings, one string per field in the row.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Writer Objects
|
|
|
|
|
--------------
|
|
|
|
|
|
|
|
|
|
Writer objects have two methods, writerow() and writerows(). The
|
|
|
|
|
former accepts an iterable (typically a list) of fields which are to
|
|
|
|
|
be written to the output. The latter accepts a list of iterables and
|
|
|
|
|
calls writerow() for each.
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
|
|
|
|
|
2003-01-31 16:49:32 -05:00
|
|
|
|
Implementation
|
|
|
|
|
==============
|
|
|
|
|
|
|
|
|
|
There is a sample implementation available. [1]_ The goal is for it
|
|
|
|
|
to efficiently implement the API described in the PEP. It is heavily
|
|
|
|
|
based on the Object Craft csv module. [2]_
|
|
|
|
|
|
|
|
|
|
|
2003-01-28 23:20:19 -05:00
|
|
|
|
Testing
|
|
|
|
|
=======
|
|
|
|
|
|
2003-01-31 16:49:32 -05:00
|
|
|
|
The sample implementation [1]_ includes a set of test cases.
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Issues
|
|
|
|
|
======
|
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
1. Should a parameter control how consecutive delimiters are
|
|
|
|
|
interpreted? Our thought is "no". Consecutive delimiters should
|
|
|
|
|
always denote an empty field.
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
2. What about Unicode? Is it sufficient to pass a file object gotten
|
|
|
|
|
from codecs.open()? For example::
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
csvreader = csv.reader(codecs.open("some.csv", "r", "cp1252"))
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
csvwriter = csv.writer(codecs.open("some.csv", "w", "utf-8"))
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
In the first example, text would be assumed to be encoded as cp1252.
|
|
|
|
|
Should the system be aggressive in converting to Unicode or should
|
|
|
|
|
Unicode strings only be returned if necessary?
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
In the second example, the file will take care of automatically
|
|
|
|
|
encoding Unicode strings as utf-8 before writing to disk.
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
Note: As of this writing, the csv module doesn't handle Unicode
|
|
|
|
|
data.
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
3. What about alternate escape conventions? If the dialect in use
|
|
|
|
|
includes an ``escapechar`` parameter which is not None and the
|
|
|
|
|
``quoting`` parameter is set to QUOTE_NONE, delimiters appearing
|
|
|
|
|
within fields will be prefixed by the escape character when writing
|
|
|
|
|
and are expected to be prefixed by the escape character when
|
|
|
|
|
reading.
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
4. Should there be a "fully quoted" mode for writing? What about
|
|
|
|
|
"fully quoted except for numeric values"? Both are implemented
|
|
|
|
|
(QUOTE_ALL and QUOTE_NONNUMERIC, respectively).
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
5. What about end-of-line? If I generate a CSV file on a Unix system,
|
|
|
|
|
will Excel properly recognize the LF-only line terminators? Files
|
|
|
|
|
must be opened for reading or writing as appropriate using binary
|
2012-11-18 12:55:10 -05:00
|
|
|
|
mode. Specify the ``lineterminator`` sequence as ``'\r\n'``. The
|
2003-02-09 23:03:49 -05:00
|
|
|
|
resulting file will be written correctly.
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-02-09 23:03:49 -05:00
|
|
|
|
6. What about an option to generate dicts from the reader and accept
|
|
|
|
|
dicts by the writer? See the DictReader and DictWriter classes in
|
|
|
|
|
csv.py.
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-03-22 01:38:51 -05:00
|
|
|
|
7. Are quote character and delimiters limited to single characters?
|
2003-02-09 23:03:49 -05:00
|
|
|
|
For the time being, yes.
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-03-22 01:38:51 -05:00
|
|
|
|
8. How should rows of different lengths be handled? Interpretation of
|
2003-02-09 23:03:49 -05:00
|
|
|
|
the data is the application's job. There is no such thing as a
|
|
|
|
|
"short row" or a "long row" at this level.
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
References
|
|
|
|
|
==========
|
|
|
|
|
|
2003-01-31 16:49:32 -05:00
|
|
|
|
.. [1] csv module, Python Sandbox
|
|
|
|
|
(http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondist/sandbox/csv/)
|
|
|
|
|
|
|
|
|
|
.. [2] csv module, Object Craft
|
|
|
|
|
(http://www.object-craft.com.au/projects/csv)
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-01-31 16:49:32 -05:00
|
|
|
|
.. [3] Python-DSV module, Wells
|
|
|
|
|
(http://sourceforge.net/projects/python-dsv/)
|
2003-01-28 23:20:19 -05:00
|
|
|
|
|
2003-01-31 16:49:32 -05:00
|
|
|
|
.. [4] ASV module, Tratt
|
2003-01-28 23:20:19 -05:00
|
|
|
|
(http://tratt.net/laurie/python/asv/)
|
|
|
|
|
|
|
|
|
|
There are many references to other CSV-related projects on the Web. A
|
|
|
|
|
few are included here.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Copyright
|
|
|
|
|
=========
|
|
|
|
|
|
|
|
|
|
This document has been placed in the public domain.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
..
|
|
|
|
|
Local Variables:
|
|
|
|
|
mode: indented-text
|
|
|
|
|
indent-tabs-mode: nil
|
|
|
|
|
sentence-end-double-space: t
|
|
|
|
|
fill-column: 70
|
|
|
|
|
End:
|