<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="color-scheme" content="light dark">
<title>PEP 305 CSV File API | peps.python.org</title>
<link rel="shortcut icon" href="../_static/py.png">
<link rel="canonical" href="https://peps.python.org/pep-0305/">
<link rel="stylesheet" href="../_static/style.css" type="text/css">
<link rel="stylesheet" href="../_static/mq.css" type="text/css">
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" media="(prefers-color-scheme: light)" id="pyg-light">
<link rel="stylesheet" href="../_static/pygments_dark.css" type="text/css" media="(prefers-color-scheme: dark)" id="pyg-dark">
<link rel="alternate" type="application/rss+xml" title="Latest PEPs" href="https://peps.python.org/peps.rss">
<meta property="og:title" content='PEP 305 CSV File API | peps.python.org'>
<meta property="og:description" content="The Comma Separated Values (CSV) file format is the most common import and export format for spreadsheets and databases. Although many CSV files are simple to parse, the format is not formally defined by a stable specification and is subtle enough that...">
<meta property="og:type" content="website">
<meta property="og:url" content="https://peps.python.org/pep-0305/">
<meta property="og:site_name" content="Python Enhancement Proposals (PEPs)">
<meta property="og:image" content="https://peps.python.org/_static/og-image.png">
<meta property="og:image:alt" content="Python PEPs">
<meta property="og:image:width" content="200">
<meta property="og:image:height" content="200">
<meta name="description" content="The Comma Separated Values (CSV) file format is the most common import and export format for spreadsheets and databases. Although many CSV files are simple to parse, the format is not formally defined by a stable specification and is subtle enough that...">
<meta name="theme-color" content="#3776ab">
</head>
<body>
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
<symbol id="svg-sun-half" viewBox="0 0 24 24" pointer-events="all">
<title>Following system colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="9"></circle>
<path d="M12 3v18m0-12l4.65-4.65M12 14.3l7.37-7.37M12 19.6l8.85-8.85"></path>
</svg>
</symbol>
<symbol id="svg-moon" viewBox="0 0 24 24" pointer-events="all">
<title>Selected dark colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<path stroke="none" d="M0 0h24v24H0z" fill="none"></path>
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z"></path>
</svg>
</symbol>
<symbol id="svg-sun" viewBox="0 0 24 24" pointer-events="all">
<title>Selected light colour scheme</title>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none"
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="5"></circle>
<line x1="12" y1="1" x2="12" y2="3"></line>
<line x1="12" y1="21" x2="12" y2="23"></line>
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
<line x1="1" y1="12" x2="3" y2="12"></line>
<line x1="21" y1="12" x2="23" y2="12"></line>
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
</svg>
</symbol>
</svg>
<script>
document.documentElement.dataset.colour_scheme = localStorage.getItem("colour_scheme") || "auto"
</script>
<section id="pep-page-section">
<header>
<h1>Python Enhancement Proposals</h1>
<button id="colour-scheme-cycler" onClick="setColourScheme(nextColourScheme())">
<svg aria-hidden="true" class="colour-scheme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
<svg aria-hidden="true" class="colour-scheme-icon-when-dark"><use href="#svg-moon"></use></svg>
<svg aria-hidden="true" class="colour-scheme-icon-when-light"><use href="#svg-sun"></use></svg>
<span class="visually-hidden">Toggle light / dark / auto colour theme</span>
</button>
</header>
<article>
<section id="pep-content">
<h1 class="page-title">PEP 305 CSV File API</h1>
<dl class="rfc2822 field-list simple">
<dt class="field-odd">Author<span class="colon">:</span></dt>
<dd class="field-odd">Kevin Altis &lt;altis&#32;&#97;t&#32;semi-retired.com&gt;,
Dave Cole &lt;djc&#32;&#97;t&#32;object-craft.com.au&gt;,
Andrew McNamara &lt;andrewm&#32;&#97;t&#32;object-craft.com.au&gt;,
Skip Montanaro &lt;skip&#32;&#97;t&#32;pobox.com&gt;,
Cliff Wells &lt;LogiplexSoftware&#32;&#97;t&#32;earthlink.net&gt;</dd>
<dt class="field-even">Discussions-To<span class="colon">:</span></dt>
<dd class="field-even"><a class="reference external" href="https://mail.python.org/mailman/listinfo/csv">Csv list</a></dd>
<dt class="field-odd">Status<span class="colon">:</span></dt>
<dd class="field-odd"><abbr title="Accepted and implementation complete, or no longer active">Final</abbr></dd>
<dt class="field-even">Type<span class="colon">:</span></dt>
<dd class="field-even"><abbr title="Normative PEP with a new feature for Python, implementation change for CPython or interoperability standard for the ecosystem">Standards Track</abbr></dd>
<dt class="field-odd">Created<span class="colon">:</span></dt>
<dd class="field-odd">26-Jan-2003</dd>
<dt class="field-even">Python-Version<span class="colon">:</span></dt>
<dd class="field-even">2.3</dd>
<dt class="field-odd">Post-History<span class="colon">:</span></dt>
<dd class="field-odd">31-Jan-2003, 13-Feb-2003</dd>
</dl>
<hr class="docutils" />
<section id="contents">
<details><summary>Table of Contents</summary><ul class="simple">
<li><a class="reference internal" href="#abstract">Abstract</a></li>
<li><a class="reference internal" href="#to-do-notes-for-the-interested-and-ambitious">To Do (Notes for the Interested and Ambitious)</a></li>
<li><a class="reference internal" href="#application-domain">Application Domain</a></li>
<li><a class="reference internal" href="#rationale">Rationale</a></li>
<li><a class="reference internal" href="#existing-modules">Existing Modules</a></li>
<li><a class="reference internal" href="#module-interface">Module Interface</a><ul>
<li><a class="reference internal" href="#reading-csv-files">Reading CSV Files</a></li>
<li><a class="reference internal" href="#writing-csv-files">Writing CSV Files</a></li>
<li><a class="reference internal" href="#managing-different-dialects">Managing Different Dialects</a></li>
<li><a class="reference internal" href="#formatting-parameters">Formatting Parameters</a></li>
<li><a class="reference internal" href="#reader-objects">Reader Objects</a></li>
<li><a class="reference internal" href="#writer-objects">Writer Objects</a></li>
</ul>
</li>
<li><a class="reference internal" href="#implementation">Implementation</a></li>
<li><a class="reference internal" href="#testing">Testing</a></li>
<li><a class="reference internal" href="#issues">Issues</a></li>
<li><a class="reference internal" href="#references">References</a></li>
<li><a class="reference internal" href="#copyright">Copyright</a></li>
</ul>
</details></section>
<section id="abstract">
<h2><a class="toc-backref" href="#abstract" role="doc-backlink">Abstract</a></h2>
<p>The Comma Separated Values (CSV) file format is the most common import
and export format for spreadsheets and databases. Although many CSV
files are simple to parse, the format is not formally defined by a
stable specification and is subtle enough that parsing lines of a CSV
file with something like <code class="docutils literal notranslate"><span class="pre">line.split(&quot;,&quot;)</span></code> is eventually bound to
fail. This PEP defines an API for reading and writing CSV files. It
is accompanied by a corresponding module which implements the API.</p>
</section>
<section id="to-do-notes-for-the-interested-and-ambitious">
<h2><a class="toc-backref" href="#to-do-notes-for-the-interested-and-ambitious" role="doc-backlink">To Do (Notes for the Interested and Ambitious)</a></h2>
<ul class="simple">
<li>Better motivation for the choice of passing a file object to the
constructors. See
<a class="reference external" href="https://mail.python.org/pipermail/csv/2003-January/000179.html">https://mail.python.org/pipermail/csv/2003-January/000179.html</a></li>
<li>Unicode. ugh.</li>
</ul>
</section>
<section id="application-domain">
<h2><a class="toc-backref" href="#application-domain" role="doc-backlink">Application Domain</a></h2>
<p>This PEP is about doing one thing well: parsing tabular data which may
use a variety of field separators, quoting characters, quote escape
mechanisms and line endings. The authors intend the proposed module
to solve this one parsing problem efficiently. The authors do not
intend to address any of these related topics:</p>
<ul class="simple">
<li>data interpretation (is a field containing the string “10” supposed
to be a string, a float or an int? is it a number in base 10, base
16 or base 2? is a number in quotes a number or a string?)</li>
<li>locale-specific data representation (should the number 1.23 be
written as “1.23” or “1,23” or “1 23”?), although this may eventually
be addressed.</li>
<li>fixed-width tabular data, which can already be parsed reliably.</li>
</ul>
</section>
<section id="rationale">
<h2><a class="toc-backref" href="#rationale" role="doc-backlink">Rationale</a></h2>
<p>Often, CSV files are formatted simply enough that you can get by
reading them line-by-line and splitting on the commas which delimit
the fields. This is especially true if all the data being read is
numeric. This approach may work for a while, then come back to bite
you in the butt when somebody puts something unexpected in the data
like a comma. As you dig into the problem you may eventually come to
the conclusion that you can solve the problem using regular
expressions. This will work for a while, then break mysteriously one
day. The problem grows, so you dig deeper and eventually realize that
you need a purpose-built parser for the format.</p>
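<p>The failure mode described above can be reproduced in a few lines. This is a minimal sketch that assumes the <code class="docutils literal notranslate"><span class="pre">csv</span></code> module as shipped with Python 3; the sample record is made up for illustration.</p>

```python
import csv
import io

# One record whose second field contains an embedded comma.
line = 'Smith,"Portland, OR",42\r\n'

# Naive approach: split on every comma, including the one inside the quotes.
naive_fields = line.rstrip("\r\n").split(",")

# csv.reader honours the quoting and keeps the quoted field intact.
parsed_fields = next(csv.reader(io.StringIO(line, newline="")))

print(naive_fields)
print(parsed_fields)
```

<p>The naive split yields four fragments with stray quote characters, while the reader yields the three intended fields.</p>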
<p>CSV formats are not well-defined and different implementations have a
number of subtle corner cases. It has been suggested that the “V” in
the acronym stands for “Vague” instead of “Values”. Different
delimiters and quoting characters are just the start. Some programs
generate whitespace after each delimiter which is not part of the
following field. Others quote embedded quoting characters by doubling
them, others by prefixing them with an escape character. The list of
weird ways to do things can seem endless.</p>
<p>All this variability means it is difficult for programmers to reliably
parse CSV files from many sources or generate CSV files designed to be
fed to specific external programs without a thorough understanding of
those sources and programs. This PEP and the software which accompanies
it attempt to make the process less fragile.</p>
</section>
<section id="existing-modules">
<h2><a class="toc-backref" href="#existing-modules" role="doc-backlink">Existing Modules</a></h2>
<p>This problem has been tackled before. At least three modules
currently available in the Python community enable programmers to read
and write CSV files:</p>
<ul class="simple">
<li>Object Craft’s CSV module <a class="footnote-reference brackets" href="#id8" id="id1">[2]</a></li>
<li>Cliff Wells’ Python-DSV module <a class="footnote-reference brackets" href="#id9" id="id2">[3]</a></li>
<li>Laurence Tratt’s ASV module <a class="footnote-reference brackets" href="#id10" id="id3">[4]</a></li>
</ul>
<p>Each has a different API, making it somewhat difficult for programmers
to switch between them. More of a problem may be that they interpret
some of the CSV corner cases differently, so even after surmounting
the differences between the different module APIs, the programmer has
to also deal with semantic differences between the packages.</p>
</section>
<section id="module-interface">
<h2><a class="toc-backref" href="#module-interface" role="doc-backlink">Module Interface</a></h2>
<p>This PEP supports three basic APIs, one to read and parse CSV files,
one to write them, and one to identify different CSV dialects to the
readers and writers.</p>
<section id="reading-csv-files">
<h3><a class="toc-backref" href="#reading-csv-files" role="doc-backlink">Reading CSV Files</a></h3>
<p>CSV readers are created with the reader factory function:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">obj</span> <span class="o">=</span> <span class="n">reader</span><span class="p">(</span><span class="n">iterable</span> <span class="p">[,</span> <span class="n">dialect</span><span class="o">=</span><span class="s1">&#39;excel&#39;</span><span class="p">]</span>
<span class="p">[</span><span class="n">optional</span> <span class="n">keyword</span> <span class="n">args</span><span class="p">])</span>
</pre></div>
</div>
<p>A reader object is an iterator which takes an iterable object
returning lines as the sole required parameter. If it supports a
binary mode (file objects do), the iterable argument to the reader
function must have been opened in binary mode. This gives the reader
object full control over the interpretation of the file’s contents.
The optional dialect parameter is discussed below. The reader
function also accepts several optional keyword arguments which define
specific format settings for the parser (see the section “Formatting
Parameters”). Readers are typically used as follows:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">csvreader</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">reader</span><span class="p">(</span><span class="n">file</span><span class="p">(</span><span class="s2">&quot;some.csv&quot;</span><span class="p">))</span>
<span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">csvreader</span><span class="p">:</span>
    <span class="n">process</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
</pre></div>
</div>
<p>Each row returned by a reader object is a list of strings or Unicode
objects.</p>
<p>When both a dialect parameter and individual formatting parameters are
passed to the constructor, first the dialect is queried for formatting
parameters, then individual formatting parameters are examined.</p>
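<p>As a hedged illustration of the reader interface, the sketch below uses the Python 3 <code class="docutils literal notranslate"><span class="pre">csv</span></code> module, where a text-mode file opened with <code class="docutils literal notranslate"><span class="pre">newline=""</span></code> takes the place of the binary-mode file this PEP calls for. A list of strings stands in for the file, since any iterable returning lines is accepted; the sample lines are made up.</p>

```python
import csv

# Any iterable that yields lines will do; a list of strings stands in for a file.
lines = ['id,comment\r\n', '1,"said ""hi"", then left"\r\n', '2,\r\n']

csvreader = csv.reader(lines)  # dialect defaults to "excel"
rows = []
for row in csvreader:
    rows.append(row)  # each row is a list of strings

print(rows)
```

<p>The doubled quotes inside the quoted field collapse to single quotes, and the trailing delimiter on the last line produces an empty final field.</p>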
</section>
<section id="writing-csv-files">
<h3><a class="toc-backref" href="#writing-csv-files" role="doc-backlink">Writing CSV Files</a></h3>
<p>Creating writers is similar:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">obj</span> <span class="o">=</span> <span class="n">writer</span><span class="p">(</span><span class="n">fileobj</span> <span class="p">[,</span> <span class="n">dialect</span><span class="o">=</span><span class="s1">&#39;excel&#39;</span><span class="p">],</span>
<span class="p">[</span><span class="n">optional</span> <span class="n">keyword</span> <span class="n">args</span><span class="p">])</span>
</pre></div>
</div>
<p>A writer object is a wrapper around a file-like object opened for
writing in binary mode (if such a distinction is made). It accepts
the same optional keyword parameters as the reader constructor.</p>
<p>Writers are typically used as follows:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">csvwriter</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">writer</span><span class="p">(</span><span class="n">file</span><span class="p">(</span><span class="s2">&quot;some.csv&quot;</span><span class="p">,</span> <span class="s2">&quot;w&quot;</span><span class="p">))</span>
<span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">someiterable</span><span class="p">:</span>
    <span class="n">csvwriter</span><span class="o">.</span><span class="n">writerow</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
</pre></div>
</div>
<p>To generate a set of field names as the first row of the CSV file, the
programmer must explicitly write it, e.g.:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">csvwriter</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">writer</span><span class="p">(</span><span class="n">file</span><span class="p">(</span><span class="s2">&quot;some.csv&quot;</span><span class="p">,</span> <span class="s2">&quot;w&quot;</span><span class="p">))</span>
<span class="n">csvwriter</span><span class="o">.</span><span class="n">writerow</span><span class="p">(</span><span class="n">names</span><span class="p">)</span>
<span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">someiterable</span><span class="p">:</span>
    <span class="n">csvwriter</span><span class="o">.</span><span class="n">writerow</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
</pre></div>
</div>
<p>or arrange for it to be the first row in the iterable being written.</p>
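<p>A runnable variant of the header-row pattern might look like the following. It is a sketch against the Python 3 <code class="docutils literal notranslate"><span class="pre">csv</span></code> module, with an in-memory <code class="docutils literal notranslate"><span class="pre">io.StringIO</span></code> buffer standing in for the output file; the field names and rows are hypothetical.</p>

```python
import csv
import io

names = ["item", "count"]                        # hypothetical field names
someiterable = [["bolt", 3], ["nut, hex", 12]]   # hypothetical data rows

# Stand-in for open("some.csv", "w", newline="") in Python 3.
buffer = io.StringIO(newline="")
csvwriter = csv.writer(buffer)

csvwriter.writerow(names)          # the header is written explicitly
for row in someiterable:
    csvwriter.writerow(row)

output = buffer.getvalue()
print(output)
```

<p>Only the field containing the delimiter is quoted, and every row ends with the “excel” dialect’s line terminator.</p>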
</section>
<section id="managing-different-dialects">
<h3><a class="toc-backref" href="#managing-different-dialects" role="doc-backlink">Managing Different Dialects</a></h3>
<p>Because CSV is a somewhat ill-defined format, there are plenty of ways
one CSV file can differ from another, yet contain exactly the same
data. Many tools which can import or export tabular data allow the
user to indicate the field delimiter, quote character, line
terminator, and other characteristics of the file. These can be
fairly easily determined, but are still mildly annoying to figure out,
and make for fairly long function calls when specified individually.</p>
<p>To try and minimize the difficulty of figuring out and specifying a
bunch of formatting parameters, reader and writer objects support a
dialect argument which is just a convenient handle on a group of these
lower level parameters. When a dialect is given as a string it
identifies one of the dialects known to the module via its
registration functions, otherwise it must be an instance of the
Dialect class as described below.</p>
<p>Dialects will generally be named after applications or organizations
which define specific sets of format constraints. Two dialects are
defined in the module as of this writing, “excel”, which describes the
default format constraints for CSV file export by Excel 97 and Excel
2000, and “excel-tab”, which is the same as “excel” but specifies an
ASCII TAB character as the field delimiter.</p>
<p>Dialects are implemented as attribute-only classes to enable users to
construct variant dialects by subclassing. The “excel” dialect is a
subclass of Dialect and is defined as follows:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">Dialect</span><span class="p">:</span>
    <span class="c1"># placeholders</span>
    <span class="n">delimiter</span> <span class="o">=</span> <span class="kc">None</span>
    <span class="n">quotechar</span> <span class="o">=</span> <span class="kc">None</span>
    <span class="n">escapechar</span> <span class="o">=</span> <span class="kc">None</span>
    <span class="n">doublequote</span> <span class="o">=</span> <span class="kc">None</span>
    <span class="n">skipinitialspace</span> <span class="o">=</span> <span class="kc">None</span>
    <span class="n">lineterminator</span> <span class="o">=</span> <span class="kc">None</span>
    <span class="n">quoting</span> <span class="o">=</span> <span class="kc">None</span>

<span class="k">class</span> <span class="nc">excel</span><span class="p">(</span><span class="n">Dialect</span><span class="p">):</span>
    <span class="n">delimiter</span> <span class="o">=</span> <span class="s1">&#39;,&#39;</span>
    <span class="n">quotechar</span> <span class="o">=</span> <span class="s1">&#39;&quot;&#39;</span>
    <span class="n">doublequote</span> <span class="o">=</span> <span class="kc">True</span>
    <span class="n">skipinitialspace</span> <span class="o">=</span> <span class="kc">False</span>
    <span class="n">lineterminator</span> <span class="o">=</span> <span class="s1">&#39;</span><span class="se">\r\n</span><span class="s1">&#39;</span>
    <span class="n">quoting</span> <span class="o">=</span> <span class="n">QUOTE_MINIMAL</span>
</pre></div>
</div>
<p>The “excel-tab” dialect is defined as:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">exceltsv</span><span class="p">(</span><span class="n">excel</span><span class="p">):</span>
    <span class="n">delimiter</span> <span class="o">=</span> <span class="s1">&#39;</span><span class="se">\t</span><span class="s1">&#39;</span>
</pre></div>
</div>
<p>(For a description of the individual formatting parameters see the
section “Formatting Parameters”.)</p>
<p>To enable string references to specific dialects, the module defines
several functions:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">dialect</span> <span class="o">=</span> <span class="n">get_dialect</span><span class="p">(</span><span class="n">name</span><span class="p">)</span>
<span class="n">names</span> <span class="o">=</span> <span class="n">list_dialects</span><span class="p">()</span>
<span class="n">register_dialect</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">dialect</span><span class="p">)</span>
<span class="n">unregister_dialect</span><span class="p">(</span><span class="n">name</span><span class="p">)</span>
</pre></div>
</div>
<p><code class="docutils literal notranslate"><span class="pre">get_dialect()</span></code> returns the dialect instance associated with the
given name. <code class="docutils literal notranslate"><span class="pre">list_dialects()</span></code> returns a list of all registered
dialect names. <code class="docutils literal notranslate"><span class="pre">register_dialect()</span></code> associates a string name with
a dialect class. <code class="docutils literal notranslate"><span class="pre">unregister_dialect()</span></code> deletes a name/dialect
association.</p>
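<p>The registration functions can be exercised as sketched below, again assuming the Python 3 <code class="docutils literal notranslate"><span class="pre">csv</span></code> module. The dialect class <code class="docutils literal notranslate"><span class="pre">colon_excel</span></code> and the name <code class="docutils literal notranslate"><span class="pre">"colon-excel"</span></code> are hypothetical and exist only for this example.</p>

```python
import csv
import io

class colon_excel(csv.excel):
    """Hypothetical variant of the "excel" dialect that uses ':' as delimiter."""
    delimiter = ":"

# Associate a string name with the dialect class.
csv.register_dialect("colon-excel", colon_excel)
registered = "colon-excel" in csv.list_dialects()
delimiter = csv.get_dialect("colon-excel").delimiter

# The registered name can now be passed wherever a dialect is expected.
source = io.StringIO("a:b:c\r\n", newline="")
rows = list(csv.reader(source, dialect="colon-excel"))

# Remove the association again so the example leaves no global state behind.
csv.unregister_dialect("colon-excel")
still_registered = "colon-excel" in csv.list_dialects()

print(registered, delimiter, rows, still_registered)
```

<p>Subclassing keeps every other setting of the parent dialect, so only the attribute that differs needs to be spelled out.</p>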
</section>
<section id="formatting-parameters">
<h3><a class="toc-backref" href="#formatting-parameters" role="doc-backlink">Formatting Parameters</a></h3>
<p>In addition to the dialect argument, both the reader and writer
constructors take several specific formatting parameters, specified as
keyword parameters. The formatting parameters understood are:</p>
<ul class="simple">
<li><code class="docutils literal notranslate"><span class="pre">quotechar</span></code> specifies a one-character string to use as the quoting
character. It defaults to <code class="docutils literal notranslate"><span class="pre">'&quot;'</span></code>. Setting this to None has the same
effect as setting quoting to csv.QUOTE_NONE.</li>
<li><code class="docutils literal notranslate"><span class="pre">delimiter</span></code> specifies a one-character string to use as the field
separator. It defaults to <code class="docutils literal notranslate"><span class="pre">','</span></code>.</li>
<li><code class="docutils literal notranslate"><span class="pre">escapechar</span></code> specifies a one-character string used to escape the
delimiter when quotechar is set to None.</li>
<li><code class="docutils literal notranslate"><span class="pre">skipinitialspace</span></code> specifies how to interpret whitespace which
immediately follows a delimiter. It defaults to False, which means
that whitespace immediately following a delimiter is part of the
following field.</li>
<li><code class="docutils literal notranslate"><span class="pre">lineterminator</span></code> specifies the character sequence which should
terminate rows.</li>
<li><code class="docutils literal notranslate"><span class="pre">quoting</span></code> controls when quotes should be generated by the writer.
It can take on any of the following module constants:<ul>
<li>csv.QUOTE_MINIMAL means only when required, for example, when a
field contains either the quotechar or the delimiter</li>
<li>csv.QUOTE_ALL means that quotes are always placed around fields.</li>
<li>csv.QUOTE_NONNUMERIC means that quotes are always placed around
nonnumeric fields.</li>
<li>csv.QUOTE_NONE means that quotes are never placed around fields.</li>
</ul>
</li>
<li><code class="docutils literal notranslate"><span class="pre">doublequote</span></code> controls the handling of quotes inside fields. When
True two consecutive quotes are interpreted as one during read, and
when writing, each quote is written as two quotes.</li>
</ul>
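<p>The effect of the <code class="docutils literal notranslate"><span class="pre">quoting</span></code> and <code class="docutils literal notranslate"><span class="pre">doublequote</span></code> parameters on a writer can be compared side by side. The sketch below assumes the Python 3 <code class="docutils literal notranslate"><span class="pre">csv</span></code> module; the helper <code class="docutils literal notranslate"><span class="pre">render()</span></code> and the sample row are hypothetical and not part of the module.</p>

```python
import csv
import io

def render(row, **fmt):
    """Write one row with the given formatting parameters and return the text."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, **fmt).writerow(row)
    return buffer.getvalue()

# A string, a number, and a string containing both the delimiter and quotes.
row = ["abc", 12, 'x,"y"']

minimal = render(row, quoting=csv.QUOTE_MINIMAL)        # quote only when required
everything = render(row, quoting=csv.QUOTE_ALL)         # quote every field
nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)  # quote non-numeric fields

print(minimal, everything, nonnumeric, sep="")
```

<p>In all three cases the embedded quote characters are doubled, because <code class="docutils literal notranslate"><span class="pre">doublequote</span></code> is True in the default “excel” dialect.</p>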
<p>When processing a dialect setting and one or more of the other
optional parameters, the dialect parameter is processed before the
individual formatting parameters. This makes it easy to choose a
dialect, then override one or more of the settings without defining a
new dialect class. For example, if a CSV file was generated by Excel
2000 using single quotes as the quote character and a colon as the
delimiter, you could create a reader like:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">csvreader</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">reader</span><span class="p">(</span><span class="n">file</span><span class="p">(</span><span class="s2">&quot;some.csv&quot;</span><span class="p">),</span> <span class="n">dialect</span><span class="o">=</span><span class="s2">&quot;excel&quot;</span><span class="p">,</span>
<span class="n">quotechar</span><span class="o">=</span><span class="s2">&quot;&#39;&quot;</span><span class="p">,</span> <span class="n">delimiter</span><span class="o">=</span><span class="s1">&#39;:&#39;</span><span class="p">)</span>
</pre></div>
</div>
<p>Other details of how Excel generates CSV files would be handled
automatically because of the reference to the “excel” dialect.</p>
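<p>A self-contained version of this override, written as a sketch for the Python 3 <code class="docutils literal notranslate"><span class="pre">csv</span></code> module, might read as follows. The in-memory buffer and its contents are made up and stand in for <code class="docutils literal notranslate"><span class="pre">some.csv</span></code>.</p>

```python
import csv
import io

# Stand-in for open("some.csv", newline=""); the contents are illustrative only.
source = io.StringIO("id:title\r\n1:'colons: a history'\r\n", newline="")

# Take everything from the "excel" dialect except the two overridden settings.
csvreader = csv.reader(source, dialect="excel", quotechar="'", delimiter=":")
rows = list(csvreader)

print(rows)
```

<p>The colon inside the single-quoted field is treated as data rather than as a delimiter.</p>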
</section>
<section id="reader-objects">
<h3><a class="toc-backref" href="#reader-objects" role="doc-backlink">Reader Objects</a></h3>
<p>Reader objects are iterators whose next() method returns a sequence of
strings, one string per field in the row.</p>
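<p>In Python 3 the iterator protocol is driven through the built-in <code class="docutils literal notranslate"><span class="pre">next()</span></code> function rather than a <code class="docutils literal notranslate"><span class="pre">next()</span></code> method. A minimal sketch, with made-up input lines:</p>

```python
import csv

csvreader = csv.reader(["x,y\r\n", "1,2\r\n"])

header = next(csvreader)  # first row
first = next(csvreader)   # second row

# A further call would raise StopIteration; the default argument avoids that.
exhausted = next(csvreader, None)

print(header, first, exhausted)
```

<p>Pulling the first row by hand like this is a common way to separate a header line from the data rows that follow.</p>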
</section>
<section id="writer-objects">
<h3><a class="toc-backref" href="#writer-objects" role="doc-backlink">Writer Objects</a></h3>
<p>Writer objects have two methods, writerow() and writerows(). The
former accepts an iterable (typically a list) of fields which are to
be written to the output. The latter accepts a list of iterables and
calls writerow() for each.</p>
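<p>The relationship between the two methods can be checked directly. The following sketch assumes the Python 3 <code class="docutils literal notranslate"><span class="pre">csv</span></code> module and uses in-memory buffers with made-up rows.</p>

```python
import csv
import io

rows = [["a", 1], ["b", 2], ["c", 3]]

# Write the rows one at a time with writerow().
one_by_one = io.StringIO(newline="")
writer = csv.writer(one_by_one)
for row in rows:
    writer.writerow(row)

# Write the same rows in a single writerows() call.
all_at_once = io.StringIO(newline="")
csv.writer(all_at_once).writerows(rows)

same_output = one_by_one.getvalue() == all_at_once.getvalue()
print(same_output)
```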
</section>
</section>
<section id="implementation">
<h2><a class="toc-backref" href="#implementation" role="doc-backlink">Implementation</a></h2>
<p>There is a sample implementation available. <a class="footnote-reference brackets" href="#id7" id="id4">[1]</a> The goal is for it
to efficiently implement the API described in the PEP. It is heavily
based on the Object Craft csv module. <a class="footnote-reference brackets" href="#id8" id="id5">[2]</a></p>
</section>
<section id="testing">
<h2><a class="toc-backref" href="#testing" role="doc-backlink">Testing</a></h2>
<p>The sample implementation <a class="footnote-reference brackets" href="#id7" id="id6">[1]</a> includes a set of test cases.</p>
</section>
<section id="issues">
<h2><a class="toc-backref" href="#issues" role="doc-backlink">Issues</a></h2>
<ol class="arabic">
<li>Should a parameter control how consecutive delimiters are
interpreted? Our thought is “no”. Consecutive delimiters should
always denote an empty field.</li>
<li>What about Unicode? Is it sufficient to pass a file object gotten
from codecs.open()? For example:<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">csvreader</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">reader</span><span class="p">(</span><span class="n">codecs</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">&quot;some.csv&quot;</span><span class="p">,</span> <span class="s2">&quot;r&quot;</span><span class="p">,</span> <span class="s2">&quot;cp1252&quot;</span><span class="p">))</span>
<span class="n">csvwriter</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">writer</span><span class="p">(</span><span class="n">codecs</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">&quot;some.csv&quot;</span><span class="p">,</span> <span class="s2">&quot;w&quot;</span><span class="p">,</span> <span class="s2">&quot;utf-8&quot;</span><span class="p">))</span>
</pre></div>
</div>
<p>In the first example, text would be assumed to be encoded as cp1252.
Should the system be aggressive in converting to Unicode or should
Unicode strings only be returned if necessary?</p>
<p>In the second example, the file will take care of automatically
encoding Unicode strings as utf-8 before writing to disk.</p>
<p>Note: As of this writing, the csv module doesn’t handle Unicode
data.</p>
</li>
<li>What about alternate escape conventions? If the dialect in use
includes an <code class="docutils literal notranslate"><span class="pre">escapechar</span></code> parameter which is not None and the
<code class="docutils literal notranslate"><span class="pre">quoting</span></code> parameter is set to QUOTE_NONE, delimiters appearing
within fields will be prefixed by the escape character when writing
and are expected to be prefixed by the escape character when
reading.</li>
<li>Should there be a “fully quoted” mode for writing? What about
“fully quoted except for numeric values”? Both are implemented
(QUOTE_ALL and QUOTE_NONNUMERIC, respectively).</li>
<li>What about end-of-line? If I generate a CSV file on a Unix system,
will Excel properly recognize the LF-only line terminators? Files
must be opened for reading or writing as appropriate using binary
mode. Specify the <code class="docutils literal notranslate"><span class="pre">lineterminator</span></code> sequence as <code class="docutils literal notranslate"><span class="pre">'\r\n'</span></code>. The
resulting file will be written correctly.</li>
<li>What about an option to generate dicts from the reader and accept
dicts by the writer? See the DictReader and DictWriter classes in
csv.py.</li>
<li>Are quote character and delimiters limited to single characters?
For the time being, yes.</li>
<li>How should rows of different lengths be handled? Interpretation of
the data is the application’s job. There is no such thing as a
“short row” or a “long row” at this level.</li>
</ol>
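<p>Several of the answers above can be observed with the Python 3 <code class="docutils literal notranslate"><span class="pre">csv</span></code> module. The sketch below touches issues 1, 3, 6 and 8; all input data is made up for illustration.</p>

```python
import csv
import io

# Issue 1: consecutive delimiters denote an empty field.
empty_field = next(csv.reader(["a,,c\r\n"]))

# Issue 3: with QUOTE_NONE and an escapechar, delimiters inside fields are escaped.
buffer = io.StringIO(newline="")
csv.writer(buffer, quoting=csv.QUOTE_NONE, escapechar="\\").writerow(["x,y", "z"])
escaped_text = buffer.getvalue()
round_trip = next(
    csv.reader([escaped_text], quoting=csv.QUOTE_NONE, escapechar="\\")
)

# Issue 6: DictReader maps each row onto the field names from the first row.
records = list(csv.DictReader(["id,name\r\n", "1,Ada\r\n"]))

# Issue 8: rows of different lengths are returned as they are.
ragged = list(csv.reader(["1,2,3\r\n", "4\r\n"]))

print(empty_field, escaped_text, round_trip, records, ragged)
```

<p>Deciding whether the short second row in the last case is an error is left to the application, as stated in issue 8.</p>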
</section>
<section id="references">
<h2><a class="toc-backref" href="#references" role="doc-backlink">References</a></h2>
<aside class="footnote-list brackets">
<aside class="footnote brackets" id="id7" role="doc-footnote">
<dt class="label" id="id7">[1]<em> (<a href='#id4'>1</a>, <a href='#id6'>2</a>) </em></dt>
<dd>csv module, Python Sandbox
(<a class="reference external" href="http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondist/sandbox/csv/">http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondist/sandbox/csv/</a>)</aside>
<aside class="footnote brackets" id="id8" role="doc-footnote">
<dt class="label" id="id8">[2]<em> (<a href='#id1'>1</a>, <a href='#id5'>2</a>) </em></dt>
<dd>csv module, Object Craft
(<a class="reference external" href="http://www.object-craft.com.au/projects/csv">http://www.object-craft.com.au/projects/csv</a>)</aside>
<aside class="footnote brackets" id="id9" role="doc-footnote">
<dt class="label" id="id9">[<a href="#id2">3</a>]</dt>
<dd>Python-DSV module, Wells
(<a class="reference external" href="http://sourceforge.net/projects/python-dsv/">http://sourceforge.net/projects/python-dsv/</a>)</aside>
<aside class="footnote brackets" id="id10" role="doc-footnote">
<dt class="label" id="id10">[<a href="#id3">4</a>]</dt>
<dd>ASV module, Tratt
(<a class="reference external" href="http://tratt.net/laurie/python/asv/">http://tratt.net/laurie/python/asv/</a>)</aside>
</aside>
<p>There are many references to other CSV-related projects on the Web. A
few are included here.</p>
</section>
<section id="copyright">
<h2><a class="toc-backref" href="#copyright" role="doc-backlink">Copyright</a></h2>
<p>This document has been placed in the public domain.</p>
</section>
</section>
<hr class="docutils" />
<p>Source: <a class="reference external" href="https://github.com/python/peps/blob/main/peps/pep-0305.rst">https://github.com/python/peps/blob/main/peps/pep-0305.rst</a></p>
<p>Last modified: <a class="reference external" href="https://github.com/python/peps/commits/main/peps/pep-0305.rst">2023-09-09 17:39:29 GMT</a></p>
</article>
</section>
<script src="../_static/colour_scheme.js"></script>
<script src="../_static/wrap_tables.js"></script>
<script src="../_static/sticky_banner.js"></script>
</body>
</html>