New version of PEP 3144 by Peter Moody.

This commit is contained in:
Georg Brandl 2012-02-20 08:50:42 +01:00
parent 102687f9ff
commit 8172610f46
1 changed files with 104 additions and 326 deletions


@ -1,382 +1,160 @@
PEP: 3144
Title: IP Address Manipulation Library for the Python Standard Library
Version: $Revision$
Last-Modified: $Date$
Author: Peter Moody <peter@hda3.com>
Discussions-To: ipaddr-py-dev@googlegroups.com
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 13-Aug-2009
Python-Version: 3.2
PEP: 3144
Title: IP Address Manipulation Library for the Python Standard Library
Version: $Revision$
Last-Modified: $Date$
Author: Peter Moody <pmoody@google.com>
Discussions-To: <ipaddr-py-dev@googlegroups.com>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 6-Feb-2012
Python-Version: 3.3
Abstract:
This PEP proposes a design for a lightweight ip address manipulation module
for python.
This PEP proposes a design for an IP address manipulation module for
Python.
Motivation:
Many network administrators use python in their day to day jobs. Finding a
library to assist with the common ip address manipulation tasks is easy.
Finding a good library for performing those tasks can be somewhat more
difficult. For this reason, I (like many before me) scratched an itch and
wrote my own with an emphasis on being easy to understand and fast for the
most common operations.
For context, a previous version of this library was up for inclusion in
python 3.1, see issue 3959 [1] for more information.
Several very good IP address modules for python already exist.
The truth is that all of them struggle with the balance between
adherence to Pythonic principles and the shorthand upon which
network engineers and administrators rely. I believe ipaddr
strikes the right balance.
Rationale:
ipaddr was designed with a few basic principles in mind:
The existence of several Python IP address manipulation modules is
evidence of an outstanding need for the functionality this module
seeks to provide.
- IPv4 and IPv6 objects are distinct.
- IP addresses and IP networks are distinct.
- the library should be useful and the assumptions obvious to the network
programmer.
- IP networks should be treated as lists (as opposed to some other
python intrinsic) in so far as it makes sense.
- the library should be lightweight and fast without sacrificing
expected functionality.
- Distinct IPv4 and IPv6 objects.
While there are many similarities, IPv4 and IPv6 objects are fundamentally
different. The similarities allow for easy abstraction of certain
operations which affect the bits from both in the same manner, but their
differences mean attempts to combine them into one object yield unexpected
results. According to Vint Cerf, "I have seen a substantial amount of
traffic about IPv4 and IPv6 comparisons and the general consensus is that
these are not comparable." [2] For python versions >= 3.0, this means
that (<, >, <=, >=) comparison operations between IPv4 and IPv6
objects raise a TypeError per the Ordering Comparisons [3].
Background:
- Distinct network and address objects.
PEP 3144 and ipaddr have been up for inclusion before. The
version of the library specified here is backwards incompatible
with the version on PyPI and the one which was discussed before.
In order to avoid confusing users of the current ipaddr, I've
renamed this version of the library "ipaddress".
An IPv4 address is a single 32 bit number while the IPv4 address assigned
to a networked computer is a 32 bit address and associated network.
Similarly, an IPv6 address is a 128 bit number while an IPv6 address
assigned to a networked computer is a 128 bit number and associated network
information. The similarities lead to easy abstraction of some methods
and properties, but there are obviously a number of address/network
specific properties which require they be distinct. For instance, IP
networks contain a network address (the base address of the network),
broadcast address (the upper end of the network, also the address to
which every machine on a given network is supposed to listen, hence the
name broadcast), supernetworks and subnetworks, etc. The individual
addresses in an IP network obviously don't have the same properties,
they're simply 32 or 128 bit numbers.
The main differences between ipaddr and ipaddress are:
- Principle of least confusion for network programmers.
* ipaddress *Network classes are equivalent to the ipaddr *Network
class counterparts with the strict flag set to True.
It should be understood that, above all, this module is designed with the
network administrator in mind. In practice, this means that a number of
assumptions are made with regards to common usage and the library prefers
the usefulness of accepted practice over strict adherence to RFCs. For
example, ipaddr accepts '192.168.1.1/24' as a network definition because
this is a very common way of describing an address + netmask despite the
fact that 192.168.1.1 is actually an IP address on the network
192.168.1.0/24. Strict adherence would require that networks have all of
the host bits masked to zero, which would require two objects to describe
that IP + network. In practice, a looser interpretation of a network is
a very useful and common abstraction, so ipaddr prefers to make this
available. For the developer who is concerned with strict adherence,
ipaddr provides an optional 'strict' boolean argument to the
IPv(4|6)Network constructors which guarantees that all host bits are masked
down.
- Treat network elements as lists (in so far as it's possible).
* ipaddress *Interface classes are equivalent to the ipaddr
*Network class counterparts with the strict flag set to False.
Treating IP networks as lists is a natural extension from viewing the
network as a series of individual ip addresses. Most of the standard list
methods should be implemented and should behave in a manner that would be
consistent if the IP network object were actually a list of strings or
integers. The methods which actually modify a list's contents don't extend
as well to this model (__add__, __iadd__, __sub__, __isub__, etc) but
others (__contains__, __iter__, etc) work quite nicely. It should be noted
that __len__ doesn't work as expected since python internals limit its
result to a 32 bit integer and it would need to be at least 128 bits to
work with IPv6.
* The factory functions in ipaddress were renamed to disambiguate
them from classes.
- Lightweight.
While some network programmers will undoubtedly want more than this library
provides, keeping the functionality to strictly what's required from an IP
address manipulation module is critical to keeping the code fast, easily
comprehensible and extensible. It is a goal to provide enough options in
terms of functionality to allow the developer to easily do their work
without needlessly cluttering the library. Finally, it's important to note
that this design doesn't prevent subclassing or otherwise extending to meet
unforeseen needs.
* A few attributes were renamed to disambiguate their purpose as
well. (eg. network, network_address)
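
The Network/Interface split described in the list above can be
illustrated with a short sketch. It assumes the ipaddress module as it
later shipped in the Python standard library; the factory names and the
strict keyword used here may differ from the draft reference
implementation this PEP links to.

```python
import ipaddress

# A Network insists that all host bits are zero (the strict behaviour).
try:
    ipaddress.ip_network("192.168.1.1/24")
    strict_rejected = False
except ValueError:
    strict_rejected = True

# An Interface keeps the host address together with its network.
iface = ipaddress.ip_interface("192.168.1.1/24")
host = iface.ip          # the individual address
net = iface.network      # the network with host bits masked off

# Passing strict=False masks the host bits instead of raising.
masked = ipaddress.ip_network("192.168.1.1/24", strict=False)
```

Here masked and net describe the same network, while iface still
remembers which address on that network it refers to.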
Specification:
A slightly more detailed look at the library follows.
The ipaddr module defines a total of 6 new public classes, 3 for
manipulating IPv4 objects and 3 for manipulating IPv6 objects.
The classes are as follows:
- Design
IPv4Address/IPv6Address - These define individual addresses, for
example the IPv4 address returned by an A record query for
www.google.com (74.125.224.84) or the IPv6 address returned by a
AAAA record query for ipv6.google.com (2001:4860:4001:801::1011).
ipaddr has four main classes most people will use:
IPv4Network/IPv6Network - These define networks or groups of
addresses, for example the IPv4 network reserved for multicast use
(224.0.0.0/4) or the IPv6 network reserved for multicast
(ff00::/8, wow, that's big).
1. IPv4Address. (eg, '192.168.1.1')
2. IPv4Network (eg, '192.168.0.0/16')
3. IPv6Address (eg, '::1')
4. IPv6Network (eg, '2001::/32')
IPv4Interface/IPv6Interface - These hybrid classes refer to an
individual address on a given network. For example, the IPv4
address 192.0.2.1 on the network 192.0.2.0/24 could be referred to
as 192.0.2.1/24. Likewise, the IPv6 address 2001:DB8::1 on the
network 2001:DB8::/96 could be referred to as 2001:DB8::1/96.
It's very common to refer to addresses assigned to computer
network interfaces like this, hence the Interface name.
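
As a rough illustration of the six public classes, the following
sketch constructs one object of each kind. It assumes the ipaddress
module as later included in the standard library; attribute names such
as ip, network and max_prefixlen come from that module and are not
guaranteed to match the draft described here.

```python
import ipaddress

# Individual addresses.
addr4 = ipaddress.IPv4Address("192.0.2.1")
addr6 = ipaddress.IPv6Address("2001:db8::1")

# Networks (groups of addresses); both ranges are reserved for multicast.
net4 = ipaddress.IPv4Network("224.0.0.0/4")
net6 = ipaddress.IPv6Network("ff00::/8")

# Interfaces: an individual address together with the network it is on.
iface4 = ipaddress.IPv4Interface("192.0.2.1/24")
iface6 = ipaddress.IPv6Interface("2001:db8::1/96")

# An interface can be split back into its address and its network.
same_address = iface4.ip == addr4
iface4_network = str(iface4.network)
iface6_network = str(iface6.network)
```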
Most of the operations a network administrator performs on networks are
similar for both IPv4 and IPv6 networks. Ie. finding subnets, supernets,
determining if an address is contained in a given network, etc. Similarly,
both addresses and networks (of the same ip version!) have much in common;
the process for turning a given 32 or 128 bit number into a human readable
string notation, determining if the ip is within the valid specified range,
etc. Finally, there are some pythonic abstractions which are valid for all
addresses and networks, both IPv4 and IPv6. In short, there is common
functionality shared between (ipaddr class names in parentheses):
All IPv4 classes share certain characteristics and methods; the
number of bits needed to represent them, whether or not they
belong to certain special IPv4 network ranges, etc. Similarly,
all IPv6 classes share characteristics and methods.
1. all IP addresses and networks, both IPv4 and IPv6. (_IPAddrBase)
ipaddr makes extensive use of inheritance to avoid code
duplication as much as possible. The parent classes are private,
but they are outlined here:
2. all IP addresses of both versions. (_BaseIP)
_IPAddrBase - Provides methods common to all ipaddr objects.
3. all IP networks of both versions. (_BaseNet)
_BaseAddress - Provides methods common to IPv4Address and
IPv6Address.
4. all IPv4 objects, both addresses and networks. (_BaseV4)
_BaseInterface - Provides methods common to IPv4Interface and
IPv6Interface, as well as IPv4Network and IPv6Network (ipaddr
treats the Network classes as a special case of Interface).
5. all IPv6 objects, both addresses and networks. (_BaseV6)
_BaseV4 - Provides methods and variables (eg, _max_prefixlen)
common to all IPv4 classes.
Seeing this as a clear hierarchy is important for recognizing how much
code is common between the four main classes. For this reason, ipaddr uses
class inheritance to abstract out as much common code as is possible and
appropriate. This lack of duplication and very clean layout also makes
the job of the developer much easier should they need to debug code (either
theirs or mine).
_BaseV6 - Provides methods and variables common to all IPv6
classes.
Knowing that there might be cases where the developer doesn't so much care
as to the types of IP they might be receiving, ipaddr comes with two
important helper functions, IPAddress() and IPNetwork(). These, as you
might guess, return the appropriately typed address or network objects for
the given argument.
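
A minimal sketch of version-agnostic factory functions follows. It
uses the names from the ipaddress module as it later shipped in the
standard library (ip_address and ip_network), which is an assumption
here: the text above refers to the older ipaddr spellings IPAddress()
and IPNetwork().

```python
import ipaddress

# The factories inspect the argument and return the matching class.
a4 = ipaddress.ip_address("192.0.2.1")
a6 = ipaddress.ip_address("2001:db8::1")
n4 = ipaddress.ip_network("192.0.2.0/24")
n6 = ipaddress.ip_network("2001:db8::/96")

kinds = [type(obj).__name__ for obj in (a4, a6, n4, n6)]
```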
Comparisons between objects of differing IP versions result in a
TypeError [1]. Additionally, comparisons of objects with
different _Base parent classes result in a TypeError. The effect
of the _Base parent class limitation is that IPv4Interface objects
can be compared to IPv4Network objects and IPv6Interface objects can
be compared to IPv6Network objects.
Finally, as mentioned earlier, there is no meaningful natural ordering
between IPv4 and IPv6 addresses and networks [2]. Rather than invent a
standard, ipaddr follows Ordering Comparisons and raises a TypeError
when asked to compare objects of differing IP versions. In practice, there
are many ways a programmer may wish to order the addresses, so this
shouldn't pose a problem for the developer who can easily write:
v4 = [x for x in mixed_list if x._version == 4]
v6 = [x for x in mixed_list if x._version == 6]
# perform operations on v4 and v6 here.
return v4_return + v6_return
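
The same idea can be sketched as a runnable example. It assumes the
standard library ipaddress module, where the version is exposed as the
public attribute version; the choice to place IPv4 before IPv6 is
purely illustrative and not something the PEP prescribes.

```python
import ipaddress

mixed = [ipaddress.ip_address(a)
         for a in ("10.0.0.2", "::1", "10.0.0.1", "2001:db8::1")]

# Ordering across IP versions is refused rather than guessed at.
try:
    sorted(mixed)
    mixed_sort_failed = False
except TypeError:
    mixed_sort_failed = True

# Splitting by version first gives two lists that sort cleanly.
v4 = sorted(a for a in mixed if a.version == 4)
v6 = sorted(a for a in mixed if a.version == 6)
ordered = [str(a) for a in v4 + v6]
```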
- Multiple ways of displaying an IP Address.
Not everyone will want to display the same information in the same format;
IP addresses in cisco syntax are represented by network/hostmask, junipers
are (network/IP)/prefixlength and IPTables are (network/IP)/(prefixlength/
netmask). The ipaddr library provides multiple ways to display an address.
In [1]: IPNetwork('1.1.1.1').with_prefixlen
Out[1]: '1.1.1.1/32'
In [1]: IPNetwork('1.1.1.1').with_netmask
Out[1]: '1.1.1.1/255.255.255.255'
In [1]: IPNetwork('1.1.1.1').with_hostmask
Out[1]: '1.1.1.1/0.0.0.0'
the same applies to IPv6. It should be noted that netmasks and hostmasks
are not commonly used in IPv6, the methods exist for compatibility with
IPv4.
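
For comparison, the three display forms can be reproduced with the
standard library ipaddress module. This is a sketch under the
assumption that an Interface object is the closest equivalent of the
non-strict IPNetwork used in the session above.

```python
import ipaddress

iface = ipaddress.ip_interface("1.1.1.1")

prefix_form = iface.with_prefixlen    # address/prefix length
netmask_form = iface.with_netmask     # address/netmask
hostmask_form = iface.with_hostmask   # address/hostmask (cisco style)
```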
- Lazy evaluation combined with aggressive caching of network elements.
(the following example is for IPv4Network objects but the exact same
properties apply to IPv6Network objects).
As mentioned, an IP network object is defined by a number of properties.
The object
In [1]: IPv4Network('1.1.1.0/24')
has a number of IPv4Address properties
In [1]: o = IPv4Network('1.1.1.0/24')
In [2]: o.network
Out[2]: IPv4Address('1.1.1.0')
In [3]: o.broadcast
Out[3]: IPv4Address('1.1.1.255')
In [4]: o.hostmask
Out[4]: IPv4Address('0.0.0.255')
If we were to compute them all at object creation time, we would incur a
non-negligible performance hit. Since these properties are required to
define the object completely but their values aren't always of interest to
the programmer, their computation should be done only when requested.
However, in order to avoid the performance hit in the case where one
attribute for a particular object is requested repeatedly (and continuously
recomputed), the results of the computation should be cached.
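
The compute-on-first-use, then cache approach can be sketched with a
toy class. LazyNet is hypothetical and is not how ipaddr implements
its caching; it relies on functools.cached_property, which requires
Python 3.8 or later.

```python
import functools


class LazyNet:
    """Hypothetical toy IPv4 network: integer base address plus prefix."""

    def __init__(self, base, prefixlen):
        self.base = base
        self.prefixlen = prefixlen
        self.computations = 0   # counts how often hostmask is computed

    @functools.cached_property
    def hostmask(self):
        # Runs once; the result is then stored on the instance.
        self.computations += 1
        return (1 << (32 - self.prefixlen)) - 1

    @functools.cached_property
    def broadcast(self):
        # Derived from the (cached) hostmask only when asked for.
        return self.base | self.hostmask


net = LazyNet(0x01010100, 24)   # 1.1.1.0/24
first = net.hostmask
second = net.hostmask           # served from the cache
top = net.broadcast
```

Nothing is computed in __init__, so creating many such objects stays
cheap, and repeated attribute access does not repeat the work.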
- Address list summarization.
ipaddr supports easy summarization of lists of possibly contiguous
addresses, as this is something network administrators constantly find
themselves doing. This currently works in a number of ways.
1. collapse_address_list([list]):
Given a list of networks, ipaddr will collapse the list into the smallest
possible list of networks that wholly contain the addresses supplied.
In [1]: collapse_address_list([IPNetwork('1.1.0.0/24'),
...: IPNetwork('1.1.1.0/24')])
Out[1]: [IPv4Network('1.1.0.0/23')]
more elaborately:
In [1]: collapse_address_list([IPNetwork(x) for x in
...: IPNetwork('1.1.0.0/23')])
Out[1]: [IPv4Network('1.1.0.0/23')]
2. summarize_address_range(first, last).
Given a start and end address, ipaddr will provide the smallest number of
networks to cover the given range.
In [1]: summarize_address_range(IPv4Address('1.1.1.0'),
...: IPv4Address('2.2.2.0'))
Out[1]:
[IPv4Network('1.1.1.0/24'),
IPv4Network('1.1.2.0/23'),
IPv4Network('1.1.4.0/22'),
IPv4Network('1.1.8.0/21'),
IPv4Network('1.1.16.0/20'),
IPv4Network('1.1.32.0/19'),
IPv4Network('1.1.64.0/18'),
IPv4Network('1.1.128.0/17'),
IPv4Network('1.2.0.0/15'),
IPv4Network('1.4.0.0/14'),
IPv4Network('1.8.0.0/13'),
IPv4Network('1.16.0.0/12'),
IPv4Network('1.32.0.0/11'),
IPv4Network('1.64.0.0/10'),
IPv4Network('1.128.0.0/9'),
IPv4Network('2.0.0.0/15'),
IPv4Network('2.2.0.0/23'),
IPv4Network('2.2.2.0/32')]
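
Both summarization helpers have counterparts in the standard library
ipaddress module, where the first is named collapse_addresses and both
return iterators rather than lists. The sketch below assumes that
module rather than the ipaddr API shown above, and uses a smaller
range to keep the result short.

```python
import ipaddress

# Two adjacent /24 networks collapse into a single /23.
collapsed = list(ipaddress.collapse_addresses([
    ipaddress.ip_network("1.1.0.0/24"),
    ipaddress.ip_network("1.1.1.0/24"),
]))

# The fewest networks that cover 192.0.2.0 through 192.0.2.130 exactly.
summarized = list(ipaddress.summarize_address_range(
    ipaddress.IPv4Address("192.0.2.0"),
    ipaddress.IPv4Address("192.0.2.130"),
))
summary_text = [str(n) for n in summarized]
```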
- Address Exclusion.
Used somewhat less often, but all the more annoying, is the case where a
programmer would want "all of the addresses in a network *except* these".
ipaddr performs this exclusion equally well for IPv4 and IPv6 networks
and collapses the resulting address list.
In [1]: IPNetwork('1.1.0.0/15').address_exclude(IPNetwork('1.1.1.0/24'))
Out[1]:
[IPv4Network('1.0.0.0/16'),
IPv4Network('1.1.0.0/24'),
IPv4Network('1.1.2.0/23'),
IPv4Network('1.1.4.0/22'),
IPv4Network('1.1.8.0/21'),
IPv4Network('1.1.16.0/20'),
IPv4Network('1.1.32.0/19'),
IPv4Network('1.1.64.0/18'),
IPv4Network('1.1.128.0/17')]
In [1]: IPNetwork('::1/96').address_exclude(IPNetwork('::1/112'))
Out[1]:
[IPv6Network('::1:0/112'),
IPv6Network('::2:0/111'),
IPv6Network('::4:0/110'),
IPv6Network('::8:0/109'),
IPv6Network('::10:0/108'),
IPv6Network('::20:0/107'),
IPv6Network('::40:0/106'),
IPv6Network('::80:0/105'),
IPv6Network('::100:0/104'),
IPv6Network('::200:0/103'),
IPv6Network('::400:0/102'),
IPv6Network('::800:0/101'),
IPv6Network('::1000:0/100'),
IPv6Network('::2000:0/99'),
IPv6Network('::4000:0/98'),
IPv6Network('::8000:0/97')]
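
A smaller exclusion example, assuming the standard library ipaddress
module, where address_exclude returns an iterator and both operands
must be strict networks. The result is sorted here only to make the
output order explicit.

```python
import ipaddress

outer = ipaddress.ip_network("192.0.2.0/28")
hole = ipaddress.ip_network("192.0.2.1/32")

# Everything in outer except the single address in hole.
remaining = sorted(outer.address_exclude(hole))
remaining_text = [str(n) for n in remaining]

# Sanity check: the pieces plus the hole account for every address.
total = sum(n.num_addresses for n in remaining) + hole.num_addresses
```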
- IPv6 address compression.
By default, IPv6 addresses are compressed internally (see the method
BaseV6._compress_hextets), but ipaddr makes both the compressed and the
exploded representations available.
In [1]: IPNetwork('::1').compressed
Out[1]: '::1/128'
In [2]: IPNetwork('::1').exploded
Out[2]: '0000:0000:0000:0000:0000:0000:0000:1/128'
In [3]: IPv6Address('::1').exploded
Out[3]: '0000:0000:0000:0000:0000:0000:0000:0001'
In [4]: IPv6Address('::1').compressed
Out[4]: '::1'
(the same methods exist for IPv4 networks and addresses, but they're
just stubs for returning the normal __str__ representation).
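
The compressed and exploded forms can also be checked against the
standard library ipaddress module. This sketch assumes that module;
its output for networks may differ in detail from the older ipaddr
session shown above.

```python
import ipaddress

loopback = ipaddress.IPv6Address("::1")
short_form = loopback.compressed
long_form = loopback.exploded

# Networks carry the prefix length along in both forms.
loopback_net = ipaddress.ip_network("::1")
net_short = loopback_net.compressed
net_long = loopback_net.exploded

# For IPv4 the two forms are simply the normal string representation.
v4_form = ipaddress.IPv4Address("192.0.2.1").exploded
```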
- Most other common operations.
It is a design goal to support all of the common operations expected from
an IP address manipulation module. As such, finding supernets, subnets,
address and network containment etc are all supported.
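
A few of these operations, sketched with the standard library
ipaddress module (an assumption; method names in the draft reference
implementation may differ). Note that num_addresses is used instead of
len(), which avoids the size limit on __len__ for large IPv6 networks.

```python
import ipaddress
import itertools

net = ipaddress.ip_network("192.0.2.0/24")

# Containment: is this address inside the network?
contains = ipaddress.ip_address("192.0.2.17") in net

# Subnets one bit longer, and the supernet one bit shorter.
halves = [str(n) for n in net.subnets(prefixlen_diff=1)]
parent = str(net.supernet(prefixlen_diff=1))

# Iterate lazily over usable host addresses; take only the first two.
first_hosts = [str(h) for h in itertools.islice(net.hosts(), 2)]

# Sizes are plain Python integers, so even all of IPv6 is representable.
size = net.num_addresses
all_v6 = ipaddress.ip_network("::/0").num_addresses
```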
Reference Implementation:
A reference implementation is available at:
http://ipaddr-py.googlecode.com/svn/trunk
The current reference implementation can be found at:
http://code.google.com/p/ipaddr-py/downloads/detail?name=3144.tar.gz
More information about using the reference implementation can be
found at: http://code.google.com/p/ipaddr-py/wiki/Using3144
References:
[1] http://bugs.python.org/issue3959
[2] Appealing to authority is a logical fallacy, but Vint Cerf is
an authority who can't be ignored. Full text of the email follows:
[1] Appealing to authority is a logical fallacy, but Vint Cerf is
an authority who can't be ignored. Full text of the email
follows:
"""
I have seen a substantial amount of traffic about IPv4 and IPv6
comparisons and the general consensus is that these are not comparable.
I have seen a substantial amount of traffic about IPv4 and
IPv6 comparisons and the general consensus is that these are
not comparable.
If we were to take a very simple minded view, we might treat these as
pure integers in which case there is an ordering but not a useful one.
If we were to take a very simple minded view, we might treat
these as pure integers in which case there is an ordering but
not a useful one.
In the IPv4 world, "length" is important because we take longest (most
specific) address first for routing. Length is determine by the mask,
as you know.
In the IPv4 world, "length" is important because we take
longest (most specific) address first for routing. Length is
determine by the mask, as you know.
Assuming that the same style of argument works in IPv6, we would have
to conclude that treating an IPv6 value purely as an integer for
comparison with IPv4 would lead to some really strange results.
Assuming that the same style of argument works in IPv6, we
would have to conclude that treating an IPv6 value purely as
an integer for comparison with IPv4 would lead to some really
strange results.
All of IPv4 space would lie in the host space of 0::0/96 prefix of
IPv6. For any useful interpretation of IPv4, this is a non-starter.
All of IPv4 space would lie in the host space of 0::0/96
prefix of IPv6. For any useful interpretation of IPv4, this is
a non-starter.
I think the only sensible conclusion is that IPv4 values and IPv6 values
should be treated as non-comparable.
I think the only sensible conclusion is that IPv4 values and
IPv6 values should be treated as non-comparable.
Vint
"""
[3] http://docs.python.org/dev/3.0/whatsnew/3.0.html#ordering-comparisons
Copyright: