PEP:  3144
Title:  IP Address Manipulation Library for the Python Standard Library
Version:  $Revision$
Last-Modified:  $Date$
Author:  Peter Moody <peter@hda3.com>
Discussions-To:  ipaddr-py-dev@googlegroups.com
Status:  Draft
Type:  Standards Track
Content-Type:  text/plain
Created:  13-Aug-2009
Python-Version:  3.2


Abstract:

    This PEP proposes a design for a lightweight IP address manipulation
    module for Python.


Motivation:

    Many network administrators use Python in their day-to-day jobs.  Finding
    a library to assist with common IP address manipulation tasks is easy.
    Finding a good library for performing those tasks can be somewhat more
    difficult.  For this reason, I (like many before me) scratched an itch and
    wrote my own, with an emphasis on being easy to understand and fast for
    the most common operations.

    For context, a previous version of this library was proposed for
    inclusion in Python 3.1; see issue 3959 [1] for more information.


Rationale:
 
    ipaddr was designed with the goal of abstracting out as much of the common
    functionality as possible.  The similarities between addresses and
    networks, and between IPv4 and IPv6, allow much code to be reused, since
    Python allows for easy (and clean) multiple inheritance.  Methods that are
    specific to IPv4 or IPv6, or to addresses or networks, are inherited from
    appropriately named classes (BaseV4, BaseV6, BaseNet, BaseIP, etc.) to
    provide the full functionality of IPv4Address, IPv4Network, IPv6Address and
    IPv6Network.

    - Distinct IPv4 and IPv6 objects.

    While there are many similarities, IPv4 and IPv6 objects are fundamentally
    different.  The similarities allow for easy abstraction of certain
    operations which affect the bits of both in the same manner, but their
    differences mean that attempting to combine them into one object would be
    like trying to force a round peg into a square hole (or vice versa).

    - Distinct network and address objects.

    Many people think of IP addresses and IP networks as synonymous, but they
    are distinct.  An IPv4 address is a single 32-bit number, while the IPv4
    address assigned to a networked computer is a 32-bit address plus its
    associated network.  Similarly, an IPv6 address is a 128-bit number, while
    an IPv6 address assigned to a networked computer is a 128-bit number plus
    its associated network information.  The similarities lead to easy
    abstraction of some methods and properties, but there are a number of
    address- and network-specific properties which require the two to be
    distinct.  For instance, IP networks have a network address (the base
    address of the network), a broadcast address (the upper end of the
    network, also the address to which every machine on a given network is
    supposed to listen, hence the name broadcast), supernetworks and
    subnetworks, etc.  The individual addresses in an IP network don't have
    these properties; they're simply 32- or 128-bit numbers.
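    To make the address/network distinction concrete, the sketch below treats
    an IPv4 address as a plain 32-bit integer and derives the network-only
    properties (network address, broadcast address, masks) from that integer
    plus a prefix length.  It is a minimal illustration in plain Python, not
    ipaddr's implementation; the helper names v4_to_int, int_to_v4 and
    network_properties are hypothetical.

```python
def v4_to_int(text: str) -> int:
    """Hypothetical helper: dotted-quad string -> 32-bit integer."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"{text!r} is not a dotted-quad IPv4 address")
    # bytes() raises ValueError if any octet is outside 0..255.
    return int.from_bytes(bytes(int(p) for p in parts), "big")


def int_to_v4(value: int) -> str:
    """Hypothetical helper: 32-bit integer -> dotted-quad string."""
    return ".".join(str(octet) for octet in value.to_bytes(4, "big"))


def network_properties(address: str, prefixlen: int) -> dict:
    """Hypothetical helper: derive properties that only a network has."""
    if not 0 <= prefixlen <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    addr = v4_to_int(address)
    hostmask = (1 << (32 - prefixlen)) - 1   # low bits: host part
    netmask = 0xFFFFFFFF ^ hostmask          # high bits: network part
    return {
        "network": int_to_v4(addr & netmask),     # host bits cleared
        "broadcast": int_to_v4(addr | hostmask),  # host bits set
        "netmask": int_to_v4(netmask),
        "hostmask": int_to_v4(hostmask),
    }
```

    For example, the address 1.1.1.7 on its own is just the integer 16843015,
    while network_properties('1.1.1.7', 24) reports a network of '1.1.1.0'
    and a broadcast address of '1.1.1.255'.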
    
    - Treat network elements as lists (insofar as possible).

    Treating IP networks as lists is a natural extension of viewing the
    network as a series of individual IP addresses.  Most of the standard list
    methods should be implemented and should behave in a manner consistent
    with the IP network object being an actual list of strings or integers.
    The methods which build or modify a list's contents (__add__, __iadd__,
    __sub__, __isub__, etc.) don't extend as well to this model, but others
    (__contains__, __iter__, etc.) work quite nicely.  It should be noted that
    __len__ doesn't work as expected: Python requires the value returned by
    __len__ to fit in a C Py_ssize_t (32 or 64 bits depending on the
    platform), while an IPv6 network can contain up to 2**128 addresses.
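    A minimal sketch of the list-like behaviour described above, assuming
    addresses are represented as plain integers: __contains__ and __iter__
    work for networks of any size, whereas len() fails for very large IPv6
    networks.  The class name NetworkSketch and its numhosts() method are
    hypothetical, used only for illustration.

```python
class NetworkSketch:
    """Hypothetical network treated as a read-only sequence of int addresses."""

    def __init__(self, network: int, prefixlen: int, bits: int = 32):
        host_bits = bits - prefixlen
        # Clear any host bits so `first` is the base (network) address.
        self.first = (network >> host_bits) << host_bits
        self.last = self.first | ((1 << host_bits) - 1)

    def __contains__(self, address: int) -> bool:
        return self.first <= address <= self.last

    def __iter__(self):
        # range() is lazy and accepts arbitrarily large Python ints.
        return iter(range(self.first, self.last + 1))

    def numhosts(self) -> int:
        # Arbitrary-precision count; safe even for an IPv6 /0.
        return self.last - self.first + 1

    def __len__(self) -> int:
        # len() raises OverflowError if this exceeds sys.maxsize.
        return self.numhosts()
```

    With NetworkSketch(0, 0, bits=128), numhosts() returns 2**128, but calling
    len() on the same object raises OverflowError, which is why a separate
    size accessor is useful.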

    - Lightweight.

    While some network programmers will undoubtedly want more than this library
    provides, keeping the functionality to strictly what's required of an IP
    address manipulation module is critical to keeping the code fast, easily
    comprehensible and extensible.  I've tried to provide enough functionality
    to allow developers to do their work easily without needlessly cluttering
    the library.  Finally, it's important to note that this design doesn't
    prevent subclassing or otherwise extending the library to meet unforeseen
    needs.


Specification:

    A slightly more detailed look at the library follows.

    - Design

    ipaddr has four main classes most people will use:

    1. IPv4Address  (e.g. '192.168.1.1')
    2. IPv4Network  (e.g. '192.168.0.0/16')
    3. IPv6Address  (e.g. '::1')
    4. IPv6Network  (e.g. '2001::/32')

    Most of the operations a network administrator performs on networks are
    similar for both IPv4 and IPv6 networks, e.g. finding subnets and
    supernets, or determining whether an address is contained in a given
    network.  Similarly, both addresses and networks (of the same IP version!)
    have much in common: the process for turning a given 32- or 128-bit number
    into a human-readable string notation, determining whether the IP is
    within the valid range, etc.  Finally, there are some pythonic
    abstractions which are valid for all addresses and networks, both IPv4 and
    IPv6.  In short, there is common functionality shared between (ipaddr
    class names in parentheses):

    1. all IP addresses and networks, both IPv4 and IPv6. (IPAddrBase)

    2. all IP addresses of both versions. (BaseIP)

    3. all IP networks of both versions. (BaseNet)

    4. all IPv4 objects, both addresses and networks. (BaseV4)

    5. all IPv6 objects, both addresses and networks. (BaseV6)

    Seeing this as a clear hierarchy is important for recognizing how much
    code is common between the four main classes.  For this reason, ipaddr uses
    class inheritance to abstract out as much common code as possible and
    appropriate.  This lack of duplication and very clean layout also make
    the job of the developer much easier should they need to debug code (either
    theirs or mine).
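    The hierarchy above can be sketched with Python's multiple inheritance:
    each concrete class combines one version mixin with one address-or-network
    base.  This is a skeleton under assumptions, not ipaddr's actual code; the
    attributes version and max_prefixlen and the methods is_network() and
    describe() are hypothetical stand-ins for the shared functionality.

```python
# Hypothetical skeleton of the class hierarchy; bodies are placeholders.
class IPAddrBase:
    """Shared by all addresses and networks, both IPv4 and IPv6."""

    def describe(self) -> str:
        kind = "network" if self.is_network() else "address"
        return f"IPv{self.version} {kind}"


class BaseIP(IPAddrBase):
    """Shared by all IP addresses of both versions."""

    def is_network(self) -> bool:
        return False


class BaseNet(IPAddrBase):
    """Shared by all IP networks of both versions."""

    def is_network(self) -> bool:
        return True


class BaseV4:
    """Shared by all IPv4 objects, both addresses and networks."""
    version = 4
    max_prefixlen = 32


class BaseV6:
    """Shared by all IPv6 objects, both addresses and networks."""
    version = 6
    max_prefixlen = 128


# Each concrete class = version mixin + address/network base.
class IPv4Address(BaseV4, BaseIP):
    pass


class IPv4Network(BaseV4, BaseNet):
    pass


class IPv6Address(BaseV6, BaseIP):
    pass


class IPv6Network(BaseV6, BaseNet):
    pass
```

    Placing the version mixin first in each class's bases keeps the method
    resolution order simple: version-specific attributes are found first,
    then address- or network-specific behaviour, then the common base.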

    Knowing that there might be cases where the developer doesn't care which
    IP version they might be receiving, ipaddr comes with two important helper
    functions, IPAddress() and IPNetwork().  These, as you might guess, return
    the appropriately typed address or network object for the given argument.
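    A minimal sketch of how such a factory can dispatch on its input,
    assuming it simply tries IPv4 parsing first and falls back to IPv6.
    Parsing is delegated here to the standard library's socket.inet_pton();
    the classes V4Address and V6Address and the function ip_address_factory()
    are hypothetical, not ipaddr's.

```python
import socket


class V4Address:
    """Hypothetical IPv4 address holding its 4-byte packed form."""
    version = 4

    def __init__(self, text: str):
        self.packed = socket.inet_pton(socket.AF_INET, text)


class V6Address:
    """Hypothetical IPv6 address holding its 16-byte packed form."""
    version = 6

    def __init__(self, text: str):
        self.packed = socket.inet_pton(socket.AF_INET6, text)


def ip_address_factory(text: str):
    """Hypothetical factory: return whichever address type parses the input."""
    for cls in (V4Address, V6Address):
        try:
            return cls(text)
        except (OSError, ValueError):
            # inet_pton raises OSError for strings it cannot parse.
            continue
    raise ValueError(f"{text!r} does not appear to be an IPv4 or IPv6 address")
```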

    Finally, this distinction between IPv4 and IPv6 means that ordering
    comparisons between objects of different IP versions raise TypeError in
    py3k, per Ordering Comparisons [2].  In practice, this shouldn't pose a
    problem for the developer, who can easily write:

    v4 = [x for x in mixed_list if x._version == 4]
    v6 = [x for x in mixed_list if x._version == 6]

    # perform operations on v4 and v6 separately here, e.g. sort each list.
    v4_return = sorted(v4)
    v6_return = sorted(v6)

    result = v4_return + v6_return
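    To illustrate why such a split is needed, the sketch below uses a
    hypothetical address class whose __lt__ refuses to compare objects of
    different versions: sorted() then raises TypeError on a mixed list, while
    sorting with an explicit (version, value) key works.  VersionedAddress is
    an assumption for demonstration, not ipaddr's implementation.

```python
class VersionedAddress:
    """Hypothetical address: an integer value tagged with its IP version."""

    def __init__(self, version: int, value: int):
        self.version = version
        self.value = value

    def __lt__(self, other):
        if not isinstance(other, VersionedAddress):
            return NotImplemented
        if self.version != other.version:
            # Ordering across IP versions is deliberately undefined.
            raise TypeError("cannot compare IPv4 and IPv6 objects")
        return self.value < other.value

    def __repr__(self) -> str:
        return f"VersionedAddress({self.version}, {self.value})"


mixed = [VersionedAddress(6, 1), VersionedAddress(4, 5), VersionedAddress(4, 2)]

try:
    sorted(mixed)
except TypeError as exc:
    print("sorted() failed:", exc)

# An explicit key sorts IPv4 before IPv6, then by numeric value.
ordered = sorted(mixed, key=lambda a: (a.version, a.value))
print(ordered)
```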
    
    - Multiple ways of displaying an IP Address.

    Not everyone will want to display the same information in the same format;
    IP addresses in Cisco syntax are represented by network/hostmask, Juniper's
    are (network/IP)/prefixlength, and iptables' are
    (network/IP)/(prefixlength/netmask).  The ipaddr library provides multiple
    ways to display an address.

    In [1]: IPNetwork('1.1.1.1').with_prefixlen
    Out[1]: '1.1.1.1/32'

    In [1]: IPNetwork('1.1.1.1').with_netmask
    Out[1]: '1.1.1.1/255.255.255.255'

    In [1]: IPNetwork('1.1.1.1').with_hostmask
    Out[1]: '1.1.1.1/0.0.0.0'

    The same applies to IPv6.
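    The three IPv4 notations above differ only in how the part after the
    slash is written.  A minimal sketch, assuming the address is already in
    dotted-quad form and the prefix length is known; format_forms() and
    _dotted() are hypothetical helpers, not part of ipaddr.

```python
def _dotted(value: int) -> str:
    """Hypothetical helper: render a 32-bit integer as dotted-quad notation."""
    return ".".join(str(octet) for octet in value.to_bytes(4, "big"))


def format_forms(ip: str, prefixlen: int) -> dict:
    """Hypothetical helper: prefix-length, netmask and hostmask notations."""
    hostmask = (1 << (32 - prefixlen)) - 1   # host bits set
    netmask = 0xFFFFFFFF ^ hostmask          # network bits set
    return {
        "with_prefixlen": f"{ip}/{prefixlen}",
        "with_netmask": f"{ip}/{_dotted(netmask)}",
        "with_hostmask": f"{ip}/{_dotted(hostmask)}",
    }
```

    Calling format_forms('1.1.1.1', 32) reproduces the three strings shown in
    the sessions above.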

    - Lazy evaluation combined with aggressive caching of network elements.

    (The following example uses IPv4Network objects, but the exact same
    properties apply to IPv6Network objects.)

    As mentioned, an IP network object is defined by a number of properties.
    For example, the object IPv4Network('1.1.1.0/24') has a number of
    IPv4Address properties:

    In [1]: o = IPv4Network('1.1.1.0/24')

    In [2]: o.network
    Out[2]: IPv4Address('1.1.1.0')

    In [3]: o.broadcast
    Out[3]: IPv4Address('1.1.1.255')

    In [4]: o.hostmask
    Out[4]: IPv4Address('0.0.0.255')

    If we were to compute them all at object creation time, we would incur a
    non-negligible performance hit. Since these properties are required to
    define the object completely but their values aren't always of interest to
    the programmer, their computation should be done only when requested.
    However, in order to avoid the performance hit in the case where one
    attribute for a particular object is requested repeatedly (and continuously
    recomputed), the results of the computation should be cached.
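    One way to get compute-on-first-access plus caching in modern Python is
    functools.cached_property (available since Python 3.8, so newer than the
    version this PEP targets).  The sketch below is an assumption about how
    it could be applied, not ipaddr's own caching mechanism; the LazyNetwork
    class and its computations counter are hypothetical.

```python
from functools import cached_property


class LazyNetwork:
    """Hypothetical IPv4 network whose derived attributes are computed lazily."""

    def __init__(self, address: int, prefixlen: int):
        self._ip = address
        self._prefixlen = prefixlen
        self.computations = 0  # how many times broadcast was actually computed

    @cached_property
    def hostmask(self) -> int:
        return (1 << (32 - self._prefixlen)) - 1

    @cached_property
    def network(self) -> int:
        return self._ip & ~self.hostmask

    @cached_property
    def broadcast(self) -> int:
        self.computations += 1
        return self.network | self.hostmask
```

    Nothing is computed at construction time.  The first access to broadcast
    computes (and caches) network and hostmask as well; later accesses read
    the value stored in the instance's __dict__ without recomputing it.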

    - Address list summarization.

    ipaddr supports easy summarization of lists of possibly contiguous
    addresses, as this is something network administrators constantly find
    themselves doing.  This currently works in two ways.

    1. collapse_address_list([list]):

    Given a list of networks, ipaddr will collapse the list into the smallest
    possible list of networks that wholly contain the addresses supplied.
    
    In [1]: collapse_address_list([IPNetwork('1.1.0.0/24'),
    ...:                           IPNetwork('1.1.1.0/24')])
    Out[1]: [IPv4Network('1.1.0.0/23')]

    More elaborately:

    In [1]: collapse_address_list([IPNetwork(x) for x in
    ...:                           IPNetwork('1.1.0.0/23')])
    Out[1]: [IPv4Network('1.1.0.0/23')]
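    The collapsing step can be sketched as two passes over (network, prefixlen)
    integer pairs: drop any network already contained in a larger one, then
    repeatedly merge adjacent sibling networks (the two halves of the same
    parent) into their parent until nothing changes.  This is one possible
    approach, not necessarily the algorithm ipaddr uses; collapse_networks()
    is a hypothetical name, and the inputs are assumed to be of a single IP
    version with host bits cleared.

```python
def collapse_networks(networks, bits=32):
    """Hypothetical helper: collapse (network_int, prefixlen) pairs."""
    # Sort by base address; for equal bases the shorter prefix
    # (the larger network) comes first.
    nets = sorted(set(networks))
    kept = []
    for net, plen in nets:
        if kept:
            knet, kplen = kept[-1]
            shift = bits - kplen
            # Skip networks wholly contained in the last kept network.
            if plen >= kplen and net >> shift == knet >> shift:
                continue
        kept.append((net, plen))

    merged = True
    while merged:
        merged = False
        out = []
        for net, plen in kept:
            if out:
                pnet, pplen = out[-1]
                size = 1 << (bits - plen)
                # Siblings: same prefix length, the lower one aligned to
                # the parent, the upper one immediately after it.
                if (pplen == plen and plen > 0
                        and (pnet & size) == 0 and pnet + size == net):
                    out[-1] = (pnet, plen - 1)
                    merged = True
                    continue
            out.append((net, plen))
        kept = out
    return kept
```

    Feeding it the 512 single-address /32 networks of 1.1.0.0/23 yields
    [(0x01010000, 23)], mirroring the second session above.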

    2. summarize_address_range(first, last).

    Given a start and end address, ipaddr will provide the smallest number of
    networks to cover the given range.
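    A greedy way to compute such a cover: starting from the first address,
    take the largest network that both begins at that address (its size is
    limited by the address's trailing zero bits) and does not extend past the
    last address, then continue right after it.  A minimal sketch on integer
    addresses; summarize_range() is a hypothetical name and this is not
    necessarily ipaddr's exact algorithm.

```python
def summarize_range(first: int, last: int, bits: int = 32):
    """Hypothetical helper: fewest (network_int, prefixlen) pairs for first..last."""
    if first > last:
        raise ValueError("first address must not be greater than last")
    result = []
    while first <= last:
        # A block must start on a multiple of its own size, so the number
        # of trailing zero bits in `first` bounds the block size.
        host_bits = bits if first == 0 else (first & -first).bit_length() - 1
        # Shrink the block until it fits inside the remaining range.
        while first + (1 << host_bits) - 1 > last:
            host_bits -= 1
        result.append((first, bits - host_bits))
        first += 1 << host_bits
    return result
```

    Applied to 1.1.1.0 (0x01010100) and 2.2.2.0 (0x02020200), it produces the
    same eighteen networks as the session below, from 1.1.1.0/24 through
    2.2.2.0/32.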


    In [1]: summarize_address_range(IPv4Address('1.1.1.0'),
    ...:                            IPv4Address('2.2.2.0'))
    Out[1]:
    [IPv4Network('1.1.1.0/24'),
     IPv4Network('1.1.2.0/23'),
     IPv4Network('1.1.4.0/22'),
     IPv4Network('1.1.8.0/21'),
     IPv4Network('1.1.16.0/20'),
     IPv4Network('1.1.32.0/19'),
     IPv4Network('1.1.64.0/18'),
     IPv4Network('1.1.128.0/17'),
     IPv4Network('1.2.0.0/15'),
     IPv4Network('1.4.0.0/14'),
     IPv4Network('1.8.0.0/13'),
     IPv4Network('1.16.0.0/12'),
     IPv4Network('1.32.0.0/11'),
     IPv4Network('1.64.0.0/10'),
     IPv4Network('1.128.0.0/9'),
     IPv4Network('2.0.0.0/15'),
     IPv4Network('2.2.0.0/23'),
     IPv4Network('2.2.2.0/32')]
    
    - Address Exclusion.

    Used somewhat less often, but all the more annoying, is the case where a
    programmer wants "all of the addresses in a network *except* these".
    ipaddr performs this exclusion equally well for IPv4 and IPv6 networks
    and collapses the resulting address list.
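    Exclusion can be sketched by repeatedly halving the network: at each step
    keep the half that does not contain the excluded network and descend into
    the half that does, until the excluded network itself is reached.  This
    is a minimal sketch on (network, prefixlen) integer pairs, not ipaddr's
    address_exclude() itself; exclude_network() is a hypothetical name.

```python
def exclude_network(net, plen, xnet, xplen, bits=32):
    """Hypothetical helper: net/plen minus xnet/xplen as sorted pairs."""
    # Clear host bits so both values are proper network addresses.
    net = (net >> (bits - plen)) << (bits - plen)
    xnet = (xnet >> (bits - xplen)) << (bits - xplen)
    if xplen < plen or xnet >> (bits - plen) != net >> (bits - plen):
        raise ValueError("excluded network is not inside the network")
    result = []
    cur, cur_plen = net, plen
    # Halve the current network until it equals the excluded one; at each
    # step keep the half that does NOT contain the excluded network.
    while cur_plen < xplen:
        cur_plen += 1
        half = 1 << (bits - cur_plen)
        if xnet & half:
            result.append((cur, cur_plen))         # keep lower half
            cur += half                            # descend into upper half
        else:
            result.append((cur + half, cur_plen))  # keep upper half
    return sorted(result)
```

    exclude_network(0x01010000, 15, 0x01010100, 24) returns the nine networks
    of the first session below (1.0.0.0/16 through 1.1.128.0/17), and with
    bits=128 the same function yields the sixteen IPv6 networks of the second.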

    In [1]: IPNetwork('1.1.0.0/15').address_exclude(IPNetwork('1.1.1.0/24'))
    Out[1]:
    [IPv4Network('1.0.0.0/16'),
     IPv4Network('1.1.0.0/24'),
     IPv4Network('1.1.2.0/23'),
     IPv4Network('1.1.4.0/22'),
     IPv4Network('1.1.8.0/21'),
     IPv4Network('1.1.16.0/20'),
     IPv4Network('1.1.32.0/19'),
     IPv4Network('1.1.64.0/18'),
     IPv4Network('1.1.128.0/17')]

    In [1]: IPNetwork('::1/96').address_exclude(IPNetwork('::1/112'))
    Out[1]:
    [IPv6Network('::1:0/112'),
     IPv6Network('::2:0/111'),
     IPv6Network('::4:0/110'),
     IPv6Network('::8:0/109'),
     IPv6Network('::10:0/108'),
     IPv6Network('::20:0/107'),
     IPv6Network('::40:0/106'),
     IPv6Network('::80:0/105'),
     IPv6Network('::100:0/104'),
     IPv6Network('::200:0/103'),
     IPv6Network('::400:0/102'),
     IPv6Network('::800:0/101'),
     IPv6Network('::1000:0/100'),
     IPv6Network('::2000:0/99'),
     IPv6Network('::4000:0/98'),
     IPv6Network('::8000:0/97')]

    - IPv6 address compression. (in a pending changelist [3])

    By default, IPv6 addresses are compressed internally (see the method
    BaseV6._compress_hextets), but ipaddr makes both the compressed and the
    exploded representations available.
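    Both representations can be sketched directly from the 128-bit integer:
    exploding pads each of the eight 16-bit groups (hextets) to four hex
    digits, and compressing replaces the longest run of two or more all-zero
    hextets with '::' (the leftmost run wins a tie, as recommended by RFC
    5952).  This is a minimal sketch, not BaseV6._compress_hextets itself;
    the function names are hypothetical.

```python
def _hextets(value: int) -> list:
    """Hypothetical helper: eight 16-bit groups, most significant first."""
    return [(value >> (112 - 16 * i)) & 0xFFFF for i in range(8)]


def explode_v6(value: int) -> str:
    """Hypothetical helper: fully padded IPv6 notation."""
    return ":".join(f"{h:04x}" for h in _hextets(value))


def compress_v6(value: int) -> str:
    """Hypothetical helper: shortest IPv6 notation using '::'."""
    hextets = _hextets(value)
    # Find the longest run of zero hextets (the first one wins ties).
    best_start, best_len = -1, 0
    run_start, run_len = -1, 0
    for i, h in enumerate(hextets):
        if h == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    parts = [f"{h:x}" for h in hextets]
    if best_len < 2:
        # A single zero hextet is not replaced by '::'.
        return ":".join(parts)
    head = ":".join(parts[:best_start])
    tail = ":".join(parts[best_start + best_len:])
    return f"{head}::{tail}"
```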

    In [1]: IPNetwork('::1').compressed
    Out[1]: '::1/128'

    In [2]: IPNetwork('::1').exploded
    Out[2]: '0000:0000:0000:0000:0000:0000:0000:0001/128'

    In [3]: IPv6Address('::1').exploded
    Out[3]: '0000:0000:0000:0000:0000:0000:0000:0001'

    In [4]: IPv6Address('::1').compressed
    Out[4]: '::1'

    (The same properties exist for IPv4 networks and addresses, but they're
    just stubs that return the normal __str__ representation.)


Reference Implementation:

    A reference implementation is available at:
    http://ipaddr-py.googlecode.com/svn/branches/2.0.x


References:

    [1] http://bugs.python.org/issue3959
    [2] http://docs.python.org/dev/3.0/whatsnew/3.0.html#ordering-comparisons
    [3] http://codereview.appspot.com/110044


Copyright:

    This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: