Packaging

This chapter focuses on a repeatable process of writing and releasing Python packages. We will see how to shorten the time needed to set up everything before starting the real work. We will also learn how to provide a standardized way to write packages and ease the use of a test-driven development approach. We will finally learn how to facilitate the release process.

It is organized into the following four parts:

  • A common pattern for all packages that describes the similarities between all Python packages, and how distutils and setuptools play a central role in the packaging process.

  • What namespace packages are and why they can be useful.

  • How to register and upload packages in the Python Package Index (PyPI) with emphasis on security and common pitfalls.

  • The standalone executables as an alternative way to package and distribute Python applications.

1. Creating a package

Python packaging can be a bit overwhelming at first. The main reason for that is the confusion about the proper tools for creating Python packages. Anyway, once you create your first package, you will see that it is not as hard as it looks. Also, knowing the proper, state-of-the-art packaging tools helps a lot.

You should know how to create packages even if you are not interested in distributing your code as open source. Knowing how to make your own packages will give you more insight into the packaging ecosystem and will help you to work with the third-party code available on PyPI that you are probably already using.

Also, having your closed source project or its components available as source distribution packages can help you to deploy your code in different environments. The advantages of leveraging the Python packaging ecosystem in the code deployment process will be described in more detail in the next chapter. Here we will focus on proper tools and techniques to create such distributions.

1.1. The confusing state of Python packaging tools

The state of Python packaging was very confusing for a long time and it took many years to bring organization to this topic. Everything started with the distutils package introduced in 1998, which was later enhanced by setuptools in 2003. These two projects started a long and knotted story of forks, alternative projects, and complete rewrites that tried to (once and for all) fix the Python packaging ecosystem. Unfortunately, most of these attempts never succeeded. The effect was quite the opposite. Each new project that aimed to supersede setuptools or distutils only added to the already huge confusion around packaging tools. Some of such forks were merged back to their ancestors (such as distribute which was a fork of setuptools) but some were left abandoned (such as distutils2).

Fortunately, this state is gradually changing. An organization called the Python Packaging Authority (PyPA) was formed to bring back the order and organization to the packaging ecosystem. The Python Packaging User Guide (https://packaging.python.org), maintained by PyPA, is the authoritative source of information about the latest packaging tools and best practices. Treat that site as the best source of information about packaging and complementary reading for this chapter. This guide also contains a detailed history of changes and new projects related to packaging. So it is worth reading it, even if you already know a bit about packaging, to make sure you still use the proper tools.

Stay away from other popular internet resources, such as “The Hitchhiker’s Guide to Packaging”. It is old, not maintained, and mostly obsolete. It may be interesting only for historical reasons, and the Python Packaging User Guide is in fact a fork of this old resource.

1.1.1. The current landscape of Python packaging thanks to PyPA

PyPA, besides providing an authoritative guide for packaging, also maintains packaging projects and a standardization process for new official aspects of Python packaging. All of PyPA’s projects can be found under a single organization on GitHub: https://github.com/pypa

Some of them were already mentioned. The following are the most notable:

  • pip

  • virtualenv

  • twine

  • warehouse

Note that most of them were started outside of this organization and were moved under PyPA patronage once they became mature and widespread solutions.

Thanks to PyPA engagement, the progressive abandonment of the egg format in favor of wheels for built distributions has already happened. Also, thanks to the commitment of the PyPA community, the old PyPI implementation was finally completely rewritten in the form of the Warehouse project. Now, PyPI has a modernized user interface and many long-awaited usability improvements and features.

1.1.2. Tool recommendations

The Python Packaging User Guide gives a few suggestions on recommended tools for working with packages. They can be generally divided into the following two groups:

  • Tools for installing packages

  • Tools for package creation and distribution

The recommended utilities from the first group are as follows:

  • Use pip for installing packages from PyPI.

  • Use virtualenv or venv for application-level isolation of the Python runtime environment.

The Python Packaging User Guide recommendations of tools for package creation and distribution are as follows:

  • Use setuptools to define projects and create source distributions.

  • Use wheels in favor of eggs to create built distributions.

  • Use twine to upload package distributions to PyPI.

1.2. Project configuration

It should be obvious that the easiest way to organize the code of big applications is to split them into several packages. This makes the code simpler, easier to understand, maintain, and change. It also maximizes the reusability of your code. Separate packages act as components that can be used in various programs.

1.2.1. setup.py

The root directory of a package that is to be distributed contains a setup.py script. It defines all the metadata described in the distutils module. Package metadata is expressed as arguments in a call to the standard setup() function. Despite distutils being the standard library module provided for the purpose of code packaging, it is actually recommended to use setuptools instead. The setuptools package provides several enhancements over the standard distutils module.

Therefore, the minimum content for this file is as follows:

from setuptools import setup

setup(
    name='mypackage'
)

name gives the full name of the package. From there, the script provides several commands that can be listed with the --help-commands option, as shown in the following code:

$ python3 setup.py --help-commands
Standard commands:
    build           build everything needed to install
    clean           clean up temporary files from 'build' command
    install         install everything from build directory
    sdist           create a source distribution (tarball, zip file, etc.)
    register        register the distribution with the Python package index
    bdist           create a built (binary) distribution
    check           perform some checks on the package
    upload          upload binary package to PyPI

Extra commands:
    bdist_wheel     create a wheel distribution
    alias           define a shortcut to invoke one or more commands
    develop         install package in 'development mode'

usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help

The actual list of commands is longer and can vary depending on the available setuptools extensions. It was truncated to show only those that are most important and relevant to this chapter. Standard commands are the built-in commands provided by distutils, whereas extra commands are the ones provided by third-party packages, such as setuptools or any other package that defines and registers a new command. Here, one such extra command registered by another package is bdist_wheel, provided by the wheel package.

1.2.2. setup.cfg

The setup.cfg file contains default options for the commands of the setup.py script. This is very useful if the process for building and distributing the package is more complex and requires many optional arguments to be passed to the setup.py commands. The setup.cfg file allows you to store such default parameters together with your source code on a per-project basis. This makes your distribution flow independent of any user-specific or environment-specific configuration, and also provides transparency about how your package was built and distributed to users and other team members.

The syntax for the setup.cfg file is the same as provided by the built-in configparser module so it is similar to the popular Microsoft Windows INI files. Here is an example of the setup.cfg configuration file that provides some global, sdist, and bdist_wheel commands’ defaults:

[global]
quiet=1

[sdist]
formats=zip,tar

[bdist_wheel]
universal=1

This example configuration will ensure that source distributions (the sdist section) are always created in two formats (ZIP and TAR), and that built wheel distributions (the bdist_wheel section) are created as universal wheels, independent of the Python version. Also, most of the output will be suppressed on every command by the global --quiet switch. Note that the quiet option is included here only for demonstration purposes; it may not be a reasonable choice to suppress the output for every command by default.
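
With this setup.cfg in place, the stored defaults are picked up automatically, so the two following invocations become equivalent:

$ python3 setup.py sdist
$ python3 setup.py --quiet sdist --formats=zip,tar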

1.2.3. MANIFEST.in

When building a distribution with the sdist command, the distutils module browses the package directory looking for files to include in the archive. By default distutils will include the following:

  • All Python source files implied by the py_modules, packages, and scripts arguments

  • All C source files listed in the ext_modules argument

  • Files that match the glob pattern test/test*.py

  • Files named README, README.txt, setup.py, and setup.cfg

Besides that, if your package is versioned with a version control system such as Subversion, Mercurial, or Git, there is the possibility to auto-include all version controlled files using additional setuptools extensions such as setuptools-svn, setuptools-hg, and setuptools-git. Integration with other version control systems is also possible through other custom extensions. No matter if it is the default built-in collection strategy or one defined by custom extension, the sdist will create a MANIFEST file that lists all files and will include them in the final archive.

Let’s say you are not using any extra extensions, and you need to include in your package distribution some files that are not captured by default. You can define a template called MANIFEST.in in your package root directory (the same directory as setup.py file). This template directs the sdist command on which files to include.

This MANIFEST.in template defines one inclusion or exclusion rule per line:

include HISTORY.txt
include README.txt
include CHANGES.txt
include CONTRIBUTORS.txt
include LICENSE
recursive-include docs *.txt *.py

The full list of the MANIFEST.in commands can be found in the official distutils documentation.
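
Exclusion rules use the same one-rule-per-line syntax. For instance, the following (purely illustrative) rules would strip byte-compiled files from the whole tree and drop a generated documentation build directory from the source distribution:

global-exclude *.py[cod]
prune docs/_build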

1.2.4. Most important metadata

Besides the name and the version of the package being distributed, the most important arguments that the setup() function can receive are as follows (a combined example is shown after this list):

  • description: This includes a few sentences to describe the package.

  • long_description: This includes a full description that can be in reStructuredText (default) or other supported markup languages.

  • long_description_content_type: This defines the MIME type of the long description; it tells the package repository what kind of markup language is used for the package description.

  • keywords: This is a list of keywords that define the package and allow for better indexing in the package repository.

  • author: This is the name of the package author or organization that takes care of it.

  • author_email: This is the contact email address.

  • url: This is the URL of the project.

  • license: This is the name of the license (GPL, LGPL, and so on) under which the package is distributed.

  • packages: This is a list of all package names in the package distribution; setuptools provides a small function called find_packages that can automatically find package names to include.

  • namespace_packages: This is a list of namespace packages within package distribution.
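
Taken together, a setup() call that uses most of these arguments could look like the following sketch (all names and values here are purely illustrative):

from setuptools import setup, find_packages

setup(
    name='mypackage',
    version='0.0.1',
    description='mypackage project short description',
    long_description=open('README.rst').read(),
    long_description_content_type='text/x-rst',
    keywords=['packaging', 'example'],
    author='Jane Doe',
    author_email='jane.doe@example.com',
    url='https://github.com/example/mypackage',
    license='BSD',
    packages=find_packages(),
)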

1.2.5. Trove classifiers

PyPI and distutils provide a solution for categorizing applications with the set of classifiers called trove classifiers. All trove classifiers form a tree-like structure. Each classifier string defines a list of nested namespaces where every namespace is separated by the :: substring. The list of classifiers is provided to the package definition as the classifiers argument of the setup() function.

Here is an example list of classifiers taken from the solrq project available on PyPI:

from setuptools import setup


setup(
    name="solrq",
    # (...)
    classifiers=[
        'Development Status :: 4 - Beta',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: BSD License',
        'Operating System :: OS Independent',
        'Programming Language :: Python',
        'Programming Language :: Python :: 2',
        'Programming Language :: Python :: 2.6',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.2',
        'Programming Language :: Python :: 3.3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: Implementation :: PyPy',
        'Topic :: Internet :: WWW/HTTP :: Indexing/Search',
    ]
)

Trove classifiers are completely optional in the package definition but provide a useful extension to the basic metadata available in the setup() interface. Among others, trove classifiers may provide information about supported Python versions, supported operating systems, the development stage of the project, or the license under which the code is released. Many PyPI users search and browse the available packages by categories so a proper classification helps packages to reach their target.

Trove classifiers serve an important role in the whole packaging ecosystem and should never be ignored. No organization verifies package classification, so it is your responsibility to provide proper classifiers for your packages and not to introduce chaos into the whole package index.

At the time of writing this section, there are 667 classifiers available on PyPI that are grouped into the following nine major categories:

  • Development status

  • Environment

  • Framework

  • Intended audience

  • License

  • Natural language

  • Operating system

  • Programming language

  • Topic

This list is ever-growing, and new classifiers are added from time to time. It is thus possible that the total count of them will be different at the time you read this. The full list of currently available trove classifiers is available at https://pypi.org/classifiers.

1.2.6. Common patterns

Creating a package for distribution can be a tedious task for inexperienced developers. Most of the metadata that setuptools or distutils accept in their setup() function call can be provided manually, ignoring the fact that this metadata may also be available in other parts of the project. Here is an example:

from setuptools import setup


setup(
    name="myproject",
    version="0.0.1",
    description="mypackage project short description",
    long_description="""
        Longer description of mypackage project
        possibly with some documentation and/or
        usage examples
    """,
    install_requires=[
        'dependency1',
        'dependency2',
        'etc'
    ]
)

Some of the metadata elements are often found in different places in a typical Python project. For instance, the content of the long description is commonly included in the project's README file, and it is a good convention to put a version specifier in the __init__ module of the package. Hardcoding such package metadata as setup() function arguments adds redundancy to the project and allows for easy mistakes and inconsistencies in the future. Neither setuptools nor distutils can automatically pick metadata information from the project sources, so you need to provide it yourself. There are some common patterns in the Python community for solving the most popular problems, such as dependency management and version/readme inclusion. It is worth knowing at least a few of them because they are so popular that they could be considered packaging idioms.

1.2.6.1. Automated inclusion of version string from package

The PEP 440 “Version Identification and Dependency Specification” document specifies a standard for version and dependency specification. It is a long document that covers accepted version specification schemes and defines how version matching and comparison in Python packaging tools should work. If you are using, or plan to use, a complex project version numbering scheme, then you should definitely read this document carefully. If you are using a simple scheme that consists of just one, two, three, or more numbers separated by dots, then you don't have to dig into the details of PEP 440. If you don't know how to choose the proper versioning scheme, I highly recommend following the semantic versioning scheme.
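
For illustration, all of the following version identifiers are valid under PEP 440, from plain release numbers to development, pre-release, and post-release variants:

1.0
2.1.3
1.0.dev1
1.0a1
1.0rc1
1.0.post1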

The other problem related to code versioning is where to include the version specifier for a package or module. PEP 396 (Module Version Numbers) deals exactly with this problem. PEP 396 is only an informational document and has a deferred status, so it is not a part of the official Python standards track. Anyway, it describes what seems to be a de facto standard now. According to PEP 396, if a package or module has a specific version defined, the version specifier should be included as a __version__ attribute of the package's root __init__.py file or of the distributed module file. Another de facto standard is to also include the VERSION attribute that contains a tuple of the version specifier parts. This helps users to write compatibility code, because such version tuples can be easily compared if the versioning scheme is simple enough.

Many packages available on PyPI follow both of these conventions. Their __init__.py files contain version attributes that look like the following:

VERSION = (0, 1, 1)
__version__ = ".".join([str(x) for x in VERSION])

The other suggestion of PEP 396 is that the version argument provided in the setup() function of the setup.py script should be derived from __version__, or the other way around. The Python Packaging User Guide features multiple patterns for single-sourcing the project version, and each of them has its own advantages and limitations. My personal favorite is rather long and is not included in the PyPA's guide, but it has the advantage of limiting the complexity to the setup.py script only. This boilerplate assumes that the version specifier is provided by the VERSION attribute of the package's __init__ module and extracts this data for inclusion in the setup() call. Here is an excerpt from some imaginary package's setup.py script that illustrates this approach:

from setuptools import setup
import os

def get_version(version_tuple):
    # construct a version string from the VERSION tuple; a non-integer
    # last element (such as 'dev' or 'b1') is appended without a separator
    if not isinstance(version_tuple[-1], int):
        return '.'.join(map(str, version_tuple[:-1])) + version_tuple[-1]
    return '.'.join(map(str, version_tuple))

# find the line in the package's __init__.py that defines VERSION and
# evaluate only that tuple literal instead of importing the whole package
init = os.path.join(os.path.dirname(__file__), 'src', 'some_package', '__init__.py')
version_line = list(filter(lambda l: l.startswith('VERSION'), open(init)))[0]
PKG_VERSION = get_version(eval(version_line.split('=')[-1]))

setup(
    name='some-package',
    version=PKG_VERSION,
    # ...
)

1.2.6.2. README file

The Python Package Index can display the project’s README file or the value of long_description on the package page in the PyPI portal. PyPI is able to interpret the markup used in the long_description content and render it as HTML on the package page. The type of markup language is controlled through the long_description_content_type argument of the setup() call. For now, there are the following three choices for markup available:

  • Plain text with long_description_content_type='text/plain'

  • reStructuredText with long_description_content_type='text/x-rst'

  • Markdown with long_description_content_type='text/markdown'

Markdown and reStructuredText are the most popular choices among Python developers, but some might still want to use different markup languages for various reasons. If you want to use something different as the markup language for your project's README, you can still provide it as a project description on the PyPI page in a readable form. The trick lies in using the pypandoc package to translate your markup language into reStructuredText (or Markdown) while uploading the package to the Python Package Index. It is important to do it with a fallback to the plain content of your README file, so the installation won't fail if the user doesn't have pypandoc installed. The following is an example of a setup.py script that is able to read the content of a README file written in the AsciiDoc markup language and translate it to reStructuredText before including it as the long_description argument:

from setuptools import setup
import os


try:
    from pypandoc import convert

    def read_md(file_path):
        # convert the AsciiDoc README to reStructuredText
        return convert(file_path, to='rst', format='asciidoc')

except ImportError:
    convert = None
    print("warning: pypandoc module not found, could not convert Asciidoc to RST")

    def read_md(file_path):
        # fallback: return the raw README content unchanged
        with open(file_path, 'r') as f:
            return f.read()


README = os.path.join(os.path.dirname(__file__), 'README')

setup(
    name='some-package',
    long_description=read_md(README),
    long_description_content_type='text/x-rst',
    # ...
)

1.2.6.3. Managing dependencies

Many projects require some external packages to be installed in order to work properly. When the list of dependencies is very long, the question arises of how to manage it. The answer in most cases is very simple. Do not over-engineer it. Keep it simple and provide the list of dependencies explicitly in your setup.py script as follows:

from setuptools import setup


setup(
    name='some-package',
    install_requires=['falcon', 'requests', 'delorean']
    # ...
)

Some Python developers like to use requirements.txt files to track the lists of dependencies for their packages. In some situations, you might find some reason for doing that, but in most cases, this is a relic of times when the code of the project was not properly packaged. Anyway, even such notable projects as Celery still stick to this convention. So if you are not willing to change your habits, or you are somehow forced to use requirements files, then at least do it properly. Here is one of the popular idioms for reading the list of dependencies from the requirements.txt file:

from setuptools import setup
import os


def strip_comments(line):
    # drop everything after a '#' comment marker
    return line.split('#', 1)[0].strip()


def reqs(*parts):
    # read the requirements file and return its non-empty,
    # comment-stripped lines as a list of requirement strings
    with open(os.path.join(os.getcwd(), *parts)) as f:
        return list(filter(None, [strip_comments(line) for line in f]))


setup(
    name='some-package',
    install_requires=reqs('requirements.txt'),
    # ...
)
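
Assuming a (hypothetical) requirements.txt file with the following content, the preceding reqs() helper would return ['falcon', 'requests>=2.0', 'delorean']:

# HTTP APIs
falcon
requests>=2.0  # version specifiers are preserved
delorean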

1.2.7. The custom setup command

distutils allows you to create new commands. A new command can be registered with an entry point, which was introduced by setuptools as a simple way to define packages as plugins.

An entry point is a named link to a class or a function that is made available through some APIs in setuptools. Any application can scan for all registered packages and use the linked code as a plugin.

To link the new command, the entry_points metadata can be used in the setup call as follows:

setup(
    name="my.command",
    entry_points="""
        [distutils.commands]
        my_command = my.command.module:Class
    """
)

All named links are gathered in named sections. When distutils is loaded, it scans for links that were registered under distutils.commands.

This mechanism is used by numerous Python applications that provide extensibility.
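
To give a more concrete picture, here is a minimal sketch of what a command class referenced by such an entry point could look like (the class and its behavior are hypothetical; every distutils command must implement the initialize_options(), finalize_options(), and run() methods):

from setuptools import Command


class MyCommand(Command):
    description = 'example command that only prints a message'
    user_options = []  # this command accepts no command-line options

    def initialize_options(self):
        # set the default values for all supported options
        pass

    def finalize_options(self):
        # validate and post-process the option values
        pass

    def run(self):
        # the actual command logic goes here
        print('Hello from my_command!')

Once the package providing such an entry point is installed, the command becomes available as python setup.py my_command in any project.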

1.3. Working with packages during development

Working with setuptools is mostly about building and distributing packages. However, you still need setuptools to install packages directly from project sources. And the reason for that is simple: it is a good habit to test whether your packaging code works properly before submitting the package to PyPI. And the simplest way to test it is by installing it. If you send a broken package to the repository, then, in order to re-upload it, you need to increase the version number.

Testing if your code is packaged properly before the final distribution saves you from unnecessary version number inflation and obviously from wasting your time. Also, installation directly from your own sources using setuptools may be essential when working on multiple related packages at the same time.

1.3.1. setup.py install

The install command installs the package in your current Python environment. It will try to build the package if no previous build was made and then inject the result into the filesystem directory where Python is looking for installed packages. If you have an archive with a source distribution of some package, you can decompress it in a temporary folder and then install it with this command. The install command will also install dependencies that are defined in the install_requires argument. Dependencies will be installed from the Python Package Index.

An alternative to the bare setup.py script when installing a package is to use pip. Since it is a tool that is recommended by PyPA, you should use it even when installing a package in your local environment just for development purposes. In order to install a package from local sources, run the following command:

pip install <project-path>

1.3.2. Uninstalling packages

Amazingly, setuptools and distutils lack the uninstall command. Fortunately, it is possible to uninstall any Python package using pip as follows:

pip uninstall <package-name>

Uninstalling can be a dangerous operation when attempted on system-wide packages. This is another reason why it is so important to use virtual environments for any development.

1.3.3. setup.py develop or pip install -e

Packages installed with setup.py install are copied to the site-packages directory of your current Python environment. This means that whenever you make a change to the sources of that package, you are required to reinstall it. This is often a problem during intensive development, because it is very easy to forget about the need to perform the installation again. This is why setuptools provides an extra develop command that allows you to install packages in development mode. This command creates a special link to the project sources in the deployment directory (site-packages) instead of copying the whole package there. Package sources can then be edited without the need for reinstallation and are available on sys.path as if they were installed normally.

pip also allows you to install packages in such a mode. This installation option is called editable mode and can be enabled with the -e parameter in the install command as follows:

pip install -e <project-path>

Once you install the package in your environment in editable mode, you can freely modify the installed package in place and all the changes will be immediately visible without the need to reinstall the package.

2. Namespace packages

The Zen of Python that you can read after writing import this in the interpreter session says the following about namespaces:

“Namespaces are one honking great idea -- let’s do more of those!”

And this can be understood in at least two ways. The first is a namespace in the context of the language. We all use the following namespaces without even knowing:

  • The global namespace of a module

  • The local namespace of the function or method invocation

  • The class namespace

The other kind of namespaces can be provided at the packaging level. These are namespace packages. This is often an overlooked feature of Python packaging that can be very useful in structuring the package ecosystem in your organization or in a very large project.

Namespace packages can be understood as a way of grouping related packages, where each of these packages can be installed independently.

Namespace packages are especially useful if you have components of your application developed, packaged, and versioned independently but you still want to access them from the same namespace. This also helps to make clear to which organization or project every package belongs. For instance, for some imaginary Acme company, the common namespace could be acme. Therefore this organization could create the general acme namespace package that could serve as a container for other packages from this organization. For example, if someone from Acme wants to contribute to this namespace with, for example, an SQL-related library, they can create a new acme.sql package that registers itself in the acme namespace.

It is important to know the difference between normal packages and namespace packages, and the problem the latter solve. Normally (without namespace packages), you would create a package called acme with an sql subpackage/submodule and the following file structure:

$ tree acme/
acme/
├── acme
│    ├── __init__.py
│    └── sql
│       └── __init__.py
└── setup.py

2 directories, 3 files

Whenever you want to add a new subpackage, let’s say templating, you are forced to include it in the source tree of acme as follows:

$ tree acme/
acme/
├── acme
│   ├── __init__.py
│   ├── sql
│   │    └── __init__.py
│   └── templating
│        └── __init__.py
└── setup.py

3 directories, 4 files

Such an approach makes independent development of acme.sql and acme.templating almost impossible. The setup.py script will also have to specify all the dependencies for every subpackage, so it is impossible (or at least very hard) to make the installation of some of the acme components optional. Also, with enough subpackages, it is practically impossible to avoid dependency conflicts.

With namespace packages, you can store the source tree for each of these subpackages independently as follows:

$ tree acme.sql/
acme.sql/
├── acme
│    └── sql
│       └── __init__.py
└── setup.py

2 directories, 2 files


$ tree acme.templating/
acme.templating/
├── acme
│   └── templating
│       └── __init__.py
└── setup.py

2 directories, 2 files

And you can also register them independently in PyPI or any package index you use. Users can choose which of the subpackages they want to install from the acme namespace as follows, but they never install the general acme package (it doesn’t even have to exist):

$ pip install acme.sql acme.templating
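
After both distributions are installed, the subpackages can be imported from the shared namespace as if they were parts of a single package:

>>> import acme.sql
>>> import acme.templating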

Note that independent source trees are not enough to create namespace packages in Python. You need a bit of additional work if you don't want your packages to overwrite each other. Also, proper handling may differ depending on the Python language version you target. Details of that are described in the next two sections.

2.1. Implicit namespace packages

If you use and target only Python 3, then there is good news for you. PEP 420 (Implicit Namespace Packages) introduced a new way to define namespace packages. It is part of the standards track and became an official part of the language in version 3.3. In short, every directory that contains Python packages or modules (including other namespace packages) is treated as a namespace package if it does not contain an __init__.py file. So, consider again the file structures presented in the previous section:

$ tree acme.sql/
acme.sql/
├── acme
│    └── sql
│       └── __init__.py
└── setup.py

2 directories, 2 files


$ tree acme.templating/
acme.templating/
├── acme
│   └── templating
│       └── __init__.py
└── setup.py

2 directories, 2 files

These layouts are enough to define acme as a namespace package under Python 3.3 and later. A minimal setup.py for the acme.templating package will look like the following:

from setuptools import setup
setup(
    name='acme.templating',
    packages=['acme.templating'],
)

Unfortunately, the setuptools.find_packages() function does not support PEP 420 at the time of writing this section. This may change in the future. Also, the requirement to explicitly define the list of packages seems a very small price to pay for the easy integration of namespace packages.

2.2. Namespace packages in previous Python versions

You can’t use implicit namespace packages (the PEP 420 layout) in Python versions older than 3.3. Still, the concept of namespace packages is very old and was commonly used for years in mature projects such as Zope, so it is definitely possible to use namespace packages in older versions of Python. Actually, there are several ways to define that a package should be treated as a namespace.

The simplest one is to create a file structure for each component that resembles an ordinary package layout without implicit namespace packages and leave everything to setuptools.

So, the example layout for acme.sql and acme.templating could be the following:

$ tree acme.sql/
acme.sql/
├── acme
│    ├── __init__.py
│    └── sql
│       └── __init__.py
└── setup.py

2 directories, 3 files


$ tree acme.templating/
acme.templating/
├── acme
│    ├── __init__.py
│    └── templating
│       └── __init__.py
└── setup.py

2 directories, 3 files

Note that for both acme.sql and acme.templating, there is an additional source file, acme/__init__.py. This file must be left empty. The acme namespace package will be created if we provide its name as a value of the namespace_packages keyword argument of the setuptools.setup() function as follows:

from setuptools import setup


setup(
    name='acme.templating',
    packages=['acme.templating'],
    namespace_packages=['acme'],
)

Easiest does not mean best. In order to register a new namespace, setuptools will call the pkg_resources.declare_namespace() function in your __init__.py file. It will happen even if the __init__.py file is empty. Anyway, as the official documentation says, it is your own responsibility to declare namespaces in the __init__.py file, and this implicit behavior of setuptools may be dropped in the future. In order to be safe and future-proof, you need to add the following line to the acme/__init__.py file:

__import__('pkg_resources').declare_namespace(__name__)

This line will make your namespace package safe from potential future changes regarding namespace packages in the setuptools module.

3. Uploading a package

Packages would be useless without an organized way to store, upload, and download them. The Python Package Index is the main source of open source packages in the Python community. Anyone can freely upload new packages, and the only requirement is to register on the PyPI site: https://pypi.org.

You are not, of course, limited to this index only, and all Python packaging tools support the usage of alternative package repositories. This is especially useful for distributing closed source code within an organization or for deployment purposes. Details of such packaging usage, with instructions on how to create your own package index, will be explained in the next chapter. Here we focus mainly on open source uploads to PyPI, with only a brief mention of how to specify alternative repositories.

3.1. PyPI: Python Package Index

Python Package Index is, as already mentioned, the official source of open source package distributions. Downloading from it does not require any account or permission. The only thing you need is a package manager that can download new distributions from PyPI. Your preferred choice should be pip.

3.1.1. Uploading to PyPI

Anyone can register and upload packages to PyPI provided that they have a registered account. Packages are bound to the user, so, by default, only the user that registered the name of the package is its admin and can upload new distributions. This could be a problem for bigger projects, so there is an option to mark other users as package maintainers so that they are able to upload new distributions too.

The easiest way to upload a package is to use the following upload command of the setup.py script:

$ python setup.py <dist-commands> upload

Here, <dist-commands> is a list of commands that create the distributions to upload. Only distributions created during the same setup.py execution will be uploaded to the repository. So, if you want to upload a source distribution, a built distribution, and a wheel at once, you need to issue the following command:

$ python setup.py sdist bdist bdist_wheel upload

When uploading using setup.py, you cannot reuse distributions that were already built in previous command calls and are forced to rebuild them on every upload. This may be inconvenient for large or complex projects where creation of the actual distribution may take a considerable amount of time. Another problem of setup.py upload is that it can use plain text HTTP or unverified HTTPS connections on some Python versions. This is why Twine is recommended as a secure replacement for the setup.py upload command.

Twine is the utility for interacting with PyPI that currently serves only one purpose: securely uploading packages to the repository. It supports any packaging format and always ensures that the connection is secure. It also allows you to upload files that were already created, so you are able to test distributions before release. The following example usage of twine still requires invoking the setup.py script for building distributions:

$ python setup.py sdist bdist_wheel
$ twine upload dist/*
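
Newer versions of twine also provide a check command that verifies, before anything is sent over the network, whether the long description of already built distributions will render properly on PyPI:

$ twine check dist/*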

3.1.2. .pypirc

.pypirc is a configuration file that stores information about Python package repositories. It should be located in your home directory. The format for this file is as follows:

[distutils]
index-servers =
    pypi
    other

[pypi]
repository: <repository-url>
username: <username>
password: <password>

[other]
repository: https://example.com/pypi
username: <username>
password: <password>

The distutils section should have the index-servers variable that lists all sections describing all the available repositories and credentials for them. There are only the following three variables that can be modified for each repository section:

  • repository: This is the URL of the package repository (it defaults to https://pypi.org/).

  • username: This is the username for authentication in the given repository.

  • password: This is the user password for authentication in the given repository (in plain text).

Note that storing your repository password in plain text may not be the wisest security choice. You can always leave it blank and you should be prompted for it whenever it is necessary.

The .pypirc file should be respected by every packaging tool built for Python. While this may not be true for every packaging-related utility out there, it is supported by the most important ones, such as pip, twine, distutils, and setuptools.

3.2. Source packages versus built packages

There are generally the following two types of distributions for Python packages:

  • Source distributions

  • Built (binary) distributions

Source distributions are the simplest and most platform independent. For pure Python packages, it is a no-brainer. Such a distribution contains only Python sources and these should already be highly portable.

A more complex situation occurs when your package introduces some extensions written, for example, in C. Source distributions will still work, provided that the package user has a proper development toolchain in their environment. This consists mostly of a compiler and the proper C header files. For such cases, the built distribution format may be better suited, because it can provide already built extensions for specific platforms.

3.2.1. sdist

The sdist command is the simplest command available. It creates a release tree where everything needed to run the package is copied to, and then archives this tree in one or more archive files (often, it just creates one tarball). The archive is basically a copy of the source tree.

This command is the easiest way to distribute a package that is independent of the target system. It creates a dist/ directory for storing the archives to be distributed. Before you create the first distribution, you have to provide the setup() call with a version number, as follows (if you don't, the setuptools module will assume the default value of version = '0.0.0'):

from setuptools import setup
setup(name='acme.sql', version='0.1.1')

Every time a package is released, the version number should be increased so that the target system knows the package has changed.

Let’s run the sdist command for the acme.sql package in version 0.1.1:

$ python setup.py sdist
running sdist
...
creating dist
tar -cf dist/acme.sql-0.1.1.tar acme.sql-0.1.1
gzip -f9 dist/acme.sql-0.1.1.tar
removing 'acme.sql-0.1.1' (and everything under it)

$ ls dist/
acme.sql-0.1.1.tar.gz

Note

On Windows, the default archive type will be ZIP.

The version is used to mark the name of the archive, which can be distributed and installed on any system that has Python. In the sdist distribution, if the package contains C libraries or extensions, the target system is responsible for compiling them. This is very common for Linux-based systems or macOS because they commonly provide a compiler, but it is less usual under Windows. That's why a package should always be distributed with a prebuilt distribution as well when it is intended to run on several platforms.

3.2.2. bdist and wheels

To be able to distribute a prebuilt distribution, distutils provides the build command. This command compiles the package in the following four steps:

  • build_py: This builds pure Python modules by byte-compiling them and copying them into the build folder.

  • build_clib: This builds C libraries, when the package contains any, using the C compiler and creating a static library in the build folder.

  • build_ext: This builds C extensions and puts the result in the build folder like build_clib.

  • build_scripts: This builds the modules that are marked as scripts. It also changes the interpreter path when the first line was set (using the #! prefix) and fixes the file mode so that it is executable.

Each of these steps is a command that can be called independently. The result of the compilation process is a build folder that contains everything needed for the package to be installed. There’s no cross-compiler option yet in the distutils package. This means that the result of the command is always specific to the system it was built on.
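
For instance, to byte-compile and copy only the pure Python modules, without building any C code, you can invoke the first of these steps directly:

$ python setup.py build_py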

When some C extensions have to be created, the build process uses the system compiler and the Python header file (Python.h). This include file is available from the time Python was built from sources; for a packaged Python, an extra package for your system distribution is probably required. At least in popular Linux distributions, it is often named python-dev. It contains all the necessary header files for building Python extensions.

The C compiler used in the build process is the compiler that is default for your operating system. For a Linux-based system or macOS, this would be gcc or clang respectively. For Windows, Microsoft Visual C++ can be used (there’s a free command-line version available). The open source project MinGW can be used as well. This can be configured in distutils.

The build command is used by the bdist command to build a binary distribution. It invokes build and all the dependent commands, and then creates an archive in the same way as sdist does.

Let’s create a binary distribution for acme.sql on macOS as follows:

$ python setup.py bdist
running bdist
running bdist_dumb
running build
...
running install_scripts
tar -cf dist/acme.sql-0.1.1.macosx-10.3-fat.tar .
gzip -f9 acme.sql-0.1.1.macosx-10.3-fat.tar
removing 'build/bdist.macosx-10.3-fat/dumb' (and everything under it)

$ ls dist/
acme.sql-0.1.1.macosx-10.3-fat.tar.gz
acme.sql-0.1.1.tar.gz

Notice that the newly created archive’s name contains the name of the system and the distribution it was built on (macOS 10.3).

The same command invoked on Windows will create another system-specific distribution archive, as follows:

C:\acme.sql> python.exe setup.py bdist
...

C:\acme.sql> dir dist
25/02/2008      08:18       <DIR>       .
25/02/2008      08:18       <DIR>       ..
25/02/2008      08:24                   16 055 acme.sql-0.1.1.win32.zip
                    1 File(s)   16 055 bytes
                    2 Dir(s)    22 239 752 192 bytes free

If a package contains C code, apart from a source distribution, it’s important to release as many different binary distributions as possible. At the very least, a Windows binary distribution is important for those who most probably don’t have a C compiler installed.

A binary release contains a tree that can be copied directly into the Python tree. It mainly contains a folder that is copied into Python’s site-packages folder. It may also contain cached bytecode files (*.pyc files on Python 2 and __pycache__/*.pyc on Python 3).

The other kind of built distributions are wheels, provided by the wheel package. When installed (for example, using pip), the wheel package adds a new bdist_wheel command to distutils. It allows the creation of platform-specific distributions (currently only for Windows, macOS, and Linux) that are better alternatives to normal bdist distributions. It was designed to replace another distribution format introduced earlier by setuptools, called eggs. Eggs are now obsolete, so they won't be covered here. The list of advantages of using wheels is quite long. Here are the ones that are mentioned on the Python Wheels page (http://pythonwheels.com/):

  • Faster installation for pure Python and native C extension packages

  • Avoids arbitrary code execution during installation (no setup.py is executed)

  • Installation of a C extension does not require a compiler on Windows, macOS, or Linux.

  • Allows better caching for testing and continuous integration.

  • Creates .pyc files as part of the installation to ensure they match the Python interpreter used

  • More consistent installs across platforms and machines

According to PyPA’s recommendation, wheels should be your default distribution format. For a very long time, binary wheels for Linux were not supported, but that has fortunately changed. Binary wheels for Linux are called manylinux wheels. The process of building them is unfortunately not as straightforward as for Windows and macOS binary wheels. For this kind of wheel, PyPA maintains special Docker images that serve as ready-to-use build environments. For the sources of these images and more information, you can visit the official repository on GitHub: https://github.com/pypa/manylinux.

4. Standalone executables

Creating standalone executables is a commonly overlooked topic in materials that cover packaging of Python code. This is mainly because Python lacks proper tools in its standard library that could allow programmers to create simple executables that could be run by users without the need to install the Python interpreter.

Compiled languages have a big advantage over Python in that they allow you to create an executable application for the given system architecture that can be run by users without any knowledge of the underlying technology. Python code, when distributed as a package, requires the Python interpreter in order to be run. This creates a big inconvenience for users who do not have enough technical proficiency.

Developer-friendly operating systems, such as macOS or most Linux distributions, come with the Python interpreter preinstalled. So, for their users, a Python-based application can still be distributed as a source package that relies on a specific interpreter directive in the main script file, popularly called a shebang. For most Python applications, this takes the following form:

#!/usr/bin/env python

Such a directive, when used as the first line of a script, marks it to be interpreted with the default Python version for the given environment. It can, of course, take a more detailed form that requires a specific Python version, such as python3.4, python3, python2, and so on. Note that this will work in most popular POSIX systems, but isn’t portable at all. This solution relies on the existence of specific Python versions and also on the availability of an env executable exactly at /usr/bin/env. Both of these assumptions may fail on some operating systems. Also, shebangs do not work on Windows at all. Additionally, bootstrapping of the Python environment on Windows can be a challenge even for experienced developers, so you cannot expect that nontechnical users will be able to do that by themselves.

The other thing to consider is the simple user experience in the desktop environment. Users usually expect that applications can be run from the desktop by simply clicking on them. Not every desktop environment will support that with Python applications distributed as a source.

So it would be best if we were able to create a binary distribution that works like any other compiled executable. Fortunately, it is possible to create an executable that has both the Python interpreter and our project embedded. This allows users to open our application without caring about Python or any other dependency.

4.1. When are standalone executables useful?

Standalone executables are useful in situations where the simplicity of the user experience is more important than the user's ability to interfere with the application's code. Note that distributing applications as executables only makes code reading or modification harder, not impossible. It is not a way to secure application code and should only be used as a way to make interacting with the application simpler.

Standalone executables should be the preferred way of distributing applications for nontechnical end users, and they also seem to be the only reasonable way of distributing Python applications for Windows.

Standalone executables are usually a good choice for the following:

  • Applications that depend on specific Python versions that may not be easily available on the target operating systems

  • Applications that rely on modified precompiled CPython sources

  • Applications with graphical interfaces

  • Projects that have many binary extensions written in different languages

  • Games