Packaging

This chapter focuses on a repeatable process of writing and releasing Python packages. We will see how to shorten the time needed to set up everything before starting the real work. We will also learn how to provide a standardized way to write packages and ease the use of a test-driven development approach. We will finally learn how to facilitate the release process.

It is organized into the following four parts:

  • A common pattern for all packages that describes the similarities between all Python packages, and how distutils and setuptools play a central role in the packaging process.

  • What namespace packages are and why they can be useful.

  • How to register and upload packages in the Python Package Index (PyPI) with emphasis on security and common pitfalls.

  • The standalone executables as an alternative way to package and distribute Python applications.

1. Creating a package

Python packaging can be a bit overwhelming at first. The main reason for that is the confusion about the proper tools for creating Python packages. Anyway, once you create your first package, you will see that it is not as hard as it looks. Also, knowing the proper, state-of-the-art packaging tools helps a lot.

You should know how to create packages even if you are not interested in distributing your code as open source. Knowing how to make your own packages will give you more insight into the packaging ecosystem and will help you to work with the third-party code available on PyPI that you are probably already using.

Also, having your closed source project or its components available as source distribution packages can help you to deploy your code in different environments. The advantages of leveraging the Python packaging ecosystem in the code deployment process will be described in more detail in the next chapter. Here we will focus on proper tools and techniques to create such distributions.

1.1. The confusing state of Python packaging tools

The state of Python packaging was very confusing for a long time and it took many years to bring organization to this topic. Everything started with the distutils package introduced in 1998, which was later enhanced by setuptools in 2003. These two projects started a long and knotted story of forks, alternative projects, and complete rewrites that tried to (once and for all) fix the Python packaging ecosystem. Unfortunately, most of these attempts never succeeded. The effect was quite the opposite. Each new project that aimed to supersede setuptools or distutils only added to the already huge confusion around packaging tools. Some of such forks were merged back to their ancestors (such as distribute which was a fork of setuptools) but some were left abandoned (such as distutils2).

Fortunately, this state is gradually changing. An organization called the Python Packaging Authority (PyPA) was formed to bring back the order and organization to the packaging ecosystem. The Python Packaging User Guide (https://packaging.python.org), maintained by PyPA, is the authoritative source of information about the latest packaging tools and best practices. Treat that site as the best source of information about packaging and complementary reading for this chapter. This guide also contains a detailed history of changes and new projects related to packaging. So it is worth reading it, even if you already know a bit about packaging, to make sure you still use the proper tools.

Stay away from other popular internet resources, such as “The Hitchhiker’s Guide to Packaging”. It is old, not maintained, and mostly obsolete. It may be interesting only for historical reasons, and the Python Packaging User Guide is in fact a fork of this old resource.

1.1.1. The current landscape of Python packaging thanks to PyPA

PyPA, besides providing an authoritative guide for packaging, also maintains packaging projects and a standardization process for new official aspects of Python packaging. All of PyPA’s projects can be found under a single organization on GitHub: https://github.com/pypa

Some of them were already mentioned. The following are the most notable:

  • pip

  • virtualenv

  • twine

  • warehouse

Note that most of them were started outside of this organization and were moved under PyPA patronage once they became mature and widespread solutions.

Thanks to PyPA engagement, the progressive abandonment of the egg format in favor of wheels for built distributions has already happened. Also, thanks to the commitment of the PyPA community, the old PyPI implementation was finally completely rewritten in the form of the Warehouse project. Now, PyPI has a modernized user interface and many long-awaited usability improvements and features.

1.1.2. Tool recommendations

The Python Packaging User Guide gives a few suggestions on recommended tools for working with packages. They can be generally divided into the following two groups:

  • Tools for installing packages

  • Tools for package creation and distribution

The recommended utilities from the first group are as follows:

  • Use pip for installing packages from PyPI.

  • Use virtualenv or venv for application-level isolation of the Python runtime environment.

The Python Packaging User Guide recommendations of tools for package creation and distribution are as follows:

  • Use setuptools to define projects and create source distributions.

  • Use wheels in favor of eggs to create built distributions.

  • Use twine to upload package distributions to PyPI.

1.2. Project configuration

It should be obvious that the easiest way to organize the code of big applications is to split them into several packages. This makes the code simpler, easier to understand, maintain, and change. It also maximizes the reusability of your code. Separate packages act as components that can be used in various programs.

1.2.1. setup.py

The root directory of a package that is to be distributed contains a setup.py script. It defines all the metadata described in the distutils module. Package metadata is expressed as arguments in a call to the standard setup() function. Despite distutils being the standard library module provided for the purpose of code packaging, it is actually recommended to use setuptools instead. The setuptools package provides several enhancements over the standard distutils module.

Therefore, the minimum content for this file is as follows:

from setuptools import setup

setup(
    name='mypackage'
)

name gives the full name of the package. From there, the script provides several commands that can be listed with the --help-commands option, as shown in the following code:

$ python3 setup.py --help-commands
Standard commands:
    build           build everything needed to install
    clean           clean up temporary files from 'build' command
    install         install everything from build directory
    sdist           create a source distribution (tarball, zip file, etc.)
    register        register the distribution with the Python package index
    bdist           create a built (binary) distribution
    check           perform some checks on the package
    upload          upload binary package to PyPI

Extra commands:
    bdist_wheel     create a wheel distribution
    alias           define a shortcut to invoke one or more commands
    develop         install package in 'development mode'

usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: setup.py --help [cmd1 cmd2 ...]
or: setup.py --help-commands
or: setup.py cmd --help

The actual list of commands is longer and can vary depending on the available setuptools extensions. It was truncated to show only those that are most important and relevant to this chapter. Standard commands are the built-in commands provided by distutils, whereas extra commands are the ones provided by third-party packages, such as setuptools or any other package that defines and registers a new command. Here, one such extra command registered by another package is bdist_wheel, provided by the wheel package.

1.2.2. setup.cfg

The setup.cfg file contains default options for the commands of the setup.py script. This is very useful if the process for building and distributing the package is more complex and requires many optional arguments to be passed to the setup.py commands. The setup.cfg file allows you to store such default parameters together with your source code on a per-project basis. This makes your distribution flow independent of any user-specific or environment-specific configuration, and also provides transparency about how your package was built and distributed to users and other team members.

The syntax for the setup.cfg file is the same as provided by the built-in configparser module so it is similar to the popular Microsoft Windows INI files. Here is an example of the setup.cfg configuration file that provides some global, sdist, and bdist_wheel commands’ defaults:

[global]
quiet=1

[sdist]
formats=zip,tar

[bdist_wheel]
universal=1

This example configuration will ensure that source distributions (the sdist section) are always created in two formats (ZIP and TAR), and that built wheel distributions (the bdist_wheel section) are created as universal wheels, independent of the Python version. Also, most of the output will be suppressed on every command by the global --quiet switch. Note that the quiet option is included here only for demonstration purposes; it may not be a reasonable choice to suppress the output for every command by default.
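
With this setup.cfg in place, the stored defaults are picked up automatically, so the two following invocations become equivalent:

$ python3 setup.py sdist
$ python3 setup.py --quiet sdist --formats=zip,tar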

1.2.3. MANIFEST.in

When building a distribution with the sdist command, the distutils module browses the package directory looking for files to include in the archive. By default distutils will include the following:

  • All Python source files implied by the py_modules, packages, and scripts arguments

  • All C source files listed in the ext_modules argument

  • Files that match the glob pattern test/test*.py

  • Files named README, README.txt, setup.py, and setup.cfg

Besides that, if your package is versioned with a version control system such as Subversion, Mercurial, or Git, there is the possibility to auto-include all version controlled files using additional setuptools extensions such as setuptools-svn, setuptools-hg, and setuptools-git. Integration with other version control systems is also possible through other custom extensions. No matter if it is the default built-in collection strategy or one defined by custom extension, the sdist will create a MANIFEST file that lists all files and will include them in the final archive.

Let’s say you are not using any extra extensions, and you need to include in your package distribution some files that are not captured by default. You can define a template called MANIFEST.in in your package root directory (the same directory as setup.py file). This template directs the sdist command on which files to include.

This MANIFEST.in template defines one inclusion or exclusion rule per line:

include HISTORY.txt
include README.txt
include CHANGES.txt
include CONTRIBUTORS.txt
include LICENSE
recursive-include docs *.txt *.py

The full list of the MANIFEST.in commands can be found in the official distutils documentation.
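
Exclusion rules use the same one-rule-per-line syntax. For instance, the following (purely illustrative) rules would strip byte-compiled files from the whole tree and drop a generated documentation build directory from the source distribution:

global-exclude *.py[cod]
prune docs/_build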

1.2.4. Most important metadata

Besides the name and the version of the package being distributed, the most important arguments that the setup() function can receive are as follows (a combined example is shown after this list):

  • description: This includes a few sentences to describe the package.

  • long_description: This includes a full description that can be in reStructuredText (default) or other supported markup languages.

  • long_description_content_type: This defines the MIME type of the long description; it tells the package repository what kind of markup language is used for the package description.

  • keywords: This is a list of keywords that define the package and allow for better indexing in the package repository.

  • author: This is the name of the package author or organization that takes care of it.

  • author_email: This is the contact email address.

  • url: This is the URL of the project.

  • license: This is the name of the license (GPL, LGPL, and so on) under which the package is distributed.

  • packages: This is a list of all package names in the package distribution; setuptools provides a small function called find_packages that can automatically find package names to include.

  • namespace_packages: This is a list of namespace packages within package distribution.
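
Taken together, a setup() call that uses most of these arguments could look like the following sketch (all names and values here are purely illustrative):

from setuptools import setup, find_packages

setup(
    name='mypackage',
    version='0.0.1',
    description='mypackage project short description',
    long_description=open('README.rst').read(),
    long_description_content_type='text/x-rst',
    keywords=['packaging', 'example'],
    author='Jane Doe',
    author_email='jane.doe@example.com',
    url='https://github.com/example/mypackage',
    license='BSD',
    packages=find_packages(),
)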

1.2.5. Trove classifiers

PyPI and distutils provide a solution for categorizing applications with the set of classifiers called trove classifiers. All trove classifiers form a tree-like structure. Each classifier string defines a list of nested namespaces where every namespace is separated by the :: substring. The list of classifiers is provided to the package definition as the classifiers argument of the setup() function.

Here is an example list of classifiers taken from the solrq project available on PyPI:

from setuptools import setup


setup(
    name="solrq",
    # (...)
    classifiers=[
        'Development Status :: 4 - Beta',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: BSD License',
        'Operating System :: OS Independent',
        'Programming Language :: Python',
        'Programming Language :: Python :: 2',
        'Programming Language :: Python :: 2.6',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.2',
        'Programming Language :: Python :: 3.3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: Implementation :: PyPy',
        'Topic :: Internet :: WWW/HTTP :: Indexing/Search',
    ]
)

Trove classifiers are completely optional in the package definition but provide a useful extension to the basic metadata available in the setup() interface. Among others, trove classifiers may provide information about supported Python versions, supported operating systems, the development stage of the project, or the license under which the code is released. Many PyPI users search and browse the available packages by categories so a proper classification helps packages to reach their target.

Trove classifiers serve an important role in the whole packaging ecosystem and should never be ignored. No organization verifies package classification, so it is your responsibility to provide proper classifiers for your packages and not to introduce chaos into the whole package index.

At the time of writing this section, there are 667 classifiers available on PyPI that are grouped into the following nine major categories:

  • Development status

  • Environment

  • Framework

  • Intended audience

  • License

  • Natural language

  • Operating system

  • Programming language

  • Topic

This list is ever-growing, and new classifiers are added from time to time. It is thus possible that the total count of them will be different at the time you read this. The full list of currently available trove classifiers is available at https://pypi.org/classifiers.

1.2.6. Common patterns

Creating a package for distribution can be a tedious task for inexperienced developers. Most of the metadata that setuptools or distutils accept in their setup() function call can be provided manually, ignoring the fact that this metadata may also be available in other parts of the project. Here is an example:

from setuptools import setup


setup(
    name="myproject",
    version="0.0.1",
    description="mypackage project short description",
    long_description="""
        Longer description of mypackage project
        possibly with some documentation and/or
        usage examples
    """,
    install_requires=[
        'dependency1',
        'dependency2',
        'etc'
    ]
)

Some of the metadata elements are often found in different places in a typical Python project. For instance, the content of the long description is commonly included in the project's README file, and it is a good convention to put a version specifier in the __init__ module of the package. Hardcoding such package metadata as setup() function arguments adds redundancy to the project and allows for easy mistakes and inconsistencies in the future. Neither setuptools nor distutils can automatically pick metadata information from the project sources, so you need to provide it yourself. There are some common patterns in the Python community for solving the most popular problems, such as dependency management and version/readme inclusion. It is worth knowing at least a few of them because they are so popular that they could be considered packaging idioms.

1.2.6.1. Automated inclusion of version string from package

The PEP 440 “Version Identification and Dependency Specification” document specifies a standard for version and dependency specification. It is a long document that covers accepted version specification schemes and defines how version matching and comparison in Python packaging tools should work. If you are using, or plan to use, a complex project version numbering scheme, then you should definitely read this document carefully. If you are using a simple scheme that consists of just one, two, three, or more numbers separated by dots, then you don't have to dig into the details of PEP 440. If you don't know how to choose the proper versioning scheme, I highly recommend following the semantic versioning scheme.
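
For illustration, all of the following version identifiers are valid under PEP 440, from plain release numbers to development, pre-release, and post-release variants:

1.0
2.1.3
1.0.dev1
1.0a1
1.0rc1
1.0.post1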

The other problem related to code versioning is where to include the version specifier for a package or module. PEP 396 (Module Version Numbers) deals exactly with this problem. PEP 396 is only an informational document and has a deferred status, so it is not a part of the official Python standards track. Anyway, it describes what seems to be a de facto standard now. According to PEP 396, if a package or module has a specific version defined, the version specifier should be included as a __version__ attribute of the package's root __init__.py file or of the distributed module file. Another de facto standard is to also include the VERSION attribute that contains a tuple of the version specifier parts. This helps users to write compatibility code, because such version tuples can be easily compared if the versioning scheme is simple enough.

Many packages available on PyPI follow both of these conventions. Their __init__.py files contain version attributes that look like the following:

VERSION = (0, 1, 1)
__version__ = ".".join([str(x) for x in VERSION])

The other suggestion of PEP 396 is that the version argument provided in the setup() function of the setup.py script should be derived from __version__, or the other way around. The Python Packaging User Guide features multiple patterns for single-sourcing the project version, and each of them has its own advantages and limitations. My personal favorite is rather long and is not included in the PyPA's guide, but it has the advantage of limiting the complexity to the setup.py script only. This boilerplate assumes that the version specifier is provided by the VERSION attribute of the package's __init__ module and extracts this data for inclusion in the setup() call. Here is an excerpt from some imaginary package's setup.py script that illustrates this approach:

from setuptools import setup
import os

def get_version(version_tuple):
    # construct a version string from the VERSION tuple; a non-integer
    # last element (such as 'dev' or 'b1') is appended without a separator
    if not isinstance(version_tuple[-1], int):
        return '.'.join(map(str, version_tuple[:-1])) + version_tuple[-1]
    return '.'.join(map(str, version_tuple))

# find the line in the package's __init__.py that defines VERSION and
# evaluate only that tuple literal instead of importing the whole package
init = os.path.join(os.path.dirname(__file__), 'src', 'some_package', '__init__.py')
version_line = list(filter(lambda l: l.startswith('VERSION'), open(init)))[0]
PKG_VERSION = get_version(eval(version_line.split('=')[-1]))

setup(
    name='some-package',
    version=PKG_VERSION,
    # ...
)

1.2.6.2. README file

The Python Package Index can display the project’s README file or the value of long_description on the package page in the PyPI portal. PyPI is able to interpret the markup used in the long_description content and render it as HTML on the package page. The type of markup language is controlled through the long_description_content_type argument of the setup() call. For now, there are the following three choices for markup available:

  • Plain text with long_description_content_type='text/plain'

  • reStructuredText with long_description_content_type='text/x-rst'

  • Markdown with long_description_content_type='text/markdown'

Markdown and reStructuredText are the most popular choices among Python developers, but some might still want to use different markup languages for various reasons. If you want to use something different as the markup language for your project's README, you can still provide it as a project description on the PyPI page in a readable form. The trick lies in using the pypandoc package to translate your markup language into reStructuredText (or Markdown) while uploading the package to the Python Package Index. It is important to do it with a fallback to the plain content of your README file, so the installation won't fail if the user doesn't have pypandoc installed. The following is an example of a setup.py script that is able to read the content of a README file written in the AsciiDoc markup language and translate it to reStructuredText before including it as the long_description argument:

from setuptools import setup
import os


try:
    from pypandoc import convert

    def read_md(file_path):
        # convert the AsciiDoc README to reStructuredText
        return convert(file_path, to='rst', format='asciidoc')

except ImportError:
    convert = None
    print("warning: pypandoc module not found, could not convert Asciidoc to RST")

    def read_md(file_path):
        # fallback: return the raw README content unchanged
        with open(file_path, 'r') as f:
            return f.read()


README = os.path.join(os.path.dirname(__file__), 'README')

setup(
    name='some-package',
    long_description=read_md(README),
    long_description_content_type='text/x-rst',
    # ...
)

1.2.6.3. Managing dependencies

Many projects require some external packages to be installed in order to work properly. When the list of dependencies is very long, the question arises of how to manage it. The answer in most cases is very simple. Do not over-engineer it. Keep it simple and provide the list of dependencies explicitly in your setup.py script as follows:

from setuptools import setup


setup(
    name='some-package',
    install_requires=['falcon', 'requests', 'delorean']
    # ...
)

Some Python developers like to use requirements.txt files to track the lists of dependencies for their packages. In some situations, you might find some reason for doing that, but in most cases, this is a relic of times when the code of the project was not properly packaged. Anyway, even such notable projects as Celery still stick to this convention. So if you are not willing to change your habits, or you are somehow forced to use requirements files, then at least do it properly. Here is one of the popular idioms for reading the list of dependencies from the requirements.txt file:

from setuptools import setup
import os


def strip_comments(line):
    # drop everything after a '#' comment marker
    return line.split('#', 1)[0].strip()


def reqs(*parts):
    # read the requirements file and return its non-empty,
    # comment-stripped lines as a list of requirement strings
    with open(os.path.join(os.getcwd(), *parts)) as f:
        return list(filter(None, [strip_comments(line) for line in f]))


setup(
    name='some-package',
    install_requires=reqs('requirements.txt'),
    # ...
)
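
Assuming a (hypothetical) requirements.txt file with the following content, the preceding reqs() helper would return ['falcon', 'requests>=2.0', 'delorean']:

# HTTP APIs
falcon
requests>=2.0  # version specifiers are preserved
delorean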

1.2.7. The custom setup command

distutils allows you to create new commands. A new command can be registered with an entry point, which was introduced by setuptools as a simple way to define packages as plugins.

An entry point is a named link to a class or a function that is made available through some APIs in setuptools. Any application can scan for all registered packages and use the linked code as a plugin.

To link the new command, the entry_points metadata can be used in the setup call as follows:

setup(
    name="my.command",
    entry_points="""
        [distutils.commands]
        my_command = my.command.module:Class
    """
)

All named links are gathered in named sections. When distutils is loaded, it scans for links that were registered under distutils.commands.

This mechanism is used by numerous Python applications that provide extensibility.
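
To give a more concrete picture, here is a minimal sketch of what a command class referenced by such an entry point could look like (the class and its behavior are hypothetical; every distutils command must implement the initialize_options(), finalize_options(), and run() methods):

from setuptools import Command


class MyCommand(Command):
    description = 'example command that only prints a message'
    user_options = []  # this command accepts no command-line options

    def initialize_options(self):
        # set the default values for all supported options
        pass

    def finalize_options(self):
        # validate and post-process the option values
        pass

    def run(self):
        # the actual command logic goes here
        print('Hello from my_command!')

Once the package providing such an entry point is installed, the command becomes available as python setup.py my_command in any project.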

1.3. Working with packages during development

Working with setuptools is mostly about building and distributing packages. However, you still need setuptools to install packages directly from project sources. And the reason for that is simple: it is a good habit to test whether your packaging code works properly before submitting the package to PyPI. And the simplest way to test it is by installing it. If you send a broken package to the repository, then, in order to re-upload it, you need to increase the version number.

Testing if your code is packaged properly before the final distribution saves you from unnecessary version number inflation and obviously from wasting your time. Also, installation directly from your own sources using setuptools may be essential when working on multiple related packages at the same time.

1.3.1. setup.py install

The install command installs the package in your current Python environment. It will try to build the package if no previous build was made and then inject the result into the filesystem directory where Python is looking for installed packages. If you have an archive with a source distribution of some package, you can decompress it in a temporary folder and then install it with this command. The install command will also install dependencies that are defined in the install_requires argument. Dependencies will be installed from the Python Package Index.

An alternative to the bare setup.py script when installing a package is to use pip. Since it is a tool that is recommended by PyPA, you should use it even when installing a package in your local environment just for development purposes. In order to install a package from local sources, run the following command:

pip install <project-path>

1.3.2. Uninstalling packages

Amazingly, setuptools and distutils lack the uninstall command. Fortunately, it is possible to uninstall any Python package using pip as follows:

pip uninstall <package-name>

Uninstalling can be a dangerous operation when attempted on system-wide packages. This is another reason why it is so important to use virtual environments for any development.

1.3.3. setup.py develop or pip install -e

Packages installed with setup.py install are copied to the site-packages directory of your current Python environment. This means that whenever you make a change to the sources of that package, you are required to reinstall it. This is often a problem during intensive development, because it is very easy to forget about the need to perform the installation again. This is why setuptools provides an extra develop command that allows you to install packages in development mode. This command creates a special link to the project sources in the deployment directory (site-packages) instead of copying the whole package there. Package sources can then be edited without the need for reinstallation and are available on sys.path as if they were installed normally.

pip also allows you to install packages in such a mode. This installation option is called editable mode and can be enabled with the -e parameter in the install command as follows:

pip install -e <project-path>

Once you install the package in your environment in editable mode, you can freely modify the installed package in place and all the changes will be immediately visible without the need to reinstall the package.

2. Namespace packages

The Zen of Python that you can read after writing import this in the interpreter session says the following about namespaces:

“Namespaces are one honking great idea -- let’s do more of those!”

And this can be understood in at least two ways. The first is a namespace in the context of the language. We all use the following namespaces without even knowing:

  • The global namespace of a module

  • The local namespace of the function or method invocation

  • The class namespace

The other kind of namespaces can be provided at the packaging level. These are namespace packages. This is often an overlooked feature of Python packaging that can be very useful in structuring the package ecosystem in your organization or in a very large project.

Namespace packages can be understood as a way of grouping related packages, where each of these packages can be installed independently.

Namespace packages are especially useful if you have components of your application developed, packaged, and versioned independently but you still want to access them from the same namespace. This also helps to make clear to which organization or project every package belongs. For instance, for some imaginary Acme company, the common namespace could be acme. Therefore this organization could create the general acme namespace package that could serve as a container for other packages from this organization. For example, if someone from Acme wants to contribute to this namespace with, for example, an SQL-related library, they can create a new acme.sql package that registers itself in the acme namespace.

It is important to know the difference between normal packages and namespace packages, and the problem the latter solve. Normally (without namespace packages), you would create a package called acme with an sql subpackage/submodule and the following file structure:

$ tree acme/
acme/
├── acme
│    ├── __init__.py
│    └── sql
│       └── __init__.py
└── setup.py

2 directories, 3 files

Whenever you want to add a new subpackage, let’s say templating, you are forced to include it in the source tree of acme as follows:

$ tree acme/
acme/
├── acme
│   ├── __init__.py
│   ├── sql
│   │    └── __init__.py
│   └── templating
│        └── __init__.py
└── setup.py

3 directories, 4 files

Such an approach makes independent development of acme.sql and acme.templating almost impossible. The setup.py script will also have to specify all the dependencies for every subpackage, so it is impossible (or at least very hard) to make the installation of some of the acme components optional. Also, with enough subpackages, it is practically impossible to avoid dependency conflicts.

With namespace packages, you can store the source tree for each of these subpackages independently as follows:

$ tree acme.sql/
acme.sql/
├── acme
│    └── sql
│       └── __init__.py
└── setup.py

2 directories, 2 files


$ tree acme.templating/
acme.templating/
├── acme
│   └── templating
│       └── __init__.py
└── setup.py

2 directories, 2 files

And you can also register them independently in PyPI or any package index you use. Users can choose which of the subpackages they want to install from the acme namespace as follows, but they never install the general acme package (it doesn’t even have to exist):

$ pip install acme.sql acme.templating
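
After both distributions are installed, the subpackages can be imported from the shared namespace as if they were parts of a single package:

>>> import acme.sql
>>> import acme.templating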

Note that independent source trees are not enough to create namespace packages in Python. You need a bit of additional work if you don't want your packages to overwrite each other. Also, proper handling may differ depending on the Python language version you target. Details of that are described in the next two sections.

2.1. Implicit namespace packages

If you use and target only Python 3, then there is good news for you. PEP 420 (Implicit Namespace Packages) introduced a new way to define namespace packages. It is part of the standards track and became an official part of the language in version 3.3. In short, every directory that contains Python packages or modules (including other namespace packages) is treated as a namespace package if it does not contain an __init__.py file. So, consider again the file structures presented in the previous section:

$ tree acme.sql/
acme.sql/
├── acme
│    └── sql
│       └── __init__.py
└── setup.py

2 directories, 2 files


$ tree acme.templating/
acme.templating/
├── acme
│   └── templating
│       └── __init__.py
└── setup.py

2 directories, 2 files

These layouts are enough to define acme as a namespace package under Python 3.3 and later. A minimal setup.py for the acme.templating package will look like the following:

from setuptools import setup
setup(
    name='acme.templating',
    packages=['acme.templating'],
)

Unfortunately, the setuptools.find_packages() function does not support PEP 420 at the time of writing this section. This may change in the future. Also, the requirement to explicitly define the list of packages seems a very small price to pay for the easy integration of namespace packages.

2.2. Namespace packages in previous Python versions

You can’t use implicit namespace packages (the PEP 420 layout) in Python versions older than 3.3. Still, the concept of namespace packages is very old and was commonly used for years in mature projects such as Zope, so it is definitely possible to use namespace packages in older versions of Python. Actually, there are several ways to define that a package should be treated as a namespace.

The simplest one is to create a file structure for each component that resembles an ordinary package layout without implicit namespace packages and leave everything to setuptools.

So, the example layout for acme.sql and acme.templating could be the following:

$ tree acme.sql/
acme.sql/
├── acme
│    ├── __init__.py
│    └── sql
│       └── __init__.py
└── setup.py

2 directories, 3 files


$ tree acme.templating/
acme.templating/
├── acme
│    ├── __init__.py
│    └── templating
│       └── __init__.py
└── setup.py

2 directories, 3 files

Note that for both acme.sql and acme.templating, there is an additional source file, acme/__init__.py. This file must be left empty. The acme namespace package will be created if we provide its name as a value of the namespace_packages keyword argument of the setuptools.setup() function as follows:

from setuptools import setup


setup(
    name='acme.templating',
    packages=['acme.templating'],
    namespace_packages=['acme'],
)

Easiest does not mean best. In order to register a new namespace, setuptools will call the pkg_resources.declare_namespace() function in your __init__.py file. It will happen even if the __init__.py file is empty. Anyway, as the official documentation says, it is your own responsibility to declare namespaces in the __init__.py file, and this implicit behavior of setuptools may be dropped in the future. In order to be safe and future-proof, you need to add the following line to the acme/__init__.py file:

__import__('pkg_resources').declare_namespace(__name__)

This line will make your namespace package safe from potential future changes regarding namespace packages in the setuptools module.

3. Uploading a package

Packages would be useless without an organized way to store, upload, and download them. The Python Package Index is the main source of open source packages in the Python community. Anyone can freely upload new packages, and the only requirement is to register on the PyPI site: https://pypi.org.

You are not, of course, limited to this index only, and all Python packaging tools support the usage of alternative package repositories. This is especially useful for distributing closed source code within an organization or for deployment purposes. Details of such packaging usage, with instructions on how to create your own package index, will be explained in the next chapter. Here we focus mainly on open source uploads to PyPI, with only a brief mention of how to specify alternative repositories.

3.1. PyPI: Python Package Index

Python Package Index is, as already mentioned, the official source of open source package distributions. Downloading from it does not require any account or permission. The only thing you need is a package manager that can download new distributions from PyPI. Your preferred choice should be pip.

3.1.1. Uploading to PyPI

Anyone can register and upload packages to PyPI provided that they have a registered account. Packages are bound to the user, so, by default, only the user that registered the name of the package is its admin and can upload new distributions. This could be a problem for bigger projects, so there is an option to mark other users as package maintainers so that they are able to upload new distributions too.

The easiest way to upload a package is to use the following upload command of the setup.py script:

$ python setup.py <dist-commands> upload

Here, <dist-commands> is a list of commands that create the distributions to upload. Only distributions created during the same setup.py execution will be uploaded to the repository. So, if you want to upload a source distribution, a built distribution, and a wheel at once, you need to issue the following command:

$ python setup.py sdist bdist bdist_wheel upload

When uploading using setup.py, you cannot reuse distributions that were already built in previous command calls and are forced to rebuild them on every upload. This may be inconvenient for large or complex projects where creation of the actual distribution may take a considerable amount of time. Another problem of setup.py upload is that it can use plain text HTTP or unverified HTTPS connections on some Python versions. This is why Twine is recommended as a secure replacement for the setup.py upload command.

Twine is the utility for interacting with PyPI that currently serves only one purpose: securely uploading packages to the repository. It supports any packaging format and always ensures that the connection is secure. It also allows you to upload files that were already created, so you are able to test distributions before release. The following example usage of twine still requires invoking the setup.py script for building distributions:

$ python setup.py sdist bdist_wheel
$ twine upload dist/*
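
Newer versions of twine also provide a check command that verifies, before anything is sent over the network, whether the long description of already built distributions will render properly on PyPI:

$ twine check dist/*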

3.1.2. .pypirc

.pypirc is a configuration file that stores information about Python package repositories. It should be located in your home directory. The format for this file is as follows:

[distutils]
index-servers =
    pypi
    other

[pypi]
repository: <repository-url>
username: <username>
password: <password>

[other]
repository: https://example.com/pypi
username: <username>
password: <password>

The distutils section should have the index-servers variable that lists all sections describing all the available repositories and credentials for them. There are only the following three variables that can be modified for each repository section:

  • repository: This is the URL of the package repository (it defaults to https://pypi.org/).

  • username: This is the username for authentication in the given repository.

  • password: This is the user password for authentication in the given repository (in plain text).

Note that storing your repository password in plain text may not be the wisest security choice. You can always leave it blank and you should be prompted for it whenever it is necessary.

The .pypirc file should be respected by every packaging tool built for Python. While this may not be true for every packaging-related utility out there, it is supported by the most important ones, such as pip, twine, distutils, and setuptools.

3.2. Source packages versus built packages

There are generally the following two types of distributions for Python packages:

  • Source distributions

  • Built (binary) distributions

Source distributions are the simplest and most platform independent. For pure Python packages, it is a no-brainer. Such a distribution contains only Python sources and these should already be highly portable.

A more complex situation occurs when your package introduces some extensions written, for example, in C. Source distributions will still work, provided that the package user has a proper development toolchain in their environment. This consists mostly of a compiler and the proper C header files. For such cases, the built distribution format may be better suited, because it can provide already built extensions for specific platforms.

3.2.1. sdist

The sdist command is the simplest command available. It creates a release tree where everything needed to run the package is copied to, and then archives this tree in one or more archive files (often, it just creates one tarball). The archive is basically a copy of the source tree.

This command is the easiest way to distribute a package that is independent of the target system. It creates a dist/ directory for storing the archives to be distributed. Before you create the first distribution, you have to provide the setup() call with a version number, as follows (if you don't, the setuptools module will assume the default value of version = '0.0.0'):

from setuptools import setup
setup(name='acme.sql', version='0.1.1')

Every time a package is released, the version number should be increased so that the target system knows the package has changed.

Let’s run the sdist command for the acme.sql package in version 0.1.1:

$ python setup.py sdist
running sdist
...
creating dist
tar -cf dist/acme.sql-0.1.1.tar acme.sql-0.1.1
gzip -f9 dist/acme.sql-0.1.1.tar
removing 'acme.sql-0.1.1' (and everything under it)

$ ls dist/
acme.sql-0.1.1.tar.gz

Note

On Windows, the default archive type will be ZIP.

The version is used to mark the name of the archive, which can be distributed and installed on any system that has Python. In the sdist distribution, if the package contains C libraries or extensions, the target system is responsible for compiling them. This is very common for Linux-based systems or macOS because they commonly provide a compiler, but it is less usual under Windows. That's why a package should always be distributed with a prebuilt distribution as well when it is intended to run on several platforms.

3.2.2. bdist and wheels

To be able to distribute a prebuilt distribution, distutils provides the build command. This command compiles the package in the following four steps:

  • build_py: This builds pure Python modules by byte-compiling them and copying them into the build folder.

  • build_clib: This builds C libraries, when the package contains any, using the C compiler and creating a static library in the build folder.

  • build_ext: This builds C extensions and puts the result in the build folder like build_clib.

  • build_scripts: This builds the modules that are marked as scripts. It also changes the interpreter path when the first line was set (using the #! prefix) and fixes the file mode so that it is executable.

Each of these steps is a command that can be called independently. The result of the compilation process is a build folder that contains everything needed for the package to be installed. There’s no cross-compiler option yet in the distutils package. This means that the result of the command is always specific to the system it was built on.
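
For instance, to byte-compile and copy only the pure Python modules, without building any C code, you can invoke the first of these steps directly:

$ python setup.py build_py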

When some C extensions have to be created, the build process uses the system compiler and the Python header file (Python.h). This include file is available from the time Python was built from sources; for a packaged Python, an extra package for your system distribution is probably required. At least in popular Linux distributions, it is often named python-dev. It contains all the necessary header files for building Python extensions.

The C compiler used in the build process is the compiler that is default for your operating system. For a Linux-based system or macOS, this would be gcc or clang respectively. For Windows, Microsoft Visual C++ can be used (there’s a free command-line version available). The open source project MinGW can be used as well. This can be configured in distutils.

The build command is used by the bdist command to build a binary distribution. It invokes build and all the dependent commands, and then creates an archive in the same way as sdist does.

Let’s create a binary distribution for acme.sql on macOS as follows:

$ python setup.py bdist
running bdist
running bdist_dumb
running build
...
running install_scripts
tar -cf dist/acme.sql-0.1.1.macosx-10.3-fat.tar .
gzip -f9 acme.sql-0.1.1.macosx-10.3-fat.tar
removing 'build/bdist.macosx-10.3-fat/dumb' (and everything under it)

$ ls dist/
acme.sql-0.1.1.macosx-10.3-fat.tar.gz
acme.sql-0.1.1.tar.gz

Notice that the newly created archive’s name contains the name of the system and the distribution it was built on (macOS 10.3).

The same command invoked on Windows will create another system-specific distribution archive, as follows:

C:\acme.sql> python.exe setup.py bdist
...

C:\acme.sql> dir dist
25/02/2008      08:18       <DIR>       .
25/02/2008      08:18       <DIR>       ..
25/02/2008      08:24                   16 055 acme.sql-0.1.1.win32.zip
                    1 File(s)   16 055 bytes
                    2 Dir(s)    22 239 752 192 bytes free

If a package contains C code, apart from a source distribution, it’s important to release as many different binary distributions as possible. At the very least, a Windows binary distribution is important for those who most probably don’t have a C compiler installed.

A binary release contains a tree that can be copied directly into the Python tree. It mainly contains a folder that is copied into Python’s site-packages folder. It may also contain cached bytecode files (*.pyc files on Python 2 and __pycache__/*.pyc on Python 3).

The other kind of built distributions are wheels, provided by the wheel package. When installed (for example, using pip), the wheel package adds a new bdist_wheel command to distutils. It allows the creation of platform-specific distributions (currently only for Windows, macOS, and Linux) that are better alternatives to normal bdist distributions. It was designed to replace another distribution format introduced earlier by setuptools, called eggs. Eggs are now obsolete, so they won't be covered here. The list of advantages of using wheels is quite long. Here are the ones that are mentioned on the Python Wheels page (http://pythonwheels.com/):

  • Faster installation for pure Python and native C extension packages

  • Avoids arbitrary code execution during installation (no setup.py is executed)

  • Installation of a C extension does not require a compiler on Windows, macOS, or Linux.

  • Allows better caching for testing and continuous integration.

  • Creates .pyc files as part of the installation to ensure they match the Python interpreter used

  • More consistent installs across platforms and machines

According to PyPA’s recommendation, wheels should be your default distribution format. For a very long time, binary wheels for Linux were not supported, but that has fortunately changed. Binary wheels for Linux are called manylinux wheels. The process of building them is unfortunately not as straightforward as for Windows and macOS binary wheels. For this kind of wheel, PyPA maintains special Docker images that serve as ready-to-use build environments. For the sources of these images and more information, you can visit the official repository on GitHub: https://github.com/pypa/manylinux.

4. Standalone executables

Creating standalone executables is a commonly overlooked topic in materials that cover packaging of Python code. This is mainly because Python lacks proper tools in its standard library that could allow programmers to create simple executables that could be run by users without the need to install the Python interpreter.

Compiled languages have a big advantage over Python in that they allow you to create an executable application for the given system architecture that can be run by users without any knowledge of the underlying technology. Python code, when distributed as a package, requires the Python interpreter in order to be run. This creates a big inconvenience for users who do not have enough technical proficiency.

Developer-friendly operating systems, such as macOS or most Linux distributions, come with the Python interpreter preinstalled. So, for their users, a Python-based application can still be distributed as a source package that relies on a specific interpreter directive in the main script file, popularly called a shebang. For most Python applications, this takes the following form:

#!/usr/bin/env python

Such a directive, when used as the first line of a script, marks it to be interpreted with the default Python version for the given environment. It can, of course, take a more detailed form that requires a specific Python version, such as python3.4, python3, python2, and so on. Note that this will work in most popular POSIX systems, but isn’t portable at all. This solution relies on the existence of specific Python versions and also on the availability of an env executable exactly at /usr/bin/env. Both of these assumptions may fail on some operating systems. Also, shebangs do not work on Windows at all. Additionally, bootstrapping of the Python environment on Windows can be a challenge even for experienced developers, so you cannot expect that nontechnical users will be able to do that by themselves.

The other thing to consider is the simple user experience in the desktop environment. Users usually expect that applications can be run from the desktop by simply clicking on them. Not every desktop environment will support that with Python applications distributed as a source.

So it would be best if we were able to create a binary distribution that works like any other compiled executable. Fortunately, it is possible to create an executable that has both the Python interpreter and our project embedded. This allows users to open our application without caring about Python or any other dependency.

4.1. When are standalone executables useful?

Standalone executables are useful in situations where the simplicity of the user experience is more important than the user's ability to interfere with the application's code. Note that distributing applications as executables only makes code reading or modification harder, not impossible. It is not a way to secure application code and should only be used as a way to make interacting with the application simpler.

Standalone executables should be the preferred way of distributing applications for nontechnical end users, and they also seem to be the only reasonable way of distributing Python applications for Windows.

Standalone executables are usually a good choice for the following:

  • Applications that depend on specific Python versions that may not be easily available on the target operating systems

  • Applications that rely on modified precompiled CPython sources

  • Applications with graphical interfaces

  • Projects that have many binary extensions written in different languages

  • Games