
Python Package Management

Sachin Verma


Most new Python developers and users want to jump straight in and code their desired functionality. Many do not take the time to understand how Python packages are installed and updated on their system. But anyone who wants to stay clear of package installation troubles and distribute their modules to the world reliably should go through the basics of Python package management.

Having said that, I must admit that Python package management is somewhat complex: a number of tools and utilities are currently in use, and which ones apply depends on your Python version and platform.

The Python user base can be broadly categorized as:

  • Users: anyone who simply uses Python packages, for example to write an application or an automated test setup for continuous integration.
  • Developers: someone who creates a module or some functionality and wants to share it with others.

What does a Python distribution look like when installed on the filesystem?

  • On modern *nix systems, a standard Python installation places its library in the directory /usr/lib/pythonx.y.
  • Any additional packages you install system-wide go to /usr/local/lib/pythonx.y/site-packages or, on Debian-based systems, /usr/local/lib/pythonx.y/dist-packages.
  • A user who wants to install packages only for herself can pass the --user flag to the pip command, and the package is installed in $HOME/.local/lib/pythonx.y/site-packages/.
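
To see where these directories actually are on a given machine, the interpreter can be asked directly. The following is a minimal sketch using only the standard library modules sysconfig and site; the helper name install_locations is made up for this example, and the paths printed will differ by platform, Python version, and whether a virtual environment is active.

```python
import site
import sysconfig


def install_locations():
    """Return the main install directories known to this interpreter."""
    paths = sysconfig.get_paths()
    return {
        "standard library": paths["stdlib"],
        "site-packages": paths["purelib"],
        "user site-packages": site.getusersitepackages(),
    }


for label, path in install_locations().items():
    print(f"{label}: {path}")
```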

[Figure: Python install locations on a *nix system]


How can I install new Python packages on my machine?

  • High-level package managers such as pip and easy_install can be used to install additional Python packages. Alternatively, if you have the source archive of a package, you can simply run python setup.py install from the unpacked directory.
  • pip is the recommended tool for installation, for the following reasons:
    • It can both install and uninstall packages.
    • It resolves the dependencies of the package being installed and installs them automatically.
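
As a rough illustration of the pip commands involved, the sketch below only builds the argument lists for an install and an uninstall; it does not run them. The helper pip_command and the package name "requests" are illustrative choices, not something prescribed by pip itself. Running pip as "python -m pip" ties the command to a specific interpreter.

```python
import sys


def pip_command(action, package, user=False):
    """Build the argument list for running pip with the current interpreter."""
    if action not in ("install", "uninstall"):
        raise ValueError(f"unsupported action: {action}")
    command = [sys.executable, "-m", "pip", action]
    if action == "install" and user:
        command.append("--user")  # install into the per-user site-packages
    if action == "uninstall":
        command.append("--yes")   # skip the interactive confirmation
    command.append(package)
    return command


# "requests" is only an illustrative package name.
print(pip_command("install", "requests", user=True))
print(pip_command("uninstall", "requests"))
```

To actually execute one of these commands, the list can be handed to subprocess.run(command, check=True).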

How difficult is it to install/use a Python module?

  • Not at all difficult. You can place a Python module in any directory and then add that directory to the PYTHONPATH environment variable. This variable holds a list of directories that Python searches for modules, in addition to its default locations. When your code imports a module, the Python runtime consults this search path (exposed in code as sys.path) and loads the module for your program.
  • Installing and using a module in this ad-hoc manner is fine for local usage. But to deploy the same module on thousands of other systems, we need an organized way to distribute packages and a standard way to install or remove a module from a system.
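
The ad-hoc approach can be tried out in a few lines. This is a minimal sketch, assuming a throwaway module named hello_mod (a made-up name); it appends a directory to sys.path at runtime, which has the same effect for the current process as listing that directory in PYTHONPATH before starting Python.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway directory holding one module, hello_mod.py (hypothetical name).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "hello_mod.py"), "w") as handle:
    handle.write("def greet():\n    return 'hello from hello_mod'\n")

# Entries from PYTHONPATH end up in sys.path at interpreter start-up;
# appending to sys.path here has the same effect for the current process.
sys.path.append(module_dir)
importlib.invalidate_caches()

import hello_mod

print(hello_mod.greet())  # prints: hello from hello_mod
```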

What constitutes a package?

A Python package consists of the following elements:

  • Python source files
  • Data/resources, if any
  • Shared libraries, if any
  • Metadata
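
The metadata of packages that are already installed can be inspected from Python itself. The sketch below assumes Python 3.8 or later, where the standard library provides importlib.metadata; the function name summarize_installed is invented for this example, and the output depends entirely on what is installed in the environment.

```python
from importlib import metadata
from itertools import islice


def summarize_installed(limit=5):
    """Return (name, version, dependency count) for up to `limit` installed distributions."""
    summaries = []
    for dist in islice(metadata.distributions(), limit):
        requirements = dist.requires or []  # None when no dependencies are declared
        summaries.append((dist.metadata["Name"], dist.version, len(requirements)))
    return summaries


for name, version, dependency_count in summarize_installed():
    print(f"{name} {version}: {dependency_count} declared dependencies")
```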

What should a package management system accomplish?

For any package management system, the primary goals are:

  • A package must be installable and usable on any target system.
  • Package dependencies should be easy to track.
  • It should be easy for other package managers to repackage your package.

These goals pose a tough challenge for any package management system.

Python modules can be of the following types:

  1. Pure Python: platform-independent code written only in Python.
  2. Python extension: some functionality of the package is coded in a lower-level language, such as C/C++ for CPython or Java for Jython.

Packaging, building, and distributing pure Python code is not so difficult, since the code can run on any platform that provides a Python virtual machine implementation. The process is not so simple for Python extensions.

For instance, if I were to write a Python extension in C, I would have the following options:

  • Option 1
    • Bundle the C source files along with the Python package.
    • Any user who chooses to install this package has to install the compilation/build tools needed to compile my C source files for that platform.
  • Option 2
    • The C code could be pre-compiled, packaged along with the Python package, and stored on an index server such as PyPI.
    • The limitation of this approach is that we need to create multiple such packages, with the C code compiled for many different platforms.
    • The advantage is that target systems do not need C compilation/build tools and do not have to wait for the libraries to compile.
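
To get a feel for why Option 2 needs one build per platform, it helps to look at what identifies "this platform" from the interpreter's point of view. The following sketch uses only standard library calls; the function name binary_compatibility_info is made up here, and the values it reports vary from machine to machine.

```python
import importlib.machinery
import sys
import sysconfig


def binary_compatibility_info():
    """Collect the details a pre-compiled extension has to match on this machine."""
    return {
        # Operating system and CPU architecture as the interpreter reports them.
        "platform": sysconfig.get_platform(),
        # Which Python implementation is running (for example cpython).
        "implementation": sys.implementation.name,
        # File name suffixes this interpreter accepts for compiled extension modules.
        "extension_suffixes": list(importlib.machinery.EXTENSION_SUFFIXES),
    }


for key, value in binary_compatibility_info().items():
    print(f"{key}: {value}")
```

A compiled extension built for one combination of these values is generally not loadable under another, which is why a pre-compiled package has to be produced separately for each target.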

Python packaging ecosystem

Evolution of Python Package Management:

  • distutils is born
    • The initial package management toolset, which allowed users to build, install, and distribute Python packages.
    • Available as part of the Python standard library.
    • Although quite capable, distutils could not keep pace with evolving requirements because it was part of the standard library. Anything that is part of the standard library faces stiff resistance to change requests.
  • setuptools arrives on the scene
    • The reluctance to add new features to the standard library packaging tool, distutils, led to the development of a third-party package management toolset called setuptools.
    • Added support for dependency management when installing modules.
    • Supports the .egg package format.
    • Added easy_install, a high-level package manager that fetches packages, including their dependencies, from the PyPI server.
  • distribute (fork of setuptools)
    • Among other reasons, distribute was created to expedite work on easy_install.
  • distutils2 succeeds distutils
    • Created to add important features to the standard packaging tool.
    • Abandoned later on.
  • setuptools makes a comeback
    • The distribute project stops separate development and merges back into setuptools.
    • setuptools is now the main packaging tool.
  • distlib in the pipeline
    • Once complete, it may become part of the Python standard library, replacing distutils.

Currently, as a developer, you must use one of the following two options to package your modules:

  1. distutils:
    • Standard and part of the Python standard library, so always available, albeit with fewer features.
  2. setuptools:
    • The recommended tool for packaging your modules.
    • Lots of features, including support for pkg_resources and pip.
    • Active development.
    • Has adopted the wheel format.

Distributing a Python Package

How to publish your package

  • At a minimum, you need to write a setup.py Python script that serves as the blueprint of your package:
    • You define the source files (Python, or C/C++/Java etc. in the case of extension modules) that are part of the package.
    • You define the metadata for your package, such as name, version, and dependencies.
    • You define rules for how your package should be laid out on the target system.
  • You specify in the setup.py script whether you want to use setuptools or distutils to create the package.
  • The Python packaging ecosystem supports a number of targets (or outputs) for your package.
    • The sdist target produces a source distribution, which can then be unpacked and built from source on the target system.
    • The bdist target produces a binary distribution of your package. Remember that binary distributions are specific to particular platforms. bdist_wheel is becoming a popular choice because the wheel format it produces has the benefits described in the next section.
    • You can do a lot of other things too, such as generating RPMs and various other formats.
  • Once the package file has been generated locally, it can be uploaded to the PyPI server.
  • Once uploaded, PyPI creates an entry for the package in its database, along with a web page describing the package.
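
To make the setup.py blueprint concrete, the sketch below writes a very small project tree into a temporary directory: a setup.py that uses setuptools, plus one package. Everything named here (hello-pkg, hello_pkg, the version and description) is hypothetical and chosen only for illustration; a real project would carry its own metadata and dependencies.

```python
import os
import tempfile
import textwrap

# Contents of a minimal setup.py; all metadata values are made up.
SETUP_PY = textwrap.dedent('''\
    from setuptools import setup, find_packages

    setup(
        name="hello-pkg",              # hypothetical project name
        version="0.1.0",
        description="Example package",
        packages=find_packages(),      # picks up the hello_pkg/ directory
        install_requires=[],           # runtime dependencies would go here
    )
''')


def scaffold_project(root):
    """Write a minimal project tree (setup.py plus one package) under `root`."""
    package_dir = os.path.join(root, "hello_pkg")
    os.makedirs(package_dir, exist_ok=True)
    with open(os.path.join(root, "setup.py"), "w") as handle:
        handle.write(SETUP_PY)
    with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
        handle.write("def greet():\n    return 'hello'\n")
    return root


project_root = scaffold_project(tempfile.mkdtemp())
for dirpath, _dirnames, filenames in os.walk(project_root):
    for filename in sorted(filenames):
        print(os.path.relpath(os.path.join(dirpath, filename), project_root))
```

From inside such a project directory, running python setup.py sdist would be expected to produce a source archive, and python setup.py bdist_wheel a wheel, assuming setuptools and the wheel package are installed.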

Python Distribution Containers:

Python defines two standard container formats that let you pack metadata along with your modules:

  1. Eggs Format:
    • Proposed during the early years of Python packaging standards development; succeeded by the wheel format described next.
    • A simple zip archive of project files and metadata about the project.
    • A package in .egg format can be imported in your code while still zipped.
    • Can include compiled Python bytecode (.pyc files).
    • Once installed, it creates an egg-info directory that hosts all the relevant metadata for the package.
    • You could publish your package to the PyPI server as an egg.
  2. Wheels Format:
    • The newer container format for packaging.
    • It is a binary-only installation format, which means you do not need to run setup.py on the target system.
    • Installation is very fast because no Python extension modules need to be built or compiled; the prebuilt modules for the target are simply copied into place.
    • It allows caching, which makes quick installs possible in virtualenv setups.
    • Compliant with the newer PEP metadata standard: it creates a dist-info directory inside the site-packages directory to hold the metadata.
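
The ability to import code straight from a zip archive, which the egg format relies on, is a general feature of Python's import system and can be demonstrated without building a real egg. This is a minimal sketch under that assumption: it creates a plain zip file (not an actual .egg) containing one module with the made-up name zipped_mod and imports it without unpacking.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a small zip archive containing one module, zipped_mod.py (hypothetical name).
archive_path = os.path.join(tempfile.mkdtemp(), "example_archive.zip")
with zipfile.ZipFile(archive_path, "w") as archive:
    archive.writestr("zipped_mod.py", "def answer():\n    return 42\n")

# Putting the archive itself on sys.path lets the import system read from it.
sys.path.insert(0, archive_path)
importlib.invalidate_caches()

import zipped_mod

print(zipped_mod.answer())   # prints: 42
print(zipped_mod.__file__)   # a path that points inside the zip archive
```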

Looking Ahead

The existence of so many different utilities for managing Python packages has been a source of confusion for everyone. Considering the features of the tools available today, it is a safe bet to use setuptools or distutils for all packaging work. For installing and uninstalling packages from different sources, including the cheeseshop (a.k.a. PyPI), pip is the recommended tool.

You are free to choose the distribution format that fits your requirements, but the wheel format is recommended if it suits your deployment.

setup.py remains our gateway for packaging Python modules. setup.cfg, introduced in distutils2, is a great way of decoupling metadata from the setup script, but it is still undergoing improvement as the standard evolves.
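
As a rough picture of what metadata in setup.cfg looks like, the sketch below parses a small example with the standard library configparser module. setup.cfg is an INI-style file, and setuptools can read declarative settings from its [metadata] and [options] sections; the project name and the dependency shown here are made up for illustration.

```python
import configparser

# Hypothetical setup.cfg contents; the project name and dependency are made up.
SETUP_CFG = """\
[metadata]
name = hello-pkg
version = 0.1.0
description = Example package

[options]
packages = find:
install_requires =
    requests
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

package_metadata = dict(parser["metadata"])
requirements = parser["options"]["install_requires"].split()

print(package_metadata)
print(requirements)
```

With the metadata kept in setup.cfg like this, the setup.py script itself can shrink to little more than a call to setup().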

