Development

If you find the openpyxl project intriguing and want to contribute a new awesome feature, fix a nasty bug or improve the documentation this section will guide you in setting up your development environment.

We will look into the coding standards and version control system workflows used, as well as cloning the openpyxl code to your local machine, setting up a virtual Python environment, running tests and building the documentation.

Getting the source

The source code of openpyxl is hosted on Heptapod as a Mercurial project which you can download using e.g. the GUI client SourceTree by Atlassian. If you prefer working with the command line you can use the following:

$ hg clone https://foss.heptapod.net/openpyxl/openpyxl
$ hg up 3.1

Please note that the default branch should never be used for development work. For bug fixes and minor patches you should base your work on the branch of the current release, e.g 3.1. New features should generally be based on the development branch of the next minor version. If in doubt get in touch with the openpyxl development team.

It is worthwhile to add an upstream remote reference to the original repository to update your fork with the latest changes, by adding to the ./hg/hgrc file the following:

[paths]
default = ...
openpyxl-master = https://foss.heptapod.net/openpyxl/openpyxl

You can then grab any new changes using:

$ hg pull openpyxl-master

After that you should create a virtual environment using virtualenv and install the project requirements and the project itself:

$ cd openpyxl
$ virtualenv openpyxl-env

Activate the environment using:

$ source bin/activate  # or ./openpyxl-env/Scripts/activate on Windows

Install the dev and prod dependencies and the package itself using:

(openpyxl-env) $ pip install -U -r requirements.txt
(openpyxl-env) $ pip install -e .

Running tests

Note that contributions to the project without tests will not be accepted.

We use pytest as the test runner with pytest-cov for coverage information and pytest-flakes for static code analysis.

To run all the tests you need to either execute:

(openpxyl-env) $ pytest -xrf openpyxl  # the flags will stop testing at the first error

Or use tox to run the tests on different Python versions and configurations:

$ tox openpyxl

Coverage

The goal is 100 % coverage for unit tests - data types and utility functions. Coverage information can be obtained using:

py.test --cov openpyxl

Organisation

Tests should be preferably at package / module level e.g openpyxl/cell. This makes testing and getting statistics for code under development easier:

py.test --cov openpyxl/cell openpyxl/cell

Checking XML

Use the openpyxl.tests.helper.compare_xml function to compare generated and expected fragments of XML.

Schema validation

When working on code to generate XML it is possible to validate that the generated XML conforms to the published specification. Note, this won’t necessarily guarantee that everything is fine but is preferable to reverse engineering!

Microsoft Tools

Along with the SDK, Microsoft also has a “Productivity Tool” for working with Office OpenXML.

This allows you to quickly inspect or compare whole Excel files. Unfortunately, validation errors contain many false positives. The tool also contain links to the specification and implementers’ notes.

File Support and Specifications

The primary aim of openpyxl is to support reading and writing Microsoft Excel 2010 files. These are zipped OOXML files that are specified by ECMA 376 and ISO 29500.

Where possible we try to support files generated by other libraries or programs, but can’t guarantee it, because often these do not strictly adhere to the above format.

Support of Python Versions

Python 3.6 and upwards are supported

Coding style

We orient ourselves at PEP-8 for the coding style, except when implementing attributes for round tripping. Despite that you are encouraged to use Python data conventions (boolean, None, etc.). Note exceptions from this convention in docstrings.

Contributing

Contributions in the form of pull requests are always welcome. Don’t forget to add yourself to the list of authors!

Branch naming convention

We use a “major.minor.patch” numbering system, ie. 3.1.3. Development branches are named after “major.minor” releases. In general, API change will only happen major releases but there will be exceptions. Always communicate API changes to the mailing list before making them. If you are changing an API try and an implement a fallback (with deprecation warning) for the old behaviour.

The “default branch” is used for releases and always has changes from a development branch merged in. It should never be the target for a pull request.

Pull Requests

Pull requests should be submitted to the current, unreleased development branch. Eg. if the current release is 3.1.3, pull requests should be made to the 3.1 branch. Exceptions are bug fixes to released versions which should be made to the relevant release branch and merged upstream into development.

Please use tox to test code for different submissions before making a pull request. This is especially important for picking up problems across Python versions.

Documentation

Remember to update the documentation when adding or changing features. Check that documentation is syntactically correct.:

tox -e doc

Benchmarking

Benchmarking and profiling are ongoing tasks. Contributions to these are very welcome as we know there is a lot to do.

Memory Use

There is a tox profile for long-running memory benchmarks using the memory_utils package.:

tox -e memory

Pympler

As openpyxl does not include any internal memory benchmarking tools, the python pympler package was used during the testing of styles to profile the memory usage in openpyxl.reader.excel.read_style_table():

# in openpyxl/reader/style.py
from pympler import muppy, summary

def read_style_table(xml_source):
  ...
  if cell_xfs is not None:  # ~ line 47
      initialState = summary.summarize(muppy.get_objects())  # Capture the initial state
      for index, cell_xfs_node in enumerate(cell_xfs_nodes):
         ...
         table[index] = new_style
      finalState = summary.summarize(muppy.get_objects())  # Capture the final state
      diff = summary.get_diff(initialState, finalState)  # Compare
      summary.print_(diff)

pympler.summary.print_() prints to the console a report of object memory usage, allowing the comparison of different methods and examination of memory usage. A useful future development would be to construct a benchmarking package to measure the performance of different components.