Development

With the ongoing development of openpyxl, there is occasional information useful to assist developers.

What is suppoprted

The primary aim of openpyxl is to support reading and writing Microsoft Excel 2010 files. Where possible support for files generated by other libraries or programs is available but this is not guaranteed.

Supporting different Python versions

We have a small library of utility functions to support development for Python 2 and 3. This is openpyxl.compat for Python and openpyxl.xml for XML functions.

Coding style

Use PEP-8 except when implementing attributes for roundtripping but always use Python data conventions (boolean, None, etc.) Note exceptions in docstrings.

Getting the source

The source code is hosted on bitbucket.org. You can get it using a Mercurial client and the following URL.

$ hg clone https://bitbucket.org/openpyxl/openpyxl
$ virtualenv openpyxl
$ hg up 2.5
$ cd openpyxl
$ source bin/activate
$ pip install -U -r requirements.txt
$ python setup.py develop

Specification

The file specification for OOXML was released jointly as ECMA 376 and ISO 29500.

Testing

Contributions without tests will not be accepted.

We use pytest as the test runner with pytest-cov for coverage information and pytest-flakes for static code analysis.

Coverage

The goal is 100 % coverage for unit tests - data types and utility functions. Coverage information can be obtained using

py.test --cov openpyxl

Organisation

Tests should be preferably at package / module level e.g openpyxl/cell. This makes testing and getting statistics for code under development easier:

py.test --cov openpyxl/cell openpyxl/cell

Checking XML

Use the openpyxl.tests.helper.compare_xml function to compare generated and expected fragments of XML.

Schema validation

When working on code to generate XML it is possible to validate that the generated XML conforms to the published specification. Note, this won’t necessarily guarantee that everything is fine but is preferable to reverse engineering!

Microsoft Tools

Along with the SDK, Microsoft also has a “Productivity Tool” for working with Office OpenXML.

This allows you to quickly inspect or compare whole Excel files. Unfortunately, validation errors contain many false positives. The tool also contain links to the specification and implementers’ notes.

Please see Testing on Windows for additional information on setting up and testing on Windows.

Contributing

Contributions in the form of pull requests are always welcome. Don’t forget to add yourself to the list of authors!

Branch naming convention

We use a “major.minor.patch” numbering system, ie. 2.5.15. Development branches are named after “major.minor” releases. In general, API change will only happen major releases but there will be exceptions. Always communicate API changes to the mailing list before making them. If you are changing an API try and an implement a fallback (with deprecation warning) for the old behaviour.

The “default branch” is used for releases and always has changes from a development branch merged in. It should never be the target for a pull request.

Pull Requests

Pull requests should be submitted to the current, unreleased development branch. Eg. if the current release is 2.5.15, pull requests should be made to the 2.5 branch. Exceptions are bug fixes to released versions which should be made to the relevant release branch and merged upstream into development.

Please use tox to test code for different submissions before making a pull request. This is especially important for picking up problems across Python versions.

Documentation

Remember to update the documentation when adding or changing features. Check that documentation is syntactically correct.

tox -e doc

Benchmarking

Benchmarking and profiling are ongoing tasks. Contributions to these are very welcome as we know there is a lot to do.

Memory Use

There is a tox profile for long-running memory benchmarks using the memory_utils package.

tox -e memory

Pympler

As openpyxl does not include any internal memory benchmarking tools, the python pympler package was used during the testing of styles to profile the memory usage in openpyxl.reader.excel.read_style_table():

# in openpyxl/reader/style.py
from pympler import muppy, summary

def read_style_table(xml_source):
  ...
  if cell_xfs is not None:  # ~ line 47
      initialState = summary.summarize(muppy.get_objects())  # Capture the initial state
      for index, cell_xfs_node in enumerate(cell_xfs_nodes):
         ...
         table[index] = new_style
      finalState = summary.summarize(muppy.get_objects())  # Capture the final state
      diff = summary.get_diff(initialState, finalState)  # Compare
      summary.print_(diff)

pympler.summary.print_() prints to the console a report of object memory usage, allowing the comparison of different methods and examination of memory usage. A useful future development would be to construct a benchmarking package to measure the performance of different components.