If you find the openpyxl project intriguing and want to contribute a new awesome feature, fix a nasty bug or improve the documentation, this section will guide you through setting up your development environment.
We will look into the coding standards and version control system workflows used, as well as cloning the openpyxl code to your local machine, setting up a virtual Python environment, running tests and building the documentation.
Getting the source
The source code of openpyxl is hosted on Bitbucket as a Mercurial project, which you can download using, e.g., the GUI client SourceTree by Atlassian. If you prefer working with the command line, you can use the following:
$ hg clone https://bitbucket.org/openpyxl/openpyxl
$ hg -R openpyxl up 2.6
Please note that the default branch should never be used for development work. For bug fixes and minor patches, you should base your work on the branch of the current release, e.g. 2.6. New features should generally be based on the development branch of the next minor version. If in doubt, get in touch with the openpyxl development team.
It is worthwhile to add an upstream remote reference to the original repository so that you can update your fork with the latest changes. To do so, add the following to the .hg/hgrc file:
[paths]
default = ...
openpyxl-master = https://bitbucket.org/openpyxl/openpyxl
You can then grab any new changes using:
$ hg pull openpyxl-master
After that, you should create a virtual environment using virtualenv and install the project requirements and the project itself:
$ cd openpyxl
$ virtualenv openpyxl-env
Activate the environment using:
$ source openpyxl-env/bin/activate  # or openpyxl-env\Scripts\activate on Windows
Install the dev and prod dependencies and the package itself using:
(openpyxl-env) $ pip install -U -r requirements.txt
(openpyxl-env) $ pip install -e .
Note that contributions to the project without tests will not be accepted.
We use pytest as the test runner, with pytest-cov for coverage information and pytest-flakes for static code analysis.
To run all the tests, either execute:
(openpyxl-env) $ pytest -xrf openpyxl  # -x stops at the first failure, -rf summarises failed tests
or use tox to run the tests on different Python versions:
$ tox openpyxl
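Since contributions need tests, here is a minimal sketch of what a small test module could look like. The utility `column_letter()` and the file name `test_column_letters.py` are hypothetical and exist only for illustration; openpyxl already has its own column-letter helper in openpyxl.utils, and this is not its implementation. The tests are plain functions with bare `assert` statements, which pytest collects automatically.

```python
# test_column_letters.py (hypothetical file name)


def column_letter(index):
    """Hypothetical utility: convert a 1-based column index to Excel letters."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index:
        # Bijective base-26: shift to 0-based before taking the remainder.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def test_column_letter_known_values():
    cases = {1: "A", 26: "Z", 27: "AA", 702: "ZZ", 703: "AAA"}
    for index, expected in cases.items():
        assert column_letter(index) == expected


def test_column_letter_rejects_zero():
    # Within pytest, `with pytest.raises(ValueError):` is the idiomatic form.
    try:
        column_letter(0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for index 0")
```

Keeping each test small and focused on one behaviour makes failures easy to read in the `-rf` summary.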
The goal is 100% coverage for unit tests of data types and utility functions. Coverage information can be obtained using:
py.test --cov openpyxl
Tests should preferably be run at package / module level, e.g. openpyxl/cell. This makes testing and getting statistics for code under development easier:
py.test --cov openpyxl/cell openpyxl/cell
Use the openpyxl.tests.helper.compare_xml function to compare generated and expected fragments of XML.
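The point of such a helper is to compare XML by meaning rather than byte for byte, so that attribute order or pretty-printing does not cause spurious failures. As a standard-library illustration of that idea (not the implementation of compare_xml), the sketch below canonicalises both fragments with C14N 2.0 via `xml.etree.ElementTree.canonicalize` (Python 3.8+) and compares the results. The helper name `xml_equivalent` is hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_equivalent(generated, expected):
    """Hypothetical helper: True if two XML strings are equal after C14N 2.0.

    Canonicalisation sorts attributes and writes empty elements as start/end
    pairs; strip_text=True also drops whitespace-only text used for indenting.
    """
    options = {"strip_text": True}
    return ET.canonicalize(generated, **options) == ET.canonicalize(expected, **options)
```

For example, `<row r="1" spans="1:2"><c r="A1"/></row>` and an indented version with the attributes swapped compare as equivalent, while a changed attribute value does not.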
When working on code to generate XML, it is possible to validate that the generated XML conforms to the published specification. Note that this won’t necessarily guarantee that everything is fine, but it is preferable to reverse engineering!
Along with the Open XML SDK, Microsoft also has a “Productivity Tool” for working with Office Open XML.
This allows you to quickly inspect or compare whole Excel files. Unfortunately, its validation errors contain many false positives. The tool also contains links to the specification and implementers’ notes.
File Support and Specifications
Where possible, we try to support files generated by other libraries or programs, but we can’t guarantee it, because these often do not strictly adhere to the published specification.
Support for Python Versions
We have a small library of utility functions to support development for Python 2 and 3. With these functions, code can be developed using Python 3 style and idioms. They live in openpyxl.compat for Python and openpyxl.xml for XML functions.
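To show the kind of shim such a compatibility module contains, here is a hypothetical sketch. The names `PY2`, `string_types` and `is_string` are assumptions for illustration and are not the actual contents of openpyxl.compat. Code elsewhere can then call `is_string()` in Python 3 style without caring which interpreter runs it.

```python
"""Hypothetical compat shim in the spirit of openpyxl.compat (not its actual contents)."""
import sys

PY2 = sys.version_info[0] == 2

try:
    # basestring only exists on Python 2.
    string_types = (basestring,)  # noqa
except NameError:
    string_types = (str,)


def is_string(value):
    """Return True for text values on both Python 2 and Python 3."""
    return isinstance(value, string_types)
```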
However, in version 3.0 we will drop support for Python 2.x versions.
We follow PEP 8 for coding style, except when implementing attributes for round-tripping. Despite that, you are encouraged to use Python data conventions (booleans, None, etc.). Note any exceptions to this convention in docstrings.
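To illustrate the round-tripping exception, the hypothetical class below keeps the camelCase OOXML attribute names `numFmtId` and `applyNumberFormat` (both real attributes of the `xf` element in SpreadsheetML styles) so that they map one-to-one onto the XML, while its values follow Python conventions: an `int` rather than a numeric string, and `True`/`False`/`None` rather than `"1"`, `"0"` or a missing attribute. openpyxl’s real classes are built on its own descriptor machinery; this plain class is only a sketch of the convention, with the exception noted in its docstring as recommended above.

```python
import xml.etree.ElementTree as ET


class CellStyleXf:
    """Hypothetical style record used to illustrate the naming convention.

    Exception to PEP 8: attribute names copy the OOXML spelling
    (numFmtId, applyNumberFormat) so they round-trip unchanged.
    """

    def __init__(self, numFmtId=0, applyNumberFormat=None):
        self.numFmtId = numFmtId                    # int, not the string "0"
        self.applyNumberFormat = applyNumberFormat  # bool or None, not "1"/"0"

    def to_tree(self):
        attrs = {"numFmtId": str(self.numFmtId)}
        # None means "attribute absent", so it is simply not written.
        if self.applyNumberFormat is not None:
            attrs["applyNumberFormat"] = "1" if self.applyNumberFormat else "0"
        return ET.Element("xf", attrs)

    @classmethod
    def from_tree(cls, node):
        apply = node.get("applyNumberFormat")
        return cls(
            numFmtId=int(node.get("numFmtId", "0")),
            applyNumberFormat=None if apply is None else apply in ("1", "true"),
        )
```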
Contributions in the form of pull requests are always welcome. Don’t forget to add yourself to the list of authors!
Branch naming convention
We use a “major.minor.patch” numbering system, e.g. 2.6.2. Development branches are named after “major.minor” releases. In general, API changes will only happen in major releases, but there will be exceptions. Always communicate API changes to the mailing list before making them. If you are changing an API, try to implement a fallback (with a deprecation warning) for the old behaviour.
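A fallback for a changed API can be as small as keeping the old name, emitting a `DeprecationWarning` and delegating to the new code. In this hypothetical sketch, `Report.save_as()` has been renamed to `Report.save()`; the class and method names are made up for illustration. `stacklevel=2` makes the warning point at the caller’s line rather than at the shim itself.

```python
import warnings


class Report:
    """Hypothetical class whose save_as() method was renamed to save()."""

    def __init__(self):
        self.saved_to = None

    def save(self, filename):
        # New API.
        self.saved_to = filename
        return filename

    def save_as(self, filename):
        # Old API kept as a fallback: warn, then delegate to the new method.
        warnings.warn(
            "Report.save_as() is deprecated, use Report.save() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.save(filename)
```

Old callers keep working, and the warning gives them a release cycle to migrate before the old name is removed.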
The “default branch” is used for releases and always has changes from a development branch merged in. It should never be the target for a pull request.
Pull requests should be submitted to the current, unreleased development branch. For example, if the current release is 2.6.2, pull requests should be made to the 2.6 branch. The exceptions are bug fixes to released versions, which should be made to the relevant release branch and merged upstream into development.
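The mapping from a release number to the branch a pull request normally targets can be written down as a tiny helper. `target_branch()` is hypothetical and only encodes the general rule; it does not cover the bug-fix exception.

```python
def target_branch(release):
    """Hypothetical helper: map a "major.minor.patch" release to its "major.minor" branch."""
    parts = release.split(".")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError("expected a major.minor.patch version, got %r" % release)
    major, minor, _patch = parts
    return "%s.%s" % (major, minor)
```

For the example in the text, `target_branch("2.6.2")` gives `"2.6"`.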
Use tox to test your code before making a pull request. This is especially important for picking up problems across Python versions.
Remember to update the documentation when adding or changing features. Check that the documentation is syntactically correct:
tox -e doc
Benchmarking and profiling are ongoing tasks. Contributions to these are very welcome as we know there is a lot to do.
There is a tox profile for long-running memory benchmarks using the memory_utils package:
tox -e memory
As openpyxl does not include any internal memory benchmarking tools, the Python pympler package was used during the testing of styles to profile the memory usage in read_style_table() (openpyxl/reader/style.py):
# in openpyxl/reader/style.py
from pympler import muppy, summary

def read_style_table(xml_source):
    ...
    if cell_xfs is not None:  # ~ line 47
        initialState = summary.summarize(muppy.get_objects())  # Capture the initial state
        for index, cell_xfs_node in enumerate(cell_xfs_nodes):
            ...
            table[index] = new_style
        finalState = summary.summarize(muppy.get_objects())  # Capture the final state
        diff = summary.get_diff(initialState, finalState)  # Compare
        summary.print_(diff)
pympler.summary.print_() prints a report of object memory usage to the console, allowing different methods to be compared and memory usage to be examined. A useful future development would be to construct a benchmarking package to measure the performance of different components.
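As a possible starting point for such a benchmarking package, here is a minimal sketch that uses only the standard library: `time.perf_counter` for timing and `tracemalloc` for peak memory, instead of pympler. The `measure()` helper is hypothetical and not part of openpyxl. Because tracing allocations slows code down, the timed runs and the traced run are kept separate.

```python
import time
import tracemalloc


def measure(func, *args, repeat=3, **kwargs):
    """Hypothetical helper: return (best seconds, peak traced bytes) for func(*args, **kwargs)."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - start)

    # One extra run under tracemalloc to record the peak of Python allocations.
    tracemalloc.start()
    try:
        func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return min(timings), peak
```

Calling it on two alternative implementations of the same component, e.g. `measure(load_styles_v1, path)` versus `measure(load_styles_v2, path)` (hypothetical functions), gives a quick side-by-side comparison of speed and peak memory.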