As openpyxl continues to develop, the following information may be useful to developers.
What is supported¶
The primary aim of openpyxl is to support reading and writing Microsoft Excel 2010 files. Where possible support for files generated by other libraries or programs is available but this is not guaranteed.
Supporting different Python versions¶
We have a small library of utility functions to support development for Python 2 and 3. These are openpyxl.compat for Python and openpyxl.xml for XML functions.
Use PEP-8 except when implementing attributes for roundtripping, but always use Python data conventions (boolean, None, etc.). Note exceptions in docstrings.
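To illustrate the convention above, here is a minimal sketch. The class `PrintOptions`, its attribute names and the `to_attrs` method are hypothetical and chosen only for illustration; they are not taken from the openpyxl source. The point is that attribute names may mirror the XML (camelCase) so they round-trip unchanged, while the values remain ordinary Python objects.

```python
class PrintOptions:
    """Hypothetical element wrapper, for illustration only.

    Attribute names deliberately break PEP-8 (camelCase) because they
    mirror the XML attribute names and must round-trip unchanged.
    """

    def __init__(self, horizontalCentered=None, gridLines=None):
        # Values follow Python conventions: True / False / None,
        # not the strings "1", "0" or "" found in the XML.
        self.horizontalCentered = horizontalCentered
        self.gridLines = gridLines

    def to_attrs(self):
        """Return a dict of XML attributes, omitting anything that is None."""
        attrs = {}
        for name in ("horizontalCentered", "gridLines"):
            value = getattr(self, name)
            if value is None:
                continue  # unset attributes are simply not written
            attrs[name] = "1" if value else "0"
        return attrs
```

Conversion to the XML representation happens only at the serialisation boundary, so the rest of the code can test values with plain `if options.gridLines:`.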
Getting the source¶
The source code is hosted on bitbucket.org. You can get it using a Mercurial client and the following URL.
$ hg clone https://bitbucket.org/openpyxl/openpyxl
$ hg up 2.4
$ virtualenv openpyxl
$ cd openpyxl
$ source bin/activate
$ pip install -U -r requirements.txt
$ python setup.py develop
Contributions without tests will not be accepted.
We use pytest as the test runner with pytest-cov for coverage information and pytest-flakes for static code analysis.
The goal is 100 % coverage for unit tests, i.e. data types and utility functions. Coverage information can be obtained using:
py.test --cov openpyxl
Tests should preferably be at package / module level, e.g. openpyxl/cell. This makes testing and getting statistics for code under development easier:
py.test --cov openpyxl/cell openpyxl/cell
Use the openpyxl.tests.helper.compare_xml function to compare
generated and expected fragments of XML.
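The helper itself is not reproduced here. The following is a stdlib-only sketch of the same idea, comparing two fragments after canonicalisation so that attribute order and indentation do not matter. The function name `fragments_match` and the sample fragments are assumptions made for this example, and `xml.etree.ElementTree.canonicalize` requires Python 3.8 or later.

```python
from xml.etree.ElementTree import canonicalize


def fragments_match(generated, expected):
    """Return True if two XML fragments are equal after canonicalisation.

    Canonical XML sorts attributes, and strip_text=True drops whitespace
    around text content, so purely cosmetic differences are ignored.
    """
    return (canonicalize(generated, strip_text=True)
            == canonicalize(expected, strip_text=True))


# Sample fragments (illustrative only): same content, different layout.
generated = '<c r="A1" t="n"><v>4</v></c>'
expected = '<c t="n" r="A1">\n  <v>4</v>\n</c>'
```

In a test this would be used as `assert fragments_match(generated, expected)`. Unlike a diff-producing helper, this sketch only reports whether the fragments match, not where they differ.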
When working on code to generate XML it is possible to validate that the generated XML conforms to the published specification. Note, this won’t necessarily guarantee that everything is fine but is preferable to reverse engineering!
Along with the SDK, Microsoft also has a “Productivity Tool” for working with Office OpenXML.
This allows you to quickly inspect or compare whole Excel files. Unfortunately, validation errors contain many false positives. The tool also contains links to the specification and implementers’ notes.
Please see Testing on Windows for additional information on setting up and testing on Windows.
Contributions in the form of pull requests are always welcome. Don’t forget to add yourself to the list of authors!
Branch naming convention¶
We use a “major.minor.patch” numbering system, e.g. 2.4.7. Development branches are named after “major.minor” releases. In general, API changes will only happen in major releases, but there will be exceptions. Always communicate API changes to the mailing list before making them. If you are changing an API, try to implement a fallback (with a deprecation warning) for the old behaviour.
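A fallback with a deprecation warning can be as small as the sketch below. The function names `read_rows` and `load_rows` are hypothetical and stand in for a renamed API; only the standard `warnings` module is used.

```python
import warnings


def read_rows(source):
    """New API (hypothetical): return the rows of `source` as a list."""
    return list(source)


def load_rows(source):
    """Old API (hypothetical), kept as a fallback for existing callers."""
    warnings.warn(
        "load_rows() is deprecated, use read_rows() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return read_rows(source)
```

Existing code calling `load_rows()` keeps working and gets the same result, while the warning tells its authors which function to move to.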
The “default branch” is used for releases and always has changes from a development branch merged in. It should never be the target for a pull request.
Pull requests should be submitted to the current, unreleased development branch. E.g. if the current release is 2.4.7, pull requests should be made to the 2.4 branch. Exceptions are bug fixes to released versions, which should be made to the relevant release branch and merged upstream into development.
Please use tox to test your code before making a pull request. This is especially important for picking up problems across Python versions.
Remember to update the documentation when adding or changing features. Check that the documentation is syntactically correct:
tox -e doc
Benchmarking and profiling are ongoing tasks. Contributions to these are very welcome as we know there is a lot to do.
There is a tox profile for long-running memory benchmarks using the memory_utils package.
tox -e memory
As openpyxl does not include any internal memory benchmarking tools, the
Python pympler package was used during the testing of styles to profile the
memory usage in read_style_table():
# in openpyxl/reader/style.py
from pympler import muppy, summary

def read_style_table(xml_source):
    ...
    if cell_xfs is not None:  # ~ line 47
        initialState = summary.summarize(muppy.get_objects())  # Capture the initial state
        for index, cell_xfs_node in enumerate(cell_xfs_nodes):
            ...
            table[index] = new_style
        finalState = summary.summarize(muppy.get_objects())  # Capture the final state
        diff = summary.get_diff(initialState, finalState)  # Compare
        summary.print_(diff)
pympler.summary.print_() prints to the console a report of object
memory usage, allowing the comparison of different methods and examination of
memory usage. A useful future development would be to construct a
benchmarking package to measure the performance of different components.
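As a starting point for such a package, the same before/after comparison can be sketched with the standard library's `tracemalloc` module in place of pympler. This is a swapped-in technique, not what was used above, and the names `measure_allocations` and `build_table` are hypothetical.

```python
import tracemalloc


def measure_allocations(func, *args, **kwargs):
    """Run func and return (result, net bytes allocated during the call)."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()   # capture the initial state
        result = func(*args, **kwargs)
        after = tracemalloc.take_snapshot()    # capture the final state
    finally:
        tracemalloc.stop()
    # Compare the two snapshots, grouped by source line, and sum the growth.
    diff = after.compare_to(before, "lineno")
    grown = sum(stat.size_diff for stat in diff)
    return result, grown


def build_table(n):
    """Stand-in workload (hypothetical): a dict of n small tuples."""
    return {i: (i, str(i)) for i in range(n)}
```

Calling `measure_allocations(build_table, 10000)` returns the table together with a byte count that can be compared between alternative implementations. The absolute figure includes some tracing overhead, so it is best read as a relative measure.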