Performance

openpyxl attempts to balance functionality and performance. Where in doubt, we have focused on functionality over optimisation: performance tweaks are easier once an API has been established. Memory use is fairly high in comparison with other libraries and applications and is approximately 50 times the original file size, e.g. 2.5 GB for a 50 MB Excel file. As many use cases involve either only reading or writing files, the Optimised Modes modes mean this is less of a problem.

Benchmarks

All benchmarks are synthetic and extremely dependent upon the hardware but they can nevertheless give an indication.

Write Performance

The benchmark code can be adjusted to use more sheets and adjust the proportion of data that is strings. Because the version of Python being used can also significantly affect performance, a driver script can also be used to test with different Python versions with a tox environment.

Performance is compared with the excellent alternative library xlsxwriter


Versions:
python: 2.7.1
openpyxl: 2.6.0dev
xlsxwriter: 1.0.9

Dimensions:
    Rows = 1000
    Cols = 50
    Sheets = 4
    Proportion text = 0.10

Times:
    xlsxwriter            :   2.45
    xlsxwriter (optimised):   2.64
    openpyxl              :   3.96
    openpyxl (optimised)  :   2.78


Versions:
python: 3.5.6
openpyxl: 2.6.0dev
xlsxwriter: 1.0.9

Dimensions:
    Rows = 1000
    Cols = 50
    Sheets = 4
    Proportion text = 0.10

Times:
    xlsxwriter            :   2.29
    xlsxwriter (optimised):   2.22
    openpyxl              :   4.35
    openpyxl (optimised)  :   2.90


Versions:
python: 3.6.6
openpyxl: 2.6.0dev
xlsxwriter: 1.0.9

Dimensions:
    Rows = 1000
    Cols = 50
    Sheets = 4
    Proportion text = 0.10

Times:
    xlsxwriter            :   2.32
    xlsxwriter (optimised):   2.22
    openpyxl              :   3.35
    openpyxl (optimised)  :   2.64


Versions:
python: 3.7.0
openpyxl: 2.6.0dev
xlsxwriter: 1.0.9

Dimensions:
    Rows = 1000
    Cols = 50
    Sheets = 4
    Proportion text = 0.10

Times:
    xlsxwriter            :   2.34
    xlsxwriter (optimised):   2.23
    openpyxl              :   2.93
    openpyxl (optimised)  :   2.49

Read Performance

Performance is measured using a file provided with a previous bug report and compared with the older xlrd library. xlrd is primarily for the older BIFF file format of .XLS files but it does have limited support for XLSX.

The code for the benchmark shows the importance of choosing the right options when working with a file. In this case disabling external links stops openpyxl opening cached copies of the linked worksheets.

One major difference between the libraries is that openpyxl’s read-only mode opens a workbook almost immediately making it suitable for multiple processes, this also readuces memory use significantly. xlrd does also not automatically convert dates and times into Python datetimes, though it does annotate cells accordingly but to do this in client code significantly reduces performance.


Versions:
python: 3.6.7
xlread: 1.1.0
openpyxl: 3.0.0dev

xlrd
    Workbook loaded 59.81s
    OptimizationData 0.24s
    Output Model 0.00s
    >>DATA>> 0.00s
    Store days 0% 0.08s
    Store days 100% 0.07s
    Total time 60.20s

openpyxl
    Workbook loaded 103.85s
    OptimizationData 0.00s
    Output Model 0.00s
    >>DATA>> 0.00s
    Store days 0% 0.00s
    Store days 100% 0.00s
    Total time 103.85s

openpyxl, read-only
    Workbook loaded 1.20s
    OptimizationData 28.57s
    Output Model 0.00s
    >>DATA>> 0.00s
    Store days 0% 23.92s
    Store days 100% 18.53s
    Total time 72.22s

openpyxl, read-only, values only
    Workbook loaded 1.10s
    OptimizationData 21.50s
    Output Model 0.00s
    >>DATA>> 0.00s
    Store days 0% 21.94s
    Store days 100% 16.96s
    Total time 61.51s


Versions:
python: 3.7.1
xlread: 1.1.0
openpyxl: 3.0.0dev

xlrd
    Workbook loaded 49.32s
    OptimizationData 0.22s
    Output Model 0.00s
    >>DATA>> 0.00s
    Store days 0% 0.07s
    Store days 100% 0.06s
    Total time 49.68s

openpyxl
    Workbook loaded 88.26s
    OptimizationData 0.00s
    Output Model 0.00s
    >>DATA>> 0.00s
    Store days 0% 0.00s
    Store days 100% 0.00s
    Total time 88.26s

openpyxl, read-only
    Workbook loaded 0.93s
    OptimizationData 24.37s
    Output Model 0.00s
    >>DATA>> 0.00s
    Store days 0% 21.95s
    Store days 100% 17.08s
    Total time 64.33s

openpyxl, read-only, values only
    Workbook loaded 0.95s
    OptimizationData 19.62s
    Output Model 0.00s
    >>DATA>> 0.00s
    Store days 0% 19.80s
    Store days 100% 15.39s
    Total time 55.77s

Parallelisation

Reading worksheets is fairly CPU-intensive which limits any benefits to be gained by parallelisation. However, if you are mainly interested in dumping the contents of a workbook then you can use openpyxl’s read-only mode and open multiple instances of a workbook and take advantage of multiple CPUs.

Sample code using the same source file as for read performance shows that performance scales reasonably with only a slight overhead due to creating additional Python processes.