Performance¶
openpyxl attempts to balance functionality and performance. Where in doubt, we have focused on functionality over optimisation: performance tweaks are easier once an API has been established. Memory use is fairly high in comparison with other libraries and applications and is approximately 50 times the original file size, e.g. 2.5 GB for a 50 MB Excel file. As many use cases involve either only reading or writing files, the Optimised Modes modes mean this is less of a problem.
Benchmarks¶
All benchmarks are synthetic and extremely dependent upon the hardware but they can nevertheless give an indication.
Write Performance¶
The benchmark code can be adjusted to use more sheets and adjust the proportion of data that is strings. Because the version of Python being used can also significantly affect performance, a driver script can also be used to test with different Python versions with a tox environment.
Performance is compared with the excellent alternative library xlsxwriter
Versions:
python: 2.7.1
openpyxl: 2.6.0dev
xlsxwriter: 1.0.9
Dimensions:
Rows = 1000
Cols = 50
Sheets = 4
Proportion text = 0.10
Times:
xlsxwriter : 2.45
xlsxwriter (optimised): 2.64
openpyxl : 3.96
openpyxl (optimised) : 2.78
Versions:
python: 3.5.6
openpyxl: 2.6.0dev
xlsxwriter: 1.0.9
Dimensions:
Rows = 1000
Cols = 50
Sheets = 4
Proportion text = 0.10
Times:
xlsxwriter : 2.29
xlsxwriter (optimised): 2.22
openpyxl : 4.35
openpyxl (optimised) : 2.90
Versions:
python: 3.6.6
openpyxl: 2.6.0dev
xlsxwriter: 1.0.9
Dimensions:
Rows = 1000
Cols = 50
Sheets = 4
Proportion text = 0.10
Times:
xlsxwriter : 2.32
xlsxwriter (optimised): 2.22
openpyxl : 3.35
openpyxl (optimised) : 2.64
Versions:
python: 3.7.0
openpyxl: 2.6.0dev
xlsxwriter: 1.0.9
Dimensions:
Rows = 1000
Cols = 50
Sheets = 4
Proportion text = 0.10
Times:
xlsxwriter : 2.34
xlsxwriter (optimised): 2.23
openpyxl : 2.93
openpyxl (optimised) : 2.49
Read Performance¶
Performance is measured using a file provided with a previous bug report and compared with the older xlrd library. xlrd is primarily for the older BIFF file format of .XLS files but it does have limited support for XLSX.
The code for the benchmark shows the importance of choosing the right options when working with a file. In this case disabling external links stops openpyxl opening cached copies of the linked worksheets.
One major difference between the libraries is that openpyxl’s read-only mode opens a workbook almost immediately making it suitable for multiple processes, this also readuces memory use significantly. xlrd does also not automatically convert dates and times into Python datetimes, though it does annotate cells accordingly but to do this in client code significantly reduces performance.
Versions:
python: 2.7.1
xlread: 1.1.0
openpyxl: 2.6.0dev
xlrd
Workbook loaded 66.72s
OptimizationData 0.19s
Output Model 0.00s
>>DATA>> 0.00s
Store days 0% 0.07s
Store days 100% 0.06s
Total time 67.04s
openpyxl
Workbook loaded 106.64s
OptimizationData 0.00s
Output Model 0.00s
>>DATA>> 0.00s
Store days 0% 0.00s
Store days 100% 0.00s
Total time 106.64s
openpyxl, read-only
Workbook loaded 0.97s
OptimizationData 24.77s
Output Model 0.00s
>>DATA>> 0.00s
Store days 0% 24.19s
Store days 100% 19.18s
Total time 69.11s
openpyxl, read-only, values only
Workbook loaded 0.95s
OptimizationData 21.84s
Output Model 0.00s
>>DATA>> 0.00s
Store days 0% 21.94s
Store days 100% 17.09s
Total time 61.82s
Versions:
python: 3.5.6
xlread: 1.1.0
openpyxl: 2.6.0dev
xlrd
Workbook loaded 67.13s
OptimizationData 0.24s
Output Model 0.00s
>>DATA>> 0.00s
Store days 0% 0.08s
Store days 100% 0.07s
Total time 67.52s
openpyxl
Workbook loaded 115.50s
OptimizationData 0.00s
Output Model 0.00s
>>DATA>> 0.00s
Store days 0% 0.00s
Store days 100% 0.00s
Total time 115.50s
openpyxl, read-only
Workbook loaded 1.25s
OptimizationData 38.46s
Output Model 0.00s
>>DATA>> 0.00s
Store days 0% 29.54s
Store days 100% 22.78s
Total time 92.04s
openpyxl, read-only, values only
Workbook loaded 1.30s
OptimizationData 27.08s
Output Model 0.00s
>>DATA>> 0.00s
Store days 0% 27.09s
Store days 100% 21.13s
Total time 76.59s
Versions:
python: 3.6.7
xlread: 1.1.0
openpyxl: 2.6.0dev
xlrd
Workbook loaded 52.04s
OptimizationData 0.23s
Output Model 0.00s
>>DATA>> 0.00s
Store days 0% 0.08s
Store days 100% 0.07s
Total time 52.42s
openpyxl
Workbook loaded 91.79s
OptimizationData 0.00s
Output Model 0.00s
>>DATA>> 0.00s
Store days 0% 0.00s
Store days 100% 0.00s
Total time 91.79s
openpyxl, read-only
Workbook loaded 1.08s
OptimizationData 25.53s
Output Model 0.00s
>>DATA>> 0.00s
Store days 0% 23.02s
Store days 100% 17.97s
Total time 67.61s
openpyxl, read-only, values only
Workbook loaded 1.08s
OptimizationData 20.90s
Output Model 0.01s
>>DATA>> 0.00s
Store days 0% 21.05s
Store days 100% 16.15s
Total time 59.20s
Versions:
python: 3.7.1
xlread: 1.1.0
openpyxl: 2.6.0dev
xlrd
Workbook loaded 49.78s
OptimizationData 0.22s
Output Model 0.00s
>>DATA>> 0.00s
Store days 0% 0.07s
Store days 100% 0.06s
Total time 50.13s
openpyxl
Workbook loaded 88.81s
OptimizationData 0.00s
Output Model 0.00s
>>DATA>> 0.00s
Store days 0% 0.00s
Store days 100% 0.00s
Total time 88.81s
openpyxl, read-only
Workbook loaded 0.94s
OptimizationData 21.73s
Output Model 0.00s
>>DATA>> 0.00s
Store days 0% 24.94s
Store days 100% 17.21s
Total time 64.82s
openpyxl, read-only, values only
Workbook loaded 0.97s
OptimizationData 19.94s
Output Model 0.00s
>>DATA>> 0.00s
Store days 0% 19.88s
Store days 100% 15.42s
Total time 56.20s
Parallelisation¶
Reading worksheets is fairly CPU-intensive which limits any benefits to be gained by parallelisation. However, if you are mainly interested in dumping the contents of a workbook then you can use openpyxl’s read-only mode and open multiple instances of a workbook and take advantage of multiple CPUs.
Sample code using the same source file as for read performance shows that performance scales reasonably with only a slight overhead due to creating additional Python processes.