Xarray is a package for labeled arrays. If you use plt.hist to make a histogram of a DataArray, the speed depends heavily on how you call it:
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
nPoints = 100000
data = xr.DataArray(np.random.random(nPoints),dims=['time'],coords=[np.arange(nPoints)])
It takes only a few milliseconds if you use either of the following:
plt.figure()
%time data.plot.hist()
plt.figure()
%time plt.hist(data.values)
However, if you omit .values, the call takes about two minutes:
In [12]: %time plt.hist(data)
CPU times: user 2min, sys: 9.73 s, total: 2min 9s
Wall time: 2min 3s
%prun suggests that the OrderedDict class is to blame:
145056729 function calls (144455882 primitive calls) in 198.255 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1800009 11.602 0.000 29.587 0.000 collections.py:50(__init__)
2800014 9.536 0.000 24.279 0.000 _abcoll.py:548(update)
400002 5.934 0.000 30.815 0.000 variable.py:334(__getitem__)
10200067 5.499 0.000 5.499 0.000 collections.py:90(__iter__)
13203051 5.330 0.000 13.624 0.000 {isinstance}
6600104 5.002 0.000 5.002 0.000 {hasattr}
2400082 4.842 0.000 7.813 0.000 abc.py:128(__instancecheck__)
2400012 4.821 0.000 4.821 0.000 collections.py:71(__setitem__)
6200102 4.821 0.000 4.821 0.000 _weakrefset.py:70(__contains__)
4400022 4.449 0.000 4.449 0.000 common.py:196(__setattr__)
200001 4.378 0.000 12.152 0.000 base.py:124(__new__)
1000005 4.308 0.000 5.123 0.000 indexing.py:10(expanded_indexer)
200001 4.304 0.000 51.921 0.000 dataset.py:878(isel)
200001 3.624 0.000 15.635 0.000 alignment.py:108(align_variables)
400002 3.434 0.000 8.928 0.000 dataset.py:66(_calculate_dims)
200001 3.347 0.000 9.288 0.000 dataset.py:470(_construct_dataarray)
600003 3.216 0.000 6.535 0.000 variable.py:87(as_compatible_data)
200001 3.142 0.000 99.663 0.000 merge.py:116(merge_datasets)
200001 3.044 0.000 12.129 0.000 coordinates.py:198(_to_dataset)
200001 2.740 0.000 92.687 0.000 merge.py:101(_merge_dataset_with_dict)
2800014 2.714 0.000 5.114 0.000 abc.py:148(__subclasscheck__)
400002 2.710 0.000 34.370 0.000 variable.py:493(isel)
3400017 2.651 0.000 4.377 0.000 collections.py:138(iteritems)
2400018 2.266 0.000 3.617 0.000 utils.py:355(ndim)
200001 2.028 0.000 18.420 0.000 dataset.py:221(_update_vars_and_coords)
400002 1.954 0.000 48.424 0.000 alignment.py:37(_join_indexes)
600003 1.792 0.000 14.645 0.000 variable.py:192(__init__)
4000844 1.712 0.000 1.713 0.000 {getattr}
400002 1.675 0.000 3.975 0.000 dataset.py:367(_construct_direct)
400002 1.624 0.000 44.523 0.000 alignment.py:28(_get_all_indexes)
200001 1.607 0.000 48.978 0.000 alignment.py:95(partial_align)
200001 1.585 0.000 17.709 0.000 merge.py:82(_merge_expand)
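For anyone who wants to produce a comparable report outside IPython, here is a minimal sketch using only the standard library. The helper name `profile_by_internal_time` and the stand-in workload `slow_elementwise_sum` are hypothetical and not part of xarray or matplotlib; the workload only imitates the pattern of many tiny calls seen above, where the call counts (for example 200001 calls to `isel` for 100000 points) grow with the array length.

```python
import cProfile
import io
import pstats


def profile_by_internal_time(func, *args, limit=10, **kwargs):
    """Run func under cProfile; return (result, report) sorted by tottime."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "tottime" is the internal-time ordering shown in the report above.
    stats.sort_stats("tottime").print_stats(limit)
    return result, buffer.getvalue()


def slow_elementwise_sum(n):
    # Hypothetical stand-in: one small function call per element.
    total = 0
    for i in range(n):
        total += abs(i)
    return total


result, report = profile_by_internal_time(slow_elementwise_sum, 1000)
print(report)
```

To profile the slow case itself, one would pass `plt.hist` and the DataArray instead of the stand-in, for example `profile_by_internal_time(plt.hist, data)`.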
If one forgets to add .values or to use xarray's plot routine, one can be stuck for a long time. I have to kill Python 1-2 times per day because of this issue.
I have matplotlib 1.5.1 and xarray 0.7.2 on OSX/anaconda.
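Until this is resolved, a small guard can make the fast path the default. The following is only a sketch of a possible workaround, not a fix in either library: `as_plain_values` is a hypothetical helper that returns the `.values` attribute when an object has one (as a DataArray does) and otherwise returns the object unchanged.

```python
def as_plain_values(obj):
    """Return obj.values if it is a data attribute, otherwise obj itself.

    Assumption: labeled containers such as xarray.DataArray expose their
    underlying array as a non-callable `.values` attribute.
    """
    values = getattr(obj, "values", None)
    # Skip objects without `.values`, and ones like dict where it is a method.
    if values is None or callable(values):
        return obj
    return values
```

With this helper, `plt.hist(as_plain_values(data))` should take the same fast path as `plt.hist(data.values)`, and plain lists or NumPy arrays pass through untouched.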