I ran a simple experiment to compare Erin Sheldon's correlation function with mine/Alexia's. I modified my correlation function so that it only calculated the data-data (DD) pair counts, did the same with Erin's, and ran both on a set of 53,071 objects. For simplicity I did an auto-correlation (both data sets were the same).
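For concreteness, "data-data pair counts" here just means histogramming the pairwise angular separations of the catalog against itself. A minimal brute-force sketch of that operation (illustrative only, not either of the actual codes; the O(N^2) loop is fine for cross-checking results but not for production):

    import numpy as np

    def dd_pair_counts(ra, dec, bins_deg):
        """Brute-force DD pair counts binned by angular separation.

        ra, dec are in degrees; bins_deg are bin edges in degrees.
        O(N^2), so only useful for validating a faster code.
        """
        ra, dec = np.radians(ra), np.radians(dec)
        counts = np.zeros(len(bins_deg) - 1, dtype=np.int64)
        for i in range(len(ra) - 1):
            # angular separation between point i and all later points
            cos_sep = (np.sin(dec[i]) * np.sin(dec[i + 1:]) +
                       np.cos(dec[i]) * np.cos(dec[i + 1:]) *
                       np.cos(ra[i] - ra[i + 1:]))
            sep = np.degrees(np.arccos(np.clip(cos_sep, -1.0, 1.0)))
            counts += np.histogram(sep, bins=bins_deg)[0]
        return counts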
For Erin's:
Start time = 16:01:53.98
End time = 16:09:11.49
Run time = 437.51 seconds

For mine:
Start time = 16:24:27.99
End time = 16:26:12.03
Run time = 104.04 seconds
The pair counts match pretty well, except at the ends, which I believe is just a binning effect.
I'm kind of confused about why my correlation function is ~4X faster on this data set, when the ball-park numbers Erin and I were comparing before implied that his should run much faster. I'm going to check whether the two codes scale differently with the number of points by running on a larger data set.
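One way to check the scaling is to time each code on nested subsets of increasing size and watch how the ratio moves. A sketch (random points stand in for the catalog, and dd_pair_counts from the sketch above stands in for either code):

    import time
    import numpy as np

    rng = np.random.default_rng(42)
    bins = np.linspace(0.0, 10.0, 21)
    for n in (1000, 2000, 4000, 8000):
        ra = rng.uniform(0.0, 360.0, n)
        dec = np.degrees(np.arcsin(rng.uniform(-1.0, 1.0, n)))  # uniform on the sphere
        t0 = time.time()
        dd_pair_counts(ra, dec, bins)
        elapsed = time.time() - t0
        print(n, elapsed, elapsed / n**2)  # elapsed/n^2 stays ~constant for an O(N^2) code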
So the default depth of the tree is depth=10. It seems that a lower depth, and thus lower resolution, is better for these very large 10-degree search angles. Some timing numbers (see the table at the end of this comment) show that depth=6 is best. You would set the depth like this:
    h = esutil.htm.HTM(depth)
Still, there is not much speedup. When we were talking at the meeting, you had said it took "days" to compute these, so maybe we were just talking about different regimes. Also, I didn't realize you were working at 10-degree angles. Your best speedup might be to work with physical scales so you optimize your search radius. This is easy to do with the bincount() code, as sketched below.
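If it helps, here is roughly what a physical-scale call might look like. The argument order and the meaning of scale are my assumptions about the bincount() interface, so check the esutil docs for your version; the catalog file and distances are placeholders:

    import numpy as np
    import esutil

    ra, dec = np.loadtxt('catalog.txt', unpack=True)  # placeholder catalog
    h = esutil.htm.HTM(6)  # depth=6, per the timings below

    # Hypothetical per-object distances; as I read the docs, passing
    # `scale` makes rmin/rmax physical separations instead of degrees,
    # so the angular search radius shrinks for more distant objects.
    scale = np.full(ra.size, 1000.0)

    # Assumed argument order: rmin, rmax, nbin, then both coordinate sets.
    counts = h.bincount(0.1, 10.0, 12, ra, dec, ra, dec, scale=scale)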
depth   seconds
3        137.2
4        115.0
5        104.6
6         99.6
7        102.7
8        119.9
9        185.1
10       460.6
11      1502.4
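A scan like the table above can be reproduced with a small loop over depths. Sketch only: the match() return convention (two index arrays plus separations) and maxmatch=0 meaning "keep all matches" are my assumptions about the esutil API, and the catalog file is a placeholder:

    import time
    import numpy as np
    import esutil

    ra, dec = np.loadtxt('catalog.txt', unpack=True)  # placeholder catalog

    # time an auto-match at the 10-degree search radius for each tree depth
    for depth in range(3, 12):
        h = esutil.htm.HTM(depth)
        t0 = time.time()
        m1, m2, sep = h.match(ra, dec, ra, dec, 10.0, maxmatch=0)
        print(depth, time.time() - t0)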