Wednesday, April 7, 2010

Speeding Up the Correlation Function

I ran another comparison between Sheldon's correlation function and mine on a larger data set. Erin had suggested that I experiment with the depth (as I understand it, this sets the depth of the tree his function uses, and so its resolution). Below is an email from him:

So the default depth of the tree is depth=10. It seems that a lower depth, and thus lower resolution, is better for these very large 10-degree search angles. Here are some timing numbers that show depth=6 is best. You would do this to set your depth:


depth   seconds
3       137.153673172
4       114.97971487
5       104.600896835
6       99.5818519592
7       102.73042202
8       119.931453943
9       185.077499866
10      460.641659021
11      1502.35429096
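
To put these timings in perspective, here is a small sketch that picks out the fastest depth and its speedup over the default depth=10. The numbers are copied straight from Erin's table; the variable names are only for illustration.

```python
# Timings in seconds, copied from Erin's table, keyed by tree depth.
timings = {
    3: 137.153673172,
    4: 114.97971487,
    5: 104.600896835,
    6: 99.5818519592,
    7: 102.73042202,
    8: 119.931453943,
    9: 185.077499866,
    10: 460.641659021,
    11: 1502.35429096,
}

default_depth = 10  # default quoted in the email

# Depth with the smallest run time, and how much faster it is than the default.
best_depth = min(timings, key=timings.get)
speedup = timings[default_depth] / timings[best_depth]

print(f"best depth = {best_depth}, about {speedup:.1f}x faster than depth={default_depth}")
```

These timings are for the 10-degree search, so the best depth may well be different for a smaller search radius.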

Erin also thinks I am correlating out to too large an angle. Alexia did a quick calculation and agrees with this assessment. This is another reason why my correlation functions are taking so long. Erin suggests that I try working in physical space (Mpc) instead of angular space:

Another thing to try is working in physical space. 10 degrees is definitely too large in angle. If you are, for example, willing to work at 30 Mpc you could get big speedups. You know the redshifts of the spectroscopic sample of course. So you just bin in physical projected separation as defined by:

r = d * angle

where d is the angular diameter distance in Mpc and angle is in radians. The bincount function is set up to do this. If you send the scale= keyword then the rmin,rmax arguments will be in units of the scale*angle.

Let's say you have the spectroscopic sample: ra1, dec1, z1. And the second sample is the photometric sample: ra2, dec2.

import esutil

# you can also give omega_m etc. here, see docs
cosmo = esutil.cosmology.Cosmo()

# get angular diameter distance
# to points in set 1
d = cosmo.Da(0.0, z1)

rmin = 0.025 # Note units in Mpc
rmax = 30 # Mpc
nbin = 25

# h is the esutil.htm.HTM object created earlier with the chosen depth
rlower,rupper,counts = h.bincount(rmin,rmax,nbin,ra1,dec1,ra2,dec2,scale=d)

Now rlower,rupper are also in Mpc.
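
To get a feel for what angle a fixed physical scale corresponds to, here is a rough standard-library sketch of r = d * angle turned around to give the angle. It does not use esutil. It assumes a flat LambdaCDM cosmology with H0 = 70 km/s/Mpc and omega_m = 0.3, which are my placeholder values and not necessarily the esutil defaults. The function names are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def angular_diameter_distance(z, h0=70.0, omega_m=0.3, steps=1000):
    """Angular diameter distance in Mpc for a flat LambdaCDM cosmology.

    h0 and omega_m are assumed values, not taken from the post.
    """
    omega_l = 1.0 - omega_m

    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)

    # Trapezoidal integration of 1/E(z') from 0 to z.
    dz = z / steps
    integral = 0.5 * (inv_e(0.0) + inv_e(z))
    for i in range(1, steps):
        integral += inv_e(i * dz)
    integral *= dz

    comoving = (C_KM_S / h0) * integral  # line-of-sight comoving distance
    return comoving / (1.0 + z)


def search_angle_deg(r_mpc, z):
    """Angle in degrees subtending projected separation r_mpc at redshift z (z > 0)."""
    return math.degrees(r_mpc / angular_diameter_distance(z))


for z in (0.1, 0.3, 0.5):
    print(f"z = {z}: 30 Mpc subtends about {search_angle_deg(30.0, z):.2f} degrees")
```

Under these assumptions the angle needed for a fixed 30 Mpc shrinks as the redshift grows, which is why a single 10-degree search radius is wasteful for most of the spectroscopic sample.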

I did a bigger run on 356,915 objects out to 10 degrees. Here are the numbers:

For Erin's code
Start time = 12:59:24.71
End time = 13:32:24.96
Run time = 1980.25 seconds ≈ 33 minutes

For my code
Start time = 12:57:33.83
End time = 13:40:35.13
Run time = 2581.30 seconds ≈ 43 minutes
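
As a quick check of these run times, a few lines of Python reproduce the elapsed seconds from the start and end times and give the ratio between the two codes. The helper name elapsed_seconds is just for illustration, and it assumes both times fall on the same day.

```python
from datetime import datetime


def elapsed_seconds(start, end, fmt="%H:%M:%S.%f"):
    """Seconds between two same-day wall-clock times given as HH:MM:SS.ff strings."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Start and end times copied from the two runs above.
erin = elapsed_seconds("12:59:24.71", "13:32:24.96")
mine = elapsed_seconds("12:57:33.83", "13:40:35.13")

print(f"Erin's: {erin:.2f} s, mine: {mine:.2f} s, ratio mine/Erin's: {mine / erin:.2f}")
```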

A few things to think about:
  • Should I change the correlation angle as a function of the spectroscopic redshift? Can the reconstruction handle this?
  • I am currently running the DD, DR, RD, RR correlations all in succession, but they could be run simultaneously and this could in theory speed up the calculation.
  • What are the correct angles and comoving distances I should be correlating out to? Obviously going out to 20 degrees is too much. Alexia's quick calculation estimates that I only need to go out to 6 degrees, but this varies with redshift.
  • Sheldon's code does run faster and has more functionality. Should I use it from now on?
  • The 3D correlation function is always an auto-correlation function, so there is no need to calculate both DR and RD. I could change this code to run faster by calculating the correlation as (DD - 2DR + RR)/RR. Of course, RR has 25X more pairs than DD and 5X more than DR (I am oversampling by 5), so most of the calculation time is spent there.
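
The estimator in the last bullet can be sketched as follows. This is a minimal illustration, assuming the standard Landy-Szalay form (DD - 2DR + RR)/RR with each pair count normalized by the total number of possible pairs (unique pairs for DD and RR, all data-random pairs for DR). The function name, its arguments, and the toy counts are hypothetical and not taken from either code.

```python
def landy_szalay(dd, dr, rr, n_data, n_random):
    """Landy-Szalay estimate per bin from raw pair counts.

    dd, dr, rr: lists of pair counts per separation bin.
    Assumes dd and rr count each unique pair once and dr counts every
    data-random pair once; adjust the normalizations if your code differs.
    """
    dd_norm = n_data * (n_data - 1) / 2.0
    dr_norm = float(n_data * n_random)
    rr_norm = n_random * (n_random - 1) / 2.0

    xi = []
    for dd_i, dr_i, rr_i in zip(dd, dr, rr):
        dd_n = dd_i / dd_norm
        dr_n = dr_i / dr_norm
        rr_n = rr_i / rr_norm
        xi.append((dd_n - 2.0 * dr_n + rr_n) / rr_n)
    return xi


# Toy counts (made up): 10 data points, randoms oversampled by 5.
n_data, n_random = 10, 50
xi = landy_szalay(dd=[45, 90], dr=[500, 500], rr=[1225, 1225],
                  n_data=n_data, n_random=n_random)
print(xi)
```

In the first toy bin the normalized counts are all equal, so the estimate is zero (no clustering); in the second bin DD is doubled, so the estimate is positive.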

Happy Birthday Toni and Cleo
