I've spent most of the day poring over the 3D correlation function code, trying to figure out the best way to convert it to an angular correlation function. Here is how the 3D correlation function works, and my plan to change it:
There are two galaxy files which we are going to correlate, file S and file P. The angular region covered by the galaxies in each file is the same, and for simplicity's sake let's assume it is the whole sky (in actuality it is some mask of RA and Dec regions). The sky is divided up into a grid using lines that are evenly spaced in RA and Dec. The number of grid lines is set by the user.
All the galaxies in the S file are split up into the grid, such that for every grid section there is a list (or chain) of the galaxies contained in that grid.
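The gridding step can be sketched roughly as follows. This is a minimal illustration, not the actual code: the function name `build_grid`, the (RA, Dec) tuples in degrees, and the full-sky grid are all my own assumptions.

```python
from collections import defaultdict

def build_grid(galaxies, n_ra, n_dec):
    """Bin (ra, dec) pairs in degrees into an n_ra x n_dec full-sky grid.

    Returns a dict mapping (i_ra, i_dec) -> list of galaxy indices,
    i.e. one "chain" of galaxies per grid section.
    """
    d_ra = 360.0 / n_ra    # section width in RA (hypothetical full-sky grid)
    d_dec = 180.0 / n_dec  # section height in Dec
    cells = defaultdict(list)
    for idx, (ra, dec) in enumerate(galaxies):
        # min() keeps dec = +90 inside the last Dec row
        i_ra = min(int((ra % 360.0) / d_ra), n_ra - 1)
        i_dec = min(int((dec + 90.0) / d_dec), n_dec - 1)
        cells[(i_ra, i_dec)].append(idx)
    return cells

# Made-up positions, just to show the call
galaxies_s = [(10.0, -45.0), (12.0, -44.0), (200.0, 30.0), (359.9, 90.0)]
grid_s = build_grid(galaxies_s, n_ra=36, n_dec=18)
```

Here the first two galaxies land in the same section and so share one list.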
A galaxy (X) is chosen from the P file and placed into the S file grid. The distances between X and the S galaxies in that same grid section, as well as the S galaxies in close by grid sections, are then calculated. 'Close by' sections are defined as those within the maximum correlation distance of X's section. So for instance, if we are correlating out to 20 Mpc/h and the grid sections are 5 Mpc/h across, then you go out 4 grid sections in each direction, and thus correlate in a box which is 9x9 grid sections in two dimensions (4 on either side plus the section containing X). This keeps you from calculating distances between galaxies that are farther apart than your maximum correlation distance.
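In Cartesian coordinates the 'close by' search is just an index range. Here is a small two-dimensional sketch with the numbers from the example above; `neighbor_cells` and the 50-section box are hypothetical, chosen only for illustration. Note that reaching 4 sections on each side, plus the section holding X, comes to 9 sections along each axis.

```python
import math

def neighbor_cells(cell, r_max, cell_size, n_cells):
    """Indices of all 2D grid sections that can hold a galaxy within r_max
    of some galaxy in `cell`, clipped at the edges of the box."""
    reach = math.ceil(r_max / cell_size)  # sections to go out on each side
    ix, iy = cell
    out = []
    for jx in range(max(0, ix - reach), min(n_cells, ix + reach + 1)):
        for jy in range(max(0, iy - reach), min(n_cells, iy + reach + 1)):
            out.append((jx, jy))
    return out

# 20 Mpc/h maximum distance, 5 Mpc/h sections, X sitting in section (10, 10)
cells = neighbor_cells((10, 10), r_max=20.0, cell_size=5.0, n_cells=50)
```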
Finding the 'close by' grid sections is very easy in Cartesian coordinates (as I described above), but much harder in spherical coordinates. For example, if you have two objects near the north pole, separated by a small Dec, then it essentially doesn't matter what their RA separation is; they will still have a small angular separation. On the equator, by contrast, objects with a small Dec separation but a large RA separation are on different sides of the sky. So you can't simply look at grid sections within +/- the maximum correlation angle in RA and Dec, because this breaks down as you go to the poles.
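To put numbers on this, here is a quick check using the spherical law of cosines. The function name `angular_separation` and the test positions are made up for illustration.

```python
import math

def angular_separation(ra1, dec1, ra2, dec2):
    """Angle in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    cos_g = (math.sin(dec1) * math.sin(dec2)
             + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_g))))

# Same 180 degree RA difference, very different separations on the sky
near_pole = angular_separation(0.0, 89.0, 180.0, 89.0)  # about 2 degrees
equator = angular_separation(0.0, 0.0, 180.0, 0.0)      # about 180 degrees
```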
What I suggest is to create a distance matrix which holds the angular distance (γ) from the center of each grid section to the center of every other grid section, using this formula (all angles in degrees):
cos γ = cos(90 - Dec1)cos(90 - Dec2) + sin(90 - Dec1)sin(90 - Dec2)cos(RA1 - RA2)
This matrix only needs to be calculated once per grid, and can be saved to a file, and then read in if the same grid is used again.
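A rough sketch of building and caching such a matrix is below. Everything here is an assumption on my part: the names (`cell_centers`, `build_matrix`, `load_or_build`), the full-sky grid, and the use of a JSON file for storage. The distance itself uses the formula above, with 90 - Dec as the colatitude.

```python
import json
import math
import os

def cell_centers(n_ra, n_dec):
    """(ra, dec) centers in degrees of an n_ra x n_dec full-sky grid,
    flattened so that section (i, j) has index i * n_dec + j."""
    return [((i + 0.5) * 360.0 / n_ra, -90.0 + (j + 0.5) * 180.0 / n_dec)
            for i in range(n_ra) for j in range(n_dec)]

def angular_distance(p1, p2):
    """cos g = cos(90-Dec1)cos(90-Dec2) + sin(90-Dec1)sin(90-Dec2)cos(RA1-RA2)."""
    (ra1, dec1), (ra2, dec2) = p1, p2
    c1 = math.radians(90.0 - dec1)
    c2 = math.radians(90.0 - dec2)
    cos_g = (math.cos(c1) * math.cos(c2)
             + math.sin(c1) * math.sin(c2) * math.cos(math.radians(ra1 - ra2)))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_g))))

def build_matrix(n_ra, n_dec):
    """Center-to-center angular distances (degrees) as a list of lists."""
    centers = cell_centers(n_ra, n_dec)
    return [[angular_distance(a, b) for b in centers] for a in centers]

def load_or_build(path, n_ra, n_dec):
    """Read the matrix from `path` if it exists, else build and save it.
    The caller must use a different path for each grid."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    gamma = build_matrix(n_ra, n_dec)
    with open(path, "w") as fh:
        json.dump(gamma, fh)
    return gamma

gamma = build_matrix(n_ra=12, n_dec=6)  # 72 sections -> 72 x 72 matrix
```

The matrix is symmetric, so only half of it really needs to be computed or stored; the sketch keeps the full matrix for simplicity.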
Then I follow the same correlation strategy as above, except that when finding the 'close by' grid sections, I go to this matrix, find all the sections which are within the maximum correlation angle of the current section, and then calculate distances to all galaxies within those sections. I can speed things up further by inputting objects from the P file grid section by grid section, so that once I find all the 'close by' S galaxies for galaxy X, I can reuse those same S galaxies when calculating distances for all the other P objects in the same section as X.
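Put together, the section-by-section strategy might look something like the sketch below. All names and the test positions are hypothetical. One thing I add that is not described above: since the matrix holds center-to-center distances, the cut on it is padded by a conservative bound (one section width in RA plus one section height in Dec) so that galaxies near the edges of their sections are not missed.

```python
import math
from collections import defaultdict

def ang_sep(p, q):
    """Angular separation in degrees between (ra, dec) points in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    cos_g = (math.sin(dec1) * math.sin(dec2)
             + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_g))))

def count_pairs(gals_p, gals_s, n_ra, n_dec, theta_max):
    """Count P-S pairs separated by at most theta_max degrees."""
    d_ra, d_dec = 360.0 / n_ra, 180.0 / n_dec

    def cell_of(p):
        i = min(int((p[0] % 360.0) / d_ra), n_ra - 1)
        j = min(int((p[1] + 90.0) / d_dec), n_dec - 1)
        return i * n_dec + j

    centers = [((i + 0.5) * d_ra, -90.0 + (j + 0.5) * d_dec)
               for i in range(n_ra) for j in range(n_dec)]
    # Center-to-center distance matrix (would be read from file in practice)
    gamma = [[ang_sep(a, b) for b in centers] for a in centers]

    grid_s, grid_p = defaultdict(list), defaultdict(list)
    for s in gals_s:
        grid_s[cell_of(s)].append(s)
    for p in gals_p:
        grid_p[cell_of(p)].append(p)

    # Assumption: each galaxy lies within (d_ra + d_dec) / 2 of its section
    # center, so two sections together need a pad of d_ra + d_dec.
    pad = d_ra + d_dec
    n_pairs = 0
    for cell, p_members in grid_p.items():
        # Find the 'close by' S galaxies once per P section ...
        candidates = [s for c, members in grid_s.items()
                      if gamma[cell][c] <= theta_max + pad
                      for s in members]
        # ... and reuse them for every P galaxy in that section
        for p in p_members:
            n_pairs += sum(1 for s in candidates if ang_sep(p, s) <= theta_max)
    return n_pairs

# Made-up positions: one pair near the pole, one across RA = 0, one ordinary
gals_p = [(0.5, 89.0), (359.0, 0.0), (100.0, -30.0)]
gals_s = [(180.0, 89.5), (1.0, 0.5), (102.0, -31.0), (250.0, 10.0)]
n = count_pairs(gals_p, gals_s, n_ra=12, n_dec=6, theta_max=3.0)
```

A tighter pad would prune more sections; the bound used here just errs on the side of checking too many.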
The main increase in time here will come from making the matrix and then scanning it to figure out which grid sections are close by. I did a back-of-the-envelope calculation, though, and I think the processing needed for this is several orders of magnitude less than the processing needed to correlate the galaxies with each other (10^8 vs 10^12), so hopefully the extra time will be in the noise.