Tuesday, December 14, 2010

Reconstruction, @#$% Yeah!

Oh Alexia, you are a genius! Check it out, everyone: this reconstruction looks pretty darn good, and it is on the SMALL data set. Can't wait to see how the big one looks. Boo ya!

It's 2am, so I am not going to bother describing the details... more to follow.... cyan is data, red is reconstruction.



Making Randoms within Mask

I spent the weekend trying to make random catalogs inside the masks. They are taking forever to run, so I probably need to optimize better. I didn't spend too much time doing this before because I figured we would only need to make the random catalogs once and then we could just use them over and over again.

However, I've been running the code to generate them for > 13 hours and they still haven't finished, so there is a problem somewhere. The issue is that the mask represents a small region of the sky, but I am generating random objects over the whole sky, so 99.99% of the objects are rejected and thus the code is very inefficient.
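One likely speed-up (my suggestion, not something the current code does) is to generate randoms only inside the mask's ra/dec bounding box instead of over the whole sky, so almost no draws are wasted. A minimal Python sketch; the function name and box-sampling approach are my own:

import numpy as np

def random_radec_in_box(n, ra_min, ra_max, dec_min, dec_max, rng=None):
    # Uniform points on the sphere restricted to an ra/dec box:
    # ra is uniform, dec is drawn uniformly in sin(dec).
    if rng is None:
        rng = np.random.default_rng()
    ra = rng.uniform(ra_min, ra_max, n)
    sin_lo, sin_hi = np.sin(np.radians([dec_min, dec_max]))
    dec = np.degrees(np.arcsin(rng.uniform(sin_lo, sin_hi, n)))
    return ra, dec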

Update: I found a bug in this code while trying to optimize it, which explains why it was taking so long to run. Here is some testing of how the runtime scales with the number of randoms generated (roughly linearly):

photo randoms:
500 randoms: Mon Dec 13 22:28:44 to Mon Dec 13 22:29:13 (29 seconds)
5,000 randoms: Mon Dec 13 22:31:02 to Mon Dec 13 22:34:00 (~3 minutes)
50,000 randoms: Mon Dec 13 22:36:04 to Mon Dec 13 23:02:00 (~26 minutes)

spectro randoms:
500 randoms: Mon Dec 13 22:31:45 to Mon Dec 13 22:32:10 (25 seconds)
5,000 randoms: Mon Dec 13 22:34:38 to Mon Dec 13 22:37:06 (~2.5 minutes)
50,000 randoms: Mon Dec 13 22:38:51 to Mon Dec 13 23:01:42 (~23 minutes)

The way the code works in its current (inefficient) form:

For Spectroscopic Data:
1) Read in the two masks.
2) Create random ra/dec sets 5X as big as the spectro data set.
3) Assign each random a redshift drawn from the redshifts of the spectro data set (to keep the same redshift distribution).
4) See if the ra/dec is in both masks.
5) If it is in both masks, look at the weight in the BOSS mask. Assign a random number (between 0 and 1) to the object: if it is less than or equal to the BOSS mask weight, the object is "observed"; if it is greater, it "isn't observed".
6) Repeat the above until you reach the desired number of randoms.

For Photometric Data:
1) Read in the two masks.
2) Create random ra/dec sets 5X as big as the photo data set.
3) See if the ra/dec is in both masks.
4) Repeat the above until you reach the desired number of randoms.

We don't need to assign a redshift to the photo data or "observe or not observe" because the photometric data doesn't have weights.
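To make the logic concrete, here is a minimal Python sketch of the spectroscopic procedure above. It is not the actual spectrorandoms.pro code (that is IDL); the mask lookups in_both_masks and boss_weight are hypothetical stand-ins for the real mangle queries:

import numpy as np

def make_spectro_randoms(n_target, spec_z, in_both_masks, boss_weight, seed=0):
    # Rejection-sample randoms inside both masks, matching the spectro
    # redshift distribution and the BOSS mask completeness weights.
    rng = np.random.default_rng(seed)
    ra_keep, dec_keep, z_keep = [], [], []
    while len(ra_keep) < n_target:
        n = 5 * n_target  # generate batches 5X the target, as in step 2
        # uniform on the sphere: ra uniform, dec drawn uniformly in sin(dec)
        ra = rng.uniform(0.0, 360.0, n)
        dec = np.degrees(np.arcsin(rng.uniform(-1.0, 1.0, n)))
        z = rng.choice(spec_z, n)      # step 3: reuse the data's redshifts
        ok = in_both_masks(ra, dec)    # step 4: inside both masks?
        # step 5: "observe" with probability equal to the BOSS mask weight
        w = boss_weight(ra[ok], dec[ok])
        observed = rng.uniform(0.0, 1.0, ok.sum()) <= w
        ra_keep.extend(ra[ok][observed])
        dec_keep.extend(dec[ok][observed])
        z_keep.extend(z[ok][observed])
    return (np.array(ra_keep[:n_target]), np.array(dec_keep[:n_target]),
            np.array(z_keep[:n_target]))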

The code is here:

../pcode/photorandoms.pro
../pcode/spectrorandoms.pro

The code to run this as a qsub is here:
../pcode/makeRphoto.script
../pcode/makeRspec.script

You run them with:
qsub makeR[photo,spec].script

The randoms have the same distribution as the data, shown in the following histograms, white is the data, red is the randoms:



Sweet!

Monday, December 13, 2010

No Overlap Data

I realized that I need to make sure that the photometric and spectroscopic data sets don't overlap. I've written code to do this here:

../pcode/removeDataOverlap.pro
../pcode/getSpectroPhotoData.pro

And the data with the overlap removed is here:
/home/jessica/boss/spectroRaDecZMaskedNoOverLap.dat
/home/jessica/boss/photoRaDecZMaskedNoOverLap.dat

This is what we should use for the reconstruction.
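removeDataOverlap.pro is IDL; purely as an illustration, here is one way the same cut could be done in Python, assuming "overlap" means a photometric object lying within some match radius of a spectroscopic one (the 1 arcsecond tolerance and the scipy KD-tree are my assumptions, not necessarily what the code does):

import numpy as np
from scipy.spatial import cKDTree

def remove_overlap(ra_p, dec_p, ra_s, dec_s, tol_deg=1.0 / 3600.0):
    # Drop photometric objects within tol_deg of any spectroscopic object.
    # Treats ra/dec as flat coordinates, which is adequate near dec ~ 0
    # (stripe 82) for arcsecond-scale tolerances.
    tree = cKDTree(np.column_stack([ra_s, dec_s]))
    dist, _ = tree.query(np.column_stack([ra_p, dec_p]))
    keep = dist > tol_deg
    return ra_p[keep], dec_p[keep]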

Friday, December 10, 2010

Using the Whole Footprint with Masks

For the past few days I've been working on using masks to run correlation functions on the whole BOSS footprint. It's required learning how to use polygon masks and how to create randoms within them. Here is the code/testing:

To mask BOSS data (spectroscopic set)

;Read in BOSS data
bossgalaxies = '/home/jessica/boss/spAll-v5_4_14.fits'
spall = mrdfits(bossgalaxies, 1)

;Trim BOSS data to galaxies (remove QSOs, stars, etc.)
isgal = where(spall.specprimary EQ 1 $
AND (spall.boss_target1 AND 2L^0+2L^1+2L^2+2L^3+2L^7) NE 0 $
AND strmatch(spall.class,'GALAXY*') $
AND spall.zwarning EQ 0)

;Select galaxies
galaxies = spall[isgal]
ra = galaxies.ra
dec = galaxies.dec
z = galaxies.z

;Select objects in the masks
weights = maskdataboth(ra,dec,z)

; Readout ra, dec, redshift of spectroscopic objects inside mask
thisfile = '/home/jessica/boss/spectroRaDecZMasked.dat'
writecol,thisfile,ra,dec,z

The code for maskdataboth is here:
/home/jessica/repository/ccpzcalib/Jessica/pythonJess/maskdataboth.pro

'maskdataboth' filters the data through both the BOSS mask and Shirley's mask to make sure the data sets are in the same ra/dec footprint.
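For reference, a Python equivalent of this filtering might look like the sketch below. The pymangle package and the two-mask logic are my assumptions here; the actual maskdataboth.pro is IDL and may differ in detail:

import numpy as np
import pymangle

def mask_data_both(ra, dec, boss_ply, shirley_ply):
    # Return a boolean array marking objects inside both footprints,
    # plus the BOSS completeness weight at each kept position.
    boss = pymangle.Mangle(boss_ply)
    other = pymangle.Mangle(shirley_ply)
    inside = boss.contains(ra, dec) & other.contains(ra, dec)
    weights = boss.weight(ra[inside], dec[inside])
    return inside, weights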


To Mask Shirley's Data (photometric set)

;Read in Shirley's Photometric Data
file1 = "/home/shirleyho/research/SDSS_PK/powerspec_dir/DATA/ra_dec_z_run_rerun_camcol.z_gt_0.01"
readcol, file1, pra, pdec, pz, RUN, rerun, camcol, x, x, x, x, x, x, x, x, format='(F,F,F,D,D,D,D,D,F,F,F,F,F,F)'

;Select objects in the masks
pweights = maskdataboth(pra,pdec,pz)

; Readout ra, dec, redshift of photometric objects inside mask
thisfile = '/home/jessica/boss/photoRaDecZMasked.dat'
writecol,thisfile,pra,pdec,pz

Here are plots of the data sets to show that they do indeed fall in the same region of the sky:




Both these data sets are saved in the following directory: /home/jessica/boss/
The code to run this is here: ../logs/101210log.pro

Wednesday, December 8, 2010

Trimming Reconstruction

When looking over my xi and wps matrices used in the reconstruction, Alexia noticed that some of the numbers were all over the place. She suggested restricting the range of the correlation functions I use to those bins which have enough data to give reasonable numbers. So I spent yesterday creating the following code to "trim" the matrices to whatever ranges the user wants. Here is how it works:

The wpsMat is a concatenation of all the wps(r,theta) correlation functions at the different redshift (r) bins. However, sometimes there aren't enough objects in certain redshift bins, so the values in wps for those bins are nonsensical. Likewise, there might be angles (theta) that are too big or too small. The following code allows you to decide how many of the r and theta bins to skip from the start and end of the data set:

skipStartR = 2 # number of r bins to skip at start
skipEndR = 1 # number of r bins to skip at end
skipR = skipStartR + skipEndR # total r bins skipped
skipStartT = 0 # number of theta bins to skip at start
skipEndT = 0 # number of theta bins to skip at end
# Make matrix with skipping:
makeWPSmatrix4(workingDir, rbins, corrBins, wpsMatrixFile, skipStartR, skipEndR, skipStartT, skipEndT)

Similarly, we want to skip the same r bins in the xi matrix, and we can also skip bins along the line of sight (l):

#Create a matrix to input the correlation function data
# Write Xi Matrix to file
skipStartL = 0 # number of line of sight bins to skip at start
skipEndL = 0 # number of line of sight bins to skip at end
makeXimatrix4(workingDir,rbins,nxicorr,xiMatrixFile, skipStartR, skipEndR, skipStartL, skipEndL)
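Under the hood the trimming is just edge slicing. A minimal numpy sketch of the idea (the real makeWPSmatrix4/makeXimatrix4 also assemble the matrices from the per-bin correlation files, which is omitted here):

import numpy as np

def trim_edges(mat, skip_start_r, skip_end_r, skip_start_c, skip_end_c):
    # Drop the requested number of rows/columns from each edge of a
    # 2D (r bins x theta-or-line-of-sight bins) matrix.
    nr, nc = mat.shape
    return mat[skip_start_r:nr - skip_end_r, skip_start_c:nc - skip_end_c]

# e.g. skip the first 2 and last 1 r bins, keep all theta bins:
# trimmed = trim_edges(wpsMat, 2, 1, 0, 0)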

This doesn't appear to be helping the reconstruction. Here is the reconstruction without trimming:

Reconstruction without trimming

Reconstruction with trimming:

The code to make these plots is here: ../logs/101208log.py

Tuesday, December 7, 2010

Dying Reconstruction

Yesterday I had the problem that the reconstruction code was working fine on my laptop, but when I tried to run it on riemann I kept getting a segmentation fault. It turned out it was the same problem as this post. Oh thank you blog! Now the reconstruction runs on riemann as well as my laptop.

Monday, December 6, 2010

Reconstruction Code for Data

I spent today getting the reconstruction code in shape to run on the stripe 82 data. It is in the following file: ../pythonjess/jessreconstructionStripe82DataFull.py

I've done the data run in this run directory:
../pcode/run2010128_21

Results to come.

Friday, December 3, 2010

BOSS Galaxies

Spent today getting the BOSS Galaxies in the correct form to use as my spectroscopic data set. Here's what I did:

1) Downloaded latest spAll file from here to ~/boss/:
/clusterfs/riemann/raid006/bosswork/groups/boss/spectro/redux/spAll-v5_4_14.fits

2) In IDL, trim the data to be galaxies in stripe 82:
bossgalaxies = '/home/jessica/boss/spAll-v5_4_14.fits'
spall = mrdfits(bossgalaxies, 1)

stripe82gal = where(((spall.ra LE 60.0) OR (spall.ra GE 300.)) $
and ((spall.dec GE -1.25) and (spall.dec LE 1.25)) $
AND spall.specprimary EQ 1 $
AND (spall.boss_target1 AND 2L^0+2L^1+2L^2+2L^3+2L^7) NE 0 $
AND strmatch(spall.class,'GALAXY*') $
AND spall.zwarning EQ 0)

3) Write these galaxies out to a column-delimited file that python can read:
thisfile = '/home/jessica/boss/BOSSstripe82RaDec.dat'
writecol, thisfile, spall[stripe82gal].ra, spall[stripe82gal].dec, spall[stripe82gal].z

4) Do the same to Shirley Ho's photometric LRG file:
shirleydata = "/home/shirleyho/research/SDSS_PK/powerspec_dir/DATA/ra_dec_z_run_rerun_camcol.z_gt_0.01"
readcol, shirleydata, RA, DEC, z, RUN, rerun, camcol, x, x, x, x, x, x, x, x, format='(F,F,F,D,D,D,D,D,F,F,F,F,F,F)'
datasize = n_elements(ra)

;Data just within stripe 82
stripe82all = where(((ra LE 60.0) OR (ra GT 305.)) $
and ((dec GE -1.25) and (dec LE 1.25)))

; Readout ra, dec, redshift of stripe 82 Shirley objects
thisfile = '/home/jessica/boss/ShirlyDataRaDec.dat'
writecol, thisfile, ra[stripe82all], dec[stripe82all], z[stripe82all]

If I want to use the whole BOSS Galaxy footprint I have a mask here from Martin White:
/home/jessica/boss/bossX.002.ply

You can read this file like so:

infile = "./bossX.002.ply"
read_mangle_polygons, infile, polygons, id

There are more functions in idlutils for playing with the mask.

Now I am trying to do the reconstruction on these two data sets, using BOSS as the spectroscopic set and Shirley's LRG galaxies as the photometric set.

Here are some plots of the two sets of data:








The log file to make these data files and plots is here:
../logs/101207log.pro
../logs/101207log.py

Thursday, December 2, 2010

At Los Alamos

I am visiting Alexia at Los Alamos National Lab for the next three weeks. Los Alamos is beautiful, but COLD.

I'm putting the final touches on the data-correlation function setup.

I noticed that in the 3D auto-correlation, I am producing two different random catalogs and using the following calculation:

(DD - DR1 - R2D + R1R2) / (R1R2)

This might use more computing power than necessary if it is OK to set R1 = R2... but I forget whether they need to be different... perhaps they do need to be different for this estimator to work. It seems like something I would have thought about before.

I talked to Alexia about this and she thinks it shouldn't make a difference either way. So I am going to set r1 = r2 and see what happens....
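As a worked version of the formula, here is a minimal sketch of the estimator on normalized pair counts (the normalization convention is my assumption; the pair counting itself lives in the existing correlation function code):

def xi_estimator(dd, dr1, r2d, r1r2):
    # Landy-Szalay-style estimator, (DD - DR1 - R2D + R1R2) / (R1R2),
    # where each input is a pair count normalized by its total pairs.
    # With R1 = R2 this reduces to the usual (DD - 2DR + RR) / RR.
    return (dd - dr1 - r2d + r1r2) / r1r2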

Old Autocorrelation Function results (without randoms fed in)
# r (Mpc/h) cross xi (r)
1.0009750000 113.0999999998
3.0009250000 4.3312368973
5.0008750000 0.1890557940
7.0008250000 0.3768587361
9.0007750000 0.4359240581
11.0007250000 0.1349752029
13.0006750000 0.1273595140
15.0006250000 -0.1351841518
17.0005750000 0.2378351539
19.0005250000 0.2994336370
21.0004750000 -0.0479553597
23.0004250000 -0.0998846930
25.0003750000 -0.1438266200
27.0003250000 -0.3070417433
29.0002750000 -0.3108741254
31.0002250000 -0.1177777522
33.0001750000 -0.0633376228
35.0001250000 -0.2604527820
37.0000750000 -0.2517495550
39.0000250000 -0.0304629134


New Autocorrelation Function results (with randoms fed in)
I think the small differences are rounding errors.
# Reached checkpoint 4
# r (Mpc/h) cross xi (r)
1.0009750000 113.0999999998
3.0009250000 4.3294918806
5.0008750000 0.1890963726
7.0008250000 0.3766263941
9.0007750000 0.4359240581
11.0007250000 0.1349752029
13.0006750000 0.1273650408
15.0006250000 -0.1355281973
17.0005750000 0.2380827199
19.0005250000 0.2995001354
21.0004750000 -0.0479749624
23.0004250000 -0.0998864926
25.0003750000 -0.1437953465
27.0003250000 -0.3070974753
29.0002750000 -0.3107744869
31.0002250000 -0.1177689518
33.0001750000 -0.0632545877
35.0001250000 -0.2605975485
37.0000750000 -0.2517029624
39.0000250000 -0.0304363094

New Autocorrelation Function with r1 = r2 (with randoms fed in)
There are pretty big differences between these two, but this could just be because my data set is small. I should check on a larger set.
# Reached checkpoint 4
# r (Mpc/h) cross xi (r)
1.0009750000 120.9999999998
3.0009250000 3.9579487179
5.0008750000 0.1234024722
7.0008250000 0.3826803206
9.0007750000 0.4731333382
11.0007250000 0.0871628790
13.0006750000 0.0808080808
15.0006250000 -0.1168749344
17.0005750000 0.2334021835
19.0005250000 0.3148748159
21.0004750000 -0.0339674194
23.0004250000 -0.0762365727
25.0003750000 -0.1365716955
27.0003250000 -0.2876355988
29.0002750000 -0.3209356312
31.0002250000 -0.1164599148
33.0001750000 -0.0517181534
35.0001250000 -0.2402264337
37.0000750000 -0.2732306595
39.0000250000 -0.0168706759