Tuesday, May 31, 2011

More Reconstruction Test

I ran the reconstruction with a different selection function to see if that made things better. The photometric distribution is shown here (workingDir: 'run2011530_1632'):


I'm still having trouble finding phibins that don't give me the extrapolation error. But here are some reconstructions; the first one looks good-ish, the second one not so much:

Note that the y-axis label is not right: there are more objects in each bin than it says; I had to normalize the curves to put them on the same scale.

I'm running again with smaller angles to see if this helps. This smaller-angle run is here (workingDir = 'run2011531_1451').

The angular correlation functions are freaking out at small angles. Not sure what that is about:



They look normal starting around 10^-3:


I tried doing the reconstruction both using all the angular bins and using just the angular bins that weren't freaking out; they look the same:

fewer angular bins:
all angular bins:

Here are some other reconstruction binnings; disturbingly, I am still getting the "feature" where if I change the number of bins slightly (from 15 to 16) I get very different reconstructions:




Because I am still not sure how this "non-uniform" binning in r affects the reconstruction, I am running again, but with uniform binning. We'll see how that goes.

Friday, May 27, 2011

Randoms for CFHT Galaxies

Spent the last couple of days making the randoms for the CFHT galaxies. Here is how it's done:
; 1) Go to Analysis directory:
/home/jessica/repository/ccpzcalib/Jessica/qsobias/Analysis/

; 2) Make randoms for each pointing:
listfile = '/clusterfs/riemann/data/jessica/alexieData/se_cats/se_list.txt'
mask_dir='/clusterfs/riemann/data/jessica/alexieData/masks/'
infile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cs82_all_cat1.fits'
random_dir = '/clusterfs/riemann/data/jessica/alexieData/randoms/'
make_randoms,infile,outfile,mask_dir,random_dir,listfile

; 3) Remove randoms not in mask, combine to single file
dir = '/clusterfs/riemann/data/jessica/alexieData/randoms/'
listfile = '/clusterfs/riemann/data/jessica/alexieData/se_cats/se_list.txt'
outfile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/random_catalog.fits'
fix_randoms,dir,listfile,outfile

; 4) Remove overlapping regions
infile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/random_catalog.fits'
outfile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/random_catalog_no_overlaps.fits'
ra_list = '/home/jessica/repository/ccpzcalib/Jessica/qsobias/Analysis/S82.ra_cuts'
se_list_file = '/clusterfs/riemann/data/jessica/alexieData/se_cats/se_list.txt'
remove_overlaps,infile,outfile,se_list_file,ra_list

Here are some plots from DS9, blue circles are randoms that are kept, and red circles are randoms that are masked out. The green and yellow regions are masked regions:



The randoms are made at a density of 7.5×10^5 per square degree. This is approximately 10X the density of the data.

Next Steps with Reconstruction

I just talked to Alexia. She suggested that I change the selection function for the photometric data to have more features and higher redshift, as right now we are trying to reconstruct the first "bump" at ~150 Mpc, where there are very few galaxies.

She also said I should look into how non-uniform binning (log binning) affects the summations done in the reconstruction, and whether we need to do something different to deal with the fact that the spacing is no longer the same for each bin (see the sketch below).
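Concretely, the worry is that a sum standing in for an integral over r implicitly assumes a constant bin spacing; with log bins each bin has its own width. A minimal Python illustration (toy numbers, not the reconstruction code):

import numpy as np

rbins = np.logspace(-1, 2, 16)           # non-uniform (log-spaced) bin edges
rc = 0.5 * (rbins[:-1] + rbins[1:])      # bin centers
f = rc ** -1.8                           # some binned quantity, e.g. a power-law xi(r)

# A sum approximating int f dr needs the per-bin width...
good = np.sum(f * np.diff(rbins))
# ...not a single constant spacing, which is only right for uniform bins:
bad = np.sum(f) * (rbins[1] - rbins[0])
print(good)
print(bad)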

Exciting that things are closer to working than ever!

Tuesday, May 24, 2011

Reconstruction (Log Binning) Results

In this post I discussed trying different binning to get the reconstruction to overlap more with the actual correlation functions.

Here is the binning I am using. The binning is log in theta (for the 2D correlation function) and in rlos (for the 3D correlation function), but the binning in redshift is such that each bin has the same number of galaxies.

Here are the arrays:

rbins = array([ 4.592407, 136.86079 , 170.595585, 194.675063, 211.888514,
227.822058, 242.439338, 254.738113, 267.180894, 277.596667,
287.557931, 297.031654, 305.563449, 314.144417, 322.061375,
329.842934, 337.956897, 345.179652, 352.14308 , 359.026505,
365.954462, 372.052363, 378.261215, 384.175881, 389.837465,
395.409799, 400.562692, 405.823926, 411.006328, 416.036745,
420.853707, 425.608283, 430.068758, 434.598696, 438.968533,
443.201263, 447.527471, 451.469393, 455.440266, 459.405314,
463.13316 , 466.657868, 470.378975, 474.188462, 478.033728,
481.934477, 485.61191 , 489.252768, 492.81945 , 496.462536,
499.999607])

theta = array([ 1.12201850e-03, 1.41253750e-03, 1.77827940e-03,
2.23872110e-03, 2.81838290e-03, 3.54813390e-03,
4.46683590e-03, 5.62341330e-03, 7.07945780e-03,
8.91250940e-03, 1.12201845e-02, 1.41253754e-02,
1.77827941e-02, 2.23872114e-02, 2.81838293e-02,
3.54813389e-02, 4.46683592e-02, 5.62341325e-02,
7.07945784e-02, 8.91250938e-02, 1.12201845e-01,
1.41253754e-01, 1.77827941e-01, 2.23872114e-01,
2.81838293e-01, 3.54813389e-01, 4.46683592e-01,
5.62341325e-01, 7.07945784e-01, 8.91250938e-01,
1.12201845e+00, 1.41253754e+00, 1.77827941e+00,
2.23872114e+00, 2.81838293e+00, 3.54813389e+00,
4.46683592e+00, 5.62341325e+00, 7.07945784e+00,
8.91250938e+00])

rlos = array([ 0.52962686, 0.59425111, 0.66676072, 0.74811783,
0.83940201, 0.94182454, 1.05674452, 1.18568685,
1.33036253, 1.49269131, 1.6748272 , 1.87918702,
2.10848252, 2.36575629, 2.65442222, 2.97831072,
3.34171959, 3.74947105, 4.20697571, 4.72030438,
5.29626863, 5.94251114, 6.66760716, 7.48117828,
8.39402009, 9.41824545, 10.5674452 , 11.85686853,
13.3036253 , 14.92691309, 16.74827196, 18.79187021,
21.08482517, 23.65756295, 26.54422221, 29.78310718,
33.41719588, 37.49471047, 42.06975708, 47.20304381])
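(For reference, the equal-number redshift bin edges come from the sorted comoving distances; this mirrors the rbins2 construction in the May 17 post below. A minimal sketch, with illustrative names:)

import numpy as np

def equal_number_bins(comoving_dist, nbins):
    # Return nbins+1 edges such that each bin holds ~the same number of objects.
    d = np.sort(np.asarray(comoving_dist))
    per_bin = len(d) // nbins
    edges = np.empty(nbins + 1)
    edges[0], edges[-1] = d[0], d[-1]
    for i in range(1, nbins):
        edges[i] = d[i * per_bin - 1]   # last object in bin i sets the edge
    return edges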

Here are plots of the 2D and 3D correlation functions:



I've found it hard to find phibin spacing that doesn't go below the limits of the correlation function. I get the following error for a lot of binnings:

sximat=crl.getximat(th,rlos,phibins,rs,xispec,xierr)
0.000505647568659 is less than minimum r_ss 0.000529627
refusing to extrapolate

Perhaps I should allow it to extrapolate to smaller angles....
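(If I do let it extrapolate, the generic approach would be a power-law extension below the smallest measured separation, i.e. continuing the log-log interpolation with the slope of the two innermost points. A sketch of what I have in mind; the function name and interface are mine, not crl's:)

import numpy as np

def extrapolate_xi(r_new, r, xi):
    # Log-log interpolation inside the measured range, power-law extension
    # (slope of the two innermost points) below min(r). Assumes r is sorted,
    # xi > 0 near the inner edge, and r_new is an array.
    logr, logxi = np.log10(r), np.log10(xi)
    out = np.interp(np.log10(r_new), logr, logxi)
    slope = (logxi[1] - logxi[0]) / (logr[1] - logr[0])
    below = np.log10(r_new) < logr[0]
    out[below] = logxi[0] + slope * (np.log10(r_new[below]) - logr[0])
    return 10 ** out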

but the following phibins work:

phibins = array([ 0.01 , 0.03578947, 0.06157895, 0.08736842, 0.11315789,
0.13894737, 0.16473684, 0.19052632, 0.21631579, 0.24210526,
0.26789474, 0.29368421, 0.31947368, 0.34526316, 0.37105263,
0.39684211, 0.42263158, 0.44842105, 0.47421053, 0.5 ])

Here is what the xi and sximat plots look like; as you can see, they overlap more now:

But the reconstruction isn't so great:


If I change the binning (here are 50 bins):

phibins =
array([ 0.00459241, 0.13686079, 0.17059558, 0.19467506, 0.21188851,
0.22782206, 0.24243934, 0.25473811, 0.26718089, 0.27759667,
0.28755793, 0.29703165, 0.30556345, 0.31414442, 0.32206137,
0.32984293, 0.3379569 , 0.34517965, 0.35214308, 0.3590265 ,
0.36595446, 0.37205236, 0.37826121, 0.38417588, 0.38983746,
0.3954098 , 0.40056269, 0.40582393, 0.41100633, 0.41603674,
0.42085371, 0.42560828, 0.43006876, 0.4345987 , 0.43896853,
0.44320126, 0.44752747, 0.45146939, 0.45544027, 0.45940531,
0.46313316, 0.46665787, 0.47037898, 0.47418846, 0.47803373,
0.48193448, 0.48561191, 0.48925277, 0.49281945, 0.49646254,
0.49999961])

Another thing: now that I have so many theta, rlos, and r bins, crl.getCoeff takes a long time to run.


I suppose this looks better. At this point I need to either try a different binning, or re-write crl.getximat to allow extrapolation to smaller angles / separations so that I can try different reconstructions. Opinions, Alexia?

Data Handling

For the last few days I've been doing a bunch of data handling for this new project. Below is what I've done:

Alexie wrote a pipeline to collate data and apply masks. The code is here: .../qsobias/Analysis/doall_cfht_se.pro

I added some code to remove the repeats: ../qsobias/Analysis/remove_repeats.pro

To just start us off I wrote some code to trim the data and make randoms in the entire footprint (because we don't have masks yet). This code is in the following log file: .../logs/110522log.pro

;I start with the galaxy catalog from Alexie: ../Catalogs/cs82_all_cat6.fits
;I remove repeats:

cat6 = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cs82_all_cat6.fits'
cat7 = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cs82_all_cat7.fits'

infile = cat6
outfile = cat7
remove_repeats,infile,outfile

;Read in file with repeats removed

infile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cs82_all_cat7.fits'
str=mrdfits(infile,1)

;Do some simple geometry cuts on ra, dec, and z
deltacut = where(str.DELTA_J2000 gt -1.0 and str.DELTA_J2000 lt 0.94)
str = str[deltacut]

alphacut = where(str.ALPHA_J2000 gt -42.5 and str.ALPHA_J2000 lt 45.0)
str = str[alphacut]

zcut = where(str.ZPHOT le 1.0)
str = str[zcut]

;Do some cuts that Alexie recommends: keep galaxies (not stars), inside the mask, and only down to mag = 23.7
gooddatacut = where(str.mask eq 0 and str.mag_auto lt 23.7 and str.class_star lt 0.9)
str = str[gooddatacut]

;Output data to file
outfile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cs82_all_cat8.fits'
mwrfits, str, outfile, /create

;This is the galaxy catalog I am using for the preliminary correlation function test
thisfile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cfht_data.dat'
writecol,thisfile,str.alpha_j2000,str.delta_j2000,str.zphot

;Now to do the same to the BOSS Data
;This code can be found in 110519log.py

;Read in BOSS Galaxy Data
qsofile = '/home/jessica/qsobias/Stripe_82_QSOs_unique.dat'
readcol,qsofile,ra,dec,z,gmag,format='(D,D,D,D,X)'

;Remove duplicates
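; (Matching the catalog against itself within 2 arcsec returns every close
; pair twice, as (i,j) and (j,i), plus each object matched to itself;
; keeping m1 gt m2 picks out exactly one member of each duplicate pair.)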
spherematch, ra, dec, ra, dec, 2./3600, m1, m2, maxmatch=0
dups = m1[where(m1 gt m2)]
good = [indgen(n_elements(ra))*0]+1
good[dups]=0
ra = ra[where(good)]
dec = dec[where(good)]
z = z[where(good)]
gmag = gmag[where(good)]

thisfile = '/home/jessica/qsobias/QSOs_unique.dat'
writecol,thisfile,ra,dec,z,gmag

#Now in python

import numpy as N
from pylab import *
from dataplay import *

galfile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cfht_data.dat'
gal=N.loadtxt(galfile,comments='#')
galra = gal[:,0]
galdec = gal[:,1]
galz = gal[:,2]

#Make Cuts on QSOs
qsofile = '/home/jessica/qsobias/QSOs_unique.dat'
qso=N.loadtxt(qsofile,comments='#')
qsora = qso[:,0]
qsodec = qso[:,1]
qsoz = qso[:,2]
qsog = qso[:,3]


#Dec between -1.0 and 0.94
deccut = N.where((qsodec < 0.94) & (qsodec > -1.0)) #make sure they are inside the cfht mask

qsora = qsora[deccut]
qsodec = qsodec[deccut]
qsoz = qsoz[deccut]
qsog = qsog[deccut]

#Redshift between 0 and 1
zcut = N.where((qsoz >= 0.0) & (qsoz < 1.0))
qsora = qsora[zcut]
qsodec = qsodec[zcut]
qsoz = qsoz[zcut]
qsog = qsog[zcut]

#RA range (-42.5, 45)
highra = N.where(qsora > 300)
qsora[highra] = qsora[highra] - 360.
racut = N.where((qsora < 45.0) & (qsora > -42.5))

qsora = qsora[racut]
qsodec = qsodec[racut]
qsoz = qsoz[racut]
qsog = qsog[racut]

#Write masked data to files
thisfile = '/home/jessica/qsobias/QSO_data_masked.dat'
writeQSOsToFile(qsora,qsodec,qsoz,qsog,thisfile)

#This is the QSO data file
thisfile = '/home/jessica/qsobias/QSO_data.dat'
writeQSOsToFile2(qsora,qsodec,qsoz,thisfile)

#Plot the galaxies and qsos
plot(galra, galdec, 'b,', label = 'galaxy data')
plot(qsora, qsodec, 'r,', label = 'qso data')

xlabel('ra')
ylabel('dec')
title('RA & Dec of Data')
legend(loc=1)

# Translate into comoving distance (from redshift, ra, dec)
# (renamed the cartesian z output so it doesn't clobber the redshift
# array qsoz, which is histogrammed below)
qsox,qsoy,qsozcart,qsocd=raDecZtoXYZ(qsora,qsodec,qsoz)

min(qsocd)
0 #Mpc/h
max(qsocd)
2312.67 #Mpc/h
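(As a sanity check on raDecZtoXYZ: 2312.67 Mpc/h is the comoving distance to z = 1 in a flat LambdaCDM cosmology with Omega_m = 0.3, so that is presumably what the function assumes. A quick sketch:)

import numpy as np
from scipy.integrate import quad

def comoving_distance(z, om=0.3):
    # D_C = (c/H0) * int_0^z dz' / E(z'), in Mpc/h, for flat LambdaCDM
    c_over_H0 = 2997.92  # Mpc/h
    E = lambda zp: np.sqrt(om * (1.0 + zp) ** 3 + (1.0 - om))
    return c_over_H0 * quad(lambda zp: 1.0 / E(zp), 0.0, z)[0]

print(comoving_distance(1.0))  # ~2312.7 Mpc/h, matching max(qsocd) above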

# Plot redshift distribution of galaxies and qsos

bins = linspace(start = 0.01, stop = 1.0, num = 20)
histresult = histogram(qsoz, bins=bins)
qson = histresult[0]
plot(bins[:-1], qson, 'b.-', label = 'qso data') #histogram returns one count per bin, i.e. len(bins)-1 values

histresult = histogram(galz, bins=bins)
galn = histresult[0]
plot(bins[:-1], galn/1000, 'r.-', label = 'galaxy data (k)')

xlabel('redshift bin (z)')
ylabel('number of QSO/gal in bin')
title('Redshift Distribution of Galaxies and QSOs')
legend(loc=2)

;Back to IDL
;Make random catalog
; code in ../log/110519log.py

;Read in galaxy data
thisfile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cfht_data.dat'
readcol, thisfile, ra, dec, z, format='(F,F,F)'

;Make 10X as many randoms as we have galaxies
nrandoms = n_elements(ra)*10

;Make random ra / dec in the Stripe 82 range
randomdec = 90.0-acos(RANDOMU(S, 2*nrandoms)*0.03385776-0.01745246)*180./!pi
deccut = where(randomdec GT -1.0 AND randomdec LT 0.94)
randomdec = randomdec[deccut]
randomdec = randomdec[0:nrandoms-1]
randomra = RANDOMU(S, nrandoms)*87.5 - 42.50

; Make file of ra, dec of spectroscopic randoms inside mask
thisfile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cfht_randoms.dat'
writecol,thisfile, randomra, randomdec
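(For reference, the magic numbers in the randomdec line are sin(0.94 deg) - sin(-1 deg) = 0.03385776 and sin(-1 deg) = -0.01745246, so the line draws sin(dec) uniformly between sin(-1 deg) and sin(0.94 deg). Sampling sin(dec) uniformly, rather than dec itself, is what makes the randoms uniform on the sphere. The same construction in Python, as a sketch:)

import numpy as np

def uniform_radec(n, ra_range=(-42.5, 45.0), dec_range=(-1.0, 0.94)):
    # Uniform points on the sphere within an RA/Dec box: RA is uniform,
    # Dec is drawn by sampling sin(dec) uniformly, since the area element
    # on the sphere is d(sin dec) d(ra).
    ra = np.random.uniform(ra_range[0], ra_range[1], n)
    sin_lo = np.sin(np.radians(dec_range[0]))
    sin_hi = np.sin(np.radians(dec_range[1]))
    dec = np.degrees(np.arcsin(np.random.uniform(sin_lo, sin_hi, n)))
    return ra, dec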

;Calculate correlation function using spherematch (should actually use c-code because this takes a long time)

;Read in galaxy data
thisfile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cfht_data.dat'
readcol, thisfile, gra, gdec, gz, format='(F,F,F)'

;Read in random data
thisfile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cfht_randoms.dat'
readcol, thisfile, rra, rdec, format='(F,F)'

;Read in QSO data
thisfile = '/home/jessica/qsobias/QSO_data.dat'
readcol, thisfile, qra, qdec, qz, format='(F,F,F)'

;Make theta bins
theta = findgen(11)/2.-5.
logtheta = 10D^(theta)
maxtheta = max(logtheta)
bins = logtheta
nbins = n_elements(bins) - 1

nr = n_elements(rra)
nq = n_elements(qra)
ng = n_elements(gra)

;Find distance separations using sphere match
print,'Starting cross-correlation:'
print,'Estimating DD...'
spherematch,qra,qdec,gra,gdec,maxtheta,ind1a,ind1b,dist_dd,max=0

print,'Estimating DR...'
spherematch,qra,qdec,rra,rdec,maxtheta,ind1,ind2,dist_dr,max=0
print,'Done cross-correlating.'

;Find number of pairs per bin
bins_lower = bins[0:nbins-1]
bins_upper = bins[1:nbins]

dd = fltarr(nbins)
dr = fltarr(nbins)
rmean = fltarr(nbins)

for i = 0,(nbins-1) do dd[i] = n_elements(where(dist_dd gt bins_lower[i] AND dist_dd le bins_upper[i]))

for i = 0,(nbins-1) do dr[i] = n_elements(where(dist_dr gt bins_lower[i] AND dist_dr le bins_upper[i]))

;Calculate the correlation function (double() avoids 32-bit overflow in the pair normalization)
corr1 = (dd/dr)*(double(nr)/ng) - 1.0
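(For the record, this is the QG/QR - 1 estimator mentioned in the May 20 project description below: xi = [DD/(nq*ng)] / [DR/(nq*nr)] - 1 = (DD/DR)*(nr/ng) - 1, i.e. the quasar-galaxy pair counts normalized by the quasar-random pair counts; the nq factors cancel, which is why they can be dropped from the code above.)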

Friday, May 20, 2011

New Project

Because things aren't yet working with my reconstruction project, David has suggested I work on something else in parallel so that if I end up not being able to get a result with the reconstruction, I still have another option for my thesis.

Here is the new project description:
~~~~~~~~~~~~

SDSS-III Project 127: Cross-correlation of BOSS spectroscopic quasars on Stripe 82 with photometric CFH galaxies.

Participants:
Jessica Kirkpatrick
Martin White, David Schlegel, Nic Ross, Alexie Leauthaud, Jean-Paul Kneib

Categories: BOSS

Project Description:
We plan to measure the intermediate scale clustering of low redshift BOSS quasars along Stripe 82 by cross-correlation against the photometric galaxy catalog from the CFH i-band imaging on Stripe 82. There are approximately 1,000 BOSS quasars with 0.5 < z < 1 and just under 6 million galaxies in the -43 < RA < 43 and -1 < DEC < 1 region brighter than i = 23.5, which should lead to a strong detection of clustering over approximately 2 orders of magnitude in length scale. The geometry of the stripe suggests errors on the cross-correlation can be efficiently obtained by jackknife or bootstrap sampling the ~50 2x2 degree blocks.

We intend to split the QSO sample in luminosity and black hole mass. We plan to estimate the BH mass using the fits from Vestergaard and Peterson, knowing the Hbeta line width and the continuum luminosity at 5100A. The pipeline measures the former, we plan to measure the latter from the photometry calibrated with Ian McGreer's mocks.

If we use the QG/QR-1 estimator we do not need the quasar mask, only that of the galaxies which will be provided by the CS82 team in the form of a pixelized mask from visual inspection. The dN/dz of the galaxies is known from photometric redshifts plus spectroscopic training sets. While the galaxies could be split in photometric redshift bins, the gains from doing so are not expected to be large, so our initial investigations will simply cross-correlate the quasars with the magnitude limited galaxy catalog.

Along with this project we will submit a request for EC status for:

Ludo van Waerbeke
Hendrik Hildebrant
David Woods
Thomas Erben

who were instrumental in obtaining and reducing the CS82 data and producing the required galaxy catalog and mask but are not members of the BOSS collaboration.

Details are available at https://www.sdss3.org/internal/publications/cgi-bin/projects.pl/display_project/127

~~~~~~~~~~~~~
The first thing Martin wanted me to do was to approximate the error bars for the correlation function based on the density of galaxies and quasars in my sample.

Starting with the luminosity function in this paper, I am doing the following to estimate the density.

According to Table 1 in Ilbert et al., the following are the Schechter parameters for the galaxy luminosity function, redshift 0.6-0.8:

ϕ* = 5.01e-3 h^3 Mpc^(-3) mag^(-1)
M* (i-band) = -22.17
α = -1.41

Inputting these into IDL's lf_schechter function we get the following:

mags = findgen(5*20+1)/20. - 23
vals= lf_schechter(mags, 5.01, -22.17, -1.41)
plot, mags, vals*10D^(-3), /ylog, XTITLE = 'Magnitude (iband)', YTITLE = 'log phi', TITLE = 'Luminosity Function', charsize = 2, charthick = 1
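(For reference, assuming lf_schechter implements the standard magnitude-space Schechter form, a rough Python equivalent of the above is:)

import numpy as np

def schechter_mag(M, phistar, Mstar, alpha):
    # phi(M) = 0.4 ln(10) phistar * x^(alpha+1) * exp(-x), with x = 10^(-0.4 (M - Mstar))
    x = 10.0 ** (-0.4 * (M - Mstar))
    return 0.4 * np.log(10.0) * phistar * x ** (alpha + 1.0) * np.exp(-x)

mags = -23.0 + 0.05 * np.arange(101)                # -23 to -18 in 0.05 mag steps
vals = schechter_mag(mags, 5.01e-3, -22.17, -1.41)  # Ilbert z = 0.6-0.8 parameters
print(0.05 * vals.sum())                            # density summed over this mag range; compare total(density) below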


To get the density we integrate (sum over magnitude bins):
density = 0.05*vals*10D^(-3) ;bin width is 0.05 mag
print, total(density)
0.041830996
= 4.18 × 10^-2 h^3 Mpc^-3

This is similar to what they get in this paper by Faber et al. According to the Faber paper, the luminosity density at redshift 0.7 is log10(j_B) = 8.5 (in solar luminosities per Mpc^3). Assuming 10^10 solar luminosities per galaxy:

j_B = 10^8.5 solar lum / Mpc^3 = 10^(-1.5) galaxies / Mpc^3 (h = 0.7)
= 3.16×10^-2 galaxies / Mpc^3 (h = 0.7)
= 9.21×10^-2 galaxies h^3 / Mpc^3

So we are looking at a density of something in the ballpark of 0.05 galaxies per (Mpc/h)^3.

To get it directly from the data:

5.5254 million galaxies (once mask/cuts applied)
sky area = 166 deg^2
redshift range = 0 - 1
redshift 1 = 2312.67 Mpc / h (in comoving distance)
volume = 166 sq-deg / (3282.90 sq-deg per steradian) * 4/3 pi r^3 = 2,619,871,820 (Mpc/h)^3

density = 2.00 *10^-3 galaxies * h^3 * Mpc^(-3)

This is an order of magnitude smaller. Not sure why.... am I perhaps doing the volume calculation incorrectly?
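(A quick sanity check on the volume, using only the numbers above: 3282.90 is the number of square degrees per steradian, so dividing by it and then multiplying by the full-sphere volume 4/3 pi r^3 overcounts the volume by a factor of 4 pi ~ 12.6 relative to dividing by the full sky's 41,253 sq-deg. That looks like exactly the missing order of magnitude:)

import numpy as np

FULL_SKY_DEG2 = 4.0 * np.pi * (180.0 / np.pi) ** 2  # 41,252.96 deg^2
area = 166.0                                        # survey area, deg^2
r = 2312.67                                         # comoving distance to z = 1, Mpc/h
n_gal = 5.5254e6

vol_sphere = 4.0 / 3.0 * np.pi * r ** 3             # full-sphere volume, (Mpc/h)^3
vol_post = area / 3282.90 * vol_sphere              # as computed above: ~2.62e9
vol_sky = area / FULL_SKY_DEG2 * vol_sphere         # using the full sky: ~2.08e8

print(vol_post / vol_sky)                           # 4*pi ~ 12.57
print(n_gal / vol_sky)                              # ~2.7e-2 h^3 Mpc^-3, close to the LF estimates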

4257 quasars (out to redshift of 1, in the cfht footprint)
density = 1.66e-06 qsos * h^3 * Mpc^(-3)

Exchange with Martin White, RE: Estimating Errors

Martin,
I've figured out the density of the galaxies/qsos from both the LFs and the catalogs, and would like to estimate the errors in the correlation function. I understand that the errors go as 1 / sqrt(pair counts) in each bin. But going from galaxy/qso density to pair counts in a bin is where I am a bit lost. I asked Alexie, and she thought that I just take the density and multiply it by the volume of each bin in the correlation function, and then use that number to compute the pair counts. I don't see how that is the same as the pair counts for the correlation function. The correlation function is measuring separation, so how is that the same as the number of pairs in a volume with a side of the separation?


I went ahead and calculated the correlation function with the following bins (degrees):
theta = (1.0000000e-05, 3.1622777e-05, 0.00010000000, 0.00031622777, 0.0010000000, 0.0031622777, 0.010000000, 0.031622777, 0.10000000, 0.31622777, 1.0000000)

I get the following number of dd pairs in each bin:
dd = (589.000, 561.000, 49.0000, 386.000, 4675.00, 41535.0, 394556, 3.81242e+06, 3.60765e+07, 2.94166e+08)

1/sqrt(dd) = (0.0412043, 0.0422200, 0.142857, 0.0508987, 0.0146254, 0.00490674, 0.00159201, 0.000512153, 0.000166490, 5.83048e-05)

The catalogs are not properly masked, so we might get a reduction in dd values by perhaps 30% once we mask the data.

This would result in the following:

dd = (412.300, 392.700, 34.3000, 270.200, 3272.50, 29074.5, 276189., 2.66869e+06, 2.52536e+07, 2.05916e+08)

1/sqrt(dd) = (0.0492485, 0.0504626, 0.170747, 0.0608355, 0.0174808, 0.00586467, 0.00190282, 0.000612140, 0.000198993, 6.96875e-05)

I'm currently running the rr, and will have a "correlation function" this afternoon. Although this will of course be wrong, because I'm not masking properly yet. I should get the masks from Alexia today or tomorrow.

Jessica

~~~~~~

Jessica,

"I understand that the errors go as 1 / sqrt(pair counts) in each bin. But going from galaxy/qso density to pair counts in a bin is where I am a bit lost."

If you think of the very simplest correlation function estimator that you can write down, xi=DD/RR-1, and imagine that you have so many randoms the fluctuations in RR are negligible then you see that the errors in 1+xi are given by the fluctuations in the counts of DD in a bin. Assuming Poisson statistics, the fractional error in 1+xi goes as 1/sqrt{Npair} where Npair is the number of quasar-galaxy pairs.

For a 3D correlation function the number of data pairs goes as Nqso times Nbar-galaxy times 1+xi times the volume of the bin (in 3D, e.g. 4\pi s^2 ds for a spherical shell). Just think of what the code does: sit on each quasar and count all the galaxies in the bin. To go from a 3D correlation function to a 2D correlation function you need to integrate in the Z direction. But remember that the sum of independent Poisson distributions is also a Poisson with a mean equal to the sum of the means of the contributing parts. So this allows you to figure out what the error on wp is. You should see that as you integrate to very large line-of-sight distance things become noisier. So choose something like +/-50Mpc/h for the width in line-of-sight distance to integrate over in defining wp.

It's a little easier to understand if you write the defining equations out for yourself on a piece of paper.

Martin

~~~~~~~
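To make Martin's recipe concrete, here is a minimal sketch (my own illustration, not code from the exchange) of the expected 3D quasar-galaxy pair counts and the resulting fractional errors on 1+xi, assuming Poisson statistics, xi << 1, and the densities estimated above:

import numpy as np

n_qso = 4257                 # quasars in the footprint (from above)
nbar_gal = 2.0e-3            # galaxy density, h^3 Mpc^-3 (from above)
xi = 0.0                     # take xi << 1 for a rough count

s_edges = np.logspace(0.0, 2.0, 11)   # spherical shells in separation s, Mpc/h
vol_shell = 4.0 / 3.0 * np.pi * (s_edges[1:] ** 3 - s_edges[:-1] ** 3)

# expected pairs per bin: N_qso * nbar_gal * (1 + xi) * V_bin
n_pair = n_qso * nbar_gal * (1.0 + xi) * vol_shell
frac_err = 1.0 / np.sqrt(n_pair)      # fractional error on 1 + xi per bin

for lo, hi, e in zip(s_edges[:-1], s_edges[1:], frac_err):
    print("%7.2f - %7.2f Mpc/h : %.3g" % (lo, hi, e))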
Project Reading list
http://arxiv.org/abs/0802.2105
https://trac.sdss3.org/wiki/BOSS/quasars/black_hole_masses
http://iopscience.iop.org/0004-637X/665/1/265/pdf/62903.web.pdf
http://articles.adsabs.harvard.edu//full/1989ApJ...343....1D/0000011.000.html
http://adsabs.harvard.edu/abs/2005A%26A...439..863I

Tuesday, May 17, 2011

Reconstruction with Log Binning

I spent a while playing with the binning that goes into getximat to try to get the points in sximat to overlap more with the correlation functions, but I just couldn't get more than a couple to overlap with the current binning limitations.

Here is code that I used to play with this stuff: ../logs/110513log.py

So I've decided to re-run with some different binnings and see if this helps.

I spent the last day coding up an automation of log binning, and binning in equal-number-of-galaxies in each bin. Alexia had said she wasn't sure how the reconstruction would work with this binning, so we'll see how it turns out.

Here is the code (also in ../logs/110516log.py):

~~~~~~~~~~~

import numpy as N
from pylab import *
from matplotlib import *
import crlnew as crl
from correlationData import *
import galsphere as gals
import datetime as dt

# Makes the working directory for this run.
# Creates directory and returns string of directory name.
workingDir = getworkingdir()

# Create file names
mockLRGfile,photofile,specfile,corr2dfile,corr3dfile,photo2dfile,photoCDfile,spec2dfile,\
spec3dfile,argumentFile,runConstantsFile,wpsMatrixFile,xiMatrixFile,rbinsfile=makeFileNamesNew(workingDir)


#Angular Correlation
oversample = 5.
corrBins = 80.0 #should be one more than you want it to be
mincorr = 0.0001
maxcorr = 10.0
convo = 180./pi
tlogbin = 1 # = 0 for uniform spacing in theta, = 1 for log spacing in theta


#Auto Correlation
boxside = 1000.0 #in Mpc/h
xicorrmin = 0.0001
xicorrmax = 20.0
nxicorr = 40
rlogbin = 1 # = 0 for uniform spacing in r, = 1 for log spacing in r

#make redshift bins
#boxside = 1000.0 #in Mpc/h
zlogbin = 2 # = 0 for uniform spacing in z, = 1 for log spacing in z, = 2 for uniform spacing in number of objects
rbinmin = 0.0001 #in Mpc/h, can't be zero if doing log binning
nrbins = 100
rbinspace = boxside/nrbins #in Mpc/h
rbinmax = boxside/2.0 #in Mpc/h


if zlogbin == 0: rbins = makeRbins(rbinmin,rbinmax,rbinspace)
if zlogbin == 1: rbins = logspace(start = log10(rbinmin), stop = log10(rbinmax), num=nrbins+1, base = 10)

#Catalog sizes
psubsize=1000000
ssubsize=1000000


#Make the spectroscopic and photometric galaxies from mock catalogs
#Return selection functions for photometric and spectroscopic data sets
sfphoto,sfspec,bins=makeSpecPhotoGalaxyFiles(mockLRGfile,workingDir,nxicorr,photofile,specfile)

#Read the spectroscopic and photometric galaxy positions back in
specX,specY,specZ,specDec,specRa,specCD,photoX,photoY,photoZ,photoDec,photoRa,photoCD=getSpecPhotoData(photofile,specfile,psubsize,ssubsize,boxside)

#Make rbins that are uniform in number of galaxies per bin
nspec = size(specCD)
nspecperbin = round(nspec*1.0/nrbins)
sortspecCD = sort(specCD)
rbins2 = linspace(start = min(specCD), stop = max(specCD), num = nrbins+1)
for i in range(nrbins-1): rbins2[i+1] = sortspecCD[(i+1)*nspecperbin-1]

if zlogbin == 2: rbins = rbins2


#Write rbins to a file
writeRbinsToFile(rbinsfile, rbins)

#Read rbins from file
rbins = readRbinsFromFile(rbinsfile)
rbinmin = min(rbins)
rbinmax = max(rbins)
nrbins = len(rbins) - 1


#Write these constants to a file such that if you want to rerun
#You can just read from this file.
writeRunConstantsToFileAllNew(mockLRGfile,photofile,specfile,corr2dfile,corr3dfile,photo2dfile,photoCDfile, \
spec2dfile,spec3dfile,argumentFile,runConstantsFile,wpsMatrixFile,xiMatrixFile,rbinsfile,oversample,corrBins,mincorr, \
maxcorr, boxside,xicorrmin,xicorrmax,nxicorr,rbinmin,rbinmax,nrbins,psubsize,ssubsize,tlogbin,rlogbin,zlogbin)


#get mask (this assumes that the data is in a contiguous box)
minRaP,maxRaP,minDecP,maxDecP,minCDP,maxCDP=getmask(photoRa,photoDec,photoCD)
minRaS,maxRaS,minDecS,maxDecS,minCDS,maxCDS=getmask(specRa,specDec,specCD)

#write masks to file
writemasks(workingDir,minRaP,maxRaP,minDecP,maxDecP,minCDP,maxCDP,minRaS,maxRaS,minDecS,maxDecS,minCDS,maxCDS)

#run cross-correlation function on spectroscopic and photometric data
#run if linear theta binning
if tlogbin == 0: runcrossCorr(workingDir, argumentFile, corr2dfile, spec2dfile, photo2dfile, nrbins, minRaS, maxRaS, minDecS, maxDecS, minRaP, maxRaP, minDecP, maxDecP, mincorr, maxcorr, corrBins) #no log bins

#run if log theta binning (elif, so the warning only fires when log binning was requested)
elif (tlogbin == 1) & (mincorr > 0): runcrossCorrLog(workingDir, argumentFile, corr2dfile, spec2dfile, photo2dfile, nrbins, minRaS, maxRaS, minDecS, maxDecS, minRaP, maxRaP, minDecP, maxDecP, mincorr, maxcorr, corrBins) # log bins
else: print "mincorr must be > 0"



#run auto correlation function on spectroscopic data
#run if linear r binning
if rlogbin == 0: runautoCorr(workingDir,corr3dfile,xicorrmin,xicorrmax,nxicorr,rbins,specX,specY,specZ,specRa,specDec,specCD) #no log bins

#run if log r binning (elif, as above)
elif (rlogbin == 1) & (xicorrmin > 0): runautoCorrLog(workingDir,corr3dfile,xicorrmin,xicorrmax,nxicorr,rbins,specX,specY,specZ,specRa,specDec,specCD) # log bins
else: print "xicorrmin must be > 0"

Running now, plots to follow soon!

Wednesday, May 11, 2011

More Testing of ximat and sximat

Talked to Alexia. She said there are probably several things going on that could explain the results from yesterday's post. First of all, I had reverted to using an older version of crl when doing this reconstruction; this older version didn't use interpolation, but instead assumed that the correlation functions are power-laws. This explains why they are just straight lines.

Also, she thinks that perhaps something is going wrong with the conversion between Mpc and Gpc, which might be the cause of the offset.

She suggested I go back to using the latest version of the code (with the interpolation turned on) and see what these same plots look like.

The reconstruction using the latest version of the code doesn't match very well now; however, there isn't a lot of variation when changing the binning between 39-41:


However, when I go to fewer bins (14-15), we do see a lot of variation in the reconstruction:
In terms of how ximat and sximat compare to the correlation functions, the main thing I notice is that they don't overlap at very many points. I would think we would want more points to overlap to properly do the conversion: