
Tuesday, May 31, 2011

More Reconstruction Test

I ran the reconstruction with a different selection function to see if that made things better. The photometric distribution is shown here (workingDir is here: 'run2011530_1632'):


I'm still having trouble finding phibins that don't trigger the error. But here are some reconstructions; the first one looks good-ish, the second one not so much:

Note that the label on the y-axis is not right: there are more objects in each bin than it shows. I had to normalize the curves to put them on the same scale.

I'm running again with small angles to see if this helps. This smaller angle run is here (workingDir = 'run2011531_1451').

The angular correlation functions are freaking out at small angles. Not sure what that is about:



They look normal starting around 10^-3:


I tried doing the reconstruction both using all the angular bins and using just the angular bins that weren't freaking out; they look the same:

fewer angular bins:
all angular bins:

Here are some other reconstruction binnings. Disturbingly, I am still getting the "feature" where changing the number of bins slightly (from 15 to 16) gives very different reconstructions:




Because I am still not sure how this "non-uniform" binning in r affects the reconstruction, I am running again, but with uniform binning. We'll see how that goes.

Tuesday, May 24, 2011

Reconstruction (Log Binning) Results

In this post I discussed trying different binning to get the reconstruction to overlap more with the actual correlation functions.

Here is the binning I am using. The binning is log in theta (for 2d correlation function) and rlos (for 3d correlation function), but the binning in redshift is such that each bin has the same number of galaxies.
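A minimal sketch of how these two binning schemes can be generated (illustrative helper names, not the actual run code — that appears in the May 17 post below):

```python
import numpy as np

def equal_count_bins(values, nbins):
    """Bin edges such that each bin holds roughly the same number of objects,
    by placing edges at evenly spaced quantiles of the sorted sample."""
    sorted_vals = np.sort(values)
    idx = np.linspace(0, len(sorted_vals) - 1, nbins + 1).astype(int)
    return sorted_vals[idx]

def log_bins(vmin, vmax, nbins):
    """Logarithmically spaced bin edges (used here for theta and rlos)."""
    return np.logspace(np.log10(vmin), np.log10(vmax), nbins + 1)
```

So the theta and rlos arrays below come from something like `log_bins`, while rbins comes from the equal-count scheme, which is why its edges crowd together at large r where there are more galaxies per unit distance.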

Here are the arrays:

rbins = array([ 4.592407, 136.86079 , 170.595585, 194.675063, 211.888514,
227.822058, 242.439338, 254.738113, 267.180894, 277.596667,
287.557931, 297.031654, 305.563449, 314.144417, 322.061375,
329.842934, 337.956897, 345.179652, 352.14308 , 359.026505,
365.954462, 372.052363, 378.261215, 384.175881, 389.837465,
395.409799, 400.562692, 405.823926, 411.006328, 416.036745,
420.853707, 425.608283, 430.068758, 434.598696, 438.968533,
443.201263, 447.527471, 451.469393, 455.440266, 459.405314,
463.13316 , 466.657868, 470.378975, 474.188462, 478.033728,
481.934477, 485.61191 , 489.252768, 492.81945 , 496.462536,
499.999607])

theta = array([ 1.12201850e-03, 1.41253750e-03, 1.77827940e-03,
2.23872110e-03, 2.81838290e-03, 3.54813390e-03,
4.46683590e-03, 5.62341330e-03, 7.07945780e-03,
8.91250940e-03, 1.12201845e-02, 1.41253754e-02,
1.77827941e-02, 2.23872114e-02, 2.81838293e-02,
3.54813389e-02, 4.46683592e-02, 5.62341325e-02,
7.07945784e-02, 8.91250938e-02, 1.12201845e-01,
1.41253754e-01, 1.77827941e-01, 2.23872114e-01,
2.81838293e-01, 3.54813389e-01, 4.46683592e-01,
5.62341325e-01, 7.07945784e-01, 8.91250938e-01,
1.12201845e+00, 1.41253754e+00, 1.77827941e+00,
2.23872114e+00, 2.81838293e+00, 3.54813389e+00,
4.46683592e+00, 5.62341325e+00, 7.07945784e+00,
8.91250938e+00])

rlos = array([ 0.52962686, 0.59425111, 0.66676072, 0.74811783,
0.83940201, 0.94182454, 1.05674452, 1.18568685,
1.33036253, 1.49269131, 1.6748272 , 1.87918702,
2.10848252, 2.36575629, 2.65442222, 2.97831072,
3.34171959, 3.74947105, 4.20697571, 4.72030438,
5.29626863, 5.94251114, 6.66760716, 7.48117828,
8.39402009, 9.41824545, 10.5674452 , 11.85686853,
13.3036253 , 14.92691309, 16.74827196, 18.79187021,
21.08482517, 23.65756295, 26.54422221, 29.78310718,
33.41719588, 37.49471047, 42.06975708, 47.20304381])

Here are plots of the 2D and 3D correlation functions:



I've found it hard to find phibin spacing that doesn't go below the limits of the correlation function. I get the following error for many of the binnings I've tried:

sximat=crl.getximat(th,rlos,phibins,rs,xispec,xierr)
0.000505647568659 is less than minimum r_ss 0.000529627
refusing to extrapolate

Perhaps I should allow it to extrapolate to smaller angles....
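In the meantime, one way to avoid hand-hunting for binnings is to test candidate phibins arrays against getximat automatically. This is a hypothetical wrapper and assumes getximat raises an exception when it refuses to extrapolate (if it only prints the message, the loop would need to check the return value instead):

```python
def find_working_phibins(candidates, th, rlos, rs, xispec, xierr, getximat):
    """Try each candidate phibins array and keep the ones getximat accepts.

    Assumes getximat raises when a requested separation falls below the
    minimum tabulated r_ss ("refusing to extrapolate")."""
    working = []
    for phibins in candidates:
        try:
            getximat(th, rlos, phibins, rs, xispec, xierr)
            working.append(phibins)
        except Exception:
            pass  # this binning dips below the tabulated range; skip it
    return working
```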

But the following phibins array works:

phibins = array([ 0.01 , 0.03578947, 0.06157895, 0.08736842, 0.11315789,
0.13894737, 0.16473684, 0.19052632, 0.21631579, 0.24210526,
0.26789474, 0.29368421, 0.31947368, 0.34526316, 0.37105263,
0.39684211, 0.42263158, 0.44842105, 0.47421053, 0.5 ])

Here is what the xi and sximat plots look like; as you can see, they overlap more now:

But the reconstruction isn't so great:


If I change the binning (here with 50 bins):

phibins = array([ 0.00459241, 0.13686079, 0.17059558, 0.19467506, 0.21188851,
0.22782206, 0.24243934, 0.25473811, 0.26718089, 0.27759667,
0.28755793, 0.29703165, 0.30556345, 0.31414442, 0.32206137,
0.32984293, 0.3379569 , 0.34517965, 0.35214308, 0.3590265 ,
0.36595446, 0.37205236, 0.37826121, 0.38417588, 0.38983746,
0.3954098 , 0.40056269, 0.40582393, 0.41100633, 0.41603674,
0.42085371, 0.42560828, 0.43006876, 0.4345987 , 0.43896853,
0.44320126, 0.44752747, 0.45146939, 0.45544027, 0.45940531,
0.46313316, 0.46665787, 0.47037898, 0.47418846, 0.47803373,
0.48193448, 0.48561191, 0.48925277, 0.49281945, 0.49646254,
0.49999961])

Another issue: with so many theta, rlos, and r bins, crl.getCoeff now takes a long time to run.


I suppose this looks better. At this point I need to either try a different binning, or rewrite crl.getximat to allow extrapolation to smaller angles/separations so that I can try different reconstructions. Opinions, Alexia?
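If I do end up rewriting getximat, the extrapolation could be as simple as a power law anchored to the two smallest tabulated separations. A sketch (not the actual crl code):

```python
import numpy as np

def extrapolate_xi(r_query, r_tab, xi_tab):
    """Extrapolate a tabulated correlation function below its smallest
    separation, assuming xi ~ r**alpha with alpha fit to the two
    smallest tabulated points. Valid only while xi_tab[0], xi_tab[1] > 0."""
    alpha = (np.log(xi_tab[1]) - np.log(xi_tab[0])) / \
            (np.log(r_tab[1]) - np.log(r_tab[0]))
    return xi_tab[0] * (r_query / r_tab[0]) ** alpha
```

A real correlation function is close to a power law at small separations, so this is probably safer than refusing outright, though it would still be worth flagging how far below the tabulated range each query falls.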

Tuesday, May 17, 2011

Reconstruction with Log Binning

I spent a while playing with the binning that goes into getximat to try to get the points in sximat to overlap more with the correlation functions, but I just couldn't get more than a couple to overlap within the current binning limitations.

Here is code that I used to play with this stuff: ../logs/110513log.py

So I've decided to re-run with some different binnings and see if this helps.

I spent the last day coding up automated log binning, plus binning with an equal number of galaxies in each bin. Alexia said she wasn't sure how the reconstruction would work with this binning, so we'll see how it turns out.

Here is the code (also in ../logs/110516log.py):

~~~~~~~~~~~

import numpy as N
from pylab import *
from matplotlib import *
import crlnew as crl
from correlationData import *
import galsphere as gals
import datetime as dt

# Makes the working directory for this run.
# Creates directory and returns string of directory name.
workingDir = getworkingdir()

# Create file names
mockLRGfile,photofile,specfile,corr2dfile,corr3dfile,photo2dfile,photoCDfile,spec2dfile,\
spec3dfile,argumentFile,runConstantsFile,wpsMatrixFile,xiMatrixFile,rbinsfile=makeFileNamesNew(workingDir)


#Angular Correlation
oversample = 5.
corrBins = 80.0 #should be one more than you want it to be
mincorr = 0.0001
maxcorr = 10.0
convo = 180./pi
tlogbin = 1 # = 0 for uniform spacing in theta, = 1 for log spacing in theta


#Auto Correlation
boxside = 1000.0 #in Mpc/h
xicorrmin = 0.0001
xicorrmax = 20.0
nxicorr = 40
rlogbin = 1 # = 0 for uniform spacing in r, = 1 for log spacing in r

#make redshift bins
#boxside = 1000.0 #in Mpc/h
zlogbin = 2 # = 0 for uniform spacing in z, = 1 for log spacing in z, = 2 for uniform spacing in number of objects
rbinmin = 0.0001 #in Mpc/h, can't be zero if doing log binning
nrbins = 100
rbinspace = boxside/nrbins #in Mpc/h
rbinmax = boxside/2.0 #in Mpc/h


if zlogbin == 0: rbins = makeRbins(rbinmin,rbinmax,rbinspace)
if zlogbin == 1: rbins = logspace(start = log10(rbinmin), stop = log10(rbinmax), num=nrbins+1, base = 10)

#Catalog sizes
psubsize=1000000
ssubsize=1000000


#Make the spectroscopic and photometric galaxies from mock catalogs
#Return selection functions for photometric and spectroscopic data sets
sfphoto,sfspec,bins=makeSpecPhotoGalaxyFiles(mockLRGfile,workingDir,nxicorr,photofile,specfile)

#Read the spectroscopic and photometric galaxy positions back in
specX,specY,specZ,specDec,specRa,specCD,photoX,photoY,photoZ,photoDec,photoRa,photoCD=getSpecPhotoData(photofile,specfile,psubsize,ssubsize,boxside)

#Make rbins that are uniform in number of galaxies per bin
nspec = size(specCD)
nspecperbin = round(nspec*1.0/nrbins)
sortspecCD = sort(specCD)
rbins2 = linspace(start = min(specCD), stop = max(specCD), num = nrbins+1)
for i in range(nrbins-1): rbins2[i+1] = sortspecCD[(i+1)*nspecperbin-1]

if zlogbin == 2: rbins = rbins2


#Write rbins to a file
writeRbinsToFile(rbinsfile, rbins)

#Read rbins from file
rbins = readRbinsFromFile(rbinsfile)
rbinmin = min(rbins)
rbinmax = max(rbins)
nrbins = len(rbins) - 1


#Write these constants to a file such that if you want to rerun
#You can just read from this file.
writeRunConstantsToFileAllNew(mockLRGfile,photofile,specfile,corr2dfile,corr3dfile,photo2dfile,photoCDfile, \
spec2dfile,spec3dfile,argumentFile,runConstantsFile,wpsMatrixFile,xiMatrixFile,rbinsfile,oversample,corrBins,mincorr, \
maxcorr, boxside,xicorrmin,xicorrmax,nxicorr,rbinmin,rbinmax,nrbins,psubsize,ssubsize,tlogbin,rlogbin,zlogbin)


#get mask (this assumes that the data is in a contiguous box)
minRaP,maxRaP,minDecP,maxDecP,minCDP,maxCDP=getmask(photoRa,photoDec,photoCD)
minRaS,maxRaS,minDecS,maxDecS,minCDS,maxCDS=getmask(specRa,specDec,specCD)

#write masks to file
writemasks(workingDir,minRaP,maxRaP,minDecP,maxDecP,minCDP,maxCDP,minRaS,maxRaS,minDecS,maxDecS,minCDS,maxCDS)

#run cross-correlation function on spectroscopic and photometric data
#run if linear theta binning
if tlogbin == 0: runcrossCorr(workingDir, argumentFile, corr2dfile, spec2dfile, photo2dfile, nrbins, minRaS, maxRaS, minDecS, maxDecS, minRaP, maxRaP, minDecP, maxDecP, mincorr, maxcorr, corrBins) #no log bins


#run if log theta binning
if tlogbin == 1:
    if mincorr > 0: runcrossCorrLog(workingDir, argumentFile, corr2dfile, spec2dfile, photo2dfile, nrbins, minRaS, maxRaS, minDecS, maxDecS, minRaP, maxRaP, minDecP, maxDecP, mincorr, maxcorr, corrBins) # log bins
    else: print "mincorr must be > 0" # log binning needs a positive minimum angle



#run auto correlation function on spectroscopic data
#run if linear r binning
if rlogbin == 0: runautoCorr(workingDir,corr3dfile,xicorrmin,xicorrmax,nxicorr,rbins,specX,specY,specZ,specRa,specDec,specCD) #no log bins


#run if log r binning
if rlogbin == 1:
    if xicorrmin > 0: runautoCorrLog(workingDir,corr3dfile,xicorrmin,xicorrmax,nxicorr,rbins,specX,specY,specZ,specRa,specDec,specCD) # log bins
    else: print "xicorrmin must be > 0" # log binning needs a positive minimum separation

Running now, plots to follow soon!

Tuesday, May 10, 2011

Different Reconstruction

Here is the reconstruction I ran last night with a different photometric distribution function, below shows the photometric and spectroscopic distributions:

I used the same number of bins and angles as yesterday's post. Here are the correlation functions:


And here are the reconstructions with several different binnings. None of them match as well as yesterday's post, but both 40 and 39 are close-ish:

I talked to Alexia this morning; she suggested that I do the binning sensitivity check and also look at how xmpl_rv (in getximat) compares to sximat. I'm doing that now. More posts to come shortly.
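For the xmpl_rv vs sximat comparison, a simple element-wise relative-difference check might be enough to flag where they disagree (a hypothetical helper; it assumes the two matrices have the same shape):

```python
import numpy as np

def rel_diff(a, b, eps=1e-30):
    """Element-wise relative difference |a - b| / |b|, guarded against
    division by zero with a tiny eps floor."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.abs(a - b) / np.maximum(np.abs(b), eps)

# e.g. rel_diff(xmpl_rv, sximat).max() gives the worst-matching element,
# and np.unravel_index(rel_diff(...).argmax(), sximat.shape) locates it
```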

Here's the code to make the following (also in logs/110509log.py):
import numpy as N
from pylab import *
from matplotlib import *
import crlold as crl
from correlationData import *
import galsphere as gals
import datetime as dt

workingDir = 'run201159_123'

runConstantsFile = './'+workingDir+'/runConstants.dat'

mockLRGfile,photofile,specfile,corr2dfile,corr3dfile,photo2dfile,photoCDfile,spec2dfile,spec3dfile,\
argumentFile,runConstantsFile,wpsMatrixFile,xiMatrixFile,oversample,corrBins,mincorr,maxcorr,boxside,\
xicorrmin,xicorrmax,nxicorr,rbinmin,rbinmax,nrbins,psubsize,ssubsize=readRunConstantsFileAll(runConstantsFile)

rbinspace = int((rbinmax-rbinmin)/(nrbins))
rbins = makeRbins(rbinmin,rbinmax,rbinspace)


#read in masks
minRaP,maxRaP,minDecP,maxDecP,minCDP,maxCDP,minRaS,maxRaS,minDecS,maxDecS,minCDS,maxCDS=readmasks(workingDir)


#Import the photometric redshift distribution
photoCD = readPhotoCDfile(photoCDfile)


#Create a matrix to input the correlation function data
# Write wps Matrix to file
#The number of redshift bins you want to skip for the
#reconstruction (this removes noisy bins from calculation)
skip = 0
makeWPSmatrix2(workingDir,rbins,corrBins, wpsMatrixFile, skip)

#Create a matrix to input the correlation function data
# Write Xi Matrix to file
makeXimatrix2(workingDir,rbins,nxicorr,xiMatrixFile, skip)

#Plot wps correlation functions
numtoplot = int(nrbins-skip) #number of correlation functions you want to display on plot
plotWPS(skip, numtoplot, workingDir, rbins)
numtoplot = 10
plotWPSScreen(skip, numtoplot, workingDir, rbins)

#Plot xi correlation functions
plotXi(skip, numtoplot, workingDir, rbins)
plotXiScreen(skip, numtoplot, workingDir, rbins)


#from commandsAlexia import *
xissfile = xiMatrixFile
wpsfile = wpsMatrixFile
nphibins = size(rbins)

# Read in the data
#spectroscopic autocorrelation function
X=N.loadtxt(xissfile,comments='#')
X=X.T
rs=X[0,:]/1000. #want box units (Gpc, not Mpc)
xispec=X[1:,:] #xi in nspecbins


#2D cross correlation between spec and photo samples
th,rlos,wps=N.loadtxt(wpsfile,comments='#',unpack=True)
th=N.unique(th)
rlos=N.unique(rlos)
#rlos=getcomovingD(rlos) #adjusts from redshift to Mpc/h
rlos=rlos/1000 #want box units (Gpc, not Mpc)

#covariance matrix of wps measurements, wps vector arranged
covar = N.identity(wps.shape[0],float)
xierr = N.identity(xispec.shape[1],float)

# compute the xi matrix using the spec sample,
npbins=39#the number of bins you want reconstructed
reconstructmax = 0.5 #in Gpc
reconstructionmin = 0.0 #in Gpc
phibsize = (reconstructmax - reconstructionmin)/npbins

phibins = linspace(start=reconstructionmin,stop=reconstructmax,num=npbins,endpoint=True)

#phibins =(linspace(2,npbins,num=npbins,endpoint=False)+0.5)*(1./npbins)
sximat=crl.getximat(th,rlos,phibins,rs,xispec,xierr)

# invert to recover phi
phircvd=crl.getCoeff(wps,covar,sximat)
# zero out unphysical negative coefficients in the recovered phi
for i in range(len(phircvd.x)):
    if (phircvd.x[i] < 0):
        phircvd.x[i]=0

print 'the recovered selection function is not normalized.'

outphi4 = phircvd.x
phibins4 = phibins

start = 1
end = npbins - 1

#junkbins = linspace(start=0,stop=0.5,num=100,endpoint=True)
junkbins = phibins
junk = histogram(photoCD/1000., bins=junkbins)
junkdata = junk[0]
plot(junkbins[start:end], junkdata[start:end], 'r.-', label = 'photometric data')
plot((phibins4[start:end]+1.0*phibsize), outphi4[start:end]*sum(junkdata[start:end])/sum(outphi4[start:end]), 'g-', label='reconstruction %i'%npbins)

xlabel('comoving distance (Gpc/h)')
ylabel('number of objects per bin')
title('Redshift Distribution')
from pylab import *
legend(loc=2)


Monday, May 9, 2011

Going back to Working Reconstruction

I spent today trying to reproduce the result from this post when the reconstruction was working on the Mock Data.

I got it working... here is the code (also in ../logs/110509log.py):

#~~~~~~~~~~~~~~~~~

import numpy as N
from pylab import *
from matplotlib import *
import crlold as crl
from correlationData import *
import galsphere as gals
import datetime as dt

workingDir = 'run2010211_1354'

# Create file names
mockLRGfile,photofile,specfile,corr2dfile,corr3dfile,photo2dfile,photoCDfile,spec2dfile,\
spec3dfile,argumentFile,runConstantsFile,wpsMatrixFile,xiMatrixFile=makeFileNames(workingDir)


oversample = 5.
corrBins = 40.0 #should be one more than you want it to be
mincorr = 0.01
maxcorr = 20.0
convo = 180./pi
corrspace = (maxcorr - mincorr)/corrBins
corrarray = makecorrarray(mincorr, maxcorr, corrspace)

nbins = 40
boxside = 1000.0 #in Mpc/h
xicorrmin = 0.001
xicorrmax = 20.0
nxicorr = 40

#make redshift bins
#boxside = 1000.0 #in Mpc/h
rbinmin = 0 #in Mpc/h
rbinspace = boxside/40 #in Mpc/h
rbinmax = boxside/2.0 + 0.0001 #in Mpc/h
rbins = makeRbins(rbinmin,rbinmax,rbinspace)
nrbins = rbins.size - 1


photoCD = readPhotoCDfile(photoCDfile)


#Create a matrix to input the correlation function data
# Write wps Matrix to file
#The number of redshift bins you want to skip for the
#reconstruction (this removes noisy bins from calculation)
skip = 0
makeWPSmatrix2(workingDir,rbins,corrBins, wpsMatrixFile, skip)

#Create a matrix to input the correlation function data
# Write Xi Matrix to file
makeXimatrix2(workingDir,rbins,nxicorr,xiMatrixFile, skip)


#Plot wps correlation functions
numtoplot = int(nrbins-skip) #number of correlation functions you want to display on plot
numtoplot = 10
plotWPSScreen(skip, numtoplot, workingDir, rbins)
pylab.legend(loc=1)

#Plot xi correlation functions
plotXiScreen(skip, numtoplot, workingDir, rbins)


# compute the xi matrix using the spec sample
# (th, rlos, rs, xispec, xierr, wps, and covar must be loaded from the
# wps/xi matrix files first, exactly as in the May 10 script above)
npbins = 40 #the number of bins you want reconstructed
reconstructmax = 0.5 #in Gpc
reconstructionmin = 0.0 #in Gpc
phibsize = (reconstructmax - reconstructionmin)/npbins #used in the plotting below

phibins = linspace(start=reconstructionmin,stop=reconstructmax,num=npbins,endpoint=True)
#phibins =(linspace(2,npbins,num=npbins,endpoint=False)+0.5)*(1./npbins)
sximat=crl.getximat(th,rlos,phibins,rs,xispec,xierr)

# invert to recover phi
phircvd=crl.getCoeff(wps,covar,sximat)
# zero out unphysical negative coefficients in the recovered phi
for i in range(len(phircvd.x)):
    if (phircvd.x[i] < 0):
        phircvd.x[i]=0

print 'the recovered selection function is not normalized.'

outphi4 = phircvd.x
phibins4 = phibins

start = 1
end = npbins - 1

#junkbins = linspace(start=0,stop=0.5,num=100,endpoint=True)
junkbins = phibins
junk = histogram(photoCD/1000., bins=junkbins)
junkdata = junk[0]
plot(junkbins[start:end], junkdata[start:end], 'r.-', label = 'photometric data')
plot((phibins4[start:end]+1.0*phibsize), outphi4[start:end]*sum(junkdata[start:end])/sum(outphi4[start:end]), 'g-', label = 'phi')


plot(phibins4[start:end],outphi4[start:end]*sum(junkdata[start:end])/sum(outphi4[start:end]),'c.-',label='reconstruction %i'%npbins)
#plot((phibins4[2:]-1.5*phibsize)/1000, phi[2:]*size(photoCD), 'g-', label = 'phi')

xlabel('comoving distance (Gpc/h)')
ylabel('number of objects per bin')
title('Redshift Distribution')
from pylab import *
legend(loc=2)

#~~~~~~~~~~~~~~~~~


It involved reverting to an old version of crl (I saved it as crlold.py) and an older version of the plotting code.

Here are the correlation functions:



I am getting the same problem where slight changes to the binning change how well the reconstruction works. For instance, the best reconstruction is here (with 40 bins):


But if I change to 39 or 41 bins, I get a very different reconstruction:


Which makes me wonder if it was just dumb luck that it worked for that particular binning in the first place.

I'm also running with a more interesting photometric distribution function to see how that looks. More to come....

Friday, May 6, 2011

Computing Time

One interesting thing I've realized in trying to compare the 2D auto-correlation with the 3D auto-correlation (on the same files) is that the 2D auto-correlation takes MUCH longer to compute than the 3D. This is strange because in this test they are computing the same number of pairs, barring some differences in the gridding. I must be doing something very inefficient in the 2D calculation, or measuring pairs out to much larger distances than in the 3D case. The 3D auto-correlation function took less than a minute to compute; the 2D auto-correlation function has been running for 51 minutes and is still going.

Here are the 2D and 3D autocorrelation functions on the whole set (one big redshift bin):



Below is the 3D and 2D autocorrelation on a smaller (100Mpc/h) redshift bin:



At this redshift (well, actually comoving distance: 350 Mpc/h), what do these angular separations correspond to in Mpc/h?

What noise levels would we expect just from shot noise?

How do we expect the slopes to compare?
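A quick estimate for the first question: in the small-angle limit, the transverse separation is r_perp ≈ D·θ with θ in radians (the sample angles below are illustrative, not the exact bin edges used in the run):

```python
import numpy as np

D = 350.0                                      # comoving distance, Mpc/h
theta_deg = np.array([0.001, 0.01, 0.1, 1.0])  # sample angular separations, degrees
r_perp = D * np.radians(theta_deg)             # transverse separation, Mpc/h
# 1 degree at D = 350 Mpc/h corresponds to roughly 6.1 Mpc/h,
# so the smallest angles in these runs probe sub-Mpc/h transverse scales
```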

I talked to David, Eric, Nic and Matt about this. They gave me some references to check out and also suggested that, just as a check, I set the redshifts in the 3D correlation function to all be the same, then compute the 3D and 2D and compare them; they should then be basically the same, or at least proportional to each other.

David also suggested, as a way to figure out where my troubles are coming from, going back to the mocks (where the reconstruction was working), applying the BOSS geometry to those mocks, and seeing if I can get the code to work on the mocks with the BOSS masks. Once that is working, swap in the BOSS data.

It's worth a try!

References to read:
Dodelson
Peebles 1980
Limber 1954

Still need to do the binning experiments that Alexia suggested. So much to do, so few hours in the day.

BOSS Reconstruction Plots

Here is a reconstruction I tried on the BOSS data. I was hoping that fixing the 2D correlation function might help things, but it doesn't seem to be working that well. Here are the correlation functions:



The 2D correlation functions are still noisier, which is mystifying because they have more objects being correlated. I can only assume this has something to do with the different clustering properties of the two sets. I am going to try running the 2D correlation function on the same files as the 3D to see if it is still as noisy. If it is, then perhaps that signifies there is still a problem there.

The reconstructions aren't great, but they aren't bouncing around as much as they were before either.









Wednesday, July 7, 2010

Alexia Mimic Reconstruction

In order to pin down the differences between Alexia's reconstruction and mine, I've tried to mimic the conditions under which she did the reconstruction for her paper.

The binning is as follows:
#Angular Correlation
corrBins = 40.0 #should be one more than you want it to be
mincorr = 6.0 #in degrees
maxcorr = 12.0 #in degrees

#Auto Correlation
xicorrmin = 5.0
xicorrmax = 50.0
nxicorr = 40

#make redshift bins
rbinmin = 0.0 #in Mpc/h
rbinspace = 100.0 #in Mpc/h
rbinmax = 500.0 #in Mpc/h
nrbins = 5.0

Here are plots...

Correlation Functions

Distribution Functions and Data

Best Fit Reconstruction

A couple things I am noticing...
First: I still get the problem that changing the binning makes the reconstruction go all weird.
Second: Even in this best-fit scenario there is a mismatch between the reconstruction and the data. Could there be a problem with the alignment of my bins? Perhaps something with left/right binning in the histogram?
Third: The reconstruction tends to go weird at the ends, even though both ends are contained within the redshift bins.
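On the left/right binning question in the second point: np.histogram counts values in half-open intervals [edge_i, edge_{i+1}), so a curve plotted against the left edges sits half a bin to the left of one plotted against bin centers. A minimal illustration (synthetic values standing in for photoCD/1000.):

```python
import numpy as np

edges = np.linspace(0.0, 0.5, 11)            # 10 bins over the box half-width
vals = np.linspace(0.0, 0.4999, 1000)        # stand-in for photoCD/1000.
counts, _ = np.histogram(vals, bins=edges)   # counts in [edge_i, edge_{i+1})
centers = 0.5 * (edges[:-1] + edges[1:])     # plot counts against these
# plotting counts against edges[:-1] instead shifts the whole histogram
# curve half a bin to the left relative to the reconstruction
```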

Some other reconstructions that aren't working as well:

The log file for this is here: ../logs/100707log.py
The run directory is here: ../pythonJess/run201077_1424