
Monday, April 2, 2012

Pretty Plots - 2D Histogram with 1D Histograms on Axes

Note: This post has been cross-posted to AstroBetter with a slightly more readable version of the code.  You might want to check it out there.

Among the long list of plots that Martin wants me to add to the paper draft I sent out last week was the following request:

Is it possible to show the histograms projected along each axis in addition to the 2D density? I know people do this in IDL frequently, I'm not sure how to do this in matplotlib. If it's possible we could play a similar game with the L-z figure. The 1D histograms contain lots of useful information, including how significant our clustering detection is in each bin.

Ask and you shall receive, Martin. Below is the "money plot" (the 2D temperature histogram, which shows the amplitudes of the bright versus dim correlation functions) with a 1D histogram alongside each axis.


I based this plot on code from here.

There are some cool features that I'll describe in the comments below. I think this plot rocks.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from matplotlib.ticker import NullFormatter, MaxNLocator

def ellipse(ra, rb, ang, x0, y0, Nb=50):
    # Parametric ellipse; stands in for the scipy-cookbook helper the
    # original code assumed was already defined
    theta = np.linspace(0.0, 2.0*np.pi, Nb)
    X = x0 + ra*np.cos(theta)*np.cos(ang) - rb*np.sin(theta)*np.sin(ang)
    Y = y0 + ra*np.cos(theta)*np.sin(ang) + rb*np.sin(theta)*np.cos(ang)
    return X, Y

def makeTempHistogramPlot(xdata, ydata, rexp, filename=None, xlims=None, ylims=None,
                          nxbins=50, nybins=50, bw=0, nbins=100, contours=1,
                          sigma=1, line=1):
    # bw = 0 for color, 1 for black and white
    # line = 0 for no line, 1 for line
    # sigma = 1 to display % below line, 0 for not
    # contours = 1 to display 1,2,3 sigma contours, 0 for not

    # Define the x and y data
    x = xdata
    y = ydata

    # Set up default x and y limits
    if xlims is None: xlims = [0, max(x)]
    if ylims is None: ylims = [0, max(y)]

    # Set up your x and y labels
    xlabel = '$\\mathrm{Your\\ X\\ Label}$'
    ylabel = '$\\mathrm{Your\\ Y\\ Label}$'

    # Define the locations for the axes
    left, width = 0.12, 0.55
    bottom, height = 0.12, 0.55
    bottom_h = left_h = left + width + 0.02

    # Set up the geometry of the three plots
    rect_temperature = [left, bottom, width, height]  # dimensions of temp plot
    rect_histx = [left, bottom_h, width, 0.25]        # dimensions of x-histogram
    rect_histy = [left_h, bottom, 0.25, height]       # dimensions of y-histogram

    # Set up the size of the figure
    fig = plt.figure(1, figsize=(9.5, 9))

    # Make the three plots
    axTemperature = plt.axes(rect_temperature)  # temperature plot
    axHistx = plt.axes(rect_histx)              # x histogram
    axHisty = plt.axes(rect_histy)              # y histogram

    # Remove the inner axes numbers of the histograms
    nullfmt = NullFormatter()
    axHistx.xaxis.set_major_formatter(nullfmt)
    axHisty.yaxis.set_major_formatter(nullfmt)

    # Find the min/max of the data
    xmin, xmax = min(xlims), max(xlims)
    ymin, ymax = min(ylims), max(ylims)

    # Make the 'main' temperature plot
    xbins = np.linspace(start=0, stop=xmax, num=nxbins)
    ybins = np.linspace(start=0, stop=ymax, num=nybins)
    aspectratio = 1.0*xmax/(1.0*ymax)
    H, xedges, yedges = np.histogram2d(y, x, bins=(ybins, xbins))

    # Plot the temperature data
    if bw:
        cax = axTemperature.imshow(H, extent=[xmin, xmax, ymin, ymax],
                                   interpolation='nearest', origin='lower',
                                   aspect=aspectratio, cmap=cm.gist_yarg)
    else:
        cax = axTemperature.imshow(H, extent=[xmin, xmax, ymin, ymax],
                                   interpolation='nearest', origin='lower',
                                   aspect=aspectratio)

    # Plot the temperature plot contours
    contourcolor = 'black' if bw else 'white'

    if contours:
        # contours == 1 draws the ellipses at angle 0; any other nonzero
        # value is interpreted as the ellipse angle in degrees
        xcenter, ycenter = np.mean(x), np.mean(y)
        ra, rb = np.std(x), np.std(y)
        ang = 0 if contours == 1 else contours*np.pi/180.0
        for nsig in (1, 2, 3):
            X, Y = ellipse(nsig*ra, nsig*rb, ang, xcenter, ycenter)
            axTemperature.plot(X, Y, "k:", color=contourcolor, ms=1, linewidth=2.0)
            axTemperature.annotate('$%d\\sigma$' % nsig, xy=(X[15], Y[15]),
                                   xycoords='data', xytext=(10, 10),
                                   textcoords='offset points',
                                   horizontalalignment='right',
                                   verticalalignment='bottom',
                                   fontsize=25, color=contourcolor)

    # Plot the % below line
    belowline = 100.0*np.sum((x - y) > 0.0)/np.size(x)
    if sigma:
        axTemperature.annotate('$%.2f\\%%\\mathrm{\\ Below\\ Line}$' % belowline,
                               xy=(xmax - 100, ymin + 3), fontsize=20,
                               color=contourcolor)

    # Plot the axes labels
    axTemperature.set_xlabel(xlabel, fontsize=25)
    axTemperature.set_ylabel(ylabel, fontsize=25)

    # Make the tickmarks pretty
    for label in axTemperature.get_xticklabels() + axTemperature.get_yticklabels():
        label.set_fontsize(18)
        label.set_family('serif')

    # Plot the line on the temperature plot
    if line:
        axTemperature.plot([-1000, 1000], [-1000, 1000], 'k-', linewidth=2.0,
                           color=contourcolor)

    # Set up the plot limits
    axTemperature.set_xlim(xlims)
    axTemperature.set_ylim(ylims)

    # Set up the histogram bins
    xbins = np.arange(xmin, xmax, (xmax - xmin)/nbins)
    ybins = np.arange(ymin, ymax, (ymax - ymin)/nbins)

    # Plot the histograms
    if bw:
        axHistx.hist(x, bins=xbins, color='silver')
        axHisty.hist(y, bins=ybins, orientation='horizontal', color='dimgray')
    else:
        axHistx.hist(x, bins=xbins, color='blue')
        axHisty.hist(y, bins=ybins, orientation='horizontal', color='red')

    # Set up the histogram limits
    axHistx.set_xlim(0, max(x))
    axHisty.set_ylim(0, max(y))

    # Make the tickmarks pretty
    for label in axHistx.get_yticklabels() + axHisty.get_xticklabels():
        label.set_fontsize(12)
        label.set_family('serif')

    # Cool trick that changes the number of tickmarks for the histogram axes
    axHisty.xaxis.set_major_locator(MaxNLocator(4))
    axHistx.yaxis.set_major_locator(MaxNLocator(4))

    if filename:
        plt.savefig(filename + '.eps', format='eps', transparent=True)
        plt.savefig(filename + '.pdf', format='pdf', transparent=True)
        plt.savefig(filename + '.png', format='png', transparent=True)

    return fig

Friday, March 30, 2012

no display name and $DISPLAY environment variable

I've been spending a lot of time organizing my code for the cross correlation project to run more efficiently. Part of this is setting up my code so that it can process/run a lot of different runs simultaneously. It's been a lot of fun learning all the cool tricks I can do with python to help with this. I should blog about some of these tricks.... but not right now....

This blog is about an error I was getting when I tried to run my newly optimized code on riemann. I use qsub/PBS to submit jobs, and I was getting the following errors in my qsub.out file:

File "/clusterfs/riemann/software/matplotlib/1.0.0/lib64/python2.7/site-packages/matplotlib/pyplot.py", line 270, in figure **kwargs)
File "/clusterfs/riemann/software/matplotlib/1.0.0/lib64/python2.7/site-packages/matplotlib/backends/backend_tkagg.py", line 83, in new_figure_manager
window = Tk.Tk() File "/clusterfs/riemann/software/Python/2.7.1/lib/python2.7/lib-tk/Tkinter.py", line 1685, in __init__
self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: no display name and no $DISPLAY environment variable

I know that when I run jobs on other riemann nodes using qsub, the X11 forwarding doesn't work properly; in other words, I can't view any of the plots I make during an interactive PBS session. For this code, though, I simply wanted to save plots to .pdf files, and for some reason I couldn't find a simple command to plot something in python directly to a file. If anyone out there knows how to do this, please help me!

But I did come across the following from matplotlib's faq page:
~~~~~~~~~~~~~~~~~
How To Generate Python/Matplotlib Images Without Having A Window Appear

The easiest way to do this is use a non-interactive backend such as Agg (for PNGs), PDF, SVG or PS. In your figure-generating script, just call the matplotlib.use() directive before importing pylab or pyplot:

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
plt.plot([1,2,3])
plt.savefig('myfig')
~~~~~~~~~~~~~~~~~
You need to make sure to call matplotlib.use() before pyplot or you will get the following error

UserWarning: This call to matplotlib.use() has no effect because the the backend has already been chosen;
matplotlib.use() must be called *before* pylab, matplotlib.pyplot,
or matplotlib.backends is imported for the first time.
if warn: warnings.warn(_use_error_msg)

I actually had an alias set up so that I call pyplot when I open up python, so I was getting this error until I changed that.

So now I don't get the "$DISPLAY environment variable" error, because python knows to use the non-interactive Agg backend instead of trying to display to the screen. Yippee!
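One way to make the same script work both on a desktop and on the batch nodes (a sketch; adjust to taste) is to pick the backend based on whether $DISPLAY is set:

```python
import os
import matplotlib

# On a headless batch node $DISPLAY is unset, so fall back to the
# non-interactive Agg backend before pyplot is ever imported
if not os.environ.get('DISPLAY'):
    matplotlib.use('Agg')

import matplotlib.pyplot as plt

plt.plot([1, 2, 3])
plt.savefig('myfig.png')  # saving works with or without a display
```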

Friday, October 14, 2011

CosmoloPy

Why didn't I know this package existed before?

http://roban.github.com/CosmoloPy/

Functions Include:
  • Various cosmological densities.
  • Various cosmological distance measures.
  • Galaxy luminosity functions (Schechter functions).
  • Conversion in and out of the AB magnitude system.
  • Pre-defined sets of cosmological parameters (e.g. from WMAP).
  • Perturbation theory and the power spectrum.
  • The reionization of the IGM.
Very cool!



Thursday, October 13, 2011

More python plotting tricks

I absolutely love these python plotting pages with examples:
http://matplotlib.sourceforge.net/gallery.html
http://www.scipy.org/Cookbook/Matplotlib

Today I figured out how to change the fonts of the legend on my plot and add a shadow. (Pretty exciting, I know.) I've also made the lines a little bit thicker and given them different styles (so the figure looks good in a black-and-white printing of the paper).

Below is the code/plot:

import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties

# chi, probsqso, probsqsow, prob are the data arrays from the analysis
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(chi, probsqso, 'k-', label='$\\mathrm{QSOs}$', linewidth=2.0)
ax.plot(chi, probsqsow, 'k:', label='$\\mathrm{QSOs\\ Weighted}$', linewidth=2.0)
ax.plot(chi, prob, 'k--', label='$\\mathrm{Gals}$', linewidth=2.0)

ax.set_xlabel('$\\chi$', fontsize=24)
ax.set_ylabel('$\\mathrm{Normalized\\ }dN/d\\chi$', fontsize=24)
ax.legend(loc=2, shadow=True, prop=FontProperties(size=18))


Friday, September 23, 2011

Blogging Update

I've been a very bad blogger recently. Basically this is because I've been spending all my time working on a project with people at Berkeley, so there hasn't been as much need to post; I was mainly using this blog to communicate with Alexia on our project. Alexia, if you are reading this: I haven't given up on our project, and in fact much of the work I am doing now will be useful for us. However, since I want to graduate this year, I've been pushing forward on this new project, which is working and close to being publishable.

My collaborators at Berkeley have wanted me to keep quiet about my new project's results (until we publish), and so I've not been able to post plots etc. for this project. However, I do find it helpful for organization and motivation to write on this blog, so I think I'll start writing on it again, if nothing else that to say what I did that day.

I've been working a lot on error analysis for my correlation functions. I have bootstrap/jackknife analysis in place and have been doing various fits to the correlation function to try to figure out the errors. It's strange: depending on which fitting method I use, I get wildly different errors... and sometimes it seems like I am getting a statistically significant result (3+ sigma) and other times not.

I've also been much better at writing self-contained code that is easily re-runnable. I used to drive Alexia crazy with all the "cutting and pasting" I was doing. Well, no more. I am now writing object-oriented programs with lots of functions etc. It's actually really nice.

I've been learning a lot more about python. Using cool fitting tools and statistics tools. Python is such a powerful language.

Tuesday, June 14, 2011

Temperature Plots in Python

I figured out how to do temperature plots in python. The only problem with the code below is that the numbers on the axes are wrong. If I set 'extent' to the actual extent of the axes (in this case x = [0.5,1], y = [-28,-18]), it plots a very skinny, tall figure, so I had to correct the axes by hand afterwards.


import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt

# qsoM, qsored are the quasar magnitude/redshift arrays; dr7cut is the sample mask
redbins = np.linspace(start=0.5, stop=1.0, num=51)
rcenter = (redbins[0:-1] + redbins[1:])/2.0
mbins = np.linspace(start=-28, stop=-18, num=51)
mcenter = (mbins[0:-1] + mbins[1:])/2.0

H, xedges, yedges = np.histogram2d(qsoM[dr7cut], qsored[dr7cut], bins=(mbins, redbins))

plt.imshow(H, extent=[-3, 3, -3, 3], interpolation='nearest', origin='lower')
plt.colorbar()

plt.xlabel('redshift (z)')
plt.ylabel('absolute magnitude (i-band)')
plt.title('SDSS DR7 Quasar Density')

The above code is also in the following log file: ../logs/110614log.py

Wednesday, June 8, 2011

1st Attempt at QSO-Galaxy Cross Correlation

I've calculated the qso-galaxy correlation functions on the entire stripe, and also the stripe divided into two halves.
The code to do this is below (also in ../qsobias/Correlate/runCorrelation.py):

import numpy as N
from pylab import *
from correlationFunctions import *

#------------------------------------------------------------------------
# Set Angular Correlation Bin Information
#------------------------------------------------------------------------

oversample = 5. # Amount that randoms should be oversampled
corrBins = 15.0 # Number of correlation bins (+1)
mincorr = 0.1 # (Mpc/h comoving distance separation) Must be greater than zero if log-binning
maxcorr = 10.0 # (Mpc/h comoving distance separation)
convo = 180./pi # conversion from degrees to radians
tlogbin = 1 # = 0 for uniform spacing, = 1 for log spacing in theta

#------------------------------------------------------------------------
# Create file names (full catalog)
#------------------------------------------------------------------------
workingDir = 'fullrun1'
makeworkingdir(workingDir)
galaxyDataFile, qsoDataFile, randomDataFile, corr2dCodefile, argumentFile, \
runConstantsFile = makeFileNamesFull(workingDir)

#------------------------------------------------------------------------
# Write run constants to a file (full catalog)
#------------------------------------------------------------------------

writeRunConstantsToFile(runConstantsFile, galaxyDataFile, qsoDataFile, \
randomDataFile, corr2dCodefile, argumentFile, oversample, corrBins, \
mincorr, maxcorr, tlogbin)

#------------------------------------------------------------------------
# Compute the Angular Correlation Function (full catalog)
#------------------------------------------------------------------------

runcrossCorrelation(workingDir, argumentFile, corr2dCodefile, galaxyDataFile,\
qsoDataFile, randomDataFile, mincorr, maxcorr, corrBins, tlogbin)

#------------------------------------------------------------------------
# Create file names (Left catalog)
#------------------------------------------------------------------------
workingDir = 'leftrun1'
makeworkingdir(workingDir)
galaxyDataFile, qsoDataFile, randomDataFile, corr2dCodefile, argumentFile, \
runConstantsFile = makeFileNamesLeft(workingDir)

#------------------------------------------------------------------------
# Write run constants to a file (Left catalog)
#------------------------------------------------------------------------

writeRunConstantsToFile(runConstantsFile, galaxyDataFile, qsoDataFile, \
randomDataFile, corr2dCodefile, argumentFile, oversample, corrBins, \
mincorr, maxcorr, tlogbin)

#------------------------------------------------------------------------
# Compute the Angular Correlation Function (Left catalog)
#------------------------------------------------------------------------

runcrossCorrelation(workingDir, argumentFile, corr2dCodefile, galaxyDataFile,\
qsoDataFile, randomDataFile, mincorr, maxcorr, corrBins, tlogbin)

#------------------------------------------------------------------------
# Create file names (Right catalog)
#------------------------------------------------------------------------
workingDir = 'rightrun1'
makeworkingdir(workingDir)
galaxyDataFile, qsoDataFile, randomDataFile, corr2dCodefile, argumentFile, \
runConstantsFile = makeFileNamesRight(workingDir)

#------------------------------------------------------------------------
# Write run constants to a file (Right catalog)
#------------------------------------------------------------------------

writeRunConstantsToFile(runConstantsFile, galaxyDataFile, qsoDataFile, \
randomDataFile, corr2dCodefile, argumentFile, oversample, corrBins, \
mincorr, maxcorr, tlogbin)

#------------------------------------------------------------------------
# Compute the Angular Correlation Function (Right catalog)
#------------------------------------------------------------------------

runcrossCorrelation(workingDir, argumentFile, corr2dCodefile, galaxyDataFile,\
qsoDataFile, randomDataFile, mincorr, maxcorr, corrBins, tlogbin)
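The three near-identical blocks above are a good candidate for a loop. Here is a sketch of the pattern, with simple stand-ins for the correlationFunctions helpers so it runs on its own (the real versions build the file names and submit the qsub jobs):

```python
# Stand-ins for makeFileNamesFull/Left/Right and runcrossCorrelation from
# correlationFunctions -- replace with the real helpers
def make_file_names(working_dir):
    names = ('galaxy.dat', 'qso.dat', 'randoms.dat', '2dcorr', 'args', 'constants')
    return ['%s/%s' % (working_dir, n) for n in names]

submitted = []

def run_cross_correlation(working_dir, files):
    submitted.append(working_dir)  # the real version submits a qsub job

# one pass per catalog instead of three copy-pasted blocks
for working_dir in ('fullrun1', 'leftrun1', 'rightrun1'):
    files = make_file_names(working_dir)
    run_cross_correlation(working_dir, files)

print(submitted)  # ['fullrun1', 'leftrun1', 'rightrun1']
```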

#------------------------------------------------------------------------
# Must wait for jobs to finish running & output files to be created
#------------------------------------------------------------------------

#------------------------------------------------------------------------
# Plot Correlation Functions
#------------------------------------------------------------------------

#------------------------------------------------------------------------
# Plot Full Stripe 82
#------------------------------------------------------------------------

workingDir = "fullrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta,omega=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta, omega,'r-', label = "Full Stripe")
# Number of dd pairs in each bin
ddpairs = [3029, 5404, 9578, 17175, 31653, 58113, 106518, 194566, 358573, 656623, 1211740, 2234135, 4121513, 7602541, 13931607]
errors = 1/N.sqrt(ddpairs)
errorbar(theta, omega, yerr=errors,fmt='ro')

xlabel('Separation Distance (Mpc/h)')
ylabel('omega')
title('2D QSO-Galaxy Cross-correlation Function')
pylab.legend(loc=3)

(plots on wiki)

#------------------------------------------------------------------------
# Plot Full Stripe 82, Left-Half and Right-Half of Stripe 82
#------------------------------------------------------------------------

workingDir = "fullrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta,omega=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta, omega, label = "Full Stripe")

workingDir = "leftrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta1,omega1=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta1, omega1, label = "Left Stripe")

workingDir = "rightrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta2,omega2=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta2, omega2, label = "Right Stripe")

xlabel('Separation Distance (Mpc/h)')
ylabel('omega')
title('2D QSO-Galaxy Cross-correlation Function')
pylab.legend(loc=3)

(plots on wiki)

#------------------------------------------------------------------------
# Plot Full, Left-Half, Right-Half of Stripe 82, and Mean of all three + 3-sigma errors
#------------------------------------------------------------------------

workingDir = "fullrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta,omega=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta, omega, label = "Full Stripe")

workingDir = "leftrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta1,omega1=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta1, omega1, label = "Left Stripe")

workingDir = "rightrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta2,omega2=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta2, omega2, label = "Right Stripe")

datamatrix = [omega,omega1,omega2]
mean = N.mean(datamatrix,axis=0)
errors = 3*N.std(datamatrix,axis=0)/sqrt(3)
loglog(theta, mean, 'k-',label = "Mean")
errorbar(theta, mean, yerr=errors,fmt='ko')

xlabel('Separation Distance (Mpc/h)')
ylabel('omega')
title('2D QSO-Galaxy Cross-correlation Function')
pylab.legend(loc=3)

(plots on wiki)

I sent the above plots to Martin, Alexie, Nic and David. Martin thought they looked about right. The slope of the mean correlation function above is -0.74. Martin and Nikhil found that the slope of the 3D correlation function was about -1.7, and we would expect the slope of the projected 2D correlation function to be about one less (in magnitude) than this... so that is good.

Next steps:
1) Try dividing the QSO samples into different redshift/brightness sets and see if we get different correlation results.
2) Implement jackknifing ability into the code.

Monday, June 6, 2011

Testing QSO-Galaxy Cross CF

I spent the last couple days making sure that I trusted the correlation function code I had developed for this project. I compared it to the results from a (slow) sphere-match code on a small data set.

The following are the testing codes (also here ../logs/cross_corr_check.idl). In IDL using spherematch:

; Read in data/random files
readcol,'cfht_data_tiny.dat',rad1,decd1, format = ('D,D')
readcol, 'qso_data_tiny.dat',rad2,decd2,cd, format = ('D,D,D') ;cd is comoving distance of qsos
readcol,'cfht_randoms_tiny.dat',rar1,decr1, format = ('D,D')

; Find number of objects in files
nr1 = n_elements(rar1)
nd1 = n_elements(rad1)
nd2 = n_elements(rad2)

; Correlate out to a large theta to get all pairs
thetamax = 50.0

print,'Starting cross-correlation:'
print,'Estimating DD...'
spherematch,rad1,decd1,rad2,decd2,thetamax,ind1a,ind1b,dist_dd,maxmatch=0

; Convert from angular separation to comoving distance separation
this_dd = 2*sin(dist_dd*!pi/360)*cd[ind1b]

;Bins go from 0.1 to 10 with 15 bins.
corrmin = 0.1D
corrmax = 10.0D
nbins = 15.0D

; Find bins lower, upper and centers
bins_lower = (corrmax-corrmin)/(nbins)*findgen(nbins)+corrmin
bins_upper = (corrmax-corrmin)/(nbins)*(findgen(nbins)+1)+corrmin
rmean = fltarr(nbins)
for i = 0,(nbins-1) do rmean[i] = (bins_lower[i]+bins_upper[i])/2.

; Bin the DD separation distances
dd = fltarr(nbins)
for i = 0,(nbins-1) do dd[i] = n_elements(where(this_dd gt bins_lower[i] AND this_dd le bins_upper[i]))

print,'Estimating DR...'
spherematch,rar1,decr1,rad2,decd2,thetamax,ind1,ind2,dist_dr1,maxmatch=0

this_dr = 2*sin(dist_dr1*!pi/360)*cd[ind2]
dr = fltarr(nbins)
for i = 0,(nbins-1) do dr[i] = n_elements(where(this_dr ge bins_lower[i] AND this_dr le bins_upper[i]))

corr1 = 1L*dd/dr*1L*(nd2*nr1)/(1L*nd1*nd2)-1L

for i = 0,(nbins-1) do print, rmean[i], corr1[i]

Separation omega
0.430000 -0.115686
1.09000 -0.104478
1.75000 -0.120804
2.41000 -0.0914845
3.07000 -0.0393971
3.73000 -0.0268416
4.39000 0.0134841
5.05000 0.0596094
5.71000 0.0227162
6.37000 0.102554
7.03000 0.0929233
7.69000 0.0900670
8.35000 0.0591398
9.01000 0.0284724
9.67000 0.0598689

(Note that these "tiny" files only have 6 qsos and 9000 galaxies, so the correlation function values are very noisy; this was just a test, and I used small files to keep it fast.)

By comparison I also have the python/C code which runs much faster (../Jessica/qsobias/Correlate/runCorrelation.py):

import numpy as N
from pylab import *
from correlationFunctions import *

#------------------------------------------------------------------------
# Create file names (tiny catalogs)
#------------------------------------------------------------------------
workingDir = 'tinyrun'
makeworkingdir(workingDir)
galaxyDataFile, qsoDataFile, randomDataFile, corr2dCodefile, argumentFile, \
runConstantsFile = makeFileNamesTiny(workingDir)

oversample = 5. # Amount that randoms should be oversampled
corrBins = 25.0 # Number of correlation bins (+1)
mincorr = 0.1 # (Mpc/h comoving distance separation) Must be greater than zero if log-binning
maxcorr = 10.0 # (Mpc/h comoving distance separation)
convo = 180./pi # conversion from degrees to radians
tlogbin = 1 # = 0 for uniform spacing, = 1 for log spacing in theta

#------------------------------------------------------------------------
# Write run constants to a file
#------------------------------------------------------------------------
writeRunConstantsToFile(runConstantsFile, galaxyDataFile, qsoDataFile, \
randomDataFile, corr2dCodefile, argumentFile, oversample, corrBins, \
mincorr, maxcorr, tlogbin)

#------------------------------------------------------------------------
# Compute the Angular Correlation Function
#------------------------------------------------------------------------
runcrossCorrelation(workingDir, argumentFile, corr2dCodefile, galaxyDataFile,\
qsoDataFile, randomDataFile, mincorr, maxcorr, corrBins, tlogbin)


# separation (Mpc/h) crossw (Mpc/h)
0.4300000000 -0.1156862745
1.0900000000 -0.1044776119
1.7500000000 -0.1208039566
2.4100000000 -0.0914845135
3.0700000000 -0.0393970538
3.7300000000 -0.0268417043
4.3900000000 0.0134841235
5.0500000000 0.0596093513
5.7100000000 0.0227161938
6.3700000000 0.1025539385
7.0300000000 0.0929232804
7.6900000000 0.0900670231
8.3500000000 0.0591397849
9.0100000000 0.0284723490
9.6700000000 0.0598689436


As you can see the correlation functions match!
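The angular-to-comoving conversion both codes use (the 2*sin(dist/2)*cd line) is just the chord length at the quasar's comoving distance; in python it can be sketched as:

```python
import numpy as np

def transverse_separation(theta_deg, comoving_dist):
    """Chord-length separation (Mpc/h) for an angular separation of
    theta_deg degrees at comoving distance comoving_dist (Mpc/h)."""
    return 2.0 * np.sin(np.deg2rad(theta_deg) / 2.0) * comoving_dist

# e.g. 1 degree at 2000 Mpc/h is roughly 35 Mpc/h
print(round(transverse_separation(1.0, 2000.0), 1))  # prints 34.9
```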

Monday, February 1, 2010

Housekeeping

I've restructured the way the code is run so that I can do multiple runs at once, and to hopefully make it easier to reproduce runs and look at older runs quickly. Below is how it works.

Whenever I do a new run, the first thing that python does is create a new 'working directory' of the following format:

runYYYYMD_HM

where
YYYY is the year (2010)
M is the month (2)
D is the day (1)
H is the hour (13)
M is the minute (37)
so in this example the directory would be called: run201021_1337

This is done in python by the following code:

import datetime as dt
import os

now = dt.datetime.now()
year = now.year
month = now.month
day = now.day
hour = now.hour
minute = now.minute
workingDir = "run%d%d%d_%d%d"%(year,month,day,hour,minute)
command = "mkdir "+ workingDir
os.system(command)


Within this directory all the photometric and spectroscopic data files are placed for each of the redshift bins. These files are in the format:

photo2D.dat - ra and dec of all photometric data
spec2D_#.dat - ra and dec of the spectroscopic data in the # redshift bin
spec3D_#.dat - x, y, z of the spectroscopic data in the # redshift bin
photoCD.dat - the comoving distance of the photometric data (from photo-zs)

I then compute the 2D cross correlation functions between the photo2D.dat and spec2D_#.dat files and these are saved in the wps#.dat files. The input arguments for these correlation functions are in the files wpsInputs#.

I then compute the 3D auto correlation functions on the spec3D_#.dat files and these are saved in the xiss#.dat files. The input arguments for these correlation functions are in the files xiInputs#.

Because the correlation functions in each redshift bin are submitted as qsub jobs, they take a while to run... so at this point you need to wait for all those jobs to finish. The jobs are named as follows:

2D Cross Correlation: JessWps#workingDir
3D Auto Correlation: JessXiss#workingDir

where # is the redshift bin and workingDir is the name of the working directory.

I can monitor the qsub jobs by typing in 'qstat' to see what is running. Once all the jobs for my working directory are completed then I do the following:

load in the constants from the run file (I put this in the working directory), which is named YYMMDD_#run.py
where YY is the year (10)
MM is the month (02)
DD is the day (01)
and # is the run number (if I've done multiple runs on this day)

I set the variable workingDir to the working directory.

I load in the photometric comoving distances (which are saved in a file called photoCD.dat):
photoCD = readPhotoCDfile(photoCDfile)

I create the wps and xi matrices (which are used in the reconstruction) and save them as files: wpsMat.dat and xiMat.dat respectively.

Then I can do the reconstruction of the redshift distribution and compare it to the photoCD.

Friday, January 29, 2010

Parallelization

Adam rocks. He helped me figure out a way to parallelize the calculations of my correlation functions so that they all happen at once. Hopefully this will speed up my code by a factor of 10-20. Here is how it works:

A while ago I posted about how to use qsub to run scripts. I've now changed my python code to call qsub from within the program that calculates the correlation functions.

I have the following script:
runcrossCorr.script
~~~~~~~~~~~~~~
#PBS -l nodes=1
#PBS -N JessCC$job
#PBS -j oe
#PBS -m ea
#PBS -M jakirkpatrick@lbl.gov
#PBS -o qsub.out

cd /home/jessica/repository/ccpzcalib/Jessica/pythonJess
echo "$corr2dfile $argumentfile > $outfile"
$corr2dfile $argumentfile > $outfile
~~~~~~~~~~~~~~
this script is called using the following command:
qsub -v job=0,corr2dfile=./2dcorrFinal,argumentfile=./run2010129_237/wpsInputs0,outfile=./run2010129_237/wps0.dat runcrossCorr.script

where -v sets up environment variables for the qsub command so that within the script (in this example)
$job is replaced with 0
$corr2dfile is replaced with ./2dcorrFinal
$argumentfile is replaced with ./run2010129_237/wpsInputs0
$outfile is replaced with ./run2010129_237/wps0.dat

Many of these qsub commands can be run in python and they all are sent to the queue and run in parallel on different cluster computers.

I've done this so that python automatically changes the argumentfile and outfile and runs all the correlation functions at once.
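A sketch of what that looks like in python (the file names follow the example above; the actual subprocess call is commented out so the sketch runs anywhere):

```python
import subprocess  # used on riemann to actually submit the jobs

workingDir = 'run2010129_237'
commands = []
for zbin in range(5):  # one 2D cross-correlation job per redshift bin
    varlist = ('job=%d,corr2dfile=./2dcorrFinal,'
               'argumentfile=./%s/wpsInputs%d,'
               'outfile=./%s/wps%d.dat') % (zbin, workingDir, zbin, workingDir, zbin)
    commands.append(['qsub', '-v', varlist, 'runcrossCorr.script'])

for cmd in commands:
    # subprocess.call(cmd)  # uncomment on riemann to submit for real
    print(' '.join(cmd))
```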

These correspond to updated versions of my python functions runcrossCorr and runautoCorr in the correlationData library.

Look at all my jobs running (type qstat into terminal)!

And it even emails me when it's done!

So, now all I do is make a python script that runs all the code and set it going in the background:

python 100128run.py > 100128.out 2> 100128.err &

I'm running with 1,000,000 points and finer redshift binning. Hopefully this will work even better!

Monday, January 25, 2010

Time to Get Real

Now that I have a working reconstruction pipeline, it is time to run on real SDSS data. I am going to re-run on the data set I have (from the CAS database), but I don't believe the photo-z's are accurate enough to use as the redshifts. Simply using all the spectroscopic data from Sloan isn't going to work, because the mask would be horribly complicated, and I really just want something simple like Stripe 82.

Eyal Kazin has a large catalog of Luminous Red Galaxies that he used to compute the BAO signal. He also has randoms (presumably generated using a mask) that I could use for calculating the correlation functions. I downloaded this catalog; it only has 105,831 objects in it (3,190 in Stripe 82), but the redshift distribution is pretty distinct:


It might be worth trying the reconstruction using his randoms to see what I get. It is probably only half a day's work to re-write the correlation function to use his randoms instead of generating my own. His random catalog has 15X as many points, and it has the same redshift distribution (above) and the same ra and dec distributions as the data:

The data (blue) falls right on top of the randoms (red)

I have also contacted Eric and Demitri as they have both mentioned having catalog/databases that I might be able to use. I should also contact Antonio.

Python Tip of the Day: I sometimes find that I lose my plotting window. You can close it from the Python terminal with the command close() and then re-plot to find it again.

Wednesday, November 18, 2009

Running Scripts (Update)

I want to utilize the fact that I can run the correlation function on multiple nodes of riemann. Eric helped me learn to use qsub to run a script on multiple nodes. I've got it working as follows:

In the file on riemann 091118jess.script there is the following:

#PBS -l nodes=1
#PBS -N jessjob1
#PBS -j oe
#PBS -M kirkpatrick@berkeley.edu
#PBS -V
cd /home/jessica/repository/ccpzcalib/Jessica/pythonJess
./runcode.script


where runcode.script contains the following:
python runReconstructionMock.py

I make both runcode.script and 091118jess.script executables, and then type in the terminal:
qsub 091118jess.script

I then exit from riemann/insure (I didn't close it using the x button; not sure if this makes a difference).

When I log back in, the code is still running (check using qstat).

Now I need to integrate this into the python code so that qsubs are generated for each correlation function for each redshift slice. This should make things go a lot faster.
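That integration could look something like the sketch below. Everything here is a placeholder patterned on the scripts above: the slice count, script names, and the idea of passing the slice index to runReconstructionMock.py as an argument are my own inventions, not the actual pipeline.

```python
# Hypothetical sketch: build one PBS script per redshift slice, then submit each.
work_dir = "/home/jessica/repository/ccpzcalib/Jessica/pythonJess"
n_slices = 3  # placeholder

def make_pbs_script(i):
    """Return the text of a PBS script for redshift slice i."""
    return "\n".join([
        "#PBS -l nodes=1",
        f"#PBS -N jessjob{i}",
        "#PBS -j oe",
        "#PBS -V",
        f"cd {work_dir}",
        f"python runReconstructionMock.py {i}",  # pass the slice index
        "",
    ])

scripts = {f"runSlice{i}.script": make_pbs_script(i) for i in range(n_slices)}

# On the cluster you would write each script to disk and submit it:
# for name, text in scripts.items():
#     with open(name, "w") as f:
#         f.write(text)
#     os.system(f"qsub {name}")
for name in scripts:
    print(name)
```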

The email from Eric:
----------
From: Eric Huff
Date: Wed, Nov 18, 2009 at 3:06 PM
Subject: qsub
To: Jessica Kirkpatrick

Here's a script example:

#PBS -l nodes=1
#PBS -l walltime=24:00:00
#PBS -m bea
#PBS -M emhuff@berkeley.edu
#PBS -V
ulimit -s unlimited
export PHOTO_DATA=/clusterfs/riemann/raid006/bosswork/groups/boss/photo/data
export PHOTO_REDUX=/clusterfs/riemann/raid006/bosswork/groups/boss/photo/redux
cd /home/emhuff/Coadds/Code/Coadd_Data_Root/Scripts
echo `hostname`
/home/emhuff/Coadds/Code/Coadd_Data_Root/Scripts/r.out 380720 6 0 00

And here's some online documentation for qsub:

http://www.doesciencegrid.org/public/pbs/qsub.html
http://njal.physics.sunysb.edu/tutorial-1.html

I'm running the full reconstruction on the mock data, I might have a fully working code tomorrow. Woo hoo!

Degrees vs. Radians

Turns out the "List length exceeded MaxList" error was due to the randoms not falling in the same region as the data. I had forgotten to convert my data from radians to degrees in Python (but it was being converted in the C program) and so the C-converted randoms (in x, y, z) did not match the Python-converted data (in x, y, z).

I've fixed this, and get the following 3D correlation functions for my redshift slices of the Mock data (and they look pretty darn good)!


My 3D Correlation function on Mock Data

This makes me want to make the bold statement that all the problems I was having with my 3D correlation functions on the Sloan data were a problem with the data (not my code)... but I need to check this again, because the 2D correlation functions looked OK on the Sloan data. This could be an issue of the photo-z's just being completely wrong (David keeps saying he doesn't trust the CAS), so the 2D correlation function (which only uses ra and dec) is fine, but once we start using the photo-z's things get messed up.

Another thought I had: I need to fix the declination cosine population issue for the randoms in the 2D correlation function. Currently the declinations are being populated using a flat distribution, but a distribution that is uniform on the sky should be flat in sin(dec), not in dec.
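The usual fix for the cosine issue is to draw sin(dec) uniformly rather than dec itself; a minimal NumPy sketch (the function name is mine):

```python
import numpy as np

def sky_randoms(n, seed=0):
    """Draw n random (ra, dec) points, in degrees, uniform on the sphere.

    ra is uniform in [0, 360). dec is drawn so that sin(dec) is uniform
    in [-1, 1]; a flat draw in dec itself would over-weight the poles.
    """
    rng = np.random.default_rng(seed)
    ra = rng.uniform(0.0, 360.0, n)
    dec = np.degrees(np.arcsin(rng.uniform(-1.0, 1.0, n)))
    return ra, dec

ra, dec = sky_randoms(100000)
```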

Oh today is a good day indeed!

Friday, September 25, 2009

Plotting with Joe

Joe Hennawi and I worked on trying to figure out where exactly we are going wrong with the likelihood method. He suggested making a two-dimensional histogram of the likelihoods binned in color-color space. This would allow us to see where in color-color space we are getting objects with high likelihoods, and will help us debug whether there is a population of objects that is being falsely selected.

It is quite easy to make the 2D histogram, but plotting it has proved to be an issue. I've spent most of the day trying to do this. Here is how I create the histogram:
qsos = testqsos

; SDSS asinh-magnitude softening parameters b (in units of 1e-10)
b_u = 1.4
b_g = 0.9
b_r = 1.2
b_i = 1.8
b_z = 7.4

z_temp = qsos.Z_TOT
fu1 = qsos.PSFFLUX[0]
fg1 = qsos.PSFFLUX[1]
fr1 = qsos.PSFFLUX[2]
fi1 = qsos.PSFFLUX[3]
fz1 = qsos.PSFFLUX[4]

ivar_u1 = qsos.PSFFLUX_IVAR[0]
ivar_g1 = qsos.PSFFLUX_IVAR[1]
ivar_r1 = qsos.PSFFLUX_IVAR[2]
ivar_i1 = qsos.PSFFLUX_IVAR[3]
ivar_z1 = qsos.PSFFLUX_IVAR[4]


u1 = sdss_flux2mags(fu1, b_u)
g1 = sdss_flux2mags(fg1, b_g)
r1 = sdss_flux2mags(fr1, b_r)
i1 = sdss_flux2mags(fi1, b_i)
z1 = sdss_flux2mags(fz1, b_z)

sigu = sdss_ivar2magerr(ivar_u1, fu1, b_u)
sigg = sdss_ivar2magerr(ivar_g1, fg1, b_g)
sigr = sdss_ivar2magerr(ivar_r1, fr1, b_r)
sigi = sdss_ivar2magerr(ivar_i1, fi1, b_i)
sigz = sdss_ivar2magerr(ivar_z1, fz1, b_z)

ug = u1-g1
gr = g1-r1
ri = r1-i1
iz = i1-z1

icut = where(i1 GT 19.5)

nx = 50
ny = 50
imageQSO = fltarr(nx,ny)
imageAll = fltarr(nx,ny)
imageRatio = fltarr(nx,ny)

likeQSO = qsoqlike[icut]
likeAll = qsoslike[icut]
likeRatio = likeQSO/likeAll

qsoug = ug[icut]
qsogr = gr[icut]

populate_image, imageQSO, qsoug, qsogr, weight = likeQSO
populate_image, imageAll, qsoug, qsogr, weight = likeAll
populate_image, imageRatio, qsoug, qsogr, weight = likeRatio

Now I need code to plot this histogram in color-color space with hotter colors in bins with higher likelihoods and cooler colors in bins with smaller likelihoods. Anyone have code that does this in IDL or python? Joe has some, but I'm having a hard time getting it to work.
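In Python, one way to do this is numpy's histogram2d (with the likelihoods as weights) plus matplotlib's imshow with a "hot" colormap. A sketch with fake placeholder data standing in for the u-g, g-r colors and likelihoods above:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render to file; no display needed
import matplotlib.pyplot as plt

# Placeholder data; in practice use the qsoug, qsogr, likeQSO arrays above
rng = np.random.default_rng(1)
qsoug = rng.normal(0.3, 0.4, 5000)
qsogr = rng.normal(0.1, 0.3, 5000)
likeQSO = rng.uniform(0.0, 1.0, 5000)

# Weighted 2D histogram: each object contributes its likelihood to its bin
H, xedges, yedges = np.histogram2d(qsoug, qsogr, bins=50, weights=likeQSO)

# imshow indexes arrays as [row, column] = [y, x], so transpose H;
# origin="lower" puts the first bin at the bottom-left, and the "hot"
# colormap maps high summed likelihoods to brighter colors
plt.imshow(H.T, origin="lower", cmap="hot", aspect="auto",
           extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.xlabel("u - g")
plt.ylabel("g - r")
plt.colorbar(label="summed likelihood")
plt.savefig("likelihood_image.png")
```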

Wednesday, August 19, 2009

Coordinate Confusion

Time to run my 3D autocorrelation function on a mock catalog where we know the answer to try to see if there is a problem with my code or that the reconstruction is failing for another reason. I downloaded the following raw mock LRG catalog from Martin White's web page: halo_000_0.8000.dat.gz. I discovered that I can transfer files to riemann via the insure replacement by doing a two step scp. That is useful to know.

The code I am using to convert from ra, dec (degrees), comoving distance (r) to x, y, z and back:
x = r*sin(pi/2 - pi/180*dec)*cos(pi/180*ra)
y = r*sin(pi/2 - pi/180*dec)*sin(pi/180*ra)
z = r*cos(pi/2 - pi/180*dec)

r = sqrt(x^2 + y^2 + z^2)
dec = 90 - 180/pi*arccos(z/r)
ra = 180/pi*arcsin(y/sqrt(x^2 + y^2))
I am a little confused because these mock files are in Cartesian coordinates and if I convert them using the above code into ra and dec, I don't get the objects populating all of ra/dec space:


I constrained the data to be in a sphere of radius 0.5 × the box size (936.0 Mpc/h), and I would think that ra and dec would then go from 0 to 360 and from -90 to 90, respectively.

After some digging, I discovered that by using the arctan2 function I get the proper range:
theta = arctan2(sqrt(x^2+y^2), z)
phi = arctan2(x, y)


However, when I then convert back to Cartesian, I get a strange answer when I plot the original z against the converted-back z:




This is distressing!

Sources:
http://astro.uchicago.edu/cosmus/tech/code/radecz2xxyyzz.m
http://www.atlasoftheuniverse.com/cosmodis.c
http://www.math.montana.edu/frankw/ccp/multiworld/multipleIVP/spherical/body.htm
http://www.daniweb.com/forums/post860497-2.html
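For what it's worth, the round trip does close if both angles come from arctan2 and the same convention is used in both directions; note in particular the argument order, ra = arctan2(y, x). A NumPy sketch with my own function names and conventions:

```python
import numpy as np

def cart_to_radec(x, y, z):
    """Cartesian -> (ra, dec, r), angles in degrees."""
    r = np.sqrt(x**2 + y**2 + z**2)
    dec = np.degrees(np.arcsin(z / r))          # equivalent to 90 - theta
    ra = np.degrees(np.arctan2(y, x)) % 360.0   # arctan2 resolves the quadrant
    return ra, dec, r

def radec_to_cart(ra, dec, r):
    """(ra, dec, r), angles in degrees -> Cartesian."""
    ra_r, dec_r = np.radians(ra), np.radians(dec)
    x = r * np.cos(dec_r) * np.cos(ra_r)
    y = r * np.cos(dec_r) * np.sin(ra_r)
    z = r * np.sin(dec_r)
    return x, y, z

# Round trip on random points in a +/- 936 Mpc/h box
rng = np.random.default_rng(2)
x, y, z = rng.uniform(-936.0, 936.0, (3, 1000))
ra, dec, r = cart_to_radec(x, y, z)
x2, y2, z2 = radec_to_cart(ra, dec, r)
```

With this convention ra covers the full 0 to 360 range, dec covers -90 to 90, and plotting z against z2 gives a straight line.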

Python tips for the day
1) If you are getting the following error when trying to plot:
RuntimeError: Agg rendering complexity exceeded.
Consider downsampling or decimating your data
Then exit and restart your session and try again; for some reason it then works. Stupid python.

2) When converting from spherical use atan2 not asin.