
Wednesday, June 8, 2011

1st Attempt at QSO-Galaxy Cross Correlation

I've calculated the QSO-galaxy cross-correlation functions on the entire stripe, and also on the stripe divided into two halves.
The code to do this is below (also in ../qsobias/Correlate/runCorrelation.py):

import numpy as N
from pylab import *
from correlationFunctions import *

#------------------------------------------------------------------------
# Set Angular Correlation Bin Information
#------------------------------------------------------------------------

oversample = 5. # Amount that randoms should be oversampled
corrBins = 15.0 # Number of correlation bins (+1)
mincorr = 0.1 # (Mpc/h comoving distance separation) Must be greater than zero if log-binning
maxcorr = 10.0 # (Mpc/h comoving distance separation)
convo = 180./pi # conversion factor from radians to degrees
tlogbin = 1 # = 0 for uniform spacing, = 1 for log spacing in theta

#------------------------------------------------------------------------
# Create file names (full catalog)
#------------------------------------------------------------------------
workingDir = 'fullrun1'
makeworkingdir(workingDir)
galaxyDataFile, qsoDataFile, randomDataFile, corr2dCodefile, argumentFile, \
runConstantsFile = makeFileNamesFull(workingDir)

#------------------------------------------------------------------------
# Write run constants to a file (full catalog)
#------------------------------------------------------------------------

writeRunConstantsToFile(runConstantsFile, galaxyDataFile, qsoDataFile, \
randomDataFile, corr2dCodefile, argumentFile, oversample, corrBins, \
mincorr, maxcorr, tlogbin)

#------------------------------------------------------------------------
# Compute the Angular Correlation Function (full catalog)
#------------------------------------------------------------------------

runcrossCorrelation(workingDir, argumentFile, corr2dCodefile, galaxyDataFile,\
qsoDataFile, randomDataFile, mincorr, maxcorr, corrBins, tlogbin)

#------------------------------------------------------------------------
# Create file names (Left catalog)
#------------------------------------------------------------------------
workingDir = 'leftrun1'
makeworkingdir(workingDir)
galaxyDataFile, qsoDataFile, randomDataFile, corr2dCodefile, argumentFile, \
runConstantsFile = makeFileNamesLeft(workingDir)

#------------------------------------------------------------------------
# Write run constants to a file (Left catalog)
#------------------------------------------------------------------------

writeRunConstantsToFile(runConstantsFile, galaxyDataFile, qsoDataFile, \
randomDataFile, corr2dCodefile, argumentFile, oversample, corrBins, \
mincorr, maxcorr, tlogbin)

#------------------------------------------------------------------------
# Compute the Angular Correlation Function (Left catalog)
#------------------------------------------------------------------------

runcrossCorrelation(workingDir, argumentFile, corr2dCodefile, galaxyDataFile,\
qsoDataFile, randomDataFile, mincorr, maxcorr, corrBins, tlogbin)

#------------------------------------------------------------------------
# Create file names (Right catalog)
#------------------------------------------------------------------------
workingDir = 'rightrun1'
makeworkingdir(workingDir)
galaxyDataFile, qsoDataFile, randomDataFile, corr2dCodefile, argumentFile, \
runConstantsFile = makeFileNamesRight(workingDir)

#------------------------------------------------------------------------
# Write run constants to a file (Right catalog)
#------------------------------------------------------------------------

writeRunConstantsToFile(runConstantsFile, galaxyDataFile, qsoDataFile, \
randomDataFile, corr2dCodefile, argumentFile, oversample, corrBins, \
mincorr, maxcorr, tlogbin)

#------------------------------------------------------------------------
# Compute the Angular Correlation Function (Right catalog)
#------------------------------------------------------------------------

runcrossCorrelation(workingDir, argumentFile, corr2dCodefile, galaxyDataFile,\
qsoDataFile, randomDataFile, mincorr, maxcorr, corrBins, tlogbin)

#------------------------------------------------------------------------
# Must wait for jobs to finish running & output files to be created
#------------------------------------------------------------------------

#------------------------------------------------------------------------
# Plot Correlation Functions
#------------------------------------------------------------------------

#------------------------------------------------------------------------
# Plot Full Stripe 82
#------------------------------------------------------------------------

workingDir = "fullrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta,omega=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta, omega,'r-', label = "Full Stripe")
# Number of dd pairs in each bin
ddpairs = [3029, 5404, 9578, 17175, 31653, 58113, 106518, 194566, 358573, 656623, 1211740, 2234135, 4121513, 7602541, 13931607]
errors = 1/N.sqrt(ddpairs)
errorbar(theta, omega, yerr=errors,fmt='ro')

xlabel('Separation Distance (Mpc/h)')
ylabel('omega')
title('2D QSO-Galaxy Cross-correlation Function')
legend(loc=3)

(plots on wiki)

#------------------------------------------------------------------------
# Plot Full Stripe 82, Left-Half and Right-Half of Stripe 82
#------------------------------------------------------------------------

workingDir = "fullrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta,omega=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta, omega, label = "Full Stripe")

workingDir = "leftrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta1,omega1=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta1, omega1, label = "Left Stripe")

workingDir = "rightrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta2,omega2=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta2, omega2, label = "Right Stripe")

xlabel('Separation Distance (Mpc/h)')
ylabel('omega')
title('2D QSO-Galaxy Cross-correlation Function')
legend(loc=3)

(plots on wiki)

#------------------------------------------------------------------------
# Plot Full, Left-Half, Right-Half of Stripe 82, and Mean of all three + 3-sigma errors
#------------------------------------------------------------------------

workingDir = "fullrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta,omega=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta, omega, label = "Full Stripe")

workingDir = "leftrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta1,omega1=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta1, omega1, label = "Left Stripe")

workingDir = "rightrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta2,omega2=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta2, omega2, label = "Right Stripe")

datamatrix = [omega,omega1,omega2]
mean = N.mean(datamatrix,axis=0)
errors = 3*N.std(datamatrix,axis=0)/sqrt(3)
loglog(theta, mean, 'k-',label = "Mean")
errorbar(theta, mean, yerr=errors,fmt='ko')

xlabel('Separation Distance (Mpc/h)')
ylabel('omega')
title('2D QSO-Galaxy Cross-correlation Function')
legend(loc=3)

(plots on wiki)

I sent the above plots to Martin, Alexie, Nic and David. Martin thought they looked about right. The slope of the mean correlation function above is -0.74. Martin and Nikhil found that the slope of the 3D correlation function was about -1.7, and we would expect the slope of the projected 2D correlation function to be about one shallower than this... so that is good.
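As a quick check on that slope, here is a minimal sketch (mine, not part of the run scripts) of fitting the power-law slope of the mean correlation function in log-log space, assuming theta and mean are the arrays from the plotting code above:

import numpy as N

# Fit omega ~ A * theta^gamma as a straight line in log-log space.
t = N.asarray(theta)
w = N.asarray(mean)
good = (t > 0) & (w > 0)            # logs are only defined for positive bins
gamma, logA = N.polyfit(N.log10(t[good]), N.log10(w[good]), 1)
print("best-fit power-law slope: %.2f" % gamma)   # should come out near -0.74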

Next steps:
1) Try dividing the QSO sample into different redshift/brightness sets and see if we get different correlation results.
2) Implement jackknifing ability in the code (a rough sketch of the jackknife error estimate is below).
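For item 2, a rough sketch of the jackknife error estimate, assuming the correlation function can be recomputed with one sky region left out at a time (the leave-one-out machinery itself still has to be added to the pair-counting code):

import numpy as N

def jackknife_errors(omega_samples):
    # omega_samples: (n_regions, n_bins) array; row k is the correlation
    # function recomputed with sky region k removed.
    omega_samples = N.asarray(omega_samples)
    n = omega_samples.shape[0]
    mean = omega_samples.mean(axis=0)
    # standard jackknife variance: (n-1)/n times the sum of squared deviations
    var = (n - 1.0) / n * ((omega_samples - mean) ** 2).sum(axis=0)
    return mean, N.sqrt(var)

# usage sketch (compute_corr_without_region is hypothetical):
# omegas = [compute_corr_without_region(k) for k in range(n_regions)]
# mean, err = jackknife_errors(omegas)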

Wednesday, March 16, 2011

Likelihood Paper

SDSS-III Publication 33: A Simple Likelihood Estimator for Quasar Target Selection

Authors:
Jessica A. Kirkpatrick (corresponding author)
David J. Schlegel, Nicholas P. Ross, Adam D. Myers, Joseph F. Hennawi

Abstract:
We present a new method for quasar target selection using a likelihood estimator. For our purposes we target quasars using Sloan Digital Sky Survey (SDSS) photometry to a magnitude limit of g=22. The efficiency and completeness of this technique are measured using Baryon Oscillation Spectroscopic Survey (BOSS) Commissioning Data, taken in late 2009. This technique was used as the CORE method for target selection for BOSS Year 1 spectroscopy, to be realized in the 9th SDSS data release (DR9). When targeting at a density of 40 objects/deg^2 we find the efficiency of this technique to be 41% and the completeness, compared to all quasars identified in BOSS Commissioning Data, to be 62%. This paper also describes possible extensions and improvements for this technique.

Comments:
SDSS-III participants have three weeks to send comments and to request changes. Comments should be sent to Jessica Kirkpatrick.
The deadline to submit comments is Wednesday, April 6th, 2011.

This is a BOSS science paper intended for submission to the Astrophysical Journal.

Downloads:
The most up-to-date .pdf of the paper can be found here:
likelihood paper (current); this version will be updated as I receive comments over the next three weeks.

The version of the paper submitted to the collaboration (3/16) is here:
likelihood collaboration submission

There is also a single column, double spaced version here (for easier editing):
likelihood paper one column

The .tex files and all associated figures can be checked out from the SVN at this location:
svn co svn+ssh://sdss3svn@sdss3.org/repo/boss/bosstarget/trunk/tex/likelihood/

Or downloaded here on the SDSS3 wiki.

Monday, February 28, 2011

BOSS Mask Problem?

Below is an email I sent to Martin White today after Shirley, Nic and David couldn't figure out what I was doing wrong with my masks/randoms:

~~~~~~~~
Martin,
Back in December you generated a polygon mask for me for the BOSS data. I've been using this mask + BOSS data for my reconstruction project with Alexia and have gotten puzzling results when I compute correlation functions on the data.

This prompted me to look more closely at the way I was applying the mask to make sure I wasn't making a stupid mistake.

The BOSS data inside the mask (with weights greater than 0) looks very patchy up close, and both Ross and Schlegel think it shouldn't look like this.

I was wondering if you could look at the attached plots and see if you think the mask is being applied correctly.

MaskedBOSSDataAll.png shows the entire footprint. The blue points are galaxies in the spAll file that were outside the mask or have a weight = 0. The red points are galaxies in the spAll file that were inside the mask and have a weight GT 0.


MaskedBOSSDataClose.png shows a zoom into the region (35 < Dec < 55, 110 < RA < 130). The color scheme is the same as above. This shows the strange "patchy" nature of the masked data even in regions which seem to have been observed. I tried to use the same ra/dec range as Fig 3 from your paper (http://arxiv.org/PS_cache/arxiv/pdf/1010/1010.4915v2.pdf) for easy comparison.


One difference between my data and that in your plot is that the weights (completeness) of the red objects in my plots range from [0.75 to 1.0], whereas in your paper's plot they seem to range from [0 to 1.0]. Perhaps this is how you determined what is in the mask, if it had a spectroscopic completeness greater than 0.75?

If you have any insight into what I am doing wrong, that would be really helpful. Below is the IDL code I am using to apply the masks and generate the attached plots.

Thank you,
Jessica

;Here is the code I used to apply the mask:
;Read in BOSS data
bossgalaxies = '/clusterfs/riemann/raid001/jessica/boss/spAll-v5_4_14.fits'
spall = mrdfits(bossgalaxies, 1)

;Trim to be a galaxy
isgal = where(spall.specprimary EQ 1 $
AND (spall.boss_target1 AND 2L^0+2L^1+2L^2+2L^3+2L^7) NE 0 $
AND strmatch(spall.class,'GALAXY*') $
AND spall.zwarning EQ 0)

;Select galaxies
galaxies = spall[isgal]
ra = galaxies.ra
dec = galaxies.dec
z = galaxies.z
spall = 0

bossmask = "/clusterfs/riemann/raid001/jessica/boss/bossX.002.ply"
read_mangle_polygons, bossmask, polygons, id
results = is_in_window(ra=ra,dec=dec,polygons,in_polygon=polynum)
inBossMask = where(results GT 0)
polys = polynum[inBossMask]
bossWeight = polygons[polys].weight
;Cut the RA/Dec/z such that it is in the BOSS mask
sra = ra[inBossMask]
sdec = dec[inBossMask]
sz = z[inBossMask]

;Here is the code I used to make the attached plots
;Plot Data that is inside the mask
window,xsize=700,ysize=600
xobject = sra
yobject = sdec
xtit = 'RA'
ytit = 'Dec'
mtit = 'SDSS BOSS Data w/ and w/o Mask'
plot, xobject, yobject, psym=3, symsize=2, xrange = [130,110], yrange=[35,55], XTITLE = xtit, YTITLE = ytit, TITLE = mtit, charsize = 2, charthick = 1, thick = 2, xthick=2, ythick=2
oplot, ra, dec, ps=3, color=fsc_color('blue')
thisweight = where(bossWeight GT 0)
oplot, xobject[thisWeight], yobject[thisWeight], ps=3, color=fsc_color('red')


plot, xobject, yobject, psym=3, symsize=2, xrange = [0,360], yrange=[0,60], XTITLE = xtit, YTITLE = ytit, TITLE = mtit, charsize = 2, charthick = 1, thick = 2, xthick=2, ythick=2
oplot, ra, dec, ps=3, color=fsc_color('blue')
thisweight = where(bossWeight GT 0)
oplot, xobject[thisWeight], yobject[thisWeight], ps=3, color=fsc_color('red')

Monday, October 18, 2010

Next steps...

I talked to Shirley and Alexia and they both agree that the procedure outlined in my last post makes sense. Shirley told me that the weights in her file don't mean anything, so I should just treat them as a step function. I need to talk to David to make sure he agrees with all this.
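A minimal sketch of what "treat the weights as a step function" means in practice (names are mine):

import numpy as np

def step_weights(weights):
    # Treat mask weights as a step function: any weight > 0 counts as fully in.
    return (np.asarray(weights) > 0).astype(float)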

Thinking I should postpone my trip to Los Alamos until December. I'd like to have the data in better shape before I get there, and the flights are cheaper.

Steps:
  1. Talk to David and make sure he agrees with these steps. Also talk about funding for trip.
  2. Generate randoms as laid out in last post.
  3. Book tickets for December.
  4. Keep looking for temp housing in Los Alamos.

Tuesday, March 9, 2010

Likelihood Wins = More Work For Me

At the BOSS Collaboration Meeting today we decided that the core sample will be composed of the top 20/deg^2 likelihood targets. The rest of the fibers are going to be targeted using a combined method. We're number one! Awesome.

However, this puts the pressure on the likelihood target selection group (me, Joe, David) to get this thing optimized, trained, etc.

First thing on the agenda, fix the luminosity function:
The plot above shows the two QSO luminosity functions used in the selection, as a function of redshift. The green is Jiang et al. (2008) and the red is Hopkins, Richards and Hernquist (2007).

I'm going to try running both of them down to 0.5 and see how they compare to what we've been doing. Joe says I shouldn't spend more than a couple days on this.

Adam Myers also suggested another paper to look at: R.J. Assef et al. The Mid-IR and X-ray Selected QSO Luminosity Function

In the mean time I am also re-running the reconstruction on the photo-z SDSS data with the fixed 2-d correlation function. Should have results from that soon. Waiting on Alexia to give me a new crl to see if this fixes the binning issues I was having.

Lots to do lots to do! Feeling excited about life.

Thursday, January 28, 2010

Working Data Reconstruction!

I ran the SDSS CAS data through our working pipeline. I've run on a small data set as a first go -- the same size as what I was running for the reconstructions with the mock data -- 20,000 points in both the photometric and spectroscopic data sets. The difference here is that both data sets come from the same super-set of data (I am randomly picking points from a larger set of Stripe-82 data), so the redshift distributions of the two sets are the same (see plot below). This, in theory, should make the reconstruction easier to do.

Redshift Distributions of photometric and spectroscopic data sets


The Bad News
The correlation functions look really noisy. This is hopefully because I am running with 20,000 points instead of 1,000,000 points (as before), so the noise should clear up when I re-run with a larger data set. I wouldn't expect the correlations to look as smooth on a smaller data set as they do with the mock data, because the mock data, by design, traces the dark matter population explicitly, so the dark matter signal is put in by hand. There is also more noise in this data than in the mock data. It makes sense to me that we might need higher statistics to get the same strength of signal as we did with the mock data.


The Good News
Even with the noisy correlation functions, the reconstruction doesn't look too bad. There seems to be a problem at the edges (the reconstruction is being forced to zero at redshift zero, which changes the shape at the low end), but overall it looks pretty good. (At least much better than it's ever looked before.) Thoughts, Alexia?


I've chosen the binning which looks the best. As with the mock data, the reconstruction changes a lot depending on the binning. However, it looks like I might actually have something working here! Woo hoo!

Next steps
Higher Statistics: I'm not sure if I should run on Eyal's LRGs or on a bigger set of CAS data. The CAS data is easier to do right away, so I might as well set that going (it will take a couple of days), and in the meantime I can work on getting my code to read in Eyal's randoms instead of generating its own. Will talk to Schlegel about this and see what he thinks.

Other Catalogs: According to Eric Huff, Reiko Nakajima has a SDSS catalog with her own calculated photo-z's that she uses for lensing. I will talk to her about this. I also should get in touch with Antonio Vazquez about catalogs.

Optimization: There are quite a few things I can do to make this code run faster. I need to spend some time writing scripts to calculate all the correlation functions simultaneously (right now they are calculated in succession). This should speed up the code by a factor of ten. I also should write scripts so that I can just set a run going and then log out of the terminal window.
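A minimal sketch of the kind of speed-up meant here, using Python's multiprocessing to launch the independent correlation calculations at the same time; run_one_correlation is a hypothetical wrapper around the existing correlation-function call, and the job list is illustrative:

import multiprocessing

def run_one_correlation(args):
    workingDir, argumentFile = args
    # call the existing correlation-function code for this job here
    return workingDir

if __name__ == "__main__":
    jobs = [("bin0", "args_bin0.txt"),
            ("bin1", "args_bin1.txt"),
            ("bin2", "args_bin2.txt")]
    pool = multiprocessing.Pool(processes=len(jobs))
    finished = pool.map(run_one_correlation, jobs)   # jobs run in parallel
    pool.close()
    pool.join()

(For setting a run going and then logging out, something like nohup or a screen session would also do the trick.)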

Organizational Note
For a while now I've been creating a log file for each day with the code I use to generate the plots for the blog post that day. I've been keeping these log files locally on my computer, but decided it might be useful for collaborators to have copies of them, and also for there to be a back-up should something happen to my laptop. I've now added these logs to the repository under: repositoryDir/Jessica/pythonJess/logs. The name of the log file is the date of the blog post which the code corresponds to.

Thursday, December 3, 2009

Variable Likelihood

David Schlegel and I added variability information into the "everything" likelihood training file. This allows us to separate objects that vary (QSOs are included in this list) from objects that don't vary. This should improve our training by removing quasars from the "everything" file.

The training file now has chi^2 information on the changes in fluxes over repeat observations. The likelihood computation splits L_EVERYTHING into the following chi^2 bins:

L_EVERYTHING_ARRAY[0] : objs1.flux_clip_rchi2[2] LT 1
L_EVERYTHING_ARRAY[1] : objs1.flux_clip_rchi2[2] GE 1.1 AND objs1.flux_clip_rchi2[2] LT 1.2
L_EVERYTHING_ARRAY[3] : objs1.flux_clip_rchi2[2] GE 1.2 AND objs1.flux_clip_rchi2[2] LT 1.3
L_EVERYTHING_ARRAY[4] : objs1.flux_clip_rchi2[2] GE 1.3 AND objs1.flux_clip_rchi2[2] LT 1.4
L_EVERYTHING_ARRAY[5] : objs1.flux_clip_rchi2[2] GE 1.4 AND objs1.flux_clip_rchi2[2] LT 1.5
L_EVERYTHING_ARRAY[6] : objs1.flux_clip_rchi2[2] GE 1.5 AND objs1.flux_clip_rchi2[2] LT 1.6
L_EVERYTHING_ARRAY[7] : objs1.flux_clip_rchi2[2] GE 1.6
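For reference, flux_clip_rchi2 is presumably a reduced chi^2 of the repeat flux measurements against their (clipped) mean, i.e. something like

\chi^2_{\nu} \;=\; \frac{1}{N_{\rm obs}-1} \sum_{k=1}^{N_{\rm obs}} \frac{(f_k - \bar{f})^2}{\sigma_k^2} ,

so a non-varying object sits near chi^2 ~ 1 and a variable object (such as a quasar) sits well above 1. (This is my reading of the quantity, not taken from the code.)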

Changing L_Ratio to be L_QSO / total(L_EVERYTHING_ARRAY[0:5]) (which removes from L_EVERYTHING the variable objects, i.e. those with chi^2 above the cut), we get more QSOs. From the truth-table data, using different chi^2 cuts:

L_ELSE = total(likechi.L_EVERYTHING_ARRAY[0:1],1) ; chi^2 LT 1.2
;new quasars selected
QNEWSELECT LONG = Array[77]
;new stars selected (not quasars)
SNEWSELECT LONG = Array[106]
; #new_quasars/#new_stars
0.726415

L_ELSE = total(likechi.L_EVERYTHING_ARRAY[0:2],1) ; chi^2 LT 1.3
;new quasars selected
QNEWSELECT LONG = Array[69]
;new stars selected (not quasars)
SNEWSELECT LONG = Array[92]
; #new_quasars/#new_stars
0.750000

L_ELSE = total(likechi.L_EVERYTHING_ARRAY[0:3],1) ; chi^2 LT 1.4
;new quasars selected
QNEWSELECT LONG = Array[67]
;new stars selected (not quasars)
STILLMISSING LONG = Array[159]
; #new_quasars/#new_stars
0.797619

L_ELSE = total(likechi.L_EVERYTHING_ARRAY[0:4],1) ; chi^2 LT 1.5
;new quasars selected
QNEWSELECT LONG = Array[65]
;new stars selected (not quasars)
SNEWSELECT LONG = Array[79]
; #new_quasars/#new_stars
0.822785

L_ELSE = total(likechi.L_EVERYTHING_ARRAY[0:5],1) ; chi^2 LT 1.6
;new quasars selected
QNEWSELECT LONG = Array[62]
;new stars selected (not quasars)
SNEWSELECT LONG = Array[72]
; #new_quasars/#new_stars
0.861111



This is L_EVERYTHING (chi2 less than 1.4) vs L_QSO
Green: Not quasars
Magenta: Quasars that the old likelihood method (no variability) targeted
Cyan: Quasars that the likelihood + variability method targets
Red: Quasars we are still missing (not targeting).

On another note, possible Thesis Title: "Putting Galaxies in their Place." Thoughts?

Wednesday, November 25, 2009

Likely Testing

I'm trying to track down why the likelihood method is missing quasars that other targeting methods are finding.

Below is a plot of human-confirmed QSOs that were in the target list for the likelihood method. The magenta are targeted by likelihood, the cyan are missed by likelihood, and the white are targeted only by likelihood (click on image to enlarge):

Color-Color (u-g vs g-r)
of likelihood selected objects (magenta),
likelihood missed objects (cyan),
likelihood only objects (white)

The reason this plot looks different from yesterday's is that I was accidentally plotting both QSOs and stars yesterday. The plot above shows only human-confirmed QSOs.

Below is a log-log plot of the likelihood selection variables L_everything vs L_qso for the likelihood-selected QSOs (magenta), the likelihood-missed QSOs (cyan), the likelihood-selected stars (red), and the likelihood-missed stars (green). You can see that the value of L_ratio we cut on is the slope of the dividing line between these populations (click on image to enlarge):


L_everything vs L_qso

I've looked at the individual fluxes of the likelihood-selected/missed QSOs to see if there is an obvious place in flux-color space where we can get more of the missing objects. They are all intermixed.

David Schlegel suggested increasing the 'added errors' in the likelihood calculation to see if this improves things. I am re-running the likelihoods on these ~1400 objects (above) with larger errors. The hope is that, because we have such a sparse sample of QSOs, increasing the errors will allow objects that are "farther away" in flux space to be selected. The errors are currently on the order of ~1%. I am re-running with 10%; however, David suggests I do it with 5%, so I'll run with that too.
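A minimal sketch of what "added errors" means here as I understand it -- a fractional error floor added in quadrature to the measured flux errors; the fraction and the quadrature sum are my assumptions, not necessarily how the bosstarget code does it:

import numpy as np

def inflate_errors(flux, flux_err, f_add=0.10):
    # Add a fractional error floor in quadrature to the measured flux errors.
    # f_add ~ 0.01 corresponds to the current ~1% errors; 0.05 and 0.10 are
    # the larger values being tested.
    flux = np.asarray(flux)
    flux_err = np.asarray(flux_err)
    return np.sqrt(flux_err**2 + (f_add * flux)**2)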

Monday, November 23, 2009

Revisiting Likelihood

Nick, David and Adam want me to check that the single epoch likelihood computation is correct. I also need to address some long-standing questions about the likelihood method including:
1. Why are we getting z > 3 when we aren't targeting these objects?
2. What is the proper normalization of the likelihood objects?
And several other questions posted a while back and never answered.

The likelihood code is at the following location on the repository:
https://trac.sdss3.org/browser/repo/boss/bosstarget/trunk/pro/qso-like/

I've also downloaded it here on riemann:
/home/jessica/boss/bosstarget/trunk/pro/qso-like

The varcat files are here (on riemann):
/home/schlegel/varcat4

The star/qso template files are here on the repository:
https://trac.sdss3.org/browser/repo/boss/bosstarget/trunk/data

I've also downloaded it here on riemann:
/home/jessica/boss/bosstarget/trunk/data

Myers sent me a fits file that has the single epoch targets with the likelihoods he computed. He wants me to check that they look reasonable. The file is in an email sent Nov 18/19 from Adam Myers with the subject line: Likelihood Method.

Below is an email I sent to Myers today regarding this check:

-------------------
From: Jessica Ann Kirkpatrick
Date: Mon, Nov 23, 2009 at 4:15 PM
Subject: Re: Likelihood method
To: Nic Ross
Cc: Joseph Hennawi , Adam Myers, David Schlegel

Adam,
when I compute the likelihoods I get the same values as you do for
like_ratio, like_everything, like_QSO_Z (well, within expected
rounding errors).

One thing that I had a question about is how like_rank is numbered. I
was assuming that like_rank = 0 would be the target with the largest
like_ratio, but it seems it is the reverse (like_rank = 0 is the
lowest like_ratio object).

So we are sorting target selection by largest ranking to smallest?

I've attached a color-color (ug, gr) plot of the 10,000 highest ranked
likelihood objects (cyan) on top of "everything" objects (red) and
qsos (green). You can see that these targets fall in the quasar zone.



Useful Mac Tips of the Day (via Demitri):
If you want to ssh into your Mac, go to
  • System Preferences -> Sharing -> Remote Login
  • Turn Remote Login on. Then find out your IP address by typing the following into a web browser: http://whatismyipaddress.com/
You can then ssh/scp into your Mac in the normal way using jessica@myipaddress!

Also... Alexia will be happy to know that I finally implemented different terminal colors for different computers that I am logged into (so much less confusing):


Top terminal window (green) is riemann,
lower terminal window (black) is my Macbook.

And... Demitri showed me how to turn any mp3 into a ring tone using Garage Band. What a great day for cool tricks! Now my alarm clock will be the beginning of James Brown's Sex Machine: "Get-up, get-on-up. Get-up, get-on-up."

Wednesday, November 18, 2009

Degrees vs. Radians

Turns out the "List length exceeded MaxList" error was due to the randoms not falling in the same region as the data. I had forgotten to convert my data from radians to degrees in Python (but it was being converted in the C program) and so the C-converted randoms (in x, y, z) did not match the Python-converted data (in x, y, z).

I've fixed this, and get the following 3D correlation functions for my redshift slices of the Mock data (and they look pretty darn good)!


My 3D Correlation function on Mock Data

This makes me want to make the bold statement that all the problems I was having with my 3D correlation functions on the Sloan data were a problem with the data (not my code)... but I need to check this again, because the 2D correlation functions looked OK on the Sloan data. This could be an issue of the photo-z's just being completely wrong (David keeps saying he doesn't trust the CAS), so the 2D correlation function (which only uses RA and Dec) is fine, but once we start using the photo-z's things get messed up.

Another thought I had: I still need to fix the random-declination cosine population issue in the 2D correlation function; currently the declinations are being populated using a flat distribution. A sketch of the fix is below.
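A minimal sketch of the fix, assuming the goal is randoms that are uniform per unit solid angle inside an RA/Dec box; drawing sin(dec) uniformly gives the cos(dec) weighting automatically (names are mine):

import numpy as np

def sky_randoms(n, ra_min, ra_max, dec_min, dec_max):
    # Random points uniform on the sphere inside an RA/Dec box (degrees).
    ra = np.random.uniform(ra_min, ra_max, n)
    # Draw sin(dec) uniformly so the density per unit solid angle is flat,
    # i.e. the dec histogram follows cos(dec) instead of being flat in dec.
    smin, smax = np.sin(np.radians(dec_min)), np.sin(np.radians(dec_max))
    dec = np.degrees(np.arcsin(np.random.uniform(smin, smax, n)))
    return ra, dec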

Oh today is a good day indeed!

Thursday, October 29, 2009

Big Mock Run

I am performing several tests to figure out what is wrong with my 3D correlation functions. The first is to run on a bigger mock data set.

I had a discussion with Nic Ross today about the proper way to deal with populating randoms. I was starting to fear that I was doing something wrong in that I force the randoms to have the same distribution in the "redshift" dimension. I was told to do this by David and Nikhil, but wondered if this could be the cause of the problem. Nic assured me that this was the right thing to do, and suggested I read the following papers about correlation functions:

The 2dF Galaxy Redshift Survey: correlation functions, peculiar velocities and the matter density of the Universe

The 2dF QSO Redshift Survey - XIV. Structure and evolution from the two-point correlation function


Nic looked at my 3D correlation functions and agrees that there is something fishy about them.

In terms of the bigger run:
I am doing the same comparison of Alexia's and my correlation 3D correlation functions that I did yesterday, but on a much bigger box/sphere. My data is contained in a sphere with radius 471.0 Mpc/h (half of the box). Alexia's data is the entire box.

Some plots of the data masks:




I am worried that some of the geometry effects will come into play when I compare the correlation functions because
  1. Alexia's box has more points (corners of the box)
  2. Alexia's box is a box, while mine is a sphere
  3. When you look at a histogram of the distribution of the declinations, it is not flat (you would expect this because there are more objects in the middle of the sphere, where dec is ~pi/2, than at the top or bottom of the sphere). However, when I populate the randoms I think I populate them uniformly... need to check this.
This takes a while to run because there are 300k objects in each box. Results to follow...

Monday, October 26, 2009

Rage!

The reconstruction doesn't work.

I am so frustrated! I don't know what I am doing wrong. It doesn't match at all. I am having one of those moments where I feel like nothing I do ever works and I am a complete failure as a grad student. I mean, I'm in my 5th year of my PhD and don't have a single paper published, nor a working project. Everything that I am doing that does work is someone else's code (i.e. Alexia or David) and therefore has nothing to do with my talent or skills. Rage Rage Rage!

Here is a histogram of the "redshifts" (converted to units of comoving line-of-sight distance (Gpc/h) away from the observer) of the photometric data set (based on the Sloan photo-z's):

This is what the reconstruction should look like. However, when I do the reconstruction I get the following:
(The green is the reconstruction)

This actually paints a better picture than I really have because
1) The normalization doesn't work, so I am tuning the normalization to match the answer (which I can't do when I don't know the answer).
2) This is with really coarse binning. If I use finer binning (which is what we would ideally want to do), I get worse results:


What is up with the reconstruction going to zero (~1, 1.6, 1.9 Gpc/h)? I am so confused about that. Something that Adam and I discussed last night was that we would hope that this method would work at least as well as simply taking a histogram of the redshifts of the spectroscopic data set. In this case the spectroscopic data is actually from the same data as the photometric set, so the redshift distributions of the two sets are almost identical:


The fact that the reconstruction is significantly worse than this is really disheartening. I give up for today. I'm going to work on likelihood stuff in rebellion! (and post a sad facebook status message so people feel sorry for me and make me feel better)

Friday, October 9, 2009

Talking with Weinberg

I ran into David Weinberg at tea at the IAS today. We started talking about the Newman Project.
He suggested that after we get the method working on Stripe 82 using the LRGs as our "spectroscopic sample" and the rest of the galaxies as our "photometric sample" we could compare the reconstructed distribution with spectroscopic "follow-up" surveys such as COSMOS, Vimas?, and DEEP.

Then it would be interesting to break down the main sample (i.e. the rest of the galaxies) by colors and do redshift distribution reconstruction on each type of galaxy. We could also break these into photometric redshift slices and then look at the reconstruction in each slice and break those down by colors, to see if perhaps there is a different spread in the colors.

I thought these were really good ideas, and ones that I hadn't thought of before. Need to talk to Schlegel/Nikhil and see what they think. Oh exciting!

Friday, October 2, 2009

BOSS Data Kickoff Meeting

Today there was a meeting at LBNL to introduce people to the BOSS Data pipeline.
Some useful links (apologies, those not part of the SDSS collaboration won't be able to view these without membership):
Photometry information: http://www.sdss.org/dr7/algorithms/photometry.html
Photometry will be essentially the same as SDSS-II

Flags information: http://www.sdss.org/dr7/products/catalogs/flags.html
It was recommended that I use the new reduction on riemann instead of those on the CAS. Need to get David/Eric to help me get a catalog of objects with good photo-z's to run the reconstruction code on.
QSO Truth Table: https://trac.sdss3.org/wiki/BOSS/data_access
https://trac.sdss3.org/attachment/wiki/BOSS/target/Commissioning/
Need to see how well likelihood did in comparison to other methods for QSO target selection
View Spectroscopic Data on Riemann
Put these commands in your .bashrc file:
export IDLUTILS_DIR=$HOME/idlutils
export IDLSPEC2D_DIR=$HOME/idlspec2d
export IDL_PATH=+{$IDLUTILS_DIR/pro}:+{$IDLSPEC2D_DIR/pro}:{$IDL_PATH}
export SPECTRO_DATA=/clusterfs/riemann/raid006/bossredux/v5_4_1

In IDL, use "plotspec" to display spectra. For example:
IDL> plotspec, 3552, mjd=55098, 572


Wednesday, August 12, 2009

Tools of the Trade

I've found some useful tools for blog writing when you are trying to insert equations and images.
~~~~~

I talked with Eric Gawiser today. He has thought a bit about the Newman method and had some good suggestions for me. He suggested that we would expect bias to be a function of redshift in the Sloan data set, and therefore we would have trouble properly normalizing our reconstruction because it would be the actual redshift distribution convolved with the bias function. He said that perhaps we could get a handle on the bias as a function of redshift by doing a 3D-autocorrelation function on the photometric data. I am not sure if this is already taken into account with the reconstruction or if the reconstruction is assuming a constant bias over redshift bins. This is something I need to talk to Alexia about. Nonetheless, Eric seems to know a lot about this and would be a good person to keep talking to about it.
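To make Eric's point concrete: what the cross-correlation technique actually constrains is, schematically (my notation, not his),

\hat{\phi}(z) \;\propto\; \phi(z)\, b_P(z) ,

i.e. the true redshift distribution weighted by the (redshift-dependent) bias of the photometric sample, so without an independent handle on b_P(z) -- for example from the photometric auto-correlation he suggests -- the shape and normalization of the reconstruction are degenerate with the bias.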

~~~~~

I talked to David about the SVM-Light package. He seemed not so enthused. He said he wanted me to work out some other problems before I moved on to "code optimization." We went over how they ended up doing the selection on the QSOs using the likelihood method. He wants me to work out the normalization of the likelihood function so that the numbers actually mean something logical. He also isn't convinced that the MMT data is actually what we think it is. Apparently the method was only 75% complete with a 40% efficiency, so that isn't good. He wants me to get to the bottom of this and see if the QSO objects in the MMT data are actually quasars by comparing with Robert daSilva's variability catalogues. He also wants to check how well the method works on single-epoch data (they were running it on the co-added photometry).

He suggested that perhaps I run Michael Blanton's photo-z code on the Stripe-82 co-adds and use that as my data set for the Newman project. He suspects the photo-z's will be better than the single-epoch data. He also suggested that a reason why my 3D correlation functions look so weird may be fiber collisions, which wipe out correlations at ~Mpc scales (depending on redshift). Another reason to try this on mock data before I debug for years trying to find an error in my code (it's possible this method just won't work on this data set).

~~~~~

I gave the link to this blog to Alexia, and posted it on Facebook. I am fulfilling the whole "being accountable to people" rule. I guess I should change the blog's privacy settings so it is Google-able [DONE], and perhaps should add a link to my webpage. I hate the idea of people actually reading this (not that I think I am that important).

Tuesday, August 11, 2009

Deconstruction

The basic idea of the Newman Method redshift distribution reconstruction code is to take two sets of data which overlap in sky position (RA, Dec). One set (let's call it S) is spectroscopic data where we have known redshifts. The other set (P) is photometric data where we do not know the redshifts (or at least don't know them very accurately). Next, I bin S into line-of-sight (los) distance bins. In the example I am describing here I have binned S from 200 Mpc/h to 1400 Mpc/h with a bin width of 200 Mpc/h. Then we take the angular cross-correlation function, ω(los,θ) [see Landy & Szalay (1993) and Masjedi et al. (2006) for definitions of correlation functions], between each of these binned subsets of S and P. We then take the 3D auto-correlation, ξ(r), of the binned S with itself. The redshift distribution, φ, of P can be reconstructed by inverting the following sum over N bins:
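(The equation itself is an image in the original post; schematically, and in my notation rather than necessarily the exact form used by Schulz & White, the relation being inverted is

w_{P,i}(\theta) \;\approx\; \sum_{j=1}^{N} \phi_j \, K_{ij}(\theta) ,

where w_{P,i} is the measured angular cross-correlation of P with the i-th spectroscopic bin, K_{ij}(\theta) is a kernel built from the binned auto-correlation \xi(r) and the bin geometry, and the \phi_j are the unknown amplitudes of the redshift distribution, solved for by inverting this linear system.)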
ref: Schulz and White in Prep.

I am applying this method to galaxies in the Sloan data set. Here is what I expect the reconstructed redshift distribution to be (based on the photo-z's of the photometric data):

However, when I run the reconstruction code, I get answers which do not look anything like this redshift distribution. More disturbing is that I get answers which vary wildly depending on how many φ bins I use in the reconstruction (which makes no sense at all to me). See the plot below:


Alexia thinks that perhaps there is something wrong with my 3D correlation function (it looks pretty featureless and smooth). So the next step is to try a different binning of ξ(r) to see if this allows for a better reconstruction -- maybe I am using too many bins, and the correlation function doesn't work properly if it doesn't have enough objects per bin? David suggested that I try running my code on a set of mock data where we know the answer (and we've gotten the code to work on this data before). This will help me separate out whether this is an issue with my version of the code, with the data (I am using photometric data and pretending that the photo-z's are the actual redshifts, which may not be good enough), or with the Newman method itself.

Thursday, August 6, 2009

A Likely Result

In SDSS-III, the BOSS project is targeting 160,000 quasars (QSOs) at redshifts between 2.2 and 3.5. Below is a color-color plot of spectroscopically confirmed QSOs (green) and stars (red). You can see that the quasars and stars have regions of overlap in color space, and are therefore difficult to distinguish from each other.



David Schlegel suggested using a likelihood estimator to select quasars from stars. The basic idea is to have templates of "quasar objects" and "all other objects" we would expect to see in the Sloan data set. We then take a "test object", one which we are trying to determine is a quasar or not, and compute a likelihood between it and the two templates. The likelihood of a test object (i), for a set of template data (j) with color filters (f), is defined as follows:
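(The equation is an image in the original post; a plausible reconstruction from the definitions in the next sentence, assuming a Gaussian likelihood multiplied over filters and summed over template objects, is

L_i \;=\; \sum_{j} \prod_{f} \frac{1}{\sqrt{2\pi}\,\sigma_{if}} \exp\!\left[ -\frac{(x_{jf} - \mu_{if})^2}{2\,\sigma_{if}^2} \right] . )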
where x is the flux of an object in the template data, mu is the flux of your test object, and sigma is the error in your test object flux measurement.

I computed this likelihood with a set of test objects where I knew whether they were quasars or not. Below is a plot of the above likelihood computed on the test objects with a "quasar template" and an "everything else" template. I am plotting the likelihood of the test objects with everything else (on the x axis) versus with quasars (on the y axis). Because I know what these test objects are, I can color the quasars (green) and the stars (blue). You can see that I get a clear separation in likelihood-likelihood space of these populations, where the quasars fall along a higher likelihood for quasars and the stars fall along a higher likelihood for everything else:
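A minimal numpy sketch of the likelihood described above, with template_flux an (n_templates, n_filters) array and test_flux / test_err length-n_filters arrays for a single test object; all names are mine, not from the bosstarget code:

import numpy as np

def likelihood(test_flux, test_err, template_flux):
    # Gaussian likelihood of one test object against a set of template fluxes:
    # product over filters, summed over template objects.
    test_flux = np.asarray(test_flux)
    test_err = np.asarray(test_err)
    template_flux = np.asarray(template_flux)
    chi2 = (template_flux - test_flux) ** 2 / test_err ** 2      # (n_templates, n_filters)
    norm = 1.0 / (np.sqrt(2.0 * np.pi) * test_err)               # (n_filters,)
    per_object = np.prod(norm * np.exp(-0.5 * chi2), axis=1)     # product over filters
    return per_object.sum()                                      # sum over template objects

# usage sketch:
# L_qso        = likelihood(flux_i, err_i, qso_template_flux)
# L_everything = likelihood(flux_i, err_i, everything_template_flux)
# L_ratio      = L_qso / L_everything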