Sunday, December 11, 2011

Teaching Statement

Some of my job applications required a statement of teaching philosophy. I thought I'd post it here, as it took some time to write, and might be helpful to others writing similar statements.


Teaching Experience and Philosophy
Jessica Kirkpatrick

My passion for physics and my personal academic history have combined to fuel a mission in me: I want to create a classroom environment which makes physics a more accessible subject. When diagnosed with a learning disability at age seventeen, I developed an interest in varied learning styles and strategies. Although I had always flourished in the sciences and mathematics, I would spend hours reading a single chapter of print. After extensive work with an educational therapist, I learned to draw on my strengths and accommodate my weaknesses using assistive technology as well as new approaches to learning.

While an undergraduate at Occidental College, I collaborated with another student to found the Learning Difference Association (LDA), which promoted awareness of learning disabilities and helped students with learning difficulties cope with their emotional, academic, and social concerns. LDA was a huge success: nearly half of the learning disabled students at Occidental were members, and it received the Club and Organization of the Year Award in 2000.

Founding LDA intensified my interest in understanding how people learn effectively. Hearing stories from my LDA peers about how their learning differences created challenges for them inside and outside of the classroom, I started to get a more complete perspective on how difficult learning can be for some students. While mathematics and physics came more naturally to me, I watched many classmates struggle with this material. I realized that some of the strategies LDA students used to accommodate their learning differences might be applicable to help the average physics student as well.

I next developed a physics version of the Academic Mastery Program (AMP), a tutorial program already utilized in other disciplines at Occidental. AMP provided a supplementary workshop to introductory classes, in which upperclassmen facilitated discussion, led problem-solving sessions, and conducted experiments with beginning students. I implemented this program as a sophomore, designed its lesson plans and problem worksheets, served as a facilitator for three semesters, and obtained funding to ensure its continuation.

My objective when developing AMP’s curriculum was to build on a student’s intuitions about the physical world from her day-to-day experiences. I hoped to guide her through a series of questions or exercises – creating a bridge between her observations of the world and the (sometimes unintuitive) problems that appear on typical physics exams and homework sets. Students developed this knowledge not only through pen-and-paper problem solving, but also by performing experiments, creating models using computer programs, and engaging in discussions with peers. My intention was to present the material in a variety of ways, making physics more accessible to a greater number of students.

I continued peer tutoring and advising throughout my undergraduate career. My personal history allowed me to empathize with struggling students, and my alternative learning skills proved useful for many. I gained a reputation among Occidental physics and math undergrads, and my tutoring and advising hours always began with students lined up waiting for my arrival. My talents were acknowledged by the physics faculty, who offered me a position as an adjunct instructor.

The year after I graduated from Occidental, I stayed on to work as an adjunct laboratory instructor and post-graduate researcher. The opportunity to be in charge of a classroom of 15 students helped me gain additional perspective on the role of college professor. I started to appreciate the interpersonal dynamics involved in managing a classroom, balancing the needs of the strongest or most vocal with those of students too shy to ask their questions. I did this by not only creating class-wide discussions, but having people talk in pairs or small groups. I made a point of engaging each individual student to evaluate if he was understanding the material. When a student didn’t understand, I would encourage him to talk with a peer who did. This allowed me to move on to other students, while providing the peer an opportunity to explain the concept and solidify her understanding as well. I also encouraged students to work with different people each week so that by halfway through the semester, most students knew each other, felt more comfortable engaging in class-wide discussions, and experienced a more jovial learning environment.

Every teaching session left me invigorated and extremely grateful for the opportunity to help others – especially considering all the assistance I was given when I was struggling. I decided that I wanted to become a science educator. This along with my love of physics research inspired me to pursue a PhD at UC Berkeley.

At UC Berkeley I have been a graduate student instructor (GSI) for three semesters. During this time I taught the entire physics introductory series. The structure of the introductory courses at UC Berkeley involves 2.5 hours per week of lecture with a faculty member, and 5 hours per week of labs, problem solving sessions, and office hours with a GSI. Because the majority of the students’ time per week is spent with the GSI, effective GSI instruction is critical to the student’s success.

During this time, I started to get an even broader sense of the struggles students have with introductory physics material. A major issue for students at UC Berkeley – a public university which attracts students from a wide variety of backgrounds – is inadequate mathematics and science preparation for these courses. I repeatedly saw students who understood the physical concepts and enjoyed the material, but performed poorly on homework and exams because of weak math, language, or laboratory skills. I tried to identify these students early in the semester, by giving a skills assessment in the first week of class. This worksheet had students describe their background in math and physics and tested various skills that were considered a prerequisite for the class. I would then meet with students individually to discuss their assessment, and make recommendations as to how they might improve in certain areas. This sometimes involved their attending a workshop which reviewed certain math skills or taught a specific computer program. I might even recommend that the most under-prepared student postpone taking physics until he had taken a refresher course in math.

It became clear to me that a student’s success, motivation, perseverance, and determination is highly influenced by her psychological beliefs about herself and her epistemological beliefs about physics. I observed smart, capable students belittle themselves and give up (even when they were on the right track) because they thought “I just can’t do physics.” I saw students ignore their physical intuitions, and answer questions using rote memorization because they thought “that’s just how you do science.” I found it important to be especially encouraging of students who lacked confidence, to be very open about my own academic struggles, and to provide a realistic perspective on the amount of time and energy required to understand something as challenging as physics. I tried to develop in my students a general set of problem solving skills by giving unique problems which weren’t easily solved using “plug and chug” algorithms, but encouraged creativity and deeper thought. My goal was for my students to view physics as a web of intuitive concepts instead of a disconnected list of facts and formulas.

I see my teaching philosophy as a work-in-progress. I believe a key component to being an effective teacher is to avoid complacency by continually making adjustments as I gain more experience and obtain feedback from students and peers. My hope, if given a teaching position at your institution, is to gain more formal experience in teaching and science education. I am also interested in developing educational projects, assignments, and exercises using astronomical data from the Sloan Digital Sky Survey, which would tie together my research interests with the teaching component of this position.

Friday, December 9, 2011

Chat with Nikhil

I talked to Nikhil last night about the post-doc / job search. He said some interesting things that I thought I should write here.
  • Letters are the most important thing. Good letters are more important than a good CV or research statement.
  • A good research statement is a statement that is tailored to the university you are applying to. Show that you have some interest in / idea about the work going on there.
  • Professionally it isn't great to stay at the same place, but people understand if there are extenuating circumstances (two-body problem, etc.).
  • Get as many people as possible to look over your applications, especially those that have been on hiring committees.
  • Get on a hiring committee / grant review committee. That's how you learn what people are looking for.
Helpful information. But now I am afraid that I didn't tailor my applications enough for the specific universities.

I ended up applying to 15 postdocs and 8 faculty positions. My work is done, now I just have to hope that someone wants me.

Friday, December 2, 2011

Practice Job Talk

Today I did a practice job/qual talk with my group and a few extra post-docs at LBNL. Below are some feedback notes. I also have emailed notes from Beth, David and Genevieve.

Likelihood:
*Show some plots of the 5D parameter space which shows the redshift evolution of quasars in that space and compares them to stars. Apparently there is something like this in Ross 2011.
*Also note the increase in errors with increased redshift.
*This will allow you to easily answer questions about why targeting is hard in this redshift range.
*redshift distribution plot -- what causes those bumps.
*More motivation -- show right away that 1/2 as many quasars would be targeted if we hadn't done this.
*Be able to answer questions about the quasars that we missed. Are they biased? Talk to McGreer about his models...
*Why did we go from 2 -> 3 arcseconds from SDSS-II to BOSS?
*Be able to explain EVERYTHING you put on a slide.
*BOSS did do imaging (not just spectroscopic)
*Introduce terms better
*More set-up

Clustering:
*Need to connect more to models and interpretations
*Outline plan moving forward with modeling.

More:
*Need a timeline to finish
*Need to talk about third (mini) project to finish off PhD

General:
*MW likes to ask about dark matter.
*SP likes to ask about what is the main contribution to your errors.
*Should be able to derive the likelihood equations, write down Bayes' theorem, and understand what it means.
*Read Eisenstein 2007 about BAO understand more about where that comes from
*Be able to write down the equation of state. Derive dark energy stuff.
*Know more about what goes into the different luminosity functions.
*Need to be aware of shaky voice
*Dress comfortably
*Upturn in tone, like asking a question.

Friday, November 4, 2011

Talk at Princeton

I'm at the BOSS QSO Working group meeting at Princeton today. I'm giving a talk. I am still not sure if I can publicly release some of these results/data, but here is the money-slide. It shows that I am getting a clear separation between the clustering signal of the bright quasars (x-axis) and the dim quasars (y-axis) (~3-sigma):
For readers in the SDSS collaboration the full talk and more information is here on the SDSS wiki.

Thursday, November 3, 2011

Talk at Brookhaven

Yesterday I gave the first research seminar of my life. It was a 50 minute talk at Brookhaven National Labs. Overall, I think it went well. The talk was to a small group of research scientists. About half were cosmologists, the other half were physicists of another flavor.

My hosts Erin Sheldon and Anže Slosar gave me some really helpful feedback:
  1. I wasn't expecting such a mixed audience, and I didn't give enough introduction to BOSS, and the data. For instance I didn't directly discuss why quasars are hard to target. I also didn't emphasize that the inputs to the likelihood were the photometric fluxes. I should have talked more about the difference between photometric and spectroscopic data. But in general, I need to be more aware of the audience and beef up the introduction.
  2. I went through the full derivation of the likelihood, and this wasn't necessary.
  3. I was nervous and relied on notecards. Anže said this made it look like I didn't understand what I was talking about. Basically, I need to practice the talk a bunch so I don't need cards.
  4. I misspoke when talking about the first BAO detection. I said it was from SDSS-II, but it was actually from SDSS-I.
  5. In general, I didn't feel prepared to answer all questions. I need to go through the talk and make sure I fully understand everything I am presenting -- for instance, how was the first BAO detection made?
The talk was more fun than I expected, and has definitely eased my nerves in terms of my quals. I think I can use a similar version of this talk for my quals, which is good.

Friday, October 14, 2011

CosmoloPy

Why didn't I know this package existed before?

http://roban.github.com/CosmoloPy/

Functions Include:
  • Various cosmological densities.
  • Various cosmological distance measures.
  • Galaxy luminosity functions (Schechter functions).
  • Conversion in and out of the AB magnitude system.
  • Pre-defined sets of cosmological parameters (e.g. from WMAP).
  • Perturbation theory and the power spectrum.
  • The reionization of the IGM.
Very cool!
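
For example, pulling out comoving distances looks roughly like this (a sketch based on the package documentation; the cosmological parameter values are just placeholders):

import cosmolopy.distance as cd

cosmo = {'omega_M_0': 0.27, 'omega_lambda_0': 0.73, 'h': 0.7}
cosmo = cd.set_omega_k_0(cosmo)           # fill in a flat-universe omega_k_0
d_c = cd.comoving_distance(1.0, **cosmo)  # comoving distance to z = 1, in Mpc
print(d_c)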



Thursday, October 13, 2011

More python plotting tricks

I absolutely love these python plotting pages with examples:
http://matplotlib.sourceforge.net/gallery.html
http://www.scipy.org/Cookbook/Matplotlib

Today I figured out how to change the fonts of the legend on my plot, and add a shadow. (Pretty exciting I know.) I've also made the lines a little bit thicker, and different styles (so that it looks pretty in black and white printing of the paper).

Below is the code/plot:

import pylab
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties

# chi, probsqso, probsqsow, and prob are the data arrays computed earlier
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(chi, probsqso, 'k-', label = '$\mathrm{QSOs}$', linewidth=2.0)
ax.plot(chi, probsqsow, 'k:', label = '$\mathrm{QSOs\\ Weighted}$', linewidth=2.0)
ax.plot(chi, prob, 'k--', label = '$\mathrm{Gals}$', linewidth=2.0)

ax.set_xlabel("$\\chi$", fontsize=24)
ax.set_ylabel("$\mathrm{Normalized\ }dN/d\\chi$",fontsize=24)
pylab.legend(loc=2, shadow=True, prop=FontProperties(size=18))


Friday, September 23, 2011

Blogging Update

I've been a very bad blogger recently. Basically this is because I've been spending all my time working on a project with people at Berkeley, so there isn't as much need to post, as I was mainly using this blog to communicate with Alexia about our project. Alexia, if you are reading this, I haven't given up on our project, and in fact much of the work I am doing now will be useful for us. However, considering I want to graduate this year, I've been pushing forward on this new project, which is working and close to being publishable.

My collaborators at Berkeley have wanted me to keep quiet about my new project's results (until we publish), and so I've not been able to post plots etc. for this project. However, I do find it helpful for organization and motivation to write on this blog, so I think I'll start writing on it again, if nothing else than to say what I did that day.

I've been working a lot on error analysis for my correlation functions. I have bootstrap/jackknife analysis in place and have been doing various fits to the correlation function to try to figure out the errors. It's strange because depending on what fitting methods I use, I get wildly different errors... and sometimes it seems like I am getting a statistically significant result (3+ sigma) and other times not.
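
For the record, the jackknife error bars I mean are essentially the standard formula (a sketch; xi_jk stands for the correlation function re-measured with each region left out, which is not shown here):

import numpy as np

def jackknife_errors(xi_jk):
    """xi_jk: array of shape (n_regions, n_bins), row i = correlation function
    measured with region i removed. Returns the jackknife mean and error per bin."""
    xi_jk = np.asarray(xi_jk)
    n = xi_jk.shape[0]
    mean = xi_jk.mean(axis=0)
    var = (n - 1.0) / n * ((xi_jk - mean) ** 2).sum(axis=0)
    return mean, np.sqrt(var)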

I've also been much better at writing self-contained code that is easily re-runnable. I used to drive Alexia crazy with all the "cutting and pasting" I was doing. Well, no more. I am now writing object-oriented programs with lots of functions, etc. It's actually really nice.

I've been learning a lot more about python, using cool fitting and statistics tools. Python is such a powerful language.

Thursday, September 22, 2011

On the Market

Hello readers!

I'm officially on the job market this year. Aiming to graduate in June, and hoping to start working the following fall. If you like what you've been reading, please consider hiring me.

Due to a "two-body" problem, I am mainly looking at jobs on the west coast of North America (Los Angeles, San Francisco, Seattle, Vancouver) and New York City.

Please visit my web site for more information.

Thursday, September 15, 2011

I hate IDL

I've been working a little bit in IDL for this new project because all the data is in FITS files, and Alexie has a lot of IDL code that we are using to handle the CFHT data.

Anyway, I spent all morning struggling trying to do very simple input and output from this stupid stupid language.

I learned a few things, and since I hardly ever code in IDL, I am sure I'll forget unless I write them down right now. So here we go:

1) Tag names
You can get the tag names from a structure like this:

tname = tag_names(struct)

2) Merging structures
You can merge two structures as follows:
str3 = create_struct(str1,str2,NAME='str3')

3) writecol bug
For some reason when I try to use writecol to write out data, if there are more than 6 columns and I don't specify the format, it puts a carriage return in the middle of the data. This goes away if you specify the format:

thisfile = './qsosMergedInfo.dat'
writecol,thisfile, qsos.ra, qsos.dec, qsos.z, qsos.zwarning, $
qsos.flux_clip_mean[0],qsos.flux_clip_mean[1], qsos.flux_clip_mean[2], $
qsos.flux_clip_mean[3], qsos.flux_clip_mean[4], fmt='(f,f,f,f,f,f,f,f,f)'


Wednesday, September 7, 2011

More streamlining

Martin White keeps telling me I should streamline my code/runs so that it is really easy to re-run with new data/subsets.

I've been working on doing that for running the correlation functions, and running the error analysis. I've written code:

../Jessica/qsobias/Correlate/runCrossCorrelationJackknifeMulti.py

That takes the run directories, and qso-file names as inputs (along with other constants for the correlation) and runs the correlation on all those different files at once.

Then there is code that compares the correlation functions and makes a bunch of pretty plots. I'm running it for the first time now. Hope it all works!

Oh, and I need to talk to Martin White about what to do when certain realizations of the bootstrap/jackknife are negative. This causes problems with fitting the power-law, and also isn't physical. Currently I re-run the realization if it has a negative value for the correlation function, but I don't think this is the right thing to do, as it is biasing the bootstrap to have higher clustering values.


Black Hole Masses

I had a nice chat with Benny Trakhtenbrot this morning about calculating black hole masses and his research. He had some useful suggestions. First he suggested that I look at the McLure and Dunlop 2004 paper instead of the Vestergaard and Peterson 2006 for calculating the black hole masses. I should ask Martin if he agrees.

He also suggested that I talk to Yue Shen about his catalogs of black hole masses (or maybe it was line widths). I don't know how different this is to the catalog that Ian McGreer has given us.

He also thought that perhaps we might want to do a separation by Eddington ratio and see if there is a clustering dependence there.

He was very helpful, and overall a productive meeting.

I've been working on trying to get a stronger clustering signal separation. So far I have divided the QSO sample into even brighter/dimmer sets, as well as applying a uniform redshift selection to my two samples to see if what I am measuring is actually a redshift separation rather than a luminosity one. Fun fun fun!

Saturday, September 3, 2011

Cambridge Fan?

So apparently I have a very regular reader in Cambridge, England. I was checking out my Google Analytics statistics for this blog and my personal web page yesterday (I was wondering how many people had looked at my likelihood post/paper). To my surprise, I discovered that between last March and today, a single individual in Cambridge has looked at my blog 238 times, and my personal web page 267 times. That is an average of 2.8 visits per day!

Who are you Cambridge, England? And why do you find my blog/web page so interesting?

Perhaps it is Stephen Hawking looking for inspiration? The strange thing is that I don't actually know anyone in Cambridge.

The only other city with anywhere near this many visits is Berkeley (44 on my web page, and 181 on my blog). However these visits are from 31/64 different people in Berkeley, whereas the Cambridge visitor is ONE person.

Thank you Google for providing me with such detailed information about my followers. Who else is out there reading this? Roll Call people.


My Blog

My Web Page

Friday, September 2, 2011

Likelihood Accepted!

It's been a long process, but the likelihood paper was accepted by the Astrophysical Journal this week. I submitted my final draft to them today, as well as resubmitting the accepted version to the arXiv! So happy to be done with that chapter of my research and able to focus on new work!

Here is a link to the paper!

Saturday, August 13, 2011

SDSS-III Collaboration Meeting - Vanderbilt

Exciting results! Luminosity Dependent Clustering of Quasars. More details here.


Tuesday, June 14, 2011

Temperature Plots in Python

I figured out how to do temperature plots in python. The only problem with the code below is that the numbers on the axes are wrong. If I set 'extent' to be the actual extent of the axes (in this case x = [0.5, 1], y = [-28, -18]), then it plots a very skinny, tall plot. So I had to correct the axes by hand later.


import numpy as np
import matplotlib.pyplot as plt
from pylab import *

redbins = linspace(start = 0.5, stop = 1.0, num = 51)
rcenter = (redbins[0:-1]+redbins[1:])/2.0
mbins = linspace(start = -28, stop = -18, num = 51)
mcenter = (mbins[0:-1]+mbins[1:])/2.0

# qsoM, qsored, and dr7cut are the quasar absolute magnitudes, redshifts, and sample cut
H, xedges, yedges = np.histogram2d(qsoM[dr7cut], qsored[dr7cut], bins=(mbins,redbins))
X = rcenter
Y = mcenter
Z = H

#plt.imshow(H, extent=[-3,3,-3,3], interpolation='bilinear', origin='lower')
#plt.colorbar()

plt.imshow(H, extent=[-3,3,-3,3], interpolation='nearest', origin='lower')
plt.colorbar()

xlabel('redshift (z)')
ylabel('absolute magnitude (i-band)')
title('SDSS DR7 Quasar Density')

The above code is also in the following log file: ../logs/110614log.py
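
One possible way to avoid fixing the axes by hand (an untested sketch, reusing H and plt from the code above): pass the real data ranges to extent and add aspect='auto' so imshow doesn't force square pixels:

plt.imshow(H, extent=[0.5, 1.0, -28, -18], interpolation='nearest',
           origin='lower', aspect='auto')
plt.colorbar()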

Thursday, June 9, 2011

Calculating Absolute Magnitude

Going back to some "Freshman Astronomy" as Martin White said in the meeting today. The absolute i-band magnitude (M) and the apparent i-band magnitude (m) are related by the following equation:

m = M + DM + K (Eqn. 1)

Where DM is the distance modulus and K is the K-correction.

The distance modulus is defined as follows:

DM = 5 log10 ( DL / 10pc ) (Eqn. 2)

where DL is the luminosity distance, defined as follows:

DL = (1+z)*DC (Eqn. 3)

where DC is the comoving distance (written DC to avoid confusion with the distance modulus DM above) and z is the redshift.

The K-correction is calculated the following way:

K = -2.5 (1+αν)log10 (1+z) (Eqn. 4)

αν = -0.5 according to Richards et al. 2006

So.... putting that all together:

M = m - DM - K
M = m - 5 log10 ( (1+z)*DC / 10pc ) + 2.5 (0.5) log10 (1+z)
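
Putting Eqns. 1-4 into a small Python function (a sketch; comoving_distance_mpc is a hypothetical helper returning the comoving distance in Mpc, e.g. from CosmoloPy):

import numpy as np

def absolute_magnitude_i(m, z, comoving_distance_mpc, alpha_nu=-0.5):
    """Apparent i-band magnitude m at redshift z -> absolute magnitude M,
    using the continuum-only K-correction of Eqn. 4."""
    D_C = comoving_distance_mpc(z)                   # comoving distance [Mpc]
    D_L = (1.0 + z) * D_C                            # luminosity distance (Eqn. 3)
    dist_mod = 5.0 * np.log10(D_L * 1.0e6 / 10.0)    # distance modulus, D_L converted to pc (Eqn. 2)
    K = -2.5 * (1.0 + alpha_nu) * np.log10(1.0 + z)  # K-correction (Eqn. 4)
    return m - dist_mod - K                          # Eqn. 1 rearranged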

I've decided to use this table from Richards et al. 2006 to do the K-corrections, because it corrects for both the emission line and continuum components, whereas the equation above just corrects for the continuum.

References:
http://www.mporzio.astro.it/~fiore/agn/richards_2006.pdf
http://arxiv.org/pdf/astro-ph/9905116

More QSO-Galaxy Cross Correlations

Here are the QSO-Galaxy Cross-Correlations comparing high/low redshift QSOs. I divide the QSO sample in half (0.5 < z < 0.77) & (0.77 < z <1.0):

(plots on wiki)

Here are the QSO-Galaxy Cross Correlations comparing bright/dim QSOs. I divide the QSO sample into the 1/3 brightest (20.20 < g-mag < 16.17) and 1/3 dimmest (25.87

(plots on wiki)

It's pretty interesting that there seems to be a difference in the correlation functions for bright/dim objects. I am not sure if we should expect this. The redshift distribution of these objects is similar, so that points to the fact that these objects are actually brighter (not just further away). I'm hoping to do this in terms of high/low luminosity... but I need to figure out the absolute magnitudes of these objects first. Have pinged Ross about this. More to come...

Wednesday, June 8, 2011

1st Attempt at QSO-Galaxy Cross Correlation

I've calculated the qso-galaxy correlation functions on the entire stripe, and also the stripe divided into two halves.
The code to do this is below (also in ../qsobias/Correlate/runCorrelation.py):

import numpy as N
from pylab import *
from correlationFunctions import *

#------------------------------------------------------------------------
# Set Angular Correlation Bin Information
#------------------------------------------------------------------------

oversample = 5. # Amount that randoms should be oversampled
corrBins = 15.0 # Number of correlation bins (+1)
mincorr = 0.1 # (Mpc/h comoving distance separation) Must be greater than zero if log-binning
maxcorr = 10.0 # (Mpc/h comoving distance separation)
convo = 180./pi # conversion from degrees to radians
tlogbin = 1 # = 0 for uniform spacing, = 1 for log spacing in theta

#------------------------------------------------------------------------
# Create file names (full catalog)
#------------------------------------------------------------------------
workingDir = 'fullrun1'
makeworkingdir(workingDir)
galaxyDataFile, qsoDataFile, randomDataFile, corr2dCodefile, argumentFile, \
runConstantsFile = makeFileNamesFull(workingDir)

#------------------------------------------------------------------------
# Write run constants to a file (full catalog)
#------------------------------------------------------------------------

writeRunConstantsToFile(runConstantsFile, galaxyDataFile, qsoDataFile, \
randomDataFile, corr2dCodefile, argumentFile, oversample, corrBins, \
mincorr, maxcorr, tlogbin)

#------------------------------------------------------------------------
# Compute the Angular Correlation Function (full catalog)
#------------------------------------------------------------------------

runcrossCorrelation(workingDir, argumentFile, corr2dCodefile, galaxyDataFile,\
qsoDataFile, randomDataFile, mincorr, maxcorr, corrBins, tlogbin)

#------------------------------------------------------------------------
# Create file names (Left catalog)
#------------------------------------------------------------------------
workingDir = 'leftrun1'
makeworkingdir(workingDir)
galaxyDataFile, qsoDataFile, randomDataFile, corr2dCodefile, argumentFile, \
runConstantsFile = makeFileNamesLeft(workingDir)

#------------------------------------------------------------------------
# Write run constants to a file (Left catalog)
#------------------------------------------------------------------------

writeRunConstantsToFile(runConstantsFile, galaxyDataFile, qsoDataFile, \
randomDataFile, corr2dCodefile, argumentFile, oversample, corrBins, \
mincorr, maxcorr, tlogbin)

#------------------------------------------------------------------------
# Compute the Angular Correlation Function (Left catalog)
#------------------------------------------------------------------------

runcrossCorrelation(workingDir, argumentFile, corr2dCodefile, galaxyDataFile,\
qsoDataFile, randomDataFile, mincorr, maxcorr, corrBins, tlogbin)

#------------------------------------------------------------------------
# Create file names (Right catalog)
#------------------------------------------------------------------------
workingDir = 'rightrun1'
makeworkingdir(workingDir)
galaxyDataFile, qsoDataFile, randomDataFile, corr2dCodefile, argumentFile, \
runConstantsFile = makeFileNamesRight(workingDir)

#------------------------------------------------------------------------
# Write run constants to a file (Right catalog)
#------------------------------------------------------------------------

writeRunConstantsToFile(runConstantsFile, galaxyDataFile, qsoDataFile, \
randomDataFile, corr2dCodefile, argumentFile, oversample, corrBins, \
mincorr, maxcorr, tlogbin)

#------------------------------------------------------------------------
# Compute the Angular Correlation Function (Right catalog)
#------------------------------------------------------------------------

runcrossCorrelation(workingDir, argumentFile, corr2dCodefile, galaxyDataFile,\
qsoDataFile, randomDataFile, mincorr, maxcorr, corrBins, tlogbin)

#------------------------------------------------------------------------
# Must wait for jobs to finish running & output files to be created
#------------------------------------------------------------------------

#------------------------------------------------------------------------
# Plot Correlation Functions
#------------------------------------------------------------------------

#------------------------------------------------------------------------
# Plot Full Stripe 82
#------------------------------------------------------------------------

workingDir = "fullrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta,omega=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta, omega,'r-', label = "Full Stripe")
# Number of dd pairs in each bin
ddpairs = [3029, 5404, 9578, 17175, 31653, 58113, 106518, 194566, 358573, 656623, 1211740, 2234135, 4121513, 7602541, 13931607]
errors = 1/N.sqrt(ddpairs)
errorbar(theta, omega, yerr=errors,fmt='ro')

xlabel('Separation Distance (Mpc/h)')
ylabel('omega')
title('2D QSO-Galaxy Cross-correlation Function')
pylab.legend(loc=3)

(plots on wiki)

#------------------------------------------------------------------------
# Plot Full Stripe 82, Left-Half and Right-Half of Stripe 82
#------------------------------------------------------------------------

workingDir = "fullrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta,omega=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta, omega, label = "Full Stripe")

workingDir = "leftrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta1,omega1=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta1, omega1, label = "Left Stripe")

workingDir = "rightrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta2,omega2=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta2, omega2, label = "Right Stripe")

xlabel('Separation Distance (Mpc/h)')
ylabel('omega')
title('2D QSO-Galaxy Cross-correlation Function')
pylab.legend(loc=3)

(plots on wiki)

#------------------------------------------------------------------------
# Plot Full, Left-Half, Right-Half of Stripe 82, and Mean of all three + 3-sigma errors
#------------------------------------------------------------------------

workingDir = "fullrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta,omega=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta, omega, label = "Full Stripe")

workingDir = "leftrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta1,omega1=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta1, omega1, label = "Left Stripe")

workingDir = "rightrun1"
outfile = "./"+workingDir+"/wpsOutput.dat"
theta2,omega2=readCorrOutfile(outfile) #Reads correlation output file
loglog(theta2, omega2, label = "Right Stripe")

datamatrix = [omega,omega1,omega2]
mean = N.mean(datamatrix,axis=0)
errors = 3*N.std(datamatrix,axis=0)/sqrt(3)
loglog(theta, mean, 'k-',label = "Mean")
errorbar(theta, mean, yerr=errors,fmt='ko')

xlabel('Separation Distance (Mpc/h)')
ylabel('omega')
title('2D QSO-Galaxy Cross-correlation Function')
pylab.legend(loc=3)

(plots on wiki)

I sent the above plots to Martin, Alexie, Nic and David. Martin thought they looked about right. The slope of the mean correlation function above is -0.74. Martin and Nikhil found that the slope of the 3D correlation function was about -1.7, and we would expect the slope of the projected 2D correlation function to be shallower than this by about one... so that is good.
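
As a quick sanity check on that slope, a straight-line fit in log-log space does the job (a sketch; it assumes theta and mean are the arrays from the plotting code above and keeps only bins with positive omega):

import numpy as np

theta_arr = np.asarray(theta)
mean_arr = np.asarray(mean)
good = mean_arr > 0   # log10 is undefined for non-positive omega values
slope, intercept = np.polyfit(np.log10(theta_arr[good]), np.log10(mean_arr[good]), 1)
print(slope)          # should come out near -0.74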

Next steps:
1) Try dividing the QSO samples into different redshift/brightness sets and see if we get different correlation results.
2) Implement jackknifing ability into the code.

Tuesday, June 7, 2011

More Debugging Correlation Functions

In trying to modify my angular correlation function for the qso-galaxy cross-correlation project, I found a bug that might explain why I was getting huge spikes at small angles in my angular correlation function in this post.

It turned out that (for who knows how long) I wasn't feeding the minimum correlation angle/distance into the code that makes the mesh for the correlation code. For some reason, when I called the function I had hard-coded 0.0 as the minimum correlation angle/distance instead of corrmin. This meant that all pairs with separations less than corrmin were put into the smallest bin. I don't think this exactly explains the observed spiky behavior, but now that I've fixed it I am going to see if things work better.
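
To see what the hard-coded 0.0 does in isolation, here is a toy sketch (made-up separations and bin edges, not the actual mesh code):

import numpy as np

sep = np.array([0.01, 0.05, 0.2, 0.7, 3.0, 8.0])            # toy pair separations [Mpc/h]
edges_good = np.logspace(np.log10(0.1), np.log10(10.0), 6)  # first edge at corrmin = 0.1
edges_bad = np.concatenate(([0.0], edges_good[1:]))         # first edge hard-coded to 0.0

print(np.histogram(sep, bins=edges_good)[0])  # pairs below corrmin fall outside all bins
print(np.histogram(sep, bins=edges_bad)[0])   # here they get swept into the smallest bin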

Monday, June 6, 2011

Testing QSO-Galaxy Cross CF

I spent the last couple days making sure that I trusted the correlation function code I had developed for this project. I compared it to the results from a (slow) sphere-match code on a small data set.

The following are the testing codes (also here ../logs/cross_corr_check.idl). In IDL using spherematch:

; Read in data/random files
readcol,'cfht_data_tiny.dat',rad1,decd1, format = ('D,D')
readcol, 'qso_data_tiny.dat',rad2,decd2,cd, format = ('D,D,D') ;cd is comoving distance of qsos
readcol,'cfht_randoms_tiny.dat',rar1,decr1, format = ('D,D')

; Find number of objects in files
nr1 = n_elements(rar1)
nd1 = n_elements(rad1)
nd2 = n_elements(rad2)

; Correlate out to a large theta to get all pairs
thetamax = 50.0

print,'Starting cross-correlation:'
print,'Estimating DD...'
spherematch,rad1,decd1,rad2,decd2,thetamax,ind1a,ind1b,dist_dd,maxmatch=0

; Convert from angular separation to comoving distance separation
this_dd = 2*sin(dist_dd*!pi/360)*cd[ind1b]

;Bins go from 0.1 to 10 with 15 bins.
corrmin = 0.1D
corrmax = 10.0D
nbins = 15.0D

; Find bins lower, upper and centers
bins_lower = (corrmax-corrmin)/(nbins)*findgen(nbins)+corrmin
bins_upper = (corrmax-corrmin)/(nbins)*(findgen(nbins)+1)+corrmin
rmean = fltarr(nbins)
for i = 0,(nbins-1) do rmean[i] = (bins_lower[i]+bins_upper[i])/2.

; Bin the DD separation distances
dd = fltarr(nbins)
for i = 0,(nbins-1) do dd[i] = n_elements(where(this_dd gt bins_lower[i] AND this_dd le bins_upper[i]))

print,'Estimating DR...'
spherematch,rar1,decr1,rad2,decd2,thetamax,ind1,ind2,dist_dr1,maxmatch=0

this_dr = 2*sin(dist_dr1*!pi/360)*cd[ind2]
dr = fltarr(nbins)
for i = 0,(nbins-1) do dr[i] = n_elements(where(this_dr ge bins_lower[i] AND this_dr le bins_upper[i]))

corr1 = 1L*dd/dr*1L*(nd2*nr1)/(1L*nd1*nd2)-1L

for i = 0,(nbins-1) do print, rmean[i], corr1[i]

Separation omega
0.430000 -0.115686
1.09000 -0.104478
1.75000 -0.120804
2.41000 -0.0914845
3.07000 -0.0393971
3.73000 -0.0268416
4.39000 0.0134841
5.05000 0.0596094
5.71000 0.0227162
6.37000 0.102554
7.03000 0.0929233
7.69000 0.0900670
8.35000 0.0591398
9.01000 0.0284724
9.67000 0.0598689

(Note that these "tiny" files only have 6 qsos and 9000 galaxies, so the correlation function values are very noisy; this was just a test, and I used small files to keep the run fast.)

By comparison I also have the python/C code which runs much faster (../Jessica/qsobias/Correlate/runCorrelation.py):

import numpy as N
from pylab import *
from correlationFunctions import *

#------------------------------------------------------------------------
# Create file names (tiny catalogs)
#------------------------------------------------------------------------

workingDir = 'tinyrun'
makeworkingdir(workingDir)
galaxyDataFile, qsoDataFile, randomDataFile, corr2dCodefile, argumentFile, runConstantsFile = makeFileNamesTiny(workingDir)

oversample = 5. # Amount that randoms should be oversampled
corrBins = 25.0 # Number of correlation bins (+1)
mincorr = 0.1 # (Mpc/h comoving distance separation) Must be greater than zero if log-binning
maxcorr = 10.0 # (Mpc/h comoving distance separation)
convo = 180./pi # conversion from degrees to radians
tlogbin = 1 # = 0 for uniform spacing, = 1 for log spacing in theta

#------------------------------------------------------------------------
# Write run constants to a file
#------------------------------------------------------------------------

writeRunConstantsToFile(runConstantsFile, galaxyDataFile, qsoDataFile, \
randomDataFile, corr2dCodefile, argumentFile, oversample, corrBins, \
mincorr, maxcorr, tlogbin)

#------------------------------------------------------------------------
# Compute the Angular Correlation Function
#------------------------------------------------------------------------

runcrossCorrelation(workingDir, argumentFile, corr2dCodefile, galaxyDataFile,\
qsoDataFile, randomDataFile, mincorr, maxcorr, corrBins, tlogbin)


# separation (Mpc/h) crossw (Mpc/h)
0.4300000000 -0.1156862745
1.0900000000 -0.1044776119
1.7500000000 -0.1208039566
2.4100000000 -0.0914845135
3.0700000000 -0.0393970538
3.7300000000 -0.0268417043
4.3900000000 0.0134841235
5.0500000000 0.0596093513
5.7100000000 0.0227161938
6.3700000000 0.1025539385
7.0300000000 0.0929232804
7.6900000000 0.0900670231
8.3500000000 0.0591397849
9.0100000000 0.0284723490
9.6700000000 0.0598689436


As you can see the correlation functions match!

Wednesday, June 1, 2011

Data in Order

I've finished creating the randoms for the CFHT data. They are here:

../data/jessica/alexieData/Catalogs/cfht_random_catalog.dat

So now I have the masked galaxy data:
../data/jessica/alexieData/Catalogs/cfht_data.dat

The qso data:
../data/jessica/alexieData/Catalogs/qso_data.dat

The code to make these catalogs is here:
randoms: ../Jessica/qsobias/Analysis/make_random_catalog.pro
galaxy data: ../Jessica/qsobias/Analysis/doall_cfht_se.pro
qso data: ../Jessica/qsobias/Analysis/make_qso_catalog.py

In my meeting last week, I realized that I had been calculating the correlation function incorrectly. I need to multiply the angle by the comoving distance to get a physical separation, not an angular separation. This requires tweaking the C code a bit.
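
The conversion itself is simple (a sketch; theta_deg and d_c are hypothetical inputs, the angular separation in degrees and the comoving distance to the quasar in Mpc/h):

import numpy as np

def angular_to_comoving_separation(theta_deg, d_c):
    # chord length at distance d_c; reduces to d_c * theta for small angles
    theta_rad = np.radians(theta_deg)
    return 2.0 * d_c * np.sin(theta_rad / 2.0)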

Tuesday, May 31, 2011

More Reconstruction Test

I ran the reconstruction with a different selection function to see if that made things better. The photometric distribution is shown here (workingDir is here: 'run2011530_1632'):


I'm still having trouble finding phibins that don't give me the error. But here are some reconstructions; the first one looks good-ish, the second one not so much:

Note that the label on the y-axis is not right: there are more objects in each bin than it says; I just had to normalize them to be on the same scale.

I'm running again with small angles to see if this helps. This smaller angle run is here (workingDir = 'run2011531_1451').

The angular correlation functions are freaking out at small angles. Not sure what that is about:



They look normal starting around 10^-3:


I tried doing the reconstruction both using all the angular bins and using just the angular bins that weren't freaking out; they look the same:

less angular bins:
all angular bins:

Here are some other reconstruction binnings; disturbingly, I am still getting the "feature" where if I change the # of bins slightly (from 15 to 16) I get very different reconstructions:




Because I am still not sure how doing this non-uniform binning in r affects the reconstruction, I am running again, but with uniform binning. We'll see how that goes.

Friday, May 27, 2011

Randoms for CFHT Galaxies

Spent the last couple days making the randoms for the CFHT galaxies. Here is how it's done:
; 1) Go to Analysis directory:
/home/jessica/repository/ccpzcalib/Jessica/qsobias/Analysis/

; 2) Make randoms for each pointing:
listfile = '/clusterfs/riemann/data/jessica/alexieData/se_cats/se_list.txt'
mask_dir='/clusterfs/riemann/data/jessica/alexieData/masks/'
infile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cs82_all_cat1.fits'
random_dir = '/clusterfs/riemann/data/jessica/alexieData/randoms/'
make_randoms,infile,outfile,mask_dir,random_dir,listfile

; 3) Remove randoms not in mask, combine to single file
dir = '/clusterfs/riemann/data/jessica/alexieData/randoms/'
listfile = '/clusterfs/riemann/data/jessica/alexieData/se_cats/se_list.txt'
outfile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/random_catalog.fits'
fix_randoms,dir,listfile,outfile

; 4) Remove overlapping regions
infile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/random_catalog.fits'
outfile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/random_catalog_no_overlaps.fits'
ra_list = '/home/jessica/repository/ccpzcalib/Jessica/qsobias/Analysis/S82.ra_cuts'
se_list_file = '/clusterfs/riemann/data/jessica/alexieData/se_cats/se_list.txt'
remove_overlaps,infile,outfile,se_list_file,ra_list

Here are some plots from DS9, blue circles are randoms that are kept, and red circles are randoms that are masked out. The green and yellow regions are masked regions:



The randoms are made at a density of 7.5*10^5 per square degree. This is approximately 10X the density of the data.

Next Steps with Reconstruction

I just talked to Alexia. She suggested that I change the selection function for the photometric data to have more features and extend to higher redshift, since right now we are trying to reconstruct the first "bump" at ~150 Mpc, where there are very few galaxies.

She also said I should look into how non-uniform binning (log binning) affects the summations done in the reconstruction, and whether perhaps we need to do something different to deal with the fact that the spacing is no longer the same for each bin.

Exciting that things are closer to working than ever!

Tuesday, May 24, 2011

Reconstruction (Log Binning) Results

In this post I discussed trying different binning to get the reconstruction to overlap more with the actual correlation functions.

Here is the binning I am using. The binning is log in theta (for 2d correlation function) and rlos (for 3d correlation function), but the binning in redshift is such that each bin has the same number of galaxies.
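
For reference, equal-occupancy bin edges like the rbins below can be built from quantiles (a sketch, assuming dist holds the galaxy comoving distances; not necessarily how these particular arrays were made):

import numpy as np

def equal_count_bins(dist, nbins):
    # bin edges at evenly spaced quantiles, so each bin holds the same number of galaxies
    return np.percentile(dist, np.linspace(0.0, 100.0, nbins + 1))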

Here are the arrays:

rbins = array([ 4.592407, 136.86079 , 170.595585, 194.675063, 211.888514,
227.822058, 242.439338, 254.738113, 267.180894, 277.596667,
287.557931, 297.031654, 305.563449, 314.144417, 322.061375,
329.842934, 337.956897, 345.179652, 352.14308 , 359.026505,
365.954462, 372.052363, 378.261215, 384.175881, 389.837465,
395.409799, 400.562692, 405.823926, 411.006328, 416.036745,
420.853707, 425.608283, 430.068758, 434.598696, 438.968533,
443.201263, 447.527471, 451.469393, 455.440266, 459.405314,
463.13316 , 466.657868, 470.378975, 474.188462, 478.033728,
481.934477, 485.61191 , 489.252768, 492.81945 , 496.462536,
499.999607])

theta = array([ 1.12201850e-03, 1.41253750e-03, 1.77827940e-03,
2.23872110e-03, 2.81838290e-03, 3.54813390e-03,
4.46683590e-03, 5.62341330e-03, 7.07945780e-03,
8.91250940e-03, 1.12201845e-02, 1.41253754e-02,
1.77827941e-02, 2.23872114e-02, 2.81838293e-02,
3.54813389e-02, 4.46683592e-02, 5.62341325e-02,
7.07945784e-02, 8.91250938e-02, 1.12201845e-01,
1.41253754e-01, 1.77827941e-01, 2.23872114e-01,
2.81838293e-01, 3.54813389e-01, 4.46683592e-01,
5.62341325e-01, 7.07945784e-01, 8.91250938e-01,
1.12201845e+00, 1.41253754e+00, 1.77827941e+00,
2.23872114e+00, 2.81838293e+00, 3.54813389e+00,
4.46683592e+00, 5.62341325e+00, 7.07945784e+00,
8.91250938e+00])

rlos = array([ 0.52962686, 0.59425111, 0.66676072, 0.74811783,
0.83940201, 0.94182454, 1.05674452, 1.18568685,
1.33036253, 1.49269131, 1.6748272 , 1.87918702,
2.10848252, 2.36575629, 2.65442222, 2.97831072,
3.34171959, 3.74947105, 4.20697571, 4.72030438,
5.29626863, 5.94251114, 6.66760716, 7.48117828,
8.39402009, 9.41824545, 10.5674452 , 11.85686853,
13.3036253 , 14.92691309, 16.74827196, 18.79187021,
21.08482517, 23.65756295, 26.54422221, 29.78310718,
33.41719588, 37.49471047, 42.06975708, 47.20304381])

Here are plots of the 2D and 3D correlation functions:



I've found it hard to find phibin spacing that doesn't go below the limits of the correlation function. I get the following error for many choices of binning:

sximat=crl.getximat(th,rlos,phibins,rs,xispec,xierr)
0.000505647568659 is less than minimum r_ss 0.000529627
refusing to extrapolate

Perhaps I should allow it to extrapolate to smaller angles....

but the following phibins works:

phibins = array([ 0.01 , 0.03578947, 0.06157895, 0.08736842, 0.11315789,
0.13894737, 0.16473684, 0.19052632, 0.21631579, 0.24210526,
0.26789474, 0.29368421, 0.31947368, 0.34526316, 0.37105263,
0.39684211, 0.42263158, 0.44842105, 0.47421053, 0.5 ])

Here is what the xi and sximat plots look like, as you can see they overlap more now:

But the reconstruction isn't so great:


If I change the binning (here is 50 bins):

phibins =
array([ 0.00459241, 0.13686079, 0.17059558, 0.19467506, 0.21188851,
0.22782206, 0.24243934, 0.25473811, 0.26718089, 0.27759667,
0.28755793, 0.29703165, 0.30556345, 0.31414442, 0.32206137,
0.32984293, 0.3379569 , 0.34517965, 0.35214308, 0.3590265 ,
0.36595446, 0.37205236, 0.37826121, 0.38417588, 0.38983746,
0.3954098 , 0.40056269, 0.40582393, 0.41100633, 0.41603674,
0.42085371, 0.42560828, 0.43006876, 0.4345987 , 0.43896853,
0.44320126, 0.44752747, 0.45146939, 0.45544027, 0.45940531,
0.46313316, 0.46665787, 0.47037898, 0.47418846, 0.47803373,
0.48193448, 0.48561191, 0.48925277, 0.49281945, 0.49646254,
0.49999961])

Another thing is that now that I have so many theta, rlos, and r bins, crl.getCoeff takes a long time to run.


I suppose this looks better. At this point I need to either try a different binning, or re-write crl.getximat to allow extrapolation to smaller angles/separations so that I can try different reconstructions. Opinions, Alexia?

Data Handling

For the last few days I've been doing a bunch of data handling for this new project. Below is what I've done:

Alexie wrote a pipeline to collate data and apply masks. The code is here: .../qsobias/Analysis/doall_cfht_se.pro

I added some code to remove the repeats: ../qsobias/Analysis/remove_repeats.pro

To just start us off I wrote some code to trim the data and make randoms in the entire footprint (because we don't have masks yet). This code is in the following log file: .../logs/110522log.pro

;I start with the galaxy catalog from Alexie: ../Catalogs/cs82_all_cat6.fits
;I remove repeats:

cat6 = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cs82_all_cat6.fits'
cat7 = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cs82_all_cat7.fits'

infile = cat6
outfile = cat7
remove_repeats,infile,outfile

;Read in file with repeats removed

infile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cs82_all_cat7.fits'
str=mrdfits(infile,1)

;Do some simple geometry cuts on ra, dec, and z
deltacut = where(str.DELTA_J2000 gt -1.0 and str.DELTA_J2000 lt 0.94)
str = str[deltacut]

alphacut = where(str.ALPHA_J2000 gt -42.5 and str.ALPHA_J2000 lt 45.0)
str = str[alphacut]

zcut = where(str.ZPHOT le 1.0)
str = str[zcut]

;Do some cuts that Alexie recommends to make sure we are getting galaxies, inside the mask, and only down to mag = 23.7
gooddatacut = where(str.mask eq 0 and str.mag_auto lt 23.7 and str.class_star lt 0.9)
str = str[gooddatacut]

;Outfile data to files
outfile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cs82_all_cat8.fits'
mwrfits, str, outfile, /create

;This is the galaxy catalog I am using for the preliminary correlation function test
thisfile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cfht_data.dat'
writecol,thisfile,str.alpha_j2000,str.delta_j2000,str.zphot

;Now to do the same to the BOSS Data
;This code can be found in 110519log.py

;Read in BOSS QSO Data
qsofile = '/home/jessica/qsobias/Stripe_82_QSOs_unique.dat'
readcol,qsofile,ra,dec,z,gmag,format='(D,D,D,D,X)'

;Remove duplicates
spherematch, ra, dec, ra, dec, 2./3600, m1, m2, maxmatch=0
dups = m1[where(m1 gt m2)]
good = [indgen(n_elements(ra))*0]+1
good[dups]=0
ra = ra[where(good)]
dec = dec[where(good)]
z = z[where(good)]
gmag = gmag[where(good)]

thisfile = '/home/jessica/qsobias/QSOs_unique.dat'
writecol,thisfile,ra,dec,z,gmag

#Now in python

import numpy as N
from pylab import *
from dataplay import *

galfile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cfht_data.dat'
gal=N.loadtxt(galfile,comments='#')
galra = gal[:,0]
galdec = gal[:,1]
galz = gal[:,2]

#Make Cuts on QSOs
qsofile = '/home/jessica/qsobias/QSOs_unique.dat'
qso=N.loadtxt(qsofile,comments='#')
qsora = qso[:,0]
qsodec = qso[:,1]
qsoz = qso[:,2]
qsog = qso[:,3]


#Dec between -1.0 and 0.94
deccut = N.where((qsodec < 0.94) & (qsodec > -1.0)) #make sure they are inside the cfht mask

qsora = qsora[deccut]
qsodec = qsodec[deccut]
qsoz = qsoz[deccut]
qsog = qsog[deccut]

#Redshift between 0 and 1
zcut = N.where((qsoz >= 0.0) & (qsoz < 1.0))
qsora = qsora[zcut]
qsodec = qsodec[zcut]
qsoz = qsoz[zcut]
qsog = qsog[zcut]

#RA range (-42.5,45)
highra = N.where(qsora > 300)
qsora[highra] = qsora[highra] - 360.
racut = N.where((qsora < 45.0) & (qsora > -42.5))

qsora = qsora[racut]
qsodec = qsodec[racut]
qsoz = qsoz[racut]
qsog = qsog[racut]

#Write masked data to files
thisfile = '/home/jessica/qsobias/QSO_data_masked.dat'
writeQSOsToFile(qsora,qsodec,qsoz,qsog,thisfile)

#This is the QSO data file
thisfile = '/home/jessica/qsobias/QSO_data.dat'
writeQSOsToFile2(qsora,qsodec,qsoz,thisfile)

#Plot the galaxies and qsos
plot(galra, galdec, 'b,', label = 'galaxy data')
plot(qsora, qsodec, 'r,', label = 'qso data')

xlabel('ra')
ylabel('dec')
title('RA & Dec of Data')
from pylab import *
legend(loc=1)

# Translate into comoving distance (from redshift, ra, dec)
# (store the Cartesian z under a separate name so the redshift array qsoz is not
# overwritten -- it gets histogrammed as a redshift below)
qsox,qsoy,qsozcart,qsocd=raDecZtoXYZ(qsora,qsodec,qsoz)

min(qsocd)
0 #Mpc/h
max(qsocd)
2312.67 #Mpc/h

# Plot redshift distribution of galaxies and qsos

bins = linspace(start = 0.01, stop = 1.0, num = 20)
histresult = histogram(qsoz, bins=bins)
qson = histresult[0]
plot(bins, qson, 'b.-', label = 'qso data')

histresult = histogram(galz, bins=bins)
galn = histresult[0]
plot(bins, galn/1000, 'r.-', label = 'galaxy data (k)')

xlabel('redshift bin (z)')
ylabel('number of QSO/gal in bin')
title('Redshift Distribution of Galaxies and QSOs')
from pylab import *
legend(loc=2)

;Back to IDL
;Make random catalog
; code in ../log/110519log.py

;Read in galaxy data
thisfile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cfht_data.dat'
readcol, thisfile, ra, dec, z, format='(F,F,F)'

;Make 10X as many randoms as we have galaxies
nrandoms = n_elements(ra)*10

;Make random ra / dec in the Stripe 82 range
randomdec = 90.0-acos(RANDOMU(S, 2*nrandoms)*0.03385776-0.01745246)*180./!pi
deccut = where(randomdec GT -1.0 AND randomdec LT 0.94)
randomdec = randomdec[deccut]
randomdec = randomdec[0:nrandoms-1]
randomra = RANDOMU(S, nrandoms)*87.5 - 42.50

; Make file of ra, dec of spectroscopic randoms inside mask
thisfile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cfht_randoms.dat'
writecol,thisfile, randomra, randomdec

;Calculate correlation function using spherematch (should actually use c-code because this takes a long time)

;Read in galaxy data
thisfile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cfht_data.dat'
readcol, thisfile, gra, gdec, gz, format='(F,F,F)'

;Read in random data
thisfile = '/clusterfs/riemann/data/jessica/alexieData/Catalogs/cfht_randoms.dat'
readcol, thisfile, rra, rdec, format='(F,F)'

;Read in QSO data
thisfile = '/home/jessica/qsobias/QSO_data.dat'
readcol, thisfile, qra, qdec, qz, format='(F,F,F)'

;Make theta bins
theta = findgen(11)/2.-5.
logtheta = 10D^(theta)
maxtheta = max(logtheta)
bins = logtheta
nbins = n_elements(bins) - 1

nr = n_elements(rra)
nq = n_elements(qra)
ng = n_elements(gra)

;Find distance separations using sphere match
print,'Starting cross-correlation:'
print,'Estimating DD...'
spherematch,qra,qdec,gra,gdec,maxtheta,ind1a,ind1b,dist_dd,max=0

print,'Estimating DR...'
spherematch,qra,qdec,rra,rdec,maxtheta,ind1,ind2,dist_dr,max=0
print,'Done cross-correlating.'

;Find number of pairs per bin
bins_lower = bins[0:nbins-1]
bins_upper = bins[1:nbins]

dd = fltarr(nbins)
dr = fltarr(nbins)
rmean = fltarr(nbins)

for i = 0,(nbins-1) do dd[i] = n_elements(where(dist_dd gt bins_lower[i] AND dist_dd le bins_upper[i]))

for i = 0,(nbins-1) do dr[i] = n_elements(where(dist_dr gt bins_lower[i] AND dist_dr le bins_upper[i]))

;Calculate the correlation function
corr1 = dd/dr*(nq*nr/(nq*ng)) - 1.0

Friday, May 20, 2011

New Project

Because things aren't yet working with my reconstruction project, David has suggested I work on something else in parallel so that if I end up not being able to get a result with the reconstruction, I still have another option for my thesis.

Here is the new project description:
~~~~~~~~~~~~

SDSS-III Project 127: Cross-correlation of BOSS spectroscopic quasars on Stripe 82 with photometric CFH galaxies.

Participants:
Jessica Kirkpatrick
Martin White, David Schlegel, Nic Ross, Alexie Leauthaud, Jean-Paul Kneib

Categories: BOSS

Project Description:
We plan to measure the intermediate scale clustering of low redshift BOSS quasars along Stripe 82 by cross-correlation against the photometric galaxy catalog from the CFH i-band imaging on Stripe 82. There are approximately 1,000 BOSS quasars with 0.5 < z < 1 and just under 6 million galaxies in the -43 < RA < 43 and -1 < DEC < 1 region brighter than i = 23.5, which should lead to a strong detection of clustering over approximately 2 orders of magnitude in length scale. The geometry of the stripe suggests errors on the cross-correlation can be efficiently obtained by jackknife or bootstrap sampling the ~50 2x2 degree blocks.

We intend to split the QSO sample in luminosity and black hole mass. We plan to estimate the BH mass using the fits from Vestergaard and Peterson, knowing the Hbeta line width and the continuum luminosity at 5100A. The pipeline measures the former, we plan to measure the latter from the photometry calibrated with Ian McGreer's mocks.

If we use the QG/QR-1 estimator we do not need the quasar mask, only that of the galaxies which will be provided by the CS82 team in the form of a pixelized mask from visual inspection. The dN/dz of the galaxies is known from photometric redshifts plus spectroscopic training sets. While the galaxies could be split in photometric redshift bins, the gains from doing so are not expected to be large, so our initial investigations will simply cross-correlate the quasars with the magnitude limited galaxy catalog.

Along with this project we will submit a request for EC status for:

Ludo van Waerbeke
Hendrik Hildebrant
David Woods
Thomas Erben

who were instrumental in obtaining and reducing the CS82 data and producing the required galaxy catalog and mask but are not members of the BOSS collaboration.

Details are available at https://www.sdss3.org/internal/publications/cgi-bin/projects.pl/display_project/127

~~~~~~~~~~~~~
The first thing Martin wanted me to do was to approximate the error bars for the correlation function based on the density of galaxies and quasars in my sample.

Starting with the luminosity function in this paper, I am doing the following to estimate the density.

According to Table 1 in Ilbert et al., the following are the Schechter parameters for the galaxy luminosity function, redshift 0.6-0.8:

ϕ* = 5.01e-3 h^3 Mpc^(-3) mag^(-1)
M* (i-band) = -22.17
α = -1.41

Inputting these into IDL's lf_schechter function, we get the following:

mags = findgen(5*20+1)/20. - 23
vals= lf_schechter(mags, 5.01, -22.17, -1.41)
plot, mags, vals*10D^(-3), /ylog, XTITLE = 'Magnitude (iband)', YTITLE = 'log phi', TITLE = 'Luminosity Function', charsize = 2, charthick = 1


To get the density we integrate:
density = 0.05*vals*10D^(-3) #bin size is 0.05 mags
print, total(density)
0.041830996
=4.18 * 10^-2 h^3 Mpc^-3
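
The same estimate in Python, for reference (a sketch using the standard magnitude form of the Schechter function with the Ilbert et al. parameters above):

import numpy as np

phi_star, M_star, alpha = 5.01e-3, -22.17, -1.41   # h^3 Mpc^-3 mag^-1, z = 0.6-0.8

def schechter_mag(M):
    # Schechter function per unit magnitude: 0.4 ln(10) phi* x^(alpha+1) exp(-x), x = L/L*
    x = 10.0 ** (0.4 * (M_star - M))
    return 0.4 * np.log(10.0) * phi_star * x ** (alpha + 1.0) * np.exp(-x)

mags = np.linspace(-23.0, -18.0, 101)
print(np.trapz(schechter_mag(mags), mags))   # ~0.04 h^3 Mpc^-3, consistent with the IDL number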

This is similar to what they get in this paper by Faber et al. According to the Faber paper, the luminosity density at redshift 0.7 is log10(j_B) = 8.5 (in solar luminosities per Mpc^3); assuming a typical galaxy luminosity of 10^10 solar luminosities:

j_B = 10^8.5 solar lum / Mpc^3 -> 10^(-1.5) galaxies / Mpc^3 (h = 0.7)
= 3.16*10^-2 galaxies / Mpc^3 (h = 0.7)
= 9.21*10^-2 galaxies h^3 / Mpc^3

So we are looking at a density of something in the ballpark of 0.05 galaxies per (Mpc/h)^3.

To get it directly from the data:

5.5254 million galaxies (once mask/cuts applied)
sky area = 166 deg^2
redshift range = 0 - 1
redshift 1 = 2312.67 Mpc / h (in comoving distance)
volume = 166 sq-deg / (3282.90 sq-deg) * 4/3 pi r^3 = 2,619,871,820 (Mpc / h)^3

density = 2.00 *10^-3 galaxies * h^3 * Mpc^(-3)

This is an order of magnitude smaller. Not sure why.... am I perhaps doing the volume calculation incorrectly?

4257 quasars (out to redshift of 1, in the cfht footprint)
density = 1.66e-06 qsos * h^3 * Mpc^(-3)
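
One thing that might explain the discrepancy above: 3282.9 deg^2 is the number of square degrees in a steradian, so dividing by it gives the solid angle in steradians; multiplying that by the full 4/3 pi r^3 then over-counts the volume by a factor of about 4*pi (~12.6). A quick sketch of the check in Python (using the numbers quoted above; a sanity check, not gospel):

import numpy as np

area_deg2 = 166.0                  # survey area [deg^2]
r_max = 2312.67                    # comoving distance to z = 1 [Mpc/h]
n_gal = 5.5254e6                   # galaxies after mask/cuts

full_sky_deg2 = 4.0 * np.pi * (180.0 / np.pi) ** 2      # ~41253 deg^2
volume = (area_deg2 / full_sky_deg2) * (4.0 / 3.0) * np.pi * r_max ** 3
print(volume)            # ~2.1e8 (Mpc/h)^3
print(n_gal / volume)    # ~2.6e-2 galaxies h^3 Mpc^-3, much closer to the LF estimates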

Exchange with Martin White, RE: Estimating Errors

Martin,
I've figured out the density of the galaxies/qsos from both the LFs and the catalogs, and would like to estimate the errors in the correlation function. I understand that the errors go as 1 / sqrt(pair counts) in each bin. But going from galaxy/qso density to pair counts in a bin is where I am a bit lost. I asked Alexie, and she thought that I just take the density and multiply it by the volume of each bin in the correlation function, and then use that number to compute the pair counts. I don't see how that is the same as the pair counts for the correlation function. The correlation function is measuring separation, so how is that the same as the number of pairs in a volume with a side of the separation?


I went ahead and calculated the correlation function with the following bins (degrees):
theta = (1.0000000e-05, 3.1622777e-05, 0.00010000000, 0.00031622777, 0.0010000000, 0.0031622777, 0.010000000, 0.031622777, 0.10000000, 0.31622777, 1.0000000)

I get the following number of dd pairs in each bin:
dd = (589.000, 561.000, 49.0000, 386.000, 4675.00, 41535.0, 394556, 3.81242e+06, 3.60765e+07, 2.94166e+08)

1/sqrt(dd) = (0.0412043, 0.0422200, 0.142857, 0.0508987, 0.0146254, 0.00490674, 0.00159201, 0.000512153, 0.000166490, 5.83048e-05)

The catalogs are not properly masked, so we might get a reduction in dd values by perhaps 30% once we mask the data.

This would result in the following:

dd = (412.300 392.700 34.3000 270.200 3272.50 29074.5
276189. 2.66869e+06 2.52536e+07 2.05916e+08)

1/(sqrt(dd)) = ( 0.0492485 0.0504626 0.170747 0.0608355 0.0174808 0.00586467
0.00190282 0.000612140 0.000198993 6.96875e-05)

I'm currently running the rr, and will have a "correlation function" this afternoon. Although this will of course be wrong, because I'm not masking properly yet. I should get the masks from Alexia today or tomorrow.

Jessica

~~~~~~

Jessica,

I understand that the errors go as 1 / sqrt(pair counts) in each bin. But going from galaxy/qso density to pair counts in a bin is where I am a bit lost.

If you think of the very simplest correlation function estimator that you can write down, xi=DD/RR-1, and imagine that you have so many randoms the fluctuations in RR are negligible then you see that the errors in 1+xi are given by the fluctuations in the counts of DD in a bin. Assuming Poisson statistics, the fractional error in 1+xi goes as 1/sqrt{Npair} where Npair is the number of quasar-galaxy pairs.

For a 3D correlation function the number of data pairs goes as Nqso times Nbar-galaxy times 1+xi times the volume of the bin (in 3D, e.g. 4\pi s^2 ds for a spherical shell). Just think of what the code does: sit on each quasar and count all the galaxies in the bin. To go from a 3D correlation function to a 2D correlation function you need to integrate in the Z direction. But remember that the sum of independent Poisson distributions is also a Poisson with a mean equal to the sum of the means of the contributing parts. So this allows you to figure out what the error on wp is. You should see that as you integrate to very large line-of-sight distance things become noisier. So choose something like +/-50Mpc/h for the width in line-of-sight distance to integrate over in defining wp.

It's a little easier to understand if you write the defining equations out for yourself on a piece of paper.

Martin

~~~~~~~
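To turn Martin's prescription into rough numbers, here is a minimal sketch (the quasar count, galaxy density, and assumed xi below are placeholders, not the real catalog values):

import numpy as np

n_qso = 1000              # number of quasars in the sample (placeholder)
nbar_gal = 2.0e-3         # galaxy number density [h^3 Mpc^-3] (placeholder)
xi = 1.0                  # assumed correlation amplitude in each bin (placeholder)

# log-spaced bins in separation s [Mpc/h], 15 bins as in the runs above
edges = np.logspace(np.log10(0.1), np.log10(10.0), 16)
s = 0.5 * (edges[:-1] + edges[1:])
vol_shell = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)  # spherical-shell volumes

n_pair = n_qso * nbar_gal * (1.0 + xi) * vol_shell   # expected QSO-galaxy pairs per bin
frac_err = 1.0 / np.sqrt(n_pair)                     # Poisson fractional error on 1+xi
for si, fe in zip(s, frac_err):
    print("s = %6.2f Mpc/h  frac err = %.3f" % (si, fe))
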
Project Reading list
http://arxiv.org/abs/0802.2105
https://trac.sdss3.org/wiki/BOSS/quasars/black_hole_masses
http://iopscience.iop.org/0004-637X/665/1/265/pdf/62903.web.pdf
http://articles.adsabs.harvard.edu//full/1989ApJ...343....1D/0000011.000.html
http://adsabs.harvard.edu/abs/2005A%26A...439..863I