Thursday, January 28, 2010

Working Data Reconstruction!

I ran the SDSS CAS data through our working pipeline. As a first go I've run on a small data set, the same size I was using for the reconstructions with the mock data: 20,000 points in each of the photometric and spectroscopic data sets. The difference here is that both data sets come from the same superset of data (I am randomly picking points from a larger set of Stripe-82 data), so the redshift distributions of the two sets are the same (see plot below). This, in theory, should make the reconstruction easier to do.

Redshift Distributions of photometric and spectroscopic data sets
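
For the record, the sampling described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline code: the function name draw_disjoint_samples is hypothetical, the parent catalog here is a made-up array of redshifts rather than real Stripe-82 data, and drawing the two sets without overlap is an assumption on my part.

```python
import numpy as np

def draw_disjoint_samples(parent_redshifts, n_per_set, seed=0):
    """Randomly draw two non-overlapping subsets from one parent catalog.

    Hypothetical helper: assumes the 'photometric' and 'spectroscopic'
    sets should not share objects.
    """
    rng = np.random.default_rng(seed)
    # Pick 2 * n_per_set distinct indices, then split them in half.
    idx = rng.choice(len(parent_redshifts), size=2 * n_per_set, replace=False)
    photo_idx, spec_idx = idx[:n_per_set], idx[n_per_set:]
    return parent_redshifts[photo_idx], parent_redshifts[spec_idx]

# Stand-in parent catalog (synthetic redshifts, NOT real data).
parent = np.random.default_rng(42).gamma(shape=2.0, scale=0.2, size=200_000)
photo_z, spec_z = draw_disjoint_samples(parent, 20_000)

# Both subsets come from the same parent, so their normalized redshift
# histograms should agree up to sampling noise.
bins = np.linspace(0.0, 2.0, 21)
h_photo, _ = np.histogram(photo_z, bins=bins, density=True)
h_spec, _ = np.histogram(spec_z, bins=bins, density=True)
print("largest histogram difference:", np.abs(h_photo - h_spec).max())
```

Because both subsets are drawn from the same parent, any remaining difference between the two histograms is just sampling noise.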

The Bad News
The correlation functions look really noisy. This is hopefully because I am running with 20,000 points instead of the 1,000,000 points I used before, in which case the noise should clear up when I re-run with a larger data set. I wouldn't expect the correlation functions to look as smooth on a smaller data set as they do with the mock data, because the mock data, by design, traces dark matter populations explicitly, so the dark matter signal is put in by hand. There is also more noise in this data than in the mock data. It makes sense to me that we might need higher statistics to get the same strength of signal as we did in the mock data.
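
As a rough sanity check on the "more points, less noise" hope, here is a back-of-the-envelope estimate. It assumes the noise in each separation bin is dominated by Poisson (shot) noise on the pair counts, which I have not verified for this data; the function name is just for illustration. Under that assumption the fractional error in a bin goes as one over the square root of the number of pairs, and the number of pairs grows as the square of the number of points.

```python
import math

def poisson_noise_ratio(n_small, n_large):
    """Ratio of fractional Poisson errors on pair counts: small sample / large sample.

    Assumes shot-noise-dominated bins and the same binning for both samples.
    """
    pairs_small = n_small * (n_small - 1) / 2  # number of distinct pairs
    pairs_large = n_large * (n_large - 1) / 2
    # Fractional error scales as 1 / sqrt(pairs), so the ratio is sqrt(large / small).
    return math.sqrt(pairs_large / pairs_small)

ratio = poisson_noise_ratio(20_000, 1_000_000)
print(f"expected noise reduction factor: {ratio:.1f}")
```

So if shot noise is the main problem, going from 20,000 to 1,000,000 points should cut the fractional noise per bin by a factor of roughly 50. Cosmic variance and systematics would not improve this way.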

The Good News
Even with the noisy correlation functions, the reconstruction doesn't look too bad. There seems to be a problem at the edges (it seems to be forcing the reconstruction to be zero at redshift zero, and thus changing the shape at the low end), but overall it looks pretty good (at least much better than it has ever looked before). Thoughts, Alexia?

I've chosen the binning that looks the best. As with the mock data, the reconstruction changes a lot depending on the binning. However, it looks like I might actually have something working here! Woo hoo!

Next steps
Higher Statistics: I'm not sure if I should run on Eyal's LRGs or on a bigger set of CAS data. The CAS data is easier to do right away, so I might as well set that going (it will take a couple of days), and in the meantime I can work on getting my code to read in Eyal's randoms instead of generating its own. I will talk to Schlegel about this and see what he thinks.

Other Catalogs: According to Eric Huff, Reiko Nakajima has an SDSS catalog with her own calculated photo-z's that she uses for lensing. I will talk to her about this. I should also get in touch with Antonio Vazquez about catalogs.

Optimization: There are quite a few things I can do to make this code run faster. I need to spend some time writing scripts to calculate all the correlation functions simultaneously (right now they are calculated in succession). This should speed up the code by a factor of ten. I should also write scripts so that I can just set a run going and then log out of the terminal window.
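
One possible shape for the "calculate them simultaneously" step, as a sketch only. It uses Python's standard concurrent.futures module; pair_count is a toy brute-force stand-in for a real correlation-function job, and both function names and the job layout (two position arrays plus a maximum separation) are hypothetical, not what the pipeline currently does.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

import numpy as np

def pair_count(job):
    """Toy stand-in for one correlation-function job.

    Counts cross pairs between two point sets that are closer than r_max.
    The real pipeline calculation would replace this body.
    """
    a, b, r_max = job
    # Brute-force pairwise distances; fine for a sketch, too slow for real catalogs.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return int((d < r_max).sum())

def run_jobs(jobs, worker=pair_count, max_workers=4, executor_cls=ProcessPoolExecutor):
    """Run independent jobs concurrently and return results in job order."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(worker, jobs))
```

Each job is independent, so results come back in the same order as the jobs went in. With the default process pool, run_jobs(jobs) should be called from a script under an if __name__ == "__main__": guard; passing executor_cls=ThreadPoolExecutor is a lighter alternative when the work is mostly inside NumPy. The actual speed-up depends on the number of cores and how evenly the jobs are sized. To keep a run going after logging out, a script like this could be launched with nohup.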

Organizational Note
For a while now I've been creating a log file for each day with the code I use to generate the plots for that day's blog post. I've been keeping these log files locally on my computer, but decided it might be useful for collaborators to have copies of them, and also to have a backup should something happen to my laptop. I've now added these logs to the repository under: repositoryDir/Jessica/pythonJess/logs. The name of each log file is the date of the blog post that the code corresponds to.
