Friday, December 25, 2009

Merry Australian Christmas!

Jessica + Adam + Koala send season's greetings from Australia

Monday, December 14, 2009

Traveling

I am traveling until January 16th. I will be posting sporadically. Happy Holidays everyone!

Sunday, December 6, 2009

Single Epoch Likelihood

I've been playing with the single epoch likelihoods, cross-checked against the truth table. The rest of the targeting will be on single epoch measurements (i.e. not Stripe 82), so these are a more accurate measure of how well the likelihood is doing. I've been adjusting various parameters, like how we define "variability" in terms of the chi^2 cut, and whether we should add variable objects from the L_everything catalog to L_QSO.

Below are the raw numbers of quasars targeted (normalized to a targeting density of ~40/degree^2), as well as the accuracy of each method:

("Subtracted" means variable objects with chi^2 above the cut are removed from L_everything; "added" means they are added to L_QSO. Accuracy = quasars targeted / total targeted.)

Method                                                # quasars   Accuracy   Total targeted
No variability                                             1029   0.545986   1881
chi^2 > 1.5 subtracted from L_everything                   1047   0.556619   1881
chi^2 > 1.5 subtracted, chi^2 > 1.5 added to L_QSO         1033   0.549176   1881
chi^2 > 1.1 subtracted from L_everything                   1053   0.559809   1881
chi^2 > 1.2 subtracted from L_everything                   1055   0.560872   1881
chi^2 > 1.3 subtracted from L_everything                   1054   0.560340   1881
chi^2 > 1.4 subtracted from L_everything                   1051   0.558745   1881
chi^2 > 1.2 subtracted, chi^2 > 1.6 added to L_QSO         1036   0.550771   1881
chi^2 > 1.2 subtracted, chi^2 > 1.5 added to L_QSO         1035   0.550239   1881
chi^2 > 1.2 subtracted, chi^2 > 1.4 added to L_QSO         1034   0.549708   1881
chi^2 > 1.2 subtracted, chi^2 > 1.3 added to L_QSO         1029   0.547049   1881

It seems that the method with the best accuracy (56.1%) is subtracting variable objects with chi^2 > 1.2 from L_everything.
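For bookkeeping, the "accuracy" above is just quasars targeted divided by total targeted; here is a minimal numpy sketch of the metric, with hypothetical boolean arrays matched against the truth table:

import numpy as np

def targeting_accuracy(targeted, is_qso):
    # targeted, is_qso: hypothetical boolean arrays, one entry per
    # truth-table object (targeted = passed the likelihood cut,
    # is_qso = human-confirmed quasar).
    n_total = targeted.sum()              # e.g. 1881 at ~40/degree^2
    n_qso = (targeted & is_qso).sum()     # e.g. 1055
    return n_qso / float(n_total)         # 1055/1881 = 0.560872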

Here is a plot of L_everything (non-variable) vs. L_QSO:

Magenta objects are targeted QSOs.
Cyan objects are targeted stars.
Green objects are not targeted stars.
Red objects are missed QSOs.

It looks like we might be able to make the L_ratio cut a function of L_QSO, to perhaps pick up a few more of the red objects in the top-right corner.

Another interesting plot is likelihood-ratio vs. g-band flux:

The color scheme is the same as in the first plot.

We might be able to recover more missed QSOs (red) if we target faint objects (g-flux less than 5) with an L_ratio close to the cut. I've also plotted likelihood-ratio vs. g-r magnitude:

The color scheme is the same as in the first plot.

Again, we might be able to use the fact that the missed QSOs (red) are clustered.

-------------
Cool Tip of The Day:
To download a file directly from the web using the command line:
wget http://www.downloadingurl.com/filetodownload.fits

If it is from a password-protected page (like the wiki), then do the following:
wget --http-user=myusername --http-password=mypassword http://www.downloadingurl.com/filetodownload.fits

Friday, December 4, 2009

Likelihood Optimization

I've been playing with various parameters in the likelihood method, trying to find the most efficient cuts. The three things I have been varying are:

added errors (what to use as the input adderr into likelihood_compute)
everything - variability (what happens when we take away variable objects from the L_everything file)
QSO + variability (what happens when we add the variable objects to the L_QSO likelihoods)

Here are my findings (these are on the co-added fluxes; the next step is to re-run with single epoch). Each number is the fraction of targeted objects that are quasars (based on 40/degree^2 targeting), with base adderr = [0.014, 0.01, 0.01, 0.01, 0.014]:

adderr scale     No variability   Variability everything   Variability everything + QSO
1x (normal)      0.649914         0.656196                 0.651057
5x               0.641348         0.645346                 0.603655
7x               0.627641         0.624786                 0.572244
0x (no errors)   0.644203         0.627070                 0.619075

It looks like the errors we were already running (1x) give the best numbers, combined with the variability cut on everything but not the variability addition to QSO.

I am going to play more with the definitions of variable everything and variable QSO to see if I can get these to work better. I also want to try not cutting at a fixed L_ratio = 0.01, but instead changing the L_ratio cut as a function of L_QSO (it seems we might be able to get a few more objects if the L_ratio cut decreases as L_QSO gets large).
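As a sketch of what an L_QSO-dependent cut could look like (the linear-in-log form, the slope value, and the names here are illustrative guesses, not tuned numbers):

import numpy as np

def lratio_cut(l_qso, base_cut=0.01, slope=0.002):
    # Hypothetical L_ratio threshold that decreases as L_QSO grows;
    # base_cut is the current flat cut of 0.01.
    return np.maximum(base_cut - slope * np.log10(1.0 + l_qso), 0.0)

# Objects would pass when their ratio beats the threshold, e.g.:
# targeted = (l_qso / l_everything) > lratio_cut(l_qso)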

Thursday, December 3, 2009

Variable Likelihood

David Schlegel and I added variability information into the "everything" likelihood training file. This allows us to separate objects that vary (QSOs are included in this list) from objects that don't. This should improve our training by removing quasars from the "everything" file.

The training file now has chi^2 information on the changes in fluxes over repeat observations. The likelihood computation splits L_EVERYTHING into the following reduced-chi^2 bins:

L_EVERYTHING_ARRAY[0] : objs1.flux_clip_rchi2[2] LT 1.1
L_EVERYTHING_ARRAY[1] : objs1.flux_clip_rchi2[2] GE 1.1 AND objs1.flux_clip_rchi2[2] LT 1.2
L_EVERYTHING_ARRAY[2] : objs1.flux_clip_rchi2[2] GE 1.2 AND objs1.flux_clip_rchi2[2] LT 1.3
L_EVERYTHING_ARRAY[3] : objs1.flux_clip_rchi2[2] GE 1.3 AND objs1.flux_clip_rchi2[2] LT 1.4
L_EVERYTHING_ARRAY[4] : objs1.flux_clip_rchi2[2] GE 1.4 AND objs1.flux_clip_rchi2[2] LT 1.5
L_EVERYTHING_ARRAY[5] : objs1.flux_clip_rchi2[2] GE 1.5 AND objs1.flux_clip_rchi2[2] LT 1.6
L_EVERYTHING_ARRAY[6] : objs1.flux_clip_rchi2[2] GE 1.6

Changing L_ratio to be L_QSO / total(L_EVERYTHING_ARRAY[0:5]) (which removes from L_EVERYTHING the contribution of the variable objects, chi^2 GE 1.6), we get more QSOs. From the truth table data, using different chi^2 cuts:

(New QSOs = QNEWSELECT, newly selected quasars; new stars = SNEWSELECT, newly selected non-quasars; the last column is #new_quasars/#new_stars.)

L_ELSE definition (IDL)                      chi^2 cut   new QSOs   new stars   ratio
total(likechi.L_EVERYTHING_ARRAY[0:1],1)     LT 1.2            77         106   0.726415
total(likechi.L_EVERYTHING_ARRAY[0:2],1)     LT 1.3            69          92   0.750000
total(likechi.L_EVERYTHING_ARRAY[0:3],1)     LT 1.4            67          84   0.797619
total(likechi.L_EVERYTHING_ARRAY[0:4],1)     LT 1.5            65          79   0.822785
total(likechi.L_EVERYTHING_ARRAY[0:5],1)     LT 1.6            62          72   0.861111
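To make the re-binned ratio concrete, here is a numpy version of the same bookkeeping (the (n_objects, 7) storage layout is my assumption about how likechi holds the bins):

import numpy as np

def l_ratio_nonvariable(l_qso, l_everything_array, nbins=6):
    # Sum only the non-variable chi^2 bins; nbins=6 keeps bins 0-5,
    # i.e. chi^2 < 1.6, mirroring total(likechi.L_EVERYTHING_ARRAY[0:5],1).
    l_else = l_everything_array[:, :nbins].sum(axis=1)
    return l_qso / l_else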



This is L_EVERYTHING (chi^2 less than 1.4) vs. L_QSO
Green: Not quasars
Magenta: Quasars that the old likelihood method (no variability) targeted
Cyan: Quasars that the likelihood + variability method targets
Red: Quasars we are still missing (not targeting).

On another note, possible Thesis Title: "Putting Galaxies in their Place." Thoughts?

Tuesday, December 1, 2009

SVM Results

I've been playing with using a support vector machine (SVM) for quasar target selection. Adam suggested this because it seems that the likelihood method is very similar to this already-solved computer science problem. He helped me get SVM-light working using R.

The way this method works is that you input variable information for a set of training objects that represent what you are looking for (in this case quasars) and what you are not looking for (everything else).
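In outline, the train/classify loop looks like this. This is a Python/scikit-learn sketch for illustration only; the actual run used SVM-light via R, and the arrays below are random placeholders:

import numpy as np
from sklearn import svm

np.random.seed(0)
qso_fluxes = np.random.normal(size=(1000, 5))    # stand-in for QSO template fluxes
star_fluxes = np.random.normal(size=(1000, 5))   # stand-in for "everything" fluxes

X = np.vstack([qso_fluxes, star_fluxes])
y = np.concatenate([np.ones(1000),               # 1 = quasar
                    np.zeros(1000)])             # 0 = everything else

clf = svm.SVC()    # default RBF kernel
clf.fit(X, y)

# Truth-table objects are then classified the same way:
# predicted_qso = clf.predict(truth_fluxes) == 1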

I trained on the u-g-r-i-z fluxes (the same fluxes used in the likelihood method) from the QSO template and the everything-else template objects (from now on referred to as "everything"). I tell the SVM which are quasars and which are everything. It takes a long time to train the SVM, so I've been taking subsets of the quasar/everything catalogs to make the system faster (the likelihood uses ~1,000,000 objects).

I then take the human-confirmed truth table objects and run them through the SVM to classify as quasar or not. Below are the results:

Using 30,000 and 100,000 training QSOs (with equal numbers of training stars), compared with the likelihood method:

                                    30k SVM     100k SVM   Likelihood
quasars targeted                        700          709          793
non-quasars not targeted                466          480          908
quasars not targeted                    255          246          162
non-quasars targeted                    926          912          484
Accuracy (QSOs / total targeted)  0.4305043    0.4373843     0.620987


I can try adding in more information, like the errors or the likelihood values, as additional vectors in this analysis, but it doesn't seem like the SVM is working as well as the likelihood, or that adding more objects improves accuracy very much (using ~3 times as many objects only increased the accuracy by less than 1%). I could run on larger training sets and see if this makes a difference too.

Monday, November 30, 2009

R Packages on Riemann

I asked Michael Jennings to install R on riemann so I can play with this SVM-light thing for the target selection. He installed R but asked me to install any add-on packages into my home directory. Here is how you do this:
  1. Create a directory where you want the R add-on packages to be stored (i.e. /home/jessica/R/library)
  2. Tell R to look for libraries in this location by adding this line to your .bashrc file: export R_LIBS=/home/jessica/R/library
  3. Source your .bashrc file to update the path changes
  4. Open R
  5. Type the command: install.packages("e1071", lib = '/home/jessica/R/library', repos = "http://cran.r-project.org") where "e1071" is the name of the package you want to install, lib is the directory you created to store the packages, and repos is the location to download the packages from.
  6. Test that it was successful by closing R, re-opening it, and importing the package you just installed: library("e1071")
And now you have successfully installed an R package on Riemann. Thanks to Michael for working so quickly. Now, to try this SVM thing on the quasar targeting...

Here's the code (also in like:svmcode.r)

library("e1071")
library("FITSio")
setwd("/Users/jessica/Documents/workspace/newman/Jessica/likelihood")
qso = readFITS("smallqso.fits")
star = readFITS("smallstar.fits")
x2 <- as.matrix(star$col[[1]],ncol=5)
x1<- as.matrix(qso$col[[1]],ncol=5)
x <- rbind(x1,x2)
y1= seq(1,smallsize)*0+1
y2= seq(1,smallsize)*0

y=c(y1,y2)
fy = factor(y)
m = svm(x,fy)
save(list = ls(), file = "./svmcode.Rdata")


Where smallqso.fits and smallstar.fits are subsets of the QSO/star templates used in the likelihood calculation. In this run I am using 100,001 objects for these small files, but that run has been going for 2+ days so far on my laptop, so I think I'm going to use fewer points (30,000) for this run on riemann. To load this session image, type: load("./svmcode.Rdata")

To run this file as a script, type the following (for some reason the qsub way of running a script isn't working):
R CMD BATCH ./svmcode.r svmcode.out 2> svmcode.err &

Thursday, November 26, 2009

Happy Thanksgiving

Happy Thanksgiving Everyone!

Wednesday, November 25, 2009

Likely Testing

I'm trying to track down why the likelihood method is missing quasars that other targeting methods are finding.

The plot below shows human-confirmed QSOs that were in the target list for the likelihood method. The magenta are targeted by likelihood, the cyan are missed by likelihood, and the white are targeted only by likelihood (click on the image to enlarge):

Color-Color (u-g vs g-r)
of likelihood selected objects (magenta),
likelihood missed objects (cyan),
likelihood only objects (white)

The reason this plot looks different from yesterday's is that I was accidentally plotting both QSOs and stars yesterday. The plot above shows only human-confirmed QSOs.

Below is a log-log plot of the likelihood selection variables L_everything vs. L_qso for the likelihood-selected QSOs (magenta), the likelihood-missed QSOs (cyan), the likelihood-selected stars (red), and the likelihood-missed stars (green). You can see that the value of L_ratio we cut on is the slope of the dividing line between these populations (click on the image to enlarge):


L_everything vs L_qso

I've looked at the individual fluxes of the likelihood-selected/missed QSOs to see if there is an obvious place in flux-color space where we can get more of the missing objects. They are all intermixed.

David Schlegel suggested increasing the 'added errors' in the likelihood calculation to see if this improves things. I am re-running the likelihoods on these ~1400 objects (above) with larger errors. The hope is that, because we have such a sparse sample of the QSOs, increasing the errors will allow objects that are "farther away" in flux space to be selected. The errors are currently on the order of ~1%. I am re-running with 10%; however, David suggests 5%, so I'll run with that too.

Tuesday, November 24, 2009

Commissioning Quasars

I am trying to answer these questions about the likelihood targets in the commissioning data. Today I took the collated targets for the QSO commissioning region from the wiki (here) and used spherematch to align them with the truth table that Nic Ross has provided me (based on visually inspected spectra). For the truth table, see the Nov 20th email from Ross with the subject line BOSS Quasars "Truth Table." Both are also on riemann at:
/home/jessica/boss/BOSS_QSOs_First14plates.fits
/home/jessica/boss/bosstarget-qso-comm-collate.fits
Target bitmasks from the wiki:
# restrictive qso selection
maskbits BOSS_TARGET1 10 QSO_CORE
# permissive qso selection
maskbits BOSS_TARGET1 11 QSO_BONUS
# known qso between [2.2,9.99]
maskbits BOSS_TARGET1 12 QSO_KNOWN_MIDZ
# known qso outside of midz range. never target
maskbits BOSS_TARGET1 13 QSO_KNOWN_LOHIZ
# Neural Net that match to sweeps/pass cuts
maskbits BOSS_TARGET1 14 QSO_NN
# UKIDSS stars that match sweeps/pass flag cuts
maskbits BOSS_TARGET1 15 QSO_UKIDSS
# kde targets from the stripe82 coadd
maskbits BOSS_TARGET1 16 QSO_KDE_COADD
# likelihood method
maskbits BOSS_TARGET1 17 QSO_LIKE
These are bitmasks, so in IDL, to select items that are selected only by the likelihood method, we would do the following:
LIKEONLY = where(truthmatch.BOSS_TARGET1 EQ ishft(long(1),17))

To select items which are selected by the likelihood (and others):

LIKE = where((truthmatch.BOSS_TARGET1 AND ishft(long(1),17)) GT 0)


To select items which are not selected by the likelihood (selected by others):

NOTLIKE = where((truthmatch.BOSS_TARGET1 AND ishft(long(1),17)) EQ 0)

To select objects with high redshift:

HIGHZ = where(truthmatch.Z_PERSON GT 3)
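The same selections in Python/numpy, for comparison (the two arrays are placeholders standing in for columns of the matched truth table):

import numpy as np

boss_target1 = np.array([1 << 17, (1 << 17) | (1 << 14), 1 << 10])
z_person = np.array([2.5, 3.2, 1.1])

QSO_LIKE = 1 << 17    # likelihood-method bit from the wiki table above

like_only = boss_target1 == QSO_LIKE          # only the likelihood selected it
like = (boss_target1 & QSO_LIKE) != 0         # likelihood (possibly others too)
not_like = (boss_target1 & QSO_LIKE) == 0     # selected by other methods only
high_z = z_person > 3                         # high-redshift objects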


Below is a color-color plot of the above selections on top of the templates used for the likelihood selection. Red is everything. Green is QSOs. Magenta are objects that the likelihood method selected (LIKE). Cyan are objects that only the likelihood method selected (LIKEONLY). White are objects that have a high redshift (HIGHZ):

Color-Color (u-g vs g-r)
of likelihood selected objects

Below are objects that are QSOs but were missed by the likelihood (NOTLIKE). Need to figure out why.


Color-Color (u-g vs g-r)
of not-likelihood QSO objects

Monday, November 23, 2009

Revisiting Likelihood

Nick, David and Adam want me to check that the single epoch likelihood computation is correct. I also need to address some long-standing questions about the likelihood method including:
1. Why are we getting z > 3 when we aren't targeting these objects?
2. What is the proper normalization of the likelihood objects?
And several other questions posted a while back and never answered.

The likelihood code is at the following location on the repository:
https://trac.sdss3.org/browser/repo/boss/bosstarget/trunk/pro/qso-like/

I've also downloaded it here on riemann:
/home/jessica/boss/bosstarget/trunk/pro/qso-like

The varcat files are here (on riemann):
/home/schlegel/varcat4

The star/qso template files are here on the repository:
https://trac.sdss3.org/browser/repo/boss/bosstarget/trunk/data

I've also downloaded it here on riemann:
/home/jessica/boss/bosstarget/trunk/data

Myers sent me a fits file that has the single epoch targets with the likelihoods he computed. He wants me to check that they look reasonable. The file is in an email sent Nov 18/19 from Adam Myers with the subject line: Likelihood Method.

Below is an email I sent to Myers today regarding this check:

-------------------
From: Jessica Ann Kirkpatrick
Date: Mon, Nov 23, 2009 at 4:15 PM
Subject: Re: Likelihood method
To: Nic Ross
Cc: Joseph Hennawi , Adam Myers, David Schlegel

Adam,
when I compute the likelihoods I get the same values as you do for
like_ratio, like_everything, like_QSO_Z (well, within expected
rounding errors).

One thing that I had a question about is how like_rank is numbered. I
was assuming that like_rank = 0 would be the target with the largest
like_ratio, but it seems it is the reverse (like_rank = 0 is the
lowest like_ratio object).

So we are sorting target selection by largest ranking to smallest?

I've attached a color-color (ug, gr) plot of the 10,000 highest ranked
likelihood objects (cyan) on top of "everything" objects (red) and
qsos (green). You can see that these targets fall in the quasar zone.


Color-color (ug, gr) plot of the 10,000 highest-ranked likelihood objects (cyan) on top of "everything" objects (red) and QSOs (green).

Useful Mac Tips of the Day (via Demitri):
If you want to ssh into your Mac, go to
  • System Preferences -> Sharing -> Remote Login
  • Turn Remote Login on. Then find out your IP address by visiting the following in a web browser: http://whatismyipaddress.com/
You can then ssh/scp into your Mac in the normal way using jessica@myipaddress!

Also... Alexia will be happy to know that I finally implemented different terminal colors for different computers that I am logged into (so much less confusing):


Top terminal window (green) is riemann,
lower terminal window (black) is my Macbook.

And... Demitri showed me how to turn any mp3 into a ring tone using Garage Band. What a great day for cool tricks! Now my alarm clock will be the beginning of James Brown's Sex Machine: "Get-up, get-on-up. Get-up, get-on-up."

Friday, November 20, 2009

Mock Reconstruction

I ran a full reconstruction on the mock data over the past couple of days. There are some problems with this run (which I discovered after I started it running, but I decided to just let it finish). The first is that I have not fixed the distribution of the randoms in the 2D correlation function to go as cosine in declination (they are flat). The second is that I discovered the mock catalog I was using was not dark matter halos but actually LRGs, and so the clustering properties are nonsensical. I have since gotten a dark matter halo catalog from Martin White.

Here are the 2D and 3D correlation functions for the various redshift bins:


2D correlation functions for redshift bins


3D correlation functions for redshift bins

As you can see, the 3D correlation functions are no longer fluctuating around zero as they were on the Sloan data.

The reconstruction is a bit puzzling. Below is a histogram of the comoving distances (from the center of the sphere) of the spectroscopic (yellow) and photometric (green) data sets, as well as the reconstructed phi (the redshift distribution, pink). You will notice that while the histograms don't match, the bumps in the photometric/spectroscopic histograms seem to match the bumps in phi. I think this is because the photometric/spectroscopic histograms include both the geometry and the "selection function," so to speak (I didn't actually apply a selection function, so it is just the natural clustering of the objects), whereas phi (I think) is only supposed to be the selection function, not the geometry. I was wondering whether, if I subtract the geometry (which goes as r^2) from the photometric/spectroscopic histograms, we would basically be left with these same wiggles. I've asked Alexia about this and am waiting for a response.
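One quick check along these lines: the geometry enters the histograms multiplicatively rather than additively (for thin shells the counts go as phi(r) * r^2 dr), so dividing the binned counts by r^2 should leave something proportional to the selection function alone. A sketch, assuming dist holds the comoving distances of one sample:

import numpy as np

def geometry_corrected_hist(dist, nbins=50):
    # Remove the r^2 shell-volume factor from a comoving-distance
    # histogram; what remains should track the selection function.
    counts, edges = np.histogram(dist, bins=nbins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, counts / centers**2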


Thursday, November 19, 2009

Catalog Woes

It turns out my mock catalog is probably not what I want to be using. I talked to Alexia and Martin White; I am using an LRG catalog, but I want to be using a halo catalog. This probably means that my clustering properties are nonsensical (because I am generating galaxies based on galaxies, not on halos). Martin said he is going to give me a better catalog to use.

In the meantime I am running on the catalog I have. The reconstruction should work as long as both the photometric and spectroscopic data sets are generated in the same way... even if they don't accurately represent any real distribution on the sky.

The reconstruction is currently running, and I should have it done later today.

I also checked my 3D slice against Alexia's to make sure they match, and they do:

My and Alexia's 3D Correlation functions on
a redshift slice of the mock data

Wednesday, November 18, 2009

Running Scripts (Update)

I want to utilize the fact that I can run the correlation function on multiple nodes of riemann. Eric helped me learn to use qsub to run a script on multiple nodes. I've got it working as follows:

In the file on riemann 091118jess.script there is the following:

#PBS -l nodes=1
#PBS -N jessjob1
#PBS -j oe
#PBS -M kirkpatrick@berkeley.edu
#PBS -V
cd /home/jessica/repository/ccpzcalib/Jessica/pythonJess
./runcode.script


where runcode.script contains the following:
python runReconstructionMock.py

I make both runcode.script and 091118jess.script executable, and then type in the terminal:
qsub 091118jess.script

I then exit from riemann, making sure not to close it using the x button (not sure if this makes a difference).

When I log back in, the code is still running (I check using qstat).

Now I need to integrate this into the python code so that qsubs are generated for each correlation function for each redshift slice. This should make things go a lot faster.
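A minimal sketch of what that integration could look like (the --zbin argument and the slice count are assumptions; runReconstructionMock.py would need some way to be told which slice to work on):

import os
import subprocess

def submit_slice(zbin):
    # Write a PBS script for one redshift slice and hand it to qsub.
    script = "jessjob_z%d.script" % zbin
    with open(script, "w") as f:
        f.write("#PBS -l nodes=1\n")
        f.write("#PBS -N jessjob%d\n" % zbin)
        f.write("#PBS -j oe\n")
        f.write("#PBS -V\n")
        f.write("cd /home/jessica/repository/ccpzcalib/Jessica/pythonJess\n")
        f.write("python runReconstructionMock.py --zbin %d\n" % zbin)
    os.chmod(script, 0o755)
    subprocess.call(["qsub", script])

for zbin in range(10):    # one qsub job per redshift slice
    submit_slice(zbin)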

The email from Eric:
----------
From: Eric Huff
Date: Wed, Nov 18, 2009 at 3:06 PM
Subject: qsub
To: Jessica Kirkpatrick

Here's a script example:

#PBS -l nodes=1
#PBS -l walltime=24:00:00
#PBS -m bea
#PBS -M emhuff@berkeley.edu
#PBS -V
ulimit -s unlimited
export PHOTO_DATA=/clusterfs/riemann/raid006/bosswork/groups/boss/photo/data
export PHOTO_REDUX=/clusterfs/riemann/raid006/bosswork/groups/boss/photo/redux
cd /home/emhuff/Coadds/Code/Coadd_Data_Root/Scripts
echo `hostname`
/home/emhuff/Coadds/Code/Coadd_Data_Root/Scripts/r.out 380720 6 0 00

And here's some online documentation for qsub:

http://www.doesciencegrid.org/public/pbs/qsub.html
http://njal.physics.sunysb.edu/tutorial-1.html

I'm running the full reconstruction on the mock data; I might have fully working code tomorrow. Woo hoo!

Degrees vs. Radians

It turns out the "List length exceeded MaxList" error was due to the randoms not falling in the same region as the data. I had forgotten to convert my data from radians to degrees in Python (it was being converted in the C program), and so the C-converted randoms (in x, y, z) did not match the Python-converted data (in x, y, z).
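The moral is that both code paths have to agree on units before the spherical-to-Cartesian step. A sketch of the conversion (names are illustrative, and it assumes inputs in degrees):

import numpy as np

def sky_to_cartesian(ra_deg, dec_deg, r):
    # ra/dec in degrees, r a comoving distance; this mirrors the
    # conversion the C program does internally for the randoms.
    ra, dec = np.radians(ra_deg), np.radians(dec_deg)
    x = r * np.cos(dec) * np.cos(ra)
    y = r * np.cos(dec) * np.sin(ra)
    z = r * np.sin(dec)
    return x, y, z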

I've fixed this, and get the following 3D correlation functions for my redshift slices of the Mock data (and they look pretty darn good)!


My 3D Correlation function on Mock Data

This makes me want to make the bold statement that all the problems I was having with my 3D correlation functions on the Sloan data were a problem with the data (not my code)... but I need to check this again, because the 2D correlation functions looked OK on the Sloan data. This could be an issue of the photo-z's being completely wrong (David keeps saying he doesn't trust the CAS), so the 2D correlation function (which only uses ra and dec) is fine, but once we start using the photo-z's things get messed up.

Another thought I had: I still need to fix the declination random-cosine population issue in the 2D correlation function; currently the declinations are being populated using a flat distribution.

Oh today is a good day indeed!

Énergie Noire

I think from now on I am going to refer to dark energy by its French name. Énergie Noire just sounds so much more exotic.

There is a BigBOSS meeting for the rest of the week at LBNL, so I don't know how much I'll get done research-wise, but there is also a looming target selection deadline, so I need to do some work on that too.

Tuesday, November 17, 2009

Exceeded Maxlist

I am getting a strange error when I run the 3D Correlation function. The error is from Martin White's meshing algorithm (nearmesh):

"List length exceeded MaxList=234700"

I think it happens when a point is near any mesh within the grid, but I am not sure exactly; I didn't write this section of the code, and I've never gotten this error before.

I wrote Alexia about it, and I am hoping she'll have some experience with this problem.

Below is the email I sent her today:

-------
From: Jessica Ann Kirkpatrick
Date: Tue, Nov 17, 2009 at 11:24 PM
Subject: halo files and numbers
To: Alexia Schulz

Alexia,
the file I am using is from here: http://mwhite.berkeley.edu/Mock/

It was under a section that used to be there called LRG but now seems to be gone. See this blog entry.

The number of halos in the file is 102496.

When I run halolist with mmin = 5e10 I get 3811293 galaxies (for photometric data set)
When I run halolist with mmin = 5e11 I get 378932 galaxies (for spectroscopic data set)

output when I run halolist is below:

In [324]: #make galaxies
In [325]: mmin=5.e10 #for gal
In [326]: h=gals.halolist(mockLRGfile,mmin)
1.30144190788 seconds wall time to readin
1.3 seconds processor time to readin
33.0253970623 seconds wall time
30.84 seconds processor time

In [327]: g=gals.galist(h,nbins,outfile=photofile)
4.70264683703
halo number is 102496
number of halos with 1 or more gals 102496
176.569406033 seconds wall time
168.58 seconds processor time

In [328]: mmin=5.e11 #for spec
In [329]: h=gals.halolist(mockLRGfile,mmin)
10.7454509735 seconds wall time to readin
1.66 seconds processor time to readin
37.9757530689 seconds wall time
27.46 seconds processor time

In [330]: g=gals.galist(h,nbins,outfile=specfile)
4.70264683703
halo number is 102496
number of halos with 1 or more gals 101718
27.7532730103 seconds wall time
26.72 seconds processor time

I'm getting the error:
List length exceeded MaxList=234700
on one of my runs, has this ever happened to you?

It seems to be coming from deep in the calculation in the function meshnear.

I am getting this error when I try to run on redshift slices which are further away (and thus have higher statistics).

Tomorrow the BigBOSS meeting has a break from 12-1. Can I call you sometime in that window?
Cheers,
Jessica

--------

I thought perhaps part of the problem is that the randoms are not overlapping with the data (and so the nearmesh would mess up because the randoms are not in the same region as the data).

And this seems to be part of the problem:





Need to check these to see where the problem is. I think there might be a degrees vs radians issue. I'll look at this tomorrow.

Monday, November 16, 2009

Input files

I've decided to use input files for the arguments to my correlation functions (for purposes of documentation and also to make debugging easier). I used to do this with DRIFT data analysis, and I liked it a lot.

The main problem is that when I debug using Xcode, you have to type in the arguments by hand. This is annoying because the masks change every time I run a different slice. This isn't an issue when I am not debugging, because I simply have python feed the arguments into the C program (which does the correlation).

Essentially, I have changed the correlation function code so that its only input parameter is a file name, and that file holds all the arguments. I think this should also help with replicating old runs, as well as with general organization.
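A sketch of the pattern (the file format, the names, and the correlate binary are illustrative, not the actual code):

def write_args_file(filename, args):
    # One "key value" pair per line for the C correlation code to parse.
    with open(filename, "w") as f:
        for key, value in args.items():
            f.write("%s %s\n" % (key, value))

write_args_file("091116run1.args", {
    "datafile": "specslice1.dat",
    "randomfile": "randslice1.dat",
    "mask": 0.85,
    "nbins": 40,
})
# The C program is then invoked with a single argument:
#   ./correlate 091116run1.args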

Friday, November 6, 2009

Mock Slice

The next step in trying to figure out what is wrong with my 3D correlation functions is to try to do a full reconstruction on the mock data. I'm starting by taking a redshift slice of the mock data and calculating a 3D correlation function on the slice to see if I get a logical result. This appears to be working:


Matching 3D correlation functions

To be more explicit about what I am doing to create the above: I am first taking the mock data box and selecting a "redshift slice" of the data.
For my correlation function:



Redshift slice (green) of all data points in
comoving distance range of 350-450 Mpc/h

What this looks like in Cartesian coordinates:



It is hard to see in 2D projections,
but this is essentially a thick spherical shell with inner radius
350 Mpc/h and outer radius 450 Mpc/h

For Alexia's correlation function I am taking a slice in the z dimension:



Redshift slice (green) for Alexia's data

I chose the slice such that both mine and Alexia's data subsets had approximately the same number of objects (~100,000).

The distributions of the randoms continue to match the data:
For my correlation function:








Number of randoms (blue) and data points (red)
for different dimensions (ra, dec, redshift, x, y, z)

This gives me the confidence to try the full reconstruction on this mock data set, as that would simply be repeating the above on several redshift bins.

I seem to be running Alexia's and my correlation functions side by side a lot to compare the results, so I have made the following scripts:

chmod +x *.script
./091105jess.script > 091105jess.out 2> 091105jess.err &
./091105alexia.script > 091105alexia.out 2> 091105alexia.err &

Cool link of the day:
Now my parents can understand xkcd: http://www.explainxkcd.com/

Update (from Kevin):
As far as http://www.xkcd.com is concerned,
http://www.explainxkcd.com is useful,
but http://xkcdexplained.com/ is funny,
and http://xkcdexplainedexplained.tumblr.com/ exists.

Thursday, November 5, 2009

Data Fixed?

I wanted to see if the changes I made to the declination distribution of the randoms fixed the problem with the Sloan data 3D correlation function. It appears they have not.

The correlation function is going negative at ~5 Mpc/h:




As per Nic Ross's advice I am printing out the individual elements of the correlation function:

bin dd dr rd rr
0 317762.000000 216879.000004 217092.400004 226377.000004
1 1438272.000000 1190313.799866 1189213.999867 1244237.199854
2 2708480.000000 2477950.600010 2481026.000013 2551443.400078
3 3778250.000000 3627401.601080 3635341.801088 3668580.801119
4 4493904.000000 4437785.801835 4441650.801839 4424500.401823
5 4827204.000000 4858547.602227 4850951.402220 4790191.802163
6 4832844.000000 4926666.002291 4915620.202280 4845375.602215
7 4632790.000000 4742744.802119 4739360.602116 4702071.602081
8 4356652.000000 4461201.001857 4459481.601855 4483517.001878
9 4157056.000000 4233592.001645 4233477.601645 4276671.401685
10 4014424.000000 4070859.001493 4070540.001493 4105703.401526
11 3903134.000000 3944458.201376 3945670.001377 3970457.201400
12 3811114.000000 3847011.401285 3849257.601287 3867510.001304
13 3745984.000000 3769143.601212 3771384.201215 3788595.401231
14 3684692.000000 3707139.601155 3706282.401154 3723841.201170
15 3632222.000000 3651903.001103 3652026.201103 3667982.201118
16 3582504.000000 3600661.601056 3600676.801056 3617006.401071
17 3541194.000000 3554850.401013 3556219.601014 3567875.401025
18 3505996.000000 3511868.800973 3511927.400973 3523757.600984
19 3464186.000000 3468078.800932 3468489.800932 3482728.600946
20 3418956.000000 3425039.800892 3424701.000892 3442556.200908
21 3376304.000000 3386133.600856 3386409.600856 3406330.600875
22 3348124.000000 3351585.600824 3351850.400824 3371267.600842
23 3312086.000000 3319740.000794 3317782.400792 3335477.000809
24 3278338.000000 3285228.400762 3283852.800761 3299746.600775
25 3237588.000000 3252288.800731 3251628.800731 3266714.800745
26 3207848.000000 3220719.400702 3217525.800699 3234663.000715
27 3184346.000000 3189389.600673 3187092.600670 3199858.800682
28 3154432.000000 3160154.200645 3157859.200643 3168571.000653
29 3127406.000000 3131088.200618 3127228.000615 3137800.800625
30 3093844.000000 3098019.600587 3096805.400586 3105624.200595
31 3064538.000000 3064802.600557 3064300.800556 3072282.600563
32 3036138.000000 3035210.800529 3032522.400526 3040139.800534
33 2999470.000000 3000089.200496 3000837.800497 3008239.000504
34 2962448.000000 2968977.200467 2970271.600468 2979345.000477
35 2927012.000000 2934725.000435 2935492.800436 2946856.400447
36 2901310.000000 2903968.800407 2902961.000406 2913088.800415
37 2863636.000000 2869294.400374 2869514.600375 2881526.600386
38 2821798.000000 2833074.200341 2834320.800342 2847371.800354
39 2796586.000000 2803686.200313 2803161.600313 2813829.200323


Because dr and rd are subtracted from rr and dd in the calculation of the correlation function, it is possible for the correlation function to go negative if the objects are more correlated with the random data than with themselves. This seems very strange to me, but since we don't trust the photo-z's from the data set I am using, maybe that is causing the problem?
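For concreteness, assuming the standard Landy-Szalay form xi = (dd - dr - rd + rr) / rr and a hypothetical text dump of the table above:

import numpy as np

# Columns as printed: bin, dd, dr, rd, rr (they look normalized already).
data = np.loadtxt("xi_components.txt")
dd, dr, rd, rr = data[:, 1], data[:, 2], data[:, 3], data[:, 4]

xi = (dd - dr - rd + rr) / rr

# Bin 0 comes out around +0.49, while bin 5 is already slightly
# negative, consistent with the sign flip at ~5 Mpc/h.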

I think the next step is to run the reconstruction purely on the mock data to see if that works; this should isolate whether this is a bad-data issue or a bad-code issue.

Wednesday, November 4, 2009

Populating the Sphere

I am trying to populate the randoms in the same way that the data is populated over the sphere. This means that in right ascension I populate the randoms with a uniform distribution over the range of the data values. However, for declination I populate using a cosine function (because we want the density per volume element to be constant). I am doing the following:
oo = 1;
while(oo){
    thisdec = drand48()*mask;  /* choose a declination inside the mask */
    theta = drand48();         /* choose a random value for sin(thisdec) */
    /* If theta is "below the sin(thisdec) curve", keep thisdec;
       otherwise repeat the process. */
    if(theta < sin(thisdec)){
        Rb[nn].pos[i] = thisdec;
        oo = 0;
    }
}
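As a cross-check on the loop above, the same distribution can be drawn without rejection by inverting the CDF. A sketch, with declination measured from the equator (the sin(thisdec) in the C loop corresponds to measuring the angle from the pole instead):

import numpy as np

def sample_dec(n):
    # For p(dec) proportional to cos(dec) on [-pi/2, pi/2], the CDF is
    # (sin(dec) + 1)/2, so dec = arcsin(2u - 1) for uniform u in [0, 1).
    u = np.random.uniform(size=n)
    return np.arcsin(2.0 * u - 1.0)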

The C loop uses a rejection sampling method to populate the declinations. It seems to work; here are plots of the random distributions compared with the data, and you can see that the distributions match:







Matching distributions of data (red) and randoms (blue)

Just to make sure that the data still falls in the same regions as the randoms, here are more plots:







The data (red) falls in the same physical region as the randoms (blue)

Now my correlation function matches Alexia's again (as opposed to my October 30th post):


Matching 3D correlation functions!

Tuesday, November 3, 2009

Stripe 82 Histograms

Here are histograms of the randoms versus the data for Stripe 82. As you can see, the randoms seem to be matching the data pretty well:




Cartesian





Spherical Coordinates

Therefore I don't think this is what is causing the problem in my calculation on the data. I coded up the correlation function to print out its components, as per Nic's suggestion (see yesterday's post). Hopefully this will help me isolate the problem.