Monday, February 28, 2011

BOSS Mask Problem?

Below is an email I sent to Martin White today after Shirley, Nic and David couldn't figure out what I was doing wrong with my masks/randoms:

~~~~~~~~
Martin,
Back in December you generated a polygon mask for me for the BOSS data. I've been using this mask + BOSS data for my reconstruction project with Alexia and have gotten puzzling results when I compute correlation functions on the data.

This prompted me to look more closely at the way I was applying the mask to make sure I wasn't making a stupid mistake.

The BOSS data inside the mask (with weights greater than 0) looks very patchy up close, and both Ross and Schlegel think it shouldn't look like this.

I was wondering if you could look at the attached plots and see if you think the mask is being applied correctly.

MaskedBOSSDataAll.png shows the entire footprint. The blue points are galaxies in the spAll file that were outside the mask or had a weight of 0. The red points are galaxies in the spAll file that were inside the mask and had a weight greater than 0.


MaskedBOSSDataClose.png shows a zoom into the region (35 < Dec < 55, 110 < RA < 130). The color scheme is the same as above. This shows the strange "patchy" nature of the masked data even in regions which seem to have been observed. I tried to use the same ra/dec range as Fig 3 from your paper (http://arxiv.org/PS_cache/arxiv/pdf/1010/1010.4915v2.pdf) for easy comparison.


One difference between my data and the data in your plot is that the weights (completeness) of the red objects in my plots range from [0.75, 1.0], whereas in your paper's plot they seem to range from [0, 1.0]. Perhaps that is how you defined what is inside the mask: any region with spectroscopic completeness greater than 0.75?

If you have any insight into what I am doing wrong, that would be really helpful. Below is the IDL code I am using to apply the masks and generate the attached plots.

Thank you,
Jessica

;Here is the code I used to apply the mask:
;Read in BOSS data
bossgalaxies = '/clusterfs/riemann/raid001/jessica/boss/spAll-v5_4_14.fits'
spall = mrdfits(bossgalaxies, 1)

;Trim to galaxy targets
isgal = where(spall.specprimary EQ 1 $
AND (spall.boss_target1 AND 2L^0+2L^1+2L^2+2L^3+2L^7) NE 0 $
AND strmatch(spall.class,'GALAXY*') $
AND spall.zwarning EQ 0)

;Select galaxies
galaxies = spall[isgal]
ra = galaxies.ra
dec = galaxies.dec
z = galaxies.z
spall = 0

bossmask = "/clusterfs/riemann/raid001/jessica/boss/bossX.002.ply"
read_mangle_polygons, bossmask, polygons, id
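;is_in_window returns 1 for each object that falls inside the mask and 0 otherwise;
;the in_polygon keyword returns the index of the polygon containing each object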
results = is_in_window(ra=ra,dec=dec,polygons,in_polygon=polynum)
inBossMask = where(results GT 0)
polys = polynum[inBossMask]
bossWeight = polygons[polys].weight
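;bossWeight is the mask weight (completeness) of the polygon containing each in-mask galaxy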
;Cut the RA/Dec/z to objects inside the mask
sra = ra[inBossMask]
sdec = dec[inBossMask]
sz = z[inBossMask]

;Here is the code I used to make the attached plots
;Plot Data that is inside the mask
window,xsize=700,ysize=600
xobject = sra
yobject = sdec
xtit = 'RA'
ytit = 'Dec'
mtit = 'SDSS BOSS Data w/ and w/o Mask'
plot, xobject, yobject, psym=3, symsize=2, xrange = [130,110], yrange=[35,55], XTITLE = xtit, YTITLE = ytit, TITLE = mtit, charsize = 2, charthick = 1, thick = 2, xthick=2, ythick=2
oplot, ra, dec, ps=3, color=fsc_color('blue')
thisweight = where(bossWeight GT 0)
oplot, xobject[thisWeight], yobject[thisWeight], ps=3, color=fsc_color('red')


plot, xobject, yobject, psym=3, symsize=2, xrange = [0,360], yrange=[0,60], XTITLE = xtit, YTITLE = ytit, TITLE = mtit, charsize = 2, charthick = 1, thick = 2, xthick=2, ythick=2
oplot, ra, dec, ps=3, color=fsc_color('blue')
thisweight = where(bossWeight GT 0)
oplot, xobject[thisWeight], yobject[thisWeight], ps=3, color=fsc_color('red')
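A quick diagnostic along these lines, using the variables defined above (just a sketch of the check I have in mind, not something I have run yet):

;Look at the range of polygon weights for the matched galaxies, to test the
;completeness-threshold guess above
print, min(bossWeight), max(bossWeight)    ;per my plots this should span [0.75, 1.0]
dum = where(bossWeight EQ 0, nzero)        ;in-window galaxies in zero-weight polygons
dum = where(bossWeight GT 0, npos)         ;in-window galaxies in positive-weight polygons
print, nzero, npos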

Thursday, February 24, 2011

Figuring out Mask Problems

I re-made the random catalogs without removing the duplicates. David thought that perhaps this was causing problems with the randoms not matching the catalogs or the correlation functions not working properly.

Below are a bunch of histograms of the distribution of the data (white) and randoms (green) for the two methods I have tried for making randoms: mine and Shirley's. You'll notice that the spectroscopic randoms mismatch the spectroscopic data much more than the photometric data/randoms do (for both methods). This makes me think there may be a problem with the mask I am using for the spectroscopic set. Martin White made this mask for me. I've asked for a meeting with David this afternoon to look at this in more detail.

My randoms



Shirley's Randoms



I've tried zooming in on regions of the data where the mismatch is greatest, for instance (dec > 40, 110 < RA < 130). I've posted these on the blog too.

It does look like there are regions where there are data but no randoms, which would suggest a problem with the mask. (Note that the regular spacing of the randoms comes from how Shirley generates them: she places an object at the center of each grid cell.) I plotted these because the regular spacing makes it easier to see which regions have been missed. For instance, at RA ~ 129.4, dec ~ 49.4 there are several data points (red) but no randoms (blue). You can click on the pictures below to make them bigger.
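For reference, here is a minimal sketch of how I understand the grid-style randoms to work, using the same mask calls as in my other code. This is my rough reconstruction, not Shirley's actual code, and the grid spacing is an assumed value:

;Build a regular RA/Dec grid over the zoom region and keep the points
;inside the mask with weight > 0
gridstep = 0.05d                               ;assumed spacing in degrees
nra  = long((130d - 110d)/gridstep)
ndec = long((55d - 35d)/gridstep)
gra  = 110d + gridstep*(dindgen(nra) + 0.5d)   ;cell-center RA values
gdec = 35d + gridstep*(dindgen(ndec) + 0.5d)   ;cell-center Dec values
idx  = lindgen(nra*ndec)
randra  = gra[idx mod nra]                     ;replicate to the full 2D grid
randdec = gdec[idx / nra]
bossmask = "/clusterfs/riemann/raid001/jessica/boss/bossX.002.ply"
read_mangle_polygons, bossmask, polygons, id
inwin = is_in_window(ra=randra, dec=randdec, polygons, in_polygon=rpoly)
win   = where(inwin GT 0)                      ;inside the mask geometry
keep  = win[where(polygons[rpoly[win]].weight GT 0)]  ;and in a polygon with weight > 0
randra  = randra[keep]
randdec = randdec[keep]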








Tuesday, February 22, 2011

Re-running Luminosity Functions

I needed to re-run the efficiency/completeness calculations for the three different luminosity functions, with the corrected computations that Adam Myers helped me come up with.

I re-ran them over the weekend and, to try to make things run faster, I used a higher spectroscopic completeness threshold (fewer objects). Unfortunately this made my numbers much lower than when I used the larger set...

completeness threshold = 0.5
20 targets / deg^2, completeness = 46%, efficiency = 58%
40 targets / deg^2, completeness = 62%, efficiency = 41%

completeness threshold = 0.8
20 targets / deg^2, completeness = 43%, efficiency = 52%
40 targets / deg^2, completeness = 58%, efficiency = 35%

So, I've set some runs going on the larger sets (completeness threshold = 0.5).

The code to do this is in the following log file:
../logs/110221log.pro

The runs are in the following directories:
j06runnew
r06runnew
hrh07runnew

Wednesday, February 16, 2011

qsub deleting

This command will come in useful again, I am sure. It deletes all the jobs submitted by a single user to qsub:

qdel $(qselect -u username)

or more specifically for me on riemann:
qdel $(qselect -u jessica)
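Before running the mass delete, it can be worth listing the jobs that will be hit; assuming the standard PBS qstat is available on riemann, something like:

qstat -u jessica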

Tuesday, February 15, 2011

Final Numbers for Likelihood Paper Table 1

Ok, I finally finished doing the calculations for the efficiency/completeness table. I've tried to comment the code below pretty well, so hopefully it will make sense to me when I try to reproduce this in the future (it is also in the following log file: ../logs/110215log.pro):

; Read in the targeting file, made by Adam Myers; it covers 219.9 deg^2
adamfile = '/home/jessica/boss/chunk1_wcomp.fits'
targets = mrdfits(adamfile, 1)
stripe82targets1 = targets
newratio = stripe82targets1.LIKE_RATIO_CORE

;Read in the coadded targeting file, given to me by Adam; it has the
;coadded photometry and the coadded likelihood ratios (like_ratio_bonus)
coaddfile = '/home/jessica/boss/star82-varcat-bound-ts-wgalex-wnnvarbonus-galexbitset.fits.gz'
coadd = mrdfits(coaddfile, 1)

;Match the coadded targeting file to the 219.9 file above so that they
;are the same area
spherematch, stripe82targets1.ra, stripe82targets1.dec, coadd.ra, coadd.dec, 2./3600, i1, i2, d12
coadd = coadd[i2]

; Read in the truthtable file
truthfile = '/home/jessica/boss/BOSS_Quasars_3PCplus_v1.1_wcomp.fits'
collateinfo1 = mrdfits(truthfile, 1)
stripe82data1 = collateinfo1
;Constrain to High human confidence quasars (with z > 2.15)
confcut = where(stripe82data1.z_conf_person GE 2 and stripe82data1.Z_PERSON GE 2.15)
stripe82quasars1 = stripe82data1[confcut] ; with redshift > 2.15
lowqsocut = where(stripe82data1.z_conf_person GE 2 and stripe82data1.Z_PERSON GE 0.5 and stripe82data1.Z_PERSON LT 2.15)
stripe82lowquasars1 = stripe82data1[lowqsocut] ; with 0.5 < redshift < 2.15

;Cut to objects which are in areas of high spectroscopic completeness
compthresh = 0.5 ;adjust to make different rows of Adam's table
wphot = where(stripe82targets1.completeness ge compthresh)
wspec = where(stripe82quasars1.completeness ge compthresh)
wdata = where(stripe82data1.completeness ge compthresh)
wlow = where(stripe82lowquasars1.completeness ge compthresh)

;Find the area of the region above the spectroscopic completeness threshold
areas = mrdfits('/home/jessica/boss/area_wcomp.fits',1)
areacol = where(areas.comp_thresh ge compthresh)
thisArea = areas[areacol[0]].area

;Limit to objects above the spectroscopic completeness threshold
stripe82targets = stripe82targets1[wphot]
stripe82quasars = stripe82quasars1[wspec]
stripe82data = stripe82data1[wdata]
stripe82lowqso = stripe82lowquasars1[wlow]

;Find likelihood threshold at (targetdensity) tpsd
area = 219.9 ; we know the exact area of the chunk1 file
targetdensity = 40.0 ;change this number to get the rows in table 1
tpsd = round(targetdensity*area)
;Use the full targeting file (no cuts)
sortRatio = targets[reverse(sort(targets.LIKE_RATIO_CORE))].LIKE_RATIO_CORE
;Threshold is the threshold such that we target at the above targetdensity
threshold = sortRatio[tpsd-1]

;Similarly, find the threshold using the coadded photometry (it will be
;different than the single-epoch photometry threshold)
sortCoadd = coadd[reverse(sort(coadd.LIKE_RATIO_BONUS))].LIKE_RATIO_BONUS
thresholdCoadd = sortCoadd[tpsd-1]

;Spectroscopic Completeness threshold
print, compthresh

;Area
print, thisArea

;Likelihood Threshold
print, threshold

;Likelihood Targets per square degree
thistpsd = n_elements(where(stripe82targets.like_ratio_core gt threshold))/thisArea
print, thistpsd

spherematch, stripe82targets.ra, stripe82targets.dec, stripe82quasars.ra, stripe82quasars.dec, 2./3600, i1, i2, d12
; Likelihood Quasars targeted per square degree (not corrected)
qpsd = n_elements(where(stripe82targets[i1].like_ratio_core gt threshold))/thisArea
print, qpsd

spherematch, stripe82targets.ra, stripe82targets.dec, stripe82lowqso.ra, stripe82lowqso.dec, 2./3600, i1, i2, d12
; Likelihood Low Redshift Quasars targeted per square degree
lqpsd = n_elements(where(stripe82targets[i1].like_ratio_core gt threshold))/thisArea
print, lqpsd

spherematch, stripe82targets.ra, stripe82targets.dec, stripe82data.ra, stripe82data.dec, 2./3600, i1, i2, d12
; Others targeted per square degree
otpsd = n_elements(where(stripe82targets[i1].like_ratio_core gt threshold))/thisArea
print, otpsd - (qpsd + lqpsd)

;Likelihood targets that did not get a fiber, per square degree
print, thistpsd-otpsd

;Total z > 2.15 qsos
totalqsos = n_elements(stripe82quasars)/thisArea
print, totalqsos

;Total 0.5 < z < 2.15 qsos
totallowqsos = n_elements(stripe82lowqso)/thisArea
print, totallowqsos

;Efficiency (not corrected)
print, qpsd/thistpsd

;Completeness (not corrected)
print, qpsd/totalqsos

;Dealing with the non-targeted quasars that should have been targeted
;in the coadded photometry, correcting the efficiency and completeness

;Get likelihood targets (pass like threshold)
liketargets = stripe82targets[where(stripe82targets.like_ratio_core gt threshold)]
;Find targets that were targeted
spherematch, liketargets.ra, liketargets.dec, stripe82data.ra, stripe82data.dec, 2./3600, i1, i2, d12

;All likelihood targets (above threshold)
all = where(liketargets.ra GT -99999)
;Targets that were targeted
lobs = liketargets[i1]
;Targets that were missed
lnobs = setdifference(all, i1)
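;setdifference returns the elements of 'all' that are not in i1, i.e. the
;indices of the likelihood targets that never got a spectroscopic match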

;Of the ones that were missed, get the coadded photometry
likemissed = liketargets[lnobs]
spherematch, likemissed.ra, likemissed.dec, coadd.ra, coadd.dec, 2./3600, i1, i2, d12
likemissedcoadd = coadd[i2]

;Number that were missed that should have been targeted, because their
;coadded like_ratio_bonus passes the coadded threshold
nmissedcoadd = n_elements(where(likemissedcoadd.like_ratio_bonus gt thresholdCoadd))

; Number per square degree that would have been quasars if they were
; targeted. Multiply the number missed by the (not corrected) efficiency
; We will add these into our numbers to make them correct
nmpsd = round(nmissedcoadd*qpsd/thistpsd)/thisArea

;Line from table 1 for the table
; targets/deg^2, threshold, total like targets, total QSOs found, QSOs missed, completeness, efficiency
print, targetdensity, threshold, thistpsd*thisarea, (qpsd+nmpsd)*thisarea, (totalqsos-qpsd-nmpsd)*thisarea, (qpsd+nmpsd)/(totalqsos+nmpsd), (qpsd+nmpsd)/thistpsd
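In other words (using the variable names above), the corrected quantities printed by that last line are:

corrected completeness = (qpsd + nmpsd) / (totalqsos + nmpsd)
corrected efficiency = (qpsd + nmpsd) / thistpsd

where nmpsd is the estimated number per square degree of missed targets that would have been quasars (the number missed times the uncorrected efficiency).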

And here is the pretty LaTeX table:

Thursday, February 10, 2011

Efficiency and Completeness (again)

Finally I think I am doing this right. The log file with this code is here:
.../logs/110209log.pro

Below is an email to Adam explaining my problems, and newest findings
~~~~~~~
Adam,
For some reason, when I wget the file and it already exists in my directory, wget doesn't overwrite it but saves the download under a different name:

BOSS_Quasars_3PCplus_v1.1_wcomp.fits.6

I didn't realize this was happening until just now, and so I kept re-downloading it, but then using the old version of the file (because BOSS_Quasars_3PCplus_v1.1_wcomp.fits was still the old version). Sorry about that!

I'm now getting the same numbers as you for the following:

print, n_elements(where(stripe82targets.like_ratio_core gt 0.543214))/45.93
20.03

spherematch, stripe82targets.ra, stripe82targets.dec, stripe82quasars.ra, stripe82quasars.dec, 2./3600, i1, i2, d12
print, n_elements(where(stripe82targets[i1].like_ratio_core gt 0.543214))/45.93
10.47


In terms of filling out your table I get the following numbers:

Here are the column headings (I've modified them slightly):
1) Spec Completeness
2) Area (deg^2)
3) Total fibers /deg^2 at like thresh > 0.543214
4) # z > 2.15 quasars confirmed / deg^2
5) # z < 2.15 quasars confirmed (low redshift) / deg^2
6) others that got a fiber (not quasars) / deg^2
7) not confirmed (didn't get fiber) / deg^2
8) Total known z > 2.15 quasars / deg^2
9) Total known z < 2.15 quasars / deg^2

(1) (2) (3) (4) (5) (6) (7) (8) (9)
0.5 45.93 20.03 10.47 0.348 2.24 6.97 24.21 18.05
0.6 37.80 20.45 11.24 0.397 2.27 6.53 25.61 15.77
0.7 26.61 21.12 11.01 0.526 2.56 7.03 25.82 11.56
0.8 14.02 22.40 10.70 0.856 3.21 7.63 24.05 6.065
0.9 2.124 18.36 8.473 0.941 1.41 7.53 20.24 5.178

Note that, as you say, cols (4) + (5) + (6) + (7) = col (3).

So I understand you to be saying that most of the targets that didn't get a fiber (column 7) are not quasars, because they weren't found in the coadded data.

So it seems like you are saying that I should be defining the efficiency as:

col (4) / col (3) = 52% (@ spec completeness 0.5)

And the completeness as:
col (4) / col (8) = 43% (@ spec completeness 0.5)


These numbers change in the following way with spec completeness:

spec comp efficiency completeness
0.6 55% 44%
0.7 50% 43%
0.8 48% 44%
0.9 46% 42%
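For example, the spec completeness = 0.6 row comes straight from the table above: efficiency = col (4) / col (3) = 11.24 / 20.45 ≈ 55%, and completeness = col (4) / col (8) = 11.24 / 25.61 ≈ 44%.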

So which of these should I use? You had mentioned there was another step....
Jessica



Below is the code I used to generate these numbers:
; Read in the targeting file
adamfile = '/home/jessica/boss/chunk1_wcomp.fits'
targets = mrdfits(adamfile, 1)
stripe82targets1 = targets
newratio = stripe82targets1.LIKE_RATIO_CORE

; Read in the truthtable file
truthfile = '/home/jessica/boss/BOSS_Quasars_3PCplus_v1.1_wcomp.fits'
collateinfo1 = mrdfits(truthfile, 1)
stripe82data1 = collateinfo1
;High human confidence quasars (with z > 2.15)
confcut = where(stripe82data1.z_conf_person GE 2 and stripe82data1.Z_PERSON GE 2.15)
stripe82quasars1 = stripe82data1[confcut] ; with redshift > 2.15
lowqsocut = where(stripe82data1.z_conf_person GE 2 and stripe82data1.Z_PERSON GE 0.5 and stripe82data1.Z_PERSON LT 2.15)
stripe82lowquasars1 = stripe82data1[lowqsocut] ; with 0.5 < redshift < 2.15

compthresh = 0.9
wphot = where(stripe82targets1.completeness ge compthresh)
wspec = where(stripe82quasars1.completeness ge compthresh)
wdata = where(stripe82data1.completeness ge compthresh)
wlow = where(stripe82lowquasars1.completeness ge compthresh)

areas = mrdfits('/home/jessica/boss/area_wcomp.fits',1)
areacol = where(areas.comp_thresh ge compthresh)
thisArea = areas[areacol[0]].area

stripe82targets = stripe82targets1[wphot]
stripe82quasars = stripe82quasars1[wspec]
stripe82data = stripe82data1[wdata]
stripe82lowqso = stripe82lowquasars1[wlow]

area = 219.9 ; we know the exact area of the chunk1 file
tpsd = round(20.0*area)
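;Rank the likelihood ratios from highest to lowest; the threshold is the
;tpsd-th highest value, i.e. the cut that yields 20 targets per square degree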
sortRatio = targets[reverse(sort(targets.LIKE_RATIO_CORE))].LIKE_RATIO_CORE
threshold = sortRatio[tpsd-1]

;Comp threshold
print, compthresh

;Area
print, thisArea

;Like Threshold
print, threshold

;Likelihood Targets per square degree
thistpsd = n_elements(where(stripe82targets.like_ratio_core gt threshold))/thisArea
print, thistpsd

spherematch, stripe82targets.ra, stripe82targets.dec, stripe82quasars.ra, stripe82quasars.dec, 2./3600, i1, i2, d12
; Likelihood Quasars targeted per square degree
qpsd = n_elements(where(stripe82targets[i1].like_ratio_core gt threshold))/thisArea
print, qpsd


spherematch, stripe82targets.ra, stripe82targets.dec, stripe82lowqso.ra, stripe82lowqso.dec, 2./3600, i1, i2, d12
; Likelihood Low Redshift Quasars targeted per square degree
lqpsd = n_elements(where(stripe82targets[i1].like_ratio_core gt threshold))/thisArea
print, lqpsd

spherematch, stripe82targets.ra, stripe82targets.dec, stripe82data.ra, stripe82data.dec, 2./3600, i1, i2, d12
; Others targeted per square degree
otpsd = n_elements(where(stripe82targets[i1].like_ratio_core gt threshold))/thisArea
print, otpsd - (qpsd + lqpsd)

;Likelihood targets that did not get a fiber, per square degree
print, thistpsd-otpsd


;Total z > 2.15 qsos
totalqsos = n_elements(stripe82quasars)/thisArea
print, totalqsos

;Total 0.5 < z < 2.15 qsos
totallowqsos = n_elements(stripe82lowqso)/thisArea
print, totallowqsos

;Efficiency
print, qpsd/thistpsd

;Completeness
print, qpsd/totalqsos