Friday, December 25, 2009

Merry Australian Christmas!

Jessica + Adam + Koala send season's greetings from Australia

Monday, December 14, 2009

Traveling

I am traveling until January 16th. I will be posting sporadically. Happy Holidays everyone!

Sunday, December 6, 2009

Single Epoch Likelihood

I've been playing with the single-epoch likelihoods, cross-checked against the truth table. The rest of the targeting will be done on single-epoch measurements (i.e. not Stripe 82), so these give a more accurate measure of how well the likelihood is doing. I've been adjusting various parameters, like how we define "variability" in terms of the chi^2 cut, and whether we should add variable objects from the L_everything catalog to L_QSO.
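
For concreteness, here is a minimal sketch of that bookkeeping in Python/numpy (the actual analysis is in IDL; the array layout follows the chi^2-binned L_EVERYTHING_ARRAY described in the December 3 post below, and the function itself is my own placeholder, not the production pipeline):

import numpy as np

# Assumed layout: l_everything_array[i, j] holds object i's "everything" likelihood
# split into chi^2 bins j (upper edges below; the last bin is chi^2 >= 1.6).
BIN_UPPER = np.array([1.1, 1.2, 1.3, 1.4, 1.5, 1.6, np.inf])

def single_epoch_l_ratio(l_qso, l_everything_array,
                         chi2_cut=1.2, add_variable_to_qso=False):
    """L_ratio = L_QSO / L_else, with the variable (chi^2 > cut) bins dropped
    from L_else and, optionally, added to L_QSO instead."""
    quiet = BIN_UPPER <= chi2_cut                      # non-variable bins to keep
    l_else = l_everything_array[:, quiet].sum(axis=1)
    if add_variable_to_qso:                            # the "... added to L_QSO" variants
        l_qso = l_qso + l_everything_array[:, ~quiet].sum(axis=1)
    return l_qso / l_else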

Below are the raw numbers of quasars targeted (normalized to targeting of ~40/degree^2) as well as the accuracy of each method:

Every method below targets 1881 objects in total; "chi^2 > x" means variable objects with flux chi^2 above that cut.

Method                                                                # quasars   accuracy
No variability                                                          1029      0.545986
chi^2 > 1.5 subtracted from L_everything                                1047      0.556619
chi^2 > 1.5 subtracted from L_everything, chi^2 > 1.5 added to L_QSO    1033      0.549176
chi^2 > 1.1 subtracted from L_everything                                1053      0.559809
chi^2 > 1.2 subtracted from L_everything                                1055      0.560872
chi^2 > 1.3 subtracted from L_everything                                1054      0.560340
chi^2 > 1.4 subtracted from L_everything                                1051      0.558745
chi^2 > 1.2 subtracted from L_everything, chi^2 > 1.6 added to L_QSO    1036      0.550771
chi^2 > 1.2 subtracted from L_everything, chi^2 > 1.5 added to L_QSO    1035      0.550239
chi^2 > 1.2 subtracted from L_everything, chi^2 > 1.4 added to L_QSO    1034      0.549708
chi^2 > 1.2 subtracted from L_everything, chi^2 > 1.3 added to L_QSO    1029      0.547049

It seems that the method with the best accuracy (56%) is subtracting variable objects with chi^2 > 1.2 from L_everything.
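
As a quick sanity check on that accuracy column (taking accuracy to be quasars targeted divided by total targeted, the same definition used in the SVM comparison further down):

# best method above: 1055 quasars out of 1881 targets
print(round(1055 / 1881, 6))   # 0.560872, matching the table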

Here is a plot of the L_everything (non-variable) vs L_QSO:

Magenta objects are targeted QSOs.
Cyan objects are targeted stars.
Green objects are stars that were not targeted.
Red objects are missed QSOs.

It looks like we might be able to make the L_ratio cut a function of L_QSO to perhaps pick up a few more of the red objects in the top-right corner.

Another interesting plot is likelihood-ratio vs g-band flux:

The color scheme is the same as in the first plot.

We might be able to recover more of the missed QSOs (red) if we target faint objects (g-flux less than 5) with an L_ratio close to the cut. I've also plotted likelihood-ratio vs g-r color:

The color scheme is the same as in the first plot.

Again, we might be able to use the fact that the missed QSOs (red) are clustered in these spaces.
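
If we wanted to try that, a hypothetical supplemental cut might look like the sketch below. The g-flux threshold of 5 comes from the discussion above, but the near-miss L_ratio window and the g-r box are placeholders that would have to be read off the plots:

import numpy as np

def supplemental_targets(l_ratio, g_flux, g_minus_r,
                         base_cut=0.01, near_cut=0.005,       # near_cut is a placeholder
                         faint_gflux=5.0, gr_box=(0.0, 0.5)):  # gr_box is a placeholder
    """Flag objects that just miss the baseline L_ratio cut but are faint in g
    and sit in the color region where the missed QSOs cluster."""
    near_miss = (l_ratio > near_cut) & (l_ratio < base_cut)
    faint = g_flux < faint_gflux
    in_box = (g_minus_r > gr_box[0]) & (g_minus_r < gr_box[1])
    return near_miss & faint & in_box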

-------------
Cool Tip of The Day:
To download a file directly from the web using the command line:
wget http://www.downloadingurl.com/filetodownload.fits

If it is from a password-protected page (like the wiki), then do the following:
wget --http-user=myusername --http-password=mypassword http://www.downloadingurl.com/filetodownload.fits

Friday, December 4, 2009

Likelihood Optimization

I've been playing with various parameters in the likelihood method, trying to find the most efficient cuts. The three things I have been playing with are:

added errors (what to use as the adderr input to likelihood_compute; see the sketch after this list)
everything - variability (what happens when we remove variable objects from the L_everything file)
QSO + variability (what happens when we add the variable objects to the L_QSO likelihoods)
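
For reference, here is how I think of the added-error scan, as a minimal sketch (this assumes adderr is a per-band fractional flux error added in quadrature inside likelihood_compute, which is my reading rather than anything spelled out here):

import numpy as np

ADDERR = np.array([0.014, 0.01, 0.01, 0.01, 0.014])   # u, g, r, i, z

def inflate_errors(flux, flux_err, scale=1.0):
    """Flux errors with scale * adderr added in quadrature; scale = 0, 1, 5, 7
    reproduces the four cases below."""
    return np.sqrt(flux_err**2 + (scale * ADDERR * flux)**2)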

Here are my findings (these are on the co-added fluxes, next step is to re-run with single epoch):
Normal Errors (adderr = [0.014, 0.01, 0.01, 0.01, 0.014])
Fraction of targeted objects that are quasars (based on 40/degree^2 targeting)
No variability: 0.649914
Variability everything: 0.656196
Variability everything + QSO = 0.651057

5X Errors (adderr = 5*[0.014, 0.01, 0.01, 0.01, 0.014])
Fraction of targeted objects that are quasars (based on 40/degree^2 targeting)
No variability: 0.641348
Variability everything: 0.645346
Variability everything + QSO = 0.603655

7X Errors (adderr = 7*[0.014, 0.01, 0.01, 0.01, 0.014])
Fraction of targeted objects that are quasars (based on 40/degree^2 targeting)
No variability: 0.627641
Variability everything: 0.624786
Variability everything + QSO = 0.572244

No Errors (adderr = 0.0*[0.014, 0.01, 0.01, 0.01, 0.014])
Fraction of targeted objects that are quasars (based on 40/degree^2 targeting)
No variability: 0.644203
Variability everything: 0.627070
Variability everything + QSO = 0.619075

It looks like the errors we were already running with give the best numbers, combined with using the variability cut on the everything file but not adding the variable objects to the QSO likelihood.

I am going to play more with the definitions of variable everything and variable QSO to see if I can get these to work better. I also want to play with not cutting at a fixed L_ratio = 0.01, but instead changing the L_ratio cut as a function of L_QSO (it seems we might be able to get a few more objects if we let the L_ratio cut decrease as L_QSO gets large).
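
As a hypothetical example of that last idea (the 0.01 baseline is the current cut; the pivot and slope here are made-up knobs that would have to be tuned against the truth table):

import numpy as np

def l_ratio_cut(l_qso, base_cut=0.01, pivot=1e-6, slope=0.3):
    """An L_ratio threshold that relaxes as L_QSO grows, instead of a flat 0.01."""
    cut = base_cut * (pivot / np.maximum(l_qso, pivot)) ** slope
    return cut   # equals base_cut for L_QSO <= pivot, decreases above it

# targeted = l_ratio > l_ratio_cut(l_qso)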

Thursday, December 3, 2009

Variable Likelihood

David Schlegel and I added variability information into the "everything" likelihood training file. This allows us to separate objects that vary (QSOs are included in this list) from objects that don't. This should improve our training by removing quasars from the "everything" file.

The training file now has chi^2 information on the changes in flux over repeat observations. The likelihood computation splits L_EVERYTHING into the following chi^2 bins:

L_EVERYTHING_ARRAY[0] : objs1.flux_clip_rchi2[2] LT 1.1
L_EVERYTHING_ARRAY[1] : objs1.flux_clip_rchi2[2] GE 1.1 AND objs1.flux_clip_rchi2[2] LT 1.2
L_EVERYTHING_ARRAY[2] : objs1.flux_clip_rchi2[2] GE 1.2 AND objs1.flux_clip_rchi2[2] LT 1.3
L_EVERYTHING_ARRAY[3] : objs1.flux_clip_rchi2[2] GE 1.3 AND objs1.flux_clip_rchi2[2] LT 1.4
L_EVERYTHING_ARRAY[4] : objs1.flux_clip_rchi2[2] GE 1.4 AND objs1.flux_clip_rchi2[2] LT 1.5
L_EVERYTHING_ARRAY[5] : objs1.flux_clip_rchi2[2] GE 1.5 AND objs1.flux_clip_rchi2[2] LT 1.6
L_EVERYTHING_ARRAY[6] : objs1.flux_clip_rchi2[2] GE 1.6

Changing L_ratio to be L_QSO / total(L_EVERYTHING_ARRAY[0:5]), which removes the L_EVERYTHING contribution of the most variable objects, we get more QSOs. From the truth-table data, using different chi^2 cuts:

L_ELSE = total(likechi.L_EVERYTHING_ARRAY[0:1],1) ; chi^2 LT 1.2
;new quasars selected
QNEWSELECT LONG = Array[77]
;new stars selected (not quasars)
SNEWSELECT LONG = Array[106]
; #new_quasars/#new_stars
0.726415

L_ELSE = total(likechi.L_EVERYTHING_ARRAY[0:2],1) ; chi^2 LT 1.3
;new quasars selected
QNEWSELECT LONG = Array[69]
;new stars selected (not quasars)
SNEWSELECT LONG = Array[92]
; #new_quasars/#new_stars
0.750000

L_ELSE = total(likechi.L_EVERYTHING_ARRAY[0:3],1) ; chi^2 LT 1.4
;new quasars selected
QNEWSELECT LONG = Array[67]
;new stars selected (not quasars)
SNEWSELECT LONG = Array[84]
; #new_quasars/#new_stars
0.797619

L_ELSE = total(likechi.L_EVERYTHING_ARRAY[0:4],1) ; chi^2 LT 1.5
;new quasars selected
QNEWSELECT LONG = Array[65]
;new stars selected (not quasars)
SNEWSELECT LONG = Array[79]
; #new_quasars/#new_stars
0.822785

L_ELSE = total(likechi.L_EVERYTHING_ARRAY[0:5],1) ; chi^2 LT 1.6
;new quasars selected
QNEWSELECT LONG = Array[62]
;new stars selected (not quasars)
SNEWSELECT LONG = Array[72]
; #new_quasars/#new_stars
0.861111
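
For the record, my reading of the bookkeeping behind these numbers, as a sketch (the variable names are mine; "old" is the no-variability selection, "new" is the selection using the chi^2-trimmed L_ELSE):

import numpy as np

def new_selection_stats(old_selected, new_selected, is_qso):
    """Count objects gained by the new cut, split by the truth table."""
    gained = new_selected & ~old_selected
    q_new = int(np.sum(gained & is_qso))       # QNEWSELECT analogue
    s_new = int(np.sum(gained & ~is_qso))      # SNEWSELECT analogue
    return q_new, s_new, q_new / s_new         # e.g. 62, 72, 0.861 for the last case above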



This is L_EVERYTHING (chi^2 less than 1.4) vs L_QSO:
Green: Not quasars
Magenta: Quasars that the old likelihood method (no variability) targeted
Cyan: Quasars that the likelihood + variability method targets
Red: Quasars we are still missing (not targeting).

On another note, possible Thesis Title: "Putting Galaxies in their Place." Thoughts?

Tuesday, December 1, 2009

SVM Results

I've been playing with using a support vector machine (SVM) for quasar target selection. Adam suggested this because it seemed that the likelihood method was very similar to this already-solved computer science problem. He helped me get SVM-light working using R.

The way this method works is that you input variable information for a set of training objects that represent what you are looking for (in this case quasars) and what you are not looking for (everything else).

I trained on the u-g-r-i-z fluxes (the same fluxes used in the likelihood method) from the QSO template and everything-else (from now on referred to as "everything") template objects. I tell the SVM which are quasars and which are everything. It takes a long time to train the SVM, so I've been taking subsets of the quasar/everything catalogs to make the system faster (the likelihood uses ~1,000,000 objects).
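
A rough sketch of this train-and-classify flow (scikit-learn's LinearSVC stands in for SVM-light here, and all of the names are placeholders; the truth-table step is the one described in the next paragraph):

import numpy as np
from sklearn.svm import LinearSVC   # stand-in for SVM-light via R

def train_and_classify(qso_flux, everything_flux, truth_flux, truth_is_qso, n_train=30000):
    """Train on ugriz fluxes labeled QSO (1) vs everything else (0), classify the
    truth-table objects, and report accuracy = quasars targeted / total targeted."""
    X = np.vstack([qso_flux[:n_train], everything_flux[:n_train]])
    y = np.r_[np.ones(len(qso_flux[:n_train])), np.zeros(len(everything_flux[:n_train]))]
    clf = LinearSVC().fit(X, y)
    targeted = clf.predict(truth_flux) == 1
    accuracy = np.sum(targeted & truth_is_qso) / np.sum(targeted)
    return targeted, accuracy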

I then take the human-confirmed truth table objects and run them through the SVM to classify as quasar or not. Below are the results:

Using 30,000 training qsos and 30,000 training stars
#quasars targeted
[1] 700
#not quasars not targeted
[1] 466
#quasars not targeted
[1] 255
#not quasars targeted
[1] 926
#Accuracy of targeting (qsos targeted / total targeted)
[1] 0.4305043


~~~

Using 100,000 training qsos and 100,000 training stars
#quasars targeted
[1] 709
#not quasars not targeted
[1] 480
#quasars not targeted
[1] 246
#not quasars targeted
[1] 912
#Accuracy of targeting (qsos targeted / total targeted)
[1] 0.4373843


~~~

Compared with the likelihood method:
#quasars targeted
[1] 793
#not quasars not targeted
[1] 908
#quasars not targeted
[1] 162
#not quasars targeted
[1] 484
#Accuracy of targeting (qsos targeted / total targeted)
[1] 0.620987


I can try adding in more information, like the errors or the likelihood values, as additional vectors in this analysis, but it doesn't seem like it is working as well as the likelihood method, or that adding more training objects improves the accuracy very much (using ~3 times as many objects only increased the accuracy by less than 1%). I could run on larger training sets and see if that makes a difference, too.