Showing posts with label R. Show all posts

Tuesday, December 1, 2009

SVM Results

I've been playing with using a support vector machine (SVM) for quasar target selection. Adam suggested this because it seems that the likelihood method is very similar to this already-solved computer science problem. He helped me get SVM-light working using R.

The way this method works is that you input variable information for a set of training objects that represent what you are looking for (in this case quasars) and what you are not looking for (everything else).
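As a minimal sketch of that workflow, assuming the svm() function from the e1071 package (the package loaded in the code further down this page) rather than SVM-light itself, and using made-up Gaussian "fluxes" in place of real catalog data:

```r
library(e1071)  # provides svm(); assumed to be installed

set.seed(1)
n <- 200
# Hypothetical 5-band features: two well-separated synthetic classes
qso_train   <- matrix(rnorm(n * 5, mean = 5), ncol = 5)
other_train <- matrix(rnorm(n * 5, mean = 0), ncol = 5)

x <- rbind(qso_train, other_train)
y <- factor(c(rep(1, n), rep(0, n)))   # 1 = quasar, 0 = everything else

model <- svm(x, y)                     # factor labels give a classifier

# Classify new objects whose (synthetic) labels are known
x_new <- rbind(matrix(rnorm(50 * 5, mean = 5), ncol = 5),
               matrix(rnorm(50 * 5, mean = 0), ncol = 5))
truth <- factor(c(rep(1, 50), rep(0, 50)), levels = levels(y))
pred  <- predict(model, x_new)

print(table(predicted = pred, truth = truth))
```

The synthetic classes are far easier to separate than real quasars and stars, so this only illustrates the train-then-predict pattern, not realistic performance.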

I trained on the u-g-r-i-z fluxes (the same fluxes used in the likelihood method) of the objects in the qso template and in the everything-else template (from now on referred to as "everything"). I tell the SVM which objects are quasars and which are everything. It takes a long time to train the SVM, so I've been taking subsets of the quasar/everything catalogs to make the system faster (the likelihood uses ~1,000,000 objects).
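Taking equal-sized random subsets of the two catalogs could look roughly like this. The matrices and the subset size here are hypothetical stand-ins, not the actual template files:

```r
set.seed(42)
# Hypothetical stand-ins for the full catalogs (rows = objects, cols = ugriz)
qso_all   <- matrix(rnorm(5000 * 5), ncol = 5)
other_all <- matrix(rnorm(8000 * 5), ncol = 5)

n_sub <- 1000  # illustrative subset size per class

# Draw the same number of rows from each catalog, without replacement
qso_sub   <- qso_all[sample(nrow(qso_all), n_sub), , drop = FALSE]
other_sub <- other_all[sample(nrow(other_all), n_sub), , drop = FALSE]

x <- rbind(qso_sub, other_sub)
y <- factor(c(rep(1, n_sub), rep(0, n_sub)))  # 1 = quasar, 0 = everything
print(dim(x))
```

Sampling at random (rather than taking the first rows) avoids inheriting any ordering that the catalog files might have.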

I then take the human-confirmed truth table objects and run them through the SVM to classify as quasar or not. Below are the results:

Using 30,000 training qsos and 30,000 training stars
#quasars targeted
[1] 700
#not quasars not targeted
[1] 466
#quasars not targeted
[1] 255
#not quasars targeted
[1] 926
#Accuracy of targeting (qsos targeted / total targeted)
[1] 0.4305043


~~~

Using 100,000 training qsos and 100,000 training stars
#quasars targeted
[1] 709
#not quasars not targeted
[1] 480
#quasars not targeted
[1] 246
#not quasars targeted
[1] 912
#Accuracy of targeting (qsos targeted / total targeted)
[1] 0.4373843


~~~

Compared with the likelihood method:
#quasars targeted
[1] 793
#not quasars not targeted
[1] 908
#quasars not targeted
[1] 162
#not quasars targeted
[1] 484
#Accuracy of targeting (qsos targeted / total targeted)
[1] 0.620987


I can try adding more information, like the errors or the likelihood values, as additional input vectors in this analysis, but so far the SVM isn't working as well as the likelihood (accuracy of ~0.43 versus ~0.62), and adding more objects doesn't improve the accuracy very much (using ~3 times as many objects only increased the accuracy by <1%). I could run on larger training sets and see if this makes a difference too.
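For reference, the accuracy figures above follow directly from the listed counts. A small sketch that recomputes them (the function and column names are just illustrative):

```r
# "Accuracy of targeting" as defined above: qsos targeted / total targeted
targeting_accuracy <- function(qso_targeted, notqso_targeted) {
  qso_targeted / (qso_targeted + notqso_targeted)
}

runs <- data.frame(
  run             = c("SVM 30k", "SVM 100k", "likelihood"),
  qso_targeted    = c(700, 709, 793),
  notqso_targeted = c(926, 912, 484)
)
runs$accuracy <- targeting_accuracy(runs$qso_targeted, runs$notqso_targeted)
print(runs)

# Gain from the larger SVM training set, in absolute accuracy
svm_gain <- runs$accuracy[2] - runs$accuracy[1]
print(svm_gain)
```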

Monday, November 30, 2009

R Packages on Riemann

I asked Michael Jennings to install R on riemann so I can play with this SVM-light thing for the target selection. He installed R but asked me to install any add-on packages into my home directory. Here is how you do this:
  1. Create a directory where you want the R add-on packages to be stored (e.g. /home/jessica/R/library)
  2. Tell R to look for libraries in this location by adding this to your .path file: export R_LIBS=/home/jessica/R/library
  3. Source your .bashrc file to update path changes
  4. Open R
  5. Type the command: install.packages("e1071", lib = '/home/jessica/R/library', repos = "http://cran.r-project.org") where "e1071" is the name of the package you want to install, lib is the directory you created to store the packages, and repos is the location to download the packages from.
  6. Test to make sure it was successful by closing R, re-opening it, and loading the package you just installed: library("e1071")
And now you have successfully installed an R package on Riemann. Thanks to Michael for working so quickly. Now, to try this SVM thing on the quasar targeting...
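The same idea can also be checked from inside an R session. This is a sketch under the assumption that a temporary directory stands in for the real library directory, using base R's .libPaths() instead of the R_LIBS environment variable:

```r
# Hypothetical stand-in for a personal library such as /home/jessica/R/library
lib_dir <- file.path(tempdir(), "R", "library")
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

# Put the personal library at the front of R's search path for this session
.libPaths(c(lib_dir, .libPaths()))
print(.libPaths())

# Installing into it would then be (not run here, needs network access):
# install.packages("e1071", lib = lib_dir, repos = "http://cran.r-project.org")
```

Unlike setting R_LIBS in a shell startup file, a .libPaths() call only lasts for the current session, so it is mostly useful for confirming that R can see the directory.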

Here's the code (also in like:svmcode.r)

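library("e1071")   # provides svm()
library("FITSio")  # provides readFITS()
setwd("/Users/jessica/Documents/workspace/newman/Jessica/likelihood")
qso = readFITS("smallqso.fits")
star = readFITS("smallstar.fits")
# The first table column of each file is used as the feature matrix.
# as.matrix() ignores an ncol argument, so that column must already be
# laid out as one row per object with the five fluxes across.
x1 <- as.matrix(qso$col[[1]])
x2 <- as.matrix(star$col[[1]])
x <- rbind(x1, x2)
# Class labels: 1 = quasar, 0 = star (one label per row of x)
y1 = rep(1, nrow(x1))
y2 = rep(0, nrow(x2))
y = c(y1, y2)
fy = factor(y)
m = svm(x, fy)
# Save every object in the session so the trained model can be reloaded
save(list = ls(), file = "./svmcode.Rdata")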


Where smallqso.fits and smallstar.fits are subsets of the qso/star templates used in the likelihood calculation. In this run I am using 100,001 objects for these small files, but that run has taken 2+ days so far on my laptop, so I think I'm going to try it with fewer points (30,000) for this run on riemann. To load this session image, type: load("./svmcode.Rdata")
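A quick way to convince yourself that the session image round-trips is to save a couple of objects and load them back into a fresh environment. The objects and file name below are placeholders, not the real trained model:

```r
# Placeholder objects standing in for the trained model and its inputs
m_demo <- list(kernel = "radial")
x_demo <- matrix(1:10, ncol = 5)

path <- file.path(tempdir(), "svmcode_demo.Rdata")
save(m_demo, x_demo, file = path)

# load() returns the names of the objects it restored
restored <- new.env()
loaded_names <- load(path, envir = restored)
print(loaded_names)
```

Loading into a separate environment keeps the restored objects from silently overwriting anything with the same name in the current workspace.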

To run this file as a script type the following (for some reason the qsub way of running a script isn't working):
R CMD BATCH ./svmcode.r svmcode.out 2> svmcode.err &