Wednesday, April 28, 2010

Traveling

I am traveling until May 5th. Congratulations to Kevin and Michelle!

Tuesday, April 27, 2010

Likelihood Test Results (4)

Now I am going to see how the likelihood compares if we use the Richards luminosity function and we also include the larger redshift range (0.5 < z < 5.0). The code to run this is in the following log file: ../logs/100427_3log.pro. This works even better than the last test:

eps = 1e-30
bossqsolike = total(likelihood.L_QSO_Z[16:32],1) ;quasars between redshift 2.1 and 3.7
qsolcut1 = where( alog10(total(likelihood.L_QSO_Z[16:32],1)) LT -9.0)
den = total(targets.L_EVERYTHING_ARRAY[0:4],1) + total(likelihood.L_QSO_Z[0:44],1) + eps
num = bossqsolike + eps
NEWRATIO = num/den
NEWRATIO[qsolcut1] = 0 ; eliminate objects with low L_QSO value


IDL> print, n_elements(ql) ; number quasars
461
IDL> print, n_elements(ql)*1.0/(n_elements(sl)+n_elements(ql)) ;percent accuracy
0.256111
IDL> print, n_elements(ql) + n_elements(sl) ; total targeted
1800
IDL>
IDL> print, n_elements(nql) ; number quasars
575
IDL> print, n_elements(nql)*1.0/(n_elements(nsl)+n_elements(nql)) ;percent accuracy
0.319444
IDL> print, n_elements(nql) + n_elements(nsl) ; total targeted
1800

We now go from 25.6% accuracy to 31.9% accuracy! And we find 114 more quasars!

Plots
The white points were targeted/missed by both the new and old likelihoods
The magenta points were only targeted/missed by the new likelihood
The cyan points were only targeted/missed by the old likelihood



Monday, April 26, 2010

Likelihood Test Results (3)

As a sanity check I re-computed the likelihoods with the old qso catalog to make sure I got the same results as before, which I did. See ../logs/100426log.pro for code.

It seems that adding in the BOSS QSOs doesn't help.

Now I am going to run on the old qso catalog but with the larger redshift range (0.5 < z < 5.0) and see if this improves things. See ../logs/100426log.pro for code.

So this seems to actually make a difference. I have played around with the redshift range for the numerator to get the best selection numbers:

eps = 1e-30
bossqsolike = total(likelihood.L_QSO_Z[14:34],1) ;quasars between redshift 2.1 and 3.5
qsolcut1 = where( alog10(total(likelihood.L_QSO_Z[19:34],1)) LT -9.0)
den = total(targets.L_EVERYTHING_ARRAY[0:4],1) + total(likelihood.L_QSO_Z[0:44],1) + eps
num = bossqsolike + eps
NEWRATIO = num/den
NEWRATIO[qsolcut1] = 0 ; eliminate objects with low L_QSO value


qsolcut2 = where( alog10(total(targets.L_QSO_Z[0:18],1)) LT -9.0)
L_QSO = total(targets.L_QSO_Z[2:18],1)
den = total(targets.L_EVERYTHING_ARRAY[0:4],1) + total(targets.L_QSO_Z[0:18],1) + eps
num = L_QSO + eps
OLDRATIO = num/den
OLDRATIO[qsolcut2] = 0 ; eliminate objects with low L_QSO value


numsel = 1800-1                               ; target the top 1800 objects
NRend = n_elements(newratio)-1
sortNR = reverse(sort(newratio))              ; sort by new likelihood ratio, highest first
targetNR = sortNR[0:numsel]                   ; objects targeted by the new likelihood
restNR = sortNR[numsel+1:NRend]               ; objects not targeted by the new likelihood

ORend = n_elements(oldratio)-1
sortOR = reverse(sort(oldratio))              ; sort by old likelihood ratio, highest first
targetOR = sortOR[0:numsel]                   ; objects targeted by the old likelihood
restOR = sortOR[numsel+1:ORend]               ; objects not targeted by the old likelihood

nql = setintersection(quasarindex,targetNR)   ; known quasars targeted by the new likelihood
nqnl = setintersection(quasarindex, restNR)   ; known quasars missed by the new likelihood
nsl = setdifference(targetNR, quasarindex)    ; new-likelihood targets that are not known quasars
nsnl = setdifference(restNR, quasarindex)     ; untargeted objects that are not known quasars

ql = setintersection(quasarindex, targetOR)   ; known quasars targeted by the old likelihood
qnl = setintersection(quasarindex, restOR)    ; known quasars missed by the old likelihood
sl = setdifference(targetOR,quasarindex)      ; old-likelihood targets that are not known quasars
snl = setdifference(restOR,quasarindex)       ; untargeted objects that are not known quasars

IDL> print, n_elements(ql) ; number quasars
461
IDL> print, n_elements(ql)*1.0/(n_elements(sl)+n_elements(ql)) ;percent accuracy
0.256111
IDL> print, n_elements(ql) + n_elements(sl) ; total targeted
1800
IDL>
IDL> print, n_elements(nql) ; number quasars
508
IDL> print, n_elements(nql)*1.0/(n_elements(nsl)+n_elements(nql)) ;percent accuracy
0.282222
IDL> print, n_elements(nql) + n_elements(nsl) ; total targeted
1800

So we go from 25.6% accuracy to 28.2% accuracy! And we find 47 more quasars!

Plots
The white points were targeted/missed by both the new and old likelihoods
The magenta points were only targeted/missed by the new likelihood
The cyan points were only targeted/missed by the old likelihood


The code for the above run is in the directory ../likelihood/run1/

Now to see how the different luminosity functions do. Below are the results from running with the Richards '06 luminosity function:
QSO Catalog with Richards Luminosity Function


(See ../logs/100427_2log.pro for code)

IDL> print, n_elements(ql) ; number quasars
461
IDL> print, n_elements(ql)*1.0/(n_elements(sl)+n_elements(ql)) ;percent accuracy
0.256111
IDL> print, n_elements(ql) + n_elements(sl) ; total targeted
1800
IDL>
IDL> print, n_elements(nql) ; number quasars
543
IDL> print, n_elements(nql)*1.0/(n_elements(nsl)+n_elements(nql)) ;percent accuracy
0.301667
IDL> print, n_elements(nql) + n_elements(nsl) ; total targeted
1800

This does better than the old luminosity function!
So we go from 25.6% accuracy to 30.2% accuracy! And we find 82 more quasars!

Plots
The white points were targeted/missed by both the new and old likelihoods
The magenta points were only targeted/missed by the new likelihood
The cyan points were only targeted/missed by the old likelihood


Sunday, April 25, 2010

Likelihood Test Results (2)

I'm going to try adding in all QSOs (regardless of brightness). I've taken out the BOSS QSOs that are in Stripe-82 so that we don't have the same objects in the Monte Carlo as in the testing data.

Joe suggested doing this. We'll see if it improves things; if not, it looks like we should go back to the old catalog and see if modifying the luminosity function helps.

Here is what they look like:
BOSS QSOs


SDSS DR5 QSOs



The code to do this is in the following log file: ../logs/100425log.pro
It creates the following file to use as input to Monte Carlo: ~/boss/allQSOMCInput.fits

Plot of allQSOMCInput.fits


I re-ran hiz_kde_numerator_jess.pro to make a new QSOCatalog, which is here:
../likelihood/qsocatalog/QSOCatalog-Mon-Apr-26-13:22:43-2010.fits

Plot of QSOCatalog-Mon-Apr-26-13:22:43-2010.fits


I changed likelihood_compute to look at the above catalog.

Run the likelihood calculation with this new input catalog:

smalltargetfile = "./smalltarget.fits"
smalltargets = mrdfits(smalltargetfile, 1)

.com splittargets2.pro

nsplit = 50L
splitsize = floor(n_elements(smalltargets)*1.0/nsplit)
junk = splitTargets2(smalltargets, nsplit, splitsize)

; run likelihood.script

.com mergeLikelihoods.pro
likelihood = mergelikelihoods(nsplit)


outfile = "./likelihoods2.0-4.0ALLBOSSAddedSmall.fits"
splog, 'Writing file ', outfile
mwrfits, likelihood, outfile, /create


IDL> print, n_elements(ql) ; number quasars
467
IDL> print, n_elements(ql)*1.0/(n_elements(sl)+n_elements(ql)) ;percent accuracy
0.259300
IDL> print, n_elements(ql) + n_elements(sl) ; total targeted
1801
IDL>
IDL> print, n_elements(nql) ; number quasars
418
IDL> print, n_elements(nql)*1.0/(n_elements(nsl)+n_elements(nql)) ;percent accuracy
0.232093
IDL> print, n_elements(nql) + n_elements(nsl) ; total targeted
1801


This doesn't do better than the old likelihood either (same color scheme as last post).



The logfile for this is ../logs/100425log.pro

Saturday, April 24, 2010

Likelihood Test Results (1)

The first test I did was to just put the BOSS + known QSOs into the QSO Catalog and see if that helped things. It didn't seem to make much of a difference:

Old Likelihood:
IDL> print, n_elements(ql) ; number quasars
467
IDL> print, n_elements(ql)*1.0/(n_elements(sl)+n_elements(ql)) ;percent accuracy
0.259300
IDL> print, n_elements(ql) + n_elements(sl) ; total targeted
1801

New Likelihood:
IDL> print, n_elements(nql) ; number quasars
447
IDL> print, n_elements(nql)*1.0/(n_elements(nsl)+n_elements(nql)) ;percent accuracy
0.248195
IDL> print, n_elements(nql) + n_elements(nsl) ; total targeted
1801


The white points were targeted/missed by both the new and old likelihoods
The magenta points were only targeted/missed by the new likelihood
The cyan points were only targeted/missed by the old likelihood



The logfile is ../logs/100424log.pro
I've saved the files for this run in the following directory:
../likelihood/likesmall2.0-4.0JiangCombo
The QSO catalog I used for this run is the following:
../likelihood/qsocatalog/QSOCatalog-Sun-Apr-25-18:09:33-2010.fits

Friday, April 23, 2010

Running Likelihood Tests

I'm going to change one thing at a time here to figure out where the new likelihood goes awry. Unfortunately I didn't keep the best notes about how I changed the QSO catalogs before, so I am finding it difficult to reproduce what I did. So I am going to be much better about documenting all the steps here.

1. First I make a catalog of all the BOSS and otherwise known QSOs which aren't already in the inputs and for which we also have coadded fluxes, found by spherematching them against the varcat files (a rough sketch of this matching appears after this list). The code to do this is in the following log file: ../logs/100423log_1.pro. This makes a file, ~/boss/coaddedBOSSQSOs.fits, which has the coadded fluxes (from the varcat files) and redshifts of these QSOs. These fluxes are not de-reddened.

Here is a plot of these quasars


2. Combine this BOSS catalog with the SDSS DR5 "bright" quasars that are included in the old version of the Monte Carlo. This is done by running Joe Hennawi's qso_photosamp.pro program (to get the SDSS DR5 QSOs), reading in the coaddedBOSSQSOs file, de-reddening them both, and then merging them into one file, which is then saved as ~/boss/brightQSOMCInput.fits. The code to do this is in the following log file: ../logs/100423log_2.pro.

SDSS DR5 Quasars


brightQSOMCInput.fits QSOs


3. Change qso_fakephoto.pro to read in the ~/boss/brightQSOMCInput.fits file instead of just running qso_photosamp.pro. Save this modified program as ../likelihood/qsocatalog/qso_fakephoto_jess.pro

4. Change hiz_kde_numerator.pro to call qso_fakephoto_jess.pro, instead of qso_fakephoto.pro and save as ../likelihood/qsocatalog/hiz_kde_numerator_jess.pro

5. Follow instructions from this post for how to change/create the luminosity function.

6. Enter the command .run hiz_kde_numerator_jess.pro in IDL to generate the QSO Catalog.

7. Change likelihood_compute.pro so that it points to this QSO Catalog that you just generated as the input QSO Catalog:
; Read in the QSO
file2 = '/home/jessica/repository/ccpzcalib/Jessica/likelihood/qsocatalog/newCatalog.fits'

QSO Catalog (including the BOSS quasars)

Code to generate plots is in the log file: ../logs/100423_4log.pro

8. Make sure that the redshift ranges in likelihood_compute.pro match those in the file hiz_qso_dndz.pro.
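
As a concrete illustration of steps 1 and 2, the matching and de-reddening could look roughly like the sketch below. This is only a sketch, not the code in the log files above, and the tag names (zem for the QSO redshift, psfflux and extinction for the varcat coadds) are assumptions about the catalog structures:

;Hypothetical sketch of steps 1-2: match the known QSOs to the coadded
;photometry and de-redden the matched fluxes (tag names are assumptions)
spherematch, qsos.ra, qsos.dec, varcat.ra, varcat.dec, 2./3600, iq, iv, d12

;de-redden: fluxes in nanomaggies, extinction in magnitudes
flux_dered = varcat[iv].psfflux * 10.^(varcat[iv].extinction/2.5)

;keep the positions, redshifts, and de-reddened fluxes of the matches
matchedra   = qsos[iq].ra
matcheddec  = qsos[iq].dec
matchedz    = qsos[iq].zem
matchedflux = flux_dered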

Thursday, April 22, 2010

Initial Likelihood Tests

Initial Tests of the "Improved" likelihood do not show improvements.

We seem to be missing a bunch of QSOs which we used to get. I think this is because the simulation changed from having 10M QSOs in the BOSS redshift range (2.0 < z < 3.9) to now having 10M QSOs in a larger redshift range (0.5 < z < 5.0), and most of them are low redshift, so we have a much lower density of QSOs in the redshift range of interest.

Below is a comparison of QSOs targeted with old/new methods.

The white points were targeted by both the new and old likelihoods
The magenta points were only targeted by the new likelihood
The cyan points were only targeted by the old likelihood (missed by new)

Below is a comparison of missed QSOs targeted with old/new methods.

The white points were missed by both the new and old likelihoods
The magenta points were only missed by the new likelihood
The cyan points were only missed by the old likelihood (targeted by new)

I'm going to re-run with the old luminosity function and redshift range, but with the extra QSOs as inputs to try to isolate where the problem is.

Wednesday, April 21, 2010

Known Quasars and Set Theory

It's been kind of a nightmare to put together a file of all known quasars to act as a truth table for my likelihood testing. Adam Myers has a list of all known quasars from SDSS DR7 and other data sets, and Nic Ross has a truth table with BOSS quasars.

So here is code to combine them without repeats:

;Read in BOSS Truth table
collatefile = '/home/jessica/boss/BOSS_Quasars_3PCv3_cat.fits.gz'
collateinfo = mrdfits(collatefile, 1)

;Trim to only be QSOs (redshift > 2 with a high human confidence)
confcut = where(collateinfo.z_conf_person GT 2 and collateinfo.Z_PERSON GE 2)
confTT = collateinfo[confcut]

;Read in Myers known quasars
knownqsofile = '/home/jessica/boss/knownquasarstar.041310.fits'
knownqsos = mrdfits(knownqsofile, 1)

;Constrain to only be QSOS in the BOSS redshift range
knownqso = knownqsos[where(strmatch(knownqsos.source, "*QSO*"))]
knownbossqsos = knownqso[where((knownqso.zem GE 2) AND (knownqso.zem LE 5.0))]

;Spherematch these two sets to find overlapping qsos (qsooverlap)
spherematch, confTT.ra, confTT.dec, knownbossqsos.ra, knownbossqsos.dec, 2./3600, i1, qsooverlap, d12

;Remove the overlapping QSOs from the Myers known QSOs
;(note: knownqsos is re-used here as an index array into knownbossqsos)
knownqsoindex = indgen(n_elements(knownbossqsos.dec))
knownqsos = setdifference(knownqsoindex, qsooverlap)

(the functions setdifference, setunion and setintersection can be found here)

;Combine the two sets to get the ra, dec, and redshift of all known QSOs
quasarsRa = [knownbossqsos[knownqsos].ra, confTT.ra]
quasarsDec = [knownbossqsos[knownqsos].dec, confTT.dec]
quasarsZ = [knownbossqsos[knownqsos].zem, confTT.z_person]
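
For reference, these index-set helpers are commonly written with IDL's histogram function. Below is a minimal sketch of setintersection and setdifference, assuming the inputs are arrays of non-negative integer indices (the linked versions may differ in detail):

function setintersection, a, b
   ;elements present in both index arrays a and b
   minab = min(a, max=maxa) > min(b, max=maxb)
   maxab = maxa < maxb
   if (maxab lt minab) or (maxab lt 0) then return, -1
   r = where((histogram(a, min=minab, max=maxab) ne 0) and $
             (histogram(b, min=minab, max=maxab) ne 0), count)
   if count eq 0 then return, -1 else return, r + minab
end

function setdifference, a, b
   ;elements of a that are not in b
   mina = min(a, max=maxa)
   minb = min(b, max=maxb)
   if (minb gt maxa) or (maxb lt mina) then return, a
   r = where((histogram(a, min=mina, max=maxa) ne 0) and $
             (histogram(b, min=mina, max=maxa) eq 0), count)
   if count eq 0 then return, -1 else return, r + mina
end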

Monday, April 19, 2010

Likelihood Improvements

On April 28th we need to submit the final version of the likelihood to the collaboration. I've been working on implementing changes and improvements to the likelihood. Here are the changes I've made thus far:

1. Changed luminosity function. I am currently using the Richards '06 luminosity function. Joe Hennawi thinks I should use a Hopkins, Richards, Hernquist (2007) combination function. I'm going to run on both and see which is better at targeting QSOs. See this post for more details.

2. Changed the redshift range of the QSO Catalog. Before we were only modeling QSOs from 2.0 < z < 3.9. Now we are modeling from 0.5 < z < 5.0. I need to optimize which range we should use for the numerator and which range we should use for the denominator. Nominally we are interested in targeting QSOs in the redshift range 2.15 < z < 3.5, so that is where I will start. Including low redshift QSOs in the denominator of the likelihood should improve the method significantly though.

3. Including BOSS QSOs along with SDSS DR5 QSOs with "good photometry." This fills out regions of the color/flux space which were clipped from SDSS such that we aren't missing QSOs which overlap with the horizontal branch stars. See this post for more details.

To speed up the computing of likelihoods, I am sub-sampling the QSO Catalog, keeping 10% of the low redshift QSOs (z < 2.1) and 50% of the high redshift QSOs (z > 2.1). This reduces the number of objects in the QSO Catalog from 10,000,000 to 1,582,012. See this post for more details.

Joe Hennawi has some ideas for how to do this too which he outlined in an email today (see 4/20/2010 email subject: Likelihood weights):
Hi Guys,
After thinking about this more, and actually trying to work out the math, I convinced myself that indeed you were right. The factored probability I was proposing is not mathematically equivalent to the numerator which you are computing with the likelihood. Indeed, factoring the probability into p1(f_u/f_i/, f_g/f_i, f_r/f_i, f_z/f_i)*p2(f_i), is actually a more constrained model. This is however what we are doing with the extreme deconvolution, and for QSOs, this model is actually I think more desirable, for the following reason. Our model of QSO colors implicitly assumes that QSO colors do not depend on magnitude, and p1 reflects that behavior, whereas p2 is basically the magnitude prior. For stars, colors vary with magnitude. For extreme deconvolution, we had to fit to re-scaled fluxes (e.g. colors) rather than total fluxes in 5-d because gaussian models in general provide poor fits to power law distributions. This is also fine, since the color distributions of the stars vary so slowly with magnitude that you essentially make no errors by treating the probability as separable.

But this doesn't help too much with the current version of likelihood, since we don't plan to totally re-write that code. So I think the best approach would be to just take the entire list of QSOs that you plan to use, and re-scale them onto a grid of i-band fluxes spanning the range you want to consider, e.g. imin=17.8 -22.4. The grid spacing determines how finely you sample the i-band flux space, and this should be significantly smaller than the dispersion of QSO colors about the mean. Not sure what to choose here, but say you chose di = 0.01. Then your list of QSOs in the numerator would be a replication of that same list of QSOs N = (22.4-17.8)/0.01 = 460 times. Now in each i-magnitude bin, you can calculate the weights to apply to each QSO such that the weighted histogram of the QSO redshifts would give you the correct redshift and distribution and number of QSOs as predicted by the luminosity function.

I'm going to see how big of a difference it makes to calculate the likelihoods with my reduced catalog versus the full catalog (in terms of efficiency of selecting QSOs) before I spend a bunch of time implementing Joe's suggestions.
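
To make Joe's grid-and-weights idea a little more concrete, here is a rough sketch of how the per-bin weights could be computed. This is only an illustration of the scheme in the email, not code that exists in the repository; lumfunc_dndz is a hypothetical helper standing in for the luminosity-function prediction, and the redshift tag name z is assumed:

;replicate the QSO list over a grid of i magnitudes and, for each i-magnitude
;slice, compute a weight per redshift bin so that the weighted redshift
;histogram matches the luminosity function (rough sketch only)
imin = 17.8 & imax = 22.4 & di = 0.01
nimag = round((imax - imin)/di)            ;460 i-band grid points

zmin = 0.5 & dz = 0.1 & nzbin = 45         ;same redshift grid as the catalog
ncat = histogram(qsocat.z, min=zmin, binsize=dz, nbins=nzbin)

weights = dblarr(nzbin, nimag)
for i = 0L, nimag-1 do begin
   imag = imin + (i + 0.5)*di
   nlf = lumfunc_dndz(imag, zmin, dz, nzbin)  ;hypothetical helper: predicted counts per z bin
   weights[*, i] = nlf / double(ncat > 1)     ;weight for each z bin in this i-mag slice
endfor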

Thursday, April 15, 2010

Reductions to QSO Catalog

At the suggestion of David and Joe I am reducing the number of objects in the QSO Catalog to speed up the likelihood calculation, and then multiplying the likelihoods by the amount we reduced them. Here is the code:
___________________
file2 = "/home/jessica/boss/R06New0.5-5.0.fits"
objs2 = mrdfits(file2, 1)

; sub-sample the low-redshift QSOs (0.5 < z < 2.1), keeping 1 in 10
minz = 0.5
dz = 0.1
nzbin = 16
reduction = 10.

.com subsample.pro
reducedlow = subsample(minz, dz, nzbin, objs2, reduction)

; sub-sample the high-redshift QSOs (2.1 < z < 5.0), keeping 1 in 2
minz = 2.1
dz = 0.1
nzbin = 29
reduction = 2.
reducedhigh = subsample(minz, dz, nzbin, objs2, reduction)

; combine into the reduced catalog
newQSOcatalog = [reducedlow,reducedhigh]

outfile = "./reducedQSOCatalog.fits"
mwrfits, newQSOcatalog, outfile, /create
___________________

This code is also in the logfile ../logs/100415log.pro. The above code uses the function subsample.pro, which is in the likelihood directory.
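
For reference, here is a minimal sketch of what a subsample-style helper could look like, assuming it draws a random 1/reduction fraction from each dz-wide redshift bin. This is an assumption about subsample.pro's behavior, not a copy of it, and the redshift tag name z is assumed:

function subsample_sketch, minz, dz, nzbin, objs, reduction
   ;randomly keep 1 in 'reduction' objects from each redshift bin
   keep = [-1L]
   for i = 0L, nzbin-1 do begin
      zlo = minz + i*dz
      inbin = where(objs.z GE zlo AND objs.z LT zlo+dz, nin)
      if nin eq 0 then continue
      nkeep = round(nin/reduction) > 1
      shuffled = inbin[sort(randomu(seed, nin))]   ;random ordering within the bin
      keep = [keep, shuffled[0:nkeep-1]]
   endfor
   return, objs[keep[1:*]]                         ;drop the -1 sentinel
end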

Monday, April 12, 2010

Parallelizing Likelihood

I spent most of yesterday and today parallelizing the likelihood to make it compute faster. This is how it works:
  1. Start with a list of potential targets that you want to compute the likelihoods on.
  2. Split up the targets into chunks of 1000 objects using the function splitTargets (in the likelihood directory); a minimal sketch of this step appears after this list.
  3. This will create separate fits files for every 1000 objects in your set, and save them to the current directory with the format: targetfile#.fits, where # will range from 0 to (number of targets / 1000).
  4. Go to the directory with the target files and run the following script: likelihood.script (also in the likelihood directory)
  5. The likelihood script loops through all files in the current directory of the format targetfile*.fits and via qsub runs the likelihood on each fits file and outputs the result into a file likefile#.fits.
  6. This took about 2.5 hours to run on 775,000 targets; the total time just depends on how many jobs are in the queue. Running the likelihood on each 1000-object file takes about 10 minutes.
  7. Then use the mergelikelihoods function (in the likelihood directory) to read all the likefile#.fits and merge them into one likelihood file.
  8. I then write this file to the file likelihoods.fits, which has a 1-1 mapping with the pre-split targetfile: targetfile.fits
The log file which runs this whole code is at the following location: ../logs/100411log.pro
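
For reference, a minimal sketch of the splitting step (step 2) is below. This is an assumption about how splitTargets behaves (the real function is in the likelihood directory), following the chunk size and targetfile#.fits naming described above:

pro split_targets_sketch, targets, chunksize
   ;write the target list out in chunks of 'chunksize' objects
   ntarg  = n_elements(targets)
   nfiles = ceil(ntarg*1.0/chunksize)
   for i = 0L, nfiles-1 do begin
      lo = i*chunksize
      hi = ((i+1)*chunksize - 1) < (ntarg - 1)
      outfile = 'targetfile' + strtrim(i, 2) + '.fits'
      mwrfits, targets[lo:hi], outfile, /create
   endfor
end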

Saturday, April 10, 2010

The No-Shampoo Experiment

Disclaimer: I know this isn't research related, but it is kinda scientific. Plus I wanted to put my findings about giving up shampoo somewhere on the internet so I could perhaps help others who are trying this out.

~~~~~~~~~~~~~
About a month ago I decided to try to give up shampoo. There are many reasons one might want to do this:
  1. Health: Shampoo has a lot of nasty chemicals in it. These chemicals are bad for your eyes, skin, and scalp. They are also not that great to be putting in your bath water, which eventually goes into the ocean and the bodies of fish and marine life.
  2. Environment: Hair/beauty products often come in non-recyclable bottles. I found I was generating extra waste using shampoo and conditioner.
  3. Money: Shampoo / Conditioner can be expensive. I was using Matrix Biolage ($20 / bottle).
  4. Convenience: I travel a lot, and with the TSA restrictions on liquids, it's just become really annoying to have to deal with bringing beauty products with me when I fly.
  5. Philosophical: In general I don't like the idea of being dependent on things. It bugs me to think that if I were plopped on a desert island or something, that my body would freak out if I didn't have certain products, chemicals, etc. (Coffee is obviously an exception ;)

After reading online about the "No Poo" movement, I decided to try it myself. I was very very skeptical about this working for me because I've always had VERY oily hair. I am one of those people that if I don't wash my hair everyday, it starts to look like it is wet because the roots get dark and weighed down by the oil. The most I've ever been able to go without washing my hair is 1.5 days without looking like I am homeless.

The goal of going "No Poo" is to eventually wash your hair with nothing except good old water and a washcloth. However, it isn't recommended to go straight from shampoo to water-only. This is something that takes time to transition to because if you are (like I was) using shampoo daily, your scalp is producing a lot of extra oil to replace the oil that you are stripping away with the shampoo. So you need to get your scalp to go back to producing normal amounts of oil before you can go water-only.

The way you transition (or detox from shampoo) is washing with baking soda (as a shampoo) and vinegar (as a conditioner). Baking soda strips away less of your hair's oils than shampoo, and vinegar acts as a detangler by pH-balancing your hair (hair is slightly acidic). My boyfriend likes to make fun of me for using baking soda and vinegar in the shower, because it sounds like an elementary school style science fair volcano. But seriously, this stuff works for cleaning hair.

Below is what I did. A few web sites recommended trying to cut down shampoo to every other day or every third day before switching to baking soda. I couldn't do this because of how oily my hair is, so I simply went from shampooing / conditioning everyday to washing with baking soda / vinegar everyday.

How to wash your hair with baking soda and vinegar:

  1. Before you get into the shower, brush your hair. This just makes it less tangled and easier to brush after you shower, and also distributes the oils a little bit before washing.
  2. Put 1 tbs of baking soda into a cup, and mix it with 1 cup of warm water. Stir until well mixed, mixture should feel silky, not pasty or gritty.
  3. In the shower, apply the baking soda / water mixture to DRY hair. Focus on the scalp area. Pour onto front and back of head. For me it was important that I applied it to dry hair. When I applied it to wet hair, I didn't get the same cleaning results from the baking soda and my hair got really oily.
  4. With clean hands, massage your scalp. This obviously won't feel the same as foamy shampoo, but try to rub your scalp anyway; this helps clean out any grime/sweat/dirt and also distributes the oils down the shaft of your hair. Some people online say that they prefer to make a baking soda paste, and massage that into their scalp. That made my scalp / hair too dry. But it is also something to try.
  5. Let the mixture sit in your hair for a couple minutes (optional). I let it sit in longer if my hair is feeling oily that day, and shorter if it isn't.
  6. Rinse out the baking soda with water.
  7. Mix 1 tbs of vinegar with 1 cup of water. I use organic apple cider vinegar because people on the internet say that this smells better than white vinegar. You can also use lemon juice instead of vinegar, or add lemon juice / rind to the vinegar to make it smell better. What I do is put some vinegar in a smaller bottle (4 oz) and add 3-4 drops of perfume oil to it. I keep this mixture in the shower and simply shake up this bottle to mix the perfume oil and vinegar before pouring out 1 tbs for conditioning.
  8. Pour the vinegar / water mixture on the ENDS of your hair. Avoid your face / eyes as the vinegar will irritate them. I don't put any on my scalp, because my hair is really oily, and the vinegar makes this worse. However, if you have problems with dry hair / scalp you can also put this on your roots. I simply dip the ends of my hair into the cup with vinegar and water.
  9. Rinse out the vinegar (optional). The reaction of most people when I tell them I am putting vinegar in my hair is a look of repulsion. They say: "Doesn't that make your hair smell bad?" I didn't believe this before I tried it either, but the truth is that once your hair dries the vinegar smell completely goes away. The only smell left in my hair is the smell of the perfume oil. This is even more true if you rinse out the vinegar, but you actually don't have to rinse and it seriously doesn't smell like vinegar after your hair is dry. Weird I know, but true. Vinegar has anti-bacterial properties and so putting it in your hair kills yucky things. This actually helps make your hair smell fresh and clean.
  10. Towel dry your hair well. Focus on the roots. This also helps with distributing your scalp's oils to the rest of your hair.

I'm not going to lie, for the first couple weeks my hair was pretty weird. Some days it would be very oily and heavy/flat looking, other days it would be frizzy or coarse and brittle. However, I found that because I was mixing my own "shampoo" and "conditioner" it was easy to make small adjustments to try and get my hair to look the way I wanted it. If the ends were too dry, I put a little bit of oil (jojoba) in the vinegar mixture. If my hair got too oily, I would add more baking soda, or let it sit in my hair longer. I liked that I could adjust my regimen based on what was actually going on with my hair on a given day. After I figured out what worked, I ended up having fewer "bad hair days" than when I used regular shampoo, because it was easier to fine-tune.

After three weeks of no shampoo I decided to try to reduce the amount I washed my hair. I've been slowly reducing the amount of baking soda I am using, and also skipping days where I don't wash my hair at all, or only wash with water and a wash cloth. The way this works is basically you try to distribute the oils by rubbing a wash cloth from your scalp down to the ends of your hair. I've read you can also do this with a fine-tooth comb or a soft brush. I really like taking showers, and so it is easier for me to do it in the shower. The idea is that eventually I will only have to use baking soda / vinegar once every few weeks... or maybe not at all?

Now that my hair has adjusted to not using shampoo, I am really surprised at the results:

  1. My hair is a lot shinier. I think this is because there are more oils in it.
  2. My hair doesn't get as tangled (I have really long hair that used to get knots in it). This is probably because I have been brushing it more often, or maybe because (supposedly) the vinegar helps smooth out the cuticles in your hair, making it not catch on itself as much.
  3. My hair is less oily. I can now go up to 3 days without using baking soda. I usually rinse my hair every day or two with water. I'm most surprised about this because like I said, before I couldn't go more than 1.5 days without shampooing before my hair looked like crap.
  4. My hair smells really good. I thought that giving up shampoo would mean my hair would smell bad, or sweaty or weird. Because I put perfume oil in the vinegar, my hair smells like whatever perfume I want it to smell like. When I don't use perfume oil, my hair just smelled like me. It's hard to explain what that smells like, but it isn't perfume, or soapy, just like clean skin.
  5. My hair has more volume. My hair seems fuller and bouncier. This is also weird because I would think more oils would weigh my hair down more. But perhaps I was over-conditioning before and that was making my hair flatter?
  6. My hair has more highlights. I think this is because I had "chemical build-up" in my hair from using so many hair products. Now that I am using less stuff in my hair, it looks brighter and a little lighter in color.
  7. My hair is easier to style. It just holds a style better. I think the oils act kinda like a natural hairspray or something, but when I comb it a certain direction, it just stays that way. Before my hair had a lot of wave to it that made it go all different directions, it isn't doing that as much now.
  8. I'm spending a lot less money: It's been over a month doing this and I've gone through 1/3 of a box of baking soda ($1 / box) and 1/2 a bottle of apple cider vinegar ($2 / bottle). Much cheaper than my old regimen. I could also buy this stuff in bulk if I wanted and get it for even less. And hopefully I'll continue to reduce the amount of these products that I use.
  9. Traveling is easier: The only liquid I need to bring is a small travel-sized container that I filled with vinegar. Alternatively, vinegar is so cheap I could just buy it when I get to my location, or even ask to borrow some from the kitchen of the hotel I am staying at.

On the down side.

  1. I have to deal with A LOT of teasing from Adam, and other people who hear about this. There have definitely been jokes about smelling like salad dressing and making model volcanoes in the shower.
  2. When I go to the gym and sweat, or otherwise get my hair wet, it starts to smell a little bit like vinegar again. I stopped noticing this once I started rinsing out the vinegar, and adding a little bit of perfume oil.
  3. I have noticed that my ends are a little dry. This is easily combated with a little bit of jojoba oil, and hopefully will become less of a problem as I taper off the baking soda.

So anyway, that is a summary of my experience giving up shampoo. If you decide to try this too (or have tried it in the past), please let me know your experiences. Below are some pictures of my hair transition over the past 4 weeks.

Day 2
Day 3
Day 4
Day 5
Day 6
Day 7
Day 8
Day 9
Day 10
Week 2
Week 3

Week 4

Thursday, April 8, 2010

Changing 3D Correlation Function

One of my "things to think about" from yesterday's blog post was to make changes to the 3D correlation function so that it was truly an auto-correlation (only having one set of randoms which should speed up the calculation by a bit).

I made changes to my 3D correlation function to do this (at least I think I have) and I am running a test to see if I get the same solution as when I have two separate random catalogs (what I was doing) and also to see how much this speeds things up.

This set of runs is in the working directory ../pythonjess/run201048_1527, and the log file with the code to run it is ../logs/100408log.py.

The new version runs in 7 seconds, whereas before it took 20 seconds! So that is a good improvement.

Here is a plot to show they look the same


Buzz Comments from Erin about qsub and splitting up the jobs:

Erin Sheldon - Have you used the PBS system on riemann? You can submit jobs and they get sent to different machines on the system. You could split your random runs into 25 different jobs, run them in parallel on 25 different machines, and then just add up the counts later.

That's a factor of n_machines speedup, better than you could ever hope to get by just writing faster code.

Below is a sample script, let's call it test.pbs. You can submit with "qsub test.pbs". Of course change all the log file paths to make sense for you.
Erin

#PBS -l nodes=1:ppn=1
#PBS -l walltime=24:00:00
#PBS -q batch
#PBS -N this_job_name
#PBS -j oe
#PBS -o /path/to/where/the/pbs/script/is/test.pbs.pbslog
#PBS -m a
#PBS -V
#PBS -r n
#PBS -W umask=0022

# put setups here

setup numpy -r ~esheldon/exports/numpy-trunk
setup scipy -r ~esheldon/exports/scipy-trunk
setup esutil -r ~esheldon/exports/esutil-trunk
setup python -r ~esheldon/exports/python/2.6.4

# this log is different than above, you can
# watch it in real time
logfile=/path/to/where/the/pbs/script/is/test.pbs.log

# send commands to python
python &> "$logfile" <

Erin Sheldon - oh, and to see the status of your jobs
qstat -n -1 -u your_user_name

or equivalently:
~esheldon/python/bin/pbswatch

Wednesday, April 7, 2010

Speeding-up the Correlation Function

I ran another comparison between Sheldon's and my correlation function on a larger data set. Erin had suggested that I play with the depth (this is something to do with the grid spacing of his function). Below is an email from him:

So the default depth of the tree is depth=10. It seems that a lower depth, and thus lower resolution, is better for these very large 10-degree search angles. Here are some timing numbers that show depth=6 is best. You would do this to set your depth:

h=esutil.htm.HTM(depth)

depth   seconds
3       137.153673172
4       114.97971487
5       104.600896835
6        99.5818519592
7       102.73042202
8       119.931453943
9       185.077499866
10      460.641659021
11     1502.35429096

Erin also thinks I am correlating out to too large of an angle. Alexia did a quick calculation and agrees with this assessment. This is another reason why my correlation functions are taking so long. Erin suggests that I try working in physical space (Mpc) instead of angular space:

Another thing to try is working in physical space. 10 degrees is definitely too large in angle. If you are, for example, willing to work at 30Mpc you could get big speedups. You know the redshifts of the spectroscopic sample of course. So you just bin in physical projected separation as defined by:
d*angle

where d is the angular diameter distance in Mpc and angle is in radians. The bincount function is set up to do this. If you send the scale= keyword then the rmin,rmax arguments will be in units of the scale*angle.


Let's say you have the spectroscopic sample: ra1, dec1, z1. And the second sample is the photometric sample: ra2, dec2.

# you can also give omega_m etc. here, see docs
cosmo = esutil.cosmology.Cosmo()

# get angular diameter distance
# to points in set 1
d = cosmo.Da(0.0, z1)

rmin = 0.025 # Note units in Mpc
rmax = 30 # Mpc
nbin = 25

rlower,rupper,counts = h.bincount(rmin,rmax,nbin,ra1,dec1,ra2,dec2,scale=d)

Now rlower,rupper are also in Mpc.

I did a bigger run on 356,915 objects out to 10 degrees, here are the numbers:

For Erin's
Start time = 12:59:24.71
End time = 13:32:24.96
Run time = 1980.25 seconds ≈ 33 minutes

For Mine
Start time = 12:57:33.83
End time = 13:40:35.13
Run time = 2581.30 seconds ≈ 43 minutes

A few things to think about:
  • Should I change the correlation angle as a function of the spectroscopic redshift? Can the reconstruction handle this?
  • I am currently running the DD, DR, RD, RR correlations all in succession, but they could be run simultaneously and this could in theory speed up the calculation.
  • What are the correct angles and comoving distances I should be correlating out to? Obviously going out to 20 degrees is too much. Alexia's quick calculations estimates that I only need to go out to 6 degrees, but this varies with redshift.
  • Sheldon's code does run faster and has more functionality. Should I use it from now on?
  • The 3D correlation function is always an auto-correlation function, and so there is no need to calculate both DR and RD; I could change this code to run faster by calculating the correlation as DD - 2DR + RR (written out below). Of course RR is going to have 25X more pairs than DD and 5X more than DR (I am over-sampling by 5), so most of the time of the calculation is spent there.
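
For reference, the DD - 2DR + RR estimator in the last bullet is the standard Landy-Szalay form; with the pair counts normalized by their total numbers of pairs it reads

\hat{\xi}(r) = [ \hat{DD}(r) - 2\,\hat{DR}(r) + \hat{RR}(r) ] / \hat{RR}(r),

where \hat{DD} = DD / [N_D(N_D-1)/2], \hat{DR} = DR / (N_D N_R), and \hat{RR} = RR / [N_R(N_R-1)/2], with N_D data points and N_R random points.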
Plot


Happy Birthday Toni and Cleo

Tuesday, April 6, 2010

Comparing Correlation Function

I ran a simple experiment to compare Erin Sheldon's correlation function and mine/Alexia's. I modified my correlation function so that it was only calculating the data-data pair counts. I did the same with Erin's. I did this on a set of 53,071 objects. I did an auto-correlation for simplicity (both data sets were the same).

For Erin's
Start time = 16:01:53:98
End time = 16:09:11:49
Run time = 437.51 seconds

For Mine
Start time = 16:24:27:99
End time = 16:26:12:03
Run time = 104.04 seconds

The numbers of pair counts match pretty well, except at the ends, which I believe is just a binning effect:



I'm kind of confused why my correlation function is 4X faster on this data set, when ball-park numbers Erin and I were comparing before implied that his should run much faster. I am going to see if maybe they scale differently with number of points by running on a larger data set.

Monday, April 5, 2010

Sheldon's Correlation Function

Erin Sheldon is helping me with my 2D correlation function. He has one that runs a lot faster than the one I have been using. He has been amazing in helping me to get it working with my data set.

After some incompatibility problems with the various packages we needed to run our respective code, it seems to be working now (well, it compiles and passes the checks). However, when I try to run it I get the following error:

RuntimeError: found log_binsize < 0

I've pinged Erin about this, so we'll see if he has ideas. The code I'm using to test this is in the following log file: .../logs/100405log.py

7:15 Update: Erin rocks! He just got back to me with a fix. It is running now! I'm going to do some tests to see how long it takes and compare its answer with my correlation function's answer.

Outputs:
Generating HTM ids
Generating reverse indices
rmin: 0.001
rmax: 10
degrees?: True
nbin: 25
logrmin: -3
logrmax: 1
log binsize: 0.16
len(scale_array) = 0

Each dot is 500 points
......................................................................
35000/53071 pair count: 176696860
....................................

In [15]: rlower
Out[15]:
array([ 1.00000000e-03, 1.44543977e-03, 2.08929613e-03,
3.01995172e-03, 4.36515832e-03, 6.30957344e-03,
9.12010839e-03, 1.31825674e-02, 1.90546072e-02,
2.75422870e-02, 3.98107171e-02, 5.75439937e-02,
8.31763771e-02, 1.20226443e-01, 1.73780083e-01,
2.51188643e-01, 3.63078055e-01, 5.24807460e-01,
7.58577575e-01, 1.09647820e+00, 1.58489319e+00,
2.29086765e+00, 3.31131121e+00, 4.78630092e+00,
6.91830971e+00])

In [16]: rupper
Out[16]:
array([ 1.44543977e-03, 2.08929613e-03, 3.01995172e-03,
4.36515832e-03, 6.30957344e-03, 9.12010839e-03,
1.31825674e-02, 1.90546072e-02, 2.75422870e-02,
3.98107171e-02, 5.75439937e-02, 8.31763771e-02,
1.20226443e-01, 1.73780083e-01, 2.51188643e-01,
3.63078055e-01, 5.24807460e-01, 7.58577575e-01,
1.09647820e+00, 1.58489319e+00, 2.29086765e+00,
3.31131121e+00, 4.78630092e+00, 6.91830971e+00,
1.00000000e+01])

In [17]: counts
Out[17]:
array([ 468, 512, 758, 1216, 2176, 4266,
6962, 13332, 25388, 46232, 83912, 146620,
246088, 377354, 523490, 674092, 972164, 1683416,
3167032, 6144742, 12083396, 22744048, 41867790, 71782726,
105281816])

Sunday, April 4, 2010

Eclipse Re-Installation

Riemann has been offline for the last 24 hours and so I haven't been able to do any work. Thus I finished my taxes, cleaned my room, did laundry, and fixed a bunch of problems with my computer which I had been putting off. One of these problems was an error I was getting in Eclipse (my Python editor was closing unexpectedly and it wouldn't update the software). I completely re-installed Eclipse, and because this is a little involved and I don't do it very often, I thought I'd outline it here so that if I ever need to do it again it will be easier.
  1. Install Eclipse for C/C++ (download here)
  2. In Eclipse go to Help → Install New Software...
  3. Install the Python Extension: http://pydev.org/updates
  4. Install the IDL Extension: http://eclipsecorba.sourceforge.net/update
  5. Install the Subversion Extension: http://subclipse.tigris.org/update_1.6.x
I also had to update my Mac's version of subversion by downloading it here: http://www.open.collab.net/downloads/subversion/

It required me to register, which I did with my favorite user name and password.

Eclipse seems to be working great now. No more crashing, and it opens and closes much faster. It automatically updates (before it had errors every time I tried to update). And now riemann is up again, so I can do some actual research.