Tuesday, April 27, 2010
Likelihood Test Results (4)
eps = 1e-30
bossqsolike = total(likelihood.L_QSO_Z[16:32],1) ;quasars between redshift 2.1 and 3.7
qsolcut1 = where( alog10(total(likelihood.L_QSO_Z[16:32],1)) LT -9.0)
den = total(targets.L_EVERYTHING_ARRAY[0:4],1) + total(likelihood.L_QSO_Z[0:44],1) + eps
num = bossqsolike + eps
NEWRATIO = num/den
NEWRATIO[qsolcut1] = 0 ; eliminate objects with low L_QSO value
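For reference, here is a rough numpy transcription of the ratio above. This is only a sketch (the real code is the IDL shown here); it assumes the likelihood arrays have been read into Python with shape (n_zbins, n_obj) and (n_types, n_obj), and note that IDL's [16:32] is inclusive, so it becomes 16:33 in Python.

import numpy as np

def new_ratio(l_qso_z, l_everything, zlo=16, zhi=32, logcut=-9.0, eps=1e-30):
    # numerator: likelihood of being a QSO in the target redshift slice
    boss_qso_like = l_qso_z[zlo:zhi + 1].sum(axis=0)
    # denominator: everything-else likelihoods plus the QSO likelihood over all redshifts
    den = l_everything[0:5].sum(axis=0) + l_qso_z.sum(axis=0) + eps
    ratio = (boss_qso_like + eps) / den
    # eliminate objects with negligible QSO likelihood
    ratio[np.log10(boss_qso_like + eps) < logcut] = 0.0
    return ratio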
IDL> print, n_elements(ql) ; number quasars
461
IDL> print, n_elements(ql)*1.0/(n_elements(sl)+n_elements(ql)) ;percent accuracy
0.256111
IDL> print, n_elements(ql) + n_elements(sl) ; total targeted
1800
IDL>
IDL> print, n_elements(nql) ; number quasars
575
IDL> print, n_elements(nql)*1.0/(n_elements(nsl)+n_elements(nql)) ;percent accuracy
0.319444
IDL> print, n_elements(nql) + n_elements(nsl) ; total targeted
1800
We now go from 25.6% accuracy to 31.9% accuracy! And we find 114 more quasars!
Monday, April 26, 2010
Likelihood Test Results (3)
It seems that adding in the BOSS QSOs doesn't help.
Now I am going to run on the old qso catalog but with the larger redshift range (0.5 < z < 5.0) and see if this improves things. See ../logs/100426log.pro for code.
So this seems to actually make a difference. I have played around with the redshift range for the numerator to get the best selection numbers:
eps = 1e-30
bossqsolike = total(likelihood.L_QSO_Z[14:34],1) ;quasars between redshift 2.1 and 3.5
qsolcut1 = where( alog10(total(likelihood.L_QSO_Z[19:34],1)) LT -9.0)
den = total(targets.L_EVERYTHING_ARRAY[0:4],1) + total(likelihood.L_QSO_Z[0:44],1) + eps
num = bossqsolike + eps
NEWRATIO = num/den
NEWRATIO[qsolcut1] = 0 ; eliminate objects with low L_QSO value
qsolcut2 = where( alog10(total(targets.L_QSO_Z[0:18],1)) LT -9.0)
L_QSO = total(targets.L_QSO_Z[2:18],1)
den = total(targets.L_EVERYTHING_ARRAY[0:4],1) + total(targets.L_QSO_Z[0:18],1) + eps
num = L_QSO + eps
OLDRATIO = num/den
OLDRATIO[qsolcut2] = 0 ; eliminate objects with low L_QSO value
numsel = 1800-1
NRend = n_elements(newratio)-1
sortNR = reverse(sort(newratio))
targetNR = sortNR[0:numsel]
restNR = sortNR[numsel+1:NRend]
ORend = n_elements(oldratio)-1
sortOR = reverse(sort(oldratio))
targetOR = sortOR[0:numsel]
restOR = sortOR[numsel+1:ORend]
nql = setintersection(quasarindex,targetNR)
nqnl = setintersection(quasarindex, restNR)
nsl = setdifference(targetNR, quasarindex)
nsnl = setdifference(restNR, quasarindex)
ql = setintersection(quasarindex, targetOR)
qnl = setintersection(quasarindex, restOR)
sl = setdifference(targetOR,quasarindex)
snl = setdifference(restOR,quasarindex)
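The same bookkeeping in numpy, as a sketch (variable names are hypothetical; the IDL above uses setintersection/setdifference): sort by ratio, target the top 1800, and count how many known quasars land in the targeted set.

import numpy as np

def evaluate_selection(ratio, quasar_index, n_targets=1800):
    order = np.argsort(ratio)[::-1]                  # highest ratio first
    targeted, rest = order[:n_targets], order[n_targets:]
    hits = np.intersect1d(quasar_index, targeted)    # quasars we target
    missed = np.intersect1d(quasar_index, rest)      # quasars we miss
    stars = np.setdiff1d(targeted, quasar_index)     # non-quasars we target
    return hits, missed, stars, hits.size / float(n_targets)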
IDL> print, n_elements(ql) ; number quasars
461
IDL> print, n_elements(ql)*1.0/(n_elements(sl)+n_elements(ql)) ;percent accuracy
0.256111
IDL> print, n_elements(ql) + n_elements(sl) ; total targeted
1800
IDL>
IDL> print, n_elements(nql) ; number quasars
508
IDL> print, n_elements(nql)*1.0/(n_elements(nsl)+n_elements(nql)) ;percent accuracy
0.282222
IDL> print, n_elements(nql) + n_elements(nsl) ; total targeted
1800
So we go from 25.6% accuracy to 28.2% accuracy! And we find 47 more quasars!
The white points were targeted/missed by both the new and old likelihoods
The magenta points were only targeted/missed by the new likelihood
The cyan points were only targeted/missed by the old likelihood
Now to see how the different luminosity functions do. Below are the results of running with the Richards '06 luminosity function:
(See ../logs/100427_2log.pro for code)
IDL> print, n_elements(ql) ; number quasars
461
IDL> print, n_elements(ql)*1.0/(n_elements(sl)+n_elements(ql)) ;percent accuracy
0.256111
IDL> print, n_elements(ql) + n_elements(sl) ; total targeted
1800
IDL>
IDL> print, n_elements(nql) ; number quasars
543
IDL> print, n_elements(nql)*1.0/(n_elements(nsl)+n_elements(nql)) ;percent accuracy
0.301667
IDL> print, n_elements(nql) + n_elements(nsl) ; total targeted
1800
This does better than the old luminosity function!
So we go from 25.6% accuracy to 30.2% accuracy! And we find 82 more quasars!
Sunday, April 25, 2010
Likelihood Test Results (2)
Joe suggested doing this. We'll see if it improves things; if not, it looks like we should go back to the old catalog and see whether modifying the luminosity function helps.
Here is what they look like:
The code to do this is in the following log file: ../logs/100425log.pro
It creates the following file to use as input to the Monte Carlo: ~/boss/allQSOMCInput.fits
Running the Monte Carlo with this input produces the QSO catalog ../likelihood/qsocatalog/QSOCatalog-Mon-Apr-26-13:22:43-2010.fits, and I changed likelihood_compute to look at this catalog.
Run the likelihood calculation with this new input catalog:
smalltargetfile = "./smalltarget.fits"
smalltargets = mrdfits(smalltargetfile, 1)
.com splittargets2.pro
nsplit = 50L
splitsize = floor(n_elements(smalltargets)*1.0/nsplit)
junk = splitTargets2(smalltargets, nsplit, splitsize)
; run likelihood.script
.com mergeLikelihoods.pro
likelihood = mergelikelihoods(nsplit)
outfile = "./likelihoods2.0-4.0ALLBOSSAddedSmall.fits"
splog, 'Writing file ', outfile
mwrfits, likelihood, outfile, /create
IDL> print, n_elements(ql) ; number quasars
467
IDL> print, n_elements(ql)*1.0/(n_elements(sl)+n_elements(ql)) ;percent accuracy
0.259300
IDL> print, n_elements(ql) + n_elements(sl) ; total targeted
1801
IDL>
IDL> print, n_elements(nql) ; number quasars
418
IDL> print, n_elements(nql)*1.0/(n_elements(nsl)+n_elements(nql)) ;percent accuracy
0.232093
IDL> print, n_elements(nql) + n_elements(nsl) ; total targeted
1801
This doesn't do better than the old likelihood either (same color scheme as last post).
The logfile for this is ../logs/100425log.pro
Saturday, April 24, 2010
Likelihood Test Results (1)
Old Likelihood:
IDL> print, n_elements(ql) ; number quasars
467
IDL> print, n_elements(ql)*1.0/(n_elements(sl)+n_elements(ql)) ;percent accuracy
0.259300
IDL> print, n_elements(ql) + n_elements(sl) ; total targeted
1801
New Likelihood:
IDL> print, n_elements(nql) ; number quasars
447
IDL> print, n_elements(nql)*1.0/(n_elements(nsl)+n_elements(nql)) ;percent accuracy
0.248195
IDL> print, n_elements(nql) + n_elements(nsl) ; total targeted
1801
The magenta points were only targeted/missed by the new likelihood
The cyan points were only targeted/missed by the old likelihood
Friday, April 23, 2010
Running Likelihood Tests
1. First I make a catalog of all the BOSS and otherwise known QSOs which aren't already in the inputs and for which we also have coadded fluxes (obtained by spherematching them against the varcat coadds). The code to do this is in the following log file: ../logs/100423log_1.pro. This makes a file, ~/boss/coaddedBOSSQSOs.fits, which has the coadded fluxes (from the varcat files) and redshifts of these QSOs. These fluxes are not de-reddened.
2. Combine this BOSS catalog with the SDSS DR5 "bright" quasars that are included in the old version of the Monte Carlo. This is done by running Joe Hennawi's qso_photosamp.pro program (to get the SDSS DR5 QSOs), reading in coaddedBOSSQSOs file, de-reddening them both, and then merging them into one file, which is then saved as ~/boss/brightQSOMCInput.fits. The code to do this is in the following log file ../logs/100423log_2.pro.
3. Change qso_fakephoto.pro to read in the ~/boss/brightQSOMCInput.fits file instead of just running qso_photosamp.pro. Save this modified program as ../likelihood/qsocatalog/qso_fakephoto_jess.pro
4. Change hiz_kde_numerator.pro to call qso_fakephoto_jess.pro, instead of qso_fakephoto.pro and save as ../likelihood/qsocatalog/hiz_kde_numerator_jess.pro
5. Follow instructions from this post for how to change/create the luminosity function.
6. Enter the command .run hiz_kde_numerator_jess.pro into IDL to generate the QSO Catalog.
7. Change likelihood_compute.pro so that it points to this QSO Catalog that you just generated as the input QSO Catalog:
; Read in the QSO
filefile2 = '/home/jessica/repository/ccpzcalib/Jessica/likelihood/qsocatalog/newCatalog.fits'
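As an illustration of steps 1-2 above (not the actual pipeline, which is the IDL described in those steps), here is a minimal Python/astropy sketch of the de-redden-and-merge step. The column names FLUX, EXTINCTION, RA, DEC, Z and the input file names are hypothetical; the de-reddening just applies f_dered = f_obs * 10^(0.4*A), with A the Galactic extinction in magnitudes.

import numpy as np
from astropy.table import Table, vstack

def deredden(cat):
    # correct observed fluxes for Galactic extinction (A in magnitudes per band)
    cat = cat.copy()
    cat['FLUX'] = cat['FLUX'] * 10.0 ** (0.4 * cat['EXTINCTION'])
    return cat

sdss = deredden(Table.read('sdss_dr5_bright_qsos.fits'))   # hypothetical file names
boss = deredden(Table.read('coaddedBOSSQSOs.fits'))
merged = vstack([sdss[['RA', 'DEC', 'Z', 'FLUX']],
                 boss[['RA', 'DEC', 'Z', 'FLUX']]])
merged.write('brightQSOMCInput.fits', overwrite=True)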
Thursday, April 22, 2010
Initial Likelihood Tests
The white points were targeted by both the new and old likelihoods
The magenta points were only targeted by the new likelihood
The cyan points were only targeted by the old likelihood (missed by new)
Below is a comparison of the QSOs missed by the old/new methods.
The white points were missed by both the new and old likelihoods
The magenta points were only missed by the new likelihood
The cyan points were only missed by the old likelihood (targeted by new)
Wednesday, April 21, 2010
Known Quasars and Set Theory
So here is code to combine the two quasar lists (the BOSS truth table and Myers' known quasars) without repeats:
;Read in BOSS Truth table
collatefile = '/home/jessica/boss/BOSS_Quasars_3PCv3_cat.fits.gz'
collateinfo = mrdfits(collatefile, 1)
;Trim to only be QSOs (redshift > 2 with a high human confidence)
confcut = where(collateinfo.z_conf_person GT 2 and collateinfo.Z_PERSON GE 2)
confTT = collateinfo[confcut]
;Read in Myers known quasars
knownqsofile = '/home/jessica/boss/knownquasarstar.041310.fits'
knownqsos = mrdfits(knownqsofile, 1)
;Constrain to only be QSOS in the BOSS redshift range
knownqso = knownqsos[where(strmatch(knownqsos.source, "*QSO*"))]
knownbossqsos = knownqso[where((knownqso.zem GE 2) AND (knownqso.zem LE 5.0))]
;Spherematch these two sets to find overlapping qsos (qsooverlap)
spherematch, confTT.ra, confTT.dec, knownbossqsos.ra, knownbossqsos.dec, 2./3600, i1, qsooverlap, d12
;Subtract the overlapping QSOs from the Myers known QSOs
knownqsoindex = indgen(n_elements(knownbossqsos.dec))
knownqsos = setdifference(knownqsoindex, qsooverlap)
(the functions setdifference, setunion and setintersection can be found here)
;Combine the two sets to get the ra, dec, and redshift of all known QSOs
quasarsRa = [knownbossqsos[knownqsos].ra, confTT.ra]
quasarsDec = [knownbossqsos[knownqsos].dec, confTT.dec]
quasarsZ = [knownbossqsos[knownqsos].zem, confTT.z_person]
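For comparison, here is a rough Python/astropy version of the same match-and-combine logic. This is only a sketch: the 2 arcsec match radius mirrors the spherematch call above, and the input arrays are assumed to already hold the trimmed BOSS truth table and Myers quasars.

import numpy as np
import astropy.units as u
from astropy.coordinates import SkyCoord

def combine_without_repeats(tt_ra, tt_dec, tt_z, myers_ra, myers_dec, myers_z,
                            radius_arcsec=2.0):
    tt = SkyCoord(ra=tt_ra * u.deg, dec=tt_dec * u.deg)
    myers = SkyCoord(ra=myers_ra * u.deg, dec=myers_dec * u.deg)
    idx, d2d, _ = myers.match_to_catalog_sky(tt)    # nearest truth-table object
    unmatched = d2d > radius_arcsec * u.arcsec      # Myers QSOs not in the truth table
    ra = np.concatenate([myers_ra[unmatched], tt_ra])
    dec = np.concatenate([myers_dec[unmatched], tt_dec])
    z = np.concatenate([myers_z[unmatched], tt_z])
    return ra, dec, z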
Monday, April 19, 2010
Likelihood Improvements
1. Changed luminosity function. I am currently using the Richards '06 luminosity function. Joe Hennawi thinks I should use a Hopkins, Richards, Hernquist (2007) combination function. I'm going to run on both and see which is better at targeting QSOs. See this post for more details.
2. Changed the redshift range of the QSO Catalog. Before we were only modeling QSOs from 2.0 < z < 3.9. Now we are modeling from 0.5 < z < 5.0. I need to optimize which range we should use for the numerator and which range we should use for the denominator. Nominally we are interested in targeting QSOs in the redshift range 2.15 < z < 3.5, so that is where I will start. Including low redshift QSOs in the denominator of the likelihood should improve the method significantly though.
3. Including BOSS QSOs along with SDSS DR5 QSOs with "good photometry." This fills out regions of the color/flux space which were clipped from SDSS such that we aren't missing QSOs which overlap with the horizontal branch stars. See this post for more details.
To speed up the computing of likelihoods, I am sub-sampling the low-redshift QSOs (z < 2.1) by 10% and the high-redshift QSOs (z > 2.1) by 50%. This reduces the number of objects in the QSO Catalog from 10,000,000 to 1,582,012. See this post for more details.
Joe Hennawi has some ideas for how to do this too which he outlined in an email today (see 4/20/2010 email subject: Likelihood weights):
Hi Guys,
After thinking about this more, and actually trying to work out the math, I convinced myself that indeed you were right. The factored probability I was proposing is not mathematically equivalent to the numerator which you are computing with the likelihood. Indeed, factoring the probability into p1(f_u/f_i, f_g/f_i, f_r/f_i, f_z/f_i)*p2(f_i), is actually a more constrained model. This is however what we are doing with the extreme deconvolution, and for QSOs, this model is actually I think more desirable, for the following reason. Our model of QSO colors implicitly assumes that QSO colors do not depend on magnitude, and p1 reflects that behavior, whereas p2 is basically the magnitude prior. For stars, colors vary with magnitude. For extreme deconvolution, we had to fit to re-scaled fluxes (e.g. colors) rather than total fluxes in 5-d because gaussian models in general provide poor fits to power law distributions. This is also fine, since the color distributions of the stars vary so slowly with magnitude that you essentially make no errors by treating the probability as separable.
But this doesn't help too much with the current version of likelihood, since we don't plan to totally re-write that code. So I think the best approach would be to just take the entire list of QSOs that you plan to use, and re-scale them onto a grid of i-band fluxes spanning the range you want to consider, e.g. imin=17.8 - 22.4. The grid spacing determines how finely you sample the i-band flux space, and this should be significantly smaller than the dispersion of QSO colors about the mean. Not sure what to choose here, but say you chose di = 0.01. Then your list of QSOs in the numerator would be a replication of that same list of QSOs N = (22.4-17.8)/0.01 = 460 times. Now in each i-magnitude bin, you can calculate the weights to apply to each QSO such that the weighted histogram of the QSO redshifts would give you the correct redshift distribution and number of QSOs as predicted by the luminosity function.
I'm going to see how big of a difference it makes to calculate the likelihoods with my reduced catalog versus the full catalog (in terms of efficiency of selecting QSOs) before I spend a bunch of time implementing Joe's suggestions.
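For later reference, here is a sketch of the re-weighting scheme Joe describes. Everything here is an assumption for illustration: lf_counts is a hypothetical function returning the luminosity-function-predicted number of QSOs per redshift bin at a given i magnitude, and template_z holds the redshifts of the catalog QSOs.

import numpy as np

def lf_weights(template_z, lf_counts, imin=17.8, imax=22.4, di=0.01,
               z_edges=np.arange(0.5, 5.01, 0.1)):
    i_grid = np.arange(imin, imax, di)                 # replicate the QSO list at each i
    hist, _ = np.histogram(template_z, bins=z_edges)   # raw template z-histogram
    zbin = np.clip(np.digitize(template_z, z_edges) - 1, 0, hist.size - 1)
    weights = np.zeros((i_grid.size, template_z.size))
    for k, imag in enumerate(i_grid):
        target = lf_counts(z_edges, imag)              # predicted N(z) at this magnitude
        # weight each template QSO so the weighted z-histogram matches the prediction
        per_bin = np.divide(target, hist, out=np.zeros(hist.size), where=hist > 0)
        weights[k] = per_bin[zbin]
    return i_grid, weights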
Thursday, April 15, 2010
Reductions to QSO Catalog
___________________
file2 = "/home/jessica/boss/R06New0.5-5.0.fits"
objs2 = mrdfits(file2, 1)
minz = 0.5
dz = 0.1
nzbin = 16        ; bins span 0.5 < z < 2.1
reduction = 10.   ; keep 1 in 10 of the low-redshift QSOs
.com subsample.pro
reducedlow = subsample(minz, dz, nzbin, objs2, reduction)
minz = 2.1
dz = 0.1
nzbin = 29        ; bins span 2.1 < z < 5.0
reduction = 2.    ; keep 1 in 2 of the high-redshift QSOs
reducedhigh = subsample(minz, dz, nzbin, objs2, reduction)
newQSOcatalog = [reducedlow,reducedhigh]
outfile = "./reducedQSOCatalog.fits"
mwrfits, newQSOcatalog, outfile, /create
___________________
This code is also in the logfile ../logs/100415log.pro. The above code uses the function subsample.pro, which is in the likelihood directory.
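I don't show subsample.pro here, but a guess at its logic, as a Python sketch (assumption: it keeps a random 1-in-`reduction` of the objects in each redshift bin; z is the array of catalog redshifts):

import numpy as np

def subsample(minz, dz, nzbin, z, reduction, seed=0):
    # keep a random 1-in-`reduction` of the objects in each redshift bin
    rng = np.random.default_rng(seed)
    keep = np.zeros(z.size, dtype=bool)
    for i in range(nzbin):
        zlo, zhi = minz + i * dz, minz + (i + 1) * dz
        idx = np.where((z >= zlo) & (z < zhi))[0]
        if idx.size == 0:
            continue
        chosen = rng.choice(idx, size=int(idx.size / reduction), replace=False)
        keep[chosen] = True
    return keep   # boolean mask selecting the reduced catalog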
Monday, April 12, 2010
Parallelizing Likelihood
- Start with a list of potential targets that you want to compute the likelihoods on.
- Split up the targets into chunks of 1000 objects using the function splitTargets (in the likelihood directory); a rough sketch of this split/merge bookkeeping is shown after this list.
- This will create separate fits files for every 1000 objects in your set, and save them to the current directory with the format: targetfile#.fits, where # will range from 0 to (number of targets / 1000).
- Go to the directory with the target files and run the following script: likelihood.script (also in the likelihood directory)
- The likelihood script loops through all files in the current directory of the format targetfile*.fits and via qsub runs the likelihood on each fits file and outputs the result into a file likefile#.fits.
- This took about 2.5 hours to run on 775,000 targets and just depends on how many jobs are in the queue. Running the likelihood on each 1000 object file takes about 10 minutes.
- Then use the mergelikelihoods function (in the likelihood directory) to read all the likefile#.fits and merge them into one likelihood file.
- I then write this file to the file likelihoods.fits, which has a 1-1 mapping with the pre-split targetfile: targetfile.fits
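A rough sketch of the split/merge bookkeeping in Python (the real work is done by splitTargets, likelihood.script, and mergelikelihoods; the file naming below just mirrors the convention described above):

from astropy.table import Table, vstack

def split_targets(targetfile, chunk=1000):
    # write targetfile#.fits, one file per `chunk` objects
    targets = Table.read(targetfile)
    nfiles = (len(targets) + chunk - 1) // chunk
    for i in range(nfiles):
        targets[i * chunk:(i + 1) * chunk].write('targetfile%d.fits' % i, overwrite=True)
    return nfiles

def merge_likelihoods(nfiles, outfile='likelihoods.fits'):
    # likefile#.fits are written by the qsub jobs, one per chunk; stacking them in
    # index order preserves the 1-1 mapping with the pre-split target file
    vstack([Table.read('likefile%d.fits' % i) for i in range(nfiles)]).write(outfile, overwrite=True)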
Saturday, April 10, 2010
The No-Shampoo Experiment
~~~~~~~~~~~~~
About a month ago I decided to try to give up shampoo. There are many reasons one might want to do this:
- Health: Shampoo has a lot of nasty chemicals in it. These chemicals are bad for your eyes, skin, and scalp. They are also not that great to be putting in your bath water, which eventually goes into the ocean and the bodies of fish and marine life.
- Environment: Hair/beauty products often come in non-recyclable bottles. I found I was generating extra waste using shampoo and conditioner.
- Money: Shampoo / Conditioner can be expensive. I was using Matrix Biolage ($20 / bottle).
- Convenience: I travel a lot, and with the TSA restrictions on liquids, it has just become really annoying to have to deal with bringing beauty products with me when I fly.
- Philosophical: In general I don't like the idea of being dependent on things. It bugs me to think that if I were plopped on a desert island or something, that my body would freak out if I didn't have certain products, chemicals, etc. (Coffee is obviously an exception ;)
After reading online about the "No Poo" movement, I decided to try it myself. I was very very skeptical about this working for me because I've always had VERY oily hair. I am one of those people that if I don't wash my hair everyday, it starts to look like it is wet because the roots get dark and weighed down by the oil. The most I've ever been able to go without washing my hair is 1.5 days without looking like I am homeless.
The goal of going "No Poo" is to eventually wash your hair with nothing except good old water and a washcloth. However, it isn't recommended to go straight from shampoo to water-only. This is something that takes time to transition to because if you are (like I was) using shampoo daily, your scalp is producing a lot of extra oil to replace the oil that you are stripping away with the shampoo. So you need to get your scalp to go back to producing normal amounts of oil before you can go water-only.
The way you transition (or detox from shampoo) is washing with baking soda (as a shampoo) and vinegar (as a conditioner). Baking soda strips away less of your hair's oils than shampoo, and vinegar acts as a detangler by pH-balancing your hair (hair is slightly acidic). My boyfriend likes to make fun of me for using baking soda and vinegar in the shower, because it sounds like an elementary school style science fair volcano. But seriously, this stuff works for cleaning hair.
Below is what I did. A few web sites recommended trying to cut down shampoo to every other day or every third day before switching to baking soda. I couldn't do this because of how oily my hair is, so I simply went from shampooing / conditioning every day to washing with baking soda / vinegar every day.
How to wash your hair with baking soda and vinegar:
- Before you get into the shower, brush your hair. This just makes it less tangled and easier to brush after you shower, and also distributes the oils a little bit before washing.
- Put 1 tbs of baking soda into a cup, and mix it with 1 cup of warm water. Stir until well mixed; the mixture should feel silky, not pasty or gritty.
- In the shower, apply the baking soda / water mixture to DRY hair. Focus on the scalp area. Pour onto front and back of head. For me it was important that I applied it to dry hair. When I applied it to wet hair, I didn't get the same cleaning results from the baking soda and my hair got really oily.
- With clean hands, massage your scalp. This obviously won't feel the same as foamy shampoo, but try to rub your scalp anyway; this helps clean out any grime/sweat/dirt and also distributes the oils down the shaft of your hair. Some people online say that they prefer to make a baking soda paste, and massage that into their scalp. That made my scalp / hair too dry. But it is also something to try.
- Let the mixture sit in your hair for a couple minutes (optional). I let it sit in longer if my hair is feeling oily that day, and shorter if it isn't.
- Rinse out the baking soda with water.
- Mix 1 tbs of vinegar with 1 cup of water. I use organic apple cider vinegar because people on the internet say that this smells better than white vinegar. You can also use lemon juice instead of vinegar, or add lemon juice / rind to the vinegar to make it smell better. What I do is put some vinegar in a smaller bottle (4 oz) and add 3-4 drops of perfume oil to it. I keep this mixture in the shower and simply shake up this bottle to mix the perfume oil and vinegar before pouring out 1 tbs for conditioning.
- Pour the vinegar / water mixture on the ENDS of your hair. Avoid your face / eyes as the vinegar will irritate them. I don't put any on my scalp, because my hair is really oily, and the vinegar makes this worse. However, if you have problems with dry hair / scalp you can also put this on your roots. I simply dip the ends of my hair into the cup with vinegar and water.
- Rinse out the vinegar (optional). The reaction of most people when I tell them I am putting vinegar in my hair is a look of repulsion. They say: "Doesn't that make your hair smell bad?" I didn't believe this before I tried it either, but the truth is that once your hair dries the vinegar smell completely goes away. The only smell left in my hair is the smell of the perfume oil. This is even more true if you rinse out the vinegar, but you actually don't have to rinse and it seriously doesn't smell like vinegar after your hair is dry. Weird I know, but true. Vinegar has anti-bacterial properties and so putting it in your hair kills yucky things. This actually helps make your hair smell fresh and clean.
- Towel dry your hair well. Focus on the roots. This also helps with distributing your scalp's oils to the rest of your hair.
I'm not going to lie, for the first couple weeks my hair was pretty weird. Some days it would be very oily and heavy/flat looking, other days it would be frizzy or coarse and brittle. However, I found that because I was mixing my own "shampoo" and "conditioner" it was easy to make small adjustments to try and get my hair to look the way I wanted it. If the ends were too dry, I put a little bit of oil (jojoba) in the vinegar mixture. If my hair got too oily, I would add more baking soda, or let it sit in my hair longer. I liked that I could adjust my regimen based on what was actually going on with my hair on a given day. After I figured out what worked, I ended up having fewer "bad hair days" than when I used regular shampoo, because it was easier to fine-tune.
After three weeks of no shampoo I decided to try to reduce the amount I washed my hair. I've been slowly reducing the amount of baking soda I am using, and also skipping days where I don't wash my hair at all, or only wash with water and a wash cloth. The way this works is basically you try to distribute the oils by rubbing a wash cloth from your scalp down to the ends of your hair. I've read you can also do this with a fine-tooth comb or a soft brush. I really like taking showers, and so it is easier for me to do it in the shower. The idea is that eventually I will only have to use baking soda / vinegar once every few weeks... or maybe not at all?
Now that my hair has adjusted to not using shampoo, I am really surprised at the results:
- My hair is a lot shinier. I think this is because there are more oils in it.
- My hair doesn't get as tangled (I have really long hair that used to get knots in it). This is probably because I have been brushing it more often, or maybe because (supposedly) the vinegar helps smooth out the cuticles in your hair, making it not catch on itself as much.
- My hair is less oily. I can now go up to 3 days without using baking soda. I usually rinse my hair every day or two with water. I'm most surprised about this because like I said, before I couldn't go more than 1.5 days without shampooing before my hair looked like crap.
- My hair smells really good. I thought that giving up shampoo would mean my hair would smell bad, or sweaty or weird. Because I put perfume oil in the vinegar, my hair smells like whatever perfume I want it to smell like. When I don't use perfume oil, my hair just smelled like me. It's hard to explain what that smells like, but it isn't perfume, or soapy, just like clean skin.
- My hair has more volume. My hair seems fuller and bouncier. This is also weird because I would think more oils would weigh my hair down more. But perhaps I was over-conditioning before and that was making my hair flatter?
- My hair has more highlights. I think this is because I had "chemical build-up" in my hair from using so many hair products. Now that I am using less stuff in my hair, it looks brighter and a little lighter in color.
- My hair is easier to style. It just holds a style better. I think the oils act kinda like a natural hairspray or something, but when I comb it a certain direction, it just stays that way. Before, my hair had a lot of wave to it that made it go in all different directions; it isn't doing that as much now.
- I'm spending a lot less money: It's been over a month doing this and I've gone through 1/3 of a box of baking soda ($1 / box) and 1/2 a bottle of apple cider vinegar ($2 / bottle). Much cheaper than my old regimen. I could also buy this stuff in bulk if I wanted and get it for even less. And hopefully I'll continue to reduce the amount of these products that I use.
- Traveling is easier: The only liquid I need to bring is a small travel-sized container that I filled with vinegar. Alternatively, vinegar is so cheap I could just buy it when I get to my location, or even ask to borrow some from the kitchen of the hotel where I am staying.
On the down side:
- I have to deal with A LOT of teasing from Adam, and other people who hear about this. There have definitely been jokes about smelling like salad dressing and making model volcanoes in the shower.
- When I go to the gym and sweat, or otherwise get my hair wet, it starts to smell a little bit like vinegar again. I stopped noticing this once I started rinsing out the vinegar, and adding a little bit of perfume oil.
- I have noticed that my ends are a little dry. This is easily combated with a little bit of jojoba oil, and hopefully will become less of a problem as I taper off the baking soda.
So anyway, that is a summary of my experience giving up shampoo. If you decide to try this too (or have tried it in the past), please let me know your experiences. Below are some pictures of my hair transition over the past 4 weeks.
Week 4
Thursday, April 8, 2010
Changing 3D Correlation Function
I made changes to my 3D correlation function to use a single random catalog (at least I think I have), and I am running a test to see whether I get the same solution as when I have two separate random catalogs (what I was doing before), and also to see how much this speeds things up.
This set of runs is in the working directory ../pythonjess/run201048_1527, and the log file with the code to run this is ../logs/100408log.py.
The new version runs in 7 seconds, whereas before it took 20 seconds! So that is a good improvement.
Buzz Comments from Erin about qsub and splitting up the jobs:
That's a factor of n_machines speedup, better than you could ever hope to get by just writing faster code.
Below is a sample script, let's call it test.pbs. You can submit with "qsub test.pbs". Of course change all the log file paths to make sense for you.
Erin
#PBS -l nodes=1:ppn=1
#PBS -l walltime=24:00:00
#PBS -q batch
#PBS -N this_job_name
#PBS -j oe
#PBS -o /path/to/where/the/p
#PBS -m a
#PBS -V
#PBS -r n
#PBS -W umask=0022
# put setups here
setup numpy -r ~esheldon/exports/nu
setup scipy -r ~esheldon/exports/sc
setup esutil -r ~esheldon/exports/es
setup python -r ~esheldon/exports/py
# this log is different than above, you can
# watch it in real time
logfile=/path/to/whe
# send commands to python
python &> "$logfile" <
qstat -n -1 -u your_user_name
or equivalently:
~esheldon/python/bin
Wednesday, April 7, 2010
Speeding-up the Correlation Function
So the default depth of the tree is depth=10. It seems that a lower depth, and thus lower resolution, is better for these very large 10-degree search angles. Here are some timing numbers that show depth=6 is best. You would do this to set your depth:
h=esutil.htm.HTM(depth)
depth seconds
3 137.153673172
4 114.97971487
5 104.600896835
6 99.5818519592
7 102.73042202
8 119.931453943
9 185.077499866
10 460.641659021
11 1502.35429096
The transverse separation is then just d * angle, where d is the angular diameter distance in Mpc and the angle is in radians. The bincount function is set up to do this. If you send the scale= keyword then the rmin,rmax arguments will be in the units of scale*angle (here, Mpc).
Let's say you have the spectroscopic sample (ra1, dec1, z1), and the second sample is the photometric sample (ra2, dec2).
import esutil
# you can also give omega_m etc. here, see docs
cosmo = esutil.cosmology.Cosmo()
# get angular diameter distance
# to points in set 1
d = cosmo.Da(0.0, z1)
rmin = 0.025 # Note units in Mpc
rmax = 30 # Mpc
nbin = 25
# argument order below is assumed from the esutil bincount example (scale=d puts rmin,rmax in Mpc)
rlower,rupper,counts = h.bincount(rmin,rmax,nbin,ra1,dec1,ra2,dec2,scale=d)
I did a bigger run on 356,915 objects out to 10 degrees, here are the numbers:
For Erin's
Start time = 12:59:24.71
End time = 13:32:24.96
Run time = 1980.25 seconds ≈ 33 minutes
For Mine
Start time = 12:57:33.83
End time = 13:40:35.13
Run time = 2581.30 seconds ≈ 43 minutes
A few things to think about:
- Should I change the correlation angle as a function of the spectroscopic redshift? Can the reconstruction handle this?
- I am currently running the DD, DR, RD, RR correlations all in succession, but they could be run simultaneously and this could in theory speed up the calculation.
- What are the correct angles and comoving distances I should be correlating out to? Obviously going out to 20 degrees is too much. Alexia's quick calculation estimates that I only need to go out to 6 degrees, but this varies with redshift.
- Sheldon's code does run faster and has more functionality. Should I use it from now on?
- The 3D correlation function is always an auto-correlation function, so there is no need to calculate both DR and RD, and I could change this code to run faster by calculating the correlation as DD - 2DR + RR (see the sketch after this list). Of course RR involves 25X more pairs than DD and 5X more than DR (I am oversampling the randoms by 5), so most of the calculation time is spent there.
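As a sketch of that estimator (Landy-Szalay form), assuming raw pair counts per separation bin and a random catalog 5 times the size of the data:

import numpy as np

def landy_szalay(dd, dr, rr, n_data, n_random):
    # normalize the raw pair counts by the number of possible pairs
    DD = dd / (n_data * (n_data - 1) / 2.0)
    DR = dr / (n_data * n_random)
    RR = rr / (n_random * (n_random - 1) / 2.0)
    # DD - 2DR + RR, divided by RR
    return (DD - 2.0 * DR + RR) / RR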
Tuesday, April 6, 2010
Comparing Correlation Function
For Erin's
Start time = 16:01:53.98
End time = 16:09:11.49
Run time = 437.51 seconds
For Mine
Start time = 16:24:27.99
End time = 16:26:12.03
Run time = 104.04 seconds
The pair counts match pretty well, except at the ends, which I believe is just a binning effect:
Monday, April 5, 2010
Sheldon's Correlation Function
After some incompatibility problems with various packages we needed for running our respective code, it seems to be working now (well, it compiles and passes the checks). However, when I try to run it I get the following error:
RuntimeError: found log_binsize < 0
I've pinged Erin about this, so we'll see if he has ideas. The code I'm using to test this is in the following log file: ../logs/100405log.py
7:15 Update: Erin Rocks! He just got back to me with a fix. It is running now! I'm going to do some tests to see how long it takes and compare its answer with my correlation function's answer.
Outputs:
Generating HTM ids
Generating reverse indices
rmin: 0.001
rmax: 10
degrees?: True
nbin: 25
logrmin: -3
logrmax: 1
log binsize: 0.16
len(scale_array) = 0
Each dot is 500 points
......................................................................
35000/53071 pair count: 176696860
....................................
In [15]: rlower
Out[15]:
array([ 1.00000000e-03, 1.44543977e-03, 2.08929613e-03,
3.01995172e-03, 4.36515832e-03, 6.30957344e-03,
9.12010839e-03, 1.31825674e-02, 1.90546072e-02,
2.75422870e-02, 3.98107171e-02, 5.75439937e-02,
8.31763771e-02, 1.20226443e-01, 1.73780083e-01,
2.51188643e-01, 3.63078055e-01, 5.24807460e-01,
7.58577575e-01, 1.09647820e+00, 1.58489319e+00,
2.29086765e+00, 3.31131121e+00, 4.78630092e+00,
6.91830971e+00])
In [16]: rupper
Out[16]:
array([ 1.44543977e-03, 2.08929613e-03, 3.01995172e-03,
4.36515832e-03, 6.30957344e-03, 9.12010839e-03,
1.31825674e-02, 1.90546072e-02, 2.75422870e-02,
3.98107171e-02, 5.75439937e-02, 8.31763771e-02,
1.20226443e-01, 1.73780083e-01, 2.51188643e-01,
3.63078055e-01, 5.24807460e-01, 7.58577575e-01,
1.09647820e+00, 1.58489319e+00, 2.29086765e+00,
3.31131121e+00, 4.78630092e+00, 6.91830971e+00,
1.00000000e+01])
In [17]: counts
Out[17]:
array([ 468, 512, 758, 1216, 2176, 4266,
6962, 13332, 25388, 46232, 83912, 146620,
246088, 377354, 523490, 674092, 972164, 1683416,
3167032, 6144742, 12083396, 22744048, 41867790, 71782726,
105281816])
Sunday, April 4, 2010
Eclipse Re-Installation
- Install Eclipse for C/C++ (download here)
- In Eclipse go to Help → Install New Software...
- Install the Python Extension: http://pydev.org/updates
- Install the IDL Extension: http://eclipsecorba.sourceforge.net/update
- Install the Subversion Extension: http://subclipse.tigris.org/update_1.6.x
It required me to register, which I did with my favorite user name and password.
Eclipse seems to be working great now. No more crashing, and it opens and closes much faster. It automatically updates (before it had errors every time I tried to update). And now riemann is up again, so I can do some actual research.