Monday, April 12, 2010

Parallelizing Likelihood

I spent most of yesterday and today parallelizing the likelihood computation to make it run faster. Here is how it works:
  1. Start with a list of potential targets that you want to compute the likelihoods on.
  2. Split the targets into chunks of 1000 objects using the function splitTargets (in the likelihood directory).
  3. This creates a separate fits file for every 1000 objects in your set and saves them to the current directory with names of the form targetfile#.fits, where # runs from 0 up to (number of targets / 1000) − 1.
  4. Go to the directory containing the target files and run the script likelihood.script (also in the likelihood directory).
  5. The script loops over all files in the current directory matching targetfile*.fits, runs the likelihood on each one via qsub, and writes each result to a corresponding likefile#.fits.
  6. Running the likelihood on each 1000-object file takes about 10 minutes; the full run on 775,000 targets took about 2.5 hours, though the total time depends on how many jobs are in the queue.
  7. Then use the mergelikelihoods function (in the likelihood directory) to read all of the likefile#.fits files and merge them into one likelihood file.
  8. I write this merged file out as likelihoods.fits, which has a one-to-one row mapping with the pre-split target file, targetfile.fits.
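The bookkeeping behind the split and submit steps above can be sketched in Python. This is a hypothetical illustration only, not the actual splitTargets or likelihood.script code: the helper names, the exact file-index arithmetic, and the qsub -v variable-passing syntax are all assumptions on my part.

```python
import math

CHUNK = 1000  # objects per split file, as in step 2


def split_filenames(n_targets, chunk=CHUNK):
    """Return the targetfile#.fits names that splitting n_targets produces.

    Mirrors the naming in step 3: indices run 0 .. ceil(n_targets/chunk) - 1.
    """
    n_files = math.ceil(n_targets / chunk)
    return ["targetfile%d.fits" % i for i in range(n_files)]


def qsub_commands(target_files, script="likelihood.script"):
    """Build one qsub submission per split file (step 5).

    Each job reads targetfile#.fits and writes the matching likefile#.fits.
    Passing the file names via qsub -v environment variables is an assumption
    about how likelihood.script might receive its input.
    """
    cmds = []
    for f in target_files:
        out = f.replace("targetfile", "likefile")
        cmds.append("qsub -v INFILE=%s,OUTFILE=%s %s" % (f, out, script))
    return cmds


# Example: 775,000 targets split into 775 files, targetfile0.fits .. targetfile774.fits
files = split_filenames(775000)
jobs = qsub_commands(files)
```

With 775,000 targets this yields 775 split files and 775 queued jobs, consistent with the run described above; at roughly 10 minutes per 1000-object job, the 2.5-hour wall-clock time reflects the queue running many of them concurrently.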
The log file for the whole run is at the following location: ../logs/
