Monday, December 24, 2012

On Floors and Integers

A couple projects I have worked on in the past have involved Monte Carlo techniques for placing point particles (representing stars in a galaxy) in such a way that we have a constant surface density. In order to test that we do indeed have a constant surface density, we simply count the number of points from r=0 to r=rMax in steps of dr and display the total number per bin over the total radius.

Conceptually, this is fairly simple. Fortunately, this is also a fairly easy program too. If we have a radius of 15 units (for the galaxy I was modeling, this would be 15 kiloparsecs), the easiest thing to do would make an integer array with 15 cells so that the bin size is 1 unit. Then you could just take the floor of the radial position and get the distribution quickly:
do i=1,iMax
   b=floor(radius(i))
   hist(b)=hist(b)+1
enddo
where b and hist are appropriately defined as integers.

Alternatively, since Fortran returns integer values for integer math, dividing by 1 can accomplish the same thing:
do i=1,iMax
   b=radius(i)/1
   hist(b)=hist(b)+1
enddo
Upon testing these methods on a the same set of 10 million real values, floor takes about 3.9 seconds compared to 3.8 for dividing by 1. Similar results are found for using ceiling and dividing by 1 and adding 1.

If we want smaller bins, say dr=0.5, we might think we would run into the problem of determining which half-unit this point belongs to in order to add it to the correct bin--this would basically involve something like finding the remainder of real(floor(radius(i)))-radius(i) and seeing if it is greater or lesser than 0.5 and then putting it into the appropriate bin.

However, there is far more simple method of doing this that requires us to first do two things:
  (1) Set the histogram lower index to 0 and the upper index to number of bins - 1 (i.e., allocate(hist(0:31))
  (2) Multiply the radius by 2:
allocate(hist(0:31)) do i=1,iMax
   b=2*radius(i)
   hist(b)=hist(b)+1
enddo
If radius(i)=0.75, multiplying this by 2 gives 1, which is the correct bin. If radius(i)=4.25, then multiplying by 2 gives 8, which is also the correct bin. Note that we could have left the indices as they were and just add 1 to the multiplication of 2, but this method adds a needless (and potentially costly) extra mathematical operation.

So, the next time you have to create a histogram of your data, recall this easy-binning method, which can easily be extended to larger or smaller bin sizes.

Comments always welcome!

No comments:

Post a Comment