Thursday, December 6, 2012

On Functions

Like every other programming language, Fortran allows you to write your own functions to be evaluated in your code. There are two types of functions in Fortran: statement functions (also called inline functions) and function subprograms (also called external functions). In this post, we will discuss the use and possible advantages of these two types of functions, starting with the statement function.

In order to use a statement function, you must
  (1) declare the function name and its argument in the variable declarations at the beginning of the program
  (2) define the function after the declarations but before any executable statement (that is, before any variable is assigned a value), even if it depends on variables that have not yet been assigned
A simple example is given below:
program fcnTest
   implicit none
   real::f,a,pi,x,dx
   integer::i

   f(a)=a**5

   pi=acos(-1.0);dx=0.0000012;x=1.0
   do i=1,1000000
     print *,x,(4.5**x)*sin(pi*f(x))
     x=x+dx
   enddo
end program

Note also that multiple statement functions can be employed in Fortran. We could have chosen to define g such that g(a)=4.5**a and used this in the above code (which, though I do not get into it here, does not change the results at the bottom of the page).
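As a rough sketch of what that two-function variant could look like (the program name fcnTest2 and the shortened loop are my own choices for illustration, not part of the timed runs):

```fortran
program fcnTest2
   implicit none
   real::f,g,a,pi,x,dx
   integer::i

   ! two statement functions; both use the dummy argument a
   f(a)=a**5
   g(a)=4.5**a

   pi=acos(-1.0);dx=0.0000012;x=1.0
   do i=1,5                         ! only a few iterations, for illustration
     print *,x,g(x)*sin(pi*f(x))
     x=x+dx
   enddo
end program fcnTest2
```

Both definitions sit together after the declarations and before the first executable statement, just as with the single function above.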

The function f is actually from a gnuplot tutorial I have seen online. I plotted it using QtiPlot, a freeware plotting program that I have grown to like (figure not reproduced here).

I also chose to iterate the loop 1,000,000 times because that gives a runtime long enough to reveal a difference. With only 1,000 iterations, the millisecond runtimes would be too short to show a difference between one approach and the other.
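If you would rather time the loop from inside the program than from the shell, one possible approach is the intrinsic system_clock. The sketch below is only an assumption about how one might do it, not how the runtimes in this post were measured; the program name timeLoop and the variable total are made up for the example, and the loop accumulates a sum instead of printing, so it times the arithmetic alone.

```fortran
program timeLoop
   implicit none
   real::f,a,pi,x,dx,total
   integer::i,c0,c1,rate

   f(a)=a**5

   pi=acos(-1.0);dx=0.0000012;x=1.0;total=0.0
   call system_clock(c0,rate)            ! starting tick count and ticks per second
   do i=1,1000000
     total=total+(4.5**x)*sin(pi*f(x))   ! accumulate instead of printing
     x=x+dx
   enddo
   call system_clock(c1)                 ! ending tick count
   print *,'sum =',total                 ! use the result so the loop is not optimized away
   print *,'elapsed (s) =',real(c1-c0)/real(rate)
end program timeLoop
```

Replacing the accumulation with a print or write statement would time the output as well.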

I compiled the code without any optimization flags (i.e., just ifort -o fcnTest fcnTest.f90) and ran it twice, getting 60.013 s and 59.901 s, which is pretty slow to do 5 operations a million times. If I changed the code to output the data to a formatted file,
program fcnTest
   implicit none
   real::f,a,pi,x,dx
   integer::i

   f(a)=a**5

   open(unit=10,file='fcnTest.txt',status='unknown')

   pi=acos(-1.0);dx=0.0000012;x=1.0
   do i=1,1000000
     write(10,*) x,(4.5**x)*sin(pi*f(x))
     x=x+dx
   enddo
end program

the runtime dropped to a fairly consistent 3.95 s with deviations of about 0.04 s (I ran it 5 times). Fortran also allows writing data to binary files simply by adding the option form='unformatted' to the open command and replacing write(10,*) with write(10). Doing so, the runtime drops to a somewhat less consistent 2.78 s with deviations of about 0.09 s (also over 5 runs).
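For reference, a minimal sketch of the unformatted version with those two changes applied might look like this. The program name fcnTestBin and the file name fcnTest.dat are assumptions for the example, as is the use of status='replace' and the explicit close:

```fortran
program fcnTestBin
   implicit none
   real::f,a,pi,x,dx
   integer::i

   f(a)=a**5

   ! form='unformatted' selects binary output; the file name is an assumption
   open(unit=10,file='fcnTest.dat',form='unformatted',status='replace')

   pi=acos(-1.0);dx=0.0000012;x=1.0
   do i=1,1000000
     write(10) x,(4.5**x)*sin(pi*f(x))   ! no format specifier for binary output
     x=x+dx
   enddo
   close(10)
end program fcnTestBin
```

Each write(10) produces one record holding the two real values.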

The reason for the significant difference is, hopefully, obvious: Fortran was not designed for putting a large amount of data on the screen; it was designed for dumping data to files to be analyzed later.

A problem with binary output (form='unformatted') is that it is machine-dependent, so dumping data in that format and then transferring it to another machine can cause issues when reading the file on the second machine (differences in byte order or in how the compiler lays out records are the usual suspects). I have my own desktop (running Ubuntu Linux with an AMD Phenom X4 processor), my own laptop (running Arch Linux with an Intel i5 processor), and access to a supercomputer cluster (running Scientific Linux with Intel Xeon and Itanium processors). I have not experienced any file-reading issues between the three of them, but that is not to say this problem does not exist.

If you are a one-computer type of person, this problem should not be an issue. If you are working with multiple computers and are dumping data in binary format, it might be worth your time to write a second program that reads in the binary data and outputs it as a formatted file to prevent these issues. Or, if you are working with multiple computers and large data sets, you might just want to use something like HDF5.
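Such a second program could be as short as the sketch below. It assumes the binary file is named fcnTest.dat and holds two real values per record, as the loop in this post writes; the program name bin2txt and both file names are made up for the example.

```fortran
program bin2txt
   implicit none
   real::x,y
   integer::ios

   ! file names are assumptions; the input must hold two reals per record
   open(unit=10,file='fcnTest.dat',form='unformatted',status='old',iostat=ios)
   if (ios/=0) stop 'could not open fcnTest.dat'
   open(unit=11,file='fcnTest.txt',status='replace')

   do
     read(10,iostat=ios) x,y      ! read one binary record
     if (ios/=0) exit             ! end of file (or a read error)
     write(11,*) x,y              ! write it back out as text
   enddo

   close(10); close(11)
end program bin2txt
```

The conversion should be run on the machine that wrote the binary file, so that the formatted file is what gets transferred.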

Anyway, now that we have some data on the statement (inline) function, how does it compare to the external function? This requires a significant change to the code, but it is not difficult:
program fcnTest
   implicit none
   real::pi,x,dx
   real,external::f
   integer::i

   pi=acos(-1.0); dx=0.0000012; x=1.0
   do i=1,1000000
     print *,x,4.5**x*sin(pi*f(x))
     x=x+dx
   enddo
end program fcnTest

real function f(x) result(a)
   implicit none
   real,intent(in)::x   ! the argument is only read, never modified

   a = x**5
end function f

Compiling and running this in the same manner, the runs with output to the screen took 59.985 s and 60.010 s. The formatted output took an average of 4.03 s with a deviation of 0.07 s; the unformatted output took an average of 2.84 s with a deviation of about 0.06 s.

As a scientist, I like pretty pictures and tables of the data. Taking the data from above:

output              inline runtime (s)    external runtime (s)
screen              59.96 ± 0.08          60.00 ± 0.02
formatted file       3.95 ± 0.04           4.03 ± 0.07
unformatted file     2.78 ± 0.09           2.84 ± 0.06

The data indicates the following:
  (1) There is no measurable performance difference between inline and external functions, at least for simple functions
  (2) Unformatted output is faster than formatted output
  (3) You should never put 1,000,000 data points on the screen (probably any output more than 10 lines long should go to a file)
If multiple subroutines require the use of the same function, then clearly the choice should be to use the external function. If the function is rather complex, the external function should also be the clear choice.
Still, it is good to know that, for simple functions in a simple program, writing the function once at the beginning of your code can make it tidier without sacrificing any performance.
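To illustrate the shared-function case, here is a small sketch in which the main program and a subroutine both call the same external function. The names shareFcn and report are hypothetical and exist only for this example:

```fortran
program shareFcn
   implicit none
   real,external::f

   print *,'from main:      ',f(2.0)
   call report(2.0)
end program shareFcn

subroutine report(x)
   implicit none
   real,intent(in)::x
   real,external::f     ! the same external function, declared again here

   print *,'from subroutine:',f(x)
end subroutine report

real function f(x) result(a)
   implicit none
   real,intent(in)::x

   a = x**5
end function f
```

A statement function defined in the main program would not be visible inside report, which is why the external form is the natural choice here.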

Comments always welcome.
