-->
Frequently Asked Questions

Do you have another question that doesn't appear here? Send it to us by email and we may add it to this list.


Why should I care about program design?

Many statisticians, biostatisticians and methodologists need to program from time to time. By trial and error, you may gradually settle into a routine for generating code and create a program that seems to work well enough for the immediate purpose, but you are not satisfied with the finished product. For example, it may frequently crash and you cannot tell why. In hindsight, you find that early design decisions have locked in some major limitations, making the program difficult to maintain, enhance or extend. We, too, have made these mistakes and learned some valuable lessons about the need for good design. Our book, with its extensive source code examples, provides a template that shows you how to write a good statistical program in Fortran the first time.


Why program in Fortran?

FORTRAN (originally an acronym for FORmula TRANslation) was the first widely used high-level programming language and, after its standardization in 1966, became the dominant tool for engineering and scientific applications. By the mid-1980s, competition from newer languages such as C began to expose serious limitations in FORTRAN 77, prompting a major extension in 1990 and a minor revision in 1995. (In 1990, the name of the language ceased to be regarded as an acronym, and it no longer needs to be written in capital letters.) Yet the essential features of the earlier language remain intact, allowing programmers to easily update legacy code and make use of the extensive published routines and libraries that are available. In modern computing environments, Fortran remains an excellent choice for high-performance applications in engineering, mathematics, and statistics.

Why should I use Fortran 95?

Many current Fortran users acquired their basic skills during the 1970s or 1980s and continue to follow a classic programming style that has changed little over the last quarter century. The modern language has some very useful features, however, including:

  • modules, which help to organize large, complex programs;
  • derived types, which allow variables to be bundled together and passed as a single argument;
  • dynamic allocation, by which the dimensions of arrays can be determined at run-time;
  • pointers, which increase the power and flexibility of the language in countless ways; and
  • many other syntactical niceties, including optional arguments and function overloading.

Those who graduate from Fortran 77 to Fortran 95 will soon find that programs are simpler to write, easier to debug, and more reliable.
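
To make the features listed above concrete, here is a minimal sketch that combines a module, a derived type, dynamic allocation, and an optional argument. It is our own illustration rather than code from the book; the names summary_stats, summary_type, summarize, and population are hypothetical, and the routine assumes at least two observations.

```fortran
! Minimal Fortran 95 sketch; all names are hypothetical.
module summary_stats
   implicit none
   private
   public :: summary_type, summarize

   ! Derived type: bundles related results into a single variable.
   type :: summary_type
      integer :: n
      double precision :: mean
      double precision :: variance
   end type summary_type

contains

   ! Computes the mean and variance of x (assumes size(x) >= 2).
   ! The optional argument "population" selects the divisor n
   ! instead of the default n - 1.
   subroutine summarize(x, s, population)
      double precision, intent(in) :: x(:)
      type(summary_type), intent(out) :: s
      logical, intent(in), optional :: population
      integer :: divisor

      s%n = size(x)
      s%mean = sum(x) / s%n
      divisor = s%n - 1
      if (present(population)) then
         if (population) divisor = s%n
      end if
      s%variance = sum((x - s%mean)**2) / divisor
   end subroutine summarize

end module summary_stats

program demo
   use summary_stats
   implicit none
   double precision, allocatable :: x(:)
   type(summary_type) :: s
   integer :: n, i

   n = 5                          ! in practice, determined at run time
   allocate(x(n))                 ! dynamic allocation
   x = (/ (dble(i), i = 1, n) /)  ! the values 1, 2, ..., 5
   call summarize(x, s)           ! optional argument omitted
   print *, s%n, s%mean, s%variance
   call summarize(x, s, population=.true.)
   print *, s%variance
   deallocate(x)
end program demo
```

For the data 1, 2, ..., 5 the mean is 3, the default (sample) variance is 2.5, and the population variance is 2. The caller receives all three results in one argument of type summary_type, which is the kind of bundling that a FORTRAN 77 argument list cannot express.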


Why use pseudo object-oriented programming?

Unlike C++ and Java, Fortran 95 is not a true object-oriented language. Nevertheless, by adopting certain programming practices, it is possible to mimic many of the essential qualities of the object-oriented style and realize their benefits. The key features of Fortran 95 that make this possible are modules, derived types, and pointers. By adopting the pseudo object-oriented style recommended in our book, the developer gains many advantages of object-oriented programming--for example, the ability to recreate or enhance one part of the program without breaking the functionality of other parts. Adopting this style leads to programs that are more reliable and easier to develop, maintain, and extend.
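
One way such a style might look is sketched below, using only modules, derived types, and pointers. This is an illustration under our own assumptions, not the template from the book; the module dataset_class and its procedures dataset_create, dataset_destroy, and dataset_mean are hypothetical names.

```fortran
! Pseudo object-oriented sketch in Fortran 95; all names are hypothetical.
module dataset_class
   implicit none
   private
   public :: dataset_type, dataset_create, dataset_destroy, dataset_mean

   type :: dataset_type
      private                      ! components are hidden from client code
      integer :: n = 0
      double precision, pointer :: values(:) => null()
   end type dataset_type

contains

   ! Plays the role of a constructor: copies x into storage
   ! owned by the object, releasing any earlier contents first.
   subroutine dataset_create(self, x)
      type(dataset_type), intent(inout) :: self
      double precision, intent(in) :: x(:)
      call dataset_destroy(self)
      allocate(self%values(size(x)))
      self%values = x
      self%n = size(x)
   end subroutine dataset_create

   ! Plays the role of a destructor: frees the storage.
   subroutine dataset_destroy(self)
      type(dataset_type), intent(inout) :: self
      if (associated(self%values)) deallocate(self%values)
      self%n = 0
   end subroutine dataset_destroy

   ! Accessor: returns the mean, or zero for an empty object.
   function dataset_mean(self) result(m)
      type(dataset_type), intent(in) :: self
      double precision :: m
      m = 0.0d0
      if (self%n > 0) m = sum(self%values) / self%n
   end function dataset_mean

end module dataset_class

program demo
   use dataset_class
   implicit none
   type(dataset_type) :: d

   call dataset_create(d, (/ 2.0d0, 4.0d0, 6.0d0 /))
   print *, dataset_mean(d)        ! mean of 2, 4, 6
   call dataset_destroy(d)
end program demo
```

Because the components of dataset_type are private, client code can reach the data only through the public procedures. The internal storage could later be changed without touching any program that uses the module, which is the benefit described above.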


Which compiler should I use?

We do not intend to promote or discourage the use of any specific product. At the moment, we know of six Fortran 95 compilers being sold for the Windows environment. We have personally tried Lahey/Fujitsu (Version 7.1), Salford FTN95 (Version 4.60) and Intel (formerly Compaq) Visual Fortran (Version 8.1)--and found them to be excellent. Each of these compilers fully implements the Fortran 95 standard and is suitable for creating console applications and dynamic-link libraries (DLLs). On the Unix/Linux side, we have tried Intel and found it suitable for creating console applications and shared objects (SOs), which are analogous to Windows DLLs. For developing COM servers, our experience is limited to Intel Visual Fortran. Our methods and tools for turning Fortran code into COM servers, which we cover at length in our book, rely on Intel Visual Fortran and Microsoft Visual Studio .NET. Other compilers may be capable of producing COM servers as well, but the procedures could be substantially different.


How can I write a statistical application for a broad audience?

It is not hard for statisticians to write programs for their own use or for other statisticians; an S-PLUS or R function may suffice. But creating software that proves useful to behavioral scientists, medical researchers, biologists, or engineers can be very challenging. Consumers of statistical methods are accustomed to analyzing data in many different environments. Some will say, "Your method sounds interesting, but my colleagues won't try it unless they can do it in SAS" (or SPSS or Excel or MATLAB or...). Writing different versions of the same program for these various computing environments is unattractive and impractical.

What is the best way to implement a method once and package it in a form that can be used by a wide audience? A good old-fashioned Fortran console application can be compiled and run with little or no modification on Windows, Macintosh, Unix, or Linux machines. Unfortunately, users have grown so accustomed to working in graphical environments that many will balk at an application with no GUI. And console applications do not readily interact with packages such as Excel or SAS.

We encourage statistical researchers to write Fortran console programs using the style described in our book. The template conforms to the ANSI Fortran 95 standard. After the console program is working satisfactorily, we then take one more step: turning the program into a COM server. A COM server makes it possible to call the same computational engine from a wide variety of Windows applications: an Excel spreadsheet; a SAS, SPSS, S-PLUS, R or MATLAB session; or a rudimentary or elaborate graphical interface created in Visual Basic at a later time. If the console application is developed from our template, the process of turning it into a COM server is quite simple and can be carried out in a matter of hours using the free development tools that we have created in conjunction with the Intel Visual Fortran compiler.


What are DLLs and how can they be useful?

DLLs (dynamic-link libraries) are the building blocks of large applications that run in Windows. (In Unix and Linux systems, these components are called shared objects or SOs.) DLLs allow multiple programs to share a common set of procedures without any duplication of the compiled code. A DLL does not run on its own but must be called by an executable program. The binary code within the DLL is loaded into memory only when it is needed.

Computational routines written in Fortran can be packaged as a DLL and called from popular statistical applications including S-PLUS, R and PROC IML in SAS. One chapter of our book is devoted to creating and using Fortran DLLs. DLL calling conventions may vary from one application to another, however, and getting a single DLL to interact with all of these applications can be tricky.

Because conventional DLLs are not object-oriented, the kinds of procedures that can be called through them are somewhat limited. Conventional DLLs are a good choice for calling simple computational procedures whose arguments are scalars and arrays. They are less effective for storing data that must persist from one call to the next. Arrays of variable sizes may need to be passed in the FORTRAN 77 style, with the dimensions also passed as arguments. Array arguments may not be redimensioned within a procedure, so the size of each output must be known in advance. In some cases, you may encounter difficulties in passing character strings, and the total number of arguments may be restricted. When using DLLs, programming mistakes can be difficult to diagnose and fix. A conventional DLL does not spawn a new process on your computer, but loads new procedures into the program that is already running. If a problem arises, then the whole application may crash, with little information given as to why. Because of these limitations, we recommend conventional DLLs only when you want to quickly develop an external computational procedure for a single statistical package.
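
The FORTRAN 77 style of array passing mentioned above can be sketched as follows. The routine col_means is a hypothetical example of ours: its arguments are only scalars and explicit-shape arrays, the dimensions n and p travel alongside the data, and the output array is sized by the caller in advance. Compiler-specific export directives and calling-convention attributes, which a real DLL would need, are deliberately omitted because they differ from one compiler to another.

```fortran
! Hypothetical routine of the kind that might be placed in a DLL.
! Dimensions are passed explicitly, in the FORTRAN 77 style.
subroutine col_means(x, n, p, means)
   implicit none
   integer, intent(in) :: n, p                 ! rows and columns of x
   double precision, intent(in) :: x(n, p)     ! explicit-shape input
   double precision, intent(out) :: means(p)   ! sized by the caller
   integer :: j

   do j = 1, p
      means(j) = sum(x(:, j)) / n
   end do
end subroutine col_means

! Stand-in for the calling application, included only so that the
! sketch can be compiled and run as a single console program.
program demo
   implicit none
   external col_means
   double precision :: x(3, 2), means(2)

   ! Column 1 holds 1, 2, 3; column 2 holds 10, 20, 30.
   x = reshape((/ 1.0d0, 2.0d0, 3.0d0, 10.0d0, 20.0d0, 30.0d0 /), (/ 3, 2 /))
   call col_means(x, 3, 2, means)
   print *, means
end program demo
```

Here the two column means are 2 and 20. Because the interface carries no assumed-shape arrays, derived types, or pointers, it can be matched by the simple calling mechanisms that statistical packages typically provide.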


How can I begin to write a statistical application for the Windows environment?

A traditional Fortran program executes commands in a predetermined sequence and then stops. In contrast, Windows applications are highly interactive and event-driven, reacting incrementally to mouse clicks and keystrokes. An event-driven program needs to be more robust, allowing the user to vary the event sequence, providing helpful guidance and preventing illegal operations along the way. Programmers accustomed to writing in the old-fashioned style of FORTRAN 77 may have no idea how to create routines that operate in the graphical, point-and-click world of Windows. By adopting the pseudo object-oriented style described in our book, it is not difficult to convert a Fortran console program into a COM server that can readily interact with existing Windows applications (e.g. Excel) or with a custom-built graphical user interface (GUI) written in a language like Visual Basic. The pseudo object-oriented strategy helps you to create and package Fortran source code into modules whose public procedures are equally useful for both batch and interactive processing.


What is a COM server?

The Component Object Model (COM) is a programming standard that emerged and evolved during the 1990s. COM began with a technology called object linking and embedding (OLE) to facilitate communication between software products. As OLE evolved into COM, developers of Windows applications outside of Microsoft began to build their own applications on it as well. Another major development was automation, which allows COM objects to be accessed from scripting languages such as Visual Basic for Applications (VBA), the language used by Excel and other popular data-management programs to write macros.

A COM-compliant software component is called a COM server. A COM server is binary (compiled) code that contains object definitions, called classes, from which objects can be instantiated at run time. COM servers are packaged and distributed to Windows computers as .dll or .exe files. A COM client, on the other hand, is any program, application, or component that uses a COM server's classes. The client may be written in any programming language, as long as it adheres to the COM standard. Because COM has been standardized, any COM server can potentially be used by any COM client.

Knowledge of COM opens abundant possibilities for statisticians, because SAS, S-PLUS, SPSS, MATLAB, Excel and many other applications now provide COM client and COM server capabilities. The primary reason why a statistician ought now to consider creating a COM server is that, by developing and maintaining this single component, one carefully written set of computational procedures can be used in an ever-growing number and variety of applications. Our book presents an approach to COM server development using Fortran. Although COM itself is rather complicated, we provide guidance, recommendations, and development tools that take care of most of the details.


What is a COM client?

A COM client is a program that uses a COM server. In COM, most of the hard work takes place on the server side. A COM server must be written carefully and methodically. COM clients, on the other hand, are easier to create. They can be Excel macros, S-PLUS and R functions, SPSS macros and scripts, MATLAB M-files, and programs written in Visual Basic and C#.

We have created a development tool that allows you to invoke COM servers directly from an ordinary SAS program through PROC IML. This tool, which we call SASCOMIO (for "SAS/COM interoperability"), consists of a Windows DLL and a SAS .cbt file that defines the DLL calling conventions.


What about .NET?

.NET is an environment developed by Microsoft in which object-oriented software components run within a virtual machine. (A virtual machine is a computer program that acts as a small computer within a computer.) .NET may be considered an extension of the concepts of COM. COM promotes interaction among programs written in different languages on a single computer, whereas .NET allows interoperability across computers and networks. Although .NET is now the industry standard for development in Windows, COM servers still work well and can easily interact with .NET. In fact, computationally intensive statistical routines implemented as COM servers tend to run faster than native .NET components, because the binary instructions are executed directly, bypassing the virtual machine.


What about Fortran 2003?

The final committee draft of the next version of Fortran has recently been adopted. The next standard, called Fortran 2003, is a major enhancement of Fortran 95, but Fortran 2003 compilers may not be available for some time. None of the language features described in our book have been deleted in the 2003 standard, so our techniques and examples will work for many years to come.