Recently the committee that is preparing the next standard of C++, known as C++20, approved the inclusion of modules. Modules are good™ but they pose some interesting challenges to implementors and users. In this post I will ruminate a bit about the challenges that modules have posed for Fortran.
With my colleagues, I often claim that Fortran is an interesting case study in programming language evolution. Fortran was initially created in 1957 and since its standardisation it has evolved a lot. These 60 years have left lots of scars in the language in the form of quirky syntax and awkward constructs. That is, of course, in the eyes of today's designs: those constructs probably made a lot of sense when they were proposed.
Fortran had to evolve from a model of "a bunch of statements that make up a program" into a more structured system with a program plus subroutines and functions (this was standardised in Fortran 66). This was done by defining "program units", namely the main program unit, the subroutine program unit, the function program unit and the block data program unit (whose purpose would take too long to explain here). A Fortran 66 program is then a collection of one main program unit plus zero or more program units of the other kinds.
This model works well, but it was designed so that program units were independent of each other. This wasn't a problem in Fortran 66 and 77 because the language established conceptually simple communication mechanisms between program units: either global variables (via a thing called common blocks) or arguments, which were always passed by reference to the data element being passed. However, because a program unit knew nothing about the others, this lack of information precluded basic things like checking that the arguments to a call are appropriate for the function or subroutine being called.
These limiting factors, along with probably a desire to have better modularisation capabilities, led to the addition of a new program unit called module in Fortran 90. Modules came with many good features: they allowed grouping functions and subroutines that had related purposes in a module. They also allowed declaring global variables without relying on the effective yet fragile mechanism of common blocks. Another important feature of Fortran 90 was the introduction of explicit interfaces. Functions could now be declared to have an interface, which made type checking possible. Not only that, it also allowed passing more complex data types like pointers or arrays whose size can be queried at runtime (Fortran 77 didn't have a standard notion of pointer and arrays were more limited).
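To make the contrast concrete, here is a minimal sketch of what an explicit interface buys. The module name array_tools and the function mean_of are hypothetical, made up for illustration. Because mean_of lives in a module, its interface is explicit: the compiler can check calls against it, and the dummy argument can be an assumed-shape array whose size is queried at run time.

```fortran
module array_tools            ! hypothetical module name
  implicit none
contains
  ! values is an assumed-shape array: the caller passes no explicit length.
  ! This sketch assumes the array is not empty.
  function mean_of(values) result(m)
    real, intent(in) :: values(:)
    real :: m
    m = sum(values) / real(size(values))   ! size() queries the extent at run time
  end function mean_of
end module array_tools
```

With this in place, a call such as mean_of(3), which passes an integer scalar where a real array is expected, is rejected at compile time by any program unit that uses the module.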
Modules brought Fortran 90 to a more modern world. Later, in Fortran 2008, they were extended with another program unit called submodule. The original modules of Fortran 90 forced the programmer to implement everything in a single module. This made it a bit difficult for users to further modularise the module implementation. We won't be discussing submodules today. If you've followed C++20 modules, Fortran submodules are similar to C++20's module implementation partitions (interfaces cannot be partitioned in Fortran).
How modules work
Modules have two parts: a module specification part and a module subprogram part. The specification part states what this module has to offer (this is non-standard terminology). Each thing it offers can be public, that is, available to the users of the module, or private to the module itself. The subprogram part is used to implement the functions and subroutines, collectively known as the module procedures, offered by the module.
Program units, including module subprograms, can use modules. This brings the names (of functions, subroutines, variables, types, etc.) of what the module publicly offers into the current program unit. Because Fortran scoping is mostly flat, clashes may arise between different modules offering the same names. It is possible to restrict the set of names used and even to rename them to avoid collisions.
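As a small sketch of restricting and renaming, assume two hypothetical modules, circles and squares, that both offer a function named area. The names are invented for this example. The ONLY clause limits which names are brought in and the => syntax gives them a local name.

```fortran
module circles                 ! hypothetical module
  implicit none
contains
  function area(r) result(a)
    real, intent(in) :: r
    real :: a
    a = 3.14159265 * r * r
  end function area
end module circles

module squares                 ! hypothetical module
  implicit none
contains
  function area(side) result(a)
    real, intent(in) :: side
    real :: a
    a = side * side
  end function area
end module squares

program shapes
  ! ONLY restricts the imported names; => renames them locally.
  use circles, only: circle_area => area
  use squares, only: square_area => area
  implicit none
  print *, circle_area(1.0), square_area(2.0)
end program shapes
```

Without the renames, both modules would make the name area accessible in the program, and any reference to it would be ambiguous and rejected by the compiler.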
As an example, consider the definition of a module named basic_calculus that offers a constant named version and a function named add that adds two numbers.
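A minimal sketch of such a module could look like the following. The value given to version and the choice of integer arguments for add are assumptions made for illustration.

```fortran
module basic_calculus
  implicit none
  private                      ! names are private unless declared public

  ! Specification part: what the module offers.
  integer, parameter, public :: version = 1   ! illustrative value
  public :: add

contains

  ! Subprogram part: the module procedures.
  function add(a, b) result(c)
    integer, intent(in) :: a, b
    integer :: c
    c = a + b
  end function add

end module basic_calculus
```

The private statement makes everything private by default, so only version and add are visible to users of the module.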
Once this module has been defined, another program unit can use it.
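For instance, a main program could use it as sketched below. The program name calculator is hypothetical, and the module is repeated in a condensed form (with an illustrative value for version) only so that the example compiles on its own.

```fortran
! The module is repeated here so that this example is self-contained;
! it appears before the program unit that uses it.
module basic_calculus
  implicit none
  integer, parameter :: version = 1     ! illustrative value
contains
  function add(a, b) result(c)
    integer, intent(in) :: a, b
    integer :: c
    c = a + b
  end function add
end module basic_calculus

program calculator            ! hypothetical program name
  use basic_calculus          ! brings version and add into scope
  implicit none
  print *, "basic_calculus version", version
  print *, "2 + 3 =", add(2, 3)
end program calculator
```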
Files? What files?
So far we haven't discussed how we physically represent our Fortran program. It seems reasonable that we will want to use files (there are not many alternatives that may work for Fortran here, truth be told).
One option is to have our program in a single file. This works for small programs. The only constraint we have to fulfill is that the modules appear in the file before the program units that use them, as I mentioned above.
It turns out that for most interesting Fortran programs, a single file is untenable. So developers split them into many files. This is nothing special actually and it works. A file usually contains one or more program units. If we ignore for a moment the INCLUDE line (which is what hinders using other things than files, say a database of program units), a program unit is to be entirely contained in a single file. This includes modules, of course.
If we want to be able to write Fortran programs using several files, our Fortran compiler (actually the compiler and linker, dubbed translator in the Fortran standard) must support this scenario of separate compilation. This leads to two interesting facts: a) we need to be able to tell what the module is offering at the point where we use it; b) modules enforce an order, not only within a file between module definitions and their users but also between files that define modules and files that use those modules.
Issue b) above is nowadays deemed a build system concern. See this very interesting document by Kitware, the creators of CMake, explaining their approach to address this. Most of the complexity arises from the fact that if we change a file that defines a module we want to recompile all its users. Also, we can only tell what modules are defined by a source file after we process (scan/parse) it in some form, and this can be obscured even further by the use of a preprocessor (which the Fortran standard does not specify but many vendors support). So the build system is forced to do some sort of two-pass process.
Issue a) is definitely a concern for the compiler.
The module interface
We can call "what the module has to offer" the module interface. This module interface must be available every time we use a module.
Fortran compilers, in order to support separate compilation, store the module interface in an ancillary file, often with the extension .mod. So compiling a file containing the basic_calculus module above produces not only a file with compiled code (i.e. a file.o file) but also a basic_calculus.mod file. This file must be available when another program unit (in the same file or in another file) uses the module.
What goes in the module interface?
This is where things get a bit thorny both for Fortran compiler implementors and Fortran users: the standard does not define a format for the modules.
It may well be sensibly argued that the standard does not have to dictate a specific format. And I agree. Unfortunately the industry hasn't come together to define a common Fortran module format. Ever.
This means that each Fortran vendor provides their own module interface format (sometimes in binary form, sometimes in textual form). Some of these formats are incompatible between vendors and even incompatible between different compiler versions of the same vendor.
Is this a problem? Fortran is mostly used in the HPC community. In that community it is not uncommon to rebuild all software components from source, so in principle this wouldn't be a major problem. Unfortunately this complicates the distribution of proprietary software in binary form (usually optimised libraries). If a vendor wants to provide a Fortran module for its software component, it will have to provide one version for every supported compiler. This is unfortunate because newer versions of a compiler may become unusable with that component just because they can't read the old module files.
Other programming languages avoid this interoperability issue for one of two reasons: either the module interface is standardised or else there is only one vendor/implementation.
I hope C++20 doesn't repeat this mistake and that major vendors agree on a common module interface format.
Where is the module interface?
Another problem when solving a) is to define a way to find the module interface. Most programming languages specify a filesystem layout that supports modularisation. For instance, files implementing modules must have specific names or be found in specific directories.
Fortran does not define any of this. Vendors usually provide a way to specify where the module interface files have to be created. They also provide a way to tell the compiler where it can find the modules. That said, this is more a problem for build systems because Fortran has a global namespace (i.e. no hierarchy) for modules, so the compiler implementation is pretty straightforward here.
I think modules have been a good feature for Fortran even if they come with lots of challenges for the ecosystem. The standard sets very few constraints and I think the standard is fine in this regard.
However, I wish vendors had proposed a more constrained behaviour (e.g. in some de facto standard) that still fulfils the Standard's requirements. Say, for example, that a module must be implemented in a module source file with a specific file name pattern or directory layout, and that no other program units may appear in a module source file. Also, I wish vendors had agreed on a common module interface format which can be generated and consumed efficiently by any Fortran translator.
I assume the niche status of Fortran, basically limited to HPC, is why vendors are not motivated to collaborate and push forward common strategies that foster interoperability. I'm sure that would make the Fortran ecosystem thrive, but I also see that from an economic point of view it may not be worth the investment.