For my PhD research, I am doing empirical studies on a corpus of Java applications to determine the extent to which programmers follow certain design principles. The design principles I am interested in are described in the context of the object-oriented paradigm by John Lakos in his book "Large-Scale C++ Software Design" [Lak96], though the origin of these principles can be traced back to Dijkstra [Dij68b] and Parnas [Par74]. I concentrate on these particular principles because I believe there are strong arguments linking them to specific, important software quality attributes. The evidence I have collected so far indicates that these design principles are not widely-followed.

One might argue that we already know that, for the most part, programmers don't follow design principles. Foote and Yoder [FY00] have stated the most frequently deployed software architecture is the Big Ball of Mud; Parnas [Par96] has said most programs are not well-structured; Wirth [Wir96] has said most programs are not modular, rather they are monolithic. The problem I have with all these claims is that they are not supported by scientific evidence -- rather they are based on casual observation, and are subjective judgments. Our knowledge in software engineering ought to be rooted in science, since engineering is the application of scientific principles to practical ends. More importantly, we can be more assured of the truth of knowledge supported by scientific evidence, than knowledge supported only by hype, fashion, folklore or anecdotal evidence.

In the remainder of this document I briefly describe some of the research I have done so far. A more detailed (or rigorous) account of this research can be found in these papers ("An Empirical Study of Cycles among Classes in Java" and "A Simple Metric for Package Design Quality"). There is also a PowerPoint version of this document that I have used in talks given to industry professionals available here.


1.0 Preliminaries

A Java program comprises a number of types. That is to say, during the course of a Java program's execution the types that it comprises are loaded into memory, and instances of these types reflect its runtime state. Some of the types that comprise a program are declared in its source files; others are declared in external libraries (e.g. the standard Java API). Since Java is a strongly typed language, there are compilation dependencies among the types that comprise a program. If we were to depict a typical Java application as a directed graph, where the nodes in the graph represent types, and the edges represent compilation dependencies, we might get a graph that looks something like that of Figure 1.

Figure 1: A typical Java application

What generalizations can we make about the graph of a typical Java program?

  • Externally-declared types cannot depend on source-declared types. If externally-declared types did depend on source-declared types then these externally-defined types would not be truly external to our program. More concretely, if the Java API or (any other external libraries) depended on the user programs that utilize them then the maintainers of these libraries would not be able to compile them without all of the programs that used them!
  • The graph is connected. In a connected graph there is no way to partition the nodes of the graph without having edges cross the partitions. To understand why the graph of a program must be connected consider what is said by Stevens et al. in their seminal paper on coupling and cohesion: "Modules must at least pass data or they cannot functionally be a part of a single system. Thus connections that pass data are a necessary minimum" [SMC74]. Consider also what is said by Fowler: "You can break a program into modules, but these modules will need to communicate in some way -- otherwise, you’d just have multiple programs" [Fow01]. In essence, a type that comprises a program must communicate (and therefore depend on) at least on other type that also comprises the program. If it does not, then it is unlikely that it is truly part of the same program.
  • Usually the source-declared type subgraph is also connected. Consider just the subgraph of a program that contains nodes that represent source-declared types and the edges between these nodes. It is often the case that this subgraph is connected. The argument for why this is connected is similar to that for the whole graph being connected but with one additional detail. The additional detail is that externally-defined types are usually not sufficient in themselves allow communication between source-defined types. Somewhere along the lines a source defined type connects other source-defined types that are otherwise only coupled to externally-defined types. Also, the types declared in source defined types must be instantiated somewhere, and we have already argued that that they cannot be instantiated in externally defined types.

For the purposes of considering program structure, or design, it is often useful to consider only the types declared in the source files of an application. After all, as the developers of a program the only types we can modify are those declared in the application’s source files. We can alter the dependencies these types have on one another, and on externally-declared types, but we cannot (easily or sensibly) alter the dependencies externally-defined types have on one-another. Often we don’t even have the source code for these externally-defined types.

Another reason to exclude externally-declared types from our discussions about the design of any given program is that we may consider coupling (from source-declared types) to externally-declared types to be more innocuous than coupling to source-declared types [Lak96, Ber93]. We may consider coupling to externally-declared types as relatively innocuous because:

  1. Externally-declared types are stable and proven. By making a decision to use externally-declared types we are (albeit implicitly) assuming that these types are tested and proven. There is little likelihood these types will contain bugs and there is little likelihood the interfaces to these types will change. With respect to change we might say that externally defined types are more stable than types declared in the source files of our application (see [Mar96b]).

  2. Externally-declared types are well understood. Most (good) Java developers have a thorough understanding of the external libraries they use, especially the standard API. When a developer sees a source-declared type that makes extensive use of java.lang.String he (or she) will likely have a better chance of understanding it than if it uses a custom string type (e.g., myApp.MyString). Correspondingly, changes a developer makes to a source-declared type that only depends on externally-declared (particularly "standard") types will likely be faster and less prone to error.

  3. Externally-declared types are ubiquitously available. This is especially true of the API. If we declare a type that only depends on types declared in the Java API then we can be sure that we can lift this type -- and this type alone -- for use in the context of another system. If, on the other hand, we have a source-declared type that depends on other source-declared types, and we want to reuse it in the context of another system we are faced with a problem. Either we have to lift the other source-declared types on which it depends (and continue this process to its transitive closure); or we have to modify the type to eliminate the dependency. Neither solution is particularly good.

In terms of modifying code: studies have shown that even the smallest, most trivial change to source code can cause bugs (see [SB89]). Martin outlines some other good reasons to avoid reuse by "code copying" (see [Mar96a]). In terms of copying a source file, and all the source files on which it transitively depends: Lakos states that "in order for a […] subsystem to be reused successfully it must not be tied to a large block of unnecessary code" [Lak96, p.14]. We don’t want the "subsystems" that comprise a software system to be tied to unnecessary blocks of code because when deploy them in the context of a new system we also have to deploy all this extra, unnecessary code. This makes the new system more complicated because there are more source files to understand. This extra code (on which the subsystem we have reused depends) also has the potential to be the source of bugs in our new system.


2.0 Shape

So far we have represented programs as directed graphs, the nodes of which are types and the edges of which are compilation dependencies. I have also argued that it is reasonable to consider just the types declared in the source files of a program on the basis that we can only modify the types declared in source files and that coupling to externally defined types is more innocuous than coupling to types declared in source files. I have argued that it is often the case that the graph of just source-declared types is connected. With all this in mind we will now examine some less-well-known, but important design principles.

In his book "Large-Scale C++ Software Design" [Lak96], John Lakos presents graphs in the style of that shown in Figure 1 in order to discuss the overall "shape" of a software system, and its effect on quality. The graphs of Figure 2 represent two different software systems, or equivalently, we can think of them as two different designs of the same system. Assuming all other things are equal (i.e. the cohesion, naming, operations available on, encapsulation, and sizes the types involved), which system has the better structure and why?

Figure 2: "Shapes" of two comparable software systems

Lakos would argue that the system represented by Figure 2(a) has a better structure than that represented by the graph of 2(b) on the basis of:

  • Reuse. In order to reuse a type involved in a cycle, i.e. to deploy that type in the context of a new system and for it to be able to compile, we must also deploy all the types it transitively depends on. In a cyclic system a type transitively depends on all the types it is in a cycle with. In acyclic systems there are always types at the bottom of the graph that have very small transitive dependencies and are good candidates for reuse. (Remember the discussion above where we argued that we don’t want to break dependencies by modifying source code).

  • Understanding. Many researchers have claimed cycles are detrimental to understanding. The argument is that to understand a type involved in the cycle we potentially have to understand every other type in the cycle. In types not involved in cycles, to understand it we need only understand all the types on which it transitively depends. Lakos says "… you have probably been in a situation where you were looking at a software system for the first time and you could not seem to find a reasonable starting point or a piece of the system that made sense on its own. Not being able to understand or use any part of a system independently is a symptom of a cyclically dependent design" [Lak96, p.3]. Fowler says "Such systems [those with cyclic dependencies] are harder to understand because you have to go around the cycle many times" [Fow01].

  • Testing. Types that are involved in cycles inherently have more dependencies than those that not involved in cycles. Every type in a cycle transitively depends on every other type in that cycle. Because types involved in cycles have more dependencies they likely have more interactions with other types. This makes unit testing these classes more difficult because unit testing is meant to be about testing a class in isolation, independently from its environment. Also, cycles among types inhibit integration testing -- particularly it makes determining an integration test order difficult. In a cycle there is no good starting point for integration testing, whereas in acyclic systems we can test from the bottom up.

(A more thorough discussion of the effect of cyclic dependencies on reuse, testing and understanding can be found here.)

A key observation of the graphs of Figure 2 is that (b) is cyclic while (a) is "layered" or "hierarchical". The benefits of a hierarchical structure were first discussed by Dijkstra in his paper structure on the T.H.E Operating System [Dij68b]. The benefits of a hierarchical structure were further discussed by Parnas in his seminal paper [Par72]. In a later work Parnas has also cautioned on the problem of cyclic dependencies [Par78]. Booch has said in the context of object oriented program that all well-structured OO architectures are layered [Boo95, p.53].

So now we’ve established systems with "layered" or "hierarchical" structures are "better" than those with cyclic structures, which is the best of the three hierarchically structured systems whose graphs are depicted in Figure 3? Again assume that all other things are equal (i.e. cohesion, size, etc of the types in involved).


Figure 3: Different "Shapes" of three layered systems

Lakos [Lak96, p.195] ranks the systems of Figure 3 from best to worst as: (c), (b), (a). He argues that (a) is inflexible w.r.t. testing, reuse and parallel development. In (a), there is only one type that can be tested completely independently of any other (source-declared) types. Compare this to (b) where there are 4 types that can be tested completely independently of any other (source-declared) types. In (a) there is only one order to proceed for integration testing, whereas in (b) we can test any one of the bottom types first, and any one of the bottom types second and so on. In terms of parallel development, the arguments are similar to that for integration testing. In essence we can develop 4 of the types in (b) in parallel (the ones at the bottom) whereas we can only develop the types in (a) in the order in which they are integration tested -- from the bottom up. In terms of reuse, the types in (b), on average, transitively depend on less other types than those in (a). While Lakos ranks (c) as the best, according to our prior discussion, in reality it is a very unlikely architecture because the dependencies between source types usually form a connected graph.

The rub is, assuming all other things are equal (e.g. type-safeness, cohesion, abstraction, sizes and encapsulation of types), we would prefer to structure our systems as flatter rather than taller hierarchies. To be clear about what I mean by "hierarchy", I mean directed acyclic graphs -- i.e. a graph that contains no cycles.


3.0 Measurement

So what we want is for our programs to have flatter rather than taller layered (acyclic) structure. How can we measure the extent to which a real Java application exhibits these two characteristics? I have devised a metric, Class Reachability Set Size (CRSS), which can be used to gauge the extent to which a system is flat and acyclic. Simply put, the CRSS for a class is the total number of classes it transitively depends on. The computation of CRSS for the source-declared types comprising the program of Figure 4 is shown in Table 1.

Figure 4: Dependencies between source-declared types in a (small) Java program

NodeReachability SetCRSS
a{a, b, e, c, f, d, g, h, i}9
b{b, e}2
c{c, f, g, h, i} 5
d{d, f, g, h, i} 5
f{f, g, h ,i}4
g{g, h ,i}3
h{h, i, g}3
i{i, g, h}3

Table 1: Computation of CRSS for the program of Figure 4.

Though the system of Figure 4 is small enough to visually gauge how flat and cyclic it is, it is not possible to do this will real, large systems. Additionally, tables of CRSS values (like Table 1) will be too big to give us a good idea of the distribution of CRSS values for the system. We need some way of summarising the distribution of CRSS values that will scale for large software systems. The approach I take is to produce a histogram of CRSS values. A histogram for the data in Table 1 is shown in Figure 5.


Figure 5: CRSS Histogram for data in Table 1.

The histogram of Figure 5 doesn't really illustrate what I want, because the system is too small. What I want is to be able to show a "good" and a "bad" CRSS distribution. Netbeans 4.1 (Figure 6) has a "good" CRSS distribution, whereas Azureus has a "bad" CRSS distribution. The Netbeans CRSS distribution is good, because most of the classes in the system have small transitive dependencies (i.e., low CRSS values). The distribution of CRSS exhibited by Netbeans is usually indicative of a flat, relatively acyclic dependency graph -- relatively few classes have large transitive dependencies. A "bad" CRSS distribution is shown in Figure 7. It is bad because a large proportion of the classes in the system have large CRSS values, therefore have high-transitive dependencies. In fact more than half the classes that comprise Azureus transitively depend on more than 1300 classes. The type of distribution exhibited by Azureus is often indicative of tall or very cyclic systems.

(A more thorough explanation of CRSS distributions (with empirical data) is given here).


Figure 6: "Good" CRSS Distribution


Figure 7: "Bad" CRSS Distribution

You might wonder why there is a big gap in the Azureus distribution (from 51-1300). It's due to a large number of classes involved in one big cycle. The graph-theoretic term for a bunch of classes that are all mutually reachable, or all cyclically dependent with one another is a strongly connected component. I have tried to illustrate this with a dependency graph and a corresponding histogram all in Figure 8.


Figure 8: Dependency graph with large SCC and corresponding CRSS histogram.



4.0 Concluding Remarks

In this document I have tried to summarise some of the research that I have done as part of my PhD. It turns out that many real (commercial and open source) systems have lousy CRSS distributions and contain many large SCCs. At the moment I'm looking at refactoring techniques for improving the structure of such systems (see here). I'm also looking at empirically establishing a connection between flat, acyclic systems and specific quality attributes because as Fenton and Pleeger [FP96] note, the relationships between a design principles and specific external quality attributes have so rarely been established.

(Update 2006-05-29: I have made a tool available here for collecting these metrics. It can be used to see what the CRSS distribution and SCCs are for a project you're working on. You might even like to email me the results!)

(Update 2006-05-31: I have made contact with, and received responses with the ArgoUML developers and Azureus developers. You can view what they have to say about the designs of their systems w.r.t. the stuff on this page here and here.)  

(Another Update 2006-05-31: I have added some notes on the design and evolution of JUnit.)

(Update 2006-06-01: Some discussion on the jEdit mailing list.)


(Update 2006-10-12: Post on the Soot mailing list about the structure of Soot.)


5.0 References

[Boo95] Grady Booch. Object solutions: managing the object-oriented project. Addison Wesley Longman
Publishing Co., Inc., Redwood City, CA, USA, 1995.
[Dij68b] Edsger W. Dijkstra. The structure of the THE-multiprogramming system. Commun. ACM,
11(5):341--346, 1968.
[FP96] Norman E. Fenton and Shari Lawrence Pfleeger. Software Metrics: A Rigorous and Practical Approach. International Thomson Computer Press, Boston, MA, USA, 1996.
[FY00] Brian Foote and Joseph W. Yoder. Big ball of mud. In N. Harrison, B. Foote, and H. Rohnert, editors, Pattern Languages of Program Design, volume 4, pages 654--692. Addison Wesley, 2000.
[Fow01] Martin Fowler. Reducing coupling. IEEE Software, 18(4):102--104, 2001.
[Lak96] John Lakos. Large-scale C++ software design. Addison Wesley Longman Publishing Co., Inc.,
Redwood City, CA, USA, 1996.
[Mar96a] R. C. Martin. Granularity. C++ Report, 8(10):57--62, November-December 1996.
[Mar96b] Robert C. Martin. Stability. The C++ Report, 1996.
[Par72] D. L. Parnas. On the criteria to be used in decomposing systems into modules. Commun. ACM,
15(12):1053--1058, 1972.
[Par74] David Lorge Parnas. On a ‘buzzword’: Hierarchical structure. In IFIP Congress, pages 336--339, 1974.
[Par78] David L. Parnas. Designing software for ease of extension and contraction. In ICSE ’78: Proceedings of the 3rd international conference on Software engineering, pages 264--277, Piscataway,
NJ, USA, 1978. IEEE Press.
[Par96] David Lorge Parnas. Why software jewels are rare. Computer, 29(2):57--60, 1996.
[SB89] M. Stark and E. Booth. Using Ada to maximize verbatim software reuse. In TRI-Ada ’89:
Proceedings of the conference on Tri-Ada ’89, pages 278--290, New York, NY, USA, 1989. ACM
[SMC74] W. P. Stevens, G. J. Myers, and L. L. Constantine. Structured design. IBM Syst. J., 13(2):115--
139, 1974.
[Wir95] Niklaus Wirth. A plea for lean software. Computer, 28(2):64--68, 1995.

Home Research Corpus Software Papers Other