Overview
For my PhD research, I am doing empirical studies on a corpus of Java
applications to determine the extent to which programmers follow certain design principles. The design principles I am interested
in are described in the context of the object-oriented paradigm by John Lakos in his book "Large-Scale
C++ Software Design" [Lak96], though the origin of these principles
can be traced back to Dijkstra [Dij68b] and Parnas [Par74]. I
concentrate on these particular principles because I believe there are
strong arguments linking them to specific, important software quality
attributes. The evidence I have collected so far indicates that these design principles are not
widely-followed.
One might argue that we already know that, for the most part,
programmers don't follow design principles. Foote and Yoder [FY00] have
stated the most frequently deployed software architecture is the Big
Ball of Mud; Parnas [Par96] has said most programs are not
well-structured; Wirth [Wir96] has said most programs are not modular,
rather they are monolithic. The problem I have with all these claims is that
they are not supported by scientific evidence -- rather they are
based on casual observation, and are subjective judgments. Our knowledge in software engineering ought
to be rooted in science, since engineering is the application of
scientific principles to practical ends. More importantly, we can be
more assured of the truth of knowledge supported by scientific evidence,
than knowledge supported only by hype, fashion, folklore or anecdotal
evidence.
In the remainder of this document I briefly describe some of the
research I have done so far. A more detailed (or rigorous) account of
this research can be found in these papers ("An Empirical Study of
Cycles among Classes in Java" and "A
Simple Metric for Package Design Quality"). There is also a PowerPoint
version of this document that I have used in talks given
to industry professionals available here.
1.0 Preliminaries
A Java program comprises a number of types. That is
to say, during the course of a Java program's execution the types that
it comprises are loaded into memory, and instances of these types
reflect its runtime state. Some of the types that comprise a program are
declared in its source files; others are declared in external libraries
(e.g. the standard Java API). Since Java is a strongly typed language,
there are compilation dependencies among the types that comprise a
program. If we were to depict a typical Java application as a directed
graph, where the nodes in the graph represent types, and the edges
represent compilation dependencies, we might get a graph that looks
something like that of Figure 1.

Figure 1: A typical Java application
What generalizations can we make about the graph of a typical Java
program?
- Externally-declared types cannot depend on source-declared
types. If externally-declared types did depend on source-declared
types then these externally-defined types would not be truly
external to our program. More concretely, if the Java API or (any
other external libraries) depended on the user programs that utilize
them then the maintainers of these libraries would not be able to
compile them without all of the programs that used them!
- The graph is connected. In a connected graph there is no way
to partition the nodes of the graph without having edges cross the
partitions. To understand why the graph of a program must be connected
consider what is said by Stevens et al. in their seminal paper on
coupling and cohesion: "Modules must at least pass data or they cannot
functionally be a part of a single system. Thus connections that pass
data are a necessary minimum" [SMC74]. Consider also what is said by
Fowler: "You can break a program into modules, but these modules will
need to communicate in some way -- otherwise, you’d just have multiple
programs" [Fow01]. In essence, a type that comprises a program must
communicate (and therefore depend on) at least on other type that also
comprises the program. If it does not, then it is unlikely that it is
truly part of the same program.
- Usually the source-declared type subgraph is also
connected. Consider just the subgraph of a program that contains nodes
that represent source-declared types and the edges between these nodes.
It is often the case that this subgraph is connected. The argument for
why this is connected is similar to that for the whole graph being
connected but with one additional detail. The additional detail is that
externally-defined types are usually not sufficient in themselves allow
communication between source-defined types. Somewhere along the lines a
source defined type connects other source-defined types that are
otherwise only coupled to externally-defined types. Also, the types
declared in source defined types must be instantiated somewhere, and we
have already argued that that they cannot be instantiated in externally
defined types.
For the purposes of considering program structure,
or design, it is often useful to consider only the types
declared in the source files of an application. After all, as the
developers of a program the only types we can modify are those declared
in the application’s source files. We can alter the dependencies these
types have on one another, and on externally-declared types, but we
cannot (easily or sensibly) alter the dependencies externally-defined types have on
one-another. Often we don’t even have the source code for these
externally-defined types.
Another reason to exclude externally-declared types
from our discussions about the design of any given program is that we
may consider coupling (from source-declared types) to
externally-declared types to be more innocuous than coupling to
source-declared types [Lak96, Ber93]. We may consider coupling to
externally-declared types as relatively innocuous because:
-
Externally-declared types are stable and
proven. By making a decision to use externally-declared types we
are (albeit implicitly) assuming that these types are tested and
proven. There is little likelihood these types will contain bugs and
there is little likelihood the interfaces to these types will
change. With respect to change we might say that externally defined
types are more stable than types declared in the source files of our
application (see [Mar96b]).
-
Externally-declared types are well
understood. Most (good) Java developers have a thorough
understanding of the external libraries they use, especially the
standard API. When a developer sees a source-declared type that
makes extensive use of java.lang.String he (or she) will likely have
a better chance of understanding it than if it uses a custom string
type (e.g., myApp.MyString).
Correspondingly, changes a developer
makes to a source-declared type that only depends on
externally-declared (particularly "standard") types will likely be
faster and less prone to error.
-
Externally-declared types are ubiquitously
available. This is especially true of the API. If we declare a
type that only depends on types declared in the Java API then we can
be sure that we can lift this type -- and this type alone -- for use
in the context of another system. If, on the other hand, we have a
source-declared type that depends on other source-declared types,
and we want to reuse it in the context of another system we are
faced with a problem. Either we have to lift the other
source-declared types on which it depends (and continue this process
to its transitive closure); or we have to modify the type to
eliminate the dependency. Neither solution is particularly good.
In terms of modifying code: studies have shown that
even the smallest, most trivial change to source code can cause bugs
(see [SB89]). Martin outlines some other good reasons to avoid reuse by
"code copying" (see [Mar96a]). In terms of copying a source file, and
all the source files on which it transitively depends: Lakos states that
"in order for a […] subsystem to be reused successfully it must not be
tied to a large block of unnecessary code" [Lak96, p.14]. We don’t want
the "subsystems" that comprise a software system to be tied to
unnecessary blocks of code because when deploy them in the context of a
new system we also have to deploy all this extra, unnecessary code. This
makes the new system more complicated because there are more source
files to understand. This extra code (on which the subsystem we have
reused depends) also has the potential to be the source of bugs in our
new system.
2.0 Shape
So far we have represented programs as directed
graphs, the nodes of which are types and the edges of which are
compilation dependencies. I have also argued that it is reasonable to
consider just the types declared in the source files of a program on the
basis that we can only modify the types declared in source files and
that coupling to externally defined types is more innocuous than
coupling to types declared in source files. I have argued that it is
often the case that the graph of just source-declared types is
connected. With all this in mind we will now examine some
less-well-known, but important design principles.
In his book "Large-Scale C++ Software Design"
[Lak96], John Lakos presents graphs in the style of that shown in Figure
1 in order to discuss the overall "shape" of a software system, and its
effect on quality. The graphs of Figure 2 represent two different
software systems, or equivalently, we can think of them as two different
designs of the same system. Assuming all other things are equal (i.e. the
cohesion, naming, operations available on, encapsulation, and sizes the
types involved), which system has the better structure and why?

Figure 2:
"Shapes" of two comparable software systems
Lakos would argue that the system represented by Figure 2(a) has a
better structure than that represented by the graph of 2(b) on the basis
of:
-
Reuse. In order to reuse a type involved in a
cycle, i.e. to deploy that type in the context of a new system and
for it to be able to compile, we must also deploy all the types it
transitively depends on. In a cyclic system a type transitively
depends on all the types it is in a cycle with. In acyclic systems
there are always types at the bottom of the graph that have very
small transitive dependencies and are good candidates for reuse. (Remember
the discussion above where we argued that we don’t want to break
dependencies by modifying source code).
-
Understanding. Many researchers have claimed
cycles are detrimental to understanding. The argument is that to
understand a type involved in the cycle we potentially have to
understand every other type in the cycle. In types not involved in
cycles, to understand it we need only understand all the types on
which it transitively depends. Lakos says "… you have probably been
in a situation where you were looking at a software system for the
first time and you could not seem to find a reasonable starting
point or a piece of the system that made sense on its own. Not being
able to understand or use any part of a system independently is a
symptom of a cyclically dependent design" [Lak96, p.3]. Fowler says
"Such systems [those with cyclic dependencies] are harder to
understand because you have to go around the cycle many times"
[Fow01].
-
Testing. Types that are involved in cycles
inherently have more dependencies than those that not involved in cycles.
Every type in a cycle transitively depends on every other type in
that cycle. Because types involved in cycles have more dependencies
they likely have more interactions with other types. This makes unit
testing these classes more difficult because unit testing is meant
to be about testing a class in isolation, independently from its
environment. Also, cycles among types inhibit integration testing --
particularly it makes determining an integration test order
difficult. In a cycle there is no good starting point for
integration testing, whereas in acyclic systems we can test from the
bottom up.
(A more thorough discussion of the effect of cyclic
dependencies on reuse, testing and understanding can be found
here.)
A key observation of the graphs of Figure 2 is that
(b) is cyclic while (a) is "layered" or "hierarchical". The benefits of
a hierarchical structure were first discussed by Dijkstra in his paper
structure on the T.H.E Operating System [Dij68b]. The benefits of a
hierarchical structure were further discussed by Parnas in his seminal
paper [Par72]. In a later work Parnas has also cautioned on the problem
of cyclic dependencies [Par78]. Booch has said in the context of object
oriented program that all well-structured OO architectures are layered
[Boo95, p.53].
So now we’ve established systems with "layered" or
"hierarchical" structures are "better" than those with cyclic
structures, which is the best of the three hierarchically structured
systems whose graphs are depicted in Figure 3? Again assume that all
other things are equal (i.e. cohesion, size, etc of the types in
involved).

Figure 3: Different "Shapes" of three layered systems
Lakos [Lak96, p.195] ranks the systems of Figure 3 from best to worst
as: (c), (b), (a). He argues that (a) is inflexible w.r.t. testing,
reuse and parallel development. In (a), there is only one type that can
be tested completely independently of any other (source-declared) types.
Compare this to (b) where there are 4 types that can be tested
completely independently of any other (source-declared) types. In (a)
there is only one order to proceed for integration testing, whereas in
(b) we can test any one of the bottom types first, and any one of the
bottom types second and so on. In terms of parallel development, the
arguments are similar to that for integration testing. In essence we can
develop 4 of the types in (b) in parallel (the ones at the bottom) whereas we
can only develop the types in (a) in the order in which they are
integration tested -- from the bottom up. In terms of reuse, the types in
(b), on average, transitively depend on less other types than those in
(a). While Lakos ranks (c) as the best, according to our prior
discussion, in reality it is a very unlikely architecture because the
dependencies between source types usually form a connected graph.
The rub is, assuming all other things are equal
(e.g. type-safeness, cohesion, abstraction, sizes and encapsulation of
types), we would prefer to structure our systems as flatter rather than
taller hierarchies. To be clear about what I mean by "hierarchy", I mean
directed acyclic graphs -- i.e. a graph that contains no cycles.
3.0 Measurement
So what we
want is for our programs to have flatter rather than taller layered
(acyclic) structure. How can we measure the extent to which a real Java
application exhibits these two characteristics? I have devised a metric,
Class Reachability Set Size (CRSS), which can be used to
gauge the extent to which a
system is flat and acyclic. Simply put, the CRSS for a class is the
total number of classes it transitively depends on. The computation of CRSS
for the source-declared types comprising the program of Figure 4 is shown in Table 1.

Figure 4: Dependencies between source-declared types in a (small) Java
program
| Node | Reachability Set | CRSS |
| a | {a, b, e, c, f, d, g, h, i} | 9 |
| b | {b, e} | 2 |
| c | {c, f, g, h, i} |
5 |
| d | {d, f, g, h, i} |
5 |
| e | {e} | 1 |
| f | {f, g, h ,i} | 4 |
| g | {g, h ,i} | 3 |
| h | {h, i, g} | 3 |
| i | {i, g, h} | 3 |
Table 1: Computation of CRSS for the program of
Figure 4.
Though the system of Figure 4 is small enough to visually gauge how flat
and cyclic it is, it is not possible to do this will real, large
systems. Additionally, tables of CRSS values (like Table 1) will be too
big to give us a good idea of the distribution of CRSS values for the
system. We need some way of summarising the distribution of CRSS values
that will scale for large software systems. The approach I take is to
produce a histogram of CRSS values. A histogram for the data in Table 1
is shown in Figure 5.

Figure 5: CRSS Histogram for data in Table 1.
The histogram of Figure 5 doesn't really illustrate what I want, because
the system is too small. What I want is to be able to show a "good" and
a "bad" CRSS distribution. Netbeans 4.1 (Figure 6) has a "good" CRSS
distribution, whereas Azureus has a "bad" CRSS distribution. The
Netbeans CRSS distribution is good, because most of the classes in the
system have small transitive dependencies (i.e., low CRSS values). The
distribution of CRSS exhibited by Netbeans is usually indicative of a
flat, relatively acyclic dependency graph -- relatively few classes have
large transitive dependencies. A "bad" CRSS distribution is shown in
Figure 7. It is bad because a large proportion of the classes in the
system have large CRSS values, therefore have high-transitive
dependencies. In fact more than half the classes that comprise Azureus
transitively depend on more than 1300 classes. The type of distribution
exhibited by Azureus is often indicative of tall or very cyclic systems.
(A more thorough explanation of CRSS distributions (with empirical data)
is given
here).

Figure 6: "Good" CRSS Distribution

Figure 7: "Bad" CRSS Distribution
You might wonder why there is a big gap in the Azureus distribution
(from 51-1300). It's due to a large number of classes involved in one
big cycle. The graph-theoretic term for a bunch of classes that are all
mutually reachable, or all cyclically dependent with one another is a
strongly connected component. I have tried to illustrate this with a
dependency graph and a corresponding histogram all in Figure 8.

Figure 8: Dependency graph with large SCC and corresponding CRSS histogram.
4.0 Concluding Remarks
In this document I have tried to summarise some of the research that I have done as part
of my PhD. It turns out that many real (commercial and open source) systems have lousy CRSS distributions and
contain many large SCCs.
At the moment I'm looking at refactoring techniques for improving the structure of such systems (see
here). I'm
also looking at empirically establishing a connection between flat, acyclic systems and specific quality
attributes because as Fenton and Pleeger [FP96] note, the relationships between
a design principles and specific external
quality attributes have so rarely been established.
(Update 2006-05-29: I have made a tool available
here for collecting these metrics. It can be
used to see what the CRSS distribution and SCCs are for a project you're
working on. You might even like to email me the results!)
(Update 2006-05-31: I have made contact with, and received responses
with the ArgoUML developers and Azureus developers. You can view what
they have to say about the designs of their systems w.r.t. the stuff on
this page
here and
here.)
(Another Update 2006-05-31: I have added some notes on the
design and
evolution of JUnit.)
(Update 2006-06-01: Some
discussion on the jEdit mailing list.)
(Update 2006-10-12:
Post on the Soot mailing list about the structure of Soot.)
5.0 References
[Boo95] Grady Booch. Object solutions: managing the object-oriented
project. Addison Wesley Longman
Publishing Co., Inc., Redwood City, CA, USA, 1995.
[Dij68b] Edsger W. Dijkstra. The structure of the THE-multiprogramming
system. Commun. ACM,
11(5):341--346, 1968.
[FP96] Norman E. Fenton and Shari Lawrence Pfleeger. Software
Metrics: A Rigorous and Practical Approach. International Thomson
Computer Press, Boston, MA, USA, 1996.
[FY00] Brian Foote and Joseph W. Yoder. Big ball of mud. In N.
Harrison, B. Foote, and H. Rohnert, editors, Pattern Languages of
Program Design, volume 4, pages 654--692. Addison Wesley, 2000.
[Fow01] Martin Fowler. Reducing coupling. IEEE Software, 18(4):102--104,
2001.
[Lak96] John Lakos. Large-scale C++ software design. Addison Wesley
Longman Publishing Co., Inc.,
Redwood City, CA, USA, 1996.
[Mar96a] R. C. Martin. Granularity. C++ Report, 8(10):57--62,
November-December 1996.
[Mar96b] Robert C. Martin. Stability. The C++ Report, 1996.
[Par72] D. L. Parnas. On the criteria to be used in decomposing systems
into modules. Commun. ACM,
15(12):1053--1058, 1972.
[Par74] David Lorge Parnas. On a ‘buzzword’: Hierarchical
structure. In IFIP Congress, pages 336--339, 1974.
[Par78] David L. Parnas. Designing software for ease of extension and
contraction. In ICSE ’78: Proceedings of the 3rd international
conference on Software engineering, pages 264--277, Piscataway,
NJ, USA, 1978. IEEE Press.
[Par96] David Lorge Parnas. Why software jewels are rare.
Computer, 29(2):57--60, 1996.
[SB89] M. Stark and E. Booth. Using Ada to maximize verbatim software
reuse. In TRI-Ada ’89:
Proceedings of the conference on Tri-Ada ’89, pages 278--290, New York,
NY, USA, 1989. ACM
Press.
[SMC74] W. P. Stevens, G. J. Myers, and L. L. Constantine. Structured
design. IBM Syst. J., 13(2):115--
139, 1974.
[Wir95] Niklaus Wirth. A plea for lean software. Computer,
28(2):64--68, 1995.
|