'Number of classes' as a size metric for OO Applications

In the research I'm involved in, it turns out that the most useful measure of "size" is "number of classes" (NOC). This is because we currently consider almost exclusively code written in object-oriented languages, and the structural properties we're interested in relate to the class graph of the application, that is, the graph formed by taking classes as vertices and relationships between classes as edges. In this context, the most common "size" metric, "lines of code" (LOC), just doesn't make sense.

This being the case, it is tempting to say that there are none of the problems with NOC as a size metric that there are with LOC [FP96]. Except, there are.

Let us restrict our discussion to Java. The first question we have to answer is, how do we count nested classes? These are, by definition [JLS (chapter 8)], classes that aren't "top level", that is, classes whose declaration occurs in the scope of another class declaration. Those classes that are not defined in the scope of another class are referred to as "top level classes".

The argument could be made that nested classes are conceptually part of their enclosing class, and so it doesn't make sense to count them separately from it. Another argument is that nested classes are in fact instantiated in the Java Virtual Machine in a manner no different to top level classes, and so they should be counted separately. There are certainly plenty of examples where a class is nested principally for namespace reasons, and could just as easily be made a top-level class. One way out of this is simply to have two metrics: "number of distinct classes" and "number of top level classes".

As I write this, I am bothered by the fact that this treats entities as only either nested or not nested. Perhaps we should distinguish the levels at which they are nested? So we have the number of 0-nested classes (a.k.a. top level classes), the number of 1-nested classes (classes whose enclosing class is a top level class), and so on (imagine some suitable recursive definition).
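The recursive definition can be sketched with reflection: java.lang.Class#getEnclosingClass() returns null for a top-level class, so the nesting level of a class is the number of steps needed to reach null. This is just a hypothetical illustration, not the measurement tool used here.

```java
// Sketch (assumed names): a class is k-nested if walking getEnclosingClass()
// reaches null after k steps.
public class NestingDepth {
    public static class Outer {              // 1-nested (enclosed by NestingDepth)
        public static class Inner {}         // 2-nested (enclosed by Outer)
    }

    /** Number of enclosing class declarations around c. */
    public static int depth(Class<?> c) {
        int d = 0;
        for (Class<?> e = c.getEnclosingClass(); e != null; e = e.getEnclosingClass()) {
            d++;
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(depth(NestingDepth.class)); // 0: top level
        System.out.println(depth(Outer.class));        // 1
        System.out.println(depth(Outer.Inner.class));  // 2
    }
}
```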

Now we must consider the effect of the Java Standard API. Any Java application must use part of this API (java.lang.Object at least), and usually an application will use many classes from the Standard API. Should we count such classes when determining application size? The argument against is that such classes are not part of the development effort of the application, and so they should not be counted. The argument for is that anyone wishing to comprehend the application's source code must take the API classes into consideration. This gives us another dimension along which to consider size, and on this dimension, (at least) one new metric, something like "number of user defined classes". Of course we can also combine this dimension with the previous one, and so have "number of user defined top level classes" and so on.
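For illustration, one crude way to make this split is by package name. The prefix list below is a simplifying assumption of mine (the real Standard API spans more packages, e.g. org.w3c.dom), not how any particular tool does it.

```java
// Hypothetical sketch: classify a class as Standard API or user code by
// its package name prefix. The prefix list is an assumption for this sketch.
public class Origin {
    static boolean isStandardApi(Class<?> c) {
        String name = c.getName();
        return name.startsWith("java.") || name.startsWith("javax.");
    }

    public static void main(String[] args) {
        System.out.println(isStandardApi(java.util.List.class)); // true
        System.out.println(isStandardApi(Origin.class));         // false
    }
}
```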

Then there is the question of third-party libraries. Yes, the Standard API is going to be part of any Java application, but many applications use other APIs as well, and perhaps those should be counted separately from user defined classes and Standard API classes. I'm of two minds over this one. While I'd like to be as precise as possible, the problem is that there are different levels of "third-party". For example, the JUnit API and the JArgs API differ somewhat in nature, as one is more widely used than the other. And both of them are different from an API used by the current project but developed by a previous project (and other variations). The difficulty here is in coming up with well-defined categories, so I don't, and just lump everything together as "third party".

Speaking of JUnit, when measuring size, should we count the many JUnit (and FIT, and similar) test classes? It's tempting to say no, but I worry about connections between test code and the "real" application code (e.g., some methods in the application code may be there solely to support testing). It also depends somewhat on what's being measured — the source code base or the deployed code? So in this case I think it is a matter of being careful in describing what "system" is actually being measured.

In fact, saying "class" is a bit of a misnomer. Java has four distinct "class-like" entities: class, enum, interface, and annotation. Does it make sense to treat them all as the same kind of thing when conceptually they are different? Furthermore, Java has the concept of "type", which, as well as the class-like entities mentioned, also includes primitive types. Now I can't at the moment imagine how it could be useful to determine application size in terms of the number of primitive types used, but nor will I claim the necessary omniscience to rule it out.
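The distinction between these entities is visible via reflection. Below is a sketch of a classifier (the names are my own, not from any real tool); note that an annotation also reports isInterface() as true, so the annotation check has to come first.

```java
// Sketch: classify Java's "class-like" entities (plus primitives) using
// java.lang.Class. Order matters: annotations are also interfaces.
public class KindOf {
    enum Kind { CLASS, ENUM, INTERFACE, ANNOTATION, PRIMITIVE }

    static Kind kindOf(Class<?> c) {
        if (c.isPrimitive())  return Kind.PRIMITIVE;
        if (c.isAnnotation()) return Kind.ANNOTATION;
        if (c.isInterface())  return Kind.INTERFACE;
        if (c.isEnum())       return Kind.ENUM;
        return Kind.CLASS;
    }

    public static void main(String[] args) {
        System.out.println(kindOf(String.class));                         // CLASS
        System.out.println(kindOf(Runnable.class));                       // INTERFACE
        System.out.println(kindOf(java.lang.annotation.Retention.class)); // ANNOTATION
        System.out.println(kindOf(int.class));                            // PRIMITIVE
    }
}
```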

I can't help wondering about exceptions. These are classes, "just" like any other class, except that they extend Throwable, are almost exclusively used in very specific places (throws and catch clauses), and almost always are very simple classes, e.g. about as simple as many enums. And enums are actually implemented as classes that extend java.lang.Enum, much as exceptions extend Throwable, except that's all hidden away by the compiler. In another reality, exceptions could have been implemented the same way. So I feel that they are sufficiently different in nature to "regular" classes that they should be distinguished. (It's now tempting to wonder about classes that extend Thread, but I won't...)
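Under this view, distinguishing exceptions (and, symmetrically, enums) reduces to a subtype test. A hypothetical sketch:

```java
// Sketch: treat any class assignable to Throwable as an exception; enums
// are, analogously, classes assignable to java.lang.Enum.
public class IsException {
    static boolean isException(Class<?> c) {
        return Throwable.class.isAssignableFrom(c);
    }

    public static void main(String[] args) {
        System.out.println(isException(IllegalStateException.class)); // true
        System.out.println(isException(String.class));                // false
        // the analogous test for enums:
        System.out.println(Enum.class.isAssignableFrom(java.lang.annotation.RetentionPolicy.class)); // true
    }
}
```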

This leaves us with the following kinds of "number of <entity type>" metrics. Since these are all counts, prefixing each name with some reference to "number of" seems somewhat redundant, so I don't.

Define application size as measured in terms of number of entities with the vector metric (P, SLC, SLI, SLE, SLA, SLEx, TPC, TPI, TPE, TPA, TPEx, UDC, UDI, UDE, UDA, UDEx), where P is the number of primitive types used and each remaining component is itself a vector indexed by nesting level (element 0 counts top-level entities, element 1 counts 1-nested entities, and so on). The prefix gives the origin of the entities counted: SL for Standard API, TP for third party, and UD for user defined. The suffix gives their kind: C for classes, I for interfaces, E for enums, A for annotations, and Ex for exceptions.

With this metric, we can determine all of the scalar metrics mentioned above. E.g., the number of user defined top-level types (the one we tend to use) is UDC[0]+UDE[0]+UDI[0]+UDA[0]+UDEx[0].
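As a sketch of how the vector might be represented and scalar metrics derived from it (the data layout and method names here are my own assumptions, not the tool's), using the jgraph 5.9.2.1 values reported below:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: each component (UDC, UDI, ...) is an int[] indexed by nesting
// level; scalar metrics are sums over selected components and levels.
public class SizeVector {
    final Map<String, int[]> components = new HashMap<>();

    /** Sum of the 0-nested (top level) counts of the named components. */
    int topLevel(String... names) {
        int total = 0;
        for (String n : names) {
            int[] v = components.getOrDefault(n, new int[0]);
            total += v.length > 0 ? v[0] : 0;
        }
        return total;
    }

    public static void main(String[] args) {
        SizeVector jgraph = new SizeVector();              // jgraph 5.9.2.1
        jgraph.components.put("UDC", new int[]{36, 37});
        jgraph.components.put("UDI", new int[]{14, 3});
        jgraph.components.put("UDE", new int[]{0});
        jgraph.components.put("UDA", new int[]{0});
        jgraph.components.put("UDEx", new int[]{0});
        // number of user defined top-level types: UDC[0]+UDI[0]+UDE[0]+UDA[0]+UDEx[0]
        System.out.println(jgraph.topLevel("UDC", "UDI", "UDE", "UDA", "UDEx")); // 50
    }
}
```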

(Note to self: What about classes that are used through generated stuff, such as in Enums?)

Oh, and there's also the question of generic types....

So much for 'number of classes' being straightforward!

To see if any of this is actually useful, let's apply it to some application. I'll use jgraph, which is a system for Java Graph Visualization and Layout. It's useful for case studies because it's non-trivial, while at the same time not being very "big". So how big is it? Below is its size metric vector for version 5.9.2.1.

P  SLC   SLI   SLE  SLA  SLEx  TPC  TPI  TPE  TPA  TPEx  UDC    UDI   UDE  UDA  UDEx
9  98,5  32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0

From this we can see that this version of jgraph uses 9 primitive types; 98 top-level and 5 level-1 nested classes from the Standard API; 32 top-level and 1 level-1 nested interfaces from the Standard API; and 13 exceptions from the Standard API. It uses no third-party entities. It defines 36 top-level and 37 level-1 nested classes, and 14 top-level and 3 level-1 nested interfaces. In this case, modules are considered to be in the application if they belong to the org.jgraph package.

Is this useful? I'm not sure, but I certainly find it interesting. For example, I wonder if the jgraph developers realise that they use 98 classes from the Standard API. I suspect that, after a short pause for thought, they will remember all of the swing stuff they use, and so will only be mildly surprised at the number. What about the fact that there are 37 nested classes declared in the source? Again, on remembering the various swing events that are being handled, this will probably be only mildly surprising.

Here are the size metric vectors for all the versions of jgraph I have handy.

Version         P  SLC    SLI   SLE  SLA  SLEx  TPC  TPI  TPE  TPA  TPEx  UDC    UDI   UDE  UDA  UDEx
5.4.4-java1.3   9  107,6  35,1  0    0    17    1    0    0    0    0     37,43  14,4  0    0    0
5.4.4-java1.4   9  93,6   32,1  0    0    13    0    0    0    0    0     36,35  14,3  0    0    0
5.5             9  93,6   32,1  0    0    13    0    0    0    0    0     36,35  14,3  0    0    0
5.5.1           9  93,6   32,1  0    0    13    0    0    0    0    0     36,36  14,3  0    0    0
5.6.2           9  96,6   32,1  0    0    13    0    0    0    0    0     36,36  14,3  0    0    0
5.6.2.1         9  96,6   32,1  0    0    13    0    0    0    0    0     36,36  14,3  0    0    0
5.6.3           9  95,6   32,1  0    0    13    0    0    0    0    0     36,36  14,3  0    0    0
5.7             9  95,6   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.7.1           9  95,6   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.7.3           9  95,6   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.7.3.1         9  95,6   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.7.4           9  94,6   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.7.4.1         9  94,6   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.7.4.2         9  94,6   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.7.4.3         9  94,6   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.7.4.4         9  94,6   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.7.4.5         9  95,5   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.7.4.6         9  95,5   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.7.4.7         9  95,5   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.8.0.0         9  96,5   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.8.1.1         9  96,5   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.8.2.0         9  96,5   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.8.2.1         9  96,5   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.8.3.1         9  96,5   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.9.0.0         9  96,5   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.9.1.0         9  97,5   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.9.2.0         9  97,5   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.9.2.1         9  98,5   32,1  0    0    13    0    0    0    0    0     36,37  14,3  0    0    0
5.10.0.0        9  100,5  32,1  0    0    14    0    0    0    0    0     36,37  14,3  0    0    0
5.10.0.1        9  100,5  32,1  0    0    14    0    0    0    0    0     36,37  14,3  0    0    0

Note the differences between using Java 1.3 and Java 1.4. It's interesting that there's been no change in size according to the user-defined metrics since version 5.7; however, the other metrics show some changes. I wonder what a more traditional measure of size (LOC) would show...

For something a bit more significant in size, here's Eclipse 3.2.2 for win32.

P      9
SLC    492,18
SLI    187,1
SLE    1
SLA    0
SLEx   173,1
SLUnk  16,1
TPC    1946,956,18
TPI    428,54
TPE    1,2
TPA    7,2
TPEx   78,7
TPUnk  66
UDC    11373,9728,101
UDI    2175,116
UDE    1,4
UDA    0
UDEx   89,20,4
UDUnk  0

There are more numbers for eclipse than for jgraph, and they need some explanation. The tool used to produce these measurements operates on bytecode rather than source code. As there is not a one-to-one relationship between the two (Java compilers both drop some information and generate code), the measurements given here may not exactly align with the view a developer would have from the source code.

There are also practical problems that arise in doing measurement (whether bytecode or source code). In particular, it's possible that not everything is available to the measurer. For example, it's often the case that third party libraries are not distributed with the application, meaning the measurer has to choose between the time-consuming process of finding them all, or not directly measuring them - I choose not to directly measure them.

Not measuring all modules involved means that some of them cannot be properly classified. For example, if a method is declared to take an argument of type ArgType, and we don't see the definition of ArgType, then we can't tell what kind of module it is: a class, an interface, or an enum. (And it's not impossible that it's an annotation or an exception, although that's unlikely.) Such modules are classified as "unknown", hence TPUnk.

While I could figure out what kind of module any JDK class is (the tool is written in Java and so when it executes all the JDK classes are on the classpath), I don't. So I also need an "unknown" classification for JDK classes, hence SLUnk.

Another issue that arises when measuring bytecode is that it's not always obvious what's application code and what isn't, since some of the third-party libraries are distributed with the application. In fact, in the case of eclipse, not only is some third-party library code included, but also some of the JDK code. So just measuring any .class file we find is going to be misleading. What I do is identify the modules that are considered user-defined by listing the packages such modules can belong to. In the case of the eclipse measurements above, user-defined means any class belonging to the org.eclipse package.

A possible consequence of identifying user-defined modules by naming the application packages is that if one user-defined module references another module belonging to an application package, but no definition for that other module is given, then such a module would have to be classified as "user-defined unknown", or UDUnk. This seems possible in the case of dead code, for example a method that is never called but has an argument of a type that is not distributed. In such a situation the missing .class file would never be detected as the JVM would never try to load it. In the case of eclipse, no such modules exist.

References

[FP96] Norman Fenton and Shari L. Pfleeger, Software Metrics: A Rigorous and Practical Approach, second edition. International Thomson Computer Press, London, UK, 1997.
[JLS] The Java Language Specification, Third Edition.

History

13 March 2007
First release.
4 May 2007
Minor amendments.
10 May 2007
Added jgraph-5.10.0.1 and eclipse-3.2.2 measurements.
20 June 2007
Changed names of metrics slightly. Added jgraph-5.10.0.0, changed the version of eclipse reported, discussion about "unknowns" added.
29 July 2007
Improved comments comparing Enums and Exceptions