This program has now been upgraded to version 2, which additionally deals with C++ .NET solutions. This can be downloaded from my later article.
The attached program is a line count utility written in C#. It:
- Counts the number of lines in a .NET solution, project or individual code file.
- Works with both C# and VB.NET.
- Works with both Visual Studio 2003/.NET 1.1 and Visual Studio 2005/.NET 2.0 (but needs .NET 2.0 to run).
- Provides a sortable grid of results so you can easily find your biggest projects or biggest code files.
- Caches results at all levels and provides views onto them. For a solution you can see a view of all projects and their sizes, all code files and their sizes, or a view that combines both projects and code files.
- Shows the number of blank lines at every level.
- Shows the number of lines auto-generated by code-generators at every level (e.g. layout code on forms, or typed DataSet code).
- Shows the number of lines of comments at each level.
- Allows the grid to be copied into the clipboard in a format that can be pasted into Excel.
- Comes in a fetching green and white colour scheme.
I was recently asked how many lines of code there are in our current C# project, and how that compared with another similar project. The ‘other’ project is much bigger in terms of resources (numbers of developers), although it’s been running for slightly less time than our project. Our project has had two or three developers working on it for about a year.
I looked around for a line count utility on the internet, but couldn’t really find anything I liked the look of. So I upgraded an old VB6 line count utility I wrote several years ago. I used the VB6 to VB.NET upgrade wizard initially. It still amazes me that the upgrade wizard works at all, but in this case I got a VB.NET project (with VB6-style code) that compiled immediately. With a little work I got it counting code in individual C# projects.
This program told me we had about 180,000 lines of code in our entire C# solution. If you do the maths on that for two or three developers over about a year, it comes out at about 1,500 lines of code per developer per week, or about 300 lines per day.
300 lines per day per developer of production code seemed very high, so I decided I needed a tool that could analyse the data in a little more detail. This program is the result. Below I will discuss why our developers (myself included) are nowhere near as productive as the initial analysis suggests.
Design – Code Containers
This is not a highly complex application, and there isn’t all that much to say about the design. However, it was clear early on that I would want classes that represented the three possible types of entity that the program can be run on: solutions, projects and individual code files. In addition I would want some polymorphic behaviour from these three classes. That is, I would want them to implement the same methods to do things like counting lines and getting results.
I have implemented this using an abstract base class for the three classes, with abstract methods CountLines and PopulateResults. I call the solutions, projects and individual code files ‘code containers’, and hence the abstract base class is called ‘CodeContainer’. Note that semantically the code containers are not just the individual solution file, project file or code file (although they contain a reference to that file), but represent the associated code structures and the line counts on them.
In particular, each code container holds a list of the other code containers directly within it: a Solution object will contain a list of Project objects, each of which will in turn contain a list of CodeFile objects. These objects are then used to cache the line count results at the appropriate level: after calculation each CodeFile object will contain its own line count in its numberLines member variable, and the Project object will similarly contain the overall total number of lines in all its CodeFiles in its numberLines member variable. So in some ways this is a simple composite pattern, although it can only have three levels with specific types at each level.
The CodeContainer abstract base class lets me cache the actual line counts in member variables in the base (since all the code containers need to store these), and have a ToResultString method that just outputs these numbers with a bit of blurb. Finally a factory idiom (CodeContainerFactory) allows the correct code container to be instantiated when necessary, based on the extension of the file name.
All this means that the client code doesn’t need to know which type of code container it is dealing with: it instantiates the correct one by calling the CodeContainerFactory and then just calls the abstract methods on the base class when it needs to do something.
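The design described above could be sketched roughly as follows. This is an illustration and not the code from the download: the names CodeContainer, CountLines, PopulateResults, ToResultString, numberLines and CodeContainerFactory come from the text, while the Add method, the NumberLines property, the two protected helper methods and the project and solution file extensions are assumptions made for the example. Parsing of the solution and project files to discover their children is left out, so children are added by hand.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public abstract class CodeContainer
{
    protected readonly string fileName;   // the solution, project or code file
    protected int numberLines;            // cached line count for this level
    protected readonly List<CodeContainer> children = new List<CodeContainer>();

    protected CodeContainer(string fileName) { this.fileName = fileName; }

    public int NumberLines { get { return numberLines; } }

    public abstract void CountLines();
    public abstract void PopulateResults(IList<string> results);

    public string ToResultString()
    {
        return Path.GetFileName(fileName) + ": " + numberLines + " lines";
    }

    // Assumed helper: count every child and cache the total at this level.
    protected void CountChildLines()
    {
        numberLines = 0;
        foreach (CodeContainer child in children)
        {
            child.CountLines();
            numberLines += child.NumberLines;
        }
    }

    // Assumed helper: this container's result first, then its children's.
    protected void PopulateChildResults(IList<string> results)
    {
        results.Add(ToResultString());
        foreach (CodeContainer child in children) child.PopulateResults(results);
    }
}

public class CodeFile : CodeContainer
{
    public CodeFile(string fileName) : base(fileName) { }
    public override void CountLines() { numberLines = File.ReadAllLines(fileName).Length; }
    public override void PopulateResults(IList<string> results) { results.Add(ToResultString()); }
}

public class Project : CodeContainer
{
    public Project(string fileName) : base(fileName) { }
    public void Add(CodeContainer codeFile) { children.Add(codeFile); }
    public override void CountLines() { CountChildLines(); }
    public override void PopulateResults(IList<string> results) { PopulateChildResults(results); }
}

public class Solution : CodeContainer
{
    public Solution(string fileName) : base(fileName) { }
    public void Add(Project project) { children.Add(project); }
    public override void CountLines() { CountChildLines(); }
    public override void PopulateResults(IList<string> results) { PopulateChildResults(results); }
}

public static class CodeContainerFactory
{
    // Picks the container type from the file extension.
    public static CodeContainer Create(string fileName)
    {
        switch (Path.GetExtension(fileName).ToLowerInvariant())
        {
            case ".sln": return new Solution(fileName);
            case ".csproj":
            case ".vbproj": return new Project(fileName);
            case ".cs":
            case ".vb": return new CodeFile(fileName);
            default: throw new ArgumentException("Unsupported file type: " + fileName);
        }
    }
}
```

Client code in this sketch would call CodeContainerFactory.Create(path), then CountLines() and PopulateResults(list) on whatever comes back, without checking which concrete type it received.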
At start up the application opens a dialog to allow the user to select the solution, project or code file (.vb, .cs) that the program will run on initially. Once a file is selected the application calculates the line counts for that item and displays the results as below. Here a solution file has been selected and both project files and individual code files are being shown in the resulting grid:
The grid can as usual be sorted by clicking the column headers. Here it has been sorted by the number of lines in individual code files.
Additional functionality is available on both the traditional menus and a context menu. These can be used to hide the code files and show only project files, with one line in the grid per project file (by clearing the check mark alongside ‘Show Code Files’):
For a simpler view at code file level, the application can also be used to show code files only (by checking ‘Show Code Files’ and clearing ‘Show Project Files’). The breakdown columns (numbers of blank lines, code designer lines and comments) can also be hidden using the ‘Show Breakdown’ menu option:
The other functionality on the menus is pretty self-explanatory.
If you want to copy the grid into Excel you can simply select the entire grid (Ctrl-A), copy to the clipboard (Ctrl-C) and then launch Excel and paste (Ctrl-V). In a later version of the application I will add a menu option to do all this.
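Pasting into Excel works because Excel splits plain clipboard text into cells on tabs and into rows on line breaks. A minimal sketch of building such text is shown here. GridExport and ToTabDelimited are hypothetical names for illustration and are not taken from the program, and cells that themselves contain tabs or line breaks are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class GridExport
{
    // Builds tab-delimited text: one grid row per line, cells separated by tabs.
    public static string ToTabDelimited(IEnumerable<string[]> rows)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string[] row in rows)
        {
            sb.Append(string.Join("\t", row)).Append("\r\n");
        }
        return sb.ToString();
    }
}
```

In a Windows Forms application the resulting string could then be handed to Clipboard.SetText, which is one way a single menu option might replace the select, copy and paste steps.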
There are some issues around the counting of auto-generated code with this application, particularly with Visual Studio 2003 projects. In Visual Studio 2005 we have auto-generated code neatly split into partial ‘designer’ files, which makes it much easier to identify and count. For Visual Studio 2003 I have tried to identify the auto-generated code regions, but have been forced to do this by looking for the #Region or #region strings that precede these regions. This probably isn’t the most accurate method of identifying this code. See method ‘SetCodeBlockFlags’ in CodeFile.cs.
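To illustrate the region-based approach, here is a simplified sketch. It is not the SetCodeBlockFlags method from CodeFile.cs. It assumes that a generated block starts at a #region or #Region line whose label contains the words "generated code" (as in the Visual Studio 2003 label "Windows Form Designer generated code"), that everything up to the matching end marker counts as generated, and that only single-line // and ' comments are recognised. LineCounts and RegionCounter are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public class LineCounts
{
    public int Total;
    public int Blank;
    public int Comment;
    public int Generated;
}

public static class RegionCounter
{
    public static LineCounts Count(IEnumerable<string> lines)
    {
        LineCounts counts = new LineCounts();
        int depth = 0; // greater than zero while inside a generated-code region

        foreach (string raw in lines)
        {
            string line = raw.Trim();
            string lower = line.ToLowerInvariant();
            counts.Total++;

            if (depth > 0)
            {
                // Everything inside the region counts as generated, including
                // blank lines, comments and the closing marker itself.
                counts.Generated++;
                if (lower.StartsWith("#region", StringComparison.Ordinal))
                    depth++;
                else if (lower.StartsWith("#endregion", StringComparison.Ordinal) ||
                         lower.StartsWith("#end region", StringComparison.Ordinal))
                    depth--;
            }
            else if (lower.StartsWith("#region", StringComparison.Ordinal) &&
                     lower.Contains("generated code"))
            {
                depth = 1;
                counts.Generated++;
            }
            else if (line.Length == 0)
            {
                counts.Blank++;
            }
            else if (line.StartsWith("//", StringComparison.Ordinal) ||
                     line.StartsWith("'", StringComparison.Ordinal))
            {
                counts.Comment++;
            }
        }
        return counts;
    }
}
```

The weakness mentioned above shows up directly in this sketch: a hand-written region that happens to have "generated code" in its label would be miscounted, and generated code outside any region would be missed.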
A further problem arises if your project references a web service. The proxy code for this is generated by Visual Studio in a file called ‘Reference.cs’. At the moment this is being identified by name and by the fact that it will have the text ‘Web References’ in its file path. Again, this isn’t a great solution.
Note that in any case only auto-generated code in .cs or .vb files is counted.
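The file-level checks described above might look something like the following sketch. GeneratedFileCheck and its method names are hypothetical. The ".Designer.cs" and ".Designer.vb" suffixes are the names Visual Studio 2005 gives its designer files, and the case-insensitive comparisons are an assumption rather than something the program is known to do.

```csharp
using System;
using System.IO;

public static class GeneratedFileCheck
{
    // Web service proxy: a file named Reference.cs somewhere under a
    // folder path that contains the text "Web References".
    public static bool IsWebReferenceProxy(string path)
    {
        return string.Equals(Path.GetFileName(path), "Reference.cs",
                             StringComparison.OrdinalIgnoreCase)
            && path.IndexOf("Web References", StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Visual Studio 2005 partial designer file, e.g. Form1.Designer.cs.
    public static bool IsDesignerFile(string path)
    {
        string name = Path.GetFileName(path);
        return name.EndsWith(".Designer.cs", StringComparison.OrdinalIgnoreCase)
            || name.EndsWith(".Designer.vb", StringComparison.OrdinalIgnoreCase);
    }
}
```

Both checks rely purely on naming, so a hand-written file that happens to be called Reference.cs inside a folder named Web References would be wrongly treated as generated.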
The Line Count program showed us that whilst our project does have 180,000 lines of code, 100,000 of them are auto-generated by Microsoft’s code generators.
Of the 100,000 auto-generated lines 73,000 are in our data access component. Our application is a low-volume but reasonably complex product, and for ease of development we have extensively used typed DataSets to get our data out of our database. Those 73,000 lines of code are mainly in these typed DataSets. In addition 22,000 auto-generated lines of code (out of the 100,000) are in our presentation layer. As you’d expect these are mainly auto-generated layout code for our forms and user controls.
So we’re down to 80,000 lines of code written by developers. Of this, a further 10,000 lines are blank, and another 10,000 are comments. Even this exaggerates the size of the actual application code as we can see that our unit test project has 16,000 lines of code.
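The arithmetic behind these figures can be written out as a small sketch. The 2.5 developers and 48 working weeks used in the usage note are assumptions chosen to match "two or three developers" and "about a year"; they are not measured values. Breakdown and its methods are hypothetical names.

```csharp
using System;

public static class Breakdown
{
    // Lines left once generated, blank and comment lines are removed.
    public static int CodeLines(int total, int generated, int blank, int comments)
    {
        return total - generated - blank - comments;
    }

    // Average lines per developer per working day.
    public static double PerDeveloperPerDay(int lines, double developers, int weeks, int daysPerWeek)
    {
        return lines / (developers * weeks * daysPerWeek);
    }
}
```

With the numbers above, Breakdown.CodeLines(180000, 100000, 10000, 10000) gives 60,000. Under the stated assumptions, Breakdown.PerDeveloperPerDay(180000, 2.5, 48, 5) gives 300 for the raw total, while the same call with 60,000 lines gives 100. The unit test project is not subtracted here, because the breakdown of its 16,000 lines is not given.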
I expect these numbers are not untypical of enterprise .NET applications. I’d be interested in some statistics from other projects.
As for the ‘other’ project I mentioned above, that has 50,000 lines of code, 10,000 auto-generated, 5,000 blank, 6,000 comments (and no unit tests).
In the end all this goes to back up something that all developers know instinctively: using lines of code as a metric for the ‘size’ of an application really doesn’t make much sense. Maybe that’s why I couldn’t find a decent line count program in the first place.
However counting lines can provide some interesting analysis. We can see at a glance which our biggest classes are, and these are clearly candidates for refactoring. Also, if you look closely at the screenshots you can see that we probably have too much logic in our presentation layer compared to our model layer (middle tier business layer). We knew that already, but the line count statistics bring it home.