Rich Newman

July 1, 2007

C# and VB.NET Line Count Utility – Version 1

Filed under: .net, c#, dotnet, line count, technology, utility — richnewman @ 7:10 pm

This program has now been upgraded to version 2, which additionally deals with C++ .NET solutions. This can be downloaded from my later article.

Overview

The attached program is a line count utility written in C#. It:

  • Counts the number of lines in a .NET solution, project or individual code file.
  • Works with both C# and VB.NET.
  • Works with both Visual Studio 2003/.NET 1.1 and Visual Studio 2005/.NET 2.0 (but needs .NET 2.0 to run).
  • Provides a sortable grid of results so you can easily find your biggest projects or biggest code files.
  • Caches results at all levels and provides views onto them. For a solution you can see a view of all projects and their sizes, all code files and their sizes, or a view that combines both projects and code files.
  • Shows the number of blank lines at every level.
  • Shows the number of lines auto-generated by code-generators at every level (e.g. layout code on forms, or typed DataSet code).
  • Shows the number of lines of comments at each level.
  • Allows the grid to be copied into the clipboard in a format that can be pasted into Excel.
  • Comes in a fetching green and white colour scheme.

Background

I was recently asked how many lines of code there are in our current C# project, and how that compared with another similar project. The ‘other’ project is much bigger in terms of resources (numbers of developers), although it’s been running for slightly less time than our project. Our project has had two or three developers working on it for about a year.

I looked around for a line count utility on the internet, but couldn’t really find anything I liked the look of. So I upgraded an old VB6 line count utility I wrote several years ago. I used the VB6 to VB.NET upgrade wizard initially. It still amazes me that the upgrade wizard works at all, but in this case I got a VB.NET project (with VB6-style code) that compiled immediately. With a little work I got it counting code in individual C# projects.

This program told me we had about 180,000 lines of code in our entire C# solution. If you do the maths on that it comes out at about 1500 lines of code per developer per week, or about 300 lines per day.
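The maths can be reproduced roughly as follows. This is only a sketch: the 180,000 figure is from the text, but the team size of 2.5 developers, the 48 working weeks and the 5-day week are assumptions chosen to be consistent with 'two or three developers' working 'for about a year'.

```python
total_lines = 180_000   # from the line count
developers = 2.5        # assumption: midpoint of "two or three developers"
working_weeks = 48      # assumption: "about a year", less holidays
days_per_week = 5       # assumption

lines_per_dev_week = total_lines / (developers * working_weeks)
lines_per_dev_day = lines_per_dev_week / days_per_week

print(lines_per_dev_week, lines_per_dev_day)  # 1500.0 300.0
```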

300 lines per day per developer of production code seemed very high, so I decided I needed a tool that could analyze the data in a little more detail. This program is the result of that. Below I will discuss why our developers (myself included) are nowhere near as productive as the initial analysis suggests.

Design – Code Containers

This is not a highly complex application, and there isn’t all that much to say about the design. However it was clear early on that I would want classes that represented the three possible types of entity that the program can be run on. These are solutions, projects and individual code files. In addition I would want some polymorphic behaviour from these three classes. That is I would want them to implement the same methods to do stuff like counting lines and getting results.

I have implemented this using an abstract base class for the three classes, with abstract methods CountLines and PopulateResults. I call the solutions, projects and individual code files ‘code containers’, and hence the abstract base class is called ‘CodeContainer’. Note that semantically the code containers are not just the individual solution file, project file or code file (although they contain a reference to that file), but represent the associated code structures and the line counts on them.

In particular each code container contains a list of other code containers that are contained directly within them: a Solution object will contain a list of Project objects which will in turn contain a list of CodeFile objects. These objects are then used to cache the line count results at the appropriate level: after calculation each CodeFile object will contain its own line count in its numberLines member variable, and the Project object will similarly contain the overall total number of lines in all its CodeFiles in its numberLines member variable. So in some ways this is a simple composite pattern, although it can only have three levels with specific types at each level.

The CodeContainer abstract base class lets me cache the actual line counts in member variables in the base (since all the code containers need to store these), and to have a ToResultString method that just outputs these numbers with a bit of blurb. Finally a factory idiom (CodeContainerFactory) allows the correct code container to be instantiated when necessary based on the extension on the name of the file.

All this means that the client code doesn’t need to know which type of code container it is dealing with: it instantiates the correct one by calling the CodeContainerFactory and then just calls the abstract methods on the base class when it needs to do something.
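The shape of this design can be sketched in a few lines. The utility itself is written in C#; what follows is a simplified Python illustration, not the actual source. The class names follow the text, but the method bodies, the in-memory `text` argument, the `create_code_container` function and the project and solution extensions are assumptions made so the sketch runs without any files on disk. For brevity only `count_lines` is abstract here.

```python
from abc import ABC, abstractmethod
import os


class CodeContainer(ABC):
    """Base class: caches the line count and holds directly contained containers."""

    def __init__(self, path):
        self.path = path
        self.children = []     # Projects in a Solution, CodeFiles in a Project
        self.number_lines = 0  # cached result of the last count_lines() call

    @abstractmethod
    def count_lines(self):
        """Compute, cache and return the line count for this container."""

    def populate_results(self, rows):
        """Append a (name, lines) row for this container and each descendant."""
        rows.append((os.path.basename(self.path), self.number_lines))
        for child in self.children:
            child.populate_results(rows)

    def to_result_string(self):
        return f"{os.path.basename(self.path)}: {self.number_lines} lines"


class CodeFile(CodeContainer):
    def __init__(self, path, text=""):
        super().__init__(path)
        self.text = text  # assumption: source held in memory, not read from disk

    def count_lines(self):
        self.number_lines = len(self.text.splitlines())
        return self.number_lines


class Project(CodeContainer):
    def count_lines(self):
        self.number_lines = sum(f.count_lines() for f in self.children)
        return self.number_lines


class Solution(CodeContainer):
    def count_lines(self):
        self.number_lines = sum(p.count_lines() for p in self.children)
        return self.number_lines


# Factory: choose the container type from the file extension.
_TYPES = {".sln": Solution, ".csproj": Project, ".vbproj": Project,
          ".cs": CodeFile, ".vb": CodeFile}


def create_code_container(path):
    ext = os.path.splitext(path)[1].lower()
    if ext not in _TYPES:
        raise ValueError(f"Unsupported file type: {ext!r}")
    return _TYPES[ext](path)
```

Client code then only needs the factory and the base class methods: it calls `create_code_container(path)`, then `count_lines()` and `to_result_string()`, without checking which concrete type it has been given.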

Usage

At start up the application opens a dialog to allow the user to select the solution, project or code file (.vb, .cs) that the program will run on initially. Once a file is selected the application calculates the line counts for that item and displays the results as below. Here a solution file has been selected and both project files and individual code files are being shown in the resulting grid:

[Screenshot: Line Count main window]

The grid can as usual be sorted by clicking the column headers. Here it has been sorted by the number of lines in individual code files.

Additional functionality is available on both the traditional menus and a context menu. These can be used to hide the code files and show only project files, with one line in the grid per project file (by clearing the check mark alongside ‘Show Code Files’):

[Screenshot: Line Count showing projects only]

For a simpler view at code file level, the application can also be used to show code files only (by checking ‘Show Code Files’ and clearing ‘Show Project Files’). The breakdown columns (numbers of blank lines, code designer lines and comments) can also be hidden using the ‘Show Breakdown’ menu option:

[Screenshot: Line Count showing code files only]

The other functionality on the menus is pretty self-explanatory.

If you want to copy the grid into Excel you can simply select the entire grid (Ctrl-A), copy to the clipboard (Ctrl-C) and then launch Excel and paste (Ctrl-V). In a later version of the application I will add a menu option to do all this.

Issues

There are some issues around the counting of auto-generated code with this application, particularly with Visual Studio 2003 projects. In Visual Studio 2005 we have auto-generated code neatly split into partial ‘designer’ files, which makes it much easier to identify and count. For Visual Studio 2003 I have tried to identify the auto-generated code regions, but have been forced to do this by looking for the #Region or #region strings that precede these regions. This probably isn’t the most accurate method of identifying this code. See method ‘SetCodeBlockFlags’ in CodeFile.cs.
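To give an idea of what region-based detection involves, here is a minimal Python sketch. It is not the code in SetCodeBlockFlags: the function name is hypothetical, and matching on the words 'generated code' in the region title (as in the 'Windows Form Designer generated code' regions that Visual Studio 2003 emits) is an assumption.

```python
def count_generated_lines(lines):
    """Count lines inside #region blocks whose title mentions 'generated code'.

    Handles both the C# form (#region / #endregion) and the VB.NET form
    (#Region / #End Region) by comparing in lower case.
    """
    generated = 0
    depth = 0  # nesting depth inside a generated region; 0 means outside
    for line in lines:
        stripped = line.strip().lower()
        if depth == 0:
            if stripped.startswith("#region") and "generated code" in stripped:
                depth = 1
                generated += 1  # count the #region line itself
        else:
            generated += 1
            if stripped.startswith("#region"):
                depth += 1      # nested region inside the generated block
            elif stripped.startswith(("#endregion", "#end region")):
                depth -= 1
    return generated


sample = [
    "using System;",
    "#region Windows Form Designer generated code",
    "private void InitializeComponent()",
    "{",
    "}",
    "#endregion",
    "int x;",
]
print(count_generated_lines(sample))  # 5
```

A check like this misses generated code that is not wrapped in a region, and miscounts if a developer gives a hand-written region a similar title, which is why the approach is only approximate.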

A further problem arises if your project references a web service. The proxy code for this is generated by Visual Studio in a file called ‘Reference.cs’. At the moment this is being identified by name and by the fact that it will have the text ‘Web References’ in its file path. Again, this isn’t a great solution.
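The check described amounts to something like the following Python sketch. The function name is hypothetical and the case-insensitive comparison is an assumption; the file name and path text are the ones given above.

```python
import ntpath  # Windows-style path handling, works on any platform


def is_web_reference_proxy(path):
    """True if the file looks like a Visual Studio web service proxy."""
    file_name = ntpath.basename(path).lower()
    return file_name == "reference.cs" and "web references" in path.lower()


print(is_web_reference_proxy(r"C:\Proj\Web References\PriceService\Reference.cs"))  # True
print(is_web_reference_proxy(r"C:\Proj\Forms\Reference.cs"))                        # False
```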

Note that in any case only auto-generated code in .cs or .vb files is counted.

Analysis

The Line Count program showed us that whilst our project does have 180,000 lines of code, 100,000 of them are auto-generated by Microsoft’s code generators.

Of the 100,000 auto-generated lines 73,000 are in our data access component. Our application is a low-volume but reasonably complex product, and for ease of development we have extensively used typed DataSets to get our data out of our database. Those 73,000 lines of code are mainly in these typed DataSets. In addition 22,000 auto-generated lines of code (out of the 100,000) are in our presentation layer. As you’d expect these are mainly auto-generated layout code for our forms and user controls.

So we’re down to 80,000 lines of code written by developers. Of this, a further 10,000 lines are blank, and another 10,000 are comments. Even this exaggerates the size of the actual application code as we can see that our unit test project has 16,000 lines of code.

I expect these numbers are not untypical of enterprise .NET applications. I’d be interested in some statistics from other projects.

As for the ‘other’ project I mentioned above, that has 50,000 lines of code, 10,000 auto-generated, 5,000 blank, 6,000 comments (and no unit tests).
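Putting the two projects side by side, the arithmetic looks like this. It is a sketch that assumes the blank and comment counts are in addition to the auto-generated lines for both projects (the text says so for our project; for the 'other' project it is an assumption). The function name is just for illustration.

```python
def hand_written_code_lines(total, generated, blank, comments):
    """Lines left after removing generated code, blank lines and comments."""
    return total - generated - blank - comments


ours = hand_written_code_lines(180_000, 100_000, 10_000, 10_000)
other = hand_written_code_lines(50_000, 10_000, 5_000, 6_000)

print(ours, other)  # 60000 29000
```

Note that the 60,000 figure for our project still includes the unit test project.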

Conclusion

In the end all this goes to back up something that all developers know instinctively: using lines of code as a metric for the ‘size’ of an application really doesn’t make much sense. Maybe that’s why I couldn’t find a decent line count program in the first place.

However counting lines can provide some interesting analysis. We can see at a glance which our biggest classes are, and these are clearly candidates for refactoring. Also, if you look closely at the screenshots you can see that we probably have too much logic in our presentation layer compared to our model layer (middle tier business layer). We knew that already, but the line count statistics bring it home.

Downloads

Executable download.

Source code download.

A Beginner’s Guide to the Black-Scholes Option Pricing Formula (Part 3)

Continued from part 2.

Volatility

If you know a little about options already you will probably be aware that their values depend on something called volatility. Volatility is usually not needed to price derivatives that are not options.

Technically volatility is defined as the annualized standard deviation of the return on an asset (in our case Microsoft stock). It is expressed as a percentage.

However, it’s easier to think of it intuitively as the amount that the price will swing around in a given period. Stocks with a high level of uncertainty surrounding them will have high volatilities. An example currently might be the stock of small Russian oil companies. Stocks that are relatively stable (e.g. Microsoft) will have lower volatilities.
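The technical definition can be made concrete with a short Python sketch. It assumes daily closing prices, log returns and 252 trading days in a year, none of which is specified above; the function name and the price series are made up for illustration.

```python
import math
import statistics


def annualized_volatility(prices, periods_per_year=252):
    """Annualized standard deviation of log returns from a price series."""
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return statistics.stdev(log_returns) * math.sqrt(periods_per_year)


calm = [100, 101, 100, 101, 100]  # small daily swings
wild = [100, 110, 100, 110, 100]  # large daily swings

# The result is a fraction: multiply by 100 to quote it as a percentage.
print(annualized_volatility(calm) < annualized_volatility(wild))  # True
```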

Why does volatility affect the price of an option? Again this is because our payoff graph is not symmetrical. A stock that has a high volatility is more likely to swing around, and hence more likely to have a very high value or very low value at maturity. A stock with a low volatility is more likely to be close to its current value at maturity.

Now if the stock price at maturity is below our strike price we don’t care if it’s just slightly below or massively below. In both cases we don’t exercise the option and don’t make any money.

But if the share price at maturity is above our strike we really want it to be as far above the strike as is possible, since we make more money the higher the price is.

So an option with a high volatility is more likely to make us lots of money if the price goes up, but won’t lose us lots of money even if the price goes down hugely.

As a result options with high volatility are more valuable than options with low volatility.
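A toy example shows the asymmetry at work. This Python sketch assumes just two equally likely prices at maturity, spread evenly around a strike of 100, and ignores discounting; the numbers are invented for illustration and this is not how options are actually priced.

```python
def call_payoff(price_at_maturity, strike):
    """Payoff of a call at maturity: nothing if the price is below the strike."""
    return max(price_at_maturity - strike, 0.0)


def expected_payoff(outcomes, strike):
    """Average payoff over equally likely prices at maturity."""
    return sum(call_payoff(p, strike) for p in outcomes) / len(outcomes)


strike = 100.0
low_vol = [95.0, 105.0]   # price stays close to its current value
high_vol = [70.0, 130.0]  # price swings a long way in either direction

print(expected_payoff(low_vol, strike), expected_payoff(high_vol, strike))  # 2.5 15.0
```

Both stocks have the same average price at maturity, but the wider swings give a much larger average payoff because the downside is cut off at zero.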

As we will see below both d1 and d2 in the values discussed above depend on volatility.

The Formula

Finally, note that if I have bought the call I am paying the cash amount in i) above and receiving the value of the stock in ii). So we can say that the value, c, of a European call option on a non-dividend paying stock is:

[Formula image: Black-Scholes Call on European Stock]
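Written out in text form, the standard statement of this formula is as below. The notation is assumed to follow Hull (see References), since the symbols are not defined in this part: S_0 is the current stock price, K the strike price, r the continuously compounded risk-free interest rate, T the time to maturity in years and N() the cumulative normal function.

```latex
c = S_0 \, N(d_1) - K e^{-rT} N(d_2)
```

The first term is the value of the stock received and the second term is the cash amount paid.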

d1 and d2

As mentioned in the introduction the mathematics behind the calculation of the probabilities in the Black-Scholes formula is fairly complex. It turns out that if N() is the cumulative normal function (a statistical operator) then d1 and d2 can be expressed as below. I’ll just present the results without explanation here. Note that whilst these formulas are complicated, you can just plug in the underlying values and get a result: this is what is known as a ‘closed form’ solution.

[Formula image: Black-Scholes d1 and d2]

Here sigma (σ) is volatility, as discussed above.
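Because the solution is closed form, it takes only a few lines of code to evaluate. The Python sketch below uses the standard expressions for d1 and d2 in Hull's notation, which is an assumption here: s0 is the current stock price, k the strike, r the continuously compounded risk-free rate and t the time to maturity in years. The function names and the input values are illustrative.

```python
import math


def norm_cdf(x):
    """Cumulative standard normal distribution function, N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s0, k, r, sigma, t):
    """Value of a European call on a non-dividend paying stock.

    d1 = (ln(s0/k) + (r + sigma^2 / 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    c  = s0 * N(d1) - k * exp(-r * t) * N(d2)
    """
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


# Stock at 42, strike 40, 10% interest rate, 20% volatility, six months to go.
price = black_scholes_call(s0=42.0, k=40.0, r=0.10, sigma=0.20, t=0.5)
print(round(price, 2))  # 4.76
```

Raising sigma in the call above and leaving everything else unchanged gives a higher price, in line with the discussion of volatility earlier.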

Actual Derivation of the Formula

This article has attempted to provide an intuitive interpretation of the Black-Scholes formula, without going into the mathematics behind it. Such an interpretation inevitably glosses over some of the details. I have glossed over risk neutrality considerations above, for instance.

In particular the actual derivation of the Black-Scholes formula was not done directly using the intuitive ideas discussed here. I will discuss this in future articles.

References

John C. Hull. Options, Futures and Other Derivatives (Sixth Edition)

http://en.wikipedia.org/wiki/Black-Scholes
