Review of a Trading System Project

Introduction

In late 2007 I wrote a series of articles on Microsoft’s Composite Application Block (CAB).  At that time I was running a team that was developing a user interface framework that used the CAB.

We’re now four years on and that framework is widely used throughout our department.  There are currently modules from eleven different development teams in production.  These modules do trade booking, trade management, risk management (including real-time risk), curve marking, other market data management, and so on.  All of them were written by different teams, yet to the user it appears to be a single application.

This article will look back at the goals, design decisions, and implementation history of the project.  It will look at what we did right, what we did wrong, and some of the limitations of the CAB itself (which apply equally to its successor, Prism).

The framework is a success.  However, it’s only a qualified success.  Ironically, as we shall see, it is only a qualified success because it has been so successful.  To put that less cryptically: many of the problems with the framework have only arisen because it’s been so widely adopted.

Hopefully this article will be of interest.  It isn’t the kind of thing I usually write about and will of course be a personal view:  I’m not going to pretend I’m totally unbiased.

Design Goals

Original Overall Goals

The project had two very simple goals originally:

  1. A single client application that a user (in this case, a trader) would use for everything they need to do.
  2. Multiple development teams able to easily contribute to this application, working independently of each other.

I suspect these are the aims of most CAB or Prism projects.

Do You Actually Need a Single Client Application?

An obvious question arising from these goals is why you would need an application of this kind.

Historically there have tended to be two approaches to building big and complex trading applications:

  1. The IT department will create one huge monolithic application.  One large development team will build it all.
  2. The IT department breaks the problem up and assigns smaller development teams to develop separate applications to do each part.  This is a much more common approach than approach 1.

Both of these approaches work, and both mean you don’t need a client application of the kind we are discussing.  However, neither of these approaches works very well:

  • Monolithic applications quickly become difficult to maintain and difficult to release without major regression testing.
  • Equally users don’t like having to log into many different applications.  This is particularly true if those applications are built by the same department but all behave in different ways.  It can also be difficult to make separate applications communicate with each other, or share data, in a sensible way.

So there definitely is a case for trying to create something that fulfils our original design goals above and avoids these problems.  Having said that, it’s clearly more important to actually deliver the underlying functionality.  If it ends up in several separate applications, that matters less than failing to deliver it altogether.

More Detailed Goals

For our project we also had some more detailed goals:

  • Ease of use for the developer.  I have personally been compelled to use some very unpleasant user interface frameworks and was keen that this should not be another one of those.
  • A standardized look and feel.  The user should feel this was one application, not several applications glued together in one window.
  • Standard re-usable components, in particular a standard grid and standard user controls.  The user controls should include such things as typeahead counterparty lookups, book lookups, and security lookups based on the organization’s standard repositories for this data.  That is, they should include business functionality.
  • Simple security (authentication and authorization) based on corporate standards.
  • Simple configuration, including saving user settings and layouts.
  • Simple deployment.  This should include individual development teams being able to deploy independently of other teams.

As I’ll discuss, it was some of the things that we left off that list that came back to haunt us later on.

Goals re Serverside Communication

A further goal was use of our strategic architecture serverside, in particular for trade management.  For example, we wanted components that would construct and send messages to our servers in a standard way.  I won’t discuss the success or failure of this goal in detail here as it’s a long and chequered story, and not strictly relevant to the CAB and the user interface framework.

Technical Design

Technical Design: Technologies

The technologies we used to build this application were:

  • Microsoft C# and Windows Forms
  • Microsoft’s Patterns and Practices Group’s Composite Application Block (the CAB)
  • DevExpress’ component suite
  • Tibco EMS and Gemstone’s Gemfire for serverside communication and caching

As I’ve already discussed, this document is going to focus purely on the clientside development.

In 2007 these were logical choices for a project of this kind.  I’ll discuss some of the more detailed design decisions in the sections below.

Things We Did (Fairly) Well

As I said, this is a personal view: I’m not sure all our developers would agree that all of this was done well.

Ease of Use

Designing for ease of use is, of course, quite difficult.  We have done a number of things to make the project easy to use, some of which I’ll expand on below.  These include:

  • Developers write vanilla user controls.  There’s no need to implement special interfaces, inherit from base classes or use any complex design pattern.
  • Almost all core functionality is accessed through simple services that the developer just gets hold of and calls.  So, for example, to show your user control you get an instance of the menu service and call a show method.  We used singleton service locators so the services could be accessed without resorting to CAB dependency injection (see the sketch after this list).
  • Good documentation freely available on a wiki.
  • A standard onboarding process for new teams, including setting up a template module.  This module has a ‘hello world’ screen that shows the use of the menus and other basic functionality.
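
As promised in the second bullet, here is a minimal sketch of the kind of singleton service locator we exposed.  ServiceLocator and Initialize are hypothetical names rather than our actual code; the Services.Get<T>() call on the CAB root WorkItem is the real CAB API being wrapped.

    // A minimal sketch (not our production code) of the singleton service locator idea.
    // ServiceLocator and Initialize are hypothetical names; WorkItem.Services.Get<T>()
    // is the underlying CAB call being wrapped.
    using Microsoft.Practices.CompositeUI;

    public static class ServiceLocator
    {
        private static WorkItem _rootWorkItem;

        // Called once by the core framework when the shell starts up.
        public static void Initialize(WorkItem rootWorkItem)
        {
            _rootWorkItem = rootWorkItem;
        }

        // Module code calls this instead of using CAB dependency injection.
        public static TService Get<TService>() where TService : class
        {
            return _rootWorkItem.Services.Get<TService>();
        }
    }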

Developers Not Forced to Learn the Composite Application Block (CAB)

As mentioned above, one of the key goals of the project was simplicity of use.  The CAB is far from simple to use: I wrote a 25-part introductory blog article on it and still hadn’t covered it all.

As a result we took the decision early on that developers would not be compelled to use the CAB actually within their modules.  We were keen that developers would not have to learn the intricacies of the CAB, and in particular would not have to use the CAB’s rather clunky dependency injection in their code.

However, obviously we were using the CAB in our core framework.  This made it difficult to isolate our developers from the CAB completely:

  • As mentioned above we exposed functionality to the developers through CAB services.  However we gave them a simple service locator so they didn’t have to know anything about the CAB to use these services.
  • We also used some CAB events that developers would need to sink.  However, since this involves decorating a public method with an attribute, we didn’t think this was too difficult.

As already mentioned, to facilitate this we wrote a ‘template’ module, and documentation on how to use it.  This was a very simple dummy module that showed how to do all the basics.  In particular it showed what code to write at startup (a couple of standard methods), how to get hold of a service, and how to set up a menu item and associated event.
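
As a rough sketch (not the actual template code) of what that startup code looks like: ModuleInit, Load() and the [EventSubscription] attribute are standard CAB, while IMenuService, the event topic name and RiskBlotterControl are hypothetical stand-ins for the in-house pieces, and ServiceLocator is the locator sketched earlier.

    // A rough sketch of a module's startup class, under the assumptions noted above.
    using System;
    using Microsoft.Practices.CompositeUI;
    using Microsoft.Practices.CompositeUI.EventBroker;

    public class RiskModuleInit : ModuleInit
    {
        public override void Load()
        {
            base.Load();

            // Get hold of a framework service without touching CAB dependency injection.
            IMenuService menuService = ServiceLocator.Get<IMenuService>();

            // Register a ribbon menu item; clicking it fires the event topic below.
            menuService.AddMenuItem("Risk", "Risk Blotter", "topic://RiskTeam/ShowRiskBlotter");
        }

        // The framework routes the menu item's click to this method via the CAB event broker.
        [EventSubscription("topic://RiskTeam/ShowRiskBlotter")]
        public void OnShowRiskBlotter(object sender, EventArgs e)
        {
            // Display a vanilla user control in a docked window via the menu service.
            ServiceLocator.Get<IMenuService>().ShowView(new RiskBlotterControl(), "Risk Blotter");
        }
    }

    // Hypothetical shape of the menu service interface in the core interface assembly.
    public interface IMenuService
    {
        void AddMenuItem(string ribbonTab, string caption, string eventTopic);
        void ShowView(System.Windows.Forms.UserControl view, string caption);
    }

    // The module team's screen: just a plain UserControl, no special base class.
    public class RiskBlotterControl : System.Windows.Forms.UserControl { }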

Versioning

We realized after a few iterations of the system that we needed a reasonably sophisticated approach to versioning and loading of components.  As a result we wrote an assembly loader.  This:

  • Allows each module to keep its own assemblies in its own folder
  • Allows different modules to use different versions of the same assembly
  • Also allows different modules to explicitly share the same version of an assembly

Our default behaviour is that when loading an assembly that’s not in the root folder, the system checks all module folders for an assembly of that name and loads the latest version found.  This means teams can release interface assemblies without worrying about old versions in other folders.
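
A much simplified sketch of that default behaviour is below.  The folder layout and class name are assumptions; the AssemblyResolve hook and the ‘load the latest version found in any module folder’ rule are the points being illustrated.

    // Simplified sketch of the assembly loader's default behaviour (illustrative only).
    using System;
    using System.IO;
    using System.Linq;
    using System.Reflection;

    public static class ModuleAssemblyLoader
    {
        private static string _modulesRoot;

        // Hooked up once by the core framework before any modules are loaded.
        public static void Install(string modulesRoot)
        {
            _modulesRoot = modulesRoot;
            AppDomain.CurrentDomain.AssemblyResolve += OnAssemblyResolve;
        }

        private static Assembly OnAssemblyResolve(object sender, ResolveEventArgs args)
        {
            string fileName = new AssemblyName(args.Name).Name + ".dll";

            // Look for the assembly in every module folder and take the highest version found.
            var candidates = Directory.GetDirectories(_modulesRoot)
                .Select(dir => Path.Combine(dir, fileName))
                .Where(File.Exists)
                .Select(path => new { Path = path, Version = AssemblyName.GetAssemblyName(path).Version })
                .OrderByDescending(c => c.Version)
                .ToList();

            return candidates.Count == 0 ? null : Assembly.LoadFrom(candidates[0].Path);
        }
    }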

Versioning of Core Components

For core components there’s clearly some code that has to be used by everyone (e.g. the shell form itself, and the menus).  This has to be backwards compatible at each release because we don’t want everyone to have to release simultaneously.  We achieve this through the standard CAB pattern of interface assemblies: module teams only access core code through interfaces that can be extended, but not changed.
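
As an illustration of the pattern (with hypothetical interface names), a core interface assembly might evolve along these lines: existing members are never altered, and new functionality is added alongside them, for example as a new derived interface.

    // Illustrative only: an 'extend, don't change' core interface.
    public interface ILayoutService
    {
        // These signatures are frozen once released.
        void SaveLayout(string name);
        void RestoreLayout(string name);
    }

    // Added in a later release of the interface assembly; modules built against the
    // original ILayoutService are unaffected.
    public interface ILayoutService2 : ILayoutService
    {
        string[] GetSavedLayoutNames();
    }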

However, as mentioned above, the core team also writes control assemblies that aren’t backwards compatible: teams include them in their own module, and can upgrade whenever they want without affecting anyone else.

User Interface Design

For the user interface design, after a couple of iterations we settled on simple docking in the style of Visual Studio.  For this we used Weifen Luo’s excellent docking manager, and wrote a wrapper for it that turned it into a CAB workspace.  For menuing we used the ribbon bars in the DevExpress suite.

The use of docking again keeps things simple for our developers.  We have a menu service with a method to be called that just displays a vanilla user control in a docked (or floating) window.
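
The docking half of that show method might look roughly like the sketch below.  The DockPanel, DockContent and DockState types are the real Weifen Luo API; everything else (the class name, how the DockPanel reaches the service) is an illustrative assumption.

    // Illustrative sketch: how a vanilla user control ends up in a docked window.
    using System.Windows.Forms;
    using WeifenLuo.WinFormsUI.Docking;

    public class DockingMenuService
    {
        private readonly DockPanel _dockPanel;   // hosted on the shell form

        public DockingMenuService(DockPanel dockPanel)
        {
            _dockPanel = dockPanel;
        }

        // Module teams pass in a plain UserControl; no CAB or docking knowledge needed.
        public void ShowView(UserControl view, string caption)
        {
            var content = new DockContent { Text = caption };
            view.Dock = DockStyle.Fill;
            content.Controls.Add(view);
            content.Show(_dockPanel, DockState.Document);
        }
    }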

Deployment

In large organizations it’s not uncommon for the standard client deployment mechanisms to involve complex processes and technology.  Our organization has this problem.  Early on in this project it was mandated that we would use the standard deployment mechanisms.

We tried hard to wrap our corporate process in a way that made deployment as simple as possible.  To some extent we have succeeded, although we are (inevitably) very far from a simple process.

Configuration

For configuration (eventually) we used another team’s code that wrapped our centralized configuration system to allow our developers to store configuration data.  This gives us hierarchies of data in a centralized database.  It means you can easily change a setting for all users, groups of users, or an individual user, and can do this without the need for a code release.
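
The hierarchy works along these lines (an illustrative sketch only; the real service was another team’s wrapper over the centralized store): a per-user value overrides a group value, which overrides the department-wide default.

    // Illustrative only: hierarchical setting lookup with user/group/department fallback.
    using System.Collections.Generic;

    public class HierarchicalSettings
    {
        private readonly IDictionary<string, string> _department;
        private readonly IDictionary<string, string> _userGroup;
        private readonly IDictionary<string, string> _user;

        public HierarchicalSettings(IDictionary<string, string> department,
                                    IDictionary<string, string> userGroup,
                                    IDictionary<string, string> user)
        {
            _department = department;
            _userGroup = userGroup;
            _user = user;
        }

        public string Get(string key)
        {
            string value;
            if (_user.TryGetValue(key, out value)) return value;           // per-user override
            if (_userGroup.TryGetValue(key, out value)) return value;      // group override
            return _department.TryGetValue(key, out value) ? value : null; // department default
        }
    }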

Module Interaction

Clientside component interaction is achieved by using the standard CAB mechanisms.  If one team wants to call another team’s code they simply have to get hold of a service in the same way as they do for the core code, and make a method call on an interface.  This works well, and is one advantage of using the CAB.  Of course the service interface has to be versioned and backwards compatible, but this isn’t difficult.
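
For example (with a hypothetical interface; the real ones are more involved), the trade booking team might publish something like this in its interface assembly, and any other module can call it through the usual service locator:

    // Hypothetical cross-team service interface published in a shared interface assembly.
    public interface ITradeBookingService
    {
        // Existing members keep their signatures; new capabilities are added, not changed.
        string BookTrade(string counterparty, string security, decimal quantity, decimal price);
    }

    // In another team's module:
    //   var booking = ServiceLocator.Get<ITradeBookingService>();
    //   string tradeId = booking.BookTrade("SomeCounterparty", "XS0000000000", 10000000m, 99.5m);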

Security

For security we again wrapped our organization’s standard authentication and authorization systems so they could easily be used in our CAB application.  We extended the standard .Net Principal and Identity objects to allow authorization information to be directly accessed, and also allowed this information to be accessed via a security service.
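
A rough sketch of the extended principal idea is below.  The class and the Permissions property are hypothetical; IPrincipal and IIdentity are the standard .Net types mentioned above.

    // Sketch: the standard .Net IPrincipal implemented over the corporate entitlements system.
    using System.Collections.Generic;
    using System.Security.Principal;

    public class TradingPrincipal : IPrincipal
    {
        private readonly IIdentity _identity;
        private readonly HashSet<string> _permissions;

        public TradingPrincipal(IIdentity identity, IEnumerable<string> permissions)
        {
            _identity = identity;
            _permissions = new HashSet<string>(permissions);
        }

        public IIdentity Identity { get { return _identity; } }

        // Standard .Net check, so existing security-aware code keeps working.
        public bool IsInRole(string role) { return _permissions.Contains(role); }

        // Authorization information exposed directly (hypothetical shape).
        public IEnumerable<string> Permissions { get { return _permissions; } }
    }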

One thing that we didn’t do so well here was the control of authorization permissions.  These have proliferated, and different teams have handled different aspects of this in different ways.  This was in spite of us setting up what we thought was a simple standard way of dealing with the issue.  The result is that it’s hard to understand the permissions just by looking at our permissioning system.

Things We Didn’t Do So Well

As mentioned above, the things that didn’t go so well were largely the things we didn’t focus on in our original list of goals.

Most of these issues are about resource usage on the client.  This list is far from comprehensive: we do have other problems with what we’ve done, of course, but the issues highlighted here are the ones causing the most trouble at the time of writing.

The problems included:

Threading

The Problem

We decided early on to allow each team to do threading in the way they thought was appropriate, and didn’t provide much guidance on threading.  This was a mistake, for a couple of reasons.

Threading and Exception Handling

The first problem we had with threading was the simple one of background threads throwing exceptions with no exception handler in place.  As I’m sure you know, this is pretty much guaranteed to crash the entire application messily (which in this case means bringing down 11 teams’ code).  Of course it’s easy to fix if you follow some simple guidelines whenever you spawn a background thread.  We have an exception handler that can be hooked up with one line of code and that can deal with appropriate logging and thread marshalling.  We put how to do this, and dire warnings about the consequences of not doing so, in our documentation, but to no avail.  In the end we had highly-paid core developers going through other teams’ code looking for anywhere they spawned a thread and then complaining to their managers if they hadn’t put handlers in.
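
To show the sort of thing involved, a sketch of a ‘wrap your thread in one line’ helper is below.  ThreadRunner is a hypothetical name and the error reporting is simplified; the real handler did proper logging and marshalling, but the shape is the same: never let an exception escape a background thread.

    // Sketch of a one-line hook-up for background work (illustrative, simplified).
    using System;
    using System.Threading;
    using System.Windows.Forms;

    public static class ThreadRunner
    {
        // Module code: ThreadRunner.Run(myControl, LoadMyData);
        public static void Run(Control uiControl, Action work)
        {
            var thread = new Thread(() =>
            {
                try
                {
                    work();
                }
                catch (Exception ex)
                {
                    // Log and report on the UI thread instead of crashing 11 teams' code.
                    uiControl.BeginInvoke(new Action(() =>
                        MessageBox.Show(uiControl, ex.Message, "Background task failed")));
                }
            });
            thread.IsBackground = true;
            thread.Start();
        }
    }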

Complex Threading Models

Several of our teams were used to writing serverside code with complex threading models.  They replicated these clientside, even though most of our traders have nothing better than a dual-core machine, meaning any complex threading model in a workstation client is likely to be counterproductive.

Some of these models tend to throw occasional threading exceptions that are unreproducible and close to undebuggable.

What We Should Have Done

In retrospect we should have:

  • Provided some clear guidance for the use of threading in the client.
  • Written some simple threading wrappers and insisted the teams use them, horrible though that is.
  • Insisted that ANY use of threading be checked by the core team (i.e. a developer who knew about user interface threading).  The wrappers would have made it easy for us to check where threads were being spawned incorrectly (and without handlers).

Start Up

The Basic Problem

We have a problem with the startup of the system as well: it’s very slow.

Our standard startup code (in our template module) is very close to the standard SCSF code.  This allows teams to set up services and menu items when the entire application starts and the module is loaded.

This means the module teams have a hook that lets them run code at startup.  The intention here is that you instantiate a class or two, and it should take almost no time.  We didn’t anticipate that teams would use it to load their data, to start heartbeats, or, worse, to fire off a bunch of background threads to do that loading.  However, we have all of this in the system.

Of course the reason for this is that this code should really run when a user first clicks a menu item to open the team’s screen.  For heartbeats, it’s a little awkward to control startup and closedown as a screen opens and closes: it’s much easier just to start your heartbeats when the application starts.  For data loading, if it’s slow it becomes very obvious when it happens at the point a user requests a screen.

However, the impact of this happening across 11 development teams’ code is that the system is incredibly slow to start, and very, very fragile at startup.  It will often spend a couple of minutes showing the splash screen and then keel over with an incomprehensible error message (or none at all).  As a result most traders keep the system open all the time (including overnight), and an obvious consequence is that they are very reluctant to restart, even if they have a problem that we know a restart will fix.  Also, all machines are rebooted at the weekend in our organization, so they have to sit through the application startup on a Monday morning in any case.

One further problem is that no individual team has any incentive to improve their startup speed: it’s just a big pool of slowness, and as a user you can’t tell whether module X is much slower than module Y.  If any one team moves to proper service creation at startup it won’t have a huge overall effect.  We have 11 teams and probably no single team contributes more than a couple of minutes to the overall startup.  It’s the cumulative effect that’s the problem.

What We Should Have Done

This is one area where we should just have policed what was going on better, and been very firm about what is and is not allowed to be run at startup.  At one stage I proposed fixing the problem by banning ANY module team’s code from running at startup, and I think if I were to build an application of this kind again then that’s what I’d do.  However, clearly a module has to be able to set up its menu items at startup (or the user won’t be able to run anything).  So we’d have to develop a way of doing this via config for this to work, which would be ugly.

One other thing that would really help would be the ability to restart an individual module without restarting the entire system.

Memory Usage

The Problem

We effectively have 11 applications running in the same process.  So with memory usage we have similar problems to those at startup: every team uses as much memory as they think they need, but when you add it all up we can end up with instances of the system using well over 1GB of memory.  On a heavily-loaded trader’s machine this is a disaster: we’ve even had to get another machine for some traders just to run our application.

To be honest, this would be a problem for any complex trading environment.  If we had 11 separate applications doing the same things as ours the problem would probably be worse.

However, as above there’s no incentive for any individual team to address the problem: it’s just a big pool that everyone uses and no-one can see that module X is using 600MB.

What We Should Have Done

Again here better policing would have helped: we should have carefully checked every module’s memory requirements and told teams caching large amounts of data not to.  However, in the end this is a problem that is very hard to avoid: I don’t think many teams are caching huge amounts of data, it’s just there’s a lot of functionality in the client.

One thing that will help here is the move to 64-bit, which is finally happening in our organization.  All our traders have a ceiling of 4GB of memory at present (of which, as you know, over 1GB is used by Windows), so a 1GB application is a real problem.

Use of Other Dependency Injection Frameworks (Spring.Net)

The Problem

One unexpected effect of the decision not to compel teams to use the CAB was that a number of teams decided to use Spring.Net for dependency injection within their modules, rather than using the CAB dependency injection.  I have some sympathy with this decision, and we didn’t stop them.  However, Spring.Net isn’t well-designed for use in a framework of this kind and it did cause a number of problems.

  • The biggest of these is that Spring uses a number of process-wide singletons.  We had difficulties getting them to play nicely with our assembly loading.  This has resulted in everyone currently having to use the same (old) version of Spring.Net, and upgrading being a major exercise.
  • Handling application context across several modules written by different teams proved challenging.
  • If you use XML configuration in Spring.Net (which everyone does) then types in other assemblies are usually referenced using the simple assembly name only.  This invalidated some of our more ambitious assembly loading strategies.
  • Spring.Net’s exception messages for configuration errors are hard to decipher at the best of times, and harder still when several modules are configuring themselves at startup.

We also had some similar problems re singletons and versioning with the clientside components of our caching technology.  Some code isn’t really compatible with single-process composite applications.

What We Should Have Done

Again we should have policed this better: many of the problems described above are solvable, or could at least have been mitigated by laying down some guidelines early on.

What I’d Change If I Did This Again

The ‘what we should have done’ sections above indicate some of the things I’d change if I am ever responsible for building another framework of this kind.  However, there are two more fundamental (and very different) areas that I would change:

Code Reviews

In the ‘what we should have done’ sections above I’ve frequently mentioned that we should have monitored what was happening in the application more carefully.  The reasons we didn’t were partly to do with resourcing, but also to some extent philosophical.  Most of our development teams are of high quality, so we didn’t feel we needed to monitor them carefully and tell them what to do.

As you can see from the problems we’ve had, this was a mistake.  We should have identified the issues above early, and then reviewed all code going into production to ensure that there weren’t threading, startup, memory or any other issues.

Multiple Processes

The second thing I’d change is technical.  I now think it’s essential in a project of this kind to have some way of running clientside code in separate processes.  As we’ve seen, many of the problems we’ve had have arisen because everything is running in the same process:

  • Exceptions can bring the process down, or poorly-written code can hang it
  • It’s hard to identify how much each module is contributing to memory usage or startup time
  • There’s no way of shutting down and unloading a misbehaving module

I think I’d ideally design a framework that had multiple message loops and gave each team its own process in which they could display their own user interface.  This is tricky, but not impossible to do well.

Note that I’d still write the application as a framework.  I’d make sure the separate processes could communicate with each other easily, and that data could be cached and shared between the processes.

As an aside, a couple of alternatives to this are being explored in our organization at present.  The first is simply to break up the application into multiple simpler applications.  The problem with this is that it doesn’t really solve the memory usage or startup time problems, and in fact arguably makes them worse.  The second is to write a framework that has multiple processes but keeps the user interface for all development teams in the same process.  This is obviously easier to do technically than my suggestion above.  However, for many of our modules it would require quite a bit of refactoring: we would need to split out the user interface code cleanly and run it in a separate process from the rest of the module code.
