Rich Newman

March 31, 2007

Comparing Values for Equality in .NET: Identity and Equivalence (Part 3)

Filed under: .net, c#, dotnet, equals, equivalence, identity, reference type, technology, value type — richnewman @ 10:53 am

Continued from part 1 and part 2:

Care with == and Reference Types

One final thing to note is that operator overloads don't behave like overrides: the compiler chooses which == to call from the declared (compile-time) types of the operands, not from their run-time types. If you use the == operator with reference types without thinking, this can be a problem.
For example, suppose you have an untyped DataSet ds containing a DataTable dt. Suppose dt has a single integer column called Value and two rows. Consider the following code:

            // Create DataSet
            DataSet ds = new DataSet("ds");
            DataTable dt = ds.Tables.Add("dt");
            dt.Columns.Add("Value", typeof(int));

            // Add two rows, both with Value column set to 1
            DataRow row1 = dt.NewRow(); row1["Value"] = 1; dt.Rows.Add(row1);
            DataRow row2 = dt.NewRow(); row2["Value"] = 1; dt.Rows.Add(row2);

            Console.WriteLine(row1["Value"] == row2["Value"]);       // Compare with == returns false.
            Console.WriteLine(row1["Value"].Equals(row2["Value"]));  // Compare with .Equals returns true.

When we compare with == in the example above we get false, even though the column in both rows contains the integer 1. The reason is that both row1["Value"] and row2["Value"] return objects, not integers. So the compiler uses the == defined for object, not the version defined for System.Int32. The == for object does an identity comparison (reference equality test). The underlying values have been separately boxed onto the heap, so aren't at the same memory address, and the test fails.

When we compare with .Equals we get true. This is because .Equals is overridden in System.Int32 to do a value comparison, so the comparison uses the overridden version to correctly compare the values of the two integers.
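If you do want to use == on values like these, one option is to unbox them first so that the compiler picks the integer version of the operator. The following is a minimal sketch of the difference; the class and method names are made up for illustration and are not part of the DataSet example above.

```csharp
using System;

public static class BoxedComparison
{
    // Both operands are declared as object, so == is a reference comparison.
    // Each int is boxed into its own heap object, so this returns false
    // even when x and y hold the same number.
    public static bool CompareAsObjects(int x, int y)
    {
        object left = x;
        object right = y;
        return left == right;
    }

    // Unboxing first means the int == operator is chosen, giving a value comparison.
    // The casts throw InvalidCastException if either object is not a boxed int.
    public static bool CompareUnboxed(object left, object right)
    {
        return (int)left == (int)right;
    }
}
```

With the DataRow example this would correspond to writing (int)row1["Value"] == (int)row2["Value"], assuming neither cell contains DBNull.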

v) a is b

a is b: Overview

a is b isn't actually a test for object equality at all, although it looks like one. b here has to be a type name (so b would need to be a class name, for example). The operator tests whether object a is either of type b or can be cast to it without an exception being thrown; if a is null the result is always false. This is equivalent to TypeOf a Is b in VB.NET, which is a little clearer.

a is b: Value Types/Reference Types

The operator works in the same way for both value types and reference types.
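A short sketch of the operator in use, for both a reference type and a boxed value type. The class and method names are illustrative only.

```csharp
using System;

public static class IsOperatorDemo
{
    // True only when value is a non-null string.
    public static bool IsString(object value)
    {
        return value is string;
    }

    // True when the run-time type of value implements IComparable,
    // which includes boxed value types such as int.
    public static bool IsComparable(object value)
    {
        return value is IComparable;
    }
}
```

Here IsString("abc") is true, IsString(42) and IsString(null) are false, and IsComparable(42) is true because System.Int32 implements IComparable.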

a is b: Override (overload?) or not?

The operator cannot be overloaded (or, clearly, overridden).

The Final Twist: String Interning

On the basis of the above what should this do?

object a = "Hello World";
object b = "Hello World";
Console.WriteLine(a.Equals(b));
Console.WriteLine(a == b);

At first glance you might say that:
a) a and b are reference types containing strings (you would be right).
b) .Equals is overridden in the string class to do an equivalence (value) comparison, and the values are equal. So a.Equals(b) is true (you would still be right).
c) However, a == b is an overload and on the object type it does an identity comparison not a value comparison (you would still be right).
d) a and b are separate objects in memory so a == b is false (you would be wrong).

d) is actually wrong, but only because of an optimization in the CLR. The CLR keeps a table of strings called the intern pool, which holds one copy of each string literal used in the application (plus any strings interned explicitly with String.Intern). When code loads a string literal the CLR checks the intern pool to see if that string is already there. If so it will not allocate memory for the string again, but will re-use the existing object. Hence a == b is true above.

You can prevent strings being interned by using a StringBuilder as below. In this case a.Equals(b) will be true, and a == b will be false, which is what you'd expect:

object a = "Hello World";
object b = new StringBuilder().Append("Hello ").Append("World").ToString();
Console.WriteLine(a.Equals(b));
Console.WriteLine(a == b);
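Going the other way, a string built at run time can be put into the pool explicitly with String.Intern (see the reference at the end of this post). The sketch below uses hypothetical class and method names: two strings built separately are distinct objects, but once both are interned they share a single reference.

```csharp
using System;
using System.Text;

public static class InternDemo
{
    // Builds "Hello World" at run time, so the result is a new string object
    // rather than the pooled literal.
    private static string Build()
    {
        return new StringBuilder().Append("Hello ").Append("World").ToString();
    }

    // Two separately built strings are equivalent but not identical.
    public static bool BuiltStringsAreIdentical()
    {
        object a = Build();
        object b = Build();
        return a == b; // reference comparison: false
    }

    // String.Intern returns the pool's copy, so both variables end up
    // pointing at the same object.
    public static bool InternedStringsAreIdentical()
    {
        object a = string.Intern(Build());
        object b = string.Intern(Build());
        return a == b; // reference comparison: true
    }
}
```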

VB.NET

This article has talked mainly about C#. However, the situation is similarly confusing in VB.NET. Because they are methods on System.Object, VB.NET also has a.Equals(b), object.Equals(a, b) and object.ReferenceEquals(a, b), which behave exactly as described above.

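VB.NET has no == operator. The nearest thing is its = comparison operator, which does a value comparison for primitive types and strings but never falls back to an identity comparison for other classes; identity comparisons use Is instead.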

VB.NET additionally has the Is operator. This operator's use in TypeOf a Is b statements was discussed under a is b: Overview above.

VB.NET: a Is b

The Is operator can also be used for identity (reference equality) comparisons on two reference types in VB.NET. However, unlike object.ReferenceEquals(a, b), which does the same thing for reference types, the Is operator cannot be used at all with value types. The Visual Basic compiler will not compile code where either a or b in the statement a Is b is a value type.

References: Jeffrey Richter “Applied Microsoft .NET Framework Programming”
http://www.microsoft.com/mspress/books/sampchap/5353.aspx#SampleChapter

Interning strings
http://msdn2.microsoft.com/en-us/library/system.string.intern.aspx

When to overload ==
http://msdn2.microsoft.com/en-us/library/ms173147.aspx

March 24, 2007

Comparing Values for Equality in .NET: Identity and Equivalence (Part 2)

Filed under: .net, c#, dotnet, equals, equivalence, identity, reference type, technology, value type — richnewman @ 5:34 pm

Continued from Part 1:

ii) object.Equals(a, b)

object.Equals(a, b): Overview

object.Equals(a, b) is a static method on the object class. Jeffrey Richter describes it as ‘a little helper method’. It’s easiest to think of it as a method that does some checking for nulls and then calls a.Equals(b).

The reason it exists is that if a is null a call to a.Equals(b) will throw a NullReferenceException. If there’s a possibility that a will be null it is easier to call this method than explicitly check for the null. If a can’t be null there’s no need for the additional check and a call to a.Equals(b) will be better.

object.Equals(a, b): Detail

In detail, this method does the following for a call to object.Equals(a, b):

a) Check if a and b are identical (i.e. they refer to the same location in memory or are both null). If so return true.
b) Check if either of a and b is null. We know they are not both null otherwise the routine would have returned in a) above, so if either is null return false.
c) Neither a nor b is null: return the value of a.Equals(b).
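The three steps above can be written out as a short sketch. This is my own reconstruction from the description, not the framework's source code, and the class and method names are hypothetical.

```csharp
using System;

public static class EqualsSketch
{
    // Mirrors the behaviour described for object.Equals(a, b).
    public static bool StaticEquals(object a, object b)
    {
        // a) Same object, or both null.
        if (object.ReferenceEquals(a, b))
        {
            return true;
        }

        // b) Exactly one of them is null.
        if (a == null || b == null)
        {
            return false;
        }

        // c) Neither is null: defer to the (possibly overridden) instance method.
        return a.Equals(b);
    }
}
```

Because step c) calls the virtual a.Equals(b), StaticEquals(1, 1) is true (System.Int32 overrides Equals) even though the two boxed integers are separate objects.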

object.Equals(a, b): Value Types and Reference Types

Since a and b can’t be null for value types, object.Equals(a, b) is identical to a.Equals(b). In general you should call a.Equals(b) in preference to object.Equals(a, b) for value types.

For reference types as discussed above you should call this method if there’s a chance that a will be null in a call to a.Equals(b).

object.Equals(a, b): Override or not?

object.Equals(a, b) is a static method on System.Object, and consequently can’t be overridden. However, since it calls into a.Equals(b) any overrides of Equals will affect calls to this method as well.

iii) object.ReferenceEquals(a, b)

object.ReferenceEquals(a, b): Overview

Whilst the two incarnations of Equals() above check for identity or equivalence depending on the underlying type, ReferenceEquals is intended to always check for identity.

object.ReferenceEquals(a, b): Value Types and Reference Types

For reference types object.ReferenceEquals(a, b) returns true if and only if a and b have the same underlying memory address.

In general we shouldn’t care whether value types occupy the same underlying memory address. It isn’t relevant for anything we’d normally want to use them for. But the definition above gives us a problem when value types are compared using ReferenceEquals.

The difficulty comes from the fact that ReferenceEquals expects two System.Objects as parameters. This means that our value types will get boxed onto the heap as they are passed in to this routine. Normally, because of the way the boxing process works, they will get boxed separately to different memory addresses on the heap. This of course means the call to ReferenceEquals returns false.

So for example object.ReferenceEquals(10, 10) returns false, for these reasons.

You can see it’s the boxing that causes the problem in the following code:

// Set up value type in int variable – no boxing
int value = 10;
object one = value; // Cast to object so boxed
object two = value; // Cast again so boxed again separately
// one and two are now separate memory locations on the heap
Console.WriteLine(object.ReferenceEquals(one, two)); // false

// Set up value type in object variable which immediately boxes it onto the heap
object value2 = 10; // value is boxed already
object three = value2; // three points to the boxed value
object four = value2; // four also points to the same boxed value
Console.WriteLine(object.ReferenceEquals(three, four)); // true

object.ReferenceEquals(a, b): Override or not?

ReferenceEquals is a static method on object, and so once again cannot be overridden. It will always perform identity checks as outlined above.

iv) a == b

a == b: Overview

== is an operator, clearly, and not a method. In my humble opinion it has been included in C# largely as a syntactic convenience and to make the language look like C/C++.

As with a.Equals(b), == is intended to test for identity or equivalence as appropriate (see the discussion in the paragraph “What type of Equality do we expect?” in Part 1). In fact, in almost all circumstances == should behave like a.Equals(b).

a == b: Value Types

For value types within the .NET Framework, == is implemented as you would expect, and will test for equivalence (value equality). However, for any custom value types you implement (structs) a default == will not be available unless you provide one.

a == b: Reference Types

For reference types a default == is available, and this will test for identity (reference equality). For most reference types in the .NET Framework == will again test for identity, but, as for a.Equals(b), there are certain classes where the operator has been overloaded to do a value comparison. System.String is once again the canonical example, for the reasons discussed in part one of this article.
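To see both behaviours side by side, the sketch below compares the same pair of strings twice: once through variables declared as string, where the overloaded == is used, and once through variables declared as object, where the default identity test is used. The names are illustrative only.

```csharp
using System;

public static class StringOperatorDemo
{
    // Creates a fresh "abc" at run time so that no two results share a reference.
    private static string Build()
    {
        return new string(new char[] { 'a', 'b', 'c' });
    }

    // Declared as string: the overloaded == on System.String compares values.
    public static bool CompareAsStrings()
    {
        string a = Build();
        string b = Build();
        return a == b; // true
    }

    // Declared as object: the default == compares references.
    public static bool CompareAsObjects()
    {
        object a = Build();
        object b = Build();
        return a == b; // false
    }
}
```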

a == b: Override (overload?) or not?

Since == is an operator we can’t override it. However, we can overload it to provide different functionality to the base functionality described above.

For reference types Microsoft recommend that you don’t overload == unless you have reference types behaving as value types as discussed above. This means that even if you override a.Equals(b) to provide some custom functionality you should leave your == operator to provide an identity test. This is, I think, the only occasion where == should behave differently from a.Equals(b).

For value types, as mentioned above, a default overload of == will not be available and you will have to provide one if you need one. The easiest thing to do is simply to call a.Equals(b) from an operator overload in your struct: in general your implementation of == should not be different from a.Equals(b).

Note that if you overload == you must also overload != (the C# compiler insists on the pair). You should also override a.Equals(b) to do the same thing, and as a result should override GetHashCode. Finally you should consider implementing IComparable.CompareTo().
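Putting these recommendations together for a value type might look like the sketch below. GridPoint is a hypothetical struct invented for this example, and the hash calculation is just one simple way of combining two fields.

```csharp
using System;

public struct GridPoint
{
    private readonly int x;
    private readonly int y;

    public GridPoint(int x, int y)
    {
        this.x = x;
        this.y = y;
    }

    // Field-by-field value comparison in place of the inherited default.
    public override bool Equals(object obj)
    {
        if (!(obj is GridPoint))
        {
            return false;
        }

        GridPoint other = (GridPoint)obj;
        return x == other.x && y == other.y;
    }

    // Equal points must produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            return (x * 397) ^ y;
        }
    }

    // == simply delegates to Equals so the two can never disagree.
    public static bool operator ==(GridPoint left, GridPoint right)
    {
        return left.Equals(right);
    }

    // != must be declared whenever == is.
    public static bool operator !=(GridPoint left, GridPoint right)
    {
        return !left.Equals(right);
    }
}
```

Delegating to Equals(object) boxes the right-hand operand on every comparison. If that matters, a strongly typed Equals(GridPoint) overload could be added and called from the operators instead.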

Part 3 to follow…

March 23, 2007

Comparing Values for Equality in .NET: Identity and Equivalence (Part 1)

Filed under: .net, c#, dotnet, equals, equivalence, identity, reference type, technology, value type — richnewman @ 10:17 pm

This article is now available on the Code Project at http://www.codeproject.com/dotnet/DotNetEquality.asp

Introduction

The various ways of comparing two values for equality in .NET can be very confusing. In fact if we have two objects a and b in C# there are at least four ways to compare them for equality, plus one operator that looks like an equality comparison to add to the confusion:

i) if (a.Equals(b)) {}
ii) if (object.Equals(a, b)) {}
iii) if (object.ReferenceEquals(a, b)) {}
iv) if (a == b) {}
v) if (a is b) {}

As if that isn’t confusing enough, these methods and operators behave differently depending on:

– whether a and b are reference types or value types
– whether they are reference types which are made to behave like value types for these purposes (System.String is one of these)

This post is an attempt to clarify why we have all these versions of equality, and what they all mean.

What does it mean to be the same?

Firstly, we have to understand that there are actually two basic types of equality for objects:

1. Identity (reference equality)
Two objects are identical if they actually are the same object in memory. That is, references to them point to the same memory address.
2. Equivalence (value equality)
Two objects are equivalent if the value or values they contain are the same.

So if we have two integers, a and b, both set to value 3, they are equivalent (they have the same value) but not necessarily identical (a and b can refer to different memory addresses).

However if two objects are identical (the same object) then they must be equivalent (have the same underlying values).
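As a small illustration of the two notions, the sketch below uses System.Version, a framework class that compares by value in Equals. Two separately created Version objects are equivalent but not identical, while two references to one object are both. The wrapper class and method names are made up for this example.

```csharp
using System;

public static class IdentityVersusEquivalence
{
    // Returns { identical(first, second), equivalent(first, second),
    //           identical(first, alias),  equivalent(first, alias) }.
    public static bool[] Compare()
    {
        Version first = new Version(1, 2);
        Version second = new Version(1, 2); // same value, different object
        Version alias = first;              // second reference to the same object

        return new bool[]
        {
            object.ReferenceEquals(first, second), // false: two objects in memory
            first.Equals(second),                  // true: same version number
            object.ReferenceEquals(first, alias),  // true: one object, two references
            first.Equals(alias)                    // true: identical implies equivalent
        };
    }
}
```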

What type of Equality do we expect?

Clearly these notions of identity and equivalence are related to the concept of reference types and value types.

Value types are intended as lightweight objects that have value semantics: two objects are the same if they have the same value, and they can then be used interchangeably. So integers a and b are the same in the example above because their values are both 3; it doesn’t matter whether a and b actually refer to the same underlying object in memory.

We don’t in general expect reference types to behave this way. Suppose we have two separate objects of type Book (a class). Book has one member variable called ‘title’ (a string). Do we necessarily consider these the ‘same’ Book if they have the same title? We might do so, but it isn’t clear.

To clarify the situation we might add an additional field ‘BookId’ which is unique for a given actual book. We could then say that two books are the same if they have the same BookId, even if they have different titles. But then we wouldn’t normally expect to have two separate Books with the same BookId in memory at the same time: there’s only one underlying book. So potentially we can just compare memory addresses to see if two Books are the same.

The point is that equality for reference types is trickier to define. Our default definition is going to be that two reference types are the same if they are identical.

Types of Equality

Now I’ll go through each of the types of equality referred to in the first paragraph in turn and try to explain why they exist. I’ll also explain how they are implemented for value and reference types, and when you should override or overload them.

i) a.Equals(b)

a.Equals(b): Overview

Equals() is a virtual method on System.Object. This means every single object can call this, and in your own type definitions you can override it to give the behaviour you want.

The base System.Object implementation of Equals() is to do an identity comparison. However, Equals() is intended to test for identity or equivalence as appropriate (see the discussion in the paragraph above).

a.Equals(b): Value Types

For value types this method is overridden to do a value (equivalence) comparison. In particular, System.ValueType itself, the root of all value types, contains an override that will compare two objects by reflecting over their internal fields to see if they are all equal. If you inherit this (by setting up a struct) your struct will get this override by default.
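The sketch below shows the inherited behaviour on a struct that defines no Equals of its own. Coordinates and the helper class are hypothetical names used only for this illustration.

```csharp
using System;

// A struct with no Equals override: it relies entirely on System.ValueType.
public struct Coordinates
{
    public int Latitude;
    public int Longitude;

    public Coordinates(int latitude, int longitude)
    {
        Latitude = latitude;
        Longitude = longitude;
    }
}

public static class DefaultStructEquality
{
    // The inherited Equals compares the fields, so these two values are equal.
    public static bool SameFields()
    {
        return new Coordinates(51, 0).Equals(new Coordinates(51, 0));
    }

    // A difference in any field makes the values unequal.
    public static bool DifferentFields()
    {
        return new Coordinates(51, 0).Equals(new Coordinates(48, 2));
    }
}
```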

a.Equals(b): Reference Types

For reference types, as discussed above, the situation is trickier. In general we expect Equals() for reference types to do an identity comparison (to check whether the objects actually are the same in memory).

However, certain reference types aren’t lightweight enough to work as value types, but nevertheless have value semantics. The canonical example is System.String. System.String is a reference type. However if we have a = “abc” and b = “abc” we expect a to be equal to b. So in the framework Equals() is overridden to do a value comparison.

a.Equals(b): Override or not?

As mentioned above, for value types there is a default override of a.Equals(b) in the base class System.ValueType which will work for any structs you set up. This method uses reflection to iterate over all of the fields of the two value types you are trying to compare, checking that their values are equal. In general this is what you want for value type comparison.

However, the overridden Equals() method uses reflection, which is slow, and involves a certain amount of boxing. For speed optimization it can be good to override this method. For a more detailed discussion of this see Jeffrey Richter’s book ‘Applied Microsoft .NET Framework Programming’.

In general it is considered good practice to leave Equals() doing its default identity comparison when defining new reference types (classes). The exception is when you know you want value semantics for your class (like System.String), or when you want Equals to work in a specific way. In particular, if your class is going to be used as a key in a Hashtable you need to override Equals if that is to be in any way efficient.

Note that if you override a.Equals(b) you should also override GetHashCode() and should consider implementing IComparable.CompareTo().
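As a sketch of what this looks like for a class, the example below reuses the Book idea from earlier and treats two Book objects as equal when their BookId matches. The member names and the lookup helper are assumptions made for the example; a generic Dictionary is used here, but a Hashtable relies on Equals and GetHashCode in the same way.

```csharp
using System;
using System.Collections.Generic;

public sealed class Book
{
    private readonly int bookId;
    private readonly string title;

    public Book(int bookId, string title)
    {
        this.bookId = bookId;
        this.title = title;
    }

    public string Title
    {
        get { return title; }
    }

    // Two books are the same book if they share a BookId, whatever their titles.
    public override bool Equals(object obj)
    {
        Book other = obj as Book;
        return other != null && bookId == other.bookId;
    }

    // Must agree with Equals: equal books give equal hash codes.
    public override int GetHashCode()
    {
        return bookId.GetHashCode();
    }
}

public static class BookLookup
{
    // Stores one Book as a key and looks it up with a different, equivalent instance.
    public static bool FindsEquivalentKey()
    {
        Dictionary<Book, string> shelf = new Dictionary<Book, string>();
        shelf[new Book(1, "First Edition")] = "Shelf A";
        return shelf.ContainsKey(new Book(1, "Second Edition")); // true
    }
}
```

Without the two overrides the lookup would return false, because the default implementations work on object identity.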

To be continued in part 2

References:

Jeffrey Richter “Applied Microsoft .NET Framework Programming”
http://www.microsoft.com/mspress/books/sampchap/5353.aspx#SampleChapter

March 4, 2007

Problems with Table Adapters in .NET 2.0

Filed under: .net, c#, dataset, dotnet, table adapter, tableadapter, technology — richnewman @ 6:11 pm

There is now a second article on the problems with table adapters in .NET.

Introduction

In Visual Studio 2005 Microsoft have effectively deprecated the use of the separate DataAdapter components (OleDbDataAdapter, SqlDataAdapter). They have replaced them with TableAdapters that are code-generated with the DataSet itself, in the same code module. However, we are struggling with how to do connection management with table adapters. A couple of initial problems are described here.

Problem 1: Custom Connection String Set-Up

One problem is that within our organization we are not permitted to use Microsoft’s integrated security. Furthermore, we are expected to use generic database accounts to connect to databases (so all users use the same account). We have been storing the password for this database account in the app.config file on the client workstation. When we do this we encrypt it using 256-bit AES encryption.

So if we are going to use table adapters we need to decrypt the password and set it on the table adapter. We can’t directly use the configuration file password in the way Microsoft clearly intend you to. It is possible that we could use Protected Configuration for this, of course.

Problem 2: Development vs Production Databases

We also need a method of ensuring that for release builds the connection strings on all table adapters are pointing at the production database. Developers can set up table adapters to point at any database they like (and of course this will be a development database) and this connection is cached and will be re-used. Obviously you don’t want your live system accessing a development database for certain data.

Solution

To solve both of these problems we need to be able to set the connection string used by all the table adapters to the same correct value at system start up. Further, we need to be able to construct the connection string in code and then set it.

To do this:

  1. Set up a new data source using the Data Source Configuration Wizard. This can point at any database, but when the wizard asks if you want to save the connection string to the application configuration file tick ‘Yes’ and give the connection a sensible name (e.g. MainConnectionString).
  2. This has the effect of adding the string to both the application configuration file AND to the Settings class (under Properties in Solution Explorer). You’re not interested in the settings in the application configuration file and can delete them. For most projects the table adapters won’t be in the start up component in any case, which means an application configuration file won’t have any effect.
  3. Now extend the Settings class by adding a new partial class to your project as below. This should be in the same namespace as the existing settings class (since it is the same class). As shown, expose a method to set MainConnectionString (or whatever you have called it) on this class.
  4. Now write code that will run at start up, construct your connection string, and set it on the Settings class using your new method.

    internal sealed partial class Settings
    {
        internal void SetMainConnectionString(string value)
        {
            this["MainConnectionString"] = value;
        }
    }
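
For step 4, the start-up code might build the string along the lines of the sketch below. This is only an illustration: the ConnectionStartup class, its parameters and the keyword names are assumptions, and the password is expected to have been decrypted already by whatever routine your application uses. It relies on the provider-neutral DbConnectionStringBuilder class from System.Data.Common.

```csharp
using System;
using System.Data.Common;

public static class ConnectionStartup
{
    // Assembles a connection string from its parts.
    // All four parameters are hypothetical inputs supplied by the start-up code;
    // password must already be decrypted.
    public static string BuildConnectionString(
        string provider, string dataSource, string userId, string password)
    {
        DbConnectionStringBuilder builder = new DbConnectionStringBuilder();
        builder["Provider"] = provider;
        builder["Data Source"] = dataSource;
        builder["User ID"] = userId;
        builder["Password"] = password;

        // The builder takes care of separators and of quoting awkward values.
        return builder.ConnectionString;
    }
}
```

The result would then be passed to the new method on the Settings class, for example Settings.Default.SetMainConnectionString(connectionString), before any table adapter is used.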

All developers must now use your new named connection to connect any table adapter to the database. However, they can freely edit it to point at any development database they like, because the start up code will redirect it in production. This editing can easily be done in the Settings screen. Obviously you still can’t easily stop developers using the wrong named connection by mistake. You could in theory write a unit test using reflection to find any table adapters that didn’t have their connection correctly set (although this might be tricky since TableAdapter isn’t actually a type).

Conclusion

The problems with table adapters outlined here can be solved by using the Settings class.

However, currently in our project we are still using ‘old-style’ OleDbDataAdapters for all our data access. We upgraded our project to .NET 2.0 at the back end of last year, but have not yet moved to using TableAdapters due to time constraints, and because of the problems above and the problems outlined in my second article – Reasons Not to Use TableAdapters in .NET 2.0.

http://blogs.msdn.com/vbteam/archive/2004/07/19/187953.aspx#214433

Day 1

Filed under: general — richnewman @ 3:59 pm

Welcome to my blog.
