Using BenchmarkDotNet to profile string comparison

Introduction

String comparison and string manipulation are among the slowest and most expensive (in terms of GC) things that you can do in .NET. I’ve always believed that String.Compare outperforms string1.ToUpper() == string2.ToUpper(), which I think I once saw in a Stack Overflow post.

In this post, I will actually test the various methods using BenchmarkDotNet (which I have previously written about).

Setting Up BenchmarkDotNet

There’s not much to this – just install a NuGet package:

Install-Package BenchmarkDotNet

Other than that, you just need to decorate your methods with:

[Benchmark]

At the time of writing, you can’t pass parameters to the benchmark methods themselves, but you can decorate a set-up method, or you can supply a set of values through a public field:

        [Params("test1", "test2", "I am an aardvark")]
        public string _string1;

        [Params("test1", "Test2", "I Am an AARDVARK")]
        public string _string2;

Finally, in the main method, you run the class:

        static void Main(string[] args)
        {
            BenchmarkRunner.Run<StringCompareCaseSensitive>();
        }
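Putting the pieces above together, a complete benchmark class might look something like the following. This is a minimal sketch rather than the exact code behind the results in this post: the class name and the two fields come from the snippets above, but the three benchmark method names are made up for illustration. Each method returns its result so that the comparison isn’t optimised away.

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// The benchmark class must be public so that BenchmarkDotNet can generate code around it.
public class StringCompareCaseSensitive
{
    [Params("test1", "test2", "I am an aardvark")]
    public string _string1;

    [Params("test1", "Test2", "I Am an AARDVARK")]
    public string _string2;

    // Hypothetical method names; one [Benchmark] method per technique being measured.
    [Benchmark]
    public bool EqualsOperator() => _string1 == _string2;

    [Benchmark]
    public bool EqualsMethod() => _string1.Equals(_string2);

    [Benchmark]
    public bool CompareOrdinal() => string.CompareOrdinal(_string1, _string2) == 0;
}

public static class Program
{
    public static void Main(string[] args)
    {
        // Runs every [Benchmark] method in the class and writes the report files.
        BenchmarkRunner.Run<StringCompareCaseSensitive>();
    }
}
```

Because both fields carry [Params], BenchmarkDotNet runs each benchmark once for every combination of the two value sets (nine combinations here).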

Once run, the results are output into the following directory:

bin\Debug\BenchmarkDotNet.Artifacts\results

Comparing strings

Case sensitive

The following are the ways that I can think of to compare a string where the case is known:

string1 == string2

string1.Equals(string2) – with various flags

string.Compare(string1, string2)

string.CompareOrdinal(string1, string2)

string1.CompareTo(string2)

string1.IndexOf(string2) – with various flags

And the results were:

This is definitely not what I expected. String.Compare is actually slower than a straightforward comparison, and not by a small amount.

Case insensitive

The following are the ways that I can think of to compare a string where the case is not known:

string1.ToUpper() == string2.ToUpper()

string1.ToLower() == string2.ToLower()

string1.Equals(string2) – with various flags

string.Compare(string1, string2, true)

string1.IndexOf(string2) – with various flags

Results:

So, it looks like the most efficient string comparison is:

_string1.Equals(_string2, StringComparison.OrdinalIgnoreCase);

But why?

Nobody knows – Looking at the IL

The good thing about .NET is that if you want to see what your code looks like once it’s “compiled”, you can. It’s not perfect, because you still can’t see the machine code that actually executes, but it gives you a good idea of why something is slow or fast. However, because all of the functions in question are system functions, looking at the IL for the test code is pretty much pointless.

Let’s run ildasm:

(bet you’re glad I included that screenshot)

The string comparison functions are in mscorlib.dll:

Here’s the code in there:

.method public hidebysig static int32  Compare(string strA,
                                               string strB,
                                               valuetype System.StringComparison comparisonType) cil managed
{
  .custom instance void System.Security.SecuritySafeCriticalAttribute::.ctor() = ( 01 00 00 00 ) 
  // Code size       0 (0x0)
} // end of method String::Compare

To be honest, I spent a while burrowing down this particular rabbit hole… but finally decided to see what ILSpy had to say about it… it looks like there is a helper method in the string class that, for some reason, ildasm doesn’t show. Let’s have a look at what it does for:

string.Compare(_string1, _string2, true) == 0

The decompiled version is:

[__DynamicallyInvokable]
public static int Compare(string strA, string strB, bool ignoreCase)
{
    if (ignoreCase)
    {
        return CultureInfo.CurrentCulture.CompareInfo.Compare(strA, strB, CompareOptions.IgnoreCase);
    }
    return CultureInfo.CurrentCulture.CompareInfo.Compare(strA, strB, CompareOptions.None);
}

And the instance method CompareInfo.Compare (it is declared virtual, not static):

public virtual int Compare(string string1, string string2, CompareOptions options)
{
    if (options == CompareOptions.OrdinalIgnoreCase)
    {
        return string.Compare(string1, string2, StringComparison.OrdinalIgnoreCase);
    }
    if ((options & CompareOptions.Ordinal) != CompareOptions.None)
    {
        if (options != CompareOptions.Ordinal)
        {
            throw new ArgumentException(Environment.GetResourceString("Argument_CompareOptionOrdinal"), "options");
        }
        return string.CompareOrdinal(string1, string2);
    }
    else
    {
        if ((options & ~(CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreSymbols | CompareOptions.IgnoreKanaType | CompareOptions.IgnoreWidth | CompareOptions.StringSort)) != CompareOptions.None)
        {
            throw new ArgumentException(Environment.GetResourceString("Argument_InvalidFlag"), "options");
        }
        if (string1 == null)
        {
            if (string2 == null)
            {
                return 0;
            }
            return -1;
        }
        else
        {
            if (string2 == null)
            {
                return 1;
            }
            return CompareInfo.InternalCompareString(this.m_dataHandle, this.m_handleOrigin, this.m_sortName, string1, 0, string1.Length, string2, 0, string2.Length, CompareInfo.GetNativeCompareFlags(options));
        }
    }
}
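The decompiled code above shows that string.Compare(a, b, true) goes through the current culture’s CompareInfo, so it is applying linguistic rules rather than comparing character values. The following is a small sketch of what that difference means in practice; the class and method names are invented for illustration, and the Turkish culture is just a well-known example of a culture where "i" and "I" are not case variants of each other. It assumes full culture data is available on the machine, so the culture-sensitive lines are deliberately left without expected output.

```csharp
using System;
using System.Globalization;

public static class CultureVersusOrdinal
{
    // Culture-sensitive, case-insensitive comparison using an explicit culture.
    public static bool CultureIgnoreCase(string a, string b, CultureInfo culture) =>
        string.Compare(a, b, true, culture) == 0;

    // Ordinal, case-insensitive comparison: no culture rules involved.
    public static bool OrdinalIgnoreCase(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    public static void Main()
    {
        Console.WriteLine(OrdinalIgnoreCase("file", "FILE")); // True

        // Results below depend on the culture data installed; under Turkish
        // casing rules "i" and "I" are different letters, so the two calls
        // are expected to disagree.
        Console.WriteLine(CultureIgnoreCase("file", "FILE", new CultureInfo("en-US")));
        Console.WriteLine(CultureIgnoreCase("file", "FILE", new CultureInfo("tr-TR")));
    }
}
```

The extra work of consulting culture rules is one plausible reason the culture-sensitive overloads come out slower in the benchmarks, although the results in this post don’t prove that on their own.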

And further:

Well… I couldn’t get further, so I asked Microsoft… the impression I got is that this function isn’t managed code at all, but is provided by the runtime itself (see the QCall link in the references).

There was a link to some code in this answer, too. While I couldn’t really identify any actual comparison code from this, I did notice that there was a check like this:

#ifndef FEATURE_CORECLR

So… does .NET Core work any better?

I created a new .NET Core project and copied the files across. (I was going to add them as links, but InvariantCulture has been removed, or rather not included, in Core.)

Anyway, the results from .NET Core (for case sensitive checks) are:

And case insensitive:

Conclusion

So, the clear winner across all tests for case sensitive checks is to use:

string1.Equals(string2)

And .NET Core is slightly faster than .NET Framework 4.6.2.

For case insensitive the clear winner is (by a large margin):

string1.Equals(string2, StringComparison.OrdinalIgnoreCase);

And, again, there’s around a 15 – 20% speed boost using .NET Core.
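As a quick illustration of the two recommended comparisons, here is a small sketch; the class and method names are invented. It uses the static string.Equals overloads rather than the instance calls shown above, which is my own choice for the example: the static form also copes with a null first argument, whereas string1.Equals(...) throws a NullReferenceException when string1 is null.

```csharp
using System;

public static class StringComparisonExamples
{
    // Case sensitive: ordinal equality, safe when either argument is null.
    public static bool SameExact(string a, string b) =>
        string.Equals(a, b);

    // Case insensitive: ordinal comparison that ignores case.
    public static bool SameIgnoringCase(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    public static void Main()
    {
        Console.WriteLine(SameExact("test1", "test1"));                               // True
        Console.WriteLine(SameExact("test2", "Test2"));                               // False
        Console.WriteLine(SameIgnoringCase("I am an aardvark", "I Am an AARDVARK"));  // True
        Console.WriteLine(SameIgnoringCase(null, "test1"));                           // False
    }
}
```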

References

There is a GitHub repository for the code in this post here.

https://msdn.microsoft.com/en-us/library/fbh501kz%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396

https://github.com/dotnet/BenchmarkDotNet/issues/60

http://mattwarren.org/2016/02/17/adventures-in-benchmarking-memory-allocations/

https://www.hanselman.com/blog/BenchmarkingNETCode.aspx

http://pmichaels.net/2016/11/04/message-persistence-in-rabbitmq-and-benchmarkdotnet/

https://blog.codinghorror.com/the-real-cost-of-performance/

https://msdn.microsoft.com/en-us/library/aa309387%28v=vs.71%29.aspx?f=255&MSPPError=-2147217396

http://ilspy.net/

http://stackoverflow.com/questions/9491337/what-is-dllimportqcall

Message Persistence in RabbitMQ and BenchmarkDotNet

(Note: if you want to follow the code on this, you might find it easier if you start from the project that I create here.)

A queue in a message broker can be persistent, which means that, should you have a power failure (or just shut down the server), when it comes back, the queue is still there.

So, we can create a durable (persistent) queue, like this:

var result = channel.QueueDeclare("NewQueue", true, false, false, args);

The second parameter indicates that the queue is durable. Let’s send it some messages:

static void Main(string[] args)
{            
    for (int i = 1; i <= 100; i++)
    {
        string msg = $"test{i}";
 
        SendNewMessage(msg);
    } 
    
}
private static void SendNewMessage(string message)
{
    var factory = new ConnectionFactory() { HostName = "localhost" };
    using (var connection = factory.CreateConnection())
    using (var channel = connection.CreateModel())
    {
        Dictionary<string, object> args = 
            DeadLetterHelper.CreateDeadLetterQueue(channel,
            "dl.exchange", "dead-letter", "DeadLetterQueue");
 
        var result = channel.QueueDeclare("NewQueue", true, false, false, args);
        Console.WriteLine(result);
 
        channel.BasicPublish("", "NewQueue", null, Encoding.UTF8.GetBytes(message));                
 
    }
}

Now we have 100 messages:

Let’s simulate a server reboot:

Following the reboot, the messages are gone:

Admittedly, that doesn’t sound very durable!

Why?

The reason is that the durability of the queue doesn’t determine the durability of its messages: a durable queue survives a restart, but the messages in it do not unless they are themselves marked as persistent.

How can it be made persistent?

Let’s change our send code a little:

private static void SendNewMessage(string message)
{
    var factory = new ConnectionFactory() { HostName = "localhost" };
    using (var connection = factory.CreateConnection())
    using (var channel = connection.CreateModel())
    {
        Dictionary<string, object> args = 
            DeadLetterHelper.CreateDeadLetterQueue(channel,
            "dl.exchange", "dead-letter", "DeadLetterQueue");
 
        var result = channel.QueueDeclare("NewQueue", true, false, false, args);
        Console.WriteLine(result);
 
        // Mark the message itself as persistent, so that it survives a broker restart
        IBasicProperties prop = channel.CreateBasicProperties();
        prop.Persistent = true;
 
        channel.BasicPublish("", "NewQueue", prop, Encoding.UTF8.GetBytes(message));                
 
    }
}

The only difference is that we now pass an IBasicProperties argument with its Persistent flag set. Now we’ll re-run the same test; here are the messages:

And after restarting the service:

As you can see, the messages are still there, and you can see the time difference where they’ve been restored to the queue after a failure.

Speed – Introducing BenchmarkDotNet

I suppose the main question here is what price you pay for durability. This gives me a chance to play with a new tool that I heard about a little while ago: BenchmarkDotNet.

It’s quite straightforward to use. Just add the NuGet package:

Install-Package BenchmarkDotNet

There’s a bit of refactoring; I effectively ripped out the send and called it from a separate class:

class Program
{        
    static void Main(string[] args)
    {
        BenchmarkRunner.Run<SpeedTest>();
    }
}
 
public class SpeedTest
{
    [Benchmark]
    public void SendNewMessagePersist()
    {
        MessageHelper helper = new MessageHelper();
        helper.SendStringMessage("Test", "NewQueue", true);
    }
 
    [Benchmark]
    public void SendNewMessageNonPersist()
    {
        MessageHelper helper = new MessageHelper();
        helper.SendStringMessage("Test", "NewQueue", false);
    }
 
 
}

I then ran this:

And it produced this:

So, it is a bit slower to persist the message. I’m not sure how helpful this information is: I could probably have guessed beforehand that persisting the message would be slower. Having said that, I am quite impressed with BenchmarkDotNet!
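The MessageHelper class that the benchmark calls isn’t shown in this post. For anyone following along, a minimal sketch of what it might look like is below. This is a hypothetical reconstruction based on the SendNewMessage method earlier in the post, not the original class: it assumes that the queue has already been declared (as in the earlier code) and that the third argument simply controls the Persistent flag.

```csharp
using System.Text;
using RabbitMQ.Client;

// Hypothetical reconstruction; the real MessageHelper is not shown in this post.
public class MessageHelper
{
    public void SendStringMessage(string message, string queueName, bool persist)
    {
        var factory = new ConnectionFactory() { HostName = "localhost" };
        using (var connection = factory.CreateConnection())
        using (var channel = connection.CreateModel())
        {
            // Assumes the queue named queueName already exists on the broker.
            IBasicProperties prop = channel.CreateBasicProperties();
            prop.Persistent = persist;

            // Publish via the default exchange, routed by queue name.
            channel.BasicPublish("", queueName, prop, Encoding.UTF8.GetBytes(message));
        }
    }
}
```

If the helper does open a connection on every call, as this sketch does, then each benchmark measurement includes the connection set-up time as well as the publish itself, which is worth bearing in mind when reading the figures.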