
Using BenchmarkDotNet to profile string comparison

Introduction

Comparing and manipulating strings are some of the slowest and most expensive (in terms of GC) things that you can do in .Net. I’ve always believed that String.Compare outperforms string1.ToUpper() == string2.ToUpper(), which I think I once saw in a StackOverflow post.

In this post, I will test the various methods using BenchmarkDotNet (which I have previously written about).

Setting Up BenchmarkDotNet

There’s not much to this – just install a NuGet package:

Install-Package BenchmarkDotNet

Other than that, you just need to decorate your methods with:

[Benchmark]

You can’t (at the moment) specify method parameters, but you can decorate a set-up method, or you can specify some parameters in a public field:

        [Params("test1", "test2", "I am an aardvark")]
        public string _string1;

        [Params("test1", "Test2", "I Am an AARDVARK")]
        public string _string2;

Finally, in the main method, you run the class:

        static void Main(string[] args)
        {
            BenchmarkRunner.Run<StringCompareCaseSensitive>();
        }
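Putting those pieces together, a complete benchmark class might look something like the sketch below. The class name and the [Params] values are from this post; the two benchmark method names (EqualsOperator and CompareOrdinal) are only illustrative, and the real tests are in the repository linked in the references. It assumes the BenchmarkDotNet NuGet package is installed.

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class StringCompareCaseSensitive
{
    // BenchmarkDotNet runs each [Benchmark] method for every combination of these values
    [Params("test1", "test2", "I am an aardvark")]
    public string _string1;

    [Params("test1", "Test2", "I Am an AARDVARK")]
    public string _string2;

    // Illustrative method name (an assumption, not taken from the original code).
    // Returning the result means the comparison cannot be optimised away.
    [Benchmark]
    public bool EqualsOperator() => _string1 == _string2;

    // Illustrative method name (an assumption, not taken from the original code)
    [Benchmark]
    public bool CompareOrdinal() => string.CompareOrdinal(_string1, _string2) == 0;
}

public static class Program
{
    public static void Main(string[] args)
    {
        BenchmarkRunner.Run<StringCompareCaseSensitive>();
    }
}
```

BenchmarkDotNet warns if the project is built in Debug, so it is generally worth running this from a Release build.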

Once run, the results are output into the following directory:

bin\Debug\BenchmarkDotNet.Artifacts\results

Comparing strings

Case sensitive

The following are the ways that I can think of to compare a string where the case is known:

string1 == string2

string1.Equals(string2) – with various flags

string.Compare(string1, string2)

string.CompareOrdinal(string1, string2)

string1.CompareTo(string2)

string1.IndexOf(string2) – with various flags
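As a rough sketch, this is how each of those calls could be turned into an equality check. The class and method names are made up for illustration, and the benchmarks in the repository may be written differently. Note that IndexOf on its own only says where one string starts inside the other, so the sketch assumes a length check is needed as well.

```csharp
using System;

public static class CaseSensitiveChecks
{
    public static bool ViaOperator(string a, string b) => a == b;

    public static bool ViaEquals(string a, string b) => a.Equals(b);

    // Compare and CompareTo return 0 when the strings sort as equal
    public static bool ViaCompare(string a, string b) => string.Compare(a, b) == 0;

    public static bool ViaCompareOrdinal(string a, string b) => string.CompareOrdinal(a, b) == 0;

    public static bool ViaCompareTo(string a, string b) => a.CompareTo(b) == 0;

    // IndexOf only reports where b starts inside a, so the lengths must match too
    public static bool ViaIndexOf(string a, string b) =>
        a.Length == b.Length && a.IndexOf(b, StringComparison.Ordinal) == 0;

    public static void Main()
    {
        Console.WriteLine(ViaOperator("test1", "test1"));   // True
        Console.WriteLine(ViaCompare("test2", "Test2"));    // False
    }
}
```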

And the results were:

This is definitely not what I expected. String.Compare is actually slower than a straightforward comparison, and not by a small amount.

Case insensitive

The following are the ways that I can think of to compare a string where the case is not known:

string1.ToUpper() == string2.ToUpper()

string1.ToLower() == string2.ToLower()

string1.Equals(string2) – with various flags

string.Compare(string1, string2, true)

string1.IndexOf(string2) – with various flags
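Written out as equality checks, the case insensitive candidates might look like the sketch below. Again, the class and method names are invented for illustration, and the length check on the IndexOf version is an assumption about how it would have to be used. The ToUpper and ToLower versions each allocate two new strings per call, which is where the GC cost mentioned in the introduction comes from.

```csharp
using System;

public static class CaseInsensitiveChecks
{
    // Each of these two allocates a converted copy of both strings
    public static bool ViaToUpper(string a, string b) => a.ToUpper() == b.ToUpper();

    public static bool ViaToLower(string a, string b) => a.ToLower() == b.ToLower();

    public static bool ViaEquals(string a, string b) =>
        a.Equals(b, StringComparison.OrdinalIgnoreCase);

    // The third argument is ignoreCase
    public static bool ViaCompare(string a, string b) => string.Compare(a, b, true) == 0;

    // IndexOf only reports where b starts inside a, so the lengths must match too
    public static bool ViaIndexOf(string a, string b) =>
        a.Length == b.Length && a.IndexOf(b, StringComparison.OrdinalIgnoreCase) == 0;

    public static void Main()
    {
        Console.WriteLine(ViaEquals("I am an aardvark", "I Am an AARDVARK"));   // True
        Console.WriteLine(ViaEquals("test1", "test2"));                         // False
    }
}
```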

Results:

So, it looks like the most efficient string comparison is:

_string1.Equals(_string2, StringComparison.OrdinalIgnoreCase);

But why?

Nobody knows – Looking at the IL

The good thing about .Net is that if you want to see what your code looks like once it’s “compiled”, you can. It’s not perfect, because you still can’t see the actual, executed code, but it still gives you a good idea of why it’s slow or fast. However, because all of the functions in question are system functions, looking at the IL for the test code is pretty much pointless.

Let’s run ildasm:

(bet you’re glad I included that screenshot)

The string comparison functions are in mscorlib.dll:

Here’s the code in there:

.method public hidebysig static int32  Compare(string strA,
                                               string strB,
                                               valuetype System.StringComparison comparisonType) cil managed
{
  .custom instance void System.Security.SecuritySafeCriticalAttribute::.ctor() = ( 01 00 00 00 ) 
  // Code size       0 (0x0)
} // end of method String::Compare

To be honest, I spent a while burrowing down this particular rabbit hole… but finally decided to see what ILSpy had to say about it… it looks like there is a helper method in the string class that, for some reason, ildasm doesn’t show. Let’s have a look at what it does for:

string.Compare(_string1, _string2, true) == 0

The decompiled version is:

[__DynamicallyInvokable]
public static int Compare(string strA, string strB, bool ignoreCase)
{
    if (ignoreCase)
    {
        return CultureInfo.CurrentCulture.CompareInfo.Compare(strA, strB, CompareOptions.IgnoreCase);
    }
    return CultureInfo.CurrentCulture.CompareInfo.Compare(strA, strB, CompareOptions.None);
}
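If the decompiled code above is right, then calling string.Compare with ignoreCase set to true should give the same answer as going to the current culture’s CompareInfo directly. The following is a small sketch to check that; the class and method names are made up for the purpose.

```csharp
using System;
using System.Globalization;

public static class CompareEquivalence
{
    public static int ViaStringCompare(string a, string b) => string.Compare(a, b, true);

    // The call that the decompiled string.Compare appears to delegate to
    public static int ViaCompareInfo(string a, string b) =>
        CultureInfo.CurrentCulture.CompareInfo.Compare(a, b, CompareOptions.IgnoreCase);

    public static void Main()
    {
        string a = "I am an aardvark";
        string b = "I Am an AARDVARK";

        // Only the sign of a comparison result is meaningful, so compare signs
        bool same = Math.Sign(ViaStringCompare(a, b)) == Math.Sign(ViaCompareInfo(a, b));
        Console.WriteLine(same);
    }
}
```

Both routes are culture-aware, which is one plausible reason why they lose to the ordinal comparisons in the results above.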

And the instance method CompareInfo.Compare:

public virtual int Compare(string string1, string string2, CompareOptions options)
{
    if (options == CompareOptions.OrdinalIgnoreCase)
    {
        return string.Compare(string1, string2, StringComparison.OrdinalIgnoreCase);
    }
    if ((options & CompareOptions.Ordinal) != CompareOptions.None)
    {
        if (options != CompareOptions.Ordinal)
        {
            throw new ArgumentException(Environment.GetResourceString("Argument_CompareOptionOrdinal"), "options");
        }
        return string.CompareOrdinal(string1, string2);
    }
    else
    {
        if ((options & ~(CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreSymbols | CompareOptions.IgnoreKanaType | CompareOptions.IgnoreWidth | CompareOptions.StringSort)) != CompareOptions.None)
        {
            throw new ArgumentException(Environment.GetResourceString("Argument_InvalidFlag"), "options");
        }
        if (string1 == null)
        {
            if (string2 == null)
            {
                return 0;
            }
            return -1;
        }
        else
        {
            if (string2 == null)
            {
                return 1;
            }
            return CompareInfo.InternalCompareString(this.m_dataHandle, this.m_handleOrigin, this.m_sortName, string1, 0, string1.Length, string2, 0, string2.Length, CompareInfo.GetNativeCompareFlags(options));
        }
    }
}

And further:

Well… I couldn’t get further, so I asked Microsoft… the impression is that this function is generated at runtime.

There was a link to some code in this answer, too. While I couldn’t really identify any actual comparison code from this, I did notice that there was a check like this:

#ifndef FEATURE_CORECLR

So… does .NetCore work any better?

I created a new .Net Core project and copied the files across. I was going to add them as links, but InvariantCulture has been removed (or rather, not included) in Core.

Anyway, the results from .Net Core (for case sensitive checks) are:

And case insensitive:

Conclusion

So, the clear winner across all tests for case sensitive checks is to use:

string1.Equals(string2)

And .Net Core is slightly faster than .Net Framework 4.6.2.

For case insensitive the clear winner is (by a large margin):

string1.Equals(string2, StringComparison.OrdinalIgnoreCase);

And, again, there’s around a 15 – 20% speed boost using .Net Core.

References

There is a GitHub repository for the code in this post here.

https://msdn.microsoft.com/en-us/library/fbh501kz%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396

https://github.com/dotnet/BenchmarkDotNet/issues/60

http://mattwarren.org/2016/02/17/adventures-in-benchmarking-memory-allocations/

https://www.hanselman.com/blog/BenchmarkingNETCode.aspx

http://pmichaels.net/2016/11/04/message-persistence-in-rabbitmq-and-benchmarkdotnet/

https://blog.codinghorror.com/the-real-cost-of-performance/

https://msdn.microsoft.com/en-us/library/aa309387%28v=vs.71%29.aspx?f=255&MSPPError=-2147217396

http://ilspy.net/

http://stackoverflow.com/questions/9491337/what-is-dllimportqcall

Chaos Monkey – Part 3 – Consuming Memory

Continuing from previous posts on programs that generally do your machine no good at all, I thought it might be an idea to have a look at what I could do to the available memory. The use case here is that you want to see how your application functions when in competition with either one high-memory process, or many smaller ones.

To accomplish this, we’re going to create a list of strings – since strings are notoriously bad for memory anyway. The first thing to note here is that a single character takes up 16 bits, which is 2 bytes.
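To make the arithmetic concrete, here is a small sketch that works out how many characters fit into a given number of megabytes. The class and method names are made up for illustration, and it ignores the small fixed overhead that each string object carries.

```csharp
using System;

public static class CharMaths
{
    public const long BytesPerMB = 1024 * 1024;

    // sizeof(char) is 2 in .Net, because a char is a UTF-16 code unit
    public static long CharsPerMB() => BytesPerMB / sizeof(char);

    public static long CharsFor(long megabytes) => CharsPerMB() * megabytes;

    public static void Main()
    {
        Console.WriteLine(CharsPerMB());     // 524288
        Console.WriteLine(CharsFor(1000));   // 524288000
    }
}
```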

The second thing is how to check the system’s available memory:

        private static System.Diagnostics.PerformanceCounter ramCounter =
            new System.Diagnostics.PerformanceCounter("Memory", "Available MBytes");

        private static long GetRemainingMemory()
        {
            return ramCounter.RawValue;
        }

Finally, you need to be aware that you can only use up all the memory in your machine (assuming you have more than 2GB) if you run the app in x64 mode. If you have less, then you probably don’t need this article to simulate what low memory feels like.
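One way to avoid being caught out by this is to check the bitness at start-up. This is a minimal sketch rather than part of the original program; the class and method names are made up, but Environment.Is64BitProcess is a standard property.

```csharp
using System;

public static class BitnessCheck
{
    // True when the current process is running as 64-bit
    public static bool CanUseAllMemory() => Environment.Is64BitProcess;

    public static void Main()
    {
        if (!CanUseAllMemory())
        {
            Console.WriteLine("Running as a 32-bit process; rebuild as x64 to consume more memory");
        }
    }
}
```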

There is a pretty big caveat to doing this: once you actually run out of memory, it takes a good few minutes for the system to catch up, even when you terminate the process. Consequently, the code that I use allows you to specify a “remaining memory”. Here’s the main function:

        static void Main(string[] args)
        {
            long remainingMemory = int.Parse(args[0]);

            // Determine how much memory there is
            long memoryLeft = GetRemainingMemory();
            Console.WriteLine("Consuming memory until {0} is left", remainingMemory);

            // Calculate how much memory to use
            long removeMemory = memoryLeft - remainingMemory;

            // Call the function to consume the memory
            Console.WriteLine("Consuming {0} memory", removeMemory);
            ConsumeMemory(removeMemory, 1000);

            // Free the memory
            Console.WriteLine("Press any key to free memory");
            Console.ReadLine();
            FreeMemory();

            Console.ReadLine();
        }

As you can see, it first determines what we have to play with, and then calls a function to consume it. The second parameter to ConsumeMemory is the amount (in MB) consumed on each iteration, which controls the speed at which memory is consumed. If you set this to 1 then the usage will be slow; however, if you set it higher than the amount that you want to leave remaining then it may use too much. Also, it doesn’t seem to improve speed much after that anyway.

The ConsumeMemory() function looks like this:

        static void ConsumeMemory(long memoryToConsumeMB, int consumePerItt)
        {            
            long bitsPerMB = 1024 * 1024 * 8;
            // Single char 2 bytes (16 bits)
            long numCharsPerMB = (bitsPerMB / 16);
            long numChars = numCharsPerMB * memoryToConsumeMB;
            long counter = 1, chunk = 0;

            if (memoryToConsumeMB > GetRemainingMemory())
            {
                Console.WriteLine("Cannot consume {0} because there is only {1} left", 
                    memoryToConsumeMB, GetRemainingMemory());
                // Don't attempt to allocate more than is available
                return;
            }

            counter = memoryToConsumeMB / consumePerItt;
            chunk = numCharsPerMB * consumePerItt;

            Console.WriteLine("Consuming {0} memory", memoryToConsumeMB);

            for (int i = 1; i <= counter; i++)
            {
                Console.WriteLine("Consuming {0} MB", chunk / numCharsPerMB);

                _str.Add(new string('_', (int)chunk));

                Console.WriteLine("Memory remaining: {0}", GetRemainingMemory());
            }
        }

So, we basically work out how much memory we’re using each iteration and just add to a list of strings each time. Here’s what it looks like when you run it as above:
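One thing to be aware of is that the iteration count comes from an integer division, so any remainder is dropped: asking for 2500 MB at 1000 MB per iteration only allocates 2000 MB. The sketch below shows one way the chunks could be planned so that the remainder is included. PlanChunks is a hypothetical helper, not part of the original program.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkPlanner
{
    // Hypothetical helper: returns the size in MB of each allocation,
    // with a smaller final chunk for whatever is left over
    public static List<long> PlanChunks(long totalMB, int perIterationMB)
    {
        if (perIterationMB <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(perIterationMB));
        }

        var chunks = new List<long>();
        for (long remaining = totalMB; remaining > 0; remaining -= perIterationMB)
        {
            chunks.Add(Math.Min(remaining, perIterationMB));
        }
        return chunks;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", PlanChunks(2500, 1000)));   // 1000, 1000, 500
    }
}
```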

Chaos1

The FreeMemory() function just releases the list and calls the GC:

        private static void FreeMemory()
        {
            _str = null;
            GC.Collect();
            ShowMemory();
        }

As you can see, it ramps up pretty quickly. In this case I’m leaving 2GB.

Super Chaos Monkey Mode

Let’s try putting this in a loop and take out the prompts:

        static void Main(string[] args)
        {
            long remainingMemory = int.Parse(args[0]);

            while (true)
            {
                // Determine how much memory there is
                long memoryLeft = GetRemainingMemory();
                Console.WriteLine("Consuming memory until {0} is left", remainingMemory);

                // Calculate how much memory to use
                long removeMemory = memoryLeft - remainingMemory;

                // Call the function to consume the memory
                Console.WriteLine("Consuming {0} memory", removeMemory);
                ConsumeMemory(removeMemory, 1000);

                // Free the memory
                Console.WriteLine("Freeing memory");
                //Console.ReadLine();
                FreeMemory();
            }

            //Console.ReadLine();
        }

        private static void FreeMemory()
        {
            _str = new List<string>();
            GC.Collect();
            ShowMemory();
        }

Chaos2

A note on the GC

Okay – there are very few cases where GC.Collect() should be called, but I believe this to be one of them. The reason is that not calling it explicitly results in the following:

Chaos3

Basically, by the time the garbage collection kicks in, you’re already allocating more memory, which affects the ebb and flow.
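The effect of the explicit collection can be seen on a much smaller scale with the sketch below. It is not the code used for the screenshots: the names are made up, and it measures the managed heap with GC.GetTotalMemory rather than the system-wide counter used elsewhere in this post.

```csharp
using System;
using System.Collections.Generic;

public static class GcDemo
{
    private static List<string> _blocks = new List<string>();

    // Allocates roughly 20 MB of strings, drops them, forces a collection
    // and returns how many bytes the managed heap shrank by
    public static long MeasureFreedBytes()
    {
        for (int i = 0; i < 10; i++)
        {
            _blocks.Add(new string('_', 1024 * 1024));   // about 2 MB each
        }

        long before = GC.GetTotalMemory(false);

        _blocks = new List<string>();   // release the references
        GC.Collect();                   // reclaim the memory now, rather than at some later point

        long after = GC.GetTotalMemory(false);
        return before - after;
    }

    public static void Main()
    {
        Console.WriteLine("Freed {0} bytes", MeasureFreedBytes());
    }
}
```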

Super speed chaos

If you want very rapid consumption of memory, just alter the consume memory function as follows:

            //for (int i = 1; i <= counter; i++)
            Parallel.For(1, counter + 1, (i) =>
              {
                  Console.WriteLine("Consuming {0} MB", chunk / numCharsPerMB);

                  var block = new string('_', (int)chunk);

                  // List<T>.Add is not thread-safe, so serialise access to the shared list
                  lock (_str)
                  {
                      _str.Add(block);
                  }

                  Console.WriteLine("Memory remaining: {0}", GetRemainingMemory());
              });

Chaos4

Be very careful with this one, though. A slight bug in your code and you’ll need to do a hard reboot of your machine.