Tag Archives: Chaos

Introduction to Azure Chaos Studio

Some time ago, I investigated the concept of chaos engineering. The principle behind chaos engineering is a very simple one: since your software is likely to encounter hostile conditions in the wild, why not introduce those conditions while (and when) you can control them, and deal with the fallout then, instead of at 3am on a Sunday?

At the time, I was trying to deal with an on-site issue where the connection seemed to be randomly dropping. In the end, I solved this by writing something similar to Polly – albeit a much simpler version.

Microsoft have recently released a preview of something called Chaos Studio. It’s very much in its infancy now, but what is there looks very interesting.

The product is essentially divided into two sections: targets and experiments. Targets represent the thing that you intend to wreak chaos upon, and experiments are how that chaos will be wrought.

Scope

For this test, I’m going to use a VM. That’s mainly because what you can do with this product is currently limited to VMs, AKS, and Redis.

Create a VM and Check Availability

The first step is to create a VM. To be honest, it doesn’t matter what the VM is, because all we’ll be doing is switching it off. Start by checking the availability – you should be able to do that in Logs – and you should notice 100% availability, unless something has gone catastrophically wrong with your deployment.

Targets

The next step is to configure our target. In Chaos Studio, select Targets and pick the new VM:

Now that you’ve enabled the target, you’ll need to grant Chaos Studio permission on the VM. Inside the VM blade, select Access Control:

If you don’t grant this access, you’ll get a permissions error when you run the experiment. The next step is to create the experiment. In Chaos Studio, select Experiments and then Create:

This will bring up a screen similar to the following:

Let’s briefly discuss the concepts here: we have step, branch, and fault. A step is a sequential action that you will execute, whilst a branch is a parallel action; that is, actions in different branches can happen at the same time. A fault is what you actually do – so the fault is the chaos! Let’s add a fault:

This asks me two things: what do I want the fault to happen on (you can only select targets that have previously been created), and what do I want the fault to be. In my case, I’ve created a two-step process that turns the machine off, waits a minute, then turns it off again:

Now that the experiment is created, you can start it. You get a warning at this point that basically says “it’s your foot, and you’re currently pointing a high powered rifle at it!”:

It’s worth bearing in mind that there’s no simulation here: if you run this against production infrastructure, it will shut it down for you. If you now run the experiment, you’ll see its status update as it runs:

You can drill down into the details to see exactly what it’s doing, what stage, etc.:

The experiment kills the machine for 1 minute, then waits for a minute, then kills it again. If you have a look at the availability graph, you should be able to see that:
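The step, branch, and fault structure described earlier can be modelled roughly as follows. This is only a sketch to show how the timing works: the type names are hypothetical and are not part of the Chaos Studio API. It also assumes that actions inside a single branch run one after another, and that the experiment is split into "shutdown plus delay" followed by "shutdown" (the post doesn't say exactly how the two steps are divided, or how long the second shutdown lasts).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration only; these are not the Chaos Studio API.
public record ChaosAction(string Name, TimeSpan Duration);

public record Branch(string Name, IReadOnlyList<ChaosAction> Actions)
{
    // Assumption: actions within one branch run one after another.
    public TimeSpan Duration =>
        Actions.Aggregate(TimeSpan.Zero, (total, a) => total + a.Duration);
}

public record Step(string Name, IReadOnlyList<Branch> Branches)
{
    // Branches run in parallel, so a step lasts as long as its longest branch.
    public TimeSpan Duration =>
        Branches.Count == 0 ? TimeSpan.Zero : Branches.Max(b => b.Duration);
}

public static class ExperimentModel
{
    // Steps run sequentially, so the experiment lasts the sum of its steps.
    public static TimeSpan TotalDuration(IEnumerable<Step> steps) =>
        steps.Aggregate(TimeSpan.Zero, (total, s) => total + s.Duration);

    // The experiment from this post, under the assumptions stated above.
    public static List<Step> ShutdownTwice()
    {
        var oneMinute = TimeSpan.FromMinutes(1);
        return new List<Step>
        {
            new Step("Step 1", new List<Branch>
            {
                new Branch("Branch 1", new List<ChaosAction>
                {
                    new ChaosAction("VM shutdown", oneMinute),
                    new ChaosAction("Delay", oneMinute),
                }),
            }),
            new Step("Step 2", new List<Branch>
            {
                new Branch("Branch 1", new List<ChaosAction>
                {
                    new ChaosAction("VM shutdown", oneMinute),
                }),
            }),
        };
    }
}
```

Under those assumptions, ExperimentModel.TotalDuration(ExperimentModel.ShutdownTwice()) comes to three minutes. Adding a second branch to a step would only lengthen the experiment if that branch ran longer than the existing one.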

Summary

So far, I’m pretty impressed with this tool. When they’ve finished (and by that, I mean, they’ve given the ability to create your own chaos, and have expanded the targets to cover the entire Azure ecosystem), it’s going to be a really interesting testing tool.


Chaos Monkey – Part 4 – Creating an Asp.Net 6 Application that Caches an Error

This is a really strange post, but it’s a lead-in to a different post; however, I felt it made sense as a post in its own right – it follows on from a trend I have of creating things that break on purpose. For example, here’s a post from a few years ago where I discussed how you might force a machine to run out of memory.

In this case, I’m creating a simple application that runs fine, but at a random point, it generates an error, which it caches, and then is broken until the application is restarted.

Why?

I’m working on some alerting and resilience experiments at the minute, and having an unstable application is useful for those tests. Also, this is not an unusual scenario – I mean, obviously, writing an application that purposely crashes once it has broken, and keeps crashing from then on, is unusual; but having an application that does this somewhere in your estate may not be so unusual.

How

I’ve set up a bog standard Asp.Net MVC 6 application. I then installed the following package:

Install-Package System.Runtime.Caching

Finally, I changed the default Privacy controller action to potentially crash:

public IActionResult Privacy()
{
    string result = Crash();
    return View(model: result);
}

Here, I’m feeding a string into the privacy view as its model. The Crash method has a 1 in 10 chance of caching an error:

        private string Crash()
        {
            if (!_memoryCache.TryGetValue("Error", out string errorCache))
            {
                if (_random.Next(10) == 1)
                {
                    _memoryCache.Set("Error", "Now broken!");
                    return "Now broken";
                }
            }
            else
            {
                throw new Exception("Some exception");
            }

            return "Working fine";
        }
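The Crash method relies on `_memoryCache` and `_random` fields that aren't shown in the post. To make the behaviour easier to see in isolation, here is a minimal stand-alone sketch of the same logic. It is not the controller code: the CrashSimulator class is hypothetical, a plain Dictionary stands in for the memory cache, and the random roll is injected so that the outcome can be forced.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the controller logic: a Dictionary plays the role
// of the memory cache, and the random roll is supplied by the caller.
public class CrashSimulator
{
    private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();
    private readonly Func<int> _roll;

    public CrashSimulator(Func<int> roll) => _roll = roll;

    public string Crash()
    {
        if (_cache.ContainsKey("Error"))
        {
            // Once the error has been cached, every later call fails.
            throw new Exception("Some exception");
        }

        if (_roll() == 1)
        {
            // This call still returns normally; the next one will throw.
            _cache["Error"] = "Now broken!";
            return "Now broken";
        }

        return "Working fine";
    }
}
```

To mimic the post, you could pass in a roll based on a shared Random instance, for example `var random = new Random(); var simulator = new CrashSimulator(() => random.Next(10));`. Note that the call which caches the error still succeeds; it is the following call that throws.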

I then just display the model in the view (privacy.cshtml):

@model string
@{
    ViewData["Title"] = "Privacy Policy";
}
<h1>@ViewData["Title"]</h1>
<h1>@Model</h1>

<p>Use this page to detail your site's privacy policy.</p>

Now, if you run it and keep refreshing the Privacy page, you’re likely to see it break somewhere between the 2nd and 15th request, after which you’ll need to restart the application to fix it.
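As a rough check on that estimate, assume each request independently has a 1 in 10 chance of caching the error (which is what `_random.Next(10) == 1` gives). The number of requests N needed before the error is cached then follows a geometric distribution:

```latex
P(N \le n) = 1 - 0.9^{\,n}, \qquad \mathbb{E}[N] = \frac{1}{0.1} = 10, \qquad P(N \le 15) = 1 - 0.9^{15} \approx 0.79
```

So, on average, the error is cached on about the 10th request and the exception appears on the request after that. Roughly one run in five will need more than 15 requests before it breaks.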

Chaos Monkey – Part 3 – Consuming Memory

Continuing from previous posts on programs that generally do your machine no good at all, I thought it might be an idea to have a look at what I could do to the available memory. The use case here being that you want to see how your application can function when in competition with either one high-memory process, or many smaller ones.

To accomplish this, we’re going to create a list of strings – since strings are notoriously bad for memory anyway. The first thing to note here is that a single character takes up 16 bits, which is 2 bytes.

The second thing is how to check the system’s available memory:

        private static System.Diagnostics.PerformanceCounter ramCounter =
            new System.Diagnostics.PerformanceCounter("Memory", "Available MBytes");

        private static long GetRemainingMemory()
        {
            return ramCounter.RawValue;
        }

Finally, you need to be aware that you can only use up all the memory in your machine (assuming you have more than 2GB) if you run the app in x64 mode. If you have less, then you probably don’t need this article to simulate what low memory feels like.

There is a pretty big caveat to doing this: once you actually run out of memory, it takes a good few minutes for the system to catch up, even when you terminate the process. Consequently, the code that I use allows you to specify a “remaining memory”; here’s the main function:

        static void Main(string[] args)
        {
            long remainingMemory = int.Parse(args[0]);

            // Determine how much memory there is
            long memoryLeft = GetRemainingMemory();
            Console.WriteLine("Consuming memory until {0} is left", remainingMemory);

            // Calculate how much memory to use
            long removeMemory = memoryLeft - remainingMemory;

            // Call the function to consume the memory
            Console.WriteLine("Consuming {0} memory", removeMemory);
            ConsumeMemory(removeMemory, 1000);

            // Free the memory
            Console.WriteLine("Press any key to free memory");
            Console.ReadLine();
            FreeMemory();

            Console.ReadLine();
        }

As you can see, it first determines what we have to play with, and then calls a function to consume it. The second parameter to ConsumeMemory allows you to specify the speed at which it consumes memory, that is, the number of MB allocated per iteration. If you set this to 1 then the usage will be slow; however, if you set it higher than the amount of memory you want left over, it may use too much. Also, it doesn’t seem to improve speed much after that anyway.

The ConsumeMemory() function looks like this:

        static void ConsumeMemory(long memoryToConsumeMB, int consumePerItt)
        {            
            long bitsPerMB = 1024 * 1024 * 8;
            // Single char 2 bytes (16 bits)
            long numCharsPerMB = (bitsPerMB / 16);
            long numChars = numCharsPerMB * memoryToConsumeMB;
            long counter = 1, chunk = 0;

            if (memoryToConsumeMB > GetRemainingMemory())
            {
                Console.WriteLine("Cannot consume {0} because there is only {1} left",
                    memoryToConsumeMB, GetRemainingMemory());
                // Bail out rather than attempting an allocation that cannot succeed
                return;
            }

            counter = memoryToConsumeMB / consumePerItt;
            chunk = numCharsPerMB * consumePerItt;

            Console.WriteLine("Consuming {0} memory", memoryToConsumeMB);

            for (int i = 1; i <= counter; i++)
            {
                Console.WriteLine("Consuming {0} MB", chunk / numCharsPerMB);

                _str.Add(new string('_', (int)chunk));

                Console.WriteLine("Memory remaining: {0}", GetRemainingMemory());
            }
        }

So, we basically work out how much memory we’re using each iteration and just add to a list of strings each time. Here’s what it looks like when you run it as above:

Chaos1
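The arithmetic inside ConsumeMemory can be pulled out into a small helper to see what a given pair of arguments will actually allocate. This is a sketch only: ConsumptionPlan and Plan are hypothetical names, and the calculation ignores the small per-string object overhead.

```csharp
using System;

public static class ConsumptionPlan
{
    // A .NET char is 2 bytes, so 1 MB holds 1024 * 1024 / 2 = 524,288 chars.
    public const long CharsPerMB = 1024 * 1024 / 2;

    // Mirrors the arithmetic in ConsumeMemory. Because of the integer
    // division, any remainder smaller than one chunk is never allocated.
    public static (long Iterations, long CharsPerChunk, long LeftoverMB) Plan(
        long memoryToConsumeMB, int consumePerIterationMB)
    {
        long iterations = memoryToConsumeMB / consumePerIterationMB;
        long charsPerChunk = CharsPerMB * consumePerIterationMB;
        long leftoverMB = memoryToConsumeMB % consumePerIterationMB;
        return (iterations, charsPerChunk, leftoverMB);
    }
}
```

For example, ConsumptionPlan.Plan(2500, 1000) gives 2 iterations of 524,288,000 characters each, with 500 MB left unallocated. A request for less than 1000 MB with a chunk size of 1000 results in zero iterations, so nothing is consumed at all.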

The FreeMemory() function just releases the list and calls the GC:

        private static void FreeMemory()
        {
            _str = null;
            GC.Collect();
            ShowMemory();
        }

As you can see, it ramps up pretty quick. In this case I’m leaving 2GB.

Super Chaos Monkey Mode

Let’s try putting this in a loop and take out the prompts:

        static void Main(string[] args)
        {
            long remainingMemory = int.Parse(args[0]);

            while (true)
            {
                // Determine how much memory there is
                long memoryLeft = GetRemainingMemory();
                Console.WriteLine("Consuming memory until {0} is left", remainingMemory);

                // Calculate how much memory to use
                long removeMemory = memoryLeft - remainingMemory;

                // Call the function to consume the memory
                Console.WriteLine("Consuming {0} memory", removeMemory);
                ConsumeMemory(removeMemory, 1000);

                // Free the memory
                Console.WriteLine("Press any key to free memory");
                //Console.ReadLine();
                FreeMemory();
            }

            //Console.ReadLine();
        }

        private static void FreeMemory()
        {
            _str = new List<string>();
            GC.Collect();
            ShowMemory();
        }

Chaos2

A note on the GC

Okay – there are very few cases where GC.Collect() should be called, but I believe this to be one of them. The reason is that not calling it explicitly results in the following:

Chaos3

Basically, by the time the garbage collection kicks in, you’re already allocating more memory, which affects the ebb and flow.

Super speed chaos

If you want very rapid consumption of memory, just alter the consume memory function as follows:

            //for (int i = 1; i <= counter; i++)
            Parallel.For(1, counter + 1, (i) =>
              {
                  Console.WriteLine("Consuming {0} MB", chunk / numCharsPerMB);

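                  // List<T> is not thread-safe, so allocate outside the lock
                  // and serialise only the Add call
                  var block = new string('_', (int)chunk);
                  lock (_str)
                  {
                      _str.Add(block);
                  }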

                  Console.WriteLine("Memory remaining: {0}", GetRemainingMemory());
              });

Chaos4

Be very careful with this one, though. A slight bug in your code and you’ll need to do a hard reboot of your machine.

Chaos Monkey – Part 2 – Programmatically Resetting IIS at Scheduled Intervals

In this previous post I gave an example of a DOS batch script that simulated an unstable network. This is an alternative to that in .NET, which uses the `System.ServiceProcess` namespace.

Let’s start with the main function:

        private static async Task MainLoop()
        {
            while (true)
            {
                Console.WriteLine("Stopping IIS");
                StopService("World Wide Web Publishing Service", 10000);

                await Task.Delay(3000);

                Console.WriteLine("Starting IIS");
                StartService("World Wide Web Publishing Service", 10000);

                await Task.Delay(5000);
            }
        }

This defines the flow of the code: essentially, it just stops the IIS service, waits, starts it again… and waits. The display name of the IIS service is “World Wide Web Publishing Service” (its short service name is W3SVC) – at least for Windows 7 & 8 it is; ServiceController accepts either. The start and stop functions look like this:

        public static void StartService(string serviceName, int timeoutMilliseconds)
        {
            ServiceController service = new ServiceController(serviceName);

            TimeSpan timeout = TimeSpan.FromMilliseconds(timeoutMilliseconds);

            service.Start();
            service.WaitForStatus(ServiceControllerStatus.Running, timeout);
        }

        public static void StopService(string serviceName, int timeoutMilliseconds)
        {
            ServiceController service = new ServiceController(serviceName);

            // Only stop if it's started
            if (service.Status != ServiceControllerStatus.Running) return;

            TimeSpan timeout = TimeSpan.FromMilliseconds(timeoutMilliseconds);

            service.Stop();
            service.WaitForStatus(ServiceControllerStatus.Stopped, timeout);
        }

Obviously these could be used to stop and start any service, although you must be running as admin to affect admin services (such as IIS).
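One thing the loop above doesn't handle is being interrupted: if you kill the program between the stop and the start, IIS is left switched off. The following is a minimal sketch of one way around that, not the code from this post. ChaosLoop and RunAsync are hypothetical names, and the stop and start actions are passed in as delegates (in practice they would wrap StopService and StartService) so that the flow can be tried out without touching a real service.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ChaosLoop
{
    // stop/start are injected so the loop can be exercised without a real
    // service; in the post they would wrap StopService and StartService.
    public static async Task RunAsync(
        Action stop, Action start,
        TimeSpan downTime, TimeSpan upTime,
        CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            stop();

            try
            {
                await Task.Delay(downTime, token);
            }
            catch (OperationCanceledException)
            {
                // Fall through so the service is restarted before exiting.
            }

            start();

            try
            {
                await Task.Delay(upTime, token);
            }
            catch (OperationCanceledException)
            {
                // Cancelled while the service is up: safe to leave the loop.
                return;
            }
        }
    }
}
```

You could then cancel the token from Console.CancelKeyPress (setting e.Cancel = true so that the process isn't killed immediately), which means that pressing Ctrl+C always leaves the service running. Every call to stop is followed by a call to start before the method returns.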

To test this, check you have a “default.htm” in your wwwroot and then navigate to localhost in a web browser. Run this app in the background and press F5 on your browser until you get an error.