Category Archives: Azure Service Fabric

Working with Multiple Cloud Providers – Part 3 – Linking Azure and GCP

This is the third and final post in a short series on linking up Azure with GCP (for Christmas). In the first post, I set up a basic Azure function that updated some data in table storage, and in the second post, I configured the GCP link from PubSub into BigQuery.

In this post, we’ll square this off by adapting the Azure function to post a message directly to PubSub; then we’ll call the Azure function with Santa’s data and watch it appear in BigQuery. At least, that was my plan – but Microsoft had other ideas.

It turns out that Azure Functions have a dependency on Newtonsoft.Json 9.0.1, and the GCP client libraries require 10+. So instead of being a ten-minute job on Boxing Day to link the two, it turned into a mammoth task. Obviously, I spent the first few hours searching for a way around this – surely other people have faced this, and there’s a redirect, setting, or way of banging the keyboard that makes it work? It turns out not.

The next idea was to experiment with contacting the Google server directly, as described here. Unfortunately, you still need the auth libraries.

Finally, I swapped out the function for a WebJob. WebJobs give you a little more flexibility and have no hard dependencies. So, on with the show (albeit a little more involved than expected).

WebJob

In this post I described how to create a basic WebJob. Here, we’re going to do something similar: we’ll listen for an Azure Service Bus message, update the Azure Storage table (as described in the previous post), and call out to GCP to publish a message to PubSub.

Handling a Service Bus Message

We weren’t originally going to take this approach, but I found that WebJobs play much nicer with a Service Bus message than with trying to get them to fire on a specific endpoint. In terms of scalability, adding a queue in the middle can only be a good thing. We’ll square off the contactable endpoint at the end with a function that simply converts a call to the endpoint into a message on the queue. Here’s what the WebJob Program looks like:

public static void ProcessQueueMessage(
    [ServiceBusTrigger("localsantaqueue")] string message,
    TextWriter log,
    [Table("Delivery")] ICollector<TableItem> outputTable)
{
    log.WriteLine(message);

    // Deserialise the Service Bus message into our model
    TableItem item = Newtonsoft.Json.JsonConvert.DeserializeObject<TableItem>(message);

    // Default the table keys if the caller didn't supply them:
    // partition on the first letter of the name, key on the full name
    if (string.IsNullOrWhiteSpace(item.PartitionKey)) item.PartitionKey = item.childName.First().ToString();
    if (string.IsNullOrWhiteSpace(item.RowKey)) item.RowKey = item.childName;

    // Update Azure Table Storage...
    outputTable.Add(item);

    // ...and publish the same item to GCP PubSub
    GCPHelper.AddMessageToPubSub(item).GetAwaiter().GetResult();

    log.WriteLine("DeliveryComplete Finished");
}
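
For completeness, the WebJob host needs the Service Bus extension wiring up before this trigger will fire. A minimal Main might look something like this (a sketch, assuming the Microsoft.Azure.WebJobs.ServiceBus package is installed and the Service Bus connection string is configured):

static void Main()
{
    var config = new JobHostConfiguration();

    // Without this call, the WebJobs SDK won't bind [ServiceBusTrigger] at all
    config.UseServiceBus();

    // Run continuously, picking messages off localsantaqueue as they arrive
    var host = new JobHost(config);
    host.RunAndBlock();
}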

Effectively, ProcessQueueMessage contains the same logic as the function (obviously, we now have the GCPHelper, and we’ll come to that in a minute). First, here’s the code for the TableItem model:

[JsonObject(MemberSerialization.OptIn)]
public class TableItem : TableEntity
{
    [JsonProperty]
    public string childName { get; set; }
 
    [JsonProperty]
    public string present { get; set; }
}

As you can see, we need to decorate the members with specific serialisation instructions. The reason is that this model is used by both GCP (which only needs the two properties you see here) and Azure (which also needs the properties inherited from TableEntity).
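
To illustrate, serialising an instance produces just the two opted-in members (names here are made up):

// With MemberSerialization.OptIn, only members marked [JsonProperty] are serialised
string json = Newtonsoft.Json.JsonConvert.SerializeObject(
    new TableItem { childName = "Alice", present = "Bike" });

// json is {"childName":"Alice","present":"Bike"} – the inherited TableEntity
// members (PartitionKey, RowKey, Timestamp, ETag) stay out of the PubSub payload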

GCPHelper

As described here, you’ll need to install the GCP client package – this time into the WebJob project (the dependency clash described above is exactly why it couldn’t go into the Azure Function App from post one of this series):

Install-Package Google.Cloud.PubSub.V1 -Pre

Here’s the helper code that I mentioned:

public static class GCPHelper
{
    public static async Task AddMessageToPubSub(TableItem toSend)
    {
        string jsonMsg = Newtonsoft.Json.JsonConvert.SerializeObject(toSend);

        // Point the GCP client libraries at the credentials file (see below)
        Environment.SetEnvironmentVariable(
            "GOOGLE_APPLICATION_CREDENTIALS",
            Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Test-Project-8d8d83hs4hd.json"));
        GrpcEnvironment.SetLogger(new ConsoleLogger());

        string projectId = "test-project-123456";
        TopicName topicName = new TopicName(projectId, "test");

        // Publish the message, then shut the publisher down cleanly
        SimplePublisher simplePublisher =
            await SimplePublisher.CreateAsync(topicName);
        string messageId =
            await simplePublisher.PublishAsync(jsonMsg);
        await simplePublisher.ShutdownAsync(TimeSpan.FromSeconds(15));
    }
}

I detailed in this post how to create a credentials file; you’ll need to do that to allow the WebJob to be authorised. The JSON file referenced above was created using that process.

Azure Config

You’ll need to create an Azure Service Bus queue (I’ve called mine localsantaqueue):

I would also download the Service Bus Explorer (I’ll be using it later for testing).
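
If you’d rather test from code than from Service Bus Explorer, something like this would push a message onto the queue (a sketch only – it assumes the Microsoft.Azure.ServiceBus package, a real connection string, and an async test method to run it in):

// The WebJob deserialises the body as TableItem, so send matching JSON
var client = new QueueClient("<service-bus-connection-string>", "localsantaqueue");
string body = Newtonsoft.Json.JsonConvert.SerializeObject(
    new { childName = "Alice", present = "Bike" });

await client.SendAsync(new Message(Encoding.UTF8.GetBytes(body)));
await client.CloseAsync();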

GCP Config

We already have a Dataflow, a PubSub topic and a BigQuery table, so GCP should require no further configuration, except to ensure the permissions are correct.

The Service Account user (which I give more details of here) needs to have PubSub permissions. For now, we’ll make them an editor although, in this instance, they probably only need publish:

Test

We can do a quick test by using the Service Bus Explorer to publish a message to the queue:

The ultimate test is that we can then see this in the BigQuery Table:
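
If you prefer checking from code to squinting at the console, a quick query via the Google.Cloud.BigQuery.V2 package would do it (the dataset and table names here are placeholders – use whatever you created in part two):

var client = BigQueryClient.Create("test-project-123456");

// Pull back the delivered items; no query parameters needed here
var results = client.ExecuteQuery(
    "SELECT childName, present FROM `test-project-123456.santa.delivery`",
    parameters: null);

foreach (var row in results)
{
    Console.WriteLine($"{row["childName"]}: {row["present"]}");
}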

Lastly, the Function

This won’t be a completely function-free post. The last step is to create a function that adds a message to the queue:

[FunctionName("Function1")]
public static HttpResponseMessage Run(
    [HttpTrigger(AuthorizationLevel.Function, "post")]HttpRequestMessage req,             
    TraceWriter log,
    [ServiceBus("localsantaqueue")] ICollector<string> queue)
{
    log.Info("C# HTTP trigger function processed a request.");
    var parameters = req.GetQueryNameValuePairs();
    string childName = parameters.First(a => a.Key == "childName").Value;
    string present = parameters.First(a => a.Key == "present").Value;
    string json = "{{ 'childName': '{childName}', 'present': '{present}' }} ";            
    queue.Add(json);
    

    return req.CreateResponse(HttpStatusCode.OK);
}

So now we have an endpoint for our imaginary Xamarin app to call into.
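
Just to illustrate the call (from an async test harness, say – the host name and function key below are placeholders):

using (var client = new HttpClient())
{
    string url = "https://<your-function-app>.azurewebsites.net/api/Function1" +
                 "?code=<function-key>&childName=Alice&present=Bike";

    // The function only reads query parameters, so an empty body is fine
    HttpResponseMessage response = await client.PostAsync(url, null);
    Console.WriteLine(response.StatusCode); // expect OK
}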

Summary

Both GCP and Azure are relatively immature platforms for this kind of interaction. The GCP client libraries seem to be missing functionality (and GCP is still heavily weighted away from .Net). The Azure libraries (especially Functions) seem to be in a pickle, too – with strange dependencies that make it very difficult to communicate outside of Azure. As a result, this task (which should have taken an hour or so) took a great deal of time, and most of that extra effort was completely unnecessary.

Having said that, it is clearly possible to link the two systems, if a little long-winded.

References

https://blog.falafel.com/rest-google-cloud-pubsub-with-oauth/

https://github.com/Azure/azure-functions-vs-build-sdk/issues/107

https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-service-bus

https://stackoverflow.com/questions/48092003/adding-to-a-queue-using-an-azure-function-in-c-sharp/48092276#48092276

Working with Multiple Cloud Providers – Part 2 – Getting Data Into BigQuery

In this post, I described how we might attempt to help Santa and his delivery drivers to deliver presents to every child in the world, using the combined power of Google and Microsoft.

In this, the second part of the series (there will be one more), I’m going to describe how we might set up a GCP pipeline that feeds that data into BigQuery (Google’s big data warehouse offering). We’ll first set up BigQuery, then the PubSub topic and, finally, the Dataflow job, ready for Part 3, which will join the two systems together.

BigQuery

Once you navigate to the BigQuery section of the GCP console, you’ll be able to create a Dataset:

You can now set up a new table. As this is an illustration, we’ll keep it as simple as possible, but you can see that this might be much more complex:

One thing to bear in mind about BigQuery, and cloud data storage in general, is that it often makes sense to de-normalise your data – storage is usually much cheaper than CPU time.

PubSub

Now that we have somewhere to put the data, we could simply have the Azure function write straight into BigQuery. However, we might then run into problems if the data flow suddenly spiked. For this reason, Google recommends the use of PubSub as a shock absorber.

Let’s create a PubSub topic. I’ve written in more detail on this here:

DataFlow

The last piece of the jigsaw is Dataflow. Dataflow can be used for much more complex tasks than simply taking data from one place and putting it in another, but in this case, that’s all we need. Before we can set up a new Dataflow job, we’ll need to create a storage bucket:

We’ll create the bucket as Regional for now:

Remember that the bucket name must be globally unique (so no-one can ever pick pcm-data-flow-bucket again!)

Now, we’ll move on to the Dataflow itself. We get a number of Dataflow templates out of the box, and we’ll use one of those. Let’s launch Dataflow from the console:

Here we create a new Dataflow job:

We’ll pick “PubSub to BigQuery”:

You’ll then get asked for the name of the topic (created earlier) and the storage bucket (again, created earlier); your form should look broadly like this when you’re done:

I strongly recommend specifying a maximum number of workers, at least while you’re testing.

Testing

Finally, we’ll test it. PubSub allows you to publish a message:

Next, visit the Dataflow to see what’s happening:

Looks interesting! Finally, in BigQuery, we can see the data:

Summary

We now have the two separate cloud systems functioning independently. Step three will be to join them together.

Working with Multiple Cloud Providers – Part 1 – Azure Function

Regular readers (if there are such things for this blog) may have noticed that I’ve recently been writing a lot about two main cloud providers. I won’t link to all the articles, but if you’re interested, a quick search for either Azure or Google Cloud Platform will yield several results.

Since it’s Christmas, I thought I’d do something a bit different and try to combine them. This isn’t completely frivolous; both have advantages and disadvantages: GCP is very geared towards big data, whereas the Azure Service Fabric provides a lot of functionality that might fit well with a much smaller LOB app.

So, what if we had the following scenario:

Santa has to deliver presents to every child in the world in one night. Santa is only one man*, and Google tells me there are 1.9B children in the world, so he contracts out to a series of delivery drivers. There need to be around 79M deliveries every hour, assuming that each delivery driver can work 24 hours**. If each driver can make, say, 100 deliveries per hour, that means we need around 790,000 drivers. Every delivery driver has an app that links to their depot, recording deliveries, schedules, etc.
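
(That is: 1.9B children ÷ 24 hours ≈ 79M deliveries per hour, and 79M ÷ 100 deliveries per driver per hour ≈ 790,000 drivers.)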

That would be a good app to write in, say, Xamarin, and maybe have an Azure service running it; here’s the obligatory box diagram:

The service might talk to the service bus, might control stock, send e-mails – all kinds of LOB jobs. Now, I’m not saying for a second that Azure can’t cope with this, but what if we suddenly want all of these instances to feed metrics into a single data store? There are 190*** countries in the world; if each has a depot, then there are ~416K messages / hour going into each Azure service, but 79M / hour going into a single DB. Because it’s Christmas, let’s assume that Azure can’t cope with this; or let’s say that GCP is a little cheaper at this scale; or that we have some Hadoop jobs that we’d like to run on the data. In theory, we can link these systems, which might look something like this:

So, we have multiple instances of the Azure architecture, and they all feed into a single GCP service.

Disclaimer

At no point during this post will I attempt to publish 79M records / hour to GCP BigQuery. Neither will any Xamarin code be written or demonstrated – you have to use your imagination for that bit.

Proof of Concept

Given the disclaimer I’ve just made, calling this a proof of concept seems a little disingenuous; but let’s imagine that we know that the volumes aren’t a problem and concentrate on how to link these together.

Azure Service

Let’s start with the Azure Service. We’ll create an Azure function that accepts a HTTP message, updates a DB and then posts a message to Google PubSub.

Storage

For the purpose of this post, let’s store our individual instance data in Azure Table Storage. I might come back at a later date and work out how and whether it would make sense to use CosmosDB instead.

We’ll set-up a new table called Delivery:

Azure Function

Now that we have somewhere to store the data, let’s create an Azure Function App that updates it. In this example, we’ll create a new Function App from VS:

In order to test this locally, change local.settings.json to point to your storage location described above.
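
For reference, mine looks broadly like this (account name and key are placeholders; the santa_azure_table_storage key matches the Connection name used in the binding below):

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "DefaultEndpointsProtocol=https;AccountName=<name>;AccountKey=<key>",
    "santa_azure_table_storage": "DefaultEndpointsProtocol=https;AccountName=<name>;AccountKey=<key>"
  }
}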

And here’s the code to update the table:

    public static class DeliveryComplete
    {
        [FunctionName("DeliveryComplete")]
        public static HttpResponseMessage Run(
            [HttpTrigger(AuthorizationLevel.Function, "post", Route = null)]HttpRequestMessage req,
            TraceWriter log,
            [Table("Delivery", Connection = "santa_azure_table_storage")] ICollector<TableItem> outputTable)
        {
            log.Info("C# HTTP trigger function processed a request.");

            // parse query parameters
            string childName = req.GetQueryNameValuePairs()
                .FirstOrDefault(q => string.Compare(q.Key, "childName", true) == 0)
                .Value;

            string present = req.GetQueryNameValuePairs()
                .FirstOrDefault(q => string.Compare(q.Key, "present", true) == 0)
                .Value;

            // Guard against missing parameters – childName.First() below would throw otherwise
            if (string.IsNullOrWhiteSpace(childName) || string.IsNullOrWhiteSpace(present))
                return req.CreateResponse(HttpStatusCode.BadRequest);

            // Partition on the first letter of the name; key on the full name
            var item = new TableItem()
            {
                childName = childName,
                present = present,
                RowKey = childName,
                PartitionKey = childName.First().ToString()
            };

            outputTable.Add(item);

            return req.CreateResponse(HttpStatusCode.OK);
        }

        public class TableItem : TableEntity
        {
            public string childName { get; set; }
            public string present { get; set; }
        }
    }

Testing

There are two ways to test this. The first is to just press F5; that will launch the function as a local service, and you can use Postman or similar to test it. The alternative is to deploy to the cloud. If you choose the latter, your local.settings.json will not come with you, so you’ll need to add an app setting (named santa_azure_table_storage, matching the Connection value on the Table binding above):

Remember to save this setting; otherwise, you’ll get an error saying that it can’t find your setting, and you won’t be able to work out why – ask me how I know!

Now, if you run a test …
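
For example, a POST to something like http://localhost:7071/api/DeliveryComplete?childName=Alice&present=Bike should do it (7071 being the default port for the local Functions host).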

You should be able to see your table updated (shown here using Storage Explorer):

Summary

We now have a working Azure function that updates a storage table with some basic information. In the next post, we’ll create a GCP service that pipes all this information into BigQuery, and then link the two systems.

Footnotes

* Remember, all the guys in Santa suits are just helpers.
** That brandy you leave out really hits the spot!
*** I just Googled this – it seems a bit low to me, too.

References

https://docs.microsoft.com/en-us/azure/azure-functions/functions-how-to-use-azure-function-app-settings#manage-app-service-settings

https://anthonychu.ca/post/azure-functions-update-delete-table-storage/

https://stackoverflow.com/questions/44961482/how-to-specify-output-bindings-of-azure-function-from-visual-studio-2017-preview

Function Apps in Azure

With Update 15.3.1 for Visual Studio came the ability to create Function Apps in VS. Previously, you were restricted to writing the code in the browser, directly on Azure*.

Set-up

The first step is to download and install, or use the Visual Studio Installer to update to, the latest version of VS (at the time of writing, this was 15.3.3 – but, as stated above, it’s 15.3.1 that has the Function App update).

Once this is done, you need to launch the Visual Studio Installer again.

Select the Azure Workload (if you haven’t already):

The Microsoft article, referenced at the bottom of this post, addresses the issue of what happens if this doesn’t work on its own; it says:

If for some reason the tools don’t get automatically updated from the gallery…

I’ve now done this twice, on two separate machines, and on both occasions the tools have not automatically been updated from the gallery (it also sounds like the author of the article doesn’t really know why this is the case). Assuming that you will suffer the same fate, you should update the Azure gallery extension (if you don’t have to do that, then please leave a comment – I’m interested to know if it ever works):

Close everything (including the installer) and this appears:

Finally, we see the new app type:

Function Apps

Once you create a new function app, you get an empty project:

To add a new function, you can right click on the solution (as you would for a new class file) and select new function:

New Function

You then, helpfully, get asked what kind of function you would like:

Function Type

Let’s select Generic WebHook:

Generic Web Hook

We now have some template code, so let’s try and run it:

Running it gives this neat little screen that wouldn’t have looked out of place on my BBS in 1995**:

The bottom line gives an address, so we can just type that into a browser:

As you can see, we do get a “WebHook Triggered” message… but things kind of go downhill from there!

There are a couple of reasons for this: the WebHook only deals with a POST and, as per the default code, it needs some JSON in the body; let’s use Postman to create a new request:

This looks much better, and the console tells us that we’re firing:

Publish the App

Okay – so the function works locally, which is impressive (debugging on Azure wasn’t the easiest of things). Now we want to push it to the cloud.

This goes away for a while, compiles the app and then deploys it for us:

Your function app should now be in Azure:

Now you’ll need to find its URL. As already detailed in this article, you get the function URL from here:

If we switch Postman over to the cloud, we get the same result***:

Footnotes

* Actually, this is probably untrue. It was probably possible to write them in VS and publish them. There were a few add-ons knocking about in the VS gallery that claimed to allow just that.

** It was called The Twilight Zone BBS; if I’m being honest, although the ANSI art on it was impressive, it wasn’t my artwork.

*** Locally, it wasn’t that fussed about the body format (it could be text), but once it was in the cloud, it insisted on JSON.

References

https://blogs.msdn.microsoft.com/webdev/2017/05/10/azure-function-tools-for-visual-studio-2017/

http://pmichaels.net/2017/07/16/azure-functions/

Creating a Basic Azure Web Job

In this article, I discussed the use of Azure functions; however, Web Jobs perform a similar task. Azure Functions are effectively an abstraction on top of Web Jobs – meaning that, while you have more control when using Web Jobs, there’s a little more to do when writing them.

This article covers the basics of Web Jobs, and has a walk-through for creating a very simple task using one.

Create a new Web Job

Once you create this project, you’ll need to fill in the following values in the app.config:

<configuration>
  <connectionStrings>
    <!-- The format of the connection string is "DefaultEndpointsProtocol=https;AccountName=NAME;AccountKey=KEY" -->
    <!-- For local execution, the value can be set either in this config file or through environment variables -->
    <add name="AzureWebJobsDashboard" connectionString="" />
    <add name="AzureWebJobsStorage" connectionString="" />
  </connectionStrings>
</configuration>

These can both be the same value, but they refer to where Azure stores its data.

AzureWebJobsDashboard

This is the storage account used to store logs.

AzureWebJobsStorage

This is the storage account used to store whatever the application needs to function (for example: queues or tables). In the example below, it’s where the file will go.

Storage accounts can be set up from the Azure dashboard (more on this later):

A Basic Application

For this example, let’s take a file from blob storage and parse it, then write the result to a log. Specifically, we’ll take an XML file, and write the number of nodes it contains into a log; here’s the file:

<test>
    <myNode>
    </myNode>
    <myNode>
    </myNode>
</test>

I think we’ll probably be looking for a figure around 2.

Blob Storage

Before we can do anything with blob storage, we’ll need a new storage area; create a new storage account:

Set the storage kind to “General purpose” (because we’re working with files); other than that, go with your gut.

Uploading

Once you’ve created the account, you’ll need to add a file – otherwise nothing will happen. You can do this in the web portal, or you can do it via a desktop utility that Microsoft provide: Storage Explorer.

I kind of expected this to take me to the web page mentioned… but it doesn’t! You have to navigate there manually:

http://storageexplorer.com

Install it… unless you want to upload your file using the web portal… in which case: don’t.

We can create a new container:

Now, we can see the storage account and any containers:

Now, you can upload a file from here (remember that you can do all this inside the Portal):

Once you’ve created this, go back and update the storage connection string (described above). You may also want to repeat the process for a dashboard storage area (or, as stated above, they can be the same).

Programmatically Downloading

Now we have a file in the directory, it can be downloaded via the WebJob; here’s a function that will download a file:

        public static async Task<string> GetFileContents(string connectionString, string containerString, string fileName)
        {
            // Connect to the storage account and get a reference to the blob
            CloudStorageAccount storage = CloudStorageAccount.Parse(connectionString);
            CloudBlobClient client = storage.CreateCloudBlobClient();
            CloudBlobContainer container = client.GetContainerReference(containerString);
            CloudBlob blob = container.GetBlobReference(fileName);

            // Download the blob into memory and read it back as a string
            using (MemoryStream ms = new MemoryStream())
            {
                await blob.DownloadToStreamAsync(ms);
                ms.Position = 0;

                using (StreamReader sr = new StreamReader(ms))
                {
                    return sr.ReadToEnd();
                }
            }
        }

The code to call this is here (note the commented out commands from the default WebJob Template):

        static void Main()
        {
            Console.WriteLine("Starting");

            var config = new JobHostConfiguration();

            if (config.IsDevelopment)
            {
                config.UseDevelopmentSettings();
            }

            //var host = new JobHost();

            string fileContents = AzureHelpers.GetFileContents(config.StorageConnectionString, "testblob", "test.xml").Result;
            Console.WriteLine(fileContents);

            // The following code ensures that the WebJob will be running continuously
            //host.RunAndBlock();

            Console.WriteLine("Done");
        }

Although this works (sort of – it doesn’t check for new files, and it would need to be run on a scheduled basis – “On Demand” in Azure terms), you don’t need it (at least not for jobs that react to files being uploaded to storage containers). WebJobs provide this functionality out of the box! There are a number of types that the trigger parameter can bind to, depending on what you need:

  • string
  • TextReader
  • Stream
  • ICloudBlob
  • CloudBlockBlob
  • CloudPageBlob
  • CloudBlobContainer
  • CloudBlobDirectory
  • IEnumerable<CloudBlockBlob>
  • IEnumerable<CloudPageBlob>

Here, we’ll use a BlobTrigger and accept a string. Moreover, doing it this way makes writing to the log much easier, as there’s injection of sorts (at least, I’m assuming that’s what it’s doing). Here’s what the complete solution looks like in the new paradigm:

        // Fires whenever a blob lands in the "testblob" container; the WebJobs
        // SDK hands us the blob contents directly as a string
        public static void ProcessFile([BlobTrigger("testblob/{name}")] string fileContents, TextWriter log)
        {
            XmlDocument xmlDoc = new XmlDocument();
            xmlDoc.LoadXml(fileContents);
            log.WriteLine($"Node count: {xmlDoc.FirstChild.ChildNodes.Count}");
        }

The key thing to notice here is that the function is static and public (the class it’s in needs to be public, too – even if that’s the Program class). The WebJob framework uses reflection to work out which functions it needs to run.

The other point to note is that I’m getting the parameter as a string; the list above details the other types you could bind to – for example, if you wanted to delete the blob afterwards, you’d probably want to use an ICloudBlob or something similar.
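
As a sketch of that last point (my assumption, not part of the original walkthrough – but ICloudBlob is in the list of bindable types above):

// Bind to ICloudBlob instead of string, so the blob can be deleted once processed
public static async Task ProcessAndDeleteFile(
    [BlobTrigger("testblob/{name}")] ICloudBlob blob, TextWriter log)
{
    using (var ms = new MemoryStream())
    {
        await blob.DownloadToStreamAsync(ms);
        ms.Position = 0;

        XmlDocument xmlDoc = new XmlDocument();
        xmlDoc.Load(ms);
        log.WriteLine($"Node count: {xmlDoc.FirstChild.ChildNodes.Count}");
    }

    // Delete the blob now that it's been processed
    await blob.DeleteAsync();
}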

Anyway, it works:

The log file

Remember the storage area that we specified for the dashboard earlier? You should now see some new containers created in that storage area:

This has created a number of directories, but the one that we’re interested in is “output-logs” in the “azure-webjobs-hosts” container:

And here’s the log itself:

References

https://docs.microsoft.com/en-us/azure/app-service-web/web-sites-create-web-jobs

https://stackoverflow.com/questions/36610952/azure-webjobs-vs-azure-functions-how-to-choose

https://stackoverflow.com/questions/27580264/where-do-i-get-the-azurewebjobsdashboard-connection-string-information

http://www.hanselman.com/blog/IntroducingWindowsAzureWebJobs.aspx

https://stackoverflow.com/questions/24286214/where-are-azure-webjobs-blobinput-and-bloboutput-classes

https://docs.microsoft.com/en-us/azure/app-service-web/websites-dotnet-webjobs-sdk-storage-blobs-how-to