Azure Durable Functions: performance tips
I’ve been using Azure Durable Functions for a long time to build a high-load ETL process, and I have to say it’s a very useful library that lets you write stateful functions in a serverless compute environment. It’s also worth mentioning that the programming model lets you write code the same way you write regular asynchronous code with the `async` and `await` keywords. At first glance it might seem that there is no difference between regular async calls and durable calls, but that’s not true.
Regular asynchronous code is handled by the compiler, which generates an async state machine so that an async method can resume at the point where its execution was previously suspended. But in the case of a Durable Function (an orchestrator) there is no compiler-generated state machine. Instead, the framework preserves the execution history in external storage (Azure Storage), and every time the orchestrator function “wakes up” it “replays” that history to determine which operations have already been executed and which still have to run.
Misunderstanding how a durable function works behind the scenes sometimes leads to serious performance issues. Below I’ve gathered some basic tips that might help you avoid them.
Tip #1: Payload size for activity functions does matter
Let’s take a look at the code sample below:
[FunctionName("Orchestrator")]
public static async Task Run(
[OrchestrationTrigger] IDurableOrchestrationContext context)
{
var items = await context.CallActivityAsync<string>("GetItems");
foreach (var item in items)
{
await context.CallActivityAsync<string>("ProcessItem", item);
}
}
Let’s say the activity function `GetItems` returns a 500 MiB array of strings. Since the Durable Functions framework stores all events that happen within an orchestration, this huge payload gets persisted to the Durable storage.
In the worst-case scenario, every time `await context.CallActivityAsync("ProcessItem", item);` is called, the orchestrator function is unloaded from memory until `ProcessItem` completes. When `ProcessItem` finishes, the orchestrator wakes up and replays the execution history to reconstruct its current state. In other words, the orchestrator function starts from the very beginning and tries to call `GetItems` and `ProcessItem` once again.
But that doesn’t mean the `GetItems` function will actually be executed again. Instead, the orchestrator fetches the result of `GetItems` from the Durable storage. (The key point here is that `await context.CallActivityAsync<string[]>("GetItems", null)` might be evaluated multiple times, but the activity function `GetItems` is executed only once.)
The issue here should be obvious: every time the orchestrator function wakes up, it must fetch the 500 MiB payload from the Durable storage, and this happens once per element of the `items` array. As a result, we “burn” a significant amount of time downloading the payload from Azure Storage rather than doing useful processing.
To make this more practical, let’s run two tests:
Test #1 - Huge payload
[FunctionName("SeqWithHugePayload")]
public static async Task RunSeqHuge(
[OrchestrationTrigger] IDurableOrchestrationContext context)
{
var payload = await context.CallActivityAsync<string[]>("GetHugePayload", null);
await context.Measure(async () =>
{
foreach (var item in payload.Take(3))
{
await context.CallActivityAsync("ProcessItem", item);
}
});
}
[FunctionName("GetHugePayload")]
public static Task<string[]> RunGetHugePayload([ActivityTrigger] IDurableActivityContext context, ILogger client)
{
var results = new List<string>();
for (var i = 0; i < 100000; i++)
{
results.Add(Enumerable
.Repeat(Guid.NewGuid().ToString(), 100)
.Aggregate(new StringBuilder(), (builder, s) => builder.Append(s))
.ToString());
}
return Task.FromResult(results.ToArray());
}
In the sample above, the function `GetHugePayload` returns an array of 100,000 elements, each of which is a string built by concatenating 100 GUIDs. The payload size is 144,120,004 bytes (~145 MB).
The execution time of `SeqWithHugePayload` is 00:00:41.5714 (~41 seconds).
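A note on the `Measure(...)` call used in these samples: it is not part of the Durable Functions API, but a small helper extension method that times the inner block. A minimal sketch of what such a helper could look like (an assumption, not the author’s actual implementation):

public static class OrchestrationContextExtensions
{
    // Hypothetical helper: times the inner part of the orchestration.
    public static async Task Measure(this IDurableOrchestrationContext context, Func<Task> body)
    {
        // CurrentUtcDateTime is replay-safe, unlike DateTime.UtcNow or Stopwatch.
        var start = context.CurrentUtcDateTime;
        await body();
        var elapsed = context.CurrentUtcDateTime - start;

        // Report the measurement only once, when the orchestrator is not replaying history.
        if (!context.IsReplaying)
        {
            Console.WriteLine($"Elapsed: {elapsed}");
        }
    }
}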
Test #2 - Small payload
[FunctionName("SeqWithSmallPayload")]
public static async Task RunSeqHuge(
[OrchestrationTrigger] IDurableOrchestrationContext context)
{
var payload = await context.CallActivityAsync<string[]>("GetSmallPayload", null);
await context.Measure(async () =>
{
foreach (var item in payload.Take(3))
{
await context.CallActivityAsync("ProcessItem", item);
}
});
}
[FunctionName("GetSmallPayload")]
public static Task<string[]> RunGetHugePayload([ActivityTrigger] IDurableActivityContext context, ILogger client)
{
var results = new List<string>();
for (var i = 0; i < 10; i++)
{
results.Add(Enumerable
.Repeat(Guid.NewGuid().ToString(), 100)
.Aggregate(new StringBuilder(), (builder, s) => builder.Append(s))
.ToString());
}
return Task.FromResult(results.ToArray());
}
The function `GetSmallPayload`, on the other hand, returns an array of only 10 elements, each of which is again a string built by concatenating 100 GUIDs. The payload size is 144,124 bytes (~0.2 MB).
The execution time of `SeqWithSmallPayload` is 00:00:03.3554 (~4 seconds).
Note that the execution time of `GetHugePayload` and `GetSmallPayload` themselves is not included in the measurement; only the processing loop is measured. The difference therefore shows that a huge payload slows down the whole workflow, not just the activity that produces it.
The table below summarizes the results of both tests:
| Test | Execution Time |
| --- | --- |
| Huge payload (~145 MB) | 00:00:41.5714 (~41 seconds) |
| Small payload (~0.2 MB) | 00:00:03.3554 (~4 seconds) |
How to optimize?
There are multiple options:
- Use the fan-out/fan-in pattern, which drastically reduces the number of orchestrator replays. The fewer replays we have, the fewer round-trips to Azure Storage we have to make, and the better the execution time.
- Change the design of the application to reduce the number of large messages (payloads), for example by passing references instead of the data itself (see the first sketch after this list).
- Use extended sessions, which can reduce the number of replays and therefore the number of times the payload has to be fetched from Azure Storage (see the host.json sketch after this list).
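For the second option, a common approach is to pass only references (for example, item IDs or blob names) between the orchestrator and activities, so that only small identifiers end up in the execution history. A minimal sketch of that idea; the activity names `GetItemIds` and `ProcessItemById` are hypothetical:

[FunctionName("OrchestratorWithReferences")]
public static async Task RunWithReferences(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    // Only small identifiers travel through the orchestration history,
    // so each replay fetches kilobytes instead of hundreds of megabytes.
    var ids = await context.CallActivityAsync<string[]>("GetItemIds", null);

    // Each activity loads its own item (for example, from Blob Storage) by id.
    await Task.WhenAll(ids.Select(id =>
        context.CallActivityAsync("ProcessItemById", id)));
}

For the third option, extended sessions are enabled in host.json. The snippet below is a sketch for the Durable Functions 2.x host.json schema; double-check the property names against the version you are running:

{
  "extensions": {
    "durableTask": {
      "extendedSessionsEnabled": true,
      "extendedSessionIdleTimeoutInSeconds": 30
    }
  }
}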
Tip #2: Use Fan-out/fan-in as much as possible
This one is tightly related to tip #1, and the explanation is straightforward: with fan-out we reduce the number of orchestrator replays. The fewer replays we have, the fewer round-trips to Azure Storage we have to make, and the better the execution time.
The tests below show the difference between sequential and parallel execution:
Test #1 - Sequential processing
[FunctionName("Sequential")]
public static async Task RunSeqHuge(
[OrchestrationTrigger] IDurableOrchestrationContext context)
{
var payload = await context.CallActivityAsync<string[]>("GetHugePayload", null);
await context.Measure(async () =>
{
foreach (var item in payload.Take(3))
{
await context.CallActivityAsync("ProcessItem", item);
}
});
}
Execution time is 00:00:37.5051 (~38 seconds).
Test #2 - Parallel processing (using fan-out and fan-in)
[FunctionName("FanOut")]
public static async Task RunSeqHuge(
[OrchestrationTrigger] IDurableOrchestrationContext context)
{
var payload = await context.CallActivityAsync<string[]>("GetHugePayload", null);
await context.Measure(async () =>
{
await Task.WhenAll(payload
.Take(3)
.Select(e => context.CallActivityAsync("ProcessItem", e)))
});
}
Execution time is 00:00:14.7714 (~15 seconds).
The table below summarizes the results of both tests:
| Test | Execution Time |
| --- | --- |
| Sequential | 00:00:37.5051 (~38 seconds) |
| Fan-out | 00:00:14.7714 (~15 seconds) |
Please keep in mind that the fan-in operation is fairly expensive, as it requires fetching the results of all completed tasks. The more tasks you have to fan in, the more time is required to get their results from storage.
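For completeness, here is a sketch of the full fan-out/fan-in shape, where every branch returns a value and a final activity aggregates the results. The activity names `ProcessItemWithResult` and `AggregateResults` are hypothetical:

[FunctionName("FanOutFanIn")]
public static async Task RunFanOutFanIn(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    var payload = await context.CallActivityAsync<string[]>("GetHugePayload", null);

    // Fan-out: schedule all branches without awaiting them one by one.
    var tasks = payload
        .Take(3)
        .Select(item => context.CallActivityAsync<int>("ProcessItemWithResult", item));

    // Fan-in: every branch result is read back from the execution history,
    // which is why this step gets more expensive as the number of tasks grows.
    var results = await Task.WhenAll(tasks);

    await context.CallActivityAsync("AggregateResults", results.Sum());
}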
Tip #3: Sub-orchestrators may improve performance
As was mentioned earlier, Durable Functions uses Azure Storage to preserve execution history. Every time an activity completes, the orchestrator function replays all the prior events to obtain the current state of the workflow.
History events for an orchestrator and each of its sub-orchestrators have their own partition key, which makes it possible to fetch history from Azure Storage efficiently. This means that dividing your orchestrator into multiple sub-orchestrators reduces the amount of history that needs to be fetched for any single replay. The fewer events you need to fetch, the less time is required to reconstruct the workflow state, and the better the performance.
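As an illustration, a parent orchestrator can delegate batches of work to sub-orchestrations with `CallSubOrchestratorAsync`. A minimal sketch; the sub-orchestrator name `ProcessBatch`, the `GetItemIds` activity, and the batch size are hypothetical:

[FunctionName("ParentOrchestrator")]
public static async Task RunParent(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    var ids = await context.CallActivityAsync<string[]>("GetItemIds", null);

    // Split the work into batches of 100 items.
    var batches = ids
        .Select((id, index) => new { id, index })
        .GroupBy(x => x.index / 100, x => x.id)
        .Select(g => g.ToArray());

    // Each batch is handled by its own sub-orchestration, so each one
    // replays only its own (much shorter) history.
    await Task.WhenAll(batches.Select(batch =>
        context.CallSubOrchestratorAsync("ProcessBatch", batch)));
}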
Conclusion
Durable Functions is a pretty powerful and easy-to-use framework. You can easily write stateful workflows just like you write regular async code. But if you want to achieve efficiency, you still need to understand how all of this works behind the scenes.
Happy coding!