Alexander Brett

10 steps to avoid failure with async/await in C#

12 December 2017

In 2012 Microsoft released C# 5.0, which introduced the async and await keywords:

C# changed the game again by baking asynchrony into the language as a first-class participant

This is the future. Not only are other major languages following C#’s lead in implementing the Task-based Asynchronous Pattern (TAP), but more and more C# libraries are being written using Tasks to offer non-blocking APIs - most notably System.Net.Http.HttpClient. Programming using TAP can make better use of resources by allowing methods to yield threads to other tasks when blocked, and it can make callback and promise based programming vastly easier to write, read and understand. To get more of an idea of what teary-eyed ideologues are saying, read some best practices - and then take a step back.

The async/await keywords and tasks are by no means trivial to get right: there are some obvious and slightly less obvious pitfalls which you’ll want to avoid. This blog post is the crash course you need to avoid embarrassing yourself at work, and to avoid having developers who have spent the last 25 years bashing their heads against explicit threading deadlocks go off on boring rants at you about how these newfangled language features will get us all in trouble…

Here is what you need to know. You can follow along with all of the examples in a debugger by cloning https://github.com/alexander-brett/CsharpAsyncDemo

1. Consider whether you need TAP, and integrate it thoughtfully if you do

Asynchronous and parallel programming can be harder than it’s worth. The Task Parallel Library (TPL) often does not play well with legacy code at all, since many of the easy mistakes fall at the boundary between Task-based and non-Task-based code, and juggling threads AND tasks can become a headache really fast.

However, if you have decided that callback- or promise-based or parallel programming is a benefit in your new (or relatively small) application, strongly prefer the TPL over explicit threads! Your code will be much simpler to understand, and much much easier to test. The ideal situation is an app that has been designed from the ground up as a series of asynchronous operations, for instance a data pipeline or server.

2. Understand the difference between single- and multi-threaded schedulers

The TPL uses a Task Scheduler to keep track of which tasks are waiting and in progress at any time, and to kick off waiting tasks when possible. In practice, continuations get scheduled in one of two modes: single- and multi-threaded. In UI applications, the main thread operates in a Single-Threaded Apartment (STA) and installs a SynchronizationContext which sends continuations back to that one thread; classic ASP.NET installs a context which lets only one thread at a time run code for a given request. When there is only one thread to share between all scheduled tasks, it is much easier to deadlock than if there are multiple threads kicking around: in fact, without taking great care, this will happen a great deal!

This demands that you be aware of what mode the code you’re writing will run in - and you must bear in mind that if you’re writing library code of any kind, it’s almost certain that down the line it will eventually be consumed in a single-threaded environment.
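If you are not sure which mode a piece of code is running in, one rough way to find out is to look at SynchronizationContext.Current at the point of the await. The following is only a sketch: ContextInspector and Describe are names I have made up for illustration, and a null context only tells you that nothing was installed on the current thread.

```csharp
using System.Threading;

public static class ContextInspector
{
    // Hypothetical helper: reports whether an await at this point would
    // capture a SynchronizationContext for its continuation.
    public static string Describe()
    {
        SynchronizationContext context = SynchronizationContext.Current;
        return context == null
            ? "no SynchronizationContext installed on this thread"
            : "SynchronizationContext present: " + context.GetType().Name;
    }
}
```

Calling ContextInspector.Describe() from a UI event handler and from a plain console Main should give you two different answers, which is a useful thing to log when a deadlock only shows up in one environment.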

3. Recognise the classic TPL single-threaded deadlock

The single most talked about issue in the Task Parallel Library is how easy it is to unknowingly cause a deadlock in a single-threaded scheduler when trying to synchronously wait for async code. For an in-depth explanation, you can read any of those blog posts, but the basic idea is that the UI thread starts a task whose continuation has to run back on the UI thread, and then blocks on the result - consuming the only thread that could ever run that continuation!

The solution in every case that this comes up is to ensure that the continuation (the part of the code after the asynchronous call) is permitted to run in a different context from the async call itself, or to allow async to permeate all the way up to your application’s entrypoint.

Consider the following code:

private static async Task DelayAsync() => await Task.Delay(100);

[Test, Timeout(200)]
public static void ThisWorksFineHonest()
{
    DelayAsync().Wait();
}

Because this test is running in a multi-threaded environment, everything goes fantastically, and the test passes. However, when we queue it on a single-threaded synchronisation context, it fails:

// thanks to https://stackoverflow.com/questions/40343572/simulate-async-deadlock-in-a-console-application
[Test, Timeout(200)]
public static void DemonstrateFailure()
{
    new DedicatedThreadSynchronisationContext().Send(state =>
    {
        DelayAsync().Wait();
    }, null);
}

In order to demonstrate this, we’re using the DedicatedThreadSynchronisationContext from this StackOverflow post which reproduces the single-threaded behaviour of UI apps, and allows us to put it under a microscope in some unit tests.
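The other way out mentioned above, letting async permeate all the way up, means that no caller ever blocks on a task. A minimal sketch of that shape, with AsyncAllTheWay, LoadValueAsync and HandleRequestAsync being invented names rather than anything from the demo repository:

```csharp
using System.Threading.Tasks;

public static class AsyncAllTheWay
{
    // Stands in for some genuinely asynchronous operation.
    private static async Task<int> LoadValueAsync()
    {
        await Task.Delay(100);
        return 42;
    }

    // The caller is async too, so it awaits instead of calling Wait() or
    // Result, and the thread is free to run the continuation.
    public static async Task<int> HandleRequestAsync()
    {
        int value = await LoadValueAsync();
        return value + 1;
    }
}
```

The top of the chain then needs to be something that understands tasks: an async event handler, an async test method, or an async Main if you are on C# 7.1 or later.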

4. Torture test every async method in a single-threaded scheduler

It’s great news that we have a way to replicate this bug in a unit test environment! Nothing squashes bugs like a well-written unit test. As such, we can write a very straightforward harness for any code which returns a Task:

public class TestHarness
{
    public static void TestAsyncBehaviour(Func<Task> test)
    {
        new DedicatedThreadSynchronisationContext().Send(state =>
        {
            test().Wait();
        }, null);
    }
}

Now, if this harness doesn’t deadlock, we’re safe. From here on, all the code examples will use this helper method. In your codebase, you can avoid embarrassing moments by using something like this regularly - certainly if you think you might want to blame something on Task library counter-intuitiveness you no longer need to guess.
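Since DedicatedThreadSynchronisationContext lives in the linked StackOverflow post rather than on this page, here is a self-contained stand-in if you want to experiment without it. Treat it as a sketch under assumptions: SingleThreadContext and DeadlockProbe are hypothetical names, this is not the implementation from that post, and it uses Wait with a timeout so that a deadlock shows up as a false result rather than a hung test.

```csharp
using System;
using System.Collections.Concurrent;
using System.Runtime.ExceptionServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in: runs every callback on one dedicated thread.
public sealed class SingleThreadContext : SynchronizationContext
{
    private readonly BlockingCollection<Action> queue = new BlockingCollection<Action>();

    public SingleThreadContext()
    {
        var thread = new Thread(() =>
        {
            // Awaits on this thread will capture this context.
            SetSynchronizationContext(this);
            foreach (Action work in queue.GetConsumingEnumerable()) work();
        });
        thread.IsBackground = true; // never shut down in this sketch
        thread.Start();
    }

    // Continuations arrive here and wait their turn on the single thread.
    public override void Post(SendOrPostCallback d, object state)
    {
        queue.Add(() => d(state));
    }

    // Runs the callback on the dedicated thread and waits for it to finish.
    // Must not be called from the dedicated thread itself.
    public override void Send(SendOrPostCallback d, object state)
    {
        Exception error = null;
        using (var done = new ManualResetEventSlim())
        {
            queue.Add(() =>
            {
                try { d(state); }
                catch (Exception ex) { error = ex; }
                finally { done.Set(); }
            });
            done.Wait();
        }
        if (error != null) ExceptionDispatchInfo.Capture(error).Throw();
    }
}

public static class DeadlockProbe
{
    // True if the task finished while the context's only thread was blocked
    // on it; false if it was still stuck when the timeout expired.
    public static bool CompletesWhenBlockedOn(Func<Task> test, TimeSpan timeout)
    {
        bool completed = false;
        new SingleThreadContext().Send(_ => completed = test().Wait(timeout), null);
        return completed;
    }
}
```

With a 500ms timeout, passing async () => await Task.Delay(100) should come back false (the continuation is queued behind the blocked thread), while passing () => Task.Delay(100) should come back true.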

5. Avoid deadlocks by removing async and await keywords where possible

Just because you’re using Tasks doesn’t mean you need the async and await keywords. They are only required when you need to allow the task in question to finish before running a continuation. In the following code, the first test fails and the second passes:

public static class AvoidDeadlockByNotUsingAwaitWhenUnneccesary
{
    private static async Task DelayAsync() => await Task.Delay(100);

    [Test, Timeout(200)]
    public static void DemonstrateDeadlock()
    {
        TestHarness.TestAsyncBehaviour(DelayAsync);
    }

    private static Task DelayTask() => Task.Delay(100);

    [Test, Timeout(200)]
    public static void DemonstrateNoDeadlock()
    {
        TestHarness.TestAsyncBehaviour(DelayTask);
    }
}

In DemonstrateDeadlock, the use of the await keyword captures the context and yields while the context’s thread is blocked, so the continuation never gets to run. This is exactly the deadlock outlined above, but in the second example we fix the deadlock simply by removing unnecessary keywords! The moral of the story is that return await is a code smell. Don’t do this until you have enough experience with TPL to know that that’s exactly, specifically, precisely what you want, and if it is, place a very precise comment explaining why. It almost never is.

6. Avoid deadlocks by using ConfigureAwait(false)

In the previous example we avoided capturing the context by avoiding the return await construction: in this example, we’ll do it by stating explicitly that we don’t need it. Task.ConfigureAwait() specifies whether or not the continuation (the bit of code after the await statement) needs to run in the same context as the preceding code. If it does (ConfigureAwait(true), the default), then the await captures the current context, and when you block that context’s only thread (with Wait()) the continuation can never run. However, if you don’t need the context to be captured - you almost never do - you can fix the deadlock with ConfigureAwait(false):

class AvoidDeadlockByUsingConfigureAwait
{
    private static async Task<object> DelayAsync()
    {
        await Task.Delay(100);
        return new object();
    }

    [Test, Timeout(200)]
    public static void DemonstrateDeadlock()
    {
        TestHarness.TestAsyncBehaviour(DelayAsync);
    }

    private static async Task<object> DelayWithConfigureAwait()
    {
        await Task.Delay(100).ConfigureAwait(false);
        return new object();
    }

    [Test, Timeout(200)]
    public static void DemonstrateNoDeadlock()
    {
        TestHarness.TestAsyncBehaviour(DelayWithConfigureAwait);
    }
}

For more details, read this - in fact the whole article is a really great read.

7. Understand that await is more-or-less equivalent to ContinueWith()

So the await keyword is essentially just a way of scheduling a continuation - we’re saying when this task is finished, do this other thing. However, there is more than one way to do it: the following two tests do effectively the same thing (here is a more detailed blog post about it: https://blogs.msdn.microsoft.com/pfxteam/2012/01/20/await-synchronizationcontext-and-console-apps/):

class AwaitVersusContinueWith
{
    private static Task Method1()
    {
        Console.WriteLine("Method1");
        return Task.Delay(10);
    }

    private static Task Method2()
    {
        Console.WriteLine("Method2");
        return Task.Delay(10);
    }

    [Test]
    public static async Task DemonstrateAwait()
    {
        await Method1();
        await Method2();
    }

    [Test]
    public static async Task DemonstrateContinueWith()
    {            
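        // ContinueWith returns a Task<Task> here; Unwrap() flattens it so that
        // the await also covers the task returned by Method2.
        await Method1().ContinueWith(task => Method2()).Unwrap();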
    }
}

Judicious use of ContinueWith can make your code easier to understand, especially if you write code that accepts Tasks and manipulates them. Rather than awaiting your argument and doing some things, you can return an explicit continuation. Returning a continuation happens instantaneously, whereas awaiting happens asynchronously - and you will eventually be able to form an intuition about when you want the asynchronous call to happen and where you want errors to propagate (more on that later).
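To make that concrete, here is a sketch of a method which accepts a Task and hands back an explicit continuation instead of awaiting its argument. TaskTransformations and Describe are names invented for this example.

```csharp
using System.Threading.Tasks;

public static class TaskTransformations
{
    // Not an async method: it returns straight away with a task that
    // completes once the source task has completed and been formatted.
    public static Task<string> Describe(Task<int> source)
    {
        // Reading Result inside the continuation does not block, because
        // the continuation only runs after the source task has finished.
        return source.ContinueWith(completed => "Result: " + completed.Result);
    }
}
```

Note where errors go in this version: if the source task faults, reading Result throws inside the continuation, so the returned task faults too, with the original exception nested inside AggregateExceptions.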

8. Avoid deadlocks by using ContinueWith()

Having understood ContinueWith it becomes clear that the following example is effectively the same as #5 - remove the await point and the deadlock disappears:

class RemoveDeadlockByUsingContinueWith
{
    private static async Task<object> DelayAsync()
    {
        await Task.Delay(100);
        return new object();
    }

    [Test, Timeout(200)]
    public static void DemonstrateDeadlock()
    {
        TestHarness.TestAsyncBehaviour(DelayAsync);
    }

    private static Task<object> DelayContinuation()
        => Task.Delay(100).ContinueWith(task => new object());

    [Test, Timeout(200)]
    public static void DemonstrateNoDeadlock()
    {
        TestHarness.TestAsyncBehaviour(DelayContinuation);
    }
}

So far I have not managed to make a deadlock using ContinueWith - if you have an example, please open a pull request!

9. Realise that Tasks are more synchronous than you think

Many awaits complete via what is known as the fast path. Essentially, a lot of the time when you call a Task method, some or all of the continuations will execute synchronously, depending on exactly what’s going on. await Task.CompletedTask will execute synchronously, and if you only have an await keyword halfway down a method, the first lines will execute synchronously before the task yields. If you have a TaskCompletionSource and call SetResult on it, and there is one continuation associated with the task, then unless you created the TaskCompletionSource with TaskCreationOptions.RunContinuationsAsynchronously the continuation will probably run synchronously.

If exactly which lines of code execute synchronously or otherwise matters in the application you’re writing, it’s worth spending a few hours getting a feel for which situations are and are not synchronously executed. In a debugger, you can step through a synchronous chain of operations until it ends, which can be very instructive! Until then, just be careful not to assume that anything you awaited is automatically happening in parallel.
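If you would rather see it than step through it, the following sketch records the order in which things happen. FastPathDemo and its members are hypothetical names, and it assumes a runtime which has Task.CompletedTask.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class FastPathDemo
{
    public static readonly List<string> Log = new List<string>();

    private static async Task WorkAsync()
    {
        Log.Add("before await");          // runs synchronously in the caller
        await Task.CompletedTask;         // already complete, so no yield
        Log.Add("after completed await"); // still synchronous
        await Task.Delay(100);            // not complete, so we yield here
        Log.Add("after delay");           // runs later, as a continuation
    }

    public static Task Run()
    {
        Log.Clear();
        Task task = WorkAsync();          // returns at the first real yield
        Log.Add("caller resumed");
        return task;
    }
}
```

Once the task returned by Run() has completed, Log should read: before await, after completed await, caller resumed, after delay. Only the last entry happened asynchronously.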

10. Handle exceptions well

With Tasks, any exception will not be thrown until the task is awaited, unless the method returning a task is in fact synchronous. If the task is never awaited, any exception thrown is called unobserved. Having an unobserved exception in your code is in general a really bad idea, because it means that bad behaviour is almost impossible to spot. Just like it’s a good idea to thoroughly test around error paths in synchronous code, you should also write tests for error behaviour in your async code.
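If you really do have to start a task and walk away from it, one option is to attach a continuation which observes any failure. This is a sketch of the idea rather than a recommendation: ObservedTasks and ObserveFaults are names made up for this example, and what you do in the callback (logging, most likely) is up to you.

```csharp
using System;
using System.Threading.Tasks;

public static class ObservedTasks
{
    // Hypothetical helper: the continuation runs only if the task faults,
    // and reading task.Exception is what marks the exception as observed.
    public static Task ObserveFaults(Task task, Action<Exception> onError)
    {
        return task.ContinueWith(
            faulted => onError(faulted.Exception.InnerException),
            TaskContinuationOptions.OnlyOnFaulted);
    }
}
```

Be aware that with OnlyOnFaulted the returned task ends up cancelled when the original task succeeds, so don’t await the return value unless you are prepared for that.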

Also, when a task throws an exception and you block on it with Wait() or Result, you will see an AggregateException whose InnerException property contains the original exception. Often it’s desirable to unpack this and handle the inner exception. When you await the task instead, the original exception is rethrown directly.

All of the following tests pass!

class DoNotSwallowExceptions
{
    public static async Task ThrowException()
    {
        await Task.Delay(10).ConfigureAwait(false);
        throw new Exception();
    }

    [Test]
    public static void DoesNotThrowWhenAsynchronous()
    {
        Assert.DoesNotThrow(() => ThrowException());
    }

    [Test]
    public static void HaveToWaitToGetAnException()
    {
        Assert.Throws<AggregateException>(() => ThrowException().Wait());
    }

    public static async Task ThrowExceptionBeforeAwaiting()
    {
        if ("1".Equals(1.ToString())) throw new Exception();            
        await Task.Delay(10).ConfigureAwait(false);
    }

    [Test]
    public static void DoesNotThrowWhenHappensToBeAsynchronous()
    {
        Assert.DoesNotThrow(() => ThrowExceptionBeforeAwaiting());
    }

    public static Task ThrowExceptionSynchronouslyReturnTask()
    {
        if ("1".Equals(1.ToString())) throw new Exception();            
        return Task.Delay(10);
    }

    [Test]
    public static void ThrowsExceptionWhenHappensToBeSynchronous()
    {
        Assert.Throws<Exception>(() => ThrowExceptionSynchronouslyReturnTask());
    }
}
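The tests above only ever block with Wait(), so here is a small sketch of how the exception type differs between awaiting a faulted task and blocking on it. ExceptionUnwrapping and its methods are invented for this example.

```csharp
using System;
using System.Threading.Tasks;

public static class ExceptionUnwrapping
{
    private static async Task FailAsync()
    {
        await Task.Delay(10).ConfigureAwait(false);
        throw new InvalidOperationException("boom");
    }

    // await rethrows the original exception.
    public static async Task<string> CatchWithAwait()
    {
        try
        {
            await FailAsync().ConfigureAwait(false);
            return "no exception";
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }

    // Wait() wraps the original exception in an AggregateException.
    public static string CatchWithWait()
    {
        try
        {
            FailAsync().Wait();
            return "no exception";
        }
        catch (AggregateException ex)
        {
            return ex.GetType().Name + " -> " + ex.InnerException.GetType().Name;
        }
    }
}
```

CatchWithAwait completes with "InvalidOperationException", whereas CatchWithWait returns "AggregateException -> InvalidOperationException".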

Conclusion

Having been working intensively with async programming for the last little while, I’ve seen that almost every bug in async code I’ve written or read has come from misunderstanding one of the points in this article. I suggest that you get out there, write some code, and when it fails, come and reread this before tearing your hair out.

None of this work is genuinely new - I have aggregated insights from across various different blog posts, and as such I’m indebted to the previous work of people who have already explained and demonstrated these things.

If you think the explanation here needs tightening up, please submit a pull request, and again please do check out the csharp solution and play with it yourself.

Happy coding!

Tags: CSharp

Presenting to #Geomob

11 September 2016

Following on from my post about visualising London, I was invited to speak at #Geomob at the BCS! I had a lot of fun.

If you’re interested in the slideshow, you can see the slides here, and some source code is here. Press p to see my presentation notes - you drive the show by pressing the left mouse button.

The slideshow was driven by remark.js in order to be able to get all of the animations to work nicely. I did in fact have to roll my own fork of remark to pull it off - basically I ripped out the click event handling and replaced it with something intensely hacky, so it’s not code I’ll share for now!

During and after my talk, this happened!

Thank you everyone!

Visualising London

01 June 2016

I recently started working on a couple of interesting data-visualisation projects at work, which led me to researching visualisation libraries, watching this video, and thinking that it would be cool to do something like that for London.

Happily, data.london.gov.uk exists and has a bizarrely good selection of datasets, so I threw together the following visualisation tool to provide a little window into the state of London, ward-by-ward.

[Interactive visualisation (click to load): London wards, each listed with its area in sq. km and its borough]

What you’re looking at

London is broken down into 33 boroughs (counting the City of London), 630 electoral wards, or 3000ish postcodes. I chose to visualise wards because although there is a huge amount of per-borough data kicking around, having only 33 bubbles doesn’t provide quite the effect I was looking for, and conversely, although having each postcode would be very beautiful and granular, it would be very slow to render for all but the most grunty CPUs. Per-ward data seems to strike a good balance of granularity, data availability, and performance.

Each ward tries to stay close to its geographical location, but can get pushed away as the wards around it grow or shrink - the idea is to give a broad sense of geography, so I think it’s ok that they can get slightly mixed up.

Limitations and choices

Working with the data at hand meant being aware of certain limitations of the data set and my ability to present it. For instance, the first version of this used a data set from 2012, which I subsequently was unable to reconcile with the 2016 Mayoral Election results because of the adjustment of certain wards in 2014. This meant becoming aware of exactly which year and which ward boundaries each dataset I looked at corresponded to. It also means that in this visualisation, the data for pre-2014 has been massaged into a post-2014 shape (this was handled at the source by GLA, not by me, thankfully!).

The locations were generated by taking the entire ONS London Postcode database, grouping by ward, and averaging the coordinates of the postcodes in each ward. This is OK for a relatively coarse application like this, but for more precise work a more carefully considered dataset would probably make sense.
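As a rough sketch of that grouping-and-averaging step, here is how it might look with awk. The input layout (ward, latitude, longitude) and the coordinates are made up for illustration; the real ONS file has a different layout which isn't shown in this post.

```shell
#!/bin/sh
set -e

# Hypothetical input: one row per postcode as ward,latitude,longitude.
# The coordinates below are invented for illustration only.
postcodes=$(mktemp)
cat > "$postcodes" <<'EOF'
Kew,51.48,-0.28
Kew,51.46,-0.30
Barnes,51.47,-0.25
EOF

# Sum the coordinates per ward, then divide by the number of postcodes in that ward.
centroids=$(awk -F, '
    { lat[$1] += $2; lon[$1] += $3; n[$1]++ }
    END { for (w in n) printf "%s,%.3f,%.3f\n", w, lat[w] / n[w], lon[w] / n[w] }
' "$postcodes" | sort)

echo "$centroids"
```

A plain mean like this treats every postcode equally, which is part of why it is only suitable for coarse work.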

I’m also aware that every act of communication exposes the author’s biases and conceptions, whether intentionally or not: for instance, when exposing a set of data as sizes, I have to decide both the minimum and maximum sizes, as well as the scaling function. Data can look much more dramatic with one set of choices than with another:

[Figures: prices, scaled linearly; prices, scaled by square root]
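To put numbers on the difference, here is a small sketch comparing the two scaling functions for a pair of hypothetical prices. It assumes the scaled value is used as the bubble radius, which may not be exactly how the visualisation above does it.

```shell
#!/bin/sh
set -e

# Hypothetical prices for two wards; the real dataset is not reproduced here.
low=200000
high=800000

# Radius proportional to the value: a 4x price gives a 4x radius (16x the area).
linear=$(awk -v lo="$low" -v hi="$high" 'BEGIN { printf "%.1f", hi / lo }')

# Radius proportional to the square root: a 4x price gives a 2x radius (4x the area).
root=$(awk -v lo="$low" -v hi="$high" 'BEGIN { printf "%.1f", sqrt(hi) / sqrt(lo) }')

echo "radius ratio, linear scale: $linear"
echo "radius ratio, square-root scale: $root"
```

The same data produces either a sixteen-fold or a four-fold difference in area, depending only on the choice of scaling function.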

Similarly, choosing which colours to use to present data about ethnic minority population or socially rented housing could appear to make value judgements about the data presented - for instance, does red signify high or low or good or bad?

Technology

To draw this data, I used d3.js force layout with the wards as nodes. Each ward has three competing forces acting on it: the first is a spring-like force pulling it towards its geographical location; the second is a very strong short-range force which prevents it from overlapping its neighbours; and the third is a charge force which distributes the wards slightly more evenly across space. Choosing this set of forces allows a sense of which part of London is which, whilst allowing different areas to grow and shrink in an organic way. It does mean that wards can get a little mixed up, so the idea is only to give a broad sense of geography.

The data is served as a single SQLite file which is interrogated using SQL.js. This allows a lot of flexibility in loading in data - the ability to do joins and aggregations on-the-fly saves a lot of code and memory, and it means that whilst developing I was able to load in tables as and when I wanted to try a new data shape.

Tools for effective branching structures in git

28 August 2015

Creating a good git branching structure is a difficult process. There are many considerations to be juggled, including:

  • Is this easily understandable to developers and PMs, including those who may not have prior experience with git?
  • Is it easy to trace a single change to the branch, developer and ticket for which it was made?
  • Is it possible to roll back changes which introduce issues?
  • Will this scale out to several large teams of developers, and does it need to?

In addition, when working with specific systems, for instance Salesforce.com or CPAN, the sandboxing and release processes suitable for those systems introduce additional requirements around the branching structure.

In fact, the principal trade-off to be made is that a branch model which produces a very clear and traceable result will, in general, require a higher level of fluency with git for all participants.

This article is an exploration of different techniques that can be used to build the branching structure your organisation needs; it’s important to note that there is no one true branching structure, and that anybody who says that there is is wrong!

NB: If you feel that something in this article needs improvement, please feel free to open a pull request

The trivial structure

The simplest possible branching model has one developer working on one feature at a time. When the feature is complete, you tag a release, and continue working on the same branch. This looks like this:

   v0          v1
    |           |
o---o---o---o---o---o

This works well for personal projects, but obviously falls down as soon as you need to switch the priorities on features, fix a bug in an existing feature before resuming work on the in-progress one, or collaborate with anybody else. Nonetheless, it’s important to realise that git branching structures don’t, in fact, have to include multiple branches.

Branching

In order to work on multiple features at once, or get a bugfix done quickly, you can start to use multiple branches, merging changes as appropriate. This works as follows:

  1. You’re working on a feature:

          v0    myFeature
           |       |
       o---o---o---o
    
  2. You need to fix a bug, so you create a new branch starting at the last release

           v0   myFeature
           |       | myBugfix
       o---o---o---o   |
            \----------o
    
  3. When you’ve finished the bug, you release that branch

           v0   myFeature
           |       | v0.1
       o---o---o---o   |
            \----------o
    
  4. You merge the new release into your own branch

           v0              myFeature
           |         v0.1  |
       o---o---o---o---|---*
            \----------o--/
                           ^merge commit
    
  5. When you finish your feature, you release your branch

           v0                 v1
           |          v0.1     |
       o---o---o---o---|---*---o
            \----------o--/
    

Master

Using tags for releases works really well from a release-management and version history point of view, but it can get a bit fiddly as a developer - you have to constantly check which tag is the most recent, and ensuring that you’re branching and merging the right commits can get a little tedious. If you’re the only one working on the project, it’s probably not going to get too complicated, because you may well have only a couple of branches at once, and you’ll create each release and therefore be in a better position to remember what’s going on. However, once you have more than one person able to make releases, or you get several branches, you’ll want to handle this potential complexity.

At this point, having a master branch is really useful. Whenever you release, you ensure that master points at that commit. In that way, each time you switch branch, making sure it’s up-to-date with the latest release is simply a matter of merging in master. When you’re dealing with a master branch, your branching diagrams look a little different:

  1. You’re working on a feature

           master
         v0 |  
          \ /     myFeature
       o---o        |
            \---o---o  
    
  2. You need to fix a bug, so you create a new branch starting at the last release

           master
         v0 |  
          \ /     myFeature
       o---o         |  myBugfix
            \        |   |
             \---o---o   |
              \----------o
    
  3. When you’ve finished the bug, you release that branch by merging into master

                          master
          v0                | v0.1
           |   myFeature    |/
       o---o---------|------*          
            \        |     /^merge commit
             \---o---o    /
              \----------o
    
  4. You merge the new release into your own branch

                        master
          v0              | v0.1
           |              |/
       o---o--------------*  myFeature
            \            / \ /
             \---o---o--/---*
              \--------o
    
  5. When you finish your feature, you release your branch.

          v0            v0.1 v1 master
           |              |   | /
       o---o--------------*---*
            \            / \ /
             \---o---o--/---*
              \--------o
    

Fast-Forward

The downside of introducing the master branch like this is that we’ve introduced two extra merge commits compared to the previous version - and in fact, half of the commits since v0 are merge commits! This does serious damage to our ability to see quickly and easily what changes have been introduced and when. Fortunately, we don’t always need to do a merge - git has an ability to fast-forward, which means that, when there is nothing to merge, the branch is moved to point to a different commit, without any new commit being added.

To be more specific, a fast-forward occurs when one of the commits to be merged is the ancestor of the other, which you can see happening at v0.1 and v1 above.

If we allow fast-forward commits, we end up with much more attractive diagrams for steps 3 onwards:

  3. When you’ve finished the bug, you release that branch (nb we fast-forwarded master!)

           v0      master v0.1
           |  myFeature | /
       o---o--------|---o
            \       |
             \--o---o

  4. You merge the new release into your own branch

           v0   master  v0.1
           |         \ /
       o---o----------o  myFeature
            \          \/
             \--o---o---*

  5. When you finish your feature, you release your branch (nb another fast-forward onto master!)

           v0       v0.1 v1  master
           |          |   | /
       o---o----------o---* 
            \            /
             \--o---o---/
    

It’s important to realise that this set of diagrams is identical to the original, with a new branch added and some lines in different places - master is simply ‘a branch which will always point to the last release’. In fact, if you are using master in this way, you could choose only ever to fast-forward commits onto it.
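Here is a minimal sketch of the fast-forward release of the bugfix, run in a throwaway repository. The branch and tag names match the diagrams; the commit messages and the demo identity are placeholders.

```shell
#!/bin/sh
set -e

# Throwaway repository; the identity below is a placeholder for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo"
git config user.email "demo@example.com"

# v0 is released on master.
git commit -q --allow-empty -m "initial release"
git tag v0

# Feature work in progress on its own branch.
git checkout -q -b myFeature
git commit -q --allow-empty -m "feature work"

# The bugfix starts from the last release.
git checkout -q -b myBugfix master
git commit -q --allow-empty -m "fix bug"

# Release the bugfix: master is an ancestor of myBugfix, so nothing needs merging.
# --ff-only makes git refuse rather than create a merge commit.
git checkout -q master
git merge -q --ff-only myBugfix
git tag v0.1

# Count the merge commits (commits with two parents) reachable from master.
merges_on_master=$(git rev-list --merges --count master)
echo "merge commits on master: $merges_on_master"
```

Using `--ff-only` on master is one way to enforce the rule that master only ever moves by fast-forward: if a real merge would be needed, git stops and leaves the branch alone.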

Rebase

I think that merge commits are noise. When you have a branch-based workflow, you’re working on a few features simultaneously, and you release regularly, you may end up with a third or more of your commits being merge commits, and this can mean that when you use git log you end up with an effectively unreadable mess. Fortunately, in rebase we have a tool that lets us re-arrange our commit history in an extremely readable and pleasant manner. It works exactly the same as above up to step 4, at which point instead of merging, we rebase, which takes all of the commits we made on our branch and then applies them on top of the target, which means it’s as though we just checked out the latest release and instantaneously developed on top of it. This leaves the history looking like this:

             master
          v0  | v0.1
          |   \ /
      o---o----o     myFeature
                \       |
                 \--o---o

which in turn means that when we release myFeature, we get this:

          v0  v0.1   v1 master
          |    |      \ /
      o---o----o---o---o

…which is extremely easy to follow.
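The rebase-then-release sequence can be sketched in a throwaway repository like this. The branch and tag names follow the diagrams; the file names, the `commit_file` helper and the demo identity are hypothetical.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo"
git config user.email "demo@example.com"

# commit_file NAME: hypothetical helper that adds one file in one commit.
commit_file() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "add $1"
}

commit_file base
git tag v0

# Two commits of feature work.
git checkout -q -b myFeature
commit_file feature1
commit_file feature2

# Meanwhile a bugfix is released from master as v0.1.
git checkout -q master
commit_file bugfix
git tag v0.1

# Instead of merging master into myFeature, replay the feature commits on top of it.
git checkout -q myFeature
git rebase -q master

# Releasing is now a plain fast-forward, and the history is a straight line.
git checkout -q master
git merge -q --ff-only myFeature
git tag v1
git --no-pager log --oneline --graph
```

Because the rebased commits touch different files from the bugfix, there are no conflicts here; on a real branch this is the step where you may have to stop and resolve them.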

This is the workflow that I use on my perl modules. The habitual use of rebase during development is not without controversy, however; to be able to rebase accurately and effectively whilst avoiding messing up your own and other people’s work requires discipline and experience. You have to ensure that you don’t rebase a branch which you’ve pushed to a shared git server, and that when you do rebase you are aware of potential conflicts and the ways to resolve them - because it’s less obvious after the fact than when you do a merge. It was this article which got me thinking about the ways that rebase is in fact a brilliant tool to have up your sleeve, and I do think that on projects with a high enough level of expertise, it should be used.

Another caveat to add at this stage is that rebase is, like all tools, not always appropriate. If your branch is more than a few commits divergent, or if the rate of change is so fast that you’re trying to rebase dozens if not hundreds of commits at a time, you may well find that it’s more trouble than it’s worth; git merges exist for an excellent reason. I think that choosing the best way to incorporate change is largely a matter of doing it several times and getting an intuition for it.

Rebasing onto a shared branch

Let’s say you and another developer are both working on some feature, and you’ve got a branch called myFeature. You actually have at least 5 branches in at least 3 locations:

  • On the server, you have myFeature
  • On your computer, you have origin/myFeature and myFeature
  • On their computer, they have origin/myFeature and myFeature

To start with, all of the branches look the same. However, once you’ve each done a little work, it can easily look a bit like this:

server/myFeature   a---b
                    \   \
theirs/myFeature     \   d
                      \
mine/myFeature         c---e

Now, when I pull from and push to the server, then make another commit, this happens:

server/myFeature   a---b-------*
                    \   \     / \
theirs/myFeature     \   d   /   \
                      \     /     \
 mine/myFeature        c---e       f

And they do the same, which looks like this:

server/myFeature   a---b-------*-----*
                    \   \     / \   / \
theirs/myFeature     \   d---/---\-/   g
                      \     /     \
mine/myFeature         c---e       f

This rapidly becomes messy and has unnecessary merge commits, not to mention being hard to follow. However, what would have happened had we fetched and rebased instead of pulling is the following much neater result:

server/myFeature   a---b---c'--e'--d'
                                \   \
theirs/myFeature                 \   g
                                  \
 mine/myFeature                    f

Essentially, a competent developer using git should almost always rebase when the commits to be pushed are not yet on a server.
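Here is a sketch of the fetch-and-rebase approach, with a local bare repository standing in for the server and two working copies standing in for the two developers. The directory names, the helper functions and the identity are hypothetical, and in this sketch both developers start from commit b.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# commit_file NAME: hypothetical helper that adds one file in one commit.
commit_file() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

# set_identity: placeholder author details for the demo repositories.
set_identity() {
    git config user.name "Demo"
    git config user.email "demo@example.com"
}

# A bare repository stands in for the shared server.
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/myFeature

# My copy: create the branch with commits a and b, then publish it.
git init -q mine
cd mine
git symbolic-ref HEAD refs/heads/myFeature
set_identity
git remote add origin "$work/server.git"
commit_file a
commit_file b
git push -q origin myFeature
cd "$work"

# Their copy starts from the published branch; they commit d and push first.
git clone -q server.git theirs
cd theirs
set_identity
commit_file d
git push -q origin myFeature
cd "$work"

# Meanwhile I commit locally, so my branch and the server have diverged.
cd mine
commit_file c
commit_file e

# Fetch, replay my unpushed commits on top of the server's branch, then push.
git fetch -q origin
git rebase -q origin/myFeature
git push -q origin myFeature

# The shared branch is now a straight line with no merge commits.
git --no-pager log --oneline origin/myFeature
```

`git pull --rebase` performs the same fetch-then-rebase in one step.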

Pull requests

One good use for merges is that they allow peer-review and attribution of changes. This leads to the idea of a ‘pull request’ - some contributor sends a message saying

Please pull[1] my branch into your repository

At this point, every git tool out there will show you exactly what has changed and why, enabling you to have confidence in the features they’ve developed, and it also makes it easy to appreciate their contributions. Pull requests are a crucial tool for collaboration on projects where there is anything other than a small and tightly-knit team.

When you have a pull request based workflow, your master branch will look something like this:

master     ---*---*---*---
feature A  o-/   /   /
feature B  ---o-/   /
feature C  -o----o-/

This means that every commit on master is a merge commit, and they will probably look something like ‘Merge pull request #4 from my-super-special-feature to master’. This does mean that it’s often harder to find the specific commit which introduced a change.

Develop

At some point you’ll be working on a system where you can’t simply release several times a week, and releases need to be gathered, tested, signed off, and deployed. Some might argue that this is a pathology, but it’s also a fact of life. In this situation, you may well add in a branch for work that’s done, but not yet released. Depending on your background, you may want to call this by one of several names, including stable and trunk - in git, it’s conventionally called develop.

This looks like this:

  1. You start with master and develop

     master v0 develop
          \ | /
            o
    
  2. You do some work on the develop branch using one or more of the above principles:

    v0  master
      \ /      
       o      develop
        \        |
         o---o---o
    
  3. You’re ready to release, so you fast-forward master onto develop and tag a release

     v0   master v1 develop
     |         \ | /
     o---o---o---o
    
  4. Rinse & repeat

Git-Flow

Git-Flow is essentially: having a master branch, a develop branch, and additional feature branches, without using fast-forward or rebase. It ends up looking a bit like this:

master   --o----------------------*---
            \                    / \
develop      \----------*-------*---*
              \        / \     /  
feature1       \--o---o   \   /  
                \          \ /  
feature2         o----o--o--*

It has the advantage of being able to accommodate reasonably-sized teams of relatively-low expertise, but it also has a fair number of disadvantages - which have been discussed at length everywhere.

Beyond Git-Flow

Git-Flow starts breaking down once you hit a large number of simultaneous teams; once you hit about 50 feature branches, you spend so much time merging down from develop, and there are so many merge commits, that you lose a lot of the benefits of using git to begin with. At this point, it’s much easier to set up a branch per epic[2] and have the team working on that treat it as a master branch - once the epic is ready for release, that’s then released as normal. What this means is you have:

master   o---------------*---*
          \             /   /
epic 1     \-#Black Box#   /
            \             /
epic 2       \-#Black Box#

So, depending on those teams’ structures, they may be using anything from an extremely trivial workflow up to a full on mini-Git-Flow. It’s at this point that your branching structure starts looking a bit like a fractal.

Forks

If you’re going to treat each team’s work on your product as a separate black box waiting to be pull requested back into the develop or master branch, you may as well get them to work in separate forks - this prevents you from getting a gradual buildup of 300 stale branches where nobody’s quite sure who’s working on what.

Using forks can also unlock some useful functionality in whatever git server you’re using; Atlassian’s Stash has a ‘fork syncing’ feature which allows you to automatically apply any commit which is applied to a branch in a parent repository to all the child forks. It allows each team to set fine-grained permissions and have administrative access, isolates critical infrastructure, and makes setting up continuous integration easier (you just clone the CI environment and point it at a different URL, rather than having to reconfigure all the branches).

Per-environment branches

Depending on the way you have your continuous integration environments set up, you may want to use a branch to represent test and staging environments. However, you probably won’t want to ever merge these branches into anywhere else - tickets that are in for testing are explicitly untested, and tickets undergoing UAT are not UAT’d. One successful approach is:

  1. A ticket is moved to ‘development complete’
  2. A pull request is automatically opened to the relevant test environment
  3. A build plan detects the pull request and attempts to build and deploy the pull request
  4. If the build and deployment is successful, the pull request is automatically merged

Travis CI has a great feature where it automatically detects pull requests and builds them; Atlassian Bamboo has a feature where it can automatically merge branches if a build passes, and they are both good examples of how using even simple git features can save you a lot of work.

[1] When you remember that pull means fetch then merge, this is a very clear and specific request.

[2] Or whatever you want to call a related group of features.

Tags: Git

Bundling App::SFDC for fun and profit

12 August 2015

Motivation

Whilst installing perl and App::SFDC along with a (quite large) number of dependencies is fun, effective and powerful, it’s not always the best solution for Salesforce deployment tools. When you’re deploying from throwaway AWS instances or sending your tools to developers in foreign countries who want something that just works now, you may want to provide a ready-to-go bundle of code. Fortunately, programs such as PerlApp provide a pretty good way to achieve this.

I’m going to run through how to bundle App::SFDC to a standalone .exe suitable for deploying and retrieving metadata on a windows machine.

Introduction to PerlApp

The idea behind PerlApp is pretty straightforward: you point it at a script, and it calculates the module dependencies and bundles the perl interpreter along with all required modules into a .exe, which can then be run without a local perl installation - essentially, you run perlapp --exe SFDC.exe C:\perl64\site\bin\SFDC.pl.

Loading prerequisites

Of course, when you do this and run the resulting executable, there are some modules missing - it’s hard to detect all of the prerequisites, especially when they’re being dynamically loaded in. For example, WWW::SFDC loads in modules by running:

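# Define one lazily-built accessor per API; each module is only loaded on first use.
for my $module (qw'
    Apex Constants Metadata Partner Tooling
'){
    has $module,
      is => 'ro',
      lazy => 1,
      default => sub {
        my $self = shift;
        # The module name is only known at runtime, so PerlApp cannot detect it.
        require "WWW/SFDC/$module.pm"; ## no critic
        "WWW::SFDC::$module"->new(session => $self);
      };
}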

In a similar way, when you create a screen appender for Log::Log4perl, it quietly loads in Log::Log4perl::Appender::Screen. To fix this sort of issue, we add a few more arguments to perlapp:

perlapp  --add MooX::Options::Role^
 --add App::SFDC::Role::^
 --add Log::Log4perl::Appender::Screen^
 --add WWW::SFDC::^
 --exe SFDC.exe C:\perl64\site\bin\SFDC.pl

Fixing SSL certification

Perl isn’t great at picking up a system’s SSL settings, especially installed certificates - and when the entire purpose of a script is to send HTTPS requests, it’s something that you just have to get right - lest you get errors like 500 C:\Users\ALEXAN~1\AppData\Local\Temp\pdk-alexanderbrett/Mozilla/CA/cacert.pem on disk corrupt at /<C:\Dev\App-SFDC\SFDC.exe>WWW/SFDC.pm line 66..

One successful workaround I’ve found to this sort of error, which works whenever curl is installed, is to use curl’s Certificate Authority file instead of perl’s. You can find this by running curl -v https://login.salesforce.com >nul and looking for the lines like:

* successfully set certificate verify locations:
*   CAfile: C:\Program Files (x86)\Git\bin\curl-ca-bundle.crt

Then, you set HTTPS_CA_FILE=C:\Program Files (x86)\Git\bin\curl-ca-bundle.crt and your HTTPS connections start working again. This amount of manual faffing around is more than most developers or AWS images want to do, and fortunately PerlApp has our back again - we can bind in arbitrary files, and specify arbitrary environment variables. Let’s add more arguments to perlapp:

...
--bind certs/cafile.crt[file="C:\Program Files (x86)\Git\bin\curl-ca-bundle.crt",text,mode=666]^
--env HTTPS_CA_FILE=certs/cafile.crt^
...

Binding Retrieve plugins

Since we’re going to be wanting to use App::SFDC::Command::Retrieve, we need to make sure the plugins and manifests mentioned are, in fact, included. By default they are installed to the perl share/ location, and PerlApp won’t see them! This is how to bind in the default values:

...
--bind manifests/base.xml[file=C:\perl64\site\lib\auto\Share\dist\App-SFDC-Metadata\manifests\base.xml,text,mode=666]^
--bind manifests/all.xml[file=C:\perl64\site\lib\auto\Share\dist\App-SFDC-Metadata\manifests\all.xml,text,mode=666]^
--bind plugins/retrieve.plugins.pm[file=C:\perl64\site\lib\auto\Share\dist\App-SFDC-Metadata\plugins\retrieve.plugins.pm,text,mode=777]^
...

We should also ensure any dependencies from retrieve.plugins.pm are loaded:

...
--scan C:\perl64\site\lib\auto\Share\dist\App-SFDC-Metadata\plugins\retrieve.plugins.pm
...

This is the point at which you may want to override these values! If you have specific requirements for your manifests, for the folders you want to retrieve, or anything like that, create your own versions of those files and bundle those in instead.

Why doesn’t it work yet?

This was all great up to v0.13, but the approach stopped working from v0.14 onwards. At that point I moved from a monolithic everything-in-one package approach to a dynamically loading plugin-oriented architecture, which allows anybody to create a command by naming their package App::SFDC::Command::Foo. The code that makes that happen is:

find
   {
       wanted => sub {push @commands, $1 if m'App/SFDC/Command/(\w*)\.pm'},
       no_chdir => 1
   },
   grep {-e} map {$_.'/App/SFDC'} @INC;

…and this approach is completely broken by PerlApp - when running this, you get the error invalid top directory at /<C:\Dev\App-SFDC\SFDC.exe>File/Find.pm line 472., because PerlApp doesn’t create any recognisable directory structure for bundled modules - it provides an overloaded version of require which gets the required module from somewhere non-obvious.

After trying a few different things, it seems that the simplest way to achieve a nicely-bundled .exe is going to be to write a new script which avoids the pitfalls of detecting commands at runtime. We can, in fact, write a small perl program which writes the script for us (compare the output of this to SFDC.pl - it’s the same idea, but static):

#!perl
use strict;
use warnings;
use 5.12.0;
use App::SFDC;

# Build the static @commands definition once; it is printed into the generated script below.
my $commandArrayDefinition = 'my @commands = ("'
    . (join '","', @App::SFDC::commands) . '");';

say <<'HEAD';
package SFDC;
use strict;
use warnings;
HEAD

# Load every command module statically so that PerlApp can see the dependencies.
say "use App::SFDC::Command::$_;" for @App::SFDC::commands;

say $commandArrayDefinition;

say <<'BODY';

my $usage = join "\n\n",
    "SFDC: Tools for interacting with Salesforce.com",
    "Available commands:",
    (join "\n", map {"\t$_"} @commands),
    "For more detail, run: SFDC <command> --help";

my $command = shift;
exit 1 unless do {
    if ($command) {
        if (my ($correct_command) = grep {/^$command$/i} @commands) {
            "App::SFDC::Command::$correct_command"->new_with_options->execute();
        } else {
            print $usage;
            0;
        }
    } else {
        print $usage;
    }
}
BODY

__END__

Tying it all together

Using perl -x in a batch file, we can combine the perl script-writing script and the call to PerlApp into one easy-to-digest package, by using some syntax like:

perl -x %0 > static_SFDC.pl

perlapp ^
 ...
 --info CompanyName=Sophos;LegalCopyright="This software is Copyright (c) 2015 by Sophos Limited https://www.sophos.com/. This is free software, licensed under the MIT (X11) License"^
 --norunlib --force --exe SFDC.exe static_SFDC.pl

goto :endofperl

#!perl
use strict;

...

__END__

:endofperl

For a full version, I’ve created a gist to play with.

Tags: SFDC Perl

Logging::Trivial - or, why to hold off on that logging module you wrote

03 May 2015

A while back, I wrote WWW::SFDC as well as a few programs calling it, and I wanted the world’s most trivial logging module which still allowed for 5-level (DETAIL, DEBUG, INFO, WARN, ERROR) logging. I couldn’t find anything appropriate on cpan, so I rolled my own and called it Logging::Trivial.

Now, I was pretty happy with this, and got it all ready to be a grown-up cpan module (my first), so I went on prepan and said, guys, what do you think. I wouldn’t call it a slap-down, but I got some pretty robust advice not to publish Yet Another Logging Module, and as a result I decided that I’d sit on it - I wouldn’t refactor it out until I found a suitable replacement, but I wouldn’t publish.

Today I revisited the issue and found Log::Log4perl’s easy mode, and I’m very pleased because it does exactly what I want, with very, very little rewriting of code. I ran perl -i.bak -pe 's/Logging::Trivial/Log::Log4perl ":easy"/; s/DETAIL/TRACE/; s/ERROR/LOGDIE/;' and was essentially done.

I think the moral of the story is that when you’re not quite sure that your solution is going to stand the test of time, wait a while to see whether it does. In my case, it didn’t, but I’m better off for that, and so is cpan.

Tags: Perl