10 steps to avoid failure with async/await in C#
12 December 2017
In 2012 Microsoft released C# 5.0, which introduced the async and await keywords:
C# changed the game again by baking asynchrony into the language as a first-class participant
This is the future. Not only are other major languages following C#’s lead in implementing the Task-based Asynchronous Pattern (TAP), but more and more C# libraries are being written using Tasks to offer non-blocking APIs - most notably System.Net.Http.HttpClient. Programming with TAP can make better use of resources by allowing methods to yield threads to other tasks when blocked, and it can make callback- and promise-based programming vastly easier to write, read and understand. To get more of an idea of what teary-eyed ideologues are saying, read some best practices - and then take a step back.
The async/await keywords and Tasks are by no means trivial to get right: there are some obvious and some slightly less obvious pitfalls which you’ll want to avoid. This blog post is the crash course you need to avoid embarrassing yourself at work, and to stop developers who have spent the last 25 years bashing their heads against explicit threading deadlocks from going off on boring rants at you about how these newfangled language features will get us all in trouble…
Here is what you need to know. You can follow along with all of the examples in a debugger by cloning https://github.com/alexander-brett/CsharpAsyncDemo
1. Consider whether you need TAP, and integrate it thoughtfully if you do
Asynchronous and parallel programming can be harder than it’s worth. The Task Parallel Library (TPL) often does not play well with legacy code at all, since many of the easy mistakes fall at the boundary between Task-based and non-Task-based code, and juggling threads AND tasks can become a headache really fast.
However, if you have decided that callback- or promise-based or parallel programming is a benefit in your new (or relatively small) application, strongly prefer the TPL over explicit threads! Your code will be much simpler to understand, and much much easier to test. The ideal situation is an app that has been designed from the ground up as a series of asynchronous operations, for instance a data pipeline or server.
2. Understand the difference between single- and multi-threaded schedulers
The TPL uses a Task Scheduler to keep track of which tasks are waiting and in progress at any time, and to kick off waiting tasks when possible. Task Schedulers operate in two modes: single- and multi-threaded. In UI applications and ASP.NET servers, the main thread operates in a single-threaded synchronisation context (a Single Thread Apartment), which means that the scheduler will only run those tasks’ continuations on the main thread. When there is only one thread to share between all scheduled tasks, it is much easier to deadlock than when there are multiple threads kicking around - in fact, without taking great care, it will happen a great deal!
This demands that you be aware of which mode the code you’re writing will run in - and bear in mind that if you’re writing library code of any kind, it’s almost certain that it will eventually be consumed in a single-threaded environment.
3. Recognise the classic TPL single-threaded deadlock
The single most talked-about issue in the Task Parallel Library is how easy it is to unknowingly cause a deadlock in a single-threaded scheduler when trying to synchronously wait for async code. For an in-depth explanation you can read any of the many blog posts on the subject, but the basic idea is that the UI thread schedules a task to be run on the first free thread and then blocks on the result - consuming the only available thread!
The solution, in every case that this comes up, is to ensure that the continuation (the part of the code after the asynchronous call) is permitted to run in a different context from the async call itself, or to allow async to permeate all the way up to your application’s entrypoint.
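Here is a minimal sketch of the "async all the way up" option, assuming the async Main support added in C# 7.1 (DelayAsync is the same helper used in the tests below):

public static class Program
{
    // No .Wait() anywhere: the entrypoint itself is async, so nothing ever blocks
    public static async Task Main()
    {
        await DelayAsync();
    }

    private static async Task DelayAsync() => await Task.Delay(100);
}

When you can’t restructure things this way, you need the techniques covered in the rest of this post.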
Consider the following code:
private static async Task DelayAsync() => await Task.Delay(100);
[Test, Timeout(200)]
public static void ThisWorksFineHonest()
{
DelayAsync().Wait();
}
Because this test is running in a multi-threaded environment, everything goes fantastically, and the test passes. However, when we queue the same call on a single-threaded synchronisation context, it fails:
// thanks to https://stackoverflow.com/questions/40343572/simulate-async-deadlock-in-a-console-application
[Test, Timeout(200)]
public static void DemonstrateFailure()
{
new DedicatedThreadSynchronisationContext().Send(state =>
{
DelayAsync().Wait();
}, null);
}
In order to demonstrate this, we’re using the DedicatedThreadSynchronisationContext from this StackOverflow post, which reproduces the single-threaded behaviour of UI apps and allows us to put it under a microscope in some unit tests.
4. Torture test every async method in a single-threaded scheduler
It’s great news that we have a way to replicate this bug in a unit test environment! Nothing squashes bugs like a well-written unit test. As such, we can write a very straightforward harness for any code which returns a Task:
public class TestHarness
{
public static void TestAsyncBehaviour(Func<Task> test)
{
new DedicatedThreadSynchronisationContext().Send(state =>
{
test().Wait();
}, null);
}
}
Now, if this harness doesn’t deadlock, we’re safe. From here on, all the code examples will use this helper method. In your own codebase, running something like this regularly will spare you embarrassing moments - and if you suspect a bug is down to Task-library counter-intuitiveness, you no longer need to guess.
5. Avoid deadlocks by removing async and await keywords where possible
Just because you’re using Tasks doesn’t mean you need the async and await keywords. They are only required when you need to allow the task in question to finish before running a continuation. In the following code, the first test fails and the second passes:
public static class AvoidDeadlockByNotUsingAwaitWhenUnneccesary
{
private static async Task DelayAsync() => await Task.Delay(100);
[Test, Timeout(200)]
public static void DemonstrateDeadlock()
{
TestHarness.TestAsyncBehaviour(DelayAsync);
}
private static Task DelayTask() => Task.Delay(100);
[Test, Timeout(200)]
public static void DemonstrateNoDeadlock()
{
TestHarness.TestAsyncBehaviour(DelayTask);
}
}
In DemonstrateDeadlock, the await keyword captures the context and tries to schedule its continuation back onto it while that very thread is blocked by Wait() - exactly the deadlock outlined above. In the second example we fix the deadlock simply by removing the unnecessary keywords! The moral of the story is that return await is a code smell: don’t use it until you have enough experience with the TPL to know that it’s exactly, specifically, precisely what you want - and if it is, leave a very precise comment explaining why. It almost never is.
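To make the distinction concrete, here is a minimal sketch (ComputeAsync is a made-up stand-in for some genuinely asynchronous work): both methods hand the caller a Task&lt;int&gt; representing the same operation, but only the first builds an async state machine and captures the context purely to unwrap a result it could have passed straight through.

// Made-up stand-in for some genuinely asynchronous work
private static async Task<int> ComputeAsync()
{
    await Task.Delay(100);
    return 42;
}

private static async Task<int> GetValueSmell()
{
    // return await: an extra state machine and a captured context, for nothing
    return await ComputeAsync();
}

private static Task<int> GetValuePreferred()
{
    // Hand the caller the same task and let them decide how (and where) to await it
    return ComputeAsync();
}

One caveat worth knowing: inside a try/catch or using block the two forms are not equivalent, because without the await the method returns before the asynchronous part runs, so the catch or dispose no longer covers it - which is precisely the sort of case that deserves the explanatory comment mentioned above.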
6. Avoid deadlocks by using ConfigureAwait(false)
In the previous example we avoided capturing the context by avoiding the return await construction; in this example, we’ll do it by telling the scheduler explicitly that we don’t need it. Task.ConfigureAwait() hints to the scheduler whether or not the continuation (the bit of code after the await statement) needs to run in the same context as the preceding code. If it does (ConfigureAwait(true), the default), then the await captures the context, and when you block the only thread in that context (with Wait()) the continuation can never run. However, if you don’t need the context to be captured - and you almost never do - you can fix the deadlock with ConfigureAwait(false):
class AvoidDeadlockByUsingConfigureAwait
{
private static async Task<object> DelayAsync()
{
await Task.Delay(100);
return new object();
}
[Test, Timeout(200)]
public static void DemonstrateDeadlock()
{
TestHarness.TestAsyncBehaviour(DelayAsync);
}
private static async Task<object> DelayWithConfigureAwait()
{
await Task.Delay(100).ConfigureAwait(false);
return new object();
}
[Test, Timeout(200)]
public static void DemonstrateNoDeadlock()
{
TestHarness.TestAsyncBehaviour(DelayWithConfigureAwait);
}
}
For more details, read this - in fact the whole article is a really great read.
7. Understand that await is more-or-less equivalent to ContinueWith()
So the await keyword is essentially just a way of scheduling a continuation - we’re saying "when this task is finished, do this other thing". However, there is more than one way to do it: the following two tests do effectively the same thing ([here is a more detailed blog post about it](https://blogs.msdn.microsoft.com/pfxteam/2012/01/20/await-synchronizationcontext-and-console-apps/)):
class AwaitVersusContinueWith
{
private static Task Method1()
{
Console.WriteLine("Method1");
return Task.Delay(10);
}
private static Task Method2()
{
Console.WriteLine("Method2");
return Task.Delay(10);
}
[Test]
public static async Task DemonstrateAwait()
{
await Method1();
await Method2();
}
[Test]
public static async Task DemonstrateContinueWith()
{
await Method1().ContinueWith(task => Method2());
}
}
Judicious use of ContinueWith can make your code easier to understand, especially if you write code that accepts Tasks and manipulates them. Rather than awaiting your argument and then doing some things, you can return an explicit continuation. Returning a continuation happens instantaneously, whereas awaiting happens asynchronously - and you will eventually form an intuition about when you want the asynchronous call to happen and where you want errors to propagate (more on that later).
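As a sketch of what that looks like (the timestamping is a made-up example), here are the two styles side by side: the first returns a new Task describing the continuation immediately, while the second is itself asynchronous.

// Wrap a caller-supplied task by returning an explicit continuation...
public static Task<string> Timestamp_Continuation(Task<string> input)
    => input.ContinueWith(t => $"{DateTime.UtcNow:O}: {t.Result}");

// ...or by awaiting it inside an async method
public static async Task<string> Timestamp_Await(Task<string> input)
    => $"{DateTime.UtcNow:O}: {await input}";

The error behaviour differs too: if input faults, t.Result inside the continuation surfaces an AggregateException, whereas the await version rethrows the original exception - one of the intuitions about error propagation mentioned above.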
8. Avoid deadlocks by using ContinueWith()
Having understood ContinueWith, it becomes clear that the following example is effectively the same as #5 - remove the await point and the deadlock disappears:
class RemoveDeadlockByUsingContinueWith
{
private static async Task<object> DelayAsync()
{
await Task.Delay(100);
return new object();
}
[Test, Timeout(200)]
public static void DemonstrateDeadlock()
{
TestHarness.TestAsyncBehaviour(DelayAsync);
}
private static Task<object> DelayContinuation()
=> Task.Delay(100).ContinueWith(task => new object());
[Test, Timeout(200)]
public static void DemonstrateNoDeadlock()
{
TestHarness.TestAsyncBehaviour(DelayContinuation);
}
}
So far I have not managed to make a deadlock using ContinueWith - if you have an example, please open a pull request!
9. Realise that Tasks are more synchronous than you think
This is known as the fast path. Essentially, a lot of the time when you call a Task method, some or all of the continuations will execute synchronously, depending on exactly what’s going on. await Task.CompletedTask will execute synchronously, and if the only await keyword is halfway down a method, the first lines will execute synchronously before the task yields. If you have a TaskCompletionSource and call SetResult on it, and there is a continuation associated with its task, then unless you created it with TaskCreationOptions.RunContinuationsAsynchronously the continuation will probably run synchronously.
If exactly which lines of code execute synchronously or otherwise matters in the application you’re writing, it’s worth spending a few hours getting a feel for which situations are and are not synchronously executed. In a debugger, you can step through a synchronous chain of operations until it ends, which can be very instructive! Until then, just be careful not to assume that anything you awaited is automatically happening in parallel.
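To see this in action, here is a small sketch (a console-style example, not part of the test project above): awaiting an already-completed task never yields, and completing a TaskCompletionSource created without RunContinuationsAsynchronously typically runs the awaiting continuation synchronously on the thread that calls SetResult.

public static class FastPathDemo
{
    public static async Task Run()
    {
        var before = Environment.CurrentManagedThreadId;
        await Task.CompletedTask;                                   // completes synchronously
        Console.WriteLine(before == Environment.CurrentManagedThreadId); // True: we never left the thread

        var tcs = new TaskCompletionSource<int>();
        var pending = AwaitAndReport(tcs.Task);
        tcs.SetResult(42);   // the continuation inside AwaitAndReport may run
                             // right here, before SetResult even returns
        await pending;
    }

    private static async Task AwaitAndReport(Task<int> task)
    {
        var result = await task;
        Console.WriteLine($"Continuation ran on thread {Environment.CurrentManagedThreadId}, result {result}");
    }
}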
10. Handle exceptions well
With Tasks, an exception thrown inside the task will not surface until the task is awaited or waited on - unless the method returning the task throws before it ever creates the task, in which case the exception is thrown synchronously. If the task is never awaited, any exception it throws is called unobserved. Having an unobserved exception in your code is in general a really bad idea, because it means that bad behaviour is almost impossible to spot. Just like it’s a good idea to thoroughly test around error paths in synchronous code, you should also write tests for error behaviour in your async code.
Also, be aware of how the exception surfaces: if you block on a faulted task with Wait() or .Result, you will see an AggregateException whose InnerException property contains the original exception, whereas await rethrows the original exception directly. Often it’s desirable to unpack the wrapper and handle the inner exception.
All of the following tests pass!
class DoNotSwallowExceptions
{
public static async Task ThrowException()
{
await Task.Delay(10).ConfigureAwait(false);
throw new Exception();
}
[Test]
public static void DoesNotThrowWhenAsynchronous()
{
Assert.DoesNotThrow(() => ThrowException());
}
[Test]
public static void HaveToWaitToGetAnException()
{
Assert.Throws<AggregateException>(() => ThrowException().Wait());
}
public static async Task ThrowExceptionBeforeAwaiting()
{
if ("1".Equals(1.ToString())) throw new Exception();
await Task.Delay(10).ConfigureAwait(false);
}
[Test]
public static void DoesNotThrowWhenHappensToBeAsynchronous()
{
Assert.DoesNotThrow(() => ThrowExceptionBeforeAwaiting());
}
public static Task ThrowExceptionSynchronouslyReturnTask()
{
if ("1".Equals(1.ToString())) throw new Exception();
return Task.Delay(10);
}
[Test]
public static void ThrowsExceptionWhenHappensToBeSynchronous()
{
Assert.Throws<Exception>(() => ThrowExceptionSynchronouslyReturnTask());
}
}
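When you do have to block, unpacking the wrapper looks something like this sketch (HandleFailure is a hypothetical helper): Wait() and .Result wrap failures in an AggregateException, so inspect InnerException (or Flatten() it first); an await in the same place would have rethrown the original exception.

public static void HandleFailure(Task work)
{
    try
    {
        work.Wait();
    }
    catch (AggregateException ex)
    {
        // Flatten() collapses nested AggregateExceptions; InnerException holds the original
        var original = ex.Flatten().InnerException;
        Console.WriteLine($"Operation failed: {original.Message}");
    }
}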
Conclusion
Having been working intensively with async programming for the last little while, I’ve seen that almost every bug in async code I’ve written or read has come from misunderstanding one of the points in this article. I suggest that you get out there, write some code, and when it fails, come and reread this before tearing your hair out.
None of this work is genuinely new - I have aggregated insights from across various different blog posts, and as such I’m indebted to the previous work of people who have already explained and demonstrated these things.
If you think the explanation here needs tightening up, please submit a pull request, and again please do check out the csharp solution and play with it yourself.
Happy coding!
Tags: CSharp
Presenting to #Geomob
11 September 2016
Following on from my post about visualising London, I was invited to speak at #Geomob at the BCS! I had a lot of fun.
If you’re interested in the slideshow, you can see the slides here, and some source code is here. Press p to see my presentation notes - you drive the show by pressing the left mouse button.
The slideshow was driven by remark.js in order to be able to get all of the animations to work nicely. I did in fact have to roll my own fork of remark to pull it off - basically I ripped out the click event handling and replaced it with something intensely hacky, so it’s not code I’ll share for now!
During and after my talk, this happened!
Tonight's final #geomob speaker: Alexander Brett telling us about his London visualisation https://t.co/Tt5FuQpRd2
— geomob (@geomob) September 7, 2016
Great set of presentations @geomob, coding cartography from Alexander Brett won #locationdata
— henry_mcneill (@henry_mcneill) September 7, 2016
Thank you everyone!
Visualising London
01 June 2016
I recently started working on a couple of interesting data-visualisation projects at work, which led me to research visualisation libraries, watch this video, and think that it would be cool to do something like that for London.
Happily, data.london.gov.uk exists and has a bizarrely good selection of datasets, so I threw together the following visualisation tool to provide a little window into the state of London, ward-by-ward.
What you’re looking at
London is broken down into 32 boroughs, 630 electoral wards, or 3000ish postcodes. I chose to visualise wards because although there is a huge amount of per-borough data kicking around, having only 32 bubbles doesn’t provide quite the effect I was looking for, and conversely, although having each postcode would be very beautiful and granular, it would be very slow to render for all but the most grunty CPUs. Per-ward data seems to strike a good balance of granularity, data availability, and performance.
Each ward tries to stay close to its geographical location, but can get pushed away as the wards around it grow or shrink - the idea is to give a broad sense of geography, so I think it’s ok that they can get slightly mixed up.
Limitations and choices
Working with the data at hand meant being aware of certain limitations of the data set and my ability to present it. For instance, the first version of this used a data set from 2012, which I subsequently was unable to reconcile with the 2016 Mayoral Election results because of the adjustment of certain wards in 2014. This meant becoming aware of exactly which year and which ward boundaries each dataset I looked at corresponded to. It also means that in this visualisation, the data for pre-2014 has been massaged into a post-2014 shape (this was handled at the source by GLA, not by me, thankfully!).
The locations were generated by taking the entire ONS London Postcode database, grouping by ward, and averaging the postcodes. This is OK for a relatively coarse application like this, but for more precise work a more carefully considered dataset would probably make sense.
I’m also aware that every act of communication exposes the author’s biases and conceptions, whether intentionally or not: for instance, when exposing a set of data as sizes, I have to decide both the minimum and maximum sizes, as well as the scaling function. Data can look much more dramatic with one set of choices than with another:
Similarly, choosing which colours to use to present data about ethnic minority population or socially rented housing could appear to make value judgements about the data presented - for instance, does red signify high or low or good or bad?
Technology
To draw this data, I used the d3.js force layout with the wards as nodes. Each ward has three competing forces acting on it: the first is a spring-like force pulling it towards its geographical location, the second is a very strong short-range force preventing it from overlapping its neighbours, and the third is a charge force which distributes the wards slightly more evenly across space. Choosing this set of forces gives a sense of which part of London is which, whilst allowing different areas to grow and shrink in an organic way. It does mean that wards can get a little mixed up, so the idea is only to give a broad sense of geography.
The data is served as a single SQLite file which is interrogated using SQL.js. This allows a lot of flexibility in loading in data - the ability to do joins and aggregations on-the-fly saves a lot of code and memory, and it means that whilst developing I was able to load in tables as and when I wanted to try a new data shape.
Tools for effective branching structures in git
28 August 2015
Creating a good git branching structure is a difficult process. There are many considerations to be juggled, including:
- Is this easily understandable to developers and PMs, including those who may not have prior experience with git?
- Is it easy to trace a single change to the branch, developer and ticket for which it was made?
- Is it possible to roll back changes which introduce issues?
- Will this scale out to several large teams of developers, and does it need to?
In addition, when working with specific systems, for instance Salesforce.com or CPAN, the sandboxing and release processes suitable for those systems introduce additional requirements around the branching structure.
In fact, the principal trade-off to be made is that a branch model which produces a very clear and traceable result will, in general, require a higher level of fluency with git for all participants.
This article is an exploration of different techniques that can be used to build the branching structure your organisation needs; it’s important to note that there is no one true branching structure, and that anybody who says that there is is wrong!
NB: If you feel that something in this article needs improvement, please feel free to open a pull request
The trivial structure
The simplest possible branching model has one developer working on one feature at a time. When the feature is complete, you tag a release, and continue working on the same branch. This looks like this:
    v0          v1
    |           |
o---o---o---o---o---o
This works well for personal projects, but obviously falls down as soon as you need to switch the priorities on features, fix a bug in an existing feature before resuming work on the in-progress one, or collaborate with anybody else. Nonetheless, it’s important to realise that git branching structures don’t, in fact, have to include multiple branches.
Branching
In order to work on multiple features at once, or get a bugfix done quickly, you can start to use multiple branches, merging changes as appropriate. This works as follows:
- You’re working on a feature:

v0          myFeature
|           |
o---o---o---o

- You need to fix a bug, so you create a new branch starting at the last release:

v0          myFeature
|           |     myBugfix
o---o---o---o     |
 \----------------o

- When you’ve finished the bug, you release that branch:

v0          myFeature
|           |     v0.1
o---o---o---o     |
 \----------------o

- You merge the new release into your own branch:

v0                  myFeature
|             v0.1  |
o---o---o---o---|---*
 \--------------o--/
                    ^merge commit

- When you finish your feature, you release your branch:

v0                      v1
|             v0.1      |
o---o---o---o---|---*---o
 \--------------o--/
Master
Using tags for releases works really well from a release-management and version-history point of view, but it can get a bit fiddly as a developer - you have to constantly check which tag is the most recent, and ensuring that you’re branching and merging the right commits can get a little tedious. If you’re the only one working on the project, it’s probably not going to get too complicated, because you may well have only a couple of branches at once, and you’ll create each release and therefore be in a better position to remember what’s going on. However, once you have more than one person able to make releases, or you get several branches, you’ll want to handle this potential complexity.
At this point, having a master branch is really useful. Whenever you release, you ensure that master points at that commit. In that way, each time you switch branch, making sure it’s up-to-date with the latest release is simply a matter of merging in master. When you’re dealing with a master branch, your branching diagrams look a little different:
- You’re working on a feature:

    v0 master
     \ /
  o---o        myFeature
       \       |
        \--o---o

- You need to fix a bug, so you create a new branch starting at the last release:

    v0 master
     \ /
  o---o        myFeature
      |\       |
      | \--o---o
       \               myBugfix
        \              |
         \-------------o

- When you’ve finished the bug, you release that branch by merging into master:

     v0   master v0.1
      |       \ /
o-----o--------*
      |\      /
      | \----o         myBugfix
       \
        \--o---o       myFeature

- You merge the new release into your own branch:

     v0   master v0.1
      |       \ /
o-----o--------*
      |\      / \
      | \----o   \       myBugfix
       \          \
        \--o---o---*     myFeature

- When you finish your feature, you release your branch:

     v0      v0.1   v1 master
      |        |      \ /
o-----o--------*-------*
      |\      / \     /
      | \----o   \   /       myBugfix
       \          \ /
        \--o---o---*        myFeature
Fast-Forward
The downside of introducing the master branch like this is that we’ve introduced two extra merge commits compared to the previous version - and in fact, half of the commits since v0 are merge commits! This does serious damage to our ability to see quickly and easily what changes have been introduced and when. Fortunately, we don’t always need to do a merge - git has the ability to fast-forward, which means that, when there is nothing to merge, the branch is simply moved to point to a different commit, without any new commit being added.
To be more specific, a fast-forward occurs when one of the commits to be merged is the ancestor of the other, which you can see happening at v0.1 and v1 above.
If we allow fast-forward commits, we end up with much more attractive diagrams for steps 3 onwards:
- When you’ve finished the bug, you release that branch (nb we fast-forwarded master!):

v0  master v0.1
|       \ /
o--------o
 \
  \--o---o      myFeature

- You merge the new release into your own branch:

v0  master v0.1
|       \ /
o--------o
 \        \
  \--o---o-*      myFeature

- When you finish your feature, you release your branch (nb another fast-forward onto master!):

v0      v0.1    v1 master
|         |        \ /
o---------o---------*
 \                 /
  \--o---o--------/
It’s important to realise that this set of diagrams is identical to the original, with a new branch added and some lines in different places - master is simply ‘a branch which will always point to the last release’. In fact, if you are using master in this way, you could choose only ever to fast-forward commits onto it.
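If you do adopt that policy, one way to enforce it (a sketch - the branch names are the ones from the diagrams above) is to refuse anything that isn’t a fast-forward:

git checkout master
git merge --ff-only myFeature   # succeeds only if master is an ancestor of myFeature
git tag v1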
Rebase
I think that merge commits are noise. When you have a branch-based workflow, you’re working on a few features simultaneously and you release regularly, you may end up with a third or more of your commits being merge commits, which can turn git log into an effectively unreadable mess. Fortunately, in rebase we have a tool that lets us re-arrange our commit history in an extremely readable and pleasant manner. It works exactly the same as above up to step 4, at which point instead of merging we rebase, which takes all of the commits we made on our branch and applies them on top of the target - it’s as though we had just checked out the latest release and instantaneously developed on top of it. This leaves the history looking like this:
     master
v0      |  v0.1
|        \ /
o---o-----o    myFeature
           \       |
            \--o---o
which in turn means that when we release myFeature, we get this:
v0     v0.1  v1 master
|        |      \ /
o---o----o---o---o
…which is extremely easy-to-follow.
This is the workflow that I use on my perl modules. The habitual use of rebase during development is not without controversy, however; to be able to rebase accurately and effectively whilst avoiding messing up your own and other people’s work requires discipline and experience. You have to ensure that you don’t rebase a branch which you’ve pushed to a shared git server, and that when you do rebase you are aware of potential conflicts and the ways to resolve them - because they’re less obvious after the fact than when you do a merge. It was this article which got me thinking about the ways that rebase is in fact a brilliant tool to have up your sleeve, and I do think that on projects with a high enough level of expertise, it should be used.
Another caveat to add at this stage is that rebase is, like all tools, not always appropriate. If your branch is more than a few commits divergent, or if the rate of change is so fast that you’re trying to rebase dozens if not hundreds of commits at a time, you may well find that it’s more trouble than it’s worth; git merges exist for an excellent reason. I think that choosing the best way to incorporate change is largely a matter of doing it several times and getting an intuition for it.
Rebasing onto a shared branch
Let’s say you and another developer are both working on some feature, and you’ve got a branch called myFeature. You actually have at least 5 branches in at least 3 locations:
- On the server, you have myFeature
- On your computer, you have origin/myFeature and myFeature
- On their computer, they have origin/myFeature and myFeature
To start with, all of the branches look the same. However, once you’ve each done a little work, it can easily look a bit like this:
server/myFeature    a---b
                     \   \
theirs/myFeature      \   d
                       \
mine/myFeature          c---e
Now, when I pull from and push to the server, then make another commit, this happens:
server/myFeature    a---b-------*
                     \   \     / \
theirs/myFeature      \   d   /   \
                       \     /     \
mine/myFeature          c---e       f
And they do the same, which looks like this:
server/myFeature    a---b-------*-----*
                     \   \     / \   / \
theirs/myFeature      \   d---/---\-/   g
                       \     /     \
mine/myFeature          c---e       f
This rapidly becomes messy and has unnecessary merge commits, not to mention being hard to follow. However, what would have happened had we fetched and rebased instead of pulling is the following much neater result:
server/myFeature    a---b---c'--e'--d'
                     \                \
theirs/myFeature      \                g
                       \
mine/myFeature          f
Essentially, a competent developer using git should almost always rebase when the commits to be pushed are not yet on a server.
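In practice that habit boils down to something like the following sketch, using the myFeature branch from the example above:

git fetch origin
git rebase origin/myFeature     # replay your local, unpushed commits on top of what arrived
# or, equivalently, in one step:
git pull --rebase origin myFeature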
Pull requests
One good use for merges is that they allow peer-review and attribution of changes. This leads to the idea of a ‘pull request’ - some contributor sends a message saying
Please pull my branch into your repository
At this point, every git tool out there will show you exactly what has changed and why, enabling you to have confidence in the features they’ve developed, and it also makes it easy to appreciate their contributions. Pull requests are a crucial tool for collaboration on projects where there is anything other than a small and tightly-knit team.
When you have a pull request based workflow, your master branch will look something like this:
master    ---*---*---*---
feature A  o-/   /   /
feature B ---o-/   /
feature C  -o----o-/
This means that every commit on master is a merge commit, and they will probably look something like ‘Merge pull request #4 from my-super-special-feature to master’. This does mean that it’s often harder to find the specific commit which introduced a change.
Develop
At some point you’ll be working on a system where you can’t simply release several times a week, and releases need to be gathered, tested, signed off, and deployed. Some might argue that this is a pathology, but it’s also a fact of life. In this situation, you may well add in a branch for work that’s done but not yet released. Depending on your background, you may want to call this by several names, including stable and trunk - in the git world, it’s usually called develop.
This looks like this:
- You start with master and develop:

master v0 develop
      \ | /
        o

- You do some work on the develop branch using one or more of the above principles:

v0  master
 \ /
  o         develop
   \        |
    o---o---o

- You’re ready to release, so you fast-forward master onto develop and tag a release:

v0  master v1 develop
|         \ | /
o---o---o---o

- Rinse & repeat
Git-Flow
Git-Flow is essentially: having a master branch, a develop branch, and additional feature branches, without using fast-forward or rebase. It ends up looking a bit like this:
master   --o----------------------*---
            \                    / \
develop      \----------*-------*---*
              \        / \     /
feature1       \--o---o   \   /
                \          \ /
feature2         o----o--o--*
It has the advantage of being able to accommodate reasonably-sized teams of relatively-low expertise, but it also has a fair number of disadvantages - which have been discussed at length everywhere.
Beyond Git-Flow
Git-Flow starts breaking down once you hit a large number of simultaneous teams; once you hit about 50 feature branches, you spend so much time merging down from develop, and there are so many merge commits, that you lose a lot of the benefits of using git to begin with. At this point, it’s much easier to set up a branch per epic and have the team working on that treat it as a master branch - once the epic is ready for release, that’s then released as normal. What this means is you have:
master  o---------------*---*
         \             /   /
epic 1    \-#Black Box#   /
           \             /
epic 2      \-#Black Box#
So, depending on those teams’ structures, they may be using anything from an extremely trivial workflow up to a full on mini-Git-Flow. It’s at this point that your branching structure starts looking a bit like a fractal.
Forks
If you’re going to treat each team’s work on your product to be a separate black box waiting to be pull requested back into the develop or master branch, you may as well get them to work in separate forks - this prevents you from getting a gradual buildup of 300 stale branches where nobody’s quite sure who’s working on what.
Using forks can also unlock some useful functionality in whatever git server you’re using; Atlassian’s Stash has a ‘fork syncing’ feature which allows you to automatically apply any commit which is applied to a branch in a parent repository to all the child forks. It allows each team to set fine-grained permissions and have administrative access, isolates critical infrastructure, and makes setting up continuous integration easier (you just clone the CI environment and point it at a different URL, rather than having to reconfigure all the branches).
Per-environment branches
Depending on the way you have your continuous integration environments set up, you may want to use a branch to represent test and staging environments. However, you probably won’t ever want to merge these branches anywhere else - tickets that are in for testing are explicitly untested, and tickets undergoing UAT are not yet UAT’d. One successful approach is:
- A ticket is moved to ‘development complete’
- A pull request is automatically opened to the relevant test environment
- A build plan detects the pull request and attempts to build and deploy the pull request
- If the build and deployment is successful, the pull request is automatically merged
Travis CI has a great feature where it automatically detects pull requests and builds them; Atlassian Bamboo has a feature where it can automatically merge branches if a build passes, and they are both good examples of how using even simple git features can save you a lot of work.
Tags: Git
Bundling App::SFDC for fun and profit
12 August 2015
Motivation
Whilst installing perl and App::SFDC along with a (quite large) number of dependencies is fun, effective and powerful, it’s not always the best solution for Salesforce deployment tools. When you’re deploying from throwaway AWS instances or sending your tools to developers in foreign countries who want something that just works now, you may want to provide a ready-to-go bundle of code. Fortunately, programs such as PerlApp provide a pretty good way to achieve this.
I’m going to run through how to bundle App::SFDC to a standalone .exe suitable for deploying and retrieving metadata on a windows machine.
Introduction to PerlApp
The idea behind PerlApp is pretty straightforward: you point it at a script, and it calculates the module dependencies and bundles the perl interpreter along with all required modules into a .exe, which can then be run without a local perl installation - essentially, you run perlapp --exe SFDC.exe C:\perl64\site\bin\SFDC.pl.
Loading prerequisites
Of course, when you do this and run the resulting executable, there are some modules missing - it’s hard to detect all of the prerequisites, especially when they’re being dynamically loaded in. One example is that WWW::SFDC loads in modules by running:
for my $module (qw'
Apex Constants Metadata Partner Tooling
'){
has $module,
is => 'ro',
lazy => 1,
default => sub {
my $self = shift;
require "WWW/SFDC/$module.pm"; ## no critic
"WWW::SFDC::$module"->new(session => $self);
};
}
In a similar way, when you create a screen appender for Log::Log4perl, it quietly loads in Log::Log4perl::Appender::Screen. To fix this sort of issue, we add a few more arguments to perlapp:
perlapp --add MooX::Options::Role^
--add App::SFDC::Role::^
--add Log::Log4perl::Appender::Screen^
--add WWW::SFDC::^
--exe SFDC.exe C:\perl64\site\bin\SFDC.pl
Fixing SSL certification
Perl isn’t great at picking up a system’s SSL settings, especially installed certificates - and when the entire purpose of a script is to send HTTPS requests, it’s something that you just have to get right, lest you get errors like 500 C:\Users\ALEXAN~1\AppData\Local\Temp\pdk-alexanderbrett/Mozilla/CA/cacert.pem on disk corrupt at /<C:\Dev\App-SFDC\SFDC.exe>WWW/SFDC.pm line 66.
One successful workaround I’ve found for this sort of error, which works whenever curl is installed, is to use curl’s Certificate Authority file instead of perl’s. You can find this by running curl -v https://login.salesforce.com >nul and looking for lines like:
* successfully set certificate verify locations:
* CAfile: C:\Program Files (x86)\Git\bin\curl-ca-bundle.crt
Then, you set HTTPS_CA_FILE=C:\Program Files (x86)\Git\bin\curl-ca-bundle.crt and your HTTPS connections start working again. This amount of manual faffing around is more than most developers or AWS images want to do, and fortunately PerlApp has our back again - we can bind in arbitrary files and specify arbitrary environment variables. Let’s add more arguments to perlapp:
...
--bind certs/cafile.crt[file="C:\Program Files (x86)\Git\bin\curl-ca-bundle.crt",text,mode=666]^
--env HTTPS_CA_FILE=certs/cafile.crt^
...
Binding Retrieve plugins
Since we’re going to want to use App::SFDC::Command::Retrieve, we need to make sure the plugins and manifests it mentions are, in fact, included. By default they are installed to the perl share/ location, and PerlApp won’t see them! This is how to bind in the default values:
...
--bind manifests/base.xml[file=C:\perl64\site\lib\auto\Share\dist\App-SFDC-Metadata\manifests\base.xml,text,mode=666]^
--bind manifests/all.xml[file=C:\perl64\site\lib\auto\Share\dist\App-SFDC-Metadata\manifests\all.xml,text,mode=666]^
--bind plugins/retrieve.plugins.pm[file=C:\perl64\site\lib\auto\Share\dist\App-SFDC-Metadata\plugins\retrieve.plugins.pm,text,mode=777]^
...
We should also ensure any dependencies from retrieve.plugins.pm are loaded:
...
--scan C:\perl64\site\lib\auto\Share\dist\App-SFDC-Metadata\plugins\retrieve.plugins.pm
...
This is the point at which you may want to override these values! If you have specific requirements for your manifests, for the folders you want to retrieve, or anything like that, create your own versions of those files and bundle those in instead.
Why doesn’t it work yet?
This was all great up to v0.13, but this approach stopped working from v0.14 onwards. At that point I moved from a monolithic everything-in-one-package approach to a dynamically loading, plugin-oriented architecture, which allows anybody to create a command by naming their package App::SFDC::Command::Foo. The code that makes that happen is:
find
{
wanted => sub {push @commands, $1 if m'App/SFDC/Command/(\w*)\.pm'},
no_chdir => 1
},
grep {-e} map {$_.'/App/SFDC'} @INC;
…and this approach is completely broken by PerlApp - when running this, you get the error invalid top directory at /<C:\Dev\App-SFDC\SFDC.exe>File/Find.pm line 472, because PerlApp doesn’t create any recognisable directory structure for bundled modules - it provides an overloaded version of require which fetches the required module from somewhere non-obvious.
After trying a few different things, it seems that the simplest way to achieve a nicely-bundled .exe is going to be to write a new script which avoids the pitfalls of detecting commands at runtime. We can, in fact, write a small perl program which writes the script for us (compare the output of this to SFDC.pl - it’s the same idea, but static):
#!perl
use strict;
use warnings;
use 5.12.0;
use App::SFDC;
my $commandArrayDefinition = 'my @commands = ("'
. (join '","', @App::SFDC::commands) . '");';
say <<'HEAD';
package SFDC;
use strict;
use warnings;
HEAD
say "use App::SFDC::Command::$_;" for @App::SFDC::commands;
say 'my @commands = ("'
. (join '","', @App::SFDC::commands)
. '");';
say <<'BODY';
my $usage = join "\n\n",
"SFDC: Tools for interacting with Salesforce.com",
"Available commands:",
(join "\n", map {"\t$_"} @commands),
"For more detail, run: SFDC <command> --help";
my $command = shift;
exit 1 unless do {
if ($command) {
if (my ($correct_command) = grep {/^$command$/i} @commands) {
"App::SFDC::Command::$correct_command"->new_with_options->execute();
} else {
print $usage;
0;
}
} else {
print $usage;
}
}
BODY
__END__
Tying it all together
Using perl -x in a batch file, we can combine the perl script-writing script and the call to PerlApp into one easy-to-digest package, by using some syntax like:
perl -x %0 > static_SFDC.pl
perlapp ^
...
--info CompanyName=Sophos;LegalCopyright="This software is Copyright (c) 2015 by Sophos Limited https://www.sophos.com/. This is free software, licensed under the MIT (X11) License"^
--norunlib --force --exe SFDC.exe static_SFDC.pl
goto :endofperl
#!perl
use strict;
...
__END__
:endofperl
For a full version, I’ve created a gist to play with.
Logging::Trivial - or, why to hold off on that logging module you wrote
03 May 2015
A while back, I wrote WWW::SFDC as well as a few programs calling it, and I wanted the world’s most trivial logging module which still allowed for 5-level (DETAIL, DEBUG, INFO, WARN, ERROR) logging. I couldn’t find anything appropriate on CPAN, so I rolled my own and called it Logging::Trivial.
Now, I was pretty happy with this, and got it all ready to be a grown-up CPAN module (my first), so I went on PrePAN and said: guys, what do you think? I wouldn’t call it a slap-down, but I got some pretty robust advice not to publish Yet Another Logging Module, and as a result I decided that I’d sit on it - I wouldn’t refactor it out until I found a suitable replacement, but I wouldn’t publish.
Today I revisited the issue and found Log::Log4perl’s easy mode, and I’m very pleased, because it does exactly what I want with very, very little rewriting of code. I ran perl -i.bak -pe 's/Logging::Trivial/Log4Perl ":easy"/; s/DETAIL/TRACE/; s/ERROR/LOGDIE/;' and was essentially done.
and was essentially done.
I think the moral of the story is that when you’re not quite sure that your solution is going to stand the test of time, wait a while to see whether it does. In my case, it didn’t, but I’m better off for that, and so is cpan.
Tags: Perl