Alexander Brett

Handling repository rewrites with git

16 January 2015

Let’s say you’ve decided you need to make a change which touches almost every line in your git repository, for instance because you’ve realised your line endings are all messed up and want to make them uniform. If you’re the only developer, or you can close every branch so that the change lands on one branch only, you’re ok. However, if you have dozens of developers working on dozens of branches, you’ll hit a problem: once you’ve applied your change, the next time you attempt to merge anything, every single line will come up as a merge conflict.

Let’s say you have branches A and B, each with some changes to a file called foo.txt, which has CRLF line endings, and you introduce a commit on each branch which changes them to LF. Git sees this as a change to every line on each branch, which means that neither branch has anything in common with the merge base, so git simply has nothing to go on when merging.
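To make the failure concrete, here’s a throwaway repro you can run in a scratch directory (file names and contents are made up):

```shell
# A CRLF file, normalised to LF on two branches, each with its own small edit
set -e
cd "$(mktemp -d)"
git init -q .
git config user.email you@example.com
git config user.name you
printf 'one\r\ntwo\r\n' > foo.txt
git add foo.txt && git commit -qm 'base with CRLF'
base=$(git rev-parse HEAD)

git checkout -qb A
printf 'one\ntwo edited on A\n' > foo.txt    # LF everywhere, plus an edit
git commit -qam 'A: normalise line endings'

git checkout -qb B "$base"
printf 'one edited on B\ntwo\n' > foo.txt    # LF everywhere, different edit
git commit -qam 'B: normalise line endings'

# Every line differs from the merge base on both sides, so...
if git merge A >/dev/null 2>&1; then merged=yes; else merged=no; fi
echo "merged cleanly: $merged"                # prints "merged cleanly: no"
```

Both branches changed both lines relative to the base, so git reports a conflict on every line, even though the only real edits are one line per branch.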

The good news is that with a little bit of git trickery you can avoid this situation altogether. In my organisation we have our branches organised as follows: master is our currently-released branch; develop contains every completed feature and is branched from master; qa is a testing branch into which all features are merged; and each feature has a branch created from develop. I hope this diagram makes the situation clear!


master   --*--------------------------------------*---
            \                                    / \
develop      *-*-*-*-------------*--------------*---*-
                \ \ \           / \            /
QA               \ \ *-----*---/---\-------*--/-------
                  \ \     /   /     \     /  /
feature1           \ *-*-*---*       \   /  /
                    \                 \ /  /
feature2             *----*---*--*-----*--*

This means that unless we have a hotfix underway, every branch being worked on is branched from develop, and in general we keep develop merged into each branch as much as possible. What makes it possible to roll out a huge number of changes without breaking everything is to create a point on develop which we ensure is merged into every branch, then apply our mass change on every branch without any other commits. This means that every branch gets a commit called something like ‘apply mass change’. Lastly, we pretend that nothing happened.

Let’s go into a bit more detail. In this example, I was trying to compress some profiles and apply whitespace changes at the same time. I’m going to tell it as a story because it works better that way.

##Preparation

I created a branch called addGitattributes. This contained only one change - the addition of the .gitattributes file detailed in my last post. Other than that, it was created from master, so I was guaranteed that it would merge into any branch just fine.

Then, I created a batch script called, for instance, doMassChange.bat. It looked like this, although yours will vary depending on what you’re trying to achieve.

git reset --hard && ^
git clean -f && ^
git merge origin/addGitattributes && ^
rm .git/index && ^
git reset && ^
git add -u && ^
git commit -m "Whitespace normalisation commit" && ^
compressProfiles.bat && ^
git add -u && ^
git commit -m "Profile compression commit"

As you can see, I’ve chained each command with && which ensures that if one thing breaks, we stop, and the developer has a chance to call me over so I can work out what went wrong! Lastly, I created a file called applyMassChangeCleanly.bat (these names are fictionalised for clarity) which looked like this:

git reset --hard && ^
git clean -f && ^
git merge MASS_CHANGE_DEVELOP_BEFORE && ^
doMassChange.bat && ^
git merge -s ours MASS_CHANGE_DEVELOP_AFTER

The crucial bit is the -s ours strategy on the last line. The ours strategy records a merge commit, marking the branches as merged, without actually taking any changes from the other side. This has all sorts of potential to break things, but because we know we’re starting from a known state (the tag MASS_CHANGE_DEVELOP_BEFORE) and applying identical changes (doMassChange.bat), it is in fact perfect for this situation.
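A tiny, throwaway illustration of what -s ours actually does (branch and file names made up for the demo):

```shell
# -s ours records the merge but keeps our tree byte-for-byte unchanged
set -e
cd "$(mktemp -d)"
git init -q .
git config user.email you@example.com
git config user.name you
echo base > f.txt && git add f.txt && git commit -qm base
base=$(git rev-parse HEAD)

git checkout -qb other
echo theirs > f.txt && git commit -qam 'their change'

git checkout -qb ours "$base"
echo ours > f.txt && git commit -qam 'our change'

git merge -q -s ours -m 'mark other as merged' other
cat f.txt                                     # prints "ours": no content was taken
git merge-base --is-ancestor other HEAD && echo 'other is recorded as merged'
```

The merge commit has both branches as parents, so git considers other fully merged, yet the working tree is untouched - exactly what we need when both sides have already had the identical mass change applied.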

I ensured that these two files were propagated onto every branch (you could alternatively distribute them to every developer some other way), and lastly I sent an email to my developers detailing what was going to happen on rollout day.

##Rollout

On rollout day, I got to the office early and made myself a strong coffee, then did the following:

  • Merged master into develop (just to make sure)
  • Tagged develop as MASS_CHANGE_DEVELOP_BEFORE
  • Ran doMassChange.bat on develop
  • Ran doMassChange.bat on master
  • Ran git merge -s ours master on develop
  • Tagged develop as MASS_CHANGE_DEVELOP_AFTER
  • Pushed everything (including tags) to the server
  • Ran applyMassChangeCleanly.bat on QA
  • Pushed QA to the server

Then I had another strong coffee.

As the developers got to the office, they did their daily pull of develop and got huge merge conflicts. Then they remembered I’d sent them an email and read it, following which they ran applyMassChangeCleanly.bat on their branches.

And we all lived happily ever after!

Tags: Git

How to handle whitespace with Salesforce.com and git

15 January 2015

A common problem when working in a git repository in a cross-platform environment is end-of-line handling, as attested by the number of Stack Overflow questions on the topic! I found that the most useful guide to getting whitespace right in a repository was GitHub’s, but that there were some additional concerns when working with Salesforce.com.

Firstly, it’s important to bear in mind that SFDC provides all of its text files with Unix-style (LF) line endings, and I think the path of least resistance is to stick with what they provide! However, if you’re a Windows shop, your developers are probably using tools which introduce Windows-style line endings (CRLF) into the files they touch. The problem with letting this go unchecked is that you are liable to end up with a huge number of extremely frustrating merge conflicts, until eventually you put --ignore-space-change or --ignore-whitespace on every git command.

The first recommendation of GitHub’s guide is to set core.autocrlf=true and call it a day. However, you must not do this! The reason is that when you retrieve from SFDC, your static resources are saved as src/staticresources/foo.resource, and git does not by default recognise that these are binary files. This means that if you just set up autocrlf, git will mangle these files by deleting bytes which it takes to be CR line-ending characters but which are in fact meaningful binary data, and then SFDC will no longer be able to read the files.

So the correct solution is to set up a .gitattributes file in the root of your git repository which looks a lot like this:

# ensure all salesforce code is normalised to LF upon commit
*.cls text=auto eol=lf
*.xml text=auto eol=lf
*.profile text=auto eol=lf
*.permissionset text=auto eol=lf
*.layout text=auto eol=lf
*.queue text=auto eol=lf
*.app text=auto eol=lf
*.component text=auto eol=lf
*.email text=auto eol=lf
*.page text=auto eol=lf
*.object text=auto eol=lf
*.report text=auto eol=lf
*.site text=auto eol=lf
*.tab text=auto eol=lf
*.trigger text=auto eol=lf
*.weblink text=auto eol=lf
*.workflow text=auto eol=lf
*.reportType text=auto eol=lf
*.homePageLayout text=auto eol=lf
*.homePageComponent text=auto eol=lf
*.labels text=auto eol=lf
*.group text=auto eol=lf
*.quickAction text=auto eol=lf
*.js text=auto eol=lf
*.py text=auto eol=lf
*.pl text=auto eol=lf
*.csv text=auto eol=lf

… which is to say, every metadata type which you know is going to be text gets an entry, while types which might be binary get none (or you can add *.resource binary and so on). This probably isn’t quite comprehensive depending on your setup, because inside documents/*/ you can end up with arbitrary file types - however, the files you have there normally have ‘normal’ filenames, such as downArrow.png or footer.html, which git has a chance of recognising as binary or not.
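Once the file is in place, git check-attr is a handy way to verify how git will treat any given path (the paths here are hypothetical, and the two-line .gitattributes is just a cut-down version of the one above):

```shell
# Ask git how it will treat a class file vs. a static resource
set -e
cd "$(mktemp -d)"
git init -q .
printf '*.cls text=auto eol=lf\n*.resource binary\n' > .gitattributes
out=$(git check-attr text eol -- src/classes/Foo.cls src/staticresources/foo.resource)
echo "$out"
```

The .cls file reports eol: lf, while the .resource file reports text: unset (the binary macro expands to -text -diff -merge), confirming it will never be line-ending-converted.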

Once you’ve set up your .gitattributes properly, if you’re starting off a new repository you’re good to go, but if you’re having to apply these changes to a repository which you already have developers working on, you need to be quite careful about rolling them out. I’m going to write a post on that topic soon.

Tags: SFDC Git

How and why to compress your Salesforce.com profiles

15 January 2015

##Why compressing your profiles is a good idea

When handling Salesforce.com metadata, especially attempting to store it in source control, it doesn’t take long to notice the following:

  • Profiles are big. In fact, they contain 3-10 lines for every Apex Class, Visualforce Page, Object, Field, App, and so on, and so forth. Before long you’ve got thousands of lines, which means…
  • It’s difficult to commit changes to a profile, because you’ve got to scroll down to line 10243 to check that that’s the change you meant.
  • It takes ages to diff your profiles because they take up many megabytes on disk.
  • Profiles are vulnerable to merge errors, because git’s standard diff algorithm doesn’t respect XML structure, and good luck finding one which does and which can also handle such huge files.

I work with a large Salesforce installation with about 110 profiles and 30 permissionsets, each some 25,000 lines long, which between them take up 120MB on disk. These are real problems for my organisation, and I had to come up with a solution. I realised that there’s no reason to store exactly what you retrieve from Salesforce.com on disk. You can apply retrieve-time transformations to your code so long as:

  • Whatever you store is still deployable.
  • The tools used to retrieve your metadata are uniform across your organisation.

I write developer tools for my colleagues, so I am in a position to guarantee the latter. As for the former, all you have to do is remove a lot of line breaks. The idea is to transform this:

    <applicationVisibilities>
        <application>Order_Management</application>
        <default>false</default>
        <visible>true</visible>
    </applicationVisibilities>
    <applicationVisibilities>
        <application>SendGrid</application>
        <default>false</default>
        <visible>true</visible>
    </applicationVisibilities>
    <applicationVisibilities>
        <application>Territory_Management</application>
        <default>false</default>
        <visible>true</visible>
    </applicationVisibilities>
    <applicationVisibilities>
        <application>standard__AppLauncher</application>
        <default>false</default>
        <visible>true</visible>
    </applicationVisibilities>
    <applicationVisibilities>
        <application>standard__Chatter</application>
        <default>false</default>
        <visible>true</visible>
    </applicationVisibilities>
    <applicationVisibilities>
        <application>standard__Community</application>
        <default>false</default>
        <visible>true</visible>
    </applicationVisibilities>
    <applicationVisibilities>
        <application>standard__Content</application>
        <default>false</default>
        <visible>true</visible>
    </applicationVisibilities>

into this:

<applicationVisibilities><application>Order_Management</application><default>false</default><visible>true</visible></applicationVisibilities>
<applicationVisibilities><application>SendGrid</application><default>false</default><visible>true</visible></applicationVisibilities>
<applicationVisibilities><application>Territory_Management</application><default>false</default><visible>true</visible></applicationVisibilities>
<applicationVisibilities><application>standard__AppLauncher</application><default>false</default><visible>true</visible></applicationVisibilities>
<applicationVisibilities><application>standard__Chatter</application><default>false</default><visible>true</visible></applicationVisibilities>
<applicationVisibilities><application>standard__Community</application><default>false</default><visible>true</visible></applicationVisibilities>
<applicationVisibilities><application>standard__Content</application><default>false</default><visible>true</visible></applicationVisibilities>

The key idea is that each metadata component, whether an application, a custom field, a visualforce page or anything else, gets precisely one line in the resulting document, which means:

  • Any addition, deletion or modification of a component changes exactly one line
  • The addition or removal of lines is guaranteed to result in well-formed XML which is deployable.
  • Merges are much, much easier to perform.
  • Since git diff works line-by-line and we’re reducing the file from 25,000 to 2,500 lines, we gain a huge increase in efficiency when working with git.
  • We get back about 500KB of disk space per file.

For really tiny Salesforce instances, this might be overkill, but you can see that once you get big enough, this makes a real impact.

##How to do this compression

I produced an extremely simple Perl script to carry out this compression. Why Perl?

  • Unmatched string processing ability
  • Perl 5.8.8 comes bundled with a git installation on Windows

Save this file as profileCompress.pl:

BEGIN { $\ = undef; }
s/\r//g;                  # remove all CR characters
s/\t/    /g;              # replace all tabs with 4 spaces
if (/^\s/) {              # leave the xml root node alone
  s/\n//;                 # remove newlines
  s/^    (?=<(?!\/))/\n/; # insert newlines where appropriate
  s/^(    )+//;           # trim remaining whitespace
}

Then every time you do a retrieve, invoke it with perl -i.bak -p profileCompress.pl src/profiles/*.profile src/permissionsets/*.permissionset. The obvious disclaimer about backing up your code first because it might get mangled and I can’t take any responsibility for that applies!

I handle this by adding

<exec executable = "perl">
	<arg value = "-pi.bak"/>
	<arg value = "${lib.dir}/script/profileCompress.pl"/>
	<arg value = "${src.dir}/profiles/*.profile"/>
	<arg value = "${src.dir}/permissionsets/*.permissionset"/>
</exec>

to my ant script right after the retrieve task, once all of the retrieved metadata has been stored in the folders those variables point to.

Tags: SFDC Git

The singlePackage option in Salesforce.com metadata deployments

09 December 2014

I’ve been working on a metadata client application for Salesforce.com to completely replace ant, because once you reach a certain level of complexity, ant really doesn’t cut it anymore! However, I found that when deploying a large .zip file (130MB of data compressed to 19MB using deflate with compression level 9) my deployments were taking far, far longer than the equivalent ant deployment of the same files.

In fact, the deployment would hang in the ‘waiting’ status for over an hour before starting to make progress. I finally found the solution by essentially iterating over every combination of deployment options in my SOAP call, and thought I’d write a quick post in case anybody else needs to fix the issue.

The problem occurred when deploying a package using this file structure, without the singlePackage option set.

zip root
|  unpackaged
|  |  classes
|  |  | files
|  |  | ...
|  |  triggers
|  |  | files
|  |  | ...
|  |  ...
|  |  package.xml

I found that by setting singlePackage=true and re-arranging the zip file as follows, I was able to completely eliminate the time spent hung in ‘waiting’.

zip root
|  classes
|  | files
|  | ...
|  triggers
|  | files
|  | ...
|  ...
|  package.xml
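The re-arrangement is just a matter of which directory you zip from. A sketch with made-up file names (your zip tool and invocation will vary):

```shell
# Fake up a retrieved package, then show the flat layout singlePackage expects
set -e
cd "$(mktemp -d)"
mkdir -p unpackaged/classes unpackaged/triggers
echo 'public class Foo {}' > unpackaged/classes/Foo.cls
echo '<Package xmlns="http://soap.sforce.com/2006/04/metadata"/>' > unpackaged/package.xml
# With singlePackage=true, zip from *inside* unpackaged so that
# package.xml lands at the zip root, e.g.:
#   (cd unpackaged && zip -r ../deploy.zip .)
(cd unpackaged && find . -type f | sort)
```

The listing shows classes/ and package.xml at the top level, with no unpackaged wrapper folder - which is the shape the second tree above describes.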

I hope somebody else stumbling across this issue finds this and it helps!

Tags: SFDC

A Regex for validating XML text

16 September 2014

I’m currently working on version 3 of a tool to diff XML files in a content-aware manner, viz. to parse nodes and identify where two nodes are different, rather than just apply a string diff to them. A crucial part of that is that the parsing be fast and robust.

Every language there is has an XML parsing library, but often they are not as fast as I’d like, or it’s difficult to make them return a tree structure well-suited to writing the diff algorithm, so I decided to write my own.

The route I decided to go down was a recursive regular expression capturing the node’s declaration (if any), tagName and properties, and then to extract the child nodes and pass each of them through the regex.

To start with, the possible node types are xml, text, and CDATA:

m{
  (
    \s*<!\[CDATA\[.*?\]\]>
  |
    [^<>&]+                  # Text Node
  |
    \s*<(\w+)>.*?<\/\2>      # XML Node
  )
}xs

Now let’s add support for properties, self-closing tags, and an XML declaration, and name the capture groups:

m{
  (?<outerXML>
    \s*<!\[CDATA\[.*?\]\]>
  |                          # Text Node
    [^<>&]+
  |                          # XML Node
    (?<declaration> \s*<\?.*?\?>)?
    \s*<
    (?<tagName>     \w+)
    (?<properties>  [^>]*?)
    (                        # tag is self-closing...
      \/>
    |                        # ...or has content
      >.*?<\/\g{tagName}>
    )
  )
}xs

This is great and kind of works, but it falls down when it meets a construct like <a><a><a></a></a><a></a></a><a></a>, because the lazy .*? stops at the first closing tag it finds rather than the matching one - we can fix this by using Perl’s crazily powerful recursive (?0) syntax.

The other downside of this regex as written is that it isn’t fast at all - feed it a few thousand lines of XML and it’ll really struggle. Fortunately, we can use another powerful Perl regex feature, the atomic group (?> ... ), which speeds up the match by an order of magnitude for large files. The final version is this:

m{
  (?<outerXML>
    (?>
      \s*<!\[CDATA\[.*?\]\]>
    |                          # Text Node
      [^<>&]+
    |                          # XML Node
      (?<declaration> \s*<\?.*?\?>)?
      \s*<
      (?<tagName>     \w+)
      (?<properties>  [^>]*?)
      (?>                      # tag is self-closing...
        \/>
      |                        # ...or has content
        >
        (?<innerXML> (?0)*\s* )
        <\/\g{tagName}>
      )
    )
  )
}xs

It happens that although this regex was written to extract data from an XML node, it also only matches valid XML - and does so quickly.

The Mandelbrot set in Javascript

29 August 2014

A while back I came across a fun puzzle on codegolf.stackexchange.com to generate the Mandelbrot set in as few characters as possible. I’m happy with my JavaScript entry as an example of how a usually pretty readable language can be made horrendous! Here’s the entry:

document.body.appendChild(V=document.createElement('Canvas'));j=(D=(X=V.getContext('2d')).createImageData(Z=V.width=V.height=255,Z)).data;for(x=Z*Z;x--;){k=a=b=c=0;while(a*a+b*b<4&&Z>k++){c=a*a-b*b+4*(x%Z)/Z-3;b=2*a*b+4*x/(Z*Z)-2;a=c;}j[4*x]=99*k%256;j[4*x+3]=Z;}X.putImageData(D,0,0);

And here’s the output:

The codegolf version

I’ve indented the code, made the variables more verbose, and added some comments:

document.body.appendChild(Canvas=document.createElement('Canvas'));
dataArray=(
  imageData=(
    context=Canvas.getContext('2d')
  ).createImageData(
    //save space by making the image size equal to the number of iterations
    Z=Canvas.width=Canvas.height=255,Z
  )
).data;

// Rather than two nested for loops, use one big one, and use
// (x%Z)/Z and x/(Z*Z). Also, decreasing instead of increasing
// means you don't need to specify an end value
for(x=Z*Z;x--;){
  k=a=b=c=0;
  
  // This would also traditionally be a for loop, but this while
  // loop is very terse
  while(
    a*a+b*b<4
    &&Z>k++
  ){
    c=a*a-b*b+4*(x%Z)/Z-3;
    b=2*a*b+4*x/(Z*Z)-2;
    a=c;
  }
  // the quickest way I could find to populate the red channel with a
  // semi-random value
  dataArray[4*x]=99*k%256; 
  dataArray[4*x+3]=Z;
}
context.putImageData(imageData,0,0);

What I wanted to do was adapt and extend this code a bit, which doesn’t really fit into the code-golf format. I added zooming on left and right clicks, a few extra parameters on the generator so that it can generate Julia fractals too, and the ability to set arbitrary bounds and iterations. I had enough fun playing around with different parameters that I thought it’d be good to share it - for more in-depth tweaking, try this version.

var generateMandelbrot = function (
    Canvas, iterations, limit, cX, cY, scale, u, w, z
) {
    dataArray=(
      imageData=(
        context=Canvas.getContext('2d')
      ).createImageData(Z=Canvas.width, Z)
    ).data;
    for (i=Z*Z;i--;) {
      var k = c = 0,
          a = x = scale*(2*(i%Z)/Z-1) - cX,
          b = y = scale*(2*i/(Z*Z)-1) - cY;
      while (a*a+b*b < limit && iterations > k++ ) {
        c=a*a-b*b+u*x+w;
        b=2*a*b+u*y+z;
        a=c;
      }
      dataArray[4*i+3]=k*255/iterations;
    }
    context.putImageData(imageData,0,0);
}

(function init() {

    var recalc = function(evt,elem) {
        cX -= scale*(2*(evt.pageX-elem.offsetLeft)/Canvas.width-1);
        cY -= scale*(2*(evt.pageY-elem.offsetTop)/Canvas.height-1);
    }
    var Canvas = document.createElement('Canvas'),
        iter   = 50,
        limit  = 4,
        cX     = 0,
        cY     = 0,
        scale  = 1.5,
        u = 1,
        v = 0,
        w = 0;
    document.body.appendChild(Canvas);
    Canvas.width=Canvas.height=400;
    generateMandelbrot(Canvas, iter, limit, cX, cY, scale, u, v, w);
    
    Canvas.addEventListener('click',function(e){
        recalc(e,this);
        scale /= 2;
        iter *=1.25;
        generateMandelbrot(
            Canvas, iter, limit, cX, cY, scale, u, v, w
        );
    });
    Canvas.addEventListener('contextmenu', function(e){
        e.preventDefault();
        recalc(e, this);
        scale *= 2;
        iter /= 1.25;
        generateMandelbrot(
            Canvas, iter, limit, cX, cY, scale, u, v, w
        );
    });
}());