⚉ DWIM Atom feed

DWIM — Trying to make the computer Do What I Mean

2016-01-11 -- Fast-Forward and parent reversal

Consider a feature branch which has conflicts against the mainline branch. The developer merges from upstream in order to bring in those changes and resolve the conflicts; the branch is then pushed to the server, where the web interface offers a "merge" button for this branch.

If the mainline branch has not moved since the developer merged it into the feature branch, the server has two options to perform the merge: it can create a new commit on top of the mainline branch in which it merges the feature branch into the mainline branch, or it can perform a fast-forward "merge", making the mainline branch point to the same commit as the feature branch.

These are the two modes of operation of the git merge command, and fast-forwarding when possible is what is generally expected when a program or system implements merge functionality on top of git. However, the second option can cause issues in the scenario mentioned above.

The order of the parents is significant in a merge. The first parent is the commit into which changes are being introduced; the second parent is the commit being brought in by the merge. This is how Git knows which direction a merge went in. In this case we have a merge from the mainline into the feature branch.

By making the tip of the feature branch now also become the tip of the mainline branch, we would make this feature branch in essence the main branch. This means that the first parent of this commit is the tip of the feature branch, and the incoming changes are from the mainline branch. This also means that all those commits which were made on the mainline branch between the branch point and the fast-forward "merge" will now look like they've been done in a side branch (interestingly in one which bears the name of the mainline branch).

In the scenario above, after merging into the feature branch, the branches look like

-----x----x----x mainline
      \         \
       x--x---x--x feature

and if we perform a full merge, the history becomes

-----x----x----x---x mainline
      \         \ /
       x--x---x--x feature

and looking at the log from the mainline's point of view everything is as expected. If, however, we perform a fast-forward (which is what git merge would do by default here), we get something different

------x--x---x---x mainline, feature
       \        /
        x------x

By making the tip of the feature branch become the tip of the mainline, those three commits we saw in the mainline have now become part of a side of the history, and those commits building up the feature branch now form part of the mainline history.

A look through git log would usually show the same commits, as it orders commits primarily by date, so you would only miss the extra merge commit that the no-ff variant creates. However, if you look at git log --graph, you'll see that the mainline branch now consists of those small commits from the feature branch, instead of bringing in those changes via a merge.

If you use git log --first-parent to filter out side history and figure out what was done directly on the mainline (e.g. what was merged, what was a hotfix made directly there), you'll now see those commits from the feature branch as part of mainline's direct history, and it will hide those commits which were in fact made directly on the mainline and would have shown up as such before the fast-forward "merge".
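The whole scenario can be reproduced in a throwaway repository; the sketch below uses made-up branch and commit names, with empty commits standing in for real work:

```shell
# Sketch: reproduce the fast-forward parent reversal in a scratch repo.
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git config user.email you@example.com
git config user.name you
main=$(git symbolic-ref --short HEAD)    # master or main, depending on git version

git commit -q --allow-empty -m base
git checkout -q -b feature
git commit -q --allow-empty -m feature-work
git checkout -q "$main"
git commit -q --allow-empty -m mainline-work

git checkout -q feature
git merge -q -m "merge mainline" "$main" # developer resolves conflicts here
git checkout -q "$main"
git merge -q feature                     # fast-forwards silently

git log --first-parent --format=%s
```

The first-parent log now shows the merge commit, feature-work and base; mainline-work has dropped out of mainline's direct history and only reappears via the merge's second parent.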

2012-04-17 -- The case of the magically appearing mmap window

The moral of this story is to make sure you always have your function declarations available wherever you use a function, even if you're sure that you're using the function properly. In libgit2 we access pack files by mapping parts of them, and those mmap windows are stored in a singly-linked list like so:

typedef struct git_mwindow {
    struct git_mwindow *next;
    /* ... more fields elided ... */
} git_mwindow;

and a function that accepts a git_mwindow *w parameter. As I'd been looking at the mwindow code and was used to seeing &w in many places (that code often works with pointers to pointers), I wrote that instead of simply w. Interestingly enough, the code didn't break immediately but gave wrong results a bit later on. I suspected some odd record-keeping in the mwindow code, so I wrote a loop to dump the window list to the console whenever we tried to locate an open window for a particular range. What I saw confused me even more: a new window had appeared in front of the one we should be using! Not only was there a rogue window open, but it also contained completely wrong values, except for the next pointer. So, where was this window coming from? I had instrumented the rest of the code, and the only place where this could happen was during the function call, which is when I finally managed to see that I was passing a pointer to my pointer instead of my pointer. Had I moved the function declaration into a header earlier, the compiler would have told me.

But why did it seem that there was an extra window appearing? This is a good way to show how structs work in C. When I passed the pointer to the pointer, the function (or, more importantly, my function to dump the list) thought it was a pointer to a struct and treated it as such. In C, the first field of a struct must have the same address as the struct itself (that is to say, no padding is allowed before the first field). Since what I had really passed was a pointer to the pointer, the phantom window's next field had the same address as w itself, so reading w->next yielded the value of w: the one field that looked right. Reading the rest of the fields meant reading values from the caller's stack, which have no meaning in our context.

And there we have it.

git_mwindow *w;
some_function(some_var, &w);

can make code think that there is an extra entry in the linked list.

2011-08-31 -- GSoC 2011 libgit2 final report

[This is a copy of my final report e-mail sent to the git and libgit2 lists; http://article.gmane.org/gmane.comp.version-control.git/180505]
Hello all, GSoC is finished and I’ll send the proof of work to Google shortly. Many thanks to everyone who helped me along the way.

So? How did it go? Unfortunately I wasn't able to do everything that was in the (quite optimistic) original plan, as there were some changes and additions that had to be made to the library in order to support the new features (the code movement in preparation for the indexer (git-index-pack) being the clearest example of this). The code has been merged upstream, and if you want to look at examples of use, you can take a look at my libgit2-utils repo, where you can find a functional implementation of git-fetch (git-clone would be about 20 lines more, I just never got around to writing it). Let me give you a few highlights of what new features were added to the library:

Remotes

A remote (struct git_remote) is the (library) user's interface to communication with external repositories. When read from the configuration file, it will parse the refspecs and take them into consideration when fetching. With the most recent changes, you can also create one on the fly from a URL. The remote will create an instance of a transport and take care of the lower-level details.

Transports

The logic exists inside the transports. Currently only the fetch part of the plain git protocol is supported, but the architecture is extensible. The code would have to live in the library, but adding support for plug-ins, as it were, would be an easy task.

pkt-line parsing

The code for parsing and creating these lines is its own namespace, so that it can be used for other transports. It supports a kind of streaming parsing, as it will return the appropriate error code if the buffer isn’t large enough for the line.

The indexer

This is what libgit2 has instead of git-index-pack. It's much slower than the git implementation, because it hasn't been optimised yet and uses the normal pack access methods. Currently the only user would be a git-fetch implementation, and that is still fast enough, so it's not that high a priority. As a result of this work, the memory window and pack access code has been made much more generic.

I plan to continue working on this project. The next steps are push (which has quite a few prerequisites, not least pack generation) and smart HTTP support. The addition of the new backend should help make code more generic. After that, SSH support should be a matter of wrapping the existing code up.

2011-08-04 -- Desktop Summit 2011

I thought I'd say it as well

2011-08-03 -- [GSoC 11] libgit2: midterm report

[A bit late, but here is my midterm report in blog form] Hello everyone, as it's the GSoC midterm and I'm taking a rest from coding (my exams are in the next few days), this is a good opportunity to write up a more detailed report on what has been happening on the libgit2 network front. All the code is available from my 'fork' on github. The more useful working code has been merged into mainline, and you can get a list of references on the remote. If you want to filter which references you want to see, you can do that as well (with some manual work). I had hoped that fetching and/or pack indexing would be working by now, but sadly university got in the way. At any rate, here's a list of what's working/implemented:

Refspecs

I believe all the important stuff has been implemented. You can get one from a remote and you can see if a string matches what it describes. You can also transform a string/path from the source to the destination form (this probably has a different name in git.git). The transformation code assumes that a '*' in the destination implies that there is a '*' at the end of the source name as well. This might need to be 'hardened'.

Remotes

You can parse its information from the configuration file (the push and fetch refspecs will be parsed as well), and an appropriate transport (see below) will be chosen based on the URL prefix. Right now there is a static list, but plug-ins could be supported without much effort if somebody can come up with a use-case. It is through these transports that everything network-related is done (including simulating the network, as in the local filesystem "network" transport).

Transports

This is where most of the work actually happens. Each transport registers its callbacks in a structure and does its work transparently. The data structures are still in flux, as I haven't yet found the best way to avoid duplicating the information in several places, and the want/have/need code is really still in its infancy. The idea is that the object list you get when you connect can be used to mark which commits you want to receive or send. Right now only the local filesystem and git/tcp transports are implemented, and the only working operation is 'git-ls-remote'.

Sliding memory maps, packfile reading and the indexer

Or whatever you want to call them; I believe it's mmfile in git. This code and the packfile reading code live in the "pack ODB backend", so I'm making it somewhat more generic in order to use it without an ODB backend. Once that code is decoupled (which is a good change on its own), writing an indexer shouldn't be too hard.

So this is where I am now. I'm a bit behind according to the original schedule, but still on track to finish on time. It's been interesting and fun, sometimes a bit frustrating. Thanks to all the people who have helped me thus far. Cheers, cmn
Generated at 2017-02-25 12:57:01 +0100