Trey Hunner

web development, programming, open source

Migrating From Subversion to Git

I recently migrated multiple Subversion repositories to Git. I found this blog post very helpful during the process. Here are some additional tips that were not mentioned in that post.

Generating an authors file

Subversion identifies authors by username, while Git records a name and an email address. An authors file maps each Subversion username to a Git author, one per line in the form username = Full Name <email@example.com>, and can be passed to git svn when creating the Git repository, like so:

git svn clone --stdlayout --authors-file=authors.txt http://your/svn/repo/url

The trickiest part of making an authors file was finding all the authors. This command lists the usernames of all Subversion committers when run from a Subversion working copy:

svn log -q | awk '/^r[0-9]+ [|]/ {print $3}' | sort -u

A similar method of creating an authors file is presented here.

Removing git-svn-id messages

When you migrate with git svn, a git-svn-id line like this one is appended to every commit message:

git-svn-id: http://svn/repo/url/trunk@9837 1eab27b1-3bc6-4acd-4026-59d9a2a3569e

If you are planning on migrating away from your old Subversion repository entirely, there’s no need to keep these. The following command (taken from the git filter-branch man page) removes these git-svn-id lines from all commit messages in the current branch:

git filter-branch -f --msg-filter 'sed -e "/git-svn-id:/d"'
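If you prefer to do the message rewriting in Python, here is a minimal sketch of an equivalent filter function. Like the sed command, it drops every line containing git-svn-id:; unlike it, it also trims the blank line that usually precedes that line, so each message ends with a single newline. The function name is illustrative.

```python
def strip_svn_id(message):
    """Drop every line containing "git-svn-id:", like sed -e "/git-svn-id:/d".

    Trailing blank lines left behind are trimmed, and the result always ends
    with exactly one newline.
    """
    kept = [line for line in message.splitlines() if "git-svn-id:" not in line]
    return "\n".join(kept).rstrip("\n") + "\n"
```

To use it with --msg-filter, one option is to save it as a script whose last line is sys.stdout.write(strip_svn_id(sys.stdin.read())) (after import sys) and pass something like 'python3 /absolute/path/to/strip_svn_id.py' as the filter; that path is hypothetical.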

Removing empty commit messages

Subversion allows empty commit messages, but Git rejects them by default. Replace any empty commit messages in your newly migrated Git repository so that commands like git rebase will work on those commits.

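This command replaces every empty commit message with “<empty commit message>” and passes all other messages through unchanged:

git filter-branch -f --msg-filter '
# read the whole message (read msg would keep only its first line)
msg=$(cat)
if [ -n "$msg" ] ; then
    printf "%s\n" "$msg"
else
    echo "<empty commit message>"
fi'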
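The same logic as a Python function, in case you are already filtering messages with a script: a minimal sketch that looks at the entire message rather than a single line. It assumes whitespace-only messages should also count as empty, which you may want to change; the function name is illustrative and the placeholder text matches the one used above.

```python
PLACEHOLDER = "<empty commit message>"


def fill_empty_message(message):
    """Return the full message unchanged, or PLACEHOLDER if it is blank.

    Assumption: messages containing only whitespace count as empty.
    """
    if message.strip():
        return message
    return PLACEHOLDER + "\n"
```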

Poking around

After you’ve cleaned up your new master branch, clean up any other branches you plan to keep: check out each one with git checkout and repeat the same steps. To list the remote Subversion branches available for checkout, use git branch -a.

Migrating between version control systems may be a good time to permanently clean up commit history with git rebase, or to remove large files that were never needed with git filter-branch. Remember to back up the existing versions of any branches before rewriting them, just in case.

Django and Model History

Recently I needed to store a snapshot of every state of a particular model instance in a Django application; basically, I needed version control for the rows in my database tables. When searching for applications that provided this feature, which I call model history, I found many different approaches but few good comparisons of them. In an attempt to fill that void, I’m going to detail some of my findings here.

django-reversion

The django-reversion application was started in 2008 by Dave Hall. Reversion uses a single table to store data for all version-tracked models. Each version of a model adds a new row to this table containing a JSON serialization of the model’s state. Models can be reverted to previous versions from the admin interface. This single-table structure makes django-reversion very easy to install and uninstall, but it also creates problems when model fields change, because versions serialized before the change no longer match the model’s current fields.
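To make the single-table idea concrete, here is a minimal standard-library sketch using sqlite3 and json. It is not django-reversion’s actual schema or API; the table, column, and function names are hypothetical. It shows versions of every model sharing one table, and how a snapshot taken before a field was added comes back incomplete.

```python
import json
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    # One table holds versions for *every* tracked model (illustrative schema).
    db.execute(
        "CREATE TABLE version ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " model TEXT NOT NULL,"
        " object_id INTEGER NOT NULL,"
        " data TEXT NOT NULL)"
    )
    return db


def save_version(db, model, object_id, state):
    """Append a JSON snapshot of one object's field values."""
    db.execute(
        "INSERT INTO version (model, object_id, data) VALUES (?, ?, ?)",
        (model, object_id, json.dumps(state, sort_keys=True)),
    )


def versions(db, model, object_id):
    """Return all snapshots of one object, oldest first."""
    rows = db.execute(
        "SELECT data FROM version WHERE model = ? AND object_id = ? ORDER BY id",
        (model, object_id),
    )
    return [json.loads(data) for (data,) in rows]


def revert(current_fields, stored_state):
    """Rebuild field values from a snapshot for the *current* field set.

    Fields added after the snapshot come back as None and removed fields are
    silently dropped: the mismatch that makes schema changes awkward here.
    """
    return {field: stored_state.get(field) for field in current_fields}
```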

django-revisions

The django-revisions application was created by Stijn Debrouwere in 2010 because the existing Django model history applications at the time were either abandoned or suffered from fundamental design problems. Revisions uses a model history method called same-table versioning (design details outlined here). Same-table versioning adds a few fields to each version-tracked model, which lets the original model table store both the most recent version of each object and all of its old versions. Schema changes are simpler because they apply to all versions at once, and no new tables are needed to use Revisions (just new fields on existing tables). The only problem I found with Revisions was that it does not currently support database-level uniqueness constraints: adding unique=True to a model field or a unique_together Meta attribute results in an error, since every version of an object lives in the same table and a uniqueness constraint would reject the second version. For Revisions to honor uniqueness when saving models, the constraints must currently be specified separately.
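A minimal sqlite3 sketch of same-table versioning, using hypothetical column names (vid, bundle_id, is_current) rather than django-revisions’ real ones. Each save appends a new row for the object and flags it as the current version, so old versions stay in the original table.

```python
import sqlite3


def make_table(db, unique_isbn):
    """Create a versioned book table; unique_isbn adds a plain UNIQUE constraint."""
    isbn = "isbn TEXT UNIQUE" if unique_isbn else "isbn TEXT"
    db.execute(
        "CREATE TABLE book ("
        " vid INTEGER PRIMARY KEY AUTOINCREMENT,"  # one row per version
        " bundle_id INTEGER NOT NULL,"  # shared by all versions of one book
        " is_current INTEGER NOT NULL,"  # 1 for the latest version only
        f" title TEXT, {isbn})"
    )


def save(db, bundle_id, title, isbn):
    """Append a new version and mark it as the current one."""
    db.execute("UPDATE book SET is_current = 0 WHERE bundle_id = ?", (bundle_id,))
    db.execute(
        "INSERT INTO book (bundle_id, is_current, title, isbn) VALUES (?, 1, ?, ?)",
        (bundle_id, title, isbn),
    )


def current(db, bundle_id):
    """Return (title, isbn) of the current version of one book."""
    return db.execute(
        "SELECT title, isbn FROM book WHERE bundle_id = ? AND is_current = 1",
        (bundle_id,),
    ).fetchone()
```

With make_table(db, unique_isbn=True), saving a second version of the same book with an unchanged ISBN raises sqlite3.IntegrityError, which illustrates the uniqueness problem described above.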

django-simple-history

The django-simple-history application was based on code originally written by Marty Alchin, author of Pro Django. Marty Alchin posted AuditTrail on the Django trac wiki in 2007 and later revised and republished the code in his book Pro Django in 2008, renaming it to HistoricalRecords. Corey Bertram created django-simple-history from this code and put it online in 2010.

Simple History works by creating a separate “historical” model for each model that requires an audit trail and storing a snapshot of each changed model instance in that historical model. For example, a Book model would get a HistoricalBook model, and a new HistoricalBook instance would be stored every time a Book instance changed. Collisions are avoided by dropping uniqueness constraints from the historical models, and schema changes are handled by automatically changing the historical models as well. This method comes at the cost of an extra database table for each model that needs history.
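The separate-table approach can be sketched the same way. This is an illustrative sqlite3 model loosely following the Book/HistoricalBook example, not django-simple-history’s generated schema: the historical table mirrors the original fields without their uniqueness constraints, plus a few bookkeeping columns whose names are assumptions.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    isbn TEXT UNIQUE
);
-- Mirrors book's fields, but id and isbn are NOT unique here, so many
-- snapshots of the same row can coexist.
CREATE TABLE historical_book (
    history_id INTEGER PRIMARY KEY AUTOINCREMENT,
    id INTEGER NOT NULL,
    title TEXT NOT NULL,
    isbn TEXT,
    history_date TEXT NOT NULL,
    history_type TEXT NOT NULL  -- '+' created, '~' changed
);
"""


def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def save_book(db, book_id, title, isbn):
    """Insert or update a book, then record a snapshot in historical_book."""
    exists = db.execute("SELECT 1 FROM book WHERE id = ?", (book_id,)).fetchone()
    if exists:
        db.execute("UPDATE book SET title = ?, isbn = ? WHERE id = ?", (title, isbn, book_id))
    else:
        db.execute("INSERT INTO book (id, title, isbn) VALUES (?, ?, ?)", (book_id, title, isbn))
    db.execute(
        "INSERT INTO historical_book (id, title, isbn, history_date, history_type)"
        " VALUES (?, ?, ?, ?, ?)",
        (book_id, title, isbn, datetime.now(timezone.utc).isoformat(), "~" if exists else "+"),
    )


def history(db, book_id):
    """Return (title, history_type) for every snapshot of one book, oldest first."""
    return db.execute(
        "SELECT title, history_type FROM historical_book WHERE id = ? ORDER BY history_id",
        (book_id,),
    ).fetchall()
```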

My conclusions

When testing these three applications myself, I immediately eliminated django-reversion because I needed to allow easy model schema changes for my project. I found that both django-revisions and django-simple-history worked well with schema migrations through South (which I use on everything). Django-revisions worked better for data migrations in South (due to only needing to change one model), but the uniqueness constraint problems with django-revisions would have been problematic for some of my models. So eventually I settled on django-simple-history.

Sharing Screenshots in Linux

I have been using GitHub Issues recently and loving its simplicity. Unfortunately, I often need to upload screenshots to demonstrate bugs, and Issues does not support file uploads. There are Windows and Mac applications that solve this problem by capturing a screenshot, uploading it, and copying a URL for the screenshot to the clipboard.

I did not find any Linux applications that capture and upload a screenshot and copy its URL, but I discovered a thread in the Dropbox forums with a script that does just that. I added comments to the script, changed the variable names, removed the need for a temporary file, and added a notify-send call as a visual cue (this should work on Ubuntu). I have the script mapped to Ctrl-PrtScrn in Ubuntu.
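The workflow described above can be sketched in Python. This sketch assumes the scrot, xclip, and notify-send command-line tools are installed and that files saved into a Dropbox Public folder become reachable under a fixed URL prefix once synced; PUBLIC_DIR and PUBLIC_URL are hypothetical placeholders you would replace with your own values.

```python
import os
import shutil
import subprocess
import time

# Hypothetical locations: adjust to your own Dropbox setup.
PUBLIC_DIR = os.path.expanduser("~/Dropbox/Public/screenshots")
PUBLIC_URL = "https://dl.dropboxusercontent.com/u/YOUR_USER_ID/screenshots"


def screenshot_paths(now=None):
    """Return (local_path, public_url) for a timestamped PNG file name."""
    stamp = time.strftime("%Y-%m-%d_%H-%M-%S", time.localtime(now))
    name = f"screenshot_{stamp}.png"
    return os.path.join(PUBLIC_DIR, name), f"{PUBLIC_URL}/{name}"


def main():
    path, url = screenshot_paths()
    os.makedirs(PUBLIC_DIR, exist_ok=True)
    # scrot -s lets you select a window or region; saving straight into the
    # synced folder means no temporary file is needed.
    subprocess.run(["scrot", "-s", path], check=True)
    # Put the (soon to be valid, once Dropbox syncs) URL on the clipboard.
    subprocess.run(["xclip", "-selection", "clipboard"], input=url.encode(), check=True)
    # Optional visual cue if notify-send is available.
    if shutil.which("notify-send"):
        subprocess.run(["notify-send", "Screenshot saved", url])
```

Saved as a script ending with a call to main(), it could be bound to a keyboard shortcut such as Ctrl-PrtScrn.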

Encrypted Private Keys in Django

Uniquely identifiable URLs are necessary for many web applications. For example, a website that provides book reviews may identify the URL of a specific book like this: www.example.com/books/8839/. The easiest way to identify entities in Django is to use the unique primary key of each object, which by default is an auto-incremented positive integer.

Revealing the primary key of an entity is often not desirable. An astute visitor of the website mentioned above may be able to guess information from the URL such as how many book reviews are available on the website or how old specific reviews are.

The code snippet below demonstrates one way to use a unique but cryptic identifier for an object without changing the way primary keys are generated. It adds two notable extensions to the basic Django Model:

  1. The encrypted_pk and encrypted_id model properties return an AES-encrypted version of the primary key as a 13-character base-36 string.
  2. The get method of the default manager can be queried with an encrypted primary key by using the keyword argument encrypted_pk.

Feel free to use this code however you want.
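As a rough illustration of the idea using only the standard library, the sketch below swaps AES for a different technique: a four-round Feistel network whose round function is HMAC-SHA256, which gives a keyed, reversible permutation of 64-bit integers. The output is zero-padded base 36, so every token is 13 characters (2**64 - 1 needs exactly 13 base-36 digits). SECRET and the function names are hypothetical, and this is not the code referred to above.

```python
import hashlib
import hmac

# Hypothetical key: in a Django project it might be derived from settings.SECRET_KEY.
SECRET = b"change-me"
ROUNDS = 4
MASK32 = 0xFFFFFFFF
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"
WIDTH = 13  # 2**64 - 1 needs 13 base-36 digits


def _f(round_no, half, key):
    """Feistel round function: HMAC-SHA256 of (round, half), truncated to 32 bits."""
    msg = bytes([round_no]) + half.to_bytes(4, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:4], "big")


def _encrypt64(n, key):
    left, right = n >> 32, n & MASK32
    for i in range(ROUNDS):
        left, right = right, left ^ _f(i, right, key)
    return (left << 32) | right


def _decrypt64(n, key):
    left, right = n >> 32, n & MASK32
    for i in reversed(range(ROUNDS)):  # undo the rounds in reverse order
        left, right = right ^ _f(i, left, key), left
    return (left << 32) | right


def encrypt_pk(pk, key=SECRET):
    """Map a non-negative 64-bit primary key to a fixed-width base-36 token."""
    if not 0 <= pk < 2**64:
        raise ValueError("pk must fit in 64 bits")
    n = _encrypt64(pk, key)
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits)).rjust(WIDTH, "0")


def decrypt_pk(token, key=SECRET):
    """Invert encrypt_pk; raises ValueError for malformed tokens."""
    if len(token) != WIDTH or not token.isalnum():
        raise ValueError("token must be 13 alphanumeric characters")
    n = int(token, 36)
    if not 0 <= n < 2**64:
        raise ValueError("token out of range")
    return _decrypt64(n, key)
```

One possible Django wiring, untested here: a model property that returns encrypt_pk(self.pk), and a custom Manager whose get() pops an encrypted_pk keyword argument and passes pk=decrypt_pk(value) on to the parent implementation.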