2009-06-01

How to migrate or merge a CVS or SVN repository to a remote SVN repository

Let's suppose you have one or more local CVS and/or SVN repositories, and you want to merge them (with their change history) into new directories of an existing, possibly remote target SVN repository. This blog post explains how to do this using Unix tools.

The following tools will be used: cvs2svn (only for CVS sources), svnadmin, svndumpfilter, svn-pusher and Perl.

Please note that we won't use svnsync, because it requires the target repository to be empty. We won't use svn-merge-repos.pl or svnadmin load much either, because both require the target repository to be local.

Access remote repositories for the first time

Some repositories require you to specify your username (usually on the command line) and password (usually by answering an interactive prompt) before you can connect. You have to do this only once per machine, because SVN records your username and password in files under $HOME/.subversion/auth. To avoid connection problems later, access the remote repositories once from each machine you'll be working on, so your credentials get saved. The easiest way to do this is to run
svn ls URL://TO/SVN/REPOSITORY --username MYUSER
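
If you have several repositories to access, a small loop saves some typing. This is a sketch for a POSIX shell: the function name cache_svn_credentials is made up for this example, and it does nothing more than run the svn ls command above once per URL, so that svn prompts for the password and saves the credentials.

```shell
# Hypothetical helper (not part of Subversion): run "svn ls" once per repository
# URL so that svn prompts for the password and records the credentials under
# $HOME/.subversion/auth. Usage: cache_svn_credentials MYUSER URL...
cache_svn_credentials() {
  user=$1
  shift
  rc=0
  for url in "$@"; do
    echo "== $url"
    # The first access to each server prompts for the password interactively.
    if ! svn ls "$url" --username "$user"; then
      echo "cannot access $url" >&2
      rc=1
    fi
  done
  return $rc
}
```

For example: cache_svn_credentials MYUSER URL://TO/SVN/REPOSITORY URL://TO/TARGET/SVN/REPOSITORY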

Install svn-pusher

svn-pusher can add any SVN repository to another SVN repository (as a subdirectory), regardless of whether they are local or remote, keeping the commit history of the specified revision interval. You can skip this installation step for now and come back to it only if one of the steps below asks you to use svn-pusher.

svn-pusher is implemented as a Perl script using SVN's Perl bindings (SVN::Core). The easiest way to install svn-pusher is with root access on a Unix system. For example, on Debian or Ubuntu, run this as root:
# apt-get update
# apt-get install libsvn-core-perl
# echo no | cpan -i SVN::Pusher # this may take about a minute
# type -p svn-pusher
/usr/local/bin/svn-pusher
# svn-pusher help
For the sake of completeness we mention (but we recommend against using) svn-push as an alternative to svn-pusher. svn-push is a tool written in C, available in the SVN contrib directory. It does something similar to svn-pusher, but it is more limited: it can commit only a single revision at a time, and it needs the head revision number. Here is how to compile it:
$ wget http://svn.collab.net/repos/svn/trunk/contrib/client-side/svn-push/svn-push.c
$ sudo apt-get install libsvn-dev
$ gcc -W -Wall -I/usr/include/subversion-1 -I/usr/include/apr-1.0 \
-Doff64_t='unsigned long long' -o svn-push svn-push.c -lsvn_client-1
$ ./svn-push
Usage : svn-push -r N:M SRC_URL DEST_URL
We also mention the SVN::Push Perl module, which provides the svnpush command; SVN::Pusher seems to be more up to date.

Convert the CVS repository to an SVN repository dump

You can skip this step if your source repository is not a CVS repository.

A CVS repository is a directory containing *,v files (possibly in subdirectories) and a directory named CVSROOT (the CVSROOT directory may also be in one of its parent directories).

Download and install cvs2svn from its project site (the example below downloads version 2.2.0 from cvs2svn.tigris.org). It's a Python script, so install Python as well (2.4 or 2.5 should be OK). You don't need root access to run cvs2svn; in fact, it can be run directly from the extracted tarball. Example:
$ wget http://cvs2svn.tigris.org/files/documents/1462/44372/cvs2svn-2.2.0.tar.gz
$ tar xzvf cvs2svn-2.2.0.tar.gz
$ cvs2svn-2.2.0/cvs2svn --help
You don't need Subversion itself for this step – cvs2svn (if run with --dumpfile=) needs only Python and the standard Unix sort utility.

Make sure your CVS repository is on the same machine as cvs2svn (copy it with scp -r or rsync if necessary). It is a good and safe idea to make a copy and use that for the conversion. Make sure there is a directory named CVSROOT inside the repository directory or inside one of its parent directories; the CVSROOT directory can be empty. Make sure that all files in your CVS repository directory are named *,v (i.e. their name ends with ,v). If you don't need all files or all directories, feel free to remove them now. The effect will be as if those files and/or directories had never been added to the CVS repository.
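
The checks described above can be scripted roughly as follows. This is a sketch for a POSIX shell with find; check_cvs_repo is a hypothetical name, and the function only reports problems (a missing CVSROOT, files not named *,v), it doesn't change anything.

```shell
# Hypothetical pre-flight check before running cvs2svn on a copy of a CVS
# repository. It only reports problems; it does not change anything.
# Usage: check_cvs_repo PATH/TO/CVS/REPOSITORY
check_cvs_repo() {
  repo=$1
  status=0
  # 1. Look for CVSROOT in the repository directory or in one of its parents.
  dir=$(cd "$repo" && pwd) || return 2
  found=
  while :; do
    if [ -d "$dir/CVSROOT" ]; then
      found=$dir/CVSROOT
      break
    fi
    [ "$dir" = / ] && break
    dir=$(dirname "$dir")
  done
  if [ -n "$found" ]; then
    echo "CVSROOT found: $found"
  else
    echo "no CVSROOT found (an empty directory named CVSROOT is enough)" >&2
    status=1
  fi
  # 2. List regular files whose names do not end in ,v (CVSROOT is skipped).
  bad=$(find "$repo" -name CVSROOT -prune -o -type f ! -name '*,v' -print)
  if [ -n "$bad" ]; then
    echo "files not named *,v:" >&2
    printf '%s\n' "$bad" >&2
    status=1
  fi
  return $status
}
```

For example: check_cvs_repo PATH/TO/CVS/REPOSITORY && cvs2svn --dumpfile=PROJECT.dump PATH/TO/CVS/REPOSITORY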

Then run cvs2svn --dumpfile=PROJECT.dump PATH/TO/CVS/REPOSITORY to create the file PROJECT.dump, which contains all files in the CVS repository with their full commit history.

Convert the SVN repository to an SVN repository dump

You can skip this step if your source repository is not an SVN repository.

If your source repository is local and you have read access to it, just run
svnadmin dump PATH/TO/SVN/REPOSITORY >PROJECT.dump
Otherwise use svn-pusher like this:
$ svnadmin create PROJECT.copy
$ svn-pusher push URL://OF/SVN/REPOSITORY file://$PWD/PROJECT.copy
$ svnadmin dump PROJECT.copy >PROJECT.dump # this may take some time
$ rm -rf PROJECT.copy
As an alternative to svn-pusher, you can also use svnsync (part of the standard SVN installation); see the blog post Dump a SVN repository from a URL for how to do it. Please note that both svn-pusher and svnsync are quite slow compared to svnadmin dump, and that svnsync is better supported and documented, since it is part of standard SVN.

Once you have your PROJECT.dump file, use svndumpfilter (part of standard SVN) to get rid of unnecessary files and directories. You may also edit the file in a text editor to make other modifications (such as renaming files). The file format should be self-explanatory.
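
For example, dropping some directories from a dump could look like the sketch below. The wrapper name filter_dump and the path prefixes are only illustrations; exclude, --drop-empty-revs and --renumber-revs are regular svndumpfilter options, and the two flags are commonly used together so that no empty revisions are left behind. svndumpfilter reports an error if a path you keep was copied from a path you exclude, so you may have to adjust the list of excluded paths.

```shell
# Hypothetical wrapper around svndumpfilter: remove the given path prefixes
# from a dump file and drop the revisions that become empty.
# Usage: filter_dump IN.dump OUT.dump PATH_PREFIX...
filter_dump() {
  src=$1
  dst=$2
  shift 2
  # svndumpfilter reads the dump on stdin and writes the filtered dump to stdout.
  svndumpfilter exclude --drop-empty-revs --renumber-revs "$@" < "$src" > "$dst"
}
```

For example: filter_dump PROJECT.dump PROJECT.filtered.dump trunk/junk branches/experiment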

If you expect file name conflicts between the repositories (between source1 and source2, or between source1 and the target), e.g. because several repositories contain trunk/version.h, it is safest to move each source repository into its own directory, and to do a careful svn mv once the merge is done. Here is how to rename everything in a *.dump file (e.g. from DIR/TO/FILE to PROJECT.merge/DIR/TO/FILE):
$ perl -pi -e's@^(Node-path: |Node-copyfrom-path: )@${1}PROJECT.merge/@' PROJECT.dump # copy sources need the prefix too
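After that, add the creation of the new directory to revision 1 of PROJECT.dump: use your text editor to insert the following lines just below the PROPS-END line that closes the revision properties of revision 1 (this is the first PROPS-END line, or the second one if the dump starts with a revision 0 record, as svnadmin dump output does):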
Node-path: PROJECT.merge
Node-kind: dir
Node-action: add
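
If the dump is too large for a text editor, the renaming and the insertion can be done in one streaming pass instead. The sketch below uses awk; prefix_dump is a hypothetical name, and the sketch assumes that revision 1 has revision properties (so that the first PROPS-END line after Revision-number: 1 closes them) and that no line of file content starts with Node-path: or Node-copyfrom-path:. It prefixes Node-copyfrom-path lines too, so that copied paths (e.g. tags and branches) still point to existing paths. Since awk works on lines, use an awk that passes arbitrary bytes through unchanged (such as GNU awk) if the dump contains binary files.

```shell
# Hypothetical helper: move every node of a dump under the top-level directory
# DIR and add the creation of DIR to revision 1.
# Usage: prefix_dump DIR <PROJECT.dump >PROJECT.prefixed.dump
prefix_dump() {
  LC_ALL=C awk -v dir="$1" '
    # Prefix node paths and copy sources ("Node-path: " has 11 characters,
    # "Node-copyfrom-path: " has 20).
    /^Node-path: / { $0 = "Node-path: " dir "/" substr($0, 12) }
    /^Node-copyfrom-path: / { $0 = "Node-copyfrom-path: " dir "/" substr($0, 21) }
    { print }
    # The first PROPS-END after "Revision-number: 1" closes the revision
    # properties of revision 1; add the directory right after it.
    /^Revision-number: 1$/ { in_r1 = 1 }
    in_r1 && $0 == "PROPS-END" {
      print ""
      print "Node-path: " dir
      print "Node-kind: dir"
      print "Node-action: add"
      print ""
      in_r1 = 0
    }
  '
}
```

For example: prefix_dump PROJECT.merge <PROJECT.dump >PROJECT.merged.dump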

Merge an SVN repository dump to a local target SVN repository

You can skip this step if the target repository is not local.
The contents of the dump file PROJECT.dump can be added to an existing local target SVN repository using svnadmin load PATH/TO/TARGET/SVN/REPOSITORY <PROJECT.dump . Here is an example that creates a new target SVN repository and adds multiple projects to it:
$ svnadmin create myprojects
$ cvs2svn --dumpfile=myproject1.dump cvsrepo/dir/myproject1
$ svnadmin load myprojects <myproject1.dump
$ cvs2svn --dumpfile=myproject2.dump cvsrepo/dir/myproject2
$ # (edit myproject2.dump, see below)
$ svnadmin load myprojects <myproject2.dump
$ svn ls -R file://$PWD/myprojects
Please note that cvs2svn adds the creation of the directories trunk, tags and branches to the *.dump file. This makes svnadmin load myprojects <myproject2.dump fail, because it tries to add the directory trunk, which already exists in the repository myprojects. The solution is to edit the file myproject2.dump and remove the following lines from near the beginning:
Node-path: trunk
Node-kind: dir
Node-action: add


Node-path: branches
Node-kind: dir
Node-action: add


Node-path: tags
Node-kind: dir
Node-action: add
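
Instead of editing by hand, these records can be removed with a streaming filter, which also works for dumps too large for a text editor. A sketch with assumptions: strip_std_dirs is a hypothetical name, and it removes only records that look exactly like the ones above (Node-path: trunk, branches or tags, directly followed by Node-kind: dir, Node-action: add and a blank line). Records with more headers, e.g. a property block, are left untouched, and so are all other nodes. As with any line-based tool, use an awk that passes arbitrary bytes through unchanged (such as GNU awk) if the dump contains binary files.

```shell
# Hypothetical helper: drop the plain "add dir" records of trunk, branches and
# tags from a dump. Usage: strip_std_dirs <myproject2.dump >myproject2.stripped.dump
strip_std_dirs() {
  LC_ALL=C awk '
    # state 0: normal output; 1-3: holding back a possible standard-dir record;
    # 4: skipping the blank lines that followed a removed record.
    state == 4 && $0 == "" { next }
    state == 4 { state = 0 }
    state == 3 {
      if ($0 == "") { state = 4; next }        # plain record: drop it
      print h1; print h2; print h3; state = 0  # more headers follow: keep it
    }
    state == 2 {
      if ($0 == "Node-action: add") { h3 = $0; state = 3; next }
      print h1; print h2; state = 0
    }
    state == 1 {
      if ($0 == "Node-kind: dir") { h2 = $0; state = 2; next }
      print h1; state = 0
    }
    state == 0 && /^Node-path: (trunk|branches|tags)$/ { h1 = $0; state = 1; next }
    { print }
    END {
      # Flush any held-back lines if the file ends in the middle of a record.
      if (state == 1) print h1
      if (state == 2) { print h1; print h2 }
      if (state == 3) { print h1; print h2; print h3 }
    }
  '
}
```

For example: strip_std_dirs <myproject2.dump >myproject2.stripped.dump && svnadmin load myprojects <myproject2.stripped.dump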

Merge an SVN repository dump to a target SVN repository

This step works for both local and remote target SVN repositories, but it's a lot slower than the svnadmin load method described above (which works only if the target SVN repository is local). Install svn-pusher (see the step for it above). It doesn't matter which machine you install svn-pusher on, as long as it can connect to the target repository. Copy the PROJECT.dump created above to the machine you've installed svn-pusher on. Then run
$ svnadmin create PROJECT.copy
$ svnadmin load PROJECT.copy <PROJECT.dump
$ svn-pusher push file://$PWD/PROJECT.copy URL://TO/TARGET/SVN/REPOSITORY # slow
$ rm -rf PROJECT.copy
Please note that the revision numbers in svn-pusher's messages are usually off by one, e.g. Committed revision 1 from revision 0. This is normal; the revision numbers will match perfectly in the target repository.

You can use svn-pusher multiple times on the same target repository to merge multiple source repositories. If the target repository is not empty when you start svn-pusher, it is safest to dump it first (to have a backup), and please pay attention to the following facts. svn-pusher (unlike svnadmin load) reports a warning if a directory (e.g. trunk) has already been added. This warning is usually harmless. If svn-pusher wants to add a file which already exists, it skips merging that source revision, but it proceeds with merging the subsequent source revisions. This is not always what you want, because you may want to fix the conflict first by renaming files, and only then proceed with the subsequent revisions. Should this happen, you may have to rebuild the target repository from scratch using the backup.
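
When several dumps have to be merged into the same target, the commands of this step can be wrapped in a loop that stops at the first failure. This is only a sketch: push_dumps is a hypothetical name, and it relies on exit statuses, but a skipped revision (as described above) does not necessarily make svn-pusher exit with an error, so check its output as well.

```shell
# Hypothetical helper: push several dump files, one after the other, to the
# same target repository via a temporary local copy (as in the steps above).
# Usage: push_dumps URL://TO/TARGET/SVN/REPOSITORY A.dump B.dump ...
push_dumps() {
  target=$1
  shift
  for dump in "$@"; do
    work=$(mktemp -d) || return 1
    echo "== pushing $dump"
    if svnadmin create "$work/copy" &&
       svnadmin load "$work/copy" < "$dump" &&
       svn-pusher push "file://$work/copy" "$target"; then
      rm -rf "$work"
    else
      # Keep the local copy for inspection and stop before the next dump.
      echo "failed on $dump (local copy kept in $work)" >&2
      return 1
    fi
  done
}
```

For example: push_dumps URL://TO/TARGET/SVN/REPOSITORY myproject1.dump myproject2.dump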
