On using unison to synchronise files efficiently between Windows and Linux machines

Suppose you have two computers, one running Linux, the other one Windows. Suppose further there is a directory on each computer which has to be kept synchronised between those computers. Worst of all, changes might be made to each directory at the same time, so synchronising is not a simple matter of copying files.

This situation is (relatively) easy to manage with unison, a file synchronisation tool that runs under both Linux and Windows. It is very fast, works locally and over SSH connections, and does a very good job of detecting changes and deletions. There are plenty of good tutorials to be found, and it has an excellent manual.

However, there are problems concerning non-ASCII characters, as can be read in the unison Wikipedia entry: ‘Note that when synchronising data between different computer systems, Unison has significant problems if the file names contain accented or international characters.’

This is certainly true, because unison itself does not do any encoding translation. With Cygwin and the excellent UTF-8 Cygwin project, however, it is not hard to make things just work. Information on this method is scant, and it took me a long time to find it, which is why I decided to write a short tutorial to make things easier for other people with the same problem.

Important: This will only work if your Linux machine is configured to use a UTF-8 locale. As far as I know, this is standard in most modern Linux distributions. For Gentoo, see ‘Using UTF-8 with Gentoo’: all you have to do is enable the unicode USE flag and set unicode="YES" in /etc/rc.conf.
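A quick way to check the locale requirement on the Linux machine is sketched below. It relies only on the POSIX `locale charmap` command; the helper name `is_utf8_charmap` is made up for this example and is not part of unison or Cygwin.

```shell
# Hypothetical helper: succeeds if the given charmap name denotes UTF-8.
is_utf8_charmap() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# `locale charmap` prints the character encoding of the active locale.
# If the command is missing, fall back to an empty string instead of failing.
current_charmap=$(locale charmap 2>/dev/null || true)

if is_utf8_charmap "$current_charmap"; then
    echo "OK: this shell uses a UTF-8 locale"
else
    echo "WARNING: locale charmap is '${current_charmap:-unknown}', not UTF-8"
fi
```

If the warning appears, fix the locale configuration of the distribution before going on, since the rest of this tutorial assumes UTF-8 file names on the Linux side.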

  • Install unison (and openssh, if it is not installed yet) on the Linux machine. In Gentoo, a simple emerge unison as root is enough; it should not be much harder in other Linux distributions. Remember the version number of unison, because the major and minor version have to agree on both computers. The one I used is 2.27.
  • Start the ssh daemon on the Linux machine. In Gentoo, enter /etc/init.d/sshd start as root.
  • On the Windows machine, go to the Cygwin website and download setup.exe. Start the installer, click through and select openssh and unison to be installed as well. Make sure that you have identical major and minor versions of unison on both computers!
  • Here is the important step: Go to the UTF-8 Cygwin project page and download the binary cygwin1.dll package (currently called cygwin1-dll-20-11-18.tar.bz2). Unpack the dll and put it into c:\CYGWIN\bin, assuming that is where you installed Cygwin, replacing the standard dll. This dll translates the internal Windows file name encoding to UTF-8, magically making unison ‘just work’.
  • Start Cygwin and run unison once so that it creates its .unison directory in your home directory.
  • Edit ~/.unison/default.prf (that is, c:\CYGWIN\home\username\.unison\default.prf when seen from Windows) to configure unison. It is probably safest to use a text editor that preserves UNIX line breaks, although I am not sure this is required. A simple example is

    root = /cygdrive/drive_letter/Path/to/Directory to synchronize/
    root = ssh://user@linux_computer//home/user/path/
    

    Detailed descriptions of the file format can be found in the unison manual. For Gentoo, do not forget to add addversionno = true, since the name of the unison binary is actually unison-2.27.

  • Start unison by either starting a Cygwin shell and entering unison or just by executing C:\CYGWIN\bin\unison-2.27.exe directly. Enter your Linux machine password, and everything should work.
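Since mismatched versions are the most likely reason for the last step to fail, the comparison can be sketched as a small shell script. The helper names `major_minor` and `same_major_minor` are invented for this example, and the two version strings are sample values standing in for the output of `unison -version` on each machine.

```shell
# Hypothetical helper: print the first "major.minor" pair found in a string,
# e.g. "unison version 2.27.57" becomes "2.27".
major_minor() {
    printf '%s\n' "$1" |
        sed -n 's/^[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\).*$/\1/p'
}

# Hypothetical helper: succeeds if both strings contain a version and
# their major.minor parts are equal.
same_major_minor() {
    mm_a=$(major_minor "$1")
    mm_b=$(major_minor "$2")
    [ -n "$mm_a" ] && [ "$mm_a" = "$mm_b" ]
}

# Sample values only. On real machines they could come from, for instance,
#   local_version=$(unison -version)
#   remote_version=$(ssh user@linux_computer unison -version)
local_version="unison version 2.27.57"
remote_version="unison version 2.27.157"

if same_major_minor "$local_version" "$remote_version"; then
    echo "versions compatible ($(major_minor "$local_version"))"
else
    echo "version mismatch: '$local_version' vs '$remote_version'"
fi
```

On Gentoo the remote binary is called unison-2.27, so the remote command in the comment would have to be adjusted accordingly.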

The first synchronisation will take a long time, but subsequent runs will be much faster. Determining changed files in about 50GB of data takes less than 5 minutes for me. Changed files are transferred with an rsync-like algorithm built into unison, which is also quite efficient, although the time needed of course depends largely on the amount of changed data. This should work in a local network as well as over the internet. I encourage you to try it; I highly recommend it.