On using unison to synchronise files efficiently between Windows and Linux machines

on 24 November 2008, 15:02, in Software

Suppose you have two computers, one running Linux, the other one Windows. Suppose further there is a directory on each computer which has to be kept synchronised between those computers. Worst of all, changes might be made to each directory at the same time, so synchronising is not a simple matter of copying files.

This situation is (relatively) easy to manage with unison, a file synchronisation tool that runs on both Linux and Windows. It is very fast, works locally and over SSH connections, and does a very good job of detecting changes and deletions. There are plenty of good tutorials to be found, and it has an excellent manual.

However, there are problems concerning non-ASCII characters, as can be read in the unison Wikipedia entry: "Note that when synchronising data between different computer systems, Unison has significant problems if the file names contain accented or international characters."

This is certainly true, because unison itself does not do any encoding translation. With Cygwin and the excellent UTF-8 Cygwin project, however, it is not hard to make things just work. Information on this method is scant, and I had to search for quite a while. This is why I decided to write a short tutorial to make things easier for other people with the same problem.
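
To illustrate why missing encoding translation matters, here is a minimal sketch showing that the same file name is a different byte sequence in UTF-8 and in a legacy single-byte encoding. ISO-8859-1 is used here purely as an example of such an encoding, and the helper names `hex_bytes` and `utf8_name` are made up for this illustration; the sketch assumes `iconv` and `od` are available.

```shell
#!/bin/sh
# Print the bytes read from stdin as one compact hex string.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# The file name "ä.txt", with "ä" written as its two UTF-8 bytes (octal
# escapes), so the example does not depend on this script's own encoding.
utf8_name() {
    printf '\303\244.txt'
}

# Bytes of the name as a UTF-8 system stores it.
utf8_name | hex_bytes; echo

# Bytes of the same name after conversion to ISO-8859-1 (example encoding).
utf8_name | iconv -f UTF-8 -t ISO-8859-1 | hex_bytes; echo
```

The two hex strings differ in the bytes for "ä", so a tool that compares names byte by byte, without translating between encodings, sees two different files.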

Important: This will only work if your Linux machine is configured to use a UTF-8 locale. As far as I know, this is the default in most modern Linux distributions. For Gentoo, see 'Using UTF-8 with Gentoo'. All you have to do is enable the unicode USE flag and set unicode="YES" in /etc/rc.conf.
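
A quick way to check this on the Linux machine is to look at the locale environment variables. The following is only a sketch: the function name `is_utf8_locale` is made up for this example, and the check merely inspects the locale name rather than verifying that the locale is actually installed.

```shell
#!/bin/sh
# Succeed if the given locale name declares a UTF-8 character set,
# e.g. "en_US.UTF-8" or "de_DE.utf8".
is_utf8_locale() {
    case "$1" in
        *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Effective locale: LC_ALL overrides LC_CTYPE, which overrides LANG.
current="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"

if is_utf8_locale "$current"; then
    echo "UTF-8 locale in use: $current"
else
    echo "Not a UTF-8 locale: ${current:-unset}"
fi
```

Alternatively, `locale charmap` prints the character set of the current locale directly.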

  1. Install unison (and openssh, if it is not installed yet) on the Linux machine. In Gentoo, a simple emerge unison as root is enough; it should not be much harder in other Linux distributions. Note the version number of unison: the major and minor versions have to agree on both computers. The one I used is 2.27.
  2. Start the ssh daemon on the Linux machine. In Gentoo, enter /etc/init.d/sshd start as root.
  3. On the Windows machine, go to the Cygwin website and download setup.exe. Start the installer, click through and select openssh and unison to be installed as well. Make sure that you have identical major and minor versions of unison on both computers!
  4. Here is the important step: Go to the UTF-8 Cygwin project page and download the binary cygwin1.dll package (currently called cygwin1-dll-20-11-18.tar.bz2). Unpack the dll and put it into c:\CYGWIN\bin, assuming that is where you installed Cygwin, replacing the standard dll. This dll translates the internal Windows file name encoding to UTF-8, magically making unison 'just work'.
  5. Start cygwin and run unison once to set up the home directory.
  6. Edit ~/.unison/default.prf (that is, c:\CYGWIN\home\username\.unison\default.prf when seen from Windows) to configure unison. It may be necessary to use a text editor that respects UNIX line breaks; I am not sure about that. A simple example is

    root = /cygdrive/drive_letter/Path/to/Directory to synchronize/
    root = ssh://user@linux_computer//home/user/path/

    Detailed descriptions of the file format can be found in the unison manual. For Gentoo, do not forget to add addversionno = true, since the name of the unison binary is actually unison-2.27.

  7. Start unison by either starting a Cygwin shell and entering unison or just by executing C:\CYGWIN\bin\unison-2.27.exe directly. Enter your Linux machine password, and everything should work.
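
Since steps 1 and 3 require the major and minor versions of unison to agree on both machines, a small comparison can catch a mismatch before the first run. This is a sketch under assumptions: the function names are made up for this example, and the version strings in the sample call are placeholders rather than real output.

```shell
#!/bin/sh
# Reduce a version string such as "2.27.57" to its major.minor part ("2.27").
major_minor() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeed if two version strings share the same major and minor version.
same_major_minor() {
    [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}

# Placeholder version strings; substitute the ones reported on each machine.
local_version="2.27.57"
remote_version="2.27.157"

if same_major_minor "$local_version" "$remote_version"; then
    echo "Versions are compatible: $(major_minor "$local_version")"
else
    echo "Version mismatch: $local_version vs $remote_version"
fi
```

The version on each machine can be read with `unison -version` (on Gentoo the binary may be called unison-2.27, as noted in step 6).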

The first synchronisation will take a long time, but subsequent runs will be much faster. Determining changed files in about 50GB of data takes less than 5 minutes for me. Changed files are transferred using a protocol similar to rsync's, which is also quite efficient, but the transfer time of course depends largely on the amount of changed data. This should work in a local network as well as over the internet. I encourage you to try it; I highly recommend it.


5 comments on “On using unison to synchronise files efficiently between Windows and Linux machines”

  1. effjot wrote on 14 February 2009, 18:27:

    Thanks! This greatly helped me to get my umlaut-ridden files synchronised.

  2. boa13 wrote on 10 June 2009, 16:13:

    Thanks for the information! Unfortunately, this requires a Cygwin installation; I would have preferred a native Windows solution. However, the Unison filename encoding issue is so bad I might end up going your way.

  3. Peter wrote on 16 October 2009, 22:40:

    Thanks! I had the same problem with the Umlauts between a Linux and a Windows box and it is now solved thanks to your manual.

  4. Adrianus wrote on 14 April 2010, 22:39:

    Even if the question is somewhat off-topic, where can I get the theme for my blog? Is it available for download somewhere? It looks very stylish!

  5. Jan wrote on 15 April 2010, 10:53:

    I should make clear at this point that, in principle, I am happy to share the code for my blog. However, I would then appreciate a link and a mention.
    In the case of this comment I am rather sceptical, though: I received the same comment twice within 15 minutes from the same IP, but under two different names, two different email addresses and with two different websites (same layout, but entirely different content), which I have removed on suspicion of spam.
    So: I am by no means averse to serious questions, but in this case I doubt the sincerity. Of course, I am happy to be proven wrong.

