Suppose you have two computers, one running Linux, the other one Windows. Suppose further there is a directory on each computer which has to be kept synchronised between those computers. Worst of all, changes might be made to each directory at the same time, so synchronising is not a simple matter of copying files.
This situation is (relatively) easy to manage with unison, a file-synchronizing tool that runs under both Linux and Windows. It is very fast, works locally and over SSH connections, and does a very good job of detecting changes and deletions. There are plenty of good tutorials to be found, and it has an excellent manual.
However, there are problems concerning non-ASCII characters, as the unison Wikipedia entry notes: ‘Note that when synchronising data between different computer systems, Unison has significant problems if the file names contain accented or international characters.’
This is certainly true, because unison itself does not do any encoding translation. With Cygwin and the excellent UTF-8 Cygwin project, though, it is not hard to make things just work. Information on this method is scant, and it took me a long time to find. This is why I decided to write a short tutorial to make things easier for other people having this problem.
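To see why untranslated names cause trouble, here is a small sketch that prints the raw bytes of the one-character file name "é" in two encodings. It assumes, purely for illustration, that one side presents names in a single-byte encoding such as ISO-8859-1 while the other uses UTF-8; `hex_bytes` is a helper made up for this example.

```shell
#!/bin/sh
# Sketch: the same visible file name is a different byte sequence per encoding.
# hex_bytes is a hypothetical helper, not part of unison or Cygwin.

hex_bytes() {
    # Dump the argument as lowercase hex without separators.
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# Octal escapes keep this script itself pure ASCII.
utf8_name=$(printf '\303\251')   # "é" in UTF-8: two bytes
latin1_name=$(printf '\351')     # "é" in ISO-8859-1: one byte

echo "UTF-8:      $(hex_bytes "$utf8_name")"     # prints c3a9
echo "ISO-8859-1: $(hex_bytes "$latin1_name")"   # prints e9
```

A tool that compares names byte for byte, without translating between encodings, sees these as two unrelated files.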
Important: This will only work if your Linux machine is configured to use a UTF-8 locale. As far as I know, this is standard in most modern Linux distributions. For Gentoo, see ‘Using UTF-8 with Gentoo’. All you have to do is to enable the `unicode` USE flag and set `unicode="YES"` in `/etc/rc.conf`.
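A quick way to check the locale requirement on the Linux machine might look like the following sketch. It relies on the standard `locale charmap` command; the function name `is_utf8_charmap` is made up for this example, and the accepted spellings are an assumption about what a system may report.

```shell
#!/bin/sh
# Sketch: report whether the current locale uses a UTF-8 character map.
# is_utf8_charmap is a hypothetical helper for this example.

is_utf8_charmap() {
    # Accept the common spellings of the charmap name.
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

charmap=$(locale charmap 2>/dev/null)

if is_utf8_charmap "$charmap"; then
    echo "UTF-8 locale is active"
else
    echo "Locale charmap is '$charmap', not UTF-8"
fi
```

If the second message appears, the locale needs to be changed before continuing with the steps below.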
- Install unison (and openssh, if it is not installed yet) on the Linux machine. In Gentoo, a simple `emerge unison` as root is enough; it should not be much harder in other Linux distributions. Remember the version number of unison: the major and minor versions have to agree on both computers. The one I used is 2.27.
- Start the ssh daemon on the Linux machine. In Gentoo, enter `/etc/init.d/sshd start` as root.
- On the Windows machine, go to the Cygwin website and download `setup.exe`. Start the installer, click through and select `openssh` and `unison` to be installed as well. Make sure that you have identical major and minor versions of unison on both computers!
- Here is the important step: Go to the UTF-8 Cygwin project page and download the binary `cygwin1.dll` package (currently called `cygwin1-dll-20-11-18.tar.bz2`). Unpack the dll and put it into `c:\CYGWIN\bin`, assuming that is where you installed Cygwin, replacing the standard dll. This dll translates the internal Windows file name encoding to UTF-8, magically making unison ‘just work’.
- Start Cygwin and run `unison` once to set up the home directory.
- Edit `~/.unison/default.prf` or `c:\CYGWIN\home\username\.unison\default.prf`, respectively, to configure unison (perhaps with a text editor which is able to respect UNIX line breaks; I am not sure whether that is necessary). An easy example is

      root = /cygdrive/drive_letter/Path/to/Directory to synchronize/
      root = ssh://user@linux_computer//home/user/path/

  Detailed descriptions of the file format can be found in the unison manual. For Gentoo, do not forget to add `addversionno = true`, since the name of the unison binary is actually `unison-2.27`.
- Start unison by either starting a Cygwin shell and entering `unison` or just by executing `C:\CYGWIN\bin\unison-2.27.exe` directly. Enter your Linux machine password, and everything should work.
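Since the steps above depend on both machines running the same major and minor version, a small sketch for comparing two version strings may help. It assumes the version string contains a number of the form `2.27.57`; the helper names `major_minor` and `same_series` and the sample strings are made up for this example.

```shell
#!/bin/sh
# Sketch: check that two unison version strings share major.minor.
# major_minor and same_series are hypothetical helpers for this example.

major_minor() {
    # Print the first "digits.digits" pair found in the string, if any.
    printf '%s\n' "$1" | sed -n 's/^[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

same_series() {
    a=$(major_minor "$1")
    b=$(major_minor "$2")
    # Succeed only if a version was found and both sides agree.
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Placeholder strings; in practice they might come from something like
#   local_version=$(unison -version)
#   remote_version=$(ssh user@linux_computer unison-2.27 -version)
local_version="unison version 2.27.57"
remote_version="unison version 2.27.112"

if same_series "$local_version" "$remote_version"; then
    echo "major.minor match: $(major_minor "$local_version")"
else
    echo "version mismatch"
fi
```

With the placeholder strings this prints `major.minor match: 2.27`; the patch level is deliberately ignored, because only the major and minor versions have to agree.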
The first synchronisation will take a long time, but subsequent runs will be much faster. Determining changed files in about 50GB of data takes less than 5 minutes for me. Transferring the changed files uses an rsync-like algorithm, which is also quite efficient, but this of course depends largely on the amount of changed data. This should work in a local network as well as over the internet. I encourage you to try it; I highly recommend it.