Transfer protocols
==================

Several methods allow transferring data in and out of the PSMN computing center. For most cases, we recommend SSH-based file transfer commands, such as ``scp``, ``sftp``, or ``rsync``. They provide the best performance for data transfers to and from the computing center.

For the rest of this documentation, replace ``mylogin`` with your login as provided by PSMN.

.. NOTE::

    All examples below are based on the :ref:`following configuration ` and :ref:`login nodes `.

SCP (Secure Copy)
-----------------

The easiest command for transferring files to and from PSMN is *scp*. It works like the *cp* command, except that it works over the network to copy files from one computer to another, using the :term:`SSH` protocol.

For instance, the following command will copy the file named *myfile* from my local machine to the *mydir* directory in my home directory on PSMN (on the allo-psmn gateway):

.. code-block:: console

    $ scp myfile mylogin@allo-psmn.psmn.ens-lyon.fr:~/mydir/

.. IMPORTANT::

    While this is handy, for large transfer operations **it is better to use multi-hops** (See :ref:`related documentation ` and :ref:`login nodes `):

.. code-block:: console

    $ scp myfile mylogin@x5570comp1:~/mydir/

(replace ``mylogin`` with your login as provided by PSMN, and ``x5570comp1`` with your preferred login node).

You can copy *myfile* under a different name, or to another directory, with the following commands:

.. code-block:: console

    $ scp myfile mylogin@x5570comp1:~/inputfile
    $ scp myfile mylogin@x5570comp1:~/mydir/subdir/foofile

To copy files back from PSMN to your local machine, you just need to reverse the order of the arguments, as in this example:

.. code-block:: console

    $ scp mylogin@x5570comp1:~/inputfile local_inputfile

*scp* also supports recursive copying of directories, with the *-r* option:

.. code-block:: console

    $ scp -r mydir/ mylogin@x5570comp1:~/

SCP from outside ENS network
----------------------------

To transfer your files between your PC and ``allo.psmn`` from outside the ENS network, you have to use the ``ssh.psmn`` gateway as a proxy (see :doc:`Connection on PSMN servers <../connection/connection_on_PSMN_servers>`). In a terminal on your workstation, you could execute:

.. code-block:: console

    # your PC -> your PSMN home:
    $ scp -oProxyCommand="ssh mylogin@ssh.psmn.ens-lyon.fr netcat -w1 allo-psmn %p" source_file mylogin@allo-psmn:~/destination_file

.. code-block:: console

    # your PSMN home -> your PC:
    $ scp -oProxyCommand="ssh mylogin@ssh.psmn.ens-lyon.fr netcat -w1 allo-psmn %p" mylogin@allo-psmn:~/source_file destination_file

where *source_file* and *destination_file* should be changed as needed. If you want to transfer a directory (and not a file), you have to add the ``-r`` option to ``scp`` (i.e. ``scp -r -oProxyCommand=...``).

.. IMPORTANT::

    While this is handy, for large transfer operations **it is better to use multi-hops** (See :ref:`related documentation ` and :ref:`login nodes `).

With multi-hops, the command reduces to:

.. code-block:: console

    $ scp source_file x5570comp1:~/destination_file

SFTP (Secure File Transfer Protocol)
------------------------------------

:term:`SFTP` clients are interactive file transfer programs (similar to :term:`FTP`), which perform all operations over an encrypted transport.

A variety of graphical SFTP clients are available:

* `WinSCP `_, `MobaXterm `_ (for Windows)
* `CyberDuck `_ (for MacOS)
* `FileZilla `_ (all platforms, but **deprecated**)

When setting up your connection to PSMN in one of the clients above, use the following information:

.. code-block:: bash

    host: x5570comp1
    port: 22
    ssh gateway (or jump host): allo-psmn.psmn.ens-lyon.fr
    port: 22
    username: your login at PSMN
    password: your password at PSMN (if needed)
    ssh key: your personal ssh private key file (preferred method)

MobaXterm or WinSCP (via PuTTY/KiTTY) can use ssh keys, ssh-agent, and multi-hops.

However, as FileZilla has no native support for SSH tunnelling (aka jump hosts/port forwarding), you will have to set up an ssh tunnel on your local machine:

.. code-block:: console

    $ ssh -L 3322:x5570comp1:22 mylogin@allo-psmn.psmn.ens-lyon.fr

then configure FileZilla to connect to *localhost:3322* using your PSMN credentials.

This will also work for MobaXterm or WinSCP on Windows (using OpenSSH). It will also be necessary when more than one hop is needed (localhost -> ssh.psmn -> allo-psmn -> x5570comp1).

OpenSSH also provides a command-line :term:`SFTP` client, named *sftp*, which can take advantage of *ssh-agent*, *ssh keys* and a configured *ProxyJump*.

Example of use:

.. code-block:: console

    $ sftp mylogin@x5570comp1
    Connected to x5570comp1.
    sftp>

There are many tutorials online containing more information about SFTP clients. `Here's one `_.

rsync
-----

If you have complex hierarchies of files to transfer, or if you need to synchronize a set of files and directories between your local machine and PSMN storage, *rsync* will be one of the best tools for the job. It efficiently transfers and synchronizes files across systems by checking the timestamp and size of files. This means that it won't re-transfer files that have not changed since the last transfer, and will complete faster.

Also, if a transfer is interrupted for any reason, you might end up with only part of the files transferred. Rather than restarting the transfer from scratch, *rsync* will only transfer what needs to be transferred: missing files, modified files, etc.

For large transfer operations, it is better to use multi-hops (See :ref:`related documentation ` and :ref:`login nodes `).

For instance, to transfer the whole ``~/test/`` folder tree from my local machine to my home directory on PSMN, I can use the following command (the ``-n`` option makes it a dry run; remove it to perform the actual transfer):

.. code-block:: console

    $ rsync -n -avzP -e ssh ~/test/ mylogin@x5570comp1:~/test

Refer to the *rsync* manual for more options, like these:

.. code-block:: bash

    --dry-run (-n)
    --archive
    --verbose
    --recursive
    --itemize-changes
    --append-verify
    --progress
    --bwlimit=56K
    --numeric-ids

.. WARNING::

    Always test with a dry-run first!

    **As it is very easy** to rsync empty or non-existent data onto existing data (therefore erasing data), we recommend testing with ``-n/--dry-run`` first.

`Here's another tutorial `_.

fpart (+rsync)
--------------

*fpart* generates lists of files that can be fed to *rsync*, correcting some of rsync's default behaviours on large file trees:

- no parallelism -> small parallelism (3 to 4 processes, don't be greedy),
- large batches that don't fit in memory -> small batches (start early, fit in memory),
- decreasing use of bandwidth over time -> frequent 'restarts' maintaining maximum use of bandwidth over time.

See `fpart documentation `_.

.. code-block:: console

    $ cd /Xnfs/planetary
    $ fpart -L -v -f 2000 -Z -o /tmp/planetary.part.out -W \
        'parallel --semaphore -j 4 \
        "rsync -e ssh -az --numeric-ids --files-from=${FPART_PARTFILENAME} /Xnfs/planetary/ user@external_server:/data/planetary"' .

This example scans the */Xnfs/planetary* file tree, creating lists of 2000 files each and feeding them to 4 parallel rsync processes that copy these files, from a PSMN login node, over ssh, to external_server.

Refer to the *fpart* manual for more options and use cases.

Unison
------

``unison`` is a file-synchronization tool that is available on PSMN clusters. See :term:`Unison` `homepage `_ for more.

SSHFS
-----

Sometimes, moving files in and out of the cluster, and maintaining two copies of each of the files you work on, both on your local machine and on PSMN, may be painful. Fortunately, PSMN offers the ability to mount its home filesystem on your local machine, using a secure and encrypted connection (and vice-versa, if your workstation exposes an SSH server).

With SSHFS, a FUSE-based filesystem implementation used to mount remote SSH-accessible filesystems, you can access your files on PSMN as if they were stored locally on your own computer.

.. HINT::

    Be aware that, while very convenient, SSHFS is also **quite slow**, due to FUSE.

This comes in particularly handy when you need to access those files from an application that is not available on PSMN, but that you already use or can install on your local machine. For example: a data processing program that you have licensed for your own computer but that can't be used on PSMN, a specific text editor that only runs on macOS, or any data-intensive 3D rendering software that wouldn't work comfortably enough over a forwarded X11 connection (See also :doc:`Visualization server <../connection/visualization_server>`).

SSHFS is available for all platforms (Linux, macOS and Windows).

.. WARNING::

    **SSHFS on macOS**

    SSHFS on macOS is known to try to automatically reconnect filesystem mounts after resuming from sleep or suspend, even without any valid credentials. As a result, it will generate a lot of failed connection attempts and **likely get your IP address blacklisted** on *ssh.psmn.ens-lyon.fr* or *allo-psmn.psmn.ens-lyon.fr*.

    Make sure to unmount your SSHFS drives before putting your macOS system to sleep to avoid this situation.

For instance, on a Linux machine with SSHFS installed, you could mount your PSMN home directory with the following commands:

.. code-block:: console

    $ mkdir ~/PSMN_home
    $ sshfs mylogin@allo-psmn.psmn.ens-lyon.fr:~/ ~/PSMN_home

(replace ``mylogin`` with your login as provided by PSMN).

And to unmount it:

.. code-block:: console

    $ umount ~/PSMN_home

or:

.. code-block:: console

    $ fusermount -u ~/PSMN_home

For more details and examples about using SSHFS on your local machine, you can refer to `this tutorial `_.
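
When the unmount step is scripted (for example, to run it before suspending a workstation), it can help to check first whether the directory is actually mounted. The following is a minimal sketch, assuming a Linux workstation where ``mountpoint`` (from util-linux) and ``fusermount`` are available; the function names ``is_mounted`` and ``unmount_if_mounted`` are hypothetical helpers, not something provided by PSMN or SSHFS.

```shell
#!/bin/sh
# Unmount an SSHFS directory only if it is currently mounted (illustrative sketch).
set -e

# Succeeds (exit status 0) when the given directory is a mount point.
is_mounted() {
    mountpoint -q "$1"
}

# Unmounts the directory if needed, otherwise only prints a message.
unmount_if_mounted() {
    if is_mounted "$1"; then
        fusermount -u "$1"
        echo "$1 unmounted"
    else
        echo "$1 is not mounted, nothing to do"
    fi
}

# Default to the mount point used in the example above; another path can be
# passed as the first argument.
unmount_if_mounted "${1:-$HOME/PSMN_home}"
```

Calling the script several times in a row is harmless: once the directory is unmounted, later calls only print a message.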