How to make SCP work faster?

How to Copy Files Across a Network/Internet in UNIX/Linux (Red Hat, Debian, FreeBSD, etc.) - scp, tar, rsync

One of the many advantages of Linux/UNIX is how many different ways there are to do a single task. This tutorial is going to show you some of the many ways you can transfer files over a network connection.
In this article we will cover rsync, scp, and tar. Please note that there are many other ways; these are just some of the more common ones. The methods covered all assume that SSH is used for the session, and they are much more secure and reliable than using rcp or ftp. That also makes this tutorial a good starting point for anyone looking for an alternative to FTP for transferring files over a network.

scp
scp, or secure copy, is probably the easiest of all the methods. It is designed as a replacement for rcp, which was essentially cp with network functionality added.
scp syntax
scp [-Cr] /some/file [ more ... ] host.name:/destination/file
-or-
scp [-Cr] [[user@]host1:]file1 [ more ... ] [[user@]host2:]file2
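For example, to copy a single file to a remote machine (the user name, host name, and paths here are only placeholders):
scp /home/user/report.txt user@host.example.com:/tmp/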
Before scp does any copying it first connects via ssh. Unless the proper keys are in place, you will be prompted for a password. You can test that the connection works by running ssh -v hostname
The -r switch is used when you want to copy directories recursively. Please note that the source must be a directory for this to work.
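For example, to copy a whole directory tree (again, the host and paths are only placeholders):
scp -r /home/user/project user@host.example.com:/backup/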
scp encrypts data over your network connection, but by using the -C switch you can also compress the data before it goes over the network. This can significantly decrease the time it takes to copy large files.
Tip: By default scp uses the 3DES encryption algorithm. All encryption algorithms add overhead, but some are faster than others; using -c blowfish can speed things up.
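Putting those two tips together, a compressed copy with a faster cipher might look like this (the host and file names are placeholders; available ciphers vary between OpenSSH versions, and on reasonably recent builds ssh -Q cipher will list what yours supports):
scp -C -c blowfish /var/backups/dump.sql user@host.example.com:/var/backups/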
What scp shouldn't be used for:
1. When you are copying a large number of files, as scp handles each file individually and can be quite slow and resource intensive compared to streaming them all through a single tar or rsync session.
2. When using the -r switch: scp does not understand symbolic links and will blindly follow them, even if it has already made a copy of the file. This can lead to scp copying an infinite amount of data and can easily fill up your hard disk, so be careful.

rsync
rsync has very similar syntax to scp:
rsync -e ssh [-avz] /some/file [ more ... ] host.name:/destination/file

-or-
rsync -ave ssh source.server:/path/to/source /destination/dir
rsync's speciality lies in its ability to analyse files and copy only the changes made to them, rather than copying every file in full. This can lead to enormous improvements when copying a directory tree for a second time.
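A typical run might look like the following (the host and paths are only placeholders); the first invocation copies everything, and running the same command again later transfers only the files that have changed:
rsync -avz -e ssh /home/user/project/ user@host.example.com:/backup/project/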
Switches:
-a Archive mode; you should almost always use this. It preserves file permissions and copies symlinks as symlinks instead of following them.
-v Verbose; lists the files being copied.
-z Enable compression; each file is compressed as it is sent over the pipe. This can greatly decrease transfer time, depending on what sort of files you are copying.
-e ssh Use ssh as the transport; this should always be specified.
Disadvantages of using rsync:
1. Picky syntax; the use of trailing slashes can be confusing (see the example after this list).
2. You have to remember to tell it to use ssh.
3. rsync is not installed on all computers.
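To illustrate the trailing-slash point from the list above (host and paths are again placeholders): without a trailing slash on the source, rsync copies the directory itself into the destination, while with a trailing slash it copies only the directory's contents.
rsync -avz -e ssh /home/user/project user@host.example.com:/backup/
(creates /backup/project on the remote host)
rsync -avz -e ssh /home/user/project/ user@host.example.com:/backup/
(puts the contents of project directly into /backup)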

tar
tar is usually used for archiving, but in this case we are going to create the archive and pipe it over an ssh connection. tar handles large file trees quite well, preserves file permissions and ownership (including on UNIX systems which use ACLs), and works well with symlinks.
The syntax is slightly different because we are piping the archive to ssh:
tar -cf - /some/file | ssh host.name tar -xf - -C /destination
-or with compression-
tar -czf - /some/file | ssh host.name tar -xzf - -C /destination

For HP-UX:
tar -cf - . | ssh user@target-host "cd /destination; tar -xf -"
(run this from inside the source directory; /destination is the directory on the target host where the files should be extracted)


The -c switch tells tar to create an archive, and -f - tells it to write that archive to stdout.
The second tar command uses the -C switch to change into the destination directory on the target host. It takes its input from stdin, and the -x switch extracts the archive.
The second form adds the -z option, which compresses the stream and so decreases the time the transfer takes over the network.
Some people may ask why tar is used at all: it is great for large file trees because it simply streams the data from one host to the other, without doing intensive per-file operations on the tree.
If you use the -v (verbose) switch, be sure to include it only on the second tar command, otherwise you will see the output twice.
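For example, to watch the files as they arrive, the verbose switch goes on the receiving side (using the same placeholder host and paths as above):
tar -cf - /some/file | ssh host.name tar -xvf - -C /destination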
Using tar and piping can also be a great way to copy files locally while making sure file permissions are preserved:
(cd /some/dir; tar cf - .) | (cd /destination/dir; tar xf -)
This may seem like a long command, but it is great for making sure all file permissions are kept intact. It streams the files from the source directory through a pipe and untars them in the target directory inside a sub-shell. Please note that the -z option should not be used for local copies: there is no network transfer to speed up, so the extra CPU overhead of compression will only slow the copy down.
Why tar shouldn't be used:
1. The syntax can be hard to remember
2. It's not as quick to type as scp for a small number of files
3. rsync will beat it hands down for a tree of files that already exist in the destination.
There are several other ways of copying files over a network, such as FTP, NAS, and NFS, but these all require specialised software or services to be set up on the receiving or sending end, and hence are not as convenient as the commands above.
