Why does it take so long to do an Online Backup?
by Rob Cosgrove
What is the difference between copying a file and backing it up?
We are often asked why our software takes “so long” to do backups – longer than simply copying files. The answer is both simple and complicated.
Many factors affect the speed of backups. Some are in your control and some are not. The simple answer is that RBackup does far more than copy files, which is why it takes longer than copying files. It doesn’t copy files. It backs files up. There is a vast difference.
Backing up files right is a complex process that RBS has perfected from performing many millions of backups and millions of restores since our company was started in 1987. It is definitely NOT a simple copy process.
Backup is a critical process that must be dependable, reliable, and perfect. Since we are also sending files offsite over a public network, it must also be secure and private. Since offsite backups are governed by so many security and privacy regulations, it must also comply with all of them.
Here is the complete process RBackup uses to back up files. Each process requires some time. Some processes are executed for each file, some for each batch of files, and some for the entire backup session.
Initialize TCP Connection – Open a connection with the Network Interface Adapter.
Log into RBS Server – Contact the RBS Server. This involves sending a message to the RBS Server and looking for a valid response. If the IP address of the Client is listed in the Server’s firewall as “deny” the Server will not respond.
Authenticate Client – The RBS Server identifies itself and then sends the Client a unique encrypted Session Authentication Token. The Client decrypts the token, transforms it using a proprietary algorithm, encrypts it, and sends it back to the Server with some proprietary cargo. The Server receives and validates the token and performs various functions based on the associated cargo. The Client will not be allowed to continue if it does not authenticate.
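The token exchange above follows the general shape of a challenge-response handshake. RBackup's actual transform is proprietary and not described here, so the following is only a minimal sketch of the general idea, using an HMAC as a stand-in. The shared key and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared secret; the article does not say how keys are provisioned.
SHARED_KEY = b"example-shared-secret"

def server_issue_challenge() -> bytes:
    """Server side: create a random, single-use session token."""
    return secrets.token_bytes(32)

def client_respond(challenge: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Client side: transform the token with a keyed hash.

    HMAC-SHA256 is an assumption standing in for the proprietary transform.
    """
    return hmac.new(key, challenge, hashlib.sha256).digest()

def server_validate(challenge: bytes, response: bytes, key: bytes = SHARED_KEY) -> bool:
    """Server side: recompute the expected response and compare in constant time."""
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

A Client that does not hold the right key produces a response the Server rejects, which corresponds to the Client not being allowed to continue.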
Assign a Data Port and IP Address – The Server assigns the Client a unique data port and IP address to use for file transmission and for some commands. The Client receives the Server’s assignment and opens a second connection to the Server on the assigned IP address and data port. There is a further authentication on the data port to authenticate the Client.
Begin All Files Process – The Client begins its main process loop to back up all files. This establishes a process start point in case the process is interrupted or aborted before completion. At this point, and until the End All Files Process, the CPU speed and disk speed play a larger part in determining how fast a backup proceeds.
Select Files and Objects – The selection phase of the backup process can take a long time, depending on the number of files selected for backup, the File Selection Criteria you use, disk speed, network speed, whether you are using AutoSelect or manual file selection, and whether any files are locked by a local application.
Unlike simple copy processes, RBackup can back up locked files. Some applications lock files exclusively, which prohibits any other application from accessing them, even for backup. But since a backup process, unlike a copy process, must be 100% reliable, RBackup must back them up regardless.
So RBackup first checks whether the Windows operating system supports Microsoft’s Volume Shadow Copy Service (VSS). If it does, RBackup switches on its VSS driver. If it does not, RBackup switches on its legacy open files driver.
If VSS is ON, RBackup first checks to see if a file is listed in Windows’ Locked Files list. If it is, RBackup takes a snapshot of the locked file using VSS. While it takes a little time, this is a relatively fast process. Depending on the file size and time, it might require some disk space.
If VSS is OFF, RBackup attempts to open each file for reading. If it fails, RBackup uses its legacy open file system driver to snapshot the file. This is slower than using VSS.
If the file is not locked, RBackup locks it, then opens the file for reading and begins working with it.
File Selection Criteria
Archive Bit selection is the fastest because RBackup only has to examine the archive bit of each file. It scans the file selections and examines only the archive bit of each file that matches your selections.
FastPick is next. RBackup examines the date and time of each file or object selected and compares them to the date and time of the last backup. Files that have dates and times newer than that of the last backup are backed up.
Date/Time takes the longest. For each file that matches your file selections, RBackup has to examine the date and time of the file and compare it to the date and time of the last time each specific file was backed up. This requires RBackup to look up each file in its catalog, examine each file on the disk, and compare dates.
If you use AutoSelect rather than selecting files manually, RBackup might examine all files on the hard drive and all files on mapped drives, depending on how you have defined the AutoSelect function. Because of this, AutoSelect is much slower than selecting files manually.
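The difference in cost between FastPick and Date/Time selection can be sketched as follows. This is an illustration only, not RBackup's code: the function names are hypothetical, and the catalog is assumed to be a simple dictionary mapping each path to the time that file was last backed up.

```python
import os

def fastpick(paths, last_backup_time):
    """FastPick: compare every file against one value, the time of the last backup."""
    return [p for p in paths if os.path.getmtime(p) > last_backup_time]

def date_time_select(paths, catalog):
    """Date/Time: look up each file's own last-backup time before comparing.

    `catalog` is a hypothetical dict of path -> timestamp of that file's last backup.
    """
    selected = []
    for p in paths:
        last = catalog.get(p)  # None means the file has never been backed up
        if last is None or os.path.getmtime(p) > last:
            selected.append(p)
    return selected
```

FastPick needs a single timestamp for the whole run, while Date/Time needs one catalog lookup per file, which is where the extra time goes.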
Extract Changes – Based on the information RBackup gets from the file selection process, it extracts changes from the file system depending on the selected Backup Method and File Type.
Incremental / Differential – Back up only files that have changed since the last backup. This is the quickest.
Full – Back up the entire file, regardless of the File Selection Criteria.
BitBackup – Back up only the parts of the files that have changed since the last backup. This requires RBackup to compare the current file to the previous version of the file, so it takes the longest and uses the most drive space and CPU time.
File Backup – If the file selected is a simple file like a word processing document or spreadsheet, RBackup can process it quickly.
Database Backup – If the file selected is a database like Exchange, SQL Server, Active Directory, or SharePoint, RBackup will switch in one of its built-in agents to extract the data changed since the last backup. This allows RBackup to back up only the Objects that have changed, like Mailboxes, Records, and Directory Objects. It is the slowest form of backup by File Type.
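To show why a method like BitBackup needs both versions of a file and extra CPU time, here is a minimal sketch of block-level change detection. It is not RBackup's algorithm, which the article does not describe. The fixed 4096-byte block size and the function names are assumptions, and the sketch does not handle a file that has become shorter.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for illustration

def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Hash each fixed-size block of the data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]

def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Return {block_index: block_bytes} for blocks of `new` that differ from `old`
    or that lie beyond the end of `old`."""
    old_hashes = block_hashes(old, block_size)
    delta = {}
    for i, digest in enumerate(block_hashes(new, block_size)):
        if i >= len(old_hashes) or old_hashes[i] != digest:
            start = i * block_size
            delta[i] = new[start:start + block_size]
    return delta
```

Both versions have to be read and hashed in full even when only one block changed, which is the trade-off: less data to transmit in exchange for more local work.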
Calculate File/Object Signature – RBackup must calculate a pre-compression, pre-encryption digital signature for each file or object it backs up. This allows the Client and Server to verify the authenticity, origin, and content of each file securely before and after transmitting it to the Server without the need to view the file’s contents.
Compress – Files and Objects are then compressed using one of five built-in lossless compression algorithms. RBackup selects the optimum algorithm for each file based on the file’s contents and encoding method.
Encrypt – Compressed files and objects are encrypted using the selected encryption method for the current backup set. Encryption methods have different speeds depending on their algorithm and key length. Generally, the longer the key length, the slower the encryption speed; however, the various algorithms available also have different speeds.
Close the File – RBackup instructs the Windows file system that it is now finished with the file, and authorizes other applications to access it.
Verify Locally – The file or object is then verified to be sure that the encryption and compression process did not alter the file’s contents.
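The signature, compression, and local verification steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: a SHA-256 digest stands in for RBackup's signature, zlib stands in for its compression algorithms, and the encryption step is omitted because Python's standard library has no cipher. The function names are hypothetical.

```python
import hashlib
import zlib

def prepare(data: bytes):
    """Compute a pre-compression digest, then compress the data."""
    signature = hashlib.sha256(data).hexdigest()  # stand-in for the real signature
    packed = zlib.compress(data, 9)               # stand-in for the chosen algorithm
    return signature, packed

def verify_locally(signature: str, packed: bytes) -> bool:
    """Reverse the processing and confirm the contents still match the digest."""
    restored = zlib.decompress(packed)
    return hashlib.sha256(restored).hexdigest() == signature
```

Each file is read, hashed, compressed, decompressed, and hashed again before it is ever sent, which is work a plain copy never does.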
Digitally Sign – The file is then digitally signed using the previously calculated signature. The signature is appended to the backup copy. This will be used by the Server and the Client later to authenticate the file’s contents without the need to read the file. This ensures that no viruses or Trojans get attached to the backup copy after transmission to the Server and that the file has not been altered since it was signed. On restore, the file is guaranteed to be 100% identical to its original version.
Alias Filename – To comply with worldwide data security and privacy regulations the filename is removed from the file and replaced with a unique identifier that indexes each file to its metadata in the Catalog.
Store in Cache – The backup file, now secure, is stored in the local cache where it will stay until it has been verified as properly stored on the Server. Then it will be erased.
Index the File – The secure file structure of RBackup indexes filenames to folder names, and breaks up the data into manageable chunks that Windows can handle most efficiently. This requires indexing files to folder names in the server’s data store, the file’s metadata, a signature of its encryption key, encryption method, group type, backup set, compression type, and storage name.
Assign Directory Names – The file is assigned to a directory on the RBS Server, with no more than 5,000 files per directory (unless set differently).
Catalog – All the file’s metadata, including its index, directory name, and original filename are stored in the local Catalog. This is a database maintained by the RBackup Client for quick lookup on restore.
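Aliasing, directory assignment, and cataloging can be pictured with a small sketch. It assumes a random UUID as the unique identifier and a zero-padded counter as the directory name. Both are hypothetical choices for illustration; only the 5,000-files-per-directory default comes from the text above.

```python
import uuid

MAX_FILES_PER_DIR = 5000  # default stated above; configurable in the real product

class Catalog:
    """Hypothetical local catalog mapping each alias to the file's metadata."""

    def __init__(self):
        self.entries = {}
        self.count = 0

    def add(self, original_name: str) -> str:
        """Replace the filename with an opaque alias and record where it will be stored."""
        alias = uuid.uuid4().hex
        directory = f"{self.count // MAX_FILES_PER_DIR:06d}"  # assumed naming scheme
        self.entries[alias] = {"original_name": original_name, "directory": directory}
        self.count += 1
        return alias

    def lookup(self, alias: str) -> dict:
        """Recover the original name and directory, as a restore would."""
        return self.entries[alias]
```

The Server only ever sees the alias. The original filename lives in the local Catalog, which is why the Catalog itself has to be backed up later in the process.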
Transmit – Each backup file, now renamed, compressed, encrypted, signed and secure, is then transmitted to the Server. Depending on the settings of the software, RBackup might transmit several files at a time. File transmission time is determined primarily by Internet speed, but is also affected by the number of ports assigned to each simultaneous connection (at the Server) and other factors in the control of the Service Provider.
Wait for Server to Acknowledge – The Client waits for the Server to acknowledge that it has received the file. Backups of previously prepared files may continue in other threads, so the client software rarely sits idle “waiting” for the server.
Server Verify – The Server stores the file in its Data Store. The Server then performs a local validation based on the file’s metadata and its signature.
Acknowledge – If the Server determines that the file is properly stored it sends the Client some metadata and a proprietary token that identifies the file it has just received. The Client calculates the token and authenticates it, then notifies the Server whether its validation is correct. If it is not, the transmission process restarts.
Erase from Cache – After acknowledgement the Client erases the backup file from its local cache.
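The rule that a file leaves the cache only after the Server acknowledges it can be sketched as below. The `send` callable and the retry limit are assumptions made for illustration. The article says only that transmission restarts when validation fails.

```python
def transmit_with_ack(cache: dict, alias: str, send, max_attempts: int = 3) -> bool:
    """Send one cached backup file and erase it only once it is acknowledged.

    `send(alias, payload)` is a hypothetical callable that returns True when the
    server confirms the file was stored and validated.
    """
    payload = cache[alias]
    for _ in range(max_attempts):
        if send(alias, payload):
            del cache[alias]  # erase from cache only after acknowledgement
            return True
    return False  # file stays in the cache for a later retry
```

Keeping the file cached until it is confirmed means a failed or interrupted transmission never requires preparing the file again.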
Exchange Tokens – Periodically during the transmission the Server will ask for a token from the Client to re-authenticate. Responses vary in content and protocol, and happen at random intervals. Some responses are fake, and the Server expects them to be fake. Some are real, and are sent to the Server with a pre-agreed timing. There may be fake responses before the real expected one. The Client also queries the Server similarly. All this makes sure that the Client communicating with the Server is still the authenticated Client, and the Server is still the authenticated Server.
End All Files Process – After all files are processed, the main process loop is closed and the server is signaled.
Validate Batch – The Client and Server exchange information about the batch to validate that all files were received and stored in their original forms, unchanged.
Store Key Escrow Files – If Key Escrow is turned on, the Client transmits and validates its Escrow files.
Perform Delete and File Move Functions – The Client searches its catalog and compares the current date/time with the dates and times of all backup sets and all previously stored files and objects to determine if it needs to prune files from the server. It sends the Server encrypted instructions to delete or move backup files depending on the file retention protocol previously defined for each backup set.
Update Local Catalog – The Client updates its local catalog with any changes induced by the pruning process.
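The pruning pass can be pictured as a date comparison over the catalog. This is a minimal sketch under a simple assumption: each backup set keeps files for a fixed number of days. The real retention protocol may be richer, and the field and function names are hypothetical.

```python
from datetime import datetime, timedelta

def files_to_prune(catalog: dict, retention_days: int, now: datetime) -> list:
    """Return the aliases of stored files older than the retention period."""
    cutoff = now - timedelta(days=retention_days)
    return [alias for alias, entry in catalog.items()
            if entry["backed_up_at"] < cutoff]

def apply_prune(catalog: dict, aliases: list) -> None:
    """Update the local catalog after the server has been told to delete the files."""
    for alias in aliases:
        del catalog[alias]
```

Every stored file is examined on every run, so the cost of this pass grows with the size of the catalog rather than with the size of the current backup.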
Validate Local Catalog Synced with Server – If this process is turned ON, the Client and Server verify that the local Catalog and the Server’s Data Store are in sync. If not, the local Catalog is updated and messages logged for the Service Provider.
Compare Catalog – RBackup compares the last copy of the catalog it stored on the Server with its current copy to extract changes.
Extract Changes from Catalog – If the catalog has changed, the changes are extracted.
Prepare Catalog for Backup – The changes are prepared for backup exactly the same way files and objects are prepared. They are compressed, encrypted, and signed.
Store Catalog – The catalog changes are sent to the Server and stored.
Verify Catalog – The Client and Server exchange metadata and signatures through secure tokens to validate the catalog changes were properly stored on the Server.
End of Backup Processes – At the end of the backup both the RBackup server and the Client do some housekeeping tasks like flushing cache, updating databases, writing logs, and closing and releasing ports. In addition, the Client might run any command files that have been defined by the Service Provider.
Other things that might affect speed:
Throttling – RBackup has a function that throttles bandwidth. It is set to medium by default, but it can be turned way down or way up. This makes a huge difference in the speed of transmission, but not in the speed of file preparation.
Priority – RBackup can be set to use High, Medium, or Low priority. This affects how much CPU time the backup process takes from other applications. It is set to Medium by default.
CPU speed – The speed of the CPU affects the file preparation process, starting with the Begin All Files Process phase.
Internet Speed – Of course Internet speed plays a large part in transmission speed. Sending files UP your Internet connection (backing them up) is usually far slower than downloading files (restoring them). Most Internet Service Providers specify their speeds by advertising only the download speed, the fastest one. The remote-backup.com website has several articles and calculators to help predict transmission speeds.
File Size – Big files take a long time to back up. That’s just simple physics.
Number of files – RBackup needs about 1 second per file to handle overhead tasks like validating signatures and verifying that files have been correctly stored on the Server, and this can add a lot to the backup time. For example, if you are backing up 100,000 files, that’s 100,000 seconds, or about 1,667 minutes, or nearly 28 hours – just in overhead time for file validation, not including the time needed to prepare and transmit the files.
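The per-file overhead and the upload speed can be combined into a rough estimate. The sketch below uses the 1 second per file figure from the text. The function name, the 10 GB total, and the 5 Mbps upload speed in the usage note are hypothetical example values. Adding the two times together is a pessimistic simplification, since transmission and preparation can overlap in separate threads.

```python
def estimate_backup_seconds(num_files: int, total_bytes: int,
                            upload_mbps: float, overhead_per_file: float = 1.0):
    """Return (overhead, transmit, total) in seconds.

    Ignores file preparation time and any overlap between threads.
    """
    overhead = num_files * overhead_per_file
    transmit = total_bytes * 8 / (upload_mbps * 1_000_000)  # bits / bits per second
    return overhead, transmit, overhead + transmit
```

For 100,000 files totalling 10 GB on a 5 Mbps upload link, this gives 100,000 seconds of overhead against 16,000 seconds of transmission. With many small files, the file count matters far more than the data size.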
There are many ways you can speed up backups by changing various settings in the RBackup software. See your documentation for specifics, or search the Knowledge Base at http://help.remote-backup.com.
There are articles that can help you adjust your file selections and other settings in the Best Practices links at http://www.remote-backup.com/online-backup-index.htm
If you made it through this article to the end, I hope you are convinced that backing up files is vastly different from copying files, and that doing proper backups is worth the time it takes.