Correct Online Backup of Databases Using Agents
Takeaway: When backing up databases that are managed by a database engine or server, use the backup software’s built in agents instead of directly backing up the data files themselves. This is the only way to ensure a completely stable, restorable backup of the complete database.
Some of us are familiar with databases – Exchange, SQL, Oracle. These are applications that can manage vast amounts of information, most of which is usually stored in files stored on hard drives, that have file names with extensions like .EDB, .MDF, .LDF, and .DBF.
However, “most” isn’t all, and that’s one reason it is dangerous to rely on backup strategies that back up only files.
Besides the actual data stored in databases (names, addresses, etc), there is “maintenance” information about how the data is to be sorted and how it is to be backed up, log files that can help insure the integrity of the data, information that controls security and access, rules for allocating file space and numbers of data files, info about the current state of the database, transaction control data, roll-back queues and much more.
All this other information may or may not be stored in data files that you can easily see on the hard drive. Some of it may be stored in the Windows registry, some in system files, some in RAM, and some may even be stored on different computers in different locations.
Simply restoring a few data files found on a single hard drive, ignoring all the other important information, may corrupt the database and make it unusable.
Databases are managed by database engines, sometimes called servers – applications that constantly monitor and maintain the data using the “maintenance” information. The database engines make sure access to the data is controlled, and that the data remains in a stable state, with full integrity, no matter how many people access it.
When a user needs access to data stored in databases, the application he uses queries the database engine instead of the data files themselves. It sends a request to the database engine, which in turn initiates a series of operations to locate the data, make sure it is the most current version, and that the user has authority to use it. It then serves the requested data to the user.
Depending on the relevant rules, the database engine might also lock records so other users cannot read it or update it while the requesting user is viewing it. It might also log the user’s access to the data, and perform other operations.
That’s a lot of complicated work to retrieve one record, and as you can see, even this simple operation requires access to a lot of data besides just the data stored in the files.
Database files are usually shared, accessed by many users at the same time. At any given time there might be dozens, hundreds, or thousands of users requesting and updating data. This constantly churns the data files as thousands of read/write requests come through.
The database engines themselves access and update the data files even when there are no users requesting access. They take advantage of periods of low usage to maintain the files, test integrity, flush logs, and clear transactions.
Because of all this activity in the data files, simply backing up the raw files associated with a database is rarely adequate.
Database engines also track and control backups. When a proper backup is done using the methods recommended for the database engine, the engine completely manages the backup to make sure it is done right – backing up everything needed to restore the full database to a fully stable state.
During a typical file-only backup operation (not a proper database backup) the backup program selects files to be backed up and then copies them. To copy a file, the backup program has to read it from start to end, byte by byte, ignoring its internal structure altogether.
The operation of reading a file from start to end takes some time. The backup program starts at the front of the file and reads sequentially through the file until it reaches the end. For a big file this can take a few minutes.
But what if, during the backup, after the backup program has read halfway through the file, someone updates a record that happens to be in the first half of the file – the half that the backup program has already read?
The backup program will continue to read to the end of the file, unaware that the file has changed, and the resulting backup will not contain the most recent change. Worse, if the most recent update is transaction-based (requiring more than one record to be correctly updated) and a corresponding record is in the last half of the file, the backup program will back up only half of the transaction, resulting in corrupted data when restored.
There are operating system functions that can help prevent this particular kind of data corruption, and good backup software makes use of those functions. However, this would solve just one of the many things that can go wrong when backing up ONLY the files.
All database engines have features that enable proper backups, and backup software containing agents specifically designed to back up databases can access these features to do stable, reliable, restorable backups.
Rather than backing up just the data files, backup software containing agents for the different database engines query the database engines to do backups. They send requests to the engines, which in turn initiate their recommended backup procedure.
This may include queuing and suspending all read/write operations to the data files while the engine prepares to serve stable data to the requesting agent, collecting all the various maintenance information required to restore the database, flushing logs, and more, depending on the rules set up by the database administrator.
The result is a backup that contains the correct information, in a stable state, with all supporting files, keys, and rules, tested and verified, with accuracy that ensures a good restore.
Here’s the bottom line. When backing up databases that are managed by a database engine or server, use the backup software’s built in agents instead of directly backing up the data files themselves. This is the only way to ensure a completely stable, restorable backup of the complete database.
Remote Backup Systems, Inc. is the global leader in turnkey Online Backup solutions to MSPs, VARs, ISPs and Managed Hosting Providers, providing leading edge functionality, best-in-class customer support, and an unwavering commitment to product quality and customer satisfaction. With a twenty-seven year history in Remote Backup technologies, RBS is headquartered in Memphis, Tennessee, USA. Rob Cosgrove’s latest book, The Online Backup Guide for Service Providers is available at Amazon.com and bookstores.