
How do I synchronize target filesystems?

Synchronizing directories is the process of making some part of a filesystem (files, directories and subdirectories) identical on more than one computer. In Everserve, this means having multiple peers in a community duplicate a dataset sent to them by the community’s publisher.

This document describes the default behavior of package execution when the delta and version attributes are used. The process is limited to synchronizing a peer’s filesystem with the publisher's filesystem, and only files that are new or changed on the publisher are propagated. As a result, the directories may not match exactly: files removed from the publisher are not removed from the target, and renamed files are duplicated on the target peer. Everserve easily handles new files or subdirectories added to the dataset and modifications to existing files, and its package attributes send these limited changes efficiently to the community peers. The current release resends the entire file when an existing file changes. Future releases of Everserve will support transmitting only the changes rather than the complete file.

Files that are deleted on the publisher are not removed from peer systems with this level of synchronization. If a file or directory is renamed, it is simply copied under its new name, so the copy with the previous name remains on the target peer.
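The effect of these rules can be illustrated with a small model. The Python sketch below is not Everserve code. It represents each filesystem as a hypothetical dictionary of path to content, and the function name `apply_delta` is invented for this illustration. It only models the behavior described above: new and changed files are copied, and nothing is ever removed from the peer.

```python
def apply_delta(publisher_before, publisher_after, peer):
    """Copy new or changed files to the peer; never delete anything.

    This models only the documented behavior, not Everserve's
    actual implementation.
    """
    result = dict(peer)
    for path, content in publisher_after.items():
        # A file is sent when it is new or its content has changed.
        if publisher_before.get(path) != content:
            result[path] = content
    return result


# Hypothetical publisher state at the previous delivery and now.
before = {"a.txt": "v1", "b.txt": "v1", "old_name.txt": "v1"}
after = {"a.txt": "v2", "new_name.txt": "v1", "c.txt": "v1"}

# The peer starts out identical to the previous publisher state.
peer = apply_delta(before, after, dict(before))

# Files still on the peer that the publisher no longer has.
leftovers = sorted(set(peer) - set(after))
print(leftovers)
```

In this model the deleted file `b.txt` and the renamed file's previous copy `old_name.txt` both remain on the peer, which is why the additional techniques described later are needed for an exact match.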

To use Everserve's synchronization features, a package specification must set its version attribute to true to record the filesystem state. After the first delivery of the complete dataset, Everserve packages that set the delta flag to true transmit updates only: they send just the filesystem structures that are new or changed since the last 'version' was sent.

Here is an example package enabling basic updates for the Windows OS:

<?xml version="1.0" ?>
<!DOCTYPE spec-container SYSTEM "package_spec.dtd">
<spec-container name="synch" delta="true" version="true" >
<spec-version version="1.0" earliest="1.0"/>
<directory-spec source="c:\test" target="c:\test" recurse="true"/>
</spec-container >

A similar package for a Unix environment:

<?xml version="1.0" ?>
<!DOCTYPE spec-container SYSTEM "package_spec.dtd">
<spec-container name="synch" delta="true" version="true" >
<spec-version version="1.0" earliest="1.0"/>
<directory-spec source="/tmp/test"
target="/tmp/test"
recurse="true" />
</spec-container >

What follows are some techniques for more advanced synchronization. These methods enable Everserve to propagate complete filesystems. The packages are designed so that all peers named in the deliver command end up with exactly the same filesystem as the publisher system sending the package.

Selective deletion using the receipt and a script

After a package specification is executed for the first time, a receipt that includes the packing list of the delivered files is stored in the database. The most efficient and sophisticated method for ensuring complete synchronization with the publisher is to create a script that iterates over this list and compares it to the publisher filesystem in advance of the next delivery. If a file or directory was removed from the publisher, a new package containing delete command-specs for the missing objects can be created to correct the peers.
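A minimal sketch of such a script, in Python, might look like the following. Several things here are assumptions rather than documented Everserve behavior: the packing list is taken to be already extracted from the receipt as a plain list of paths (the receipt format and how to read it from the database are not covered here), target paths are assumed to equal source paths as in the samples in this document, the delete command is the Unix `rm -rf` used in the sample further below, and the function names and the package name `cleanup` are hypothetical.

```python
import os
import shlex
from xml.sax.saxutils import quoteattr


def missing_on_publisher(packing_list, exists=os.path.exists):
    """Return delivered paths that no longer exist on the publisher."""
    return [path for path in packing_list if not exists(path)]


def build_delete_spec(missing, name="cleanup"):
    """Build a package spec with one delete command-spec per missing path.

    Assumes Unix peers and that target paths equal source paths.
    """
    lines = [
        '<?xml version="1.0" ?>',
        '<!DOCTYPE spec-container SYSTEM "package_spec.dtd">',
        '<spec-container name=%s version="false" delta="false" >'
        % quoteattr(name),
        '<spec-version version="1.0" earliest="1.0"/>',
    ]
    for path in missing:
        # shlex.quote guards paths containing spaces; quoteattr escapes
        # the command for use as an XML attribute value.
        command = "rm -rf " + shlex.quote(path)
        lines.append("<command-spec commandline=%s />" % quoteattr(command))
    lines.append("</spec-container>")
    return "\n".join(lines)


# Hypothetical packing list taken from a previous receipt.
packing_list = ["/tmp/test/file1", "/tmp/test/file2"]
# For the demonstration, pretend only file1 still exists on the publisher.
missing = missing_on_publisher(
    packing_list, exists=lambda p: p.endswith("file1")
)
print(build_delete_spec(missing))
```

In real use the `exists` argument would be left at its default so that the check runs against the publisher's actual filesystem. The generated package would then be delivered to the peers before the next delta package.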

Full directory deletion and redelivery

An alternative means of ensuring that each peer's filesystem matches the publisher’s is to remove the previously delivered dataset and send the complete package every time. In this case, the version and delta flags are set to false in the package spec. The package spec below provides an example that you can customize. It does not take advantage of the Everserve features for more efficient delivery of datasets via updates. The first set of commands in the package spec issues delete commands that completely remove the file or directory structure before the complete dataset is redelivered with file-spec or directory-spec package attributes.

Sample packages to provide this complete synchronization follow. Note that the use of scripts or success codes might be a better solution. These samples are simplified for demonstration purposes.

Windows OS:

<?xml version="1.0" ?>
<!DOCTYPE spec-container SYSTEM "package_spec.dtd">
<spec-container name="rmsynch" version="false" delta="false" >
<spec-version version="1.0" earliest="1.0"/>
<!-- first - remove the entire target directory -->
<command-spec commandline="rmdir /s /q c:\test" />
<!-- now resend the complete directory -->
<directory-spec source="c:\test"
target="c:\test"
recurse="true" />
<!-- Sending individual files is also possible -->
<file-spec source="c:\test\file1" target="c:\test\file1" />
<file-spec source="c:\test\file2" target="c:\test\file2" />
</spec-container>

And for the Unix environment:

<?xml version="1.0" ?>
<!DOCTYPE spec-container SYSTEM "package_spec.dtd">
<spec-container name="rmsynch" version="false" delta="false" >
<spec-version version="1.0" earliest="1.0"/>
<!-- first - remove the entire target directory -->
<command-spec commandline="rm -rf /tmp/test" />
<!-- now resend the complete directory -->
<directory-spec source="/tmp/test"
target="/tmp/test"
recurse="true" />
<!-- Sending individual files is also possible -->
<file-spec source="/tmp/test/file1" target="/tmp/test/file1" />
<file-spec source="/tmp/test/file2" target="/tmp/test/file2" />
</spec-container>

The points to remember are these. The delta and version flags enable efficient delivery of filesets through Everserve by transmitting only the added and changed files of the dataset. Future releases will support an even more efficient delivery in which only the changed pieces of a file (rather than the complete file) are sent out to the community, along with the newly added data. Since file-spec and directory-spec package attributes do not delete existing files, extra steps must be taken if the publisher and peer filesystems are required to match exactly.

Download packages in a zip file