responses that do not represent true discrepancies (e.g., directory entries returned in different orders), and then assists the user with visualizing the “problem” RPCs. Off-line post-processing is useful for reducing on-line overheads as well as allowing the user to refine comparison rules without losing data from the real environment (since the log is a filtered trace). 4.2 State synchronization The synchronization module updates the SUT to enable useful comparisons. Doing so requires making the SUT’s internal state match the reference server’s to the point that the two servers’ responses to a given RPC could be expected to match. Fortunately, NFSv3 RPCs generally manipulate only one or two file objects (regular files, directories, or links), so some useful comparisons can be made long before the entire file system is copied to the reference server. Synchronizing an object requires establishing a point within the stream of requests where comparison could begin. Then, as long as RPCs affecting that object are handled in the same order by both servers, it will remain synchronized. The lifetime of an object can be viewed as a sequence of states, each representing the object as it exists between two modifications. Synchronizing an object, then, amounts to replicating one such state from the reference server to the SUT. Performing synchronization offline (i.e., when the reference server is not being used by any clients) would be straightforward. But, one of our goals is the ability to insert a SUT into a live environment at runtime. This requires dealing with object changes that are concurrent with the synchronization process. The desire not to disrupt client activity precludes blocking requests to an object that is being synchronized. The simplest solution would be to restart synchronization of an object if a modification RPC is sent to the reference server before it completes. But, this could lead to unacceptably slow and inefficient synchronization of large, frequently-modified objects. Instead, our synchronization mechanism tracks changes to objects that are being synchronized. RPCs are sent to the reference server as usual, but are also saved in a changeset for later replay against the SUT. Figure 3 illustrates synchronization in the presence of write concurrency. The state S1 is first copied from the reference server to the SUT. While this copy is taking place, a write (Wr1) arrives and is sent to the reference server. Wr1 is not duplicated to the SUT until the copy of S1 completes. Instead, it is recorded at the Tee. When the copy of S1 completes, a new write, Wr1’, is constructed based on Wr1 and sent to the SUT. Since no further concurrent changes need to be replayed, the object is marked reference object lifetime SUT object lifetime S2 Wr1 Copy S1 S1 Wr1’ Time S1 S2 Figure 3: Synchronization with a concurrent write. The top series of states depicts a part of the lifetime of an object on the reference server. The bottom series of states depicts the corresponding object on the SUT. Horizontal arrows are requests executed on a server (reference or SUT), and diagonal arrows are full object copies. Synchronization begins with copying state S1 onto the SUT. During the copy of S1, write Wr1 changes the object on the reference server. At the completion of the copy of S1, the objects are again out of synchronization. Wr1’ is the write constructed from the buffered version of Wr1 and replayed on the SUT. synchronized and all subsequent requests referencing it are eligible for duplication and comparison. Even after initial synchronization, concurrent and overlapping updates (e.g., Wr1 and Wr2 in Figure 4) can cause a file object to become unsynchronized. Two requests are deemed overlapping if they both affect the same state. Two requests are deemed concurrent if the second one arrives at the relay before the first one’s response. This definition of concurrency accounts for both network reordering and server reordering. Since the Tee has no reliable way to determine the order in which concurrent requests are executed on the reference server, any state affected by both Wr1 and Wr2 is indeterminate. Resynchronizing the object requires re-copying the affected state from the reference server to the SUT. Since overlapping concurrency is rare, our Tee simply marks the object unsynchronized and repeats the process entirely. The remainder of this section provides details regarding synchronization of files and directories, and describes some synchronization ordering enhancements that allow comparisons to start more quickly. Regular file synchronization: A regular file’s state is its data and its attributes. Synchronizing a regular file takes place in three steps. First, a small unit of data and the file’s attributes are read from the reference server and written to the SUT. If a client RPC affects the object during this initial step, the step is repeated. This establishes a point in time for beginning the changeset. Second, the remaining data is copied. Third, any changeset entries are replayed. A file’s changeset is a list of attribute changes and written-to extents. A bounded amount of the written data reference object lifetime SUT object lifetime Time X S2 S2 Copy S2 S1 S1 Wr1, Wr2 Figure 4: Re-synchronizing after write concurrency. The example begins with a synchronized object, which has state S1 on both servers. When concurrent writes are observed (Wr1 and Wr2 in this example), the Tee has no way of knowing their execution order at the reference server. As a consequence, it cannot know the resulting reference server state. So, it must mark the object as unsynchronized and repeat synchronization. is cached. If more data was written, it must be read from the reference server to replay changes. As the changeset is updated, by RPCs to reference server, overlapping extents are coalesced to reduce the work of replaying them; so, for example, two writes to the same block will result in a single write to the SUT during the third step of file synchronization. Directory synchronization: A directory’s state is its attributes and the name and type of each of its children. 5 This definition of state allows a directory to be synchronized regardless of whether its children are synchronized. This simplifies the tracking of a directory’s synchronization status and allows the comparison of responses to directory-related requests well before the children are synchronized. Synchronizing a directory is done by creating missing directory entries and removing extraneous ones. Hard links are created as necessary (i.e., when previously discovered file handles are found). As each unsynchronized child is encountered, it is enqueued for synchronization. When updates occur during synchronization, a directory’s changeset will include new attribute values and two lists: entries to be created and entries to be removed. Each list entry stores the name, file handle, and type for a particular directory entry. Synchronization ordering: By default, the synchronization process begins with the root directory. Each unknown entry of a directory is added to the list of files to be synchronized. In this way, the synchronization process works its way through the entire reference file system. One design goal is to begin making comparisons as 5File type is not normally considered to be part of a directory’s contents. We make this departure to facilitate the synchronization process. During comparison, file type is a property of the file, not of the parent directory. quickly as possible. To accomplish this, our Tee synchronizes the most popular objects first. The Tee maintains a weighted moving average of access frequency for each object it knows about, identifying accesses by inspecting the responses to lookup and create operations. These quantities are used to prioritize the synchronization list. Because an object cannot be created until its parent directory exists on the SUT, access frequency updates are propagated from an object back to the file system root. 4.3 Comparison The comparison module compares responses to RPC requests on synchronized objects. The overall comparison functionality proceeds in two phases: on-line and postprocessed. The on-line comparisons are performed at runtime, by the Tee’s comparison module, and any nonmatching responses (both responses in their entirety) are logged together with the associated RPC request. The logged information allows post-processing to eliminate false non-matches (usually with more detailed examination) and to help the user to explore valid non-matches in detail. Most bitwise-comparable fields are compared on-line. Such fields include file data, file names, soft link contents, access control fields (e.g., modes and owner IDs), and object types. Loosely-comparable fields include time values and directory contents. The former are compared on-line, while the latter (in our implementation) are compared on-line and then post-processed. Directory contents require special treatment, when comparison fails, because of the looseness of the NFS protocol. Servers are not required to return entries in any particular order, and they are not required to return any particular number of entries in a single response to a READDIR or READDIRPLUS RPC request. Thus, entries may be differently-ordered and differently-spread across multiple responses. In fact, only when the Tee observes complete listings from both servers can some non-matches be definitively declared. Rather than deal with all of the resulting corner cases on-line, we log the observed information and leave it for the post-processor. The post-processor can link multiple RPC requests iterating through the same directory by the observed file handles and cookie values. It filters log entries that cannot be definitively compared and that do not represent mismatches once reordering and differing response boundaries are accounted for. 4.4 Implementation We implemented our Tee in C++ on Linux. We used the State Threads user-level thread library. The relay runs |