can begin detailed analysis of those RPC requests and responses. Comparison-based verification offers a simpler solution, assuming that the benchmark runs properly when using the reference server. Comparing the SUT’s responses to the problem-free responses produced by the reference server can quickly identify the specific RPC requests for which there are differences. Comparison provides the most benefit when problems involve nuances in responses that cause problems for clients (as contrasted with problems where the server crashes)—often, these will be places where the server implementors interpreted the specification differently. For such problems, the exact differences between the two servers’ responses can be identified, providing detailed guidance to the developer who needs to find and fix the implementation problem.

2. Bug compatibility: In discussing vagueness in specifications, we have noted that some aspects are often open to interpretation. Sometimes, implementors misinterpret them even if they are not vague. Although it is tempting to declare both situations “the other implementor’s problem,” that is simply not a viable option for those seeking to achieve widespread use of their server. For example, companies attempting to introduce a new server product into an existing market must make that server work for the popular clients. Thus, deployed clients introduce de facto standards that a server must accommodate. Further, if clients (existing and new) conform to particular “features” of a popular server’s implementation (or a previous version of the new server), then that again becomes a de facto standard. Some use the phrase “bug compatibility” to describe what must be achieved given these issues.

As a concrete example of bug compatibility, consider the following real problem encountered with a previous NFSv2 server we developed: Linux clients (at the time) did not invalidate directory cookies when manipulating directories, which our interpretation of the specification (and the implementations of some other clients) indicated should be done. So, with that Linux client, an “rm -rf” of a large directory would read part of the directory, remove those files, and then do another READDIR with the cookie returned by the first READDIR. Our server compressed directories when entries were removed, and thus the old cookie (an index into the directory) would point beyond some live entries after some files were removed—the “rm -rf” would thus miss some files. We considered keeping a table of cookie-to-index mappings instead, but without a way to invalidate entries safely (there are no definable client sessions in NFSv2), the table would have to be kept persistently; we finally just disabled directory compression. (NFSv3 has a “cookie verifier,” which would allow a server to solve this problem, even when other clients change the directory.)

Comparison-based verification is a great tool for achieving bug compatibility. Specifically, one can compare each response from the SUT with that produced by a reference server that implements the de facto standard. Such comparisons expose differences that might indicate differing interpretations of the specification or other forms of failure to achieve bug compatibility. Of course, one needs an input workload that has good coverage to fully uncover de facto standards.
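To make this comparison step concrete, a field-by-field check of each duplicated response against the reference server’s response might look like the following Python sketch. All of the names here (COMPARABLE_FIELDS, log_mismatch, and the dictionary-shaped decoded responses) are illustrative assumptions rather than our Tee’s actual interface; the comparison rules our Tee really uses are discussed in the next section.

    # Illustrative sketch: flag responses where the SUT diverges from the
    # de facto standard set by the reference server. Decoded responses are
    # assumed to be plain dictionaries mapping field names to values.
    COMPARABLE_FIELDS = {"status", "file_data", "attributes"}  # hypothetical

    def log_mismatch(request, mismatches):
        for field, ref_val, sut_val in mismatches:
            print(f"{request!r}: field {field!r}: "
                  f"reference={ref_val!r} SUT={sut_val!r}")

    def check_bug_compatibility(request, ref_resp, sut_resp):
        """Return True if the SUT's response matches the reference's."""
        mismatches = [(f, ref_resp.get(f), sut_resp.get(f))
                      for f in COMPARABLE_FIELDS
                      if ref_resp.get(f) != sut_resp.get(f)]
        if mismatches:
            # Each mismatch pinpoints the exact RPC and field for which
            # the SUT's interpretation differs from the deployed server's.
            log_mismatch(request, mismatches)
        return not mismatches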
3. In situ verification: Testing and benchmarking allow offline verification that a server works as desired, which is perfect for those developing a new server. These approaches are of less value to IT administrators seeking comfort before replacing an existing server with a new one. In high-end environments (e.g., bank data centers), expensive service agreements and penalty clauses can provide the desired comfort. But, in less resource-heavy environments (e.g., university departments or small businesses), administrators often have to take the plunge with less comfort. Comparison-based verification offers an alternative, which is to run the new server as the SUT for a period of time while using the existing server as the reference server.³ This requires inserting a server Tee into the live environment, which could introduce robustness and performance issues. But, because only the reference server’s responses are sent to clients, this approach can support reasonably safe in situ verification.

4. Isolating performance differences: Performance comparisons are usually done with benchmarking. Some benchmarks provide a collection of results on different types of server operations, while others provide overall application performance for more realistic workloads. Comparison-based verification could be adapted to performance debugging by comparing per-request response times as well as response contents. Doing so would allow detailed request-by-request profiles of performance differences between servers, perhaps in the context of application benchmark workloads where disappointing overall performance results are observed. Such an approach might be particularly useful, when combined with in situ verification, for determining what benefits might be expected from a new server being considered.

³Although not likely to be its most popular use, this was our original reason for exploring this idea. We are developing a large-scale storage service to be deployed and maintained on the Carnegie Mellon campus as a research expedition into self-managing systems [4]. We wanted a way to test new versions in the wild before deploying them. We also wanted a way to do live experiments safely in the deployed environment, which is a form of the fourth item.

3 Components of a file system Tee

Comparison-based server verification happens at an interposition point between clients and servers. Although there are many ways to do this, we believe it will often take the form of a distinct proxy that we call a “server Tee”. This section details what a server Tee is by describing its four primary tasks. The subsequent section describes the design and implementation of a server Tee for NFSv3.

Relaying traffic to/from reference server: Because it interposes, a Tee must relay RPC requests and responses between clients and the reference server. The work involved in doing so depends on whether the Tee is a passive or an active intermediary. A passive intermediary observes the client-server exchanges but does not manipulate them at all—this minimizes the relaying effort, but increases the effort for the duplicating and comparing steps, which now must reconstruct RPC interactions from the observed packet-level communications. An active intermediary acts as the server for clients and as the only client for the server—it receives and parses the RPC requests/responses and generates like messages for the final destination. Depending on the RPC protocol, doing so may require modifying some fields (e.g., request IDs, since all requests will come from one system, the Tee), which is extra work. The benefit is that other Tee tasks are simplified.
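To illustrate the request-ID bookkeeping an active intermediary takes on, the following Python sketch remaps RPC transaction IDs so that every request leaving the Tee carries a unique ID and every reply can be routed back to the originating client. The class and method names are hypothetical, chosen for illustration only.

    import itertools

    class XidRemapper:
        """Sketch of an active intermediary's request-ID rewriting.

        All requests leave the Tee from a single endpoint, so transaction
        IDs chosen independently by different clients may collide; we issue
        fresh IDs toward the server and map replies back.
        """

        def __init__(self):
            self._next_xid = itertools.count(1)
            self._origin = {}  # tee_xid -> (client_addr, client_xid)

        def outbound(self, client_addr, client_xid):
            """Assign the Tee's own ID to a relayed request."""
            tee_xid = next(self._next_xid)
            self._origin[tee_xid] = (client_addr, client_xid)
            return tee_xid

        def inbound(self, tee_xid):
            """Recover the client and original ID for a server reply."""
            return self._origin.pop(tee_xid)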
Whether a Tee is an active intermediary or a passive one, it must see all accesses that affect server state in order to avoid flagging false positives. For example, an unseen file write to the reference server would cause a subsequent read to produce a mismatch during comparison that has nothing to do with the correctness of the SUT. One consequence of the need for complete interposition is that tapping the interconnect (e.g., via a network card in promiscuous mode or via a mirrored switch port) in front of the reference server will not work—such tapping is susceptible to dropped packets in heavy-traffic situations, which would violate this fundamental Tee assumption.

Synchronizing state on the SUT: Before RPC requests can be productively sent to the SUT, its state must be initialized such that its responses could be expected to match the reference server’s. For example, a file read’s responses won’t match unless the file’s contents are the same on both servers. Synchronizing the SUT’s state involves querying the reference server and updating the SUT accordingly. For servers with large amounts of state, synchronizing can take a long time.

Since only synchronized objects can be compared, few comparisons can be done soon after a SUT is inserted. Requests for objects that have yet to be synchronized produce no useful comparison data. To combat this, the Tee could simply deny client requests until synchronization is complete. Then, when all objects have been synchronized, the Tee could relay and duplicate client requests knowing that they will all be for synchronized state. However, because we hope for the Tee to scale to terabyte- and petabyte-scale storage systems, complete state synchronization can take so long that denying client access would create significant downtime. To maintain acceptable availability, if a Tee is to be used for in situ testing, requests must be handled during initial synchronization even if they fail to yield meaningful comparison results.

Duplicating requests for the SUT: For RPC requests that can be serviced by the SUT (because the relevant state has been synchronized), the Tee needs to duplicate them, send them, and process the responses. This is often not as simple as just sending the same RPC request packets to the SUT, because IDs for the same object on the two servers may differ. For example, our NFS Tee must deal with the fact that the two file handles (the reference server’s and the SUT’s) corresponding to a particular file will differ; they are assigned independently by each server. During synchronization, any such ID mappings must be recorded for use during request duplication, as in the sketch below.
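One plausible shape for that recorded mapping is the following Python sketch; HandleMap and duplicate_request are hypothetical names, and real NFS file handles are opaque byte strings rather than the simple values used here.

    class HandleMap:
        """Sketch: map a reference server's file handles to the SUT's."""

        def __init__(self):
            self._map = {}  # reference file handle -> SUT file handle

        def record(self, ref_fh, sut_fh):
            # Called as each object is synchronized onto the SUT.
            self._map[ref_fh] = sut_fh

        def translate(self, ref_fh):
            # None means the object has not yet been synchronized.
            return self._map.get(ref_fh)

    def duplicate_request(request, handles):
        """Rewrite a decoded request's file handle for the SUT."""
        sut_fh = handles.translate(request["fh"])
        if sut_fh is None:
            return None  # skip: no meaningful comparison is possible yet
        dup = dict(request)
        dup["fh"] = sut_fh
        return dup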
Comparing responses from the two servers: Comparing the responses from the reference server and the SUT involves more than simple bitwise comparison. Each field of a response falls into one of three categories: bitwise-comparable, non-comparable, or loosely-comparable. Bitwise-comparable fields should be identical for any correct server implementation. Most bitwise-comparable fields consist of data provided directly by clients, such as file contents returned by a file read. Most non-comparable fields are either server-chosen values (e.g., cookies) or server-specific information (e.g., free space remaining). Differences in these fields do not indicate a problem, unless detailed knowledge of the internal meanings and states suggests that they do. For example, the disk space utilized by a file could be compared if both servers are known to use a common internal block size and approach to space allocation. Fields are loosely-comparable if comparing them requires more analysis than bitwise comparison—the reference and SUT values must be compared in the context of the field’s semantic meaning. For example, timestamps can be compared (loosely) by allowing differences small enough that they could be explained by clock skew, communication delay variation, and processing time variation.
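The sketch below, which is illustrative rather than a description of our actual implementation, shows how a comparator might dispatch on these three categories; the field classification and the two-second timestamp slack are placeholder assumptions.

    # Hypothetical classification of response fields by comparison category.
    FIELD_CATEGORY = {
        "file_data": "bitwise",        # client-provided data
        "cookie": "non-comparable",    # server-chosen value
        "mtime": "loose",              # timestamp, compared with slack
    }

    MAX_SKEW_SECONDS = 2.0  # placeholder slack for clock skew and delays

    def timestamps_match(ref_ts, sut_ts, slack=MAX_SKEW_SECONDS):
        """Loose comparison: timestamps (in seconds) match within slack."""
        return abs(ref_ts - sut_ts) <= slack

    def field_matches(name, ref_val, sut_val):
        category = FIELD_CATEGORY.get(name, "bitwise")
        if category == "non-comparable":
            return True  # differences here are expected, not failures
        if category == "loose":
            return timestamps_match(ref_val, sut_val)
        return ref_val == sut_val  # bitwise-comparable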