Heuristics 92 Users 92 News 99 Source 99 Reference 99 Development
All 47.2 9.8 23.8 6.5 8.2
All-no-bad 26.7 5.2 9.1 3.4 5.0
Thread+Placement 7.1 1.5 1.5 3.0 1.1
Placement 1.9 1.9 2.0 1.9 1.9
Thread+Shuffle 0 0 0 0 0
Sequential 0.3 0.2 0.1 0.1 0.1
Table 2: Conflict counts. This shows the average number of conflicts per constraint involving 3 or fewer other constraints.
[Bar graph: average number of conflicts (0-5) versus conflict length (1-3).]
Figure 5: Conflict order. This graph shows the number of conflicts by conflict length. The number of conflicts is the average
number for each constraint, averaged across all of the test cases found in Table 2.
a sequential constraint. Data from a subset of traces shows that the number of conflicts of length four is
also low, and lower than the number of conflicts of length three. This suggests that most conflicts could be
captured by a search depth of two or three.
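To make the cost of deeper searches concrete, here is a minimal Python sketch of a bounded-length conflict search. The constraint representation and the conflict predicate are hypothetical stand-ins for the learner's actual conflict test; the point is that the enumeration grows combinatorially in the length bound, which is why a depth of two or three matters.

```python
from itertools import combinations

def find_conflicts(constraint, others, is_conflicting, max_len=3):
    """Enumerate conflicts involving `constraint` and at most `max_len`
    other constraints. The cost grows as O(n ** max_len), which is why
    a small search depth matters."""
    found = []
    for length in range(1, max_len + 1):
        for group in combinations(others, length):
            if is_conflicting((constraint,) + group):
                found.append(group)
    return found

# Toy conflict test (an assumption, not the real system's check):
# placement constraints (block, area) conflict when they put the
# same block in different disk areas.
def same_block_different_area(group):
    areas = {}
    for block, area in group:
        if areas.setdefault(block, area) != area:
            return True
    return False

print(find_conflicts((1, "A"), [(1, "B"), (2, "A")],
                     same_block_different_area, max_len=2))
```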
7.1.2 Constraint Performance Dependencies
The performance of a constraint is also tied to its environment, creating another form of dependency. Each constraint's performance is affected by some number of other constraints. There are two main ways that one constraint can affect the performance of another: overlap dependencies and access dependencies. Both depend upon the trace and the heuristics used.
7.1.3 Overlap Dependencies
Constraints may have dependencies caused by constraint overlap. In many cases, the learner can apply
several constraints on the same set of blocks. When several constraints overlap in this way, the worth
of a constraint can be affected by whether the other overlapping constraints are applied. For example,
a placement constraint could heavily affect the performance of a sequential constraint affecting the same block by placing it in a different region. Figure 6 shows an example of an overlap dependency. We treat the constraints that overlap with a given constraint as its dependent constraints. However, a constraint can also depend upon constraints that do not immediately overlap with it, but overlap with constraints that in turn overlap with the original constraint. These dependency chains can be arbitrarily long.

Figure 6: Overlap dependency. This figure shows three overlapping constraints. Constraints A and B are sequential constraints, while constraint C is a placement constraint putting block 1 in disk area A. Any combination of these three constraints could be applied. However, the weight of each individual constraint will depend on what combination of other constraints is applied.

Figure 7: Access dependency. This figure shows an access dependency over the trace 1, 3, 5, 7, 9. Sequential constraint B is dependent on placement constraint A because constraint A affects a block which precedes a block in constraint B.
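As an illustration of overlap dependencies, the following Python sketch computes the direct overlaps and follows the transitive chains just described. It assumes the simplification that a constraint is just a set of block IDs (real constraints also carry a type and parameters); the example block sets are loosely modeled on Figure 6.

```python
def direct_overlaps(constraints):
    """Map each constraint index to the indices of the constraints
    that share at least one block with it."""
    deps = {i: set() for i in range(len(constraints))}
    for i in range(len(constraints)):
        for j in range(i + 1, len(constraints)):
            if constraints[i] & constraints[j]:  # shared blocks
                deps[i].add(j)
                deps[j].add(i)
    return deps

def overlap_closure(deps, start):
    """Follow overlap chains transitively (the arbitrarily long
    dependency chains mentioned above) via breadth-first search."""
    seen, frontier = {start}, [start]
    while frontier:
        nxt = []
        for node in frontier:
            for nb in deps[node]:
                if nb not in seen:
                    seen.add(nb)
                    nxt.append(nb)
        frontier = nxt
    return seen - {start}

# Blocks loosely following Figure 6: A and B are sequential runs,
# C places block 1 in a disk area.
A, B, C = {1, 2, 3, 4}, {3, 4, 6, 7}, {1}
deps = direct_overlaps([A, B, C])
print(deps)                      # {0: {1, 2}, 1: {0}, 2: {0}}
print(overlap_closure(deps, 2))  # {0, 1}: C reaches B through A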
7.1.4 Access Dependencies
Constraints may also have dependencies which arise from access patterns in the trace. For example, if block B often follows block A in the trace, then the value of a constraint placing block B may depend upon the constraints placing block A. This dependency occurs because of the distance-dependent mechanical
latencies of disk drives. Figure 7 shows an example of an access dependency.
For simplicity, our model considers only the immediate predecessor in the trace as a possible dependency. In the real system, however, any number of previous requests can affect the response time because of scheduling and caching algorithms. Extending this notion of a predecessor could be an interesting area of study.
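A minimal sketch of this model follows, assuming constraints can be looked up by the block they place; the `placers` mapping and the constraint names are hypothetical.

```python
from collections import defaultdict

def access_dependencies(trace, placers):
    """Derive access dependencies from a trace. `trace` is a list of
    block IDs in request order; `placers` maps a block ID to the
    constraints that place it. Only the immediate predecessor is
    considered, matching the simplification above."""
    preds = defaultdict(set)
    for prev, curr in zip(trace, trace[1:]):
        preds[curr].add(prev)
    deps = defaultdict(set)
    for block, ps in preds.items():
        for p in ps:
            deps[block] |= placers.get(p, set())
    return deps  # constraints placing `block` depend on deps[block]

trace = [1, 3, 5, 7, 9]          # the trace from Figure 7
placers = {1: {"A"}, 3: {"B"}}   # constraint A places block 1, etc.
print(access_dependencies(trace, placers))  # e.g. block 3 -> {'A'}
```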
7.1.5 Evaluation
This section shows the results from a variety of experiments, all of which share a similar experimental setup. The experiments use the set of traces and heuristics described in Section 5. After generating the constraints from the set of heuristics given the trace, the experiments choose a random 1% of the constraints to monitor. They then monitor the change in constraint weight for these measured constraints under a given condition, over 30 runs. For each combination, the experiment runs on five days of trace. The reported values are the averages over these five days. The exact configuration varies between experiments and is described with each one.
Heuristics 92 Users 92 News 99 Source 99 Reference 99 Development
All 76% 67% 59% 48% 60%
All-no-bad 111% 87% 79% 59% 81%
Thread+Placement 96% 76% 76% 48% 72%
Placement 45% 41% 37% 30% 41%
Thread+Shuffle 119% 78% 87% 49% 84%
Sequential 35% 18% 36% 24% 34%
Table 3: Performance dependencies. This shows the change in constraint weight when all of the dependencies are permuted.

Heuristics 92 Users 92 News 99 Source 99 Reference 99 Development
All 5.4 3.8 4.4 4 4.2
All-no-bad 4.6 3 3.4 3 3.4
Thread+Placement 3 2 3 2 2.2
Placement 2 2 2 2 2
Thread+Shuffle 2 1 2 1 1.2
Sequential 2 1.6 2 1 1.2
Table 4: Overlap counts. This shows the average number of dependent constraints using only overlap dependencies.

The change in constraint weight is an interesting value because it shows how accurately we can predict the weight of any individual constraint across runs. If this error is too high, it will be difficult for the learner to make decisions about constraints. To measure the change in constraint weights, we report the percent standard deviation of the constraint weights.
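Concretely, the metric reduces to the following computation; this is a sketch that assumes the sample standard deviation, since the estimator is not otherwise specified here.

```python
import statistics

def percent_std_dev(weights):
    """Percent standard deviation of one constraint's weight across
    runs: standard deviation divided by the mean, as a percentage."""
    return 100.0 * statistics.stdev(weights) / statistics.mean(weights)

# e.g. one monitored constraint's weight over several runs
print(percent_std_dev([0.9, 1.1, 1.0, 1.3, 0.7]))  # roughly 22.4
```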
The results from the first experiment are listed in Table 3. This table shows the standard deviation of
the selected constraint weights when the overall layout is kept the same except for the dependent constraints
for the selected constraints, which are randomly applied or not applied between the runs. The table shows
that the standard deviation in constraint weight is quite high, running as high as 119%. This shows that these
dependent constraints have a large effect on constraint weights, and must be considered.
Thus, in order to adequately evaluate constraint weight, we must determine what the weight of the
constraint is in each of these conditions. This is a problem that is exponential in the number of dependent
constraints.
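The following sketch makes the blow-up explicit. The `evaluate` callback is a hypothetical stand-in for generating a layout with a given combination applied and replaying the trace.

```python
from itertools import product

def weights_under_all_conditions(evaluate, dependents):
    """Evaluate a constraint under every apply/skip combination of
    its dependents: 2 ** len(dependents) evaluations in total."""
    results = {}
    for mask in product([False, True], repeat=len(dependents)):
        applied = frozenset(d for d, on in zip(dependents, mask) if on)
        results[applied] = evaluate(applied)  # one full evaluation
    return results
```

With the 10-15 dependents per constraint reported in Tables 4 and 5, this is on the order of 1,000 to 33,000 evaluations for a single constraint.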
Tables 4 and 5 show the average number of performance dependencies for each constraint. It is interesting to note that the number of dependent constraints is fairly small on average, around 10-15 constraints total. In general, adding more heuristics increases the number of dependencies, but both the trace and the heuristics are important contributing factors.
Heuristics 92 Users 92 News 99 Source 99 Reference 99 Development
All 9.4 18.4 9 12 12.6
All-no-bad 8.2 15.6 7 9.2 10.2
Thread+Placement 6.2 11.4 6.2 7.2 7.8
Placement 5.6 12.6 5.8 8.8 8.6
Thread+Shuffle 4 6.8 3.8 4 4.8
Sequential 7.4 20.8 6 7.2 9.6
Table 5: Dependency counts. This shows the average number of dependent constraints using both types of dependencies.
Heuristics 92 Users 92 News 99 Source 99 Reference 99 Development
All 23% 31% 18% 19% 21%
All-no-bad 30% 34% 15% 22% 23%
Thread+Placement 39% 40% 33% 24% 25%
Placement 35% 27% 31% 21% 22%
Thread+Shuffle 39% 46% 29% 29% 28%
Sequential - 3% 11% 5% 6%
Table 6: Conversion error. This shows the change in constraint weights even when the dependent constraints are pinned. The omitted value corresponds to a case that did not generate enough constraints to perform the experiment.
7.1.6 Conversion Error
The conversion from constraint to layout also introduces some randomness into constraint weights. Even when the set of dependent constraints discussed in the previous section is held constant,
the performance of a constraint will be affected by other factors. For example, the placement constraints
place a block in a given area, not an exact location. This means that the location of the block may change
even if the constraints affecting that block do not. Other changes may affect cache behavior and scheduling
decisions for a block as well. In addition, there may be other dependent constraints besides those we have
chosen. For example, access dependencies can include more than just the immediate predecessor in the trace
due to scheduling and caching. Overlap dependencies may include constraints which do not immediately
overlap with a constraint, but overlap with constraints which then overlap with the original constraint. These
dependency chains can be arbitrarily long.
Table 6 shows the conversion error for a few trace and heuristic sets. The conversion error is determined
by applying all of the constraints the experiment has chosen to monitor. For each constraint upon which a
monitored constraint depends, the experiment decides whether it will be applied at the first iteration, and
keeps this constant throughout the runs. However, the other constraints in the set are randomly applied or
not applied each run. This effectively shows the change in each constraint’s weight if the constraints upon
which it depends are held constant, but other constraints are varied.
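In outline, the procedure resembles the following sketch. The `measure` callback stands in for the full layout-generation and trace-replay step, and the 50% application probability is an assumption.

```python
import random

def conversion_error_runs(monitored, deps_of, all_constraints,
                          measure, runs=30, seed=0):
    """Pinning experiment sketch: monitored constraints are always
    applied, their dependents are decided once and held fixed, and
    every remaining constraint is re-flipped on each run."""
    rng = random.Random(seed)
    pinned = {c: rng.random() < 0.5  # decided once, then held constant
              for c in set().union(*(deps_of[m] for m in monitored))}
    free = [c for c in all_constraints
            if c not in pinned and c not in monitored]
    weights = {m: [] for m in monitored}
    for _ in range(runs):
        applied = set(monitored) | {c for c, on in pinned.items() if on}
        applied |= {c for c in free if rng.random() < 0.5}  # re-flipped
        for m in monitored:
            weights[m].append(measure(m, applied))
    return weights
```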
This standard deviation is higher than desired, in the 30% range, but it should still be tolerable. Section 8 describes several tradeoff points that might help mitigate this conversion error. Expanding the dependency list might also decrease this error. However, the conversion error is much smaller than the dependent-constraint error found in Table 3. Note that Table 3 does not include the conversion error, since that experiment only varies the dependent constraints, and not other constraints in the set.

[Five graphs, one per trace: (a) 92 Users, (b) 92 News, (c) 99 Source, (d) 99 Reference, (e) 99 Development. Each plots the percent of constraints still applied (0-100) against the day (1-6).]

Figure 8: Constraint filtering. These graphs show the results from a long-term experiment. In each experiment, a set of constraints is chosen by the learner to be applied on a given day. The learner then uses this set of constraints on the next 6 days. The graph shows what percent of these constraints the learner chooses to apply on each of the following 6 days. The graphs show that the change from day to day is usually not significant, but the change between workloads is large.
7.2 Workload change
Another challenge in building this kind of system is that workloads change over time. This means that we
must find constraints that will perform well over a long period of time, and not just on the current set of
training data. Some constraints may only fit idiosyncrasies of the current day of trace, and may even be
detrimental later on.
Figure 8 shows the results of a long-term experiment. In this experiment, we first find a set of constraints which perform well on a given day. We then use this set of constraints for the next few days. The results show what percent of these original constraints the learner chooses to apply on the next 6 days. Each day was run with 10 iterations.
The graphs show that the percentage applied is fairly steady from day to day, suggesting that most of the problem constraints are over-fitted to the particular training day, rather than gradually made inappropriate by workload change. It is also interesting to note that the fraction no longer applicable by the second day is considerable, between 20 and 80 percent. The large difference between workloads is also noteworthy.
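The quantity plotted in Figure 8 reduces to a simple set computation; a short sketch with hypothetical constraint IDs:

```python
def percent_still_applied(day0_set, daily_choices):
    """Percentage of the day-0 constraint set that the learner still
    chooses to apply on each following day."""
    return [100.0 * len(day0_set & chosen) / len(day0_set)
            for chosen in daily_choices]

day0 = {"c1", "c2", "c3", "c4"}
later = [{"c1", "c2", "c3"}, {"c1", "c3"}, {"c1", "c3"}]
print(percent_still_applied(day0, later))  # [75.0, 50.0, 50.0]
```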
7.3 Problem Summary
The combination of the various problems listed above makes creating an effective learner very difficult. Unfortunately, solutions to one problem tend to make it difficult to solve the others. For example, deciding which constraints to apply, given a weight for each constraint, is an exponentially hard problem. However, the weight of each constraint also depends on other constraints, making the problem of finding the constraint weight exponential as well.
After finding a suitable solution to this challenge, the learner must take long-term trends into account. However, when determining the value of constraints in a long-term environment, the learner must remember that the worth of a constraint depends upon which other constraints will also be applied, and which will not. This context may change from day to day, making it difficult to predict. These problem complexities, several of which were unexpected at the project's outset, explain why we were ultimately unable to construct a solid learner for our two-tiered architecture.
8 Design Tradeoffs
There are a wide variety of tradeoffs in the construction of this system. This section discusses a few of the
tradeoffs that we have explored, for the benefit of researchers who pick up this line of research in the future.
8.1 Heuristic Support
Another area for exploration is the frequency of accesses needed for a constraint to be adequately evaluated. We call the number of times a block is accessed its support. If the blocks referenced by a constraint occur infrequently in the trace, the learner may not be able to determine the value of the constraint effectively. Initially we expected that low numbers of accesses would simply lead to weaker constraints. However, low numbers of accesses also mean that the constraint weight is more likely to be heavily influenced by chance factors. For example, if a block were accessed only a single time, whether the placement happened to fit the rotational latency of that one access would have a huge effect on the constraint weight. However, if the required