Chapter 1. Introduction
Despite seemingly endless increases in the amount of storage
and ever decreasing costs of hardware, managing storage is
still expensive. Additionally, users continue to fill increasingly
larger disks, worsened by the proliferation of large multimedia
files and high-speed broadband networks. Storage requirements
are continuing to grow at a rate of 50% a year. Worse, existing
hard disk technology is reaching physical limitations, making it
harder and costlier to meet growing user demands.
Storage management costs have remained a significant component
of total storage costs. Even in the 1970s, storage management costs at
IBM were several times the hardware costs and were projected
to reach ten times the cost of the hardware.
Today, management costs are five to ten times the cost of the underlying
hardware and are actually increasing as a proportion
of total cost, because each administrator can manage only a limited
amount of storage. Up to 47% of storage costs are associated
with administrators manually manipulating files.
Thankfully, significant savings are possible: studies show that
over 20% of all files, representing over half of the storage, are
regenerable. Other studies indicate that 82%-85% of storage
is allocated to files that have not been accessed in more than a
month.
These studies show that storage management has been a problem
in the past, continues to be a problem today, and is only getting
worse, despite growing disk sizes. Recent trends have
begun to address the management of storage through virtualization.
Morris put forth the idea of Autonomic Computing,
which includes "the system's ability to adjust to its configuration
and resource allocation to achieve predetermined goals".
The Elastic Quota system is designed to help address this management problem
through efficient allocation of storage while allowing users maximal
freedom, all with minimal administrator intervention.
Elastic quotas enter users into an agreement with the system:
users can exceed their quota while space is available, under
the condition that the system will be able to automatically reclaim
the storage when the need arises. Users or applications
may designate some files as elastic. When space runs short, the
elastic quota system (Equota) may reclaim space from those files
marked as elastic; nonelastic files maintain existing semantics
and are accounted for in users' persistent quotas.
This report focuses on policies for elastic space reclamation and
is organized as follows. Section 2 describes the overall architecture
of the policy system. Section 3 discusses the various elastic
quota policies. In Section 4 we discuss interesting implementation
aspects of Elastic Quota. Section 5 presents measurements
and performance results of various policies. Section 6 discusses
work related to storage space management policies. Finally, Section
7 presents some concluding remarks and directions for future
work.
Chapter 2. Design
The primary design goals were to allow for versatile and efficient
elastic quota policy management. An additional goal was
to avoid changes to the existing OS to support elastic quotas.
To achieve versatility the Elastic Quota system is designed with
a flexible policy management configuration language for use by
administrators and users; a number of user-level and kernel
features exist to support this flexibility. To achieve efficiency,
the design allows the system to run as a kernel file system with
DB3 databases accessible to the user-level tools. Finally, the
design uses a stackable file system to ensure that there is no
need to modify existing file systems such as Ext3.
Given the above goals, the design of an elastic quota file system
has to answer one key question: how to identify a file as elastic
vs. persistent. A related question is how to efficiently locate all
elastic files on a given file system. We identified two different
alternatives. The first is to separate elastic and persistent files
into two directory hierarchies and unify them at the stackable
file system level. Using this method, a file is identified as elastic
if it is located in the elastic directory hierarchy; identifying
elastic files requires recursive scanning of the elastic hierarchy.
Changing the elasticity of a file here involves moving or copying
it from one directory hierarchy to another.
The second alternative, described in this report, is to mark a file
as elastic using a single inode bit; such spare bits are available
in most modern disk-based file systems such as FFS, UFS, and
Ext2/3. Changing elasticity here involves using a standard ioctl
to turn the bit on/off. Locating all elastic files requires recursive
scanning of all files and checking if the inode bit is on or
off. To improve performance and versatility, this design records elastic
file information in DB3 databases: user IDs, file
names, and inode numbers. These DB3 databases improve performance,
but if lost they can be regenerated from information
contained within the file system.
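Regenerating the databases from the file system amounts to a recursive scan. The following minimal Python sketch assumes a caller-supplied predicate `is_elastic(path)` in place of reading EQFS's elastic inode bit (how that bit is queried is not described here) and uses plain dictionaries as stand-ins for the per-user DB3 I->N databases; `regenerate_i2n` is a hypothetical name.

```python
import os
from typing import Callable, Dict

def regenerate_i2n(root: str,
                   is_elastic: Callable[[str], bool]) -> Dict[int, Dict[int, str]]:
    """Rebuild per-user inode->name maps by recursively scanning `root`.

    `is_elastic` stands in for checking EQFS's elastic inode bit; this
    sketch does not model how that bit is actually read.
    """
    dbs: Dict[int, Dict[int, str]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not is_elastic(path):
                continue                     # persistent file: not recorded
            st = os.lstat(path)
            # One map per owner (UID), keyed by inode number, like I->N.
            dbs.setdefault(st.st_uid, {})[st.st_ino] = path
    return dbs
```

With hard links, whichever name the scan reaches last is the one recorded for that inode.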
Figure 2-1. Elastic Quota Architecture
Architecture
Figure 2-1 shows the overall architecture of the Elastic Quota system.
There are four components in the Elastic Quota system:
EQFS: The elastic quota file system is a stackable file system
that is mounted on top of another disk-based file system
such as Ext3. EQFS includes a component (Edquot) that indirectly
manages the kernel's native quota accounting. EQFS
may also be configured to send messages to a user-space component,
Rubberd. EQFS also supports an optimization called
Short-Lived Files.
DB3 databases: These databases record information about
elastic files. There are two types of databases. First, for each
user there is a database that maps inode numbers of elastic
files to their names, denoted in this report as I->N.
Having separate databases for each user allows us to locate
all of the elastic files of a user easily, as well as enumerate all
elastic files by going over each per-user database. The second
type of database (denoted U->A) is a small one that records
for each user an abusage factor denoting how "good"
or "bad" a given user has been with respect to historical utilization
of disk space.
Rubberd: This user-level daemon contains two threads. The
database management thread is responsible for updating the
various DB3 databases. The policy thread executes cleaning
policies at given times, which often involves querying the DB3
databases.
Elastic Quota Utilities: These utilities include enhanced versions
of the quota-utils package used to query, set, and control
user and group quotas. There are also new utilities that
can build or query the DB3 databases; this is useful to build
the DB3 databases from an existing file system or to quickly
list all elastic files owned by a user.
System Operation
Before describing the system's operation, consider
how quotas are accounted for on a system without elastic quotas.
In traditional operating systems, quota accounting is often integrated
with the native disk-based file system. Since Linux supports
a number of file systems with quotas, quota accounting
is an independent VFS component called Dquot. Usually, system
calls invoke VFS operations, which in turn call file-system-specific
operations.
However, unlike other VFS code, Dquot code does not call the
file system. Instead, the native file system calls the Dquot operations
directly. This Dquot operations vector is initialized when
quotas are turned on for that file system by the system administrator
or at boot time. The reason for this reverse calling
sequence is that only the native disk-based file system knows
when a user operation has resulted in a change in the consumption
of inodes or disk blocks.
The stackable file system EQFS intercepts file system
operations, performs related elastic quota operations, and then
passes the operation to the lower file system (Ext2, Ext3,
etc.). EQFS also intercepts the file system operations used to
turn on quota management and inserts its own quota
management operations vector, called Edquot. Figure 2-1 shows that the calling convention for
regular file system operations is from a user process issuing
a system call, to the stackable file system, and then down to
the lower file system. However, the calling convention for quota
operations is reversed: from the lower file system, through the
elastic quota management component (Edquot), and to the VFS's own disk
quota management (Dquot). This novel interception is a form
of reverse stacking, devised to avoid changing
either Dquot (and hence the VFS) or native file systems.
Each user on the system has two UIDs: one that accounts
for persistent quotas and another that accounts for elastic
quotas. The latter, called the shadow UID, is simply the
ones-complement of the former. The shadow UID does not
modify existing ownership or permission semantics; it is
only used for quota accounting. Users execute system calls,
which the VFS translates into file system operations. Those
operations are passed to Ext3. When Ext3 calls EQFS's Edquot
operations, Edquot determines whether the operation was for an
elastic or a persistent file by inspecting the file's elastic inode
bit. If the accounting operation was for an elastic file, Edquot
tells Dquot to account for the changed resource (inode or disk
block) in the shadow UID. In this manner it is easy to account
for elastic vs. persistent quotas separately; both kernel and
user-level utilities can easily find out how much persistent or
elastic space a user is using, which eases Rubberd's policy
management tasks.
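The shadow-UID accounting can be sketched in a few lines of Python. `ToyDquot` and `edquot_alloc_blocks` are hypothetical stand-ins rather than kernel interfaces, and 32-bit UIDs are assumed; the point is only that the elastic bit selects which UID gets charged, so persistent and elastic usage accumulate in separate counters.

```python
from collections import Counter
from typing import Tuple

UID_MASK = 0xFFFFFFFF  # assumes 32-bit UIDs, as on current Linux systems

def shadow_uid(uid: int) -> int:
    """Ones-complement of a 32-bit UID; used only for quota accounting."""
    return ~uid & UID_MASK

class ToyDquot:
    """Toy stand-in for Dquot: counts disk blocks charged to each UID."""

    def __init__(self) -> None:
        self.blocks: Counter = Counter()

    def charge(self, uid: int, nblocks: int) -> None:
        self.blocks[uid] += nblocks

def edquot_alloc_blocks(dquot: ToyDquot, owner_uid: int,
                        nblocks: int, elastic: bool) -> None:
    """Edquot's routing decision: elastic files are charged to the shadow UID."""
    target = shadow_uid(owner_uid) if elastic else owner_uid
    dquot.charge(target, nblocks)

def usage(dquot: ToyDquot, uid: int) -> Tuple[int, int]:
    """Return (persistent, elastic) block usage for a user."""
    return dquot.blocks[uid], dquot.blocks[shadow_uid(uid)]
```

For example, charging 8 persistent and 5 elastic blocks to UID 1000 leaves `usage(d, 1000) == (8, 5)`, while file ownership itself is untouched.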
There are two methods for keeping track of elastic files. The first,
called full mode, is to track each individual file operation. The
second, called null mode, is to periodically generate a list of
elastic files from the file system. The advantage of full mode is
that the list of elastic files will always be current; the advantage
of null mode is that overhead is minimized during normal
system operation. The mode can be selected using the "netlink"
option in rubberd.conf (see Figure 3-1).
If running in full mode, whenever EQFS performs certain operations
that affect an elastic file, it informs Rubberd of that
event. Rubberd records the status of that elastic file in the DB3
databases. EQFS informs Rubberd about creation, deletion, renames,
hard links, or ownership changes of elastic files. Additionally,
whenever a persistent file is made elastic or an elastic
file is made persistent, EQFS treats this as a creation or deletion
event, respectively. EQFS communicates this information to
Rubberd's database management thread over a Linux kernel-to-user
socket called netlink.
When using full mode, Rubberd's database management thread
listens for netlink messages from EQFS. When it receives a message,
Rubberd decodes it and applies the proper operation on
the per-user I->N database. For example, users can make a
file elastic using the chattr (change file attributes) program on
Linux. When they turn on the elastic bit on a file, EQFS sends
a "create elastic file" netlink message to Rubberd along with the
UID, inode number, and the name of the file. Rubberd then performs
a DB3 "put" method to insert a new entry in that user's
I->N database, using the inode number as key and the file's
name as the entry's value.
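The bookkeeping done by the database management thread can be sketched as follows, assuming each message has already been decoded into a hypothetical `EqMsg` record (the actual netlink message layout is not described in this report). Python dictionaries stand in for the per-user DB3 I->N databases, and hard-link events are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class EqMsg:
    """Hypothetical decoded netlink message from EQFS."""
    op: str                        # "create", "delete", "rename", or "chown"
    uid: int                       # owner at the time of the event
    ino: int                       # inode number: the I->N key
    name: Optional[str] = None     # file name: the I->N value
    new_uid: Optional[int] = None  # new owner, for "chown" only

def apply_msg(dbs: Dict[int, Dict[int, str]], m: EqMsg) -> None:
    """Apply one EQFS event to per-user I->N maps (dicts stand in for DB3)."""
    user_db = dbs.setdefault(m.uid, {})
    if m.op in ("create", "rename"):
        if m.name is None:
            raise ValueError(f"{m.op} needs a file name")
        user_db[m.ino] = m.name            # like a DB3 "put": inode -> name
    elif m.op == "delete":
        user_db.pop(m.ino, None)
    elif m.op == "chown":
        if m.new_uid is None:
            raise ValueError("chown needs new_uid")
        name = user_db.pop(m.ino, None)
        if name is not None:
            # Move the entry to the new owner's database.
            dbs.setdefault(m.new_uid, {})[m.ino] = name
    else:
        raise ValueError(f"unknown event: {m.op!r}")
```

Turning the elastic bit on or off would map onto the "create" and "delete" events, matching how EQFS reports those changes.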
Rubberd's policy thread executes a given policy as defined by
system administrators. Suppose Rubberd's policy thread is executing
a removal policy to reclaim disk space by deleting certain
elastic files. Rubberd invokes unlink operations through EQFS,
which in turn are passed to Ext3 and Dquot. When using full
mode, EQFS sends a netlink message to Rubberd's database
management thread, in this case a "delete elastic file" netlink
message. Rubberd is multi-threaded because it has to concurrently
invoke EQFS system calls as well as receive and process
netlink messages from EQFS.
When using null mode, the DB3 databases will not be up-to-date
with respect to the file system. Nevertheless, this mode is
useful for a time-based policy such as cleaning oldest files first,
since older files are likely to be in the DB3 database. Since Rubberd
obtains information about all files at cleaning time, even if
a file was updated after the nightly generation of the database,
Rubberd will still use up-to-date file attributes. If Rubberd is
not able to reclaim enough space using the previously generated
databases, it will initiate a more expensive recursive scan of the
file system to generate up-to-date databases. System administrators
must weigh the added benefit of up-to-date accounting
against the extra performance overhead introduced by EQFS full
mode.
Rubberd is configured to wake up periodically and record historical
abusage factors for each user, denoting the user's average
elastic space utilization over a period of time. Rubberd gets the
list of all users, their elastic and persistent disk usage, and their
elastic and persistent quotas (if any). With these numbers, Rubberd
computes an updated abusage factor, and stores this value
in the U->A database.
Elasticity Modes
EQFS supports five methods of determining when a file becomes
elastic.
First, users can turn a file's elasticity on or off by using the
Linux standard chattr tool. This allows users to control elasticity
on a per file basis. Once a file is made elastic or persistent,
moving it to other directories on the system does not change its
elasticity.
Second, users can use chattr to turn on the elastic bit of a directory
inode. EQFS then propagates the elastic bit to any newly created
file or subdirectory. This mode works similarly to inheriting the
group of a file from a setgid directory. This elasticity mode is
useful for whole elastic hierarchies, such as /tmp or a user's
Web browser cache directory.
Third, users can tell EQFS (via an ioctl) that all files that are
created next should be elastic (or not). This mode works similarly
to how the newgrp command sets the default group that
all newly-created files or directories should use. One use for this
mode is when users unpack an important source distribution;
before beginning to build the package, users can set this elasticity
mode for all future files. That way, all files newly created
during the build, regardless of their location, will be elastic: objects,
libraries, executables, header-dependency files, etc.
Fourth, users can tell EQFS (again, via an ioctl) which newly
created files should be elastic by their name. Specifically, users
can specify a small number of file extension strings that
eqfs_create matches against each newly created file name. We allow
only simple strings and compare file names to those strings
using strcmp against the end of the file's name; regular expressions
or shell-style filename patterns are not allowed, to avoid
consuming too much CPU time. This mode is particularly useful
because users often judge the importance of files by their
type, or extension (e.g., .c files are more important than .o files because
the latter can be easily regenerated from the former).
Finally, application developers may know best which files are
temporary and can be marked elastic. Since many temporary
files are not explicitly created by users, but rather by programs,
there is a new flag for the open and creat EQFS file system
methods: O_ELASTIC. This flag tells EQFS to create the new
file as elastic. For example, Emacs can automatically create its
backup files elastically.
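The creation-time modes can be combined into a single decision function. The report does not say how EQFS orders them when several apply, so the precedence below (explicit O_ELASTIC flag, then the per-process ioctl setting, then an extension match, then the parent directory's bit) is an assumption, as is the numeric value of `O_ELASTIC`. `CreateContext` and `new_file_is_elastic` are hypothetical names. The first mode, running chattr on an existing file, acts after creation and is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

O_ELASTIC = 0x10000000   # hypothetical value; the real flag bit is not given here

@dataclass
class CreateContext:
    open_flags: int = 0                      # flags passed to open/creat
    process_default: Optional[bool] = None   # "files created next" ioctl; None = unset
    elastic_suffixes: Tuple[str, ...] = ()   # per-extension ioctl, e.g. (".o",)
    parent_dir_elastic: bool = False         # elastic bit on the parent directory

def new_file_is_elastic(name: str, ctx: CreateContext) -> bool:
    """Decide elasticity at creation time (precedence order is assumed)."""
    if ctx.open_flags & O_ELASTIC:
        return True
    if ctx.process_default is not None:
        return ctx.process_default
    # Plain suffix comparison, the equivalent of strcmp on the name's tail;
    # no regular expressions or shell globs.
    if any(name.endswith(sfx) for sfx in ctx.elastic_suffixes):
        return True
    return ctx.parent_dir_elastic
```

Keeping the extension check to `endswith` mirrors the stated reason for avoiding patterns: each create costs only a few string comparisons.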
Optimizations
There are a number of optimizations in this Elastic Quota system
aimed at improving performance. The two most innovative
optimizations are discussed next.
Short Lived Files
An exhaustive file system study showed that 80% of all files
live for fewer than five seconds and that short-lived files are significantly
smaller than long-lived files. In such an environment,
short-lived elastic files may be deleted well before Rubberd has
a chance to consider them as candidates for removal. Typically,
Rubberd will be configured to check space utilization less
often than once every few seconds. Moreover, certain policies are
likely to use LRU techniques, meaning that long-lived files are
more likely candidates for removal.
There is a Short-Lived Files (SLF) optimization in EQFS that
queues information about newly created files and defers informing
Rubberd about them for a configurable number of seconds.
This is shown in Figure 2-1. When a new file is created, we insert
its name, owner, and inode number in the SLF queue. When
a file is deleted, we check whether it is in the SLF queue; if
so, we remove it from the SLF queue and do not have to
inform Rubberd about that file. Periodically, a kernel thread,
keqfs_flushd, scans the SLF queue looking for information that
has been queued for longer than a certain number of seconds,
and then flushes it to Rubberd.
When administrators mount EQFS, they can define three parameters:
whether SLF optimizations are enabled; the number
of seconds that a message can remain in the queue; and the
maximum amount of kernel memory used by the SLF queue. Disabling
SLF may be desired on systems with little memory because
SLF requires buffering Rubberd-bound information in the
kernel. If the SLF queue reaches the upper limit on memory usage,
the oldest messages will be flushed before a new message
is added.
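The SLF queue's behavior can be modeled in user space to make the bookkeeping concrete. This Python sketch is not the kernel implementation. Time is passed in explicitly, an entry-count limit stands in for the kernel-memory limit, and `send` stands in for the netlink message to Rubberd. `SLFQueue` and its method names are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Tuple

Msg = Tuple[str, int, str, int]   # (event, inode, name, uid)

class SLFQueue:
    """Defers "create" notices so that files deleted quickly never reach Rubberd."""

    def __init__(self, delay: float, max_entries: int,
                 send: Callable[[Msg], None]) -> None:
        self.delay = delay               # seconds an entry may stay queued
        self.max_entries = max_entries   # stand-in for the kernel-memory cap
        self.send = send                 # stand-in for a netlink message
        self.pending: "OrderedDict[int, Tuple[float, str, int]]" = OrderedDict()

    def created(self, ino: int, name: str, uid: int, now: float) -> None:
        if len(self.pending) >= self.max_entries:
            # Queue is full: flush the oldest entry before adding a new one.
            old_ino, (_, old_name, old_uid) = self.pending.popitem(last=False)
            self.send(("create", old_ino, old_name, old_uid))
        self.pending[ino] = (now, name, uid)

    def deleted(self, ino: int, name: str, uid: int) -> None:
        if self.pending.pop(ino, None) is None:
            # Already flushed, so Rubberd knows the file and must hear of its deletion.
            self.send(("delete", ino, name, uid))

    def flush(self, now: float) -> None:
        """Periodic scan, in the spirit of keqfs_flushd."""
        while self.pending:
            ino, (queued_at, name, uid) = next(iter(self.pending.items()))
            if now - queued_at < self.delay:
                break                    # entries are kept in creation order
            del self.pending[ino]
            self.send(("create", ino, name, uid))
```

A file created and deleted within the delay window produces no messages at all, which is exactly the traffic SLF is meant to eliminate.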
Bulk Inode stat(2)
When Rubberd begins to execute certain policies, it often requires
the size or last modification time of elastic files, typically
obtained via the stat system call. File sizes and times are
necessary for policies that reclaim space by deleting the largest
or oldest files first. To find such files, however, Rubberd has
to stat numerous elastic files, a process that could take a long
time. Normally stat would be called with the name of each file.
When the kernel executes the sys_stat service routine, it first
must translate the file name it gets to an inode structure (name
lookup). However, Rubberd already knows the inode number
of each elastic file, which could allow the kernel to skip name
lookup.
In Elastic Quota there is an EQFS ioctl called Bulk Inode Stat
(bistat) that allows a user process to get status information on
a number of files given their inode numbers. Since an inode
number is only four bytes on Linux, and Rubberd may have
to get status information on many files, bistat takes an array
of inode numbers and returns status information (struct stat)
for multiple files at once. bistat is expected to be more
efficient than individual stat calls, especially if the inode numbers
are localized, as is often the case with sequentially created
files. The system time for bistat is almost five times lower than that of the
normal stat system call when used with a very large number of
files.
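Because bistat is described as most effective when the requested inode numbers are localized, a caller such as Rubberd might sort and batch its inode list before issuing the ioctl. The sketch below shows only that preparation step. The batch size, the packed array of 4-byte unsigned integers, and the helper names are assumptions, and no real ioctl request code is used.

```python
import struct
from typing import Iterable, Iterator, List

def bistat_batches(inodes: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield sorted, de-duplicated groups of inode numbers for bulk-stat calls."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    ordered = sorted(set(inodes))        # neighboring inodes land in the same batch
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]

def pack_inode_array(batch: List[int]) -> bytes:
    """Pack inode numbers as native-order 4-byte unsigned ints (assumed layout)."""
    return struct.pack(f"={len(batch)}I", *batch)
```

Sorting first means each batch covers a narrow range of inode numbers, which is the locality the text expects bistat to exploit.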
Chapter 3. Elastic Quota Policies
The core of the elastic quota system is its handling of space
reclamation policies. EQFS is the file system that provides elasticity
support and works in conjunction with Rubberd, the user-space
daemon that implements cleaning policies, as seen in Figure 2-1.
Section 3.1 presents a general discussion of the design
issues involved in policies; as we will see, there are often conflicting
concerns that must be carefully balanced to provide a fair, convenient,
and versatile system. Section 3.2 presents the design
of Rubberd's policy engine from the perspectives of users and
administrators. Section 3.3 presents how Rubberd fairly determines
how much disk space to reclaim and from which users;
that culminates in Section 3.4, which details the actual methods
and algorithms used to reclaim disk space; and
finally Section 3.5 presents how elastic quotas may be used in
various situations.
Policy Design Considerations
File system management involves two parties: the running system
and the people (administrators and users). To the system,
file system reclamation must be efficient so as not to disturb
normal operations. For example, when Rubberd wakes up periodically,
it must be able to quickly determine if the file system
is over the administrator-defined high watermark. If so, Rubberd
must be able to locate all elastic files quickly because those files
are candidates for removal. Moreover, depending on the policy,
Rubberd will also need to find out certain attributes of elastic
files: owner, size, last modification time, etc. Quotas have often
been disliked by users and administrators alike. To the people
involved, file system reclamation policies must consider three
factors: gaming, fairness, and convenience. These three factors
are especially important in light of efficiency, because some policies
can be executed more efficiently than others.
Gaming
Gaming is defined as the ability of individual users to circumvent
the system and prevent their files from being removed first.
Good policies should be resistant to gaming. For example, a
global LRU policy that removes older files could be circumvented
by file owners simply by reading or touching those files. Some
policies are more difficult to game, for example a policy that removes
the largest files first. Users could split their large files into
smaller chunks, but would then have to reassemble the parts before
the large file could be used again. Policies that are difficult
to game include a per-user worst-offender policy. Regardless of
the files' times or sizes, a user still owns the same total amount
of data. Such policies work well on multi-user systems where it
is expected that users will try to game the system.
There are certain situations where gaming may not be an important
factor in choosing policies. Certain global policies (e.g.,
by time or size) may still be useful in situations such as with a
small group of cooperative users who do not have an incentive
to circumvent the system, since such gaming could hurt their colleagues'
ability to work. Another useful scenario where gaming
is not an issue is a single-user workstation: to such a user, elastic
quotas can be a useful method of ensuring that temporary
files get automatically cleaned periodically.
Fairness
Fairness is hard to quantify precisely. It is often perceived by
the individual users as how they personally feel that the system
and the administrators treat them. Moreover, different sites may
want to use different policies based on the given needs and user
community.
Nevertheless, it is important to provide a number of policies that
can be tailored to each site's own needs. For example, some
users might consider a largest-file-first removal policy unfair because
recently-created files that the user may have deleted after
a short period of time may be removed by the system before that
time. Other users might feel that an oldest-creation-time policy
is unfair because it does not account for recency or frequency
of use.
For these reasons, the fairer policies are those based on
individual users' disk space usage. In particular, users that consume
more disk space over longer periods of time should be considered
the worst offenders. Overall, it is more fair if the amount
of disk space being cleaned is proportional to the level of offense
of each user who is using elastic space.
Once the worst offender is determined and the amount of disk
space to clean from that user is calculated, however, the system
must define which specific files should be removed first from
that user. Basic policies allow for time-based or size-based policies
for each user. For the utmost in flexibility, users are allowed
to define their own ordered list of files to be removed first. This
not only allows users to override system-wide policies, but also
to define new policies based on file names and other attributes
(e.g., remove *.o and * files first).
Convenience
Finally, the system should be easy to use and simple to
understand. Users should be able to find out how much disk space
they are consuming in persistent and elastic files and which of
their elastic files will be removed first. Administrators should be
able to configure new policies easily.
The algorithms used to define a worst offender should be simple
and easy to understand. For example, considering the current
total elastic usage is simple and easy to understand. A more
complex algorithm could count the elastic space usage over time
as a weighted average. Although such an algorithm is fairer
because it accounts for historical usage, it might be more
difficult for users to understand.
Rubberd Configuration Files
Administrators
Administrators typically control two configuration files in /etc:
(1) an elastic quotas configuration file (policy.conf) and (2) a
Rubberd configuration file that defines startup options (rubberd.conf).
policy.conf The policy configuration file uses a simple syntax as
follows. Blank lines and those starting with the comment character
# are ignored. The configuration file may define multiple
policies, one per line. When Rubberd has to reclaim space, it
first determines how much space it should reclaim--the goal.
Rubberd then evaluates each policy in order until the goal is
reached or no more policies can be evaluated. Each line in this
file has the following space-delimited format:
type method sort [filter ...]    (1)
The first parameter, type, defines what kind of policy to use and
can have one of three values: global for a global policy, user
for a per-user policy, and user_profile for a per-user policy that
first considers the user's own personal policy file. In this way
administrators can permit users to define policies on their files.
The second parameter, method, defines how space should be
reclaimed. The prototype currently defines two methods: rm for
a policy that deletes files and gzip for a policy that compresses
files. In this way, administrators can define a system policy that
first compresses files and then removes them: such a policy has
the benefit that enough space may be reclaimed by compressing
files, and users can still access their elastic files (if needed)
by decompressing them. Policies using mv and tar could together
implement an HSM system, archiving and migrating files
to slower media at cleaning time.
The third parameter, sort, defines the order of files being reclaimed.
We define several keys: size (in disk blocks) for sorting
by largest file first, mtime for sorting by oldest modification time
first, and similarly for ctime and atime.
The remaining entries on the policy line are optional and define
file name filters to apply the policy to. If not specified, the policy
applies to all files.
As an example, consider the following policy.conf file:
global rm size .bak
user gzip atime .o
user_profile rm ctime
global rm mtime
The first line starts a simple global policy that will delete obviously
unnecessary elastic files such as backup files used by editors;
this could be a useful technique to reclaim space quickly
without having to process each user's files or quotas separately.
When Rubberd tries to reclaim space, it will try to bring the
system down to the goal level using this first policy line. If that
is insufficient, Rubberd will proceed and apply the second policy
line, which defines a per-user policy that will compress all
compiler object files that have not been read in a while. Next,
Rubberd will try a per-user policy that will remove files by their
last inode-change time, but allow users to define which files they
might want to remove first. Finally, if still not enough space has
been reclaimed, Rubberd will apply a global policy that deletes
any remaining elastic file based on its last modification time.
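The policy file format and its in-order evaluation can be sketched in Python. This illustration is not Rubberd's code and rests on stated assumptions: filters are treated as plain file-name suffixes, every policy line is applied globally (per-user distribution by abusage factor is ignored), and a selected file is assumed to free its full size even under gzip. `Policy`, `parse_policy_conf`, and `plan_reclaim` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

TYPES = {"global", "user", "user_profile"}
METHODS = {"rm", "gzip"}
SORTS = {"size", "mtime", "ctime", "atime"}

@dataclass(frozen=True)
class Policy:
    kind: str                  # global, user, or user_profile
    method: str                # rm or gzip
    sort: str                  # size, mtime, ctime, or atime
    filters: Tuple[str, ...]   # optional name filters; empty means all files

def parse_policy_conf(text: str) -> List[Policy]:
    policies = []
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                       # blank lines and comments are ignored
        fields = line.split()
        if (len(fields) < 3 or fields[0] not in TYPES
                or fields[1] not in METHODS or fields[2] not in SORTS):
            raise ValueError(f"policy.conf line {lineno}: {raw!r}")
        policies.append(Policy(fields[0], fields[1], fields[2], tuple(fields[3:])))
    return policies

def plan_reclaim(files: List[Dict], policies: List[Policy],
                 goal: int) -> List[Tuple[str, str]]:
    """Walk policies in order, choosing (method, name) pairs until `goal` blocks are freed."""
    remaining = list(files)
    plan: List[Tuple[str, str]] = []
    freed = 0
    for p in policies:
        candidates = [f for f in remaining
                      if not p.filters or any(f["name"].endswith(s) for s in p.filters)]
        if p.sort == "size":
            candidates.sort(key=lambda f: f["size"], reverse=True)  # largest first
        else:
            candidates.sort(key=lambda f: f[p.sort])                # oldest first
        for f in candidates:
            if freed >= goal:
                return plan
            plan.append((p.method, f["name"]))
            freed += f["size"]
            remaining.remove(f)
    return plan
```

Fed the four-line example above, `parse_policy_conf` yields one `Policy` per line, and `plan_reclaim` falls through to later lines only when earlier ones cannot reach the goal.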
rubberd.conf The Rubberd configuration file is simple and defines
several parameters described in Figure 3-1.
Figure 3-1. Rubberd configuration file parameters.
Users
If the system administrator has allowed users to determine their
own removal policies, users can then use whatever policy they
desire for determining the order in which their files are removed
(or compressed) first. The user policy file can only instruct Rubberd
to prefer those files for removal first; if not enough space
could be reclaimed, Rubberd will continue to reclaim space as
defined in the system-wide policy file, policy.conf.
A user-defined removal policy is simply a file stored in
/var/spool/rubberd/USER. The file is a newline-delimited list
of file and directory names or simple patterns thereof, designed
to be both simple and flexible to use. Each line can list a
relative or absolute name of a file or directory. A double slash
(//) at the end of a directory name signifies that the
directory should be scanned recursively. In addition, simple
file extension patterns can be specified. Figure 3-2 shows a few
examples and explains them.
Figure 3-2. Example user removal policy file entries.
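The entry rules above can be sketched as a small Python classifier. Writing extension patterns as `*.ext` is an assumption (it mirrors the `*.o` example earlier in this chapter), as is resolving relative names against the user's home directory; `parse_user_policy` is a hypothetical helper, not the real tool.

```python
import os
from typing import List, NamedTuple

class Entry(NamedTuple):
    kind: str    # "recursive_dir", "extension", or "path"
    value: str

def parse_user_policy(text: str, home: str) -> List[Entry]:
    """Classify each non-blank line of a user removal policy file, in order."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("//"):
            # Trailing double slash: scan this directory recursively.
            entries.append(Entry("recursive_dir", os.path.join(home, line.rstrip("/"))))
        elif line.startswith("*."):
            entries.append(Entry("extension", line[1:]))     # "*.o" -> ".o"
        else:
            # os.path.join leaves absolute names unchanged.
            entries.append(Entry("path", os.path.join(home, line)))
    return entries
```

Order is preserved because the file is an ordered list of files to remove first.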
Management of this removal policy file is done similarly to how
crontab manages per-user cron jobs. A separate user tool allows
a user to add, delete, or edit their policy file, as well as
to install a new policy from another source file. The tool verifies
that any updated policy conforms to the proper syntax. This tool
includes options to allow users to initialize their default policy
file to the list of all their elastic files, optionally sorted by name,
size, modification time, access time, or creation time.
Abusage Factors
When Rubberd has to reclaim some disk space, it must provide
a fair mechanism to distribute the amount of reclaimed space
among all users that consume any elastic space. To decide how
much disk space to reclaim from each user, Rubberd computes
an abusage factor (AF) for all users. Then Rubberd distributes
the amount of space to reclaim from each user proportionally to
their AF. For example, suppose Rubberd needs to clean 6MB of
disk space from two users; user A's AF is 10 and user B's AF
is 20; then Rubberd will clean 2MB from user A and 4MB from
user B.
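The proportional split is each user's share = goal * AF(user) / (sum of all AFs). A short Python sketch (the function name `reclaim_shares` is hypothetical) reproduces the 6MB example above.

```python
from typing import Dict

def reclaim_shares(goal: float, af: Dict[str, float]) -> Dict[str, float]:
    """Split `goal` among users in proportion to their abusage factors."""
    total = sum(af.values())
    if total <= 0:
        return {user: 0.0 for user in af}   # nobody has a positive AF
    return {user: goal * factor / total for user, factor in af.items()}

print(reclaim_shares(6, {"A": 10, "B": 20}))  # {'A': 2.0, 'B': 4.0}
```

In a real reclaimer the shares would still need rounding to whole disk blocks.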
Deciding how to compute AF, however, can vary depending on
what is perceived as fair by users and administrators for a given
site. Therefore, we provide a variety of configurable methods for
administrators to tailor the computation of abusage factors to
the site's needs. First, there are two types of AF calculations:
one that considers the current usage and a second that considers
historical usage. Current usage is better at tracking users'
existing elastic usage as it changes; historical usage takes into
account users' behavior patterns over longer periods of time.
As an example, consider two users: user A has never used elastic
space and just in the past day began consuming 100MB;
user B has used exactly 50MB of elastic space each day for the
past five days. Based on the current usage policy alone, user A's
AF will be double that of user B. During cleaning, twice as much
disk space will be reclaimed from user A as from user B. This
policy can be considered fair to the system and all users on the
system, because it will clean space based on how much is currently
being used. However, such a policy may unfairly punish
user A who, on average, has not used as much as user B: user
A's usage over the five days, averaged per day, is just 20MB.
Therefore, a historical usage policy may be considered more fair
because it takes into account long-term behavior. The converse
could also be true: a past disk space abuser could have a high
average usage but currently not be using much disk
space; a history-based AF could result in many of this user's
elastic files being deleted. Interestingly, historical abusage factors
may promote more responsible disk usage over time, and
reward those with lower average usage by allowing them to consume
more disk space during a shorter period of time.
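To make the difference concrete, the following sketch computes both kinds of factor for the two users in the example. It assumes a five-day window of daily elastic usage in MB and uses a plain arithmetic mean for the historical factor; both are illustrative simplifications, not Rubberd's actual computation, and the function names are hypothetical.

```python
from typing import List


def current_af(daily_mb: List[float]) -> float:
    """Current-usage policy: only the most recent day's usage counts."""
    return daily_mb[-1]


def historical_af(daily_mb: List[float]) -> float:
    """Historical policy (simplified): mean usage over the whole window."""
    return sum(daily_mb) / len(daily_mb)


user_a = [0, 0, 0, 0, 100]     # idle for four days, 100MB today
user_b = [50, 50, 50, 50, 50]  # a steady 50MB every day

print(current_af(user_a), current_af(user_b))        # 100 50
print(historical_af(user_a), historical_af(user_b))  # 20.0 50.0
```

Under the current-usage view user A looks like the larger abuser; under the historical view the ranking reverses.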
Table 1 shows the three Rubberd configuration parameters used
to compute abusage factors. Rubberd always computes the current
usage per user (Uc) at configurable intervals. If the administrator
configured the use of historical factors, then Rubberd
also computes a running composite AF and stores it in a DB3
file.
Current Usage: The Rubberd configuration file (rubberd.conf) parameter
abusage_cur takes a single argument that selects the
mode in which total current usage (Uc) is computed:
Ue: In this mode we only consider the total elastic usage (Ue,
in disk blocks) that the user consumes. This mode considers
elastic usage separately from persistent quotas or persistent
usage; it is most useful in environments with small persistent
quotas. Rubberd gets this number directly from the in-kernel
quota system by querying it for the usage of the user's shadow
UID.
Ue-Ap: Users who use elastic files and also have a persistent
quota may not have consumed all of their persistent quota.
Such users could argue that Ue alone is not a fair assessment
of their usage because they have persistent quota available
(Ap) and could simply convert some of their elastic files
to persistent ones. Therefore, this mode computes a user's
current usage as the amount of elastic space consumed minus
the user's available persistent quota (truncated to
zero).
Ue+Up: Similar to the previous mode, this mode considers
the current usage to be the total amount of disk space a user
consumes: the sum of elastic and persistent usage. This
mode could be useful in environments where certain users
have very different persistent quotas. In such an environment,
users with large persistent quotas could be viewed
as "hogging" disk space compared to users with smaller
persistent quotas.
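The three current-usage modes can be sketched as a single function. The mode strings, argument names and function name are hypothetical; the actual rubberd.conf syntax is not shown here, and all quantities are assumed to be in disk blocks.

```python
def current_usage(mode: str, ue: int, up: int, ap: int) -> int:
    """Return current usage Uc, in blocks, for one user.

    ue: elastic usage; up: persistent usage; ap: available (unused)
    persistent quota. Mode names mirror the text; config syntax is assumed.
    """
    if mode == "Ue":
        return ue
    if mode == "Ue-Ap":
        # Elastic usage that the unused persistent quota could not absorb,
        # truncated to zero.
        return max(ue - ap, 0)
    if mode == "Ue+Up":
        # Total footprint: elastic plus persistent usage.
        return ue + up
    raise ValueError(f"unknown current-usage mode: {mode}")
```

For example, with ue=500, up=300 and ap=200 the three modes yield 500, 300 and 800 blocks respectively, while a user whose elastic usage is below the available persistent quota gets a Ue-Ap usage of zero.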
Historical Usage: The Rubberd configuration parameter abusage_avg
computes a linear average of usage over a period of time.
This option takes two arguments: I defines the interval in seconds
between samplings of current usage; N defines the number
of samples to include in the running average. This mode
gives equal importance to each sample interval, but quickly "forgets"
usage prior to the oldest sample. The smaller I is, the more
closely this mode tracks elastic usage.
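A linear running average over the last N samples can be sketched with a bounded deque. The class name is hypothetical, and the sketch assumes the caller takes one sample of Uc every I seconds; the timer itself is not modeled.

```python
from collections import deque


class LinearAverage:
    """Running average over the last N samples of current usage (Uc)."""

    def __init__(self, n: int) -> None:
        # A bounded deque drops the oldest sample automatically.
        self.samples: deque = deque(maxlen=n)

    def add(self, uc: float) -> float:
        self.samples.append(uc)
        return sum(self.samples) / len(self.samples)


avg = LinearAverage(n=3)
for uc in [90, 0, 0, 0]:
    af = avg.add(uc)
print(af)  # 0.0: the 90-block sample has left the window
```

With N = 3, the single 90-block sample has no influence at all once three newer samples have arrived, which is the "forgetting" behavior of this mode.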
The configuration parameter abusage_exp computes an exponentially
decaying average. This option takes two arguments:
I is the sampling interval; D is the decay factor. For example,
with D = 2, the weight given to past usage is halved every I seconds.
The benefit of this mode is that it never entirely forgets a user's
past usage, but considers recent usage progressively more
important than older usage.
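The exact decay formula is not given here, so the sketch below assumes one common form: each new sample receives weight 1 - 1/D and the previous average is divided by D. With D = 2 this halves the influence of past usage every interval I, and the value never returns exactly to zero. The class name is hypothetical.

```python
class DecayingAverage:
    """Exponentially decaying average of current usage (Uc).

    Assumed update rule: avg = avg / D + (1 - 1/D) * sample, so with
    D = 2 the influence of any past sample halves every interval.
    """

    def __init__(self, d: float) -> None:
        if d <= 1:
            raise ValueError("decay factor D must be greater than 1")
        self.d = d
        self.value = 0.0

    def add(self, uc: float) -> float:
        self.value = self.value / self.d + (1 - 1 / self.d) * uc
        return self.value


avg = DecayingAverage(d=2)
for uc in [80, 0, 0]:
    af = avg.add(uc)
print(af)  # 10.0
```

Unlike the linear average, further zero samples keep shrinking the value (5.0, 2.5, and so on) without ever discarding the old 80-block sample outright.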
Cleaning Operation
To reclaim elastic space, Rubberd periodically wakes up and
performs a statfs to determine whether the high watermark has been
reached. If so, Rubberd spawns a new thread to perform the
actual cleaning. The thread reads the global policy file and
applies each policy sequentially, until the low watermark is
met or all policy entries are enforced.
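The trigger logic can be sketched as follows. disk_blocks uses Python's os.statvfs, the user-space counterpart of the statfs call mentioned above. The watermark percentages, the representation of policies as callables that return the number of blocks they freed, and all function names are assumptions for illustration; Rubberd's separate cleaning thread is not modeled.

```python
import os
from typing import Callable, List, Tuple


def disk_blocks(path: str) -> Tuple[int, int]:
    """Return (used_blocks, total_blocks) for the file system holding path."""
    st = os.statvfs(path)
    return st.f_blocks - st.f_bfree, st.f_blocks


def run_cleaning(used: int, total: int, high_pct: int, low_pct: int,
                 policies: List[Callable[[], int]]) -> int:
    """Apply policies in order until usage falls to the low watermark.

    Each policy is a hypothetical callable returning the blocks it freed.
    Returns the number of blocks still in use afterwards.
    """
    high = total * high_pct // 100
    low = total * low_pct // 100
    if used < high:
        return used  # high watermark not reached: no cleaning needed
    for policy in policies:
        if used <= low:
            break  # low watermark met; remaining policies are skipped
        used -= policy()
    return used
```

For example, with 1000 blocks, watermarks of 90% and 70%, and 950 blocks in use, policies that free 100 and then 200 blocks bring usage down to 650, so a third policy is never applied.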
The application of each policy proceeds in three phases:
abusage calculation, candidate selection, and application. For
user policies, Rubberd retrieves the abusage factor of each user
and then determines the number of blocks to clean from
each user in proportion to that user's abusage factor. For
global policies this step is skipped, since all files are considered
without regard to the owner's abusage factor.
Rubberd performs the candidate selection and application
phases only once for global policies. For user policies these two
phases are performed once for each user.
In the candidate selection phase, all possible candidate inode
numbers are first retrieved from the DB3 databases. Then Rubberd
gets the status information (size and times) for each file
using our bulk inode stat, bistat. Rubberd then sorts the candidates
based on the policy (say, largest or oldest files first).
For global policies we iterate through each user database and
store all candidates in an array. For user policies we simply
fetch all entries from the appropriate database. When a file pattern
is specified in policy.conf, we retrieve each file's name from
the database and compare it against the pattern.
We then discard the name, because we anticipate that most files will
not have any cleaning operation (e.g., rm or gzip) performed on them;
discarding the name avoids spending CPU and memory
resources to make a copy of each file name.
We use bistat to retrieve each file's size and sort attribute,
which we store with the candidate inode number. Since we
fetch all entries from a Btree database for each user, we automatically
get the files ordered first by UID and then by inode.
This lets us take advantage of inode locality on the physical disk
when performing the bistat procedure, because most user files
are created within the same directory and file systems try to
cluster such files on the native media to reduce disk seeks. The
last phase of the candidate selection is to sort the entire set of
candidates as defined in policy.conf.
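A minimal sketch of candidate selection, assuming each user database maps inode numbers to a single name (hard links are ignored) and that a dictionary of Candidate records stands in for the results of bistat. Shell-style patterns are matched with Python's fnmatch module, which may differ from the pattern syntax policy.conf actually accepts; all names here are hypothetical.

```python
import fnmatch
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    inode: int
    size: int   # as a bistat-like bulk stat would report it
    mtime: int  # sort attribute for an "oldest first" policy


def select_candidates(db: Dict[int, str], stats: Dict[int, Candidate],
                      pattern: Optional[str],
                      largest_first: bool) -> List[Candidate]:
    selected = []
    for inode in sorted(db):  # B-tree order: ascending inode number
        if pattern is not None and not fnmatch.fnmatch(db[inode], pattern):
            continue
        # The name was only needed for matching; keep inode and stat data.
        selected.append(stats[inode])
    if largest_first:
        selected.sort(key=lambda c: c.size, reverse=True)
    else:
        selected.sort(key=lambda c: c.mtime)  # oldest first
    return selected
```

Iterating in inode order before the final sort mirrors the point above: the bulk stat visits inodes in roughly on-disk order, and only the final, policy-specific sort reorders them.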
In the application phase, we start at the first element of the
candidate array and retrieve its name (or names, if hard links
exist) from the DB3 database. Then we reclaim disk space using
the administrator-supplied method. For example, we unlink
each name associated with the file if the "rm" policy was configured.
As we perform the application phase, we tally the number
of blocks reclaimed based on the previously obtained stat information;
this avoids having to call statfs after each file removal
to check whether the low watermark was reached.
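The application phase, including the running tally of reclaimed blocks, might look like the following sketch. Candidates are assumed to be (inode, blocks) pairs already sorted by the policy, names maps each inode to all of its hard-linked names, and reclaim is a hypothetical callback standing in for the configured method (for "rm", os.unlink would be a natural choice).

```python
from typing import Callable, Dict, List, Tuple


def apply_policy(candidates: List[Tuple[int, int]],
                 names: Dict[int, List[str]],
                 goal_blocks: int,
                 reclaim: Callable[[str], None]) -> int:
    """Reclaim files in policy order until goal_blocks have been freed."""
    reclaimed = 0
    for inode, blocks in candidates:
        if reclaimed >= goal_blocks:
            break
        for name in names[inode]:  # every hard link must be removed
            reclaim(name)
        # Tally from the earlier stat data instead of calling statfs.
        reclaimed += blocks
    return reclaimed
```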
Each time Rubberd completes an application phase, it runs
statfs and computes the number of blocks that still need to
be cleaned. If this number is not positive, cleaning terminates.
This gives smaller abusers a slight advantage. Since we
can only reclaim space in whole files, the goal
for each user is really a minimum goal. For example, suppose
Rubberd computes that it needs to remove 2MB from a given
user, and then deletes the oldest file, which happens to be 3MB
in size: Rubberd winds up reclaiming more space than the minimum
computed for that user. This excess space reclaimed from
the largest abusers ends up benefiting the smallest abusers, because
Rubberd will then need to clean fewer files from them.
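The following sketch shows how whole-file reclamation turns each per-user goal into a minimum and how overshoot by the largest abusers can spare smaller ones. It assumes users are processed from the highest abusage factor down and subtracts the tallied blocks instead of re-running statfs; clean_users is a hypothetical name, and files[user] is assumed to list candidate file sizes already sorted by policy.

```python
from typing import Dict, List


def clean_users(needed: int, af: Dict[str, float],
                files: Dict[str, List[int]]) -> Dict[str, int]:
    """Reclaim `needed` units, largest abuser first, whole files only."""
    total_af = sum(af.values())
    if total_af <= 0:
        return {}
    goals = {user: needed * f / total_af for user, f in af.items()}
    removed: Dict[str, int] = {}
    remaining = needed
    for user in sorted(af, key=lambda u: af[u], reverse=True):
        if remaining <= 0:
            break  # earlier overshoot already met the overall target
        got = 0
        for size in files.get(user, []):
            if got >= goals[user]:
                break
            got += size  # whole files only, so got may exceed the goal
        removed[user] = got
        remaining -= got  # stands in for re-running statfs
    return removed
```

With the 6MB example from earlier in this chapter (AFs of 10 and 20), if user B's oldest file is 7MB, clean_users(6, {"A": 10, "B": 20}, {"A": [3, 1], "B": [7]}) returns {"B": 7}: the overshoot already covers the total, so nothing is removed from user A.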
Usage Scenarios
The Equota system is flexible and can be configured to work well
in many situations. Here we describe two possible scenarios in
which Equota might be used.
Large Group File Server: The first scenario is that of a large
university-wide server. Users on such a server are usually
anonymous to each other and will try to get as much out of
the system as possible. Gaming would be a major concern, as
there would be little to no cooperation between users. In such
a situation, both persistent and elastic quotas would have to be
set. Although the purpose of elastic systems is to allow an almost
infinite amount of space to users, on such a large system it would be
necessary to set elastic quotas. Users would not be allowed
to use more than a certain amount of elastic space, thus avoiding
denial-of-service attacks and other gaming of the system.
Rubberd would monitor disk usage closely, at intervals as short
as an hour, and reclaim a large percentage of disk space when
the system goes over the high watermark. In such a hostile environment,
Rubberd would use a long historical abusage factor, so
as to account for longer-term trends of disk abusage.
Small Developer Community Server: The second scenario is that
of a cooperative group of software developers. In such a group,
both elastic and persistent quotas may be unlimited: all of the
disk space will be available to elastic or persistent files. Equota's
automatic cleaning mechanisms may be attractive to such a
group, which would rather spend time programming than managing
files. Since the group is cooperative and working toward
a common goal, the chance for gaming is very small. Such
a group would use Equota to mark certain patterns of files for
deletion, such as all regenerable files (compiler-generated ones).
These advanced users might modify some of their tools to use
the O_ELASTIC flag to designate certain application-generated
files elastic by default. Such a user community will also make
extensive use of per-user policy files, for example to mark personal
MP3 files elastic.
Chapter 4. Implementation
In this section we discuss two interesting implementation aspects
of Equota: the single bit for recording elasticity and the
SLF queue.
To determine whether a file is elastic, we use the Ext3 nodump
bit. This bit indicates that a file should not be backed up, so its
semantics already make sense for elastic files. Most stackable
file systems attempt to achieve complete independence from the
underlying file system. Our implementation, however, takes advantage
of Ext2- and Ext3-specific features without
modifying those file systems.
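For readers who want to inspect the same bit from user space, the sketch below reads a file's inode flags with the Linux FS_IOC_GETFLAGS ioctl, the interface behind lsattr (chattr +d sets the flag). The constant values are taken from the Linux kernel header linux/fs.h and assume 64-bit Linux, and is_elastic is a hypothetical helper; EQFS itself tests the bit inside the kernel.

```python
import fcntl
import os
import struct

# From linux/fs.h; FS_IOC_GETFLAGS as encoded on 64-bit Linux (assumption).
FS_IOC_GETFLAGS = 0x80086601
FS_NODUMP_FL = 0x00000040  # shown as 'd' by lsattr


def is_elastic_flags(flags: int) -> bool:
    """Treat a file as elastic when its nodump flag is set."""
    return bool(flags & FS_NODUMP_FL)


def is_elastic(path: str) -> bool:
    """Read the inode flags of path (requires Ext2/Ext3-style attributes)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = fcntl.ioctl(fd, FS_IOC_GETFLAGS, bytes(8))
    finally:
        os.close(fd)
    (flags,) = struct.unpack("I", buf[:4])  # the kernel stores an int
    return is_elastic_flags(flags)
```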
The stacking templates on which EQFS was based had 5226
lines of code; we added 2274 lines to implement EQFS. Of these
added lines, only 9 (0.4%) are specific to Ext2/Ext3. The use
of a stacking layer still provides most of the benefits of stacking:
we did not need to modify the existing operating system,
we were able to use file-system-specific features to enhance
performance, and we can support additional native file systems
with little new code.
The short-lived-file (SLF) queue acts as a front-end to the netlink
socket. It is designed to help filter out unnecessary messages.
There are 5 message types: CREATE, LINK, RENAME, DELETE,
and CHOWN. For the first 4 types, an object is the same if the
inode number and name match. For a CHOWN message only
the inode number needs to match for two messages to be for the
same object. There are two ways that the SLF queue prevents
unneeded messages from reaching Rubberd. The first is to defer
sending CREATE messages for a short period of time; if the file
is quickly deleted there is no need to send the original CREATE
or the DELETE message. The second is that messages for the
same object can often be coalesced into a single message.
Since we coalesce messages in the queue, we can ensure that
two messages for the same object will never exist in the
queue at the same time. For example, a CREATE and a RENAME
cannot exist in the queue at the same time because the CREATE
would absorb the RENAME. All messages except DELETE
messages are enqueued. If there is no corresponding CREATE
in the queue, a DELETE message must be sent at some point,
so rather than deferring it, we send it immediately.
Other messages may be coalesced into the DELETE message;
for example, if a RENAME message is in the queue, the
DELETE will be sent with the original name.
Each time an operation occurs for which a message would normally
be generated, the SLF queue is scanned for a message
that is for the same object. If no queued message is found then
the new message is simply inserted at the end of the queue. If
a pending message is found, an appropriate action
takes place. For example, if the root user unpacks files from a
tar archive and then removes them, the following sequence takes place
for each file in the archive: a CREATE entry is inserted into
the queue; a CHOWN event takes place, but rather than being
sent to netlink or inserted as a new entry into the SLF queue, it
causes the queued CREATE message to be modified to carry the new UID. When
the DELETE event takes place, the existing CREATE message is
simply removed from the SLF queue, and nothing is sent to Rubberd.
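The coalescing rules for the root-and-tar example can be sketched as a small queue. This is a simplified model, not the EQFS code: only CREATE, CHOWN and DELETE are coalesced, LINK and RENAME messages are simply queued, and the sent list stands in for the netlink socket. Class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Msg:
    kind: str  # "CREATE", "LINK", "RENAME", "DELETE" or "CHOWN"
    inode: int
    name: str
    uid: int = 0


def same_object(a: Msg, b: Msg) -> bool:
    """CHOWN matches on inode alone; the other types need inode and name."""
    if a.kind == "CHOWN" or b.kind == "CHOWN":
        return a.inode == b.inode
    return a.inode == b.inode and a.name == b.name


class SLFQueue:
    def __init__(self) -> None:
        self.queue: List[Msg] = []  # deferred messages
        self.sent: List[Msg] = []   # stands in for the netlink socket

    def event(self, msg: Msg) -> None:
        pending: Optional[Msg] = next(
            (m for m in self.queue if same_object(m, msg)), None)
        if pending is None:
            if msg.kind == "DELETE":
                self.sent.append(msg)   # DELETE is never deferred
            else:
                self.queue.append(msg)  # defer: the file may be short-lived
        elif msg.kind == "CHOWN":
            pending.uid = msg.uid       # fold the new owner into the queued message
        elif msg.kind == "DELETE":
            self.queue.remove(pending)
            if pending.kind != "CREATE":
                self.sent.append(msg)   # Rubberd already knows this file
        else:
            self.queue.append(msg)      # LINK/RENAME coalescing not modeled

    def flush(self) -> None:
        """Send everything still deferred (e.g. when a timer expires)."""
        self.sent.extend(self.queue)
        self.queue.clear()
```

Replaying the tar sequence (CREATE, CHOWN, DELETE for one inode) leaves both the queue and the sent list empty, while a DELETE for a file that was never queued is sent at once.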
Chapter 5. Related Work
Elastic quotas are complementary to Hierarchical Storage Management
(HSM) systems. HSM systems provide disk backup as
well as ways to reclaim disk space by moving less-frequently accessed
files to a slower disk or tape. These systems then provide
a way to access files stored on the slower media, ranging from
file search software to replacing the original, migrated file with
a link to its new location. Examples of HSM systems include the
Network Appliance Snapshot system, the Smart Storage Infinet
system, IBM Storage Management, and UniTree. The UniTree
HSM system uses a combination of file size and the age of a
file in hours to compute the eligibility of a file to be moved to
another medium. Rubberd can be similarly configured to clean
files based on size and time; however, Rubberd also uses more
complex algorithms to compute disk space usage over time.
Elastic quotas can be used with HSM systems as a mechanism
for determining which files are moved to slower storage. Given
an HSM system, Rubberd could then reclaim disk space when
it becomes scarce by moving elastic files to the slower layers of
the HSM storage hierarchy.
The SLF queue optimization is somewhat similar to Soft Updates
(SU). SU creates a queue of meta-data information to be written
to the file system, much as the SLF queue maintains a queue of
meta-data to be written to DB3 databases. Both SU and the SLF
queue can aggregate meta-data updates into a single on-disk
update. Unlike SU, the SLF queue does not contain dependency
information. The DB3 databases are kept consistent by maintaining
the order of the events for each inode, and ensuring that
each event results in at most one message to Rubberd.
The bulk inode stat (bistat) is similar to dm_get_bulkattr in the
DMAPI. Both bistat and dm_get_bulkattr avoid the overhead
of lookups and recursive directory scanning. However, whereas
dm_get_bulkattr retrieves attributes for all files, bistat allows
retrieving information for targeted files only.
Much previous work has been done to develop mechanisms
for sharing resources so that they can be utilized fully yet fairly. These resources
are elastic in the sense that they can be allocated to
a user in such a way that the allocation can be increased or
decreased over time based on availability. For example, a processor
scheduler enables a group of users to share processor
cycles fairly, but allows a user to monopolize the resource when
no one else is using it. Elastic Quotas can be thought of as a
way to make disk space an elastic resource as well. The ability
to use disk space elastically opens up new opportunities for applying
elastic resource management ideas such as proportional
sharing to disk space, a previously unexplored area.
Chapter 6. Conclusions and Future Work
The main contribution of this report is the exploration of various
elastic quota policies, demonstrating the utility of treating
storage as an elastic resource. The Elastic Quota prototype for Linux
includes many features that allow both site administrators
and users to tailor their elastic quota policies to their needs. For
example, it provides several different ways to decide when a file
becomes elastic: from the directory's mode, from the file's name,
from the user's login session, and even by the application itself.
Through the concept of an abusage factor, we have introduced
historical use into quota systems. The policy engine is flexible,
allowing a variety of methods for elastic space reclamation. Our
evaluation shows that the performance overheads are small and
acceptable for day-to-day use. Finally, research is ongoing in
this area to add more functionality to the Elastic Quota system.
One extension is to expand the definition of persistent
and elastic files to include a file that has a lifetime. A file lifetime
would include a minimum lifetime and a maximum lifetime. A
persistent file has an infinite minimum lifetime and an elastic
file has a minimum lifetime of zero. The minimum lifetime would
be useful for data that may not be relevant for longer than some
predefined time period. A maximum lifetime would provide for
the automatic deletion of files; this could be valuable because it
would ensure that a company's records retention policy is enforced
or that personal information is not available after a certain point.
This type of lifetime would also serve as a better priority for
space reclamation than atime or mtime.
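The proposed lifetime model can be illustrated with a small classification function. This is purely a sketch of the future-work idea under stated assumptions: ages and lifetimes share one time unit, None means no maximum lifetime, and the function and state names are hypothetical.

```python
from typing import Optional


def lifetime_state(age: float, min_life: float,
                   max_life: Optional[float]) -> str:
    """Classify a file under the proposed lifetime model.

    min_life = inf behaves like a persistent file, min_life = 0 like an
    elastic one; max_life = None means the file never expires.
    """
    if max_life is not None and age >= max_life:
        return "expired"      # eligible for automatic deletion
    if age < min_life:
        return "protected"    # treated like a persistent file
    return "reclaimable"      # treated like an elastic file
```

A file created with a minimum lifetime of 7 days and a maximum of 30 would be protected for its first week, reclaimable under space pressure afterwards, and deleted automatically once it is 30 days old.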
Aside from the general context of shared file-servers, elastic
quotas can be applied to managing storage for special purpose
file systems. Versioning or snapshotting file systems would benefit
from elastic quotas because past versions are less important
to maintain. The Elephant File System showed that versions
come in bursts. It is possible to select versions that are at the
tail of a burst and discard intermediate versions within a burst.
Elastic Quotas could be adapted to efficiently manage these
intermediate versions. In addition, the elastic attribute could be
used to indicate whether a file is a good candidate for versioning:
since elastic files are not as important as persistent files,
they do not need to be versioned.
Maintenance of the DB3 databases in user space is one of the
more expensive components of this system. One optimization is
to embed the DB3 management into the kernel. This will eliminate
the need for netlink messages and data copies, and could
provide significant performance improvements.
