OSCER PetaStore Policy and Procedures
Last Update: March 6, 2012
Summary
The Oklahoma PetaStore is now in full production!
It has room for
an almost arbitrarily large amount of storage.
Details
The Oklahoma PetaStore is now in full production,
available for the following categories of users:
-
users who are employees of, or students at, OU
(Norman, Health Science Center and
Schusterman Tulsa campuses);
-
users who are at US institutions and who are
collaborating on projects that have
at least one OU faculty and/or staff member
as Principal and/or Co-Principal Investigator(s);
-
users who are at other Oklahoma institutions
but who aren’t collaborating with
OU faculty or staff.
Please note that we CANNOT provide
access to the PetaStore
to users at institutions outside the US
(where the US includes both
US states and US territories).
Tape
The PetaStore has room for
an arbitrarily large number of tape cartridges
(currently 2889 tape cartridge slots total,
with only a few hundred already filled,
but expandable to over 22,600 slots
at no charge to you).
Each tape cartridge currently holds
up to 1.5 TB (raw) and costs $100,
plus $50 in Indirect Costs,
for a total of $150 per tape cartridge
($100 per TB raw).
Prices will come down over time,
so we recommend not overbuying
at any one time.
As a general rule,
for tape we recommend including
a contingency of 25% for growth
and an additional 25% because
tape cartridges are rarely filled 100%.
In addition,
we recommend keeping
two copies of every file,
on two different pieces of media –
so either buy twice as many tape cartridges,
or store the second copy elsewhere than the
PetaStore
(for example,
at one of the national supercomputing centers).
So you may need to budget as much as $300 per TB,
for double copies and contingency needs.
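As a rough illustration of the arithmetic above, the following Python sketch estimates how many tape cartridges to buy and what they would cost. The function name and parameters are hypothetical, and multiplying the two 25% contingencies together is an assumption about how they combine; the capacity and price are the figures quoted above and will change over time.

```python
import math

TAPE_CAPACITY_TB = 1.5   # raw capacity per cartridge, as quoted above
TAPE_COST_USD = 150      # $100 plus $50 Indirect Costs, as quoted above


def tape_budget(data_tb, copies=2, growth=0.25, fill_slack=0.25):
    """Return (cartridges, total cost in USD) for archiving data_tb terabytes.

    Assumption: the growth and fill contingencies are applied multiplicatively.
    """
    needed_tb = data_tb * (1 + growth) * (1 + fill_slack) * copies
    cartridges = math.ceil(needed_tb / TAPE_CAPACITY_TB)
    return cartridges, cartridges * TAPE_COST_USD
```

For 10 TB of data with two copies, this sketch gives 21 cartridges at $3,150, which is in line with the figure of roughly $300 per TB.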
Purchases MUST be made through the IT Store.
We’ll be happy to help you work
with them to get a quote and execute a purchase.
Disk
The PetaStore has room for 1200 disk drives,
NOT expandable beyond 1200 at all.
We already have 540 disk drives in place,
leaving room for 660 more
(about another 1 PetaByte,
which is about 1000 TB).
Disk drives (2 TB SATA 7200 RPM) cost $525 each,
but if you buy more than $5000 at a time
(at least 10 disk drives, $5250),
then you won’t be charged Indirect Costs for them.
In general,
disk drives MUST be bought
in lots of 10 anyway,
because of how they’re configured
in the PetaStore.
A group of 10 disk drives
will provide about 15 TB
of useable disk space
(roughly $350 per useable TB,
compared to
roughly $125 per useable TB of tape per copy).
(They're deployed in a RAID6 configuration,
meaning that 2 of the disk drives
are used for redundant data,
and useable space on the other 8 drives
is roughly 94% of raw capacity.)
Growth beyond the existing 1200 disk drive slots
isn’t possible.
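The disk figures above can be turned into a small Python sketch for sizing a purchase. The function name is hypothetical, and no growth contingency is included; the lot size, useable space per lot, and price are the approximate values quoted above.

```python
import math

DRIVES_PER_LOT = 10       # drives are bought in lots of 10
USABLE_TB_PER_LOT = 15    # approximate useable space per lot
DRIVE_COST_USD = 525      # per 2 TB SATA drive, as quoted above


def disk_budget(data_tb):
    """Return (lots, drives, total cost in USD) to hold data_tb of useable space."""
    lots = math.ceil(data_tb / USABLE_TB_PER_LOT)
    drives = lots * DRIVES_PER_LOT
    return lots, drives, drives * DRIVE_COST_USD
```

For example, 40 TB of useable space would need 3 lots (30 drives) under these assumptions. Because each lot of 10 already exceeds $5000, Indirect Costs would not apply.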
Archiving vs Backups
Here,
“backup” means
“make copies of new and changed files
frequently (incremental backup)
and copies of all files
infrequently (full dump).”
By contrast,
“archive” means
“Write Once, Read Seldom.”
The PetaStore is for ARCHIVING ONLY –
NO BACKUPS.
What Data Can be Archived?
The PetaStore is intended for RESEARCH DATA.
The following categories are explicitly forbidden:
Also, if your files are subject to one or more
Institutional Review Board (IRB) agreements
governing human subjects research,
whether OU's IRB or another institution's,
then it's
YOUR RESPONSIBILITY
to ensure compliance.
Eligibility
Please note that,
at the moment,
we can accommodate only those users
who fall into the categories
at the top of this note.
For legal reasons,
we CANNOT accommodate users at
institutions outside the US.
Preparing to Use the PetaStore
Before you can use the PetaStore,
you’ll need to submit a completed
PetaStore
Use Agreement form.
You can submit it on paper,
by fax, or as a scanned image by e-mail.
Submit it to Henry Neeman as follows:
-
By e-mail (preferred):
-
By fax: 405–325–7181
-
By campus mail: 4PP 1000
-
By postal mail:
4PP 1000, 301 David L. Boren Blvd, Norman OK 73019
Once we receive your form,
we’ll acknowledge it
so that you know
you can start moving forward.
Contact us at:
We’ll help you coordinate with
OU’s IT Store
to get a quote and
execute the purchase.
Please recall that
tape cartridges purchased on
research grant funds
typically are subject to Indirect Costs
(also known as
“Facilities & Administration”).
Using the PetaStore
Once we’ve received
your completed PetaStore Use Agreement form,
and you have your research group’s media
purchased and deployed,
we’ll be happy to send
a member of the OSCER operations team
to train your group
in how to use the PetaStore properly.
If you’re off campus,
we may choose to do that
via phone or videoconferencing
instead of in person.
File Sizes
Please note that
the PetaStore CANNOT be used
for small files.
We strongly urge that
all files stored on the PetaStore
be between 10 GB and 100 GB,
although in special cases
you can store files as small as 1 GB.
We don’t recommend
files larger than 100 GB,
because of long retrieval times,
but if you absolutely need to,
please talk to us about it.
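The size guidance above can be summarized as a small Python sketch. The function name and the returned labels are hypothetical, and the treatment of files exactly at a boundary (1, 10, or 100 GB) is an assumption.

```python
def classify_file_size(size_gb):
    """Classify a file size in GB against the guidance above."""
    if size_gb < 1:
        return "too small: aggregate with other files first"
    if size_gb < 10:
        return "allowed only in special cases"
    if size_gb <= 100:
        return "recommended size"
    return "larger than recommended: talk to OSCER first"
```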
If you have many small files
(under 1 GB each),
then you can aggregate them into a
Zip file
or a
gzipped
tar
file
(Unix equivalent of a Zip file),
in order to achieve
the minimum acceptable file size
for the PetaStore.
Note that,
if you have individual files of
10 to 100 GB each,
we still strongly recommend
compressing them via Zip or gzip
(or other comparable methods),
even if you aren’t
aggregating multiple files into
a single Zip or tar file.
We can help you learn how to do any of these tasks.
Note that
you’ll need to create
the Zip file or gzipped tar file
on the source disk system
where the files are originally,
NOT on the PetaStore.
That is,
the files that you want to transfer to the PetaStore
must ALREADY be
compressed and/or converted to Zip or tar files.
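As one possible way to do the aggregation described above, here is a minimal Python sketch that bundles a directory of small files into a single gzipped tar file using the standard tarfile module. The function name and paths are hypothetical, and it should be run on the source disk system, not on the PetaStore.

```python
import tarfile
from pathlib import Path


def make_archive(source_dir, archive_path):
    """Bundle everything under source_dir into one gzipped tar file."""
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the directory's own name
        tar.add(source, arcname=source.name)
    return Path(archive_path).stat().st_size  # archive size in bytes
```

The returned size can be compared against the minimum acceptable file size before transferring the archive.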
Storage (Write) Speed
We’ve benchmarked file store (write) speeds
into the PetaStore as follows:
-
approximately 150 MB/sec
from Sooner’s fast /scratch,
for a non-parallel copy
(1 TB should be roughly 2 hours);
-
approximately 200 MB/sec
from Sooner’s fast /scratch,
for a parallel copy
(1 TB should be roughly 1 1/2 hours);
-
we expect 25 - 50 MB/sec
from Sooner’s slow /scratch
(1 TB should be roughly 6 to 12 hours);
-
we expect 25 - 50 MB/sec from Sooner’s /home;
-
we expect 25 MB/sec via sftp or scp
from outside OSCER.
We expect the performance on fast /scratch
to improve as we learn how to optimize it;
the others probably won’t improve much.
For comparison purposes,
file transfers to national supercomputing centers
typically run at 30–50 MB/sec.
Note that,
because file stores (writes) go to the PetaStore disk
rather than directly to tape,
your file stores (writes)
won’t spend any time waiting for
your store-to-tape operation to get started;
instead,
that happens invisibly under the covers
after your file store (write) completes.
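The hour estimates quoted above for 1 TB follow from simple division, which this Python sketch reproduces. The function name is hypothetical, and it assumes decimal units (1 TB = 1,000,000 MB) and a sustained transfer rate.

```python
def transfer_hours(size_tb, rate_mb_per_sec):
    """Estimate hours to move size_tb terabytes at a sustained rate.

    Assumption: decimal units, 1 TB = 1,000,000 MB.
    """
    seconds = size_tb * 1_000_000 / rate_mb_per_sec
    return seconds / 3600
```

At 150 MB/sec this gives about 1.9 hours per TB, at 200 MB/sec about 1.4 hours, and at 25 to 50 MB/sec between about 5.6 and 11.1 hours.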
Retrieval (Read) Speed
We anticipate that file retrieval (read) time
will be made up of roughly the following:
-
time spent
waiting for a tape drive
to become available,
which will depend entirely
on how many other file transfers
are ongoing or waiting to begin,
-
time spent collecting the correct tape cartridge,
inserting it into a tape drive,
and rewinding to the appropriate place
in the tape cartridge:
typically 2 minutes,
-
read time:
2 minutes for a 10 GB file,
20 minutes for a 100 GB file,
-
copying back to
wherever you actually want
to transfer the file to:
see STORAGE SPEED, above.
Other than time spent waiting to get started,
total time should be:
-
10 GB file: roughly 10 minutes;
-
100 GB file: roughly 1 1/4 hours.
In other words,
compared to copying
the same file between
two different disk systems
in the same room,
the time cost will be
roughly the same, or better,
plus 4 minutes,
plus wait time.
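The retrieval components listed above can be added up in a short Python sketch. The function name and parameters are hypothetical; it assumes the stages run one after another, and the copy-back rate is something you would pick from the storage speeds listed earlier.

```python
MOUNT_MINUTES = 2            # fetch, load, and position the tape cartridge
TAPE_READ_MIN_PER_GB = 0.2   # 2 minutes per 10 GB, from the figures above


def retrieval_minutes(size_gb, copy_rate_mb_per_sec, wait_minutes=0):
    """Estimate total minutes to retrieve one file of size_gb gigabytes."""
    read = size_gb * TAPE_READ_MIN_PER_GB
    copy = size_gb * 1000 / copy_rate_mb_per_sec / 60
    return wait_minutes + MOUNT_MINUTES + read + copy
```

With a copy-back rate of 25 MB/sec, a 10 GB file comes out near 11 minutes. A 100 GB file takes roughly 55 to 89 minutes for rates between 50 and 25 MB/sec, which brackets the estimates above.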
Note that,
at least for the near future,
your time spent waiting
for a retrieval to get started
should be very modest in most cases.
This is because the PetaStore
has 4 tape drives
and gives priority to file retrievals (reads)
when there is
a mix of stores and retrievals
waiting to get started.
Duplication for Resiliency
You’ll be able to store your files
under any of
the following duplication policies,
depending on what media
(tape and/or disk) you buy:
-
one copy on disk only (unsafe);
-
one copy on disk and one copy on tape;
-
one copy on tape only (unsafe);
-
two copies on tape.
You can choose which of these policies to use,
on a per-file or per-directory basis.
NOTE:
You can only choose a particular duplication policy
if you’ve purchased appropriate media.
For example,
if you only buy tape cartridges,
then you can only choose
“one copy on tape only” or “two copies on tape.”
Please note that we DON’T allow
multiple copies on disk,
because of the severe scarcity of disk drive slots.
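The dependence of duplication policies on purchased media can be expressed as a small Python sketch. The function name is hypothetical, and the policy strings are just the labels from the list above, not actual PetaStore settings.

```python
def available_policies(has_disk, has_tape):
    """List the duplication policies usable with the media you have bought."""
    policies = []
    if has_disk:
        policies.append("one copy on disk only")
    if has_disk and has_tape:
        policies.append("one copy on disk and one copy on tape")
    if has_tape:
        policies.append("one copy on tape only")
        policies.append("two copies on tape")
    return policies
```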
Duplication Use Cases
Here are some example use cases
for these duplication policies:
-
One copy on disk only:
The file is also stored
on another storage resource
(perhaps a large scale storage resource
at a national supercomputing center)
and you expect to need to use it often.
-
One copy on disk and one copy on tape:
You’re the owner of the file,
it only exists on the PetaStore,
and you expect to need to use it often.
-
One copy on tape only:
The file is also stored
on another storage resource
(perhaps a large scale storage resource
at a national supercomputing center)
and you don’t expect
to need to use it often.
-
Two copies on tape:
You’re the owner of the file,
it only exists on the PetaStore,
and you don’t expect
to need to use it often.
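The four use cases above come down to two questions: whether the file is also stored elsewhere, and whether you expect to use it often. A minimal Python sketch of that mapping follows; the function name is hypothetical, and it does not check whether you have bought the media that the suggested policy needs.

```python
def suggest_policy(stored_elsewhere, used_often):
    """Map the two questions in the use cases above to a duplication policy."""
    if used_often:
        if stored_elsewhere:
            return "one copy on disk only"
        return "one copy on disk and one copy on tape"
    if stored_elsewhere:
        return "one copy on tape only"
    return "two copies on tape"
```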
Limitations
Please carefully read
the
PetaStore
Use Agreement form
to understand
the limitations of using the PetaStore,
including the kinds of files
that CANNOT be stored
on the PetaStore,
who CANNOT use the PetaStore,
and what you have to take responsibility for
when using the PetaStore.