OU Supercomputing Center for Education & Research
University of Oklahoma   OSCER   OU IT

 

 

OSCER PetaStore Policy and Procedures

Last Update: March 6, 2012

Summary

The Oklahoma PetaStore is now in full production!

It has room for an almost arbitrarily large amount of storage.

Details

The Oklahoma PetaStore is now in full production, available for the following categories of users:

  • users who are employees of, or students at, OU (Norman, Health Sciences Center and Schusterman Tulsa campuses);

  • users who are at US institutions and who are collaborating on projects that have at least one OU faculty and/or staff member as Principal and/or Co-Principal Investigator(s);

  • users who are at other Oklahoma institutions but who aren’t collaborating with OU faculty or staff.

Please note that we CANNOT provide access to the PetaStore to users at institutions outside the US (where the US includes both US states and US territories).

Tape

The PetaStore has room for a very large number of tape cartridges (currently 2889 tape cartridge slots in total, with only a few hundred already filled, but expandable to over 22,600 slots at no charge to you).

Each tape cartridge currently holds up to 1.5 TB (raw).

As a general rule, for tape we recommend including a contingency of 25% for growth and an additional 25% because tape cartridges are rarely filled 100%.

In addition, we recommend keeping two copies of every file, on two different pieces of media – so either buy twice as many tape cartridges, or store the second copy elsewhere than the PetaStore (for example, at one of the national supercomputing centers).
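As a rough planning aid, the tape-sizing advice above can be turned into simple arithmetic. This is a minimal sketch, not an official OSCER calculator: it assumes the two 25% contingencies are added together (a 50% allowance in total) and that each copy lives on its own set of cartridges. The function name cartridges_needed is hypothetical.

```python
import math

# Figures from this page: up to 1.5 TB (raw) per cartridge.
CARTRIDGE_TB = 1.5
GROWTH_CONTINGENCY = 0.25   # recommended allowance for data growth
FILL_CONTINGENCY = 0.25     # cartridges are rarely filled 100%


def cartridges_needed(data_tb: float, copies: int = 2) -> int:
    """Estimate how many tape cartridges to buy for data_tb terabytes.

    Assumption: the two 25% contingencies are additive (50% total),
    and every copy is written to separate cartridges.
    """
    padded_tb = data_tb * (1 + GROWTH_CONTINGENCY + FILL_CONTINGENCY)
    per_copy = math.ceil(padded_tb / CARTRIDGE_TB)
    return per_copy * copies


if __name__ == "__main__":
    # Example: 30 TB of research data, two copies on tape.
    print(cartridges_needed(30))
```

If the second copy will be kept elsewhere (for example, at a national supercomputing center), pass copies=1.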

Purchases MUST be made through an approved reseller, and MUST be of approved model numbers.

WARNING: We CANNOT put incorrect model numbers, or correct model numbers from unapproved vendors, into the PetaStore.

We’ll be happy to help you to execute a purchase.

Current pricing can be found here.

If you're buying the tape cartridge(s) via external funds (for example, a grant), then your purchase is subject to Indirect Costs.

For OU external funds, the rate can be found here. (Look for the "Organized Research" rate.)

Disk

The PetaStore has room for 1200 disk drives, NOT expandable beyond 1200 at all.

We already have 540 disk drives in place, leaving room for 660 more (about another 1 petabyte, which is about 1000 TB).

Disk drives (2 TB SATA 7200 RPM) cost $525 each, but if you buy more than $5000 at a time (at least 10 disk drives, $5250), then you won’t be charged Indirect Costs for them.

In any case, disk drives MUST be bought in lots of 10, because of how they’re configured in the PetaStore. A group of 10 disk drives provides about 15 TB of usable disk space (roughly $350 per usable TB, compared to roughly $125 per usable TB of tape per copy).

(They're deployed in a RAID6 configuration, meaning that 2 drives' worth of capacity in each group of 10 holds redundant (parity) data, and usable space on the other 8 drives' worth is roughly 99% of raw capacity.)
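The disk figures above (price per drive, groups of 10, about 15 TB usable per group, the $5000 Indirect Costs threshold, and the 660 free slots) can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers quoted on this page; the function names group_summary and remaining_capacity_tb are hypothetical.

```python
# Figures from this page: 2 TB drives at $525 each, bought in groups of 10,
# each group yielding about 15 TB usable under RAID6.
DRIVE_PRICE_USD = 525
DRIVES_PER_GROUP = 10
PARITY_DRIVES = 2            # RAID6: 2 drives' worth of capacity holds parity
USABLE_TB_PER_GROUP = 15     # approximate usable space quoted per group
FREE_SLOTS = 1200 - 540      # drive slots not yet filled
IDC_THRESHOLD_USD = 5000     # purchases above this aren't charged Indirect Costs


def group_summary(groups: int = 1) -> dict:
    """Cost and usable space for a purchase of whole 10-drive groups."""
    drives = groups * DRIVES_PER_GROUP
    cost = drives * DRIVE_PRICE_USD
    usable_tb = groups * USABLE_TB_PER_GROUP
    return {
        "drives": drives,
        "data_drives": groups * (DRIVES_PER_GROUP - PARITY_DRIVES),
        "cost_usd": cost,
        "usable_tb": usable_tb,
        "usd_per_usable_tb": cost / usable_tb,
        "exceeds_5000_usd": cost > IDC_THRESHOLD_USD,
    }


def remaining_capacity_tb() -> int:
    """Usable TB if every free slot were filled with 10-drive groups."""
    return (FREE_SLOTS // DRIVES_PER_GROUP) * USABLE_TB_PER_GROUP


if __name__ == "__main__":
    print(group_summary(1))
    print(remaining_capacity_tb())
```

One group works out to $5250 and about $350 per usable TB, and the 660 free slots hold 66 groups, or roughly 990 TB, consistent with "about another 1 petabyte".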

Growth beyond the existing 1200 disk drive slots isn’t possible.

Archiving vs Backups

Here, “backup” means “make copies of new and changed files frequently (incremental backup) and copies of all files infrequently (full dump).”

By contrast, “archive” means “Write Once, Read Seldom.”

The PetaStore is for ARCHIVING ONLY – NO BACKUPS.

What Data Can be Archived?

The PetaStore is intended for RESEARCH DATA.

The following categories are explicitly FORBIDDEN:

Also, if your files are subject to one or more Institutional Review Board (IRB) agreements governing human subjects research, whether OU's IRB or another institution's, then it's YOUR RESPONSIBILITY to ensure compliance.

Eligibility

Please note that, at the moment, we can accommodate only those users who fall into the categories at the top of this note.

For legal reasons, we CANNOT accommodate users at institutions outside the US.

Preparing to Use the PetaStore

Before you can use the PetaStore, you’ll need to submit a completed PetaStore Use Agreement form.

You can submit it on paper, by fax, or as a scanned image by e-mail.

Submit it to Henry Neeman as follows:

  • By e-mail (preferred):
  • By fax: 405–325–7181
  • By campus mail: 4PP 1000
  • By postal mail: 4PP 1000, 301 David L. Boren Blvd, Norman OK 73019

Once we receive your form, we’ll acknowledge it so that you know you can start moving forward.

Purchasing Media

Contact us at:

We’ll help you coordinate with the appropriate resellers to get a quote and execute the purchase.

Please recall that tape cartridges purchased on research grant funds typically are subject to Indirect Costs (also known as “Facilities & Administration”).

Using the PetaStore

Once we’ve received your completed PetaStore Use Agreement form, and your research group’s media have been purchased and deployed, we’ll be happy to send a member of the OSCER operations team to train your group in how to use the PetaStore properly.

If you’re off campus, we may choose to do that via phone or videoconferencing instead of in person.

File Sizes

Please note that the PetaStore CANNOT be used for small files.

We strongly urge that all files stored on the PetaStore be between 10 GB and 100 GB, although in special cases you can store files as small as 1 GB.

We don’t recommend files larger than 100 GB, because of long retrieval times, but if you absolutely need to, please talk to us about it.
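Before transferring anything, it can help to survey a directory against the size guidance above. This is a minimal sketch that assumes decimal gigabytes (1 GB = 10^9 bytes); OSCER may measure sizes differently, and the names size_advice and survey are hypothetical.

```python
import os

GB = 10**9  # assumption: decimal gigabytes


def size_advice(size_bytes: int) -> str:
    """Classify one file against the PetaStore size guidance."""
    if size_bytes < 1 * GB:
        return "too small: aggregate into a Zip or gzipped tar file"
    if size_bytes < 10 * GB:
        return "acceptable only in special cases (1-10 GB)"
    if size_bytes <= 100 * GB:
        return "recommended (10-100 GB)"
    return "larger than 100 GB: talk to OSCER first"


def survey(directory: str) -> None:
    """Print advice for every regular file directly under directory."""
    for entry in os.scandir(directory):
        if entry.is_file():
            print(f"{entry.name}: {size_advice(entry.stat().st_size)}")


if __name__ == "__main__":
    survey(".")
```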

If you have many small files (under 1 GB each), then you can aggregate them into a Zip file or a gzipped tar file (the Unix equivalent of a Zip file) to reach the minimum acceptable file size for the PetaStore.
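As one way to do the aggregation, Python's standard tarfile module can write a gzipped tar file of a whole directory. This is a minimal sketch, run on the source disk system where the files already live; the function name make_tar_gz and the example paths are hypothetical, and the usual command-line tar works just as well.

```python
import tarfile
from pathlib import Path


def make_tar_gz(source_dir: str, archive_path: str) -> str:
    """Bundle every file under source_dir into one gzipped tar file."""
    src = Path(source_dir)
    # "w:gz" writes a tar archive compressed with gzip.
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores paths relative to the directory's own name
        # instead of embedding its full absolute path.
        tar.add(str(src), arcname=src.name)
    return archive_path
```

For example, make_tar_gz("run42_output", "run42_output.tar.gz") would produce a single archive that can then be copied to the PetaStore.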

Note that, if you have individual files of 10 to 100 GB each, we still strongly recommend compressing them via Zip or gzip (or other comparable methods), even if you aren’t aggregating multiple files into a single Zip or tar file.
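For a single large file, compression can be streamed in chunks so the file never has to fit in memory. This is a minimal sketch using the standard gzip and shutil modules; the function name gzip_file and the 16 MB chunk size are illustrative assumptions. Unlike the gzip command-line tool, which replaces the original file by default, this keeps the original alongside the new .gz file.

```python
import gzip
import shutil


def gzip_file(path: str, chunk_bytes: int = 16 * 1024 * 1024) -> str:
    """Compress path to path + ".gz" without loading it all into memory."""
    out_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(out_path, "wb") as dst:
        # copyfileobj streams the data in fixed-size chunks.
        shutil.copyfileobj(src, dst, chunk_bytes)
    return out_path
```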

We can help you learn how to do any of these tasks.

Note that you’ll need to create the Zip file or gzipped tar file on the source disk system where the files currently reside, NOT on the PetaStore.

That is, the files that you want to transfer to the PetaStore must ALREADY be compressed and/or converted to Zip or tar files.

Storage (Write) Speed

We’ve benchmarked file store (write) speeds into the PetaStore as follows:

  • approximately 150 MB/sec from Sooner’s fast /scratch, for a non-parallel copy (1 TB should take roughly 2 hours);
  • approximately 200 MB/sec from Sooner’s fast /scratch, for a parallel copy (1 TB should take roughly 1 1/2 hours);
  • we expect 25 - 50 MB/sec from Sooner’s slow /scratch (1 TB should take roughly 6 to 12 hours);
  • we expect 25 - 50 MB/sec from Sooner’s /home;
  • we expect 25 MB/sec via sftp or scp from outside OSCER.
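The hour figures in the list above follow from dividing the data size by the sustained rate. This is a minimal sketch assuming decimal units (1 TB = 10^6 MB) and ignoring startup overhead; the function name transfer_hours is hypothetical.

```python
# Approximate store (write) rates quoted above, in MB/sec.
RATES_MB_PER_SEC = {
    "fast /scratch, non-parallel": 150,
    "fast /scratch, parallel": 200,
    "slow /scratch (best case)": 50,
    "slow /scratch (worst case)": 25,
    "sftp/scp from outside OSCER": 25,
}


def transfer_hours(size_tb: float, rate_mb_per_sec: float) -> float:
    """Hours to move size_tb terabytes at a sustained rate (1 TB = 10**6 MB)."""
    return size_tb * 1_000_000 / rate_mb_per_sec / 3600


if __name__ == "__main__":
    for source, rate in RATES_MB_PER_SEC.items():
        print(f"{source}: {transfer_hours(1, rate):.1f} h per TB")
```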

We expect the performance on fast /scratch to improve as we learn how to optimize it; the others probably won’t improve much.

For comparison purposes, file transfers to national supercomputing centers typically run at 30–50 MB/sec.

Note that, because file stores (writes) go first to the PetaStore’s disk rather than directly to tape, your stores never wait for a tape operation to get started; the copy from disk to tape happens automatically in the background after your store (write) completes.

Retrieval (Read) Speed

We expect a file retrieval (read) to take roughly the sum of the following:

  • time spent waiting for a tape drive to become available, which will depend entirely on how many other file transfers are ongoing or waiting to begin,
    PLUS
  • time spent collecting the correct tape cartridge, inserting it into a tape drive, and positioning the tape at the appropriate place: typically 2 minutes,
    PLUS
  • read time: 2 minutes for a 10 GB file, 20 minutes for a 100 GB file,
    PLUS
  • copying the file to wherever you actually want it: see Storage (Write) Speed, above.

Other than time spent waiting to get started, total time should be:

  • 10 GB file: roughly 10 minutes;
  • 100 GB file: roughly 1 1/4 hours.
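The retrieval estimate above is a sum of four terms, which can be written out directly. This is a minimal sketch using the per-step figures from this page (2 minutes of tape handling, 2 minutes of reading per 10 GB) and decimal units; the function name retrieval_minutes is hypothetical, and the wait time is left as an input because it depends on other users' activity.

```python
TAPE_SETUP_MIN = 2.0         # fetch cartridge, load it, position the tape
TAPE_READ_MIN_PER_GB = 0.2   # 2 min per 10 GB, 20 min per 100 GB


def retrieval_minutes(size_gb: float, copy_rate_mb_per_sec: float,
                      wait_min: float = 0.0) -> float:
    """Rough end-to-end time to retrieve one file from tape.

    wait_min: time spent waiting for a free tape drive (unpredictable).
    copy_rate_mb_per_sec: rate of the final copy to its destination,
    e.g. one of the store rates under Storage (Write) Speed.
    """
    read_min = size_gb * TAPE_READ_MIN_PER_GB
    copy_min = size_gb * 1000 / copy_rate_mb_per_sec / 60
    return wait_min + TAPE_SETUP_MIN + read_min + copy_min


if __name__ == "__main__":
    print(f"10 GB at 25 MB/sec: {retrieval_minutes(10, 25):.0f} min")
    print(f"100 GB at 50 MB/sec: {retrieval_minutes(100, 50):.0f} min")
```

With the slower copy rates, a 10 GB file lands near 11 minutes and a 100 GB file between about 55 and 90 minutes, in line with the rough totals above.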

In other words, the time cost will be roughly the same as (or better than) copying the same file between two different disk systems in the same room, plus about 4 minutes, plus any wait time.

Note that, at least for the near future, your time spent waiting for a retrieval to get started should be very modest in most cases.

This is because the PetaStore has 4 tape drives and gives priority to file retrievals (reads) when there is a mix of stores and retrievals waiting to get started.
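To illustrate the read-first rule, here is a simplified model using a priority queue. It is not the PetaStore's actual scheduling software, which this page doesn't describe: it assumes all 4 drives become free at once and that requests of the same kind are served in arrival order. The names dispatch_order and first_wave are hypothetical.

```python
import heapq
from itertools import count

RETRIEVE, STORE = 0, 1   # lower number = served first
NUM_DRIVES = 4


def dispatch_order(requests):
    """Order in which queued requests would reach a tape drive.

    requests: (kind, name) tuples in arrival order, kind being
    "retrieve" or "store". Retrievals go ahead of stores; within
    each kind, earlier arrivals go first (an assumption).
    """
    arrival = count()
    heap = []
    for kind, name in requests:
        prio = RETRIEVE if kind == "retrieve" else STORE
        heapq.heappush(heap, (prio, next(arrival), name))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


def first_wave(requests, drives=NUM_DRIVES):
    """Requests that would start at once if every drive were idle."""
    return dispatch_order(requests)[:drives]


if __name__ == "__main__":
    queue = [("store", "a"), ("store", "b"), ("retrieve", "c"),
             ("store", "d"), ("retrieve", "e"), ("store", "f")]
    print(first_wave(queue))
```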

Duplication for Resiliency

You’ll be able to store your files under any of the following duplication policies, depending on what media (tape and/or disk) you buy:

  1. one copy on disk only (unsafe);
  2. one copy on disk and one copy on tape;
  3. one copy on tape only (unsafe);
  4. two copies on tape.

You can choose which of these policies to use, on a per-file or per-directory basis.

NOTE: You can only choose a particular duplication policy if you’ve purchased appropriate media.

For example, if you only buy tape cartridges, then you can only choose policy 3 or 4.

Please note that we DON’T allow multiple copies on disk, because of the severe scarcity of disk drive slots.
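The rule that each policy requires the matching media can be expressed as a subset check. This is a minimal sketch; it records only which kinds of media a policy needs, not how many cartridges or drives (policy 4, for instance, needs enough tape for two copies). The names POLICIES and allowed_policies are hypothetical.

```python
# The four duplication policies, numbered as on this page,
# with the kinds of media each one needs.
POLICIES = {
    1: ("one copy on disk only", {"disk"}),
    2: ("one copy on disk and one copy on tape", {"disk", "tape"}),
    3: ("one copy on tape only", {"tape"}),
    4: ("two copies on tape", {"tape"}),
}


def allowed_policies(purchased: set) -> list:
    """Policy numbers available to a group that has bought `purchased` media."""
    return [num for num, (_, needs) in POLICIES.items() if needs <= purchased]
```

For example, allowed_policies({"tape"}) returns [3, 4], matching the tape-only case described above.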

Duplication Use Cases

Here are some example use cases for these duplication policies:

  1. One copy on disk only:

    The file is also stored on another storage resource (perhaps a large-scale storage resource at a national supercomputing center) and you expect to need to use it often.

  2. One copy on disk and one copy on tape:

    You’re the owner of the file, it only exists on the PetaStore, and you expect to need to use it often.

  3. One copy on tape only:

    The file is also stored on another storage resource (perhaps a large-scale storage resource at a national supercomputing center) and you don’t expect to need to use it often.

  4. Two copies on tape:

    You’re the owner of the file, it only exists on the PetaStore, and you don’t expect to need to use it often.
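The four use cases above form a simple two-question decision: does another copy exist elsewhere, and will you use the file often? This is a minimal sketch that encodes only those four examples; real choices may depend on other factors, and the function name suggest_policy is hypothetical.

```python
def suggest_policy(copy_exists_elsewhere: bool, frequent_use: bool) -> int:
    """Map the example use cases onto a duplication policy number.

    copy_exists_elsewhere: another storage resource (e.g. at a national
    supercomputing center) already holds the file.
    frequent_use: you expect to read the file often.
    """
    if copy_exists_elsewhere:
        # The PetaStore copy is a second copy, so one copy suffices:
        # disk for fast access, tape otherwise.
        return 1 if frequent_use else 3
    # The PetaStore holds the only copies: keep two, at least one on tape.
    return 2 if frequent_use else 4
```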

Limitations

Please carefully read the PetaStore Use Agreement form to understand the limitations of using the PetaStore, including the kinds of files that CANNOT be stored on the PetaStore, who CANNOT use the PetaStore, and what you have to take responsibility for when using the PetaStore.

   


Copyright (C) 2002-2013 University of Oklahoma