OU Supercomputing Center for Education & Research

OSCER PetaStore Policy and Procedures

Last Update: March 6, 2012

Summary

The Oklahoma PetaStore has room for an almost arbitrarily large amount of storage.

Details

The Oklahoma PetaStore is available for the following categories of users:

  • users who are employees of, or students at, OU (Norman, Health Science Center and Schusterman Tulsa campuses);

  • users who are at other Oklahoma institutions but who aren’t collaborating with OU faculty or staff;

  • users who are at US institutions and who are collaborating on projects that have at least one Oklahoma faculty and/or staff member as Principal and/or Co-Principal Investigator(s).

Please note that we CANNOT provide access to the PetaStore to users at institutions outside the US (where the US includes both US states and US territories).

Tape

The PetaStore has room for an arbitrarily large number of tape cartridges (currently 2859 tape cartridge slots total, with only a few hundred already filled, but expandable to over 22,600 slots at no charge to you).

Each tape cartridge currently holds up to 1.5 TB (raw).

As a general rule, for tape we recommend including a contingency of 25% for growth and an additional 25% because tape cartridges are rarely filled 100%.

In addition, we recommend keeping two copies of every file, on two different pieces of media – so either buy twice as many tape cartridges, or store the second copy elsewhere than the PetaStore (for example, at one of the national supercomputing centers).

Purchases MUST be made through an approved reseller, and MUST be of approved model numbers.

WARNING: DON'T purchase ANY tape cartridges until you've consulted us first! Our warranty coverage FORBIDS the use of unapproved tape cartridge brands, model numbers and/or vendors.

We’ll be happy to help you to execute a purchase.

Current pricing can be found here.

Please also note that, because tape cartridges are classed as "materials" rather than "permanent equipment," any purchase of tape cartridge(s) with external funds (for example, a grant) is subject to Indirect Costs (IDC), also known as "Facilities & Administration."

For OU external funds, the rate can be found here. (Look for the "Organized Research" rate.)

Disk

The PetaStore has room for 600 disk drives, NOT expandable beyond 600 at all.

We already have 530 disk drives in place.

However, IBM no longer sells these disk drives, and our warranty FORBIDS the use of disk drives purchased from anyone else.

Therefore, at this time you CANNOT purchase disk drives for the PetaStore.

Purchasing Media

Contact us at:

We’ll help you coordinate with the appropriate resellers to get a quote and execute the purchase.

Details about purchasing can be found below.

Archiving vs Backups

Here, “backup” means “make copies of new and changed files frequently (incremental backup) and copies of all files infrequently (full dump).”

By contrast, “archive” means “Write Once, Read Seldom.”

The PetaStore is for ARCHIVING ONLY – NO BACKUPS.

What Data Can be Archived?

The PetaStore is intended for OPEN RESEARCH DATA.

The following categories are explicitly FORBIDDEN:

Also, if your files are subject to one or more Institutional Review Board (IRB) agreements governing human subjects research, whether OU's IRB or another institution's, then it's YOUR RESPONSIBILITY to ensure compliance.

Eligibility

Please note that, at the moment, we can accommodate only those users who fall into the categories at the top of this note.

For legal reasons, we CANNOT accommodate users at institutions outside the US.

Preparing to Use the PetaStore

Before you can use the PetaStore, you’ll need to submit a completed PetaStore Use Agreement form.

You can submit it on paper, by fax, or as a scanned image by e-mail.

Submit it to Henry Neeman as follows:

  • By e-mail (preferred):
  • By fax: 405–325–7181
  • By campus mail: 1PP 2600
  • By postal mail: 1PP 2600, 350 David L. Boren Blvd, Norman OK 73019

Once we receive your form, we’ll acknowledge it so that you know you can start moving forward.

PetaStore Training

Once we’ve received your completed PetaStore Use Agreement form, and you have your research group’s media purchased and deployed, you can book a meeting with a member of the OSCER operations team to train your group in how to use the PetaStore properly.

If you’re off campus, we may choose to do that via phone or videoconferencing instead of in person.

File Sizes

Please note that the PetaStore CANNOT be used for small files.

We strongly urge that all files stored on the PetaStore be between 10 GB and 100 GB, although in special cases you can store files as small as 1 GB.

We don’t recommend files larger than 100 GB, because of long retrieval times, but if you absolutely need to, please talk to us about it.
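As a rough illustration of the size guidance above, the following sketch sorts a file size into the ranges this section describes. The function name and the use of decimal gigabytes (1 GB = 10^9 bytes) are assumptions made for the example, not part of any PetaStore tool.

```python
GB = 10**9  # assumption: decimal gigabytes; the text does not say which unit it means


def classify_file_size(size_bytes: int) -> str:
    """Map a file size onto the ranges described in the File Sizes section."""
    if size_bytes < 1 * GB:
        return "too small: aggregate with other files first"
    if size_bytes < 10 * GB:
        return "acceptable in special cases"
    if size_bytes <= 100 * GB:
        return "recommended"
    return "not recommended: talk to us first"


if __name__ == "__main__":
    for size in (200 * 10**6, 5 * GB, 40 * GB, 250 * GB):
        print(f"{size / GB:8.1f} GB -> {classify_file_size(size)}")
```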

If you have many small files (under 1 GB each), then you can aggregate them into a Zip file or a gzipped tar file (the Unix equivalent of a Zip file), in order to achieve the minimum acceptable file size for the PetaStore.

Note that, if you have individual files of 10 to 100 GB each, we still strongly recommend compressing them via Zip or gzip (or other comparable methods), even if you aren’t aggregating multiple files into a single Zip or tar file.

We can help you learn how to do any of these tasks.

Note that you’ll need to create the Zip file or gzipped tar file on the source disk system where the files are originally, NOT on the PetaStore.

That is, the files that you want to transfer to the PetaStore must ALREADY be compressed and/or converted to Zip or tar files.
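One possible way to do the aggregation described above is with Python's standard tarfile module. This is a minimal sketch, not an official OSCER procedure: the function name and the example paths are hypothetical, and it should be run on the source disk system, with the archive written outside the directory being packed.

```python
import tarfile
from pathlib import Path


def bundle_directory(source_dir: str, archive_path: str) -> int:
    """Pack everything under source_dir into one gzipped tar file.

    Returns the size of the resulting archive in bytes, so that it can be
    compared against the recommended file size range.
    """
    source = Path(source_dir)
    # Mode "w:gz" writes a tar archive compressed with gzip.
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname stores paths relative to the directory name rather than
        # the full absolute path on the source system.
        archive.add(source, arcname=source.name)
    return Path(archive_path).stat().st_size


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    size = bundle_directory("results/run01", "run01.tar.gz")
    print(f"archive size: {size / 10**9:.2f} GB")
```

Checking the returned size before transferring makes it easy to see whether several directories should be combined into one archive to reach the recommended range.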

Store (Write) Speed

We’ve benchmarked file store (write) speeds into the PetaStore as follows:

  • approximately 150 MB/sec from Sooner’s fast /scratch, for a non-parallel copy (1 TB should be roughly 2 hours);
  • approximately 200 MB/sec from Sooner’s fast /scratch, for a parallel copy (1 TB should be roughly 1 1/2 hours);
  • we expect 25 - 50 MB/sec from Sooner’s slow /scratch (1 TB should be roughly 6 to 12 hours);
  • we expect 25 - 50 MB/sec from Sooner’s /home;
  • we expect 25 MB/sec via sftp or scp from outside OSCER.

We expect the performance on fast /scratch to improve as we learn how to optimize it; the others probably won’t improve much.

For comparison purposes, file transfers to national supercomputing centers typically run at 30–50 MB/sec.

Note that, because file stores (writes) go to the PetaStore disk rather than directly to tape, your file stores (writes) won’t spend any time waiting for your store-to-tape operation to get started; instead, that happens invisibly under the covers after your file store (write) completes, typically within about 2 hours.
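The hour figures quoted in the list above follow directly from the transfer rates. The sketch below reproduces that arithmetic, assuming decimal units (1 TB = 1,000,000 MB); the function name is made up for the example.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units


def store_hours(size_tb: float, rate_mb_per_sec: float) -> float:
    """Hours needed to write size_tb terabytes at a sustained rate."""
    seconds = size_tb * MB_PER_TB / rate_mb_per_sec
    return seconds / 3600


if __name__ == "__main__":
    rates = [
        ("fast /scratch, non-parallel", 150),
        ("fast /scratch, parallel", 200),
        ("slow /scratch, upper rate", 50),
        ("slow /scratch, lower rate", 25),
    ]
    for label, rate in rates:
        print(f"{label:30s} {store_hours(1.0, rate):5.1f} hours per TB")
```

These are estimates for a sustained rate only; actual transfers will vary with load on the source file system and the network.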

Retrieve (Read) Speed

We anticipate file retrieval (read) times to be roughly the sum of the following:

  • time spent waiting for a tape drive to become available, which will depend entirely on how many other file transfers are ongoing or waiting to begin,
    PLUS
  • time spent collecting the correct tape cartridge, inserting it into a tape drive, and rewinding to the appropriate place in the tape cartridge: typically 2 minutes,
    PLUS
  • read time: 2 minutes for a 10 GB file, 20 minutes for a 100 GB file,
    PLUS
  • copying back to wherever you actually want to transfer the file to: see Store (Write) Speed, above.

Other than time spent waiting to get started, total time should be:

  • 10 GB file: roughly 10 minutes;
  • 100 GB file: roughly 1 1/4 hours.

In other words, compared to copying the same file between two different disk systems in the same room, the time cost will be roughly the same, or better, plus the tape overhead (about 4 minutes for a 10 GB file, about 22 minutes for a 100 GB file), plus wait time.

Note that, at least for the near future, your time spent waiting for a retrieval to get started should be very modest in most cases.

This is because the PetaStore has 4 tape drives and gives priority to file retrievals (reads) when there is a mix of stores and retrievals waiting to get started.
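The retrieval estimate in this section is a sum of components, which can be sketched as follows. The constants come from the figures quoted above (about 2 minutes to fetch and position a tape, about 2 minutes of read time per 10 GB); the copy-back rate is whatever applies to your destination, and the function name and decimal units are assumptions for the example.

```python
MOUNT_MINUTES = 2.0        # fetch the cartridge, load it, and position the tape
READ_MINUTES_PER_GB = 0.2  # 2 minutes per 10 GB, per the figures above


def retrieval_minutes(size_gb: float,
                      copy_rate_mb_per_sec: float,
                      wait_minutes: float = 0.0) -> float:
    """Estimated minutes to retrieve one file of size_gb gigabytes."""
    read = size_gb * READ_MINUTES_PER_GB
    # Copy-back time, assuming 1 GB = 1000 MB.
    copy = size_gb * 1000 / copy_rate_mb_per_sec / 60
    return wait_minutes + MOUNT_MINUTES + read + copy


if __name__ == "__main__":
    for size_gb in (10, 100):
        slow = retrieval_minutes(size_gb, 25)
        fast = retrieval_minutes(size_gb, 50)
        print(f"{size_gb:4d} GB: {fast:5.1f} to {slow:5.1f} minutes, plus wait time")
```

With copy-back rates of 25 to 50 MB/sec, the results bracket the rough totals given above for 10 GB and 100 GB files.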

Duplication for Resiliency

You’ll be able to store your files under any of the following duplication policies, depending on what media (tape and/or disk) you buy:

  1. one copy on disk only (unsafe);
  2. one copy on disk and one copy on tape;
  3. one copy on tape only (unsafe);
  4. two copies on tape.

You can choose which of these policies to use, on a per-file or per-directory basis.

NOTE: You can only choose a particular duplication policy if you’ve purchased appropriate media.

For example, if you only own tape cartridges but not disk drives, then you can only choose policy 3 or 4.

Please note that we DON’T allow multiple copies on disk, because of the severe scarcity and non-expandability of disk capacity.
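The relationship between the media you own and the policies you can choose can be written down as a small lookup. This is only an illustrative model of the rule stated above; the enum, the table, and the function name are hypothetical and are not how the PetaStore software is configured.

```python
from enum import Enum


class Policy(Enum):
    DISK_ONLY = "one copy on disk only"
    DISK_AND_TAPE = "one copy on disk and one copy on tape"
    TAPE_ONLY = "one copy on tape only"
    TWO_TAPE = "two copies on tape"


# Media that must have been purchased before a policy can be chosen.
REQUIRED_MEDIA = {
    Policy.DISK_ONLY: {"disk"},
    Policy.DISK_AND_TAPE: {"disk", "tape"},
    Policy.TAPE_ONLY: {"tape"},
    Policy.TWO_TAPE: {"tape"},
}


def allowed_policies(owned_media: set) -> list:
    """Return the policies available to a group owning the given media."""
    return [p for p in Policy if REQUIRED_MEDIA[p] <= owned_media]


if __name__ == "__main__":
    for policy in allowed_policies({"tape"}):
        print(policy.value)
```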

Duplication Use Cases

Here are some example use cases for these duplication policies:

  1. One copy on disk only:

    The file is also stored on another storage resource (perhaps a large scale storage resource at a national supercomputing center) and you expect to need to use it often.

  2. One copy on disk and one copy on tape:

    You’re the owner of the file, it only exists on the PetaStore, and you expect to need to use it often.

  3. One copy on tape only:

    The file is also stored on another storage resource (perhaps a large scale storage resource at a national supercomputing center) and you don’t expect to need to use it often.

  4. Two copies on tape:

    You’re the owner of the file, it only exists on the PetaStore, and you don’t expect to need to use it often.
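The four use cases above reduce to two yes/no questions: whether the file also exists on another storage resource, and whether you expect to use it often. A minimal sketch of that decision, with a hypothetical function name, might look like this:

```python
def suggest_policy(stored_elsewhere: bool, used_often: bool) -> str:
    """Pick a duplication policy following the use cases listed above."""
    if stored_elsewhere:
        # Another copy exists, so a single PetaStore copy is enough.
        if used_often:
            return "one copy on disk only"
        return "one copy on tape only"
    # The PetaStore holds the only copy, so keep two copies there.
    if used_often:
        return "one copy on disk and one copy on tape"
    return "two copies on tape"


if __name__ == "__main__":
    print(suggest_policy(stored_elsewhere=False, used_often=False))
```

The suggestion still has to be checked against the media your group actually owns.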

Limitations

Please carefully read the PetaStore Use Agreement form to understand the limitations of using the PetaStore, including the kinds of files that CANNOT be stored on the PetaStore, who CANNOT use the PetaStore, and what you have to take responsibility for when using the PetaStore.

Purchasing Media: Details

Please recall that tape cartridges purchased with research grant funds typically are subject to Indirect Costs (also known as “Facilities & Administration”).

OU's current IDC rate can be found here.
Look for the current On Campus Organized Research rate.

Each LTO-5 tape cartridge has a raw capacity of 1.5 TB, but we recommend budgeting as if each had just 1 TB of useable space, because of a combination of:

  • contingency, because we can't guarantee that any particular tape cartridge will be 100% filled;
  • unanticipated growth of your datasets.

In addition, you'll want to think about whether your files will, over the long term, reside only on the PetaStore, or also on some other long term archive.

If these files will be stored only on the PetaStore, then we urge you to double the number of tape cartridges, so that you can keep two complete copies of all files, on separate tape cartridges.

(Tape breakage isn't an everyday occurrence, but it happens often enough to be worth planning for.)
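The budgeting advice above can be turned into a quick calculation: plan on 1 TB of useable space per cartridge, and double the count if the PetaStore will hold the only copies. The sketch below assumes exactly those two rules; the function name and the sample dataset size are made up for the example.

```python
import math

USABLE_TB_PER_CARTRIDGE = 1.0  # budgeting figure from the text; raw capacity is 1.5 TB


def cartridges_needed(data_tb: float, copies: int = 2) -> int:
    """Number of tape cartridges to budget for data_tb terabytes of files."""
    per_copy = math.ceil(data_tb / USABLE_TB_PER_CARTRIDGE)
    return per_copy * copies


if __name__ == "__main__":
    data_tb = 12.3  # hypothetical dataset size
    print("two copies on the PetaStore:", cartridges_needed(data_tb))
    print("second copy kept elsewhere: ", cartridges_needed(data_tb, copies=1))
```

Each copy is rounded up separately so that the two copies never have to share a cartridge.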

For comparison purposes, the data currently stored on the PetaStore as dual copies amounts to more than 4 times the data stored as single copies.

At the moment, the following items are listed on the website of our approved reseller:

  • 1-Pack IBM LTO-5 tape cartridge
    IBM part #: 46X1290 | CDW part #: 2121871
    webpage with pricing
  • 5-Pack IBM LTO-5 tape cartridge
    IBM part #: 46X1290-5PK | CDW part #: 2525021
    webpage with pricing
  • 20-Pack IBM LTO-5 tape cartridge
    IBM part #: 46X1290-20PK | CDW part #: 2525022
    webpage with pricing

However, if you're at OU, your financial person can also order these items directly through Crimson Corner (the internal OU purchasing system), which will get you something like a 10% discount, although the pack size with the lowest price per tape cartridge may differ from the one on the website.

Here are the internal Crimson Corner item numbers:

  • CDW#2121871 for individual tape cartridges
  • CDW#2525021 for 5-packs
  • CDW#2525022 for 20-packs

Have your unit's financial person check what the current pricing is (it tends to fluctuate).

For budgeting purposes (in a grant proposal or whatever), it's probably wiser to use the website pricing.

Once tape cartridges are purchased, there AREN'T any recurring charges (for example, NO monthly service charge).

But tape cartridges have a (hoped for) lifetime of about 10 years, after which you'll need to buy new tape cartridges.

Based on the current rate of tape cartridge capacity increase (roughly 2/3 increase per 2 years), and assuming a 3% inflation rate per year and no change in the IDC rate, the cost per TB in 10 years will be roughly 10% of today's cost.

Obviously, there's no way to know (let alone guarantee) that these rates will remain constant, so consider these numbers to be best guess estimates.
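The projection above can be reproduced with a short calculation. This sketch assumes that the price of a single cartridge rises only with inflation, that capacity per cartridge compounds by 2/3 every 2 years, and that the IDC rate is unchanged (so it cancels out of the ratio). The function name and the compounding convention are assumptions of the example.

```python
def relative_cost_per_tb(years: float,
                         capacity_growth_per_2yr: float = 2 / 3,
                         inflation_per_year: float = 0.03) -> float:
    """Future cost per TB as a fraction of today's cost per TB."""
    # Capacity per cartridge compounds once every 2 years.
    capacity_factor = (1 + capacity_growth_per_2yr) ** (years / 2)
    # Cartridge price is assumed to track inflation, compounded annually.
    price_factor = (1 + inflation_per_year) ** years
    return price_factor / capacity_factor


if __name__ == "__main__":
    ratio = relative_cost_per_tb(10)
    print(f"cost per TB in 10 years: {ratio:.1%} of today's cost")
```

Under these assumptions the result lands at roughly a tenth of today's cost per TB; other compounding conventions or growth rates move the figure somewhat.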

We've also ordered 3 LTO-6 tape drives, so we'll be able to offer LTO-6 soon.

However, we currently AREN'T recommending LTO-6, because the price per TB is still higher than for LTO-5.

Here's LTO-6 pricing, just for comparison:

  • 1-Pack IBM LTO-6 tape cartridge
    IBM part #: 00V7590 | CDW part #: 2890034
    webpage with pricing
  • 5-Pack IBM LTO-6 tape cartridge
    IBM part #: 35P1902 | CDW part #: 3031155
    webpage with pricing
  • 20-Pack IBM LTO-6 tape cartridge
    IBM part #: 00V7594 | CDW part #: 2890036
    webpage with pricing
   


Copyright (C) 2002-2013 University of Oklahoma