OU Supercomputing Center for Education & Research
University of Oklahoma   OSCER   OU IT

 

Accessing OSCER's Tape Archive

Questions about the tape archive or how to use it? Contact us!

The OSCER tape archive is now online.

This tape archive replaces the /data disk partition.

Access to the tape archive is still somewhat primitive, but we plan to improve the access software so that using the tape archive becomes both simpler and more powerful.

Currently, the only way to access the tape archive is using sftp (Secure File Transfer Protocol).

On all OSCER systems (topdawg.oscer.ou.edu, schooner.oscer.ou.edu, condor.oscer.ou.edu) the sftp command to use is:

    sftp archive

So, you can log in to topdawg.oscer.ou.edu or schooner.oscer.ou.edu, change to the directory that holds your files (for example, /scratch/yourusername), and run the command above to start copying files to the tape archive:

    cd /scratch/yourusername
    sftp archive

NOTES:

  • When you sftp into the tape archive, you currently land in the directory one level above your own directory, so if you run an ls command to list the contents of your current working directory, you'll see the list of OSCER users.

    Therefore, the first command that you should type when you sftp into the tape archive is:

        cd yourusername

    (Replace yourusername with your user name.)
     
  • We encourage you to set your directory permissions as tightly as you can. So, the FIRST TIME you log in, after you run the cd command above, run this:

        chmod 700 .

    NOTICE THE PERIOD AT THE END OF THE COMMAND -- IT'S CRUCIAL!!!

    This command means: "Set the permissions on my tape archive directory so that I can read, write and go into my files and directories, but nobody else can do anything with my files and directories."

    If you also want your files to be accessible by members of your user group, you can do:

        chmod 750 .

    NOTICE THE PERIOD AT THE END OF THE COMMAND -- IT'S CRUCIAL!!!

    This command means: "Set the permissions on my tape archive directory so that I can read, write and go into my files and directories, and members of my user group can read and go into my files and directories, but nobody else can do anything with my files and directories."
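
    As a rough illustration of what these two modes mean, the following sketch applies them to a throwaway local directory (not to the tape archive) and prints the resulting permission strings. The temporary directory and the variable names are made up for this example.

    ```shell
    # Create a scratch directory to experiment on; nothing here touches the archive.
    demo_dir=$(mktemp -d)

    chmod 700 "$demo_dir"
    mode_700=$(ls -ld "$demo_dir" | cut -c1-10)   # first 10 characters: type + permissions

    chmod 750 "$demo_dir"
    mode_750=$(ls -ld "$demo_dir" | cut -c1-10)

    echo "700 -> $mode_700"
    echo "750 -> $mode_750"

    rmdir "$demo_dir"
    ```

    The first string should be drwx------ (owner only), and the second drwxr-x--- (the group can also read and enter).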
     
  • You can get a list of Unix-like commands that can be used within sftp by typing

        help

    at the sftp prompt.
     
  • Note that when you want to get files from the tape archive, there may be a delay of several minutes or more before your files become available, because the tape that contains your files has to be selected by the tape robot, placed into one of the tape drives, and then fast forwarded or rewound to the appropriate place on the tape. (Each tape is 400 GB, so this can take a while.)

    So, please be patient when using the tape archive.

    Also, if many people try to access tapes at the same time, the delays will be even longer.
     
  • If you had a /data/yourusername directory on topdawg.oscer.ou.edu (if you don't know whether you had one, then you didn't have one), then your files that were on /data/yourusername have already been moved to the tape archive.
     
  • If you have files parked on /scratch/yourusername, because you couldn't find a suitable place to park them elsewhere, then please move those files to the tape archive.
     
  • When storing files to the tape archive, and especially when retrieving them from the tape archive later, it's much faster to work with a few large files than with many small files.
     
    You can accomplish this by creating one or a few "tar" files. How to create a tar file is explained below.
     
    Suppose that you have many files that together consume a lot of disk space, but that many of these files are individually quite small.

    For example, suppose that the files had a total size of 10 GB, but most of the files were about 1 MB each.
     
    That would mean that you had approximately 10,000 files.
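
    To check whether one of your own directories fits this pattern, a sketch like the following counts the files and totals their size. The function names and the sample directory are hypothetical; substitute a real directory of yours.

    ```shell
    # Count the regular files under a directory (recursively).
    count_files() {
        find "$1" -type f | wc -l | tr -d ' '
    }

    # Total disk usage of a directory, in kilobytes.
    total_kb() {
        du -sk "$1" | cut -f1
    }

    # Tiny self-contained demonstration on a temporary directory.
    sample_dir=$(mktemp -d)
    for i in 1 2 3; do
        echo "data $i" > "$sample_dir/file$i.txt"
    done

    echo "files: $(count_files "$sample_dir")"
    echo "size (KB): $(total_kb "$sample_dir")"

    rm -rf "$sample_dir"
    ```

    A large file count combined with a small average file size suggests bundling the files into a tar file before archiving them.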
     
    You now have two choices:
     
    1. Save each of the individual small files to the tape archive.
       
    2. Create a tar file consisting of all of the files (or a big subset of them), and save the tar file to the tape archive.
       
    (If you aren't familiar with tar files, they're like zip files in Windows: one file can contain many smaller files, and even a directory structure, inside it. More on this below.)
     
    When you save a file to the tape archive, here's what happens:
     
    1. When you transfer your file(s) with sftp, they are actually written to a disk that acts as a cache for the tape archive.
       
    2. At some point after you save your file(s) to that disk cache, the tape archive software automatically copies your file(s) out to tape, and then erases your file(s) from the disk cache.
       
    Therefore, saving a file to the tape archive is typically quite fast, because disk is much faster than tape.
     
    When you want to retrieve a file from the tape archive, here's what happens:
     
    1. The tape archive determines which tape cartridge contains the file that you want to retrieve, and which storage slot that tape cartridge is stored in.
       
    2. If all of our tape drives are in use, then the tape archive waits patiently for your tape cartridge's turn.
       
    3. The tape archive robot pulls that tape cartridge out of its storage slot.
       
    4. The robot carries the tape cartridge to the tape drive that's ready for your tape cartridge.
       
    5. The robot inserts the tape cartridge into that tape drive.
       
    6. The tape drive winds the tape to the place where your file is stored.
       
    7. The tape drive reads your file and copies it to the disk cache.
       
    8. The tape drive ejects the tape cartridge.
       
    9. The robot removes the tape cartridge from the tape drive.
       
    10. The robot carries the tape cartridge back to its storage slot.
       
    As you can imagine, this can take a lot of time.
     
    Now, suppose that you have one big tar file containing all of your 10,000 little files.
     
    Then this procedure will have to be performed only once.
     
    On the other hand, suppose that you've saved each of the 10,000 files individually to the tape archive.
     
    Then this procedure will have to be performed as many as 10,000 times, because there's no guarantee that the individual files are all stored on the same tape.
     
    That's bad.
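
    To put rough numbers on this, here is a back-of-the-envelope sketch. The 5 minutes per retrieval cycle is purely an assumed figure for illustration (the text above says only "several minutes on up"), and the worst case assumes that every file needs its own cycle.

    ```shell
    minutes_per_cycle=5      # ASSUMPTION: illustrative only, not a measured value
    file_count=10000

    one_tar_minutes=$(( 1 * minutes_per_cycle ))
    worst_case_minutes=$(( file_count * minutes_per_cycle ))
    worst_case_days=$(( worst_case_minutes / 60 / 24 ))   # integer division, rounds down

    echo "one tar file:       about $one_tar_minutes minutes"
    echo "10,000 small files: up to $worst_case_minutes minutes (over $worst_case_days days)"
    ```

    Under that assumption, the single tar file comes back in about 5 minutes, while the individual files could take up to 50,000 minutes, which is more than 34 days.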
     
    HOW TO CREATE A TAR FILE
     
    Here's how to create a tar file:
     
        tar zcvf DirectoryName_Date.tgz DirectoryName
     
    For example, suppose that, in my home directory /home/yourusername, I have a subdirectory named TestSymposium2004, and suppose that I want to create a tar file of TestSymposium2004, on Jan 9 2008, and have that tar file reside in my scratch directory, /scratch/yourusername (which is a great idea).
     
    Then the tar command would be:
     
        cd /home/yourusername
        tar zcvf /scratch/yourusername/TestSymposium2004_20080109.tgz TestSymposium2004
     
    Note that "z" means "gzip" (that is, compress to a smaller size, much like zipping), "c" means "create," "v" means "verbose" (tell me what's happening as it happens), and "f" means "the next thing on this command line is the name of the tar file."
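
    If you'd like to try these flags safely first, the following sketch builds a tiny tar file in a temporary directory and then lists its contents with "t" (for "list the table of contents"), which is a standard tar option not covered above. All directory and file names here are made up.

    ```shell
    work_dir=$(mktemp -d)
    mkdir "$work_dir/DemoDir"
    echo "first"  > "$work_dir/DemoDir/a.txt"
    echo "second" > "$work_dir/DemoDir/b.txt"

    # Create the compressed tar file ("v" omitted to keep the output short).
    (cd "$work_dir" && tar zcf DemoDir_demo.tgz DemoDir)

    # List what is inside without extracting anything.
    listing=$(tar ztf "$work_dir/DemoDir_demo.tgz")
    echo "$listing"
    ```

    Listing a tar file this way is a cheap check that it contains what you expect before you send it to the tape archive.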
     
    Then, the only file that I would want to save to the tape archive would be:
     
        /scratch/yourusername/TestSymposium2004_20080109.tgz
     
    For example, the commands I'd use would be:
     
        cd /scratch/yourusername
        sftp archive
        cd yourusername
        put TestSymposium2004_20080109.tgz
        quit
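
    The same steps can probably be scripted with sftp's batch mode. This is only a sketch: it assumes that the sftp on OSCER systems is OpenSSH's (which accepts -b batchfile) and that non-interactive login to the archive is set up, neither of which is stated above. The batch file is a made-up temporary file, and the actual transfer line is left commented out.

    ```shell
    batch_file=$(mktemp)

    # One sftp command per line, in the same order as the interactive session.
    printf '%s\n' \
        'cd yourusername' \
        'put TestSymposium2004_20080109.tgz' \
        > "$batch_file"

    cat "$batch_file"

    # Uncomment to run for real from /scratch/yourusername (ASSUMPTION: OpenSSH sftp):
    # sftp -b "$batch_file" archive
    ```

    In batch mode, sftp ends the session by itself when it reaches the end of the batch file.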
     
    And, once I'd saved the tar file to the tape archive, I could delete the tar file from /scratch/yourusername:
     
        rm /scratch/yourusername/TestSymposium2004_20080109.tgz
     
    And, when I wanted the contents of TestSymposium2004_20080109.tgz, then I'd retrieve it from the tape archive, probably into my scratch directory /scratch/yourusername, and then extract the individual files from the tar file.
     
    Here's how I'd retrieve the tar file:
     
        cd /scratch/yourusername
        sftp archive
        cd yourusername
        get TestSymposium2004_20080109.tgz
        quit
     
    And here's how I'd extract the contents, creating a subdirectory named TestSymposium2004 under /scratch/yourusername:
     
        cd /scratch/yourusername
        tar zxvf /scratch/yourusername/TestSymposium2004_20080109.tgz
     
    Notice that the only differences between this tar file extraction command and the tar file creation command (above) are:
     
    1. "x" (for "extract") replaces "c" (for "create"), and
       
    2. we don't bother to say the name of the directory that's stored inside the tar file, because that name itself is stored inside the tar file.
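
    The second point can be seen in a small local round trip. This sketch (with made-up names, run in temporary directories rather than your real scratch space) creates a tar file in one place and extracts it somewhere else without ever naming the directory.

    ```shell
    src_dir=$(mktemp -d)
    dest_dir=$(mktemp -d)

    mkdir "$src_dir/DemoDir"
    echo "payload" > "$src_dir/DemoDir/notes.txt"

    # Create: the directory name is given explicitly.
    (cd "$src_dir" && tar zcf "$dest_dir/DemoDir_demo.tgz" DemoDir)

    # Extract: no directory name; tar recreates DemoDir from the names stored in the tar file.
    (cd "$dest_dir" && tar zxf DemoDir_demo.tgz)

    restored=$(cat "$dest_dir/DemoDir/notes.txt")
    echo "restored contents: $restored"
    ```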

     

Copyright (C) 2004-2007 OU Supercomputing Center for Education & Research