Where to Store Your Data on Spartan
Many HPC jobs work with large datasets. There are a number of places to store data on Spartan, and different ways to get data in and out. Some directories have a faster interconnect than others. If a job performs significant I/O on a large dataset, a slow connection between the compute device and the storage device will affect job performance. In all but the smallest jobs, it is best to keep data close to compute (physically, and with a fast connection).
/home, /data/gpfs/projects and /data/scratch are all network-based storage that can be accessed by multiple nodes and processes at the same time across the whole of Spartan. Take care that you don't inadvertently write to the same file from multiple jobs at the same time.
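One way to avoid two jobs writing to the same file is to include the job ID in each output file name. The following is a minimal sketch, assuming jobs run under Slurm, which sets the SLURM_JOB_ID environment variable inside a job. The function name job_output_path, the results_ prefix and the "interactive" fallback are illustrative choices, not part of Spartan.

```shell
#!/bin/bash
# Build an output path that is unique to the current job.
# SLURM_JOB_ID is set by Slurm inside a running job (assumption: Slurm is the
# scheduler). Outside a job, the hypothetical fallback "interactive" is used.
job_output_path() {
    dir="$1"
    job_id="${SLURM_JOB_ID:-interactive}"
    printf '%s/results_%s.txt\n' "$dir" "$job_id"
}

# Hypothetical usage inside a job script:
# my_program > "$(job_output_path /data/gpfs/projects/myproject)"
```

Because each job receives a different ID, concurrent jobs that share a directory end up with separate output files.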
While it's often essential to have fast nearby storage while working on your data, please don't use Spartan as a long-term data repository. It's not designed for that, and may not conform to the requirements set by your institution or funding body.
A user's home directory, i.e. /home/$username, can be used to store small amounts of data; however, this is generally discouraged. It's best suited to short-lived and non-critical data that is only for that user. Others in a project won't have access to this data, and it is limited to 50GB of storage. Check home directory quota and usage with the command:
[new-user@spartan ~]$ check_home_usage
new-user has used 4GB out of 50GB in /home/new-user
Your projects directory is the best place to store research data while you're working on it. It's located at /data/gpfs/projects/$projectID. The projects directory is backed up nightly, but does not have a snapshot ability.
All members of a project can access this datastore, and 500 GB of storage is available per project. If more storage is required, contact the helpdesk. In general, for University of Melbourne users, 1 TB of project storage is available upon request, and up to 10 TB is possible if needed. Project storage beyond 10 TB will generally require some sort of co-investment.
Check project quota and usage with the command:
[new-user@spartan ~]$ check_project_usage
myproject has used 3997GB out of 8000GB in /data/gpfs/projects/myproject
myproject1 has used 265GB out of 500GB in /data/gpfs/projects/myproject1
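When a quota is nearly full, it can help to see which subdirectories hold the most data. The sketch below uses only the standard du, sort and head utilities. The function name largest_subdirs and the example path are illustrative assumptions, not Spartan tools.

```shell
#!/bin/bash
# List the largest entries directly under a directory, biggest first.
# du reports sizes in kilobytes (-k), one summary line per entry (-s).
# Hidden entries are not included, because the * glob does not match
# names that start with a dot.
largest_subdirs() {
    dir="$1"
    count="${2:-10}"
    du -sk "$dir"/* | sort -rn | head -n "$count"
}

# Hypothetical usage with a placeholder project name:
# largest_subdirs /data/gpfs/projects/myproject 5
```

Each output line contains a size in kilobytes followed by a path, so the heaviest directories appear at the top.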
Temporary working data can be stored at /tmp while your job is running. This maps to a directory on the faster scratch network storage that is specific to the job ID, and is cleaned up once the job is complete. It's also possible to write directly to /data/scratch/projects/$projectID, which is shared across multiple nodes. Note that the scratch directory is not backed up, and does not have a snapshot ability.
Files in the /data/scratch directory have a nominal lifetime of 60 days. Files can be deleted to ensure that /data/scratch does not fill up.
If you wish to use /data/scratch, please submit a request for a scratch filesystem directory.
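Given the nominal 60-day lifetime of scratch files, it may be useful to check which of your files are past that age. The sketch below uses the standard find utility and assumes that age is judged by last modification time, which the text above does not specify. The function name list_old_files is hypothetical.

```shell
#!/bin/bash
# Print regular files under a directory that were last modified more than
# a given number of days ago (default 60). find counts age in whole
# 24-hour periods, so "+60" matches files at least 61 days old.
list_old_files() {
    dir="$1"
    days="${2:-60}"
    find "$dir" -type f -mtime +"$days" -print
}

# Hypothetical usage; replace the path with your own scratch directory:
# list_old_files "/data/scratch/projects/$projectID" 60
```

Anything the function lists that is still needed is a candidate for copying to project storage or another backed-up location.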
Spartan has some commonly used datasets set up in a shared location so you don't need to store them in your own area.
To access them, please join the Software group using Karaage (go to Karaage, then click Software Agreements->Add software).
Connectome: The Human Connectome HCP dataset is stored on Spartan in
Imagenet: The Imagenet blurred dataset is stored on Spartan in
Local disk is typically faster than shared disks. Spartan has /home for home directories (slower), /data/scratch/projects/$projectID for temporary data (faster), and, as local disk, /var/local/tmp (fastest, not shared). To use the latter, you will need to copy data between these locations within a job script.
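The copy between shared storage and local disk usually follows a stage-in, compute, stage-out pattern. The following is a minimal sketch of that pattern. The function name stage_and_run, the directory layout, the .dat file extension and the line-counting "computation" are all placeholders for illustration; a real job script would substitute its own paths and program.

```shell
#!/bin/bash
# Sketch of a stage-in / compute / stage-out pattern for node-local disk.
# All paths and the processing step are placeholders for illustration.
stage_and_run() {
    input_dir="$1"    # shared storage holding the input, e.g. a project directory
    local_root="$2"   # node-local disk, e.g. /var/local/tmp
    output_dir="$3"   # shared storage that should receive the results

    # Create a private working directory on the local disk.
    work_dir=$(mktemp -d "${local_root}/job_XXXXXX") || return 1

    # Stage in: copy the input data next to the compute.
    cp -r "${input_dir}/." "${work_dir}/"

    # Placeholder for the real computation: count lines across all .dat files.
    mkdir -p "${work_dir}/results"
    cat "${work_dir}"/*.dat | wc -l > "${work_dir}/results/line_count.txt"

    # Stage out: copy results back to shared storage, then clean up.
    mkdir -p "${output_dir}"
    cp -r "${work_dir}/results/." "${output_dir}/"
    rm -rf "${work_dir}"
}

# Hypothetical usage inside a job script:
# stage_and_run "/data/gpfs/projects/$projectID/input" /var/local/tmp \
#     "/data/gpfs/projects/$projectID/output"
```

Removing the working directory at the end matters because local disk is not shared, so leftover files are not reachable from other nodes and only take up space.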
How to Transfer Data In and Out of Spartan
Secure Copy (scp)
The scp command can be used to move data from your local machine to Spartan. For example, to move local.dat from the current working directory on a local machine to a project directory on Spartan:
$ scp local.dat username@spartan.hpc.unimelb.edu.au:/data/gpfs/projects/myproject/remote.dat
Files can be transferred from Spartan to a local machine by reversing the order of the arguments:
$ scp username@spartan.hpc.unimelb.edu.au:/data/gpfs/projects/myproject/remote.dat local.dat
Entire directories can be copied with the -r flag:
$ scp -r username@spartan.hpc.unimelb.edu.au:/data/gpfs/projects/myproject/remotedir/ .
These examples assume the user is on the local machine. One cannot normally initiate a copy to a local system from Spartan, as local systems typically do not have a public fixed IP address.
For Windows users, PuTTY provides an equivalent tool called pscp. If data is located on a remote machine, SSH into that system first, and then use scp from that machine to transfer data to Spartan.
Repeatedly transferring large files in and out of Spartan via scp can be inefficient. A good alternative is rsync, which only transfers the parts that have changed. It can work on single files or whole directories, and the syntax is much the same as for scp:
$ rsync local.dat username@spartan.hpc.unimelb.edu.au:/data/gpfs/projects/myproject/remote.dat
Note that the first argument is the source, and the second is the destination, which will be modified to match the source.
The rsync application can also copy directories, and can protect the destination by skipping files that have been modified more recently there, so that they are not overwritten:
$ rsync -avz --update source/ username@remotemachine:/path/to/destination
To force a destination to synchronise exactly with the source, use the --delete flag, which removes files at the destination that no longer exist in the source. Try this with the --dry-run option first!
$ rsync -avz --delete source/ username@remotemachine:/path/to/destination
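Because --delete removes files at the destination, it is worth previewing the result before running it for real. The sketch below wraps the same rsync options in a small function that adds --dry-run, so rsync only reports what it would transfer or delete. The function name preview_sync is hypothetical, and the paths in the usage comment are the same placeholders as in the command above.

```shell
#!/bin/bash
# Preview a mirroring sync without changing anything at the destination.
#   -a         archive mode (recursive, preserves permissions and times)
#   -v         verbose, so planned transfers and deletions are listed
#   -z         compress data in transit
#   --delete   remove destination files that are absent from the source
#   --dry-run  report the planned changes only; nothing is copied or deleted
preview_sync() {
    src="$1"
    dest="$2"
    rsync -avz --delete --dry-run "$src" "$dest"
}

# Hypothetical usage with placeholder names:
# preview_sync source/ username@remotemachine:/path/to/destination
```

If the listed changes look right, running the same rsync command without --dry-run applies them.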
Research Computing Services provides a data management service utilising the Mediaflux platform. This platform provides a persistent location for research data and metadata. To aid integration between Mediaflux and Spartan, Java clients are available on Spartan, allowing data to be downloaded from and uploaded to Mediaflux. Details on Mediaflux integration with Spartan can be found in the Mediaflux support wiki.
Research Computing Services provides an object storage service with an S3-compatible layer. Data can be archived from Spartan to this service, and retrieved later for analysis. For more information, please see our wiki.
Data and Storage Solutions Beyond Spartan
The University offers a range of other data storage and management solutions to meet your needs, beyond the short-term storage available on Spartan.
In some cases it's possible to integrate these resources with your account on Spartan to streamline workflows. Get in touch.