
Chances are you need to run your HPC job against a dataset, perhaps quite a sizable one. There are a number of places to store data on Spartan while you're working with it, and ways to get data in and out.

Where to Store Your Data on Spartan

Note that all the directories below are shared across the whole of Spartan - login node, control node and all worker nodes.

Home Directory

Your home directory, i.e. /home/yourusername, can be used to store small amounts of data, but this is generally discouraged. It's best suited to short-lived and non-critical data, for example while working through our getting started tutorial or testing out new software.

Others in your project won't have access, and you're limited to 50GB of storage. You can check your quota and usage with the command check_home_usage, for example:

[new-user@spartan ~]$ check_home_usage
new-user has used 4GB out of 50GB in /home/new-user

Projects Directory

Your projects directory is the best place to store research data while you're working on it. It's located at /data/cephfs/<projectid>.

N.B. The projects directory is not backed up and has no snapshot capability. If you delete files, they will be unrecoverable.

Others in your project can access it, and 500 GB of storage is available per project. If you need more than this, get in touch and we'll try to find a solution. In general, for University of Melbourne users, 1 TB of project storage is available upon request, and up to 10 TB is possible if needed. Project storage beyond 10 TB will generally require some sort of co-investment, but this may be waived in some circumstances, particularly for high-value shared datasets.

To request more than 1 TB of project storage (up to 10 TB), please get in touch.

You can check your quota and usage with the command check_project_usage, for example:

[new-user@spartan ~]$ check_project_usage
myproject has used 3997GB out of 8000GB in /data/cephfs/myproject
myproject1 has used 265GB out of 500GB in /data/cephfs/myproject1

Scratch Space

You can store temporary working data while your job is running at /tmp. This maps to a directory on our fast scratch network storage specific to your job ID, which is cleaned up once your job is complete. It's also possible to write directly to /scratch/, for instance if you would like to share your working files across multiple nodes. In this case it's your own responsibility to avoid collisions (i.e. two processes writing to the same file at the same time) and to clean up afterwards.

N.B. The scratch directory is not backed up and has no snapshot capability. If you delete files, they will be unrecoverable.

N.B. Home, project and scratch are all network-based storage that can be accessed by multiple nodes and processes at the same time. Take care that you don't inadvertently write to the same file from multiple jobs at the same time.

N.B. /scratch has a nominal lifetime of 60 days. Files can and will be deleted without notice to ensure that /scratch does not fill up.
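
One way to avoid collisions and leftover files when writing directly to /scratch is to give each job its own directory and remove it when the job ends. The following is a minimal sketch, not an official Spartan recipe. It assumes the scheduler exports SLURM_JOB_ID (falling back to the shell's process ID if it does not), and the function names and the directory layout under the root are hypothetical.

```shell
#!/bin/bash
# Sketch only: function names and directory layout are illustrative assumptions.

make_job_dir() {
    # $1: root directory under which the per-job directory is created
    # Prints the path of the new directory on success.
    local root="$1"
    local job_id="${SLURM_JOB_ID:-$$}"   # assumed scheduler variable, else this shell's PID
    local dir="${root}/${USER:-unknown}/job_${job_id}"
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

cleanup_job_dir() {
    # $1: directory to remove; an empty path or "/" is refused.
    local dir="$1"
    case "$dir" in
        ""|"/") return 1 ;;
    esac
    [ -d "$dir" ] && rm -rf -- "$dir"
}

# Possible use inside a job script (the /scratch root comes from the text above):
#   workdir=$(make_job_dir /scratch)
#   trap 'cleanup_job_dir "$workdir"' EXIT
```

Because each job writes under its own job_<id> directory, two jobs never share an output file by accident, and the trap removes the directory even if the job exits early.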


Local disk is typically faster than shared disks. If your job is I/O-heavy and its reads and writes are slow, you may need to stage your data, that is, copy it to faster storage before the computation and copy the results back afterwards.

Spartan has /data for home and project directories (large, slower), /scratch for temporary data (faster), and local disk at /var/local/tmp (fastest, not shared between nodes). You may need to copy data between these locations.
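
Copying data between these locations might be scripted along the following lines. This is a sketch under stated assumptions rather than a prescribed workflow: run_staged is a hypothetical helper, and the in/ and out/ subdirectories are a convention invented for this example.

```shell
#!/bin/bash
# Sketch only: run_staged and the in/ and out/ layout are illustrative assumptions.

run_staged() {
    # $1: input file or directory on shared storage
    # $2: working directory on fast local disk (created here, removed at the end)
    # $3: output directory on shared storage
    # remaining arguments: the command to run from inside the working directory
    local input="$1" workdir="$2" outdir="$3"
    shift 3
    mkdir -p "$workdir/in" "$workdir/out" "$outdir" || return 1
    cp -r -- "$input" "$workdir/in/" || return 1       # stage in
    ( cd "$workdir" && "$@" ) || return 1               # compute against the local copy
    cp -r -- "$workdir/out/." "$outdir/" || return 1    # stage out
    rm -rf -- "$workdir"                                # free the local disk
}

# Possible use (my_program is a placeholder for your own executable):
#   run_staged /data/cephfs/myproject/input.dat \
#              "/var/local/tmp/job_${SLURM_JOB_ID:-$$}" \
#              /data/cephfs/myproject/results \
#              my_program in/input.dat out/
```

The command reads from in/ and writes to out/ on local disk, so shared storage is touched only twice: once to copy the input in and once to copy the results out.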

How to Transfer Data In and Out of Spartan

Secure Copy (scp)

You can use the scp command to copy data from your local machine to Spartan. For example, to copy local.dat from your current working directory to your project directory on Spartan:

$ scp local.dat

You can transfer files from Spartan to your local machine by reversing the order of the arguments like so:

$ scp local.dat
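
In both directions, scp takes the source first and the destination second, and a remote path is written as user@host:path. The sketch below builds such a remote path for the project directory. The username, host name and project ID are placeholders (the host shown is deliberately not a real login address), and remote_project_path is a hypothetical helper. The transfer commands appear as comments because they need a real login host to run.

```shell
#!/bin/bash
# Placeholder values: replace with your own username, login host and project ID.
SPARTAN_USER="yourusername"
SPARTAN_HOST="spartan.example.edu"   # placeholder, not the real login host
PROJECT_ID="myproject"

remote_project_path() {
    # $1: file name (or empty) relative to the project directory
    # Prints the user@host:path form that scp expects for a remote location.
    printf '%s@%s:/data/cephfs/%s/%s\n' \
        "$SPARTAN_USER" "$SPARTAN_HOST" "$PROJECT_ID" "$1"
}

# Upload: local source first, remote destination second.
#   scp local.dat "$(remote_project_path '')"
# Download: remote source first, local destination second.
#   scp "$(remote_project_path remote.dat)" .
```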

For Windows users, PuTTY provides an equivalent tool called pscp. If your data is located on a remote machine, you can SSH into that system first, and then use scp from that machine to transfer your data into Spartan.

If you'd prefer a graphical interface, you can use tools like FileZilla (cross-platform) or Cyberduck (OS X & Windows).


rsync

Repeatedly transferring large files in and out of Spartan via scp can be tedious. A good alternative is rsync, which only transfers the parts that have changed. It can work on single files or whole directories, and the syntax is much the same as for scp.

$ rsync local.dat

Note that the first argument is the source, and the second is the destination which will be modified to match the source.
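
Because the destination is modified to match the source, it can be worth previewing a transfer before running it. The sketch below wraps rsync with the standard -a (archive), -v (verbose) and -n (dry run) options. sync_dir is a hypothetical helper, and the RSYNC and DRY_RUN variables are conventions invented for this example.

```shell
#!/bin/bash
# Sketch only: sync_dir, RSYNC and DRY_RUN are illustrative names, not Spartan tools.
RSYNC="${RSYNC:-rsync}"   # command to run; can be overridden for testing

sync_dir() {
    # $1: source directory
    # $2: destination (a local path or user@host:path)
    # The source is always passed with a trailing slash, so rsync copies the
    # directory's contents into the destination rather than the directory itself.
    local opts="-av"
    [ "${DRY_RUN:-0}" = "1" ] && opts="${opts}n"   # -n lists changes without copying
    "$RSYNC" "$opts" -- "${1%/}/" "$2"
}

# Possible use (the remote path is a placeholder):
#   DRY_RUN=1 sync_dir results user@host:/data/cephfs/myproject/results/
#   sync_dir results user@host:/data/cephfs/myproject/results/
```

Running with DRY_RUN=1 first shows which files would be transferred, which helps catch a swapped source and destination before anything is overwritten.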

Not for Long-Term Storage

While it's often essential to have fast nearby storage while working on your data, please don't use Spartan as your long-term data repository. It's not designed for that, may not conform to the requirements set by your institution or funding body, and we don't guarantee to store your data indefinitely (though we certainly won't get rid of it without asking you first).

Mediaflux Integration

Research Computing Services provides a data management service utilising the Mediaflux platform. This platform provides a persistent location for research data and metadata. To aid integration between Mediaflux and Spartan, Java clients are available on Spartan, allowing data to be downloaded from and uploaded to Mediaflux. Details on Mediaflux integration with Spartan can be found in Section 4 of the Mediaflux support wiki.

S3-compatible storage

Research Computing Services provides an object storage service with an S3-compatible layer. Data can be archived from Spartan to this service and retrieved later for analysis. For more information, please see our wiki.

Data and Storage Solutions Beyond Spartan

The University offers a range of other data storage and management solutions to meet your needs, beyond the short-term storage available on Spartan, which are described here.

In some cases it's possible to integrate these resources with your account on Spartan to streamline your workflow. Get in touch if you'd like to find out more for your particular application.