The University of Queensland has prototyped a new data storage fabric it hopes to roll out by the end of the year to improve data access between its main campus and its off-site data centre.
To support its research, the university uses a high-capacity cloud storage node, QRIScloud, located at the Polaris data centre in Springfield, around 30 kilometres south of its inner-city campus in St Lucia.
Also at Polaris, a data-intensive supercomputer called FlashLite and the Tinaroo cluster are connected back to the campus - which has on-premises computer rooms and high-end scientific instruments - over multiple 10Gbps network links.
The problem has been that until now, QRIScloud and the on-campus data centres have operated as separate storage silos, according to UQ Research Computing Centre director David Abramson.
This has made storing and moving data between the two locations a less-than-user-friendly process for already busy researchers.
“Polaris is a serious, commercially-run data centre with physical security, data backup, diesel generators – all sorts of stuff. It augments the data centres on campus,” Abramson said.
"But .. we would never put our scientific instruments in a data centre. For example, we have a Siemens MAGNETOM 7-Tesla MRI machine on campus. The people or animals that go in that are on campus and the data needs to be accessed locally.
“At the moment if you store some data at this remote data centre in the cloud storage, and then you want to manipulate that on campus, you have to manually copy it, and then work on it locally and then send it back, or delete it, or whatever else, to manage it.”
What a mesh
Abramson is looking to ease the burden on researchers by implementing a data storage fabric that spans both the campus and the off-site data centre, known as MeDiCI (the Metropolitan Data Caching Infrastructure).
The distributed file system will allow the working set of data currently being accessed by researchers to be cached on campus using IBM’s GPFS product (also known as Spectrum Scale) without any additional user involvement.
Data that hasn’t been accessed for a while is automatically moved back to Polaris, which uses QRIScloud’s SGI DMF hierarchical storage management system, with the data ultimately ending up on tape.
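Conceptually, the behaviour described above works like a two-tier working-set cache: a first read pulls a file from the remote tier into the campus cache, writes land locally, and files left idle percolate back to the data centre. The Python sketch below is purely illustrative of that policy; it is not GPFS/Spectrum Scale or DMF code, and names such as WorkingSetCache, remote_store and idle_seconds are invented for the example.

```python
import time


class WorkingSetCache:
    """Illustrative two-tier cache: a local (campus) tier backed by a
    remote (data centre) tier. Not GPFS/DMF code; names are invented."""

    def __init__(self, remote_store, idle_seconds=3600):
        self.remote = remote_store          # remote tier, e.g. dict of path -> bytes
        self.local = {}                     # cached copies held "on campus"
        self.last_access = {}               # path -> time of last access
        self.idle_seconds = idle_seconds    # how long before idle data is flushed

    def read(self, path):
        # First access pulls the file from the remote tier automatically;
        # subsequent accesses are served from the local cache.
        if path not in self.local:
            self.local[path] = self.remote[path]
        self.last_access[path] = time.time()
        return self.local[path]

    def write(self, path, data):
        # Writes land in the local cache and migrate back later.
        self.local[path] = data
        self.last_access[path] = time.time()

    def flush_idle(self):
        # Files untouched for a while percolate back to the remote tier
        # and are dropped from the campus cache.
        now = time.time()
        for path in list(self.local):
            if now - self.last_access[path] > self.idle_seconds:
                self.remote[path] = self.local.pop(path)
                del self.last_access[path]
```

In the actual deployment this logic sits inside Spectrum Scale and DMF rather than in application code; the sketch only captures the fetch-on-access and evict-when-idle behaviour Abramson describes.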
“If you want to touch some data remotely in the cloud at Polaris, you can access that data using this distributed file system. If you want to access that data on campus, it appears automatically and gets moved without the user’s involvement. So that lessens the burden,” Abramson said.
A major additional benefit for users working with data stored at Polaris is that bandwidth bottlenecks between the campus and Polaris are less likely to cause a slowdown in their work.
“When I touch a file on campus I need it to appear reasonably quickly, but as long as it’s being manipulated here, it isn’t bouncing between the two locations, it will stay here as long as it needs to be accessed," Abramson said.
“Say someone does an experiment on an instrument. They capture the data, it’s stored on the local cache, they might do some image processing on that data, and delete parts of the data because it’s rubbish, and then over time it will percolate back to the data centre, and then I might do some more processing out there using the HPC systems.
“Latency only matters between those major workflow steps. So if I touched a file here, and I touched a file there, and I touched a file somewhere else, data would ping-pong back and forth, but that tends not to be the use case.”
Implementation
While the university hasn’t written any new code for MeDiCI, Abramson said it had worked closely with GPFS’ developers at IBM to configure the file system to work with SGI DMF.
“The common belief from various people was that we couldn’t make [IBM GPFS] work [with SGI DMF] and we have,” Abramson said.
“GPFS is a parallel system, so it works well on high-performance computers, but it’s also a distributed file system. And they’re pretty smart guys [at IBM] that have worked out a lot of things over the years. It’s an old product, but it’s been constantly updated.
“The biggest secret is talking to the people in those organisations in the labs who built the software to figure out what we needed to do to get it to work. So we haven’t written any code, but we have configured it in a particular way, and we have had a lot of help from them.”
The decision to create MeDiCI was made a year ago, and work on the proof-of-concept kicked off around four months ago.
It is already being trialled by a number of heavy users, including the Queensland Brain Institute, which is using a MeDiCI network node to access data it stores in Polaris.
“We’re running up a proof-of-concept with a couple of cache nodes on campus at the moment, and we’re bringing that online, and by the end of the year I want that operationalised and we’re looking to add some more cache nodes to it," Abramson said.
This project was named a finalist in the iTnews Benchmark Awards 2017.