The Future Fund - Australia's sovereign wealth fund, with $273 billion under management - is standing up a data lakehouse that will act as a “core central data platform” for the entire agency.
The project represents the latest evolution of the agency’s data analysis environment, which is powered by Databricks.
Speaking at a ‘Data Intelligence Day’ summit in Melbourne last week, data and software engineering manager Paul Matheson said the Fund’s Databricks platform had grown in use and importance over time.
“It was initially stood up in 2018 … as an engineering ‘playground’ and research capability,” he said.
“It was generally used by just a small group of data engineers at the time … for data exploration and basic visualisation using notebooks, and running just some small non-critical workloads, so it had limited usage among just a small group and small data transactional volumes.
“Fast forward a few years and Databricks is now our central data platform used at the agency. It runs a whole series of critical workloads for us using Databricks workflows and is used for extensive data analysis.
“We’ve got users spanning from investment and business analysts, [to] data and software engineers, right the way through to a team of quant developers as well.”
The next evolution of the Fund’s Databricks platform is the establishment of a lakehouse environment for all of the Future Fund’s data - structured, semi-structured and unstructured.
“We’re now in the middle of an implementation of a data lakehouse at the agency,” Matheson said.
“The lakehouse will be the core central data platform for the entire Future Fund agency.”
Serverless SQL warehouse
As use of the existing Databricks environment grew and it took on more production workloads, the Future Fund said users had started to encounter performance problems.
It resolved these issues by changing the way its data warehouses are hosted, shifting from SQL warehouses running in its own cloud environment to serverless SQL warehouses.
The use of serverless SQL warehouses is now recommended by Databricks as a way to improve query performance and reduce costs.
Senior manager of DevOps and platforms Matt Blair said serverless SQL warehouses meant users spent less time waiting for jobs to start or finish.
The serverless architecture also delivered cost savings, as resources are scaled up and back down according to demand.
Previously, SQL warehouses ran constantly but sat idle for considerable periods, a common operating model used to avoid long startup times.
While the Fund was not specifically seeking cost savings - its Databricks platform team was more concerned with performance and user experience, particularly as more users onboarded - Blair acknowledged that the savings had nonetheless turned out to be substantial.
“Before we moved to serverless, the average day in the life of the [compute resource] cluster running the warehouse was that it started up in the morning and it just ran all day,” Blair said.
“We had a long idle time and, as you can imagine, inefficient resource utilisation and increased cost.
“When we moved to serverless, with rapid upscaling and downscaling, as soon as your job runs, it starts up, and within three to four seconds it’s running. When it’s done ... it shuts down again.
“We had an approximately 36 percent saving once we were in serverless SQL, which makes the people who pay the bills very happy.”
AI readiness
In its most recent 'year in review', the Future Fund positioned its investment in Databricks as foundational to its AI ambitions and its ability to participate in the burgeoning AI space.
“We already have significant exposure to AI companies, with one of our biggest single investments being in Databricks,” private equity director Kelvin Mak-Lui is quoted as saying.
Databricks, the document states, "provides and enables organisations to build, train and deploy AI models to derive data insights that can be used to drive efficiencies and new opportunities in their businesses."