UniSuper’s Google private cloud environment was deleted because a single parameter in a software tool was left blank, inadvertently placing a one-year expiry on the environment.
The cloud provider on Saturday finally explained the “rare” and cascading series of events that led to UniSuper’s online services being inaccessible for nine days and having to be rebuilt from backups.
The previous best explanation of the incident was “an inadvertent misconfiguration during the provisioning of UniSuper’s private cloud, which triggered a previously unknown software bug.”
The cloud provider has now published a post-incident report to “publicly clarify the nature of the incident and ensure there is an accurate account in the interest of transparency”.
iTnews previously reported that other customers had been seeking explanations of the incident to understand their own potential exposure. The incident also occurred a week prior to a major but closed-door Google Cloud summit in Sydney attended by customers.
The official post-mortem came days after a widely shared LinkedIn post that appeared to leak aspects of the findings.
'One input parameter was left blank'
Google Cloud said that the incident was isolated to one Google Cloud VMware Engine (GCVE) private cloud run by UniSuper across two zones. It said UniSuper had more than one private cloud.
Owing to specific provisioning requirements, the setup was performed by Google Cloud engineers themselves using an internal tool that's no longer in use.
While saying that Google operators “followed internal control protocols”, the provider said that “one input parameter was left blank when using [the] internal tool to provision the customer’s private cloud.”
“As a result of the blank parameter, the system assigned a then unknown default fixed one-year term value for this parameter,” it said.
“After the end of the system-assigned one year period, the customer’s GCVE private cloud was deleted.”
Google said UniSuper would have received no warning of the deletion because the deletion was not initiated by the customer.
“No customer notification was sent because the deletion was triggered as a result of a parameter being left blank by Google operators using the internal tool, and not due [to] a customer deletion request,” Google said.
“Any customer-initiated deletion would have been preceded by a notification to the customer.”
The recovery and rebuild of the deleted environment were made possible because UniSuper had a “robust and resilient architectural approach to managing risk of outage or failure” on its end, including the use of “third party backup software”.
“The customer’s CIO and technical teams deserve praise for the speed and precision with which they executed the 24x7 recovery, working closely with Google Cloud teams,” it said.
Google said additional backups it made for UniSuper were also accessible.
It said the same incident is no longer possible, in part because customers can now perform the more complex configurations themselves, which means they would be warned if an environment was ever scheduled for deletion.
Google said it also “manually reviewed all GCVE private clouds to ensure that no other GCVE deployments are at risk” of the same set of circumstances.
Google added that it was a “one-time incident” and that its resiliency and stability credentials remained intact.
UniSuper set up GCVE private clouds to replace two data centres it previously ran in Melbourne.