Optus triple-0 failure: Independent review into September outage exposes raft of internal issues at telco

Daniel Newell, The Nightly
Image: Optus released the findings of an independent review, led by former NBN director Kerry Schott. Credit: News Corp Australia

A damning report into September’s Optus outage — which cut off access to triple-0 services and was linked to two deaths — has exposed a raft of internal problems at the foreign-owned telco, notably a workplace culture that slows crisis response times.

Optus on Thursday released the findings of an independent review into the network disaster, led by former NBN director Kerry Schott. The review found there were “gaps in process, accountability, and escalation and information protocols that need urgent attention”.

“It also highlighted challenges in Optus’ culture that have impacted decision-making and response times,” the report concluded.

However, Dr Schott warned against axing the telco’s under-fire CEO Stephen Rue, while Optus chairman John Arthur described the findings as “sobering”.

The system meltdown was sparked by a scheduled firewall upgrade in South Australia at 12.30am on September 18.


Normal calls were mostly unaffected but the outage blocked about 600 triple-0 calls from connecting to emergency services.

The 14-hour breakdown hit South Australia, WA, the Northern Territory and NSW.

The report found that between Optus’ networks personnel and their contractor, Nokia, at least 10 mistakes were made — despite 15 similar upgrades having been completed in previous months.

Dr Schott blamed Optus, saying the instructions it gave Nokia about the change needed for the upgrade were incorrect. The errors resulted in what the report called a “locked gateway” which blocked both voice calls and emergency calls.

“It took Optus and Nokia about 13 hours to know of the problem,” the report said. “There were early alerts at the operations centre, but these received only cursory attention by Nokia and Optus.

“The call centre response (after checking that the caller was not in immediate danger) was to try and find a technical problem — either in the caller’s device or in the nearby network.

“The call centre had not been advised of any outage and thus assumed that the problem was technical in nature. The issue was not escalated further.”

The report found that of the 605 customers who initially failed to connect, only 105 eventually reached triple-0, with the rest unconnected. Calls that did go through could have taken between 40 and 60 seconds to connect, with Dr Schott noting “in an emergency, people are unlikely to hang on for this length of time, especially when the only response they are getting is silence on the line”.

But she also acknowledged the shortfalls of the triple-0 system, saying it is a highly regulated system that was originally designed in a world of 2G and 3G and “since that time there have been major changes to the networks and devices”.

The board of Optus, which is owned by Singapore’s Singtel, said it had accepted all 21 of Dr Schott’s recommendations.

“The 21 recommendations build on the multi-year strategic transformation underway at Optus and the changes introduced to address shortcomings identified by the company during the initial response to the incident,” the telco said.

“Optus has established a dedicated work program to implement Dr Schott’s recommendations, complemented by comprehensive cultural reforms underway to further strengthen accountability, transparency, risk responsibility and a customer first mindset at all levels of the organisation.”

The report’s recommendations include sweeping changes throughout the organisation, from strengthening tests during network changes to encouraging “staff to escalate any issues outside their immediate group if they have doubts”.

For the board, the report recommended making changes to ensure members are up to the job.

“To strengthen the recent move to more local responsibility, the board should consider the adequacy of its skill base and depth and, if appropriate, make the changes needed,” the report said.

What went wrong

Governance issues at Optus came in for stiff criticism in Dr Schott’s report.

“At a general level, risk management must be elevated so that all three lines of defence in risk management are working properly,” it said.

“This is not the case at present.

“The first line of defence in network operations clearly failed. The second line of defence, which provides independent review and challenge of risk management in the business under a chief risk officer, needs an upgrade in both importance and capability — which is underway.”

The third line — the internal audit function — is currently being boosted, including through the planned appointment of a chief auditor who will report directly to the board.

But most alarming, the report noted, was the time it took for Optus to find the number of triple-0 callers impacted and their details, along with “the poor information flow internally at Optus because of the siloed working culture”.

“Some complicated engineering had to be done to find this information and awareness of about 100 callers was not acquired until after 8pm on 18 September,” it said.

“By midnight it was clear that over 600 callers had been impacted — about 10 hours after the incident was recognised. Steps have already been taken to expedite this data-finding procedure, noting that its automation is pending.”

Optus also failed to act because of “poor internal information flow”. While the networks team was managing the outage, it did not acknowledge its severity or move its management to a “central incident and crisis management team” who operate with a crisis communications team — which could have maintained “appropriate communication across internal divisions, the executive team and external stakeholders”.

This wasn’t put in place until the following morning, “when the CEO was informed of both the large number of Triple Zero calls affected and the two fatalities known at that point”.

It was also discovered that email addresses were incorrect, with typographical errors and out-of-date information.

Defence of the CEO

Mr Rue took the helm from Kelly Bayer Rosmarin, who in November 2023 bowed to immense public and political pressure and quit the job less than two weeks after another embarrassing nationwide outage left 10 million customers without internet and telephone services.

Mr Rue faced similar scrutiny after the triple-0 failure.

But Dr Schott’s report said calls to replace the CEO “are not helpful at the start of this large program of change and will not help develop the more competitive and reliable telecommunications company needed in Australia”.

“The incident certainly highlights significant problems at Optus that have been evident for some time. Major efforts are now being made to address them.

“Telecommunications is an essential service, and not only for Triple Zero calls. Optus must recognise their role in serving the community.”

Following the release of the report on Thursday, Mr Rue said he was committed to “setting a new standard”.

“Australia deserves world-class emergency call services,” he said.

“We are working closely with government, regulators, and the wider telecommunications sector to enhance the reliability of the triple-0 service for our customers.”

Optus chairman John Arthur again apologised for the deadly outage, calling it “unacceptable” and warning of “further action in relation to individual accountabilities”, which could include financial penalties or termination.

He also welcomed the report’s findings.

“The board wanted an independent and forthright assessment of what went wrong and what needs to change, and the Schott review has delivered on that in a candid and succinct analysis,” Mr Arthur said.

“The report is a sobering read for everyone at Optus.

“While the Schott review acknowledges the work undertaken before the incident to build a better company, it is clear there is much more to do. We recognise the scale of the challenge and will act decisively to make the necessary changes to strengthen the business and rebuild trust.”
