'This is not supposed to happen': Experts on DBS, Citi outage caused by data centre failure

Most banks have two or more data centres and redundancies at multiple levels if a primary data centre goes offline, industry experts say. 

An error message on a DBS ATM at Central Mall on Oct 14, 2023 (left) and a file photo of a data centre. (File photo: AFP)

SINGAPORE: Banks are heavily reliant on data centres for their operations, making it crucial to have a backup data centre should one fail, banking and technology experts told CNA in the wake of a DBS and Citibank outage earlier this month.

These centres form the nerve centre of the banking ecosystem, and they store, manage and process massive amounts of data, the experts said. 

In the case of DBS and Citibank, the data centre they used experienced technical issues with its cooling system during a planned upgrade, raising temperatures and affecting equipment. The incident disrupted both banks' payment and banking services on Oct 14.

Last Friday, the banks’ data centre provider Equinix said its initial investigation found that the incident was caused by a vendor contractor. The contractor had incorrectly sent a signal to close the valves from the chilled water buffer tanks, which affected the flow of chilled water to the cooling system. 

Equinix said it is reviewing its processes and putting in place additional audits to prevent a similar incident during future upgrades.

The DBS and Citibank outage started at 3pm on Oct 14, and services only fully resumed the following morning. Customers were unable to access both banks' apps and online banking or payment services such as PayLah! and PayNow. ATM services were also down at several locations.

Last Thursday, the Monetary Authority of Singapore (MAS) said it had ordered both banks to conduct a thorough investigation, noting that the downtime exceeded its limit of four hours within a 12-month period. 

While MAS has no oversight over data centres, it expects banks to establish contractual agreements with data centre providers that incorporate its requirements on system availability, it said. 

DATA CENTRES THE "NERVE CENTRE"

Data centres and their functions form the backbone of modern-day businesses, said Associate Professor Lee Poh Seng. 

"At their core, they house a variety of components, including servers, storage systems, networking hardware like switches and routers, and the cabling infrastructure necessary for data and network communications," said Assoc Prof Lee, who is from the National University of Singapore’s (NUS) Department of Mechanical Engineering. 

Singapore Management University's (SMU) Dr Patrick Thng, who has close to three decades of experience in banking, including as managing director at various banks, likened data centres to a car's engine. 

"From very fundamental office support applications … to banking critical operations, like core banking, credit card applications, ATM, foreign exchange trading, branch platform, digital banking services, (all rely on data centres)," said the director of a financial technology and analytics programme at SMU. 

"Data centres have massive storage to store all your transactions and customer data. Banks rely on this data centre like a nerve centre."

DBS ATMs displaying an error message at Simei’s Eastpoint Mall on Oct 14, 2023. (Photo: CNA/Calvin Oh)

COOLING SYSTEM CRITICAL TO DATA CENTRE

NUS’ Assoc Prof Lee pointed out that cooling systems are critical to data centres. Equinix uses a chilled water system, which works by circulating chilled water to absorb heat from the data centre environment and dissipate it outside the facility. 

“They keep the hardware at optimal operating temperatures, and a failure could result in overheating and subsequent IT hardware failures,” he said.

According to the Singapore Computer Society, most data centres use centralised chiller plants that have built-in redundancy to enable continuous cooling in the event of equipment failure. Outages still occur, but are “very rare”, it said. 

The recommended temperature range for data centres is 18 to 27 degrees Celsius, said Assoc Prof Lee, referring to guidelines by the American Society of Heating, Refrigerating and Air-Conditioning Engineers. 

In Singapore’s tropical climate, with high year-round temperatures and humidity, robust cooling systems are necessary to prevent overheating. 

When a cooling system is down and temperatures are not brought back to a safe range, overheating can ensue. This can cause intermittent hardware errors, data corruption, and in severe cases, permanent hardware damage, added Assoc Prof Lee, who holds two US patents in thermal systems. 

Asked why it might have taken such a long time for the data centre to resume operations, Assoc Prof Lee said: “If not previously encountered, the incident's nature could present a learning curve, extending the recovery time. The adequacy and effectiveness of disaster recovery procedures in place could also affect the recovery time.”

BACKUPS, RECOVERY PLANS NEEDED

The importance of the data centre means that banks always have a backup data centre. Some banks also have two data centres that share the workload concurrently. In such cases, if one fails, the other can pick up the slack.

Dr Dennis Khoo, a managing partner at digital consultancy allDigitalFuture, said: "Generally for mission-critical applications like banks, there are multiple levels of redundancy. 

"In most advanced banks, using the latest technologies, the database will be instantly replicated, that means they'll have a primary site and alternate site and the data is replicated instantly on both sites," said Dr Khoo. 
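The primary-and-alternate arrangement described above can be illustrated with a minimal Python sketch. It is not how DBS, Citibank or any other bank actually implements failover; the `DataCentre` class, the `pick_site` function and the site names are hypothetical, and the sketch assumes data is already replicated to every site so that any healthy site can serve a request.

```python
from dataclasses import dataclass


@dataclass
class DataCentre:
    name: str      # hypothetical site label
    healthy: bool  # whether the site can currently serve requests


def pick_site(sites: list[DataCentre]) -> DataCentre:
    """Return the first healthy site, checked in priority order."""
    for site in sites:
        if site.healthy:
            return site
    # Every site is down, so there is nowhere left to route requests.
    raise RuntimeError("no data centre available")


# Assumed scenario: the primary site has failed, the alternate has not.
sites = [
    DataCentre("primary", healthy=False),
    DataCentre("alternate", healthy=True),
]
print(pick_site(sites).name)
```

In this sketch, services stay up as long as one site remains healthy, and the error is raised only when every site is marked unhealthy.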

The Singapore Computer Society said most data centres are designed and built with a certain level of redundancy and the ability to conduct maintenance in real time. They are also specifically built to meet the exact redundancy requirements of the business, it said.

A common uptime guarantee of a data centre would usually be at 99.982 per cent to mitigate possible disruption to its customers, the society added. 

However, there is still 0.018 per cent of possible downtime. “Thus, the client must establish an efficient Business Continuity Management System and IT Disaster Recovery Plan to allow their critical IT systems and data information to immediately failover to the secondary data centre in the shortest possible time should such an incident occur.”
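To put the 0.018 per cent figure in concrete terms, the short Python sketch below converts an uptime percentage into hours of possible downtime per year. The function name and the 365-day year are assumptions made for illustration, not part of the society's statement.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Return the hours per year that fall outside the uptime guarantee."""
    downtime_fraction = (100.0 - uptime_percent) / 100.0
    return downtime_fraction * HOURS_PER_YEAR


hours = downtime_hours_per_year(99.982)
print(f"{hours:.2f} hours, or about {hours * 60:.0f} minutes, per year")
```

Under these assumptions, a 99.982 per cent guarantee still leaves room for roughly 1.6 hours of downtime a year.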

WHAT CAN BANKS DO WHEN DATA CENTRES ARE DOWN 

That said, should all data centres fail, there is little – if any – service a bank can provide. 

Dr Thng said banks activate what they call "offline mode", which means they render some services at branch offices using what is available. These transactions are then updated with the mainframe when the data centre is back up. 

These services may include cash withdrawals – as banks have spare cash on hand – cash deposits, payment instructions and credit card transactions. In the case of DBS, it reopened branches to help customers with some services. 

According to Dr Thng, IT incidents, such as a delay in processing, occur daily. Some of these go unnoticed by customers. Still, service outages will negatively impact the bank’s reputation and may have some financial repercussions, for example, if a client incurs late fees from being unable to pay his bill on time due to an outage. 

On reputational impact, Dr Khoo said banks would have “broken” their service commitments to provide around-the-clock service to customers.

“So definitely, in that sense, there will be some reputational damage in terms of your ability to serve customers properly. And with proper design, this is not supposed to happen.”  

Source: CNA/wt(cy)
