SAN FRANCISCO >> Facebook said today that it had repaired a technical error that led to long lapses in service at its various properties, including Instagram, WhatsApp and Messenger.
The interruption lasted nearly 24 hours on some of the services and was the longest in Facebook’s recent history. It was an eye-opening reminder that even the most powerful internet companies, employing the best computer scientists and cutting-edge technology, can still be crippled by human error.
“All of the big web companies have multiple lines of defense, but sometimes a coding mistake made by one engineer can make its way onto many thousands of computers and cause major errors,” said Alex Stamos, a former chief security officer at Facebook and a lecturer at Stanford University. “In other words, rebooting something as complex as Facebook is very, very hard.”
A “server configuration change” made Wednesday had a cascading effect through the company’s network, a Facebook spokesman said. That created a repeating loop of problems that kept growing and could not be immediately fixed, according to one current and one former Facebook employee, who spoke on the condition of anonymity because they were not allowed to talk to reporters.
That small mistake had big consequences. Instagram users couldn’t view other profiles, WhatsApp users couldn’t send messages, and news feeds across Facebook’s main app went blank.
DownDetector, which likens itself to a weather report for the internet, said it had received 7.5 million problem reports about Facebook’s apps. In comparison, widespread problems on YouTube in October prompted just 2.7 million reports. DownDetector measures service interruptions in part by counting reports from users who are experiencing problems.
“Never before have we seen such a large-scale outage,” said Tom Sanders, a co-founder of DownDetector.
Early today, Facebook was able to pull most of its systems back online. The company is still trying to figure out how that error reverberated throughout its network. Facebook officials emphasized that the problem had not been caused by hacking or a cyberassault like a so-called denial-of-service attack, which hits servers with a wave of traffic that causes them to stop working.
Facebook, like other internet giants, prides itself on never going offline. That predictability has helped it become one of the most influential — and criticized — companies in the world. An estimated 2 billion-plus people use one or several of its services daily.