Tag Archives: Facebook

Why Facebook’s Lack of Customer Support Is a Problem

от Божидар Божанов

лиценз CC BY

Facebook is arguably the biggest social network. The network effect makes it hard for people to leave Facebook, and so many businesses, celebrities, institutions, politicians rely on it for reaching out to their customers/fans/citizens/voters.

Yet, at least in my part of the world, the customer support of Facebook is practically non-existent. Because I’m a member of parliament and former minister that had handled disinformation and relations with Meta, many people turn to me for their Facebook woes. And they are almost never resolved.

A few examples: a deep fake of the Bulgarian prime minister was circulating on Facebook for several days, after two institutions submitted official take-down notices. Profiles of fellow members of parliament were blocked/hacked. None of their support requests succeeded and their profiles remained blocked for months. A fellow member of parliament with paid subscription could not change his cover photo during an election campaign for mayor, and Facebook’s support stopped answering. Facebook bulk-deleted our candidate pages after one election campaign (after it has been taking ad money), and its support did not respond adequately (pages remained deleted). One colleague’s ad account was hacked and a malicious actor used his credit card to promote ads. He was unable to remove the intruder and Facebook’s support didn’t manage to do it either, so my colleagues had to remove the credit card. When I became a minister, my request for a blue checkmark was initially rejected and the official support channel didn’t answer. And in all of those cases support was requested in English, so it’s not about language-specific limitations.

I’m sure anyone using Facebook for business has similar experiences. In a nutshell, support is useless, even if you are paying customer or advertiser. And clearly there is no market pressure to change that.

The European Union recently introduced the Digital Services Act which at least pushes forward a long-time proposal of mine for appeals and independent arbitration for decisions that block access. I don’t know if that’s working already, but at least it’s a step.

So why is that a problem? Facebook argues it is not a ‘natural monopoly’, and I’ll agree with that to an extent – it faces competition from different types of social networks. But its scale and the network effect means it is not just a regular market player – it is (as the digital services act puts it) – a very large online platform that has gained a broad influence and therefore needs to be required to bear extra responsibility. The ability for some entity with 4 million users in a country of 7 million to arbitrarily ban members of parliament or candidates for mayors, or to choose (because of inefficiency) to leave a deep fake of a prime minister up for days, is a systemic risk. It’s a systemic risk to leave a business to be reliant on the whims and inefficiencies of the nearly non-existent customer support.

If a company can’t get customer support sorted, market forces usually push it out of the market. But because of the network effect (and its policy of acquiring some potential competitors), this hasn’t been the case. And if one of the most highly-valued companies on earth can’t have a decent support process, regulators should step up and set standards.

The post Why Facebook’s Lack of Customer Support Is a Problem appeared first on Bozho's tech blog.

Hypotheses About What Happened to Facebook

от Божидар Божанов

лиценз CC BY

Facebook was down. I’d recommend reading Cloudflare’s summary. Then I recommend reading Facebook’s own account on the incident. But let me expand on that. Facebook published announcements and withdrawals for certain BGP prefixes which lead to removing its DNS servers from “the map of the internet” – they told everyone “the part of our network where our DNS servers are doesn’t exist”. That was the result of a backbone self-inflicted failure due to a bug in the auditing tool that checks whether the commands executed aren’t doing harmful things.

Facebook owns a lot of IPs. According to RIPEstat they are part of 399 prefixes (147 of them are IPv4). The DNS servers are located in two of those 399. Facebook uses a.ns.facebook.com, b.ns.facebook.com, c.ns.facebook.com and d.ns.facebook.com, which get queries whenever someone wants to know the IPs of Facebook-owned domains. These four nameservers are served by the same Autonomous System from just two prefixes – 129.134.30.0/23 and 185.89.218.0/23. Of course “4 nameservers” is a logical construct, there are probably many actual servers behind that (using anycast).

I wrote a simple “script” to fetch all the withdrawals and announcements for all Facebook-owned prefixes (from the great API of RIPEstats). Facebook didn’t remove itself from the map entirely. As CloudFlare points out, it was just some prefixes that are affected. It can be just these two, or a few others as well, but it seems that just a handful were affected. If we sort the resulting CSV from the above script by withdrawals, we’ll notice that 129.134.30.0/23 and 185.89.218.0/23 are the pretty high up (alongside 185.89 and 123.134 with a /24, which are all included in the /23). Now that perfectly matches Facebook’s account that their nameservers automatically withdraw themselves if they fail to connect to other parts of the infrastructure. Everything may have also been down, but the logic for withdrawal is present only in the networks that have nameservers in them.

So first, let me make three general observations that are not as obvious and as universal as they may sound, but they are worth discussing:

Use longer DNS TTLs if possible – if Facebook had 6 hour TTL on its domains, we may have not figured out that their name servers are down. This is hard to ask for such a complex service that uses DNS for load-balancing and geographical distribution, but it’s worth considering. Also, if they killed their backbone and their entire infrastructure was down anyway, the DNS TTL would not have solved the issue.
We need improved caching logic for DNS. It can’t be just “present or not”; DNS caches may keep “last known good state” in case of SERVFAIL and fallback to that. All of those DNS resolvers that had to ask the authoritative nameserver “where can I find facebook.com” knew where to find facebook.com just a minute ago. Then they got a failure and suddenly they are wondering “oh, where could Facebook be?”. It’s not that simple, of course, but such cache improvement is worth considering. And again, if their entire infrastructure was down, this would not have helped.
Have a 100% test coverage on critical tools, such as the auditing tool that had a bug. 100% test coverage is rarely achievable in any project, but in such critical tools it’s a must.

The main explanation is the accidental outage. This is what Facebook engineers explain in the blogpost and other accounts, and that’s what seems to have happened. However, there are alternative hypotheses floating around, so let me briefly discuss all of the options.

Accidental outage due to misconfiguration – a very likely scenario. These things may happen to everyone and Facebook is known for it “break things” mentality, so it’s not unlikely that they just didn’t have the right safeguards in place and that someone ran a buggy update. The scenarios why and how that may have happened are many, and we can’t know from the outside (even after Facebook’s brief description). This remains the primary explanation, following my favorite Hanlon’s razor. A bug in the audit tool is absolutely realistic (btw, I’d love Facebook to publish their internal tools).
Cyber attack – It cannot be known by the data we have, but this would be a sophisticated attack that gained access to their BGP administration interface, which I would assume is properly protected. Not impossible, but a 6-hour outage of a social network is not something a sophisticated actor (e.g. a nation state) would invest resources in. We can’t rule it out, as this might be “just a drill” for something bigger to follow. If I were an attacker that wanted to take Facebook down, I’d try to kill their DNS servers, or indeed, “de-route” them. If we didn’t know that Facebook lets its DNS servers cut themselves from the network in case of failures, the fact that so few prefixes were updated might be in indicator of targeted attack, but this seems less and less likely.
Deliberate self-sabotage – 1.5 billion records are claimed to be leaked yesterday. At the same time, a Facebook whistleblower is testifying in the US congress. Both of these news are potentially damaging to Facebook reputation and shares. If they wanted to drown the news and the respective share price plunge in a technical story that few people understand but everyone is talking about (and then have their share price rebound, because technical issues happen to everyone), then that’s the way to do it – just as a malicious actor would do, but without all the hassle to gain access from outside – de-route the prefixes for the DNS servers and you have a “perfect” outage. These coincidences have lead people to assume such a plot, but from the observed outage and the explanation given by Facebook on why the DNS prefixes have been automatically withdrawn, this sounds unlikely.

Distinguishing between the three options is actually hard. You can mask a deliberate outage as an accident, a malicious actor can make it look like a deliberate self-sabotage. That’s why there are speculations. To me, however, by all of the data we have in RIPEStat and the various accounts by CloudFlare, Facebook and other experts, it seems that a chain of mistakes (operational and possibly design ones) lead to this.

The post Hypotheses About What Happened to Facebook appeared first on Bozho's tech blog.