A Self-Authenticating Social Protocol
Apr 6, 2022
by The Bluesky Team
Bluesky’s mission is to drive the evolution from platforms to protocols. The tools for public conversation should exist outside of private companies as common infrastructure, like the Internet itself. An open and durable decentralized protocol for public conversations can allow users a choice in their experience, creators control over their relationships with their audience, and developers freedom to innovate without permission from a platform.
We started with a research-intensive process to learn what can be applied from existing decentralized protocols. This research began with the ecosystem review and has continued while the Bluesky team was forming. We will be publishing our preliminary work and research this month, starting with this high-level introduction. There are many projects that have created protocols for decentralizing discourse, including ActivityPub and SSB for social, Matrix and IRC for chat, and RSS for blogging. While each of these is successful in its own right, none of them fully met the goals we had for a network that enables global long-term public conversations at scale.
Some of the most important objectives we have been evaluating are portability, scale, and trust. Portability allows people to keep their social life intact even if they switch providers. Scale allows people to participate in global discourse. And trust is created by giving people insight into what services are doing with their data and how information is being promoted into or removed from their feed. To dive a bit deeper into each topic:
Portability
Portability is the ability to move between services without losing everything, like how we can switch mobile carriers without losing our phone numbers. User choice requires portability for identity, data, payments, and any other service. When people can switch providers without losing their identity or social graph, then social media can work as a competitive open market again.
With email, if you change your provider then your email address has to change too. This is a common problem for federated social protocols, including ActivityPub and Matrix. If your ActivityPub server shuts down, you lose your identity and relationships tied to your account on that server, just like you would if any other social platform shut down. Since ActivityPub servers are much smaller than the existing platforms, and often run by volunteers, this scenario is not unlikely and has happened before. We want users to have an easy path to switching servers, even without the server’s help.
Scale
Social networking platforms bring hundreds of millions of people together in a global conversation. Some people prefer smaller communities, and ActivityPub and SSB are great for those tight-knit groups, but with Bluesky we want to give users the option to participate in global conversations the way they do on large social networking platforms.
Operating at scale requires engineering for scale. Early on, Twitter’s site crashed so often that the “fail whale” became a meme. They’ve since solved these problems, but existing decentralized networks that try to replicate the functionality of big platforms have not. When you search a trending hashtag on social media and find posts from around the world or see a viral post that has 125k likes, this is the service providing you with a global view across the network while hiding the complexity under the hood. By decentralizing aspects of social platforms, we’re adding cross-organizational networking that re-exposes the complexity. A protocol for conversations at scale needs to be developed around these challenges at every step.
Decentralization adds new functionality in other domains, but when it comes to scale, we’re aiming to replicate the global experience that social networking platforms currently offer. Existing decentralized social protocols default to local conversations because those are a natural fit for a decentralized architecture, but our goal is to make global conversations possible while preserving the freedoms users gain from interacting through an open protocol.
Trust
Decentralized networks are complex. Providers need to manage spam and abuse without inadvertently creating biases which lose the trust of their users. This is even more important for the algorithms that drive our feeds. Social media has the power to shape cultural discourse and needs to exist within a system of checks and balances. Like scale and portability, we aim to build around trust from the start by exposing what’s going on under the hood and allowing users to adjust their experience.
Starting as centralized platforms, social networks can take steps to open up APIs and provide choices to users, and this can be a path towards restoring trust in the current service. The premise of Bluesky, however, is to work towards a transparent and verifiable system from the bottom up by building a network that is open by default. We’ll do this by giving users ways to audit the performance of services and the ability to switch if they are dissatisfied.
The conceptual framework we've adopted for meeting these objectives is the "self-authenticating protocol." In law, a “self-authenticating” document requires no extrinsic evidence of authenticity. In computer science, an “authenticated data structure” can have its operations independently verified. When resources in a network can attest to their own authenticity, then that data is inherently live – that is, canonical and transactable – no matter where it is located. This is a departure from the connection-centric model of the Web, where information is host-certified and therefore becomes dead when it is no longer hosted by its original service. Self-authenticating data moves authority to the user and therefore preserves the liveness of data across every hosting service.
The three components that enable self-authentication are cryptographic identifiers, content-addressed data, and verifiable computation. The first two are familiar concepts in distributed systems, and the third is an emerging area of research that is not yet widely applied, but that we think will have large ramifications.
Cryptographic identifiers associate users with public keys. Self-sovereign identity is based on having cryptographic identifiers for users. Control of an account is proved by a cryptographic signature from a user, rather than an entry in a database keeping track of logins.
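To make the idea of proving account control by signature more concrete, here is a minimal sketch using a Lamport one-time signature, chosen only because it can be written with Python's standard library. The function names, the challenge format, and the choice of scheme are all assumptions made for illustration and are not part of any published Bluesky design; a real system would more likely use an established scheme such as Ed25519 from a vetted library.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Return (private_key, public_key) for a Lamport one-time signature."""
    # 256 pairs of random secrets, one pair per bit of the message digest.
    private_key = [
        (secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)
    ]
    # The public key is the hash of every secret.
    public_key = [(_h(zero), _h(one)) for zero, one in private_key]
    return private_key, public_key


def _digest_bits(message: bytes):
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(private_key, message: bytes):
    """Reveal one secret per digest bit. A key must sign only one message."""
    return [private_key[i][bit] for i, bit in enumerate(_digest_bits(message))]


def verify(public_key, message: bytes, signature) -> bool:
    """Check each revealed secret against the matching public hash."""
    if len(signature) != 256:
        return False
    return all(
        _h(signature[i]) == public_key[i][bit]
        for i, bit in enumerate(_digest_bits(message))
    )


def identifier(public_key) -> str:
    """Hypothetical identifier format: hex SHA-256 of the public key."""
    return hashlib.sha256(b"".join(a + b for a, b in public_key)).hexdigest()


if __name__ == "__main__":
    private_key, public_key = generate_keypair()
    challenge = b"prove control of " + identifier(public_key).encode()
    signature = sign(private_key, challenge)
    print(verify(public_key, challenge, signature))        # True
    print(verify(public_key, b"other message", signature))  # False
```

Nothing in the check consults a login database: anyone holding the public key can verify the signature, which is what lets the identity move between providers.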
Content-addressed data means content is referenced by its cryptographic hash, the unique digital “fingerprint” of a piece of data. Using public keys and content-addresses, we can sign content with the user's key to prove they created it. Authenticated data enables trust to reside in the data itself, not in where you found it, allowing apps to move away from client-server architectures. This creates “user-generated authority”.
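As a rough illustration of content addressing, the sketch below uses a hex SHA-256 digest as the address and checks fetched bytes against it. The `UntrustedHost` class, the function names, and the address format are hypothetical; real systems typically use a richer, self-describing address format.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Hypothetical address format: hex SHA-256 digest of the bytes."""
    return hashlib.sha256(data).hexdigest()


class UntrustedHost:
    """Stands in for any server or cache; readers do not need to trust it."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = content_address(data)
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]


def fetch_verified(host: UntrustedHost, address: str) -> bytes:
    """Fetch bytes from any host and accept them only if they match the address."""
    data = host.get(address)
    if content_address(data) != address:
        raise ValueError("content does not match its address")
    return data


if __name__ == "__main__":
    host = UntrustedHost()
    address = host.put(b"hello, world")
    print(fetch_verified(host, address))  # b'hello, world'

    host._blobs[address] = b"tampered"    # simulate a misbehaving host
    try:
        fetch_verified(host, address)
    except ValueError as err:
        print("rejected:", err)
```

Because the check depends only on the bytes and the address, the same record can be served by any host. A signature over the address, made with the author's key, would then attest to authorship; that step is omitted here.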
Verifiable computation uses cryptographic proofs to allow observers to verify that a computation was performed correctly without having to run it themselves. This can be used to preserve privacy by concealing inputs, as in a zero-knowledge proof, or to compress state that would otherwise have to be kept around for verification. The full potential of these cryptographic primitives is still being explored. Cutting-edge research is currently being applied to scaling blockchains, but we are also investigating novel applications in distributed social networks.
Now that we've explained self-authenticating protocols, let's look at how the components help us achieve our goals.
Portability is directly satisfied by self-authenticating protocols. Users who want to switch providers can transfer their dataset at their convenience, including to their own infrastructure. The UX for how to handle key management and username association in a system with cryptographic identifiers has come a long way in recent years, and we plan to build on emerging standards and best practices. Our philosophy is to give users a choice between self-sovereign solutions, where they have more control but also take on more risk, and custodial services, where they gain convenience but give up some control.
Self-authenticating data provides a scalability advantage by enabling store-and-forward caches. Aggregators in a self-authenticating network can host data on behalf of smaller providers without reducing trust in the data's authenticity. With verifiable computation, these aggregators will even be able to produce computed views – metrics, follow graphs, search indexes, and more – while still preserving the trustworthiness of the data. This topological flexibility is key for creating global views of activity from many different origins.
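One established technique that fits this description is a Merkle inclusion proof, which lets a cache or aggregator serve a single record together with evidence that it belongs to a dataset whose root hash the reader already trusts (for example, because the author signed it). The sketch below is an assumption about how this could look, not a description of the protocol's actual design. The function names are hypothetical, and duplicating the last hash on odd levels is a simplification.

```python
import hashlib


def _leaf(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def _node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _levels(records):
    """Return every tree level, from leaf hashes up to the single root hash."""
    if not records:
        raise ValueError("at least one record is required")
    level = [_leaf(r) for r in records]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                 # odd count: duplicate the last hash
            level = level + [level[-1]]
        level = [_node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(records) -> bytes:
    return _levels(records)[-1][0]


def prove(records, index: int):
    """Build the list of (sibling_hash, sibling_is_left) pairs for one record."""
    proof = []
    for level in _levels(records)[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(root: bytes, record: bytes, proof) -> bool:
    """Recompute the path to the root using only the record and its proof."""
    current = _leaf(record)
    for sibling, sibling_is_left in proof:
        current = _node(sibling, current) if sibling_is_left else _node(current, sibling)
    return current == root


if __name__ == "__main__":
    posts = [b"post one", b"post two", b"post three"]
    root = merkle_root(posts)              # published by the original provider
    proof = prove(posts, 2)                # produced by any aggregator
    print(verify(root, b"post three", proof))   # True
    print(verify(root, b"forged post", proof))  # False
```

The reader needs only the root, the record, and a proof whose length grows logarithmically with the dataset, so the aggregator can be a different party from the original provider without weakening the check.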
Finally, self-authenticating data provides more mechanisms that can be used to establish trust. Self-authenticated data can retain metadata, like who published something and whether it was changed. Reputation and trust-graphs can be constructed on top of users, content, and services. The transparency provided by verifiable computation provides a new tool for establishing trust by showing precisely how the results were produced. We believe verifiable computation will present huge opportunities for sharing indexes and social algorithms without sacrificing trust, but the cryptographic primitives in this field are still being refined and will require active research before they work their way into any products.
In this post, we’ve started to lay out the high-level objectives we’ve set and how we plan to meet them. In the coming weeks, we’ll be publishing more of the research we’ve done since the ecosystem review and open-sourcing preliminary code. We’ve started writing code to validate ideas and iterate on something concrete, but everything is still fully experimental. You can expect a command-line client to play with, but don’t expect to build your next big app on it yet, as anything can change at this stage.
We’re not describing what we’re building as a federated or p2p network, or as a blockchain network, because it doesn’t fall neatly into any of these categories. It could be described as a hybrid federated network with p2p characteristics, but it’s more descriptive to focus on the capabilities – self-authenticating identities and data – than on network topology. Our team has previously built leading decentralized web protocols and blockchain networks, and is working on synthesizing the best of what we’ve seen into something new. For some aspects, we’ll be able to use pieces that already exist, and for others, we’ll have to come up with solutions of our own. Stay tuned for updates; we’ll share more soon.