Moderation is a necessary feature of social spaces. It’s how bad behavior gets constrained, norms get set, and disputes get resolved. We’ve kept the Bluesky app invite-only and are finishing moderation before the last pieces of open federation because we wanted to prioritize user safety from the start.
Just like our approach to algorithmic choice, our approach to moderation allows for an ecosystem of third-party providers. Moderation should be a composable, customizable piece that can be layered into your experience. For custom feeds, there is a basic default (only who you follow), and then many possibilities for custom algorithms. For moderation as well, there should be a basic default, and then many custom filters available on top.
The basics of our approach to moderation are well-established practices. We do automated labeling, like centralized social sites, and make service-level admin decisions, like many federated networks. But the piece we’re most excited about is the open, composable labeling system we’re building that both developers and users can contribute to. Under the hood, centralized social sites use labeling to implement moderation — we think this piece can be unbundled, opened up to third-party innovation, and configured with user agency in mind. Anyone should be able to create or subscribe to moderation labels that third parties create.
An Ecosystem of Moderation Labeling
Here’s the way we’re designing an open, composable labeling system for moderation:
- Anyone can define and apply “labels” to content or accounts (e.g. “spam”, “nsfw”). Labeling is a separate service, so a labeler does not have to run a PDS (personal data server) or a client app in order to participate.
- Labels can be generated automatically (by third-party services, or by custom algorithms) or manually (by admins, or by users themselves).
- Any service or person in the network can choose how these labels get used to determine the final user experience.
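A label in this system can be pictured as a small, portable data shape. The sketch below is illustrative, with hypothetical field names rather than the protocol's actual schema:

```typescript
// A minimal sketch of a moderation label. Field names here are
// illustrative assumptions, not necessarily the final AT Protocol schema.
interface Label {
  src: string; // who applied the label: a service or a person
  uri: string; // the account or piece of content being labeled
  val: string; // the label value, e.g. "spam" or "nsfw"
  cts: string; // creation timestamp (ISO 8601)
}

// Example: a hypothetical third-party labeling service flags a post as spam.
const label: Label = {
  src: "did:example:labeler-service",
  uri: "at://did:example:alice/app.bsky.feed.post/3jxtb5w2hkt2m",
  val: "spam",
  cts: new Date().toISOString(),
};
```

Because the label is just data, independent of any one server or app, any service or client in the network can fetch it and decide for itself what to do with it.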
So how will we apply this on the Bluesky app? Automated filtering is by now a commodity service, so we will use it as a first pass to remove illegal content and label objectionable material. Then we will apply server-level filters as admins of bsky.social, with a default setting and custom controls that let you hide, warn, or show content. On top of that, we will let users subscribe to additional sets of moderation labels that can filter out more content or accounts.
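One hedged sketch of how these layers could compose into a final decision, where the label names, default settings, and "most restrictive wins" rule are all illustrative assumptions rather than Bluesky's actual configuration:

```typescript
// Hypothetical sketch of layered, label-based moderation decisions.
type Action = "hide" | "warn" | "show";

// First layer: server-level defaults set by the bsky.social admins.
const serverDefaults: Record<string, Action> = {
  spam: "hide",
  nsfw: "warn",
};

// Later layers: this user's own settings plus subscribed community labels.
const userPrefs: Record<string, Action> = {
  nsfw: "show", // this user opts in to seeing nsfw content
  troll: "hide", // from a community label set they subscribe to
};

// When several labels apply, the most restrictive action wins.
const severity: Record<Action, number> = { show: 0, warn: 1, hide: 2 };

function decide(labels: string[]): Action {
  let result: Action = "show";
  for (const val of labels) {
    // A user preference overrides the server default for the same label.
    const action = userPrefs[val] ?? serverDefaults[val] ?? "show";
    if (severity[action] > severity[result]) result = action;
  }
  return result;
}

console.log(decide(["nsfw"])); // "show": user override beats server default
console.log(decide(["nsfw", "troll"])); // "hide": most restrictive label wins
```

The key design point is that each layer only produces or overrides labels and preferences; the final decision is assembled on the user's side rather than dictated by a single authority.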
Let’s dig into the layers here. Centralized social platforms delegate all moderation to a central set of admins whose policies are set by one company. This is a bit like resolving all disputes at the level of the Supreme Court. Federated networks delegate moderation decisions to server admins. This is more like resolving disputes at a state government level, which is better because you can move to a new state if you don’t like your state's decisions — but moving is usually difficult and expensive in other networks. We’ve improved on this situation by making it easier to switch servers, and by separating moderation out into structurally independent services.
We’re calling the location-independent moderation infrastructure “community labeling” because you can opt-in to an online community’s moderation system that's not necessarily tied to the server you're on.
Community labeling can be done by automated systems, or by humans manually labeling things. Human-generated label sets are similar in spirit to shared mute/block lists.
Here’s how we think a manual community labeling system will work:
- Anyone can create a label set, then add admins or mods to help manage it
- Mods can add labels that the set defines to accounts or content (“rude”, “troll”, etc.)
- Anyone can subscribe to the set and have the labels be applied to their experience
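The three steps above can be sketched as a small in-memory model. Everything here, from the function names to the permission rule that only existing mods can add mods, is an illustrative assumption, not a description of the eventual system:

```typescript
// Hypothetical sketch of a manual community label set.
interface LabelSet {
  name: string;
  mods: Set<string>; // DIDs allowed to apply this set's labels
  labels: Map<string, Set<string>>; // subject URI -> applied label values
}

// Step 1: anyone can create a label set, then add admins or mods.
function createLabelSet(name: string, creator: string): LabelSet {
  return { name, mods: new Set([creator]), labels: new Map() };
}

function addMod(set: LabelSet, admin: string, newMod: string): void {
  if (!set.mods.has(admin)) throw new Error("only existing mods can add mods");
  set.mods.add(newMod);
}

// Step 2: mods can add the set's labels to accounts or content.
function applyLabel(set: LabelSet, mod: string, uri: string, val: string): void {
  if (!set.mods.has(mod)) throw new Error("not a mod of this set");
  if (!set.labels.has(uri)) set.labels.set(uri, new Set());
  set.labels.get(uri)!.add(val);
}

// Step 3: a subscriber's client asks which labels apply to some content.
function labelsFor(set: LabelSet, uri: string): string[] {
  return [...(set.labels.get(uri) ?? [])];
}

const set = createLabelSet("no-trolls", "did:example:carol");
addMod(set, "did:example:carol", "did:example:dan");
applyLabel(set, "did:example:dan", "at://did:example:eve/post/1", "troll");
console.log(labelsFor(set, "at://did:example:eve/post/1")); // ["troll"]
```

Subscribing is just the client-side choice to call something like `labelsFor` against a set and feed the results into its own hide/warn/show preferences.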
We’re landing a first pass on automated filtering in the app today, and will improve it based on user feedback. Community labeling is still in the works. We’ll be publishing more details soon.
With this composable, customizable approach to moderation, we aim to prioritize user safety while giving people more control. An open labeling system, contributed to by both developers and users, will allow for innovation, transparency, and agency in this critical piece of social networking infrastructure, where technical and social systems collide.