Bluesky’s Current Efforts on Trust and Safety
September 18, 2024
by Aaron Rodericks
In August, we published a blog post on anti-toxicity features that Bluesky’s product team designed with the Trust & Safety team. You can read that blog post here.
Trust and Safety (T&S) covers how we make every aspect of the Bluesky app a safe and enjoyable experience for users, spanning our processes, policies, and product. As Bluesky’s Head of T&S, my goal is to identify the biggest gaps in meeting user needs and to address them so that people have a pleasant experience on Bluesky.
This is a big quarter for Trust and Safety at Bluesky, as we work on a large number of improvements. Here’s a preview of everything that is in progress!
Ban evasion and multi-account detection capabilities
People deserve an experience free from harassment on Bluesky. While harassers can be endlessly creative in evading detection, we’re building tooling to reduce their impact. For example, we’re adding more friction to their ability to create new accounts. Today, we enroll users in additional defenses when we see a pattern of harassment from new accounts; in the future, we’ll be able to better detect and surface when multiple new malicious accounts are created and managed by the same person.
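As a rough illustration of how this kind of detection could work, here is a minimal sketch that groups new accounts by coarse registration signals; the types, signal names, and functions below are hypothetical and are not drawn from Bluesky’s codebase:

```typescript
// Hypothetical sketch of signal-based account clustering; not Bluesky's code.
interface AccountSignals {
  did: string;             // decentralized identifier for the account
  signupIpPrefix: string;  // coarse network signal captured at registration
  deviceHash: string;      // hashed client fingerprint
}

// Group new accounts that share registration signals.
function clusterBySharedSignals(
  accounts: AccountSignals[],
): Map<string, AccountSignals[]> {
  const clusters = new Map<string, AccountSignals[]>();
  for (const acct of accounts) {
    const key = `${acct.signupIpPrefix}|${acct.deviceHash}`;
    const group = clusters.get(key) ?? [];
    group.push(acct);
    clusters.set(key, group);
  }
  return clusters;
}

// Surface clusters that contain a previously actioned account, so moderators
// can review the whole group instead of playing whack-a-mole per account.
function flagLikelyBanEvasion(
  clusters: Map<string, AccountSignals[]>,
  bannedDids: Set<string>,
): AccountSignals[][] {
  return [...clusters.values()].filter(
    (group) => group.length > 1 && group.some((a) => bannedDids.has(a.did)),
  );
}
```

A real system would weigh many more signals probabilistically rather than keying on exact matches, but the shape of the approach is the same: cluster first, then review groups rather than individual accounts.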
Toxicity detection experiments
Addressing toxicity is one of the biggest challenges on social media. On Bluesky, the two categories that made up 50% of user reports in the past quarter were rude content and accounts that are fake, scams, or spam. Rude content in particular can drive people away from forming connections, posting, or engaging, for fear of attacks and dogpiles.
In our first experiment, we are attempting to detect toxicity in replies, since user reports indicate that replies are where people experience the most harm. We’ll detect rude replies, surface them to moderators, and eventually reduce their visibility in the app. Repeated rude labels on content will lead to account-level labels and suspensions. This will be a building block for detecting group harassment and dogpiling of accounts.
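To make the escalation path concrete, here is a minimal sketch; the thresholds and action names are invented for illustration and do not reflect Bluesky’s actual policy values:

```typescript
// Hypothetical escalation ladder: repeated "rude" labels on an account's
// content eventually trigger an account-level label, then a suspension review.
type AccountAction = 'none' | 'account-label' | 'suspension-review';

const ACCOUNT_LABEL_THRESHOLD = 3; // rude labels before labeling the account
const SUSPENSION_THRESHOLD = 6;    // rude labels before a suspension review

function escalate(rudeLabelCount: number): AccountAction {
  if (rudeLabelCount >= SUSPENSION_THRESHOLD) return 'suspension-review';
  if (rudeLabelCount >= ACCOUNT_LABEL_THRESHOLD) return 'account-label';
  return 'none';
}
```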
Automating spam and fake account removals
Harm on social media can happen quickly. For example, if an impersonation account asks for a fund transfer, it might take only minutes before someone falls for the scam. We’re launching a pilot project to automatically detect when an account is clearly fake, scamming, or spamming users, to reduce the likelihood that this happens. We hope that this project, paired with our moderation team, can cut the time to action on these reports to within seconds of receiving them.
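Sketched as a pipeline, the idea might look like the following; `classifyAccount`, the confidence threshold, and the outcome names are all hypothetical placeholders rather than Bluesky’s real services:

```typescript
// Hypothetical report-triage pipeline pairing a classifier with human review.
interface Report {
  reportedDid: string;
  reason: string;
}

// Placeholder classifier returning a fake/scam/spam confidence in [0, 1].
async function classifyAccount(did: string): Promise<number> {
  return 0; // a real system would run model or heuristic checks here
}

async function handleReport(
  report: Report,
): Promise<'auto-actioned' | 'queued-for-human'> {
  const score = await classifyAccount(report.reportedDid);
  if (score >= 0.98) {
    // High-confidence fake/scam/spam: act within seconds of the report
    // rather than waiting for the account to reach a human queue.
    return 'auto-actioned';
  }
  // Ambiguous cases still go to the moderation team.
  return 'queued-for-human';
}
```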
Feedback on moderation reports
In the coming months, we’re moving communication about violations from email into the Bluesky app. You’ll receive notices of infractions or labels within the app, and we’ll deliver the outcomes of your own reports there as well.
Geography-specific labels
In some cases, content or accounts may be allowed under Bluesky's Community Guidelines but violate local laws in certain countries. To balance freedom of speech with legal compliance, we are introducing geography-specific labels. When we receive a valid legal request from a court or government to remove content, we may limit access to that content for users in that area. This allows Bluesky's moderation service to maintain flexibility in creating a space for free expression, while also ensuring legal compliance so that Bluesky may continue to operate as a service in those geographies. This feature will be introduced on a country-by-country basis, and we will aim to inform users about the source of legal requests whenever legally possible.
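Conceptually, a geography-scoped label changes visibility only for viewers in the named jurisdictions. Here is a minimal sketch with an invented label shape; it does not reflect atproto’s actual label schema:

```typescript
// Hypothetical geography-scoped label: content stays visible everywhere
// except countries covered by a valid legal removal request.
interface GeoLabel {
  subjectUri: string;            // the labeled post or account
  restrictedCountries: string[]; // ISO 3166-1 alpha-2 codes, e.g. 'DE'
}

function isVisibleTo(label: GeoLabel, viewerCountry: string): boolean {
  return !label.restrictedCountries.includes(viewerCountry);
}

// A post restricted in one country remains visible elsewhere:
const label: GeoLabel = {
  subjectUri: 'at://example/post/1',
  restrictedCountries: ['DE'],
};
isVisibleTo(label, 'DE'); // false
isVisibleTo(label, 'FR'); // true
```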
Designing video on Bluesky for safety
We recently launched video on Bluesky, and the T&S team has been working with the product team to ensure the feature is launched safely.
Here’s a look at how T&S works with product. The product team puts together a document describing what they intend to build. Trust and Safety then assesses the risks associated with the feature and recommends ways to minimize the harms most likely to arise from it. This ensures that we anticipate problems and integrate mitigations before launch.
For video, Trust & Safety incorporated safety features such as the ability to turn off autoplay, and ensured that reports can be made and labels applied to video content. You can read more about the available safety tooling for video here.
We try to be pragmatic in building the safety elements most people will need prior to launch, but there’s always room for improvement in response to user feedback. So after a product launches, we pay close attention to reports and support requests as we refine the feature.
List changes to restrict abuse
Lists are a powerful way to have more control over your experience on Bluesky. You’re able to curate your favorite users, or to filter individuals out from your Bluesky experience — and to share those lists with others, so they can benefit from your curation as well.
However, bad actors sometimes use lists to harass others and violate our rules, so we’re making some changes. We recently updated starter packs to remove members when blocked, and we’re doing the same for curated lists. Previously, the Bluesky Trust & Safety team could only take down an entire list as a moderation action, rather than removing specific individuals; for moderation lists, that would unintentionally erase blocks. Now, when you block the creator of a list that you are on, you will be removed from the list. This behavior doesn’t apply to moderation lists, since that would defeat their purpose.
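In code terms, the new rule branches on the list’s purpose. Here is a minimal sketch under that assumption; atproto does distinguish curated lists from moderation lists, but the types and function below are illustrative, not its real data model:

```typescript
// Sketch of the block-triggered removal rule described above; hypothetical.
type ListPurpose = 'curatelist' | 'modlist';

interface List {
  creatorDid: string;
  purpose: ListPurpose;
  memberDids: string[];
}

// When a user blocks a list's creator, drop them from curated lists, but
// leave moderation lists untouched so the blocks they encode keep working.
function applyBlockToList(
  list: List,
  blockerDid: string,
  blockedDid: string,
): List {
  const blockedTheCreator = blockedDid === list.creatorDid;
  if (!blockedTheCreator || list.purpose === 'modlist') return list;
  return {
    ...list,
    memberDids: list.memberDids.filter((did) => did !== blockerDid),
  };
}
```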
We are also starting a widespread effort to identify lists with toxic or abusive names or descriptions. Lists whose names or descriptions violate the Bluesky Community Guidelines will be hidden in the app unless their creator modifies them to comply with our rules, and we will take further action against users who repeatedly create abusive lists.
Lists continue to be an area of active discussion and development for our team to find the right balance for user safety.
Prioritizing user concerns
This section provides some transparency on how we prioritize T&S efforts across the organization.
We read the concerns you raise via reports, emails, or mentions to @safety.bsky.app. Our overall framework asks how often something happens versus how harmful it is. We focus on addressing high-harm, high-frequency issues first, while also tracking edge cases that could result in serious harm to a few users.
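As a toy illustration of that framework, imagine scoring each issue on invented 1-to-5 scales for harm and frequency; the scales, names, and scoring rule are entirely hypothetical:

```typescript
// Toy harm-versus-frequency prioritization; scales and weights are invented.
interface Issue {
  name: string;
  harm: number;      // 1 (minor) to 5 (severe)
  frequency: number; // 1 (rare) to 5 (constant)
}

// High-harm, high-frequency issues sort to the front of the queue.
function prioritize(issues: Issue[]): Issue[] {
  return [...issues].sort(
    (a, b) => b.harm * b.frequency - a.harm * a.frequency,
  );
}
```

A pure product score like this would bury rare-but-severe cases, which is why high-harm, low-frequency edge cases are tracked separately rather than left to the sort order.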
For example, a small number of accounts have been harassing a few people on the app by creating multiple accounts and targeting them repeatedly. Although this happens to a tiny fraction of users, it causes enough continual harm that we want to take action to prevent this abuse.
As always, your feedback is welcome through comments or by reaching out to moderation@blueskyweb.xyz.