Sift’s new Text Clustering capability enables Content Integrity customers to identify repetitive, spammy text, investigate related content, and take bulk action to maintain platform integrity. Augmenting our leading machine learning technology, this feature unlocks higher levels of accuracy and efficiency for content moderation and site integrity teams.
- Detect more spam & scams: Catch highly repeated content not detected by user-focused ML or text filtering.
- Discover new trends: Explore clusters of similar content without having to apply any decisions or labels.
- Apply decisions to content faster: Take bulk action on a cluster to reduce exposure and improve moderation efficiency.
Overview
Explore clusters
To see clusters of similar content on your platform, navigate to the Clusters tab. You will see a list of clusters along with data about their composition: how many pieces of content each contains, whether the content has been flagged by users, a preview of the repeated text, and more.
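This article does not describe how Sift builds its clusters. For intuition only, here is a minimal sketch of one generic way to group near-duplicate text: split each text into overlapping word sequences ("shingles") and group texts whose shingle sets overlap strongly (Jaccard similarity). The function names, the shingle size, and the 0.5 threshold are all illustrative assumptions, not Sift's implementation or API.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or an empty set).
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_texts(texts, threshold=0.5, k=3):
    """Greedy grouping: each text joins the first cluster whose first member
    is at least `threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `texts`."""
    clusters = []  # pairs of (shingles of first member, member indices)
    for idx, text in enumerate(texts):
        sh = shingles(text, k)
        for representative, members in clusters:
            if jaccard(representative, sh) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((sh, [idx]))
    return [members for _, members in clusters]


# Hypothetical sample content, invented for illustration.
posts = [
    "Win a FREE phone now, click here to claim",
    "win a free phone now click here to claim!!!",
    "Selling my used bike, lightly ridden",
    "Win a free phone now, click here to claim today",
]
print(cluster_texts(posts))
```

In this sketch the three phone-giveaway posts land in one group and the bike listing stands alone. Raising the threshold makes groups tighter, and lowering it admits looser matches, which is one reason a cluster produced by any similarity-based method can contain items that are less alike than expected.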
Filter clusters based on their attributes
Apply filters to see sets of clusters that are of a certain size, have been flagged by users, or meet other criteria of interest to your team. You can adjust filters by clicking "Add criteria." You can create additional filtered views by selecting "Create Set." Saved cluster sets are displayed on the left side of the Clusters tab.
Investigate a specific cluster
Click on a cluster to see all of the content it contains. We break out each piece of content and display the repeated text, Sift Score, number of user flags, user details, and more so you can get the context you need to make a decision.
Expand for more detail
Click on a piece of content to see the full body of repeated text for more detailed investigation.
Take bulk action
Select all or some of the content within a cluster using the bulk selection feature, then apply the appropriate decision from the decision dropdown for efficient moderation.
FAQ
Q: How often are clusters generated?
A: Clusters are generated in a batch every other day. They contain only content that had already been posted when the cluster was generated.
Q: What decisions should I apply to content in a cluster? Do best practices vary from decisioning in other parts of the product?
A: You should apply the same decisioning practices to content in a cluster as you do in Workflows or during other manual moderation. Applying decisions within a cluster simply improves efficiency by enabling you to do more at once.
Q: The content in a cluster isn't similar, or isn't similar enough, to apply a bulk decision. What should I do?
A: Share your feedback with us! We are still refining this capability, so it's important for us to hear about areas for improvement.