Model-Dependent Moderation: Inconsistencies in Hate Speech Detection Across LLM-based Systems

by Alex Engler | Aug 20, 2025 | Uncategorized

This paper finds that different LLMs produce widely divergent results when applied to hate speech detection, and that they are more effective at preventing hate speech towards protected classes than towards other groups.
