This study evaluates the ability of lightweight open-source large language models (LLMs) to detect bias in text. Eleven models from six popular LLM families were tested in a zero-shot setting on a unified dataset of 8,745 sentences drawn from three selected sources, covering gender, race, religion, and appearance bias. Results show that none of the models exceeded 70% accuracy, highlighting both the limitations of lightweight LLMs and ongoing challenges with current bias detection datasets.
