
Information
Thesis title: Artificial Intelligence, Biases, and Transparency: Towards Fair, Interpretable, and Reliable Systems
Abstract: The concept of the “black box” symbolizes systems or processes whose inner workings are hidden, opaque,
or poorly understood. In the context of artificial intelligence (AI), this notion has become a central critique,
reflecting the lack of transparency in AI decision-making. Yet, the black box in AI extends far beyond the
model itself, encompassing the processes, interactions, and societal implications that surround AI systems.
This thesis adopts a multidimensional perspective on black-boxing, exploring not only the opacity of AI
models but also the hidden layers of explanation mechanisms and societal biases.
The research begins with the most familiar black box: the AI system. Modern AI models, especially deep
learning systems, are notoriously complex and difficult to interpret. By introducing semantic explanations
that leverage knowledge graphs and prototypes, this thesis presents methods to bridge the gap between
AI decision-making and human understanding, aligning system behavior with interpretable and intuitive
representations.
The focus then shifts to a subtler black box: the processes behind explainers and explanations themselves.
While intended to clarify AI decisions, explanations can obscure their own limitations, leaving users with
a partial or misleading understanding. This thesis investigates the design and evaluation of explanations,
proposing methods that enhance their reliability, consistency, and alignment with user objectives.
Finally, we turn to a black box that is often confronted bluntly: bias. Whether algorithmic or embedded
in language, bias is frequently approached with a seek-and-destroy mindset, where the goal is to detect and
suppress it rather than to understand or address it. In the case of algorithmic bias, this thesis moves beyond
plain detection to explore its underlying sources, tracing how gender stereotypes emerge within AI systems
through their interaction with training data and real-world structures. Focusing on occupational terms
in machine translation, we examine how models respond to gender ambiguity, which they often resolve through
stereotypical defaults. This reveals that such biases are not mere reflections of reality but are shaped by the
system’s design and data. Shifting focus from model behavior to human-authored data, we explore the biases
encoded in cultural heritage metadata. Here, rather than erasing harmful language, we aim to contextualize
it, developing tools that detect and surface contentious terms to support informed curation. Across both
cases, this thesis advocates for a more nuanced engagement with bias, one that opens the black box rather
than simply silencing its contents.
By approaching the black box in its various forms (AI systems, explanations, algorithmic biases, and societal
nuances), this thesis offers a cohesive framework for understanding and addressing the multifaceted challenges
of AI transparency. It argues that opening these black boxes is essential to developing AI systems that are
fair, interpretable, and aligned with human values in an increasingly complex world.
Supervisor: Professor Giorgos Stamou
PhD Student: Orfeas Menis Mastromichalakis