Tassallah Amina Abdullahi

PhD student at Brown University

I am a fifth-year Ph.D. candidate in Computer Science at Brown University, co-advised by Carsten Eickhoff in the Health AI group and Ritambhara Singh in the Singh Lab. I am also an ACM SIGHPC Fellow.

My research focuses on the evaluation, robustness, and safety of large language models (LLMs) across speech and text modalities. I design benchmarks, adversarial evaluation frameworks, and LLM evaluation pipelines to improve reliability and trustworthiness in AI systems, with particular attention to multilingual, low-resource, and high-stakes domains such as healthcare.

Current Research Areas

LLM Evaluation and Benchmarking
I develop evaluation frameworks and benchmarks to assess the safety, factuality, and robustness of LLMs. My work investigates adversarial vulnerabilities, limitations of self-evaluation, and failure modes in generative AI systems.

Clinical Decision Support
In collaboration with Dr. Hamish Fraser and Bio-RAMP Labs, I evaluate the safety, reliability, and clinical applicability of LLMs and generative AI systems for diagnostic reasoning, especially in low-resource healthcare settings.

Global Accessibility and Multilingual AI
Through the Masakhane NLP community, I work on evaluating automatic speech recognition and multimodal LLMs to improve healthcare accessibility in low- and middle-income countries.

Prior to Brown, I obtained a Master’s in Computer Science from the University of Cape Town, where I worked with Professor Geoff Nitschke on evolutionary algorithms for predictive modeling.


I am currently seeking full-time Research Scientist or Applied Scientist roles starting in 2026.
You can contact me at tassallah_abdullahi@brown.edu

CV, Google Scholar, LinkedIn, Twitter

Selected Conference Proceedings

Position: Benchmarking is Broken - Don’t Let AI Be Its Own Judge
Cheng, Z., Wohnig, S., Gupta, R., Alam, S., Abdullahi, T., et al.
NeurIPS (2025)

LLM-Powered Graph Reasoning for Knowledge Discovery
Gemou, I., Abdullahi, T., Singh, R.
WIML NeurIPS (2025)

K-Paths: Reasoning over Graph Paths for Drug Repurposing and Drug Interaction Prediction
Abdullahi, T., Gemou, I., Nayak, N., Murtaza, G., et al.
KDD (2025)

AfriMed-QA: Towards A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset
Olatunji, T., Charles, N., Owodunni, A., Yuehgoh, F., Abdullahi, T., et al.
ACL (2025)

Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond
Sanni, M., Abdullahi, T., et al.
NAACL (2025)

Diagnostic Accuracy of ChatGPT4.0 for TIA or Stroke Using Patient Symptoms and Demographic Data
Khatri, I., Zahiri, A., Abdullahi, T., et al.
International Stroke Conference (2025)

Retrieval Augmented Zero-Shot Text Classification
Abdullahi, T., Singh, R., and Eickhoff, C.
ACM SIGIR ICTIR (2024)

Predicting Disease Outbreaks with Climate Data
Abdullahi, T., and Nitschke, G.
In Proceedings of the IEEE Congress on Evolutionary Computation (IEEE CEC 2021), Kraków, Poland

Disease Outbreaks: Tuning Predictive Machine Learning
Abdullahi, T., and Nitschke, G.
In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2021), ACM, Lille, France


Selected Journal Articles

Identifying and Timing Patient Outcomes in Clinician Notes Using Large Language Models
Abdullahi, T., Hamzeh, A., Sears, I., Abadi, N., Singh, R., Eickhoff, C., and Abbasi, A.
Artificial Intelligence in Medicine (2026)

Retrieval-Based Diagnostic Decision Support: A Mixed Methods Study
Abdullahi, T., Mercurio, L., Singh, R., and Eickhoff, C.
JMIR Med Informatics (2024)

Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models
Abdullahi, T., Singh, R., and Eickhoff, C.
JMIR Edu (2024)

Predicting Diarrhoea Outbreaks with Climate Change
Abdullahi, T., Nitschke, G., and Sweijd, N.
PLoS ONE (2022)


Last Updated on 03/09/2026 by Tassallah Abdullahi