PERFORMANCE OF GENERATIVE ARTIFICIAL INTELLIGENCE MODELS IN PUBLIC DENTISTRY EXAM QUESTIONS: A COMPARATIVE STUDY OF THE ACCURACY RATE IN COLLECTIVE ORAL HEALTH

Authors

Tânia Adas Saliba, Eder Akydawan de Paiva Gomes Fernandes, Cristhiane Martins Schmidt

DOI:

https://doi.org/10.70187/recisatec.v6i1.414

Keywords:

Generative Artificial Intelligence, Oral Health, Public Health

Abstract

The advancement of generative artificial intelligence has sparked interest in its application to health care, including dentistry. However, studies evaluating the performance of these tools in specific contexts of public dentistry, such as solving civil service examination questions, remain scarce. This study therefore aimed to evaluate and compare the accuracy rates of the free versions of three generative artificial intelligence models — ChatGPT, Gemini, and DeepSeek — in solving 100 civil service examination questions in the area of public oral health. The questions were drawn from public examination banks for dentists held between 2016 and 2026, covering topics such as oral epidemiology, public policies of the SUS (Brazilian Unified Health System), health surveillance, social determinants, and service management. Each question was submitted individually to the three models using a standardized prompt and no prior conversation history, and the accuracy rate was calculated with each correct answer worth 1 percentage point. ChatGPT achieved the best performance (75 correct answers), followed by Gemini (47) and DeepSeek (23). The differences between all pairs were statistically significant (p < 0.001), with ChatGPT outperforming Gemini by 28 percentage points and DeepSeek by 52. It is concluded that, among the free models tested, only ChatGPT would reach the minimum passing score in most public examinations for dental surgeons in collective oral health, whereas Gemini and DeepSeek, in the versions evaluated, did not prove to be reliable tools for this purpose.
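The abstract does not name the statistical test used for the pairwise comparisons. As a minimal illustration only — assuming a pooled two-proportion z-test, which is a common choice for comparing counts of correct answers out of 100 — the reported scores (75, 47, and 23) do reproduce p < 0.001 for all three pairs:

```python
import math

def two_proportion_p(k1: int, k2: int, n: int = 100) -> float:
    """Two-sided pooled two-proportion z-test p-value (stdlib only)."""
    p1, p2 = k1 / n, k2 / n
    pooled = (k1 + k2) / (2 * n)                      # pooled success rate
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))   # pooled standard error
    z = abs(p1 - p2) / se
    # two-sided tail probability of the standard normal
    return math.erfc(z / math.sqrt(2))

# Correct answers out of 100 questions, as reported in the abstract
scores = {"ChatGPT": 75, "Gemini": 47, "DeepSeek": 23}
for a, b in [("ChatGPT", "Gemini"), ("ChatGPT", "DeepSeek"), ("Gemini", "DeepSeek")]:
    print(f"{a} vs {b}: p = {two_proportion_p(scores[a], scores[b]):.1e}")
```

This is a sketch under a stated assumption, not the authors' actual analysis; the article may have used a different test (e.g. chi-square or McNemar on paired responses).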

Author Biographies

  • Tânia Adas Saliba, Universidade Estadual Paulista "Júlio de Mesquita Filho" (UNESP). Araçatuba, São Paulo, Brasil.

    PhD in Forensic Dentistry and Deontology. Universidade Estadual Paulista "Júlio de Mesquita Filho" (UNESP). Araçatuba, São Paulo, Brasil.

  • Eder Akydawan de Paiva Gomes Fernandes, Universidade Estadual Paulista "Júlio de Mesquita Filho" (UNESP). Araçatuba, São Paulo, Brasil.

    PhD candidate in Collective Health in Dentistry. Universidade Estadual Paulista "Júlio de Mesquita Filho" (UNESP). Araçatuba, São Paulo, Brasil.

  • Cristhiane Martins Schmidt, Universidade Estadual Paulista "Júlio de Mesquita Filho" (UNESP). Araçatuba, São Paulo, Brasil.

    PhD in Oral and Dental Biology. Universidade Estadual Paulista "Júlio de Mesquita Filho" (UNESP). Araçatuba, São Paulo, Brasil.

Published

2026-05-06

How to Cite

Tânia Adas Saliba, Eder Akydawan de Paiva Gomes Fernandes, & Cristhiane Martins Schmidt. (2026). PERFORMANCE OF GENERATIVE ARTIFICIAL INTELLIGENCE MODELS IN PUBLIC DENTISTRY EXAM QUESTIONS: A COMPARATIVE STUDY OF THE ACCURACY RATE IN COLLECTIVE ORAL HEALTH. RECISATEC SCIENTIFIC JOURNAL - ISSN 2763-8405, 6(1), e61414. https://doi.org/10.70187/recisatec.v6i1.414
