I am Ningke Li, a first-year CS Ph.D. student at the TEST Lab, National University of Singapore, where I have been conducting research under the supervision of Prof. Manuel Rigger since 2025. My research focuses on improving the reliability of complex systems through two primary directions: leveraging AI techniques to enhance software testing (AI for Testing) and systematically evaluating and strengthening the trustworthiness of AI systems (Testing for AI). I also have experience in program analysis and source code vulnerability detection.
I received my master's degree in Cybersecurity from Huazhong University of Science and Technology in 2025 under the supervision of Prof. Haoyu Wang and Prof. Kailong Wang. I received my B.E. degree in Information Security from Beijing University of Posts and Telecommunications in 2022.
") does not match the recommended repository name for your site ("").
", so that your site can be accessed directly at "http://".
However, if the current repository name is intended, you can ignore this message by removing "{% include widgets/debug_repo_name.html %}" in index.html.
",
which does not match the baseurl ("") configured in _config.yml.
baseurl in _config.yml to "".

Xinyi Zheng*; Ningke Li*; Xiaokun Luan; Kailong Wang; Ling Shi; Meng Sun; Haoyu Wang. (* equal contribution)
48th IEEE/ACM International Conference on Software Engineering (ICSE) 2026
An automated theorem-proving-based framework for verifying multi-step LLM reasoning, translating natural language into first-order logic to detect hidden logical errors and systematically assess reasoning correctness across diverse benchmarks.

Fengfei Sun*; Ningke Li*; Kailong Wang; Lorenz Goette. (* equal contribution)
Under review 2025
LLMs exhibit significant overconfidence in reasoning tasks, often exceeding human levels, and can amplify human overconfidence when people rely on their outputs.

Hanxiang Xu; Shenao Wang; Ningke Li; Kailong Wang; Yanjie Zhao; Kai Chen; Ting Yu; Yang Liu; Haoyu Wang.
ACM Transactions on Software Engineering and Methodology (TOSEM) 2025
A literature review on the use of LLMs in cybersecurity, analyzing applications, trends, techniques, and challenges.

Ningke Li*; Yuekang Li*; Yi Liu; Ling Shi; Kailong Wang; Haoyu Wang. (* equal contribution)
ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) 2024
Fact-conflicting hallucinations in LLMs are prevalent and hard to detect; logic-programming-based metamorphic testing effectively generates diverse test cases and uncovers reasoning errors across multiple models and domains.