Ningke Li
Ph.D. student

I am Ningke Li, a first-year CS Ph.D. student at the TEST Lab, National University of Singapore, where I have been conducting research under the supervision of Prof. Manuel Rigger since 2025. My research focuses on improving the reliability of complex systems through two primary directions: leveraging AI techniques to enhance software testing (AI for Testing) and systematically evaluating and strengthening the trustworthiness of AI systems (Testing for AI). I also have experience in program analysis and source code vulnerability detection.

I received my master's degree in Cybersecurity from Huazhong University of Science and Technology (HUST) in 2025, under the supervision of Prof. Haoyu Wang and Prof. Kailong Wang, and my B.E. degree in Information Security from Beijing University of Posts and Telecommunications (BUPT) in 2022.


👩‍🎓 Education
  • National University of Singapore
    TEST Lab
    Ph.D. Student
    Aug. 2025 - present
  • Huazhong University of Science and Technology
    Security Pride Lab
    M.S. in Cybersecurity
    Sep. 2022 - Jun. 2025
  • Beijing University of Posts and Telecommunications
B.E. in Information Security
    Sep. 2018 - Jun. 2022
πŸ† Honors & Awards
  • China National Scholarship
    2023
  • Outstanding Graduates of Beijing
    2022
  • China National Scholarship
    2019
👩🏻‍🏫 Services
  • Reviewer/Sub-reviewer
    ASE, FSE, MSR, Internetware, EMSE, ICECCS...
  • Student Volunteer
    OOPSLA 2025
  • Member
    NUS SoC Student Area Search Committees
News
2026
  • Jan 15: Our paper on exposing logical flaws in multi-step LLM reasoning via multi-step automated theorem proving is accepted by ICSE 2026.
2025
  • Sep 21: Our survey paper on large language models for cybersecurity (LLM4Security) is accepted by TOSEM.
  • Aug 03: I join NUS as a Ph.D. student.
  • Jun 21: I receive my master's degree from HUST.
2024
  • Jun 13: Our paper on metamorphic testing for LLM hallucination detection is accepted by OOPSLA 2024.
2023
  • Aug 15: Our paper on malicious npm/PyPI package detection is accepted by ASE 2023 (Industry Challenge Track, full paper).
  • Apr 07: Our experience paper on label errors and denoising in deep learning-based vulnerability detection is accepted by ISSTA 2023.
2022
  • Jul 17: Our paper on bug-triggering paths and multi-metrics in deep learning-based vulnerability detection is accepted by TDSC 2022, Vol. 8.
  • Jun 30: I receive my B.E. degree from BUPT.
Recent Publications
ICSE
Beyond Correctness: Exposing LLM-generated Logical Flaws in Reasoning via Multi-step Automated Theorem Proving

Xinyi Zheng*; Ningke Li*; Xiaokun Luan; Kailong Wang; Ling Shi; Meng Sun; Haoyu Wang. (* equal contribution)

48th IEEE/ACM International Conference on Software Engineering (ICSE) 2026

An automated theorem-proving-based framework for verifying multi-step LLM reasoning, translating natural language into first-order logic to detect hidden logical errors and systematically assess reasoning correctness across diverse benchmarks.

arXiv
Large Language Models are overconfident and amplify human bias

Fengfei Sun*; Ningke Li*; Kailong Wang; Lorenz Goette. (* equal contribution)

Under review 2025

LLMs exhibit significant overconfidence in reasoning tasks, often exceeding human levels, and can amplify human overconfidence when their outputs are used as input.

TOSEM
Large language models for cyber security: A systematic literature review

Hanxiang Xu; Shenao Wang; Ningke Li; Kailong Wang; Yanjie Zhao; Kai Chen; Ting Yu; Yang Liu; Haoyu Wang.

ACM Transactions on Software Engineering and Methodology (TOSEM) 2025

A systematic literature review of LLMs in cybersecurity, analyzing applications, trends, techniques, and challenges.

OOPSLA
Drowzee: Metamorphic Testing for Fact-conflicting Hallucination Detection in Large Language Models

Ningke Li*; Yuekang Li*; Yi Liu; Ling Shi; Kailong Wang; Haoyu Wang. (* equal contribution)

Object-Oriented Programming, Systems, Languages & Applications (OOPSLA) 2024

Fact-conflicting hallucinations in LLMs are prevalent and challenging to detect, but logic-programming-based metamorphic testing effectively generates diverse test cases and identifies reasoning errors across multiple models and domains.

All publications