TLDR:
Researchers found that GPT-4 can exploit 87% of the one-day vulnerabilities in their benchmark, far outperforming every other model tested. The paper explores how capable large language models are at exploiting real-world vulnerabilities.
Article Summary:
Large language models (LLMs) like GPT-4 have shown potential in various areas, including cybersecurity. Researchers found that GPT-4 can exploit 87% of one-day vulnerabilities, i.e., vulnerabilities that have been publicly disclosed but may not yet be patched in deployed systems, surpassing other models and open-source vulnerability scanners, none of which succeeded on the benchmark.
The study used a benchmark of 15 real-world vulnerabilities covering websites, container software, and Python packages. When the CVE description was withheld from the prompt, GPT-4's success rate dropped to 7%, indicating that it is far better at exploiting vulnerabilities it is told about than at discovering them on its own.
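The paper does not publish its prompts, so the Python sketch below only illustrates the kind of ablation being described: the same agent instructions are assembled with or without the CVE description. All function and field names here are hypothetical.

```python
# Hypothetical sketch of the CVE-description ablation. Every name and
# string here is illustrative, not the authors' implementation.

def build_agent_prompt(target: dict, include_cve: bool) -> str:
    """Assemble the instruction prompt handed to the LLM agent."""
    parts = [
        "You are a security tester with permission to attack the target.",
        f"Target: {target['url']}",
    ]
    if include_cve:
        # "With description" condition: the agent is told exactly which
        # known vulnerability to exploit (87% success in the paper).
        parts.append(
            f"Known vulnerability ({target['cve_id']}): "
            f"{target['cve_description']}"
        )
    else:
        # "Without description" condition: the agent must first discover
        # the flaw itself (success fell to 7%).
        parts.append("Find and exploit any vulnerability in the target.")
    return "\n".join(parts)

# Example usage with a made-up target record:
example = {
    "url": "http://testbed.local",
    "cve_id": "CVE-2024-XXXXX",
    "cve_description": "SQL injection in the login form.",
}
print(build_agent_prompt(example, include_cve=True))
```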
Results showed that GPT-4 could exploit complex, multi-step vulnerabilities, try different attack methods, craft exploit code, and handle non-web targets such as containers and Python packages. Additional capabilities like planning and spawning subagents further improved GPT-4's autonomy in exploitation (a pattern sketched below).
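As a rough illustration of the planning-plus-subagents pattern, here is a minimal sketch of a planner loop that dispatches specialized subagents. The authors have not released their agent code, so `call_llm`, the subagent roles, and the prompts below are all assumptions, not the paper's method.

```python
# Minimal sketch of a planner-plus-subagents loop. All names and
# prompts are hypothetical.

SUBAGENT_ROLES = {
    "recon": "Enumerate the target's endpoints, inputs, and versions.",
    "exploit": "Write and run an exploit for the vulnerability found so far.",
    "verify": "Check whether the last exploit attempt actually succeeded.",
}

def call_llm(prompt: str) -> str:
    """Placeholder for a real chat-completion API call."""
    raise NotImplementedError("wire this to an actual LLM API")

def run_agent(goal: str, max_steps: int = 10) -> bool:
    """Planner loop: pick a subagent, run it, feed results back."""
    history: list[tuple[str, str]] = []
    for _ in range(max_steps):
        # The planner chooses which specialized subagent acts next.
        choice = call_llm(
            f"Goal: {goal}\nHistory so far: {history}\n"
            f"Reply with one of {sorted(SUBAGENT_ROLES)} or 'done'."
        ).strip()
        if choice == "done":
            return True
        # Each subagent gets a narrow role prompt plus the shared history.
        role = SUBAGENT_ROLES.get(choice, "Continue toward the goal.")
        result = call_llm(f"{role}\nGoal: {goal}\nHistory: {history}")
        history.append((choice, result))
    return False
```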
The findings highlight GPT-4's potential in cybersecurity and raise questions about the gap between exploiting known vulnerabilities and discovering new ones. Future research may focus on further enhancing LLM agents' abilities to hack real-world systems.