I built a vulnerable app and spent $1,500 seeing if LLMs could hack it
A security researcher built a deliberately vulnerable app to test whether large language models (LLMs) could exploit it. The app is a book review platform, and the challenge was to reach a user's private reviews through Firebase. The researcher spent $1,500 on the experiment, which was not scientifically rigorous; its purpose was to explore how well various LLMs identify security flaws.
- The app was built with React Native and Python, with Firebase as the data layer.
- The goal was to exploit common vulnerabilities related to Firebase's access control.
- The researcher conducted multiple runs with different LLMs, spending a total of $1,500 on the tests.
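To make this class of flaw concrete, here is a minimal sketch in plain Python (no Firebase SDK) of how an over-permissive access rule can expose private reviews. The summary does not describe the app's actual rules or data model, so everything below is hypothetical: the `Review` type, its field names, the sample data, and both rule functions. It only illustrates the general pattern of a rule that checks authentication but not ownership.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Review:
    # Hypothetical data model; the real app's schema is not given in the source.
    owner_uid: str
    book: str
    text: str
    private: bool


# A rule receives the caller's uid (None when signed out) and one review,
# and returns True when the read should be allowed.
Rule = Callable[[Optional[str], Review], bool]


def permissive_rule(uid: Optional[str], review: Review) -> bool:
    # Flawed pattern: any signed-in user may read any review,
    # because the rule never compares the caller to the owner.
    return uid is not None


def owner_rule(uid: Optional[str], review: Review) -> bool:
    # Stricter pattern: signed-in users see public reviews,
    # and private reviews are visible only to their owner.
    if uid is None:
        return False
    return (not review.private) or review.owner_uid == uid


def readable_reviews(uid: Optional[str], reviews: List[Review], rule: Rule) -> List[Review]:
    # Return the reviews the caller is allowed to read under the given rule.
    return [r for r in reviews if rule(uid, r)]


# Made-up sample data for illustration only.
REVIEWS = [
    Review("alice", "Dune", "Loved it.", private=False),
    Review("alice", "Emma", "hypothetical-flag", private=True),
]

if __name__ == "__main__":
    for name, rule in [("permissive", permissive_rule), ("owner-only", owner_rule)]:
        visible = readable_reviews("mallory", REVIEWS, rule)
        print(name, [r.book for r in visible])
```

Under the permissive rule, a different signed-in user (here the made-up `mallory`) can read the private review; under the owner check, only the public one is returned. Real Firebase rules are written in Firebase's own rules language rather than Python, so treat this as a model of the logic, not of the syntax.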
Opening excerpt (first ~120 words)
Thoughts · Jun 3, 2026
As a part of my work I do security research for various apps and websites. I wanted to see if LLMs could reproduce a common class of exploits I’ve found in multiple apps. I made a fake React Native app in Expo and a backend in Python. It’s a book review app and the goal is to find a flag in a user’s private reviews. If you would like to try solving it yourself before I spoil it, here’s a ZIP of the APK and challenge description each LLM was fed.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Kasra Rahjerdi.