
Pandora's Box AI Honeypot
What is Pandora's Box?
Pandora's Box is an LLM-powered honeypot—a security measure whose goal is to study cyber-attacks and deflect attackers away from real systems.This is achieved by training our model to provide context-aware responses to web requests,making it less likely for an attacker to realise they're going after a honeypot and providing more insight into their methods and how to react.
We were inspired to build Pandora's Box during LaunchHacks IV (a hackathon hosted on devpost) after finding that existing honeypot solutions relied on external LLM APIs. These often come with ongoing costs, and privacy concerns. To fix this we set out to make a solution that was easy to deploy and could be run locally. Our team consisted of myself, Brandon and Sam.
The Build Process
I primarily worked on the underlying honeypot during the project. One of my other teammates suggested we use Go and while all my experience with Go was reading some of the 'Go by Example' guide to the language I was keen for the challenge.
Having experience working with individual packets to make an intrusion detection system for a university assignment I initially utilised gopacket to try and intercept the packets. After spending the first night working on this, I came to the realisation that this was over complicating it and would require managing the handshake process as well as requiring Windows users to install either WinPcap or Npcap. The solution to this was using Go's awesome net/http library to create a basic http server. Working with net/http proved to be quite fun, it’s well documented and has plenty of online guides centered around it. In the end I had a server I could send requests to and get back responses from our LLM.
For our LLM our teammate Brandon fintuned the existing distilgpt2 to leverage its small size in creating fast responses. For training, a combination of real data and synthesised data created using a python script was used to conduct 10+ training runs during the event. In the end we had a model that we were satisfied with its speed and quality of response. Running on an RTX 5070ti it will produce results in an under a second. The final model can be found on Hugging Face.
For monitoring the honeypot and its statistics Sam designed and developed a front-end dashboard and corresponding statistics server. These turned out great, providing an easy way to view relevant statistics and request history.
Additionally, a classification for requests was added. Unfortunately we ran out of time to train another model to classify these, but optionally a Gemini API key can be added to use this functionality.
Biggest Challenges
Finding data was a huge hurdle for this project. Our final solution wasto collect and create the data ourselves. Had we been able to find existing data sets I believe we could have extended the capabilities of Pandora's Box and remove the need for a Gemini API key.
Another challenge I personally faced was trying to setup an SSH server so my GPU could be used for the LLM training. I spent too long unable to work out why I couldn't get SSH working and considered moving my GPU to a Fedora machine, which I'd had success in hosting an SSH server on in the past. As a last attempt I set up Ubuntu through WSL and realised my ISP had CGNAT enabled and I'd wasted a bunch of time. Well, not completely wasted, as I learnt plenty about how SSH and CGNAT works. I ended up using Tailscale, which was exactly what I needed and worked flawlessly.
Final Remarks
The LaunchHacks IV hackathon was a lot of fun. I gained valuable knowledge and experience in Go and remote access for computers, and we even ended up taking 5th place in the end. With just over 100 projects submitted I'm very pleased with the outcome.
Feel free to checkout the project at its GitHub, LaunchHacks IV submission page, or the final model on Hugging Face.