The other day, while hunting for security vulnerabilities on a website, I had to run a set of tasks using a huge list of random words against a pretty big list of subdomains of the site under test. That's not a big deal in itself: a quick and dirty bash script can accomplish it in no time, and that's exactly what I fired off.
I started the job and moved on to looking at other attack vectors. I got so involved in the research that I practically forgot I had a script running in the other terminal. By the time I remembered, about an hour had passed and barely one subdomain had been fuzzed.
Automation - the need for efficiency
I stopped whatever I was doing and sat back to think about how this could be automated efficiently. Well, it was already automated, but not efficiently enough to give me results as quickly as I wanted. The immediate idea that struck me was: what if I make each task a background process and fire the script, so that everything executes simultaneously?
To give you an idea, below is what I was looking at.
File 1: wordlist.txt ($ wc -l wordlist.txt returned ~500k words :) )
File 2: subdomains.txt ($ wc -l subdomains.txt returned ~100 subdomains)
The task is to execute a set of commands (primarily HTTP requests, followed by regexes to parse interesting parts out of the response) against each URL in "subdomains.txt", using every word in "wordlist.txt".
I have the necessary commands scripted in a bash script, "run.sh", which takes two parameters: a. a subdomain URL and b. a wordlist file.
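To give a concrete picture, here is a minimal sketch of what a run.sh like that might look like, assuming the requests are made with curl and the parsing is a grep over the response (the regex and the findings.txt output file here are placeholders, not my actual commands):

#!/bin/bash
# run.sh (illustrative sketch): fuzz one url with one wordlist file
# usage: run.sh -u <url> -w <wordlist-file>
while getopts "u:w:" opt; do
  case "$opt" in
    u) url="$OPTARG" ;;
    w) wordlist="$OPTARG" ;;
  esac
done
while read -r word; do
  # request the candidate path and keep only the interesting bits of the response
  curl -s "$url/$word" | grep -oE 'interesting-pattern-here' >> findings.txt
done < "$wordlist"

My first attempt at speeding things up was blunt: background everything and let it all run at once.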
cat subdomains.txt | while read -r url;
do
    cat wordlist.txt | while read -r word;
    do
        # fire both commands in the background and move straight on
        cmd1 -u "$url" -w "$word" &
        cmd2 -u "$url" -w "$word" &
    done
done
This did the trick. The ampersands at the end of cmd1 and cmd2 pushed the processes into the background, so everything got fired off without waiting. But soon the network bottlenecked and the CPU sat at 100% with no room for anything else. In fact, I had to wait a few seconds just to log in, and the latency got worse with every command I tried to run.
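To put rough numbers on it: ~100 subdomains × ~500k words × 2 commands comes to something on the order of 100 million processes being thrown into the background with nothing to throttle them, so the CPU and the network never stood a chance.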
Time to step back and think.
A quick Google search led me to GNU Parallel. Read up on it, it's pretty cool. I wasted no time implementing it, and below is how I did it.
Notes:
- My server (a Raspberry Pi 4) has 4 cores, so I can comfortably run 4 jobs at a time. In fact, GNU Parallel defaults to one job per core, so it works this out for me.
- As soon as a job finishes, I should be able to throw the next chunk of work onto the freed core.
This way, at any point in time all my cores are busy processing data without clogging the CPU or the network.
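To see that scheduling behaviour in isolation, here is a tiny toy run (not part of my script) that you can try on any box with GNU Parallel installed; it launches 8 dummy jobs but never runs more than 4 of them at once:

# 8 dummy jobs, at most 4 at a time; as one finishes, the next queued job starts on the free slot
seq 8 | parallel -j 4 'sleep 1; echo job {} done'

And here is how I wired it up for the actual task: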
# split the wordlist once; 10 here is the number of files I am splitting it into
t_lines=$(wc -l < wordlist.txt)
s_line=$((t_lines / 10))
split -l "$s_line" -d wordlist.txt split.wordlist.txt

cat subdomains.txt | while read -r url;
do
    # GNU Parallel keeps one job per core busy; each job gets one split file as the -w argument
    ls split.* | parallel run.sh -u "$url" -w
done
Below is what the snippet above does:
a. The wordlist is split into 10 files, so each file holds roughly 50k words as opposed to 500k in one file.
b. At any point in time, 4 jobs run in parallel because I have 4 cores, so 4 of the split files get processed in parallel.
c. As soon as a job completes, the 5th file gets pushed for execution on the freed core, and so on.
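A couple of GNU Parallel flags worth knowing while wiring something like this up: --dry-run prints the command lines it would generate without executing anything, which is a cheap way to sanity-check the plumbing before burning hours of CPU, and -j overrides the one-job-per-core default if you want to leave headroom for other work. For example, inside the loop above (where $url is set):

# preview the exact commands that would run for this url, without executing them
ls split.* | parallel --dry-run run.sh -u "$url" -w
# or cap it at 2 simultaneous jobs to keep the box responsive
ls split.* | parallel -j 2 run.sh -u "$url" -w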
This parallelizes the operation, using the cores to the max, without me having to deal with the bottleneck. Here is a great Stack Overflow thread that explains the efficiency of GNU Parallel. I got this implemented in less than 30 minutes, and the task I set out to do completed in a record 4 hours; left to my original approach, it would have consumed nothing less than 36 to 48 hours.
Whether you liked what you read or not, let me know; I'd love to read your feedback. If you want to collaborate, feel free to drop me a message here or on Twitter, and we could either hunt together or build new stuff. And if you don't wish to do any of the above, do not hesitate to share this post.
Talk to you later, ciao!