Captcha Breaker HOWTO
=====================

This HOWTO walks you through using Captcha Breaker to break a given captcha.
It covers only how to use the solvers once you already have image files. It
does NOT cover how to extract the image from a web site, nor how to enter the
text back.

At the end, you will pass your image file to the breaker, and the last line of
output will be the recognized text, with a "*" for each unrecognized
character::

    $ ./rogers samples/HJQ1QX.gif
    [...]
    [Rotter] Dims
    [Rotter] rows 2
    [Rotter] cols 136
    HJQ1QX

0. Install Dependencies
=======================

For Ubuntu/Debian, run this command as root::

    # apt-get install libcamlimages-ocaml libcamlimages-ocaml-dev \
        libocamlgsl-ocaml libocamlgsl-ocaml-dev ocaml-findlib \
        ocaml-native-compilers m4 make

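
If you want to confirm that the build tools really are on your PATH before
going on, a small shell function can list the missing ones. This is a sketch,
not part of the package: the name missing_tools is made up, and it assumes the
packages above provide the ocamlfind, ocamlopt, m4 and make commands. It does
not check the camlimages or GSL libraries themselves.

.. code-block:: sh

    # missing_tools TOOL...: print every TOOL that is not found on $PATH,
    # one per line. Returns 0 if all tools were found, 1 otherwise.
    missing_tools() {
        status=0
        for tool in "$@"; do
            if ! command -v "$tool" >/dev/null 2>&1; then
                echo "$tool"
                status=1
            fi
        done
        return "$status"
    }

Once everything is installed, this should print nothing::

    $ missing_tools ocamlfind ocamlopt m4 make
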
1. Unpack the Package
=====================

Assuming you downloaded captcha.2008mmdd.tar.gz into the current directory::

    $ tar xzvf captcha.2008mmdd.tar.gz
    $ cd captcha

2. Gather Some Samples
======================

Go to your web site and generate a few captchas. Solve them (using your brain)
and save them in a directory (samples/), using the text in the image as the
file name. For example, a captcha that reads "HJQ1QX" would be saved as
samples/HJQ1QX.gif.

Make sure your samples cover the entire character set at least once. The more
samples per character, the better.

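
To see how well your samples cover the character set, you can count the
characters in the file names. The sketch below is not part of the package:
char_counts and missing_chars are hypothetical names, it assumes every sample
is named <text>.gif as described above, and you have to supply the site's
character set yourself.

.. code-block:: sh

    # char_counts DIR: count how often each character occurs across the
    # names of the .gif files in DIR (each file is assumed to be <text>.gif).
    # Output is "uniq -c" style: "<count> <character>", one per line.
    char_counts() {
        for f in "$1"/*.gif; do
            [ -e "$f" ] || continue      # no .gif files: the glob stays literal
            basename "$f" .gif           # strip the directory and the suffix
        done | fold -w 1 | sort | uniq -c
    }

    # missing_chars DIR ALPHABET: print each character of ALPHABET that does
    # not appear in any sample file name in DIR.
    missing_chars() {
        seen=$(char_counts "$1" | awk '{ print $2 }')
        printf '%s\n' "$2" | fold -w 1 | while IFS= read -r c; do
            printf '%s\n' "$seen" | grep -qxF -- "$c" || printf '%s\n' "$c"
        done
    }

For example, assuming (hypothetically) a site that uses upper-case letters and
digits::

    $ char_counts samples
    $ missing_chars samples ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
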
Some captchas use special characters such as "*". Those are still legal in
Linux file names, but the shell would expand an unquoted "*" as a wildcard.
Save the image under a simple name first, such as "sample.gif", then rename it
with mv, putting the new name in single quotes::

    $ mv sample.gif 'MN*21B.gif'

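
If you save a lot of samples, a helper that handles the quoting and a couple
of sanity checks may save some mistakes. This is only a sketch with a made-up
name, save_sample: it refuses empty text and text containing "/" (which no
Linux file name can contain), and it will not overwrite an existing sample.

.. code-block:: sh

    # save_sample SRC TEXT [DIR]: move SRC to DIR/TEXT.gif (DIR defaults to
    # samples). The text is only ever used inside double quotes, so
    # characters such as * reach mv literally instead of being expanded.
    save_sample() {
        src=$1 text=$2 dir=${3:-samples}
        case $text in
            '' | */*)
                echo "save_sample: invalid captcha text '$text'" >&2
                return 1 ;;
        esac
        dest="$dir/$text.gif"
        if [ -e "$dest" ]; then
            echo "save_sample: $dest already exists" >&2
            return 1
        fi
        mkdir -p "$dir" && mv -- "$src" "$dest"
    }

You still need single quotes when you type the text, since the shell expands
the arguments before save_sample sees them::

    $ save_sample sample.gif 'MN*21B'
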
3. Segment the Samples
======================

This step splits the samples you gathered into separate character images.
First, build the segmenter for your web site::

    $ make <site>_segmenter

Replace <site> with the prefix of one of the segmenters that exist right now:
digg_segmenter, seedpeer_segmenter, aim_segmenter and pirate_bay_segmenter.

Then run the segmenter on the samples you gathered. This creates files in the
segments/ directory::

    $ ./<site>_segmenter samples/*.gif

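
The two commands above can be wrapped in one function that also rejects sites
without a segmenter. This is a sketch: run_segmenter and segmenter_target are
hypothetical names, and SEGMENTERS only mirrors the four segmenters listed in
this HOWTO, so extend it if your copy of the package has more.

.. code-block:: sh

    # Site prefixes of the segmenters named in this HOWTO.
    SEGMENTERS="digg seedpeer aim pirate_bay"

    # segmenter_target SITE: print SITE_segmenter if SITE has a segmenter,
    # otherwise print an error to stderr and fail.
    segmenter_target() {
        for s in $SEGMENTERS; do
            if [ "$s" = "$1" ]; then
                echo "${s}_segmenter"
                return 0
            fi
        done
        echo "no segmenter for '$1' (known: $SEGMENTERS)" >&2
        return 1
    }

    # run_segmenter SITE: build the segmenter, then run it on every .gif in
    # samples/; the segmenter writes its output files to segments/.
    run_segmenter() {
        target=$(segmenter_target "$1") || return 1
        make "$target" && "./$target" samples/*.gif
    }

For example::

    $ run_segmenter digg
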
4. Move the Segments
====================

Here we group the sample segments by the character they represent::

    $ ls -1 segments/*.png | perl segmover.pl | tee segmove.sh

This creates segmove.sh, which contains the mv commands for this grouping.
Inspect it to make sure it is not doing anything stupid, then run it::

    $ sh segmove.sh

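
Part of that inspection can be scripted. The sketch below (check_segmove is a
made-up name) flags every line of segmove.sh that is neither blank nor an mv
or mkdir command. It assumes segmover.pl writes one command per line; whether
it also emits mkdir lines is a guess, so adjust the pattern to match what your
copy produces. You should still read the mv targets yourself.

.. code-block:: sh

    # check_segmove FILE: print (with line numbers) every non-blank line of
    # FILE that is not an mv or mkdir command. Returns 0 if there are none,
    # 1 if something was flagged, 2 if FILE cannot be read.
    check_segmove() {
        [ -r "$1" ] || { echo "check_segmove: cannot read $1" >&2; return 2; }
        if grep -nvE '^[[:space:]]*((mv|mkdir)[[:space:]].*)?$' "$1"; then
            echo "check_segmove: review the lines above before running $1" >&2
            return 1
        fi
        return 0
    }

Run it before executing the script::

    $ check_segmove segmove.sh && sh segmove.sh
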
The segments will be moved to a directory called "out". Its contents need to
be moved into the correct fonts directory for the solvers to use this data::

    $ mv out/* fonts/

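
Since the more samples per character the better, it can help to see how many
segments each character ended up with. This sketch (per_char_counts is a
made-up name) assumes segmove.sh leaves one subdirectory per character under
out; run it on out before the mv above.

.. code-block:: sh

    # per_char_counts DIR: print "<count> <name>" for every subdirectory of
    # DIR, where <count> is the number of entries in it. Sorted with the
    # smallest counts first, so poorly covered characters stand out.
    per_char_counts() {
        for d in "$1"/*/; do
            [ -d "$d" ] || continue      # DIR has no subdirectories
            n=$(ls -A "$d" | wc -l | tr -d ' ')
            printf '%s %s\n' "$n" "$(basename "$d")"
        done | sort -n
    }

For example::

    $ per_char_counts out
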
5. Use the Solver
=================

First, build the solver::

    $ make <solver>

where <solver> is one of: aim_c, rogers, digg, seedpeer, aim_ml, phpbb, ebaum,
lilo, pirate_bay.

Save another captcha; this time we will solve it::

    $ ./<solver> input.gif
    [...]
    [Rotter] Dims
    [Rotter] rows 2
    [Rotter] cols 136
    HJQ1QX

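
Since only the last line of the solver's output is the answer, a script that
enters the text back into a site usually wants just that line. A minimal
sketch, assuming you run it from the directory where you built the solver;
solve is a hypothetical function name, and SOLVERS simply mirrors the list
above.

.. code-block:: sh

    # Solvers named in this HOWTO.
    SOLVERS="aim_c rogers digg seedpeer aim_ml phpbb ebaum lilo pirate_bay"

    # solve SOLVER IMAGE: run ./SOLVER on IMAGE and print only the last line
    # of its output (the recognized text). Fails if SOLVER is not a known
    # solver or if the solver itself exits with an error.
    solve() {
        case " $SOLVERS " in
            *" $1 "*) ;;
            *) echo "solve: unknown solver '$1'" >&2; return 1 ;;
        esac
        out=$("./$1" "$2") || return 1
        printf '%s\n' "$out" | tail -n 1
    }

For example::

    $ text=$(solve rogers input.gif)
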
Sometimes the solver has trouble recognizing some characters; these appear as
"*" in the output.

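
Because samples are named after their answers, the same convention lets you
measure how often a solver reads a captcha exactly right, and how many "*"
characters it leaves in an answer. A sketch, assuming a separate directory of
labeled captchas (named like the samples, ideally ones not used to build the
fonts); accuracy and count_unknown are hypothetical names.

.. code-block:: sh

    # count_unknown TEXT: print how many "*" (unrecognized) characters TEXT has.
    count_unknown() {
        printf '%s' "$1" | tr -cd '*' | wc -c | tr -d ' '
    }

    # accuracy SOLVER DIR: run ./SOLVER on every DIR/<answer>.gif and print
    # "<correct>/<total>", counting only exact matches of the last output line.
    accuracy() {
        total=0 correct=0
        for img in "$2"/*.gif; do
            [ -e "$img" ] || continue
            want=$(basename "$img" .gif)
            got=$("./$1" "$img" | tail -n 1)
            total=$((total + 1))
            if [ "$got" = "$want" ]; then
                correct=$((correct + 1))
            fi
        done
        echo "$correct/$total"
    }

For example, with labeled captchas in a hypothetical test/ directory::

    $ accuracy rogers test
    $ count_unknown 'HJ*1QX'
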
6. Learn Something
==================
Read final.medium.png.