Captcha Breaker HOWTO
=====================

This howto takes you through using Captcha Breaker to break a given
captcha. It covers only how to use the solvers once you already have
image files. It does NOT cover how to extract an image from a web site,
nor how to enter the text back.

At the end, you will pass your image file to the breaker, and the last
line of output will be the text (with *'s for unrecognized characters):

    $ ./rogers samples/HJQ1QX.gif
    [...]
    [Rotter] Dims
    [Rotter] rows 2
    [Rotter] cols 136
    HJQ1QX

0. Install Dependencies
=======================

For Ubuntu/Debian, run this command as root:

    # apt-get install libcamlimages-ocaml libcamlimages-ocaml-dev \
        libocamlgsl-ocaml libocamlgsl-ocaml-dev ocaml-findlib \
        ocaml-native-compilers m4 make

1. Unpack the Package
=====================

Assuming you downloaded captcha.2008mmdd.tar.gz to the current directory:

    $ tar xzvf captcha.2008mmdd.tar.gz
    $ cd captcha

2. Gather Some Samples
======================

Go to your website and generate a few captchas. Solve them (using your
brain) and save them in a directory (samples/), using the text in the
image as the file name. For example, a captcha that reads "HJQ1QX" would
be saved as samples/HJQ1QX.gif.

Make sure to collect enough samples to cover the entire character set at
least once. The more samples per character, the better.

Some captchas use special characters like "*". Those are still legal in
Linux filenames. Save the image under a simple name first, like
"sample.gif", then rename it using the mv command and single quotes:

    $ mv sample.gif 'MN*21B.gif'

3. Segment the Samples
======================

This part splits the samples you gathered into separate character images.
First build the segmenter for your website, replacing <site> with its
name:

    $ make <site>_segmenter

Right now, only the following segmenters exist:

    digg_segmenter, seedpeer_segmenter, aim_segmenter, pirate_bay_segmenter

Then run the segmenter on the samples you gathered.
This will create files in the segments/ directory:

    $ ./<site>_segmenter samples/*.gif

4. Move the Segments
====================

Here we group the sample segments by the character they represent:

    $ ls -1 segments/*.png | perl segmover.pl | tee segmove.sh

This creates a segmove.sh that contains 'mv' commands for this grouping.
Inspect the output to make sure it's not doing something stupid, then
run it:

    $ sh segmove.sh

The segments will be moved to a directory called "out". Its contents need
to be moved into the correct fonts directory for the solvers to use this
data:

    $ mv out/* fonts/

5. Use the Solver
=================

First build the solver:

    $ make <solver>

where <solver> is one of:

    aim_c, rogers, digg, seedpeer, aim_ml, phpbb, ebaum, lilo, pirate_bay

Save another captcha; this time we will be solving it:

    $ ./<solver> input.gif
    [...]
    [Rotter] Dims
    [Rotter] rows 2
    [Rotter] cols 136
    HJQ1QX

Sometimes the solver has trouble recognizing some characters. These will
appear as "*".

6. Learn Something
==================

Read final.medium.png.
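
Looking back at step 2, the requirement that the samples cover the entire
character set can be checked from the shell before segmenting. What
follows is only a sketch and not part of the Captcha Breaker package:
sample_char_counts is a hypothetical helper name, and it assumes the
samples are named after their text with a .gif extension and that the
labels are plain ASCII.

```shell
# Hypothetical helper, not shipped with Captcha Breaker.
# Prints how often each character occurs in the labels (file name minus
# the .gif extension) of the samples in a directory, rarest first.
# Assumes plain ASCII labels. The body runs in a subshell, so its
# variables do not leak into the calling shell.
sample_char_counts() (
    dir=${1:-samples}
    for f in "$dir"/*.gif; do
        [ -e "$f" ] || continue        # no .gif files: the glob stayed literal
        name=${f##*/}                  # drop the directory part
        printf '%s\n' "${name%.gif}"   # drop the extension, leaving the label
    done |
        fold -w 1 |                    # split each label into one character per line
        sort |
        uniq -c |                      # count occurrences of each character
        sort -n                        # lowest counts first
)
```

Running "sample_char_counts samples" prints one line per character with
its count in front. Any character of the captcha's alphabet that is
missing from the output still needs a sample, and characters with low
counts are the ones worth collecting more of.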