Click Here to Read more about: Captcha Breaking (final.medium.png) ; why Captchas are not great for security ; why you shouldn't oppress other users with your insidious turing tests. Also I realize this is not readable without OCR


The Captcha Breaker Readme

Captcha Breaker

ALL software unless otherwise noted is distributed 
under the GPL-3 License (c) 2006-2007 Abram Hindle
see GPL-3.0, LICENSE and/or HACKING

You will need OCaml to compile this.

On Ubuntu/Debian, install the following packages:

libcamlimages-ocaml
libcamlimages-ocaml-dev
libocamlgsl-ocaml
libocamlgsl-ocaml-dev
ocaml-findlib
ocaml-native-compilers
m4
make

OCaml 3.09.2 is recommended.
m4 and make are also needed.

make phpbb digg seedpeer piratebay

This should build the captcha breakers, but they need fonts before they can solve anything.


PHPBB comes with an example font file

The captcha breakers expect the following in their current directory:
a segments directory
a fonts directory
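
A quick pre-flight check can save a confusing failure. The sketch below is not part of the package; `missing_dirs` is a hypothetical helper that only assumes the two directory names listed above.

```python
from pathlib import Path
from typing import List, Sequence


def missing_dirs(workdir: str = ".",
                 required: Sequence[str] = ("segments", "fonts")) -> List[str]:
    """Return the required directory names that are absent from workdir."""
    base = Path(workdir)
    return [name for name in required if not (base / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("missing directories:", ", ".join(missing))
    else:
        print("segments/ and fonts/ are both present")
```

Run it from the directory where you intend to start the breaker.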

I can't distribute copyrighted captchas so I just show the font skeleton. You
can probably make your own "font" for whatever site you want. I can't limit
what you do with this software other than how you license it. Please read
GPL-3.0 to understand your rights.

./phpbb imagefile.gif

The last line will contain the guess of the captcha.
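
If you want to call a breaker from a script, picking off that last line might look like the sketch below. It assumes the guess is written to standard output (the README does not say which stream is used), and `extract_guess` and `run_solver` are hypothetical names, not part of the package.

```python
import subprocess


def extract_guess(output: str) -> str:
    """Return the last non-blank line of the solver output, stripped."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def run_solver(solver: str, image: str) -> str:
    """Run a solver binary (e.g. ./phpbb) on one image and return its guess.

    Assumption: the guess goes to stdout, not stderr.
    """
    result = subprocess.run([solver, image], capture_output=True,
                            text=True, check=True)
    return extract_guess(result.stdout)
```

For example, `run_solver("./phpbb", "imagefile.gif")` would return whatever text the solver printed last.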

========= How do I break a captcha =======
Read this giant image final.medium.png

1. Clean up the image
2. Segment the image
3. Annotate the segments per letter
3.1 Make a font directory
4. Define a solver which uses that font (see phpbb for example)
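
To make the steps concrete, here is a toy sketch in Python. It is NOT the OCaml code in this package and not necessarily how the real solvers work: it assumes a fixed threshold for cleanup, segmentation at ink-free columns, and a pixel-by-pixel comparison against a "font" held in a dictionary. All names here are made up for illustration.

```python
from typing import Dict, List

Image = List[List[int]]  # rows of pixels: grayscale 0..255 in, 0/1 after cleanup

NO_MATCH = 10**9  # distance used when two images cannot be compared


def clean(gray: Image, threshold: int = 128) -> Image:
    """Step 1: binarize. Dark pixels become ink (1), light ones background (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment(binary: Image) -> List[Image]:
    """Step 2: cut the image wherever a column contains no ink."""
    width = len(binary[0]) if binary else 0
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(has_ink + [False]):  # sentinel closes a trailing run
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            segments.append([row[start:x] for row in binary])
            start = None
    return segments


def distance(a: Image, b: Image) -> int:
    """Count differing pixels. Images of different size never match."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return NO_MATCH
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def solve(gray: Image, font: Dict[str, List[Image]], max_dist: int = 2) -> str:
    """Step 4: label each segment with the nearest font glyph, or '*'."""
    guess = []
    for seg in segment(clean(gray)):
        best = min(
            ((distance(seg, glyph), ch)
             for ch, glyphs in font.items() for glyph in glyphs),
            default=(NO_MATCH, "*"),
        )
        guess.append(best[1] if best[0] <= max_dist else "*")
    return "".join(guess)
```

Step 3 (annotating segments and building the font directory) is the manual part: in this sketch it corresponds to filling the `font` dictionary with labelled glyph images.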

====== What are the limitations ========

I couldn't get shape matching working very well:

http://www.eecs.berkeley.edu/Research/Projects/CS/vision/shape/mori-gimpy.pdf

I can solve linear transformations like skew and rotation, but
non-linear warps are difficult (e.g. Google, AIM, Yahoo).
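
The reason linear distortions are tractable is that a skew or a rotation is one 2x2 matrix applied to every pixel coordinate, so undoing it means inverting a single matrix. A non-linear warp moves each region by a different rule, so there is no single matrix to invert. The sketch below illustrates only the linear case; it is a generic illustration, not the approach taken by the code in this package.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Matrix = Tuple[Tuple[float, float], Tuple[float, float]]


def rotation(degrees: float) -> Matrix:
    """Counter-clockwise rotation about the origin."""
    t = math.radians(degrees)
    return ((math.cos(t), -math.sin(t)), (math.sin(t), math.cos(t)))


def skew(kx: float) -> Matrix:
    """Horizontal shear: x shifts in proportion to y."""
    return ((1.0, kx), (0.0, 1.0))


def apply(m: Matrix, p: Point) -> Point:
    """Multiply the 2x2 matrix m by the column vector p."""
    (a, b), (c, d) = m
    x, y = p
    return (a * x + b * y, c * x + d * y)


def invert(m: Matrix) -> Matrix:
    """Inverse of a 2x2 matrix; this is what 'undoing' the distortion means."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return ((d / det, -b / det), (-c / det, a / det))
```

Applying `invert(skew(0.5))` to a point that was sheared with `skew(0.5)` returns the original point, up to floating-point error.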

This code here is more of a repository of attempts at breaking
captchas, some successful, some not.

Other interesting work to look at includes:

http://www.ceas.cc/papers-2005/160.pdf

Essentially, with enough time (1-2 weeks) I could probably get even
Google done. It is just a lot of implementation and testing.

======= This is wrong! ======

Read this giant image final.medium.png (linked at the top)

I pose a reasonable argument there. I think captchas limit us too much and
limit progress. They especially harm those who use alternative software to
view standard web pages, and the disabled (who probably use such software).

You shouldn't rely on really poor security. Perhaps make your users smarter;
there are other methods of verifying that a warm body is behind a keyboard.

====== Contact Info =======

captchas at churchturing dot org

===== I hate you =====

I hope you enjoy the yellow on blue text then.

===== Your Documentation Sucks ====

I have no incentive to make it any better ;)

===== Your code Sucks ====

Hey! >:(


Captcha Breaker HOWTO
=====================

This howto will take you through using Captcha Breaker to break a given 
captcha. This howto covers only how to use the solvers once you already have 
image files. This howto does NOT cover how to extract that file from a web 
site, nor how to enter the text back.

At the end, you will pass your image file to the breaker and the last line of 
output will be the text (with *'s for unrecognized characters):

 $ ./rogers samples/HJQ1QX.gif
 [...]
 [Rotter] Dims
 [Rotter] rows 2
 [Rotter] cols 136
 HJQ1QX

0. Install Dependencies
=======================
On Ubuntu/Debian, run this command as root:

 # apt-get install libcamlimages-ocaml libcamlimages-ocaml-dev \
     libocamlgsl-ocaml libocamlgsl-ocaml-dev ocaml-findlib \
     ocaml-native-compilers m4 make

1. Unpack the Package
=====================
Assuming you downloaded captcha.2008mmdd.tar.gz in the current directory:
 $ tar xzvf captcha.2008mmdd.tar.gz
 $ cd captcha

2. Gather Some Samples
======================
Go to your website and generate a few captchas. Solve them (using your brain) 
and save them in a directory (samples/) and use the text in the image as the 
file name (for example, a captcha that reads "HJQ1QX" would be saved as 
samples/HJQ1QX.gif).

Make sure to collect enough samples to cover the entire character set at least 
once. The more samples per character the better.

Some captchas use special characters like "*". Those are still legal in Linux 
filenames. Save as a simple name first, like "sample.gif", then rename using 
the mv command and single quotes:
  $ mv sample.gif 'MN*21B.gif'
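
To check whether your samples cover the character set, you can count characters straight from the file names. This is a small sketch outside the package; `label_of` and `coverage` are hypothetical helpers, and the character set is something you have to supply yourself for your site.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, Set, Tuple


def label_of(path: str) -> str:
    """The captcha text is the file name without directory or extension."""
    return Path(path).stem


def coverage(paths: Iterable[str], charset: str) -> Tuple[Counter, Set[str]]:
    """Count samples per character and list charset characters never seen."""
    counts = Counter(ch for p in paths for ch in label_of(p))
    missing = {ch for ch in charset if counts[ch] == 0}
    return counts, missing
```

For example, `coverage(glob.glob("samples/*.gif"), "ABC...")` (after `import glob`, with your own character set in place of the string) tells you which characters still need a sample.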

3. Segment the Samples
======================
This part splits the samples you gathered into separate character images. First 
build the segmenter for your website:
 $ make <site>_segmenter

Right now, only the following segmenters exist:
 digg_segmenter, seedpeer_segmenter, aim_segmenter, pirate_bay_segmenter

Then run the segmenter on the samples you gathered. This will create files in 
the segments/ directory:
 $ ./<site>_segmenter samples/*.gif

4. Move the Segments
====================
Here we group each of the sample segments by the character they represent.
 $ ls -1 segments/*.png | perl segmover.pl | tee segmove.sh

This creates a segmove.sh that contains 'mv' commands for this grouping. 
Inspect the output to make sure it's not doing something stupid and run it:
 $ sh segmove.sh

The segments will be moved to a directory called "out". Its contents need to be 
moved into the correct fonts directory for the solvers to use this data:
 $ mv out/* fonts/

5. Use the Solver
=================
First build the solver:
 $ make <solver>

Where <solver> is one of:
  aim_c, rogers, digg, seedpeer, aim_ml, phpbb, ebaum, lilo, pirate_bay

Save another captcha; this time we will be solving it:
 $ ./<solver> input.gif
 [...]
 [Rotter] Dims
 [Rotter] rows 2
 [Rotter] cols 136
 HJQ1QX

Sometimes, the solver has trouble recognizing some characters. These will 
appear as "*".
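
A script that consumes the guess may want to know which positions failed. The helpers below are a hypothetical sketch, not part of the package; `expected_length` is an assumption about your target site. Note that if the captcha's own character set includes "*", a "*" in the output is ambiguous and this check cannot tell the two cases apart.

```python
from typing import List


def unknown_positions(guess: str, marker: str = "*") -> List[int]:
    """Indexes (0-based) of characters the solver could not recognize."""
    return [i for i, ch in enumerate(guess) if ch == marker]


def is_complete(guess: str, expected_length: int, marker: str = "*") -> bool:
    """True when the guess has the expected length and no unknown characters."""
    return len(guess) == expected_length and not unknown_positions(guess, marker)
```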

6. Learn Something
==================
Read final.medium.png.