OCR (optical character recognition) is widely used nowadays for everything from spamming people with text on Twitter to real business needs. Amazingly enough, Python and Tesseract make this an extremely trivial task. Below we will take a business card from Pivotal (I e-mailed its owner to ask for permission) and read it with code! What you do with this technology from here is up to you.
First, install Tesseract along with the image libraries for your platform.
On Debian or Ubuntu:
sudo apt-get install libtiff5-dev libjpeg8-dev zlib1g-dev libfreetype6-dev liblcms2-dev libwebp-dev tcl8.6-dev tk8.6-dev python-tk tesseract-ocr
On Fedora:
sudo dnf install libtiff-devel libjpeg-devel zlib-devel freetype-devel lcms2-devel libwebp-devel tcl-devel tk-devel tesseract
On macOS with Homebrew:
brew install libjpeg zlib tesseract
Then create a virtual environment and install the Python packages:
virtualenv venv
. ./venv/bin/activate
pip install pytesseract Pillow
Now your server is ready to start processing text.
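Before running any OCR, it can be worth confirming that the tesseract binary is actually reachable, since pytesseract is a wrapper that calls out to it. Here is a minimal sketch using only the standard library; the helper name tesseract_available is made up for this example and is not part of pytesseract:

```python
import shutil


def tesseract_available(binary="tesseract"):
    """Return True if `binary` can be found on the current PATH."""
    # shutil.which returns the full path to the executable, or None.
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if tesseract_available():
        print("tesseract found at", shutil.which("tesseract"))
    else:
        print("tesseract not found; re-check the install step above")
```

If this reports that the binary is missing, the Python code will most likely fail too, so it is a cheap first thing to check.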
#!/usr/bin/env python
"""
Demonstrate OCR Awesomeness with Pivotal Business Card that was
randomly on my desk from DevOpsDays
"""
try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract


def read_card(filename):
    print(pytesseract.image_to_string(Image.open(filename)))


if __name__ == '__main__':
    read_card('pivot.jpg')
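As one idea for what to do with the recognized text, you could pull contact details out of it. The sketch below is only an illustration: extract_contacts and both regular expressions are my own assumptions rather than anything pytesseract provides, and the sample string is invented, not the text from the real card.

```python
import re

# Deliberately loose patterns; real cards vary a lot, so treat these
# as a starting point rather than a complete solution.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d ().-]{6,}\d")


def extract_contacts(text):
    """Return the e-mail addresses and phone-like strings found in `text`."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": [match.strip() for match in PHONE_RE.findall(text)],
    }


if __name__ == "__main__":
    # Made-up sample standing in for OCR output; NOT the real card's text.
    sample = "Jane Doe\nExample Corp\njane.doe@example.com\n+1 (555) 010-0199\n"
    print(extract_contacts(sample))
```

To hook this up to the script above, read_card would need to return the string from image_to_string instead of printing it. OCR output is often noisy, so expect to tune the patterns against your own cards.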