Poor Man's Textract

Introduction

Amazon Textract is a (paid) service that "automatically extracts text and data from scanned documents. Amazon Textract goes beyond simple optical character recognition (OCR) to also identify the contents of fields in forms and information stored in tables." We want to build a free alternative that produces output of similar quality.

Your job

Improve upon the existing PMT project: https://github.com/kenAlparslan/Texttract

Previous (GCi) tasks that did something similar, albeit simpler:

Musab Kılıç's exam analyzer
RobOHt's exam analyzer
knightron0's exam analyzer

Qualification tasks

Take a look at this page.