Planet London Python

January 27, 2015

Ian Ozsvald

Annotate.io self-learning text cleaner demo online

A few weeks ago I posted some notes on a self-learning text cleaning system, to be used by data scientists who didn’t want to invest time cleaning their data by hand. I have a first demo online over at annotate.io (the demo code is here in github).

The intuition behind this is that we currently divert a lot of mental resource early in a project to cleaning data and a bunch of that can be spent just figuring out which libraries will help with the cleaning. What if we could just let the machine do that for us? We can then focus on digging into new data and figuring out how to solve the bigger problems.

With annotate.io you give it a list of “data you have” and “data you want”, and it’ll figure out how to transform the former into the latter. With the recipe it generates you then feed in new data and it performs the cleaning for you. You don’t have to install any of the libraries it might use (that’s all server-side).
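
To make the “data you have” / “data you want” idea concrete, here is a toy local sketch in Python 3. It is not the annotate.io service or its API: the candidate steps, `learn_recipe` and `apply_recipe` are all hypothetical names made up for illustration, and the search is a naive brute force over a handful of string operations.

```python
from itertools import permutations

# Hypothetical candidate cleaning steps (assumption: a real system
# would draw on far more libraries and transformations than this).
CANDIDATES = {
    "strip": str.strip,
    "lower": str.lower,
    "drop_dollar": lambda s: s.replace("$", ""),
    "drop_commas": lambda s: s.replace(",", ""),
}


def apply_recipe(recipe, values):
    """Run each named step, in order, over every value."""
    cleaned = []
    for value in values:
        for name in recipe:
            value = CANDIDATES[name](value)
        cleaned.append(value)
    return cleaned


def learn_recipe(have, want, max_steps=3):
    """Return the first (shortest) ordered list of steps that turns
    `have` into `want`, or None if no combination matches."""
    for n_steps in range(max_steps + 1):
        for recipe in permutations(CANDIDATES, n_steps):
            if apply_recipe(recipe, have) == want:
                return list(recipe)
    return None


# "Data you have" and "data you want"
recipe = learn_recipe([" $1,200 ", "$35"], ["1200", "35"])

# Feed new data through the learned recipe
new_clean = apply_recipe(recipe, ["  $4,500", "$10 "])
```

The point is only the shape of the workflow: examples in, recipe out, recipe reused on unseen data.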

Using Python 2.7 or 3.4 you can run the demo in github (you need the requests library). You can sign up to the announce list if you’d like to be kept informed on developments.


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight; sign up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

by Ian at January 27, 2015 10:51 PM