Planet London Python

August 28, 2015

Ian Ozsvald

EuroSciPy 2015 and Data Cleaning on Text for ML (talk)

I’m at EuroSciPy 2015, where we have two days of Pythonistic science in Cambridge. Next year’s conference will be in Bavaria; you can sign up for announcements.

EuroSciPy 2015

I spoke in the morning on Data Cleaning on Text to Prepare for Data Analysis and Machine Learning (which is a terribly verbose title, sorry!). I’ve just covered 10 years of lessons learned working with NLP on (often crappy) text data, and ways to clean it up to make it easy to work with. Topics covered:

  • decoding bytes into unicode (including chardet, ftfy, chromium language detector) to step past the UnicodeDecodeError
  • validating that a new dataset looks like a previous+trusted dataset (I’m thinking of writing a tool for this – would that be useful to you?)
  • automatically transforming data from “what I have” to “what I want” with annotate.io (now public!), without writing regexps
  • manual approaches to normalisation (the stuff I do that started me thinking on annotate.io)
  • visualisation with GlueViz, Seaborn and csv-fingerprint
  • starting your first ML project
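The first topic, stepping past the UnicodeDecodeError, can be roughly illustrated with the standard library alone. This is only a minimal sketch, not the approach from the talk: the talk mentions chardet and ftfy, which guess encodings and repair mojibake far more cleverly. The function name `decode_bytes` and the list of candidate encodings are assumptions made up for this example.

```python
def decode_bytes(raw, encodings=("utf-8", "cp1252")):
    """Try each candidate encoding in turn; return (text, encoding_used).

    Hypothetical helper for illustration: the candidate list is an
    assumption, not a recommendation for every dataset.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            # This encoding cannot represent these bytes; try the next one.
            continue
    # latin-1 maps every byte to a code point, so it never raises.
    # The result may be mojibake, but it can at least be inspected.
    return raw.decode("latin-1"), "latin-1"


text, used = decode_bytes("café".encode("cp1252"))
print(text, used)
```

Returning the encoding that finally worked makes it easier to audit which rows of a dataset needed a fallback.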

Here are the slides:

 

Thanks to Enthought and the org-team for a lovely conference!


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight; sign up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

by Ian at August 28, 2015 10:27 AM

August 26, 2015

Python Anywhere

New release - Web app charts, MySQL upgrade and bug fixes

Hit charts for web apps

The main change for this release is that we now report hits and errors to your web apps on the web app page. If you're a paying user, you get pretty charts over a range of time periods. If you're not, you'll get a text report.

Web app error reporting

We've greatly improved the errors that are reported when you reload a web app.

Batteries included

As much as is possible, we have tried to bring the packages that we install for Python 3 to parity with Python 2. That means that the number of packages that come preinstalled for Python 3 has increased dramatically.

Database upgrade

All of your databases have been upgraded to MySQL 5.5.

Other stuff

We've also applied a number of small bug fixes, user interface improvements and stability fixes.

by glenn at August 26, 2015 08:02 AM

August 25, 2015

Jonathan Hartley

Git: When to use three dots vs two

I endlessly misremember when to use ‘...’ in git versus ‘..’. That ends today:

To see the commits or diffs introduced by a branch:

     f
    +●  m           git log m..f
     |  ○
    +●  |           git diff m...f
      \ ○
       \|
        ○

To see the commits or diffs between the tip of one branch and another:

     f              git log m...f
    +●  m           All commits look the same,
     |  ●-          unless you use --left-right, which
    +●  |           shows where each commit comes from.
      \ ●-
       \|           git diff m..f
        ○           '-' commits are shown inverted,
                     i.e. additions as deletions.
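
For `git log`, the two notations can be read as set operations on commit ancestry. The following is a toy model in Python, not how git is implemented: the commit names (`base`, `m1`, `m2`, `f1`, `f2`) and the helper functions are made up for illustration, with a graph shaped like the diagrams above. It covers only `git log`; for `git diff` the dots mean something different, since `git diff m..f` compares the two tips and `git diff m...f` compares the merge base of the two branches with `f`.

```python
# Toy commit graph: each commit maps to its list of parents.
# All names are hypothetical. 'base' is the shared ancestor,
# m1 -> m2 is branch m, and f1 -> f2 is branch f.
PARENTS = {
    "base": [],
    "m1": ["base"],
    "m2": ["m1"],
    "f1": ["base"],
    "f2": ["f1"],
}


def reachable(tip):
    """Return every commit reachable from tip via parent links, tip included."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(PARENTS[commit])
    return seen


def log_two_dots(a, b):
    """Model of 'git log a..b': commits reachable from b but not from a."""
    return reachable(b) - reachable(a)


def log_three_dots(a, b):
    """Model of 'git log a...b': commits reachable from exactly one of a, b."""
    return reachable(a) ^ reachable(b)


print(sorted(log_two_dots("m2", "f2")))
print(sorted(log_three_dots("m2", "f2")))
```

In this model the two-dot form lists only the commits on `f`, while the three-dot form lists the commits on both branches and leaves out the shared ancestor.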

To see the commits from ‘f’ back to the beginning of time:

     f
    +●  m           git log f
     |  ○
    +●  |           (diffs back to start of time are just
      \ ○            the contents of the working tree)
       \|
       +●
        |
       +●

Throughout, an omitted branch name defaults to the current HEAD, i.e.:

    git diff m..f

is the same as

    git checkout m
    git diff ..f

or

    git checkout f
    git diff m..

Is there a word for unicode ascii art?

by tartley at August 25, 2015 10:51 AM