Planet London Python

July 18, 2014

Ian Ozsvald

IPython Memory Usage interactive tool

I’ve written a tool (ipython_memory_usage) to help my colleague and me understand how RAM is allocated during large matrix work. It works for any large memory allocation (numpy, regular Python objects, or anything else), and the allocations and deallocations are reported after every command. Here’s an example – we make a matrix of 10,000,000 elements costing 76MB and then delete it:

IPython 2.1.0 -- An enhanced Interactive Python.
In [1]: %run -i  ipython_memory_usage.py
In [2]: a=np.ones(1e7)
'a=np.ones(1e7)' used 76.2305 MiB RAM in 0.32s, 
peaked 0.00 MiB above current, total RAM usage 125.61 MiB 
In [3]: del a 
'del a' used -76.2031 MiB RAM in 0.10s, 
peaked 0.00 MiB above current, total RAM usage 49.40 MiB
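
As a rough illustration – a simplified sketch, not the actual ipython_memory_usage implementation – this kind of per-command reporting can be wired up with psutil and IPython's event hooks: read the process's resident set size just before and just after each cell runs. The real tool also reports the peak usage during the command, which this sketch omits.

import psutil
from IPython import get_ipython

_process = psutil.Process()
_before_mib = 0.0

def _pre_run(*args):
    # record the resident set size just before the cell executes
    global _before_mib
    _before_mib = _process.memory_info().rss / 1024 ** 2

def _post_run(*args):
    # report the delta and the new total once the cell has finished
    after_mib = _process.memory_info().rss / 1024 ** 2
    print("used %.4f MiB RAM, total RAM usage %.2f MiB"
          % (after_mib - _before_mib, after_mib))

ip = get_ipython()
ip.events.register("pre_run_cell", _pre_run)
ip.events.register("post_run_cell", _post_run)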


UPDATE: As of October 2014 I’ll be teaching High Performance Python and Data Science in London; sign up here to get on our course announcement list (no spam, just occasional updates about upcoming courses). We’ll cover topics like this one, from beginner to advanced, using Python to do interesting science and to give you the edge.

The more interesting behaviour is checking the intermediate RAM usage during an operation. In the following example we have three arrays costing approximately 760MB each; the expression assigns its result to a fourth array, and overall the operation also incurs the cost of a temporary fifth array, which would be invisible to the end user if they’re not aware of the use of temporaries in the background:

In [2]: a=np.ones(1e8); b=np.ones(1e8); c=np.ones(1e8)
'a=np.ones(1e8); b=np.ones(1e8); c=np.ones(1e8)' 
used 2288.8750 MiB RAM in 1.02s, 
peaked 0.00 MiB above current, total RAM usage 2338.06 MiB 
In [3]: d=a*b+c 
'd=a*b+c' used 762.9453 MiB RAM in 0.91s, 
peaked 667.91 MiB above current, total RAM usage 3101.01 MiB
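
One common way to sidestep that temporary – a general NumPy technique, not something the tool does for you – is to build the result with in-place operations so that only one new array is allocated, reusing the a, b and c arrays from above:

d = a * b   # one new ~763 MiB array holds the result
d += c      # adds c in place, no further large allocation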


If you’re running out of RAM when you work with large datasets in IPython, this tool should give you a clue as to where your RAM is being used.

UPDATE: this works in IPython for PyPy too, so we can show off their homogeneous memory optimisation:

# CPython 2.7
In [3]: l=range(int(1e8))
'l=range(int(1e8))' used 3107.5117 MiB RAM in 2.18s, 
peaked 0.00 MiB above current, total RAM usage 3157.91 MiB

And the same in PyPy:

# IPython with PyPy 2.7
In [7]: l=[x for x in range(int(1e8))]
'l=[x for x in range(int(1e8))]' used 763.2031 MiB RAM in 9.88s, 
peaked 0.00 MiB above current, total RAM usage 815.09 MiB

If we then append a non-homogeneous type (e.g. None added to the ints), the list gets converted back to a list of regular Python (heavy-weight) objects:

In [8]:  l.append(None)
'l.append(None)' used 3850.1680 MiB RAM in 8.16s, 
peaked 0.00 MiB above current, total RAM usage 4667.53 MiB

The inspiration for this tool came from a chat with my colleague in which we were discussing the memory-profiling techniques covered in my new High Performance Python book; I realised that what we needed was a lighter-weight tool that just ran in the background.

My colleague was fighting a scikit-learn feature matrix scaling problem where all the intermediate objects that led to a binarised matrix took >6GB on his 6GB laptop. As a result I wrote this tool (no, it isn’t in the book – I only wrote it last Saturday!). During discussion (and as later validated with the tool) we got his allocation down to <4GB, so it ran without a hitch on his laptop.

FURTHER UPDATE: I’ll excitedly note (and this will definitely be exciting to about 5 other people too, including at least @FrancescAlted) that I’ve added prototype perf-stat integration to track cache misses and stalled CPU cycles (whilst waiting for RAM to be transferred to the caches), so we can observe which operations cause poor cache performance. This lives in a second version of the script (same GitHub repo; see the README for notes). I’ve also experimented with viewing how NumExpr makes far more efficient use of the cache than regular Python does.
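
To make the NumExpr comparison concrete, here is an illustrative sketch (my example, not taken from the script) of the two approaches: the plain NumPy expression materialises the full-size a*b temporary, whereas numexpr.evaluate works through the arrays in cache-sized blocks:

import numpy as np
import numexpr as ne

a = np.ones(int(1e8)); b = np.ones(int(1e8)); c = np.ones(int(1e8))

d_numpy = a * b + c              # allocates a full-size temporary for a * b
d_ne = ne.evaluate("a * b + c")  # evaluated in cache-sized blocks, no full-size temporary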

I’m probably going to demo this at a future PyDataLondon meetup.


Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight; sign up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs in Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

by Ian at July 18, 2014 08:12 AM

July 16, 2014

Python Anywhere

Outage Report for 15 July 2014

After a lengthy outage last night, we want to let you know about the events that led up to it and how we can improve our outage responses to reduce or eliminate downtime when things go wrong.

As usual with these things, there is no single cause to be blamed. It was a confluence of a number of things happening together:

  1. An AWS instance that was running a critical piece of infrastructure had a hardware failure
  2. We performed a non-standard deploy earlier in the day
  3. We took too long to realize the severity of the hardware failure

It is a fact of life that our machines run on physical hardware, as much as the cloud in general, and AWS in particular, try to insulate us from that fact. Hardware fails, and we need to deal with it when it does. In fact, we believe that a large part of the long-term value of PythonAnywhere is that we deal with it so you don't have to.

Since our early days, we have been finding and eliminating single points of failure to increase the robustness of our service, but there are still a few left and we have plans to eliminate them, too. One of the remaining ones is the file server and that's the machine that suffered the hardware failure last night.

The purpose of the file server is to make your private file storage available to all of the web, console, and scheduled task servers. It does this by owning a set of Elastic Block Storage devices, arranged in a RAID cluster, and sharing them out over NFS. This means that we can easily upgrade the file server hardware, and simply move the data storage volumes over from the old hardware to the new.

Under normal operation, we have a backup server that has a live, constantly updated copy of the data from the file server, so in the event of either a file server outage, or a hardware problem with the file server's attached storage, we can switch across to using that instead. However, yesterday, we upgraded the storage space on the backup server and switched to SSDs. This meant that, instead of starting off with a hot backup, we had a period where the backup server disks were empty and were syncing with the file server. So we had no fallback when the file server died. Just to be completely clear -- the data storage volumes themselves were unaffected. But the file server that was connected to them, and which serves them out via NFS, crashed hard.

With all of that in mind, we decided to try to resurrect the file server. On the advice of AWS support, we tried to stop the machine and restart it so it would spin up on new hardware. The stop ended up taking an enormous amount of time and then restarting it took a long time, too. After trying several times and poring over boot logs, we determined that the boot disk of the instance had been corrupted by the hardware failure. Now the only path we could see to a working cluster was to create an entirely new one which could take over and use the storage disks from the dead file server. So we kicked off the build (which takes about 20min) and waited. After re-attaching the disks, we checked that they were intact and switched over to the new cluster.

Lessons and Responses

Long term

  • A file server hardware failure is a big deal for us: it takes everything down. Even under normal circumstances, switching across to the backup is a manual process and takes several minutes. And, as we saw, rare circumstances can make it significantly worse. We need to remove it as a single point of failure.

Short term

  • A new cluster may be necessary to prevent extended downtime like we had last night, so our first response to a file server failure must be to start spinning up a new cluster so it's available if we need it. This should mean that our worst-case downtime would be about 40 minutes from when we start working on the problem to having everything back up.
  • We need to ensure that all deploys (even ones where we're changing the storage on either the file or backup servers) start with the backup server pre-seeded with data so the initial sync can be completed quickly.

We have learned important lessons from this outage and we'll be using them to improve our service. We would like to extend a big thank you and a grovelling apology to all our loyal customers who were extremely patient with us.

by glenn at July 16, 2014 02:27 PM

Menno Smits

Inbox is sponsoring IMAPClient development

Now that they have officially launched, I can happily announce that the good folks at Inbox are sponsoring the development of certain features and fixes for IMAPClient. Inbox have just released the initial version of their open source email sync engine, which provides a clean REST API for dealing with email, hiding all the archaic intricacies of protocols such as IMAP and MIME. IMAPClient is used by the Inbox engine to interact with IMAP servers.

The sponsorship of IMAPClient by Inbox will help to increase the speed of IMAPClient development, and all improvements will be open source, feeding directly into trunk so that all IMAPClient users benefit. Thanks Inbox!

The first request from Inbox is to fix some unicode/bytes handling issues that crept in as part of the addition of Python 3 support. It's a non-trivial amount of work, but things are going well. Watch this space...

July 16, 2014 11:10 AM