Session

Vectorization: How slow Python runs fast code

You need to process a large amount of data quickly, but Python code runs slowly. To help bridge this performance gap, the scientific and data science Python communities have built libraries like NumPy and Pandas that speed up computation using a technique called vectorization: batch APIs backed by fast native code, which can improve run time by two orders of magnitude!
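As a rough illustration of what a "batch API" means here, the sketch below computes the same sum of squares twice: once with a Python loop and once with a single NumPy call. The function names and the sample data are hypothetical, chosen only for this example, and the actual speedup depends on array size and hardware.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Pure-Python loop: the interpreter processes one element at a time."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(values):
    """Batch API: one call hands the whole array to NumPy's native code."""
    return float(np.dot(values, values))


# Hypothetical sample data, used only for illustration.
data = np.arange(1_000, dtype=np.float64)

loop_result = sum_of_squares_loop(data)
vectorized_result = sum_of_squares_vectorized(data)
```

Both functions return the same value. The difference is where the per-element work happens: in the Python interpreter for the loop, and in compiled code for the NumPy call.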

To take full advantage of these libraries, it helps to understand what vectorization means, and when and how it works. That way you can make sure you're on the fastest path and avoid patterns that slow your code down.

In this talk you'll learn:

  • Why writing fast software matters: to you, your employer, and the world at large.
  • How vectorization allows your code to run two orders of magnitude faster.
  • How to tell vectorized code apart from code that runs slowly because it breaks vectorization.
  • How to turn slow code into fast vectorized code.
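One pattern of the kind the last two points describe can be sketched with NumPy alone. This is an illustrative example rather than material from the talk, and the function names and sample values are hypothetical. `np.vectorize` looks like a batch operation, but NumPy's documentation describes it as a convenience whose implementation is essentially a for loop, so the Python function still runs once per element. Rewriting the same logic with `np.where` keeps the per-element work in native code.

```python
import numpy as np


def clip_negative_slow(values):
    """Looks vectorized, but np.vectorize still calls the Python lambda
    once per element, so the interpreter stays in the inner loop."""
    per_element = np.vectorize(lambda v: v if v > 0 else 0.0)
    return per_element(values)


def clip_negative_fast(values):
    """One batch operation: the comparison and the selection both run in
    NumPy's native code over the whole array."""
    return np.where(values > 0, values, 0.0)


# Hypothetical sample data, used only for illustration.
readings = np.array([-1.5, 2.0, 0.0, 3.25, -0.5])

slow = clip_negative_slow(readings)
fast = clip_negative_fast(readings)
```

The two functions produce identical arrays. On large inputs the second version is the one that can be expected to benefit from vectorization.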

The talk presumes some minimal experience with either NumPy or Pandas, but the same principles apply more broadly to other data processing libraries, and beyond.