Jon Gallant

How To View Number of PyPI Package Downloads by Python Version

Building Python packages that support both Python 2 and 3 is time-consuming, and at some point you are going to ask yourself, “Can I just build this thing for Python 3?” Ideally, the answer is “Yes!” But why not make it a data-driven decision? You may be in a position to ship only a Python 3 version and require your users to run Python 3.x. If you are, great, do that. If not, you’ll probably want to see how many users have downloaded your package with Python 2 before you make your decision.

It took me a few minutes to figure out how to get this data. Hope this helps you out.

What I learned

  1. PyPI does not give you this data.
  2. Google BigQuery does, through the public PyPI dataset: https://bigquery.cloud.google.com/dataset/the-psf:pypi
  3. To run BigQuery queries, you need to sign up for Google Cloud Platform, create a project, and enable the BigQuery API for that project.
  4. The BigQuery free allowance only lets you run the query below a couple of times before you hit the limit, because the downloads* wildcard in its FROM clause makes it scan every daily table in the dataset. So use it wisely.

How to run the query

  1. Go to: https://bigquery.cloud.google.com/dataset/the-psf:pypi

  2. Sign In

  3. Accept Terms

  4. Create Project

  5. Accept More Terms

  6. Click “Create Project”

  7. Enter Project Name, Click Create

  8. View Notification for Creating Project

  9. Refresh Page

  10. Click on Project

  11. Click the Hamburger Icon, hover over “APIs & Services”, and click “Dashboard”

  12. Click the “View All” link on the right

  13. Search for ‘big’

  14. Click on BigQuery API

  15. Click “Enable” button

  16. Go to: https://bigquery.cloud.google.com/dataset/the-psf:pypi

    If you see “Unable to find dataset the-psf:pypi”, you probably haven’t enabled the BigQuery API yet. See steps 11 to 15 above for how to enable it.

  17. Click “Compose Query”

  18. Copy and paste this query into the “New Query” box

    SELECT
      REGEXP_EXTRACT(details.python, r"[0-9]+\.[0-9]+") AS python_version,
      COUNT(*) AS downloads
    FROM `the-psf.pypi.downloads*`
    WHERE file.project = "iotedgedev"
    GROUP BY python_version
    ORDER BY downloads DESC

  19. Change ‘iotedgedev’ to the name of your PyPI package

  20. Click “Show Options”

  21. Uncheck “Use Legacy SQL”

  22. Click “Run Query”

  23. View Results
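If you want to see exactly what that query computes, here is a small local Python sketch that mimics it on a few in-memory rows. It is an illustration, not a BigQuery client: the nested "file"/"project" and "details"/"python" dictionaries only imitate the shape of the BigQuery columns, the sample rows are made up, and the function names are hypothetical. Note how a row with no details.python value lands in the None bucket, which is what appears as null in the results.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Same pattern the query passes to REGEXP_EXTRACT: the first run of
# "<digits>.<digits>", so "3.6.5" becomes "3.6".
VERSION_PATTERN = re.compile(r"[0-9]+\.[0-9]+")


def extract_python_version(details_python: Optional[str]) -> Optional[str]:
    """Return the first match like REGEXP_EXTRACT does, or None (SQL NULL)."""
    if details_python is None:
        return None
    match = VERSION_PATTERN.search(details_python)
    return match.group(0) if match else None


def downloads_by_python_version(
    rows: Iterable[dict], project: str
) -> List[Tuple[Optional[str], int]]:
    """Filter by project, group by extracted version, sort by count descending."""
    counts = Counter(
        # A missing "details" or "python" key stands in for a NULL column.
        extract_python_version((row.get("details") or {}).get("python"))
        for row in rows
        # Equivalent of: WHERE file.project = <project>
        if (row.get("file") or {}).get("project") == project
    )
    # most_common() orders by count, highest first (ORDER BY downloads DESC).
    return counts.most_common()


if __name__ == "__main__":
    # Made-up rows for illustration only; these are not real download logs.
    sample_rows = [
        {"file": {"project": "iotedgedev"}, "details": {"python": "3.6.5"}},
        {"file": {"project": "iotedgedev"}, "details": {"python": "2.7.15"}},
        {"file": {"project": "iotedgedev"}, "details": {"python": "3.6.1"}},
        {"file": {"project": "iotedgedev"}, "details": {}},
        {"file": {"project": "some-other-package"}, "details": {"python": "3.7.0"}},
    ]
    print(downloads_by_python_version(sample_rows, "iotedgedev"))
    # prints [('3.6', 2), ('2.7', 1), (None, 1)]
```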

From https://langui.sh/2016/12/09/data-driven-decisions/: null means downloads from PyPI using clients that do not send the statistics we’re querying against, such as older versions of pip or alternate clients. You also see 341 downloads from 1.17, which is…who knows! When making maintenance decisions, factor in these unknowns as you see fit.
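One way to factor in those unknowns is to roll the results up by major version and keep the null and odd entries visible instead of silently dropping them. The sketch below assumes you have copied the (python_version, downloads) pairs from the results table into a list; share_by_major_version is a hypothetical helper, not part of any library, and the sample rows are made up.

```python
from typing import Dict, Iterable, Optional, Tuple


def share_by_major_version(
    results: Iterable[Tuple[Optional[str], int]],
) -> Dict[str, float]:
    """Roll (python_version, downloads) result rows up into percentages.

    Null versions, and versions that are neither 2.x nor 3.x (like the
    odd "1.17"), go into an "unknown" bucket instead of being dropped.
    """
    totals = {"2": 0, "3": 0, "unknown": 0}
    for version, downloads in results:
        # "2.7" -> "2", "3.6" -> "3", None stays None.
        major = version.split(".", 1)[0] if version else None
        totals[major if major in ("2", "3") else "unknown"] += downloads
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No rows at all: avoid dividing by zero.
        return {bucket: 0.0 for bucket in totals}
    return {bucket: 100.0 * count / grand_total for bucket, count in totals.items()}


if __name__ == "__main__":
    # Made-up rows for illustration; paste in your own results instead.
    rows = [("3.6", 60), ("2.7", 30), (None, 5), ("1.17", 5)]
    print(share_by_major_version(rows))
    # prints {'2': 30.0, '3': 60.0, 'unknown': 10.0}
```

Keeping "unknown" as its own bucket makes it easy to check whether your decision would change if all of those downloads turned out to be Python 2.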

Hope this helps you out.

Jon
