Building Python packages that support both Python 2 and 3 is time-consuming, and at some point you are going to ask, “Can I just build this thing for Python 3?” Ideally, the answer is “Yes!” But why not make it a data-driven decision? You may be in a position to ship only a Python 3 version and require that your users run Python 3.x+. If you are, great, do that. If not, you’ll probably want to see how many users have downloaded your package with Python 2 before you decide.
It took me a few minutes to figure out how to get this data, so here is what I found.
What I learned
- PyPI does not give you this data.
- Google BigQuery does, here: https://bigquery.cloud.google.com/dataset/the-psf:pypi
- You need to sign up for Google Cloud Platform, create a project, and enable the BigQuery API for that project before you can run BigQuery queries.
- The BigQuery free allowance only lets you run the query below a couple of times before you hit the limit, so use it wisely.
How to run the query
- Go to: https://bigquery.cloud.google.com/dataset/the-psf:pypi
- Sign In
- Accept Terms
- Create Project
- Accept More Terms
- Click “Create Project”
- Enter Project Name, Click Create
- View Notification for Creating Project
- Refresh Page
- Click on Project
- Click Hamburger Icon, hover over APIs & Services, Click Dashboard
- Click “View All” link to the right
- Search for ‘big’
- Click on BigQuery API
- Click “Enable” button
- Go to: https://bigquery.cloud.google.com/dataset/the-psf:pypi
If you see “Unable to find dataset the-psf:pypi”, that probably means you haven’t enabled the BigQuery API. See above for how to enable it.
- Click “Compose Query”
- Copy and Paste this query into New Query
    SELECT
      REGEXP_EXTRACT(details.python, r"[0-9]+\.[0-9]+") AS python_version,
      COUNT(*) AS downloads
    FROM `the-psf.pypi.downloads*`
    WHERE file.project = "iotedgedev"
    GROUP BY python_version
    ORDER BY downloads DESC
- Change ‘iotedgedev’ to the name of your PyPI Package
- Click “Show Options”
- Uncheck “Use Legacy SQL”
- Click “Run Query”
- View Results
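If you would rather not click through the web UI each time, the same query can be run from Python. This is only a rough sketch, not something from the steps above: it assumes you have installed the google-cloud-bigquery package and set up default credentials for the project you created. Two things in it are my own additions and should be treated as assumptions: the package name is passed as a query parameter, and the scan is limited to recent days with _TABLE_SUFFIX, which assumes the downloads tables are sharded by day with a YYYYMMDD suffix. The helper names (suffix_range, downloads_by_python_version) are made up for this example.

```python
from datetime import date, timedelta

# Same query as above, except the package name is a parameter and the scan is
# restricted by table suffix (assumption: tables are named downloadsYYYYMMDD).
QUERY = r"""
SELECT
  REGEXP_EXTRACT(details.python, r"[0-9]+\.[0-9]+") AS python_version,
  COUNT(*) AS downloads
FROM `the-psf.pypi.downloads*`
WHERE file.project = @package
  AND _TABLE_SUFFIX BETWEEN @first_day AND @last_day
GROUP BY python_version
ORDER BY downloads DESC
"""


def suffix_range(last_day, days):
    """Return (first, last) YYYYMMDD suffixes covering `days` days ending at `last_day`."""
    first_day = last_day - timedelta(days=days - 1)
    return first_day.strftime("%Y%m%d"), last_day.strftime("%Y%m%d")


def downloads_by_python_version(package, days=30, dry_run=False):
    """Run the query; with dry_run=True, return the estimated bytes scanned instead."""
    # Imported lazily so suffix_range() works without the client installed.
    from google.cloud import bigquery

    first_day, last_day = suffix_range(date.today(), days)
    config = bigquery.QueryJobConfig(
        dry_run=dry_run,
        query_parameters=[
            bigquery.ScalarQueryParameter("package", "STRING", package),
            bigquery.ScalarQueryParameter("first_day", "STRING", first_day),
            bigquery.ScalarQueryParameter("last_day", "STRING", last_day),
        ],
    )
    client = bigquery.Client()  # uses your default project and credentials
    job = client.query(QUERY, job_config=config)
    if dry_run:
        # A dry run does not execute the query; it only reports an estimate.
        return job.total_bytes_processed
    return [(row["python_version"], row["downloads"]) for row in job.result()]
```

Calling downloads_by_python_version("iotedgedev", dry_run=True) first should tell you roughly how many bytes the query would scan, which is handy given the free allowance. Drop dry_run to get a list of (python_version, downloads) pairs.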
On interpreting the results, from https://langui.sh/2016/12/09/data-driven-decisions/:
null = downloads from PyPI using clients that do not support sending the statistics we’re querying against. This can be an older version of pip or alternate clients. You also see 341 downloads from 1.17, which is…who knows! When making maintenance decisions you should factor these unknowns as you feel appropriate.
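To turn the result rows into something you can decide on, it may help to bucket them by major version. Here is a small sketch, assuming the rows arrive as (python_version, downloads) pairs with None for the null row. The function name and the sample counts are hypothetical, and I chose to lump null and oddities like 1.17 into an "unknown" bucket; you may prefer to weigh them differently.

```python
def python_major_shares(rows):
    """Return the fraction of downloads per bucket: "2", "3", or "unknown"."""
    totals = {"2": 0, "3": 0, "unknown": 0}
    for version, downloads in rows:
        # version is None for the null row; anything else looks like "2.7".
        major = version.split(".")[0] if version else None
        bucket = major if major in ("2", "3") else "unknown"
        totals[bucket] += downloads
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {bucket: 0.0 for bucket in totals}
    return {bucket: count / grand_total for bucket, count in totals.items()}


# Hypothetical counts, for illustration only (not real download data).
example_rows = [("3.6", 600), ("2.7", 300), (None, 90), ("1.17", 10)]
for bucket, share in python_major_shares(example_rows).items():
    print(f"Python {bucket}: {share:.0%}")
```

If the Python 2 share is small and the unknown share does not change the picture, that is a reasonable signal that shipping Python 3 only will not hurt many users.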
The following sites were helpful
- https://github.com/tswast/code-snippets/blob/master/2018/python-community-insights/Python Community Insights.ipynb
- https://kirankoduru.github.io/python/pypi-stats.html
- https://stackoverflow.com/questions/38102317/why-pypi-doesnt-show-download-stats-anymore
- https://cloud.google.com/blog/big-data/2017/05/try-google-bigquery-today-now-with-10gb-of-free-storage
- https://cloud.google.com/billing/docs/how-to/modify-project#change_the_billing_account_for_a_project
- https://packaging.python.org/guides/analyzing-pypi-package-downloads/
- https://langui.sh/2016/12/09/data-driven-decisions/
Hope this helps you out.
Jon