Sample output of a pyspeedinsights Excel report

This is my first Python package release. All constructive feedback is welcome.

You can find it on GitHub and PyPI.

What is pyspeedinsights?

In short, pyspeedinsights is a command line tool that parses your sitemap, runs a Lighthouse analysis via the PageSpeed Insights API for every page, and writes the scores/values for each audit in a color-coded manner to an Excel sheet.

You can also run reports on single pages which output to raw JSON or Excel.

All the standard Lighthouse categories (performance, a11y, SEO, etc.) and strategies (desktop, mobile) can be specified via command line arguments.

You can even control which individual metrics are written to Excel on top of the default audits.

Who’s it for?

This is primarily an end-user command line tool, although if anyone’s interested, I’d be willing to improve the developer API to make it more extensible. There are also barely any tests written yet (I know, I’ll get on that).

If you’re looking for a 10k-foot view of your website performance, feel free to give it a spin.

I got the idea for this from a Reddit user who posted his JavaScript version in r/bigseo. Huge credit to u/arduinobits for the great idea.

I knew that the Python community would appreciate having a similar tool, so I got to work!

Challenges

It’s been a long time since I wrote something in Python that wasn’t a Django project. I had to completely teach myself Python packaging from scratch.

This included things like:

  • How to write a setup.cfg file with a console_scripts entry point
  • What to include in pyproject.toml
  • Where to keep tests and how to run them (I kept them separate from the package in a tests dir at the root of the project)
  • How to structure the project in general (I went with the src directory layout)
  • Adding auto re-formatting, import sorting, and linting with black, isort, flake8, and pre-commit hooks. This obviously wasn’t necessary, but it was a great learning experience nonetheless.
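As an illustration of the entry point piece, a console_scripts declaration in setup.cfg looks roughly like this (the module path and function name here are placeholders, not necessarily what the package actually uses):

```ini
[options.entry_points]
console_scripts =
    pyspeedinsights = pyspeedinsights.main:main
```

This is what lets pip install a `pyspeedinsights` command onto the user’s PATH that calls the named function.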

The biggest challenge by far was writing the API module using requests, then realizing that the server-side latency (10s to 1m per request) was far too high for synchronous requests to be remotely useful for bulk analysis. Rookie mistake!

What did I do to fix this?

Packages Used

I knew it would work much better if I made the requests async, so I decided to learn the basics of asyncio and aiohttp.

Once I converted the requests to use aiohttp, I started getting 500 errors from the server, presumably from launching too many requests at once. I eventually figured out how to stagger them with a one-second sleep, which also helped with avoiding the per-minute quota set by Google.

Now, the script launches requests via async tasks and there is no blocking time between API calls.
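The pattern can be sketched with stdlib asyncio alone. The `fetch` coroutine below is a stand-in for the real aiohttp call to the PSI API, and the delays are shortened for illustration (the actual tool sleeps about a second between launches):

```python
import asyncio

async def fetch(url: str) -> str:
    """Stand-in for the real aiohttp GET against the PSI API."""
    await asyncio.sleep(0.01)  # simulate server-side latency
    return f"report for {url}"

async def run_all(urls, stagger: float = 0.01):
    tasks = []
    for url in urls:
        # Launch each request as a task, pausing briefly between
        # launches to avoid overwhelming the API / per-minute quota.
        tasks.append(asyncio.create_task(fetch(url)))
        await asyncio.sleep(stagger)
    # All requests are now in flight concurrently; gather awaits them.
    return await asyncio.gather(*tasks)

results = asyncio.run(run_all([f"https://example.com/page{i}" for i in range(3)]))
print(results)
```

The key point is that the stagger only delays task *launches*; once launched, the slow API calls overlap instead of running back to back.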

Then there was the issue of how to store the API key in a safe and user-friendly manner. Environment variables with a package like python-dotenv are great for development, but they’re not very intuitive for end users.

Luckily, I came across the keyring package which was perfect for this use case. Just generate your key, run one command, and you’re ready to go.
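keyring’s core interface is just `set_password` / `get_password`, keyed by a service name and username. Here’s a minimal in-memory stand-in to show the shape of those calls (the real package delegates to the OS credential store, and the service/username strings below are illustrative, not necessarily what pyspeedinsights uses):

```python
# In-memory stand-in for the keyring interface; the real keyring
# package stores secrets in the OS keychain, not a dict.
_store = {}

def set_password(service, username, password):
    _store[(service, username)] = password

def get_password(service, username):
    # Returns None if no credential was stored, like the real keyring.
    return _store.get((service, username))

# Usage mirrors the real keyring API:
set_password("pyspeedinsights", "psikey", "MY-API-KEY")
api_key = get_password("pyspeedinsights", "psikey")
print(api_key)
```

For end users this reduces setup to generating a key and running a single command, with the OS handling secure storage.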

Writing to Excel is handled by XlsxWriter, an alternative to openpyxl that works great for write-only operations.
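The color coding can be sketched as a mapping from score to cell fill color. The thresholds and palette below follow Lighthouse’s usual buckets and are my assumption, not necessarily the package’s exact scheme:

```python
def score_color(score: float) -> str:
    """Map a 0-100 Lighthouse score to a hex fill color.

    Uses the conventional Lighthouse buckets (90+ good, 50-89 needs
    improvement, below 50 poor); the package's exact palette and
    thresholds may differ.
    """
    if score >= 90:
        return "#00CC66"  # green: good
    if score >= 50:
        return "#FFA400"  # orange: needs improvement
    return "#FF4E42"      # red: poor

# Each color would back an XlsxWriter cell format, along the lines of
# workbook.add_format({"bg_color": score_color(score)}).
print(score_color(95), score_color(70), score_color(30))
```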

Lastly, I decided to keep the command line interface simple by using the built-in argparse library instead of a third-party package like click, although I may move to something more robust in the future.
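An argparse setup for this kind of tool can be sketched as follows. The flag names, choices, and defaults here are hypothetical stand-ins for illustration, not the package’s actual arguments:

```python
import argparse

# Hypothetical flags for illustration -- the package's real
# argument names and defaults may differ.
parser = argparse.ArgumentParser(prog="pyspeedinsights")
parser.add_argument("url", help="sitemap or page URL to analyze")
parser.add_argument("-c", "--category", default="performance",
                    choices=["performance", "accessibility", "best-practices", "seo"])
parser.add_argument("-s", "--strategy", default="desktop",
                    choices=["desktop", "mobile"])
parser.add_argument("-f", "--format", default="excel",
                    choices=["excel", "json"])

args = parser.parse_args(["https://example.com/sitemap.xml", "-s", "mobile"])
print(args.strategy)  # mobile
```

argparse covers this use case well; click mostly adds conveniences like decorators and nested subcommands, which a tool this size doesn’t need yet.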

Conclusion

Thanks for reading! Again, any feedback is welcome. If you run into any issues with it, let me know.

I’m also open to anyone who wants to contribute or improve the package if you find it useful.