Getting Entry Points from PyPI

Entry points are used in a lot of my projects, as they’re a fairly simple way of adding plugin points. Recently I was wondering whether there was a tool or site that maps them across PyPI, but I’m not aware of one. I spent a little time prototyping the idea to see how feasible it might be.

PyPI provides a JSON API with basic metadata, but understandably, it’s unreasonable to expect it to include everything. Even though entry point data is not included, we can use it as a starting point.

import requests

class PypiSession(requests.Session):
    def package_data(self, package_name):
        # Fetch the JSON metadata for the project's latest release
        result = self.get(url=f"https://pypi.org/pypi/{package_name}/json")
        result.raise_for_status()
        return result.json()

package = "requests"  # example project name; substitute any project on PyPI

with PypiSession() as session:
    data = session.package_data(package)
    version = data["info"]["version"]
    # Look for a wheel among the files uploaded for the latest version
    for release in data["releases"][version]:
        if release["packagetype"] == "bdist_wheel":
            break
    else:
        raise Exception("No wheel found")

First, we use the PyPI JSON API to get the most recent release and check whether it has a wheel. The wheel contains all the metadata we need. Alternatively, we could download the source distribution and look for a pyproject.toml, setup.cfg, or setup.py, but the wheel should be easier to work with.
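The JSON response also has a `urls` key listing the files uploaded for the latest version, each with `packagetype`, `filename` and `yanked` fields. As a minimal sketch, assuming that layout, a helper could skip yanked files and prefer a pure-Python (`-none-any`) wheel, falling back to any other wheel. The name `pick_wheel` is hypothetical, and the sample data in the test is made up.

```python
from __future__ import annotations

from typing import Any


def pick_wheel(data: dict[str, Any]) -> dict[str, Any] | None:
    """Return a wheel file entry for the latest release, or None.

    Assumes the PyPI JSON API layout, where data["urls"] lists the files
    of the latest version and each entry has "packagetype", "filename"
    and "yanked" keys.
    """
    wheels = [
        f
        for f in data.get("urls", [])
        if f.get("packagetype") == "bdist_wheel" and not f.get("yanked", False)
    ]
    if not wheels:
        return None
    # Prefer a pure-Python wheel; entry_points.txt is normally the same in
    # every wheel of a release, so a platform wheel works as a fallback.
    for f in wheels:
        if f["filename"].endswith("-none-any.whl"):
            return f
    return wheels[0]
```

With a live session this would be called as `release = pick_wheel(session.package_data(package))`.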

Next, we download the wheel somewhere we can read it.

import logging
from pathlib import Path

import requests

logger = logging.getLogger(__name__)
CACHE_DIR = Path("cache")  # stand-in for settings.CACHE_DIR in the Django prototype

class PypiSession(requests.Session):
    ...
    def package_download(self, url, target: Path) -> Path:
        # Stream the response to disk in chunks rather than holding it in memory
        with self.get(url, stream=True) as result:
            result.raise_for_status()
            with target.open("wb") as f:
                for chunk in result.iter_content(chunk_size=8192):
                    f.write(chunk)
        return target


with PypiSession() as session:
    # Download file to cache: <cache>/<project name>/<wheel filename>
    path: Path = CACHE_DIR / data["info"]["name"] / release["filename"]
    if not path.exists():
        logger.debug("Creating parent directory %s", path.parent)
        path.parent.mkdir(parents=True, exist_ok=True)
        logger.debug("Downloading %s", release["url"])
        session.package_download(url=release["url"], target=path)

We use the project name and file name from the earlier API call to build a cache path, then download the wheel to our local machine. A production version would likely want to do more checks, but for our purposes we can probably trust what PyPI gives us.
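One such check: each file entry in the JSON response carries a `digests` dictionary that includes a `sha256` hex digest. A minimal sketch of verifying the downloaded wheel against it, using only `hashlib`; the helper name `sha256_matches` is hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_matches(path: Path, expected_hex: str, chunk_size: int = 8192) -> bool:
    """Hash the file in chunks and compare it with the expected hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so a large wheel is never loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

After downloading, something like `if not sha256_matches(path, release["digests"]["sha256"]): path.unlink()` would keep a corrupt download out of the cache.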

With the file on disk, we want to pull out the entry points. Since a wheel is a zip file, we can build on ZipFile from the standard library.

import logging
from zipfile import ZipFile

logger = logging.getLogger(__name__)

class Wheel(ZipFile):
    def entry_points(self) -> str | None:
        for fn in self.filelist:
            # Looping is simpler than building the path, because the parent directory
            # is named after the package and version (<name>-<version>.dist-info)
            if fn.filename.endswith(".dist-info/entry_points.txt"):
                logger.debug("Examining entry_points %s", fn.filename)
                with self.open(fn) as fp:
                    return fp.read().decode("utf8")
        # Not every package declares entry points
        return None

# Process wheel
with Wheel(path) as wheel:
    if contents := wheel.entry_points():
        print(contents)
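A wheel keeps its metadata in a top-level `{distribution}-{version}.dist-info` directory, so a stricter match than a filename suffix is to require exactly that layout. The sketch below assumes that layout, uses a hypothetical helper name `read_entry_points_txt`, and builds a tiny fake wheel in memory so it runs without downloading anything.

```python
from __future__ import annotations

import io
import zipfile


def read_entry_points_txt(wheel: zipfile.ZipFile) -> str | None:
    """Return the text of <name>-<version>.dist-info/entry_points.txt, if any."""
    for name in wheel.namelist():
        parts = name.split("/")
        # Only match the top-level .dist-info directory, not a file that
        # merely happens to be called entry_points.txt inside the package
        if len(parts) == 2 and parts[0].endswith(".dist-info") and parts[1] == "entry_points.txt":
            return wheel.read(name).decode("utf-8")
    return None


# Build a tiny in-memory "wheel" (hypothetical contents) to exercise the function
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("demo/__init__.py", "")
    zf.writestr("demo/entry_points.txt", "not metadata")
    zf.writestr("demo-1.0.dist-info/entry_points.txt", "[console_scripts]\ndemo = demo:main\n")

with zipfile.ZipFile(buffer) as zf:
    print(read_entry_points_txt(zf))
```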

Now we have the entry_points data from the wheel, which we can process further with tools already built into the standard library. If we were using entry points in our own application, we might read a group with the following code.

from importlib import metadata

# Selecting by the group keyword requires Python 3.10+
for ep in metadata.entry_points(group='mygroup'):
    print(ep)
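Each `EntryPoint` carries a `name`, a `group` and a `value` of the form `module:attribute`; calling `load()` imports the module and returns the attribute, which is what makes entry points useful as plugin hooks. A small sketch with a hand-built entry point (the `dumps` entry in `mygroup` is hypothetical and simply points at `json.dumps`):

```python
from importlib import metadata

# A hand-built EntryPoint, equivalent to this entry_points.txt section:
#   [mygroup]
#   dumps = json:dumps
ep = metadata.EntryPoint(name="dumps", value="json:dumps", group="mygroup")

# load() imports the module before the colon and returns the attribute after it
plugin = ep.load()
print(ep.name, ep.group, plugin)
```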

Here, though, we want to parse the entry points from the metadata we downloaded, rather than from the metadata of an installed package.

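from importlib import metadata

entry_points: list[metadata.EntryPoint] = []
with Wheel(path) as wheel:
    if contents := wheel.entry_points():
        # _from_text is a private helper (Python 3.10+) that returns a one-shot
        # iterator, so materialize it into a list we can iterate more than once
        entry_points = list(metadata.EntryPoints._from_text(contents))

for ep in entry_points:
    print(ep)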
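`EntryPoints._from_text` is a private helper, as the leading underscore signals, so it may change between Python versions. Since entry_points.txt is an INI-style file, a sketch using only public APIs could parse it with `configparser` and build `metadata.EntryPoint` objects directly. This assumes typical files: `=` as the only delimiter, case-sensitive names, and no interpolation. The function name `parse_entry_points` is hypothetical.

```python
import configparser
from importlib import metadata


def parse_entry_points(text: str) -> list[metadata.EntryPoint]:
    """Parse entry_points.txt contents into EntryPoint objects.

    Uses only public APIs: configparser for the INI-style file and the
    EntryPoint constructor from importlib.metadata.
    """
    # Only "=" separates name from value, since values contain ":" (module:attr)
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    # ConfigParser lower-cases keys by default; entry point names are case-sensitive
    parser.optionxform = str
    parser.read_string(text)
    return [
        metadata.EntryPoint(name=name, value=value, group=group)
        for group in parser.sections()
        for name, value in parser.items(group)
    ]
```

Because this returns a plain list, the result can be iterated as many times as needed, for example once to print and again to store.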

With this, we can store the data in whatever format we need to generate reports. For my prototype I’m using Django, so my code looks something like this.

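from django.db import models

class EntryPoint(models.Model):
    # Django requires max_length for CharField on most database backends
    project = models.CharField(max_length=255)
    name = models.CharField(max_length=255)
    group = models.CharField(max_length=255)

for ep in entry_points:
    # get_or_create avoids duplicate rows when a package is processed again
    EntryPoint.objects.get_or_create(
        project=data["info"]["name"],
        name=ep.name,
        group=ep.group,
    )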
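For readers not using Django, the same store-and-query idea can be sketched with the standard-library `sqlite3` module. The table layout and sample rows here are hypothetical; `INSERT OR IGNORE` against a `UNIQUE` constraint plays roughly the role of `get_or_create`, and the final query answers the kind of question such a site would need: which projects provide entry points in a given group.

```python
import sqlite3

# Hypothetical rows: (project, group, name) as extracted from wheels
rows = [
    ("demo", "console_scripts", "demo"),
    ("demo", "mygroup", "plugin"),
    ("other", "mygroup", "extra-plugin"),
]

conn = sqlite3.connect(":memory:")
# "group" is an SQL keyword, so the column name is quoted
conn.execute(
    'CREATE TABLE entry_point (project TEXT, "group" TEXT, name TEXT, '
    'UNIQUE (project, "group", name))'
)
# Inserting the same rows twice leaves one copy of each, like get_or_create
conn.executemany("INSERT OR IGNORE INTO entry_point VALUES (?, ?, ?)", rows)
conn.executemany("INSERT OR IGNORE INTO entry_point VALUES (?, ?, ?)", rows)

# Which projects provide entry points for a given group?
projects = [
    project
    for (project,) in conn.execute(
        'SELECT DISTINCT project FROM entry_point WHERE "group" = ? ORDER BY project',
        ("mygroup",),
    )
]
print(projects)  # ['demo', 'other']
```

In the Django version, the equivalent query would be something like `EntryPoint.objects.filter(group="mygroup").values_list("project", flat=True).distinct()`.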

My next step would be to work out how to display the data and allow queries on my Django site, but that part is not especially interesting, so I won’t document it here. I’m not sure how viable this is as a small project, how much utility it would provide, or how it would scale, but it is still interesting to poke around at the internals and see how things work.