Autosort With Raindrop

Tags:
celery django python

One advantage of having my own personal API is that I can keep various useful scripts in a single repo and have them run on a schedule. I have been using Raindrop for several years to collect bookmarks to read later. While researching things, it is often useful to have bookmarks grouped into collections automatically, so I wrote some Celery tasks to help with this.

To start with, I have a dictionary that maps hostnames to the collection each site should end up in.

MAPPING = {
    # Various amazon domains to their own collection
    "www.amazon.com": "auto:amazon",
    "www.amazon.co.jp": "auto:amazon",
    # When I find a neat app, I can filter to a collection
    "itunes.apple.com": "auto:apps",
    "apps.apple.com": "auto:apps",
    # and many more mappings
}

# For things like wikipedia that have multiple domains, I'll add them to the
# mapping this way
for host in [
    "en.m.wikipedia.org",
    "en.wikipedia.org",
    "ja.m.wikipedia.org",
    "ja.m.wiktionary.org",
    "ja.wikipedia.org",
]:
    MAPPING[host] = "auto:wiki"

Now that I have my MAPPING, I can loop through recent links. The system collections documentation says that the unsorted collection has the ID -1, so I can fetch some of my unsorted items and sort them. I parse each link so that I can look up its host in MAPPING, and for every match I pass the bookmark off to another Celery task that moves it into the right collection.

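from urllib.parse import urlparse

from celery.schedules import crontab
# periodic_task lives in celery.task on Celery 4 and earlier
from celery.task import periodic_task


# Runs at minute 0 of every hour
@periodic_task(run_every=crontab(minute=0))
def unsorted():
    cache = Collections()
    # get() is a helper, not shown here, that yields each bookmark as a dict
    # https://developer.raindrop.io/v1/raindrops/multiple#get-raindrops
    for link in get("https://api.raindrop.io/rest/v1/raindrops/-1"):
        url = urlparse(link["link"])
        if url.netloc in MAPPING:
            collection_name = MAPPING[url.netloc]
            collection_obj = cache[collection_name]
            to_collection.delay(collection_id=collection_obj["_id"], raindrop=link)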
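
To see the lookup step on its own, here is a minimal sketch with no network calls. The `pick_collection` helper, `SAMPLE_MAPPING`, and the sample links are made up for illustration; only `urlparse` and the shape of the mapping come from the code above.

```python
from urllib.parse import urlparse

# Trimmed-down copy of the mapping, for illustration only.
SAMPLE_MAPPING = {
    "www.amazon.com": "auto:amazon",
    "apps.apple.com": "auto:apps",
    "en.wikipedia.org": "auto:wiki",
}


def pick_collection(link, mapping=SAMPLE_MAPPING):
    """Return the collection name for a link, or None if the host is unmapped."""
    # netloc is the host part of the URL (plus the port, if one is present).
    return mapping.get(urlparse(link).netloc)


wiki = pick_collection("https://en.wikipedia.org/wiki/Bookmark")
apps = pick_collection("https://apps.apple.com/us/app/some-app/id0000000000")
other = pick_collection("https://example.com/unmapped")
```

One caveat: `netloc` keeps any port and the original letter case, so `urlparse(link).hostname` may be a safer key if the saved links are inconsistent.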

There is a bit of magic in my Collections class. I use a custom UserDict so that the first lookup loads my existing collections, and any collection that does not exist yet is created on demand.

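import collections


class Collections(collections.UserDict):
    def __missing__(self, key):
        # First miss on an empty cache: load all existing collections by title
        if not self.data:
            self.data = {c["title"]: c for c in get_all_collections()}
        # Still missing after loading: create the collection and cache it
        if key not in self.data:
            self.data[key] = create_new_collection(key)
        return self.data[key]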
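
The behaviour is easier to follow with the API calls swapped out for stubs. This is only a sketch: `DemoCollections`, `fake_get_all_collections`, `fake_create_new_collection`, and the `calls` counter are hypothetical stand-ins, not part of the real code.

```python
import collections

# Counts how often each stand-in "API call" is made.
calls = {"fetch": 0, "create": 0}


def fake_get_all_collections():
    # Stand-in for the real fetch; pretends one collection already exists.
    calls["fetch"] += 1
    return [{"_id": 1, "title": "auto:wiki"}]


def fake_create_new_collection(title):
    # Stand-in for the real create call; hands back a made-up ID.
    calls["create"] += 1
    return {"_id": 100 + calls["create"], "title": title}


class DemoCollections(collections.UserDict):
    def __missing__(self, key):
        # First miss on an empty cache: load every existing collection once.
        if not self.data:
            self.data = {c["title"]: c for c in fake_get_all_collections()}
        # Still unknown after loading: create it and remember the result.
        if key not in self.data:
            self.data[key] = fake_create_new_collection(key)
        return self.data[key]


cache = DemoCollections()
existing = cache["auto:wiki"]  # triggers the one-time fetch
created = cache["auto:apps"]   # not in the fetched list, so it gets created
again = cache["auto:apps"]     # served from the dict; __missing__ is not called
```

After these three lookups the stubs have been hit once each, which is the point of the cache: one fetch per task run, and one create per genuinely new collection.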

Creating a new collection is easy: I just POST the title to the API.

def create_new_collection(name):
    # https://developer.raindrop.io/v1/collections/methods#create-collection
    result = client.post(
        "https://api.raindrop.io/rest/v1/collection",
        auth=CustomAuthClass(),
        json={"title": name},
    )
    result.raise_for_status()
    return result.json()["item"]

Getting all the collections is slightly more involved, since it requires getting nested collections as well.

from itertools import chain


def get_all_collections():
    auth = CustomAuthClass()

    def root():
        # https://developer.raindrop.io/v1/collections/methods#get-root-collections
        result = client.get("https://api.raindrop.io/rest/v1/collections", auth=auth)
        result.raise_for_status()
        # The collections are in the "items" list of the JSON body
        yield from result.json()["items"]

    def child():
        # https://developer.raindrop.io/v1/collections/methods#get-child-collections
        result = client.get(
            "https://api.raindrop.io/rest/v1/collections/childrens", auth=auth
        )
        result.raise_for_status()
        yield from result.json()["items"]

    return chain(root(), child())
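
Since the Collections cache keys everything by title, it is worth seeing what happens when the root and nested lists are chained together. The sketch below uses made-up sample data and a hypothetical `index_by_title` helper; the real items come from the API and carry more fields.

```python
from itertools import chain

# Made-up sample data standing in for the root and nested collection lists.
root_items = [{"_id": 1, "title": "auto:wiki"}, {"_id": 2, "title": "reading"}]
child_items = [{"_id": 3, "title": "auto:apps"}, {"_id": 4, "title": "reading"}]


def index_by_title(*groups):
    """Merge several iterables of collections into one title -> collection dict."""
    # chain() walks the groups in order, so a later duplicate title wins.
    return {c["title"]: c for c in chain(*groups)}


by_title = index_by_title(root_items, child_items)
```

Here `by_title["reading"]` ends up pointing at the nested collection with ID 4, because it comes later in the chain. Unique names such as the `auto:` prefix used in MAPPING sidestep that collision.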

For each link we queue, we update it with just the new collection ID. Even though the verb is PUT, Raindrop treats it much like PATCH, so you only need to send the specific fields you want to change.

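from celery import shared_task


@shared_task
def to_collection(collection_id, raindrop):
    # https://developer.raindrop.io/v1/raindrops/single#update-raindrop
    result = client.put(
        f"https://api.raindrop.io/rest/v1/raindrop/{raindrop['_id']}",
        # Same auth as the other calls
        auth=CustomAuthClass(),
        # Only the collection changes; other fields are left alone
        json={"collection": {"$id": collection_id}},
    )
    result.raise_for_status()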
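
To make the "only send what changes" point concrete, here is a small offline sketch that builds the request without sending it. `build_move_request` and `API_ROOT` are hypothetical names for illustration; the URL and body shape are copied from the task above.

```python
API_ROOT = "https://api.raindrop.io/rest/v1"


def build_move_request(raindrop, collection_id):
    """Return the (url, json_body) pair for moving one bookmark."""
    url = f"{API_ROOT}/raindrop/{raindrop['_id']}"
    # Only the field being changed is included; title, tags, and so on are omitted.
    body = {"collection": {"$id": collection_id}}
    return url, body


url, body = build_move_request(
    {"_id": 123, "title": "Some page", "tags": ["example"]}, collection_id=456
)
```

Since only `_id` is read from the bookmark, the task could probably be queued with just the ID instead of the whole dict, which would keep the Celery message small.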

With this, I can quickly bookmark a link in Raindrop and then just let my Celery job pick it up and sort it.