
Lemmy/Mbin/PieFed communities on Mastodon part 2: RSS to the rescue

When we left off, I’d been trying to create a simple tool to improve Mastodon users’ experience of interacting with communities on Lemmy, Mbin and PieFed (or “LemBinFed” as I like to call them1). This would be a bot that would only boost top-level posts from a community, to save Mastodon users from having to sift through out-of-context reply posts to find them.

The plan was initially to make this bot using only Python and the Mastodon API: follow the community, read recent posts on the bot’s timeline, and filter out replies using the status.in_reply_to_id attribute. However, there is a critical problem with Mastodon: you can’t follow a LemBinFed community if there is a LemBinFed user with the same handle!
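
To make the dead end concrete, here’s roughly what that first plan boils down to in Mastodon.py (a minimal sketch with placeholder credentials and instance names, not the real bot code):

from mastodon import Mastodon

# Placeholder instance and credentials, just to illustrate the idea
mastodon = Mastodon(
    client_id="clientcred.secret",
    api_base_url="https://example.social",
)
mastodon.log_in("bot@example.social", "password")

# Step one would be following the community, which is exactly where
# this approach falls apart. Assuming that worked, the rest is just
# reading the home timeline and skipping anything that's a reply.
for status in mastodon.timeline_home(limit=40):
    if status.in_reply_to_id is None and not status.reblogged:
        mastodon.status_reblog(status.id)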

An "it's the same picture" meme from The Office, where the two options are the URLs and search queries for Lemmy users vs groups. Mastodon is confidently saying they're the same.

Sit here and RSS a while

We obviously need a different way to capture the latest top-level posts from places like linux@lemmy.ml, and preferably one that doesn’t need as many API requests—I can imagine my initial approach not being particularly scalable.

Good thing we have such an alternative in the venerable RSS feed! Most people might know this as the little RSS icon thingy you find on many websites, or as the underlying technology behind many podcast apps. You can even follow an RSS feed of this website. It shouldn’t be too surprising, then, that you can follow RSS feeds for LemBinFed communities (or even feeds of the comments on individual posts), with any post ordering you want.
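
To give a concrete (if hypothetical) example, a Lemmy community’s feed lives at a URL along these lines, and feedparser will happily read it; Mbin and PieFed have their own equivalents, so treat the exact path as an assumption rather than gospel:

import feedparser

# Assumed URL shape for a Lemmy community feed, sorted newest-first
feed = feedparser.parse("https://lemmy.ml/feeds/c/linux.xml?sort=New")

for entry in feed.entries:
    print(entry.title, entry.link, entry.published)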

I’m not the first person to have this idea—here’s one example of a bot script that does a similar thing in NodeJS. I opted to add to my existing Python script instead, because I’d already wrapped my head around the Mastodon API in Python, and I’d prefer to keep making changes in Python if needed down the line.

Here’s the gist of what my code is doing (heavily abridged for clarity, so don’t expect to paste it anywhere and it to work—here’s the complete version):

from mastodon import Mastodon # Talks to Mastodon API
import feedparser # Reads RSS feeds
import config # Local script containing config variables
"""Some other imports"""

"""Skip a bit"""

# Create Mastodon client
mastodon = Mastodon(
    client_id = CLIENT_CRED_FILE,
    api_base_url = config.API_BASE_URL,
)

# Log in
mastodon.log_in(
    config.USERNAME,
    config.PASSWORD,
)

# The bot's own account, used later to avoid boosting its own posts
this_account = mastodon.me()

# RSS search loop
while True:
    f = feedparser.parse(feed_url)

    # f contains results from the first page - typically 20 entries
    entries = f.entries

    for entry in entries:
        
        # Filter for entries newer than twice TIME_TO_SLEEP to allow
        # some redundancy and overlap. E.g. to account for short 
        # disconnections
        if (time.time() - timegm(entry.published_parsed)) < \
            (2*TIME_TO_SLEEP):

            # Search Mastodon using the post's URL
            search_result = mastodon.search(q=entry.link,
                result_type="statuses")

            # Skip entries that don't resolve to a status on our instance
            if not search_result.statuses:
                continue

            status = search_result.statuses[0]
            domain = urlparse(status.url).netloc

            # Avoid reboosting, self-boosting and filtered 
            # users/servers
            if not status.reblogged and \
                status.account.acct != this_account.acct and \
                status.account.acct not in config.IGNORE_AUTHORS and \
                domain not in config.IGNORE_SERVERS:

                    # Boost the post!
                    mastodon.status_reblog(status.id)

    time.sleep(TIME_TO_SLEEP)
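
For context, config.py is just a plain Python file holding the settings the script refers to. Here’s a sketch of what it could contain (all values are placeholders, not my real configuration):

# config.py (placeholder values for illustration only)
API_BASE_URL = "https://bots.example.online"    # the bot's home instance
USERNAME = "linux_bot"                          # bot account login
PASSWORD = "correct-horse-battery-staple"       # bot account password
CLIENT_NAME = "community_boost_bot"             # used in error messages
ADMIN_USER = "luke@example.social"              # who gets error DMs
IGNORE_AUTHORS = ["spammer@bad.example"]        # accounts never to boost
IGNORE_SERVERS = ["bad.example"]                # servers never to boost

The feed URL and TIME_TO_SLEEP can live in the same place; either way, the idea is that everything instance-specific stays out of the main script.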

Another big benefit is that this script uses the Mastodon API much less than the old version. Back then, it was making API requests to check every single post and reply we came across from a given community. Now the API calls are pretty much the last stage, after likely-valid posts have been identified with RSS. This is great for scalability!

Crashing loudly

Another problem I had with my original code was that I wouldn’t know if anything went wrong. If it crashed for some reason, it would crash quietly, and I’d only find out by noticing that the bot had stopped boosting posts that might be very infrequent to begin with.

I also solved this, by adding a variable to config.py called ADMIN_USER: a fediverse account that the script will privately message if there is a problem.

Most of the main script is now wrapped in a try ... except block like so:

import traceback

# Look up the account that error messages will be sent to
admin_account = exact_search(mastodon, config.ADMIN_USER)

try:
    ...  # Main code block (the RSS search loop from above)
except Exception:
    # Record the error message 
    msg = traceback.format_exc()

    # DM the admin account with the traceback message
    if admin_account is not None:
        mastodon.status_post("@{} ".format(config.ADMIN_USER) + \
            "Something went wrong with {} bot: ".format(config.CLIENT_NAME) + \
            msg, visibility='direct')
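
(exact_search is a small helper from the complete script; I haven’t shown it here, but here’s a minimal sketch of the sort of thing it could look like, assuming it wraps Mastodon.py’s account_search and insists on an exact handle match.)

def exact_search(mastodon, acct):
    # Sketch only: return the account whose handle matches acct exactly,
    # or None if no such account is found.
    for account in mastodon.account_search(acct):
        if account.acct.lstrip("@") == acct.lstrip("@"):
            return account
    return None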

The first couple of times this worked and I got useful debug info via a private message from a bot, I’ll admit I felt quite clever.

Nothing is ever finished

This script isn’t perfect, of course, and there are always to-dos. For example, the script doesn’t currently unboost deleted posts (whether deleted by the user or by mods), so it could end up having trouble with spam or dubious content; one possible clean-up approach is sketched below. It also doesn’t handle cross-posts very well: these are treated as separate posts for each followed instance, so they show up on the Mastodon timeline as duplicates.
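
For the deleted-posts problem, one possible approach (a sketch only, assuming Mastodon.py and that the bot kept a record of the status IDs it has boosted, which it currently doesn’t) would be a periodic clean-up pass:

from mastodon import MastodonNotFoundError

def unboost_deleted(mastodon, boosted_ids):
    # Withdraw boosts whose original posts no longer exist.
    # boosted_ids is hypothetical: the bot would need to record the ID
    # of every status it boosts for this to work.
    for status_id in boosted_ids:
        try:
            mastodon.status(status_id)
        except MastodonNotFoundError:
            mastodon.status_unreblog(status_id)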

For now though, these are problems for the future. If I wanted perfection, this would never see the light of day!

The bot zoo

I used this script to run a couple of test bots on mastodon.world, which happily chugged away for a couple of weeks while I had other things going on. At this point (and because mastodon.world started requiring approval for new sign-ups) I was confident enough to set up a new instance for my bot zoo using FediHost.

Why set up a new instance and not just add them to fedi.lukejohns.online? I can’t work out how to change this from a single-user instance to a multi-user one.

Why use FediHost and not your Raspberry Pi under the bed? I’m not sure the Pi would be able to handle too much more traffic, and although it is possible, hosting two instances on one local network is too far outside my comfort zone for such an inconsequential project.

So without any further ado, I am pleased to introduce my bot zoo: bots.lukejohns.online. Here is a list of the current bot accounts, which will be updated if there are any changes:


  1. I don’t actually like to call them that, but I have no better idea of how to collectively describe “Reddit-like ActivityPub-based link-aggregator platforms”. If you have any suggestions, please leave a comment! ↩︎

Comments

You can use your Mastodon or other Fediverse account to comment on this post by replying to this thread.

Alternatively, you can reply to this Bluesky thread using a bridged Bluesky account.

Learn how this is implemented here.