Sakurajima Downtime Post Mortem and Mitigations


Sakurajima Social, which runs Sharkey (a fork of Misskey), has had several outages recently, most of them stemming from the most recent security update. I want to at least address what happened, for transparency's sake.
  1. First, the instance became unstable after the security patch was installed and crashed every few minutes. We traced this to a bot (Applebot) that was trying to load the streaming WebSocket and sending invalid data to it, which caused the crash. We have blocked the bot for now, and hopefully the crash itself will get patched soon.
  2. The second outage was due to issues with Redis, reportedly caused by a misconfiguration. That has been fixed.
  3. The third outage started yesterday morning and was not caught until this morning, when I was finally notified through a forum post. I think the instance crashed or hung and could not restart because the service was using a run script I had added while trying to debug issue #1. The systemd service config has been reverted so that it no longer uses the start script.
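The crash in item 1 is the kind of failure that defensive input handling is meant to prevent. The sketch below is not Sharkey's code and says nothing about the actual bug, which isn't described here; it only illustrates, in Python, rejecting a malformed message instead of letting an exception take the process down. The function name `parse_stream_message` and the assumption that a valid message is a JSON object with a string `type` field are hypothetical, chosen for illustration.

```python
import json
from typing import Any, Optional


def parse_stream_message(raw: Any) -> Optional[dict]:
    """Return the decoded message, or None if the input is malformed.

    Hypothetical helper: the expected shape (a JSON object with a string
    "type" field) is an assumption made for this sketch.
    """
    # Only text or bytes frames can contain JSON.
    if not isinstance(raw, (str, bytes, bytearray)):
        return None
    try:
        message = json.loads(raw)
    except ValueError:
        # Covers json.JSONDecodeError and UnicodeDecodeError.
        return None
    # Valid JSON can still be the wrong shape (a list, a number, ...).
    if not isinstance(message, dict):
        return None
    if not isinstance(message.get("type"), str):
        return None
    return message
```

A caller would drop the message, or close the connection, when the function returns `None`, so that one bad client cannot crash the whole instance.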
To prevent another full day of downtime, I have finally set up uptime monitoring again. The previous monitor had been down since my personal server crashed last September, and I had not had the chance to bring up a new one. The new monitoring sends email alerts, so I can fix issues as soon as they arise instead of being out of the loop. The status page is now located at:
 
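For anyone curious what an uptime check with email alerts boils down to, here is a minimal sketch. It is not the monitoring tool used for this instance; the function names, the alert wording, and the default SMTP host are all assumptions made for illustration, and it uses only the Python standard library.

```python
import http.client
import smtplib
import urllib.error
import urllib.request
from email.message import EmailMessage


def is_up(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # HTTP errors, refused connections, timeouts, and malformed URLs
        # all count as "down" for alerting purposes.
        return False


def build_alert(url: str, sender: str, recipient: str) -> EmailMessage:
    """Build the alert email (hypothetical wording)."""
    message = EmailMessage()
    message["Subject"] = f"DOWN: {url} failed its health check"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(f"The health check for {url} failed.")
    return message


def check_and_alert(url: str, sender: str, recipient: str,
                    smtp_host: str = "localhost") -> bool:
    """Check the URL once and send an email if it is down."""
    if is_up(url):
        return True
    # Assumes an SMTP server is reachable at smtp_host without authentication.
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(build_alert(url, sender, recipient))
    return False
```

Running `check_and_alert` every minute or so from a scheduler on a separate machine is the important part: a monitor that lives on the same server as the instance goes down with it.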