Post Mortem: Management Console Downtime on October 17, 2024, between 5:40 PM and 6:00 PM

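Between 5:40 PM and 6:00 PM on October 17, 2024, the Management Console was unreachable. The instances themselves remained operational, but users could not interact with them through the console during this window. The underlying issue was on the side of our cloud vendor, Fly.io, who identified and resolved it; console access was restored after approximately 20 minutes.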

Timeline

  • 5:40 PM: Users began reporting issues accessing the Management Console.
  • 5:55 PM: Fly.io started investigating increased proxy errors affecting apps using Flycast internal networking.
    Fly.io status update: "Investigating - We are investigating increased proxy errors for apps communicating over Flycast internal networking."
  • 5:56 PM: Fly.io continued its investigation.
    Fly.io status update: "Update - We are continuing to investigate this issue."
  • 6:00 PM: The Management Console became fully operational again.
  • 6:04 PM: Fly.io reported that it had fixed the error.
    Fly.io status update: "We have identified the issue and deployed a fix. We are seeing recovery in affected apps, with proxy error levels returning to normal. We are continuing to monitor for full recovery."
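
For reference, the intervals implied by the timeline can be worked out with a short Python sketch. The timestamps are taken directly from the entries above; the event labels (first_user_reports and so on) are illustrative names chosen for this example and are not part of any Fly.io tooling.

```python
from datetime import datetime

# Timestamps copied from the timeline (all on October 17, 2024).
# The dictionary keys are illustrative labels, not official event names.
FMT = "%Y-%m-%d %I:%M %p"
events = {
    "first_user_reports": datetime.strptime("2024-10-17 5:40 PM", FMT),
    "vendor_investigating": datetime.strptime("2024-10-17 5:55 PM", FMT),
    "console_restored": datetime.strptime("2024-10-17 6:00 PM", FMT),
    "vendor_fix_reported": datetime.strptime("2024-10-17 6:04 PM", FMT),
}


def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes elapsed between two labelled events."""
    delta = events[end] - events[start]
    return int(delta.total_seconds() // 60)


if __name__ == "__main__":
    # Total time the console was unreachable for users.
    print("Console downtime:", minutes_between("first_user_reports", "console_restored"), "min")
    # Gap between the first user reports and the vendor's first status update.
    print("Until vendor status update:", minutes_between("first_user_reports", "vendor_investigating"), "min")
    # Gap between console recovery and the vendor's fix announcement.
    print("Recovery to fix announcement:", minutes_between("console_restored", "vendor_fix_reported"), "min")
```

This gives a console downtime of 20 minutes, with the first vendor status update arriving 15 minutes after users began reporting problems.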

Root Cause

The downtime was caused by an internal networking issue within Fly.io's infrastructure that affected applications communicating over Flycast internal networking. The issue led to increased proxy errors, which prevented users from accessing the Management Console.

Preventive Measures

  • Multi-Cloud Strategy: While adopting a multi-cloud approach could provide failover options during such incidents, it is currently not feasible for us due to resource constraints.
  • Vendor Evaluation: We will continue to monitor Fly.io's performance closely. Until this incident, we had nothing negative to report about its reliability. Should issues persist, we will consider switching to a more reliable cloud vendor to improve service stability.

Conclusion

We sincerely apologize for the inconvenience caused by this downtime. Our team is committed to providing reliable services and is taking steps to prevent similar issues in the future. We appreciate your understanding and patience.