Service Incident Update - July 25, 2025 - US1

We want to inform you about a service interruption that occurred on Friday, July 25, 2025, and share the steps we’ve taken to prevent similar issues in the future.

What Happened

MSPintegrations runs on a distributed, redundant infrastructure orchestrated using Kubernetes, which manages our services in pods of containers. These containers are defined in images stored in a container registry service. Under normal conditions, this architecture allows our platform to dynamically scale to meet demand and ensure that customer emails are processed quickly regardless of volume.

Between approximately 9:00 AM and 9:45 AM Pacific Time on July 25, 2025, MSPintegrations experienced an approximately 47-minute service outage affecting email processing, scheduled tasks, the web console, and API access. The root cause was an outage at our primary container registry service, which prevented our Kubernetes infrastructure from scaling to meet demand. When the registry service went offline and began returning 502 errors on image pulls, our system could not deploy new container instances, leading to service unavailability.
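As an illustration only, the sketch below shows one way a registry outage of this kind could be detected from outside the cluster. It assumes the registry exposes the standard version-check endpoint (GET /v2/) defined by the OCI Distribution specification. The function names and the three state labels are hypothetical and do not describe MSPintegrations' actual monitoring.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status from GET /v2/ to a coarse registry state."""
    if status in (200, 401):
        # 401 only means credentials are required; the registry itself answered.
        return "reachable"
    if 500 <= status <= 599:
        # Includes the 502 responses described in this incident.
        return "outage"
    return "unexpected"


def probe_registry(base_url: str, timeout: float = 5.0) -> str:
    """Request the version-check endpoint and classify the result."""
    url = base_url.rstrip("/") + "/v2/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx; the status code is still usable.
        return classify_status(err.code)
    except OSError:
        # Connection failures and timeouts (URLError is a subclass of OSError).
        return "outage"
```

A check like this only confirms that the registry answers requests. It does not prove that a specific image can be pulled, so a fuller check would also attempt an authenticated pull.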

How We Responded

Our monitoring systems immediately detected the outage and alerted us at 9:00 AM. We quickly identified the issue and attempted to fail over to our secondary container registry service. However, authentication configuration issues prevented the immediate switch to our secondary system. We successfully restored service at 9:45 AM when our primary container registry service recovered. We then scaled our infrastructure to double capacity to process the backlog of emails and tasks, completing all catch-up processing by 10:47 AM. Throughout the incident, we provided real-time updates at https://status.mspintegrations.com.

Prevention Measures

We are implementing several improvements to prevent future occurrences. First, we will resolve the authentication issues that prevented failover to our secondary container registry service and document tested procedures for manual failover. We are also working toward automated failover capabilities that will allow our system to switch between multiple container registries without manual intervention. Additionally, we are evaluating whether to replace our primary container registry service with one from a different vendor to reduce dependency risk.
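To make the idea of automated failover concrete, here is a minimal sketch of the selection logic, not a description of the planned implementation. It assumes a priority-ordered list of registries and a health check supplied by the caller. The names select_registry and retarget_image and the example hostnames are hypothetical. Because authentication configuration blocked the failover in this incident, a real health check would likely need to verify credentials as well as availability.

```python
from typing import Callable, Optional, Sequence


def select_registry(
    registries: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first registry, in priority order, that passes the health check."""
    for registry in registries:
        if is_healthy(registry):
            return registry
    # No registry is usable; the caller must decide how to alert or retry.
    return None


def retarget_image(image: str, registry: str) -> str:
    """Swap the registry host of an image reference.

    Assumes the reference always starts with a registry host,
    for example "primary.registry.example/app/worker:1.4".
    """
    _, _, remainder = image.partition("/")
    return f"{registry}/{remainder}"
```

For example, if the health check reports only the secondary registry as healthy, select_registry returns the secondary host, and retarget_image rewrites each image reference to pull from it. Injecting the health check as a parameter keeps the selection logic testable without network access.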

Thank you for your patience during this incident and for your continued trust in MSPintegrations. We are committed to maintaining the reliability you expect from our platform.