After monitoring overnight, we are resolving this incident.
Cause
The run time of one of our scheduled jobs exceeded its scheduled window, which caused a spike in load on one of our databases. This slowed that database and caused the performance degradation.
Prevention Steps
We have improved the performance of that scheduled job.
We have implemented new alarms so that we are notified early of any similar occurrence and can resolve it before customers are impacted.
Monitoring
We implemented a fix and are currently monitoring the results. Based on early results, connection times and recording/streaming events should be working as expected.
UPDATE: We will continue monitoring until Wednesday 14 June 2023, but all systems should be operational and working as expected.
Investigating
We are currently investigating longer connection times and failures when joining rooms. Cloud recording and live streaming may also fail to start or take longer to start.