Problems with Events
The cluster was restored at 7:30 PM PST. There was no impact on any metric data, but a subset of events from 1:30 PM to 5:30 PM is not accessible.
We apologize for the inconvenience. The monitoring and infrastructure have been updated, which should help mitigate these issues in the future.
Jul 27, 21:43-23:33 CST
Problem with Data Store Node
We rolled back the change so that we can re-analyze the issue in the staging environment. We will roll out the new code again once we have confirmed that this issue will not recur.
Apr 8, 21:46 - Apr 10, 11:03 CST
Problems with Data Store
The problem with the data store has been resolved. It was caused by the number of open files, and the code has been modified to fix it.
Apr 8, 17:15-19:41 CST
Dashboards Not Updating Regularly
We identified the problem and have modified the meter to track the metric that would have let us detect it earlier. Using Boundary, we can now resolve this kind of problem before it has an impact.
Mar 4, 16:40 - Mar 5, 09:54 CST
Momentary Graph Disruption
At 1:30 PM CST today, several backend services restarted themselves due to a race condition. The display of data was temporarily affected. The services came back online and the system is currently running smoothly. No data was lost in either this disruption or the previous one.
Mar 4, 14:19 CST
At 12:00 PM CST today, several backend services restarted themselves due to a race condition. The display of data was temporarily affected, but no data was lost. The services came back online and the system is currently running smoothly.
Mar 4, 12:12 CST
Boundary Dashboards Not Updating Regularly
We are continuing to make updates to improve the recoverability of the backend components, but we believe we have resolved the primary causes of this incident. As previously stated, it centered on timing issues between the backend services and was exacerbated by an issue in the Linux kernel. We have updated both the backend services and the kernel, and have seen no service interruptions since.
Mar 2, 10:26 - Mar 3, 15:44 CST