Outage of chaos.social 04.02.2020

We ran into a problem with our qcow2 image (again). The image seems to be corrupted, which caused the used storage to grow far beyond the set limits. We didn't notice that earlier.
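
One way such growth can be caught earlier is to compare the image's on-disk size against a configured limit. Below is a minimal monitoring sketch using qemu-img info; the image path and the limit are hypothetical examples, not our actual setup.

    #!/usr/bin/env python3
    """Warn when a qcow2 image occupies more space on disk than a set limit."""
    import json
    import subprocess

    IMAGE = "/var/lib/libvirt/images/mastodon.qcow2"  # hypothetical path
    LIMIT_BYTES = 500 * 1024**3                       # hypothetical limit: 500 GiB

    # qemu-img info --output=json reports the guest-visible (virtual) size
    # and the space the image file actually occupies on the host.
    info = json.loads(
        subprocess.check_output(
            ["qemu-img", "info", "--output=json", IMAGE], text=True
        )
    )

    if info["actual-size"] > LIMIT_BYTES:
        print(f"WARNING: {IMAGE} uses {info['actual-size']} bytes on disk "
              f"(virtual size {info['virtual-size']}), limit is {LIMIT_BYTES}")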

To fix the problem, we have asked the datacenter provider to add additional storage capacity to our server. We need this additional storage because there is not enough capacity left for the repair process.

So for the moment we have to wait for the installation of the new disk.

After that we have to copy the whole disk twice: to the new disk for repairing, and back to the RAID for production. This will take some time as well, despite the underlying NVMe SSDs.
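
For reference, the rough shape of that procedure looks like the sketch below. The paths are hypothetical and the exact commands we run may differ; this only illustrates the two copies and the repair step in between.

    #!/usr/bin/env python3
    """Sketch of the two-copy recovery: move the image to the spare disk,
    repair it there, then write a clean copy back to the RAID."""
    import subprocess

    RAID_IMAGE  = "/srv/raid/mastodon.qcow2"   # hypothetical path on the RAID
    SPARE_COPY  = "/mnt/spare/mastodon.qcow2"  # hypothetical path on the new disk
    CLEAN_IMAGE = "/srv/raid/mastodon-clean.qcow2"

    def run(*cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # First copy: get the corrupted image off the full RAID onto the new disk.
    run("cp", "--sparse=always", RAID_IMAGE, SPARE_COPY)

    # Repair what qemu-img can fix in the copy (leaked clusters, refcounts).
    run("qemu-img", "check", "-r", "all", SPARE_COPY)

    # Second copy: converting qcow2 to qcow2 rewrites the image compactly
    # and writes the production copy back onto the RAID.
    run("qemu-img", "convert", "-O", "qcow2", SPARE_COPY, CLEAN_IMAGE)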

Please be patient.

Update 08:20 CET: We were informed about the outage by a user. Thanks!

Update 08:30 CET: We posted a short message about the outage on Twitter.

Update 08:35 CET: We found the root cause of the outage.

Update 08:46 CET: We requested an additional disk from our hosting provider.

Update 10:00 CET: The new disk is installed and recovery is in progress!

Update 10:35 CET: Progress is quite slow; the current ETA for the first recovery process is 14:00 CET.

Update 11:13 CET: We were able to increase the speed by an order of magnitude. The first process is done. We are going to check everything now and then start the second recovery process.

Update 11:21 CET: Checks look good; we are now proceeding with the second recovery process. ETA 12:00 CET.

Update 11:57 CET: The second process is taking a little longer than expected. ETA for full recovery: 13:00 CET.

Update 12:16 CET: The second process has just finished. We are now bringing everything back online.

Update 12:20 CET: We are back online!