My mishap with 24x7servermanagement.com
We are a hosting company, and we had a problem on a server whose disks had damaged sectors.
With our system administrator unavailable due to illness, we contacted 24x7servermanagement.com.
We wrote to 24x7servermanagement.com:
We saw in the log:
Nov 4 03:58:11 srv smartd: Device: /dev/sda [SAT], 37 Currently unreadable (pending) sectors
Nov 4 03:58:12 srv smartd: Device: /dev/sdb [SAT], 31 Currently unreadable (pending) sectors
Could you check this?
I purchased their server management service, and they analyzed the problem:
Both SDA and SDB can have the unreadable-sector issue.
For now, I suggest the following:
Replace SDA, then update us and we will rebuild the RAID sync.
Once the RAID sync is complete, we will do the same for SDB.
Datacenter wrote me:
Please make sure beforehand that the GRUB bootloader is installed on both HDDs.
Afterwards, you'll have to reintegrate the HDD into the existing RAID arrays and wait for the resync to finish.
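For reference, the datacenter's two prerequisites (re-adding the replaced disk to the software-RAID arrays, and having GRUB on both disks) might look roughly like this on a typical Linux mdadm setup. The device and array names below are illustrative assumptions, not details taken from the actual server:

```shell
# Assumption: /dev/md0 is a RAID1 array and /dev/sdb is the freshly
# replaced disk; /dev/sda is the surviving disk.

# Copy the partition table from the surviving disk to the new one.
sfdisk -d /dev/sda | sfdisk /dev/sdb

# Re-add the new disk's partition to the existing array.
mdadm --manage /dev/md0 --add /dev/sdb1

# Watch the resync progress until it finishes.
cat /proc/mdstat

# Install the GRUB bootloader on BOTH disks, so the server can still
# boot after either disk is pulled or replaced.
grub2-install /dev/sda
grub2-install /dev/sdb
```

The last two lines are exactly the point of the datacenter's warning: if GRUB lives only on the disk that gets replaced, the server will not boot afterwards, which is what happened here.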
So I wrote to 24x7servermanagement: is the GRUB bootloader installed on both HDDs?
24x7servermanagement wrote to me: it is obvious that GRUB is installed, it is RAID.
The datacenter replaced the first disk (SDB), the server restarted, and the RAID resync completed.
Then the datacenter replaced the second disk (SDA), and the server did not restart. The server was down for over 14 hours because it would not boot, with over 1,000 sites on it offline. 24x7servermanagement responded each time with a different technician, and nobody managed to solve the problem. In despair, I contacted CloudLinux support.
CloudLinux wrote to me:
Your issue was resolved with the grub2-efi-x64 package; the system is online. However, there are filesystem errors, so you probably need to run fsck and perhaps verify the RAID on the system.
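CloudLinux's follow-up advice (check the RAID, then run a filesystem check) could be sketched as follows. The array and filesystem names are illustrative assumptions; fsck must be run on an unmounted filesystem, e.g. from rescue mode:

```shell
# Verify the RAID array is healthy and fully synced
# (assumption: /dev/md0 is the array in question).
mdadm --detail /dev/md0
cat /proc/mdstat

# With the filesystem unmounted, check and repair it.
# -f forces a check even if the FS looks clean; -y answers
# "yes" to repair prompts (assumption: ext4 on /dev/md0).
fsck.ext4 -f -y /dev/md0
```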
Thanks, CloudLinux: they have the best support in the world and always solve problems.
Then, once the server started, the PDNS and cPanel PHP-FPM services did not start. I wrote to 24x7servermanagement and they did not solve the problem, so I wrote to cPanel and they solved it.
In conclusion, my advice: I thought this server management company was good. I was wrong; they are not able to solve problems, they are incompetent, do not trust them.
I did not want to write this review, but 24x7servermanagement made me suffer for over 14 hours; they strung me along and deserve this review.
I asked 24x7servermanagement for reimbursement for the damage they caused our company, and got nothing.
Do not trust 24x7servermanagement.
Response from 24x7servermanagement:
30 November 2018 at 21:22
Please read my reply carefully.
I'm sorry to hear about your bad experience. We're normally known for our exceptional attention to critical issues, and we regret that we took a little more time to resolve this issue, which we had already informed you about.
The work was scheduled to be done in two phases: first, replace the second disk and sync the array; then, replace the first disk and sync the array again. The first phase completed with no issues. Only when the second phase was initiated was the server handed over to us already in rescue mode, without any notification or screenshot explaining why it had been put in rescue mode.
The server's failure to reboot most likely relates to the CloudLinux package, as the same thing has happened to many other people, and CloudLinux support was the only answer left: the package in question, "grub2-efi-x64-2.02-0.65.el7_4.2.cloudlinux.x86_64.rpm", was provided by them, so they already knew what could break a system running CloudLinux if a system using their package did not boot.
A similar incident, which nobody answered for 3 years, can be found at the link below.
It took CloudLinux very little time to resolve because they were already aware of this issue; by the time we were close to a solution, we found that CloudLinux had already fixed the GRUB loader.
On the PHP-FPM service issue you reported: as far as I know, there was an issue with the php-fpm service, which was reported in the later part, was addressed, and was answered with a verification log showing the service online after the fix was tested from our end. Had it already been fixed, you would have received a reply from us stating that the service was already online; we told you it had been fixed because we saw the error, applied the fix, restarted the service, and the service restarted properly.
Your decision to rate us negatively is unfair, as had you trusted us with a little more time, we could have resolved this issue for you. It was just a matter of time and a little faith in us, nothing else.