Solved

MediaAgent crash after guest file restore


Userlevel 1
Badge +12

Hello,

 

I have an issue with guest file restore: after I start the restore, the MA reports errors and crashes. I am trying to restore Windows files using a RHEL 8 (RH8) MediaAgent. After the restore starts, there is high CPU usage and a warning on the MA.

 

At some point the MA stops responding and has to be restarted.

 

When I do the file restore using a Windows MediaAgent, it works perfectly. As far as I remember, in the past we always used the RH8 MA to restore all kinds of files.

 

We are now on 11.28.48 SP. I also opened Commvault case 230228-312, but they couldn't help.


Best answer by Egor Skepko 13 March 2023, 11:06


20 replies

Userlevel 6
Badge +14

Hi @Egor Skepko ,

In the case, you said that you will use a Windows MA, which is working as expected.

I checked with the engineer and asked to re-open the case and escalate it to Development with the crash dump.

Best Regards,

Sebastien

Userlevel 6
Badge +14

It looks like the issue is Red Hat Enterprise Linux 8.7 crashing because of the latest kernel, for which cvblk driver support is not present.

Can you confirm the version you have please and if so, can you downgrade to RHEL 8.6?

If that’s the issue, we have a fix SP28-HotFix-4090 which will be available in the next Maintenance Release in April.

Otherwise we can provide the Diag in your case and see if that fixes your issue.

 

Userlevel 1
Badge +12

@Sebastien Merluzzi Hello, yes, in the case I said that we are going to use a Windows MA, but I didn't say to close the case.

But yes, please provide me with the fix now so I can test it.

Userlevel 6
Badge +14

Sure, my colleague will contact you.

Userlevel 1
Badge +12

@Sebastien Merluzzi Sorry for the delay. We are running the 4.18.0-425.13.1.el8_7.x86_64 kernel, and we are not going to downgrade to 8.6. So do we just wait for the fix in April?
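For readers comparing their own MediaAgent against the kernel named above, the RHEL minor release can usually be read off the `el8_7` tag in the kernel release string (the output of `uname -r`). The following is only a minimal sketch: it assumes the common `<version>-<release>.el<major>[_<minor>].<arch>` layout, and the function name `parse_kernel_release` is hypothetical, not part of any Commvault tooling.

```python
import re

# Assumed layout: <version>-<release>.el<major>[_<minor>].<arch>
# Example from this thread: 4.18.0-425.13.1.el8_7.x86_64
KERNEL_RE = re.compile(
    r"^(?P<version>\d+\.\d+\.\d+)-(?P<release>[\d.]+)"
    r"\.el(?P<major>\d+)(?:_(?P<minor>\d+))?\.(?P<arch>\w+)$"
)


def parse_kernel_release(release: str) -> dict:
    """Split a RHEL-style kernel release string into its parts.

    The minor release is None when the string has no "_<minor>" suffix.
    """
    match = KERNEL_RE.match(release.strip())
    if match is None:
        raise ValueError(f"unrecognised kernel release string: {release!r}")
    minor = match.group("minor")
    return {
        "version": match.group("version"),
        "release": match.group("release"),
        "major": int(match.group("major")),
        "minor": int(minor) if minor is not None else None,
        "arch": match.group("arch"),
    }


info = parse_kernel_release("4.18.0-425.13.1.el8_7.x86_64")
print(info["major"], info["minor"], info["release"])
```

On a live system the input string could come from `platform.release()` in the Python standard library instead of a hard-coded value.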

Userlevel 6
Badge +14

@Egor Skepko ,

Please work on the case you have with my colleague, as I can see he is scheduling a session with you.

Userlevel 1
Badge +12

@Sebastien Merluzzi We installed the DIAG on one of the MAs (11.28.48) to fix the driver issue on OS version 8.7. After the installation I ran a restore and it works fine. This DIAG will be available next week in 11.28.54 SP.

Userlevel 6
Badge +14

@Egor Skepko That’s correct. Please mark this question as solved.😉

Userlevel 6
Badge +15

Hi, I’m interested in this, as it looks like I am experiencing the same issue.

Userlevel 1
Badge +12

@Laurent There will be a new hotfix next week that should solve your issue as well. Otherwise you can create a case with Commvault or ask them to send you the update.

Userlevel 6
Badge +14

@Laurent @Egor Skepko ,

The Hotfix is in MR54 which will be available on Tuesday 4th April (EST).

So you can log a case with us and we will provide the Diag.

Userlevel 6
Badge +15

Thanks @Sebastien Merluzzi. To speed things up and skip the usual troubleshooting, error reproduction and log collection, is there something I can mention to get the download link to this hotfix right away?

Userlevel 6
Badge +14

@Laurent ,

It looks like the cause is the latest RHEL 8.7 kernel, for which cvblk driver support is not present.

My colleague has your case and will send it to you.

Have a good day 😉

Seb

Userlevel 6
Badge +15

Thanks Sebastien, it worked after I applied the diag 😉

Userlevel 6
Badge +15

Hi guys ! 

 

Adding to this thread as it was quite useful to solve the issue.

Last Wednesday I applied the latest MR, 11.28.56, on top of my 11.28.52 + DIAG installation, which had fixed the issue.

I was then asked to perform a file-level restore from a Windows VM backup, using the same RHEL 8.7 Linux MA, and it failed. I also had new occurrences of ‘kernel:watchdog: BUG: soft lockup - CPU#… stuck for …’

I think I will open a new case, as I don’t think I can apply the same DIAG on top of MR56.

Wasn’t MR54 supposed to include this fix?

 

Regards,

Laurent.

 

Userlevel 6
Badge +14

@Laurent ,

It should be in MR54 already, so MR56 should also have it.

https://documentation.commvault.com/2022e/expert/assets/service_pack/updates/11_28_56.htm

Hotfix 4090:
Soft CPU lockups in _raw_spin_lock on RHEL 8.7 with 4.18.0-425.10.1 kernel
Adding Linux driver support for Debian 10 kernels up to 4.19.0-23

 

But sure, please log a new case and we will check.

Best Regards,

Sebastien
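The "fixed in MR54, so MR56 should have it" reasoning above amounts to a simple version comparison. Below is a minimal sketch of that check, assuming the installed level is written as `11.<feature release>.<maintenance release>` as in this thread (for example `11.28.56`). The helper names `parse_version` and `includes_fix` are hypothetical. It compares version numbers only and, as this thread shows, a level that nominally includes the fix can still show the symptom.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse a string such as '11.28.56' into (11, 28, 56)."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected three dot-separated numbers, got {text!r}")
    major, feature, maintenance = (int(part) for part in parts)
    return major, feature, maintenance


def includes_fix(installed: str, fixed_in: str = "11.28.54") -> bool:
    """Return True if `installed` is at or above `fixed_in`.

    Only maintenance releases of the same feature release are compared;
    anything else is rejected rather than guessed at.
    """
    inst = parse_version(installed)
    fix = parse_version(fixed_in)
    if inst[:2] != fix[:2]:
        raise ValueError("versions belong to different feature releases")
    return inst[2] >= fix[2]


for level in ("11.28.48", "11.28.52", "11.28.56"):
    print(level, includes_fix(level))
```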

 

   
Userlevel 6
Badge +14

@Laurent ,

The logs have rolled over. Can you please reproduce the issue and send us the logs?

I can only see that the Diag has been removed and MR56 installed.

Meanwhile I have sent an email to Development.

I will get back to you.

Best Regards,

Sebastien

Userlevel 6
Badge +15

Thanks for the follow-up, Sebastien. I was off this Wednesday.

I just uploaded the MA logs, mentioning that they relate to Incident 230411-419, since there is no job reference for a simple ‘browse’ session.

 

I am not 100% sure how to interpret what you wrote.

Were you talking about my case/logs showing that the Diag was removed and MR56 installed?

Or just that the Diag is no longer available because MR56 includes it anyway? (Sorry for my blurry mind 😄)

Userlevel 6
Badge +14

@Laurent ,

Development confirmed that the Diag is in 54 and up.

In UpdateInfo.log you can see that we remove all the Diags and then install the MR.

So when I checked your logs I could see we removed the Diag.

1629126 140672227698496 04/05 13:12:54 5371325 All updates to remove = ['linux-x8664_11.0.0B80-SP28_PreRelease-4206-BIN:1101'], sucessfulUpdates = ['linux-x8664_11.0.0B80-SP28_PreRelease-4206-BIN'], list lengths = 1 1

The job ID comes from the Persistent Recovery job. In Jobmanager.log you would see:

[---- RESTORE 3RD PARTY REQUEST ----]

However, on the FREL, /var/log/messages has rolled over, as the Browse was done on 07/04.

We will check the logs you sent then.
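To check on another MediaAgent whether a Diag was removed during an MR install, the UpdateInfo.log line quoted above can be picked apart with a short script. This is a minimal sketch that assumes the line format matches the single sample in this thread (including the log's own spelling `sucessfulUpdates`). The function name `removed_updates` is hypothetical.

```python
import ast
import re

# Pattern derived only from the one UpdateInfo.log line quoted in this thread.
REMOVE_RE = re.compile(
    r"All updates to remove = (?P<to_remove>\[.*?\]), "
    r"sucessfulUpdates = (?P<done>\[.*?\])"
)


def removed_updates(line: str):
    """Return (updates_to_remove, successful_removals) or None if no match."""
    match = REMOVE_RE.search(line)
    if match is None:
        return None
    # Both lists are printed as Python-style list literals of strings.
    to_remove = ast.literal_eval(match.group("to_remove"))
    done = ast.literal_eval(match.group("done"))
    return to_remove, done


sample = (
    "1629126 140672227698496 04/05 13:12:54 5371325 All updates to remove = "
    "['linux-x8664_11.0.0B80-SP28_PreRelease-4206-BIN:1101'], sucessfulUpdates = "
    "['linux-x8664_11.0.0B80-SP28_PreRelease-4206-BIN'], list lengths = 1 1"
)
print(removed_updates(sample))
```

Running the function over every line of the log and keeping the results that are not `None` lists each removal pass recorded in the file.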

Userlevel 6
Badge +15

Thanks for the details.

Here is a live extract of what I have seen on the server since the browse request:

[root@MyLinuxMAanonymized Log_Files]#
Message from syslogd@MyLinuxMAanonymized at Apr 13 14:22:14 ...
 kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [CVODS:2921670]
Apr 13 14:22:14 MyLinuxMAanonymized kernel: CPU: 1 PID: 2921670 Comm: CVODS Kdump: loaded Tainted: P        W  OEL   --------- -  - 4.18.0-372.32.1.el8_6.x86_64 #1

Message from syslogd@MyLinuxMAanonymized at Apr 13 14:22:46 ...
 kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [CVODS:2921670]
Apr 13 14:22:46 MyLinuxMAanonymized kernel: CPU: 0 PID: 2921670 Comm: CVODS Kdump: loaded Tainted: P        W  OEL   --------- -  - 4.18.0-372.32.1.el8_6.x86_64 #1

Message from syslogd@MyLinuxMAanonymized at Apr 13 14:23:03 ...
 kernel:watchdog: BUG: soft lockup - CPU#50 stuck for 22s! [CVODS:2921662]
Apr 13 14:23:03 MyLinuxMAanonymized kernel: CPU: 50 PID: 2921662 Comm: CVODS Kdump: loaded Tainted: P        W  OEL   --------- -  - 4.18.0-372.32.1.el8_6.x86_64 #1

Message from syslogd@MyLinuxMAanonymized at Apr 13 14:23:11 ...
 kernel:watchdog: BUG: soft lockup - CPU#22 stuck for 23s! [CVODS:2921670]
Apr 13 14:23:11 MyLinuxMAanonymized kernel: CPU: 22 PID: 2921670 Comm: CVODS Kdump: loaded Tainted: P        W  OEL   --------- -  - 4.18.0-372.32.1.el8_6.x86_64 #1

Message from syslogd@MyLinuxMAanonymized at Apr 13 14:23:27 ...
 kernel:watchdog: BUG: soft lockup - CPU#26 stuck for 23s! [CVODS:2921662]
Apr 13 14:23:27 MyLinuxMAanonymized kernel: CPU: 26 PID: 2921662 Comm: CVODS Kdump: loaded Tainted: P        W  OEL   --------- -  - 4.18.0-372.32.1.el8_6.x86_64 #1

Message from syslogd@MyLinuxMAanonymized at Apr 13 14:23:42 ...
 kernel:watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [CVODS:2921670]
Apr 13 14:23:42 MyLinuxMAanonymized kernel: CPU: 4 PID: 2921670 Comm: CVODS Kdump: loaded Tainted: P        W  OEL   --------- -  - 4.18.0-372.32.1.el8_6.x86_64 #1

Message from syslogd@MyLinuxMAanonymized at Apr 13 14:23:51 ...
 kernel:watchdog: BUG: soft lockup - CPU#58 stuck for 22s! [CVODS:2921662]
Apr 13 14:23:51 MyLinuxMAanonymized kernel: CPU: 58 PID: 2921662 Comm: CVODS Kdump: loaded Tainted: P        W  OEL   --------- -  - 4.18.0-372.32.1.el8_6.x86_64 #1

Message from syslogd@MyLinuxMAanonymized at Apr 13 14:24:11 ...
 kernel:watchdog: BUG: soft lockup - CPU#36 stuck for 23s! [CVODS:2921670]
Apr 13 14:24:11 MyLinuxMAanonymized kernel: CPU: 36 PID: 2921670 Comm: CVODS Kdump: loaded Tainted: P        W  OEL   --------- -  - 4.18.0-372.32.1.el8_6.x86_64 #1

Message from syslogd@MyLinuxMAanonymized at Apr 13 14:24:18 ...
 kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [CVODS:2921662]
Apr 13 14:24:18 MyLinuxMAanonymized kernel: CPU: 2 PID: 2921662 Comm: CVODS Kdump: loaded Tainted: P        W  OEL   --------- -  - 4.18.0-372.32.1.el8_6.x86_64 #1

[root@MyLinuxMAanonymized Log_Files]# uptime
 14:24:26 up 5 days, 21:08,  1 user,  load average: 38.35, 22.22, 11.21
[root@MyLinuxMAanonymized Log_Files]#
 

(Of course, the MA hostname is different in reality and in the case.)
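Soft lockup messages like the ones in the extract above can be tallied per process to show which tasks are stuck and on how many CPUs. The following is a minimal sketch, assuming the message format matches the lines pasted in this thread. The function name `summarize_lockups` is hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "BUG: soft lockup - CPU#1 stuck for 23s! [CVODS:2921670]"
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def summarize_lockups(lines):
    """Count soft lockup reports per (command, PID) and collect affected CPUs."""
    per_task = Counter()
    cpus = set()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match is None:
            continue  # not a soft lockup line
        per_task[(match["comm"], int(match["pid"]))] += 1
        cpus.add(int(match["cpu"]))
    return per_task, sorted(cpus)


sample = [
    " kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [CVODS:2921670]",
    " kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [CVODS:2921670]",
    " kernel:watchdog: BUG: soft lockup - CPU#50 stuck for 22s! [CVODS:2921662]",
]
per_task, cpus = summarize_lockups(sample)
for (comm, pid), count in per_task.most_common():
    print(f"{comm} (PID {pid}): {count} soft lockup reports")
print("CPUs affected:", cpus)
```

The same function accepts an open file handle, so it can be run against /var/log/messages directly, provided the file has not rolled over.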
