Question

Successful backups not protecting files: The process cannot access the file because it is being used by another process

  • 8 February 2021
  • 30 replies
  • 249 views

Badge +1

The file system backups are showing as successful (no warning), yet they are failing to protect some of the files. I am wondering why the job is not completing with partial success and flagging it as a VSS issue. Is that because there are no application-specific VSS writers for the application, and hence quiescing is not working its magic?

  • [C:\Program Files (x86)\BigFix Enterprise\BES Client\__BESData\SiteData.db] The process cannot access the file because it is being used by another process.


Badge +1

I did try forcing this using the key; however, it still failed. I will review the logs further.

Badge +1

The command ‘vssadmin list providers’ should show what’s available. 


If another provider is set as the default, you can force Commvault to call the Windows provider using nUseVSSSoftwareProvider

https://documentation.commvault.com/commvault/v11_sp20/article?p=18457.htm
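For anyone who wants to script that check, here is a minimal sketch that scans the text printed by ‘vssadmin list providers’ and lists any provider other than the Microsoft one. The sample text is illustrative only: its layout follows what the command typically prints, but the Acronis entry, its Id, and its version are made up, and the function names are my own.

```python
import re

# Illustrative sample only. The layout mirrors typical `vssadmin list providers`
# output; the second entry (name, Id, version) is invented for this sketch.
SAMPLE_OUTPUT = """\
Provider name: 'Microsoft Software Shadow Copy provider 1.0'
   Provider type: System
   Provider Id: {b5946137-7b9f-4925-af80-51abd60b20d5}
   Version: 1.0.0.7

Provider name: 'Acronis VSS SW Provider'
   Provider type: Software
   Provider Id: {00000000-0000-0000-0000-000000000000}
   Version: 0.0.0.0
"""

MICROSOFT_PROVIDER = "Microsoft Software Shadow Copy provider 1.0"


def provider_names(vssadmin_output):
    """Return every provider name found in the command output."""
    return re.findall(r"Provider name:\s*'([^']*)'", vssadmin_output)


def third_party_providers(vssadmin_output):
    """Return the providers that are not the built-in Microsoft provider."""
    return [name for name in provider_names(vssadmin_output)
            if name != MICROSOFT_PROVIDER]


print(third_party_providers(SAMPLE_OUTPUT))
```

On a real client you would feed it the captured output of the command instead of the sample text. An empty list means only the Microsoft provider is registered.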

Badge +1

Okay, so I investigated further. Although the VSS writers were showing as okay, the VSS provider list had an additional entry: the Acronis VSS software provider.

There were also VSS errors in the event log (event 12292) during the backup.

I am now checking with the customer whether they have any other backup agent installed, and I have shared the following article for them to review.

Hello guitarfish,

Thank you for posting this information in our forum.

Did you check if this, after upgrading from Acronis Backup & Recovery 11 to Acronis Backup & Recovery 11.5?

  1. Go to Start -> Run -> regedit.exe and press ENTER;
  2. Navigate to this key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\VSS\Providers\
  3. In the left pane, check the entries below Providers. There should be only one entry.

    In every case: click each entry and check whether the right pane shows this entry: "Microsoft Software Shadow Copy provider 1.0". Do not delete this entry!

    For the other entries: check whether you can find the Acronis VSS Provider entry, and delete it.

  4. Restart the machine.

 

 

Userlevel 4
Badge +6

The file system backups are showing as successful (no warning), yet they are failing to protect some of the files. I am wondering why the job is not completing with partial success and flagging it as a VSS issue. Is that because there are no application-specific VSS writers for the application, and hence quiescing is not working its magic?

  • [C:\Program Files (x86)\BigFix Enterprise\BES Client\__BESData\SiteData.db] The process cannot access the file because it is being used by another process.

 

Correct, and it can’t quiesce the DB. How does BigFix recommend protecting the DB? From there we can help craft a solution.

Userlevel 4
Badge +6

@MFasulo  Have to admit that it's getting better and better, but adding in auto remediation would be the next step right ;-)

Shall I open a CMR for reprocessing failed files after the job finishes, i.e. a re-scan of all missed files to identify whether they were deleted during the job run? That would stop a lot of false positives from being taken into account.

yes!!   

Regarding the reprocessing: during the file scan we build a collect file. That collect file is passed to the backup process to protect the files. What is happening is that, between the scan and the backup, the files disappeared. The backup flags that as a specific failure with an appropriate context/error code. In the screenshot below we flag Windows error code 3 (first red box) for path not found (it was in the scan, but when we went to back it up it wasn’t there), and the second red box is error code 2, file not found.

In those cases remediation isn’t possible, but for other conditions we should explore a retry (I’m almost positive we try again at the end of the phase runtime, but I’ll check with dev).
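To make the re-scan idea concrete, here is a rough sketch of a post-phase check that takes the list of failed paths and separates files that still exist (real failures, such as a locked DB) from files that vanished between scan and backup. This is not how Commvault does it; the function name and the demo file names are hypothetical.

```python
import os
import tempfile


def split_failed_files(failed_paths):
    """Return (still_present, vanished) for paths that failed during backup.

    still_present: files that exist now, so the failure needs attention.
    vanished: files that no longer exist, most likely temp files.
    """
    still_present, vanished = [], []
    for path in failed_paths:
        if os.path.exists(path):
            still_present.append(path)
        else:
            vanished.append(path)
    return still_present, vanished


# Demo with throwaway files: one that exists and one that never did.
with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "SiteData.db")
    open(existing, "w").close()
    missing = os.path.join(tmp, "scratch.tmp")  # hypothetical temp file, never created
    present, gone = split_failed_files([existing, missing])

print("needs attention:", present)
print("safe to clear:", gone)
```

Only the first list would need to stay on the failed-files report; the second is the set of likely false positives.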

 

Userlevel 3
Badge +6

@MFasulo  Have to admit that it's getting better and better, but adding in auto remediation would be the next step right ;-)

Shall I open a CMR for reprocessing failed files after the job finishes, i.e. a re-scan of all missed files to identify whether they were deleted during the job run? That would stop a lot of false positives from being taken into account.

Userlevel 4
Badge +6

We do this today in several areas.  

 

We highlight the last backup status and depending on the failure type we provide some quick recommended actions:

You can see there is a difference between the failed-to-start recommendations and the failed VSS snapshot ones. When we first started talking about this, this is where I was thinking we could inject a workflow/action that stops and restarts VSS (or something like that).

 

Here is one from a VM group, where you can resubmit and back up just the failed VMs:

 

Here is a shot from databases:

 

 

 

Userlevel 3
Badge +6

Well, @MFasulo is the right guy for Command Center, no question.

Can you link me to your idea about the post phase scan?  I want to ensure it’s getting traction/attention.

@Mike Struening see here

 

 

Userlevel 4
Badge +11

Well, @MFasulo is the right guy for Command Center, no question.

Can you link me to your idea about the post phase scan?  I want to ensure it’s getting traction/attention.

Userlevel 3
Badge +6

@Mike Struening I would definitely not pause the job, because that will definitely have an impact. What I would like to see is an alert being raised/sent, but more importantly I want better feedback from Command Center that something is happening. Both I and a lot of colleagues who have been forced to use Command Center really miss a single pane of glass. It's too much clicking around, and information is scattered all over the place. To be precise, in this case I would like to see a warning sign/indicator/LED/icon beside the client computer in the servers view, indicating that something is not OK with that particular client.

Please also look at my idea about initiating a post phase that re-scans all failed files and clears the ones that have been deleted between the two scan phases. This will rule out false positives like temp files etc.

Next week we'll discuss the bigger picture regarding my statement about Command Center with @MFasulo

Userlevel 4
Badge +11

@MFasulo , File Scan does provide a list (logs below), but I’m not aware of any actual remediation; only status reporting.

@Onno van den Berg , in your vision, would the job pause and wait for admin action/corrective action or simply provide more upfront detail?

Here is an output from a lab backup I just ran to test:

2640  5f4   02/09 08:42:30 165759 CsSnapRequestor::AddWritersToVolumeList() - Added writer(s) <[Task Scheduler Writer, d61d61c8-d73a-4eee-8cdd-f6f9786b7124] [VSS Metadata Store Writer, 75dfb225-e2e4-4d39-9ac9-ffaff65ddf06] [Performance Counters Writer, 0bada1de-01a9-4625-8278-69e735f39dd2] [System Writer, e8132975-6f93-4464-a53e-1050253ae220] [ASR Writer, be000cbe-11fe-4426-9c58-531aa6355fc4] [Registry Writer, afbab4a2-367d-4d15-a586-71dbb18f8485] [BITS Writer, 4969d978-be47-48b0-b100-f328f07ac1e0] [WMI Writer, a6ad56c2-b509-4e6c-bb19-49d8f43532f0] [COM+ REGDB Writer, 542da469-d3e1-473c-9f4f-7847f01fc64f] > to included writer list.

2640  5f4   02/09 08:42:30 165759 CsSnapRequestor::StartSnapshotSet() - Created shadow set 6a1137ec-d491-4d73-9b27-547c64bea85e

2640  5f4   02/09 08:42:30 165759 CsSnapRequestor::AddVolumesToSnapshotSet() - Successfully added volume [C:\] to shadow set.

Badge +1

When this error occurred, there should have been corresponding errors in the OS event viewer. 

“4600  18f0  02/09 00:42:54 4041670 CsSnapRequestor::AddVolumesToSnapshotSet() - Call m_vss->AddToSnapshotSet  [FAILED, throwing CV exception] - Code = 0x8004230f, Description = VSS_E_UNEXPECTED_PROVIDER_ERROR”

600  18f0  02/09 00:42:54 4041670 SHADOWSET::CShadowSet::CreateShadowSet(218) - Failed to create shadow, error=0x8004230F
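If you need to pull these codes out of a large log, a small sketch like the one below can help. It looks for 8-digit hex codes in log lines and labels the ones it knows. The only mapping included is the one the log itself provides (0x8004230f is VSS_E_UNEXPECTED_PROVIDER_ERROR); the function and variable names are just for illustration.

```python
import re

# Only the code/description pair that appears in the log above is listed here.
KNOWN_CODES = {0x8004230F: "VSS_E_UNEXPECTED_PROVIDER_ERROR"}

HEX_CODE = re.compile(r"0x[0-9a-fA-F]{8}")


def vss_codes(log_lines):
    """Map each distinct hex code found in the lines to a known name or 'unknown'."""
    found = set()
    for line in log_lines:
        for match in HEX_CODE.findall(line):
            found.add(int(match, 16))  # normalises 0x8004230f and 0x8004230F
    return {code: KNOWN_CODES.get(code, "unknown") for code in sorted(found)}


LOG = [
    "4600  18f0  02/09 00:42:54 4041670 CsSnapRequestor::AddVolumesToSnapshotSet() - "
    "Call m_vss->AddToSnapshotSet  [FAILED, throwing CV exception] - "
    "Code = 0x8004230f, Description = VSS_E_UNEXPECTED_PROVIDER_ERROR",
    "600  18f0  02/09 00:42:54 4041670 SHADOWSET::CShadowSet::CreateShadowSet(218) - "
    "Failed to create shadow, error=0x8004230F",
]

for code, name in vss_codes(LOG).items():
    print(hex(code), name)
```

Both lines resolve to the same code despite the different letter case, so the result has a single entry.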

Userlevel 4
Badge +6

@MFasulo maybe an idea: check the writer status before executing a job, so Command Center can give the user sensible information and prompt him/her to investigate the writer status.

 

I agree. @Mike Struening, do you know if we do any VSS writer remediation as part of some error output? I recall that back in my support days we posted the VSS writer status before and after backup; not sure if we still do that.

Userlevel 3
Badge +6

@MFasulo maybe an idea: check the writer status before executing a job, so Command Center can give the user sensible information and prompt him/her to investigate the writer status.

Userlevel 3
Badge +6

 

Yes, you are right, there are many jobs that complete with errors, though we thought that missing one file is bad enough. So we changed the setting (I think it’s this one that we are talking about) to 0 failed files.

This gives an indication of which clients need attention, and most of them can be dealt with by applying filters, either global or local. Some can’t be filtered out though, such as SystemState and so on.

Most of the problematic ones are systems with a lot of “temp” files that are flagged for backup but are no longer there when the backup occurs, and hence fail because of the system.

 

In addition we create alerts for systems under compliance audits for failed files/jobs.

 

 

 

@Henke what Commvault could introduce is a post-process that processes the list of failed files and performs another file scan to see if the files still exist. That could filter out the temp files automatically and reduce the number of reported files. The issue of temp files being wiped while the job is in progress will also be more visible with long-running jobs.

Maybe this is something to look into!

Userlevel 2
Badge +4

Hi Mike

That's where “Completed with Warnings” would come into play for me.

Everything backed up fine: Completed

Some minor or non critical files not backed up: Completed with Warnings

Some major or critical files not backed up: Completed with Error

Major parts of Backup or System Writers failed: Failed

CVDB knows the StatusName, but I haven’t seen it used in action yet, nor is it available as an option in Error Threshold Roles.


How do you distinguish between “Some minor or non critical files not backed up” and “Some major or critical files not backed up”? You need to put some rule in place there.

For example, if user data is on D:, then any file missed there goes into the major category, and non-critical Windows OS files go into minor.
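Such a rule could look something like the sketch below. It is purely hypothetical: the status names are the ones proposed in this thread, the "everything on D: is critical" rule is only the example given above, and none of it reflects an existing Commvault setting.

```python
from pathlib import PureWindowsPath

# Hypothetical rule: anything under these roots counts as critical data.
CRITICAL_ROOTS = [PureWindowsPath("D:/")]


def is_critical(path):
    """True if the path is, or lives under, one of the critical roots."""
    p = PureWindowsPath(path)
    return any(root == p or root in p.parents for root in CRITICAL_ROOTS)


def job_status(failed_files, writers_failed=False):
    """Classify a finished job using the four statuses proposed in the thread."""
    if writers_failed:
        return "Failed"
    if not failed_files:
        return "Completed"
    if any(is_critical(f) for f in failed_files):
        return "Completed with Error"
    return "Completed with Warnings"


print(job_status([]))
print(job_status([r"C:\Windows\Temp\x.tmp"]))
print(job_status([r"D:\Users\report.docx"]))
print(job_status([], writers_failed=True))
```

The hard part is of course maintaining the list of critical locations per client, which is exactly the rule question raised here.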

 

 

Userlevel 4
Badge +6

My .02 on this:

If the writer is in a bad state, that needs to get corrected before anything else.

If the writer still fails on that database file, I would suggest looking into how that vendor recommends protecting the file. If there are freeze/thaw scripts or quiesce scripts, those should be applied to the subclient. This can be done in Command Center.

 


This has always baffled me.

How can a job be successful if it fails to back up files? I’ve never understood the reasoning behind it. If the one file that was missed is the most critical file, and it’s lost even though the backups have been successful, how would you explain it?

BR
Henke

 

I agree with this, Henke… which is why only in extremely rare cases would I personally suggest manipulating how jobs are classified based on errors.

To me, errors and failed files mean “fix me”. If those are files you don’t need to protect (like tmp/cache files), we can filter them. In Command Center, filtering can be done globally (manage > system global filters), at the server group level (configuration tab > file exceptions), at the plan level (through backup content settings), and at the subclient level (through custom backup content).

Like SLA, if it’s not 100%, something is wrong and needs corrective action.

 

 

Userlevel 2
Badge +4

 

Yes, you are right, there are many jobs that complete with errors, though we thought that missing one file is bad enough. So we changed the setting (I think it’s this one that we are talking about) to 0 failed files.

This gives an indication of which clients need attention, and most of them can be dealt with by applying filters, either global or local. Some can’t be filtered out though, such as SystemState and so on.

Most of the problematic ones are systems with a lot of “temp” files that are flagged for backup but are no longer there when the backup occurs, and hence fail because of the system.

 

In addition we create alerts for systems under compliance audits for failed files/jobs.

 

 

 

Userlevel 4
Badge +11

Hi Mike

That's where “Completed with Warnings” would come into play for me.

Everything backed up fine: Completed

Some minor or non critical files not backed up: Completed with Warnings

Some major or critical files not backed up: Completed with Error

Major parts of Backup or System Writers failed: Failed

CVDB knows the StatusName, but I haven’t seen it used in action yet, nor is it available as an option in Error Threshold Roles.

That’s a clever idea :nerd: .  Let me pass that up the chain and see what I can do!

 

Thanks, @Stefan Vollrath !

Userlevel 1
Badge +3

Hi Mike

That's where “Completed with Warnings” would come into play for me.

Everything backed up fine: Completed

Some minor or non critical files not backed up: Completed with Warnings

Some major or critical files not backed up: Completed with Error

Major parts of Backup or System Writers failed: Failed

CVDB knows the StatusName, but I haven’t seen it used in action yet, nor is it available as an option in Error Threshold Roles.

Userlevel 4
Badge +11

Good conversation here! The main (historical) reason for the job completion status is that it can be normal for a few files to fail for various reasons. If you marked every job that skipped one file as Completed with Errors, they’d all look that way.

Completed with Errors is reserved for either System State components getting missed, or expected databases getting missed (there might be others escaping me now). The status really means “we ran and finished, but something major was missed; we advise looking” as well as “we won’t prune previous Completed Without Errors jobs until this is fixed” by default. There’s some extra logic that comes into play for CwE.

For your request @Onno van den Berg for Command Center inclusion, I’ll get the right people to respond.

For the OP, @Theseeker , let us know if you’re able to address the VSS issues via a reboot and if subsequent backups work as expected.

Userlevel 3
Badge +6

Hi @Onno van den Berg ,

The global level is still iDA-specific, so it may clear some very specific conditions; however, I understand there may be FS iDAs used to protect critical systems too.

For this, you can customize error conditions at a client level or a client group level to keep things more granular:
https://documentation.commvault.com/commvault/v11/article?p=6191.htm

 

I think this should cover the scenarios you mentioned, but let me know if not and I’ll do some more digging.

 

Cheers,

Jase

Now I would really like to configure this via Command Center. Customers use this as their console. 

Userlevel 1
Badge +3

Same question here. My best guess is that someone didn’t like seeing so many Failed or CWE jobs, so they requested this “solution” to hide the minor issues.

It even considers a job OK if CommServe Services, or most other parts of SystemWriters, failed to be backed up. That bug should finally be fixed in the next HPK, though.

Personal solution/workaround: an alert that goes off on specific SystemWriter events and triggers a workflow to fix the issues, create a ticket with the server owner, or just send an info mail to us for further checking.

Userlevel 2
Badge +4

Hi @Theseeker ,

Backup jobs can be configured with thresholds, so that a job with tens of thousands of files will still complete despite one or two failed files.
The thresholds can be modified here: Home tab > Control Panel > Data > Job Management

https://documentation.commvault.com/commvault/v11/article?p=6190.htm

 

Cheers,

Jase

@jgeorges 
This has always baffled me.

How can a job be successful if it fails to back up files? I’ve never understood the reasoning behind it. If the one file that was missed is the most critical file, and it’s lost even though the backups have been successful, how would you explain it?

BR
Henke

Userlevel 1
Badge +2

@Theseeker It looks like VSS is not in good shape to take a VSS snapshot, and hence it is failing. Can you check whether there is enough space on the volume to take the snapshot? You can use the command below to see where the shadow storage for the volume is located:

vssadmin list shadowstorage

 

If this is good, can you check the Event Viewer to see if there are any errors from VSS?

Restart the VSS service to see if that clears the issue.

 

Thanks,

Karthik 
