Question

Oracle Linux - 1-Touch Backup - Received failed message for job, phase [Scan]

  • 18 April 2023
  • 9 replies

Badge +2

Greetings,

There are two physical machines running Oracle Linux 7.6 with 1-Touch Backup (CV 11.24.78) configured the same way. 1-Touch Backup is working fine on one machine, no errors, ‘all green’ 👍

Now, let’s take a look at the other host:

It’s a differential backup (the full backup completed successfully the day before); six hours later, after 10 attempts CV gives up 😥

There aren’t many details in the CV journal with regard to what happens at 10:15:14 - this is where (and when) the problem occurs:

However, I see a lot more in the FileScan.log I downloaded from the Linux host - the crucial entries are in CFileScan - Status - FAIL (cv-log).txt

Those entries repeat further in the FileScan log until the job ultimately fails.

This may be the key sequence 🤔:

  1. comparing with previous dir change [] to get deleted items
  2. Failed to process DirChange
  3. CPostProcessDirChange::Run(173)/Compare failed
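To check how often this sequence repeats in a FileScan.log, a small script along the following lines could count the three messages. This is only a sketch: the marker strings are copied from the entries listed above, the sample text is built from those same strings rather than taken from a real log, and the function name is made up for illustration.

```python
# Marker strings copied from the three log entries quoted in the post.
FAILURE_MARKERS = (
    "comparing with previous dir change []",
    "Failed to process DirChange",
    "Compare failed",
)


def count_markers(log_text: str) -> dict[str, int]:
    """Count how many lines of the log contain each failure marker."""
    counts = {marker: 0 for marker in FAILURE_MARKERS}
    for line in log_text.splitlines():
        for marker in FAILURE_MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


# Illustrative sample assembled from the quoted messages (not a real log excerpt).
sample = "\n".join(
    [
        "comparing with previous dir change [] to get deleted items",
        "Failed to process DirChange",
        "CPostProcessDirChange::Run(173)/Compare failed",
    ]
)

if __name__ == "__main__":
    for marker, hits in count_markers(sample).items():
        print(f"{hits:3d}  {marker}")
```

For a real log, the text could be read with `open(path, errors="replace").read()` and passed to `count_markers`.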

I will be grateful for any advice/hint on this one 🙂


9 replies

Userlevel 4
Badge +10

Hi @Tommy 

 

Do you see DCTot / DCInc files under the job results folder for this specific subclient? You can get the job results folder path from FileScan.log for the corresponding job ID.

 

It looks like the previous dirchange file, which we use to identify deleted items, is missing.

 

Thanks,

Sparsh

Userlevel 4
Badge +10

Also, would it be possible for you to share the complete FileScan.log for the specific job ID in question? Can you try running an incremental job instead of differential?

Badge +2

 

Do you see DCTot / DCInc files under the job results folder for this specific subclient? You can get the job results folder path from FileScan.log for the corresponding job ID.

 

Hi @SparshGupta !🙂

I found the folder, and I can see two large files in it, DCTmp.cvf & DCTot.cvf, and plenty of others matching the following name patterns:

  1. CollectInc1-8.cvf
  2. CollectTotXX.cvf

I included the entire “ls” output in ls_jobResults_folder.txt
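For anyone who wants to check their own job results folder the same way, here is a rough sketch that summarizes a directory listing. The three dirchange file names come from this thread; the regular expressions assume that the numeric parts of the CollectInc/CollectTot names are plain digits, and the example names passed in at the bottom are hypothetical, not taken from the attached listing.

```python
import re

# Dirchange file names mentioned in this thread.
DIRCHANGE_FILES = ("DCTot.cvf", "DCInc.cvf", "DCTmp.cvf")

# Assumption: the variable parts of the collect file names are digits.
COLLECT_INC = re.compile(r"CollectInc\d+\.cvf")
COLLECT_TOT = re.compile(r"CollectTot\d+\.cvf")


def summarize_listing(names: list[str]) -> dict:
    """Report which dirchange files exist and how many collect files were found."""
    name_set = set(names)
    return {
        "dirchange": {name: name in name_set for name in DIRCHANGE_FILES},
        "collect_inc": sum(1 for n in names if COLLECT_INC.fullmatch(n)),
        "collect_tot": sum(1 for n in names if COLLECT_TOT.fullmatch(n)),
    }


# Hypothetical listing that follows the name patterns described above.
example = ["DCTmp.cvf", "DCTot.cvf", "CollectInc1.cvf", "CollectInc2.cvf", "CollectTot10.cvf"]
summary = summarize_listing(example)

if __name__ == "__main__":
    print(summary)
```

On a live system the list of names could come from `os.listdir()` on the job results path reported in FileScan.log.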

I also included some more entries from the FileScan.log that you may find interesting - but if you want - I can upload the whole file 👍

Can you try running an incremental job instead of differential?

Technically that is possible, though I would rather do it later in the evening, when the workload is usually lower than during the day. I also presume it would require me to perform a new full backup first.

[UPDATE]

I’ve just found something interesting - DCInc.cvf is present on server 1 on which backup is working (there’s also DCTot.cvf file) - so, maybe the problem is that DCInc.cvf file is not created properly on server 2 ?? (apparently it ends up with DCTmp.cvf name instead of DCInc.cvf)

Some entries in the CScanEngine - OutputHandler.txt file suggest that this may be the case 🤔

Userlevel 4
Badge +10

Hi @Tommy 

 

I’ve just found something interesting - DCInc.cvf is present on server 1 on which backup is working (there’s also DCTot.cvf file) - so, maybe the problem is that DCInc.cvf file is not created properly on server 2 ?? (apparently it ends up with DCTmp.cvf name instead of DCInc.cvf)


No. This is correct. During the FileScan phase of every job, we compare the dirchange of the previous job with the dirchange of the current job to identify the items deleted from the client machine.

 

Dirchange of current job = DCTmp.cvf

  • If previous job was full, dirchange for previous job = DCTot.cvf
  • If previous job was incremental, dirchange for previous job = DCInc.cvf

 

When the comparison is successful, we rename the temp file DCTmp.cvf to DCInc.cvf. But for this to happen, FileScan has to complete successfully.

 

On server 1, there are apparently no issues, hence you are seeing DCInc.cvf as expected. But on server 2, the FileScan phase is failing during the dirchange comparison, hence you are seeing DCTmp.cvf.
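To make the flow easier to follow, here is a toy model of it in Python. It is not Commvault code: it only assumes what this reply describes, namely that the previous dirchange is DCTot.cvf after a full job and DCInc.cvf after an incremental one, and that DCTmp.cvf is renamed to DCInc.cvf only when the comparison succeeds. The function names and the temporary folders are made up, and the comparison itself is left out.

```python
import tempfile
from pathlib import Path


def previous_dirchange_name(previous_job_type: str) -> str:
    """Pick the previous job's dirchange file, per the description in this thread."""
    return {"full": "DCTot.cvf", "incremental": "DCInc.cvf"}[previous_job_type]


def finish_scan(job_results: Path, previous_job_type: str) -> bool:
    """Rename DCTmp.cvf to DCInc.cvf only if both dirchange files are available."""
    current = job_results / "DCTmp.cvf"
    previous = job_results / previous_dirchange_name(previous_job_type)
    if not current.exists() or not previous.exists():
        return False  # comparison cannot run, DCTmp.cvf stays in place
    # The real comparison (finding deleted items) is omitted in this sketch.
    current.replace(job_results / "DCInc.cvf")
    return True


def demo() -> tuple[bool, bool]:
    """Run the toy flow once with and once without a previous dirchange file."""
    with tempfile.TemporaryDirectory() as tmp:
        with_previous = Path(tmp) / "with_previous"
        without_previous = Path(tmp) / "without_previous"
        with_previous.mkdir()
        without_previous.mkdir()
        (with_previous / "DCTot.cvf").touch()
        (with_previous / "DCTmp.cvf").touch()
        (without_previous / "DCTmp.cvf").touch()
        return finish_scan(with_previous, "full"), finish_scan(without_previous, "full")


if __name__ == "__main__":
    print(demo())
```

In the failing case the temp file is left behind, which matches a folder that contains DCTmp.cvf but no DCInc.cvf.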

 

Let me know if you can upload the complete Filescan log for the failing job for further investigation.

 

Thanks,

Sparsh

 

Badge +2

Hi @SparshGupta 

Here are two files:

  1. job_10795478.txt - log from the job performed after the last successfully completed full backup
  2. job_10843786_latest.txt - as the name suggests, the last differential backup

I hope this is what you asked for, oh, and thanks for the explanation on DCTmp.cvf file 😉

Userlevel 4
Badge +10

Hi @Tommy 

 

We are not getting the previous dirchange here. This can happen when you run a differential job directly after a full job.

 

4450 1162 04/18 22:15:04 10843786 CPostProcessDirChange::Run(155) - comparing with previous dir change [] to get deleted items
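The empty brackets in that line are the telling part: no path for the previous dirchange was resolved. A hedged sketch for pulling this out of a log line follows. The field layout is inferred from this single line; the job ID field matches the job number in the uploaded file name, while treating the first two fields as process and thread IDs is an assumption. The function names are illustrative only.

```python
import re

# Layout inferred from the one line quoted above; pid/thread meaning is assumed.
LINE_RE = re.compile(
    r"^(?P<pid>\d+)\s+(?P<thread>\S+)\s+(?P<date>\d{2}/\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<job_id>\d+)\s+(?P<function>\S+)\s+-\s+(?P<message>.*)$"
)
PREV_RE = re.compile(r"previous dir change \[(.*?)\]")


def parse_line(line: str):
    """Split a log line into named fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def previous_dirchange_path(message: str):
    """Return the bracketed path ('' when empty), or None if the message differs."""
    match = PREV_RE.search(message)
    return match.group(1) if match else None


line = (
    "4450 1162 04/18 22:15:04 10843786 CPostProcessDirChange::Run(155) - "
    "comparing with previous dir change [] to get deleted items"
)
fields = parse_line(line)

if __name__ == "__main__":
    path = previous_dirchange_path(fields["message"])
    print(fields["job_id"], fields["function"], "previous path empty:", path == "")
```

An empty string here means the brackets were present but contained no path, which is different from a line that never mentions the previous dirchange.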

Is there any specific reason for running a differential job after full? Please try running an incremental backup and that should work without any hiccups.

 

Thanks,

Sparsh

Userlevel 4
Badge +10

Hi @Tommy 

 

Can you please let us know what service pack you are on? CS & client?

 

We are unable to reproduce this in-lab.

 

Thanks,

Sparsh

Badge +2

Hi @SparshGupta 

 

Is there any specific reason for running a differential job after full?

 

It’s not that it was run immediately after the full backup was completed - say, the full backup was initiated on Sunday @10 PM and then (~22 hours later, Monday) the following (differential) backup failed and it’s been like that since then.

We prefer the differential approach because a potential recovery would take us less time, we have enough storage, and we know it’s working fine on the other server.

I just checked the client, iDataAgent: File System, version: 11 SP24.78 and I see the same on CS.

 

I’m not sure if it matters but:

Those two physical servers (Server 1 & Server 2) are in a cluster (RAC), and CV also runs an archive log backup every hour (the logs are stored on LUNs ‘linked’ to those servers). The last full (file system) backup of Server 2 was done manually on April 6th, and since then we’ve been getting the following error every time the scheduled full file system backup is triggered:

It does not apply to the other server (Server 1), on which everything seems to be working fine (file system backups are scheduled the same way for both machines).

I’ve noticed that full backup (Server 2) works when run manually and when ArchiveLog backup is not taking place, in every other case (and when done according to the schedule) it looks as follows:

  1. April 15th, ArchiveLog backup starts at 00:00:09 AM, ends at 00:01:43 AM
  2. April 15th, File System backup (full), Server 2 - backup starts at 00:00:22 AM and fails because: Another backup is running for client (...)

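The timing above can be checked with a couple of lines of Python. This is just arithmetic on the timestamps from the list; the year 2023 is assumed from the date of this thread, and the function name is made up.

```python
from datetime import datetime


def is_running_at(start: datetime, end: datetime, moment: datetime) -> bool:
    """True if a job running from start to end is still active at the given moment."""
    return start <= moment < end


# Timestamps from the list above; the year is an assumption.
archive_start = datetime(2023, 4, 15, 0, 0, 9)
archive_end = datetime(2023, 4, 15, 0, 1, 43)
fs_full_start = datetime(2023, 4, 15, 0, 0, 22)

collides = is_running_at(archive_start, archive_end, fs_full_start)

if __name__ == "__main__":
    print("File system backup starts while ArchiveLog backup is running:", collides)
```

The file system backup starts 13 seconds into the ArchiveLog backup, so the two windows overlap.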
Userlevel 4
Badge +10

Hi @Tommy 

 

If a job is already running for a subclient and another job gets triggered for the same subclient (manually or via a schedule), the above error is displayed.

 

For instance, let’s say your incremental schedule runs every 6 hours. Job X1 is triggered at 9 AM, but it takes 14 hours to complete. In the meantime, the schedule will trigger 2 more jobs for the same subclient, but since another job is already running, we don’t process the new requests.
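The numbers in this example can be worked through as follows. It is a plain schedule calculation, not Commvault scheduler logic; the calendar date is arbitrary and the function name is illustrative.

```python
from datetime import datetime, timedelta


def skipped_triggers(start: datetime, duration: timedelta, interval: timedelta) -> list[datetime]:
    """List the schedule triggers that fire while the first job is still running."""
    end = start + duration
    skipped = []
    trigger = start + interval
    while trigger < end:
        skipped.append(trigger)
        trigger += interval
    return skipped


# Job X1 starts at 9 AM and runs 14 hours; the schedule fires every 6 hours.
x1_start = datetime(2023, 1, 1, 9, 0)  # arbitrary date
skipped = skipped_triggers(x1_start, timedelta(hours=14), timedelta(hours=6))

if __name__ == "__main__":
    for trigger in skipped:
        print("not processed:", trigger.strftime("%H:%M"))
```

With these values the triggers at 15:00 and 21:00 fall inside the 09:00 to 23:00 run, which gives the 2 unprocessed requests mentioned above.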


Also, differential backups take longer than incremental backups, as the purpose of a differential job is to accumulate all data changed since the last full / synthfull.

 

https://documentation.commvault.com/2022e/expert/11689_differential_backups.html
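As a simplified illustration of why a differential grows over time, the sketch below models change detection with plain modification timestamps. That is an assumption made for the example only and says nothing about how the product actually detects changes; the file names and timestamps are hypothetical.

```python
def changed_since(mtimes: dict[str, int], reference: int) -> list[str]:
    """Return the files modified after the reference point in time."""
    return sorted(path for path, mtime in mtimes.items() if mtime > reference)


# Hypothetical timeline: full backup at t=0, an incremental at t=2, next job at t=4.
last_full = 0
last_backup = 2
mtimes = {"a.dbf": 1, "b.log": 3, "c.cfg": 0}

incremental_set = changed_since(mtimes, last_backup)  # changes since the last backup
differential_set = changed_since(mtimes, last_full)   # changes since the last full

if __name__ == "__main__":
    print("incremental :", incremental_set)
    print("differential:", differential_set)
```

The differential set always contains at least what the incremental set contains, because its reference point never moves forward until the next full.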


Regarding the issue of the differential job going to pending, please open a support ticket with Commvault, and we will check it further.

 

Thanks,

Sparsh
