Question

Big Data Apps jobs on a copy that is no longer needed

  • 20 December 2023
  • 12 replies


Hello all!

In our environment, we created temporary copies within a few of our storage policies (call them copy 6 and copy 7). Those were tape copies.

We kept some backups there for 2 months, but they are no longer needed. We do not need those copies within those storage policies any more.

 

For now, even though no backups are retained on copies 6-7 any longer, we still have a lot of “Big Data Apps” jobs retained on those copies.

We would like to delete those copies because they interfere with our “workflow”, which picks jobs for copies and keeps marking them to be copied to copies 6-7 (even though those copies are disabled).

 

So my question is: what exactly are these Big Data Apps jobs? They are some kind of index, am I correct?

If those “Big Data Apps” jobs are stored on copies 6-7, do any other copies, or other jobs kept in the same or other storage policies, depend on them?

Can we just delete copies 6-7 from our storage policies even though Big Data Apps jobs are still stored on them, without affecting the rest of the backups on other copies/policies? Some backups are still stored on other copies and were present on copies 6-7 only for some time.

 

 


12 replies


@Grzegorz 


Correct, these Big Data Apps jobs are the v2 index database backups, and they don't follow normal retention rules. A v2 index database backup always keeps 3 versions of the index backed up; once the 4th one runs, the first one ages. If the MediaAgent (MA) loses its index on the local drive, rebuilding it may take a long time. The Big Data Apps jobs are the solution: they can be used to rebuild the MA's index in hours instead of days.
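To make the "keep 3 versions, the 4th ages the 1st" behaviour concrete, here is a toy model in Python. It is only a sketch of the rule as described above, not Commvault's actual aging logic; the function name and job IDs are made up for illustration.

```python
from collections import deque


def run_index_backups(job_ids, versions_to_keep=3):
    """Simulate index backups where only the newest N versions stay retained.

    Hypothetical helper: models the rule described in the post, nothing more.
    """
    retained = deque()
    aged = []
    for job_id in job_ids:
        retained.append(job_id)
        # Once one more than the allowed number exists, the oldest one ages.
        if len(retained) > versions_to_keep:
            aged.append(retained.popleft())
    return list(retained), aged


retained, aged = run_index_backups([101, 102, 103, 104])
print(retained)  # [102, 103, 104]
print(aged)      # [101]
```

The point of the model is that the count of versions, not the copy's retention days, decides when an index backup ages.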

So assuming you want to retain these Big Data Apps jobs in the event disaster strikes, I would verify that these jobs exist on at least one other copy (besides the primary). If they do, feel free to delete them from the temp copies you’ve created.

Regards,

Chris 

 


Thanks for your response!

But which jobs do you want me to verify on the other copies? The original backups of the client, or the “Big Data Apps” jobs?

For now, those “temp” backup jobs for the client server have already expired, and only the Big Data Apps jobs are stored there. But our script is still trying to copy new jobs there, which I need to manually mark as do not copy.


@Grzegorz 

The Big Data Apps jobs are what I’m referring to.

Regards,

Chris


Once again, many thanks for the response!

Do you have any tips on how we can easily check whether those jobs are present on other copies? We are not able to check it in Job Retention.

 

We can check it one by one via Command Center by manually copying it…
 


Still, that is not a big problem… The question is that we have these copies configured:

 


And what will happen if we delete the Big Data Apps jobs from copies 6 and 7? Will they be automatically replicated to another copy? The best option would be to have them on copies 1, 2, 3 and 4. Right now they are on 1, 2 and 6, 7...


@Grzegorz 

I believe the easiest way to check if those jobs exist on a particular copy is to right click each copy > view jobs > disable a timeframe (to get all job results) and view the results.
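Once you have the job lists from View Jobs for each copy, comparing them by hand gets tedious. A rough sketch of that comparison in Python is below. It assumes you have noted or exported the index backup job IDs per copy yourself; the copy names, job IDs and the function are all hypothetical and nothing here calls Commvault.

```python
def jobs_only_on_temp(jobs_by_copy, temp_copies, primary):
    """Return job IDs on the temp copies that exist on no other non-primary copy.

    jobs_by_copy: dict mapping a copy name to the job IDs seen in View Jobs.
    temp_copies:  names of the copies planned for deletion.
    primary:      name of the primary copy (ignored as a safety net, since the
                  advice is to have the jobs on one more copy besides primary).
    """
    at_risk = set()
    safe = set()
    for copy_name, job_ids in jobs_by_copy.items():
        if copy_name in temp_copies:
            at_risk |= set(job_ids)
        elif copy_name != primary:
            safe |= set(job_ids)
    return sorted(at_risk - safe)


# Made-up example data, typed in from the View Jobs results of each copy.
jobs_by_copy = {
    "copy 1": {5001, 5002, 5003, 4990},
    "copy 2": {5001, 5002, 5003},
    "copy 6": {5001, 5002, 5003, 4990},
    "copy 7": {5002, 5003},
}
print(jobs_only_on_temp(jobs_by_copy, {"copy 6", "copy 7"}, "copy 1"))  # [4990]
```

Any job ID it prints would be left on the primary only after the temp copies are removed.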

If you delete the jobs from copy 6 and 7, they shouldn’t re-copy (especially as those copies are now disabled / to be deleted right?). 

If those copies are no longer disabled or going to be deleted, then you can view the jobs on the source copy and mark them ‘do not copy’.

https://documentation.commvault.com/2023e/expert/prevent_jobs_from_being_copied.html

Regards, 
Chris


Ok then!
I'm really grateful for all your support!

Just one last question… If we delete the temporary copies (6-7), those Big Data Apps jobs will remain only on the two disk copies (copies 1 and 2). Do you think we should manually copy those jobs to another tape copy, copy 3 or 4 in our environment, to have them available for disaster recovery?

If I delete those jobs and the whole of copies 6-7, it seems the Big Data Apps jobs will remain on only two copies: 1 (primary disk copy) and 2 (synchronous disk copy). Shouldn't we have them on tape as well in case of DR? Copies 3-4 are our regular tape copies, and they will stay in our environment.

What do you think is the best practice here?


@Grzegorz 


Having additional copies of data can’t hurt!

If you have the storage, may as well utilize it, especially because these jobs don’t take up much space (usually). 


So yes, having a copy on tape would probably be beneficial to protect you in the event both primary and secondary go down. Note, however, that if there are other jobs on that tape (non-index backup jobs), they may hit their retention and qualify for aging, but because these index backup jobs don’t ‘age’ based on retention, they may hold the tape up from being reused.

So you might want to keep an eye on this and then manually start cleaning up some of these jobs from the copy to allow tapes to be reused.
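As a rough illustration of why a single index backup can hold a tape, here is a toy model in Python. It only assumes that a tape becomes reusable once every job on it has aged; the class, field names and job IDs are invented for the example and do not come from Commvault.

```python
from dataclasses import dataclass


@dataclass
class TapeJob:
    job_id: int
    kind: str   # "backup" or "index" (hypothetical labels)
    aged: bool


def jobs_holding_tape(jobs):
    """Return the jobs that keep a tape from being reused (any job not aged)."""
    return [job for job in jobs if not job.aged]


# Made-up tape content: two aged backups and one index backup that has not aged.
tape = [
    TapeJob(7001, "backup", aged=True),
    TapeJob(7002, "backup", aged=True),
    TapeJob(7003, "index", aged=False),
]
holders = jobs_holding_tape(tape)
print([job.job_id for job in holders])               # [7003]
print(all(job.kind == "index" for job in holders))   # True
```

If the only holders are index backups, that tape is a candidate for the manual cleanup mentioned above.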

Hopefully that makes sense.

Regards,

Chris 
 


 


“Having additional copies of data can’t hurt!”


 

Totally agree :D

 

You said “index backup jobs don’t ‘age’ based on retention, they may hold the tape up from being reused”,
and that is also the case in our environment. We have a few tapes which are blocked by those Big Data Apps jobs. Is there a way to check whether they are still needed, or which jobs they are related to? Is it possible to “consolidate” them onto one tape if they are spread across a few? Will “Media Refresh” work?


@Grzegorz 

Not sure you can consolidate them onto one tape. I’ll see what I can find (media refresh won’t work for this scenario).

Can you please share a screenshot of these jobs on the tape (view media)?

Just want to double check something.

Regards,

Chris


So it seems that media refresh won't work for the Big Data Apps jobs, but it can still “consolidate” all other backups spread across tapes?

 

I have a heavy workload at the moment, so sorry for the delay in replying…

Speaking of our Big Data Apps jobs spread across a few tapes, please have a look.
A few tapes whose retention should already have expired:

 

All jobs are already aged:
 

and on this particular media there is only one job:
 

Is there a way to check which backup jobs those indexes are related to? I think those backup jobs might have already expired, but the index backups are still on the tapes due to other retention rules...


@Grzegorz 

Thanks for those screenshots. Can you run the Data Retention Forecast and Compliance report: https://documentation.commvault.com/2023e/expert/viewing_data_retention_forecast_and_compliance_report.html

Select the storage policy this job is writing to, then let me know if it shows up in the report (and if so, what is the ‘reason for not aging’?).

Regards,

Chris


Thanks for another tip!

I have generated the report, and it seems that none of the jobs have aged, all for the same reason, “indexing job”:

 

There were also some jobs which were not aged because they have “infinite” retention, but they have already been exported to UAL.
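For a long report, tallying the reasons can be quicker than reading it row by row. The sketch below assumes the report has been saved as CSV text and that it has a column called "Reason for not aging"; both the export format and the column names are assumptions for illustration, and the sample rows are made up.

```python
import csv
import io
from collections import Counter


def count_reasons(csv_text, reason_column="Reason for not aging"):
    """Count how many jobs share each 'reason for not aging' value.

    The column name is an assumption; adjust it to match the real export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[reason_column].strip() for row in reader)


# Made-up sample rows standing in for an exported report.
sample = """Job ID,Reason for not aging
8001,indexing job
8002,indexing job
8003,infinite retention
"""

for reason, count in count_reasons(sample).most_common():
    print(f"{reason}: {count}")
```

With the sample rows this lists "indexing job" twice and "infinite retention" once, which mirrors the two groups seen in the report.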
