Solved

Audit Reporting: Confirm data exists in all storage locations


Badge +4

Hey everyone,

 

I’ve got a bit of a puzzler. We have several years of data on-prem and in a third-party S3 bucket. We’re looking to reduce the footprint of both somewhat, so we’re moving the data to combined AWS and Azure storage-tier libraries. It’s long-term data that we need to keep per SLA, but we don’t expect to recover it unless a project is resurrected or a legal search request comes in, so we can lower some costs by storing it on the lower-cost AWS and Azure offerings.

The test aux copies worked quite well - I can see that both my AWS and Azure libraries have the same number of jobs and the same total data. But if an auditor asks me to show that, during this work, client X’s data was on-prem, at the third-party S3 site, and at AWS and Azure before I cleared it from on-prem and the third-party S3, I have no idea how to produce a report showing that there are four copies of the data. Alternately, an auditor could say: show me, for job XXXX, where the data resided on this date, and where it resides today.

By the same token, I have no easy way to determine the differences between the on-prem library and the third-party S3 bucket, and there are some differences I will need to account for.
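Lacking a built-in diff report, one workaround is to export the job list for each location (e.g. to CSV from the console) and compare them offline. A minimal sketch in Python - the CSV content, column name `JobID`, and job numbers are purely illustrative, not actual Commvault export formats:

```python
import csv
from io import StringIO

def load_job_ids(csv_text, id_column="JobID"):
    """Parse a CSV export and return the set of job IDs it contains."""
    reader = csv.DictReader(StringIO(csv_text))
    return {row[id_column] for row in reader}

def diff_copies(on_prem_csv, s3_csv):
    """Return jobs present in only one of the two locations."""
    on_prem = load_job_ids(on_prem_csv)
    s3 = load_job_ids(s3_csv)
    return on_prem - s3, s3 - on_prem

# Hypothetical exports: JobID plus whatever other columns the report emits.
on_prem_export = "JobID,SizeTB\n1001,0.8\n1002,1.1\n1003,0.6\n"
s3_export = "JobID,SizeTB\n1001,0.8\n1003,0.6\n1004,0.9\n"

missing_from_s3, missing_from_on_prem = diff_copies(on_prem_export, s3_export)
print(sorted(missing_from_s3))       # → ['1002']  jobs to account for before pruning
print(sorted(missing_from_on_prem))  # → ['1004']
```

Either output set being non-empty flags a job you would need to account for before clearing the source copies.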

In my days of working with NetBackup, I could run commands to show that we had copies 1, 2, 3, and 4, and where those copies were (be it tape or disk). Unfortunately, the same number of jobs and the same TB size shown for two libraries does not actually prove it’s the same data - not scientifically, anyway. :wink:

Ultimately, I want to have a record showing the data exists before I start to prune, and I want it stored outside of CV as a point-in-time record - a chain of custody, if you will - as we revamp our storage policies.
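For that out-of-band record, one generic approach is to gather the evidence (exported report CSVs, screenshots) and seal it with a timestamped hash manifest kept outside the backup environment. A hedged sketch, with file names and contents purely illustrative:

```python
import hashlib
import json
from datetime import datetime, timezone

def sha256_hex(data: bytes) -> str:
    """SHA-256 digest of a blob, as hex."""
    return hashlib.sha256(data).hexdigest()

def build_manifest(evidence: dict) -> str:
    """evidence maps a label (e.g. an exported report file name) to its raw
    bytes. Returns a JSON manifest recording one digest per item plus the
    UTC capture time, suitable for storing outside the backup system."""
    record = {
        "captured_utc": datetime.now(timezone.utc).isoformat(),
        "items": {name: sha256_hex(blob) for name, blob in evidence.items()},
    }
    return json.dumps(record, indent=2, sort_keys=True)

# Illustrative evidence: CSVs exported for client X's copies (made-up names).
evidence = {
    "clientX_copy1_onprem.csv": b"JobID,Location\n1001,OnPremLib\n",
    "clientX_copy3_aws.csv": b"JobID,Location\n1001,AWSLib\n",
}
manifest = build_manifest(evidence)
print(manifest)
```

Re-hashing the files later and comparing against the stored manifest demonstrates the exported evidence hasn’t changed since the capture date.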

I’ve tried to play with the custom report tools, but I’ve only been working with CV for about a year now, and frankly, I don’t understand the DB schema well enough to figure out which tables I need to pull together to extract the data. I also know I’m a lousy DBA - most databases will misbehave for me (it goes all the way back to my post-secondary years).

To date, none of the reports I’ve looked at come even remotely near what I want, and I can’t do this via the GUI with a browse-and-restore type operation for each and every client and VM we have - there are just too many systems.

Is there a way to get there from here? :relaxed:

Thanks!


Best answer by D. Kerrivan 12 August 2021, 16:11


10 replies

Userlevel 2
Badge +5

Hi!

 

It sounds like the “Jobs in Storage Policy Copies” report might fit the bill here. With this report you can collect details about the jobs that exist on a per-copy basis. Check out the doc here:

https://documentation.commvault.com/commvault/v11/article?p=40080.htm

 

You can also group by client, agent, or job ID to make the data more digestible, depending on your needs. Give this a shot and let us know!

Badge +4

@Vsicherman ,

 

Nope, that doesn’t give me what I want; if anything, it creates more questions. Clearly the data I want is in there someplace, but how do I get it out where it can be examined by an auditor?

 

I know in this instance that 41 jobs were copied to our AWS and Azure libraries (copies 3 and 4, respectively); they represent about 32 TB of data taken during our first year of backups and are the year-end backups for that time frame.

The “Jobs in Storage Policy Copies” report does not show the number of jobs in the Summary by Storage Policy Copies table. The Auxiliary Copy Summary by Clients for All Jobs table shows me lots of info about the primary copy, two columns about the secondary copy, and zero info about the third and fourth copies.

Moving on to the Jobs in Storage Policies table: I can see the policy and job ID, but I am only seeing data for one storage policy, not the alternate copies, nor do I see a count of available copies anywhere.

The last table in the report, Jobs in Storage Policy Copies, shows zero data.

When I set the report to return all options and run it again, I do get more data, but it still doesn’t meet my needs, as I only get information about the primary copy.

 

I need to show job ID, client, agent type, start/end time, data size (multiple columns), retain-backup-until date, number of copies, primary copy location, secondary copy location, tertiary copy location, and quaternary copy location. Knowing the date of each copy would be nice to have, but ultimately I can probably live without that. I don’t even really need the type of backup (incremental, synthetic full, etc.), because by retention, all of these jobs should only be our yearly fulls.
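Absent a single report with that layout, one way to approximate it is to export the per-copy job lists and pivot them into one row per job with a column per copy. A sketch under that assumption - the tuple layout, copy names, and library names below are invented for illustration, not Commvault export fields:

```python
from collections import defaultdict

def pivot_copies(records):
    """records: (job_id, client, copy_name, location) tuples, one per copy of
    each job. Returns {job_id: {"client": ..., "copies": {copy_name: location}}},
    i.e. one entry per job with all of its copy locations side by side."""
    jobs = defaultdict(lambda: {"client": None, "copies": {}})
    for job_id, client, copy_name, location in records:
        jobs[job_id]["client"] = client
        jobs[job_id]["copies"][copy_name] = location
    return dict(jobs)

# Illustrative rows, as they might come out of a per-copy export.
rows = [
    (1001, "clientX", "Primary", "OnPremLib"),
    (1001, "clientX", "Copy2", "ThirdPartyS3"),
    (1001, "clientX", "Copy3", "AWSLib"),
    (1001, "clientX", "Copy4", "AzureLib"),
]
pivoted = pivot_copies(rows)
print(len(pivoted[1001]["copies"]))  # → 4, the copy count for job 1001
```

Extra columns (agent type, start/end time, sizes, retention date) would ride along in the per-job dict the same way; the copy count per job then falls out of the length of each `copies` map.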

Userlevel 2
Badge +3

Hello @D. Kerrivan,

When running the “Jobs in Storage Policy Copies” report, on the General tab, select the storage policy so that all copies under it are selected as well. Also check the box for Associated Media, which provides the location of each job in the report. On the Time Range tab, make sure to select First Job to Last Job for the View From option.

Once the report is generated:

Summary By Storage Policy Copies at the top will show Jobs [# of jobs] on Copy for each copy under the Storage Policy.

 

The Jobs in Storage Policy Copies section will list jobs along with the storage policy name and the copy name, as well as the rest of the information you require.

 

Thank you,
Sandip

 

Badge +4

@Sandip Domadia ,

 

No. Again, it is not showing the data as I need to see it.

 

Let me back up a bit. If I look at one of my long-term retention policies, the library view in the CommCell Console shows me the following under the Data Written Distribution by Copies window:

 

So, this is how I know I have 41 jobs, and how big it is.

When I run the “Jobs in Storage Policy Copies” report, I get this sort of answer back in the Jobs in Storage Policy Copies section (names and paths changed):

 

I am looking to see something like this (yes, I took some liberties with the image and text to illustrate my point):

 

As I’ve mentioned, in NetBackup you could pull, for every backup image, a list of all of the copies and what media they were on. Unless they’ve changed the rules, you could have up to 10 copies of that data and pull a report to show where each was - be it tape, disk, or cloud. As long as I can track the parent job ID and client, the copies could have different sub-job IDs - but they all reference the parent, and that should be showable.

I have had some pretty odd questions asked by auditors over the past quarter century, but the most basic one that repeats is: show me where you have copies of data for the following jobs, for the following clients. I’ve found no easy way to answer that yet in CV. If this were back in the day when tape was the gold standard, I’d need to know each and every tape ID that covered the multiple copies.

 

Userlevel 7
Badge +23

@D. Kerrivan , currently, we don’t have a report that has this functionality all in one place.

I would suggest we go one of three ways:

  1. Create a custom report
  2. Contact your Account Rep to inquire about having Professional Services/Personalization create this report for you
  3. Create a CMR to look into getting this added (though there is no ETA or guarantee)

Let me know your thoughts!

Userlevel 7
Badge +23

Hey @D. Kerrivan , following up to see if you had a chance to read my response.

Thanks!

Badge +4

Mike,

Sorry - I’ve been a tad busy with another issue internally. I’m honestly disappointed that the report functionality doesn’t exist, and that one currently has to hop through so many areas to pull the data.

As for custom reports, that would require more spare time than I have. I have put some effort in on this, but my way of thinking and the report-writing tool don’t mesh well. I don’t know where the data I want lives, so drilling through countless tables in the hope I’ll trip over it is too time-consuming (yes, I’ve wasted a bit of time trying and hoping for sheer dumb luck :-) ). There’s a reason I’m a Storage and Backup guy, and not a DBA…

I’ll talk with my rep(s) about having the report created and at what cost, though I think the CMR is likely where I’d spend my energy at this point.

 

Userlevel 7
Badge +23

Totally understand @D. Kerrivan .  I’ll keep this open for a bit, just let me know how you want to proceed.  We can go the CMR route, though there’s no guarantee nor ETA for that release.

Badge +4

Just a further update - I did go down the road to start a CMR; due to schedules, vacations, etc., it took a bit before we had a chance to discuss. My good friend S.Jacobs @ CV also queried the team that does the personalization, and they came back asking if I’d seen this report: https://cloud.commvault.com/webconsole/softwarestore/#!/135/660/12184 which had been released/posted 11 days before I asked here (I’d exhausted the available reports earlier in the spring and did not see this get loaded). It allowed me to pull what I needed and show the job number, and that a copy resided on each of the relevant storage pools/buckets. This will let me show that on day X we had the data in one or more places, on day Y in several more, and on day Z that it had been deleted from the original paths. This should pass any audits and, as I think of it, possibly be of value in a GDPR request.

Userlevel 7
Badge +23


Awesome outcome.

I’m going to mark this answer as correct in case anyone else is looking for a similar solution!

 
