
Hi, 

I have a customer who would like us to detail the hardware and architecture required to use Activate (Sensitive Data Governance) on the existing backup job data within their Commvault solution.

The solution needs to be able to index and analyse 4 PB of file system application data stored within the archive and backup deduplicated storage pool. It would also need to scale to support a live crawl of the file system data on the servers in the future.

The current architecture guideline has a limit of 160 TB per node. Following this guideline would result in a large Index Server hardware footprint (25 servers for indexing alone).

Is there potentially a different architecture guideline to follow for big data sizes? 

 

Reference:
https://documentation.commvault.com/11.22/expert/120371_sensitive_data_governance_hardware_specifications_01.html

 

Specifications for Dedicated Servers for File Data

| Component | Large | Medium | Small |
| --- | --- | --- | --- |
| Source data size per node* | 160 TB | 80 TB | 40 TB |
| Objects per node (estimated) | 80 million | 40 million | 20 million |
| CPU or vCPU | 32 cores | 16 cores | 8 cores |
| RAM | 64 GB | 32 GB | 16 GB |
| Index disk space (SSD class disk recommended) | 12 TB | 6 TB | 3 TB |
 

Hi Hemant, 

I will discuss this internally and find out for you.


Thanks Blaine


Hi Hemant, 

Can we get approximate answers to the below:

1 – Do they need to do sensitive data analysis on the complete 4PB?
2 – What kind of files do they have?
3 – What is the approximate average file size?
 


Hi Blaine,

 

There is archive and backup data in a Commvault deduplicated storage pool. 

It is made up of File System data.

The application size of all the jobs totals over 4PB.

The files have office-based extensions, with an average size of over 1 MB per file.

Please let me know if you need anything else; I appreciate the assistance.


Hi Hemant, 

You didn't answer the following, so I have elaborated further. Can you please advise?

1 – Do you need to do sensitive data analysis on the complete 4PB of data?
2 – What kind of files do they have in there, e.g. all doc, email, etc.?


 


Hi Blaine, 

 

1 – Do you need to do sensitive data analysis on the complete 4PB of data? Yes

 

2 – What kind of files do they have in there, e.g. all doc, email, etc.? Office-based extensions, e.g. .doc, .pdf, .docx, etc.

 


@Hemant 

Roughly there are 400 million objects, based on the average file size of 1 MB.

We will require at most 5 Large access nodes (Index Server, CA, and gateway/web server). We have a lot of optimizations that help in picking the correct documents for SDG, so we can expect this count to be lower.

We also recommend that large environment configuration be done incrementally so that we can scale accordingly.

| Component | Large |
| --- | --- |
| File source data size per node* | 160 TB |
| Email source application size | 25 TB |
| File objects per node (estimated) | 80 million |
| Email objects per node (estimated)** | 250 million |
| CPU | 32 cores |
| RAM | 64 GB |
| Index disk space (SSD class disk recommended) | 12 TB |
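For reference, a minimal sketch of the object-count arithmetic behind the five-node estimate above, assuming roughly 400 million objects and the 80-million-file-objects-per-node Large figure from the table; the helper is illustrative only and does not model the SDG optimizations mentioned, which can reduce the count further.

```python
import math

# Figures from the reply above (illustrative constants).
ESTIMATED_FILE_OBJECTS = 400_000_000      # rough object count for the file data
FILE_OBJECTS_PER_LARGE_NODE = 80_000_000  # file objects per Large node from the spec table

def access_nodes_needed(total_objects: int, objects_per_node: int) -> int:
    """Round up the number of Large access nodes needed for a given object count."""
    return math.ceil(total_objects / objects_per_node)

print(access_nodes_needed(ESTIMATED_FILE_OBJECTS, FILE_OBJECTS_PER_LARGE_NODE))  # -> 5
```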

https://documentation.commvault.com/commvault/v11_sp20/article?p=95225.htm

I hope this helps your planning. 


Thanks @Blaine Williams, much appreciated.

Are the optimisations available on BoL to review?


@Hemant, I believe @Blaine Williams is referring to internal code that is optimized for these tasks.

I’ll defer to him if I misunderstood.


Thanks Mike.

