Next-Generation Sequencing (NGS) has become an essential tool in genomics research and also plays a crucial role in variant analysis. Setting the appropriate sequencing depth is critical when analyzing variants using NGS data, as it determines the accuracy and cost-effectiveness of the experiment. For instance, higher sequencing depth allows for the detection of variants with lower Variant Allele Frequency (VAF) and increases the reliability of detected variants. However, this comes with increased analysis costs. If the VAF of the variant of interest is high, a lower sequencing depth can still provide sufficient detection performance.
To detect variants at a given VAF, the minimum required sequencing depth can be calculated by dividing the minimum number of variant-supporting reads by the VAF. Assuming that at least 10 variant-supporting reads are required for confident detection, a sequencing depth of 100X is needed to detect a variant at 10% VAF, while a depth of 1000X is needed for a variant at 1% VAF. At lower depths, the expected number of reads containing the variant falls below the detection threshold.
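The depth calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the text: the function name `min_depth` and its parameters are hypothetical, and the default of 10 variant-supporting reads is the assumption stated above, not a universal threshold.

```python
import math


def min_depth(vaf: float, min_variant_reads: int = 10) -> int:
    """Return the minimum depth at which the expected number of
    variant-supporting reads reaches min_variant_reads.

    vaf is a fraction (0.01 for 1%). The default threshold of 10 reads
    is the assumption used in the text.
    """
    if not 0 < vaf <= 1:
        raise ValueError("vaf must be a fraction in the range (0, 1]")
    if min_variant_reads < 1:
        raise ValueError("min_variant_reads must be at least 1")
    # Rounding first guards against floating-point noise before taking the ceiling.
    return math.ceil(round(min_variant_reads / vaf, 9))


if __name__ == "__main__":
    for vaf in (0.10, 0.01):
        print(f"VAF {vaf:.0%}: minimum depth {min_depth(vaf)}X")
```

Note that this gives the depth at which the expected variant read count equals the threshold. Because read counts vary from site to site, a practical design would likely add a margin above this value.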
Another critical factor in determining the appropriate sequencing depth and the attainable VAF detection limit is the amount of starting material. The human genome, which consists of approximately 3 billion base pairs, has a haploid mass of approximately 3.3 pg. Thus, 3.3 ng of human gDNA contains about 1000 haploid copies of the human genome. The amount of starting material therefore sets the number of genome copies available for analysis. If the sequencing depth exceeds the number of available genome copies, additional reads resample the same original molecules, so the PCR duplicate rate increases and data analysis efficiency falls. Furthermore, if the number of genome copies is too low relative to the target VAF, the sample contains too few variant-bearing molecules, and even extensive sequencing may fail to detect the variant.
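The relationship between input mass, genome copies, and variant-bearing molecules can be sketched as follows. This is a rough illustration under the figures given in the text (3.3 pg per haploid genome); the function names are hypothetical, and the calculation assumes that every input molecule is intact and usable.

```python
# Approximate mass of one haploid human genome, as stated in the text.
HAPLOID_GENOME_MASS_PG = 3.3


def haploid_copies(input_ng: float) -> float:
    """Estimate the number of haploid genome copies in input_ng of human gDNA."""
    if input_ng < 0:
        raise ValueError("input_ng must be non-negative")
    # Convert ng to pg (1 ng = 1000 pg), then divide by the mass of one copy.
    return input_ng * 1000.0 / HAPLOID_GENOME_MASS_PG


def expected_variant_molecules(input_ng: float, vaf: float) -> float:
    """Estimate how many input molecules are expected to carry the variant."""
    if not 0 <= vaf <= 1:
        raise ValueError("vaf must be a fraction in the range [0, 1]")
    return haploid_copies(input_ng) * vaf


if __name__ == "__main__":
    for input_ng in (3.3, 33.0):
        copies = haploid_copies(input_ng)
        at_one_percent = expected_variant_molecules(input_ng, 0.01)
        print(f"{input_ng} ng: ~{copies:.0f} haploid copies, "
              f"~{at_one_percent:.0f} variant molecules at 1% VAF")
```

Under these assumptions, 3.3 ng of input provides only about 10 variant-bearing molecules at 1% VAF, which suggests that sequencing far beyond roughly 1000X on such a sample would mostly produce duplicate reads rather than new information.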
The quality of the DNA sample is also an important factor in variant detection. The haploid copy numbers calculated from sample quantity assume intact genomic DNA with little or no damage. If the sample quality is poor or the DNA is damaged, fewer molecules are usable in the experiment, which reduces data analysis efficiency. To compensate and ensure sufficient performance, more starting material or more sequencing data may be required.
In conclusion, the appropriate sequencing depth should be determined by weighing several factors together, such as the experiment’s objectives, sample quantity and quality, budget, and available analytical tools. The choice of sequencing depth may also vary with the researcher’s experience and knowledge and with the characteristics of the variants being analyzed.