Celemics, Inc.

NGS Glossary Part 2: Sequencing

NGS Key Terminology Guide Part 2: Sequencing

Glossary of common NGS terms

A successful NGS experiment depends on precise and reliable sequencing.
This post breaks down essential sequencing terms such as sequencing by synthesis (SBS), SMRT, and nanopore sequencing. From concepts like read length, paired-end reads, and error rate to techniques like base calling and cluster generation, this guide explains how sequencing works and why each step matters for high-quality genomic data.

Sequencing is the process of determining the precise order of nucleotides (A, T, G, C) in DNA or RNA molecules.
NGS technologies perform massively parallel sequencing, allowing millions of fragments to be read simultaneously,
greatly enhancing throughput and reducing per-base cost.

Sanger sequencing is a first-generation DNA sequencing method that uses chain-terminating dideoxynucleotides during DNA synthesis.
It produces highly accurate reads of up to roughly 1,000 bases, but its throughput is far lower than that of next-generation sequencing technologies.

Sequencing by Ligation (SBL) is a sequencing method where short, fluorescently labeled oligonucleotides are ligated to the DNA template.
The sequence is inferred based on the ligation patterns detected during the process, as used in SOLiD systems.

Sequencing by Synthesis (SBS) is a method used in Illumina platforms where DNA polymerase incorporates fluorescently labeled nucleotides,
and optical sensors detect the emitted signals to determine the DNA sequence one base at a time.

Sequencing by Binding (SBB) is an emerging sequencing method where fluorescently labeled binding probes specifically recognize and bind to DNA bases.
This process generates a signal without requiring nucleotide incorporation or ligation.
A representative platform utilizing SBB is the PacBio Onso system, which aims to deliver highly accurate short-read sequencing
with reduced error rates compared to traditional methods.

Nanopore sequencing, a term that usually refers to the platform from Oxford Nanopore Technologies, determines the nucleotide sequence
by measuring changes in electrical current as DNA or RNA strands pass through a nanopore in real time.

Single-Molecule Real-Time (SMRT) sequencing, developed by Pacific Biosciences,
analyzes individual DNA molecules in real time, offering long read lengths and high accuracy.

Paired-end sequencing reads both ends of a DNA fragment, generating two reads per fragment.
This approach provides insert size information and enhances the accuracy of read alignment,
especially across repetitive regions, and improves detection of structural variants.
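
As a rough illustration of the insert size information mentioned above, the sketch below computes the outer distance spanned by one aligned read pair. The function name and the coordinates are hypothetical, and the calculation assumes a simple forward/reverse pair with no soft-clipping or indels.

```python
def fragment_length(fwd_start: int, rev_start: int, rev_len: int) -> int:
    """Outer distance spanned by a read pair on the reference.

    fwd_start: 0-based leftmost reference position of the forward read
    rev_start: 0-based leftmost reference position of the reverse read
    rev_len:   number of reference bases covered by the reverse read
    """
    if rev_start < fwd_start:
        raise ValueError("expected the forward read to start at or before the reverse read")
    # The fragment runs from the start of the forward read
    # to the last base covered by the reverse read.
    return rev_start + rev_len - fwd_start


# Hypothetical pair: forward read at 1,000, reverse read covering 1,250..1,399.
print(fragment_length(fwd_start=1_000, rev_start=1_250, rev_len=150))  # 400
```

A pair whose computed distance is far from the expected library fragment size is one of the signals that can point to a structural variant.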

Read length is the number of bases in each sequencing read. It affects the accuracy of alignment and assembly and determines which applications a platform is suited to.

A cluster is a group of identical DNA copies generated during cluster amplification on a flow cell.
Each cluster originates from a single library molecule and produces a distinct sequencing read.

Cluster generation is the step where library molecules are bound to a flow cell and clonally amplified to form clusters,
which are then sequenced. Cluster density and uniformity directly affect sequencing throughput and data quality.
It is performed automatically within sequencers such as those from Illumina.

Chimeric reads are sequencing artifacts formed when two unrelated DNA fragments are erroneously joined or amplified as a single molecule,
often during PCR amplification. These artifacts can interfere with downstream analyses such as genome assembly, structural variant detection,
and accurate alignment. Proper optimization of PCR conditions, including cycle number and enzyme choice, helps minimize their formation.

Read throughput refers to the total number of sequencing reads generated by a sequencing run.
Higher throughput platforms can produce billions of reads, supporting large-scale genomic and transcriptomic studies.
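
To make read throughput and read length concrete, here is a minimal sketch that counts reads and bases in FASTQ-formatted text. It assumes the plain four-lines-per-record layout with no wrapped sequence lines; the function name and the example records are made up for illustration.

```python
from io import StringIO


def fastq_stats(handle):
    """Return (read_count, total_bases, mean_read_length) for a FASTQ stream.

    Assumes exactly four lines per record: header, bases, '+', qualities.
    """
    read_count = 0
    total_bases = 0
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:  # second line of each record holds the bases
            read_count += 1
            total_bases += len(line.strip())
    mean_length = total_bases / read_count if read_count else 0.0
    return read_count, total_bases, mean_length


# Two made-up records: one 8-base read and one 4-base read.
example = StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nACGT\n+\nIIII\n"
)
print(fastq_stats(example))  # (2, 12, 6.0)
```

The same function works on a real file by passing an open text file handle instead of the in-memory example.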

Error rate is the percentage of incorrectly identified bases in sequencing data.
It varies by platform and sequencing technology, influencing the accuracy and reliability of downstream genomic analyses.
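
A simplified way to picture this definition is to compare called bases against a known sequence position by position. The sketch below counts substitutions only and assumes the two strings are already aligned and of equal length; insertions and deletions, which matter on some platforms, are not handled. The function name and sequences are illustrative.

```python
def error_rate(called: str, truth: str) -> float:
    """Percentage of positions where the called base differs from the known base."""
    if len(called) != len(truth):
        raise ValueError("sequences must be the same length (substitutions only)")
    if not truth:
        raise ValueError("sequences must not be empty")
    mismatches = sum(c != t for c, t in zip(called, truth))
    return 100.0 * mismatches / len(truth)


# One mismatch (position 7) out of 10 bases.
print(error_rate("ACGTACGTAC", "ACGTACCTAC"))  # 10.0
```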

Base calling is the process of translating raw signal intensities (typically fluorescent signals) into nucleotide bases (A, T, G, C) during sequencing.
Base calling accuracy significantly affects downstream analyses such as variant calling and alignment.
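
The idea can be sketched with a toy example: given one intensity value per base channel for each sequencing cycle, call the base whose channel is brightest. This is a deliberately naive illustration, not how any vendor's base caller works; the channel order, function name, and intensity values are all assumptions.

```python
CHANNELS = "ACGT"  # assumed channel order for this toy example


def call_bases(intensities):
    """Call one base per cycle by picking the channel with the strongest signal."""
    calls = []
    for cycle in intensities:
        if len(cycle) != len(CHANNELS):
            raise ValueError("each cycle needs one intensity per channel")
        strongest = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        calls.append(CHANNELS[strongest])
    return "".join(calls)


# Made-up intensities for four cycles, ordered as (A, C, G, T).
signals = [
    (0.91, 0.03, 0.04, 0.02),
    (0.05, 0.10, 0.80, 0.05),
    (0.02, 0.02, 0.06, 0.90),
    (0.10, 0.70, 0.15, 0.05),
]
print(call_bases(signals))  # AGTC
```

When two channels are nearly equal, a call like this is unreliable, which is the kind of uncertainty a per-base quality score is meant to capture.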

A quality score is a numerical value assigned to each base in sequencing data, reflecting the estimated probability that the base call is incorrect.
It is typically expressed on the Phred scale, Q = -10 log10(P), where P is that error probability, so higher scores indicate greater confidence (Q30 corresponds to a 1 in 1,000 chance of an incorrect call).
Quality scores are used in filtering, variant calling, and overall assessment of sequencing data reliability.
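
The Phred relationship between a score Q and an error probability P, Q = -10 log10(P), can be sketched in a few lines. The helper names below are illustrative, and the decoder assumes the Phred+33 ASCII encoding commonly used in FASTQ files.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the probability that the call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    return -10 * math.log10(p)


def decode_fastq_quality(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding by default."""
    return [ord(ch) - offset for ch in qual]


print(f"{phred_to_error_prob(20):.4f}")  # 0.0100
print(f"{phred_to_error_prob(30):.4f}")  # 0.0010
print(decode_fastq_quality("I5+"))       # [40, 20, 10]
```

A common filtering rule of thumb is to keep bases at or above Q30, which by the formula corresponds to an estimated 99.9% chance that the call is correct.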
