DRAGEN

The World's First Bio-IT Processor


Genome Pipeline

Ultra-Rapid Genome and Exome Data Analysis

Genome Sequencing

Overview

The DRAGEN Genome pipeline enables ultra-rapid analysis of next-generation sequencing (NGS) data. It reduces the time required to analyze a whole genome at 30x coverage from ~10 hours with the current industry-standard software, BWA-MEM + GATK-HC, to ~22 minutes, while also improving accuracy for both SNPs and INDELs. The pipeline harnesses the power of the DRAGEN Bio-IT Platform and includes highly optimized algorithms for mapping, aligning, sorting, duplicate marking, haplotype variant calling, compression, and decompression.
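As a quick check on the figures above, the implied speedup can be worked out directly. This is only arithmetic on the two approximate run times quoted in the overview; the function name `speedup` is illustrative and not part of any DRAGEN tool.

```python
def speedup(baseline_minutes: float, accelerated_minutes: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    if accelerated_minutes <= 0:
        raise ValueError("accelerated_minutes must be positive")
    return baseline_minutes / accelerated_minutes


# Both figures are approximate ("~") in the overview text.
baseline = 10 * 60  # ~10 hours for BWA-MEM + GATK-HC, in minutes
dragen = 22         # ~22 minutes for the DRAGEN Genome pipeline

print(f"approximate speedup: {speedup(baseline, dragen):.1f}x")
```

With the quoted numbers this comes to roughly a 27-fold reduction in analysis time for a single 30x genome.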

How Does it Work?

DRAGEN takes raw read data produced by a sequencing instrument, such as Illumina’s HiSeq X Ten, and, after variant calling, outputs a standard VCF file ready for tertiary analysis. The DRAGEN platform includes a fully functional, easy-to-use graphical user interface (GUI) and Workflow Management System. With these, customers can schedule multiple workflow runs, analyze results such as alignment statistics and coverage metrics, compare different pipelines, monitor multiple networked DRAGEN systems, and receive updated software releases.

DRAGEN Genome Pipeline

The DRAGEN pipeline offers flexibility in data analysis. DRAGEN accepts multiple input formats and produces industry-standard output formats compatible with downstream analysis tools. DRAGEN can stream BCL data directly from sequencer storage, a capability unique to the DRAGEN pipeline, so the customer can go directly from raw sequencing data to an output VCF. DRAGEN can also convert BCL to FASTQ or BAM/CRAM and then proceed with the standard DRAGEN pipeline.


Efficient Multi-Sample Processing

DRAGEN can process BCL data directly, eliminating the FASTQ conversion step. The BCL data is fed straight into the pipeline, which produces a separate output VCF file for each sample. Intermediate BAM/CRAM files can be generated on demand. To streamline and automate multi-sample processing, DRAGEN offers a comprehensive Workflow Management System (WMS). The WMS lets customers schedule multiple workflow runs for any pipeline, as well as adjust or accelerate their own NGS analysis algorithms, pipelines, and applications.


Pipeline Steps

1

Input/Output File Formats

  • FASTQ or BCL to BAM/CRAM or VCF/gVCF
  • BAM/CRAM to VCF/gVCF
2

Compression/Decompression

  • Decompression of FASTQ, BCL, BAM/CRAM
  • Gzip and CRAM in and out
3

BCL Convert/Demultiplex

  • BCL conversion to FASTQ
  • BCL can also be processed directly
4

Mapping/Aligning

  • Single end or paired end reads
  • Supports read lengths from 26 bp to 10k bp
5

Position Sorting

  • Binning by reference range
  • Sorting of bins by reference position
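The two bullets above describe a bin-then-sort scheme. A minimal software sketch of that idea follows; it is not DRAGEN's implementation, and the tuple layout, the `bin_size` default, and the function name `position_sort` are all assumptions made for illustration.

```python
from collections import defaultdict


def position_sort(reads, contigs, bin_size=1_000_000):
    """Sort aligned reads by reference position using range bins.

    reads    -- iterable of (contig, position, read_name) tuples (hypothetical layout)
    contigs  -- contig names in reference order, e.g. ["chr1", "chr2"]
    bin_size -- width of each reference range in bases (assumed value)
    """
    contig_rank = {name: i for i, name in enumerate(contigs)}

    # Step 1: binning by reference range.
    bins = defaultdict(list)
    for read in reads:
        contig, position, _name = read
        bins[(contig_rank[contig], position // bin_size)].append(read)

    # Step 2: visit bins in reference order and sort each bin by position.
    ordered = []
    for key in sorted(bins):
        ordered.extend(sorted(bins[key], key=lambda r: r[1]))
    return ordered


reads = [
    ("chr2", 150, "r1"),
    ("chr1", 2_500_000, "r2"),
    ("chr1", 10, "r3"),
    ("chr1", 2_000_001, "r4"),
]
for read in position_sort(reads, ["chr1", "chr2"]):
    print(read)
```

Because each bin covers a disjoint reference range, the bins can be sorted independently of one another, which is what makes this layout attractive for parallel hardware.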
6

Duplicate Marking

  • Based on starting position and CIGAR string
  • Highest quality duplicate report
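To illustrate how a starting position and a CIGAR string can identify duplicates, here is a small single-end sketch. It follows the common convention of comparing unclipped 5' positions and keeping the read with the highest summed base quality; whether DRAGEN uses exactly this rule is not stated on this page, so treat the `Read` class, its fields, and the tie-breaking as assumptions.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


@dataclass
class Read:  # hypothetical minimal record, not a real BAM API
    name: str
    contig: str
    pos: int        # 1-based leftmost aligned position
    cigar: str
    reverse: bool
    qual_sum: int   # sum of base qualities
    duplicate: bool = False


def five_prime_position(read):
    """Unclipped 5' reference position, derived from pos and the CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(read.cigar)]
    if not read.reverse:
        leading_clip = 0
        for n, op in ops:
            if op not in "SH":
                break
            leading_clip += n
        return read.pos - leading_clip
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    trailing_clip = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        trailing_clip += n
    return read.pos + ref_len - 1 + trailing_clip


def mark_duplicates(reads):
    """Flag every read except the highest-quality one in each duplicate group."""
    groups = defaultdict(list)
    for read in reads:
        groups[(read.contig, read.reverse, five_prime_position(read))].append(read)
    for group in groups.values():
        best = max(group, key=lambda r: r.qual_sum)
        for read in group:
            read.duplicate = read is not best
    return reads
```

For example, a forward read at position 100 with CIGAR `50M` and a forward read at position 105 with CIGAR `5S45M` share the unclipped start 100, so the lower-quality one of the pair would be flagged.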
7

Variant Calling

  • Haplotype variant caller with reassembly
  • Uses Hidden Markov Model and Smith-Waterman Alignment
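For readers unfamiliar with the Smith-Waterman step named above, the following is a textbook local-alignment sketch with a linear gap penalty. The scoring values are arbitrary illustrative choices, and nothing here reflects how DRAGEN implements the algorithm in hardware; the Hidden Markov Model stage is not shown.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTACGTTT", "GGACGTGG"))
```

In a haplotype caller, alignments of this kind are typically used to place candidate haplotypes against the reference, and an HMM then scores how well each read supports each haplotype.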

Ultra-Rapid Analysis: # Genomes Sequenced in 48 Hours*

Whole Genome Analysis

Speed: Single Sample Pipeline*

Speed of Single Sample Genome Pipeline

Speed: Multi Sample Pipeline*

Multi Sample Genome Pipeline Analysis

**BCL direct to VCF capability is unique to the DRAGEN pipeline.

Accuracy: Single and Multi Sample Pipelines*

Accuracy for Genome Pipeline

*All DRAGEN results are compared against BWA-MEM 0.7.12 + GATK 3.1 running on comparable servers.

ROC Plots of Variants at 30x Coverage

ROC of SNPs

ROC of SNPs: A SNP (single nucleotide polymorphism) occurs when a single base differs between two genomes, in this case the subject and the reference genome. Use of the NIST Platinum Genome high-confidence call set enables performance comparisons between different pipelines. In this ROC plot, a curve with more true positive SNPs and fewer false positive SNPs is better.

ROC of INDELs

ROC of INDELs: An INDEL (insertion or deletion) occurs when bases are inserted or deleted in the subject genome with respect to a reference genome. Use of the NIST Platinum Genome high-confidence call set enables performance comparisons between different pipelines. In this ROC plot, a curve with more true positive INDELs and fewer false positive INDELs is better.
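The count-based ROC curves described above can be produced by sweeping a quality threshold over a pipeline's calls and counting true and false positives against a truth set. The sketch below shows that bookkeeping only; the variant key layout and the function name `roc_points` are assumptions, and a real comparison would also normalize variant representation and restrict to high-confidence regions.

```python
def roc_points(calls, truth):
    """Return (threshold, false_positives, true_positives) per quality threshold.

    calls -- list of (variant_key, quality) pairs; variant_key is any hashable,
             e.g. (contig, position, ref, alt) (hypothetical layout)
    truth -- set of variant keys from the high-confidence call set
    """
    ordered = sorted(calls, key=lambda c: c[1], reverse=True)
    points = []
    tp = fp = 0
    for idx, (key, quality) in enumerate(ordered):
        if key in truth:
            tp += 1
        else:
            fp += 1
        # Emit one point after the last call sharing this quality value.
        last_at_quality = idx + 1 == len(ordered) or ordered[idx + 1][1] != quality
        if last_at_quality:
            points.append((quality, fp, tp))
    return points


truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}
calls = [
    (("chr1", 100, "A", "G"), 50.0),
    (("chr1", 150, "G", "A"), 20.0),
    (("chr1", 200, "C", "T"), 35.0),
]
for threshold, fp, tp in roc_points(calls, truth):
    print(f"quality >= {threshold}: {tp} true positives, {fp} false positives")
```

Plotting true positives against false positives for each pipeline gives curves like those shown here, where a curve that climbs higher while staying further left is the better one.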

Product Sheet:

DRAGEN Genome Pipeline

Genome Sequencing Pipeline