DRAGEN

The World's First Bio-IT Processor

Transcriptome Pipeline

Ultra-Rapid RNA-Seq Data Analysis

Overview

The DRAGEN Transcriptome (RNA-Seq) Pipeline performs Next Generation Sequencing (NGS) secondary analysis of RNA transcripts. The Transcriptome Pipeline offers multiple operating modes, including reference-only alignment and annotation-assisted alignment. DRAGEN transcriptome alignments are compatible with downstream tools for transcript assembly, novel transcript discovery, differential gene expression, gene fusion detection, and other RNA-Seq applications.

How Does it Work?

DRAGEN takes as input raw read data produced by a sequencing instrument, such as Illumina’s HiSeq 2500. After variant calling, DRAGEN outputs a standard VCF file ready for tertiary analysis. The DRAGEN Bio-IT Platform includes a fully functional, easy-to-use graphical user interface (GUI) and Workflow Management System that lets customers schedule multiple workflow runs, analyze results, compare different pipelines, monitor multiple networked DRAGEN cards, and receive updated software releases.

Transcriptome Pipeline

The DRAGEN Transcriptome Pipeline accepts FASTQ, BAM, or CRAM input and produces aligned BAM or CRAM output. DRAGEN can optionally take a gene annotation file (GTF) to guide the spliced alignments. DRAGEN can also run in a “2-pass” mode, which uses novel splice junctions detected in the first pass to guide the mapping/aligning phase of the second pass.

Pipeline Steps

Input/Output File Formats

  • FASTQ or BCL to BAM/CRAM or VCF/gVCF
  • BAM/CRAM to VCF/gVCF

Spliced Mapping/Aligning

  • Single end or paired ends
  • BCL can also be processed directly
Splice Junction Output

  • Format similar to STAR’s SJ.out.tab.
  • User-configurable junction filters
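
As an illustration of how a junction table of this kind can be read and filtered downstream, here is a minimal Python sketch. It assumes the nine tab-separated columns of STAR's SJ.out.tab; the DRAGEN file is described only as similar, so the column order should be checked against real output. The names `Junction`, `parse_junctions`, and `filter_junctions`, and the thresholds, are hypothetical and are not DRAGEN options.

```python
from typing import Iterable, Iterator, NamedTuple


class Junction(NamedTuple):
    """One row of a STAR-style SJ.out.tab file (assumed layout)."""
    chrom: str
    intron_start: int   # 1-based first base of the intron
    intron_end: int     # 1-based last base of the intron
    strand: int         # 0 = undefined, 1 = +, 2 = -
    motif: int          # 0 = non-canonical, 1..6 = canonical motif codes
    annotated: bool     # True if the junction was in the supplied annotation
    unique_reads: int   # uniquely mapped reads crossing the junction
    multi_reads: int    # multi-mapped reads crossing the junction
    max_overhang: int   # maximum spliced alignment overhang


def parse_junctions(lines: Iterable[str]) -> Iterator[Junction]:
    """Yield a Junction for every well-formed line; skip short or blank lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        n = [int(x) for x in fields[1:9]]
        yield Junction(fields[0], n[0], n[1], n[2], n[3], bool(n[4]), n[5], n[6], n[7])


def filter_junctions(junctions: Iterable[Junction],
                     min_unique_reads: int = 2,
                     min_overhang: int = 8,
                     keep_annotated: bool = True) -> Iterator[Junction]:
    """Keep annotated junctions (optionally) and novel ones with enough support.

    The thresholds are illustrative defaults, not values taken from DRAGEN.
    """
    for j in junctions:
        if keep_annotated and j.annotated:
            yield j
        elif j.unique_reads >= min_unique_reads and j.max_overhang >= min_overhang:
            yield j
```

Typical use would be `list(filter_junctions(parse_junctions(open(path))))`, where `path` points at the junction file.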

Position Sorting

  • Binning by reference range
  • Sorting of bins by reference position
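
The two steps above (bin, then sort within bins) can be sketched in a few lines of Python. This is a conceptual illustration only, not DRAGEN's implementation; the function name `sort_by_position`, the tuple layout of a read, and the `bin_size` value are assumptions made for the example.

```python
from collections import defaultdict


def sort_by_position(reads, contig_order, bin_size=1_000_000):
    """Sort reads by reference position using range bins.

    reads:        iterable of (contig, position, read_name) tuples
    contig_order: contig names in reference order
    bin_size:     width of each reference range in bases (illustrative value)
    """
    rank = {contig: i for i, contig in enumerate(contig_order)}

    # Step 1: assign each read to a bin keyed by (contig rank, range index).
    bins = defaultdict(list)
    for read in reads:
        contig, pos, _name = read
        bins[(rank[contig], pos // bin_size)].append(read)

    # Step 2: visit bins in reference order and sort each bin by position.
    ordered = []
    for key in sorted(bins):
        ordered.extend(sorted(bins[key], key=lambda r: r[1]))
    return ordered
```

Because every bin covers a disjoint reference range, bins can be sorted independently, which is what makes this layout easy to parallelize.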

Duplicate Marking

  • Based on starting position and CIGAR string
  • Highest-quality duplicate reported
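
A minimal Python sketch of duplicate marking along these lines follows. It groups reads by starting position and CIGAR string and keeps the read with the highest total base quality in each group. Including the strand in the grouping key, using the sum of base qualities as the quality measure, and the dictionary layout of a read are all assumptions for illustration, not documented DRAGEN behavior.

```python
from collections import defaultdict


def mark_duplicates(reads):
    """Return a list of booleans, True where the read is marked as a duplicate.

    reads: list of dicts with keys 'chrom', 'pos', 'strand', 'cigar', 'quals'
           ('quals' is a list of per-base Phred scores).
    """
    # Group read indices by (chrom, start, strand, CIGAR); strand is an assumption.
    groups = defaultdict(list)
    for idx, r in enumerate(reads):
        groups[(r["chrom"], r["pos"], r["strand"], r["cigar"])].append(idx)

    is_dup = [False] * len(reads)
    for members in groups.values():
        # The read with the highest summed base quality represents the group;
        # on ties, the first such read in input order is kept.
        best = max(members, key=lambda i: sum(reads[i]["quals"]))
        for i in members:
            is_dup[i] = i != best
    return is_dup
```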

Variant Calling

  • Haplotype variant caller with reassembly
  • Uses Hidden Markov Model and Smith-Waterman Alignment
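
For readers unfamiliar with Smith-Waterman alignment, here is a textbook Python sketch that computes the best local alignment score between two sequences. It uses a linear gap penalty and illustrative scoring values chosen for the example; it is not DRAGEN's implementation, and it does not cover the Hidden Markov Model step.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero lets a local alignment start at any cell.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Only two rows are held in memory because the score alone is needed; recovering the alignment itself would require keeping the full matrix for traceback.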

Downstream Analysis Tools

  • Outputs compatible with downstream tools
  • Tools include featureCounts and DESeq

Transcriptome Pipeline Speed

The DRAGEN Transcriptome Pipeline offers multiple modes, including reference-only alignment and annotation-assisted alignment. The alignment accuracy and splice junction discovery accuracy results for each mode are shown below. Both the reference-only and annotation-assisted alignment pipelines were evaluated on the Engstrom Sim2 Dataset*.

*BEERS Sim2 datasets obtained from: Systematic evaluation of spliced alignment programs for RNA-seq data. Nature Methods. doi:10.1038/nmeth.2722

Alignment Accuracy for Reference-Only Alignment

DRAGEN is highly accurate when aligning RNA-Seq reads without any annotation assistance, as shown in the alignment accuracy and splice junction discovery accuracy results below.

Splice Junction Discovery Accuracy

Reference-Only Alignment

Splice Junction Discovery
Cumulative counts of true and false junctions were computed over a range of thresholds for the number of supporting alignments. A point further to the left on a curve has a higher supporting alignment count threshold than a point to the right.
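
The curve construction described above can be sketched in Python as follows. The function name `junction_curve` and the input layout (a mapping from junction to supporting alignment count, plus a set of true junctions from the simulation) are assumptions made for this illustration.

```python
def junction_curve(support, truth):
    """Cumulative true/false junction counts over supporting-alignment thresholds.

    support: dict mapping a junction key, e.g. (chrom, start, end), to the
             number of alignments that support it
    truth:   set of junction keys known to be real (from the simulation)

    Returns (threshold, true_count, false_count) tuples ordered from the
    highest threshold to the lowest, i.e. from left to right along the curve.
    """
    points = []
    for threshold in sorted(set(support.values()), reverse=True):
        called = [j for j, n in support.items() if n >= threshold]
        true_count = sum(1 for j in called if j in truth)
        points.append((threshold, true_count, len(called) - true_count))
    return points
```

Lowering the threshold can only add junctions, so both counts are non-decreasing along the returned list, which is why stricter thresholds sit further to the left.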

Overall Read Alignment Accuracy*

Reference-Only Alignment

Read Alignment Accuracy

Each bar plot shows the number of perfect alignments (all bases in read aligned correctly), number of partially correct alignments (at least one base aligned correctly but not all) and totally incorrect alignments.
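
The three categories can be expressed as a small Python function. This sketch assumes that, for each base of a simulated read, both the true reference coordinate and the coordinate assigned by the aligner are available (with `None` for a base the aligner left unaligned); the function name and input layout are hypothetical.

```python
from collections import Counter


def classify_alignment(true_positions, aligned_positions):
    """Classify one read as 'perfect', 'partial', or 'incorrect'.

    true_positions:    true reference coordinate of every base in the read
    aligned_positions: coordinate the aligner assigned to each base, or None
    """
    if not true_positions or len(true_positions) != len(aligned_positions):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(
        1 for t, a in zip(true_positions, aligned_positions)
        if a is not None and a == t
    )
    if correct == len(true_positions):
        return "perfect"      # all bases aligned correctly
    if correct > 0:
        return "partial"      # at least one base correct, but not all
    return "incorrect"        # no base aligned correctly


def tally(pairs):
    """Count categories over an iterable of (true_positions, aligned_positions)."""
    return Counter(classify_alignment(t, a) for t, a in pairs)
```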

*Reference-only and annotation-assisted alignment pipelines were evaluated on the Engstrom Sim2 Dataset.

Alignment Accuracy for Gene Annotation Input

DRAGEN also offers annotation-assisted alignment, which takes gene annotations in GTF format as input. The GTF provides the pipeline with the precise locations of known splice junctions for a given species, which improves the sensitivity of splice junction discovery. The annotation-assisted alignment pipeline was also evaluated on the Engstrom Sim2 Dataset.
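
To show how known splice junctions follow from a GTF file, here is a minimal Python sketch that derives intron coordinates from consecutive exons of each transcript. It relies only on the standard nine-column GTF layout and the `transcript_id` attribute; the function name is hypothetical, and how DRAGEN itself processes the GTF is not described in this document.

```python
import re
from collections import defaultdict

_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def junctions_from_gtf(lines):
    """Return a set of (chrom, intron_start, intron_end, strand) tuples.

    Coordinates are 1-based and inclusive, matching GTF conventions.
    """
    # Collect exon intervals per transcript.
    exons = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        key = (fields[0], fields[6], match.group(1))
        exons[key].append((int(fields[3]), int(fields[4])))

    # An intron lies between each pair of adjacent exons in a transcript.
    junctions = set()
    for (chrom, strand, _tid), spans in exons.items():
        spans.sort()
        for (_, left_end), (right_start, _) in zip(spans, spans[1:]):
            if right_start > left_end + 1:
                junctions.add((chrom, left_end + 1, right_start - 1, strand))
    return junctions
```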

Splice Junction Discovery Accuracy*

Gene Annotation Input

Splice Junction Discovery: Annotations

Gene annotation input improves the sensitivity of splice junction discovery. Accurate gene annotations are available for a limited number of species at present.

Overall Read Alignment Accuracy

Gene Annotation Input

Read Alignment Accuracy: Annotations

With gene annotation input, DRAGEN perfectly aligns at least 10% more reads than STAR or TopHat.