DRAGEN

The World's First Bio-IT Processor

Joint Genotyping Pipeline

Ultra-Rapid Multi Genome Analysis

Overview

The DRAGEN Joint Genotyping Pipeline calls variants from multiple samples 25x faster than competing pipelines without compromising accuracy. It supports both pedigree and population variant calling from a cohort of samples. The Joint Genotyping pipeline handles up to ten samples at a time; the DRAGEN Population Calling pipeline handles cohorts of many thousands of samples at once.

The combination of DRAGEN’s speed and hierarchical grouping of multiple samples provides the most computationally efficient analysis solution for joint genotyping.

How Does it Work?

DRAGEN performs map/align and variant calling on multiple FASTQ files or multi-sample BCL folders produced by a sequencing instrument (such as Illumina’s HiSeq X10). The output files are in gVCF format and are fed into DRAGEN’s Joint Genotyper to produce a single VCF file for subsequent analysis.

The DRAGEN Bio-IT Platform includes a fully functional, easy-to-use graphical user interface (GUI) and an extensive set of tools that let customers schedule multiple workflow runs, analyze results, compare different pipelines, monitor multiple networked DRAGEN cards, and manage update releases.

DRAGEN Joint Genotyping Pipeline

The DRAGEN Joint Genotyping pipeline makes variant calls using information from multiple samples. DRAGEN first produces an output gVCF file for each individual sample; each gVCF file provides a comprehensive record of every position in the genome. The gVCF files are then fed into the DRAGEN Joint Genotyper to produce a single VCF for subsequent joint or family analysis. The Joint Genotyping pipeline handles up to ten samples at a time; the DRAGEN Population Calling pipeline handles cohorts of many thousands of samples at once.
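
The sample limits described above can be expressed as a simple routing rule. The following is a minimal sketch only: the function name `choose_pipeline`, the returned labels, and the `joint_limit` parameter are hypothetical and are not part of any DRAGEN interface. The only fact taken from this page is the ten-sample limit of the Joint Genotyping pipeline.

```python
def choose_pipeline(sample_count: int, joint_limit: int = 10) -> str:
    """Pick a pipeline label for a cohort of the given size.

    joint_limit defaults to the ten-sample limit stated for the
    Joint Genotyping pipeline; the labels are illustrative names only.
    """
    if sample_count < 1:
        raise ValueError("a cohort needs at least one sample")
    if sample_count <= joint_limit:
        return "joint-genotyping"
    # Larger cohorts are described as the domain of Population Calling.
    return "population-calling"
```

For example, a family trio (three samples) would be routed to the Joint Genotyping pipeline, while a 2,000-sample cohort would be routed to Population Calling.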

Joint Genotyping Diagram 1

Joint Calling from BCL

When the samples to be joint called were sequenced on the same flow cell, DRAGEN can map/align the multi-sample input simultaneously, which speeds up the overall joint calling process. DRAGEN can also process BCL data directly, eliminating the FASTQ conversion step: the BCL data is fed straight into the pipeline to produce a separate gVCF file for each sample. Intermediate BAM/CRAM files can be generated on demand.

Joint Genotyping Diagram 2

Pipeline Steps

1

Input/Output File Formats

  • FASTQ or BCL to BAM/CRAM or VCF/gVCF
  • BAM/CRAM to VCF/gVCF
2

Compression/Decompression

  • Decompression of FASTQ, BCL, BAM/CRAM
  • Gzip and CRAM in and out
3

BCL Convert/Demultiplex

  • BCL conversion to FASTQ
  • BCL can also be processed directly
4

Mapping/Aligning

  • Single end or paired end reads
  • Supports read lengths from 26 bp to 10k bp
5

Position Sorting

  • Binning by reference range
  • Sorting of bins by reference position
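
The two position-sorting bullets above can be illustrated in software. This is a sketch under stated assumptions, not a description of how DRAGEN implements the step: the `Read` type, the `position_sort` function, and the one-megabase `bin_size` are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Read(NamedTuple):
    name: str
    contig_index: int  # index of the reference contig (e.g. 0 for the first)
    position: int      # 0-based leftmost aligned reference position


def position_sort(reads: List[Read], bin_size: int = 1_000_000) -> List[Read]:
    """Bin reads by reference range, then sort each bin by position."""
    bins: Dict[Tuple[int, int], List[Read]] = defaultdict(list)
    for read in reads:
        # A bin is identified by its contig and a fixed-width position range.
        bins[(read.contig_index, read.position // bin_size)].append(read)

    ordered: List[Read] = []
    for key in sorted(bins):  # visit bins in reference order
        ordered.extend(sorted(bins[key], key=lambda r: r.position))
    return ordered
```

Because each bin covers a disjoint reference range, sorting the bins independently and concatenating them in order yields a fully position-sorted list.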
6

Duplicate Marking

  • Based on starting position and CIGAR string
  • Highest quality duplicate report
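
A minimal sketch of duplicate marking keyed on starting position and CIGAR string follows. It assumes that the highest-quality read in each group is kept and the rest are flagged, and that read names are unique. The `AlignedRead` type, its single `quality` field, and the `mark_duplicates` function are hypothetical; the page does not describe DRAGEN's actual data structures or tie-breaking rules.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class AlignedRead(NamedTuple):
    name: str
    contig: str
    start: int    # leftmost aligned reference position
    cigar: str    # CIGAR string of the alignment
    quality: int  # hypothetical summary score, e.g. sum of base qualities


def mark_duplicates(reads: List[AlignedRead]) -> Set[str]:
    """Return the names of reads flagged as duplicates."""
    best: Dict[Tuple[str, int, str], AlignedRead] = {}
    for read in reads:
        key = (read.contig, read.start, read.cigar)
        current = best.get(key)
        # Keep the highest-quality read per key; on ties the first one wins.
        if current is None or read.quality > current.quality:
            best[key] = read

    kept = {read.name for read in best.values()}
    return {read.name for read in reads if read.name not in kept}
```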
7

Variant Calling

  • Haplotype variant caller with reassembly
  • Uses Hidden Markov Model and Smith-Waterman Alignment
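
For readers unfamiliar with Smith-Waterman alignment, the textbook form of the algorithm is sketched below. It uses a linear gap penalty and arbitrary scoring values chosen for illustration; DRAGEN's own implementation and scoring parameters are not described on this page, so nothing here should be read as its actual behavior.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    previous = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best
```

Only two rows of the matrix are kept in memory because the sketch returns the score alone; recovering the alignment itself would require storing the full matrix for traceback.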
8

Joint Genotyper

  • Up to 10 input gVCF files
  • Jointly called VCF output

Speeds: Joint Genotyping Pipeline*

Joint Genotyping

Accuracy: Joint Genotyping Pipeline*

Accuracy - Joint

Ultra-Rapid Analysis: Number of Platinum Genome Trios Genotyped in 48 Hours*

Ultra Rapid Analysis Joint Genotyping

*All DRAGEN results are compared against BWA-MEM 0.7.12 + GATK 3.1 running on comparable servers.

ROC Plots of Variants at 50x Coverage

ROC of SNPs Joint Genotyping

ROC of SNPs

A SNP (single nucleotide polymorphism) occurs when a single base differs between two genomes, in this case the subject and the reference genome. Use of the NIST Platinum Genome high-confidence call set enables performance comparisons between different pipelines. In this ROC plot, a higher count of true positive SNPs and a lower count of false positive SNPs are better.

ROC of INDELs Joint Genotyping

ROC of INDELs

An INDEL (insertion or deletion) occurs when bases are inserted or deleted in the subject genome with respect to a reference genome. Use of the NIST Platinum Genome high-confidence call set enables performance comparisons between different pipelines. In this ROC plot, a higher count of true positive INDELs and a lower count of false positive INDELs are better.
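
The points on a count-based ROC plot like those described above can be derived by sweeping a quality threshold over the variant calls. The sketch below assumes each call has already been labeled as present or absent in the high-confidence call set; the `roc_points` function and the input layout are hypothetical and do not reflect any specific comparison tool.

```python
from typing import List, Tuple


def roc_points(calls: List[Tuple[float, bool]]) -> List[Tuple[int, int]]:
    """Return (false positives, true positives) counts per quality threshold.

    Each call is (quality score, True if the call is in the truth set).
    """
    ordered = sorted(calls, key=lambda call: call[0], reverse=True)
    points: List[Tuple[int, int]] = []
    true_positives = false_positives = 0
    for index, (score, in_truth_set) in enumerate(ordered):
        if in_truth_set:
            true_positives += 1
        else:
            false_positives += 1
        # Emit one point per distinct score, after all calls sharing it.
        is_last = index + 1 == len(ordered)
        if is_last or ordered[index + 1][0] != score:
            points.append((false_positives, true_positives))
    return points
```

A curve that rises to a high true positive count while staying close to the zero false positive axis corresponds to the "better" behavior described in the text.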