Sequence data were generated with Illumina RTA 188.8.131.52 and CASAVA-1.8.2 and aligned to the human reference GRCh37.p5 using GSNAP; nucleotide (nt) variants were detected and genotyped with the Genome Analysis Toolkit (GATK, versions 1.6 and 3.2). Sequence analysis used FASTQ, BAM, and VCF files. Variants were annotated with the Rapid Understanding of Nucleotide variant Effect Software (RUNES, v3.3.5).
The DRAGEN pipeline operates on a single-server hybrid hardware/software platform with dual Intel Xeon central processing units (CPUs) and a custom Peripheral Component Interconnect Express (PCIe) board carrying a field-programmable gate array (FPGA) and 32 GB of dynamic random-access memory (DRAM), attached directly via four double data rate type three synchronous dynamic random-access memory (DDR3 SDRAM) channels. Critical compute-intensive functions of the pipeline are performed by custom, massively parallel FPGA logic for maximum speed, while other functions run in optimized multi-threaded software on the Xeon cores for maximum flexibility. A parallel solid-state drive (SSD) file system, configured as a redundant array of independent disks (RAID 0), provides the I/O bandwidth necessary to feed the processing pipeline, and FPGA compress/decompress engines maintain throughput to and from compressed file formats.
DRAGEN uses a hash table index of a reference genome to map many overlapping seeds from each read to exact matches in the reference. After mapping, reads are sorted by reference position, and PCR or optical duplicates are optionally flagged. An initial sorting phase operates on aligned reads as they return from the FPGA. Final sorting and duplicate marking commence when mapping completes; these operations overlap with variant calling when the latter is requested, and add almost no time to the FASTQ-to-VCF pipeline.
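The seed-lookup idea described above can be illustrated with a minimal software sketch. This is not the DRAGEN implementation, which runs in FPGA logic and whose hash function, seed length, and extension steps are not given here; the function names `build_seed_index` and `candidate_positions`, the seed length of 5, and the toy reference string are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict

def build_seed_index(reference, seed_len):
    """Hash table: each seed_len-mer of the reference -> list of start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[reference[pos:pos + seed_len]].append(pos)
    return index

def candidate_positions(read, index, seed_len):
    """Look up every overlapping seed of the read and vote for the read start it implies."""
    votes = Counter()
    for offset in range(len(read) - seed_len + 1):
        seed = read[offset:offset + seed_len]
        for ref_pos in index.get(seed, ()):
            # An exact seed match at ref_pos implies the read starts at ref_pos - offset.
            votes[ref_pos - offset] += 1
    return votes

# Toy data (hypothetical): the read is an exact copy of reference[9:19].
reference = "ACGTTGCATGCCGATAGGCTTAACGGATCCA"
index = build_seed_index(reference, seed_len=5)
read = "GCCGATAGGC"

votes = candidate_positions(read, index, seed_len=5)
best_start, support = votes.most_common(1)[0]
print(best_start, support)  # 9 6
```

In this toy case all six overlapping seeds agree on start position 9. A read with a mismatch would lose only the seeds that span the mismatch, which is why using many overlapping seeds tolerates sequencing errors and variants.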
The DRAGEN variant caller runs mostly in highly optimized software, for maximum flexibility of the algorithms. Only stable, compute-intensive operations are accelerated by FPGA engines. DRAGEN implements multi-threaded parallelism in a single pass over the whole reference genome, without launching multiple caller processes on various subsets of the reference. A single call to the DRAGEN executable runs the entire pipeline from FASTQ to VCF, for the whole genome. Mapping/alignment is done in one pass over the reads, and all steps of variant calling (in addition to read sorting and duplicate marking) run simultaneously in a software/hardware pipeline emitting VCF results.
Causative variants were identified primarily with Variant Integration and Knowledge INterpretation in Genomes (VIKING) software. By allowing dynamic filtering of variants based on variables such as individual clinical features, diseases, genes, assigned ACMG-type pathogenicity category, allele frequency, genotype, and inheritance pattern, VIKING assists in identification of a differential diagnosis.
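As a rough illustration of this kind of dynamic filtering, the sketch below applies a few of the listed criteria to a list of variant records. It is not VIKING code: the `Variant` fields, the `filter_variants` function, the default thresholds, the gene names, and the numeric category encoding (lower values assumed to be more likely pathogenic) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

@dataclass(frozen=True)
class Variant:
    gene: str
    genotype: str            # "het" or "hom" (hypothetical labels)
    allele_frequency: float  # population allele frequency, 0.0 to 1.0
    category: int            # hypothetical encoding: lower = more likely pathogenic

def filter_variants(
    variants: Iterable[Variant],
    genes: Optional[Set[str]] = None,
    max_allele_frequency: float = 0.01,
    max_category: int = 2,
    genotypes: Optional[Set[str]] = None,
) -> List[Variant]:
    """Keep variants that satisfy every active criterion; None disables a criterion."""
    kept = []
    for v in variants:
        if genes is not None and v.gene not in genes:
            continue  # gene is not on the candidate list for the patient's features
        if v.allele_frequency > max_allele_frequency:
            continue  # too common to plausibly cause a rare disease
        if v.category > max_category:
            continue  # pathogenicity category too benign
        if genotypes is not None and v.genotype not in genotypes:
            continue  # genotype does not fit the inheritance pattern under test
        kept.append(v)
    return kept

# Made-up records for illustration only.
variants = [
    Variant("GENE_A", "hom", 0.0001, 1),
    Variant("GENE_B", "het", 0.2000, 1),  # fails the allele-frequency filter
    Variant("GENE_C", "het", 0.0005, 4),  # fails the category filter
    Variant("GENE_D", "het", 0.0002, 2),
]

shortlist = filter_variants(variants, genes={"GENE_A", "GENE_B", "GENE_D"})
homozygous_only = filter_variants(variants, genotypes={"hom"})
print([v.gene for v in shortlist])        # ['GENE_A', 'GENE_D']
print([v.gene for v in homozygous_only])  # ['GENE_A']
```

Because each criterion is an independent, optional argument, the same variant list can be re-filtered repeatedly as the clinical hypothesis changes, for example by tightening the allele-frequency cutoff or restricting to homozygous calls for a suspected recessive disorder.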