NovaSeq Data

File description

Samples to rename

renameSampleList A text file listing the .fastq files that will be renamed, with the new name in column 2. This name change will propagate through all subsequent processing.
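As a rough illustration, the rename list could be read and applied as below. This is a minimal sketch, not the pipeline's own code. It assumes whitespace-separated columns with the current name in column 1 (only column 2 is described above), and `parse_rename_list` and `apply_renames` are hypothetical names.

```python
def parse_rename_list(lines: list[str]) -> dict[str, str]:
    """Map each current name (column 1) to its new name (column 2).

    Hypothetical helper: assumes whitespace-separated columns and skips
    blank lines.
    """
    mapping: dict[str, str] = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) < 2:
            raise ValueError(f"expected at least 2 columns: {line!r}")
        mapping[parts[0]] = parts[1]
    return mapping


def apply_renames(names: list[str], mapping: dict[str, str]) -> list[str]:
    """Replace every listed name; names not in the list pass through unchanged."""
    return [mapping.get(name, name) for name in names]
```

Passing the lines of the delivered file (for example from `readlines()`) to `parse_rename_list` gives a lookup that later steps can apply consistently, which is how a single rename can carry through the rest of the processing.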

Samples

sampleList Lists the .fastq files received for a particular run

Fastq

L001_R1_001.fastq.gz is the read data from lane 1 read 1 of the NovaSeq S2 flowcell.
L001_R2_001.fastq.gz is the read data from lane 1 read 2 of the NovaSeq S2 flowcell.
L002_R1_001.fastq.gz is the read data from lane 2 read 1 of the NovaSeq S2 flowcell.
L002_R2_001.fastq.gz is the read data from lane 2 read 2 of the NovaSeq S2 flowcell.
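The lane and read number can be recovered from the file name suffix. The sketch below assumes each file name ends in the `L00n_Rn_001.fastq.gz` pattern listed above, optionally preceded by a sample prefix and an underscore. The prefix handling, the function names and the `SampleX` name used in the example are hypothetical.

```python
import re
from collections import defaultdict

# Lane (L001, L002) and read (R1, R2) suffix, optionally preceded by a
# sample prefix and an underscore, e.g. "SampleX_L002_R1_001.fastq.gz".
FASTQ_SUFFIX = re.compile(r"(?:^|_)L(\d{3})_R([12])_001\.fastq\.gz$")


def parse_fastq_name(filename: str) -> tuple[str, int, int]:
    """Return (prefix, lane, read) for a lane/read-suffixed FASTQ name."""
    match = FASTQ_SUFFIX.search(filename)
    if match is None:
        raise ValueError(f"unrecognised FASTQ name: {filename}")
    prefix = filename[: match.start()]  # everything before "_L00n"
    return prefix, int(match.group(1)), int(match.group(2))


def pair_reads(filenames: list[str]) -> dict[tuple[str, int], dict[int, str]]:
    """Group files so each (prefix, lane) maps to {1: R1 file, 2: R2 file}."""
    pairs: dict[tuple[str, int], dict[int, str]] = defaultdict(dict)
    for name in filenames:
        prefix, lane, read = parse_fastq_name(name)
        pairs[(prefix, lane)][read] = name
    return dict(pairs)
```

Grouping by (prefix, lane) keeps each read 1 file next to its read 2 mate from the same lane.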

BQSR (Base quality score recalibration)

.recal.bam is a binary file of the mapped sequencing reads in which the base quality scores have been recalibrated to correct for systematic technical bias
.recal.bai is a binary index file for file 1

MultiQC

.html is a report that collects data from FastQC, Deduplication, BQSR, and Metrics

Call

.haplo.bam is a BAM file representing the regions reassembled by GATK's HaplotypeCaller algorithm before it makes its final calls
.haplo.bai is a binary index file for file 1

Sample to delete

deleteSampleList The Call output file (.g.vcf) will be deleted for these samples, so they won't be included in joint calling or any later step
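The effect on joint calling amounts to a filter over the .g.vcf files. A minimal sketch, assuming each Call output is named `<sample>.g.vcf` and deleteSampleList holds one sample name per line; both conventions and the function name are assumptions, not taken from the pipeline.

```python
from pathlib import PurePath


def samples_for_joint_call(gvcf_paths: list[str], delete_lines: list[str]) -> list[str]:
    """Drop any .g.vcf whose sample name appears in the delete list.

    Hypothetical helper: assumes files are named "<sample>.g.vcf" and the
    delete list holds one sample name per line (blank lines ignored).
    """
    to_delete = {line.strip() for line in delete_lines if line.strip()}
    kept = []
    for path in gvcf_paths:
        # "run1/A.g.vcf" -> "A.g.vcf" -> "A"
        sample = PurePath(path).name.removesuffix(".g.vcf")
        if sample not in to_delete:
            kept.append(path)
    return kept
```

Because the file is removed before joint calling, the sample drops out of every downstream output rather than being filtered at each step.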

Cohort

allSampleList Lists the .fastq files received for all runs. Note: this used to be called cohortSampleList.
genotypeList Lists the .g.vcf files for all runs, i.e. all of the samples that are included in joint calling and beyond. Note: this used to be called callSampleList.

VQSR (Variant quality score recalibration)

-INDEL-ApplyVQSR.vcf is a VCF file in which the variants' quality scores have been recalibrated.

.vqsr.vcf is the same as VQSR file 1 but filtered to retain only the variants that passed quality checks, i.e. only the variants with PASS in the FILTER column
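The PASS filter itself is simple to express. The sketch below works on plain-text VCF lines and keeps the header lines plus any record whose FILTER field (the 7th tab-separated column in the VCF specification) is exactly `PASS`. It is illustrative only; the tool the pipeline actually uses for this step is not stated here.

```python
def pass_only(vcf_lines: list[str]) -> list[str]:
    """Keep VCF header lines and records whose FILTER column is PASS."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # keep meta-information and column-header lines
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER is the 7th fixed VCF column (index 6).
        if fields[6] == "PASS":
            kept.append(line)
    return kept
```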

Sample to swap

swapSampleList These sample names will be swapped in the Pedigree files because it is suspected that the physical samples were inadvertently switched
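One way to picture the swap: every occurrence of one name in the pedigree's ID columns is replaced by the other, so family relationships follow the corrected labels. This sketch assumes each swapSampleList line holds two whitespace-separated names to exchange and that the .ped file is tab-separated; both are assumptions, and the helper names are hypothetical.

```python
def read_swap_pairs(swap_lines: list[str]) -> dict[str, str]:
    """Build a symmetric mapping (A -> B and B -> A) from lines of "A B"."""
    mapping: dict[str, str] = {}
    for line in swap_lines:
        parts = line.split()
        if len(parts) >= 2:
            a, b = parts[0], parts[1]
            mapping[a] = b
            mapping[b] = a
    return mapping


def swap_in_ped(ped_lines: list[str], mapping: dict[str, str]) -> list[str]:
    """Apply the swap to the individual, father and mother ID columns.

    Returned lines have no trailing newline.
    """
    out = []
    for line in ped_lines:
        fields = line.rstrip("\n").split("\t")
        for i in (1, 2, 3):  # PED columns 2-4 hold sample IDs
            fields[i] = mapping.get(fields[i], fields[i])
        out.append("\t".join(fields))
    return out
```

Renaming the parent columns as well as the individual column keeps each relationship attached to the person rather than to the mislabelled data.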

Pedigree

.ped contains all of the samples in the cohort (as of joint calling). This file is used by Peddy.
-conforming.ped is the same as file 1 but filtered to retain only samples that are part of a trio (i.e. all three family members are present). This file is used by Denovo.
-disconforming.ped is the same as file 1 but filtered to retain only samples that are not part of a trio (i.e. not all three family members are present)
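The trio split can be sketched from the standard PED columns (family ID, individual ID, father ID, mother ID, sex, phenotype, with `0` for an unknown parent). Here a sample counts as part of a trio if it is a child whose father and mother both appear in the file, or is one of those parents. Matching on individual ID alone, without checking family ID, is a simplifying assumption, and `split_trios` is a hypothetical name.

```python
def split_trios(ped_rows: list[list[str]]) -> tuple[list[list[str]], list[list[str]]]:
    """Split PED rows into trio members and everyone else.

    Columns (standard PED): family, individual, father, mother, sex, phenotype.
    A missing parent is coded "0", assumed never to be a sample ID.
    """
    present = {row[1] for row in ped_rows}
    in_trio: set[str] = set()
    for row in ped_rows:
        child, father, mother = row[1], row[2], row[3]
        if father in present and mother in present:
            in_trio.update((child, father, mother))
    conforming = [row for row in ped_rows if row[1] in in_trio]
    non_trio = [row for row in ped_rows if row[1] not in in_trio]
    return conforming, non_trio
```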

Peddy

.html is a report used to visualise sample-to-sample relationships and sample sex, as inferred from the genetic data

Denovo

.denovo.vcf is the same as VQSR file 2, but with de novo annotations added to the INFO column
-Conf.denovo.vcf is the same as file 1, but filtered to retain both high-confidence and low-confidence de novo variants
-loConf.denovo.vcf is the same as file 2, but filtered to retain only low-confidence de novo variants
-hiConf.denovo.vcf is the same as file 2, but filtered to retain only high-confidence de novo variants
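A sketch of how the three subsets relate, assuming the de novo annotations are INFO keys named `hiConfDeNovo` and `loConfDeNovo`. These are the keys written by GATK's PossibleDeNovo annotation, which the file names suggest, but the source does not confirm that this is the annotation used. A record carrying either key goes into the Conf subset; `split_denovo` is a hypothetical name.

```python
def info_keys(info: str) -> set[str]:
    """Return the key names in a VCF INFO field, e.g. "AC=1;DB" -> {"AC", "DB"}."""
    return {item.split("=", 1)[0] for item in info.split(";") if item}


def split_denovo(vcf_lines: list[str]) -> dict[str, list[str]]:
    """Sort records into the Conf / hiConf / loConf subsets."""
    out: dict[str, list[str]] = {"Conf": [], "hiConf": [], "loConf": []}
    for line in vcf_lines:
        if line.startswith("#"):
            for subset in out.values():
                subset.append(line)  # every output keeps the full header
            continue
        keys = info_keys(line.rstrip("\n").split("\t")[7])  # INFO is column 8
        hi = "hiConfDeNovo" in keys  # assumed key name (GATK PossibleDeNovo)
        lo = "loConfDeNovo" in keys  # assumed key name (GATK PossibleDeNovo)
        if hi:
            out["hiConf"].append(line)
        if lo:
            out["loConf"].append(line)
        if hi or lo:
            out["Conf"].append(line)
    return out
```

The two flags are checked independently, so a site carrying both keys (for different trios) lands in both the hiConf and loConf subsets.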

VEP (Variant Effect Predictor)

A VCF version of this text file may be available upon request

_annotated_variants.txt A tab-delimited annotation of the variants in VQSR file 2
_annotated_variants.txt_summary.html A visual summary of file 1
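The tab-delimited file can be loaded with Python's standard `csv` module. This sketch assumes the layout of VEP's default tab-delimited output, where `##` metadata lines are followed by a single `#`-prefixed column header line (e.g. `#Uploaded_variation`). Check the header of your own file, as the columns depend on the options VEP was run with; `read_vep_table` is a hypothetical name.

```python
import csv


def read_vep_table(lines: list[str]) -> list[dict[str, str]]:
    """Return one dict per data row, keyed by the column header names."""
    # Drop the '##' metadata lines; the first remaining line is the header.
    body = [line for line in lines if not line.startswith("##")]
    if not body:
        return []
    body[0] = body[0].lstrip("#")  # header starts with a single '#'
    reader = csv.DictReader(body, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)
```

Keying rows by header name rather than position keeps the code working if extra annotation columns are present.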

Downloading your files

All data files can be downloaded using one of the following methods. Small, browser-compatible files can also be viewed via the clickable links provided on your webpage.

To your local machine

You can download your data from our RWD server via the command line or file transfer software (for example FileZilla). This requires us to create login credentials for you (or for another person working on your behalf). Once you're logged in, use the paths provided on your webpage to copy the data to your local machine.

To your project on Rocket

If you have a project on Rocket, we can copy the data directly to that project.