Whole Genome Sequencing Workflow

PMI Workflow – Whole Genome Sequencing

Whole genome sequencing is DNA-based: DNA is extracted from prepared samples.

Step 1: Experiment Generic Information

General experiment information, such as the PI and experimenter. Add extra information to the notes/comments section as required.

    General Experiment Details
     Entered Date
     Experiment Name
     Experimenter / Analyst
     Associated PIs
     MIGS standard/mandatory fields (GOLD)

Step 2: DNA Extraction

Extract DNA from the collected samples. QC and quantification are also carried out in this step.

DNA Extraction Method / Kit 
DNA Extraction Date
DNA Extraction Technician / Experimenter
Notebook Reference
Extraction results file (Allow File Upload)
Gel Image file (Allow File Upload)
260/280 ratio - Mandatory
Amount of DNA - Mandatory
    Concentration of DNA – Mandatory
    How Concentration Calculated – (allowed values - "nano drop", "pico green", "Bio Analyzer")
    Upload one or more files (number and type will vary depending on the methods used)
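The mandatory fields and allowed values in this step could be checked at data entry. The following is a minimal sketch only: the class name, field names, and `validation_errors` method are hypothetical, and the workflow does not specify units, so none are assumed.

```python
from dataclasses import dataclass
from typing import List, Optional

# Allowed values copied as written in the "How Concentration Calculated" field.
CONCENTRATION_METHODS = {"nano drop", "pico green", "Bio Analyzer"}


@dataclass
class DnaExtraction:
    # Hypothetical field names; the workflow defines labels, not identifiers.
    ratio_260_280: Optional[float] = None
    dna_amount: Optional[float] = None
    dna_concentration: Optional[float] = None
    concentration_method: Optional[str] = None

    def validation_errors(self) -> List[str]:
        """Return a list of problems; an empty list means the record passes."""
        errors = []
        # The three fields marked "Mandatory" in Step 2.
        for name in ("ratio_260_280", "dna_amount", "dna_concentration"):
            if getattr(self, name) is None:
                errors.append(f"{name} is mandatory")
        # The method is only checked against the allowed values when supplied.
        if (self.concentration_method is not None
                and self.concentration_method not in CONCENTRATION_METHODS):
            errors.append(
                "concentration_method must be one of "
                f"{sorted(CONCENTRATION_METHODS)}"
            )
        return errors
```

A record with all three mandatory values and an allowed method returns an empty list; a blank record reports each missing mandatory field.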

Step 3: Library construction, QC, titration

    Technician / Experimenter
    Notebook Reference
    Layout - (allowed values - "paired-end", "mate pair", "single read")
    Method - (allowed values - "paired-end", "3KB", "8KB", "500BP", "shotgun", "other")
    Source - (default value = Genomic)
    Selection - (default value = Random)
    Strategy - (default value = WGS)
    Insert Size
    Standard Deviation
    Planned read length 
    Library ID 
    Sample ID  
    Bar codes / regions
    QC Files Upload (File Upload - csv - Mandatory - Nonparsable)
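The defaults and allowed values for the library record can be expressed directly in a data structure. This is a sketch under assumptions: the class and attribute names are hypothetical, and only the fields with defaults or allowed values are modelled.

```python
from dataclasses import dataclass

# Allowed values from the Layout and Method fields. "paired-end" is written
# with a plain hyphen here, which is an assumption about the stored spelling.
LAYOUTS = {"paired-end", "mate pair", "single read"}
METHODS = {"paired-end", "3KB", "8KB", "500BP", "shotgun", "other"}


@dataclass
class LibraryConstruction:
    layout: str
    method: str
    # Defaults as stated in Step 3.
    source: str = "Genomic"
    selection: str = "Random"
    strategy: str = "WGS"

    def __post_init__(self) -> None:
        # Reject values outside the allowed lists as soon as the record is built.
        if self.layout not in LAYOUTS:
            raise ValueError(f"layout must be one of {sorted(LAYOUTS)}")
        if self.method not in METHODS:
            raise ValueError(f"method must be one of {sorted(METHODS)}")
```

Creating `LibraryConstruction(layout="paired-end", method="shotgun")` fills in Genomic, Random, and WGS automatically, while an unlisted layout raises a `ValueError`.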

Step 4: Sequence the library

Instrument Type (Controlled Vocabulary - 454, Sanger, Miseq, Hiseq, PacBio, other (add entry when selected))
Date Sequenced
Run Identifier
Sequencer Location (Controlled Vocabulary - ORNL, Duke)
Reagent Type / Chemistry
Vendor Software Version 
Link/location to the raw data
Sequencing Output Files (File Upload - SFF/FastQ - Mandatory - Nonparsable - Project Open)
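The "other (add entry when selected)" behaviour for Instrument Type could work roughly as follows. This is one possible reading of that note, not a confirmed design: the function name and parameters are hypothetical, and the assumption is that a new entry typed under "other" is appended to the vocabulary for later reuse.

```python
from typing import List, Optional

# Vocabulary entries copied as written in the Instrument Type field.
INSTRUMENT_TYPES = ["454", "Sanger", "Miseq", "Hiseq", "PacBio"]


def resolve_instrument(choice: str,
                       vocabulary: List[str],
                       new_entry: Optional[str] = None) -> str:
    """Return the instrument type to store, extending the vocabulary on 'other'."""
    if choice == "other":
        entry = (new_entry or "").strip()
        if not entry:
            raise ValueError("selecting 'other' requires a new entry")
        # Assumed behaviour: remember the new entry so it can be picked next time.
        if entry not in vocabulary:
            vocabulary.append(entry)
        return entry
    if choice not in vocabulary:
        raise ValueError(f"instrument type must be one of {vocabulary} or 'other'")
    return choice
```

The same pattern would apply to Sequencer Location if that list also needs to grow, although the workflow only mentions the "other" option for Instrument Type.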

Step 5: Sequence QC, data trimming, and assembly. Allow for multiple assemblies.

Software Used
Software Name
Output Files (File Upload – QC output, fasta - Mandatory – Nonparsable)
NCBI Sequence Read Archive (SRA) ID
BioProject Accession

Step 6: Annotate the assembled sequence. One or more annotations per assembly.

Date Analysis Run
Annotation Pipeline
Pipeline Name
Pipeline Version
Web Link
Analyst Name
QC Review Process – text paragraph
Output Files (File Upload - Fasta file of predicted proteins, GenBank file)
NCBI GenBank submission ID
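Steps 5 and 6 imply a nested relationship: one sequencing run can have multiple assemblies, and each assembly can have one or more annotations. The sketch below shows one way to model that. All class, attribute, and method names are hypothetical, and only a few of the listed fields are included.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    # A subset of the Step 6 fields, with hypothetical identifiers.
    pipeline_name: str
    pipeline_version: str
    analyst_name: str


@dataclass
class Assembly:
    software_name: str
    # Step 6: one or more annotations per assembly.
    annotations: List[Annotation] = field(default_factory=list)

    def is_annotated(self) -> bool:
        return len(self.annotations) >= 1


@dataclass
class SequencingRun:
    run_identifier: str
    # Step 5: allow for multiple assemblies of the same run.
    assemblies: List[Assembly] = field(default_factory=list)

    def unannotated(self) -> List[Assembly]:
        """Assemblies that still need at least one annotation."""
        return [a for a in self.assemblies if not a.is_annotated()]
```

A helper such as `unannotated()` makes it easy to list assemblies that have not yet reached Step 6.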