I am using a new RAD-seq protocol that allows for the multiplexing of hundreds of samples (depending on genome size) and alleviates concern for PCR duplicates. The general protocol is called "Rapture" and can be viewed at: http://biorxiv.org/content/early/2015/10/11/028837, which also includes a subsequent "capture" section to target sequencing reads at a location of interest. The protocol allows for genome-wide single-nucleotide polylmorphisms (SNPs) to be detected with limited amounts of DNA (50 - 75 ng). We pooled 144 samples into a single lane of 2x 150bp sequencing on the Illumina NextSeq machine - which outputs 400 million reads per lane. We expect about 25x coverage on average for our genome which is about 2.5 Gbp in size.
General Order of Analysis:
- Flip read pairs depending on which side has the cut site (required for the new RAD protocol - you will need to write a short script to do it)
- De-multiplex with process_radtags in STACKS (or use other similar software).
- Either align against reference(s) and/or identify loci de novo in STACKS (or use other similar software).
- Call genotypes with STACKS (or similar software).
- Analysis of SNPs with population genomics software.
In this analysis I will be utilizing the STACKS workflow to organize Illumina sequencing output, create an independent genomic assembly, and call SNPs. See the Stacks category of this blog for further details.
In a unix command line it would be:
flip_trim_sbfI_150821.pl barcodes.txt reads_R1.fastq reads_R2.fastq output1.fastq output2.fastq
barcodes.txt is just a text file with all the expected barcodes, one per line
reads_R1.fastq is the raw file of forward reads (fastq format, not gzipped)
reads_R2.fastq is the raw file of reverse reads
output1.fastq will be the output file of forward reads
output2.fastq will be the output file of reverse reads
From this point it should be ready to go into process_radtags in STACKS. Check to make sure it's working before moving on though (ie. the output files should have 80-90% or more of the number of lines of the input files).
For running on a server that utilizes "slurm" (Simple Linux Utility for Resource Management) you must schedule your job to run with the "sbatch" command. You will need to have a script (save it as flipRADreads.sh) that you will launch by typing
The script will look something like:
#SBATCH -N 4 # nodes=4
#SBATCH --ntasks-per-node=8 # ppn=20
#SBATCH -J flip_HT03 # job name
#SBATCH -t 48:00:00 # 14:00 is 14 min walltime
#this loads Environment Modules
module load openmpi
# cd /directory/where/datafiles/are/stored
#barcodes.txt is just a text file with all the expected barcodes, one per line
#reads_R1.fastq is the raw file of forward reads (fastq format, not gzipped)
#reads_R2.fastq is the raw file of reverse reads
#output1.fastq will be the output file of forward reads
#output2.fastq will be the output file of reverse reads