Below you will find 5 shell scripts that each run 2 different K (cluster) values 5 times each and outputs each run to a separate output file so that multiple runs can be easily compressed and loaded into the online Structure Harvester program.
In the next few weeks I will document the pipeline (and flag weird glitchy things) for SNP calling in the software package Stacks with the denovo method (no reference genome).
Step 1: Flip Reads
In a unix command line it would be:
flip_trim_sbfI_150821.pl barcodes.txt reads_R1.fastq reads_R2.fastq output1.fastq output2.fastq
barcodes.txt is just a text file with all the expected barcodes, one per line
reads_R1.fastq is the raw file of forward reads (fastq format, not gzipped)
reads_R2.fastq is the raw file of reverse reads
output1.fastq will be the output file of forward reads
output2.fastq will be the output file of reverse reads
From this point it should be ready to go into process_radtags in STACKS. Check to make sure it's working before moving on though (ie. the output files should have 80-90% or more of the number of lines of the input files).
For running on a server that utilizes "slurm" (Simple Linux Utility for Resource Management) you must schedule your job to run with the "sbatch" command. You will need to have a script (save it as flipRADreads.sh) that you will launch by typing
The script will look something like:
#SBATCH -N 4 # nodes=4
#SBATCH --ntasks-per-node=8 # ppn=20
#SBATCH -J flip_HT03 # job name
#SBATCH -t 48:00:00 # 14:00 is 14 min walltime
#this loads Environment Modules
module load openmpi
# cd /directory/where/datafiles/are/stored
#barcodes.txt is just a text file with all the expected barcodes, one per line
#reads_R1.fastq is the raw file of forward reads (fastq format, not gzipped)
#reads_R2.fastq is the raw file of reverse reads
#output1.fastq will be the output file of forward reads
#output2.fastq will be the output file of reverse reads
Step 2: Check to ensure >80% of reads are retained
To make sure that flipping of the reads was executed correctly you should check to make sure that at least 70% of the reads were retained. To do this you can simply count the number of lines in the original raw sequence files and compare them to the output flipped files. Each sequence read takes up 4 lines in the .fastq file so you will need to divide the number of lines by 4 to get the number of sequence reads for the before and after files. Then simply divide the number of lines in the flipped file by the number of reads in the original raw file and multiply by 100 to get the % of reads retained.
In general you can use the #wc [options] filenames to do many of these types of simple tasks.
wc -l : Prints the number of lines in a file.
wc -w : prints the number of words in a file.
wc -c : Displays the count of bytes in a file.
wc -m : prints the count of characters from a file.
wc -L : prints only the length of the longest line in a file.
# To get the number of lines in a file then simply type:
$wc -l filename.fastq
Step 3: Process Radtags
Prepare a barcode file in the following format with barcode in the first column and sample ID i nthe second:
Create a "sample" directory in Stacks. this will be the output directory for process_radtags.
run the following bash script:
#SBATCH -N 1
#SBATCH -J process_radtagsRobins1
#SBATCH -t 48:00:00
./process_radtags -1 /directory/forwardinputfile.fastq -2 /directory/reverseinputfile.fastq -o directory/samples1/ -b /directory/SampleBarcodes.txt -e sbfI -r -c -q -D
Step 4: Check process_radtags.log
Open the process_radtags.log to assess individual sequence calls and to remove any bad samples from downstream processing.
#how to get STACKS on Cluster
tar zxvf stacks-1.29.tar.gz
module load samtools/0.1.19/x86_64
./configure --prefix=/home/directory/stacks-1.29 #use the path where you want it installed
make #this may give you an error, if so, skip to next step
#to run anything, you need to be in the bin directory
#this will show you all the stacks commands, in order to use any of them, you need a ./
The new RAD-sequencing protocol is now available in Genetics
RAD Capture (Rapture): Flexible and Efficient Sequence-Based Genotyping.
Omar A. Ali, Sean M. O'Rourke, Stephen J. Amish, Mariah H. Meek, Gordon Luikart, Carson Jeffres, Michael R. Miller
GENETICS December 29 , 2015 vol. (Early Online January 1 , 2015); DOI: 10.1534/genetics.115.183665
These posts are intended to document my workflow as I handle and manipulate genomic data within the Mac OSX terminal window. The data processed here are from the Illumina platform.