In a unix command line it would be:
flip_trim_sbfI_150821.pl barcodes.txt reads_R1.fastq reads_R2.fastq output1.fastq output2.fastq
barcodes.txt is just a text file with all the expected barcodes, one per line
reads_R1.fastq is the raw file of forward reads (fastq format, not gzipped)
reads_R2.fastq is the raw file of reverse reads
output1.fastq will be the output file of forward reads
output2.fastq will be the output file of reverse reads
From this point it should be ready to go into process_radtags in STACKS. Check to make sure it's working before moving on though (ie. the output files should have 80-90% or more of the number of lines of the input files).
For running on a server that utilizes "slurm" (Simple Linux Utility for Resource Management) you must schedule your job to run with the "sbatch" command. You will need to have a script (save it as flipRADreads.sh) that you will launch by typing
The script will look something like:
#SBATCH -N 4 # nodes=4
#SBATCH --ntasks-per-node=8 # ppn=20
#SBATCH -J flip_HT03 # job name
#SBATCH -t 48:00:00 # 14:00 is 14 min walltime
#this loads Environment Modules
module load openmpi
# cd /directory/where/datafiles/are/stored
#barcodes.txt is just a text file with all the expected barcodes, one per line
#reads_R1.fastq is the raw file of forward reads (fastq format, not gzipped)
#reads_R2.fastq is the raw file of reverse reads
#output1.fastq will be the output file of forward reads
#output2.fastq will be the output file of reverse reads
To make sure that flipping of the reads was executed correctly you should check to make sure that at least 70% of the reads were retained. To do this you can simply count the number of lines in the original raw sequence files and compare them to the output flipped files. Each sequence read takes up 4 lines in the .fastq file so you will need to divide the number of lines by 4 to get the number of sequence reads for the before and after files. Then simply divide the number of lines in the flipped file by the number of reads in the original raw file and multiply by 100 to get the % of reads retained.
In general you can use the #wc [options] filenames to do many of these types of simple tasks.
wc -l : Prints the number of lines in a file.
wc -w : prints the number of words in a file.
wc -c : Displays the count of bytes in a file.
wc -m : prints the count of characters from a file.
wc -L : prints only the length of the longest line in a file.
# To get the number of lines in a file then simply type:
$wc -l filename.fastq
Prepare a barcode file in the following format with barcode in the first column and sample ID i nthe second:
Create a "sample" directory in Stacks. this will be the output directory for process_radtags.
run the following bash script:
#SBATCH -N 1
#SBATCH -J process_radtagsRobins1
#SBATCH -t 48:00:00
./process_radtags -1 /directory/forwardinputfile.fastq -2 /directory/reverseinputfile.fastq -o directory/samples1/ -b /directory/SampleBarcodes.txt -e sbfI -r -c -q -D
Open the process_radtags.log to assess individual sequence calls and to remove any bad samples from downstream processing.