Reduced representation bisulfite sequencing (RRBS) is a next-generation sequencing approach for DNA methylation analysis. It uses a restriction enzyme digestion of genomic DNA to select a reduced, CpG-rich fraction of the genome. Bisulfite treatment then deaminates unmethylated cytosine residues to uracil. Methylated cytosines remain intact and are not converted.
Figure: Bisulfite conversion of cytosine to uracil.
Polymerase chain reaction (PCR) of the converted DNA then amplifies the uracils as thymines, which makes it possible to differentiate between sites that are unmethylated (read as T) and methylated (read as C). After sequencing, the data are mapped to a reference genome for comparative analysis of methylation patterns.
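As a rough illustration of the C-versus-T logic, here is a minimal shell sketch. The reference fragment and the read are made-up sequences, and the sketch assumes the read is already aligned to the same strand of the reference with no gaps or mismatches. Real RRBS pipelines rely on dedicated bisulfite-aware aligners and methylation callers, so this is only a toy.

```shell
# Hypothetical reference fragment and aligned read (not real data).
reference="ACGTCCGATCGA"
read_seq="ATGTCTGATCGA"

# What a fully unmethylated copy would look like after conversion and PCR:
# every C is read as T.
converted=$(printf '%s' "$reference" | tr 'C' 'T')
echo "fully unmethylated: $converted"

# Walk along the reference; at each reference C, a C in the read suggests
# methylation and a T suggests the cytosine was unmethylated and converted.
meth=0
unmeth=0
pos=1
while [ "$pos" -le "${#reference}" ]; do
  ref_base=$(printf '%s' "$reference" | cut -c "$pos")
  read_base=$(printf '%s' "$read_seq" | cut -c "$pos")
  if [ "$ref_base" = "C" ]; then
    case "$read_base" in
      C) meth=$((meth + 1)) ;;
      T) unmeth=$((unmeth + 1)) ;;
    esac
  fi
  pos=$((pos + 1))
done
echo "methylated calls: $meth, unmethylated calls: $unmeth"
```

Any base other than C or T at a reference C position is ignored here; a real caller would treat it as a sequencing error or variant.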
It is often useful to display a file in the terminal window with the "cat" command, or to search large data files for a specific string of characters with the "grep" command.
1. To display a file in the terminal window:
$ cat filename.fastq
Note that a very large file will scroll past quickly on the screen and may take a while to reach the end.
2. To search a data file for a specific string:
$ grep "literal_string" filename.fastq
For example, to search the sequence file mydata.fastq for the barcode "ACAAGCTA":
$ grep "ACAAGCTA" mydata.fastq
3. Use the "*" wildcard to search for a string in multiple files of the same type:
$ grep "GCCAAGAC" *.fastq
4. Or to search recursively in all files in the current directory and its subdirectories, add the "-r" option (a bare "*" only covers files in the current directory):
$ grep -r "GCCAAGAC" .
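To see the recursive search in action without touching real data, the sketch below builds a throwaway directory tree and searches it. The directory layout, file names, and reads are invented for the demonstration. The "-l" option prints only the names of matching files, and "--include" restricts the recursive search to files matching a pattern. Both are available in the GNU and BSD versions of grep as far as I know, but check "man grep" on your system.

```shell
# Build a small, made-up directory tree in a temporary location.
work=$(mktemp -d)
mkdir -p "$work/run1/lane2"
printf '@r1\nGCCAAGACTTAA\n+\nIIIIIIIIIIII\n' > "$work/run1/a.fastq"
printf '@r2\nGCCAAGACGGCC\n+\nIIIIIIIIIIII\n' > "$work/run1/lane2/b.fastq"
printf 'GCCAAGAC appears in these notes\n' > "$work/run1/notes.txt"

# -r descends into subdirectories; -l lists each matching file once.
grep -rl "GCCAAGAC" "$work"
all_files=$(grep -rl "GCCAAGAC" "$work" | wc -l | tr -d ' ')

# Restrict the recursive search to FASTQ files only.
fastq_files=$(grep -rl --include="*.fastq" "GCCAAGAC" "$work" | wc -l | tr -d ' ')

echo "files with a match: $all_files, of which FASTQ: $fastq_files"

# Clean up the temporary tree.
rm -rf "$work"
```

Restricting by file type is handy when a project directory mixes sequence files with notes, logs, and scripts that may mention the same barcode.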
For more uses of the "grep" command, see its manual page by typing "man grep" in the terminal.
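Two variations I find handy with large files are previewing just the first record with "head" instead of printing everything with "cat", and counting matches with "grep -c" instead of printing them. The sketch below uses a tiny made-up FASTQ file written to a temporary location. It assumes the usual four-lines-per-record FASTQ layout.

```shell
# Build a tiny three-record FASTQ file for demonstration (made-up reads).
demo=$(mktemp)
printf '@read1\nACAAGCTATTGG\n+\nIIIIIIIIIIII\n' > "$demo"
printf '@read2\nGCCAAGACTTAA\n+\nIIIIIIIIIIII\n' >> "$demo"
printf '@read3\nTTACAAGCTAGG\n+\nIIIIIIIIIIII\n' >> "$demo"

# Preview only the first record (4 lines) rather than the whole file.
first_record=$(head -n 4 "$demo")
echo "$first_record"

# -c prints the number of matching lines rather than the lines themselves.
n_hits=$(grep -c "ACAAGCTA" "$demo")

# ^ anchors the match to the start of a line, i.e. reads that begin with the barcode.
n_start=$(grep -c "^ACAAGCTA" "$demo")

echo "reads containing barcode: $n_hits, reads starting with it: $n_start"

# Remove the temporary file.
rm -f "$demo"
```

Keep in mind that grep looks at every line, including the header and quality lines. Quality strings can contain the letters A, C, G, and T in some encodings, so a short search string could occasionally match a quality line by chance.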
These posts are intended to document my workflow as I handle and manipulate genomic data within the Mac OSX terminal window. The data processed here are from the Illumina platform.