I have SOLiD RNA-Seq data for a bacterium that we study, grown under two different conditions. We did paired-end sequencing, but the F5 reads were very poor quality and unusable. The F3 reads were better, but needed significant end-trimming to improve mapping. I am relatively happy with the results so far, but I would really like to get some quantitative information from my data and I am having some issues with Cufflinks.
SHRiMP read mapping:
I mapped the reads with SHRiMP using its .sam output option. Then I used samtools to convert the file to an indexed .bam file to view with the Broad Institute's Integrative Genomics Viewer (IGV). I have several control genes that I know change expression between the two conditions, and these changes are visible in the read mapping. However, I would like to get more quantitative information, so I have turned to Cufflinks.
Cufflinks for bacterial RNA Seq:
Since bacteria don't splice their transcripts the way eukaryotes do, mapping splice junctions isn't useful.
-.gff reference file
On the Cufflinks website they suggest using a minimal .gff file as the reference annotation for FPKM calculation. I downloaded the GenBank .gff file for my organism and removed many of the misc_features, leaving only CDS and gene information. Then I used the gffread program that comes with Cufflinks to "minimalize" my .gff file further.
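For reference, the feature filtering described above can be sketched roughly like this in Python. This is not the exact script I used, just a minimal illustration; the function name `filter_gff_lines` and the file names in the usage note are made up for the example, and it assumes a tab-delimited GFF with the feature type in column 3.

```python
def filter_gff_lines(lines, keep_types=("gene", "CDS")):
    """Yield comment lines plus features whose type (column 3) is in keep_types.

    Hypothetical helper for illustration; assumes 9-column, tab-delimited GFF.
    """
    for line in lines:
        # Keep header and comment lines (e.g. "##gff-version 3") untouched.
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Skip blank or malformed lines; keep only the requested feature types.
        if len(fields) >= 9 and fields[2] in keep_types:
            yield line
```

Usage would be something like opening the downloaded file (say `organism.gff`, a placeholder name), passing the file handle to `filter_gff_lines`, and writing the yielded lines to a new file before running gffread on it.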
-SHRiMP .sam file prep
Then I cleaned up my .sam file to remove extraneous fields left over from SHRiMP, and appended a final field to each row, "XS:A:+" or "XS:A:-", to indicate the positive or negative strand, as suggested by Cufflinks.
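The XS tagging step can be sketched along these lines. Again this is only an illustration, not my exact script: the helper name `add_xs_tag` is made up, and it assumes the strand can be taken directly from the SAM FLAG (bit 0x10 set means the read aligned to the reverse strand). Whether the read strand really matches the transcript strand depends on the library protocol, so treat that as an assumption.

```python
def add_xs_tag(sam_line):
    """Return a SAM line with an XS:A:+ or XS:A:- field appended.

    Hypothetical helper; assumes read strand (FLAG bit 0x10) equals
    transcript strand, which depends on the library protocol.
    """
    # Header lines start with '@' and are passed through unchanged.
    if sam_line.startswith("@"):
        return sam_line
    fields = sam_line.rstrip("\n").split("\t")
    # A valid alignment record has at least 11 mandatory fields.
    if len(fields) < 11:
        return sam_line
    flag = int(fields[1])
    # Leave unmapped reads (bit 0x4) and already-tagged records alone.
    if flag & 0x4 or any(f.startswith("XS:A:") for f in fields[11:]):
        return sam_line
    strand = "-" if flag & 0x10 else "+"
    fields.append("XS:A:" + strand)
    return "\t".join(fields) + "\n"
```

Applied line by line to the cleaned .sam file, this leaves the 11 mandatory columns as they are and only adds the optional field at the end.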
-Cufflinks/Cuffdiff
When I try to run Cufflinks or Cuffdiff, for some reason both programs think that my .sam file has 0 bp reads, and so neither gives me any FPKM values (from Cufflinks) or differential FPKM values (from Cuffdiff).
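I don't know that this is the cause, but one thing that seems worth checking is whether the SEQ column survived my cleanup, since a missing or "*" sequence would plausibly look like a zero-length read. Below is a minimal sketch of such a sanity check; the function names are made up for the example, and it only compares the SEQ length with the query length implied by the CIGAR string.

```python
import re

# One CIGAR operation: a count followed by an operator letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operators that consume bases of the read (query) sequence.
QUERY_OPS = set("MIS=X")


def cigar_query_length(cigar):
    """Sum the lengths of the CIGAR operations that consume read bases."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in QUERY_OPS)


def summarize_read_lengths(lines):
    """Count records with a missing SEQ or a SEQ/CIGAR length mismatch."""
    stats = {"records": 0, "missing_seq": 0,
             "cigar_mismatch": 0, "too_few_fields": 0}
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            stats["too_few_fields"] += 1  # a mandatory column was lost
            continue
        stats["records"] += 1
        cigar, seq = fields[5], fields[9]
        if seq in ("*", ""):
            stats["missing_seq"] += 1
        elif cigar != "*" and cigar_query_length(cigar) != len(seq):
            stats["cigar_mismatch"] += 1
    return stats
```

If `missing_seq` or `too_few_fields` comes back non-zero, the cleanup step probably removed or shifted one of the mandatory columns.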
Does anyone have any suggestions for correcting this problem? Do I need to annotate the transcription start sites (TSSs)? If so, does anyone have a tool they would recommend for doing this? I definitely want to map TSSs eventually, but first I was just trying to do some quick and dirty quantitative analysis.
Any thoughts or suggestions would be much appreciated!
-Pete