Dear all,
I'm looking for small non-coding RNAs in 3 samples of mouse tissue. To do this we performed 2x76bp strand-specific RNA-seq on total RNA depleted of ribosomal RNA with the RiboMinus kit. I have the data from our collaborators in BAM format. Although this data is intended for non-coding RNA analysis, I also want to extract gene expression levels from it, if possible. So far I have done the following on the Galaxy bioinformatics platform to try to get expression levels:
1) Transcript assembly with cufflinks for each sample with no reference annotation
2) Used cuffmerge to combine the 3 samples with the UCSC genes reference annotation
3) Used cuffdiff on the output of cuffmerge, along with my 3 original BAM files
4) Taken the output from cuffdiff and used R with cummeRbund to analyse the results
I have attached the output of csVolcano and csScatter for one pairwise comparison to give an idea of the data.
Now, my questions:
a) Should I be worried about the appearance of the plots? I have highlighted the regions which appear to represent 'expression' in only one sample. They seem to form a large portion of my data - a problem with cufflinks, perhaps?
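b) Why, when I do
Code:
mygene <- getGene(cuff_data, 'GENENAME')
mygene
does the output for that gene look like this:
Code:
CuffGene instance for gene XLOC_181410
Short name:  uc009ajk.1
Slots:
     annotation
     fpkm
     diff
     isoforms    CuffFeature instance of size 1
     TSS         CuffFeature instance of size 1
     CDS         CuffFeature instance of size 1
No fpkm, no diff etc., even for genes I know are highly expressed. This means I cannot, for example, plot expressionBarplot for my genes of interest.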
c) What general quality control recommendations can anyone make for determining how good this data is at capturing gene expression profiles, bearing in mind that it is total RNA and therefore (I assume) not as well suited as polyA-enriched RNA for this task?
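For reference on (b), here is a minimal sketch of what I understand the usual way of pulling values out of a CuffGene object to be. As far as I can tell, printing the object only lists its slot names, and the accessor functions are what return the actual numbers. The directory name "cuffdiff_out" and the gene name 'GENENAME' are placeholders, not my real paths or genes, and I have not confirmed that this resolves the problem on my data.

```r
# Sketch only: assumes cummeRbund is installed and that "cuffdiff_out"
# (a hypothetical path) is the directory holding the cuffdiff output files.
library(cummeRbund)

cuff_data <- readCufflinks(dir = "cuffdiff_out")

# 'GENENAME' is a placeholder for a gene short name or XLOC id.
mygene <- getGene(cuff_data, "GENENAME")

# Printing mygene only lists slot names; the accessors return the values.
gene_fpkm <- fpkm(mygene)            # gene-level FPKM for each sample
iso_fpkm  <- fpkm(isoforms(mygene))  # FPKM for each isoform of the gene
print(gene_fpkm)
print(iso_fpkm)

# Bar plot of the gene-level FPKM in each sample.
print(expressionBarplot(mygene))
```

If fpkm(mygene) comes back empty for a gene known to be highly expressed, that would point at the cuffdiff run or the gene lookup rather than at the plotting step.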
Thanks for reading this. This is my first foray into NGS data, and my only information has come from the excellent Nature Protocols paper "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks" by Trapnell et al. It is entirely probable that I have missed some pretty basic points as a result, so please recommend any beginners' literature if you can.
Cheers,
J