Hi all,
I have two questions about strand calling.
1) From what I understand, with an unstranded library protocol there is no way to tell directly from the reads which strand a transcript comes from. So to decide on the strand of a transcript, Cufflinks looks at the splice junctions and calls the strand based on which strand has a valid canonical GT-AG splice site pair (which shows up as CT-AC on the forward strand when the transcript is on the reverse strand).
[This thread explains it better: http://seqanswers.com/forums/showthread.php?t=4704]
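To make sure I have the idea right, here is a minimal sketch of splice-site-based strand inference as I understand it. This is not Cufflinks' actual code; the function name `intron_strand` and the 0-based, half-open intron coordinates are my own assumptions, and only the canonical GT-AG pair is checked.

```python
def intron_strand(genome_seq, intron_start, intron_end):
    """Guess the strand of a spliced transcript from one intron.

    genome_seq   -- forward-strand sequence of the chromosome (hypothetical input)
    intron_start -- 0-based index of the first intron base
    intron_end   -- 0-based index one past the last intron base
    Returns "+", "-", or "." when the boundaries are not canonical.
    """
    first_two = genome_seq[intron_start:intron_start + 2].upper()
    last_two = genome_seq[intron_end - 2:intron_end].upper()

    # A forward-strand intron starts with GT and ends with AG.
    if (first_two, last_two) == ("GT", "AG"):
        return "+"
    # A reverse-strand intron is the reverse complement of GT...AG,
    # so on the forward strand it reads CT...AC.
    if (first_two, last_two) == ("CT", "AC"):
        return "-"
    # Non-canonical boundaries give no strand information in this sketch.
    return "."


# Toy sequences: exon bases, an 8 bp intron, exon bases.
print(intron_strand("AAAGTCCCCAGTTT", 3, 11))
print(intron_strand("AAACTCCCCACTTT", 3, 11))
```

A single-exon transcript has no intron to pass to a function like this, which is exactly why I am confused about where its strand comes from.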
But in my Cufflinks results, I see a lot of single-exon transcripts to which Cufflinks has also assigned a strand. So my question is: how does Cufflinks decide on the strand for a transcript that contains no splice site?
2) Secondly, as a way of benchmarking the strand calling accuracy of Cufflinks, for each Cufflinks transcript I looked for known genes that overlap the predicted transcript and compared the strand predicted by Cufflinks with the strand of the known gene. I consider a Cufflinks prediction wrong if all of the known overlapping genes are on the opposite strand to the transcript.
In my analysis, almost 40% of the transcripts to which Cufflinks assigned a strand were on the wrong strand (by my definition above). That seems like a pretty high number. Has anyone else tried something like this, and if so, what kind of results did you get?
Also, do you think I might be doing something wrong that is causing the inaccurate strand calling? Any ideas on how I might improve it?
I am working with Illumina unstranded RNA-seq reads.
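In case the details matter, the comparison in (2) can be sketched roughly as below. This is a simplified illustration rather than my exact script: the `Feature` tuple, the function names, and the 0-based half-open coordinates are assumptions made for the example, and the denominator is every transcript with a strand assigned, including those that overlap no known gene.

```python
from collections import namedtuple

# Hypothetical minimal record for a transcript or a known gene.
Feature = namedtuple("Feature", "chrom start end strand")


def overlaps(a, b):
    """True if two features share at least one base (half-open coordinates)."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify(transcript, genes):
    """Label one predicted transcript against the known genes."""
    if transcript.strand not in ("+", "-"):
        return "unstranded"
    hits = [g for g in genes if overlaps(transcript, g)]
    if not hits:
        return "no_overlap"
    # Wrong only if every overlapping known gene is on the opposite strand.
    if all(g.strand != transcript.strand for g in hits):
        return "wrong"
    return "consistent"


def wrong_fraction(transcripts, genes):
    """Fraction of strand-assigned transcripts classified as wrong."""
    calls = [classify(t, genes) for t in transcripts]
    stranded = [c for c in calls if c != "unstranded"]
    return stranded.count("wrong") / len(stranded) if stranded else 0.0


genes = [Feature("chr1", 100, 500, "+"), Feature("chr1", 400, 900, "-")]
transcripts = [
    Feature("chr1", 150, 300, "-"),  # overlaps only the "+" gene
    Feature("chr1", 450, 480, "-"),  # overlaps both genes
    Feature("chr2", 1, 10, "+"),     # overlaps nothing
    Feature("chr1", 600, 700, "."),  # no strand assigned
]
print(wrong_fraction(transcripts, genes))
```

The linear scan over all genes is only meant to show the logic; for a whole annotation an interval index or a sorted sweep would be needed.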
Thanks.