  • Question about Cufflink strand calling

    Hi all,

    I have 2 questions about strand calling..

    1) From what I understand, with an unstranded library protocol there is no way to tell directly from the reads which strand a transcript comes from. So, to decide on the strand of a transcript, Cufflinks looks at the splice junctions and calls the strand according to which strand has a valid GT-AG splice site pair.

    [This thread explains it better.. http://seqanswers.com/forums/showthread.php?t=4704]

    But in my Cufflinks results, I see a lot of single-exon transcripts to which Cufflinks has also assigned a strand. So my question is: how does Cufflinks decide on the strand of a transcript when there is no splice site in the transcript?
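To make the splice-site idea concrete, here is a minimal sketch of inferring a strand from the two bases at each end of an intron. It is not Cufflinks' actual implementation; the function names and the 0-based, half-open coordinate convention are assumptions made for illustration, and only the canonical GT-AG motif is considered.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def strand_from_intron(genome, intron_start, intron_end):
    """Guess the transcript strand from the intron's terminal dinucleotides.

    genome is the plus-strand sequence; intron_start and intron_end are
    0-based, half-open coordinates of the intron (hypothetical convention).
    """
    left = genome[intron_start:intron_start + 2].upper()
    right = genome[intron_end - 2:intron_end].upper()
    if (left, right) == ("GT", "AG"):
        return "+"  # canonical donor/acceptor read on the plus strand
    if (left, right) == ("CT", "AC"):
        return "-"  # GT...AG on the minus strand appears as CT...AC on plus
    return "."      # no canonical motif, strand cannot be decided


plus_example = "AAAGTCCCCAGTTT"                    # intron spans [3, 11)
minus_example = reverse_complement(plus_example)   # same intron, other strand
print(strand_from_intron(plus_example, 3, 11))
print(strand_from_intron(minus_example, 3, 11))
```

A single-exon transcript has no intron to pass to a function like this, which is exactly why the question above arises.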


    2) Secondly, as a way of benchmarking the strand calling accuracy of Cufflinks, for each Cufflinks transcript I have been looking for known genes that overlap the predicted transcript and comparing the strand predicted by Cufflinks to the strand of the known gene. I consider a Cufflinks prediction wrong if all of the known overlapping genes are on the opposite strand to the transcript.

    In my analysis, almost 40% of the transcripts to which Cufflinks assigned a strand were on the wrong strand (by my definition above). That seems a pretty high number. So I just wanted to know: has anyone else tried something like this, and what kind of results did you get?
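The benchmark described above could be sketched roughly as follows. The dictionary layout, the function names, and the choice to score only stranded transcripts that overlap at least one known gene are assumptions for illustration; the original post does not say how transcripts without any overlapping gene were counted.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals (GTF-style, 1-based inclusive) overlap."""
    return a_start <= b_end and b_start <= a_end


def classify_call(transcript, known_genes):
    """Label one predicted transcript against the known annotation."""
    if transcript["strand"] not in ("+", "-"):
        return "unstranded"
    hits = [
        g for g in known_genes
        if g["chrom"] == transcript["chrom"]
        and overlaps(transcript["start"], transcript["end"], g["start"], g["end"])
    ]
    if not hits:
        return "no_overlap"
    # "Wrong" only if every overlapping known gene is on the opposite strand.
    if all(g["strand"] != transcript["strand"] for g in hits):
        return "wrong"
    return "consistent"


def wrong_fraction(transcripts, known_genes):
    """Fraction of scored transcripts called 'wrong' (None if none scored)."""
    calls = [classify_call(t, known_genes) for t in transcripts]
    scored = [c for c in calls if c in ("wrong", "consistent")]
    return scored.count("wrong") / len(scored) if scored else None
```

The linear scan over all genes is fine for a quick check; for a whole genome an interval index or a tool built for interval overlaps would be the more practical choice.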

    Also, do you think I might be doing something wrong that is causing the inaccurate strand calling? Any ideas on how I might improve it?

    I am working with unstranded Illumina RNA-seq reads.

    thanks..

  • #2
    Hi avi,

    I am finding similar results. Did you find any answers to your questions?

    Thanks.


    • #3
      Hi joro,

      No, I still haven't got any answers on how Cufflinks does its strand calling for single-exon transcripts.

      But when I looked at the transcripts assigned the "wrong" strand, I found that the vast majority of wrong strand assignments came from single-exon transcripts. So going forward, I decided to use the Cufflinks strand assignments only for multi-exon transcripts. I am treating all single-exon transcripts as strand unknown, even if Cufflinks assigns them a strand.
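A minimal sketch of that workaround, assuming a Cufflinks-style GTF in which every line carries a `transcript_id "..."` attribute and exons are marked with the feature type `exon`. The function name is hypothetical; it simply replaces the strand column with `.` for every transcript that has exactly one exon.

```python
import re
from collections import Counter

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def mask_single_exon_strands(gtf_lines):
    """Return GTF lines with the strand of single-exon transcripts set to '.'."""
    lines = [line.rstrip("\n") for line in gtf_lines]

    # First pass: count exon records per transcript.
    exon_counts = Counter()
    for line in lines:
        fields = line.split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if fields[2] == "exon" and match:
            exon_counts[match.group(1)] += 1

    # Second pass: blank the strand column (index 6) where only one exon exists.
    masked = []
    for line in lines:
        fields = line.split("\t")
        if not line.startswith("#") and len(fields) >= 9:
            match = TRANSCRIPT_ID.search(fields[8])
            if match and exon_counts[match.group(1)] == 1:
                fields[6] = "."
                line = "\t".join(fields)
        masked.append(line)
    return masked
```

It could be applied to a file with something like `mask_single_exon_strands(open("transcripts.gtf"))`, where the file name is only a placeholder.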

      On a related note, I recently realised that my data itself might not be very good and posted a question about that: http://seqanswers.com/forums/showthread.php?t=14416. I haven't got an answer to that yet, but you might also want to check that for your data.

      cheers..


      • #4
        Thanks for your quick reply avi.

        Out of interest, did you specify a library type when running Cufflinks?

        I'm using stranded SOLiD RNA-seq reads, so it's interesting that we experienced the same problem.


        • #5
          Originally posted by avi
          Hi all,

          I have 2 questions about strand calling..

          1) From what I understand, with an unstranded library protocol there is no way to tell directly from the reads which strand a transcript comes from. So, to decide on the strand of a transcript, Cufflinks looks at the splice junctions and calls the strand according to which strand has a valid GT-AG splice site pair.

          [This thread explains it better.. http://seqanswers.com/forums/showthread.php?t=4704]

          But in my Cufflinks results, I see a lot of single-exon transcripts to which Cufflinks has also assigned a strand. So my question is: how does Cufflinks decide on the strand of a transcript when there is no splice site in the transcript?

          I think the strand assigned to a transcript depends on which strand of the genome the reads map to. The strand should be + if the reads map to the sense strand, and - if they map to the antisense strand.

          Cheers,


          • #6
            I've looked at some examples of single exon transcripts which have been assigned a strand even though all the reads map to the opposite strand (e.g. assigned '+' even though the reads map to the antisense strand).

            I don't know if avi finds the same thing?


            • #7
              Originally posted by joro
              I've looked at some examples of single exon transcripts which have been assigned a strand even though all the reads map to the opposite strand (e.g. assigned '+' even though the reads map to the antisense strand).

              I don't know if avi finds the same thing?

              I'm not sure I have caught your meaning.
              But I guess it depends on which strand is defined as "sense" or "antisense". I think it is reasonable to assign "+" to an "antisense" strand. You can check this by comparing the strands of multi-exon transcripts with the strands of their reads.

              Cheers,


              • #8
                Thanks. From my experience and from avi's posts, the wrong strand assignments tend to occur in single exon transcripts.


                • #9
                  Originally posted by joro
                  Thanks. From my experience and from avi's posts, the wrong strand assignments tend to occur in single exon transcripts.

                  I am wondering how you know that some of the assigned strands are wrong. Do you have reference transcripts for them?


                  • #10
                    Yes, I have looked at reference genes that overlap with the predicted transcripts. Also, all the reads map to the same strand as the reference gene so it seems that Cufflinks assigns the opposite strand.


                    • #11
                      Hi Hunny,

                      That's what I originally thought too. But apparently it doesn't work that way. From what I understand, during RNA-seq library preparation the RNA is converted into double-stranded cDNA, and this cDNA gets sequenced. So the reads could come from either one of the cDNA strands. Therefore the strand to which the reads map doesn't tell us anything about which strand the original RNA came from.

                      This applies only to non-strand-specific protocols. There are protocols that preserve the strand information during RNA-seq library preparation, but I haven't read about them yet.
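One rough way to see this in your own data is to take reads that fall inside known genes and ask what fraction map to the same strand as the gene. The sketch below assumes the read strands have already been extracted from the alignments; the function names and the 0.1 tolerance are arbitrary choices for illustration, not an established threshold.

```python
def same_strand_fraction(read_strands, gene_strand):
    """Fraction of reads whose mapped strand equals the known gene's strand."""
    if not read_strands:
        raise ValueError("no reads supplied")
    same = sum(1 for strand in read_strands if strand == gene_strand)
    return same / len(read_strands)


def guess_protocol(fraction, tolerance=0.1):
    """Crude interpretation of the fraction (tolerance is an assumed cutoff)."""
    if abs(fraction - 0.5) <= tolerance:
        return "unstranded"  # reads split roughly evenly between strands
    if fraction > 0.5:
        return "stranded, reads on gene strand"
    return "stranded, reads on opposite strand"


reads = ["+", "-", "+", "-", "-", "+"]  # toy data for a '+' strand gene
print(guess_protocol(same_strand_fraction(reads, "+")))
```

With an unstranded library the fraction should hover near one half, which is the point made above: the mapped strand of a read carries no information about the strand of the original RNA.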

                      @joro: No, I didn't specify a library type. That's very strange if you are getting wrong strand predictions even though your data is strand-specific. But your guess is as good as mine here. Hopefully someone with more experience will be able to clear this up for us.


                      • #12
                        Hi avi,

                        Originally posted by avi
                        Hi Hunny,

                        That's what I originally thought too. But apparently it doesn't work that way. From what I understand, during RNA-seq library preparation the RNA is converted into double-stranded cDNA, and this cDNA gets sequenced. So the reads could come from either one of the cDNA strands. Therefore the strand to which the reads map doesn't tell us anything about which strand the original RNA came from.

                        This applies only to non-strand-specific protocols. There are protocols that preserve the strand information during RNA-seq library preparation, but I haven't read about them yet.

                        Yes, I see. I've just checked my predicted transcripts from Cufflinks, and I find that for some of my single-exon transcripts (I haven't checked all of them) Cufflinks does not assign any strand information, but puts a dot in the strand field of the GTF file.

                        I am using Tophat-1.3.1 and Cufflinks-1.0.3 with default options.

                        Cheers,


                        • #13
                          This work was done a while ago. I used cufflinks-0.9.3 & tophat-1.1.1 so maybe if I run it again with updated versions of these programs I won't see such strange results.

                          @Hunny, what reads are you working with? Did you specify a library type?

                          Thanks.


                          • #14
                            Originally posted by joro
                            This work was done a while ago. I used cufflinks-0.9.3 & tophat-1.1.1 so maybe if I run it again with updated versions of these programs I won't see such strange results.

                            @Hunny, what reads are you working with? Did you specify a library type?

                            Thanks.

                            I am working with Illumina single-end reads.
                            No, I didn't specify a library type, just with default options.

                            Cheers,


                            • #15
                              I encountered the same problem: I found strand errors and overlaps between transcripts.
                              Details are in http://seqanswers.com/forums/showthread.php?t=26555
                              I am confused too.
                              github:
                              https://github.com/Bioinformatics-and-Genomics
