  • d f
    Member
    • Feb 2010
    • 18

    Cufflinks' estimated fragment length mean and standard deviation

    I have a question about the fragment length distribution that Cufflinks (v0.9.0+) estimates from paired-end read alignments:

    Is this the length of the whole fragment, including the two reads, or is it the inner distance between the mates, excluding the reads?

    The value I'm talking about is printed to the screen when Cufflinks runs, e.g.,

    > Read Type: 60bp paired-end
    > Fragment Length Distribution: Empirical (learned)
    > Estimated Mean: 86.94
    > Estimated Std Dev: 13.90

    In the above example, if the value is the fragment length including reads, then I must be getting a lot of adapter read-through.
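For a rough sense of scale, the reported mean and standard deviation can be turned into an estimate of how many fragments are shorter than a single read, which is the case where a read runs into the adapter. This is only a minimal sketch: it assumes, as the question supposes, that the value is the full fragment length, and it approximates the distribution as normal even though Cufflinks reports an empirical one. The function name is hypothetical.

```python
import math

def fraction_shorter_than(length, mean, std_dev):
    """Normal-CDF estimate of the fraction of fragments shorter than `length`."""
    z = (length - mean) / std_dev
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Values taken from the Cufflinks output quoted above.
read_length = 60
mean, std_dev = 86.94, 13.90

# Fragments shorter than one read: the read runs past the insert into the adapter.
read_through = fraction_shorter_than(read_length, mean, std_dev)
# Fragments shorter than two reads: the two mates overlap each other.
overlapping = fraction_shorter_than(2 * read_length, mean, std_dev)

print(f"adapter read-through: {read_through:.1%}")
print(f"overlapping mates:    {overlapping:.1%}")
```

Under these assumptions only a few percent of fragments would be shorter than one read, while nearly all mate pairs would overlap each other.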
  • adarob
    Member
    • Jul 2010
    • 71

    #2
    It is the length of the full fragment, including reads.


    • d f
      Member
      • Feb 2010
      • 18

      #3
      Thanks for the info!

      I guess my fragments are a lot shorter than what the Agilent High Sensitivity chip predicts/measures. Sigh.


      • roryk
        Member
        • Aug 2010
        • 15

        #4
        I am having a similar issue: both the gel I ran to look at the fragment length and the bioanalyzer result had a peak at 240 bp, with the bioanalyzer showing a tight distribution centered around 240 bp. Cufflinks estimates the insert length of this sample as 120 bp with a standard deviation of 70 bp. That is not even close! There is literally no signal at all at 120 bp on the bioanalyzer trace.

        If I supply a GTF file, the estimated means are similar (120 vs. 107 bp) but the standard deviations are not (70 vs. 19 bp). Either way, I do not understand why the reported insert length is 100 bp shorter than what I cut out of the gel and saw on both the gel and the bioanalyzer trace. Any thoughts?
        Last edited by roryk; 10-27-2010, 06:57 AM.


        • adarob
          Member
          • Jul 2010
          • 71

          #5
          roryk,

          A few questions:

          1. What read mapper did you use?
          2. Does Cufflinks correctly report the read length?
          3. Have you tried using Cufflinks 0.9.2?

          Thanks.


          • roryk
            Member
            • Aug 2010
            • 15

            #6
            Hi adarob,

            I used Bowtie (via TopHat) to map the reads. An example:

            tophat -p 4 -G ../misc/rat_knowngene.gtf -o /mnt/sc_exp/E_L4 -r 178 rn4 E_L4_1.fq E_L4_2.fq

            178 comes from 250 - 36 * 2, i.e. the fragment size minus two 36 bp reads. rat_knowngene.gtf is from the UCSC known genes.

            Cufflinks does correctly report the read length. I get the same result using Cufflinks 0.9.1 and 0.9.2.


            • adarob
              Member
              • Jul 2010
              • 71

              #7
              roryk,

              Are you compiling Cufflinks from source? There is a small change you can make to the source code to have it output the empirical distribution. Otherwise, would you be willing to make your BAM file available so I can help resolve this?

              -Adam
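A rough way to look at an empirical distribution without touching the Cufflinks source is to tabulate the TLEN field (column 9) of the SAM records directly. The sketch below is not the source change mentioned above, and the function name, the `max_length` cutoff, and the example records are all made up for illustration. It also measures the genomic span of each pair, so pairs that straddle an intron will look longer than the real fragment, which is why long spans are discarded here; Cufflinks' own estimate may differ.

```python
import statistics

def fragment_lengths(sam_lines, max_length=1000):
    """Collect TLEN (SAM column 9) from properly paired alignments."""
    lengths = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        # 0x2 = properly paired; 0x100 / 0x800 = secondary / supplementary.
        if not flag & 0x2 or flag & 0x900:
            continue
        # Keep only the positive-TLEN mate so each pair is counted once,
        # and drop pairs whose genomic span likely includes an intron.
        if 0 < tlen <= max_length:
            lengths.append(tlen)
    return lengths

# Two made-up alignment records (one read pair), for illustration only.
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t99\tchr1\t100\t255\t36M\t=\t184\t120\t*\t*",
    "r1\t147\tchr1\t184\t255\t36M\t=\t100\t-120\t*\t*",
]
lengths = fragment_lengths(example)
print(lengths, statistics.mean(lengths))
```

In practice the lines could come from the text output of `samtools view` on the BAM file rather than from a hard-coded list.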


              • roryk
                Member
                • Aug 2010
                • 15

                #8
                I have just been using the precompiled binaries for Linux but am not averse to compiling it from source. I emailed you a link to the BAM file.


                • kmcarr
                  Senior Member
                  • May 2008
                  • 1181

                  #9
                  Originally posted by roryk:
                  > I am having a similar issue: both the gel I ran to look at the fragment length and the bioanalyzer result had a peak at 240 bp, with the bioanalyzer showing a tight distribution centered around 240 bp. Cufflinks estimates the insert length of this sample as 120 bp with a standard deviation of 70 bp. That is not even close! There is literally no signal at all at 120 bp on the bioanalyzer trace.
                  >
                  > If I supply a GTF file, the estimated means are similar (120 vs. 107 bp) but the standard deviations are not (70 vs. 19 bp). Either way, I do not understand why the reported insert length is 100 bp shorter than what I cut out of the gel and saw on both the gel and the bioanalyzer trace. Any thoughts?

                  Just to be absolutely clear, was the sample which you measured as 240 bp before or after ligating the Illumina adapters? The combined length of the Illumina RNA-Seq (or PE) adapters is 119 bp. If your 240 bp includes these, then the estimate from Cufflinks is spot on, as it is estimating the size of the insert only.
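The arithmetic behind this can be written out as a short sketch. The 119 bp figure is the combined adapter length quoted in this thread, and the function names are hypothetical; other library preps may use adapters of a different length.

```python
ADAPTER_TOTAL_BP = 119  # combined adapter length quoted in this thread

def insert_size(library_size_bp, adapter_bp=ADAPTER_TOTAL_BP):
    """Fragment (insert) length once the ligated adapters are subtracted."""
    return library_size_bp - adapter_bp

def inner_distance(insert_bp, read_length_bp):
    """Distance between the two mates, excluding the reads themselves."""
    return insert_bp - 2 * read_length_bp

insert = insert_size(240)         # bioanalyzer peak measured after ligation
inner = inner_distance(insert, 36)  # two 36 bp reads

print(insert)  # 121, close to the ~120 bp Cufflinks estimate
print(inner)   # 49
```

If the 240 bp measurement does include the adapters, the inner distance passed to TopHat's `-r` option would be closer to 49 than to the 178 used earlier in the thread.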


                  • roryk
                    Member
                    • Aug 2010
                    • 15

                    #10
                    Originally posted by kmcarr:
                    > Just to be absolutely clear, was the sample which you measured as 240 bp before or after ligating the Illumina adapters? The combined length of the Illumina RNA-Seq (or PE) adapters is 119 bp. If your 240 bp includes these, then the estimate from Cufflinks is spot on, as it is estimating the size of the insert only.

                    Yup, this is exactly right; I thought the combined length of the PE adaptors was half what it is. Thanks!


                    • adarob
                      Member
                      • Jul 2010
                      • 71

                      #11
                      Glad to see the problem was resolved.


                      • d f
                        Member
                        • Feb 2010
                        • 18

                        #12
                        Thanks, kmcarr!

                        I had a similar problem: the fragment length estimated by Cufflinks completely disagreed with that measured by the Agilent High Sensitivity chip. But after subtracting 119 bp from the Agilent measurement, they agree. Another discrepancy resolved, thank goodness.


                        • qserenali
                          Junior Member
                          • Apr 2013
                          • 3

                          #13
                          It is helpful to read this old thread.

                          I am new to Illumina RNA-Seq data. Our lab used SOLiD/Torrent/Proton in the past, but we will use the Illumina platform for future RNA-Seq projects with large sample sizes. To get familiar with Illumina data, I am looking at an unstranded RNA-Seq dataset generated by another lab using the TruSeq RNA Sample Prep Kit v2 protocol. My question related to this thread: is the combined length of the adapters always 119 bp for all Illumina protocols, including the stranded protocol?

                          Thanks!
