
  • Cufflinks' estimated fragment length mean and standard deviation

    I have a question about the Fragment Length Distribution that Cufflinks (v 0.9.0+) estimates, based on paired end read alignments:

    Is this the length of the fragment, including the two reads; or is it the inner distance between mate pairs, without the reads?

    The value I'm talking about is printed to the screen when Cufflinks runs, e.g.,

    > Read Type: 60bp paired-end
    > Fragment Length Distribution: Empirical (learned)
    > Estimated Mean: 86.94
    > Estimated Std Dev: 13.90

    In the above example, if the value is the fragment length including the reads, then I must be getting a lot of adapter read-through.

  • #2
    It is the length of the full fragment, including reads.
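The two quantities in the question differ by twice the read length. A minimal Python sketch of the conversion, assuming both mates have the same length (the function name is made up for illustration and is not part of Cufflinks):

```python
def inner_distance(fragment_length, read_length):
    """Inner distance between mates, given the full fragment length.

    Assumes both mates have the same length. A negative result means
    the two mates overlap each other.
    """
    return fragment_length - 2 * read_length


# Values from the Cufflinks output quoted in the first post:
# estimated mean fragment length 86.94 with 60bp paired-end reads.
print(round(inner_distance(86.94, 60), 2))
```

With the numbers from the first post the result is negative (about -33), which would mean the mates overlap on average.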



    • #3
      Thanks for the info!

      I guess my fragments are a lot shorter than what the Agilent High Sensitivity chip predicts/measures. Sigh.



      • #4
        I am having a similar issue-- both the gel I ran to look at the fragment length and the bioanalyzer result had a peak at 240 bp, with the bioanalyzer result showing a tight distribution centered around 240. Cufflinks estimates the insert length of this sample as 120 bp with a standard deviation of 70 bp. That is not even close! There is literally no signal at all at 120 bp on the bioanalyzer trace.

        If I supply a GTF file, the estimated means are similar (120 vs. 107) but the standard deviations are not (70 vs. 19). Either way, I don't understand why the reported insert length is more than 100 bp shorter than the band I cut out of the gel and saw on both the gel and the bioanalyzer trace. Any thoughts?
        Last edited by roryk; 10-27-2010, 06:57 AM.



        • #5
          roryk,

          A few questions:

          1. What read mapper did you use?
          2. Does Cufflinks correctly report the read length?
          3. Have you tried using Cufflinks 0.9.2?

          Thanks.



          • #6
            Hi adarob,

            I used Bowtie (via TopHat) to map the reads. An example:

            tophat -p 4 -G ../misc/rat_knowngene.gtf -o /mnt/sc_exp/E_L4 -r 178 rn4 E_L4_1.fq E_L4_2.fq

            The 178 comes from 250 - 36 * 2, i.e. a 250 bp fragment minus two 36 bp reads. rat_knowngene.gtf is from the UCSC known genes.

            Cufflinks does correctly report the read length. I get the same result using Cufflinks 0.9.1 and 0.9.2.



            • #7
              roryk,

              Are you compiling cufflinks from source? There is a small change you can make to the source code to have it output the empirical distribution. Otherwise, would you be willing to make your bam file available for me to help resolve this?

              -Adam
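As a rough cross-check that needs no change to the Cufflinks source, an empirical distribution can also be approximated from the alignments themselves. The sketch below is not the Cufflinks method and not the source change mentioned above; it simply summarises the TLEN field (column 9) of SAM records for properly paired reads. The function name and the `max_length` cutoff are assumptions for illustration. For spliced RNA-Seq alignments TLEN spans introns, so values can be inflated, and the cutoff is only a crude guard against that.

```python
import statistics


def fragment_length_stats(sam_lines, max_length=1000):
    """Return (mean, standard deviation) of TLEN over properly paired reads.

    sam_lines: iterable of SAM-format text lines (header lines are skipped).
    max_length: hypothetical cutoff to drop pairs whose TLEN spans an intron.
    """
    lengths = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])
        # 0x2 marks a properly paired read; keeping only positive TLEN
        # selects the leftmost mate, so each pair is counted once.
        if flag & 0x2 and 0 < tlen <= max_length:
            lengths.append(tlen)
    if len(lengths) < 2:
        raise ValueError("need at least two properly paired fragments")
    return statistics.mean(lengths), statistics.stdev(lengths)
```

The input lines could come from `samtools view` run on the BAM file, read line by line.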



              • #8
                I have just been using the precompiled binaries for Linux but am not averse to compiling it from source. I emailed you a link to the bam file.



                • #9
                  Originally posted by roryk
                  I am having a similar issue-- both the gel I ran to look at the fragment length and the bioanalyzer result had a peak at 240 bp, with the bioanalyzer result showing a tight distribution centered around 240. Cufflinks estimates the insert length of this sample as 120 bp with a standard deviation of 70 bp. That is not even close! There is literally no signal at all at 120 bp on the bioanalyzer trace.

                  If I supply a GTF file, the estimated means are similar (120 vs. 107) but the standard deviations are not (70 vs. 19). Either way, I don't understand why the reported insert length is more than 100 bp shorter than the band I cut out of the gel and saw on both the gel and the bioanalyzer trace. Any thoughts?
                  Just to be absolutely clear, was the sample you measured as 240bp sized before or after ligating the Illumina adapters? The combined length of the Illumina RNA-Seq (or PE) adapters is 119bp. If your 240bp includes these, then the estimate from Cufflinks is spot on, as it is estimating the size of the insert only.
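The arithmetic behind this can be sketched in a few lines of Python. The 119bp figure is the combined adapter length quoted in this thread and may differ for other kits; the function names are hypothetical.

```python
ADAPTER_TOTAL_BP = 119  # combined adapter length as quoted in this thread


def insert_size(library_size, adapter_total=ADAPTER_TOTAL_BP):
    """Insert (cDNA fragment) size from a size measured after adapter ligation."""
    return library_size - adapter_total


def mate_inner_distance(library_size, read_length, adapter_total=ADAPTER_TOTAL_BP):
    """Distance between the two mates, assuming both reads have the same length."""
    return insert_size(library_size, adapter_total) - 2 * read_length


print(insert_size(240))              # 121, close to the 120 Cufflinks reported
print(mate_inner_distance(240, 36))  # 49
```

Under these assumptions, a 240bp peak measured after ligation corresponds to an insert of about 121bp, and the inner distance between two 36bp mates would be about 49bp rather than 178bp.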



                  • #10
                    Originally posted by kmcarr
                    Just to be absolutely clear, was the sample you measured as 240bp sized before or after ligating the Illumina adapters? The combined length of the Illumina RNA-Seq (or PE) adapters is 119bp. If your 240bp includes these, then the estimate from Cufflinks is spot on, as it is estimating the size of the insert only.
                    Yup, this is exactly right; I thought the combined length of the PE adaptors was half what it is. Thanks!



                    • #11
                      Glad to see the problem was resolved.



                      • #12
                        Thanks, kmcarr!

                        I had a similar problem: the fragment length estimated by Cufflinks completely disagreed with the one measured by the Agilent High Sensitivity Chip. After subtracting 119bp from the Agilent measurement, the two now agree. Another discrepancy resolved, thank goodness.



                        • #13
                          It is helpful to read this old thread.

                          I am new to Illumina RNA-Seq data. Our lab used SOLiD/Torrent/Proton in the past, but we will use the Illumina platform for future RNA-Seq projects with large sample sizes. To get familiar with Illumina data, I am looking at an unstranded RNA-Seq dataset generated by another lab using the TruSeq RNA sample prep kit v2 protocol. My question related to this thread: is the combined length of the adapters always 119bp for all Illumina protocols, including the stranded protocol?

                          Thanks!
