  • DrD2009
    Member
    • Oct 2009
    • 88

    Calculating small RNA expressions from Solexa Sequencing with Cufflinks

    I am in need of your collective bioinformatic brains.

    Here is my situation:
    I have data sets of small RNAs from Solexa sequencing. I want to calculate the expression of these small RNAs, but I'm having some issues. I aligned the reads with Bowtie and passed the SAM output to Cufflinks to calculate expression. People in my lab have concerns about the resulting values. The small RNAs should be <35 bp, but the Cufflinks calculations cover regions that are much larger than 35 bp. People in my lab want expression calculated only for regions <35 bp, and the only way I can think of to do this is by counting reads. Is Cufflinks an inappropriate tool for calculating the expression of small RNAs (miRNAs, siRNAs, etc.)? Or is there another program I could use to achieve this? I think Cufflinks is calculating the expression correctly, but I thought I would reach out to all of you and ask.

    Thanks in advance.


    -Brandon
  • minghui
    Junior Member
    • Feb 2010
    • 7

    #2
    First, I do not fully understand what you mean. Maybe you mean that you only want to count the reads whose length is > 35 bp? If so, you can write a script to filter your dataset before the calculation.
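For anyone who wants to try the filtering idea, here is a minimal sketch of such a script. It assumes the reads are stored as four-line FASTQ records with adapters already trimmed. The names `parse_fastq` and `filter_fastq_by_length` and the default bounds are hypothetical, not taken from any existing tool, and the cutoff can be set in whichever direction is needed.

```python
from typing import Iterable, Iterator, Tuple

# header, sequence, separator line, quality string
FastqRecord = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group an iterable of lines into four-line FASTQ records."""
    block = []
    for line in lines:
        line = line.rstrip("\n")
        if not line and not block:
            continue  # ignore blank lines between records
        block.append(line)
        if len(block) == 4:
            yield tuple(block)
            block = []


def filter_fastq_by_length(
    lines: Iterable[str], min_len: int = 1, max_len: int = 35
) -> Iterator[FastqRecord]:
    """Yield only the records whose read length is within [min_len, max_len]."""
    for header, seq, plus, qual in parse_fastq(lines):
        if min_len <= len(seq) <= max_len:
            yield header, seq, plus, qual


# Tiny in-memory example: one 12 nt read and one 40 nt read.
example = [
    "@read1", "ATGCATGCATGC", "+", "IIIIIIIIIIII",
    "@read2", "A" * 40, "+", "I" * 40,
]
kept = list(filter_fastq_by_length(example, max_len=35))
print([header for header, *_ in kept])  # prints ['@read1']
```

In practice the `lines` argument could be an open file handle, since iterating over a file yields its lines.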


    • DrD2009
      Member
      • Oct 2009
      • 88

      #3
      I'll try to clear that up. I guess I said it in a confusing way.

      We calculated the expression in Cufflinks for the small RNAs that were mapped, but the size of the regions used for the expression calculations ranges from ~24 to ~7,000 bp. I think the following is occurring when Cufflinks calculates the expression:

      (small RNAs aligned to genome)
      _______ _______
      ___ _______ ________
      ________ _______

      [------------------------] <--- Size of region grouped by Cufflinks and expression value calculated for.

      What the members of my lab want is the following:

      (small RNAs aligned to genome)
      _______ _______
      ___ _______ ________
      ________ _______

      [-------] [-------]
      [--] [------] [--------]
      [-------] [-------] <----Size of regions calculated for expression.


      Basically, they want a way to measure the expression of each read so that the regions stay the size of small RNAs and are not lengthened, which seems to happen in Cufflinks. Cufflinks, though, is built for genes rather than for small RNA analysis. The only way I can currently think of to calculate expression for each read is to simply count unique reads.
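A unique read count of the kind mentioned here can be sketched in a few lines of Python. This is only an illustration under the assumption that the read sequences have already been extracted (for example from a FASTQ file or the sequence column of the aligner output); the function name `count_unique_reads` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_unique_reads(sequences: Iterable[str]) -> Counter:
    """Collapse identical read sequences and count how often each occurs."""
    return Counter(seq.strip().upper() for seq in sequences if seq.strip())


# Three reads, two of them identical.
reads = ["ATGCATGCATGC", "ATGCATGCATGC", "TGCACGATCGAT"]
counts = count_unique_reads(reads)
for seq, n in counts.most_common():
    print(seq, n)
```

Each distinct sequence keeps its own length, so nothing is merged into a longer region.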


      • minghui
        Junior Member
        • Feb 2010
        • 7

        #4
        OK, I understand exactly what you mean. Cufflinks measures transcript abundances in Fragments Per Kilobase of exon per Million fragments mapped (FPKM). That suits mRNA-seq and similar data, but not small RNA. Because small RNAs are so short, one read corresponds to one molecule. With FPKM, the same expression level can give different FPKM values, because different small RNAs have different lengths. So FPKM values for the exact regions you want may be useless. In my opinion:

        Two solutions:
        1. Use RPM (reads per million reads).
        2. (Or) use software to align the reads to the genome, define each region yourself, and count the reads in each region.
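To make the length effect and solution 1 concrete, here is a small sketch comparing the two measures. It assumes single-end data, so one read is one fragment, and uses the standard definitions RPM = reads x 10^6 / total mapped reads and FPKM = reads x 10^9 / (region length in bp x total mapped reads). The library size and read counts are made-up numbers for illustration only.

```python
def rpm(read_count: int, total_mapped_reads: int) -> float:
    """Reads per million mapped reads."""
    return read_count * 1_000_000 / total_mapped_reads


def fpkm(read_count: int, region_length_bp: int, total_mapped_reads: int) -> float:
    """Fragments per kilobase per million mapped fragments (single-end reads)."""
    return read_count * 1_000_000_000 / (region_length_bp * total_mapped_reads)


total = 2_000_000  # hypothetical number of mapped reads in the library
reads_per_rna = 100  # hypothetical: both small RNAs were sequenced 100 times

for name, length in [("21 nt small RNA", 21), ("24 nt small RNA", 24)]:
    print(
        name,
        "RPM:", rpm(reads_per_rna, total),
        "FPKM:", round(fpkm(reads_per_rna, length, total), 1),
    )
```

Both small RNAs get the same RPM, while the shorter one gets the higher FPKM even though it was sequenced the same number of times.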

        """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
        (small RNAs aligned to genome)
        _______ _______
        ___ _______ ________
        ________ _______

        [-------] [-------]
        [--] [------] [--------]
        [-------] [-------] <----Size of regions calculated for expression.
        """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""


        • DrD2009
          Member
          • Oct 2009
          • 88

          #5
          Minghui,

          Thanks for getting back with me on this.

          I think we are going to do counts. Do you know if there is a preferred way to calculate expression for small RNAs, counts or RPM? And do you know of any software that calculates RPM?

          I'm using Bowtie as the aligner.


          Thanks again,
          Brandon


          • minghui
            Junior Member
            • Feb 2010
            • 7

            #6
            Hi, Brandon!

            I'm sorry, I don't know of any software that calculates RPM, because I calculated RPM myself. Within a single library there is no difference between counts and RPM: RPM is just the count divided by the total number of reads in millions, so it only rescales the counts. RPM is a normalization for comparing two or more libraries. I think you could try aligning all the reads to the genome and counting the number of reads in each region.

            (small RNAs aligned to genome)
            _______ _______
            ___ _______ ________
            ________ _______

            [-------] [-------]
            [--] [------] [--------]
            [-------] [-------] <----Size of regions calculated for expression.

            For example: ATGCATGCATGC ATGGATGCATGC TGCACGATCGAT (3 reads)

            alignment:                    1        2 3        4
            (genome sequence)  GGGGGGTAGCGATGCATGCATGCACGATCGAT
            (read)                        ATGCATGCATGC
            (read)                        ATGGATGCATGC
            (read)                                 TGCACGATCGAT

            Calculation: "1-3" region: expression level 2
                         "2-4" region: expression level 1

            Advantage: if you count unique sequences instead, these three reads each get an expression level of 1, but that may be wrong, because a mismatch can come from RNA editing or a sequencing error.
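The example above can be reproduced with a short sketch that counts reads per aligned region and, for comparison, per unique sequence. The tuple layout and the function names are hypothetical; in practice the coordinates would be parsed from the aligner output. The 1-based coordinates used here (12-23 and 21-32) were obtained by locating the three reads in the example genome sequence.

```python
from collections import Counter

genome = "GGGGGGTAGCGATGCATGCATGCACGATCGAT"

# (reference name, strand, 1-based start, 1-based end, read sequence)
alignments = [
    ("genome", "+", 12, 23, "ATGCATGCATGC"),
    ("genome", "+", 12, 23, "ATGGATGCATGC"),  # one mismatch to the genome
    ("genome", "+", 21, 32, "TGCACGATCGAT"),
]


def count_by_region(alns):
    """Count reads that share the same reference, strand, start and end."""
    return Counter((ref, strand, start, end) for ref, strand, start, end, _ in alns)


def count_by_sequence(alns):
    """Count reads by exact sequence, ignoring where they align."""
    return Counter(aln[4] for aln in alns)


region_counts = count_by_region(alignments)
sequence_counts = count_by_sequence(alignments)

for region, n in sorted(region_counts.items()):
    print(region, n)
print(dict(sequence_counts))
```

Counting by region gives 2 for the region at 12-23 ("1-3" in the diagram) and 1 for the region at 21-32 ("2-4"), whereas counting by sequence gives 1 for each of the three reads.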
            Last edited by minghui; 06-29-2010, 06:35 PM.


            • DrD2009
              Member
              • Oct 2009
              • 88

              #7
              Minghui,

              Thanks for explaining that to me; it makes sense. The papers I've been reading lately have all been using counts as well, so I suppose that is how we will go about it too.

              Thanks again.
