Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • bvk
    Member
    • May 2015
    • 65

    how to get the count of reads mapped to rRNA regions?

    Hi all,

    I just mapped the RNA-Seqs by Tophat and got the output file accepted_hit.bam. I also have gtf file. I want to count the reads mapping to rRNA regions and also other. I hope featurecounts can be used. But could anyone please tell me how to give the options and different parameters.

    Any reply will be appreciated.

    Thank you very much.
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    For reference cross-posted (and mostly answered): https://www.biostars.org/p/187338

    @bvk: Please check the links for featureCounts page posted and read featureCounts manual starting at page #27.

    Comment

    • bvk
      Member
      • May 2015
      • 65

      #3
      Hi,

      As you said I had a look in the manual.

      featureCounts -p -t exon -g gene_id -a hg19_RefSeq.gtf -o counts.txt accepted_hits.bam

      This Summarize paired-end reads and count fragments (instead of reads).

      This is the output I got.

      //========================== featureCounts setting ===========================\\
      || ||
      || Input files : 1 BAM file ||
      || P MCF10A_BaP_pool1_R1_postTrimming/accepted_ ... ||
      || ||
      || Output file : counts.txt ||
      || Annotations : hg19_RefSeq.gtf (GTF) ||
      || ||
      || Threads : 1 ||
      || Level : meta-feature level ||
      || Paired-end : yes ||
      || Strand specific : no ||
      || Multimapping reads : not counted ||
      || Multi-overlapping reads : not counted ||
      || Read orientations : fr ||
      || ||
      || Chimeric reads : counted ||
      || Both ends mapped : not required ||
      || ||
      \\===================== http://subread.sourceforge.net/ ======================//

      //================================= Running ==================================\\
      || ||
      || Load annotation file hg19_RefSeq.gtf ... ||
      || Features : 432390 ||
      || Meta-features : 24019 ||
      || Chromosomes/contigs : 24 ||
      || ||
      || Process BAM file MCF10A_BaP_pool1_R1_postTrimming/accepted_hits.bam... ||
      || Paired-end reads are included. ||
      || Assign fragments (read pairs) to features... ||
      || ||
      || WARNING: reads from the same pair were found not adjacent to each ||
      || other in the input (due to read sorting by location or ||
      || reporting of multi-mapping read pairs). ||
      || ||
      || Read re-ordering is performed. ||
      || ||
      || Total fragments : 56783755 ||
      || Successfully assigned fragments : 31088269 (54.7%) ||
      || Running time : 4.13 minutes ||
      || ||
      || Read assignment finished. ||
      || ||
      \\===================== http://subread.sourceforge.net/ ======================//

      Is this right one to count reads mapped to rRNA regions. Could you please help me.

      Thank you
      Last edited by bvk; 04-20-2016, 02:16 AM.

      Comment

      • Michael.Ante
        Senior Member
        • Oct 2011
        • 127

        #4
        There are a couple of annotations for rRNA regions (e.g. the repeat masker) which you can use to count reads in these regions. Nevertheless, rRNA is located in clusters which itself have repeats. Most reads originating from rRNA would map too many times and would be discarded then by Tophat.
        The easiest way of counting reads mapping to the rRNA is to generate a bowtie2 index from the rRNA sequences (28S, 5.8S, 18S, 5S, the 45S pre-ribosomal cluster, and the mt-rRNAs) and map your reads with bowtie2.

        Comment

        • bvk
          Member
          • May 2015
          • 65

          #5
          Actually the samples which I have are rRNA depleted but from the person who did sequencing told me there may be some present still. As I already have gtf and bam files I'm trying to use featureCounts to count the reads mapped to rRNA regions.

          Comment

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            If you use a GTF file that only has genes (and things that you are interested in) then the resulting counts file only has counts for those features. Unless you are actually interested in rRNA biology you can safely ignore the reads that mapped to rRNA past this point.

            BTW: You should look at the contents of counts.txt.summary file or post it here. I suggest that you stop worrying about rRNA (assuming you are not interested in rRNA) and move on with your DE analysis.
            Last edited by GenoMax; 04-20-2016, 03:53 AM.

            Comment

            • bvk
              Member
              • May 2015
              • 65

              #7
              Here it is... the contents of counts.txt.summary file
              Status MCF10A_BaP_pool1_R1_postTrimming/accepted_hits.bam
              Assigned 31088269
              Unassigned_Ambiguity 613410
              Unassigned_MultiMapping 15375382
              Unassigned_NoFeatures 9706694
              Unassigned_Unmapped 0
              Unassigned_MappingQuality 0
              Unassigned_FragmentLength 0
              Unassigned_Chimera 0
              Unassigned_Secondary 0
              Unassigned_Nonjunction 0
              Unassigned_Duplicate 0

              Yes actually Im interested in rRNA biology. I guess the gtf file doesn't have that info. So, how can I get a gtf file with only rRNA regions?

              Comment

              • bvk
                Member
                • May 2015
                • 65

                #8
                Ok. Just now I searched I can get it from UCSC table browser. So now how can I give the arguments and options for featureCounts. As I said before the command which I gave gives fragments.

                I need read pairs mapped to rRNA regions.

                Comment

                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  What does the following command produce when you run it on the GTF file you used?
                  Code:
                  $ grep rRNA your_GTF_file
                  If you don't get anything then there is no annotation for rRNA in your file. I believe only ensembl may have rRNA annotations (UCSC GTF does not, I have mouse ensembl files and there is rRNA there).

                  At this point you have two choices. You can either map your reads against just the rDNA repeat (that I had linked in Biostars thread) or use a different genome build (e.g. Ensembl verify that it has rRNA annotations) and re-do your Tophat alignments.

                  If you are interested in rRNA biology then why were these samples ribo-depleted? You have likely lost most of the interesting data you would have wanted in this process.

                  Comment

                  • GenoMax
                    Senior Member
                    • Feb 2008
                    • 7142

                    #10
                    Originally posted by bvk View Post
                    Ok. Just now I searched I can get it from UCSC table browser. So now how can I give the arguments and options for featureCounts. As I said before the command which I gave gives fragments.

                    I need read pairs mapped to rRNA regions.
                    Fragments are represented by read pairs (reads sample the two ends of the fragment).

                    Comment

                    • bvk
                      Member
                      • May 2015
                      • 65

                      #11
                      But I found a link which is telling how to get rRNA gtf file from UCSC table browser. check this once.


                      Yes, as my samples are ribo depleted the person who did sequencing said me that there me be some rRNA still present. So, they also want the reads which are mapped to rRNA.

                      Can you give me the right information whether it can be done or not. Thank you

                      Comment

                      • bvk
                        Member
                        • May 2015
                        • 65

                        #12
                        Originally posted by GenoMax View Post
                        What does the following command produce when you run it on the GTF file you used?
                        Code:
                        $ grep rRNA your_GTF_file
                        If you don't get anything then there is no annotation for rRNA in your file. I believe only ensembl may have rRNA annotations (UCSC GTF does not, I have mouse ensembl files and there is rRNA there).

                        At this point you have two choices. You can either map your reads against just the rDNA repeat (that I had linked in Biostars thread) or use a different genome build (e.g. Ensembl verify that it has rRNA annotations) and re-do your Tophat alignments.

                        If you are interested in rRNA biology then why were these samples ribo-depleted? You have likely lost most of the interesting data you would have wanted in this process.
                        When I gave that command it didn't give anything. May be that gtf doesn't have any info regarding rRNA?

                        Comment

                        • GenoMax
                          Senior Member
                          • Feb 2008
                          • 7142

                          #13
                          IF the rRNA GTF file you downloaded from UCSC is for the exact genome build of the reference you used (and assuming that reference was also from UCSC) for doing TopHat alignments then you can use that GTF file to count the reads that map to the features defined in the rRNA GTF file.

                          Comment

                          • GenoMax
                            Senior Member
                            • Feb 2008
                            • 7142

                            #14
                            Originally posted by bvk View Post
                            When I gave that command it didn't give anything. May be that gtf doesn't have any info regarding rRNA?
                            Your original GTF file does not have any information about rRNA. We have confirmed this now.

                            Comment

                            • bvk
                              Member
                              • May 2015
                              • 65

                              #15
                              Originally posted by GenoMax View Post
                              IF the GTF file you downloaded from UCSC is for the exact genome build of the reference you used (and assuming that reference was also from UCSC) for doing TopHat alignments then you can use that GTF file to count the reads that map to the features defined in the rRNA GTF file.
                              Ok. But the featureCounts command is the same as given before? or it should be changed?

                              Comment

                              Latest Articles

                              Collapse

                              • SEQadmin2
                                Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                                by SEQadmin2


                                I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.


                                Here are nine questions we think about, in roughly the order they matter, before...
                                06-18-2026, 07:11 AM
                              • SEQadmin2
                                From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                by SEQadmin2


                                Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                ...
                                06-02-2026, 10:05 AM
                              • SEQadmin2
                                Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                                by SEQadmin2


                                With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                                Introduction

                                Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                                05-22-2026, 06:42 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, 06-17-2026, 06:09 AM
                              0 responses
                              21 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-09-2026, 11:58 AM
                              0 responses
                              38 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-05-2026, 10:09 AM
                              0 responses
                              45 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-04-2026, 08:59 AM
                              0 responses
                              49 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...