Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • how to get the count of reads mapped to rRNA regions?

    Hi all,

    I just mapped the RNA-Seqs by Tophat and got the output file accepted_hit.bam. I also have gtf file. I want to count the reads mapping to rRNA regions and also other. I hope featurecounts can be used. But could anyone please tell me how to give the options and different parameters.

    Any reply will be appreciated.

    Thank you very much.

  • #2
    For reference cross-posted (and mostly answered): https://www.biostars.org/p/187338

    @bvk: Please check the links for featureCounts page posted and read featureCounts manual starting at page #27.

    Comment


    • #3
      Hi,

      As you said I had a look in the manual.

      featureCounts -p -t exon -g gene_id -a hg19_RefSeq.gtf -o counts.txt accepted_hits.bam

      This Summarize paired-end reads and count fragments (instead of reads).

      This is the output I got.

      //========================== featureCounts setting ===========================\\
      || ||
      || Input files : 1 BAM file ||
      || P MCF10A_BaP_pool1_R1_postTrimming/accepted_ ... ||
      || ||
      || Output file : counts.txt ||
      || Annotations : hg19_RefSeq.gtf (GTF) ||
      || ||
      || Threads : 1 ||
      || Level : meta-feature level ||
      || Paired-end : yes ||
      || Strand specific : no ||
      || Multimapping reads : not counted ||
      || Multi-overlapping reads : not counted ||
      || Read orientations : fr ||
      || ||
      || Chimeric reads : counted ||
      || Both ends mapped : not required ||
      || ||
      \\===================== http://subread.sourceforge.net/ ======================//

      //================================= Running ==================================\\
      || ||
      || Load annotation file hg19_RefSeq.gtf ... ||
      || Features : 432390 ||
      || Meta-features : 24019 ||
      || Chromosomes/contigs : 24 ||
      || ||
      || Process BAM file MCF10A_BaP_pool1_R1_postTrimming/accepted_hits.bam... ||
      || Paired-end reads are included. ||
      || Assign fragments (read pairs) to features... ||
      || ||
      || WARNING: reads from the same pair were found not adjacent to each ||
      || other in the input (due to read sorting by location or ||
      || reporting of multi-mapping read pairs). ||
      || ||
      || Read re-ordering is performed. ||
      || ||
      || Total fragments : 56783755 ||
      || Successfully assigned fragments : 31088269 (54.7%) ||
      || Running time : 4.13 minutes ||
      || ||
      || Read assignment finished. ||
      || ||
      \\===================== http://subread.sourceforge.net/ ======================//

      Is this right one to count reads mapped to rRNA regions. Could you please help me.

      Thank you
      Last edited by bvk; 04-20-2016, 02:16 AM.

      Comment


      • #4
        There are a couple of annotations for rRNA regions (e.g. the repeat masker) which you can use to count reads in these regions. Nevertheless, rRNA is located in clusters which itself have repeats. Most reads originating from rRNA would map too many times and would be discarded then by Tophat.
        The easiest way of counting reads mapping to the rRNA is to generate a bowtie2 index from the rRNA sequences (28S, 5.8S, 18S, 5S, the 45S pre-ribosomal cluster, and the mt-rRNAs) and map your reads with bowtie2.

        Comment


        • #5
          Actually the samples which I have are rRNA depleted but from the person who did sequencing told me there may be some present still. As I already have gtf and bam files I'm trying to use featureCounts to count the reads mapped to rRNA regions.

          Comment


          • #6
            If you use a GTF file that only has genes (and things that you are interested in) then the resulting counts file only has counts for those features. Unless you are actually interested in rRNA biology you can safely ignore the reads that mapped to rRNA past this point.

            BTW: You should look at the contents of counts.txt.summary file or post it here. I suggest that you stop worrying about rRNA (assuming you are not interested in rRNA) and move on with your DE analysis.
            Last edited by GenoMax; 04-20-2016, 03:53 AM.

            Comment


            • #7
              Here it is... the contents of counts.txt.summary file
              Status MCF10A_BaP_pool1_R1_postTrimming/accepted_hits.bam
              Assigned 31088269
              Unassigned_Ambiguity 613410
              Unassigned_MultiMapping 15375382
              Unassigned_NoFeatures 9706694
              Unassigned_Unmapped 0
              Unassigned_MappingQuality 0
              Unassigned_FragmentLength 0
              Unassigned_Chimera 0
              Unassigned_Secondary 0
              Unassigned_Nonjunction 0
              Unassigned_Duplicate 0

              Yes actually Im interested in rRNA biology. I guess the gtf file doesn't have that info. So, how can I get a gtf file with only rRNA regions?

              Comment


              • #8
                Ok. Just now I searched I can get it from UCSC table browser. So now how can I give the arguments and options for featureCounts. As I said before the command which I gave gives fragments.

                I need read pairs mapped to rRNA regions.

                Comment


                • #9
                  What does the following command produce when you run it on the GTF file you used?
                  Code:
                  $ grep rRNA your_GTF_file
                  If you don't get anything then there is no annotation for rRNA in your file. I believe only ensembl may have rRNA annotations (UCSC GTF does not, I have mouse ensembl files and there is rRNA there).

                  At this point you have two choices. You can either map your reads against just the rDNA repeat (that I had linked in Biostars thread) or use a different genome build (e.g. Ensembl verify that it has rRNA annotations) and re-do your Tophat alignments.

                  If you are interested in rRNA biology then why were these samples ribo-depleted? You have likely lost most of the interesting data you would have wanted in this process.

                  Comment


                  • #10
                    Originally posted by bvk View Post
                    Ok. Just now I searched I can get it from UCSC table browser. So now how can I give the arguments and options for featureCounts. As I said before the command which I gave gives fragments.

                    I need read pairs mapped to rRNA regions.
                    Fragments are represented by read pairs (reads sample the two ends of the fragment).

                    Comment


                    • #11
                      But I found a link which is telling how to get rRNA gtf file from UCSC table browser. check this once.


                      Yes, as my samples are ribo depleted the person who did sequencing said me that there me be some rRNA still present. So, they also want the reads which are mapped to rRNA.

                      Can you give me the right information whether it can be done or not. Thank you

                      Comment


                      • #12
                        Originally posted by GenoMax View Post
                        What does the following command produce when you run it on the GTF file you used?
                        Code:
                        $ grep rRNA your_GTF_file
                        If you don't get anything then there is no annotation for rRNA in your file. I believe only ensembl may have rRNA annotations (UCSC GTF does not, I have mouse ensembl files and there is rRNA there).

                        At this point you have two choices. You can either map your reads against just the rDNA repeat (that I had linked in Biostars thread) or use a different genome build (e.g. Ensembl verify that it has rRNA annotations) and re-do your Tophat alignments.

                        If you are interested in rRNA biology then why were these samples ribo-depleted? You have likely lost most of the interesting data you would have wanted in this process.
                        When I gave that command it didn't give anything. May be that gtf doesn't have any info regarding rRNA?

                        Comment


                        • #13
                          IF the rRNA GTF file you downloaded from UCSC is for the exact genome build of the reference you used (and assuming that reference was also from UCSC) for doing TopHat alignments then you can use that GTF file to count the reads that map to the features defined in the rRNA GTF file.

                          Comment


                          • #14
                            Originally posted by bvk View Post
                            When I gave that command it didn't give anything. May be that gtf doesn't have any info regarding rRNA?
                            Your original GTF file does not have any information about rRNA. We have confirmed this now.

                            Comment


                            • #15
                              Originally posted by GenoMax View Post
                              IF the GTF file you downloaded from UCSC is for the exact genome build of the reference you used (and assuming that reference was also from UCSC) for doing TopHat alignments then you can use that GTF file to count the reads that map to the features defined in the rRNA GTF file.
                              Ok. But the featureCounts command is the same as given before? or it should be changed?

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:37 PM
                              0 responses
                              7 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 06:07 PM
                              0 responses
                              7 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              49 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              66 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X