
  • miRNA reads alignment with bowtie2

    Hi everyone,

    I'm working on miRNA data right now (a very new field for me). I'm trying to keep only those reads that match miRBase sequences. Here is my procedure:
    1. Align all reads with bowtie2 (default settings) to the reference genome index. This gives a 92.23% overall alignment rate, which looks good so far.
    2. Build a new index from the miRBase hairpin data (with U-to-T conversion) and align the reads to this index with bowtie2. This alignment gives only a 5.17% overall alignment rate.
    I've also checked the first alignment in a genome browser with miRNA annotations, which clearly shows that reads rarely align to mRNA, so it should be fine.

    Do you have any idea how to solve this problem? Is it common for only ~5% of reads to align to miRNA sequences when the data come from miRNA sequencing?
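The U-to-T conversion mentioned in the procedure can be sketched as follows. This is a minimal example, not the poster's actual script: the function name `rna_fasta_to_dna` and the toy record are made up for illustration. It converts only sequence lines, so a "u" in a header (for example in a species name) is left alone.

```python
from typing import Iterable, Iterator


def rna_fasta_to_dna(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with U/u replaced by T/t in sequence lines only."""
    table = str.maketrans({"U": "T", "u": "t"})
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Header line: keep as-is so record names are not altered.
            yield line
        else:
            yield line.translate(table)


# Toy record (hypothetical name, illustrative sequence).
toy_fasta = [">toy-mir-1 Mus musculus toy stem-loop", "UGAGGUAGUAGGUUGUAUAGUU"]
for out_line in rna_fasta_to_dna(toy_fasta):
    print(out_line)
```

The converted lines can then be written to a new FASTA file and used to build the aligner index.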

  • #2
    For miRNA data, I would suggest using bowtie instead of bowtie2, since the authors say bowtie is more sensitive for reads shorter than 50 bp.

    Also, I would suggest building a database from Rfam and NCBI/Ensembl's known non-coding RNAs (mostly rRNA, tRNA, snoRNA, etc.). Map the reads against this first and you'll see how much contamination you have in the data.

    • #3
      Don't use Bowtie1 unless you've checked it with simulated reads yourself. With perfect 20nt reads Bowtie1 has a 15% error rate compared to Bowtie2 (~2%)
      Alignment of microRNA to the genome poses a particular challenge because the reads are short, and some microRNAs are nearly identical.

      • #4
        Originally posted by mziemann
        Don't use Bowtie1 unless you've checked it with simulated reads yourself. With perfect 20nt reads Bowtie1 has a 15% error rate compared to Bowtie2 (~2%)
        http://genomespot.blogspot.com.au/20...-compared.html
        Thanks for the input. So bowtie1 has the highest mapping rate but also a significantly higher rate of incorrect alignments. Sounds like a terrible tradeoff.

        • #5
          I did some research earlier and found opinions that bowtie1 is no better than bowtie2 for miRNA data. So I'm convinced that bowtie2 is the right choice here, but I still have this problem.

          • #6
            Regarding the original question, it's really puzzling that your alignment to miRBase gave such a low alignment rate. Did you align to mature.fa or hairpin.fa? I would expect a higher alignment rate for hairpin.fa because it will capture most isomiRs. Removing special characters from the FASTA headers might help too.
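One way the header clean-up could look is sketched below. It assumes headers in the usual miRBase style (name, accession, species, description separated by spaces) and keeps only the first token, replacing anything outside letters, digits, `_`, `.` and `-` with an underscore. The function name `simplify_fasta_headers` and the choice of allowed characters are illustrative assumptions, not something prescribed by miRBase or bowtie2.

```python
import re
from typing import Iterable, Iterator


def simplify_fasta_headers(lines: Iterable[str]) -> Iterator[str]:
    """Reduce each FASTA header to a single sanitized name token."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            # Replace any character outside a conservative whitelist.
            name = re.sub(r"[^A-Za-z0-9_.-]", "_", name)
            yield ">" + name
        else:
            yield line


example = [">hsa-let-7a-1 MI0000060 Homo sapiens let-7a-1 stem-loop", "TGGGATGAGG"]
print(list(simplify_fasta_headers(example)))
```

Keeping the first token preserves the miRNA name, which is what most downstream counting steps need.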

            • #7
              Yes, I've also aligned the reads to mature.fa, but that gives an overall alignment rate of only 4.39%. As you suggested, I'll try removing special characters from those files.

              • #8
                Still the same overall alignment rate. Does anyone have suggestions for bowtie2 parameters for this task?

                • #9
                  One explanation could be contamination, as yueluo suggested. Use featureCounts or HTSeq to see where the reads are landing in the reference genome. If they are mostly in mRNA and rRNA, then you have a contamination issue.

                  • #10
                    As you suggested, I used featureCounts to check where the reads land, and the result is more or less the same (4.6%). So it is a contamination issue, but I will still try to get some valuable data out of it. I guess I can gain an additional 1-2% by searching for novel miRNAs.

                    Thank you, yueluo and mziemann, for your interest and help.

                    But I still wonder what the typical overall alignment rate, or miRNA content, is in miRNA sequencing data. Can anyone provide such information?

                    • #11
                      Originally posted by mziemann
                      Don't use Bowtie1 unless you've checked it with simulated reads yourself. With perfect 20nt reads Bowtie1 has a 15% error rate compared to Bowtie2 (~2%)
                      http://genomespot.blogspot.com.au/20...-compared.html
                      Hi, I read this great blog regularly. I was, however, a little puzzled by this analysis. My concern is what is meant by incorrectly and correctly mapped reads. If I got it right, the sequences for alignment are randomly cut out of miRNA hairpins and mapped back against the reference genome. A possible problem is that, by chance, some of these sequences may match other loci as well. For 100% specificity, the aligner can discard such multi-mappers and thus seem very accurate, while for 100% sensitivity it could just report mappings at all corresponding loci. Thus, the large differences between the aligners may be due to different ways (default parameters) of assigning/reporting multi-mappers and not to actual errors. What are your thoughts on this?
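The multi-mapper concern can be illustrated with a small sketch that checks how many reference sequences contain a simulated read exactly. Everything here is a toy assumption: the function name `loci_matching`, the reference names and the sequences are invented, only the forward strand is searched, and a real aligner would also consider mismatches.

```python
from typing import Dict, List


def loci_matching(read: str, references: Dict[str, str]) -> List[str]:
    """Return the sorted names of references that contain `read` exactly."""
    return sorted(name for name, seq in references.items() if read in seq)


# Two toy "hairpins" share the same embedded sequence; a third does not.
toy_refs = {
    "toy-mir-a-1": "ACGTTGAGGTAGTAGGTTGTATAGTTCCGA",
    "toy-mir-a-2": "GGCATGAGGTAGTAGGTTGTATAGTTAAGT",
    "toy-mir-b": "TTGACCGTACGGATCCATTGACGGTACAAT",
}
simulated_read = "TGAGGTAGTAGGTTGTATAGTT"

hits = loci_matching(simulated_read, toy_refs)
print(hits, "-> multi-mapper" if len(hits) > 1 else "-> unique")
```

A read cut from "toy-mir-a-1" matches "toy-mir-a-2" equally well, so whether it counts as correctly placed depends on how the aligner reports or discards such ties rather than on an alignment error.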

                      • #12
                        @Ahaswer
                        "But I still wonder what the typical overall alignment rate, or miRNA content, is in miRNA sequencing data. Can anyone provide such information?"

                        As a rule of thumb, your small RNA-seq data should be >75% microRNA. If it is lower than that, there is a contamination issue, which can arise in three ways: (1) lots of adapter-only reads, (2) degradation of mRNAs/rRNAs into small fragments that swamp the miRNAs, or (3) size selection of the miRNA-containing fragments went wrong.
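A quick way to look for the three contamination patterns above is a read-length histogram of the adapter-trimmed reads. The sketch below assumes a plain four-line-per-record FASTQ and uses a made-up function name (`read_length_histogram`) and toy records. As a rough guide, mature miRNAs are about 22 nt long, so a clean library would be expected to peak near that length, while many zero-length or very short reads hint at adapter-only products and a broad smear hints at degradation fragments or poor size selection.

```python
from collections import Counter
from typing import Iterable


def read_length_histogram(fastq_lines: Iterable[str]) -> Counter:
    """Count read lengths, assuming four lines per FASTQ record."""
    lengths = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each record holds the sequence
            lengths[len(line.strip())] += 1
    return lengths


# Toy trimmed reads: one miRNA-sized read, one empty (adapter-only), one short fragment.
toy_fastq = [
    "@r1", "TGAGGTAGTAGGTTGTATAGTT", "+", "I" * 22,
    "@r2", "", "+", "",
    "@r3", "ACGT", "+", "IIII",
]
hist = read_length_histogram(toy_fastq)
for length in sorted(hist):
    print(length, hist[length])
```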

                        @peterawe
                        Regarding my analysis, I use the featureCounts "-R" option, which reports the feature each read is assigned to. I purposely use a mapping quality threshold of 20 to remove multimappers. Reads that map back to the correct microRNA gene (mapq>=20) are "correct" and reads that map to some other part of the genome (mapq>=20) are "incorrect". Reads that are unassigned because of the mapq threshold are neither.

                        The % correct/incorrect proportions are relative to the starting number of reads, so they are an accurate way of showing the performance. One thing I didn't explore was whether mapq20 was really the right threshold or not. Who knows, maybe bowtie1 is really good at mapq30.
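The classification rule described above can be written down as a small sketch. This is a reconstruction from the description, not the script used for the blog analysis: the record layout `(true_gene, assigned_feature, mapq)`, the function names and the toy values are all assumptions, and in practice the assigned feature would be parsed from the per-read featureCounts output.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Record = Tuple[str, Optional[str], int]  # (true gene, assigned feature or None, MAPQ)


def classify_reads(records: Iterable[Record], min_mapq: int = 20) -> Counter:
    """Tally simulated reads as correct, incorrect or unassigned."""
    counts = Counter()
    for true_gene, assigned_feature, mapq in records:
        if mapq < min_mapq:
            counts["unassigned"] += 1  # filtered out by the MAPQ threshold
        elif assigned_feature == true_gene:
            counts["correct"] += 1
        else:
            counts["incorrect"] += 1  # confidently placed somewhere else
    return counts


def proportions(counts: Counter, total_reads: int) -> Dict[str, float]:
    """Express each class relative to the starting number of reads."""
    return {k: counts[k] / total_reads for k in ("correct", "incorrect", "unassigned")}


toy_records = [
    ("mir-1", "mir-1", 42),  # back on its gene of origin
    ("mir-1", "mir-2", 30),  # confident, but on the wrong gene
    ("mir-3", "mir-3", 1),   # right gene, but below the threshold
    ("mir-4", None, 40),     # confident, but outside any miRNA feature
]
print(proportions(classify_reads(toy_records), total_reads=len(toy_records)))
```

Changing `min_mapq` makes it easy to see how sensitive the correct/incorrect split is to the threshold.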

                        • #13
                          Thanks for your reply, I think this may explain the difference. Running bowtie with default parameters returns a MAPQ of 255 for all alignable reads, regardless of multi-mapping. Thus, this part of the bowtie output is not very informative for downstream filtering.

                          @Ahaswer
                          miRNA content depends on the biological sample, RNA quality, library preparation protocol and accuracy of gel cutting. Hoen et al. did smRNA-seq on 465 lymphoblastoid cell lines and reported:
                          "...the relative miRNA content in our samples ranged from 2% to 62% of mapped reads, with a median of 19%..."
                          Hoen (2013) Nat Biotech. I guess this was an automated protocol; with manual preparations we routinely get over 50%.

                          • #14
                            Thanks, everybody, for your replies. That satisfies my curiosity, especially those papers.
                            I've also checked the aligned read count for each miRNA sequence, based on miRBase annotations, which ranges from 5 to 20 000 reads per sequence. I guess that, despite the low overall alignment rate mentioned earlier, I can still work with this dataset.

                            • #15
                              Since this was never mentioned in the thread, are the reads being adapter-trimmed prior to alignment? If not, that would cause a bias toward long rather than short sequences, and a low alignment rate.
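To make the trimming step concrete, here is a deliberately simplified sketch of 3' adapter removal by exact matching. The adapter shown is the Illumina TruSeq Small RNA 3' adapter, which is an assumption since the thread never says which kit was used; the function name and the `min_overlap` default are also illustrative. Dedicated trimming tools tolerate mismatches and should be preferred for real data.

```python
# Assumed adapter: Illumina TruSeq Small RNA 3' adapter (kit not stated in the thread).
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def trim_3prime_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 8) -> str:
    """Remove an exact 3' adapter match, including an adapter cut off by the read end."""
    for start in range(len(read) - min_overlap + 1):
        tail = read[start:]
        # Either the whole adapter sits here, or the read ends partway through it.
        if tail.startswith(adapter) or adapter.startswith(tail):
            return read[:start]
    return read


insert = "TGAGGTAGTAGGTTGTATAGTT"   # toy 22 nt insert
raw_read = insert + ADAPTER[:14]     # read runs into the adapter, as short inserts do
print(trim_3prime_adapter(raw_read))
print(repr(trim_3prime_adapter(ADAPTER)))  # adapter-only read trims to an empty string
```

Untrimmed, the toy read above carries 14 adapter bases that match nothing in a miRBase or genome index, which is the kind of read that drags the alignment rate down.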
