  • miRNA Illumina sequencing - low alignment rate

    Dear Colleagues,
    I want to share a strange experience with you and ask for your opinions and help.

    I'm working on miRNA sequencing for an uncommon plant. I received the data from a service company, which gave me a FASTQ file.
    I have already done RNA-seq analysis, so I'm quite familiar with tools such as FastQC, Trimmomatic, Bowtie2, cuffdiff, etc.

    I removed the 3' and 5' adapters, whose sequences were provided by the service company. Quality control confirmed that the adapter sequences were correct. I used cutadapt to remove the adapters. I see strong peaks between 19 and 39 bp, plus some reads between 39 and 51 bp (the original read length, with adapters attached).

    I downloaded the hairpin.fa file from miRBase without filtering for a specific organism, changed all U to T, and removed lines with strange characters (Y, K, etc.).
    First strange thing: the alignment rate is very low, about 3%!

    So I did the alignment again, this time against the A. thaliana genome. The alignment rate increased to 20%.
    Second strange thing: when I run htseq-count to count the alignments, I get 0 for all miRNAs!

    I'm sure I'm making a mistake in one of the analysis steps... can someone help me?

    Thanks in advance
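
The hairpin.fa preparation described in the post above (U to T, dropping entries with ambiguous bases) could be sketched roughly as follows. This is a minimal sketch, not the poster's actual commands. It assumes the sequences may be wrapped over several lines, so it drops whole records rather than single lines, which would otherwise leave truncated hairpins behind. The optional `species_prefix` argument and the record names in the demo are invented for illustration; miRBase identifiers do start with a species code such as `ath-`.

```python
import io

VALID_BASES = set("ACGT")

def read_fasta(handle):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def prepare_hairpins(handle, species_prefix=None):
    """Return (header, DNA sequence) pairs with U converted to T.

    Records that still contain anything other than A, C, G, T are
    dropped as a whole (header and sequence together).
    species_prefix (hypothetical option) keeps only matching IDs.
    """
    kept = []
    for header, seq in read_fasta(handle):
        if species_prefix and not header.startswith(species_prefix):
            continue
        dna = seq.upper().replace("U", "T")
        if set(dna) <= VALID_BASES:
            kept.append((header, dna))
    return kept

# Invented records, only to exercise the function.
demo = io.StringIO(
    ">ath-MIR000 example-1\nUGACAGAAGAGAGUGAGCAC\nGGAUUC\n"
    ">xxx-mir-000 example-2\nUGAYAGKAGA\n"
    ">osa-MIR000 example-3\nacguacgu\n"
)
records = prepare_hairpins(demo)
for header, seq in records:
    print(f">{header}\n{seq}")
```

With a real file, `prepare_hairpins(open("hairpin.fa"))` would take the place of the demo handle, and the result can be written back out as FASTA before building the aligner index.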

  • #2
    Can anyone help me? Please!


    • #3
      The Illumina miRNA library kit is known to display ligation bias. There is probably something wrong with your library.

      Sequencing bias of small RNAs partially influenced which microRNAs have been studied in depth; therefore most previous small RNA profiling experiments should be re-evaluated. New microRNAs are likely to be found, which were selected against by existing adapters. Preference of currently used adapters …


      • #4
        Originally posted by NextGenSeq
        The Illumina miRNA library kit is known to display ligation bias. There is probably something wrong with your library.

        http://www.ncbi.nlm.nih.gov/pubmed/22647250
        Thank you very much for your help!
        So, what is your suggestion? How should I proceed to remove or reduce ligation bias?

        Thank you!


        • #5
          If anyone wants, I can attach the FastQC files after 3' adapter trimming, to give a better overview of my strange situation. I hope someone can help me, because I have run out of ideas on how to solve this problem.

          I tried adapter trimming with: Trimmomatic, cutadapt, fastx_clipper, novoalign, etc.
          I tried mapping with: bowtie, bowtie2, miRDeep2, etc.
          For now I only want to know whether there are any known miRNAs...

          The only thing I have not tried is BLAST.

          Please help!


          • #6
            Originally posted by wynstep
            Thank you very much for your help!
            So, what is your suggestion? How should I proceed to remove or reduce ligation bias?

            Thank you!
            The paper at the link describes how to reduce ligation bias.

            The Bioo Small RNA kit uses this method for Illumina platforms.

            Ion Torrent has used that method for a couple years for the PGM and Proton sequencers.


            • #7
              This seems quite normal to me, since the major population of sRNAs in plants (such as A. thaliana) is not miRNAs but siRNAs (a mixture of 21-, 22- and 24-mers), which are not well conserved and are arranged in clusters along the genome. I guess that your mapped miRNAs include miR390, miR168 and others, since these are well conserved and present in A. thaliana, along with particular clusters of siRNAs that are also conserved.
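
One way to check the point above against the data is to tabulate read lengths after adapter trimming and see how much of the library falls in the 21, 22 and 24 nt classes. The following is a minimal sketch under stated assumptions: the FASTQ uses plain 4-line records (no wrapped sequences), and the reads in the demo are invented purely to exercise the function.

```python
import io
from collections import Counter

def read_fastq_sequences(handle):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()

def length_fractions(handle):
    """Map read length -> fraction of reads with that length."""
    counts = Counter(len(seq) for seq in read_fastq_sequences(handle))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: counts[length] / total for length in sorted(counts)}

# Invented reads, only to exercise the function.
demo_reads = ["A" * 21, "C" * 21, "G" * 24, "T" * 22]
demo_fastq = "".join(
    f"@read{i}\n{seq}\n+\n{'I' * len(seq)}\n" for i, seq in enumerate(demo_reads)
)
fractions = length_fractions(io.StringIO(demo_fastq))
for length, frac in fractions.items():
    print(f"{length} nt\t{frac:.1%}")
```

On a real trimmed file, a library dominated by 24 nt reads would be consistent with a large siRNA fraction, though the length profile alone does not prove what the reads are.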


              • #8
                Originally posted by NextGenSeq
                The paper at the link describes how to reduce ligation bias.

                The Bioo Small RNA kit uses this method for Illumina platforms.

                Ion Torrent has used that method for a couple years for the PGM and Proton sequencers.
                I've read the paper you suggested, but I didn't find any bioinformatics suggestions on how to treat raw sequencing data "affected" by Illumina adapter ligation bias. Am I missing something important in the paper, or do they focus only on an experimental solution (that is, only on library preparation)?

                Thanks for your help!


                • #9
                  Hi wynstep,
                  I have seen that low-mapping libraries can sometimes be attributed to some sort of artifact product that is taking up many of your reads. If this artifact is present in many of your reads, you should be able to find it with FastQC in the overrepresented sequences section. You will probably need to do this after adapter trimming, as otherwise I think the only overrepresented sequences that will be reported are from the 3' adapter. Also, you may want to just try BLASTing some random sequences from your data to see if you can get an indication of what they represent.
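
Both checks suggested above can also be approximated outside FastQC. The sketch below is not FastQC's algorithm, just a plain count of identical trimmed reads, plus a small random sample written as FASTA that could be pasted into BLAST. It assumes 4-line FASTQ records; the function names, the `read_N` labels and the demo reads are all invented for illustration.

```python
import io
import random
from collections import Counter

def read_fastq_sequences(handle):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()

def top_sequences(seqs, n=10):
    """Return (sequence, count, fraction of all reads) for the n most common reads."""
    counts = Counter(seqs)
    total = sum(counts.values())
    return [(seq, c, c / total) for seq, c in counts.most_common(n)]

def sample_as_fasta(seqs, k, seed=0):
    """Pick up to k distinct sequences at random and format them as FASTA."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    unique = sorted(set(seqs))
    picked = rng.sample(unique, min(k, len(unique)))
    return "".join(f">read_{i}\n{s}\n" for i, s in enumerate(picked, 1))

# Invented reads, only to exercise the functions.
demo_reads = (
    ["ACGTACGTACGTACGTACGTA"] * 3
    + ["TTTTGGGGCCCCAAAATTTTGG"]
    + ["GATTACAGATTACAGATTACAGAT"]
)
demo_fastq = "".join(
    f"@r{i}\n{seq}\n+\n{'I' * len(seq)}\n" for i, seq in enumerate(demo_reads)
)
seqs = list(read_fastq_sequences(io.StringIO(demo_fastq)))

top = top_sequences(seqs, n=2)
for seq, count, frac in top:
    print(f"{seq}\t{count}\t{frac:.1%}")

fasta = sample_as_fasta(seqs, k=2)
print(fasta, end="")
```

If a single sequence accounts for a large share of the trimmed reads, BLASTing that sequence first is probably more informative than a random sample.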
