  • Clip adapter Hisat2

    Hi,

    I am analyzing reads obtained from ribosome profiling experiments.
    I first need to clip the adapter before mapping my reads.
    This step, however, is very time-consuming with fastx_clipper.
    I am wondering if there is any other way to do it faster; doing it directly in hisat2, for instance, would be awesome.
    Thanks for your advice,

    G

  • #2
    Look at BBDuk.sh from BBMap. It should be intuitive to use and fast. You would want to process paired-end data files together if you have that kind of data.
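As a rough illustration of the suggestion above, here is a minimal Python sketch that only assembles a `bbduk.sh` command line for 3' adapter trimming. The file names are hypothetical placeholders, and the `k`/`mink`/`hdist` values are common starting points rather than anything stated in this thread; `k` must not exceed the adapter length, so check everything against the `bbduk.sh` help output before use.

```python
def bbduk_command(reads_in, reads_out, adapters_fasta,
                  mate_in=None, mate_out=None, k=23, mink=11, hdist=1):
    """Build an argument list for right-end (3') adapter trimming with bbduk.sh.

    All file names are supplied by the caller (hypothetical in this sketch).
    """
    if (mate_in is None) != (mate_out is None):
        raise ValueError("mate_in and mate_out must be given together")
    cmd = ["bbduk.sh"]
    if mate_in is None:
        # Single-end data: one input, one output
        cmd += [f"in={reads_in}", f"out={reads_out}"]
    else:
        # Paired-end data: process both files together so pairs stay in sync
        cmd += [f"in1={reads_in}", f"in2={mate_in}",
                f"out1={reads_out}", f"out2={mate_out}"]
    # ktrim=r trims the adapter match and everything to its right
    cmd += [f"ref={adapters_fasta}", "ktrim=r",
            f"k={k}", f"mink={mink}", f"hdist={hdist}"]
    return cmd
```

The returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where BBTools is installed.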


    • #3
      Thanks, GenoMax.
      I saw this interesting post before posting. Before trying it, I was just wondering whether hisat2 has this function natively, since I saw it can trim and 'soft clip' -- which I thought was similar to clipping adapters.


      • #4
        Soft-clipping won't actually remove data. In that sense it is not the same thing as clipping adapter sequences using a dedicated trimming program.
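To make the distinction concrete, here is a small Python sketch. It relies on the SAM convention that soft-clipped bases (CIGAR operation `S`) remain in the record's SEQ field. The example read and CIGAR string are made up for illustration.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_lengths(cigar):
    """Return (leading, trailing) soft-clipped base counts from a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips and are absent from SEQ; skip them
    core = [o for o in ops if o[1] != "H"]
    lead = core[0][0] if core and core[0][1] == "S" else 0
    trail = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return lead, trail

def aligned_part(seq, cigar):
    """Return the part of SEQ that actually takes part in the alignment."""
    lead, trail = soft_clipped_lengths(cigar)
    return seq[lead:len(seq) - trail]

# Hypothetical record: 30 nt footprint followed by 10 nt of adapter
seq = "A" * 30 + "CTGTAGGCAC"
cigar = "30M10S"
print(len(seq))                      # the stored read still has all 40 bases
print(len(aligned_part(seq, cigar))) # only 30 of them are aligned
```

The adapter bases are still in the output file; the aligner has only marked them as not aligned, which is why a dedicated trimming step is needed if clean sequence files are the goal.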


        • #5
          Thanks a lot for the link. So it seems that I need to clip in an additional step before mapping with hisat2. I am going to use BBDuk.sh, thanks GenoMax!


          • #6
            You don't have to trim but if you need clean sequence files then a pass through the trimming program would keep that data available.


            • #7
              But if I do not clip the adapters, mapping will be biased by the adapter sequence, won't it?


              • #8
                If the adapter contamination is short/minimal, then the aligner should be able to manage, but if you know you have short inserts, adapter dimers, etc., then it would be best to trim independently. I like to pass all data through a trimming program. If there is no contamination, then the only thing invested is a bit of time.
                Last edited by GenoMax; 02-19-2016, 09:10 AM.


                • #9
                  My adapter is CTGTAGGCACCATCAAT -- quite long, I think. My reads are about 30 nt after clipping, and I need perfect mapping (no mismatches), so I think clipping is necessary here. I am trying the software you recommended, thanks GenoMax!
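For readers who want to see what clipping this adapter amounts to, here is a minimal Python sketch. It is not how BBDuk or fastx_clipper work internally: it matches the adapter exactly, with no tolerance for sequencing errors, and the `min_overlap` cutoff is an arbitrary assumption.

```python
ADAPTER = "CTGTAGGCACCATCAAT"  # adapter quoted in the post above

def clip_3prime_adapter(read, adapter=ADAPTER, min_overlap=5):
    """Remove an exact 3' adapter match, full-length or truncated at the read end."""
    # Case 1: the whole adapter is inside the read; cut at its first occurrence
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: only the start of the adapter fits before the read ends;
    # try the longest possible overlap first
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    # No adapter found: return the read unchanged
    return read
```

For example, a hypothetical 30 nt footprint followed by the full adapter, or by only its first 8 bases, comes back as the bare 30 nt footprint. A dedicated trimmer adds mismatch tolerance and speed, which is why it is preferable in practice.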


                  • #10
                    What was the original read length (if post clip is 30 bp)? Is this miRNA data?


                    • #11
                      After clipping, the read length is around 30 nt. This is ribo-seq data (ribosomal footprints: sequencing of the RNA fragments covered by the ribosome).


                      • #12
                        If you need perfect mapping, then absolutely, adapter-trimming is crucial. In general, requiring perfect mapping will incur sequence-dependent bias (as sequencing error rates are sequence-dependent), but that's more of an issue with long reads and may not matter with 30bp reads. Still, it also might matter since ribosomal sequences are typically low-diversity which makes them especially susceptible to sequence-dependent errors.

                        So... why are you requiring perfect mapping?
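The read-length point can be illustrated with a back-of-the-envelope calculation. The Python sketch below assumes a single, independent per-base error rate, which is a deliberate simplification (the post notes that real error rates are sequence-dependent), and the error rates used are hypothetical.

```python
def perfect_read_fraction(per_base_error, read_length):
    """Fraction of reads expected to contain no sequencing error at all,
    assuming independent errors at a constant per-base rate."""
    return (1.0 - per_base_error) ** read_length

# Hypothetical 1% per-base error rate
short_reads = perfect_read_fraction(0.01, 30)    # 30 nt footprints
long_reads = perfect_read_fraction(0.01, 150)    # 150 nt reads

# Two hypothetical sequence contexts with different error rates, same length:
# the error-prone context loses more reads to a perfect-match filter
easy_context = perfect_read_fraction(0.005, 30)
hard_context = perfect_read_fraction(0.02, 30)
```

Under these assumptions roughly three quarters of 30 nt reads survive a perfect-match filter, against roughly a fifth of 150 nt reads. The second pair shows where the bias comes from: a sequence context with a higher error rate is under-counted relative to one with a lower rate.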


                        • #13
                          Thanks Brian.
                          I am not very familiar with NGS data analysis, so I tried to apply the exact protocol described in the original paper. Ribosome profiling is a technique to track translation pausing (Ingolia 2009). In effect, we freeze translation at a time t and digest the uncovered messenger RNA. This leaves only the footprints of the ribosome -- the parts of the messenger covered by the ribosome. These footprints are sequenced, and I use the SRA data from this sequencing.
                          In the original method introduced by Ingolia et al. 2009, they clipped the adapter, mapped to the genome assembly and kept only reads with a perfect match (retaining only reads with NM tag = 0).

                          That is why I tried to follow the original protocol closely. I have only switched to hisat2 because I found bowtie2 and tophat rather slow.
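The "NM tag = 0" filter mentioned above can be sketched in a few lines of Python operating on SAM text. This is an illustration of the filter as described in the post, not the code used in the original paper. Note also that NM counts mismatches and indels within the aligned portion only, so a soft-clipped read can still have NM = 0.

```python
def edit_distance(sam_line):
    """Return the NM:i value of a SAM alignment line, or None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start after the 11 mandatory SAM columns
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None

def keep_perfect(sam_lines):
    """Yield header lines and mapped alignments whose NM tag equals 0."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header so the output is still valid SAM
            continue
        flag = int(line.split("\t", 2)[1])
        if flag & 0x4:
            continue    # read is unmapped
        if edit_distance(line) == 0:
            yield line
```

A typical use would be to stream an aligner's SAM output through `keep_perfect` line by line and write the surviving lines to a new file.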


                          • #14
                            You should add BBMap alignment as well. I wonder what fraction of your reads would be straight alignment and what fraction would have a splice site, with just 30 nt to work with. @Brian may have a suggestion about parameters to use with BBMap.


                            • #15
                              Originally posted by GenoMax
                              @Brian may have a suggestion about parameters to use with BBMap.
                              Normally, I use the defaults. But for 30bp ribosomal reads, you could add "maxindel=10" (just a random small number I picked). Searching for long indels (which BBMap does by default) is not necessary when aligning to ribosomes (which as far as I know are never spliced); it decreases both speed and sensitivity. BBMap does have a "perfectmode" flag which allows only perfect alignments, but I do not really think it is appropriate in this case (or most situations, especially those involving quantification).

                              There are a lot of papers written by people who do not fully understand all aspects of what they are doing - who can, these days, in any paper that is not purely theoretical? Often people try to make choices they think are safer and more conservative, overriding the suggested defaults, to minimize risk of a paper being rejected because something was hard to describe or explain. Particularly, in bioinformatics, it is common for people to throw out all reads with any mismatches, or quality-trim to Q30 prior to mapping, etc. These are almost never good ideas! They are typically devised by biologists on the assumption that "My data has variable quality, and is annotated with its actual quality. Therefore, if I throw away low-quality data, my results will be strictly better."

                              This is absolutely wrong, as it relies on a lot of implicit assumptions (that quality is unrelated to sequence, that quality scores are correct, that trimming low-quality bases yields better mapping, that differences between a read and the reference are due to errors, etc) which may seem obvious, but are false.

                              I am not trying to slam biologists here - they are experts in their field. It's just important to understand that being an expert in biology does not make one also an expert in statistics, or photonics, or any of the other numerous areas that go into bioinformatics. So, bioinformatics papers written, reviewed, and published solely by biologists will often have subtle errors in the non-biological part of the methodology - as in this case.
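The BBMap settings discussed in this post could be assembled as follows. This is a sketch only: the file names are hypothetical placeholders, `maxindel=10` is the arbitrary small value suggested above, and `perfectmode` is off by default in the sketch because the post advises against it.

```python
def bbmap_command(reads_in, sam_out, reference_fasta, maxindel=10, perfect=False):
    """Build an argument list for bbmap.sh with a restricted indel search.

    File names are supplied by the caller (hypothetical in this sketch).
    """
    cmd = [
        "bbmap.sh",
        f"in={reads_in}",
        f"out={sam_out}",
        f"ref={reference_fasta}",
        f"maxindel={maxindel}",  # avoid searching for long, splice-sized indels
    ]
    if perfect:
        # Allow only perfect alignments; discouraged for quantification in the post
        cmd.append("perfectmode=t")
    return cmd
```

Running the same trimmed reads with and without `perfect=True` and comparing the mapped fractions would be one way to see how much data the perfect-match requirement discards.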
