  • guilhem
    Member
    • Feb 2016
    • 10

    Clip adapter Hisat2

    Hi,

    I am analyzing reads obtained from ribosome profiling experiments.
    I first need to clip the adapter before mapping my reads.
    This step, however, is very time-consuming with fastx_clipper.
    I am wondering whether there is a faster way to do it; doing it directly in hisat2, for instance, would be awesome.
    Thanks for your advice,

    G
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    Look at BBDuk.sh from BBMap. It should be intuitive to use and fast. You would want to process paired-end data files together if you have that kind of data.
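For illustration, here is a minimal sketch of how such a BBDuk call could be assembled from Python. The option names (`in=`, `out=`, `in2=`, `out2=`, `literal=`, `ktrim=r`, `k=`, `mink=`, `hdist=`, `tpe`, `tbo`) are BBDuk options as I recall them from its documentation; the values for `mink` and `hdist`, the helper name `bbduk_command`, and all file names are assumptions, so check them against `bbduk.sh --help` before relying on this.

```python
from typing import List, Optional


def bbduk_command(adapter: str, in1: str, out1: str,
                  in2: Optional[str] = None,
                  out2: Optional[str] = None) -> List[str]:
    """Build an argument list for 3' (right-end) adapter trimming with BBDuk.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    cmd = [
        "bbduk.sh",
        f"in={in1}",
        f"out={out1}",
        f"literal={adapter}",   # adapter given directly instead of a FASTA file
        "ktrim=r",              # trim the match and everything to its right
        f"k={len(adapter)}",    # assumes a short adapter; k cannot exceed its length
        "mink=8",               # assumed value: allow shorter matches at the read end
        "hdist=1",              # assumed value: tolerate one mismatch
    ]
    if in2 and out2:
        # Paired-end files are processed together so the mates stay in sync.
        cmd += [f"in2={in2}", f"out2={out2}", "tpe", "tbo"]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; pass the list to subprocess.run(..., check=True) to execute.
    print(" ".join(bbduk_command("CTGTAGGCACCATCAAT", "reads.fq", "clipped.fq")))
```

Building the command as a list keeps it safe to hand to `subprocess.run` without shell quoting.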


    • guilhem
      Member
      • Feb 2016
      • 10

      #3
      Thanks GenoMax.
      I saw this interesting post before posting. I was just wondering, before trying it, whether hisat2 has this function natively, since I saw it can trim and 'soft clip', which I thought was similar to clipping adapters.


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        Soft-clipping won't actually remove data. In that sense it is not the same thing as clipping adapter sequences using a dedicated trimming program.
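The difference can be sketched in a few lines of Python. This is only an illustration on a made-up alignment record: the read, the CIGAR string `30M17S`, and the helper names are hypothetical, and the sketch ignores hard clips (`H`). A soft-clipped alignment still carries the full read sequence; the CIGAR merely marks which bases were not aligned.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar: str) -> int:
    """Count the bases marked as soft-clipped (S) in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")


def aligned_part(seq: str, cigar: str) -> str:
    """Return the read without its leading/trailing soft-clipped bases.

    Assumes the CIGAR contains no hard clips (H).
    """
    ops = CIGAR_OP.findall(cigar)
    left = int(ops[0][0]) if ops and ops[0][1] == "S" else 0
    right = int(ops[-1][0]) if len(ops) > 1 and ops[-1][1] == "S" else 0
    return seq[left:len(seq) - right]


if __name__ == "__main__":
    footprint = "ACGTA" * 6                 # hypothetical 30 nt footprint
    read = footprint + "CTGTAGGCACCATCAAT"  # footprint followed by adapter
    cigar = "30M17S"                        # hypothetical soft-clipped alignment
    print(len(read))                        # the stored sequence is still the full read
    print(soft_clipped_bases(cigar))
    print(aligned_part(read, cigar) == footprint)
```

A trimming program, by contrast, writes out a new read file in which those bases are gone.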


        • guilhem
          Member
          • Feb 2016
          • 10

          #5
          Thanks a lot for the link. So it seems that I need to clip in an additional step before mapping with hisat2. I am going to use BBDuk.sh, thanks GenoMax!


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            You don't have to trim, but if you need clean sequence files then a pass through the trimming program would keep that data available.


            • guilhem
              Member
              • Feb 2016
              • 10

              #7
              But if I do not clip the adapters, mapping will be biased by the adapter sequence, won't it?


              • GenoMax
                Senior Member
                • Feb 2008
                • 7142

                #8
                If the adapter contamination is short/minimal then the aligner should be able to manage, but if you know you have short inserts, adapter dimers, etc., then it would be best to trim independently. I like to pass all data through a trimming program. If there is no contamination then the only thing invested is a bit of time.
                Last edited by GenoMax; 02-19-2016, 09:10 AM.


                • guilhem
                  Member
                  • Feb 2016
                  • 10

                  #9
                  My adapter is CTGTAGGCACCATCAAT -- quite long, I think. My reads are about 30 nt after clipping. And I need perfect mapping (no mismatches), so I think clipping is necessary here. I am trying the software you advised, thanks GenoMax!
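To make concrete what clipping this adapter means, here is a deliberately naive Python sketch. It is not how fastx_clipper or BBDuk work internally; it uses exact matching only, and the function name `clip_adapter`, the `min_overlap` default, and the example footprint are assumptions for illustration.

```python
def clip_adapter(read: str, adapter: str, min_overlap: int = 5) -> str:
    """Remove a 3' adapter from a read using exact matching only.

    Handles a full adapter anywhere in the read, or an adapter that is
    truncated by the end of the read (at least `min_overlap` bases).
    """
    # Case 1: the complete adapter occurs in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]

    # Case 2: the read ends with a prefix of the adapter; try longest first.
    longest = min(len(adapter) - 1, len(read))
    for n in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:n]):
            return read[:-n]

    # No adapter found: return the read unchanged.
    return read


if __name__ == "__main__":
    adapter = "CTGTAGGCACCATCAAT"
    footprint = "ACGTA" * 6  # hypothetical 30 nt footprint
    print(clip_adapter(footprint + adapter, adapter) == footprint)
    print(clip_adapter(footprint + adapter[:8], adapter) == footprint)
```

A real trimmer additionally tolerates sequencing errors in the adapter, which matters when every remaining adapter base would count as a mismatch against the reference.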


                  • GenoMax
                    Senior Member
                    • Feb 2008
                    • 7142

                    #10
                    What was the original read length (if post clip is 30 bp)? Is this miRNA data?


                    • guilhem
                      Member
                      • Feb 2016
                      • 10

                      #11
                      After clipping, the read length is around 30 nt. This is ribo-seq data (ribosome footprints: the RNA fragments covered by the ribosome).


                      • Brian Bushnell
                        Super Moderator
                        • Jan 2014
                        • 2709

                        #12
                        If you need perfect mapping, then absolutely, adapter-trimming is crucial. In general, requiring perfect mapping will incur sequence-dependent bias (as sequencing error rates are sequence-dependent), but that's more of an issue with long reads and may not matter with 30bp reads. Still, it also might matter since ribosomal sequences are typically low-diversity which makes them especially susceptible to sequence-dependent errors.
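A rough back-of-the-envelope sketch of why read length matters here, under a strong simplifying assumption: errors are independent and every base has the same error rate. The rates below are made-up numbers, not measurements. Under that model the fraction of reads with no sequencing error is (1 - e)^L for per-base error rate e and read length L.

```python
def fraction_error_free(read_length: int, per_base_error: float) -> float:
    """Fraction of reads expected to contain no sequencing error.

    Simplified model: independent errors, one uniform per-base error rate.
    """
    return (1.0 - per_base_error) ** read_length


if __name__ == "__main__":
    # Assumed 1% per-base error rate: short reads survive a perfect-match
    # filter far more often than long reads.
    for length in (30, 150):
        print(length, round(fraction_error_free(length, 0.01), 3))

    # Two hypothetical 30 nt sequences with different (assumed) error rates:
    # the perfect-match filter keeps them at different rates, which is the bias.
    easy = fraction_error_free(30, 0.005)
    hard = fraction_error_free(30, 0.02)
    print(round(easy / hard, 2))
```

With the assumed 1% rate, roughly three quarters of 30 bp reads are error-free, against roughly a fifth of 150 bp reads. When the error rate depends on the sequence, the kept fraction differs between sequences, so counts after a perfect-match filter are skewed.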

                        So... why are you requiring perfect mapping?


                        • guilhem
                          Member
                          • Feb 2016
                          • 10

                          #13
                          Thanks Brian.
                          I am not very familiar with NGS data analysis, so I tried to apply the exact protocol described in the original paper. Ribosome profiling is a technique to track translational pausing (Ingolia 2009). In short, translation is frozen at a given time t and the messenger RNA not covered by ribosomes is digested. What remains are the ribosome footprints -- the parts of the messenger covered by a ribosome. These footprints are sequenced, and I use the SRA data from that sequencing.
                          In the original method introduced by Ingolia et al. 2009, they clipped the adapter, mapped to the genome assembly, and kept only reads with a perfect match (retaining only reads with NM tag = 0).

                          I tried to follow the original protocol closely. I have only switched to hisat2 because I found bowtie2 and tophat rather slow.
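The "NM tag = 0" filter described above can be sketched in plain Python on SAM text. This is a minimal sketch, not the pipeline from the paper: the function names are hypothetical, and it assumes the aligner writes the `NM:i:` optional tag on each aligned record.

```python
from typing import Iterable, Iterator, Optional


def edit_distance(sam_line: str) -> Optional[int]:
    """Return the NM:i value of a SAM record, or None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start after the 11 mandatory SAM columns.
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None


def perfect_matches(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and records whose NM tag equals 0."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header so the output is still valid SAM
        elif edit_distance(line) == 0:
            yield line


if __name__ == "__main__":
    # Hypothetical records: only the NM tag differs between them.
    def record(name: str, nm: int) -> str:
        return "\t".join([name, "0", "chr1", "100", "60", "30M", "*", "0", "0",
                          "A" * 30, "I" * 30, f"NM:i:{nm}"])

    lines = ["@HD\tVN:1.0", record("r1", 0), record("r2", 1)]
    for kept in perfect_matches(lines):
        print(kept.split("\t")[0])
```

One caveat, as far as I understand the SAM tag definition: NM counts differences over the aligned bases only, so a read whose adapter was soft-clipped can still report NM = 0. That is one more reason to trim before applying this kind of filter.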


                          • GenoMax
                            Senior Member
                            • Feb 2008
                            • 7142

                            #14
                            You should add BBMap alignment as well. I wonder what fraction of your reads would be straight alignment and what fraction would have a splice site, with just 30 nt to work with. @Brian may have a suggestion about parameters to use with BBMap.


                            • Brian Bushnell
                              Super Moderator
                              • Jan 2014
                              • 2709

                              #15
                              Originally posted by GenoMax
                              @Brian may have a suggestion about parameters to use with BBMap.
                              Normally, I use the defaults. But for 30bp ribosomal reads, you could add "maxindel=10" (just a random small number I picked). Searching for long indels (which BBMap does by default) is not necessary when aligning to ribosomes (which as far as I know are never spliced); it decreases both speed and sensitivity. BBMap does have a "perfectmode" flag which allows only perfect alignments, but I do not really think it is appropriate in this case (or in most situations, especially those involving quantification).

                              There are a lot of papers written by people who do not fully understand all aspects of what they are doing - who can, these days, in any paper that is not purely theoretical? Often people try to make choices they think are safer and more conservative, overriding the suggested defaults, to minimize risk of a paper being rejected because something was hard to describe or explain. Particularly, in bioinformatics, it is common for people to throw out all reads with any mismatches, or quality-trim to Q30 prior to mapping, etc. These are almost never good ideas! They are typically devised by biologists on the assumption that "My data has variable quality, and is annotated with its actual quality. Therefore, if I throw away low-quality data, my results will be strictly better."

                              This is absolutely wrong, as it relies on a lot of implicit assumptions (that quality is unrelated to sequence, that quality scores are correct, that trimming low-quality bases yields better mapping, that differences between a read and the reference are due to errors, etc) which may seem obvious, but are false.

                              I am not trying to slam biologists here - they are experts in their field. It's just important to understand that being an expert in biology does not make one also an expert in statistics, or photonics, or any of the other numerous areas that go into bioinformatics. So, bioinformatics papers written, reviewed, and published solely by biologists will often have subtle errors in the non-biological part of the methodology - as in this case.
