**GenoMax** · 02-19-2016, 08:15 AM

Look at BBDuk.sh from BBMap. It should be intuitive to use and fast. You would want to process paired-end data files together if you have that kind of data.

**guilhem** · 02-19-2016, 08:21 AM

Thanks, GenoMax.
I saw this interesting post before posting. Before trying it, I was just wondering whether hisat2 has this function natively, since I saw it can trim and 'soft clip' -- which I thought was similar to clipping adapters.

**GenoMax** · 02-19-2016, 08:27 AM

Soft-clipping won't actually remove data. In that sense it is not the same thing as clipping adapter sequences using a dedicated trimming program.
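To illustrate the point about soft-clipping, here is a minimal Python sketch. It assumes a hypothetical 47 nt read (a 30 nt insert followed by the 17 nt adapter mentioned later in this thread) that an aligner reported with the CIGAR string `30M17S`. The helper name `soft_clipped_parts` is made up for this example and is not part of hisat2 or BBMap.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_parts(seq, cigar):
    """Split a SAM SEQ field into (left_clip, aligned, right_clip).

    Soft-clipped bases (S) are still present in SEQ; hard-clipped
    bases (H) are not, so H operations are ignored here.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    end = len(seq) - right
    return seq[:left], seq[left:end], seq[end:]

# Hypothetical read: 30 nt insert + 17 nt adapter, aligned as 30M17S.
insert = "ACGT" * 7 + "AC"
adapter = "CTGTAGGCACCATCAAT"
read = insert + adapter
left, aligned, right = soft_clipped_parts(read, "30M17S")
print(len(read), repr(left), aligned, right)
```

The record still carries all 47 bases: the adapter is only marked as unaligned, so the sequence file itself is not cleaned.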

**guilhem** · 02-19-2016, 08:36 AM

Thanks a lot for the link. So it seems I need to clip in an additional step before mapping with hisat2. I am going to use BBDuk.sh, thanks GenoMax!

**GenoMax** · 02-19-2016, 08:47 AM

You don't have to trim but if you need clean sequence files then a pass through the trimming program would keep that data available.

**guilhem** · 02-19-2016, 08:51 AM

But if I do not clip the adapters, mapping will be biased by the adapter sequence, won't it?

**GenoMax** · 02-19-2016, 09:05 AM

If the adapter contamination is short/minimal then the aligner should be able to manage, but if you know you have short inserts, adapter dimers, etc., then it would be best to trim independently. I like to pass all data through a trimming program. If there is no contamination then the only thing invested is a bit of time.

**guilhem** · 02-19-2016, 09:09 AM

My adapter is CTGTAGGCACCATCAAT -- quite long I think. My reads are about 30 nt after clipping. And I need perfect mapping (no mismatches), so I think clipping is necessary here. I am trying the software you recommended, thanks GenoMax!
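As a rough illustration of what 3' adapter clipping does with this adapter, here is a minimal Python sketch. It is not how BBDuk works internally: it uses exact matching only, and the function name `clip_3prime_adapter` and the `min_overlap` default are assumptions made for this example. A real trimmer also tolerates sequencing errors and trims the quality string along with the bases.

```python
ADAPTER = "CTGTAGGCACCATCAAT"  # adapter sequence quoted in the post above

def clip_3prime_adapter(read, adapter=ADAPTER, min_overlap=5):
    """Remove a 3' adapter from a read using exact matching only.

    First look for the full adapter anywhere in the read; if it is
    absent, look for the longest adapter prefix (at least min_overlap
    bases) that ends the read, which covers reads that run only
    partway into the adapter.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    longest = min(len(adapter), len(read)) - 1
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read

# Hypothetical 30 nt footprint followed by the full adapter.
footprint = "ACGT" * 7 + "AC"
print(clip_3prime_adapter(footprint + ADAPTER))
```

A short `min_overlap` removes more partial adapters but also clips more genuine read ends by chance, so the threshold is a trade-off rather than a fixed rule.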

**GenoMax** · 02-19-2016, 09:13 AM

What was the original read length (if post clip is 30 bp)? Is this miRNA data?

**guilhem** · 02-19-2016, 01:03 PM

After clipping, the read length is around 30 nt. This is ribo-seq data (ribosome footprints: sequencing of the RNA fragments covered by the ribosome).

**Brian Bushnell** · 02-19-2016, 07:19 PM

If you need perfect mapping, then absolutely, adapter-trimming is crucial. In general, requiring perfect mapping will incur sequence-dependent bias (as sequencing error rates are sequence-dependent), but that's more of an issue with long reads and may not matter with 30bp reads. Still, it also might matter since ribosomal sequences are typically low-diversity which makes them especially susceptible to sequence-dependent errors.

So... why are you requiring perfect mapping?

**guilhem** · 02-19-2016, 07:31 PM

Thanks Brian.
I am not very familiar with NGS data analysis, so I tried to apply the exact protocol described in the original paper. Ribosome profiling is a technique to track translational pausing (Ingolia 2009). Translation is frozen at a given time and the uncovered messenger RNA is digested, so only the ribosome footprints remain -- the parts of the messenger covered by the ribosome. These footprints are sequenced, and I am using the SRA data from that sequencing.
In the original method introduced by Ingolia et al. 2009, they clipped the adapter, mapped to the genome assembly, and kept only reads with a perfect match (retaining only reads with NM tag = 0).
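The "keep only NM = 0" step can be sketched in Python as a filter over uncompressed SAM text. This is a minimal sketch, not the code used in the original paper: the function names are made up for this example, and it assumes the aligner writes the optional `NM:i:` tag on every mapped record.

```python
def has_perfect_match(sam_line):
    """Return True for a mapped SAM record whose NM (edit distance) tag is 0."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:  # bit 0x4 set means the read is unmapped
        return False
    # Optional tags start at the 12th column (index 11).
    return "NM:i:0" in fields[11:]

def keep_perfect(lines):
    """Yield header lines and records with NM:i:0, dropping everything else."""
    for line in lines:
        if line.startswith("@") or has_perfect_match(line):
            yield line

# Hypothetical records: one perfect, one with a mismatch, one unmapped.
seq, qual = "ACGT" * 7 + "AC", "I" * 30
records = [
    "@HD\tVN:1.6",
    f"r1\t0\tchr1\t100\t60\t30M\t*\t0\t0\t{seq}\t{qual}\tAS:i:0\tNM:i:0",
    f"r2\t0\tchr1\t200\t60\t30M\t*\t0\t0\t{seq}\t{qual}\tAS:i:-6\tNM:i:1",
    f"r3\t4\t*\t0\t0\t*\t*\t0\t0\t{seq}\t{qual}",
]
kept = list(keep_perfect(records))
print([line.split("\t")[0] for line in kept])
```

One caveat: NM counts differences only over the aligned part of the read, so a read with soft-clipped adapter bases can still carry NM:i:0. This is one more reason to clip the adapter before mapping if the filter is meant to select truly perfect reads.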

Since I am not very familiar with NGS data, I tried to follow the original protocol closely. I have just switched to hisat2 since I found bowtie2 and tophat rather slow.

**GenoMax** · 02-19-2016, 07:41 PM

You should add BBMap alignment as well. I wonder what fraction of your reads would be straight alignment and what fraction would have a splice site, with just 30 nt to work with. @Brian may have a suggestion about parameters to use with BBMap.

**Brian Bushnell** · 02-19-2016, 08:05 PM

Originally posted by GenoMax:

@Brian may have a suggestion about parameters to use with BBMap.

Normally, I use the defaults.

But for 30bp ribosomal reads, you could add "maxindel=10" (just a random small number I picked). Searching for long indels (which BBMap does by default) is not necessary when aligning to ribosomes (which as far as I know are never spliced); it decreases both speed and sensitivity. BBMap does have a "perfectmode" flag which allows only perfect alignments, but I do not really think it is appropriate in this case (or most situations, especially those involving quantification).

There are a lot of papers written by people who do not fully understand all aspects of what they are doing - who can, these days, in any paper that is not purely theoretical? Often people try to make choices they think are safer and more conservative, overriding the suggested defaults, to minimize risk of a paper being rejected because something was hard to describe or explain. Particularly, in bioinformatics, it is common for people to throw out all reads with any mismatches, or quality-trim to Q30 prior to mapping, etc. These are almost never good ideas! They are typically devised by biologists on the assumption that "My data has variable quality, and is annotated with its actual quality. Therefore, if I throw away low-quality data, my results will be strictly better."

This is absolutely wrong, as it relies on a lot of implicit assumptions (that quality is unrelated to sequence, that quality scores are correct, that trimming low-quality bases yields better mapping, that differences between a read and the reference are due to errors, etc) which may seem obvious, but are false.

I am not trying to slam biologists here - they are experts in their field. It's just important to understand that being an expert in biology does not also make one an expert in statistics, or photonics, or any of the other numerous areas that go into bioinformatics. So, bioinformatics papers written, reviewed, and published solely by biologists will often have subtle errors in the non-biological part of the methodology - as in this case.

Clip adapter Hisat2

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News