**Brian Bushnell** · 07-07-2014, 12:26 PM

You can do this with BBMerge:

bbmerge.sh in=x1.fq in2=x2.fq outm=y1.fq outm2=y2.fq mininsert=15 minoi=10 tbo

That will trim the reads based on reverse-complement overlap and send only the successfully overlapped reads to the outm destination, retaining pairing. Alternatively, you could just output the consensus merged reads like this, so you don't have to deal with pairs anymore:

bbmerge.sh in=x1.fq in2=x2.fq outm=y.fq mininsert=15 minoi=10

The result will not have anything with inserts shorter than 15bp, but it will have reads with inserts longer than 35bp, so you can filter the remainder like this:

reformat.sh in=y.fq out=z.fq maxlen=35

Reformat will accept paired or unpaired input.
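To make the length filter concrete, here is a minimal stand-in written with awk. It is only a sketch of what a maximum-length filter does, not how Reformat is implemented; the function name `filter_maxlen` is made up for illustration, and it assumes unpaired input with strict 4-line FASTQ records (no wrapped sequences).

```shell
# Keep only 4-line FASTQ records whose sequence is at most MAX bases long.
# Assumption: unpaired reads, exactly 4 lines per record.
filter_maxlen() {
  # $1 = maximum read length; FASTQ is read from stdin and written to stdout
  awk -v max="$1" '
    NR % 4 == 1 { header = $0 }   # @name line
    NR % 4 == 2 { seq = $0 }      # bases
    NR % 4 == 3 { plus = $0 }     # separator line
    NR % 4 == 0 {                 # quality line completes the record
      if (length(seq) <= max) print header "\n" seq "\n" plus "\n" $0
    }
  '
}
```

Usage would look like `filter_maxlen 35 < y.fq > z.fq`. For paired files a filter like this would have to drop both mates together, which this sketch does not attempt.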

Note that this method (looking for reverse-complement overlap) may retain fewer reads than looking for adapters, because reads that overlap in multiple different orientations will be discarded as ambiguous. So trimming based on adapters and filtering by resultant length is also viable, but that method will also miss some reads (the ones with a high error rate in the adapter sequence). The reads each method misses will be different. I have another program, BBDuk, which can trim reads based on both adapter sequence AND overlap, which would be optimal for your case... but currently it has some constants set for overlap mode that will not work well on small RNAs, so I need to make them parameters.

**frnz426** · 07-09-2014, 09:04 AM

Thanks for your reply. Would this work even though the insert size is smaller than a single read? As far as I know, most of these programs are optimized for inserts larger than a single read but smaller than two read lengths combined. Also, what about the adapters? Would those need to be trimmed beforehand?

**Brian Bushnell** · 07-09-2014, 10:53 AM

It works fine with insert size shorter than read length, with the extra flags "mininsert=15 minoi=10". By default it doesn't look for insert sizes shorter than 35bp.
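To illustrate why overlap detection can still work when the insert is shorter than the read, here is a toy sketch. When the insert has length k, the first k bases of read 1 should equal the reverse complement of the first k bases of read 2, and everything after that is adapter. This is not BBMerge's algorithm (it uses exact matching only and ignores quality scores and ambiguity); the function name `insert_by_overlap` and its third argument, a smallest insert length to consider, are hypothetical.

```shell
# Toy overlap finder: prints the largest k >= MIN for which the first k
# bases of read 1 exactly match the reverse complement of the first k
# bases of read 2, or 0 if there is no such k.
# Assumption: uppercase A/C/G/T input, no mismatches tolerated.
insert_by_overlap() {
  # $1 = read 1, $2 = read 2, $3 = smallest insert length to consider
  awk -v r1="$1" -v r2="$2" -v min="$3" '
    function revcomp(s,    i, c, out) {
      out = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        if (c == "A") c = "T"
        else if (c == "T") c = "A"
        else if (c == "C") c = "G"
        else if (c == "G") c = "C"
        else c = "N"
        out = out c
      }
      return out
    }
    BEGIN {
      n = (length(r1) < length(r2)) ? length(r1) : length(r2)
      for (k = n; k >= min; k--) {
        if (substr(r1, 1, k) == revcomp(substr(r2, 1, k))) { print k; exit }
      }
      print 0
    }'
}
```

For example, with a made-up 10 bp insert AAACCCGGGT followed by made-up adapter bases, `insert_by_overlap AAACCCGGGTTTTT ACCCGGGTTTGGGG 5` prints 10, even though both reads are 14 bp long.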

You can trim adapters beforehand if you want, but it's not necessary unless the r1 adapter and r2 adapter are close to being reverse-complementary. Still, it may improve the results.

small RNA-seq size filtering
