Hi, I found this previous discussion which covers a lot of what I'd like to know:
but not quite all! I am working with HaloPlex data. Before alignment, I need to remove HaloPlex adapters and also clip 5 bp from both ends of both forward and reverse reads. I should also not be left with any empty or orphan (i.e. unmatched) reads.
My previous approach was to trim adapters with cutadapt, remove the 5 bp with a separate Perl script, re-run cutadapt with a 'fake' adapter sequence to drop zero-length reads, and finally run another script to drop orphans. While this works, it seems tools like Trimmomatic or Trim Galore could achieve the same in a more efficient, one-step manner.
My problem is therefore that neither tool seems to deal with both ends of the reads:
Trimmomatic has 'CROP: Cut the read to a specified length by removing bases from the end'
Trim Galore has --clip_R1 <int> and --clip_R2 <int> to remove <int> bp from the 5' end of read 1 and read 2.
Unless I've misunderstood, each of these options only deals with one end of the reads. The reason I need to clip bases from both ends is to remove residual bases from the restriction enzyme footprint.
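For reference, the adapter-trim, clip and pair-filter steps described above can be sketched in a single pass. This is a minimal sketch, not cutadapt, Trimmomatic or Trim Galore behaviour. It assumes uncompressed FASTQ with R1 and R2 in the same order. The adapter matching is a simplified exact-match stand-in for cutadapt (no mismatches or indels). The adapter strings and the helper names (`trim_adapter`, `clip_both_ends`, `process_pairs`) are hypothetical placeholders.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class FastqRecord(NamedTuple):
    """One FASTQ record: name (without '@'), sequence, quality string."""
    name: str
    seq: str
    qual: str


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Parse plain 4-line FASTQ records (assumes no line wrapping)."""
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield FastqRecord(header.rstrip("\n")[1:], seq, qual)


def trim_adapter(rec: FastqRecord, adapter: str, min_overlap: int = 3) -> FastqRecord:
    """Simplified 3' adapter trimming using exact matches only.

    Cuts at the first full occurrence of `adapter`; otherwise removes the
    longest read suffix (>= min_overlap bases) equal to a prefix of `adapter`.
    """
    idx = rec.seq.find(adapter)
    if idx == -1:
        idx = len(rec.seq)
        max_k = min(len(adapter) - 1, len(rec.seq))
        for k in range(max_k, min_overlap - 1, -1):
            if rec.seq.endswith(adapter[:k]):
                idx = len(rec.seq) - k
                break
    return FastqRecord(rec.name, rec.seq[:idx], rec.qual[:idx])


def clip_both_ends(rec: FastqRecord, n: int = 5) -> FastqRecord:
    """Remove n bases from the 5' end and n bases from the 3' end."""
    end = len(rec.seq) - n
    if end <= n:
        # Nothing left after clipping: return an empty read.
        return FastqRecord(rec.name, "", "")
    return FastqRecord(rec.name, rec.seq[n:end], rec.qual[n:end])


def process_pairs(
    pairs: Iterable[Tuple[FastqRecord, FastqRecord]],
    adapter_r1: str,
    adapter_r2: str,
    clip: int = 5,
) -> Iterator[Tuple[FastqRecord, FastqRecord]]:
    """Trim adapters, then clip both ends; keep a pair only if both mates survive."""
    for r1, r2 in pairs:
        r1 = clip_both_ends(trim_adapter(r1, adapter_r1), clip)
        r2 = clip_both_ends(trim_adapter(r2, adapter_r2), clip)
        # Dropping the whole pair when either mate is empty avoids orphans.
        if r1.seq and r2.seq:
            yield r1, r2
```

With real files, something like `zip(read_fastq(f1), read_fastq(f2))` over two opened FASTQ files would feed `process_pairs`. Clipping is done after adapter removal so that the 3' clip removes the bases immediately upstream of the adapter. Filtering per pair rather than per read is what keeps R1 and R2 in sync.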
TIA!