
**aprice67** · 04-09-2013, 11:21 AM

Hi, I'm working with some similar data. One thing I found is that a lot of trimming tools aren't really set up for paired-end reads. I have a pipeline for trimming and aligning reads; it basically goes like this:

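First, there are the two input files (paired-end Illumina). This step removes all reads that failed Illumina's basic quality filter (the N/Y flag in the read header) and writes the rest to the FILTERED files. GNU grep's --no-group-separator option stops grep -A 3 from inserting "--" lines between non-adjacent records, which would corrupt the FASTQ output:
grep --no-group-separator -A 3 '^@.* [^:]*:N:[^:]*:' "$INPUT1" > "$FILTERED1"
grep --no-group-separator -A 3 '^@.* [^:]*:N:[^:]*:' "$INPUT2" > "$FILTERED2"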
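For comparison, a record-based version of the same filter could look like the sketch below. It is an illustrative alternative, not part of the pipeline above: it reads each FASTQ record as four lines with awk and checks the filter flag in the second header field (in CASAVA 1.8+ headers such as "1:N:0:ATCACG", Y means failed and N means passed), so it does not depend on grep's context options at all. The function name filter_pf and the sample reads are made up for illustration.

```shell
#!/usr/bin/env bash
# Keep FASTQ records whose CASAVA 1.8+ header marks them as passing filter.
# Assumes 4-line records and headers like "@<instrument>:...:<y> 1:N:0:ATCACG",
# where the second colon-separated item after the space is the filter flag
# (Y = failed, N = passed). Records whose header lacks this field are dropped.
filter_pf() {
  awk '
    NR % 4 == 1 {
      split($2, f, ":")     # f[1] = read number, f[2] = filter flag, ...
      keep = (f[2] == "N")  # decide once per record, on its header line
    }
    keep                    # print header, sequence, "+" and quality lines
  '
}

# Tiny demo: one passing and one failing read (made-up data).
sample='@M1:1:FC:1:1101:1000:2000 1:N:0:ATCACG
ACGT
+
IIII
@M1:1:FC:1:1101:1001:2001 1:Y:0:ATCACG
TTTT
+
IIII'

printf '%s\n' "$sample" | filter_pf
# Real use, with the file variables from the pipeline above:
#   filter_pf < "$INPUT1" > "$FILTERED1"
```

Because the keep/drop decision is made once per header and applied to the next three lines, the output is always whole 4-line records.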

Next, fastq-mcf (from ea-utils) trims both files of the pair together. It's good at dealing with paired-end reads; it's the best paired-end trimmer I could find. I don't remember all the parameters, but there's a great resource out there describing this tool.
fastq-mcf -o "$OUTPUT1" -o "$OUTPUT2" -l 16 -q 15 -w 4 -x 10 -u -P 33 "$ADAPTERS" "$FILTERED1" "$FILTERED2"
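Whatever trimmer is used, the two output files have to keep mates in the same order, because paired aligners such as bowtie expect the -1 and -2 files to match read for read. A quick sanity check, sketched here assuming CASAVA 1.8+ headers (pairs_in_sync is a hypothetical helper, not part of ea-utils), compares the read IDs of both files position by position:

```shell
#!/usr/bin/env bash
# Check that two FASTQ files are still paired record-for-record: same number of
# reads and the same read ID (header text before the first space) at each position.
# Assumes CASAVA 1.8+ style headers, where both mates share that ID; older
# "/1" and "/2" suffixed IDs would need the suffix stripped first.
pairs_in_sync() {
  paste <(awk 'NR % 4 == 1 { print $1 }' "$1") \
        <(awk 'NR % 4 == 1 { print $1 }' "$2") |
    awk -F '\t' '$1 != $2 { bad = 1; exit } END { exit bad }'
}

# Demo on two tiny made-up files.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '@r1 1:N:0:A\nACGT\n+\nIIII\n@r2 1:N:0:A\nGGGG\n+\nIIII\n' > "$tmp/R1.fq"
printf '@r1 2:N:0:A\nTTTT\n+\nIIII\n@r2 2:N:0:A\nCCCC\n+\nIIII\n' > "$tmp/R2.fq"
if pairs_in_sync "$tmp/R1.fq" "$tmp/R2.fq"; then echo "in sync"; else echo "OUT OF SYNC"; fi
# Real use: pairs_in_sync "$OUTPUT1" "$OUTPUT2"
```

If one file is shorter, paste leaves an empty field on the extra lines, so a count mismatch is caught by the same comparison.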

This aligns the trimmed pairs with bowtie and writes a SAM file ($REF_GENOME is the bowtie index basename):
bowtie -t -p 8 --sam "$REF_GENOME" -1 "$OUTPUT1" -2 "$OUTPUT2" "$ALIGNED_OUTPUT"

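This makes a sorted, indexed BAM file from the bowtie alignment, which can be used for all sorts of things. The sort command uses samtools 1.x syntax; older versions took an output prefix instead (samtools sort - $SORTED_BAM) and appended .bam themselves:
samtools view -bS "$ALIGNED_OUTPUT" | samtools sort -o "$SORTED_BAM.bam" -
samtools index "$SORTED_BAM.bam" "$SORTED_BAM.bam.bai"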

That's pretty much how I'm processing my data, and it works pretty well. As for those nasty overrepresented sequences: I'm guessing you're doing quality assessment with FastQC, which is a great tool. In my case, I did RNA-seq on bacterial genomes, so my read depth is very high because the genome is small. Add some highly expressed genes on top of that and you get warnings about overrepresented sequences. I'm basically ignoring them in my data, but think about how overrepresented sequences apply to your data and how much they actually matter.
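To see which sequences are actually behind an overrepresented-sequences warning, a rough check is to count identical read sequences directly. This is a minimal sketch assuming plain 4-line FASTQ; top_sequences is a hypothetical helper, and it is not how FastQC computes its report, just an exact-match count over all reads:

```shell
#!/usr/bin/env bash
# Print the N most frequent read sequences (default 10) from FASTQ on stdin,
# each preceded by its count. Exact-match counting over every read.
top_sequences() {
  local n=${1:-10}
  # Line 2 of each 4-line record is the sequence; count identical sequences,
  # then list the most frequent first.
  awk 'NR % 4 == 2' | sort | uniq -c | sort -k1,1nr | head -n "$n"
}

# Demo with made-up reads: ACGT appears twice, TTTT once.
printf '@a\nACGT\n+\nIIII\n@b\nACGT\n+\nIIII\n@c\nTTTT\n+\nIIII\n' | top_sequences 2
# Real use: top_sequences 20 < reads.fq
```

In a high-depth bacterial RNA-seq library, the top hits from a count like this would be expected to come from highly expressed genes rather than adapters, which is one way to judge whether the warning matters.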

Hope this helps.

**candida** · 05-02-2017, 07:58 PM

Hello everyone,

I am working with TruSeq paired-end data (150 bp). I have a question about the adapter file provided with Trimmomatic for adapter trimming.

According to the adapter file provided with Trimmomatic, "TruSeq3-PE-2.fa", the reverse complement of the index adapter sequence is used for trimming reads in the R2 file and the universal adapter is used for trimming reads in the R1 file:
>PrefixPE/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>PE1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PE1_rc
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
>PE2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>PE2_rc
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

However, for my data it looks like the index adapter sequence itself is in the R1 file and the reverse complement of the universal adapter is in the R2 file.
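One way to check where each adapter shows up in your own reads is to count how many R1 and R2 sequences contain the start of each adapter. This is a minimal sketch assuming 4-line FASTQ; it uses 20-base prefixes of PE2_rc (index adapter) and PE1_rc (reverse complement of the universal adapter) from the Trimmomatic file above. The prefix length and the helper name count_with_adapter are arbitrary choices for illustration, not Trimmomatic or Illumina settings:

```shell
#!/usr/bin/env bash
# Count reads (FASTQ on stdin) whose sequence contains a given adapter prefix.
# Both TruSeq adapters begin with AGATCGGAAGAGC, so 20-base prefixes are used
# here to tell them apart; the 20-base length is an arbitrary choice.
INDEX_PREFIX='AGATCGGAAGAGCACACGTC'        # start of PE2_rc: index adapter, expected in R1
UNIVERSAL_RC_PREFIX='AGATCGGAAGAGCGTCGTGT' # start of PE1_rc: universal adapter rc, expected in R2

count_with_adapter() {
  # Only sequence lines (line 2 of each record) are searched.
  awk -v ad="$1" 'NR % 4 == 2 && index($0, ad) { n++ } END { print n + 0 }'
}

# Demo with a made-up R1 read that runs into the index adapter.
r1_demo='@a 1:N:0:ATCACG
TTTTAGATCGGAAGAGCACACGTCTGAAC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIII'
echo "index adapter:        $(printf '%s\n' "$r1_demo" | count_with_adapter "$INDEX_PREFIX")"
echo "universal adapter rc: $(printf '%s\n' "$r1_demo" | count_with_adapter "$UNIVERSAL_RC_PREFIX")"
# Real use: count_with_adapter "$INDEX_PREFIX" < R1.fastq
```

Reads whose insert is so short that less than 20 bases of adapter are read through will not be counted, so the numbers are a lower bound.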

The Illumina support team also gave me this information:

What sequences do I use for adapter trimming | Illumina Knowledge

https://support.illumina.com/bulletins/2016/12/what-sequences-do-i-use-for-adapter-trimming.html

Therefore, I prepared my adapter file as follows, using the full adapter sequences. PrefixPE/1 and PE1 are the index adapter, PrefixPE/2 and PE2 are the reverse complement of the universal adapter, and PE1_rc and PE2_rc are the reverse complements of PE1 and PE2:
>PrefixPE/1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
>PrefixPE/2
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
>PE1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
>PE1_rc
CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>PE2
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
>PE2_rc
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
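The reverse-complement relationships in this custom file (PE1_rc against PE1, PE2_rc against PE2) can be checked mechanically. A minimal sketch, assuming only standard tr and awk; the helper names revcomp and check are hypothetical:

```shell
#!/usr/bin/env bash
# Reverse-complement a DNA sequence: swap A<->T and C<->G with tr,
# then reverse the string character by character with awk.
revcomp() {
  printf '%s\n' "$1" | tr 'ACGTacgt' 'TGCAtgca' |
    awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }'
}

# Report whether $2 is exactly the reverse complement of $1 ($3 is a label).
check() {
  if [ "$(revcomp "$1")" = "$2" ]; then echo "$3: OK"; else echo "$3: MISMATCH"; fi
}

pe1='AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG'
pe1_rc='CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
pe2='AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT'
pe2_rc='AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT'

check "$pe1" "$pe1_rc" PE1_rc   # prints: PE1_rc: OK
check "$pe2" "$pe2_rc" PE2_rc   # prints: PE2_rc: OK
```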

Please let me know whether the adapter file I prepared is fine, or whether the Trimmomatic adapter file is better and should always be used.
I tried both my custom file and the Trimmomatic file, and both removed the adapters when I checked with FastQC!

Please correct me or let me know if I'm missing something!
Appreciate your help and guidance!
Thanks,
Candida
