SEQanswers

11-27-2013, 03:39 PM   #1
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142
Trimmomatic Sliding Window vs. Removing Adapters

Hello.

I have a quick question. Will a sliding window suffice to remove the adapters, or must I use the ILLUMINACLIP command?

11-28-2013, 02:20 AM   #2
usad
Member
 
Location: aachen

Join Date: Sep 2009
Posts: 53

Hi
The sliding window (SLIDINGWINDOW) only removes low-quality bases. To remove adapters, you should also use ILLUMINACLIP.
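To see why a sliding window cannot remove adapters, here is a simplified model of quality-window trimming in Python. It is an illustrative assumption, not Trimmomatic's source, and Trimmomatic's exact cut position may differ. It scans fixed-size windows from the 5' end and cuts at the first window whose mean Phred quality drops below a threshold, so it only ever inspects quality scores, never the bases themselves.

```python
def sliding_window_keep(quals, window=4, min_mean=20):
    """Simplified SLIDINGWINDOW-style trim (illustrative, not Trimmomatic's code).

    Scans fixed-size windows from the 5' end and returns how many bases to keep:
    the read is cut at the start of the first window whose mean Phred quality
    falls below min_mean. Only quality scores are inspected, never bases.
    """
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean:
            return start
    return len(quals)


# A read whose last five bases are low quality is shortened...
print(sliding_window_keep([30] * 10 + [10] * 5))  # 9
# ...but a read ending in high-quality adapter sequence is left untouched.
print(sliding_window_keep([35] * 33))  # 33
```

Because adapter read-through is often sequenced at good quality, only a sequence-matching step such as ILLUMINACLIP can find it.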

best
Björn

12-12-2013, 04:26 PM   #3
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142
Finding the FASTA file with the adapters

Hello.

I have the adapter sequences, but how do I create a FASTA file in the correct format to use with ILLUMINACLIP?

12-13-2013, 01:21 AM   #4
tonybolger
Senior Member
 
Location: berlin

Join Date: Feb 2010
Posts: 156

Quote:
Originally Posted by arcolombo698
Hello.

I have the adapter sequences, but how do I create a FASTA file in the correct format to use with ILLUMINACLIP?

What library prep are you using? Adapter files for the typical Illumina preps (TruSeq and Nextera) are already included with recent versions of the tool.

If you have something unusual, I can help you create appropriate adapter files.

Thanks,

Tony.

12-13-2013, 01:25 AM   #5
ebioman
Member
 
Location: Switzerland

Join Date: Aug 2013
Posts: 41

Hi
We have an internal file with many adapters, in the following format:

Code:
>[Oligonucleotide sequences for Genomic DNA 1]
GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG
>[Oligonucleotide sequences for Genomic DNA 2]
ACACTCTTTCCCTACACGACGCTCTTCCGATCT
>[PCR Primers 1]
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>[PCR Primers 2]
CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT
>[Genomic DNA Sequencing Primer]
ACACTCTTTCCCTACACGACGCTCTTCCGATCT
>[Paired End DNA oligonucleotide sequences]
GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG

So it's just ordinary plain FASTA.
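As a sketch of producing such a file from sequences you already have, the Python below writes one `>name` header line followed by one sequence line per adapter. The function name and the record names are placeholders chosen for illustration; use whatever unique names help you recognise each oligo.

```python
def to_fasta(adapters):
    """Render a {name: sequence} mapping as plain FASTA text (one record per adapter)."""
    lines = []
    for name, seq in adapters.items():
        lines.append(f">{name}")            # header line: '>' followed by a unique name
        lines.append(seq.strip().upper())   # the whole sequence on a single line
    return "\n".join(lines) + "\n"


adapters = {
    "Genomic_DNA_1": "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG",
    "PCR_Primer2": "CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT",
}
print(to_fasta(adapters), end="")
```

You could then save it with, for example, `open("my_adapters.fa", "w").write(to_fasta(adapters))` (the file name is hypothetical) and give that path to ILLUMINACLIP.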

12-13-2013, 01:29 AM   #6
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142

Thank you. I have already emailed you all of my problems; please check your inbox for email from A.colombo. I used the TruSeq RNA Sample Prep kit, and all the indices are listed in the email I sent. I can re-post the original thread if you would like. I hope my emails made sense.

To restate the problem publicly:

1) I noticed that my adapters matched TruSeq2-PE.fa, and I also added all the indices from the Illumina adapter sequences PDF, which is available on their website.

However, my original FastQC overrepresented sequences, without using Trimmomatic, are:

Sequence Count Percentage Possible Source
GCAGATAGTGAGGAAAGTTGAGCCAATAATGACGTGAAGTCCGTGGAAGC 55443 0.1791778373403407 No Hit
AGTAGTATAGTGATGCCAGCAGCTAGGACTGGGAGAGATAGGAGAAGTAG 53934 0.17430112871081896 No Hit
GCTTTGGCTCTCCTTGCAAAGTTATTTCTAGTTAATTCATTATGCAGAAG 52865 0.17084638946300004 No Hit
TTTGATGGTAAGGGAGGGATCGTTGACCTCGTCTGTTATGTAAAGGATGC 47976 0.15504637058312476 No Hit
GCCATATCGGGGGCACCGATTATTAGGGGAACTAGTCAGTTGCCAAAGCC 35179 0.11368968381573591 No Hit
AGCTTTGGCTCTCCTTGCAAAGTTATTTCTAGTTAATTCATTATGCAGAA 32875 0.10624373505336474 No Hit
GTCAGTTCAGTGTTTTAATCTGACGCAGGCTTATGCGGAGGAGAATGTTT 32490 0.10499951184437475 No Hit
TTGTCAGTTCAGTGTTTTAATCTGACGCAGGCTTATGCGGAGGAGAATGT 32329 0.1044792003206153 No Hit
AGTTAGATTTACGCCGATGAATATGATAGTGAAATGGATTTTGGCGTAGG 31317 0.1012086707426988 No Hit
[FAIL] Kmer Content

After using a custom adapter.fa, the adapters were not removed, but they were greatly reduced. What are the best parameters to remove the adapters while still leaving a sufficient read length?

After trimming (improved quality results):

GCTTTGGCTCTCCTTGCAAAGTTATTTCTAGTTAATTCATTATGCAGAAG 32355 0.15813829366068524 No Hit
TTTGATGGTAAGGGAGGGATCGTTGACCTCGTCTGTTATGTAAAGGATGC 31148 0.15223896062256292 No Hit
GCAGATAGTGAGGAAAGTTGAGCCAATAATGACGTGAAGTCCGTGGAAGC 30047 0.14685771317022436 No Hit
AGTAGTATAGTGATGCCAGCAGCTAGGACTGGGAGAGATAGGAGAAGTAG 24201 0.11828480435426497 No Hit
GCCATATCGGGGGCACCGATTATTAGGGGAACTAGTCAGTTGCCAAAGCC 23374 0.1142427592651787 No Hit
TTGTCAGTTCAGTGTTTTAATCTGACGCAGGCTTATGCGGAGGAGAATGT 23325 0.11400326687175034 No Hit
AGTTAGATTTACGCCGATGAATATGATAGTGAAATGGATTTTGGCGTAGG 22057 0.10780579024180911 No Hit
AGCTTTGGCTCTCCTTGCAAAGTTATTTCTAGTTAATTCATTATGCAGAA 20926 0.10227791479349403 No Hit
GTCAGTTCAGTGTTTTAATCTGACGCAGGCTTATGCGGAGGAGAATGTTT 20488
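One way to compare parameter sets systematically is to build the command line programmatically and vary one value at a time. The sketch below assembles (but does not run) a paired-end Trimmomatic command as a Python list. The step values (ILLUMINACLIP 2:30:10, LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15, MINLEN:36) are the example values from the Trimmomatic manual, not settings tuned for this data; `-phred33` assumes Illumina 1.8+ quality encoding; and the function, jar, input, and output names are hypothetical. Requires Python 3.8+ for `shlex.join`.

```python
import shlex


def trimmomatic_pe_command(jar, r1, r2, prefix, adapters="TruSeq2-PE.fa",
                           seed_mismatches=2, palindrome_clip=30, simple_clip=10,
                           window=4, window_quality=15, min_len=36):
    """Build a paired-end Trimmomatic command as an argument list (hypothetical helper).

    Defaults mirror the example in the Trimmomatic manual; output names are made up.
    """
    outputs = [f"{prefix}_1P.fq.gz", f"{prefix}_1U.fq.gz",   # read 1: paired, unpaired
               f"{prefix}_2P.fq.gz", f"{prefix}_2U.fq.gz"]   # read 2: paired, unpaired
    steps = [
        # adapter clipping placed first, as in the manual's example
        f"ILLUMINACLIP:{adapters}:{seed_mismatches}:{palindrome_clip}:{simple_clip}",
        "LEADING:3",
        "TRAILING:3",
        f"SLIDINGWINDOW:{window}:{window_quality}",
        # steps run in order, so MINLEN last filters on the final trimmed length
        f"MINLEN:{min_len}",
    ]
    # -phred33 is an assumption (Illumina 1.8+ encoding)
    return ["java", "-jar", jar, "PE", "-phred33", r1, r2, *outputs, *steps]


cmd = trimmomatic_pe_command("trimmomatic.jar", "sample_R1.fq.gz", "sample_R2.fq.gz", "sample")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` on a small subset of reads, then re-running FastQC, lets you see how each change affects adapter content and read length.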


Thank you very much. Best,

12-13-2013, 01:39 AM   #7
ebioman
Member
 
Location: Switzerland

Join Date: Aug 2013
Posts: 41

Since you removed almost half of them, is it possible that you are removing only one orientation?

From the manual:

Quote:
If you want to check for the reverse-complement of a specific sequence, you need to specifically include the reverse-complemented form of the sequence as well, with another name. As an example have a look at the TruSeq2-PE.fa file

>PCR_Primer1
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PCR_Primer1_rc
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

Otherwise, I would just take a small fraction of your reads and test different settings on them.
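Following the manual's advice quoted above, reverse-complemented entries can be generated rather than typed by hand. This Python sketch adds a `_rc` copy of each sequence, mirroring the `PCR_Primer1` / `PCR_Primer1_rc` naming in TruSeq2-PE.fa; the helper names are hypothetical.

```python
# Map each base to its complement; N stays N, and lowercase is handled too.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]


def with_reverse_complements(adapters):
    """Return a copy of {name: seq} with an extra '<name>_rc' entry per sequence."""
    out = dict(adapters)
    for name, seq in adapters.items():
        out[f"{name}_rc"] = reverse_complement(seq)
    return out


primer = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
# Reproduces the PCR_Primer1_rc entry shown in the manual excerpt.
print(with_reverse_complements({"PCR_Primer1": primer})["PCR_Primer1_rc"])
```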

cheers

12-13-2013, 01:57 AM   #8
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

Just quickly looking at the FastQC results by eye, I don't see any sequences that match the Illumina adapters.

Most of the adapter sequences occur towards the 3' end of the reads, whereas the over-represented sequences reported by FastQC are from the first (5' end) 50 bases of the reads.
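The by-eye check above can be made mechanical. The Python sketch below is a simplified exact-match model, not Trimmomatic's algorithm (which tolerates mismatches): it reports where an adapter starts in a read, including a partial adapter that runs off the 3' end, provided at least `min_overlap` bases match. The function name, the `min_overlap` default of 10, and using this thread's PCR_Primer1_rc sequence as the adapter are all assumptions for illustration.

```python
from typing import Optional


def find_adapter_start(read: str, adapter: str, min_overlap: int = 10) -> Optional[int]:
    """Return the 0-based index where `adapter` begins in `read`, or None.

    Exact matching only. The adapter may be truncated by the 3' end of the
    read, but at least `min_overlap` bases of it must be present.
    """
    for i in range(len(read) - min_overlap + 1):
        overlap = min(len(read) - i, len(adapter))   # adapter bases that fit after i
        if read[i:i + overlap] == adapter[:overlap]:
            return i
    return None


adapter = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT"
insert = "ACGT" * 5
read = insert + adapter[:15]              # short insert: sequencing runs into the adapter
print(find_adapter_start(read, adapter))  # 20, so read[:20] is the insert
# The top overrepresented sequence from the FastQC report has no adapter hit.
print(find_adapter_start("GCAGATAGTGAGGAAAGTTGAGCCAATAATGACGTGAAGTCCGTGGAAGC", adapter))  # None
```

Running it over each overrepresented sequence with every adapter (and its reverse complement) from your adapter file repeats the same comparison systematically.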

All times are GMT -8.