SEQanswers

Old 11-12-2012, 05:23 AM   #1
a_mt
Member
 
Location: C:/Program files/Google/Chrome

Join Date: Jul 2012
Posts: 34
Question adapter trimming - help

Hello all,

I am a newbie to NGS analysis. I recently got raw sequencing data (Illumina) for yeast.
The read length is 104 bp. I have average knowledge of R, and I used the ShortRead package for initial processing of the reads. However, I have some questions about adapter trimming. I searched many threads here and learned about a lot of trimming tools. According to many people, Trimmomatic is the best, so I want to use it on my data. But this tool requires a FASTA file (the adapter FASTA) containing the adapter sequences. Now,

1. How do I find out the adapter sequence?
2. My data is paired-end reads; how do I proceed in this case?

How do I create this file?

I know my questions are too lame/simple for this forum and I am extremely sorry for such noob questions.

Help me.

Thank you.
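
For reference, an adapter FASTA file is just a plain-text file with a ">name" line followed by a sequence line for each adapter. A minimal sketch of writing one is below. The function names, record names, and the two sequences are hypothetical placeholders, not real adapters; the real sequences have to come from the sequencing provider or the library kit documentation. For paired-end data both adapters would normally go into the same file, and Trimmomatic's manual describes the naming conventions it expects, so check that rather than relying on the names used here.

```python
def format_adapter_fasta(adapters):
    """Return FASTA text for a {name: sequence} mapping."""
    return "".join(f">{name}\n{seq.upper()}\n" for name, seq in adapters.items())


def write_adapter_fasta(path, adapters):
    """Write the adapters to `path` as a FASTA file."""
    with open(path, "w") as out:
        out.write(format_adapter_fasta(adapters))


# Hypothetical placeholder sequences: replace with the real adapters.
example_adapters = {
    "adapter_read1": "ACGTACGTACGT",
    "adapter_read2": "TGCATGCATGCA",
}

if __name__ == "__main__":
    print(format_adapter_fasta(example_adapters), end="")
```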
Old 11-12-2012, 05:45 AM   #2
gaffa
Member
 
Location: Gothenburg/Uppsala, Sweden

Join Date: Oct 2010
Posts: 82

In the best case, the people who did the sequencing will tell you which adapter sequences were used. However, this is not always the case. If you have no information about the adapters, you can run a program like FastQC (http://www.bioinformatics.babraham.a...ojects/fastqc/), which searches your FASTQ files for a number of commonly used adapters. If your sequencing experiment followed a standard protocol, the adapter sequence may well be in FastQC's list, and if there is substantial adapter contamination in your data, it will show up in the program's report.
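
As a rough cross-check alongside FastQC, one could count how many reads contain the start of a candidate adapter. The sketch below assumes plain 4-line FASTQ records and uses AGATCGGAAGAGC, the shared start of the Illumina TruSeq adapters, as the candidate. Whether this library actually used TruSeq adapters is an assumption, and the function names are illustrative only.

```python
import gzip
import io

# Assumption: TruSeq-style adapters. Replace with the sequence supplied by
# whoever prepared the libraries, if it differs.
CANDIDATE_ADAPTER = "AGATCGGAAGAGC"


def read_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def adapter_fraction(handle, adapter=CANDIDATE_ADAPTER):
    """Fraction of reads containing an exact copy of `adapter`."""
    total = hits = 0
    for seq in read_sequences(handle):
        total += 1
        if adapter in seq:
            hits += 1
    return hits / total if total else 0.0


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__":
    demo = io.StringIO(
        "@r1\nACGTAGATCGGAAGAGCAC\n+\nIIIIIIIIIIIIIIIIIII\n"
        "@r2\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"
    )
    print(adapter_fraction(demo))  # prints 0.5
```

Exact matching like this misses adapters containing sequencing errors and partial adapters at the very end of a read, so a low fraction does not prove the data are clean.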
Old 11-12-2012, 05:52 AM   #3
a_mt
Member
 
Location: C:/Program files/Google/Chrome

Join Date: Jul 2012
Posts: 34

Hello Sir,

Thank you for the reply. I randomly extracted 1000 reads (FASTQ) and ran FastQC on them.
FastQC reported the k-mer content as failed and listed more than 200 5 bp sequences.
What do I do now? Do I have to consider all of them as adapter sequences?

And my data came with this file, barcodes.txt:

Control_1_1 CACTGT
Control_1_2 ATTCCG
Control_1_3 GCTACC
Control_1_4 CGAAAC
Mutant_3_1 GATGCT
Mutant_3_2 AGCTAG
Mutant_3_3 GGCCAC
Mutant_3_4 ATTATA

Are these adapters?

Again sorry for noob question!

Thank you.
Old 11-12-2012, 06:04 AM   #4
gaffa
Member
 
Location: Gothenburg/Uppsala, Sweden

Join Date: Oct 2010
Posts: 82

You should probably use many more than 1000 reads for FastQC. The program is pretty fast, so I would use the whole FASTQ files if they are not enormous. In the FastQC output, look at the "Overrepresented sequences" section rather than the k-mer content (which is harder to interpret). If there is adapter contamination, the adapter sequences should probably show up there, and if a sequence is in FastQC's list of common adapters, its identity will be listed in the "Possible Source" column.

And these barcode sequences are most likely the sample barcodes used for multiplexing (putting multiple samples into the same sequencing lane), and are not adapter sequences.
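
The idea behind an "Overrepresented sequences" table can be sketched as a simple tally of read sequences. This is not FastQC's implementation. The 0.1% default mirrors the threshold FastQC documents for its warning, while the function name and the `prefix_len` truncation are illustrative assumptions.

```python
from collections import Counter


def overrepresented(sequences, min_fraction=0.001, prefix_len=50):
    """Return (sequence, count, fraction) for sequences above `min_fraction`.

    Reads are truncated to `prefix_len` bases before counting so that long
    reads differing only near their 3' ends are still grouped together.
    """
    counts = Counter(seq[:prefix_len] for seq in sequences)
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (seq, n, n / total)
        for seq, n in counts.most_common()
        if n / total > min_fraction
    ]


if __name__ == "__main__":
    reads = ["AAAA", "AAAA", "AAAA", "CCCC"]
    print(overrepresented(reads, min_fraction=0.5))  # prints [('AAAA', 3, 0.75)]
```

A tally like this only groups reads that start with the same bases, so it picks up sequences that recur from the first base of the read.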
Old 11-12-2012, 06:19 AM   #5
a_mt
Member
 
Location: C:/Program files/Google/Chrome

Join Date: Jul 2012
Posts: 34

I ran FastQC on a larger FASTQ file (>1M reads), and it reported "there are no overrepresented sequences".
Is it all right to continue with the alignment to a reference genome now?

Thank you.
Old 11-12-2012, 06:41 AM   #6
TonyBrooks
Senior Member
 
Location: London

Join Date: Jun 2009
Posts: 298

Quote:
Originally Posted by gaffa View Post
You should probably use many more than 1000 reads for FastQC. The program is pretty fast, so I would use the whole FASTQ files if they are not enormous. In the FastQC output, look at the "Overrepresented sequences" section rather than the k-mer content (which is harder to interpret). If there is adapter contamination, the adapter sequences should probably show up there, and if a sequence is in FastQC's list of common adapters, its identity will be listed in the "Possible Source" column.

And these barcode sequences are most likely the sample barcodes used for multiplexing (putting multiple samples into the same sequencing lane), and are not adapter sequences.

Won't that just look for adapter dimers rather than adapter read-through?
You will need to ask whoever prepared the libraries for the adapter sequence.
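
To illustrate the difference: with read-through, the insert is shorter than the read, so the adapter starts at a different position in each read and may be cut off at the read's end, whereas an adapter dimer has the adapter right at the start. A minimal sketch of 3' adapter removal under exact matching is below. Real trimmers such as Trimmomatic tolerate mismatches, and the adapter shown (the shared start of the TruSeq adapters) is an assumption about this library. The function name and `min_overlap` are illustrative.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumption: TruSeq-style adapter


def trim_read_through(read, adapter=ADAPTER, min_overlap=3):
    """Remove a 3' adapter, including a partial adapter at the read's end."""
    # Case 1: the full adapter occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: only the first k bases of the adapter fit before the read ends.
    # Try the longest possible overlap first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[: len(read) - k]
    return read


if __name__ == "__main__":
    print(trim_read_through("ACGT" + ADAPTER + "TTT"))  # prints ACGT
    print(trim_read_through("ACGTACGTACAGATCG"))        # prints ACGTACGTAC
    print(repr(trim_read_through(ADAPTER + "AAAA")))    # prints '' (adapter dimer)
```

A small `min_overlap` removes more true adapter fragments but also clips genuine bases that happen to match the adapter start, so the value is a trade-off rather than a fixed rule.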
Old 11-12-2012, 08:36 PM   #7
a_mt
Member
 
Location: C:/Program files/Google/Chrome

Join Date: Jul 2012
Posts: 34

Hello sir,

Apparently our samples were outsourced for sequencing, and the provider has not given me the adapter sequences. I have emailed them about it. Thank you again for the suggestions.

I must say Seqanswers - hats off to you!! I am learning so much from this forum.
Tags
adapter trimming, trimmomatic
