SEQanswers

03-09-2014, 09:07 PM   #1
paa6
Member
Location: south korea
Join Date: Feb 2014
Posts: 68

How to see adapter contamination in Illumina reads

I have an Illumina read file of bacterial DNA sequence. I used Geneious to assemble it. During assembly I found vector contamination, which the software removed because I had enabled the trimming option, and I got 1,610 contigs.

Now I am performing the same assembly with Velvet. According to my FastQC report, the sequence duplication level is bad, and overrepresented sequences and k-mer content show warnings (I have attached these three files). From the overrepresented sequences I concluded that I have adapter contamination: GATCGGAAGAGC is an adapter sequence, because I have seen it in the adapter files that Illumina provides to customers.

The problem is that my PI asked me to find that adapter contamination sequence in my reads, and I was not able to. He then asked me why I can't find it. I am new to de novo assembly and don't know what I am supposed to answer, and he gave me 1 hour to find it. Please help!

03-09-2014, 10:07 PM   #2
relipmoc
Member
Location: Los Angeles, CA
Join Date: Jul 2011
Posts: 58

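Try:
Code:
$ grep -c 'GATCGGAAGAGC' reads.fastq
$ grep -c '' reads.fastq | awk '{print $1/4}'
The first command counts the lines that contain the adapter substring. The second counts all lines and divides by 4, which gives the total number of reads (a FASTQ record is 4 lines). Dividing the first number by the second gives an estimate of the contaminant ratio.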
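
The two counts can also be combined in a single pass. The following is only a sketch, assuming a standard 4-line-per-record FASTQ; the tiny sample file and the `adapter_fraction` helper are made up for illustration and are not part of any tool. In practice, point the function at your own `reads.fastq`.

```shell
# Tiny made-up FASTQ so the sketch is self-contained.
fq=$(mktemp)
printf '%s\n' \
  '@r1' 'ACGTGATCGGAAGAGCAC' '+' 'IIIIIIIIIIIIIIIIII' \
  '@r2' 'ACGTACGTACGTACGTAC' '+' 'IIIIIIIIIIIIIIIIII' \
  '@r3' 'TTTTGATCGGAAGAGCTT' '+' 'IIIIIIIIIIIIIIIIII' \
  '@r4' 'GGGGCCCCAAAATTTTGG' '+' 'IIIIIIIIIIIIIIIIII' > "$fq"

# Hypothetical helper: look only at sequence lines (line 2 of every
# 4-line record) and report how many contain the adapter substring.
adapter_fraction() {
  awk -v a="$1" '
    NR % 4 == 2 { n++; if (index($0, a)) hits++ }
    END { if (n) printf "%d of %d reads (%.2f%%)\n", hits, n, 100 * hits / n }
  ' "$2"
}

result=$(adapter_fraction GATCGGAAGAGC "$fq")
echo "$result"
rm -f "$fq"
```

Restricting the match to sequence lines avoids counting a hit in a header or quality line, which plain `grep -c` on the whole file cannot rule out.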

For adapter trimming, I suggest using skewer. In your case, you don't need to specify the adapter sequence, since it matches the default TruSeq3 adapter sequence.

Good luck!

03-09-2014, 10:40 PM   #3
paa6
Member
Location: south korea
Join Date: Feb 2014
Posts: 68

Thanks for the quick reply! I typed $ grep -c 'GATCGGAAGAGC' reads.fastq
and got 28875. What does this mean?
Also, I am doing an SE (single-end) assembly, while skewer is for PE (paired-end)...

Last edited by paa6; 03-09-2014 at 10:42 PM.

03-10-2014, 12:18 AM   #4
yueluo
Member
Location: Guangzhou China
Join Date: Aug 2013
Posts: 82

You can type grep --help for a brief description of OPTIONS for grep.

Quote:
-c, --count only print a count of matching lines per FILE
The result you got, 28875, means that 28875 lines (in practice, 28875 reads) contain the substring 'GATCGGAAGAGC', which is most likely adapter contamination.
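
To show which reads carry the adapter rather than only how many, something along these lines may help. It is a sketch under the assumption of a 4-line-per-record FASTQ; the sample reads and the `adapter_positions` helper are invented for illustration. It prints each matching read ID with the 1-based position where the adapter starts.

```shell
# Tiny made-up FASTQ so the sketch is self-contained.
fq=$(mktemp)
printf '%s\n' \
  '@r1' 'ACGTGATCGGAAGAGCAC'   '+' 'IIIIIIIIIIIIIIIIII' \
  '@r2' 'ACGTACGTACGTACGTAC'   '+' 'IIIIIIIIIIIIIIIIII' \
  '@r3' 'TTTTTTTTGATCGGAAGAGC' '+' 'IIIIIIIIIIIIIIIIIIII' > "$fq"

# Hypothetical helper: remember the read ID from the header line, then
# on the sequence line print the ID and where the adapter begins.
adapter_positions() {
  awk -v a="$1" '
    NR % 4 == 1 { id = substr($0, 2) }           # drop the leading "@"
    NR % 4 == 2 { p = index($0, a); if (p) print id "\t" p }
  ' "$2"
}

positions=$(adapter_positions GATCGGAAGAGC "$fq")
echo "$positions"
rm -f "$fq"
```

With 3' adapter read-through, the reported positions would be expected to vary from read to read, depending on how short each insert was.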

03-10-2014, 01:31 AM   #5
paa6
Member
Location: south korea
Join Date: Feb 2014
Posts: 68

Oh, OK, thanks!