**relipmoc** · 03-09-2014, 10:07 PM

Originally posted by paa6

I have an Illumina read file of bacterial DNA sequence. I assembled it with the Geneious software. During assembly I found vector contamination, which the software removed because I had enabled the trimming option, and I got 1,610 contigs.

Now I am performing the same assembly with Velvet. According to my FastQC report, the sequence duplication level is bad, and overrepresented sequences and k-mer content both show warnings (I have attached these three files). Based on the sequences listed under overrepresented sequences, I concluded that I have adapter contamination. I believe GATCGGAAGAGC is adapter contamination because I have seen it in the adapter files that Illumina provides to customers.

The problem is that my PI asked me to find that adapter contamination sequence in my reads, which I was not able to do. So he asked me why I can't find it. I am new to de novo assembly, I don't know what I am supposed to answer, and he gave me 1 hour to find it. Please help!!!

try

Code:

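# count the lines (reads) that contain the adapter fragment
$ grep -c 'GATCGGAAGAGC' reads.fastq
# total number of reads: line count divided by 4 (four lines per FASTQ record)
$ wc -l < reads.fastq | awk '{print $1/4}'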

Dividing the first number by the second gives an estimate of the fraction of reads that contain the adapter.
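The same estimate can be sketched as a single shell function. This is only an illustration: `adapter_ratio` is a hypothetical helper name, and it assumes a plain, uncompressed FASTQ with exactly four lines per record. Unlike a bare grep, it looks only at the sequence line of each record, so a chance match in a header or quality line is not counted.

```shell
# Hypothetical helper (not part of any tool): report how many reads
# contain an adapter substring.
# Assumes a 4-line-per-record FASTQ with unwrapped sequence lines.
adapter_ratio() {
    fastq="$1"
    adapter="${2:-GATCGGAAGAGC}"   # default: the fragment discussed in this thread
    awk -v a="$adapter" '
        NR % 4 == 2 {              # line 2 of each record is the sequence
            total++
            if (index($0, a) > 0) hits++
        }
        END {
            if (total == 0) { print "no reads found"; exit 1 }
            printf "%d of %d reads (%.2f%%)\n", hits, total, 100 * hits / total
        }
    ' "$fastq"
}
```

It would be called as `adapter_ratio reads.fastq`, or with a different adapter string as the second argument.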

For adapter trimming, I suggest using skewer. In your case, you don't need to specify the adapter sequence, since it's the same as the default TruSeq3 adapter sequence.

Good luck!

**paa6** · 03-09-2014, 10:40 PM

Thanks for the quick reply!! I typed $ grep -c 'GATCGGAAGAGC' reads.fastq
and I got 28875. What does this mean?
Also, I am doing SE assembly, while skewer is for PE...

**yueluo** · 03-10-2014, 12:18 AM

You can type grep --help for a brief description of OPTIONS for grep.

-c, --count only print a count of matching lines per FILE

The result you got was 28875, meaning that 28875 lines (and so roughly 28875 reads, since each read's sequence sits on one line) contained the substring 'GATCGGAAGAGC', which is most likely adapter contamination.
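To go one step beyond the count and see where the matches fall, a small tally of match positions can help. Adapter read-through normally follows the insert, so it tends to show up toward the 3' end of a read rather than at the start. The function below is a sketch under the same assumptions as grep on a plain 4-line FASTQ; `adapter_positions` is a hypothetical name, and it reports only the first occurrence in each read.

```shell
# Hypothetical helper: tally the 1-based start position of the adapter
# within each read that contains it.
# Assumes a 4-line-per-record FASTQ with unwrapped sequence lines.
adapter_positions() {
    fastq="$1"
    adapter="${2:-GATCGGAAGAGC}"
    awk -v a="$adapter" '
        NR % 4 == 2 {                  # sequence line of each record
            p = index($0, a)           # 0 when the adapter is absent
            if (p > 0) count[p]++
        }
        END { for (p in count) print p, count[p] }
    ' "$fastq" | sort -n               # output: "<position> <number of reads>"
}
```

Running `adapter_positions reads.fastq` prints one line per start position, followed by the number of reads in which the adapter begins there.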

**paa6** · 03-10-2014, 01:31 AM

ohh ok thanks!!!

how to see adapter contamination in Illumina reads