SEQanswers

05-23-2013, 07:46 AM   #1
reubennowell
Member
Location: Edinburgh
Join Date: Jan 2013
Posts: 18
First 20bp of MiSeq reads are weird

Hello,

The first 20 bases of my MiSeq reads show abnormal %A, T, G and C, as evidenced by the 'per base sequence content' tab of the FastQC report (see the attached PNG). The per base GC content is similarly weird, but the quality of these bases is good.
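For reference, the 'per base sequence content' plot boils down to tallying A, C, G and T at each read position across all reads. Below is a minimal Python sketch of that tally. It assumes uncompressed 4-line FASTQ records and ignores N and other non-ACGT characters when computing percentages. It is not FastQC's own code, and the function names are hypothetical:

```python
import itertools
from collections import Counter


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records.

    Assumes sequence and quality are each on a single line (no wrapping).
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\r\n") for line in itertools.islice(it, 4)]
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record: {record!r}")
        yield record[0], record[1], record[3]


def per_base_composition(sequences):
    """Return one {base: percent} dict per read position (A, C, G, T only)."""
    counts = []  # counts[i] holds the bases seen at position i (0-based)
    for seq in sequences:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())
            if base in "ACGT":  # N and other symbols are not counted
                counts[i][base] += 1
    composition = []
    for c in counts:
        total = sum(c.values())
        composition.append(
            {b: (100.0 * c[b] / total if total else 0.0) for b in "ACGT"}
        )
    return composition


# Example usage (hypothetical file name):
# with open("reads_R1.fastq") as fh:
#     comp = per_base_composition(seq for _, seq, _ in read_fastq(fh))
# for pos, pct in enumerate(comp[:25], start=1):
#     print(pos, {b: round(p, 1) for b, p in pct.items()})
```

Printing the first 25 positions side by side usually makes a 5' composition skew, like the one described here, easy to spot.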

The issue can be easily rectified by removing the first 20 bp of each read, but can anyone enlighten me as to what is causing this? I have used both CutAdapt and TagDust on these reads to get rid of adapter sequences. I thought it might be the Illumina barcodes, except that the barcode sequence is usually contained in the FASTQ header, like this:

Code:
@MVM-RI-I124161:11:000000000-A3985:1:1101:18249:1757 1:N:0:TAAGGCGANAGATCGC
And searching for this sequence (i.e. TAAGGCGANAGATCGC as above) doesn't reveal it to be at the start of the read.
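One way to run that check over a whole file is to pull the index field out of each header and test whether the read begins with it. The sketch below assumes the CASAVA 1.8-style header shown above, where the index sequence is the fourth colon-separated field after the space. It compares only the first `k` bases (default 8, an assumed index length) and treats N as a wildcard. All names here, including `k`, are hypothetical:

```python
import itertools


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\r\n") for line in itertools.islice(it, 4)]
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record: {record!r}")
        yield record[0], record[1], record[3]


def index_from_header(header):
    """Return the index sequence from a CASAVA 1.8-style header, or None.

    Expected form: '@<instrument>:...:<y> <read>:<filtered>:<control>:<index>'
    """
    parts = header.split()
    if len(parts) < 2:
        return None
    fields = parts[1].split(":")
    return fields[3] if len(fields) >= 4 and fields[3] else None


def starts_with_index(seq, index, k=8):
    """True if seq begins with the first k bases of index (N matches anything)."""
    prefix = index[:k].upper()
    head = seq[: len(prefix)].upper()
    if len(head) < len(prefix):
        return False
    return all(a == b or "N" in (a, b) for a, b in zip(head, prefix))


def fraction_starting_with_index(records, k=8):
    """Fraction of records whose sequence starts with the index in its own header."""
    total = hits = 0
    for header, seq, _qual in records:
        index = index_from_header(header)
        if index is None:
            continue  # header has no index field; skip it
        total += 1
        if starts_with_index(seq, index, k):
            hits += 1
    return hits / total if total else 0.0


# Example usage (hypothetical file name):
# with open("reads_R1.fastq") as fh:
#     print(fraction_starting_with_index(read_fastq(fh)))
```

A fraction near zero would be consistent with the observation that the barcode is not what sits at the start of the reads.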

What is it?? And what's the best way of dealing with it? Simply chop the first 20bp off my reads or is it something that requires a bit more QC?

Thanks!
[Attached image: fastqc.png, FastQC 'per base sequence content' plot]
05-23-2013, 10:40 AM   #2
Bukowski
Senior Member
Location: Aberdeen, Scotland
Join Date: Jan 2010
Posts: 388

Is that RNA-Seq by any chance?

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2896536/
05-23-2013, 11:00 AM   #3
mastal
Senior Member
Location: uk
Join Date: Mar 2009
Posts: 667
First 20bp of MiSeq reads are weird

We are seeing this in most of our Illumina data too, and we aren't doing RNA-Seq.

I think the reasons have been discussed in previous threads, if I can find one of the discussions I'll post the link.
05-23-2013, 11:08 AM   #4
reubennowell
Member
Location: Edinburgh
Join Date: Jan 2013
Posts: 18

Thanks guys,

Nope, not RNA-Seq - this is bacterial genomic DNA. Mastal, can you remember what you did to account for it? Trim the first X bases off the 5' end? In my datasets, it seems to disappear completely after base #20.
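To pin down the cut-off from the data rather than by eye, one option is to scan the per-position composition from the 5' end and report where it settles. The sketch below assumes a FastQC-style criterion: a position counts as biased when |A-T| or |G-C| exceeds 10 percentage points. It also requires `window` consecutive unbiased positions before calling the composition stable. Both thresholds are assumptions to tune, and the function names are hypothetical:

```python
from collections import Counter


def per_base_composition(sequences):
    """Return one {base: percent} dict per read position (A, C, G, T only)."""
    counts = []
    for seq in sequences:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())
            if base in "ACGT":
                counts[i][base] += 1
    return [
        {b: (100.0 * c[b] / sum(c.values()) if sum(c.values()) else 0.0) for b in "ACGT"}
        for c in counts
    ]


def is_biased(pct, threshold=10.0):
    """True if |A-T| or |G-C| at this position exceeds threshold percentage points."""
    return abs(pct["A"] - pct["T"]) > threshold or abs(pct["G"] - pct["C"]) > threshold


def suggested_trim(composition, threshold=10.0, window=5):
    """Number of 5' bases to drop so the remaining start is compositionally stable.

    Returns the 0-based start of the first run of `window` consecutive
    unbiased positions (0 means nothing to trim), or None if no such run exists.
    """
    run = 0
    for i, pct in enumerate(composition):
        if is_biased(pct, threshold):
            run = 0
        else:
            run += 1
            if run == window:
                return i - window + 1
    return None


# Example usage, with seqs being an iterable of read sequences:
# print(suggested_trim(per_base_composition(seqs)))
```

Requiring a short run of clean positions, rather than stopping at the first one, keeps a single noisy position inside the biased region from ending the scan too early.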
05-23-2013, 11:23 AM   #5
mastal
Senior Member
Location: uk
Join Date: Mar 2009
Posts: 667
First 20bp of MiSeq reads are weird

I just remove those 20 or so bases from the start of the reads.
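Removing a fixed number of 5' bases means cutting both the sequence and the quality string so the two stay the same length. cutadapt, already used on these reads, can do this with its `-u`/`--cut` option (for example `-u 20`). The Python sketch below does the same thing by hand, assuming uncompressed 4-line FASTQ input. The function names and file paths are hypothetical:

```python
import itertools


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\r\n") for line in itertools.islice(it, 4)]
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record: {record!r}")
        yield record[0], record[1], record[3]


def trim_5prime(records, n):
    """Drop the first n bases and matching quality scores from each record.

    Reads shorter than n become empty rather than being discarded.
    """
    for header, seq, qual in records:
        yield header, seq[n:], qual[n:]


def format_fastq(header, seq, qual):
    """Render one record as a 4-line FASTQ string with a bare '+' separator."""
    return f"{header}\n{seq}\n+\n{qual}\n"


def trim_fastq_file(in_path, out_path, n=20):
    """Write a copy of in_path with n bases removed from the 5' end of each read."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for record in trim_5prime(read_fastq(src), n):
            dst.write(format_fastq(*record))


# Example (hypothetical paths):
# trim_fastq_file("sample_R1.fastq", "sample_R1.trim20.fastq", n=20)
```

Because no records are dropped, R1 and R2 files trimmed this way stay in the same order, so paired-end tools can still match mates.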
05-23-2013, 11:26 AM   #6
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 6,976

This is not an abnormal observation. Generally you should not need to trim this data.

Is this a known genome or an unknown one? If known, you can take a few reads and map them. There should be no problems getting this data to map.

http://seqanswers.com/forums/showthread.php?t=11843
http://seqanswers.com/forums/showthread.php?t=17219

Last edited by GenoMax; 05-23-2013 at 11:33 AM.
05-23-2013, 11:57 AM   #7
JackieBadger
Senior Member
Location: Halifax, Nova Scotia
Join Date: Mar 2009
Posts: 381

I've seen something similar at ~9 bases in one or two amplicon libraries.
I didn't delve too deep into what the cause was, I just knew that it had to go.
05-23-2013, 01:28 PM   #8
nickloman
Senior Member
Location: Birmingham, UK
Join Date: Jul 2009
Posts: 356

If this is from libraries made with Nextera it may represent biases in incorporation sites favoured by the transposase. We see this phenomenon frequently and don't find it necessary to trim it.

All times are GMT -8.