SEQanswers




Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Illumina SNP format to merlin format | evoll | Bioinformatics | 0 | 05-05-2011 02:39 AM
What's your general BWA settings for 75bp SOLEXA reads? | skblazer | Bioinformatics | 1 | 12-28-2010 04:00 PM
data format from illumina solexa | zhuz | Illumina/Solexa | 4 | 12-21-2010 11:52 AM
problem withe Illumina solexa sequencing | g781 | Illumina/Solexa | 3 | 05-18-2010 10:05 AM
illumina GA2, 75bp paired-end data | yasutake | Bioinformatics | 0 | 02-02-2009 06:39 PM

08-28-2009, 12:20 AM   #1
anyone1985
Member
 
Location: Shanghai, China

Join Date: Mar 2009
Posts: 67
Illumina solexa 75bp format problem

Every read ends with 22 Ns and I don't know why. Can anyone explain?

@HWI-EAS241:5:1:10:83#0/1
GCCCCGTCCATCACTTCTGCGATGCCGCGAATGCCCAATGGCAAGCCGNCGGGNNNNNNNNNNNNNNNNNNNNNN
+HWI-EAS241:5:1:10:83#0/1
[a``_`X_O\Q\YQ[Z\O[a\WXNXZZBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-EAS241:5:1:10:1808#0/1
TGCTGCGGCCCAATGGAGCCACGTTGCCCTGGTGCTTGCCCTTGGGATNGTGGNNNNNNNNNNNNNNNNNNNNNN
+HWI-EAS241:5:1:10:1808#0/1
[aaaaaaa\UX_aaa\U__`a`a`a_^Ua``P\a_aa_\TWa`BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-EAS241:5:1:10:1866#0/1
TGGCCGCCTGCGTCACGCCGATTGTCAGCGCCGTGGGCCATGAAACCGNCGTGNNNNNNNNNNNNNNNNNNNNNN
08-28-2009, 03:28 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

There is something else funny in those records: the spaces in both the sequences and the quality strings. Are those spaces real, a cut & paste corruption, or a quirk of the forum editor?

Last edited by maubp; 08-28-2009 at 05:11 AM. Reason: fixed typo
08-28-2009, 05:08 AM   #3
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178

When you say "every read" do you literally mean EVERY read? Is it the entire flow cell, one lane, or part of a lane? Did anything happen to the instrument between cycles 53 and 54, such as reagents being refilled or the software being restarted?
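One quick way to check whether it really is every read is to tally the length of the trailing N run for each record. This is a minimal pure-Python sketch, not part of any pipeline: the function name trailing_n_counts is made up for illustration, and it assumes plain four-line FASTQ records with no wrapped sequence lines.

```python
import io


def trailing_n_counts(handle):
    """Tally reads by the length of the run of Ns at their 3' end.

    Assumes plain four-line FASTQ records (no wrapped sequence lines).
    Returns (total_reads, {trailing_run_length: number_of_reads}).
    """
    counts = {}
    total = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines between records
        seq = handle.readline().strip()
        handle.readline()  # '+' line, ignored
        handle.readline()  # quality line, ignored
        run = len(seq) - len(seq.rstrip("N"))
        counts[run] = counts.get(run, 0) + 1
        total += 1
    return total, counts


# Tiny in-memory example; for a real file pass open("reads.fastq") instead.
example = io.StringIO(
    "@r1\nACGTNCGGNNNN\n+\nIIIIIIIIBBBB\n"
    "@r2\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"
)
print(trailing_n_counts(example))  # prints (2, {4: 1, 0: 1})
```

If the dictionary comes back with a single key of 22 whose count equals the total, then it really is every read.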
08-31-2009, 04:35 AM   #4
anyone1985
Member
 
Location: Shanghai, China

Join Date: Mar 2009
Posts: 67

Yes, every read. I used Velvet to assemble the genome, and I don't know whether the Ns will affect the assembly. Should I remove the Ns first?
08-31-2009, 06:49 AM   #5
dcjamison
Member
 
Location: Cincinnati

Join Date: Oct 2008
Posts: 15

The only time an N gets put into the sequence is when the base caller cannot match a cluster in the current tile. Typically this happens at the edge, when clusters "wander" on and off the image. Given that your read quality went kaput in the last 20-odd bases, I would guess one of the reagents ran out or was bad (most likely the incorporation mix) and you got no cluster illumination.
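The quality strings tell the same story. Assuming this run uses the Illumina 1.3+/1.5 encoding (Phred+64, which characters like `a` and `[` suggest), each character decodes to ord(char) - 64, so `B` is Q2; as far as I know the 1.5 pipeline uses a trailing run of `B` to flag the unreliable end of a read. A small sketch (the helper names are my own, not from any library) to decode a quality string and find where the `B` run starts:

```python
def phred64_scores(qual):
    """Decode a Phred+64 (Illumina 1.3+/1.5 style) quality string to integers."""
    return [ord(ch) - 64 for ch in qual]


def b_tail_start(qual):
    """Return the 1-based cycle where a trailing run of 'B' (Q2) begins,
    or None if the quality string does not end in 'B'."""
    kept = qual.rstrip("B")
    if len(kept) == len(qual):
        return None
    return len(kept) + 1


print(phred64_scores("[a`B"))    # prints [27, 33, 32, 2]
print(b_tail_start("aa_`BBBB"))  # prints 5
```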

You do need to trim the Ns out before you put the sequences into Velvet. The easiest way is probably to rerun GERALD with the USE_BASES parameter set to Y52n*.
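For anyone unfamiliar with the mask: my reading of the USE_BASES syntax is that `Y` means use a cycle, `n` means ignore it, a number repeats the preceding letter and `*` repeats it to the end of the read, so `Y52n*` on a 75-cycle read keeps cycles 1-52 and drops 53-75. The helpers below (expand_use_bases, apply_mask) are hypothetical illustrations of that interpretation, not GERALD's own code, and they ignore the comma-separated per-read form used for paired-end masks.

```python
import re


def expand_use_bases(mask, read_length):
    """Expand a USE_BASES-style mask (e.g. 'Y52n*') into one flag per cycle.

    Hypothetical helper: 'Y' = use, 'n' = ignore, a number repeats the
    preceding letter, '*' repeats it to the end of the read.
    """
    flags = []
    for letter, count in re.findall(r"([Yn])(\d+|\*)?", mask):
        if count == "*":
            repeat = read_length - len(flags)
        elif count:
            repeat = int(count)
        else:
            repeat = 1
        flags.extend(letter * repeat)
    if len(flags) != read_length:
        raise ValueError("mask covers %d cycles but the read has %d"
                         % (len(flags), read_length))
    return flags


def apply_mask(seq, qual, mask):
    """Keep only the bases and quality characters whose cycle is flagged 'Y'."""
    flags = expand_use_bases(mask, len(seq))
    seq_kept = "".join(b for b, f in zip(seq, flags) if f == "Y")
    qual_kept = "".join(q for q, f in zip(qual, flags) if f == "Y")
    return seq_kept, qual_kept


read = "ACGT" * 13 + "N" * 23  # 52 called bases + 23 Ns = 75 cycles
seq, qual = apply_mask(read, "I" * 75, "Y52n*")
print(len(seq), seq.count("N"))  # prints 52 0
```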

Edit: it does occur to me that the N followed by 4 called bases **might** indicate a laser issue. Highly unlikely, in my opinion, but you might want to discuss it with your FAS.

Last edited by dcjamison; 08-31-2009 at 06:53 AM.
08-31-2009, 07:02 AM   #6
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

If you just want to edit the FASTQ file directly, here is a tiny Biopython script that does it (it keeps only the first 52 bases of each read):

Code:
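from Bio import SeqIO

# Keep just the first 52 bases (and their quality scores) of each read.
in_handle = open("original.fastq")
trimmed = (rec[:52] for rec in SeqIO.parse(in_handle, "fastq"))
out_handle = open("trimmed.fastq", "w")
SeqIO.write(trimmed, out_handle, "fastq")
out_handle.close()
in_handle.close()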
That should work on Biopython 1.51 or later (and probably 1.50 from memory).
All times are GMT -8.