SEQanswers

10-06-2010, 03:00 AM   #1
graham90978
Junior Member
 
Location: Here and there...

Join Date: Oct 2010
Posts: 6
A tad bit confused...

Hi guys, I am new to this site and have a few questions. I have been given a file of sequence reads from Illumina, s_7_sequence.txt, to attempt a de novo assembly (as a way of training me). I thought my reads were in FASTQ format, but when I ran Velvet it said they weren't. The only file format Velvet seems to accept for this file is gerald format. But when I then run velvetg, it says there are no nodes. I think this may be because the file is not really in FASTQ format. My sequences look like this:

HWI-EASXXX_0012:7:1:0:883#CGATGT/1:NACACATACAACACACACAACACACACAACACACAACACACACAA
CACACAACACACACACCACACACACCACAAAACACACAAC: DMZZZVZZZZWZZZZZZZVZZXZXZZZXZZZXZZZXZXZ
XXZZZZZZZRWZZVVXZVZXZZXXNHIUZVVZWXZZWNWXZWRXVV

Would anyone know of a script to convert this file format to FASTQ? Or do you think some other error is stopping Velvet from working on the file?
Thanks in advance, folks!
10-06-2010, 08:54 AM   #2
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

FASTQ has 4 basic lines to it

A line with the ID on it, starting with @
A line with the sequence on it, with no prefix
A line with the ID on it, starting with +
A line with the quality values on it, with no prefix

Yup, there's redundancy in the ID lines; I didn't design this, but I assume it was meant as a check.
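To make the four-line layout above concrete, here is a minimal Perl sketch that tests whether four consecutive lines form a plausible FASTQ record. The helper name is_fastq_record is hypothetical (it is not part of Velvet or any library), and the equal-length check between sequence and qualities is an extra sanity test added for this sketch. The record in the demo is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if the four lines form a plausible FASTQ record, 0 otherwise.
# (is_fastq_record is a hypothetical helper name, used only for this sketch.)
sub is_fastq_record {
    my ($id, $seq, $plus, $qual) = @_;
    return 0 if grep { !defined $_ } ($id, $seq, $plus, $qual);
    s/\r?\n\z// for ($id, $seq, $plus, $qual);        # strip line endings
    return 0 unless $id   =~ /^\@\S/;                 # line 1: @ followed by the ID
    return 0 unless $seq  =~ /^[ACGTN.]+$/i;          # line 2: bases, no prefix
    return 0 unless $plus =~ /^\+/;                   # line 3: + (the ID may follow)
    return 0 unless length($qual) == length($seq);    # line 4: one quality char per base
    return 1;
}

# Demo on a made-up 8-base record (illustrative only, not real data).
my @example = ("\@read1/1\n", "NACACATA\n", "+read1/1\n", "DMZZZVZZ\n");
print is_fastq_record(@example) ? "looks like FASTQ\n" : "not 4-line FASTQ\n";
```

Feeding it the first four lines of s_7_sequence.txt would be a quick way to confirm whether the file is in the 4-line layout at all.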

It's hard to tell where the line breaks are in your data (next time, post with the CODE tags to preserve them), but if it is in 2-line pairs, as it appears, then the code would be:

Code:
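#!/usr/bin/perl
# Convert 2-line records (ID:sequence, then a quality line) to 4-line FASTQ.
# Usage (script name is arbitrary): perl to_fastq.pl s_7_sequence.txt > s_7_sequence.fastq
use strict;
use warnings;
while (my $lineA = <>)   # read the file named on the command line, or STDIN
{
  chomp $lineA;
  # the ID runs up to the /1 or /2 read suffix; the sequence follows the colon
  my ($id, $seq) = ($lineA =~ /(.*\/[12]):(.*)/)
    or die "Unexpected record at line $.: $lineA\n";
  my $lineB = <>;        # quality line
  die "Missing quality line for $id\n" unless defined $lineB;
  print "\@$id\n";
  print "$seq\n";        # the capture excludes the newline, so add it back
  print "+$id\n";
  print $lineB;          # quality info, newline still attached
}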
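Since the line breaks in the posted data are uncertain, the reads may instead sit on a single line each, as ID:SEQUENCE:QUALITIES. That layout is an assumption, not something the post confirms. A minimal Perl sketch for that case follows. The helper name one_line_to_fastq is hypothetical, and the demo read is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn one "ID:SEQUENCE:QUALITIES" line into a 4-line FASTQ record.
# Returns undef if the line does not fit that (assumed) layout.
# (one_line_to_fastq is a hypothetical helper name, used only for this sketch.)
sub one_line_to_fastq {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    # The ID itself contains colons, so anchor on the /1 or /2 read suffix,
    # then take the bases and the quality string as the last two fields.
    my ($id, $seq, $qual) = $line =~ m{^(.*/[12]):([ACGTN.]+):\s*(\S+)\s*\z}i
        or return;
    return if length($seq) != length($qual);   # expect one quality char per base
    return "\@$id\n$seq\n+$id\n$qual\n";
}

# Demo on a made-up 8-base read (illustrative only, not real data).
print one_line_to_fastq("HWI-EASXXX_0012:7:1:0:883#CGATGT/1:NACACATA:DMZZZVZZ\n");
```

To convert a whole file, call the helper on each line read with `while (my $line = <>)` and print every defined result. Lines that return undef are worth inspecting by hand before trusting the output.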