SEQanswers

03-30-2011, 11:00 PM #1
Seta
Member
Location: Europe
Join Date: Mar 2011
Posts: 14
ABySS input

Hello,
I'm new to the bioinformatics side of life and have started using ABySS for assembly. However, I have an input file that is not in FASTA format.
For example, the lines look like this:

@HWI-EAS313_0005:1:1:1158:9100#0/1
TCGATAGGCCGTGGACAGNGCTGACCGTAGGGGTGGGCTGNGNNNNNNTANGTACGTGNCTGGGTGTACCGAATANNCNT
+HWI-EAS313_0005:1:1:1158:9100#0/1
cbdddccddcece^c_ZbBb``b``cdddddcdad[`Vb^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB


The second line holds the sequence and the fourth line the quality scores.
I wondered whether there is a command-line option that lets ABySS read this file directly (and therefore include the quality scores in the analysis, as they are quite important), or whether it is easier to convert this .txt file to FASTA. In the latter case, does anyone know a quick way to do that?
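For the second option, a minimal Python sketch of a FASTQ-to-FASTA conversion is shown below. It assumes unwrapped four-line records (header, sequence, '+' line, qualities) like the example above, and the function names fastq_to_fasta and convert_file are hypothetical. Note that FASTA has no field for quality scores, so this conversion throws away exactly the information you want to keep.

```python
import gzip
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines from four-line FASTQ records; quality lines are dropped."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines, e.g. at the end of the file
        try:
            seq = next(it).rstrip("\n")
            plus = next(it)
            next(it)  # quality line: FASTA has no place for it
        except StopIteration:
            raise ValueError(f"truncated record after {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"not a FASTQ record: {header!r}")
        yield ">" + header[1:]  # FASTA headers start with '>' instead of '@'
        yield seq


def convert_file(in_path: str, out_path: str) -> None:
    """Convert a FASTQ file (gzip-compressed if it ends in .gz) to FASTA."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        for line in fastq_to_fasta(src):
            dst.write(line + "\n")
```

Usage would be something like convert_file("name.fastq.gz", "name.fa"), with file names of your choosing.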

Thanks

03-30-2011, 11:56 PM #2
ffinkernagel
Senior Member
Location: Marburg, Germany
Join Date: Oct 2009
Posts: 110

My ABySS README states:
"in specifies the input files to read, which may be in FASTA, FASTQ, qseq or export format and compressed with gz, bz2 or xz."

Since your data is FASTQ, you can pass it directly to ABySS.

So long

03-31-2011, 09:42 AM #3
Seta
Member
Location: Europe
Join Date: Mar 2011
Posts: 14

When I try to run ABySS, I get the following error:

error: Expected either `>' or `@' or 11 fields
and saw `' and 1 fields near
??|Ms_7_sequence.txt?[?v?ʎ}??я?z?%yJ?.ža<I?T?T??H??4??툢?????

What should I do if my data are already in FASTQ?
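The wording of this error hints at how the input is classified: a record starting with '>' is read as FASTA, one starting with '@' as FASTQ, and a tab-separated line with 11 fields as Illumina qseq. The Python sketch below imitates that check purely for illustration; it is an assumption based on the message, not ABySS's actual code, and sniff_record is a hypothetical name. Fed the raw bytes of a gzip file, it cannot report FASTA or FASTQ, because a gzip stream begins with the byte 0x1f rather than '>' or '@'.

```python
import gzip


def sniff_record(line: str) -> str:
    """Guess the read format of one input line (illustrative heuristic only)."""
    if line.startswith(">"):
        return "fasta"
    if line.startswith("@"):
        return "fastq"
    if len(line.rstrip("\n").split("\t")) == 11:
        return "qseq"  # Illumina qseq: 11 tab-separated fields per read
    return "unknown"


# The same FASTQ record, seen decompressed and seen as raw gzip bytes.
record = b"@read1\nACGT\n+\nIIII\n"
raw = gzip.compress(record, mtime=0)  # mtime=0 keeps the bytes reproducible
print(sniff_record(record.decode("ascii")))  # fastq
print(sniff_record(raw.decode("latin-1")))   # first char is '\x1f', so never fasta/fastq
```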

Thanks

03-31-2011, 11:16 PM #4
ffinkernagel
Senior Member
Location: Marburg, Germany
Join Date: Oct 2009
Posts: 110

'??|Ms_7_sequence.txt?' isn't FASTQ; that looks more like a tarred and gzipped file. Try uncompressing it with tar -xf myfastq.tgz
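One way to test that guess is to look at the first bytes of the file. Any gzip stream, whether or not it wraps a tar archive, starts with the bytes 1f 8b, and per RFC 1952 the header can also store the original file name (the FNAME field) as plain text. That would fit the readable "s_7_sequence.txt" inside the garbage, and would suggest ABySS was parsing the compressed bytes directly. The Python sketch below, with the hypothetical name gzip_original_name, checks the magic number and extracts that field; it is a diagnostic aid, not part of ABySS.

```python
from typing import Optional

GZIP_MAGIC = b"\x1f\x8b"
FEXTRA, FNAME = 0x04, 0x08  # header flag bits defined in RFC 1952


def gzip_original_name(data: bytes) -> Optional[str]:
    """Return the original file name stored in a gzip header, or None if absent.

    Raises ValueError if data does not start with the gzip magic number.
    """
    if data[:2] != GZIP_MAGIC:
        raise ValueError("not gzip data")
    flags = data[3]
    pos = 10  # the fixed part of the header is 10 bytes long
    if flags & FEXTRA:
        xlen = int.from_bytes(data[pos:pos + 2], "little")
        pos += 2 + xlen  # skip the optional extra field
    if not flags & FNAME:
        return None
    end = data.index(b"\x00", pos)  # FNAME is zero-terminated Latin-1 text
    return data[pos:end].decode("latin-1")


# Usage (file name as in this thread):
#   with open("name.fastq.gz", "rb") as f:
#       print(gzip_original_name(f.read(1024)))
```

If the stored name ends in .tar, the file is a tarball and tar -xf applies; if it is a plain name such as s_7_sequence.txt, it is a single gzip-compressed text file.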

04-01-2011, 12:55 AM #5
Seta
Member
Location: Europe
Join Date: Mar 2011
Posts: 14

Hi,

The file is a .txt.gz file. ABySS doesn't read the unzipped .txt file, and it gives the above-mentioned error if I use the .txt.gz file. I also thought the file extension might be the problem... Is there a way to give the .txt file a .fasta extension?

Thanks

04-01-2011, 01:07 AM #6
ffinkernagel
Senior Member
Location: Marburg, Germany
Join Date: Oct 2009
Posts: 110

There's a good chance the file extension is the whole problem. I don't know whether ABySS just detects the format from the file extension or actually analyzes the first few bytes.

You can simply rename the file with "mv name.txt.gz name.fastq.gz" in a standard shell...
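Since it is unclear whether the extension or the content matters, it may help to check both. A minimal Python sketch, assuming only the standard magic numbers for the three compressions the README lists (gzip 1f 8b, bzip2 "BZh", xz fd 37 7a 58 5a 00); the function names are hypothetical.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of the compressed formats the README mentions.
MAGIC = {
    "gz": b"\x1f\x8b",
    "bz2": b"BZh",
    "xz": b"\xfd7zXZ\x00",
}


def compression_from_bytes(head: bytes) -> str:
    """Name the compression implied by the leading bytes ('none' if unrecognised)."""
    for name, magic in MAGIC.items():
        if head.startswith(magic):
            return name
    return "none"


def compression_from_name(path: str) -> str:
    """Name the compression implied by the last file extension."""
    suffix = Path(path).suffix.lstrip(".")
    return suffix if suffix in MAGIC else "none"


def check(path: str) -> bool:
    """True if the extension and the actual content agree."""
    with open(path, "rb") as f:
        head = f.read(6)
    return compression_from_bytes(head) == compression_from_name(path)
```

Renaming only changes the name, never the bytes; and since name.txt.gz and name.fastq.gz both end in .gz, their compression suffix agrees with gzip content either way.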

04-01-2011, 02:46 AM #7
Seta
Member
Location: Europe
Join Date: Mar 2011
Posts: 14

I keep getting the same error, although I have changed the extension the way you said (mv name.txt.gz name.fastq.gz).
The error is still:
error: Expected either `>' or `@' or 11 fields
and saw `' and 1 fields near
??|Ms_7_sequence.txt?[?v?ʎ}??я?z?%yJ?.ža<I?T?T??H??4??툢?????
Any other ideas? Has anybody else encountered this problem?
Cheers

04-01-2011, 03:15 AM #8
ffinkernagel
Senior Member
Location: Marburg, Germany
Join Date: Oct 2009
Posts: 110

When you said your files look like
"@HWI-EAS313_0005:1:1:1158:9100#0/1
TCGATAGGCCGTGGACAGNGCTGACCGTAGGGGTGGGCTGNGNNNNNNTANGTACGTGNCTGGGTGTACCGAATANNCNT
+HWI-EAS313_0005:1:1:1158:9100#0/1
cbdddccddcece^c_ZbBb``b``cdddddcdad[`Vb^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"

How did you view them?

04-01-2011, 03:27 AM #9
Seta
Member
Location: Europe
Join Date: Mar 2011
Posts: 14

As it was a .txt file, I opened it as such and viewed it.

04-01-2011, 03:50 AM #10
ffinkernagel
Senior Member
Location: Marburg, Germany
Join Date: Oct 2009
Posts: 110

Many editors silently decompress gzipped data, but ABySS should be able to read it as well; at least, that's what its documentation says.

"??|Ms_7_sequence.txt?[?v?ʎ}??я?z?%yJ?.ža<I?T?T??H??4??툢?????"
Doesn't look like
"@HWI-EAS313_0005:1:1:1158:9100#0/1" at all though.

OK, how about this: run
"gzip -cd name.fastq.gz | head"
and show me the output.

04-01-2011, 04:19 AM #11
Seta
Member
Location: Europe
Join Date: Mar 2011
Posts: 14

The output is:

@HWI-ST538_0098:7:1:13334:2000#NNNNNN/1
AGNTCACCAATCTCAACGTGGAGTTCTCCGCTAAGGACCCTTTCTNNCGTCAGTCAACTGTGTGGAAACTTGATGGATCGAGGAAGGAGGGAATTGTCAC
+HWI-ST538_0098:7:1:13334:2000#NNNNNN/1
X\B\\cccccggggggggggggdggfffffgfgfggfgggbbbbcBB_][][_]_fcgbgbfafeVbadd\ebeeeeeffdffeegfbggfXfdc^Xddd
@HWI-ST538_0098:7:1:17263:1999#NNNNNN/1
CTNTAAGCAGTGGTATCAACGCAGAGTACGGGGGGGTTCCTCACANNGTTGACGCTCTTTCGTCTACGGGAGAACGCTATAGCTCTGGGGAACATCTAAA
+HWI-ST538_0098:7:1:17263:1999#NNNNNN/1
XVBWU]^^]Zd`ddb^eeeddeeeeddebebddddc^ZcBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-ST538_0098:7:1:19073:1999#NNNNNN/1
AGNTATTTGCAAAATCTGAAAGAGTTCAAAGGAAACGCTTCTCATNNAGAGAAGAGGAAAGCCATATAAAGATACAACCACGCTCTATATGTCTCCTTTA

What does it mean? It is just part of the file, right?

04-01-2011, 04:28 AM #12
ffinkernagel
Senior Member
Location: Marburg, Germany
Join Date: Oct 2009
Posts: 110

Yes: gzip -d decompresses, and -c says "write to standard output"; the | passes that output on to head, which prints the first few lines of its input (10 by default).
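The same pipeline (decompress to a stream, keep the first 10 lines) can be sketched in Python as below; head_gz is a hypothetical name. It reads lazily, so only the start of the file is decompressed, and it decodes as Latin-1 so that stray non-ASCII bytes cannot make it fail.

```python
import gzip
from itertools import islice
from typing import List


def head_gz(path: str, n: int = 10) -> List[str]:
    """Return the first n lines of a gzip-compressed text file (like gzip -cd | head)."""
    with gzip.open(path, "rt", encoding="latin-1") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]
```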

So you have a FASTQ file, at least at the beginning, but what ABySS reads doesn't look like FASTQ.

You can try to uncompress the file first (gzip -d name.fastq.gz gives you name.fastq and makes name.fastq.gz disappear; gzip name.fastq does the reverse), but I suspect the problem is somewhere else.
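If you would rather keep the compressed original while trying the uncompressed route, the Python sketch below streams the decompressed data into a new file and leaves the .gz untouched; gunzip_keep is a hypothetical name. (gzip 1.6 and later also accept -k/--keep for this, but check the manual of the version you have.)

```python
import gzip
import shutil


def gunzip_keep(src: str, dst: str) -> None:
    """Decompress src into dst, leaving src in place (unlike plain gzip -d)."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # copies in chunks, so large files are fine
```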

How are you calling ABySS (the exact command line)?
Is this a paired end run?

04-01-2011, 04:45 AM #13
Seta
Member
Location: Europe
Join Date: Mar 2011
Posts: 14

It is a single-end run, and the command line I use (exactly as the README says) is:
ABYSS -k15 name.fastq.gz -o name_contig.fastq

04-01-2011, 04:57 AM #14
ffinkernagel
Senior Member
Location: Marburg, Germany
Join Date: Oct 2009
Posts: 110

Stubborn problem, ain't it?

Your call looks fine, the beginning of your fastq file looks fine.
Let's try to invalidate the hypothesis that there's something wrong with it.
What do
gzip -cd name.fastq.gz | grep "Ms_7_sequence.txt"
and gzip -cd name.fastq.gz | tail
output?
(If you currently have the file uncompressed, you can
run 'tail name.fastq' and 'grep "Ms_7_sequence.txt" name.fastq'
instead.)
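Both checks can also be done in a single pass over the decompressed stream, so a large file is only decompressed once. A minimal Python sketch; scan_gz is hypothetical, the search is a plain substring test (comparable to grep -F, whereas plain grep treats the '.' in the pattern as "any character"), and the tail is kept in a fixed-size deque.

```python
import gzip
from collections import deque
from typing import Deque, List, Tuple


def scan_gz(path: str, needle: str, tail_n: int = 10) -> Tuple[List[int], List[str]]:
    """Return (line numbers containing needle, last tail_n lines) in one pass."""
    hits: List[int] = []
    tail: Deque[str] = deque(maxlen=tail_n)  # old lines fall off the front
    with gzip.open(path, "rt", encoding="latin-1") as handle:
        for lineno, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if needle in line:
                hits.append(lineno)
            tail.append(line)
    return hits, list(tail)
```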

Last edited by ffinkernagel; 04-01-2011 at 04:58 AM. Reason: Apparently pushed submit before finishing my last sentence

04-01-2011, 09:35 AM #15
Seta
Member
Location: Europe
Join Date: Mar 2011
Posts: 14

Yeah, it's quite something, but it's a good learning opportunity. I'm glad that you want to help me.

If I use the line
gzip -cd name.fastq.gz | tail

it gives me:
+HWI-ST538_0098:7:66:21272:200778#NNNNNN/1
_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-ST538_0098:7:66:21298:200868#NNNNNN/1
TGNTGGCGGTGGTTTTTGGGGGGGGTGGGGGGTGTTTGGTGGGGGGTTGGGGGGGTGTTTTTTGTGGTTGTTTTGGTTTGGGTGTGGGGTTGGTTGTTGT
+HWI-ST538_0098:7:66:21298:200868#NNNNNN/1
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@HWI-ST538_0098:7:66:21274:200885#NNNNNN/1
GTNTGGGTGGGTTGGTTGGGGTGTTGGGGTGGGGGTGGCGTTTTCTGGGGAGGGTTGGGGGTTTTGGGTTGTAGGGTGTTGGTTTGGGGTGGAGGGGGTG
+HWI-ST538_0098:7:66:21274:200885#NNNNNN/1
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

This looks similar to the beginning of the file... so there must be something wrong in the middle. Maybe the number of 'BBBBB's? That is a measure of low quality, right?
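On the quality question: the characters in these reads (lowercase letters, long runs of 'B') look like the Illumina 1.5+ encoding, where each character is a Phred score plus 64 and 'B' (ASCII 66, i.e. Q2) serves as the Read Segment Quality Control Indicator, flagging the rest of the read as unreliable. That is an inference from the data shown, not something established in this thread. The Python sketch below decodes such a string and measures the trailing 'B' run; both function names are hypothetical.

```python
from typing import List

PHRED64_OFFSET = 64  # Illumina 1.3+/1.5+ style encoding (assumed for this data)


def decode_phred64(qual: str) -> List[int]:
    """Convert a Phred+64 quality string to integer scores."""
    return [ord(ch) - PHRED64_OFFSET for ch in qual]


def trailing_b_run(qual: str) -> int:
    """Length of the run of 'B' (Q2) characters at the 3' end of the read."""
    return len(qual) - len(qual.rstrip("B"))
```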

04-01-2011, 09:57 AM #16
ffinkernagel
Senior Member
Location: Marburg, Germany
Join Date: Oct 2009
Posts: 110

Yeah, that's what the line with grep is supposed to check: whether the string ABySS complains about is in there.
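grep only looks for that one string. To look more generally for something wrong in the middle of the file, a sketch that walks every four-line record and reports the first one breaking the basic FASTQ layout might look like this. first_bad_record is a hypothetical name, and the sketch assumes unwrapped four-line records like those shown in this thread.

```python
import gzip
from itertools import islice
from typing import Optional, Tuple


def first_bad_record(path: str) -> Optional[Tuple[int, str]]:
    """Return (line number, reason) for the first malformed record, or None."""
    with gzip.open(path, "rt", encoding="latin-1") as handle:
        lineno = 0
        while True:
            rec = [line.rstrip("\n") for line in islice(handle, 4)]
            if not rec:
                return None  # clean end of file
            start = lineno + 1  # line number of this record's header
            lineno += len(rec)
            if len(rec) < 4:
                return start, "truncated record at end of file"
            header, seq, plus, qual = rec
            if not header.startswith("@"):
                return start, "header does not start with '@'"
            if not plus.startswith("+"):
                return start + 2, "separator line does not start with '+'"
            if len(seq) != len(qual):
                return start, "sequence and quality lengths differ"
```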

04-03-2011, 11:18 PM #17
Seta
Member
Location: Europe
Join Date: Mar 2011
Posts: 14

How long should that take, hours or just minutes? Mine has already been busy for an hour...

04-04-2011, 12:16 AM #18
ffinkernagel
Senior Member
Location: Marburg, Germany
Join Date: Oct 2009
Posts: 110

Well, for a 1.2 GB file (compressed), my machine needed about 3 minutes (transferring it over a gigabit network) to run the 'gzip -cd | grep' stanza.

04-04-2011, 01:39 AM #19
Seta
Member
Location: Europe
Join Date: Mar 2011
Posts: 14

I don't get any output at all if I use the line:
gzip -cd s_7_sequence.fastq.gz | grep "Ms_7_sequence.txt"
But I had already suspected that there wouldn't be a line with "Ms_7_sequence.txt"...

04-04-2011, 01:58 AM #20
ffinkernagel
Senior Member
Location: Marburg, Germany
Join Date: Oct 2009
Posts: 110

Yes, no output is what you would get if there was no 'Ms_7_sequence.txt' in there.

Curiouser and curiouser. Have you tried passing the un-gzipped file (gzip -d name.fastq.gz) to ABySS?

All times are GMT -8.