SEQanswers

08-22-2012, 10:55 AM   #1
joseph
Member
 
Location: ca

Join Date: Feb 2008
Posts: 39
tagdust output is not fastq!

Hi,
Can you please help me with the output of tagdust? It doesn't look like proper FASTQ format.
Thanks,
Joseph Dhahbi

Code:
tagdust contaminants.fasta myfile.fastq -fdr 0.05 -o myfile_tagdust.fastq
Code:
head -10 myfile.fastq
@HS1:202:C0UY5ACXX:4:1101:1485:2099 1:N:0:ACAGTG
NAAGATAGTTATGAAACAGAAGATGAAAGTTCCTGGGATAATGTTGAGTTAGGAGACTACACTACACAGGCCATAGAAGATGAAACCTATAGTGATATTA
+HS1:202:C0UY5ACXX:4:1101:1485:2099 1:N:0:ACAGTG
#1:A1?>DFF?FC>GICEFGEHBHHIHGG?EAEFHIIBFHEGIFHBCHBFHEGEHIIDF>BBHDHGIGGCGHGDCEC>EFF?B@@@EDDDCD>CDDD<>C
@HS1:202:C0UY5ACXX:4:1101:1458:2150 1:N:0:ACAGTG
CTAGAATAGGATTGCGCTGTTATCCCTAGGGTAACTTGTTCCGTTGGGCAAGTTATTGGATCAATTGAGTATAGTAGTTCGCTTTGACTGGTGAAGTCTT
+HS1:202:C0UY5ACXX:4:1101:1458:2150 1:N:0:ACAGTG
@<@FFADFGGFFFHJIIIIFHEEHGDD@EBA1?DGHIJAGGIJEHGG(93;BHGGCHIFA:==CEH>CC?>B@>A>??)6;A;BBA@AAC<@C>CAC>A;
@HS1:202:C0UY5ACXX:4:1101:1748:2155 1:Y:0:ACAGTG
CTCCGGAGGGACCCTCCAGCTGTGATGAAGGCCCAGAGCCCCATGGAAACAGTGACTGCATAGATGGGCAGCAGGCTGGCCTCATACCAGGGGTTCAGGA
Code:
head -10 myfile_tagdust.fastq
@HS1:202:C0UY5ACXX:4:1101:1485:2099 1:N:0:ACAGTG
NAAGATAGTTATGAAACAGAAGATGAAAGTTCCTGGGATAATGTTGAGTTAGGAGACTAC
ACTACACAGGCCATAGAAGATGAAACCTATAGTGATATTA
+HS1:202:C0UY5ACXX:4:1101:1485:2099 1:N:0:ACAGTG
#1:A1?>DFF?FC>GICEFGEHBHHIHGG?EAEFHIIBFHEGIFHBCHBFHEGEHIIDF>
BBHDHGIGGCGHGDCEC>EFF?B@@@EDDDCD>CDDD<>C
@HS1:202:C0UY5ACXX:4:1101:1458:2150 1:N:0:ACAGTG
CTAGAATAGGATTGCGCTGTTATCCCTAGGGTAACTTGTTCCGTTGGGCAAGTTATTGGA
TCAATTGAGTATAGTAGTTCGCTTTGACTGGTGAAGTCTT
+HS1:202:C0UY5ACXX:4:1101:1458:2150 1:N:0:ACAGTG
08-22-2012, 12:36 PM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

The output may well be valid FASTQ, but it appears to use line wrapping (the sequence and quality strings are split after 60 characters), which is not recommended. Some tools assume exactly four lines per record (for speed), and line wrapping breaks them. Other tools have more robust (but slightly slower) FASTQ parsers that cope with it.
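To make the four-lines-per-record assumption concrete, here is a minimal Python sketch of the kind of check a strict reader effectively performs. The function name `looks_like_four_line_fastq` and the `max_records` limit are made up for illustration; this is not how any particular tool is implemented.

```python
from itertools import islice


def looks_like_four_line_fastq(lines, max_records=1000):
    """Return True if the first records follow the strict 4-line layout.

    `lines` is any iterable of text lines (for example an open file).
    Hypothetical helper for illustration only.
    """
    it = iter(lines)
    seen = 0
    while seen < max_records:
        # Take the next four lines and treat them as one record.
        record = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not record:
            break  # clean end of input
        if len(record) < 4:
            return False  # truncated record
        header, seq, plus, qual = record
        # In a wrapped file the third line is more sequence, not '+'.
        if not header.startswith("@") or not plus.startswith("+"):
            return False
        if len(seq) != len(qual):
            return False
        seen += 1
    return seen > 0
```

On a wrapped file like the TagDust output above, the third line of the first record is the rest of the sequence rather than the `+` line, so the check fails straight away.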

You could try running this line-wrapped FASTQ through EMBOSS seqret, Biopython, or a similar tool that accepts line-wrapped FASTQ input and writes the usual unwrapped output with four lines per record.
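If neither tool is at hand, the unwrapping step can be sketched in plain Python. This is a standard-library sketch under stated assumptions, not the parser used by EMBOSS or Biopython; the names `unwrap_fastq` and `format_four_line` are hypothetical. It relies on the sequence length to decide how many quality characters to read, because a wrapped quality line may itself start with `@` or `+`.

```python
def unwrap_fastq(lines):
    """Yield (header, sequence, quality) from possibly line-wrapped FASTQ.

    `lines` is any iterable of text lines. Hypothetical helper for
    illustration; assumes every record has a non-empty sequence.
    """
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        # Collect sequence lines until the '+' separator line.
        seq_parts = []
        for line in it:
            if line.startswith("+"):
                break
            seq_parts.append(line)
        else:
            raise ValueError("record ended before '+' line")
        seq = "".join(seq_parts)
        # Read quality lines until they cover the whole sequence.
        qual_parts = []
        qual_len = 0
        while qual_len < len(seq):
            try:
                line = next(it)
            except StopIteration:
                raise ValueError("quality string is truncated") from None
            qual_parts.append(line)
            qual_len += len(line)
        qual = "".join(qual_parts)
        if len(qual) != len(seq):
            raise ValueError("quality and sequence lengths differ")
        yield header, seq, qual


def format_four_line(records):
    """Format records as unwrapped FASTQ text, four lines per record."""
    for header, seq, qual in records:
        # A bare '+' is used here; the repeated header after '+' is dropped.
        yield f"{header}\n{seq}\n+\n{qual}\n"
```

To convert a file, pass an open input handle to `unwrap_fastq` and write the strings from `format_four_line` to the output handle. For real data, a maintained parser such as the ones mentioned above is the safer choice.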
08-23-2012, 05:37 AM   #3
SES
Senior Member
 
Location: Vancouver, BC

Join Date: Mar 2010
Posts: 275

I took a look at main.c in tagdust and saw this at line 52:

Code:
	if(!param->linewrap){
		linewrap = 0;
	}else{
		linewrap = 1;
	}
which suggested there was an option to control this behavior. Sure enough, after you compile the program you can see that there is an option to print the sequences on one line (the -s option, specifically).

Code:
$ ./tagdust
TagDust version 1.13, Copyright (C) 2009 Timo Lassmann <timolassmann@gmail.com>

Usage: tagdust [options]  lib.fa read1.fa read2.fa ...
	

	Options:
	-f, -fdr	False discovery rate (default: 0.01)
	-o <file>	print clean tags to file.
	-a <file>	print artifactual tags to file.
	-trim5 <X>	trim 'X' residues from the start of all reads.
	-trim3 <X>	trim 'X' residues from the end of all reads.
	-fasta		output format is fasta.
	-s, -singleline	sequences are written in a single line.
	-q, -quiet	quite mode

	Identifies tags as artifactual sequences if they match to library sequences.
	Library sequences must be in fasta; tag sequences in either fasta or fastq format.
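Applied to the command from the first post, that would mean adding `-s`. A small Python sketch that assembles such a command line is shown here; the helper name `tagdust_single_line_command` is made up, the file names are the ones from the first post, and the options are placed before the files as in the usage line above (whether TagDust accepts other orderings is not settled in this thread).

```python
import shlex


def tagdust_single_line_command(library, reads, clean_out, fdr=0.05):
    """Build a TagDust argument list that requests single-line output.

    Hypothetical helper; it only uses flags listed in the usage message
    (-s, -fdr, -o) and does not run anything.
    """
    return ["tagdust", "-s", "-fdr", str(fdr), "-o", clean_out, library, reads]


argv = tagdust_single_line_command(
    "contaminants.fasta", "myfile.fastq", "myfile_tagdust.fastq"
)
# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(argv))
```

The list can also be handed to `subprocess.run(argv, check=True)` once `tagdust` is on the PATH.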