SEQanswers

Old 10-30-2012, 10:08 AM   #1
mscholz
Member
 
Location: Los alamos

Join Date: May 2010
Posts: 13
Newbler with fastq from sff

I know this is an odd request, but here goes:

I want to extract an sff from an 8 kb library into paired fastq files, and then assemble those fastqs with newbler (I know, it's not recommended, but bear with me). Thus far, no matter how I do this, I have not been able to get newbler to treat these reads as paired ends. I have tried the workarounds I can find for illumina PE data (#0/1 #0/1 as trailing suffixes), as well as any other tricks I can figure out.

Does anyone have any advice on this?

Thanks!

=matt
Old 10-30-2012, 11:14 AM   #2
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356

I'm not sure you'll be able to do this (as in, not sure if you can trick Newbler into using FASTQ for scaffolding). I might be wrong. Why do you want to do this?

But assuming it is possible, have you checked that the orientation of the reads is correct for Illumina paired-end data? E.g. you might need to check that the first read is forward and the second is reverse.

Also it's usually /1 /2 as suffixes.
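
For the orientation check, a minimal Python sketch of flipping one mate is below. It assumes the second read of each pair needs to be reverse-complemented to get an inward-facing pair; which orientation your split reads actually come out in is something to verify on your own data. The function names are made up for illustration.

```python
# Hypothetical helpers: reverse-complement a read and keep its quality
# string aligned with the flipped bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def flip_record(seq, qual):
    """Flip a FASTQ record: reverse-complement the bases, reverse the qualities."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return reverse_complement(seq), qual[::-1]
```

The quality string is only reversed, not complemented, so each score stays attached to its base.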
Old 10-30-2012, 12:01 PM   #3
mscholz
Member
 
Location: Los alamos

Join Date: May 2010
Posts: 13

So, we HAVE used illumina PE data for newbler using /1 /2 suffixes, and it's perfectly happy to treat them as paired reads. I'm not sure if newbler is also looking for some other character (illumina read names have a ton of colons in them, e.g.) in the naming to make it check for pairs.

From what I've read orientation for paired fastq needs to be inward facing, not sure if that orientation is generated for our set, but that would break out during the assembly, not during the read QC, which is where the reporting of # of paired reads used from each library is read in.....
Old 10-30-2012, 01:11 PM   #4
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356

Quote:
Originally Posted by mscholz
So, we HAVE used illumina PE data for newbler using /1 /2 suffixes, and it's perfectly happy to treat them as paired reads. I'm not sure if newbler is also looking for some other character (illumina read names have a ton of colons in them, e.g.) in the naming to make it check for pairs.
Yes, I have too. The newer header format (Illumina 1.8+) with colons isn't compatible; here's a blog post about how to convert to the older format:

http://contig.wordpress.com/2011/09/...-fastq-header/
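
As a rough illustration of that kind of conversion (not the blog post's own script), here is a Python sketch that rewrites an Illumina 1.8+ header into an older-style name ending in #index/read. Joining instrument, run and flowcell with underscores is an assumption, chosen to mirror the HWI-0001_0001_AAAAAA:1:1:1:1#ATCACG/1 pattern that appears in this thread; the function name is hypothetical.

```python
import re

# Illumina 1.8+ header layout:
#   @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
NEW_STYLE = re.compile(
    r"^@([^:\s]+):([^:\s]+):([^:\s]+):(\d+):(\d+):(\d+):(\d+)"
    r"\s+([12]):[YN]:\d+:(\S*)$"
)

def to_old_style(header):
    """Convert a 1.8+ header line to an older-style '#index/read' name.

    Headers that do not match the 1.8+ layout are returned unchanged.
    """
    header = header.strip()
    match = NEW_STYLE.match(header)
    if match is None:
        return header
    inst, run, cell, lane, tile, x, y, read, index = match.groups()
    index = index or "0"  # assumption: use 0 when no index sequence is present
    return f"@{inst}_{run}_{cell}:{lane}:{tile}:{x}:{y}#{index}/{read}"
```

Only the header line of each four-line FASTQ record would be passed through this; the sequence, "+" and quality lines stay as they are.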

Quote:
From what I've read orientation for paired fastq needs to be inward facing, not sure if that orientation is generated for our set, but that would break out during the assembly, not during the read QC, which is where the reporting of # of paired reads used from each library is read in.....
OK, maybe you can post the top bit of your 454NewblerMetrics.txt file as this helps with debugging. The other possibility is that your sequences are too short.

I note you didn't say why you wanted to do this. Just to say that if you wanted to use Newbler with SFF files from another - competing - platform with different mate-pair linker sequences, then there is a hacky way of doing this.
Old 10-30-2012, 03:07 PM   #5
mscholz
Member
 
Location: Los alamos

Join Date: May 2010
Posts: 13

Wait wait wait!

These are native 454 sffs that are being extracted from 454 runs. I haven't bothered letting a run go to completion for a while, since the first outputs during qc to the command line are whether read sets are being treated as paired or not.

The reasoning is convoluted, but involves using multiple library sets into newbler, of which the 8kB library may or may not be 454 data, so our informatics team would prefer to have the pipeline stable regardless of the sequencing type for the 8kb library. That means that all sffs have to go to fastqs for this to work. As far as I can tell, the method you sent the link to works great for illumina data into newbler, but doesn't seem to work in any permutation for extracted and adapter-split 454 reads.

If you really think that the length of the 454 subreads may be causing the problem I can size filter and try again, but it really looks to my unpracticed eye that it has something to do with the headers.
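
A size filter of that kind could be sketched in Python roughly as follows. It assumes reads are named with /1 and /2 suffixes and that a pair should be dropped whenever either mate is too short or missing; the min_len threshold is a placeholder, since no minimum length for newbler is given here.

```python
from collections import defaultdict

def filter_pairs_by_length(records, min_len):
    """Keep only complete pairs in which both mates have at least min_len bases.

    records: iterable of (name, seq, qual) tuples, names ending in /1 or /2.
    Returns the surviving records, mate 1 followed by mate 2 for each pair.
    """
    by_pair = defaultdict(dict)
    for name, seq, qual in records:
        base, _, mate = name.rpartition("/")
        by_pair[base][mate] = (name, seq, qual)

    kept = []
    for mates in by_pair.values():
        complete = set(mates) == {"1", "2"}
        if complete and all(len(rec[1]) >= min_len for rec in mates.values()):
            kept.extend([mates["1"], mates["2"]])
    return kept
```

Filtering per pair rather than per read keeps the two files (or the interleaved file) in sync, so no orphaned mates are left behind.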

Once I have a completed run, I'll be happy to post the metrics file. In the meantime, this is what the headers currently look like:

sff -> fastq edit (read truncated for visibility)
Quote:
@HK0J9ML02GGB5G#0/1
CGCGAGGAAATACGGTCGACGCGGGCGGCGATCAC
+
88?=444<;;9698<<[email protected]@???ABB==
@HK0J9ML02IA4J3#0/1
Old 10-31-2012, 02:29 AM   #6
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Newbler will not check for the linker in fastq files, so you'll have to provide the forward and reverse reads separately - this much I think you already know. You could use either newbler itself to split the reads (best option), or the sff_extract tool. For the first case, run newbler with your 8kb sff file and the -tr flag, and look for the 454TrimmedReads.fna and qual files.

For mate pairs, I recommend using the -p option to force newbler to treat them as such. I don't think it will work with fastq files, even when set up correctly.
The headers need to be adjusted as per http://contig.wordpress.com/2011/01/...her-platforms/, or when you run sff_extract as per http://flxlexblog.wordpress.com/2012...substr-mg1655/
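
If the trimmed reads are then needed as fastq, a minimal Python sketch for merging a .fna file with its matching .qual file is below. It assumes both files list the same reads in the same order, that the qual file holds whitespace-separated integer Phred scores, and that Phred+33 encoding is wanted; the function names are illustrative. Header renaming is not handled here.

```python
def read_fasta_like(lines):
    """Yield (name, payload_lines) from FASTA-style or QUAL-style lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, chunks
            # Keep only the read name, dropping any description after a space.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, chunks

def fna_qual_to_fastq(fna_lines, qual_lines):
    """Yield FASTQ records (Phred+33) built from paired .fna and .qual lines."""
    pairs = zip(read_fasta_like(fna_lines), read_fasta_like(qual_lines))
    for (name, seq_chunks), (qname, qual_chunks) in pairs:
        if name != qname:
            raise ValueError(f"read order differs: {name} vs {qname}")
        seq = "".join(seq_chunks)
        scores = [int(q) for q in " ".join(qual_chunks).split()]
        if len(scores) != len(seq):
            raise ValueError(f"length mismatch for read {name}")
        # Cap at 93 so every score maps to a printable ASCII character.
        quals = "".join(chr(min(q, 93) + 33) for q in scores)
        yield f"@{name}\n{seq}\n+\n{quals}\n"
```

Both inputs can be open file handles, so a typical use would pass open("454TrimmedReads.fna") and the matching qual file and write each yielded record to the output.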
Old 10-31-2012, 02:47 AM   #7
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356

Yes, for my own edification I just tried this:

test.fastq
Code:
@test1/1
ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@test1/2
ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@test2/1
ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@test2/2
ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
Code:
runAssembly test.fastq
Created assembly project directory P_2012_10_31_10_52_07_runAssembly
1 read file successfully added.
    test.fastq  (Fastq dataset, with standard scores)
Doesn't work ... but if you fake up Illumina headers (test2.fastq):

Code:
@HWI-0001_0001_AAAAAA:1:1:1:1#ATCACG/1
ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@HWI-0001_0001_AAAAAA:1:1:1:1#ATCACG/2
ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@HWI-0001_0001_AAAAAA:1:1:1:2#ATCACG/1
ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@HWI-0001_0001_AAAAAA:1:1:1:2#ATCACG/2
ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
Code:
runAssembly test2.fastq
Created assembly project directory P_2012_10_31_10_53_56_runAssembly
1 read file successfully added.
    test.fastq  (Illumina paired-end dataset, with standard scores)
Assembly computation starting at: Wed Oct 31 10:53:56 2012  (v2.6 (20110517_1502))
Indexing test.fastq (with quality scores)...
  -> 4 reads, 304 bases, 4 marked as matepairs.
It does ...

If you go back to the first version and add -p as Lex says, it does add it as a paired-end dataset, but doesn't mark the reads as mate-pairs:

Code:
runAssembly -p test.fastq
Created assembly project directory P_2012_10_31_10_55_01_runAssembly
1 read file successfully added as explicit paired-end files.
    test.fastq  (Fastq paired-end dataset, with standard scores)
Assembly computation starting at: Wed Oct 31 10:55:01 2012  (v2.6 (20110517_1502))
Indexing test.fastq (with quality scores)...
  -> 4 reads, 304 bases.
Interesting!
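
To apply the same trick to reads named like HK0J9ML02GGB5G#0/1, the headers could be rewritten along these lines. This is only a sketch: the prefix and index values are copied from the faked headers in the test above, the function names are made up, and it assumes every read name ends in /1 or /2. The test above only shows the reads being marked as matepairs at indexing, so whether real split 454 mates then scaffold correctly is not shown.

```python
def fake_illumina_name(pair_number, mate,
                       prefix="HWI-0001_0001_AAAAAA", index="ATCACG"):
    """Build an old-style Illumina-looking header for one mate of a pair."""
    return f"@{prefix}:1:1:1:{pair_number}#{index}/{mate}"

def rename_fastq(lines):
    """Yield FASTQ lines with each header replaced by a faked Illumina name.

    lines: iterable of FASTQ lines in strict four-line records.
    Both mates of a pair receive the same pair number.
    """
    pair_numbers = {}
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 != 0:
            yield line  # sequence, '+' and quality lines pass through
            continue
        name = line[1:].split()[0]
        base, _, mate = name.rpartition("/")
        if not base or mate not in ("1", "2"):
            raise ValueError(f"read name lacks a /1 or /2 suffix: {name}")
        number = pair_numbers.setdefault(base, len(pair_numbers) + 1)
        yield fake_illumina_name(number, mate)
```

It would be worth keeping a table of original to new names (the pair_numbers mapping) so that contigs and scaffolds can be traced back to the original reads.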
Old 10-31-2012, 02:54 AM   #8
nickloman
Senior Member
 
Location: Birmingham, UK

Join Date: Jul 2009
Posts: 356

Quote:
Originally Posted by mscholz
The reasoning is convoluted, but involves using multiple library sets into newbler, of which the 8kB library may or may not be 454 data, so our informatics team would prefer to have the pipeline stable regardless of the sequencing type for the 8kb library.
Separately, I'd suggest always using SFF files with Newbler when they are available due to the additional signal information contained in the flowgrams. Tends to give better results.

Tags
454, mate-pair, newbler

All times are GMT -8.