
**nickloman** · 10-30-2012, 11:14 AM

I'm not sure you'll be able to do this (as in, not sure if you can trick Newbler into using FASTQ for scaffolding). I might be wrong. Why do you want to do this?

But assuming it is possible, have you checked that the orientation of the reads is correct for Illumina paired-end data? For example, you might need to check that the first read is forward and the second is reverse.

Also it's usually /1 /2 as suffixes.
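A quick way to check the suffix convention before handing a file to Newbler is to count how many read names form complete /1 and /2 pairs. The sketch below is only an illustration: the function names are made up for this example, and it assumes plain 4-line FASTQ records (no wrapped sequence lines). It looks at names only and does not check read orientation.

```python
import re
from collections import defaultdict

# Matches a read name ending in /1 or /2 (the older Illumina-style pair suffix).
PAIR_SUFFIX = re.compile(r"^(?P<base>.+)/(?P<mate>[12])$")


def iter_fastq_names(lines):
    """Yield read names (without the leading '@') from 4-line FASTQ records."""
    for i, line in enumerate(lines):
        # Assumption: every record is exactly 4 lines, so headers sit at 0, 4, 8, ...
        if i % 4 == 0 and line.strip():
            yield line.strip().split()[0][1:]


def pairing_report(lines):
    """Count complete pairs, orphaned mates, and names without a /1 or /2 suffix."""
    mates = defaultdict(set)
    unsuffixed = 0
    for name in iter_fastq_names(lines):
        m = PAIR_SUFFIX.match(name)
        if m:
            mates[m.group("base")].add(m.group("mate"))
        else:
            unsuffixed += 1
    complete = sum(1 for seen in mates.values() if seen == {"1", "2"})
    return {
        "pairs": complete,
        "orphans": len(mates) - complete,
        "unsuffixed": unsuffixed,
    }
```

Calling `pairing_report(open("reads.fastq"))` (the file name is a placeholder) returns the three counts as a dict, which makes it easy to spot a file where the mates were never labelled.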

**mscholz** · 10-30-2012, 12:01 PM

So, we HAVE used Illumina PE data with Newbler using /1 /2 suffixes, and it's perfectly happy to treat them as paired reads. I'm not sure if Newbler is also looking for some other character in the naming (Illumina read names have a ton of colons in them, for example) before it checks for pairs.

From what I've read, paired FASTQ reads need to be inward facing. I'm not sure that is the orientation our set has, but a problem there would show up during the assembly, not during the read QC, which is where the number of paired reads used from each library is reported.

**nickloman** · 10-30-2012, 01:11 PM

Originally posted by mscholz:

> So, we HAVE used Illumina PE data with Newbler using /1 /2 suffixes, and it's perfectly happy to treat them as paired reads. I'm not sure if Newbler is also looking for some other character in the naming (Illumina read names have a ton of colons in them, for example) before it checks for pairs.

Yes, I have too. The newer header format (Illumina 1.8+) with colons isn't compatible; here's a blog post about how to convert to the older format:

Newbler input III: a quick fix for the new Illumina fastq header

http://contig.wordpress.com/2011/09/01/newbler-input-iii-a-quick-fix-for-the-new-illumina-fastq-header/

One unfortunate drawback of working with Illumina sequences is the many changes to the format of their fastq readfiles. The quality scoring has been changed several times since the first Solexa rea…
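For illustration, a header conversion along these lines could be sketched in Python as follows. This is not the blog post's method: the target layout (instrument, run and flowcell joined with underscores, then `lane:tile:x:y#index/read`) is an assumption modelled on the faked-up header used in the test later in this thread, and `to_old_style` is a hypothetical helper name.

```python
import re

# Illumina 1.8+ style header:
#   @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
NEW_HEADER = re.compile(
    r"^@(?P<inst>[^:\s]+):(?P<run>[^:\s]+):(?P<fc>[^:\s]+):"
    r"(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"\s+(?P<read>[12]):[YN]:\d+:(?P<index>\S*)$"
)


def to_old_style(header):
    """Rewrite a 1.8+ header as name#index/read; leave other headers unchanged."""
    m = NEW_HEADER.match(header.strip())
    if m is None:
        return header.strip()
    fields = m.groupdict()
    # Older headers use 0 when there is no index sequence.
    fields["index"] = fields["index"] or "0"
    # Assumed target layout, modelled on the test header in this thread.
    return "@{inst}_{run}_{fc}:{lane}:{tile}:{x}:{y}#{index}/{read}".format(**fields)
```

Only header lines (every fourth line, starting with the first) should be passed through the function; quality lines can also begin with `@`.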

> From what I've read, paired FASTQ reads need to be inward facing. I'm not sure that is the orientation our set has, but a problem there would show up during the assembly, not during the read QC, which is where the number of paired reads used from each library is reported.

OK, maybe you can post the top bit of your 454NewblerMetrics.txt file as this helps with debugging. The other possibility is that your sequences are too short.

I note you didn't say why you wanted to do this. Just to say that if you wanted to use Newbler with SFF files from another - competing - platform with different mate-pair linker sequences, then there is a hacky way of doing this.

**mscholz** · 10-30-2012, 03:07 PM

Wait wait wait!

These are native 454 SFFs that are being extracted from 454 runs. I haven't bothered letting a run go to completion for a while, since the first output to the command line during QC already shows whether read sets are being treated as paired or not.

The reasoning is convoluted, but involves feeding multiple library sets into Newbler, of which the 8 kb library may or may not be 454 data, so our informatics team would prefer to have the pipeline stable regardless of the sequencing type for the 8 kb library. That means that all SFFs have to be converted to FASTQ for this to work. As far as I can tell, the method you linked works great for Illumina data into Newbler, but doesn't seem to work in any permutation for extracted and adapter-split 454 reads.

If you really think that the length of the 454 subreads may be causing the problem I can size filter and try again, but it really looks to my unpracticed eye that it has something to do with the headers.
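If a size filter is worth trying, it has to drop both mates together, otherwise the /1 and /2 reads go out of step. A minimal sketch, assuming 4-line FASTQ records with /1 and /2 suffixes; `filter_pairs_by_length` and the threshold are hypothetical, and the thread does not say what minimum length Newbler needs:

```python
def filter_pairs_by_length(lines, min_len):
    """Keep only complete pairs in which both mates have at least min_len bases."""
    lines = [l.rstrip("\n") for l in lines if l.strip()]
    # Assumption: each record is exactly 4 lines (header, sequence, '+', quality).
    records = [lines[i:i + 4] for i in range(0, len(lines), 4)]

    # Group records by read name with the /1 or /2 suffix removed.
    by_base = {}
    for rec in records:
        name = rec[0][1:].split()[0]
        base = name[:-2] if name.endswith(("/1", "/2")) else name
        by_base.setdefault(base, []).append(rec)

    kept = []
    for recs in by_base.values():
        # Unpaired reads and pairs with a short mate are both dropped.
        if len(recs) == 2 and all(len(r[1]) >= min_len for r in recs):
            for r in recs:
                kept.extend(r)
    return kept
```

The function returns the surviving records as a flat list of lines, ready to be written back out with `"\n".join(...)`.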

Once I have a completed run, I'll be happy to post the metrics file. In the meantime, this is what the headers currently look like:

sff ->fastq edit (read truncated for visibility)

@HK0J9ML02GGB5G#0/1
CGCGAGGAAATACGGTCGACGCGGGCGGCGATCAC
+
88?=444<;;9698<<8444??444422@@???ABB==
@HK0J9ML02IA4J3#0/1

**flxlex** · 10-31-2012, 02:29 AM

Newbler will not check for the linker in FASTQ files, so you'll have to provide the forward and reverse reads separately - this much I think you already know. You could use either Newbler itself to split the reads (best option), or the sff_extract tool. For the first case, run Newbler with your 8 kb SFF file and the -tr flag, and look for the 454TrimmedReads.fna and qual files.

For mate pairs, I recommend using the -p option to force Newbler to treat them as such. I don't think it will work with FASTQ files, even when set up correctly.
The headers need to be adjusted as per http://contig.wordpress.com/2011/01/...her-platforms/, or when you run sff_extract as per http://flxlexblog.wordpress.com/2012...substr-mg1655/
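If the trimmed reads are to go back in as FASTQ, the .fna and .qual files need to be merged first. The sketch below assumes the usual layout for such files (FASTA-style sequences, and a matching file of space-separated Phred scores under the same `>` names), and writes Sanger-style qualities (Phred+33). The function names are hypothetical, and the read names are passed through untouched, so any header adjustment still has to happen afterwards.

```python
def parse_fasta_like(lines):
    """Yield (name, body_lines) for FASTA-style text; name is the first word after '>'."""
    name, body = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, body
            name, body = line[1:].split()[0], []
        else:
            body.append(line)
    if name is not None:
        yield name, body


def fna_qual_to_fastq(fna_lines, qual_lines):
    """Merge sequence and quality records into a list of 4-line FASTQ lines."""
    quals = {
        name: [int(q) for chunk in body for q in chunk.split()]
        for name, body in parse_fasta_like(qual_lines)
    }
    out = []
    for name, body in parse_fasta_like(fna_lines):
        seq = "".join(body)
        scores = quals[name]
        if len(scores) != len(seq):
            raise ValueError("length mismatch for read " + name)
        # Encode each Phred score as a single character with an offset of 33.
        out.extend(["@" + name, seq, "+", "".join(chr(q + 33) for q in scores)])
    return out
```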

**nickloman** · 10-31-2012, 02:47 AM

Yes, for my own edification I just tried this:

test.fastq

Code:

@test1/1
ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@test1/2
ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@test2/1
ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@test2/2
ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

Code:

runAssembly test.fastq
Created assembly project directory P_2012_10_31_10_52_07_runAssembly
1 read file successfully added.
    test.fastq  (Fastq dataset, with standard scores)

Doesn't work ... but if you fake up Illumina headers (test2.fastq):

Code:

@HWI-0001_0001_AAAAAA:1:1:1:1#ATCACG/1
ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@HWI-0001_0001_AAAAAA:1:1:1:1#ATCACG/2
ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@HWI-0001_0001_AAAAAA:1:1:1:2#ATCACG/1
ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@HWI-0001_0001_AAAAAA:1:1:1:2#ATCACG/2
ATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

Code:

runAssembly test2.fastq
Created assembly project directory P_2012_10_31_10_53_56_runAssembly
1 read file successfully added.
    test.fastq  (Illumina paired-end dataset, with standard scores)
Assembly computation starting at: Wed Oct 31 10:53:56 2012  (v2.6 (20110517_1502))
Indexing test.fastq (with quality scores)...
  -> 4 reads, 304 bases, 4 marked as matepairs.

It does ...

If you go back to the first version and add -p as Lex says, it does add it as a paired-end dataset, but doesn't mark the reads as mate-pairs:

Code:

runAssembly -p test.fastq
Created assembly project directory P_2012_10_31_10_55_01_runAssembly
1 read file successfully added as explicit paired-end files.
    test.fastq  (Fastq paired-end dataset, with standard scores)
Assembly computation starting at: Wed Oct 31 10:55:01 2012  (v2.6 (20110517_1502))
Indexing test.fastq (with quality scores)...
  -> 4 reads, 304 bases.

Interesting!
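Given that result, one thing to try with SFF-derived reads is renaming them so the headers have the same shape as the faked-up Illumina ones. The sketch below does that for names like `@HK0J9ML02GGB5G#0/1`. Everything about the target format is an assumption copied from test2.fastq above (the `HWI-0001_0001_AAAAAA` prefix, the `ATCACG` index, using the last coordinate as a pair counter), and whether Newbler then marks real 454-derived reads as mate pairs is untested here. `fake_illumina_headers` is a hypothetical helper name.

```python
import re

# Read names such as @HK0J9ML02GGB5G#0/1 or @test1/1 (the '#index' part is optional).
OLD_NAME = re.compile(r"^@(?P<base>[^\s#/]+)(?:#(?P<index>[^/\s]*))?/(?P<mate>[12])$")


def fake_illumina_headers(lines, prefix="HWI-0001_0001_AAAAAA", index="ATCACG"):
    """Rename /1 and /2 reads to Illumina-looking headers; return (lines, name map)."""
    out, pair_ids = [], {}
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Only header lines (every 4th line) are candidates; quality lines may start with '@'.
        m = OLD_NAME.match(line) if i % 4 == 0 else None
        if m:
            # Both mates of a pair share one counter value, so their names still match.
            pair_no = pair_ids.setdefault(m.group("base"), len(pair_ids) + 1)
            line = "@{}:1:1:1:{}#{}/{}".format(prefix, pair_no, index, m.group("mate"))
        out.append(line)
    return out, pair_ids
```

The second return value maps each original read name to its counter, so the original 454 names can be recovered after assembly.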

**nickloman** · 10-31-2012, 02:54 AM

Originally posted by mscholz:

> The reasoning is convoluted, but involves feeding multiple library sets into Newbler, of which the 8 kb library may or may not be 454 data, so our informatics team would prefer to have the pipeline stable regardless of the sequencing type for the 8 kb library.

Separately, I'd suggest always using SFF files with Newbler when they are available due to the additional signal information contained in the flowgrams. Tends to give better results.


Newbler with fastq from sff
