SEQanswers

Old 11-19-2014, 12:19 PM   #1
ssully
Member
 
Location: NYC

Join Date: Aug 2010
Posts: 48
de novo assembly including Illumina and 454 paired-end reads

There are a bunch of hybrid de novo assembly threads, but I'm not sure I'm finding the answers to these questions, specifically about paired-end reads from two platforms (Illumina and 454).

454 'paired-end' reads are really mate-pair ends from a 3-20 kb fragment: the two ends are swapped right to left in the read, both remain in 5'-3' orientation (thanks to circularization), and they are separated by a linker sequence.

5' right>-----[linker]left>------- 3'

During assembly, Newbler splits these reads and also does quality trimming. It 'knows' that X bp should occur between the two half-reads.
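
To make the read layout above concrete, here is a minimal Python sketch of splitting one 454 paired-end read at its linker. It is only an illustration, not how Newbler does it: the `LINKER` sequence and the function name `split_at_linker` are made up for the example, and real tools tolerate mismatches in the linker and trim by quality as well, which this exact-match version does not.

```python
# Hypothetical linker sequence, for illustration only.
# Substitute the linker that matches your library kit.
LINKER = "TTTTGGGGCCCCAAAA"


def split_at_linker(seq, linker=LINKER):
    """Split a 454 paired-end read into its two half-reads.

    Returns (right_end, left_end) in read order, because the layout is
    5' right> [linker] left> 3'. Returns None when the linker is absent
    (exact match only) or when either half would be empty.
    """
    pos = seq.find(linker)
    if pos == -1:
        return None
    right_end = seq[:pos]
    left_end = seq[pos + len(linker):]
    if not right_end or not left_end:
        return None
    return right_end, left_end


read = "AAACCC" + LINKER + "GGGTTT"
print(split_at_linker(read))
```

Reads where the function returns None would have to be treated as single-end reads.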


Illumina paired ends are separate reads from each end of a ~350-600 bp fragment:

read 1
5' -- ~100 nt of left -- 3'
read 2
5' -- ~100 nt of reverse complement of right -- 3'

These reads may require some adapter and quality trimming as well.
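
As a rough illustration of that orientation, the Python sketch below derives an idealized read pair from a fragment sequence. The function names and the toy fragment are assumptions made for the example; there are no sequencing errors, adapters, or quality values here.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def illumina_pair(fragment, read_len=100):
    """Return (read1, read2) for an idealized paired-end fragment.

    read1 is the left end of the fragment as-is; read2 is the reverse
    complement of the right end, so the two reads point toward each other.
    """
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    return read1, read2


# Toy 12 bp fragment with 4 nt reads, just to show the orientation.
print(illumina_pair("AACCGGTTACGG", read_len=4))
```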


So, can Newbler (I have v3.0) 'understand' Illumina paired-end reads (i.e. know that they represent a span of X bp)? Can it do trimming of adapters and low-quality bases?

Alternatively, are there assemblers that 'understand' Illumina paired ends, but can also understand 454 PE reads? That is, they know that the read must be split, the linker discarded, and a spacing of X bp expected between the halves? Can any also do quality trimming of SFF files?


I have raw fastq and sff reads, as well as trimmed versions of the same fastq and sff reads. I also have (raw or trimmed) single-end sff and fastq sets to use in the assembly (most of the data are single-end). I want to know which assemblers need which input to fully exploit hybrid paired-end information (e.g. for making better scaffolds), with the least preprocessing on my part.

Last edited by ssully; 11-19-2014 at 12:24 PM.
Old 11-19-2014, 01:45 PM   #2
danwiththeplan
Member
 
Location: Auckland

Join Date: Sep 2011
Posts: 72

MIRA4:
http://mira-assembler.sourceforge.ne...ml#chap_denovo

It can handle hybrid assemblies, and it can also handle paired-end reads in the orientation you describe (there's an example of this in the link above).

It's a bit of a memory hog though. You may require a high-memory system.
Old 11-19-2014, 02:38 PM   #3
ssully
Member
 
Location: NYC

Join Date: Aug 2010
Posts: 48

That helps a lot (for MIRA), thanks!

So as long as I convert the 454 Paired End sff to Fastq with sff_extract, MIRA will process and assemble them correctly *as paired ends*? It still understands that those reads are 'paired end' and not single end? I'm curious as to how it does that -- does it create an interleaved Fastq or two separate -1 and -2 fastq files?
Old 11-19-2014, 02:48 PM   #4
danwiththeplan
Member
 
Location: Auckland

Join Date: Sep 2011
Posts: 72

Quote:
It still understands that those reads are 'paired end' and not single end?
My understanding is yes

Quote:
I'm curious as to how it does that -- does it create an interleaved Fastq or two separate -1 and -2 fastq files?
Actually don't know.

MIRA4 works by using a manifest file that defines the data to go into the program.

Look at section 5.3.3, "Manifest for data sets with paired reads" (in the link above).

There's a parameter called segment_placement that defines how the pairs are arranged (i.e. >> or <> or >< or << or whatever) and (I think) the expected separation.

As for separate FASTQ files for left and right reads, I think MIRA expects Illumina data to be this way, but I don't know how 454 data works. In the example in the link above the data is defined as 454 data and only one file is given, so maybe you don't have to split the pairs. I'm not sure about this one.
Old 11-19-2014, 03:31 PM   #5
ssully
Member
 
Location: NYC

Join Date: Aug 2010
Posts: 48

The example in the multiple platform manifest *seems* to indicate the 454 PEs can be left as one fastq file. The insert size and SD need to be provided (which I can do) or 'autorefine' to let MIRA figure it out. Using 'autopairing' would mean not even having to tell MIRA the direction of the reads in a pair.
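
For anyone who wants to see roughly what such a manifest could look like, here is a small Python sketch that writes one out as text. Treat all of it as an assumption to check against the MIRA 4 manual: the keywords (`readgroup`, `data`, `technology`, `template_size`, `segment_placement`, `autopairing`, `autorefine`) and the `---> <---` placement code are written from memory, `template_size` is assumed to take a lower and an upper bound, and the project name, readgroup names, and file names are hypothetical. Whether a single 454 paired-end fastq works with `autopairing` is exactly the open question in this thread.

```python
def readgroup(name, technology, files, template_size=None,
              placement=None, autopairing=False):
    """Format one readgroup section of a manifest as text.

    Keyword spellings are assumed from the MIRA 4 manual; verify them there.
    """
    lines = [
        f"readgroup = {name}",
        f"data = {' '.join(files)}",
        f"technology = {technology}",
    ]
    if autopairing:
        # Let the assembler work out pairing and orientation itself.
        lines.append("autopairing")
    if template_size is not None:
        low, high = template_size
        lines.append(f"template_size = {low} {high} autorefine")
    if placement is not None:
        lines.append(f"segment_placement = {placement}")
    return "\n".join(lines)


def build_manifest(project, groups):
    """Join a project header and readgroup sections into one manifest."""
    header = [f"project = {project}", "job = genome,denovo,accurate"]
    return "\n".join(header) + "\n\n" + "\n\n".join(groups) + "\n"


# All names and file paths below are hypothetical.
manifest = build_manifest("hybrid_test", [
    readgroup("illumina_pe", "solexa", ["ill_R1.fastq", "ill_R2.fastq"],
              template_size=(350, 600), placement="---> <---"),
    readgroup("ls454_pe", "454", ["ls454_pe.fastq"], autopairing=True),
    readgroup("ls454_single", "454", ["ls454_se.fastq"]),
])
print(manifest)
```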


Looks good, but I'll write the author and see if I can get a clearer view.
Old 11-20-2014, 03:06 AM   #6
wanghao
Junior Member
 
Location: Helsinki

Join Date: Nov 2014
Posts: 1

Quote:
Originally Posted by ssully View Post
I'll write the author and see if I can get a clearer view.
Can you also post the reply here if you get one from the author?
Old 11-20-2014, 06:24 AM   #7
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Quote:
Originally Posted by ssully View Post
So, can Newbler (I have v3.0) 'understand' Illumina paired-end reads (i.e. know that they represent a span of X bp)? Can it do trimming of adapters and low-quality bases?
Yes, Newbler will figure out the pairing from the fastq files, provided the read IDs conform to the 'standards' (see the fastq entry on wikipedia). No, you cannot tell Newbler the span, as it figures this out for itself, regardless of where the data came from. It maps pairs and determines the mode and stdev of the distribution based on that.
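
To show what "conforming read IDs" means in practice, here is a minimal Python sketch that recognizes the two common Illumina naming conventions described in the FASTQ Wikipedia entry (a trailing `/1` or `/2`, and the newer style with ` 1:N:0:...` after a space). This is not Newbler's code, just an illustration; the function names are made up for the example.

```python
import re
from collections import defaultdict

# Newer style: "<template> <mate>:<filtered Y/N>:<control bits>:<index>"
_NEW_STYLE = re.compile(r"^(\S+)\s+([12]):[YN]:\d+:\S*$")
# Older style: "<template>/<mate>"
_OLD_STYLE = re.compile(r"^(\S+)/([12])$")


def parse_read_id(header):
    """Return (template_name, mate) where mate is 1, 2, or None."""
    header = header.lstrip("@").strip()
    match = _NEW_STYLE.match(header)
    if match:
        return match.group(1), int(match.group(2))
    tokens = header.split()
    if not tokens:
        return "", None
    match = _OLD_STYLE.match(tokens[0])
    if match:
        return match.group(1), int(match.group(2))
    return tokens[0], None


def complete_pairs(headers):
    """Return sorted template names for which both mates were seen."""
    mates = defaultdict(set)
    for header in headers:
        template, mate = parse_read_id(header)
        if mate is not None:
            mates[template].add(mate)
    return sorted(name for name, seen in mates.items() if seen == {1, 2})


print(complete_pairs(["@r1/1", "@r1/2", "@r2/1", "@lonely"]))
```

If trimming renames reads or drops one mate of a pair, a check like this is a quick way to see how many pairs are still intact before assembly.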

Newbler will trim low-quality bases. It can do adapter trimming through the -vt flag (you have to provide it with a fasta file of adapter sequences, probably both in forward and reverse complement orientation).
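
A minimal Python sketch for building such a fasta file with each adapter in both orientations is below. The adapter name and sequence are placeholders made up for the example, so replace them with the real adapters for your library; the `_fwd` and `_rc` record suffixes are also just a naming choice.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def adapter_fasta(adapters):
    """Return fasta text with every adapter in forward and
    reverse-complement orientation."""
    records = []
    for name, seq in adapters.items():
        records.append(f">{name}_fwd\n{seq}")
        records.append(f">{name}_rc\n{reverse_complement(seq)}")
    return "\n".join(records) + "\n"


# Placeholder adapter, not a real kit sequence.
ADAPTERS = {"adapterA": "AAGGCCTTGACT"}

fasta_text = adapter_fasta(ADAPTERS)
print(fasta_text)
```

Writing `fasta_text` to a file gives the input that the trimming option expects.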
Old 11-21-2014, 10:20 AM   #8
ssully
Member
 
Location: NYC

Join Date: Aug 2010
Posts: 48

Quote:
Originally Posted by wanghao View Post
Can you also post the reply here if you get it from the author.
That was the wrong way to do it -- instead I've joined the MIRA user forum, which is what the author recommends.
Old 12-04-2014, 04:45 AM   #9
bastianwur
Member
 
Location: Germany/Netherlands

Join Date: Feb 2014
Posts: 98

With that data, I'd use GapFiller with both read sets, and throw the filled-up 454 contigs together with the Illumina data into an assembler that can take both, such as Ray.

All times are GMT -8.