SEQanswers

Old 12-17-2012, 09:15 AM   #1
ffish
Junior Member
 
Location: auburn al.

Join Date: Jul 2010
Posts: 2
Anyone used ALLPATHS-LG for de novo assembly?

Hi all,

Has anyone used ALLPATHS-LG before? I ran into an error when using it for genome assembly with an Illumina short-read library plus 3 kb, 8 kb, and 36 kb mate-pair libraries.
The error says:
"No library parameter adjustment: too few pairs closed.
Less than 10% of fragment pairs were filled.
There may be a problem with the library."

Does anyone know what this means? I am a newbie in the assembly world. Thank you.
Old 12-19-2012, 02:24 AM   #2
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

ALLPATHS first tries to close the gap between the paired-end short reads to generate longer reads. That's why the short-read insert size should be less than twice the read length (e.g. 180 bp for 2x100 PE sequencing). I think the program is complaining that too few read pairs can be closed.

What was your short library insert size, and read length?

You can always ask the developers to make sure.
Old 12-20-2012, 01:06 PM   #3
ffish
Junior Member
 
Location: auburn al.

Join Date: Jul 2010
Posts: 2

Thank you very much, flxlex.
The library insert size is 200 bp and the read length is 100 bp. So this data is probably not suitable for ALLPATHS?
Old 01-09-2013, 07:26 AM   #4
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

There may indeed not be enough overlap...
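
The overlap arithmetic behind this exchange can be checked with a minimal sketch. It assumes both reads of a pair have the same length and that the quoted insert size is the full fragment length; `expected_overlap` is a hypothetical helper name, not part of ALLPATHS-LG. Real libraries have a spread of insert sizes, so this only describes the average pair.

```shell
#!/bin/sh
# Expected overlap in bp between the two reads of a pair.
# Assumption: both reads are the same length and the insert size is the
# full fragment length. A result of 0 or less means no overlap on average.
expected_overlap() {
    echo $(( 2 * $1 - $2 ))
}

expected_overlap 100 180   # 2x100 reads, 180 bp insert -> 20
expected_overlap 100 200   # 2x100 reads, 200 bp insert -> 0
```

With a 200 bp insert the two 100 bp reads only just meet end to end, which fits the "too few pairs closed" message.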
Old 09-11-2014, 11:57 AM   #5
marrakesh
Junior Member
 
Location: Austin

Join Date: Jul 2014
Posts: 8

Necroposting rather than making a new thread, but I got the same error when I tried to run my assembly today. However, the reads in my smallest fragment library are 100 bp long with an average insert size of 170 bp, so there should be quite a bit of overlap between them. The program "filled" just over 5% of my reads.

Is this not enough overlap for Allpaths? The manual specifies an insert of about 180 bp for the smallest fragment library, which should be reasonably close. Or am I including too many far-apart reads? In addition to the 170 bp insert fragment library, I have 400 bp and 900 bp insert fragment libraries that I was hoping to use for the analysis, as well as the mate-pair library.

Should I drop some of those? I'm a little confused about what I might be doing wrong, and I'd rather not have to move to a different assembly program for my de novo assembly.
Old 09-12-2014, 10:12 AM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

How does the quality of your reads look? Merging reads is sensitive to the quality of the tail bases.
Old 07-12-2015, 08:23 AM   #7
Marcela Uliano
Member
 
Location: Berlin, Germany

Join Date: Apr 2012
Posts: 18

Hey guys, I was wondering if any of you solved this problem?

I'm having a similar issue: "Less than 10% of fragment pairs were filled."

I have about 113x coverage of the genome with paired-end and mate-pair reads combined, but only 49x from the paired-end reads alone, which is the estimate ALLPATHS gives at the end. Only half of my paired-end reads come from a library with a 180 bp insert size and 100 bp reads, which overlap. All of the sequencing used Nextera libraries.

Do you think this means it's time for more coverage? Or do you have any advice on how to solve this in silico?

Thank you so much!
Old 07-12-2015, 10:22 AM   #8
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

There should not be any problem with overlapping in that case. Perhaps you could give BBMerge a try? I only have indirect experience with AllPaths so I'm not sure if it allows you to merge the reads externally and then feed them in, but if you can, it's another option - BBMerge can merge overlapping reads, and with sufficient coverage (and 49x should be plenty), nonoverlapping as well. It can also produce an insert-size histogram, which would be worth posting.

For overlapping reads only, the command is:
bbmerge.sh in=reads.fq out=merged.fq outu=unmerged.fq ihist=ihist.txt

For non-overlapping reads as well, on a 512 GB machine:
bbmerge.sh -Xmx420g in=reads.fq out=merged.fq outu=unmerged.fq ihist=ihist.txt extend2=20 iterations=5 ecct=t

Alternatively, you can extend and error-correct the reads with Tadpole so that they overlap more, like this:

tadpole.sh in=reads.fq out=extended.fq mode=extend extendright=30 ecc=t

That should improve the merge rate by making more of them overlap by a larger amount, and decreasing mismatches.

Then you could feed the extended reads to AllPaths.
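
To compare the result against the fill rates reported earlier in the thread, the merged fraction can be estimated from the BBMerge output files. This is a minimal sketch under stated assumptions: every FASTQ record is exactly four lines, `reads.fq` is an interleaved paired file (two records per pair), and `merged.fq` holds one record per merged pair. `count_records` and `merge_rate` are hypothetical helper names, not BBTools commands.

```shell
#!/bin/sh
# Count FASTQ records, assuming exactly four lines per record.
count_records() {
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Percentage of input pairs found in the merged file.
# Assumption: the input is interleaved, so pairs = records / 2.
merge_rate() {
    pairs=$(( $(count_records "$1") / 2 ))
    joined=$(count_records "$2")
    awk -v j="$joined" -v p="$pairs" \
        'BEGIN { printf "%.1f\n", (p > 0) ? 100 * j / p : 0 }'
}

# Only runs when the files from the commands above are present.
if [ -f reads.fq ] && [ -f merged.fq ]; then
    merge_rate reads.fq merged.fq
fi
```

The insert-size histogram written to `ihist.txt` remains the better diagnostic. The percentage here is only a quick check.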
Old 07-12-2015, 11:38 AM   #9
luc
Senior Member
 
Location: US

Join Date: Dec 2010
Posts: 452

To my knowledge, Allpaths will insist on merging the reads itself.
You could perhaps split the BBMerge-merged reads again into pairs with 20 bp overlaps and feed these to Allpaths, or try the Tadpole extension as mentioned?
Old 07-12-2015, 04:37 PM   #10
fahmida
Member
 
Location: Australia

Join Date: Aug 2010
Posts: 54

Quote:
Originally Posted by Marcela Uliano
I'm having a similar issue "Less than 10% of fragment pairs were filled."

Some time ago I faced a similar issue. After searching the ALLPATHS forums, I found this suggestion: increase the value of the FF_MAX_STRETCH parameter of the RunAllPathsLG pipeline, which worked for me at the time.
FF_MAX_STRETCH=4 or 5 etc.
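
As a rough illustration of where that setting goes, here is a dry-run sketch that only prints a RunAllPathsLG command line with the parameter added. The directory and run names are hypothetical placeholders, and the other arguments are assumed to follow the usual `PRE`, `REFERENCE_NAME`, `DATA_SUBDIR`, `RUN` layout. Check them against your own working command before running anything.

```shell
#!/bin/sh
# Dry run: print the command instead of executing it.
# All values below are hypothetical placeholders.
PRE=/path/to/assemblies
REFERENCE_NAME=my_genome
DATA_SUBDIR=data
RUN=run_ff5
FF_MAX_STRETCH=5   # the post above suggests trying 4 or 5

build_cmd() {
    printf 'RunAllPathsLG PRE=%s REFERENCE_NAME=%s DATA_SUBDIR=%s RUN=%s FF_MAX_STRETCH=%s\n' \
        "$PRE" "$REFERENCE_NAME" "$DATA_SUBDIR" "$RUN" "$FF_MAX_STRETCH"
}

build_cmd
```

Using a new `RUN` name for the retry keeps the earlier failed run available for comparison.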
Old 07-15-2015, 07:57 AM   #11
vingomez
Member
 
Location: USA

Join Date: Sep 2014
Posts: 18

Hi Brian,

Is it possible to run only error correction (ecc) in Tadpole?

Code:
java -Xmx8g -cp /path/to/current assemble.Tadpole in1=H_r1.fastq.gz in2=H_r2.fastq.gz  out1=H_ecc_r1.fastq.gz out2=H_ecc_r2.fastq.gz ecc=t

and

Is it possible to combine ecc in Tadpole with other operations, such as those in KmerNormalize (e.g. normalization, removal of low-coverage reads)?

Code:
java -Xmx8g -cp /path/to/current assemble.Tadpole in1=H_r1.fastq.gz in2=H_r2.fastq.gz  out1=H_ecc_r1.fastq.gz out2=H_ecc_r2.fastq.gz target=100 min=6 ecc=t
Thanks

P.S. In the previous post (#8) you wrote:

ecct=t (instead of ecc=t)

Last edited by vingomez; 07-15-2015 at 08:00 AM.
Old 07-15-2015, 10:16 AM   #12
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Hi Vicente,

Yes, you can do error-correction only in Tadpole, like this:

Code:
java -Xmx8g -cp /path/to/current assemble.Tadpole in1=H_r1.fastq.gz in2=H_r2.fastq.gz  out1=H_ecc_r1.fastq.gz out2=H_ecc_r2.fastq.gz mode=correct ecc=t
Like most BBTools, you can simplify the command in this case to:

Code:
java -Xmx8g -cp /path/to/current assemble.Tadpole in=H_r#.fastq.gz out=H_ecc_r#.fastq.gz mode=correct ecc=t
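
The `#` in the simplified command stands in for the read number, so one pattern names both files of the pair. A small sketch of that expansion, using a hypothetical helper `expand_pair` written here with `sed` (it is not part of BBTools):

```shell
#!/bin/sh
# Print the two file names that a '#' pattern stands for (read 1 and read 2).
expand_pair() {
    printf '%s\n' "$1" | sed 's/#/1/'
    printf '%s\n' "$1" | sed 's/#/2/'
}

expand_pair 'H_r#.fastq.gz'
# H_r1.fastq.gz
# H_r2.fastq.gz
```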
I plan to eventually add normalization and removal of low-coverage reads to Tadpole, but it's not in yet. Note that while BBNorm (KmerNormalize) can handle an unlimited amount of data with finite memory, Tadpole can't, so -Xmx8g may be insufficient for large datasets (it will crash). It should be fine for one or a few bacteria, though. You can reduce memory consumption at the expense of speed with the prealloc and prefilter flags.

As for ecct=t, thanks for noting that; but in this case, it was actually correct. I use it in BBMerge to differentiate "ecco", error-correction via overlap, from "ecct", error-correction via Tadpole. Sorry it's a bit confusing.

This thread is not really the appropriate place for this discussion, though, so I'll create a Tadpole thread and move it there with a redirect.
Old 07-21-2015, 06:28 AM   #13
Marcela Uliano
Member
 
Location: Berlin, Germany

Join Date: Apr 2012
Posts: 18
Testing Fahmida's idea

Hey guys, before merging and splitting reads, I decided to test the parameter suggested by Fahmida: I set a higher FF_MAX_STRETCH, and it worked. The run got past the FillFragments step.

I'll let you know how the assembly goes. Let's see!

Thank you all so much!