SEQanswers

02-16-2011, 03:51 AM   #1
Autotroph
Member
Location: Europe
Join Date: Oct 2010
Posts: 22
Scaffolding problem

Hi,

I have scaffolds of various sizes (1 to 200 kb) from a previous assembly.

Now there is some new paired-end data with 3 kb and 5 kb inserts. I want to add these paired-end reads onto the scaffolds.

I tried SOAPdenovo, but it only accepts paired-end reads or single reads as input. The same goes for Velvet and ABySS.

The data is too big for CAP3.

What other programs are able to handle this kind of data?

Thanks
02-16-2011, 05:40 AM   #2
Jean
Member
Location: Canada
Join Date: Nov 2008
Posts: 37

Velvet will accept long reads and short paired reads in the same assembly. It's described in the current manual (p. 8, "Adding long reads").
02-16-2011, 09:19 AM   #3
boetsie
Senior Member
Location: NL, Leiden
Join Date: Feb 2010
Posts: 245

Hi,

Do you want to scaffold the previous scaffolds further, or do you want to extend them?

Either way, you could try SSPACE for this purpose; see this thread:

http://seqanswers.com/forums/showthread.php?t=8350

Kind regards,
Boetsie
02-16-2011, 08:51 PM   #4
Autotroph
Member
Location: Europe
Join Date: Oct 2010
Posts: 22

Thanks.

Yes, I guess I will be extending the previous scaffolds.

The problem with using SSPACE is that it does not allow N's in the input contig file.

My scaffolds have varying insert sizes. Should I break each of them into paired-end reads and use them as separate libraries in SSPACE?

Is Velvet not able to handle long reads of more than 20 kb?
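If the scaffolds are broken into pseudo paired-end reads as asked above, a minimal Python sketch of that step could look like this. The function names, read length, insert size and step are illustrative assumptions, not SSPACE options. The insert is taken as the outer distance from the start of read1 to the end of read2, with read2 reverse-complemented so each pair points --> <--.

```python
# Hypothetical helper: cut simulated paired-end reads out of a long
# scaffold. Read length, insert size and step are caller-chosen
# assumptions, not SSPACE parameters.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def pseudo_pairs(seq: str, read_len: int, insert: int, step: int):
    """Yield (read1, read2) pairs in --> <-- orientation.

    insert is the outer distance, from the start of read1 to the end of
    read2 on the forward strand of seq.
    """
    if not 0 < read_len <= insert:
        raise ValueError("need 0 < read_len <= insert")
    for start in range(0, len(seq) - insert + 1, step):
        read1 = seq[start:start + read_len]
        # read2 is taken from the far end of the fragment and
        # reverse-complemented, as a real paired-end read would be
        read2 = revcomp(seq[start + insert - read_len:start + insert])
        yield read1, read2
```

For example, `pseudo_pairs(scaffold, 100, 3000, 500)` would cut 100 bp pairs with a 3 kb insert every 500 bp. Windows that land on runs of N still produce reads containing N, so those pairs would need to be filtered out before use.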
02-16-2011, 11:50 PM   #5
boetsie
Senior Member
Location: NL, Leiden
Join Date: Feb 2010
Posts: 245

Hi,

You say:
Quote:
The problem with using SSPACE is that it does not allow N's in the input contig file.
while the SSPACE manual says:
Quote:
Contigs having a non-ACGT character like . or N are not discarded. They are used for extension, mapping and building scaffolds. However, contigs having such character at either end of the sequence, could fail for proper contig extension.
So they can be used for extension; only when the N's are at the end of a sequence is it unable to map reads there.
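Following the manual text quoted above, a quick pre-check could list the contigs that start or end with a non-ACGT character before running SSPACE. This is a minimal sketch with a hand-rolled FASTA reader (assuming plain, uncompressed FASTA); the helper names are hypothetical.

```python
# Pre-check: list contigs whose first or last character is not A/C/G/T.
# Hand-rolled FASTA reader; assumes plain, uncompressed FASTA.

ACGT = set("ACGTacgt")


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def non_acgt_ends(records):
    """Return headers of contigs that start or end with a non-ACGT character."""
    return [
        header
        for header, seq in records
        if seq and (seq[0] not in ACGT or seq[-1] not in ACGT)
    ]
```

Usage would be along the lines of `with open("contigs.fa") as fh: print(non_acgt_ends(read_fasta(fh)))`. N's in the middle of a contig are deliberately not flagged, since the manual says those contigs are still used.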

I don't know about Velvet. I do know that SSAKE (which uses basically the same procedure as SSPACE) can also use contigs as 'seeds' and extend them with additional reads. The difference is that SSPACE first maps the reads to the pre-assembled contigs and uses only the unmapped reads for contig/scaffold extension; SSAKE does not include a mapping step.

Kind regards,
Boetsie
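To make the difference between the two approaches concrete, here is a toy sketch of the "map first, extend with the unmapped reads" step described above. Exact substring matching on both strands stands in for the real read mapper (SSPACE uses Bowtie), so this illustrates the idea only and is not SSPACE's code.

```python
# Toy version of "map first, extend with the leftovers". Exact substring
# search on both strands stands in for a real mapper such as Bowtie.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def split_reads(contigs, reads):
    """Partition reads into (mapped, unmapped) by exact match on either strand.

    In the SSPACE-style approach only the unmapped reads would be passed
    on to the extension step; an SSAKE-style run would skip this split.
    """
    mapped, unmapped = [], []
    for read in reads:
        rc = revcomp(read)
        if any(read in contig or rc in contig for contig in contigs):
            mapped.append(read)
        else:
            unmapped.append(read)
    return mapped, unmapped
```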
03-07-2011, 10:46 AM   #6
Ashu
Member
Location: NL
Join Date: Aug 2010
Posts: 15
SSPACE: no improvement in N50 or contig size

Hi Boetsie,
I can't find any difference between before and after scaffolding. Am I doing something wrong? Thanks

-x = 0
-k = 5
-a = 0.7
-n = 15
-p = 0

==================================

Number of single reads found on contigs = 84724494
Number of pairs found with pairing contigs / total pairs = 47882393 / 48019708
------------------------------------------------------------

READ PAIRS STATS:
------------------------------------------------------------
At least one sequence/pair missing from contigs: 137314
Assembled pairs: 47882393 (95764786 sequences)
Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 2500 +/-1750): 22
Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 11
Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 81
---
Satisfied in distance/logic within a given contig pair (pre-scaffold): 26534237
Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 21348042
---
Total satisfied: 26534259 unsatisfied: 21348134

------------------------------------------------------------

################################################################################

SUMMARY:
------------------------------------------------------------
Inserted contig file;
Total number of contigs = 1060008
Sum (bp) = 2114313317
Max contig size = 56175
Min contig size = 200
Average contig size = 1988
N50 = 3918

After scaffolding MP1:
Total number of scaffolds = 1060008
Sum (bp) = 2114313317
Max scaffold size = 56175
Min scaffold size = 200
Average scaffold size = 1988
N50 = 3918
Regards
03-07-2011, 08:38 PM   #7
Autotroph
Member
Location: Europe
Join Date: Oct 2010
Posts: 22
longer reads

Thanks for the clarification, Boetsie.

Bowtie can only handle reads up to 1024 bp long. What does SSPACE do with reads longer than that?

I am interested in merging scaffolds, that is, merging two sequences that look like the ones below (SSPACE does not use reads with N's in the paired-end files, am I correct?):

AGCTAGCTAGCTNNNNNNNNNCGATCGATGCNNNNNNNCGATCGATCGATCGNNNNCAGCTAGT


ANNNNNTAGCTACGATCGATCGNNNNNNNNNGATGCACGTACGATNNCGATNNNNNNNNNNNCAGCTAGT
03-08-2011, 12:07 AM   #8
boetsie
Senior Member
Location: NL, Leiden
Join Date: Feb 2010
Posts: 245

Quote:
Originally Posted by Ashu
HI Boetsie,
I can't find any improvement before and after scaffolding ... Am I doing something wrong ??? Thanks
Hi Ashu,

I'm pretty sure you have the orientation in your library file the wrong way around. Are you using paired-end (--> <-- direction) or mate-pair (<-- --> direction) reads? If you use paired-end reads, your library should look something like this:

libname file1.fasta file2.fasta 700 0.25 0

with the last column containing a 0. For mate pairs, the last column should contain a 1:

libname file1.fasta file2.fasta 700 0.25 1

I think this should do it.

Boetsie
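A small Python sketch of reading one library line in the layout used in this thread (`name file1 file2 insert_size error orientation`). The field meanings are inferred from the posts: 0 in the last column for paired-end (--> <--), 1 for mate pairs (<-- -->). The SSPACE output in post #6 reports "distance on target: 2500 +/-1750" for a library with insert 2500 and error 0.7, which suggests the allowed window is insert_size +/- insert_size * error; treat that as an assumption and check the manual of your SSPACE version.

```python
# Sketch: parse one library line in the layout used in this thread:
#   name file1 file2 insert_size error orientation
# Field meanings are inferred from the posts, not taken from the manual.
from dataclasses import dataclass


@dataclass
class Library:
    name: str
    file1: str
    file2: str
    insert_size: int
    error: float
    mate_pair: bool  # True for 1 (<-- -->), False for 0 (--> <--)

    def distance_window(self):
        """Assumed allowed range: insert_size +/- insert_size * error."""
        slack = self.insert_size * self.error
        return self.insert_size - slack, self.insert_size + slack


def parse_library_line(line: str) -> Library:
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    name, file1, file2, insert, error, orientation = fields
    if orientation not in ("0", "1"):
        raise ValueError(f"orientation must be 0 or 1, got {orientation!r}")
    return Library(name, file1, file2, int(insert), float(error), orientation == "1")
```

For the paired-end example line above, `parse_library_line("libname file1.fasta file2.fasta 700 0.25 0").distance_window()` gives (525.0, 875.0).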
03-08-2011, 12:16 AM   #9
boetsie
Senior Member
Location: NL, Leiden
Join Date: Feb 2010
Posts: 245

Quote:
Originally Posted by Autotroph
Thanks for the clarification Boetsie,

Bowtie can handle only reads that are a maximum of 1024 BP long. What does SSPACE do for reads that are longer than that?
Unfortunately, SSPACE cannot handle sequences longer than 1024 bp; they are simply not used for mapping.

Quote:
I am interested in merging scaffolds, that is merging 2 sequences that look like below(SSPACE does not use reads with N's in the paired end files, am i correct?)
Indeed, SSPACE does not allow reads with N's in the paired-end files.

I think you should consider another program for this, since you mention that you want to merge scaffolds rather than extend them. You could try an alignment program if you want to merge two scaffolds. Maybe you can do something like Ken Kraaijeveld did (http://www.kenkraaijeveld.nl/genomics/bioinformatics/); see the "combining contigs" section.

Boetsie
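The two limits mentioned above (sequences longer than 1024 bp are not used for mapping, and reads with N's are not allowed in the paired-end files) can be applied to the reads before running SSPACE. A minimal sketch, assuming both mate files have already been loaded as two equally long lists of sequences in matching order; a pair is kept or dropped as a unit so the mates stay in sync.

```python
# Sketch: keep only read pairs where both mates are usable, using the two
# limits stated in this thread (no N's, at most 1024 bp).

MAX_READ_LEN = 1024  # Bowtie read-length limit as mentioned above


def usable(read: str) -> bool:
    return len(read) <= MAX_READ_LEN and "N" not in read.upper()


def filter_pairs(reads1, reads2):
    """Return (kept1, kept2); a pair is kept or dropped as a unit."""
    if len(reads1) != len(reads2):
        raise ValueError("mate lists differ in length; files are out of sync")
    kept1, kept2 = [], []
    for r1, r2 in zip(reads1, reads2):
        if usable(r1) and usable(r2):
            kept1.append(r1)
            kept2.append(r2)
    return kept1, kept2
```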
03-08-2011, 01:10 AM   #10
Autotroph
Member
Location: Europe
Join Date: Oct 2010
Posts: 22

Unfortunately, Minimus can only be used to merge contigs, not scaffolds. Bambus is able to merge scaffolds but does not allow N's in the input.

It might be possible for me to use Minimus and SSPACE in some combination to merge the scaffolds.

Could you please look at the example below and let me know why SSPACE does not merge the two "contigs"?

--------------------_________________--------------------------
read1               read2 (rev-comped) (common anchor sequence)

Contigs.fa:

>contig1
AGCTACTAGCTGCTACTAGCTCAGATGCATCGATCGACGATCTGATCGGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG
>contig2
TGTGTCAGCTAGCTACGAGCTAGCTAGCTACTACTAGCTACTAGCTAGCGCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG

read1.fa

>read1
AGCTACTAGCTGCTACTAGCTCAGATGCATCGATCGACGATCTGATCGGC

read2.fa (first 50 bases of contig2, reverse-complemented)

>read2
CGCTAGCTAGTAGCTAGTAGTAGCTAGCTAGCTCGTAGCTAGCTGACACA
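A quick way to double-check that read2 really is the reverse complement of the first 50 bases of contig2 (sequences copied from above):

```python
# Verify that read2 is the reverse complement of contig2[:50].
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


contig2 = ("TGTGTCAGCTAGCTACGAGCTAGCTAGCTACTACTAGCTACTAGCTAGCG"
           "CATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG")
read2 = "CGCTAGCTAGTAGCTAGTAGTAGCTAGCTAGCTCGTAGCTAGCTGACACA"

print(revcomp(contig2[:50]) == read2)  # True
```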

lib file:

lib1 read1.fa read2.fa 100 0.7 0

command:

perl SSPACE_v1-1.pl -l lib -s contigs.fa -k 1 -a 0.7 -x 1 -o 1 -b merger

This gives me two scaffolds instead of the one scaffold I am expecting. When the anchor sequence is shortened, it gives a single scaffold with an "n" placed between the two scaffolds.

Surprisingly, if the same information is given in the form of a set of two mate pairs, the two scaffolds are merged. My guess is that SSPACE does not treat the initial N's in the same way as the N's it adds in intermediate steps.

Last edited by Autotroph; 03-08-2011 at 02:03 AM. Reason: additional information
03-08-2011, 01:39 AM   #11
Ashu
Member
Location: NL
Join Date: Aug 2010
Posts: 15

Hi Boetsie,
Thank you for the information,
I have a mate-pair library, with the insert size estimated by Bioanalyzer.
My library file looks as follows:

MP1 /G1/2_5kb/s_a_sequence_1.fastq /G1/2_5kb/s_a_sequence_2.fastq 2500 0.7 1
MP1 /G1/2_5kb/s_b_sequence_1.fastq /G1/2_5kb/s_b_sequence_2.fastq 2500 0.7 1
MP1 /G2/2_5kb/s_a_sequence_1.fastq /G2/2_5kb/s_a_sequence_2.fastq 2500 0.7 1
MP1 /G2/2_5kb/s_b_sequence_1.fastq /G2/2_5kb/s_b_sequence_2.fastq 2500 0.7 1
MP1 /G2/2_5kb/s_c_sequence_1.fastq /G2/2_5kb/s_c_sequence_2.fastq 2500 0.7 1
MP1 /G2/2_5kb/s_d_sequence_1.fastq /G2/2_5kb/s_d_sequence_2.fastq 2500 0.7 1

I will try it with the paired-end setting (0), but I can't imagine why it would turn out to be paired-end and not mate-pair. In the pairing issues file I also see a lot of distance problems; is there a way to put this in a graph?
Thank you again for your kind reply,
regards,
Ashu
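On the graphing question: assuming the distances have already been extracted from the pairing-issues output as integers (its exact format isn't shown in this thread, so parsing is left out), a dependency-free text histogram could look like this. Bin size and bar width are arbitrary choices; `matplotlib.pyplot.hist` would give the same picture graphically.

```python
# Text histogram of pair distances. Assumes the distances were already
# extracted as integers; parsing the pairing-issues output is not shown.
from collections import Counter


def histogram(distances, bin_size=500, width=50):
    """Return one line per non-empty bin: 'start-end | ### count'."""
    counts = Counter((d // bin_size) * bin_size for d in distances)
    if not counts:
        return []
    peak = max(counts.values())
    lines = []
    for start in sorted(counts):
        # scale bars to the fullest bin, but always draw at least one '#'
        bar = "#" * max(1, round(counts[start] * width / peak))
        lines.append(f"{start:>7}-{start + bin_size - 1:<7} | {bar} {counts[start]}")
    return lines
```

Printing is then just `print("\n".join(histogram(distances)))`.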
03-08-2011, 03:42 AM   #12
boetsie
Senior Member
Location: NL, Leiden
Join Date: Feb 2010
Posts: 245

Quote:
Originally Posted by Autotroph
Could you please look at below example and let me know why SSPACE does not merge the 2 "contigs"?
Hi Autotroph,

I've had a look at it, and I think I know why it did not merge. You should increase the insert size in your library file. SSPACE includes the read lengths when determining the gap/overlap, so with a 100 bp insert size the pair did not satisfy the minimum allowed distance.

Your two reads are both 50 bp long, so increasing the insert size in your library by 100 (2*50 bp) should do it:

lib1 read1.fa read2.fa 200 0.7 0

If you need a more detailed description, please let me know.

Kind regards,
Boetsie
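The arithmetic behind this can be sketched as follows, assuming the insert size is measured from the start of read1 to the end of read2, so both read lengths are part of it. Read1 sits on the forward strand of contig A and read2 maps to contig B; the function name and coordinate conventions are hypothetical, and this mirrors the idea rather than SSPACE's actual implementation.

```python
# Back-of-the-envelope gap estimate for one read pair bridging contig A
# and contig B, assuming the insert size spans both reads.


def estimate_gap(insert_size: int, len_a: int, read1_start: int, read2_end: int) -> int:
    """Estimated number of bases between the end of A and the start of B.

    read1_start: 0-based start of read1 on contig A (forward strand).
    read2_end:   exclusive end of read2's alignment on contig B.
    A negative result means A and B are expected to overlap.
    """
    span_on_a = len_a - read1_start  # read1 start .. end of contig A
    span_on_b = read2_end            # start of contig B .. read2 end
    return insert_size - span_on_a - span_on_b
```

With two 50 bp reads sitting exactly at the facing contig ends, `estimate_gap(100, 1000, 950, 50)` gives 0 and `estimate_gap(200, 1000, 950, 50)` gives 100; a value below zero would indicate an overlap.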
03-08-2011, 04:26 AM   #13
Autotroph
Member
Location: Europe
Join Date: Oct 2010
Posts: 22

The point of giving an insert size of 100 (50+50) is to avoid any gaps in the final scaffold. My understanding was that the two reads could even overlap if an insert size of less than 100 is given for 2*50 bp reads.

The expected sequence (without any gaps) would be:

"AGCTACTAGCTGCTACTAGCTCAGATGCATCGATCGACGATCTGATCGGCTGTGTCAGCTAGCTACGAGCTAGCTAGCTACTACTAGCTACTAGCTAGCGCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG"

I also tried 200 as the insert size, but it fails to merge the contigs "correctly".

The output is given below:

>scaffold1.1|size269
AGCTACTAGCTGCTACTAGCTCAGATGCATCGATCGACGATCTGATCGGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCGnCGATCGACGATCTGATCGGCTGTGTCAGCTAGCTACGAGCTAGCTAGCTACTACTAGCTACTAGCTAGCGCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG

Does this mean that the two reads of a PE pair must have a gap between them?

Why doesn't "TGTGTCAGCTAGCTACGAGCTAGCTAGCTACTACTAGCTACTAGCTAGCG" replace the N's, when it overlaps and there is also a PE read connecting the two 'contigs'?
03-08-2011, 04:50 AM   #14
boetsie
Senior Member
Location: NL, Leiden
Join Date: Feb 2010
Posts: 245
Hi Autotroph,

Sorry, but I think it's simply not possible to merge them with SSPACE the way you are trying to. SSPACE only looks at the ends of the contigs for overlap, while you are trying to turn the "N" characters into DNA bases by merging.

SSPACE does this:
CATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATC
.............................
GCTACGATCGATCAGTAGTAGATAGATAGATGATAG

Whereas you are trying to find a certain overlap and determine the rest of the sequence:

NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG


TGTGTCAGCTAGCTACGAGCTAGCTAGCTACTACTAGCTACTAGCTAGCGCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG
.......

As I said, I think what you want to do is not possible with SSPACE. Maybe you can first run gap closure on the scaffolds (e.g. with SOAP's gap closure method) so the N's are removed from your data.

Boetsie
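A minimal sketch of the end-only overlap check described above: only a suffix of one contig is compared with a prefix of the next, so N's inside a scaffold are never replaced. The minimum overlap of 10 is an arbitrary assumption, not an SSPACE default.

```python
# End-only overlap check: compare a suffix of contig a with a prefix of
# contig b. N's in the middle of a scaffold are never looked at.
from typing import Optional


def end_overlap(a: str, b: str, min_overlap: int = 10) -> int:
    """Longest suffix of a that equals a prefix of b (0 if none >= min_overlap)."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(a: str, b: str, min_overlap: int = 10) -> Optional[str]:
    """Join a and b on their end overlap, or return None if there is none."""
    k = end_overlap(a, b, min_overlap)
    return a + b[k:] if k else None
```

On the first illustration above, `end_overlap` finds the 13 bp overlap GCTACGATCGATC and `merge` joins the two sequences on it.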
03-08-2011, 04:59 AM   #15
Autotroph
Member
Location: Europe
Join Date: Oct 2010
Posts: 22

Hi boetsie,

Thanks a lot for the patient explanation.
03-08-2011, 05:06 AM   #16
boetsie
Senior Member
Location: NL, Leiden
Join Date: Feb 2010
Posts: 245

No problem, good luck with your further analysis.
06-28-2011, 07:25 PM   #17
christinawu2008
Member
Location: Australia
Join Date: Feb 2011
Posts: 13
bowtie-build error

Quote:
Originally Posted by boetsie
No problem, good luck with your further analysis.
Hi boetsie,

SSPACE must be a very useful tool for scaffolding, but when I tried to use it, the process failed at the bowtie-build step. My contig file contains only the names and the super_contig sequences, without other information, and there are lots of 'N' gaps within them. Do I need to modify something to get bowtie-build to work? If not, what's the problem?

The reads I have are 100 bp paired-end,
so the library looks like
lib1 ***1.fastq ***2.fastq 200 0.7 0
or should I change 200 to 400?

Tags
assembler, scaffold, soap

All times are GMT -8.