
**Jean** · 02-16-2011, 06:40 AM

Velvet will take long reads and short paired reads in the same assembly. This is described on p. 8 of the current manual, under "Adding long reads".

**boetsie** · 02-16-2011, 10:19 AM

Hi,

Do you want to scaffold the previous scaffolds, or do you want to extend them?

Anyway, maybe you can try out SSPACE for this purpose; see this thread:

SSPACE: a new stand-alone scaffolding tool for small and large genomes
http://seqanswers.com/forums/showthread.php?t=8350

Kind regards,
Boetsie

**Autotroph** · 02-16-2011, 09:51 PM

Thanks.

Yes, I guess I will be extending the previous scaffolds.

The problem with using SSPACE is that it does not allow N's in the input contig file.

The scaffolds I have have varying insert sizes. Should I break each of them into paired-end reads and use these as separate libraries in SSPACE?

Is Velvet not able to handle long reads of more than 20 kb?

**boetsie** · 02-17-2011, 12:50 AM

Hi,

You say:

"The problem with using SSPACE is that it does not allow N's in the input contig file."

while the SSPACE manual says:

"Contigs having a non-ACGT character like “.” or “N” are not discarded. They are used for extension, mapping and building scaffolds. However, contigs having such character at either end of the sequence, could fail for proper contig extension."

So contigs with N's can be used for extension. Only when the N's are at the end of a sequence is SSPACE unable to map reads there.

I don't know about Velvet. I do know that SSAKE (which has basically the same procedure as SSPACE) can also use contigs as 'seeds' and extend them with additional reads. The difference is that SSPACE first maps the reads to the pre-assembled contigs and only uses the unmapped reads for contig/scaffold extension. SSAKE does not include mapping.

Kind regards,
Boetsie

**Ashu** · 03-07-2011, 11:46 AM

SSPACE: no improvement in N50 or contig size

Hi Boetsie,
I can't find any improvement before and after scaffolding. Am I doing something wrong? Thanks

-x = 0
-k = 5
-a = 0.7
-n = 15
-p = 0

==================================

Number of single reads found on contigs = 84724494
Number of pairs found with pairing contigs / total pairs = 47882393 / 48019708
------------------------------------------------------------

READ PAIRS STATS:
------------------------------------------------------------
At least one sequence/pair missing from contigs: 137314
Assembled pairs: 47882393 (95764786 sequences)
Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 2500 +/-1750): 22
Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 11
Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 81
---
Satisfied in distance/logic within a given contig pair (pre-scaffold): 26534237
Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 21348042
---
Total satisfied: 26534259 unsatisfied: 21348134

------------------------------------------------------------

################################################################################

SUMMARY:
------------------------------------------------------------
Inserted contig file;
Total number of contigs = 1060008
Sum (bp) = 2114313317
Max contig size = 56175
Min contig size = 200
Average contig size = 1988
N50 = 3918

After scaffolding MP1:
Total number of scaffolds = 1060008
Sum (bp) = 2114313317
Max scaffold size = 56175
Min scaffold size = 200
Average scaffold size = 1988
N50 = 3918
Regards
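
For readers comparing the two summaries above: N50 is the length L such that contigs of length L or longer contain at least half of the total assembled bases. A minimal Python sketch of that definition follows. `n50` is an illustrative helper, not SSPACE's own code, and SSPACE may handle edge cases differently.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


print(n50([2, 3, 4, 5, 6]))
```

When scaffolding joins contigs, the sequence count drops and N50 normally rises. Identical counts and N50 values before and after, as in the summary above, indicate that no contigs were linked.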

**Autotroph** · 03-07-2011, 09:38 PM

longer reads

Thanks for the clarification Boetsie,

Bowtie can only handle reads up to 1024 bp long. What does SSPACE do with reads that are longer than that?

I am interested in merging scaffolds, that is, merging two sequences that look like the ones below (SSPACE does not use reads with N's in the paired-end files, am I correct?):

AGCTAGCTAGCTNNNNNNNNNCGATCGATGCNNNNNNNCGATCGATCGATCGNNNNCAGCTAGT

ANNNNNTAGCTACGATCGATCGNNNNNNNNNGATGCACGTACGATNNCGATNNNNNNNNNNNCAGCTAGT

**boetsie** · 03-08-2011, 01:07 AM

Originally posted by Ashu:

Hi Boetsie,
I can't find any improvement before and after scaffolding. Am I doing something wrong? Thanks

Hi Ashu,

I'm pretty sure the orientation in your library file is the wrong way around. Are you using paired-end (--> <-- direction) or mate pair (<-- --> direction) reads? If you use paired-end reads, your library should look something like this:

libname file1.fasta file2.fasta 700 0.25 0

with the last column containing a 0. For mate pairs, the last column should contain a 1:

libname file1.fasta file2.fasta 700 0.25 1

I think this should do it.

Boetsie

**boetsie** · 03-08-2011, 01:16 AM

Originally posted by Autotroph:

Thanks for the clarification Boetsie,

Bowtie can only handle reads up to 1024 bp long. What does SSPACE do with reads that are longer than that?

Unfortunately, SSPACE cannot handle sequences longer than 1024 bp. They are simply not used for mapping.

I am interested in merging scaffolds, that is, merging two sequences that look like the ones below (SSPACE does not use reads with N's in the paired-end files, am I correct?)

Indeed, SSPACE does not allow reads with N's in the paired-end files.

I think you should consider another program for this, since you mention that you want to merge scaffolds, instead of extend them. You could try something like an alignment program if you want to merge 2 scaffolds. Maybe you can do something like Ken Kraaijeveld (http://www.kenkraaijeveld.nl/genomics/bioinformatics/). See the "combining contigs" section.

Boetsie

**Autotroph** · 03-08-2011, 02:10 AM

Unfortunately, Minimus can only be used to merge contigs, not scaffolds. Bambus is able to merge scaffolds but does not allow N's in the input.

It might be possible for me to use Minimus and SSPACE in some combination to merge the scaffolds.

Could you please look at the example below and let me know why SSPACE does not merge the 2 "contigs"?

--------------------_________________--------------------------
read1 read2(rev-comped) (common anchor sequence)

Contigs.fa:

>contig1
AGCTACTAGCTGCTACTAGCTCAGATGCATCGATCGACGATCTGATCGGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG
>contig2
TGTGTCAGCTAGCTACGAGCTAGCTAGCTACTACTAGCTACTAGCTAGCGCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG

read1.fa

>read1
AGCTACTAGCTGCTACTAGCTCAGATGCATCGATCGACGATCTGATCGGC

read2.fa (the first 50 bases of contig2, reverse complemented)

>read2
CGCTAGCTAGTAGCTAGTAGTAGCTAGCTAGCTCGTAGCTAGCTGACACA

lib file:

lib1 read1.fa read2.fa 100 0.7 0

command:

perl SSPACE_v1-1.pl -l lib -s contigs.fa -k 1 -a 0.7 -x 1 -o 1 -b merger

This gives me 2 scaffolds instead of the 1 scaffold that I am expecting. When the length of the anchor sequence is reduced, it gives a single scaffold with an "n" placed between the 2 scaffolds.

Surprisingly, if the same information is given in the form of a set of 2 mate pairs, the 2 scaffolds are merged. My guess would be that SSPACE does not treat the initial set of N's in the same way as the N's it adds in the intermediate steps.
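
As a sanity check on the test data above, the claim that read2 is the reverse complement of the first 50 bases of contig2 can be verified with a few lines of Python. This is only an illustrative sketch; `reverse_complement` is a helper written for this check, not part of SSPACE.

```python
# Translation table for DNA bases; N is left as N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# First 50 bases of contig2 and the read2 sequence, copied from the post.
contig2_start = "TGTGTCAGCTAGCTACGAGCTAGCTAGCTACTACTAGCTACTAGCTAGCG"
read2 = "CGCTAGCTAGTAGCTAGTAGTAGCTAGCTAGCTCGTAGCTAGCTGACACA"

print(reverse_complement(contig2_start) == read2)
```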

**Ashu** · 03-08-2011, 02:39 AM

Hi Boetsie,
Thank you for the information.
I have a mate pair library, with the distance estimated by Bioanalyzer.
My library file looks as follows:

MP1 /G1/2_5kb/s_a_sequence_1.fastq /G1/2_5kb/s_a_sequence_2.fastq 2500 0.7 1
MP1 /G1/2_5kb/s_b_sequence_1.fastq /G1/2_5kb/s_b_sequence_2.fastq 2500 0.7 1
MP1 /G2/2_5kb/s_a_sequence_1.fastq /G2/2_5kb/s_a_sequence_2.fastq 2500 0.7 1
MP1 /G2/2_5kb/s_b_sequence_1.fastq /G2/2_5kb/s_b_sequence_2.fastq 2500 0.7 1
MP1 /G2/2_5kb/s_c_sequence_1.fastq /G2/2_5kb/s_c_sequence_2.fastq 2500 0.7 1
MP1 /G2/2_5kb/s_d_sequence_1.fastq /G2/2_5kb/s_d_sequence_2.fastq 2500 0.7 1

I will try it in paired-end form (0), but I can't imagine why it would turn out to be paired-end rather than mate pair. In the pairing issues file I also see a lot of distance problems. Is there a way to plot these in a graph?
Thank you again for your kind reply,
regards,
Ashu
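
To make the library columns in this thread easier to read, here is a small Python sketch that parses one library line. It is not part of SSPACE: `Library` and `parse_library_line` are hypothetical names, the 0/1 meaning of the last column follows boetsie's description above, and the allowed range of insert size plus or minus insert size times error is an inference from the log line "distance on target: 2500 +/-1750" for a library with 2500 and 0.7.

```python
from dataclasses import dataclass


@dataclass
class Library:
    name: str
    file1: str
    file2: str
    insert_size: int
    error: float      # allowed deviation as a fraction of insert_size
    orientation: int  # last column: 0 or 1, as described in the thread

    @property
    def min_distance(self):
        # Assumed lower bound: insert size minus the allowed deviation.
        return self.insert_size - self.insert_size * self.error

    @property
    def max_distance(self):
        # Assumed upper bound: insert size plus the allowed deviation.
        return self.insert_size + self.insert_size * self.error


def parse_library_line(line):
    """Split a six-column, whitespace-separated library line."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}")
    name, file1, file2, insert_size, error, orientation = fields
    return Library(name, file1, file2, int(insert_size), float(error), int(orientation))


lib = parse_library_line(
    "MP1 /G1/2_5kb/s_a_sequence_1.fastq /G1/2_5kb/s_a_sequence_2.fastq 2500 0.7 1"
)
print(lib.name, round(lib.min_distance), round(lib.max_distance))
```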

**boetsie** · 03-08-2011, 04:42 AM

Originally posted by Autotroph:

Could you please look at below example and let me know why SSPACE does not merge the 2 "contigs"?

Hi Autotroph,

I've had a look at it, and I think I know why it did not merge. You should increase the insert size in your library file. SSPACE includes the read lengths in the determination of the gap/overlap. With a 100 bp insert size, the pair did not satisfy the minimum allowed distance.

Your 2 reads are both 50 bp long. So increasing the insert size in your library by 100 (2*50 bp for your reads) should do it, thus:

lib1 read1.fa read2.fa 200 0.7 0

If you need a more detailed description, please let me know.

Kind regards,
Boetsie

**Autotroph** · 03-08-2011, 05:26 AM

The point of giving an insert size of 100 (50+50) is to not have any gaps in the final scaffold. I understood that the two reads could even overlap if an insert size of less than 100 is given for 2*50 bp reads.

The expected sequence (without any gaps) would be:

"AGCTACTAGCTGCTACTAGCTCAGATGCATCGATCGACGATCTGATCGGCTGTGTCAGCTAGCTACGAGCTAGCTAGCTACTACTAGCTACTAGCTAGCGCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG"

I also tried 200 as the insert size, but it still fails to merge the contigs "correctly".

The output is given below:

>scaffold1.1|size269
AGCTACTAGCTGCTACTAGCTCAGATGCATCGATCGACGATCTGATCGGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCGnCGATCGACGATCTGATCGGCTGTGTCAGCTAGCTACGAGCTAGCTAGCTACTACTAGCTACTAGCTAGCGCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG

Does this mean that the two reads of a PE pair must have a gap between them?

Why is "TGTGTCAGCTAGCTACGAGCTAGCTAGCTACTACTAGCTACTAGCTAGCG" not replacing the N's, when it overlaps and there is also a PE read pair connecting the two 'contigs'?

**boetsie** · 03-08-2011, 05:50 AM

Hi Autotroph,

Sorry, but I think it's simply not possible to merge them with SSPACE the way you are trying to. SSPACE only looks for overlap at the ends of the contigs, whereas you are trying to replace the "N" characters with DNA characters by merging.

SSPACE does this:
CATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATC
.............................GCTACGATCGATCAGTAGTAGATAGATAGATGATAG

While you are trying to find a certain overlap, and determine the rest of the sequence from it:

NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG

TGTGTCAGCTAGCTACGAGCTAGCTAGCTACTACTAGCTACTAGCTAGCGCATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATCGTCATCG.......

As said, I think what you want to do is not possible with SSPACE. Maybe you can first do a gap closure on the scaffolds (e.g. with SOAP's GapCloser) so that the N's are removed from your data.

Boetsie
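
The end-overlap idea described above can be illustrated with a small suffix/prefix comparison in Python. This is a hedged sketch, not SSPACE's implementation: `end_overlap` is a hypothetical helper, and SSPACE's actual overlap thresholds and rules are not reproduced here. The two sequences are the ones from the "SSPACE does this" example.

```python
def end_overlap(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


left = "CATCGTACTACGTATCTGATAGCTAGCTAGCTACGATCGATC"
right = "GCTACGATCGATCAGTAGTAGATAGATAGATGATAG"

k = end_overlap(left, right)
# Merge by keeping `left` whole and appending the non-overlapping part of `right`.
merged = left + right[k:]
print(k, merged)
```

An overlap buried in the middle of a sequence, such as a match against a run of N's inside a scaffold, is not found by this kind of check because only the two ends are compared.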

**Autotroph** · 03-08-2011, 05:59 AM

Hi boetsie,

Thanks a lot for the patient explanation.


Scaffolding problem
