SEQanswers

Old 01-09-2013, 11:26 AM   #161
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

No, this won't work, since both reads of a pair have to map to the contigs. You would be better off making paired-end data by splitting the reads. For example, if your read is 200 bp long, you can make a paired-end read from the first 100 bp and the last 100 bp, and specify your insert size as 200 bp. I've never done this, but I think it could work.

Regards,
Boetsie
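
A minimal sketch of the read-splitting idea described above, assuming reads of at least 200 bp and an FR library orientation. The function names are hypothetical, and reverse-complementing the second half is an assumption made here so that the two halves face each other like a real FR pair; drop that step if your library orientation calls for it.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def split_read(name, seq, half=100):
    """Turn one long single-end read into an artificial read pair.

    Returns ((name/1, first half), (name/2, reverse-complemented last half)),
    or None when the read is too short to give two non-overlapping halves.
    """
    if len(seq) < 2 * half:
        return None
    left = seq[:half]
    # Assumption: FR orientation, so the second read is reverse-complemented.
    right = reverse_complement(seq[-half:])
    return (f"{name}/1", left), (f"{name}/2", right)


if __name__ == "__main__":
    pair = split_read("read0", "A" * 100 + "C" * 100)
    for header, sequence in pair:
        print(f">{header}\n{sequence}")
```

With halves taken from the two ends of a 200 bp read, the outer distance between them is the read length itself, which is why an insert size of 200 bp is suggested above.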

Quote:
Originally Posted by sheepyuan View Post
Hi,
I have a question. I have some single-end 454 data. How would SSPACE run if I artificially turned it into paired-end data in which the sequence of the other end is all "NNNNNNNNNN"?
boetsie is offline   Reply With Quote
Old 01-10-2013, 12:18 AM   #162
sheepyuan
Junior Member
 
Location: China

Join Date: May 2012
Posts: 3
Default

Quote:
Originally Posted by boetsie View Post
No, this won't work, since both reads of a pair have to map to the contigs. You would be better off making paired-end data by splitting the reads. For example, if your read is 200 bp long, you can make a paired-end read from the first 100 bp and the last 100 bp, and specify your insert size as 200 bp. I've never done this, but I think it could work.

Regards,
Boetsie
Thank you very much, I'll try your method of splitting the read!
sheepyuan is offline   Reply With Quote
Old 01-15-2013, 09:34 AM   #163
aharkess
Junior Member
 
Location: Georgia

Join Date: Aug 2011
Posts: 3
Default SSPACE combining cDNA and PE/MP

Hi all,

I'm using SSPACE with a wealth of data, from small PE libraries up to 20kb and 40kb mate pair libraries. In addition, I have three lanes of 2x100nt RNAseq, and I'm curious whether they could be incorporated. My genome is highly repetitive (70%), so I'm hoping that the more gene-space sequence, the better.

I've seen the nematode paper where RNApath was used to scaffold a genome with RNAseq reads, but has anyone successfully used cDNA + PE/MP WGS data in SSPACE? There are some obvious considerations with splicing, but perhaps the plus/minus insert size error can take this into account?

Thanks,
Alex
__________________
==========
Alex Harkess
Leebens-Mack Lab
Plant Biology Department
University of Georgia, Athens GA

Last edited by aharkess; 01-15-2013 at 09:50 AM.
aharkess is offline   Reply With Quote
Old 02-21-2013, 05:12 AM   #164
yzzhang
Member
 
Location: florida

Join Date: Jan 2013
Posts: 67
Default

Hello, have you used SSPACE to scaffold your genome with RNA-seq data? How did you determine your insert size? Thanks.
Quote:
Originally Posted by aharkess View Post
Hi all,

I'm using SSPACE with a wealth of data, from small PE libraries up to 20kb and 40kb mate pair libraries. In addition, I have three lanes of 2x100nt RNAseq, and I'm curious whether they could be incorporated. My genome is highly repetitive (70%), so I'm hoping that the more gene-space sequence, the better.

I've seen the nematode paper where RNApath was used to scaffold a genome with RNAseq reads, but has anyone successfully used cDNA + PE/MP WGS data in SSPACE? There are some obvious considerations with splicing, but perhaps the plus/minus insert size error can take this into account?

Thanks,
Alex
yzzhang is offline   Reply With Quote
Old 02-27-2013, 06:40 AM   #165
CPCantalapiedra
Member
 
Location: Zaragoza (Spain)

Join Date: Sep 2011
Posts: 38
Default

Hi!

I am getting very good results with SSPACE, Boetsie, and I plan to follow it up with GapFiller.
I have a bunch of questions, but the most important one right now is about the foundlinks files.

I am sure I am missing the true naming convention of the foundlinks file (I mean, do r1 and f1 refer to contig1 in the formattedcontigs file, and so on?). Can anyone shed some light on this, please?

If the question is not clear, read below (if it is, skip it).

I have done several SSPACE runs over Velvet-generated contigs, arranged in different FASTA inputs:
- 1: contigs 1,3,4,6,7
- 2: contigs 2,3,4,6
- 3: contigs 1,2,4,6

I use SSPACE with two read libraries, in two runs: the first one with both libraries, the second one with only the bigger insert size library. Both runs are free of correct scaffolds, so I then inspect the links. However, run1.big_insert_lib.foundlinks contains the same links as run2.big_insert_lib.foundlinks, but I am not able to associate them with the same contigs when I use the formattedcontigs file for name translation (hence the question above).
CPCantalapiedra is offline   Reply With Quote
Old 03-05-2013, 07:35 AM   #166
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Quote:
Originally Posted by CPCantalapiedra View Post
I am sure I am missing the true naming convention of the foundlinks file (I mean, do r1 and f1 refer to contig1 in the formattedcontigs file, and so on?). Can anyone shed some light on this, please?
Yes, this is indeed the case.

I'm not sure if I have to reply to the remaining questions, please contact me with a personal message if you need further help.

Regards,
Boetsie
boetsie is offline   Reply With Quote
Old 03-25-2013, 12:47 AM   #167
pjuneja
Member
 
Location: United Kingdom

Join Date: Aug 2011
Posts: 12
Default

I am using SSPACE to improve a genome assembly, and unfortunately it is giving me results that conflict with a genetic linkage map that I have made. I am trying to figure out what is causing the discrepancy. My organism has a repetitive genome so I suspect that is playing into it. Here are my questions:

1) I've read that SSPACE does not use reads that map to multiple locations within the genome. How does it obtain this information? Does it map to the entire scaffold or just the scaffold edges?

2) Is there a way to extract the exact positions where pairs used for scaffolding are mapped?

3) The ratio of pairs that satisfy versus do not satisfy the distance/logic requirements within contigs is very different from the ratio for pairs that map to different contigs. Is this normal? For example:

Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 600 +/-570): 110771
Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 300
Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 1897
---
Satisfied in distance/logic within a given contig pair (pre-scaffold): 6691
Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 56729

Thanks so much for your help and for writing a very useful program!
pjuneja is offline   Reply With Quote
Old 04-04-2013, 07:30 AM   #168
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Quote:
Originally Posted by pjuneja View Post
1) I've read that SSPACE does not use reads that map to multiple locations within the genome. How does it obtain this information? Does it map to the entire scaffold or just the scaffold edges?
It only uses the edges of the contigs/scaffolds based on the max insert size you have provided (insert size + (insert size * stdev)).
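
The formula above can be worked through as a short sketch. The function name is hypothetical; the numbers 600 and 0.95 are taken from the "600 +/-570" line in the statistics quoted in this thread (570 / 600 = 0.95), and the second call uses the 20000 / 0.35 library definition posted elsewhere in the thread.

```python
def edge_length(insert_size, error):
    """Length of contig edge searched for read mappings.

    Implements: insert size + (insert size * error), rounded to whole bases.
    """
    return int(round(insert_size + insert_size * error))


if __name__ == "__main__":
    # Library with insert size 600 and allowed error 0.95 (600 +/- 570).
    print(edge_length(600, 0.95))
    # Mate-pair library with insert size 20000 and allowed error 0.35.
    print(edge_length(20000, 0.35))
```

Under this formula, reads mapping further from a contig end than the computed length would not be considered for scaffolding.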

Quote:
Originally Posted by pjuneja View Post
2) Is there a way to extract the exact positions where pairs used for scaffolding are mapped?
No, I'm sorry, this can not be extracted from SSPACE. Only pairs that could not map correctly are stored in the folder 'pairinfo'.

You could of course map the reads to the edges yourself. The edges are in the 'alignoutput' folder, and the reads in the 'reads' folder.

Quote:
Originally Posted by pjuneja View Post
3) The ratio of pairs that satisfy versus do not satisfy the distance/logic requirements within contigs is very different from the ratio for pairs that map to different contigs. Is this normal? For example:

Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 600 +/-570): 110771
Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 300
Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 1897
---
Satisfied in distance/logic within a given contig pair (pre-scaffold): 6691
Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 56729

Thanks so much for your help and for writing a very useful program!
Well, it is common for the number of pairs within a contig to be higher than the number of pairs between contigs. Here, though, the number of pairs between two contigs is very high. I can't tell you why this is the case.
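
To make the discrepancy concrete, the counts quoted above can be turned into fractions. This is plain arithmetic on the posted numbers; the function name is hypothetical.

```python
def satisfied_fraction(satisfied, *unsatisfied):
    """Fraction of pairs that satisfy the distance/logic requirements."""
    total = satisfied + sum(unsatisfied)
    return satisfied / total


if __name__ == "__main__":
    # Pairs with both reads inside one contig:
    # satisfied, out-of-bounds distance, illogical orientation.
    within = satisfied_fraction(110771, 300, 1897)
    # Pairs spanning two different contigs: satisfied, out-of-bounds distance.
    between = satisfied_fraction(6691, 56729)
    print(f"within contigs:  {within:.1%}")
    print(f"between contigs: {between:.1%}")
```

Roughly 98% of the within-contig pairs are satisfied, against roughly 11% of the pairs spanning two contigs, which is the gap the question is about.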

Regards,
Boetsie
boetsie is offline   Reply With Quote
Old 04-15-2013, 01:29 PM   #169
mxr1895
Junior Member
 
Location: new zealand

Join Date: Feb 2012
Posts: 6
Default

Hi,
I'm using SSPACE to extend a hybrid (454 + Illumina PE) de novo assembly using more Illumina PE reads. I'm running into a problem with Perl after the mapping and extension.
The system is Ubuntu 12.04 64-bit. The output looks like this:

=>Mon Apr 15 18:27:32 2013: Reading, filtering and converting input sequences of library file initiated

------------------------------------------------------------

=>Mon Apr 15 18:54:15 2013: Building Bowtie index for contigs

=>Mon Apr 15 18:54:20 2013: Mapping reads to Bowtie index

=>Mon Apr 15 19:30:28 2013: Contig extension initiated

LIBRARY Lib1
------------------------------------------------------------

=>Mon Apr 15 19:48:03 2013: Reading contig file

=>Mon Apr 15 19:48:03 2013: Building Bowtie index for contigs

=>Mon Apr 15 19:48:08 2013: Mapping reads to contigs. Reading bowtie output and pairing contigs

=>Mon Apr 15 20:19:00 2013: Building scaffolds file

=>Mon Apr 15 20:19:01 2013: Merging contigs and creating fasta file of scaffolds
100Quantifier follows nothing in regex; marked by <-- HERE in m/* <-- HERE TTAAAAAA*CGTTTCTAACAGCTCTAGCAATATTCTAATTTCGAAAGT/ at /home/mron003/Programs/SSPACE-BASIC-2.0_linux-x86_64/SSPACE_Basic_v2.0.pl line 447, <IN> line 102.

Any ideas on what's going on?
Cheers,

Miguel
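
The error message above shows a sequence containing "*" characters being used as a regular expression, where a leading "*" has nothing to repeat. The sketch below reproduces that failure mode in Python rather than in the Perl of SSPACE itself, and shows a simple scan for characters outside A, C, G, T and N. The function names are hypothetical, and where the asterisks come from in this particular run is not known from the post.

```python
import re


def unexpected_characters(seq, allowed="ACGTN"):
    """Return the sorted set of characters in seq that are not allowed bases."""
    return sorted(set(seq.upper()) - set(allowed))


def compiles_as_regex(pattern):
    """True if the string can be compiled as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    # Pattern copied from the error message in the post.
    fragment = "*TTAAAAAA*CGTTTCTAACAGCTCTAGCAATATTCTAATTTCGAAAGT"
    print(unexpected_characters(fragment))         # characters worth checking
    print(compiles_as_regex(fragment))             # raw string as a pattern
    print(compiles_as_regex(re.escape(fragment)))  # escaped string as a pattern
```

Running a scan like this over the input contig FASTA may show whether the asterisks are already present in the input. In Perl, the analogous escaping is done with quotemeta or \Q...\E, though whether that is the appropriate fix at line 447 cannot be judged from the message alone.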
mxr1895 is offline   Reply With Quote
Old 04-17-2013, 10:51 AM   #170
brettroberts89
Junior Member
 
Location: San Diego, CA

Join Date: Mar 2013
Posts: 1
Post

Will the single end reads specified with the -u option be incorporated if -x is set to 0?

And is there a way to tell from the output files of SSPACE if these reads were used?
brettroberts89 is offline   Reply With Quote
Old 05-06-2013, 01:03 PM   #171
jjjscuedu
Member
 
Location: NY

Join Date: Mar 2012
Posts: 35
Default SSPACE error

Dear all,

I have a paired-end 454 library, from which I extracted the paired-end sequences myself into two files like this:

left
>H68R2DI01DH3A5/1
TTTCAAAGGAGATTGTCTGATAACTTCTCAAGAAAGAGAGCGTATGAATAGAGTTCCATATGCTTTGGCAG
>H68R2DI01DUQPJ/1
TTTCAAAGGAGATTGTCTGATAACTTCTCAAGAAAGAGAGCGTATGAATAGAGTTCCATATGCTTTGGCAG
>H68R2DI01DYR3Y/1
ACAATCTTCCTATACCAATCAAAATGACCATCTAGCAATGATATCCGATGTTCGGATAGGTCAAAAGATTGCAAAGTATCATTCAAGAACCTATTGGCAT


right
>H68R2DI01DH3A5/2
CTTGAAAATCAAAAGGCCGTATATGATAGGGCCGGACTTGGCTATAACCCTAC
>H68R2DI01DUQPJ/2
CGTAAAGAAACTAAAGTCTCGTAAAGTAAAATTTATTTAGTAAGTTAAATTTACTTAACGTAAAGTTAAAGTTAACGTTACCCTAAACCTAAATTAACCT
>H68R2DI01DYR3Y/2
GAGAACTGTGATGACAATCAAACTTTTATTCTCTGTAATGTAGGGATATCATTTTTGTATTAAGAGAATGTCATCGACATAC
>H68R2DI01DJ0QZ/2
AATAAATATACATATTCAATGCAACAATGAATAGGTACTCCTTGAAGTTTAAAAATCATATAAATT



Then, I run SSPACE like this:

perl /home/jingjing/software/SSPACE-BASIC-2.0_linux-x86_64/SSPACE_Basic_v2.0.pl -l library.txt -s ../oilpalm.gapclose.fa -k 5 -a 0.7 -x 0 -b oil_palm_no_extension


the library file is like this:

lib1 /backup/454/left_short.fa /backup/454/right_short.fa 20000 0.35 RF


However, the log looks very strange:

Your inserted inputs on [SSPACE_Basic_v2.0_linux] at Tue May 7 04:54:10 2013:
Required inputs:
-l = library.txt
-s = ../oilpalm.gapclose.fa
-b = oil_palm_no_extension

Optional inputs:
-x = 0
-z = 0
-k = 5
-a = 0.7
-n = 15
-T = 1
-p = 0


=>Tue May 7 04:54:10 2013: Reading, filtering and converting input sequences of library file initiated
Reading read-pairs lib1.1 @ 0 //there are no pair reads

------------------------------------------------------------

=>Tue May 7 04:54:12 2013: Storing contigs to format for scaffolding

LIBRARY lib1
------------------------------------------------------------

=>Tue May 7 04:56:33 2013: Reading contig file

=>Tue May 7 04:57:07 2013: Building Bowtie index for contigs


In the reads folder, I can see that it parsed the reads correctly:

[jingjing@tll-bioinfo02 reads]$ less -h 5 oil_palm_no_extension.lib1.file1.fa
>read0/1
TTTCAAAGGAGATTGTCTGATAACTTCTCAAGAAAGAGAGCGTATGAATAGAGTTCCATATGCTTTGGCAG
>read0/2
CTTGAAAATCAAAAGGCCGTATATGATAGGGCCGGACTTGGCTATAACCCTAC
>read1/1
TTTCAAAGGAGATTGTCTGATAACTTCTCAAGAAAGAGAGCGTATGAATAGAGTTCCATATGCTTTGGCAG
>read1/2
CGTAAAGAAACTAAAGTCTCGTAAAGTAAAATTTATTTAGTAAGTTAAATTTACTTAACGTAAAGTTAAAGTTAACGTTACCCTAAACCTAAATTAACCT
>read2/1
ACAATCTTCCTATACCAATCAAAATGACCATCTAGCAATGATATCCGATGTTCGGATAGGTCAAAAGATTGCAAAGTATCATTCAAGAACCTATTGGCAT
>read2/2
GAGAACTGTGATGACAATCAAACTTTTATTCTCTGTAATGTAGGGATATCATTTTTGTATTAAGAGAATGTCATCGACATAC


Can anyone give me some suggestions?

Jingjing
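
One sanity check that may be worth running on files like these is whether every read name in the left file has a mate in the right file. The sketch below assumes FASTA headers ending in /1 and /2, as in the excerpts above; the function names are hypothetical. In the excerpts as posted, the right file lists one read (H68R2DI01DJ0QZ) that does not appear in the left excerpt, although the excerpts may simply be truncated, and this is not claimed to be the cause of the log output.

```python
def read_ids(fasta_lines):
    """Collect read names from FASTA header lines, without the /1 or /2 suffix."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            name = line[1:].strip().split()[0]
            if name.endswith("/1") or name.endswith("/2"):
                name = name[:-2]
            ids.append(name)
    return ids


def unpaired(left_lines, right_lines):
    """Return (names only in left, names only in right), each sorted."""
    left = set(read_ids(left_lines))
    right = set(read_ids(right_lines))
    return sorted(left - right), sorted(right - left)


if __name__ == "__main__":
    # Headers copied from the post; sequences are omitted for brevity.
    left = [">H68R2DI01DH3A5/1", ">H68R2DI01DUQPJ/1", ">H68R2DI01DYR3Y/1"]
    right = [">H68R2DI01DH3A5/2", ">H68R2DI01DUQPJ/2",
             ">H68R2DI01DYR3Y/2", ">H68R2DI01DJ0QZ/2"]
    only_left, only_right = unpaired(left, right)
    print("only in left: ", only_left)
    print("only in right:", only_right)
```

For real files, pass open file handles instead of lists, for example unpaired(open("left_short.fa"), open("right_short.fa")).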
jjjscuedu is offline   Reply With Quote
Old 05-28-2013, 08:44 AM   #172
seb.lees
Member
 
Location: France, Poitiers

Join Date: Sep 2012
Posts: 12
Default

Hello boetsie,

I've posted a new thread about a problem I have when merging overlapping contigs using SSPACE:

http://seqanswers.com/forums/showpos...35&postcount=1

regards,

seb.


Quote:
Originally Posted by boetsie View Post
You could decrease the -a value to 0.5 (meaning that there should at least be 2 times more links) if multiple links are found.
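
The -a rule described above can be sketched as a ratio test between the two best-supported candidate contigs. This is an illustration of the description in this post only: the function names are hypothetical, and whether SSPACE uses a strict or non-strict comparison at the boundary is an assumption here.

```python
def link_ratio(best_links, second_best_links):
    """Ratio of the second-best link count to the best link count."""
    return second_best_links / best_links


def accept_best(best_links, second_best_links, a=0.7):
    """Accept the best candidate only if the runner-up is weak enough.

    With a = 0.5 the best candidate needs at least twice as many links
    as the second best. Assumption: the boundary value itself is accepted.
    """
    return link_ratio(best_links, second_best_links) <= a


if __name__ == "__main__":
    print(accept_best(20, 10, a=0.5))  # best has exactly 2x the links
    print(accept_best(20, 11, a=0.5))  # best has fewer than 2x the links
    print(accept_best(20, 11, a=0.7))  # looser threshold
```

Lowering -a therefore makes the scaffolder more conservative when several contigs compete for the same position.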

The -n parameter is useful for merging two contigs. Say you have contigA and contigB, they are scaffolded with a gap of -20bp. Then SSPACE will search for an overlap of -n or more nucleotides:

contigA
AGATGATATAAAAGTATAGATTA
contigB
ATAAAAGTATAGATTAGGGGTTATGATA

overlap:
AGATGATATAAAAGTATAGATTA
-------ATAAAAGTATAGATTAGGGGTTATGATA


So if the size of the overlap is above the defined -n parameter, they are merged together;
AGATGATATAAAAGTATAGATTAGGGGTTATGATA

regards,
Boetsie
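
The merging step in the example above can be sketched as a search for the longest exact match between the end of contigA and the start of contigB. The function name is hypothetical, the default of 15 is the -n value shown in a log elsewhere in this thread, and exact matching is a simplification assumed for this sketch.

```python
def merge_if_overlapping(contig_a, contig_b, min_overlap=15):
    """Merge two contigs if the end of contig_a matches the start of contig_b.

    Tries the longest possible overlap first and stops at min_overlap.
    Returns the merged sequence, or None when no sufficient overlap exists.
    """
    longest = min(len(contig_a), len(contig_b))
    for k in range(longest, min_overlap - 1, -1):
        if contig_a[-k:] == contig_b[:k]:
            # Keep contig_a whole and append the non-overlapping tail of contig_b.
            return contig_a + contig_b[k:]
    return None


if __name__ == "__main__":
    contig_a = "AGATGATATAAAAGTATAGATTA"
    contig_b = "ATAAAAGTATAGATTAGGGGTTATGATA"
    print(merge_if_overlapping(contig_a, contig_b))
    # A threshold above the actual overlap length leaves the contigs unmerged.
    print(merge_if_overlapping(contig_a, contig_b, min_overlap=17))
```

The two example contigs share 16 bases, so they merge with a threshold of 15 but not with a threshold of 17.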
seb.lees is offline   Reply With Quote
Old 05-29-2013, 01:03 AM   #173
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Hi seb,

could you maybe send me a personal message and show me by an example what you mean?

Regards,
Boetsie
boetsie is offline   Reply With Quote
Old 08-02-2013, 11:41 AM   #174
OTU
Member
 
Location: Utah

Join Date: May 2013
Posts: 44
Default

Hi Boetsie,

I am trying to run SSPACE and I am having an error "Can't locate getopts.pl in @INC (@INC contains: /home/annet/Programs/SSPACE/dotlib/ /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /home/annet/Programs/SSPACE/SSPACE_Basic_v2.0.pl line 87."

Could you please tell me where I can find this getopts.pl file?

Looking forward to your answer!

Anna
OTU is offline   Reply With Quote
Old 08-02-2013, 01:31 PM   #175
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Hmm, it seems that the getopts.pl library was removed in the newest version of Perl (see http://search.cpan.org/~rjbs/perl-5....s_and_Pragmata). This site explains how to solve the issue:
http://heasarc.gsfc.nasa.gov/lheasoft/bugs.html

Hope this helps.
Boetsie

Quote:
Originally Posted by OTU View Post
Hi Boetsie,

I am trying to run SSPACE and I am having an error "Can't locate getopts.pl in @INC (@INC contains: /home/annet/Programs/SSPACE/dotlib/ /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /home/annet/Programs/SSPACE/SSPACE_Basic_v2.0.pl line 87."

Could you please tell me where I can find this getopts.pl file?

Looking forward to your answer!

Anna
boetsie is offline   Reply With Quote
Old 08-02-2013, 07:18 PM   #176
OTU
Member
 
Location: Utah

Join Date: May 2013
Posts: 44
Default

Thank you, Boetsie! It helped!

Anna
OTU is offline   Reply With Quote
Old 08-04-2013, 05:19 PM   #177
OTU
Member
 
Location: Utah

Join Date: May 2013
Posts: 44
Default

Boetsie,

I am having trouble with my input data. At the very beginning of the run I get an error:
>> Can't write to single file filereads//home/annet/output6/TM7.Lib1.filtered.readpairs.singles.fasta-- fatal

What could this be about? My data consist of two FASTQ files of FR reads and a FASTA file of contigs.

Anna
OTU is offline   Reply With Quote
Old 09-21-2013, 08:03 PM   #178
xuseq
Junior Member
 
Location: China

Join Date: Sep 2013
Posts: 2
Default

Hi, Anna
Have you resolved your problem yet? If so, how?
I have the same problem.

xu
Quote:
Originally Posted by OTU View Post
Boetsie,

I am having trouble with my input data. At the very beginning of the run I get an error:
>> Can't write to single file filereads//home/annet/output6/TM7.Lib1.filtered.readpairs.singles.fasta-- fatal

What could this be about? My data consist of two FASTQ files of FR reads and a FASTA file of contigs.

Anna
xuseq is offline   Reply With Quote
Old 09-22-2013, 12:32 PM   #179
OTU
Member
 
Location: Utah

Join Date: May 2013
Posts: 44
Default

Xu,

Have you specified the output directory in your command line?
If so, remove it.
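
The path in the error message ("reads//home/annet/output6/TM7.Lib1...") looks like a full directory path was pasted after a "reads/" folder name. The sketch below illustrates that reading: the way the file name is put together is inferred from the error message only, not taken from the SSPACE source, and both function names are hypothetical.

```python
def is_plain_base_name(base):
    """True if base is a bare name with no directory component."""
    return base != "" and "/" not in base and "\\" not in base


def singles_path(base, lib="Lib1"):
    """Hypothetical reconstruction of the file name seen in the error message."""
    return "reads/" + base + "." + lib + ".filtered.readpairs.singles.fasta"


if __name__ == "__main__":
    for base in ("/home/annet/output6/TM7", "TM7"):
        status = "ok" if is_plain_base_name(base) else "contains a directory"
        print(f"{base!r}: {status} -> {singles_path(base)}")
```

With a bare base name the file lands directly inside the reads folder, whereas a base name containing a directory points at nested folders that do not exist there.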
OTU is offline   Reply With Quote
Old 09-22-2013, 05:16 PM   #180
xuseq
Junior Member
 
Location: China

Join Date: Sep 2013
Posts: 2
Default

Hi, Anna

That's right! Thank you!


Quote:
Originally Posted by OTU View Post
Xu,

Have you specified the output directory in your command line?
If so, remove it.
xuseq is offline   Reply With Quote