SEQanswers

Old 02-18-2011, 09:56 AM   #41
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Hi Hliang,

No, I'm sorry, this is not possible. The reads must be supplied as pairs, in two separate files.

We use Bowtie for mapping, and only reads that map over their entire length are used for scaffolding. If the whole read can be mapped to the contig (that is, without gaps), it should be possible. Whether it really works, I honestly don't know; you can give it a try. The difference in size does not matter; Illumina reads of different lengths can also be used. In the future it would be a good idea to have a mapper for longer sequences. Do you know of any?

Boetsie
Old 02-18-2011, 10:22 AM   #42
hliang
Junior Member
 
Location: US

Join Date: Oct 2010
Posts: 3
Default

Gotcha.

I'm not doing a lot of mapping at the moment, but there are a bunch of programs you can take a look at here: http://en.wikipedia.org/wiki/List_of...nment_software
MUMmer and MAQ can handle long reads.

There is another one, called LAST, that is not on that list: http://last.cbrc.jp/

Old 02-22-2011, 02:30 PM   #43
themwg
Junior Member
 
Location: Madison, WI

Join Date: Jan 2011
Posts: 6
Default

I have a question or two about the mapping stage.

I'm working with datasets that consist of a contig file assembled from both paired-end and mate-pair data. I'm running SSPACE with that contig file against the mate-pair reads for scaffolding. In my best case I have 80 million inserted pairs, 10 million single reads and 7 million pairs with pairing contigs. In other cases I have 25 million inserted pairs, 600k single reads and 400k pairs with pairing contigs.

In the first case I do end up with extensive scaffolding despite only ~6% of the reads mapping. In the other cases, with less than 1% of the reads used for mapping, I get very little scaffolding. I'm a little concerned about the low fraction of reads mapping to my contigs. Without getting into the details of my datasets (they are from different species, which could be the source of the difference), I'm curious whether you have any thoughts on this from the program's point of view.

Perhaps I just need some clarification of some of the terms.
#number of single reads found on contigs =
(I use an insert size of 3000 bp with a std dev of 0.5.)
Regarding the mapping step, does this mean you take the 4500 bp from the left and right edge of each contig to use for mapping, or do you delete 4500 bp off each edge and use just the middle of the contigs? I assume it's the first option, but the readme file uses the word "subtracted", which is somewhat misleading.

#number of pairs found with pairing contigs =
For "pairing contigs" I get numbers that are greater than half the number of single reads. If SSPACE uses 10 million single reads for mapping, I would expect to get at most 5 million pairs.

#total pairs =
I'm unclear about what this number means. Is it the total number of read pairs used in mapping? If so, I don't see how it relates to the single reads. My understanding is that SSPACE/Bowtie takes all the read pairs that don't contain Ns and maps each read individually to the contigs. It then determines which of the reads are paired, which contigs they lie on, and so on.

Any light you could shed would be greatly appreciated. I'm fully prepared to find out that I'm just being dense.
Old 02-23-2011, 12:13 AM   #44
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Hi themwg,

Thank you for the good points you raise. I do indeed see some vague descriptions and mistakes in the summary file.

About your questions:

- I do indeed take the 4500 bp from the left and right edges of each contig for scaffolding, which is of course the obvious method.

- You are absolutely right: the number of pairs should be at most half the number of single reads. I see that I displayed the wrong variable in my script. I will fix this in the next release.

- As said above, the total number of single reads is calculated wrongly. "Total pairs" is the count after a sort of filtering step on the pairs. The actual number of pairs used for scaffolding is the value given at "Assembled pairs".

I'm sorry for the mistakes. As said, I will fix this in the next release, which will probably come out next week.

Kind regards,
Boetsie
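
As a rough illustration of the edge selection discussed above, here is a minimal sketch. It assumes that the 4500 bp comes from insert size x (1 + allowed deviation), i.e. 3000 x 1.5, which matches the numbers in the question but is not confirmed by the SSPACE source. The function name contig_edges and its parameters are made up for this example.

```python
def contig_edges(contig_seq, insert_size, deviation):
    """Return the left and right edge of a contig that reads would be mapped to.

    Assumption (not taken from the SSPACE code): the edge length is
    insert_size * (1 + deviation), e.g. 3000 * (1 + 0.5) = 4500 bp.
    """
    edge_len = int(insert_size * (1 + deviation))
    if len(contig_seq) <= 2 * edge_len:
        # Short contig: the two edges would overlap, so keep the whole sequence.
        return contig_seq, ""
    return contig_seq[:edge_len], contig_seq[-edge_len:]


left, right = contig_edges("A" * 20000, insert_size=3000, deviation=0.5)
print(len(left), len(right))  # both edges are 4500 bp long
```

Reads that fall in the middle of a long contig never reach this subset, which is one plausible reason why only a small fraction of the reads is reported as mapped.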
Old 02-23-2011, 03:10 PM   #45
jstjohn
Member
 
Location: San Francisco, CA

Join Date: Jun 2010
Posts: 35
Default file in ./reads/ folder really small

Hey,
I am a little worried because my input files were each about 3 GB of gzipped FASTQ, while the .fasta files in the ./reads/ folder are only about 100 MB each. I am pretty sure there are more perfect reads without Ns than that. I trimmed bases from the beginning and end of the reads before running the program, and the files should only contain reads longer than 30 nt.

One possible bug: I noticed that a few of the FASTQ reads in my input files have zero length. They are still paired properly and have the right newlines, so the two files have the right relative length. Do you think that is causing issues for the program?

UPDATE: I fixed the issue with the few zero-length reads, and the output files are still very small compared with the input files. Maybe I just don't understand what the files in the ./reads folder are?

Thanks!
-John

Last edited by jstjohn; 02-23-2011 at 06:26 PM.
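
For anyone hitting the same zero-length reads, one way to remove them while keeping the two files in sync might look like the sketch below. It assumes plain 4-line FASTQ records with mates in the same order in both files; the function names are invented for this example and have nothing to do with SSPACE itself.

```python
import io


def read_fastq(handle):
    """Parse a 4-line-per-record FASTQ stream into (header, seq, qual) tuples."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def drop_empty_pairs(records1, records2):
    """Yield (rec1, rec2) only when both mates have a non-empty sequence."""
    for rec1, rec2 in zip(records1, records2):
        if rec1[1] and rec2[1]:
            yield rec1, rec2


# Tiny in-memory example: the second pair has an empty first read.
fq1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\n\n+\n\n")
fq2 = io.StringIO("@r1/2\nTTGA\n+\nIIII\n@r2/2\nGGCC\n+\nIIII\n")
kept = list(drop_empty_pairs(read_fastq(fq1), read_fastq(fq2)))
print(len(kept))  # 1: the r2 pair is dropped as a whole
```

Dropping the whole pair, not just the empty read, is what keeps the two files at the same relative length.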
Old 02-24-2011, 01:29 AM   #46
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Hi jstjohn,

In the 'reads' folder, two files are generated per library.

The file with the *.filtered.readpairs.singles.fasta extension is produced while the paired-read files are read in. It is filtered for pairs containing Ns: if either of the two sequences of a pair contains an 'N' character, the whole pair is discarded.

The file with the *.foundpairedreads extension contains the paired reads used for scaffolding. Keep in mind that we only use the edges of the contigs for mapping the reads, so fewer reads can be mapped (we don't mind, because reads that fall in the middle of a contig are not very useful for scaffolding).

Sorry that this was not clear.

Boetsie
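
The N filter described above can be sketched in a few lines. This is only an illustration of the rule "one N in either mate discards the pair"; it is not the SSPACE code, and the function name is made up.

```python
def filter_pairs_with_n(pairs):
    """Split read pairs into kept and discarded, based on 'N' characters.

    pairs is an iterable of (read1_sequence, read2_sequence) tuples.
    """
    kept, discarded = [], []
    for read1, read2 in pairs:
        if "N" in read1.upper() or "N" in read2.upper():
            discarded.append((read1, read2))  # one N is enough to drop both mates
        else:
            kept.append((read1, read2))
    return kept, discarded


pairs = [("ACGT", "TTGA"), ("ACNT", "TTGA"), ("ACGT", "NNNN")]
kept, discarded = filter_pairs_with_n(pairs)
print(len(kept), len(discarded))  # 1 2
```

Comparing the kept count with the number of input pairs is a quick way to see how much of a library is lost to this filter before any mapping takes place.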
Old 03-15-2011, 04:59 AM   #47
davfre
Junior Member
 
Location: Bergen, Norway

Join Date: Nov 2009
Posts: 2
Default

I, too, would like to use BAC end sequences (Sanger) to scaffold pre-assembled contigs/scaffolds. I had trouble running Bambus (because it requires a .contig file) and found SSPACE. However, Bowtie fails to map the majority of the long reads because of indels and accumulated mismatches, so I would rather use the best mappings from BWA-SW, BLAT or another tool.

Has anyone already solved the problem of either reading SAM files directly or converting them to the simple tab-delimited format that SSPACE asks for? That would be great, as most aligners can output SAM, or their output can be converted to SAM with existing tools.
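
A possible starting point for such a conversion is sketched below. It only relies on the standard SAM columns and FLAG bits (0x4 unmapped, 0x10 reverse strand, 0x80 second in pair, 0x100 secondary, 0x800 supplementary) and assumes the mates were aligned in paired mode so that these flags are set. The output column layout is hypothetical and is not a format that SSPACE is known to accept.

```python
def sam_to_mate_table(sam_lines):
    """Yield one tab-delimited row per read pair with both mates mapped.

    Output columns (hypothetical layout, chosen for this example):
    read name, contig 1, position 1, strand 1, contig 2, position 2, strand 2.
    Positions are the 1-based leftmost coordinates given in the SAM file.
    """
    mates = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag, contig, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue  # skip unmapped, secondary and supplementary records
        strand = "-" if flag & 0x10 else "+"
        mate_index = 1 if flag & 0x80 else 0
        mates.setdefault(name, {})[mate_index] = (contig, pos, strand)
    for name, hits in mates.items():
        if 0 in hits and 1 in hits:
            yield "\t".join(map(str, (name, *hits[0], *hits[1])))


example = [
    "@SQ\tSN:contig1\tLN:5000\n",
    "bac1\t65\tcontig1\t100\t60\t4M\tcontig2\t900\t0\tACGT\tIIII\n",
    "bac1\t145\tcontig2\t900\t60\t4M\tcontig1\t100\t0\tTTGA\tIIII\n",
]
for row in sam_to_mate_table(example):
    print(row)
```

Pairs whose mates land on different contigs are the ones that carry scaffolding information, so a further filter on contig 1 != contig 2 would be a natural next step.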
Old 03-15-2011, 08:04 AM   #48
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Hi davfre,

I'm sorry, I have not included this functionality yet. Also, it is not possible to supply tab-delimited files to SSPACE. The reason is that I have not had much time to work on this. I hope to do so in the near future; it is on my to-do list.

Kind regards,
Boetsie
Old 03-28-2011, 02:21 PM   #49
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Quote:
Originally Posted by e-summer-3 View Post
SSPACE is very nice tool for us. Thank you for your good job.

By the way, what should I call SSPACE?

es es pace?
es pace?
es space?

Regards.
I'm sorry, I had not seen this question before; I must have missed it. You can call it 'es space'.

Boetsie
Old 03-30-2011, 04:37 AM   #50
SLB
Member
 
Location: Ireland

Join Date: Sep 2010
Posts: 21
Default input files

Hi, I am just trying out SSPACE for a large genome (2.7 Gb) using contigs assembled with ABySS and a number of 3, 5 and 10 kb mate-pair libraries. Is there any way of getting SSPACE to accept compressed files as input?
Old 03-30-2011, 07:14 AM   #51
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Hi SLB,

I'm sorry, but this is currently not possible.

Boetsie
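
Until compressed input is supported, the files have to be decompressed before they are listed in the library file. A minimal sketch using only the Python standard library is shown below; the file names in the usage comment are placeholders.

```python
import gzip
import shutil


def gunzip_to(path_in, path_out):
    """Stream-decompress a .gz file to path_out without loading it into memory."""
    with gzip.open(path_in, "rb") as src, open(path_out, "wb") as dst:
        shutil.copyfileobj(src, dst)


# Usage (placeholder file names):
# gunzip_to("matepairs_3kb_1.fastq.gz", "matepairs_3kb_1.fastq")
# gunzip_to("matepairs_3kb_2.fastq.gz", "matepairs_3kb_2.fastq")
```

Copying in chunks keeps memory use flat, which matters for libraries of this size, but the uncompressed files will need several times the disk space of the originals.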
Old 05-05-2011, 04:17 AM   #52
SLB
Member
 
Location: Ireland

Join Date: Sep 2010
Posts: 21
Default

I would be interested to hear whether anyone has been experimenting with the SSPACE parameters (k, a) and has any insights.

Cheers,

Stephen
Old 05-26-2011, 12:02 PM   #53
ajtritt
Junior Member
 
Location: United States

Join Date: May 2011
Posts: 1
Default -t parameter

hi boetsie,

First, SSPACE is a great piece of software. Hands down, it's the easiest to use and most efficient scaffolder I've found and gives great results. Thanks for all your work.

I have a question about the -t parameter. After clipping up to -t bases, does SSPACE try to extend the contigs again?

thanks,
ajtritt
Old 05-26-2011, 12:56 PM   #54
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Hi ajtritt,

Thank you so much for this nice comment! Good to hear that it gives great results for you; I'm glad it could help.

To clarify the trimming (-t option): SSPACE does not clip up to -t bases first. It first tries to extend. Then, if it can no longer extend a contig, it trims bases and tries to extend the contig again. This continues until no more extension is possible and trimming does not lead to further extension.

Kind regards,
Boetsie
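
The order of operations described above (extend first, trim only when stuck, then retry) can be sketched as a small loop. This is an illustration of the control flow only, not the SSPACE implementation: try_extend is a caller-supplied, hypothetical function, only the 3' end is trimmed, and a trim is accepted only if the result ends up longer than before so that the loop always terminates.

```python
def extend_with_trimming(contig, try_extend, max_trim):
    """Extend a contig; when stuck, trim up to max_trim end bases and retry.

    try_extend(contig) is a hypothetical callback that returns a longer
    contig, or the unchanged contig if no extension is found.
    """
    while True:
        extended = try_extend(contig)
        if len(extended) > len(contig):
            contig = extended  # extension worked, keep going
            continue
        # Stuck: trim 1..max_trim bases from the end and see if that helps.
        for n in range(1, max_trim + 1):
            retry = try_extend(contig[:-n])
            if len(retry) > len(contig):
                contig = retry
                break
        else:
            return contig  # neither extension nor trimming helped


def toy_extend(contig):
    # Toy rule for the demo: a contig ending in "ACG" can be extended by "TT".
    return contig + "TT" if contig.endswith("ACG") else contig


print(extend_with_trimming("GGACGA", toy_extend, max_trim=2))  # GGACGTT
```

In the toy run the final "A" blocks extension, one base is trimmed, the contig is extended by "TT", and the loop stops once further trimming no longer produces a longer contig.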
Old 06-07-2011, 06:44 AM   #55
SLB
Member
 
Location: Ireland

Join Date: Sep 2010
Posts: 21
Default

Is it possible to change the bowtie options to allow mismatches?
Old 06-07-2011, 07:19 AM   #56
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Hi SLB,

Not at the moment, but in the next version I'm planning to include this option, as well as multithreaded Bowtie.

For now, you can change the code yourself. Go to the 'bin' directory and open the file MapWithBowtie.pl, then find the lines starting with 'system'.

Change the -v 0 to the number of mismatches you want to allow; for example, -v 2 allows two mismatches. Note that Bowtie's -v option sets the number of mismatches, not gaps.

To allow mismatches when mapping the reads to the contigs for extension, change the first line starting with system("$bowtiepath ...
To allow mismatches when mapping the reads to the contigs for scaffolding, change the second line starting with system("$bowtiepath ...

Boetsie
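
If you prefer not to edit the file by hand, the change could be scripted along the lines of the sketch below. It assumes the relevant lines start with system and contain the literal text -v 0, as described above; the example line is a made-up stand-in, not the real content of MapWithBowtie.pl. Keep a backup of the original file.

```python
import re


def set_bowtie_mismatches(perl_source, mismatches):
    """Return perl_source with '-v 0' replaced by '-v <mismatches>' on system lines."""
    if not 0 <= mismatches <= 3:
        raise ValueError("Bowtie's -v option accepts 0 to 3 mismatches")
    patched = []
    for line in perl_source.splitlines(keepends=True):
        if line.lstrip().startswith("system"):
            line = re.sub(r"-v\s+0\b", f"-v {mismatches}", line)
        patched.append(line)
    return "".join(patched)


# Made-up stand-in for a line in MapWithBowtie.pl:
example = 'system("$bowtiepath -v 0 contigs reads > hits");\n'
print(set_bowtie_mismatches(example, 2), end="")
```

This patches every system line that contains -v 0, so extension and scaffolding both change; to change only one of them, edit the first or the second line by hand as explained above.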
Old 06-07-2011, 07:28 AM   #57
SLB
Member
 
Location: Ireland

Join Date: Sep 2010
Posts: 21
Default

Hi Boetsie,

Thanks a million.

Cheers,

Stephen

Old 06-08-2011, 03:33 AM   #58
VidJa
Junior Member
 
Location: The Netherlands

Join Date: Apr 2010
Posts: 7
Default

Hi Boetsie,

Do you plan to support SAM/BAM as the mapping format instead of the default Bowtie output? That way you could drop in any SAM/BAM-capable aligner, such as BWA (for longer reads, with the sw option), SMALT http://www.sanger.ac.uk/resources/software/smalt/
or Stampy http://www.well.ox.ac.uk/project-stampy
and probably a whole bag of other aligners.
Old 06-08-2011, 06:35 AM   #59
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Hi VidJa,

Yes, I'm currently working on this, though I don't have much time. Thank you for suggesting BWA for longer reads! I hope I can include more aligners.

Boetsie

Old 06-08-2011, 08:34 AM   #60
KanyeDidIt
Junior Member
 
Location: US

Join Date: Sep 2010
Posts: 8
Default

edited after solution

Last edited by KanyeDidIt; 06-08-2011 at 08:42 AM. Reason: solved
All times are GMT -8.