#121 | |
|
Senior Member
Location: NL, Leiden Join Date: Feb 2010
Posts: 206
|
Hi Lisa,
No problem, good that I could help you and that it's all working now! Good luck, and feel free to contact me if you have any questions.
Regards,
Boetsie
#122 |
|
Senior Member
Location: NL, Leiden Join Date: Feb 2010
Posts: 206
|
Hi all,
We have released a new version of both SSPACE Basic and SSPACE Premium. SSPACE Basic is the previous version of SSPACE Premium. The new SSPACE Premium contains the following new features:
See our website for more information about SSPACE and GapFiller: http://www.baseclear.com/landingpage...ics-solutions/
Kind regards,
Boetsie
#123 | |
|
Member
Location: Belgium Join Date: Aug 2011
Posts: 14
|
Quote:
Does this mean that we should get better results with the -m parameter optimised for the k-mer size instead of the read length? How can we know the k-mer size used, and how do we best adjust the -m value, for example for a 50bp read?
Regards,
Steve
#124 |
|
Senior Member
Location: NL, Leiden Join Date: Feb 2010
Posts: 206
|
Hi Steve,
The k-mer size used is simply the -m value plus one. -m thus actually specifies the overlap the k-mer should have with the contig, and the extra nucleotide is the 'overhang'. The difference between the two methods:

Previous method:
ctg: GTCGATAGATAGATCGTCGATAGTAGTCGA
read:...GATTGATAGATCGTCGATAGTAGTCGAG

The above read will not be used for extension, since it contains a mismatch and thus does not fully overlap with the contig.

The new method cuts the read into k-mers. Say we use a -m of 20; the k-mers of the read are:
READ: GATAGATCGTCGATAGTAGTCGAGAT
kmer: GATAGATCGTCGATAGTAGTC
kmer: .ATAGATCGTCGATAGTAGTCG
kmer: ..TAGATCGTCGATAGTAGTCGA
kmer: ...AGATCGTCGATAGTAGTCGAG
kmer: ....GATCGTCGATAGTAGTCGAGA
kmer: .....ATCGTCGATAGTAGTCGAGAT
etc.

If we now extend the contig, the overlapping k-mer is:
ctg: GTCGATAGATAGATCGTCGATAGTAGTCGA
read:..........AGATCGTCGATAGTAGTCGAG

This increases the coverage, since k-mers that do not contain the error can still be used, which helps especially for longer reads.
Regards,
Boetsie
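The k-mer idea described in this post can be illustrated with a small Python sketch. It is not SSPACE's actual code: the function names `kmers` and `overhang_base` are made up for illustration, and the sketch assumes that a k-mer of size -m + 1 is a hit when its first -m bases exactly match the last -m bases of the contig.

```python
def kmers(read, k):
    """Return every substring of length k in the read, left to right."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def overhang_base(contig, read, m):
    """Return the base a read would add to the contig's 3' end, or None.

    Hypothetical helper (not taken from SSPACE): the read is cut into
    k-mers of size m + 1, and a k-mer counts as a hit when its first m
    bases equal the last m bases of the contig. The final base of that
    k-mer is the one-nucleotide overhang.
    """
    suffix = contig[-m:]
    for kmer in kmers(read, m + 1):
        if kmer[:m] == suffix:
            return kmer[-1]
    return None


contig = "GTCGATAGATAGATCGTCGATAGTAGTCGA"
# The read from the 'previous method' example, with a mismatch near its start.
read_with_error = "GATTGATAGATCGTCGATAGTAGTCGAG"

print(overhang_base(contig, read_with_error, 20))  # prints G
```

The read as a whole does not overlap the contig perfectly, but one of its 21-mers does, so it can still contribute the overhang base.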
#125 |
|
Member
Location: Germany Join Date: Apr 2010
Posts: 93
|
Hey all,
I've got quite a strange problem. My contig FASTA file looks like this:
>22617
GTCTACTTCAGACAAGGAAGACGGTCTACTTCAGATGAGGAAGATGGTCTGCTACAAAGGGAAGACGGTCTGCTTCAGGCCAGGAAGACGGTCTGCTACA
>22619
CGTCTTCCAATTTTGAATCAGACCGTCTTGATTTTGAATTGGACCGTCTCCCCTGGGCGCATCTGCTGGGCCGCTGGGGCTGGAACCGTGGCTCAAAATT
>22621
TTCCTCAGCAACAACATTGATGGTGTCTTTTGTGTACATGTATGAGTAGTCAGTCAAGTAAAGTATGCGCACCTGTCTTTTGGTAAGCCTACGCAGCCTG
>22623
AGGCACTCTGCCCGAGTGGTTAAGGGGTAAGTCTCGAATACATTATTCGACCGTCCATCATGACGGGTTAACTTATAGGCTCTGCCTGCGTCGGTTCAAA
But the program tells me:
ERROR: Invalid (-s) contig file /home/dpr..../de_novo_assembly_DNA/SOAPdenovo_39/PseudoAfi_K39.contig.fastasorted.fasta ...Exiting.
So can you tell me why my file should be corrupt? Any help is kindly appreciated.
Best,
Phil
#126 | |
|
Senior Member
Location: NL, Leiden Join Date: Feb 2010
Posts: 206
|
Quote:
The error has nothing to do with the file format. The line where this error occurs just checks whether the contig file exists or not. Somehow it does not find your file. Can you check that the file is really at the specified location and that the user rights are correct?
Boetsie
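One quick way to run the checks suggested here, sketched in Python. The helper name `check_input_file` and the example path are hypothetical; the sketch only separates "file is missing" from "file is there but not readable", which are the two cases mentioned above.

```python
import os


def check_input_file(path):
    """Return a short diagnosis of why a tool might fail to open path."""
    if not os.path.exists(path):
        return "missing"
    if not os.path.isfile(path):
        return "not a regular file"
    if not os.access(path, os.R_OK):
        return "not readable"
    return "ok"


# Hypothetical path, used only for illustration.
print(check_input_file("/path/to/contigs.fasta"))
```

Running this as the same user, and from the same working directory, as the scaffolding run matters, because relative paths and permissions both depend on it.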
#127 |
|
Member
Location: Germany Join Date: Apr 2010
Posts: 93
|
Hey,
Sorry for the late answer, but I was not in the office for the last few days. I checked the location and it is the right one, so maybe I got something wrong in the library file. Here is the line containing my library:
TrueSeqStd /home/dpr/P/PA/SGII_ATCACG_L003_R1.fastq /home/dpr/P/PA/SGII_ATCACG_L003_R2.fastq 50 0.5 FR
Maybe there is a fault?
Best,
Phil
Got it, thanks for the help.
Last edited by sphil; 01-12-2012 at 11:55 PM. Reason: solved
#128 |
|
Junior Member
Location: Lviv Join Date: May 2011
Posts: 5
|
Dear boetsie,
Is it possible to implement a feature in SSPACE so that it recognizes inward-facing reads in an Illumina MP library? This is a serious problem for some library preparations. This feature is present in the Ray assembler, for example: http://seqanswers.com/forums/showthr...?t=4301&page=7
Regards,
Nestor
#129 | |
|
Senior Member
Location: NL, Leiden Join Date: Feb 2010
Posts: 206
|
Hi Nestor,
This is already implemented in SSPACE. Basically, Ray does the same as SSPACE by incorporating an allowed insert-size range, for example an insert size of 4000 with 0.25 deviation (the range is thus 3000-5000). This will initially filter out 'paired-end' reads, since these have smaller insert sizes (< 500bp). In addition, SSPACE requires the orientation of the paired reads for each library. If you specify the orientation <-- -->, then --> <-- paired reads will not be taken into account for scaffolding.
Regards,
Boetsie
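The two filters described in this post (allowed insert-size range and required orientation) can be sketched in Python as follows. This is an illustration, not SSPACE code. It assumes a library line has six whitespace-separated fields in the order seen earlier in the thread (name, two read files, insert size, deviation, orientation); the field names, the helper names, the file names and the use of "RF" as the code for <-- --> are assumptions made for the example.

```python
from collections import namedtuple

# Field names are illustrative; only the column order follows the thread.
Library = namedtuple(
    "Library", "name reads1 reads2 insert_size deviation orientation"
)


def parse_library_line(line):
    """Split one library line into typed fields."""
    name, reads1, reads2, insert, deviation, orientation = line.split()
    return Library(name, reads1, reads2, int(insert), float(deviation), orientation)


def allowed_range(lib):
    """Insert size plus/minus the given fraction, e.g. 4000 and 0.25."""
    low = lib.insert_size * (1 - lib.deviation)
    high = lib.insert_size * (1 + lib.deviation)
    return low, high


def pair_is_usable(lib, distance, orientation):
    """A pair is used only if both its orientation and distance fit."""
    low, high = allowed_range(lib)
    return orientation == lib.orientation and low <= distance <= high


# Hypothetical mate-pair library; "RF" stands in for <-- --> here.
lib = parse_library_line("MP4k mp_R1.fastq mp_R2.fastq 4000 0.25 RF")
print(allowed_range(lib))               # (3000.0, 5000.0)
print(pair_is_usable(lib, 400, "FR"))   # False: short, inward-facing pair
print(pair_is_usable(lib, 4200, "RF"))  # True
```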
#130 | |
|
Junior Member
Location: Lviv Join Date: May 2011
Posts: 5
|
Dear boetsie,
What about libraries where the number of "smaller insert size" read pairs is significantly higher than the number of "long insert size" read pairs? Don't you think that using such libraries with SSPACE could lead to horrible results such as, in some cases, re-orienting the contigs? Is SSPACE now capable of detecting such libraries by counting the PE/MP ratio of reads that were mapped within each contiguous sequence of DNA?
Regards,
Nestor
#131 | |
|
Senior Member
Location: NL, Leiden Join Date: Feb 2010
Posts: 206
|
Quote:
I do not see the benefit of including the PE/MP ratio of reads mapped within a contig, since those reads do not contribute to the scaffolding process. They can only influence the process when the pairs are aligned on different contigs but, as said, they will be filtered out because of orientation.
#132 | |
|
Junior Member
Location: Lviv Join Date: May 2011
Posts: 5
|
Dear boetsie,
Thank you for the answer. I still, however, would not agree. Correct me, please, if I am wrong. If we have contig 1 and contig 2 with some PE reads (short arrow "->") and some MP reads (longer arrow "-->") like this: Code:
contig 1 contig 2
5`------------3` 5`------------3`
<-- -> <- -->
-> <-
---------- 4000bp ----------
We gave SSPACE the information that the library is MP with 4000bp insert size. Won't SSPACE reverse-complement contigs in this manner to make the more-abundant "PE" reads to fit the 4000bp "<-- -->" pattern? Code:
contig 1(RC) contig 2 (RC)
5`------------3` 5`------------3`
<- --> <-- ->
<- ->
---------- 4000bp ----------
Regards, Nestor Quote:
Last edited by user1313; 02-29-2012 at 06:24 AM. |
#133 |
|
Senior Member
Location: NL, Leiden Join Date: Feb 2010
Posts: 206
|
Yes, you are right, sorry. But this will only happen if both the contigs are short. Say the paired-end reads are mapped as follows:
Code:
contig 1 (1000bp) contig 2 (8000 bp)
5`------------>3` 5`----------------------------->3`
<- <-
pos900 pos100
Code:
contig 1 (1000bp) contig 2 (8000 bp)
5`------------3` 3`<-----------------------------5`
<- ->
pos900 pos7900
I agree, though, that if contig 2 were 4000bp shorter, the distance would be about 4000bp, close to the insert size of your library! This could be a problem, especially for contig orientation and insert size estimation (the real distance in the above example is not 4000bp but ~200bp: (1000-900) on contig 1 + pos100 on contig 2). Thanks for pointing this out, I'll try to dive deeper into this...
Regards,
Boetsie
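The arithmetic in this post can be checked with a short Python sketch. It is deliberately simplified and is not how SSPACE computes distances: `implied_distance` is a hypothetical helper that ignores read lengths and any gap between the contigs, and only adds the tail of contig 1 after the first read to the part of contig 2 before the second read.

```python
def implied_distance(len1, pos1, len2, pos2, flip_contig2=False):
    """Distance spanned by a read pair across two adjacent contigs.

    Simplified on purpose: read lengths and the gap between the contigs
    are ignored. Reverse-complementing contig 2 moves a read from
    position pos2 to position len2 - pos2.
    """
    tail_on_contig1 = len1 - pos1
    head_on_contig2 = (len2 - pos2) if flip_contig2 else pos2
    return tail_on_contig1 + head_on_contig2


# Contig 1 is 1000bp with a read at pos900; contig 2 has a read at pos100.
print(implied_distance(1000, 900, 8000, 100))                     # 200
print(implied_distance(1000, 900, 8000, 100, flip_contig2=True))  # 8000
print(implied_distance(1000, 900, 4000, 100, flip_contig2=True))  # 4000
```

With an 8000bp contig 2, the flipped arrangement implies a distance far outside a 3000-5000 range, so it would be rejected. With a 4000bp contig 2, the wrong orientation implies about 4000bp and would pass a distance check.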
#134 |
|
Member
Location: Gothenburg/Uppsala, Sweden Join Date: Oct 2010
Posts: 81
|
Is it possible to run SSPACE on external read mappings, i.e. can I perform the read mappings on my own and then have SSPACE do the scaffolding based on them?
#135 | |
|
Senior Member
Location: NL, Leiden Join Date: Feb 2010
Posts: 206
|
Quote:
<contig1> <startpos_on_contig1> <endpos_on_contig1> <contig2> <startpos_on_contig2> <endpos_on_contig2>
E.g.
contig1 100 150 contig1 350 300
contig1 4000 4050 contig2 110 60
There is a script in the 'tools' directory of the package to convert SAM/BAM to a tab format.
Regards,
Boetsie
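A minimal Python sketch for reading lines in the six-column layout shown in this post. The names `MappedRead` and `parse_pair` are made up for illustration, and reading start > end as "mapped on the reverse strand" is an assumption based on the two example lines, not something the post states.

```python
from typing import NamedTuple


class MappedRead(NamedTuple):
    contig: str
    start: int
    end: int

    @property
    def strand(self):
        # Assumption: start > end marks a read on the reverse strand.
        return "+" if self.start <= self.end else "-"


def parse_pair(line):
    """Parse one whitespace-separated line into two MappedRead records."""
    c1, s1, e1, c2, s2, e2 = line.split()
    return MappedRead(c1, int(s1), int(e1)), MappedRead(c2, int(s2), int(e2))


for line in ("contig1 100 150 contig1 350 300",
             "contig1 4000 4050 contig2 110 60"):
    first, second = parse_pair(line)
    same_contig = first.contig == second.contig
    print(first.strand, second.strand, "same contig" if same_contig else "links contigs")
```

Under that assumption, the first example line is a pair within one contig and the second one links contig1 to contig2.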
#136 |
|
Member
Location: Uppsala, Sweden Join Date: Apr 2010
Posts: 27
|
I am having problems using SSPACE Basic with my 454 paired-end data, and was hoping to get some help here. SSPACE runs fine using my Illumina PE data, but my 454 data has much longer insert sizes (3-5 kb), and I think they could really make a difference.
My problem is that SSPACE reads all the 454 pairs in, removes quite a lot of them as they include Ns, and then maps 0 of them. The report is below. It was difficult to get the reads into a format that SSPACE accepts, and I guess that the problem lies in the fastq files. Some (very few) reads are too long (over 1024 bases), and bowtie complains about these. Would this crash the whole run? I know that bowtie is not the best choice for longer reads, but I thought it would still manage to map some reads? Is SSPACE Premium the answer?
Any/all help would be much appreciated,
Henrik

READING READS Lib454:
------------------------------------------------------------
Total inserted pairs = 1217215
Number of pairs containing N's = 1066178
Remaining pairs = 151037
------------------------------------------------------------
...
LIBRARY Lib454 STATS:
################################################################################

MAPPING READS TO CONTIGS:
------------------------------------------------------------
Number of single reads found on contigs = 0
Number of pairs used for pairing contigs / total pairs = 0 / 0
------------------------------------------------------------

READ PAIRS STATS:
Assembled pairs: 0 (0 sequences)
Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 3709 +/-927.25): 0
Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 0
Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 0
---
Satisfied in distance/logic within a given contig pair (pre-scaffold): 0
Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 0
---
Total satisfied: 0 unsatisfied: 0

Estimated insert size statistics (based on 0 pairs):
Mean insert size = 0
Median insert size = 0

REPEATS:
Number of repeated edges = 0
------------------------------------------------------------
################################################################################
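One way to see how many pairs are affected by Ns or by over-long reads is to pre-screen the two fastq files before scaffolding. The Python sketch below is only a diagnostic aid under stated assumptions: the reads are in plain 4-line-per-record FASTQ, both files list mates in the same order, and the 1024-base limit is taken from the post above. The function names are made up, and the sketch does not reproduce what SSPACE or bowtie do internally.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def filter_pairs(fq1, fq2, max_len=1024):
    """Keep pairs where neither mate contains N or exceeds max_len bases."""
    kept, dropped = [], 0
    for rec1, rec2 in zip(read_fastq(fq1), read_fastq(fq2)):
        seqs = (rec1[1], rec2[1])
        if any("N" in s.upper() or len(s) > max_len for s in seqs):
            dropped += 1
        else:
            kept.append((rec1, rec2))
    return kept, dropped


# Tiny in-memory example; real use would pass open file handles instead.
mates1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nACNT\n+\nIIII\n")
mates2 = io.StringIO("@r1/2\nTTGA\n+\nIIII\n@r2/2\nGGCA\n+\nIIII\n")
kept, dropped = filter_pairs(mates1, mates2)
print(len(kept), dropped)  # 1 1
```

If almost every pair is dropped for containing N, as in the report above, the read conversion step is the first thing to inspect.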
#137 | |
|
Member
Location: Gothenburg/Uppsala, Sweden Join Date: Oct 2010
Posts: 81
|
Quote:
On a slightly related note, how well do you think SSPACE would deal with scaffolding information from sources other than paired/mate reads, such as physical/genetic linkage data (supplied in the above file format)? Some scaffolders (notably Bambus) claim to be able to work with essentially any kind of link information between contigs. Could the same be said of SSPACE?
#138 | |
|
Senior Member
Location: NL, Leiden Join Date: Feb 2010
Posts: 206
|
Quote:
Boetsie |
#139 | |
|
Senior Member
Location: NL, Leiden Join Date: Feb 2010
Posts: 206
|
Quote:
#140 |
|
Junior Member
Location: Croatia Join Date: May 2012
Posts: 2
|
Hi,
to save me the hassle of going through the code, I have a short question regarding insert sizes. When scaffolding, does SSPACE use the user-specified insert size (from the library.txt file), or the estimated insert size (that is reported in the summary file)? It is important, since in my case these two seem to differ, and I need the real (user-specified) value to be used.
Thank you,
Ivan.