#21 |
Member
Location: Gainesville Join Date: Dec 2012
Posts: 28
|
I was wondering if SPAdes can take 454 reads. I was thinking of using them in a hybrid assembly, perhaps treating the 454 reads as "pseudo-Sanger" reads.
Thanks |
#22 |
Member
Location: Saint Petersburg, Russia Join Date: Sep 2013
Posts: 25
|
Right. Basically, filtered subreads are the result of the "P_Filter" step, which can be performed in SMRT Portal (and filtered subreads are usually what one would obtain from a third-party sequencing provider).
|
#23 | |
Member
Location: Saint Petersburg, Russia Join Date: Sep 2013
Posts: 25
|
Quote:
You may also want to pre-correct them as IonTorrent reads (--iontorrent --only-error-correction), and try to provide corrected reads as an additional single read library. |
|
#24 |
Member
Location: Gainesville Join Date: Dec 2012
Posts: 28
|
I am trying to estimate the coverage of the contigs from the SPAdes output. In the FASTA files the headers look like this:
>NODE_1_length_251_cov_0.7_ID27771
>NODE2_length_10997_cov_41_ID335
>...
>...
>..
Are cov_0.7 and cov_41 the k-mer coverage of these contigs? I use these values to plot coverage, but they look very low in general. Maybe I am reading the wrong information. Where can I get the coverage info?
Thanks |
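For anyone who wants to pull the numbers out of headers like the ones above, here is a minimal Python sketch. It assumes the `cov_` field is a k-mer coverage (that is how the SPAdes manual describes contig names, but check the manual for your release). The conversion to per-base coverage uses the Velvet-style relation C = Ck * L / (L - k + 1), where the read length L and the k-mer size k are values you have to supply yourself. The function names are made up for this example.

```python
import re

# Matches headers such as ">NODE_1_length_251_cov_0.7_ID27771".
# The underscore after NODE is optional so that "NODE2_..." also parses.
HEADER_RE = re.compile(r"^>?NODE_?(\d+)_length_(\d+)_cov_(\d+(?:\.\d+)?)")


def parse_header(header):
    """Return (node_number, length, cov) from a SPAdes-style FASTA header."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError("unrecognised header: %r" % header)
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


def kmer_to_base_coverage(kmer_cov, read_len, k):
    """Velvet-style conversion; assumes kmer_cov was computed with k-mer size k."""
    if read_len < k:
        raise ValueError("read length must be at least k")
    return kmer_cov * read_len / (read_len - k + 1)


if __name__ == "__main__":
    for line in (">NODE_1_length_251_cov_0.7_ID27771",
                 ">NODE2_length_10997_cov_41_ID335"):
        node, length, cov = parse_header(line)
        # read_len=100 and k=55 are placeholder values, not taken from the thread
        print(node, length, cov, kmer_to_base_coverage(cov, read_len=100, k=55))
```

Because L / (L - k + 1) is always at least 1, the k-mer coverage in the header can never exceed the per-base coverage, which may be part of why the plotted values look low.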
#25 | |
Member
Location: Saint Petersburg, Russia Join Date: Sep 2013
Posts: 25
|
Quote:
|
|
#26 |
Member
Location: Gainesville Join Date: Dec 2012
Posts: 28
|
Thanks for your previous answer.
Just curious now about how SPAdes handles the long PacBio reads. After these reads are corrected with the Illumina reads, are the long PacBio reads "chopped" into k-mers too, or are they used only for a later alignment? (I am trying to follow the log file.)
The second question is about two warnings in my log file. How critical are these, and is there a way to fix them? The assembly looks good, but I got this at the tail of the file:
===== Mismatch correction finished.
 * Corrected reads are in /scratch/lfs/ascunce/Spades/spadesPacBio_output/corrected/
 * Assembled contigs are in /scratch/lfs/ascunce/Spades/spadesPacBio_output/contigs.fasta (contigs.fastg)
 * Assembled scaffolds are in /scratch/lfs/ascunce/Spades/spadesPacBio_output/scaffolds.fasta (scaffolds.fastg)
======= SPAdes pipeline finished WITH WARNINGS!
=== Error correction and assembling warnings:
 * 2:36:44.887 5G / 5G WARN General (kmer_coverage_model.cpp : 359) Failed to determine erroneous kmer threshold. Threshold set to: 7
 * 2:17:39.519 8G / 9G WARN General (kmer_coverage_model.cpp : 359) Failed to determine erroneous kmer threshold. Threshold set to: 104
======= Warnings saved to /scratch/lfs/ascunce/Spades/spadesPacBio_output/warnings.log
Thanks for your help. |
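If you want to collect warnings like these from several runs, a small parser can help. The sketch below is modelled only on the two WARN lines quoted above. Other SPAdes versions may format their log lines differently, so treat the regular expression as an assumption. The function name and the dictionary keys are made up for this example.

```python
import re

# Pattern based solely on the two WARN lines quoted in the post above.
WARN_RE = re.compile(
    r"(?P<time>\d+:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<mem>\S+\s*/\s*\S+)\s+WARN\s+"
    r"(?P<module>\S+)\s+"
    r"\((?P<source>[^)]*)\)\s+"
    r"(?P<message>.*)"
)
THRESHOLD_RE = re.compile(r"Threshold set to:\s*(\d+)")

SAMPLE_LOG = """\
=== Error correction and assembling warnings:
 * 2:36:44.887 5G / 5G WARN General (kmer_coverage_model.cpp : 359) Failed to determine erroneous kmer threshold. Threshold set to: 7
 * 2:17:39.519 8G / 9G WARN General (kmer_coverage_model.cpp : 359) Failed to determine erroneous kmer threshold. Threshold set to: 104
"""


def extract_warnings(log_text):
    """Return one dict per WARN line; 'threshold' is None when not reported."""
    found = []
    for line in log_text.splitlines():
        match = WARN_RE.search(line)
        if match is None:
            continue
        record = match.groupdict()
        threshold = THRESHOLD_RE.search(record["message"])
        record["threshold"] = int(threshold.group(1)) if threshold else None
        found.append(record)
    return found


if __name__ == "__main__":
    for warning in extract_warnings(SAMPLE_LOG):
        print(warning["time"], warning["source"], warning["threshold"])
```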
#27 | ||
Member
Location: Saint Petersburg, Russia Join Date: Sep 2013
Posts: 25
|
Quote:
Quote:
|
||
#28 |
Member
Location: NYC Join Date: Aug 2010
Posts: 48
|
re: attempting SPAdes with 454 reads -- if I understand correctly:
1) SPAdes does not natively support 454 reads
2) 454 reads resemble Ion Torrent reads (similar technology), but SPAdes will not do a hybrid Illumina/Ion Torrent assembly (though the IonHammer corrector could be used on 454 reads)
3) therefore, to attempt a 454/Illumina hybrid assembly, one must try treating 454 reads as Sanger
4) but Sanger reads do not have 'paired end' modes, so paired end* 454 reads will be treated as single-end (i.e., all paired-ness/insert size info is lost)
Is that all correct?
*Roche calls them 'paired end' but they are mate-pair. |
#29 | |
Member
Location: Saint Petersburg, Russia Join Date: Sep 2013
Posts: 25
|
Quote:
1. Correct your Illumina reads using --only-error-correction mode
2. Correct your 454 reads using --only-error-correction --iontorrent mode (make sure you're using the latest SPAdes release - it does support proper error correction of paired IonTorrent data)
3. Provide the corrected reads from 1. and 2. and assemble everything using the --only-assembler option (your 454 reads should go as mate pairs, yes).
However, since your 454 data is likely of low coverage, you can simply try to feed them as Illumina mate pairs. |
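The three steps above can be sketched as a small Python helper that only builds the command lines and does not run anything. It uses only options that appear in this thread (`--only-error-correction`, `--iontorrent`, `--only-assembler`, `--dataset`, `-o`). The YAML file names and output directories are hypothetical placeholders. The combined YAML for step 3 is assumed to point at the corrected reads written by steps 1 and 2.

```python
import shlex


def spades_cmd(dataset_yaml, outdir, *mode_flags):
    """Build one spades.py argument list (nothing is executed here)."""
    return ["spades.py", *mode_flags, "--dataset", dataset_yaml, "-o", outdir]


def hybrid_plan(illumina_yaml, reads454_yaml, combined_yaml, workdir):
    """Return the three commands: correct Illumina, correct 454, then assemble."""
    return [
        # Step 1: error-correct the Illumina reads only
        spades_cmd(illumina_yaml, workdir + "/illumina_corrected",
                   "--only-error-correction"),
        # Step 2: error-correct the 454 reads with the IonTorrent corrector
        spades_cmd(reads454_yaml, workdir + "/454_corrected",
                   "--only-error-correction", "--iontorrent"),
        # Step 3: assemble the corrected reads without re-running correction
        spades_cmd(combined_yaml, workdir + "/assembly", "--only-assembler"),
    ]


if __name__ == "__main__":
    # All file names below are placeholders for illustration.
    plan = hybrid_plan("illumina.yaml", "454.yaml", "corrected_all.yaml", "hybrid_run")
    for argv in plan:
        print(shlex.join(argv))
```

Printing the commands first makes it easy to check them by eye before pasting them into a shell or a job script.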
|
#30 | |
Member
Location: NYC Join Date: Aug 2010
Posts: 48
|
Quote:
...and oriented rf (reverse-forward) if they are to be interpreted as Illumina mate pairs? (yes, they are low coverage) Last edited by ssully; 12-01-2014 at 03:40 PM. |
|
#31 | |
Member
Location: Saint Petersburg, Russia Join Date: Sep 2013
Posts: 25
|
Quote:
|
|
#32 |
Member
Location: NYC Join Date: Aug 2010
Posts: 48
|
I have removed the linkers and split the 454 mate pair reads with sff_extract; I have them now as (after deinterlacing) a pair of fastq files (454_1.fastq and 454_2.fastq) containing reads _1 and _2 only, respectively. In each case Read_1 represents the pre-linker and Read_2 represents the post-linker part of the original read, both in forward orientation:
schematic of original read
Code:
================================^^^^^^^^^^^^^^^=======================
454_1--->                           linker     454_2--->
because when assembled, they should be ordered _2 --> _1 (again both in forward i.e., 5'--3' orientation), with the library insert size distance between them
schematic of assembled reads
Code:
454_2                                                    454_1
-------->                     (~3kb)                     -------->
==================================================================
would a YAML readset section like this work?
{
  orientation: "ff",
  type: "mate-pairs",
  right reads: [
    "/FULL_PATH_TO_DATASET/454_1.fastq"
  ],
  left reads: [
    "/FULL_PATH_TO_DATASET/454_2.fastq"
  ]
},
or should it be
{
  orientation: "ff",
  type: "mate-pairs",
  right reads: [
    "/FULL_PATH_TO_DATASET/454_2.fastq"
  ],
  left reads: [
    "/FULL_PATH_TO_DATASET/454_1.fastq"
  ]
},
?
(I adapted these views from http://seqanswers.com/forums/showpos...85&postcount=2 )
Last edited by ssully; 12-02-2014 at 07:10 PM. |
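Since the thread does not settle which file belongs under 'left reads' and which under 'right reads', one option is to generate both variants and run each. The Python sketch below writes the same flow-style text as the readset sections quoted above. It wraps the entry in a top-level list, which is how the SPAdes manual lays out dataset files as far as I recall, so treat that as an assumption. The function names are hypothetical, and the `/FULL_PATH_TO_DATASET` placeholder is kept as in the post.

```python
def readset_entry(left_reads, right_reads, orientation="ff", lib_type="mate-pairs"):
    """Format one library entry in the same style as the post above."""
    def quoted(paths):
        return ", ".join('"%s"' % p for p in paths)

    lines = [
        "  {",
        '    orientation: "%s",' % orientation,
        '    type: "%s",' % lib_type,
        "    right reads: [ %s ]," % quoted(right_reads),
        "    left reads: [ %s ]" % quoted(left_reads),
        "  }",
    ]
    return "\n".join(lines)


def dataset_text(entries):
    """Wrap library entries in a top-level list (assumed dataset file layout)."""
    return "[\n" + ",\n".join(entries) + "\n]\n"


PREFIX = "/FULL_PATH_TO_DATASET"  # placeholder from the post, not a real path
READ_1 = PREFIX + "/454_1.fastq"
READ_2 = PREFIX + "/454_2.fastq"

# Variant A: 454_1 as right reads, 454_2 as left reads (first form in the post)
variant_a = dataset_text([readset_entry(left_reads=[READ_2], right_reads=[READ_1])])
# Variant B: the two files swapped (second form in the post)
variant_b = dataset_text([readset_entry(left_reads=[READ_1], right_reads=[READ_2])])

if __name__ == "__main__":
    print(variant_a)
    print(variant_b)
```

Writing each variant to its own file and comparing the insert size that SPAdes reports for the two runs is one way to see which ordering it interprets as intended.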
#33 | |
Member
Location: Saint Petersburg, Russia Join Date: Sep 2013
Posts: 25
|
Quote:
Anyway, you can simply feed the data to SPAdes and check whether it inferred the insert size distribution properly. |
|
#34 |
Member
Location: NYC Join Date: Aug 2010
Posts: 48
|
I don't know; the second variant seems to me to be saying, 'the reads from the right side of the library read (post-linker, 454_2.fastq) belong at the right end of the genome fragment' -- which would be incorrect.
For me it really comes down to what 'right reads' and 'left reads' mean in the YAML specification: e.g., does 'right reads' refer to a read's position in the 454 mate pair library read (i.e., right side/post-linker in the 454 read, but maps to the left end of the genomic fragment) or to its position with respect to the genome (i.e., maps to the right end of the genomic fragment...but comes from the left side/pre-linker half of the 454 read)?
(It's also unusual to me that 'right read' is specified before 'left read' in the YAML, for both paired end and mate pair types, given that sequences are typically read by humans from left to right, 5' to 3'... is there a particular reason for that?)
But anyway I can try inputting it both ways, in two runs, and see which one assembles the 454 mate pairs correctly.
Last edited by ssully; 12-03-2014 at 01:12 PM. |
#35 |
Member
Location: NYC Join Date: Aug 2010
Posts: 48
|
I worked out the correct orientation and order of 454 paired reads input for SPAdes, and have corrected the reads with the --iontorrent option (ionhammer). But now I have questions regarding ionhammer error correction -- does it pay any attention to fastq quality scores?
Here is an original paired-end sff read (converted to fastq -- note 'sanger style' quality scores, and lower case for low-quality bases). I have underlined the part that constitutes the 'post linker' read.
sff to fastq
Code:
@GIDY76W02G4JWL
tcagTTATTGATCAGTATTAGAATGAGGCCTATTAATAGCCAATTATCACATTTTGGATCTATTTTGTATCGATGATATCATTTATCGATAATCATCATAGTTATTTCGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGATTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTGGTTTTAAGctgagactgccaaggcacacaggggataggn
+
III;;;;BIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII;:8599>>@:9////92EBEDDDGIIIIIIFEC?:??IIHHHEIIIIIIIIIICCECC:??C==?EEEIEGHHIIIIIIIIGHFHGIIIIC?==CIIIIEEAAAEE>8333C444IIICIIIIGGGGGIIIGGGGIIIIIIIIIIIIA>999=499----./25:===;=@A>>::::EEIIII@@BAGGGII!
sffToCA
Code:
@GIDY76W02G4JWLb clr=0,95 clv=1,0 max=1,0 tnt=1,0 rnd=t
TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTGGTTTTAAGCTGAGACTGCCAAGGCACACAGGGGATAGG
+
IEEAAAEE>8333C444IIICIIIIGGGGGIIIGGGGIIIIIIIIIIIIA>999=499----./25:===;=@A>>::::EEIIII@@BAGGGII
here was my spades command
Code:
spades.py --only-error-correction --iontorrent --dataset 454_4.yaml -t 8 --sc -k 21,33,55 --disable-gzip-output -o sff2ca_spades_corrected
and here is the output of ionhammer for the above read
Code:
>GIDY76W02G4JWLb
TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTGTTTTAAGCTGAGACTGCCAAGGCACACAGGGGATAGG
So, I'm not clear on what ionhammer should be doing; it appears I need to quality-trim my 454 reads *before* running them through ionhammer...*OR* I need to preserve the lower-case base formatting in the input file?
Last edited by ssully; 12-06-2014 at 07:50 AM. |
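To see exactly what the corrector changed in this read, the sffToCA input and the ionhammer output can be compared directly. The sketch below uses Python's standard `difflib` for this. It is a generic string diff and not a description of how IonHammer works internally. The two sequences are copied from the post above, and the function name is made up for this example.

```python
import difflib

# Sequences copied from the post above (sffToCA read and ionhammer output).
ORIGINAL = ("TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTGG"
            "TTTTAAGCTGAGACTGCCAAGGCACACAGGGGATAGG")
CORRECTED = ("TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTG"
             "TTTTAAGCTGAGACTGCCAAGGCACACAGGGGATAGG")


def list_edits(before, after):
    """Return (operation, 0-based position in 'before', removed, inserted) tuples."""
    matcher = difflib.SequenceMatcher(None, before, after, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((tag, i1, before[i1:i2], after[j1:j2]))
    return edits


if __name__ == "__main__":
    print(len(ORIGINAL), len(CORRECTED))
    for edit in list_edits(ORIGINAL, CORRECTED):
        print(edit)
```

For this read, the two sequences differ by a single base: one G is dropped from the GG run in ...TTTCTGGTTTT..., and everything else, including the 3' end, is left as it was.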
#36 |
Member
Location: Saint Petersburg, Russia Join Date: Sep 2013
Posts: 25
|
This is more or less expected. IonHammer is conservative - when it fails to correct something, it preserves the original read and postpones the final decision to the assembler. In general, we suggest not trimming reads when the coverage is low.
|