SEQanswers

Old 04-15-2014, 03:46 PM   #21
joxcargator73
Member
 
Location: Gainesville

Join Date: Dec 2012
Posts: 28
Default One more question for 454 reads

I was wondering whether SPAdes can take 454 reads. I was thinking of using them in a hybrid assembly, perhaps treating the 454 reads as "pseudo-Sanger" reads.

Thanks
Old 04-16-2014, 03:06 AM   #22
akorobeynikov
Member
 
Location: Saint Petersburg, Russia

Join Date: Sep 2013
Posts: 25
Default

Quote:
Originally Posted by Brian Bushnell View Post
Filtered subreads are broken into pieces at adapters. The raw reads would potentially have the same sequence multiple times: forward, adapter, reverse, adapter, etc.
Right. Basically, filtered subreads are the result of the "P_Filter" step, which can be performed in SMRT Portal (and filtered subreads are usually what one obtains from a third-party sequencing provider).
Old 04-16-2014, 03:08 AM   #23
akorobeynikov
Member
 
Location: Saint Petersburg, Russia

Join Date: Sep 2013
Posts: 25
Default

Quote:
Originally Posted by joxcargator73 View Post
I was wondering if spades can take 454 reads. I was thinking to use them in a hybrid assembly and perhaps "use" the 454 reads as a "pseudo sanger reads"
Thanks
This may be non-trivial. But probably yes - try to provide them as "sanger" reads.

You may also want to pre-correct them as IonTorrent reads (--iontorrent --only-error-correction), and try to provide corrected reads as an additional single read library.
Old 05-19-2014, 09:46 AM   #24
joxcargator73
Member
 
Location: Gainesville

Join Date: Dec 2012
Posts: 28
Default Coverage

I am trying to estimate the coverage of the contigs from the SPAdes output. In the FASTA files the headers have the form:
>NODE_1_length_251_cov_0.7_ID27771
>NODE_2_length_10997_cov_41_ID335
>...
>...
>..

Are cov_0.7 and cov_41 the k-mer coverage of these contigs? I use these values to plot coverage, but they look very low in general. Maybe I am reading the wrong information.
Where can I get the coverage info?
Thanks
Old 05-19-2014, 02:23 PM   #25
akorobeynikov
Member
 
Location: Saint Petersburg, Russia

Join Date: Sep 2013
Posts: 25
Default

Quote:
Originally Posted by joxcargator73 View Post
I am trying to estimate the coverage of the contigs from the SPAdes output. In the FASTA files the headers have the form:
>NODE_1_length_251_cov_0.7_ID27771
>NODE_2_length_10997_cov_41_ID335
>...
>...
>..

Are cov_0.7 and cov_41 the k-mer coverage of these contigs? I use these values to plot coverage, but they look very low in general. Maybe I am reading the wrong information.
Where can I get the coverage info?
Thanks
The reported values are the average k-mer coverage of each contig. For the k-mer length, use the k of the last k-mer iteration.
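
As a rough illustration of why these numbers look lower than read coverage, the sketch below parses a contig header and applies the usual relation between k-mer coverage and per-base coverage, C = C_k * L / (L - k + 1). The function names are hypothetical (not part of SPAdes), and the read length and k in the usage lines are placeholder values; take the real ones from your own data and spades.log.

```python
import re

# Matches SPAdes-style contig headers such as
# >NODE_1_length_251_cov_0.7_ID27771 (underscore after NODE is optional here).
HEADER_RE = re.compile(r"^>?NODE_?(\d+)_length_(\d+)_cov_([0-9.]+)")


def parse_header(header):
    """Return (node_id, contig_length, kmer_coverage) from a contig header."""
    m = HEADER_RE.match(header.strip())
    if m is None:
        raise ValueError("unrecognized header: %r" % header)
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


def base_coverage(kmer_cov, read_len, k):
    """Approximate per-base coverage from average k-mer coverage.

    Assumes reads of uniform length read_len (an assumption, supply your own)
    and that k is the k-mer size of the last assembly iteration.
    """
    if read_len < k:
        raise ValueError("read_len must be at least k")
    return kmer_cov * read_len / (read_len - k + 1)


# read_len=250 and k=55 are illustrative values only.
node, length, kcov = parse_header(">NODE_1_length_251_cov_0.7_ID27771")
print(node, length, base_coverage(kcov, read_len=250, k=55))
```

With long k relative to the read length, the correction factor L / (L - k + 1) grows, so k-mer coverage can sit well below the per-base figure.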
Old 05-21-2014, 07:07 AM   #26
joxcargator73
Member
 
Location: Gainesville

Join Date: Dec 2012
Posts: 28
Default Pacbio in Spades

Thanks for your previous answer.
Just curious now about how SPAdes handles the long PacBio reads. After these reads are corrected by Illumina reads, are the long PacBio reads "chopped" into k-mers too, or are these reads used only for a later alignment? (I am trying to follow the log file.)


The second question is about two warnings in my log file. How critical are these, and is there a way to fix them?

The assembly looks good but I got this at the tail of the file:

===== Mismatch correction finished.

* Corrected reads are in /scratch/lfs/ascunce/Spades/spadesPacBio_output/corrected/
* Assembled contigs are in /scratch/lfs/ascunce/Spades/spadesPacBio_output/contigs.fasta (contigs.fastg)
* Assembled scaffolds are in /scratch/lfs/ascunce/Spades/spadesPacBio_output/scaffolds.fasta (scaffolds.fastg)

======= SPAdes pipeline finished WITH WARNINGS!

=== Error correction and assembling warnings:
* 2:36:44.887 5G / 5G WARN General (kmer_coverage_model.cpp : 359) Failed to determine erroneous kmer threshold. Threshold set to: 7
* 2:17:39.519 8G / 9G WARN General (kmer_coverage_model.cpp : 359) Failed to determine erroneous kmer threshold. Threshold set to: 104
======= Warnings saved to /scratch/lfs/ascunce/Spades/spadesPacBio_output/warnings.log

Thanks for your help.
Old 05-23-2014, 12:17 PM   #27
akorobeynikov
Member
 
Location: Saint Petersburg, Russia

Join Date: Sep 2013
Posts: 25
Default

Quote:
Originally Posted by joxcargator73 View Post
Thanks for your previous answer.
Just curious now about how SPAdes handles the long PacBio reads. After these reads are corrected by Illumina reads, are the long PacBio reads "chopped" into k-mers too, or are these reads used only for a later alignment? (I am trying to follow the log file.)
SPAdes uses PacBio reads for repeat resolution. So, it uses original uncorrected reads.


Quote:
The second question is about two warnings in my log file. How critical are these, and is there a way to fix them?

The assembly looks good but I got this at the tail of the file:

===== Mismatch correction finished.

* Corrected reads are in /scratch/lfs/ascunce/Spades/spadesPacBio_output/corrected/
* Assembled contigs are in /scratch/lfs/ascunce/Spades/spadesPacBio_output/contigs.fasta (contigs.fastg)
* Assembled scaffolds are in /scratch/lfs/ascunce/Spades/spadesPacBio_output/scaffolds.fasta (scaffolds.fastg)

======= SPAdes pipeline finished WITH WARNINGS!

=== Error correction and assembling warnings:
* 2:36:44.887 5G / 5G WARN General (kmer_coverage_model.cpp : 359) Failed to determine erroneous kmer threshold. Threshold set to: 7
* 2:17:39.519 8G / 9G WARN General (kmer_coverage_model.cpp : 359) Failed to determine erroneous kmer threshold. Threshold set to: 104
======= Warnings saved to /scratch/lfs/ascunce/Spades/spadesPacBio_output/warnings.log

Thanks for your help.
Usually such warnings indicate that you have quite uneven coverage and are trying to assemble in multi-cell mode. Please email us (at SPAdes support) your spades.log, so we can check whether this is indeed the case.
Old 12-01-2014, 10:54 AM   #28
ssully
Member
 
Location: NYC

Join Date: Aug 2010
Posts: 48
Default

re: attempting SPAdes with 454 reads -- if I understand correctly:

1) SPAdes does not natively support 454 reads

2) 454 reads resemble Ion Torrent reads (similar technology) but SPAdes will not do a hybrid Illumina/Ion Torrent assembly (though the IonHammer corrector could be used on 454 reads)

3) therefore to attempt 454/Illumina hybrid assembly, one must try treating 454 reads as Sanger

4) but Sanger reads do not have 'paired end' modes, so paired end* 454 reads will be treated as single-end (i.e., all paired-ness/insert size info is lost)

Is that all correct?




*Roche calls them 'paired end' but they are mate-pair.
Old 12-01-2014, 10:59 AM   #29
akorobeynikov
Member
 
Location: Saint Petersburg, Russia

Join Date: Sep 2013
Posts: 25
Default

Quote:
Originally Posted by ssully View Post
re: attempting SPAdes with 454 reads -- if I understand correctly:

1) SPAdes does not natively support 454 reads

2) 454 reads resemble Ion Torrent reads (similar technology) but SPAdes will not do a hybrid Illumina/Ion Torrent assembly (though the IonHammer corrector could be used on 454 reads)

3) therefore to attempt 454/Illumina hybrid assembly, one must try treating 454 reads as Sanger

4) but Sanger reads do not have 'paired end' modes, so paired end* 454 reads will be treated as single-end (i.e., all paired-ness/insert size info is lost)

Is that all correct?

*Roche calls them 'paired end' but they are mate-pair.
You're missing the 5th possibility which is actually the proper choice here. Basically:

1. Correct your Illumina reads using --only-error-correction mode
2. Correct your 454 reads using --only-error-correction --iontorrent mode (make sure you're using the latest SPAdes release - it does support proper error correction of paired IonTorrent data)
3. Provide corrected reads from 1. and 2. and assemble everything using --only-assembler option (your 454 reads should go as mate pairs, yes).

However, since your 454 data is likely of low coverage, then you can simply try to feed them as Illumina mate pairs.
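
The three steps above could be scripted roughly as follows. This is only a sketch: it builds the command lines without running them, the function name and all paths are hypothetical, and passing the split 454 pairs to the correction step via --pe1-1/--pe1-2 is an assumption rather than something confirmed in this thread. The corrected-read paths must be taken from each run's corrected/ directory, and the mate-pair orientation has to match your own data.

```python
import shlex


def build_hybrid_commands(illumina_1, illumina_2, reads454_1, reads454_2,
                          corrected, mp_orientation, out_dir):
    """Return the three spades.py command lines as lists of arguments.

    corrected is a dict with the (user-supplied) paths of the corrected
    reads: keys "illumina_1", "illumina_2", "454_1", "454_2".
    """
    if mp_orientation not in ("fr", "rf", "ff"):
        raise ValueError("orientation must be fr, rf or ff")

    # Step 1: error-correct the Illumina pairs only.
    step1 = ["spades.py", "--only-error-correction",
             "--pe1-1", illumina_1, "--pe1-2", illumina_2,
             "-o", out_dir + "/illumina_ec"]

    # Step 2: error-correct the split 454 pairs in IonTorrent mode.
    # (Supplying them as a paired library here is an assumption.)
    step2 = ["spades.py", "--only-error-correction", "--iontorrent",
             "--pe1-1", reads454_1, "--pe1-2", reads454_2,
             "-o", out_dir + "/454_ec"]

    # Step 3: assemble the corrected reads, 454 pairs as mate pairs.
    step3 = ["spades.py", "--only-assembler",
             "--pe1-1", corrected["illumina_1"],
             "--pe1-2", corrected["illumina_2"],
             "--mp1-1", corrected["454_1"],
             "--mp1-2", corrected["454_2"],
             "--mp1-" + mp_orientation,
             "-o", out_dir + "/assembly"]
    return [step1, step2, step3]


# Placeholder paths for illustration only.
example = build_hybrid_commands(
    "ill_1.fastq", "ill_2.fastq", "454_1.fastq", "454_2.fastq",
    {"illumina_1": "ill_1.cor.fastq", "illumina_2": "ill_2.cor.fastq",
     "454_1": "454_1.cor.fastq", "454_2": "454_2.cor.fastq"},
    mp_orientation="ff", out_dir="hybrid_run")
for cmd in example:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```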
Old 12-01-2014, 03:38 PM   #30
ssully
Member
 
Location: NYC

Join Date: Aug 2010
Posts: 48
Default

Quote:
Originally Posted by akorobeynikov View Post
You're missing the 5th possibility which is actually the proper choice here. Basically:

1. Correct your Illumina reads using --only-error-correction mode
2. Correct your 454 reads using --only-error-correction --iontorrent mode (make sure you're using the latest SPAdes release - it does support proper error correction of paired IonTorrent data)
3. Provide corrected reads from 1. and 2. and assemble everything using --only-assembler option (your 454 reads should go as mate pairs, yes).

However, since your 454 data is likely of low coverage, then you can simply try to feed them as Illumina mate pairs.
But 454 paired end reads are two 'end' reads connected by a linker sequence. Does the IonHammer corrector actually recognize those and split the reads before correcting? Or do the 454 PE reads have to first be split into left/right by linker removal, then run through --only-error-correction?

...and oriented rf (reverse-forward) if they are to be interpreted as Illumina mate pairs? (yes, they are low coverage)

Last edited by ssully; 12-01-2014 at 03:40 PM.
Old 12-01-2014, 11:44 PM   #31
akorobeynikov
Member
 
Location: Saint Petersburg, Russia

Join Date: Sep 2013
Posts: 25
Default

Quote:
Originally Posted by ssully View Post
But 454 paired end reads are two 'end' reads connected by a linker sequence. Does the IonHammer corrector actually recognize those and split the reads before correcting? Or do the 454 PE reads have to first be split into left/right by linker removal, then run through --only-error-correction?

...and oriented rf (reverse-forward) if they are to be interpreted as Illumina mate pairs? (yes, they are low coverage)
You need to split them beforehand, yes. Make sure you specify the correct library type (mate pairs) and the correct orientation (whatever you have; even ff is supported). See http://spades.bioinf.spbau.ru/releas...al.html#sec3.2 for more information.
Old 12-02-2014, 06:43 PM   #32
ssully
Member
 
Location: NYC

Join Date: Aug 2010
Posts: 48
Default

I have removed the linkers and split the 454 mate pair reads with sff_extract; I have them now as (after deinterlacing) a pair of fastq files (454_1.fastq and 454_2.fastq) containing reads _1 and _2 only, respectively. In each case Read_1 represents the pre-linker and Read_2 represents the post-linker part of the original read, both in forward orientation:

schematic of original read

Code:
================================^^^^^^^^^^^^^^^=======================
454_1--->                             linker    454_2--->
But I'm a bit confused as to what mp parameters to feed to SPAdes for 454 mate pair reads,
because when assembled, they should be ordered _2 --> _1 (again both in forward i.e., 5'--3' orientation), with the library insert size distance between them

schematic of assembled reads

Code:
454_2                                                   454_1
-------->                  (~3kb)                        -------->
==================================================================
How do I make sure SPAdes assembles these pairs in the correct order and orientation?

Would a YAML readset section like this work?

{
  orientation: "ff",
  type: "mate-pairs",
  right reads: [
    "/FULL_PATH_TO_DATASET/454_1.fastq"
  ],
  left reads: [
    "/FULL_PATH_TO_DATASET/454_2.fastq"
  ]
},


or should it be


{
  orientation: "ff",
  type: "mate-pairs",
  right reads: [
    "/FULL_PATH_TO_DATASET/454_2.fastq"
  ],
  left reads: [
    "/FULL_PATH_TO_DATASET/454_1.fastq"
  ]
},

?



(I adapted these views from http://seqanswers.com/forums/showpos...85&postcount=2 )

Last edited by ssully; 12-02-2014 at 07:10 PM.
Old 12-03-2014, 12:04 AM   #33
akorobeynikov
Member
 
Location: Saint Petersburg, Russia

Join Date: Sep 2013
Posts: 25
Default

The second variant looks correct to me, basically you need to specify the first and the second read of a fragment and how they were read (in which direction).

Anyway, you can simply feed the data to SPAdes and check whether it inferred the insert size distribution properly.
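
For anyone who prefers to generate the dataset file rather than type it, here is a minimal sketch of the second variant (454_1 as left reads, 454_2 as right reads, orientation ff). The function names are hypothetical and not part of SPAdes. The file is written as JSON, which a YAML parser will normally read; that SPAdes accepts this quoting style is an assumption worth checking on a small run.

```python
import json


def mate_pair_dataset(first_reads, second_reads, orientation="ff"):
    """Describe one mate-pair library: first read of each fragment as
    'left reads', second read as 'right reads'."""
    if orientation not in ("fr", "rf", "ff"):
        raise ValueError("orientation must be fr, rf or ff")
    return [{
        "orientation": orientation,
        "type": "mate-pairs",
        "left reads": [first_reads],
        "right reads": [second_reads],
    }]


def write_dataset(path, libraries):
    """Write the library list to path in JSON (YAML-compatible) form."""
    with open(path, "w") as handle:
        json.dump(libraries, handle, indent=2)


# Placeholder paths, matching the ones used in the question.
libraries = mate_pair_dataset("/FULL_PATH_TO_DATASET/454_1.fastq",
                              "/FULL_PATH_TO_DATASET/454_2.fastq")
print(json.dumps(libraries, indent=2))
```

After the run, the insert size that SPAdes estimated for this library is the quickest sanity check that the order and orientation were right.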
Old 12-03-2014, 08:52 AM   #34
ssully
Member
 
Location: NYC

Join Date: Aug 2010
Posts: 48
Default

I don't know; the second variant seems to me to be saying, 'the reads from the right side of the library read (post-linker, 454_2.fastq) belong at the right end of the genome fragment' -- which would be incorrect.

For me it really comes down to what 'right reads' and 'left reads' means in the YAML specification:

e.g. does 'right reads' refer to a read's position in the 454 mate pair library read (i.e., right side/post-linker in the 454 read, but maps to the left end of the genomic fragment) or with respect to the genome (i.e., maps to the right end of the genomic fragment...but comes from the left side/pre-linker half of the 454 read)


(it's also unusual to me that 'right read' is specified before 'left read' in the YAML, for both paired end and mate pair types, given that sequences are typically read by humans from left to right, 5' to 3'... is there a particular reason for that?)

But anyway I can try inputting it both ways, in two runs, and see which one assembles the 454 mate pairs correctly.

Last edited by ssully; 12-03-2014 at 01:12 PM.
Old 12-06-2014, 07:38 AM   #35
ssully
Member
 
Location: NYC

Join Date: Aug 2010
Posts: 48
Default

I worked out the correct orientation and order of 454 paired reads input for SPAdes, and have corrected the reads with the --iontorrent option (ionhammer). But now I have questions regarding ionhammer error correction -- does it pay any attention to fastq quality scores?


Here is an original paired-end sff read (converted to fastq -- note the 'sanger style' quality scores, and lower case for low-quality bases). I have underlined the part that constitutes the 'post linker' read.

sff to fastq
Code:
@GIDY76W02G4JWL
tcagTTATTGATCAGTATTAGAATGAGGCCTATTAATAGCCAATTATCACATTTTGGATCTATTTTGTATCGATGATATCATTTATCGATAATCATCATAGTTATTTCGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGATTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTGGTTTTAAGctgagactgccaaggcacacaggggataggn
+
III;;;;BIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII;:8599>>@:9////92EBEDDDGIIIIIIFEC?:??IIHHHEIIIIIIIIIICCECC:??C==?EEEIEGHHIIIIIIIIGHFHGIIIIC?==CIIIIEEAAAEE>8333C444IIICIIIIGGGGGIIIGGGGIIIIIIIIIIIIA>999=499----./25:===;=@A>>::::EEIIII@@BAGGGII!
here is the postlinker read, after sffToCA (a tool from Celera Assembler) has removed the linker from the original sff read and split it into two reads (parameters were set to perform NO quality trimming, since I expected ionhammer to do that -- so all the 'low quality' bases remain at the end of the read, but are converted to upper case. Fastq scores remain the same):

sffToCA
Code:
@GIDY76W02G4JWLb clr=0,95 clv=1,0 max=1,0 tnt=1,0 rnd=t
TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTGGTTTTAAGCTGAGACTGCCAAGGCACACAGGGGATAGG
+
IEEAAAEE>8333C444IIICIIIIGGGGGIIIGGGGIIIIIIIIIIIIA>999=499----./25:===;=@A>>::::EEIIII@@BAGGGII

here was my spades command
Code:
spades.py --only-error-correction --iontorrent --dataset 454_4.yaml -t 8 --sc -k 21,33,55  --disable-gzip-output -o sff2ca_spades_corrected

and here is the output of ionhammer for the above read
Code:
>GIDY76W02G4JWLb
TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTGTTTTAAGCTGAGACTGCCAAGGCACACAGGGGATAGG
The only difference is the removal of a single G base (at the underlined position) in the middle of the read (not even as part of a homopolymer)...all of the low-quality (originally lower case) bases remain.

So, I'm not clear on what ionhammer should be doing; it appears I need to quality-trim my 454 reads *before* running them through ionhammer...*OR* I need to preserve the lower-case base formatting in the input file?

Last edited by ssully; 12-06-2014 at 07:50 AM.
Old 12-08-2014, 11:45 AM   #36
akorobeynikov
Member
 
Location: Saint Petersburg, Russia

Join Date: Sep 2013
Posts: 25
Default

Quote:
Originally Posted by ssully View Post
So, I'm not clear on what ionhammer should be doing; it appears I need to quality-trim my 454 reads *before* running them through ionhammer...*OR* I need to preserve the lower-case base formatting in the input file?
This is more or less expected. IonHammer is conservative: when it fails to correct something, it preserves the original read and defers the final decision to the assembler. In general, we suggest not trimming reads when the coverage is low.
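
To check what the corrector actually changed in a given read, a small diff of the input and output sequences is often enough. The sketch below uses Python's difflib on the two sequences posted above; the function name is hypothetical and the approach is just one convenient way to compare reads, not anything IonHammer itself provides.

```python
from difflib import SequenceMatcher


def read_edits(original, corrected):
    """List the differences between two reads as tuples of
    (operation, 0-based position in original, original bases, corrected bases)."""
    matcher = SequenceMatcher(None, original, corrected, autojunk=False)
    return [
        (tag, i1, original[i1:i2], corrected[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Post-linker read before and after correction, copied from the post above.
original = ("TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTGG"
            "TTTTAAGCTGAGACTGCCAAGGCACACAGGGGATAGG")
corrected = ("TTATTGCTATAAATAAACGTACTTCTGGAGTAGAATTGAAGTGAGATAGAATTTCTG"
             "TTTTAAGCTGAGACTGCCAAGGCACACAGGGGATAGG")

for edit in read_edits(original, corrected):
    print(edit)
```

For this read the comparison reports a single deleted G, taken from the "GG" just before the TTTT run; everything else, including the low-quality tail, is unchanged.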