  • #16
    Yes, dipSPAdes can use PacBio reads. Currently it uses them only for haplocontig construction, so this may be a bit suboptimal.

    Also, I would suggest disabling the --careful option for the initial assembly.


    • #17
      Meaning of consensus_contigs.fasta

      I am confused about the consensus_contigs.fasta file. I thought that it contained a consensus of both haplotypes, but from what I understood of the manual, it contains the contigs constructed from the paired_consensus_contigs. Or maybe I am missing something?
      If I wanted to estimate the size of the genome, which file or combination of files would be best to use?
      Furthermore, which file(s) summarize the genome of each haplotype (haplotype_assembly.out?), and which file(s) would be best as input for further gene prediction and annotation?

      Thanks


      • #18
        Originally posted by joxcargator73
        I am confused about the consensus_contigs.fasta file. I thought that it contained a consensus of both haplotypes, but from what I understood of the manual, it contains the contigs constructed from the paired_consensus_contigs. Or maybe I am missing something?
        I looked over the manual and found nothing that would indicate this. The manual clearly states: "consensus_contigs.fasta - file in FASTA format with a set of constructed consensus contigs".

        Originally posted by joxcargator73
        If I wanted to estimate the size of the genome, which file or combination of files would be best to use?
        It depends on whether you're interested in the size of the diploid or the haploid genome.

        Originally posted by joxcargator73
        Furthermore, which file(s) summarize the genome of each haplotype (haplotype_assembly.out?)
        The manual states that haplotype_assembly.out contains information about which contig from the set of haplocontigs belongs to which haplome. So yes, you can use it.

        Originally posted by joxcargator73
        and which file(s) would be best as input for further gene prediction and annotation?
        Again, it depends on the result you want to obtain. But for gene finding you would probably want to use the haplocontigs.
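
On the genome-size question, one rough way to compare candidate files is to total the sequence lengths in each FASTA. The sketch below assumes plain, uncompressed FASTA text. The function name is made up for illustration, and total assembly length is only a crude proxy for genome size.

```python
def total_fasta_length(lines):
    """Return (number of sequences, total bases) for an iterable of FASTA lines."""
    n_seqs = 0
    n_bases = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            n_seqs += 1  # header line starts a new record
        else:
            n_bases += len(line)  # sequence line, possibly wrapped
    return n_seqs, n_bases


# Tiny in-memory example. With a real file, pass an open file handle instead,
# e.g. open("consensus_contigs.fasta").
example = [">contig_1", "ACGTAC", "GT", ">contig_2", "GGCC"]
print(total_fasta_length(example))
```

Running this on the consensus contigs would presumably give a figure closer to the haploid size, while the full set of haplocontigs would lean toward the diploid size. Treat both as estimates only.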


        • #19
          PacBio question

          The manual says:
          "PacBio CLR reads are used for hybrid assemblies (e.g. with Illumina or IonTorrent). There is no need to pre-correct PacBio CLR reads. You just need to have filtered subreads in FASTQ/FASTA format. Provide these filtered subreads using --pacbio option. SPAdes will use PacBio CLR reads for gap closure and repeat resolution."

          What is the real meaning of "filtered subreads"? Is it the set of reads that simply passed the machine's quality control, or is there an additional cutoff based on quality and length?

          Thanks


          • #20
            Filtered subreads are broken into pieces at adapters. The raw reads would potentially have the same sequence multiple times: forward, adapter, reverse, adapter, etc.
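
To illustrate the idea, here is a toy sketch that cuts a raw read into subreads at exact copies of an adapter. The adapter string and the function name are placeholders invented for the example. Real pipelines locate adapters by alignment and also apply quality and length filters, which this does not attempt.

```python
def split_into_subreads(raw_read, adapter, min_length=1):
    """Cut raw_read at every exact occurrence of adapter; drop short pieces."""
    pieces = raw_read.split(adapter)
    return [p for p in pieces if len(p) >= min_length]


ADAPTER = "TTTTCCCC"  # hypothetical adapter, NOT the real SMRTbell sequence

# Forward pass, adapter, reverse-complement pass, adapter, partial forward pass.
raw = "GATTACAG" + ADAPTER + "CTGTAATC" + ADAPTER + "GATTAC"
print(split_into_subreads(raw, ADAPTER, min_length=4))
```

The three pieces cover the same insert several times, which is why the raw read itself is not what you would hand to an assembler.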


            • #21
              One more question, about 454 reads

              I was wondering whether SPAdes can take 454 reads. I was thinking of using them in a hybrid assembly, perhaps treating the 454 reads as "pseudo-Sanger" reads.

              Thanks


              • #22
                Originally posted by Brian Bushnell
                Filtered subreads are broken into pieces at adapters. The raw reads would potentially have the same sequence multiple times: forward, adapter, reverse, adapter, etc.
                Right. Basically, filtered subreads are the result of the "P_Filter" step, which can be run in SMRT Portal (and filtered subreads are usually what one obtains from a third-party sequencing provider).


                • #23
                  Originally posted by joxcargator73
                  I was wondering whether SPAdes can take 454 reads. I was thinking of using them in a hybrid assembly, perhaps treating the 454 reads as "pseudo-Sanger" reads.
                  Thanks
                  This may be non-trivial, but probably yes: try providing them as "sanger" reads.

                  You may also want to pre-correct them as IonTorrent reads (--iontorrent --only-error-correction) and then provide the corrected reads as an additional single-read library.


                  • #24
                    Coverage

                    I am trying to estimate the coverage of the contigs from the SPAdes output. In the FASTA files, the headers look like this:
                    >NODE_1_length_251_cov_0.7_ID27771
                    >NODE_2_length_10997_cov_41_ID335
                    >...
                    >...
                    >..

                    Are cov_0.7 and cov_41 the k-mer coverage of these contigs? I used these values to plot coverage, but they look very low in general. Maybe I am reading the wrong information.
                    Where can I get the coverage info?
                    Thanks


                    • #25
                      Originally posted by joxcargator73
                      I am trying to estimate the coverage of the contigs from the SPAdes output. In the FASTA files, the headers look like this:
                      >NODE_1_length_251_cov_0.7_ID27771
                      >NODE_2_length_10997_cov_41_ID335
                      >...
                      >...
                      >..

                      Are cov_0.7 and cov_41 the k-mer coverage of these contigs? I used these values to plot coverage, but they look very low in general. Maybe I am reading the wrong information.
                      Where can I get the coverage info?
                      Thanks
                      The reported values are the average k-mer coverage of each contig. For the k-mer length, use the k from the last k-mer iteration.
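
As a rough sketch of why the numbers look low, k-mer coverage can be converted to approximate base coverage with the usual relation C = C_k * L / (L - k + 1), where L is the read length and k is the k-mer size. The header layout below is taken from the examples in this thread. The read length and k used in the example are assumptions, not values from the poster's run, and the function names are made up for illustration.

```python
import re

# Matches headers such as ">NODE_1_length_251_cov_0.7_ID27771".
HEADER_RE = re.compile(r"^>?NODE_(\d+)_length_(\d+)_cov_([0-9.]+)")


def parse_header(header):
    """Return (node id, contig length, k-mer coverage) from a contig header."""
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError("unrecognized header: " + header)
    return int(m.group(1)), int(m.group(2)), float(m.group(3))


def kmer_to_base_coverage(kmer_cov, read_length, k):
    """Approximate base coverage from k-mer coverage: C = C_k * L / (L - k + 1)."""
    return kmer_cov * read_length / (read_length - k + 1)


READ_LENGTH = 100  # assumed read length; use your own
K = 55             # assumed k of the last iteration; check spades.log

node, length, kmer_cov = parse_header(">NODE_2_length_10997_cov_41_ID335")
print(node, length, round(kmer_to_base_coverage(kmer_cov, READ_LENGTH, K), 1))
```

The closer k is to the read length, the fewer k-mers each read contributes, so the k-mer coverage falls well below the base coverage.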


                      • #26
                        PacBio in SPAdes

                        Thanks for your previous answer.
                        I am now curious about how SPAdes handles the long PacBio reads. After these reads are corrected with the Illumina reads, are the long PacBio reads "chopped" into k-mers too, or are they used only for a later alignment? (I am trying to follow the log file.)

                        The second question is about two warnings in my log file. How critical are they, and is there a way to fix them?

                        The assembly looks good, but I got this at the tail of the file:

                        ===== Mismatch correction finished.

                        * Corrected reads are in /scratch/lfs/ascunce/Spades/spadesPacBio_output/corrected/
                        * Assembled contigs are in /scratch/lfs/ascunce/Spades/spadesPacBio_output/contigs.fasta (contigs.fastg)
                        * Assembled scaffolds are in /scratch/lfs/ascunce/Spades/spadesPacBio_output/scaffolds.fasta (scaffolds.fastg)

                        ======= SPAdes pipeline finished WITH WARNINGS!

                        === Error correction and assembling warnings:
                        * 2:36:44.887 5G / 5G WARN General (kmer_coverage_model.cpp : 359) Failed to determine erroneous kmer threshold. Threshold set to: 7
                        * 2:17:39.519 8G / 9G WARN General (kmer_coverage_model.cpp : 359) Failed to determine erroneous kmer threshold. Threshold set to: 104
                        ======= Warnings saved to /scratch/lfs/ascunce/Spades/spadesPacBio_output/warnings.log

                        Thanks for your help.


                        • #27
                          Originally posted by joxcargator73
                          Thanks for your previous answer.
                          I am now curious about how SPAdes handles the long PacBio reads. After these reads are corrected with the Illumina reads, are the long PacBio reads "chopped" into k-mers too, or are they used only for a later alignment? (I am trying to follow the log file.)
                          SPAdes uses PacBio reads for repeat resolution, so it uses the original, uncorrected reads.

                          The second question is about two warnings in my log file. How critical are they, and is there a way to fix them?

                          The assembly looks good, but I got this at the tail of the file:

                          ===== Mismatch correction finished.

                          * Corrected reads are in /scratch/lfs/ascunce/Spades/spadesPacBio_output/corrected/
                          * Assembled contigs are in /scratch/lfs/ascunce/Spades/spadesPacBio_output/contigs.fasta (contigs.fastg)
                          * Assembled scaffolds are in /scratch/lfs/ascunce/Spades/spadesPacBio_output/scaffolds.fasta (scaffolds.fastg)

                          ======= SPAdes pipeline finished WITH WARNINGS!

                          === Error correction and assembling warnings:
                          * 2:36:44.887 5G / 5G WARN General (kmer_coverage_model.cpp : 359) Failed to determine erroneous kmer threshold. Threshold set to: 7
                          * 2:17:39.519 8G / 9G WARN General (kmer_coverage_model.cpp : 359) Failed to determine erroneous kmer threshold. Threshold set to: 104
                          ======= Warnings saved to /scratch/lfs/ascunce/Spades/spadesPacBio_output/warnings.log

                          Thanks for your help.
                          Such warnings usually indicate that you have quite uneven coverage and are trying to assemble in multi-cell mode. Please email your spades.log to us (SPAdes support) so we can see whether that is indeed the case.


                          • #28
                            re: attempting SPAdes with 454 reads -- if I understand correctly:

                            1) SPAdes does not natively support 454 reads

                            2) 454 reads resemble Ion Torrent reads (similar technology) but SPAdes will not do a hybrid Illumina/Ion Torrent assembly (though the IonHammer corrector could be used on 454 reads)

                            3) therefore to attempt 454/Illumina hybrid assembly, one must try treating 454 reads as Sanger

                            4) but Sanger reads do not have 'paired end' modes, so paired end* 454 reads will be treated as single-end (i.e., all paired-ness/insert size info is lost)

                            Is that all correct?

                            *Roche calls them 'paired end' but they are mate-pair.


                            • #29
                              Originally posted by ssully
                              re: attempting SPAdes with 454 reads -- if I understand correctly:

                              1) SPAdes does not natively support 454 reads

                              2) 454 reads resemble Ion Torrent reads (similar technology) but SPAdes will not do a hybrid Illumina/Ion Torrent assembly (though the IonHammer corrector could be used on 454 reads)

                              3) therefore to attempt 454/Illumina hybrid assembly, one must try treating 454 reads as Sanger

                              4) but Sanger reads do not have 'paired end' modes, so paired end* 454 reads will be treated as single-end (i.e., all paired-ness/insert size info is lost)

                              Is that all correct?

                              *Roche calls them 'paired end' but they are mate-pair.
                              You're missing a fifth possibility, which is actually the proper choice here. Basically:

                              1. Correct your Illumina reads using --only-error-correction mode
                              2. Correct your 454 reads using --only-error-correction --iontorrent mode (make sure you're using the latest SPAdes release - it does support proper error correction of paired IonTorrent data)
                              3. Provide the corrected reads from steps 1 and 2 and assemble everything using the --only-assembler option (yes, your 454 reads should go in as mate pairs).

                              However, since your 454 data is likely of low coverage, you can simply try feeding it in as Illumina mate pairs.
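
The three steps above can be sketched as command lines. This only builds and prints the commands; it does not run SPAdes. All file and directory names are hypothetical, including the names of the corrected read files, which you would need to look up in each run's corrected/ directory. The flag names are the ones mentioned in this thread plus the standard paired-end and mate-pair library options as I remember them from the SPAdes 3.x manual, so check them against spades.py --help for your release.

```python
import shlex


def correction_command(out_dir, left, right, iontorrent=False):
    """Build an error-correction-only command for one paired library."""
    cmd = ["spades.py", "--only-error-correction"]
    if iontorrent:
        cmd.append("--iontorrent")  # step 2: correct 454 reads as IonTorrent
    cmd += ["-1", left, "-2", right, "-o", out_dir]
    return cmd


def assembly_command(out_dir, pe_left, pe_right, mp_left, mp_right):
    """Build an assembly-only command from already corrected reads."""
    return [
        "spades.py", "--only-assembler",
        "--pe1-1", pe_left, "--pe1-2", pe_right,  # corrected Illumina pairs
        "--mp1-1", mp_left, "--mp1-2", mp_right,  # corrected 454 as mate pairs
        "-o", out_dir,
    ]


# Hypothetical input and output names, for illustration only.
step1 = correction_command("illumina_ec", "illumina_R1.fastq", "illumina_R2.fastq")
step2 = correction_command("r454_ec", "r454_left.fastq", "r454_right.fastq",
                           iontorrent=True)
step3 = assembly_command("hybrid_assembly",
                         "illumina_ec/corrected/R1.cor.fastq",
                         "illumina_ec/corrected/R2.cor.fastq",
                         "r454_ec/corrected/left.cor.fastq",
                         "r454_ec/corrected/right.cor.fastq")

for cmd in (step1, step2, step3):
    print(shlex.join(cmd))
```

For the simpler low-coverage route, you would skip step 2 and pass the uncorrected 454 pairs straight to a normal run as the mate-pair library.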


                              • #30
                                Originally posted by akorobeynikov
                                You're missing a fifth possibility, which is actually the proper choice here. Basically:

                                1. Correct your Illumina reads using --only-error-correction mode
                                2. Correct your 454 reads using --only-error-correction --iontorrent mode (make sure you're using the latest SPAdes release - it does support proper error correction of paired IonTorrent data)
                                3. Provide the corrected reads from steps 1 and 2 and assemble everything using the --only-assembler option (yes, your 454 reads should go in as mate pairs).

                                However, since your 454 data is likely of low coverage, you can simply try feeding it in as Illumina mate pairs.
                                But 454 paired end reads are two 'end' reads connected by a linker sequence. Does the IonHammer corrector actually recognize those and split the reads before correcting? Or do the 454 PE reads have to first be split into left/right by linker removal, then run through --only-error-correction?

                                ...and oriented rf (reverse-forward) if they are to be interpreted as Illumina mate pairs? (yes, they are low coverage)
                                Last edited by ssully; 12-01-2014, 03:40 PM.
