
  • Hello, everybody! Chloroplast genome assembly question

    I'm working on a chloroplast genome assembly these days, but I haven't learnt any bioinformatics.
    How do I do the genome assembly? So far I have done the mapping using software such as BWA, GATK, Picard and sickle, and I now have a .bam file.
    Can I use the BAM file to do the assembly?

  • #2
    Which assembler to use depends partly on what type of data you have (Illumina, Ion Torrent, 454, etc.) and whether there is a reference genome for the species you are working on.

    Popular assemblers include Velvet, ABySS, SOAPdenovo, MIRA and 454's Newbler, but there are many others.


    • #3
      Originally posted by mastal
      Which assembler to use depends partly on what type of data you have (Illumina, Ion Torrent, 454, etc.) and whether there is a reference genome for the species you are working on.

      Popular assemblers include Velvet, ABySS, SOAPdenovo, MIRA and 454's Newbler, but there are many others.

      Thank you for your reply. I'm using Illumina paired-end FASTQ data. The species is rice, so I have a reference!


      • #4
        You can use samtools / bcftools to do this. I have modified the vcftools vcf2fq script (link here) to work on VCF files that only contain variant information, as well as properly dealing with INDELs (i.e. changing the sequence, rather than just masking out INDEL regions in lower case). You can use it with a small modification to that specified in the SAMtools man page:

        Code:
        $ samtools mpileup -L ${maxdepth} -uf \
             chloroplast_ref.fasta results/sample1.bam | \
             bcftools view -v -g - > sample1.vcf
        $ ./vcf2fq.pl -Q 20 -L 20 -f chloroplast_ref.fasta sample1.vcf \
             > con_sample1.fastq
        Basic help for the vcf2fq script:

        Code:
        Usage:   vcf2fq.pl [options] <variant-sites.vcf>
        
        Options: -d INT    minimum depth                   [3]
                 -D INT    maximum depth                   [100000]
                 -Q INT    min RMS mapQ                    [10]
                 -L INT    min INDEL Qual                  [100]
                 -f FASTA  file with reference sequence(s) []
        
        Note: without a reference sequence, sites that are not specified
              in the VCF file will be filled with Ns
        Last edited by gringer; 03-17-2014, 12:26 PM.
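
To illustrate what "changing the sequence" for SNPs and INDELs means in the post above, here is a minimal Python sketch. It is not the logic of the vcf2fq script: the function name `apply_variants` and the tuple layout are hypothetical, there is no depth or quality filtering, and overlapping variants are not handled. Variants are applied from the rightmost position backwards so that an INDEL does not shift the coordinates of variants still to be applied.

```python
def apply_variants(reference, variants):
    """Return a consensus string with variants applied to the reference.

    variants: iterable of (pos, ref_allele, alt_allele) tuples, where pos
    is 1-based as in a VCF POS column. Hypothetical helper for illustration.
    """
    seq = reference
    # Work right to left so earlier coordinates stay valid after INDELs.
    for pos, ref, alt in sorted(variants, reverse=True):
        start = pos - 1
        if seq[start:start + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        seq = seq[:start] + alt + seq[start + len(ref):]
    return seq


if __name__ == "__main__":
    # One SNP (C>T at position 2) and one 1-bp deletion (AC>A at position 5).
    print(apply_variants("ACGTACGT", [(2, "C", "T"), (5, "AC", "A")]))
```

Sites with no variant simply keep the reference base, which corresponds to running the script with `-f`.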


        • #5
          Thank you very much!
          I will try this first.


          • #6
            Velvet inquiry about assembling from multiple libraries

            Hi,
            I'm trying to assemble a plastid genome (about 160,000 bp) from a whole-genome NGS experiment.

            I compiled Velvet with these parameters:

            make 'CATEGORIES=20' 'MAXKMERLENGTH=77' 'LONGSEQUENCES=1' 'OPENMP=1'

            I have 4 paired-end libraries with different insert lengths.

            First, I aligned the reads to the reference and extracted the mapped reads.
            Second, I tried a de novo assembly with Velvet:

            velveth velvet/ 43,69,2 -fastq -separate \
              -shortPaired \
              mappedReads/paired/lib_400_paired_forward.fastq mappedReads/paired/lib_400_paired_reverse.fastq \
              -shortPaired2 \
              mappedReads/paired/lib_500_paired_forward.fastq mappedReads/paired/lib_500_paired_reverse.fastq \
              -shortPaired3 \
              mappedReads/paired/lib_700_paired_forward.fastq mappedReads/paired/lib_700_paired_reverse.fastq \
              -shortPaired4 \
              mappedReads/paired/lib_700_BP_paired_forward.fastq mappedReads/paired/lib_700_BP_paired_reverse.fastq \
              -fastq -short \
              mappedReads/unpaired/lib_400_unpaired.fastq \
              mappedReads/unpaired/lib_500_unpaired.fastq \
              mappedReads/unpaired/lib_700_unpaired.fastq \
              mappedReads/unpaired/lib_700_BP_unpaired.fastq \
              mappedReads/unpaired/lib_single.fastq


            and for each kmer:

            velvetg velvet/_43/ -exp_cov auto -cov_cutoff auto -min_contig_lgth 200 -ins_length 250 -ins_length2 400 -ins_length3 600 -ins_length4 600

            Unfortunately, the N50 reported for each k-mer run is identical to the k-mer value.
            (e.g. for k-mer 59 the result is: Final graph has 24686 nodes and n50 of 59, max 897, total 781924, using 2297000/54940402 reads)

            Could anyone help me understand what I am doing wrong?
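
For readers unfamiliar with the statistic in the post above, here is a minimal Python sketch of the usual N50 definition: the length L such that contigs of length L or longer hold at least half of the total assembled length. The function name `n50` and the toy lengths are illustrative only and are not taken from the poster's data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or node) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half is the N50.
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


if __name__ == "__main__":
    # Toy assembly: many minimal-length nodes plus one longer node.
    print(n50([59] * 100 + [897]))
```

In the toy case the single long node holds far less than half of the total length, so the N50 falls on the short nodes. An N50 as small as the one quoted above likewise means that at least half of the total length sits in nodes that short or shorter, which points to a heavily fragmented graph.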


            • #7
              How long are your reads? Maybe the k-mer values you tried aren't the best ones for Velvet.

              What kind of coverage do you have?
              You could try using velvetk to calculate what kind of coverage you would get for different kmers.

              How closely related is the reference genome that you are aligning your reads to?

              Maybe you could try a reference-assisted assembly with velvet instead of de novo.


              • #8
                > How long are your reads? Maybe the k-mer values you tried aren't the best ones for Velvet.

                Reads are 100 bp. The k-mer range is 43-69; I think it is wide enough.

                > What kind of coverage do you have?
                About 57 million reads map to the reference; that is very high coverage for a plastid genome (about 160,000 bp).
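
The figures in this post can be turned into a rough coverage estimate. The Python sketch below assumes all 57 million reads are exactly 100 bp and all belong to the 160,000 bp plastid genome, which is an idealisation. The k-mer coverage formula Ck = C * (L - k + 1) / L is the one given in the Velvet manual; the function names are illustrative.

```python
def base_coverage(n_reads, read_len, genome_len):
    """Nucleotide coverage C = (number of reads * read length) / genome length."""
    return n_reads * read_len / genome_len


def kmer_coverage(base_cov, read_len, k):
    """k-mer coverage Ck = C * (L - k + 1) / L, as defined in the Velvet manual."""
    return base_cov * (read_len - k + 1) / read_len


if __name__ == "__main__":
    c = base_coverage(57_000_000, 100, 160_000)
    print(f"base coverage: {c:.0f}x")
    for k in (43, 59, 69):
        print(f"k={k}: k-mer coverage about {kmer_coverage(c, 100, k):.0f}x")
```

Under these assumptions the nucleotide coverage is about 35,625x, orders of magnitude above what is usually needed for a genome of this size.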


                • #9
                  In that case the problem could be that you have very high coverage; Velvet doesn't do so well with high coverage.


                  • #10
                    Thanks for the suggestion.
                    It was very useful.
                    After setting the right coverage cutoff, I obtained a good result.


                    Originally posted by mastal
                    In that case the problem could be that you have very high coverage; Velvet doesn't do so well with high coverage.


                    • #11
                      Originally posted by concitacantarella
                      Thanks for the suggestion.
                      It was very useful.
                      After setting the right coverage cutoff, I obtained a good result.

                      Hi concitacantarella!
                      Could you post the Velvet parameters that you used? I have the same problem as you. Thanks!


                      • #12
                        Hi,
                        I used velvet-estimate-exp_cov.pl to estimate the coverage cutoff.

                        This was the command line I ran:

                        velvetg _73/ -exp_cov auto -cov_cutoff 500 -min_contig_lgth 200 -ins_length 280
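
As a rough picture of how a cutoff such as the one above can be chosen, the Python sketch below builds a length-weighted coverage histogram from the stats.txt file that velvetg writes and reports its most populated bin. It is not a port of velvet-estimate-exp_cov.pl. The column names `lgth` and `short1_cov` are assumptions about the stats.txt header, only the first short-read category is considered, and the function names are hypothetical.

```python
import csv
import math
from collections import Counter


def weighted_coverage_histogram(lines, cov_column="short1_cov"):
    """Sum node lengths per rounded coverage value.

    lines: iterable of tab-separated text lines with a header row
    (assumed to contain 'lgth' and the chosen coverage column).
    """
    hist = Counter()
    for row in csv.DictReader(lines, delimiter="\t"):
        cov = float(row[cov_column])
        if not math.isfinite(cov):
            continue  # skip nodes whose coverage is reported as Inf or NaN
        hist[round(cov)] += int(row["lgth"])
    return hist


def weighted_mode(hist):
    """Return the coverage bin that holds the most total node length."""
    return max(hist, key=hist.get)


if __name__ == "__main__":
    # Tiny made-up table standing in for a real stats.txt file.
    sample = [
        "ID\tlgth\tshort1_cov",
        "1\t100\t10.2",
        "2\t50\t9.8",
        "3\t30\t500.0",
    ]
    print(weighted_mode(weighted_coverage_histogram(sample)))
```

With a real file, one would pass `open("stats.txt")` instead of the sample list and inspect the whole histogram before settling on a value for `-cov_cutoff`.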


                        • #13
                          Originally posted by concitacantarella
                          Hi,
                          I used velvet-estimate-exp_cov.pl to estimate the coverage cutoff.

                          This was the command line I ran:

                          velvetg _73/ -exp_cov auto -cov_cutoff 500 -min_contig_lgth 200 -ins_length 280

                          Thanks! I am using these parameters and they work great.
