  • Bowtie alignment using de novo velvet contigs

    I'll start by saying that I am a biologist learning bioinformatics from scratch so please bear with me!

    I'm doing de novo sequencing of a complex of dsRNA viruses using MiSeq 150PE reads. Some Sanger sequencing has been done previously, so I already have some partial sequences which I'm hoping to use to validate my assemblies.

    As a quick test of my new data I aligned my reads to the Sanger assemblies using bowtie and used Tablet to visualise the alignments. I found average read depths of 3,300x to 81,000x depending on the virus, but the read density was sometimes quite patchy, producing "islands" of very high coverage (suggesting that the Sanger sequences may not be entirely correct?).

    When I produce a de novo assembly using velvet:
    Code:
    velvetg Trimmed/234/_51 -cov_cutoff 10 -ins_length 170 -exp_cov 13200 -scaffolding no -min_contig_lgth 100 -unused_reads yes -read_trkg yes &
    and then use the contigs as a reference sequence in bowtie to align the original reads, I find that I still get patchy coverage of the contigs.

    How can the coverage of my Velvet-derived contigs be patchy when the bowtie alignment uses the very same reads that produced them? And is there a better/easier way of visualising my reads on the contigs I produce (bearing in mind that my command-line skills are very basic and my programming skills are non-existent)?
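    For reference, the alignment/visualisation step I'm running looks roughly like this (file names here are placeholders rather than my real paths):
    Code:
    # build a bowtie index from the velvet contigs
    bowtie-build contigs.fa contigs_idx
    # align the original paired reads back to the contigs, writing SAM output
    bowtie -S -p 4 contigs_idx -1 reads_R1.fastq -2 reads_R2.fastq reads_vs_contigs.sam
    # load contigs.fa together with reads_vs_contigs.sam in Tablet to view the coverage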

    Thanks, Ed

  • #2
    I would wonder whether Velvet will work properly with so many reads.

    Try passing it just a fraction of your reads and see if you get better contigs. Aim for maybe 200x at most.
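    For example, something along these lines would get you to roughly 200x given the ~13,200x expected coverage you quoted (seqtk is just one option for subsampling, so treat the exact commands as a sketch):
    Code:
    # keep ~1.5% of read pairs: fraction = target / current coverage = 200 / 13200 ≈ 0.015
    # using the same seed (-s100) for both files keeps the pairs in sync
    seqtk sample -s100 reads_R1.fastq 0.015 > sub_R1.fastq
    seqtk sample -s100 reads_R2.fastq 0.015 > sub_R2.fastq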



    • #3
      Thanks for the reply. I got the same answer in a similar thread. I tried running 1/6000th of my reads yesterday and got a much better assembly. I'm looking forward to seeing what happens when I do the same with the remaining 5999/6000ths and merge it all.



      • #4
        Hi e.dobbs,

        I am also learning bioinformatics from scratch, the same as you.
        I hope you can help me out.
        I have FASTQ files from an Ion Proton instrument, with single-end reads of 50-340 bp in length.
        I don't have a reference genome.
        Here is how I did my analysis:
        1] FastQC
        2] Trimming some reads using the FASTX-Toolkit

        For de novo assembly:
        3] velveth with k-mer 31
        4] velvetg
        5] bowtie-build, to build a reference index from the contigs.fa file created by velvetg
        6] Mapping my FASTQ reads against that index with bowtie (roughly as sketched below)
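        Roughly, the commands I am running for steps 3-6 look like this (file and directory names are placeholders, and the velvetg options are left at their defaults):
        Code:
        # 3] + 4] de novo assembly with Velvet, k-mer 31
        velveth assembly_k31 31 -fastq -short reads.fastq
        velvetg assembly_k31
        # 5] build a bowtie index from the Velvet contigs
        bowtie-build assembly_k31/contigs.fa contigs_idx
        # 6] map the same reads back to the contigs, writing SAM output
        bowtie -S -p 4 contigs_idx reads.fastq reads_vs_contigs.sam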

        So my questions are:
        1] How do you select the parameters for velvetg and velveth?
        2] Which k-mer value should I select?
        3] Am I running bowtie in the correct manner?
        4] If yes, how do I confirm that the assembly created by Velvet preserves the input information, and how can I assess its accuracy?

        Thanks a bunch in advance.

        Naresh








        • #5
          Hi nareshvasani,

          As a newbie I'm probably not the best person to ask but I can certainly try to answer your questions:

          You are definitely doing everything the right way and in the right order, from what you have said.

          For Velvet, most people seem to try a range of different k-mers and then choose the one that gives the highest N50 value (the contig length at which half of the assembly is contained in contigs of that size or longer). Merging several Velvet assemblies and then further assembling the contigs with a separate program (e.g. CAP3) can often give you longer contigs.
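          A k-mer scan of that sort might look something like this (the read file name is a placeholder and the velvetg settings are only a starting point):
          Code:
          # build one assembly per odd k-mer value and compare the reported N50s
          for K in 21 25 29 33 37; do
              velveth assembly_k${K} ${K} -fastq -shortPaired reads_interleaved.fastq
              velvetg assembly_k${K} -exp_cov auto -cov_cutoff auto -min_contig_lgth 100
              # velvetg prints the n50 in its final summary line, and the Log file in
              # each assembly directory keeps a copy, so the runs are easy to compare
          done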

          When I was doing my analysis with Velvet I chose a range of k-mer values and then merged the results using Oases. I was looking for ~30 different contigs ranging in size from 500 bp to 20 kb, so the N50 wasn't the best reflection of assembly quality in my case. Oases produced thousands of contigs, so I used CAP3 to reduce the redundancy of the assemblies and give me a sensible number of final assemblies (100-250), which I could then assess using BLASTn, BLASTp and by searching for conserved domains.
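          The merging/assessment step is essentially just this (file names are placeholders, and the BLAST line assumes BLAST+ is available; it is only one way to do a first-pass check):
          Code:
          # pool the contigs to be merged (here, the per-k assemblies) and collapse redundancy
          cat assembly_k*/contigs.fa > all_contigs.fa
          cap3 all_contigs.fa > cap3_report.txt
          # CAP3 writes all_contigs.fa.cap.contigs (merged) and all_contigs.fa.cap.singlets
          cat all_contigs.fa.cap.contigs all_contigs.fa.cap.singlets > final_assemblies.fa
          # first-pass assessment against the NCBI nt database with BLAST+ (remote search)
          blastn -query final_assemblies.fa -db nt -remote -outfmt 6 -out final_vs_nt.tsv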

          In terms of validating the contigs produced I have been using PCR and Sanger sequencing but again this may not be ideal for your purposes if you are looking at a large genome.

          Good luck!

          Ed



          • #6
            Hi e.dobbs,

            Thanks for your reply.

