  • Velveth and velvetg use

    Hello Everyone,

    This is the first time I am using velveth and velvetg.
    I have around 5 million reads, each 50-300 bp long.
    I used the command below, and it worked just fine:
    # velveth auto 31,45,2 -fastq -short -inputfile
    Output: it gave me 7 outputs, with k-mer lengths 31, 33, 35, 37, 39, 41 and 43.
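The seven k-mer lengths reported above can be reproduced with a small sketch. It assumes that velveth's `m,M,s` argument starts at `m`, stops before `M`, and decrements even bounds to keep k odd, which is what the reported output suggests; `kmer_series` is a hypothetical helper, not part of Velvet.

```python
def kmer_series(m, M, s):
    """Return the k-mer lengths implied by a velveth 'm,M,s' argument.

    Assumptions (not taken from Velvet's source): the upper bound M is
    exclusive, and even bounds are decremented so that every k is odd.
    The step s is expected to be even.
    """
    if m % 2 == 0:
        m -= 1
    if M % 2 == 0:
        M -= 1
    return list(range(m, M, s))


print(kmer_series(31, 45, 2))  # [31, 33, 35, 37, 39, 41, 43]
```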

    Can anybody please suggest which k-mer length to select, and whether I need to use -long or -short for my reads in the command?

    How do I execute the velvetg command?
    What coverage cutoff and minimum contig length should I use?

    Thanks in advance!

  • #2
    This question rightly belongs in http://seqanswers.com/forums/forumdisplay.php?f=27
    which is the de novo assembly forum.

    When you say "50-300 bp", are you referencing the length of what velvet calls inserts?

    As for which k-mer length to use, I refer you back to the manual:

    5.2 Choice of hash length k
    The hash length is the length of the k-mers being entered in the hash table.
    Firstly, you must observe three technical constraints:
    • it must be an odd number, to avoid palindromes. If you put in an even
    number, Velvet will just decrement it and proceed.
    • it must be below or equal to MAXKMERHASH length (cf. 2.3.3, by
    default 31bp), because it is stored on 64 bits
    • it must be strictly inferior to read length, otherwise you simply will not
    observe any overlaps between reads, for obvious reasons.
    Now you still have quite a lot of possibilities. As is often the case, it’s a tradeoff between specificity and sensitivity. Longer kmers bring you more specificity (i.e. less spurious overlaps) but lowers coverage (cf. below)... so there’s a sweet spot to be found with time and experience.
    Experience shows that kmer coverage should be above 10 to start getting
    decent results. If Ck is above 20, you might be “wasting” coverage. Experience
    also shows that empirical tests with different values for k are not that costly to
    run!
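To make the "above 10, not much over 20" rule of thumb concrete, here is a minimal sketch of the k-mer coverage formula the Velvet manual gives, Ck = C * (L - k + 1) / L, where C is nucleotide coverage and L is read length. The coverage of 40x and read length of 100 bp are made-up illustration values, and for reads of mixed length a single L (for example the mean) is only an approximation.

```python
def kmer_coverage(nucleotide_coverage, read_length, k):
    """k-mer coverage Ck = C * (L - k + 1) / L.

    k must be strictly smaller than the read length, otherwise no
    overlaps between reads can be observed.
    """
    if k >= read_length:
        raise ValueError("k must be strictly smaller than the read length")
    return nucleotide_coverage * (read_length - k + 1) / read_length


# Hypothetical inputs: 40x nucleotide coverage, 100 bp reads.
for k in (31, 37, 43):
    ck = kmer_coverage(40.0, 100, k)
    print(f"k={k}: Ck={ck:.1f}")
```

Larger k lowers Ck, so the same data set can sit comfortably above 10 at k=31 and drift toward the lower bound as k grows.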
    5.3 Choice of a coverage cutoff
    Velvet was designed to be explicitly cautious when correcting the assembly, to
    lose as little information as possible. This consequently will leave some obvious
    errors lying behind after the Tour Bus algorithm (cf. 7) was run. To detect
    them, you can plot out the distribution of k-mer coverages (5.2), using plotting
    software (I use R).
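As an alternative to plotting in R, the distribution of k-mer coverages can be tabulated in a few lines. The sketch below weights each coverage bin by node length; the `(length, coverage)` pairs are toy values, and it is an assumption here that real values would come from the `lgth` and `short1_cov` columns of the stats.txt file velvetg writes.

```python
from collections import Counter


def weighted_coverage_histogram(nodes):
    """Sum node lengths per integer coverage bin.

    nodes: iterable of (length, coverage) pairs.
    Returns a dict {coverage_bin: total_length}, sorted by bin.
    """
    histogram = Counter()
    for length, coverage in nodes:
        histogram[int(coverage)] += length
    return dict(sorted(histogram.items()))


# Toy data for illustration only: a few short low-coverage nodes
# (likely errors) and three long nodes near 15x.
toy_nodes = [(120, 1.4), (80, 2.1), (5000, 14.8),
             (7200, 15.3), (3000, 16.0), (60, 1.9)]

for coverage_bin, total_length in weighted_coverage_histogram(toy_nodes).items():
    print(coverage_bin, total_length)
```

In a table like this, a coverage cutoff would typically be placed in the gap between the low-coverage bins and the main peak.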


    velvetg is simply

    Code:
    velvetg auto
    This would also be a good time to ask what you are assembling, and whether you have already tried de novo assembly on a genome for which there is a known "answer", such as E. coli MG1655.
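For the coverage cutoff and minimum contig length asked about earlier, velvetg takes the options `-cov_cutoff` (a number or `auto`) and `-min_contig_lgth`. The sketch below only builds the command lines without running them. The directory names `auto_31` ... `auto_43` assume that a multi-k velveth run names its outputs `<prefix>_<k>`, and the value 200 is a placeholder, not a recommendation.

```python
def velvetg_command(directory, cov_cutoff="auto", min_contig_lgth=None):
    """Build a velvetg argument list (nothing is executed here)."""
    command = ["velvetg", directory, "-cov_cutoff", str(cov_cutoff)]
    if min_contig_lgth is not None:
        command += ["-min_contig_lgth", str(min_contig_lgth)]
    return command


# One command per k-mer length from the velveth run (assumed directory names).
for k in range(31, 45, 2):
    print(" ".join(velvetg_command(f"auto_{k}", min_contig_lgth=200)))
```

Each list could be passed to `subprocess.run` once Velvet is installed; comparing the resulting assemblies across k is the empirical test the manual recommends.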


    • #3
      Hi winsettz,

      My FASTQ file has reads 50-300 bp long, and all of them are single-end reads.

      So I was wondering which command to execute.
      For example:
      velveth auto 31 -fastq -short -inputfile

      or

      velveth auto 31 -fastq -long -inputfile


      • #4
        Originally posted by nareshvasani
        My FASTQ file has reads 50-300 bp long, and all of them are single-end reads.

        So I was wondering which command to execute.
        For example:
        velveth auto 31 -fastq -short -inputfile

        or

        velveth auto 31 -fastq -long -inputfile
        Again, from the Velvet manual:

        5.6 What’s long and what’s short?
        Velvet was pretty much designed with micro-reads (e.g. Illumina) as short and
        short to long reads (e.g. 454 and capillary) as long. Reference sequences can
        also be thrown in as long.
        That being said, there is no necessary distinction between the types of reads.
        The only constraint is that a short read be shorter than 32kb. The real difference
        is the amount of data Velvet keeps on each read. Short reads are presumably
        too short to resolve many repeats, so only a minimal amount of information is
        kept. On the contrary, long reads are tracked in detail through the graph.
        This means that whatever you call your reads, you should be able to obtain
        the same initial assembly. The differences will appear as you are trying to resolve
        repeats, as long reads can be followed through the graph. On the other hand,
        long reads cost more memory. It is therefore perfectly fine to store Sanger reads
        as “short” if necessary.
        Illumina data is definitely short-read, whereas for something like PacBio you will need to decide this beforehand. 454 and Sanger reads will also likely meet Velvet's definition of a short read.
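Since the manual says the initial assembly should be the same either way, one way to settle the question is to hash the same reads under both categories and compare. The sketch below only builds the two velveth command lines; `reads.fastq`, `asm_short` and `asm_long` are hypothetical names, and only the `-short` and `-long` categories discussed in this thread are handled.

```python
def velveth_command(directory, k, reads_path, read_type="short", file_format="fastq"):
    """Build a velveth argument list (nothing is executed here)."""
    if read_type not in {"short", "long"}:
        raise ValueError("this sketch only covers the 'short' and 'long' categories")
    return ["velveth", directory, str(k),
            f"-{file_format}", f"-{read_type}", reads_path]


# Hypothetical file and directory names, same k for both runs.
for read_type in ("short", "long"):
    print(" ".join(velveth_command(f"asm_{read_type}", 31, "reads.fastq", read_type)))
```

Keep in mind the manual's caveat that long reads cost more memory, which may matter with around 5 million reads.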


        • #5
          winsettz

          Thanks a lot.

          This FASTQ file was generated on an Ion Torrent Proton instrument,
          so I don't know whether to treat these reads as short or long.


          • #6
            If you read the extract from the manual, as posted above, it tells you that for your size of reads, it really doesn't matter whether you call them short or long, you will get the same result.


            • #7
              Mastal

              Thanks a lot!
