  • Velveth and velvetg use

    Hello everyone,

    This is the first time I am using velveth and velvetg.
    I have around 5 million reads, 50-300 bp long.
    I used the command below, and it worked just fine.
    # velveth auto 31,45,2 -fastq -short -inputfile
    Output: it gave me 7 outputs, with k-mer lengths 31, 33, 35, 37, 39, 41 and 43.

    Can anybody please suggest which k-mer length to select? Do I need to specify long or short reads in the command?

    How do I execute the velvetg command?
    What coverage cutoff and min_contig_length should I use?

    Thanks in advance!

  • #2
    This question rightly belongs in http://seqanswers.com/forums/forumdisplay.php?f=27
    which is the de novo assembly forum.

    When you say "50-300 bp", are you referencing the length of what velvet calls inserts?

    As for which k-mer length to use, I refer you back to the manual:

    5.2 Choice of hash length k
    The hash length is the length of the k-mers being entered in the hash table.
    Firstly, you must observe three technical constraints:
    • it must be an odd number, to avoid palindromes. If you put in an even
    number, Velvet will just decrement it and proceed.
    • it must be below or equal to MAXKMERHASH length (cf. 2.3.3, by
    default 31bp), because it is stored on 64 bits
    • it must be strictly inferior to read length, otherwise you simply will not
    observe any overlaps between reads, for obvious reasons.
    Now you still have quite a lot of possibilities. As is often the case, it’s a tradeoff between specificity and sensitivity. Longer kmers bring you more specificity
    (i.e. less spurious overlaps) but lowers coverage (cf. below)... so there’s a sweet
    spot to be found with time and experience.
    Experience shows that kmer coverage should be above 10 to start getting
    decent results. If Ck is above 20, you might be “wasting” coverage. Experience
    also shows that empirical tests with different values for k are not that costly to
    run!
    5.3 Choice of a coverage cutoff
    Velvet was designed to be explicitly cautious when correcting the assembly, to
    lose as little information as possible. This consequently will leave some obvious
    errors lying behind after the Tour Bus algorithm (cf. 7) was run. To detect
    them, you can plot out the distribution of k-mer coverages (5.2), using plotting
    software (I use R).
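
A rough sketch of the arithmetic behind the advice above. The relation Ck = C * (L - k + 1) / L is taken from the Velvet manual's section on k-mer coverage, which is not quoted in this thread. The function names `kmer_coverage` and `valid_k`, the 30x coverage and the 100 bp read length are illustrative assumptions, not values from the original question.

```shell
#!/bin/sh
# k-mer coverage Ck from nucleotide coverage C, read length L and hash length k:
#   Ck = C * (L - k + 1) / L
# Arguments: C L k (C and L below are made-up example values).
kmer_coverage() {
    awk -v c="$1" -v l="$2" -v k="$3" \
        'BEGIN { printf "%.2f\n", c * (l - k + 1) / l }'
}

# Succeeds only if k meets the three constraints quoted above:
# odd, strictly below the read length, and at most the compiled-in
# maximum (third argument, 31 if omitted).
valid_k() {
    [ $(( $1 % 2 )) -eq 1 ] && [ "$1" -lt "$2" ] && [ "$1" -le "${3:-31}" ]
}

# Report Ck for a few candidate hash lengths.
for k in 21 25 31; do
    if valid_k "$k" 100; then
        echo "k=$k Ck=$(kmer_coverage 30 100 "$k")"
    fi
done
```

With reads of mixed length (50-300 bp here), L is only an approximation, so treat the result as a guide for picking a starting k rather than an exact figure.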


    Running velvetg is simply:

    Code:
    velvetg auto
    This would also be a good time to ask what you are assembling, and whether you have gotten your feet wet on a de novo assembly for which there is an "answer", like E. coli MG1655.
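
For the multi-k run from the first post, a hedged sketch of how the velvetg calls might be generated. It assumes velveth names its output directories `<prefix>_<k>` (as the Velvet documentation describes for the `m,M,s` form) and that the upper bound is exclusive, which matches the seven k values reported for `31,45,2`. `-cov_cutoff` and `-min_contig_lgth` are velvetg options; the value 200 is only a placeholder, and the helper name `velvetg_commands` is made up for illustration. The script prints the commands instead of running them.

```shell
#!/bin/sh
# Print one velvetg command per k-mer directory.
# Arguments: prefix, first k, upper bound (exclusive), step.
velvetg_commands() {
    prefix=$1
    k=$2
    max=$3
    step=$4
    while [ "$k" -lt "$max" ]; do
        # 200 is a placeholder minimum contig length, not a recommendation.
        echo "velvetg ${prefix}_${k} -cov_cutoff auto -min_contig_lgth 200"
        k=$((k + step))
    done
}

# Dry run for the directories produced by "velveth auto 31,45,2 ...".
velvetg_commands auto 31 45 2
```

Once the printed commands look right, they can be executed by piping the output to `sh`, and the resulting assemblies compared across k values.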


    • #3
      Hi winsettz,

      My FASTQ file has reads 50-300 bp long, and all are single-end reads.

      So I was wondering which command to execute.
      For example:
      velveth auto 31 -fastq -short -inputfile

      or

      velveth auto 31 -fastq -long -inputfile


      • #4
        Originally posted by nareshvasani
        My FASTQ file has reads 50-300 bp long, and all are single-end reads.

        So I was wondering which command to execute.
        For example:
        velveth auto 31 -fastq -short -inputfile

        or

        velveth auto 31 -fastq -long -inputfile

        Again, from the Velvet manual:

        5.6 What’s long and what’s short?
        Velvet was pretty much designed with micro-reads (e.g. Illumina) as short and
        short to long reads (e.g. 454 and capillary) as long. Reference sequences can
        also be thrown in as long.
        That being said, there is no necessary distinction between the types of reads.
        The only constraint is that a short read be shorter than 32kb. The real difference
        is the amount of data Velvet keeps on each read. Short reads are presumably
        too short to resolve many repeats, so only a minimal amount of information is
        kept. On the contrary, long reads are tracked in detail through the graph.
        This means that whatever you call your reads, you should be able to obtain
        the same initial assembly. The differences will appear as you are trying to resolve
        repeats, as long reads can be followed through the graph. On the other hand,
        long reads cost more memory. It is therefore perfectly fine to store Sanger reads
        as “short” if necessary.

        Illumina data is definitely short-read; for something like PacBio you will need to decide this beforehand. 454 and Sanger reads will also likely meet Velvet's definition of a short read.
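
To make the two alternatives concrete, here is a small sketch that prints the velveth command for either read category. `-fastq`, `-short` and `-long` are velveth options; the helper name `velveth_command` and the file name `reads.fastq` are hypothetical stand-ins for the poster's actual input file. Nothing is executed, the commands are only printed.

```shell
#!/bin/sh
# Build a velveth command line for single-end FASTQ reads.
# Arguments: output directory, hash length, read type (short|long), input file.
velveth_command() {
    case "$3" in
        short|long) ;;
        *)
            echo "read type must be 'short' or 'long'" >&2
            return 1
            ;;
    esac
    echo "velveth $1 $2 -fastq -$3 $4"
}

# reads.fastq is a placeholder file name.
velveth_command auto 31 short reads.fastq
velveth_command auto 31 long reads.fastq
```

Per the manual extract, the practical difference between the two is how much per-read information Velvet keeps, and therefore how much memory the run needs.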


        • #5
          winsettz

          Thanks a lot.

          This FASTQ file was generated on an Ion Torrent Proton instrument,
          so I don't know whether to treat the reads as short or long.


          • #6
            If you read the extract from the manual, as posted above, it tells you that for your size of reads, it really doesn't matter whether you call them short or long, you will get the same result.


            • #7
              Mastal

              Thanks a lot!
