  • Runs of Ns in Velvet assembly?

    Hi All,
    I recently assembled a bacterial genome with Velvet, using reads generated on a 454. The assembly contains quite a few runs of Ns. I thought 454 does not generate Ns, so I am wondering what the source of these Ns could be.

    Possibly Velvet inserts them when the 454 quality scores are poor in the overlapping regions of the reads? Any pointers?

    Thanks very much community.
    Gowthaman

  • #2
    Hi all, I found this in the Velvet manual:
    "The N’s in the sequence correspond to gaps between scaffolded contigs. The
    number of N’s corresponds to the estimated length of the gap. For reasons of
    compatibility with the archives, any gap shorter than 10bp is represented by a
    sequence of 10 N’s."

    That answers my question, only to create more!
    How does Velvet estimate a gap in the assembly? Unless it uses a reference genome, how does it calculate the gap between two reads? Or, for that matter, how does it find that two reads/scaffolds are close to each other? Doesn't Velvet assemble the scaffolds de novo?
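
For anyone who wants to see where the N runs sit in an assembly, here is a minimal Python sketch that lists the position and length of every run of Ns per contig. The function names and the example records are made up for illustration; it assumes plain FASTA input such as a Velvet contigs file.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_runs(sequence: str) -> List[Tuple[int, int]]:
    """Return (0-based start, length) for every run of N/n in the sequence."""
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(r"[Nn]+", sequence)]


if __name__ == "__main__":
    # Hypothetical records; with real data, pass an open file handle instead.
    example = [">contig_1", "ACGT" + "N" * 10 + "ACGT", "ACGT",
               ">contig_2", "ACGTACGT"]
    for name, seq in read_fasta(example):
        print(name, n_runs(seq))
```

Going by the manual text quoted above, runs of exactly 10 Ns would be the gaps estimated at 10 bp or shorter, and longer runs would reflect the estimated gap length.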


    • #3
      paired-end reads
      --
      Jeremy Leipzig
      Bioinformatics Programmer


      • #4
        First question: why are you assembling 454 reads with Velvet? It is not the preferred choice unless you also have complementary short-read data. I'd start with an assembly using Roche's Newbler software.


        • #5
          I have just encountered Ns in my Velvet assembly as well. My reads are 36 bp, generated on an Illumina GA; all are single-end, and no reference sequence was used.

          Can anyone answer ragowthaman's question about how Velvet estimates a gap in the assembly, as it applies to my case?


          • #6
            Illumina sequencing can generate Ns. Just grep for them in your FASTQ files, and check that you haven't accidentally used paired-end mode for your assembly.
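
A small Python sketch of that check, in case a plain grep gives confusing results: it looks only at the sequence line of each record, since the quality line can also contain the letter N. The function name is made up, and it assumes standard 4-line FASTQ records with no wrapped sequences.

```python
from typing import Iterable, Tuple


def count_reads_with_n(fastq_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (reads containing N, total reads) for 4-line FASTQ records."""
    total = with_n = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            total += 1
            if "N" in line.upper():
                with_n += 1
    return with_n, total


if __name__ == "__main__":
    # Hypothetical records; with real data, pass an open file handle instead.
    example = [
        "@read1", "ACGTACGT", "+", "IIIIIIII",
        "@read2", "ACGNACGT", "+", "IIIIIIII",
    ]
    print(count_reads_with_n(example))
```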


            • #7
              In answer to your question - it can't estimate gap lengths if you don't have paired-end data.
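
To illustrate why paired-end data is needed, here is a rough Python sketch of the general idea. It is not Velvet's actual algorithm; it assumes the expected insert size is known and that each read pair has one mate on each of two neighbouring contigs. All names and numbers are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

# The Velvet manual states that gaps shorter than 10 bp are written as 10 Ns.
MIN_GAP_NS = 10


def gap_from_pair(insert_size: float, left_contig_len: int,
                  left_read_start: int, right_read_end: int) -> float:
    """Gap implied by one read pair that spans two contigs.

    left_read_start: 0-based start of the first mate on the left contig.
    right_read_end:  bases from the start of the right contig to the far
                     end of the second mate.
    """
    bases_on_left = left_contig_len - left_read_start
    return insert_size - bases_on_left - right_read_end


def estimate_gap(insert_size: float, left_contig_len: int,
                 pairs: List[Tuple[int, int]]) -> float:
    """Average the gap implied by each (left_read_start, right_read_end) pair."""
    return mean([gap_from_pair(insert_size, left_contig_len, start, end)
                 for start, end in pairs])


def gap_as_ns(gap: float) -> str:
    """Render an estimated gap as Ns, using at least MIN_GAP_NS of them."""
    return "N" * max(MIN_GAP_NS, round(gap))


if __name__ == "__main__":
    pairs = [(900, 150), (880, 120), (920, 160)]  # hypothetical mappings
    gap = estimate_gap(300, 1000, pairs)
    print(gap, len(gap_as_ns(gap)))
```

With single-end reads there is no insert size linking two contigs, so there is nothing to plug into a calculation like this.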


              • #8
                Thanks for the quick response. I agree with you, but still can't figure out why I'm getting Ns in my assembled contig.

                Prior to velvet assembly, I removed all short reads which contained one or more Ns. So the Ns are being inserted into my contigs during the assembly process. (just ran grep to double-check - no Ns)

                As for accidentally assembling them as paired-end reads: I used the -short option, which according to the manual is also the default.

                Any other thoughts?

                Thanks!


                • #9
                  Hi Friends,

                  How can I compute the hash_length? I don't understand the formula kmC = C*(L-K+1)/L, and I don't know these values.

                  This is my problem: I have two paired files in FASTQ format containing short-read DNA data from HTS (high-throughput sequencing). I need to assemble these files into a single sequence, and I would like to compute the appropriate hash_length.

                  Thanks


                  • #10
                    Originally posted by wgarzon
                    Hi Friends,

                    How can I compute the hash_length? I don't understand the formula kmC = C*(L-K+1)/L, and I don't know these values.

                    This is my problem: I have two paired files in FASTQ format containing short-read DNA data from HTS (high-throughput sequencing). I need to assemble these files into a single sequence, and I would like to compute the appropriate hash_length.

                    Thanks
                    Hi,
                    You can combine different paired-end read sets in velveth; that is not an issue. As for which k-mer length is best, you have to determine it empirically: try different k-mer lengths and check which one gives the best output (for example, the highest N50).
                    You can probably find this information elsewhere, but:
                    kmC = C*(L-K+1)/L
                    kmC: k-mer coverage
                    C: expected nucleotide coverage (total bases sequenced divided by the genome size in bp)
                    L: read length
                    K: k-mer length (hash_length) used

                    One option is to ask velveth to build several k-mer lengths in the same run:
                    $ velveth output_dir 27,45,2 -fastq -shortPaired reads.fastq
                    Here output_dir and reads.fastq are placeholder names, and k-mer lengths are taken from 27 to 45 in steps of 2. Note that values above 31 only work if Velvet was compiled with a larger MAXKMERLENGTH. I think this also needs a machine with a lot of memory.
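
As a worked example of the formula, here is a minimal Python sketch that computes the k-mer coverage for a few candidate hash lengths. The function names and all the numbers are made up for illustration; it assumes C is obtained as total sequenced bases divided by genome size.

```python
def nucleotide_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """C: total sequenced bases divided by the genome size in bp."""
    return num_reads * read_length / genome_size


def kmer_coverage(coverage: float, read_length: int, k: int) -> float:
    """kmC = C * (L - K + 1) / L."""
    if not 1 <= k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    # Hypothetical data set: 5 million 36 bp reads, 3.6 Mb genome.
    c = nucleotide_coverage(5_000_000, 36, 3_600_000)
    for k in range(21, 33, 2):  # odd hash lengths 21, 23, ..., 31
        print(f"K={k}  kmC={kmer_coverage(c, 36, k):.1f}")
```

The k-mer coverage drops quickly as K approaches the read length, which is one reason trying several K values and comparing the resulting assemblies is the usual advice.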
