  • Runs of Ns in Velvet assembly?

    Hi All,
    I recently assembled a bacterial genome with Velvet. The reads were generated on a 454. The assembly contains quite a few runs of Ns. I thought 454 sequencing does not generate Ns, so I am wondering what the source of these Ns could be.

    Possibly Velvet inserts them when the 454 quality scores are poor in the overlapping regions of the reads? Any pointers?

    Thanks very much, community.
    Gowthaman

  • #2
    Hi all, I found this in the Velvet manual:
    "The N’s in the sequence correspond to gaps between scaffolded contigs. The
    number of N’s corresponds to the estimated length of the gap. For reasons of
    compatibility with the archives, any gap shorter than 10bp is represented by a
    sequence of 10 N’s."

    That answers my question, only to raise more!
    How does Velvet estimate a gap in the assembly? Unless it uses a reference genome, how does it calculate the gap between two reads, or for that matter, how does it know that two reads/scaffolds are close to each other? Doesn't Velvet build its scaffolds de novo?
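To see how many gaps an assembly contains and how long each one is, the runs of Ns can be listed directly from the contig FASTA. This is only a minimal sketch: the contig names and sequences below are made up for illustration, and the parser assumes plain FASTA text.

```python
import re


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_runs(sequence):
    """Return (start, length) for every run of N/n, with 0-based starts."""
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(r"[Nn]+", sequence)]


# Hypothetical two-contig example; real input would be the lines of the
# assembler's contig FASTA file.
example = [
    ">contig_1",
    "ACGT" + "N" * 10 + "ACGT",
    ">contig_2",
    "ACGTACGT",
]
for name, seq in read_fasta(example):
    for start, length in n_runs(seq):
        print(name, start, length)
```

Going by the manual text quoted above, a run of exactly 10 Ns may stand for any gap shorter than 10 bp, so the run length is only an upper bound in that case.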

    • #3
      paired-end reads
      --
      Jeremy Leipzig
      Bioinformatics Programmer

      • #4
        First question: why are you assembling 454 reads with Velvet? It is not the preferred choice unless you also have complementary short-read data. I'd start with an assembly using Roche's Newbler software.

        • #5
          I have just encountered Ns in my Velvet assembly as well. My reads are 36 bp, generated on an Illumina GA. All are single-end reads, and no reference sequence was used.

          Can anyone answer ragowthaman's question about how Velvet estimates a gap in the assembly, as it applies to my case?

          • #6
            Illumina sequencing can generate Ns. Just grep for them in your FASTQ files. Also check that you haven't accidentally used paired-end mode for your assembly.
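One caveat with a plain grep is that the letter N is also a valid quality character, and it can appear in read names, so a match does not always mean an N base. A small sketch that checks only the sequence line of each record, assuming strict four-line FASTQ records (no wrapped sequences); the example reads are invented:

```python
def count_reads_with_n(fastq_lines):
    """Return (total_reads, reads_with_n), looking only at sequence lines.

    Assumes each record is exactly four lines: header, sequence, '+', quality.
    """
    total = with_n = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each record is the sequence
            total += 1
            if "N" in line.upper():
                with_n += 1
    return total, with_n


# Hypothetical records: read1 has an N base, read2 only has 'N' in its qualities.
example = [
    "@read1", "ACGTN", "+", "IIIII",
    "@read2", "ACGTA", "+", "NNNNN",
]
total, with_n = count_reads_with_n(example)
print(f"{with_n} of {total} reads contain an N base")
```

For a real file, the same function can be called on an open file handle, since iterating a handle yields lines.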

            • #7
              In answer to your question - it can't estimate gap lengths if you don't have paired-end data.
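To illustrate why pairing information is needed, here is a simplified sketch of the general idea behind gap estimation. It is not Velvet's actual algorithm. It assumes a known mean insert size (outer distance between the two mates) and pairs whose first mate maps near the end of one contig and whose second mate maps near the start of the next; all names and numbers are hypothetical.

```python
from statistics import mean


def estimate_gap(insert_size, len_a, pairs):
    """Estimate the gap between contig A and contig B from spanning pairs.

    pairs: list of (start_a, end_b), where start_a is the 0-based leftmost
    position of mate 1 on contig A and end_b is the exclusive end position
    of mate 2 on contig B. Each pair covers (len_a - start_a) bases of A and
    end_b bases of B; whatever is left of the insert is attributed to the gap.
    """
    estimates = [insert_size - (len_a - start_a) - end_b
                 for start_a, end_b in pairs]
    return mean(estimates)


# Made-up numbers: 300 bp inserts, contig A is 1000 bp long.
gap = estimate_gap(300, 1000, [(900, 150), (850, 100)])

# Per the manual text quoted earlier in the thread, gaps shorter than 10 bp
# are written as 10 Ns.
n_count = max(10, round(gap))
print(gap, n_count)
```

With single-end reads there is no insert size to subtract from, which is why no such estimate is possible.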

              • #8
                Thanks for the quick response. I agree with you, but I still can't figure out why I'm getting Ns in my assembled contigs.

                Prior to the Velvet assembly, I removed all short reads that contained one or more Ns (I just ran grep to double-check: no Ns). So the Ns are being inserted into my contigs during the assembly process.

                As for accidentally assembling them as paired-end reads: I used the -short option, which, according to the manual, is also the default.

                Any other thoughts?

                Thanks!

                • #9
                  Hi Friends,

                  How can I compute the hash_length? I don't understand the formula kmC = C*(L-K+1)/L, and I don't know these values.

                  This is my problem: I have two paired files in FASTQ format containing short-read DNA data from high-throughput sequencing (HTS). I need to assemble these files into a single sequence, and I would like to compute an appropriate hash_length.

                  Thanks

                  • #10
                    Originally posted by wgarzon
                    Hi Friends,

                    How can I compute the hash_length? I don't understand the formula kmC = C*(L-K+1)/L, and I don't know these values.

                    This is my problem: I have two paired files in FASTQ format containing short-read DNA data from high-throughput sequencing (HTS). I need to assemble these files into a single sequence, and I would like to compute an appropriate hash_length.

                    Thanks
                    Hi,
                    You can combine different paired-end read sets in velveth; that is not an issue. As for which k-mer is best, you have to determine it empirically: try different k-mer lengths and check which one gives the best output (by N50, for example).
                    You can probably find this info elsewhere, but:
                    kmC = C*(L-K+1)/L
                    kmC: k-mer coverage
                    C: expected (nucleotide) coverage
                    L: length of reads
                    K: k-mer value used (the hash_length)
                    The genome size (in bp) is not a term in this formula; it is needed only to estimate C (total sequenced bases divided by genome size).
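As a worked example of the formula above, here is a small sketch that evaluates kmC for a few k-mer values. The read count, genome size, and coverage figures are hypothetical, and C is estimated as total sequenced bases divided by genome size.

```python
def nucleotide_coverage(n_reads, read_len, genome_size):
    """Expected coverage C: total sequenced bases divided by genome size."""
    return n_reads * read_len / genome_size


def kmer_coverage(c, read_len, k):
    """k-mer coverage kmC = C * (L - K + 1) / L."""
    if not 0 < k <= read_len:
        raise ValueError("k must be between 1 and the read length")
    return c * (read_len - k + 1) / read_len


# Hypothetical run: 5 million 36 bp reads on a 4 Mbp genome.
c = nucleotide_coverage(5_000_000, 36, 4_000_000)
for k in (21, 25, 29, 31):
    print(k, round(kmer_coverage(c, 36, k), 1))
```

The loop shows how quickly k-mer coverage drops as K approaches the read length, which is the trade-off behind choosing a hash_length.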

                    One option is to ask velveth to build several k-mer sizes in the same run:
                    $ velveth file_name 27,45,2 -fastq
                    which takes k-mer values from 27 to 45 in steps of 2; but I think this needs a lot of memory, probably a cluster.
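Once several k-mer values have been assembled, comparing them by N50 only needs the contig lengths from each run. A minimal sketch, using one common definition of N50 (the length of the contig at which the running total, taken from longest to shortest, reaches half of the assembly size); the contig lengths per k-mer below are invented for illustration:

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


# Hypothetical contig lengths for three assemblies of the same reads.
assemblies = {
    27: [1200, 900, 800, 400, 300],
    29: [2500, 700, 400],
    31: [1500, 1400, 500, 200],
}
best_k = max(assemblies, key=lambda k: n50(assemblies[k]))
for k, lengths in assemblies.items():
    print(k, n50(lengths))
print("best k by N50:", best_k)
```

N50 alone can be misleading (a misassembled long contig raises it), so it is worth looking at total assembly size and contig count as well.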
