  • #16
    The main peak is still below 10x, which will make assembly very fragmented. But this data looks a lot better than the normalized data. I don't know which normalizer you used, or what settings, but the output came out very strange - I would expect the raw reads to give a better assembly than the normalized ones.

    • #17
      Thank you, Brian. I appreciate all your advice.

      One last question: I'm still not clear on how to interpret the kmer frequency histogram. Could you explain a bit more? Thank you.

      • #18
        The kmer frequency histogram tells you the depth of coverage of the target genome. The X axis is kmer depth (how many times a kmer occurs in the reads), and the Y axis is typically the number of distinct kmers at that depth. This is more complicated in a metagenome, but if you have a single bacterial genome with no repeats and very even coverage, you would get a single peak in your graph. With repeat content, there will be additional peaks at higher depth. You get the best assembly when the primary peak is narrow and above some minimum depth. Velvet does well when the primary peak is around 30x to 40x coverage on the X axis.

        If most of your kmers occur fewer than 10 times or so, you can't assemble them very well.
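
To make the histogram description concrete, here is a minimal Python sketch of how such a histogram could be built from reads and how the primary peak could be located. The function names `kmer_histogram` and `primary_peak` and the `min_depth` cutoff are assumptions for illustration. Real kmer counters also merge each kmer with its reverse complement and are far more memory-efficient; this sketch does neither.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Return {depth: number of distinct kmers seen exactly `depth` times}."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip kmers containing ambiguous bases
                counts[kmer] += 1
    # Counting the counts gives the histogram: X = depth, Y = distinct kmers.
    return Counter(counts.values())


def primary_peak(histogram, min_depth=2):
    """Depth with the most distinct kmers, ignoring depths below min_depth.

    min_depth is a hypothetical cutoff that skips the low-depth spike
    usually caused by sequencing errors. Ties go to the lower depth.
    """
    candidates = {d: n for d, n in histogram.items() if d >= min_depth}
    if not candidates:
        return None
    return max(candidates, key=lambda d: (candidates[d], -d))


# Toy input: one short read seen three times, plus one unrelated read.
reads = ["ACGTAC"] * 3 + ["TTTTT"]
hist = kmer_histogram(reads, k=4)
print(sorted(hist.items()))  # [(2, 1), (3, 3)]
print(primary_peak(hist))    # 3
```

On real data, a primary peak below roughly 10 on the X axis would correspond to the "too shallow to assemble well" case described above.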

        • #19
          First off, I discovered that I'd counted my initial reads wrong, so my .sam file has the right number of reads after all.

          Instead of bowtie, I tried mapping with bbmap. On my first attempt, I reduced the percentage of unmapped reads from 50% to 29%. I'm planning on running bbmap again with a more sensitive set of input parameters.

          Annoyingly, this recent sam file produced a bam file which samtools' index command seems unable to index. It keeps stopping and outputting "Killed".

          • #20
            Originally posted by AndrewRGross:
            Instead of bowtie, I tried mapping with bbmap. On my first attempt, I reduced the percentage of unmapped reads from 50% to 29%. I'm planning on running bbmap again with a more sensitive set of input parameters.

            Annoyingly, this recent sam file produced a bam file which samtools' index command seems unable to index. It keeps stopping and outputting "Killed".
            The bam format does not really allow files with more than 2GB of header data. I don't know if that's the problem in your case, but if you do have millions of contigs, it's a possibility. You can often get around it by filtering out the contigs shorter than a certain length, which are often kind of useless anyway. I think the latest version of samtools is fixed to allow larger headers, but there's no assurance that downstream tools will be able to process the bam.
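
As a rough illustration of the length filter suggested above, the following Python sketch drops FASTA records shorter than a cutoff. The function name `filter_fasta_by_length`, the contig names, and the cutoff value are all hypothetical. Note that the reads would need to be remapped against the filtered reference for the header to actually shrink, since the header lists every reference sequence.

```python
def filter_fasta_by_length(lines, min_length):
    """Yield (header, sequence) for FASTA records at least min_length long."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                seq = "".join(chunks)
                if len(seq) >= min_length:
                    yield header, seq
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    # Emit the final record, which has no following header to close it.
    if header is not None:
        seq = "".join(chunks)
        if len(seq) >= min_length:
            yield header, seq


# Toy input with made-up contig names; a file object works the same way.
fasta = [">contig_1", "ACGTACGT", "ACGT", ">contig_2", "ACG", ">contig_3", "ACGTACGTAC"]
kept = list(filter_fasta_by_length(fasta, min_length=10))
print([h for h, _ in kept])  # ['>contig_1', '>contig_3']
```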

            If you want to run bbmap with greater sensitivity, you can use the "slow" flag, and reduce the "minratio". The default is "minratio=0.56", which allows mappings with scores down to 56% of the max. Also, if you want to map really low quality reads, you can set "qtrim=rl trimq=10 untrim". This will quality-trim reads to q10 before mapping, then undo the trimming after mapping (trimmed bases will be soft-clipped). The "local" flag will also slightly improve mapping rates by doing local rather than global alignments.
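
The flags above could be assembled into a command line along these lines. This is only a sketch: the helper name `bbmap_command`, the file names, and the lowered `minratio` value of 0.4 are assumptions for illustration, and the helper only builds the argument list without running anything.

```python
import shlex


def bbmap_command(ref, reads, out, minratio=0.56, slow=False, local=False, trim_q=None):
    """Build a bbmap.sh argument list using the sensitivity flags named above."""
    cmd = ["bbmap.sh", f"ref={ref}", f"in={reads}", f"out={out}", f"minratio={minratio}"]
    if slow:
        cmd.append("slow")   # more sensitive, slower mapping
    if local:
        cmd.append("local")  # local rather than global alignments
    if trim_q is not None:
        # Quality-trim both ends before mapping, then undo the trimming afterwards.
        cmd += ["qtrim=rl", f"trimq={trim_q}", "untrim"]
    return cmd


# Hypothetical file names and a hypothetical lower minratio.
cmd = bbmap_command("contigs.fa", "reads.fq", "mapped.sam",
                    minratio=0.4, slow=True, local=True, trim_q=10)
print(shlex.join(cmd))
# bbmap.sh ref=contigs.fa in=reads.fq out=mapped.sam minratio=0.4 slow local qtrim=rl trimq=10 untrim
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.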

            • #21
              A message of "killed" usually means that the OS killed it (often due to using all the memory). You might go through the system logs and see if those tell you what went wrong.
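
One way to go through the logs is to search them for out-of-memory messages. The Python sketch below assumes a Linux system whose kernel log (for example, the output of `dmesg`) mentions "Out of memory", "oom-killer", or "Killed process" when the OOM killer acts. The helper name `find_oom_lines` and the sample lines are made up for illustration and are not real log output; the exact wording varies between kernel versions.

```python
import re

# Assumed phrases that Linux kernels commonly log when the OOM killer acts.
OOM_PATTERN = re.compile(r"out of memory|oom-killer|killed process", re.IGNORECASE)


def find_oom_lines(log_lines, program=None):
    """Return log lines that look like OOM-killer events.

    If `program` is given, keep only the lines that also mention it.
    """
    hits = []
    for line in log_lines:
        if OOM_PATTERN.search(line):
            if program is None or program.lower() in line.lower():
                hits.append(line.rstrip("\n"))
    return hits


# Illustrative lines only, written in the general style of kernel messages.
sample = [
    "kernel: Out of memory: Killed process 4242 (samtools)",
    "sshd: session opened for user andrew",
    "kernel: java invoked oom-killer",
]
print(find_oom_lines(sample, program="samtools"))
# ['kernel: Out of memory: Killed process 4242 (samtools)']
```

If a line like the first one turns up for samtools, the process ran out of memory rather than hitting a bug in the BAM itself.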
