Newbler runMapping via command line

    Hello everyone, I am new to this forum and this is my first post, so I hope someone can help me.

    I have some 454 transcriptome data that I am trying to analyse with Newbler by mapping it to the human GRCh37.61 cDNA FASTA reference. At the moment I have to run Newbler from the command line, as I do not have enough RAM to launch its Java GUI. However, it seems to run quite well this way, so I am not too bothered about the GUI.

    However, I am exploring the command-line options to try to increase the number of reads that map fully or partially. I am still getting a large number of reads classified as repeats, and I wondered if anyone had any tips on how to improve the quality of my mapping. (The default settings gave me 11% fully mapped, 6% partially mapped, 32% unmapped, 41% repeat, 6.5% chimeric and 3.5% too short.)
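A breakdown like the one above can be recomputed from the per-read status table that Newbler reports (`454ReadStatus.txt`), which makes it easier to compare runs. This is a minimal sketch that assumes a tab-separated table with a header row, the read accession in column 1 and the status word (e.g. Full, Partial, Repeat) in column 2; that layout, the helper name `status_percentages` and the example rows are assumptions, not taken from the Newbler manual or real data.

```python
from collections import Counter


def status_percentages(lines):
    """Tally per-read mapping statuses and return {status: percent}.

    Assumes (hypothetically) a tab-separated table with one header row,
    the read accession in column 1 and the status word in column 2.
    """
    counts = Counter()
    rows = iter(lines)
    next(rows, None)  # skip the assumed header row
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            counts[fields[1]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: 100.0 * n / total for status, n in counts.items()}


# Hypothetical example rows, not real data.
example = [
    "Accno\tRead Status",
    "read1\tFull",
    "read2\tRepeat",
    "read3\tRepeat",
    "read4\tUnmapped",
]
for status, pct in sorted(status_percentages(example).items()):
    print(f"{status}: {pct:.1f}%")
```

With a real run, an open file handle can be passed directly, e.g. `status_percentages(open(path_to_status_file))`.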

    I have tried decreasing the seed length from 16 to 10; this greatly decreased the number of unmapped reads but increased the proportion of repeats (to almost 50%). I have also lowered the repeat score threshold from its default of 12 to 0, which improved things a bit more and greatly increased the number of contigs generated. I am now playing with the minimum overlap length but am getting more chimeric reads.

    I am really just arbitrarily changing these numbers and could sit here from now until Christmas doing this, so I wondered if anyone had any advice or tips they could give me.
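Rather than changing numbers by hand, one option is to generate a small, systematic grid of runs and compare their status breakdowns side by side. A minimal sketch follows: `-ml` and `-mi` are the options discussed in this thread, but the `-o <dir>` output option, the `runMapping [options] reference reads` argument order and the file names are assumptions to check against the Newbler manual. The script only prints commands; it does not run them.

```python
import itertools
import shlex

# Values to sweep; -ml and -mi are the options discussed in this thread.
MIN_OVERLAP_LENGTHS = ["40", "90%", "95%"]
MIN_OVERLAP_IDENTITIES = ["90", "95"]


def build_commands(reference, reads):
    """Return one runMapping argument list per (-ml, -mi) combination.

    ASSUMPTION: '-o <dir>' sets the output directory and the reference
    and reads follow the options; verify against the Newbler manual.
    """
    commands = []
    for ml, mi in itertools.product(MIN_OVERLAP_LENGTHS, MIN_OVERLAP_IDENTITIES):
        # One output directory per combination, e.g. map_ml90pct_mi95
        outdir = f"map_ml{ml.replace('%', 'pct')}_mi{mi}"
        commands.append(
            ["runMapping", "-o", outdir, "-ml", ml, "-mi", mi, reference, reads]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical file names; substitute your own reference and reads.
    for cmd in build_commands("GRCh37.61.cdna.fa", "sample1.sff"):
        print(shlex.join(cmd))
```

Keeping one output directory per combination means each run's read-status table can be tallied and compared afterwards instead of being overwritten.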

    Before you ask why I am not using the assembler: I just don't think I have enough reads to get a good assembly, as my dataset contains around 50,000 reads per sample. What do you think?

    Any advice would be very much appreciated. Thank you in advance.

    Helen

  #2
    Reads marked 'Repeat' map equally well to multiple locations in the reference, and the settings you are trying are not going to change that.
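The definition of a 'Repeat' read can be made concrete with a toy sketch. This is not Newbler's scoring or algorithm: it simply assumes each candidate reference location already has an alignment score, and `classify_read` is a hypothetical name. A read counts as unique only when one location holds the single best score; a tie at the top makes it a repeat, whatever seed settings produced the candidate hits.

```python
def classify_read(scores_by_location):
    """Toy classifier illustrating the 'Repeat' definition.

    scores_by_location maps a reference location name to an alignment
    score. Returns 'Unique' if exactly one location has the best score,
    'Repeat' if the best score is shared, and 'Unmapped' if there are
    no candidate locations. Not Newbler's actual algorithm.
    """
    if not scores_by_location:
        return "Unmapped"
    best = max(scores_by_location.values())
    n_best = sum(1 for s in scores_by_location.values() if s == best)
    return "Unique" if n_best == 1 else "Repeat"


# Hypothetical scores for one read hitting two paralogues.
print(classify_read({"geneA": 98, "geneA_paralog": 98}))  # Repeat
print(classify_read({"geneA": 98, "geneA_paralog": 91}))  # Unique
```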

    The only thing I can think of is to make the alignment requirements more stringent, so that some of these reads start mapping uniquely (i.e. reads from different paralogues each mapping to just one of the copies). You can do this by:

    - increasing the minimum overlap length, -ml. The default is 40 bases, but you can go higher or, even better, use '-ml 90%' to force at least 90% of the length of each read to map (or try 95%).
    - increasing the minimum overlap identity, -mi. The default is 90, but you could try '-mi 95' (no % sign here).
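The difference between an absolute and a percentage `-ml` is easiest to see with numbers: for a 400-base read (an example length), 40 bases is only 10% of the read, whereas '-ml 90%' demands 360 aligned bases. The sketch below mimics the two threshold forms described above on a single alignment; `passes_thresholds` is a hypothetical helper for reasoning about the settings, not part of Newbler, and Newbler's exact rounding and overlap definitions are not modelled.

```python
def passes_thresholds(read_length, aligned_length, identity_pct, ml="40", mi=90.0):
    """Check one alignment against -ml / -mi style thresholds.

    ml is either an absolute base count ("40") or a percentage of the
    read length ("90%"); mi is a minimum percent identity.
    Hypothetical helper, not Newbler code.
    """
    if ml.endswith("%"):
        min_len = read_length * float(ml[:-1]) / 100.0
    else:
        min_len = float(ml)
    return aligned_length >= min_len and identity_pct >= mi


# A 400-base read with 300 bases aligned at 96% identity:
print(passes_thresholds(400, 300, 96.0))                   # True  (300 >= 40)
print(passes_thresholds(400, 300, 96.0, ml="90%"))         # False (300 < 360)
print(passes_thresholds(400, 380, 96.0, ml="90%", mi=95))  # True
```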

    On the other hand, you might get fewer reads mapped this way...

    Good luck anyway!



    #3
      Thank you for replying so quickly. I have been exploring many options with Newbler mapping.

      Unfortunately, the options you suggested did not increase the number of reads mapped. However, I think I may have worked out the problem. I am using a cDNA FASTA reference because I have transcriptome reads. I had a look at some of the 'unmapped' reads, and a quick BLAST of a couple shows that they are ribosomal RNAs (which therefore will not be in my cDNA FASTA file).
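To BLAST more than a couple of reads, the 'unmapped' ones can be pulled out in bulk into their own FASTA file. A minimal sketch, assuming the reads are available as FASTA and the unmapped accessions have already been collected into a set (for example from the status column of the read-status table); `parse_fasta` and `filter_fasta` are hypothetical helper names, and the record ID is taken to be the first word of the header.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line.strip())  # join wrapped sequence lines
    if header is not None:
        yield header, "".join(seq)


def filter_fasta(lines, wanted_ids):
    """Return FASTA text with only the records whose ID is in wanted_ids."""
    out = []
    for header, seq in parse_fasta(lines):
        record_id = (header.split() or [""])[0]
        if record_id in wanted_ids:
            out.append(f">{header}\n{seq}\n")
    return "".join(out)


# Hypothetical reads; in practice pass an open FASTA file handle.
reads = [">read1 len=5", "ACGTA", ">read2", "GGGCC", "CC"]
print(filter_fasta(reads, {"read2"}), end="")
```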

      Has anyone else noticed this in the past? Do you know of a FASTA file containing rRNA sequences that I could concatenate with my cDNA reference, so that I might be able to annotate my 'unmapped' reads?

      Thank you
      Helen
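Concatenating an rRNA FASTA with the cDNA reference is mostly a matter of appending one file to the other; the main thing worth checking is that no sequence ID appears in both sources, since duplicate names could make the combined reference ambiguous. A minimal sketch with a hypothetical helper name `merge_fasta`; no particular rRNA database is implied, and the ID is assumed to be the first word of each header.

```python
def merge_fasta(sources):
    """Merge FASTA sources into one reference text, refusing duplicate IDs.

    sources is a list of (label, lines) pairs, e.g. ("cDNA", open(path)).
    The ID is the first whitespace-separated word after '>'.
    Blank lines are dropped; everything else is copied unchanged.
    """
    seen = set()
    merged = []
    for label, lines in sources:
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue  # drop blank lines
            if line.startswith(">"):
                seq_id = (line[1:].split() or [""])[0]
                if seq_id in seen:
                    raise ValueError(f"duplicate sequence ID {seq_id!r} in {label}")
                seen.add(seq_id)
            merged.append(line)
    return "\n".join(merged) + "\n" if merged else ""


# Hypothetical in-memory sources standing in for the two FASTA files.
cdna = [">ENST_example desc\n", "ACGT\n"]
rrna = [">rRNA_example\n", "GG\n", "\n"]
print(merge_fasta([("cDNA", cdna), ("rRNA", rrna)]), end="")
```

With real files, pass open file handles for the cDNA and rRNA FASTA files and write the returned text to a new reference file before mapping against it.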
