  • Newbler and read coverage

    Hello,

    I am running Newbler under several parameters and have some questions.

    1) Is there any parameter that specifies the coverage of the sample used by Newbler in the assembly process?

    2) If there are areas with high coverage (e.g., repetitive regions), are regions with low coverage simply excluded from the assembly process because they are considered sequencing errors (due to their low coverage)?


    I'm trying to assemble a genome that has major discrepancies in coverage and my results are not very good.


    Thanks in advance,
    André

  • #2
    You can tell Newbler your expected genome coverage with the '-e' parameter.

    There is a long list of options/flags/parameters for a newbler assembly, some of which have been treated in the previous post. In this post I will describe some more parameters. At the end, as a bo…
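For reference, the expected depth that '-e' asks for is usually estimated as total sequenced bases divided by the approximate genome size. A minimal Python sketch of that arithmetic, assuming you know your read lengths and have a rough genome-size estimate; the function name is hypothetical and nothing here is part of Newbler itself:

```python
def estimate_coverage(read_lengths, genome_size):
    """Expected mean depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


# 100,000 reads of 400 bp over an assumed 5 Mb genome: 40 Mb / 5 Mb = 8.0x
print(estimate_coverage([400] * 100_000, 5_000_000))  # 8.0
```

This gives only a genome-wide average; it says nothing about how unevenly that depth is distributed, which is the actual problem in this thread.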


    In my experience, Newbler will not discard low-coverage regions just because there are higher-coverage regions elsewhere; I regularly have contigs with 1x coverage, for example.

    Are you sure that you actually have reads for these low coverage regions in your dataset?

    Another option you might like to play with is '-urt'.
    Recently, newbler version 2.5.3 became available. With this post, I’ll describe the changes between this version, and the previous (2.3). As I have not yet described the gsMapper function of …



    • #3
      If there are regions with extreme coverage (hundreds of thousands of times), you might want to filter out some of the reads that go into these regions and assemble again with a more evenly covered read dataset.
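One way to do that filtering, swapped in here rather than taken from this thread, is digital normalization (the approach popularized by the khmer package): a read is kept only if the median count of its k-mers among the reads already kept is below a cutoff, so reads from extremely deep regions are mostly discarded while low-coverage reads survive. A minimal in-memory sketch, assuming plain sequence strings; the function names, k and cutoff values are illustrative, and reverse complements are not merged:

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of seq (forward strand only)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_reads(reads, k=20, cutoff=20):
    """Digital normalization: keep a read only if the median abundance of its
    k-mers, counted over the reads kept so far, is below cutoff."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # reads shorter than k are dropped in this sketch
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


# A 100x over-represented read plus one unique read.
reads = ["ACGTACGTAC"] * 100 + ["TTTTGGGGCC"]
kept = normalize_reads(reads, k=5, cutoff=3)
print(len(kept))  # 3: two copies of the repeated read and the unique one
```

Real tools use probabilistic k-mer counting to keep memory bounded; this dictionary-based version is only practical for small datasets.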



      • #4
        Thanks for all the answers. The -urt parameter really works for my problem. I will also try your suggestion, flxlex.

        André



        • #5
          Hello!

          I have the output of a Newbler run (I did not launch it myself) and I want to know the depth of coverage at every position, or, even better, where the reads were placed in the scaffolds (or contigs).

          From what I've seen, I could get this information from the 454AlignmentInfo.tsv file, but I don't have it since the -info option was not used (I'm trying to avoid running Newbler again).

          Also, I could get the information on where the reads were placed (contig, start and end) from the 454ReadStatus.txt file, but only for the reads that were completely assembled into a single contig (is this correct?). For the rest of the reads there is information on where the 5' and 3' ends mapped, but I can't tell whether these reads also map to some other contig, or how much of the read's length was mapped (when PartiallyAssembled).

          Is the coverage information somewhere else?
          Thank you for your help, any inputs are appreciated.

          Nuria



          • #6
            Hi again!

            After digging into 454ReadStatus.txt, I am starting to realize that I cannot get information about where every read was placed.

            In my case, 94.27% of the reads have their 5' start and 3' end positions assigned to the same contig. I thought I could use these start and end points as coordinates, but apparently I was wrong: only 5% of these reads have both ends on the same strand (5' strand + and 3' strand +, or 5' strand - and 3' strand -). (All of these reads were Assembled, except one that was PartiallyAssembled.)
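The tally described above can be sketched as follows. This assumes 454ReadStatus.txt is tab-separated with one header line and the columns Accno, Read Status, 5' Contig, 5' Position, 5' Strand, 3' Contig, 3' Position, 3' Strand; check this against your own file. Reads without a placement (for example singletons) are assumed to have empty or missing fields, which the code skips so that an empty 5' contig never "matches" an empty 3' contig. The function names and sample rows are illustrative:

```python
import csv
import io
from collections import Counter

# ASSUMED column order of 454ReadStatus.txt (tab-separated, one header line).
FIELDS = ["accno", "status", "contig5", "pos5", "strand5",
          "contig3", "pos3", "strand3"]


def read_status_records(handle):
    """Yield one dict per read, padding short rows with empty strings."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header line
    for row in reader:
        row = row + [""] * (len(FIELDS) - len(row))
        yield dict(zip(FIELDS, row))


def strand_summary(records):
    """Count reads whose two ends hit the same contig, by strand pattern."""
    tally = Counter()
    for r in records:
        if not r["contig5"] or r["contig5"] != r["contig3"]:
            continue  # unplaced read, or ends in different contigs
        if r["strand5"] not in ("+", "-") or r["strand3"] not in ("+", "-"):
            tally["no_strand_info"] += 1
            continue
        tally["opposite" if r["strand5"] != r["strand3"] else "same"] += 1
    return tally


sample = (
    "Accno\tRead Status\t5' Contig\t5' Position\t5' Strand\t"
    "3' Contig\t3' Position\t3' Strand\n"
    "r1\tAssembled\tcontig00001\t100\t+\tcontig00001\t520\t-\n"
    "r2\tAssembled\tcontig00001\t900\t-\tcontig00001\t480\t+\n"
    "r3\tSingleton\n"
    "r4\tAssembled\tcontig00002\t10\t+\tcontig00002\t400\t+\n"
)
print(strand_summary(read_status_records(io.StringIO(sample))))
```

Reads that are placed in a contig but lack strand characters land in no_strand_info instead of being counted under either pattern.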

            So I'm giving up on this, but I still don't understand why, according to 454ReadStatus.txt, most reads are not completely colinear with a contig fragment, which is what I would expect. Does anyone know why?

            I assume coverage information for each base is only available from 454AlignmentInfo.tsv. Am I correct? Is there another way?

            I've read other threads about 454ReadStatus, and Newbler output files in general, but in my opinion they were not discussing exactly my issue.

            Thank you for your time



            • #7
              Nuria,

              Maybe this website can help you.

              The single file I’ll discuss today has in fact almost the entire assembly in it, besides the actual sequences (although even some of these are also included, see below). As explained in my first po…


              Everything we need to know about newbler is there.
              Best regards,
              André



              • #8
                Thank you André!

                Yes, I've been reading flxlex's blog too. That is where I found out about 454AlignmentInfo.tsv, which I don't have.

                But I could not find an alternative, nor an explanation for my opposite-strand problem.

                Maybe there is no alternative to 454AlignmentInfo.tsv for getting the coverage, but I had to try, and to ask here. :|



                • #9
                  You would expect the read orientations in the 454ReadStatus.txt file for reads that start and end in the same contig to be either '+' and '-', or '-' and '+'. That was put in 'by definition'. Are you sure the 5% with the same strand map to the same contig?
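Following that convention, a rough per-base depth can be approximated from 454ReadStatus.txt alone by treating each read whose ends fall in the same contig with opposite strands as covering the span between its 5' and 3' positions. This is only a proxy (it assumes the positions are contig coordinates of the read ends, and it ignores PartiallyAssembled reads and alignment gaps), not the depth Newbler would report in 454AlignmentInfo.tsv. A minimal sketch with hypothetical function names, using a difference array so each read costs O(1) before a single pass over the contig:

```python
def read_span(pos5, strand5, pos3, strand3):
    """Span (start, end), 1-based inclusive, of a read whose ends lie in one contig.

    Returns None unless the two strands are opposite ('+'/'-' or '-'/'+').
    """
    if {strand5, strand3} != {"+", "-"}:
        return None
    return min(pos5, pos3), max(pos5, pos3)


def approximate_depth(spans, contig_length):
    """Per-base depth list (index 0 is position 1) via a difference array."""
    diff = [0] * (contig_length + 1)
    for start, end in spans:
        start, end = max(start, 1), min(end, contig_length)
        if start > end:
            continue  # span lies entirely outside the contig
        diff[start - 1] += 1  # depth rises at start
        diff[end] -= 1        # and falls just after end
    depth, running = [], 0
    for i in range(contig_length):
        running += diff[i]
        depth.append(running)
    return depth


# (5' position, 5' strand, 3' position, 3' strand) for reads on one 10 bp contig
placements = [(2, "+", 5, "-"), (8, "-", 4, "+"), (3, "+", 9, "+")]
spans = [s for s in (read_span(*p) for p in placements) if s is not None]
print(approximate_depth(spans, 10))  # [0, 1, 1, 2, 2, 1, 1, 1, 0, 0]
```

The third placement has both ends on '+', so it is excluded rather than guessed at.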

                  And I think you are in fact stuck without the 454AlignmentInfo.tsv file, unless you have the 454Contigs.ace file, which has all the read positions.
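If 454Contigs.ace is available, read positions can be pulled from its AF and RD/QA records. A minimal sketch, assuming the standard ACE layout (CO starts a contig, AF gives a read's padded start on the consensus, RD gives its padded length, and the third and fourth numbers on the QA line give the aligned clipping range within the read); the function name and toy input are illustrative. Coordinates are padded, so '*' pads in the consensus are counted as positions:

```python
def ace_read_intervals(lines):
    """Yield (contig, read, start, end) in padded consensus coordinates
    (1-based, inclusive), using QA alignment clipping when it is valid."""
    contig = None
    af = {}    # read name -> padded start on the current contig
    rd = None  # (read name, padded length) waiting for its QA line
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        tag = fields[0]
        if tag == "CO":
            contig, af = fields[1], {}
        elif tag == "AF":
            af[fields[1]] = int(fields[3])
        elif tag == "RD":
            rd = (fields[1], int(fields[2]))
        elif tag == "QA" and rd is not None:
            name, length = rd
            rd = None
            start = af.get(name)
            if start is None:
                continue  # RD record without a matching AF line
            clip_start, clip_end = int(fields[3]), int(fields[4])
            if clip_start > 0 and clip_end >= clip_start:
                yield contig, name, start + clip_start - 1, start + clip_end - 1
            else:
                yield contig, name, start, start + length - 1


ace_lines = [
    "AS 1 2",
    "CO contig00001 12 2 1 U",
    "ACGTACGTACGT",
    "BQ",
    "30 30 30 30 30 30 30 30 30 30 30 30",
    "AF read1 U 1",
    "AF read2 C 5",
    "BS 1 12 read1",
    "RD read1 8 0 0",
    "ACGTACGT",
    "QA 1 8 1 8",
    "RD read2 8 0 0",
    "ACGTACGT",
    "QA 1 8 2 7",
]
for record in ace_read_intervals(ace_lines):
    print(record)
```

Converting to unpadded positions would additionally require subtracting the '*' pads that precede each position in the consensus.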



                  • #10
                    Sorry, actually only 70 of the 20,199,527 reads with both ends in the same contig have both ends on the same strand. I was also counting the repeats and singletons, which have no strand information.

                    Thank you for your answer and your blog.

                    Nuria



                    • #11
                      So, to avoid reassembling the genome (I would get a different output), I am trying to use GS Reference Mapper (v2.6) with the same reads used in the assembly (I know it is not the same, but I hope it is a good proxy).

                      I used the -bam option because I want to know the coverage of some regions, and I only know how to do this with bedtools. But every time I launch the mapper with the -bam option I get an error, whereas if I launch it exactly the same way without the -bam option it works fine.

                      nohup runProject -bam myprojectname >myproj_status &
                      Error: An internal error (segmentation fault) has occurred in the computation.
                      My guess is this is probably related to this:

                      Originally posted by flxlex:
                      I got the impression that SAM support for newbler was kind of experimental still. So, please go ahead and report a bug to Roche/454!
                      If so, has someone found a way to use the 454AlignmentInfo.tsv file to find out the coverage of particular regions?
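For the 454AlignmentInfo.tsv route, a per-region summary could look like the sketch below. The file layout is an assumption: '>' lines are taken to name a contig, and data lines to be tab-separated with the consensus position first and a depth value in a column you choose (depth_col, defaulting to 3 here). Column names differ between Newbler versions, so check your header; the sample lines are invented purely to show the assumed shape, and the function names are hypothetical:

```python
def parse_alignment_info(lines, depth_col=3):
    """Return {contig: {position: depth}} from 454AlignmentInfo.tsv-style lines.

    ASSUMPTION: '>' lines name the contig; data lines are tab-separated with
    the position in column 0 and a depth value in column depth_col.
    """
    depth = {}
    contig = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith(">"):
            contig = line[1:].split()[0]
            depth.setdefault(contig, {})
            continue
        fields = line.split("\t")
        if contig is None or not fields[0].isdigit():
            continue  # header line, or anything before the first contig
        depth[contig][int(fields[0])] = int(fields[depth_col])
    return depth


def region_mean_depth(depth, contig, start, end):
    """Mean depth over positions start..end (inclusive) present in the table."""
    values = [d for p, d in depth.get(contig, {}).items() if start <= p <= end]
    return sum(values) / len(values) if values else 0.0


sample = [
    ">contig00001\t1",
    "Position\tConsensus\tQuality Score\tDepth\tSignal\tStdDeviation",
    "1\tA\t64\t3\t0.97\t0.03",
    "2\tC\t64\t5\t1.02\t0.05",
    "3\tG\t64\t7\t0.99\t0.04",
]
table = parse_alignment_info(sample)
print(region_mean_depth(table, "contig00001", 2, 3))  # 6.0
```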

                      Any help is appreciated

                      Thank you



                      • #12
                        OK, in case someone else encounters this problem:

                        I found that if I try to map both 454 reads and FASTA sequences, gsMapper fails to create a BAM file, but if I map only the 454 reads, it creates the BAM file.

                        Nuria
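Once gsMapper does write a BAM file, `bedtools genomecov -ibam mapping.bam -d` reports the depth at every position as chromosome, 1-based position and depth (the BAM filename is a placeholder). A minimal Python sketch that scans that output for runs below a depth threshold, which is one way to look for the low-coverage regions discussed at the start of the thread; the function name and threshold are illustrative:

```python
def low_coverage_runs(lines, min_depth):
    """Yield (chrom, start, end) runs, 1-based inclusive, where depth < min_depth.

    Input lines are `bedtools genomecov -d` output: chrom, position, depth.
    """
    run = None  # (chrom, start, end) of the current low-depth run
    for line in lines:
        if not line.strip():
            continue
        chrom, pos, depth = line.split("\t")[:3]
        pos, depth = int(pos), int(depth)
        if depth < min_depth:
            if run and run[0] == chrom and run[2] == pos - 1:
                run = (chrom, run[1], pos)  # extend the current run
            else:
                if run:
                    yield run
                run = (chrom, pos, pos)  # start a new run
        elif run:
            yield run
            run = None
    if run:
        yield run


cov = ["chr1\t1\t0", "chr1\t2\t1", "chr1\t3\t9", "chr1\t4\t2", "chr2\t1\t1"]
print(list(low_coverage_runs(cov, 3)))
# [('chr1', 1, 2), ('chr1', 4, 4), ('chr2', 1, 1)]
```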
