  • Unused reads in SOAPdenovo

    Hi all,

    I want to extract the reads that were not used in the SOAPdenovo assembly process.
    I did not find a straightforward way of doing so. I thought about mapping the reads back to the assembly and taking the unmapped reads. But this seems like a simple output SOAPdenovo could provide directly, and I'm not sure that the reads left unmapped after re-mapping are the same as the reads that were not used in the assembly.

    Does anyone know a better way of getting the unused reads?

    Thanks,
    Rachelly.

  • #2
    Originally posted by Rachelly
    Does anyone know a better way of getting the unused reads?
    There usually isn't a clean way to do this in De Bruijn assemblers: they don't generally track where the k-mers from the reads ended up.

    The unmapped reads from re-mapping are your best bet. I think there is already such an output from the SOAPdenovo 'map' step.
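If you go the re-mapping route, the unmapped reads can be pulled out of the aligner's output by their SAM FLAG. The following is a minimal sketch, assuming the BWA or Bowtie output was written as plain-text SAM with base qualities kept (QUAL not `*`). The function names `unmapped_reads` and `to_fastq` are hypothetical; the FLAG bit values (0x4 unmapped, 0x10 reverse-complemented, 0x40/0x80 first/second mate, 0x100/0x800 secondary/supplementary) come from the SAM specification.

```python
from typing import Iterable, Iterator, Tuple

# FLAG bits as defined in the SAM format specification
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_FIRST_MATE = 0x40
FLAG_SECOND_MATE = 0x80
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) for each primary SAM record flagged as unmapped."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # keep primary records only, so each read is reported once
        if not flag & FLAG_UNMAPPED:
            continue  # the aligner placed this read
        if flag & FLAG_REVERSE:
            # SEQ/QUAL are stored reverse-complemented; restore the original orientation
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        # Mates share a QNAME in SAM, so add /1 or /2 to keep them apart in FASTQ
        if flag & FLAG_FIRST_MATE:
            name += "/1"
        elif flag & FLAG_SECOND_MATE:
            name += "/2"
        yield name, seq, qual


def to_fastq(records: Iterable[Tuple[str, str, str]]) -> str:
    """Format (name, sequence, quality) tuples as FASTQ text."""
    return "".join(f"@{name}\n{seq}\n+\n{qual}\n" for name, seq, qual in records)
```

Calling `to_fastq(unmapped_reads(f))` on an open SAM file `f` returns the unmapped reads as FASTQ text. As the original question points out, these are the reads the aligner could not place, which is not necessarily the same set of reads SOAPdenovo left unused.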

    • #3
      Thanks for your reply, tonybloger.
      SOAPdenovo doesn't supply useful data from the "map" step; there is no way to know which reads the indices refer to.

      So I re-mapped the reads to the assembly, but got a totally different number of mapped reads from what SOAPdenovo reports in its log file. Only about 1/3 of the reads re-mapped to the assembly with BWA, and even fewer with Bowtie, while SOAPdenovo reported over 94% mapping!

      SOAPdenovo states:
      Code:
      15646393 out of 16551980 (94.5)% reads mapped to contigs
      While mapping with Bowtie gives:
      Code:
      # reads processed: 8275990
      # reads with at least one reported alignment: 862689 (10.42%)
      # reads that failed to align: 7413301 (89.58%)
      Reported 862689 paired-end alignments to 1 output stream(s)
      And BWA:
      Code:
      16551980 + 0 in total (QC-passed reads + QC-failed reads)
      0 + 0 duplicates
      6108971 + 0 mapped (36.91%:nan%)
      16551980 + 0 paired in sequencing
      8275990 + 0 read1
      8275990 + 0 read2
      3906364 + 0 properly paired (23.60%:nan%)
      4618869 + 0 with itself and mate mapped
      1490102 + 0 singletons (9.00%:nan%)
      1192368 + 0 with mate mapped to a different chr
      1188458 + 0 with mate mapped to a different chr (mapQ>=5)
      I also tried mapping only one end of each pair to the assembly, to see whether the problem had to do with insert size or pairing, and got similar results.
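Part of the spread between the logs may come from what each tool counts. A quick check of the numbers above, as a sketch: Bowtie processed 8,275,990 reads, exactly half of the 16,551,980 that SOAPdenovo and flagstat count, and it reported 862,689 paired-end alignments, so its 10.42% appears to be a fraction of pairs rather than of reads. The regexes and function names below are hypothetical and only target the log lines quoted in this post.

```python
import re
from typing import Dict, Tuple

# Log excerpts copied from the post above (only the lines the parsers need)
SOAP_LOG = "15646393 out of 16551980 (94.5)% reads mapped to contigs"
BOWTIE_LOG = """# reads processed: 8275990
# reads with at least one reported alignment: 862689 (10.42%)"""
FLAGSTAT_LOG = """16551980 + 0 in total (QC-passed reads + QC-failed reads)
6108971 + 0 mapped (36.91%:nan%)"""


def _count(pattern: str, text: str) -> int:
    """Return the integer captured by `pattern` (with ^ matching at each line start)."""
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return int(match.group(1))


def mapped_and_total() -> Dict[str, Tuple[int, int]]:
    """(mapped, total) per tool, each in the unit that tool reports."""
    return {
        "SOAPdenovo (reads)": (_count(r"^(\d+) out of", SOAP_LOG),
                               _count(r"out of (\d+)", SOAP_LOG)),
        "Bowtie (pairs)": (_count(r"at least one reported alignment: (\d+)", BOWTIE_LOG),
                           _count(r"reads processed: (\d+)", BOWTIE_LOG)),
        "BWA flagstat (reads)": (_count(r"^(\d+) \+ \d+ mapped", FLAGSTAT_LOG),
                                 _count(r"^(\d+) \+ \d+ in total", FLAGSTAT_LOG)),
    }


def percentages() -> Dict[str, float]:
    """Mapped percentage per tool, rounded to two decimals."""
    return {name: round(100 * m / t, 2) for name, (m, t) in mapped_and_total().items()}


if __name__ == "__main__":
    for name, pct in percentages().items():
        print(f"{name:22s} {pct:6.2f}%")
```

Even with the units lined up, BWA's 36.91% of reads is still far below SOAPdenovo's 94.5%, so the counting unit alone does not explain the gap.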

      Does anyone know why there is such a big difference between the SOAPdenovo mapping and after-assembly mapping?

      Thanks!
      Rachelly.

      • #4
        I also have this problem.

        From the SOAPdenovo log, about 90% of the reads seem to align to the contigs.

        However, when I use Bowtie to align the raw reads to the contig file, only about 1/3 of the reads align.

        I also don't understand why there is such a big difference between the SOAPdenovo log and after-assembly mapping.

        Thanks!

        Jingjing

        • #5
          I am going to do this soon! Is it possible that the contig sequences referred to in the log file are different from the actual output (.contig)?

          • #6
            So, the contig file will differ from the final scafSeq file, but only because you've done scaffolding, some error correction, and probably gap filling as well. So, all else being equal, you should see at least roughly as many reads map to the final SOAPdenovo output as to the contig file. However, your after-assembly mapping might not use the same mechanism as the SOAPdenovo map step. When using BWA or Bowtie, you may need to loosen the alignment parameters to obtain a similar level of mapping.
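To see why the alignment parameters alone can move the reported mapping rate, here is a toy, brute-force ungapped aligner (not how BWA or Bowtie work internally) whose only knob is the maximum number of mismatches allowed per read. The function names and the mismatch-only model are assumptions for illustration; real aligners typically also have seed, gap, and scoring settings.

```python
from typing import List

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (ACGT only)."""
    return seq.translate(_COMP)[::-1]


def best_mismatches(read: str, contig: str) -> int:
    """Fewest mismatches over all ungapped, end-to-end placements on either strand.

    Returns len(read) + 1 if the read is longer than the contig (no placement).
    """
    best = len(read) + 1
    for query in (read, revcomp(read)):
        for start in range(len(contig) - len(query) + 1):
            window = contig[start:start + len(query)]
            best = min(best, sum(a != b for a, b in zip(query, window)))
    return best


def mapping_rate(reads: List[str], contigs: List[str], max_mismatches: int) -> float:
    """Fraction of reads whose best placement has at most `max_mismatches` mismatches."""
    mapped = sum(
        1 for read in reads
        if min(best_mismatches(read, c) for c in contigs) <= max_mismatches
    )
    return mapped / len(reads)
```

With `contigs = ["AAAACCCCGGGGTTTT"]` and the reads `"AACCCCGG"`, `"AACCTCGG"`, `"AACCTTGG"` and `"TTTTTTTT"` (0, 1, 2 and 4 mismatches at best), `mapping_rate` returns 0.25, 0.5 and 0.75 for `max_mismatches` of 0, 1 and 2: same reads, same assembly, different reported mapping rate.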

            • #7
              Thanks, Wallysb01!
              If the mapping mechanism differs from the one used by BWA or Bowtie, which parameters should be changed in BWA/Bowtie? It is hard for me to choose an equivalent parameter set, because I do not know the rules SOAPdenovo uses for mapping.
              Thanks!

              • #8
                Originally posted by Rachelly
                Does anyone know why is there such a big difference between the mapping of SOAPdenovo and after-assembly-mapping?
                I found this more common with smaller values of K. It could be that SOAPdenovo counts a read as "used" if even a single k-mer from that read is used, whereas read alignment algorithms require the entire read to align.
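The hypothesis above can be illustrated with two hypothetical criteria: one counts a read as used when any of its k-mers occurs in the contigs, the other requires the whole read to occur in a contig. This is a sketch of the poster's guess, not SOAPdenovo's documented behavior. Strands are ignored for brevity (De Bruijn assemblers usually treat a k-mer and its reverse complement as the same), and exact containment stands in for end-to-end alignment.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k); forward strand only."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def used_by_kmer(read: str, contigs: Iterable[str], k: int) -> bool:
    """Hypothetical loose criterion: any single k-mer of the read occurs in some contig."""
    contig_kmers: Set[str] = set()
    for contig in contigs:
        contig_kmers |= kmers(contig, k)
    return not kmers(read, k).isdisjoint(contig_kmers)


def fully_contained(read: str, contigs: Iterable[str]) -> bool:
    """Stricter criterion: the whole read occurs exactly in some contig."""
    return any(read in contig for contig in contigs)
```

For the contig `ACGTTGCAAGGCT` and the read `TTGCAACCCCCC`, `used_by_kmer` is `True` at k = 4 (the read shares `TTGC` with the contig) but `False` at k = 8, while `fully_contained` is `False`. A smaller k makes a shared k-mer more likely, which fits the observation that the discrepancy is more common with smaller K.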
