  • SNP calling with paired and single read data

    Hi all,

    I'm working with a set of Illumina paired-end data. My reference genome is from a non-model organism and currently consists only of scaffolds and contigs.
    My overall aim is to use the reference as a guide for assembly as well as for SNP calling.

    My first question is, should I do assembly first and use it for SNP calling, or are these two processes separate and independent?

    For now, I've decided to work on the SNP calling first, by mapping my sequence reads (using BWA) to the publicly available reference scaffolds and contigs. Before mapping, I ran a quality-trimming step, which left me with both paired reads and unpaired (single) reads.

    My second question is, how do I deal with the unpaired reads? Can I also use them for SNP calling and later merge my SNP results with the ones I obtain with paired reads?

    Any help/suggestion is much appreciated.
    Thanks!
    Last edited by mht; 10-08-2012, 01:29 AM.

  • #2
    1. You can use either an assembly or a BWA mapping for SNP calling, so the two processes are separate.
    2. I usually do not use unpaired reads, because BWA makes higher-quality mappings with paired reads. But since you only have scaffolds, I think you could use those reads.
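If the unpaired reads are mapped and called separately and the results merged afterwards, as asked in the first post, the two call sets could be combined roughly as below. This is a minimal sketch, assuming each call has already been reduced to a hypothetical (contig, position, ref, alt) tuple; it is not a VCF parser, and the function name `merge_calls` is made up for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical simplified variant record: (contig, 1-based position, ref allele, alt allele).
# Real pipelines would read these from VCF files; that step is left out here.
Variant = Tuple[str, int, str, str]


def merge_calls(paired: Iterable[Variant], unpaired: Iterable[Variant]) -> Dict[Variant, Set[str]]:
    """Union two call sets and record which read set(s) supported each variant."""
    merged: Dict[Variant, Set[str]] = {}
    for source, calls in (("paired", paired), ("unpaired", unpaired)):
        for var in calls:
            # setdefault creates an empty set the first time a variant is seen
            merged.setdefault(var, set()).add(source)
    return merged


# Example with made-up scaffold names and positions.
paired_calls = [("scaffold_1", 1042, "A", "G"), ("scaffold_2", 77, "C", "T")]
unpaired_calls = [("scaffold_1", 1042, "A", "G"), ("contig_9", 310, "G", "A")]
merged = merge_calls(paired_calls, unpaired_calls)
```

Keeping the source labels makes it easy to treat variants seen only in the unpaired reads with more caution. An alternative worth considering is to merge the alignments before calling, so the variant caller sees all the evidence at each site at once.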

    • #3
      Many thanks, TiborNagy, for your reply.

      I'm wondering what difference it makes whether I call SNPs from my assembly or from a BWA mapping. These are the pros and cons I see for calling SNPs from an assembly:

      pro: if my sample is not so similar to the reference, then calling SNPs after assembly would be better.
      con: the SNP results I get will be dependent on the quality of my assembly.

      Any advice on this?

      • #4
        Well, if there is no consensus, I think your assembly may have many more benefits than a simple BWA mapping. Do not forget, though, that the quality of the contigs also limits the SNP results.

        • #5
          Hi TiborNagy,

          I was wondering if you know of any good open-source reference assembly programs that I can use with references that contain only scaffolds/contigs rather than complete assemblies?

          • #6
            Hi mht,

            My personal favourite is MIRA.

            • #7
              Hi TiborNagy,

              I've used MIRA and am now wondering what I should do next and how to judge the quality of my assembly. I can calculate the N50 and total size of the original reference and compare them with those of my assembly. Should the N50 and total size increase compared with the original reference? I notice that MIRA mostly mapped reads back to the backbone, but also made some extensions to each backbone contig.
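The N50 and total-size comparison described above can be computed from the contig lengths of each FASTA file. This is a minimal sketch: `fasta_lengths` and `n50_and_total` are hypothetical helper names, and the FASTA reader is deliberately simple (no gzip support, no validation).

```python
from typing import Iterable, List, Tuple


def fasta_lengths(path: str) -> List[int]:
    """Return the sequence length of every record in a plain-text FASTA file."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50_and_total(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, total size) for a collection of contig/scaffold lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly size.
    """
    sizes = sorted((n for n in lengths if n > 0), reverse=True)
    total = sum(sizes)
    running = 0
    for size in sizes:
        running += size
        if 2 * running >= total:  # integer test avoids rounding issues with total / 2
            return size, total
    return 0, total  # empty input
```

For example, `n50_and_total(fasta_lengths("reference.fasta"))` and `n50_and_total(fasta_lengths("assembly.fasta"))` (file names hypothetical) give the two numbers to compare. Note that a larger N50 is not proof of a better assembly on its own, since incorrect joins also increase it.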

              Also, since it is already a reference assembly, is it necessary to try to join the contigs to see whether any of them overlap (since some contigs have been extended)? Are there any programs you can recommend?
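To illustrate what "checking whether extended contigs overlap" means, here is a toy sketch that looks for exact suffix-prefix overlaps between two contigs in all four relevant orientations. It assumes error-free overlaps and a hypothetical minimum overlap length (`min_len`); real data would need an alignment that tolerates mismatches, so this is not a substitute for a dedicated tool.

```python
from typing import Tuple

# Complement table for reverse-complementing DNA (N stays N).
_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMP)[::-1]


def suffix_prefix_overlap(left: str, right: str, min_len: int = 20) -> int:
    """Length of the longest exact overlap where a suffix of `left` equals a
    prefix of `right`; 0 if no overlap of at least `min_len` exists."""
    min_len = max(1, min_len)  # a zero-length "overlap" is meaningless
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def find_join(a: str, b: str, min_len: int = 20) -> Tuple[str, int]:
    """Try every way two contigs could be joined end to end and return the
    (orientation, overlap length) with the longest overlap."""
    rc_b = revcomp(b)
    candidates = [
        ("a+b", suffix_prefix_overlap(a, b, min_len)),
        ("a+rc(b)", suffix_prefix_overlap(a, rc_b, min_len)),
        ("b+a", suffix_prefix_overlap(b, a, min_len)),
        ("rc(b)+a", suffix_prefix_overlap(rc_b, a, min_len)),
    ]
    return max(candidates, key=lambda c: c[1])
```

Checking every pair this way is quadratic in the number of contigs, which is one reason dedicated assembly tools index sequences instead.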

              Thanks so much for your help.

              • #8
                It is hard to say anything about how your assembly compares with the original one; it depends on many factors. As for your second question, I cannot recommend any other programs.
