  • Running Pindel prior to Dindel

    In the Dindel manual, it recommends generating candidate indels using Pindel.

    Has anyone got experience with that?

    I'm trying to achieve that but have the following questions:
    1) Pindel generates 6 files for each chromosome. Can all of them potentially be converted into candidate indels for Dindel?
    2) Is there a limit on the size of indel that can be processed by Dindel (120bp?)?
    3) Is there a restriction on the type of indels that can be processed by Dindel, e.g. will it handle inversion, non-template sequence insertion+deletion?
    4) Does anyone have scripts to share for converting Pindel output to a format that can be handled by Dindel?

    Thanks
    Jason
    jason.li @ petermac.org

  • #2
    hi Jason,

    I should work on a script to convert Pindel output to VCF 4.0, as requested by members of the 1000 Genomes Project. I need to find time for it.

    In the Dindel paper, the simulated indels are short insertions and deletions of 1-10bp. Could Kees comment here? What is the maximum indel size you would recommend users push into Dindel?

    To my knowledge, inversions, non-template sequence and tandem duplications are better characterized as structural variants (SV). Could Kees indicate whether SVs will be handled by Dindel?

    I would position Pindel as a primary variant caller. It doesn't report genotype likelihoods yet. If Dindel could cover genotyping of both short indels and SVs, I would rather do something else more useful to the community.

    Kai

    • #3
      Hi Jason,

      the current version of Dindel does not do a banded alignment of reads against haplotypes. As a result, very long haplotypes become memory- and time-intensive to analyze.

      For the 1000 Genomes project I considered candidate indels up to 50 bp. Dindel will make sure that the haplotype is long enough to contain the variant in the case of a deletion. Currently, Dindel will only consider variants that can be expressed as an insertion or a deletion, so this makes it not particularly suited to genotype or call SVs in general.

      I am working on a version that takes any VCF4 variant (any REF, ALT combination) sequence as input, and will perform the alignment more efficiently so that longer (>50 bp) variants can be handled as well.
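
To make the two constraints above concrete (candidates up to 50 bp, and only variants expressible as a plain insertion or deletion), here is a minimal pre-filter sketch in Python. It is not Dindel's own logic. It assumes candidates are already available as VCF4-style REF/ALT strings that share one leading anchor base, and the names `classify_indel`, `keep_candidate` and `MAX_INDEL_BP` are hypothetical.

```python
from typing import Optional, Tuple

# Size limit taken from the post above (candidates up to 50 bp);
# treat it as a tunable assumption, not a hard Dindel constant.
MAX_INDEL_BP = 50


def classify_indel(ref: str, alt: str) -> Optional[Tuple[str, int]]:
    """Return ("ins" or "del", length in bp) for a simple indel, else None.

    Assumes VCF4-style alleles where REF and ALT share one leading anchor base.
    """
    if len(ref) == 1 and len(alt) > 1 and alt.startswith(ref):
        return ("ins", len(alt) - 1)
    if len(alt) == 1 and len(ref) > 1 and ref.startswith(alt):
        return ("del", len(ref) - 1)
    # SNPs, MNPs, inversions, insertion+deletion events, etc.
    return None


def keep_candidate(ref: str, alt: str, max_bp: int = MAX_INDEL_BP) -> bool:
    """True if the variant is a plain insertion/deletion no longer than max_bp."""
    kind = classify_indel(ref, alt)
    return kind is not None and kind[1] <= max_bp
```

For example, `keep_candidate("ATG", "A")` keeps a 2 bp deletion, while a combined insertion+deletion such as `("ATG", "CC")` is rejected because it cannot be written as a single insertion or deletion.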

      Kees

      • #4
        Hello,

        We are considering the usage of Pindel with Illumina PE reads of 83bp. Due to the longer read length, the reads are probably more likely to contain SNPs or sequencing errors in their alignments. I have two questions regarding this issue:

        1) In the current version of Pindel, are the anchor sequences still required to align uniquely and perfectly to the reference sequence, and

        2) When searching for the minimum and maximum unique substrings, are the "unique" substrings still required to be unique, or has the inexact matching been implemented?

        If perfect matching is still being used in either anchor definition or pattern growth, would you recommend preprocessing steps, such as trimming of the reads to 36-50bp, for the reads before using them as Pindel input?

        Thanks,
        Yilong Li

        • #5
          hi Yilong Li

          1. It is better to have the anchor reads aligned uniquely, as Pindel only examines the nearby regions when split-read mapping. Uniqueness matters less for longer reads. If you want a sensitive callset, you may put the read in the input multiple times if the anchor read aligns to several locations.

          2. Uniqueness of the split-read mapping is still required, and mismatches are now allowed, so you don't have to worry about SNPs or sequencing errors. We have nice, experimentally confirmed examples in cancer genome data where an indel and a nearby SNP (3bp apart) were discovered by Pindel at the same time.

          Please don't trim your data. The new version of Pindel can currently handle up to a 5% sequence error rate, i.e. 5 mismatches for 100bp reads. It is unlikely that you would need a higher error rate, but I can set it even higher if necessary.
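
As a quick worked example of the 5% figure for other read lengths, the arithmetic can be sketched as below. Rounding down to a whole number of mismatches is an assumption here (the post only gives the 100bp case), and `max_mismatches` is a hypothetical helper, not a Pindel option.

```python
def max_mismatches(read_len: int, error_pct: int = 5) -> int:
    """Mismatch budget for a read, assuming the rate is rounded down.

    Integer arithmetic avoids floating-point surprises such as 0.05 * 100.
    """
    return (read_len * error_pct) // 100


# Read lengths mentioned in this thread: 36bp, 83bp and 100bp.
for length in (36, 83, 100):
    print(length, max_mismatches(length))
```

Under that rounding assumption, the 83bp reads asked about above would get a budget of 4 mismatches.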

          Kai

          • #6
            Thank you for your quick reply! Just one clarifying question: can the anchor sequences be aligned with mismatches, or will the newest Pindel version only consider perfectly matched anchor sequences as described in the original Pindel article?

            Yilong Li

            • #7
              The anchor read may also contain mismatches or indels. The new version of Pindel is rather different from the one in the paper, because read lengths were short (36bp) at that time.

              Kai

              • #8
                I don't know if I am looking in the wrong place, but it seems that www.ebi.ac.uk/~kye/pindel/ does not exist. I found Pindel's sourceforge page by googling, but there were no software files there either. Is Pindel still available for download and usage somewhere?

                Yilong

                • #9
                  Originally posted by Yilong Li:
                  I don't know if I am looking in the wrong place, but it seems that www.ebi.ac.uk/~kye/pindel/ does not exist. I found Pindel's sourceforge page by googling, but there were no software files there either. Is Pindel still available for download and usage somewhere?

                  Yilong

                  Hi Yilong,

                  you can find the latest Pindel version at
                  ftp://ftp.sanger.ac.uk/pub/zn1/pindel/

                  And we are building a website for better support of Pindel at


                  Kai

                  • #10
                    Thanks!
                    Yilong

                    • #11
                      Dear Kai,

                      Does Pindel allow detection of indels for overlapping paired-end reads?

                      Cheers,

                      Dave

                      • #12
                        Originally posted by dnusol:
                        Dear Kai,

                        Does Pindel allow detection of indels for overlapping paired-end reads?

                        Cheers,

                        Dave
                        Dear Dave,

                        In principle, Pindel also works for overlapping paired-end reads, and even for single-end reads.

                        You may need to do the following to achieve a better result:
                        1. extract reads with bam2pindel.pl (http://www.ebi.ac.uk/~kye/pindel/v_0..._PINDEL.tar.gz) as the internal BAM input function of Pindel may break (or need some changes);
                        2. modify the coordinate in the extracted records by shifting one read length.
                        Say read length is 105 and you saw:
                        @readn
                        AGGTAATA...
                        + chr1 10000 60 200 sample1

                        change the third line to
                        + chr1 9900 60 200 sample1

                        If you saw "-", put 10100 there, for example.

                        3. run Pindel (source code at https://trac.nbic.nl/pindel/)

                        Good luck and send me an email if you have any questions.

                        Kai

                        • #13
                          Hi Kai,

                          Does it matter whether I add or subtract 100 bp or 105 bp, since the reads are 105 bp long? I still have to look into the details of bam2pindel.pl, but what does that third column mean?

                          D.

                          • #14
                            protocol for using pindel candidates with dindel

                            Hi All,

                            Is there a protocol for preparing and running the pindel candidates with dindel, i.e. an example? I have run both programs and they are great. We want to use both dindel and pindel, but I am having some trouble understanding how to run dindel with the pindel candidates as we do need the VCF for pindel.

                            Thanks,

                            Dexter Duncan

                            • #15
                              Originally posted by dnusol:
                              Hi Kai,

                              Does it matter whether I add or subtract 100 bp or 105 bp, since the reads are 105 bp long? I still have to look into the details of bam2pindel.pl, but what does that third column mean?

                              D.

                              105 would be correct, as it is the read length.
                              The columns are
                              mapping_strand chr_name mapping_pos mapping_quality insert_size sample_tag

                              The new version of Pindel accepts BAMs and/or Pindel input. If you have normal Illumina BWA BAMs, just provide a simple configuration file and Pindel will read all BAMs in the list.

                              In your case, we need to do a little bit more to be safe.

                              Kai
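
For anyone scripting the coordinate shift discussed above, here is a minimal Python sketch. It assumes the third line of each extracted record is whitespace-separated in the column order listed above (mapping_strand, chr_name, mapping_pos, mapping_quality, insert_size, sample_tag), and that "+" records are shifted down and "-" records up by one read length, as in the example earlier in the thread. The function name `shift_anchor_line` is hypothetical and is not part of bam2pindel.pl or Pindel.

```python
def shift_anchor_line(line: str, read_len: int) -> str:
    """Shift mapping_pos on the anchor line of a Pindel input record.

    Expected columns (whitespace-separated):
    mapping_strand chr_name mapping_pos mapping_quality insert_size sample_tag
    """
    fields = line.split()
    strand, chrom, pos = fields[0], fields[1], int(fields[2])
    if strand == "+":
        pos -= read_len  # "+" anchor: move the position one read length down
    elif strand == "-":
        pos += read_len  # "-" anchor: move the position one read length up
    else:
        raise ValueError(f"unexpected strand field: {strand!r}")
    # Output is re-joined with single spaces (an assumption about the separator).
    return " ".join([strand, chrom, str(pos)] + fields[3:])


# With 105 bp reads, as confirmed above:
print(shift_anchor_line("+ chr1 10000 60 200 sample1", 105))
print(shift_anchor_line("- chr1 10000 60 200 sample1", 105))
```

Only the third line of each record is changed; the read name and sequence lines are left untouched.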
