Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Topulaneus-Hattum
    Member
    • Feb 2014
    • 14

    I wrote:
    "It's not at all clear to me how sam2pindel knows whether I am giving it alignments from mate-pair instead of alignments from paired-end."

    For anyone else who has a similar problem and happens upon this thread, it turns out sam2pindel has a 6th argument that isn't shown in the user manual, which is how you tell it this difference.

    I now notice that even though the URL says "current" the text says Pindel 0.2.4. I'm using 0.2.5a3. Web docs lag the source.

    T.Hattum

    Comment

    • kartong
      Junior Member
      • Mar 2014
      • 4

      Hi,

      As per AlisonF, I need to extract the reference sequencing depth (RD) at the position of the detected structural variant for subsequent computation.

      However, after running pindel on my novoalign generated .bam files and converting the pindel output into the vcf format using pindel2vcf, I only see the "GT:AD" fields (example below) and not the "GT:RD:AD" field as shown by the other users.


      #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT T101a T101b
      chr1 5919343 . ATC A . PASS END=5919345;HOMLEN=5;HOMSEQ=TCTCT;SVLEN=-2;SVTYPE=DEL GT:AD 0/0:0,0 0/0:4,4
      chr1 16975201 . CT C . PASS END=16975202;HOMLEN=0;SVLEN=-1;SVTYPE=DEL GT:AD 0/0:57,0 0/0:53,4
      chr1 17084536 . TGGAACA T . PASS END=17084542;HOMLEN=2;HOMSEQ=GG;SVLEN=-6;SVTYPE=DEL GT:AD 0/0:55,0 0/0:71,4


      For my analysis, I am using "Pindel version 0.2.5a3, Oct 24 2013." with default settings and the -i option with the config file and "pindel2vcf 0.5.8" to convert the pindel file into the vcf format.

      Is anyone able to assist me on this?


      Thank you.

      Comment

      • mcorces
        Junior Member
        • Feb 2012
        • 2

        Hi Kai Ye,
        I'm attempting to use Pindel (version 0.2.5a3) to annotate Internal Tandem Duplications in the gene FLT3. Others (PMID: 23159595) have had great success doing this with Pindel. I've run into a problem where I'm unable to limit the search to any region using the -c option. I'm particularly interested in the region chr13:28608000-28608600. Every time I execute Pindel with the -c option set to anything other than "ALL", I get a segmentation fault:

        "
        Initializing parameters...
        Pindel version 0.2.5a3, Oct 24 2013.
        Loading reference genome ...
        Loading reference genome done.
        Initializing parameters done.
        SearchRegion::SearchRegion
        Segmentation fault (core dumped)
        "

        However, when I execute with -c ALL, the program runs fine (and finds the desired ITD). I've tried setting -r, -l, and -k to false and I've tried setting -w to 1 to limit the amount of memory used. I've reproduced the command I'm using below (with absolute paths removed for clarity):

        pindel -f hg19.fa -i pindel_input.txt -c 13 -r false -l false -k false -t true -w 1 -o pindel_output

        I could just run the program on all chromosomes but would prefer to run it just on this defined region. Any idea where this seg fault is coming from? Unfortunately to core file isnt actually getting dumped so I cant reproduce that for you.
        Thanks in advance!
        -Ryan

        Comment

        • KaiYe
          Senior Member
          • Jun 2009
          • 133

          Originally posted by mcorces View Post
          Hi Kai Ye,
          I'm attempting to use Pindel (version 0.2.5a3) to annotate Internal Tandem Duplications in the gene FLT3. Others (PMID: 23159595) have had great success doing this with Pindel. I've run into a problem where I'm unable to limit the search to any region using the -c option. I'm particularly interested in the region chr13:28608000-28608600. Every time I execute Pindel with the -c option set to anything other than "ALL", I get a segmentation fault:

          "
          Initializing parameters...
          Pindel version 0.2.5a3, Oct 24 2013.
          Loading reference genome ...
          Loading reference genome done.
          Initializing parameters done.
          SearchRegion::SearchRegion
          Segmentation fault (core dumped)
          "

          However, when I execute with -c ALL, the program runs fine (and finds the desired ITD). I've tried setting -r, -l, and -k to false and I've tried setting -w to 1 to limit the amount of memory used. I've reproduced the command I'm using below (with absolute paths removed for clarity):

          pindel -f hg19.fa -i pindel_input.txt -c 13 -r false -l false -k false -t true -w 1 -o pindel_output

          I could just run the program on all chromosomes but would prefer to run it just on this defined region. Any idea where this seg fault is coming from? Unfortunately to core file isnt actually getting dumped so I cant reproduce that for you.
          Thanks in advance!
          -Ryan
          You have -c 13 and -c chr13. Please make sure that use the chromosome names as in your reference file. If you wish to check a set of genomic regions, you might use the new -j bed option. You would be able to specify chr start end per line and put as many genomic regions as you want. You could combine -c with -j so that Pindel compute regions also constrained by -c.

          Let me know whether this works.

          Kai

          Comment

          • KaiYe
            Senior Member
            • Jun 2009
            • 133

            Originally posted by kartong View Post
            Hi,

            As per AlisonF, I need to extract the reference sequencing depth (RD) at the position of the detected structural variant for subsequent computation.

            However, after running pindel on my novoalign generated .bam files and converting the pindel output into the vcf format using pindel2vcf, I only see the "GT:AD" fields (example below) and not the "GT:RD:AD" field as shown by the other users.


            #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT T101a T101b
            chr1 5919343 . ATC A . PASS END=5919345;HOMLEN=5;HOMSEQ=TCTCT;SVLEN=-2;SVTYPE=DEL GT:AD 0/0:0,0 0/0:4,4
            chr1 16975201 . CT C . PASS END=16975202;HOMLEN=0;SVLEN=-1;SVTYPE=DEL GT:AD 0/0:57,0 0/0:53,4
            chr1 17084536 . TGGAACA T . PASS END=17084542;HOMLEN=2;HOMSEQ=GG;SVLEN=-6;SVTYPE=DEL GT:AD 0/0:55,0 0/0:71,4


            For my analysis, I am using "Pindel version 0.2.5a3, Oct 24 2013." with default settings and the -i option with the config file and "pindel2vcf 0.5.8" to convert the pindel file into the vcf format.

            Is anyone able to assist me on this?


            Thank you.
            We also put RD as part of AD so that AD has two int.

            Comment

            • mcorces
              Junior Member
              • Feb 2012
              • 2

              Hi Kai Ye,
              You're exactly right. Sorry for such a silly error!
              Thanks!
              -ryan

              Comment

              • t.wieland
                Junior Member
                • Mar 2010
                • 8

                Hi KaiYe,

                we're using Pindel to analyse exome data for quite a while now. We are starting to analyse more whole genome datasets now, and running pindel takes quite a while for these datasets (several days using a single thread), especially after switching from v0.2.4t to v0.2.5. So I'm wondering what is the recommended way to analyse large datasets? Currently we are using pindel with standard parameters (except -A which we set to 20) directly on BAM files from BWA and are just increasing the number of threads to 8-10. Is there something we can do better?

                Best,
                Thomas

                Comment

                • KaiYe
                  Senior Member
                  • Jun 2009
                  • 133

                  Originally posted by t.wieland View Post
                  Hi KaiYe,

                  we're using Pindel to analyse exome data for quite a while now. We are starting to analyse more whole genome datasets now, and running pindel takes quite a while for these datasets (several days using a single thread), especially after switching from v0.2.4t to v0.2.5. So I'm wondering what is the recommended way to analyse large datasets? Currently we are using pindel with standard parameters (except -A which we set to 20) directly on BAM files from BWA and are just increasing the number of threads to 8-10. Is there something we can do better?

                  Best,
                  Thomas
                  For exome data, you could use -x 2 or even -x 1 to speed up. Which version are you using? Recently I changed the code and remove one memory leak.

                  You'd better run jobs per chr and maximum 4 threads as you will not cut runtime further with more threads but it is better to increase the number of jobs.

                  Kai

                  Comment

                  • t.wieland
                    Junior Member
                    • Mar 2010
                    • 8

                    Hi Kai,

                    thanks for your reply!

                    Currently we're using v.2_5a3
                    Ok, I also thought about splitting by chromosome, but I wasn't sure if the detection of interchromosomal events still works?

                    Thomas

                    Comment

                    • KaiYe
                      Senior Member
                      • Jun 2009
                      • 133

                      Originally posted by t.wieland View Post
                      Hi Kai,

                      thanks for your reply!

                      Currently we're using v.2_5a3
                      Ok, I also thought about splitting by chromosome, but I wasn't sure if the detection of interchromosomal events still works?

                      Thomas
                      interchromosomal events will not be affected if split per chr.

                      Comment

                      • BGould
                        Member
                        • May 2010
                        • 14

                        pindel variants counted multiple times

                        Originally posted by hshain View Post
                        Thanks for the quick reply. Let's discuss the duplication first. I now see in the output where the "unique" reads are reported for a given event. In the example below, there are 6 reads which are duplicates of each other and Pindel recognizes and reports this:

                        ####################################################################################################
                        4725 D 1 NT 0 "" ChrID chr3 BP 3016666 3016668 BP_range 3016666 3016668 Supports 6 1 + 6 1 - 0 0 S1 7 SUM_MS 314 1 NumSupSamples 1 1 28_1_GTGTTA 6 1 0 0
                        ATTGGATGCATAATAAAATTAAAACATTTTTTGTTTCTGGCATGGCCAATATTGCTATTTGTCTTATAGAAACCTCTTCTCATTACTAAATTATATATTCTgTATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAAGTGGTAAAGGAAGCTTCTGTGATTTCAACTTCAAGTTA
                        CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 37 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:2215:15528:50861/1
                        CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 37 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1312:8307:12733/1
                        CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1209:17429:19881/1
                        CTTATAGAAACCTCTTCTCATTGCTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1201:7760:69122/1
                        CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1109:9254:43555/1
                        CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1103:14378:72999/1
                        ####################################################################################################


                        However, Pindel does not seem to properly recognize duplicates when the paired end reads run into each other as in this example:

                        ####################################################################################################
                        4727 D 4 NT 0 "" ChrID chr3 BP 3910869 3910874 BP_range 3910869 3910877 Supports 4 4 + 2 2 - 2 2 S1 9 SUM_MS 240 1 NumSupSamples 1 1 28_1_GTGTTA 2 2 2 2
                        GCAGAAATAAAAAGAAAACATCAAATGCGGCTCTTCCATGACCTGCTAGGATCTGCTTCTACCAAATCATGGATATAGAAATAGGCCCAGCTGCACACCACtcttTCTAATTATCCTGTTCTTCCAATTCCTCTTTTATGCATTTTTTTTGCCCACTCTCTCTCGAAACACAGTAGCTCTGGGAGTTGAAAATTAAGTTTTA
                        AGAAATAGGCCCAGCTGCACACCAC TCTAATTATCCTGTTCTTCCAATTCCTCTTTTATGCATTTTTTTTGCCCACTCTCTCTCGAAACACAGTAGCTCTG + 3910628 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:2214:15997:96580/1
                        CTCTTCTATGACCTGCTAGGATCTGCTTCTACCAAATCATGGATATAGAAATAGGCCCAGCTGCACACCAC TCTAATTATCCTGTTCTTCCAATTCCTCTT - 3911111 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:2214:15997:96580/2
                        AGAAATAGGCCCAGCTGCACACCAC TCTAATTATCCTGTTCTTCCAATTCCTCTTTTATGCATTTTTTTTGCCCACTCTCTCTCGAAACACAGTAGCTCTG + 3910628 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1302:10814:73798/1
                        CTCTTCTATGACCTGCTAGGATCTGCTTCTACCAAATCATGGATATAGAAATAGGCCCAGCTGCACACCAC TCTAATTATCCTGTTCTTCCAATTCCTCTT - 3911111 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1302:10814:73798/2
                        ####################################################################################################

                        You can see by the read coordinates that the X and Y position are the same for the forward and reverse reads -- suggesting that they are in the same pair. It is apparent that this is one unique paired-end read, the forward and reverse reads overlap, there is a deletion in the overlapping portion, and the read was duplicated. This is one event that is counted 4 times. I also confirmed that Picard mark dups correctly flagged all 4 reads as duplicates.

                        When I rank order candidates by the reported number of unique reads, these types of events are enriched at the top of my list.


                        Hi Hshain, I have this same issue using Pindel 0.2. Did you ever figure out why this is happening or how to fix/avoid this problem?

                        Comment

                        • KaiYe
                          Senior Member
                          • Jun 2009
                          • 133

                          Pindel marks a read as duplicated if both read pairs are identical. For the first event, both read sequences and the their mates' mapping positions are the same. Thus they are all marked as duplicates. For the second case, the mates' positions differ. Of course, this could be modified depending what we consider as true duplicates.

                          kai

                          Comment

                          • Yann G.
                            Junior Member
                            • Jun 2015
                            • 1

                            Hi Kai

                            I am a little confused about "-u" option (maximum_allowed_mismatch_rate)

                            In 0.2.5a8 version, help says:
                            "Only reads with more than this fraction of mismatches than the reference genome will be considered as harboring potential SVs. (default 0.02) "

                            BUT, on the website (http://gmt.genome.wustl.edu/packages...er-manual.html), it says
                            "only reads with fewer mismatches with the reference genome than this fraction will be considered (default 0.1)"

                            1- Which version should I trust ?
                            2- If I have well understood, this parameter (and the ones below) allow me to play with sensitivity/specificity depending on the kind of result I want (SV discovery or validation)
                            but I am not sure about when do I want to modify one option over the others in the list below :

                            -u (maximum_allowed_mismatch_rate)
                            -H (min_distance_to_the_end)
                            -n (minimum number of edit distance between reads and ref genome)
                            -d (min_num_matched_bases)
                            -a (additional_mismatch)
                            -m (min_perfect_match_around_BP)


                            Thanks!
                            Yann

                            PS: (still on the website) The link to dispersed duplication 's documentation seems dead.

                            Comment

                            • Yongyong.Ren
                              Junior Member
                              • Jul 2016
                              • 2

                              Hi KaiYe,
                              Thanks so much for your attention to this. We are using the latest version 0.2.5b8, 20151210. But we are really confused about some variants with large insertion, for example the record in the xx_SI file:
                              (Line1)287 I.98 NT.98."AAAAAGATTTGACTTGTGTAAACGAACCCATTTTCAAGAACTCTACCATGGTTTTATATGGAGACACAGGTGATAAACAAGCAACCCAAGTGTCAATT" ChrID.13 BP.32911324 32911325 BP_range.32911324 32911334 Supports.40 40 +.40 40 -.0 0 S1.41 SUM_MS.3840 1 NumSupSamples.1 1 tumor.0.0.40.40.0.0
                              (Line2)AAA..................................................................................................................................................................................................AAAGATTTGGTTTATGTTCTTGCAGAGGAGAACAAAAATAGTGTAAAGCAGCATATAAAAATGACTCTAGGTCAAGATTTAAAATCGGACATCTCCTTGAATATAGATAAAATACCAGAAAAAAATAATGATTACATGAACAAATGGGCAG
                              (Line3)AAACAGACTTGACTTGTGTAAACGAACCCATTTTCAAGAACTCTACCATGGTTTTATATGGAGACACAGGTGATAAACAAGCAACCCAAGTGTCAATTATAAAGATTTGGTTTATGTTCTTGCAG............................ + 32911224 96 tumor @M03097:19:000000000-G05CH:1:2104:21078:24842
                              But it seems impossiple exist the insertion 'ACAGACTTGACTTGTGTAAACGAACCCATTTTCAAGAACTCTACCATGGTTTTATATGGAGACACAGGTGATAAACAAGCAACCCAAGTGTCAATTAT' based on the blat result for this read '@M03097:19:000000000-G05CH:1:2104:21078:24842':

                              Blat result:
                              Genomic chr13 :
                              gactctgaag aacttttctc agacaatgag aataattttg tcttccaagt 32911173
                              agctaatgaa aggaataatc ttgctttagg aaatactaag gaacttcatg 32911223
                              AAACAGACTT GACTTGTGTA AACGAACCCA TTTTCAAGAA CTCTACCATG 32911273
                              GTTTTATATG GAGACACAGG TGATAAACAA GCAACCCAAG TGTCAATTAa 32911323
                              aAAAGATTTG GTTTATGTTC TTGCAGagga gaacaaaaat agtgtaaagc 32911373
                              agcatataaa aatgactcta ggtcaagatt taaaatcgga catctccttg 32911423
                              aatatagata aaataccaga aaaaaa

                              We try to send the temp files (bam and the input file for pindel) to you, but the email address like [email protected] and [email protected] are already invalid.
                              Thanks in advance for your reply
                              my email is [email protected]
                              Yongyong Ren
                              Last edited by Yongyong.Ren; 07-17-2016, 07:29 PM.

                              Comment

                              Latest Articles

                              Collapse

                              • SEQadmin2
                                From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                by SEQadmin2


                                Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                ...
                                06-02-2026, 10:05 AM
                              • SEQadmin2
                                Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                                by SEQadmin2


                                With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                                Introduction

                                Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                                05-22-2026, 06:42 AM
                              • SEQadmin2
                                Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
                                by SEQadmin2

                                Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


                                Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
                                05-06-2026, 09:04 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, 06-02-2026, 12:03 PM
                              0 responses
                              19 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-02-2026, 11:40 AM
                              0 responses
                              14 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 05-28-2026, 11:40 AM
                              0 responses
                              29 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 05-26-2026, 10:12 AM
                              0 responses
                              31 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...