  • Bug in Pindel

    My run does not complete on chr9; the other chromosomes are fine.
    Please see the log file below.
    How should I fix this problem in the inversion search, and what could explain the run failing on chr9?


    $ tail -20 HF19_chr9_log.txt
    Sorting and outputing deletions ...
    Deletions: 10269

    Searching deletion-insertions ...
    Total: 111710 +61073 -50637
    Sorting and outputing deletions with non-template sequences ...
    Added: Sorting and outputing deletions with non-template sequences ...
    deletions with non-template sequences: 1072

    Searching tandem duplication events ...
    Total: 41724 +15707 -26017
    Sorting and outputing tandem duplications ...
    Tandem duplications: 35

    Searching tandem duplication events with non-template sequence ...
    Total: 18411 +4855 -13556
    Sorting and outputing tandem duplications with non-template sequence ...
    Tandem duplications with non-template sequence (TD_NT): 2

    Searching inversions ...

  • #2
    Which Pindel version are you running? The latest version is 0.2.5, on GitHub.


    • #3
      I am running Pindel version 0.2.5a4.


      • #4
        Can you tell me more about your data? Is it Illumina, paired-end? What is the insert size?


        • #5
          I am running Illumina paired-end data

          My reads are Illumina, paired-end, with an insert size of 250 bp.
          Pindel runs smoothly on most chromosomes, but on 5 chromosomes the program stops while searching for inversions.

          How can I debug this?
          Thanks!


          • #6
            So Pindel terminates when it searches for inversions? How much memory is it using when it exits? Can you run Pindel with -c to find the genomic region where it goes wrong, and extract the local data around that region? Then send it to me ([email protected]) so I can debug it.
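
The narrowing-down step suggested here can be scripted. The sketch below is only an illustration: `split_region`, `print_pindel_commands`, the 5 Mb step, and the output names are hypothetical choices; the `-f`, `-i`, `-c`, and `-o` options are the ones already used in this thread; and it assumes Pindel accepts region coordinates written without thousands separators. It prints one command per window instead of running anything, so each window can be run separately and the one that stalls can be identified.

```shell
#!/bin/sh
# Split chrom:start-end into consecutive windows of at most `step` bases.
# split_region is a hypothetical helper, not part of Pindel.
split_region() {
    chrom=$1; start=$2; end=$3; step=$4
    s=$start
    while [ "$s" -lt "$end" ]; do
        e=$((s + step))
        if [ "$e" -gt "$end" ]; then e=$end; fi
        echo "${chrom}:${s}-${e}"
        s=$e
    done
}

# Dry run: print one Pindel command per window (nothing is executed).
print_pindel_commands() {
    split_region "$@" | while read -r region; do
        tag=$(echo "$region" | tr ':-' '__')
        echo "pindel -f Homo_sapiens_assembly19.fasta -i HF.Conf.txt -c $region -o bug_$tag > log_$tag 2>&1"
    done
}

print_pindel_commands 9 130000000 140000000 5000000
```

Once a failing window is known, the same region string could be passed to `samtools view -b` on an indexed BAM file to extract the local reads for sharing.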


            • #7
              Here is my log file from the Pindel run

              Pindel was using 20% of memory when it exited.

              /apps/x86_64/pindel/pindel -f Homo_sapiens_assembly19.fasta -i HF.Conf.txt -c 9:130,000,000-140,000,000 -o chr9_bug130140 > ./chr9_log130140 &

              tail -50 chr9_log130140
              Bam file name /verhaak-data/users2/xhu3/RNASeq_validation/HF3205_TTAGGC.unaligned.withRG.GATKRecalibrated.flagged.bam
              Number of split-reads so far 1627373

              Insertsize in config: 250
              Number of problematic reads in current window: 3316451, + 1723170 - 1593281
              Number of split-reads where the close end could be mapped: 1710740, + 815723 - 895017
              Percentage of problematic reads with close end mapped: + 47.34% - 56.17%
              BAM file index 18
              Bam file name /verhaak-data/users2/xhu3/RNASeq_validation/HF3245_GTCCGC.unaligned.withRG.GATKRecalibrated.flagged.bam
              Number of split-reads so far 1710740

              There are 6313891 reads supporting the reference allele.
              There are 19 samples.
              SampleName2Index done
              declaring g_RefCoverageRegion for 19 samples and 5000001 positions.
              There are 1710740 split-reads for this chromosome region.

              There are 0 split-reads mapped by aligner.
              search far ends
              Far end searching completed for this window.
              update FarFragName
              update FarFragName done
              save interchromsome SR
              Searching and reporting variations
              Reads already used: 0
              Far ends already mapped 1368438
              Checksum of far ends: 853428827
              Searching some type of variant, replace this with the correct name in child class ...
              Total: 992452 +446326 -546126
              Sorting and outputing deletions ...
              Deletions: 1860

              Searching deletion-insertions ...
              Total: 33596 +18337 -15259
              Sorting and outputing deletions with non-template sequences ...
              Added: Sorting and outputing deletions with non-template sequences ...
              deletions with non-template sequences: 268

              Searching tandem duplication events ...
              Total: 10882 +4788 -6094
              Sorting and outputing tandem duplications ...
              Tandem duplications: 8

              Searching tandem duplication events with non-template sequence ...
              Total: 4511 +997 -3514
              Sorting and outputing tandem duplications with non-template sequence ...
              Tandem duplications with non-template sequence (TD_NT): 2

              Searching inversions ...


              • #8
                Hi GFP,

                The running log looks fine to me. I noticed the string "unaligned" in your file name. Is your BAM file complete, with no reads removed?

                Please send me some data so I can reproduce the error.

                Kai


                • #9
                  I will try to extract part of the BAM file (100 in depth) to send to you; a single intact file is too large to send by mail. Thanks!


                  • #10
                    Where can I find the source code for Pindel 0.2.5?
                    And what is the difference between Pindel 0.2.4 and Pindel 0.2.5?

                    Thanks!
                    Last edited by GFP; 06-01-2014, 11:10 PM.


                    • #11
                      I ran Pindel 0.2.4 on our HPC cluster, but the program stopped. How should I debug this?
                      ############################################ This is the PBS file
                      # !/bin/bash
                      # PBS -S /bin/bash
                      # PBS -N PINDEL_test
                      # PBS -d /scratch/genomic_med/xhu3/Pindel
                      # PBS -e /scratch/genomic_med/xhu3/Pindel -o /scratch/genomic_med/xhu3/Pindel
                      # PBS -q medium
                      # PBS -l nodes=20: ppn24, walltime=24:00:00
                      # PBS -V
                      /scratch/rists/hpcapps/x86_64/pindel/0.2.4/pindel -f /scratch/genomic_med/xhu3/Pindel/Homo_sapiens_assembly19.fasta -i /scratch/genomic_med/xhu3/Pindel/HF.Conf.txt -c 21 -o HF_1 > /scratch/genomic_med/xhu3/Pindel/HF_1_logfile

                      $ tail HF_1_logfile
                      adding interchr 20 5166330 5166906 576 3 147667641 147668217 576 to breakdancer events. Support: 5 6
                      adding interchr 20 5165153 5165805 652 1 150756131 150756707 576 to breakdancer events. Support: 9 10
                      adding interchr 20 5044685 5045337 652 6 37623197 37623773 576 to breakdancer events. Support: 7 8
                      adding interchr 20 5023012 5023588 576 3 141574379 141574955 576 to breakdancer events. Support: 9 10
                      adding interchr 20 5012933 5013509 576 11 11199500 11200076 576 to breakdancer events. Support: 5 6
                      summarize BP as BD complete. Now start sorting BD...
                      sorting BD... done.
                      external BD events: 0 Added BreakDancer-like events: 21

                      Insertsize in config: 250
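
One thing worth checking in the PBS file above: if the directives really are written as `# PBS` with a space after the `#` (and are not just displayed that way by the forum), Torque/PBS would normally read them as ordinary comments, so the queue, node, and walltime requests would be silently ignored. The resource request `nodes=20: ppn24` also does not match the usual `nodes=N:ppn=M` form. Below is a minimal sketch of a check; `find_ignored_pbs_lines` is a hypothetical helper, and the corrected header in the trailing comment is an assumption about what was intended, not a confirmed fix.

```shell
#!/bin/sh
# Print (with line numbers) lines that look like PBS directives but have
# whitespace between '#' and 'PBS', so the scheduler would treat them as comments.
# find_ignored_pbs_lines is a hypothetical helper, not a PBS tool.
find_ignored_pbs_lines() {
    grep -nE '^#[[:space:]]+PBS([[:space:]]|$)' "$1"
}

# Example: write a two-line sample file and check it.
sample=$(mktemp)
printf '%s\n' '# PBS -q medium' '#PBS -N PINDEL_test' > "$sample"
find_ignored_pbs_lines "$sample"    # flags only the first line
rm -f "$sample"

# A header in the usual form might look like this (assumed intent, one node):
#   #!/bin/bash
#   #PBS -N PINDEL_test
#   #PBS -q medium
#   #PBS -l nodes=1:ppn=24,walltime=24:00:00
```

Running the helper on the real job script, for example `find_ignored_pbs_lines pindelrun1.PBS`, would list every directive the scheduler is likely to skip.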


                      • #12
                        Originally posted by GFP
                        I will try to extract part of the BAM file (100 in depth) to send to you; a single intact file is too large to send by mail. Thanks!

                        You could use Dropbox or Box.


                        • #13
                          Originally posted by GFP
                          Where can I find the source code for Pindel 0.2.5?
                          And what is the difference between Pindel 0.2.4 and Pindel 0.2.5?

                          Some bugs were fixed and a few performance optimizations were made.


                          • #14
                            I still cannot run Pindel on the HPC cluster; it stops automatically. Thanks!
                            $ more pindelrun1.PBS
                            # !/bin/bash
                            # PBS -S /bin/bash
                            # PBS -N PINDEL_test
                            # PBS -d /scratch/genomic_med/xhu3/Pindel
                            # PBS -e /scratch/genomic_med/xhu3/Pindel -o /scratch/genomic_med/xhu3/Pindel
                            # PBS -q medium
                            # PBS -l nodes=20: ppn48, walltime=72:00:00
                            # PBS -V
                            /scratch/rists/hpcapps/x86_64/pindel/0.2.4/pindel -f /scratch/genomic_med/xhu3/Pindel/Homo_sapiens_assembly19.fasta -i /scratch/genomic_med/xhu3/Pindel/HF.Conf.txt -c 20:5,000,000-7,000,000 -o HF_1 > /scratch/genomic_med/xhu3/Pindel/HF_1_logfile


                            $ more HF_1_logfile
                            Initializing parameters...
                            Pindel version 0.2.5a4, May 2 2014.
                            Loading reference genome ...
                            Loading reference genome done.
                            Initializing parameters done.
                            SearchRegion::SearchRegion
                            Processing region: 20 5000000 7000000
                            Chromosome Size: 63025520
                            NumBoxes: 60015 BoxSize: 2107

                            Looking at chromosome 20 bases 5000000 to 7000000 of the bed region: chromosome 20:5000000-7000000
                            /scratch/genomic_med/xhu3/Pindel/HF3245.merged.bam RP 8
                            Discovery RP: 8
                            sorting RP complete.
                            Reads_RP.size(): 8
                            sorting read-pair
                            sorting read-pair finished.
                            Modify RP complete.
                            adding BD from RP.
                            modify and summarize interchr RP.
                            Reads_RP.size(): 488
                            sorting read-pair
                            sorting InterChr read-pair finished.
                            adding BD from interChr RP.
                            adding interchr 20 6698136 6698712 576 19 46995182 46995758 576 to breakdancer events. Support: 5 6
                            adding interchr 20 6277228 6277804 576 7 99667321 99667897 576 to breakdancer events. Support: 7 8
                            adding interchr 20 5671433 5672085 652 19 31020802 31021378 576 to breakdancer events. Support: 7 8
                            summarize BP as BD complete. Now start sorting BD...
                            sorting BD... done.
                            external BD events: 0 Added BreakDancer-like events: 3

                            Insertsize in config: 250
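
Both HPC logs stop right after `Insertsize in config: 250` without any error message. The command line redirects only standard output with `>`, so anything Pindel or the shell writes to standard error ends up elsewhere (normally in the PBS error file), and the exit status is not recorded at all. A minimal sketch of a wrapper that keeps both in one place follows; `run_logged` is a hypothetical helper name, and the `sh -c` command is a stand-in for the real pindel command line.

```shell
#!/bin/sh
# Run a command, send stdout and stderr to one log file, and append the exit status.
# run_logged is a hypothetical helper; usage: run_logged LOGFILE command [args...]
run_logged() {
    log=$1; shift
    "$@" > "$log" 2>&1
    status=$?
    echo "exit status: $status" >> "$log"
    return "$status"
}

# Stand-in command for demonstration; in the job script the pindel command goes here.
demo_log=$(mktemp)
run_logged "$demo_log" sh -c 'echo progress; echo "some error" >&2; exit 3' || true
cat "$demo_log"
rm -f "$demo_log"
```

An exit status above 128 usually means the process was ended by a signal (status minus 128), which would point toward a walltime or memory limit instead of a bug in the inversion search.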


                            • #15
                              Pindel speedup issue

                              I have 40 BAM files in the config file, each 30 GB, and I am running each chromosome as one PBS job. How much memory does each job need? Does each job read the entire BAM file into memory at once?

                              Pindel can use 24 ppn on one node, right?

                              Thanks,
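
The thread does not answer the memory question. The earlier log lines ("reads in current window", "declaring g_RefCoverageRegion for 19 samples and 5000001 positions") suggest that Pindel works through each chromosome window by window instead of loading whole BAM files. As far as I recall from Pindel's help text, `-w/--window_size` sets that window in millions of bases (default 5) and `-T/--number_of_threads` sets the thread count; both are assumptions here and should be confirmed with `pindel -h` for the installed version. The sketch below only prints one command per chromosome; `print_chrom_jobs` and the output names are hypothetical.

```shell
#!/bin/sh
# Dry run: print one Pindel command per chromosome, each meant for its own single-node job.
# -T (threads) and -w (window size in Mb) are assumptions to verify with `pindel -h`.
# print_chrom_jobs is a hypothetical helper; usage: print_chrom_jobs THREADS WINDOW_MB CHROM...
print_chrom_jobs() {
    threads=$1; window_mb=$2; shift 2
    for chrom in "$@"; do
        echo "pindel -f Homo_sapiens_assembly19.fasta -i HF.Conf.txt -c $chrom -T $threads -w $window_mb -o HF_chr$chrom > HF_chr${chrom}_log.txt 2>&1"
    done
}

print_chrom_jobs 24 5 20 21 22
```

If memory turns out to be the limit, lowering the window size would be the first thing to try under these assumptions, at the cost of a longer run.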
