  • #76
    Great tool! I would like to make a couple of suggestions for future versions; if these features already exist in the current version, please let me know.

    1. Do not use reads marked as PCR or optical duplicates when identifying variants.
    2. Allow an intervals file to be used.

    Both options would speed up the program, and the first would also make the results more accurate.
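A minimal Python sketch of suggestion 1 as a pre-filtering step. It assumes plain SAM text input, for example the output of `samtools view -h`. Records whose FLAG has the 0x400 bit set (PCR or optical duplicate, the bit Picard MarkDuplicates sets) are dropped. This is not a Pindel option: `samtools view -F 0x400` does the same filtering directly on a BAM file. The function names are hypothetical.

```python
from typing import Iterable, Iterator

DUPLICATE_FLAG = 0x400  # SAM FLAG bit: PCR or optical duplicate


def is_duplicate(sam_line: str) -> bool:
    """True if the alignment record's FLAG (column 2) has the duplicate bit set."""
    flag = int(sam_line.rstrip("\n").split("\t")[1])
    return (flag & DUPLICATE_FLAG) != 0


def drop_duplicates(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged and only the non-duplicate alignment records."""
    for line in sam_lines:
        if line.startswith("@") or not is_duplicate(line):
            yield line
```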

    Thanks!



    • #77
      For 1, we count both the total number of reads and the number of unique reads. Some validation methods use amplification and produce many duplicated reads, so we were asked not to remove duplicates but to count them separately.
      I'm not sure about suggestion 2; please clarify.

      Kai



      • #78
        Originally posted by bwubb:
        Can I request a max-supporting-samples option for the pindel2vcf program? I did not see one in the --help output, and for my purposes I want to filter out any SV that occurs in more than 2 samples.

        Or am I missing how to do this? Thank you.

        Sorry for the late reply; I had not come back to this thread.

        For this filtering, you may be better off using awk to select the calls you want and then passing those header lines to pindel2vcf.

        grep BP outputfile | awk '{if ($10 <=2) print}' > head.txt
        then pindel2vcf -p head.txt

        I do not remember which column holds the number of samples containing the variant. I just put $10 here; you may have to check before continuing.
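Column positions can shift, so looking the value up by its label may be safer than a fixed `$10`. In the summary line posted in #81 of this thread, the 10th whitespace-separated field is the breakpoint coordinate (3016666), and the sample count comes after the `NumSupSamples` label. The sketch below assumes the first number after `NumSupSamples` is the number of supporting samples. The function names are hypothetical.

```python
from typing import Iterable, Iterator, Optional


def supporting_samples(line: str) -> Optional[int]:
    """First number after the 'NumSupSamples' token, or None for non-summary lines."""
    tokens = line.split()
    if "NumSupSamples" not in tokens:
        return None  # read lines and '####' separators carry no summary fields
    return int(tokens[tokens.index("NumSupSamples") + 1])


def filter_by_max_samples(lines: Iterable[str], max_samples: int = 2) -> Iterator[str]:
    """Yield only summary lines whose supporting-sample count is <= max_samples."""
    for line in lines:
        n = supporting_samples(line)
        if n is not None and n <= max_samples:
            yield line
```

Following the suggestion above, the kept lines can be written to head.txt and passed to `pindel2vcf -p head.txt`.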

        Kai



        • #79
          Originally posted by bwubb:
          What is SVTYPE RPL?

          Replacement. Pindel is able to predict variants with inserted sequence around the breakpoint, for example a 10 kb deletion with a 5 bp insertion.

          Kai
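Here is a minimal Python sketch of what a replacement means at the sequence level, using a toy reference. `del_len` reference bases are removed and the `inserted` bases take their place. The REF/ALT pair follows the common VCF convention of prepending the base just before the event. This sketch does not confirm that pindel2vcf encodes RPL records exactly this way, and all names are hypothetical.

```python
from typing import Tuple


def apply_replacement(ref_seq: str, del_start: int, del_len: int, inserted: str) -> str:
    """Sequence after deleting del_len bases at 0-based del_start and inserting `inserted`."""
    if del_start < 0 or del_start + del_len > len(ref_seq):
        raise ValueError("event falls outside the reference sequence")
    return ref_seq[:del_start] + inserted + ref_seq[del_start + del_len:]


def replacement_alleles(ref_seq: str, del_start: int, del_len: int,
                        inserted: str) -> Tuple[str, str]:
    """VCF-style (REF, ALT) with one padding base taken from just before the event."""
    if del_start < 1 or del_start + del_len > len(ref_seq):
        raise ValueError("need one base before the event and the event inside the sequence")
    pad = ref_seq[del_start - 1]
    return pad + ref_seq[del_start:del_start + del_len], pad + inserted
```

For example, deleting 4 bases at position 2 of "ACGTACGTAC" and inserting "TT" gives "ACTTGTAC", with REF "CGTAC" and ALT "CTT".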



          • #80
            Originally posted by mikhmv:
            KaiYe: Does Pindel accept BreakDancer files?

            I tried to use this command:

            pindel --fasta "human_g1k_v37_decoy_optimized.fasta" \
            --config-file "config_1" \
            --output-prefix "$PREFIX-$CHROM" \
            --chromosome $CHROM \
            --number_of_threads 10 \
            --max_range_index 5 \
            --report_inversions --report_duplications --report_long_insertions --report_breakpoints --report_close_mapped_reads \
            --min_NT_size 50 \
            --min_inversion_size 50 \
            --min_num_matched_bases 30 \
            --additional_mismatch 1 \
            --min_perfect_match_around_BP 3 \
            --sequencing_error_rate 0.03 \
            --maximum_allowed_mismatch_rate 0.1 \
            --anchor_quality 20 \
            --balance_cutoff 100 \
            --window_size 300 \
            --minimum_support_for_event 3 \
            --genotyping \
            --breakdancer "file_1190575.txt" \
            --output_of_breakdancer_events "breakdancer-events-$CHROM.txt" \
            --name_of_logfile $CHROM.log

            But I got neither the BreakDancer events file nor the log file. Do these options work?

            Try -Q filename; the result will be there.



            • #81
              Thanks for the quick reply. Let's discuss the duplication first. I now see where the output reports the "unique" reads for a given event. In the example below, there are 6 reads that are duplicates of each other, and Pindel recognizes and reports this:

              ####################################################################################################
              4725 D 1 NT 0 "" ChrID chr3 BP 3016666 3016668 BP_range 3016666 3016668 Supports 6 1 + 6 1 - 0 0 S1 7 SUM_MS 314 1 NumSupSamples 1 1 28_1_GTGTTA 6 1 0 0
              ATTGGATGCATAATAAAATTAAAACATTTTTTGTTTCTGGCATGGCCAATATTGCTATTTGTCTTATAGAAACCTCTTCTCATTACTAAATTATATATTCTgTATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAAGTGGTAAAGGAAGCTTCTGTGATTTCAACTTCAAGTTA
              CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 37 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:2215:15528:50861/1
              CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 37 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1312:8307:12733/1
              CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1209:17429:19881/1
              CTTATAGAAACCTCTTCTCATTGCTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1201:7760:69122/1
              CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1109:9254:43555/1
              CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1103:14378:72999/1
              ####################################################################################################


              However, Pindel does not seem to recognize duplicates properly when the two reads of a pair overlap, as in this example:

              ####################################################################################################
              4727 D 4 NT 0 "" ChrID chr3 BP 3910869 3910874 BP_range 3910869 3910877 Supports 4 4 + 2 2 - 2 2 S1 9 SUM_MS 240 1 NumSupSamples 1 1 28_1_GTGTTA 2 2 2 2
              GCAGAAATAAAAAGAAAACATCAAATGCGGCTCTTCCATGACCTGCTAGGATCTGCTTCTACCAAATCATGGATATAGAAATAGGCCCAGCTGCACACCACtcttTCTAATTATCCTGTTCTTCCAATTCCTCTTTTATGCATTTTTTTTGCCCACTCTCTCTCGAAACACAGTAGCTCTGGGAGTTGAAAATTAAGTTTTA
              AGAAATAGGCCCAGCTGCACACCAC TCTAATTATCCTGTTCTTCCAATTCCTCTTTTATGCATTTTTTTTGCCCACTCTCTCTCGAAACACAGTAGCTCTG + 3910628 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:2214:15997:96580/1
              CTCTTCTATGACCTGCTAGGATCTGCTTCTACCAAATCATGGATATAGAAATAGGCCCAGCTGCACACCAC TCTAATTATCCTGTTCTTCCAATTCCTCTT - 3911111 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:2214:15997:96580/2
              AGAAATAGGCCCAGCTGCACACCAC TCTAATTATCCTGTTCTTCCAATTCCTCTTTTATGCATTTTTTTTGCCCACTCTCTCTCGAAACACAGTAGCTCTG + 3910628 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1302:10814:73798/1
              CTCTTCTATGACCTGCTAGGATCTGCTTCTACCAAATCATGGATATAGAAATAGGCCCAGCTGCACACCAC TCTAATTATCCTGTTCTTCCAATTCCTCTT - 3911111 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1302:10814:73798/2
              ####################################################################################################

              You can see from the read names that the X and Y coordinates are the same for the forward and reverse reads, suggesting that they belong to the same pair. This is clearly one unique read pair: the forward and reverse reads overlap, there is a deletion in the overlapping portion, and the pair was duplicated. This one event is counted 4 times. I also confirmed that Picard MarkDuplicates correctly flagged all 4 reads as duplicates.

              When I rank candidates by the reported number of unique reads, events of this type are enriched at the top of my list.
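One way to count support that handles this case is to collapse mates into fragments by read name first. Fragments whose reads sit at the same strands and positions are then treated as duplicates. This minimal sketch assumes each read is given as (name ending in /1 or /2, strand, position), as printed in the Pindel output above. Using the printed position as the duplicate key is a simplification of what Picard MarkDuplicates does. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Tuple

Read = Tuple[str, str, int]  # (read name ending in /1 or /2, strand, position)


def count_support(reads: Iterable[Read]) -> Tuple[int, int, int]:
    """Return (total reads, distinct fragments, unique fragments) for one event."""
    total = 0
    fragments = defaultdict(set)
    for name, strand, pos in reads:
        total += 1
        fragment_id = name.rsplit("/", 1)[0]  # mates /1 and /2 share one fragment
        fragments[fragment_id].add((strand, pos))
    # Fragments whose reads occupy the same (strand, position) set are treated
    # as duplicates of each other.
    unique = {frozenset(ends) for ends in fragments.values()}
    return total, len(fragments), len(unique)
```

For the four reads of event 4727 this reports 4 reads, 2 fragments and 1 unique fragment. For the six reads of event 4725 it reports 6 reads and 1 unique fragment, which matches Pindel's `Supports 6 1`.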



              • #82
                Hi Kai Ye,
                My project is to analyze InDels in two groups of samples. The aim is to find deletion sites that are shared by one group of samples, while the other group of samples must match the reference at the same location. The output I need looks like this:

                TATCTTACTAAGTTATCCTCCACTAACTCCTTAAGCTTAACATGCAAAGATAtcgcaatcaagatgTCACATCATC Reference
                TATCTTACTAAGTTATCCCCCACTAACTCCTTAAGCTTAACATGCAAAGATAtcgcaatcaagatgTCACATCATC Normal1
                TATCTTACTAAGTTATCCCCCACTAACTCCTTAAGCTTAACATGCAAAGATAtcgcaatcaagatgTCACATCATC Normal1
                TATCTTACTAAGTTATCCCCCACTAACTCCTTAAGCTTAACATGCAAAGATAtcgcaatcaagatgTCACATCATC Normal2
                TATCTTACTAAGTTATCCCCCACTAACTTCTTAAGCTTAACATGCAAAGATAtcgcaatcaagatgTCACATCATC Normal3
                TATCTTACTAAGTTATCCCCCACTAACTTCTTAAGCTTAACATGCAAAGATA TCACATCATC Cancer1
                TATCTTACTAAGTTATCCCCCACTAACTCCTTAAGCTTAACATGCAAAGATA TCACATCATC Cancer2
                TATCTTACTAAGTTATCCCCCACTAACTCCTTAAGCTTAACATGCAAAGATA TCACATCATC Cancer2
                TATCTTACTAAGTTATCCCCCACTAACTTCTTAAGCTTAACATGCAAAGATA TCACATCATC Cancer3
                TATCTTACTAAGTTATCCCCCACTAACTCCTTAAGCTTAACATGCAAAGATA TCACATCATC Cancer3

                I was wondering if there are any options in Pindel that can do this.
                Thanks~



                • #83
                  Originally posted by iveryone:
                  I was wondering if there are any options in Pindel that can do this.
                  Thanks~

                  You may wish to take a look at Pindel's raw output format to see whether it fits your needs.

                  Please check the page

                  You get the alignment of the reads, with sample information, against the reference genome. You won't get the read sequences supporting the reference allele, though. There is one summary line per variant showing, for each sample, how many reads support the reference and how many support the variant.

                  Let me know if this is still not sufficient.

                  Kai
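If per-sample counts of reference-supporting and variant-supporting reads can be extracted for each call, the group comparison from #82 becomes a per-variant filter. Whether the summary line actually provides reference counts is the open question in #84. This is a minimal sketch; the thresholds `min_alt` and `min_ref` and the sample names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Counts = Dict[str, Tuple[int, int]]  # sample -> (reference reads, variant reads)


def shared_in_group(per_sample: Counts, cases: Iterable[str], controls: Iterable[str],
                    min_alt: int = 2, min_ref: int = 5) -> bool:
    """True if every case sample supports the variant and every control looks reference."""
    for sample in cases:
        _ref, alt = per_sample.get(sample, (0, 0))
        if alt < min_alt:
            return False  # a case sample without enough variant reads
    for sample in controls:
        ref, alt = per_sample.get(sample, (0, 0))
        if alt > 0 or ref < min_ref:
            return False  # a control carries the variant or has too little coverage
    return True
```

Requiring a minimum number of reference reads in the controls avoids calling a site "reference" in a sample that simply has no coverage there.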



                  • #84
                    Originally posted by KaiYe:
                    there is one summary line per variant to show you per sample how many reads support reference and how many support the variant.

                    Thanks a lot, Kai. For deletions, the end of the summary line seems to list, per sample, only how many reads support the deletion. How can I get the number of reads that support the reference allele (no deletion)?



                    • #85
                      After using samtools and sam2pindel, I ran Pindel. Here is what I get:

                      $ pindel -f [PATH]/human_g1k_v37.fasta -p [PATH]/pindel/input_pindel.txt -c 17:41,194,312-41,279,500 -o [PATH]/pindel/EI17_Pindel
                      Pindel version 0.2.4t, August 13 2012.
                      Processing chromosome: 1
                      Skipping chromosome: 1
                      Processing chromosome: 2
                      Skipping chromosome: 2
                      Processing chromosome: 3
                      Skipping chromosome: 3
                      Processing chromosome: 4
                      Skipping chromosome: 4
                      Processing chromosome: 5
                      Skipping chromosome: 5
                      Processing chromosome: 6
                      Skipping chromosome: 6
                      Processing chromosome: 7
                      Skipping chromosome: 7
                      Processing chromosome: 8
                      Skipping chromosome: 8
                      Processing chromosome: 9
                      Skipping chromosome: 9
                      Processing chromosome: 10
                      Skipping chromosome: 10
                      Processing chromosome: 11
                      Skipping chromosome: 11
                      Processing chromosome: 12
                      Skipping chromosome: 12
                      Processing chromosome: 13
                      Skipping chromosome: 13
                      Processing chromosome: 14
                      Skipping chromosome: 14
                      Processing chromosome: 15
                      Skipping chromosome: 15
                      Processing chromosome: 16
                      Skipping chromosome: 16
                      Processing chromosome: 17
                      Chromosome Size: 81195210
                      NumBoxes: 60004 BoxSize: 3373

                      Looking at chromosome 17 bases 41194312 to 41279500.
                      getReads 17 101195210
                      Scanning and processing reads anchored in 17
                      last one: 0 and UPCLOSE= 0

                      The last read Pindel scanned:
                      @PC-LABO-NGS-MIS_25:1:1103:23265:10796
                      TAAGGGTGGGTAGGTTTGTTGGTATCCTAGTGGGTGAGGGGTGGCTTTGGAGTTGCAGTTGATGTGTGATAGTTGAGGGTTGATTGCTGTACTTGCTTGTAAGCATGGGGGGGGGGGGTTTTGATGGGGTTTGGGTTTTTATGT
                      + chrM 16129 254 220 patient

                      Number of reads in current window: 0, + 0 - 0
                      Number of reads where the close end could be mapped: 0, + 0 - 0
                      Percentage of reads which could be mapped: + 0.00% - 0.00%

                      No reads found in [PATH]/input_pindelbis.txt
                      There are no reads for this bin.
                      Loading genome sequences and reads: 0 seconds.
                      Mining, Sorting and output results: 0 seconds.
                      Do you have any idea?

                      Thank you in advance for your help



                      • #86
                        Originally posted by tonio100680:
                        After using samtools and sam2pindel, I ran Pindel. Here is what I get:

                        Do you have any idea?

                        Thank you in advance for your help

                        The chromosome names in your reference file are "1", "2", ..., but "chr1" in your extracted file. Make sure you use the same reference file for mapping and for running Pindel.

                        You can use -i to run Pindel directly on BAM files. Please read the user manual; the latest version is 0.2.5, at https://github.com/genome/pindel
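A quick way to catch this mismatch before running Pindel is to compare the contig names in the reference FASTA with those in the reads file. This minimal sketch assumes the sam2pindel-style layout seen in the log above, where the line after each read sequence starts with the strand followed by the chromosome name (e.g. `+ chrM 16129 254 220 patient`). The function names are hypothetical.

```python
from typing import Iterable, List, Set


def fasta_contig_names(fasta_lines: Iterable[str]) -> Set[str]:
    """Contig names from FASTA headers: text after '>' up to the first whitespace."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def read_contig_names(pindel_lines: Iterable[str]) -> Set[str]:
    """Chromosome names from lines of the form '<strand> <chrom> <pos> ...'."""
    names = set()
    for line in pindel_lines:
        fields = line.split()
        if len(fields) >= 3 and fields[0] in ("+", "-"):
            names.add(fields[1])
    return names


def contigs_missing_from_reference(fasta_lines: Iterable[str],
                                   pindel_lines: Iterable[str]) -> List[str]:
    """Chromosome names used by the reads but absent from the reference."""
    return sorted(read_contig_names(pindel_lines) - fasta_contig_names(fasta_lines))


def chr_prefix_hint(missing: Iterable[str], reference_names: Set[str]) -> List[str]:
    """Missing names that would match the reference if the 'chr' prefix were dropped."""
    return [name for name in missing
            if name.startswith("chr") and name[3:] in reference_names]
```

An empty result from `contigs_missing_from_reference` is a good sign. A non-empty `chr_prefix_hint` points to exactly the "1" versus "chr1" naming problem described above.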



                        • #87
                          Hi, I was wondering what the -g/--genotyping option does. The Pindel documentation says:
                          -g/--genotyping
                          gentype variants if -i is also turn true.

                          I don't really understand what that means. -i specifies the input file that lists the BAM files. So does this mean genotyping can only be carried out if you use BAM input files? I can't find any information that explains it better. To make matters worse, I also don't fully understand what genotyping variants means.
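In general terms, genotyping a variant means deciding, for each sample, whether it is absent (0/0), heterozygous (0/1), or homozygous (1/1). This is usually done from the numbers of reads supporting the reference and the variant. The sketch below is a generic binomial-likelihood illustration with an assumed error rate. It is not Pindel's implementation of -g.

```python
import math

# Expected fraction of variant-supporting reads under each genotype.
GENOTYPES = {"0/0": 0.0, "0/1": 0.5, "1/1": 1.0}


def genotype(ref_reads: int, alt_reads: int, error_rate: float = 0.01) -> str:
    """Return the genotype with the highest binomial log-likelihood.

    The binomial coefficient is the same for every genotype, so it is omitted.
    """
    best_gt, best_ll = "", -math.inf
    for gt, frac in GENOTYPES.items():
        # Keep p away from 0 and 1 so that reads against a homozygous genotype
        # count as sequencing errors instead of making the likelihood zero.
        p_alt = min(max(frac, error_rate), 1 - error_rate)
        ll = alt_reads * math.log(p_alt) + ref_reads * math.log(1 - p_alt)
        if ll > best_ll:
            best_gt, best_ll = gt, ll
    return best_gt
```

For example, 20 reference reads and no variant reads gives 0/0, 10 and 10 gives 0/1, and 0 and 20 gives 1/1.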



                          • #88
                            -A/--anchor_quality
                            the minimal mapping quality of the reads Pindel uses as anchor
                            (default 20)

                            Sorry, I also have another question. Is the anchor quality (above) the threshold alignment score for the mapped read? I just want to make sure I understand correctly.

                            And in the case of:

                            -x/--max_range_index
                            the maximum size of structural variations to be detected; the higher this number, the greater the number of SVs reported, but the computational cost and memory requirements increase, as does the rate of false positives. 1=128, 2=512, 3=2,048, 4=8,092, 5=32,368, 6=129,472, 7=517,888, 8=2,071,552, 9=8,286,208 (maximum 9, default 5)

                            Is this the same as the user-defined maximum deletion size parameter (Max_D_Size) referred to in the original Pindel paper?
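On the size list itself: the first three documented values (128, 512, 2,048) grow by a factor of 4 per step. Continuing that pattern gives 8,192 at index 4, not 8,092. Each later documented value is exactly 4 times the previous one starting from 8,092, so the discrepancy looks like a typo carried forward. This sketch only checks the arithmetic and does not confirm what Pindel computes internally.

```python
from typing import List

# Values as listed in the quoted -x/--max_range_index help text.
DOCUMENTED = {1: 128, 2: 512, 3: 2048, 4: 8092, 5: 32368, 6: 129472,
              7: 517888, 8: 2071552, 9: 8286208}


def max_sv_size(index: int) -> int:
    """Size implied by continuing the x4 pattern of the first three documented values."""
    if not 1 <= index <= 9:
        raise ValueError("max_range_index must be between 1 and 9")
    return 128 * 4 ** (index - 1)


def discrepancies() -> List[int]:
    """Indices whose documented value differs from the x4 pattern."""
    return [i for i, value in sorted(DOCUMENTED.items()) if max_sv_size(i) != value]
```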



                            • #89
                              Hi Kai Ye,
                              I can't solve this problem. When I run
                              "$ pindel -f ref.fa -p out.bam2pindel.txt -c ALL -o outpindel" or
                              "$ pindel -f ref.fa -i pindel.config -c ALL -o outpindel"
                              I get:
                              Welcome to Pindel, developed by Kai Ye, [email protected]

                              6 parameters are required here:
                              1. Input: the reference genome sequences in fasta format;
                              2. Input: the unmapped reads in a modified fastq format;
                              3. Output: the output for short insertions (SI)
                              4. Output: the output for deletions (D)
                              5. Output: the output for a special type of deletion events, non-template insertion after deletion (DI).
                              deletions >= 100bp and inserted bp <= 7bp
                              6. Which chr/fragment

                              $ pindel ref.fa out.bam2pindel.txt 320 out.si out.d out.di
                              Processing chromosome gi|411024077|gb|CM001634.1| ...
                              Current chromosome size: 28635137 bases
                              Processing chromosome gi|411024076|gb|CM001635.1| ...
                              Current chromosome size: 27864329 bases
                              Processing chromosome gi|411024075|gb|CM001636.1| ...
                              Current chromosome size: 31725688 bases
                              Processing chromosome gi|411024074|gb|CM001637.1| ...
                              Current chromosome size: 18984343 bases
                              Processing chromosome gi|411024073|gb|CM001638.1| ...
                              Current chromosome size: 23960834 bases
                              Processing chromosome gi|411024072|gb|CM001639.1| ...
                              Current chromosome size: 26286742 bases
                              Processing chromosome gi|411024071|gb|CM001640.1| ...
                              Current chromosome size: 22597724 bases
                              Processing chromosome gi|411024070|gb|CM001641.1| ...
                              Current chromosome size: 21613650 bases
                              Processing chromosome gi|411024069|gb|CM001642.1| ...
                              Current chromosome size: 37155481 bases
                              Processing chromosome gi|411024068|gb|CM001643.1| ...
                              Current chromosome size: 17599535 bases
                              Loading genome sequences and reads: 0 seconds.
                              Mining indels: 0 seconds.
                              Sorting and output results: 0 seconds.
                              Do you have any idea?

                              Thank you in advance for your help



                              • #90
                                Hi Kai Ye,

                                I have a BAM file generated by Bowtie. Can I use it directly with Pindel for SV detection, or do I need to pre-process the BAM file first? Any suggestions are welcome.
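Kai's reply in #86 notes that -i runs Pindel directly on BAM files through a config file. Below is a minimal sketch of preparing one line of such a file. It assumes the column order is BAM path, insert size, sample label, and it estimates the insert size as the median |TLEN| of properly paired, first-in-pair records from `samtools view` output. It does not address whether Bowtie output needs further pre-processing, and the names are hypothetical.

```python
import statistics
from typing import Iterable, Optional

PROPER_PAIR = 0x2
FIRST_IN_PAIR = 0x40


def median_insert_size(sam_lines: Iterable[str]) -> Optional[int]:
    """Median |TLEN| over properly paired, first-in-pair records; None if there are none."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        if flag & PROPER_PAIR and flag & FIRST_IN_PAIR and tlen != 0:
            sizes.append(abs(tlen))
    return int(statistics.median(sizes)) if sizes else None


def config_line(bam_path: str, insert_size: int, sample_label: str) -> str:
    """One tab-separated line of a Pindel -i config file (assumed column order)."""
    return f"{bam_path}\t{insert_size}\t{sample_label}"
```

Counting only first-in-pair records means each pair contributes one size, not two.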
