**hshain** · 01-31-2013, 01:07 PM

Great tool! I would like to make a couple of suggestions for future versions. If I'm missing these features in the current version, please let me know.

1. Do not use reads marked as PCR or optical duplicates when identifying variants.
2. Allow an intervals file to be used.

Both options would speed up the program, and the first option would make the results more accurate.

Thanks!

**KaiYe** · 01-31-2013, 01:18 PM

For 1, we count both the total number of reads and the number of unique reads. Some validation methods use amplification and would produce a lot of duplicate reads. We were asked not to remove duplicates but to count them separately.
I am not sure about suggestion 2. Please clarify.

Kai

**KaiYe** · 01-31-2013, 01:25 PM

Originally posted by bwubb

Can I request a max supporting samples option for the pindel2vcf program? I did not see that as an option in the --help and for my purposes, I want to filter out any SV that occurs in more than 2 samples.

Or am I missing how to do this? Thank you.

Sorry for the late reply. I did not come back to this thread.

For this filtering, you are better off using awk to select the calls you want and then passing the header lines to pindel2vcf:

grep BP outputfile | awk '{if ($10 <=2) print}' > head.txt
then pindel2vcf -p head.txt

I do not remember which column holds the number of samples containing the variant. I just put $10 here as a placeholder, so check the column before you continue.
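Since the column number is uncertain, one option is to look up the value by its label instead of a fixed column. Below is a minimal Python sketch, assuming the header layout seen in the Pindel output posted further down this thread, where a `NumSupSamples` label is followed by the sample count. The function names are hypothetical and not part of Pindel or pindel2vcf, and taking the first number after the label as the count of supporting samples is an assumption to verify against your own output.

```python
def num_supporting_samples(header_line):
    """Return the integer that follows the NumSupSamples label, or None."""
    fields = header_line.split()
    if "NumSupSamples" not in fields:
        return None  # not a header line in the assumed layout
    idx = fields.index("NumSupSamples")
    if idx + 1 >= len(fields):
        return None
    # Assumption: the first number after the label is the sample count.
    return int(fields[idx + 1])


def filter_headers(lines, max_samples=2):
    """Keep header lines supported by at most max_samples samples."""
    kept = []
    for line in lines:
        n = num_supporting_samples(line)
        if n is not None and n <= max_samples:
            kept.append(line)
    return kept
```

The kept lines could then be written to head.txt and passed to pindel2vcf as in the command above.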

Kai

**KaiYe** · 01-31-2013, 01:26 PM

Originally posted by bwubb

What is SVTYPE RPL?

Replacement. Pindel can predict variants with inserted sequence around the breakpoint, for example a 10 kb deletion with a 5 bp insertion.

Kai

**KaiYe** · 01-31-2013, 01:27 PM

Originally posted by mikhmv

KaiYe: Does pindel accept breakdancer files?

I tried to use this command: pindel --fasta "human_g1k_v37_decoy_optimized.fasta" \
--config-file "config_1" \
--output-prefix "$PREFIX-$CHROM" \
--chromosome $CHROM \
--number_of_threads 10 \
--max_range_index 5 \
--report_inversions --report_duplications --report_long_insertions --report_breakpoints --report_close_mapped_reads \
--min_NT_size 50 \
--min_inversion_size 50 \
--min_num_matched_bases 30 \
--additional_mismatch 1 \
--min_perfect_match_around_BP 3 \
--sequencing_error_rate 0.03 \
--maximum_allowed_mismatch_rate 0.1 \
--anchor_quality 20 \
--balance_cutoff 100 \
--window_size 300 \
--minimum_support_for_event 3 \
--genotyping \
--breakdancer "file_1190575.txt" \
--output_of_breakdancer_events "breakdancer-events-$CHROM.txt" \
--name_of_logfile $CHROM.log

But I got neither the breakdancer-events file nor the log file. Do these options work?

Try -Q filename. The result will be there.

**hshain** · 02-01-2013, 03:07 PM

Thanks for the quick reply. Let's discuss the duplication first. I now see in the output where the "unique" reads are reported for a given event. In the example below, there are 6 reads which are duplicates of each other and Pindel recognizes and reports this:

####################################################################################################
4725 D 1 NT 0 "" ChrID chr3 BP 3016666 3016668 BP_range 3016666 3016668 Supports 6 1 + 6 1 - 0 0 S1 7 SUM_MS 314 1 NumSupSamples 1 1 28_1_GTGTTA 6 1 0 0
ATTGGATGCATAATAAAATTAAAACATTTTTTGTTTCTGGCATGGCCAATATTGCTATTTGTCTTATAGAAACCTCTTCTCATTACTAAATTATATATTCTgTATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAAGTGGTAAAGGAAGCTTCTGTGATTTCAACTTCAAGTTA
CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 37 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:2215:15528:50861/1
CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 37 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1312:8307:12733/1
CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1209:17429:19881/1
CTTATAGAAACCTCTTCTCATTGCTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1201:7760:69122/1
CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1109:9254:43555/1
CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1103:14378:72999/1
####################################################################################################

However, Pindel does not seem to properly recognize duplicates when the paired end reads run into each other as in this example:

####################################################################################################
4727 D 4 NT 0 "" ChrID chr3 BP 3910869 3910874 BP_range 3910869 3910877 Supports 4 4 + 2 2 - 2 2 S1 9 SUM_MS 240 1 NumSupSamples 1 1 28_1_GTGTTA 2 2 2 2
GCAGAAATAAAAAGAAAACATCAAATGCGGCTCTTCCATGACCTGCTAGGATCTGCTTCTACCAAATCATGGATATAGAAATAGGCCCAGCTGCACACCACtcttTCTAATTATCCTGTTCTTCCAATTCCTCTTTTATGCATTTTTTTTGCCCACTCTCTCTCGAAACACAGTAGCTCTGGGAGTTGAAAATTAAGTTTTA
AGAAATAGGCCCAGCTGCACACCAC TCTAATTATCCTGTTCTTCCAATTCCTCTTTTATGCATTTTTTTTGCCCACTCTCTCTCGAAACACAGTAGCTCTG + 3910628 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:2214:15997:96580/1
CTCTTCTATGACCTGCTAGGATCTGCTTCTACCAAATCATGGATATAGAAATAGGCCCAGCTGCACACCAC TCTAATTATCCTGTTCTTCCAATTCCTCTT - 3911111 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:2214:15997:96580/2
AGAAATAGGCCCAGCTGCACACCAC TCTAATTATCCTGTTCTTCCAATTCCTCTTTTATGCATTTTTTTTGCCCACTCTCTCTCGAAACACAGTAGCTCTG + 3910628 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1302:10814:73798/1
CTCTTCTATGACCTGCTAGGATCTGCTTCTACCAAATCATGGATATAGAAATAGGCCCAGCTGCACACCAC TCTAATTATCCTGTTCTTCCAATTCCTCTT - 3911111 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1302:10814:73798/2
####################################################################################################

You can see from the read names that the X and Y coordinates are the same for the forward and reverse reads, which suggests that they belong to the same pair. It appears that this is one unique paired-end fragment: the forward and reverse reads overlap, there is a deletion in the overlapping portion, and the fragment was duplicated. This is one event that is counted 4 times. I also confirmed that Picard MarkDuplicates correctly flagged all 4 reads as duplicates.

When I rank order candidates by the reported number of unique reads, these types of events are enriched at the top of my list.
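One possible workaround on the user side is to collapse the supporting reads before ranking. The Python sketch below is only an illustration and not how Pindel counts support. It assumes the read-line layout shown in the examples above, where each line ends with strand, position, mapping quality, sample, and a read name with a /1 or /2 suffix. The function name is hypothetical, and only the read lines of one event should be passed in, not the header or reference lines.

```python
from collections import defaultdict


def collapse_support(read_lines):
    """Return (reads, fragments, unique_fragments) for one event's read lines."""
    fragments = defaultdict(set)
    n_reads = 0
    for line in read_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # too short to be a read line in the assumed layout
        name = fields[-1]
        strand, position = fields[-5], fields[-4]
        # Drop the /1 or /2 mate suffix so both mates map to one fragment.
        fragment_id = name[:-2] if name.endswith(("/1", "/2")) else name
        fragments[fragment_id].add((strand, position))
        n_reads += 1
    # Fragments whose reads share the same strands and positions are
    # treated as duplicates of each other.
    signatures = {frozenset(sig) for sig in fragments.values()}
    return n_reads, len(fragments), len(signatures)
```

On the four read lines of event 4727 above, this gives 4 reads, 2 fragments, and 1 unique fragment, which matches the interpretation that a single event is being counted 4 times.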

**iveryone** · 05-09-2013, 11:17 PM

Hi, Kai Ye
My project is to analyze indels in two groups of samples. The aim is to find deletion sites that are shared within one group of samples, while the other group
of samples must match the reference at the same location. The output I need looks like this:

TATCTTACTAAGTTATCCTCCACTAACTCCTTAAGCTTAACATGCAAAGATAtcgcaatcaagatgTCACATCATC Reference
TATCTTACTAAGTTATCCCCCACTAACTCCTTAAGCTTAACATGCAAAGATAtcgcaatcaagatgTCACATCATC Normal1
TATCTTACTAAGTTATCCCCCACTAACTCCTTAAGCTTAACATGCAAAGATAtcgcaatcaagatgTCACATCATC Normal1
TATCTTACTAAGTTATCCCCCACTAACTCCTTAAGCTTAACATGCAAAGATAtcgcaatcaagatgTCACATCATC Normal2
TATCTTACTAAGTTATCCCCCACTAACTTCTTAAGCTTAACATGCAAAGATAtcgcaatcaagatgTCACATCATC Normal3
TATCTTACTAAGTTATCCCCCACTAACTTCTTAAGCTTAACATGCAAAGATA TCACATCATC Cancer1
TATCTTACTAAGTTATCCCCCACTAACTCCTTAAGCTTAACATGCAAAGATA TCACATCATC Cancer2
TATCTTACTAAGTTATCCCCCACTAACTCCTTAAGCTTAACATGCAAAGATA TCACATCATC Cancer2
TATCTTACTAAGTTATCCCCCACTAACTTCTTAAGCTTAACATGCAAAGATA TCACATCATC Cancer3
TATCTTACTAAGTTATCCCCCACTAACTCCTTAAGCTTAACATGCAAAGATA TCACATCATC Cancer3

I was wondering if there is any options in Pindel that can do this.
Thanks~

**KaiYe** · 05-10-2013, 09:58 AM

Originally posted by iveryone

Hi, Kai Ye
My project is to analyze indels in two groups of samples. The aim is to find deletion sites that are shared within one group of samples, while the other group
of samples must match the reference at the same location. The output I need looks like this:

TATCTTACTAAGTTATCCTCCACTAACTCCTTAAGCTTAACATGCAAAGATAtcgcaatcaagatgTCACATCATC Reference
TATCTTACTAAGTTATCCCCCACTAACTCCTTAAGCTTAACATGCAAAGATAtcgcaatcaagatgTCACATCATC Normal1
TATCTTACTAAGTTATCCCCCACTAACTCCTTAAGCTTAACATGCAAAGATAtcgcaatcaagatgTCACATCATC Normal1
TATCTTACTAAGTTATCCCCCACTAACTCCTTAAGCTTAACATGCAAAGATAtcgcaatcaagatgTCACATCATC Normal2
TATCTTACTAAGTTATCCCCCACTAACTTCTTAAGCTTAACATGCAAAGATAtcgcaatcaagatgTCACATCATC Normal3
TATCTTACTAAGTTATCCCCCACTAACTTCTTAAGCTTAACATGCAAAGATA TCACATCATC Cancer1
TATCTTACTAAGTTATCCCCCACTAACTCCTTAAGCTTAACATGCAAAGATA TCACATCATC Cancer2
TATCTTACTAAGTTATCCCCCACTAACTCCTTAAGCTTAACATGCAAAGATA TCACATCATC Cancer2
TATCTTACTAAGTTATCCCCCACTAACTTCTTAAGCTTAACATGCAAAGATA TCACATCATC Cancer3
TATCTTACTAAGTTATCCCCCACTAACTCCTTAAGCTTAACATGCAAAGATA TCACATCATC Cancer3

I was wondering if there is any options in Pindel that can do this.
Thanks~

You may wish to take a look at Pindel's raw output format to see whether it fits your needs.

Please check this page:

https://trac.nbic.nl/pindel/wiki/UserManual

You get the alignment of the reads, with sample information, to the reference genome. However, you won't get the sequences of the reads that support the reference allele. There is one summary line per variant that shows, per sample, how many reads support the reference and how many support the variant.

Let me know if this is still not sufficient.

Kai

**iveryone** · 05-12-2013, 12:28 AM

Originally posted by KaiYe

You may wish to take a look at Pindel's raw output format to see whether it fits your needs.

Please check this page:

https://trac.nbic.nl/pindel/wiki/UserManual

You get the alignment of the reads, with sample information, to the reference genome. However, you won't get the sequences of the reads that support the reference allele. There is one summary line per variant that shows, per sample, how many reads support the reference and how many support the variant.

Let me know if this is still not sufficient.

Kai

Thanks a lot, Kai. For deletions, the end of the summary line seems to list, per sample, only the number of reads supporting the deletion. How can I get the number of reads that support the reference allele (no deletion)?

**tonio100680** · 06-11-2013, 06:54 AM

After converting my data with samtools and sam2pindel, I ran Pindel. Here is what I get:

$ pindel -f [PATH]/human_g1k_v37.fasta -p [PATH]/pindel/input_pindel.txt -c 17:41,194,312-41,279,500 -o [PATH]/pindel/EI17_Pindel
Pindel version 0.2.4t, August 13 2012.
Processing chromosome: 1
Skipping chromosome: 1
Processing chromosome: 2
Skipping chromosome: 2
Processing chromosome: 3
Skipping chromosome: 3
Processing chromosome: 4
Skipping chromosome: 4
Processing chromosome: 5
Skipping chromosome: 5
Processing chromosome: 6
Skipping chromosome: 6
Processing chromosome: 7
Skipping chromosome: 7
Processing chromosome: 8
Skipping chromosome: 8
Processing chromosome: 9
Skipping chromosome: 9
Processing chromosome: 10
Skipping chromosome: 10
Processing chromosome: 11
Skipping chromosome: 11
Processing chromosome: 12
Skipping chromosome: 12
Processing chromosome: 13
Skipping chromosome: 13
Processing chromosome: 14
Skipping chromosome: 14
Processing chromosome: 15
Skipping chromosome: 15
Processing chromosome: 16
Skipping chromosome: 16
Processing chromosome: 17
Chromosome Size: 81195210
NumBoxes: 60004 BoxSize: 3373

Looking at chromosome 17 bases 41194312 to 41279500.
getReads 17 101195210
Scanning and processing reads anchored in 17
last one: 0 and UPCLOSE= 0

The last read Pindel scanned:
@PC-LABO-NGS-MIS_25:1:1103:23265:10796
TAAGGGTGGGTAGGTTTGTTGGTATCCTAGTGGGTGAGGGGTGGCTTTGGAGTTGCAGTTGATGTGTGATAGTTGAGGGTTGATTGCTGTACTTGCTTGTAAGCATGGGGGGGGGGGGTTTTGATGGGGTTTGGGTTTTTATGT
+ chrM 16129 254 220 patient

Number of reads in current window: 0, + 0 - 0
Number of reads where the close end could be mapped: 0, + 0 - 0
Percentage of reads which could be mapped: + 0.00% - 0.00%

No reads found in [PATH]/input_pindelbis.txt
There are no reads for this bin.
Loading genome sequences and reads: 0 seconds.
Mining, Sorting and output results: 0 seconds.

Do you have any idea?

Thank you in advance for your help

**KaiYe** · 06-11-2013, 07:02 AM

Originally posted by tonio100680

After converting my data with samtools and sam2pindel, I ran Pindel. Here is what I get:

Do you have any idea?

Thank you in advance for your help

The chromosome names in your reference file are "1", "2", ..., but they have a "chr" prefix ("chr1") in your extracted read file; the last read Pindel scanned is on "chrM". Make sure you use the same reference file for mapping and for running Pindel.

You can use -i to run Pindel directly on BAM files. Please read the user manual. The latest version is 0.2.5, available at https://github.com/genome/pindel
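A quick way to confirm this kind of naming mismatch before running Pindel is to compare the names in the two files. The sketch below is a rough Python illustration with hypothetical function names. It assumes the sequence name is the first word after ">" in the FASTA file, and that the read file has lines of the form "+ chrM 16129 254 220 patient" as in the log above (strand first, chromosome second).

```python
def fasta_names(lines):
    """Collect sequence names (first word after '>') from FASTA lines."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def read_chromosomes(lines):
    """Collect chromosome names from the position lines of a read file."""
    names = set()
    for line in lines:
        fields = line.split()
        # Assumed layout: strand, chromosome, position, then other fields.
        if len(fields) >= 3 and fields[0] in ("+", "-"):
            names.add(fields[1])
    return names


def unknown_chromosomes(read_names, reference_names):
    """Return read chromosome names that are missing from the reference."""
    return sorted(set(read_names) - set(reference_names))
```

Both parsing functions accept any iterable of lines, so an open file handle can be passed directly. A non-empty result from `unknown_chromosomes` points to reads that Pindel cannot place on any reference sequence.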

**dGho** · 06-26-2013, 09:27 AM

Hi, I was wondering what the -g/--genotyping does. In the Pindel documentation:
-g/--genotyping
gentype variants if -i is also turn true.

I don't really understand what that means. -i indicates the input file that lists the bam files. So does this mean genotyping can only be carried out if you use bam input files? I can't seem to find any information that explains it better. Also, to make matters worse I don't really fully understand what genotyping variants means.

**dGho** · 06-26-2013, 11:24 AM

-A/--anchor_quality
the minimal mapping quality of the reads Pindel uses as anchor
(default 20)

I am sorry, I also have another question. Is the anchor quality (above) the threshold alignment score for the mapped read? I just want to make sure I understand correctly.

and in the case of:

-x/--max_range_index
the maximum size of structural variations to be detected; the higher this number, the greater the number of SVs reported, but the computational cost and memory requirements increase, as does the rate of false positives. 1=128, 2=512, 3=2,048, 4=8,092, 5=32,368, 6=129,472, 7=517,888, 8=2,071,552, 9=8,286,208 (maximum 9, default 5)

Is this the same as the user-defined Maximum Deletion Size parameter (Max_D_Size) that is referred to in the original Pindel paper?
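For choosing a value in practice, the size table from the help text can be turned into a small lookup. This Python sketch copies the sizes exactly as quoted above and does not answer whether the option matches Max_D_Size. The function name is hypothetical.

```python
# Sizes copied verbatim from the --max_range_index help text quoted above.
MAX_RANGE_SIZES = {
    1: 128,
    2: 512,
    3: 2048,
    4: 8092,
    5: 32368,
    6: 129472,
    7: 517888,
    8: 2071552,
    9: 8286208,
}


def smallest_index_for(sv_size):
    """Return the smallest index whose listed size covers sv_size, or None."""
    for index in sorted(MAX_RANGE_SIZES):
        if MAX_RANGE_SIZES[index] >= sv_size:
            return index
    return None  # larger than the maximum listed size
```

For example, a 10 kb event needs index 5 according to this table, which is also the documented default.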

**shuiwudao** · 06-27-2013, 01:20 AM

Hi, Kai ye
I can't solve this problem:

"$ pindel -f ref.fa -p out.bam2pindel.txt -c ALL -o outpindel", or
"$ pindel -f ref.fa -i pindel.config -c ALL -o outpindel"
then:
Welcome to Pindel, developed by Kai Ye, [email protected]

6 parameters are required here:
1. Input: the reference genome sequences in fasta format;
2. Input: the unmapped reads in a modified fastq format;
3. Output: the output for short insertions (SI)
4. Output: the output for deletions (D)
5. Output: the output for a special type of deletion events, non-template insertion after deletion (DI).
deletions >= 100bp and inserted bp <= 7bp
6. Which chr/fragment

$ pindel ref.fa out.bam2pindel.txt 320 out.si out.d out.di
Processing chromosome gi|411024077|gb|CM001634.1| ...
Current chromosome size: 28635137 bases
Processing chromosome gi|411024076|gb|CM001635.1| ...
Current chromosome size: 27864329 bases
Processing chromosome gi|411024075|gb|CM001636.1| ...
Current chromosome size: 31725688 bases
Processing chromosome gi|411024074|gb|CM001637.1| ...
Current chromosome size: 18984343 bases
Processing chromosome gi|411024073|gb|CM001638.1| ...
Current chromosome size: 23960834 bases
Processing chromosome gi|411024072|gb|CM001639.1| ...
Current chromosome size: 26286742 bases
Processing chromosome gi|411024071|gb|CM001640.1| ...
Current chromosome size: 22597724 bases
Processing chromosome gi|411024070|gb|CM001641.1| ...
Current chromosome size: 21613650 bases
Processing chromosome gi|411024069|gb|CM001642.1| ...
Current chromosome size: 37155481 bases
Processing chromosome gi|411024068|gb|CM001643.1| ...
Current chromosome size: 17599535 bases
Loading genome sequences and reads: 0 seconds.
Mining indels: 0 seconds.
Sorting and output results: 0 seconds.

Do you have any idea?

Thank you in advance for your help

**sahilsukla** · 08-18-2013, 09:46 PM

Hi Kai Ye,

I have a BAM file generated by Bowtie. Can I pass it directly to Pindel for SV detection, or do I need to pre-process the BAM file first? Please advise.
