Seqanswers Leaderboard Ad

**Topulaneus-Hattum** · 02-11-2014, 12:50 PM

I wrote:
"It's not at all clear to me how sam2pindel knows whether I am giving it alignments from mate-pair instead of alignments from paired-end."

For anyone else who has a similar problem and happens upon this thread, it turns out sam2pindel has a 6th argument that isn't shown in the user manual, which is how you tell it this difference.

I now notice that even though the URL says "current" the text says Pindel 0.2.4. I'm using 0.2.5a3. Web docs lag the source.

T.Hattum

**kartong** · 03-07-2014, 04:01 AM

Hi,

As per AlisonF, I need to extract the reference sequencing depth (RD) at the position of the detected structural variant for subsequent computation.

However, after running pindel on my novoalign generated .bam files and converting the pindel output into the vcf format using pindel2vcf, I only see the "GT:AD" fields (example below) and not the "GT:RD:AD" field as shown by the other users.

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT T101a T101b
chr1 5919343 . ATC A . PASS END=5919345;HOMLEN=5;HOMSEQ=TCTCT;SVLEN=-2;SVTYPE=DEL GT:AD 0/0:0,0 0/0:4,4
chr1 16975201 . CT C . PASS END=16975202;HOMLEN=0;SVLEN=-1;SVTYPE=DEL GT:AD 0/0:57,0 0/0:53,4
chr1 17084536 . TGGAACA T . PASS END=17084542;HOMLEN=2;HOMSEQ=GG;SVLEN=-6;SVTYPE=DEL GT:AD 0/0:55,0 0/0:71,4

For my analysis, I am using "Pindel version 0.2.5a3, Oct 24 2013." with default settings and the -i option with the config file and "pindel2vcf 0.5.8" to convert the pindel file into the vcf format.

Is anyone able to assist me on this?

Thank you.

**mcorces** · 04-10-2014, 10:10 AM

Hi Kai Ye,
I'm attempting to use Pindel (version 0.2.5a3) to annotate Internal Tandem Duplications in the gene FLT3. Others (PMID: 23159595) have had great success doing this with Pindel. I've run into a problem where I'm unable to limit the search to any region using the -c option. I'm particularly interested in the region chr13:28608000-28608600. Every time I execute Pindel with the -c option set to anything other than "ALL", I get a segmentation fault:

"
Initializing parameters...
Pindel version 0.2.5a3, Oct 24 2013.
Loading reference genome ...
Loading reference genome done.
Initializing parameters done.
SearchRegion::SearchRegion
Segmentation fault (core dumped)
"

However, when I execute with -c ALL, the program runs fine (and finds the desired ITD). I've tried setting -r, -l, and -k to false and I've tried setting -w to 1 to limit the amount of memory used. I've reproduced the command I'm using below (with absolute paths removed for clarity):

pindel -f hg19.fa -i pindel_input.txt -c 13 -r false -l false -k false -t true -w 1 -o pindel_output

I could just run the program on all chromosomes but would prefer to run it just on this defined region. Any idea where this seg fault is coming from? Unfortunately to core file isnt actually getting dumped so I cant reproduce that for you.
Thanks in advance!
-Ryan

**KaiYe** · 04-10-2014, 10:21 AM

Originally posted by mcorces View Post

Hi Kai Ye,
I'm attempting to use Pindel (version 0.2.5a3) to annotate Internal Tandem Duplications in the gene FLT3. Others (PMID: 23159595) have had great success doing this with Pindel. I've run into a problem where I'm unable to limit the search to any region using the -c option. I'm particularly interested in the region chr13:28608000-28608600. Every time I execute Pindel with the -c option set to anything other than "ALL", I get a segmentation fault:

"
Initializing parameters...
Pindel version 0.2.5a3, Oct 24 2013.
Loading reference genome ...
Loading reference genome done.
Initializing parameters done.
SearchRegion::SearchRegion
Segmentation fault (core dumped)
"

However, when I execute with -c ALL, the program runs fine (and finds the desired ITD). I've tried setting -r, -l, and -k to false and I've tried setting -w to 1 to limit the amount of memory used. I've reproduced the command I'm using below (with absolute paths removed for clarity):

pindel -f hg19.fa -i pindel_input.txt -c 13 -r false -l false -k false -t true -w 1 -o pindel_output

I could just run the program on all chromosomes but would prefer to run it just on this defined region. Any idea where this seg fault is coming from? Unfortunately to core file isnt actually getting dumped so I cant reproduce that for you.
Thanks in advance!
-Ryan

You have -c 13 and -c chr13. Please make sure that use the chromosome names as in your reference file. If you wish to check a set of genomic regions, you might use the new -j bed option. You would be able to specify chr start end per line and put as many genomic regions as you want. You could combine -c with -j so that Pindel compute regions also constrained by -c.

Let me know whether this works.

Kai

**KaiYe** · 04-10-2014, 10:24 AM

Originally posted by kartong View Post

Hi,

As per AlisonF, I need to extract the reference sequencing depth (RD) at the position of the detected structural variant for subsequent computation.

However, after running pindel on my novoalign generated .bam files and converting the pindel output into the vcf format using pindel2vcf, I only see the "GT:AD" fields (example below) and not the "GT:RD:AD" field as shown by the other users.

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT T101a T101b
chr1 5919343 . ATC A . PASS END=5919345;HOMLEN=5;HOMSEQ=TCTCT;SVLEN=-2;SVTYPE=DEL GT:AD 0/0:0,0 0/0:4,4
chr1 16975201 . CT C . PASS END=16975202;HOMLEN=0;SVLEN=-1;SVTYPE=DEL GT:AD 0/0:57,0 0/0:53,4
chr1 17084536 . TGGAACA T . PASS END=17084542;HOMLEN=2;HOMSEQ=GG;SVLEN=-6;SVTYPE=DEL GT:AD 0/0:55,0 0/0:71,4

For my analysis, I am using "Pindel version 0.2.5a3, Oct 24 2013." with default settings and the -i option with the config file and "pindel2vcf 0.5.8" to convert the pindel file into the vcf format.

Is anyone able to assist me on this?

Thank you.

We also put RD as part of AD so that AD has two int.

**mcorces** · 04-10-2014, 10:35 AM

Hi Kai Ye,
You're exactly right. Sorry for such a silly error!
Thanks!
-ryan

**t.wieland** · 09-24-2014, 04:38 AM

Hi KaiYe,

we're using Pindel to analyse exome data for quite a while now. We are starting to analyse more whole genome datasets now, and running pindel takes quite a while for these datasets (several days using a single thread), especially after switching from v0.2.4t to v0.2.5. So I'm wondering what is the recommended way to analyse large datasets? Currently we are using pindel with standard parameters (except -A which we set to 20) directly on BAM files from BWA and are just increasing the number of threads to 8-10. Is there something we can do better?

Best,
Thomas

**KaiYe** · 09-24-2014, 06:33 AM

Originally posted by t.wieland View Post

Hi KaiYe,

we're using Pindel to analyse exome data for quite a while now. We are starting to analyse more whole genome datasets now, and running pindel takes quite a while for these datasets (several days using a single thread), especially after switching from v0.2.4t to v0.2.5. So I'm wondering what is the recommended way to analyse large datasets? Currently we are using pindel with standard parameters (except -A which we set to 20) directly on BAM files from BWA and are just increasing the number of threads to 8-10. Is there something we can do better?

Best,
Thomas

For exome data, you could use -x 2 or even -x 1 to speed up. Which version are you using? Recently I changed the code and remove one memory leak.

You'd better run jobs per chr and maximum 4 threads as you will not cut runtime further with more threads but it is better to increase the number of jobs.

Kai

**t.wieland** · 09-24-2014, 06:38 AM

Hi Kai,

thanks for your reply!

Currently we're using v.2_5a3
Ok, I also thought about splitting by chromosome, but I wasn't sure if the detection of interchromosomal events still works?

Thomas

**KaiYe** · 09-24-2014, 07:43 AM

Originally posted by t.wieland View Post

Hi Kai,

thanks for your reply!

Currently we're using v.2_5a3
Ok, I also thought about splitting by chromosome, but I wasn't sure if the detection of interchromosomal events still works?

Thomas

interchromosomal events will not be affected if split per chr.

**BGould** · 05-26-2015, 06:19 AM

pindel variants counted multiple times

Originally posted by hshain View Post

Thanks for the quick reply. Let's discuss the duplication first. I now see in the output where the "unique" reads are reported for a given event. In the example below, there are 6 reads which are duplicates of each other and Pindel recognizes and reports this:

####################################################################################################
4725 D 1 NT 0 "" ChrID chr3 BP 3016666 3016668 BP_range 3016666 3016668 Supports 6 1 + 6 1 - 0 0 S1 7 SUM_MS 314 1 NumSupSamples 1 1 28_1_GTGTTA 6 1 0 0
ATTGGATGCATAATAAAATTAAAACATTTTTTGTTTCTGGCATGGCCAATATTGCTATTTGTCTTATAGAAACCTCTTCTCATTACTAAATTATATATTCTgTATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAAGTGGTAAAGGAAGCTTCTGTGATTTCAACTTCAAGTTA
CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 37 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:2215:15528:50861/1
CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 37 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1312:8307:12733/1
CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1209:17429:19881/1
CTTATAGAAACCTCTTCTCATTGCTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1201:7760:69122/1
CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1109:9254:43555/1
CTTATAGAAACCTCTTCTCATTACTAAATTATATATTCT TATAGTGGGCCCCCCTTTCTAATTAATAATTAATATTGTCTTCCAGGCATTTTAGTTACCAA + 3016209 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1103:14378:72999/1
####################################################################################################

However, Pindel does not seem to properly recognize duplicates when the paired end reads run into each other as in this example:

####################################################################################################
4727 D 4 NT 0 "" ChrID chr3 BP 3910869 3910874 BP_range 3910869 3910877 Supports 4 4 + 2 2 - 2 2 S1 9 SUM_MS 240 1 NumSupSamples 1 1 28_1_GTGTTA 2 2 2 2
GCAGAAATAAAAAGAAAACATCAAATGCGGCTCTTCCATGACCTGCTAGGATCTGCTTCTACCAAATCATGGATATAGAAATAGGCCCAGCTGCACACCACtcttTCTAATTATCCTGTTCTTCCAATTCCTCTTTTATGCATTTTTTTTGCCCACTCTCTCTCGAAACACAGTAGCTCTGGGAGTTGAAAATTAAGTTTTA
AGAAATAGGCCCAGCTGCACACCAC TCTAATTATCCTGTTCTTCCAATTCCTCTTTTATGCATTTTTTTTGCCCACTCTCTCTCGAAACACAGTAGCTCTG + 3910628 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:2214:15997:96580/1
CTCTTCTATGACCTGCTAGGATCTGCTTCTACCAAATCATGGATATAGAAATAGGCCCAGCTGCACACCAC TCTAATTATCCTGTTCTTCCAATTCCTCTT - 3911111 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:2214:15997:96580/2
AGAAATAGGCCCAGCTGCACACCAC TCTAATTATCCTGTTCTTCCAATTCCTCTTTTATGCATTTTTTTTGCCCACTCTCTCTCGAAACACAGTAGCTCTG + 3910628 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1302:10814:73798/1
CTCTTCTATGACCTGCTAGGATCTGCTTCTACCAAATCATGGATATAGAAATAGGCCCAGCTGCACACCAC TCTAATTATCCTGTTCTTCCAATTCCTCTT - 3911111 60 28_1_GTGTTA @DCDF8JN1:204:C0V4FACXX:4:1302:10814:73798/2
####################################################################################################

You can see by the read coordinates that the X and Y position are the same for the forward and reverse reads -- suggesting that they are in the same pair. It is apparent that this is one unique paired-end read, the forward and reverse reads overlap, there is a deletion in the overlapping portion, and the read was duplicated. This is one event that is counted 4 times. I also confirmed that Picard mark dups correctly flagged all 4 reads as duplicates.

When I rank order candidates by the reported number of unique reads, these types of events are enriched at the top of my list.

Hi Hshain, I have this same issue using Pindel 0.2. Did you ever figure out why this is happening or how to fix/avoid this problem?

**KaiYe** · 05-26-2015, 06:48 AM

Pindel marks a read as duplicated if both read pairs are identical. For the first event, both read sequences and the their mates' mapping positions are the same. Thus they are all marked as duplicates. For the second case, the mates' positions differ. Of course, this could be modified depending what we consider as true duplicates.

kai

**Yann G.** · 06-01-2015, 06:06 AM

Hi Kai

I am a little confused about "-u" option (maximum_allowed_mismatch_rate)

In 0.2.5a8 version, help says:
"Only reads with more than this fraction of mismatches than the reference genome will be considered as harboring potential SVs. (default 0.02) "

BUT, on the website (http://gmt.genome.wustl.edu/packages...er-manual.html), it says
"only reads with fewer mismatches with the reference genome than this fraction will be considered (default 0.1)"

1- Which version should I trust ?
2- If I have well understood, this parameter (and the ones below) allow me to play with sensitivity/specificity depending on the kind of result I want (SV discovery or validation)
but I am not sure about when do I want to modify one option over the others in the list below :

-u (maximum_allowed_mismatch_rate)
-H (min_distance_to_the_end)
-n (minimum number of edit distance between reads and ref genome)
-d (min_num_matched_bases)
-a (additional_mismatch)
-m (min_perfect_match_around_BP)

Thanks!
Yann

PS: (still on the website) The link to dispersed duplication 's documentation seems dead.

**Yongyong.Ren** · 07-17-2016, 07:14 PM

Hi KaiYe,
Thanks so much for your attention to this. We are using the latest version 0.2.5b8, 20151210. But we are really confused about some variants with large insertion, for example the record in the xx_SI file:
(Line1)287 I.98 NT.98."AAAAAGATTTGACTTGTGTAAACGAACCCATTTTCAAGAACTCTACCATGGTTTTATATGGAGACACAGGTGATAAACAAGCAACCCAAGTGTCAATT" ChrID.13 BP.32911324 32911325 BP_range.32911324 32911334 Supports.40 40 +.40 40 -.0 0 S1.41 SUM_MS.3840 1 NumSupSamples.1 1 tumor.0.0.40.40.0.0
(Line2)AAA..................................................................................................................................................................................................AAAGATTTGGTTTATGTTCTTGCAGAGGAGAACAAAAATAGTGTAAAGCAGCATATAAAAATGACTCTAGGTCAAGATTTAAAATCGGACATCTCCTTGAATATAGATAAAATACCAGAAAAAAATAATGATTACATGAACAAATGGGCAG
(Line3)AAACAGACTTGACTTGTGTAAACGAACCCATTTTCAAGAACTCTACCATGGTTTTATATGGAGACACAGGTGATAAACAAGCAACCCAAGTGTCAATTATAAAGATTTGGTTTATGTTCTTGCAG............................ + 32911224 96 tumor @M03097:19:000000000-G05CH:1:2104:21078:24842
But it seems impossiple exist the insertion 'ACAGACTTGACTTGTGTAAACGAACCCATTTTCAAGAACTCTACCATGGTTTTATATGGAGACACAGGTGATAAACAAGCAACCCAAGTGTCAATTAT' based on the blat result for this read '@M03097:19:000000000-G05CH:1:2104:21078:24842':

Blat result:
Genomic chr13 :
gactctgaag aacttttctc agacaatgag aataattttg tcttccaagt 32911173
agctaatgaa aggaataatc ttgctttagg aaatactaag gaacttcatg 32911223
AAACAGACTT GACTTGTGTA AACGAACCCA TTTTCAAGAA CTCTACCATG 32911273
GTTTTATATG GAGACACAGG TGATAAACAA GCAACCCAAG TGTCAATTAa 32911323
aAAAGATTTG GTTTATGTTC TTGCAGagga gaacaaaaat agtgtaaagc 32911373
agcatataaa aatgactcta ggtcaagatt taaaatcgga catctccttg 32911423
aatatagata aaataccaga aaaaaa

We try to send the temp files (bam and the input file for pindel) to you, but the email address like [email protected] and [email protected] are already invalid.
Thanks in advance for your reply
my email is [email protected]
Yongyong Ren

Topics	Statistics	Last Post
Evaluating Genome Sequencing for ECMO Patients in the NICU by seqadmin Started by seqadmin, 12-17-2024, 10:28 AM	0 responses 33 views 0 likes	Last Post by seqadmin 12-17-2024, 10:28 AM
New Genetic Toolkit Refines Studies on Gene Function and Disease by seqadmin Started by seqadmin, 12-13-2024, 08:24 AM	0 responses 49 views 0 likes	Last Post by seqadmin 12-13-2024, 08:24 AM
Study Links Brain Mechanism to Emotional Responses in Animals and Humans by seqadmin Started by seqadmin, 12-12-2024, 07:41 AM	0 responses 34 views 0 likes	Last Post by seqadmin 12-12-2024, 07:41 AM
Study Identifies Ribosomal RNA Fingerprints as Early Cancer Biomarkers by seqadmin Started by seqadmin, 12-11-2024, 07:45 AM	0 responses 46 views 0 likes	Last Post by seqadmin 12-11-2024, 07:45 AM

Seqanswers Leaderboard Ad

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News