SEQanswers

07-16-2012, 08:57 AM   #1
Seq_student
Junior Member
Location: California
Join Date: Jun 2012
Posts: 5
Pindel- empty output?

Hello sequencing gurus,

I am trying to set up a pipeline to analyze NGS data. I have been having trouble with two programs in particular (breakdancer_max and pindel). With regard to Pindel, I have a correctly sorted and indexed BAM file for input, and the output looks like this (for every chromosome):

*** Calling SV using Split-Read Analysis: /home/usr/apps/pindel024s/src ***
>> Running Pindel on ALL: /home/usr/apps/pindel024s/src/pindel -f /srv/gs1/projects/lab/usr/data/hg19/ucsc.hg19.fasta -i /srv/gs1/projects/lab/usr/dir.2/JS2.sra.cfg -o /srv/gs1/projects/lab/usr/dir.2/JS2.sra -c ALL
Pindel version 0.2.4s, June 18 2012.
Looping over all chromosomes.
Processing chromosome: chrM
Chromosome Size: 16571
NumBoxes: 60020 BoxSize: 667

Looking at chromosome chrM bases 0 to 10000000.
getReads chrM 20016571
Insertsize in bamreads: 195
Number of reads in current window: 0, + 0 - 0
Number of reads where the close end could be mapped: 0, + 0 - 0
Percentage of reads which could be mapped: + 0.00% - 0.00%

No currentState.Reads for chrM found in /srv/gs1/projects/lab/usr/dir.2/JS2_L1_1_pf_aa.sorted.recal.bam
BAM file index 0 0
There are no reads for this bin.


And the configuration file looks like this:

/srv/gs1/projects/lab/usr/dir.2/JS2_L1_1_pf_aa.sorted.recal.bam 195 JS2
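
For reference, the configuration line above has three whitespace-separated fields: the BAM path, the expected insert size, and a sample label. A minimal Python sketch of a sanity check for such a line follows. `PindelConfigEntry` and `parse_config_line` are hypothetical helper names (not part of Pindel), and the check assumes the BAM path contains no spaces.

```python
from typing import NamedTuple


class PindelConfigEntry(NamedTuple):
    """One line of a Pindel BAM configuration file (hypothetical helper type)."""
    bam_path: str
    insert_size: int
    sample_tag: str


def parse_config_line(line: str) -> PindelConfigEntry:
    """Split a config line into its three fields and validate the insert size."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    bam_path, insert_size, sample_tag = fields
    size = int(insert_size)  # raises ValueError if the field is not an integer
    if size <= 0:
        raise ValueError(f"insert size must be positive, got {size}")
    return PindelConfigEntry(bam_path, size, sample_tag)


entry = parse_config_line(
    "/srv/gs1/projects/lab/usr/dir.2/JS2_L1_1_pf_aa.sorted.recal.bam 195 JS2"
)
print(entry.insert_size, entry.sample_tag)
```

This only confirms that the line is well formed. It does not check that the BAM file exists, is indexed, or matches the reference.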


I can't figure out what's wrong. I originally used an older version and then reinstalled the latest version of Pindel (0.2.4s), and I get the same problem: "BAM file index 0 0" followed by "There are no reads for this bin." for every bin of every chromosome.

Has anyone encountered a problem like this and been able to solve it? Any ideas would be really helpful.

Thanks.

07-20-2012, 07:26 AM   #2
mboursnell
Member
Location: Cambridge, UK
Join Date: Jul 2012
Posts: 17
Pindel "There are no reads for this bin"

Hi,

I am also having trouble with Pindel.

I run this command on a BAM file:

/opt/pindel/pindel -f /home/genetics/canfam2/canfam2.fasta -i SHY_01_input.txt -c chr18 -o SHY_01_out

and I get this:

Processing chromosome: chr18
Chromosome Size: 58872314
7888 10000
Looking at chromosome chr18 bases 0 to 10000000.
There are no reads for this bin.
Looking at chromosome chr18 bases 10000000 to 20000000.
There are no reads for this bin.
Looking at chromosome chr18 bases 20000000 to 30000000.
There are no reads for this bin.
Looking at chromosome chr18 bases 30000000 to 40000000.
There are no reads for this bin.
Looking at chromosome chr18 bases 40000000 to 50000000.
There are no reads for this bin.
Looking at chromosome chr18 bases 50000000 to 60000000.
There are no reads for this bin.
Looking at chromosome chr18 bases 60000000 to 70000000.
There are no reads for this bin.
Looking at chromosome chr18 bases 70000000 to 80000000.
There are no reads for this bin.


Any ideas?

08-01-2013, 11:08 AM   #3
dGho
Member
Location: Rochester, NY
Join Date: Jan 2013
Posts: 43

Hi Seq_student, I am having the same problem, which I find vexing because I have run Pindel on many older BAM files without it. Did you ever figure out what the issue was?

08-02-2013, 12:31 AM   #4
mboursnell
Member
Location: Cambridge, UK
Join Date: Jul 2012
Posts: 17

I think it was the reference sequence in my case. After replacing the file and re-doing the indexes, it seemed to work OK.

Last edited by mboursnell; 08-02-2013 at 07:50 AM.

08-02-2013, 07:48 AM   #5
KaiYe
Senior Member
Location: amsterdam
Join Date: Jun 2009
Posts: 133

Check that the reference sequence you give to Pindel is the same one that was used for mapping.
"chr1" is considered different from "1".
Check whether the BAM header (samtools view -H) and the reference index (.fai) match.
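
The comparison suggested here can be scripted. Below is a minimal Python sketch, assuming you have saved the output of `samtools view -H` and the contents of the `.fai` file as text. The function names are made up for illustration. It relies only on the standard layouts: tab-separated `@SQ` lines with `SN:` and `LN:` tags in the BAM header, and a tab-separated `.fai` whose first two columns are sequence name and length.

```python
def header_sequences(header_text):
    """Return {name: length} from the @SQ lines of a SAM/BAM header."""
    seqs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:VALUE.
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        seqs[tags["SN"]] = int(tags["LN"])
    return seqs


def fai_sequences(fai_text):
    """Return {name: length} from the first two columns of a .fai index."""
    seqs = {}
    for line in fai_text.splitlines():
        if not line.strip():
            continue
        cols = line.split("\t")
        seqs[cols[0]] = int(cols[1])
    return seqs


def compare(header, fai):
    """List names found on one side only, and shared names whose lengths differ."""
    only_bam = sorted(set(header) - set(fai))
    only_ref = sorted(set(fai) - set(header))
    length_mismatch = sorted(
        name for name in set(header) & set(fai) if header[name] != fai[name]
    )
    return only_bam, only_ref, length_mismatch


# Toy example: the BAM uses "chr1" while the reference index uses "1".
header_text = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:249250621\n"
fai_text = "1\t249250621\t52\t60\t61\n"
result = compare(header_sequences(header_text), fai_sequences(fai_text))
print(result)
```

Any name that appears in only one of the two lists is a candidate explanation for "There are no reads for this bin", since a naming mismatch such as "chr1" versus "1" means no reads are found under the reference's names.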

08-07-2013, 09:40 AM   #6
dGho
Member
Location: Rochester, NY
Join Date: Jan 2013
Posts: 43

First 10 BAM header lines:
Quote:
@SQ SN:1 LN:249250621
@SQ SN:2 LN:243199373
@SQ SN:3 LN:198022430
@SQ SN:4 LN:191154276
@SQ SN:5 LN:180915260
@SQ SN:6 LN:171115067
@SQ SN:7 LN:159138663
@SQ SN:8 LN:146364022
@SQ SN:9 LN:141213431
@SQ SN:10 LN:135534747
First 10 lines of ref.fasta.fai:

Quote:
1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 249250621 52 80 81
2 dna:chromosome chromosome:GRCh37:2:1:243199373:1 243199373 252366358 80 81
3 dna:chromosome chromosome:GRCh37:3:1:198022430:1 198022430 498605776 80 81
4 dna:chromosome chromosome:GRCh37:4:1:191154276:1 191154276 699103539 80 81
5 dna:chromosome chromosome:GRCh37:5:1:180915260:1 180915260 892647296 80 81
6 dna:chromosome chromosome:GRCh37:6:1:171115067:1 171115067 1075824049 80 81
7 dna:chromosome chromosome:GRCh37:7:1:159138663:1 159138663 1249078107 80 81
8 dna:chromosome chromosome:GRCh37:8:1:146364022:1 146364022 1410206056 80 81
9 dna:chromosome chromosome:GRCh37:9:1:141213431:1 141213431 1558399681 80 81
10 dna:chromosome chromosome:GRCh37:10:1:135534747:1 135534747 1701378334 80 81
...They seem to match. I just can't figure out why Pindel works for our old files and not the new ones. The only difference between the two sets was the cleanup process that occurs before bwa. Could this be causing the problem?

08-07-2013, 09:45 AM   #7
KaiYe
Senior Member
Location: amsterdam
Join Date: Jun 2009
Posts: 133

What kind of cleanup?

08-07-2013, 10:37 AM   #8
dGho
Member
Location: Rochester, NY
Join Date: Jan 2013
Posts: 43

We switched to a new sequencing center, and the preliminary cleanup involves removal of adapters and QC filtering of reads, followed by removal of singletons and syncing of files. I'm sure the first sequencing center also performed similar cleanup, but the actual sequencing and cleanup may have changed slightly. Otherwise, the rest of the workflow (aligning etc.) was performed by me in the same manner for both sets of samples.

head of the raw fastq file (new set):

Quote:
@HISEQ:40239YACXX:5:1101:1629:2201 1:N:0:ATCACG
AGTAGAGACGGGGTTTCACCATGTTAGCNAGGNTGGTCTTGATCTCCTGACCTCGTGATCTGCCCACCTCGGCCTCCCAAAGTGCTGGGATTACAGGTGT
+
BBBFFFFFFFFFFBFFFIFIFFFFIFFF#0BF#0<B7BBFFFFFBFFBFFIIBBFBFFFIFFFFFFFFFBBFFFFBBFBFFF7BBFFFFBBBBBB<B0<0
@HISEQ:40239YACXX:5:1101:1657:2204 1:N:0:ATCACG
GTTTTACAAATTATGATATATTATTTCCTAATNATCAACATTTACCCTTTGATGGCAGTAAAATCTTGTTCATATGGAGCATTGTCTAAGAAGGCAAATT
+
BBBFFFFFFFFFFIIFIIIIIIIIIIIIIIII#0BFFIIIIIIIIIIIIIIIIIIIIIFFIIIIIIIIFIIIIIIFIIIIFFFFFFFFFFFFFFFFFFFF
@HISEQ:40239YACXX:5:1101:1727:2211 1:N:0:ATCACG
TCGAACGGAATCGAATAGAATCATCATGGGATNGAAATGAATGGAATCATCATCGAATGGAATCGAATAGAATTATGGAATGAAATCCAGTGTGATCATC
head of the fastq file after QC filtering and adapter trimming:

Quote:
@HISEQ:42:H0JD4ADXX:1:1101:1403:1985
NGGTTACATTTCAGGGGGAAAGGCTACACTGAAATGAACATTGTAAGAAAACTCAATTTAATGATACCTTGGAGTATATTCTTGCTTCATGTACTGCTGT
+
#0<FFFFFFFFFFIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFIIIIIFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFF
@HISEQ:42:H0JD4ADXX:1:1101:1657:1979
NTTTTTTAAATTTAGGATAACACATTTTTGTTTCTAAAGTGATTTGTGATTTGTGCTGTATAAACTGTATAAAAGGTTCTGTTTTTAAAGGTGGATTTTC
+
#0BFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFIIIIFFFFIIIFFIIIIIIIIIIIIIIIIIIIIFFFFBFFFFFFFFFFFBBFFFFFFF
@HISEQ:42:H0JD4ADXX:1:1101:1604:1993
CTACAAAAACGAAAATTAGCCGGACATGGTGGTGCACGCCTGTAGTCCCAGATACTCGGGAGACTGAGATGGAAGGATCACCTGAGCCTAGGGAAGTCGA
So the actual FASTQ files look different. For example, the whole 1:N:0:ATCACG part (sequence index, etc.) is missing from the cleaned FASTQ. I don't know if this is the problem, but is this information that has been removed something that Pindel uses?
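
To make the difference between the two headers concrete, here is a small Python sketch that splits a FASTQ header line into the read name (everything up to the first space) and the trailing comment, and decodes the comment when it follows the Illumina `read:is_filtered:control:index` layout. The function names are made up for illustration, and the assumption that only the part before the first space is carried into the alignment as the read name reflects common aligner behaviour rather than anything confirmed in this thread.

```python
def split_fastq_header(header_line):
    """Split '@name comment' into (name, comment); comment may be empty."""
    if not header_line.startswith("@"):
        raise ValueError(f"not a FASTQ header: {header_line!r}")
    name, _, comment = header_line[1:].rstrip("\n").partition(" ")
    return name, comment


def parse_illumina_comment(comment):
    """Decode 'read:is_filtered:control:index', or return None if absent."""
    if not comment:
        return None
    read, filtered, control, index = comment.split(":", 3)
    return {
        "read": int(read),
        "is_filtered": filtered == "Y",  # "Y" marks a read that failed the filter
        "control": int(control),
        "index": index,
    }


raw = split_fastq_header("@HISEQ:40239YACXX:5:1101:1629:2201 1:N:0:ATCACG")
cleaned = split_fastq_header("@HISEQ:42:H0JD4ADXX:1:1101:1403:1985")
print(raw, parse_illumina_comment(raw[1]))
print(cleaned, parse_illumina_comment(cleaned[1]))
```

In the cleaned file the comment is simply empty, while the read name itself is intact.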

08-07-2013, 11:54 AM   #9
KaiYe
Senior Member
Location: amsterdam
Join Date: Jun 2009
Posts: 133

Removing the sequence index is necessary, and I do not see how this step would affect Pindel. What is "removal of singletons"? Can you send me your processing steps so that I can figure it out?

08-09-2013, 05:00 AM   #10
dGho
Member
Location: Rochester, NY
Join Date: Jan 2013
Posts: 43

Hello Kai,

Thank you so much. You are always very prompt and helpful when it comes to questions about your software.

We sequence paired-end human DNA. The genome center removes the adaptor sequence using SeqClean, then uses the FASTX-Toolkit for quality-based end trimming (removing bases with quality scores below 13 from the end of each sequence). They also sync the FASTQs (so they are in the same order) and filter out reads (along with their mates) that do not pass a specific quality threshold. By removal of singletons, I meant that sometimes one member of a pair does not pass the QC filters while the other member does. In this case the member that does pass QC (the singleton) is also removed.
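
The singleton removal described above can be sketched in a few lines of Python. This is an illustration only, not the sequencing center's actual procedure: the records are plain `(name, sequence, quality)` tuples, the quality encoding is assumed to be Phred+33, and the mean-quality cutoff of 20 is an arbitrary placeholder for whatever QC threshold is really applied.

```python
def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def keep_passing_pairs(reads1, reads2, passes):
    """Keep a pair only if BOTH mates pass; a lone passing mate is dropped too.

    reads1 and reads2 are lists of (name, seq, qual) tuples in the same order.
    """
    if len(reads1) != len(reads2):
        raise ValueError("mate files have different numbers of reads")
    kept1, kept2 = [], []
    for r1, r2 in zip(reads1, reads2):
        if r1[0] != r2[0]:
            raise ValueError(f"mates out of sync: {r1[0]} vs {r2[0]}")
        if passes(r1) and passes(r2):
            kept1.append(r1)
            kept2.append(r2)
    return kept1, kept2


# Hypothetical QC rule: mean base quality of at least 20.
def passes_qc(read):
    return mean_quality(read[2]) >= 20


mates1 = [("a", "ACGT", "IIII"), ("b", "ACGT", "####")]
mates2 = [("a", "TTTT", "IIII"), ("b", "GGGG", "IIII")]
out1, out2 = keep_passing_pairs(mates1, mates2, passes_qc)
print(out1, out2)
```

Pair "b" is dropped from both outputs even though its second mate passes, which keeps the two files the same length and in the same order.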

After the initial cleanup performed by the genome sequencing center (above), I align to hg19 with bwa, then convert SAM to BAM and index using samtools. I then use Picard to run CleanSam and MarkDuplicates. GATK then does IndelRealignment (realigns the area around indels) and BaseQualityScoreRecalibration.

At this point the bams are ready for variant calling. I have tried running Pindel on the new samples before Picard, and also before GATK to see if it would make a difference; however, I still keep getting the blank output.

I use the same hg19.fa ref file for the whole pipeline. I am sorry if this is too detailed or not detailed enough. Any clue about what the issue may be would be great.

08-09-2013, 06:39 AM   #11
KaiYe
Senior Member
Location: amsterdam
Join Date: Jun 2009
Posts: 133

Hi dGho,

I have questions about the QC filters. How do you treat reads with poor mapping quality, clipped reads, unmapped reads, and so on? Pindel examines all reads that are not perfectly mapped. If they are removed for some reason, Pindel will fail to capture variants.

I checked your post from Aug 1:
"Looking at chromosome chrM bases 0 to 10000000."

There is "chr" in your reference file, but not in your BAM file, judging by your post from Aug 7.

Insert size might also be an issue here. Can you put 500 there and give it another try?

It would be helpful if you could provide a small dataset along with your reference file so that I can reproduce the error. Please also update your Pindel version, which is already more than one year old. We have had multiple major updates.

Kai

08-15-2013, 12:26 PM   #12
dGho
Member
Location: Rochester, NY
Join Date: Jan 2013
Posts: 43

Sorry for my late response. My post from Aug 1 was quoting another person who posted. I don't have chr in my reference or bams, just the numbers.

We recently received a third set of exomes from the same new sequencing center. This new set runs through Pindel with no problems, just as our old ones did. The difference between these two sets was the following:

In the set that does not run on Pindel: after removing reads from the raw FASTQs that did not pass a QC threshold, the orphaned singletons (reads that no longer have a pair because one mate did not pass QC but the other did) were removed using the following script that I found here:

http://seqanswers.com/forums/showthread.php?t=17974

In the newest set of exomes, the sequencing center removed the singletons for me. Before and after this step, the pipeline was identical, so I must have messed something up when I ran that script. Other than Pindel, I have not had any problems downstream with that script so far.
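
One way to check whether a singleton-removal step left the two mate files consistent is to compare read names directly. The Python sketch below is a hedged illustration, not a reconstruction of the linked script: it assumes plain four-line FASTQ records (no wrapped sequences), treats the text before the first space as the read name, and strips an optional `/1` or `/2` suffix. The function names are made up.

```python
import io


def read_names(fastq_handle):
    """Collect read names from a FASTQ stream with 4 lines per record."""
    names = []
    for i, line in enumerate(fastq_handle):
        if i % 4 != 0:
            continue
        if not line.startswith("@"):
            raise ValueError(f"line {i + 1} is not a FASTQ header: {line!r}")
        fields = line[1:].split()
        name = fields[0] if fields else ""
        if name.endswith(("/1", "/2")):
            name = name[:-2]  # drop the old-style mate suffix
        names.append(name)
    return names


def pairing_report(names1, names2):
    """Report orphaned reads on either side and whether the order agrees."""
    return {
        "orphans_in_1": sorted(set(names1) - set(names2)),
        "orphans_in_2": sorted(set(names2) - set(names1)),
        "same_order": names1 == names2,
    }


# Toy input: read r1 survived in file 1 but its mate is missing from file 2.
fq1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nACGT\n+\nIIII\n")
fq2 = io.StringIO("@r2/2\nACGT\n+\nIIII\n")
report = pairing_report(read_names(fq1), read_names(fq2))
print(report)
```

With real files, the `io.StringIO` objects would be replaced by open file handles. A non-empty orphan list or `same_order` being false would suggest the mate files went out of sync before alignment.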

So this solved my specific problem.

Would you still like me to send you part of one of the fastqs that did not work to see why pindel cannot handle the fastqs after using this script? I don't think it is a widely used script.

Last edited by dGho; 08-15-2013 at 12:28 PM.

08-15-2013, 12:30 PM   #13
KaiYe
Senior Member
Location: amsterdam
Join Date: Jun 2009
Posts: 133

Can you provide a small region of the BAM (10 kb)? If the file is smaller than 10 MB, you can send it to my email, kye@genome.wustl.edu. Otherwise, Dropbox or other means would be better. I will take a look.
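
A small region like this is commonly cut out with `samtools view` on a coordinate-sorted, indexed BAM. The Python sketch below only builds the command line. The file names `sample.bam` and `region.bam`, the chromosome, and the start coordinate are placeholders, and `region_extract_command` is a hypothetical helper.

```python
import shlex


def region_extract_command(bam, chrom, start, length=10_000, out="region.bam"):
    """Build a samtools command that writes `length` bases from `start` to `out`."""
    if start < 1 or length < 1:
        raise ValueError("start and length must be positive")
    end = start + length - 1  # samtools regions are 1-based and inclusive
    region = f"{chrom}:{start}-{end}"
    return ["samtools", "view", "-b", "-o", out, bam, region]


cmd = region_extract_command("sample.bam", "1", 1_000_000)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once samtools is installed and the BAM index is present. The chromosome name has to match the BAM header exactly ("1" versus "chr1").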

Tags
empty output, pindel
