Thread | Thread Starter | Forum | Replies | Last Post |
Interpreting Pindel output | bwubb | Bioinformatics | 11 | 07-07-2014 06:47 AM |
cuffdiff output files are empty | pinki999 | Bioinformatics | 1 | 05-21-2012 08:49 AM |
Tophat - processing several files fastq | marb | Bioinformatics | 3 | 04-18-2012 04:12 PM |
Processing Blast output for Blast2GO | JueFish | Bioinformatics | 3 | 10-29-2011 07:37 AM |
Re-processing cwf files on GS-FLX | richardbadge | 454 Pyrosequencing | 5 | 03-24-2011 02:13 AM |
#1
Member
Location: United States Join Date: Aug 2011
Posts: 31
I have a massive Pindel deletion file that has too many rows to open in Excel.
Does anyone have any ideas on how I can analyze this? Maybe I need to convert it to a database? It would be nice if Pindel could output the list of deletions as a clean CSV file... EDIT: I was able to delete all lines from the output file that don't start with a digit (using a regex), and that gives me a file I can start to format as a CSV. Any better ideas or methods out there?
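A minimal Python sketch of the same regex approach: it keeps only the lines that begin with a digit and writes them out as CSV. It assumes the fields on those lines are tab-separated; the script name pindel_to_csv.py and the helper names are hypothetical.

```python
import csv
import io
import sys


def pindel_summary_rows(lines):
    """Yield one list of fields per line that starts with a digit.

    Mirrors the regex approach from the post. Assumption: the fields on
    those summary lines are tab-separated.
    """
    for line in lines:
        if line[:1].isdigit():
            yield line.rstrip("\r\n").split("\t")


def write_csv(lines, out):
    """Write the digit-prefixed lines of `lines` to the file-like `out` as CSV."""
    writer = csv.writer(out)
    for row in pindel_summary_rows(lines):
        writer.writerow(row)


def to_csv_string(lines):
    """Convenience wrapper that returns the CSV text instead of writing a file."""
    buffer = io.StringIO()
    write_csv(lines, buffer)
    return buffer.getvalue()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python pindel_to_csv.py sample_D > sample_D.csv
    with open(sys.argv[1]) as handle:
        write_csv(handle, sys.stdout)
```

Streaming line by line keeps memory use flat, so file size is not a problem the way it is in Excel.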
#2
Member
Location: Philadelphia Join Date: Jan 2012
Posts: 58
Pindel comes with an executable (pindel2vcf) that converts its output to a VCF file. This should trim the output down a good deal, depending on how many variants you have versus supporting reads.
The VCF format is also accepted by certain annotation programs, which is very nice.
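Since the original goal was a CSV, a VCF produced this way can be flattened with a few lines of Python. A minimal sketch, assuming a standard tab-separated VCF whose INFO column contains SVTYPE, SVLEN and END keys (an assumption about pindel2vcf's output; check the ## header lines of your own file). The function names are hypothetical.

```python
import csv
import io

# Assumption: the INFO column carries these keys; adjust to your VCF header.
DEFAULT_INFO_KEYS = ("SVTYPE", "SVLEN", "END")


def parse_info(info):
    """Turn a VCF INFO string such as 'END=120;SVTYPE=DEL' into a dict.

    Flag entries (no '=') map to True.
    """
    fields = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def vcf_to_rows(lines, info_keys=DEFAULT_INFO_KEYS):
    """Yield [CHROM, POS, REF, ALT, <info_keys...>] for each VCF data line."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta lines, the column header and blank lines
        cols = line.rstrip("\r\n").split("\t")
        info = parse_info(cols[7])
        yield cols[0:2] + cols[3:5] + [info.get(key, "") for key in info_keys]


def vcf_to_csv_string(lines, info_keys=DEFAULT_INFO_KEYS):
    """Return a CSV table (with a header row) of the variants in `lines`."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["CHROM", "POS", "REF", "ALT", *info_keys])
    for row in vcf_to_rows(lines, info_keys):
        writer.writerow(row)
    return buffer.getvalue()
```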
#3
Senior Member
Location: amsterdam Join Date: Jun 2009
Posts: 133
grep ChrID Pindel_output.txt | awk '{print $.....}'
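The awk field numbers are elided above. As a sketch of the same idea without counting columns, the Python below keeps lines containing ChrID, like the grep, and then picks fields by their leading tag. It assumes tab-separated fields of the form "ChrID chr1" and "BP 1000", with the event type and size (e.g. "D 12") in the second field; that layout and the sample lines in the test are illustrative assumptions, and the function names are hypothetical.

```python
def chrid_lines(lines):
    """Like 'grep ChrID': yield the tab-split fields of lines containing 'ChrID'."""
    for line in lines:
        if "ChrID" in line:
            yield line.rstrip("\r\n").split("\t")


def tagged_field(fields, tag):
    """Return the text after '<tag> ' in the first field that starts with it, else None."""
    prefix = tag + " "
    for field in fields:
        if field.startswith(prefix):
            return field[len(prefix):]
    return None


def event_table(lines):
    """Yield (event, chromosome, breakpoint) tuples, e.g. ('D 12', 'chr1', '1000').

    Assumption: the event type and size sit in the second tab field.
    """
    for fields in chrid_lines(lines):
        event = fields[1] if len(fields) > 1 else None
        yield event, tagged_field(fields, "ChrID"), tagged_field(fields, "BP")
```

Matching on the tag rather than on a column number keeps the extraction working even if the number of fields before it changes.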
#4
Member
Location: United States Join Date: Aug 2011
Posts: 31
Thanks!
#5
Senior Member
Location: amsterdam Join Date: Jun 2009
Posts: 133
After running ./INSTALL <path to samtools folder>, you will have the binary programs pindel, pindel2vcf, ... If you type ./pindel2vcf, you will see its documentation.
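For scripting the conversion (for example over many samples), a small Python helper can assemble the command. It uses only flags that appear in this thread: -p, -r, -R, -d and -v from the command in post #7, and -e from the last post. Treating -e as a minimum number of supporting reads is an assumption, so check ./pindel2vcf's own documentation; the function names are hypothetical.

```python
import shlex
import subprocess


def pindel2vcf_command(pindel_output, reference, reference_name, reference_date,
                       vcf_out, binary="./pindel2vcf", min_supporting_reads=None):
    """Build a pindel2vcf argument list from the flags used in this thread.

    -e is only added when requested; reading it as 'minimum supporting
    reads' is an assumption.
    """
    cmd = [binary, "-p", pindel_output, "-r", reference,
           "-R", reference_name, "-d", reference_date, "-v", vcf_out]
    if min_supporting_reads is not None:
        cmd += ["-e", str(min_supporting_reads)]
    return cmd


def run(cmd):
    """Echo the command, then run it; raises CalledProcessError on failure."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell-quoting problems with unusual file names.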
#6
Member
Location: United States Join Date: Aug 2011
Posts: 31
Awesome, thanks! This works great!
#7
Junior Member
Location: San Diego Join Date: Nov 2009
Posts: 3
Hello,
I am having an issue running pindel2vcf, but only with one particular reference genome. Another set of Pindel files generated with a different reference genome converted without a problem. I successfully ran Pindel and have what look like proper Pindel output files. The reference was indexed with samtools faidx. When I run pindel2vcf, it looks like it can't find the scaffold sequences in the FASTA. Are certain characters not allowed in the reference FASTA, or is something wrong with the ChrID naming for this particular FASTA? The scaffolds are named S00, S01, and so on, and the ChrIDs in the Pindel output files match those in the reference, so that doesn't seem to be the issue. Any help would be greatly appreciated. Thanks.
pindel2vcf -p EXAMPLE_D -r ./EXAMPLE_reference.fasta -R example -d 20130728 -v EXAMPLE_deletions.vcf
Samples:
1. EXAMPLE
Chromosomes in which SVs have been found:
1. S00
2. S01
3. S02
4. S04
5. S05
6. S06
7. S07
8. S08
9. S09
10. S10
11. S11
12. S12
13. S13
14. S14
15. S15
16. S16
17. S17
18. S18
19. S19
20. S20
21. S21
22. S22
23. S23
24. S26
25. S28
26. S29
27. S36
28. S37
29. S39
Scanning chromosome: S00
Scanning chromosome: S01
Scanning chromosome: S02
Scanning chromosome: S03
Scanning chromosome: S04
Scanning chromosome: S05
Scanning chromosome: S06
Scanning chromosome: S07
Scanning chromosome: S08
Scanning chromosome: S09
Scanning chromosome: S10
Scanning chromosome: S11
Scanning chromosome: S12
Scanning chromosome: S13
Scanning chromosome: S14
Scanning chromosome: S15
Scanning chromosome: S16
Scanning chromosome: S17
Scanning chromosome: S18
Scanning chromosome: S19
Scanning chromosome: S20
Scanning chromosome: S21
Scanning chromosome: S22
Scanning chromosome: S23
Scanning chromosome: S24
Scanning chromosome: S25
Scanning chromosome: S26
Scanning chromosome: S27
Scanning chromosome: S28
Scanning chromosome: S29
Scanning chromosome: S30
Scanning chromosome: S31
Scanning chromosome: S32
Scanning chromosome: S33
Scanning chromosome: S34
Scanning chromosome: S35
Scanning chromosome: S36
Scanning chromosome: S37
Scanning chromosome: S38
Scanning chromosome: S39
Exiting reference scanning.
, skipping it.hromosome S00 from memory.mosome S00
, skipping it.hromosome S01 from memory.mosome S01
, skipping it.hromosome S02 from memory.mosome S02
, skipping it.hromosome S03 from memory.mosome S03
, skipping it.hromosome S04 from memory.mosome S04
, skipping it.hromosome S05 from memory.mosome S05
, skipping it.hromosome S06 from memory.mosome S06
, skipping it.hromosome S07 from memory.mosome S07
, skipping it.hromosome S08 from memory.mosome S08
, skipping it.hromosome S09 from memory.mosome S09
, skipping it.hromosome S10 from memory.mosome S10
, skipping it.hromosome S11 from memory.mosome S11
, skipping it.hromosome S12 from memory.mosome S12
, skipping it.hromosome S13 from memory.mosome S13
, skipping it.hromosome S14 from memory.mosome S14
, skipping it.hromosome S15 from memory.mosome S15
, skipping it.hromosome S16 from memory.mosome S16
, skipping it.hromosome S17 from memory.mosome S17
, skipping it.hromosome S18 from memory.mosome S18
, skipping it.hromosome S19 from memory.mosome S19
, skipping it.hromosome S20 from memory.mosome S20
, skipping it.hromosome S21 from memory.mosome S21
, skipping it.hromosome S22 from memory.mosome S22
, skipping it.hromosome S23 from memory.mosome S23
, skipping it.hromosome S24 from memory.mosome S24
, skipping it.hromosome S25 from memory.mosome S25
, skipping it.hromosome S26 from memory.mosome S26
, skipping it.hromosome S27 from memory.mosome S27
, skipping it.hromosome S28 from memory.mosome S28
, skipping it.hromosome S29 from memory.mosome S29
, skipping it.hromosome S30 from memory.mosome S30
, skipping it.hromosome S31 from memory.mosome S31
, skipping it.hromosome S32 from memory.mosome S32
, skipping it.hromosome S33 from memory.mosome S33
, skipping it.hromosome S34 from memory.mosome S34
, skipping it.hromosome S35 from memory.mosome S35
, skipping it.hromosome S36 from memory.mosome S36
, skipping it.hromosome S37 from memory.mosome S37
, skipping it.hromosome S38 from memory.mosome S38
, skipping it.hromosome S39 from memory.mosome S39
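One thing worth ruling out, offered as a guess rather than a diagnosis: the ", skipping it.hromosome ..." lines look like terminal text overwritten by carriage returns, which would be consistent with scaffold names carrying a stray '\r' (for example from Windows line endings in the FASTA). The sketch below reads the FASTA in binary mode, reports names containing non-printable or whitespace characters, and lists ChrIDs with no exact match. The "suspicious character" rule is a heuristic, not a documented pindel2vcf requirement, and the function names are hypothetical.

```python
import re

# Heuristic: anything outside printable, non-space ASCII is flagged.
SUSPICIOUS = re.compile(r"[^\x21-\x7e]")


def fasta_names(raw_bytes):
    """Return sequence names from FASTA content read in binary mode.

    The name is the header text after '>' up to the first space or tab; a
    trailing carriage return is deliberately kept so it can be reported.
    """
    names = []
    for line in raw_bytes.split(b"\n"):
        if line.startswith(b">"):
            name = re.split(rb"[ \t]", line[1:], maxsplit=1)[0]
            names.append(name.decode("ascii", errors="replace"))
    return names


def report(names, expected_ids):
    """List problems: odd characters in names, and ChrIDs with no exact match."""
    problems = []
    for name in names:
        if SUSPICIOUS.search(name):
            problems.append(f"{name!r}: contains non-printable or whitespace characters")
    for chrom in sorted(set(expected_ids) - set(names)):
        problems.append(f"{chrom!r}: ChrID not found among FASTA names")
    return problems
```

For example, report(fasta_names(open("EXAMPLE_reference.fasta", "rb").read()), ["S00", "S01"]) would check the first two scaffolds from the post; an empty list means nothing suspicious was found.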
#8
Senior Member
Location: amsterdam Join Date: Jun 2009
Posts: 133
This is the first time I have seen this issue. Can you put a subset of your output and your reference file somewhere we can access it, such as an FTP site?
#9
Junior Member
Location: Tunisia Join Date: Jan 2017
Posts: 1
I would like to discuss the preprocessing of the input files and running the Pindel program. First, some background on this work: the DNA of five unrelated patients was sequenced using an Illumina kit on the MiSeq. This kit covers 12 Mb of genomic content. To detect the breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants from next-gen sequence data, Pindel was chosen to refine and complete the analysis procedure. In carrying out this step, I have encountered some problems:
1- Preprocessing of the input files: the input for Pindel consists of the reference genome sequence and the BAM files resulting from our high-throughput sequencing. My question is: should I download the entire human reference genome? Or can I simply write the command './pindel -f hs_ref_GRCh37.fa -p my_input_name_files.txt -c ALL -o my_output-name_files' and the software will run it? And, in my case, should the search for indels and SVs be limited to the genomic regions covered by the TruSight One kit? Can mapping paired-end reads to the entire human reference genome generate false results?
2- Insert size: what tools can be used to obtain the insert size metrics for each sample?
3- Running Pindel on five BAM files: I have five BAM files generated from sequencing the DNA of five unrelated affected patients. What do you recommend: running Pindel on the BAM files one by one, or on all the files at the same time? And what are the differences between the output files in each case?
4- The computational infrastructure recommended for running Pindel (memory size, hard disk).
I look forward to your response.
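For questions 2 and 3, a rough sketch under stated assumptions: estimate each sample's insert size as the median TLEN of properly paired, non-duplicate, primary reads (read as SAM text, e.g. from samtools view), and write one line per sample into a single config file so all five BAMs can go to Pindel in one run. The three-column layout (BAM path, insert size, sample label) is our understanding of Pindel's -i config file and should be checked against Pindel's documentation; the 10,000 bp outlier cutoff and the function names are arbitrary.

```python
import statistics

# SAM FLAG bits used below (from the SAM specification).
PROPER_PAIR = 0x2
SECONDARY, DUPLICATE, SUPPLEMENTARY = 0x100, 0x400, 0x800


def median_insert_size(sam_lines, max_tlen=10_000):
    """Median TLEN (SAM column 9) over properly paired primary reads.

    Only positive TLEN values are used, so each pair is counted once;
    duplicates, secondary and supplementary alignments are skipped, and
    TLEN above `max_tlen` (an arbitrary cutoff) is treated as an outlier.
    """
    skip = SECONDARY | DUPLICATE | SUPPLEMENTARY
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag, tlen = int(fields[1]), int(fields[8])
        if flag & PROPER_PAIR and not flag & skip and 0 < tlen <= max_tlen:
            sizes.append(tlen)
    if not sizes:
        raise ValueError("no properly paired reads with positive TLEN")
    return statistics.median(sizes)


def pindel_config_line(bam_path, insert_size, sample_name):
    """One tab-separated config line: BAM path, insert size, sample label."""
    return f"{bam_path}\t{round(insert_size)}\t{sample_name}"
```

A sample of reads is usually enough, e.g. the first few hundred thousand lines of samtools view -f 2 patient1.bam; the median is used rather than the mean so that a few chimeric or mis-paired reads do not skew the estimate.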
#10
Junior Member
Location: Xi'an, China Join Date: Jan 2017
Posts: 1
Hello Myriem,
This is Eric-Wubbo Lameijer from Kai Ye's (Pindel's) lab. To answer your questions:
1) You need the reference genome FASTA file that was used to generate the BAM file, and you give the name of that file (and the path to it) as the -f parameter. If another bioinformatician created the BAM file, they should be able to provide you with the correct FASTA file. If you can't get that FASTA file, you need to do some extra work; some people in the forum may know where you can download a 'proper' reference genome. I myself have not found a ready-made reference genome yet and had to use ftp://ftp.ncbi.nlm.nih.gov/genomes/H...romosomes/seq/ and, of those, the hs_ref_ files: gunzip, merge, possibly change the chromosome names (after the >) to chr1, chr2 etc., and use samtools to index the reference file. There is also a file on the UCSC website; you can check hgdownload.cse.ucsc.edu/downloads.html. But easiest (and best) is to use the FASTA file that was used for creating the BAM files.
1b) Yes, Pindel can generate (more) false positives if the whole-genome reference is used, as a region outside the scanned area could provide a more exact match. The ways I would personally handle this are, first, to limit the size of indels to seek (the -x option with 1 or 2), and basically to be wary of all indels that have very low coverage/support, though what counts as low support will depend on your dataset. You can use an option in pindel2vcf (the -e option) if there seem to be too many indel calls with very low support. What support to take as the cutoff depends on the coverage of your original data set; in my experience, calls with a total support of less than something like 20-25% of the median coverage tend to be relatively unreliable.
2) Insert size metrics: at the moment, Pindel assumes that the user knows the insert size of the library he/she used/ordered.
If you don't know it: according to some discussions on Biostars (https://www.biostars.org/p/14339/ and https://www.biostars.org/p/94246/), some BAM/SAM files contain this information; otherwise you need to copy/use some script to deduce it.
3) Running on the patients separately or as one group: in general, I would recommend running Pindel on the full set of samples in one go; this increases Pindel's sensitivity somewhat and makes downstream processing easier. And if you see an indel at a certain position with low allele frequency (say 10-20%) in all the unrelated patients, then you can be reasonably certain that it is a false call caused by measurement errors, problems with genomic repetitiveness, or similar. So in general, try to run Pindel on the entire set in one go. As for the differences: running samples together increases the sensitivity of Pindel (the chance that it finds a relatively difficult-to-find indel), though it decreases the specificity (a larger chance of finding a 'fake' indel). So it is a tradeoff, but generally I think it is more useful to throw away bad indels later than to miss real indels in the first place.
4) One does not need special hardware for Pindel; basically, if a computer runs Linux (OS X can also work, but getting Pindel to work on OS X can be a bit trickier), it can run Pindel. Even on a normal system (say, a PC with 2 GB of memory), Pindel should not run out of memory and should finish in between 10 minutes and a day; for your exome I'd estimate an hour at most. If there is a problem with lack of memory, please consult the FAQ file in the Pindel main directory; that should generally work. If that does not work, please contact us directly at our contact e-mail addresses or by raising an issue on GitHub. But basically, I would not expect any problems with extreme running times or out-of-memory errors.
Best regards,
Eric-Wubbo
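The 20-25% rule of thumb from answer 1b can be turned into a number. A minimal sketch, assuming you already have per-position coverage values (e.g. the depth column of samtools depth) and a total support count per call; both inputs and the function names are hypothetical, and the rounded-up cutoff is only a candidate value for pindel2vcf's -e option, assuming -e takes a minimum read count.

```python
import math
import statistics


def support_cutoff(coverages, fraction=0.2):
    """Return `fraction` of the median coverage.

    0.2-0.25 matches the rule of thumb in the post; it is a heuristic,
    not a Pindel default.
    """
    return fraction * statistics.median(coverages)


def filter_calls(calls, coverages, fraction=0.2):
    """Keep (call_id, total_support) pairs whose support reaches the cutoff."""
    cutoff = support_cutoff(coverages, fraction)
    return [call for call in calls if call[1] >= cutoff]


def e_option_value(coverages, fraction=0.2):
    """Round the cutoff up to a whole read count (candidate value for -e)."""
    return math.ceil(support_cutoff(coverages, fraction))
```

Using the median rather than the mean keeps a few very deep regions (common in targeted panels) from inflating the cutoff.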