SEQanswers

SEQanswers (http://seqanswers.com/forums/index.php)
-   Bioinformatics (http://seqanswers.com/forums/forumdisplay.php?f=18)
-   -   Missing chromosomes in vcf output (http://seqanswers.com/forums/showthread.php?t=45641)

spyf89 08-06-2014 12:31 PM

Missing chromosomes in vcf output
 
**UPDATE**

Problem solved. The local version of the .fai index file I was using only contained chromosome 1. Version on the server that I accidentally checked multiple times was fine. Whoops.

Hello everyone!

I have bam files for several histone ChIP-seq and RNA-seq experiments and I am attempting to call SNPs. I used the following command successfully with the ChIP-seq data:

Code:

samtools mpileup -uD -f hg19.fa INPUT.bam | bcftools view -bcvg - > INPUT_raw.bcf;
bcftools view DL237_INPUT_raw.bcf | vcfutils.pl varFilter -D100 > INPUT.vcf;

For the RNA-seq data, the sequencing center changed their pipeline and I needed a different ENSEMBL build for mpileup:

Code:

samtools mpileup -uD -f Homo_sapiens.GRCh37.72.dna_rm.toplevel.fa.gz RNA.bam | bcftools view -bcvg - > RNA_raw.bcf;
bcftools view RNA_raw.bcf | vcfutils.pl varFilter -D100 > RNA.vcf;

Everything seemed to run smoothly until the final step. I get no errors and no indication of a problem.
The output VCF starts out just like all the others but doesn't have entries for any chromosomes except for chromosome 1.
If I take out the vcfutils.pl varFilter step and run:

Code:

samtools mpileup -uD -f Homo_sapiens.GRCh37.72.dna_rm.toplevel.fa.gz RNA.bam | bcftools view -bcvg - > RNA_raw.bcf;
bcftools view RNA_raw.bcf > RNA.vcf;

I end up with a MASSIVE vcf file with SNPs called for all chromosomes and no errors. The problem seems to be in the vcfutils.pl step but I have used this script successfully with other files. I'm not sure what else to try/where the problem might be. Any advice? And please let me know if any additional information would be helpful! *Note* I am limitedly familiar with this pipeline and am doing this as a favor for a lab mate who is out of town.

swbarnes2 08-06-2014 03:30 PM

First thing to check...I wonder if one file is sorted chr1, chr2, chr3...and the other chr1, chr10, chr11...

That discrepancy might explain why the file stops at Chr1.

spyf89 08-06-2014 03:43 PM

I'll definitely check that out. I sorted the bam to see if that fixed the problem (no luck) but perhaps they are sorted differently.

The only reason I think this might not fix things is because, without the vcftools.pl varFilter step, there are SNPs called across all chromosomes. The information seems to be in order in the .bcf file...it's just not getting through the filtering step.

Thank you for the response!


All times are GMT -8. The time now is 09:14 PM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.