**GenoMax** · 04-29-2019, 03:37 PM

I think the "bcftools view" command should give you what you need. Check the help page here.

**JTH-Genome** · 04-30-2019, 03:50 AM

Canine genomes

I liked GenoMax's suggestion, so go with that if you haven't already. Also be sure to use a long-read (PacBio) based assembly for your analysis against a reference genome.
If you are interested in canine genomes, here are some articles that may be of interest:

https://www.nature.com/articles/s41598-018-29190-3

https://www.youtube.com/watch?v=OjsvbDqfp7w&feature=youtu.be

German Shepherd Genome Project Fetches Popular Vote - PacBio

https://www.pacb.com/blog/german-shepherd-genome/

A proposal to sequence the genome of the German Shepherd dog by the University of Wisconsin - Madison was selected as winner of the 2019 Plant and Animal SMRT Grant after garnering more than 3,000 votes in a close online video competition. We caught up with members of the Comparative Genetics Research Laboratory to find out more about the project.

Dog meet dog world: Exploring canine genomes - PacBio

https://www.pacb.com/blog/dog-meet-dog-world-exploring-canine-genomes/

Scientists are doing deep dives into the genomes of a range of canine cousins along the evolutionary chain.

#SeqLife

**XeroxHero69** · 04-30-2019, 09:09 AM

Thanks so much, I'll try this out

**XeroxHero69** · 04-30-2019, 11:04 AM

So I have bcftools installed. I want to check for a single-replacement mutation on a specific gene in each of the genomes in the file. Do you know of a way to do this using bcftools? I am new to bioinformatics and this is going way over my head. Thanks

**questor2010** · 04-30-2019, 07:53 PM

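Note that for very large vcf files with many regions to extract, using -t/-T instead of -r/-R can be much faster: -t/-T streams through the whole file, while -r/-R uses the index to jump to each region (for a single region in an indexed file, -r is usually the quicker choice). And use the --threads option, which speeds up writing compressed output.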
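To make the difference between the two option families concrete, here is a minimal sketch. The function names are hypothetical and only wrap the real `bcftools view` options `-r` (index lookup) and `-T` (streaming against a targets file); the file names in the example calls are the ones used elsewhere in this thread.

```shell
# Index-based access: needs a .tbi/.csi index next to the file and jumps
# straight to the requested region. Arguments: <vcf.gz> <region>
view_region_indexed() {
    bcftools view -r "$2" "$1"
}

# Streaming access: reads the file from start to end and keeps only the
# positions listed in the targets file. Arguments: <vcf.gz> <targets file>
view_targets_streamed() {
    bcftools view -T "$2" "$1"
}

# Example calls:
# view_region_indexed 722g.990.SNP.INDEL.chrAll.vcf.gz chr9:55252802
# view_targets_streamed gnomad.genomes.r2.1.sites.vcf.bgz target.bed
```

Both functions write VCF text to standard output, so redirect or pipe the result as needed.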

**XeroxHero69** · 05-01-2019, 09:16 AM

Thanks for your reply. Could you give me an example of the format of what a command that would use bcftools view to find a base at a position would look like?

**questor2010** · 05-01-2019, 09:35 AM

Sure - here's an example:

bcftools view -f PASS --threads 8 -T target.bed -o gnomad.genomes.r2.1.sites.target.vcf.gz -O z gnomad.genomes.r2.1.sites.vcf.bgz

In this case, I'm using a bed file, instead of a single region. It is pulling out the FILTER=PASS variants that intersect the bed file into a new compressed vcf file. The source vcf file in this case is 465GB. If you have a single variant, you could use -t 1:11022. It might be best to specify a short range (1:11015-11030) if you're looking at indels - variant callers represent indels in different ways and you want to be sure you properly intersect them.
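For the original question (which base each genome carries at one position), a sketch along these lines may be closer to what is needed. It assumes the file is bgzip-compressed and indexed, uses `bcftools query` rather than `bcftools view`, and the function name is hypothetical.

```shell
# Print one line for the position, followed by every sample's genotype.
# Arguments: <indexed vcf.gz> <region, e.g. chr9:55252802>
genotypes_at() {
    local vcf="$1" region="$2"
    # -r uses the index to jump to the region.
    # The [...] part of the format string is repeated once per sample;
    # %TGT prints the genotype as bases (for example C/T) instead of 0/1.
    bcftools query -r "$region" \
        -f '%CHROM\t%POS\t%REF\t%ALT[\t%SAMPLE=%TGT]\n' \
        "$vcf"
}

# Example call (file name and coordinates as used later in this thread):
# genotypes_at 722g.990.SNP.INDEL.chrAll.vcf.gz chr9:55252802
```

This only reads the input and prints to the terminal, so it cannot overwrite the source file.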

**XeroxHero69** · 05-01-2019, 10:22 AM

Thanks so much. I tried this command:

bcftools view -f PASS --threads 8 -r chr9:55252802-55252802 -o 722g.990.SNP.INDEL.chrAll.vcf.gz -O z 722g.990.SNP.INDEL.chrAll.vcf.gz

and it returned:
[W::hts_idx_load2] The index file is older than the data file: 722g.990.SNP.INDEL.chrAll.vcf.gz.tbi
[W::hts_idx_load2] The index file is older than the data file: 722g.990.SNP.INDEL.chrAll.vcf.gz.tbi
[W::bgzf_read_block] EOF marker is absent. The input is probably truncated

Do you think you could point out what I did wrong?
Edit: I think I see my error: I made it write the output to the same file as the input, and now I think I ruined that data file... it's only 16.1 kb now

**questor2010** · 05-01-2019, 10:51 AM

Ouch. Yes - that's the problem. I hope you have another copy readily available. The index file warnings are often caused by file transfers: the index is small, so it finishes transferring first and ends up with an older timestamp than the large data file. You can use "touch" on the index file to fix that, or reindex, but reindexing takes a long time with large vcf files.
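A small guard like the following would have caught the overwrite before any data was lost. This is only a sketch: the function names are hypothetical, and it assumes a fresh, intact copy of the data file.

```shell
# Copy the records in one region to a new compressed VCF, refusing to run
# when the output path is the same file as the input.
# Arguments: <input vcf.gz> <output vcf.gz> <region>
safe_extract() {
    local src="$1" dst="$2" region="$3"
    # Compare the paths as text, and also check whether they point at the same file
    if [ "$src" = "$dst" ] || [ "$src" -ef "$dst" ]; then
        echo "refusing to overwrite input file: $src" >&2
        return 1
    fi
    bcftools view -r "$region" -O z -o "$dst" "$src"
}

# Rebuild the tabix (.tbi) index from scratch; slow on large files.
# Argument: <vcf.gz>
rebuild_index() {
    bcftools index -t -f "$1"
}

# Example calls:
# touch 722g.990.SNP.INDEL.chrAll.vcf.gz.tbi   # quick fix for the timestamp warning
# safe_extract 722g.990.SNP.INDEL.chrAll.vcf.gz chr9_55252802.vcf.gz chr9:55252802
```

Using "touch" only silences the warning; it is appropriate when the data file itself is unchanged and the index was simply copied first.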

**XeroxHero69** · 05-06-2019, 08:58 AM

So to make sure I do it right this time, what do I do to make the .vcf.bgz file?

**questor2010** · 05-06-2019, 10:39 AM

The -o switch specifies the output file name, the -O switch specifies the format:

-O, --output-type <b|u|z|v> b: compressed BCF, u: uncompressed BCF, z: compressed VCF, v: uncompressed VCF

So -O z implies that the name should be basename.vcf.gz (or basename.vcf.bgz).

If you have a single variant you just want to examine, you could use -O v and output to a standard vcf file (text).

**XeroxHero69** · 05-06-2019, 10:44 AM

So do I need to create a new file for the output before running this command? How would the command look if I use -O v, without overwriting my file again? Sorry if these are simple questions. Thanks!
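For reference, a minimal sketch of such a command. The output file does not need to exist beforehand; bcftools creates it. The function name and the output file name `chr9_55252802.vcf` are hypothetical; the input name and coordinates are the ones from the earlier attempt.

```shell
# Write the records in one region to a NEW uncompressed (text) VCF.
# Arguments: <input vcf.gz> <region> <output .vcf>
extract_to_text_vcf() {
    local src="$1" region="$2" dst="$3"
    # -O v selects uncompressed VCF; -o names the output, which must differ from the input
    bcftools view -r "$region" -O v -o "$dst" "$src"
}

# Example call:
# extract_to_text_vcf 722g.990.SNP.INDEL.chrAll.vcf.gz chr9:55252802 chr9_55252802.vcf
```

The resulting file is plain text and can be opened with `less` or any text editor.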


Accessing Genomic data in a large vcf.gz file
