SEQanswers > Bioinformatics > Bioinformatics
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
How to filter out chromosomes/specific regions from a BAM file? | fcr | Bioinformatics | 6 | 10-29-2018 05:24 AM
error in the location of chromosomes (bam files) - ncRNA ref (alignament tophat) | tfcardoso | Bioinformatics | 0 | 10-22-2016 07:38 AM
Write the subset of reads from BAM file into new SAM/BAM file, using R tools. | Old Pioneer | Bioinformatics | 0 | 01-27-2016 04:41 AM
removing reads that map to more than one location from gsnap-aligned bam files | efoss | Bioinformatics | 4 | 07-14-2015 07:39 PM
Removing pairs that align to almost the same positions from bam | ShellfishGene | Bioinformatics | 2 | 07-16-2013 02:00 AM

09-26-2017, 06:31 PM #1
Location: Auckland, NZ

Join Date: Nov 2011
Posts: 46
Removing chromosomes from a BAM file

I'm trying to upload a BAM file from an alignment to GRCh38 that I've done into a Google Genomics dataset, associating it with the GRCh38 reference set. The upload fails, and the reason given in the log file reads:

reference names must be a subset of those of the requested
    reference set: missing ["chr1" "chr10" "chr11" "chr11_KI270721v1_random" "chr12"
    "chr13" "chr14" "chr14_GL000009v2_random" "chr14_GL000194v1_random" "chr14_GL000225v1_random" ...
It goes on to list another 40 or so fragments. If I do a quick

samtools idxstats cal1.bam
I do indeed get a whole bunch of chromosome fragments listed. The best explanation I can come up with is that the Google Genomics reference set doesn't contain all these fragments, hence the "reference names must be a subset" message.
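One way to separate the primary chromosomes from the fragments in that idxstats listing is to filter on the contig name. This is a minimal sketch, not something from the thread: the helper names are hypothetical, and it assumes the UCSC-style names seen in the error message, treating chr1 to chr22, chrX, chrY and chrM as primary and everything else (`_random`, `chrUn_`, `_alt` contigs and so on) as a fragment. idxstats prints one tab-separated line per contig (name, length, mapped reads, unmapped reads), plus a final `*` line for unplaced unmapped reads.

```shell
# Hypothetical helpers that read `samtools idxstats` output on stdin.
# idxstats columns: contig name, length, mapped reads, unmapped reads.

# Print primary-assembly contig names (chr1-chr22, chrX, chrY, chrM), one per line.
primary_contigs() {
  awk -F '\t' '$1 ~ /^chr([0-9]+|X|Y|M)$/ { print $1 }'
}

# Print every other contig (assumed to be a fragment) with its mapped-read
# count, skipping the "*" line that idxstats uses for unplaced unmapped reads.
fragment_contigs() {
  awk -F '\t' '$1 != "*" && $1 !~ /^chr([0-9]+|X|Y|M)$/ { print $1 "\t" $3 }'
}
```

For example, `samtools view -b -o cal-sub-1.bam cal1.bam $(samtools idxstats cal1.bam | primary_contigs)` would keep the reads on all primary chromosomes rather than only chr1 to chr3. Like the region-based command in the thread, it filters reads only and leaves the header's @SQ lines untouched.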
The obvious way to test this is to remove those chromosomes from the BAM file. Unfortunately,

samtools view -b cal1.bam chr1 chr2 chr3 > cal-sub-1.bam
samtools index cal-sub-1.bam cal-sub-1.bai
samtools idxstats cal-sub-1.bam
produces a BAM file that does drop all the reads on the fragments, but idxstats still lists the fragments themselves, and when I try to load it into the Google Genomics dataset, I get the same error.

How do I remove all references to the fragments from the BAM file? Or is that not what the Google Genomics upload is objecting to?

Posted by tirohia
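To check whether the header names are what the upload objects to, the @SQ names in a BAM header can be compared against the reference set's sequence names. A minimal sketch, assuming those names have already been saved one per line to a text file (the thread does not say how to export them from Google Genomics); the helper name and the file name are hypothetical.

```shell
# Hypothetical helper: read a SAM header (e.g. from `samtools view -H in.bam`)
# on stdin and print every @SQ name that is NOT listed in the file given as $1
# (one allowed reference name per line). Empty output means the header names
# are a subset of the allowed set.
missing_from_refset() {
  awk -F '\t' -v allowed="$1" '
    BEGIN {
      # Load the allowed names into a lookup table.
      while ((getline name < allowed) > 0) ok[name] = 1
      close(allowed)
    }
    $1 == "@SQ" {
      # @SQ fields are TAG:VALUE pairs; find the SN (sequence name) tag.
      for (i = 2; i <= NF; i++)
        if (substr($i, 1, 3) == "SN:") {
          sn = substr($i, 4)
          if (!(sn in ok)) print sn
        }
    }'
}
```

Usage might look like `samtools view -H cal-sub-1.bam | missing_from_refset refset_names.txt`. Note that the error message lists `chr1` itself as missing, not only the fragments; if `chr1` also shows up here, the reference set may name its sequences differently (for example without the `chr` prefix), in which case removing the fragments alone would not fix the upload. That is a guess the thread does not confirm.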
09-27-2017, 03:22 AM #2
Senior Member
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,143

Did you check the headers of the subset BAM files? They may still contain the offending chromosomes.
Posted by GenoMax
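Following that suggestion, one way to remove the fragments from the header as well as the reads is to round-trip through SAM text, dropping the unwanted @SQ lines and their alignments together, and then re-encode to BAM. This is a minimal sketch; the helper name and the comma-separated keep list are hypothetical. Filtering the text, rather than swapping in an edited header with `samtools reheader`, matters because BAM records refer to contigs by their position in the @SQ list, so deleting @SQ lines that sit before a kept contig could shift its reads onto the wrong contig. Records whose mate (RNEXT) is on a removed contig are dropped too, because the new header would not define that contig; their mates are kept with their flags unchanged.

```shell
# Hypothetical helper: filter SAM text on stdin (from `samtools view -h`),
# keeping only the contigs named in $1 (comma-separated).
#  - @SQ lines for other contigs are dropped; other header lines are kept.
#  - An alignment is kept if RNAME is a kept contig or "*" (unplaced) and
#    RNEXT is "=", "*" or a kept contig; everything else is dropped.
keep_contigs() {
  awk -F '\t' -v keep="$1" '
    BEGIN {
      # Build a lookup table of contig names to keep.
      n = split(keep, names, ",")
      for (i = 1; i <= n; i++) ok[names[i]] = 1
    }
    # @SQ header line: keep it only if its SN: tag names a kept contig.
    $1 == "@SQ" {
      sn = ""
      for (i = 2; i <= NF; i++)
        if (substr($i, 1, 3) == "SN:") sn = substr($i, 4)
      if (sn in ok) print
      next
    }
    # Every other header line (@HD, @RG, @PG, @CO) passes through unchanged.
    /^@/ { print; next }
    # Alignment line: RNAME is column 3, RNEXT is column 7.
    ($3 == "*" || ($3 in ok)) && ($7 == "=" || $7 == "*" || ($7 in ok)) { print }'
}
```

Usage might look like `samtools view -h cal1.bam | keep_contigs chr1,chr2,chr3 | samtools view -b -o cal-sub-1.bam -` followed by `samtools index cal-sub-1.bam`. Because the kept contigs stay in their original relative order, a coordinate-sorted input should remain sorted and can be indexed directly; `samtools idxstats cal-sub-1.bam` should then list only the kept contigs plus the `*` line.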

Tags: google cloud, samtools

All times are GMT -8.
