SEQanswers

09-26-2017, 07:31 PM   #1
tirohia
Member
 
Location: Auckland, NZ

Join Date: Nov 2011
Posts: 46
Removing chromosomes from a BAM file

I'm trying to upload a BAM file, from an alignment to GRCh38 that I did myself, to a Google Genomics dataset, associating it with the reference set for GRCh38. The upload fails, and the reason given in the log file reads:

Code:
reference names must be a subset of those of the requested
    reference set: missing ["chr1" "chr10" "chr11" "chr11_KI270721v1_random" "chr12"
    "chr13" "chr14" "chr14_GL000009v2_random" "chr14_GL000194v1_random" "chr14_GL000225v1_random" ...
It goes on to list another 40 or so fragments. If I do a quick

Code:
samtools idxstats cal1.bam
I do indeed get a whole bunch of chromosome fragments listed. The best I can come up with is that the reference set on Google Genomics doesn't recognize all the fragments, hence the "reference names must be a subset" message.
The obvious workaround to test this is to remove those chromosomes from the BAM file. Unfortunately,

Code:
samtools view -b cal1.bam chr1 chr2 chr3 > cal-sub-1.bam
samtools index cal-sub-1.bam cal-sub-1.bai
samtools idxstats cal-sub-1.bam
returns a BAM file that does indeed drop all the reads that mapped to the fragments. The file still lists the fragments themselves, though, so when I try to load it into the Google Genomics dataset I get the same error.

How do I remove all references to the fragments from the BAM file? Or is that not what the Google Genomics upload is objecting to?

Thanks
Ben.
09-27-2017, 04:22 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,585

Did you check the headers from the subset BAM files? Those may still contain the offending chromosomes.
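
A possible way to act on this, offered as a rough sketch rather than a tested recipe: region filtering with samtools view drops reads but copies the full header, so the @SQ lines for the fragments survive. One option is to pass the alignments through as SAM text and drop the unwanted @SQ lines along with the reads. The function name keep_contigs is made up for illustration, and the chr1 chr2 chr3 list just mirrors the example in the first post. Reads whose mate sits on a removed contig still name it in the RNEXT column, which this sketch does not handle.

Code:
```shell
# Sketch: keep only the named contigs in both the SAM header and the records.
# Reads SAM text (header + alignments) on stdin, writes filtered SAM to stdout.
# keep_contigs is a hypothetical helper, not part of samtools.
keep_contigs() {
    # Join the wanted contig names with commas so awk can split them again.
    keep=$(IFS=,; printf '%s' "$*")
    awk -F'\t' -v keep="$keep" '
        BEGIN {
            n = split(keep, names, ",")
            for (i = 1; i <= n; i++) want[names[i]] = 1
        }
        # @SQ header lines: print only if the SN: field names a wanted contig.
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/ && (substr($i, 4) in want)) { print; break }
            next
        }
        # Other header lines (@HD, @RG, @PG, @CO) pass through unchanged.
        /^@/ { print; next }
        # Alignment records: column 3 is RNAME; unmapped reads ("*") are dropped.
        ($3 in want) { print }
    '
}

# Assumed usage with the file names from the first post (needs samtools):
#   samtools view -h cal1.bam | keep_contigs chr1 chr2 chr3 | samtools view -b -o cal-sub-1.bam -
#   samtools index cal-sub-1.bam
#   samtools view -H cal-sub-1.bam | grep '^@SQ'   # check which @SQ lines remain
```

Whether Google Genomics accepts the result is not something this thread confirms; the sketch only addresses the header entries.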
Tags
google cloud, samtools
