SEQanswers

tirohia 09-26-2017 06:31 PM

removing chromosomes from a bam file.
 
I'm trying to upload a BAM file from my own alignment to GRCh38 into a Google Genomics dataset, associating it with the GRCh38 reference set. The upload fails, and the log file gives this reason:

Code:

reference names must be a subset of those of the requested
    reference set: missing ["chr1" "chr10" "chr11" "chr11_KI270721v1_random" "chr12"
    "chr13" "chr14" "chr14_GL000009v2_random" "chr14_GL000194v1_random" "chr14_GL000225v1_random" ...

It goes on to list another 40 or so fragments. If I do a quick

Code:

samtools idxstats cal1.bam

I do indeed get a whole bunch of chromosome fragments listed. The best explanation I can come up with is that the reference set on Google Genomics doesn't accept these fragments, hence the "reference names must be a subset" message.
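The contig names that idxstats reports come from the @SQ lines in the BAM header. As a rough illustration, assuming UCSC-style names like those in the error above, a small filter can list every header contig that is not a primary chromosome (chr1 to chr22, chrX, chrY, chrM), i.e. the randoms, chrUn_* and alt contigs. `list_fragment_contigs` is a hypothetical helper, not a samtools feature.

```shell
# list_fragment_contigs (hypothetical helper): read SAM header text on
# stdin, e.g. from `samtools view -H in.bam`, and print every @SQ name
# that does not look like a primary chromosome.
# Assumption: UCSC-style "chr" naming, as in the Google Genomics error.
list_fragment_contigs() {
    awk -F'\t' '
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) {
                    name = substr($i, 4)
                    # Primary chromosomes carry no suffix; randoms,
                    # unplaced (chrUn_*) and alt contigs do.
                    if (name !~ /^chr([0-9]+|X|Y|M)$/) print name
                }
            }
        }'
}
```

Usage: `samtools view -H cal1.bam | list_fragment_contigs`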
The obvious way to test this is to remove those chromosomes from the BAM file. Unfortunately,

Code:

samtools view -b cal1.bam chr1 chr2 chr3 > cal-sub-1.bam
samtools index cal-sub-1.bam cal-sub-1.bai
samtools idxstats cal-sub-1.bam

returns a BAM file that does indeed drop all the reads on the fragments, but idxstats still lists the fragments themselves, and loading this file into the Google Genomics dataset gives me the same error.
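This is expected behaviour: region arguments to `samtools view` select records, but the output keeps the input's full header, so every @SQ line (and therefore every idxstats row) survives. Editing only the header, e.g. with `samtools reheader`, is risky, because BAM records refer to contigs by their index in the @SQ list, so deleting @SQ lines can silently shift reads onto the wrong contig. One possible approach, sketched here under the assumption that a round trip through SAM text is acceptable for the file size, is to filter header and records together as text and let samtools re-encode, which recomputes the reference indices. `keep_contigs` is a hypothetical helper written for illustration.

```shell
# keep_contigs (hypothetical helper): filter SAM text on stdin, keeping
# only the contigs named in the comma-separated list $1, in both the
# header and the alignment records.
keep_contigs() {
    awk -F'\t' -v keep="$1" '
        BEGIN {
            n = split(keep, names, ",")
            for (i = 1; i <= n; i++) allowed[names[i]] = 1
        }
        # Header: drop @SQ lines for contigs not in the list; every
        # other header line (@HD, @RG, @PG, ...) passes through.
        /^@/ {
            if ($1 == "@SQ") {
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^SN:/ && !(substr($i, 4) in allowed)) next
            }
            print
            next
        }
        # Records: RNAME ($3) must be a kept contig or "*"; RNEXT ($7)
        # must be "=", "*" or a kept contig, otherwise the mate would
        # point at a reference that no longer exists in the header.
        ($3 == "*" || ($3 in allowed)) && ($7 == "=" || $7 == "*" || ($7 in allowed))
    '
}
```

Usage, keeping the same three chromosomes as above:

    samtools view -h cal1.bam | keep_contigs chr1,chr2,chr3 | samtools view -b -o cal-sub-1.bam -
    samtools index cal-sub-1.bam

In this sketch, a read whose mate lies on a dropped contig is dropped too rather than having its mate fields rewritten, and optional tags such as SA:Z are left untouched.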

How do I remove all references to the fragments from the BAM file? Or is that not what the Google Genomics upload is objecting to?

Thanks
Ben.

GenoMax 09-27-2017 03:22 AM

Did you check the headers from the subset BAM files? Those may still contain the offending chromosomes.
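The header can be printed with `samtools view -H cal-sub-1.bam`. Note also that the error message lists "chr1", "chr10" and other primary chromosomes as missing, not just fragments, which suggests the reference set may name its sequences differently (for example without the "chr" prefix); if so, dropping fragments alone would not satisfy the check. A minimal sketch of the "must be a subset" test, assuming a hypothetical text file with one reference-set name per line that you would have to build yourself:

```shell
# missing_from_refset (hypothetical helper): print @SQ names from SAM
# header text on stdin that do not appear in file $1 (one allowed
# reference name per line). Assumes the allowed-names file is non-empty,
# since the NR == FNR idiom below relies on it.
missing_from_refset() {
    awk -F'\t' '
        # First input (the allowed-names file): remember every name.
        NR == FNR { allowed[$1] = 1; next }
        # Second input (stdin): report header contigs not in the list.
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/ && !(substr($i, 4) in allowed))
                    print substr($i, 4)
        }' "$1" -
}
```

Usage: `samtools view -H cal-sub-1.bam | missing_from_refset refset_names.txt`, where `refset_names.txt` is the hypothetical list of names. Empty output means the header's names are a subset of that list.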


All times are GMT -8.