I'm trying to upload a BAM file, from an alignment to GRCh38 that I've done, to a Google Genomics dataset, associating it with the reference set for GRCh38. The reason for the failure given in the logfile reads:
Code:
reference names must be a subset of those of the requested reference set: missing ["chr1" "chr10" "chr11" "chr11_KI270721v1_random" "chr12" "chr13" "chr14" "chr14_GL000009v2_random" "chr14_GL000194v1_random" "chr14_GL000225v1_random" ...
It goes on to list another 40 or so fragments. If I do a quick
Code:
samtools idxstats cal1.bam
I do indeed get a whole bunch of chromosome fragments listed. The best I can come up with is that the reference set on Google Genomics doesn't like all the fragments, hence the "reference names must be a subset" message.
The obvious workaround to test this is to remove those chromosomes from the BAM file. Unfortunately,
Code:
samtools view -b cal1.bam chr1 chr2 chr3 > cal-sub-1.bam
samtools index cal-sub-1.bam cal-sub-1.bai
samtools idxstats cal-sub-1.bam
returns a BAM file that does indeed have all the reads from the fragments removed. It still lists the fragments themselves, though, which in turn gives me the same error when I try to load it into the Google Genomics dataset.
How do I remove all references to the fragments from the BAM file? Or is that not what the Google Genomics upload is objecting to?
Thanks
Ben.
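For what it's worth, the fragment names that idxstats still reports come from the @SQ lines in the BAM header, which a region filter with samtools view leaves in place. One thing that might be worth trying is to stream the reads out as SAM text, drop the unwanted @SQ lines, and re-encode to BAM. This is only an untested sketch: filter_sq is a hypothetical helper name, the list of contigs to keep is an assumption, and I don't know whether it would satisfy the Google Genomics check.

```shell
# Hypothetical helper: reads SAM text on stdin and keeps only the @SQ header
# lines whose SN: tag names a contig in the space-separated list given as $1.
# All other lines (other header records and alignments) pass through unchanged.
filter_sq() {
  awk -F '\t' -v keep="$1" '
    BEGIN {
      n = split(keep, names, " ")
      for (i = 1; i <= n; i++) want["SN:" names[i]] = 1
    }
    /^@SQ/ {
      # Print the @SQ line only if one of its fields is a wanted SN: tag.
      for (f = 2; f <= NF; f++) if ($f in want) { print; break }
      next
    }
    { print }
  '
}

# Assumed usage (needs samtools and an indexed cal1.bam; not run here):
#   samtools view -h cal1.bam chr1 chr2 chr3 \
#     | filter_sq "chr1 chr2 chr3" \
#     | samtools view -b -o cal-sub-1.bam -
```

Reads whose mate sits on one of the dropped contigs may trigger warnings when the text is converted back to BAM, so the output would need checking with samtools idxstats before uploading.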