I am using TopHat to perform single-end alignment against a plant genome that has 12 chromosomes. TopHat generated a file called accepted_hits.bam (1.75 GB), which I converted to a BED file using the bamToBed utility.
Then I split the BED file by chromosome. Below is the number of lines (reads) for each chromosome.
chrom1 - 4.9M
chrom2 - 6.2M
chrom3 - 5.0M
chrom4 - 2.4M
chrom5 - 5.3M
chrom6 - 2.8M
chrom7 - 3.0M
chrom8 - 2.6M
chrom9 - 12.6M
chrom10 - 1.7M
chrom11 - 1.8M
chrom12 - 1.8M
My question: Why are there 12.6 million reads on chromosome 9?
I looked at the genome map and chromosome 9 is the second smallest (chrom 10 is the smallest).
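For reference, here is a minimal sketch of how such per-chromosome counts can be tallied directly from the BED file. It assumes the default six-column bamToBed output (chrom, start, end, read name, score, strand); the function name `count_per_chrom` and the command-line usage are hypothetical, not part of any tool. Because bamToBed writes one line per alignment, the sketch also counts distinct read names, so it shows whether a line count is inflated by reads that align to more than one location.

```python
import sys
from collections import defaultdict


def count_per_chrom(bed_lines):
    """Return {chrom: (alignment_lines, distinct_read_names)}.

    Assumes tab-separated BED with the read name in column 4
    (the default bamToBed layout); shorter lines are skipped.
    """
    line_counts = defaultdict(int)
    read_names = defaultdict(set)
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # not a BED line with a name column
        chrom, name = fields[0], fields[3]
        line_counts[chrom] += 1
        read_names[chrom].add(name)
    return {c: (line_counts[c], len(read_names[c])) for c in line_counts}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_bed.py accepted_hits.bed
    with open(sys.argv[1]) as handle:
        for chrom, (n_lines, n_reads) in count_per_chrom(handle).items():
            print(f"{chrom}\t{n_lines}\t{n_reads}")
```

If the two numbers for a chromosome differ a lot, the line count is not a count of reads. Note that the sets of read names are held in memory, which is fine for a sketch but may be heavy for very large files.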