So I've just used the pre-built GRCh38 bowtie2 index to map paired end reads using Tophat. Approximately 70% of the reads mapped (~11000000).
These were then sorted by name (samtools sort -n). I then downloaded the Homo_sapiens.GRCh38.88.gtf file from Ensembl and ran the sorted reads through HTSeq:
This has resulted in zero reads mapping. I get ~1/3rd of reads registering as no feature and the rest register as alignment not unique. I also get nothing if I ask to count exon rather that gene. When I load a sample of the reads up into Seqmonk though, I get numerous reads mapping to genes. Any thoughts as to why?
The only thing that I can think of is that the bowtie index file isn't the same assembly as the gtf file. I'm in the process of downloading corresponding fa file (Homo_sapiens.GRCh38.dna.toplevel.fa.gz I think) from ensembl and trying to align everything again. I'm not particularly sanguine though as I'm assuming the gtf that Seqmonk uses is coming from ensembl as well, whilst my alignment is using the bowtie2 pre-built index.
Cheers
Ben.
Code:
tophat -o pilot_S10.5 ../bowtieIndex/GCA_000001405.15_GRCh38_n o_alt_analysis_set.fna.bowtie_index pilot_S10_L005_R1_001.fastq pilot_S10_L005_R2_001.fastq
Code:
htseq-count -f bam -t gene --stranded=yes accepted.sorted.bam /projects/comm00007/rnaSeqData/Homo_sapiens.GRCh38.88.gtf > sample.count
The only thing that I can think of is that the bowtie index file isn't the same assembly as the gtf file. I'm in the process of downloading corresponding fa file (Homo_sapiens.GRCh38.dna.toplevel.fa.gz I think) from ensembl and trying to align everything again. I'm not particularly sanguine though as I'm assuming the gtf that Seqmonk uses is coming from ensembl as well, whilst my alignment is using the bowtie2 pre-built index.
Cheers
Ben.
Comment