Hi. I'm using HTSeq to produce read count files that I can then use with NOISeq and edgeR. I trimmed my reads with Trimmomatic in paired-end mode and aligned the paired reads together in TopHat2 using the tophat command shown below.
I then sorted the alignments using the samtools sort function. When I run the resulting BAM files through htseq-count, I get a really high percentage (95%) of reads with a missing mate, and an ambiguous-feature rate of around 65%.
I ran htseq-count with the command shown below, on a GTF file I produced myself with cuffmerge from the RNA-seq data I collected.
Can anyone help me understand why I am getting such a high ambiguous count rate and such a high percentage of missing mates?
Thanks
Code:
/home/user/Downloads/tophat2/tophat -p 4 -o /home/user/Desktop/sam/trimmomatic_out/2B82 --mate-inner-dist 100 --library-type fr-firststrand /home/user/Downloads/bowtie2/example/index/DinoAnt /home/user/Desktop/sam/trimmomatic_out/2B82/output_forward_paired.fastq.gz /home/user/Desktop/sam/trimmomatic_out/2B82/output_reverse_paired.fastq.gz
Code:
htseq-count -m union -s no -t exon -f bam -i gene_id /home/user/Desktop/sam/trimmomatic_out/10G87/accepted_hits.sorted.bam /home/user/Desktop/new_dino_merged.gtf