Seqanswers Leaderboard Ad

**lanner** · 05-04-2014, 04:22 PM

This is head output of some of my files:

hg19.gtf:

chr1 hg19_refGene start_codon 67000042 67000044 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene CDS 67000042 67000051 0.000000 + 0 gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene exon 66999825 67000051 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene CDS 67091530 67091593 0.000000 + 2 gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene exon 67091530 67091593 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene CDS 67098753 67098777 0.000000 + 1 gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene exon 67098753 67098777 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene CDS 67101627 67101698 0.000000 + 0 gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene exon 67101627 67101698 0.000000 + . gene_id "NM_032291"; transcript_id "NM_032291";
chr1 hg19_refGene CDS 67105460 67105516 0.000000 + 0 gene_id "NM_032291"; transcript_id "NM_032291";

mrna.fasta:

>AF001540 1
ggcacgaggcaggtctgtctgttctgttggcaagtaaatgcagtactgtt
ctgatcccgctgctattagaatgcattgtgaaacgactggagtatgatta
aaagttgtgttccccaatgcttggagtagtgattgttgaaggaaaaaatc
cagctgagtgataaggctgagtgttgaggaaatttctgcagttttaagca
gtcgtatttgtgattgaagctgagtacatttgctggtgtatttttaggta
aaatgcttttttgttcatttctgggtggtgggaggggactgaagccttta
gtcttttccagatgcaaccttaaaatcagtgacaagaaacattccaaaca
agcaacagtcttcaagaaattaaactggcaagtggaaatgtttaaacagt
tcagtgatctttagtgcattgtttatgtgtgggtttctctctcccctccc

FourA.sam:

@SQ SN:AF001540 LN:1781
@SQ SN:AF001541 LN:1138
@SQ SN:AF001542 LN:2992
@SQ SN:AF001543 LN:903
@SQ SN:AF001544 LN:434
@SQ SN:AF001545 LN:370
@SQ SN:AF001546 LN:1142
@SQ SN:AF001547 LN:1092
@SQ SN:AF034176 LN:7232
@SQ SN:AF038950 LN:2384

FourA.count (head and tail)

head:
NM_000014 0
NM_000015 0
NM_000016 0
NM_000017 0
NM_000018 0
NM_000019 0
NM_000020 0
NM_000021 0
NM_000022 0
NM_000023 0

tail:
NR_046431 0
NR_046432 0
NR_046433 0
NR_046435 0
NR_046436 0
no_feature 257
ambiguous 0
too_low_aQual 0
not_aligned 6366159
alignment_not_unique 0

**dpryan** · 05-05-2014, 12:36 AM

Since you mapped against the transcriptome, htseq-count (or any tool for that matter) won't be able to give you counts. The reason for this is that these tools look at the coordinates in the SAM file (e.g., "AF001540:100-300") and then the GTF file and try to see if they overlap (and if the read overlaps multiple things). Since AF001540 (or any other transcript ID) doesn't exist as a chromosome or contig in your annotation file, this will never work.

You have two possibilities for moving forward. (1) Map to the genome with STAR or tophat2 or a similar tool. The resulting alignments will work fine with htseq-counts (or featureCounts, which is faster). (2) Realize that the counts are equivalent to the number of reads mapping to each contig/transcript and that you can get this information with "samtools idxstats sorted.and.indexed.alignments.bam". Option (2) is the less ideal of the two, since you can't filter by MAPQ (well you can, but you need to just create a new BAM file to do so) and even if you could it would be misleading since you'd have reads that look like they map to multiple places but actually don't since it's likely you have many transcripts that overlap. I would strongly encourage you to just remap things.

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 30 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 32 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 28 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 53 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Zero counts in htseq-count output

Comment

Comment

Latest Articles

ad_right_rmr

News