  • htseq-count low count problem

    Hi guys,

    I am analyzing some RNA-seq data with DESeq2, and before that I need a count table with one count per gene.

    Here is what I have done:

    I mapped the stranded, paired-end reads to the transcriptome using tophat2:
    Code:
    tophat -p 12 -r 60 -o $out --transcriptome-only --no-novel-juncs --no-coverage-search --library-type fr-firststrand --transcriptome-index=$known $hg19 lane1.1.repaa_val_1.fq lane1.2.repaa_val_2.fq
    I got about 6.5 million mapped pairs, which is about 50% of the input reads.

    Then I took the mapped reads and counted them against a GTF annotation:

    Code:
    htseq-count -f bam -r pos -t exon -i gene_id accepted_hits.bam hg19.gtf > accepted_hits.bam.counts
    I got 0.4 million reads counted into the table, and the number of no_feature reads is about 13 million.

    I also tried sorting the BAM file with samtools and passing -r name to htseq-count, but that didn't work either.

    This is what the GTF file looks like:
    Code:
    chr1	unknown	exon	11874	12227	.	+	.	gene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "NR_046018"; tss_id "TSS14844";
    chr1	unknown	exon	12613	12721	.	+	.	gene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "NR_046018"; tss_id "TSS14844";
    chr1	unknown	exon	13221	14409	.	+	.	gene_id "DDX11L1"; gene_name "DDX11L1"; transcript_id "NR_046018"; tss_id "TSS14844";
    chr1	unknown	exon	14362	14829	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	14970	15038	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	15796	15947	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	16607	16765	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	16858	17055	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	17233	17368	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	17606	17742	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	17915	18061	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	18268	18366	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	24738	24891	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	29321	29370	.	-	.	gene_id "WASH7P"; gene_name "WASH7P"; transcript_id "NR_024540"; tss_id "TSS7514";
    chr1	unknown	exon	34611	35174	.	-	.	gene_id "FAM138A"; gene_name "FAM138A"; transcript_id "NR_026818_1"; tss_id "TSS8403";
    chr1	unknown	exon	34611	35174	.	-	.	gene_id "FAM138F"; gene_name "FAM138F"; transcript_id "NR_026820_1"; tss_id "TSS8403";
    chr1	unknown	exon	35277	35481	.	-	.	gene_id "FAM138A"; gene_name "FAM138A"; transcript_id "NR_026818_1"; tss_id "TSS8403";
    chr1	unknown	exon	35277	35481	.	-	.	gene_id "FAM138F"; gene_name "FAM138F"; transcript_id "NR_026820_1"; tss_id "TSS8403";
    chr1	unknown	exon	35721	36081	.	-	.	gene_id "FAM138A"; gene_name "FAM138A"; transcript_id "NR_026818_1"; tss_id "TSS8403";
    chr1	unknown	exon	35721	36081	.	-	.	gene_id "FAM138F"; gene_name "FAM138F"; transcript_id "NR_026820_1"; tss_id "TSS8403";
    chr1	unknown	CDS	69091	70005	.	+	0	gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    chr1	unknown	exon	69091	70008	.	+	.	gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    chr1	unknown	start_codon	69091	69093	.	+	.	gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    chr1	unknown	stop_codon	70006	70008	.	+	.	gene_id "OR4F5"; gene_name "OR4F5"; p_id "P1230"; transcript_id "NM_001005484"; tss_id "TSS14428";
    Any ideas? Thanks.

  • #2
    Have a look at the alignments in IGV, or use htseq-count's -o option to track what's happening to reads that aren't getting counted but that you think should be.
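
    In case it helps, here is a minimal sketch of the -o route, not a definitive recipe. It assumes htseq-count was rerun with -o annotated.sam (a hypothetical file name) and that your HTSeq version writes the special categories with a "__" prefix (e.g. __no_feature); older versions may not, in which case the grouping below needs adjusting. The helper name tally_xf is made up for illustration. It tallies the XF:Z tag that htseq-count appends to each alignment record.

```shell
#!/usr/bin/env bash
# Assumed way the input was produced (annotated.sam is a hypothetical name):
#   htseq-count -f bam -r pos -t exon -i gene_id -o annotated.sam accepted_hits.bam hg19.gtf
#
# tally_xf FILE
#   Counts alignment records per XF:Z category in the SAM written by htseq-count -o.
#   Tags starting with "__" are treated as special categories (assumption about the
#   HTSeq version); any other tag value is a gene id and is lumped into "assigned".
tally_xf() {
    awk -F '\t' '
        /^@/ { next }                                  # skip SAM header lines, if any
        {
            tag = "untagged"
            for (i = 12; i <= NF; i++)                 # optional SAM fields start at column 12
                if ($i ~ /^XF:Z:/) tag = substr($i, 6)
            if (tag ~ /^__/) sub(/\[.*$/, "", tag)     # __ambiguous[A+B] -> __ambiguous
            else if (tag != "untagged") tag = "assigned"
            n[tag]++
        }
        END { for (t in n) print n[t] "\t" t }
    ' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k2,2
}
```

    Calling tally_xf annotated.sam prints one line per category with its record count. If nearly everything lands in __no_feature, picking a few of those reads and viewing them in IGV next to the annotation should show whether they sit on the opposite strand of the exons.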

    • #3
      Most mapped reads get counted if I specify -s no, but the library was made with a strand-specific protocol and mapped with tophat using --library-type fr-firststrand. I don't know why.

      • #4
        I'd have to recheck the strandedness settings, but htseq-count defaults to "-s yes", so maybe you just need "-s reverse" to match your library type.
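
        A rough way to check this, sketched under assumptions rather than tested on your data: rerun htseq-count once per -s setting and compare what fraction of reads ends up assigned to genes. The helper assigned_fraction and the counts.*.txt file names are hypothetical. The special counter rows are matched with or without the "__" prefix, since the prefix depends on the HTSeq version.

```shell
#!/usr/bin/env bash
# assigned_fraction FILE
#   Reads an htseq-count table (feature<TAB>count) and prints:
#     <reads assigned to genes> <all reads in the table> <assigned fraction>
#   Rows named like the special counters (no_feature, ambiguous, ...) are not
#   counted as assigned, whether or not they carry the "__" prefix.
assigned_fraction() {
    awk -F '\t' '
        $1 ~ /^(__)?(no_feature|ambiguous|too_low_aQual|not_aligned|alignment_not_unique)$/ {
            special += $2
            next
        }
        { assigned += $2 }
        END {
            total = assigned + special
            if (total == 0) { print "0 0 0.0000"; exit }
            printf "%d %d %.4f\n", assigned, total, assigned / total
        }
    ' "$1"
}

# Assumed usage, reusing the file names from the first post:
#   for s in yes no reverse; do
#       htseq-count -f bam -r pos -s "$s" -t exon -i gene_id accepted_hits.bam hg19.gtf > "counts.$s.txt"
#       printf '%s\t' "$s"; assigned_fraction "counts.$s.txt"
#   done
```

        If -s reverse gives an assigned fraction similar to -s no while -s yes stays near zero, that would point to the strand setting rather than to the mapping.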
