**Simon Anders** · 10-10-2012, 04:41 AM

How have you called htseq-count? Please post the precise command line, and any output generated.

**EGrassi** · 10-10-2012, 05:03 AM

Thank you for your answer:

Code:

samtools sort -n /rogue/bioinfotree/task/RNAseq/dataset/0.1/GSE27003/alignment_tophat/SRR097789.merged.bam SRR097789_sorted  
samtools view SRR097789_sorted.bam | htseq-count -m intersection-nonempty --stranded=no - annotation.gtf > nonoverlap_nonempty_counts_SRR097789_htseq 2> nonoverlap_nonempty_counts_SRR097789_htseq.err

No output is reported, apart from the messages about processing the GTF file.
I've manually inspected the BAM file and it seems to have a lot of missing read IDs (which puzzles me: I got the data from SRA and just dumped it to FASTQ files, which look OK to me, and then ran the standard TopHat pipeline), so now I'm trying to filter those records out to see whether they cause the problem.
samtools flagstat does not report anything wrong with the BAM file itself, apart from a very low percentage of properly paired reads.

**Simon Anders** · 10-10-2012, 05:09 AM

htseq-count starts by reading in the annotation file and reports when it's finished with that, before looking at the reads. So, if it prints nothing, your problem is with the GTF file. What does it look like?
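As a quick sanity check along the lines Simon suggests, the sketch below counts the GTF lines htseq-count would use with its default settings (feature type `exon`, grouped by the `gene_id` attribute), and how many exon lines lack a `gene_id`. It assumes a Bash shell with a POSIX awk and a tab-separated GTF; the function name `check_gtf_exons` is hypothetical and not part of HTSeq.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize the exon lines of a GTF file.
# htseq-count's defaults are "-t exon" (feature type) and "-i gene_id" (ID attribute).
check_gtf_exons() {
  # Skip comment lines, look only at column 3 == "exon",
  # and check whether column 9 (attributes) contains gene_id "...".
  awk -F'\t' '!/^#/ && $3 == "exon" {
      if ($9 ~ /gene_id "/) ok++; else missing++
    }
    END { printf "exon lines with gene_id: %d, without: %d\n", ok + 0, missing + 0 }' "$1"
}
```

For example, `check_gtf_exons annotation.gtf`. If the first number is 0, htseq-count has nothing to count against with its defaults; a non-zero second number is also worth a look, since as far as I know htseq-count expects the ID attribute on every feature of the selected type.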

**EGrassi** · 10-10-2012, 05:13 AM

Sorry, maybe I did not make that clear: apart from the messages about processing the GTF file, there is no output.
It did finish processing the GTF file. If you want the exact numbers it printed I would have to rerun it, but I swear it printed them, and the same GTF file worked with other BAM files.

**EGrassi** · 10-10-2012, 05:36 AM

OK, I tried removing the lines with an empty read name from the SAM files (I hope to find out why they're there), and htseq-count finished without any problem. The counts seem sensible to me (not from a biological point of view, OK, but they are numbers and sometimes they are different from 0!). If you want a sample of the BAM/SAM files with those strange lines I can send it to you. I do not know whether such lines violate the BAM format specification and samtools is just being lenient with them, or whether they are valid and this is a small bug in htseq-count or in the library it uses to parse BAM files.
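The filtering step described above can be sketched as an awk filter placed between samtools view and htseq-count. This is a minimal sketch assuming a Bash shell with a POSIX awk; the function name `drop_empty_qname` is hypothetical. It keeps header lines (those starting with `@`) and drops alignment records whose first tab-separated field (QNAME) is empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read SAM text on stdin, write it to stdout
# without the alignment records whose QNAME (column 1) is empty.
drop_empty_qname() {
  # A record that begins with a tab has an empty first field.
  # Header lines ("@HD", "@SQ", ...) are passed through unchanged.
  awk -F'\t' '/^@/ || $1 != ""'
}
```

Spliced into the command line from earlier in the thread, this would look like `samtools view SRR097789_sorted.bam | drop_empty_qname | htseq-count -m intersection-nonempty --stranded=no - annotation.gtf > counts.txt`, where `counts.txt` is a placeholder output name. This only hides the malformed records; it does not explain where they come from.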

**EGrassi** · 10-11-2012, 01:01 AM

...because of tophat?

OK, since the FASTQ files seem fine to me (they have an ID for each read, for example), I'm starting to think that TopHat is responsible for the strange BAM/SAM files.

I have "normal" lines like this one:

Code:

SRR097789.29777200      89      chr19   55899346        255     50M     *       0       0       ATGCTCGCGCCNCGNTCAGCAGCATCAGACACATGATCCGCAAGAACAAG      AHFH@44455-!5-!:A:A:DHHHHHHHHHH=HEHDDDHDHHHHFFEGFG      AS:i:-2 XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:11A2C35    YT:Z:UU XS:A:+  NH:i:1

And others without the QNAME field:

Code:

       329     chr1    10005   0       50M     *       0       0       CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC      *       AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:50 YT:Z:UU NH:i:20 CC:Z:chr5       CP:i:10131      HI:i:0

The sequences in lines like this one are present in the FASTQ files, associated with read IDs.
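To check whether the records with an empty QNAME have something in common, one can tally their FLAG values. In the example record above, FLAG 329 is 1 + 8 + 64 + 256, i.e. paired, mate unmapped, first in pair, and secondary alignment (0x100), and `NH:i:20` says the read has 20 reported alignments. The sketch below assumes Bash and a POSIX awk (the function name `tally_empty_qname_flags` is hypothetical); it counts empty-QNAME records per FLAG and reports how many of them carry the 0x100 bit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read SAM text on stdin and summarize the FLAG
# values (column 2) of records whose QNAME (column 1) is empty.
tally_empty_qname_flags() {
  awk -F'\t' '!/^@/ && $1 == "" {
      total++
      flags[$2]++
      # int(FLAG / 256) % 2 isolates bit 0x100 (secondary alignment)
      # without relying on gawk-only bitwise functions.
      if (int($2 / 256) % 2 == 1) secondary++
    }
    END {
      # Per-FLAG counts (order of for-in is unspecified), then a summary line.
      for (f in flags) printf "flag %s\t%d\n", f, flags[f]
      printf "empty-QNAME records: %d, secondary (0x100): %d\n", total + 0, secondary + 0
    }'
}
```

For example: `samtools view SRR097789_sorted.bam | tally_empty_qname_flags`. If nearly all such records turn out to be secondary alignments, that would narrow the question down to how TopHat writes multi-mapping reads, though it would not by itself explain the empty names.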

Does anyone have an idea? I'm using tophat 2.0 with this command line:

Code:

tophat -p 7 --no-novel-juncs -G /rogue/bioinfotree/prj/ewing-rnaseq/local/share/data/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf --transcriptome-index=transcriptome -o SRR097789.th ../alignment/expanded_genome2 <(zcat /rogue/bioinfotree/task/RNAseq/dataset/0.1/GSE27003/reads//SRR097789_1.fastq.gz) <(zcat /rogue/bioinfotree/task/RNAseq/dataset/0.1/GSE27003/reads//SRR097789_2.fastq.gz)

I had to add the --no-novel-juncs option; otherwise TopHat just freezes and stops after a while (see this other thread: http://seqanswers.com/forums/showthread.php?t=23887).

htseq-count eats 42G of memory

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News