**Simon Anders** · 04-04-2012, 02:42 AM

Something went wrong here. htseq-count never uses much memory, because it processes the data read by read. Only the content of the relevant lines in the GFF file is kept in memory, and this can be nowhere near several GB. Please double-check that it really was htseq-count that filled up your machine and that your files are sane. Maybe something is wrong with the GFF file, so that HTSeq chokes when trying to read it.

**dglemay** · 04-04-2012, 04:16 AM

Hi Simon,
Thanks so much for responding. I was hoping you would see this thread.
I have rebooted my machine and run only my script. The process using all of the memory is python...and this is the only thing using python. Just to be sure, I've killed everything, rebooted, and started only the htseq-count call and as it is running, the memory used gradually climbs and climbs.
The script is outputting warnings that look like this:
Read HWI-ST623:0:2:1101:11808:178924:0:2:1 claims to have an aligned mate which could not be found. (Is the SAM file properly sorted?)

So...what may be happening is that htseq-count is storing more and more reads in memory as it tries to find the mate.

I am sorting the sam file but I'm a newbie at this so it is possible I'm doing something wrong. The options I'm using are

samtools sort -n bamfile.bam
samtools view -h sorted_bamfile.bam > sorted_samfile.sam

Here is the full script:
foreach bamfile (./TopHat_output/*accepted_hits.bam)
set fprefix = `echo $bamfile:t:r | sed 's/accepted_hits//'`
samtools sort -n $bamfile sorted_{$fprefix}
samtools view -h sorted_{$fprefix}.bam > ./HTSEQcount_output/sorted_{$fprefix}.sam
/home/dglemay/work/tools/HTSeq-0.5.3p3/scripts/htseq-count -m intersection-nonempty -s no -t CDS -i gene_id -o ./HTSEQcount_output/htseq_{$fprefix}.sam ./HTSEQcount_output/sorted_{$fprefix}.sam ./hg19/hg19_EnsGene.gff >&! ./HTSEQcount_output/log_{$fprefix}.txt
grep ENS ./HTSEQcount_output/log_{$fprefix}.txt > ./count_data/counts_{$fprefix}.txt
# cleanup
rm ./HTSEQcount_output/sorted_{$fprefix}.sam
samtools view -bSl ./HTSEQcount_output/htseq_{$fprefix}.sam > ./HTSEQcount_output/htseq_{$fprefix}.bam
end

Thank you for reading,
Danielle
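
The hypothesis in this post (reads piling up in memory while waiting for their mates) can be illustrated with a toy model. The sketch below is not HTSeq's code and makes no claim about how htseq-count pairs reads internally; the function name `peak_pending` and the read names are made up for illustration. It only shows that a "wait for the mate" buffer stays tiny when mates are adjacent in the file and grows when mates are far apart or never show up.

```python
def peak_pending(read_names):
    """Return the largest number of reads waiting for a mate at any point.

    `read_names` lists read names in file order. In this toy model a read
    waits in a buffer until a second record with the same name appears.
    """
    pending = set()
    peak = 0
    for name in read_names:
        if name in pending:
            pending.remove(name)  # mate found, release the waiting read
        else:
            pending.add(name)     # first mate seen, wait for the other one
        peak = max(peak, len(pending))
    return peak


# Hypothetical read orders, not taken from any real SAM file.
adjacent = ["r1", "r1", "r2", "r2", "r3", "r3"]   # mates next to each other
scattered = ["r1", "r2", "r3", "r1", "r2", "r3"]  # mates far apart
orphans = ["r1", "r2", "r3"]                      # mates never appear

print(peak_pending(adjacent))
print(peak_pending(scattered))
print(peak_pending(orphans))
```

In this model the buffer never holds more than one read for the adjacent order, while for the other two orders it grows with the number of distinct reads seen, which is the kind of steady climb described above.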

**emilyjia2000** · 04-04-2012, 06:17 AM

Hi Simon,

Can htseq-count count reads in UTRs? I tried it and got nothing. If it does work on UTRs, is there anything in particular I need to pay attention to?

Thanks

**dglemay** · 04-04-2012, 07:11 AM

Ah!

Should be
samtools sort
not
samtools sort -n

@emilyjia2000: you probably need to start a new thread

**labunit** · 04-04-2012, 08:25 AM

Does the memory usage climb the entire time or just at the beginning? What is the file size of your GFF?

If it is alignment related, you should see memory consumption rise once when the tool starts, as the GFF is read. Then, if what you say is true, it would start rising again as the SAM file is processed. Can you observe this behavior?
I am guessing you are using a *nix OS. Just open another terminal window and enter "top". It shows a list of running processes and the resources each one consumes.

Please correct me if I am wrong.
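
For logging memory over time rather than watching "top" by eye, a small script can sample the resident memory of the python process. This is a minimal sketch that assumes a Linux system exposing `/proc/<pid>/status` with a `VmRSS` line; the helper names `rss_kib`, `read_rss_kib`, and `watch` are made up for illustration, and the PID has to be looked up separately (for example in "top").

```python
import re
import time


def rss_kib(status_text):
    """Extract resident memory (VmRSS, in KiB) from /proc/<pid>/status text.

    Returns None if the text has no VmRSS line.
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident memory of process `pid` (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return rss_kib(handle.read())


def watch(pid, interval_s=5.0, samples=10):
    """Print a few memory samples so a steady climb becomes visible."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), read_rss_kib(pid), "KiB")
        time.sleep(interval_s)
```

Calling `watch(pid)` with the PID of the running htseq-count process prints one sample per interval. A jump at the start followed by a flat line would point to the GFF; a continuing climb would point to the SAM processing.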

**emilyjia2000** · 04-04-2012, 11:44 AM

Try Picard sort; it works for me.

**Simon Anders** · 04-04-2012, 10:24 PM

Originally posted by emilyjia2000:

Try Picard sort; it works for me.

Sorry, but as the author of HTSeq, I would like to say, just for the record: HTSeq also works for nearly everybody, and it is designed to work with little memory. I have no clue what is wrong here, but I am very sure that there must be something very strange about dglemay's input files.

**xuguorong** · 10-23-2012, 07:08 PM

Hi Simon,

I am puzzled by the paired-end sorting problem that comes up before using HTSeq.
I have read many posts about this issue, but no thread explains it completely.
First, I sort my paired-end BAM file with the command
samtools sort -n my.bam my.sort

Then I convert the BAM to SAM,
samtools view my.sort.bam > my.sort.sam

Finally, I run HTSeq to get the counts,
htseq-count --stranded=no --mode=intersection-nonempty -t exon -i gene_id my.sort.sam annotation.gtf > output.txt

But I still get a lot of error messages saying that HTSeq cannot find the other aligned mate ("Is the SAM file properly sorted?"). Someone said that the SAM file needs to be sorted again. If so, how should I sort it: by name again, or with some other sorting method?
Could you explain this in more detail?

Thanks a lot!
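
One way to narrow this down is to check whether mates really sit next to each other in the SAM file handed to htseq-count. The sketch below is a simplified check, not part of HTSeq or samtools: it only compares the read name (first SAM column) of consecutive records, and the function name `count_orphans` is made up for illustration. Reads with several reported alignments or with an unaligned mate will also show up as orphans, so a non-zero count is a hint rather than proof of a sorting problem.

```python
def count_orphans(sam_lines):
    """Count records that have no record with the same read name next to them.

    `sam_lines` is any iterable of SAM text lines; header lines starting
    with "@" and blank lines are skipped. Records are consumed in pairs.
    """
    orphans = 0
    previous = None  # name of a record still waiting for its neighbour
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        name = line.split("\t", 1)[0]
        if previous is None:
            previous = name
        elif name == previous:
            previous = None  # pair complete
        else:
            orphans += 1     # the waiting record had no adjacent mate
            previous = name
    if previous is not None:
        orphans += 1         # last record was left without a partner
    return orphans


# Tiny made-up examples (only the first two SAM columns are shown).
name_sorted = ["@HD\tVN:1.0", "r1\t99", "r1\t147", "r2\t99", "r2\t147"]
position_sorted = ["r1\t99", "r2\t99", "r1\t147", "r2\t147"]

print(count_orphans(name_sorted))
print(count_orphans(position_sorted))
```

Because the function reads line by line, it can be pointed at a large file, for example by opening my.sort.sam and passing the file handle as `sam_lines`.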

htseq-count performance