SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
"allele balance ratio" and "quality by depth" in VCF files efoss Bioinformatics 2 10-25-2011 12:13 PM
Relatively large proportion of "LOWDATA", "FAIL" of FPKM_status running cufflink ruben6um Bioinformatics 3 10-12-2011 01:39 AM
The position file formats ".clocs" and "_pos.txt"? Ist there any difference? elgor Illumina/Solexa 0 06-27-2011 08:55 AM
"Systems biology and administration" & "Genome generation: no engineering allowed" seb567 Bioinformatics 0 05-25-2010 01:19 PM
SEQanswers second "publication": "How to map billions of short reads onto genomes" ECO Literature Watch 0 06-30-2009 12:49 AM

Reply
 
Thread Tools
Old 06-11-2012, 02:55 AM   #21
EGrassi
Member
 
Location: Turin, Italy

Join Date: Oct 2010
Posts: 66
Default

Yep, thanks! Now that problem does not occur anymore.
It's claiming to have some troubles about the sort order of the bam (which cuffdiff used without problems) but as long as I've not read all the docs it's my time to work now
EGrassi is offline   Reply With Quote
Old 06-11-2012, 08:24 AM   #22
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

A very quick and dirty workaround that I used until they fixed the problem in Tophat and since I'm really not familiar with Python was to replace the missing quality scores with the read sequence itself. That way its guaranteed to be the same length as the read and HTSeq-count is happy.

Code:
awk '$11=="*"' accepted_hits.sam > noquals.sam;
awk '$11 !~ /^\*$/' accepted_hits.sam > tmp.sam;
awk 'FNR==NR{a[NR]=$10;next}{$11=a[FNR]}1' noquals.sam noquals.sam > tmp2.sam;
awk -v OFS="\t" '$1=$1' tmp2.sam > tmp3.sam
cat tmp.sam tmp3.sam > accepted_hits2.sam;
rm tmp.sam tmp2.sam tmp3.sam noquals.sam;
Then just run HTSeq-count on the accepted_hits2.sam

Not pretty (I'm still a novice at this stuff, so I settle for what works for me), but it gets the job done if you don't want to realign it and if you don't want to fiddle around with the HTSeq-count code. Of course if Simon Anders has fixed the issue, then thats obviously a better solution than this.
chadn737 is offline   Reply With Quote
Old 06-13-2012, 04:44 AM   #23
phred
Member
 
Location: Ireland

Join Date: May 2012
Posts: 11
Default

Thanks chad for the workaround, I will give it a try.

Having installed HTSeq 0.5.3p6, I now get the following message:

Code:
$ htseq-count

Traceback (most recent call last):
  File "/home/support/apps/cports/rhel-6.x86_64/gnu/HTSeq/0.5.3p6_Python_2.7.2/bin/htseq-count", line 3, in <module>
    import HTSeq.scripts.count
  File "/home/support/apps/cports/rhel-6.x86_64/gnu/HTSeq/0.5.3p6_Python_2.7.2/lib/python/HTSeq/__init__.py", line 8, in <module>
    from _HTSeq import *
  File "_HTSeq.pyx", line 11, in init HTSeq._HTSeq (src/_HTSeq.c:30228)
ImportError: No module named pysam
I am wondering has anyone else encountered this?
phred is offline   Reply With Quote
Old 06-13-2012, 04:59 AM   #24
EGrassi
Member
 
Location: Turin, Italy

Join Date: Oct 2010
Posts: 66
Default

Quote:
Originally Posted by phred View Post
Thanks chad for the workaround, I will give it a try.

Having installed HTSeq 0.5.3p6, I now get the following message:

Code:
$ htseq-count

Traceback (most recent call last):
  File "/home/support/apps/cports/rhel-6.x86_64/gnu/HTSeq/0.5.3p6_Python_2.7.2/bin/htseq-count", line 3, in <module>
    import HTSeq.scripts.count
  File "/home/support/apps/cports/rhel-6.x86_64/gnu/HTSeq/0.5.3p6_Python_2.7.2/lib/python/HTSeq/__init__.py", line 8, in <module>
    from _HTSeq import *
  File "_HTSeq.pyx", line 11, in init HTSeq._HTSeq (src/_HTSeq.c:30228)
ImportError: No module named pysam
I am wondering has anyone else encountered this?
Yep, but I solved it installing pysam from http://pysam.googlecode.com/files/pysam-0.6.tar.gz !
EGrassi is offline   Reply With Quote
Old 06-13-2012, 05:36 AM   #25
Dethecor
Member
 
Location: Germany

Join Date: May 2010
Posts: 24
Default pysam dependency

Hi all,

indeed certain functions in HTSeq depend on an installed pysam. We have changed this so that now this error is only raised if one actually tries to call one of these functions while not having pysam installed.
HTSeq version 0.5.3p7 includes this change (if you do not want to upgrade you can always just install pysam)

Cheers,
Paul
__________________

"You are only young once, but you can stay immature indefinitely."
Dethecor is offline   Reply With Quote
Old 06-15-2012, 10:50 PM   #26
vyellapa
Member
 
Location: phoenix

Join Date: Oct 2011
Posts: 59
Default

I'm curious what the best way to sort the sam file to give as input to htseq-count
I have sorted using samtools and UNIX sort as shown below
Code:
sort 1:
sort -T /scratch/vyellapantula -s -k 1,1 input.sam > sorted.sam

sort 2:
samtools sort input.bam sorted
samtools view -h -o sorted.sam sorted.bam
Ive used the same htseq-count command on these sams and ran a diff on these outputs.There are some differences and would like to know of the best way to sort the bam/sam file for htseq.

Code:
< ENSG00000259158	   14
---
> ENSG00000259158	   23
51905c51905
< ENSG00000259165	   15
---
> ENSG00000259165	   30
vyellapa is offline   Reply With Quote
Old 06-16-2012, 02:01 AM   #27
Dethecor
Member
 
Location: Germany

Join Date: May 2010
Posts: 24
Default Sort by position or name

Hi vyellapa,

if you run htseq-count on paired-end reads then you might want a .sam file sorted by name. Your Unix sort command does this (I hope it deals with the header properly), samtools sort sorts by position unless you specify the -n option.

That would be one likely source of the disparity between the counts you observed.

Cheers,
Paul
__________________

"You are only young once, but you can stay immature indefinitely."
Dethecor is offline   Reply With Quote
Old 06-16-2012, 10:01 PM   #28
vyellapa
Member
 
Location: phoenix

Join Date: Oct 2011
Posts: 59
Default

Thanks Paul, using -n produced no differences.
vyellapa is offline   Reply With Quote
Old 06-18-2012, 12:01 AM   #29
EGrassi
Member
 
Location: Turin, Italy

Join Date: Oct 2010
Posts: 66
Default

Woa, I used samtools -n and htseq-count gave me all counts of zero...that's strange, I will check with IGV or similar tools but cuffdiff gave me fpkg values with the same gtf and bam file...
EGrassi is offline   Reply With Quote
Old 06-18-2012, 04:29 AM   #30
TEFA
Junior Member
 
Location: Belgium

Join Date: Mar 2012
Posts: 5
Default

Quote:
Originally Posted by EGrassi View Post
Woa, I used samtools -n and htseq-count gave me all counts of zero...that's strange, I will check with IGV or similar tools but cuffdiff gave me fpkg values with the same gtf and bam file...
Hi EGrassi,

Did you solve this issue??????

Cheers,
TEFA
TEFA is offline   Reply With Quote
Old 06-18-2012, 04:41 AM   #31
EGrassi
Member
 
Location: Turin, Italy

Join Date: Oct 2010
Posts: 66
Default

Quote:
Originally Posted by TEFA View Post
Hi EGrassi,

Did you solve this issue??????
Actually no, right now I'm trying to avoid using '-' as an htseq-count argument without using it in a pipeline with samtools view, I'll let you know what happens...
EGrassi is offline   Reply With Quote
Old 06-18-2012, 07:41 AM   #32
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 992
Default

Quote:
Originally Posted by EGrassi View Post
Woa, I used samtools -n and htseq-count gave me all counts of zero...that's strange, I will check with IGV or similar tools but cuffdiff gave me fpkg values with the same gtf and bam file...
Was this by any chance a TopHat output? Somebody alerted me recently to a bug that causes mishandling of SAM lines that contain the "NH" flag, and that seems to appear in newer tophat output. I'll try to fix this ASAP.

Simon
Simon Anders is offline   Reply With Quote
Old 06-18-2012, 07:47 AM   #33
EGrassi
Member
 
Location: Turin, Italy

Join Date: Oct 2010
Posts: 66
Default

Quote:
Originally Posted by Simon Anders View Post
Was this by any chance a TopHat output? Somebody alerted me recently to a bug that causes mishandling of SAM lines that contain the "NH" flag, and that seems to appear in newer tophat output. I'll try to fix this ASAP.
Tophat 2.0 output, yes!

Thank you,
E.
EGrassi is offline   Reply With Quote
Old 06-18-2012, 08:15 AM   #34
EGrassi
Member
 
Location: Turin, Italy

Join Date: Oct 2010
Posts: 66
Default

Yep, I can confirm you that removing the NH flags from a subset of my sam files I get some counts different from 0. This could be a temporary workaroud.
EGrassi is offline   Reply With Quote
Old 06-22-2012, 04:41 AM   #35
EGrassi
Member
 
Location: Turin, Italy

Join Date: Oct 2010
Posts: 66
Default

Ok, on the whole data it falls back to all 0:
samtools view S1.sorted.bam.bam | sed 's/NM:i:\d//g' | htseq-count --stranded=no - /rogue/bioinfotree/prj/ewing-rnaseq/local/share/data/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf > counts_S1_htseqq

I'm really a little surprised by the amount of errors and other glitches in rna seq software (this is a generic rant, not focused on htseq!)...it's a really complicated and new area, that's true, but right now using any of the existing tool seems a bet and to tell the truth I never feel confident on the results, as long as with every version results changes, and when something runs without errors I never know if that means that it worked or that an exception just had not screamed enough :/
EGrassi is offline   Reply With Quote
Old 06-24-2012, 09:42 AM   #36
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 992
Default

The bug with the NH flags, which was introduced during some refactoring in the version of a week ago or so is now fixed in version 0.5.3p8. Sorry that the recent changes turned out to be so volatile.
Simon Anders is offline   Reply With Quote
Old 06-25-2012, 08:47 AM   #37
EGrassi
Member
 
Location: Turin, Italy

Join Date: Oct 2010
Posts: 66
Default

Thanks, 0.5.3p9 has produced counts different from zero. Let's see what deSeq thinks of this data!
EGrassi is offline   Reply With Quote
Old 10-11-2018, 08:59 AM   #38
sushant
Member
 
Location: pune

Join Date: Sep 2015
Posts: 14
Default

Quote:
Originally Posted by Simon Anders View Post
Sorry, it seems we made some mix-up between version 0.5.3p4 and 0.5.3p5. Essentially, p5 undid some fixes in p4, including the one for "*" qualities. Now, there is version 0.5.3p6, which should clean up this mess. Please let me know if you still have problems.
Hello, I have performed the mapping of miRNA reads to the mirbase precursor file by SHRiMP. But I'm facing a probem in HTSeq while taking the read counts from the output.sam file.

The error which I'm trying to solve is;

Error occured when processing SAM input (line 2305 of file HBI_10.sam): ("'seq' and 'qualstr' do not have the same length.", 'line 2305 of file HBI_10.sam') [Exception type: ValueError, raised in _HTSeq.pyx:808]

Has anyone of you come across this error? What might be the solution?
sushant is offline   Reply With Quote
Old 10-15-2018, 04:57 AM   #39
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541
Default

Sushant - what does line 2305 of your SAM file look like? Unless the quality entry is the special value '*' only then it is a likely an unrelated issue.
maubp is offline   Reply With Quote
Old 10-15-2018, 06:21 AM   #40
sushant
Member
 
Location: pune

Join Date: Sep 2015
Posts: 14
Default

Quote:
Originally Posted by maubp View Post
Sushant - what does line 2305 of your SAM file look like? Unless the quality entry is the special value '*' only then it is a likely an unrelated issue.
Actually, I did see that SAM file. The problem is that my sequence string & its respective sequence quality string is not of equal length.
However, I converted my fastq files to fasta and then performed the alignment, which solved this problem.

But the mapping percentage I'm getting with SHRiMP is too low(2.77%), for the mirna aligning to mature.
What parameters should I consider?
sushant is offline   Reply With Quote
Reply

Tags
bowtie2, htseq, quality, sam

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 04:06 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2018, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO