![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
"allele balance ratio" and "quality by depth" in VCF files | efoss | Bioinformatics | 2 | 10-25-2011 12:13 PM |
Relatively large proportion of "LOWDATA", "FAIL" of FPKM_status running cufflink | ruben6um | Bioinformatics | 3 | 10-12-2011 01:39 AM |
The position file formats ".clocs" and "_pos.txt"? Ist there any difference? | elgor | Illumina/Solexa | 0 | 06-27-2011 08:55 AM |
"Systems biology and administration" & "Genome generation: no engineering allowed" | seb567 | Bioinformatics | 0 | 05-25-2010 01:19 PM |
SEQanswers second "publication": "How to map billions of short reads onto genomes" | ECO | Literature Watch | 0 | 06-30-2009 12:49 AM |
![]() |
|
Thread Tools |
![]() |
#21 |
Member
Location: Turin, Italy Join Date: Oct 2010
Posts: 66
|
![]()
Yep, thanks! Now that problem does not occur anymore.
It's claiming to have some troubles about the sort order of the bam (which cuffdiff used without problems) but as long as I've not read all the docs it's my time to work now ![]() |
![]() |
![]() |
![]() |
#22 |
Senior Member
Location: US Join Date: Jan 2009
Posts: 392
|
![]()
A very quick and dirty workaround that I used until they fixed the problem in Tophat and since I'm really not familiar with Python was to replace the missing quality scores with the read sequence itself. That way its guaranteed to be the same length as the read and HTSeq-count is happy.
Code:
awk '$11=="*"' accepted_hits.sam > noquals.sam; awk '$11 !~ /^\*$/' accepted_hits.sam > tmp.sam; awk 'FNR==NR{a[NR]=$10;next}{$11=a[FNR]}1' noquals.sam noquals.sam > tmp2.sam; awk -v OFS="\t" '$1=$1' tmp2.sam > tmp3.sam cat tmp.sam tmp3.sam > accepted_hits2.sam; rm tmp.sam tmp2.sam tmp3.sam noquals.sam; Not pretty (I'm still a novice at this stuff, so I settle for what works for me), but it gets the job done if you don't want to realign it and if you don't want to fiddle around with the HTSeq-count code. Of course if Simon Anders has fixed the issue, then thats obviously a better solution than this. |
![]() |
![]() |
![]() |
#23 |
Member
Location: Ireland Join Date: May 2012
Posts: 11
|
![]()
Thanks chad for the workaround, I will give it a try.
Having installed HTSeq 0.5.3p6, I now get the following message: Code:
$ htseq-count Traceback (most recent call last): File "/home/support/apps/cports/rhel-6.x86_64/gnu/HTSeq/0.5.3p6_Python_2.7.2/bin/htseq-count", line 3, in <module> import HTSeq.scripts.count File "/home/support/apps/cports/rhel-6.x86_64/gnu/HTSeq/0.5.3p6_Python_2.7.2/lib/python/HTSeq/__init__.py", line 8, in <module> from _HTSeq import * File "_HTSeq.pyx", line 11, in init HTSeq._HTSeq (src/_HTSeq.c:30228) ImportError: No module named pysam |
![]() |
![]() |
![]() |
#24 | |
Member
Location: Turin, Italy Join Date: Oct 2010
Posts: 66
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#25 |
Member
Location: Germany Join Date: May 2010
Posts: 24
|
![]()
Hi all,
indeed certain functions in HTSeq depend on an installed pysam. We have changed this so that now this error is only raised if one actually tries to call one of these functions while not having pysam installed. HTSeq version 0.5.3p7 includes this change (if you do not want to upgrade you can always just install pysam) Cheers, Paul
__________________
"You are only young once, but you can stay immature indefinitely." |
![]() |
![]() |
![]() |
#26 |
Member
Location: phoenix Join Date: Oct 2011
Posts: 59
|
![]()
I'm curious what the best way to sort the sam file to give as input to htseq-count
I have sorted using samtools and UNIX sort as shown below Code:
sort 1: sort -T /scratch/vyellapantula -s -k 1,1 input.sam > sorted.sam sort 2: samtools sort input.bam sorted samtools view -h -o sorted.sam sorted.bam Code:
< ENSG00000259158 14 --- > ENSG00000259158 23 51905c51905 < ENSG00000259165 15 --- > ENSG00000259165 30 |
![]() |
![]() |
![]() |
#27 |
Member
Location: Germany Join Date: May 2010
Posts: 24
|
![]()
Hi vyellapa,
if you run htseq-count on paired-end reads then you might want a .sam file sorted by name. Your Unix sort command does this (I hope it deals with the header properly), samtools sort sorts by position unless you specify the -n option. That would be one likely source of the disparity between the counts you observed. Cheers, Paul
__________________
"You are only young once, but you can stay immature indefinitely." |
![]() |
![]() |
![]() |
#28 |
Member
Location: phoenix Join Date: Oct 2011
Posts: 59
|
![]()
Thanks Paul, using -n produced no differences.
|
![]() |
![]() |
![]() |
#29 |
Member
Location: Turin, Italy Join Date: Oct 2010
Posts: 66
|
![]()
Woa, I used samtools -n and htseq-count gave me all counts of zero...that's strange, I will check with IGV or similar tools but cuffdiff gave me fpkg values with the same gtf and bam file...
|
![]() |
![]() |
![]() |
#30 |
Junior Member
Location: Belgium Join Date: Mar 2012
Posts: 5
|
![]() |
![]() |
![]() |
![]() |
#31 |
Member
Location: Turin, Italy Join Date: Oct 2010
Posts: 66
|
![]() |
![]() |
![]() |
![]() |
#32 | |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
|
![]() Quote:
Simon |
|
![]() |
![]() |
![]() |
#33 |
Member
Location: Turin, Italy Join Date: Oct 2010
Posts: 66
|
![]() |
![]() |
![]() |
![]() |
#34 |
Member
Location: Turin, Italy Join Date: Oct 2010
Posts: 66
|
![]()
Yep, I can confirm you that removing the NH flags from a subset of my sam files I get some counts different from 0. This could be a temporary workaroud.
|
![]() |
![]() |
![]() |
#35 |
Member
Location: Turin, Italy Join Date: Oct 2010
Posts: 66
|
![]()
Ok, on the whole data it falls back to all 0:
samtools view S1.sorted.bam.bam | sed 's/NM:i:\d//g' | htseq-count --stranded=no - /rogue/bioinfotree/prj/ewing-rnaseq/local/share/data/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf > counts_S1_htseqq I'm really a little surprised by the amount of errors and other glitches in rna seq software (this is a generic rant, not focused on htseq!)...it's a really complicated and new area, that's true, but right now using any of the existing tool seems a bet and to tell the truth I never feel confident on the results, as long as with every version results changes, and when something runs without errors I never know if that means that it worked or that an exception just had not screamed enough :/ |
![]() |
![]() |
![]() |
#36 |
Senior Member
Location: Heidelberg, Germany Join Date: Feb 2010
Posts: 994
|
![]()
The bug with the NH flags, which was introduced during some refactoring in the version of a week ago or so is now fixed in version 0.5.3p8. Sorry that the recent changes turned out to be so volatile.
|
![]() |
![]() |
![]() |
#37 |
Member
Location: Turin, Italy Join Date: Oct 2010
Posts: 66
|
![]()
Thanks, 0.5.3p9 has produced counts different from zero. Let's see what deSeq thinks of this data!
![]() |
![]() |
![]() |
![]() |
#38 | |
Member
Location: pune Join Date: Sep 2015
Posts: 14
|
![]() Quote:
The error which I'm trying to solve is; Error occured when processing SAM input (line 2305 of file HBI_10.sam): ("'seq' and 'qualstr' do not have the same length.", 'line 2305 of file HBI_10.sam') [Exception type: ValueError, raised in _HTSeq.pyx:808] Has anyone of you come across this error? What might be the solution? |
|
![]() |
![]() |
![]() |
#39 |
Peter (Biopython etc)
Location: Dundee, Scotland, UK Join Date: Jul 2009
Posts: 1,543
|
![]()
Sushant - what does line 2305 of your SAM file look like? Unless the quality entry is the special value '*' only then it is a likely an unrelated issue.
|
![]() |
![]() |
![]() |
#40 | |
Member
Location: pune Join Date: Sep 2015
Posts: 14
|
![]() Quote:
However, I converted my fastq files to fasta and then performed the alignment, which solved this problem. But the mapping percentage I'm getting with SHRiMP is too low(2.77%), for the mirna aligning to mature. What parameters should I consider? |
|
![]() |
![]() |
![]() |
Tags |
bowtie2, htseq, quality, sam |
Thread Tools | |
|
|