SEQanswers

Go Back   SEQanswers > Applications Forums > RNA Sequencing



Similar Threads
Thread Thread Starter Forum Replies Last Post
Problems with htseq-count reading bam file created by STAR priya RNA Sequencing 3 06-01-2013 11:32 PM
RNAseq read alignments using STAR Maura Casey Bioinformatics 5 05-16-2013 08:59 AM
Strange error when using htseq-count shhuang Bioinformatics 13 11-19-2012 12:40 AM
Error using HTSeq on Mapsplice alignments AnneBiton Bioinformatics 5 08-23-2012 11:34 PM
htseq-count error sissi Bioinformatics 0 03-20-2012 11:40 PM

Reply
 
Thread Tools
Old 08-04-2013, 05:04 AM   #1
bioBob
Member
 
Location: Virginia

Join Date: Mar 2011
Posts: 72
Default HTSeq error using STAR alignments

Hi,
I am trying to use STAR alignments as input to the HTSeq count tools suitable for either DESeq or DEXSeq.

The error I am getting is:

2181086 GFF lines processed.
Error occured when reading first line of sam file.
Error: ("SAM optional field with illegal type letter ':'", 'line 28 of file GRL1259_76_GAGATTCC_L005_R1_001_AT_QT.paire
R_aligned.sam.bam.sorted.bam.sam')
[Exception type: ValueError, raised in _HTSeq.pyx:1180]

This is the first line of the samfile, the 28th line looks similar with a different read.
HWI-1KL163:53:C2191ACXX:1:1204:19189:58727 99 1 155979169 255 51M = 155979289 171 CCTTCACTATGGCTGAGAGCAGGGCAGGATCCAGGAGAAAGTGGCCAAGGG CCCFFFFFHHHHHJJJJJJIJJJJJJJJIJJJJIJFHHJJJFHGIJJJJJJ NH:i:1 HI:i:1 AS:i:100 nM:i:0

I am using the following commands:

STAR --genomeDir $genome_DIR --runThreadN 12 --genomeLoad NoSharedMemory --outSAMmode Full --outFilterMismatchNmax 3 --outFilterMultimapNmax 5 --readFilesIn $align_FILES

samtools view -h file.bam -o file.bam.sam

python -m HTSeq.scripts.count -m intersection-strict -t exon -s no -i gene_id $BAM_NAME.sorted.bam.sam $GTF >$BAM_NAME.sorted.bam.sam.out

I am using samtools 0.1.19, HTSeq 0.5.4p3, STAR 2.3.0

Thanks,
Bob
bioBob is offline   Reply With Quote
Old 08-04-2013, 08:09 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Can you post the actual 28th line? I think that this will include the header, so it could be the line that you posted. One odd thing is the "nM" field, which should be "NM". You might also try just making a smaller version of the same file, excluding the line that causes issues. You can then see if this crops up elsewhere.
dpryan is offline   Reply With Quote
Old 08-04-2013, 08:24 AM   #3
bioBob
Member
 
Location: Virginia

Join Date: Mar 2011
Posts: 72
Default

Hi,
it looks like it doesn't like the second sequence in all cases, 3000 jobs all choked.
Here are the first two from one of the files.
HWI-1KL163:5422YEACXX:5:1101:1627:2223 99 3 196083660 255 51M = 196089177 5568 TGAGAAGTTCAAAACGTTCAT
TTGGGTATCCTTTAGACTGCACGTGCTTCA @CCFDDFDHHHHHJJJIJJJJJJJJJGHIJJJJJIHIJJJJJJJJJJJJJG NH:i:1 HI:i:1 AS:i:99 nM:i:0 jM:B:c,-1 jI:B:i,-1
HWI-1KL163:5422YEACXX:5:1101:1627:2223 147 3 196089177 255 51M = 196083660 -5568 TCCCCTCCACTACTCCATCTG
CTTTTTCAGGTGGCATCTCCAGTATCTCTG GJJHF?HGFHHDIGJJJIJJJJJJJJJJJJJIJJJJIJHHHHHFFFFFCCC NH:i:1 HI:i:1 AS:i:99 nM:i:0 jM:B:c,-1 jI:B:i,-1
bioBob is offline   Reply With Quote
Old 08-04-2013, 08:36 AM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

I wonder if it's choking on the jM and/or jI tags. Try making a miniature SAM file with just the header and a couple read-pairs and then try removing those two fields (optionally changing "nM" to "NM"). I recall in the STAR manual, there's mention made that some of the additional fields can break other programs.
dpryan is offline   Reply With Quote
Old 08-04-2013, 08:39 AM   #5
bioBob
Member
 
Location: Virginia

Join Date: Mar 2011
Posts: 72
Default

Good ideas. I will give that a try.
bioBob is offline   Reply With Quote
Old 08-04-2013, 09:02 AM   #6
bioBob
Member
 
Location: Virginia

Join Date: Mar 2011
Posts: 72
Default

Hi,
thanks, with that as a hint, I removed the --outSAMmode Full option and went with the defaults, all seems to work now.
Bob
bioBob is offline   Reply With Quote
Old 08-04-2013, 09:36 AM   #7
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Quote:
Originally Posted by bioBob View Post
Hi,
thanks, with that as a hint, I removed the --outSAMmode Full option and went with the defaults, all seems to work now.
Bob
That must be the option I'm remembering, glad that everything's working.

Devon
dpryan is offline   Reply With Quote
Old 08-05-2013, 07:38 AM   #8
alexdobin
Senior Member
 
Location: NY

Join Date: Feb 2009
Posts: 161
Default

Quote:
Originally Posted by dpryan View Post
That must be the option I'm remembering, glad that everything's working.

Devon
Hi @bioBob and @dpryan

you are right - HTseq cannot deal with jM:B:c,-1 and jI:B:i,-1 SAM attributes which are output if you use (non-default) --outSAMmode Full. These new type of "array" attributes are relatively new to SAM, they appeared in 0.1.17, I believe. HTseq (as well as Cufflinks and probably others) seem to have a problem with these attributes. These attributes are convenient but not really required, so you can either run STAR without them, or cut the corresponding columns before feeding them to HTseq or Cufflinks. I probably need to report this problem to HTseq authors.

nM attribute is intentionally named differently than the standard NM attributes because nM contains the number of mismatches in both mates of a paired-end read, and does not include indels. It should not cause any problems with the downstream processing.

Cheers
Alex
alexdobin is offline   Reply With Quote
Old 10-13-2016, 08:37 AM   #9
ellybelly
Junior Member
 
Location: Europe

Join Date: Oct 2016
Posts: 2
Default HTSeq works fine with BAM

I had the same issue when trying to count sam files in HTSeq. However, if you got pysam installed you can run it with the -f bam flag and the new fields are no longer an issue.
ellybelly is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 05:24 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO