SEQanswers

Old 11-30-2015, 01:48 PM   #1
Olha
Junior Member
 
Location: Columbia, MO

Join Date: Sep 2015
Posts: 6
HTSeq_output_Error_occured_when_processing_SAM_input

Dear SEQanswers community,

I ran into a problem creating count tables for RNA-seq data with HTSeq.
Beforehand, I ran a TopHat alignment (accepted_hits.bam) on 28 FASTQ files (single-end reads) and merged the assemblies with Cuffmerge (merged.gtf).

This is the command I used:
htseq-count --format bam --stranded no accepted_hits.bam /merged.gtf > htseq_out.txt

At the end of the HTSeq output file I got this message:

33300000 SAM alignment records processed.
Error occured when processing SAM input (record #33301227 in file /media/olha/3CEBA24E475DF793/ALL_olha_2/A15/tophat_out/accepted_hits.bam):
reference_id -1 out of range 0<=tid<84
[Exception type: ValueError, raised in calignmentfile.pyx:642]

Any suggestions will be highly appreciated.

Olha
Old 11-30-2015, 11:22 PM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

A "reference_id -1" should only occur if a read is unmapped. My guess is that you somehow have an unmapped read without the unmapped bit in the flag being set. What's the output of:
Code:
samtools view -F 4 /media/olha/3CEBA24E475DF793/ALL_olha_2/A15/tophat_out/accepted_hits.bam | awk '{if($3=="*") print $0}'
That should produce nothing, though I suspect it'll print the problematic line in the BAM file.
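If that command does print records, one way to drop them is to filter the SAM text stream on the same condition. The sketch below is only an illustration, not a tested fix for this particular file: the function name `drop_inconsistent` is made up here, and it assumes tab-separated SAM text on standard input, as `samtools view -h` would produce.

```shell
# Hypothetical helper (the name is an assumption, not part of samtools or HTSeq).
# Reads SAM text on stdin and writes it to stdout, skipping records whose
# RNAME (column 3) is "*" although the unmapped bit (0x4) in FLAG (column 2)
# is not set. Header lines (starting with "@") pass through unchanged.
drop_inconsistent() {
  awk -F'\t' '
    /^@/ { print; next }                         # keep header lines
    $3 == "*" && int($2 / 4) % 2 == 0 { next }   # flagged as mapped but no reference: skip
    { print }                                    # keep everything else
  '
}
```

It could sit between two samtools calls, for example `samtools view -h accepted_hits.bam | drop_inconsistent | samtools view -bS -o accepted_hits_filter.bam -` (the output name is just an example; `-S` is needed by older samtools versions and ignored by newer ones).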
Old 12-01-2015, 07:56 AM   #3
Olha
Junior Member
 
Location: Columbia, MO

Join Date: Sep 2015
Posts: 6

Dear Devon,

Thank you for the reply.
I ran your command and got this output:

Code:
HWI-ST538:357:D2BKUACXX:1:1105:13318:13823	16	*	1204092	50	7M600N25M	*	0	0	CGCACGGACGCCCCCAAAACGCATATGACTCG	HE=AJJJJJJJJIGIJIGHHHHHHFFFFF@CB	AS:i:0	XM:i:2	XO:i:0	XG:i:0	MD:Z:17G13A0	NM:i:2	XS:A:+	NH:i:1
HWI-ST538:357:D2BKUACXX:1:1107:19710:10717	0	*	3860981	50	2M516N22M	*	0	0	CCGCCACTCCACGATGATGGGGTT	<?@DDFFDFFAFHHII?FHGGHIG	AS:i:0	XM:i:2	XO:i:0	XG:i:0	MD:Z:13G8C1	NM:i:2	XS:A:-	NH:i:1
HWI-ST538:357:D2BKUACXX:1:2314:13745:61117	0	*	4385042	50	12M1572N11M	*	0	0	TGGGGTGGGGTCCTGGCTTTGTG	CCCFFBDFHHHHHJJJJJJJJHI	AS:i:0	XM:i:2	XO:i:0	XG:i:0	MD:Z:18C2G1	NM:i:2	XS:A:-	NH:i:1
How can I remove these reads from my accepted_hits.bam file?

Thank you!
Old 12-01-2015, 10:11 AM   #4
Olha
Junior Member
 
Location: Columbia, MO

Join Date: Sep 2015
Posts: 6

I also tried the following command to remove the aforementioned reads, after putting their IDs into a text file (read_ids_to_remove.txt).

Code:
samtools view -h accepted_hits.bam [grep -vf read_ids_to_remove.txt] samtools view -bS -o accepted_hits_filter.bam
But I got this error message:

Code:
view: invalid option -- 'v'
[samopen] no @SQ lines in the header.
[main_samview] random alignment retrieval only works for indexed BAM files.
Does this mean that I need to sort and index the accepted_hits.bam file before removing the unmapped reads?
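For reference, these messages look consistent with the whole line being handed to a single `samtools view` call: square brackets are not pipes, so `-vf` is parsed as a samtools option and the leftover words as region names, which is what asks for an index. Sorting and indexing is therefore probably not the missing step. A piped version might look like the sketch below. The helper name `drop_reads_by_name` is hypothetical, and it assumes one read name per line in the list file and SAM text on standard input.

```shell
# Hypothetical helper (name chosen for illustration only).
# $1: text file with one read name (QNAME) per line; it must not be empty,
#     because the NR == FNR idiom below uses it to tell the two inputs apart.
# stdin: SAM text, header included. stdout: the same text without the listed reads.
drop_reads_by_name() {
  awk -F'\t' '
    NR == FNR { drop[$1]; next }   # first input: remember the names to drop
    /^@/ || !($1 in drop)          # second input: keep headers and unlisted reads
  ' "$1" -
}
```

It could be used as `samtools view -h accepted_hits.bam | drop_reads_by_name read_ids_to_remove.txt | samtools view -bS -o accepted_hits_filter.bam -`. Matching on the name column removes every alignment of a listed read, and avoids accidental matches elsewhere in the line that a plain `grep -v -f` could produce.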
Tags
counting_tables, htseq, sam
