SEQanswers

05-26-2011, 11:55 AM   #1
arrchi
Member
Location: ma
Join Date: Mar 2011
Posts: 46
unique hits

Hi there,

I am looking for a way to identify unique hits in my RNA-seq data. After searching online and in this community, I still haven't found a good way to extract unique hits from a SAM file. Any input is welcome.

I used TopHat to align the data to hg19, and samtools -bq -1 to generate reliable hits.

Here is part of the output:

[Tophat_out]$ grep -w SRR087416.97659 accepted_hits_realiable.sam
SRR087416.97659 0 chr1 11356 1 36M * 0 0 CAGCTAGGGACATTGCAGGGTCCTCTTGCTCAAGGT BBBCBCB9CBBBC@:+>3A97AACABA9@CCCCB9# NM:i:0 NH:i:4 CC:Z:chr12 CP:i:94218
SRR087416.97659 16 chr12 94218 1 36M * 0 0 ACCTTGAGCAAGAGGACCCTGCAATGTCCCTAGCTG #9BCCCC@9ABACAA79A3>+:@CBBBC9BCBCBBB NM:i:0 NH:i:4 CC:Z:chr15 CP:i:102519779
SRR087416.97659 16 chr15 102519779 1 36M * 0 0 ACCTTGAGCAAGAGGACCCTGCAATGTCCCTAGCTG #9BCCCC@9ABACAA79A3>+:@CBBBC9BCBCBBB NM:i:0 NH:i:4 CC:Z:chr2 CP:i:114359624

My questions: does this mean that read SRR087416.97659 maps to chr1, chr12, and chr15?

If I understand the concept of a "unique hit" correctly, this read cannot be counted as a unique hit. Am I right?
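The NH and CC/CP tags in these records already answer the question. Per the SAM specification, NH:i is the number of reported alignments that contain the query in the current record, and CC:Z/CP:i give the reference name and leftmost position of the next hit. Every record here carries NH:i:4, and the chain chr1 → chr12 → chr15 ends by pointing to chr2:114359624, a fourth hit that does not appear in the grep output. The minimal Python sketch below assumes tab-delimited SAM and an aligner that writes NH (as TopHat does in these records); it reads those tags and treats a read as unique only when NH is 1. The helper names are illustrative, not from any library.

```python
# Sketch: flag multi-mapped reads using the SAM NH tag (number of reported alignments).
# Assumes tab-delimited SAM lines; the forum paste above lost its tabs.

def parse_record(line):
    """Return (QNAME, RNAME, POS, optional-tag dict) for one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # optional TAG:TYPE:VALUE fields start at column 12
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return fields[0], fields[2], int(fields[3]), tags


def is_unique(tags):
    """True if NH says exactly one alignment was reported, None if NH is missing."""
    nh = tags.get("NH")
    return None if nh is None else nh == 1


# The three records from the post; SEQ and QUAL replaced by "*" for brevity.
posted = [
    "SRR087416.97659 0 chr1 11356 1 36M * 0 0 * * NM:i:0 NH:i:4 CC:Z:chr12 CP:i:94218",
    "SRR087416.97659 16 chr12 94218 1 36M * 0 0 * * NM:i:0 NH:i:4 CC:Z:chr15 CP:i:102519779",
    "SRR087416.97659 16 chr15 102519779 1 36M * 0 0 * * NM:i:0 NH:i:4 CC:Z:chr2 CP:i:114359624",
]
lines = ["\t".join(rec.split(" ")) for rec in posted]  # restore the tab delimiters

for line in lines:
    qname, rname, pos, tags = parse_record(line)
    print(f"{qname} {rname}:{pos} NH={tags['NH']} unique={is_unique(tags)} "
          f"next hit={tags.get('CC')}:{tags.get('CP')}")
```

Run as-is, each of the three records reports NH=4 and unique=False, and the last record's next hit is chr2:114359624.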

Thanks,

-A

05-29-2011, 01:57 PM   #2
rnaeye
Member
Location: Massachusetts
Join Date: May 2011
Posts: 65

Quote:
Originally Posted by arrchi
Does this mean that read SRR087416.97659 maps to chr1, chr12, and chr15? If I understand the concept of a "unique hit" correctly, this read cannot be counted as a unique hit. Am I right?
You are correct: the SRR087416.97659 sequence maps to more than one location in the genome, so you have no way of knowing which locus it actually came from. A unique read should hit the genome only once, at one specific location. To find such reads, sort your output by read ID and keep the IDs that occur only once. Does this help?
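The sort-and-count approach can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a finished tool: it assumes tab-delimited, single-end SAM (paired mates share a read ID and would be counted twice), skips header lines and records with the unmapped bit (0x4) set, and uses a hypothetical toy input.

```python
from collections import Counter


def ids_seen_once(sam_lines):
    """Return, sorted, the read IDs (QNAME) seen on exactly one mapped record.

    Assumes single-end, tab-delimited SAM: paired mates share a QNAME and
    would each be counted.
    """
    counts = Counter()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.split("\t")
        if int(fields[1]) & 0x4:  # FLAG bit 0x4 set: unmapped, so not a hit
            continue
        counts[fields[0]] += 1
    return sorted(qname for qname, n in counts.items() if n == 1)


# Hypothetical toy input: readA maps once, readB maps twice.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "readA\t0\tchr1\t100\t255\t36M\t*\t0\t0\t*\t*",
    "readB\t0\tchr1\t500\t1\t36M\t*\t0\t0\t*\t*",
    "readB\t16\tchr12\t900\t1\t36M\t*\t0\t0\t*\t*",
]
print(ids_seen_once(example))  # ['readA']
```

To run it on the file from this thread, pass it an open file handle for accepted_hits_realiable.sam; the function strips the trailing newline from each line. One caveat: counting lines only works if every alignment of a read is still in the file. If an earlier filter (for example a MAPQ cutoff) removed some of a read's alignments, a multi-mapped read can appear once and be miscounted as unique; the NH tag, which records how many alignments the aligner reported, does not have that problem.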

05-29-2011, 01:59 PM   #3
rnaeye
Member
Location: Massachusetts
Join Date: May 2011
Posts: 65

btw, which instrument did this output come from? Thanks.

05-31-2011, 12:52 PM   #4
arrchi
Member
Location: ma
Join Date: Mar 2011
Posts: 46

Thanks for your message.

The result was generated on a Mac Pro with 8 GB of memory and a 1 TB hard drive. Is that what you wanted to know?

05-31-2011, 12:56 PM   #5
rnaeye
Member
Location: Massachusetts
Join Date: May 2011
Posts: 65

My question was which DNA sequencing platform this output is from, such as Illumina, ABI SOLiD, etc. Thanks.

06-01-2011, 05:11 AM   #6
arrchi
Member
Location: ma
Join Date: Mar 2011
Posts: 46

Oh. Sorry.

This is Illumina RNA sequencing data.

06-01-2011, 07:17 AM   #7
arrchi
Member
Location: ma
Join Date: Mar 2011
Posts: 46

Does anybody know what the value 0 in "SRR087416.97659 0" means? I checked the samtools manual, but it only says that if 0x1 is unset, no assumptions can be made about .....

06-01-2011, 07:53 AM   #8
arrchi
Member
Location: ma
Join Date: Mar 2011
Posts: 46

I found an old post saying that
Quote:
Flag 0 means "the read is not paired and mapped, forward strand".
Hope it is true.
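The quoted reading matches the SAM specification, where FLAG is a bitwise combination of the values below. Flag 0 has no bits set: 0x1 is unset (not paired), 0x4 is unset (mapped), and 0x10 is unset (forward strand). The other two records in the first post carry flag 16, meaning only 0x10 is set, so those alignments are on the reverse strand. The small Python decoder below is a sketch for illustration; the bit descriptions are paraphrased from the specification, and the function names are hypothetical.

```python
# SAM FLAG bits and their meanings, paraphrased from the SAM specification.
FLAG_BITS = {
    0x1: "read paired (multiple segments)",
    0x2: "properly aligned pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand (SEQ reverse-complemented)",
    0x20: "mate on reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "fails platform/vendor quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def describe_flag(flag):
    """Return the descriptions of every bit set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def summarize(flag):
    """Answer the three questions from this thread: paired? mapped? which strand?"""
    return {
        "paired": bool(flag & 0x1),
        "mapped": not (flag & 0x4),
        "strand": "reverse" if flag & 0x10 else "forward",
    }


for flag in (0, 16):
    print(flag, describe_flag(flag), summarize(flag))
```

For flag 0 the list of set bits is empty, which is exactly why the record is read as unpaired, mapped, and on the forward strand.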

Last edited by arrchi; 06-01-2011 at 07:55 AM.

All times are GMT -8.