SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
soap2 PE results do not match bowtie or novoalign axiom7 Bioinformatics 0 01-26-2011 12:43 PM
how to match snp position to GRCh37 release? cheng Bioinformatics 1 10-06-2010 03:12 PM
negative match position g781 Illumina/Solexa 0 07-03-2010 02:30 AM
chromosome match position in eland files litd General 0 03-02-2010 12:54 PM
the meaning of Match Position for reads? beliefbio Illumina/Solexa 2 11-27-2009 05:43 PM

Reply
 
Thread Tools
Old 11-25-2009, 12:29 PM   #1
dukevn
Member
 
Location: RI

Join Date: Apr 2009
Posts: 50
Default BOWTIE - match with empty chr and 0 position

Hi all,

I am still learning how BOWTIE does its jobs, and now I got something that I do not understand. Basically what I did was just BOWTIE with the default options (bowtie -p 8 -q --solexa1.3-quals --sam-nohead -S) with human genome reference. When checking the sam output, I saw something like below:
Code:
HWUSI-EAS751_0001:1:1:0:852#0/1	16	gi|224589800|ref|NC_000001.10|	155633307	255	35M	*	0	0	TGAGACCAGCCTGACCAACAAGGTGAAACCCCGTN	CCCCA;ACCCCCCCCCCCCCCCBCCBBBAAAABB#	XA:i:1	MD:Z:34C0	NM:i:1
HWUSI-EAS751_0001:1:1:0:823#0/1	16	gi|224589817|ref|NC_000005.9|	55233504	255	35M	*	0	0	TAATCTTATCAGCACAATATAATCTAACAATACCN	CCCCBCACCCCCCCCCCCCCCCC@CCCCCCCCBB#	XA:i:1	MD:Z:34T0	NM:i:1
HWUSI-EAS751_0001:1:1:0:385#0/1	0	gi|224589811|ref|NC_000002.11|	230683139	255	35M	*	0	0	NCAGTAACTGACACATCTCAATAACTGCCTGAAGC	#CCCCCCCCCCCCCCCCCCCCCCBCCCCCCCCCAC	XA:i:1	MD:Z:0C34	NM:i:1
HWUSI-EAS751_0001:1:1:0:1865#0/1	16	gi|224589805|ref|NC_000014.8|	47407772	255	35M	*	0	0	ATCTGACCCCAATTAGAACAGCTATTATGAAAAAN	BAB?B;AAC@CACBCCCBCCCBCCCCCCCCCCCC#	XA:i:1	MD:Z:34G0	NM:i:1
HWUSI-EAS751_0001:1:1:0:1878#0/1	16	gi|224589821|ref|NC_000009.11|	132898793	255	35M	*	0	0	GCAGGGGAACAGGTACCTCCGAGGGTGAGAGTCGN	@;@BBBBBBAABBBAAAAA?BBBBBBBBBBBB?B#	XA:i:1	MD:Z:34T0	NM:i:1
HWUSI-EAS751_0001:1:1:0:1348#0/1	0	gi|224589811|ref|NC_000002.11|	39229151	255	35M	*	0	0	NTCCTTTCACTTAAGAACATGTTATGGCCAGGCGC	#CCCCCCCCCCCCCCCCCCCCCCCCBABCCCCCBB	XA:i:1	MD:Z:0C34	NM:i:1
HWUSI-EAS751_0001:1:1:0:1507#0/1	16	gi|224589800|ref|NC_000001.10|	15747351	255	35M	*	0	0	CCCAAGCTGGTCTGAAACTCCTGGGCTCAAGTGAN	A=@BCCBCCCCCCCCCCCBAABCCCCCCCCCCCC#	XA:i:1	MD:Z:34T0	NM:i:1
HWUSI-EAS751_0001:1:1:0:69#0/1	16	gi|224589818|ref|NC_000006.11|	74229138	255	35M	*	0	0	GGTCTCAAATTTCCACAAGGAGATATCAATGGTGN	CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC#	XA:i:1	MD:Z:34A0	NM:i:1
HWUSI-EAS751_0001:1:1:0:200#0/1	16	gi|224589809|ref|NC_000018.9|	55285040	255	35M	*	0	0	GGGAGGCTGAGGCAGAAGAATCTCTTGAATCCGGN	CCCCCCCCCCCCCCCCCCCB?B;@CCCCCB@ACC#	XA:i:1	MD:Z:34G0	NM:i:1
HWUSI-EAS751_0001:1:1:0:418#0/1	4	*	0	0	*	*	0	0	NATCGGAAGAGCGGTTCAGCAGGAATGCCGAGATC	#BCCCCCBABCCACCCCAABAACCC@-@@9=8@>>	XM:i:0
HWUSI-EAS751_0001:1:1:0:978#0/1	4	*	0	0	*	*	0	0	NCTCGCCGACGCCTCTCATCTCACACCTGTCCACG	###################################	XM:i:0
My question concerns about those matches with * in their names and 0 in their positions in the reference. Can anybody explain to me what are those? Why these matches not in the reference and BOWTIE still reports them? How do I get the sam result without those matches?

Thanks,

D.
dukevn is offline   Reply With Quote
Old 11-30-2009, 07:12 AM   #2
ffinkernagel
Senior Member
 
Location: Marburg, Germany

Join Date: Oct 2009
Posts: 110
Default

These are unmatched reads - i.e. those that did not align to a reference sequence.

What are you planning to do with the reads that having these would be a problem?
ffinkernagel is offline   Reply With Quote
Old 11-30-2009, 07:22 AM   #3
dukevn
Member
 
Location: RI

Join Date: Apr 2009
Posts: 50
Default

Quote:
Originally Posted by ffinkernagel View Post
These are unmatched reads - i.e. those that did not align to a reference sequence.
Yeah I guessed that but I could not find any option with BOWTIE to prevent showing those. Anyone has any experience?

Quote:
Originally Posted by ffinkernagel View Post
What are you planning to do with the reads that having these would be a problem?
Those would surely be a problem. When I converted SAM to BED, the BED file then contains a lot of matches that are not in the reference and it showed error when I tried to upload to UCSC. I do want to remove those, but I do not what I should do. Do you have any suggestion ffinkernagel?

Thanks,

D.
dukevn is offline   Reply With Quote
Old 11-30-2009, 07:38 AM   #4
ffinkernagel
Senior Member
 
Location: Marburg, Germany

Join Date: Oct 2009
Posts: 110
Default

Well, serveral, depending on your ultimate goal.
How do you convert to BED, and do you really want to display *reads* in the UCSC browser.
Only I've found that to be problematic because of the (still large) size of the file you'd need to upload,
and I'd advise at least to split into 'by chromosome' anyhow, and well, that would eliminate those reads as a side effect.
ffinkernagel is offline   Reply With Quote
Old 11-30-2009, 07:50 AM   #5
dukevn
Member
 
Location: RI

Join Date: Apr 2009
Posts: 50
Default

Quote:
Originally Posted by ffinkernagel View Post
How do you convert to BED
I use a python code from another user in the forum (http://seqanswers.com/forums/showpos...12&postcount=2) to convert to BED.
Quote:
Originally Posted by ffinkernagel View Post
and do you really want to display *reads* in the UCSC browser
Why not?
Quote:
Originally Posted by ffinkernagel View Post
Only I've found that to be problematic because of the (still large) size of the file you'd need to upload,
and I'd advise at least to split into 'by chromosome' anyhow, and well, that would eliminate those reads as a side effect.
Yeah, that is the problem. What I want to do is to eliminate them in SAM file so that I can use my result with other softwares as well, not just UCSC Browser. How could you split the result into chromosomes?

Thanks,

D.
dukevn is offline   Reply With Quote
Old 11-30-2009, 11:20 PM   #6
ffinkernagel
Senior Member
 
Location: Marburg, Germany

Join Date: Oct 2009
Posts: 110
Default

Well, to just filter out the not aligned positions,
replace the line that reads
chrom = samFields[2]
with
chrom = samFields[2]
if chrom == '*': # or whatever is the offending chromosome 'name'
continue # one tab or four spaces, depending on what's already in the file.

Splitting into sepearate chromosomes is a tad bit more than I'm willing to modify on Arons script without his permission.

Would be easier to just get the vancouver short read package and use their convertToBed utility.

So long,
Florian
ffinkernagel is offline   Reply With Quote
Old 12-01-2009, 11:22 AM   #7
dukevn
Member
 
Location: RI

Join Date: Apr 2009
Posts: 50
Default

Quote:
Originally Posted by ffinkernagel View Post
Well, to just filter out the not aligned positions,
replace the line that reads
chrom = samFields[2]
with
chrom = samFields[2]
if chrom == '*': # or whatever is the offending chromosome 'name'
continue # one tab or four spaces, depending on what's already in the file.

Splitting into sepearate chromosomes is a tad bit more than I'm willing to modify on Arons script without his permission.

Would be easier to just get the vancouver short read package and use their convertToBed utility.

So long,
Florian
Thanks for your suggestion Florian. I thought that would be some option of BOWTIE that I was not aware of . Anyway, I still do not understand why BOWTIE reports unmatched reads? Is the software supposed to the matching job, ie to show up what is matched?

I checked output with MAQ and I do not see those unmatched reads.

Thanks,

D.
dukevn is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 06:19 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO