SEQanswers

Old 08-10-2011, 03:16 AM   #441
vebaev
Senior Res.
 
Location: Plovdiv, Bulgaria

Join Date: Oct 2008
Posts: 110
Default

Hi,
I have followed the thread here but am still confused about which options to use.

I have a FASTA file of Illumina small RNAs that are clipped, filtered and cleaned, and also collapsed into unique seqs ready for mapping.

I want to map them onto the human genome but do not know what is optimal in my situation - I want to know how many times a read maps onto the genome.
In this case I used -a -v 0 and -a -v 1. My concern with -v 1 is that if I allow 1 mismatch, a read may also map to a place that is not real. Conversely, my concern with -v 0 is that only 30% of the unique seqs align.
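To make "how many times a read maps" concrete, here is a minimal Python sketch that counts alignment records per read. It assumes bowtie's default tab-separated output, in which the first column is the read name and each reported alignment is one line. The function name and the example records are hypothetical.

```python
from collections import Counter

def hits_per_read(lines):
    """Count alignment records per read name (first tab-separated column)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        counts[line.split("\t", 1)[0]] += 1
    return counts

# Hypothetical records: read name, strand, reference, offset, sequence,
# qualities, other-instances count, mismatch descriptors.
example = [
    "seq1\t+\tchr1\t100\tACGT\tIIII\t1\t",
    "seq1\t-\tchr5\t2500\tACGT\tIIII\t1\t2:C>A",
    "seq2\t+\tchr2\t42\tTTGA\tIIII\t0\t",
]
counts = hits_per_read(example)
print(counts["seq1"], counts["seq2"])
```

With -a every valid alignment is reported, so these counts are the number of mapping locations under the chosen mismatch setting.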

Last edited by vebaev; 08-10-2011 at 03:59 PM.
Old 08-10-2011, 04:43 PM   #442
cswarth
Member
 
Location: seattle

Join Date: Mar 2010
Posts: 14
Default

I am new to this, but it seems to me that if you allow mismatches, you absolutely can get alignments that aren't real. You can also get alignments that aren't real if you don't allow mismatches!

There are several sources of false-positive and false-negative alignments. The reference sequence you are aligning to is the consensus from probably many replicates of a particular lineage of organism. Your experimental sequences may come from a slightly different lineage of organism with a slightly different genome. If you do not allow mismatches, you will miss valid alignments that differ only by an expected polymorphic site.

There are also several sources of error in the sequencing itself. If you're using an Illumina machine, there are at least four sources of error that may mis-call a base in the sequence. If you don't allow mismatches, those reads that have an error in sequencing might not align to your genome at all.

On the other hand, if you allow mismatches, your reads may align to several places on the genome, and how do you know which one is valid? There is really no good answer. You could do some further processing and only consider reads that land inside exons of known genes. Or maybe you want to allow mismatches but only use those reads that match a single place on the genome.

In our experiment we are starting with the most conservative assumptions and slowly loosening the criteria as we gain more confidence in our methodology. So we only consider reads that match perfectly against mm9 genome and which fall inside of known exons with a coverage of at least 10 reads. We'll start to loosen the criteria and see how that affects our results.
Old 08-10-2011, 05:01 PM   #443
vebaev
Senior Res.
 
Location: Plovdiv, Bulgaria

Join Date: Oct 2008
Posts: 110
Default

Hi cswarth,
You are quite right!
My main concern is, for example, this case:
I want to annotate where in the genome 2 reads map. If I do not allow mismatches, the first read will have 1 hit in an intron and the second will not align to the genome at all. With 1 mismatch allowed, the first read will map to the intron perfectly and to an intergenic region with 1 mismatch; on the other hand, the second read can now map to the genome in one place because a mismatch is allowed.
In the second scenario we are happy because the second read can align, but then how do we annotate the first read, whose number of hits has increased?

If you followed me, my point is that if I want to map more of the reads that cannot map with zero mismatches, I will lose the "sensitivity" of the reads that already mapped.

I hope that makes sense.
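One way to look at the two-read scenario is to keep, for each read, only the hits with the fewest mismatches. The Python sketch below does this on hypothetical (read, location, mismatches) tuples. It is only an illustration of the idea; bowtie's own --best --strata options, which appear later in this thread, are the closest built-in equivalent as far as I understand them.

```python
def best_stratum(hits):
    """Keep, per read, only the locations with the fewest mismatches."""
    best = {}
    for read, location, mismatches in hits:
        kept = best.setdefault(read, (mismatches, []))
        if mismatches < kept[0]:
            # A strictly better hit replaces everything seen so far.
            best[read] = (mismatches, [location])
        elif mismatches == kept[0]:
            kept[1].append(location)
        # Hits with more mismatches than the current best are dropped.
    return {read: locs for read, (_, locs) in best.items()}

# Hypothetical hits with up to 1 mismatch allowed.
hits = [
    ("read1", "intron", 0),
    ("read1", "intergenic", 1),
    ("read2", "locusB", 1),
]
print(best_stratum(hits))
```

Under this rule the first read keeps only its perfect intron hit, while the second read still gains its single 1-mismatch hit.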

Last edited by vebaev; 08-10-2011 at 05:06 PM.
Old 08-11-2011, 10:06 AM   #444
vebaev
Senior Res.
 
Location: Plovdiv, Bulgaria

Join Date: Oct 2008
Posts: 110
Default

Hi again,
as I said before, I'm trying to map my cleaned reads to hg19.

If I use -a -v 0 my output is about 2 GB, and I see that many seqs with low read counts like 1 or 2 align tens of thousands of times onto the human genome?! It is messy...

I can use the option -k 100 -v 0, but if I want to know how many times a seq maps to the genome, how can I be sure, since I have artificially imposed a threshold?
As I also want to annotate repeat-associated and other RNAs, how can I do that and escape the mess above?

Or is it better to discard these with -m 100?
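The difference between the two thresholds can be written down as a toy model in Python. It assumes the manual's description: -k reports at most k alignments per read, and -m suppresses all alignments for a read that has more than m reportable alignments. The function name and the hit counts are made up for illustration.

```python
def reported_count(true_hits, k=None, m=None):
    """Toy model of how many alignments get reported for one read.

    true_hits: number of valid alignments the read really has.
    k: cap on reported alignments (None models -a, i.e. no cap).
    m: if the read has more than m alignments, report none.
    """
    if m is not None and true_hits > m:
        return 0
    if k is not None:
        return min(true_hits, k)
    return true_hits

print(reported_count(25000, k=100))  # capped by k
print(reported_count(100, k=100))    # same reported count as the line above
print(reported_count(25000, m=100))  # suppressed entirely
print(reported_count(40, m=100))     # under the limit, all reported
```

In this model a read that shows exactly k hits under -k may have k or more true hits, so the count is only a lower bound, whereas -m removes such reads from the output altogether.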

Best

Last edited by vebaev; 08-11-2011 at 10:45 AM.
Old 09-15-2011, 03:19 AM   #445
[mic]
Junior Member
 
Location: austria

Join Date: Sep 2011
Posts: 3
Default Additional Index information

Hi,

I am trying to analyse Bowtie with a view to using GPGPUs through CUDA. Apart from the limited hardware resources, I have one big problem. It seems that Bowtie relies on structs using C++ datatypes (please correct me if I'm wrong), but I need C-compatible datatypes to get them into device memory (the global memory of the graphics card) and to work with them there.
On my walkthrough I noticed that the first bytes are used to store some extra information for the ebwt_params struct, but:

How do I get the BWT?
How is it stored? (I think either uint32 or uint64)
How do I "read" the nc values (0,1,2,3) from that?

Is there any additional information available on how the files are built? (Any files, slides, ... are welcome.)

The plan:
read the index file with my own code, store it in C-compatible datatypes, copy them to the device and try to perform an exact alignment on the GPU.
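For the question about reading values 0 to 3 out of packed words, the general technique is 2-bit unpacking with shifts and masks. The Python sketch below assumes four codes per byte with the lowest bits first. That layout is an assumption for illustration only and has not been checked against the Bowtie index format; the function names are hypothetical. The same shifts and masks carry over directly to C or CUDA.

```python
def pack_2bit(codes):
    """Pack a list of 2-bit codes (0..3) into bytes, lowest bits first."""
    out = bytearray((len(codes) + 3) // 4)
    for i, code in enumerate(codes):
        out[i >> 2] |= (code & 3) << ((i & 3) * 2)
    return bytes(out)

def unpack_2bit(packed, n_codes):
    """Decode n_codes 2-bit codes from a bytes object (assumed layout)."""
    codes = []
    for i in range(n_codes):
        byte = packed[i >> 2]      # four codes per byte
        shift = (i & 3) * 2        # bit offset of this code inside the byte
        codes.append((byte >> shift) & 3)
    return codes

packed = pack_2bit([0, 1, 2, 3, 3, 0])
print(unpack_2bit(packed, 6))
```

Only plain integers and byte buffers are involved, so the decoded data maps onto C-compatible arrays without any C++ types.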

Thank you
mic
Old 09-15-2011, 06:59 AM   #446
Xi Wang
Senior Member
 
Location: MDC, Berlin, Germany

Join Date: Oct 2009
Posts: 317
Default

If you are good at programming, you can check the source code of bowtie-build.
__________________
Xi Wang
Old 09-15-2011, 09:18 AM   #447
[mic]
Junior Member
 
Location: austria

Join Date: Sep 2011
Posts: 3
Default

I have already tried, but the code is deeply nested, which makes it difficult for me to get the overall picture. I would be grateful if someone could help me.

Last edited by [mic]; 09-19-2011 at 06:55 AM.
Old 09-21-2011, 09:45 PM   #448
phatjoe
Member
 
Location: Asia

Join Date: Aug 2011
Posts: 13
Default BOWTIE, shortreads with different length

Hi,

Just tried out BOWTIE today. May I know if BOWTIE supports mapping short reads of different lengths (e.g. for r1/#1 I have 96 bp, whereas for r1/#2 I have 86 bp)? The short reads were trimmed with different software prior to the alignment.

My bowtie version is 0.12.7

Thanks in advance!
Old 09-30-2011, 12:09 AM   #449
belmax
Junior Member
 
Location: Europe

Join Date: Sep 2011
Posts: 1
Default bowtie 0.12.7 & SOLiD PE reads

Hi all,
I have a problem with bowtie 0.12.7 and SOLiD mate-pair reads.
bowtie (-C -f -I 1000 -X 4000 --ff <ebwt> -1 F3.csfasta -2 R3.csfasta ) maps 0.0%, while SOLiD's Bioscope maps about 70%.
The insert size is about 2500.
The colorspace index is OK. Synthetic csfasta reads are mapped well by bowtie. F3 and R3 each map well when aligned separately.
What could be wrong? Is the problem with bowtie or with the mate-pair reads?

cheers

Last edited by belmax; 09-30-2011 at 01:50 AM.
Old 10-03-2011, 01:53 AM   #450
nemesis
Junior Member
 
Location: Paris, France

Join Date: Jun 2010
Posts: 3
Default bowtie -e (--maqerr) parameter

Hi all,

According to the bowtie manual and some posts I've read here, the -e/--maqerr <int> option indicates the maximum sum of quality scores allowed at the mismatched bases throughout the entire alignment and as such can control the total number of mismatches over the entire read length.

I understand that the higher this option is set, the more alignments I will obtain. But I still have trouble understanding the logic behind this parameter. Let's say I set -e 70 with --nomaqround.
A read with an overall high quality (e.g. each of its bases has a Phred score of 38) and 3 mismatched bases relative to the reference sequence will be excluded from the alignment, since (38 * 3) > 70. Another read with an overall poor quality (for instance, a Phred score of 10 for each of its bases) and 5 mismatches will be kept, since (10 * 5) < 70. But if we suppose that low-quality bases have a higher chance of being sequencing errors than true variations, I'd rather exclude the latter read and keep the former one... (No?)
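The arithmetic above can be checked with a short Python sketch. It assumes, as with --nomaqround, that the raw Phred scores at the mismatched positions are summed without rounding, and that an alignment passes when the sum does not exceed the -e value. The function name is hypothetical and this is not bowtie's code.

```python
def passes_maqerr(mismatch_quals, maqerr=70):
    """True if the summed Phred qualities at mismatched bases is <= maqerr."""
    return sum(mismatch_quals) <= maqerr

high_quality_read = [38] * 3   # 3 mismatches at Phred 38: sum is 114
low_quality_read = [10] * 5    # 5 mismatches at Phred 10: sum is 50

print(passes_maqerr(high_quality_read))  # False, 114 > 70
print(passes_maqerr(low_quality_read))   # True, 50 <= 70
```

One way to read the rule: a mismatch at a high-quality base is unlikely to be a sequencing error, so it counts as stronger evidence against that placement and costs more, while mismatches at low-quality bases are cheap because they are plausibly errors. The parameter therefore scores how believable the alignment is, not whether the mismatches are true variants.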

If anyone could help me understand this parameter and its usage I would be very grateful.

Cheers
Old 10-08-2011, 02:15 AM   #451
oxydeepu
Member
 
Location: bangalore,india

Join Date: Jul 2011
Posts: 41
Default

Hi all,

I am running bowtie and have a question: can we specify that the mismatches be at a particular end, say the 3' end?
waiting for a reply
Thanking you
Deepak
Old 10-08-2011, 02:17 AM   #452
oxydeepu
Member
 
Location: bangalore,india

Join Date: Jul 2011
Posts: 41
Default

Hi all,

I am running bowtie and have a question: is there any way to specify that the mismatches be at a particular end, say the 3' end?
waiting for a reply
Thanking you

Deepak

Last edited by oxydeepu; 10-09-2011 at 02:58 AM. Reason: did not get any reply
Old 10-12-2011, 06:20 AM   #453
rahilsethi
Member
 
Location: Pittsburgh, PA

Join Date: May 2010
Posts: 22
Unhappy Extra parameter(s) specified error

I am running bowtie version 0.12.7 to map SOLiD (colorspace, 50 bp read length) data against the human genome (hg19) on a Linux platform (CentOS). When I run with the following parameters:

Quote:
$bowtie -C -f -Q sample_QV.qual -a --best --strata -n -l 20 --maxbts --chunkmbs 1000 -t --al 50_mapped_reads.csfasta --sam -p 5 /bowtie-ref-build/hg19/hg19 sample.csfasta 50_mapping.sam
it gives me the following error

Extra parameter(s) specified: "sample.csfasta", "50_mapping.sam"

and when I ran with the default seed length (-l), i.e. without specifying -l 20:


Quote:
$bowtie -C -f -Q sample_QV.qual -a --best --strata -n --maxbts --chunkmbs 1000 -t --al 50_mapped_reads.csfasta --sam -p 5 /bowtie-ref-build/hg19/hg19 sample.csfasta 50_mapping.sam
it runs successfully, generating the number of reads mapped and unmapped
details on the screen.

How, then, can I run bowtie with a different seed length since, as seen above, it does not run whenever I specify a seed length within the permissible range (i.e. 20 > 5 for a 50 bp read length)?
Old 10-12-2011, 06:54 AM   #454
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

The error message makes me think that you are missing some parameter values. In particular '-n' should have a number after it, e.g. '-n 2', as should '--maxbts'.

What I think is happening in the successful line where you have '-n --maxbts' is that the '-n' parameter is reading in '--maxbts' as the number to use. Thus there is no problem.

Whereas in the bad line you have '-n -l 20 --maxbts --chunkmbs', with the result that '-n' is swallowing (using) '-l', '20' is being skipped, and '--maxbts' is swallowing '--chunkmbs', which then throws off the rest of the command line.

Anyway that is my guess. Please try your command either with numbers after '-n' and '--maxbts' or just get rid of those two parameters.
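The guess can be tested with a toy getopt-style parser in Python. This is not bowtie's actual parser. The set of options assumed to require a value (-Q, -n, -l, --maxbts, --chunkmbs, --al, -p) and the rule that any positional beyond index, reads and output counts as "extra" are assumptions made for this sketch.

```python
# Toy model of getopt-style parsing; NOT bowtie's real code.
# Assumption: each of these options consumes the next token as its value.
TAKES_VALUE = {"-Q", "-n", "-l", "--maxbts", "--chunkmbs", "--al", "-p"}

def split_args(tokens):
    """Return (options, positionals) for a list of command-line tokens."""
    options, positionals = {}, []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in TAKES_VALUE:
            # The next token is swallowed even if it looks like an option.
            options[tok] = tokens[i + 1] if i + 1 < len(tokens) else None
            i += 2
        elif tok.startswith("-"):
            options[tok] = True
            i += 1
        else:
            positionals.append(tok)
            i += 1
    return options, positionals

def extra_parameters(command):
    """Positionals beyond the three expected: index, reads, output."""
    _, positionals = split_args(command.split())
    return positionals[3:]

failing = ("-C -f -Q sample_QV.qual -a --best --strata -n -l 20 --maxbts "
           "--chunkmbs 1000 -t --al 50_mapped_reads.csfasta --sam -p 5 "
           "/bowtie-ref-build/hg19/hg19 sample.csfasta 50_mapping.sam")
working = failing.replace("-n -l 20 ", "-n ")

print(extra_parameters(failing))
print(extra_parameters(working))
```

In this model the failing command leaves '20' and '1000' as stray positionals, which pushes "sample.csfasta" and "50_mapping.sam" past the three expected slots, matching the reported error. The command without '-l 20' leaves exactly three positionals.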
Old 10-12-2011, 10:15 AM   #455
rahilsethi
Member
 
Location: Pittsburgh, PA

Join Date: May 2010
Posts: 22
Default Re: Extra parameter(s) specified error

The reason I did not give any value to -n and --maxbts is that I am trying to use their default values. If I didn't mention -n, how would bowtie know whether I want to map with the -n or the -v option? I will give it a try with numbers for all of them, though I don't think omitting the values for -n and --maxbts should cause any problem.
Old 12-14-2011, 04:43 PM   #456
mediator
Member
 
Location: New England

Join Date: Nov 2010
Posts: 27
Default

Hi All,
I am running bowtie on a paired-end alignment (Illumina HiSeq). Here is the command and the output I got:
Quote:
bowtie -p 4 -v 2 -k 11 -m 10 -t --best /bowtie/indexes/hg19 -1 /data/rna_seq/0916_1.fq -2 /data/rna_seq/0916_2.fq /data/rna_seq/0916.SAM -S

Time loading forward index: 00:00:08
Time loading mirror index: 00:00:08
Time loading reference: 00:00:03
End-to-end 2/3-mismatch full-index search: 04:48:55
# reads processed: 114497412
# reads with at least one reported alignment: 64037326 (55.93%)
# reads that failed to align: 50290127 (43.92%)
# reads with alignments suppressed due to -m: 169959 (0.15%)
Reported 94715801 paired-end alignments to 1 output stream(s)
Time searching: 04:49:14
Overall time: 04:49:14
I'm not sure why so many reads fail to align.
Old 12-14-2011, 06:51 PM   #457
biznatch
Senior Member
 
Location: Canada

Join Date: Nov 2010
Posts: 124
Default

mediator, are you using Bowtie to align RNA-Seq data? You should use Tophat for RNA-Seq data, as Bowtie can't deal with splice sites, which would be why you're getting a low alignment percentage.
Old 12-14-2011, 08:55 PM   #458
mediator
Member
 
Location: New England

Join Date: Nov 2010
Posts: 27
Default

Yes, my data was RNA-Seq. Thanks for the advice! Do you think -X will help?
Old 01-27-2012, 12:27 PM   #459
Nick
Member
 
Location: Philadelphia, PA

Join Date: Jun 2009
Posts: 16
Default

Quote:
Originally Posted by Lien
Apparently, it is normal that some reads are skipped because they can't align. Just hope the percentage that is skipped isn't too high!
I know this thread is quite old, but just wanted to point out this is not normal.
http://sourceforge.net/tracker/index...7&atid=1101606
The search tree is exceeding the default available memory.
Old 02-29-2012, 08:55 PM   #460
rfrancis
Junior Member
 
Location: Perth, Australia

Join Date: Jul 2011
Posts: 7
Default Bowtie truncates ID line if it has spaces

Dear all,
Has anyone seen this before? I am using bowtie v0.12.7 to align reads from the short read archive which have IDs as follows:

SRR064286.51418 HWI-EAS418:1:5:1357:1070 length=50

In the resultant SAM file, where bowtie finds a match, for some reason the ID is truncated at the first space:

SRR064286.51418

However when no match is found the ID is reported in full.

This seems odd, so I would appreciate someone trying to replicate this for me. Below are a couple of reads and a very short sequence to use as a reference. The first read should match but the other should not. Can someone try to align these using bowtie and let me know what you get?

Many thanks in advance.

Reads: Save as test.fq
@SRR064286.10 HWI-EAS418:1:4:1:147 length=50
TGGCTTCTTCTGTCTTCATAAGTTTTTCCAGGCGGTCTTCCAAGTCCAAA
+SRR064286.10 HWI-EAS418:1:4:1:147 length=50
BCBCCCCCCCCA8::>:?:>8!/@:1&7>6@BCBA@CACCA6>!<BB<BA
@SRR064286.11 HWI-EAS418:1:4:1:119 length=50
GGTTGTAGGACAGCATTTCAAGAACTAAACAGAGATGGTTTCGGAACATA
+SRR064286.11 HWI-EAS418:1:4:1:119 length=50
BBABA@BAABB:3707::9</!.B>:76:8;B9BAAAB>BBC<!<BCBB?

Ref: Save as ref.fa and run "bowtie-build ref.fa ref" to make a reference
>testref
ATTTCGATGCGAGCTTATTCGAGGCGTATCGTAGCGAGTGCTAGGGCTAT
TGGCTTCTTCTGTCTTCATAAGTTTTTCCAGGCGGTCTTCCAAGTCCAAA
GCGGATTGCTGATGCGAGCGTAGTCGTAGTGTGCGTATTGCGATTCGATG

Run bowtie with "bowtie --sam ref test.fq test.sam" and check out the SAM file test.sam.
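For comparison when checking test.sam: the SAM specification does not allow whitespace in the read name (QNAME) field, so a name cut at the first space is what a spec-conforming record would contain. The Python sketch below derives that expected name from a FASTQ header line. The function name is hypothetical, and the sketch does not explain why unaligned reads keep the full ID.

```python
def expected_qname(fastq_header):
    """First whitespace-delimited token of a FASTQ header, without '@'."""
    if fastq_header.startswith("@"):
        fastq_header = fastq_header[1:]
    return fastq_header.split()[0]

headers = [
    "@SRR064286.10 HWI-EAS418:1:4:1:147 length=50",
    "@SRR064286.11 HWI-EAS418:1:4:1:119 length=50",
]
for header in headers:
    print(expected_qname(header))
```

Comparing these names against the first column of each SAM line shows which records carry the truncated form and which carry the full header.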

Thanks for your help
Rich

Last edited by rfrancis; 02-29-2012 at 08:57 PM.
All times are GMT -8.