SEQanswers

Old 08-22-2013, 11:52 AM   #501
sahiilseth
Junior Member
 
Location: texas

Join Date: Aug 2012
Posts: 2
Default

Quote:
Originally Posted by GenoMax
From Bowtie website:
They also say:
'If your computer has more than 3-4 GB of memory and you would like to exploit that fact to make index building faster, use a 64-bit version of the bowtie2-build binary. The 32-bit version of the binary is restricted to using less than 4 GB of memory. If a 64-bit pre-built binary does not yet exist for your platform on the sourceforge download site, you will need to build one from source.'

I thought a 64-bit binary should be able to handle more characters as well; is that not true?
Old 08-22-2013, 11:56 AM   #502
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978
Default

Quote:
Originally Posted by sahiilseth View Post
From Bowtie website:
They also say:
'If your computer has more than 3-4 GB of memory and you would like to exploit that fact to make index building faster, use a 64-bit version of the bowtie2-build binary. The 32-bit version of the binary is restricted to using less than 4 GB of memory. If a 64-bit pre-built binary does not yet exist for your platform on the sourceforge download site, you will need to build one from source.'

I thought 64 bit binary, should be able to handle more characters as well; not true?
That passage only refers to using more memory during the index-building stage to speed that process up; it does not change how many characters the reference may contain.
Old 09-09-2013, 05:26 AM   #503
subkhankul
Junior Member
 
Location: London, UK

Join Date: Sep 2013
Posts: 1
Default

Dear Ben,
Why does the latest version of Bowtie use mm9 rather than mm10?
Which is better, Bowtie or Bowtie2, for alignment of 50 nt HiSeq Illumina ChIP-Seq reads?
I have read that Bowtie is good for short reads up to 100 nt, and Bowtie2 for reads of 50 nt and longer, so 50 nt reads sit on the border between the two programs.
If Bowtie2 is used, how do I get rid of non-unique reads?
Many thanks in advance
Old 10-24-2013, 04:32 AM   #504
angie_red
angie
 
Location: Ireland

Join Date: Oct 2010
Posts: 2
Default

Hi Ben,
Sorry to resurrect an old post. I am getting the error "Error: Reference sequence has more than 2^32-1 characters!". I know this means I need to split my reference in order to use bowtie2-build, but I am wondering about mapping my reads to the split reference. Is it possible to concatenate the split index files and map the reads to the concatenated index, or will I have to map the reads to each index separately and write scripts to find which has the best hit?
Thank you
Angela
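Before splitting a reference, it can help to confirm how far over the limit it actually is. The following is a minimal sketch in C that counts the sequence characters in a FASTA file and compares the total with 2^32-1, the figure given in the error message. The names (fasta_counter, count_fasta_file, exceeds_limit) are hypothetical, and the assumption that bowtie2-build counts exactly these characters (everything outside header lines and whitespace, Ns included) has not been checked against its source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Limit quoted in the bowtie2-build error message: 2^32-1 characters. */
#define BT2_MAX_REF_CHARS UINT64_C(4294967295)

/* Running state, so the input can be fed in arbitrary chunks. */
typedef struct {
    uint64_t total;     /* sequence characters seen so far */
    int at_line_start;  /* 1 if the next character starts a line */
    int in_header;      /* 1 while inside a '>' header line */
} fasta_counter;

void fasta_counter_init(fasta_counter *fc) {
    assert(fc != NULL);
    fc->total = 0;
    fc->at_line_start = 1;
    fc->in_header = 0;
}

/* Count characters that are not part of a header line and not whitespace. */
void fasta_counter_feed(fasta_counter *fc, const char *buf, size_t n) {
    assert(fc != NULL && buf != NULL);
    for (size_t i = 0; i < n; i++) {
        char c = buf[i];
        if (c == '\n') {
            fc->at_line_start = 1;
            fc->in_header = 0;
            continue;
        }
        if (fc->at_line_start && c == '>') fc->in_header = 1;
        fc->at_line_start = 0;
        if (!fc->in_header && c != '\r' && c != ' ' && c != '\t') fc->total++;
    }
}

/* Stream a whole FASTA file through the counter. */
uint64_t count_fasta_file(FILE *fp) {
    char buf[65536];
    size_t n;
    fasta_counter fc;
    assert(fp != NULL);
    fasta_counter_init(&fc);
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        fasta_counter_feed(&fc, buf, n);
    }
    return fc.total;
}

/* 1 if a reference with this many characters is over the limit. */
int exceeds_limit(uint64_t n_chars) {
    return n_chars > BT2_MAX_REF_CHARS;
}
```

Calling count_fasta_file() on the opened reference and passing the result to exceeds_limit() shows whether the file is over the limit, and the total gives a rough idea of how many pieces a split would need.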
Old 10-24-2013, 05:37 AM   #505
rsinha
Junior Member
 
Location: Kansas

Join Date: May 2012
Posts: 5
Default Bowtie2

I think you need to map your reads to the divided indexes and then write a script to bring the results together.
Old 10-24-2013, 04:33 PM   #506
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126
Default Reply to thread 'Bowtie, an ultrafast, memory-efficient, open source short read align

Hi Angie,

I think if you use a split reference you'll have issues calculating the alignment quality of the best alignment during the merge. It's a bit more complicated than just selecting the best alignment.
You will likely get more accurate alignment qualities if you don't split the reference and instead use an aligner like BWA or Novoalign that can handle genomes >4Gbp.

KR, Colin

Quote:
Originally Posted by angie_red View Post
Hi Ben,
Sorry to resurrect an old post. I am getting the error Error: Reference sequence has more than 2^32-1 characters!. I know this means I need to split my reference in order to use bowtie2-build but I am wondering about mapping my reads to this reference which has been split. Is it possible to concatenate the split-indexed files and map the reads to this concatenated file or will I have to map the reads to each indexed files separately and write scripts to find which has the best hit.
Thank you
Angela
Old 10-24-2013, 11:51 PM   #507
angie_red
angie
 
Location: Ireland

Join Date: Oct 2010
Posts: 2
Default

Thanks for the replies, Colin and rsinha. As suggested, I have indexed the reference with BWA without splitting it, so I will proceed with this approach.
Cheers
Angela
Old 10-25-2013, 12:23 AM   #508
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Quote:
Originally Posted by sparks View Post
Hi Angie,

I think if you use split reference you'll have issues calculating the alignment quality of the best alignment during the merge, it's a bit more complicated than just selecting the best alignment.
You will likely get more accurate alignment qualities if you don't split the reference and instead use an aligner like BWA or Novoalign that can handle genomes >4Gbp.

KR, Colin
While using BWA or Novoalign is certainly the better solution, one can recalculate MAPQs fairly simply from multiple bowtie alignment files made against different references. The bowtie MAPQ score depends primarily on the AS:i: and XS:i: scores of each read, so you can just rerun the algorithm on those (bowtie MAPQs are more of a vague approximation than you may think). This is the approach I took in bison, where there are multiple parallel alignments of each read to different bisulfite-converted genomes.
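A minimal sketch of the merge step this implies, assuming the AS:i: and XS:i: values for one read have already been parsed from each chunk's alignment file. The chunk_hit struct, the NO_SCORE sentinel and merge_chunk_hits() are hypothetical names for illustration, not code from bison or bowtie2. The idea is that the best AS over all chunks becomes the merged AS, and the best remaining score (another chunk's AS, or any chunk's XS) becomes the merged XS.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Sentinel for "no score available" (hypothetical convention). */
#define NO_SCORE INT_MIN

/* Scores for one read against one chunk of the split reference. */
typedef struct {
    int aligned;   /* 0 if the read did not align to this chunk */
    int as_score;  /* AS:i: value, valid when aligned != 0 */
    int has_xs;    /* 0 if no XS:i: tag was reported */
    int xs_score;  /* XS:i: value, valid when has_xs != 0 */
} chunk_hit;

/*
 * Combine the per-chunk hits of one read.
 * *best_chunk gets the index of the chunk with the highest AS (-1 if none),
 * *as_out the highest AS, and *xs_out the best competing score,
 * or NO_SCORE when nothing competes.
 */
void merge_chunk_hits(const chunk_hit *hits, size_t n,
                      int *best_chunk, int *as_out, int *xs_out) {
    assert(hits != NULL && best_chunk != NULL);
    assert(as_out != NULL && xs_out != NULL);
    *best_chunk = -1;
    *as_out = NO_SCORE;
    *xs_out = NO_SCORE;
    for (size_t i = 0; i < n; i++) {
        if (!hits[i].aligned) continue;
        if (*best_chunk < 0 || hits[i].as_score > *as_out) {
            /* New best hit: the previous best becomes a competitor. */
            if (*best_chunk >= 0 && *as_out > *xs_out) *xs_out = *as_out;
            *best_chunk = (int) i;
            *as_out = hits[i].as_score;
        } else if (hits[i].as_score > *xs_out) {
            /* Not the best, but better than the current runner-up. */
            *xs_out = hits[i].as_score;
        }
        /* A chunk's own second-best alignment also competes. */
        if (hits[i].has_xs && hits[i].xs_score > *xs_out) {
            *xs_out = hits[i].xs_score;
        }
    }
}
```

When *xs_out comes back as NO_SCORE, it should be replaced by a value just below the read's minimum score before any MAPQ calculation, since taking the absolute value of INT_MIN is undefined.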
Old 10-25-2013, 01:30 AM   #509
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126
Default

Agreed, you can merge the Bowtie alignments and calculate a vague alignment quality. Perhaps you can give Angie the formulae for it.

Quote:
Originally Posted by dpryan View Post
While using BWA or Novoalign are certainly the better solutions, one can relatively simply recalculate MAPQs from multiple alignment files to different references with bowtie. The bowtie MAPQ score is dependent primarily on the AS:i: and XS:i: score of each read, so you can just rerun the algorithm on that (bowtie MAPQs are more of a vague approximation than you may think). This is the approach I took in bison, where there are multiple parallel alignments of each read to different bisulfite converted genomes.
Old 10-25-2013, 01:52 AM   #510
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Quote:
Originally Posted by sparks View Post
Agree you can merge Bowtie and calculate a vague alignment quality. Perhaps you can give Angie the formulae for it.
Inputs are the AS and XS scores of the resulting best hit (the XS score may be the AS score of the second-best hit, that is, the alignment to the other chunk of the reference). scMin is the minimum score for a given read (this is derived from the --score-min option given as input). I have a function to calculate this, but it depends on previously parsing user input and storing things in a struct that's specific to bison (so that function wouldn't be very useful), so I won't paste it below. This is basically a C version of what bowtie2 uses (complete with casting single-precision floats to double precision).

Oh, the config.mode just denotes --end-to-end or --local. You'd need to change that to be a function input rather than relying on a global struct. I think the remainder should work, though!

Code:
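#include <stdlib.h> //abs()

//In bison, config is a global struct filled from the command line. This
//stand-in is an assumption added so the function compiles on its own:
//mode 0 is --end-to-end, anything else is --local.
struct { int mode; } config = { 0 };

/******************************************************************************
*
*   Calculate a MAPQ, given AS, XS, and the minimum score (ala bowtie2)
*
*   An XS below scMin selects the branches for a read without a valid
*   second-best alignment.
*
*******************************************************************************/
int calc_MAPQ_BT2(int AS, int XS, int scMin) {
    int diff, bestOver, bestdiff;
    diff = abs(scMin); //Range of possible alignment scores
    bestOver = AS-scMin; //Shift alignment score range, so worst score is 0

    //The method depends on config.mode
    bestdiff = abs(abs(AS)-abs(XS)); //Absolute distance between alignment scores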
    if(config.mode == 0) { //--end-to-end (default)
        if(XS < scMin) {
            if(bestOver >= diff * (double) 0.8f) return 42;
            else if(bestOver >= diff * (double) 0.7f) return 40;
            else if(bestOver >= diff * (double) 0.6f) return 24;
            else if(bestOver >= diff * (double) 0.5f) return 23;
            else if(bestOver >= diff * (double) 0.4f) return 8;
            else if(bestOver >= diff * (double) 0.3f) return 3;
            else return 0;
        } else {
            if(bestdiff >= diff * (double) 0.9f) {
                if(bestOver == diff) {
                    return 39;
                } else {
                    return 33;
                }
            } else if(bestdiff >= diff * (double) 0.8f) {
                if(bestOver == diff) {
                    return 38;
                } else {
                    return 27;
                }
            } else if(bestdiff >= diff * (double) 0.7f) {
                if(bestOver == diff) {
                    return 37;
                } else {
                    return 26;
                }
            } else if(bestdiff >= diff * (double) 0.6f) {
                if(bestOver == diff) {
                    return 36;
                } else {
                    return 22;
                }
            } else if(bestdiff >= diff * (double) 0.5f) {
                if(bestOver == diff) {
                    return 35;
                } else if(bestOver >= diff * (double) 0.84f) {
                    return 25;
                } else if(bestOver >= diff * (double) 0.68f) {
                    return 16;
                } else {
                    return 5;
                }
            } else if(bestdiff >= diff * (double) 0.4f) {
                if(bestOver == diff) {
                    return 34;
                } else if(bestOver >= diff * (double) 0.84f) {
                    return 21;
                } else if(bestOver >= diff * (double) 0.68f) {
                    return 14;
                } else {
                    return 4;
                }
            } else if(bestdiff >= diff * (double) 0.3f) {
                if(bestOver == diff) {
                    return 32;
                } else if(bestOver >= diff * (double) 0.88f) {
                    return 18;
                } else if(bestOver >= diff * (double) 0.67f) {
                    return 15;
                } else {
                    return 3;
                }
            } else if(bestdiff >= diff * (double) 0.2f) {
                if(bestOver == diff) {
                    return 31;
                } else if(bestOver >= diff * (double) 0.88f) {
                    return 17;
                } else if(bestOver >= diff * (double) 0.67f) {
                    return 11;
                } else {
                    return 0;
                }
            } else if(bestdiff >= diff * (double) 0.1f) {
                if(bestOver == diff) {
                    return 30;
                } else if(bestOver >= diff * (double) 0.88f) {
                    return 12;
                } else if(bestOver >= diff * (double) 0.67f) {
                    return 7;
                } else {
                    return 0;
                }
            } else if(bestdiff > 0) {
                if(bestOver >= diff * (double) 0.67f) {
                    return 6;
                } else {
                    return 2;
                }
            } else {
                if(bestOver >= diff * (double) 0.67f) {
                    return 1;
                } else {
                    return 0;
                }
            }
        }
    } else { //--local
        if(XS < scMin) {
            if(bestOver >= diff * (double) 0.8f) return 44;
            else if(bestOver >= diff * (double) 0.7f) return 42;
            else if(bestOver >= diff * (double) 0.6f) return 41;
            else if(bestOver >= diff * (double) 0.5f) return 36;
            else if(bestOver >= diff * (double) 0.4f) return 28;
            else if(bestOver >= diff * (double) 0.3f) return 24;
            else return 22;
        } else {
            if(bestdiff >= diff * (double) 0.9f) return 40;
            else if(bestdiff >= diff * (double) 0.8f) return 39;
            else if(bestdiff >= diff * (double) 0.7f) return 38;
            else if(bestdiff >= diff * (double) 0.6f) return 37;
            else if(bestdiff >= diff * (double) 0.5f) {
                if     (bestOver == diff)       return 35;
                else if(bestOver >= diff * (double) 0.5f) return 25;
                else                            return 20;
            } else if(bestdiff >= diff * (double) 0.4f) {
                if     (bestOver == diff)       return 34;
                else if(bestOver >= diff * (double) 0.5f) return 21;
                else                            return 19;
            } else if(bestdiff >= diff * (double) 0.3f) {
                if     (bestOver == diff)       return 33;
                else if(bestOver >= diff * (double) 0.5f) return 18;
                else                            return 16;
            } else if(bestdiff >= diff * (double) 0.2f) {
                if     (bestOver == diff)       return 32;
                else if(bestOver >= diff * (double) 0.5f) return 17;
                else                            return 12;
            } else if(bestdiff >= diff * (double) 0.1f) {
                if     (bestOver == diff)       return 31;
                else if(bestOver >= diff * (double) 0.5f) return 14;
                else                            return 9;
            } else if(bestdiff > 0) {
                if(bestOver >= diff * (double) 0.5f)      return 11;
                else                            return 2;
            } else {
                if(bestOver >= diff * (double) 0.5f)      return 1;
                else                            return 0;
            }
        }
    }
}
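
The scMin input is not shown above. For the linear form of --score-min, which the bowtie2 manual describes as f(x) = constant + coefficient * x with x the read length (L,-0.6,-0.6 is the --end-to-end default), a minimal sketch could look like the following. This is not bison's function; the function names are hypothetical, the other function types (constant, square root, natural log) are left out, and truncating toward zero when converting to an integer is an assumption rather than something confirmed from the bowtie2 source.

```c
#include <assert.h>

/* Evaluate the linear --score-min form: f(x) = constant + coeff * x,
 * where x is the read length. */
double score_min_linear(double constant, double coeff, int read_len) {
    assert(read_len > 0);
    return constant + coeff * (double) read_len;
}

/* Integer scMin for use with calc_MAPQ_BT2().
 * Assumption: the value is truncated toward zero, so -30.6 becomes -30. */
int sc_min_from_linear(double constant, double coeff, int read_len) {
    return (int) score_min_linear(constant, coeff, read_len);
}
```

For a 50 nt read with the default L,-0.6,-0.6 this gives -30, so diff in the function above would be 30.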

Last edited by dpryan; 02-11-2014 at 07:31 AM. Reason: Slightly incorrect code
Old 04-03-2014, 08:31 AM   #511
Steven_hun
Junior Member
 
Location: Hungary

Join Date: Mar 2014
Posts: 3
Default

Hi everybody!

I started using bowtie today, and I wanted to align a csfasta + qual file with bowtie.
I built the reference FASTA file with bowtie-build, and after that I tried to align the csfasta + qual file to the reference file(s), but I get an error message.
The bowtie-build command:
Quote:
bowtie-build -C reference_genom.fa ref/reference_genom
The bowtie command:
Quote:
bowtie -C ref/reference_genom -f read.csfasta -Q quality.qual -S align.sam
And the error message from my bowtie command:
Quote:
bowtie -C ref/reference_genom -f read.csfasta -Q quality.qual -S align.sam
/usr/include/seqan/sequence/string_base.h:237 Assertion failed : static_cast<TStringPos>(pos) < static_cast<TStringPos>(length(me)) was: 48 >= 48 (Trying to access an element behind the last one!)
Aborted
The csfasta file contains only short reads; every sequence is 50 bp long.

My question is: what does the error mean? I tried searching for this error message but didn't find anything.
I installed bowtie the following way:
Quote:
sudo apt-get install bowtie
I would really appreciate any help/answer.
Thank you!

Last edited by Steven_hun; 04-03-2014 at 08:34 AM.
Old 04-04-2014, 12:53 AM   #512
TiborNagy
Senior Member
 
Location: Budapest

Join Date: Mar 2010
Posts: 329
Default

This means something is wrong with your csfasta or quality file.
Old 05-14-2015, 01:30 PM   #513
fereshteh
Junior Member
 
Location: berlin

Join Date: Dec 2014
Posts: 1
Default

Hi Ben,
I'm really happy that I can talk with you here, because when I first started working with bowtie2 I asked myself how clever you must be to have created bowtie, and how clever I am not, since I can't run bowtie properly...
Anyway, I have a question about the --un option:
if I want to separate mapped and unmapped reads when aligning, which command should I type?
bowtie2 -x [name of the bowtie2-build indexed file containing the rRNA sequence] --un [name of the fastq file which will contain the UNMAPPED reads] -U [name of the fastq file containing the reads] -S [name of the .sam file that will contain the MAPPED and UNMAPPED reads]
I don't understand the --un option because I don't know what I should type in place of [name of the fastq file which will contain the UNMAPPED reads].
Old 05-14-2015, 02:29 PM   #514
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

I'm obviously not Ben, but "--un unmapped.fastq" or "--un sample.unmapped.fastq" or something along those lines would be common. Pick a name that makes sense to you; it doesn't matter what it is.

All times are GMT -8.