SEQanswers

Old 03-30-2011, 10:09 AM   #421
gntc
Member
 
Location: Phoenix, AZ

Join Date: Feb 2011
Posts: 15
Default repeats

Quote:
Originally Posted by biznatch View Post
Depends whether the index was made from the masked version of hg19 or not. I'm pretty sure the pre-made index from the Bowtie website is made from the non-masked genome. Both masked and non-masked are available here:

http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/

"chromFa.tar.gz - The assembly sequence in one file per chromosome.
Repeats from RepeatMasker and Tandem Repeats Finder (with period
of 12 or less) are shown in lower case; non-repeating sequence is
shown in upper case.

chromFaMasked.tar.gz - The assembly sequence in one file per chromosome.
Repeats are masked by capital Ns; non-repeating sequence is shown in
upper case."
The files in chromFa.tar.gz each start out with a large number of 'N's. Is this due to uncertainty near the ends of chromosomes in sequencing?
Old 03-31-2011, 08:52 AM   #422
tallphil
Phil Ewels
 
Location: SciLifeLab, Stockholm, Sweden

Join Date: Mar 2011
Posts: 12
Default

Quote:
Originally Posted by gntc View Post
The files in chromFa.tar.gz each start out with a large number of 'N's. Is this due to uncertainty near the ends of chromosomes in sequencing?
What good timing - I was just searching out the answer to this exact question for that exact file... Does anyone know the answer?
Old 03-31-2011, 11:22 AM   #423
biznatch
Senior Member
 
Location: Canada

Join Date: Nov 2010
Posts: 126
Default

Quote:
Originally Posted by gntc View Post
The files in chromFa.tar.gz each start out with a large number of 'N's. Is this due to uncertainty near the ends of chromosomes in sequencing?
I think yes, it is because of uncertainty near the ends of the chromosomes. If you look at hg19 in the UCSC Genome Browser and turn on the Gap track you can see where there are gaps in sequencing on each chromosome. Anywhere there is a gap will be N's in the sequence. There are gaps at the ends of each chromosome because telomeres and subtelomeres are repetitive and difficult to sequence and assemble. There are also large gaps at the centromeres for the same reason.
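To see where those gaps fall in a given chromosome file, the runs of N can be listed directly from the FASTA. This is only a minimal sketch: `read_fasta` and `n_runs` are hypothetical helper names, and the toy record stands in for a real chromFa file.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_runs(seq: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of consecutive N/n."""
    return [m.span() for m in re.finditer(r"[Nn]+", seq)
            if m.end() - m.start() >= min_len]


if __name__ == "__main__":
    # Toy record: a gap at the start, one in the middle, one at the end.
    toy = [">chrToy", "NNNNNNacgtACGT", "ACGTNNNACGTNN"]
    for name, seq in read_fasta(toy):
        print(name, n_runs(seq))
```

On a real file, pass an open file handle to `read_fasta` and raise `min_len` to ignore very short runs; the intervals should line up with what the Gap track shows.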
Old 04-04-2011, 07:26 AM   #424
droog_22
Member
 
Location: Vienna

Join Date: Nov 2010
Posts: 10
Default Counting Hits in a BAM file

Dear All,

I am using bowtie to align reads to the dm3 genome. I just read that the SAM specification allows for tags such as H0, H1, etc., which count the number of 0-difference hits, 1-difference hits, and so on. I know how to add these tags using awk; I was just wondering if it would be straightforward to modify bowtie so that it outputs these values.

Cheers D.
Old 04-04-2011, 09:34 AM   #425
gntc
Member
 
Location: Phoenix, AZ

Join Date: Feb 2011
Posts: 15
Default mismatches

Quote:
Originally Posted by droog_22 View Post
Dear All,
I am using bowtie to align reads to the dm3 genome. I just read that the SAM specification allows for tags such as H0, H1, etc., which count the number of 0-difference hits, 1-difference hits, and so on. I know how to add these tags using awk; I was just wondering if it would be straightforward to modify bowtie so that it outputs these values.
Bowtie does this automatically. The tag is XA:i:0 (for a read with 0 mismatches).
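If per-read hit counts are wanted as H0/H1/H2 tags, a post-processing pass over the SAM file is one option. The sketch below is only an illustration under stated assumptions: every aligned record carries an NM:i tag, the reads are single-end (both mates of a pair share a read name and would be pooled), and all alignments of interest were reported. `add_hit_counts` and `edit_distance` are hypothetical helper names, not part of bowtie.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional


def edit_distance(fields: List[str]) -> Optional[int]:
    """Return the NM:i value from a SAM record's optional fields, if present."""
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None


def add_hit_counts(sam_lines: Iterable[str], max_diff: int = 2) -> List[str]:
    """Append H0..H<max_diff> tags counting each read's hits per edit distance."""
    lines = [ln.rstrip("\n") for ln in sam_lines]
    counts: Dict[str, Counter] = defaultdict(Counter)

    # First pass: tally alignments per read name and edit distance.
    for ln in lines:
        if not ln or ln.startswith("@"):
            continue
        fields = ln.split("\t")
        nm = edit_distance(fields)
        if nm is not None:
            counts[fields[0]][nm] += 1

    # Second pass: append the tags to records that have an NM value.
    out = []
    for ln in lines:
        if not ln or ln.startswith("@"):
            out.append(ln)
            continue
        fields = ln.split("\t")
        if edit_distance(fields) is not None:
            tally = counts[fields[0]]
            fields += [f"H{d}:i:{tally[d]}" for d in range(max_diff + 1)]
        out.append("\t".join(fields))
    return out


if __name__ == "__main__":
    # Toy records: r1 has two hits, r2 has one, r3 is unaligned.
    toy = [
        "@HD\tVN:1.0",
        "r1\t0\tchr2L\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0",
        "r1\t0\tchr3R\t500\t255\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1",
        "r2\t16\tchr2L\t900\t255\t4M\t*\t0\t0\tTTGA\tIIII\tNM:i:2",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGG\tIIII",
    ]
    print("\n".join(add_hit_counts(toy)))
```

The whole file is held in memory here for simplicity; for large files, streaming over name-sorted records would be the better design.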
Old 04-20-2011, 12:19 PM   #426
kerhard
Member
 
Location: Oakland

Join Date: Feb 2011
Posts: 27
Question limit to bowtie-build fasta input files?

Hi all,

I've been trying to make a bowtie index using a long list of annotated transposons as the input fasta files rather than reference chromosome files and bowtie-build does not seem to like it very much.

If I try to use ALL of the fasta files (there are a lot, probably around 1000), I get the error message:

Error: could not open <fileX.fa>

But if I use only a subset of the fasta files (including fileX.fa), it works just fine.

I'm assuming that it's a memory issue, but the total contents of all of these fasta files is much less than the fasta files containing the full reference genome sequences, and I can make an index with them just fine.

Has anyone had any experience doing something similar? Is there some limit to the number of input files bowtie-build can take? I imagine that I can just split these files up into smaller groups and make several index files, but it would be nice to be able to have all of them in one index.

Thanks for any help/advice!
Old 04-21-2011, 11:50 AM   #427
kerhard
Member
 
Location: Oakland

Join Date: Feb 2011
Posts: 27
Default bowtie-build can use single fasta file with multiple entries as input

Whoops, I didn't realize that the bowtie indexer could take a single fasta file with multiple entries as input. It seems to work just fine that way. Because the bowtie website describes the input for bowtie-build as:

"A comma-separated list of FASTA files containing the reference sequences to be aligned to"

i assumed that they had to be separate files, but it looks like they don't.
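For anyone in the same situation, merging the per-element files into one multi-entry FASTA can be sketched as below. `merge_fasta` and the file names are hypothetical, and the only care taken is to make sure each file ends with a newline so that the next header starts on its own line.

```python
import os
import tempfile
from typing import Iterable


def merge_fasta(paths: Iterable[str], out_path: str) -> int:
    """Concatenate FASTA files into one file; return the number of records."""
    n_records = 0
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                last = "\n"
                for line in handle:
                    if line.startswith(">"):
                        n_records += 1
                    out.write(line)
                    last = line
                # Guard against a file whose final line lacks a newline.
                if not last.endswith("\n"):
                    out.write("\n")
    return n_records


if __name__ == "__main__":
    # Toy demonstration with two tiny files in a temporary directory.
    workdir = tempfile.mkdtemp()
    inputs = []
    for name, seq in [("te1", "ACGT"), ("te2", "GGCC")]:
        path = os.path.join(workdir, name + ".fa")
        with open(path, "w") as fh:
            fh.write(">" + name + "\n" + seq)  # deliberately no trailing newline
        inputs.append(path)
    merged = os.path.join(workdir, "all_transposons.fa")
    print(merge_fasta(inputs, merged), "records written to", merged)
```

The merged file can then be given to bowtie-build as a single reference input in place of the long comma-separated list.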

sorry for the silly question, should have tried this simple solution first.
Old 04-25-2011, 08:51 AM   #428
dara
Member
 
Location: texas

Join Date: Apr 2009
Posts: 10
Question bowtie -v option not working in version 0.12.3?

Hi all,

I'm finding an issue/possible bug or error on my part with one of bowtie's options, so I thought I would ask here.

I'm trying to use bowtie's -v <int> option for a colorspace alignment, and every time I do, it takes me back to the usage parameters, indicating that something is wrong with the arguments.

I know that this works:

bowtie -C -f -n 3 -S -t hg18.cs.bowtie reads.csfasta aln.sam

But once I add the -v option, it doesn't, though it seems from the manual that it should:

bowtie -C -v 4 -f -S -t hg18.cs.bowtie reads.csfasta aln.sam

Any help/suggestions would be really appreciated.

thank you!
Old 04-29-2011, 01:50 AM   #429
azer
Junior Member
 
Location: china

Join Date: Oct 2010
Posts: 1
Default

-v 4 may be the problem. -v only accepts values from 0 to 3, so 1-3 is OK.
Old 04-29-2011, 12:23 PM   #430
Gators
Member
 
Location: North Carolina

Join Date: Feb 2011
Posts: 19
Default

So I am having some weird issues building a bowtie index with the hairpin.fa file from miRBase. The file was filtered to get rid of non-human miRNAs and adjusted to get rid of spaces, but I had the same problem before this filtering was done. I built the index with all default parameters. There is no obvious error message during the build, but I am not sure I would catch anything unless it contained the word "error." After building the index, if I align with bowtie it reports back some alignments, but not even close to all of them. The vast majority of what it reports have 1 or 2 mismatches (-v was set to 2), although there are some perfect matches. The sequences it reports back are also GC rich, which is weird to me... I also tried to build this on another computer and it gave similar results. So clearly there is something weird with the fasta file...

Any ideas?

I should say that other indexes have been built on this machine without problems, as have other alignments...
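One thing that may be worth checking (an assumption, not something confirmed in this thread): miRBase distributes hairpin.fa as RNA, with U in place of T, and as far as I know bowtie treats reference characters other than A, C, G and T as ambiguous. A rough sketch of a check and a conversion, using hypothetical helper names and a toy record:

```python
from typing import Iterable, Iterator


def rna_to_dna(lines: Iterable[str]) -> Iterator[str]:
    """Replace U/u with T/t in sequence lines; leave '>' header lines as-is."""
    table = str.maketrans("Uu", "Tt")
    for line in lines:
        yield line if line.startswith(">") else line.translate(table)


def non_acgt_fraction(lines: Iterable[str]) -> float:
    """Fraction of sequence characters that are not A, C, G or T."""
    total = other = 0
    for line in lines:
        if line.startswith(">"):
            continue
        seq = line.strip().upper()
        total += len(seq)
        other += sum(1 for base in seq if base not in "ACGT")
    return other / total if total else 0.0


if __name__ == "__main__":
    # Toy record, not a real miRBase entry.
    toy = [">toy-hairpin-1", "UGGGAUGAGG"]
    print("before:", non_acgt_fraction(toy))
    print("after: ", non_acgt_fraction(list(rna_to_dna(toy))))
```

If the fraction is well above zero on the real file, converting it before building the index would be a cheap experiment. It would also fit the GC-rich hits, since only stretches without U would be alignable.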
Old 05-08-2011, 09:12 AM   #431
BioSlayer
Member
 
Location: Wellington

Join Date: Feb 2010
Posts: 26
Default bowtie never finishes nor read all ebwt indices

I have been following this thread from the beginning, and I have a few issues... For the past four days I had a bowtie instance running that never finished, so I had to kill it. My command was as follows:
Code:
bowtie hg19 -q /CombinedReads/SRR065070_Combined.fastq -S  align.map --offrate 20 -p 2
So, looking around for possible causes I saw that an issue was registered at Sourceforge but was not followed up, I don't know if my situation in here is replicable but here are the factors that may have had some influence on that above behavior:
01- I downloaded indices from the bowtie website and unzipped that to a directory, that is the same directory I navigated to and ran the bowtie command from. So, I suppose bowtie could automatically relate to this. I took this measure since unzipping to the index folder within bowtie could not get it to read the indices (it kept complaining that it could not find an index hg19) so I created a directory and invoked bowtie from within it.
02- The file I get the reads from is downloaded from SRA (SRR065070), it is located in another directory from where I am calling bowtie (it is about 6 GBs) and has around 19 million reads. I used samtools to create the forward and backward reads in fastq format...
03- My system is a Ubuntu, 32 bits, 2 GB RAM, 7 GB SWAP.
04- The $bowtie --version output is


bowtie version 0.12.7
32-bit
Built on bio-laptop
Thu Apr 21 21:12:27 AST 2011
Compiler: gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
Options: -O3 -Wl,--hash-style=both
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 4, 8, 4, 4, 8}



Then, trying to investigate the behavior somewhat deeper (--verbose), I noticed that out of the 6 ebwt index files (hg19.1.ebwt...hg19.4.ebwt, hg19.rev.1.ebwt and hg19.rev.2.ebwt), only four are being read (it just doesn't open hg19.3.ebwt or hg19.4.ebwt). I tested that by passing a query on the command line with -c...
Code:
$ bowtie hg19 -c acgggtttaa  test.map -t --verbose
and following is a brief excerpt from the output log
Opening hit output file: 15:16:21
About to initialize fw Ebwt: 15:16:21
About to open input files: 15:16:21
Opening "hg19.1.ebwt"
Opening "hg19.2.ebwt"
Finished opening input files: 15:16:21

About to initialize rev Ebwt: 15:16:21
About to open input files: 15:16:21
Opening "hg19.rev.1.ebwt"
Opening "hg19.rev.2.ebwt"
Finished opening input files: 15:16:21
Reading header: 15:16:21

About to open input files: 15:16:21
Opening "hg19.1.ebwt"
Opening "hg19.2.ebwt"
Finished opening input files: 15:16:21
Reading header: 15:16:21

About to open input files: 15:16:44
Opening "hg19.rev.1.ebwt"
Opening "hg19.rev.2.ebwt"
Finished opening input files: 15:16:44


Seeking your guidance and support with appreciation...
Old 05-09-2011, 06:34 AM   #432
harrike
Member
 
Location: winchester, va

Join Date: Jun 2010
Posts: 15
Default

This may be the best place for my question.

I am trying to use Bowtie to map about 300,000 reads to my reference. I use the command: bowtie -a -v 0 -p 10 -t INDEX_FILE -f READS_FILE.fasta > RESULT_FILE --un unmapped.txt.

The command couldn't finish; an error showed up: "Your Mac OSX startup disk has no more space available for application memory". The Bowtie process took about 30 GB of memory and froze.

I checked my startup disk (Macintosh HD). It still has about 1 TB of space available. I don't know what's going on.

I am using a computer with 16 cores and 32 GB memory.

Hope somebody here can help me. Thanks in advance.
Old 05-09-2011, 06:59 AM   #433
sdvie
Member
 
Location: Spain

Join Date: Jul 2010
Posts: 68
Default

Quote:
Originally Posted by harrike View Post
This may be the best place for my question.

I am trying to use Bowtie to map about 300,000 reads to my reference. I use the command: bowtie -a -v 0 -p 10 -t INDEX_FILE -f READS_FILE.fasta > RESULT_FILE --un unmapped.txt.

The command couldn't finish; an error showed up: "Your Mac OSX startup disk has no more space available for application memory". The Bowtie process took about 30 GB of memory and froze.

I checked my startup disk (Macintosh HD). It still has about 1 TB of space available. I don't know what's going on.

I am using a computer with 16 cores and 32 GB memory.

Hope somebody here can help me. Thanks in advance.
Are you executing this command from your Mac on a remote machine with 16 cores and 32 GB, or does your Mac itself have 16 cores and 32 GB?

The TB of disk space available will not help if the application freezes because of a lack of RAM.
Old 05-09-2011, 07:02 AM   #434
harrike
Member
 
Location: winchester, va

Join Date: Jun 2010
Posts: 15
Default

Thanks, Sdvie.

It is my Mac which has 16 cores and 32 GB. I don't use remote control.
Old 05-09-2011, 07:08 AM   #435
sdvie
Member
 
Location: Spain

Join Date: Jul 2010
Posts: 68
Default

Quote:
Originally Posted by harrike View Post
Thanks, Sdvie.

It is my Mac which has 16 cores and 32 GB. I don't use remote control.
I am surprised that bowtie seems to be so memory-intensive in your case... especially with relatively few reads. Did you use a particularly large genome?

cheers,
Sophia
Old 05-09-2011, 07:24 AM   #436
harrike
Member
 
Location: winchester, va

Join Date: Jun 2010
Posts: 15
Default

Quote:
Originally Posted by sdvie View Post
I am surprised that bowtie seems to be so memory-intensive in your case... especially with relatively few reads. Did you use a particularly large genome?

cheers,
Sophia
I am using the apple genome, which is about 750 Mb.

Actually, I tried the mapping several times. At the beginning, the size of the output file kept increasing and the Bowtie command only took about 2 GB of RAM. After several minutes, the file size stopped increasing, but the RAM used by the Bowtie command rose steadily to about 30 GB and then froze there.

Another interesting thing is that the output file size differs depending on the number of cores I specify: it is 2.66 GB with '-p 10' but 2.38 GB without this option, a difference of about 3 million alignments. How can this happen?

Last edited by harrike; 05-09-2011 at 07:27 AM.
Old 05-10-2011, 10:48 AM   #437
JimC
Member
 
Location: Ann Arbor, MI

Join Date: Nov 2008
Posts: 10
Default

Try removing the command line option -a. With it, you are going to report ALL possible alignments for all reads. If you have repetitive sequences, this could be causing the memory problem. I would set a max number of matches to some high but useful value such as 10 or 30 or 40. Try that.
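To check whether a handful of repetitive reads account for most of the output, the alignments per read can be tallied from whatever output was written before the freeze. This is a rough sketch that assumes bowtie's default tab-separated output with the read name in the first column; `alignments_per_read` and `heaviest_reads` are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def alignments_per_read(lines: Iterable[str]) -> Counter:
    """Count output lines per read name (first tab-separated column)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line:
            counts[line.split("\t", 1)[0]] += 1
    return counts


def heaviest_reads(counts: Counter, top: int = 5) -> List[Tuple[str, int]]:
    """Return the reads with the most reported alignments."""
    return counts.most_common(top)


if __name__ == "__main__":
    # Toy output: r1 hits three places, r2 hits one.
    toy = [
        "r1\t+\tchr1\t10\tACGT\tIIII\t2\n",
        "r1\t-\tchr2\t99\tACGT\tIIII\t2\n",
        "r1\t+\tchr3\t5\tACGT\tIIII\t2\n",
        "r2\t+\tchr1\t40\tGGCC\tIIII\t0\n",
    ]
    print(heaviest_reads(alignments_per_read(toy)))
```

If a few reads turn out to have thousands of alignments each, capping the number reported per read with -k in place of -a would be the obvious thing to try.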

Jim
Old 05-10-2011, 09:25 PM   #438
BioSlayer
Member
 
Location: Wellington

Join Date: Feb 2010
Posts: 26
Default

Quote:
Originally Posted by harrike View Post
I am trying to use Bowtie to map about 300,000 reads to my reference. I use the command: bowtie -a -v 0 -p 10 -t INDEX_FILE -f READS_FILE.fasta > RESULT_FILE --un unmapped.txt.
Using output redirection via '>' seems a bit questionable. What do you want to write to RESULT_FILE, or why are you using '>'? I would consider something more direct, like:

Code:
bowtie -a -v 0 -p 10 -t INDEX_FILE -f READS_FILE.fasta  --un unmapped.txt
A few days ago, I used output redirection to capture the output of bowtie's --verbose flag, and in less than 8 hours 14 GB of space was wasted, compared with only 188 MB of results in the alignment file...
Old 07-11-2011, 02:32 PM   #439
genlyai
Member
 
Location: Boston, MA

Join Date: Aug 2009
Posts: 39
Default

Hi. I wonder if anyone can help me, as I think bowtie (0.12.7) is misbehaving.

I'm trying to map reads to a sequence with a short duplicated stretch. The problem is that a read that should clearly map to one repeat (i.e. it has some unique sequence flanking the repeat) sometimes maps to the wrong repeat instead.

For instance, given the read pair

Code:
@HWI-ST568_0055:8:1106:17676:67081#GCCAAT/1
ATTGCTGAAGAGCTTGGCGGCGAATGGGCTGACCGCTTCCTCGTGCTTTA
+HWI-ST568_0055:8:1106:17676:67081#GCCAAT/1
ggggggggggggggggggggggegeeggegedgeegeegggggdgegdge

@HWI-ST568_0055:8:1106:17676:67081#GCCAAT/2
GATATCCTGTTTGGCCCATATTCAGCTGTTCCATCTGTTCTTGGCCCTGA
+HWI-ST568_0055:8:1106:17676:67081#GCCAAT/2
ggggggggggggggggggggggggggggggggggggggggggggggbgge
if I run

Code:
bowtie -q --solexa1.3-quals -v 3 --minins 100 --maxins 450 --best -k 1 -t -p 8 index_name -1 testB1.fq -2 testB2.fq
I get

Code:
HWI-ST568_0055:8:1106:17676:67081#GCCAAT/1      +       seq_id      5188   ATTGCTGAAGAGCTTGGCGGCGAATGGGCTGACCGCTTCCTCGTGCTTTA       HHHHHHHHHHHHHHHHHHHHHHFHFFHHFHFEHFFHFFHHHHHEHFHEHF      0
HWI-ST568_0055:8:1106:17676:67081#GCCAAT/2      -       seq_id      5458   TCAGGGCCAAGAACAGATGGAACAGCTGAATATGGGCCAAACAGGATATC       FHHCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH      0       40:G>A,43:T>C,46:A>G
Those three mismatches should (?) make this alignment not show up, given that there is another site where this could align with no mismatches. Even stranger, if I run without the --best option

Code:
bowtie -q --solexa1.3-quals -v 3 --minins 100 --maxins 450 -k 1 -t -p 8 index_name -1 testB1.fq -2 testB2.fq
I do get the "right" answer

Code:
HWI-ST568_0055:8:1106:17676:67081#GCCAAT/1      +       seq_id      5188   ATTGCTGAAGAGCTTGGCGGCGAATGGGCTGACCGCTTCCTCGTGCTTTA       HHHHHHHHHHHHHHHHHHHHHHFHFFHHFHFEHFFHFFHHHHHEHFHEHF      0
HWI-ST568_0055:8:1106:17676:67081#GCCAAT/2      -       seq_id      5533   TCAGGGCCAAGAACAGATGGAACAGCTGAATATGGGCCAAACAGGATATC       FHHCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH      0
Anyway, I recognize this is probably a solved problem, but I'm having a tough time understanding what's going on, so if anybody could help me understand what's up, I'd be really grateful.
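For comparing the two candidate placements, here is a small sketch that pulls the mismatch count and the fragment span out of the output lines above. It assumes the eighth column holds the comma-separated mismatch descriptors and that the offset is the leftmost position of each mate; `parse_hit` and `fragment_span` are hypothetical helper names.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    name: str
    strand: str
    ref: str
    offset: int
    length: int
    mismatches: int


def parse_hit(line: str) -> Hit:
    """Parse one line of bowtie's default output (whitespace-separated)."""
    f = line.split()
    # The mismatch column is absent when the alignment is exact.
    n_mm = len(f[7].split(",")) if len(f) > 7 else 0
    return Hit(f[0], f[1], f[2], int(f[3]), len(f[4]), n_mm)


def fragment_span(a: Hit, b: Hit) -> int:
    """Distance from the leftmost aligned base to the rightmost one."""
    left = min(a.offset, b.offset)
    right = max(a.offset + a.length, b.offset + b.length)
    return right - left


if __name__ == "__main__":
    name = "HWI-ST568_0055:8:1106:17676:67081#GCCAAT"
    seq1 = "ATTGCTGAAGAGCTTGGCGGCGAATGGGCTGACCGCTTCCTCGTGCTTTA"
    seq2 = "TCAGGGCCAAGAACAGATGGAACAGCTGAATATGGGCCAAACAGGATATC"
    qual = "H" * 50  # placeholder qualities; only the sequence length matters here
    mate1 = parse_hit(f"{name}/1 + seq_id 5188 {seq1} {qual} 0")
    with_best = parse_hit(f"{name}/2 - seq_id 5458 {seq2} {qual} 0 40:G>A,43:T>C,46:A>G")
    without_best = parse_hit(f"{name}/2 - seq_id 5533 {seq2} {qual} 0")
    for label, mate2 in [("--best", with_best), ("no --best", without_best)]:
        print(label, "mismatches:", mate2.mismatches,
              "span:", fragment_span(mate1, mate2))
```

With 50 bp mates, the spans come out at 320 and 395, both inside the --minins 100 / --maxins 450 window, so the insert-size limits alone do not seem to explain why the three-mismatch placement was chosen.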
Old 07-19-2011, 08:34 AM   #440
medalofhonour
Member
 
Location: Brighton

Join Date: Jul 2011
Posts: 16
Default Using "Eland" input format in Bowtie

Like Bowtie !

Last edited by medalofhonour; 07-19-2011 at 09:17 AM.
All times are GMT -8. The time now is 02:50 PM.