
**tniranj1** · 03-12-2009, 05:28 PM

You're probably right... I'm using far too old an operating system. Hopefully I will have results-oriented questions for you in the future, rather than technical installation stuff. ;-)
Thanks again,
-TiN

**SillyPoint** · 03-13-2009, 07:45 AM

Ben, your answer to What Da Seq's question a couple of days ago about ambiguity characters was:

A result of Bowtie's indexing strategy is that alignments involving one or more ambiguous reference characters (N, -, R, Y, etc.) are considered invalid by Bowtie, regardless of the alignment policy. This is true only for ambiguous characters in the reference; alignments involving ambiguous characters in the read are legal, subject to the alignment policy.

Do I interpret "legal, subject to the alignment policy" to mean they are accepted, but counted as mismatches subject to the -n limit?

Thanks,

--SP

**Ben Langmead** · 03-13-2009, 07:53 AM

Yes, that's correct. An ambiguous character in the read is "charged" as a mismatch, which can affect whether the alignment is legal according to the alignment policy.
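To make the rule concrete, here is a minimal Python sketch of how such charging could work. It is an illustration under stated assumptions, not Bowtie's code: the helper name `charge_alignment` is hypothetical, the ambiguous set is assumed to be the IUPAC codes plus `-`, and it applies one mismatch limit across the whole read rather than Bowtie's seed-based `-n` policy.

```python
# Assumed set of ambiguous codes: IUPAC non-ACGT letters plus the gap character.
AMBIGUOUS = set("NRYKMSWBDHV-")


def charge_alignment(read, ref_window, max_mismatches):
    """Return the mismatch count, or None if the alignment is not allowed.

    Hypothetical helper; simplified to a whole-read mismatch limit.
    """
    if len(read) != len(ref_window):
        raise ValueError("read and reference window must have equal length")
    read, ref_window = read.upper(), ref_window.upper()
    # Any ambiguous character in the reference makes the alignment invalid,
    # regardless of the mismatch budget.
    if any(c in AMBIGUOUS for c in ref_window):
        return None
    # An ambiguous read character never equals a concrete reference base,
    # so it is simply counted ("charged") as a mismatch.
    mismatches = sum(1 for r, g in zip(read, ref_window) if r != g)
    return mismatches if mismatches <= max_mismatches else None
```

For example, `charge_alignment("ACNT", "ACGT", 1)` accepts the alignment with one mismatch, while the same call with a limit of 0 rejects it.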

**ieuanclay** · 04-06-2009, 10:42 AM

Hi Ben,

Thanks for your advice earlier, post-processing of results is now working well!

However, I have found that using --all isn't a problem until I specify --nostrata as well, which causes me to rapidly run out of memory (>32 GB) and get the std::bad_alloc error that I think a few people mentioned earlier. I had built indices with smaller -o, but even a high-offrate, small-footprint index has the same problem. Any suggestions (other than not using --nostrata...)?

Thanks again,

Ieuan

**Ben Langmead** · 04-06-2009, 11:59 AM

Hi Ieuan,

Can you tell me which Bowtie version/index/reads/arguments you're using? Also, could you give the same experiment a try with version 0.9.9.2 (just released)? I'll take a look.

Thanks,
Ben

**ieuanclay** · 04-07-2009, 01:48 AM

Hi Ben,

I have seen this in 0.9.9 and 0.9.9.1 (x64), with M. musculus NCBI36 and NCBI37 indices (offrates 2, 3, 4, 5). Args were -q --solexa-quals -a --unfq ... -p 2/3/6. Input was only ~1/2 MB of FASTQ reads! Worrying, because the real input will be >2 GB! In all cases the combination of -a and --nostrata seemed to be causing the problem, because with only -a the footprint was as expected.

I will try 0.9.9.2 today and get back to you. I checked yesterday and noticed you had improved the --best behaviour (thanks!), so I'll try it with that too.

Thanks again,

Ieuan

## update ##
0.9.9.2 does not have the same problem, and has roughly the same footprint for both -a and -a --nostrata. Any idea what the change was? Either way I am happy!

**thondeboer** · 04-08-2009, 09:30 AM

Hi Ben,

Complete Genomics here...
Have you tried using our gapped read structure with Bowtie yet? As you may know, we have quite an unusual read structure, so most mapping software cannot use it effectively and we have built our own. Our customers would probably want to use other mapping software as well, if only to compare our mapping with theirs...

The data is available in the SRA under number SRA008092

ftp://ftp.ncbi.nlm.nih.gov/sra/Submi...008/SRA008092/

You can also get a sample data set which is part of the API we have released.


http://www.completegenomics.com/developer/default.aspx

We are considering switching to SAM/BAM as the export format for our mapping data. Are you considering supporting SAM/BAM as an output format as well?

Thanks!

Thon

**Ben Langmead** · 04-08-2009, 01:26 PM

Hey Thon,

We haven't tried implementing gapped alignment yet, though tools like BWA and SOAP2 show it's doable in this framework. Can you describe the "unusual read structure"?

Yes, we would certainly like to support SAM/BAM output eventually. It's on the TODO list!

Thanks,
Ben

**thondeboer** · 04-08-2009, 02:30 PM

Hi Ben,

You can read more on our read structure on our website and on this forum as well:

Question and Confuse about Complete Genomics - SEQanswers

http://seqanswers.com/forums/showthread.php?t=1307



http://www.completegenomics.com/pages/materials/CompleteGenomicsTechnologyPaper.pdf

But basically we have a gapped read structure of 5 + 10 + 10 + 10 (times two) bases.
The first gap is "negative", that is, the 5-base and 10-base reads overlap.
The other gaps are positive, that is, gaps in the more classical sense.

You won't know the value of the negative gap (an overlap of 1 to 3 bases) unless you map the data onto the reference genome, or unless there is only one way for the two reads to overlap.
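The "only one way to overlap" case can be sketched in a few lines of Python: for each candidate overlap k in 1..3, check whether the last k bases of the 5-base read equal the first k bases of the 10-base read. This is a hypothetical illustration that assumes error-free bases; the function names are made up, and it does not describe Complete Genomics' own mapper.

```python
def consistent_overlaps(seg_a, seg_b, candidates=(1, 2, 3)):
    """Overlap lengths k for which seg_a's last k bases equal seg_b's first k.

    Hypothetical helper; assumes no sequencing errors in the overlap.
    """
    return [
        k
        for k in candidates
        if k <= min(len(seg_a), len(seg_b)) and seg_a[-k:] == seg_b[:k]
    ]


def merge_with_overlap(seg_a, seg_b, k):
    """Join two segments that share k bases, keeping the shared bases once."""
    if seg_a[-k:] != seg_b[:k]:
        raise ValueError(f"segments do not overlap by {k} bases")
    return seg_a + seg_b[k:]


# Exactly one candidate fits, so the gap is resolved without mapping.
print(consistent_overlaps("ACGTA", "TACCGGTTAA"))  # [2]
# Low-complexity sequence: every candidate fits, so mapping is needed.
print(consistent_overlaps("AAAAA", "AAACCCGGGT"))  # [1, 2, 3]
```

When more than one k survives, the reads alone cannot pin down the gap, which matches Thon's point that mapping to the reference is usually needed.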

Good to hear you are in support of SAM/BAM. We are considering this as our export format as well...

Thon
Complete Genomics

**Ben Langmead** · 04-09-2009, 02:16 PM

Hi Ieuan,

Originally posted by ieuanclay:

0.9.9.2 does not have the same problem, and has roughly the same footprint for both -a and -a --nostrata. Any idea what the change was? Either way I am happy!

--best mode got an overhaul in 0.9.9.2 such that --best now conducts a best-first search, rather than a depth-first search with buffering and flushing of results, as before. My suspicion is that the old approach was, for some reads, buffering a huge number of results and exhausting memory. I'll take a harder look, though.
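As a toy illustration of the difference Ben describes, the sketch below runs a best-first search with a priority queue keyed on mismatch count, so hits come out cheapest first and a caller can stop after the first stratum instead of buffering every result. It is not Bowtie's implementation: the function name is hypothetical, and a plain set of reference substrings stands in for Bowtie's BWT-based index.

```python
import heapq


def best_first_hits(read, reference, max_mm):
    """Yield (mismatches, matched_reference_string) in nondecreasing mismatch order.

    Toy model: partial alignments are extended one base at a time and kept
    in a min-heap ordered by mismatches so far (a best-first search).
    """
    length = len(read)
    # Stand-in for an index: every reference substring up to the read length.
    substrings = {
        reference[i:j]
        for i in range(len(reference))
        for j in range(i + 1, min(i + length, len(reference)) + 1)
    }
    heap = [(0, "")]  # (mismatches so far, partial reference string)
    while heap:
        mm, partial = heapq.heappop(heap)
        if len(partial) == length:
            # Extensions never reduce cost, so no cheaper hit can appear later.
            yield mm, partial
            continue
        for base in "ACGT":
            # A read 'N' never equals a concrete base, so it costs a mismatch.
            cost = mm + (base != read[len(partial)])
            candidate = partial + base
            if cost <= max_mm and candidate in substrings:
                heapq.heappush(heap, (cost, candidate))


print(list(best_first_hits("ACGA", "ACGTACGATTGA", 2)))
# [(0, 'ACGA'), (1, 'ACGT'), (2, 'TTGA')]
```

Because the generator is lazy, taking only the first few results never materializes the full set of hits, which is the memory advantage over collecting everything and sorting afterwards.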

Thanks,
Ben

**dara** · 04-30-2009, 07:44 AM

BOWTIE_BUILD: Problems when used with large reference genomes?

Hi all,

I've been trying to run bowtie using the human_genomic.fa file from the BLAST db as a reference. When I use bowtie-build to turn this large file into an index, I keep getting an 'Error: could not open human_genomic.fa' message.
I tried creating a file with just the first 10000 lines of the human genome and that works fine. I thought bowtie could easily handle such big reference files. Has anyone else faced this issue? Any suggestions on how to overcome it?

Here's what I did: ./bowtie-build -f human_genomic.fa human_genom

thanks

**Ben Langmead** · 05-01-2009, 06:41 AM

Hi dara,

How large is the human_genomic.fa file? Are you using 32-bit or 64-bit bowtie-build? I've not seen this before. Most versions of Linux and glibc can handle very large files with no problem.

I suspect that once you fix this problem, you'll run into the problem that Bowtie can only index reference sequences in chunks of about 3.6 Gbases or so. When you try to feed bowtie-build an input with too much sequence, it will say "Error: Reference sequence has more than 2^32-1 characters! Please divide the reference into batches or chunks of about 3.6 billion characters or less each and index each independently." This is because Bowtie uses 32-bit ints internally to refer to offsets in the index. We may fix this some day, but until then you'll have to work around this by indexing your reference in chunks.
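For scale: 2^32 - 1 = 4,294,967,295 characters, and Ben suggests keeping each batch to about 3.6 billion. An 8.3 GB FASTA therefore likely needs three batches (8.3 / 3.6 ≈ 2.3), though the file size also counts headers and newlines, so this is only an estimate. The Python sketch below groups FASTA records greedily into batches under such a limit. The function names and the 3.6e9 default are assumptions for illustration, and it counts every sequence character; whether bowtie-build counts Ns toward the limit is not stated here.

```python
def fasta_lengths(lines):
    """Yield (header, sequence_length) for each record in FASTA-formatted lines."""
    name, length = None, 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            name, length = line[1:].strip(), 0
        else:
            length += len(line)
    if name is not None:
        yield name, length


def batch_by_length(records, limit=3_600_000_000):
    """Greedily group (name, length) records into batches of <= limit characters.

    Hypothetical helper; a single record longer than the limit is an error.
    """
    batches, current, total = [], [], 0
    for name, length in records:
        if length > limit:
            raise ValueError(f"{name} alone exceeds the limit of {limit}")
        if current and total + length > limit:
            batches.append(current)
            current, total = [], 0
        current.append(name)
        total += length
    if current:
        batches.append(current)
    return batches
```

Usage would look like `with open("human_genomic.fa") as fh: batches = batch_by_length(fasta_lengths(fh))`; each batch of records would then be written to its own FASTA file and indexed separately with bowtie-build.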

Ben

**dara** · 05-01-2009, 07:11 AM

Hi Ben,

Thank you for your response. The file is a human genome download from BLAST; it's about 8.3 GB in size, and I was using the default 32-bit version of bowtie-build. All right, I will try what you suggested: I'll split the genome (by chromosome, maybe) and then feed those pieces to bowtie-build.

I will let you know if that causes any issues.

Thanks

**dara** · 05-07-2009, 06:05 AM

Hi Ben,

Once the reference file has been split into chunks, do they have to be made into separate indexes? So, for example, if I've split the reference into chrom1, chrom2 and chrom3, would I need to do:

./bowtie-build -f chrom1 indexchrom1
./bowtie-build -f chrom2 indexchrom2
./bowtie-build -f chrom3 indexchrom3

If I build separate indexes, how would I call all of them when mapping with my reads file?

Thanks for your help
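One way to drive the per-chunk workflow dara describes is a small Python script that builds one index per chunk and runs bowtie against each index in turn. This is a hypothetical sketch: the function name, the `_idx` basename suffix, and the `.hits` output names are made up, and it only uses the `bowtie-build -f <fasta> <basename>` form from this thread plus bowtie's positional `<ebwt> <reads> <hits>` arguments with `-q` for FASTQ input. Hits from separate indexes still have to be merged per read afterwards, and any "best hit" decision Bowtie makes applies within one index only, not genome-wide.

```python
import os
import subprocess


def plan_per_chunk_runs(chunk_fastas, reads_fastq):
    """Return (build_cmd, align_cmd) pairs, one pair per reference chunk.

    Hypothetical naming: chunk 'chrom1.fa' -> index basename 'chrom1_idx'.
    """
    plans = []
    for fasta in chunk_fastas:
        stem = os.path.splitext(os.path.basename(fasta))[0]
        basename = f"{stem}_idx"
        build_cmd = ["bowtie-build", "-f", fasta, basename]
        align_cmd = ["bowtie", "-q", basename, reads_fastq, f"{basename}.hits"]
        plans.append((build_cmd, align_cmd))
    return plans


def run_plans(plans):
    """Execute each command in order, stopping on the first failure."""
    for build_cmd, align_cmd in plans:
        subprocess.run(build_cmd, check=True)
        subprocess.run(align_cmd, check=True)
```

Calling `plan_per_chunk_runs` on its own is a dry run that only prints nothing and returns the command lists, which makes it easy to inspect them before handing them to `run_plans`.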

**dara** · 05-07-2009, 06:24 AM

Also another question for you:

Any updates on plans for bowtie supporting gapped alignment?

thanks
