**nickloman** · 11-21-2009, 04:49 AM

My hunch, if you are running 32-bit Linux, is that you are hitting the 2GB file size limit. You could test whether this is the case by trying a smaller test file to see if Bowtie opens it, e.g. head -10000 mybigfile > mysmallfile. See this page for more detail (http://linuxmafia.com/faq/VALinux-kb...ize-limit.html). You might want to consider recompiling Bowtie with large file support; see http://www.suse.de/~aj/linux_lfs.html.
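A minimal sketch of that check in Python, assuming the limit in question is the classic signed 32-bit offset limit (2^31 - 1 bytes). The file names are the placeholders from the post, and the helper names are hypothetical:

```python
import os
from itertools import islice

# Largest offset a signed 32-bit file offset can address (2 GiB minus 1 byte).
MAX_32BIT_OFFSET = 2**31 - 1


def exceeds_32bit_limit(size_in_bytes):
    """Return True if a file of this size needs large file support to be opened."""
    return size_in_bytes > MAX_32BIT_OFFSET


def write_head(src, dst, n_lines=10000):
    """Copy the first n_lines lines of src to dst, like `head -10000 src > dst`."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.writelines(islice(fin, n_lines))


if __name__ == "__main__":
    big = "mybigfile"  # placeholder name from the post
    if os.path.exists(big):
        print(big, "exceeds 2 GiB:", exceeds_32bit_limit(os.path.getsize(big)))
        write_head(big, "mysmallfile")
```

If the small file opens and the large one does not, the size limit becomes the most likely explanation; if neither opens, the problem is probably the path or the permissions instead.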

**CarlElit** · 11-21-2009, 05:28 AM

Thanks very much for your reply.

But I am using 64-bit Fedora 11. Does it have the same limit?

**HTS** · 11-21-2009, 08:08 AM

64-bit Linux shouldn't have the 2GB file limit, so I am not sure what causes your problem, but pre-built indexes of hg18 and hg19 (with headers chr1, chr2 etc) can be readily downloaded from the Bowtie main page, which seems to be what you want anyway. BTW, when indexing the human reference genome on my 64-bit Linux machine with bowtie 0.11.3, the memory footprint is ~5GB. Hope this helps somewhat,

-- Leo

**CarlElit** · 11-21-2009, 09:07 AM

Thanks, dear Leo.

This is the first time I have tracked a memory footprint; could you be so kind as to tell me whether my approach is sound? When I run "./bowtie ...", I get its process id from "top", say 1000. Then I use "cat /proc/1000/status" to print the memory footprint. I reported 6.7G, which is actually the sum of all the peak values. If we consider "VmHWM", it is maybe ~4G. I am still not very clear about what "VmHWM" is...
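For reference, the kernel spells the field VmHWM ("high water mark"): it is the peak resident set size, i.e. the most physical RAM the process has held at once, while VmPeak is the peak virtual memory size. They measure different things, so adding them together over-counts. A minimal sketch of reading those fields in Python, assuming a Linux /proc filesystem; the function names are hypothetical:

```python
import os


def parse_vm_fields(status_text):
    """Return a dict of the Vm* fields (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key.startswith("Vm"):
            parts = value.split()
            # Lines look like "VmHWM:   4194304 kB"
            if len(parts) == 2 and parts[1] == "kB":
                fields[key] = int(parts[0])
    return fields


def read_vm_fields(pid):
    """Read and parse /proc/<pid>/status for a running process (Linux only)."""
    with open("/proc/%d/status" % pid) as fh:
        return parse_vm_fields(fh.read())


if __name__ == "__main__":
    if os.path.exists("/proc/self/status"):
        vm = read_vm_fields(os.getpid())
        print("peak resident memory (VmHWM, kB):", vm.get("VmHWM"))
```

The values must be read while the process is still running, since the /proc entry disappears when it exits.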

And one more question, hopefully not too stupid: what is "hg18"? Is it "human genome 18"? And what is the difference between "hg18" and "hg19"? There are also NCBI 36.3, mm9... What do they stand for?

Thank you again.

**HTS** · 11-21-2009, 09:35 AM

You are welcome! I simply used top to take a rough look at the memory used by bowtie-build, but you could use more sophisticated tools such as memtime or tstime if you want. If memory is really an issue, you could use the -p option to trade speed for memory, but I would recommend adding more RAM if possible.

hg18, hg19 and mm9 are genome build names given by UCSC, while NCBI 36.3, NCBI 37.1 etc. are names used by NCBI. For example, hg18 has exactly the same sequences as NCBI 36.3, and hg19 has exactly the same sequences as NCBI 37.1; only the header information is slightly different. UCSC uses headers such as chr1 and chr2 to be compatible with the UCSC genome browser, while NCBI uses more obscure headers such as ">gi|224589800|ref|NC_000001.10| Homo sapiens chromosome 1, GRCh37 primary reference assembly".
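To illustrate the header difference, here is a rough Python sketch that maps an NCBI-style header to a UCSC-style name. It assumes the header contains the words "chromosome N" as in the example above; the function name is hypothetical, and anything without such a label (unplaced contigs, the mitochondrial sequence) is simply left unmapped:

```python
import re

# Matches e.g. "... Homo sapiens chromosome 1, GRCh37 primary reference assembly"
_CHROM_RE = re.compile(r"chromosome (\d+|X|Y)\b")


def ucsc_name(ncbi_header):
    """Map an NCBI-style FASTA header to a UCSC-style name such as 'chr1'.

    Returns None when no chromosome label is found.
    """
    match = _CHROM_RE.search(ncbi_header)
    return "chr" + match.group(1) if match else None


if __name__ == "__main__":
    header = (">gi|224589800|ref|NC_000001.10| Homo sapiens chromosome 1, "
              "GRCh37 primary reference assembly")
    print(ucsc_name(header))
```

Returning None instead of guessing keeps unexpected headers visible, so they can be handled by hand.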

**CarlElit** · 11-21-2009, 04:57 PM

Wow, I learned a lot.

Thanks, Leo.

**lh3** · 11-21-2009, 06:20 PM

A comment. NCBI/UCSC/Ensembl do not use exactly the same human reference sequence. This is true at least for build 36. UCSC concatenates unassembled contigs into chrN_random. Ensembl masks out the pseudoautosomal regions on Y. I recommend the Ensembl reference sequence. If you use the NCBI/UCSC genome, you can map essentially no reads uniquely to the pseudoautosomal regions, because the same sequence is present on both X and Y.

**HTS** · 11-21-2009, 08:25 PM

Thanks for your clarification, Heng! I checked hg19 and NCBI 37.1 the other day and they are exactly the same, i.e., neither includes chrN_random now. Personally I stick with UCSC for convenience most of the time, but this may not be enough, depending on what you do (as pointed out by Heng). CarlElit, as you can see, there are quite a few subtleties in bioinformatics and you just have to learn to cope with them along the way (and hopefully contribute to making things better as well).
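One way such a comparison could be done (not necessarily how it was done here) is to checksum each sequence while ignoring the headers. A minimal Python sketch, assuming plain FASTA input and that letter case should be ignored, since some distributions mark repeats in lowercase; the function names and file names are hypothetical:

```python
import hashlib


def sequence_digests(fasta_lines):
    """Return {header: md5 of the uppercased sequence} for an iterable of FASTA lines."""
    digests = {}
    name, md5 = None, None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                digests[name] = md5.hexdigest()
            name, md5 = line[1:], hashlib.md5()
        elif name is not None and line:
            # Feed the sequence in chunks so whole chromosomes never sit in memory.
            md5.update(line.upper().encode("ascii"))
    if name is not None:
        digests[name] = md5.hexdigest()
    return digests


def same_sequences(digests_a, digests_b):
    """True if both inputs hold the same collection of sequences, ignoring header names."""
    return sorted(digests_a.values()) == sorted(digests_b.values())
```

With two reference files, say hg19.fa and ncbi37.fa, opening both and calling same_sequences(sequence_digests(a), sequence_digests(b)) reports whether they differ in anything other than header text and letter case.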

Bowtie "Error: could not open" fa file
