**Ben Langmead** · 06-27-2010, 09:59 AM

It's hard to know if there is a problem unless we know (a) how many reads you have and (b) how many lines of output bowtie generates before your disk quota is exceeded. It seems likely that Bowtie's doing what it should do.

Ben

**jjw14** · 06-27-2010, 10:27 AM

Thanks for the reply. As to the information needed:

a) The current file I am working with has 29751337 Illumina reads, converted to Sanger format.

b) I can't seem to view the SAM output file in its entirety (e.g. in a text editor). The first few lines that do come up are pasted below.

I'm sure Bowtie is doing what I'm asking it to do. Being new to this type of analysis, I'm sure the glitch is on my end, not the software's. I'm very impressed with the alignment program you have produced.

Much appreciated!
JJW

HWUSI-EAS558R_0001:2:1:1054:8831#0/1 16 15 64381792 255 42M * 0 0 CCCTAACATGGGAAAGGGATCACTCACTCAAATCCATGAAGC 5EEEBEBECBED>@41:ECEE$
HWUSI-EAS558R_0001:2:1:1054:8831#0/1 16 7 78866817 255 42M * 0 0 CCCTAACATGGGAAAGGGATCACTCACTCAAATCCATGAAGC 5EEEBEBECBED>@41:ECEE$
HWUSI-EAS558R_0001:2:1:1054:8831#0/1 16 8 36985051 255 42M * 0 0 CCCTAACATGGGAAAGGGATCACTCACTCAAATCCATGAAGC 5EEEBEBECBED>@41:ECEE$
HWUSI-EAS558R_0001:2:1:1054:8831#0/1 16 7 78903247 255 42M * 0 0 CCCTAACATGGGAAAGGGATCACTCACTCAAATCCATGAAGC 5EEEBEBECBED>@41:ECEE$
HWUSI-EAS558R_0001:2:1:1054:8831#0/1 16 7 78837623 255 42M * 0 0 CCCTAACATGGGAAAGGGATCACTCACTCAAATCCATGAAGC 5EEEBEBECBED>@41:ECEE$
HWUSI-EAS558R_0001:2:1:1054:8831#0/1 16 5 22131022 255 42M * 0 0 CCCTAACATGGGAAAGGGATCACTCACTCAAATCCATGAAGC 5EEEBEBECBED>@41:ECEE$
HWUSI-EAS558R_0001:2:1:1054:8831#0/1 16 9 9081785 255 42M * 0 0 CCCTAACATGGGAAAGGGATCACTCACTCAAATCCATGAAGC 5EEEBEBECBED>@41:ECEECDDECDEE$
HWUSI-EAS558R_0001:2:1:1054:11106#0/1 4 * 0 0 * * 0 0 TCATATTGCTTTTTGAACTTGATGAACTGTCTGATAGTTTAT B=B=B-AACCDDDDD=DDD?CCCC?=DDD$
HWUSI-EAS558R_0001:2:1:1054:4407#0/1 4 * 0 0 * * 0 0 CAGAGTGTCTATGTGAAGCCGTATGTCTTGAAGAGAAGCTTT D?BD@@6?@@DD?DD-BD??CDDDDD=:=$
HWUSI-EAS558R_0001:2:1:1054:5597#0/1 16 15 120471582 255 42M * 0 0 AAAAAGAGAAATTTGATTATAGTATATTCATTCATTCAAGAA AEDDGFGDDCFFD

FGGFAG$
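A file this large does not need to be opened in an editor to be inspected. The following is a minimal sketch, assuming plain-text SAM, of streaming the file one line at a time and counting how many alignment lines each read produced. The function name `count_alignments` and the file name in the usage note are hypothetical.

```python
from collections import Counter
from typing import Iterable

UNMAPPED_FLAG = 0x4  # SAM FLAG bit that is set when a read has no alignment


def count_alignments(sam_lines: Iterable[str]) -> Counter:
    """Count reported alignments per read name, one line at a time."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        # SAM is tab-separated; splitting on any whitespace also handles
        # lines pasted from a forum, since read names contain no spaces.
        fields = line.split()
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED_FLAG:
            counts[name] += 0  # record the read, but with zero alignments
        else:
            counts[name] += 1
    return counts
```

Because a file object yields lines lazily, something like `with open("out.sam") as handle: counts = count_alignments(handle)` never holds the whole file in memory, only the per-read counts. With roughly 30 million reads that table is itself large, so running it on the first few million lines may be enough to see the pattern.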

**lh3** · 06-27-2010, 11:02 AM

Because you use "-a", which makes Bowtie report every valid alignment for each read.
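A rough back-of-the-envelope check makes the effect visible. This is only a sketch: the read count comes from the thread, the 50 GB figure from the thread title, and the bytes-per-line value is an assumption chosen to approximate one SAM record for a 42 bp read.

```python
READS = 29_751_337            # read count reported in this thread
OUTPUT_BYTES = 50 * 10**9     # the ~50 GB from the thread title (decimal GB)
ASSUMED_BYTES_PER_LINE = 175  # assumption: rough size of one 42 bp SAM record


def average_lines_per_read(total_bytes: int, bytes_per_line: int, reads: int) -> float:
    """Estimate how many SAM lines were written per input read."""
    total_lines = total_bytes / bytes_per_line
    return total_lines / reads


estimate = average_lines_per_read(OUTPUT_BYTES, ASSUMED_BYTES_PER_LINE, READS)
print(f"about {estimate:.1f} SAM lines per read")
```

Under these assumptions the output already averages several lines per read before the quota is hit, which is consistent with a single read appearing many times in the pasted output.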

**jjw14** · 06-27-2010, 11:06 AM

Thanks,

I'll drop the -a option. Out of curiosity, don't I want all valid alignments?

JJW

**DrD2009** · 06-27-2010, 11:38 AM

Did you remove the adapters from your Solexa reads?

I know from personal experience that if you trim the adapters off and don't set a minimum length, say 17 bp, you can end up with sequences that are only a few base pairs long. Needless to say, those sequences will be found thousands of times in the reference genome you are using and will create a HUGE SAM file, if you don't run out of memory first.
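A minimum-length filter of this kind can be sketched in a few lines. This assumes standard four-line FASTQ records; the function names and the default of 17 (taken from the figure mentioned above) are illustrative, not part of any particular trimming tool.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (name, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    # zip over the same iterator four times consumes one record per step;
    # an incomplete trailing record is silently dropped.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.strip()[1:], seq.strip(), qual.strip()


def drop_short_reads(records: Iterable[FastqRecord], min_len: int = 17) -> Iterator[FastqRecord]:
    """Keep only reads whose sequence is at least min_len bases long."""
    for name, seq, qual in records:
        if len(seq) >= min_len:
            yield name, seq, qual
```

Both functions are generators, so chaining them as `drop_short_reads(read_fastq(handle))` processes a large file record by record.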

If you want all valid alignments, use '-a', but you could also use '-k x' (example: -k 40, which would report up to 40 valid alignments per read).
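To illustrate what a per-read cap does to the output, here is a small sketch that keeps at most `k` lines per read from SAM text that has already been written. It is not how Bowtie implements '-k' (Bowtie applies the limit while aligning, so the extra lines never reach the disk); it only mimics the effect on the output. It assumes all lines for a read are adjacent, as in the output pasted earlier in this thread, and the name `cap_alignments` is hypothetical.

```python
from itertools import groupby, islice
from typing import Iterable, Iterator


def cap_alignments(sam_lines: Iterable[str], k: int) -> Iterator[str]:
    """Yield at most k lines per read name; header and blank lines are dropped."""
    body = (
        line for line in sam_lines
        if line.strip() and not line.startswith("@")
    )
    # groupby only merges *adjacent* lines with the same read name.
    for _name, group in groupby(body, key=lambda line: line.split(None, 1)[0]):
        yield from islice(group, k)
```

With `k=1` every read contributes a single line, so the line count can never exceed the number of reads.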

Welcome to the wonderful world of Solexa sequencing data.

Hope this might help you.

**jjw14** · 06-27-2010, 12:47 PM

Thanks so much for the great advice! I assumed (always a bad idea) that the reads sent back to me from our DNA core had trimmed the adapters. I will verify this before proceeding.

JJW

**DrD2009** · 06-27-2010, 01:54 PM

If it created a file that large, I would assume they are already trimmed. If they weren't and still had the adapters on them, you would get around a 1.00% alignment to your reference genome and the alignment would complete in a few minutes or less. I've done this before, haha. '-t' is a great parameter.
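The alignment percentage mentioned here can also be recovered from the SAM output itself. A minimal sketch, assuming plain-text SAM in which unaligned reads carry the 0x4 flag (as in the lines with flag 4 pasted earlier in this thread); the function name is hypothetical.

```python
from typing import Iterable

UNMAPPED_FLAG = 0x4  # SAM FLAG bit that is set when a read has no alignment


def alignment_rate(sam_lines: Iterable[str]) -> float:
    """Fraction of distinct reads with at least one reported alignment."""
    seen = set()
    aligned = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.split()
        name, flag = fields[0], int(fields[1])
        seen.add(name)
        if not flag & UNMAPPED_FLAG:
            aligned.add(name)
    return len(aligned) / len(seen) if seen else 0.0
```

Reads are counted by name rather than by line, so a read reported many times under '-a' still counts once. Holding every read name in memory is costly for tens of millions of reads, so a sample of the file may be more practical.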

**jjw14** · 06-27-2010, 02:11 PM

Actually, I'm glad to hear you say that. I think your earlier suggestion to use the -k option to limit the number of valid alignments returned sounds like a great way to go. I'll give it a try and post the outcome.

Thanks for helping out a greenhorn!

**DrD2009** · 06-27-2010, 02:30 PM

Thanks, I'm just glad I'm at a point I can actually start helping others with problems I've experienced.

Us greenhorns have to stick together.

SAM output from Bowtie >50GB?