**xuying** · 01-12-2010, 03:09 PM

Hi Ben:
I will put the csfastq (maybe only part of it, because it's huge) somewhere later.
And I am using bowtie 0.12.1 (but the color index was built using 0.12-beta).
There are millions of lines in the SAM and pileup files, so a fixed "48M" in the SAM file and a fixed "A" in the pileup file seem unreasonable. (Please wait for me to send you the csfastq files.) Thanks a lot! :-)

**Ben Langmead** · 01-12-2010, 04:14 PM

Originally posted by xuying

There are millions of lines in the SAM and pileup files, so a fixed "48M" in the SAM file and a fixed "A" in the pileup file seem unreasonable. (Please wait for me to send you the csfastq files.) Thanks a lot! :-)

Why? Given that M = "match or mismatch", when would you expect something other than 48M?

Ben
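To illustrate the point about `M`, here is a minimal Python sketch (not part of bowtie; the helper names `cigar_ops` and `read_bases_consumed` are made up for this example). Because `M` covers both matches and mismatches, an ungapped alignment of a 48-base read is `48M` however many mismatches it contains. Only a gapped alignment would produce something like `20M2I26M`.

```python
import re

def cigar_ops(cigar):
    """Split a SAM CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

def read_bases_consumed(cigar):
    """Count read bases covered by the CIGAR.

    Per the SAM specification, M, I, S, = and X consume read bases.
    """
    return sum(n for n, op in cigar_ops(cigar) if op in "MIS=X")

# An ungapped 48 bp alignment: one M run, regardless of mismatches.
print(cigar_ops("48M"), read_bases_consumed("48M"))
# A hypothetical gapped alignment of the same read length.
print(cigar_ops("20M2I26M"), read_bases_consumed("20M2I26M"))
```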

**xuying** · 01-12-2010, 09:12 PM

Oh, yes, sorry. I just confused the file with CIGAR notation.

**xuying** · 01-13-2010, 07:04 AM

Hi Ben:
It seems I can't find a suitable place to put my csfastq file.
Here I just show some lines from the csfastq file generated by bfast's "solid2fastq" program. Do you think it is OK to use as is? Should I remove the leading primer letter and the 1st color to get a true base there?

@2292_469_84
T210002310010221002200330303002200201120221.2111.2.
+
8<;==:=@?=<<>>>;;??<=<;96:?:5<>;85:=7,,:5/",(/)"*"
@2292_469_216
T000111101020011320222113222200220200120202.2222.2.
+
/6=>=::>>=;==>;;6=;;9<6:8<(3:-<;/9:852=-7/"2(6)")"
@2292_469_274
T300101122322222232222222210222222222022220.2222.2.
+
,=#$$#@%#'#>$,&(;$*$*=)*'&6%,%##*,+#,4),#)",5'#","
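For anyone unsure how the primer letter and the first color relate to the first true base, here is a rough Python sketch of the standard SOLiD di-base color code. It is only an illustration: `decode_colors` is a made-up helper, it stops at the first `.` (missing call), and it says nothing about what bowtie itself expects as input.

```python
# Each color maps the previous base to the next base (SOLiD di-base code).
COLOR_TO_NEXT = {
    "0": {"A": "A", "C": "C", "G": "G", "T": "T"},
    "1": {"A": "C", "C": "A", "G": "T", "T": "G"},
    "2": {"A": "G", "G": "A", "C": "T", "T": "C"},
    "3": {"A": "T", "T": "A", "C": "G", "G": "C"},
}

def decode_colors(csread):
    """Decode a color-space read such as 'T2100' into bases.

    The first character is the primer base; it is not part of the
    sequenced fragment, but it is needed to decode the first color.
    """
    base = csread[0]
    decoded = []
    for color in csread[1:]:
        if color not in COLOR_TO_NEXT:
            break  # '.' or any other symbol: later bases cannot be decoded
        base = COLOR_TO_NEXT[color][base]
        decoded.append(base)
    return "".join(decoded)

# First nine characters of read 2292_469_84 above.
print(decode_colors("T21000231"))
```

So the primer `T` followed by color `2` gives `C` as the first true base of that read.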

**acnoll** · 01-18-2010, 02:18 PM

Option for output of pairs where only one end aligns

With bowtie's current set of options, is it possible to have pairs with only one end mapping to the genome be included in the alignment file (e.g. the SAM file)? I am interested in identifying intra-read short indels through the anchoring of one of a mate pair's ends.

**SillyPoint** · 01-20-2010, 11:25 AM

I'd just logged on here to post exactly the question acnoll poses above: "is it possible to have pairs with only one end mapping to the genome be included in the alignment file?"

The implication there, which after reading the manual and running Bowtie 0.12.1 I believe, is that only read pairs which both match, and fall within the -I/-X constraints, will be output. True?

The alternative for now is to specify the -a option to get all the mapped output, and post-process that to find what you're interested in, be that the best pair (for some definition of "best"), or reads where only one end matches.

To have the option to do that directly in Bowtie would be nice.

--TS
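A possible way to do that post-processing, sketched in Python. This assumes something not confirmed in this thread: that each mate file was aligned separately as unpaired reads with SAM output, so that every mate gets its own record. The function name `one_end_only` and the `/1`, `/2` read-name suffixes are also assumptions; the only thing taken from the SAM specification is that FLAG bit 0x4 marks an unmapped read.

```python
def one_end_only(sam_lines_mate1, sam_lines_mate2):
    """Return (names with only mate 1 mapped, names with only mate 2 mapped)."""

    def mapped_names(lines):
        names = set()
        for line in lines:
            if not line.strip() or line.startswith("@"):
                continue  # skip blank lines and SAM header lines
            fields = line.rstrip("\n").split("\t")
            name, flag = fields[0], int(fields[1])
            if flag & 0x4:
                continue  # read is unmapped
            # Drop a trailing /1 or /2 so both mates share one name.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            names.add(name)
        return names

    mate1 = mapped_names(sam_lines_mate1)
    mate2 = mapped_names(sam_lines_mate2)
    return mate1 - mate2, mate2 - mate1
```

Usage would be along the lines of `one_end_only(open("mate1.sam"), open("mate2.sam"))`, where both file names are placeholders.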

**bekkari** · 01-21-2010, 01:45 PM

Hi Ben,
Can someone please let me know whether bowtie works with longer inserts (~20kb) between mate pairs?

Thanks

**malcook** · 01-22-2010, 10:35 AM

bowtie: should I mask the pseudoautosomal segments of human genome

What do you think of my plan to mask the pseudoautosomal segments of the human Y chromosome prior to running bowtie on an RNASeq project?

Since the pseudoautosomal portions of human chromosomes X and Y are identical in sequence, any alignment strategy that uses only unique alignments will discard all alignments to these regions, because each aligning read will have two matches. Thus the 24 known genes won't be counted.

I plan to use EMBOSS' `maskseq` to "hard mask" (replace with 'N') chrY prior to building the bowtie indices at:

chrY:10001-2649520
chrY:59034050-59363566

Does anyone see a problem with this approach?

I see that the bowtie manual's entry for the `--ntoa` option explicitly states that "By default, Ns are simply excluded from the index and bowtie will not report alignments that overlap them." Does anyone know if the same is true for Xs?

Finally, do you agree that the ability to direct bowtie-build to ignore portions of <reference_in> would be a sensible feature to request?

Thanks for thinking!

Malcolm Cook
Stowers Institute for Medical Research
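For readers who want to see what the hard masking amounts to, here is a small Python sketch. It is not EMBOSS `maskseq`; `hard_mask` is a made-up helper that assumes 1-based, inclusive coordinates like the chrY ranges listed above and an already-loaded sequence string.

```python
def hard_mask(seq, regions):
    """Replace each 1-based inclusive (start, end) region of seq with 'N'."""
    chars = list(seq)
    for start, end in regions:
        lo = max(start - 1, 0)      # convert to a 0-based slice start
        hi = min(end, len(chars))   # clip regions that run past the end
        if lo < hi:
            chars[lo:hi] = "N" * (hi - lo)
    return "".join(chars)

# Toy example; for the real case the regions would be
# [(10001, 2649520), (59034050, 59363566)] applied to the chrY sequence.
print(hard_mask("ACGTACGTAC", [(2, 4), (9, 20)]))
```

The sequence length is unchanged, so coordinates of alignments to the unmasked part of chrY stay the same.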

**amaer** · 01-28-2010, 11:04 AM

Originally posted by Ben Langmead

Hi amaer,

Perhaps by end-of-year. It's very hard to say because most of my time goes to collaborators, and they don't have predictable schedules. But by end-of-year is a reasonable guess.

Thanks,
Ben

Hi Ben,

What's the status of doing gapped alignments? Do you have an estimated date?

thanks, and keep up the great work!

**Ben Langmead** · 01-28-2010, 11:42 AM

I'm working on this now. I don't have any time estimates.

Thanks,
Ben

**jlmlj** · 01-29-2010, 12:34 PM

Hi Dr. Langmead,

I am doing data analysis for ChIP-seq experiments on transcription factor binding sites. I have 5 million raw reads (76 bp read length) per sample from the Illumina platform. I used bowtie 0.11.3 to align these reads to the human reference genome.

The code I used for one high quality alignment was:
~/120809_ChiPseq/bowtie-0.11.3_linux_x86_64/bowtie --solexa1.3-quals -v 2 -a -m 1 -t -p 30 --un result_chipseq2/index2.hq.un --max result_chipseq2/index2.hq.max indexes_chipseq1/h_sapiens_asm reads/index2.fq > result_chipseq2/index2.hq.bt

The result is as below:
Reads uniquely aligned: ~45%,
Reads aligned to multiple locations: ~6%,
Reads that failed to align: ~49%.

Then I increased the allowed mismatches to 3 (-v 3) and trimmed the low-quality end (--trim3 22). However, ~45% of reads still failed to align.

There are two questions that bother me:
1. I have 76bp reads, but bowtie only allows me 3 mismatches at most, which I think is too stringent. Do you think bowtie will allow more mismatches (7-8) for 76bp or even longer 100bp reads?

2. ~45-49% of my reads failed to align (no repeats included) to the human reference genome, which I think is too high to accept. Do you have any idea why this happens?

Many thanks for your help,
jlmlj

**Xi Wang** · 01-29-2010, 10:30 PM

There are two questions that bother me:
1. I have 76bp reads, but bowtie only allows me 3 mismatches at most, which I think is too stringent. Do you think bowtie will allow more mismatches (7-8) for 76bp or even longer 100bp reads?

Bowtie has another set of parameters for handling mismatches when mapping reads back to the reference genome: -n, -e, and -l.
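As I understand the bowtie manual, in this mode -n limits the mismatches in the first -l bases of the read (the seed), while -e caps the sum of quality values at all mismatched positions, so a long read can carry more than 3 mismatches if they fall outside the seed at low-quality positions. A rough Python sketch of that rule follows. The function `passes_n_policy` is made up for illustration, the defaults 2, 28 and 70 are the documented defaults as I recall them, and the sketch ignores the quality rounding bowtie applies, so treat it as an approximation rather than bowtie's actual logic.

```python
def passes_n_policy(read, ref, quals, n=2, l=28, e=70):
    """Check an ungapped read/reference pairing against -n/-l/-e style limits.

    read, ref : equal-length strings, read 5' end first
    quals     : Phred quality value for each read position
    """
    mismatches = [i for i, (r, g) in enumerate(zip(read, ref)) if r != g]
    seed_mismatches = sum(1 for i in mismatches if i < l)
    quality_sum = sum(quals[i] for i in mismatches)
    return seed_mismatches <= n and quality_sum <= e

# A 76 bp read with 6 mismatches, all at the low-quality 3' end.
read = "A" * 76
ref = "A" * 70 + "C" * 6
print(passes_n_policy(read, ref, [10] * 76))
```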

2. ~45-49% of my reads failed to align (no repeats included) to the human reference genome, which I think is too high to accept. Do you have any idea why this happens?

The rate may be within the normal range. If the ChIP-seq reads come from repeat regions and you masked the repeat regions when mapping, those reads will fail to map. And there are still quite a few 'N's in the human reference genome.

**jlmlj** · 02-04-2010, 08:30 PM

"Maybe the rate is among the normal. If the ChIP-seq reads are from the repeat regions, and you masked the repeat regions when mapping, these reads will fail to map. And there are still quite a few 'N's in the human reference genome."

Thanks a lot for the reply, Xi. I checked the human genome that I used, it was un-masked version, and I did not mask repeat regions when mapping, so it may not be the case...

I am thinking of trying a couple of other parameters, such as --strata, but it looks a bit tricky and I am not sure how to handle it yet.

**Xi Wang** · 02-04-2010, 11:24 PM

Originally posted by jlmlj

Thanks a lot for the reply, Xi. I checked the human genome that I used, it was un-masked version, and I did not mask repeat regions when mapping, so it may not be the case...

I also meant the 'N's that exist in the human reference genome. Our group has observed many cases where lots of reads pile up next to 'N' regions.
Hope this helps.

**Chipper** · 02-05-2010, 12:51 AM

Originally posted by jlmlj

The result is as below:
Reads uniquely aligned: ~45%,
Reads aligned to multiple locations: ~6%,
Reads that failed to align: ~49%.

51% aligned is not too bad, but you could also try without the -v parameter to allow more mismatches at the 3' end.
