
  • Hi Ben:
    I will put the csfastq file (or maybe part of it) somewhere later, because it's huge.
    I am using bowtie 0.12.1 (but the color index was built using 0.12-beta).
    There are millions of lines in the SAM and pileup files, so a fixed "48M" in the SAM file and a fixed "A" in the pileup file seem unreasonable. (Please wait for me to send you the csfastq files.) Thanks a lot! :-)

    Comment


    • Originally posted by xuying View Post
      There are millions of lines in the SAM and pileup files, so a fixed "48M" in the SAM file and a fixed "A" in the pileup file seem unreasonable. (Please wait for me to send you the csfastq files.) Thanks a lot! :-)
      Why? Given that M = "match or mismatch", when would you expect something other than 48M?

      Ben

      Comment


      • Oh, yes, sorry. I just confused the file with CIGAR notation.

        Comment


        • Hi Ben:
          It seems I can't find a suitable place to put my csfastq file.
          Here I just show some lines from the csfastq file generated by the "solid2fastq" program of bfast. Do you think it is good to go? Should I remove the first primer letter and the 1st color to get a true base there?

          @2292_469_84
          T210002310010221002200330303002200201120221.2111.2.
          +
          8<;==:=@?=<<>>>;;??<=<;96:?:5<>;85:=7,,:5/",(/)"*"
          @2292_469_216
          T000111101020011320222113222200220200120202.2222.2.
          +
          /6=>=::>>=;==>;;6=;;9<6:8<(3:-<;/9:852=-7/"2(6)")"
          @2292_469_274
          T300101122322222232222222210222222222022220.2222.2.
          +
          ,=#$$#@%#'#>$,&(;$*$*=)*'&6%,%##*,+#,4),#)",5'#","
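
For readers unfamiliar with the encoding, here is a minimal Python sketch of how a color string like the ones above relates to bases. It assumes the standard SOLiD two-base encoding (with A=0, C=1, G=2, T=3, each color is the XOR of the two adjacent base codes). The function names are hypothetical, and this is not bowtie's or bfast's own code; it only illustrates what "primer letter" and "1st color" mean, not whether the aligner needs them removed beforehand.

```python
# Sketch only: decode a SOLiD color-space read, assuming the standard
# two-base encoding. Function names are made up for this illustration.
BASES = "ACGT"


def decode_colorspace(read):
    """Translate e.g. 'T2100' into bases; the leading letter is the primer base."""
    primer, colors = read[0], read[1:]
    prev = BASES.index(primer)
    decoded = []
    for c in colors:
        if c not in "0123":
            # '.' marks a missing color call; every later base is unknown.
            decoded.append("N" * (len(colors) - len(decoded)))
            break
        prev ^= int(c)  # color = XOR of the two adjacent base codes
        decoded.append(BASES[prev])
    return "".join(decoded)


def trim_primer(read, qual):
    """Drop the primer base and the first color, plus the first quality value."""
    return read[2:], qual[1:]


print(decode_colorspace("T2100"))    # CAAA
print(trim_primer("T2100", "8<;="))  # ('100', '<;=')
```

Note that in the records above each quality string has one character per color (50), with no quality value for the primer letter, which is why the sketch trims two characters from the read but only one from the qualities.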

          Comment


          • Option for output of pairs where only one end aligns

            With bowtie's current set of options, is it possible to have pairs with only one end mapping to the genome be included in the alignment file (e.g. sam file)? I am interested in identifying intra-read short indels through the anchoring of one of a mate pair's ends.

            Comment


            • I'd just logged on here to post exactly the question acnoll poses above: "is it possible to have pairs with only one end mapping to the genome be included in the alignment file?"

              The implication there, which after reading the manual and running Bowtie 0.12.1 I believe, is that only read pairs which both match, and fall within the -I/-X constraints, will be output. True?

              The alternative for now is to specify the -a option to get all the mapped output, and post-process that to find what you're interested in, be that the best pair (for some definition of "best"), or reads where only one end matches.
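
One possible shape for that post-processing step, as a minimal Python sketch. It assumes something slightly different from the -a route: that each mate file has been aligned separately as single-end reads, giving two SAM files, and that mate names differ only by a trailing "/1" or "/2". Both assumptions and all function names are hypothetical; the only format detail relied on is the SAM flag bit 0x4 (read unmapped).

```python
# Sketch only: find pairs where exactly one mate aligned, given two
# single-end SAM outputs (one per mate file). Names are hypothetical.

def strip_mate_suffix(name):
    """Remove a trailing '/1' or '/2' so both mates share one pair name."""
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2]
    return name


def aligned_names(sam_lines):
    """Return the set of pair names with at least one reported alignment."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        name, flag = fields[0], int(fields[1])
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            names.add(strip_mate_suffix(name))
    return names


def one_end_only(sam1_lines, sam2_lines):
    """Pairs where only mate 1 aligned, and pairs where only mate 2 aligned."""
    m1 = aligned_names(sam1_lines)
    m2 = aligned_names(sam2_lines)
    return m1 - m2, m2 - m1
```

With real files, the two arguments could simply be open file handles, e.g. `one_end_only(open("mate1.sam"), open("mate2.sam"))` (file names made up here).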

              To have the option to do that directly in Bowtie would be nice.

              --TS

              Comment


              • Hi Ben,
                Can someone please let me know whether bowtie works with longer inserts (~20kb) between mate pairs?

                Thanks

                Comment


                • bowtie: should I mask the pseudoautosomal segments of human genome

                  What do you think of my plan to mask the pseudoautosomal segments of the human Y chromosome prior to running bowtie on an RNASeq project?

                  Since the pseudoautosomal portions of human chromosomes X and Y are identical in sequence, any alignment strategy that utilizes only unique alignments will discard all alignments to these regions, as each aligning read will have two matches. Thus the 24 known genes won't be counted.

                  I plan to use EMBOSS' `maskseq` to "hard mask" (replace with 'N') chrY prior to building the bowtie indices at:

                  chrY:10001-2649520
                  chrY:59034050-59363566

                  Does anyone see a problem with this approach?
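
For concreteness, the hard-masking step itself can be sketched in a few lines of Python. This is not EMBOSS `maskseq`, just an illustration of the operation; it assumes the coordinates above are 1-based and inclusive, and the function name `hard_mask` is made up for this example.

```python
# Sketch only: replace 1-based, inclusive regions of a sequence with 'N'.
# The two chrY regions are the ones listed in the post above.
PAR_Y_REGIONS = [(10001, 2649520), (59034050, 59363566)]


def hard_mask(seq, regions):
    """Return seq with every (start, end) region overwritten by 'N'."""
    buf = bytearray(seq, "ascii")
    for start, end in regions:
        start = max(start, 1)
        end = min(end, len(buf))  # clip so the sequence never grows
        if start > end:
            continue
        buf[start - 1:end] = b"N" * (end - start + 1)
    return buf.decode("ascii")


print(hard_mask("ACGTACGTAC", [(3, 5)]))  # ACNNNCGTAC
```

Applied to the full chrY sequence (read from the FASTA file with its line breaks removed), `hard_mask(chrY_seq, PAR_Y_REGIONS)` would give the sequence to write back out before building the index.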

                  I see that the bowtie manual's description of the `--ntoa` option explicitly states that "By default, Ns are simply excluded from the index and bowtie will not report alignments that overlap them." Does anyone know if the same is true for Xs?

                  Finally, do you agree that the ability to direct bowtie-build to ignore portions of <reference_in> would be a sensible feature to request?

                  Thanks for thinking!

                  Malcolm Cook
                  Stowers Institute for Medical Research

                  Comment


                  • Originally posted by Ben Langmead View Post
                    Hi amaer,

                    Perhaps by end-of-year. It's very hard to say because most of my time goes to collaborators, and they don't have predictable schedules. But by end-of-year is a reasonable guess.

                    Thanks,
                    Ben

                    Hi Ben,

                    What's the status of doing gapped alignments? Do you have an estimated date?

                    thanks, and keep up the great work!

                    Comment


                    • I'm working on this now. I don't have any time estimates.

                      Thanks,
                      Ben

                      Comment


                      • Hi Dr. Langmead,

                        I am doing data analysis for ChIP-seq experiments on transcription factor binding sites. I have 5 million raw reads (76 bp read length) per sample from the Illumina platform. I used bowtie 0.11.3 to align these reads to the human reference genome.

                        The command I used for one high-quality alignment was:
                        ~/120809_ChiPseq/bowtie-0.11.3_linux_x86_64/bowtie --solexa1.3-quals -v 2 -a -m 1 -t -p 30 --un result_chipseq2/index2.hq.un --max result_chipseq2/index2.hq.max indexes_chipseq1/h_sapiens_asm reads/index2.fq > result_chipseq2/index2.hq.bt

                        The result is as below:
                        Reads uniquely aligned: ~45%,
                        Reads with multiple alignments: ~6%,
                        Reads that failed to align: ~49%.

                        Then I increased the allowed mismatches to 3 (-v 3) and trimmed the low-quality end (--trim3 22). However, ~45% of reads still failed to align.

                        Two questions bother me:
                        1. My reads are 76 bp long, but bowtie allows at most 3 mismatches, which I think is too stringent. Will bowtie allow more mismatches (7-8) for 76 bp or even longer 100 bp reads?

                        2. ~45-49% of my reads failed to align (no repeats included) to the human reference genome, which seems too high to accept. Do you have any idea why this happens?

                        Many thanks for your help,
                        jlmlj

                        Comment


                        • Two questions bother me:
                          1. My reads are 76 bp long, but bowtie allows at most 3 mismatches, which I think is too stringent. Will bowtie allow more mismatches (7-8) for 76 bp or even longer 100 bp reads?
                          Bowtie has another set of parameters for handling mismatches when mapping reads back to the reference genome: -n, -e and -l.

                          2. ~45-49% of my reads failed to align (no repeats included) to the human reference genome, which seems too high to accept. Do you have any idea why this happens?
                          Maybe the rate is within the normal range. If the ChIP-seq reads are from repeat regions, and you masked the repeat regions when mapping, these reads will fail to map. And there are still quite a few 'N's in the human reference genome.
                          Xi Wang
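
As a rough illustration of that suggestion, the earlier command could be rebuilt around -n/-l/-e instead of -v. The sketch below only assembles the argument list in Python and does not run anything. The helper name is hypothetical, the paths are taken from the command in the question, and the values shown (-n 2, -l 28, -e 70, which are the defaults as I understand the bowtie manual) are placeholders to tune, not recommendations. Raising -e permits more mismatches at low-quality positions across the whole read, while -n only limits mismatches within the first -l bases.

```python
# Sketch only: assemble a bowtie command line that uses seed-based
# (-n/-l/-e) mismatch handling instead of -v. Helper name is hypothetical.

def bowtie_seed_mode_cmd(index, reads, unaligned_out,
                         seed_mms=2, seed_len=28, maq_err=70, threads=1):
    """Return the argument list; pass it to subprocess.run() to execute."""
    return [
        "bowtie",
        "--solexa1.3-quals",
        "-n", str(seed_mms),   # max mismatches within the seed
        "-l", str(seed_len),   # seed length in bases
        "-e", str(maq_err),    # max summed quality at mismatched positions
        "-m", "1",             # suppress reads with more than one alignment
        "-p", str(threads),
        "--un", unaligned_out,
        index,
        reads,
    ]


cmd = bowtie_seed_mode_cmd("indexes_chipseq1/h_sapiens_asm",
                           "reads/index2.fq",
                           "result_chipseq2/index2.hq.un",
                           maq_err=140, threads=30)
print(" ".join(cmd))
```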

                          Comment


                          • "Maybe the rate is within the normal range. If the ChIP-seq reads are from repeat regions, and you masked the repeat regions when mapping, these reads will fail to map. And there are still quite a few 'N's in the human reference genome."

                            Thanks a lot for the reply, Xi. I checked the human genome that I used: it was the unmasked version, and I did not mask repeat regions when mapping, so that may not be the cause...

                            I am thinking of trying a couple of parameters, such as --strata, but it looks a bit tricky and I am not sure how to handle it yet.

                            Comment


                            • Originally posted by jlmlj View Post
                              Thanks a lot for the reply, Xi. I checked the human genome that I used: it was the unmasked version, and I did not mask repeat regions when mapping, so that may not be the cause...
                              I was also referring to the 'N's in the human reference genome. Our group has observed many cases where lots of reads pile up next to 'N' regions.
                              Hope this helps.
                              Xi Wang

                              Comment


                              • Originally posted by jlmlj View Post

                                The result is as below:
                                Reads uniquely aligned: ~45%,
                                Reads with multiple alignments: ~6%,
                                Reads that failed to align: ~49%.
                                51% aligned is not too bad, but you could also try without the -v parameter to allow more mismatches at the 3' end.

                                Comment
