
  • Hi Ben:
    I will put the csfastq (maybe part of it) later somewhere because it's huge.
    And I am using bowtie 0.12.1 (but color index was built by using 0.12-beta).
    There are millions of lines in SAM and pileup files. So fixed "48M" in SAM and fixed "A" in pileup file are unreasonable. (pls wait for me to send you the csfastq files). Thanks a lot! :-)

    Comment


    • Originally posted by xuying View Post
      There are millions of lines in SAM and pileup files. So fixed "48M" in SAM and fixed "A" in pileup file are unreasonable. (pls wait for me to send you the csfastq files). Thanks a lot! :-)
      Why? Given that M = "match or mismatch", when would you expect something other than 48M?

      Ben

      Comment


      • Oh, yes, sorry. I just confused the file with CIGAR notation.

        Comment


        • Hi Ben:
          It seems I can't find a suitable place to put my csfastq file.
          Here I just show some lines in the csfastq file generated from program "solid2fastq" of bfast. Do you think it is ok to go? Should I remove the first primer letter and 1st color to get a true base there?

          @2292_469_84
          T210002310010221002200330303002200201120221.2111.2.
          +
          8<;==:=@?=<<>>>;;??<=<;96:?:5<>;85:=7,,:5/",(/)"*"
          @2292_469_216
          T000111101020011320222113222200220200120202.2222.2.
          +
          /6=>=::>>=;==>;;6=;;9<6:8<(3:-<;/9:852=-7/"2(6)")"
          @2292_469_274
          T300101122322222232222222210222222222022220.2222.2.
          +
          ,=#$$#@%#'#>$,&(;$*$*=)*'&6%,%##*,+#,4),#)",5'#","
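A minimal sketch for inspecting records like the ones above, assuming the usual four-line FASTQ layout with a primer base followed by colors on the sequence line. The names `CsRead`, `parse_csfastq` and `drop_first_color` are hypothetical helpers, not part of bfast or bowtie. The sketch only shows what trimming the first color would look like; whether the aligner expects that trimming is not settled in this thread.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class CsRead(NamedTuple):
    name: str
    primer: str   # leading primer base, e.g. "T" (empty once trimmed)
    colors: str   # color calls: 0-3, or "." for a missing call
    quals: str    # one quality character per color


def parse_csfastq(handle: TextIO) -> Iterator[CsRead]:
    """Yield reads from a four-line-per-record colorspace FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        quals = handle.readline().strip()
        yield CsRead(header[1:], seq[0], seq[1:], quals)


def drop_first_color(read: CsRead) -> CsRead:
    """Return a copy without the primer base, the first color and its quality."""
    return CsRead(read.name, "", read.colors[1:], read.quals[1:])


# First record quoted in the post above.
SAMPLE = (
    "@2292_469_84\n"
    "T210002310010221002200330303002200201120221.2111.2.\n"
    "+\n"
    '8<;==:=@?=<<>>>;;??<=<;96:?:5<>;85:=7,,:5/",(/)"*"\n'
)

if __name__ == "__main__":
    for read in parse_csfastq(io.StringIO(SAMPLE)):
        trimmed = drop_first_color(read)
        print(read.name, read.primer, len(read.colors), len(trimmed.colors))
```

In this record the sequence line carries one primer base plus 50 colors, and the quality line carries 50 values, so the qualities line up with the colors rather than with the primer base.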

          Comment


          • Option for output of pairs where only one end aligns

            With bowtie's current set of options is it possible to have pairs with only one end mapping to the genome be included in the alignment file (e.g. sam file)? I am interested in identifying intra-read short indels through the anchoring of one of a mate pair's ends.

            Comment


            • I'd just logged on here to post exactly the question acnoll poses above: "is it possible to have pairs with only one end mapping to the genome be included in the alignment file?"

              The implication there, which after reading the manual and running Bowtie 0.12.1 I believe, is that only read pairs which both match, and fall within the -I/-X constraints, will be output. True?

              The alternative for now is to specify the -a option to get all the mapped output, and post-process that to find what you're interested in, be that the best pair (for some definition of "best"), or reads where only one end matches.

              To have the option to do that directly in Bowtie would be nice.

              --TS
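One way the post-processing idea above could be sketched, under a different assumption than the -a route: each mate file is aligned separately in single-end mode, and the aligned read names are then compared. The helpers `mate_key` and `one_end_only` are hypothetical, and the sketch assumes mate names differ only by a trailing `/1` or `/2`.

```python
from typing import Dict, Iterable, Set


def mate_key(read_name: str) -> str:
    """Strip a trailing /1 or /2 so both mates of a pair share one key."""
    if read_name.endswith(("/1", "/2")):
        return read_name[:-2]
    return read_name


def one_end_only(aligned_mate1: Iterable[str],
                 aligned_mate2: Iterable[str]) -> Dict[str, Set[str]]:
    """Return pair names for which exactly one mate produced an alignment.

    Both arguments are read names taken from two separate single-end runs
    (an assumption of this sketch, not something bowtie reports directly).
    """
    m1 = {mate_key(name) for name in aligned_mate1}
    m2 = {mate_key(name) for name in aligned_mate2}
    return {"mate1_only": m1 - m2, "mate2_only": m2 - m1}


if __name__ == "__main__":
    result = one_end_only(["r1/1", "r2/1", "r3/1"], ["r1/2", "r4/2"])
    print(sorted(result["mate1_only"]), sorted(result["mate2_only"]))
```

Pairs found this way would still need to be checked against the -I/-X insert constraints separately, since single-end runs know nothing about the mate.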

              Comment


              • Hi Ben,
                Can someone please let me know whether bowtie works with longer inserts (~20 kb) between mate pairs?

                Thanks

                Comment


                • bowtie: should I mask the pseudoautosomal segments of human genome

                  What do you think of my plan to mask the pseudoautosomal segments of the human Y chromosome prior to running bowtie on an RNA-Seq project?

                  Since the pseudoautosomal regions of human chromosomes X and Y are identical in sequence, any alignment strategy that uses only unique alignments will discard all alignments to these regions, as each aligning read will have two matches. Thus the 24 known genes won't be counted.

                  I plan to use EMBOSS' `maskseq` to "hard mask" (replace with 'N') chrY prior to building the bowtie indices at:

                  chrY:10001-2649520
                  chrY:59034050-59363566

                  Does anyone see a problem with this approach?

                  I see that the bowtie manual's description of the `--ntoa` option explicitly states that "By default, Ns are simply excluded from the index and bowtie will not report alignments that overlap them." Does anyone know if the same is true for Xs?

                  Finally, do you agree that the ability to direct bowtie-build to ignore portions of <reference_in> would be a sensible feature to request?

                  Thanks for thinking!

                  Malcolm Cook
                  Stowers Institute for Medical Research
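The hard-masking step proposed above can be sketched in a few lines, as a stand-in for what `maskseq` would do rather than a description of it. The function `hard_mask` is a hypothetical helper; it assumes the chrY sequence is already loaded as a plain string and that the coordinates are 1-based and inclusive, as written in the post.

```python
from typing import Iterable, Tuple

# Pseudoautosomal intervals on chrY as quoted in the post (1-based, inclusive).
PAR_REGIONS = [(10001, 2649520), (59034050, 59363566)]


def hard_mask(sequence: str,
              regions: Iterable[Tuple[int, int]],
              mask_char: str = "N") -> str:
    """Replace every base inside the given 1-based inclusive regions."""
    masked = bytearray(sequence, "ascii")
    fill = mask_char.encode("ascii")
    for start, end in regions:
        if start < 1 or end < start:
            raise ValueError(f"bad region: {start}-{end}")
        end = min(end, len(masked))      # clip regions that run past the end
        length = max(0, end - start + 1)
        masked[start - 1:end] = fill * length
    return masked.decode("ascii")


if __name__ == "__main__":
    # Toy sequence; a real run would pass the chrY string and PAR_REGIONS.
    print(hard_mask("ACGTACGTAC", [(3, 5), (9, 10)]))
```

The masked sequence would then be written back to FASTA before building the index, so that the X copy of each pseudoautosomal region is the only one left to align against.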

                  Comment


                  • Originally posted by Ben Langmead View Post
                    Hi amaer,

                    Perhaps by end-of-year. It's very hard to say because most of my time goes to collaborators, and they don't have predictable schedules. But by end-of-year is a reasonable guess.

                    Thanks,
                    Ben
                    Hi Ben,

                    What's the status of doing gapped alignments? Do you have an estimated date?

                    thanks, and keep up the great work!

                    Comment


                    • I'm working on this now. I don't have any time estimates.

                      Thanks,
                      Ben

                      Comment


                      • Hi Dr. Langmead,

                        I am doing data analysis for ChIP-seq experiments on transcription factor binding sites. I have 5 million raw reads (76 bp read length) per sample from the Illumina platform. I used bowtie 0.11.3 to align these reads to the human reference genome.

                        The code I used for one high quality alignment was:
                        ~/120809_ChiPseq/bowtie-0.11.3_linux_x86_64/bowtie --solexa1.3-quals -v 2 -a -m 1 -t -p 30 --un result_chipseq2/index2.hq.un --max result_chipseq2/index2.hq.max indexes_chipseq1/h_sapiens_asm reads/index2.fq > result_chipseq2/index2.hq.bt

                        The results were:
                        ~45% of reads aligned uniquely,
                        ~6% of reads aligned to multiple locations,
                        ~49% of reads failed to align.

                        Then I increased the number of mismatches to 3 (-v 3) and trimmed the low-quality end (--trim3 22). However, ~45% of reads still failed to align.

                        Two questions are bothering me:
                        1. My reads are 76 bp long, but bowtie allows at most 3 mismatches, which seems too stringent to me. Will bowtie ever allow more mismatches (7-8) for 76 bp, or even longer 100 bp, reads?

                        2. ~45-49% of my reads fail to align to the human reference genome (repeats not included), which seems far too high to accept. Do you have any idea why this happens?

                        Many thanks for your help,
                        jlmlj

                        Comment


                        • Two questions are bothering me:
                          1. My reads are 76 bp long, but bowtie allows at most 3 mismatches, which seems too stringent to me. Will bowtie ever allow more mismatches (7-8) for 76 bp, or even longer 100 bp, reads?
                          There is another set of bowtie parameters for controlling mismatches when mapping reads to the reference genome: -n, -e and -l.

                          2. ~45-49% of my reads fail to align to the human reference genome (repeats not included), which seems far too high to accept. Do you have any idea why this happens?
                          The rate may be within the normal range. If the ChIP-seq reads come from repeat regions and you masked those regions when mapping, those reads will fail to map. There are also still quite a few 'N's in the human reference genome.
                          Xi Wang
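For readers unfamiliar with the -n/-e/-l mode mentioned under question 1, here is a rough sketch of the acceptance rule as I understand it from the bowtie 1 manual: at most N mismatches within the first L bases of the read (the seed), and a sum of Phred qualities over all mismatched positions of at most E. The function `passes_n_mode` is hypothetical, the defaults (2, 28, 70) are the ones I recall from the manual, and the sketch leaves out the Maq-style quality rounding, so treat it as an illustration only.

```python
from typing import Sequence


def passes_n_mode(mismatch_positions: Sequence[int],
                  phred_quals: Sequence[int],
                  n: int = 2,
                  seed_len: int = 28,
                  max_qual_sum: int = 70) -> bool:
    """Check one alignment against a simplified -n/-l/-e policy.

    mismatch_positions are 0-based offsets from the 5' end of the read;
    phred_quals holds one Phred score per read base.
    """
    # Mismatches falling inside the seed are limited by -n.
    seed_mismatches = sum(1 for pos in mismatch_positions if pos < seed_len)
    # Qualities at ALL mismatched positions count toward the -e limit.
    qual_sum = sum(phred_quals[pos] for pos in mismatch_positions)
    return seed_mismatches <= n and qual_sum <= max_qual_sum


if __name__ == "__main__":
    quals = [30] * 76
    print(passes_n_mode([5, 40], quals))       # one seed mismatch, sum 60
    print(passes_n_mode([30, 40, 50], quals))  # no seed mismatch, sum 90
```

Under this rule a 76 bp read can carry more than three mismatches overall, provided they sit outside the seed at low-quality positions or -e is raised.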

                          Comment


                          • "Maybe the rate is among the normal. If the ChIP-seq reads are from the repeat regions, and you masked the repeat regions when mapping, these reads will fail to map. And there are still quite a few 'N's in the human reference genome."

                            Thanks a lot for the reply, Xi. I checked the human genome that I used: it was the unmasked version, and I did not mask repeat regions when mapping, so that may not be the cause...

                            I am thinking of trying a couple of other parameters, such as --strata, but it looks a bit tricky and I am not sure how to handle it yet.

                            Comment


                            • Originally posted by jlmlj View Post
                              Thanks a lot for the reply, Xi. I checked the human genome that I used: it was the unmasked version, and I did not mask repeat regions when mapping, so that may not be the cause...
                              I was also referring to the 'N's that exist in the human reference genome. Our group has observed many cases where large numbers of reads pile up next to 'N' regions.
                              Hope this helps.
                              Xi Wang

                              Comment


                              • Originally posted by jlmlj View Post

                                The results were:
                                ~45% of reads aligned uniquely,
                                ~6% of reads aligned to multiple locations,
                                ~49% of reads failed to align.
                                51% aligned is not too bad, but you could also try running without the -v parameter to allow more mismatches at the 3' end.

                                Comment
