Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Originally posted by sdm View Post
    Hi all,

    I have used bowtie in paired-end mode. When I checked the results I don't understand the following result:

    these are 2 mates, as far as I understand two identical sequences (forward and reverse), which map to chromosome 6, if I map them individually (bowtie -m1 -v2)

    1.sam:MISEQ:2:000000000-A26AB:1:1101:17175:1762 0 chr6 72678938 255 77M * 0 0 GGACAATTAAAAAGCAACAACCACAATTAATACGGTTTACACAGGCAAAACTCATTAAGTGTGGGTTGGGGCGCTCT DDDDB9BFFFFF??C;ECFFFEHFFEEGHFGHGFHHHHHFFFGFCCACFHFHHHHHG-ECFBEEECE>>*5+CCHHH XA:i:1MD:Z:76C0 NM:i:1

    2.sam:MISEQ:2:000000000-A26AB:1:1101:17175:1762 16 chr6 72678938 255 77M * 0 0 GGACAATTAAAAAGCAACAACCACAATTAATACGGTTTACACAGGCAAAACTCATTAAGTGTGGGTTGGGGCGCTCT CAC-C5-,FFFCAA,C>+A5+-5-AA--CA+C7A9...A-A.EEA,FEAA../A..CC@+@+@@=<+@@==+<,,5, XA:i:1MD:Z:76C0 NM:i:1

    however if I run
    bowtie-0.12.7/bowtie --phred33-quals -X 2000 --fr --chunkmbs 300 -p 4 -a -v 2 --sam -q -1 1.fq -2 2.fq > paired.sam

    the sequence pair is said to be either unmapped or with an insert size of -1109, it should be 0 in this case?

    paired.sam:MISEQ:2:000000000-A26AB:1:1101:17175:1762 1:N:0: 77 * 0 0 * * 0 0 GGACAATTAAAAAGCAACAACCACAATTAATACGGTTTACACAGGCAAAACTCATTAAGTGTGGGTTGGGGCGCTCT DDDDB9BFFFFF??C;ECFFFEHFFEEGHFGHGFHHHHHFFFGFCCACFHFHHHHHG-ECFBEEECE>>*5+CCHHH XM:i:0
    raw.paired.sam:MISEQ:2:000000000-A26AB:1:1101:17175:1762 147 chr6 72678938 255 77M = 72677906 -1109 GGACAATTAAAAAGCAACAACCACAATTAATACGGTTTACACAGGCAAAACTCATTAAGTGTGGGTTGGGGCGCTCT CAC-C5-,FFFCAA,C>+A5+-5-AA--CA+C7A9...A-A.EEA,FEAA../A..CC@+@+@@=<+@@==+<,,5, XA:i:1 MD:Z:76C0 NM:i:1

    If anybody has an idea what I have missed, I would be very grateful.
    You didn't miss anything, bowtie1 doesn't deal well with those (or whenever the start/end coordinates of one reads are found completely within another). I believe that functions normally in bowtie2. Alternatively, trim_galore has an option to get around this in your read-trimming step.

    Comment


    • Originally posted by dpryan View Post
      You didn't miss anything, bowtie1 doesn't deal well with those (or whenever the start/end coordinates of one reads are found completely within another). I believe that functions normally in bowtie2. Alternatively, trim_galore has an option to get around this in your read-trimming step.
      thanks for your quick reply ! Apparently, can't use bowtie1 then for paired-end alignment.

      Comment


      • Originally posted by sdm View Post
        thanks for your quick reply ! Apparently, can't use bowtie1 then for paired-end alignment.
        If this sort of thing is common in your library, then it'd be best not to (unless you use the trim_galore trick that I mentioned). I expect for most libraries this is a rare enough occurrence that bowtie1 works fine (I don't actually use bowtie1 these days, so I can't say I've checked).

        Comment


        • bowtie2

          Hi I am using the latest builds of bowtie 1 and 2 with 64 bit support..

          But they are still dying with the error:
          Reading reference sizes
          Error: Reference sequence has more than 2^32-1 characters! Please divide the
          reference into batches or chunks of about 3.6 billion characters or less each

          64-bit
          Built on do-dmxp-mac.win.ad.jhu.edu
          Tue Feb 26 13:33:50 EST 2013
          Compiler: gcc version 4.1.2 20080704 (Red Hat 4.1.2-54)
          Options: -O3 -m64 -msse2 -funroll-loops -g3
          Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

          Comment


          • Originally posted by sahiilseth View Post
            Hi I am using the latest builds of bowtie 1 and 2 with 64 bit support..

            But they are still dying with the error:
            Reading reference sizes
            Error: Reference sequence has more than 2^32-1 characters! Please divide the
            reference into batches or chunks of about 3.6 billion characters or less each

            64-bit
            Built on do-dmxp-mac.win.ad.jhu.edu
            Tue Feb 26 13:33:50 EST 2013
            Compiler: gcc version 4.1.2 20080704 (Red Hat 4.1.2-54)
            Options: -O3 -m64 -msse2 -funroll-loops -g3
            Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

            From Bowtie website:

            Because bowtie2-build uses 32-bit pointers internally, it can handle up to a theoretical maximum of 2^32-1 (somewhat more than 4 billion) characters in an index, though, with other constraints, the actual ceiling is somewhat less than that. If your reference exceeds 2^32-1 characters, bowtie2-build will print an error message and abort. To resolve this, divide your reference sequences into smaller batches and/or chunks and build a separate index for each.
            BWA is able to handle genomes > 4 GB in size (individual chromosomes < 2 GB).
            Last edited by GenoMax; 08-22-2013, 11:51 AM.

            Comment


            • [QUOTE=GenoMax;114232]From Bowtie website:
              They also say:
              'If your computer has more than 3-4 GB of memory and you would like to exploit that fact to make index building faster, use a 64-bit version of the bowtie2-build binary. The 32-bit version of the binary is restricted to using less than 4 GB of memory. If a 64-bit pre-built binary does not yet exist for your platform on the sourceforge download site, you will need to build one from source.'

              I thought 64 bit binary, should be able to handle more characters as well; not true?

              Comment


              • Originally posted by sahiilseth View Post
                From Bowtie website:
                They also say:
                'If your computer has more than 3-4 GB of memory and you would like to exploit that fact to make index building faster, use a 64-bit version of the bowtie2-build binary. The 32-bit version of the binary is restricted to using less than 4 GB of memory. If a 64-bit pre-built binary does not yet exist for your platform on the sourceforge download site, you will need to build one from source.'

                I thought 64 bit binary, should be able to handle more characters as well; not true?
                That reference is only for being able to use more memory during the index building stage to speed that process up.

                Comment


                • Dear Ben,
                  Why is the last version of Bowtie using the mm9 rather than mm10?
                  What is better Bowtie or Bowtie2 for alighment of 50 nt HiSeq Illumina ChIP-Seq redas?
                  I have read that Bowtie is good for short reads up to 100 nt, but Bowtie2 from 50 nt and higher. Still 50 nt reads are on the border for the programms.
                  If Bowtie2 is used, how to get rid of ununique reads?
                  Many thanks in advance

                  Comment


                  • Hi Ben,
                    Sorry to resurrect an old post. I am getting the error Error: Reference sequence has more than 2^32-1 characters!. I know this means I need to split my reference in order to use bowtie2-build but I am wondering about mapping my reads to this reference which has been split. Is it possible to concatenate the split-indexed files and map the reads to this concatenated file or will I have to map the reads to each indexed files separately and write scripts to find which has the best hit.
                    Thank you
                    Angela

                    Comment


                    • Bowtie2

                      I think you need to map your reads to divided indexes and then write script to bring them together.

                      Comment


                      • Reply to thread 'Bowtie, an ultrafast, memory-efficient, open source short read align

                        Hi Angie,

                        I think if you use split reference you'll have issues calculating the alignment quality of the best alignment during the merge, it's a bit more complicated than just selecting the best alignment.
                        You will likely get more accurate alignment qualities if you don't split the reference and instead use an aligner like BWA or Novoalign that can handle genomes >4Gbp.

                        KR, Colin

                        Originally posted by angie_red View Post
                        Hi Ben,
                        Sorry to resurrect an old post. I am getting the error Error: Reference sequence has more than 2^32-1 characters!. I know this means I need to split my reference in order to use bowtie2-build but I am wondering about mapping my reads to this reference which has been split. Is it possible to concatenate the split-indexed files and map the reads to this concatenated file or will I have to map the reads to each indexed files separately and write scripts to find which has the best hit.
                        Thank you
                        Angela

                        Comment


                        • Thanks for the reply Colin and rshina. As suggested I have indexed the reference without splitting it with BWA so I will proceed with this approach
                          Cheers
                          Angela

                          Comment


                          • Originally posted by sparks View Post
                            Hi Angie,

                            I think if you use split reference you'll have issues calculating the alignment quality of the best alignment during the merge, it's a bit more complicated than just selecting the best alignment.
                            You will likely get more accurate alignment qualities if you don't split the reference and instead use an aligner like BWA or Novoalign that can handle genomes >4Gbp.

                            KR, Colin
                            While using BWA or Novoalign are certainly the better solutions, one can relatively simply recalculate MAPQs from multiple alignment files to different references with bowtie. The bowtie MAPQ score is dependent primarily on the AS:i: and XS:i: score of each read, so you can just rerun the algorithm on that (bowtie MAPQs are more of a vague approximation than you may think). This is the approach I took in bison, where there are multiple parallel alignments of each read to different bisulfite converted genomes.

                            Comment


                            • Agree you can merge Bowtie and calculate a vague alignment quality. Perhaps you can give Angie the formulae for it.

                              Originally posted by dpryan View Post
                              While using BWA or Novoalign are certainly the better solutions, one can relatively simply recalculate MAPQs from multiple alignment files to different references with bowtie. The bowtie MAPQ score is dependent primarily on the AS:i: and XS:i: score of each read, so you can just rerun the algorithm on that (bowtie MAPQs are more of a vague approximation than you may think). This is the approach I took in bison, where there are multiple parallel alignments of each read to different bisulfite converted genomes.

                              Comment


                              • Originally posted by sparks View Post
                                Agree you can merge Bowtie and calculate a vague alignment quality. Perhaps you can give Angie the formulae for it.
                                Inputs are the AS and XS score of the resulting best hit (the XS score may be the AS score of the second best hit, that is the alignment to the other chunk of the reference). scMin is the minimum score for a given read (this is derived from the --score-min option given as input). I have a function to calculate this, but it depends on previously parsing user input and storing things in a struct that's specific to bison (so that function wouldn't be very useful), so I won't paste it below. This is basically a C version of what bowtie2 uses (complete with casting single-precision floats to double precision).

                                Oh, the config.mode just denotes --end-to-end or --local. You'd need to change that to be a function input rather than relying on a global struct I think the remainder should work, though!

                                Code:
                                /******************************************************************************
                                *
                                *   Calculate a MAPQ, given AS, XS, and the minimum score (ala bowtie2)
                                *
                                *******************************************************************************/
                                int calc_MAPQ_BT2(int AS, int XS, int scMin) {
                                    int diff, bestOver, bestdiff;
                                    diff = abs(scMin); //Range of possible alignment scores
                                    bestOver = AS-scMin; //Shift alignment score range, so worst score is 0
                                
                                    //The method depends on config.mode
                                    bestdiff = (int) abs(abs((float) AS)-abs((float) XS)); //Absolute distance between alignment scores
                                    if(config.mode == 0) { //--end-to-end (default)
                                        if(XS < scMin) {
                                            if(bestOver >= diff * (double) 0.8f) return 42;
                                            else if(bestOver >= diff * (double) 0.7f) return 40;
                                            else if(bestOver >= diff * (double) 0.6f) return 24;
                                            else if(bestOver >= diff * (double) 0.5f) return 23;
                                            else if(bestOver >= diff * (double) 0.4f) return 8;
                                            else if(bestOver >= diff * (double) 0.3f) return 3;
                                            else return 0;
                                        } else {
                                            if(bestdiff >= diff * (double) 0.9f) {
                                                if(bestOver == diff) {
                                                    return 39;
                                                } else {
                                                    return 33;
                                                }
                                            } else if(bestdiff >= diff * (double) 0.8f) {
                                                if(bestOver == diff) {
                                                    return 38;
                                                } else {
                                                    return 27;
                                                }
                                            } else if(bestdiff >= diff * (double) 0.7f) {
                                                if(bestOver == diff) {
                                                    return 37;
                                                } else {
                                                    return 26;
                                                }
                                            } else if(bestdiff >= diff * (double) 0.6f) {
                                                if(bestOver == diff) {
                                                    return 36;
                                                } else {
                                                    return 22;
                                                }
                                            } else if(bestdiff >= diff * (double) 0.5f) {
                                                if(bestOver == diff) {
                                                    return 35;
                                                } else if(bestOver >= diff * (double) 0.84f) {
                                                    return 25;
                                                } else if(bestOver >= diff * (double) 0.68f) {
                                                    return 16;
                                                } else {
                                                    return 5;
                                                }
                                            } else if(bestdiff >= diff * (double) 0.4f) {
                                               if(bestOver == diff) {
                                                    return 34;
                                                } else if(bestOver >= diff * (double) 0.84f) {
                                                    return 21;
                                                } else if(bestOver >= diff * (double) 0.68f) {
                                                    return 14;
                                                } else {
                                                    return 4;
                                                }
                                            } else if(bestdiff >= diff * (double) 0.3f) {
                                                if(bestOver == diff) {
                                                    return 32;
                                                } else if(bestOver >= diff * (double) 0.88f) {
                                                    return 18;
                                                } else if(bestOver >= diff * (double) 0.67f) {
                                                    return 15;
                                                } else {
                                                    return 3;
                                                }
                                            } else if(bestdiff >= diff * (double) 0.2f) {
                                                if(bestOver == diff) {
                                                    return 31;
                                                } else if(bestOver >= diff * (double) 0.88f) {
                                                    return 17;
                                                } else if(bestOver >= diff * (double) 0.67f) {
                                                    return 11;
                                                } else {
                                                    return 0;
                                                }
                                            } else if(bestdiff >= diff * (double) 0.1f) {
                                                if(bestOver == diff) {
                                                    return 30;
                                                } else if(bestOver >= diff * (double) 0.88f) {
                                                    return 12;
                                                } else if(bestOver >= diff * (double) 0.67f) {
                                                    return 7;
                                                } else {
                                                    return 0;
                                                }
                                            } else if(bestdiff > 0) {
                                                if(bestOver >= diff * (double) 0.67f) {
                                                    return 6;
                                                } else {
                                                    return 2;
                                                }
                                            } else {
                                                if(bestOver >= diff * (double) 0.67f) {
                                                    return 1;
                                                } else {
                                                    return 0;
                                                }
                                            }
                                        }
                                    } else { //--local
                                        if(XS < scMin) {
                                            if(bestOver >= diff * (double) 0.8f) return 44;
                                            else if(bestOver >= diff * (double) 0.7f) return 42;
                                            else if(bestOver >= diff * (double) 0.6f) return 41;
                                            else if(bestOver >= diff * (double) 0.5f) return 36;
                                            else if(bestOver >= diff * (double) 0.4f) return 28;
                                            else if(bestOver >= diff * (double) 0.3f) return 24;
                                            else return 22;
                                        } else {
                                            if(bestdiff >= diff * (double) 0.9f) return 40;
                                            else if(bestdiff >= diff * (double) 0.8f) return 39;
                                            else if(bestdiff >= diff * (double) 0.7f) return 38;
                                            else if(bestdiff >= diff * (double) 0.6f) return 37;
                                            else if(bestdiff >= diff * (double) 0.5f) {
                                                if     (bestOver == diff)       return 35;
                                                else if(bestOver >= diff * (double) 0.5f) return 25;
                                                else                            return 20;
                                            } else if(bestdiff >= diff * (double) 0.4f) {
                                                if     (bestOver == diff)       return 34;
                                                else if(bestOver >= diff * (double) 0.5f) return 21;
                                                else                            return 19;
                                            } else if(bestdiff >= diff * (double) 0.3f) {
                                                if     (bestOver == diff)       return 33;
                                                else if(bestOver >= diff * (double) 0.5f) return 18;
                                                else                            return 16;
                                            } else if(bestdiff >= diff * (double) 0.2f) {
                                                if     (bestOver == diff)       return 32;
                                                else if(bestOver >= diff * (double) 0.5f) return 17;
                                                else                            return 12;
                                            } else if(bestdiff >= diff * (double) 0.1f) {
                                                if     (bestOver == diff)       return 31;
                                                else if(bestOver >= diff * (double) 0.5f) return 14;
                                                else                            return 9;
                                            } else if(bestdiff > 0) {
                                                if(bestOver >= diff * (double) 0.5f)      return 11;
                                                else                            return 2;
                                            } else {
                                                if(bestOver >= diff * (double) 0.5f)      return 1;
                                                else                            return 0;
                                            }
                                        }
                                    }
                                }
                                Last edited by dpryan; 02-11-2014, 08:31 AM. Reason: Slightly incorrect code

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Current Approaches to Protein Sequencing
                                  by seqadmin


                                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                  04-04-2024, 04:25 PM
                                • seqadmin
                                  Strategies for Sequencing Challenging Samples
                                  by seqadmin


                                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                  03-22-2024, 06:39 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 04-11-2024, 12:08 PM
                                0 responses
                                18 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 10:19 PM
                                0 responses
                                22 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-10-2024, 09:21 AM
                                0 responses
                                16 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 04-04-2024, 09:00 AM
                                0 responses
                                47 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X