  • Bowtie - searching for less aligned reads

    Hello,

    I'm trying to use Bowtie to find sequences that align poorly to the mouse genome or, even better, sequences with 0% matching.
    Is this possible with Bowtie?

    Moreover, I have the following read, 'GAGCAAACAGAAAAACCAACCCCGGCTGATCGGAAACAGGCA', which matches chr12 in the mm10 (mouse) genome 100%. For this match (see the results below) I got the maximum score.

    I don't understand why, if I change 3 bases randomly in the middle or add 4 random bases at the 3'/5' end, the read no longer aligns and no score is displayed, even though there is still a lot of similarity. I also don't understand why it misses short alignments (10-30 bases) within, for example, a 100-base read.

    The command I used was:

    CL:"C:\bowtie2\bowtie2-align-s.exe --wrapper basic-0 --local -N 1 -L 2 --gbar 100 --ma 2 --mp 0,0 --score-min L,0,0 -x D:/Augmanity/index/mm10 -f C:/bowtie2/reads/100LengthReads.fa --passthrough"

    Here are the results:
    Mouse-exact_seq 0 chr12 56691388 44 42M * 0 0 GAGCAAACAGAAAAACCAACCCCGGCTGATCGGAAACAGGCA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:84 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:42 YT:Z:UU
    Mouse-3_add_bases 4 * 0 0 * * 0 0 GAGCAAACAGAAAAACCAACCCCGGCTGATCGGAAACAGGCATTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UU
    Mouse-4-mismatches 4 * 0 0 * * 0 0 GAGCAAACAGAAAAACCAAAAAAGGCTGATCGGAAACAGGCA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UU


    I would appreciate any help. Thank you.
    Anastasia
    Last edited by Nastya; 08-30-2015, 11:47 PM.

  • #2
    If you need a tool to separate sequences that are NOT aligning to the mouse genome then look at BBSplit.sh as an option: http://seqanswers.com/forums/showthread.php?t=41288


    • #3
      bbMap - Getting started

      Hi,
      Thank you!

      I'm quite new to these programs, so I have many basic questions.
      If I understood correctly, the reads that don't align to the mouse genome should end up in the clean.fq file.

      * What is considered a "mapped" read? Only one that matches the reference exactly?

      * Is there an example with output files that I can test, to be sure I'm using it correctly?

      * I tried to run the following command: bbmap.sh ref=lambda_virus.fa

      and didn't see that the index was created.

      * How do I index a reference that is built from several .fa files (e.g. the entire mouse genome)?

      Thank you !


      • #4
        Originally posted by Nastya
        Hi,
        Thank you!

        I'm quite new to these programs, so I have many basic questions.
        If I understood correctly, the reads that don't align to the mouse genome should end up in the clean.fq file.
        Correct. Those names are just examples; you can use your own.

        Originally posted by Nastya

        * What is considered a "mapped" read? Only one that matches the reference exactly?
        BBSplit uses BBMap, so you can use the parameters described in the BBMap thread to control alignment stringency (Brian Bushnell, the author of BBMap, participates in this forum and can confirm).

        Originally posted by Nastya
        * Is there an example with output files that I can test, to be sure I'm using it correctly?
        You can deliberately mix two unrelated sequence files you have and see how well BBSplit separates them.

        Originally posted by Nastya
        * I tried to run the following command: bbmap.sh ref=lambda_virus.fa

        and didn't see that the index was created.
        A top-level directory (always called "ref") should have been created in your working directory. The index files will be inside it.

        Originally posted by Nastya
        * How do I index a reference that is built from several .fa files (e.g. the entire mouse genome)?

        Thank you!
        "Cat" the chromosome FASTA files together into a single multi-FASTA file, then use that file to build the index for the genome.

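A minimal sketch of that concatenation step, assuming the per-chromosome files are plain-text FASTA and each ends with a newline. The helper name `combine_fasta` and the file names in the usage comment are placeholders, not something from this thread. It only builds the multi-FASTA and reports how many sequences it contains; the combined file would then be handed to bbmap.sh as ref=.

```shell
#!/usr/bin/env bash
# combine_fasta OUT FILE...
# Hypothetical helper: concatenate FASTA files into one multi-FASTA file.
combine_fasta() {
    local out="$1"
    shift
    # Concatenate in the order given; each input keeps its own ">" header line.
    cat -- "$@" > "$out"
    # Print the number of sequences (header lines) in the combined file.
    grep -c '^>' "$out"
}

# Example usage (placeholder paths):
#   combine_fasta mm10_all.fa chr*.fa
#   bbmap.sh ref=mm10_all.fa
```

Counting the header lines afterwards is a cheap sanity check that no chromosome file was missed by the glob.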

        • #5
          Originally posted by Nastya
          Mouse-exact_seq 0 chr12 56691388 44 42M * 0 0 GAGCAAACAGAAAAACCAACCCCGGCTGATCGGAAACAGGCA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:84 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:42 YT:Z:UU
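          Look at the AS:i value of your first read. In end-to-end mode an exact match would score 0, but this run used --local with --ma 2, so the best possible score for a 42-base read is 42 x 2 = 84. AS:i:84, together with XM:i:0 and NM:i:0, means the first read is an exact match.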
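As a rough check on the score arithmetic, here is a small sketch. It assumes Bowtie 2's --local scoring, where each matching base earns the --ma bonus and the best possible score is that bonus times the read length. The function name `max_local_score` is made up for illustration.

```shell
#!/usr/bin/env bash
# max_local_score READ_LENGTH MATCH_BONUS
# Best possible --local score: every base matches and earns the match bonus.
max_local_score() {
    echo $(( $1 * $2 ))
}

# The exact-match read from the first post, scored with --ma 2.
read_seq='GAGCAAACAGAAAAACCAACCCCGGCTGATCGGAAACAGGCA'
max_local_score "${#read_seq}" 2   # 42 bases * 2 = 84
```

A reported AS:i equal to this value means there is no penalty anywhere in the alignment.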


          • #6
            Ouch. Cloudflare ate my response

            Anyway -

            You can split into mouse and non-mouse reads with BBMap like this:

            bbmap.sh ref=mm9.fa in=reads.fq outm=mouse.fq outu=nonmouse.fq

            For more elaborate splitting into one set of reads per organism (specifically, per reference file), you can use BBSplit:

            bbsplit.sh ref=mm9.fa,virus1.fa,virus2.fa in=reads.fq basename=out_%.fq outu=unmapped.fq


            Each organism needs to be represented by a single file (using cat, as Genomax mentioned).

            Aligners limit how much a read may differ from the reference and still be aligned. The higher the identity of an alignment, the more likely it is to be correct, so aligners generally focus on alignments with 90% identity or higher. You can adjust this in BBMap with the "idfilter" flag. There is no real concept of 0% similarity; even a random sequence will align to the mouse genome with at least 25% identity or so. "Mapped" just means "the aligner thinks the read came from this location", so the definition varies by aligner. Bowtie (version 1) rejects any alignment with indels or with more than 3 mismatches; Bowtie 2, which the command above runs, does allow gaps and local alignment.
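To put a number on "identity" for the reads in this thread, here is a minimal sketch that compares two equal-length sequences position by position. It is a simplification: real aligners also consider gaps and clipping, and the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# count_mismatches SEQ_A SEQ_B
# Position-by-position mismatches between two equal-length sequences (no gaps).
count_mismatches() {
    local a="$1" b="$2" i n=0
    if [ "${#a}" -eq 0 ] || [ "${#a}" -ne "${#b}" ]; then
        echo "sequences must be non-empty and of equal length" >&2
        return 1
    fi
    for (( i = 0; i < ${#a}; i++ )); do
        if [ "${a:i:1}" != "${b:i:1}" ]; then
            n=$(( n + 1 ))
        fi
    done
    echo "$n"
}

# percent_identity SEQ_A SEQ_B
# Matching positions as an integer percentage, rounded down.
percent_identity() {
    local mm
    mm=$(count_mismatches "$1" "$2") || return 1
    echo $(( (${#1} - mm) * 100 / ${#1} ))
}

# The exact read and the "Mouse-4-mismatches" read from the first post.
exact='GAGCAAACAGAAAAACCAACCCCGGCTGATCGGAAACAGGCA'
mutated='GAGCAAACAGAAAAACCAAAAAAGGCTGATCGGAAACAGGCA'
count_mismatches "$exact" "$mutated"    # 4 differing positions
percent_identity "$exact" "$mutated"    # 38 of 42 match, i.e. 90
```

Four clustered mismatches in a 42-base read already put it near the 90% mark, which is why a small number of changes can move a short read out of the range an aligner will report.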
            Last edited by Brian Bushnell; 09-01-2015, 12:41 PM.
