SEQanswers

08-30-2015, 11:42 PM   #1
Nastya
Junior Member
 
Location: Israel

Join Date: Aug 2015
Posts: 2
Bowtie - searching for less aligned reads

Hello,

I'm trying to use Bowtie to find sequences that align poorly to the mouse genome, or, even better, sequences with 0% matching.
Is this possible with Bowtie?

Moreover, I have the following read, 'GAGCAAACAGAAAAACCAACCCCGGCTGATCGGAAACAGGCA', which matches chr12 in the mm10 (mouse) genome at 100%. For this match (see the results below) I got the maximum score.

I don't understand why, if I change 3 bases randomly in the middle or add 4 random bases at the 3'/5' end, the read no longer aligns and no score is displayed, even though there is still high similarity. I also don't understand why it misses short alignments (10-30 bases) in a 100-base read, for example.

The command I used was:

CL:"C:\bowtie2\bowtie2-align-s.exe --wrapper basic-0 --local -N 1 -L 2 --gbar 100 --ma 2 --mp 0,0 --score-min L,0,0 -x D:/Augmanity/index/mm10 -f C:/bowtie2/reads/100LengthReads.fa --passthrough"

The results are attached:
Mouse-exact_seq 0 chr12 56691388 44 42M * 0 0 GAGCAAACAGAAAAACCAACCCCGGCTGATCGGAAACAGGCA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:84 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:42 YT:Z:UU
Mouse-3_add_bases 4 * 0 0 * * 0 0 GAGCAAACAGAAAAACCAACCCCGGCTGATCGGAAACAGGCATTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UU
Mouse-4-mismatches 4 * 0 0 * * 0 0 GAGCAAACAGAAAAACCAAAAAAGGCTGATCGGAAACAGGCA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UU


I would appreciate any help. Thank you.
Anastasia

Last edited by Nastya; 08-30-2015 at 11:47 PM.
08-31-2015, 03:40 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,950

If you need a tool to separate sequences that do NOT align to the mouse genome, look at BBSplit (bbsplit.sh) as an option: http://seqanswers.com/forums/showthread.php?t=41288
09-01-2015, 06:54 AM   #3
Nastya
Junior Member
 
Location: Israel

Join Date: Aug 2015
Posts: 2
BBMap - Getting started

Hi,
Thank you!

I'm quite new to these programs, so I have many basic questions.
If I understood correctly, the reads that don't align to the mouse genome should end up in the clean.fq file.

* What counts as a "mapped" read? Only one that matches the reference exactly?

* Is there an example with output files that I can test, to be sure I am using it correctly?

* I tried to run the following command: bbmap.sh ref=lambda_virus.fa

but did not see an index being created.

* How do I index a reference that is built from several .fa files (e.g. the entire mouse genome)?

Thank you !
09-01-2015, 09:08 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,950

Quote:
Originally Posted by Nastya
Hi,
Thank you!

I'm quite new to these programs, so I have many basic questions.
If I understood correctly, the reads that don't align to the mouse genome should end up in the clean.fq file.

Correct. Those names are just examples; you can use your own names.

Quote:
Originally Posted by Nastya
* What counts as a "mapped" read? Only one that matches the reference exactly?

BBSplit uses BBMap, so you can use the parameters described in the BBMap thread to control alignment stringency (Brian Bushnell, the author of BBMap, participates in the forum and can confirm).

Quote:
Originally Posted by Nastya
* Is there an example with output files that I can test, to be sure I am using it correctly?

You can deliberately mix two disparate sequence files that you have and see how well BBSplit separates them.
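
One way to set up such a test, as a minimal sketch: tag each read name with its source before mixing, so that you can check afterwards where each read ended up. The file names, the two toy reads, and the helper functions tag_reads and count_source are all made up for illustration; substitute your own FASTQ files.

```shell
# Toy inputs (hypothetical): one "mouse" read and one "lambda" read.
workdir=$(mktemp -d)
printf '@r1\nGAGCAAACAGAAAAACCAACC\n+\nIIIIIIIIIIIIIIIIIIIII\n' > "$workdir/mouse_reads.fq"
printf '@r1\nGGGCGGCGACCTCGCGGGTTT\n+\nIIIIIIIIIIIIIIIIIIIII\n' > "$workdir/lambda_reads.fq"

# Prefix every read header (line 1 of each 4-line FASTQ record) with a source tag.
tag_reads() {
    # $1 = tag, $2 = FASTQ file
    awk -v tag="$1" 'NR % 4 == 1 { sub(/^@/, "@" tag "_") } { print }' "$2"
}

tag_reads mouse  "$workdir/mouse_reads.fq"  >  "$workdir/mixed.fq"
tag_reads lambda "$workdir/lambda_reads.fq" >> "$workdir/mixed.fq"

# Count how many reads carrying a given source tag a FASTQ file contains.
count_source() {
    # $1 = tag, $2 = FASTQ file
    awk -v tag="$1" 'NR % 4 == 1 && index($0, "@" tag "_") == 1 { n++ } END { print n + 0 }' "$2"
}
```

After running the splitter on mixed.fq, calling count_source with each tag on each output file shows how many reads landed in the wrong bin.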

Quote:
Originally Posted by Nastya
* I tried to run the following command: bbmap.sh ref=lambda_virus.fa

but did not see an index being created.

A top-level directory (always called "ref") should have been created. The index files will be inside that directory.

Quote:
Originally Posted by Nastya
* How do I index a reference that is built from several .fa files (e.g. the entire mouse genome)?

Thank you!

Concatenate ("cat") the chromosome FASTA files into a single large multi-FASTA file, and use that file to create the index for the genome.
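
As a minimal sketch of that step, here it is with two made-up chromosome files in a temporary directory. The file names and sequences are placeholders; a real genome download would have one FASTA file per chromosome, and each file should end with a newline so that records do not run together.

```shell
# Toy per-chromosome FASTA files (hypothetical names and sequences).
refdir=$(mktemp -d)
printf '>chr1\nACGTACGT\n' > "$refdir/chr1.fa"
printf '>chr2\nTTGGCCAA\n' > "$refdir/chr2.fa"

# Concatenate all per-chromosome files into one multi-FASTA file.
cat "$refdir"/chr*.fa > "$refdir/genome.fa"

# Sanity check: there should be one header line (starting with ">") per chromosome.
n_seqs=$(grep -c '^>' "$refdir/genome.fa")
echo "sequences in genome.fa: $n_seqs"
```

The combined file is then what you would pass as ref= when building the index.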
09-01-2015, 09:59 AM   #5
wetSEQer
Member
 
Location: TX

Join Date: Dec 2013
Posts: 15

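Look at the AS:i tag of your first read: it is 84. In end-to-end mode an exact match would score 0, but you ran Bowtie 2 with --local and --ma 2, so a perfect 42-base match scores 42 × 2 = 84. That is the maximum, which agrees with NM:i:0 and MD:Z:42: the first read is an exact match.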
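
The score arithmetic for this read can be sketched as follows. It assumes Bowtie 2's local-mode scoring with no soft clipping, where every matching base adds the --ma bonus (2 in the command from the first post). The function name max_local_score is made up for illustration.

```shell
# Best possible local-mode alignment score for a read:
# every base matches, so score = read length * match bonus (--ma).
max_local_score() {
    # $1 = read sequence, $2 = match bonus
    echo $(( ${#1} * $2 ))
}

read_seq=GAGCAAACAGAAAAACCAACCCCGGCTGATCGGAAACAGGCA
max_local_score "$read_seq" 2   # prints 84, the AS:i value reported for Mouse-exact_seq
```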
09-01-2015, 12:38 PM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Ouch. Cloudflare ate my response

Anyway -

You can split into mouse and non-mouse reads with BBMap like this:

bbmap.sh ref=mm9.fa in=reads.fq outm=mouse.fq outu=nonmouse.fq

For more elaborate splitting into one set of reads per organism (specifically, per reference file), you can use BBSplit:

bbsplit.sh ref=mm9.fa,virus1.fa,virus2.fa in=reads.fq basename=out_%.fq outu=unmapped.fq


Each organism needs to be represented by a single file (using cat, as GenoMax mentioned).

Aligners limit how different a read may be from the reference and still align. The higher the identity of an alignment, the more likely it is to be correct, so aligners generally focus on alignments with 90% identity or higher. You can adjust this in BBMap using the "idfilter" flag. There is no real concept of 0% similarity; even a random sequence will align to the mouse genome with at least 25% identity or so. "Mapped" just means "the aligner thinks the read came from this location", so the definition varies by aligner. The original Bowtie (version 1), for example, rejects any alignment with indels or with more than 3 mismatches.
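
To make the identity figure concrete, here is a rough sketch that compares the two 42-base reads from the first post position by position. It is an ungapped comparison only, which is simpler than what an aligner computes, and the function name percent_identity is invented for illustration.

```shell
# Percent identity between two equal-length sequences, ungapped.
percent_identity() {
    # $1, $2 = sequences of the same length
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = length(a); same = 0
        for (i = 1; i <= n; i++)
            if (substr(a, i, 1) == substr(b, i, 1)) same++
        printf "%.1f\n", 100 * same / n
    }'
}

exact=GAGCAAACAGAAAAACCAACCCCGGCTGATCGGAAACAGGCA
mutated=GAGCAAACAGAAAAACCAAAAAAGGCTGATCGGAAACAGGCA
percent_identity "$exact" "$mutated"   # 38 of 42 bases agree
```

That puts the read with four mismatches at roughly 90% identity to the exact sequence.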

Last edited by Brian Bushnell; 09-01-2015 at 12:41 PM.
Tags
bowtie, bowtie 2, bowtie call function
