SEQanswers

Old 01-26-2010, 06:46 AM   #1
staylor
Member
 
Location: Oxford

Join Date: Feb 2009
Posts: 17
Default miRNA mapping using BOWTIE

Hi,

Can bowtie be used for mapping miRNAs to the genome and, if so, what are the best parameters to use? I have FASTQ files from which I have removed the adapter sequence, leaving 18-23mers.

Would

bowtie -l 18 --best --strata

be appropriate?

Thanks.
Old 01-29-2010, 06:12 AM   #2
whsqwghlm
Member
 
Location: Cambridge, UK

Join Date: Jun 2009
Posts: 14
Default

We've been using the following to report up to 101 exact matches per read:
bowtie -k 101 -v 0

Our workflow uniquifies the sequences before alignment so we're not concerned about quality values. I'm also guessing that the miRNA sequences are sufficiently conserved for us not to worry about mismatches.

However, I'm very interested in the views of others on this.
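
A minimal sketch of the "uniquify before alignment" step mentioned above, assuming plain 4-line FASTQ records. The function names and the `seqN_xCOUNT` naming scheme are hypothetical and are not taken from the poster's pipeline. It collapses identical reads into one FASTA entry each and keeps the copy number in the read name, so that counts can be recovered after mapping.

```python
from collections import Counter


def collapse_fastq(lines):
    """Count identical read sequences in 4-line FASTQ records."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[line.strip().upper()] += 1
    return counts


def to_fasta(counts):
    """Yield FASTA lines, most abundant sequence first.

    The copy number is stored in the read name (hypothetical naming scheme).
    """
    for n, (seq, count) in enumerate(counts.most_common(), start=1):
        yield f">seq{n}_x{count}"
        yield seq


if __name__ == "__main__":
    demo = [
        "@r1", "TGAGGTAGTAGGTTGTATAGTT", "+", "I" * 22,
        "@r2", "TGAGGTAGTAGGTTGTATAGTT", "+", "I" * 22,
        "@r3", "TAGCTTATCAGACTGATGTTGA", "+", "I" * 22,
    ]
    print("\n".join(to_fasta(collapse_fastq(demo))))
```

The resulting FASTA can be given to bowtie with `-f`; because the quality values are discarded, only quality-independent settings such as `-v 0` make sense with it.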
Old 02-02-2010, 06:21 AM   #3
yjhua2110
Member
 
Location: china

Join Date: Nov 2009
Posts: 67
Default

In our deepBase database, we use the options -k 200 -v 0. Specifying these parameters instructs Bowtie to report up to 200 perfect hits for each read.

deepBase is a platform for annotating and discovering small and long ncRNAs from next generation sequencing data. It is available at http://deepbase.sysu.edu.cn
Old 02-02-2010, 06:49 AM   #4
houhuabin
Member
 
Location: wenzhou.zhejiang.china

Join Date: Apr 2009
Posts: 23
Default

Are you looking for this?
http://seqanswers.com/forums/showthr...light=mirtools

Last edited by houhuabin; 02-02-2010 at 06:58 AM.
Old 02-02-2010, 06:55 AM   #5
whsqwghlm
Member
 
Location: Cambridge, UK

Join Date: Jun 2009
Posts: 14
Default

Could well be. However, the link is broken. I would be very grateful if you could fix it. Thanks!
Old 02-02-2010, 06:59 AM   #6
houhuabin
Member
 
Location: wenzhou.zhejiang.china

Join Date: Apr 2009
Posts: 23
Default

Sorry about that; it is fixed now.

Thanks!

Last edited by houhuabin; 02-02-2010 at 07:03 AM.
Old 02-02-2010, 09:37 AM   #7
whsqwghlm
Member
 
Location: Cambridge, UK

Join Date: Jun 2009
Posts: 14
Default

After a few days of struggling with quality/homopolymer/adaptor trimming of my reads, and reading about 3' RNA edits and so forth, I've decided to try something similar to staylor's original suggestion (and similar to the algorithm used by miRanalyzer):

bowtie -n 0 -l 15 --best

This should give the best match(es) for an exact 15bp 5' seed. If anyone is interested in a direct comparison between this and the original (-v 0) parameters, or has another view on this, please let me know.
Old 02-03-2010, 11:58 AM   #8
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

So what is your post-processing? What is the reference sequence? And how do you summarize the data?
Old 02-04-2010, 12:50 AM   #9
whsqwghlm
Member
 
Location: Cambridge, UK

Join Date: Jun 2009
Posts: 14
Default

In terms of post-processing, we're loading the alignments into an Ensembl database so that we can screen for known genes and repeats. We then predict novel small RNAs and estimate transcript counts for all loci based on read coverage. It's designed to be a generic pipeline for metazoa. As everything is in an Ensembl database, the results can be browsed and ad-hoc reports generated.
Old 02-25-2010, 04:23 AM   #10
staylor
Member
 
Location: Oxford

Join Date: Feb 2009
Posts: 17
Default

Quote:
Originally Posted by whsqwghlm View Post
In terms of post-processing, We're loading the alignments into an Ensembl database so that we can screen for known genes and repeats. We then predict novel small RNAs, and estimate transcript counts for all loci based on read coverage. It's designed to be a generic pipeline for metazoa. As everything is in an Ensembl database the results can be browsed, and ad-hoc reports generated.
For some reason I didn't get emailed about the activity on my post so I thought no-one was interested! Looks like people have been thinking about it...

whsqwghlm - how did you get on with the mapping? Did the parameters work?
Old 02-25-2010, 04:49 AM   #11
whsqwghlm
Member
 
Location: Cambridge, UK

Join Date: Jun 2009
Posts: 14
Default

Yes! We ended up using:
bowtie -n 0 -l 15 -e 99999 -k 200 --best --chunkmbs 128

We then post-processed the alignments to keep the one with the longest exact 5' match (we could not find a way to get bowtie to do this natively). The preparation of our library helped: it had been poly-A filled, and the 3' primer was terminated with a poly-T chain. We did not bother to poly-A trim the reads (i.e. remove the primer), as we did not want to lose any 'real' As at the ends of sequences.

I'm still generating comparisons with other bowtie configs, and I also need to test the pipeline against a GEO data set with 'normal' primers.
Old 02-25-2010, 06:15 AM   #12
staylor
Member
 
Location: Oxford

Join Date: Feb 2009
Posts: 17
Default

Ah excellent. I will try that. Thanks for the tip!
Old 02-25-2010, 12:55 PM   #13
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

Quote:
Originally Posted by whsqwghlm View Post
In terms of post-processing, We're loading the alignments into an Ensembl database so that we can screen for known genes and repeats. We then predict novel small RNAs, and estimate transcript counts for all loci based on read coverage. It's designed to be a generic pipeline for metazoa. As everything is in an Ensembl database the results can be browsed, and ad-hoc reports generated.
Are you using miRBase for mapping, or the whole human genome?
Old 02-28-2010, 12:52 PM   #14
whsqwghlm
Member
 
Location: Cambridge, UK

Join Date: Jun 2009
Posts: 14
Default

We're aligning against the whole genome. Reads that do not align to the genome are aligned to miRBase (all species) just in case the assembly is incomplete.
Old 03-01-2010, 05:33 AM   #15
staylor
Member
 
Location: Oxford

Join Date: Feb 2009
Posts: 17
Default

So are you filtering on the one with the smallest NM value with the longest read?

If you get multiple matches and they all score equally do you pick one at random?
Old 03-01-2010, 05:44 AM   #16
whsqwghlm
Member
 
Location: Cambridge, UK

Join Date: Jun 2009
Posts: 14
Default

Smallest NM value? Sorry - you lost me...

The idea is to record the hit(s) with the longest identical 5' match(es) to the genome, the theory being that primer artefacts, sequencing errors and RNA edits are all concentrated at the 3' end. We also assume that natural variation is absent for miRNAs. If we get multiple matches with the same score, then all of the matches are recorded.
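
A rough sketch of this selection rule, not the actual pipeline code. It assumes bowtie 1's default tab-separated output, in which field 5 is the read sequence and field 8 is a comma-separated list of mismatch descriptors of the form `offset:ref>read`, with offsets counted 0-based from the 5' end of the read (worth checking against the manual for your bowtie version). The function names are hypothetical. Under that assumption, the length of the exact 5' match is the smallest mismatch offset, or the full read length if there are no mismatches, and all hits tied for the top score are kept.

```python
from collections import defaultdict


def five_prime_match_length(read_len, mismatch_field):
    """Length of the exact match starting at the 5' end of the read."""
    if not mismatch_field:
        return read_len  # no mismatches: the whole read matches
    offsets = [int(d.split(":")[0]) for d in mismatch_field.split(",")]
    return min(offsets)  # bases before the first mismatch


def best_hits(lines):
    """Keep, per read, every hit sharing the longest exact 5' match."""
    hits = defaultdict(list)
    for line in lines:
        f = line.rstrip("\n").split("\t")
        name, strand, ref, pos, seq = f[0], f[1], f[2], int(f[3]), f[4]
        mismatches = f[7] if len(f) > 7 else ""
        score = five_prime_match_length(len(seq), mismatches)
        hits[name].append((score, ref, pos, strand))
    best = {}
    for name, candidates in hits.items():
        top = max(score for score, *_ in candidates)
        # ties are all recorded, as described above
        best[name] = [c for c in candidates if c[0] == top]
    return best


if __name__ == "__main__":
    seq, qual = "ACGTACGTACGTACGTACGT", "I" * 20
    demo = [
        f"r1\t+\tchr1\t100\t{seq}\t{qual}\t0\t18:A>T",
        f"r1\t-\tchr2\t500\t{seq}\t{qual}\t0\t",
        f"r1\t+\tchr3\t900\t{seq}\t{qual}\t0\t",
    ]
    for name, kept in best_hits(demo).items():
        print(name, kept)
```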
Old 03-01-2010, 05:58 AM   #17
staylor
Member
 
Location: Oxford

Join Date: Feb 2009
Posts: 17
Default

Clearly not using NM then!:-) In SAM format, NM = the number of nucleotide differences to the reference sequence. I thought this may be a useful tag for filtering. Or do you just count the length of the match?

Do you reject sequences longer than 22bp?
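
For illustration, a small sketch of filtering on NM, assuming the aligner writes SAM with an `NM:i:` optional field on each aligned record. The helper names are hypothetical. Optional fields start at column 12 of a SAM record, so the tag is looked up there, and for each read only the records with the lowest NM are kept.

```python
def nm_value(sam_line):
    """Return the NM (edit distance) value of a SAM record, or None if absent."""
    for tag in sam_line.rstrip("\n").split("\t")[11:]:
        if tag.startswith("NM:i:"):
            return int(tag[len("NM:i:"):])
    return None


def min_nm_records(sam_lines):
    """Keep, per read name, the records with the smallest NM value."""
    by_read = {}
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        nm = nm_value(line)
        if nm is None:  # e.g. unaligned reads carry no NM tag
            continue
        qname = line.split("\t", 1)[0]
        by_read.setdefault(qname, []).append((nm, line.rstrip("\n")))
    result = {}
    for qname, records in by_read.items():
        lowest = min(nm for nm, _ in records)
        result[qname] = [rec for nm, rec in records if nm == lowest]
    return result


if __name__ == "__main__":
    seq, qual = "ACGTACGTACGTACGTACGT", "I" * 20
    demo = [
        "@HD\tVN:1.0\tSO:unsorted",
        f"r1\t0\tchr1\t101\t255\t20M\t*\t0\t0\t{seq}\t{qual}\tNM:i:0",
        f"r1\t0\tchr2\t501\t255\t20M\t*\t0\t0\t{seq}\t{qual}\tNM:i:1",
    ]
    for name, kept in min_nm_records(demo).items():
        print(name, len(kept), "record(s) kept")
```

Note that NM counts differences anywhere in the read, so it does not distinguish 5' from 3' mismatches the way the longest-5'-match rule does.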
Old 06-10-2010, 11:37 AM   #18
didymos
Junior Member
 
Location: Poland

Join Date: Jun 2010
Posts: 8
Default

I am new to the miRNA field and I am wondering why you are using the -k 200 (or -k 101) option. In other words, why do you want up to 200 alignments with 0 mismatches, rather than one unique alignment with 0 mismatches?
Thanks!

tomek
Old 06-14-2010, 06:00 AM   #19
whsqwghlm
Member
 
Location: Cambridge, UK

Join Date: Jun 2009
Posts: 14
Default

Each read may map exactly to many places in the genome. We want to capture all of these locations up to a promiscuity threshold, typically 100, above which we discard all of the mappings for that read (i.e. if 101 alignments are returned from the search).
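
As a sketch of that thresholding step (the function name and the input layout are assumptions, not the real pipeline): alignments are grouped by read, and any read that comes back with more than `max_hits` alignments is dropped entirely. With `-k 101`, a read returning 101 alignments has at least 101 genomic locations, so it exceeds a threshold of 100.

```python
from collections import defaultdict


def filter_promiscuous(alignments, max_hits=100):
    """Group (read_name, hit) pairs by read and drop over-threshold reads."""
    grouped = defaultdict(list)
    for name, hit in alignments:
        grouped[name].append(hit)
    # a read is kept only if it maps to at most max_hits locations
    return {name: hits for name, hits in grouped.items() if len(hits) <= max_hits}


if __name__ == "__main__":
    demo = [("repeat_read", ("chr1", i)) for i in range(101)]
    demo += [("mir_read", ("chr9", 96938244)), ("mir_read", ("chr11", 122017276))]
    kept = filter_promiscuous(demo)
    print(sorted(kept))  # only mir_read survives the threshold
```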
Old 06-14-2010, 06:23 AM   #20
yjhua2110
Member
 
Location: china

Join Date: Nov 2009
Posts: 67
Default

Quote:
Originally Posted by whsqwghlm View Post
Each read may map exactly to many places in the genome. We want to capture all of these locations to a threshold promiscuity, typically 100, over which we discard all of the mappings (i.e. if 101 alignments are returned from the search).
You can use the options -a -m 100. Specifying -m 100 instructs bowtie to refrain from reporting any alignments for reads having more than 100 reportable alignments.