SEQanswers

Old 05-29-2012, 11:49 AM   #1
akbowser
Junior Member
 
Location: Maritime Canada

Join Date: Apr 2012
Posts: 5
Newbie... need help with the basics

Hello Everyone!

If someone wouldn't mind helping me along here I would really appreciate it...

I have a bunch of sequences from different species. Using BLAST I've been able to identify a few of them, but many are unknown (no strong BLAST matches). They were all amplified with the same primer pair, but produced amplicons of different sizes. They don't align unless there are a million (or so it seems) gaps, and some of the sequences are so divergent they don't align at all!

I would eventually like to draw a tree to give insight into where an unidentified sequence belongs.

Do all those gaps affect how the tree will be constructed? What do I do about the sequences that I can't align with the others??

Is there a book that might help me out with this?

Any advice I can get would be great.

Thanks,
Kirsten
Old 05-29-2012, 12:47 PM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

If you can't align the sequences because they are too different, you shouldn't make a tree out of them.
Old 05-29-2012, 01:24 PM   #3
Artem
Junior Member
 
Location: Vancouver, BC

Join Date: May 2012
Posts: 6

To construct a tree you want the sequences to be homologous, i.e. to share a common evolutionary origin. A good introduction to bioinformatics and trees can be found at the link below. It's targeted at biology students, so it's more straightforward to understand than most bioinformatics texts.
http://helix.biology.mcmaster.ca/courses.html

As to your experiment: a primer pair doesn't only amplify the region you are interested in. It will also amplify any other sequence that happens to match that primer pair, which can occur by chance (remember the genome is not uniform; some sequences are more common than others).

If you amplify a region in many species, in some you may be amplifying one locus, and in others you can amplify a completely different one.

Picture the target as AB cdefghi JK, where AB and JK are the sites matched by your primer pair and ABcdefghiJK is the locus you are interested in. Some species may instead have AB q835%9 JK, where the stretch between the primers is completely unrelated in evolutionary terms, and therefore you shouldn't be building a tree to compare them.
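
A toy illustration of this point, reusing the placeholder letters from the post (they are not real nucleotides). The function name `amplicon_insert` and the flanking `xx`/`yy` padding are made up for this sketch, and for simplicity the second primer site is written as it appears on the same strand. The same primer pair brackets both the intended locus and an unrelated stretch, so a successful amplification on its own says nothing about homology.

```python
def amplicon_insert(template, fwd_primer, rev_site):
    """Return the region between the two primer sites, or None if a site is missing."""
    start = template.find(fwd_primer)
    if start == -1:
        return None
    insert_start = start + len(fwd_primer)
    # Look for the second primer site only downstream of the first one.
    end = template.find(rev_site, insert_start)
    if end == -1:
        return None
    return template[insert_start:end]


# Hypothetical templates: both carry the AB and JK primer sites.
species_1 = "xxABcdefghiJKxx"   # the locus of interest
species_2 = "yyABq835%9JKyy"    # an unrelated region that happens to match the primers

for name, seq in [("species_1", species_1), ("species_2", species_2)]:
    print(name, amplicon_insert(seq, "AB", "JK"))
```

Both templates yield a product, but the inserts have different lengths and nothing in common, which matches the symptoms described in the first post.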

Hope that helps.
Old 05-29-2012, 02:14 PM   #4
Mark
Member
 
Location: Raleigh, NC

Join Date: Nov 2008
Posts: 51

What is the purpose of this work (other than the desire to draw a tree)?
Old 05-29-2012, 10:27 PM   #5
mike.t
Member
 
Location: Spain

Join Date: Mar 2010
Posts: 36

Try reverse complementing the sequences that don't align with the others and see whether they align then.
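
If you want to do that without a dedicated tool, here is a minimal sketch in plain Python. The function name `reverse_complement` is just a label chosen here, and the sketch assumes plain A/C/G/T sequences; any other character, such as N, is left unchanged.

```python
# Translation table for the four standard bases, upper and lower case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("AAGC"))
```

Applying the function twice returns the original sequence, which is a quick sanity check before re-running the alignment.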
Old 06-21-2012, 04:58 PM   #6
akbowser
Junior Member
 
Location: Maritime Canada

Join Date: Apr 2012
Posts: 5

Thanks for the replies so far.

The purpose of my work is to identify species within a mixed sample of unknown composition. The problem is that there is no complete reference database for me to use to identify all of my sequences. I figured a tree was my best bet at assigning some type of taxonomic identity to my unknown sequences, but now I'm seeing that some people use operational taxonomic units (OTUs) for this type of work. I started looking into programs that deal with OTUs, but I am already extremely intimidated by the basic programming skills required to run such programs. I don't know where to begin! Please help!
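
For anyone else new to the idea, the core of OTU picking can be sketched in a few lines: each sequence joins the first existing cluster whose seed it resembles closely enough, and otherwise starts a new cluster. This is only a rough illustration under stated assumptions. The names `similarity` and `greedy_otus` are invented here, the similarity score comes from Python's `difflib` rather than from a proper sequence alignment, and the 0.97 default only mirrors a cutoff often quoted for marker-gene work. Dedicated OTU programs are far more careful and far faster.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first OTU whose seed is similar enough."""
    otus = []  # list of (seed_sequence, member_sequences)
    for seq in seqs:
        for seed, members in otus:
            if similarity(seq, seed) >= threshold:
                members.append(seq)
                break
        else:
            # No existing seed was close enough, so this sequence seeds a new OTU.
            otus.append((seq, [seq]))
    return otus


# Made-up toy reads, with a loose threshold so the short examples can cluster.
reads = ["ACGTACGTAC", "ACGTACGTAG", "TTTTTTTTTT"]
for seed, members in greedy_otus(reads, threshold=0.8):
    print(seed, len(members))
```

Note that the result depends on the order of the input, because the first sequence seen in each cluster becomes its seed.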
Old 06-22-2012, 04:57 AM   #7
Wurstmensch
Junior Member
 
Location: Germany

Join Date: Aug 2011
Posts: 6

You could try a metagenomic analysis program like MEGAN (http://ab.inf.uni-tuebingen.de/software/megan/). In my opinion it is easy to get started with: you only have to BLAST your reads against a sufficiently comprehensive database and then import the results into the program. But beware that BLASTing a bunch of sequences can take a long time. In addition, some output formats need a lot of disk space, so choosing the right one at the start could save you a lot of time.
Old 06-22-2012, 05:45 AM   #8
Mark
Member
 
Location: Raleigh, NC

Join Date: Nov 2008
Posts: 51

Yes, MEGAN is a useful tool for this. When you say you have a bunch of sequences, do you mean 100s, 1000s, or 1000000s? Note that when using MEGAN one should generally interpret the output as "these sequences are most similar to sequences in taxon X", not "these sequences are from taxon X". This is particularly true the nearer to species level you go (MEGAN can make taxonomic assignments at multiple levels).
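
To see why the assignment level varies, it helps to picture the lowest-common-ancestor idea that MEGAN is built around: a read is placed at the deepest taxon shared by all of its database hits. The following is a simplified sketch, not MEGAN's actual code. The function name `lowest_common_ancestor` and the shortened example lineages are assumptions made for illustration, and the real program also filters hits by score before placing a read.

```python
def lowest_common_ancestor(lineages):
    """Return the shared leading part of several root-to-leaf lineages."""
    if not lineages:
        return []
    common = []
    # zip stops at the shortest lineage, so only ranks present in every hit are compared.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            common.append(ranks[0])
        else:
            break
    return common


# Illustrative lineages for the hits of one read (ranks abbreviated).
hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacteriaceae", "Escherichia"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacteriaceae", "Salmonella"],
]
print(lowest_common_ancestor(hits))
```

Here the read is placed at family level because its hits disagree on genus. With an incomplete reference database, many reads will likewise land at higher ranks, which is why the output is best read as "most similar to taxon X".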