SEQanswers

Old 07-29-2010, 08:03 AM   #1
Fwip
Junior Member
 
Location: Rochester

Join Date: Jul 2010
Posts: 6
Question Completely new to this and out of my depth

Hi, I am a complete newbie to the entire realm of sequence analysis. My background is in Computer Engineering, so I don't understand a lot of the details on how primers, etc. work.

What I'm trying to do is, I think, pretty straightforward. We've received data from Roche, including the sff files which contain the raw reads. Previously, I have worked with other data, starting from Fasta files which contain about 10-100 aligned sequences. That is the type of file I would like to produce.

Currently, I am attempting to do this with the AVA software. I create a project, define the appropriate amplicons and MID sequences, and hook up the samples and data with multiplexers. I _think_ that I am doing all of this correctly; the manual was very helpful.

To get the FASTA files, I am using the doAmplicon 'report align' command. When I report the consensus files, I get about 1,000 sequences. Individually, I have about 10,000 sequences.

I don't mind having a lot of sequences, but the similarity in them is very high. Infections that are known to have existed for a while are exhibiting low diversity, characteristic of much more recent infections.

I am not sure if my workflow is appropriate for what I am trying to do - perhaps I would be better served by the gsMapper or gsAssembler programs?

Any advice would be very much appreciated.
Thank you,
-Fwip

P.S: There is probably relevant important information that I am not including - please ask for clarification and I will do my best to answer. Thank you!
Old 07-29-2010, 08:19 AM   #2
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,318
Default

Hi Fwip,
Could you give us some details of the experiment lab-side? Did you create the amplicons or did Roche? How much starting material was used in the PCR reaction to create the amplicons, and how did you assay the amount of starting material? Also, what was the starting material (e.g., RNA, ssDNA, dsDNA)?
--
Phillip
Old 07-29-2010, 08:22 AM   #3
Fwip
Junior Member
 
Location: Rochester

Join Date: Jul 2010
Posts: 6
Default

Oh goodness... I have very little knowledge of what went on lab-side. I am reasonably sure that Roche created the amplicons. I have no idea on the other questions. If it helps, this is data on HIV infection.

Sorry I can't be more helpful here.
Old 07-29-2010, 10:37 AM   #4
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,318
Default

Well then suffice it to say that if the amount of RNA given to Roche was limiting, the number of viruses assayed may not have been sufficient to detect any but the most common variants.

Amplicons employ PCR. PCR, as long as it gets a single, amplifiable, chunk of DNA to start with, can usually produce lots of DNA given enough cycles of amplification.
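As a rough illustration of the sampling argument above, here is a minimal Python sketch. It assumes each input template is an independent draw from the viral population, which is a simplification; the function name and the example numbers are made up for illustration and are not from this experiment.

```python
def detection_probability(variant_freq: float, n_templates: int) -> float:
    """Chance that at least one of n_templates sampled genomes carries a
    variant present at frequency variant_freq in the population.

    Assumes independent sampling of templates (a simplification).
    """
    return 1.0 - (1.0 - variant_freq) ** n_templates


# Hypothetical example: a variant at 1% frequency, with different
# numbers of viral templates going into the PCR.
for n in (10, 100, 1000):
    print(n, round(detection_probability(0.01, n), 3))
```

If only a handful of templates made it into the reaction, a rare variant is unlikely to be there at all, no matter how many reads are sequenced afterwards.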

That said, I find the AVA software to be quite inscrutable. So, it is also possible that AVA is hiding most of your variants for one reason or another. There are ways, however, to force it to reveal them all.

--
Phillip
Old 07-30-2010, 06:13 AM   #5
Fwip
Junior Member
 
Location: Rochester

Join Date: Jul 2010
Posts: 6
Default

Hmmm, thank you. I guess it is certainly possible that the samples were not terribly informative and represent only a small portion of the genetic diversity.

The results seem similar for all of the samples that I have analyzed so far, so I am still hoping it is user error on my part.

I'll keep toying around with this software, then.

Thank you,
Fwip
Old 07-30-2010, 07:43 AM   #6
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,318
Default

I hasten to add that there are some default settings that would tend not to display rarer variants. But those become obvious as you tinker around with the AVA GUI.

--
Phillip
Old 08-02-2010, 07:38 AM   #7
Fwip
Junior Member
 
Location: Rochester

Join Date: Jul 2010
Posts: 6
Default

The more I try to work this, the more I think that my workflow may be incorrect. I haven't had any training with this software, so I am not even sure what programs I should be using, or in what order.

Here is where I am, and my goals:

The data we received included read data from eight gaskets, split up into 8 SFF files. There was also an FNA and QUAL file for each gasket, which I believe was derived from the respective SFF file. Each gasket had 4 MID tags. All samples had an identical forward 5' primer and one of two different reverse 5' primers, but are still uniquely identified by their gasket/MID combo.

I would like to get a FASTA file for each gasket/MID combination, representing the reconstructed sequences. If a variant is very common, it should show up in the FASTA file multiple times. Alternatively, if I could get a number representing the frequency at which it appears, that would work.
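To make the "frequency" alternative concrete, here is a minimal Python sketch that collapses identical sequences from a FASTA file into counts and fractions. It is not part of the Roche software; the function names are hypothetical, and it assumes the reads for one gasket/MID combination have already been exported to FASTA and trimmed to the same region, so that identical variants give identical strings.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def variant_frequencies(lines: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Return (sequence, count, fraction) for each distinct sequence,
    most common first."""
    counts = Counter(seq.upper() for _, seq in read_fasta(lines))
    total = sum(counts.values())
    return [(seq, n, n / total) for seq, n in counts.most_common()]


# Tiny made-up example; with a real file, pass an open file handle instead.
example = [">r1", "ACGT", ">r2", "ACGT", ">r3", "ACGA"]
for seq, n, frac in variant_frequencies(example):
    print(seq, n, round(frac, 3))
```

Exact-string matching treats every sequencing error as a new variant, so on real 454 reads the tail of singletons should be interpreted with caution.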

What should my workflow look like for this? Should I do this entirely in gsMapper, gsAssembler, or gsAmplicon? Should I take the output of one program and analyze it in another?

Thank you very much for your help,
Fwip
Old 08-02-2010, 07:54 AM   #8
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,318
Default

gsAmplicon should be able to handle the .sff files as they are, although you may need to specify the info for the MIDs.

gsMapper -- there you would probably want to use sfffile to break the .sff into separate MID sffs. Given your MID structure this would not be trivial, although it should be do-able.
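For anyone who wants to see what splitting by MID amounts to, here is a minimal Python sketch of the idea. It is not sfffile and does not read .sff files; it assumes the reads have already been exported as (name, sequence) pairs, that the MID sits at the very start of each read, and that only exact matches count. The MID names and tag sequences below are placeholders, not Roche's real MIDs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, str]  # (read name, sequence)


def split_by_mid(reads: Iterable[Read], mids: Dict[str, str]) -> Dict[str, List[Read]]:
    """Bin reads by the MID tag found at the start of the sequence.

    The tag is removed from reads that match; reads matching no tag
    go into the "unassigned" bin.
    """
    bins: Dict[str, List[Read]] = defaultdict(list)
    for name, seq in reads:
        seq = seq.upper()
        for mid_name, tag in mids.items():
            if seq.startswith(tag.upper()):
                bins[mid_name].append((name, seq[len(tag):]))
                break
        else:
            bins["unassigned"].append((name, seq))
    return dict(bins)


# Placeholder tags and reads, for illustration only.
mids = {"MID_A": "ACGT", "MID_B": "TGCA"}
reads = [("r1", "ACGTGGGG"), ("r2", "TGCACCCC"), ("r3", "NNNNAAAA")]
for mid_name, members in split_by_mid(reads, mids).items():
    print(mid_name, members)
```

Running this once per gasket gives one bin per gasket/MID combination. A real tool would also need to tolerate sequencing errors within the tag, which this sketch deliberately ignores.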

Seems like your best bet is gsAmplicon. Are you using the GUI?

--
Phillip
Old 08-02-2010, 08:29 AM   #9
Fwip
Junior Member
 
Location: Rochester

Join Date: Jul 2010
Posts: 6
Default

Thanks for the help.

I am currently primarily using the GUI, just because it gives me the most feedback on what I am doing. I'm using the command-line tool to extract the data once the computation is complete though (using 'report align'), because I could not figure out how to do that with the GUI.

-Fwip
Old 08-04-2010, 01:06 PM   #10
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Quote:
Originally Posted by Fwip View Post
...

I would like to get a FASTA file for each gasket/MID combination, representing the reconstructed sequences. If a variant is very common, it should show up in the FASTA file multiple times. Alternatively, if I could get a number representing the frequency at which it appears, that would work.
...
What should my workflow look like for this? Should I do this entirely in gsMapper, gsAssembler, or gsAmplicon? Should I take the output of one program and analyze it in another?
Working with amplicons, you should use gsAmplicon and doAmplicon. Both are complex programs which take a while to master. Thus it is hard to troubleshoot what problems you may be having. However, having 1000s of consensus sequences for a given sample and reference sounds high.

Phillip's and my most recent project -- admittedly a simple one with 1 amplicon, 4 samples/MIDs, and 1 reference -- generates between 15 and 23 consensus sequences per sample. This is starting from 65,000 to 100,000+ reads per sample. We also only found 4 variants in all of the samples. Such a low variance may explain our low number of consensus sequences.

Perhaps, in order to aid troubleshooting, you could run a 'list' of your 'amplicon', 'mid', 'sample' and 'variant' within doAmplicon and tell us, if not the exact entries, at least the counts of what you have. This could help us figure out how complex your project is.
Old 08-05-2010, 05:12 AM   #11
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,318
Default

Quote:
Originally Posted by westerman View Post

Phillip's and my most recent project -- admittedly a simple one with 1 amplicon, 4 samples/MIDs, and 1 reference -- generates between 15 and 23 consensus sequences per sample. This is starting from 65,000 to 100,000+ reads per sample. We also only found 4 variants in all of the samples. Such a low variance may explain our low number of consensus sequences.
Ours were from a host gene. Fwip's are from a virus. Completely different set of expectations. The high error rate of HIV replication is thought to aid it in evading the host immune system. Fwip actually expected to see more variants than he did.

--
Phillip
Old 08-05-2010, 10:01 AM   #12
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104
Default

Quote:
Originally Posted by pmiguel View Post
Ours were from a host gene. Fwip's are from a virus. Completely different set of expectations. The high error rate of HIV replication is thought to aid it in evading the host immune system. Fwip actually expected to see more variants than he did.

--
Phillip
No doubt Fwip's experiment should be expected to produce different results than ours did. Did Fwip actually mention the number of variants he saw? I did not see it in the posts. Perhaps he sent you private mail?

Anyway, by having him run various 'list' commands (via doAmplicon) I can get a feel for whether he is setting up and running the Amplicon software correctly. As you probably recall, getting the MID part set up properly was tricky for our experiment.

Fwip: If you want to send private email to Phillip and/or myself then please do so. My address is just my user name at purdue.edu
Old 08-05-2010, 10:21 AM   #13
Fwip
Junior Member
 
Location: Rochester

Join Date: Jul 2010
Posts: 6
Default

Thank you both for the help - it turns out that a large part of it, at least, was that my reference sequence was set up incorrectly. I had not realized that the reference sequence included the primers, and so I was not using the correct reverse primer for the data I was looking at. On top of this, I wasn't narrowing the amplicon range to only include the area between the primers, which could not have helped things.
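For anyone hitting the same problem, here is a minimal Python sketch of how the region between the primers can be located in a reference that includes them. It assumes the reference is written on the forward strand, that both primers match exactly, and that the range is wanted as 1-based inclusive coordinates; whether AVA expects exactly that convention is an assumption to check against the manual. The function names and the toy sequences are hypothetical.

```python
from typing import Tuple


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (A/C/G/T only)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def target_range(reference: str, fwd_primer: str, rev_primer: str) -> Tuple[int, int]:
    """Return the 1-based inclusive (start, end) of the region that lies
    between the forward primer and the reverse primer's binding site."""
    ref = reference.upper()
    fwd = fwd_primer.upper()
    # The reverse primer appears in the reference as its reverse complement.
    rev_site = reverse_complement(rev_primer.upper())

    fwd_pos = ref.find(fwd)
    if fwd_pos == -1:
        raise ValueError("forward primer not found in reference")
    rev_pos = ref.rfind(rev_site)
    if rev_pos == -1:
        raise ValueError("reverse primer site not found in reference")

    first_target = fwd_pos + len(fwd)  # 0-based index of first target base
    if rev_pos < first_target:
        raise ValueError("reverse primer site overlaps or precedes forward primer")
    return first_target + 1, rev_pos


# Toy reference: forward primer + target + reverse-primer site.
reference = "AAAC" + "GGGTTT" + "CCAT"
start, end = target_range(reference, "AAAC", "ATGG")
print(start, end, reference[start - 1:end])
```

A ValueError here is a quick sign that the wrong reverse primer is being paired with a given reference, which is the mistake described above.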

I've rerun one gasket with these corrections, and already the data looks much better. Thank you!
Old 10-26-2011, 10:54 AM   #14
RubyTuesday
Junior Member
 
Location: Panama

Join Date: Oct 2011
Posts: 4
Default

Hello, I'm not sure how relevant this is to you now, considering the date, and I may not have accurately gauged the question you're asking, but you could run the relevant sections of your sequences through the online Stanford DB for drug-resistance mutations in HIV as a validation of the results you're getting.
Old 12-06-2011, 03:02 PM   #15
Himalaya
Member
 
Location: uk

Join Date: Jun 2010
Posts: 38
Default

Hello
I am also using the AVA (Amplicon Variant Analyzer) software. The software produces the consensus sequence, but I would like to change the parameters for producing the consensus alignment. Does anyone know how to change the consensus-generating parameters of the AVA software, or know of any script that can generate a consensus alignment from AVA multiple aligned sequences?
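As one possible starting point for such a script, here is a minimal Python sketch of a majority-rule consensus over already-aligned sequences. It is not AVA's algorithm, whose consensus parameters are not described in this thread. It assumes all sequences have the same aligned length, treats the gap character as just another symbol, and uses a hypothetical `min_fraction` threshold below which a column is written as N.

```python
from collections import Counter
from typing import Sequence


def consensus(aligned: Sequence[str], min_fraction: float = 0.5,
              ambiguous: str = "N") -> str:
    """Majority-rule consensus of equal-length aligned sequences.

    A column gets its most common symbol if that symbol makes up at
    least min_fraction of the column; otherwise it gets `ambiguous`.
    Ties are broken by first occurrence in the input order.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    out = []
    for column in zip(*aligned):
        symbol, count = Counter(c.upper() for c in column).most_common(1)[0]
        out.append(symbol if count / len(column) >= min_fraction else ambiguous)
    return "".join(out)


# Made-up alignment, for illustration only.
aligned = ["ACGT", "ACGA", "ACTA"]
print(consensus(aligned))
print(consensus(aligned, min_fraction=0.9))
```

Raising `min_fraction` makes the consensus more conservative: columns where the reads disagree are masked rather than resolved by a slim majority.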