**pmiguel** · 07-29-2010, 08:19 AM

Hi Fwip,
Could you give us some details of the experiment lab-side? Did you create the amplicons or did Roche? How much starting material was used in the PCR reaction to create the amplicons, and how did you assay the amount of starting material? Also, what was the starting material (e.g., RNA, ssDNA, dsDNA)?
--
Phillip

**Fwip** · 07-29-2010, 08:22 AM

Oh goodness... I have very little knowledge of what went on lab-side. I am reasonably sure that Roche created the amplicons. I have no idea on the other questions. If it helps, this is data on HIV infection.

Sorry I can't be more helpful here.

**pmiguel** · 07-29-2010, 10:37 AM

Well then suffice it to say that if the amount of RNA given to Roche was limiting, the number of viruses assayed may not have been sufficient to detect any but the most common variants.
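As a rough illustration of the sampling limit described above: the chance of seeing a variant at all depends on how many viral templates actually went into the PCR. A minimal sketch, assuming each template is an independent draw from the viral population (the function name and the example numbers are made up for illustration, not taken from this experiment):

```python
def detection_probability(variant_freq: float, n_templates: int) -> float:
    """Probability that at least one of n_templates carries the variant.

    Assumes every template is sampled independently, which is a
    simplification of real sample preparation.
    """
    if not 0.0 <= variant_freq <= 1.0:
        raise ValueError("variant_freq must be between 0 and 1")
    if n_templates < 0:
        raise ValueError("n_templates must not be negative")
    # P(at least one) = 1 - P(none of the templates carries the variant)
    return 1.0 - (1.0 - variant_freq) ** n_templates


if __name__ == "__main__":
    # Hypothetical template counts, chosen only to show the trend.
    for n in (10, 100, 1000):
        p = detection_probability(0.01, n)
        print(f"{n:>5} templates: P(a 1% variant is sampled at least once) = {p:.2f}")
```

With few input templates, a rare variant may simply never be sampled, no matter how many reads come off the sequencer afterwards.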

Amplicons employ PCR. PCR, as long as it gets a single, amplifiable, chunk of DNA to start with, can usually produce lots of DNA given enough cycles of amplification.

That said, I find the AVA software to be quite inscrutable. So, it is also possible that AVA is hiding most of your variants for one reason or another. There are ways, however, to force it to reveal them all.

--
Phillip

**Fwip** · 07-30-2010, 06:13 AM

Hmmm, thank you. I guess it is certainly possible that the samples were not terribly informative and represent only a small portion of the genetic diversity.

The results seem similar for all of the samples that I have analyzed so far, so I am still hoping it is user error on my part.

I'll keep toying around with this software, then.

Thank you,
Fwip

**pmiguel** · 07-30-2010, 07:43 AM

I hasten to add that there are some default settings that would tend not to display rarer variants. But those become obvious as you tinker around with the AVA GUI.

--
Phillip

**Fwip** · 08-02-2010, 07:38 AM

The more I try to work this, the more I think that my workflow may be incorrect. I haven't had any training with this software, so I am not even sure what programs I should be using, or in what order.

Here is where I am, and my goals:

The data we received included read data from eight gaskets, split up into 8 SFF files. There was also an FNA and a QUAL file for each gasket, which I believe were derived from the respective SFF file. Each gasket had 4 MID tags. All samples had an identical forward 5' primer and one of two different reverse 5' primers, but are still uniquely identified by their gasket/MID combo.

I would like to get a FASTA file for each gasket/MID combination, representing the reconstructed sequences. If a variant is very common, it should show up in the FASTA file multiple times. Alternatively, if I could get a number representing the frequency at which it appears, that would work.
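The "frequency" form of that output can be sketched in plain Python. This is only an illustration of the idea, not what gsAmplicon or AVA does: it collapses reads that are exactly identical and counts them, so every sequencing error (454 homopolymer errors in particular) would show up as its own "variant". The function names are made up for this example.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def variant_frequencies(records):
    """Collapse identical sequences.

    Returns a list of (sequence, count, fraction), most common first.
    """
    counts = Counter(seq.upper() for _, seq in records)
    total = sum(counts.values())
    return [(seq, n, n / total) for seq, n in counts.most_common()]


if __name__ == "__main__":
    demo = [">r1", "ACGT", ">r2", "acgt", ">r3", "AC", "GA"]
    for seq, n, frac in variant_frequencies(read_fasta(demo)):
        print(f"{seq}\t{n}\t{frac:.3f}")
```

To run it on one gasket/MID, the records could come from `read_fasta(open(path))` on whatever per-sample FASTA file is available.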

What should my workflow look like for this? Should I do this entirely in gsMapper, gsAssembler, or gsAmplicon? Should I take the output of one program and analyze it in another?

Thank you very much for your help,
Fwip

**pmiguel** · 08-02-2010, 07:54 AM

gsAmplicon should be able to handle the .sff files as they are, although you may need to specify the info for the MIDs.

gsMapper -- there you would probably want to use sfffile to break the .sff into separate MID sffs. Given your MID structure this would not be trivial, although it should be do-able.
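For illustration only, the idea of breaking reads apart by MID can be sketched in Python on already-extracted sequences (for example, records read from the FNA files). This does not touch SFF files and is not how sfffile works internally. The tag sequences in the usage example are placeholders, not real Roche MIDs, and matching is exact with no tolerance for sequencing errors in the tag.

```python
def split_by_mid(records, mids):
    """Group (header, sequence) reads by the MID tag at the start of each read.

    mids maps a sample name to its tag sequence. Reads that match no tag
    are collected under the key None. The tag is trimmed from the stored
    sequence of matching reads.
    """
    bins = {name: [] for name in mids}
    bins[None] = []
    for header, seq in records:
        seq = seq.upper()
        for name, tag in mids.items():
            if seq.startswith(tag.upper()):
                bins[name].append((header, seq[len(tag):]))
                break
        else:
            # No tag matched this read.
            bins[None].append((header, seq))
    return bins


if __name__ == "__main__":
    # Placeholder tags for illustration; substitute the MIDs actually used.
    mids = {"MID_A": "ACGT", "MID_B": "TTGA"}
    reads = [("r1", "ACGTGGGG"), ("r2", "TTGACCCC"), ("r3", "GGGGGGGG")]
    for name, members in split_by_mid(reads, mids).items():
        print(name, len(members))
```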

Seems like your best bet is gsAmplicon. Are you using the GUI?

--
Phillip

**Fwip** · 08-02-2010, 08:29 AM

Thanks for the help.

I am currently primarily using the GUI, just because it gives me the most feedback on what I am doing. I'm using the command-line tool to extract the data once the computation is complete though (using 'report align'), because I could not figure out how to do that with the GUI.

-Fwip

**westerman** · 08-04-2010, 01:06 PM

Originally posted by Fwip

...

I would like to get a FASTA file for each gasket/MID combination, representing the reconstructed sequences. If a variant is very common, it should show up in the FASTA file multiple times. Alternatively, if I could get a number representing the frequency at which it appears, that would work.
...
What should my workflow look like for this? Should I do this entirely in gsMapper, gsAssembler, or gsAmplicon? Should I take the output of one program and analyze it in another?

When working with amplicons you should use gsAmplicon and doAmplicon. Both are complex programs that take a while to master, so it is hard to troubleshoot what problems you may be having. However, having 1000s of consensus sequences for a given sample and reference sounds high.

Phillip's and my most recent project -- admittedly a simple one with 1 amplicon, 4 samples/MIDs, and 1 reference -- generates between 15 and 23 consensus sequences per sample. This is starting from 65,000 to 100,000+ reads per sample. We also only found 4 variants in all of the samples. Such a low variance may explain our low number of consensus sequences.

Perhaps, in order to aid troubleshooting, you could do, within doAmplicon, a 'list' of your 'amplicon', 'mid', 'sample' and 'variant' and tell us, if not the exact entries, at least the counts of what you have. This could help us figure out how complex your project is.

**pmiguel** · 08-05-2010, 05:12 AM

Originally posted by westerman

Phillip's and my most recent project -- admittedly a simple one with 1 amplicon, 4 samples/MIDs, and 1 reference -- generates between 15 and 23 consensus sequences per sample. This is starting from 65,000 to 100,000+ reads per sample. We also only found 4 variants in all of the samples. Such a low variance may explain our low number of consensus sequences.

Ours were from a host gene. Fwip's are from a virus. Completely different set of expectations. The high error rate of HIV replication is thought to aid it in evading the host immune system. Fwip actually expected to see more variants than he did.

--
Phillip

**westerman** · 08-05-2010, 10:01 AM

Originally posted by pmiguel

Ours were from a host gene. Fwip's are from a virus. Completely different set of expectations. The high error rate of HIV replication is thought to aid it in evading the host immune system. Fwip actually expected to see more variants than he did.

--
Phillip

No doubt Fwip's experiment comes with different expectations than ours did. Did Fwip actually mention the number of variants he saw? I did not see it in the posts. Perhaps he sent you private mail?

Anyway, by having him do the various 'list's (via doAmplicon) I can get a feel for whether he is setting up and running the Amplicon software correctly. As you probably recall, getting the MID part set up properly was tricky for our experiment.

Fwip: If you want to send private email to Phillip and/or myself then please do so. My address is just my user name at purdue.edu

**Fwip** · 08-05-2010, 10:21 AM

Thank you both for the help - it turns out that a large part of it, at least, was that my reference sequence was set up incorrectly. I had not realized that the reference sequence included the primers, and so I was not using the correct reverse primer for the data I was looking at. On top of this, I wasn't narrowing the amplicon range to only include the area between the primers, which could not have helped things.
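For anyone hitting the same problem, here is a minimal sketch of how the region between the primers can be located on a reference that includes them. It is not part of the AVA software. It assumes the reverse primer is written 5'->3' as ordered, so its reverse complement is what appears on the reference strand, and that both primers match the reference exactly. Coordinates are Python-style (0-based, end-exclusive) and would need converting to whatever convention your amplicon definition uses.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def target_between_primers(reference, forward_primer, reverse_primer):
    """Return (start, end) of the region strictly between the two primers.

    Coordinates are 0-based and end-exclusive, so reference[start:end]
    is the target region without either primer.
    """
    ref = reference.upper()
    fwd = forward_primer.upper()
    rev_rc = reverse_complement(reverse_primer.upper())

    fwd_at = ref.find(fwd)
    if fwd_at < 0:
        raise ValueError("forward primer not found in reference")
    start = fwd_at + len(fwd)

    # Look for the reverse primer site downstream of the forward primer.
    end = ref.find(rev_rc, start)
    if end < 0:
        raise ValueError("reverse primer not found downstream of forward primer")
    return start, end


if __name__ == "__main__":
    # Toy sequences for illustration only.
    reference = "AAACGGGTTTCCGA"
    start, end = target_between_primers(reference, "AAAC", "TCGG")
    print(start, end, reference[start:end])
```

A failed lookup here is a quick hint that the reference is paired with the wrong reverse primer.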

I've rerun one gasket with these corrections, and already the data looks much better. Thank you!

**RubyTuesday** · 10-26-2011, 10:54 AM

Hello, not sure how much relevance this has to you now considering the date and that I may not have accurately gauged the question you're asking, but you could run the relevant sections of your sequences through the online Stanford DB for drug-resistance mutations in HIV as a validation of the results you're getting.

**Himalaya** · 12-06-2011, 04:02 PM

Hello
I am also using the AVA variant analyzer. The software produces the consensus sequence, but I would like to change the parameters for producing the consensus alignment. Does anyone know how to change the consensus-generating parameters of the AVA software, or know of any script out there that can generate a consensus alignment from AVA multiple aligned sequences?
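To make the question concrete, this is the kind of script I have in mind: a minimal majority-rule sketch, assuming the aligned reads have already been exported as equal-length strings with '-' for gaps. It is not how AVA builds its consensus, and the parameter names (`min_fraction`, `ambiguous`) are just made up for this example.

```python
from collections import Counter


def consensus(aligned, min_fraction=0.5, ambiguous="N", gap="-"):
    """Majority-rule consensus of equal-length aligned sequences.

    For each column the most common character wins if it reaches
    min_fraction of the reads; otherwise the ambiguous symbol is used.
    Columns where a gap wins are dropped from the consensus.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    out = []
    for column in zip(*aligned):
        base, count = Counter(c.upper() for c in column).most_common(1)[0]
        if count / len(column) < min_fraction:
            out.append(ambiguous)
        elif base != gap:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    reads = ["ACGT-A", "ACGT-A", "ACCTTA"]
    print(consensus(reads))
    print(consensus(reads, min_fraction=0.9))
```

Raising `min_fraction` makes the consensus stricter, at the cost of more ambiguous positions.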

Completely new to this and out of my depth

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News