Hi, I am a complete newbie to the entire realm of sequence analysis. My background is in Computer Engineering, so I don't understand a lot of the details on how primers, etc. work.
What I'm trying to do is, I think, pretty straightforward. We've received data from Roche, including the sff files which contain the raw reads. Previously, I have worked with other data, starting from Fasta files which contain about 10-100 aligned sequences. That is the type of file I would like to produce.
Currently, I am attempting to do this with the AVA software. I create a project, define the appropriate amplicons and MID sequences, and hook up the samples and data with multiplexers. I _think_ that I am doing all of this correctly; the manual was very helpful.
To get the FASTA files, I am using the doAmplicon 'report align' command. When I report the consensus files, I get about 1,000 sequences. Reporting individual sequences instead gives about 10,000.
I don't mind having a lot of sequences, but the similarity in them is very high. Infections that are known to have existed for a while are exhibiting low diversity, characteristic of much more recent infections.
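To put a number on "low diversity", one option is the mean pairwise p-distance over the aligned sequences. The following is only a minimal sketch, assuming the FASTA file is already aligned (all sequences the same length) and that gaps (`-`) and `N` should be skipped. The function names are made up for illustration and are not part of AVA or any Roche tool.

```python
from itertools import combinations


def read_fasta(lines):
    """Parse an iterable of FASTA text lines into (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def p_distance(a, b):
    """Fraction of differing sites, ignoring columns with a gap or N in either sequence."""
    compared = 0
    diffs = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0


def mean_pairwise_diversity(seqs):
    """Average p-distance over all unordered pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy alignment standing in for a real 'report align' output file.
    example = ">a\nACGT\n>b\nACGA\n>c\nAC-T\n".splitlines()
    seqs = [seq for _, seq in read_fasta(example)]
    print(round(mean_pairwise_diversity(seqs), 4))
```

With around 10,000 sequences the all-pairs loop covers roughly 50 million comparisons, so running it on a random subsample is probably more practical.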
I am not sure if my workflow is appropriate for what I am trying to do - perhaps I would be better served by the gsMapper or gsAssembler programs?
Any advice would be very much appreciated.
Thank you,
-Fwip
P.S: There is probably relevant important information that I am not including - please ask for clarification and I will do my best to answer. Thank you!