**Brian Bushnell** · 05-29-2014, 11:29 AM

Let's hope they were eating mainly Drosophila, otherwise it sounds pretty difficult...

But I would suggest that you start by grabbing all the publicly available insect assemblies (at NCBI, for example) and running BLAST against them with a few thousand reads, just to get an idea of what proportion of reads you can characterize this way. For higher throughput, you can use BBSplit, which is somewhat less sensitive but deposits the reads in one little pile per reference organism for further analysis.
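A rough Python sketch of the "few thousand reads" step, in case it helps: it draws a uniform random subset of reads with reservoir sampling and writes them as FASTA for a BLAST query. It assumes the reads are in FASTQ with exactly four lines per record; the function names and the toy data are made up for illustration.

```python
import random
from itertools import islice


def read_fastq(lines):
    """Yield (name, sequence, quality) from an iterable of FASTQ lines.

    Assumes strict 4-line records (no wrapped sequences).
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return
        header, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield header[1:], seq, qual


def subsample(records, k, seed=0):
    """Reservoir sampling: keep k records chosen uniformly at random
    without holding the whole file in memory."""
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir


def to_fasta(records):
    """Format (name, sequence, quality) records as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _qual in records)


# Toy data standing in for a real FASTQ file.
demo = []
for n in range(10):
    demo += [f"@read{n}\n", "ACGT\n", "+\n", "IIII\n"]

sample = subsample(read_fastq(demo), k=3)
print(to_fasta(sample))
```

With a real file you would pass an open file handle to `read_fastq` and set `k` to a few thousand.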

**GenoMax** · 05-29-2014, 11:46 AM

Jennifer: Since you are looking at a specific gene, perhaps you can get away with downloading a collection of insect CO1 sequences from NCBI and building a database from them.

You can try BBMap/Blast/Blat (perhaps in that order) to map your reads.

**batgrrl** · 05-29-2014, 12:15 PM

Thanks for the pointers. I'll check out BBSplit and BBMap.

I am not ready to blast away at anything yet. The data is in a series of BAM files. I need to figure out how to parse the contents and read forward and backward barcode keys (100 samples were combined into a single run using 20 keys). I have Perl code that does this with FASTQ files, but I haven't figured out how to parse BAM files yet. Then I need to label each sequence with the sample number, then somehow bin them into MOTUs before I have sequences to blast. I expect that many of the sequences will not have matches, but many will; these bats like moths and that's one area where the Barcode of Life project has many entries. And even if I don't have a match, I can still look at changes in the insect community between samples (the samples are a time series).
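A minimal Python sketch of the barcode step described above, under some loud assumptions: the forward key sits at the very start of the read, the reverse key appears reverse-complemented at the very end, and each (forward, reverse) pair maps to one sample. The key sequences, key names, and sample sheet here are hypothetical placeholders, not real Ion Torrent barcodes. Matching is exact, so reads with a sequencing error in a key are simply left unassigned.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Hypothetical keys and sample sheet, for illustration only.
FORWARD_KEYS = {"ACGAGT": "F1", "TCTCAG": "F2"}
REVERSE_KEYS = {"GATCGA": "R1", "CTAGTC": "R2"}
SAMPLE_SHEET = {
    ("F1", "R1"): "sample_01",
    ("F1", "R2"): "sample_02",
    ("F2", "R1"): "sample_03",
    ("F2", "R2"): "sample_04",
}


def assign_sample(seq, fwd_keys, rev_keys, sample_sheet):
    """Return (sample, insert) if both keys match exactly, else (None, seq).

    The insert is the read with both keys trimmed off.
    """
    seq = seq.upper()
    for fkey, fname in fwd_keys.items():
        if not seq.startswith(fkey):
            continue
        for rkey, rname in rev_keys.items():
            tail = reverse_complement(rkey)
            long_enough = len(seq) >= len(fkey) + len(tail)
            if long_enough and seq.endswith(tail):
                sample = sample_sheet.get((fname, rname))
                if sample is None:
                    return None, seq
                return sample, seq[len(fkey):len(seq) - len(tail)]
    return None, seq


# Toy read: forward key F1 + insert + reverse-complemented key R2.
read = "ACGAGT" + "TTTTGGGGCCCC" + reverse_complement("CTAGTC")
print(assign_sample(read, FORWARD_KEYS, REVERSE_KEYS, SAMPLE_SHEET))
```

Once each read carries a sample label, the label can be written into the FASTA/FASTQ header so it survives the later binning steps.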

My current plan is to start by finding sequences that are 100% matches for all of the 220 bases, which I should be able to do programmatically, and then blast those. But there will be many that are just slightly different and that may require alignment-by-hand. I really don't know how to handle that but I would really prefer not to sit in front of a GUI looking at each sequence until the cows come home.
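The exact-match part of that plan can be sketched in a few lines of Python: collapse reads that are identical over the full amplicon length and count how often each unique sequence occurs, so only the unique sequences need to be searched. This is only a sketch. It assumes the reads are already trimmed to the amplicon, the FASTA header format is my own invention, and the toy demo uses 8-base reads instead of 220 to stay readable.

```python
from collections import Counter


def dereplicate(seqs, length=220):
    """Count identical sequences, keeping only reads of exactly `length` bases.

    Returns (sequence, count) pairs, most abundant first.
    """
    counts = Counter(s.upper() for s in seqs if len(s) == length)
    return counts.most_common()


def unique_to_fasta(uniques):
    """Write unique sequences as FASTA, with the read count in the header."""
    return "".join(
        f">uniq_{i} count={count}\n{seq}\n"
        for i, (seq, count) in enumerate(uniques, start=1)
    )


# Toy reads: three copies of one sequence, one variant, one short read.
reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "acgtacgt", "ACGT"]
uniques = dereplicate(reads, length=8)
print(unique_to_fasta(uniques))
```

The near-identical sequences still need a separate clustering step; this only removes exact duplicates.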

I just can't believe I'm the first person to try to do this using Ion Torrent amplicons. Heck, it could probably come from any high-throughput sequencer. I really don't want to invent anything that already works for someone. I don't think it matters if it's bacterial (e.g. 16S) or CO1 or any other specific target gene, as far as the workflow goes; in the end you might blast against a different collection of sequences, that's all. Am I way off base with this? Feel free to point me to some newbie website or something! I would be grateful for any feedback.
Jennifer

**Brian Bushnell** · 05-29-2014, 12:27 PM

If your data is in bam files, then has the alignment already been done? If so, you may simply need to do a pileup of the bam file, which is simple with existing tools.

If you want to parse the bam file manually, you should convert it to sam first (e.g. with samtools), then it's just tab-delimited columns, one row per read.
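To illustrate the "tab-delimited columns" point, here is a small Python sketch that reads SAM text (for example the output of `samtools view -h in.bam`) and pulls out the 11 mandatory columns. The column names follow the SAM specification; the function name and the toy record are placeholders of mine.

```python
# The 11 mandatory SAM columns, in order.
SAM_COLUMNS = [
    "qname", "flag", "rname", "pos", "mapq", "cigar",
    "rnext", "pnext", "tlen", "seq", "qual",
]


def parse_sam(lines):
    """Yield one dict per alignment line, skipping '@' header lines."""
    for line in lines:
        if line.startswith("@"):
            continue  # header line (@HD, @SQ, @RG, ...)
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # blank or malformed line
        rec = dict(zip(SAM_COLUMNS, fields[:11]))
        for key in ("flag", "pos", "mapq"):
            rec[key] = int(rec[key])
        rec["tags"] = fields[11:]  # optional TAG:TYPE:VALUE fields
        yield rec


# Toy SAM text: one header line and one unaligned read (flag 4 = unmapped).
demo_sam = [
    "@HD\tVN:1.4\tSO:unsorted\n",
    "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII\tRG:Z:run1\n",
]

for rec in parse_sam(demo_sam):
    print(rec["qname"], rec["flag"], rec["seq"])
```

For unaligned reads the `seq` and `qual` columns hold the read as sequenced; for reads aligned to the reverse strand (flag bit 16) `seq` is stored reverse-complemented.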

**blancha** · 05-29-2014, 12:27 PM

Just convert the BAM files to FASTQ files.
Why do you have BAM files instead of FASTQ files in the first place anyway?

Here is a sample command to do the conversion.
After multi-threaded sorting by read name, the paired end reads are separated into 2 FASTQ files.

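```shell
# Sort by read name (-n) with 7 sorting threads so that mates end up adjacent.
# This is samtools 1.x syntax; samtools 0.1.x takes an output prefix instead of -o.
samtools sort -@ 7 -n \
    -o ../../../BAM/HI.1674.007.Index_13.DF_3A-IP_sorted.bam \
    ../../../BAM/HI.1674.007.Index_13.DF_3A-IP.bam

# Split the name-sorted pairs into R1 and R2 FASTQ files.
bedtools bamtofastq -i ../../../BAM/HI.1674.007.Index_13.DF_3A-IP_sorted.bam \
    -fq ../../../FASTQ/untrimmed/HI.1674.007.Index_13.DF_3A-IP_R1.fastq \
    -fq2 ../../../FASTQ/untrimmed/HI.1674.007.Index_13.DF_3A-IP_R2.fastq
```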

**batgrrl** · 05-29-2014, 12:36 PM

Thanks, that gives me a place to start.

I have BAM files because I couldn't figure out how to get Ion Torrent to give me all the FASTQ files in one zip file, and because I didn't know it would be in BAM format until I downloaded the 2 GB file and expanded it. Apparently I can get the files sorted by the first barcode, but there didn't appear to be a way to get them aligned.

I will go look in the Ion Torrent forum to see if there is some option I could not find or understand. If any of you use IT and could give me some advice offline on extracting data from the server web page, give me a shout. The sequencing lab here doesn't seem to have much of a clue about it.

**GenoMax** · 05-29-2014, 04:46 PM

While you work on the BAM files, look on the Ion Community site (http://ioncommunity.lifetechnologies.com/welcome) for the Ion Reporter manual (you may need to create a free account). Ion Reporter is "not-free" software, and your sequencing lab should have a copy of it on the machine that runs the Ion instrument.

You could look at (a) exporting the demultiplexed FASTQ files from Ion Reporter, or (b) providing a set of CO1 sequences and getting Ion Reporter to do some alignments. Ion Reporter takes into account error models/types of errors and could give you some immediately usable data.

The couple of times I have interacted with Ion Reporter, it has not been the most intuitive software to use, but there is no harm in trying it as plan B (as long as the local lab allows you to use the software on the instrument server).

**snetmcom** · 05-29-2014, 06:20 PM

You can have the Ion platforms give you fastq. They will be barcode separated unless you want to do the barcode splitting yourself.

There might be some confusion with Ion Reporter. It is primarily a tool for human data analysis.

I agree a database approach for this gene might be the easiest way to see what is in there.


Insect DNA from bat guano
