
  • Insect DNA from bat guano

    Hi all,
    I'm a postdoc at the University of Tennessee. My current project is to figure out what bats are eating by sequencing insect DNA extracted from guano samples. We've already done Ion Torrent runs that look OK; now I just have to interpret the piles of data. The data are approximately 220 bp reads from the CO1 gene, per the Barcode of Life approach. I can do basic Perl programming and am prepared to learn what I need (I used to code years ago), but I would prefer to assemble established solutions rather than write my own from scratch. My main questions now are about the appropriate workflow for handling the sequences. I don't need to assemble sequences, but I will need to do some kind of basic alignment. Should I try to do that in Perl, or switch to something else like Sequencher? High-throughput data means that inspecting individual reads is not a viable approach, so it has to be automated somehow. Can I run Sequencher in batch mode?

    In short, if you can point me to information that would get me started identifying the best components of an analysis workflow that would be awesome. Thanks!
    Jennifer

  • #2
    Let's hope they were eating mainly Drosophila, otherwise it sounds pretty difficult...

    But, I would suggest that you start by grabbing all the insect assemblies publicly available (at NCBI, for example) and running BLAST against them, on a few thousand reads, just to get an idea of what proportion of reads you can characterize this way. For higher throughput, you can use BBSplit, which will have somewhat less sensitivity, but deposit the reads in one little pile per reference organism for further analysis.
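For the "few thousand reads" test run suggested above, a minimal Python sketch of pulling a random subset out of a FASTQ file and writing it as FASTA for BLAST. The function names are made up for illustration, and it assumes plain four-line FASTQ records (no wrapped sequence lines).

```python
import io
import random


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:           # end of file
            return
        if not header.strip():   # tolerate stray blank lines between records
            continue
        seq = handle.readline().rstrip("\n")
        handle.readline()        # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def subsample(records, n, seed=0):
    """Reservoir sampling: keep n records chosen uniformly in a single pass."""
    rng = random.Random(seed)
    kept = []
    for i, rec in enumerate(records):
        if i < n:
            kept.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                kept[j] = rec
    return kept


def to_fasta(records):
    """Format (name, sequence, quality) records as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _ in records)


if __name__ == "__main__":
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
    print(to_fasta(subsample(read_fastq(demo), n=1)), end="")
```

Reservoir sampling avoids loading the whole run into memory, which matters more for the full data set than for a quick test.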



    • #3
      Jennifer: Since you are looking at a specific gene perhaps you can get away with downloading a collection of CO1 sequences from NCBI for insects and building a database with it.

      You can try BBMap/BLAST/BLAT (perhaps in that order) to map your reads.
      Last edited by GenoMax; 05-29-2014, 04:35 PM.



      • #4
        Thanks for the pointers. I'll check out BBSplit and BBMap.

        I am not ready to blast away at anything yet. The data is in a series of BAM files. I need to figure out how to parse the contents and read forward and backward barcode keys (100 samples were combined into a single run using 20 keys). I have Perl code that does this with FASTQ files, but I haven't figured out how to parse BAM files yet. Then I need to label each sequence with the sample number, then somehow bin them into MOTUs before I have sequences to blast. I expect that many of the sequences will not have matches, but many will; these bats like moths and that's one area where the Barcode of Life project has many entries. And even if I don't have a match, I can still look at changes in the insect community between samples (the samples are a time series).
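A minimal Python sketch of the sample-labelling step described above. It assumes (this is not stated in the thread) that the forward key sits at the very start of each read and the reverse key appears reverse-complemented at the very end, and that each sample is identified by one forward/reverse pair. The key sequences and sample names in the usage note are invented, and only exact key matches are accepted.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse-complement a DNA string (upper-case A, C, G, T, N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def assign_sample(seq, fwd_keys, rev_keys, sample_map):
    """Return (sample, insert) for a read, or (None, seq) if no key pair matches.

    fwd_keys / rev_keys: dict mapping key sequence -> key name.
    sample_map: dict mapping (fwd key name, rev key name) -> sample label.
    The returned insert has both keys trimmed off.
    """
    for f_bc, f_name in fwd_keys.items():
        if not seq.startswith(f_bc):
            continue
        for r_bc, r_name in rev_keys.items():
            r_end = revcomp(r_bc)
            # Require room for both keys so they cannot overlap.
            if seq.endswith(r_end) and len(seq) >= len(f_bc) + len(r_end):
                sample = sample_map.get((f_name, r_name))
                if sample is not None:
                    return sample, seq[len(f_bc):len(seq) - len(r_end)]
    return None, seq
```

For example, with `fwd_keys = {"ACGT": "F1"}`, `rev_keys = {"AACC": "R2"}` and `sample_map = {("F1", "R2"): "sample_07"}`, the read `ACGTCATCATCATGGTT` is assigned to `sample_07` with insert `CATCATCAT`. Real Ion Torrent reads often carry indel errors in the keys, so an exact-match rule like this one will discard some usable reads.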

        My current plan is to start by finding sequences that are 100% matches for all of the 220 bases, which I should be able to do programmatically, and then blast those. But there will be many that are just slightly different and that may require alignment-by-hand. I really don't know how to handle that but I would really prefer not to sit in front of a GUI looking at each sequence until the cows come home.
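The exact-match step in this plan amounts to dereplication: collapsing identical reads and counting them. A minimal Python sketch, assuming reads have already been trimmed of keys and primers; the function names, the `min_count` cutoff, and the FASTA header style are illustrative choices, not an established format requirement.

```python
from collections import Counter


def dereplicate(seqs, length=220, min_count=2):
    """Group reads that are identical over their first `length` bases.

    Reads shorter than `length` are skipped. Returns a list of
    (sequence, count) pairs, most abundant first, keeping only
    sequences seen at least `min_count` times.
    """
    counts = Counter(s[:length] for s in seqs if len(s) >= length)
    return [(seq, n) for seq, n in counts.most_common() if n >= min_count]


def uniques_to_fasta(uniques):
    """Write unique sequences as FASTA, with the read count in the header."""
    return "".join(
        f">uniq{i};size={n}\n{seq}\n" for i, (seq, n) in enumerate(uniques, 1)
    )
```

Only the unique sequences then need to be searched, and the counts can be carried along for per-sample abundance. Near-identical sequences still end up as separate entries, so this does not replace clustering into MOTUs.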

        I just can't believe I'm the first person to try to do this using Ion Torrent amplicons. Heck, the data could probably come from any high-throughput sequencer. I really don't want to invent anything that already works for someone. I don't think it matters whether the target is bacterial (e.g. 16S) or CO1 or any other specific gene, as far as the workflow goes; in the end you might blast against a different collection of sequences, that's all. Am I way off base with this? Feel free to point me to some newbie website or something! I would be grateful for any feedback.
        Jennifer



        • #5
          If your data is in bam files, then has the alignment already been done? If so, you may simply need to do a pileup of the bam file, which is simple with existing tools.

          If you want to parse the bam file manually, you should convert it to sam first (e.g. with samtools), then it's just tab-delimited columns, one row per read.
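A minimal Python sketch of the manual route, assuming the BAM has already been converted to SAM text (for example with `samtools view -h`). It reads the standard columns (read name in column 1, flag in column 2, sequence in column 10, qualities in column 11) and restores the original read orientation when the reverse-strand flag is set. The function names are illustrative.

```python
import io

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def parse_sam(handle):
    """Yield (name, sequence, quality) for each primary record in a SAM stream."""
    for line in handle:
        if line.startswith("@"):      # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:          # malformed or truncated line
            continue
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & 0x900:              # skip secondary/supplementary alignments
            continue
        if flag & 0x10:               # stored reverse-complemented: undo it
            seq = seq.translate(COMPLEMENT)[::-1]
            qual = qual[::-1]
        yield name, seq, qual


def to_fastq(records):
    """Format (name, sequence, quality) records as FASTQ text."""
    return "".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in records)


if __name__ == "__main__":
    demo = io.StringIO("@HD\tVN:1.4\nr1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n")
    print(to_fastq(parse_sam(demo)), end="")
```

Unaligned Ion Torrent BAMs carry flag 4 on every read, in which case the strand handling never triggers and the sequences pass through unchanged.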



          • #6
            Just convert the BAM files to FASTQ files.
            Why do you have BAM files instead of FASTQ files in the first place anyway?

            Here is a sample command to do the conversion.
            After multi-threaded sorting by read name, the paired end reads are separated into 2 FASTQ files.

            ---

            samtools sort -@ 7 -n -o ../../../BAM/HI.1674.007.Index_13.DF_3A-IP_sorted.bam \
            ../../../BAM/HI.1674.007.Index_13.DF_3A-IP.bam

            bedtools bamtofastq -i ../../../BAM/HI.1674.007.Index_13.DF_3A-IP_sorted.bam \
            -fq ../../../FASTQ/untrimmed/HI.1674.007.Index_13.DF_3A-IP_R1.fastq \
            -fq2 ../../../FASTQ/untrimmed/HI.1674.007.Index_13.DF_3A-IP_R2.fastq



            • #7
              Thanks, that gives me a place to start.

              I have BAM files because I couldn't figure out how to get Ion Torrent to give me all the FASTQ files in one zip file, and because I didn't know it would be in BAM format until I downloaded the 2G file and expanded it. Apparently I can get the files sorted by the first barcode, but there didn't appear to be a way to get them aligned.

              I will go look in the Ion Torrent forum to see if there is some option I could not find or understand. If any of you use IT and could give me some advice offline on extracting data from the server web page, give me a shout. The sequencing lab here doesn't seem to have much of a clue about it.



              • #8
                Originally posted by batgrrl
                I will go look in the Ion Torrent forum to see if there is some option I could not find or understand. If any of you use IT and could give me some advice offline on extracting data from the server web page, give me a shout. The sequencing lab here doesn't seem to have much of a clue about it.
                While you work on the BAM files, look on the Ion Community site (http://ioncommunity.lifetechnologies.com/welcome) for the Ion Reporter manual (you may need to create a free account). Ion Reporter is "not-free" software, and your sequencing lab should have a copy of it on the machine that runs the Ion instrument.

                You could look at (a) exporting the demultiplexed FASTQ files from Ion Reporter, or (b) providing a set of CO1 sequences and getting Ion Reporter to do some alignments. Ion Reporter takes error models and error types into account and could give you some immediately usable data.

                The couple of times I have interacted with Ion Reporter, it has not been the most intuitive software to use, but there is no harm in trying it as plan B (as long as the local lab allows you to use the software on the instrument server).



                • #9
                  Originally posted by batgrrl
                  Thanks, that gives me a place to start.

                  I have BAM files because I couldn't figure out how to get Ion Torrent to give me all the FASTQ files in one zip file, and because I didn't know it would be in BAM format until I downloaded the 2G file and expanded it. Apparently I can get the files sorted by the first barcode, but there didn't appear to be a way to get them aligned.

                  I will go look in the Ion Torrent forum to see if there is some option I could not find or understand. If any of you use IT and could give me some advice offline on extracting data from the server web page, give me a shout. The sequencing lab here doesn't seem to have much of a clue about it.
                  You can have the Ion platforms give you fastq. They will be barcode separated unless you want to do the barcode splitting yourself.

                  There might be some confusion about Ion Reporter. It is primarily a tool for analyzing human data.

                  I agree a database approach for this gene might be the easiest way to see what is in there.
