  • A workflow for filtering

    Hi!

    First of all, thanks to the community, which often solves my problems before I even have time to ask the question.

    I have a problem, though. I’m working with 454 sequencing datasets in which I know one organism is present. I want to filter out that organism’s reads and analyze the remaining reads to find other organisms that may be present.

    From what I have gathered on this and other forums, this should be an easy task if I can extract the unaligned reads from an alignment. Both MAQ and Bowtie have a built-in option to dump the unaligned sequences, but they won’t work for me since I’m using 454 data.

    So my question is: is this doable with BWA and samtools? I think it should be, but the forums have left me with mostly incomplete clues and I’m a bit lost right now.
    I know that samtools view with the -f flag should be able to dump the reads, but I’m unsure which flag value to use and even more unsure how to handle the output. Ideally I would get output that I can feed into MIRA (or Velvet / Ray) for assembly.

  • #2
    samtools view -f 4 <in.bam> > unmapped.sam

    You can then use Picard's SamToFastq.jar utility.

    See http://picard.sourceforge.net/explain-flags.html, http://picard.sourceforge.net
    and http://samtools.sourceforge.net
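
The `-f 4` filter in the command above keeps records whose FLAG field has bit 0x4 (read unmapped) set. As a rough illustration of what that filter and a later SAM-to-FASTQ conversion do, here is a minimal Python sketch. It is not samtools or Picard code: the function name `unmapped_sam_to_fastq` and the demo records are made up for illustration, and it assumes plain-text SAM input in which SEQ and QUAL are present (not `*`).

```python
import io

UNMAPPED = 0x4   # SAM FLAG bit: read is unmapped
REVERSE = 0x10   # SAM FLAG bit: SEQ is stored reverse-complemented

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_sam_to_fastq(sam_lines, fastq_out):
    """Write every unmapped SAM record to fastq_out as FASTQ; return the count."""
    written = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines carry no reads
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]  # assumed present, i.e. not "*"
        if not flag & UNMAPPED:
            continue  # same test that `samtools view -f 4` applies
        if flag & REVERSE:
            # Restore the original read orientation.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        fastq_out.write(f"@{name}\n{seq}\n+\n{qual}\n")
        written += 1
    return written


# Hypothetical demo records: one mapped read and one unmapped read.
demo = [
    "@SQ\tSN:ref\tLN:100\n",
    "read1\t0\tref\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCA\tFFFF\n",
]
out = io.StringIO()
count = unmapped_sam_to_fastq(demo, out)
print(count, "unmapped read(s) written")
print(out.getvalue(), end="")
```

For real data the samtools and Picard route above is the safer choice; the sketch only shows the logic behind the flag test.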



    • #3
      Originally posted by nilshomer
      samtools view -f 4 <in.bam> > unmapped.sam

      You can then use Picard's SamToFastq.jar utility.

      See http://picard.sourceforge.net/explain-flags.html, http://picard.sourceforge.net
      and http://samtools.sourceforge.net

      Great! Then I was not totally out on thin ice at least.

      Is there another way to convert between SAM files and FASTQ? Also, since MIRA, for example, will complain if I do not provide it with three files for assembly (.fa, .qual and .xml), I think it might be simpler to run MIRA first (the datasets are easily handled by the current cluster, so computing power is not a problem), then map the result using BWA, convert, and BLAST the unmapped results. That should filter the problematic reads/contigs out of the dataset.
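
For the .fa and .qual inputs mentioned above, one possible route is to split a FASTQ file into a FASTA file and a matching QUAL file. The following is a minimal sketch, not a MIRA utility: the function name `fastq_to_fasta_qual` is hypothetical, it assumes four-line FASTQ records with Phred+33 qualities (the encoding SAM uses for QUAL), and it does not produce the .xml file.

```python
import io


def fastq_to_fasta_qual(fastq_lines, fasta_out, qual_out, offset=33):
    """Split four-line FASTQ records into FASTA and QUAL text; return the count."""
    lines = [ln.rstrip("\n") for ln in fastq_lines]
    lines = [ln for ln in lines if ln]  # ignore blank lines
    if len(lines) % 4:
        raise ValueError("expected four lines per FASTQ record")
    count = 0
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        record_no = i // 4 + 1
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record {record_no}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in record {record_no}")
        name = header[1:]
        # Each quality character decodes to ord(char) - offset.
        scores = " ".join(str(ord(ch) - offset) for ch in qual)
        fasta_out.write(f">{name}\n{seq}\n")
        qual_out.write(f">{name}\n{scores}\n")
        count += 1
    return count


# Hypothetical demo record.
fasta, qual = io.StringIO(), io.StringIO()
fastq_to_fasta_qual(["@read2\n", "GGCA\n", "+\n", "FFFF\n"], fasta, qual)
print(fasta.getvalue(), end="")
print(qual.getvalue(), end="")
```

Each quality character becomes one integer in the QUAL file, so the two outputs stay in step record by record.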

      What I'm trying to emulate is close to the PathSeq workflow, but using different tools (they use the Illumina platform and MAQ, whereas my datasets are 454).

      Best Regards



      • #4
        Hi Ackia,

        You can use the samtools JDK toolkit to extract the information.

        Chandra
