  • A workflow for filtering

    Hi!

    First of all, thanks to the community, who often solve my problems before I even have time to ask the question.

    I have a problem, though. I’m working with 454 sequencing datasets in which I know one organism is present. I want to filter out that organism's reads and analyze the remaining reads to find other organisms that may be present.

    From what I have gathered on this and other forums, this should be an easy task if I can extract the unaligned reads from an alignment. MAQ and Bowtie both have a built-in option to dump the unaligned sequences, but they won’t work for me since I’m using 454 data.

    So my question is: is this doable with BWA and samtools? I think it should be, but the forums have left me with mostly incomplete clues and I’m a bit lost right now.
    I know that samtools view with the -f option should be able to dump the reads, but I’m unsure which flag value to use and even more unsure how to handle the output (ideally I would get output that I can feed into MIRA for assembly, or Velvet / Ray).

  • #2
    samtools view -f 4 <in.bam> > unmapped.sam

    You can then use Picard's SamToFastq.jar utility.

    See http://picard.sourceforge.net/explain-flags.html, http://picard.sourceforge.net
    and http://samtools.sourceforge.net
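For readers unsure what `-f 4` selects: it keeps only records whose SAM FLAG field has bit 0x4 ("read unmapped") set. The following is a minimal Python sketch of that same test applied to plain-text SAM lines. The function name `unmapped_records` and the example reads are made up for illustration; this is not how samtools is implemented, only the bit check it performs.

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "this read is unmapped"


def unmapped_records(sam_lines):
    """Yield SAM alignment lines whose FLAG has the 0x4 bit set.

    Header lines (starting with '@') are skipped, which mirrors
    `samtools view` without the -h option.
    """
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        if int(fields[1]) & UNMAPPED:
            yield line


# Hypothetical three-read SAM file: only read2 is unmapped (FLAG 4).
example_sam = "\n".join([
    "@SQ\tSN:ref\tLN:1000",
    "read1\t0\tref\t10\t37\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGGCCAA\tIIIIIIII",
    "read3\t16\tref\t50\t37\t8M\t*\t0\t0\tGGGGCCCC\tIIIIIIII",
])

names = [rec.split("\t")[0] for rec in unmapped_records(example_sam.splitlines())]
print(names)
```

For real data, the samtools command above is the better choice, since it reads BAM directly.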

    • #3
      Originally posted by nilshomer
      samtools view -f 4 <in.bam> > unmapped.sam

      You can then use Picard's SamToFastq.jar utility.

      See http://picard.sourceforge.net/explain-flags.html, http://picard.sourceforge.net
      and http://samtools.sourceforge.net
      Great! Then I was not totally out on thin ice at least.

      Is there another way to convert between SAM files and FASTQ? Also, since MIRA, for example, will complain if I do not provide it with three files for assembly (.fa, .qual and .xml), I think it might be simpler to run MIRA first (the datasets are easily handled by the current cluster, so computing power is not a problem), then map the result using BWA, convert, and BLAST the unmapped results. That should filter out the problematic reads/contigs from the dataset.

      What I'm trying to emulate is close to the PathSeq workflow, but using different tools (they use the Illumina platform and MAQ, whereas my datasets are 454).

      Best Regards
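On the question of converting unmapped SAM records without Picard: a rough Python sketch is shown below, assuming the records are plain-text SAM lines with Phred+33 qualities in the QUAL column (as the SAM format specifies) and that the reads are not flagged as reverse-strand. The function names are hypothetical. It produces FASTQ, or a FASTA/QUAL pair of the kind mentioned above; it does not produce the .xml file MIRA asks for.

```python
def parse_record(record):
    """Return (name, sequence, quality string) from one SAM alignment line."""
    fields = record.rstrip("\n").split("\t")
    name, seq, qual = fields[0], fields[9], fields[10]
    if qual == "*":
        raise ValueError(f"no base qualities stored for read {name}")
    return name, seq, qual


def sam_to_fastq(record):
    """Format one SAM record as a four-line FASTQ entry."""
    name, seq, qual = parse_record(record)
    return f"@{name}\n{seq}\n+\n{qual}\n"


def sam_to_fasta_qual(record):
    """Format one SAM record as a (FASTA entry, QUAL entry) pair.

    QUAL entries hold space-separated integer Phred scores, so each
    Phred+33 character is decoded with ord(char) - 33.
    """
    name, seq, qual = parse_record(record)
    scores = " ".join(str(ord(char) - 33) for char in qual)
    return f">{name}\n{seq}\n", f">{name}\n{scores}\n"


# Hypothetical unmapped read (FLAG 4) with qualities 40, 40, 20, 0.
record = "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGG\tII5!"

fastq_entry = sam_to_fastq(record)
fasta_entry, qual_entry = sam_to_fasta_qual(record)
print(fastq_entry, fasta_entry, qual_entry, sep="")
```

Reads flagged as reverse-strand (FLAG bit 0x10) would need reverse-complementing before export, which this sketch leaves out.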

      • #4
        Hi Ackia,

        You can use the samtools JDK toolkit to extract the information.

        Chandra
