  • Reorganize unaligned read output

    Hello all.

    I am using Mosaik to align an Illumina read dataset to a reference genome. The total DNA used in Illumina sequencing included DNA from more than one organism, so I would like to try a Velvet de novo assembly with the unaligned reads that Mosaik dumped into a fastq file. Our read data is paired end, which should help our de novo assembly, but Mosaik doesn't retain the paired end information when dumping reads to the fastq file.

    Does anyone have a script for reorganizing a fastq file into a Velvet-acceptable order? (i.e., Read 1 from a pair is followed by Read 2, then Read 1 and Read 2 from another pair, etc.)

    Thank you,
    Laura Williams

  • #2
    revised request

    Follow-up to my previous post.

    After a closer inspection of the Mosaik unaligned reads output, I see that pairs of reads are kept together. Unfortunately, now the problem is that I have paired reads and single (orphaned) reads in the same unaligned reads file. I'd like to input the paired reads into Velvet for a de novo assembly. Does anyone have any advice for how to easily remove the orphaned reads from my unaligned reads file?

    Thanks,
    Laura

    • #3
      If paired reads are "next" to each other in the file (i.e. always first one, then the other), then a little bit of Perl/Python etc. can easily do this.

      If you have many reads, trying to load them all into memory at once will likely exhaust your RAM, so it is better to work through the file one read at a time.

      What you want to do is cache the previous read. If the current read is from the same fragment as the cached one, output both. Either way, the current read then becomes the cached read.

      I haven't used Mosaik; if it writes the unaligned sequences to a FASTQ file, then BioPerl has routines for reading and writing the reads. The sketch below parses the records by hand instead, and assumes four-line FASTQ records whose mate names differ only by a trailing /1 or /2 (check your headers and adjust fragmentId() if they look different).

      Code:
      #!/usr/bin/perl
      ## Keep only reads whose mate is the adjacent record; orphans are dropped.
      ## Reads FASTQ on STDIN and writes the paired reads to STDOUT.
      use strict;
      use warnings;

      # Read one 4-line FASTQ record; return undef at end-of-file.
      sub nextRead {
        my ($fh) = @_;
        my @lines;
        for (1 .. 4) {
          my $line = <$fh>;
          return undef if !defined $line;
          push @lines, $line;
        }
        return \@lines;
      }

      # Fragment ID: first token of the header line, minus a trailing /1 or /2.
      sub fragmentId {
        my ($read) = @_;
        my ($id) = $read->[0] =~ /^\@(\S+)/;
        $id =~ s/\/[12]$//;
        return $id;
      }

      my $cachedRead = undef;

      while (defined(my $currRead = nextRead(\*STDIN)))
      {
        if (defined $cachedRead && fragmentId($cachedRead) eq fragmentId($currRead))
        {
          print @$cachedRead, @$currRead;   # both mates present: write the pair
        }
        $cachedRead = $currRead;
      }
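
If you prefer Python, the same cache-and-compare idea can be sketched roughly as below. This is only an illustration under a few assumptions: records are four lines each, the mates of a pair sit next to each other, and their names differ only by a trailing /1 or /2. The function names and the toy reads are made up for the example and have nothing to do with Mosaik itself. One small difference from the description above: the cache is cleared after a pair is written, so no read can be emitted twice.

```python
from typing import Iterable, Iterator, List, Optional


def fragment_id(header: str) -> str:
    """First token of the '@' header line, minus a trailing /1 or /2 (assumed naming)."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group lines into 4-line FASTQ records; an incomplete trailing record is ignored."""
    record: List[str] = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            yield record
            record = []


def paired_only(lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines of reads whose mate is the adjacent record; skip orphans."""
    cached: Optional[List[str]] = None
    for current in fastq_records(lines):
        if cached is not None and fragment_id(cached[0]) == fragment_id(current[0]):
            yield from cached
            yield from current
            cached = None  # pair written; neither mate is reused
        else:
            cached = current  # previous cached read (if any) was an orphan


# Toy input: frag2 has lost its mate and should be dropped.
example = [
    "@frag1/1\n", "ACGT\n", "+\n", "IIII\n",
    "@frag1/2\n", "TTGA\n", "+\n", "IIII\n",
    "@frag2/1\n", "GGCC\n", "+\n", "IIII\n",
    "@frag3/1\n", "CATG\n", "+\n", "IIII\n",
    "@frag3/2\n", "AACC\n", "+\n", "IIII\n",
]
kept = list(paired_only(example))
print("".join(kept), end="")
```

Because `paired_only` accepts any iterable of lines, an open file handle can be passed in place of the toy list, and only two records are held in memory at any time.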
