
  • Reorganize unaligned read output

    Hello all.

    I am using Mosaik to align an Illumina read dataset to a reference genome. The total DNA used in Illumina sequencing included DNA from more than one organism, so I would like to try a Velvet de novo assembly with the unaligned reads that Mosaik dumped into a fastq file. Our read data is paired end, which should help our de novo assembly, but Mosaik doesn't retain the paired end information when dumping reads to the fastq file.

    Does anyone have a script for reorganizing a fastq file into a Velvet-acceptable order? (i.e., Read 1 from a pair is followed by Read 2, then Read 1 and Read 2 from another pair, etc.)

    Thank you,
    Laura Williams

  • #2
    revised request

    Follow-up to my previous post.

    After a closer inspection of the Mosaik unaligned reads output, I see that pairs of reads are kept together. Unfortunately, now the problem is that I have paired reads and single (orphaned) reads in the same unaligned reads file. I'd like to input the paired reads into Velvet for a de novo assembly. Does anyone have any advice for how to easily remove the orphaned reads from my unaligned reads file?

    Thanks,
    Laura


    • #3
      If paired reads are "next" to each other in the file (i.e., always first one, then the other), then a little bit of Perl, Python, etc. can easily do this.

      If you have many reads, trying to read them all into memory at once will likely exhaust your memory.

      What you want to do is cache the current read. If the next read is from the same fragment, output both. Either way, the current read now becomes the cached read.

      I haven't used Mosaik; if it writes the unaligned sequences to a FASTQ file, then BioPerl has all the routines left unwritten below (though not necessarily with my names!)

      Code:
      #!/usr/bin/perl
      ## Not nearly a complete Perl program: the reader and read objects are left unwritten.
      use strict;
      use warnings;

      my $cachedRead = undef;

      # Here's the incomplete part: you need to write an object that opens the unaligned
      # reads file and reports one read each time nextRead() is called (returning undef
      # at end-of-file).
      my $reader;  ## placeholder: assign your reader object here

      # Each read is also an object with a fragmentId() method (returns the fragment ID)
      # and a writeRead() method (writes the read back to STDOUT or a waiting filehandle).

      while (defined(my $currRead = $reader->nextRead()))  ## undef means end-of-file
      {
        if (defined $cachedRead && $cachedRead->fragmentId() eq $currRead->fragmentId())
        {
          $cachedRead->writeRead();
          $currRead->writeRead();
        }
        $cachedRead = $currRead;
      }
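
For anyone who prefers Python, here is a minimal, self-contained sketch of the same cache-and-compare idea. It is not tied to Mosaik's actual output: it assumes plain four-line FASTQ records and assumes that mates share a read name apart from a trailing /1 or /2, so `fragment_id` will need adjusting if your read names differ. The names `read_fastq`, `fragment_id`, and `keep_pairs` are made up for illustration. One small deviation from the description above: after a pair is written the cache is cleared, so a read can never be written twice.

```python
import io
import re


def read_fastq(handle):
    """Yield one (header, seq, plus, qual) tuple at a time.

    Assumes strict four-line FASTQ records; only one record is held at once.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        if not qual:
            raise ValueError("truncated FASTQ record: " + header.strip())
        yield tuple(line.rstrip("\n") for line in (header, seq, plus, qual))


def fragment_id(header):
    """ASSUMPTION: mates differ only by a trailing /1 or /2 in the read name."""
    name = header[1:].split()[0]  # drop the leading '@' and any comment field
    return re.sub(r"/[12]$", "", name)


def keep_pairs(in_handle, out_handle):
    """Write only reads whose mate is adjacent; return the number of pairs kept."""
    cached = None
    pairs = 0
    for rec in read_fastq(in_handle):
        if cached is not None and fragment_id(cached[0]) == fragment_id(rec[0]):
            for r in (cached, rec):
                out_handle.write("\n".join(r) + "\n")
            pairs += 1
            cached = None  # both mates consumed
        else:
            cached = rec  # previous cached read (if any) was an orphan
    return pairs


# Tiny in-memory demonstration: frag2 has no mate and is dropped.
demo = io.StringIO(
    "@frag1/1\nACGT\n+\nIIII\n"
    "@frag1/2\nTTGA\n+\nIIII\n"
    "@frag2/1\nGGCC\n+\nIIII\n"
    "@frag3/1\nAAAA\n+\nIIII\n"
    "@frag3/2\nCCCC\n+\nIIII\n"
)
out = io.StringIO()
n_pairs = keep_pairs(demo, out)
print(n_pairs, "pairs kept")
```

On a real file you would pass open file handles instead of the `io.StringIO` objects, for example `keep_pairs(open("unaligned.fastq"), open("paired.fastq", "w"))`, where both file names are placeholders.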
