
**lew13** · 03-14-2010, 02:52 PM

revised request

Follow-up to my previous post.

After closer inspection of the Mosaik unaligned-reads output, I see that read pairs are kept together. Unfortunately, the problem now is that paired reads and single (orphaned) reads are mixed in the same unaligned-reads file. I'd like to use only the paired reads as input for a de novo assembly with Velvet. Does anyone have advice on how to easily remove the orphaned reads from my unaligned-reads file?

Thanks,
Laura

**krobison** · 03-14-2010, 06:56 PM

If paired reads are "next" to each other in the file (i.e. one mate is always immediately followed by the other), then a short Perl or Python script can do this easily.

Reading all the reads into memory at once will likely exhaust your memory if you have many reads.

Instead, cache only the current read. If the next read is from the same fragment as the cached one, output both. Either way, the current read then becomes the cached read.

I haven't used Mosaik; if it writes the unaligned sequences to a FASTQ file, then BioPerl has all the routines left unwritten below (though not necessarily with my names!)

Code:

#!/usr/bin/perl
## not nearly a complete perl program
use strict;
use warnings;

my $cachedRead = undef;

# here's the incomplete part: need to write an object to open the unaligned reads file
# and return one read each time nextRead() is called (return undef at end-of-file)
my $reader;  ## placeholder: construct that reader object here

# also each read is an object with a fragmentId() method (returns the fragment ID)
# and a writeRead() method (writes the read back to STDOUT or a waiting filehandle)

while (defined(my $currRead = $reader->nextRead()))  ## loop ends at end-of-file
{
  if (defined $cachedRead && $cachedRead->fragmentId() eq $currRead->fragmentId())
  {
    # cached and current reads are mates: write out the complete pair
    $cachedRead->writeRead();
    $currRead->writeRead();
  }
  $cachedRead = $currRead;
}
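For anyone who wants something that runs as-is, here is a minimal plain-Perl sketch of the same cache-and-compare idea, with no BioPerl dependency. It assumes (not confirmed for Mosaik output) that the unaligned reads are plain 4-line FASTQ records, that mates sit next to each other, and that mate names differ only by a trailing /1 or /2. The names `next_read`, `filter_pairs` and `keep_pairs.pl` are hypothetical.

```perl
#!/usr/bin/perl
# Minimal sketch: keep only complete read pairs from a FASTQ stream.
# ASSUMPTIONS (not confirmed for Mosaik output): plain 4-line FASTQ records,
# mates adjacent in the file, mate names differing only by a /1 or /2 suffix.
use strict;
use warnings;

# Read one 4-line FASTQ record from $fh; return a hash ref, or undef at
# end-of-file (a truncated final record is also treated as end-of-file).
sub next_read {
    my ($fh) = @_;
    my @lines;
    for (1 .. 4) {
        my $line = <$fh>;
        return unless defined $line;
        push @lines, $line;
    }
    my ($name) = $lines[0] =~ /^\@(\S+)/
        or die "Not a FASTQ header: $lines[0]";
    # Hypothetical naming scheme: strip /1 or /2 to get the fragment ID.
    (my $fragment_id = $name) =~ s{/[12]$}{};
    return { fragment_id => $fragment_id, text => join('', @lines) };
}

# Stream reads from $in and write to $out only reads whose mate is adjacent.
# Only one read is held in memory at a time. Returns the number of pairs.
sub filter_pairs {
    my ($in, $out) = @_;
    my $cached;
    my $pairs = 0;
    while (defined(my $curr = next_read($in))) {
        if (defined $cached && $cached->{fragment_id} eq $curr->{fragment_id}) {
            print {$out} $cached->{text}, $curr->{text};
            $pairs++;
            $curr = undef;    # clear the cache: a read belongs to one pair only
        }
        $cached = $curr;
    }
    return $pairs;
}

# Run only when a file name is given on the command line.
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $n = filter_pairs($in, \*STDOUT);
    close $in;
    print STDERR "$n pairs written\n";
}
```

Usage (hypothetical script name): `perl keep_pairs.pl unaligned.fastq > paired.fastq`. One deliberate difference from the outline above: after a pair is written, the cache is cleared, so a read can never end up in two pairs even if three consecutive records share a fragment ID. If the mates in your file are named differently, only the `s{/[12]$}{}` line should need changing.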

Reorganize unaligned read output

Comment

Comment

Latest Articles

ad_right_rmr

News