03-05-2010, 07:52 AM   #1
Junior Member
Location: Massachusetts

Join Date: Aug 2009
Posts: 2
Reorganize unaligned read output

Hello all.

I am using Mosaik to align an Illumina read dataset to a reference genome. The DNA that was sequenced came from more than one organism, so I would like to try a Velvet de novo assembly with the unaligned reads that Mosaik dumped into a fastq file. Our reads are paired-end, which should help the de novo assembly, but Mosaik doesn't retain the paired-end information when dumping reads to the fastq file.

Does anyone have a script for reorganizing a fastq file into a Velvet-acceptable order? (i.e., Read 1 from a pair is followed by Read 2, then Read 1 and Read 2 from another pair, etc.)

Thank you,
Laura Williams
lew13
03-14-2010, 03:52 PM   #2
Junior Member
Location: Massachusetts

Join Date: Aug 2009
Posts: 2
Revised request

Follow-up to my previous post.

After a closer inspection of the Mosaik unaligned reads output, I see that pairs of reads are kept together. Unfortunately, the problem now is that paired reads and single (orphaned) reads are mixed in the same unaligned reads file. I'd like to input only the paired reads into Velvet for a de novo assembly. Does anyone have advice on how to easily remove the orphaned reads from my unaligned reads file?

lew13
03-14-2010, 07:56 PM   #3
Senior Member
Location: Boston area

Join Date: Nov 2007
Posts: 747

If paired reads are "next" to each other in the file (i.e. always first one, then the other), then a little bit of Perl, Python, etc. can easily do this.

Reading all the reads into memory at once will likely exhaust your memory if you have many reads.

What you want to do is cache one read at a time. If the next read is from the same fragment as the cached read, output both. Either way, the read you just examined becomes the new cached read.
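
The caching idea above can be sketched in Python as a generator that works on any stream of reads. This is only a minimal sketch: `keep_adjacent_pairs` and its `fragment_id` argument are made-up names, not part of Mosaik, Velvet, or BioPerl, and it assumes mates always sit next to each other in the file and that no fragment ID appears on more than two reads.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

Read = TypeVar("Read")


def keep_adjacent_pairs(
    reads: Iterable[Read],
    fragment_id: Callable[[Read], str],
) -> Iterator[Tuple[Read, Read]]:
    """Yield (first, second) for neighbouring reads that share a fragment ID.

    Only one read is cached at a time, so `reads` can be a generator over
    an arbitrarily large file. Reads without an adjacent mate (orphans)
    are silently dropped.
    """
    cached: Optional[Read] = None
    for current in reads:
        if cached is not None and fragment_id(cached) == fragment_id(current):
            yield cached, current
        # Either way, the read just examined becomes the cached read.
        # (Assumes at most two reads per fragment ID.)
        cached = current
```

For example, with read names `a/1, a/2, b/1, c/1, c/2` and a `fragment_id` that strips the `/1` or `/2` suffix, this would yield the `a` and `c` pairs and skip the orphan `b/1`.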

I haven't used Mosaik; if it writes the unaligned sequences to a FASTQ file, then BioPerl has all the routines left unwritten below (though not necessarily with my names!)

## not nearly a complete Perl program
use strict;
use warnings;

my $cachedRead = undef;

# here's the incomplete part: need to write an object ($reader) to open the unaligned reads file
# and report one read each time nextRead() is called (return undef if end-of-file)

# also each read is an object with a fragmentId() method (returns fragmentId)
# and a writeRead() method (writes read back to STDOUT or a waiting filehandle)

while (my $currRead = $reader->nextRead())
{
  last if (!defined $currRead);  ## if not defined, end-of-file reached
  if (defined $cachedRead && $cachedRead->fragmentId() eq $currRead->fragmentId())
  {
    # same fragment as the cached read: output both reads of the pair
    $cachedRead->writeRead();
    $currRead->writeRead();
  }
  $cachedRead = $currRead;  # either way, the current read becomes the cached read
}
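
The pieces left unwritten above (a reader that hands back one read at a time, and a way to get a fragment ID from a read) might look like the following in Python. This is a sketch under assumptions that have not been checked against Mosaik's actual output: each record is exactly four lines, and mates are named with Illumina-style `/1` and `/2` suffixes. `FastqRead`, `read_fastq`, `fragment_id`, and `format_fastq` are illustrative names only.

```python
from typing import Iterator, NamedTuple, TextIO


class FastqRead(NamedTuple):
    name: str       # header line without the leading "@"
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRead]:
    """Yield one read at a time from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return              # end of file
        if not header.strip():
            continue            # tolerate stray blank lines between records
        sequence = handle.readline().rstrip("\n")
        handle.readline()       # "+" separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield FastqRead(header.rstrip("\n")[1:], sequence, quality)


def fragment_id(read: FastqRead) -> str:
    """Strip an ASSUMED trailing "/1" or "/2" mate suffix from the read name."""
    tokens = read.name.split()
    first_token = tokens[0] if tokens else ""
    if first_token.endswith(("/1", "/2")):
        return first_token[:-2]
    return first_token


def format_fastq(read: FastqRead) -> str:
    """Render a read back into four-line FASTQ text."""
    return f"@{read.name}\n{read.sequence}\n+\n{read.quality}\n"
```

Because `read_fastq` is a generator, only the read being examined (plus whatever the caller caches) is held in memory, which is the point of the caching approach. If the read names follow a different convention, only `fragment_id` should need to change.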
krobison

All times are GMT -8.