SEQanswers

05-19-2011, 08:11 AM   #1
whaleberg
Junior Member
 
Location: Philadelphia

Join Date: May 2011
Posts: 2
Generating simulated paired end reads

I'm writing an NGS simulator as part of a project for my school. I'm trying to simulate paired-end FASTQ data, but I'm unsure of the correct formatting. I can produce simulated single-end reads and align them successfully with BWA, but when I try to produce pairs I end up with various alignment errors. I read that for BWA, the reads in the second file should be the reverse complement.

If I have a base sequence and simulate a pair of reads, one from each end, like this...
Code:
            AAAGGGTTCTC
read        AAAG
read               TCTC
I'm outputting
file1
read/1 AAAG

file2
read/2 TCTC

(along with the quality scores and the rest of the formatting)

Is that the way they should be? It doesn't seem to work, so I'm guessing not, but neither does not doing the reverse complement. I suspect I'm missing something about how pairs of reads are represented.


I may also have the naming conventions wrong. Should paired reads be separated into 2 different files and labeled <read-name>/1 and <read-name>/2? Ultimately they get rolled into 1 file, so should I be putting them together into 1?

I'm not sure if I have a software bug and am just producing wrong data, or if I'm producing the right thing but formatting it wrong.

Does anyone have an example of correct formatting that I could use as a template? I have had trouble locating an example.

Any help would be appreciated.
05-19-2011, 08:26 AM   #2
proteomania
Member
 
Location: France

Join Date: Sep 2010
Posts: 11

If you are talking about standard Illumina paired-end sequencing, I think it should be

file1
read/1 AAAG

file2
read/2 GAGA

Our Illumina data look this way. The mates are on opposite strands even though the library is not strand-specific.
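
As a rough illustration of the orientation described above, here is a minimal Python sketch that takes the example sequence from the first post, keeps the first read as-is, and reverse complements the read from the other end. The helper names (`reverse_complement`, `simulate_pair`, `fastq_record`) are made up for this example, and the constant quality character is only a placeholder, not real quality data.

```python
# Sketch only: all function names here are hypothetical helpers,
# not part of BWA or any other tool.

# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_pair(fragment, read_len):
    """Return (read1, read2) for one fragment.

    read1 is the first read_len bases of the fragment as written.
    read2 is the last read_len bases, reverse complemented, so the
    two mates lie on opposite strands and point toward each other.
    """
    if read_len > len(fragment):
        raise ValueError("read_len is longer than the fragment")
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    return read1, read2


def fastq_record(name, seq, qual_char="I"):
    """Format one four-line FASTQ record.

    qual_char is a placeholder: every base gets the same quality
    character (an assumption made for this sketch).
    """
    return "@{}\n{}\n+\n{}\n".format(name, seq, qual_char * len(seq))


# The example from the first post: 11 bp sequence, 4 bp reads.
read1, read2 = simulate_pair("AAAGGGTTCTC", 4)

# One record per file; both mates share the base name "read".
file1_text = fastq_record("read/1", read1)
file2_text = fastq_record("read/2", read2)

print(file1_text, end="")
print(file2_text, end="")
```

The two strings correspond to file1 and file2 above; with more than one pair, the records would need to appear in the same order in both files so that the mates line up.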

hope it helps.
05-19-2011, 09:56 AM   #3
whaleberg
Junior Member
 
Location: Philadelphia

Join Date: May 2011
Posts: 2

Thank you, that is helpful. I had tried doing it that way, but I was still getting weird results. It must be a bug in something else, not the way I'm formatting them.
05-21-2011, 12:15 PM   #4
husamia
Member
 
Location: cinci

Join Date: Apr 2010
Posts: 66

Do you mean that they are not aligning correctly, or something else?
All times are GMT -8.