SEQanswers

Old 01-30-2015, 09:35 AM   #1
JPZ
Junior Member
 
Location: Greece

Join Date: Mar 2012
Posts: 4
Question Draft genome scaffolding with RNAseq paired-end reads

Hello all,

I used Tophat to map 100bp, PE Illumina transcriptome reads to a draft genome (133062 contigs).
Our main goal was SNP mining, but it has been suggested to me that the reads could also be used for scaffolding.

I have no experience in genome assembly and scaffolding, but I assume that if I can find read pairs where the 2 reads are mapped to different genomic contigs, the 2 genomic contigs could then be connected.

How can I search the BAM alignments for such read pairs?

Alternatively, I could use an assembler that can combine different types of reads, such as MIRA, but I thought it would take longer, and the genomic reads are not available anyway.

Thank you!
Old 01-30-2015, 10:16 AM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Look for reads where the "rname" field (column 3) and the "rnext" field (column 7) are different (and rnext is not "=" or "*"); those pairs have their two reads mapped to different contigs.
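As a rough illustration of the check described above, here is a minimal Python sketch that reads SAM text (for example the output of `samtools view`) and reports records whose mate sits on a different contig. The function name `cross_contig_mates` and the read and contig names in the example are made up for illustration; only the column positions come from the SAM format.

```python
import io

def cross_contig_mates(sam_lines):
    """Yield (qname, rname, rnext) for records whose mate maps to another contig."""
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        qname, rname, rnext = fields[0], fields[2], fields[6]
        # "=" means the mate is on the same reference; "*" means no information
        if rnext in ("=", "*") or rname == "*" or rnext == rname:
            continue
        yield qname, rname, rnext

# Tiny made-up example (hypothetical read and contig names)
example_sam = io.StringIO(
    "@SQ\tSN:contig1\tLN:1000\n"
    "r1\t99\tcontig1\t10\t50\t100M\t=\t200\t290\t*\t*\n"
    "r2\t65\tcontig1\t900\t50\t100M\tcontig2\t5\t0\t*\t*\n"
)
hits = list(cross_contig_mates(example_sam))
print(hits)
```

On real data the same function could be fed `sys.stdin` instead of the example, with `samtools view accepted_hits.bam` piped into the script.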
Old 02-03-2015, 04:03 AM   #3
JPZ
Junior Member
 
Location: Greece

Join Date: Mar 2012
Posts: 4

Thank you for the info Brian,
good starting point, saved me a lot of reading and guesswork.

I also thought of filtering for MAPQ = 50 (should be uniquely mapped reads)
and properly paired reads (FLAG = 83|99|147|163)

The following command should then extract the alignments of interest:

samtools view -q 50 accepted_hits.bam |gawk '($2 == 83 || $2 == 99 || $2 == 147 || $2 == 163) && $7 !~/[*=]/ {print $3, $7}' > output

And thus obtain a list of joined contigs.
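The same filter can be sketched in Python by decoding the FLAG bits instead of listing four exact values, and by counting how many records support each contig pair. This is only a sketch under assumptions: the name `count_contig_links` and the example records are hypothetical, and the proper-pair requirement is optional here because many aligners do not set the proper-pair bit (0x2) when mates land on different references. Whether TopHat does is worth checking on the actual BAM; if it does not, flags 83/99/147/163 would exclude exactly the pairs of interest.

```python
from collections import Counter

# FLAG bits as defined by the SAM format
PAIRED, PROPER_PAIR, UNMAPPED, MATE_UNMAPPED = 0x1, 0x2, 0x4, 0x8

def count_contig_links(sam_lines, min_mapq=50, require_proper_pair=False):
    """Count records linking two different contigs, keyed by sorted contig pair."""
    links = Counter()
    for line in sam_lines:
        if line.startswith("@"):                      # skip header lines
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 11:
            continue
        flag, rname, mapq, rnext = int(f[1]), f[2], int(f[4]), f[6]
        if not flag & PAIRED or flag & (UNMAPPED | MATE_UNMAPPED):
            continue
        if require_proper_pair and not flag & PROPER_PAIR:
            continue
        # MAPQ here is that of this record only; the mate's MAPQ is not checked
        if mapq < min_mapq or rnext in ("=", "*") or rnext == rname:
            continue
        # Sort the names so (A, B) and (B, A) count toward the same link
        links[tuple(sorted((rname, rnext)))] += 1
    return links

# Made-up records: r2 links contig1 and contig2, r3 has low MAPQ, r1 is same-contig
example_records = [
    "r2\t65\tcontig1\t900\t50\t100M\tcontig2\t5\t0\t*\t*",
    "r2\t129\tcontig2\t5\t50\t100M\tcontig1\t900\t0\t*\t*",
    "r3\t65\tcontig1\t950\t3\t100M\tcontig3\t20\t0\t*\t*",
    "r1\t99\tcontig1\t10\t50\t100M\t=\t200\t290\t*\t*",
]
links = count_contig_links(example_records)
print(links)
```

Each read pair can contribute two records (one per mate), so the counts are per record rather than per pair. Requiring several supporting records per link is a common way to reduce spurious joins.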
However, while it is possible to determine which contigs are joined, I assume the length of the N-base padding between them cannot be determined.
Not only may the region not be transcribed, but the insert size for paired reads that have a mate in a different contig also appears to always be 0 (at least that is what Tablet shows).

Or do I have other options I'm unaware of?
Old 02-03-2015, 07:18 AM   #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

The insert size of reads mapped to different contigs is unknown. Scaffolding tools can use the distribution of insert sizes of pairs on the same contig, or user-supplied insert size numbers, to determine how many Ns to pad.
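To make the idea concrete, here is a hedged Python sketch of how a gap could be estimated from the insert sizes of same-contig pairs. It is not how any particular scaffolder works. The function names and example numbers are hypothetical, and it assumes contig A precedes contig B with both in forward orientation, the forward read near the end of A and the reverse read near the start of B. With RNA-seq data, introns inflate genomic insert sizes, so the result is at best a crude guess.

```python
from statistics import median

def median_insert_size(sam_lines):
    """Median TLEN over records whose mate is on the same contig."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 11:
            continue
        tlen = int(f[8])
        # tlen > 0 keeps only the leftmost mate, so each pair is counted once
        if f[6] == "=" and tlen > 0:
            sizes.append(tlen)
    return median(sizes) if sizes else None

def estimate_gap(insert_size, len_a, fwd_start_a, rev_end_b):
    """Estimate the number of Ns between contig A and contig B.

    fwd_start_a: 1-based leftmost position of the forward read on A
    rev_end_b:   1-based rightmost position of the reverse read on B
    """
    used_on_a = len_a - fwd_start_a + 1   # bases from the read start to the end of A
    used_on_b = rev_end_b                 # bases from the start of B to the read end
    return max(0, round(insert_size - used_on_a - used_on_b))

# Made-up same-contig records giving insert sizes of 290 and 310
same_contig = [
    "r1\t99\tcontig1\t10\t50\t100M\t=\t200\t290\t*\t*",
    "r1\t147\tcontig1\t200\t50\t100M\t=\t10\t-290\t*\t*",
    "r4\t99\tcontig1\t50\t50\t100M\t=\t260\t310\t*\t*",
]
insert = median_insert_size(same_contig)
gap = estimate_gap(insert, len_a=1000, fwd_start_a=901, rev_end_b=100)
print(insert, gap)
```

A negative raw estimate would suggest the contigs overlap; the sketch simply clamps it to 0.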

This might be easier if you just use a standalone scaffolding tool. There are various ones out there, but I don't have a recommendation. Here's a paper comparing some of them:

http://genomebiology.com/2014/15/3/R42
Old 02-03-2015, 07:37 AM   #5
JPZ
Junior Member
 
Location: Greece

Join Date: Mar 2012
Posts: 4

Thank you again for the input,
I'll check the paper and see if using the above filtered alignments can work.
Old 02-04-2015, 12:37 AM   #6
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245

Maybe this program will help:

https://github.com/svm-zhang/AGOUTI

I have not used it myself though.
Old 02-05-2015, 04:45 AM   #7
bioBob
Member
 
Location: Virginia

Join Date: Mar 2011
Posts: 72

L_RNA_scaffolder may also help.

I have used it with varying success.
Old 02-10-2015, 02:05 AM   #8
JPZ
Junior Member
 
Location: Greece

Join Date: Mar 2012
Posts: 4

Thank you all for your answers,
I'll try some of the suggested tools, most likely those that do not have too many dependencies.

Tags
paired end reads, rnaseq alignment, scaffolding

All times are GMT -8.