SEQanswers

01-31-2013, 07:49 AM #1
yzzhang
Member
 
Location: florida

Join Date: Jan 2013
Posts: 67
Break scaffolds with many Ns into contigs

Dear all,
Recently I sequenced a plant genome on a HiSeq 2000 (101 bp paired-end reads) and assembled the reads into contigs. Because I sequenced only one insert library (insert size 400 bp), I got a very large number of contigs. To build a better genome and recover more genes, I aligned the contigs to the genome of a related species using ABACAS and filled the gaps using GapFiller. The two species probably differ in some chromosomal regions, so the resulting scaffolds (one per chromosome) contain many Ns. I want to break the scaffolds back into contigs wherever there are more than 100 Ns between two consecutive sequences. I am only a biologist and have no knowledge of scripting. Could anyone point me to software or a script that would work for my data? Many thanks.
Best Wishes,
yun
01-31-2013, 08:52 AM #2
westerman
Rick Westerman
 
Location: Purdue University, Indiana, USA

Join Date: Jun 2008
Posts: 1,104

There are probably many ways. Using the FASTX-Toolkit plus standard Unix tools should work. I think the following would do it, but I have not tried it:

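1) fasta_formatter to convert the scaffolds into single-line sequences
2) sed -E 's/N{100,}/\n>\n/g' to replace every run of 100 or more Ns with a newline, an empty ">" header and another newline (this needs GNU sed; in a basic regular expression without -E the braces are taken literally, so 's/N{100,}/>/g' would not match the gaps)
3) fastx_renamer to give the new headers unique names
4) (optional) fasta_formatter for more sensible line breaks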
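Step 2 is the heart of the idea: find every run of at least 100 Ns and cut there. Below is a minimal Python sketch of just that step, assuming gaps may be written as either N or n. The function name split_at_gaps is hypothetical, and it also reports 0-based, end-exclusive coordinates so each piece can be traced back to its scaffold. A run of 99 Ns is below the threshold and stays inside a piece.

```python
import re


def split_at_gaps(seq, min_gap=100):
    """Split a scaffold at runs of >= min_gap Ns (hypothetical helper).

    Returns a list of (start, end, piece) tuples with 0-based, end-exclusive
    coordinates relative to the scaffold. Upper- and lower-case n are both
    treated as gap characters (an assumption). Empty pieces, which arise when
    a scaffold starts or ends with a gap, are dropped.
    """
    gap = re.compile("[Nn]{%d,}" % min_gap)
    pieces = []
    start = 0
    for m in gap.finditer(seq):
        if m.start() > start:
            pieces.append((start, m.start(), seq[start:m.start()]))
        start = m.end()
    if start < len(seq):
        pieces.append((start, len(seq), seq[start:]))
    return pieces


if __name__ == "__main__":
    # 150 Ns is a gap; 99 Ns is below the threshold and stays in the piece.
    scaffold = "ACGT" + "N" * 150 + "GGCC" + "N" * 99 + "TTAA"
    for start, end, piece in split_at_gaps(scaffold):
        print(start, end, len(piece))
    # prints:
    # 0 4 4
    # 154 261 107
```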

A short Perl, Python or Ruby program would also work, but that would require some programming.

What I gave above is just the bare bones of the idea, not an actual implementation, which is left up to you.
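As a rough illustration of the short-program route, here is a minimal Python sketch that covers all four steps in one script, assuming a plain FASTA input and the 100-N threshold from the question. The script name break_scaffolds.py, the function names, the `_ctgN` naming scheme and the 60-column line width are hypothetical choices, not part of any existing tool.

```python
#!/usr/bin/env python3
"""Break scaffolds at runs of >= MIN_GAP Ns and write the pieces as contigs.

Hypothetical usage: python break_scaffolds.py scaffolds.fasta > contigs.fasta
"""
import re
import sys

MIN_GAP = 100    # break where a gap has at least this many Ns (assumed threshold)
LINE_WIDTH = 60  # wrap output sequence lines at this many bases (assumed width)


def read_fasta(lines):
    """Yield (name, sequence) pairs; multi-line sequences are joined.

    The name is the first word of the header line.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else "unnamed"
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def break_scaffold(name, seq, min_gap=MIN_GAP):
    """Yield (contig_name, contig_seq) for each piece between gaps of >= min_gap Ns."""
    gap = "[Nn]{%d,}" % min_gap
    pieces = [p for p in re.split(gap, seq) if p]  # drop empty end pieces
    for i, piece in enumerate(pieces, start=1):
        yield "%s_ctg%d" % (name, i), piece


def fasta_lines(records, width=LINE_WIDTH):
    """Yield FASTA text lines, wrapping each sequence at `width` bases."""
    for name, seq in records:
        yield ">%s\n" % name
        for i in range(0, len(seq), width):
            yield seq[i:i + width] + "\n"


def main(path):
    with open(path) as handle:
        for name, seq in read_fasta(handle):
            sys.stdout.writelines(fasta_lines(break_scaffold(name, seq)))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        sys.stderr.write("usage: python break_scaffolds.py scaffolds.fasta\n")
```

Under these assumptions, running `python break_scaffolds.py scaffolds.fasta > contigs.fasta` would write one record per piece (for example `scaffold1_ctg1`, `scaffold1_ctg2`), while gaps shorter than 100 Ns stay inside the contigs.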
02-01-2013, 07:50 AM #3
yzzhang
Member
 
Location: florida

Join Date: Jan 2013
Posts: 67

Thank you very much. In the end I found a Perl script for this purpose. I need to learn some programming; I was wondering whether Perl or Python is easier to learn, and I will try Perl first. Anyway, thanks.
