#1 |
Senior Member
Location: NL, Leiden Join Date: Feb 2010
Posts: 245
|
Hi all,

We are currently performing a de novo assembly using Illumina mate pairs. We have assembled the reads using CLCBio, but CLCBio produces only contigs, not scaffolds. Since we have mate pairs, we would like to use them to build scaffolds.

The problem is that assembly programs like SOAPdenovo or SSAKE scaffold from files that were produced during their own contig assembly. They don't have a stand-alone program for just scaffolding an existing contig file.

Is there any software or algorithm that takes the contigs file (in .fasta format) and the mate-pair files as input and produces scaffolds? Or does someone have another solution?

Kind regards,
Marten
#3 | |
Member
Location: Uppsala Join Date: Jan 2010
Posts: 25
|
![]() Quote:
__________________
~Adnan~ |
|
#4 |
Senior Member
Location: NL, Leiden Join Date: Feb 2010
Posts: 245
|
Thanks for the replies, but I don't think your answers will work.

MIRA uses Bambus for scaffolding (if I'm correct?). However, Bambus doesn't read a .fasta file for scaffolding; it needs a .contig file, which I don't have. In addition, I can't give it the two mate-pair files I have (one for each read end), only a regular expression describing how the two reads of a pair are named.

So, my input is:
- 1 .fasta file containing the contigs
- 2 .fasta files containing the mate pairs

Is there a way to do this?

Kind regards,
Marten
#5 |
Junior Member
Location: Buenos Aires Join Date: Dec 2009
Posts: 7
|
![]()
any updates on this....
Last edited by gabriel.lichtenstein; 04-07-2010 at 06:06 AM. |
#6 |
Senior Member
Location: NL, Leiden Join Date: Feb 2010
Posts: 245
|
#7 |
Junior Member
Location: Vancouver Join Date: Oct 2009
Posts: 4
|
I believe CLCBio can export assemblies as an ACE file.
|
#8 |
Senior Member
Location: NL, Leiden Join Date: Feb 2010
Posts: 245
|
#9 |
Junior Member
Location: Vancouver Join Date: Oct 2009
Posts: 4
|
#10 |
Senior Member
Location: NL, Leiden Join Date: Feb 2010
Posts: 245
|
#11 |
Junior Member
Location: spain Join Date: Apr 2010
Posts: 7
|
Hi, I'm trying to run Bambus but I don't have a .mates file. Does anyone know how I can create this file?

I have 454 output (fasta + sff) from a bacterial genome and I assembled it with phrap. I have already converted the .ace to .contig using ace2contig from AMOS. Thanks!
#12 | |
Senior Member
Location: NL, Leiden Join Date: Feb 2010
Posts: 245
|
![]() Quote:
cat my.fasta | grep ">" | sed 's/>//g' | sed 's/\/1$/./;s/\/2$/./' | awk -F "." '{print $1}' | sort | uniq -c | awk '{if ($1 == 2) print $2"/1\t"$2"/2\tsmall"}' > mates.txt

Use the fasta file that contains the reads as 'my.fasta'. The read names in 'my.fasta' have to end with /1 and /2. If your read names use other suffixes, like .x and .y, you should replace

sed 's/\/1$/./;s/\/2$/./'

with, for example,

sed 's/\.x$/./;s/\.y$/./'

in the command above.

If you have two fasta files (one per read end), just use one of them and change if ($1 == 2) to if ($1 == 1) in the command; this way you only have to run it on one file.

This will print the names to 'mates.txt'. The only thing left to do is to set your library name and insert sizes at the top of this file.

Bambus will probably generate a lot of errors, because some names are not found in the .contig file. But this shouldn't be a problem.

Hope this works
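For anyone who prefers a script to the one-liner, here is a rough Python sketch of the same idea: take the read names from the FASTA headers, strip the /1 or /2 suffix, and emit one line for every name that occurs with both suffixes. The function names `build_mates` and `fasta_names` are made up for this example, and the library name `small` is simply copied from the command above. One difference from the `uniq -c` approach: this version requires one /1 name and one /2 name, not just any two occurrences.

```python
from collections import defaultdict


def fasta_names(path):
    """Yield the first whitespace-delimited word of every FASTA header line."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    yield fields[0]


def build_mates(read_names, library="small", suffixes=("/1", "/2")):
    """Return tab-separated mate lines for names seen with both suffixes.

    `library` and `suffixes` are illustrative defaults, not Bambus settings.
    """
    seen = defaultdict(set)
    for name in read_names:
        for suffix in suffixes:
            if name.endswith(suffix):
                # Group reads by their name without the pair suffix.
                seen[name[: -len(suffix)]].add(suffix)
                break
    fwd, rev = suffixes
    return [
        f"{base}{fwd}\t{base}{rev}\t{library}"
        for base in sorted(seen)
        if seen[base] == {fwd, rev}
    ]
```

It could be used as `print("\n".join(build_mates(fasta_names("my.fasta"))))`, with `suffixes=(".x", ".y")` for the other naming scheme. The library name and insert sizes still have to be added at the top of the file by hand.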
|
#13 |
Junior Member
Location: spain Join Date: Apr 2010
Posts: 7
|
Thanks boetsie for your quick answer.

But I can't use your script in this project, because the two 454 outputs I have, 454Reads.01.MID4.fna and 454Reads.02.MID4.fna, contain sequences with different names. Every ID is unique, so the script creates an empty mates.txt. Besides, the other bacterium I'm working with has only one fasta file from 454. Both fasta files look like this:

>F35ERS102DJ7GS rank=0000002 x=1343.0 y=826.0 length=56
ATCAGACACGGAGGCGTACGCGCCGCTGTTCCAGGTGATGCTGGCATTCCAGAACA
>F35ERS102DBYUE rank=0000006 x=1249.0 y=1428.0 length=69
ATCAGACACGCCGCCGGCACCTTCGCCGCTGCCGCGCTCGCCACCGGTGGCACCCGTCGT
GCTGTGGTC
>F35ERS102C47FN rank=0000036 x=1172.0 y=1361.0 length=68
ATCAGACACGAGGTGAAGACCGGTTTCCGTCGCGGCGGAGAATAGCCGAACATCAGCGCG
CGATCGGG

I'm wondering if there is a way to create the .mates file from the data I have. Any other ideas? Thanks
#14 | |
Junior Member
Location: spain Join Date: Apr 2010
Posts: 7
|
![]() Quote:
454Reads.01.MID4.fna is like this:

>FZ92HC101CZUHH length=41 xy=1111_1155 region=1 run=R_2009_08_04_12_33_02_
CGCGCGTTTCTCGTACGGCTCGCTGTATCCGACNCGCGCGC
>FZ92HC101DJEHD length=46 xy=1334_0127 region=1 run=R_2009_08_04_12_33_02_
GTCTCGCGTCGTGTCTTCGCGTCGTATGCGGTACTGGTCAGGCGTT

454Reads.02.MID4.fna is like this:

>FZ92HC102IDBLW length=40 xy=3315_0370 region=2 run=R_2009_08_04_12_33_02_
CGCGCGTTCTCGTACGGCTCGCTGTATCCGACNCGCGCGC
>FZ92HC102JYG94 length=40 xy=3966_0618 region=2 run=R_2009_08_04_12_33_02_
CGCGCGTTCTCGTACGGCTCGCTGTATCCGACNCGCGCGC

Can I extract any information from these fastas to create a .mates? Thanx
|
#15 | |
Senior Member
Location: NL, Leiden Join Date: Feb 2010
Posts: 245
|
![]() Quote:
Can you tell me what your .contig file looks like? The names in the mate file should be the same as the first string after the "#" on the read lines of the .contig file. Each "#" line represents a read that has been mapped to the contig (the contig itself is on the line starting with "##"). So if the "#" line starts with e.g. FZ92HC102IDBLW, followed by the offset in parentheses, like

#FZ92HC102IDBLW(0)

you should extract the names from both files and put them in the same file.

If this is indeed the case, you can use the script I attached. Run it with

perl testmates.pl file1 file2

It will generate a txt file with the mates. The only thing left to do is to put the library sizes at the top of the file.

More info about the .contig file at http://www.cbcb.umd.edu/research/con...entation.shtml

Hope this helps.

Last edited by boetsie; 04-15-2010 at 06:25 AM.
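To check which read names a .contig file actually contains, something like the following Python sketch could help. It assumes only what is described above: contig lines start with "##", read lines start with "#" followed by the read name and the offset in parentheses. The function name `reads_per_contig` is invented for this example, and everything after the offset on a read line is ignored.

```python
import re

# "#NAME(OFFSET)" at the start of a read line, e.g. "#FZ92HC102IDBLW(0)".
READ_LINE = re.compile(r"^#([^#\s(]+)\((-?\d+)\)")


def reads_per_contig(lines):
    """Map each contig name to a list of (read_name, offset) tuples."""
    contigs = {}
    current = None
    for line in lines:
        if line.startswith("##"):
            fields = line[2:].split()
            if not fields:
                continue
            # First word after "##" is taken as the contig name.
            current = fields[0]
            contigs[current] = []
        elif line.startswith("#") and current is not None:
            match = READ_LINE.match(line)
            if match:
                contigs[current].append((match.group(1), int(match.group(2))))
    return contigs
```

Calling `reads_per_contig(open("my.contig"))` (the file name is a placeholder) gives the names that the entries in the mate file have to match, so it is easy to see beforehand how many mates Bambus will fail to find.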
|
#16 |
Junior Member
Location: spain Join Date: Apr 2010
Posts: 7
|
Hi boetsie, thanks again for your quick reply.

Here is a part of my .contig file. It was created by ace2contig (AMOS package) and the input was the .ace that phrap generated after the assembly. I'll try to use the script you attached. Thank you so much again!

##Contig1 1 458 bases, 00000000 checksum.
agttcggcatggggtcaggtggttccactgcgctattgccgccaggcaaattcttcaatc
tgagaaagctgatgtaagtaattcgttcattcgctacaaggccagaaacacttcttgggt
gttgtatggttaagcctcacgggtaattagtatgggttagctcaacgtatcgctacgctt
acacaccccacctatcaacgttgtggtctccaacggccctttaggaccctcaaggggtca
gggatgactcatctcagggctcgcttcccgcttagatgctttcagcggttatcgattccg
aacttagctaccgggcagtgccactggcgtgacaacccgaacaccagaggttcgttcact
ccggtcctctcgtactaggagcaactcccttcaatcatccaacgcccacggcagataggg
accgaactgtctcacgacgttctgaacccagctcgcgt
#FZ92HC101BPK62(0) [] 458 bases, 00000000 checksum. {1 458} <1 459>
agttcggcatggggtcaggtggttccactgcgctattgccgccaggcaaattcttcaatc
tgagaaagctgatgtaagtaattcgttcattcgctacaaggccagaaacacttcttgggt
gttgtatggttaagcctcacgggtaattagtatgggttagctcaacgtatcgctacgctt
acacaccccacctatcaacgttgtggtctccaacggccctttaggaccctcaaggggtca
gggatgactcatctcagggctcgcttcccgcttagatgctttcagcggttatcgattccg
aacttagctaccgggcagtgccactggcgtgacaacccgaacaccagaggttcgttcact
ccggtcctctcgtactaggagcaactcccttcaatcatccaacgcccacggcagataggg
accgaactgtctcacgacgttctgaacccagctcgcgt
##Contig2 1 379 bases, 00000000 checksum.
ttctgagggaacacgcgttctgcgcgggttgtcttggtgctcactgttttccgccccgga
gtttgtggggtgttgggggtggtgggtgtgtgttgtttgagaagtgcatagtggatgcga
gcatctagcccggcgagttccttggtgttcttgttgggttgtgtgttctgcaatttcgat
tctggtttgtgcgatcgcgtgttgtgatcgttgatttttgtttgttgtccgcattcgcgt
ctcgggcactgtttggtgtgtggggtgtgtttgtgggtgttgttgtaagtgtttgagggc
gttcggtggatgccttggtaccaggagccgatgaaggacggccgtgcggtgggtcagtga
taaatcgacatgttaggtg
#FZ92HC101BFQDN(0) [] 379 bases, 00000000 checksum. {1 379} <1 380>
ttctgagggaacacgcgttctgcgcgggttgtcttggtgctcactgttttccgccccgga
gtttgtggggtgttgggggtggtgggtgtgtgttgtttgagaagtgcatagtggatgcga
gcatctagcccggcgagttccttggtgttcttgttgggttgtgtgttctgcaatttcgat
tctggtttgtgcgatcgcgtgttgtgatcgttgatttttgtttgttgtccgcattcgcgt
ctcgggcactgtttggtgtgtggggtgtgtttgtgggtgttgttgtaagtgtttgagggc
gttcggtggatgccttggtaccaggagccgatgaaggacggccgtgcggtgggtcagtga
taaatcgacatgttaggtg
#17 |
Junior Member
Location: spain Join Date: Apr 2010
Posts: 7
|
Hi, I forgot to mention that I also have the .sff files. If I can use them to create the .mates file, that would be great.

Can I? If so, how?
#18 | |
Senior Member
Location: NL, Leiden Join Date: Feb 2010
Posts: 245
|
![]() Quote:
If the mates that are present in the .contig file are all present in the two .fasta files, you can just use the two fasta files to create the .mates file.
|
#19 |
Junior Member
Location: spain Join Date: Apr 2010
Posts: 7
|
Hi, the 454 output is sff (it looks like a binary file), but we use a script called sff_extract to convert this data into fasta, xml and quality files. I was just reading that "The 454 paired-end protocol will generate reads which contain the forward and reverse direction in one read, separated by a linker."

So I think the key to generating the .mates file is the .sff, but I don't know how. I think it shouldn't be so complicated...
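If the reads really do contain both directions separated by a linker, as the quoted sentence says, the basic idea would be to cut each read at the linker and name the two halves as mates. Below is a minimal Python sketch of that idea only. The function `split_at_linker`, the `.1`/`.2` naming and the `min_len` cutoff are assumptions made for illustration. The linker sequence has to come from the library prep protocol and is not known from this thread. Real data would also need inexact matching (sequencing errors, partial linkers) and a decision about the orientation of the two halves, none of which is handled here.

```python
def split_at_linker(read_id, sequence, linker, min_len=15):
    """Split one read at the first exact linker match.

    Returns ((id.1, left), (id.2, right)), or None if the linker is absent
    or either half is shorter than min_len (an arbitrary illustrative cutoff).
    """
    pos = sequence.find(linker)
    if pos == -1:
        return None
    left = sequence[:pos]
    right = sequence[pos + len(linker):]
    if len(left) < min_len or len(right) < min_len:
        return None
    return (f"{read_id}.1", left), (f"{read_id}.2", right)
```

Reads for which this returns a pair could then be listed in the .mates file as `id.1 id.2 libname`. Note that the split reads would have to be the ones that went into the assembly, otherwise their names will not occur in the .contig file.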
#20 | |
Junior Member
Location: spain Join Date: Apr 2010
Posts: 7
|
![]() Quote:
FZ92HC101CZUHH.1 FZ92HC102IDBLW.2 libname
FZ92HC101DJEHD.1 FZ92HC102JYG94.2 libname
FZ92HC101DUWKQ.1 FZ92HC102HS1LU.2 libname
FZ92HC101CUUV5.1 FZ92HC102G8H4Z.2 libname
FZ92HC101EMKQX.1 FZ92HC102HOD38.2 libname
FZ92HC101CE653.1 FZ92HC102HO0J7.2 libname
FZ92HC101ECTBB.1 FZ92HC102IBNJJ.2 libname
FZ92HC101DXMSC.1 TGATCCGGCGCAGGCGTATCTGGGCTCGGATCGTGCCTGGTGCCGACGGCGATGAACGAC libname
FZ92HC101C587C.1 FZ92HC102F3E16.2 libname
FZ92HC101BZ63S.1 CGGTCGGCCGCGGCCGATCTCGGGATTGCGCGGCGTGTGCAT libname
FZ92HC101DEODE.1 CCGCGTGGACATGCCGTTCGAGGAACCGTGGACGCAACC libname
FZ92HC101DP9HX.1 ATCGGCTATGCACAGGTCATCGAGTATCTCGACGGCG libname
FZ92HC101EE90B.1 ACGTCCGACGTGATCAGGAGCGAGTCGGTGACGGCGCTTCGCACTCCGAGGG libname
TTTGATGATCGACATCAAT GCGTTCGACTACCAGTTCGTCGGACCATCCGGGTAGCGTGTCGCAAGGGTCGGTTCCGAA libname
CGTTCGCTGAGCACCGCCGAATCGAGCAGTTCGCGGATCTCGTCGAACGTCCNCGA FZ92HC102GE3MB.2 libname
CGTACGGATGTAGCTGGTGAAGAGGTCCCTTGCGGGCGGAGAAGTCGAGTCGTTCCGTCG TCGAGAGGCCGCGGAAGCGGCCGGAAAGGACGGCAACGATGTTTGACCGTTTCAACTCAG libname
FZ92HC101DBOTK.1 FZ92HC102GVOHT.2 libname
FZ92HC101BEEQB.1 TCTGCGTGGAGACCGTGACGGCTGATCTACGGCCNCCTCGGCCGATGATCGCCGCCT
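In the listing above, several entries have a nucleotide sequence where a read name should be. A quick way to find such lines before handing the file to Bambus could look like the Python sketch below. The function name `check_mates` is made up. The rule that a field consisting only of A/C/G/T/N is a sequence is a heuristic, and skipping lines that begin with `library` is an assumption about how the library definitions at the top of the file look.

```python
import re

# Heuristic: a field made up only of nucleotide letters is a sequence, not a name.
SEQUENCE_LIKE = re.compile(r"^[ACGTNacgtn]+$")


def check_mates(lines):
    """Split mate-pair lines into (good, bad) lists.

    A good line has exactly three fields (read, read, library) and neither
    read field looks like a nucleotide sequence.
    """
    good, bad = [], []
    for line in lines:
        fields = line.split()
        # Assumption: lines starting with "library" are library definitions.
        if not fields or fields[0] == "library":
            continue
        ok = len(fields) == 3 and not any(
            SEQUENCE_LIKE.match(field) for field in fields[:2]
        )
        (good if ok else bad).append(line.rstrip("\n"))
    return good, bad
```

Running `good, bad = check_mates(open("mates.txt"))` (file name is a placeholder) separates the usable pairs from the lines that need another look.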
|