#1
Member
Location: Adelaide Join Date: Jul 2011
Posts: 11
A quick question...
I have some SFF files from a paired-end 454 run using the Titanium linker. When I extract the FASTQ data from the SFF files using:

    sff_extract.py -Q -l linker.fasta *.sff

do I need to include the reverse complement in the linker.fasta file, like this?

    >titanium_linker_seq
    TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG
    >titanium_linker_seq_rc
    CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA

I'm using sff_extract 0.2.8.
Cheers,
Nathan
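For anyone wanting to double-check the second FASTA record, the reverse complement can be computed directly. This is a minimal sketch in plain Python, independent of sff_extract; the helper name `reverse_complement` is made up for illustration, and the check says nothing about whether sff_extract actually needs both orientations in the file.

```python
# Illustrative helper (not part of sff_extract): reverse-complement a DNA string.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Linker sequences exactly as posted in the question.
linker = "TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG"
linker_rc = "CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA"

# True means the second record really is the reverse complement of the first.
print(reverse_complement(linker) == linker_rc)
```

The same helper can be used to generate the `_rc` record for any other linker sequence rather than typing it by hand.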

#2
Member
Location: Adelaide Join Date: Jul 2011
Posts: 11
Apologies for replying to my own thread, but I thought it'd help with future replies and for archiving purposes.
According to section 5.5.4.3, "Extracting paired-end data from SFF" (pp. 82-83), of "Sequence assembly with MIRA3 - The Definitive Guide":
Quote:

#3
Member
Location: Missouri, USA Join Date: May 2011
Posts: 16
Hello,
I will be using MIRA too for 454 assembly, and I would be glad if you could answer my questions.

I am confused about what an insert actually is, and what an insert size is. I am dealing with 454 paired-end data. How is a linker different from an adaptor?

I will be using MIRA's sff_extract to extract the FASTA and qual files from the SFF files. So do the SFF files hold the read info this way:

    |-----75----|------------------------100-----------------|-----75-----|

i.e. Seq.forward - Linker - Seq.reverse? In the above, what is the insert, and what is its size? Does each of the reads in an SFF file have this format?

And in MIRA's mates.file setup for scaffolding, can I just use the following default format for the mate pairs?

    pair (.*)\.f (.*)\.r

Please help me out with my confusion.
Thanks,
Aarthi

#4
Member
Location: Adelaide Join Date: Jul 2011
Posts: 11
Hi Aarthi,
454 inserts: please see the blue box on the 2nd page of the following 454 flyer for a diagrammatic explanation of what inserts are, how the sequences are generated, and how the linker/adaptor might be positioned anywhere in the sequence (or in fact, nowhere):
http://454.com/downloads/De-Novo-Com...omes-Flyer.pdf

Standard 454 protocols can generate 3kb, 8kb and 20kb paired-end libraries. Essentially you have the linker/adaptor flanked by some of the DNA from your organism that was approx 3kb, 8kb or 20kb apart in the original genome.

MIRA's sff_extract will take care of reorienting the sequences into the more standard format expected by most assembly software. The default format for 454 pairs is as you describe and MIRA will take care of this. If sff_extract doesn't find the linker/adaptor sequence in a read, it will treat it as a single-end (shotgun) read without a pair, i.e. it maximises the usage of the raw data.

The sff_extract (v0.2.8) command I use for creating a FASTQ file from the SFF file of a 20kb paired-end library is as follows:

    sff_extract -Q -c -l 454_titanium_linker.fasta -i "insert_size:20000,insert_stdev:5000" sff_file.sff

Hope this helps.
Nathan
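To make the "linker found / linker not found" behaviour concrete, here is a toy sketch of the idea. It is not how sff_extract is implemented: it uses exact string matching, does no reorientation of the halves, and the function name `split_at_linker` is invented for illustration. The `.f`/`.r` suffixes simply mirror the mates pattern quoted earlier in the thread.

```python
# Toy illustration only: exact-match linker splitting. The real tool is
# assumed to be more tolerant of sequencing errors and also reorients the
# two halves; this sketch does neither.
from typing import Dict

LINKER = "TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG"  # Titanium linker from post #1


def split_at_linker(name: str, read: str, linker: str = LINKER) -> Dict[str, str]:
    """Return {name.f, name.r} if the linker is found, else a single unpaired read."""
    pos = read.find(linker)
    if pos == -1:
        # No linker: keep the read as an unpaired (shotgun-style) read.
        return {name: read}
    left = read[:pos]
    right = read[pos + len(linker):]
    if not left or not right:
        # Linker at the very start or end: only one side carries sequence.
        return {name: left or right}
    return {f"{name}.f": left, f"{name}.r": right}


print(split_at_linker("read1", "AAAA" + LINKER + "CCCC"))
print(split_at_linker("read2", "ACGTACGT"))
```

The first call yields two mates, the second an unpaired read, which is the distinction that matters later when pairing reads for scaffolding.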

#5
Member
Location: Missouri, USA Join Date: May 2011
Posts: 16
Thank you so much!
A company called 'Seqwright' sequenced the data for us and provided 3kb, 8kb and 20kb libraries. So now I understand that the insert sizes are 3kb, 8kb and 20kb respectively. But they had done an initial assembly with Newbler for us, and in the Newbler metrics, in the paired read status section, a 'pairDistanceAvg' is given. So is that the insert size?

They have given 2 SFF files per library; they say the 'reason you have two files per run is because it's sequenced on the DNA chip with two regions'. E.g. for the 3kb library, for SFF file 1 the pairDistanceAvg is 2247.3 and the pairDistdev is 561.8, and for SFF file 2 the pairDistanceAvg is 2254.9 and the pairDistdev is 563.7. Why are these not 3kb? And to enter the stddev, do I sum both of them or do I take the average?

And since there are 2 SFF files per run, can I give sff_extract the 2 SFF files as input along with the command you mentioned above, and will it output the fasta, qual and xml files as a single set of files?

#6
Member
Location: Missouri, USA Join Date: May 2011
Posts: 16
And I would like to add a question about using shotgun SFF files in the Bambus scaffolding step.
Please let me know if I got this right. When the sheared DNA fragments are circularized with an adaptor/linker, they are fragmented again. Some of these fragments will have the adaptor flanked by read pairs of approx 150bp on each side, and obviously there will be other fragments with NO adaptor in them. So are these fragments with no adaptor the shotgun sequences? Is that why you provide the shotgun sequence SFF files to Bambus, so that it will not miss out on that data? Did I get it correct?

#7
Member
Location: Missouri, USA Join Date: May 2011
Posts: 16
Also, for the mates.file I am required to provide the minimum insert size (which is the mean insert size - stddev) and the maximum insert size (which is the mean insert size + stddev). I hope I got these right?
So, for example, for the mean insert size of the 3kb run, would it just be 3000, or would it be the average of the numbers 2247.3 and 2254.9 from the 2 SFF files that I mentioned earlier? The same applies to the standard deviations: do I again take the average, (561.8 + 563.7)/2?
Have you set up a mates file yet for scaffolding? If yes, may I know how you set it up with respect to the naming convention?
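The arithmetic being asked about can be written out explicitly. This is only a sketch using the Newbler figures quoted in post #5. Averaging the two regions' means and standard deviations is a simplifying assumption (plausible here because both regions come from the same library and are nearly identical), and taking the bounds as mean minus one stddev and mean plus one stddev is an assumption about what was intended, not a documented Bambus requirement.

```python
# Combined insert-size estimate for the 3kb library (values from post #5).
from statistics import mean

pair_distance_avg = [2247.3, 2254.9]  # pairDistanceAvg, one value per SFF file
pair_distance_dev = [561.8, 563.7]    # pairDistdev, one value per SFF file

insert_mean = mean(pair_distance_avg)
insert_stdev = mean(pair_distance_dev)  # assumption: simple average of the two

# Assumed bounds: one standard deviation either side of the mean.
min_insert = insert_mean - insert_stdev
max_insert = insert_mean + insert_stdev

print(f"mean={insert_mean:.1f} stdev={insert_stdev:.2f}")
print(f"min={min_insert:.0f} max={max_insert:.0f}")
```

A wider window (for example two or three standard deviations) may be more forgiving for scaffolding; the multiplier is a judgment call rather than something the thread settles.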

#8
Member
Location: Adelaide Join Date: Jul 2011
Posts: 11
Quote:
Quote:
Quote:
Some links you, or readers of this post, might find useful:

#9
Member
Location: Adelaide Join Date: Jul 2011
Posts: 11
Quote:
Some links you might find useful:

#10
Member
Location: Adelaide Join Date: Jul 2011
Posts: 11
Quote:

#11
Member
Location: Missouri, USA Join Date: May 2011
Posts: 16
Thank you very much! That was really helpful.
I am sorry to bother you with all the questions. Can I ask you one last question? For the scaffolding with Bambus, is it necessary that we provide the mates file? If yes, since we cannot read the SFF file, do all the SFFs contain the format that I mentioned, (.*)\.f (.*)\.r (with an 'f' and an 'r' on it)? And can I just blindly assume this and give it in? May I know if you have provided the mates and the conf file?
Thanks!!
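Rather than assuming the naming blindly, the read names in the extracted FASTA/FASTQ can be checked against the pattern. The sketch below applies the two regular expressions from the mates line to a list of read names; the function `pair_reads` and the example names are invented for illustration, and real names should be taken from the extracted files themselves.

```python
# Sketch: find mate pairs by the .f / .r naming pattern from the mates line
#   pair (.*)\.f (.*)\.r
# Read names used below are invented for illustration.
import re
from typing import Dict, List, Tuple

FORWARD = re.compile(r"(.*)\.f$")
REVERSE = re.compile(r"(.*)\.r$")


def pair_reads(names: List[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (pairs, unpaired); mates share the same captured prefix."""
    fwd: Dict[str, str] = {}
    rev: Dict[str, str] = {}
    for name in names:
        m_f = FORWARD.match(name)
        m_r = REVERSE.match(name)
        if m_f:
            fwd[m_f.group(1)] = name
        elif m_r:
            rev[m_r.group(1)] = name
    # A pair exists only when both the forward and reverse name are present.
    pairs = [(fwd[k], rev[k]) for k in sorted(fwd) if k in rev]
    paired = {n for pair in pairs for n in pair}
    unpaired = sorted(n for n in names if n not in paired)
    return pairs, unpaired


pairs, unpaired = pair_reads(["read1.f", "read1.r", "read2.f", "read3"])
print(pairs)
print(unpaired)
```

If most names come back unpaired, the pattern does not match the naming in the files and the mates line needs adjusting.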

#12
Member
Location: Adelaide Join Date: Jul 2011
Posts: 11
Quote:
Quote:
Quote:
e.g. you would have a simplified workflow something like this:
Code:
    file.sff ----> sff_extract ----> file.fastq ----> chosen_assembly_tool ----> assembly_output
Here are some more resources you may find useful:

#13
Member
Location: Missouri, USA Join Date: May 2011
Posts: 16
Thank you.
Do we have to perform the step of appending the shotgun files to the extracted paired-end FASTA files? Or can we directly assemble the files extracted by sff_extract? Because when I appended the shotgun files to the extracted paired-end FASTA files and performed the assembly, it showed a memory allocation problem! My Linux machine has 7GB of memory. Isn't that enough? How much memory does MIRA require to perform the assembly?
May I know how you performed your assembly? Did you append the shotgun files, or did you just assemble the extracted paired-end FASTA files?
Thanks

#14
Member
Location: Missouri, USA Join Date: May 2011
Posts: 16
Have you used MIRA to perform the assembly? If not, which assembler would you suggest? Since you have converted the SFFs to FASTQ, I assume you used Illumina de novo software?

#15
Member
Location: Adelaide Join Date: Jul 2011
Posts: 11
Quote:
MIRA is an Overlap/Layout/Consensus (OLC) type assembler. Such assemblers inherently require lots of memory for all but the smallest genomes. Try using the miramem command to estimate what the memory requirement is likely to be.

#16
Member
Location: Missouri, USA Join Date: May 2011
Posts: 16
Extracting the files from SFF:

    sff_extract -c -l linker.fasta "insert_size:2500,insert_stdev:500" file1.sff file2.sff -o 3kb_norton

This worked perfectly and gave me the fasta, qual and xml.

Appending shotgun files:

    sff_extract -a shotgun1.sff shotgun2.sff shotgun3.sff -o 3kb_norton

This also appended the SFFs, and the initial fasta, qual and xml files grew much larger to accommodate the shotgun seqs.

Assembly:

    mira --project=3kb_norton --job=denovo,genome,accurate,454 -SK:mnr=yes:nrr=10 >&3kb_log_assembly.txt

This also showed no error and started running. After a while, below the command, it said:

    tcmalloc: large alloc 1482399744 bytes == 0*867e000 @

and below this, after a while, it said:

    Aborted

#17
Member
Location: Missouri, USA Join Date: May 2011
Posts: 16
The log file at the end showed this:

    Dynamic allocs: 0
    Align allocs: 0
    Out of memory detected, exception message is: std::bad_alloc
    You are running a 32 bit executable. Please note that the maximum theoretical memory a 32 bit programm can use (be it in Linux, Windows or other) is 4 GiB, in practice less: between 2.7 and 3.3 GiB. This is valid even if your machine has hundreds of GiB. Should your machine have more that 4 GiB, use a 64 bit OS and a 64 bit version of MIRA.
    ----

So I downloaded the 64-bit version of MIRA, and after the command

    mira --project=3kb_norton --job=denovo,genome,accurate,454 -SK:mnr=yes:nrr=10 >&3kb_log_assembly.txt

it said:

    tcmalloc: large alloc 2323759104 bytes == 0*1f5b9000 @

so the number of bytes increased.
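To put the byte counts from the two runs next to the limits mentioned in the log, they can be converted to GiB. This is plain arithmetic on the numbers quoted above; reading each tcmalloc line as one single allocation request (rather than total memory use) is an assumption about the message, so total usage at that point was presumably higher.

```python
# Convert the allocation sizes from the log into GiB and compare them with
# the 32-bit ceiling quoted in MIRA's message.
GIB = 2 ** 30

ADDRESS_SPACE_32BIT = 2 ** 32  # theoretical limit for a 32-bit process, in bytes
alloc_32bit_run = 1482399744   # bytes requested in the 32-bit run
alloc_64bit_run = 2323759104   # bytes requested in the 64-bit run


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (1 GiB = 2**30 bytes)."""
    return n_bytes / GIB


print(f"32-bit limit:       {to_gib(ADDRESS_SPACE_32BIT):.2f} GiB")
print(f"32-bit run request: {to_gib(alloc_32bit_run):.2f} GiB")
print(f"64-bit run request: {to_gib(alloc_64bit_run):.2f} GiB")
```

A larger request in the 64-bit run is consistent with the assembly getting further before running out of memory, not with the 64-bit build being worse.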

#18
Member
Location: Adelaide Join Date: Jul 2011
Posts: 11
It looks like you don't have enough memory to do this assembly. As mentioned before, try the miramem command to estimate the memory requirement for this assembly. That way you'll know whether you're in the ballpark of what MIRA requires.
Last edited by nathanhaigh; 08-17-2011 at 04:48 PM. Reason: typo

#19
Member
Location: Missouri, USA Join Date: May 2011
Posts: 16
Yes, seems like it.
So I just wanted to ask you if appending the shotgun files is a must. Have you done the appending and performed the assembly? If not, which software do you suggest for a 454 de novo assembly? Since you converted to FASTQ, I assume you used an Illumina assembler?