  • sff_extract: combining data from 454 Flx and Titanium data sets

    Hi,
    I am attempting to feed my 454 sequencing data sets into MIRA for assembly.
    Here is my data, all in .sff files:
    FLX unpaired reads (6 separate .sff files)
    Titanium unpaired reads (3 separate .sff files)
    Titanium paired-end reads (2 separate .sff files).

    I can extract to fasta and xml with the individual sets (unpaired-only and paired-only), but the MIRA assembler requires all data to be combined into a single data file.

    Does anyone know if this can be done with sff_extract? If so, can you explain to me how? The sff_extract webpage doesn't explain how to do this.

  • #2
    Have you tried extracting them together? Like this:

    sff_extract 1.sff 2.sff


    • #3
      Thanks for your reply.

      Since one set of my data is paired-end, I believe I need to make sff_extract split those reads by passing the -l option so that it looks for the linker. MIRA also asks for the paired-end library size and standard deviation, but I wouldn't want that applied to the unpaired reads.

      I tried the cat command, but mira quit inexplicably when loading these files.


      • #4
        Originally posted by agroster:
        I tried the cat command, but mira quit inexplicably when loading these files.
        You can't just cat SFF files together - they have a complex header structure. Merging isn't really needed, but if you want to, the Roche tools (or third-party software) can combine SFF files, provided they all have the same number of flow cycles. You have both FLX and Titanium runs, so that won't work in your case.

        In any case, don't you want to give MIRA two separate sets of 454 data
        (the pairs and the unpaired reads)?
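
        If you want to check whether a set of SFF files could be merged at all, you can read the flow count from each file's header. Below is a minimal sketch in Python, assuming the standard SFF common header layout (31 fixed bytes, big-endian, with number_of_flows_per_read as the eighth field). The function names are made up for illustration; this is not part of sff_extract or the Roche tools.

```python
import struct

# Fixed part of the SFF common header (big-endian, 31 bytes):
# magic, version, index_offset, index_length, number_of_reads,
# header_length, key_length, number_of_flows_per_read, flowgram_format_code
SFF_HEADER = struct.Struct(">4s4sQIIHHHB")


def flows_per_read(header_bytes):
    """Return number_of_flows_per_read from the first bytes of an SFF file."""
    if len(header_bytes) < SFF_HEADER.size:
        raise ValueError("too short to be an SFF header")
    fields = SFF_HEADER.unpack(header_bytes[:SFF_HEADER.size])
    magic, n_flows = fields[0], fields[7]
    if magic != b".sff":
        raise ValueError("not an SFF file (bad magic number)")
    return n_flows


def flows_in_file(path):
    """Hypothetical helper: read the flow count straight from a file on disk."""
    with open(path, "rb") as handle:
        return flows_per_read(handle.read(SFF_HEADER.size))
```

        Files that report different flow counts, as FLX and Titanium runs would, are the ones that can't be merged into a single SFF.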


        • #5
          Originally posted by maubp:
          You can't just cat SFF files together - they have a complex header structure. Merging isn't really needed, but if you want to, the Roche tools (or third-party software) can combine SFF files, provided they all have the same number of flow cycles. You have both FLX and Titanium runs, so that won't work in your case.

          In any case, don't you want to give MIRA two separate sets of 454 data
          (the pairs and the unpaired reads)?
          I concatenated the separate .fasta, .qual, and .xml files that were generated post sff_extract for each type of data, not the .sff files before sff_extract.

          I DO want to give MIRA two separate sets of 454 data, but I don't know how to do this (since MIRA looks for only one file name). Does anyone know how to do this?


          • #6
            Originally posted by agroster:
            I concatenated the separate .fasta, .qual, and .xml files that were generated post sff_extract for each type of data, not the .sff files before sff_extract.
            Concatenating FASTA and QUAL files is fine, but I don't think you should concatenate XML files together. MIRA may cope, but I would expect any XML validator to reject such a file.
            Originally posted by agroster:
            I DO want to give MIRA two separate sets of 454 data, but I don't know how to do this (since MIRA looks for only one file name). Does anyone know how to do this?
            Have you read the "Walkthrough: combined unpaired and paired-end assembly of Brucella ceti" example in the MIRA manual?
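
            On the XML point above: two complete XML documents pasted end to end are no longer one well-formed document, because the result has two root elements (and a second XML declaration in the middle). Here is a rough Python sketch, assuming the sff_extract XML uses a TRACEINFO-style layout with a trace_volume root holding trace entries. The sample documents and function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two tiny stand-ins for per-run XML files (assumed layout, made-up read names).
DOC_A = """<?xml version="1.0"?>
<trace_volume>
  <trace><trace_name>read1</trace_name></trace>
</trace_volume>
"""
DOC_B = """<?xml version="1.0"?>
<trace_volume>
  <trace><trace_name>read2</trace_name></trace>
</trace_volume>
"""


def is_well_formed(text):
    """Return True if the text parses as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def merge_trace_volumes(documents):
    """Collect the trace entries of several documents under one new root."""
    merged = ET.Element("trace_volume")
    for text in documents:
        root = ET.fromstring(text)
        merged.extend(root.findall("trace"))
    return merged
```

            With these inputs, is_well_formed(DOC_A + DOC_B) is False, while the merged tree serialises to a single document. Letting sff_extract write one XML file in the first place avoids the need for any merging.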


            • #7
              OK, I now see the issue - I need to append my subsequent sff_extract runs using the "-a" option, i.e. do three successive extractions, each appending to the previous one. I'll see if this works.


              • #8
                Originally posted by agroster:
                OK, I now see the issue - I need to append my subsequent sff_extract runs using the "-a" option, i.e. do three successive extractions, each appending to the previous one. I'll see if this works.
                Ah, I'm too late. Actually, two extractions are enough. First all the unpaired, then all paired with -a.
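
                The two runs could look roughly like the sketch below, which only builds the command lines in Python and prints them rather than running anything. The -l and -a options are the ones mentioned in this thread; -o (output basename) and -i (extra XML info such as insert size) are written from memory and should be checked against sff_extract --help. The file names, basename and insert-size numbers are placeholders.

```python
import shlex


def build_extract_commands(unpaired_sffs, paired_sffs, linker_fasta,
                           basename, insert_size, insert_stdev):
    """Build two sff_extract command lines: unpaired first, then paired with -a."""
    # Run 1: all unpaired files (FLX and Titanium together), creating the outputs.
    first = ["sff_extract", "-o", basename] + list(unpaired_sffs)

    # Run 2: all paired-end files, appended (-a) to the same outputs.
    # -l points at the linker sequence; -i carries the library info
    # (assumed option syntax), so it applies only to the paired reads.
    xml_info = "insert_size:%d,insert_stdev:%d" % (insert_size, insert_stdev)
    second = (["sff_extract", "-a", "-l", linker_fasta, "-i", xml_info,
               "-o", basename] + list(paired_sffs))
    return [first, second]


def as_shell(command):
    """Render one command list as a copy-pastable shell line."""
    return " ".join(shlex.quote(arg) for arg in command)


if __name__ == "__main__":
    commands = build_extract_commands(
        ["flx_1.sff", "ti_1.sff"], ["ti_pe_1.sff"],
        "linker.fasta", "myproject", 3000, 300)  # placeholder values
    for command in commands:
        print(as_shell(command))
```

                Because the library size and standard deviation are passed only in the second run, they end up attached to the paired reads and not to the unpaired ones.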

                Regards,
                Bastien
