  • pmiguel
    Senior Member
    • Aug 2008
    • 2328

    How to extract paired-end reads from .sff 454?

    I have a Titanium paired-end .sff file and want to convert it to fasta and qual files. But I want the paired-end linker removed and each read containing it split into "left" and "right" side reads. (Best if the distal part of the paired end would also be reverse complemented.)

    Just want to try some other assembly engines. Small bacterial genome using 3kb paired end Titanium protocol.

    Best way to do this? I can write a script to parse the trim info file, but that is work. Would prefer something like an sffinfo option or a program someone else has already written.

    --
    Phillip
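For anyone who does end up scripting this, here is a minimal sketch of the splitting step in Python. It is only an illustration under stated assumptions: the linker string is a made-up placeholder (not a real 454 linker), the linker is found by exact string match (real tools tolerate sequencing errors in the linker), and the "right" half is treated as the distal part to be reverse complemented. The function names are hypothetical.

```python
# Sketch: split one read at a paired-end linker into "left" and "right" mates.
# ASSUMPTION: PLACEHOLDER_LINKER is a made-up sequence for illustration only;
# substitute the linker that was actually used for your library.
PLACEHOLDER_LINKER = "ACGTTGCAACGT"

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_on_linker(name, seq, qual, linker=PLACEHOLDER_LINKER, min_len=1):
    """Split a read on the first exact occurrence of the linker.

    seq  : read bases as a string
    qual : list of integer quality scores, same length as seq
    Returns a list of (name, seq, qual) tuples. A read without the linker
    is returned unchanged as a single entry.
    """
    pos = seq.find(linker)
    if pos == -1:
        return [(name, seq, qual)]

    end = pos + len(linker)
    left_seq, left_qual = seq[:pos], qual[:pos]
    # ASSUMPTION: the part after the linker is the distal mate, so it is
    # reverse complemented and its qualities are reversed to match.
    right_seq = reverse_complement(seq[end:])
    right_qual = qual[end:][::-1]

    mates = []
    if len(left_seq) >= min_len:
        mates.append((name + ".left", left_seq, left_qual))
    if len(right_seq) >= min_len:
        mates.append((name + ".right", right_seq, right_qual))
    return mates


if __name__ == "__main__":
    read = "AAACC" + PLACEHOLDER_LINKER + "GGTTT"
    for mate in split_on_linker("read1", read, list(range(len(read)))):
        print(mate)
```

Reading the bases and qualities out of the .sff (for example via sffinfo output) is a separate step that this sketch does not cover.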
  • kmcarr
    Senior Member
    • May 2008
    • 1181

    #2
    According to its documentation, sff_extract (http://bioinf.comav.upv.es/sff_extract/index.html) can do this. I have used sff_extract, but not on paired-end data, so I can't offer any first-hand information.


    • forevermark4
      Junior Member
      • Jan 2009
      • 6

      #3
      Hi everyone

      I have just started working with next-generation sequencing data and would appreciate help with a simulation and reassembly problem. How do I generate reads from a sequence? I have a fasta file. I think the MAQ tool could be used for the simulation, but I have not been able to work it out.

      The goal is to set up a simulation of reassembling a sequence from NGS data. It will build up from reassembling a simple 1 Mb sequence with no repeats in the haploid state to including genetic variation and polyploidy.
      - Simulate an NGS run from a 1 Mb segment of human sequence with few or no repeats. Average fragment size 500 bp with a normal distribution. Paired end with 75 bp reads. Assume perfect sequencing. Check out other simulation methods.
      - Align the reads back to the 1 Mb sequence. How much variation in coverage is there?
      - Reassemble the reads WITHOUT using the reference sequence.

      Thanks
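The simulation step described in this post can be sketched in a few lines of Python, without any external tool. This is a rough illustration, not what MAQ does: fragment lengths are drawn from a normal distribution with mean 500 bp, reads are 75 bp from each end with no sequencing errors, and the second read is reverse complemented. The standard deviation of 50 bp and all function names are assumptions, since the post does not give them. Because the true fragment positions are known, the sketch also computes per-base read depth directly instead of running an aligner.

```python
import random

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_pairs(reference, n_pairs, read_len=75,
                   frag_mean=500, frag_sd=50, seed=0):
    """Simulate error-free paired-end reads from a reference string.

    frag_sd=50 is an ASSUMED value; the post only specifies the mean.
    Returns a list of (start, frag_len, read1, read2) tuples, where
    read2 is the reverse complement of the far end of the fragment.
    """
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < n_pairs:
        frag_len = int(round(rng.gauss(frag_mean, frag_sd)))
        # Discard fragments that cannot hold a read or exceed the reference.
        if frag_len < read_len or frag_len > len(reference):
            continue
        start = rng.randint(0, len(reference) - frag_len)
        fragment = reference[start:start + frag_len]
        read1 = fragment[:read_len]
        read2 = reverse_complement(fragment[-read_len:])
        pairs.append((start, frag_len, read1, read2))
    return pairs


def read_depth(pairs, ref_len, read_len=75):
    """Per-base read depth computed from the known (true) read positions."""
    depth = [0] * ref_len
    for start, frag_len, _, _ in pairs:
        for i in range(start, start + read_len):
            depth[i] += 1
        for i in range(start + frag_len - read_len, start + frag_len):
            depth[i] += 1
    return depth


if __name__ == "__main__":
    rng = random.Random(1)
    # Stand-in random reference; replace with the sequence from your fasta file.
    ref = "".join(rng.choice("ACGT") for _ in range(10000))
    pairs = simulate_pairs(ref, n_pairs=500)
    depth = read_depth(pairs, len(ref))
    print("mean depth:", sum(depth) / len(depth))
    print("min/max depth:", min(depth), max(depth))
```

The spread between minimum and maximum depth gives a first impression of coverage variation under purely random fragment placement. Reassembly without the reference still needs a real assembler.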


      • themerlin
        Member
        • Feb 2010
        • 51

        #4
        I have had good luck with sff_extract. All you need is the linker sequence, insert length and insert length standard deviation. Then you run:

        sff_extract -l linker.fasta yoursff.sff -i "insert_size:XXXX, insert_stdev:XXX" -o prefix

        -Jason
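If you need to run this over several .sff files, the command above can be assembled from Python. This is a small sketch that only builds the argument list; the flags (-l, -i, -o) and the insert-size string are copied from the command in this post, and the numeric values in the example (3000 for a 3 kb library, 300 for the standard deviation) are placeholders, not recommendations. The helper name is hypothetical.

```python
import shlex


def build_sff_extract_command(sff_path, linker_fasta,
                              insert_size, insert_stdev, prefix):
    """Build the sff_extract argument list in the form quoted in the post.

    Only flags that appear in the post (-l, -i, -o) are used.
    """
    insert_info = f"insert_size:{insert_size}, insert_stdev:{insert_stdev}"
    return ["sff_extract", "-l", linker_fasta, sff_path,
            "-i", insert_info, "-o", prefix]


if __name__ == "__main__":
    # Placeholder values; use the insert size and deviation of your library.
    cmd = build_sff_extract_command("yoursff.sff", "linker.fasta",
                                    3000, 300, "prefix")
    # Print a shell-quoted version for inspection. To actually run it,
    # pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell-quoting problems with the space inside the -i value.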


        • maven
          Member
          • Oct 2009
          • 11

          #5
          This can be done with 454 software too, although there are bound to be differences in the result based on the specifics of the linker-recognition algorithms.

          runAssembly -tr -noa -no myfile.sff

          The output is not the friendliest, in that it generates an assembly directory and a few extra files that are unneeded for this use case, but it gets the job done. I've done this with version 2.3; I don't know about earlier versions.


          • pmiguel
            Senior Member
            • Aug 2008
            • 2328

            #6
            That looks like just what I want. Alas:

            runAssembly -tr -noa -no GB71BC401.sff

            gives me:

            Error: Invalid option: -noa.
            Usage: runAssembly [-o projdir] [-nrm] [-p (sfffile | [regionlist:]analysisDir)]... (sfffile | [regionlist:]analysisDir)...

            I am running v. 2.3

            --
            Phillip


            • westerman
              Rick Westerman
              • Jun 2008
              • 1104

              #7
              The closest option to '-noa' is '-noace' which skips the output of ACE files, etc.


              • maven
                Member
                • Oct 2009
                • 11

                #8
                -noa is supposed to tell it to not actually bother doing the assembly itself. The -no option turns off most output generation, since the goal here is to just generate the split fasta (and qual) file. Both options are .... optional ... in the sense that once it gets past the first stage of the assembly you can manually kill it if you don't want to sit around waiting for an assembly to complete. The fasta file should still be there, as it's generated prior to actually starting the assembly.


                • pmiguel
                  Senior Member
                  • Aug 2008
                  • 2328

                  #9
                  Alright! Leaving out the -noa worked. It did create a new assembly directory and do the assembly, but that didn't take long.

                  Thanks!
                  --
                  Phillip
