
  • How to sort with sfffile/sffinfo...

    Hey all!
    I've got an unusual request. I want to be able to access just the reads that do not match any barcodes. I can easily sort reads into .sffs based on barcodes, but I want to look at the reads that don't match any barcodes. Is there a way to do that?
    Thank you!!

  • #2
    You could split your .sff file by MID, then use sffinfo with -a to output just the accession numbers from each of the resulting .sff files and from the original .sff file. Then subtract the lists from the split files from the list from the original file. There are probably several ways to do that, but a simple one would be to put all of the lists into one column in Excel and keep only the accession numbers that appear exactly once, e.g. with a COUNTIF helper column (the "remove duplicates" function alone would keep one copy of every entry). Finally, use that list with the -i option of sfffile to get a .sff file with just those reads.

    Now that I think about it, you could just combine all of the accession number lists from the split files and use that list with the -e option of sfffile. That's a bit simpler.
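The list subtraction described in #2 can also be done without Excel. Below is a minimal Python sketch, assuming each accession list is plain text with one accession number per line (as produced by sffinfo -a). The function names and the example accession names are hypothetical and only serve as illustration.

```python
def read_accessions(lines):
    """Collect the non-empty, whitespace-stripped accession names from an iterable of lines."""
    return {line.strip() for line in lines if line.strip()}


def unmatched_accessions(all_lines, per_mid_lines):
    """Return accessions from the full list that occur in none of the per-MID lists.

    The order of the full list is preserved and repeated names are reported once.
    """
    matched = set()
    for lines in per_mid_lines:
        matched |= read_accessions(lines)

    seen = set()
    result = []
    for line in all_lines:
        acc = line.strip()
        if acc and acc not in matched and acc not in seen:
            seen.add(acc)
            result.append(acc)
    return result


def combined_matched(per_mid_lines):
    """Return the sorted union of all per-MID lists (the list to pass to sfffile -e)."""
    matched = set()
    for lines in per_mid_lines:
        matched |= read_accessions(lines)
    return sorted(matched)


if __name__ == "__main__":
    # Hypothetical accession names, standing in for the contents of the list files.
    everything = ["READ_A\n", "READ_B\n", "READ_C\n", "READ_D\n"]
    mid1 = ["READ_A\n"]
    mid2 = ["READ_C\n"]
    print(unmatched_accessions(everything, [mid1, mid2]))
    print(combined_matched([mid1, mid2]))
```

The output of unmatched_accessions corresponds to the list used with -i, and the output of combined_matched to the list used with -e. In practice the inputs would be open file handles rather than in-memory lists.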


    • #3
      Thank you! I'll give that a try.


      • #4
        Hi Anthony, I did this a year or so ago, and the Fastx toolkit can do what you want. The barcode splitter in that toolkit puts the unmatched reads into a separate file. I have in my notes that the key was figuring out that the 454 .fna file isn’t really fasta format. You may have to use "fasta formatter" first. No guarantee, but I hope this helps.
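One possible reading of the remark in #4 is that the .fna file wraps each sequence over several lines, while the barcode splitter expects one sequence line per record. That is an assumption, not something the post states. Under that assumption, the reformatting step amounts to the following Python sketch (the function name and the example records are hypothetical):

```python
def unwrap_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines into one string.

    Assumes standard '>' header lines; this is an illustration of the idea,
    not a replacement for the FASTX toolkit.
    """
    header = None
    chunks = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif line.strip():
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    # Hypothetical wrapped records.
    wrapped = [">read1 length=8\n", "ACGT\n", "TTGG\n", ">read2\n", "AAAA\n"]
    for name, seq in unwrap_fasta(wrapped):
        print(">" + name)
        print(seq)
```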


        • #5
          Have you realized that if *all* reads in the dataset were MID-tagged, a read that seemingly lacks a MID just means that you or the software failed to identify the MID? ;-) At the very least you should throw away the first 4+10 or 4+11 nt of your reads (assuming you are talking about the left end of the reads, right after the sequencing key).

          Did you use the RapidLibrary protocol or one of those with GSMIDs/TiMIDs? If Rapid, then also mask the rcRLMIDs somewhere on the right side of each read. Or was that a multiplexing setup with different TiMIDs/GSMIDs on each side? Then treat it the same way as if RapidLib had been used.

          Finally, if you sequenced beads with some MID-tagged samples and some without MID tags, the above still applies: throw away the nucleotides where a MID, possibly obscured by sequencing errors, might reside. And next time separate the samples into different regions. ;-)

          In brief, just don't do this next time, that is my best advice, really.
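The left-end trimming suggested in #5 (4 nt of sequencing key plus a 10 or 11 nt MID) can be sketched in Python as follows. The function name, the parameter defaults, and the example sequence are hypothetical, and the sketch assumes the read string still starts with the key.

```python
def trim_left_mid(read, key_len=4, mid_len=10):
    """Drop the sequencing key and the (possibly unrecognized) MID from the left end of a read."""
    return read[key_len + mid_len:]


if __name__ == "__main__":
    # Made-up read: 4 nt key, 10 nt placeholder MID, then the insert.
    read = "TCAG" + "ACGTACGTAC" + "GGGTTTCCC"
    print(trim_left_mid(read))              # 4+10 nt removed
    print(trim_left_mid(read, mid_len=11))  # 4+11 nt removed
```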


          • #6
            I actually ran across what is probably the simplest way to do this a couple of days ago. This option is not in the documentation for some reason, but you can get it if you call up the sfffile help at the command line. If you do that, you will see the -umid option, which allows you to specify a name under which to report all reads that don't match any of the specified MIDs. So, forget all that list stuff, just add in -umid <name> to the options list when you use sfffile and it's done.


            • #7
              Number6, I'll look into that. That could be helpful.

              Martin2, I think you misunderstood the question. I want to be able to look at reads that don't sort to any barcode. It doesn't matter how they were prepared. And, no, I did not mix MID-tagged with non-tagged samples. I can't imagine how that would be useful...is that what you were referring to in that last line?

              aj, you're using a newer software version than I, I'm afraid...that option doesn't seem to be available in 2.5.3. Someday we'll get upgraded.


              • #8
                Ah, yes. I do have 2.8. I don't know when this option showed up, but there have been several versions between 2.5.3 and 2.8, so it's hard to say.

                You might want to talk to your FAS about getting a newer version. In my experience, it doesn't take any more than an email and they'll send you a link to download it. The latest versions, 2.7 (Jr.) and 2.8 (FLX), have some new options and features in the analysis programs that might be useful to you.
