
  • #16
    Thank you! It worked beautifully.



    • #17
      Hello Again,
      I thought I would ask a follow-up question, since your tool worked so nicely for my other issues. Do you have a script that will pull out reads that match a certain sequence, allowing a certain number of mismatches?

      I have a sequence of about 18 bp that a subset of my reads contain somewhere within the read, and I would like to pull those reads out, allowing for 1 or 2 mismatches.

      THANKS,
      JEN



      • #18
        Hi Jen,

        You can use BBDuk for that:

        bbduk.sh in=reads.fq out=unmatched.fq outm=matched.fq literal=ACGTACGTACGTACGTAC k=18 mm=f hdist=2

        Make sure "k" is set to the exact length of the sequence. "hdist" controls the number of substitutions allowed. "outm" gets the reads that match. By default this also looks for the reverse-complement; you can disable that with "rcomp=f".
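For readers who want to see what "match with up to N substitutions" means, here is a minimal Python sketch of that matching rule. It is an illustration only, not how BBDuk is implemented; the function names (`read_matches`, `split_reads`) are made up for this example. Like `hdist`, it counts substitutions only (no insertions or deletions), and it checks the reverse complement unless `rcomp` is turned off.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (unknown bases become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def hamming_within(a, b, max_dist):
    """True if equal-length strings a and b differ at no more than max_dist positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_dist:
                return False
    return True


def read_matches(read, query, max_dist=2, rcomp=True):
    """True if any window of the read is within max_dist substitutions of the query.

    With rcomp=True the reverse complement of the query is also tried.
    """
    read = read.upper()
    queries = [query.upper()]
    if rcomp:
        queries.append(reverse_complement(query))
    k = len(query)
    for q in queries:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            if hamming_within(read[i:i + k], q, max_dist):
                return True
    return False


def split_reads(reads, query, max_dist=2, rcomp=True):
    """Split read sequences into (matched, unmatched) lists."""
    matched, unmatched = [], []
    for read in reads:
        target = matched if read_matches(read, query, max_dist, rcomp) else unmatched
        target.append(read)
    return matched, unmatched
```

The two returned lists play the same role as the `outm` and `out` files in the command above. A brute-force scan like this is fine for small files; for large datasets a dedicated tool will be much faster.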



        • #19
          Wonderful! I will try it. These tools are amazing!
          Jen



          • #20
            Finding RNA-editing

            I have strand-specific RNA-seq data from different samples, and I'm interested in finding possible RNA editing between samples. I used CLC Workbench to call variants and did the comparison, and the output contains SNP and MNP variants. How can I filter these variants to distinguish editing sites from SNPs?

            Thanks



            • #21
              That doesn't really have anything to do with this thread, so I suggest you post it in a new thread.



              • #22
                Hi again Brian,
                Apparently you have all of the tools I need for the problems I am dealing with in my data. Now I am wondering if you have a tool that will extract a random subset of reads from a fasta file. For example, I have a fasta file of about 50,000 reads. I want to align them to a database that has a limit of only 3000 reads at a time, so I would like to randomly pull out a subset of 3000 reads from my file. I prefer not to pull out the first 3000 using head or the last 3000 using tail. I could write a quick script for this but thought I would ask you first.

                Thanks again for all of your help.
                Jen



                • #23
                  Hi Jen,

                  You seem to be asking all the right questions!

                  reformat.sh in=reads.fasta out=sampled.fasta sample=3000

                  There are various other sampling options too, like a specific number of bases or a specific fraction of the total number of reads, but that's the one you want in this case.
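For anyone who would rather write the "quick script" Jen mentions, here is a minimal Python sketch that draws a fixed number of records uniformly at random from a fasta file in a single pass, using reservoir sampling. This is a swapped-in illustration, not a description of how reformat.sh samples internally; `parse_fasta` and `reservoir_sample` are hypothetical helper names.

```python
import random


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of fasta lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(records, n, seed=None):
    """Return up to n records chosen uniformly at random, reading the input once."""
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(records):
        if i < n:
            # Fill the reservoir with the first n records.
            reservoir.append(record)
        else:
            # Keep the new record with probability n / (i + 1).
            j = rng.randint(0, i)  # both ends inclusive
            if j < n:
                reservoir[j] = record
    return reservoir
```

A possible usage, with the file names from the command above: open `reads.fasta`, pass the file handle to `parse_fasta`, call `reservoir_sample(records, 3000)`, and write each pair back out as `>header` followed by the sequence. Note that the sampled records come out in reservoir order rather than their original file order.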

                  -Brian



                  • #24
                    Yet another awesome script!
                    Thank you so much. If we ever get to publication, we will certainly cite your tool!
                    jen



                    • #25
                      Another question, Brian. Have you ever seen a case where the reformat.sh script did not work properly? I have split the read IDs from a large fasta file into 4 subsets, each in its own names file, and I am trying to separate the reads into 4 fasta files based on those names files. Three of them work fine and give me the expected subset of reads, but one does not work at all. I cannot figure out what is going on with it; I used the same steps to generate all 4. As a sanity check, I took a couple of read IDs from that names file and grepped the big file, and it pulled the reads out just fine. Any idea what is happening here? My command line:

                      filterbyname.sh in=combined_seqs.fa out=subset4.fa names=subset_names.txt include=t overwrite=true

                      Jen



                      • #26
                        Hi Jen,

                        Are you talking about reformat not working or filterbyname not working? Reformat is very well tested, used hundreds of times a day, and I have only heard of one bug in it in the last 5 months, which has been fixed. filterbyname is not used nearly as much, though I still have not encountered a situation in which it failed recently.

                        What is the format of your names file? Specifically, can you give an example of an entry in the fasta file and a line from the names.txt file that you expect to match but don't, as well as the console output of the program? Or, if they're small and non-confidential, you can email them both to me and I'll investigate. I suspect it's a formatting issue.
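When a names file unexpectedly matches nothing, a quick check along these lines can help narrow down a formatting problem before sending files around. This is a minimal Python sketch with a hypothetical `diagnose_names` helper; it makes no claim about how filterbyname.sh compares names. It simply reports whether each name equals a full fasta header, equals only the first whitespace-delimited token of a header, or is absent, after stripping common debris such as a leading ">", trailing spaces, and Windows line endings.

```python
def normalize(name):
    """Strip surrounding whitespace (including \\r\\n) and a leading '>'."""
    return name.strip().lstrip(">").strip()


def diagnose_names(headers, names):
    """Classify each name as 'exact', 'id_only', or 'missing'.

    exact   : name equals a full header line
    id_only : name equals only the first whitespace-delimited token of a header
    missing : name matches neither
    """
    full = {normalize(h) for h in headers}
    full.discard("")
    first_tokens = {h.split()[0] for h in full}
    report = {"exact": [], "id_only": [], "missing": []}
    for raw in names:
        name = normalize(raw)
        if not name:
            continue  # skip blank lines in the names file
        if name in full:
            report["exact"].append(name)
        elif name.split()[0] in first_tokens:
            report["id_only"].append(name)
        else:
            report["missing"].append(name)
    return report
```

To use it, collect the header lines of the fasta file (those starting with ">") and the lines of the names file, then inspect the three lists. If most names land in "id_only" or only match after normalization, the names file and the fasta headers are probably formatted differently.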



                        • #27
                          Hi Brian,
                          I was talking about the filterbyname.sh script. However, I just did a sanity check on a subset and it worked. Then I reran it on the full data set and it worked. Maybe I was just too tired on Friday or something and I was missing something somewhere.

                          At any rate, it worked great and now I am moving along with my project.

                          Thank you again for your help!
                          Jen

