
  • How to extract sequences between adaptors?

    Hi everybody,

    I have a FASTA file with a lot of miRNA sequences. The problem is that we did a concatemerization experiment before the run on the 454 machine, so now I have this situation:

    >1
    AdaptorA__miRNA_AdaptorB
    >2
    XXXXXX-AdaptorA_ miRNA_AdaptorB_XXXXXX_AdaptorA_miRNA_AdaptorB

    etc. (the XXX... are random nucleotides)

    How can I extract the sequences between the two adaptors, considering that some reads may contain up to 4 miRNAs (with 4 AdaptorA and 4 AdaptorB)?

    Can anyone suggest a script so that I do not lose any miRNA? Thank you very much.

    Giorgio

  • #2
    I think I have some advice and a question. I have a similar situation; in my case it is not straightforward, and I am not sure about yours.
    If the adapter is consistently placed, for example if XXXX is always in the same place in every read, then a script would make sense, given a FASTA file of the known adapters (see http://www.bioperl.org/wiki/Removing...ncing_adapters). I use a tool provided with commercial software from SoftGenetics to remove adapters; it can detect them automatically or take them from a file you provide. However, if you are describing concatenation that is random, then the position of XXXX depends entirely on your alignment, and there is no way to remove the adapters without doing an alignment.
    Also, I thought concatenation applies to the ends of the reads, i.e.

    >1
    XXXXX_AdapterA
    >or 2
    Adapter2_XXXXX

    This has been my case with miRNA-seq, at least with Illumina. You may want to consult the sequencing platform documentation.
    Last edited by husamia; 07-08-2011, 09:05 AM.


    • #3
      Thank you for your reply. Yes, the problem is that the adaptors and the miRNAs do not always occur in a regular way. Every read is different from the others: a read may have only one miRNA between two adaptors, or two, or three, up to a maximum of four miRNAs. And the miRNAs are not always perfectly enclosed by their adaptors, so I can't use the most common adaptor-removal scripts to obtain only the miRNAs. If alignment is suitable for this situation, can you please explain how to do it, with a little pipeline? Thanks a lot.

      Cheers,

      Giorgio


      • #4
        I have used a tool called cutadapt for trimming adapter sequences. It works really well when you have partial adapter sequences in your reads.
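
To illustrate what "partial adapter sequence" means: a read can end partway through the adapter, so only a prefix of the adapter is present at its 3' end. Below is a minimal Python sketch of that idea using exact matching only. It is not cutadapt's algorithm (cutadapt uses error-tolerant alignment), and the function name `trim_3prime_adapter`, the `min_overlap` parameter, and the adapter string are all hypothetical.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove a full or partial 3' adapter from a read (exact matches only)."""
    # A complete adapter anywhere in the read: cut at its first occurrence.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Otherwise look for the longest adapter prefix that ends the read.
    longest = min(len(adapter) - 1, len(read))
    for length in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:length]):
            return read[:len(read) - length]
    return read  # no adapter found


adapter = "TCGTATGCCGTCTTC"  # made-up 15 nt adapter, for illustration only
insert = "ACGTACGTTTGACC"    # made-up insert sequence
print(trim_3prime_adapter(insert + adapter, adapter))      # full adapter present
print(trim_3prime_adapter(insert + adapter[:6], adapter))  # only 6 nt of adapter
print(trim_3prime_adapter(insert, adapter))                # no adapter at all
```

All three calls should print the bare insert. The `min_overlap` guard is a design choice: very short matches (1 or 2 nt) happen by chance, so they are left untrimmed here.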


        • #5
          Originally posted by Giorgio C
          If alignment is suitable for this situation, can you please explain how to do it, with a little pipeline?
          It seems that your read lengths are variable, i.e. you may be using 454 instead of Illumina. In my case the read length is the same for all reads, which makes it easier to generalize: writing a script requires some assumptions, and a fixed read length is a convenient one.
          In my case every read is 40 bp long and my adapter is 15 bp, so my unknown miRNAs are 25 bp. I doubt that I will be capturing more than one miRNA or adapter per read, so 1 adapter = 1 read for all reads. I simply trimmed my reads based on the known adapter sequence, so I ended up with reads ranging from 25 to 40 bp. I also did quality trimming based on base calls, without letting the read length drop below 25 bp. Then I aligned both the trimmed and the untrimmed reads to the whole genome: ~40% of the untrimmed reads and ~90% of the trimmed reads aligned. That's my general pipeline. I am sure there are better ways than this, but I have a limited set of tools. If you can write a script, I hope this helps you make the basic assumptions needed to try filtering reads based on certain criteria and see what you get.
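
The quality-trimming step with a length floor could look roughly like the Python sketch below. It is only an illustration of the idea described above, not the tool that was actually used: the function name `quality_trim` and the quality threshold of 20 are assumptions, while the 25 bp minimum comes from the post.

```python
def quality_trim(seq, quals, threshold=20, min_len=25):
    """Trim low-quality bases from the 3' end, never going below min_len."""
    end = len(seq)
    # Step back from the 3' end while the last base is below the threshold
    # and the read is still longer than the minimum length.
    while end > min_len and quals[end - 1] < threshold:
        end -= 1
    return seq[:end], quals[:end]


# A made-up 40 nt read whose last 20 bases have low quality scores.
seq = "ACGT" * 10
quals = [35] * 20 + [10] * 20
trimmed_seq, trimmed_quals = quality_trim(seq, quals)
print(len(trimmed_seq))  # 25: trimming stops at the minimum length
```

Here 20 bases are low quality, but only 15 are removed because the read is never cut below 25 bp.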


          • #6
            Thanks for your reply. In fact, in my case these are 454 reads, and as you said, this pipeline is not applicable. I will try other ways.

            Cheers,
            Giorgio


            • #7
              Perl script?

              A quick Perl script that just clips out the adaptor sequences may work:

              Code:
              perl -pe 's/(AdaptorA|AdaptorB)//g' <fileName>
              For example:
              Code:
              $ perl -pe 's/(AdaptorA|AdaptorB)//g' file.txt 
              >1
              __miRNA_
              >2
              XXXXXX-_ miRNA__XXXXXX__miRNA_
              Last edited by gringer; 07-12-2011, 07:58 AM. Reason: wrong tag


              • #8
                Oh, wait, you wanted the sequence between the adaptors only. That's a bit more tricky, but might still come under the 'perl can do it easy' umbrella.

                Code:
                $ perl -pe 's/.*?((AdaptorA|AdaptorB)(.*?)(AdaptorA|AdaptorB))/$3/g' file.txt 
                >1
                __miRNA_
                >2
                _ miRNA__miRNA_
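
Note that the one-liner above joins the inserts from one read into a single string. If each miRNA should become its own record, something like the following Python sketch might help. It is an assumption-laden illustration, not a tested pipeline: `AdaptorA` and `AdaptorB` are the placeholders from this thread and must be replaced with the real adaptor sequences, the function name `extract_inserts` and the `_1`, `_2` header suffixes are made up, and matching is exact, so adaptors containing sequencing errors would be missed.

```python
import re

# Placeholder adaptor strings, as in the thread; replace with real sequences.
ADAPTOR_A = "AdaptorA"
ADAPTOR_B = "AdaptorB"

# Non-greedy match of whatever lies between an AdaptorA and the next AdaptorB.
INSERT_RE = re.compile(re.escape(ADAPTOR_A) + r"(.*?)" + re.escape(ADAPTOR_B))


def extract_inserts(fasta_lines):
    """Yield (header, insert) pairs, one per AdaptorA...AdaptorB unit."""
    header, chunks = None, []
    for line in list(fasta_lines) + [">"]:  # sentinel flushes the last record
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                # Join wrapped sequence lines before searching for adaptors.
                inserts = INSERT_RE.findall("".join(chunks))
                for i, insert in enumerate(inserts, start=1):
                    yield f"{header}_{i}", insert
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.strip())


example = [
    ">1",
    "AdaptorA__miRNA_AdaptorB",
    ">2",
    "XXXXXX-AdaptorA_ miRNA_AdaptorB_XXXXXX_AdaptorA_miRNA_AdaptorB",
]
for header, insert in extract_inserts(example):
    print(header)
    print(insert)
```

On the example from the first post this gives three records (`>1_1`, `>2_1`, `>2_2`), and the random XXXXXX stretches are dropped. Sequence lines are joined per record before matching, so an adaptor split across wrapped FASTA lines is still found, which a line-by-line one-liner would not do.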


                • #9
                  Thank you very much, it's a very useful script!
