  • software to cut adaptor

    Hi all,

    I am new here and not familiar with analysing data from the SOLiD platform.

    Yesterday I was told to cut the reads (50 bp) down to 30 bp or shorter before mapping; otherwise I won't get any hits, since almost half of each read comes from the adaptor.

    Could you suggest which software can do this, and what adaptor information I should prepare beforehand?

    Thanks a lot.

    BTW, I am focusing on miRNA sequencing. Yesterday I used PerM to map the reads against the reference genome, but only 0.03% of them mapped.

  • #2
    You need to know how the library was constructed to determine where to trim. Ideally whoever generated the library would know this information.



    • #3
      I am the person who constructed the library, and I already have the adaptor sequence. What I don't know is which software can do the trimming. Or should I write a Perl script to do it?



      • #4
        If you want to write a script yourself, our HTSeq framework might be useful. It has functions to partially match an adapter sequence to a read and trim the read accordingly.

        Here is roughly how this would look:

        Code:
        import HTSeq

        file_in = "yeast_RNASeq_excerpt_sequence.txt"   # input FASTQ file
        file_out = "trimmed.fastq"                      # trimmed reads go here

        adapter = HTSeq.Sequence( "ACCGTA" )
        # Reverse complement of the adapter (computed here but not used below)
        adapter_rc = adapter.get_reverse_complement()

        # Trim the adapter from the right end of each read and write the result
        with open( file_out, "w" ) as fout:
            for read in HTSeq.FastqReader( file_in ):
                read = read.trim_right_end( adapter )
                read.write_to_fastq_file( fout )

        This is written for Sanger FASTQ, but I guess it should work with CSFASTQ as well.

        Simon
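For readers who want to see what trimming by a partial adapter match involves, here is a minimal plain-Python sketch of the general idea. It is not HTSeq's implementation: the function name `trim_right_adapter` and the `min_overlap` parameter are made up for illustration, and it only handles exact matches (no mismatches, no quality strings).

```python
def trim_right_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter from `read`.

    The adapter may occur in full inside the read, or only its first few
    bases may be present because the read ended before the adapter did.
    Exact matching only; `min_overlap` is a hypothetical parameter that
    guards against trimming on very short chance matches.
    """
    # Case 1: the complete adapter occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]

    # Case 2: only a prefix of the adapter fits before the read ends.
    # Try the longest possible overlap first.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]

    # No adapter found: return the read unchanged.
    return read


full = trim_right_adapter("TTTTGGGGACCGTA", "ACCGTA")     # whole adapter present
partial = trim_right_adapter("TTTTGGGGACCG", "ACCGTA")    # adapter cut off by read end
untouched = trim_right_adapter("TTTTGGGGAC", "ACCGTA")    # overlap shorter than min_overlap
```

A real tool would normally also tolerate a few mismatches and trim the quality string in step with the sequence.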



        • #5
          Hi Simon,

          Your suggestion definitely helps a lot. I was wondering whether I should convert the csfasta file to fastq before trimming.

          Thanks a lot.

          Tiffany



          • #6
            If you have the adapter sequence with you then try using the program fastx_clipper which is a part of FASTX-toolkit.


            Hope this helps.
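As a rough illustration of the fastx_clipper suggestion, the sketch below only builds the command line from Python; it does not run anything. The file names, the adapter sequence and the minimum length of 15 are placeholders, and the helper name `fastx_clipper_command` is made up. The options used are `-a` (adapter), `-l` (discard reads shorter than this after clipping), `-i` (input) and `-o` (output).

```python
def fastx_clipper_command(adapter, infile, outfile, min_length=15):
    """Build an argument list for fastx_clipper (hypothetical helper)."""
    return [
        "fastx_clipper",
        "-a", adapter,            # adapter sequence to clip off
        "-l", str(min_length),    # drop reads shorter than this after clipping
        "-i", infile,             # input FASTA/FASTQ file
        "-o", outfile,            # output file
    ]


# Placeholder values, for illustration only.
cmd = fastx_clipper_command("ACCGTA", "reads.fastq", "clipped.fastq")
command_line = " ".join(cmd)
```

With the FASTX-Toolkit installed, the list could be handed to `subprocess.run(cmd, check=True)`.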



            • #7
              Tiffany, my group are actively developing a new aligner, NovoalignCS, for colorspace and this would be a great feature to have. In fact we already have it in our NT space aligner.

              We currently support iterative read trimming in cases where the adaptor is not known.

              Are your adaptors in nucleotide space? I would be interested in obtaining some test data if that's possible and we could provide you with a beta version of the working program.
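On the question of nucleotide space versus colour space: SOLiD reads are sequences of colours, each colour encoding a pair of adjacent bases, so a nucleotide adaptor has to be translated before it can be compared with a colour-space read. A minimal sketch of that translation follows; the function name `to_colorspace` and the example adaptor are placeholders.

```python
# Colour assigned to each pair of adjacent bases (SOLiD two-base encoding).
COLOR = {
    "AA": "0", "CC": "0", "GG": "0", "TT": "0",
    "AC": "1", "CA": "1", "GT": "1", "TG": "1",
    "AG": "2", "GA": "2", "CT": "2", "TC": "2",
    "AT": "3", "TA": "3", "CG": "3", "GC": "3",
}


def to_colorspace(seq):
    """Translate a nucleotide sequence into its string of colours.

    A sequence of n bases yields n - 1 colours, one per adjacent pair.
    """
    seq = seq.upper()
    return "".join(COLOR[seq[i:i + 2]] for i in range(len(seq) - 1))


adapter_colors = to_colorspace("ACCGTA")  # placeholder adaptor sequence
```

Note that the colour at the junction between the insert and the adaptor depends on the last base of the insert, so only the colours inside the adaptor are known in advance.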




              • #8
                If you don't want to use another aligner you could just make a new file like this:
                awk '{print substr($0,0,21)}' filename.csfasta > filename.csfasta.20

                else try bowtie, you can either set it to use a short seed or trim ends.
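For comparison, here is a small Python sketch of the same fixed-length trimming that leaves header and comment lines untouched. It assumes the usual csfasta layout: `>` header lines, optional `#` comment lines, and read lines that start with one primer base followed by colour digits. The function name and the default of 20 colours are placeholders.

```python
def trim_csfasta_line(line, n_colors=20):
    """Trim one csfasta line to the primer base plus `n_colors` colours.

    Header lines (">") and comment lines ("#") are returned unchanged.
    """
    line = line.rstrip("\n")
    if line.startswith(">") or line.startswith("#"):
        return line
    # First character is the primer base, the rest are colours.
    return line[:1 + n_colors]


# Tiny in-memory example; a real script would loop over the lines of a file.
records = [">853_13_45_F3", "T" + "0123" * 10]
trimmed = [trim_csfasta_line(rec) for rec in records]
```

Like the awk one-liner, this cuts every read at the same position regardless of where the adaptor actually starts.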



                • #9
                  Originally posted by Tina View Post
                  If you have the adapter sequence with you then try using the program fastx_clipper which is a part of FASTX-toolkit.


                  Hope this helps.
                  Tina,

                  Thanks a lot. I am using this toolkit these days.

                  Tiffany



                  • #10
                    Originally posted by zee View Post
                    Tiffany, my group are actively developing a new aligner, NovoalignCS, for colorspace and this would be a great feature to have. In fact we already have it in our NT space aligner.

                    We currently support iterative read trimming in cases where the adaptor is not known.

                    Are your adaptors in nucleotide space? I would be interested in obtaining some test data if that's possible and we could provide you with a beta version of the working program.
                    zee,

                    Thanks a lot, but I will try the toolkit first. I will contact you later if it doesn't work.

                    Tiffany



                    • #11
                      Originally posted by Chipper View Post
                      If you don't want to use another aligner you could just make a new file like this:
                      awk '{print substr($0,0,21)}' filename.csfasta > filename.csfasta.20

                      else try bowtie, you can either set it to use a short seed or trim ends.
                      Chipper,

                      I think trimming every read to a fixed length like that is too blind. Are there any other good ways to trim according to the known adaptor sequence?

                      Nevertheless, thanks a lot for your suggestion.

                      Tiffany



                      • #12
                        mirTools web site provides a perl script for adaptor trimming.



                        • #13
                          Originally posted by patternist View Post
                          mirTools web site provides a perl script for adaptor trimming.

                          http://centre.bioinformatics.zj.cn/m...daptortrim.php
                          Thanks, patternist!

                          This information is very useful for me.

                          Tiffany



                          • #14
                            I see requests for adapter-removal software quite often, and since fastx_clipper does not seem to be able to deal with color space data, I have now made the tool we use in our group available for download. Please have a look at https://code.google.com/p/cutadapt/ and write to me if you have any questions.



                            • #15
                              Originally posted by mmartin View Post
                              I see requests for an adapter-removal software quite often and since fastx_clipper does not seem to be able to deal with color space data, I have now made the tool we use in our group available for download. Please have a look https://code.google.com/p/cutadapt/ and write me if you have any questions.
                              Thanks very much, but I have already solved the problem with a script from a senior colleague.

                              Thanks again.

                              Tiffany
