
  • Fastq adaptors removal/stripping/cleaning

    Hi guys,
    I'm facing a very dumb problem.
    I cannot find a tool which simply strips adaptors from Illumina Fastq files. I have contamination from library synthesis adaptors (SMART).
    -Seqclean only works with fasta.
    -Lucy2: the libgtk1.2 libraries are no longer supported in my Linux distro (and I don't even know if it handles fastq)
    -fastx_clipper from FastX-toolkit: makes a big mess, because it doesn't only strip the adaptor but blows away the whole sequence (it's not supposed to behave like this): it results in the loss of more than 1/3 of the dataset.

    Other solutions are integrated in assembler or aligner, but I need a crude trimmed fastq as output.

    Does anybody know something which might be helpful to me?

    Thanks in advance!!

    Davide

  • #2
    fastx_clipper seemed to work fine for me. Are you sure you're running it with the parameters you want? (I did have some issues with clipping paired-end reads, where fastx_clipper would blow away one of the sequences if it was too short after clipping, leading to unmatched pairs, and had to modify fastx_clipper to leave sequences in even if they were completely clipped)



    • #3
      Personally I use Biopython with some simple adaptor matching.

      There ought to be a tool in EMBOSS to do this too...
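For anyone wondering what "simple adaptor matching" might look like, here is a minimal sketch. It is plain Python rather than Biopython so it runs without dependencies, it only does exact matching (no mismatches or indels), and it assumes plain 4-line FASTQ records. The names `find_adapter`, `trim_record`, `read_fastq` and the `min_overlap` default are made up for illustration.

```python
def find_adapter(seq, adapter, min_overlap=5):
    """Return the index where the adapter starts in seq, or len(seq) if absent.

    Looks for a full exact match first, then for a prefix of the adapter
    hanging off the 3' end of the read (at least min_overlap bases).
    """
    pos = seq.find(adapter)
    if pos != -1:
        return pos
    # Partial adapter at the 3' end: try the longest overlap first.
    longest = min(len(adapter), len(seq)) - 1
    for overlap in range(longest, min_overlap - 1, -1):
        if seq.endswith(adapter[:overlap]):
            return len(seq) - overlap
    return len(seq)


def trim_record(header, seq, qual, adapter, min_overlap=5):
    """Clip seq and qual at the adapter start; the header is left unchanged."""
    cut = find_adapter(seq, adapter, min_overlap)
    return header, seq[:cut], qual[:cut]


def read_fastq(lines):
    """Yield (header, seq, qual) from an iterable of 4-line FASTQ records."""
    it = iter(lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")
```

Because the quality string is cut at the same position as the sequence, the output stays valid FASTQ. Note that a read which is all adaptor comes back with an empty sequence, which matters for paired-end data.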



      • #4
        With some of the tools in fastx_toolkit, I've found redirection and flags act differently. Most give the option of using -i infile -o outfile or `cat infile | tool > outfile` - if you're using -i -o, try `cat in | tool > out` and see if that works better. I forget which one usually gives me issues with it.



        • #5
          Did you try Novoalign's adapter trimming? That ought to help.
          --
          bioinfosm



          • #6
            I'm using this command.
            I want to clarify I'm not a bioinformatician.

            fastx_clipper -a AAGCAGTGGTATCAACGCAGAGTACGCGGG -i 1M_1.txt -M 20 -n -o output.txt

            For Adrian_H: could you please give me your modified version? You just reminded me that the aligner I will use (Mosaik) doesn't accept missing paired-end reads.

            For raela: Sorry, I didn't get your point with cat and pipe... could you please write how I should type it? Thanks!

            However, I found that I have more than one problem: besides blowing away tons of reads for no reason (other than a regular adaptor match, I guess), it also leaves tons of other adaptors in the output sequences (e.g. sequences heavily trimmed, and the little remaining stretch... is an adaptor!!)

            If anybody can provide me with a working utility, they will be my idol.
            (I got on so well with SeqClean and cln2qual... why are there so many formats in this world??)



            • #7
              Novoalign seems to have an internal-only trimming pipeline.
              Looking at the manual, it doesn't seem that it will just return a trimmed fastq, only the final alignment... Am I wrong?

              I would like to avoid conversion to fasta+qual, especially because I'm dealing with several datasets (I would need to carry the clipping over to the qual file).

              Mosaik only accepts perfectly matched paired sequences (a single missing one and it will stop, so I'm also concerned about this issue). Has anybody ever dealt with this kind of issue: keeping zero-length sequences?
              Last edited by Gianza; 07-23-2010, 10:21 AM.
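One way to keep the two files of a paired-end run in step after clipping is to decide per pair rather than per read. The sketch below is only an illustration of that idea: `sync_pairs`, `min_len` and the one-base `N` placeholder are assumptions, not something Mosaik or fastx_clipper provides, and whether a given aligner tolerates a 1-base read would need checking. The default `low_qual="B"` assumes Phred+64 (Illumina 1.5-style) qualities; `"!"` would be the equivalent for Phred+33.

```python
def sync_pairs(pairs, min_len=20, placeholder=True, low_qual="B"):
    """Keep mates in step after trimming.

    pairs: iterable of ((h1, s1, q1), (h2, s2, q2)) tuples, already trimmed.
    If either mate is shorter than min_len, either swap it for a one-base
    'N' placeholder (placeholder=True) or drop BOTH mates (placeholder=False).
    """
    for r1, r2 in pairs:
        short1 = len(r1[1]) < min_len
        short2 = len(r2[1]) < min_len
        if not (short1 or short2):
            yield r1, r2
        elif placeholder:
            # Replace only the too-short mate; its header is preserved.
            yield (
                (r1[0], "N", low_qual) if short1 else r1,
                (r2[0], "N", low_qual) if short2 else r2,
            )
        # Otherwise the whole pair is skipped, so both outputs stay matched.
```

Either way, both output files end up with the same number of records in the same order, which is the property a strict paired-end aligner needs.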



              • #8


                There are some Perl scripts there, which might help.
                SpliceMap: De novo detection of splice junctions from RNA-seq



                • #9
                  We have built a pipeline, and the first step (the read cleaning) takes care of that. You can take a look at:



                  • #10
                    Originally posted by Adrian_H
                    fastx_clipper seemed to work fine for me. Are you sure you're running it with the parameters you want? (I did have some issues with clipping paired-end reads, where fastx_clipper would blow away one of the sequences if it was too short after clipping, leading to unmatched pairs, and had to modify fastx_clipper to leave sequences in even if they were completely clipped)
                    Yes, I have written to the fastx guy about orphaned pairs; he said the next version might have some solution. At any rate, I think the fastx clipping is too aggressive.
                    --
                    Jeremy Leipzig
                    Bioinformatics Programmer
                    --
                    My blog
                    Twitter



                    • #11
                      Did you try a tool called TagCleaner?

                      Background: Sequencing metagenomes that were pre-amplified with primer-based methods requires the removal of the additional tag sequences from the datasets. The sequenced reads can contain deletions or insertions due to sequencing limitations, and the primer sequence may contain ambiguous bases. Furthermore, the tag sequence may be unavailable or incorrectly reported. Because of the potential for downstream inaccuracies introduced by unwanted sequence contaminations, it is important to use reliable tools for pre-processing sequence data.

                      Results: TagCleaner is a web application developed to automatically identify and remove known or unknown tag sequences allowing insertions and deletions in the dataset. TagCleaner is designed to filter the trimmed reads for duplicates, short reads, and reads with high rates of ambiguous sequences. An additional screening for and splitting of fragment-to-fragment concatenations that gave rise to artificial concatenated sequences can increase the quality of the dataset. Users may modify the different filter parameters according to their own preferences.

                      Conclusions: TagCleaner is a publicly available web application that is able to automatically detect and efficiently remove tag sequences from metagenomic datasets. It is easily configurable and provides a user-friendly interface. The interactive web interface facilitates export functionality for subsequent data processing, and is available at http://edwards.sdsu.edu/tagcleaner .



                      It's a web-based tool, but I heard you can contact them if your files are large and they will process them offline for you.



                      • #12
                        Not sure if it'll work in your case, but try running it as:
                        cat 1M_1.txt | fastx_clipper -a AAGCAGTGGTATCAACGCAGAGTACGCGGG -M 20 -n > output.txt



                        • #13
                          You can use Genome Analysis Toolkit (GATK) to do this. http://www.broadinstitute.org/gsa/wi.../Read_Clipping

                          You can configure it to mask your adapter sequences with Ns, so you don't end up with an empty sequence, which can cause trouble with aligners when aligning in paired-end mode.
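To illustrate the masking idea (this is not GATK's implementation, just a rough sketch of the concept), the function below overwrites everything from the adapter start to the end of the read with Ns instead of cutting it off, so the read keeps its original length. `mask_adapter` and `low_qual` are hypothetical names, only exact adapter matches are handled, and lowering the quality of the masked bases to `"B"` is an assumption that fits Phred+64 data.

```python
def mask_adapter(seq, qual, adapter, low_qual="B"):
    """Replace the adapter and everything 3' of it with Ns.

    The read length is preserved, so a read that is entirely adapter
    becomes a run of Ns rather than an empty sequence.
    """
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual  # no adapter found: record is returned untouched
    masked = len(seq) - pos
    return seq[:pos] + "N" * masked, qual[:pos] + low_qual * masked
```

For example, `mask_adapter("ACGTAGATCGGTT", "I" * 13, "AGATCGG")` keeps the first four bases and turns the remaining nine into Ns.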



                          • #14
                            Oops, never mind, I see it.
                            Can someone familiar with FASTX explain to me which 14 nt are aligning here? It seems way too aggressive.
                            Code:
                            >cat myseq.fq
                            @HWI-EASXXX/1
                            AACGCGATGCCTCCATTGCTGGTGCAACTGAGCCTGGATATCGGCAGTGCGATCCTCATGGACTTGGATCTGGGTT
                            +HWI-EASXXX/1
                            `_bb_b_bbYbb^bbbaaXbbb`b_a[S``[[MWO`\``]b_bbJ\^Z\J`Y^a[`^[b_bF^b_BBBBBBBBBBB
                            
                            >cat myseq.fq | fastx_clipper -a AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG -M 14
                            @HWI-EASXXX/1
                            AACGCGATGCCTCCATTGCTGGTGCAACTGAGCCTGG
                            +HWI-EASXXX/1
                            `_bb_b_bbYbb^bbbaaXbbb`b_a[S``[[MWO`\
                            Last edited by Zigster; 07-30-2010, 01:36 PM.



                            • #15
                              If you dig into the fastx_clipper source code, you can see what it's doing (I agree with you that I'm not at all sure that it's the right thing to do though!).

                              if ( alignment_size > 5
                                   && alignment_results.target_start == 0
                                   && (alignment_results.matches * 100 / alignment_size ) >= 75 ) {
                                  //printf("--2\n");
                                  return alignment_results.query_start ;
                              }

                              I think that this is what is aligning:

                              ATATCGGCAGTGCGAT
                              : ::::: :: ::: :
                              AGATCGGAAGAGCGGT


                              and then it is cutting off everything that follows
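The arithmetic behind that alignment can be checked with a few lines of Python. This only mirrors the condition quoted above for an ungapped, equal-length alignment; it is not the clipper's actual alignment code, `clipper_rule` is a made-up name, and it assumes the C expression uses integer division.

```python
def clipper_rule(read_segment, adapter_prefix, target_start=0):
    """Evaluate the quoted condition for two equal-length aligned strings.

    Returns (would_clip, matches, percent_identity).
    """
    if len(read_segment) != len(adapter_prefix):
        raise ValueError("this sketch only handles ungapped alignments")
    alignment_size = len(read_segment)
    matches = sum(a == b for a, b in zip(read_segment, adapter_prefix))
    percent = matches * 100 // alignment_size  # integer division, as in C
    would_clip = alignment_size > 5 and target_start == 0 and percent >= 75
    return would_clip, matches, percent


print(clipper_rule("ATATCGGCAGTGCGAT", "AGATCGGAAGAGCGGT"))
# (True, 12, 75)
```

So 12 of the 16 aligned positions match, which is exactly 75% and just enough to satisfy the condition, even though four bases of the "adapter" are wrong.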

