  • rzeng
    Member
    • Aug 2013
    • 19

    How to delete all the fastq reads which include a potential 50bp Illumina Single End

    Hi,

    I ran FastQC and found a potential 50bp Illumina Single End PCR Primer 1 sequence in my reads, as follows:

    AGTTGATCCGGTCCTAGGCAGTGTAGATCTCGGTGGTCGCCGTATCATTA (100% over 30bp)

    I checked my reads and found that this 50bp sequence is located at the 5' end of the affected reads, which account for 0.25% of all reads. (In some of these reads there is also a GCGCA/GCTCAG/AACCG/AACAAAAGG sequence before the 50bp sequence.)

    Since my reads are all 88bp long, I do not want to keep these reads even if I cut the 50bp sequence off.

    Does anyone know of a tool that can get rid of the reads that contain this 50bp sequence? Or does anyone have scripts or other ways to do this?
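
The check described above (how many reads carry the 50bp sequence, and where in the read it starts) can be reproduced with a short script. This is only a minimal sketch under stated assumptions: the input is an uncompressed FASTQ file with strict 4-line records, matching is exact (no mismatches), and the helper names `read_fastq` and `primer_offsets` are hypothetical, not part of any existing tool.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# The 50bp sequence reported by FastQC in the post above.
PRIMER = "AGTTGATCCGGTCCTAGGCAGTGTAGATCTCGGTGGTCGCCGTATCATTA"

Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, plus, quality) tuples.

    Assumes strict 4-line FASTQ records; an incomplete trailing
    record is silently ignored.
    """
    stripped = (line.rstrip("\n") for line in lines)
    # zip pulls four consecutive lines from the same iterator per record.
    yield from zip(stripped, stripped, stripped, stripped)


def primer_offsets(records: Iterable[Record], primer: str = PRIMER):
    """Return (total_reads, Counter mapping start offset -> read count).

    Only the first exact occurrence of the primer in each read is counted.
    """
    total = 0
    offsets: Counter = Counter()
    for _header, seq, _plus, _qual in records:
        total += 1
        pos = seq.find(primer)
        if pos != -1:
            offsets[pos] += 1
    return total, offsets
```

With a file handle, `total, offsets = primer_offsets(read_fastq(open("reads.fastq")))` (the file name is a placeholder) gives the affected fraction as `sum(offsets.values()) / total`. Offsets greater than 0 correspond to reads with extra bases before the primer, such as the short GCGCA-like prefixes mentioned above.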
  • rzeng
    Member
    • Aug 2013
    • 19

    #2
    My aim in the question above is to get rid of the reads that contain the AGTTGATCCGGTCCTAGGCAGTGTAGATCTCGGTGGTCGCCGTATCATTA sequence. Since the reads containing this 50bp sequence account for only 0.25% of the total, the Fastq toolkit trimmer or other tools cannot help.

    • rhinoceros
      Senior Member
      • Apr 2013
      • 372

      #3
      Map your reads against this sequence with something like bowtie2?
      savetherhino.org

      • rzeng
        Member
        • Aug 2013
        • 19

        #4
        No. I need to remove the reads that contain this noisy 50bp sequence from my library before I map them with BWA.

        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #5
          If the reads only account for 0.25% of the total, why are you worried about them?

          • kmcarr
            Senior Member
            • May 2008
            • 1181

            #6
            Use a read filtering/trimming application which includes adapter detection and removal. My choice is Trimmomatic.

            • dariober
              Senior Member
              • May 2010
              • 311

              #7
              Originally posted by rzeng:
              My aim in the question above is to get rid of the reads that contain the AGTTGATCCGGTCCTAGGCAGTGTAGATCTCGGTGGTCGCCGTATCATTA sequence. Since the reads containing this 50bp sequence account for only 0.25% of the total, the Fastq toolkit trimmer or other tools cannot help.
              For example, I would use cutadapt with the "--discard-trimmed" option. Anyway, as GenoMax suggested, you might simply ignore these reads, which wouldn't align anyway if the adapter makes up a big chunk of the read.
              Dario
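
To illustrate what discarding adapter-containing reads amounts to, here is a minimal Python sketch. It is not cutadapt's algorithm: cutadapt tolerates mismatches and partial adapter matches, whereas this version only drops reads with an exact occurrence of the given pattern. The function names (`read_fastq`, `drop_reads_containing`, `filter_fastq_file`) are hypothetical, and the input is assumed to be an uncompressed FASTQ file with strict 4-line records.

```python
from typing import Iterable, Iterator, Tuple

# The 50bp sequence quoted in the thread.
ADAPTER = "AGTTGATCCGGTCCTAGGCAGTGTAGATCTCGGTGGTCGCCGTATCATTA"

Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, plus, quality) from 4-line FASTQ records."""
    stripped = (line.rstrip("\n") for line in lines)
    yield from zip(stripped, stripped, stripped, stripped)


def drop_reads_containing(records: Iterable[Record],
                          pattern: str = ADAPTER) -> Iterator[Record]:
    """Yield only the records whose sequence lacks an exact copy of pattern."""
    for record in records:
        if pattern not in record[1]:
            yield record


def filter_fastq_file(in_path: str, out_path: str,
                      pattern: str = ADAPTER) -> int:
    """Write reads without the pattern to out_path; return how many were kept."""
    kept = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        for record in drop_reads_containing(read_fastq(fin), pattern):
            fout.write("\n".join(record) + "\n")
            kept += 1
    return kept
```

A call such as `filter_fastq_file("reads.fastq", "reads.clean.fastq")` (placeholder file names) writes the surviving reads to a new file. Passing a shorter pattern, for example `ADAPTER[:30]`, would also catch reads that carry only the first part of the sequence, at the cost of a slightly higher chance of removing genuine reads.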

              • FroggyFlox
                Junior Member
                • Feb 2012
                • 4

                #8
                If you don't want to deal with Trimmomatic, you can also have a look at the Galaxy platform. Use the tool called 'Manipulate Fastq', which lets you select all the reads containing your sequence and do whatever you want with them, including deleting them.
                Galaxy is a community-driven web-based analysis platform for life science research.

                • rzeng
                  Member
                  • Aug 2013
                  • 19

                  #9
                  Thank you all, guys.
