  • How to delete all FASTQ reads that include a potential 50bp Illumina Single End

    Hi,

    I ran FastQC and it found a potential 50 bp Illumina Single End PCR Primer 1 sequence in my reads:

    AGTTGATCCGGTCCTAGGCAGTGTAGATCTCGGTGGTCGCCGTATCATTA (100% over 30bp)

    I checked my reads and found that this 50 bp sequence sits at the 5' end of the affected reads, which account for 0.25% of all reads. (In some of these reads, a short sequence such as GCGCA, GCTCAG, AACCG or AACAAAAGG comes before the 50 bp sequence.)

    All my reads are 88 bp long, so I do not want to keep these reads even after cutting the 50 bp sequence off.

    Does anyone know of a tool that can get rid of the reads that contain this 50 bp sequence? Or does anyone have a script or another way to do this?
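The 0.25% figure and the 5' position described above can be checked with a short script. The following is a minimal sketch, not a tested pipeline: it assumes an uncompressed FASTQ file with exactly four lines per record (no wrapped sequences) and counts only exact, full-length matches. The function name `adapter_offsets` is made up for illustration.

```python
from collections import Counter
from itertools import islice

# The 50 bp primer sequence reported by FastQC in the post above.
ADAPTER = "AGTTGATCCGGTCCTAGGCAGTGTAGATCTCGGTGGTCGCCGTATCATTA"


def adapter_offsets(fastq_path, adapter=ADAPTER):
    """Return (total_reads, Counter mapping match offset -> read count).

    Assumes a plain-text FASTQ file with exactly 4 lines per record.
    Only exact, full-length occurrences of `adapter` are counted.
    """
    offsets = Counter()
    total = 0
    with open(fastq_path) as handle:
        # The sequence is the 2nd line of every 4-line FASTQ record.
        for line in islice(handle, 1, None, 4):
            total += 1
            pos = line.strip().upper().find(adapter)
            if pos != -1:
                offsets[pos] += 1
    return total, offsets
```

Dividing `sum(offsets.values())` by `total` gives the fraction of affected reads. An offset of 0 means the read starts with the primer, while offsets such as 5, 6 or 9 would correspond to the short prefixes mentioned above.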

  • #2
    My aim in the question above is to get rid of the reads that contain the AGTTGATCCGGTCCTAGGCAGTGTAGATCTCGGTGGTCGCCGTATCATTA sequence. The reads containing this 50 bp sequence account for only 0.25% of the total, so the Fastq toolkit trimmer and other tools cannot help.

    • #3
      Map your reads against this sequence with something like bowtie2?
      savetherhino.org

      • #4
        No. I need to remove the reads that contain this noisy 50 bp sequence from my library before I map them with BWA.

        • #5
          If the reads account for only 0.25% of the total, why are you worried about them?

          • #6
            Use a read filtering/trimming application which includes adapter detection and removal. My choice is Trimmomatic.

            • #7
              Originally posted by rzeng
              My aim in the question above is to get rid of the reads that contain the AGTTGATCCGGTCCTAGGCAGTGTAGATCTCGGTGGTCGCCGTATCATTA sequence. The reads containing this 50 bp sequence account for only 0.25% of the total, so the Fastq toolkit trimmer and other tools cannot help.
              For example, I would use cutadapt with the "--discard-trimmed" option. Anyway, as GenoMax suggested, you could simply ignore these reads, which wouldn't align anyway if the adapter makes up a big chunk of the read.
              Dario
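For readers who prefer a script over a dedicated tool, the idea of discarding whole reads rather than trimming them can be sketched in plain Python. This is only an illustration under stated assumptions: it drops a read when the sequence contains the full 50 bp primer as an exact substring, so unlike cutadapt it tolerates no mismatches and ignores partial matches. It also assumes four lines per FASTQ record. The names `open_text`, `read_fastq` and `filter_reads` are hypothetical helpers, not part of any library.

```python
import gzip
from itertools import islice

# The 50 bp primer sequence from the original question.
ADAPTER = "AGTTGATCCGGTCCTAGGCAGTGTAGATCTCGGTGGTCGCCGTATCATTA"


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) per 4-line record."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def filter_reads(in_path, out_path, adapter=ADAPTER):
    """Write reads that lack `adapter`; return (kept, discarded) counts."""
    kept = discarded = 0
    with open_text(in_path) as src, open_text(out_path, "wt") as dst:
        for header, seq, sep, qual in read_fastq(src):
            if adapter in seq.upper():
                discarded += 1  # exact match found: drop the whole read
            else:
                dst.write(f"{header}\n{seq}\n{sep}\n{qual}\n")
                kept += 1
    return kept, discarded
```

A call such as `filter_reads("reads.fastq", "reads.filtered.fastq")` (file names are placeholders) writes the surviving reads and returns the two counts, which can be compared against the expected 0.25%. Reads carrying a primer copy with sequencing errors would pass through this exact-match filter, which is one reason an error-tolerant tool may be the better choice.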

              • #8
                If you don't want to deal with Trimmomatic, you can also have a look at the Galaxy platform. Use the tool called 'Manipulate Fastq', which lets you select all the reads containing your sequence and do whatever you want with them, including deleting them.
                Galaxy is a community-driven web-based analysis platform for life science research.

                • #9
                  Thank you all, guys.
