
  • trimming in tophat

    Hi all,

    I am trying to analyse my PE Illumina data using tophat.

    First I ran FastQC. Checking the raw data, I discovered that the beginnings (and presumably the ends) of my reads contain some contamination from the sequencing adapters.
    I then ran Bowtie on both the full-length and the trimmed sequences and got better results with the trimmed ones.
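Trimming can be done on the FASTQ files before TopHat ever sees them. Below is a minimal sketch of exact-match 3' adapter trimming for paired FASTQ input. It assumes the contaminant is a known adapter that appears without sequencing errors. The `ADAPTER` value is an assumption, so substitute the sequence FastQC lists as overrepresented or the one from your kit. Dedicated trimmers also tolerate mismatches and trim low-quality bases, which this sketch does not. Mates are kept or dropped together so the two files stay in matching order, which paired-end aligners generally expect.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)

# Assumption: the adapter to remove. Replace with the sequence FastQC reports
# as overrepresented, or the one documented for your library-prep kit.
ADAPTER = "AGATCGGAAGAGC"


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        seq, _plus, qual = next(it), next(it), next(it)
        yield header, seq, qual


def trim_3prime_adapter(seq: str, qual: str, adapter: str = ADAPTER,
                        min_overlap: int = 3) -> Tuple[str, str]:
    """Cut the read at the first exact adapter occurrence, or remove a
    partial adapter (a prefix of the adapter) hanging off the 3' end."""
    hit = seq.find(adapter)
    if hit != -1:
        return seq[:hit], qual[:hit]
    # Try the longest partial overlap first, down to min_overlap bases (at least 1).
    for k in range(min(len(adapter) - 1, len(seq)), max(min_overlap, 1) - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k], qual[:-k]
    return seq, qual


def trim_pairs(r1_lines: Iterable[str], r2_lines: Iterable[str],
               min_length: int = 20) -> Iterator[Tuple[Record, Record]]:
    """Trim both mates; drop the whole pair if either mate becomes shorter
    than min_length, so the two outputs stay in the same order."""
    for (h1, s1, q1), (h2, s2, q2) in zip(read_fastq(r1_lines), read_fastq(r2_lines)):
        s1, q1 = trim_3prime_adapter(s1, q1)
        s2, q2 = trim_3prime_adapter(s2, q2)
        if len(s1) >= min_length and len(s2) >= min_length:
            yield (h1, s1, q1), (h2, s2, q2)
```

Each yielded record can be written back as four lines (header, sequence, `+`, quality) into two new FASTQ files, which then go to TopHat unchanged. No SAM round-trip is needed.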

    Do I need to trim the data before running tophat?

    Does anyone know how to do this? Do I need to convert my trimmed SAM files (Bowtie output) back into FASTQ files?
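If the trimmed reads exist only as SAM records, they can be converted back to FASTQ. Below is a minimal sketch. It assumes plain-text SAM with one primary record per read, skips header lines and secondary/supplementary alignments, and undoes the reverse-complementing that SAM applies to reads aligned to the minus strand (FLAG 0x10). The `/1` and `/2` name suffixes are an assumed naming convention. Reads that the aligner did not write to the SAM file cannot be recovered this way, and for paired data the mates would still have to be split into two files in matching order. Trimming the FASTQ files directly is usually the simpler route.

```python
from typing import Iterable, Iterator, Tuple

# Complement table for reverse-complementing reads stored on the minus strand.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fastq(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Recover (header, sequence, quality) in the original read orientation."""
    for line in sam_lines:
        if line.startswith("@"):          # SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & 0x900:                  # secondary (0x100) or supplementary (0x800)
            continue
        if flag & 0x10:                   # minus strand: SAM stores the reverse complement
            seq = seq.translate(COMPLEMENT)[::-1]
            qual = qual[::-1]
        if flag & 0x40:                   # first mate of a pair
            name += "/1"
        elif flag & 0x80:                 # second mate of a pair
            name += "/2"
        yield "@" + name, seq, qual
```

Records whose QUAL field is `*` (no qualities stored) would produce invalid FASTQ and should be filtered out before writing.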

    Thanks for any help
    Assa

  • #2
    Hi, I found this useful page about this issue.



    HTH

    Dave



    • #3
      Thanks for the tip.
      It is a good page summarizing the different plots FastQC produces, something a lot of people were asking about in another thread.

      By the way, can anyone suggest a good way to remove duplicate reads from the analysis?
      Running FastQC, I get a lot of duplicated reads (see attachment).

      As I am looking for differentially regulated genes, I am not sure whether I should exclude the duplicated reads or not, but I would like to try it and see what I get.

      Q: Can anyone tell me how to filter out duplicated reads, either from the SAM files or from the FASTQ files before the Bowtie run?
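For the FASTQ route, one simple option is to collapse read pairs whose two sequences are identical. Below is a minimal sketch. It assumes both files have already been parsed into (header, sequence, quality) tuples in matching order. It keeps every unique sequence pair in memory and treats reads that differ by a single sequencing error as distinct. Its duplicate fraction is not necessarily the same figure FastQC reports.

```python
from typing import Iterable, List, Set, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def dedup_exact_pairs(
    pairs: Iterable[Tuple[Record, Record]],
) -> Tuple[List[Tuple[Record, Record]], float]:
    """Keep the first pair for each (mate-1 sequence, mate-2 sequence)
    combination; return the kept pairs and the fraction discarded."""
    seen: Set[Tuple[str, str]] = set()
    kept: List[Tuple[Record, Record]] = []
    total = 0
    for mate1, mate2 in pairs:
        total += 1
        key = (mate1[1], mate2[1])   # compare sequences only; names and qualities differ
        if key in seen:
            continue
        seen.add(key)
        kept.append((mate1, mate2))
    dup_fraction = 1.0 - len(kept) / total if total else 0.0
    return kept, dup_fraction
```

Keying on both mates rather than on one follows from the data being paired-end. Two fragments that share only one end are not identical copies.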

      Q: When looking for differential expression, is it right to also exclude the duplicates, or do I need to keep them?
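On the second question, a back-of-the-envelope model shows why duplicates are expected in RNA-seq even without PCR artefacts. As a simplification, assume single-end reads start uniformly at random among P possible positions in a transcript (its length times two strands). After N reads, the expected number of occupied start positions is P(1 - (1 - 1/P)^N). A start-position-based duplicate filter would discard all remaining reads. The 2 kb transcript length below is hypothetical.

```python
def expected_distinct_starts(n_reads: int, n_positions: int) -> float:
    """Expected number of distinct start positions hit by n_reads reads
    drawn uniformly at random from n_positions positions."""
    return n_positions * (1.0 - (1.0 - 1.0 / n_positions) ** n_reads)


def expected_dup_fraction(n_reads: int, n_positions: int) -> float:
    """Fraction of reads a start-position duplicate filter would remove
    under the uniform model (reads that are not first at their start)."""
    if n_reads == 0:
        return 0.0
    return 1.0 - expected_distinct_starts(n_reads, n_positions) / n_reads


if __name__ == "__main__":
    positions = 2 * 2000  # hypothetical 2 kb transcript, both strands
    for n in (100, 10_000, 100_000):
        print(f"{n:>7} reads: {expected_dup_fraction(n, positions):.1%} look like duplicates")
```

Under this model, roughly 1%, 63% and 96% of reads would be flagged for 100, 10,000 and 100,000 reads respectively. Removing them would therefore compress exactly the highly expressed genes a differential expression analysis needs to count.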

      Thanks

      Assa


      • #4
        Did you get an answer?
        If so, would you share it here?
        Thank you.

        (In reply to frymor's post #3 above.)



        • #5
          No, I didn't get any responses to the questions I posted.

          I am not sure, though, how important the duplication rate is at this step. I'm using TopHat2 with the option to exclude all duplicated reads, so I am not worried about the duplication in the original FASTQ file.

          I hope I am thinking in the right direction.



          • #6
            Sangenix

            SangeniX: A comprehensive, automated, scalable and user friendly NGS data analysis suite

            SangeniX has a module for duplicate removal.

            Give it a try: http://www.sangenix.com/



            • #7
              Let me know when it is freeware.



              • #8
                Sangenix

                A beta version is available. You can contact us via the contact page at http://www.sangenix.com/contactus.aspx



                • #9
                  Removing the duplicates could be done with the samtools rmdup command (alternatively, you could use MarkDuplicates from Picard). This is generally not needed for RNA-seq, since a certain amount of duplication is both expected and desired for highly expressed genes (i.e., many or most of these duplicates probably aren't PCR duplicates).
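The idea behind position-based duplicate removal can be sketched for single-end alignments. Reads mapped to the same reference, strand and 5' position are treated as duplicates, and here only the one with the highest mapping quality is kept. This is a simplified illustration in the spirit of samtools rmdup, not its actual implementation. It ignores paired-end logic and soft clipping, and it drops unmapped and secondary records. Spliced CIGARs (the `N` operation in TopHat output) count toward the reference span, so minus-strand 5' ends land in the right place.

```python
import re
from typing import Dict, Iterable, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by an alignment, including introns (N)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def dedup_single_end(sam_lines: Iterable[str]) -> List[str]:
    """Return header lines plus one record per (reference, strand, 5' end)."""
    headers: List[str] = []
    best: Dict[Tuple[str, str, int], Tuple[int, str]] = {}
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            headers.append(line)
            continue
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x4 or flag & 0x900:   # unmapped, secondary or supplementary
            continue
        rname, pos, mapq, cigar = fields[2], int(fields[3]), int(fields[4]), fields[5]
        if flag & 0x10:                  # minus strand: the 5' end is the rightmost base
            key = (rname, "-", pos + reference_span(cigar) - 1)
        else:
            key = (rname, "+", pos)
        # Keep the highest-MAPQ read per key; ties keep the first one seen.
        if key not in best or mapq > best[key][0]:
            best[key] = (mapq, line)
    return headers + [line for _, line in best.values()]
```

For a highly expressed gene, many distinct fragments share a 5' end by chance. A filter like this cannot tell them apart from PCR copies, which is why it is usually skipped for RNA-seq.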
