  • Inputting fastq files into Tophat2 without info on seq platform type

    I'm trying to use Tophat2 in Galaxy to map paired reads, but the drop-down menu for selecting the input files doesn't recognize any of the files I imported (the files look fine in FastQC). It only recognizes them after I run FASTQ Groomer.

    I don't know which platform was used to sequence these, so I don't know what to choose when running FASTQ Groomer. I tried Illumina 1.3-1.7 and then, separately, Sanger/Illumina. But when I used Tophat to map either set of files, the mapping results were terrible. For Illumina 1.3, it gave me 94.7% discordant alignments. For Sanger/Illumina, it gave me 0% mapped reads. I'm assuming the problem is the file type I'm converting from? The data have been used before for RNA-seq DGE analysis, so I'm assuming they're fine.

    My question: how can I tell from the original fastq file what to select in FASTQ Groomer? Or: any helpful information.

    Original fastq files (first record):
    @GWZHISEQ02:321YMKACXX:4:1101:1856:1996 1:N:0:ATCACG
    CACGATGATGGCCTTCGACGGCAAGTACGACTTCCCCCTGGACATCAGCGA
    +
    @@CFDDFFHHHHHJJHJIIIJDIJJDGHIIJJJIJJJJJIJIJJJGJJJHH

  • #2
    Those are Illumina reads, and could be in either ASCII-64 (old Illumina) or ASCII-33 (Sanger) format; most likely ASCII-64, but I can't tell from that read. It may be possible to tell if you post some more reads (particularly if you can find a read with an 'N' base call).
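
To illustrate why a single read can be inconclusive, here is a minimal Python sketch (not part of the original thread) that decodes the quality string from the first post under both offsets. The function name `decode_quality` is made up for this example.

```python
def decode_quality(qual, offset):
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in qual]


# Quality line of the read posted in the question.
qual = "@@CFDDFFHHHHHJJHJIIIJDIJJDGHIIJJJIJJJJJIJIJJJGJJJHH"

for offset in (33, 64):
    scores = decode_quality(qual, offset)
    print(f"ASCII-{offset}: min Q{min(scores)}, max Q{max(scores)}")

# Prints:
# ASCII-33: min Q31, max Q41
# ASCII-64: min Q0, max Q10
```

Both ranges are legal, so this read alone does not settle the question, although the ASCII-64 interpretation would mean every base in it has very low quality.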

    • #3
      Here's one with several Ns:

      @GWZHISEQ02:321YMKACXX:5:1101:5470:1986 1:N:0:ATCACG
      CTGGATATCAATAATGCTCTCCNTAGGGATATTTCCCGCAAATTTGANNNN
      +
      CCCFFFFFHHHHHJJJJJJJJJ#3AGIJJJJJJJJJJJJJJJJJJJJ####

      • #4
        That's strange; normally N should be Q0 (!), not Q2 (#), but it appears to be ASCII-33 (Sanger) data. I'm not sure why the reads are not mapping. You may want to BLAST some of them against a database like NT to make sure they come from the correct organism.
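
The reasoning above (a '#' character cannot occur in ASCII-64 data) can be sketched as a small heuristic in Python. This is an illustration only, not a tool from the thread: the name `guess_offset` and both thresholds are assumptions. Characters below ';' (code 59) are taken as proof of ASCII-33, and characters above 'J' (code 74) as a sign of ASCII-64, which assumes ASCII-33 Illumina reads top out at 'J'.

```python
def guess_offset(quality_strings):
    """Guess the quality offset from the range of characters seen.

    Heuristic thresholds (assumptions, see the text above):
      - any character below ';' (59) can only be ASCII-33
      - any character above 'J' (74) suggests ASCII-64
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    lo, hi = min(codes), max(codes)
    if lo < 59:
        return "phred33"
    if hi > 74:
        return "phred64"
    return "ambiguous"


# Quality lines of the two reads posted in this thread.
first_read = "@@CFDDFFHHHHHJJHJIIIJDIJJDGHIIJJJIJJJJJIJIJJJGJJJHH"
read_with_n = "CCCFFFFFHHHHHJJJJJJJJJ#3AGIJJJJJJJJJJJJJJJJJJJJ####"

print(guess_offset([first_read]))   # ambiguous
print(guess_offset([read_with_n]))  # phred33
```

In practice you would feed it the quality lines of many reads (every fourth line of the file), since more reads make a low-quality character more likely to show up.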

        • #5
          You should not need to "groom" the data if they are already Sanger formatted. Just choose the "pencil" edit icon against the name of the dataset and manually set the data type to "fastqsanger" under "datatype" tab.

          You should do some QC/trimming though as that may be affecting your alignments.
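
As a rough idea of what quality trimming does, here is a minimal Python sketch that removes low-quality bases from the 3' end of a read. It assumes ASCII-33 qualities and an arbitrary cutoff of Q20; the function name `trim_3prime` and its parameters are made up for this example, and dedicated trimming tools do considerably more than this.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Drop trailing bases whose Phred score is below min_q.

    Assumes seq and qual have the same length and that the
    qualities use the given ASCII offset (33 here).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# The read with several Ns posted earlier in this thread.
seq = "CTGGATATCAATAATGCTCTCCNTAGGGATATTTCCCGCAAATTTGANNNN"
qual = "CCCFFFFFHHHHHJJJJJJJJJ#3AGIJJJJJJJJJJJJJJJJJJJJ####"

trimmed_seq, trimmed_qual = trim_3prime(seq, qual)
print(trimmed_seq)  # the four trailing Ns (quality '#', Q2) are removed
```

The internal N at Q2 is left in place, because this sketch only looks at the 3' end.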

          • #6
            I'm wondering if maybe they need to be adapter-trimmed? They all failed the Kmer Content check in FastQC.

            • #7
              In that case, probably yes! Though that's easiest to do if you know what kind of adapters were used.

              • #8
                Originally posted by GenoMax
                You should not need to "groom" the data if they are already Sanger formatted. Just choose the "pencil" edit icon against the name of the dataset and manually set the data type to "fastqsanger" under "datatype" tab.
                I changed the dataset type to fastqsanger, but Tophat2 and Trimmomatic are still not recognizing the files. I click the drop-down menu in either program (e.g., RNA-Seq FASTQ file, forward reads) and there's nothing there.

                Edit: This is true for either paired-end (which is correct for my data) or single-end options.

                SOLUTION: I am dumb. Accidentally changed them to fastqCsanger
                Last edited by skmotay; 10-08-2014, 12:44 PM.
