  • STAR with trimmed reads

    Hi Everyone,

    I am comparing tophat2 and STAR alignments of my RNA-seq data, using both trimmed and untrimmed reads. (I was getting different results with tophat2 than the bioinformaticians were getting with STAR, but they don't seem too interested in determining why, so I am testing it myself.) With tophat2, trimming my reads with cutadapt made quite a large difference in mapping efficiency (up to 35% more reads mapped) compared to the untrimmed reads. I know STAR is supposed to soft-clip the reads, but I'm still curious whether trimming makes any difference and how the percentages compare to tophat2. While I have no problems with my raw input data in STAR, it doesn't seem to like my trimmed reads and gives the following error:

    EXITING because of FATAL ERROR: Read1 and Read2 are not consistent, reached the end of the one before the other one
    SOLUTION: Check you your input files: they may be corrupted

    I assume this is because, after trimming, the reads are no longer all 100 bp long and some reads are lost altogether. I tried the option --readMatesLengthsIn NotEqual, but it still gave the same error.

    Any suggestions? Will STAR let me run the files if they aren't equal? Or is it pointless to test trimmed reads with STAR at all?

    Thanks for your help!

  • #2
    It sounds like you just trimmed incorrectly. What was the exact command you used?


    • #3
      I trimmed two adapters based on the overrepresented sequences found by FastQC: the Nextera barcodes and a primer used during the cDNA synthesis.

      Code:
      cutadapt -q 10 -a CTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTGAAAAA -b AAGCAGTGGTATCAACGCAGAGTACNNNNN --minimum-length 36 Sample1_R1.fastq > Sample1trim_R1.fastq 2> Sample1trimlogR1


      • #4
        Just as a side note, I have used STAR on trimmed reads (unequal lengths) and it works fine.

        Have you checked whether the reads in the R1 and R2 files are in the same order? From the error message, it seems that one of the files has more reads than the other. Check using wc -l.

        I use Trimmomatic in paired-end mode for clipping adapters. The final files contain only those reads that passed QC in both R1 and R2. Check whether this is the case for your cutadapt output.
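
The wc -l check suggested above can be sketched as a pair of small shell functions. This is only an illustration: the function names count_reads and compare_read_counts are made up here, and the arithmetic assumes uncompressed FASTQ files with exactly four lines per read (no wrapped sequence lines).

```shell
# Hypothetical helpers, not part of any tool mentioned in this thread.
# Assumes uncompressed FASTQ with exactly 4 lines per record.

# Print the number of reads in one FASTQ file (line count divided by 4).
count_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Print the read count of both mate files; return non-zero if they differ.
compare_read_counts() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "$1: $n1 reads"
    echo "$2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}

# Example call (file names are placeholders):
#   compare_read_counts Sample1trim_R1.fastq Sample1trim_R2.fastq || echo "counts differ"
```

Equal counts do not prove the mates are still paired correctly, but unequal counts are enough to explain the STAR error quoted in the first post.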


        • #5
          The error message is poorly worded; it most likely means that the two files contain different numbers of reads. It sounds like you trimmed in a way that did not keep paired reads together. When trimming paired reads, you must trim both files together, not one file at a time in separate processes.
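
To see whether paired reads were kept together, one option is to walk both files in lockstep and compare read IDs. The sketch below is an assumption-laden illustration: check_pair_order is a made-up name, the files are assumed to be uncompressed four-line FASTQ, and the read ID is taken to be everything before the first space or "/" on the header line (which covers both the "1:N:0:..." and the older "/1" naming styles, but would misfire on IDs that contain a "/" elsewhere).

```shell
# Hypothetical helper: report the first record where R1 and R2 disagree.
# Returns 0 if every record pairs up, non-zero otherwise.
check_pair_order() {
    awk -v r2="$2" '
        NR % 4 == 1 {
            rec = (NR + 3) / 4
            # Read the matching header line from the R2 file.
            if ((getline mate < r2) <= 0) {
                print "R2 ends before R1 at record " rec
                bad = 1; exit
            }
            # Keep only the part of each header before the first space or slash.
            split($0, a, "[ /]"); split(mate, b, "[ /]")
            if (a[1] != b[1]) {
                print "ID mismatch at record " rec ": " a[1] " vs " b[1]
                bad = 1; exit
            }
            # Skip the sequence, separator and quality lines of the R2 record.
            for (i = 0; i < 3; i++) getline skip < r2
        }
        END {
            # R1 is exhausted; any remaining R2 line means R2 has extra reads.
            if (!bad && (getline mate < r2) > 0) {
                print "R1 ends before R2"
                bad = 1
            }
            exit bad + 0
        }
    ' "$1"
}

# Example call (file names are placeholders):
#   check_pair_order Sample1trim_R1.fastq Sample1trim_R2.fastq
```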


          • #6
            Trimming the input files separately will lead to a lot of problems. As suggested, use trimmomatic or trim_galore or skewer to trim both files at once.
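
Since the original command used cutadapt, here is a rough sketch of what trimming both files in a single call could look like. It assumes a cutadapt release that supports paired-end trimming through -A (adapter for the second read) and -p (second output file); the wrapper name trim_pair, the ADAPTER_R1 and ADAPTER_R2 variables and the DRY_RUN switch are invented for this example, and the adapter sequences must be filled in for your own library.

```shell
# Hypothetical wrapper around a paired-end cutadapt call.
# ADAPTER_R1 / ADAPTER_R2 are placeholders to be set by the user.
# With DRY_RUN=1 the command is only printed, not executed.
trim_pair() {
    run=""
    if [ "${DRY_RUN:-0}" = 1 ]; then
        run="echo"
    fi
    # Arguments: $1 = R1 input, $2 = R2 input, $3 = R1 output, $4 = R2 output.
    # Both mates go through one cutadapt process, so the outputs stay in sync.
    $run cutadapt -q 10 --minimum-length 36 \
        -a "$ADAPTER_R1" -A "$ADAPTER_R2" \
        -o "$3" -p "$4" "$1" "$2"
}

# Example call (file names are placeholders):
#   ADAPTER_R1=... ADAPTER_R2=... \
#   trim_pair Sample1_R1.fastq Sample1_R2.fastq Sample1trim_R1.fastq Sample1trim_R2.fastq
```

Running it once with DRY_RUN=1 shows the exact command line before any files are written.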


            • #7
              Ok, I checked with cutadapt and indeed, I hadn't trimmed them properly for paired data. I reran the STAR alignment and it worked. Thank you all for taking the time to help me.

              As a note, I originally trimmed my data with trimmomatic but got errors with both tophat and STAR so I opted for cutadapt instead.

              Code:
              java -jar /path/to/Trimmomatic-0.32/trimmomatic-0.32.jar PE -threads 8 -phred33 -trimlog Sample1trimlog sample1_R1.fastq sample1_R2.fastq sample1_R1_TP.fastq sample1_R1_TU.fastq sample1_R2_TP.fastq sample1_R2_TU.fastq ILLUMINACLIP:/path/to/Trimmomatic-0.32/adapters/adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
              STAR error:

              EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or >

              tophat2 error:

              Error: beginning of quality values record not found! (@D3VDZHS1:119:H036PADXX:1:1103:8363:72199 1:N:0:GGACTCCTTATCCTCT)


              • #8
                Originally posted by dpryan:
                Trimming the input files separately will lead to a lot of problems. As suggested, use trimmomatic or trim_galore or skewer to trim both files at once.
                It looks like the output files were corrupted somehow. Can you output the top 8 lines of each file?
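
A quick way to produce that output, plus a basic structural check, is sketched below. The function names show_head and check_fastq_structure are made up for this example, and the check only assumes plain four-line FASTQ records (header starting with @, separator starting with +); it says nothing about what caused the errors reported above.

```shell
# Hypothetical helpers for eyeballing and sanity-checking FASTQ files.

# Print the first 8 lines (two records) of each file given.
show_head() {
    for f in "$@"; do
        echo "== $f =="
        head -n 8 "$f"
    done
}

# Report the first line that breaks the 4-line FASTQ layout; non-zero on failure.
check_fastq_structure() {
    awk '
        NR % 4 == 1 && !/^@/  { printf "line %d: expected @ header, got: %s\n", NR, $0; bad = 1; exit }
        NR % 4 == 3 && !/^\+/ { printf "line %d: expected + separator, got: %s\n", NR, $0; bad = 1; exit }
        END {
            # A line count that is not a multiple of 4 means a cut-off record.
            if (!bad && NR % 4 != 0) { printf "truncated record: %d lines in total\n", NR; bad = 1 }
            exit bad + 0
        }
    ' "$1"
}

# Example calls (file names taken from the Trimmomatic command above):
#   show_head sample1_R1_TP.fastq sample1_R2_TP.fastq
#   check_fastq_structure sample1_R1_TP.fastq
```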

                And if you want another trimming option, I recommend BBDuk.

                Syntax:

                bbduk.sh -Xmx1g in1=reads1.fq in2=reads2.fq out1=trimmed1.fq out2=trimmed2.fq ref=truseq.fa.gz,nextera.fa.gz k=25 ktrim=r hdist=1 tbo tpe

                truseq.fa.gz and nextera.fa.gz are included with the package, in the /resources/ directory.
