  • paired end reads don't have the same length

    Dear all,

    My genomes were sequenced on an Illumina HiSeq in paired-end mode. I was running cortex, but it died at "SAM line1", saying the list of paired-end reads don't have the same length, and quit. How should I process the data?

    Thank you!
    Last edited by fnn4; 08-08-2014, 05:36 AM.

  • #2
    You should re-obtain the raw read files that have the correct number of reads, and start with those. Then, you'll have to clarify what you are trying to do next.

    • #3
      Originally posted by Brian Bushnell
      You should re-obtain the raw read files that have the correct number of reads, and start with those. Then, you'll have to clarify what you are trying to do next.
      Thank you for replying!
      I am already using the raw read files, though. How can I make them the same length?
      Thank you!!

      • #4
        Can you post the head, tail, and wordcounts of the two files?

        e.g. the output of these commands:

        head -n 8 read1.fastq
        tail -n 8 read1.fastq
        wc read1.fastq

        head -n 8 read2.fastq
        tail -n 8 read2.fastq
        wc read2.fastq
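
A minimal sketch of turning those `wc` numbers into a direct comparison, assuming plain four-line FASTQ records with no wrapped sequence lines. The function names `count_records` and `same_record_count` are hypothetical helpers made up for this illustration, not part of any tool mentioned in the thread.

```shell
# Print the number of FASTQ records in a file, assuming 4 lines per record.
# Fails if the line count is not a multiple of 4 (a truncated or wrapped file).
count_records() {
    awk 'END {
        if (NR % 4 != 0) {
            print "line count " NR " is not a multiple of 4" > "/dev/stderr"
            exit 1
        }
        print NR / 4
    }' "$1"
}

# Succeed only if both files hold the same number of records.
same_record_count() {
    n1=$(count_records "$1") || return 1
    n2=$(count_records "$2") || return 1
    [ "$n1" -eq "$n2" ]
}

# Example call (file names are placeholders):
#   same_record_count read1.fastq read2.fastq && echo "record counts match"
```

Equal record counts rule out a truncated file, but they do not prove that the reads are in the same order in both files.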

        • #5
          Brian: It sounds like some of the reads are not the same length (the total number of reads may be the same). Perhaps "cortex" (not sure what that is) requires them all to be the same length.
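
One way to test that hypothesis is to tabulate the read lengths in each file. The sketch below assumes plain four-line FASTQ records; `read_length_histogram` and `seq_qual_lengths_match` are hypothetical helper names chosen for this example.

```shell
# Print "count length" pairs for the sequence lines (line 2 of each record),
# sorted by length. A single output line means all reads have the same length.
read_length_histogram() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print n[len], len }' "$1" | sort -k2,2n
}

# Succeed only if every quality string is as long as its sequence.
seq_qual_lengths_match() {
    awk 'NR % 4 == 2 { s = length($0) }
         NR % 4 == 0 && length($0) != s { bad = 1; exit }
         END { exit (bad ? 1 : 0) }' "$1"
}

# Example calls (file names are placeholders):
#   read_length_histogram read1.fastq
#   seq_qual_lengths_match read1.fastq || echo "sequence/quality length mismatch"
```

If the histogram shows more than one length, the reads were probably trimmed at some point and are no longer the raw instrument output.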

          • #6
            The file sizes are the same when the files are uncompressed FASTQ, but not when they are gzipped. Even when I use the uncompressed FASTQ files, it still quits, saying the paired-end reads don't have the same length and dying at SAM line 2.
            Below are the head, tail, and wc output for one sample.
            head -n 8 E_F.fastq
            @HWI-ST1360:45:C1JK9ACXX:1:1101:3383:2427 1:N:0:TGACCA
            TCAATTCGACTGGGTACGACCACGTAAGACATGGTTAGATGCAAAATGTCCAGTGTATATTGATTTTGGTGACGAACACCTTGTAAAACTAGATACATATG
            +
            BBBFFFFFFFFFFIFFIIIIIIIIFIIIIIIIIIFIIIIIIIIIIIIIFIIFIFFFFIIIIIIIIIIFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFF
            @HWI-ST1360:45:C1JK9ACXX:1:1101:3303:2477 1:N:0:TGACCA
            AGAGATAACCAACAGTTCGCGTTTGAATATCAAGACAATGAAGGCCAAACACAATCTAAAGCATTAACACTCAAAGTAGGCGATAGCCTTGAAGAAGTGGC
            +
            BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB

            tail -n 8 E_F.fastq
            @HWI-ST1360:45:C1JK9ACXX:1:2316:20760:100871 1:N:0:GGACCA
            TGTCATATCCGCAAAGCTCTGAAATGACATTATTTTCAATAGCCTCAACGGCTAATTGGTTGGCGCGGGCAGTATTGATACAAATACGATGCCCTTGCTCG
            +
            BB<FFFFFFFFFFBFFIIFFFFFFIIFIFFFFFFIB<FBFFFFIFFBBFFFF<FBFFFIIIIFBBBFFBB<<'0<BBFFFFBBBBBFB<BFBBBBBB<BBF
            @HWI-ST1360:45:C1JK9ACXX:1:2316:21212:100876 1:N:0:TGACCA
            TTTTCTAAAGCGATACAACCTTACAGGTAAACTGGAAGCCGGTTTGTATCACTTGTCCGTTGTATCTAAAGATAAACAAGTTTACAGAACTTGGGTGATAC
            +
            7<B<B<00<B0<<BFFFB00<BFBBB'0<0BF<07BFB0'<77BF<7BB'<BBBB70<7BBFBBFBB<BB<BBBBBB<<B<<<<7'0<<<''070077<<'

            wc E_F.fastq
            16531308 20664135 1084831013 E_F.fastq
            wc E_R.fastq
            16531308 20664135 1084831013 E_R.fastq

            Thank you!!
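
Differing gzipped sizes are expected: the compressed size depends on the content, and the forward and reverse files contain different sequences and quality strings. A fairer comparison is the line count after decompression. The sketch below assumes `gzip` is installed; `fastq_line_count` is a hypothetical helper name.

```shell
# Count lines in a FASTQ file, decompressing first if the name ends in .gz.
# The trailing tr strips the padding some wc implementations add.
fastq_line_count() {
    case "$1" in
        *.gz) gzip -dc "$1" | wc -l ;;
        *)    wc -l < "$1" ;;
    esac | tr -d ' '
}

# Example call (file names are placeholders):
#   [ "$(fastq_line_count read1.fastq.gz)" = "$(fastq_line_count read2.fastq.gz)" ] \
#       && echo "same number of lines"
```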



            • #7
              The fact that both files have exactly the same number of lines and characters suggests that the input is probably fine, and that the problem is either some kind of bug in cortex (which I have also never heard of) or an incorrect command line.

              What are you trying to do with the data?
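
Equal line counts (16531308 lines, which is 4132827 four-line records per file) show that neither file is truncated, but not that the mates appear in the same order. A stricter check compares read names record by record. This is a sketch only: it assumes four-line records, no tab characters inside any line, and headers where the read name is the text before the first space (as in the posted sample) or ends in /1 or /2. `read_names_paired` is a hypothetical helper name.

```shell
# Succeed only if record i in both files carries the same read name.
# The name is the header text up to the first space, minus a trailing /1 or /2.
read_names_paired() {
    paste "$1" "$2" | awk -F '\t' '
        NR % 4 == 1 {
            a = $1; b = $2
            sub(/ .*/, "", a); sub(/ .*/, "", b)
            sub(/\/[12]$/, "", a); sub(/\/[12]$/, "", b)
            if (a != b) {
                rec = (NR + 3) / 4
                print "mismatch at record " rec ": " a " vs " b
                bad = 1
                exit
            }
        }
        END { exit (bad ? 1 : 0) }'
}

# Example call (file names are placeholders):
#   read_names_paired read1.fastq read2.fastq && echo "read names are paired"
```

If this passes as well, the FASTQ files themselves are consistent, which points back at how the files are being handed to the program.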
