
  • #16
    OK, I can understand what you're doing here but if your $FILE represents "read_1 read_2" then your output BAM file will be called "read_1 read_2".bam which is going to cause some problems for you. Note that novoalign only recognizes paired-end reads from them being in separate files so you cannot concatenate them and assume it's going to pair them together.



    Originally posted by nguyendofx
    Hi zee,

    Thanks for the input, you are right. The last novoalign command was a bit confusing.

    I run novoalign on our cluster. Here is the actual command that I used.

    echo "novoalign -k -F ILM1.8 -o SAM -f $FILE -d $REF | samtools view -S -b -h -o $FILE.novoalign.bam - " | qsub -S /bin/bash -V -q $Q -cwd -N $NAME.novocall -t 1-$LENGTH -tc $P

    where $FILE stands for all of the read_1 and read_2 files


    Thanks,
    Ng



    • #17
      Yes, every $FILE.fastq input file generates $FILE.bam. When I view a BAM file in IGV, it shows both reads.
      Could the problem be that 'samtools flagstat' doesn't understand the flags in the BAM file, or that the BAM/SAM file doesn't have flags?

      Thanks,
      Ng


      Originally posted by zee
      OK, I can understand what you're doing here but if your $FILE represents "read_1 read_2" then your output BAM file will be called "read_1 read_2".bam which is going to cause some problems for you. Note that novoalign only recognizes paired-end reads from them being in separate files so you cannot concatenate them and assume it's going to pair them together.



      • #18
        flagstat understands flags just fine. And right, your bam doesn't have flags. It can't possibly have the normal paired end flags, because you told Novoalign, the software making the .sam, that you didn't have paired end data!
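As a rough illustration of what "paired end flags" means here, the sketch below decodes the pairing-related bits of a SAM FLAG value, using the bit meanings from the SAM specification. The function name `describe_flag` is made up for this example. A record written for single-end input has none of these bits set, which is why flagstat reports zeros for the paired categories.

```shell
#!/usr/bin/env bash
# Decode the pairing-related bits of a SAM FLAG value.
# describe_flag is a hypothetical helper name, not part of samtools.
describe_flag() {
    local flag="$1"
    local out=""
    [ $(( flag & 1 )) -ne 0 ]   && out="${out}paired "         # 0x1  read is paired
    [ $(( flag & 2 )) -ne 0 ]   && out="${out}proper_pair "    # 0x2  mapped in a proper pair
    [ $(( flag & 8 )) -ne 0 ]   && out="${out}mate_unmapped "  # 0x8  mate is unmapped
    [ $(( flag & 64 )) -ne 0 ]  && out="${out}read1 "          # 0x40 first read of the pair
    [ $(( flag & 128 )) -ne 0 ] && out="${out}read2 "          # 0x80 second read of the pair
    out="${out% }"                                             # drop the trailing space
    echo "${out:-no_pairing_bits}"
}

describe_flag 99    # prints: paired proper_pair read1
describe_flag 16    # prints: no_pairing_bits (reverse strand, single-end)
```

To look at the flags in a real file, the second column of `samtools view` output holds the FLAG value for each record.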

        The problem is not with the software. The problem is with the command line you wrote. Software isn't magic. It can't read your mind and know that your fastqs come in pairs. You have to tell it which fastq file is paired with which other one. It takes five seconds to see how to do that on the novoalign quick start page, and your command line isn't going to do that.
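A minimal sketch of what that could look like, reusing the options from the command quoted earlier in the thread. It assumes the files are named `<sample>_1.fastq` and `<sample>_2.fastq` (a hypothetical naming scheme) and that `ref.nix` is a placeholder index name. The key point is that both mate files follow a single `-f`, so novoalign treats them as a pair. The loop only prints the commands so they can be checked before anything is submitted.

```shell
#!/usr/bin/env bash
# Build one novoalign command per read_1/read_2 pair.
# Assumed (hypothetical) naming: <sample>_1.fastq and <sample>_2.fastq.
build_novoalign_cmd() {
    local r1="$1"
    local ref="$2"
    local sample="${r1%_1.fastq}"   # strip the _1.fastq suffix
    local r2="${sample}_2.fastq"    # derive the mate file name
    # Both mate files go after one -f; the output name uses the sample, not the file list.
    printf 'novoalign -k -F ILM1.8 -o SAM -f %s %s -d %s | samtools view -S -b -h -o %s.novoalign.bam -\n' \
        "$r1" "$r2" "$ref" "$sample"
}

# Print (do not run) the command for each read_1 file in the current directory.
# ref.nix is a placeholder; set REF to the real index path.
for r1 in *_1.fastq; do
    [ -e "$r1" ] || continue        # skip when the glob matches nothing
    build_novoalign_cmd "$r1" "${REF:-ref.nix}"
done
```

Deriving the output name from the sample prefix also avoids the "read_1 read_2".bam file name problem mentioned earlier in the thread.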



        • #19
          Originally posted by swbarnes2
          The problem is not with the software. The problem is with the command line you wrote. Software isn't magic. It can't read your mind and know that your fastqs come in pairs. You have to tell it which fastq file is paired with which other one. It takes five seconds to see how to do that on the novoalign quick start page, and your command line isn't going to do that.
          This should have been the first reply in the thread



          • #20
            Thank you all. It's all good now.



            • #21
              Originally posted by nguyendofx
              It is PE, so I expect read_1 and read_2 to both have the same positive number, not zero! I cannot explain why most of the output is zeros.
              Using samtools flagstat on the sorted BAM file, I got:
              5191445 + 0 in total (QC-passed reads + QC-failed reads)
              0 + 0 duplicates
              4863917 + 0 mapped (93.69%:-nan%)
              5191445 + 0 paired in sequencing
              2595624 + 0 read1
              2595821 + 0 read2
              4752609 + 0 properly paired (91.55%:-nan%)
              4829623 + 0 with itself and mate mapped
              34294 + 0 singletons (0.66%:-nan%)

              The read1 count is different from the read2 count. Is this correct? I understood that the two counts should be identical for PE reads.



              • #22
                Originally posted by mmmm
                Using samtools flagstat on the sorted BAM file, I got:
                5191445 + 0 in total (QC-passed reads + QC-failed reads)
                0 + 0 duplicates
                4863917 + 0 mapped (93.69%:-nan%)
                5191445 + 0 paired in sequencing
                2595624 + 0 read1
                2595821 + 0 read2
                4752609 + 0 properly paired (91.55%:-nan%)
                4829623 + 0 with itself and mate mapped
                34294 + 0 singletons (0.66%:-nan%)

                The read1 count is different from the read2 count. Is this correct? I understood that the two counts should be identical for PE reads.
                It depends on how the alignment was done. Some aligners allow you to align the mates individually if they can't align as a pair. Compare the number "mapped" to the number "with itself and mate mapped", noting also that you have "singletons".
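The comparison can be worked through with plain shell arithmetic on the counts quoted above. This is only a consistency check on those numbers; the variable names are made up for the example, and it does not by itself explain why read1 and read2 differ.

```shell
#!/usr/bin/env bash
# Counts copied from the flagstat output quoted above.
total=5191445
mapped=4863917
read1=2595624
read2=2595821
both_mapped=4829623   # "with itself and mate mapped"
singletons=34294      # read mapped, mate not mapped

# read1 + read2 accounts for every read flagged as paired.
echo "read1 + read2            = $(( read1 + read2 )) (total:  $total)"

# Mapped reads split into those whose mate also mapped, plus singletons.
echo "both_mapped + singletons = $(( both_mapped + singletons )) (mapped: $mapped)"

# Size of the read1/read2 imbalance, and the reads that did not map at all.
echo "read2 - read1            = $(( read2 - read1 ))"
echo "unmapped                 = $(( total - mapped ))"
```

Both sums match the reported totals (5191445 and 4863917), the read1/read2 difference is 197 reads, and 327528 reads are unmapped.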



                • #23
                  flagstat output

                  So it should be OK. BWA was used to map the reads to the reference genome.



                  • #24
                    Yes, it seems reasonable.
