  • #16
    Thanks,

    If I use bowtie instead, does it make any difference to the alignment?
    When I run the command with bowtie, it shows the message below while processing:
    CMD: bowtie -k 20 -S --sam-nohead -q target single_fa > single_fa.pre.sam

    As far as I understand, this says the SAM output has no header. Am I correct?

    When I run the stat.pl command to obtain read alignment statistics, the result shows:
    read_type count pct
    single 6874526 100.00

    but there is no information about proper and improper pairing.

    I hope you can shed some light on this.

    Thanks a bunch.
    Naresh
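
Whether a SAM file carries a header can be checked directly, because header lines are the ones that start with `@`. A minimal sketch, assuming a small hypothetical file `example.sam` written only for illustration (the read and contig names are made up and do not come from this thread):

```shell
#!/bin/sh
# Hypothetical two-record SAM file without header lines, similar in shape
# to what an aligner writes when header output is suppressed.
tmpdir=$(mktemp -d)
printf 'read1\t0\tcontig1\t10\t255\t4M\t*\t0\t0\tACGT\tIIII\n'  > "$tmpdir/example.sam"
printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII\n'          >> "$tmpdir/example.sam"

# SAM header lines begin with "@"; alignment records do not.
header_lines=$(grep -c '^@' "$tmpdir/example.sam")
record_lines=$(grep -vc '^@' "$tmpdir/example.sam")

echo "header lines: $header_lines"
echo "alignment records: $record_lines"
rm -rf "$tmpdir"
```

Running the same two `grep` commands on `single_fa.pre.sam` would show whether any header lines are present.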



    Originally posted by westerman:
    I am surprised that Trinity would be looking for tophat2. However bowtie2 is closely associated with tophat2. I suggest installing tophat2. You may never need it but that should be the way to get Trinity to use bowtie2.



    • #17
      Bowtie 2 supports gapped, local, and paired-end alignment modes. That may be a reason to prefer it over bowtie.

      Do you have paired-end data? Otherwise how will you get information about pairing?



      • #18
        Hi Genomax,

        Thanks for the feedback.

        I don't have paired-end data.
        I was confused about the output of the alignReads.pl script, i.e. how to confirm the percentage of aligned reads. That's why I asked about pairing. Sorry about that.
        Can you please give me some feedback on how to check the preservation of the assembly and the accuracy of the assembled contigs?

        Thanks,
        Naresh





        • #19
          Hi

          Hi everyone,

          I ran the command below for alignment:
          bin/util/alignReads.pl --seqType fq --single CombineIonXpressRNA_010_NareshPool_Chip1_2_WT2_fastxtrimmer_from_quality_trimmer.fastq --target /media/DATAPART3/Combine_Files/Velvetoptimiser_27_37/trinity_output/Trinity.fasta --aligner bowtie --retain_intermediate_files --output align_bowtie_output1

          ## The output in the terminal was as follows:

          CMD: bowtie -k 20 -S --sam-nohead -q target single_fa > single_fa.pre.sam
          # reads processed: 18001999
          # reads with at least one reported alignment: 3881611 (21.56%)
          # reads that failed to align: 14120388 (78.44%)
          Reported 6874538 alignments to 1 output stream(s)
          CMD: touch single_fa.pre.sam.finished
          CMD: /root/Trinity/util/../util/SAM_filter_out_unmapped_reads.pl single_fa.pre.sam > single_fa.sam
          -filtered 14120388 of 20994926 reads as unaligned = 67.26% unaligned reads
          CMD: touch single_fa.sam.finished
          CMD: sort -T . -S 2G -k 1,1 -k 3,3 single_fa.sam > single_fa.nameSorted.sam
          CMD: touch single_fa.nameSorted.sam.finished
          -child alignment process completed.

          ## Alignment steps succeeded.

          CMD: sort -T . -S 2G -k 3,3 -k 4,4n single_dir/single_fa.nameSorted.sam > single_dir/single.coordSorted.sam
          CMD: touch single_dir/single.coordSorted.sam.finished
          CMD: cp single_dir/single.coordSorted.sam align_bowtie_output1.pre.coordSorted.sam
          CMD: touch align_bowtie_output1.pre.coordSorted.sam.finished
          CMD: cp align_bowtie_output1.pre.coordSorted.sam align_bowtie_output1.coordSorted.spliceAdjust.sam
          CMD: touch align_bowtie_output1.coordSorted.spliceAdjust.sam.finished
          CMD: cp /media/DATAPART3/Combine_Files/Velvetoptimiser_27_37/trinity_output/align_bowtie_output1/align_bowtie_output1.coordSorted.spliceAdjust.sam align_bowtie_output1.coordSorted.sam
          CMD: touch align_bowtie_output1.coordSorted.sam.finished
          CMD: samtools view -bt target.fa.fai -S align_bowtie_output1.coordSorted.sam | samtools sort - align_bowtie_output1.coordSorted
          [bam_header_read] EOF marker is absent. The input is probably truncated.

          [sam_header_read2] 102085 sequences loaded.
          [bam_sort_core] merging from 3 files...
          CMD: touch align_bowtie_output1.coordSorted.bam.finished
          CMD: sort -T . -S 2G -k 1,1 -k 3,3 align_bowtie_output1.coordSorted.sam > align_bowtie_output1.nameSorted.sam
          CMD: touch align_bowtie_output1.nameSorted.sam.finished
          CMD: samtools view -bt target.fa.fai align_bowtie_output1.nameSorted.sam > align_bowtie_output1.nameSorted.bam
          [sam_header_read2] 102085 sequences loaded.
          CMD: samtools index align_bowtie_output1.coordSorted.bam

          As per the above output:
          20994926 reads were used for alignment, but I actually have only 18,001,999 reads.
          # 67.26% were unaligned and 32.74% were aligned

          So my questions are:
          1] Why did it use more input reads, i.e. 20 million instead of 18 million?
          2] Is 32% considered a good or a bad alignment rate? If bad, how do I improve the alignment percentage?
          3] What does this mean: "EOF marker is absent. The input is probably truncated."?


          Thanks in advance.
          Naresh
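
The counts in the log above can be cross-checked with simple arithmetic. This is a minimal sketch using only numbers copied from the log. The interpretation in the comments is an assumption: that the filtering step counts SAM records rather than distinct reads, because `-k 20` lets one read contribute up to 20 alignment lines. It matches the totals exactly but is not confirmed anywhere in the thread.

```shell
#!/bin/sh
# Numbers copied from the bowtie and filter messages in the log.
reads_processed=18001999
reads_aligned=3881611
reads_unaligned=14120388
alignments_reported=6874538

# Assumption: the filter sees one SAM record per reported alignment plus
# one record per unaligned read, so its total is a record count.
sam_records=$((reads_unaligned + alignments_reported))
echo "SAM records seen by the filter: $sam_records"

# Alignment rate per read (what bowtie itself reports).
pct_reads_aligned=$(awk -v a="$reads_aligned" -v t="$reads_processed" \
    'BEGIN { printf "%.2f", 100 * a / t }')

# Unaligned fraction per SAM record (what the filter message reports).
pct_records_unaligned=$(awk -v u="$reads_unaligned" -v t="$sam_records" \
    'BEGIN { printf "%.2f", 100 * u / t }')

echo "reads aligned: $pct_reads_aligned%"
echo "records unaligned: $pct_records_unaligned%"
```

If that reading is right, the 20,994,926 figure is unaligned reads plus reported alignments, and the per-read alignment rate is the 21.56% that bowtie prints, not 32.74%.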



          • #20
            Hi nareshvasani,

            I am curious about your assembly results with Trinity. I have data from an Ion Proton instrument and I am trying to assemble the transcriptome, but I did not get good results with the MIRA4, Trinity, and Velvet assemblers. Can you tell me your command line for Trinity and give some general information about your raw reads and assembly file? It would help me a lot. Thanks.


            sabbir




            Originally posted by nareshvasani:
            Hi,

            Thanks for your prompt reply.
            I really appreciate your suggestion.
            Do you think if I put some more input option for butterfly, inchworm, kmer and Chrysalis, it will give me better contig file?

            Thanks,
            Naresh



            • #21
              Hi Sabbir,

              How did you determine that the assemblies from MIRA4, Trinity, and Velvet were not good enough?
              I had a very big input file (fastq) that none of the assemblers handled well. So, to generate an input file that Trinity could handle, I went through several steps, such as trimming and duplicate removal on each fastq file, and finally combined all the fasta files into one fasta file.


              Below is the command I used for Trinity [the same command as suggested on the Trinity website]:

              1] Trinity.pl -seqType fa -min_contig_length 200 -JM 40G -CPU 4 -single inputfilename.fasta -output trinity_output

              #### took 2 days to run to completion ######

              2] /bin/util/TrinityStats.pl Trinity.fasta # gives basic statistics, i.e. information about the assembly file.

              Total trinity transcripts: 128578
              Total trinity components: 43455
              Contig N50: 862


              Hope this helps.
              Best of luck,
              Naresh
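
For readers unfamiliar with the N50 figure reported above: it is commonly defined as the contig length at which, going from the longest contig downwards, the running total first reaches half of all assembled bases. The sketch below computes it from a FASTA stream under that definition. It is not the TrinityStats.pl implementation, and the four contigs are hypothetical.

```shell
#!/bin/sh
# Compute N50 from FASTA on stdin, assuming the common definition:
# sort contig lengths in descending order and report the length at which
# the running sum first reaches half of the total.
n50_from_fasta() {
    awk '/^>/ { if (seq != "") print length(seq); seq = ""; next }
         { seq = seq $0 }
         END { if (seq != "") print length(seq) }' |
    sort -nr |
    awk '{ len[NR] = $1; total += $1 }
         END {
             for (i = 1; i <= NR; i++) {
                 running += len[i]
                 if (2 * running >= total) { print len[i]; exit }
             }
         }'
}

# Hypothetical contigs with lengths 12, 8 (split over two lines), 6 and 4.
n50=$(n50_from_fasta <<'EOF'
>contig1
ACGTACGTACGT
>contig2
ACGT
ACGT
>contig3
ACGTAC
>contig4
ACGT
EOF
)
echo "N50: $n50"
```

With a real assembly the function would be called as `n50_from_fasta < Trinity.fasta`.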




              • #22
                Hi nareshvasani,

                Thanks for your reply. I got a much larger number of contigs from the three assemblers than I expected. Based on the genome sequence of my species, the number of contigs should be 30,000-40,000, but I got more than 200,000. The transcripts were also larger in size than I expected. I used the following pipeline for assembly:
                1. Remove adapters with cutadapt
                2. Remove duplicates
                3. Assemble with Trinity
                I also used Trinity's quality options.

                I also have a big input file (fastq). According to your suggestion, I need to run several tools to improve the input file. Can you please describe in detail the steps (trimming, duplicate removal from each fastq file, and finally combining all the fasta files into one fasta file for assembly), including the names of the tools and how to combine the fasta files?

                Regards
                Sabbir



                • #23
                  Hi Sabbir,

                  Below are the steps I used for trimming:

                  The following command keeps reads that have a quality score of at least 20 in at least 50% of their bases.
                  > fastq_quality_filter -Q33 -q20 -p 50 -i <SAMPLE_NAME>.fastq -o <SAMPLE_NAME>.quality_filter.fastq

                  The following operation trims nucleotides with quality scores lower than 20 from the end of reads. Any trimmed read shorter than 50 nucleotides is discarded altogether:
                  > fastq_quality_trimmer -Q33 -t 20 -l 50 -i <SAMPLE_NAME>.fastq -o <SAMPLE_NAME>.clean.fastq

                  To remove the biased base sequence content and GC content at the end of reads, the following command was used. It keeps bases 1 to 335 of each read, which removes the last 15 nucleotides from a 350-nucleotide read.
                  > fastx_trimmer -Q33 -f 1 -l 335 -i <SAMPLE_NAME>.clean.fastq -o <SAMPLE_NAME>.fastx_trimmer.fastq

                  After this step, the read length distribution changed minimally, with the majority of reads retaining their full length. In addition, around 25% of the reads were discarded completely.

                  To remove identical sequences, the fastx_collapser tool was used:
                  > fastx_collapser -v -i <SAMPLE_NAME>.fastq -o <SAMPLE_NAME>_collapsed.fasta
                  This tool removed a few million reads from each file while keeping the read count of each collapsed sequence, and it writes its output in fasta format.


                  ## All of the above steps will help you reduce the size of the input file and improve the quality of each fastq file.



                  Best,
                  NareshVasani
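
To make the filtering rule concrete, the sketch below re-implements the idea behind the `-q`/`-p` pair in plain `awk`: keep a read only if at least `p` percent of its bases have a Phred score of at least `q`. It is an illustration of the rule, not the FASTX-Toolkit code. It assumes 4-line FASTQ records, a quality offset of 33, and that both thresholds are inclusive. The function name `quality_filter` and the three reads are hypothetical.

```shell
#!/bin/sh
# Keep a FASTQ record only if at least p percent of its bases have
# Phred quality >= q (offset 33). Usage: quality_filter Q P < reads.fastq
quality_filter() {
    awk -v q="$1" -v p="$2" -v offset=33 '
        # Build a character-to-ASCII-code lookup table.
        BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
        { rec[(NR - 1) % 4] = $0 }
        NR % 4 == 0 {
            qual = rec[3]; n = length(qual); good = 0
            for (i = 1; i <= n; i++)
                if (ord[substr(qual, i, 1)] - offset >= q) good++
            if (n > 0 && 100 * good / n >= p)
                print rec[0] "\n" rec[1] "\n" rec[2] "\n" rec[3]
        }'
}

# "I" encodes Phred 40, "5" encodes Phred 20, "!" encodes Phred 0.
kept_ids=$(quality_filter 20 50 <<'EOF' | awk 'NR % 4 == 1'
@read1
ACGTAC
+
IIII!!
@read2
ACGTAC
+
II!!!!
@read3
ACGT
+
5555
EOF
)
echo "$kept_ids"
```

Here read1 (4 of 6 bases pass) and read3 (all bases exactly at the threshold) are kept, while read2 (2 of 6 bases pass) is dropped.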



                  • #24
                    @sabbir: You are probably doing this already, but use the example command lines and settings supplied by Naresh as a guideline. You will have to experiment with your own data.

                    What is your expected genome size? Do you have an idea of the approximate fold coverage you have? If there is a reference genome available then alignment may be a better option to try than assembly.



                    • #25
                      Hi NareshVasani,

                      Thanks for your kind reply. Did you run the different trimming tools on the same file and then merge the output files of each tool, or did you run them one after another?
                      That is, either:
                      sample file > fastq_quality_filter > output file 1
                      sample file > fastq_quality_trimmer > output file 2
                      sample file > fastx_trimmer > output file 3
                      sample file > fastx_collapser > output file 4

                      and then merge all four files, or

                      run the tools one after another: first fastq_quality_filter, then fastq_quality_trimmer on the output of fastq_quality_filter (the input file is the fastq_quality_filter output), and so on for the following tools.

                      If you followed the first approach, I would like to know how you merged the files and why you did not use the fastq file.

                      Regards
                      Sabbir



                      • #26
                        Hi Sabbir,


                        These are the commands I used for my data. You do not have to follow the same steps, but you should use them as a reference and adapt them to your own data [as GenoMax said].

                        I think you have not understood how the fastx toolkit works. You need to read the toolkit's manual carefully.

                        The first method you described doesn't make any sense.
                        I used the second method.

                        Best,
                        Nareshvasani
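
The second method, where each tool reads the file written by the previous one, can be sketched as a dry run that only prints the commands. The flags are copied from the trimming post. The step order and the way the file names are wired together are assumptions made for illustration, and `SAMPLE_NAME` is a placeholder.

```shell
#!/bin/sh
# Dry run: print the four commands as one chain in which each step reads
# the output of the previous step. Nothing is executed, so the
# FASTX-Toolkit does not need to be installed to follow the logic.
sample="SAMPLE_NAME"   # hypothetical sample name

step1="$sample.quality_filter.fastq"
step2="$sample.clean.fastq"
step3="$sample.fastx_trimmer.fastq"
step4="${sample}_collapsed.fasta"

plan=$(cat <<EOF
fastq_quality_filter -Q33 -q20 -p 50 -i $sample.fastq -o $step1
fastq_quality_trimmer -Q33 -t 20 -l 50 -i $step1 -o $step2
fastx_trimmer -Q33 -f 1 -l 335 -i $step2 -o $step3
fastx_collapser -v -i $step3 -o $step4
EOF
)
echo "$plan"
```

Run once per sample, this yields one collapsed fasta file per sample. The thread does not say how those per-sample files were then combined into a single input file.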




                        • #27
                          Hi Nareshvasani,
                          Thanks a lot. You said before that you went through several steps, such as trimming and duplicate removal on each fastq file, and finally combined all the fasta files into one fasta file. If you followed the second method, how did you end up with several files, and how did you merge them, given that you ran the tools one after another, each working on the output of the previous tool?

                          Regards
                          Sabbir
