  • MSA on large scale of sequences?

    Hello guys,

    I have 15 miRNA seq, around 8MB each. What I am trying to do is apply multiple sequence alignment (MSA), get a consensus sequence, and then do the annotation. I have tried ClustalX and Clustal Omega, and also others such as Kalign and MUSCLE, but I cannot get any result from any of them. Can anyone help me with this?

  • #2
    Originally posted by mastercoder
    I have 15 miRNA seq, around 8MB each.
    Those are some huge microRNAs!

    Seriously, though, can you clarify a bit? Do you mean you have 15 files, 8MB each, gzip-compressed fastqs of single-ended 50bp miRNA reads, for example - and if not, what exactly do you have? And when you say you tried X, Y, and Z, what were your command lines, what did they print to the screen, and what was the output? Also, what's your experiment?

    • #3
      Originally posted by Brian Bushnell
      Those are some huge microRNAs!

      Seriously, though, can you clarify a bit? Do you mean you have 15 files, 8MB each, gzip-compressed fastqs of single-ended 50bp miRNA reads, for example - and if not, what exactly do you have? And when you say you tried X, Y, and Z, what were your command lines, what did they print to the screen, and what was the output? Also, what's your experiment?
      First, thanks for replying. I'll start with your last question.

      I have 15 miRNA paired-end seqs with 29bp reads. First I used Velvet on the trimmed data and then SSPACE. The scaffolds for each range from 2MB to 8MB depending on the k-mer I used for the assembly. After this, using UGENE, I merged these scaffolds into a single sequence. I did this step for each of them. What I am told is to apply MSA on these files, get a consensus, and do the annotation on this consensus seq.
      So these files are not gzip-compressed. They are .fa files.
      About X, Y, and Z: when I use smaller files, they give me an MSA output (.aln), but when I try X, Y, and Z on my actual data, I get nothing. They just keep running, even though it has been more than a week. There is no output, although the programs are using my cores.

      I am sorry if this does not make sense, but I am a fresh graduate and could not find somebody to give me a lead.
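
For a rough sense of why the aligners finish on small files but keep running on the real ones, here is a back-of-the-envelope sketch in Python. It assumes each merged .fa holds about 8 million bases and that one pairwise alignment fills a full dynamic-programming matrix. Actual MSA programs use heuristics to avoid exactly this, so treat the numbers as an illustration of scale, not as a description of what ClustalX, Kalign, or MUSCLE do internally. The function names are made up for the example.

```python
from math import comb


def pairwise_dp_cells(len_a, len_b):
    """Cells in a full dynamic-programming matrix for one global pairwise alignment."""
    return (len_a + 1) * (len_b + 1)


def all_pairs(n_sequences):
    """Number of distinct sequence pairs among n sequences."""
    return comb(n_sequences, 2)


# Assumptions for illustration: 15 merged sequences, about 8 million bases each.
n_sequences = 15
seq_length = 8_000_000

cells = pairwise_dp_cells(seq_length, seq_length)
pairs = all_pairs(n_sequences)

print(f"cells per pairwise alignment: {cells:.2e}")
print(f"sequence pairs: {pairs}")
print(f"cells over all pairs: {cells * pairs:.2e}")
```

Because the matrix grows with the product of the two lengths, halving both sequences cuts it to roughly a quarter, which fits with small test files producing an .aln quickly.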

      • #4
        @mastercoder: This is not making sense. miRNAs are inherently small. Why are you trying to assemble them?

        What did you start this analysis with? What is the aim of the experiment?

        • #5
          Originally posted by GenoMax
          @mastercoder: This is not making sense. miRNAs are inherently small. Why are you trying to assemble them?

          What did you start this analysis with? What is the aim of the experiment?
          The trimmed data for these are really huge, as you can see in the picture.

          • #6
            There is no doubt there are lots of reads.

            But what experiment are they from? miRNA sequencing? Are you trying to identify how many miRNAs (known?) there are in the samples? What is the point of doing an MSA?

            • #7
              Originally posted by GenoMax
              There is no doubt there are lots of reads.

              But what experiment are they from? miRNA sequencing? Are you trying to identify how many miRNAs (known?) there are in the samples? What is the point of doing an MSA?

              There is a treatment and a control group. Each has 15 sequences from rats. What I am told is to find known and novel miRNAs. So I thought I could assemble, get the scaffolds, merge them into a single sequence, and apply MSA so I can get a consensus sequence for each group. Then I would do the annotation on the consensus sequence instead of doing it one by one.

              Is this all wrong?

              • #8
                Originally posted by mastercoder
                Each has 15 sequences from rats.
                This is not making sense. Did you mean to say that you are only interested in 15 genes/regions?

                Quote:
                What I am told is to find known and novel miRNAs.
                The first part can be done by aligning against miRBase data. No need to do any assembly (in fact, that may give you some odd results). For the novel discovery part you can look for software that can do that. Here is one example.

                Quote:
                So I thought I could assemble, get the scaffolds, merge them into a single sequence, and apply MSA so I can get a consensus sequence for each group. Then I would do the annotation on the consensus sequence instead of doing it one by one.
                This part is not making much sense. You need to ask whoever asked you to do this for further clarification.

                • #9
                  @GenoMax
                  No, what I mean is I have 15 miRNA sequences from 15 rats that are control, and another 15 miRNA sequences from 15 rats that are treatment. That is why I was trying to get a consensus sequence from each group. So should I try to align these sequences against miRBase data one by one?

                  • #10
                    Ah. So you have 15 sequence files (not literally 15 sequences) each for control and treatment. Is that correct?

                    If that is the case then you can align each of them against miRBase (not sure if you only want the rat sequences subset from there) to identify reads that align to known miRNAs. Then the ones that don't align to miRBase could go into other software to look for novel ones.
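
A toy sketch of that split, in Python, may make the workflow concrete. It uses exact matching only and made-up sequences, so it is an illustration and not a replacement for a proper short-read aligner that tolerates mismatches. The function names, the `toy-miR-*` entries, and the example reads are all hypothetical. One assumption worth checking against your own download: the mature miRNA FASTA is taken to be written as RNA (with U), so it is converted to T before comparing against DNA reads.

```python
from collections import Counter


def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}, converting RNA 'U' to DNA 'T'."""
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper().replace("U", "T")
    return records


def split_known_unmatched(reads, mature):
    """Count reads that exactly equal a known mature sequence; collect the rest."""
    by_sequence = {}
    for name, seq in mature.items():
        by_sequence.setdefault(seq, name)  # keep the first name for duplicate sequences
    known, unmatched = Counter(), []
    for read in reads:
        hit = by_sequence.get(read.upper())
        if hit is not None:
            known[hit] += 1
        else:
            unmatched.append(read)
    return known, unmatched


# Made-up reference entries and reads, for illustration only.
mature_fasta = [
    ">toy-miR-1 made-up entry",
    "ACGUACGUACGUACGUACGUAC",
    ">toy-miR-2 made-up entry",
    "UUGGCCAAUUGGCCAAUUGGCC",
]
reads = [
    "ACGTACGTACGTACGTACGTAC",
    "ACGTACGTACGTACGTACGTAC",
    "TTGGCCAATTGGCCAATTGGCC",
    "GGGGGGGGGGGGGGGGGGGGGG",
]

mature = read_fasta(mature_fasta)
known, unmatched = split_known_unmatched(reads, mature)
print(dict(known))
print(unmatched)
```

Run per sample, the `known` counts give a table of known miRNAs for each rat, and the `unmatched` reads are what would be passed on to a novel-miRNA discovery tool.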

                    • #11
                      Originally posted by GenoMax
                      Ah. So you have 15 sequence files (not literally 15 sequences) each for control and treatment. Is that correct?

                      If that is the case then you can align each of them against miRBase (not sure if you only want the rat sequences subset from there) to identify reads that align to known miRNAs. Then the ones that don't align to miRBase could go into other software to look for novel ones.
                      GenoMax, I am really thankful to you. Sorry to make you struggle a bit. Two last questions, please bear with me. Should I align against miRBase with my trimmed data or with the assembled data (scaffolds)? And is there any article, source, or other keywords that you can give me?

                      • #12
                        Originally posted by mastercoder
                        GenoMax, I am really thankful to you. Sorry to make you struggle a bit. Two last questions, please bear with me. Should I align against miRBase with my trimmed data or with the assembled data (scaffolds)? And is there any article, source, or other keywords that you can give me?
                        Happy to help.

                        You should use the trimmed data (hopefully it was correctly trimmed; what program did you use for that?). If this was a pure miRNA prep then the assembled data makes no sense, since most of your miRNAs should be shorter than the length of one read (how long were they?).

                        A review like this may be of help.
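
One quick way to answer the "how long were they?" question is to tabulate read lengths in the trimmed files. The sketch below is a minimal Python example that assumes plain 4-line FASTQ records (no wrapped sequence lines); the function name and the toy reads are made up for illustration. For a real file you would pass an open file handle (or `gzip.open(path, "rt")` for compressed data) instead of the toy list.

```python
from collections import Counter


def fastq_length_histogram(lines):
    """Count sequence lengths, assuming each record is exactly 4 lines."""
    lengths = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of each record holds the bases
            lengths[len(line.strip())] += 1
    return lengths


# Toy records built in code so the lengths are unambiguous (22, 29, 22).
seqs = ["ACGT" * 5 + "AC", "ACGT" * 7 + "A", "TTGG" * 5 + "CC"]
toy_fastq = []
for n, seq in enumerate(seqs, start=1):
    toy_fastq += [f"@read{n}", seq, "+", "I" * len(seq)]

hist = fastq_length_histogram(toy_fastq)
for length in sorted(hist):
    print(length, hist[length])
```

If adapter trimming worked on a small-RNA library, a peak around 21 to 23 nt would be expected for mature miRNAs; if nearly every read sits at the full read length, the adapters may not have been removed.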

                        • #13
                          Originally posted by GenoMax
                          Happy to help.

                          You should use the trimmed data (hopefully it was correctly trimmed; what program did you use for that?). If this was a pure miRNA prep then the assembled data makes no sense, since most of your miRNAs should be shorter than the length of one read (how long were they?).

                          A review like this may be of help.
                          Trimmed data was provided by the company that did the sequencing. Below is the info.

                          Secondly, the trimmed data has 2 files for each sample; I think this is because they are paired-end. That is why I tried to assemble them.

                          • #14
                            The two files are most likely paired-end sequencing data (as described here).

                            You only have 29 bp reads (if that info is correct). Do you know what was the fragment size for this library?
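
If you want to confirm that the two files per sample really are mates, a simple check is to compare read names record by record. This is a minimal Python sketch under a few assumptions: 4-line FASTQ records, mates stored in the same order in both files, and read names that differ only by a trailing `/1` or `/2` or by a comment after the first space. The function names and toy records are hypothetical.

```python
from itertools import zip_longest


def read_id(header):
    """Reduce a FASTQ header to a bare read name.

    Drops the leading '@', any comment after the first whitespace,
    and a trailing '/1' or '/2' mate suffix.
    """
    fields = header.strip()[1:].split()
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def headers(lines):
    """Yield the header line of each record, assuming 4-line FASTQ records."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield line


def mates_in_sync(r1_lines, r2_lines):
    """Return (matching_pairs, in_sync) for two FASTQ line iterables."""
    matching = 0
    for h1, h2 in zip_longest(headers(r1_lines), headers(r2_lines)):
        if h1 is None or h2 is None or read_id(h1) != read_id(h2):
            return matching, False
        matching += 1
    return matching, True


# Toy records for illustration only.
r1 = ["@read1/1", "ACGT", "+", "IIII", "@read2/1", "TTGG", "+", "IIII"]
r2 = ["@read1/2", "ACGT", "+", "IIII", "@read2/2", "CCAA", "+", "IIII"]
print(mates_in_sync(r1, r2))
```

A result with `in_sync` set to `False` means the files have different read counts or the names diverge at the reported position, in which case they should not be treated as a matched pair.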

                            • #15
                              Originally posted by GenoMax
                              The two files are most likely paired-end sequencing data (as described here).

                              You only have 29 bp reads (if that info is correct). Do you know what was the fragment size for this library?
                              Nope, that is not written in the report.
