Dear All,
May I also ask: my RNA-seq libraries were about 260 bp in size, prepared according to Illumina's protocol. For the FASTQ files I currently have, do I need to remove the adapter (index) sequences before mapping to the reference genome?
Many thanks.
-
Originally posted by piet:
I use 'bwa mem', but my use case is processing DNA sequencing data. It is very fast and reliable with default settings. Nevertheless, bwa and similar mappers should also be suitable for bacterial RNA sequencing, since bacteria do not splice their messenger RNA.
In the beginning it took me quite a while to figure out how to write shell scripts that start bwa runs conveniently and handle the resulting SAM files. You will definitely need to learn some kind of shell or script programming if you want to go that route.
Why don't you do a DNA sequencing run of your particular isolate before you go into RNA sequencing?
--
piet
Hi Piet,
I see. I have next to no coding/programming knowledge, so maybe BWA is not suitable for me. But I will check out the website for more info about it.
I did consider sequencing the genome of my sequence type strain, but the lab has limited funds.
Thank you very much.
-
Originally posted by michaellim:
May I know what kind of alignment/mapping software you use?
In the beginning it took me quite a while to figure out how to write shell scripts that start bwa runs conveniently and handle the resulting SAM files. You will definitely need to learn some kind of shell or script programming if you want to go that route.
Why don't you do a DNA sequencing run of your particular isolate before you go into RNA sequencing?
--
piet
Last edited by piet; 12-19-2014, 02:57 PM.
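For readers who would rather script this in Python than in shell, here is a minimal sketch of the two bwa steps mentioned above: building the index and running 'bwa mem'. It assumes bwa is installed and on the PATH, that the reads are single-end, and the file names in the usage note are purely hypothetical. It is one possible wrapper, not the way piet runs it.

```python
import shlex
import subprocess


def build_index_cmd(reference: str) -> list[str]:
    """Command that builds the bwa index files next to the reference FASTA."""
    return ["bwa", "index", reference]


def build_mem_cmd(reference: str, reads: str, threads: int = 4) -> list[str]:
    """Command that aligns single-end reads with 'bwa mem' (SAM goes to stdout)."""
    return ["bwa", "mem", "-t", str(threads), reference, reads]


def run_mapping(reference: str, reads: str, out_sam: str, threads: int = 4) -> None:
    """Index the reference, then map the reads and write the SAM output to a file."""
    # Indexing is needed once per reference, including multi-contig FASTA files.
    subprocess.run(build_index_cmd(reference), check=True)
    with open(out_sam, "w") as sam:
        # 'bwa mem' prints SAM to stdout, so redirect it into the output file.
        subprocess.run(build_mem_cmd(reference, reads, threads), stdout=sam, check=True)


def describe(cmd: list[str]) -> str:
    """Render a command list as the shell line it corresponds to."""
    return shlex.join(cmd)
```

A call such as run_mapping("ST131_reference.fasta", "sample.fastq", "sample_vs_ST131.sam") would run both steps, and describe(build_mem_cmd(...)) shows the equivalent shell command if you want to check it before running anything.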
-
Originally posted by piet:
Multilocus sequence typing (MLST) is a method frequently used to characterize bacterial genomes. MLST schemes have been published for most pathogenic bacteria. For the species Escherichia coli (including Shigella) there are even three competing schemes. In the scheme maintained at Cork University, sequence type 11 (ST11) refers to isolates typically found in cattle (serovar O157:H7), while strains belonging to ST131 are uropathogenic, which means they are associated with infections of the urinary tract in humans. The chromosome of E. coli encodes more than 4000 proteins. Perhaps half of them belong to the accessory genome, which means they are only found in some strains or clonal groups.
If you want to map your reads from RNA sequencing, I would recommend using a genome from the same or a very closely related sequence type. Otherwise you will miss several genes from the accessory genome. For E. coli ST131 there are several genomes available in GenBank, even fully finished ones (AP009378.1 and plasmid AP009379.1, CP002797.2). Sequences for ST131 isolates KTE173, KTE49, KTE162, KTE6, KTE211, KTE175, KTE178, KTE216, KTE148, KTE139 are available as WGS contigs.
I would recommend trying several reference genomes. A mapping run usually takes only a few minutes on a desktop PC.
--
piet
Many thanks for the clarification. I will try different genomes then, if it doesn't take too long. May I know what kind of alignment/mapping software you use? Are there any particular reasons for that choice?
Cheers.
-
Originally posted by michaellim:
For example, E. coli ST11 will be different from ST131. However, we aren't certain whether there are any genes specific to ST131 that cannot be found in other E. coli sequence types.
So, if ST11 has a completed genome but ST131 is in contigs, and my current RNA-seq data is from ST131, should I use ST131 (multiple contigs) as the reference, or the completed genome of ST11, which is less closely related? That was my question. I hope that makes it clearer.
If you want to map your reads from RNA sequencing, I would recommend using a genome from the same or a very closely related sequence type. Otherwise you will miss several genes from the accessory genome. For E. coli ST131 there are several genomes available in GenBank, even fully finished ones (AP009378.1 and plasmid AP009379.1, CP002797.2). Sequences for ST131 isolates KTE173, KTE49, KTE162, KTE6, KTE211, KTE175, KTE178, KTE216, KTE148, KTE139 are available as WGS contigs.
I would recommend trying several reference genomes. A mapping run usually takes only a few minutes on a desktop PC.
--
piet
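One way to compare several reference genomes, as suggested above, is to look at what fraction of reads maps to each. The sketch below reads the SAM text produced by a mapper and uses the standard FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary). The function name is made up for illustration, and a higher mapped fraction is only a rough indicator, not a full assessment of the reference.

```python
def mapped_fraction(sam_lines):
    """Return the fraction of reads that mapped, given an iterable of SAM text lines."""
    total = 0
    mapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines (header lines start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Count each read once: ignore secondary and supplementary records.
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        # Bit 0x4 is set when the read is unmapped.
        if not flag & 0x4:
            mapped += 1
    return mapped / total if total else 0.0
```

To use it, open each SAM file and pass the file handle, for example: with open("sample_vs_ST131.sam") as handle: print(mapped_fraction(handle)). The file name is hypothetical.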
-
Originally posted by Brian Bushnell:
All aligners are designed to handle references with multiple contigs; you don't need to combine anything (nor should you). You just need to index it.
Well, since you ask me, I will recommend BBMap, which also handles RNA-seq data but is faster and more sensitive than TopHat. But bacteria generally lack introns - when they are present, they are very short and only in a handful of genes. So it's not strictly necessary to use a splice-aware aligner for bacterial RNA-seq, though I would still recommend it.
I will give it a go first and see what happens.
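To see why nothing needs to be combined, it can help to look at the reference file itself: a multi-contig reference is just one FASTA file with several records. The sketch below lists each contig and its length. The function name is hypothetical, and it assumes a plain, uncompressed FASTA file.

```python
def contig_lengths(fasta_lines):
    """Return {contig_name: sequence_length} for an iterable of FASTA text lines."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The contig name is the first word after '>'; the rest is a description.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence may be wrapped over many lines; add them up per contig.
            lengths[name] += len(line)
    return lengths
```

Called on an open file handle, len(contig_lengths(handle)) gives the number of contigs and sum(contig_lengths(handle).values()) the total assembly size, which is a quick way to tell a handful of contigs from hundreds of small ones.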
-
Originally posted by Sergioo:
What exactly do you mean by sequence type? Maybe those assigned by MLST?
Yes, MLST. For example, E. coli ST11 will be different from ST131. However, we aren't certain whether there are any genes specific to ST131 that cannot be found in other E. coli sequence types.
So, if ST11 has a completed genome but ST131 is in contigs, and my current RNA-seq data is from ST131, should I use ST131 (multiple contigs) as the reference, or the completed genome of ST11, which is less closely related? That was my question. I hope that makes it clearer.
Thank you.
-
Originally posted by michaellim:
Dear everyone,
A complete and annotated reference genome of a bacterium from a different sequence type.
Which would be more appropriate? Would appreciate some advice.
Thank you.
-
Originally posted by michaellim:
Hi Brian,
So if I were to use the multiple contigs as my reference when aligning my RNA-seq data, may I ask how I should do this? Do I need to combine all the contigs first (and if so, how)?
And for the alignment, which is best for bacterial RNA-seq: TopHat, BWA or Bowtie? I heard TopHat is used a lot in eukaryotic RNA-seq as it looks for splice junctions.
Thank you very much.
-
Originally posted by GenoMax:
If the overall organization of the genomes is similar, then whole-genome comparison can be informative. Mauve is designed for these kinds of comparisons, which can help locate genome-level rearrangements. Comparing multiple E. coli strains would be appropriate, as in this example from Yersinia: http://asap.genetics.wisc.edu/softwa...creenshots.php
Thanks for the info. Could you please advise how I should compare the published completed genome with the other published genome, which is in contigs? Do I need to merge the contigs first before using Mauve (and if so, how)?
Many thanks.
-
Originally posted by Brian Bushnell:
It's difficult to get single-contig assemblies (unless you use PacBio data). Multiple contigs typically mean that the coverage was too low in places to assemble correctly, or there were long repeats that confused the assembler. When we assemble a microbe from Illumina data, we might get 50 contigs or more. Probably 99%+ of the genome is there, but typically the order and orientation of the contigs are not known. There are not necessarily gaps, but there may be.
As for "ST", I've just never heard that terminology before; people I work with normally refer to those as "strains". And yes, I think it's still best to use the genome that is most closely related to your organism unless the assembly is really bad (hundreds of small contigs).
Edit - also, as GenoMax pointed out, plasmids mean that even a correct assembly will have multiple contigs.
So if I were to use the multiple contigs as my reference when aligning my RNA-seq data, may I ask how I should do this? Do I need to combine all the contigs first (and if so, how)?
And for the alignment, which is best for bacterial RNA-seq: TopHat, BWA or Bowtie? I heard TopHat is used a lot in eukaryotic RNA-seq as it looks for splice junctions.
Thank you very much.
-
Originally posted by michaellim:
Hi Brian,
For example, although E. coli is ONE species, there are various versions of it, i.e. sequence types (ST). The human-adapted E. coli that causes problematic infections around the world, for example, is ST131. Between the different sequence types, there might be mutations/genes specific to each of them.
I'm totally new to sequencing. When they are in several contigs, does it mean that there are gaps between the sequences, hence the authors deposited the sequences in contigs rather than a circular 4Mb chromosome?
Many thanks for the advice.
As for "ST", I've just never heard that terminology before; people I work with normally refer to those as "strains". And yes, I think it's still best to use the genome that is most closely related to your organism unless the assembly is really bad (hundreds of small contigs).
Edit - also, as GenoMax pointed out, plasmids mean that even a correct assembly will have multiple contigs.
-
Originally posted by michaellim:
I'm totally new to sequencing. When they are in several contigs, does it mean that there are gaps between the sequences, hence the authors deposited the sequences in contigs rather than a circular 4Mb chromosome?
Many thanks for the advice.
-
Originally posted by AntonioRFranco:
You can do a whole-genome comparison with programs such as Mauve or ACT. There are tutorials around explaining how to use them.
Do you mean I should compare the two options first? What if there's a difference between the two genomes? What do you suggest I do then?
Many thanks.