Protein clustering resulted in 3, clusters with 21, sequences. The top 15 groups by size are presented in Supplementary Table S4. To obtain an overview of the entire genome, protein annotations were counted across all the groups.
Comparative analysis of globe artichoke proteins against Arabidopsis thaliana, turnip (Brassica rapa), strawberry (Fragaria vesca), tomato (Solanum lycopersicum) and lettuce (Lactuca sativa) resulted in 25, clusters, which included , proteins out of , proteins across the six taxa. Of the 27, globe artichoke proteins, 20, were grouped into 14, gene families, of which are unique families and 6, are singletons.
A core set of 6, families was shared by the six species, of which 2, were detected as single copy in all the species (Supplementary Table S6A). The largest number of gene clusters was shared between globe artichoke and tomato, presumably due to their closer evolutionary relationship vs. The shared gene clusters represent
Figure 4: Venn diagram of orthologous gene clusters among Arabidopsis thaliana (a), Cynara cardunculus (c), Fragaria vesca (f) and Solanum lycopersicum (s), showing a total of 9, common gene clusters.
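The family categories used above (core families, single-copy core families and species-specific families) can be illustrated with a small sketch. It assumes the clusters are already available as a mapping from a cluster identifier to (species, protein) pairs; the function name, the data layout and the toy identifiers are hypothetical and are not taken from the study's pipeline.

```python
from collections import Counter


def classify_clusters(clusters, species):
    """Sort orthologous clusters into core, single-copy core and species-specific.

    clusters: dict mapping a cluster id to a list of (species, protein_id) pairs.
    species:  iterable with the names of all species in the comparison.
    """
    all_species = set(species)
    summary = {"core": [], "single_copy_core": [], "species_specific": []}
    for cluster_id, members in clusters.items():
        # Count how many proteins each species contributes to this cluster.
        counts = Counter(sp for sp, _protein in members)
        if set(counts) == all_species:
            summary["core"].append(cluster_id)
            if all(n == 1 for n in counts.values()):
                summary["single_copy_core"].append(cluster_id)
        elif len(counts) == 1:
            summary["species_specific"].append(cluster_id)
    return summary


# Toy data with three species (hypothetical identifiers).
toy_species = ["artichoke", "tomato", "lettuce"]
toy_clusters = {
    "fam1": [("artichoke", "a1"), ("tomato", "t1"), ("lettuce", "l1")],
    "fam2": [("artichoke", "a2"), ("artichoke", "a3"), ("tomato", "t2"), ("lettuce", "l2")],
    "fam3": [("artichoke", "a4"), ("artichoke", "a5")],
    "fam4": [("tomato", "t3"), ("lettuce", "l3")],
}
result = classify_clusters(toy_clusters, toy_species)
print(result)
```

Clusters shared by only some of the species (such as fam4 in the toy data) fall into none of the three lists, which matches the idea that they are neither core nor unique.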
We searched for gene families in which a particular species contained a higher number of genes than expected, and 46 returned a significant deviation, mostly driven by expansions in either Arabidopsis, tomato, strawberry or lettuce. Four clusters, in which globe artichoke had an expanded number of genes, are reported in Supplementary Table S6B. Three of these are annotated as (i) cystatin-like proteins with the cysteine proteases inhibitor signature (33 members), (ii) replication factor A, C-terminal domain (15 members), and (iii) glycosyltransferase family 10 (fucosyltransferase, 8 members); none of these three groups of genes were physically clustered.
A fourth group (27 members), containing genes encoding p class E enzyme group I proteins, was physically clustered on chromosome 13 (15 out of 27 genes, Supplementary Fig. S3A), in a region spanning Kb interspersed with 44 other genes. The phylogenetic analysis suggested that the 15 genes resulted from serial duplication events (Supplementary Fig.). Five other clusters were over-represented in both globe artichoke and lettuce (Supplementary Table S6B). The first two encoded NB-LRR disease resistance related proteins (with 20 and 19 members), one of which was clustered on chromosome 10 in two separate regions of 2 Mb each (Supplementary Fig.).
A third cluster encoded bulb-type mannose-specific lectin-associated protein kinases (19 members) that localized on three chromosomes (chr.). A fourth group encoded pentatricopeptide proteins (36 members). The last group encoded an Allergen Related Protein family (13 members) that was clustered on chromosome 15 over a 1 Mb region. A total of clusters (1, genes) were unique to globe artichoke across the panel of six analyzed species.
The specific genes are widely distributed over the globe artichoke chromosomes; 18 regions had clusters of genes involved in the same biological process (Supplementary Table S6C; Supplementary Fig.).
Repetitive content of the genome
Approximately Mb Insertion dates were calculated based on sequence divergence of the LTRs for all the clustered elements; an expansion was evident at 2.
The ages of the Gypsy elements were less variable and tended to be more stable across time, with the exception of the Fatima sub-family. Higher variability was detected between different subtypes of Copia elements, with certain sub-families showing a decrease in insertion events over time, while others had a higher proportion of young elements (Fig.).
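Dating an LTR retrotransposon insertion from the divergence of its two LTRs is commonly done with the relation T = K / (2r), where K is the corrected distance between the 5' and 3' LTRs and r is the substitution rate per site per year. The sketch below is a minimal illustration under stated assumptions: it uses the Jukes-Cantor correction and a default rate of 1.3e-8, neither of which is given in the text above, and the function names are hypothetical. The substitution model and rate actually used in the study may differ.

```python
import math


def p_distance(ltr5, ltr3):
    """Proportion of mismatching sites between two aligned LTR sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance K for an observed proportion p."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_age_years(ltr5, ltr3, rate=1.3e-8):
    """Estimate insertion age as T = K / (2 * rate).

    rate is an ASSUMED substitution rate (substitutions per site per year);
    replace it with a value appropriate for the species under study.
    """
    return jukes_cantor(p_distance(ltr5, ltr3)) / (2 * rate)


# Toy aligned LTR pair with one mismatch in ten sites (hypothetical data).
age = insertion_age_years("ACGTACGTAC", "ACGTACGTAA")
print(f"estimated insertion age: {age:.3g} years")
```

The factor of two reflects that both LTRs accumulate substitutions independently after insertion, since they are identical at the time the element integrates.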
Figure 5: Repeat identification and dating in the genome of C. Conserved miRNAs are present in at least two of the 11 tested species.
Satellite and microsatellite sequences
The inspection of the available assembly for the presence of satellite sequences allowed the identification of several monomers scattered across the genome.
The most abundant were three monomers of 96 bp, 94 bp and bp, found , and times, respectively (Supplementary Table S7). Other satellite sequences were identified at lower frequencies.
The distribution of satellite motifs across the chromosomes showed an enrichment within presumptive centromeres, as suggested by the density of repeats (Supplementary Fig.). It was not possible to determine a single satellite motif involved in centromeres. The SSR loci identified (named CyMSat) were classified on the basis of the repeat motif and their distribution over the pseudomolecules (Supplementary Table S8), as well as the number of repeat units. Di-nucleotides are the most frequent. Nearly , imperfect SSR motifs were identified.
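As an illustration of how SSR loci can be classified by repeat motif and number of repeat units, the following sketch scans a sequence for perfect microsatellites with a regular expression. It is a simplified stand-in, not the tool used in the study: it handles only perfect repeats of one unit length at a time, and the function name and the default thresholds are assumptions chosen for the example.

```python
import re


def find_perfect_ssrs(seq, unit_len=2, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect SSRs in seq.

    unit_len:    length of the repeat unit (2 = di-nucleotide).
    min_repeats: minimum number of tandem copies required (assumed threshold).
    """
    seq = seq.upper()
    # One captured unit followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        # Skip motifs that are themselves repeats of a shorter unit
        # (e.g. "AA" is a mononucleotide run, not a di-nucleotide SSR).
        if motif in (motif + motif)[1:-1]:
            continue
        hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


# Toy sequence: an (AT)6 microsatellite followed by a poly-A run.
toy = "GGC" + "AT" * 6 + "CCG" + "A" * 12 + "T"
print(find_perfect_ssrs(toy))  # [(3, 'AT', 6)]
```

Detecting imperfect SSRs, which the text also reports, would require tolerating interruptions within the repeat array and is not covered by this sketch.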
Since their identification was driven by similarity to known miRNAs, the size of the globe artichoke miRNome was likely underestimated (39), but it was sufficient to provide an overview of its basic genomic organization. Some previously known, conserved miRNAs, such as miR and miR, were not identified, possibly due to their loss or, more likely, to genomic loci missing in the assembly. Nevertheless, candidate genes were detected for all but one of the 19 conserved miRNAs reported in previous smallRNA-seq studies of leaves and roots of globe artichoke (40).
Fifty-eight miRNAs were represented by fewer than 10 loci, nine numbered between 10 and 99 loci and four between and , while two had greater than 1, loci (Fig.). All the miRNAs with more than loci were C. Analysis of putative miRNA targets among the transcripts identified here was carried out on both conserved and putative non-conserved miRNAs, using mature sequences present in miRBase (release 21) as a reference. Both conserved and non-conserved microRNAs target multiple genes: as examples, miR, miR and miR target 39, 26 and 23 genes, respectively, and, among non-conserved miRNAs, miR, miR and miR target 28, 24 and 15 genes, respectively.
For components (C), enrichments were present for nucleus (GO) and mitochondrial outer membrane translocase complex (GO). Many conserved miRNAs were predicted to target known transcription factors related to plant development, morphology and flowering time.
Age of speciation
Distributions of synonymous nucleotide substitutions (Ks) were analysed to investigate ages of speciation (Fig.). However, by applying the Ks correction to compensate for nuclear rate heterogeneity across Compositae as estimated elsewhere (19), the C.
A novel locally guided genome reassembling technique using an artificial ant system
Abstract
Background: Despite the short length of their reads, micro-read sequencing technologies have shown their usefulness for de novo sequencing. However, especially in eukaryotic genomes, complex repeat patterns are an obstacle to large assemblies.
Principal Findings: We present a novel heuristic algorithm, Pebble, which uses paired-end read information to resolve repeats and scaffold contigs to produce large-scale assemblies. In simulations, we can achieve weighted median scaffold lengths (N50) of above 1 Mbp in Bacteria and above kbp in more complex organisms.
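The weighted median scaffold length (N50) mentioned above is the length L such that scaffolds of length L or longer contain at least half of the total assembled bases. A minimal sketch of that standard definition follows; the function name and the toy lengths are illustrative and are not taken from the Pebble software.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half the
        # total assembly size is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy assembly: total 300, so the half-way point of 150 is reached at 80.
scaffolds = [100, 80, 60, 40, 20]
print(n50(scaffolds))  # 80
```

Because the statistic is weighted by length, a few long scaffolds raise N50 far more than many short ones, which is why it is preferred over the plain median for comparing assemblies.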
South African Journal of Science
Palsson2, Derek R. Lovley3, Christian L. Craig Venter Institute, Rockville, Maryland, United States of America
Abstract
State-of-the-art DNA sequencing technologies are transforming the life sciences due to their ability to generate nucleotide sequence information with a speed and quantity that is unapproachable with traditional Sanger sequencing. Genome sequencing is a principal application of this technology, where the ultimate goal is the full and complete sequence of the organism of interest. Due to the nature of the raw data produced by these technologies, a full genomic sequence attained without the aid of Sanger sequencing has yet to be demonstrated.
Application of 'next-generation' sequencing technologies to microbial genetics
Key Points
New sequencing technologies, such as Solexa, pyrosequencing and SOLiD, developed by Illumina, Roche and Applied Biosystems, respectively, are set to revolutionize microbiology by dramatically increasing throughput and reducing costs of DNA sequencing. These new technologies present new technical and computational challenges, as well as new research opportunities. Applications include de novo genome sequence assembly, metagenomics, sRNA discovery, detection of polymorphisms, expression profiling and epigenetics. Many software packages are freely available for dealing with the large datasets generated by these applications. As well as sequence alignment and assembly, there is a need for downstream processing of data into a form that is accessible to biologists.
This component of our strategy is a key distinguishing aspect of our approach. Although Newbler alone was able to assemble the reads into five scaffolds, the resulting assembly had a considerable number of degenerate positions which could not be resolved just from an error correction step using Illumina reads (Table 3). Similarly, while EULER-SR and Velvet both generated high quality contigs, they did not perform as well as Newbler with respect to leveraging the paired-end information in the reads. Our results clearly show that integrating more than one assembly algorithm is very important for enhancing the quality of the assembly. In the third phase, the simple PCR-based search strategy allowed us to quickly order and orient the scaffolds into a circular genome. This is another unique aspect of our approach, in that we address the problem of relative orientation of the scaffolds as well as their ordering with just a few PCRs.