[Frontiers in Bioscience 2, a31-36, November 1, 1997]
FUNCTIONAL BIOINFORMATICS: THE CELLULAR RESPONSE DATABASE

James Sorace1,2,3, Kip Canfield1, Steven Russell1
Received 4/3/97 Accepted 10/24/97

5. DISCUSSION

Our effort centers on developing common data models for storing and retrieving data in biology and medicine. In addition to the current data model, we have proposed a data model for a Molecular Diagnostics Laboratory Information System intended for clinical use (13). It is particularly important that these databases support queries based on the biological or medical context in which an experiment occurs. This type of data modeling is necessary if the concepts of intelligent agents, distributed collaboration, data mining and a shared intelligence (14,15) are to be applied in practice.

If widely implemented, a database like the one proposed above would have many advantages in generating new biological knowledge. First, the results of comprehensive queries are rapidly displayed and more easily interpreted. Second, networks of control and regulatory pathways can begin to be investigated. It is increasingly recognized that the next phase of biological advances will involve the exploration of complicated networks of biological control (16). Once a critical mass of data has been entered, it may become possible to search for physiologically relevant pathways and to produce biologically relevant hypotheses for further experimental study. Finally, such an approach to data management would reduce redundant experimentation and make researchers with conflicting data aware of these results earlier.

However, the development of shared data models, such as the one outlined above, is an absolute prerequisite before this level of enterprise informatics can occur. Many difficulties associated with the development of an electronic database repository will need to be addressed. Several of these issues are outlined below:

1. Lack of standard data models and a formal syntax for the description of experimental data.
Perhaps the greatest challenge and opportunity in the field of bioinformatics is to design data models and logical notations that will support the future generation of knowledge. The data model proposed here and its notation (e.g. agent, test agent, control agent) represent one approach to some of these issues. These concepts can be extended. Some users may wish to use transfected genes as agents that modify the test cell population's target gene expression. For example, does the transfection of epidermal growth factor into fibroblasts increase cellular c-myc expression? The current database contains tables to support this particular possibility (figure 2). While these tables have not been fully implemented, they represent an explicit example of one way the database might be upgraded. However, no data model is so inclusive that it can capture every type of data generated by an experimenter. There will always remain a need for the data to be annotated through the use of memo fields, and for links to be established to journal-like entities.

2. Peer review: Assuring the entry of high quality data is critical for database success. Database entry would occur in two stages. First, the investigator would submit the paper to a participating journal for review, and its associated data would be placed in a provisional CRD database. This review would focus on the scientific merit of the paper. The reviewers would now be able to query and reformat the submitted data directly, as well as search the published database for related information. Second, once the paper is accepted for publication by the journal, the data entered in the provisional database would be transferred to the fully on-line version.

It is important to recognize that the paradigms governing the scale of biological research are also changing. Software like the CRD may enable large collaborative groups, using good laboratory practice protocols, to establish high quality databases.
These groups may divide along the test agent used (chemokines or IFN) or assay types (protein or mRNA expression). With a common database, the compilation and quality control of such an effort can be improved. Regardless of whether databases are populated by individual submissions or by large group efforts, issues regarding review and quality assurance will be the subjects of considerable future debate.

3. Data entry and presentation: Careful consideration will need to be given to interface design to assure that data can be entered quickly and accurately. If the laboratory were to use a database with data structures similar to those of the depositories, submission could be largely automated. Conversely, once the data is entered, the development of useful query and presentation formats will be crucial to support end users.

4. Archiving of historical data: Data that has already been generated will not be archived in an electronic database. It is possible that a highly selected subset of such publications may be retrospectively archived. However, this criticism overlooks the fact that the rate of data generation will only increase as biological research continues to advance, inundating the current information infrastructure and rendering it obsolete.

In this report, we have outlined one possible way to advance bioinformatics. However, the biological community should be given the opportunity to influence the types of databases that need to be developed and supported by collaborative data entry. In an effort to initiate a dialogue, the interested reader is invited to complete a survey associated with the CRD web site. We will publish the results of this survey to help guide future database design. The interested reader is also invited to participate further in the development of this database by visiting the prototype CRD web site. Currently, investigators can deposit their data provided the work has been accepted or published by a peer-reviewed journal.
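
As an illustration of the notation discussed under issue 1 (agent, test agent, control agent, with a transfected gene as one kind of agent) and the two-stage entry discussed under issue 2, the following Python sketch shows one way such records might be represented. It is a hypothetical model and not the CRD schema: every class, field, and function name (`Agent`, `Experiment`, `accept`, `find_published`) is an assumption introduced for this example, and the `effect` value is a placeholder rather than a reported result.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class AgentKind(Enum):
    """Hypothetical agent categories; not taken from the CRD tables."""
    SOLUBLE_FACTOR = "soluble factor"      # e.g. a chemokine or IFN
    TRANSFECTED_GENE = "transfected gene"  # the extension discussed under issue 1


class ReviewStatus(Enum):
    """Two-stage entry as described under issue 2."""
    PROVISIONAL = "provisional"  # submitted, visible to reviewers only
    PUBLISHED = "published"      # accepted by the journal, fully on-line


@dataclass(frozen=True)
class Agent:
    name: str
    kind: AgentKind


@dataclass
class Experiment:
    cell_population: str             # the test cell population
    test_agent: Agent                # agent applied to the test cells
    control_agent: Optional[Agent]   # None when no control agent is recorded
    target_gene: str                 # gene whose expression is measured
    assay_type: str                  # "protein" or "mRNA"
    effect: str                      # free-text outcome; placeholder in this sketch
    memo: str = ""                   # annotation for data the model cannot capture
    status: ReviewStatus = ReviewStatus.PROVISIONAL


def accept(experiment: Experiment) -> Experiment:
    """Move a record from the provisional stage to the published stage."""
    experiment.status = ReviewStatus.PUBLISHED
    return experiment


def find_published(records: List[Experiment],
                   cell_population: str,
                   target_gene: str) -> List[Experiment]:
    """Return published records matching a biological context."""
    return [
        r for r in records
        if r.status is ReviewStatus.PUBLISHED
        and r.cell_population == cell_population
        and r.target_gene == target_gene
    ]


# Usage: the example question from issue 1, stored as a provisional record.
egf = Agent("epidermal growth factor", AgentKind.TRANSFECTED_GENE)
record = Experiment(
    cell_population="fibroblasts",
    test_agent=egf,
    control_agent=None,
    target_gene="c-myc",
    assay_type="mRNA",
    effect="placeholder, not a reported result",
)
before = find_published([record], "fibroblasts", "c-myc")  # provisional: hidden
accept(record)
after = find_published([record], "fibroblasts", "c-myc")   # published: visible
```

Keeping the review status on the record itself, rather than in two separate stores, is only one possible design; a real repository might instead keep physically separate provisional and published databases, as the text describes.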