INTRODUCTION TO DATABASE MANAGEMENT SYSTEM
A database management system (DBMS) consists of collection of interrelated data and a set of programs to access that data. The collection of data, usually referred to as the database contains information about one particular enterprise The primary goal of a DBMS is to provide an environment that is both convenient and efficient to use in retrieving and storing database information.
Database systems are designed to manage large bodies of information The management of data involves both the definition of structures for the storage of information and the provision of mechanisms for the manipulation of information. In addition, the database system must provide for the safety of the information stored, despite system crashes or attempts at unauthorized access. If data are to be shared among several users, the system must avoid possible anomalous results.
The following topics have been discussed here :
· Why Database
· Characteristics of Data in Database
· Drawbacks in File System
· Data redundancy and inconsistency
· DBMS Components
· DBMS Advantages
Why Database
A database can be defined as a collection of non-redundant data which can be shared by user.
Changes (or updation) can be made to data without affecting other components of the system. These changes include, for example, change of data format or file structure or relocation from one device to another.
Advantages of a Database
· Reduction in data redundancy
· Maintenance of data integrity and quality
· Avoidance of inconsistency
· Reduced cost of software development
· Security restrictions
View of the Database
The database can present different views of itself to users, programmers and the database administrator.
· The internal data representation (internal view) is normally not seen by the user or application programmer.
· The conceptual view or conceptual schema is the primary means by which the DBA builds and manages the database.
· The DBMS can present multiple views of the conceptual schema (external views) to programmers and users, depending on the application.
Characteristics of Data in Database
In general, the data in the database-at least in a large system –will be both integrated and shared.
· By integrated, we mean that the database can be thought of as a unification of several otherwise distinct data files, with any redundancy among those files wholly or partly eliminated.
· By shared, we mean that individual pieces of data in the database can be shared among several different users, in the sense that each of those users can have access to the same piece of data (and different users can use it for different purposes). As indicated earlier, different users can even be accessing the same piece of data at the same time (“concurrent access”). Such sharing (concurrent or otherwise) is partly a consequence of the fact that the database is integrated.
Drawbacks in File System
Data redundancy and inconsistency
Since different programmers create application programs over a long period, various files are likely to have different formats and the programs may be written in several programming languages. Moreover, the same information may be duplicated in several places (files). For example, the address and telephone number of particular customer may appear in a file that consists of savings account records and in file that consists of checking-account records. This redundancy leads to higher storage and access cost. In addition, it may lead to data inconsistency, that is, the various copies of the same data may no longer agree. For example, changed customer address may be reflected in savings-account records but not elsewhere in the system.
Difficulty in accessing data
Suppose that one of the bank officers needs to find out the names of all customers who live within the city's 78733 zip code. The officer asks the data -processing department to generate such a list. Because this request was not anticipated when the original system was designed, there is no application program on hand to meet it. There is, however, an application program to generate the list of all customers. The bank officer has now two choices: either obtain the list of all customers and have the needed information extracted manually, or ask the data-processing department to have system programmer write the necessary application program. Both alternatives are obviously unsatisfactory. Suppose that such a program is written, and that, several days later, the same officer needs to trim that list to include only those customers who have an account balance of Rs10,000 or more. As expected, a program to generate such a list does not exist. Again, the officer has the preceding two options, neither of which is satisfactory. The point here is that conventional file-processing environments do not allow needed data to be retrieved in a convenient and efficient manner. More responsive data-retrieval systems must be developed for general use.
Data isolation
Because data are scattered in various files, and files may be in different formats, it is difficult to write new application programs to retrieve the appropriate data.
Integrity problems
The data values stored in the database must satisfy certain types of consistency constraints. For example, the balance of a bank account may never fall below a prescribed amount (say, Rs25). Developers enforce these constraints in the system by adding appropriate code in the various application programs. However, when new constraints are added, it is difficult to change the programs to enforce them. The problem is compounded when constraints involve several data items from different files.
Atomicity problems
A computer system, like any other mechanical or electrical device, is subject to failures. In many applications, it is crucial to ensure that, once a failure has occurred and has been detected, the data are restored to the consistent state that existed prior to the failure. Consider a program to transfer Rs50 from account A to B. If a system failure occurs during the execution of the program, it is possible that the Rs50 was removed from account A but was not credited to account B, resulting in an inconsistent database state. Clearly, it is essential for database consistency that either both the credit or debit occur, or that neither occur. That is, the funds transfer must be atomic-it must happen in the entirety or not at all. It is difficult to ensure this property in a conventional file-processing system.
Concurrent-access anomalies
So that the overall performance of the system is improved and a faster response time is possible, many systems allow multiple users to update the data simultaneously. In such an environment, interaction of concurrent updates may result in inconsistent data. Consider bank account A, containing Rs 500. If two customers withdraw funds (say Rs 50 and Rs 100 respectively) from account A at about the same time, the result of the concurrent executions may leave the account in an incorrect (or inconsistent) state. Suppose that the programs executing on behalf of each withdrawal read the old balance, reduce that value by the amount being withdrawn, and write the result back. If the two programs run concurrently, they may both read the value Rs 500, and write back Rs 450 and Rs 400, respectively. Depending on which one writes the value last, the account may contain either Rs450 or Rs400, rather that the correct value of Rs350. To guard against this possibility, the system must maintain some form of supervision. Because many different application programs that have not been coordinated previously may access data, therefore, supervision is difficult to provide.
Security problems
Not every user of the database system should be able to access all the data, for example, in a banking system, payroll personnel need to see only that part of the database that has information about the various bank employees. They do not need access to information about customer accounts. Since application programs are added to the system in an adhoc manner, it is difficult to enforce such security constraints.
DBMS Components
Data: Data stored in a database include numerical data which may be integers (whole numbers only) or floating point numbers (decimal), and non-numerical data such as characters (alphabetic and numeric characters), date or logical (true or false). More advanced systems may include more complicated data entities such as pictures and images as data types.
Standard operations: Standard operations are provided by most DBMSs. These operations provide the user with basic capabilities for data manipulation. Examples of these standard operations are sorting, deleting and selecting records.
Data definition language (DDL): DDL is the language used to describe the contents of the database. It is used to describe, for example, attribute names (field names), data types, location in the database, etc.
Data manipulation and query language: Normally a query language is supported by a DBMS to form commands for input, edit, analysis, output, reformatting, etc. Some degree of standardization has been achieved with SQL (Structured Query Language).
Programming tools: Besides commands and queries, the database should be accessible directly from application programs through function calls (subroutine calls) in conventional programming languages.
File structures: Every DBMS has its own internal structures used to organize the data although some common data models are used by most DBMSs.
DBMS Advantages
There are three main features of a database management system that make it attractive to use a DBMS in preference to other systems. These features are:
· Centralized data management,
· Data independence
· Systems integration.
In a database system, the data is managed by the DBMS and all access to the data is through the DBMS providing a key to effective data processing. This contrasts with conventional data processing systems where each application program has direct access to the data it reads or manipulates.
In the conventional data processing application programs, the programs usually are based on a considerable knowledge of data structure and format. In such environment any change of data structure or format would require appropriate changes to the application programs. If major changes were to be made to the data, the application programs may need to be rewritten.
In a database system, the database management system provides the interface between the application programs and the data. When changes are made to the data representation, the metadata maintained by the DBMS is changed but the DBMS continues to provide data to application programs in the previously used way. The DBMS handles the task of transformation of data wherever necessary.
This independence between the programs and the data is called data independence. Data independence is important because every time some change needs to be made to the data structure, the programs that were being used before the change would continue to work. To provide a high degree of data independence, a DBMS must include a sophisticated metadata management system.
In DBMS, all files are integrated into one system thus reducing redundancies and making data management more efficient. In addition, DBMS provides centralized control of the operational data. Some of the advantages of data independence, integration and centralized control are:
- Redundancies and inconsistencies can be reduced
- Better service to the Users
- Flexibility of the system is improved
- Cost of developing and maintaining systems is lower
- Standards can be enforced
- Security can be improved
- Integrity can be improved
- Enterprise requirements can be identified
- Data model must be developed