The objective of designing an efficient database is to jettison data redundancy. One data needs not to be stored in more than one place to save storage space and reduce data inconsistency incurred when dealing with duplicate data. A well-designed database will ensure the accuracy and integrity of the data being handled.
Database design process
When designing a database, you have to note that it is more of an art than science since you will be tasked with making a lot of decisions. You can customize a database to suit a given application’s requirement. There is not even a single case whereby you find two like applications, and thus two databases can never be the same. When making design decisions, it is essential to know guidelines on things you should not do and that the choices should be your own – the database designer.
Step One – Defining the database purpose.
Before you begin designing a database, it is vital to collect the requirements and outline your database’s object. For instance, draft out samples of input forms, reports, and queries as this would help know what you want to achieve at the end of your design.
Step Two – Collect Data and organize them in tables specifying primary keys.
After deciding to design a database, gather information and data to be kept in the database. Data can be divided into tables based on their subjects. Pick columns as the primary keys to uniquely identify the subject of every row.
The primary key
Duplicate rows cannot exist in a relational model table since it would be ambiguous to retrieve data. Ensure that every table has columns known as primary keys to identify every table record uniquely. For instance, ISBN for the table of books; productCode for products; customerID as customers’ table primary key. The primary key can either be referred to as a composite key for several columns, simple key for a single column.
After deciding on what column(s) are to be used as primary keys, ensure that the primary key has unique values and not NULL. You cannot have CustomerName as a primary key since more than one customer has the same name. Always remember to keep it simple, and the value of the primary key is constant every time.
Step Three – Creating table relationships.
Spreadsheets can be best if you want to create a table with independent data. However, a relational database is more concerned with the relationships between tables. The most common relationship types in a relational database are one-to-one, one-to-many, and many-to-many—for example, One-to-one – a product having supplementary information like comment, more description, and image. One-to-many – a manager manages zero or many employees; many-to-many – customers order one or more products.
Step Four – Refining and normalizing the design
Consider adding more columns in the table or creating a new one for optional data applying the relational relationship like one-to-one. A large table can be split into two similar tables. Next, apply normalization rules to ensure that the design is optimal and correct.
1NF – First Normal Form – the atomic properties prohibits column group repetition like itema, itemb,… itemN. The table is considered 1NF if each cell has a single value and not lists. It would be best if you considered creating a new one-to-many relationship table to accommodate these.
2NF – Second Normal Form – the table can only be 2NF if it is 1NF and that each non-key column is dependent on the primary key. After all, suppose the primary key consist of numerous columns; each non-key column should depend on the whole set and not partly.
3NF – Third Normal Form – A table is considered a 3NF; suppose it is 2NF and that all columns that are non-keys are entirely independent of one another. This is to mean, every non-key column must depend only on the primary key. Of course, the 3NF may have its inadequacies and result in higher standard forms like Codd/Boyce Normal form, 4NF, and 5NF.
Rules of integrity
It is essential to consider some integrity rules to ensure data integrity. The entity integrity rule states that there cannot be a NULL primary key; it cannot be uniquely identified as a row. The referential integrity rule emphasizes that every foreign key value must match an immediate key value in the reference table. The business integrity rule applies to the logic of business. For instance, a 5-digit zip code within a given range, date of delivery, and time have to fall within business hours; the quantity of a customer has to be less or equal to that in stock.