A database is an organized collection of data, generally stored and accessed electronically from a computer system. Where databases are more complex they are often developed using formal design and modeling techniques. Here is a list of coding interview questions on Databases to help you get ready for your next data structures interview in 2021.
- 1.
What is Normalization?
Answer:
Normalization is the systematic process of reducing data redundancy, eliminating data anomalies, and enhancing data integrity in a database. The goal is to minimize data duplication and ensure that every piece of data is stored in the most efficient and logical manner.
Why is Normalization Important?
Normalization is crucial for several reasons:
- Data Integrity: By eliminating redundant data, the risk of data becoming inconsistent is greatly reduced.
- Efficiency: Normalized databases are more space-efficient and require fewer resources to maintain.
- Flexibility: A normalized database is more flexible and adaptable to future changes in business requirements.
- Quality Assurance: Data anomalies and inconsistencies can lead to incorrect analytical results or business decisions. Normalization helps in avoiding such issues.
Normalization Levels
There are different levels of normalization that a database can achieve, known as normal forms. The most commonly discussed normal forms are:
- First Normal Form (1NF): Ensures atomicity, i.e., each data field contains only indivisible values.
- Second Normal Form (2NF): Removes partial dependencies, where a non-key attribute is functionally dependent on only part of the primary key.
- Third Normal Form (3NF): Eliminates transitive dependencies, where a non-key attribute depends on another non-key attribute.
- Boyce-Codd Normal Form (BCNF): A stricter version of 3NF designed to handle a special type of anomaly called a functional dependency.
- Fourth Normal Form (4NF): Deals with multi-valued dependencies.
- Fifth Normal Form (5NF): Addresses join dependencies, where a database table can be further decomposed if it contains sets of attributes that can be logically grouped.
Demonstration
Let's take a simple database table with a few records to demonstrate normalization:
Unnormalized Form (UNF)
Order ID Customer Name Product Name Category 1 John Doe iPhone 12 Electronics 2 Jane Smith Apple Watch Electronics 3 Sarah Thompson Nike Shoes Apparel 4 John Doe Nike T-shirt Apparel In this UNF table, both
Customer Name
andProduct Name
are duplicated.First Normal Form (1NF)
The table is already in 1NF because it has a primary key, and each field contains atomic values.
Second Normal Form (2NF)
To achieve 2NF, we need to remove the partial dependencies. In this case, the
Category
is partially dependent on theProduct Name
, which is a part of the primary key. We can break it into a separate table:Table 1: Orders
Order ID Customer Name Product ID 1 John Doe 1 2 Jane Smith 2 3 Sarah Thompson 3 4 John Doe 4 Table 2: Products
Product ID Product Name Category 1 iPhone 12 Electronics 2 Apple Watch Electronics 3 Nike Shoes Apparel 4 Nike T-shirt Apparel Third Normal Form (3NF)
In 3NF, we identify and remove transitive dependencies. In our example, there is no such dependency.
Boyce-Codd Normal Form (BCNF)
This form deals with functional dependencies and is stricter than 3NF. In our example, the table is already in BCNF.
Final Thoughts
Normalization is not a one-time process, and databases should be periodically checked for normalization. Moreover, depending on the context, going beyond 3NF may not always be necessary and could even lead to performance issues.
- 2.
What are the advantages of NoSQL over traditional RDBMS?
Answer:
The decision to use either NoSQL or RDBMS for a particular application depends on a multitude of factors. Here are some of the key advantages that NoSQL databases have over traditional RDBMS systems while also discussing the areas where RDBMS might still be better suited.
Advantages of NoSQL
Flexible Schema
One of the main advantages of NoSQL is its ability to handle unstructured and semi-structured data. Unlike RDBMS, NoSQL databases do not require a predefined schema, offering more flexibility for data storage and retrieval.
High Performance and Scalability
NoSQL databases often excel in terms of read/write and horizontal scalability. They can easily handle large volumes of data, making them a good fit for Big Data and high-throughput applications.
Variety of Data Models
NoSQL databases come in different flavors, each catering to specific data storage and retrieval needs. The four main types are:
- Key-Value Stores: Provide a lookup service based on keys.
- Document Stores: Store, query, and retrieve document-oriented information.
- Column-Family Stores: Optimize for large datasets with transactional integrity.
- Graph Databases: Designed to manage and query connections between entities.
Cost-Effectiveness
NoSQL databases can be run on commodity hardware, which is often more affordable compared to specialized storage solutions required for RDBMS.
Quick Iteration and Development
The lack of a rigid schema and other features designed for flexibility often makes NoSQL databases easier to work with, leading to faster development cycles.
Example: Document Database with Python
Here's a quick example using MongoDB, a popular NoSQL document database, with Python:
from pymongo import MongoClient # Connect to MongoDB client = MongoClient("mongodb://localhost:27017/") db = client["mydatabase"] # Insert a document mydoc = {"name": "John", "age": 30} db["customers"].insert_one(mydoc) # Query documents for x in db["customers"].find(): print(x)
Advantages of RDBMS
While NoSQL solutions offer many benefits, there are areas where RDBMS systems still hold the advantage.
Data Integrity and ACID Compliance
RDBMS are known for their strong data integrity through features like ACID (Atomicity, Consistency, Isolation, Durability) properties. This makes RDBMS a natural choice for applications that deal with financial transactions and other sensitive types of data.
Complex Query Support
If your application relies heavily on complex queries that require joins and group by operations, RDBMS might be a better fit.
Mature Ecosystem and Tooling
RDBMS solutions have a long-established ecosystem, including query optimization tools, support for business intelligence (BI), and analytics, making them a reliable choice for many types of applications.
Example: Relational Database with Python
Here's a simple example using SQLite, a lightweight RDBMS, with Python:
import sqlite3 # Connect to SQLite database conn = sqlite3.connect('example.db') c = conn.cursor() # Create a table c.execute('''CREATE TABLE stocks (date text, trans text, symbol text, qty real, price real)''') # Insert a row of data c.execute("INSERT INTO stocks VALUES ('2022-01-01','BUY','RHAT',100,35.14)") # Query the table for row in c.execute('SELECT * FROM stocks ORDER BY price'): print(row) # Save the changes conn.commit() # Close the connection conn.close()
Conclusion
While NoSQL databases offer flexibility and scalability, they might not be suitable for use cases that require ACID compliance, strong schema, and complex relational queries. The choice between NoSQL and RDBMS ultimately depends on your application's specific requirements.
- 3.
What is the difference between Data Definition Language (DDL) and Data Manipulation Language (DML)?
Answer:
In the world of databases, two SQL command categories are particularly important to grasp: Data Definition Language (DDL) and Data Manipulation Language (DML).
Core Concepts
Data Definition Language (DDL)
- Objective: Defines the structure of database schemas and objects.
- Key Commands:
CREATE
,ALTER
,DROP
,TRUNCATE
,RENAME
.
Data Manipulation Language (DML)
- Objective: Manipulates the data within the database objects.
- Key Commands:
SELECT
,INSERT
,UPDATE
,DELETE
,MERGE
.
In-Depth Explanation
Data Definition Language (DDL)
DDL is used to define, modify, and delete structures in the database. These structures include tables, views, indexes, and more. DDL statements generally auto-commit, meaning the changes are effective immediately and cannot be rolled back.
The commonly used DDL commands are:
-
CREATE: Creates new database objects like tables, views, and indexes.
CREATE TABLE Customers (ID int, Name varchar(255));
-
ALTER: Modifies existing database objects, like tables or columns.
ALTER TABLE Customers ADD Age int;
-
DROP: Removes database objects.
DROP TABLE Customers;
-
TRUNCATE: Removes all rows from a table, but keeps the table structure.
TRUNCATE TABLE Customers;
-
RENAME: Renames a table or column.
ALTER TABLE Customers RENAME TO Clients;
Data Manipulation Language (DML)
DML is used to retrieve, manipulate, and delete data in existing database objects. DML commands generally do not auto-commit, meaning you can use transaction management to roll back the changes.
The commonly used DML commands are:
-
SELECT: Retrieves data from one or more tables.
SELECT * FROM Customers;
-
INSERT: Adds new rows of data into a table.
INSERT INTO Customers (ID, Name) VALUES (1, 'Alice');
-
UPDATE: Modifies existing data in a table.
UPDATE Customers SET Name = 'Robert' WHERE ID = 1;
-
DELETE: Removes rows from a table.
DELETE FROM Customers WHERE ID = 1;
-
MERGE: Performs an upsert operation (combines INSERT and UPDATE).
MERGE INTO Customers AS TARGET USING NewCustomers AS SOURCE ON TARGET.ID = SOURCE.ID WHEN MATCHED THEN UPDATE SET TARGET.Name = SOURCE.Name WHEN NOT MATCHED THEN INSERT (ID, Name) VALUES (SOURCE.ID, SOURCE.Name);
Real-World Examples
To better understand the practical applications of DDL and DML, consider the following scenarios:
-
Setting Up a New Table (DDL):
- DDL Command:
CREATE TABLE Employees (ID int, Name varchar(255));
- Objective: Defines a new table
Employees
with columnsID
andName
.
- DDL Command:
-
Adding a New Employee Record (DML):
- DML Command:
INSERT INTO Employees (ID, Name) VALUES (1, 'John');
- Objective: Inserts a new row with
ID=1
andName='John'
into theEmployees
table.
- DML Command:
-
Changing the Employee Name (DML):
- DML Command:
UPDATE Employees SET Name = 'Jenny' WHERE ID = 1;
- Objective: Updates the
Name
column to'Jenny'
whereID=1
in theEmployees
table.
- DML Command:
-
Dropping the Employees Table (DDL):
- DDL Command:
DROP TABLE Employees;
- Objective: Deletes the entire
Employees
table and its schema.
- DDL Command: