A database is a powerful tool for managing data of any kind. It makes it possible for users to store their data in a logical, organized manner. Data can be easily retrieved, and queries can be used to examine database contents in order to extract valuable information. Relationships between various pieces of data can also be defined.

Basic Structure of a Database

Spreadsheets are closely related to databases, in that they follow the same basic premise of organizing data into rows and columns. Think of a database as a collection of spreadsheets, except each spreadsheet is called a “table”.

High level diagram of a database as a collection of tables.

Each table contains the rows and columns of data. Each row is one record, and the columns are the individual fields. If we were going to design an address book, we might setup a table called “addresses” in which each row would correspond to one address and contain columns such as “City”, “State” and “Zip Code”.

Diagram of a database table, showing rows and columns.

At this point you might be thinking “I can do all this with spreadsheets! What’s the difference between a database of tables and a workbook full of spreadsheets?” Excellent question. And if this were the extent of a database’s capabilities, spreadsheets be a fine substitute. But databases are much more powerful — they can store relationships between data, respond to complex queries, and ensure the integrity of their contents.

Relational Databases

Database tables often correspond to objects in the real world… people, addresses, products, etc. A relational database is able to define relationships between tables, just like things in the real world relate to each other.

A classic example used to illustrate this concept is the hierarchy of an organization. Companies have departments and employees, and employees have roles. The following diagram represents a database that not only stores a company’s employees and departments, but knows how they relate to each other.

A relational database design for a company

Those ID columns are a basic tenet of database design — assigning each record a unique identifier by which it can be located. The “Employee” and “Department” tables each have a primary key column (its name ending with “ID”) and no two records in each of those tables will share the same value in that column.

Next, observe how the the “Employee Position” table contains a position column alongside both an employee and department ID. Here, the ID columns are used as “foreign keys”. A record may contain the following data:

Employee ID: 5
Department ID: 16
Position: Team Leader

This means the employee with ID # 5 holds the position “Team Leader” in the department with ID # 16.

Normalization

Database normalization is the practice of not repeating yourself. In most cases, a database administrator will design a table structure that avoids duplicating data in order to make the system efficient. Storage space is not consumed in excess, and data can be accessed quickly.

The company database shown above is normalized. Employee and department data exists only within its respective tables. Instead of repeating all that data in the “Employee Position” table — names, office locations, salaries — we simply use foreign key columns to reference the other tables’ records.

There are many levels of database normalization because as a system becomes more complex, it may not always be possible to have a perfectly normalized database. Furthermore, an extremely large and complex system might actually benefit from denormalization. That, however, is beyond the scope of this article.

Queries

Data goes in, data has got to come out, right? Of course. The raw data contained within such a system has little if any inherent meaning, but with queries we’re able to extract the data and transform it into information. From information, we can draw conclusions and make decisions.

We use SQL or “Server Query Language” to query the database. Starting with the basics, here’s how we would get a list of all employees in the company:

SELECT * FROM Employee

Output:

FirstName LastName Hired Salary
Jane Doe 2002-07-12 120000.00
John Doe 2005-05-14 100000.00
Joe Schmoe 2011-09-30 40000.00

Want to know what each employee does for the organization? We can use the foreign keys join data from other tables onto the EmployeePosition table.

SELECT
Employee.FirstName, Employee.LastName, EmployeePosition.position, Department.Name AS DepartmentName
FROM EmployeePosition
INNER JOIN Employee ON Employee.EmployeeID = EmployeePosition.EmployeeID
INNER JOIN Department ON Department.DepartmentID = EmployeePosition.DepartmentID

Output:

FirstName LastName Position DepartmentName
Jane Doe Director Sales
John Doe Mobile Strategist Marketing
Joe Schmoe Intern Marketing

Perhaps HR asked for a tally of each department’s employees. We can do that by grouping data together and using the COUNT() function.

SELECT Department.Name, COUNT(*) As NumEmployees
FROM EmployeePosition
INNER JOIN Department ON Department.DepartmentID = EmployeePosition.DepartmentID
GROUP BY Department.DepartmentID

Output:

Name NumEmployees
Sales 1
Marketing 2

This is of course just a tip of the iceberg, meant to illustrate the power of queries. If you are interested in knowing more, I would suggest a book like “Learning SQL” by Alan Beaulieu for O’Reilly.

Transactions

Imagine a complex financial system in which we must maintain data integrity at all times. A misplaced or incomplete record could mean serious problems in the real world. Transactions enable us to take a multi-step process such as transferring money and either ensure the entire operation completed successfully, or completely roll back all its changes upon failure.

Performing an electronic funds transfer might involve the following steps:

  1. Deduct the amount from the payer’s account.
  2. Deposit the amount in the payee’s account.
  3. Record the transaction in the payer’s history.
  4. Record the transaction in the payee’s history.

If one of these steps were to fail for some reason — a disk drive goes bad, a server is unresponsive, whatever — the parties accounts may no longer be balanced. If we deducted the amount from the payer’s account but failed to deposit into the payee’s, do we just let those funds vanish?

If we performed these queries independent of each other the system might not know how to recover from an error. But if we roll everything up into a transaction, we can take an “all or nothing” approach and roll back the steps if a failure is detected at any point.

Conclusion

A database utilizes rows and columns much like a spreadsheet, but adds powerful capabilities that make it possible to manage large amounts of crucial data. Queries and relationships permit us to mix and match data into all sorts of useful forms. Normalization keeps the system running efficiently, and transactions allow us to prevent errors from disrupting the flow of business.

Many of us may not realize we use databases on a daily basis. You may not be designing table layouts or issuing queries to a server, but you may use software that acts as a front-end to a database. Address books, CRM software, administrative tools and most web applications are all backed by database software.