At its core, you can imagine a database as a bunch of spreadsheets. Except they’re called “tables”, and they can be huge, containing tons and tons of rows. More than any spreadsheet ever could handle.

Each table has a bunch of columns, and can have an arbitrary number of rows (also referred to as “records”). Columns (also referred to as “fields”) have a name and a type. Their type specifies which kind of data can be stored. Each row has a number of cells , and each one of the cells can hold some value with the type defined by the column.

Let’s have a look at an example.

Imagine we have a table called members, and it contains our member data. So it could look like this:

id name joined_on
1 Anja 2013-06-24
2 Carla 2013-06-24
3 Rebecca 2013-06-31

The id column would be a running number, so it would have the type integer. The name column is a string (databases call it this type a varchar), and the joined_on column is a date.

For a table like this, the columns (with their name and type) are referred to as the database structure, whereas the rows represent the data that we store. Rows change all the time: new members sign up, existing members change their details, or remove their profile etc. The structure remains the same, unless we, as developers, need another column, or table in order to store more data.

Databases usually contain many tables. And often data from one table relates to data in other tables.

For example, we could add the ability to post status updates to our members application. Maybe we would have a table statuses:

id member_id message
1 1 Finished the search feature for
2 1 Working on the CSS cleanup with Maren next
3 3 Created some new designs for our stickers!

The id column, again, contains a running number that allows to identify a single status update. The column message obviously contains the status update message.

What about the member_id column though?

It references a row in a different table: our members table. This means that, in this example, Anja has posted two status updates, Rebecca one, and Carla hasn’t posted yet.

If you look at the two columns id and member_id you notice that the column id is special: It must never contain duplicate values, because we want to use the id to identify a certain message (or member). This is called a primary key, and the column is called a “unique” one. Also, it usually auto-increments the id for us: Whenever we store a new row to this table then the database will assign an id, make sure we get the next number, and never get duplicate values.

These “features”, or special properties of the id column also are considered part of the structure, alongside with the column name and type: we define these things when we create or modify the database structure.

The column member_id on the other hand should not be unique: We want to be able to store many messages that all belong to the same member row, in the members table. Therefor we need to be able to have multiple rows with the same member_id in the messages table.

Does that make sense? This is how we can store data in a database, give it a certain structure, and relate a piece of data (a row) in one table to data (rows) in other tables.

Now, how can we talk to a database like this? How can we actually add data to a table, or retrieve it?