Converting static pages to dynamic pages using PHP and MySQL, Part 2

The challenge:
Take Adam Walker’s Carrajina dictionary and stick it in a database so that Adam can add an entry whenever he wants without re-writing an html page and possibly messing up his html. One stricture is that Adam does not want to learn too much new stuff – no SQL, no PHP, just HTML. And if he can enter SAMPA and have it turn into IPA, that’s a plus!

Step 1 (data prep) is here

Step 2 involves importing data into a database. I  will skip blogging Step 2 for now. Putting data into a database is more complicated than taking it back out again. But I will blog it eventually. Promise.

Step 3: Getting the data back out of the database to generate the dictionary listings.

So, the first thing we need to do is connect to the database. I mentioned in a previous entry that PHP’s mysqli extension is the preferred extension to use with a MySQL database. Here’s how that works:

$host = "my_host"; //"localhost" or "http://mysql.host.com"...
$user = "my_username"; //an authorized user of MySQL
$password = "my_password"; //my_username's password
$database = "my_database"; //the database we want to use.

$mysqli = new mysqli($host, $user, $password, $database);

if ($mysqli->connect_error) {
    die('Connect Error ('. $mysqli->connect_errno .') '
            . $mysqli->connect_error);
}

After connecting to the database, the next step is writing a select query to find the appropriate data. Querying the database with mysqli uses the following steps, initialize the statement, prepare the (parameterized) statement, bind the variables to the parameterized statement, and execute the query. (More on this in a future post.) One of the nice things about using PHP (or any programming language, really) to make dynamic pages is that you do not have to write a complicated SQL query to get everything from the database at once. Instead, you can write a sequence of queries and use PHP to do things with the results.

Adam currently has his dictionary set up so that each letter is on its own page. To duplicate this, I’ve used the following queries:

public function getList($mysqli, $letter) {
   $like = $letter . '%';  //% is the SQL wildcard character

   $query = "SELECT english_entryID FROM english_entry
   WHERE head_entryID IS NULL  /* so we don't get subentries */
   AND english_term LIKE ? ORDER BY english_term ASC";

   $stmt = $mysqli->stmt_init();
   if ($stmt = $mysqli->prepare($query)) {
      $stmt->bind_param("s", $like);
      $stmt->execute();
      $stmt->bind_result($entryID);
      while ($stmt->fetch()) {
         $listIDs[] = $entryID;
      }
      $stmt->close();        
   }
   return $listIDs;
}

What this does, when called, is execute an SQL statement that looks in the table english_entry and finds all the english_entryIDs in the table that match the following conditions: head_entryID does not exist (and therefore this is not a subentry) and english_term starts with the passed $letter. The ‘?’ in the query is replaced by the variable $like, which is $letter followed by the ‘%’ sign to match anything. So, if $letter = ‘a’, then the SQL statement becomes:

SELECT english_entryID FROM english_entry WHERE head_entryID IS NULL
AND english_term LIKE 'a%' ORDER BY english_term ASC;

This SQL statement returns a list of english_entryIDs

english_entryID
1
2
3

which is then put into a PHP array array( [0] => 1, [1] =>2, [2] =>3, ...); for ease of use.

Now that we have the array of english_entryIDs that start with ‘a’ and are not subentries, we can use more code to get each actual entry. For this project, I have put all the code that gets an English language entry into a class called EnglishListing. The following code is then used on the dictionary pages to generate a listing.

//make sure $letter is lower case
$letter = strtolower($letter);

$eng = new EnglishListing();

$list = $eng->getList($mysqli, $letter);
foreach ($list as $entryID) {
    echo $eng->getEntry($mysqli, $entryID);
}

So after calling getList(), for each english_entryID returned by the getList function, we call getEntry:

public function getEntry($mysqli, $entryID) {
    //an entry has a term, a pronunciation,
    //one or more defs (num, pos, deftext),
    //and 0 or more subentries (just like an entry).
    $termrow = $this->getTerm($mysqli, $entryID);
    $defrow = $this->getDefs($mysqli, $entryID);
    $subrow = $this->getSubentry($mysqli, $entryID);

    return "<dt>$termrow</dt><dd>$defrow $subrow</dd>\n";
}//end function

OK. That code breaks up the process of getting an entry into 3 steps: getTerm(), getDefs(), and getSubentry().

getTerm() executes an SQL statement that returns english_term and english_pronunciation. It then wraps each of these in span tags and returns $termrow:

private function getTerm($mysqli, $entryID) {
   $termrow = "";
   $query = "SELECT english_term, english_pronunciation
              FROM english_entry WHERE english_entryID =?";

   $stmt = $mysqli->stmt_init();
   if ($stmt = $mysqli->prepare($query)) {
      $stmt->bind_param("i", $entryID);
      $stmt->execute();
      $stmt->bind_result($term, $pron);
      while ($stmt->fetch()) {
         $termrow .= "<span class='term'>$term</span>
                      <span class='pron'>$pron</span>";
      }
      $stmt->close();   
   }
   return $termrow;
}//end function

getDefs() executes an SQL statement that returns definition_number, part_of_speech, definition for each definition matched by the given english_entryID. Where only 1 definition exists, $defrow contains only the part_of_speech and definition values. Otherwise, it contains all the values. ‘.=’ in the PHP code concatenates the value to the existing variable. An entry with two definitions, then would have $defrow equal to “$definition_number. $part_of_speech $def. $definition_number. $part_of_speech $def. “.

private function getDefs($mysqli, $entryID) {
   $defrow = "";

   $query = "SELECT definition_number, part_of_speech, definition
      FROM english_definition
      WHERE english_entryID = ?
      ORDER BY definition_number ASC";

   if ($stmt = $mysqli->prepare($query)) {
      $stmt->bind_param("i", $entryID);
      $stmt->execute();
      $stmt->bind_result($defno, $pos, $def);
      $stmt->store_result(); //so I can use $stmt->num_rows
      while ($stmt->fetch()) {
         $defno = "<span class='defno'>$defno</span>";
         $pos = "<span class='pos'>$pos</span> ";
         $def = "<span class='deftext'>$def</span> ";
         if (($stmt->num_rows) == 1) {
            $defrow .= "$pos $def ";
         } else {
            $defrow .= "$defno. $pos $def ";
         }
      }//end while
      $stmt->close();        
   }//end if 

   return $defrow;
}//end function

getSubentry() starts by calling another function checkSubentry() which checks to see if the entry has any subentries. If there is a subentry, then checkSubentry() returns the english_entryIDs for the subentries. Otherwise it returns an empty array. getSubentry() then checks the length (number of values) of that array (must be greater than or equal to 1), and for each value in the array it calls getTerm() and getDefs().

private function getSubentry($mysqli, $entryID) {
   $subrow = "";

   $headentryIDs = $this->checkSubentry($mysqli, $entryID);

   if (count($headentryIDs) >= 1) {
      foreach ($headentryIDs as $entryID) {
         $subrow .= "<br />&#x2022; "
            . $this->getTerm($mysqli, $entryID);
         $subrow .= $this->getDefs($mysqli, $entryID);
      }
   } 

   return $subrow;
}//end function

private function checkSubentry($mysqli, $entryID) {
   $headentryIDs=array();

   $query = "SELECT english_entryID FROM english_entry
             WHERE head_entryID =?"; 

   $stmt = $mysqli->stmt_init();
   if ($stmt = $mysqli->prepare($query)) {
      $stmt->bind_param("i", $entryID);
      $stmt->execute();
      $stmt->bind_result($eentryID);
      while ($stmt->fetch()) {
         $headentryIDs[] = $eentryID;
       }
      $stmt->close();
   }
   return $headentryIDs;
}

That is the code for the EnglishListing class. The Carrajina side looks similar. There were some challenges in that ‘ch’ and ‘dj’ could not be listed under ‘c’ and ‘d’. I’ll talk about that later.

<?php

class CarrajinaListing {

public function getList($mysqli, $letter) {
   //extra letters ch, dj, ñ, rr
   $like = $letter . '%';  //% is the SQL wildcard character
   $like2 = ucfirst($letter) . '%'; 

   //BINARY means that MySQL doesn't ignore accented characters,
   //n-tilde, and the like. it also doesn't ignore capitalization
   //without the BINARY keyword, accents and capitalization are
   //ignored.
   $query = "SELECT carrajina_entryID FROM carrajina_entry ";

   switch ($letter) {
      case 'c':
         $query .= "WHERE (carrajina_term LIKE BINARY ?
                    OR carrajina_term LIKE BINARY ?) AND
                    (carrajina_term NOT LIKE 'ch%') ";
      break;
      case 'd':
         $query .= "WHERE (carrajina_term LIKE BINARY ?
                    OR carrajina_term LIKE BINARY ?) AND
                    (carrajina_term NOT LIKE 'dj%' ) ";
      break;
      default:
         $query .= "WHERE carrajina_term LIKE BINARY ?
                    OR carrajina_term LIKE BINARY ? ";
      break;
   }

   $query .= " ORDER BY carrajina_term ASC";
   $stmt = $mysqli->stmt_init();
   if ($stmt = $mysqli->prepare($query)) {
      $stmt->bind_param("s", $like);
      $stmt->execute();
      $stmt->bind_result($entryID);
      while ($stmt->fetch()) {
         $listIDs[] = $entryID;
      }
      $stmt->close();        
   }
   return $listIDs;
}//end function

private function getEntry($mysqli, $entryID) {
   //an entry has a term, a pronunciation,
   //one or more defs (num, pos, deftext),
   //one or more idioms (idiom, idiomdef),
   //an etymology, and sometimes a note
   $termrow = $this->getTerm($mysqli, $entryID);
   $defrow = $this->getDefs($mysqli, $entryID);
   $idioms = $this->getIdioms($mysqli, $entryID);
   $etymology = $this->getEtymology($mysqli, $entryID);
   $note = $this->getNote($mysqli, $entryID);

   return "<dt>$termrow</dt>
           <dd>$defrow $etymology $idioms $note</dd>\n";
}//end function

private function getTerm($mysqli, $entryID) {
   $query = "SELECT carrajina_term, carrajina_pronunciation
             FROM carrajina_entry WHERE carrajina_entryID = ?";
   $stmt = $mysqli->stmt_init();
   if ($stmt = $mysqli->prepare($query)) {
      $stmt->bind_param("i", $entryID);
      $stmt->execute();
      $stmt->bind_result($term, $pron);
      while ($stmt->fetch()) {
           $termrow .= "<span class='term'>$term</span>
                        <span class='pron'>$pron</span>";
      }
      $stmt->close();        
   }//end if 

   return $termrow;
}//end function

private function getDefs($mysqli, $entryID) {
   $defrow = "";

   $query = "SELECT definition_number, part_of_speech, definition
             FROM carrajina_definition WHERE carrajina_entryID = ?
             ORDER BY definition_number ASC";
   $stmt = $mysqli->stmt_init();
   if ($stmt = $mysqli->prepare($query)) {
      $stmt->bind_param("i", $entryID);
      $stmt->execute();
      $stmt->bind_result($defno, $pos, $def);
      $stmt->store_result(); //so I can use $stmt->num_rows
      while ($stmt->fetch()) {
         $defno = "<span class='defno'>$defno</span>";
         $pos = "<span class='pos'>$pos</span> ";
         $def = "<span class='deftext'>$def</span> ";
         if (($stmt->num_rows) == 1) {
            $defrow .= "$pos $def ";
         } else {
            $defrow .= "$defno. $pos $def ";
         }
      }//end while
      $stmt->close();        
   }//end if 

   return $defrow;
}//end function

private function getIdioms($mysqli, $entryID) {
   $idioms = "";

   $query = "SELECT carrajina_idiom, carrajina_idiom_definition
             FROM carrajina_idiom WHERE carrajina_entryID = ?
             ORDER BY carrajina_idiom ASC";
   $stmt = $mysqli->stmt_init();
   if ($stmt = $mysqli->prepare($query)) {
      $stmt->bind_param("i", $entryID);
      $stmt->execute();
      $stmt->bind_result($idiom, $idiomdef);
      while ($stmt->fetch()) {
         $idioms .= "<span class='idiom term'>&#x2022; $idiom</span>
                     <span class='idiomdef'>$idiomdef</span> ";
      }//end while
      $stmt->close();        
   }//end if 

   return $idioms;
}//end function

private function getEtymology($mysqli, $entryID) {
   $etymology = "";

   $query = "SELECT etymology FROM carrajina_entry
             WHERE carrajina_entryID =?";

   $stmt = $mysqli->stmt_init();
   if ($stmt = $mysqli->prepare($query)) {
      $stmt->bind_param("i", $entryID);
      $stmt->execute();
      $stmt->bind_result($etym);
      while ($stmt->fetch()) {
         $etymology .= "<span class='etymology'>$etym</span>";
      }//end while
      $stmt->close();        
   }//end if 

   return $etymology;
}//end function

private function getNote($mysqli, $entryID) {
   $note = "";

   $query = "SELECT note FROM carrajina_note
             WHERE carrajina_entryID =?";

   $stmt = $mysqli->stmt_init();
   if ($stmt = $mysqli->prepare($query)) {
      $stmt->bind_param("i", $entryID);
      $stmt->execute();
      $stmt->bind_result($cnote);
      while ($stmt->fetch()) {
         $note .= "<span class='note'>$cnote</span> ";
      }//end while
      $stmt->close();        
   }//end if
   return $note;
}//end function

}//end class
?>

I know there are things in here that I didn’t explain. Consider this your excuse to take me to task and ask questions in the comments.

Edited 9-20-2011 for aesthetic reasons.

Using mysqli and parameterized statements

mysqli is the PHP extension that is recommended for use with current versions (4.1.3 and later) of MySQL. The way to connect to a MySQL database using mysqli is:

$host = "my_host"; //"localhost" or "http://mysql.host.com"
$user = "my_username"; //an authorized user of the MySQL database
$password = "my_password"; //my_username's password
$database = "my_database"; //the database we want to use.

$mysqli = new mysqli($host, $user, $password, $database);

if ($mysqli->connect_error) {
    die('Connect Error (' . $mysqli->connect_errno . ') '
            . $mysqli->connect_error);
}

Keep note of that $mysqli variable. It gets used a lot.

Querying the database can be as simple as:

$mysqli->query("some valid SQL statement");

This will return TRUE if the SQL statement was successful. So, if you are not expecting any data to be returned from the server, this bit of code will work fine:

if ($mysqli->query("some valid SQL statement") === TRUE) {
    //do stuff, or not, like print a success statement
}
$mysqli->close();

If you are using a SELECT statement to get data from the server, then you can do so:

if ($result = $mysqli->query("some valid SQL statement")) {
    //do stuff with the data
    while ($row = $result->fetch_row()) {
        //do stuff with the data in the $row array
    }
    $result->close();
}
$mysqli->close();

In addition to fetch_row(), there is also fetch_array() and fetch_assoc(). The full list of $result functions is here.

But the true wonderfulness of mysqli comes from the parameterized queries.

$stmt = $mysqli->stmt_init();
if ($stmt->prepare("some valid SQL query")) {
    $stmt->bind_params("s...", $vars);
    $stmt->execute();
    $stmt->bind_result($vars);
    while ($stmt->fetch()) {
        //do stuff with the data
    }
    $stmt->close();
}
$mysqli->close();

Yeah, right, you say. It looks complicated, but really, it isn’t. And some parts are optional.

I mentioned earlier that parameterized queries were one line of defense against sql injection attacks because they force the database to interpret anything within the bound variable as data and not as SQL instructions. So, here is how that works.

First, we initialize the statement with $stmt = $mysqli->stmt_init();. This step is optional, but it is good practice. By initializing, the next function called on $stmt has to be $stmt->prepare().

$stmt->prepare() takes a query (either directly or as a variable) and prepares it for the server. If the query requires data input, such as in a WHERE clause or in an INSERT statement, then one uses ‘?’ markers as placeholders for the data. Like so:

SELECT FirstName, LastName FROM Friends WHERE FavoriteColor = ?;

In PHP:

$stmt = $mysqli->stmt_init();
if ($stmt->prepare("SELECT FirstName, LastName
    FROM Friends WHERE FavoriteColor = ?")) {
    ...

So far, so good. Make sure not to include the normal final semi-colon in the SQL statement.

The next step is to bind the parameters, or tell the database what value the ? has. This is done using $stmt->bind_param(); This function takes a minimum of two arguments. The first is a string with information on the type of data. The second is the variable that holds the data. The information string can consist of the following characters only: b, d, i, and s. These stand for blob, double, integer, and string. Since FavoriteColor is a word, we use ‘s’ to identify the input as a string. The number and sequence of characters in the string must correspond to the number and sequence of ?s in the query. The number and sequence of the data- holding variables must correspond to the same number and sequence.

$color = "purple";
$stmt = $mysqli->stmt_init();
if ($stmt->prepare("SELECT FirstName, LastName
    FROM Friends WHERE FavoriteColor = ?")) {
       $stmt->bind_param("s", $color);
       ...

OK so far? Want another example? How about:

INSERT INTO Friends (FirstName, LastName, FavoriteColor)
     VALUES (?, ?, ?);
//the data
$firstname = "David";
$lastname = "Peterson";
$color = "orange";

$stmt = $mysqli->stmt_init();
if ($stmt->prepare("INSERT INTO Friends (FirstName, LastName,
    FavoriteColor) VALUES (?, ?, ?)")) {
       $stmt->bind_param("sss", $firstname, $lastname, $color);
       ...

See, three ?s, three characters in the first argument, and three variables with our data.

Then we execute the statement, using $stmt->execute(), which sends our query to the database.

$color = "purple";
$stmt = $mysqli->stmt_init();
if ($stmt->prepare("SELECT FirstName, LastName FROM Friends
    WHERE FavoriteColor = ?")) {
       $stmt->bind_param("s", $color);
       $stmt->execute();
       ...

Since this SELECT statement is returning two values (FirstName and LastName), we can also bind the results. This part is optional, though still a good idea.

$color = "purple";
$stmt = $mysqli->stmt_init();
if ($stmt->prepare("SELECT FirstName, LastName FROM Friends
    WHERE FavoriteColor = ?")) {
       $stmt->bind_param("s", $color);
       $stmt->execute();
       $stmt->bind_result($firstname, $lastname);
       ...

After binding the results, we can use $stmt->fetch() to get the data from the result set. All that’s left to do is close the statement with $stmt->close(), freeing up resources, and close the connection with $mysqli->close().

$color = "purple";
$stmt = $mysqli->stmt_init();
if ($stmt->prepare("SELECT FirstName, LastName FROM Friends
    WHERE FavoriteColor = ?")) {
       $stmt->bind_param("s", $color);
       $stmt->execute();
       $stmt->bind_result($firstname, $lastname);
       while ($stmt->fetch()) {
           //do stuff with the data
           echo "$firstname $lastname";
       }
       $stmt->close();
}
$mysqli->close();

The other example:

//the data
$firstname = "David";
$lastname = "Peterson";
$color = "orange";

$stmt = $mysqli->stmt_init();
if ($stmt->prepare("INSERT INTO Friends (FirstName, LastName,
    FavoriteColor) VALUES (?, ?, ?)")) {
       $stmt->bind_param("sss", $firstname, $lastname, $color);
       $stmt->execute();
       $stmt->close();
}
$mysqli->close();

INSERT does not return any results for binding. If you want information regarding the success of the query, you can use $mysqli->affected_rows to get back a count of affected rows. Even more useful is $mysqli->insert_id which returns the auto-incremented ID for the new row (if there is one, and if you call it right away before inserting or updating anything else.)

If you chose to skip binding the results, then you cannot use $stmt->fetch(). Instead, you must use  $stmt->get_result() and then $result->fetch_row() or an equivalent. Like so:

$color = "purple";
$stmt = $mysqli->stmt_init();
if ($stmt->prepare("SELECT FirstName, LastName FROM Friends
    WHERE FavoriteColor = ?")) {
       $stmt->bind_param("s", $color);
       $stmt->execute();
       $result = $stmt->get_result();
       while ($row = $result->fetch_row()) {
           //do stuff with the data in the $row array
       }
       $result->close();
}
$mysqli->close();

And that’s how you use mysqli and parameterized statements.

Oct 19, 2011: Updated to fix typos in the method name.
Apr 19, 2014: Updated to add missing commas in the code.

Do you speak SQL? (Part 1)

SQL stands for Structured Query Language and it is used to talk to relational databases. While there is a standard for SQL, many databases also have their own dialects, so to speak. This is a quick intro to basic SQL, nothing fancy. In the examples, I will use the following sample data:

Friends table:

FirstName LastName FavoriteColor
James Smith blue
Mary Johnson green
David Williams red
Linda Jones purple
Daniel Brown green
Jenny Davis blue
Nancy Smith orange

SELECT

This is the most common statement used, as it gets data from the database. The basic syntax of a SELECT statement is:

SELECT FirstName, LastName, FavoriteColor FROM Friends;

Multiple fields are generally separated by commas, as are multiple tables in more advanced SELECT statements. The SELECT statement may also have optional WHERE and ORDER BY clauses and other refinements.

One of those refinements is the asterisk (*) used in place of the fields:

SELECT * FROM Friends;

* stands for all fields in the table. This certainly saves time, but can cause issues if the fields returned are not the fields expected, say because someone has changed the structure of the table. I generally never use * in my applications for that reason.

Another refinement is the DISTINCT keyword:

SELECT DISTINCT FavoriteColor FROM Friends;

This will return:

FavoriteColor
blue
green
red
purple
orange

Without the DISTINCT keyword, we would get:

FavoriteColor
blue
green
red
purple
green
blue
orange

WHERE limits the data returned to that which matches the expression in the WHERE clause.

SELECT FirstName, LastName FROM Friends WHERE FavoriteColor = 'purple';

FirstName LastName
Linda Jones

‘purple’ needs to be quoted because it is a string rather than a number. Numbers are not quoted.

Multiple conditions in a WHERE clause can be joined by the AND keyword. There is also an OR keyword.

SELECT FavoriteColor FROM Friends WHERE FirstName = 'Jenny' AND LastName = 'Davis';

FavoriteColor
blue

LIKE looks for data that matches a certain condition. LIKE has two wildcard characters: ‘_’ which matches any single character in a string, and ‘%’ which matches any number of characters in a string.

SELECT FirstName, LastName FROM Friends WHERE FirstName LIKE 'J%';

FirstName LastName
James Smith
Jenny Davis

ORDER BY sorts the returned data. ASC specifies ascending order and DESC specifies descending order.

SELECT FirstName, LastName FROM Friends ORDER BY LastName ASC;

FirstName LastName
Jenny Davis
Daniel Brown
Mary Johnson
Linda Jones
James Smith
Nancy Smith
David Williams

INSERT, UPDATE, DELETE

These are used to add, change, and delete rows in tables.

An INSERT statement adds a row to a table.  The syntax for an INSERT statement is:

INSERT INTO Friends (FirstName, LastName, FavoriteColor) VALUES ('David', 'Peterson', 'orange');

An UPDATE statement changes an existing row in a table.

UPDATE Friends SET LastName = 'Anderson' WHERE FirstName = 'Nancy';

A DELETE statement deletes a row from a table.

DELETE FROM Friends WHERE FirstName = 'David' AND LastName = 'Williams';

After executing these INSERT, UPDATE, and DELETE statements, our data looks like so:

Friends table:

FirstName LastName FavoriteColor
James Smith blue
Mary Johnson green
Linda Jones purple
Daniel Brown green
Jenny Davis blue
Nancy Anderson orange
David Peterson orange

A short (very short) XML primer

So, XML. It’s actually very easy. It’s a lot like HTML, except you get to make up your own tags. The rules are:

  1. Any text ought to be surrounded by tags: <my_tag>text</my_tag>
  2. Do not nest tags. <my_tag><bold_text>text</my_tag></bold_text> is not allowed. Use <my_tag><bold_text>text</bold_text></my_tag> instead.
  3. There needs to be a tag that starts and ends the document. This is generally called the root node.

That’s it, mostly.* Follow those rules and you will have well-formed XML. Valid XML, however, requires a set of tag definitions to validate against. Those rules are written in a dtd file. Like so:

<!ELEMENT my_tag (tags and stuff my_tag can contain) >

A dtd needs an ELEMENT declaration for every tag in your XML file. Note that in my previous post, I showed two dtds for two XML files. A line like so:

<!ELEMENT english_term (#PCDATA)>

means that the tag <english_term>...</english_term> contains text but no other tags. A line like so:

<!ELEMENT english_entry (english_term, pronunciation, definitions+, subentry*)>

lists the tags that <english_entry>...</english_entry> contains. Note that some of these tags have characters after them: +, *, ?. Those have a special meaning:

  • + means that 1 or more tags of this type can exist within the defined tag
  • * means that zero or more tags of this type can exist within the defined tag
  • ? means that zero or 1 tag of this type can exist within the defined tag
  • no mark means that 1 tag of this type must exist within the defined tag

There. That is basic (very basic) XML. Easy!

*Then there are attributes and namespaces and other optional complications.

Converting static pages to dynamic pages using PHP and MySQL, Part 1

The challenge:
Take Adam Walker’s Carrajina dictionary (http://carrajina.conlang.org/dicthome.html) and stick it in a database so that Adam can add an entry whenever he wants without re-writing an html page and possibly messing up his html. One stricture is that Adam does not want to learn too much new stuff – no SQL, no PHP, just HTML. And if he can enter SAMPA and have it turn into IPA, that’s a plus!

Step 1: get the html source code for the current dictionary pages. Examine them carefully and figure out the structure. Outcome:

For the English to Carthaginian side:

Each entry consists of an English word, followed by a pronunciation. Then come definitions: a number if there’s more than one, a part of speech, and the definition text – a set of glosses in Carrajina sometimes with explanation. In addition, some entries have subentries, which have an English word or phrase, the definition number if there’s more than one, and the definition text.

I turned this into XML to make it easier to analyze. Turning it into a spreadsheet might also have worked. The dtd for this part of the dictionary is:

<!ELEMENT english-carthaginian (english_entry+)>
<!ELEMENT english_entry (english_term, pronunciation, definitions+, subentry*)>
<!ELEMENT english_term (#PCDATA)>
<!ELEMENT pronunciation (#PCDATA)>
<!ELEMENT definitions (definition_entry+)>
<!ELEMENT definition_entry (definition_number, part_of_speech, definition) >
<!ELEMENT definition_number (#PCDATA) >
<!ELEMENT part_of_speech (#PCDATA) >
<!ELEMENT definition (#PCDATA) >
<!ELEMENT subentry (english_term, pronunciation?, definition+)>

Creating the database model, I decided to put the definitions in their own table. Also, since the subentry is more or less identical to a head entry, I decided to put them in the same table. The diagram:

data model for english-carthaginian tables

For the Carthaginian to English side:

<!ELEMENT carthaginian-english (entry+)>
<!ELEMENT carrajina_entry (carrajina_term, pronunciation, definitions+, etymology?, note?, idioms*)>
<!ELEMENT carrajina_term (#PCDATA)>
<!ELEMENT pronunciation (#PCDATA)>
<!ELEMENT definitions (definition_entry+)>
<!ELEMENT definition_entry (definition_number, part_of_speech, definition) >
<!ELEMENT definition_number (#PCDATA) >
<!ELEMENT part_of_speech (#PCDATA) >
<!ELEMENT definition (#PCDATA) >
<!ELEMENT etymology (#PCDATA)>
<!ELEMENT note (#PCDATA) >
<!ELEMENT idioms (idiom, idiom_definition)>
<!ELEMENT idiom  (#PCDATA)>
<!ELEMENT idiom_definition (#PCDATA) >

Again, I put the definitions in their own table. Despite the fact that English and Carthaginian definitions both have the same structure, I will keep them in separate tables. Since an entry can have multiple idioms associated with it, those will go in their own table, too. Since etymologies are common, I will keep those in the same table with the terms. Notes, however, are not very common, so I will put them in their own table. This is a space- and memory-saving move. It is perfectly valid to put notes in the same table as terms as well.

data model for carthaginian-english tables

The full model:

full data model

Note: ignore the VARCHAR(45). I haven’t decided on the length of these fields yet.

Questions? Comments? Mistakes?

PHP 5.38

As of last week, PHP 5.3.8 is available and everyone ought to upgrade to it. It contains fixes to a bug introduced by 5.3.7. The full list of changes is here. People still using 5.2 especially need to upgrade, as 5.2 is no longer supported. (Which means that ISPs and others can no longer get official support from Zend and such.) So if you have a choice, use 5.3.8 or 5.3.6 if 5.3.8 is not available.

Protecting Your Data from an SQL Injection Attack

Introduction

Generally speaking, SQL injection is an attack on a website via a database attached to the site. This database might contain site content or it might contain user login and password information. A successful attack would allow the attacker complete access to the database, able to add, change, delete, and download the database content.

So how does the attack work? It works by using quote characters to end an input string prematurely, and then adding SQL statements to the input.

xkcd #327 “Exploits of a Mom” has an example of an SQL injection attack with a child named “Robert’); DROP TABLE Students; –”. The “’);” after “Robert” is a typical end to the SQL insert statement. It is followed by another SQL statement “DROP TABLE Students;” And the final “–” starts an SQL comment, thus guaranteeing that whatever the original statement was, the rest of it would be commented out.

In more detail: presumably the school has a database with a table called Students. In the normal course of things, someone would enter “Robert” into the first_name field, and Robert’s last name (let’s say it’s “Smith”) into the last_name field, and so on. The SQL statement that would be sent to the database would look like this:

INSERT INTO Students (first_name, last_name) VALUES (‘Robert’, ‘Smith’);

And one row would be inserted into the Students table. However, if instead of “Robert”, someone entered “Robert’); DROP TABLE Students; –” into the first_name field, then the following would be sent to the database:

INSERT INTO Students (first_name, last_name) VALUES (‘Robert’); DROP TABLE Students; --’, ‘Smith’);

One row would be inserted into the Students table (though last_name would be blank), and then the table Students would be deleted from the database. The last bit –’, ‘Smith’); would be interpreted as a comment and so ignored. See SQL Injection Attacks by Example for a more detailed examination of how this attack works.

The database is vulnerable anytime a user interacts with it. However, there are ways to defend against such an attack. I will discuss three of these ways:

  1. sanitize inputs
  2. use parameterized queries
  3. restrict permissions

None of these strategies is a guaranteed, will-stop-100%-of-attacks method. However, each can help lower the likelihood of a successful attack. So, these different defense strategies are not mutually exclusive, and it is best, whenever possible, to use all three.

Sanitize Inputs

The punchline of the xkcd comic has the mother saying “I hope you’ve learned to sanitize your database inputs.” One common way of sanitizing input is to escape the characters by adding another quote mark or a backslash to any quotes in the input. If the hapless school had escaped the input, then this is what would have resulted:

INSERT INTO Students (first_name, last_name) VALUES (‘Robert\’); DROP TABLE Students; --’, ‘Smith’);

The student would have a first name of “Robert’); DROP TABLE Students; –” and a last name of “Smith” and no tables would have been deleted. By escaping the single-quote with a backslash, the database interpreted it as part of the input. Characters that ought to be routinely escaped are: single quotes, double quotes, backslashes, and the NUL character. Control characters like CTRL-Z also ought to be escaped as they, too, can interfere with the interpretation of input strings.

Since this blog deals primarily with PHP, I will discuss several PHP functions that can assist.

  1. addslashes(): This function adds a backslash to single quotes, double quotes, backslashes, and NUL automatically.
  2. mysql_real_escape_string(): This is a MySQL-specific PHP function that escapes nulls, newlines, carriage returns, substitute characters, double quotes, single quotes, and backslashes. The  mysql_escape_string() function is deprecated, and should not be used, as it disregards the character set the string is in. The mysqli (mysql-improved) equivalent of mysql_real_escape_string() is mysqli::real_escape_string().

The following commands help to keep any html tags from getting into the database. Thus, if you end up displaying this data, it won’t change the html markup or add a javascript function to your page. This also helps to prevent someone from inserting a malicious script into your database.

  1. htmlspecialchars(): transforms ampersands, double quotes, less than and greater than into html entities. If the ENT_QUOTES flag is set, it also transforms single quotes. The htmlentities() function does the same thing, also encoding most Latin-1 non-ascii characters.
  2. str_replace(): can be used to replace any characters, like so:
    $string = "{user_input}";
    $bad = array('&', '"', "'", '<', '>', '\\', '%', '_');
    $good = array('&amp;', '&quot;', '&#039;', '&lt;', '&gt;', '&#092;', '&#037;', '&#095;');
    $sanitized = str_replace($bad, $good, $string);

    The backslash in $bad needs to be escaped in order to be recognized as a regular character.
  3. strip_tags() can be useful, too, as it strips the input of any html and php tags that are found.

Use parameterized queries

Parameterized queries are fun, convenient, faster, and much, much safer than merely escaping a string. Parameterized queries take an SQL statement like INSERT INTO Students (first_name, last_name) VALUES (?, ?); and then bind or assign the inputted values into the query, in essence escaping it for you. This is preferable to escaping the string yourself because it is less error-prone and you can’t forget to do it.

The mysql extension for PHP does not support parameterized queries. However, the mysqli (the ‘i’ stands for “improved”) extension (recommended for MySQL v4.1.3 and later) and the PDO extension do support parameterized queries. If you are not sure which to use, see this overview. I am partial to PDO, but mysqli is a good extension as well. I will write a future post on migrating your PHP code from mysql to mysqli.

Restrict permissions

If you are using your database to provide site content only (that is, you have no registered users), then it is a fairly simple thing to make your database read-only. That is, create a user for your database that has select permissions only and cannot insert, update, or delete any content. Use this account to connect the website with the database. That way anyone who manages to get around the escaped string in a parameterized query method cannot add, change, or delete your data through the website, accidentally or otherwise.

Of course, this is more complicated if you do have registered users. Make sure that any statements that deal with input from a user that is not logged in use a restricted, select-only account. Also, if you have your own administrative interface (a form where you enter stuff into your database rather than using phpMyAdmin or some other application), that page will need to access the database using a different account.

Hello world!

Hi!

I decided to start this blog because I could use a little practice in writing on technical topics. This blog’s primary audience is people who are not programmers but want to know a little something more about creating and maintaining websites.

Since I am active in the conlanging community, there will likely be a number of posts regarding storing and displaying dictionaries and texts. Since I am also really into PHP and MySQL and databases in general, there might be some posts about more technical topics as well.

Comments or questions are welcome.