Blake O'Hare .com

10 PHP Tips and Tricks

Here are a few things worth mentioning that I've figured out during my years of writing PHP-based websites. Some of these might be obvious but are worth re-iterating. Others are things that I didn't consciously consider until learning the hard way sometimes multiple times.

#10 - Never use MySQL commands directly

Ever. Create a database object. This object ought to hold a reference to the MySQL connection and have functions for select, delete, update, and insert. Even if you're lazy and make these functions all just take in a string for SQL, this allows you the opportunity to have different return values. Obviously the Select function will return the table results of query, however there are functions called mysql_affected_rows and mysql_insert_id. mysql_affected_rows returns the number of rows affected by an update or delete query given a MySQL connection object. mysql_insert_id returns the primary key of the last inserted row. Both of these are incredibly useful but are silly to put inline in your code everywhere.
$new_item_id = $db->insert("some query");

$update_success = $db->update("some query") > 0;

#9 - Create an underlying API from the get go

There comes a time in every PHP website's lifecycle where you want to go from a classical page-by-page layout to an AJAX or service-based presentation. To create a service API, you need to create each task a user could possibly do as a separate page. This becomes trivial if you created this from the start. If every individual task is a nice clean function call, then creating a public API for this is a simple matter of creating a bunch of pages that basically have half a dozen lines in them calling that function and printing out the results. Which brings us to the next point...

#8 - Actually use functions

PHP is a terrible language. There. I said it. Sometimes it's just really tempting to write a whole bunch of code from top to bottom because it's fast and easy in PHP. Bad. Take the time to figure out where there are natural separations in your logic for generating a page. Put those into functions. Group functions by theme into separate files then include those files from others. And while I'm talking about natural separations in logic...

#7 - Generate pages in two passes (data -> view)

The following is too terribly common:

$entries = $db->query("some select query to get a bunch of data");
for ($i = 0; $i < mysql_num_rows($entries); ++$i)
  $entry = mysql_fetch_array($entries);
  echo blah blah blah $entry['foo'] blah blah $entry['meow'] blah $entry['whatever'];

Perhaps if you're simply showing some sort of tabular data, this canonical approach is okay. But for anything more complicated, it's worth it to create some sort of intermediate data structure. SQL tables don't always map nicely to how the data will ultimately be displayed. The data structure you generate ought to be the most logical structure that corresponds to what you're trying to do. For example, any tree based UI. Suppose you have some sort of discussion topic that has posts in multiple threads that go off in different tangential directions in a hierarchical tree. Tables don't make good trees. Suppose there's a table called 'topic_post' with columns for post ID, topic ID, parent post ID, text, and time. Here is some sample code to easily display this:

$posts_from_db = $db->query("
  FROM `topic_post`
  WHERE `topic_id` = $topic
  ORDER BY `time`

$posts_by_id = array();
$post_ids = array();
$root_post_id = 0;

// create a lookup table by ID and list of ID's
for ($i = 0; $i < mysql_num_rows($posts_from_db); ++$i)
  $post = mysql_fetch_array($posts_from_db);
  $post['children'] = array();
  $posts_by_id[$post['post_id'] = $post;
  array_push($post_ids, $post['post_id']);

// go through all posts and add them to their corresponding parent
foreach ($posts_ids as $id)
  $parent_id = $posts_by_id[$id]['parent_post_id'];
  if ($parent_id > 0) // assume in our system, a parent ID indicates 
    array_push($posts_by_id[$parent_id]['children'], $id);
    $root_post_id = $id;

// since we're dealing with natrually recursive data, write a recursive function to display it
function show_post($post_id, $all_posts)
  $post = $all_posts[$post_id];
  echo '<div style="border:1px solid #000;">'.nl2br(htmlspecialchars($post['text'])).'</div>';
  if (count($post['children'] > 0))
    echo '<div style="padding-left:50px;">';
    foreach ($post['children'] as $child_id)
      show_post($child_id, $all_posts);
    echo '</div>';

if ($root_post_id > 0)
  show_post($root_post_id, $posts_by_id);

Doing a basic loop that iterates through the results once makes many useful tools such as the recursion used here impossible.

#6 - Redirect forms to their originating pages

When I first started writing PHP, I tended to have forms as two pages. A page with the form on it and the page that processed the form/showed error messages if something went wrong. If there was an error in the data the user entered, I would display the error message and force the user to go back to the previous page to fix it. Nasty. The better solution is to redirect to the same page.

$errors = array();
$form_submitted = isset($_POST['submit']);
$show_form = true;

if ($form_submitted)
  // check for errors. 
  // add any error messages to $errors

  if (count($errors) == 0)
    // do SQL magic or whatnot
    $show_form = false;
    echo "The following errors were encountered: <br />".implode("<br />", $errors);
    $show_form = true;

if ($show_form)
  echo '<form action="this page" method="post"> ... </form>';

#5 - Email Assertions

When developing client-side software, typically a debug build will throw assertions if something goes remotely wrong and the programmer wants to know about it. In PHP it's a little more difficult. Sure, you can have echo statements surrounded by if statements that only display when it sees your ID. But there are random people out there viewing random pages in odd ways. For example, each time I rewrite NP there tends to be quite a few broken links across the site and they're almost impossible to track down. Finally, I added a line of code to send me an email message each time a 404 page was reached and the URL the user was trying to view and which page they came from. Of course the first day I got hundreds of emails. But most of them were from about 3 or 4 high-traffic broken links. I fixed those few links (and for the ones originating from outside NP, I added appropriate redirects) and suddenly, the number of 404's dropped about 90%. Way more effective than just going through all the site content and hoping I find all broken links.

#4 - Do bookkeeping

Even powerful statistic and analysis tools like Google Analytics appears weak against the power you have in writing your own tracking system. You're not just limited to figuring out which pages are most popular, but you can also figure out what the heck it is your users spend their time doing. What does the typical user do? What pages do they visit and in what order? A few use cases can help you greatly optimize your page layout and content delivery.

#3 - Never use ORDER BY RAND() in MySQL

If you really must find a random entry in a database with more than a few hundred entries, do so using some heuristic or trick other than using ORDER BY RAND() LIMIT 1. This could be picking a random index and selecting the first entry greater than that. Do something like that because that can use an index to sort your results. When you sort results with an index, the sorting has already been done and the results are usually instantaneous. If you sort by RAND(), then the database has to generate a random number for each entry and compares each entry with every other entry using a traditional non-linear sorting algorithm. This is painfully slow and can actually cripple any query in a table with more than a few hundred entries. Even though most proprietary flavors of SQL have mecahnisms to do this quickly, MySQL just isn't smart enough to see the "LIMIT 1" at the end of the query and think to itself, "oh, I should just pick one at random". It sorts each entry. So you have to come up with something better than RAND().

#2 - Have an activation bit in the user table

I'm not sure why people register an account and then don't log in. This is extremely common. On some of my sites, the fraction of users that have never logged on is about 50% of the total entries in the user table. This is simply wasted space. Add a boolean column to your user table called has_ever_logged_on or some such with a default value of 0. When a user logs on, set it to 1. Then periodically, clear out all the old entries of people that never used their account. The extra space that 1 column uses definitely will pay for itself. If you have a last_login_time that's initialized to 0, you could re-purpose that for this function, too.

#1 - Security through obscurity never works

Everyone is looking for interesting things at interesting URL's. They're out to get you. If you have an admin panel at an obscure URL, assume someone will find it. Even if it's super obscure there's still a chance someone will find it via browser history or other mishaps. If you have something secure, go through the trouble of creating a password protected log-in system.