Basic principles of web programming

  1. Search engines
  2. Security
  3. Error reporting
  4. Debugging
  5. The workflow
  6. The application logic/display logic separation
  7. Absolute and relative paths
  8. Performance
  9. The request-response cycle
  10. Encoding

Disclaimer: My English isn't perfect. If you can't stand the poor grammar and can spare a bit of time for proofreading, the source is on Github, and all pull requests will be gratefully accepted.

Over a decade of active participation on Stack Overflow, I was able to identify the reasons behind the most frequent questions on Q&A sites. It turned out that by following a fairly small set of basic principles, one can avoid a multitude of common problems.

Sadly, these principles, although universal, are almost as universally ignored, or even violated, in virtually every online tutorial. I would say this is one of the biggest problems of online education.

For some reason, there is a huge gap between code written "for education" and professionally written code. Surprisingly, the difference is not that the professional code is more complex; on the contrary, it is the educational code that is more elaborate, while also being error-prone and insecure.

The irony is that following professional practices doesn't make your code more complex! All you need is to learn a few basic principles, and your code will become tidy and secure!

I only have to add that the following rules describe the generic right way of doing things. Of course, there can be exceptions. But the point is that your first approach should be the generic one, not the exceptional one.

Although all language-specific details are given for PHP and MySQL, the principles are universal, and can be applied to any language.

Search engines

Learn to use Google. Seriously. There is a wealth of information on the web already; all you need is to pick it up. Here are some useful hints:

Security

is the biggest shame of PHP tutorials. It is not a rare exception but rather the rule that a PHP tutorial picked at random from Google, or a Udemy course, will straight up teach you how to make your code critically vulnerable to every possible attack out there. And all this despite the fact that basic security rules are really simple and a no-brainer to follow! The good news is that lately some tutorials have managed to get at least basic security right.

Another big problem with security recommendations is that they are either vague (like "don't trust user input", "all data must be sanitized", etc.) or put the emphasis on the wrong thing: the data source instead of its destination. But it's only the destination that matters. Refer to the following list for concrete recommendations:

Disclaimer: of course, this is not an exhaustive list of vulnerabilities, just a list of the most frequent issues in learners' code. For more in-depth information, refer to resources such as OWASP.

All I have to add is that these rules only work when followed unconditionally. You step into the abyss the moment you start deciding whether your data is "safe" or not, whether it must be protected or not. Never judge data by its source, only by its destination: sending it to SQL? Then the query must be prepared. Sending it into HTML? Then it must be encoded. And so on.
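A minimal sketch of the destination rule, assuming PDO with an in-memory SQLite database so that it runs without a MySQL server (the pdo_sqlite extension must be enabled). The users table, findUserByName() and the e() helper are hypothetical names for illustration. The same string travels to two destinations and gets the treatment each one requires: a prepared-statement placeholder for SQL, htmlspecialchars() for HTML.

```php
<?php
// Destination: SQL. The value never becomes part of the query text;
// it is sent separately, through a placeholder of a prepared statement.
function findUserByName(PDO $pdo, string $name): ?array
{
    $stmt = $pdo->prepare('SELECT id, name FROM users WHERE name = ?');
    $stmt->execute([$name]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

// Destination: HTML. The value is encoded right where it is printed.
// (e() is a hypothetical short helper name, not a built-in function.)
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Demo setup: an in-memory SQLite database stands in for MySQL here.
$pdo = new PDO('sqlite::memory:', null, null, [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->prepare('INSERT INTO users (name) VALUES (?)')->execute(["O'Reilly <admin>"]);

$user = findUserByName($pdo, "O'Reilly <admin>");
echo '<p>Hello, ' . e($user['name']) . '</p>', PHP_EOL;
// prints <p>Hello, O&#039;Reilly &lt;admin&gt;</p>
```

Note that nobody had to decide whether the name was "safe": the quote and the angle brackets are harmless in both places simply because each destination got its own treatment.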

Error reporting

is really a sore spot of the PHP community, encouraged by myriad outdated tutorials and Stack Overflow answers that repeat the same bad practices again and again, making even the smallest piece of code unnecessarily elaborate, unsuitable for production and even potentially dangerous, as it leaks sensitive information to the outside.

NB. There must be a clear distinction between user interaction errors and system errors.

  • user errors are not actually errors but just notifications, such as "Passwords did not match", that can and should be shown to a site user
  • but real system errors, such as the MySQL "Too many connections" error, should never be shown to a site user. Here we are talking about system errors.
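To make the distinction concrete, here is a small sketch with two hypothetical functions. A failed password check is a user error: it is expected, so it is returned as a message meant for the site user. Broken JSON in a settings string is a system error: json_decode() is told, via the JSON_THROW_ON_ERROR flag (available since PHP 7.3), to throw a JsonException instead of quietly returning null, and nothing is echoed to the user.

```php
<?php
// A user error: expected, and the message is written for the site user.
function validatePasswords(string $password, string $confirmation): array
{
    $errors = [];
    if ($password !== $confirmation) {
        $errors[] = 'Passwords did not match';
    }
    return $errors;
}

// A system error: unexpected, so it is thrown, never printed for the user.
// JSON_THROW_ON_ERROR makes json_decode() throw a JsonException on bad input.
function decodeSettings(string $json): array
{
    return json_decode($json, true, 512, JSON_THROW_ON_ERROR);
}

print_r(validatePasswords('secret', 'sekret')); // one message for the form
print_r(decodeSettings('{"debug":false}'));     // valid input decodes normally
```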

A lot of confusion comes from the fact that every site, so to speak, has two kinds of customers, a programmer and a user, who require totally different treatment with regard to error messages:

To make both satisfied at the same time, follow the principles below:

  1. Errors must only be thrown, never printed out unconditionally. PHP is damn good at displaying a thrown error: you don't need any special code to see the error message right on the screen during development. Not a single line. Just let errors be thrown, or throw them yourself, and then let PHP handle the rest
  2. Try to avoid checking for the errors manually
    • most PHP modules can be configured to throw errors automatically (most notably PDO and mysqli)
    • if you need to check for an error manually, never echo it out. Instead, raise it, using either trigger_error() or throw new Exception()
  3. There must always be a distinction between development and production modes.
    • however, such a distinction should never affect the code itself. Write your code the same way for both modes, without any conditions related to the server role
    • instead, it must be configured globally, affecting the whole site by means of setting a single configuration option
    • in the development mode PHP is usually configured to display errors right away
  4. Remember that someday your site will go online. From that moment, it will need an error reporting setup drastically different from the one you used while learning, as the sole user of your site. In production mode,
    • no system error message must ever be shown to a site user
    • errors must be logged into a file that is inaccessible from outside, or handled in some other way that makes them available only to a site admin/developer
    • for a site user, a generic excuse page must be shown
  5. Errors shouldn't be swept under the rug. This covers not only the infamous @ operator but also some other techniques that the general audience doesn't consider evil, but which really are, such as

    • silencing errors with error_reporting(0). This function has nothing to do with displaying errors and hence should never be used for that purpose (which is served by a completely different option, surprisingly called display_errors). error_reporting, on the other hand, should be set to E_ALL whenever possible
    • checking for problems without any articulated outcome. A very typical example is

      if ($result = mysqli_query($link, $query)) {
          // do stuff
      }
      // or
      if (is_readable($filename)) {
          include $filename;
      }

      although such verifications look innocent or even like the right thing to do, in practice they are nothing but sweeping the dirt under the rug. Without such a condition, PHP would be vocal about its problems and hence instrumental in fixing the issue, while such code just makes the script fail silently, leaving you oblivious not only of the actual error but even of the place in the code where it happened

    • substituting an invaluably precise and certain error message with some vague assertion, such as

      if (!is_readable($filename)) {
          echo "A file cannot be read!";
      }

      *jackie_chan_wtf.jpg*! There are literally hundreds of different reasons that may prevent a file from being readable: a permission issue, some restriction such as open_basedir, the file doesn't exist, or it exists but in a different directory, etc., etc., etc. Not to mention that a system error message always points to the exact line in the code where the problem occurred, while such home-brewed error messages just pop out of nowhere.

    • minor issues, such as excessive use of isset or its relatives, empty and ??. They are often used, again, only to mute a possible error message. Of course, there are situations when a variable can genuinely be unset, but such cases are much rarer than the use of these operators suggests
    • disclaimer: of course, there are situations when errors are non-critical and can be intentionally bypassed. But it should be a deliberate decision, not a habit.
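A minimal bootstrap sketch of rules 1, 3 and 4, assuming the mode comes from a hypothetical APP_ENV environment variable: one switch configures the whole site, error_reporting stays at E_ALL in both modes, and only display_errors changes. In production, an exception handler logs the real error and shows a generic excuse page instead. Where the log file lives (the error_log ini setting) is left to your server configuration; it should be outside the document root.

```php
<?php
// Hypothetical bootstrap file, included at the top of every script.
function configureErrorReporting(bool $isProduction): void
{
    // Report everything in both modes; only the *display* depends on the mode.
    error_reporting(E_ALL);
    ini_set('display_errors', $isProduction ? '0' : '1');
    // Always log, so the developer can read errors the user never sees.
    ini_set('log_errors', '1');

    if ($isProduction) {
        set_exception_handler(function (Throwable $e): void {
            error_log((string) $e);      // full details go to the log only
            http_response_code(500);
            echo 'Sorry, something went wrong. Please try again later.';
        });
    }
}

// One switch for the whole site (APP_ENV is an assumed variable name).
configureErrorReporting(getenv('APP_ENV') === 'production');
```

In development mode no handler is installed at all: PHP itself displays uncaught errors, which is exactly what rule 1 asks for.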

A practical implementation of all the rules mentioned above can be found in the detailed article on PHP error reporting.

Debugging

The concept of debugging is almost universally unknown to PHP learners, probably because it is just as universally ignored by tutorial authors, who have quite peculiar ideas about debugging, if any at all.

At the same time, the concept of debugging is quite simple and natural. All you need to do is to make your code talk to you.

I am not speaking here about any specific tooling but about the generic principle: the idea that an issue can be hunted down and fixed by some method other than staring at the code or asking someone else to take a look; the idea that problem code must be run instead of stared at, and made to talk about its problems.

Basically, in order to find the error, you need two things:

The learner's approach to code is similar to assembling a Lego figure: if it fails, all you can do is start over or ask someone to fix it. But code is not a Lego figure! You can run it and make it talk!

Hence, always keep error reporting at full. It lets PHP tell you where the obvious errors are. The irony, though, is that the errors PHP can find are the simplest ones. The real work starts when your code doesn't produce any errors but doesn't work (or works incorrectly) either. And here we enter the wonderful world of debugging.

Only one thing before we start: in order to be able to debug your application you need to understand what every part of your code is supposed to do. This is very important. When you positively know what value every variable should have at any given moment, the rest is a piece of cake: just add debugging output to your code and then compare the expected and actual output!

And then you just go and check every part separately. Your program is not a solid black box that you feed with data and get a result from. It consists of many parts, and the problem may occur at any step.
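A small sketch of "making the code talk", with a bug planted on purpose in a hypothetical prepareComment() function. The var_dump() calls print every intermediate value, so the expected and actual values can be compared step by step: the first dump still shows the full text, the second one shows it cut short, which pinpoints the faulty line.

```php
<?php
// Hypothetical function with a deliberate bug: the comment gets truncated.
function prepareComment(string $text): string
{
    $trimmed = trim($text);
    var_dump($trimmed);    // expected: the whole comment, minus surrounding spaces

    $shortened = substr($trimmed, 0, 10);   // the planted bug
    var_dump($shortened);  // expected: the same text; actual: only 10 characters

    return $shortened;
}

prepareComment('  Hello, wonderful world  ');
```

Once the culprit is found, the var_dump() calls are removed again; they are a temporary probe, not part of the code.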

Say, there is an HTML form with a single field that always saves only part of the data in the database, no matter how many words you enter. Well then, there are four distinct steps involved here, and each must be debugged individually:

First, you need to check whether the data has made it to the PHP script. Just add var_dump($_POST); die; at the top of the form handler script, submit the form and then view the page source (Ctrl-U).

If it shows the proper field name followed by the full text, then the form works correctly. Remove the var_dump() and die calls and proceed to saving into the database.

Then check the stored data using phpMyAdmin, a database console, or any other database client.

If the data is saved correctly, check whether it is fetched from the database correctly. Again, add var_dump($data); die; right after fetching the data from the database.

If the data is shown in full, proceed to displaying it in the form. If it is not shown correctly there, view the page source (Ctrl-U). If your data is there, then you formatted it incorrectly: for example, the value attribute is missing quotes and/or the data is not encoded using htmlspecialchars. Fix it, and there you go: the problem is solved!
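The last step can be illustrated with a hypothetical renderTextField() helper. Without quotes, an attribute like value=Tom and Jerry ends at the first space, so the browser shows only "Tom", which is exactly the "only part of the data" symptom; and a double quote inside the data would cut the value short even when the attribute is quoted. Quoting the attribute and encoding the data with htmlspecialchars() fixes both.

```php
<?php
// Hypothetical helper: prints a text field pre-filled with stored data.
function renderTextField(string $name, string $value): string
{
    // Both attribute values are quoted and HTML-encoded, so spaces,
    // quotes and ampersands in the data cannot break the markup.
    return sprintf(
        '<input type="text" name="%s" value="%s">',
        htmlspecialchars($name, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($value, ENT_QUOTES, 'UTF-8')
    );
}

echo renderTextField('title', 'Tom "the cat" & Jerry'), PHP_EOL;
// prints <input type="text" name="title" value="Tom &quot;the cat&quot; &amp; Jerry">
```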

This is just an example to show you the generic course of action. Apply this general principle to any problem situation, and you will either have the problem solved or at least have a specific question for a Q&A site.

The workflow

The application logic/display logic separation

Is a very important concept. Yet it can be achieved by very simple means, without implementing any fancy stuff such as MVC. All you need is to separate every script into two parts: one part where only PHP code is used and not a single character is sent to the client, and another part where all the output happens, with occasional PHP to organize the control flow.

First of all, why do you need it? It's simple. An HTML page is not the only possible outcome of your script. There could also be:

Besides, if you start the HTML output before processing the data, you won't be able to change significant parts of the page, such as the <title> tag, Open Graph widgets, or custom JavaScript/CSS.

Moreover, such a separation makes your code much simpler and cleaner. It becomes much easier to read and edit, for both a programmer and an HTML designer.

In the simplest form, just put the common HTML header and footer in separate files, then structure your code like this:

<?php
// here goes all the logic, 
// getting all the data to be displayed ready

// then start the output
include 'header.php';
?>
now here goes the HTML for the page content
<?php
include 'footer.php';

But that's really only for your first project. Learn to use a dedicated template system, such as Twig, as soon as possible. The principle, however, remains the same: first prepare all the data, then pass it to the template for output.
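A self-contained sketch of the same separation, with hypothetical getPageData() and renderPage() functions kept in one file for brevity (in a real project they would be the script and a template file). Because all the data is ready before any output starts, even the <title> can depend on it.

```php
<?php
// Logic part: prepares all the data; not a single character is output.
// The hard-coded array stands in for a real database query.
function getPageData(): array
{
    $items = ['Apples', 'Pears & Plums'];
    return [
        'title' => 'Fruit (' . count($items) . ')',
        'items' => $items,
    ];
}

// Display part: only output, with PHP used for loops and encoding.
function renderPage(array $data): string
{
    ob_start(); ?>
<title><?= htmlspecialchars($data['title']) ?></title>
<ul>
<?php foreach ($data['items'] as $item): ?>
    <li><?= htmlspecialchars($item) ?></li>
<?php endforeach ?>
</ul>
<?php
    return ob_get_clean();
}

echo renderPage(getPageData());
```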

Absolute and relative paths

A relatively (pun not intended) small issue, but it leads to a lot of confusion.

The examples below show the same file addressed in different ways:

/var/www/example.com/index.php # a filesystem absolute path
         example.com/index.php # a filesystem relative path
                     index.php # a filesystem relative path
                    /index.php # a web-server absolute path 
                     index.php # a web-server relative path 

The /var/www/example.com part is called the DOCUMENT_ROOT; it can be found in the $_SERVER array and helps to build an absolute filesystem path from a web-server path. A more detailed explanation can be found in the dedicated article. Here are just a couple of hints that can solve 99% of problems:
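As a minimal illustration of the DOCUMENT_ROOT idea, here is a hypothetical webPathToFilesystem() helper. The document root is passed in as a parameter (in a real request it would be $_SERVER['DOCUMENT_ROOT']) so the function can be tried outside a web server.

```php
<?php
// Hypothetical helper: turns a web-server absolute path (as used in a URL)
// into an absolute filesystem path by prepending the document root.
function webPathToFilesystem(string $docRoot, string $webPath): string
{
    return rtrim($docRoot, '/') . '/' . ltrim($webPath, '/');
}

echo webPathToFilesystem('/var/www/example.com', '/index.php'), PHP_EOL;
// prints /var/www/example.com/index.php

// For includes, __DIR__ holds the directory of the file it appears in,
// so a path built from it does not depend on which script was requested:
// require __DIR__ . '/header.php';
```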

Performance

Many questions about performance are asked out of the blue. But any concern about performance should only be provoked by a real-life issue and backed by the results of profiling. Profiling, in simple terms, is measuring the execution times of distinct parts of the code and finding the slowest one, called a bottleneck. Only actual bottlenecks have to be fixed performance-wise, not just random parts of the code. Remember: "Premature optimization is the root of all evil".
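For illustration only, a crude manual timer built on hrtime() (PHP 7.3+); dedicated profilers such as the Xdebug profiler or Blackfire do this far more thoroughly. The two measured tasks are arbitrary stand-ins for distinct parts of an application; sorting the results puts the slowest one, the bottleneck candidate, first.

```php
<?php
// Returns how long a callable took to run, in seconds.
// hrtime(true) is a monotonic clock reading in nanoseconds.
function measure(callable $task): float
{
    $start = hrtime(true);
    $task();
    return (hrtime(true) - $start) / 1e9;
}

$timings = [
    'build array' => measure(fn () => range(1, 100000)),
    'sort array'  => measure(function () {
        $numbers = range(100000, 1);
        sort($numbers);
    }),
];

arsort($timings);   // slowest part first
print_r($timings);
```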

The only generic advice about performance that can be given is: "avoid processing huge amounts of data". This applies mostly to database interaction, where a learner can unknowingly cause an avalanche; the most notable example is selecting all the rows from a database table only to count them.
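The row-counting example, sketched with PDO and an in-memory SQLite database (an assumption so the snippet runs without a MySQL server; pdo_sqlite must be enabled). Both approaches give the same number, but the first one ships every row to PHP, while the second lets the database return a single value.

```php
<?php
$pdo = new PDO('sqlite::memory:', null, null, [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)');

// Fill a demo table with 1000 rows.
$insert = $pdo->prepare('INSERT INTO posts (title) VALUES (?)');
for ($i = 1; $i <= 1000; $i++) {
    $insert->execute(["Post $i"]);
}

// Wasteful: every row travels to PHP only to be counted there.
$slowCount = count($pdo->query('SELECT * FROM posts')->fetchAll());

// Better: the database does the counting and returns a single number.
$fastCount = (int) $pdo->query('SELECT COUNT(*) FROM posts')->fetchColumn();

echo "$slowCount $fastCount", PHP_EOL; // prints 1000 1000
```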

The request-response cycle

Some confusion comes from the distributed nature of the server-browser system and the discrete nature of a PHP application. It is very important to understand that the execution of a PHP script is atomic:

Encoding

Is simple: just use UTF-8. All you need is to configure it properly. What follows is the gist of a great answer on Stack Overflow, "UTF-8 all the way through". In short, you have to make sure that all parts of your application talk to each other using UTF-8 encoding:
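As a rough sketch only (not a reproduction of that answer), assuming MySQL accessed through PDO, three typical places where UTF-8 has to be configured look like this. The DSN values are placeholders, and isValidUtf8() is a hypothetical helper for checking incoming data.

```php
<?php
// 1. Database connection: ask for utf8mb4, MySQL's full UTF-8 encoding
//    (tables and columns should use utf8mb4 as well). Host and database
//    name are placeholders; no connection is made in this sketch.
const DB_DSN = 'mysql:host=localhost;dbname=example;charset=utf8mb4';

// 2. Output: declare the page encoding before any output is sent, e.g.
//    header('Content-Type: text/html; charset=utf-8');

// 3. Input: check that incoming strings are valid UTF-8.
//    With the "u" modifier, preg_match() returns false on invalid UTF-8.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

echo isValidUtf8('Привет') ? 'valid' : 'invalid', PHP_EOL;   // prints valid
echo isValidUtf8("\xC3\x28") ? 'valid' : 'invalid', PHP_EOL; // prints invalid
```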