Signs your PHP needs refactoring

I have had to go through a php application recently which has given me more than one headache and has required me to use all my possible patience. While working with it, I thought This is good material for an article, so that nobody else does the same in the future, and nodody else will need to experience the same displeasure as I have had to.

So here are the signs your PHP application needs a serious refactoring, right now:

Uses global variables

This one has two readings. In the first one, there's that coding style in which a series of variables is placed in the global scope, each function simply does php global $varA, $varB; etc, and proceeds to do whatever it needs to do with the variables. This one is the worst one; lots of very hard to trace errors plague programs written with this technique, most often because globalized variables end up overwriting variables which they shouldn't, so that's bad. It's also very complicated to keep track of who's modifying what, because there aren't function calls to access the variables, only global statements. There is no visibility at all.

Then there's the second reading, in which readily available global variables ($_REQUEST, $_POST, $_GET) are again used (and abused) throughout the whole application. You can find, for example, data models accessing directly the $_POST or $_GET arrays, which not only introduces extra complexity into their logic (what if we decide to send data with GET instead of POST?, for example) but also complicate unit testing the different pieces of the application. You can't just send an array of data to a model - it has to be in POST or GET.

Again, this way of doing things offers bad visibility, since you can't really tell who's changing what.

Brackets galore

Whenever I find things like this:

$variable['field']['subfield']['subsubfield1'] = 'value1';
$variable['field']['subfield']['subsubfield2'] = 'value2';
$variable['field']['subfield']['subsubfield3'] = 'value3';
$variable['field']['subfield']['subsubfield4'] = 'value4';

... a word comes to my mind: disgusting.

It is way easier and faster to just do something like this:

$variable['field']['subfield'] = 
  array('subfield1'=>'value1', 'subfield2'=>'value2', 'subfield3'=>'value3', 'subfield4'=>'value4');

You can indent it having one pair of array values per line, as you wish. The point is that if you ever need to change 'field' or 'subfield', you don't need to do it four times (even if there are things like Search and Replace). I have found cases where the Brackets Galore had contaminated twenty and sometimes even more lines!

Another example of brackets abuse happens when people retrieve results from databases in the form of arrays and access them in the wild. And you end up finding code like this one for accessing the first element:

$prop1 = $res['results'][0]['property1']
$prop2 = $res['results'][0]['property2']

Easier and shorter way:

list($prop1, $prop2...) = array_shift($res['results']);

Could somebody explain me what's that obsession of people with Brackets Galores?

Everything's an array

Objects were created for some reason. The fact that creating arrays is very easy with php and you don't have to allocate memory first and all that unlike other languages doesn't mean that you have to structure all your data with arrays.

The problem with using arrays as structured data containers is that everything inside them is public and there's no way of having nice set/get functions which prevent bad data to enter the array. So, use objects!

That is yet another reason for moving to php5: you can have really private properties, so that nobody will modify internal data as it can happen with php4's objects.

Duplicated code

This one really gets on my nerves. How many times have I looked at an application, just to find the same three lines being copied and pasted from function to function and from file to file? And then I wonder: have they ever been taught the concept of function?

If there's something there which looks vaguely similar to something else you did a bit before, it is probably asking for some factorization love! And if you need certain functionality in two separate places, do not copy and paste the same code, make it a function!

Not only the final code ends being shorter, but it is easier to understand five lines of code which can expand to one hundred when you go inside each function, than to get shown one hundred lines of code all of a sudden and try to find out what is going on there.

The neverending switch

Talking about one-hundred-lines blocks, I have also recurrently found three hundred lines switch statements. And believe me, I wouldn't recommend it to my worst enemy. This is cruelty and mental exploitation at its worst and should be criminally punished!

Obviously, the hundreds of lines should be split into functions, so that you don't need to spend half an hour reading a switch to get an idea of what does it do.

You have to modify too many files

If you need to modify lots of files for adding or modifying even a simple feature, something is not working in your design. It usually happens when the different areas in the application are excessively coupled and each one needs to know too much about the other.

The main problem with this is that it tends to manifest itself when the application is getting larger and is also in big production mode, so new features are highly demanded and most of the developers just lie to themselves: It's OK to have to change a few dozen files for adding this new feature, I'll work on simplifying this later. And you know that later never happens.

Hard coded values

Constants were defined for some reason. It is bad to find things like

if($result==4) {
// blah
} else if($result==5) {
// blah blah

where numeric values are supposedly meaning something, but it is also quite bad to find things like

if($result=='row1') {
// blah
} else if($result=='row2') {
// blah blah

because if for example, you misspell something and instead of row1 you write rowl, PHP is not going to complain at you since it's a perfectly valid comparison. What you need is to use constants, so that whenever you try to use a constant which hasn't been defined, PHP complains and you fix it immediately.

While it looks fairly easy to distinguish between "1" (the number one) and "l" (the lowercase L), it is not that easy for people with problems like dyslexia, or put it simply, when you're tired. And the errors introduced because of not using constants are extremely hard to trace, most of the times you have to spend insane amounts of time going line by line of code and trying to find out why it doesn't work. It's just easier and more preventive to define constants and stop worrying about having to update all the hard coded values in the code later on if needed.

Interface inconsistency

This one doesn't seem to get much attention but it is a really worrying area in PHP. Not sure if it's something inducted by the language itself (just have a look at the string handling functions), but there's an strange tendency in php applications to not to follow standards in what regards to function arguments and return types.

Sometimes functions return boolean values (true or false) or boolean like values (0 or 1), other times they use C style return types (0 for success and 1 or more for errors), which are clearly opposite to using boolean true for success and boolean false for error, since in PHP land 0==false and 1==true (unless you use the stricter === and !== operators, which unfortunately seems to not to be the case for most of the php developers around)

In some other cases, functions do not return anything, but instead use an array passed by reference to write results in it, looking somehow like this: php function my_function(&$return_array) . I really dislike this practice, because unless you just saw the function definition for my_function, it's impossible to distinguish if php my_function($my_array) is using $my_array as an input or output parameter.

It is better to agree on a style for return values, and use that style consistently in every function you write. Like: we'll always return an object with two properties, error which is a boolean and represents whether the call failed or not, plus results which will contain data if we need to return something.

With that, you know that you can always check the value of $res->error instead of having to check first if it exists or not, and then check the value itself. You get the idea...


Some of these issues would almost disappear if we had a really good php editor. While Eclipse does a decent job with the php plugins, most of the time, because of the way that php works, everything gets messed up and the autocomplete feature it's useless because it can't find the classes. Refactoring is something that almost relies on your short-term memory only (I think I saw something like this somewhere...), and other issues depend on people being high or not when programming (remember the 300 lines switch) or being competent in PHP and software design in general.

And now is the time to go back to your application and look at it with critical eyes!