I have had to go through a php application recently which has given me more than one headache and has required me to use all my possible patience. While working with it, I thought This is good material for an article, so that nobody else does the same in the future, and nodody else will need to experience the same displeasure as I have had to
.
So here are the signs your PHP application needs a serious refactoring, right now:
Uses global variables
This one has two readings. In the first one, there’s that coding style in which a series of variables is placed in the global scope, each function simply does
global $varA, $varB;etc, and proceeds to do whatever it needs to do with the variables. This one is the worst one; lots of very hard to trace errors plague programs written with this technique, most often because globalized variables end up overwriting variables which they shouldn’t, so that’s bad. It’s also very complicated to keep track of who’s modifying what, because there aren’t function calls to access the variables, only global statements. There is no visibility at all.
Then there’s the second reading, in which readily available global variables ($_REQUEST, $_POST, $_GET) are again used (and abused) throughout the whole application. You can find, for example, data models accessing directly the $_POST or $_GET arrays, which not only introduces extra complexity into their logic (what if we decide to send data with GET instead of POST?, for example) but also complicate unit testing the different pieces of the application. You can’t just send an array of data to a model – it has to be in POST or GET.
Again, this way of doing things offers bad visibility, since you can’t really tell who’s changing what.
Brackets galore
Whenever I find things like this:
$variable['field']['subfield']['subsubfield1'] = 'value1';
$variable['field']['subfield']['subsubfield2'] = 'value2';
$variable['field']['subfield']['subsubfield3'] = 'value3';
$variable['field']['subfield']['subsubfield4'] = 'value4';
...… a word comes to my mind: disgusting.
It is way easier and faster to just do something like this:
$variable['field']['subfield'] =
array('subfield1'=>'value1', 'subfield2'=>'value2', 'subfield3'=>'value3', 'subfield4'=>'value4');You can indent it having one pair of array values per line, as you wish. The point is that if you ever need to change ‘field’ or ‘subfield’, you don’t need to do it four times (even if there are things like Search and Replace). I have found cases where the Brackets Galore had contaminated twenty and sometimes even more lines!
Another example of brackets abuse happens when people retrieve results from databases in the form of arrays and access them in the wild. And you end up finding code like this one for accessing the first element:
$prop1 = $res['results'][0]['property1']
$prop2 = $res['results'][0]['property2']
...Easier and shorter way:
list($prop1, $prop2...) = array_shift($res['results']);Could somebody explain me what’s that obsession of people with Brackets Galores?
Everything’s an array
Objects were created for some reason. The fact that creating arrays is very easy with php and you don’t have to allocate memory first and all that unlike other languages doesn’t mean that you have to structure all your data with arrays.
The problem with using arrays as structured data containers is that everything inside them is public and there’s no way of having nice set/get functions which prevent bad data to enter the array. So, use objects!
That is yet another reason for moving to php5: you can have really private properties, so that nobody will modify internal data as it can happen with php4′s objects.
Duplicated code
This one really gets on my nerves. How many times have I looked at an application, just to find the same three lines being copied and pasted from function to function and from file to file? And then I wonder: have they ever been taught the concept of function?
If there’s something there which looks vaguely similar to something else you did a bit before, it is probably asking for some factorization love! And if you need certain functionality in two separate places, do not copy and paste the same code, make it a function!
Not only the final code ends being shorter, but it is easier to understand five lines of code which can expand to one hundred when you go inside each function, than to get shown one hundred lines of code all of a sudden and try to find out what is going on there.
The neverending switch
Talking about one-hundred-lines blocks, I have also recurrently found three hundred lines switch statements. And believe me, I wouldn’t recommend it to my worst enemy. This is cruelty and mental exploitation at its worst and should be criminally punished!
Obviously, the hundreds of lines should be split into functions, so that you don’t need to spend half an hour reading a switch to get an idea of what does it do.
You have to modify too many files
If you need to modify lots of files for adding or modifying even a simple feature, something is not working in your design. It usually happens when the different areas in the application are excessively coupled and each one needs to know too much about the other.
The main problem with this is that it tends to manifest itself when the application is getting larger and is also in big production mode, so new features are highly demanded and most of the developers just lie to themselves: It’s OK to have to change a few dozen files for adding this new feature, I’ll work on simplifying this later
. And you know that later never happens.
Hard coded values
Constants were defined for some reason. It is bad to find things like
if($result==4) {
// blah
} else if($result==5) {
// blah blah
}where numeric values are supposedly meaning something, but it is also quite bad to find things like
if($result=='row1') {
// blah
} else if($result=='row2') {
// blah blah
}because if for example, you misspell something and instead of row1 you write rowl, PHP is not going to complain at you since it’s a perfectly valid comparison. What you need is to use constants, so that whenever you try to use a constant which hasn’t been defined, PHP complains and you fix it immediately.
While it looks fairly easy to distinguish between “1″ (the number one) and “l” (the lowercase L), it is not that easy for people with problems like dyslexia, or put it simply, when you’re tired. And the errors introduced because of not using constants are extremely hard to trace, most of the times you have to spend insane amounts of time going line by line of code and trying to find out why it doesn’t work. It’s just easier and more preventive to define constants and stop worrying about having to update all the hard coded values in the code later on if needed.
Interface inconsistency
This one doesn’t seem to get much attention but it is a really worrying area in PHP. Not sure if it’s something inducted by the language itself (just have a look at the string handling functions), but there’s an strange tendency in php applications to not to follow standards in what regards to function arguments and return types.
Sometimes functions return boolean values (true or false) or boolean like values (0 or 1), other times they use C style return types (0 for success and 1 or more for errors), which are clearly opposite to using boolean true for success and boolean false for error, since in PHP land 0==false and 1==true (unless you use the stricter === and !== operators, which unfortunately seems to not to be the case for most of the php developers around)
In some other cases, functions do not return anything, but instead use an array passed by reference to write results in it, looking somehow like this:
function my_function(&$return_array). I really dislike this practice, because unless you just saw the function definition for my_function, it’s impossible to distinguish if
my_function($my_array)is using $my_array as an input or output parameter.
It is better to agree on a style for return values, and use that style consistently in every function you write. Like: we’ll always return an object with two properties, error which is a boolean and represents whether the call failed or not, plus results which will contain data if we need to return something
.
With that, you know that you can always check the value of $res->error instead of having to check first if it exists or not, and then check the value itself. You get the idea…
Conclusion
Some of these issues would almost disappear if we had a really good php editor. While Eclipse does a decent job with the php plugins, most of the time, because of the way that php works, everything gets messed up and the autocomplete feature it’s useless because it can’t find the classes. Refactoring is something that almost relies on your short-term memory only (I think I saw something like this somewhere…
), and other issues depend on people being high or not when programming (remember the 300 lines switch) or being competent in PHP and software design in general.
And now is the time to go back to your application and look at it with critical eyes!
herotyc
First of all, very interesting article, I’m sure it took you a while to write.
The mistakes you’re talking are very common in people who never gone to university(or not to a serious one) and they have no idea or no experience about the principles of programming.
I’m talking about the Divide&Conquer, Keep it short and simple, Do not reinvent the wheel, etc.
Reinventing the wheel is common in php, if you dont take the time to look at the documentation and all the built-in functions available you may be end up writing your own array_sort. :D
Zarate
Great article. I sometimes do some of the mistakes, but my excuse is “i’m not an actual php developer” : )
Anyway, wouldn’t be great to gather this tips and some others is a single page? I don’t know, something like whatiknowabout/phpcrap : D
sole
Herotyc: it took me more to clean up the mess in the original app! :-)
Although I’m not sure that people which goes to university is less able to produce these monsters. I would say it is due to inexperience more than any other reason (inexperience whether it is in general software development or in php specifically).
Zarate: Are you asking for a translation for you-know-which website? ;)
Zarate
I am, i am!
I’ll be more than happy if you can take this as a base, add some other “basic” stuff (maybe db abstraction, or templates if you fancy them, i don’t know) and write a small, handy “book”. The problem is: it really takes time, belive me.
If you ever consider it, i volunteer for the translation (either to English or Spanish), review, or whatever you might need.
It also would be great asking the guys at Buayacorp * to add some stuff related to php security, as they keep a really active blog about it. But I’ve never even talked to them.
Cheers!
* http://www.buayacorp.com/
A dev
Hi
thanks for the article.
One comment about the “big switch”, I can’t see what it will change to split to code into functions…
A simple pair of Php comments like this is not enough to distinguish the different parts ?
switch($case) {
// *************
case 1:
// *************
}
Cups
More…
We in PHP land need more of this kind of article.
This and the wonderful metapundit’s LAG (loops are good).
http://metapundit.net/sections/blog?tag=php
Still, one nagging question that I haven’t chased down to my full satisfaction – why is a Constant not a Global then?
Thanks for the article – PHP-improvers (like me) should read it over breakfast every day!
Casey Wise
Very good article. I’m an informally (self-taught) trained developer and this kind of “thinking-about-programming-generally” literature is so valuable to guys like me. Your recommendation to use constants is one that I haven’t heard before. Thanks for taking the time to put down your thoughts!
S C
WRT returning errors. PHP 5 supports exceptions so use them. It seems like a waste of time to return an error code or return a success code with a result. Just default to always returning the result and throw an exception when an error occurs. It’ll clean your code up immensely. No more
if func1($arg) == true{
…
}else{
…
}
if func2($arg)….
Instead
try{
func1($arg)
func2($arg)
}catch(Exception $e){
//Do something meaningful
}
Gaetano Giunta
Not only is brackets galore more time consuming when recfactoring needs arise, but it is also much slower upon execution/optimization: php emits one opcode per assigment, versus one single opcode for the ‘good’ example.
About global vars, I do not in fact agree 100%. One of the great strengths of php is the shared-nothing model that it inherits straight out of the webserver, ie. every variable and function is teared down at the end of the request, slimming down considerably the chances for name collisions (unless you pull in 100 pages of libraries on every request, but I would also call that bad practice). My advice: use with care.
About object-vs-array: not having to think about private/protected/public is a real time saver: I used to spend literally hours wondering what the best access modifier would be for all my object members. Usually the class never got subclassed anyway, so the time was really wasted. If you worry about everyday usage of private members, just name them _memberx, and it will be evident to everybody using the object that it the member/method is meant to be private.
The real boon of using objects as “data storage” is, imho, the fact that when you get an object instead of an array, you are sure you cann access all its members without checking first for their existence.
While agree that solid conventions i calling are good, returning an error object is a bit overkill: if false (or null) is outside the domain of valid return values, just return that instead of an obj with two members. The advantage of returning an error obj is when you use a string instead of a boolean as error marker: ” = all ok, else you can log the very specific error msg
BTW: screw the editor, it just makes faster writing tons of boilerplate code that obfuscates the business logic. The real productivity gains in scripting languages come from the lazy-typing approach that makes editor life miserable.
Despite all the FUD that static-typing fans love to spread, using lazy typing usually does not make applications unsafer. In the end, it is much more important to give a function/parameter a meaningful name that to specify its type, eg:
function UserLogin(char* p1, char* p2)
vs.
function UserLogin($userName, $passWord)
sole
First of all, apologies for the late approval of comments! I was on holidays and haven’t been able to get internet access before.
S C: You’re exaggeratedly right! But there’s no way people can enjoy exceptions when in php4. And there’s a lot of legacy applications still not fully tested in php5, I’m afraid. That was the main reason I wrote that part… although I agree with you on that point.
Thanks all for the excellent comments!
metapundit
Cups said
>This and the wonderful metapundit’s LAG (loops are good).
(blushes) Thanks for the props cups! In answer to the question about constants – they are global in scope but the primary problem with global variables is not just that they are scoped globally but that they are mutable (changeable). This leads to side-effects laden programming – functions who set variables in the global scope and potentially conflict with other functions also setting variables in the global scope. Or consider a function foo() that has a typo when referencing something in $GLOBALS and clobbers a global variable some other function asdf() depends on. Tracking that bug down will not be easy since the point of failure will be in asdf() but the cause is in an unrelated function…
So constants are less bad because they are immutable. You can’t clobber them and they never change so using them doesn’t make your functions intertwined in the same way that using global variables does. That said, dynamically defining constants is a sin – conditionally executed define calls is potentially a bad code smell and dynamic ones like
$b = qwer();
$a=foo();
define($a,$b);
is also a really bad idea. Also, PHP constants are less useful than they could be since redefining a constant *fails silently*. If you are defining in a library you intend to be included in other code, be aware that define(‘MYCONST’,5); has no effect if MYCONST is already a constant…
Aaron Saray
Although for the most part, I agree with the advice about constants, I do constantly have this inner argument with myself: what happens when a constant fails (and I won’t pretend this is all my idea – this came up after reading metapundit’s post :) )
The issue I have is my usage of an intermediate script (and I don’t use a debugger – yah I know, bad Aaron – but I’m working on it!!) for processing with constants. If I forget to define a constant or it fails to define for some reason, PHP throws an error but evaluates the constant as a string I believe… right? At any rate, what happens if you have a situation like this:
–myscript.php–
//define (‘SECRET_PHRASE’, ‘chickennuggets’);
if (SECRET_PHRASE == $_GET['phrase']) {
print “you’re super”;
}
———————
For whatever reason, the define command is commented out. (Lets just for the moment ignore the huge programming holes this example allows – I just don’t want to make a more advanced example).
Now, we expect the following request to print “you’re super”;
http://server.com/myscript.php?phrase=chickennuggets
But with our missing define, this request is the real one that is successful
http://server.com/myscript.php?phrase=SECRET_PHRASE
Now, if this was a processing page and we were moving on, we’d never see the error either!!
This is the internal thing I struggle with when I use constants.
The solution I finally end up doing is this:
–myupdatedscript.php–
//define (‘SECRET_PHRASE’, ‘chickennuggets’);
if (defined(‘SECRET_PHRASE’) && SECRET_PHRASE == $_GET['phrase']) {
print “you’re super”;
}
———————
Not exactly elegant – but I try to put that in every conditional if I’m checking a constant.
sole
Aaron, you could enable E_NOTICE while developing, so that php warns you about uninitialized values such as undefined constants. I promise you, in the moment you begin working with E_NOTICE on, you find lots of potential situations such as the one you described.
From the manual:
More here: http://www.php.net/manual/en/ref.errorfunc.php#ini.error-reporting
Coder
Great article.
We have very similar views on code optimization
Jan Pieter Kunst
Cups wrote: “Still, one nagging question that I haven’t chased down to my full satisfaction – why is a Constant not a Global then?”
A constant is global, but it’s also immutable. So there is no need to worry about “keeping track of who’s modifying what”.
Ve
Great work! In particular, I’m glad you mentioned the “not-so-obvious” and “not-so-widely-discussed” (hence, easy to forget) ones – “Everything’s an array” and especially “You have to modify too many files”. Also, I second the advice on having E_NOTICE turned on while developing.
RE-coder
You missed a really obvious one, probably because it’s SO obvious – but it’s been a major headache in the million-line application I’m refactoring now. (OK, that was a little bit of an exaggeration; I didn’t actually start with a million lines, only 931,327. Close enough.)
Far, FAR worse than 300-line switch statements are 1000+ line nested if/if/elseif/else blocks. Try this one on for size (a real example taken from production code, just greatly simplified for clarity):
if ($a) {if($b) { } // 200+ lines
else if ($c) { } // 100+ lines
else doDefault();
} else if ($i) {
if($j) // 300+ lines
else if($c) { } // 100+ lines
else if ($k) {
if($l) { } // 250+ lines
else if($m) // 75 lines
}
} else if ($w) {
if($x) // { } 200+ lines
else if ($y) { } // 150 lines
}
else {
doDefault();
}
if($nothing_has_been_done_yet) { doDefault(); }
I tried to make the major problem there as obvious as possible. Condition
$c actually equated to ‘user does not have permission to view this page’.
However, there were certain conditions under which forbidden users were
inadvertently allowed in; for instance, when ($i && $j) equated to true,
or when ($w && !$x && !$y). This was a definite bug, one that had never
before been identified. And it would’ve been impossible find and fix
within the 1000+ lines of code without refactoring it.
My rewrite of that code ended up something like the following:
switch($foo) {case 'c': displayAccessForbiddenPage(); exit;
case 'b': doB(); break;
case 'j': doJ(); break;
case 'k': doK($l,$m); break;
case 'x': doX(); break;
case 'y': doY(); break;
default: doDefault();
}
I think it’s inarguable that the refactored code is far easier to read, understand, debug and modify than the original version.
Please pass the Excedrin, sole. Thank God I’m almost done with this beastie. I just fervently wish that the people who’d written this application had read your article before ever setting fingers to keyboard.
sole
O-M-G
I’m surprised you’ve survived – you’re a tough one!!!!!
The people which had written that … shouldn’t have set their fingers in the keyboard ever!
this makes for a true daily WTF, you should send it to them :-)
Dave Doolin
I’m working through reverse engineering my first php application right now. The code is… interesting…
I was prepared to accept something like “php is different” but after really digging in, no, php is just the same as every other programming language.
Part of it is php makes it easy to write a lot “working” code without knowing how anything actually works. Effective refactoring means you have to understand stuff like “scope.” Boring.