December 2006 Archives
This guy's post about returning to his original focus (which would be Perl, Python, and PHP) reminded me of a post I had long intended to publish. It's (briefly) titled this: "How to correctly write PHP code in conjunction with databases." Long title, but I can't think of anything shorter.
Foreward: My database of choice would be MySQL (again, it's a PHP focused post). Note, however, that anything I mention here is easily applicable to any database. I have done developement on PGSQL and even, gasp, MSSQL, and likewise I'm quite confident that everything I mention here as far as databases go should apply to pretty much anything.
PHP is probably the most used language on the web. Why? It is fairly easy to pick up, and has a huge amount of support on pretty much any server anywhere, never mind the monstrosities of pre-existing PHP scripts you can download from... pretty much everywhere. Forums, download sites, e-commerce sites, you name it, you can probably find a PHP script doing it. However, as a result, it's one of the most commonly attacked languages out there, and while it is incredibly easy to pick up, it's not as easy to write PHP properly and in a secure manner.
Uh-oh, I wrote "PHP" and "secure" in the same sentence. That's a bad thing. PHP has recently recieved a fair amount of flak for it's security, or lack thereof. One of the lead programmers for PHP whose sole purpose was really just to make PHP more secure, stepped down. This article is a good read on the matter, for anyone curious. From what I gather, he stepped down for one main reason, with another one on the side. The main reason would be the response time to security holes. The side reason would be a view that PHP was built insecure from the ground up, and try as he might, that wasn't going to change.
That debate really comes down to "it's the programmer's fault for coding that way" vs. "you shouldn't even allow that to happen, ever." And it's true: improperly coded PHP is a complete nightmare, riddled with security holes, both in code and in function, and it turns into a complete mess quickly.
Now, let's avoid all of that, shall we?
Point #1: Check all of your data for validity.
Now, this is no small order. PHP is typically used to take user input, process it, and then output it. Right there, you have three places a malicious user can attack your code: the input, the processing/storage of the data, and finally, the output.
The input is one of the most commonly checked and parsed things, and, in my opinion, wrongly so. Assume, for a moment, that everyone everywhere is a non-malicious user. Why shouldn't they be allowed to have a username of "admin' or 1=1--"? Sure, it's a little weird, but then again, so is the internet as a whole. End result, most people code to strip out less commonly used characters. Others call htmlspecialchars(), and other flat-out deny query execution when some special characters are detected. Oh, and then you have the swarm of mysql_real_escape_string() people, and as a result, every other line will be mysql_real_escape_string().
Now, admittedly, the last bit there (mysql_real_escape_string()) is a real and valid solution to the bigger problem of SQL injection. But, it is not optimal. Optimal would be using SQL statements in combination with stored procedures. Currently, only MySQL is supported directly with stored procedures, by ways of the mysqli functions. However, PHP's PDO project is coming along very nicely, and with that you can use SQL statements with pretty much any database out there. SQL statments are, to put it simply, a way of using SQL queries in a manner that guarantees you will never be vulnerable to the most common type of SQL injection attacks (which is generally input manipulation).
Now, before you touch the database (I know, I'm out of order), you can do other sanity checks on your data. Take for instance, the ever common URL, http://example/page.php?id=1. It's that ?id=1 part that is common, and further, non-specific to PHP. My question to you is this: why is it that you have countless webpages where ?id=1 is a security hole? You can make your data checks really, really easy here: is_int(). Really, think about it. You can completly skip all forms of data validation in this example, by checking the results of $_GET["id"] with is_int(). Is there ANY reason why $_GET["id"] should be anything but a number? No? So then why are you spending so much time checking for odd characters that shouldn't be there, when you could merely check the datatype and know for sure that it's legit?
This can also easily apply to strings, or to be more defined, UUIDs
(On that page, search for "UUID()", it's roughly 2/3 of the way down).
If you know that a UUID will always be in
theaaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee format, you could use a regular
expression to check that easily (not to mention to check that it only
contains a-Z and 0-9 chars).
Point #2: Proper database security and design
If you've ever installed a PHP script that used SQL, it asked you for a login to a SQL server of your picking. In my opinion, it should ask you for two seperate logins. "Why?" That's an easy answer.
As it stands now, any SQL username and password you provide to a script, generally has complete reign over the database. SELECT, UPDATE, DELETE, the works. This poses the question: Why should someone viewing a page (not editing, adding, removing, etc.) even have the rights to turn a query around and run a DELETE? Does that even make sense? I sure hope not.
The solution to the problem is this: have two users, with differing rights on the SQL server itself. The first user should really only have SELECT, the ability to run a stored procedure, check out a view or two, and... that's it. Even at that, stored procedures that only execute SELECT. I can't name one reason why an anonymous page view should use a connection to the SQL server with enough rights to run an UPDATE. It just makes no logical sense. The second user, however, still shouldn't have UPDATE rights: rather, it should only have the ability to execute stored procedures. Stored procedures protect your data, and can sanitize code everywhere. Again: there is no reason why an anonymous page view should even have the rights, let alone ability, to run an UPDATE or a DELETE query. So, um, please stop allowing such things. It will make the world a nicer place.
Point #3: Taking point #2 a step further, and seperating the code that writes from the code that reads.
PHP 5.0 introduced a lot of very useful object oriented capabilities. Take for example, a mock forum, with the following functions: $forum->view->thread(), $forum->update->thread(). Again, there is no reason at all why view->thread() should be calling update->thread(). Functions and classes can be marked as private, protected, and public: please do so. Before someone goes off and states "but end users can't execute PHP code that I didn't write!", think again. It can happen. It has happened before. It is preventable. Plus, this makes your code cleaner and enforces good coding habits all around.
Point #4: Check your SQL output
As yes, the ever popular login string of "admin' or 1=1--". Commonly used is poorly coded web scripts to gain a login as the user "admin", you can simply google around for this and see countless example of it pretty much everywhere.
But there's a bigger problem with this. Let's assume for a moment that someone used this on a poorly coded website and succeded with a login as user "admin." The real problem? The "or 1=1" bit. You just selected every single user in the database. Here's yet another sanity check, useful for such things: *_num_rows(). This is an incredibly simple check all around: if you're trying to log in one user, and you get four rows back, you know something somewhere is wrong. So.... why do you allow it?
All of the above can really come down to two main points: data validation, and not allowing the database rights, but rather abstracting the rights into stored procedures. Data validation is usually taken too far in the wrong direction (stripping characters, not allowing query execution), while database rights are completly overlooked. All that data validation should be, is simple sanity checks (like is_int() and *_num_rows()). Used in combination with SQL statments and stored procedures (with limited database access on top), you'll quickly come to see just how secure PHP can really be.
The biggest problem with PHP out there currently? People try to secure their scripts, and then they do it incorrectly. "But I stripped out all of the backslashes!..."
I maintain two websites for my company. By "maintain", I mean "run the server, and let someone else update them." I wrote the code for one of them, and the other is a static webpage which hasn't changed in years. Those are just the public ones, and I have plans to add several more internal websites as we expand.
Oh, "as we expand." Seeing as every site will have it's own T1 line, point to point T1 line, and linux/windows server combo, this means that I could host the public (and private) websites on a lot of IPs, and a lot of servers. However, I really really didn't want to have to maintain and update website code on multiple servers. The solution? Subversion.
As it stands now, we have company sites A, B, and C. C is merely a remote location without any servers, linked into it all with a point to point T1. Likewise, we're going to ignore that site completly. A is considered the "home" building, and I work in the B building. The B building actually has space for IT equiptment, and likewise that's where the large majority of it resides. Sites A and B each have a linux/windows server combo, but site B also has a decent sized RAID array. So, I picked the 1.2TB RAID array to house the actual subversion repositories.
I setup the repo server to be served over apache. I didn't bother with SSL because 1) the point to point T1 lines run IPSEC and 2) apache is configured to allow the login "server/server" from a hard-coded list of IPs (which would be the servers). The server login is not allowed write priviliges either. If you were to try to access the SVN repo from anywhere else, the server login would not work, and further I'm the only other one with a login. Seeing as I'm sitting about four feet away from the SVN server, I'm not worried about plaintext transmission (but I'll fix that when I get around to it).
The repo itself contains 1) all of the needed apache vhost config files, 2) htaccess files and 3) all of the actual website data. Due to items one and two, in httpd.conf I can throw in the line "Include /srv/conf/vhosts/*" and never worry about having to configure apache ever again. Likewise, if I really screw something up, it's easy to revert all of the servers at once to a working configuration.
Logging is still one thing I need to figure out. Currently, the vhosts are just set not to log anything at all. I need to think up a way to store the logs locally on each server, yet still manage to generate statistics for all of the servers combine. The best I can come up with so far is a daily rotation of the logs, only at the end the previous logs are shipped off to be stored somewhere (and likewise merged together, and then analyzed). Unless anyone else has any better ideas, I'll probably wind up doing that in the near future.
The servers have a cron job that runs nightly, which will execute `svn up` in /srv, and then /etc/init.d/apache2 reload. That's it. That's really all that there is to it.
Now I have a versioned setup for multiple servers. Further, it's rather easy to add additional servers: USE="mysql apache2" emerge apache php; cd /; svn co http://svn/srv. Tada. All done, just like that.
I would like to point out, however, that there is a reason I did not use NFS for this. Firstly, that's a single point of failure over an already incredibly loaded point to point T1 line. That's also the second reason: less traffic going over the links that would be better used to serve people files as they work through the day. Lastly, I just don't like NFS much.
Plus, it allows me to update any aspect of the websites from any computer. Never mind the local copy I can keep on my laptop and desktop at work.
As a direct result of installing subversion to maintain a few websites, I've simplified the management that I have to do, increased my ability to effectively admin multiple servers at once, and cut back on bandwidth running inbetween the two sites.
Subversion: what do you want to do today?
