Sunday, August 15, 2010

Visualizing our example setup

Some people reading my post on setting up Apache, PHP, MySQL, Filezilla Server and PHPMyAdmin have seemed confused about how the whole setup works, or what the pieces are for. I put together this diagram a while back for a coworker and thought it might be helpful to readers here, too. Click the image for a larger version, suitable for zooming or printing out.



The basic idea that I want to convey is that web traffic has two sides: a client and a server. All conversation between them passes over a network. When someone visits a web page, their client, the browser, says "hey, send me your web page, please!" And on the other end of the network, a server gets the request and responds with the web page.

What's a server?


Before we go further, let's clear up some confusion about terms. Specifically, what is a server?

People use the term "server" to mean different things. In some cases, we mean "the program that responds to requests." With that meaning in mind, we'd call Apache a web server, MySQL a database server, and Filezilla Server an FTP server. In other cases, we mean "the computer that the server program(s) run on." With that in mind, we might say "our server sits on a rack in the server room."

In some setups, MySQL might be running on one server machine and Apache on another. In complex setups, both will be running on multiple machines.

The tutorial was all about setting up the various pieces of server software. If you set them up on your own computer, then visit your site from that same computer, then you refer to the server you're connecting to as "localhost" - local in the sense that it's on the same machine. In that case, the top part (the server) and the bottom part (the client) are really the same computer, and the network request never travels outside the computer you're working on. But of course most web sites you visit are served from elsewhere, and the network request might travel around the world.

Now, to look at the components we set up. Let's start by looking at the right side of the diagram, the HTTP side.

HTTP


This is the conversation between a browser and the server program - in our case, Apache. HTTP is the protocol they use to communicate. This just means that each of them expects requests and responses to be formatted a certain way - with headers, a body, and various sub-components of each.

When a user requests a URL, it's up to Apache to decide what to send back. In basic cases, if the user asks for "http://servername/foo.html", Apache will look for foo.html, read it, and send back its contents. If you ask for "bar.html" and no such file exists, it will tell you so. But you could have Apache do anything you like. For example, it could send back the same message no matter what URL was requested. Or it could map URLs that match certain patterns to certain responses, without there being a one-to-one relationship between requests and files on the server. (Ruby on Rails has a complex "routes" system that does this.)

Now imagine that the browser has requested a page, and Apache sends it back. What exactly does it send back? Just the HTML (in an HTTP "envelope", of course). The browser then looks at the HTML and says, "oh, I see that this lists some images, and a CSS file, and a Javascript file." It's then up to the browser (and the user's settings) whether it will ask for those other resources or not.

If the browser requests, say, an image on the page, the server sends it back, too. As far as the server is concerned, each request is totally separate. You can imagine the conversation like this:

Browser: I'd like this URL, please.
Server: Oh, hi, stranger! OK, here's some HTML.
Browser: (Hmmm, there's an image listed.) Server, please send me the image at this URL.
Server: Oh, hi, stranger! OK, here's that image.

Notice that the server doesn't "remember" the browser's previous request. In fact, it might even be a different server responding to the second request. This shows that HTTP is "stateless" - meaning that each request is separate, and doesn't depend on any previous ones. (This gets more complicated when you have logins and cookies and sessions, etc, but at bottom, it's the way HTTP works.)

In our example setup, when you ask for an HTML file, Apache just finds it and sends back the contents "raw" - without changing them. The same thing is true for CSS and Javascript files. But PHP is different.

Adding PHP


In our setup, we configured Apache to treat requests for .php files differently. When it sees that type of request, it asks php.exe to process the request and tell it how to respond.

PHP.exe will look at the file and execute any of the PHP instructions, like including headers from another file, running functions, pulling information from classes or objects, etc.

If the script requires it to, PHP will talk with a database server (in our case, MySQL) to get some of the information it needs to complete the request. For example, maybe it has to ask MySQL for a list of products, then output image tags and product descriptions for a catalog.

When it has finished running the script, PHP.exe gives the complete web page output to Apache, which sends it back to the browser, which can then ask for the images if it pleases.

One more thing to notice: as far as the browser is concerned, PHP and MySQL don't exist. It asks for a URL, and it gets back HTML (or an image, or other resource). It doesn't know or care how that gets put together. When it asks for "http://somesite.com/foo.php", it doesn't "know" that "foo.php" is a PHP script (and in fact, depending on how the server is configured, it might not be). The browser could ask for "foo.jpg" and get back HTML and it wouldn't be surprised, as long as the HTTP headers in the server's response said "what I'm sending you back is HTML."

In our setup, "foo.php" gets run when the browser requests a URL and the URL path is the file path to that script. But this is just a convenience for us, which makes it easy to see how the process works. We could make it more complicated, and as long as the browser could make a request and get a response it understood, it wouldn't know or care.

FTP


So where does Filezilla Server come in? If your server is on localhost, it may not. The basic purpose for Filezilla is to let you move files between the client and server computers. If they're the same computer, you can just copy the files there directly. But if not, you'll have to do it over the network, which is what FTP (File Transfer Protocol) is for.

Looking at the left side of the diagram, you'll see that there's another client-server relationship. This time, Filezilla Server is the server, and an FTP client (which could be Filezilla Client) is the client. If Filezilla Server is configured to use the same folder of files as Apache, you'll be able to change the content of your web site using this FTP conversation. To add or update a script, you'll upload it via FTP. The next time Apache gets a request for it, it will find it, get PHP to process it and send the results to the browser.