Build Scalable Web 2.0 Sites with Ubuntu, Symfony, and Lighttpd

One of our main concerns while developing ThemBid.com was to build a website that would be easy to use but could scale as the project grew. Fortunately, I’ve been writing web applications for a long time, and along the way I’ve dealt with issues ranging from modifying the Linux kernel IP stack all the way to creating Web 2.0 Photoshop icons and CSS rounded corners. In this post, I will discuss the steps we took to optimize our initial ThemBid.com server for scalability using Ubuntu, Lighttpd, PHP, and Symfony; most of them apply to any Web 2.0 application running on Linux. In the last part of this article I outline an upgrade plan for scaling your application as it grows.

  1. Hardware - Initially we chose a decent server, since we anticipated a few thousand users after the launch: a dual-core machine with 2GB of RAM and two 36GB 10K RPM SCSI drives in RAID 1. We got this server from Layeredtech because, in our past experience, their network has been reliable.
  2. Low-Latency DNS - Since we planned to expand rapidly, we wanted a low-latency DNS service: it helps the site load fast, adds reliability, and supports our future expansion and redundancy plans. We considered using our registrar’s DNS, but it did not offer IP anycast, so we looked elsewhere and chose DNS Made Easy, which was affordable and had very low latencies. You can use DNS Stuff to test them.
  3. OS - Since Linux is widely supported by hosting providers and proven to scale, we always assumed we would use it. Our programmers’ environment is a machine running Ubuntu with two monitors: Eclipse spans one and a half of them, a terminal runs in the background, and a browser fills the remaining half of the second monitor. We chose Ubuntu for the server as well, and with the extended maintenance that Ubuntu offers for its server distribution (the Long Term Support release), it was an easy choice. Here is a screenshot of our development environment, in case you are wondering.
    [Image: development environment screenshot]
  4. Web Server - ThemBid.com serves as many static pages and images as it does dynamic pages, so we evaluated different servers. We initially looked at Apache with mod_php to serve dynamic pages, plus a separate web server for static content. Then we took a look at Lighttpd, and Lighttpd with FastCGI suits our needs perfectly: Lighttpd handles all static images itself and passes the dynamic PHP page requests to FastCGI. Those of you familiar with low-level event-driven interfaces such as kqueue and epoll will know why Lighttpd was an easy choice: it was written with speed in mind, and its developers have done an excellent job. To install:
    
    # apt-get install lighttpd
    
    Then enable FastCGI on lighttpd with:
    
    # lighty-enable-mod fastcgi
    
    Then edit /etc/lighttpd/conf-enabled/10-fastcgi.conf so that .php requests are handed to the PHP CGI binary:
    
    fastcgi.server    = ( ".php" =>
                          ( ( "host" => "127.0.0.1",
                              "bin-path" => "/usr/bin/php-cgi",
                              "port" => 9000
                            )
                          )
                        )
  5. Database - Since we have been developing MySQL-based websites for a while and MySQL has scaled really well for us, we decided this was not the time to test the waters with PostgreSQL. MySQL slaves or a MySQL cluster would be enough for the scalability plan described later. To install:
    
    # apt-get install mysql-client-5.0 mysql-common mysql-server mysql-server-5.0
  6. Development Framework - The first framework we looked at was Rails, since it is currently one of the most talked-about frameworks. We didn’t use it for two main reasons: we were already familiar with PHP and knew it can scale (Digg and Yahoo are examples), and it is easier to find experienced PHP programmers than Ruby programmers. After choosing PHP, we debated between Symfony and CakePHP. This was a tough choice, but when we made the decision Symfony had better documentation and seemed more active. In addition, Yahoo had just released an application based on the Symfony framework, which helped tip our choice toward Symfony. I am glad we made that decision! To install PHP (we will need some of these packages later):
    
    # apt-get install php5-cgi php5-cli php5-common php5-gd php5-mysql php5-mysqli \
        php5-sqlite php5-dev make
    
    Configure Lighttpd for Symfony in /etc/lighttpd/lighttpd.conf as follows,
    assuming the Symfony project lives in /var/www/www.thembid.com (mod_rewrite
    must be enabled in server.modules for url.rewrite-once to work):
    
    $HTTP["host"] == "www.thembid.com" {
      server.document-root = "/var/www/www.thembid.com/web"
      url.rewrite-once = (
        "^/(.*\..+(?!html))$" => "$0",
        "^/(.*)\.(.*)"        => "$0",
        "^/([^.]+)$"          => "/index.php/$1",
        "^/$"                 => "/index.php"
      )
    }
  7. AJAX - Where possible, AJAX should be used to improve the user experience while navigating the site. Another advantage is that fewer full pages need to be served, since only the content needed to reflect an action on a page is retrieved. Symfony makes this task extremely easy with the help of the JavaScript library Prototype.
  8. Caching - One of the most important ways to make an application scalable is smart caching: avoiding repetitive, redundant work so that the available resources go to high-priority tasks as soon as possible. The following are some of the caches we introduced in our application.
    • PHP Cache - By default, every time a PHP script is requested, PHP must compile the script and then execute the compiled code. That compilation is wasted work if the script hasn’t changed. A PHP accelerator optimizes and compiles each script once, then caches it in its compiled state. Initially we tried XCache, but we ran into an issue with it in Symfony, so we went with eAccelerator instead. To install it:
      
      # wget http://bart.eaccelerator.net/source/0.9.5/eaccelerator-0.9.5.tar.bz2
      # tar xvjf eaccelerator-0.9.5.tar.bz2
      # cd eaccelerator-0.9.5/
      # phpize
      # ./configure
      # make install
      # mkdir /var/tmp/eaccelerator/; chmod 777 /var/tmp/eaccelerator/ 
      
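      Then add the following to /etc/php5/cgi/php.ini (the 20051025 directory in the zend_extension path depends on your PHP build; use the path that make install prints):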
      
      zend_extension="/usr/lib/php5/20051025/eaccelerator.so"
      eaccelerator.shm_size="0"
      eaccelerator.cache_dir="/var/tmp/eaccelerator"
      eaccelerator.enable="1"
      eaccelerator.optimizer="1"
      eaccelerator.check_mtime="1"
      eaccelerator.debug="0"
      eaccelerator.filter=""
      eaccelerator.shm_max="0"
      eaccelerator.shm_ttl="0"
      eaccelerator.shm_prune_period="0"
      eaccelerator.shm_only="0"
      eaccelerator.compress="1"
      eaccelerator.compress_level="9"
      
      Then restart Lighttpd:
      
      # /etc/init.d/lighttpd restart
    • Object and Content Cache - Object and content caching eliminates the repetitive work of querying the database and recomputing the same result when the underlying data hasn’t changed. Luckily, Symfony already comes with multiple ways to implement caching. The main ones are file-based caching, shared-memory caching (using a PHP accelerator), and SQLite caching (we will release a Memcached-based plugin soon). We anticipated heavy use of this cache, so it would hold a considerable number of items. The file system cache was out of the question, since we didn’t want our system spending all its time on file system operations reading and writing these files. We wanted a cache that would live solely in memory for fast access. SQLite is really fast, but it still resides on disk, and the Memcached plugin had not been written yet. So we decided to use SQLite for its speed but put it on a memory-based file system, getting SQLite’s speed while the data sits in RAM. To do this we did the following:
      
      # mount -t tmpfs -o size=100m tmpfs /var/www/www.thembid.com/cache/
      
      To make the mount persist across reboots, add this line to /etc/fstab instead:
      
      tmpfs   /var/www/www.thembid.com/cache/ tmpfs defaults,size=100m 0 0
    • Client Side Cache - Most JavaScript files, style sheets, and images rarely change, so we wanted to tell browsers not to re-download them for a while. Lighttpd comes with a module called mod_expire that can be configured to do just that. To configure it, enable mod_expire in server.modules in /etc/lighttpd/lighttpd.conf and edit the host section to look something like the following:
      
      $HTTP["host"] == "www.thembid.com" {
        server.document-root = "/var/www/www.thembid.com/web"
        url.rewrite-once = (
          "^/(.*\..+(?!html))$" => "$0",
          "^/(.*)\.(.*)"        => "$0",
          "^/([^.]+)$"          => "/index.php/$1",
          "^/$"                 => "/index.php"
        )
      
        expire.url = (
          "/sf/"     => "access 1 days",
          "/js/"     => "access 1 days",
          "/css/"    => "access 1 days",
          "/images/" => "access 3 days"
        )
      }
  9. Monitoring - To tell when it is time to upgrade the server, you must monitor its resources closely. A quick and easy way to do this is with Munin. To install:
    
    # apt-get install munin munin-node
    
    Then change /etc/munin/munin.conf:

    htmldir /var/www/www.thembid.com/web/status
    [thembid.com]
        address 127.0.0.1
        use_node_name yes

    Then visit yoursite.com/status and you will be able to closely monitor your resources. From there you can determine when and what to upgrade. Example:

    [Image: Munin memory graph]

    In addition, a web log analyzer such as AWStats should be installed to track hits and request types. This shows which parts of your site get the most traffic and therefore deserve the most attention for efficiency.

  10. Scalability Plan - The Munin graphs will dictate the upgrade plan for scaling your site. The first upgrade should be moving the MySQL operations to a new server, which frees up a lot of memory, disk, and CPU on the web server. Once you have more than one machine, it makes sense to move to a Memcached distributed memory cache. If the database becomes a bottleneck, a MySQL master/slave configuration can be added, or a MySQL cluster if it’s affordable. If you need more Lighttpd servers, a load balancer such as LVS can be added in front to distribute the load.
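To make the url.rewrite-once rules from steps 6 and 8 easier to reason about, here is a small Python sketch that applies the same four patterns in order and stops at the first match, as rewrite-once does. It assumes the dots in the rules are meant as escaped, literal dots (\.) and that Lighttpd's PCRE matching behaves like Python's re module for these simple patterns. The function name rewrite_once is our own.

```python
import re

# The four url.rewrite-once rules from the Symfony host block, in order,
# with literal dots escaped. Lighttpd's $0/$1 become \g<0>/\g<1> here.
RULES = [
    (r"^/(.*\..+(?!html))$", r"\g<0>"),    # path with an extension: unchanged
    (r"^/(.*)\.(.*)", r"\g<0>"),           # any other path containing a dot: unchanged
    (r"^/([^.]+)$", r"/index.php/\g<1>"),  # no dot: Symfony's front controller
    (r"^/$", r"/index.php"),               # site root
]


def rewrite_once(path):
    """Apply only the first matching rule, as url.rewrite-once does."""
    for pattern, replacement in RULES:
        match = re.match(pattern, path)
        if match:
            return match.expand(replacement)
    return path  # no rule matched: leave the path alone


if __name__ == "__main__":
    for path in ["/", "/css/main.css", "/project/show/id/42", "/index.html"]:
        print(f"{path:22} -> {rewrite_once(path)}")
```

Running it shows extension-less paths going to Symfony's front controller while dotted paths are served directly. It also shows that the (?!html) lookahead never rejects anything, because it is evaluated right before $, at the end of the string. As a result, /index.html is passed through unchanged like any other file.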
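eAccelerator itself is a C extension that works on PHP opcodes. The idea behind it is to compile once and recompile only when the file's modification time changes (eaccelerator.check_mtime="1"). That idea can be sketched in a few lines of Python, with Python's built-in compile() standing in for the PHP compiler. The class and its names are hypothetical, for illustration only.

```python
import os
import tempfile


class CompiledScriptCache:
    """Toy analog of a PHP accelerator: keep each script's compiled code in
    memory and recompile only when the file's mtime changes."""

    def __init__(self):
        self._cache = {}       # path -> (mtime, compiled code object)
        self.compilations = 0  # how many times we actually compiled

    def get(self, path):
        mtime = os.stat(path).st_mtime
        cached = self._cache.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # cache hit: no read, no compile
        with open(path, encoding="utf-8") as source:
            code = compile(source.read(), path, "exec")
        self._cache[path] = (mtime, code)
        self.compilations += 1
        return code


if __name__ == "__main__":
    fd, script = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "w") as f:
        f.write("result = 6 * 7\n")
    cache = CompiledScriptCache()
    for _ in range(3):
        namespace = {}
        exec(cache.get(script), namespace)  # three runs, one compilation
    print(namespace["result"], cache.compilations)  # 42 1
    os.remove(script)
```

The mtime check costs one stat() per request. Turning check_mtime off skips even that, but then changed scripts are not picked up until the cache is cleared.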
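The SQLite-on-tmpfs cache from step 8 boils down to a key/value table in a single file. The Python sketch below is not Symfony's SQLite cache class; its table layout and method names are hypothetical. It illustrates the point made in the reply to Sebastian in the comments: removing a whole cache namespace is one DELETE statement instead of a recursive directory walk. In production the path would point at a file on the tmpfs mount (the file name in the comment is hypothetical); ":memory:" keeps the demo self-contained.

```python
import sqlite3
import time


class SQLiteCache:
    """Minimal key/value cache stored in one SQLite database file."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, data BLOB, expires REAL)"
        )

    def set(self, key, data, ttl=3600):
        with self.db:  # commits the transaction on success
            self.db.execute(
                "INSERT OR REPLACE INTO cache (key, data, expires) VALUES (?, ?, ?)",
                (key, data, time.time() + ttl),
            )

    def get(self, key, default=None):
        row = self.db.execute(
            "SELECT data, expires FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None or row[1] < time.time():
            return default  # missing or expired
        return row[0]

    def remove_namespace(self, prefix):
        """Drop every key that starts with `prefix` in a single statement."""
        with self.db:
            self.db.execute(
                "DELETE FROM cache WHERE substr(key, 1, ?) = ?",
                (len(prefix), prefix),
            )


if __name__ == "__main__":
    # On the server this might be "/var/www/www.thembid.com/cache/objects.db".
    cache = SQLiteCache(":memory:")
    cache.set("project/list/page1", "<ul>...</ul>")
    cache.set("user/42/profile", "<div>...</div>")
    cache.remove_namespace("project/")
    print(cache.get("project/list/page1"), cache.get("user/42/profile"))
    # None <div>...</div>
```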
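The expire.url table in the client-side cache step maps URL prefixes to lifetimes measured from the time of access. The Python sketch below approximates the headers that result: an Expires date and a Cache-Control max-age. The function name is hypothetical, and the exact header formatting Lighttpd emits may differ slightly.

```python
import time
from email.utils import formatdate

# URL prefix -> lifetime in seconds, mirroring expire.url
# ("access 1 days" = 86400 s, "access 3 days" = 259200 s).
EXPIRE_URL = {
    "/sf/": 1 * 86400,
    "/js/": 1 * 86400,
    "/css/": 1 * 86400,
    "/images/": 3 * 86400,
}


def expire_headers(path, accessed_at):
    """Caching headers for `path` requested at Unix time `accessed_at`;
    an empty dict means no rule applies and no headers are added."""
    for prefix, lifetime in EXPIRE_URL.items():
        if path.startswith(prefix):
            return {
                "Expires": formatdate(accessed_at + lifetime, usegmt=True),
                "Cache-Control": f"max-age={lifetime}",
            }
    return {}


if __name__ == "__main__":
    print(expire_headers("/images/logo.png", time.time()))
    print(expire_headers("/index.php", time.time()))  # {}
```

After restarting Lighttpd, you can compare this with the real headers by running curl -I against one of the static URLs.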
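Munin can also graph application-specific values through plugins: small executables that print their graph configuration when called with the argument config and print their current values otherwise. The Python plugin below is a sketch built on assumptions: it graphs how full the tmpfs cache mount from step 8 is, and the path, graph title, and category are our own choices. If the directory is missing, it reports U, Munin's marker for an unknown value.

```python
#!/usr/bin/env python3
"""Hypothetical Munin plugin: usage of the tmpfs-backed Symfony cache."""
import os
import shutil
import sys

CACHE_DIR = "/var/www/www.thembid.com/cache"  # tmpfs mount point from step 8


def config():
    """Graph definition printed when Munin runs the plugin with "config"."""
    return "\n".join([
        "graph_title Symfony cache (tmpfs) usage",
        "graph_vlabel bytes",
        "graph_category webapp",
        "used.label used",
        "size.label size",
    ])


def fetch(path=CACHE_DIR):
    """Current values; "U" tells Munin the value is unknown."""
    if not os.path.isdir(path):
        return "used.value U\nsize.value U"
    usage = shutil.disk_usage(path)
    return f"used.value {usage.used}\nsize.value {usage.total}"


if __name__ == "__main__":
    print(config() if sys.argv[1:2] == ["config"] else fetch())
```

To try it, copy it into /etc/munin/plugins/, make it executable, and restart munin-node. The new graph should then appear on the status page after the next update.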
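For the master/slave option in step 10, the application (or a proxy in front of MySQL) has to send writes to the master and reads to the slaves. The Python sketch below shows one simple policy: it routes by the statement's first keyword and spreads reads round-robin. The class name and host names are placeholders, not part of Symfony or MySQL.

```python
import itertools


class ReadWriteRouter:
    """Route SQL statements: writes go to the master, reads rotate
    round-robin across the slaves (or the master if there are none)."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE",
                   "CREATE", "ALTER", "DROP", "TRUNCATE"}

    def __init__(self, master, slaves):
        self.master = master
        self._readers = itertools.cycle(slaves or [master])

    def host_for(self, sql):
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._readers)


if __name__ == "__main__":
    router = ReadWriteRouter("db-master", ["db-slave1", "db-slave2"])
    for sql in ["SELECT * FROM project", "UPDATE bid SET amount = 10",
                "SELECT * FROM user"]:
        print(router.host_for(sql), "<-", sql)
```

Classic MySQL replication is asynchronous, so a slave can briefly lag behind the master. A real setup would usually send a user's reads to the master right after that user writes; this sketch leaves that out.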
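When the cache moves to Memcached across several machines (step 10), each key has to be mapped to one server. A common technique for this is consistent hashing, which keeps most keys on the same server when a server is added or removed, so the cache doesn't go cold all at once. The Python sketch below is a generic implementation. Whether a given PHP Memcached client uses this exact scheme is an assumption you should check in its documentation, and the server addresses are placeholders.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Map keys to servers on a hash ring. Each server gets `replicas`
    points on the ring; a key belongs to the first point at or after
    its own hash (wrapping around at the end)."""

    def __init__(self, servers, replicas=100):
        if not servers:
            raise ValueError("need at least one server")
        points = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._hashes = [h for h, _ in points]
        self._servers = [s for _, s in points]

    @staticmethod
    def _hash(value):
        # First 8 bytes of MD5 as an integer: stable across processes,
        # unlike Python's built-in hash() for strings.
        return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

    def server_for(self, key):
        index = bisect.bisect_left(self._hashes, self._hash(key))
        return self._servers[index % len(self._servers)]


if __name__ == "__main__":
    three = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
    old = ConsistentHashRing(three)
    new = ConsistentHashRing(three + ["10.0.0.4:11211"])
    keys = [f"project:{n}" for n in range(10000)]
    moved = sum(old.server_for(k) != new.server_for(k) for k in keys)
    print(f"{moved / len(keys):.0%} of keys moved after adding a fourth server")
```

With a naive hash(key) % number_of_servers scheme, going from three to four servers would remap about three quarters of the keys. With the ring, the moved share should land near one quarter, and every moved key goes to the new server.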

As you can see, there are many enhancements that can make a website scalable using Ubuntu, Lighttpd, MySQL, PHP, eAccelerator, SQLite, and Memcached while developing with Symfony. Keep in mind that this post does not cover making your servers fault tolerant. That will be the topic of my next post, once we have done it for ThemBid.com.

Isaac Saldana

22 Responses to “Build Scalable Web 2.0 Sites with Ubuntu, Symfony, and Lighttpd”

  1. ruzz Says:

    minor correction: apt-get install mysql-client-5.0 mysql-common mysql-server mysql-server-5.0

    and you might want to refer people to http://ubuntuguide.org/wiki/Ubuntu_Edgy#How_to_add_extra_repositories

    or they will get a bevy of errors.

    great post though. :)

  2. isaldana Says:

    Thank you ruzz, the article has been updated!

  3. ThemBid.com » Blog Archive » Successful Blogging with Details Says:

    […] majority of the visits came from this blog post. If you browse through our blog posts you will see that the main difference between this post and […]

  4. Sebastian Says:

    Hi,

    nice article, thank you!

    Just one question: why did you use a SQLite cache on a memory based filesystem instead of directly using file cache on a memory based filesystem?

    Or aren’t you caching html fragments?

    Regards,
    Sebastian

  5. isaldana Says:

    Hi Sebastian,

    Good question. When you have a lot of elements in the cache and you are using a file system cache, it creates a deep file system structure. This will cause the OS to spend a considerable amount of time in file system operations. For example, it needs to create the directories if they don’t exist, it has to check that you have permission on the directories before opening a file, etc. Also, when you are deleting cache namespaces, it has to recursively delete directories, and that is time consuming. You can see this is the case with database systems such as MySQL, which keep all their data in a few files instead of a bunch of little files.

  6. John P Says:

    Very nice article, Please keep up the good work. I’ll read your blog more often from now on.

  7. xjfuf Says:

    Hello, you have a nice site, good LUCK!

  8. Shane Says:

    Wow, this was a great read - thanks for sharing your experiences and architecture:)

    Obviously, you love symfony. Can you tell me how many developers and how much time it took to develop ThemBid?

    Thanks!

  9. isaldana Says:

    Hi Shane,
    I am glad you liked it. We started development in late December 2006, with one developer/sys admin and a part-time graphics designer. A lot of the work was also on design, business issues, etc. We had a full demo by March 2007. We are still adding a bunch of features.

  10. » web 2.0 backends Says:

    […] link: http://blog.thembid.com/index.php/2007/04/05/build-scalable-web-20-sites-with-ubuntu-symfony-and-lig... […]

  11. ThemBid.com » Blog Archive » The WikiVersity Uses Our Blog as a Reference for System Administration Says:

    […] the WikiVerisity.org, there is an entry for System Administration that uses a blog post from our lead developer as a reference. […]

  12. Steve Says:

    I was wondering if you have any news on the memcached?

    You mentioned you were planning to release this plugin “soon” .. I was wondering what soon meant :-)

    We have a need for just such a symfony plugin. I think that any efforts our team spends on that could be better geared toward helping you with what you may already have, rather than starting one up from scratch.

    Any info you can offer regarding that would be greatly appreciated.

    Thanks,
    -steve

  13. Steve Says:

    And when I wrote:

    “I was wondering if you have any news on the memcached?”

    I meant:

    “I was wondering if you have any news on the memcached plugin?”

    Sorry, the coffee was still brewing while I wrote that …

    -steve

  14. todd Says:

    Can you please explain what you mean by “extended software maintenance”. I did a search and I didn’t find that phrase connected with Ubuntu.

    thanx

  15. isaldana Says:

    @todd

    I was referring to the Long Term Support (LTS) version. You can read more about the LTS release here http://www.ubuntu.com/news/606released

  16. Scalable web architectures » Blog Archive » Talks and slides from various web 2.0 architects Says:

    […] Build Scalable Web 2.0 Sites with Ubuntu […]

  17. Scalable web architectures » Blog Archive » Talks and slides from various web architects Says:

    […] “Build Scalable Web 2.0 Sites with Ubuntu Symfony and Lighttpd” […]

  18. Maski Says:

    Pretty good article.. thanks…

    By the way, this also works perfectly with Debian etch/Lenny with wordpress. Got my ram free to a 60% ,.. the sites I was tuning are now blazing fast!!! thanks!

  19. Drivingsouth Says:

    Did you consider Zend against Symfony? I’m asking not because I like it but because I don’t know it and am currently considering it.

  20. Martin B. Says:

    Hey.. I was wondering if you ever released the memcached symfony plugin ?

    If not, I’d be willing to help you complete it. Contact me.

  21. isaldana Says:

    @Drivingsouth,
    The Zend framework was in its early stages and we needed more than it had at that time.

    @Martin B,
    I was, but it has already been done: http://trac.symfony-project.com/wiki/sfMemcachePlugin

  22. Tony Says:

    Very nice article Isaac,

    We are new to symfony and the more we get to know it the more we are loving it…

    We are thinking of building a multimedia web page, something like YouTube, but we will provide the content, so not just anyone can upload.

    We are thinking of using LVS on top of n real Linux servers (some to handle requests and some to handle video streams), symfony to handle the frontend and backend, and some cache. We are thinking of using symfony’s sfMemcachePlugin since URLs won’t change that much.

    Can you give us some advice?

    Thanks in advance,
    Tony
