Note: This post is extremely advanced. If you’re not a technical person then we suggest you email it to your developer so that he can implement the suggestions below. If you don’t have a developer you can call us and we’ll take care of the optimization of your Joomla website.
We had a small debate amongst us when giving a title for this post – some of us thought that the “200 times” may seem like a marketing gimmick (something that we don’t do nor care about at itoctopus) – but at the end, there was a consensus to go with this title because there was no exaggeration whatsoever, it was true – and in this post we’ll explain how we did it!
A new client called us last Thursday (today is Tuesday) and told us that a local development company has just finished migrating his company’s website (the website was a hospital website with over 40,000 articles) to Joomla 2.5. Everything worked fine, except that the website was extremely slow – pages were literally taking ages to load. In fact, each page was taking 75 seconds to fully display on the browser. Needless to say, our client refrained from moving the website to production until the speed issue was resolved. As you have probably guessed, the client asked for our help to solve the problem…
Well, we first informed the client that by default, a Joomla 1.5 website is faster than a Joomla 2.5 website (this is a fact that the creators of Joomla seem to [unfortunately] deny). The reason for this is that Joomla 2.5 builds on some of the bad habits of Joomla 1.5, while it can do without them. Additionally, the developers of Joomla 2.5 seem to have forgotten that there are some very large websites using Joomla (OK, we’re done bashing the Joomla team).
We then informed the client that for Joomla to work with such an enormous amount of data, 2 factors have to be optimized: the hardware and the application.
Let’s start with the hardware first (the hardware is the least important of the two, but it’s the easiest to get done). A high traffic large Joomla website must be hosted on a very high end server, with the following specifications: 64GB of RAM, a 16 core high performance CPU, and an SSD drive. The large amount of RAM is needed to prevent disk swapping in case there are a large number of simultaneous visitors. The 16 core high performance CPU is necessary to handle the load that a huge Joomla website throws on MySQL. The SSD significantly reduces the access overhead to the filesystem and to the database.
As previously stated, the hardware is the easy part, because it’s just a matter of procuring the server with the above specifications (be aware that such as server can run you about $1,500 monthly).
As for optimizing the application, we informed our client that what we meant by that was modifying Joomla’s core to better adapt to the needs of his particular website. Our new client asked us to do the application optimization first, and here’s how we did it…
First of all, we printed all the queries that were processed on each and every page. We did that by adding an echo state to the setQuery function which is in the JDatabase class (in the file database.php which his located under the libraries/joomla/database folder). Not only that, we also printed the time of the start time of each and every query. Here’s the code that we added to the function setQuery to achieve this:
if ($_SERVER['REMOTE_ADDR'] == '[OUR IP]'){ //this if statement ensures that this debugging is only visible for our IP
$strTime = microtime();
$arrTime = explode(' ', $strTime);
echo("\n\n".$arrTime[1] + $arrTime[0]."\n");
echo($query."\n\n");
flush(); //flush allows us to display the query that is being executed
ob_flush();
}
Note: We are assuming, in our above code, that the non-database time to generate the page is negligible, which is the case in almost all Joomla websites.
Once we added the above code, we went to the homepage of the website (which was pointing to the featured items page) and we were surprised that a few Joomla queries caused a lot of slowdown. In fact, these Joomla queries, combined, took about 75 seconds to be done. One of them, in particular, took about 30 seconds and was being run twice (the same exact query). It was this one:
SELECT a.id, a.title, a.alias, a.title_alias, a.introtext, a.checked_out, a.checked_out_time, a.catid, a.created, a.created_by, a.created_by_alias, CASE WHEN a.modified = 0 THEN a.created ELSE a.modified END as modified, a.modified_by, uam.name as modified_by_name,CASE WHEN a.publish_up = 0 THEN a.created ELSE a.publish_up END as publish_up,a.publish_down, a.images, a.urls, a.attribs, a.metadata, a.metakey, a.metadesc, a.access, a.hits, a.xreference, a.featured, LENGTH(a.fulltext) AS readmore,CASE WHEN badcats.id is not null THEN 0 ELSE a.state END AS state,c.title AS category_title, c.path AS category_route, c.access AS category_access, c.alias AS category_alias,CASE WHEN a.created_by_alias > ' ' THEN a.created_by_alias ELSE ua.name END AS author,ua.email AS author_email,contact.id as contactid,parent.title as parent_title, parent.id as parent_id, parent.path as parent_route, parent.alias as parent_alias,ROUND(v.rating_sum / v.rating_count, 0) AS rating, v.rating_count as rating_count,c.published, CASE WHEN badcats.id is null THEN c.published ELSE 0 END AS parents_published FROM #__content AS a LEFT JOIN #__content_frontpage AS fp ON fp.content_id = a.id LEFT JOIN #__categories AS c ON c.id = a.catid LEFT JOIN #__users AS ua ON ua.id = a.created_by LEFT JOIN #__users AS uam ON uam.id = a.modified_by LEFT JOIN ( SELECT contact.user_id, MAX(contact.id) AS id, contact.language FROM #__contact_details AS contact WHERE contact.published = 1 GROUP BY contact.user_id, contact.language) AS contact ON contact.user_id = a.created_by LEFT JOIN #__categories as parent ON parent.id = c.parent_id LEFT JOIN #__content_rating AS v ON a.id = v.content_id LEFT OUTER JOIN (SELECT cat.id as id FROM #__categories AS cat JOIN #__categories AS parent ON cat.lft BETWEEN parent.lft AND parent.rgt WHERE parent.extension = 'com_content' AND parent.published != 1 GROUP BY cat.id ) AS badcats ON badcats.id = c.id WHERE a.access IN (1,1) AND c.access IN (1,1) AND CASE WHEN badcats.id is null THEN a.state ELSE 0 END = 1 AND (a.publish_up = '0000-00-00 00:00:00' OR a.publish_up <= '2013-06-10 04:53:14') AND (a.publish_down = '0000-00-00 00:00:00' OR a.publish_down >= '2013-06-10 04:53:14') GROUP BY a.id, a.title, a.alias, a.title_alias, a.introtext, a.checked_out, a.checked_out_time, a.catid, a.created, a.created_by, a.created_by_alias, a.created, a.modified, a.modified_by, uam.name, a.publish_up, a.attribs, a.metadata, a.metakey, a.metadesc, a.access, a.hits, a.xreference, a.featured, a.fulltext, a.state, a.publish_down, badcats.id, c.title, c.path, c.access, c.alias, uam.id, ua.name, ua.email, contact.id, parent.title, parent.id, parent.path, parent.alias, v.rating_sum, v.rating_count, c.published, c.lft, a.ordering, parent.lft, fp.ordering, c.id, a.images, a.urls ORDER BY CASE WHEN a.publish_up = 0 THEN a.created ELSE a.publish_up END DESC , a.created;
In case you’re wondering why the query was being run twice, it was because the first time the query was getting all the rows (yes, all the rows), and the second time it was running by a function called getTotal in order to know how many rows there were for the pagination. An immediate optimization idea (which we did) was to suppress the redundancy and pass the getTotal function the number of rows by simply issuing the function count() on the resulting array from the first query.
Now, we reduced the page loading time by 30 seconds (we eliminated one of the two query calls), but there was another major problem – that query was taking 30 seconds! Now, before we continue, let us briefly explain what this query is and where it is…
This query is a generic query used by mainly com_content views that display a list of items in one or more categories. It resides in the articles.php model file which is located under the components/com_content/models folder.
From a distance, the above query might look all fine and dandy, but if you take a closer look, you can easily notice that there’s a lot of useless table joins that are hardly (if ever) used:
- The query makes a join on the table #__contact_details table. Is that really necessary? Why would any website care for a join between the #__content table and the #__contact_details table.
-
badcats are those categories that are disabled, and as such, the articles under them should not appear on the website. However, the inclusion of badcats in the query is extremely costly because they were creating an unnecessary (and very costly) join on the #__categories table. Yes, an article with a disabled category should not be displayed, but if someone is about to disable a category, then he’s better off manually (or automatically) disabling all the articles belonging to that category so as not to slow down performance.
-
The #__content_rating join is pure overhead (especially in the case of our client, who wasn’t even using ratings anyway on the website) – we think that if a view needs a rating, then it must lookup the rating value for each and every rating that is displayed – instead of getting the ratings of 40,000 articles, and then just use the ratings of 30.
-
The #__users join might seem necessary at first glance – because the author’s information is needed. However, since each article has the author id in the #__content table, isn’t it much better to retrieve the author’s information only for the rows displayed (e.g. in the view itself) rather than getting the author’s information of all the articles and then displaying those of 30. (This point is very similar to the previous point.)
-
Last but not least, the join #__content_frontpage table is completely unnecessary and we have no idea why Joomla is still doing that join when we have the featured and the ordering fields in the #__content table. Not only that join is unecessary (and way too costly if a website has a lot of featured articles), but also the whole #__content_frontpage table should not be there in the first place. It’s simply no longer needed!
After removing the above unnecessary joins from the query, its execution speed was greatly improved, but it was still unacceptable. The query was taking 2 seconds to execute – which is way too high. So we examined it more closely…
We looked at the included fields in the query, and we noticed that we only needed one field – just one, which was the id of the entry in the #__content table. Yes – you read that right – only one field is needed. All the other fields are completely unnecessary – why oh why one needs to retrieve the introtext of 40,000 articles while he only needs the introtext of just 30 (or less) – by the way, some websites have all the content of their articles solely in the introtext field, so you can imagine the overhead. So, we scrapped all the other fields from the query, and we modified the views using that query to retrieve just the needed article information from the database. Yes, this means that for each entry to be displayed we will have a separate query (or more), but the cost of this is negligible when it is compared to retrieving nearly the full information of 40,000 articles from the #__content table.
Once we did the above the query was taking merely 0.07 seconds to execute!
We applied the same strategy for all the other costly queries, and we eventually achieved a load time (on production) of 0.35 seconds for the full page – which is about 214 times faster than the original 75 seconds!
We then enabled caching (we chose conservative caching) and we ensured that the caching time was 60 minutes for each page and 24 hours for the modules. This step was necessary because while 0.35 seconds is fast – it is not that fast when there are thousands of queries every hour to the website. A side note here: Many Joomla websites that perform very well in development, perform very poorly in production because the amount of traffic is not comparable. An unnoticeable performance issue on development is equivalent to an extremely slow site in production (not to mention, of course, the database errors).
Finally, we moved the website to production and we started monitoring its performance as of yesterday evening – and so far it is working without a hitch, even on the old, middle-end server. However, we did tell our client to migrate to a high end server as soon as possible (yes – we think it’s still needed – a high traffic Joomla website must have all the room it needs to perform as it should)…
If you have a very large Joomla website and you’re worried about performance issues when migrating to Joomla 2.5, then fear not; we can migrate your website from scratch and ensure that all the optimization techniques are implemented (for a very reasonable fee) so that your visitors can have a smooth experience. All you need to do is to contact us! We’re always available, we’re professional, and we know Joomla inside out!
[…] first time to generate the view and the second time to generate the paging (read more about this here). This strategy to load everything is lazy and amateurish at best – because 1) it can take […]
Hello and Congratulations for this Guide.
The only thing I haven’t figured out is how to remove the duplicate query with getTotal fuction.
You can explain to me?
Thanks in advance.
Hi Zac,
The solution to this problem requires a considerable core modification to Joomla that is way beyond the scope of this post.
[…] no problems whatsoever (unless you have a very large website, in that case you’ll need to optimize it). A custom CMS is the complete opposite, stability issues are the norm rather than the exception, […]
[…] users than for the general public. Most pages, on that particular website, were loading (after our optimization) in 0.2 to 0.3 seconds for visitors, but these same pages were taking nearly 20 seconds each to […]
And you submitted all this to the Joomla team right?
[…] Slowness: Google’s search engine doesn’t want to direct traffic to websites that take many seconds (or even minutes) to load, but rather to websites that load instantly. If your website falls into the first category, then it may get penalized, or, at best, it will drop a few spots in the search engine rankings to the point where your search engine traffic is significantly reduced (note that statistically, most people look only at the first 5 results when searching on Google – only those who are very desperate to find the right information look past the first page). Always ensure that your Joomla website loads quickly (you may want to read how we made a Joomla website 200 times faster). […]