I’m happy to report that LibraryThing’s servers have undergone a considerable improvement. LibraryThing’s server administrator, John Dalton (member Felius), carried through an ambitious restructuring of how LibraryThing’s considerable traffic was distributed across our web servers. And the restructuring worked out.
Across the site, “pages” have been sped up by about 100%, with the median response time dropping from about 1 second to about 0.5 seconds. Catalog, or “Your books,” pages have dropped from a median of 1.6 seconds to 0.8 seconds. Response has also become more predictable, with a much lower standard deviation of response times.
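For anyone who wants the arithmetic spelled out, here is a minimal Python sketch of how the median, the standard deviation and the “about 100%” figure relate. The sample timings are made up for illustration and are not taken from LibraryThing’s logs, and `speedup_percent` is just a helper name invented for this example. The point is that halving the time per page doubles the speed, which is a 100% speed-up.

```python
from statistics import median, pstdev

# Hypothetical response times in seconds; illustrative only, not real LibraryThing data.
before = [0.6, 0.8, 1.0, 1.2, 3.4]
after = [0.4, 0.45, 0.5, 0.55, 0.9]


def speedup_percent(old_seconds, new_seconds):
    """Percentage increase in speed when the time per page drops."""
    return (old_seconds / new_seconds - 1) * 100


median_before = median(before)
median_after = median(after)

print("median before:", median_before, "median after:", median_after)
print("speed-up (%):", speedup_percent(median_before, median_after))

# A lower standard deviation means response times are more predictable.
print("spread shrank:", pstdev(after) < pstdev(before))
```

The median is used rather than the mean because a handful of very slow requests would otherwise dominate the figure.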
Good, but not everything. While server-rendering speed is important, it isn’t the only factor in perceived speed; the other two are transfer and rendering by the web browser. The server improvements also hide the fact that rendering times include database actions, which were not improved by this change. The truly bothersome pages on LibraryThing are held back by the database, not by web server load per se. So this change hasn’t done much to improve response time on catalogs with thousands or tens of thousands of books, hit for the first time that day, or on work-combination requests that require recalculating thousands of items of data. Basically, the improvement speeds up every page by an average of 0.5 seconds, but a 10 second page still takes 9.5 seconds.
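That last point is simple arithmetic. As a rough Python sketch, treating the 0.5 seconds as a fixed saving per page (a simplification of the average quoted above, with helper names invented for the example), the relative gain shrinks as the page gets slower:

```python
SAVING_SECONDS = 0.5  # assumed fixed saving per page, from the average quoted above


def time_after(seconds_before, saving=SAVING_SECONDS):
    """Page time once the fixed saving has been subtracted."""
    return seconds_before - saving


def percent_saved(seconds_before, saving=SAVING_SECONDS):
    """Share of the original page time that the saving removes."""
    return saving / seconds_before * 100


for seconds in (1.0, 10.0):
    print(
        f"{seconds:.1f}s page -> {time_after(seconds):.1f}s "
        f"({percent_saved(seconds):.0f}% less time)"
    )
```

A 1 second page loses half its time, while a 10 second page loses only a twentieth, which is why the slow pages feel much the same as before.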
Here are some graphs of the effect on different page types. The white band is a period for which we don’t have numbers.
Catalog. Savings have been dramatic but, as noted above, mostly on the vast majority of “normal” requests, not on the rare but painful ones.
Talk topic pages. These have gotten much faster, because the data is easy to get from the database but also very copious, so it took a lot of server work to render it. This improvement has a perverse side-effect, however: the faster the page is made, the more the Talk page can get ahead of master-slave replication. This issue will be addressed in an upcoming improvement.
Work pages haven’t improved because they were already well-distributed across our servers.
Does LT run on physical infrastructure, or is it virtualized?
What tool do you use to log page generation times?
That’s nice. Why are touchstones significantly worse, as in none for ER books, loooooooong wait times for changes to the touchstone, etc?
LT runs on servers we own. The database boxes are all physical, while most of the rest is virtualised (Xen on RHEL).
“Basically, the improvement speeds up every page by an average of 0.5 seconds, but a 10 second page still takes 9.5 seconds.”
Which is the real problem. Today it is still taking roughly 18-20 seconds for the Talk page to load. That 0.5 second improvement is sort of like the girl with four feet of hair who trims two inches: most people don’t notice.
thank you. it’s appreciated.