I've committed the initial modifications to the storage rebuilding code. The changes mostly live in the AUFS and COSS code - the rest of Lusca isn't affected.
The change pushes the rebuild logic itself into external helpers which simply stream swaplog entries to the main process. Lusca doesn't care how the swaplog entries are generated.
The external helper method is a big boost for AUFS. Each storedir creates a single rebuild helper process which can block on disk IO without blocking anything else. The original code in Squid did a little disk IO work at a time - which almost always involved blocking the process until said disk IO completed.
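To make the "helpers stream swaplog entries to the main process" idea concrete, here is a minimal sketch of the consuming side. The record layout, the `OP_ADD`/`OP_DEL` values and all the names are made up for illustration; this is not Lusca's actual swap.state format or code.

```python
import io
import struct
from typing import BinaryIO, Dict, Iterator, NamedTuple

# Hypothetical fixed-size record: op (1 byte), file number (4 bytes),
# object size (8 bytes), 16-byte key. NOT the real swap.state layout.
RECORD = struct.Struct("<BIQ16s")
OP_ADD = 1  # assumed opcode: object added to the store
OP_DEL = 2  # assumed opcode: object removed from the store


class SwapEntry(NamedTuple):
    op: int
    filn: int
    size: int
    key: bytes


def iter_entries(stream: BinaryIO) -> Iterator[SwapEntry]:
    """Yield entries from a helper's output until EOF or a short read."""
    while True:
        buf = stream.read(RECORD.size)
        if len(buf) < RECORD.size:
            return  # EOF, or a truncated trailing record which we ignore
        yield SwapEntry(*RECORD.unpack(buf))


def rebuild_index(stream: BinaryIO) -> Dict[bytes, SwapEntry]:
    """Replay the entry stream into an in-memory index keyed by object key."""
    index: Dict[bytes, SwapEntry] = {}
    for entry in iter_entries(stream):
        if entry.op == OP_ADD:
            index[entry.key] = entry
        elif entry.op == OP_DEL:
            index.pop(entry.key, None)
    return index
```

In a real setup `stream` would be the stdout pipe of a helper process (for example one started with `subprocess.Popen(..., stdout=subprocess.PIPE)`). The consumer only sees a stream of entries, so it doesn't matter whether the helper read a log file or walked the whole directory tree to produce them.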
The main motivation of this work was the removal of a lot of really horrible, twisty code and further modularisation of the codebase. The speedups to the rebuild process are a nice side-effect. The next big improvement will be sorting out how the swap logs are written. Fixing that will be key to allowing enormous caches to properly function without log rotation potentially destroying the proxy service.
Tuesday, July 28, 2009
Monday, July 13, 2009
There are two issues with caching Windows Updates in Squid/Lusca:
* the requests for data themselves are all range requests, which means the content is never cached in Squid/Lusca;
* the responses contain validation information (eg ETags) but the object is -always- returned regardless of whether the validators match or not.
This feels a lot like Google Maps, which did the same thing with revalidation. Grr.
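For reference, this is roughly what a cache expects from an origin that honours validators: if the `If-None-Match` header matches the current ETag, answer 304 with no body instead of resending the object. The sketch below is my own illustration of that standard HTTP behaviour, not anything taken from the Windows Update servers or from Squid/Lusca. The function names are made up, header names are assumed to be already normalised, and the tag list is split naively on commas.

```python
from typing import Mapping, Tuple


def _opaque(tag: str) -> str:
    """Drop the weak prefix so two tags can be compared weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match: str, current_etag: str) -> bool:
    """True if any tag listed in If-None-Match matches the current ETag."""
    if if_none_match.strip() == "*":
        return True
    current = _opaque(current_etag)
    # Naive split: assumes no commas inside the quoted tags themselves.
    return any(_opaque(t) == current for t in if_none_match.split(","))


def respond(headers: Mapping[str, str], current_etag: str,
            body: bytes) -> Tuple[int, bytes]:
    """Return (status, body) the way a validator-honouring origin would."""
    inm = headers.get("If-None-Match")
    if inm is not None and etag_matches(inm, current_etag):
        return 304, b""  # validators match: no need to resend the object
    return 200, body  # no validator, or it no longer matches
```

The behaviour described above corresponds to always taking the last branch: the full object comes back even when the validator matches, so revalidation saves nothing.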
I'm not sure why Microsoft (and Google!) did this with their web services. I'll see if I can find someone inside Microsoft who can answer questions about the Windows Update related stuff to see if it is intentional (and document why) or whether it is an oversight which they would be interested in fixing.
In any case, I'm going to fix it for the handful of commercially supported customers which I have here.
Wednesday, July 8, 2009
I've put forward a basic proposal to the fledgling Lusca community to get funding to fix up the storage logging and rebuilding code.
Right now the storage logging (ie, "swap.state" logging) is done using synchronous IO and this starts to lag Lusca if there are a lot of disk file additions/deletions. It also takes a -long- time to rotate the store swap log (which effectively halts the proxy whilst the logs are rotated) and an even longer time to rebuild the cache index at startup.
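As a generic illustration of why synchronous logging hurts, and of one common way around it, here is a small sketch where the caller only queues a record and a background thread does the actual disk write. This is not necessarily what the proposal does, and `AsyncLogWriter` is a hypothetical name used only for this example.

```python
import queue
import threading


class AsyncLogWriter:
    """Append-only log whose disk writes happen off the caller's thread."""

    def __init__(self, path: str) -> None:
        self._queue: "queue.Queue" = queue.Queue()
        self._file = open(path, "ab")
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def append(self, record: bytes) -> None:
        """Queue a record; returns without touching the disk."""
        self._queue.put(record)

    def _drain(self) -> None:
        # Runs in the background thread; this is where blocking IO happens.
        while True:
            record = self._queue.get()
            if record is None:  # sentinel pushed by close()
                break
            self._file.write(record)
        self._file.flush()

    def close(self) -> None:
        """Write out everything still queued, then close the file."""
        self._queue.put(None)
        self._thread.join()
        self._file.close()
```

The queue preserves ordering, so the log on disk still reflects the order in which additions and deletions happened. What this sketch doesn't address is rotation, or what happens to queued records if the process dies before they hit the disk.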
I've braindumped the proposal here - http://code.google.com/p/lusca-cache/wiki/ProjectStoreRebuildChanges .
Now, the good news is that I've implemented the rebuild helper programs and the results are -fantastic-. UFS cache dirs will still take forever to rebuild if the logfile doesn't exist or is corrupt but the helper programs speed this up by a factor of "LOTS". It also parallelises correctly - if you have 15 disks and you aren't hitting CPU/bus/controller limits, all the cache dirs will rebuild at full speed in parallel.
Rebuilding from the log files takes seconds rather than minutes.
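The parallelism is easy to picture: one worker per cache dir, each free to block on its own disk. The sketch below uses threads in a single Python process as a stand-in for the separate helper processes, and "rebuilding" is reduced to counting files. The function names are made up for the example.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def scan_cache_dir(path: str) -> int:
    """Walk one cache dir and count the files found (blocking disk IO)."""
    count = 0
    for _root, _dirs, files in os.walk(path):
        count += len(files)
    return count


def rebuild_all(cache_dirs: List[str]) -> Dict[str, int]:
    """Scan every cache dir concurrently, one worker per directory."""
    workers = max(1, len(cache_dirs))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(scan_cache_dir, cache_dirs)
        return dict(zip(cache_dirs, counts))
```

With one worker per directory, a slow disk only delays its own scan. Total rebuild time then tends towards that of the slowest cache dir rather than the sum of all of them, as long as CPU, bus and controller aren't the bottleneck.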
Finally, I've sketched out how to solve the COSS startup/rebuild times and the even better news is that fixing the AUFS rebuild code will give me about 90% of what I need to fix COSS.
The bad news is that integrating this into the Lusca codebase and fixing up the rebuild process to take advantage of this parallelism is going to take 4 to 6 weeks of solid work. I'm looking for help from the community (and other interested parties) who would like to see this work go in. I have plenty of testers but no one to help the -coding- along, and I unfortunately have to focus on projects that provide me with some revenue.
Please contact me if you're able to help with either coding or funding for this.