Tuesday, March 9, 2010

Blog has moved!

I've moved the contents of this blog to my personal blog - http://adrianchadd.blogspot.com/ . All of the Lusca-related stuff will happen under the "lusca" tag.

I'm tired of having a handful of little blogs scattered everywhere, so I've decided to keep one main blog and use tags to separate out the various bits of content that matter.

Wednesday, September 30, 2009

Just a few Lusca related updates!

  • All of the Cacheboy CDN nodes are running Lusca-HEAD now and are nice and stable.
  • I've deployed Lusca at a few customer sites and again, it is nice and stable.
  • The rebuild logic changes are, for the most part, nice and stable. There seems to be some weirdness with the 32- vs 64-bit compilation options which I need to suss out, but everything "just works" if you compile Lusca with large file/large cache file support, regardless of the platform you're using. I may make that the default option.
  • I've got a couple of small coding projects to introduce a couple of small new features to Lusca - more on those when they're done!
  • Finally, I'm going to be migrating some more of the internal code over to use the sqinet_t type in preparation for IPv4/IPv6 agnostic support - the sketch below shows the general idea.
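For the curious, the rough shape of an address-family agnostic address type is something like the following. The names and layout here are illustrative guesses only, not the actual sqinet_t definitions in the Lusca tree.

    /*
     * Illustrative sketch of an address-family agnostic socket address
     * wrapper in the spirit of sqinet_t; names and layout are assumptions,
     * not the real Lusca code.
     */
    #include <string.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    typedef struct {
        union {
            struct sockaddr     sa;    /* generic view; sa_family selects */
            struct sockaddr_in  sin;   /* IPv4 */
            struct sockaddr_in6 sin6;  /* IPv6 */
        } u;
    } sqaddr_t;    /* hypothetical name, to avoid clashing with the real type */

    /* Parse a presentation-format address into the wrapper; 1 on success. */
    static int
    sqaddr_parse(sqaddr_t *a, const char *str)
    {
        memset(a, 0, sizeof(*a));
        if (inet_pton(AF_INET, str, &a->u.sin.sin_addr) == 1) {
            a->u.sin.sin_family = AF_INET;
            return 1;
        }
        if (inet_pton(AF_INET6, str, &a->u.sin6.sin6_addr) == 1) {
            a->u.sin6.sin6_family = AF_INET6;
            return 1;
        }
        return 0;
    }

Code which works on a type like this, rather than on a raw struct in_addr, doesn't need to care which address family it is handling.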
Stay Tuned!

Sunday, August 16, 2009

Squid-3 isn't a rewrite!

G'day,

There seems to be this strange misconception that Squid-3 is a "rewrite" of Squid in C++. I am not sure where this particular little tidbit gets copy/pasted from, but just for the record:

Squid-3 is the continuation of Squid-2.5, made to compile using the GNU C++ compiler. It is not a rewrite.

If Squid-3 -were- a rewrite, and the resulting code -were- still this much of a crappy-performing, bastardised C/C++ hybrid, then I'd suggest the C++ coders in question need to relearn C++. Luckily for them, the codebase is a hybrid of C and C++ precisely because it started out as a C codebase with bits and pieces migrated to C++ along the way.

Thursday, August 6, 2009

Preparation for next release; IPv6 checklist

I've been slowly working on tidying up the codebase before the next snapshot release. I've been avoiding doing further large-scale code reorganisation until I'm confident that this codebase is as stable as, and performs as well as, it should.

I'll hopefully have the next stable snapshot online tonight. I'll then re-evaluate where things are at and come up with a short-list of things to do over the next couple of weeks. It'll almost certainly be the remainder of the IPv6 preparation work - preparing the last few bits of infrastructure for IPv6 - and making certain that it is all stable before I start converting the client-side and server-side code to actively use the IPv6 routines.

The current IPv6 shortlist, if I decide to do it:
  1. Client database code - convert to a radix tree instead of a hash on the IP address; make it IPv4/IPv6 agnostic.
  2. Persistent connection code - grow the pconn hash key length to fit the text form of an IPv6 address (see the sketch just after this list). I'll worry about migrating the pconn code to a tree later on.
  3. Import the last remaining bits of the IPv6-related code into the internal DNS code.
  4. Make sure both the internal and external DNS choices function properly when handling IPv6 addresses for forward and reverse lookups.
  5. Import the IP protocol ACL type and the IPv6 address ACL types - src6 and dst6.
  6. Modify the ACL framework to use the IPv6 datatype instead of the "sockaddr_in" and "in_addr" structs; then enable src6/dst6.
  7. Make certain the source and destination hostname ACLs function correctly for both IPv4 and IPv6.
  8. Test, test, test!
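To give a feel for item 2: the pconn key is just a text string, so the buffer only needs to grow enough to hold an IPv6 address in presentation form. A minimal sketch, assuming a simple "[host]:port" style key (the actual pconn key format in the tree differs):

    /*
     * Illustrative sketch for sizing a text pconn hash key so an IPv6
     * address fits; not the actual Lusca pconn code.
     */
    #include <stdio.h>
    #include <netinet/in.h>    /* INET6_ADDRSTRLEN: 46, including the NUL */

    /* Room for "[<ipv6 address>]:<port>" plus the terminating NUL. */
    #define PCONN_KEY_LEN (INET6_ADDRSTRLEN + sizeof("[]:65535"))

    static void
    pconn_key(char *buf, size_t buflen, const char *host, unsigned short port)
    {
        /* Bracket the host so the colons inside an IPv6 address don't
         * collide with the host:port separator. */
        snprintf(buf, buflen, "[%s]:%u", host, (unsigned) port);
    }

    /*
     * Usage:
     *   char key[PCONN_KEY_LEN];
     *   pconn_key(key, sizeof(key), "2001:db8::1", 80);
     */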
The last time I did a "hack" conversion of the client-side code to support IPv6, I found a number of places which expected a newly-allocated struct to be zeroed, and thus the "in_addr" embedded inside it to be INADDR_ANY. This caused some crashes in production testing. I'm thus going to hold off on pushing through the IPv6 client-side changes (which are actually surprisingly simple once the above is done!) until I've enumerated and fixed all of those particular nightmares.
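Here's a contrived illustration of the sort of implicit-zero assumption that bites - not actual Lusca code, just the shape of the problem:

    /*
     * Contrived example of the implicit-zero assumption described above;
     * not actual Lusca code.
     */
    #include <stdlib.h>
    #include <netinet/in.h>

    struct conn_state {
        struct in_addr out_addr;    /* 0 happens to equal INADDR_ANY */
        /* ... other fields ... */
    };

    int main(void)
    {
        struct conn_state *c = calloc(1, sizeof(*c));
        if (c == NULL)
            return 1;

        /*
         * This only works because calloc() zeroes the struct and
         * INADDR_ANY happens to be 0. Once out_addr becomes a
         * family-tagged IPv4/IPv6 type, "all zeroes" stops being a
         * meaningful address, and every spot like this needs an
         * explicit initialiser call instead.
         */
        if (c->out_addr.s_addr == INADDR_ANY) {
            /* ... bind to the wildcard address ... */
        }

        free(c);
        return 0;
    }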

The IPv6 server-side stuff is a whole different barrel of fun. I'm going to ignore a lot of that for now until I've made certain the client-side code is stable and performing as well as the current IPv4-only code.

I don't even want to think about the FTP related changes that need to occur. I may leave the FTP support IPv4-only until someone asks (nicely) about it. The FTP code is rife with C string pointer manipulation which needs to be rewritten to use the provided string primitives. I'd really like to do -that- before I consider upgrading it to handle IPv6.

Anyway. Lots to do, not enough spare time to do it all in.

Tuesday, July 28, 2009

Updates - rebuild logic, peering and COSS work

I've committed the initial modifications to the storage rebuilding code. The changes mostly live in the AUFS and COSS code - the rest of Lusca isn't affected.

The change pushes the rebuild logic itself into external helpers which simply stream swaplog entries to the main process. Lusca doesn't care how the swaplog entries are generated.
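The helper side is conceptually tiny. A simplified sketch (the real swaplog entry layout and helper protocol are more involved): read the on-disk swap log, write complete entries to stdout, and let any blocking disk IO happen in the helper rather than in the main process.

    /*
     * Simplified sketch of an external rebuild helper; the real swaplog
     * entry layout and helper protocol in Lusca are more involved.
     */
    #include <stdio.h>
    #include <stdint.h>

    /* Stand-in for a fixed-size swap log entry (fields are illustrative). */
    struct swaplog_entry {
        uint8_t  op;             /* add or delete */
        int32_t  swap_filen;     /* on-disk file number */
        int64_t  swap_file_sz;   /* object size on disk */
        uint8_t  key[16];        /* object hash key */
    };

    int main(int argc, char **argv)
    {
        struct swaplog_entry e;
        FILE *in;

        if (argc < 2 || (in = fopen(argv[1], "rb")) == NULL)
            return 1;

        /*
         * A blocking read here stalls only this helper; the main process
         * just consumes complete entries from the stdout pipe as they
         * arrive.
         */
        while (fread(&e, sizeof(e), 1, in) == 1)
            fwrite(&e, sizeof(e), 1, stdout);

        fclose(in);
        return 0;
    }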

The external helper method is a big boost for AUFS. Each storedir creates a single rebuild helper process which can block on disk IO without blocking anything else. The original code in Squid does a little disk IO work at a time, which almost always involves blocking the process until said disk IO completes.

The main motivation of this work was the removal of a lot of really horrible, twisty code and further modularisation of the codebase. The speedups to the rebuild process are a nice side-effect. The next big improvement will be sorting out how the swap logs are written. Fixing that will be key to allowing enormous caches to properly function without log rotation potentially destroying the proxy service.

Monday, July 13, 2009

Caching Windows Updates

There are two issues with caching Windows Updates in Squid/Lusca:

* the requests for data themselves are all range requests, which means the content is never cached in Squid/Lusca;
* the responses contain validation information (e.g. ETags) but the object is -always- returned regardless of whether the validators match or not.

This feels a lot like Google Maps, which did the same thing with revalidation. Grr.

I'm not sure why Microsoft (and Google!) did this with their web services. I'll see if I can find someone inside Microsoft who can answer questions about the Windows Update related stuff to see if it is intentional (and document why) or whether it is an oversight which they would be interested in fixing.

In any case, I'm going to fix it for the handful of commercially supported customers I have here.
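In the meantime, the usual configuration-level band-aid for the range-request half of the problem looks something like the snippet below. This is a hedged sketch only - the sizes are arbitrary, and it trades bandwidth for cachability, since the proxy fetches the whole object even when the client only asked for a slice.

    # Sketch only - tune the sizes to your environment.
    # Fetch the entire object even when the client sends a Range request
    # (-1 means no limit), and allow objects large enough to hold the
    # update payloads.
    range_offset_limit -1
    maximum_object_size 512 MB
    # Keep fetching a cachable object even if the client goes away.
    quick_abort_min -1 KB

That does nothing for the broken revalidation behaviour, though - that part needs a fix in the code itself.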

Wednesday, July 8, 2009

Storage rebuilding / logging project - proposal

I've put forward a basic proposal to the fledgling Lusca community to get funding to fix up the storage logging and rebuilding code.

Right now the storage logging (ie, "swap.state" logging) is done using synchronous IO, and this starts to lag Lusca if there are a lot of disk file additions/deletions. It also takes a -long- time to rotate the store swap log (which effectively halts the proxy whilst the logs are rotated) and an even longer time to rebuild the cache index at startup.

I've braindumped the proposal here - http://code.google.com/p/lusca-cache/wiki/ProjectStoreRebuildChanges .

Now, the good news is that I've implemented the rebuild helper programs and the results are -fantastic-. UFS cache dirs will still take forever to rebuild if the logfile doesn't exist or is corrupt, but the helper programs speed this up by a factor of "LOTS". It also parallelises correctly: if you have 15 disks and you aren't hitting CPU/bus/controller limits, all the cache dirs will rebuild at full speed in parallel.
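To illustrate the parallelism side, here's a simplified sketch (not the actual integration code): the main process starts one helper per cache_dir and multiplexes their pipes with poll(), so a slow disk only slows its own helper down.

    /*
     * Simplified sketch of consuming several rebuild helpers in parallel;
     * not the actual Lusca integration code. The fork/exec of the helpers
     * is omitted - assume helper_fd[] holds one pipe fd per cache_dir.
     */
    #include <poll.h>
    #include <unistd.h>

    #define MAX_DIRS 16

    int main(void)
    {
        int helper_fd[MAX_DIRS];
        int ndirs = 0;                 /* number of helpers actually started */
        struct pollfd pfd[MAX_DIRS];
        char buf[4096];
        ssize_t n;
        int i, open_helpers;

        for (i = 0; i < ndirs; i++) {
            pfd[i].fd = helper_fd[i];
            pfd[i].events = POLLIN;
        }
        open_helpers = ndirs;

        /* Each helper streams swaplog entries at its own pace; a slow disk
         * delays only its own helper, not the others and not this loop. */
        while (open_helpers > 0 && poll(pfd, ndirs, -1) > 0) {
            for (i = 0; i < ndirs; i++) {
                if (!(pfd[i].revents & (POLLIN | POLLHUP)))
                    continue;
                n = read(pfd[i].fd, buf, sizeof(buf));
                if (n <= 0) {              /* helper finished or failed */
                    close(pfd[i].fd);
                    pfd[i].fd = -1;        /* poll() ignores negative fds */
                    open_helpers--;
                } else {
                    /* ... feed the bytes to the swaplog entry parser ... */
                }
            }
        }
        return 0;
    }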

Rebuilding from the log files takes seconds rather than minutes.

Finally, I've sketched out how to fix the COSS startup/rebuild times, and the even better news is that fixing the AUFS rebuild code will give me about 90% of what I need to fix COSS.

The bad news is that integrating this into the Lusca codebase and fixing up the rebuild process to take advantage of this parallelism is going to take 4 to 6 weeks of solid work. I'm looking for help from the community (and other interested parties) who would like to see this work go in. I have plenty of testers, but nobody to help the -coding- along, and I unfortunately have to focus on projects that provide me with some revenue.

Please contact me if you're able to help with either coding or funding for this.