I've decided that tackling the storage manager codebase in its current shape is a bit too much right now. I may end up using refcounted buffers as part of the data store to temporarily work around the shortcomings in the code. I don't like the idea of doing that long-term because, to be quite honest, that area of the codebase needs a whole lot of reorganisation and sanity to make it consistent and understandable. I think that should happen before any larger-scale changes are made, and that includes the obvious performance boosts available by avoiding the copying.
In any case, I'm going to move on to my next short-term goal - a very, very basic module framework. I'd like to at least shoehorn some very simple dynamic module loading and unloading into the core, not unlike what TMF did for Squid-3 as part of their initial eCAP work. My initial plan is to do the bare minimum necessary to start breaking out a very small chunk of code into modules - namely the request rewriters (url, storeurl and location) - so they don't have to be compiled in. It will also force a little bit of tidying up around the HTTP and client-side code.
The initial aim is purely code reorganisation. Instead of having a nested list of callbacks forming the request processing chain, I plan on having a simple, iterative finite state machine which will route the request through different modules as required before passing it along to the rest of the client-side code. I hope that I can (slowly!) unwind a large part of the request path hairiness and enumerate it with said state engine.
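To make the state engine idea concrete, here is a minimal sketch of a table-driven request pipeline. Every name in it (`request_t`, `req_state_t`, `stage_register`, `request_run`) is made up for illustration and does not come from the codebase. It is also synchronous, whereas a real version would almost certainly need stages that can suspend and resume.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request: only the fields this sketch needs. */
typedef struct {
    char url[128];
    int stages_run;             /* how many module stages touched the request */
} request_t;

/* The request path, enumerated as states instead of nested callbacks. */
typedef enum {
    ST_URL_REWRITE,
    ST_STOREURL_REWRITE,
    ST_LOCATION_REWRITE,
    ST_FORWARD,                 /* terminal: hand over to the client-side code */
    ST_FAILED,                  /* terminal: request rejected */
    ST_MAX
} req_state_t;

/* A stage inspects or modifies the request and names the next state. */
typedef req_state_t (*req_stage_fn)(request_t *);

/* One slot per state; NULL means no module is registered for it. */
static req_stage_fn stage_table[ST_MAX];

static void stage_register(req_state_t st, req_stage_fn fn)
{
    assert(st < ST_FORWARD);    /* terminal states take no handler */
    stage_table[st] = fn;
}

/* Example stage (an assumption): rewrite one hard-coded URL. */
static req_state_t example_url_rewrite(request_t *r)
{
    if (strcmp(r->url, "http://old.example/") == 0)
        strcpy(r->url, "http://new.example/");
    r->stages_run++;
    return ST_STOREURL_REWRITE;
}

/* The driver: a flat loop rather than a chain of nested callbacks. */
static req_state_t request_run(request_t *r)
{
    req_state_t st = ST_URL_REWRITE;

    while (st != ST_FORWARD && st != ST_FAILED) {
        req_stage_fn fn;

        if (st >= ST_MAX)       /* a stage returned a bogus state */
            return ST_FAILED;
        fn = stage_table[st];
        /* Stages with no module registered are simply skipped. */
        st = fn ? fn(r) : (req_state_t)(st + 1);
    }
    return st;
}
```

A module would call `stage_register()` when it loads and clear its slot when it unloads. The core never needs to know which rewriters exist.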
In any case, I won't be going anywhere near as far with this as I'd like to in the first pass. There are plenty of problems with this, the biggest being the parsing of compound configuration types like ACLs. For example, if I wanted to modularise the "asn" ACL type, the module would need to be loaded well before the rest of the configuration file is parsed; then it would need to hook itself into the ACL code and register itself there (a la what happened with Squid-3); then subsequent ACL line parsing would need to direct things at the ACL module; and finally the ACL module would need to make sure it's not unloaded until everything referencing it is gone! I'm going to pleasantly ignore all of these the first time around.
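The ACL ordering problem can be sketched as a type registry plus a reference count on the owning module. This is only an illustration under assumed names (`module_t`, `acl_type_register`, `acl_type_acquire`, `acl_type_release`, `module_unload`), none of which exist in the code today. A real module handle would also carry whatever the dynamic loader returns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical module handle. */
typedef struct module {
    const char *name;
    int refcount;               /* live references from parsed configuration */
    int loaded;
} module_t;

/* One registered ACL type and the module that implements it. */
typedef struct {
    const char *type_name;
    module_t *owner;
} acl_type_t;

#define MAX_ACL_TYPES 16
static acl_type_t acl_types[MAX_ACL_TYPES];
static size_t n_acl_types;

/* Called by a module at load time, before any acl lines are parsed. */
static int acl_type_register(const char *type_name, module_t *owner)
{
    if (n_acl_types == MAX_ACL_TYPES)
        return -1;
    acl_types[n_acl_types].type_name = type_name;
    acl_types[n_acl_types].owner = owner;
    n_acl_types++;
    return 0;
}

/* Called while parsing an acl line: find the type and pin its module. */
static module_t *acl_type_acquire(const char *type_name)
{
    size_t i;

    for (i = 0; i < n_acl_types; i++) {
        if (strcmp(acl_types[i].type_name, type_name) == 0) {
            acl_types[i].owner->refcount++;
            return acl_types[i].owner;
        }
    }
    return NULL;                /* unknown type: module not loaded (yet) */
}

/* Called when an ACL that uses the type is destroyed. */
static void acl_type_release(module_t *m)
{
    assert(m->refcount > 0);
    m->refcount--;
}

/* Refuse to unload while anything still references the module. */
static int module_unload(module_t *m)
{
    size_t i, kept = 0;

    if (m->refcount > 0)
        return -1;
    for (i = 0; i < n_acl_types; i++)   /* drop this module's registrations */
        if (acl_types[i].owner != m)
            acl_types[kept++] = acl_types[i];
    n_acl_types = kept;
    m->loaded = 0;
    return 0;
}
```

With this shape, an "asn" module registers its type on load, each parsed `acl ... asn ...` line takes a reference, and a reconfigure can only unload the module once the old ACLs have all been released.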
By keeping the scope small and the changes minimal, I hope that the amount of recoding needed later on down the track (once I've established exactly what is needed for all of this) will be limited.
Oh, and as an aside but related project, I'm slowly fixing the SNMP core startup code to -not- use function calls nested 15-20 deep as part of its MIB tree construction. It is a cute functional-programming type of construct but it is absolutely horrible to try and add something to. The related bit is allowing for SNMP MIB leaves to be added -and- removed at runtime - so modules can register themselves with the cachemgr and SNMP core to provide their stats.
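For the SNMP side, a MIB tree that accepts leaves at runtime might look something like the sketch below. It is a guess at the shape, not the existing SNMP code: `mib_node_t`, `mib_add`, `mib_lookup` and `mib_remove` are invented names, the OID used in testing would be arbitrary, and siblings are left unsorted, so a real version would need ordering to support GETNEXT walks.

```c
#include <assert.h>
#include <stdlib.h>

/* One node per OID component; leaves carry a value callback. */
typedef struct mib_node {
    int subid;
    int (*get)(void);           /* NULL for interior nodes */
    struct mib_node *child;     /* first child */
    struct mib_node *next;      /* next sibling */
} mib_node_t;

static mib_node_t mib_root;     /* zero-initialised, never freed */

static mib_node_t *mib_find_child(mib_node_t *parent, int subid)
{
    mib_node_t *c;

    for (c = parent->child; c != NULL; c = c->next)
        if (c->subid == subid)
            return c;
    return NULL;
}

/* Add a leaf, creating any missing interior nodes along the way. */
static mib_node_t *mib_add(const int *oid, size_t len, int (*get)(void))
{
    mib_node_t *n = &mib_root;
    size_t i;

    for (i = 0; i < len; i++) {
        mib_node_t *c = mib_find_child(n, oid[i]);

        if (c == NULL) {
            c = calloc(1, sizeof(*c));
            if (c == NULL)
                return NULL;
            c->subid = oid[i];
            c->next = n->child;
            n->child = c;
        }
        n = c;
    }
    n->get = get;
    return n;
}

/* Return the leaf at this exact OID, or NULL if there is none. */
static mib_node_t *mib_lookup(const int *oid, size_t len)
{
    mib_node_t *n = &mib_root;
    size_t i;

    for (i = 0; i < len && n != NULL; i++)
        n = mib_find_child(n, oid[i]);
    return (n != NULL && n->get != NULL) ? n : NULL;
}

/* Remove a leaf below parent; prune interior nodes left empty.
 * Returns 1 if a leaf was removed, 0 if the OID was not a leaf. */
static int mib_remove(mib_node_t *parent, const int *oid, size_t len)
{
    mib_node_t **pp;

    if (len == 0)
        return 0;
    for (pp = &parent->child; *pp != NULL; pp = &(*pp)->next) {
        mib_node_t *c = *pp;

        if (c->subid != oid[0])
            continue;
        if (len == 1) {
            if (c->get == NULL)
                return 0;
            c->get = NULL;
        } else if (!mib_remove(c, oid + 1, len - 1)) {
            return 0;
        }
        if (c->child == NULL && c->get == NULL) {
            *pp = c->next;      /* unlink, then free the empty node */
            free(c);
        }
        return 1;
    }
    return 0;
}

/* Example stat callback (an assumption) a module might register. */
static int example_hits(void)
{
    return 42;
}
```

A module would call `mib_add()` for each of its counters when it loads and `mib_remove(&mib_root, ...)` for the same OIDs when it unloads, which replaces the deeply nested construction calls with plain data-driven registration.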