PyDO (Python Data Objects) discussion
> On a final note, your PyDO has sparked a long conversation
IIRC, there may still be references to something called SDS in the documentation for PyDO. SDS did such caching (actually it cached relations and all sorts of things; it was quite a piece of work, with a query language, a data modelling language, and everything -- all on top of Oracle).
PyDO was designed to be a relatively thin layer over the database (SDS was quite opaque). The main reason is this: with thick layers, if you need more than the layer will give you, you're basically hosed, because going around it (and going straight to the database) may break it in odd ways. Also, if you have DBAs who want to know what you're going to do to their database, thick layers are very difficult to describe, and your DBAs will hate you. PyDO was designed with the goal of making the common stuff easy (to do and to explain), while for advanced queries and the like (e.g. stored procs) you can go direct to the db (perhaps wrapped as a PyDO object method, tho..) without fear that something weird will happen -- and still keep the users of your data classes ignorant of what is happening behind the scenes.
In short, I basically found out that while the thick ones are pretty and do all the cool theoretical stuff:
The thin ones may not do everything under the sun, and may have a few ugly bits, but they do most of the boring stuff that you'd otherwise have to do by hand, they keep the hard stuff doable, and they retain most of the usability benefits of the thicker ones.
As for features that I wish PyDO had: support for more databases! Granted, Oracle, PostgreSQL and MySQL (we've got sapdb too) catch most people, but I'd still like to be able to support Sybase, Interbase, DB2, and any other databases I can't think of right now, and to provide a genscript for each (a script that grovels over the system catalogs/data dictionary/whatever and produces PyDO classes for them -- see pgenscript and ogenscript).
> At the moment two calls to Employees.getSome() will return two lists
With caching, this problem only gets worse (as it turns out, it's *much* worse). There's the obvious problem of when to invalidate cache entries, and of how much object manipulation you're allowed to do before it gets pushed to the database. If you want a cheesy, simple relationship cache, you could do something like the following:
    # in some Groups object
    def getUsers(self):
        if not hasattr(self, 'users'):
            self.users = Users.getSome(GROUP_ID=self['ID'])
        return self.users
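Fleshed out with toy stand-ins, the same memoization idiom runs on its own. The Users/Groups stubs and their data below are purely hypothetical -- they just fake enough of a PyDO-style class to show that the second call skips the database:

```python
class Users:
    # Stub standing in for a PyDO data class; the rows and the
    # getSome signature here are made up for illustration.
    _rows = [{'ID': 1, 'GROUP_ID': 10, 'NAME': 'alice'},
             {'ID': 2, 'GROUP_ID': 10, 'NAME': 'bob'},
             {'ID': 3, 'GROUP_ID': 20, 'NAME': 'carol'}]
    calls = 0  # counts simulated database hits

    @classmethod
    def getSome(cls, GROUP_ID):
        cls.calls += 1
        return [r for r in cls._rows if r['GROUP_ID'] == GROUP_ID]


class Groups(dict):
    def getUsers(self):
        # Memoize: fetch once, then hand back the cached list.
        if not hasattr(self, 'users'):
            self.users = Users.getSome(GROUP_ID=self['ID'])
        return self.users


g = Groups(ID=10)
first = g.getUsers()
second = g.getUsers()
print(len(first), Users.calls, first is second)  # 2 1 True
```

Note that this is exactly the kind of cache the surrounding discussion warns about: nothing ever invalidates `self.users`, so a concurrent update to the USERS table leaves the cached list stale.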
The other problem, which I found out the hard way, is that there is no "generic form of caching" that suits all applications properly. If your caching scheme doesn't fit your application, weird shit happens. My specific case was that SDS's cache was designed with web applications in mind. When used in a newsfeed script (the Reuters feed, IIRC) it got *really* confused (and tended to gobble up memory), mainly because SDS's cache was meant to be flushed fairly often (at the end of a web request) while the feed script was a long-running program (well, it was supposed to be, but its confusion didn't allow it to be; we wound up having to flush its cache every N articles).
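The flush-every-N-articles workaround described above can be sketched as a toy cache. The class and its policy are hypothetical, not anything from SDS or PyDO; the point is just that a periodic flush bounds memory at the cost of re-fetching:

```python
class SimpleCache:
    """Toy cache flushed every N operations -- a sketch of the
    'flush its cache every N articles' workaround, nothing more."""

    def __init__(self, flush_every):
        self.flush_every = flush_every
        self.ops = 0
        self.entries = {}

    def get(self, key, loader):
        self.ops += 1
        if self.ops % self.flush_every == 0:
            self.entries.clear()  # periodic flush keeps memory bounded
        if key not in self.entries:
            self.entries[key] = loader(key)  # simulate a database fetch
        return self.entries[key]


loads = []

def loader(key):
    loads.append(key)
    return key.upper()

cache = SimpleCache(flush_every=3)
cache.get('a', loader)   # op 1: miss, loads from "database"
cache.get('a', loader)   # op 2: hit
cache.get('a', loader)   # op 3: flush, then miss and reload
print(loads)  # ['a', 'a']
```

The design trade-off is visible here: a web-request-scoped flush is just `flush_every` tuned to the request lifetime, and a long-running feed script needs a different policy entirely.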
Until databases have some mechanism for invalidating entries in some external cache, this problem will continue.
Anyway, this is why PyDO has no built in caching.
Drew Csillag <firstname.lastname@example.org>
> Until databases have some mechanism
> for invalidating entries in some
> external cache, this problem will continue.
I wonder if triggers could be used for this. I've never programmed a trigger, but if you could set one up to call your code every time a change was made (this would be like the Observer Pattern), then we could cache to our hearts' content, have multiple users, and be happy!
Not knowing how triggers work, I'm guessing you can't have a global trigger for any change to a database (too inefficient anyway?), as opposed to a trigger per table, etc. -- I'm not sure what is possible along these lines...
> I wonder if triggers could be used for this. I've
> never programmed a trigger, but if you could set one
> up to call your code every time a change was made
> (this would be like the Observer Pattern), then we could
> cache to our hearts' content, have multiple users, and be
> happy!
Perhaps triggers could be used for this: they would presumably notify some notification daemon, and the daemon would in turn notify the individual processes that hold references to the affected objects (presumably one would have to do some sort of registration with the notification daemon), telling them to refresh their state (in PyDO parlance, .refresh()) or to commit suicide (if the rows were deleted).
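The register/notify/refresh-or-die protocol sketched above can be shown with a toy in-process daemon. Everything here is hypothetical (a real version would sit between database triggers and separate processes, e.g. over a socket); it only demonstrates the Observer-style flow:

```python
class NotificationDaemon:
    """Toy stand-in for the notification daemon described above.
    Maps (table, row_id) to registered cached objects."""

    def __init__(self):
        self.listeners = {}

    def register(self, table, row_id, obj):
        self.listeners.setdefault((table, row_id), []).append(obj)

    def notify(self, table, row_id, deleted=False):
        # A trigger firing on the row would ultimately reach here.
        for obj in self.listeners.pop((table, row_id), []):
            if deleted:
                obj.die()      # the object "commits suicide"
            else:
                obj.refresh()  # PyDO-style .refresh() of cached state


class CachedRow:
    def __init__(self):
        self.stale = False
        self.alive = True

    def refresh(self):
        self.stale = False  # a real one would re-read from the database

    def die(self):
        self.alive = False


daemon = NotificationDaemon()
row = CachedRow()

daemon.register('USERS', 42, row)
row.stale = True                     # someone updated the row
daemon.notify('USERS', 42)           # trigger -> daemon -> refresh
print(row.stale, row.alive)          # False True

daemon.register('USERS', 42, row)
daemon.notify('USERS', 42, deleted=True)
print(row.alive)                     # False
```

The registration step is the expensive part in practice: every process with a cache has to tell the daemon exactly which rows it cares about, which is the infrastructure burden the next paragraph complains about.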
The main problem here is the infrastructure involved. If you have long-running transactions and processes (for some definition of long), it makes sense; for short-lived processes or transactions, it may make more sense to simply throw away any objects you got than to try to do any form of caching.