
PyDO (Python Data Objects) discussion

Here is a response from the author of PyDO, Drew Csillag, when I asked him why PyDO has no caching. See the related discussion on the Relationship Manager pages.

> On a final note, your PyDO has sparked a long conversation among a
> few of my colleagues regarding persistence layers and so forth. I'm
> not sure where PyDO stands in the grand scheme of things - obviously
> there are more sophisticated object-to-relational mapping
> technologies out there. In this context, what would you say is the
> most needed enhancement to PyDO? What springs to mind is perhaps a
> caching system, so that the same objects get returned when
> appropriate.

IIRC, there may still be references to something called SDS in the documentation for PyDO. SDS did such caching (actually it cached relations and all sorts of things; it was quite a piece of work, with a query language, a data modelling language and everything -- all on top of Oracle).

PyDO was designed to be a relatively thin layer over the database (SDS was quite opaque). The main reason is this: with thick layers, if you need more than the layer will give you, you're basically hosed, because going around it (straight to the database) may break it in odd ways. Also, if you have DBAs who want to know what you're going to do to their database, thick layers are very difficult to describe, and your DBAs will hate you. PyDO was designed with the goal of making the common stuff easy (to do and to explain), while for advanced queries and the like (e.g. stored procedures) you can go direct to the db (perhaps as a PyDO object method, though) without fear that something weird will happen -- and still keep the users of your data classes ignorant of what is happening behind the scenes.

In short, I basically found out that while thick layers are pretty and do all the cool theoretical stuff:

  1. they tend to be opaque, and it's really hard to see what they're going to do or to explain it to your DBAs. (BTW: I come from a *LARGE* web shop, and the DBAs, rightfully, want to know if you are going to bodyslam their database.)
  2. when they don't work, they're very difficult to debug.
  3. with most of them, you can't just drop them on top of any old schema; they usually have to fit some kind of pattern (e.g. all tables must have an ID column) for them to really work.

The thin ones may not do everything under the sun, and may have a few ugly bits, but they do most of the boring stuff that you'd otherwise have to do by hand, they keep the hard stuff doable, and they retain most of the usability benefits of the thicker ones.
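
To make Drew's "go direct to the db, perhaps as a PyDO object method" suggestion concrete, here is a minimal sketch (mine, not Drew's). The conn attribute -- a plain DB-API 2.0 connection -- and the class itself are assumptions for illustration; how PyDO actually reaches its connection may differ.

# Sketch: hide a hand-written query behind an ordinary method, so
# callers of the data class never see the SQL. 'conn' (a DB-API 2.0
# connection) is an assumed attribute, not real PyDO machinery.
class Employees:
    def __init__(self, conn):
        self.conn = conn

    def topEarners(self, limit=10):
        # Go straight to the database for a query the thin layer
        # doesn't express, but keep it behind a normal method.
        cursor = self.conn.cursor()
        cursor.execute(
            "SELECT id, name, salary FROM employees"
            " ORDER BY salary DESC LIMIT %s", (limit,))
        rows = cursor.fetchall()
        cursor.close()
        return rows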

As for features that I wish PyDO had: support for more databases! Granted, Oracle, PostgreSQL and MySQL (we've got sapdb too) catch most people, but I'd still like to be able to support Sybase, Interbase, DB2 and any other databases I can't think of right now, and to provide a genscript for each -- a script that grovels over the system catalogs/data dictionary/whatever and produces PyDO classes for the tables it finds (see pgenscript and ogenscript).
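
As a rough illustration of the genscript idea (a toy sketch, not pgenscript itself), one could walk PostgreSQL's information_schema and print a stub class per table. The output format and the psycopg2 driver are assumptions here:

# Toy illustration of a genscript: grovel over the system catalogs
# and emit a PyDO-ish class stub per table. This is not pgenscript;
# the stub layout is invented for illustration.
import psycopg2

def gen_stubs(dsn):
    conn = psycopg2.connect(dsn)
    cur = conn.cursor()
    cur.execute(
        "SELECT table_name, column_name"
        " FROM information_schema.columns"
        " WHERE table_schema = 'public'"
        " ORDER BY table_name, ordinal_position")
    tables = {}
    for table, column in cur.fetchall():
        tables.setdefault(table, []).append(column)
    for table, columns in sorted(tables.items()):
        print("class %s:" % table.capitalize())
        print("    table = %r" % table)
        print("    fields = %r" % (tuple(columns),))
        print()
    cur.close()
    conn.close()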

> At the moment two calls to Employees.getSome() will return two lists
> containing totally different instances. Maybe I could learn to work
> with this 'feature' and not worry too much about it - but then
> again, stray objects might get out of date and you never know when
> you should be doing a .refresh() on them. If only the database
> could notify each PyDO object when it needs to be refreshed - that
> would be nice, and might allow multi-user use, too. Thoughts?

With caching, this problem only gets worse (as it turns out, it's *much* worse). There's obviously the problem of when to invalidate cache entries, and of how much object manipulation you're allowed to do before it gets pushed to the database. If you want a cheesy, simple relationship cache, you could do something like the following:

# in some Groups object
def getUsers(self):
    # Cache the related Users on first access; subsequent calls
    # return the same cached list.
    if not hasattr(self, 'users'):
        self.users = Users.getSome(GROUP_ID=self['ID'])
    return self.users
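
A small refinement (my addition, not part of Drew's mail): give the cache an explicit invalidation hook, so the "when should I refresh?" decision at least has a place to live. The method name is hypothetical.

# companion method on the same Groups object (name is hypothetical)
def invalidateUsers(self):
    # Forget the cached list; the next getUsers() call re-fetches.
    if hasattr(self, 'users'):
        del self.users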

The other problem, which I found out the hard way, is that there is no "generic form of caching" that suits all applications properly. If your caching scheme doesn't fit your application, weird shit happens. My specific case was that SDS's cache was designed with web applications in mind. When used in a newsfeed script (the Reuters feed, IIRC) it got *really* confused (and tended to gobble up memory), mainly because SDS's cache was meant to be flushed fairly often (at the end of a web request) and the feed script was a long-running program (well, it was supposed to be, but its confusion didn't allow it to be; we wound up having to flush its cache every N articles).

Until databases have some mechanism for invalidating entries in some external cache, this problem will continue.

Anyway, this is why PyDO has no built-in caching.

Drew.

Drew Csillag <drew_csillag@yahoo.com>

Return to the main Relationship Manager Pattern page.
Return to the main Relationship Manager Pattern Discussion page.

Ideas from you!

Please add your thoughts on this discussion to this public guest book and share them with other visitors.

Date: 3/25/02
Time: 12:52:15 AM

Comments

> Until databases have some mechanism for invalidating entries in some external cache, this problem will continue.

I wonder if triggers could be used for this. I've never programmed a trigger, but if you could set one up to call your code every time a change was made (this would be like the Observer pattern), then we could cache to our hearts' content, have multiple users, and be happy!

Not knowing how triggers work, I'm guessing you can't have a single global trigger for any change to the database (too inefficient anyway?) as opposed to a trigger per table, etc. -- I'm not sure what is possible along these lines...

-Andy Bulka

Date: 3/25/02
Time: 8:06:48 AM

Comments

> I wonder if triggers could be used for this. I've never programmed a trigger, but if you could set one up to call your code every time a change was made (this would be like the Observer pattern), then we could cache to our hearts' content, have multiple users, and be happy!

Perhaps triggers could be used for this: presumably they would notify some notification daemon, and the daemon would then notify the individual processes holding references to the appropriate objects (presumably one would have to register with the daemon first), telling them to refresh their state (in PyDO parlance, .refresh()) or to commit suicide (if the underlying rows were deleted).

The main problem here is the infrastructure involved. If you have long-running transactions and processes (for some definition of long), it makes sense; for short-lived processes or transactions, it may make more sense just to throw away whatever objects you got than to attempt any form of caching.

-Drew
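
Postscript: PostgreSQL, for one, already has a primitive along these lines -- a trigger can send a NOTIFY on a named channel, and a client process can LISTEN on that channel and refresh its live objects when a notification arrives. A rough sketch of the listening side follows (using psycopg2; the registry of live objects keyed by id, and their .refresh() method, are assumptions for illustration, not PyDO API):

# Sketch of the "notification daemon" idea using PostgreSQL's
# LISTEN/NOTIFY via psycopg2. On the database side a trigger would
# call pg_notify('users_changed', <row id>) -- details omitted.
# 'registry' (live objects keyed by id) and .refresh() are assumed.
import select
import psycopg2

def watch(dsn, registry):
    conn = psycopg2.connect(dsn)
    conn.autocommit = True
    cur = conn.cursor()
    cur.execute("LISTEN users_changed")
    while True:
        select.select([conn], [], [])   # block until data arrives
        conn.poll()
        while conn.notifies:
            note = conn.notifies.pop(0)
            # The payload is assumed to carry the changed row's id.
            obj = registry.get(note.payload)
            if obj is not None:
                obj.refresh()           # re-read state from the db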