Home > Archive > Apache JDO Project > December 2005 > User demand and Issue 150. [was Re: Issue 150: Consistency requirements
User demand and Issue 150. [was Re: Issue 150: Consistency requirements
David Bullock 2005-12-29, 2:45 am
Bin Sun wrote:
>Hi, David!
>
> I'd like to share some of my ideas. We are
>developing heavy stressed web applications based on
>JDO. In each bi-di-relation, if the coder has to
>assign both sides, the other side(usually a
>collection) has to be loaded into memory and cause
>much performance penalty. It's more efficient to leave
>this synchronization task to the implementation.
>
>
Hi Bin,
Thanks for sharing your experiences. This is what I was wanting -
a justification from a real live user!
But I have a quibble with your statement: "if the coder has to assign
both sides, the other side (usually a collection) has to be loaded into
memory".
So you're saying that if I have persistent objects just so:
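import java.util.HashSet;
import java.util.Set;

public class Invoice {
    // Initialised eagerly so addLine() cannot hit a null collection.
    private Set<Line> lines = new HashSet<Line>();

    public void addLine(Line line) {
        this.lines.add(line);
    }
}

public class Line {
    private Invoice invoice;

    public void setInvoice(Invoice invoice) {
        this.invoice = invoice;
    }
}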
and I do this:
static {
    Invoice invoice = Util.getPersistentInvoice("1234");
    Line line = new Line();
    line.setInvoice(invoice);
    invoice.addLine(line); /*expensive?*/
}
that the JDO impl necessarily fetches the entire contents of the
Invoice.lines collection? And *because* of this problem, you as a user
wish to avoid the invoice.addLine(line) call?
I guess there might be JDO impls out there which do that. But I'd
rather pay money for a good one [1] which allowed me to control the
fetch policy. There's absolutely no reason why the JDO impl needs to go
to the database to let me invoke .add() on a SCO collection, until it
needs to flush and do an INSERT.
Hence there is no performance hit when setting both sides 'manually'.
Please hasten to correct me if I've misunderstood why the performance
hit takes place.
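
To make the claim concrete, here is a minimal sketch of a collection wrapper that queues additions and writes them only at flush time. DeferredAddList and CountingStore are hypothetical names invented for this illustration; they are not JDO API and do not describe any vendor's SCO implementation. The sketch also deliberately sidesteps the boolean that Collection.add() is supposed to return.

```java
import java.util.ArrayList;
import java.util.List;

// Demo: two adds issue no datastore statements until flush() is called.
class DeferredAddDemo {
    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        DeferredAddList<String> lines = new DeferredAddList<>(store);
        lines.add("line-1");
        lines.add("line-2");
        System.out.println("statements before flush: " + store.statements);
        lines.flush();
        System.out.println("statements after flush: " + store.statements);
    }
}

// Hypothetical stand-in for the datastore; it only counts statements issued.
class CountingStore {
    int statements = 0;

    void insert(Object element) {
        statements++;
    }
}

// Hypothetical wrapper: buffers additions instead of loading existing rows.
class DeferredAddList<E> {
    private final CountingStore store;
    private final List<E> pending = new ArrayList<>();

    DeferredAddList(CountingStore store) {
        this.store = store;
    }

    // No datastore access here: the element is merely queued.
    void add(E element) {
        pending.add(element);
    }

    // One INSERT per queued element; existing rows are never read.
    void flush() {
        for (E element : pending) {
            store.insert(element);
        }
        pending.clear();
    }
}
```

Under these assumptions the cost of add() is independent of how many rows the collection already holds.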
cheers,
David.
[1] Not that there are many takers for my money anymore since JSR 220
'took over' persistence.
David Bullock 2005-12-29, 2:45 am
Bin Sun wrote:
>Hi, David!
>
> SCOCollection.add() may be implemented with lazy
>feature, and you're right at this point. However,
>convenience is also a concern for me. Since I had
>noted the bi-directional attribute for a collection in
>the metadata, I should be allowed to modify one side
>of them and let the impl. do the last. As I said.
>
>
Sure, and it would be a handy feature [1]. But since under the current
spec and its pending proposal, you don't get FULLY-managed
relationships, there is a span of time - between when you update one
side and when you finally commit - when the other side of the
relationship is NOT updated. Does this partway solution meet any
programming need of yours?
To illustrate, using our Invoice and Line classes from before, under the
current and proposed specs, you are being offered the following behaviour:
static {
    PersistenceManager pm = Util.aquirePM();
    pm.currentTransaction().begin();
    Invoice inv = Util.lookupInvoice(pm, "1234"); // only has 0 lines when picked up.
    Line line = new Line();
    // we'll just update 1 side, because section 15.3 lets us take a shortcut
    line.setInvoice(inv);
    assert (inv.getLines().contains(line)); // assertion fails
    // RetainValues or DetachAllOnCommit are true.
    pm.currentTransaction().commit();
    assert (inv.getLines().contains(line)); // assertion succeeds
}
Notice how the same assertion yields different results before and after
commit. Is that really the behaviour you want of your
PersistenceManager? I know you *really* want fully-managed
relationships, but since you can't have them, are you truly happy with
the above situation[2]?
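
The before/after difference can be reproduced without a datastore. The following is a plain-Java simulation only: ToyTransaction is a hypothetical stand-in whose commit() copies the owning side (the line's invoice reference) onto the inverse side (the invoice's set of lines). None of these names are JDO API; the sketch merely models the timing described above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Demo: the same membership test gives different answers around commit().
class CommitSyncDemo {
    public static void main(String[] args) {
        ToyInvoice inv = new ToyInvoice();
        ToyLine line = new ToyLine();
        ToyTransaction tx = new ToyTransaction();

        line.invoice = inv; // update one side only
        tx.track(line);
        System.out.println(inv.lines.contains(line)); // prints false
        tx.commit();
        System.out.println(inv.lines.contains(line)); // prints true
    }
}

class ToyInvoice {
    final Set<ToyLine> lines = new HashSet<>();
}

class ToyLine {
    ToyInvoice invoice;
}

// Hypothetical transaction that synchronises the inverse side at commit.
class ToyTransaction {
    private final List<ToyLine> tracked = new ArrayList<>();

    void track(ToyLine line) {
        tracked.add(line);
    }

    void commit() {
        for (ToyLine line : tracked) {
            if (line.invoice != null) {
                line.invoice.lines.add(line);
            }
        }
        tracked.clear();
    }
}
```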
sincerely,
David.
[1] Although it only saves you one simple line of code.
[2] As I hope I've made clear by now, I'm not happy with it, and I
regard this behaviour as obnoxious. Nobody has so far presented any
argument at all as to why I should be happy with it.
David Bullock 2005-12-30, 5:45 pm
Craig L Russell wrote:
> First, let me say that I admire your passion. I wish that all expert
> group members were thus.
I figure that if it's worth mentioning in the first place, then it's
worth pursuing until it's clear to me that it's a flawed idea, or clear
to others that it's a sound one. It is, admittedly, costing me much
more time and effort to get to either state than I would like. But as
the British say, 'in for a penny, in for a pound'.
> I do have a quibble with your counter example below. Your code ignores
> the return boolean value from this.lines.add(line). What value would
> you return if the collection were not loaded?
OK, interesting point. The JDO impl would at least have to do a single
SELECT to verify whether the collection already contains the added item. It still
doesn't *have* to fault in the entire collection. If there were going
to be *repeated* inserts to the collection in this manner (say for a
dozen line items being attached to an invoice), then it might be more
efficient to fault in at least the PK's of the collection. This to my
mind is just one more piece of information to be added to fault
groups/fetch plans.
So it seems that whenever Set.add() or Map.put() is invoked (regardless
of how 15.3 reads), the price of an immediate datastore access is
incurred, because the contract of these methods promises to tell if the
collection was substantially modified EACH time.
Thus I concede there is some inherent performance advantage to be gained
by avoiding Collection.add() in user code, when the collection is in
fact transparently persisted. (A point I hadn't appreciated until now
... thanks for asking a good question).
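
As a rough sketch of that trade-off, the wrapper below answers add() truthfully by issuing a single existence probe per new element, rather than faulting in the whole collection. ProbingSet and InMemoryRows are hypothetical names made up for this illustration, and the probe counter stands in for the SELECT; no real JDO implementation is being described.

```java
import java.util.HashSet;
import java.util.Set;

// Demo: add() returns the correct boolean at the cost of one probe per new element.
class ProbeOnAddDemo {
    public static void main(String[] args) {
        InMemoryRows rows = new InMemoryRows();
        rows.persisted.add("line-1");
        ProbingSet<String> lines = new ProbingSet<>(rows);

        System.out.println(lines.add("line-1")); // prints false: already stored
        System.out.println(lines.add("line-2")); // prints true: genuinely new
        System.out.println(lines.add("line-2")); // prints false: already queued
        System.out.println("probes issued: " + rows.probes);
    }
}

// Hypothetical stand-in for the table behind the collection.
class InMemoryRows {
    final Set<Object> persisted = new HashSet<>();
    int probes = 0;

    // Models a single-row existence SELECT.
    boolean exists(Object element) {
        probes++;
        return persisted.contains(element);
    }
}

// Hypothetical set wrapper that never loads the full collection.
class ProbingSet<E> {
    private final InMemoryRows rows;
    private final Set<E> pending = new HashSet<>();

    ProbingSet(InMemoryRows rows) {
        this.rows = rows;
    }

    boolean add(E element) {
        if (pending.contains(element)) {
            return false; // answered from the local buffer, no probe needed
        }
        if (rows.exists(element)) {
            return false; // one probe, but no full load
        }
        pending.add(element);
        return true;
    }
}
```

Each add() of an element not yet queued still costs one round trip, which is the immediate datastore access conceded above.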
I can also appreciate that RDBMSs present an opportunity, whereby a
value flushed to the backing column of a <mapped-by/> field
effectively updates both sides anyway, so why not let the user have it
as soon as practicable? The timing you propose - when DetachAllOnCommit
occurs - is even laudable given that the JDO impl apparently lately
can't be relied on to intercept mutators to bring it about immediately.
In view of the performance savings attained by avoiding unnecessary
calls involving Collection.contains() (and only for such savings), this
seems a desirable hack. (Of course, when we finally get a JSR for
managed relationships, the hack won't be required any further).
I guess my main beef is that while this (performance motivated)
optimization benefits me performance-wise when I care to use it, and
doesn't cost me performance-wise when I don't care to use it, it comes
at the price of a cognitive burden - whether I happen to need it or not.
That burden is that I have to "watch my step", and not use the object at
the as-yet-unsynchronized end of the relationship until after the point at which
15.3 guarantees it will be synchronized. I dislike this prospect (even
assuming it is always possible to keep track of the necessary state,
which is by no means obvious to me), because I already have my hands
full with programming obligations. Despite repeated attempts, I was not
successful in getting my fellow user Bin to un-captiously remark "I love
this burden - this burden is everything I dreamed of", so I will assume
for now I am not the only person in the world to dislike it.
So far I have argued that the burden of keeping track of which objects I
can and can't use is always unnecessary. But for the sake of outsmoking
EJB3, I am willing to admit that it sometimes might be worth bearing.
So let's concentrate on making the burden habitable.
There are 3 strategies for dealing with this burden:
#1 The SyncRelationshipsAfterCommit behaviour happens or not, but I
declare to the PM that I am studiously not relying on it, and wish to be
notified by a runtime exception if my code (or 3rd party code) fails to
update both sides of a relationship by the time commit occurs. There is
no cognitive burden. I forego the performance benefits of avoiding
Collection.add(). My code works fine with non-managed objects in
different contexts.
#2 The SyncRelationshipsAfterCommit behaviour always happens, but I
choose as a matter of policy not to rely on it, and I always manually
update both sides of the relationship. I forego the performance
benefits of avoiding Collection.add(). There is a small chance that my
code won't work with non-managed objects in different contexts, because
JDO doesn't tell me if I unintentionally violate my own policy.
(Although it allows me to selectively and intentionally violate it,
which might sometimes be beneficial). So some cognitive burden remains.
#3 The SyncRelationshipsAfterCommit behaviour always happens, and I
choose to exploit it by judicious and minimal use of the model before
commit. I live with the burden. I attain the performance benefits, I
win the Petstore 'benchmark'. I never update both sides of the
relationship unless I can't help it. Sometimes I will have to, because
the model objects are used by 3rd-party code I have no control over, and
it will expect the relationship to be completely mutual even before
commit. When trying to use my code in contexts where the objects are
non-managed, I may have to rewrite my minimal pre-commit code, since the
absence of the synchronization in the non-managed environment will mean
that my post-commit code won't be receiving the model in the expected
consistent state. In the best case, because I partitioned my code
according to the principles of OO, and not according to the time when it
gets executed, I'll just have to touch the internals of every second
setter involved in setting up a bi-directional relationship. If I've
relied so heavily on the synch behaviour that I omitted accessors like
'Set getChildren()', mutators like 'void add(DomainEntity)' or factory
methods like 'DomainEntity newChild()' on some of my interfaces, it'll
break existing clients of my code.
I hope I have established that #3 might not be every developer's cup of
tea, and that they might prefer to accept some performance hit to avoid
both the cognitive burden and the potential implications for their
code. Certainly, they should be given the choice. They sort of have
the choice with #2, but it is a bit hit-and-miss and they are not
receiving a lot of help from JDO in enforcing their policy. This would
be adequately, cheaply, and neatly addressed by #1.
I contend that providing for #1, by letting the user request that partial
updates to relationships at commit be regarded as an 'inconsistent
update', is no more of a burden on vendors than synchronizing the
memory model - they have to perform the detection in any
case. So there is no good reason why you shouldn't allow this strategy
(in addition to the others) in the JDO 2.0 spec.
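
To show what strategy #1 amounts to, here is a minimal sketch of a commit-time check that rejects a half-updated relationship with a runtime exception. MutualityChecker, CheckedInvoice and CheckedLine are hypothetical names for this illustration; nothing here is part of the JDO API or the proposed spec text, and a real implementation would run the check inside commit rather than being called by hand.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Demo: a one-sided update is rejected, a two-sided update is accepted.
class StrictCommitDemo {
    public static void main(String[] args) {
        CheckedInvoice inv = new CheckedInvoice();
        CheckedLine line = new CheckedLine();
        line.invoice = inv; // only one side updated
        try {
            MutualityChecker.check(Collections.singletonList(inv),
                                   Collections.singletonList(line));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        inv.lines.add(line); // now both sides agree
        MutualityChecker.check(Collections.singletonList(inv),
                               Collections.singletonList(line));
        System.out.println("accepted");
    }
}

class CheckedInvoice {
    final Set<CheckedLine> lines = new HashSet<>();
}

class CheckedLine {
    CheckedInvoice invoice;
}

// Hypothetical check that both ends of every relationship agree.
class MutualityChecker {
    static void check(Collection<CheckedInvoice> invoices,
                      Collection<CheckedLine> lines) {
        for (CheckedLine line : lines) {
            if (line.invoice != null && !line.invoice.lines.contains(line)) {
                throw new IllegalStateException(
                        "line refers to an invoice that does not list it");
            }
        }
        for (CheckedInvoice invoice : invoices) {
            for (CheckedLine line : invoice.lines) {
                if (line.invoice != invoice) {
                    throw new IllegalStateException(
                            "invoice lists a line that refers elsewhere");
                }
            }
        }
    }
}
```

The detection loop is the same work a vendor would need in order to synchronise the two sides; the only difference is that it throws instead of repairing.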
in conclusion,
David.