|
Home > Archive > Apache JDO Project > November 2005 > Negative VOTE Issue 143: Aggregating null-valued expressions
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Negative VOTE Issue 143: Aggregating null-valued expressions
|
|
| Craig L Russell 2005-11-16, 8:45 pm |
| Javadogs,
Please comment if you have any issues with the proposal.
Issue 143
H
Treatment of null values in JDOQL COUNT
JDOQL currently says nothing about the treatment of null values in
the COUNT clause of a query. Based on the SQL treatment, and the fact
that JDOQL is intended to be executed by the back end datastore (see
below) I propose adding this to the JDOQL chapter:
<proposed>
If null values are aggregated, they do not participate in the
aggregate result. If all of the expressions to be aggregated evaluate
to null, the result is the same as if there were no instances that
match the filter.
</proposed>
The behavior of aggregates is described in Part 2 of the ANSI spec,
section 10.9.
1) A null column is excluded from a COUNT( colName ) aggregate. This
is described in the section 10.9 under General Rules 4a. The database
is supposed to raise a warning: "warning--null value eliminated in
set function"
2) Unless you specify the DISTINCT keyword, the COUNT aggregate will
not filter out duplicates. Each row, regardless of whether it is a
duplicate, will go into the tally. This is described in the same
section underGeneral Rules 4b.
For the record, Derby exhibits this ANSI behavior. To summarize:
-- the following query eliminates rows with null in column "a"
select count( a ) from foo;
-- the following query eliminates rows with null in column "a"
-- and eliminates duplicates
select count( distinct a ) from foo;
Craig Russell
Architect, Sun Java Enterprise System http://java.sun.com/products/jdo
408 276-5638 mailto:Craig.Russell@sun.com
P.S. A good JDO? O, Gasp!
| |
| Wes Biggs 2005-11-17, 5:45 pm |
| Quite possibly my foot will soon be in my mouth, but in the PFD 14.6.9,
the target expression for count must be "this" or a variable name (well,
it says "can" -- am I reading this incorrectly?). If this is truly a
restriction, I do not see the relevance of this addition, which only
makes sense when talking about counting field expressions or the results
of calculations.
An edge case to consider: If null values are allowed as primary keys in
an object using application identity, the likely SQL "select
count(primary_key_column) from my_table" may incorrectly omit a valid
instance, meaning that "SELECT COUNT(this) FROM MyClass" returns a
different value than Collection.size() when invoked on "SELECT this FROM
MyClass". Acceptable?
Wes
Craig L Russell wrote:
> Javadogs,
>
> Please comment if you have any issues with the proposal.
>
> Issue 143
> H
> Treatment of null values in JDOQL COUNT
>
> JDOQL currently says nothing about the treatment of null values in the
> COUNT clause of a query. Based on the SQL treatment, and the fact that
> JDOQL is intended to be executed by the back end datastore (see below)
> I propose adding this to the JDOQL chapter:
>
> <proposed>
> If null values are aggregated, they do not participate in the
> aggregate result. If all of the expressions to be aggregated evaluate
> to null, the result is the same as if there were no instances that
> match the filter.
> </proposed>
>
> The behavior of aggregates is described in Part 2 of the ANSI spec,
> section 10.9.
>
> 1) A null column is excluded from a COUNT( colName ) aggregate. This
> is described in the section 10.9 under General Rules 4a. The database
> is supposed to raise a warning: "warning--null value eliminated in set
> function"
>
> 2) Unless you specify the DISTINCT keyword, the COUNT aggregate will
> not filter out duplicates. Each row, regardless of whether it is a
> duplicate, will go into the tally. This is described in the same
> section underGeneral Rules 4b.
>
> For the record, Derby exhibits this ANSI behavior. To summarize:
>
> -- the following query eliminates rows with null in column "a"
> select count( a ) from foo;
>
> -- the following query eliminates rows with null in column "a"
> -- and eliminates duplicates
> select count( distinct a ) from foo;
>
>
> Craig Russell
>
> Architect, Sun Java Enterprise System http://java.sun.com/products/jdo
>
> 408 276-5638 mailto:Craig.Russell@sun.com
>
> P.S. A good JDO? O, Gasp!
>
>
| |
| Craig L Russell 2005-11-17, 5:45 pm |
| Hi Wes,
On Nov 17, 2005, at 7:23 AM, Wes Biggs wrote:
> Quite possibly my foot will soon be in my mouth, but in the PFD
> 14.6.9, the target expression for count must be "this" or a
> variable name (well, it says "can" -- am I reading this
> incorrectly?). If this is truly a restriction, I do not see the
> relevance of this addition, which only makes sense when talking
> about counting field expressions or the results of calculations.
I think the description in the specification is too narrow; my mistake.
<spec 14.6.9>
count(<expression> ): the count of the number of instances of this
expression is returned; the expression can be “this” or a variable name
</spec 14.6.9>
I think it should be
<proposed 14.6.9>
count(<expression> ): the count of the number of instances of this
expression is returned; the expression is preceded by an optional
"distinct" followed by “this”, a navigational expression that
terminates in a single-valued field, or a variable name
sum(<numeric field expression> ): the sum of field expressions is
returned; the expression is preceded by an optional "distinct"
min(<field expression> ): the minimum value of the field expressions
is returned; the expression is preceded by an optional "distinct"
max(<field expression> ): the maximum value of the field expressions
is returned; the expression is preceded by an optional "distinct"
avg(<numeric field expression> ): the average value of all field
expressions is returned; the expression is preceded by an optional
"distinct"
</proposed 14.6.9>
With the definition as it currently exists, select count(manager)
from Employee is not legal, but it certainly should be. It counts the
number of employees with non-null managers. And select count(distinct
dept.manager) from Employee counts the number of managers.
>
> An edge case to consider: If null values are allowed as primary
> keys in an object using application identity, the likely SQL
> "select count(primary_key_column) from my_table" may incorrectly
> omit a valid instance, meaning that "SELECT COUNT(this) FROM
> MyClass" returns a different value than Collection.size() when
> invoked on "SELECT this FROM MyClass". Acceptable?
Primary keys cannot be null. So I don't think this is an issue.
Craig
>
> Wes
>
> Craig L Russell wrote:
>
>
Craig Russell
Architect, Sun Java Enterprise System http://java.sun.com/products/jdo
408 276-5638 mailto:Craig.Russell@sun.com
P.S. A good JDO? O, Gasp!
| |
| Wes Biggs 2005-11-17, 5:45 pm |
| Sounds good with changes as you mentioned. Thanks.
Wes
Craig L Russell wrote:
> Hi Wes,
>
> On Nov 17, 2005, at 7:23 AM, Wes Biggs wrote:
>
>
>
> I think the description in the specification is too narrow; my mistake.
>
> <spec 14.6.9>
> count(<expression> ): the count of the number of instances of this
> expression is returned; the expression can be “this” or a variable name
> </spec 14.6.9>
>
> I think it should be
>
> <proposed 14.6.9>
> count(<expression> ): the count of the number of instances of this
> expression is returned; the expression is preceded by an optional
> "distinct" followed by “this”, a navigational expression that
> terminates in a single-valued field, or a variable name
> sum(<numeric field expression> ): the sum of field expressions is
> returned; the expression is preceded by an optional "distinct"
> min(<field expression> ): the minimum value of the field expressions is
> returned; the expression is preceded by an optional "distinct"
> max(<field expression> ): the maximum value of the field expressions is
> returned; the expression is preceded by an optional "distinct"
> avg(<numeric field expression> ): the average value of all field
> expressions is returned; the expression is preceded by an optional
> "distinct"
> </proposed 14.6.9>
>
> With the definition as it currently exists, select count(manager) from
> Employee is not legal, but it certainly should be. It counts the
> number of employees with non-null managers. And select count(distinct
> dept.manager) from Employee counts the number of managers.
>
>
>
> Primary keys cannot be null. So I don't think this is an issue.
>
> Craig
>
>
> Craig Russell
>
> Architect, Sun Java Enterprise System http://java.sun.com/products/jdo
>
> 408 276-5638 mailto:Craig.Russell@sun.com
>
> P.S. A good JDO? O, Gasp!
>
>
| |
| Michael Bouschen 2005-11-17, 5:45 pm |
| Hi Craig,
I agree with the proposed change.
Regards Michael
> Hi Wes,
>
> On Nov 17, 2005, at 7:23 AM, Wes Biggs wrote:
>
>
>
> I think the description in the specification is too narrow; my mistake.
>
> <spec 14.6.9>
> count(<expression> ): the count of the number of instances of this
> expression is returned; the expression can be “this” or a variable name
> </spec 14.6.9>
>
> I think it should be
>
> <proposed 14.6.9>
> count(<expression> ): the count of the number of instances of this
> expression is returned; the expression is preceded by an optional
> "distinct" followed by “this”, a navigational expression that
> terminates in a single-valued field, or a variable name
> sum(<numeric field expression> ): the sum of field expressions is
> returned; the expression is preceded by an optional "distinct"
> min(<field expression> ): the minimum value of the field expressions is
> returned; the expression is preceded by an optional "distinct"
> max(<field expression> ): the maximum value of the field expressions is
> returned; the expression is preceded by an optional "distinct"
> avg(<numeric field expression> ): the average value of all field
> expressions is returned; the expression is preceded by an optional
> "distinct"
> </proposed 14.6.9>
>
> With the definition as it currently exists, select count(manager) from
> Employee is not legal, but it certainly should be. It counts the
> number of employees with non-null managers. And select count(distinct
> dept.manager) from Employee counts the number of managers.
>
>
>
> Primary keys cannot be null. So I don't think this is an issue.
>
> Craig
>
>
> Craig Russell
>
> Architect, Sun Java Enterprise System http://java.sun.com/products/jdo
>
> 408 276-5638 mailto:Craig.Russell@sun.com
>
> P.S. A good JDO? O, Gasp!
>
>
--
Michael Bouschen Tech@Spree Engineering GmbH
mailto:mbo.tech@spree.de http://www.tech.spree.de/
Tel.:++49/30/235 520-33 Buelowstr. 66
Fax.:++49/30/2175 2012 D-10783 Berlin
| |
| Craig L Russell 2005-11-18, 5:45 pm |
| One last issue. Some SQL dialects don't allow distinct with min and
max (it doesn't make any difference to the semantics of min and max).
I propose disallowing it from JDOQL.
Here's the latest proposal:
<proposed 14.6.9>
A14.6.9-5 [The result expressions include:
“this”: indicates that the candidate instance is returned
<field>: this indicates that a field is returned as a value; the
field might be in the candidate class or in a class referenced by a
variable
<variable>: this indicates that a variable’s value is returned as a
persistent instance
<aggregate>: this indicates that an aggregate of multiple values is
returned; if null values are aggregated, they do not participate in
the aggregate result; if all of the expressions to be aggregated
evaluate to null, the result is the same as if there were no
instances that match the filter.
count(<expression> ): the count of the number of instances of the
expression is returned; the expression is preceded by an optional
“distinct” and can be “this”, a navigational expression that
terminates in a single-valued field, or a variable name
sum(<numeric field expression> ): the sum of field expressions is
returned; the expression is preceded by an optional "distinct"
min(<orderable field expression> ): the minimum value of the field
expression is returned
max(<orderable field expression> ): the maximum value of the field
expression is returned
avg(<numeric field expression> ): the average value of all field
expressions is returned; the expression is preceded by an optional
"distinct"
<field expression>: the value of an expression using any of the
operators allowed in queries applied to fields is returned
<navigational expression>: this indicates a navigational path through
single-valued fields or variables as specified by the Java language
syntax; the navigational path starts with the keyword “this”, a
variable, a parameter, or a field name followed by field names
separated by dots.
<parameter>: one of the parameters provided to the query.
</proposed 14.6.9>
On Nov 17, 2005, at 3:27 PM, Michael Bouschen wrote:
> Hi Craig,
>
> I agree with the proposed change.
>
> Regards Michael
>
>
>
> --
> Michael Bouschen Tech@Spree Engineering GmbH
> mailto:mbo.tech@spree.de http://www.tech.spree.de/
> Tel.:++49/30/235 520-33 Buelowstr. 66
> Fax.:++49/30/2175 2012 D-10783 Berlin
>
Craig Russell
Architect, Sun Java Enterprise System http://java.sun.com/products/jdo
408 276-5638 mailto:Craig.Russell@sun.com
P.S. A good JDO? O, Gasp!
|
|
|
|
|