Apache Directory Project - ApacheDS partition implementation based on Relational Model


Author ApacheDS partition implementation based on Relational Model
Ersin Er

2006-11-02, 7:11 am

Hi all,

I need some advice on implementing a partition for ADS based on the
relational model, using SQL, Hibernate, JPA, or a similar
framework.

First of all, is this realistic? Can we reach a usable result?

How can we map Attributes to an SQL model? Should we hold Attribute
Values in blobs?

Can we leverage the power of SQL SELECT for LDAP search operations?

How much of the partition code in ADS can be used for this task?

And please share any more ideas you have.

Thanks in advance.

--
Ersin

Alex Karasulu

2006-11-02, 1:11 pm

Ersin Er wrote:
> Hi all,
>
> I need some advice on implementing a partition for ADS based on the
> relational model and using SQL or Hibernate or JPA, or framework like
> them..
>
> First of all, is this realistic? Can we reach a usable result?


Ok first off you need to better define exactly what you are trying to
achieve.

In my mind you might be asking to do 2 separate things:

1). Build a generic backend that backs data within a relational database
using JDBC and has a fixed custom schema for storing and querying LDAP
data.

2). Build a flexible backend that can map any relational database schema
to an LDAP schema and namespace. This is more like what is done with a
virtual directory.

I will presume below you are referring to #1 and answer your questions.

> How can we map Attributes to SQL model?


There are probably a few ways to do this. Some will be much faster than
others; however, the faster it is, the uglier it will be.

One way is to have one big table with the following columns:

1). ENTRY (BLOB)
2). NDN (VARCHAR)
3). UPDN (VARCHAR)
4). ID (INTEGER)

You can look up the blob entries this way by normalized distinguished
name (NDN) and user-provided distinguished name (UPDN), as well as by ID.

If you want to index a specific attribute, use some DDL to add a new
COLUMN to this table. That column should be named after the attribute
being indexed (indexed in the LDAP sense, not the DB sense). Do a full
table scan the first time and populate this new "index" COLUMN with the
values of the attribute.
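
A minimal sketch of the single-table layout and the "index column" step
described above, using SQLite through Python's standard sqlite3 module.
Everything here is an assumption for illustration: the table name
`entries`, the JSON serialization of the entry blob, the crude DN
normalization, and the helper names are not from ApacheDS. Note that one
column can hold only one value per row, so this sketch indexes just the
first value of a multi-valued attribute.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries ("
    " id INTEGER PRIMARY KEY,"   # ID
    " ndn TEXT NOT NULL UNIQUE," # normalized DN
    " updn TEXT NOT NULL,"       # user-provided DN
    " entry BLOB NOT NULL)"      # serialized entry
)


def normalize_dn(dn):
    """Crude stand-in for real DN normalization: trim and lowercase each RDN."""
    return ",".join(part.strip().lower() for part in dn.split(","))


def add_entry(updn, attrs):
    """Store an entry (dict of attribute name -> list of values) as a blob."""
    blob = json.dumps(attrs).encode("utf-8")
    cur = conn.execute(
        "INSERT INTO entries (ndn, updn, entry) VALUES (?, ?, ?)",
        (normalize_dn(updn), updn, blob),
    )
    return cur.lastrowid


def add_index_column(attr):
    """Add a column named after the attribute and fill it with one full scan."""
    if not attr.isalnum():
        # The name is spliced into DDL, so accept only simple identifiers.
        raise ValueError("unsupported attribute name: " + attr)
    conn.execute(f'ALTER TABLE entries ADD COLUMN "{attr}" TEXT')
    conn.execute(f'CREATE INDEX "idx_{attr}" ON entries ("{attr}")')
    rows = conn.execute("SELECT id, entry FROM entries").fetchall()
    for row_id, blob in rows:
        values = json.loads(blob).get(attr)
        if values:
            # Only the first value fits in a single column (see lead-in).
            conn.execute(
                f'UPDATE entries SET "{attr}" = ? WHERE id = ?',
                (values[0].lower(), row_id),
            )
```

After `add_index_column("cn")`, an equality search on cn can be answered
with `SELECT id FROM entries WHERE "cn" = ?` without touching the blobs.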

Handling queries now is not that complex. Basically you need to
determine which attributes you have indices on and which you don't.
Then run a query to narrow down the rows whose entries you'll have to
resuscitate from the blob.

You might need another table for an existence index too. The EXISTENCE
table might have an ATTRIBUTE column and an ID column. If a record
exists in this table for an attribute, then your blobbed entry has a
value for this attribute.
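
A small sketch of the existence table described above, again assuming
SQLite via Python's sqlite3 module. The table layout follows the two
columns named in the message; the helper names and the lowercasing of
attribute names are assumptions. The IDs are taken to refer to rows of
the main entry table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE existence ("
    " attribute TEXT NOT NULL,"
    " id INTEGER NOT NULL,"
    " PRIMARY KEY (attribute, id))"
)


def record_presence(entry_id, attribute_names):
    """Record that the entry with this ID has a value for each attribute."""
    conn.executemany(
        "INSERT OR IGNORE INTO existence (attribute, id) VALUES (?, ?)",
        [(name.lower(), entry_id) for name in attribute_names],
    )


def ids_with_attribute(attribute):
    """Answer a presence filter such as (mail=*) without reading any blob."""
    rows = conn.execute(
        "SELECT id FROM existence WHERE attribute = ? ORDER BY id",
        (attribute.lower(),),
    )
    return [row[0] for row in rows]
```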

> Should we hold Attribute
> Values in blobs?


You will need to hold the entry in a blob.

> Can we leverage the power of SQL SELECT for LDAP search operations?


Sure. You just need to know how to build the WHERE clause of SQL using
this simple schema.
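
One way the WHERE clause could be built, sketched under assumptions: the
LDAP filter has already been parsed into nested tuples (a hypothetical
representation, not an ApacheDS type), every attribute in the filter has
its own index column, and only and/or/equality/presence are handled.
Values are passed as bind parameters rather than spliced into the SQL.

```python
def build_where(node, indexed):
    """Translate a parsed filter into (sql_fragment, parameters).

    node is one of:
      ("eq", attr, value)      equality, e.g. (cn=acme)
      ("present", attr)        presence, e.g. (mail=*)
      ("and", child, ...)      conjunction
      ("or", child, ...)       disjunction
    indexed is the set of attributes that have an index column.
    """
    op = node[0]
    if op in ("eq", "present"):
        attr = node[1]
        if attr not in indexed:
            raise ValueError("no index column for attribute: " + attr)
        if op == "eq":
            return f'"{attr}" = ?', [node[2]]
        return f'"{attr}" IS NOT NULL', []
    if op in ("and", "or"):
        fragments, params = [], []
        for child in node[1:]:
            sql, child_params = build_where(child, indexed)
            fragments.append(f"({sql})")
            params.extend(child_params)
        return f" {op.upper()} ".join(fragments), params
    raise ValueError("unsupported filter operator: " + op)
```

The returned fragment would be appended to something like
`SELECT id, entry FROM entries WHERE ...`, with the parameter list
passed alongside it.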

> How much of the partition code in ADS can be used for this task?


Not much.

Alex


Ole Ersoy

2006-11-02, 1:11 pm

Hi Ersin,

Alex and I talked about doing this.

I need to finish the first version of the design
document.

OpenLDAP supports relational backends.

They also have this command:

ldbmcat - LDBM to LDIF database format conversion
utility

I think that performs relational to LDIF conversion.

Personally I need to be able to go between an LDAP
format, XML Schema format, and a relational format.

I'm planning on going with a 4th representation
of
those 3, which will be Eclipse EMF Ecore (a subset of
OMG's Meta Object Facility).

That way ecore is the common denominator.

So the initial thinking is that ldap attributes will
be mapped to attributes on an Ecore based model object
(Just a pojo really supported by the Eclipse EMF API).

The same Ecore model is used to generate the Hibernate
mapping (Preferably using annotations).

Then it's done. Hibernate takes care of the rest.

I still need to put this in complete context, adding
in Alex's thoughts on virtual directories, etc.

That's a little off though...after the rpm mojo gets
done.

Cheers,
- Ole



--- Ersin Er <ersin.er-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> Hi all,
>
> I need some advice on implementing a partition for
> ADS based on the
> relational model and using SQL or Hibernate or JPA,
> or framework like
> them..
>
> First of all, is this realistic? Can we reach a
> usable result?
>
> How can we map Attributes to SQL model? Should we
> hold Attribute
> Values in blobs?
>
> Can we leverage the power of SQL SELECT for LDAP
> search operations?
>
> How much of the partition code in ADS can be used
> for this task?
>
> And please share any more ideas you have.
>
> Thanks in advance.
>
> --
> Ersin
>







Emmanuel Lecharny

2006-11-02, 1:11 pm

On 11/2/06, Alex Karasulu <aok123-Bdlq13kUjeyLZ21kGMrzwg@public.gmane.org> wrote:
>
> Ersin Er wrote:
> <snip/>> How can we map Attributes to SQL model?
>
> There are probably a few ways to do this but some will be much faster
> however the faster it is the uglier it will be.
>
> One way is to have one big table with the following columns:
>
> 1). ENTRY (BLOB)
> 2). NDN (VARCHAR)
> 3). UPDN (VARCHAR)
> 4). ID (INTEGER)



Well, from an RDBMS point of view, I think a correct structure would be
something like this.
A first table, ENTRY_T, with these columns:

DN : varchar (but may be an id, which will refer to another table)
ATTR : attribute name, varchar

and an ATTRIBUTES_T table :
DN : varchar
ATTR : varchar
VALUE : blob

The idea is that you will set indexes on those tables, so you no longer
need to declare NDN.
For instance, if you want to get all attribute values for an entry, the
request will look like:
select DN, ATTR, VALUE from ATTRIBUTES_T where DN = %dn%

(%dn% stands for the DN you are looking for)

Now, if you want all the entries where cn = ACME, the request will be:

select DN, ATTR, VALUE from ATTRIBUTES_T where DN in (select DN from
ATTRIBUTES_T where ATTR = 'cn' and VALUE = 'ACME')

Just set the correct indexes to get good performance!

(this is just a first approach, we need to improve it a _lot_)
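
The two requests above can be tried out with a sketch like the
following, assuming SQLite via Python's sqlite3 module. Only the
ATTRIBUTES_T table is modeled; the index names, the helper functions,
and the choice to store values as raw bytes are assumptions made for
illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE attributes_t (
        dn    TEXT NOT NULL,
        attr  TEXT NOT NULL,
        value BLOB NOT NULL
    );
    CREATE INDEX attributes_dn_idx ON attributes_t (dn);
    CREATE INDEX attributes_attr_value_idx ON attributes_t (attr, value);
    """
)


def add_value(dn, attr, value):
    """Store one attribute value (bytes) for the entry with this DN."""
    conn.execute(
        "INSERT INTO attributes_t (dn, attr, value) VALUES (?, ?, ?)",
        (dn, attr, value),
    )


def fetch_entry(dn):
    """All attribute values of one entry, grouped by attribute name."""
    entry = {}
    rows = conn.execute(
        "SELECT attr, value FROM attributes_t WHERE dn = ? ORDER BY attr", (dn,)
    )
    for attr, value in rows:
        entry.setdefault(attr, []).append(value)
    return entry


def search_equal(attr, value):
    """All entries having attr = value, rebuilt from their rows."""
    result = {}
    rows = conn.execute(
        "SELECT dn, attr, value FROM attributes_t WHERE dn IN"
        " (SELECT dn FROM attributes_t WHERE attr = ? AND value = ?)"
        " ORDER BY dn, attr",
        (attr, value),
    )
    for dn, name, val in rows:
        result.setdefault(dn, {}).setdefault(name, []).append(val)
    return result
```

Each attribute value is one row, so multi-valued attributes need no
special handling, at the cost of reassembling every entry from several
rows.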

I have put some thoughts related to backend organization here:
http://docs.safehaus.org/display/APACHEDS/Backend
but it needs a lot more work!

> You can lookup entries that are blobs this way by normalized (NDN) and
> user provided distinguished names (UPDN) as well as by ID.
>
> If you want to index a specific attribute use some DDL to add a new
> COLUMN to this table. That column should be the name of the attribute
> being (LDAP not DB) indexed. Do a full table scan the first time and
> populate this new "index" COLUMN with the values of the attribute.
>
> Handling queries now is not that complex. Basically you need to
> determine which attributes you have indices on and which you don't.
> Then do a query to select and narrow down the rows that you'll have to
> resusitate the entry from the blob from.
>
> You might need another table for an existance index too. The EXISTANCE
> table might have a ATTRIBUTE column, and ID column. If a record exists
> in this table for an attribute your blobed entry then has a value for
> this attribute.
>
> Should we hold Attribute
>
> You will need to hold the entry in a blob.



Well, you have two options: varchar for simple and limited entries (but
varchar can't be larger than, say, 256 chars, which may become a problem), or
blobs for binary elements or large character data. That's a pity, because
blobs suck when you want to set an index on them.

> Can we leverage the power of SQL SELECT for LDAP search operations?
>
> Sure. You just need to know how to build the WHERE clause of SQL using
> this simple schema.
>
>
> Not much.



Yes, this will really be one of our biggest problems.

> Alex
>
>



--
Cordialement,
Emmanuel Lécharny

Emmanuel Lecharny

2006-11-02, 1:11 pm

Hi Ole,

On 11/2/06, Ole Ersoy <ole_ersoy-/E1597aS9LQAvxtiuMwx3w@public.gmane.org> wrote:
>
> Hi Ersin,
>
> <snip/>
> Personally I need to be able to go between an LDAP
> format, XML Schema format, and a relational format.


This will be possible really soon, at least from LDAP to XML. Pam is
writing a DSML codec, so you will be able to extract data from LDAP and put
it in XML format (using DSML V2). You will also be able to send data to an
LDAP server using DSML.


> I'm planning on going with need a 4th representation
> of
> those 3, which will be Eclipse EMF Ecore (A subset of
> OMG's meta object facility).
>
> That way ecore is the common denominator.



At this point, I really think that DSML might be the perfect pivot
description, because LDIF only represents data while DSML can also express
operations. For instance, if you want to write an LDAP proxy, you can do that
with DSML requests and responses (the LdapStudio LDAP browser works
this way, so, yes, this is possible). I don't think we need a 4th
description meta-stuff. KISS, bros!

> So the initial thinking is that ldap attributes will
> be mapped to attributes on an Ecore based model object
> (Just a pojo really supported by the Eclipse EMF API).
>
> The same Ecore model is used to generate the Hibernate
> mapping (Preferably using annotations).
>
> Then it's done. Hibernate takes care of the rest.



Well, this vision does not take into account the performance
issues you will have if you do that. I may be wrong (and I hope to be,
because this seems to be _so_ simple that I really want it to work), but as
an old programmer, I don't believe in god or in the 'snap your fingers and the
tool will do the rest' thingy... Yeah, I'm an agnostic old fart ;) Make me
believe in it: I want a working sample!

> I still need to put this in complete context, adding
> in Alex's thoughts on virtual directories, etc.
>
> That's a little off though...after the rpm mojo gets
> done.



Yeah, that seems pragmatic. Thanks for the work done on RPM, man!

> Cheers,
> - Ole



Emmanuel

David Boreham

2006-11-02, 1:11 pm

Ersin Er wrote:

> I need some advice on implementing a partition for ADS based on the
> relational model and using SQL or Hibernate or JPA, or framework like
> them..


First the $64,000 question : WHY ?

> First of all, is this realistic? Can we reach a usable result?


Yes, but experience shows that it's typically not worth the trouble.
There are two common reasons for wanting such a thing:

1. 'Datastore envy' : 'I want all my data in Oracle' (because Larry says
so).

2. Adapting existing data (hey, all our HR stuff is in an Oracle database
underneath Peoplesoft, let's expose that using LDAP).

The trouble with #1 is that once whoever it is that's asking
is told the cost and hassle involved vs just using a perfectly
working LDAP server that already exists, they tend to forget
their datastore envy.

The trouble with #2 is that it turns into an object relational mapping
science project. Very hard to say in advance what kinds of mapping
are needed without seeing the use cases. So it tends to deflate into
'well we can write some custom hack for each individual customer'
and 'hmm...syncing the data using a metadirectory solution is much
easier'.

> Can we leverage the power of SQL SELECT for LDAP search operations?


The simplest way to do it is to construct tables that look just like the
b-tree relations used in a custom LDAP data store. However, this doesn't
achieve goal #2 above.

There have been some successful LDAP server products that
_only_ used the relational database store technique : IBM had one
and so did(does?) Oracle.



Ersin Er

2006-11-02, 1:11 pm

Hi!

On 11/2/06, Alex Karasulu <aok123-Bdlq13kUjeyLZ21kGMrzwg@public.gmane.org> wrote:
> Ersin Er wrote:
>
> Ok first off you need to better define exactly what you are trying to
> achieve.
>
> In my mind you might be asking to do 2 separate things:
>
> 1). Build a generic backend that backs data within a relational database
> using JDBC and has a fixed custom schema for storing and querying LDAP
> data.
>
> 2). Build a flexible backend that can map any relational database schema
> to an LDAP schema and namespace. This is more like what is done with a
> virtual directory.
>
> I will presume below you are referring to #1 and answer your questions.


Yes the first one.

>
> There are probably a few ways to do this but some will be much faster
> however the faster it is the uglier it will be.
>
> One way is to have one big table with the following columns:
>
> 1). ENTRY (BLOB)
> 2). NDN (VARCHAR)
> 3). UPDN (VARCHAR)
> 4). ID (INTEGER)
>
> You can lookup entries that are blobs this way by normalized (NDN) and
> user provided distinguished names (UPDN) as well as by ID.
>
> If you want to index a specific attribute use some DDL to add a new
> COLUMN to this table. That column should be the name of the attribute
> being (LDAP not DB) indexed. Do a full table scan the first time and
> populate this new "index" COLUMN with the values of the attribute.
>
> Handling queries now is not that complex. Basically you need to
> determine which attributes you have indices on and which you don't.
> Then do a query to select and narrow down the rows that you'll have to
> resusitate the entry from the blob from.


What if we do not have an index on an attribute? Pull all entries?
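
The thread does not answer this question directly. One possible
fallback, sketched here purely as an assumption, is exactly that: scan
every row, deserialize each blob, and evaluate the filter in memory. The
sketch uses SQLite via Python's sqlite3 module; the table layout, the
JSON blob format, and the `INDEXED` set are hypothetical.

```python
import json
import sqlite3

INDEXED = {"cn"}  # attributes that have their own index column


def make_store():
    """Create the single entry table with one index column, cn."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries ("
        " id INTEGER PRIMARY KEY,"
        " updn TEXT NOT NULL,"
        " entry BLOB NOT NULL,"
        " cn TEXT)"
    )
    return conn


def add_entry(conn, updn, attrs):
    """Store the entry blob and copy the first cn value into its column."""
    cn = (attrs.get("cn") or [None])[0]
    conn.execute(
        "INSERT INTO entries (updn, entry, cn) VALUES (?, ?, ?)",
        (updn, json.dumps(attrs).encode("utf-8"), cn),
    )


def search_equal(conn, attr, value):
    """Use the index column when there is one, otherwise scan all blobs."""
    if attr in INDEXED:
        rows = conn.execute(
            f'SELECT updn FROM entries WHERE "{attr}" = ? ORDER BY id', (value,)
        )
        return [row[0] for row in rows]
    matches = []
    # No index: every entry is pulled and checked after deserialization.
    for updn, blob in conn.execute("SELECT updn, entry FROM entries ORDER BY id"):
        if value in json.loads(blob).get(attr, []):
            matches.append(updn)
    return matches
```

The unindexed path costs one blob deserialization per entry in the
table, which is why the choice of indexed attributes matters so much in
this design.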

> You might need another table for an existance index too. The EXISTANCE
> table might have a ATTRIBUTE column, and ID column. If a record exists
> in this table for an attribute your blobed entry then has a value for
> this attribute.
>
> Should we hold Attribute
>
> You will need to hold the entry in a blob.
>
>
> Sure. You just need to know how to build the WHERE clause of SQL using
> this simple schema.
>
>
> Not much.
>
> Alex
>
>


Thanks. Let me move on to the other messages in the thread as well.

--
Ersin

Ersin Er

2006-11-02, 1:11 pm

Hi,

On 11/2/06, Emmanuel Lecharny wrote:
> <snip/>
> Well, from a RDBMS point of view, I think that a correct structure will be
> something like :
> A first table, ENTRY_T, with those columns :
>
> DN : varchar (but may be an id, which will refer to another table)
> ATTR : attribute name, varchar
>
> and an ATTRIBUTES_T table :
> DN : varchar
> ATTR : varchar
> VALUE : blob

This was the schema I had in mind, but I am really not sure. This
discussion is getting interesting, and that's good.

> <snip/>

We need to document our ideas. I think we will reach a good point
soon. Or we'll see it after implementation.

Thanks!

--
Ersin

Ersin Er

2006-11-02, 1:11 pm

Hi,

On 11/2/06, David Boreham <david_list-a7e5zrL3AIdAfugRpC6u6w@public.gmane.org> wrote:
> Ersin Er wrote:
>
>
> First the $64,000 question : WHY ?


Well, this is just a test currently. It's a software engineering
project for two students, to make them familiar with all this stuff
(LDAP, RDBMS, ADS, pragmatic tools, etc.). It can also be thought of as
a DB research project to learn what the best way of storing directory
data in an RDBMS is.

>
> Yes, but experience shows that it's typlically not worth the trouble.
> There are two common reasons for wanting such a thing:
>
> 1. 'Datastore envy' : 'I want all my data in Oracle' (because Larry says
> so).
>
> 2. Adapting existing data (hey, all our HR stuff is in an Oracle database
> underneath Peoplesoft, let's expose that using LDAP).
>
> The trouble with #1 is that once whoever it is that's asking
> is told the cost and hassle involved vs just using a perfectly
> working LDAP server that already exists, they tend to forget
> their datastore envy.
>
> The trouble with #2 is that it turns into an object relational mapping
> science project. Very hard to say in advance what kinds of mapping
> are needed without seeing the use cases. So it tends to deflate into
> 'well we can write some custom hack for each individual customer'
> and 'hmm...syncing the data using a metadirectory solution is much
> easier'.
>
>
> The simplest way to do it is to construct tables that look just like the
> b-tree relations used in a custom LDAP data store. However this doesn't
> goal achieve #2 above.


Can you explain this more?

>
> There have been some successful LDAP server products that
> _only_ used the relational database store technique : IBM had one
> and so did(does?) Oracle.


Thanks.

--
Ersin

Emmanuel Lecharny

2006-11-02, 1:11 pm

On 11/2/06, David Boreham <david_list-a7e5zrL3AIdAfugRpC6u6w@public.gmane.org> wrote:
>
> Ersin Er wrote:
>
>
> First the $64,000 question : WHY ?



If it's worth $64,000, then it's a BECAUSE. Otherwise, there are many
other good reasons besides being greedy:
- SQL databases are reliable, while the jdbm database is not
- SQL databases have a _lot_ of tools, while we have none, or close to
none
- SQL databases support transactions, which is good to have, because we don't
support them...
- SQL databases can be replicated
- SQL databases can be stored on a SAN or a cluster easily
- There are a lot of add-ons like Hibernate to do the mapping on an SQL database
- Some customers want trustworthy storage. Oracle is trustworthy (well, this is
questionable... a system is only as strong as its weakest element (man?))
- And, so far, databases are quite fast. IBM IDS uses DB2; I have seen it
running with 70 000 000 entries, and it was fast enough for our needs...

> First of all, is this realistic? Can we reach a usable result?
>
> Yes, but experience shows that it's typlically not worth the trouble.
> There are two common reasons for wanting such a thing:
>
> 1. 'Datastore envy' : 'I want all my data in Oracle' (because Larry says
> so).
>
> 2. Adapting existing data (hey, all our HR stuff is in an Oracle database
> underneath Peoplesoft, let's expose that using LDAP).



I never saw that in real life. Generally, what I saw is people using a
meta-directory or tools to export data from the HR database to LDIF, and
import it. All that done around 3am, with manual restoration and correction
at 9am, in a frenzy of urgency ;) (remember, the weakest element ...)

> The trouble with #1 is that once whoever it is that's asking
> is told the cost and hassle involved vs just using a perfectly
> working LDAP server that already exists, they tend to forget
> their datastore envy.



Yes, but here we are much more in a political environment. The choice of a
backend will be driven by sales, and Oracle representatives are pretty
impressive when it comes to explaining to companies that "Oracle can do it all".
So they might choose another LDAP server (well, this is a weak demonstration,
I know). OK, you are right: if it works smoothly with jdbm, people won't
care about having an RDBMS backend!

> The trouble with #2 is that it turns into an object relational mapping
> science project. Very hard to say in advance what kinds of mapping
> are needed without seeing the use cases. So it tends to deflate into
> 'well we can write some custom hack for each individual customer'
> and 'hmm...syncing the data using a metadirectory solution is much
> easier'.



I don't really think that the RDBMS mapping will be that bad. We already
have to build a correct schema for a b-tree implementation, so it's just a
question of remapping it to a relational schema. Not necessarily a big deal.

> Can we leverage the power of SQL SELECT for LDAP search operations?
>
> The simplest way to do it is to construct tables that look just like the
> b-tree relations used in a custom LDAP data store. However this doesn't
> goal achieve #2 above.



Yuck! That may be the worst solution! The vast majority of the b-trees used in
jdbm should simply be removed, as RDBMS indexes replace them exactly.
The main problem is to build the DN tree correctly, and that's it.
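
The message does not say how the DN tree would be stored. One common
relational technique, shown here only as an assumption and not as
anything proposed in the thread, is an adjacency list: each row keeps
its RDN and a reference to its parent, and a recursive query collects a
subtree. The sketch uses SQLite via Python's sqlite3 module; the
`dn_tree` table and helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dn_tree ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES dn_tree(id),"  # NULL for a root
    " rdn TEXT NOT NULL,"
    " UNIQUE (parent_id, rdn))"
)


def add_node(rdn, parent_id=None):
    """Insert one RDN under its parent and return the new node's ID."""
    cur = conn.execute(
        "INSERT INTO dn_tree (parent_id, rdn) VALUES (?, ?)", (parent_id, rdn)
    )
    return cur.lastrowid


def subtree_ids(base_id):
    """IDs of the base node and all its descendants (subtree scope)."""
    rows = conn.execute(
        "WITH RECURSIVE sub(id) AS ("
        "  SELECT id FROM dn_tree WHERE id = ?"
        "  UNION ALL"
        "  SELECT t.id FROM dn_tree t JOIN sub ON t.parent_id = sub.id"
        ") SELECT id FROM sub ORDER BY id",
        (base_id,),
    )
    return [row[0] for row in rows]


def dn_of(node_id):
    """Rebuild the DN by walking from the node up to the root."""
    parts = []
    while node_id is not None:
        rdn, node_id = conn.execute(
            "SELECT rdn, parent_id FROM dn_tree WHERE id = ?", (node_id,)
        ).fetchone()
        parts.append(rdn)
    return ",".join(parts)
```

A one-level search is then a plain `WHERE parent_id = ?`, while subtree
searches depend on how well the database handles the recursive query.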

> There have been some successful LDAP server products that
> _only_ used the relational database store technique : IBM had one
> and so did(does?) Oracle.



Yes, and IBM's is quite a good LDAP server, even if it is a very heavy one.

However, thanks for the insight, David. I pretty much agree with you on
some of your points. In my mind, an RDBMS backend is just a question of
going mainstream, because people want to use mainstream technology...

Emmanuel


--
Cordialement,
Emmanuel Lécharny

Ole Ersoy

2006-11-02, 1:11 pm

Hey Emmanuel,

How did you get the sexy vertical lines in the reply?
Way better than all the >>>

Are you using Thunderbird?


--- Emmanuel Lecharny <elecharny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> Hi Ole,
>
> On 11/2/06, Ole Ersoy <ole_ersoy-/E1597aS9LQAvxtiuMwx3w@public.gmane.org> wrote:
> <snip/>
>
> This will be possible really soon, at least from
> ldap to XML. Pam is
> writting a DSML codec, so you will be able to
> extract data from LDAP and put
> them in XML format (using DSML V2). You will also be
> able to send data to a
> ldap server using DSML.




That's really sweet.
Here's one of the use cases I want to support for
generating applications (note that for virtual
directories there will probably be at least one other
use case):

A)
First create a model that gets translated into Ecore
(XMI). This is the Model part of an MVC architecture,
but it's generic and can map to XML, relational, LDAP,
whatever, etc.
B)
Translate the Ecore XMI into a Java model (beans) for
in-memory persistence. (Incidentally, Ecore is also used
to generate Service Data Objects (SDO). The purpose
of SDO is to unify data programming, like JDO, but
using Ecore metadata as the driver.)
C)
Generate the schema for the persistence solution used
(relational, LDAP, XML, whatever).
Note that with Hibernate, if you generate the Hibernate
configuration file, Hibernate can then generate the
relational schema for you. There are already projects
doing this that can be found in the Eclipse EMF
Corner on the Eclipse EMF site.

So DSML is cool because Ecore can be used to
generate DSML, and then PAM can be used as the driver
to go from DSML to LDIF and back, which means the
result can be imported back into an Ecore-generated
Java model, if I understand correctly.

I hope to have some samples worked up soon, just gotta
get this JPackage plugin out the door!

Cheers,
- Ole


>
>
> <snip/>
>
>
> At this point, I really think that DSML might be the
> perfect pivot
> description, because LDIF is only representing data
> while DSML can express
> operations. For instance, if you want to write a
> LDAP proxy, you can do that
> with DSML requests and response (and the LdapStudio
> ldap browser is working
> this way, so, yes, this is possible . I don't
> think we need a 4th
> description meta-stuff KISS, bros !
>
> <snip/>
>
>
> Well, this is a vision which is not taking into
> account the performance
> issues you will have if you do that. I may be wrong
> - and I hope to be,
> because this seems to be _so_ simple that I really
> want it to work -, but as
> an old programmer, I don't believe in god or in
> 'snap your finger and the
> tool will do the rest' thingy ... Yeah, I'm an
> agnostic old fart ;) Make me
> believe in it : I want a working sample !
>
> <snip/>
>
>
> Yeah, that's seems pragmatic Thanks for the work
> done on RPM, man !
>
> Cheers,
>
>
> Emmanuel
>







Emmanuel Lecharny

2006-11-02, 1:11 pm

On 11/2/06, Ole Ersoy <ole_ersoy-/E1597aS9LQAvxtiuMwx3w@public.gmane.org> wrote:
>
> Hey Emmanuel,
>
> How did you get the sexy vertical lines in the reply?
> Way better than all the >>>



Gmail does it for me.

> Are you using thunderbird?


Yes, but not at the office.

--- Emmanuel Lecharny <elecharny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
>
>
>
> That's really sweet.
> Here's one of the use cases I want to support for
> generating applications (Note that for virtual
> directories there will probably at least another use
> case):
>
> A)
> First create a model that gets translated into Ecore
> (XMI). This is the Model part of an MVC architecture,
> but it's generic and can map to XML, Relational, LDAP,
> whattevva,..etc.
> B)
> Translate the Ecore XMI into a Java model (Beans) for
> in Memory Persistance (Incidentally Ecore is also used
> to Generate Service Data Objects (SDO) - The purpose
> of SDO is to unify data programming...Like JDO...but
> using Ecore Meta Data as the driver....
> C)
> Generate the Schema for the persistance solution used
> (Relational, LDAP, xml, whatevva)
> Note that with Hibernate if you generate the hibernate
> configuration file, Hibernate can then generate the
> Relational Schema for you. There's already projects
> doing this that can be found in the Eclipse EMF
> Corner...on the Eclipse EMF site.
>
> So DSML is cool because if Ecore can be used to
> generate DSML and then PAM can be used as the driver
> to go from DSML to LDIF and back...which means the
> result can be imported back into a Ecore generated
> Java model...if I understand correctly.
>
> I hope to have some samples worked up soon, just gotta
> get this JPackage plugin out the do!
>
> Cheers,
> - Ole
>
>
>
>
>
>
>
>
>



--
Cordialement,
Emmanuel Lécharny

Stefan Zoerner

2006-11-02, 1:11 pm

Ersin Er wrote:
> I need some advice on implementing a partition for ADS based on the
> relational model and using SQL or Hibernate or JPA, or framework like
> them..
>
> First of all, is this realistic? Can we reach a usable result?
>


Just a little note (you probably already know this) ...

IBM Tivoli Directory Server (one of the other certified servers besides
ApacheDS) uses DB2 as its backend. So at least they were able to create
a server which is functional (I know of several large, distributed
implementations with this product).

Greetings, Stefan




Emmanuel Lecharny

2006-11-02, 1:11 pm

Hey, Stefan !

Yeah, I saw IDS used with a database of 70 000 000 entries. Pretty impressive.


Emmanuel

On 11/2/06, Stefan Zoerner <stefan-EQq9qWhC7IA@public.gmane.org> wrote:
>
> Ersin Er wrote:
>
> Just a little note (you probably already know this) ...
>
> IBM Tivoli Directory Server (one of the other certified servers besides
> ApacheDS) uses DB2 as its backend. So at least they where able to create
> a server which is functional (I know several large, distributed
> implementations with this product).
>
> Greetings, Stefan
>
>
>
>



--
Cordialement,
Emmanuel Lécharny

Stefan Zoerner

2006-11-02, 1:11 pm

David Boreham wrote:
> Ersin Er wrote:
>
>
> First the $64,000 question : WHY ?
>


At least it would be a nice example of how to write your own partition
implementation. If documented right, it could work perfectly as a tutorial.

Greetings, Stefan


Ole Ersoy

2006-11-02, 1:11 pm

Incidentally, IBM and BEA wrote the SDO specification,
so there's a good chance they are using that as
the integration technology.

--- Emmanuel Lecharny <elecharny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> Hey, Stefan !
>
> yeah, I saw IDS used with a 70 000 000 entries
> database. Pretty impressive
>
>
> Emmanuel
>
> On 11/2/06, Stefan Zoerner <stefan-EQq9qWhC7IA@public.gmane.org> wrote:
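> > Ersin Er wrote:
> >
> > Just a little note (you probably already know this) ...
> >
> > IBM Tivoli Directory Server (one of the other certified servers besides
> > ApacheDS) uses DB2 as its backend. So at least they were able to create
> > a server which is functional (I know several large, distributed
> > implementations with this product).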
>
>
> --
> Cordialement,
> Emmanuel Lécharny
>







Alex Karasulu

2006-11-02, 1:11 pm

Ersin Er wrote:
> Hi,
>
> On 11/2/06, David Boreham <david_list-a7e5zrL3AIdAfugRpC6u6w@public.gmane.org> wrote:
>
> Well, this is just a test currently. It's a software engineering
> project for two students to make them familiar with all this (LDAP,
> RDBMS, ADS, pragmatic tools, etc.) stuff. It can also be thought of as a
> DB research project to learn what the best way of storing directory
> data on RDBMS is.
>
>
> Can you explain this more?


Basically he's saying model the db like you do the jdbm tables in the
ldbm rip off we use for the default backing store.

Meaning you have a master table, an id2dn table, and so on just like
using jdbm tables.
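
The layout described above could be sketched roughly as follows. This is only a minimal illustration, assuming SQLite through Python's sqlite3 module rather than any particular RDBMS. The names master and id2dn come from the description above; idx_cn, add_entry and lookup_by_cn are hypothetical names invented for the example and are not ApacheDS code.

```python
import json
import sqlite3

# Hypothetical layout: every table is a two-column key/value relation,
# mirroring the "one b-tree table per relation" style described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master (id INTEGER PRIMARY KEY, entry TEXT NOT NULL);
    CREATE TABLE id2dn  (id INTEGER PRIMARY KEY, dn TEXT NOT NULL UNIQUE);
    CREATE TABLE idx_cn (value TEXT NOT NULL, id INTEGER NOT NULL,
                         PRIMARY KEY (value, id));
""")

def add_entry(entry_id, dn, attrs):
    """Store the serialized entry, its DN, and index its cn values."""
    conn.execute("INSERT INTO master VALUES (?, ?)",
                 (entry_id, json.dumps(attrs)))
    conn.execute("INSERT INTO id2dn VALUES (?, ?)", (entry_id, dn))
    for value in attrs.get("cn", []):
        conn.execute("INSERT INTO idx_cn VALUES (?, ?)",
                     (value.lower(), entry_id))

def lookup_by_cn(cn):
    """Resolve cn -> id through the index table, then fetch DN and entry."""
    rows = conn.execute(
        "SELECT id2dn.dn, master.entry FROM idx_cn "
        "JOIN id2dn ON id2dn.id = idx_cn.id "
        "JOIN master ON master.id = idx_cn.id "
        "WHERE idx_cn.value = ?", (cn.lower(),)).fetchall()
    return [(dn, json.loads(entry)) for dn, entry in rows]

add_entry(1, "ou=people,dc=example,dc=com", {"ou": ["people"]})
add_entry(2, "cn=Jane Doe,ou=people,dc=example,dc=com",
          {"cn": ["Jane Doe"], "sn": ["Doe"]})
```

Note that the entry column holds an opaque serialized value, so a lookup by attribute only works through a dedicated index table such as idx_cn.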

>
> Thanks.
>



Emmanuel Lecharny

2006-11-02, 7:11 pm

> Basically he's saying model the db like you do the jdbm tables in the
> ldbm rip off we use for the default backing store.
>
> Meaning you have a master table, an id2dn table, and so on just like
> using jdbm tables.
>


If we go to RDBMS, this would be the worst approach. It is supposed to be a
relational model, not a hierarchical model mapped onto a relational model.
Performance will be awful.

Emmanuel

David Boreham

2006-11-02, 7:11 pm


>
>
> Basically he's saying model the db like you do the jdbm tables in the
> ldbm rip off we use for the default backing store.


Right, there's a schema impedance mismatch. Goal #2 would want to take some
random table that has columns for 'employee ID', 'first name', 'last name'
and 'manager', for example, and make that stuff show up as LDAP entries.
Using the RDBMS mapping approach outlined above, the tables wouldn't look at
all like that. They'd all have only two columns, for example. Both columns
would contain data that, to a first approximation, would be gibberish to a
regular RDBMS client application.
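
To make the contrast concrete, here is a rough sketch of what goal #2 would involve: an ordinary employee table whose rows are presented as LDAP-style entries. It assumes SQLite via Python's sqlite3 module. The table and column names, the base DN, and the choice of the inetOrgPerson attributes uid, givenName, sn, cn and manager are all assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY,
                            first_name TEXT, last_name TEXT,
                            manager INTEGER);
    INSERT INTO employees VALUES (1, 'Ada', 'Lovelace', NULL);
    INSERT INTO employees VALUES (2, 'Alan', 'Turing', 1);
""")

BASE_DN = "ou=people,dc=example,dc=com"  # hypothetical naming context

def dn_for(employee_id):
    """Build a DN from the primary key (one possible naming choice)."""
    return f"uid={employee_id},{BASE_DN}"

def row_to_entry(row):
    """Map one relational row to a (dn, attributes) pair."""
    attrs = {
        "objectClass": ["inetOrgPerson"],
        "uid": [str(row["employee_id"])],
        "givenName": [row["first_name"]],
        "sn": [row["last_name"]],
        "cn": [f"{row['first_name']} {row['last_name']}"],
    }
    if row["manager"] is not None:
        # The foreign key becomes a DN-valued attribute.
        attrs["manager"] = [dn_for(row["manager"])]
    return dn_for(row["employee_id"]), attrs

entries = [row_to_entry(r) for r in
           conn.execute("SELECT * FROM employees ORDER BY employee_id")]
```

Here the table stays readable to a regular RDBMS client, and the LDAP view is computed on the fly, which is closer to what a virtual directory does than to a storage backend.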






David Boreham

2006-11-02, 7:11 pm

Emmanuel Lecharny wrote:

> If it's worth $64,000, then it's a BECAUSE. Otherwise, there are many
> other good reasons besides being greedy:
> - SQL databases are reliable, whereas the jdbm database is not


Well, one might argue that there are better reliable storage manager
choices than a client/server RDBMS.

> - SQL databases have a _lot_ of tools, when we don't have any - or
> close to any


True, but not necessarily a good thing ;)

> - SQL databases support transactions, which is good to have, because we
> don't support them...


See item #1 above.

> - SQL databases can be replicated


Sometimes, although the style of replication may not suit the directory
application.

> - SQL databases can be stored on a SAN or a cluster easily


True, but a non-feature for a directory service with its own replication.

> - There are a lot of add-ons like Hibernate to do the mapping onto a SQL
> database
> - Some customers want trustworthy storage. Oracle is trustworthy (well,
> this is questionable... A system is as strong as its weakest element
> (man?))


Yeah, this is the 'data store envy' argument.

> - And, so far, databases are quite fast. IBM IDS is using DB2, I have
> seen it running with 70 000 000 entries, and it was fast enough for
> our needs...


So they fixed the 2Gbyte table size limit then ;)



Alex Karasulu

2006-11-02, 7:11 pm

Emmanuel Lecharny wrote:
>
>
> Basically he's saying model the db like you do the jdbm tables in the
> ldbm rip off we use for the default backing store.
>
> Meaning you have a master table, an id2dn table, and so on just like
> using jdbm tables.
>
>
> If we go to RDBMS, this would be the worst approach. It is supposed to be
> a relational model, not a hierarchical model mapped onto a relational
> model. Performance will be awful.


You're right, but there is no distinctly different way to build a
generalized approach to storing any kind of entry in an RDBMS. Whether
you use one big table or many little two-column tables, you're still
going to have a mess and inefficiency.

One of the reasons I wanted to stuff it all into a single table was to
avoid the join overhead. But by far the extra network latency will kill
performance way before that.

I've got to agree with David and say that any RDBMS-backed backend is
not going to perform as well as a b-tree-based implementation. This is
primarily due to the extra layer of latency you're imposing. Caching is
also an option. But at this point you might as well stop and build a
virtual directory instead. That's energy well spent.

Alex

David Boreham

2006-11-02, 7:11 pm

Ersin Er wrote:

> Well, this is just a test currently. It's a software engineering
> project for two students to make them familiar with all this (LDAP,
> RDBMS, ADS, pragmatic tools, etc.) stuff. It can also be thought of as a
> DB research project to learn what the best way of storing directory
> data on RDBMS is.


Ah, ok. So it really _is_ a science project ;)

Then I think you're fully entitled to go try it out.
Interesting exercise.



Ersin Er

2006-11-13, 8:09 am

http://www.amazon.com/Hierarchies-S...s/dp/1558609202

:-D

On 11/2/06, Alex Karasulu <aok123-Bdlq13kUjeyLZ21kGMrzwg@public.gmane.org> wrote:
> Emmanuel Lecharny wrote:
>
> You're right but there is no other distinctly different approach to have
> a generalized approach to storing any kind of entry in a RDBMS. Whether
> you use one big table or many little two column tables you're still
> going to have a mess and inefficiency.
>
> One of the reasons I wanted to stuff it all into a single table was to
> avoid the join overhead. But by far the extra network latency will kill
> performance way before that.
>
> I've got to agree with David and say that any RDBMS backed backend is
> not going to perform as well as a btree based implementation. This is
> primarily due to the extra layer of latency you're imposing. Caching is
> also an option. But at this point you might as well stop and build a
> virtual directory instead. That's energy well spent.
>
> Alex
>



--
Ersin

David Boreham

2006-11-13, 8:09 am


>

I missed a few iterations in this thread (been busy with the day job),
but some late thoughts:

1. Have you looked at what's inside an RDBMS engine?
B-trees and query processing that's much the same as
the average directory server. So mapping the DS's b-tree
relations to tables (which are themselves implemented as
b-trees) is not, imho, an inefficient approach.

2. The DIT hierarchy is really a non-problem for RDBMS
mapping -- after all, DSes that use b-tree storage managers
directly already have the identical problem to solve.
The average DIT is neither very bushy nor very deep anyway.

3. If you are concerned about performance, don't use
an RDBMS. The approach already chosen for Apache DS
is the most performant (possibly needs some work, but
it's the right way to go for performance).

4. Attempting to 'really use' the relational data model for
directory entries takes us back to the previously mentioned
relational mapping science project. (Customer already
has a bunch of tables in Oracle, and we need to reflect
those via LDAP.) Certainly an interesting
field to study, but there's no obvious good way to solve
this problem that I know of. Virtual Directory and sync
(meta) type solutions have addressed this area for
years with a fair degree of success.









Ersin Er

2006-11-13, 8:09 am

On 11/12/06, David Boreham <david-Q2lcOYGAZuYPnHn3N7+5xA@public.gmane.org> wrote:
>
> I missed a few iterations in this thread (been busy with the day job),
> but some late thoughts:
>
> 1. Have you looked at what's inside a RDBMS engine ?
> B-trees and query processing that's much the same as
> the average directory server. So mapping the DS's b-tree
> relations to tables (which are themselves implemented as
> b-trees) is not imho an inefficient approach.


I am perfectly aware of these. BTW, we are not mapping DIT to RDBMS'
B-Trees. We are mapping DIT to Tables and they map to B-Trees. So of
course there will be an overhead.

> 2. the dit hierarchy is really a non-problem for RDBMS
> mapping -- after all DS'es that use b-tree storage managers
> directly already have the same identical problem to solve.
> the average DIT is not very bushy nor deep anyway.


I think hierarchy is a serious problem for an RDBMS. You cannot select
all subordinate entries of an entry in a single search. This requires
a recursive search. Oracle has support for this
(http://www.adp-gmbh.ch/ora/sql/connect_by.html), but it is non-standard.
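
For illustration, the recursive search can be sketched with a recursive common table expression (WITH RECURSIVE), which SQL:1999 defines, although support varies by engine and version. The sketch below assumes SQLite through Python's sqlite3 module. The entries table with a parent_id column and the function name subtree_of are hypothetical and are not taken from any ADS code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (id INTEGER PRIMARY KEY,
                          parent_id INTEGER REFERENCES entries(id),
                          rdn TEXT NOT NULL);
    INSERT INTO entries VALUES (1, NULL, 'dc=example,dc=com');
    INSERT INTO entries VALUES (2, 1, 'ou=people');
    INSERT INTO entries VALUES (3, 1, 'ou=groups');
    INSERT INTO entries VALUES (4, 2, 'cn=Jane Doe');
    INSERT INTO entries VALUES (5, 4, 'cn=laptop');
""")

# Start from the base entry, then repeatedly join children onto the
# rows found so far, building each DN from the child's RDN.
SUBTREE_SQL = """
    WITH RECURSIVE subtree(id, dn) AS (
        SELECT id, rdn FROM entries WHERE id = ?
        UNION ALL
        SELECT e.id, e.rdn || ',' || s.dn
        FROM entries AS e JOIN subtree AS s ON e.parent_id = s.id
    )
    SELECT id, dn FROM subtree ORDER BY id
"""

def subtree_of(base_id):
    """Return (id, dn) for the base entry and all of its descendants."""
    return conn.execute(SUBTREE_SQL, (base_id,)).fetchall()
```

Calling subtree_of(1) walks the whole tree in one statement; the DNs it returns are relative to the base entry's own RDN, so a non-root base yields partial DNs.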

> 3. If you are concerned about performance, don't use
> a RDBMS. The approach already chosen for Apache DS
> is the most performant (possibly needs some work, but
> it's the right way to go for performance).
>
> 4. Attempting to 'really use' the relational data model for
> directory entries takes us back to the previously mentioned
> relational mapping science project. (Customer already
> has a bunch of tables in Oracle, and we need to reflect
> those via LDAP) Certainly an interesting
> field to study, but there's no obvious good way to solve
> this problem that I know of. Virtual Directory and sync
> (meta) type solutions have addressed this area for
> years with a fair degree of success.


We are just trying to see what it costs. Then we can say "do not use an
RDBMS but ApacheDS" when it really fits your needs :-)

Thanks.

--
Ersin

Emmanuel Lecharny

2006-11-13, 8:09 am

On 11/11/06, David Boreham <david-Q2lcOYGAZuYPnHn3N7+5xA@public.gmane.org> wrote:
>
>
> I missed a few iterations in this thread (been busy with the day job),
> but some late thoughts:
>
> 1. Have you looked at what's inside a RDBMS engine ?
> B-trees and query processing that's much the same as
> the average directory server. So mapping the DS's b-tree
> relations to tables (which are themselves implemented as
> b-trees) is not imho an inefficient approach.



I've been in the business since 1988, and I did an RDBMS specialisation while
I was a student. I think that you didn't understand what I said (maybe I
wasn't clear enough): that a _direct_ mapping of the actual b-trees would be
really inefficient _because_ an RDBMS already uses b-trees for its indexes.
So you just can't map the JDBM structure to an RDBMS structure; it would
induce the creation of useless tables (in fact, all the index tables can be
removed). In the end, what we could have in an RDBMS would be something like
2 tables: one for entries and one for attribute values, period. Everything
else will be indexes (plus one table for partitions, of course).
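
A minimal sketch of that two-table idea, assuming SQLite through Python's sqlite3 module. The table, column and index names and the helper functions are hypothetical, the partitions table is left out, and values are compared as plain strings, which ignores LDAP matching rules. It shows how an AND of equality assertions, such as the filter (&(objectClass=person)(sn=Doe)), might translate into a single SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (id INTEGER PRIMARY KEY,
                          parent_id INTEGER,
                          dn TEXT NOT NULL UNIQUE);
    CREATE TABLE attribute_values (
        entry_id INTEGER NOT NULL REFERENCES entries(id),
        attr TEXT NOT NULL,
        value TEXT NOT NULL);
    -- Everything else is an index, not an extra table.
    CREATE INDEX idx_attr_value ON attribute_values (attr, value);
    CREATE INDEX idx_parent ON entries (parent_id);
""")

def add_entry(entry_id, parent_id, dn, attrs):
    """Insert one entry row plus one row per attribute value."""
    conn.execute("INSERT INTO entries VALUES (?, ?, ?)",
                 (entry_id, parent_id, dn))
    conn.executemany(
        "INSERT INTO attribute_values VALUES (?, ?, ?)",
        [(entry_id, name.lower(), value)
         for name, values in attrs.items() for value in values])

def search_and(equalities):
    """Return DNs of entries matching every (attribute, value) pair."""
    if not equalities:
        raise ValueError("at least one equality assertion is required")
    clause = ("EXISTS (SELECT 1 FROM attribute_values AS v "
              "WHERE v.entry_id = e.id AND v.attr = ? AND v.value = ?)")
    sql = ("SELECT e.dn FROM entries AS e WHERE "
           + " AND ".join([clause] * len(equalities))
           + " ORDER BY e.id")
    params = [p for name, value in equalities
              for p in (name.lower(), value)]
    return [row[0] for row in conn.execute(sql, params)]

add_entry(1, None, "ou=people,dc=example,dc=com",
          {"objectClass": ["organizationalUnit"], "ou": ["people"]})
add_entry(2, 1, "cn=Jane Doe,ou=people,dc=example,dc=com",
          {"objectClass": ["person"], "cn": ["Jane Doe"], "sn": ["Doe"]})
add_entry(3, 1, "cn=John Roe,ou=people,dc=example,dc=com",
          {"objectClass": ["person"], "cn": ["John Roe"], "sn": ["Roe"]})
```

Each equality assertion becomes one EXISTS subquery that the (attr, value) index can serve, so multi-valued attributes need no special handling.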

> 2. the dit hierarchy is really a non-problem for RDBMS
> mapping -- after all DS'es that use b-tree storage managers
> directly already have the same identical problem to solve.
> the average DIT is not very bushy nor deep anyway.



That's true.

> 3. If you are concerned about performance, don't use
> a RDBMS. The approach already chosen for Apache DS
> is the most performant (possibly needs some work, but
> it's the right way to go for performance).


I would bet $1000 on this too, but only because you will have to query
the RDBMS through a network layer while you have direct access to
JDBM. However, this is not a debate, just a feeling, because I have never
compared the performance of each approach in real life.

> 4. Attempting to 'really use' the relational data model for
> directory entries takes us back to the previously mentioned
> relational mapping science project. (Customer already
> has a bunch of tables in Oracle, and we need to reflect
> those via LDAP) Certainly an interesting
> field to study, but there's no obvious good way to solve
> this problem that I know of. Virtual Directory and sync
> (meta) type solutions have addressed this area for
> years with a fair degree of success.



I think you are mixing different concerns, like "Using an RDBMS as a backend"
and "Synchronizing different sources of data". The initial question (from
Ersin) was: "How can we implement an RDBMS as a backend for ADS?" This is
something which is possible, and efficient, and IBM proved it. The second
point deserves another approach (Virtual Directory/Meta Directory is a
solution), but you started another thread for that, and this is a good thing.






--
Cordialement,
Emmanuel Lécharny


Copyright 2003 - 2008 webservertalk.com