|
Home > Archive > Unix Programming > June 2006 > Something like GDBM, but table...
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Something like GDBM, but table...
|
|
| William Park 2006-05-29, 5:32 pm |
| Is there a simple library package which manipulate table, where row is a
record and columns are fields within the record. Sort of like GDBM, but
with more than one "value" per "key".
--
William Park <opengeometry@yahoo.ca>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/
| |
|
|
William Park wrote:
> Is there a simple library package which manipulate table, where row is a
> record and columns are fields within the record. Sort of like GDBM, but
> with more than one "value" per "key".
For record fields you can just use a struct. Here are two other
possibilities I've found.
* PBL KEYFILE/ISAM: ultra fast B* tree implementation for random
lookups; transaction handling; sequential access methods; embeddable
small footprint
http://www.mission-base.com/peter/source/
* SQLite, a small C library that implements a self-contained,
embeddable, zero-configuration SQL database engine, including
Transactions that are atomic, consistent, isolated, and durable (ACID)
even after system crashes and power failures.
http://www.sqlite.org/
>
> --
> William Park <opengeometry@yahoo.ca>, Toronto, Canada
> ThinFlash: Linux thin-client on USB key (flash) drive
> http://home.eol.ca/~parkw/thinflash.html
> BashDiff: Super Bash shell
> http://freshmeat.net/projects/bashdiff/
| |
| joe@invalid.address 2006-05-29, 7:18 pm |
| "toby" <toby@telegraphics.com.au> writes:
> William Park wrote:
>
> For record fields you can just use a struct. Here are two other
> possibilities I've found.
>
> * PBL KEYFILE/ISAM: ultra fast B* tree implementation for random
> lookups; transaction handling; sequential access methods; embeddable
> small footprint
> http://www.mission-base.com/peter/source/
>
> * SQLite, a small C library that implements a self-contained,
> embeddable, zero-configuration SQL database engine, including
> Transactions that are atomic, consistent, isolated, and durable (ACID)
> even after system crashes and power failures.
> http://www.sqlite.org/
SQLite is pretty neet: nice clean and small. Solaris 10 uses it for
its smf stuff. It can be very easy to use too depending on what you
want to do. Lots of things can be done with three library calls and
some simple SQL queries.
Joe
| |
|
|
joe@invalid.address wrote:
> "toby" <toby@telegraphics.com.au> writes:
>
>
> SQLite is pretty neet: nice clean and small. Solaris 10 uses it for
> its smf stuff.
Apple also uses it under the hood (Spotlight).
> It can be very easy to use too depending on what you
> want to do. Lots of things can be done with three library calls and
> some simple SQL queries.
>
> Joe
| |
| William Park 2006-05-30, 7:16 pm |
| In <comp.unix.shell> toby <toby@telegraphics.com.au> wrote:
>
> William Park wrote:
>
> For record fields you can just use a struct. Here are two other
> possibilities I've found.
>
> * PBL KEYFILE/ISAM: ultra fast B* tree implementation for random
> lookups; transaction handling; sequential access methods; embeddable
> small footprint
> http://www.mission-base.com/peter/source/
>
> * SQLite, a small C library that implements a self-contained,
> embeddable, zero-configuration SQL database engine, including
> Transactions that are atomic, consistent, isolated, and durable (ACID)
> even after system crashes and power failures.
> http://www.sqlite.org/
Thanks Toby, I'll look into <www.mission-base.com>. I'm aware of
SQLite, but I would like to explore something other than SQL query
statement to access the database.
I've been looking into how shell keep the positional parameters in stack
when you call shell functions. This internal stack is C array which
grows, and each row points to linked list of strings. Problem is that
it's all in memory. I need something on disk.
--
William Park <opengeometry@yahoo.ca>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/
| |
|
| William Park wrote:
> Is there a simple library package which manipulate table, where row is a
> record and columns are fields within the record. Sort of like GDBM, but
> with more than one "value" per "key".
You, of all people, should realize that the answer to that
can only be as good as the quality of the original specification!
(I hope you understand this is meant as a compliment....)
(You must primarily not be talking about relational
databases, because multiple values per key violates 2NF....)
Heck, with the scant information given in the OQ and
followups, I could give you URLs of several different
RDBMSes written in python, php, ruby, sh, ksh, and awk!
That said, besides KEYFILE/ISAM as suggested, I would
investigate dyndb, an enhancement of the highly regarded
cdb by mathematician and programmer Daniel J. Bernstein.
"dyndb.c",2000-02-06,http://www.ohse.de/uwe/dyndb.html,$0,"enhanced
cdb(3) API: 49Kb, under Unix","Uwe Ohse" <uwe@ohse.de>,
<dyndb-subscribe@lists.ohse.de>
"cdb.c",0.75b,http://cr.yp.to/cdb.html,$0,"FFDB: RO; flat-file","Daniel
J. Bernstein" <djb@pobox.co.uk>
As well as:
"unity.c",2.6,http://www.bell-labs.com/project/ww...snapshot?unity,$0,"FFRDB",<exptools@lucent.com>
"tdbengine",6.2.9,http://www.tdbengine.org/;http://tdbsql.tdbengine.org/,$0,"RDBMS"
"starbase.c",3.2.10,http://cfa-www.harvard.edu/~john/st.../Download.html,$0,"RDB:
similar-to rdb/nosql; ASCII; under Linux/Solaris","John Roll"
.... as well as all the free sourcecode for any of Konstantin
Knizhnik's databases at:
http://www.garret.ru/~knizhnik/databases.html # POST, gigaBASE, ...
Speaking of GDBM, there is always BerkeleyDB and others:
"Berkeley DB [XML]",4.3.27,http://www.sleepycat.com,$0,"DBMS: ACID;
hash",<support@sleepycat.com>
"TDB.",,http://sourceforge.net/projects/tdb/,$0,"Trivial Database --
GDBM/BSDDB-like DB with locking"
And, of course, let us certainly not omit the most amazing, free
and open-source, absolutely SQL standards compliant, ACID
RDBMS that there is (IMNSHO much more fleshed out than SQLite).
Yes, a true relational database, but you will never grow out of
it, and it's embeddable (for bashdiff?):
"OCELOT.{c,EXE}",03.02.0514,http://www.ocelot.ca,$0,"RDBMS: ASCII;
SQL99","Peter Gulutzan" <help@ocelot.ca>
Many of these will not be applicable to your purposes without
a survey for suitability, but all are recommended for the DB
niche that they fill.
=Brian
| |
|
|
|
| bsh wrote:
> William Park wrote:
> ...
> That said, besides KEYFILE/ISAM as suggested, I would
> investigate...
> ...many snipped...
> "TDB.",,http://sourceforge.net/projects/tdb/,$0,"Trivial Database --
> GDBM/BSDDB-like DB with locking"
> ...more snipped...
> Many of these will not be applicable to your purposes without
> a survey for suitability, but all are recommended for the DB
> niche that they fill.
Great set of links. I should have remembered tdb and a couple of others
since I had come across them recently, but you have clearly made a very
thorough survey... thanks.
>
> =Brian
| |
| William Park 2006-06-01, 1:17 pm |
| In <comp.unix.shell> toby <toby@telegraphics.com.au> wrote:
> bsh wrote:
>
> Great set of links. I should have remembered tdb and a couple of others
> since I had come across them recently, but you have clearly made a very
> thorough survey... thanks.
Me too. Thank you, Brian.
It seems that I have the choice of
1. flat ASCII file with some delimiter between columns, or
2. PostgreSQL/SQLite, and insert fields into SQL statement
implicitly, much like how printf(3) is used.
Option #2 is closer to what I'm looking for. The flat ASCII requires
too many read, parse, re-read, re-parse, etc. Also, in order to access
columns by name, I have to read the first line to get the column header,
then count the column when I read the record.
Here is my thought so far.
1. I can put row number and variable/column names into GDBM key, like
row1-name1=string1
row1-name2=string2
...
But, I can't easily fetch a complete record, without looping
through the keys. In order to get the keys, I have to fetch all
keys in GDBM database, and select which ones are applicable.
2. Since key/value data in GDBM is not zero-terminated, I can
concatenate multiple strings into single block and put it into
GDBM, ie.
string1\0string2\0...\0
So, for named variables or named fields, I can use separate
header, like
name1\0name2\0...\0
or put them in pairs, like
name1\0string1\0name2\0string2\0...\0
I can fetch a complete record. Of course, I need to cut it up,
but it should be simple and fast (in C).
3. In the tradition of Unix and everything being a file, I can
create file with variable names, and put variable strings into
the files. So,
name1=string1
is stored and read like
echo "string1" > name1
Each record will be a directory, and the columns will be files.
There is no need to parse and re-parse, because everything is
already separated. Also, you can 'grep' as you like.
With PostgreSQL/SQLite, though, I can do transaction/rollback. Although
this is the most important reason for using any database, I'm not sure
how necessary it is for my need. :-)
--
William Park <opengeometry@yahoo.ca>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/
| |
|
| William Park wrote:
> Me too. Thank you, Brian.
You're welcome, Toby and William.
Thank _you_ for your programming resources you make
freely available to the 'Net at large. I know how much
work [good] programming is.
> 1. flat ASCII file with some delimiter between columns, or
> 2. PostgreSQL/SQLite, and insert fields into SQL statement
> implicitly, much like how printf(3) is used.
Stop right there! You can have the best of both worlds,
through my aforementioned ocelot, which is a fully
conformant and reliable RDBMS implemented
over an ASCII workstore. Editable ASCII CSV files,
transactional reliability, and relational functionality!
One does not even need to specify indices, as per
SQL-1999 they are generated automagically, giving
the magic of _both_ power and performance. All this
and full sourcecode too -- although you are going to
have to do the porting to the Unix/Linux environment.
Again: http://www.ocelot.ca/file.htm
> Option #2 is closer to what I'm looking for.
No it isn't. PostgreSQL is very non-trivial to install and
use, is not embeddable except through functionality
provided by ODBC, requires superuser priveleges for
many functions, and does not yet natively execute
under Unix/Linux (yet?? Version 8 is/was supposed to
remedy this).
http://sql-info.de/postgresql/gotch...ration.html#2_2
> 1. I can put row number and variable/column names into GDBM key, like
> row1-name1=string1
> row1-name2=string2 ...
> But, I can't easily fetch a complete record, without looping
> through the keys. In order to get the keys, I have to fetch all
> keys in GDBM database, and select which ones are applicable.
Since one is held hostage to the particular hashing algorithm
used in a B*Tree or other ISAM database, one is always faced
with the choice of favoring one over the other of applicability
versus versatility.
It sounds as if you "should" insert unparsed records,
and parse the fields out yourself. The disadvantages
are that of subverting referential integrity functionality
(if supported), and the fact that few if any hashing
methods support ordered sequential traversals.
In other words, I wouldn't go down this route....
It sounds like a trivial solution providing a flat-file
filestore cum SQL operators, is worth investigating,
such as txtSQL (does it have to be C source?):
"txtSQL.php",3.0.0b,http://txtsql.sourceforge.net/;http...les/txtsql.zip,$0,"RDBMS:
OO, SQL and PHP","Faraz Ali" <farazali@users.sourceforge.net>
It's even quite impressive performance-wise:
http://chibiguy.dotgeek.org/pages/b...php?section=2#1
> 2. Since key/value data in GDBM is not zero-terminated, I can
> concatenate multiple strings into single block and put it into
> GDBM, ie.
> string1\0string2\0...\0
> So, for named variables or named fields, I can use separate
> header, like
> name1\0name2\0...\0
> or put them in pairs, like
> name1\0string1\0name2\0string2\0...\0
> I can fetch a complete record. Of course, I need to cut it up,
> but it should be simple and fast (in C).
Oops. I provided my above workaround before I read this.
At least we're on the same page.
> 3. In the tradition of Unix and everything being a file, I can
> create file with variable names, and put variable strings into
> the files. So,
> name1=string1
> is stored and read like
> echo "string1" > name1
> Each record will be a directory, and the columns will be files.
> There is no need to parse and re-parse, because everything is
> already separated. Also, you can 'grep' as you like.
This is how Ocelot's "regular (but non-default)" file
structure used to be structured. The designers now
favor an "Alternate File Structure" (that is far more
efficient) as should you.
http://www.ocelot.ca/file.htm
Idea: doesn't bash3 now implement ksh93-like
compound-variables? This would be a _lot_ more
versatile in the above regard.
> With PostgreSQL/SQLite, though, I can do transaction/rollback. Although
> this is the most important reason for using any database, I'm not sure
> how necessary it is for my need. :-)
ACID functionality in an RDBMS makes implementing
transactive integrity and rollback trivial. It's this quality
that one looks for in a good RDB, not these functional
ramifications.
BTW, _someone_ is going to call me to task to
provide examples, as previously mentioned, of
RDBMSes written in interpreted languages:
"kshsql.ksh",,http://books.dreambook.com/dfrench/...nch/kshsql.txt,$0,"ingres
clone","Dana French"
"gadfly.py",1.0,http://gadfly.sourceforge.net/,$0,"RDBMS: ASCII,
SQL92","Richard Jones" <richard@users.sf.net>, "Aaron Robert Watters"
"flatsql.pl",1;2,http://www.eecs.harvard.edu/~konrad.../flatsqlmysql/,$0,"RDBMS
-- emits .awk code: SQL","Konrad Lorincz, Kevin Redwine, Jesse Tov"
Who knew!?
=Brian
| |
|
| William Park wrote:
> ...
> It seems that I have the choice of
>
> 1. flat ASCII file with some delimiter between columns, or
>
> 2. PostgreSQL/SQLite, and insert fields into SQL statement
> implicitly, much like how printf(3) is used.
>
> Option #2 is closer to what I'm looking for. The flat ASCII requires
> too many read, parse, re-read, re-parse, etc. Also, in order to access
> columns by name, I have to read the first line to get the column header,
> then count the column when I read the record.
Furthermore it's not indexed...
>
> Here is my thought so far.
>
> 1. I can put row number and variable/column names into GDBM key, like
> row1-name1=string1
> row1-name2=string2
> ...
>
> But, I can't easily fetch a complete record, without looping
> through the keys. In order to get the keys, I have to fetch all
> keys in GDBM database, and select which ones are applicable.
That's why they call it SELECT :-)
>
>
> 2. Since key/value data in GDBM is not zero-terminated, I can
> concatenate multiple strings into single block and put it into
> GDBM, ie.
> string1\0string2\0...\0
> So, for named variables or named fields, I can use separate
> header, like
> name1\0name2\0...\0
> or put them in pairs, like
> name1\0string1\0name2\0string2\0...\0
>
> I can fetch a complete record. Of course, I need to cut it up,
> but it should be simple and fast (in C).
Yes, and you can use fixed length fields to make it even quicker...
>
>
> 3. In the tradition of Unix and everything being a file, I can
> create file with variable names, and put variable strings into
> the files. So,
> name1=string1
> is stored and read like
> echo "string1" > name1
>
> Each record will be a directory, and the columns will be files.
> There is no need to parse and re-parse, because everything is
> already separated. Also, you can 'grep' as you like.
>
>
> With PostgreSQL/SQLite, though, I can do transaction/rollback. Although
> this is the most important reason for using any database, I'm not sure
> how necessary it is for my need. :-)
PgSQL/MySQL/Ocelot(I think)/Ingres/etc are at the "big database" end of
the scale. Your needs sound far more modest. You should first consider
your query pattern closely, that would seem to be slightly more
influential than details of data representation (although there are
hash/btree/etc tradeoffs to consider when it comes to updating)?
--Toby
>
> --
> William Park <opengeometry@yahoo.ca>, Toronto, Canada
> ThinFlash: Linux thin-client on USB key (flash) drive
> http://home.eol.ca/~parkw/thinflash.html
> BashDiff: Super Bash shell
> http://freshmeat.net/projects/bashdiff/
| |
| William Park 2006-06-02, 1:23 am |
| In <comp.unix.shell> bsh <brian_hiles@rocketmail.com> wrote:
> http://www.ocelot.ca/file.htm
I'll study this more.
>
> Idea: doesn't bash3 now implement ksh93-like
> compound-variables? This would be a _lot_ more
> versatile in the above regard.
Bash doesn't have sub-variables. Though, it won't help that much,
because I need array of this thing.
Surely, I can't be the first person who needs to navigate array of
records? The closest analogy is "cursor" which some SQL databases have.
Move the cursor to a record, and all its fields are automagically pulled
into shell variables of same name. Hmm... almost there...
--
William Park <opengeometry@yahoo.ca>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
http://freshmeat.net/projects/bashdiff/
| |
|
| bsh wrote:
> ...
>
> Stop right there! You can have the best of both worlds,
> through my aforementioned ocelot, ...
>
> Again: http://www.ocelot.ca/file.htm
>
>
> No it isn't. PostgreSQL is very non-trivial to install and
On most Linuxes installation is a breeze (Gentoo: 'emerge postgres').
> use, is not embeddable except through functionality
> provided by ODBC,
Not so. There is a perfectly good C library (and other bindings).
> requires superuser priveleges for
> many functions, and does not yet natively execute
> under Unix/Linux (yet?? Version 8 is/was supposed to
> remedy this).
What do you mean 'natively execute'?
>
> http://sql-info.de/postgresql/gotch...ration.html#2_2
>
>
> Since one is held hostage to the particular hashing algorithm
> used in a B*Tree or other ISAM database, one is always faced
Many databases give you choices here. mysql - embeddable - offers quite
a variety of table engines (hash, various ISAM, heap, bdb, etc), as
does Ingres - not so embeddable - Btree, hash, heap, ISAM.
> with the choice of favoring one over the other of applicability
> versus versatility.
>
> It sounds as if you "should" insert unparsed records,
> and parse the fields out yourself. The disadvantages
> are that of subverting referential integrity functionality
> (if supported), and the fact that few if any hashing
> methods support ordered sequential traversals.
>
> In other words, I wouldn't go down this route....
>
> It sounds like a trivial solution providing a flat-file
> filestore cum SQL operators, is worth investigating,
> such as txtSQL (does it have to be C source?):
>
> "txtSQL.php",3.0.0b,http://txtsql.sourceforge.net/;http...les/txtsql.zip,$0,"RDBMS:
> OO, SQL and PHP","Faraz Ali" <farazali@users.sourceforge.net>
>
> It's even quite impressive performance-wise:
>
> http://chibiguy.dotgeek.org/pages/b...php?section=2#1
>
>
> Oops. I provided my above workaround before I read this.
> At least we're on the same page.
>
>
> This is how Ocelot's "regular (but non-default)" file
> structure used to be structured. The designers now
> favor an "Alternate File Structure" (that is far more
> efficient) as should you.
>
> http://www.ocelot.ca/file.htm
The Ocelot web site is not very well structured, imho. It was difficult
to drill down to specifics such as what platforms are supported. Is it
ported to UNIX?
>
> Idea: doesn't bash3 now implement ksh93-like
> compound-variables? This would be a _lot_ more
> versatile in the above regard.
>
>
> ACID functionality in an RDBMS makes implementing
> transactive integrity and rollback trivial. It's this quality
> that one looks for in a good RDB, not these functional
> ramifications.
SQLite touts ACID.
>
> BTW, _someone_ is going to call me to task to
> provide examples, as previously mentioned, of
> RDBMSes written in interpreted languages:
>
> "kshsql.ksh",,http://books.dreambook.com/dfrench/...nch/kshsql.txt,$0,"ingres
> clone","Dana French"
>
> "gadfly.py",1.0, http://gadfly.sourceforge.net/ ...
>
> "flatsql.pl",1;2, http://www.eecs.harvard.edu/~konrad...s/flatsqlmysql/ ...
Also Cloudscape (Java),
http://www-306.ibm.com/software/data/cloudscape/
http://www.arunadb.com/ (Ruby)
>
> Who knew!?
>
> =Brian
| |
| Henry Townsend 2006-06-02, 1:15 pm |
| bsh wrote:
> No it isn't. PostgreSQL is very non-trivial to install and
> use, is not embeddable except through functionality
> provided by ODBC, requires superuser priveleges for
> many functions, and does not yet natively execute
> under Unix/Linux (yet?? Version 8 is/was supposed to
> remedy this).
I believe you have this last bit backward. As of a couple of years ago
when I last worked with it, PostgreSQL had always worked on Unixish
systems but worked on Windows only via Cygwin (or similar, don't recall
details). A native port to Win32 was to be a big part of the upcoming
major release at that time (which would be an old release by now).
HT
| |
| Roger Leigh 2006-06-02, 1:15 pm |
| | |
|
|
toby wrote:
> bsh wrote:
> On most Linuxes installation is a breeze.
Talkin' 'bout the broader idea of installation _and_ schema
design... although you would be right to quibble that this is
an issue equally valid in any SQL RDBMS.
> [ODBC] Not so. There is a perfectly good C library (and other bindings).
Talkin' 'bout embeddability.
> What do you mean 'natively execute'?
Whoops! I got it backwards: version 8 was supposed to
have a native executable (and setup binary) for Win32,
not vice versa, and is not pertinent for you on Unix/Linux.
Whether or not version 8 was found or not in my informal
survey, is not relevant to this particular issue.
> Many databases give you choices here. mysql - embeddable - offers quite
> a variety of table engines (hash, various ISAM, heap, bdb, etc), as
> does Ingres - not so embeddable - Btree, hash, heap, ISAM.
This is was aware, although with not so much specific knowledge.
=Brian
| |
|
|
Roger Leigh wrote:
> Henry Townsend <henry.townsend@not.here> writes:
[vbcol=seagreen]
> I believe you have this last bit backward.
As previously admitted: yes.
> I would also contend the difficulty of installation and use. It's
> pretty simple, even more so if you use the distribution's packages.
> http://www.whinlatter.ukfsn.org/talks/pgtalk/
Well then, an extract from the above PostgreSQL talk:
"Why not use PostgreSQL"
+ Too big and complex to administer.
- CSV
- GDBM/NDBM
- BerkeleyDB
- SQLite
Which pretty much echoes my thesis and our collective
list of alternatives. William Park never indicated that he
was conversant with SQL, and learning any new language
(especially one as proprietary as SQL) is always non-
trivial.
=Brian
| |
|
| bsh wrote:
> Roger Leigh wrote:
>
>
> As previously admitted: yes.
>
>
> Well then, an extract from the above PostgreSQL talk:
>
> "Why not use PostgreSQL"
> + Too big and complex to administer.
> - CSV
> - GDBM/NDBM
> - BerkeleyDB
> - SQLite
>
> Which pretty much echoes my thesis and our collective
> list of alternatives. William Park never indicated that he
> was conversant with SQL, and learning any new language
> (especially one as proprietary as SQL) is always non-
> trivial.
SQLite is designed to make embedding and use as simple as possible. See
this interesting talk by its principal developer:
http://video.google.com/videoplay?d...435487953918649
I doubt William would have any trouble should he choose that route :-)
>
> =Brian
| |
|
|
A followup:
"duro" looks to be a good compromise between that
eternal struggle between versatility and power.
"duro.c",0.11,http://duro.sourceforge.net,$0,"RDB API: utilizes
BerkeleyDB"
It basically implements a next-generation RDB on top of
BerkeleyDB.
=Brian
| |
|
|
| davids@webmaster.com 2006-06-15, 7:26 am |
|
William Park wrote:
> Is there a simple library package which manipulate table, where row is a
> record and columns are fields within the record. Sort of like GDBM, but
> with more than one "value" per "key".
Had GDBM been under the BSD license instead of the GPL, it would have
had this capability years ago. Sadly, we had to work with a product
that was free in the sense of freedom, not in the sense of beer, and
abandoned our work on GDBM.
DS
|
|
|
|
|