"archive" data formats


Ivan Shmakov

[Cross-posting to for the reasons below.
Feel free to drop if inappropriate.]
I want to use a tar file like an IBM partitioned dataset, i. e., a
file with multiple members, from a C program.
There are plenty of data formats allowing for such a use. Did you
consider SQLite [1] or HDF5 [2]? Or even GDBM [3]?
[...]
If it's octet sequences instead, SQLite BLOBs [4] could be the
way to go.
[1] http://sqlite.org/
[2] http://www.hdfgroup.org/HDF5/
[3] http://www.gnu.org.ua/software/gdbm/
[4] http://sqlite.org/c3ref/blob.html
Thanks, Ivan, that'll all work, too. The data's more like TLOBs
(text large objects), with each "record" a small program, or more
often plain English text, in its own file.

SQLite seems to fit such a description nicely. Consider, e. g.:

CREATE TABLE "file" (
  id    INTEGER PRIMARY KEY,
  name  TEXT NOT NULL,
  text  TEXT NOT NULL);

-- ensure that names are unique
CREATE UNIQUE INDEX "file-unique"
  ON "file" ("name");

-- @file-get name
SELECT "text" FROM "file"
  WHERE "name" = ?1;

-- @file-put name text
INSERT INTO "file" ("name", "text")
  VALUES (?1, ?2);

-- @file-replace name text
UPDATE "file"
  SET "text" = ?2
  WHERE "name" = ?1;

AIUI, SQLite has strong support for static linking, even at the
"source level" (the whole library ships as a single-file sqlite3.c
"amalgamation" that can be compiled straight into the application),
which could be important for one wishing to keep the number of
dependencies low.
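
To make the intended use a bit more concrete, here is a minimal sketch
of driving the @file-get query above from C through SQLite's public API
(sqlite3_open(), sqlite3_prepare_v2(), sqlite3_bind_text(),
sqlite3_step(), sqlite3_column_text()). Most error handling is
trimmed, and the database file name "files.db" and member name
"hello.txt" are just placeholders for the example:

/* Look up the "text" column for a given "name" in the schema above.
 * Minimal sketch; most error handling omitted. */
#include <stdio.h>
#include <sqlite3.h>

static int
file_get(sqlite3 *db, const char *name)
{
    sqlite3_stmt *stmt;
    int rc = sqlite3_prepare_v2(db,
        "SELECT \"text\" FROM \"file\" WHERE \"name\" = ?1;",
        -1, &stmt, NULL);
    if (rc != SQLITE_OK)
        return rc;
    sqlite3_bind_text(stmt, 1, name, -1, SQLITE_STATIC);
    if (sqlite3_step(stmt) == SQLITE_ROW)
        fputs((const char *) sqlite3_column_text(stmt, 0), stdout);
    return sqlite3_finalize(stmt);
}

int
main(void)
{
    sqlite3 *db;
    if (sqlite3_open("files.db", &db) != SQLITE_OK)
        return 1;
    file_get(db, "hello.txt");
    sqlite3_close(db);
    return 0;
}

Building against the amalgamation is then just a matter of compiling
sqlite3.c alongside this file (typically plus -ldl and -lpthread on
Linux, depending on the compile options).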
Then some set of those is collected and assembled. A first small
test application (just to exercise the code) is listings, e. g.,
www.forkosh.com, then click "Alps" under Sample Code (I'd give you a
direct deep link, but you'll see it's a really long constructed link,
passing lots of query_string attributes on to the CGI program, still
under construction, that I've been talking about).
The more important application will be algorithmically collecting
snippets of boilerplate text and constructing complete documents
"according to spec".

The above somehow reminds me of XML (the model, if not the
representation), and the associated "tools": XInclude, XPath,
XSLT and XQuery. And there's Fast Infoset for a space- and
time-efficient XML representation, BTW.

The use of XML to encode the structure of the data (and code)
being stored could bring a level of consistency, but depending
on the task, it may be too much pain for too little gain.
 

JohnF

Ivan Shmakov said:
JohnF said:
I want to use a tar file like an IBM partitioned dataset, i. e., a
file with multiple members, from a C program.
There are plenty of data formats allowing for such a use. Did you
consider SQLite [1] or HDF5 [2]? Or even GDBM [3]?
[...]
If it's octet sequences instead, SQLite BLOBs [4] could be the
way to go.
[1] http://sqlite.org/
[2] http://www.hdfgroup.org/HDF5/
[3] http://www.gnu.org.ua/software/gdbm/
[4] http://sqlite.org/c3ref/blob.html
Thanks, Ivan, that'll all work, too. The data's more like TLOBs
(text large objects), with each "record" a small program, or more
often plain English text, in its own file.

SQLite seems to fit such a description nicely. Consider, e. g.:

CREATE TABLE "file" (
  id    INTEGER PRIMARY KEY,
  name  TEXT NOT NULL,
  text  TEXT NOT NULL);

-- ensure that names are unique
CREATE UNIQUE INDEX "file-unique"
  ON "file" ("name");

-- @file-get name
SELECT "text" FROM "file"
  WHERE "name" = ?1;

-- @file-put name text
INSERT INTO "file" ("name", "text")
  VALUES (?1, ?2);

-- @file-replace name text
UPDATE "file"
  SET "text" = ?2
  WHERE "name" = ?1;

AIUI, SQLite has strong support for static linking, even at the
"source level" (the whole library ships as a single-file sqlite3.c
"amalgamation" that can be compiled straight into the application),
which could be important for one wishing to keep the number of
dependencies low.

The problem with all non-popen()-type solutions is redesigning/rewriting
existing code that fgets() (for the "r" side) and fputs() (for "w").
I've already replaced fopen() with myfopen() and fclose() with myfclose()
(not their real names) to transparently access files across the net --
if the requested file starts with http://, myfopen() popen()'s wget to read
the file; otherwise it just fopen()'s it as usual (and there's no "w" side
for that yet, either).
So all the hooks already exist to transparently popen() tar if the
requested filename indicates a tar file. Very easy change; no logic
affected whatsoever. Not so with anything else. Zip, etc., instead of
tar is zero extra effort. MySQL, etc., instead of tar becomes a job.
Moreover, the volume of requests is insignificant, so
efficiency/overhead/whatever is completely irrelevant. And finally,
I see no real functional advantage to MySQL, etc., visible to
the end user (unless maybe, at some future time, e.g., text snippets
are described by keys); it just complicates the internals.
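
For what it's worth, here is a rough sketch of that dispatch. The
names are hypothetical (as JohnF says his real ones are), and the
"archive.tar:member" naming convention is likewise made up for the
example. It assumes GNU tar's -O (--to-stdout) to extract one member
to stdout, and wget -q -O - to write a fetched file to stdout:

#define _POSIX_C_SOURCE 200809L   /* for popen() */
#include <stdio.h>
#include <string.h>

/* Hypothetical myfopen(): read-only ("r") dispatch.
 *   http://...          -> popen() wget and read its stdout
 *   archive.tar:member  -> popen() GNU tar --to-stdout
 *   anything else       -> plain fopen()
 * Streams opened via popen() must be closed with pclose(), so a
 * real myfclose() would have to remember which case applied. */
FILE *
myfopen(const char *name, const char *mode)
{
    char cmd[4096];
    const char *sep;

    if (strncmp(name, "http://", 7) == 0) {
        snprintf(cmd, sizeof cmd, "wget -q -O - '%s'", name);
        return popen(cmd, "r");
    }
    sep = strstr(name, ".tar:");
    if (sep != NULL) {
        snprintf(cmd, sizeof cmd, "tar -xOf '%.*s' '%s'",
                 (int) (sep + 4 - name), name, sep + 5);
        return popen(cmd, "r");
    }
    return fopen(name, mode);
}

Swapping in something like unzip -p for .zip archives would indeed be
the kind of one-line change described above.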
The above somehow reminds me of XML (the model, if not the
representation), and the associated "tools": XInclude, XPath,
XSLT and XQuery. And there's Fast Infoset for a space- and
time-efficient XML representation, BTW.

The use of XML to encode the structure of the data (and code)
being stored could bring a level of consistency, but depending
on the task, it may be too much pain for too little gain.

That's what it sounds like (too much/too little), though I'm not
familiar enough with XML and friends to be entirely positive.
Modulo the original tar question, a satisfactory solution already
exists. And it's not clear what future functionality might or might
not be required/desired. Maybe none. I'm quite happy treating the
current code as a prototype to discover whether some additional
functionality turns out to be required. But immediately pushing the
design/internals too far ahead of the curve vis-a-vis the currently
necessary (and reasonably anticipated) functionality doesn't seem
wise to me. Look how far this thread has diverged from the original
tar question. That's (one way) how projects fail.
 
