Design of a URL-encoded language to specify sets of files on a WebDAV server


Andrew James

Gentlemen,
I'm currently in the process of designing a language which will be used
to specify sets of files on a WebDAV server, encoded in a URL. The aims
of the language are to (in no particular order):
* Be concise, aesthetic and easy to type
* Be as similar as possible to existing query languages
* Allow for (nested) boolean operations
* Be cross-platform (so don't include any characters which can't be
used in filenames on *NIX or Win32)

There is a project Wiki which I will release once I've got it into a
semi-stable state.

This is part of a wider project (in Python, of course) which I'm
developing for my degree but which may have more uses. I'd like to draw
on all the experience here and ask some questions so that I can better
shape the implementation to suit the people who may choose to use it.

I would very much appreciate it if anyone could spare the time to have a
look through the specification and documentation below and let me know
about the following:

* Whether you think the language is up to the job (and if not, why not)
* Any additions which you think should be made that will increase
functionality and/or decrease ambiguity
* Any common pitfalls which you think I might fall into
* Any bugs in my specification

There will also be a more verbose query language (probably SQLXML)
implemented in the project, but one key feature is that users should be
able to search simply by typing something into the 'Goto' box...

Well, that's about all. Please see the specification below and I hope to
be hearing your feedback in the near future.

Regards,
Andrew

MetaFS Path Query Language
==========================

Background
One way of specifying search criteria in MetaFS is by using a dynamic
URL, making it trivial to work with MetaFS from the command line. This
document explains the characteristics and formatting of these queries
and offers some examples to be expanded upon.


Diving In
The query language used by MetaFS is closely related to XPath/LaTeX,
with a few restrictions around reserved characters in filenames. The
basic format of a query is:

{BASE}/fs/catA/catB/catC[criteria]

An example of a simple query based on this format would be:

{BASE}/fs/photos/america/beach[type=jpeg,author='Andrew James']

This query is equivalent to searching for files which are in the
photos, america and beach categories (the intersection), have a JPEG MIME
type and whose author is Andrew James.
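
To make the intersection semantics concrete, here is a minimal Python sketch of how a server might evaluate this simple form. It assumes the `{BASE}/fs/` prefix has already been stripped, handles only `=` criteria, and uses a hypothetical in-memory `FileRecord` list in place of the real MetaFS store; `parse_simple_query` and `run_simple_query` are illustrative names, not part of MetaFS.

```python
import re
from typing import NamedTuple


class FileRecord(NamedTuple):
    """Hypothetical stand-in for a file known to MetaFS."""
    name: str
    categories: frozenset
    metadata: dict


# "cat/cat/cat" optionally followed by "[criteria]" ({BASE}/fs/ already removed).
QUERY_RE = re.compile(r"^(?P<cats>[^\[\]]*?)(?:\[(?P<crit>[^\]]*)\])?$")
# One "name=value" criterion; the value is 'quoted' or bare up to the next comma.
CRIT_RE = re.compile(r"(?P<name>[\w:]+)=(?:'(?P<quoted>[^']*)'|(?P<bare>[^,]*))(?:,|$)")


def parse_simple_query(path: str) -> tuple[list[str], dict[str, str]]:
    """Split e.g. "photos/america/beach[type=jpeg]" into categories and criteria."""
    m = QUERY_RE.match(path)
    if m is None:
        raise ValueError(f"not a simple query: {path!r}")
    categories = [c for c in m.group("cats").split("/") if c]
    criteria: dict[str, str] = {}
    crit, pos = m.group("crit") or "", 0
    while pos < len(crit):
        cm = CRIT_RE.match(crit, pos)
        if cm is None:
            raise ValueError(f"bad criterion: {crit[pos:]!r}")
        quoted = cm.group("quoted")
        criteria[cm.group("name")] = quoted if quoted is not None else cm.group("bare")
        pos = cm.end()
    return categories, criteria


def run_simple_query(path: str, files: list[FileRecord]) -> list[str]:
    """Names of files in *every* listed category that match *all* criteria."""
    categories, criteria = parse_simple_query(path)
    return [
        f.name
        for f in files
        if set(categories) <= f.categories  # intersection of the categories
        and all(f.metadata.get(k) == v for k, v in criteria.items())
    ]
```

Because `/` here means intersection, the order of the categories does not change the result: `beach/america/photos` selects the same files.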

This type of query should be enough to perform most sorts of simple
searches, but MetaFS also includes some advanced features that can be
accessed with more complicated queries.


Advanced Features
Boolean Logic Engine
MetaFS includes a complete boolean logic engine which allows both
grouping of terms and the boolean operators AND, OR and NOT.

Reserved Operator Characters

/     Boolean AND (categories only)
^     Boolean OR (categories only)
~     Boolean NOT (categories only)
()    Term Grouping (categories only)
-     Less than (criteria only)
+     More than (criteria only)
=     Logical EQUALS (criteria only)
!     Logical NOT (criteria only)
~     Logical CONTAINS (criteria only)
,     Boolean AND (criteria only)

This allows us to create much more complex queries, such as

{BASE}/fs/photos/~(america^france)

or even

{BASE}/fs/music/mp3[artist~'Jackson',artist!~'Michael']

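Since these queries travel inside a URL, it is worth checking which operator characters survive percent-encoding. The sketch below applies Python's standard `urllib.parse` functions to the two example queries above; `size+100` is a hypothetical criterion using the `+` (more than) operator.

```python
from urllib.parse import quote, unquote, unquote_plus

# Example queries from the text, with the {BASE}/fs/ prefix omitted.
queries = [
    "photos/~(america^france)",
    "music/mp3[artist~'Jackson',artist!~'Michael']",
]

for q in queries:
    # quote() leaves letters, digits, "_.-~" and (by default) "/" untouched;
    # every other character is percent-encoded.
    encoded = quote(q)
    print(encoded)
    # The server must decode the path before handing it to the parser.
    assert unquote(encoded) == q

# photos/~%28america%5Efrance%29
# music/mp3%5Bartist~%27Jackson%27%2Cartist%21~%27Michael%27%5D

# Pitfall: in form-encoded query strings "+" means a space, so decoding with
# unquote_plus() would turn the "more than" operator into a blank.
print(unquote("size+100"), "|", unquote_plus("size+100"))  # size+100 | size 100
```

Of the operators in these two examples, only `/` and `~` come through `quote()` unchanged; a client that encodes its URLs will send `(`, `^`, `[`, `'`, `,` and `!` as `%XX` escapes, so the server has to decode them before parsing.
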
Operator Order of Precedence
The operator order of precedence for filesystem queries is as follows:

Categories
(), ~, ^, /
Criteria
',', !, (=, -, +, ~)
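
One way to honour the category precedence is a recursive-descent parser with one method per level. The sketch below assumes the category list is read highest-first (grouping, then `~`, then `^`, then `/`), treats each category as a set of file names, and takes `~` as the complement within all files that appear in some category. The `index` dictionary and the `CategoryQuery` class are hypothetical, not part of MetaFS.

```python
import re

TOKEN_RE = re.compile(r"[A-Za-z]\w*|[()~^/]")
NAME_RE = re.compile(r"[A-Za-z]\w*")


def tokenize(text: str) -> list[str]:
    """Split a category expression into names and operator characters."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens


class CategoryQuery:
    """Evaluate a category expression against {category: set of files}."""

    def __init__(self, text: str, index: dict[str, set[str]]):
        self.tokens = tokenize(text)
        self.pos = 0
        self.index = index
        # Assumption: "~" complements within files that are in some category.
        self.universe = set().union(*index.values())

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    def evaluate(self) -> set[str]:
        result = self.and_expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return result

    def and_expr(self) -> set[str]:  # lowest precedence: "/" = intersection
        result = self.or_expr()
        while self.peek() == "/":
            self.take()
            result = result & self.or_expr()
        return result

    def or_expr(self) -> set[str]:  # "^" = union, binds tighter than "/"
        result = self.not_expr()
        while self.peek() == "^":
            self.take()
            result = result | self.not_expr()
        return result

    def not_expr(self) -> set[str]:  # prefix "~" = complement
        if self.peek() == "~":
            self.take()
            return self.universe - self.not_expr()
        return self.atom()

    def atom(self) -> set[str]:  # highest precedence: a name or "( ... )"
        if self.peek() == "(":
            self.take("(")
            result = self.and_expr()
            self.take(")")
            return result
        name = self.take()
        if not NAME_RE.fullmatch(name):
            raise SyntaxError(f"expected a category name, got {name!r}")
        return set(self.index.get(name, set()))
```

With this reading, `america^france/photos` means `(america^france)/photos`, and `~america^france` means `(~america)^france`.
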
Namespaces
Metadata criteria can include not only the default metadata attributes
which MetaFS assigns to files but also namespaced attributes. These are
defined by prefixing the attribute name with a namespace and a colon, as
in XML. For example, one could specify criteria like:

[owner=drew,moddate>10/10/2004,ns:bitrate='128']
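
A namespaced criterion can be split with a single regular expression. This is a sketch, not the MetaFS implementation: `parse_criteria` and `Criterion` are hypothetical names, it recognises only the operators from the reserved-character table (so the `>` in the example above would be rejected), and it treats a lone `!` as "not equal", which is an assumption.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for one criterion. Operators are "=", "-", "+" and "~",
# optionally preceded by "!" for negation; a lone "!" is read as "not equal".
CRITERION_RE = re.compile(
    r"""
    (?:(?P<ns>[A-Za-z]\w*):)?        # optional "namespace:" prefix
    (?P<name>[A-Za-z]\w*)            # attribute name
    (?P<op>!?[=~+\-]|!)              # operator, possibly negated
    (?:'(?P<quoted>[^']*)'           # 'quoted value' ...
      |(?P<bare>[^,']*))             # ... or bare value up to the next comma
    (?:,|$)                          # criteria are separated by commas
    """,
    re.VERBOSE,
)


class Criterion(NamedTuple):
    namespace: Optional[str]  # None means a default MetaFS attribute
    name: str
    negated: bool
    op: str                   # one of "=", "-", "+", "~"
    value: str


def parse_criteria(text: str) -> list[Criterion]:
    """Parse the text between "[" and "]" into a list of criteria."""
    result, pos = [], 0
    while pos < len(text):
        m = CRITERION_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad criterion at {text[pos:]!r}")
        op = m.group("op")
        negated = op.startswith("!")
        op = op.lstrip("!") or "="  # assumption: a lone "!" means "not equal"
        quoted = m.group("quoted")
        value = quoted if quoted is not None else m.group("bare")
        result.append(Criterion(m.group("ns"), m.group("name"), negated, op, value))
        pos = m.end()
    return result
```

For example, `parse_criteria("artist~'Jackson',artist!~'Michael'")` yields one plain and one negated CONTAINS criterion on `artist`.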


== Query Language Grammar ==
Below is the initial (and buggy, no doubt) specification for the MetaFS
query language in the TPG form of BNF.


Language Specification
# Tokens
separator space '\s+';
token Num '\d+(\.\d+)?';
token Ident '[a-zA-Z]\w*';
token CharList '\'[^\']*\'';
token CatUnOp '~';
token CatOp '[/\^]';
token MetaOp '[=\+\-!]';
token Date '\d\d-\d\d-\d\d\d\d';
token FileID '(\w+\.\w+)';
token EmptyLine '^$';

# Rules
START -> CatExpr ('\[' MetaExpr '\]')?
| FileID
| EmptyLine
;
CatExpr -> CatUnOp CatName
| CatName (CatOp CatExpr)*
;
CatName -> Ident
| '\(' CatExpr '\)'
;
MetaExpr -> MetaCrit (',' MetaCrit)*
;
MetaCrit -> Ident MetaOp Value
;
Value -> CharList | Num | Date
;
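
The token definitions above can be combined into a single-pass lexer with Python's `re` module. This is only a sketch: it tries the more specific `Date` and `FileID` patterns before `Num` and `Ident`, escapes the `.` in `Num`, limits `CharList` to `'[^']*'` so that `'a','b'` yields two strings, and adds a hypothetical `Punct` token for `[`, `]`, `(`, `)` and `,`, which the grammar writes inline.

```python
import re
from typing import Iterator

# (name, pattern) pairs, tried in this order; the first alternative that
# matches wins, so more specific patterns must come first.
TOKEN_SPEC = [
    ("Date",     r"\d\d-\d\d-\d\d\d\d"),
    ("FileID",   r"\w+\.\w+"),       # tried before Num, so "1.5" is a FileID
    ("Num",      r"\d+(?:\.\d+)?"),
    ("Ident",    r"[a-zA-Z]\w*"),
    ("CharList", r"'[^']*'"),
    ("CatUnOp",  r"~"),              # the same character is CONTAINS in [...]
    ("CatOp",    r"[/^]"),
    ("MetaOp",   r"[=+\-!]"),
    ("Punct",    r"[\[\](),]"),      # hypothetical: literals from the rules
    ("Space",    r"\s+"),            # the grammar's separator
]
MASTER_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))


def lex(text: str) -> Iterator[tuple[str, str]]:
    """Yield (token name, text) pairs, skipping whitespace."""
    pos = 0
    while pos < len(text):
        m = MASTER_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "Space":
            yield m.lastgroup, m.group()
        pos = m.end()
```

Because `FileID` is tried before `Num`, a decimal such as `1.5` comes out as a `FileID`; the token set as written does not distinguish the two.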

Test Queries
The following test queries have been run through the parser to check
whether they are parsed correctly. The version of the language grammar
above does this without errors. As you can see, the grammar allows for
term nesting, unary operators and arbitrarily complex boolean
expressions. In addition to this, file metadata querying is available
for extra filtering.

parseTests = (
"simple",
"this/is/a/simple/test",
"a/test/with/metadata[author='drew',date=10]",
"music/mp3/~jackson/michael",
"docs/latex/~(computer^science)",
"media/video/((comedy/action)^thriller)"
)
 

Dieter Maurer

Andrew James said:
...
> {BASE}/fs/photos/america/beach[type=jpeg,author='Andrew James']
>
> This query is equivalent to searching for files which are in the
> photos,america and beach categories (the intersection), have a jpeg mime
> type and whose author is Andrew James.

Why are you using path syntax when in fact you mean intersection?

Note that genuine "path"s, too, have an independent meaning in a WebDAV
repository.
...
> Reserved Operator Characters
>
> /
> Boolean AND (categories only)
> ^
> Boolean OR (categories only)

There are more familiar symbols for "and" and "or"...
...
> -
> Less than (criteria only)
> +
> More than (criteria only)

"+" and "-" usually mean something different.

> =
> Logical EQUALS (criteria only)

What does that mean?

You want a more general "EQUALS" relation (not only on boolean values).

> ~
> Logical CONTAINS (criteria only)

What does that mean?

> ,
> Boolean AND (criteria only)

Why are your "and" operators for categories and criteria different?

> This allows us to create much more complex queries, such as
>
> {BASE}/fs/photos/~(america^france)

Above, I had the impression that criteria were enclosed in "[...]".
Seems not to be the case.


I stop here: I would not like your query language:

It uses unfamiliar symbols for well known operators (rather
than the standard ones).

It uses different symbols for the same operator in different
contexts.

It appears to be defined via examples (and not formally).
 
