Constellations

Roedy Green

What I am hoping to do is provide stable rules for selecting files
that handle files appearing and disappearing, that can then feed the
entire set to various utilities, without each utility needing to be
clever. I want a standard way of specifying constellations that works
across many utilities, including ones I did not write.

A typical application would be specifying files to back up, or specifying
the files to burn an application to a CD. You don't want to have to
manually tune the set every time it is run. It should have rules in
it that adjust to the presence of new files and directories and the
disappearance of others.
--
Roedy Green Canadian Mind Products
http://mindprod.com
PM Steven Harper is fixated on the costs of implementing Kyoto, estimated as high as 1% of GDP.
However, he refuses to consider the costs of not implementing Kyoto which the
famous economist Nicholas Stern estimated at 5 to 20% of GDP
 
Martin Gregorie

A typical application would be specifying files to back up, or specifying
the files to burn an application to a CD. You don't want to have to
manually tune the set every time it is run. It should have rules in it
that adjust to the presence of new files and directories and the
disappearance of others.

I hate to say it, but normal shell globbing works just fine for an
overnight backup system I've been running for three years now. That's
about the only bit of the backup script that HASN'T needed tweaks.
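
For anyone who wants the same effect outside a shell, here is a rough Python sketch of why globbing copes with files appearing and disappearing: the patterns are stored, and the file list is only produced when the job runs. The function name select and the example patterns are made up for illustration; glob.glob with recursive=True is the standard library call.

```python
import glob
import os


def select(patterns, root):
    """Expand each glob pattern under root at call time.

    Returns sorted paths relative to root. Nothing is cached, so a file
    created after the patterns were written is picked up on the next run.
    """
    found = set()
    safe_root = glob.escape(root)  # protect odd characters in root itself
    for pattern in patterns:
        full_pattern = os.path.join(safe_root, pattern)
        for path in glob.glob(full_pattern, recursive=True):
            if os.path.isfile(path):
                found.add(os.path.relpath(path, root))
    return sorted(found)
```

Calling select(["**/*.java"], root) tonight and again tomorrow night can give different lists without anyone touching the patterns, which is the property being asked for.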
 
David Segall

Roedy Green said:
Over the last 24 hours I have come round to that point of view too.

I am disappointed with that conclusion although I am not surprised. I
have examined a number of backup programs and I have not found any
that provide an intuitive, powerful language to specify the files to
be backed up.

The advantage is the tools for creating constellations can then be
used with software that is constellation unaware. Further, the app
needs no RAM overhead to analyse/evaluate the constellation. Further
the user can look at the list of files to verify it, or hand tune it
before feeding it to an app.

I seem to have over-interpreted or misinterpreted your original post.
I assumed you were attempting to specify an XML based language that
could be read by your "constellation interpreter" and the output of
that program (or class) would be a list of files that could be used
for any purpose including "hand tuning". I cannot believe that the
best source language for the purpose is a combination of find, grep
and sed.
 
Arne Vajhøj

Stefan said:
You do not have to write them, because they are already written.

And they run everywhere, since they are available for free for
every major operating system.

If it is a system that you are allowed to install various
utilities on and you want to spend the time finding them
and getting them built and installed.

Arne
 
Arne Vajhøj

I don't see how writing XML is more user-friendly than writing find
commands (neither are very friendly to the non-technical).

XML can be (and in the case of ant is) much easier to
read than shell-wizardry.

Arne
 
Roedy Green

I am disappointed with that conclusion although I am not surprised. I
have examined a number of backup programs and I have not found any
that provide an intuitive, powerful language to specify the files to
be backed up.

I have been complaining about that since the DOS days. I do it now
with my own scripts that compress and back up files to DVD with a
simple copy. DVDs are fat enough now that I personally can bypass
splitting files over many DVDs, though in general you cannot.


Almost nobody I know does backups and that is because it is too
difficult for them to do. They come to me after a crash hoping I can
salvage something from the wreckage.



Roedy Green

. I cannot believe that the
best source language for the purpose is a combination of find, grep
and sed.

The problem is that find, grep and sed require procedural thinking, utterly
beyond the average user. Whether it is possible or not, I don't know,
but I need something even a relative idiot could use to create a set
of rules on what to backup or process, perhaps through iterative
refinement, but not requiring the forethought and composing out of
thin air the way programming does. It has to be something you do
mainly by selecting.

I had this problem back in the DOS days. I created a boolean
expression system for doing database queries at a charity. The users
came to me every time they needed a query. No matter how simple the
query, they just did not get it.

I replaced it with a fill-in-the-blanks scheme with lower and upper
selection bounds on the possible fields. There was a Mickey Mouse
scheme to cascade picks by turning a flag on and off in each record if
it matched the criteria. Even though it was ugly by programmer
standards, they had no problem with it. They never had to COMPOSE
anything, just combine selections. Composing is a much higher order
task than answering a multiple choice question, even a tedious list of
questions.
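
A rough Python sketch of that fill-in-the-blanks idea, to show how little the user has to compose. The original DOS system is not shown in this thread, so the field names, the "picked" flag and the function names here are all invented for illustration. Each form is just a set of optional lower and upper bounds, and successive forms cascade by switching the flag on or off.

```python
def matches(record, bounds):
    """bounds maps a field name to (low, high); None means the blank was left empty."""
    for field, (low, high) in bounds.items():
        value = record[field]
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True


def cascade(records, bounds, mode="narrow"):
    """Apply one filled-in form to the per-record flag and return the picked records.

    narrow: a record stays picked only if it also matches this form.
    widen:  a record becomes picked if it matches this form.
    """
    for record in records:
        hit = matches(record, bounds)
        if mode == "narrow":
            record["picked"] = record.get("picked", True) and hit
        else:
            record["picked"] = record.get("picked", False) or hit
    return [record for record in records if record["picked"]]
```

A first form with age from 40 upward, followed by a second form with donated from 100 upward, gives the effect of an AND without the user ever writing an expression.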



RedGrittyBrick

Roedy said:
The problem is that find, grep and sed require procedural thinking, utterly
beyond the average user. Whether it is possible or not, I don't know,
but I need something even a relative idiot could use to create a set
of rules on what to backup or process, perhaps through iterative
refinement, but not requiring the forethought and composing out of
thin air the way programming does. It has to be something you do
mainly by selecting.

Isn't the usual answer to present the user with an expandable tree view
of the "folders" with check boxes for every directory and file? You can
save the resulting "specification" in whatever form is convenient for
the developer as the user never has to see it.

Some users don't get the concept of disks and folders, the computer is a
magic box and they have no mental model of storage inside it. They
unknowingly save files in whatever place each application offers as a
default. They don't organise their files. I guess the only sensible
backup to offer them is their home directory and/or a list of
places that common applications might save documents.
 
David Segall

RedGrittyBrick said:
Isn't the usual answer to present the user with an expandable tree view
of the "folders" with check boxes for every directory and file?

It is, and it is easy to understand. Unfortunately it fails for only
slightly complicated cases. For example, if you want to back up only
some of your Java source file directories then you cannot specify what
should happen if you add a new Java source file directory. If you
don't want to back up your class files you have to uncheck them one at
a time.
 
RedGrittyBrick

Patricia said:
...

I've used those systems for backup. I think it should be possible to do
better. For example, consider the issue of how to treat new files or
directories, added after the backup parameters were created. There are
obvious defaults if the new item is in a directory that is entirely
included or entirely excluded, but no way to specify what should happen
when creating in a directory containing some included and some excluded
content.

For example, I have a "My Documents" directory that is mainly backed up,
but I exclude a few things that are big and can be restored or
recalculated from other sources. New directories in "My Documents"
should be backed up, but there is no way to say so.

Perhaps one way it is "possible to do better" is if:
tick "My Documents" (directory inclusion, recursive),
untick "My Documents\Ephemera" (directory exclusion),
untick "My Documents\photos_1965-2009.zip" (file exclusion).
Would generate appropriate internal "rules"/"specs"/"find args".

Maybe with some global (or per directory context-menu) options:
[/] exclude files bigger than [100] MB.
[/] include folders inside this one.
etc
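
As a rough illustration of how those ticks could become rules, here is a Python sketch in which the nearest ticked or unticked ancestor decides, so anything new inside a ticked folder is picked up without being listed. The name decide, the marks dictionary and the size option are assumptions made for the example, and paths use forward slashes to keep it platform neutral.

```python
from pathlib import PurePosixPath


def decide(path, marks, size=0, max_bytes=None):
    """Return True if path should be backed up.

    marks maps a path to True (ticked) or False (unticked). The path itself
    or its nearest marked ancestor wins; unmarked territory is excluded.
    max_bytes models the "exclude files bigger than" option.
    """
    if max_bytes is not None and size > max_bytes:
        return False
    target = PurePosixPath(path)
    for candidate in [target, *target.parents]:
        key = str(candidate)
        if key in marks:
            return marks[key]
    return False


marks = {
    "My Documents": True,
    "My Documents/Ephemera": False,
    "My Documents/photos_1965-2009.zip": False,
}
```

With these three marks, a folder created next week under "My Documents" is included because its nearest marked ancestor is ticked, while anything under "My Documents/Ephemera" stays out.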

I do think this approach, based on a visual metaphor, is likely to be
easier for most non-IT folk than learning any system of writing
hierarchical rules in some typed grammar of words, operators and sentences.

I guess GUI approaches always have limitations that some IT experts will
find restrictive. That's why many of us continue to like command line
tools such as grep, sed and find.

Just my ¤0.02 worth.
 
RedGrittyBrick

David said:
It is, and it is easy to understand. Unfortunately it fails for only
slightly complicated cases. For example, if you want to back up only
some of your Java source file directories then you cannot specify what
should happen if you add a new Java source file directory. If you
don't want to back up your class files you have to uncheck them one at
a time.

These are good points but not insuperable if you are implementing such
an interface for an interactive program for use by people who are happy
to have any backup at all that doesn't require them to learn a new
specification language whose concepts they find difficult to grasp.

* You could have a default for inclusion (or exclusion) of new
subdirectories.

* If the app finds new subdirectories it could (optionally) ask the user
what to do. If the user seems not to be present, it could make an
assumption erring on the side of safety? Such an app might do the wrong
thing less often than the alternative of a textual grammar based
interface that the user perhaps can't easily grasp. We're not aiming for
perfection.

* You could augment the directory-choosing interface with the ability to
tick/untick for inclusion a list of file types found to be present.
I'd list them by descriptive name of file type (not e.g. file name
extension)
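
The file type list in the last point could be built along these lines. This is only a sketch: the DESCRIPTIONS table is a hypothetical stand-in for wherever the descriptive names would really come from, and the function names are invented.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical lookup from extension to a name a non-technical user would recognise.
DESCRIPTIONS = {
    ".java": "Java source",
    ".class": "Compiled Java class",
    ".txt": "Text document",
}


def types_present(paths):
    """Return (extension, description, count) for every file type found, sorted by extension."""
    counts = Counter(PurePosixPath(p).suffix.lower() for p in paths)
    return [
        (ext, DESCRIPTIONS.get(ext, ext or "(no extension)"), n)
        for ext, n in sorted(counts.items())
    ]


def drop_unticked(paths, unticked_extensions):
    """Keep only the files whose type is still ticked."""
    return [p for p in paths if PurePosixPath(p).suffix.lower() not in unticked_extensions]
```

The user would see one checkbox per row from types_present, and unticking "Compiled Java class" once removes every class file, now and on later runs.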

I should note that Java programmers are not the end-users I had in mind
when thinking about Roedy's question.

Therefore KISS applies too and rules out more than a very few bells and
whistles of the sort I suggest.

I still think the main thrust of my suggestion is a reasonable one for
Roedy's stated target users. These details we are discussing shouldn't
negate that, even if it means that such an interface would not satisfy
you and me.
 
Martin Gregorie

Perhaps one way it is "possible to do better" is if:
tick "My Documents" (directory inclusion, recursive),
untick "My Documents\Ephemera" (directory exclusion),
untick "My Documents\photos_1965-2009.zip" (file exclusion).
Would generate appropriate internal "rules"/"specs"/"find args".

Maybe with some global (or per directory context-menu) options:
[/] exclude files bigger than [100] MB.
[/] include folders inside this one.
etc

Also consider allowing ticking/unticking files to select/deselect all
files with the same extension. This, combined with the ability to select
root directories for recursive backup probably covers 95% of non-techie
user requirements. It shouldn't be difficult to use this input to build
zip, tar or rsync-like arguments for the backup application or, indeed,
to use these utilities to do actual backup.

I do think this approach, based on a visual metaphor, is likely to be
easier for most non-IT folk than learning any system of writing
hierarchical rules in some typed grammar of words, operators and
sentences.

Agreed.

I guess GUI approaches always have limitations that some IT experts will
find restrictive. That's why many of us continue to like command line
tools such as grep, sed and find.

In this case a well-designed GUI interface that either runs an open
source and/or portable backup utility or (optionally) schedules a backup
using it is probably a near-optimum solution.
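
To give an idea of how small the glue could be, here is a Python sketch that turns GUI selections into an rsync argument list. The -a and --exclude=PATTERN options are standard rsync; the function name, its parameters and the example paths are made up for illustration, and the command is only built here, not run.

```python
def rsync_command(roots, excluded_extensions, excluded_dirs, destination):
    """Build an rsync argument list from ticked roots and unticked items.

    A pattern ending in "/" only matches directories, so an excluded
    directory name prunes that whole subtree wherever it appears.
    """
    command = ["rsync", "-a"]
    command += [f"--exclude=*{ext}" for ext in sorted(excluded_extensions)]
    command += [f"--exclude={name.rstrip('/')}/" for name in sorted(excluded_dirs)]
    command += list(roots)
    command.append(destination)
    return command
```

Because the result is a list rather than one string, it can be handed to subprocess.run(command, check=True) without any shell quoting worries about names like "My Documents".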
 
Roedy Green

Isn't the usual answer to present the user with an expandable tree view
of the "folders" with check boxes for every directory and file? You can
save the resulting "specification" in whatever form is convenient for
the developer as the user never has to see it.

It is very easy to specify a set of files this way, but a bit tougher
to specify a set of rules that will still work when new files are
added, directories are renamed etc.

For smarter people you might attach simple rules to nodes. For dummies
you may need a system where a specification goes "stale" and needs
manual help to tune it. The user still thinks mainly in terms of
individual files, specifying which ones to include and which to
exclude rather than rules. The rules might be inferred from the
choices.

I have a feeling I am just going to have to create a rough cut then
gradually refine it based on experience with use. My ideas are far
too fuzzy at this point.

I wanted this feature to provide a way to specify files for my website
link checking software, BrokenLinks. I need a way to specify which
local html files to check and which remote URLs to avoid checking.

I would also redo the Replicator, untouch, blout and other
tree-processing utilities to accept a constellation.

Roedy Green

I've used those systems for backup. I think it should be possible to do
better. For example, consider the issue of how to treat new files or
directories, added after the backup parameters were created. There are
obvious defaults if the new item is in a directory that is entirely
included or entirely excluded, but no way to specify what should happen
when creating in a directory containing some included and some excluded
content.

Most systems default to NOT including anything new. This pretty much
defeats the point of reusing a backup specification. It is too
dangerous.

Roedy Green

I do think this approach, based on a visual metaphor, is likely to be
easier for most non-IT folk than learning any system of writing
hierarchical rules in some typed grammar of words, operators and sentences.

I guess GUI approaches always have limitations that some IT experts will
find restrictive. That's why many of us continue to like command line
tools such as grep, sed and find.

I think there will be three representations:

1. visual

2. rules

3. linear list of files.

with ways of interconverting.

That leaves the system most open for other programmers to integrate
with it.

This little weekend project is growing into a monster. I would feel
ever so much better about investing the effort if I knew it would get
used.

Martin Gregorie

Most systems default to NOT including anything new. This pretty much
defeats the point of reusing a backup specification. It is too
dangerous.

Depends. It's easy enough to set up zip, tar and rsync to recurse through
a list of top-level directories and provide an exclusion list that prunes
the selected structures. That does pretty much what's wanted - with
recursion on, new files and directories will be automatically included
unless they fall within the scope of an exclusion term.

There's no reason why your constellation shell shouldn't codify file/
commandline/GUI input in this form. From there it could be passed
directly to programs capable of accepting it or expanding it against the
filing system for use by programs that can only accept a list of files[1].
Alternatively the list could be passed on a socket[2].

[1] this is fine if the subordinate program can read a list of files from
a file or pipe, but may give problems if the list length exceeds the OS-
specific maximum command line size.

[2] Using a socket might sound complex, but it avoids storage and command
line length issues. It potentially adds flexibility too, since the list
expansion need not be done on the same system that the list consumer runs
on.
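
A minimal Python sketch of the socket idea in [2], assuming the two ends agree on UTF-8 paths separated by NUL bytes (my choice for the example, since NUL cannot occur in a file name but a newline can). The function names are invented, and socket.socketpair stands in for a real network connection so the example is self-contained.

```python
import socket
import threading


def send_list(sock, paths):
    """Producer side: stream each path followed by a NUL byte, then close."""
    with sock:
        for path in paths:
            sock.sendall(path.encode("utf-8") + b"\0")


def receive_list(sock):
    """Consumer side: read until the producer closes, then split on NUL."""
    chunks = []
    with sock:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return [item.decode("utf-8") for item in b"".join(chunks).split(b"\0") if item]


def demo_roundtrip(paths):
    """Send paths from one end of a local socket pair and read them at the other."""
    producer_end, consumer_end = socket.socketpair()
    sender = threading.Thread(target=send_list, args=(producer_end, paths))
    sender.start()
    received = receive_list(consumer_end)
    sender.join()
    return received
```

The list never touches a command line, so its length is limited only by what the consumer is prepared to read.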
 
Roedy Green

I often run into the problem of having to specify a set of files to be
processed by an application.

I have been experimenting with how you might specify constellations
with a GUI.

I was thinking of displaying a tree that keeps changing colours (with
geometric clues too for the colour blind) to let you know which files
and directories are going to be included. This lets you get your
solution iteratively.

A node or file has three possible states:

default -- include or exclude depending on state of parent node.
always include
always exclude

You wander through the tree expressing your overrides for directories
and files.
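
Here is a rough Python sketch of the three states and how a default node picks up its answer from its parent. The class and function names are invented for the example, and it takes one shortcut: any node without children is treated as a file.

```python
from enum import Enum


class State(Enum):
    DEFAULT = "default"  # follow the parent
    INCLUDE = "include"  # always include
    EXCLUDE = "exclude"  # always exclude


class Node:
    def __init__(self, name, state=State.DEFAULT, children=()):
        self.name = name
        self.state = state
        self.children = list(children)


def included_files(node, inherited=False, prefix=""):
    """Walk the tree and return the paths of the files that end up included."""
    path = f"{prefix}/{node.name}" if prefix else node.name
    if node.state is State.DEFAULT:
        effective = inherited
    else:
        effective = node.state is State.INCLUDE
    if not node.children:  # shortcut: a leaf is assumed to be a file
        return [path] if effective else []
    result = []
    for child in node.children:
        result += included_files(child, effective, path)
    return result
```

Only the overrides carry information. A file that shows up later arrives in the default state and simply follows whatever its directory resolves to.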

You can attach an extension list to a node. This effectively splits
the tree off that node into two subtrees, one that matches the
extension and one that does not. You can then wander through either
tree marking overrides. You might split off several trees, each with a
different extension list.

By doing the split at the root level, above the drive/file roots
level, you effectively get global control of extensions.

Ditto for a regex that matches a filename.

You can "prune" the tree by noticing that a parent is included, and
all children are included either by default or explicitly, ditto for
exclusion. You could then export the tree as reasonably compact XML
that will regenerate a reasonable list of files even if new
directories appear, are renamed or disappear.
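
The pruning and export step might look something like this Python sketch. The tuple layout (name, state, children), the element name "node" and its attributes are assumptions for the example, not a proposed format. A node is dropped when it says nothing its parent has not already said and none of its descendants needs keeping.

```python
import xml.etree.ElementTree as ET


def prune(node, inherited="exclude"):
    """Return a pruned copy of (name, state, children), or None if the node is redundant.

    state is "default", "include" or "exclude".
    """
    name, state, children = node
    effective = inherited if state == "default" else state
    kept = [c for c in (prune(child, effective) for child in children) if c is not None]
    redundant = state == "default" or state == inherited
    if redundant and not kept:
        return None
    return (name, "default" if redundant else state, kept)


def to_xml(node):
    """Convert a pruned tree into nested <node> elements; only overrides get a state attribute."""
    name, state, children = node
    element = ET.Element("node", name=name)
    if state != "default":
        element.set("state", state)
    for child in children:
        element.append(to_xml(child))
    return element
```

ET.tostring(to_xml(pruned), encoding="unicode") then gives the compact text form. Since only overrides are stored, files and directories that were never mentioned are resolved by inheritance when the list is regenerated.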


David Segall

Roedy Green said:
I often run into the problem of having to specify a set of files to be
processed by an application.

I have been experimenting with how you might specify constellations
with a GUI. [details snipped]
You could then export the tree as reasonably compact XML
that will regenerate a reasonable list of files even if new
directories appear, are renamed or disappear.

I am almost sure this is the wrong way round. If you can devise a
logical, reasonably intuitive, language to express a constellation
then writing a GUI front end for most of it will be relatively
straightforward. If you start with a front end then adding new
features will often require a kluge to both the GUI and the "compact
XML".

In any case, you will need to devise a machine readable representation
of a constellation just to present it in the form of a GUI to your
user. That representation may as well be as close to human readable as
possible.
 
