A proposal to handle file encodings

Discussion in 'Java' started by Roedy Green, Nov 22, 2012.

  1. Roedy Green

    Roedy Green Guest

    The problem with encodings is they are not attached in any way or
    embedded in any way in a file. You are just supposed to know how a
    file is encoded.

    Here is my idea to solve the problem.

    We invent a new encoding.

    Files in this encoding begin with a 0 byte, then an ASCII string
    giving the name of a conventional encoding then another 0 byte.

    When you read a file with this encoding, the header is invisible to
    your application. When you write a file, a header for a UTF8 file gets
    written automatically.

    You write your app telling it to read and write this new encoding e.g.
    "labeled".

    You can write a utility to import files into your labelled universe by
    detecting or guessing or being told the encoding. It gets a header.
    Other than that the file is unmodified.
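    A minimal sketch of what a reader for this hypothetical "labeled"
    encoding could look like in Java. The header layout (0 byte, ASCII
    charset name, 0 byte) is the one proposed above; the class and method
    names are made up for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

/** Hypothetical reader for the proposed "labeled" encoding:
 *  a 0 byte, an ASCII charset name, a 0 byte, then the payload. */
public final class LabeledReader {

    /** Reads the header and returns a Reader using the declared charset;
     *  the header is consumed, so it is invisible to the caller. */
    public static Reader open(InputStream in) throws IOException {
        if (in.read() != 0) {
            throw new IOException("missing leading 0 byte");
        }
        ByteArrayOutputStream name = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) > 0) {   // collect ASCII name up to the second 0 byte
            name.write(b);
        }
        if (b != 0) {                   // stream ended before the second 0 byte
            throw new IOException("missing trailing 0 byte");
        }
        Charset cs = Charset.forName(name.toString("US-ASCII"));
        return new InputStreamReader(in, cs);
    }
}
```

    Writing would be the mirror image: emit the 0 byte, the ASCII name
    (e.g. "UTF-8"), another 0 byte, then wrap the stream in an
    OutputStreamWriter for that charset.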
    --
    Roedy Green Canadian Mind Products http://mindprod.com
    Students who hire or con others to do their homework are as foolish
    as couch potatoes who hire others to go to the gym for them.
    Roedy Green, Nov 22, 2012
    #1

  2. Joerg Meier

    Joerg Meier Guest

    On Thu, 22 Nov 2012 13:36:16 -0800, Roedy Green wrote:

    > The problem with encodings is they are not attached in any way or
    > embedded in any way in a file. You are just supposed to know how a
    > file is encoded.


    > Here is my idea to solve the problem.


    > We invent a new encoding.


    > Files in this encoding begin with a 0 byte, then an ASCII string
    > giving the name of a conventional encoding then another 0 byte.


    > When you read a file with this encoding, the header is invisible to
    > your application. When you write a file, a header for a UTF8 file gets
    > written automatically.


    > You write your app telling it to read and write this new encoding e.g.
    > "labeled".


    > You can write a utility to import files into your labelled universe by
    > detecting or guessing or being told the encoding. It gets a header.
    > Other than that the file is unmodified.


    I can't tell whether you are being serious or making a joke about that
    old "you have 25 standards" joke.

    However, in case you are serious, this ugly and error-prone hack idea
    really belongs more with a language capable of realizing OS-level/file
    system black magic like that in a somewhat sensible way. Like C.

    Kind regards,
    Joerg

    --
    I don't read my emails, so replies by email will unfortunately
    remain unread.
    Joerg Meier, Nov 22, 2012
    #2

  3. markspace

    markspace Guest

    On 11/22/2012 1:36 PM, Roedy Green wrote:
    > The problem with encodings is they are not attached in any way or
    > embedded in any way in a file. You are just supposed to know how a
    > file is encoded.
    >
    > Here is my idea to solve the problem.
    >
    > We invent a new encoding.



    http://xkcd.com/927/
    markspace, Nov 23, 2012
    #3
  4. Arne Vajhøj

    Arne Vajhøj Guest

    On 11/22/2012 4:36 PM, Roedy Green wrote:
    > The problem with encodings is they are not attached in any way or
    > embedded in any way in a file. You are just supposed to know how a
    > file is encoded.
    >
    > Here is my idea to solve the problem.
    >
    > We invent a new encoding.
    >
    > Files in this encoding begin with a 0 byte, then an ASCII string
    > giving the name of a conventional encoding then another 0 byte.
    >
    > When you read a file with this encoding, the header is invisible to
    > your application. When you write a file, a header for a UTF8 file gets
    > written automatically.
    >
    > You write your app telling it to read and write this new encoding e.g.
    > "labeled".


    It is a bad idea to have meta data in the file body. This meta data
    should be where the rest of the meta data is.

    But even if it were moved to the file info area, I doubt
    the idea is good.

    It is enforcing a limitation that a text file will only have
    one encoding, that limitation does not exist today.

    There are practical problems:
    * different systems support different encodings (sometimes the
    same encoding has different names) - what should a system
    do with an unknown encoding?
    * there will be a huge number of legacy files without this meta
    data - what should a system do with those?

    And even if those problems were solved - would it really create
    any benefits?

    It would take many years to get such an approach approved and
    widely implemented. Most likely >10 years. By that time I would
    expect UTF-8 to be almost universally used for new text files,
    making this proposal obsolete.

    > You can write a utility to import files into your labelled universe by
    > detecting or guessing or being told the encoding.


    Which just repeats the existing problems.

    > It gets a header.
    > Other than that the file is unmodified.


    Solved much easier by using meta data.

    Arne
    Arne Vajhøj, Nov 23, 2012
    #4
  5. markspace

    markspace Guest

    On 11/22/2012 5:25 PM, Arne Vajhøj wrote:
    >
    > Solved much easier by using meta data.



    I think Roedy is talking about the physical encoding of the meta data.
    I personally agree with him in this regard: meta data should be encoded
    into the physical file.

    Consider for example a meta data format that we all use: the Jar file.

    Each single Jar file is actually composed of many pieces of information.
    Class files, resources, libraries, the manifest file, etc. And yet
    it's all encoded into a single physical file. You never lose pieces of
    the file just because you made a copy of the file. You never have to
    worry about the meta data changing on a new system just because it's *new*.

    Contrast that with other schemes. Macintosh, I believe, uses a meta
    data format where the data is in one file, and the meta data occupies a
    second physical file with a name like .file-name.meta (I don't use Macs,
    so I'm not 100% sure). So if you use a raw copy command ("cp" from the
    Unix command line) you *don't* get the meta data, because you forgot to
    copy it.

    I hope you can all quickly see how obviously broken that is. Since we
    all use Jar files I think you can all reflect on the idea that it's a
    good solution. Have you ever had a problem with a Jar file retaining
    its meta data? Is it ever desirable to have a Jar file's meta data
    revert to nulls just because you FTP'ed the file someplace? I've never
    desired that "feature".

    It seems obvious to me. Encoding the meta data into a single physical
    file is by far the better solution.

    No, where I think Roedy goes wrong is to invent a *new* file format. My
    solution: use what's there already, just use Jar files.

    Proposal: Add a property "Data-Archive" like so:

    Manifest-Version: 1.0
    Data-Archive: /data

    Where the value of the Data-Archive is the path to the primary data
    stream (within the Zip/Jar file). You can just add an encoding or
    mime-type or any other property to the manifest you like to describe
    your data stream and you're set.
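    A rough sketch of reading such a container with the standard
    java.util.jar API. Note that the Data-Archive attribute is this
    proposal's invention, not a standard manifest key, and error handling
    here is minimal:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical reader for the proposed Data-Archive container. */
public final class DataArchive {

    /** Opens the Jar, reads the proposed Data-Archive manifest attribute,
     *  and returns the named entry as the primary data stream.
     *  (The JarFile is deliberately left open while the stream is in use.) */
    public static InputStream openDataStream(String jarPath) throws IOException {
        JarFile jar = new JarFile(jarPath);
        Manifest mf = jar.getManifest();
        if (mf == null) {
            jar.close();
            throw new IOException("no manifest in " + jarPath);
        }
        String entryName = mf.getMainAttributes().getValue("Data-Archive");
        if (entryName == null) {
            jar.close();
            throw new IOException("no Data-Archive attribute in manifest");
        }
        if (entryName.startsWith("/")) {        // "/data" in the example manifest
            entryName = entryName.substring(1); // Jar entries have no leading '/'
        }
        return jar.getInputStream(jar.getJarEntry(entryName));
    }
}
```

    An "encoding" or "mime-type" property would be read the same way, with
    another getValue() call on the main attributes.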

    Note that this is already being done. Open Office uses Jar files as its
    native file format. They just rename the extension as they wish, and
    open the file appropriately for a Jar file. They also store a lot more
    meta data than just a couple of properties, so they effectively have
    their own format, not this simple one.

    It might be useful to try to solve some common cases for data and
    meta-data. What I've got here is a single data stream and a single
    "type" property. It wouldn't be hard to extend this to several streams
    and several properties each. I think that would be the only other
    useful general case; after that you should just roll your own solution.

    BTW if anyone is copying this up to their website (mindprod), please
    credit appropriately: Brenden Towey.
    markspace, Nov 23, 2012
    #5
  6. Roedy Green

    Roedy Green Guest

    On Thu, 22 Nov 2012 19:47:09 -0800, markspace <-@.> wrote, quoted or
    indirectly quoted someone who said :

    >Each single Jar file is actually composed of many pieces of information.
    > Class files, resources, libraries, the manifest file, etc. And yet
    >it's all encoded into a single physical file. You never lose pieces of
    >the file just because you made a copy of the file. You never have to
    >worry about the meta data changing on a new system just because it's *new*.


    Yes, yes! The OS people have proved incompetent at keeping metadata
    separately from the file. We need formats where the metadata is part
    of the file. With text files the most important piece of metadata is
    the encoding. We already do it sometimes: jpg, jar, csv (sometimes),
    video files.

    More generally the mime type is something you should be able to get
    with File.getMime()

    Imagine if you could do:

    File.getEncoding()
    File.getVersion()
    File.getCopyrightOwner()
    File.getCopyrightDate()

    A meta data-compliant file would look just like any other but with a
    header of the form
    0 <meta>...</meta> 0

    The meta data could be stored as XML. That gives you the ability to
    add extra info without having to change the standard.

    The header is in 7-bit ASCII.


    We should be using somewhat more complicated formats for files with
    embedded metadata.

    As an application programmer you want to be able to have the system
    parse it for you. You get to pretend it is not there, but with the
    ability to query it.

    This reminds me a bit of the innovation of ANSI labelled mag tapes
    back in the 60s.

    The dBase people got this right long ago. You don't go writing files
    without a header describing the format of what is in the file.


    Roedy Green, Nov 23, 2012
    #6
  7. Jan Burse

    Jan Burse Guest

    Hi,

    If your files are HTML, then you can note the encoding in the
    header, via a meta tag:

    <html>
    <head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
    </head>
    <body>
    </body>
    </html>
    http://de.wikipedia.org/wiki/Meta-Element#.C3.84quivalente_zu_HTTP-Kopfdaten

    If your files are XML, then you can note the encoding in the
    xml tag:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    http://de.wikipedia.org/wiki/XML-Deklaration

    If your file is plain text, you can insert a BOM, which allows a
    couple of encodings to be detected automatically. The BOM can be
    skipped during reading. The BOM is:

    \uFEFF
    http://de.wikipedia.org/wiki/Byte_Order_Mark
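    A small Java sketch of this kind of BOM sniffing. It only covers the
    UTF-8, UTF-16BE and UTF-16LE BOMs; real detectors handle more cases,
    and the class name here is made up:

```java
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public final class BomSniffer {

    /** Detects UTF-8, UTF-16BE and UTF-16LE BOMs; returns null if none.
     *  The BOM is consumed; any non-BOM bytes are pushed back, so the
     *  caller must construct the PushbackInputStream with a buffer >= 3. */
    public static Charset sniffBom(PushbackInputStream in) throws IOException {
        byte[] bom = new byte[3];
        int n = in.read(bom, 0, 3);
        if (n >= 3 && (bom[0] & 0xFF) == 0xEF && (bom[1] & 0xFF) == 0xBB
                   && (bom[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;                  // EF BB BF
        }
        if (n >= 2 && (bom[0] & 0xFF) == 0xFE && (bom[1] & 0xFF) == 0xFF) {
            if (n == 3) in.unread(bom[2]);                  // keep the extra byte
            return StandardCharsets.UTF_16BE;               // FE FF
        }
        if (n >= 2 && (bom[0] & 0xFF) == 0xFF && (bom[1] & 0xFF) == 0xFE) {
            if (n == 3) in.unread(bom[2]);
            return StandardCharsets.UTF_16LE;               // FF FE
        }
        if (n > 0) in.unread(bom, 0, n);                    // no BOM: put it all back
        return null;
    }
}
```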

    Would this not cover your requirements?

    Bye


    Roedy Green wrote:
    > The problem with encodings is they are not attached in any way or
    > embedded in any way in a file. You are just supposed to know how a
    > file is encoded.
    >
    > Here is my idea to solve the problem.
    >
    > We invent a new encoding.
    >
    > Files in this encoding begin with a 0 byte, then an ASCII string
    > giving the name of a conventional encoding then another 0 byte.
    >
    > When you read a file with this encoding, the header is invisible to
    > your application. When you write a file, a header for a UTF8 file gets
    > written automatically.
    >
    > You write your app telling it to read and write this new encoding e.g.
    > "labeled".
    >
    > You can write a utility to import files into your labelled universe by
    > detecting or guessing or being told the encoding. It gets a header.
    > Other than that the file is unmodified.
    >
    Jan Burse, Nov 23, 2012
    #7
  8. Roedy Green

    Roedy Green Guest

    On Fri, 23 Nov 2012 16:33:40 +0100, Jan Burse <>
    wrote, quoted or indirectly quoted someone who said :

    >
    >Would this not cover your requirements?


    The problem is primarily raw text files with no indication of the
    encoding.

    The HTML encoding is incompetent. You can't read it without knowing
    the encoding. It is just a confirmation. Thankfully the encoding comes
    in the HTTP header -- a case where meta information is available.

    I feel angry about this. What asshole dreamed up the idea of
    exchanging files in various encodings without any labelling of the
    encoding? That there is no universal way of identifying the format of
    a file is astounding. Parents who thought this way would send their
    kids out into the world not knowing their names, addresses, or
    genders.

    It sounds like something one of those people who live on beer and
    pizza, with a roomful of old pizza boxes lying around would have come
    up with. I wish Martha Stewart had gone into programming.
    Roedy Green, Nov 23, 2012
    #8
  9. Jan Burse

    Jan Burse Guest

    Roedy Green wrote:
    > The HTML encoding is incompetent. You can't read it without knowing
    > the encoding. It is just a confirmation. Thankfully the encoding comes
    > in the HTTP header -- a case where meta information is available.


    For example, when you edit an HTML file locally, you don't
    have this HTTP header information. Also, where does the HTTP
    header get the charset information in the first place?

    Scenario 1:
    - HTTP returns only mimetype=text/html without
    the charset option.
    - The browser then reads the HTML doc meta tag, and
    adjusts the charset.

    Scenario 2:
    - HTTP returns mimetype=text/html; charset=<encoding>
    fetched from the HTML file meta tag.
    - The browser does not read the HTML doc meta tag, and
    follows the charset found in the mimetype.

    In both scenarios 1 + 2, the meta tag is used. I don't
    know whether there is a scenario 3, and where would
    that scenario take the encoding from?

    Bye
    Jan Burse, Nov 23, 2012
    #9
  10. On 11/23/2012 11:02 AM, Roedy Green wrote:
    > On Fri, 23 Nov 2012 16:33:40 +0100, Jan Burse <>
    > wrote, quoted or indirectly quoted someone who said :
    >
    >>
    >> Would this not cover your requirements?

    >
    > The problem is primarily raw text files with no indication of the
    > encoding.
    >
    > The HTML encoding is incompetent. You can't read it without knowing
    > the encoding. It is just a confirmation. Thankfully the encoding comes
    > in the HTTP header -- a case where meta information is available.


    Except that sometimes the HTTP header is wrong. I have seen enough
    UTF-8/ISO 8859-1 mojibake that I don't tend to place great confidence in
    metadata except at the most direct level in the protocol (e.g., though
    RFC 3977 dictates that NNTP transport is all done in UTF-8, I have
    enough experience to know that this is a fiction not borne out by
    reality; but if a message says that it has an encoding of UTF-8 in
    its header, I'll trust that the message body is actually UTF-8).

    In general, the optimal way to handle encoding in this modern day and
    age is the following extremely simple algorithm:
    1. Always write out UTF-8.
    2. When reading, if it doesn't fail to parse as UTF-8, assume it's
    UTF-8. Otherwise, assume it's the "platform default" (which generally
    means ISO 8859-1).
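    A sketch of step 2 in Java, using a strict CharsetDecoder so that
    malformed UTF-8 actually raises an error instead of being silently
    replaced (the fallback is hard-coded here to ISO 8859-1, per the
    "platform default" remark above):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public final class Utf8OrLatin1 {

    /** Decodes bytes as UTF-8 if they are valid UTF-8,
     *  otherwise falls back to ISO 8859-1. */
    public static String decodeLenient(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)     // fail, don't replace
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8. ISO 8859-1 maps every byte, so this cannot fail.
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }
}
```

    The `new String(bytes, charset)` constructor would silently substitute
    replacement characters for bad input, which is why the explicit decoder
    with REPORT is needed for the "does it parse as UTF-8" test.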

    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
    Joshua Cranmer, Nov 23, 2012
    #10
  11. On 2012-11-23 18:21, Jan Burse <> wrote:
    > Roedy Green wrote:
    >> The HTML encoding is incompetent. You can't read it without knowing
    >> the encoding.


    Not true in practice. Almost all encodings used in the real world are
    some superset of ASCII, and you only need to recognize ASCII characters
    to find the relevant meta tag.

    >> It is just a confirmation. Thankfully the encoding comes
    >> in the HTTP header -- a case where meta information is available.

    [...]
    > Scenario 2:
    > - HTTP returns mimetype=text/html; charset=<encoding>
    > fetched from the HTML file meta tag.


    Which web server does this? I think CERN httpd did, back in the 1990's,
    but I don't think any of the current crop of servers does, at least not
    without some extra plugins. Normally the charset is taken from the
    server config.

    hp


    --
    Peter J. Holzer | Sysadmin WSR | http://www.hjp.at/
    "Curse of electronic word processing: you keep filing away at your
    text until the parts of the sentence no longer fit together."
    -- Ralph Babel
    Peter J. Holzer, Nov 23, 2012
    #11
  12. Jan Burse

    Jan Burse Guest

    Peter J. Holzer wrote:
    >> Scenario 2:
    >> >- HTTP returns mimetype=text/html; charset=<encoding>
    >> > fetched from the HTML file meta tag.

    > Which web server does this? I think CERN httpd did, back in the 1990's,
    > but I don't think any of the current crop of servers does, at least not
    > without some extra plugins. Normally the charset is taken from the
    > server config.


    It's the only way to retrieve the charset:
    http://tools.ietf.org/html/rfc2045#section-5.1

    It's also the only way to set the charset in dynamic pages.
    For example in JSP one has to do the following:

    <%@page contentType="text/html; charset=UTF-8" %>

    There is a header field Content-Encoding, which
    is not what Roedy wants, I guess, since the term
    "Encoding" refers to compression there:
    http://en.wikipedia.org/wiki/HTTP_compression

    I guess Roedy wants the charset.

    Bye
    Jan Burse, Nov 23, 2012
    #12
  13. Jan Burse

    Jan Burse Guest

    Joshua Cranmer wrote:
    >
    > In general, the optimal way to handle encoding in this modern day and
    > age is the following extremely simple algorithm:
    > 1. Always write out UTF-8.
    > 2. When reading, if it doesn't fail to parse as UTF-8, assume it's
    > UTF-8. Otherwise, assume it's the "platform default" (which generally
    > means ISO 8859-1).


    This advice is only valid if you cannot influence the charset
    on the server side, for example by setting an appropriate mimetype.
    But otherwise it works perfectly fine.

    What is a little bit annoying is that I didn't find a MimeType
    decoder for the client side that easily delivers the
    charset parameter, so I had to write my own.

    In the class comment of this custom decoder I wrote:

    * <p>Needed for pre JRE 1.5 code, since later in JRE 1.6 the
    * activation framework has been bundled and one can use
    * javax.activation.MimeType</p>

    Just wrap your con.getContentType() into this class, and then
    call getParameter().
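    A rough stand-in for such a decoder: a deliberately simplified
    Content-Type parser (a full RFC 2045 implementation must also handle
    quoted strings with escapes and comments; the class name is made up):

```java
public final class MimeParams {

    /** Extracts a parameter such as "charset" from a Content-Type value
     *  like "text/html; charset=UTF-8". Returns null if absent. */
    public static String getParameter(String contentType, String name) {
        if (contentType == null) return null;
        for (String part : contentType.split(";")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase(name)) {
                // Strip optional surrounding quotes.
                return kv[1].trim().replaceAll("^\"|\"$", "");
            }
        }
        return null;
    }
}
```

    Feeding con.getContentType() into this gives the charset directly,
    e.g. getParameter(con.getContentType(), "charset").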

    Bye
    Jan Burse, Nov 24, 2012
    #13
  14. On 2012-11-23 23:53, Jan Burse <> wrote:
    > Peter J. Holzer wrote:
    >>> Scenario 2:
    >>> >- HTTP returns mimetype=text/html; charset=<encoding>
    >>> > fetched from the HTML file meta tag.

    >> Which web server does this? I think CERN httpd did, back in the 1990's,
    >> but I don't think any of the current crop of servers does, at least not
    >> without some extra plugins. Normally the charset is taken from the
    >> server config.

    >
    > It's the only way to retrieve the charset:
    > http://tools.ietf.org/html/rfc2045#section-5.1


    That section defines the meaning of the Content-Type header, it doesn't
    say anything about how that header is derived. It certainly doesn't say
    anything about a web server (RFC 2045 is about mail, not web) extracting
    the content type from an html file (the word "html" isn't even
    mentioned).


    > Its also the only way to set the chartset in dynamic pages.
    > For example in JSP one has to do the following:
    >
    ><%@page contentType="text/html; charset=UTF-8" %>


    This is something completely different from
    <meta http-equiv="content-type" content="text/html; charset=...">

    The former is a JSP directive which gets translated into some Java code
    which sets the Content-Type header of the HTTP response (probably by
    calling setContentType() of the ServletResponse object).

    The latter is just an element of the HTML response. It is typically
    interpreted by the browser (but only if no charset was specified in the
    HTTP header), not by the server.

    hp


    Peter J. Holzer, Nov 24, 2012
    #14
  15. Roedy Green

    Roedy Green Guest

    On Sat, 24 Nov 2012 00:11:36 +0100, "Peter J. Holzer"
    <> wrote, quoted or indirectly quoted someone who
    said :

    >>> The HTML encoding is incompetent. You can't read it without knowing
    >>> the encoding.

    >
    >Not true in practice. Almost all encodings used in the real world are
    >some superset of ASCII, and you only need to recognize ASCII characters
    >to find the relevant meta tag.


    You still have the 8-bit vs. 16-bit question, which you can figure out
    with the BOM in most cases. It is still Mickey Mouse. The encoding
    should be at the very front and encoded in ASCII or something fixed.
    Roedy Green, Nov 24, 2012
    #15
  16. Roedy Green

    Roedy Green Guest

    On Sat, 24 Nov 2012 00:53:51 +0100, Jan Burse <>
    wrote, quoted or indirectly quoted someone who said :

    >I guess Roedy wants the charset.


    In HTTP the meta information is in the HTTP header. This is all very
    well except that the server is just guessing. It is serving a
    standard header for all documents with a given extension. The meta
    info needs to be in the document itself. Ditto for the MIME type.

    If the document is transported compressed, e.g. SPDY
    http://mindprod.com/jgloss/spdy.html
    and fluffed back up on the other end, then that compression is not
    part of the document meta data. If it is kept around compressed,
    e.g. zip, then it is.

    When it arrives and is saved on disk, the meta info needs to be
    retained, so that an editor knows how to deal with it. The only way
    you can do that is if the meta info is embedded in the file.

    The half-assed way we do things depends on the fact that encodings
    are not all that different. You can get it wrong and still muddle
    through.
    Roedy Green, Nov 24, 2012
    #16
  17. On 2012-11-24 14:42, Roedy Green <> wrote:
    > On Sat, 24 Nov 2012 00:11:36 +0100, "Peter J. Holzer"
    ><> wrote, quoted or indirectly quoted someone who
    > said :
    >>>> The HTML encoding is incompetent. You can't read it without knowing
    >>>> the encoding.

    >>
    >>Not true in practice. Almost all encodings used in the real world are
    >>some superset of ASCII, and you only need to recognize ASCII characters
    >>to find the relevant meta tag.

    >
    > You still have the 8-bit vs. 16-bit question, which you can figure
    > out with the BOM in most cases.


    In this case the encoding is already known and the meta element must not
    be used:

    | The META declaration must only be used when the character encoding is
    | organized such that ASCII-valued bytes stand for ASCII characters (at
    | least until the META element is parsed).
    -- http://www.w3.org/TR/1999/REC-html401-19991224/charset.html

    > It is still Mickey Mouse.


    That wasn't your claim. Your claim was that it's impossible, while all
    browsers in the last 15 years or so have demonstrated that it is in
    practice possible - on billions of web sites.

    > The encoding should be at the very front and encoded in ASCII or
    > something fixed.


    It is encoded in ASCII, and it

    | should appear as early as possible in the HEAD element.
    -- http://www.w3.org/TR/1999/REC-html401-19991224/charset.html

    And of course there is always the HTTP header. In fact your whole
    proposal sounds like an extremely simplified version of the MIME header.
    Which was invented 20 years ago and is widely used.

    And frankly, you picked the least interesting aspect of MIME: You can
    just require that UTF-8 is the only permissible encoding for plain text
    files. That's much simpler and more likely to be implemented than
    requiring that all text files must start with a header declaring the
    encoding. At the same time you are missing out on other aspects of plain
    text files (e.g., newline as line end vs. paragraph end, flowed) and of
    course everything except plain text.

    hp


    Peter J. Holzer, Nov 25, 2012
    #17
  18. On 2012-11-24 14:50, Roedy Green <> wrote:
    > On Sat, 24 Nov 2012 00:53:51 +0100, Jan Burse <>
    > wrote, quoted or indirectly quoted someone who said :
    >>I guess Roedy wants the charset.

    >
    > In HTTP the meta information is in the HTTP header. This is all very
    > well except that the server is just guessing.


    No. Normally it isn't guessing at all. It just uses the configured
    charset.

    > It is serving a standard header for all documents with a given
    > extension.


    Right. It is the responsibility of the server operator to make sure that
    the extension matches the intended content-type. The server doesn't look
    into the file to derive the content-type.

    (For the "static files in a file system" case. Of course there are lots
    of other cases, most prominently CMSs, where the finished HTML document
    is assembled out of pieces stored in a database)

    > The meta info needs to be in the document itself. Ditto for MIME type.


    Then you wouldn't need a mime-type. That was invented precisely because
    not all file formats are self-identifying.

    hp


    Peter J. Holzer, Nov 25, 2012
    #18
  19. On 2012-11-24 15:51, Martin Gregorie <> wrote:
    > IBM got it pretty much right in the OS/400 operating system. The metadata,
    > which is held in the filing system catalogue, is transparently and
    permanently associated with the file. It's a general mechanism: the system
    > provides standard metadata for source files, executables etc. and the
    > developer creates the metadata for, e.g. fixed field data files with
    > keyed access. The only demerit is that it uses a rather ugly two level
    > filing system.
    >
    > The UNIX/Linux equivalent would be to keep the meta-data in the file's
    > inode alongside the access permissions


    File attributes have existed on ext* filesystems for a very long time.

    > and to modify the file copy and move operations


    There is no file copy operation on the OS level. The kernel just sees
    that a process is creating and writing a new file. It doesn't know
    whether this process intends this new file to be an identical copy of
    some other file.

    rename(2) of course preserves file attributes, because it doesn't change
    the file at all (except the ctime entry), only the directories linking
    to it.

    cp, rsync, tar, etc. have options to copy the attributes along with
    the "normal" content. But the problem is that there are a lot of
    utilities working on files and they would all have to be modified.
    And worse, there isn't any standard for using those attributes, so
    nobody uses them, so there is little incentive to modify the tools.
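    For what it's worth, Java can already get at those per-file attributes
    where the filesystem supports them, via NIO's
    UserDefinedFileAttributeView. A sketch (the "encoding" attribute name
    is made up; the calls throw on filesystems without user-xattr support):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.UserDefinedFileAttributeView;

/** Stores and retrieves an "encoding" extended attribute on a file. */
public final class EncodingAttr {

    private static UserDefinedFileAttributeView view(Path file) throws IOException {
        UserDefinedFileAttributeView v =
            Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);
        if (v == null) {
            throw new IOException("filesystem does not support user attributes");
        }
        return v;
    }

    public static void setEncoding(Path file, String charset) throws IOException {
        view(file).write("encoding",
            ByteBuffer.wrap(charset.getBytes(StandardCharsets.US_ASCII)));
    }

    public static String getEncoding(Path file) throws IOException {
        UserDefinedFileAttributeView v = view(file);
        ByteBuffer buf = ByteBuffer.allocate(v.size("encoding"));
        v.read("encoding", buf);
        buf.flip();
        return StandardCharsets.US_ASCII.decode(buf).toString();
    }
}
```

    This illustrates the point above, though: the attribute survives
    rename(2) but is silently dropped by any tool that doesn't copy
    attributes along.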

    hp


    Peter J. Holzer, Nov 25, 2012
    #19
  20. Sven Köhler

    Sven Köhler Guest

    On 23.11.2012 02:25, Arne Vajhøj wrote:
    > It is a bad idea to have meta data in the file body. This meta data
    > should be where the rest of the meta data is.


    Now which OS actually supports this idea?

    Are you saying that XML is bad, because it contains metadata (i.e. the
    encoding/charset) inside the file body?
    Sven Köhler, Nov 25, 2012
    #20
