Re: Saving (unusual) linux filenames

Discussion in 'Python' started by MRAB, Aug 31, 2010.

  1. MRAB

    MRAB Guest

    On 31/08/2010 15:49, wrote:
    > Hi,
    >
    > I have a script that reads and writes Linux paths in a file. I save the
    > path (as unicode) with 2 other variables. I save them separated by ","
    > and the "packets" by newlines. So my file looks like this:
    > path1, var1A, var1B
    > path2, var2A, var2B
    > path3, var3A, var3B
    > ....
    >
    > This works for "normal" paths, but as soon as I have a path that
    > includes a "," it breaks. The problem now is that (afaik) Linux allows
    > every char (aside from "/" and null) to be used in filenames. The only
    > solution I can think of is using null as a separator, but there has to
    > be a cleaner version?
    >

    You could use a tab character '\t' instead.
     
    MRAB, Aug 31, 2010
    #1

  2. On 2010-08-31, MRAB <> wrote:
    > You could use a tab character '\t' instead.


    That just breaks with a different set of filenames.

    --
    Grant Edwards               grant.b.edwards at gmail.com
    Yow! Everybody out of the GENETIC POOL!
     
    Grant Edwards, Aug 31, 2010
    #2

  3. MRAB

    MRAB Guest

    On 31/08/2010 17:58, Grant Edwards wrote:
    > On 2010-08-31, MRAB<> wrote:
    >> You could use a tab character '\t' instead.

    >
    > That just breaks with a different set of filenames.
    >

    How many filenames contain control characters? Surely that's a bad idea.
     
    MRAB, Aug 31, 2010
    #3
  4. Nobody

    Nobody Guest

    On Tue, 31 Aug 2010 18:13:44 +0100, MRAB wrote:

    >>>> This works for "normal" paths, but as soon as I have a path that
    >>>> includes a "," it breaks. The problem now is that (afaik) Linux allows
    >>>> every char (aside from "/" and null) to be used in filenames. The only
    >>>> solution I can think of is using null as a separator, but there has to
    >>>> be a cleaner version?
    >>>
    >>> You could use a tab character '\t' instead.

    >>
    >> That just breaks with a different set of filenames.
    >>

    > How many filenames contain control characters? Surely that's a bad idea.


    It may be a bad idea, but it's permitted by the OS. If you're writing a
    general-purpose tool, having it flake out whenever it encounters an
    "unusual" filename is also a bad idea.

    FWIW, my usual solution is URL-encoding (i.e. replacing any "awkward"
    character by a "%" followed by two hex digits representing the byte's
    value). It has the advantage that you can extend the set of bytes which
    need encoding as needed without having to change the code (e.g. you can
    provide a command-line argument or configuration file setting which
    specifies which bytes need to be encoded).
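
    A minimal Python 3 sketch of this scheme, under some assumptions: the
    names encode_name, decode_name and the unsafe parameter are
    illustrative only, and control characters and non-ASCII bytes are
    always escaped so the result stays plain printable ASCII. The set of
    extra "awkward" bytes is passed in rather than hard-coded, so it could
    come from a command-line argument or a configuration file.

```python
import os
from urllib.parse import unquote_to_bytes


def encode_name(path, unsafe=b",\n"):
    """Replace each "awkward" byte with "%" plus two hex digits."""
    # "%" itself must always be escaped, or decoding would be ambiguous.
    awkward = set(unsafe) | {ord("%")}
    out = []
    for byte in os.fsencode(path):
        if byte in awkward or byte < 0x20 or byte > 0x7E:
            out.append("%%%02X" % byte)
        else:
            out.append(chr(byte))
    return "".join(out)


def decode_name(text):
    """Reverse encode_name: turn every %XX back into the original byte."""
    return os.fsdecode(unquote_to_bytes(text))


encoded = encode_name("report, final\t100%.txt")
print(encoded)  # report%2C final%09100%25.txt
```

    Decoding does not need to know which bytes were chosen for escaping,
    so the set can grow later without invalidating files already written.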
     
    Nobody, Aug 31, 2010
    #4
  5. On 2010-08-31, MRAB <> wrote:
    > On 31/08/2010 17:58, Grant Edwards wrote:
    >> On 2010-08-31, MRAB<> wrote:
    >>> You could use a tab character '\t' instead.

    >>
    >> That just breaks with a different set of filenames.
    >>

    > How many filenames contain control characters?


    How many filenames contain ","? Not many, but the OP wants his
    program to be bulletproof. Can't fault him for that.

    If I had a nickel for every Unix program or shell-script that failed
    when a filename had a space in it....

    > Surely that's a bad idea.


    Of course it's a bad idea. That doesn't stop people from doing it.

    --
    Grant Edwards               grant.b.edwards at gmail.com
    Yow! Now I understand advanced MICROBIOLOGY and th' new TAX REFORM laws!!
     
    Grant Edwards, Aug 31, 2010
    #5
  6. MRAB

    MRAB Guest

    On 31/08/2010 19:33, Nobody wrote:
    > On Tue, 31 Aug 2010 18:13:44 +0100, MRAB wrote:
    >
    >>>>> This works for "normal" paths, but as soon as I have a path that
    >>>>> includes a "," it breaks. The problem now is that (afaik) Linux allows
    >>>>> every char (aside from "/" and null) to be used in filenames. The only
    >>>>> solution I can think of is using null as a separator, but there has to
    >>>>> be a cleaner version?
    >>>>
    >>>> You could use a tab character '\t' instead.
    >>>
    >>> That just breaks with a different set of filenames.
    >>>

    >> How many filenames contain control characters? Surely that's a bad idea.

    >
    > It may be a bad idea, but it's permitted by the OS.

    [snip]
    So are viruses. :)
     
    MRAB, Aug 31, 2010
    #6
  7. Hi Grant,

    On 2010-08-31 20:49, Grant Edwards wrote:
    > How many filenames contain ","?


    CVS repository files end with ",v". However, let's just agree
    that nobody uses CVS anymore. :)

    > Not many, but the OP wants his
    > program to be bulletproof. Can't fault him for that.


    What about using the csv (not CVS) module?
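
    A rough sketch of how that could look in Python 3. The helper names
    dump_records and load_records and the sample paths are made up for
    illustration, and an in-memory buffer stands in for the real file.
    The csv writer quotes any field that contains a comma, a double quote
    or a line break, so such paths survive a round trip.

```python
import csv
import io


def dump_records(records, fileobj):
    """Write (path, var_a, var_b) tuples as CSV rows."""
    csv.writer(fileobj).writerows(records)


def load_records(fileobj):
    """Read the rows back as a list of tuples of strings."""
    return [tuple(row) for row in csv.reader(fileobj)]


records = [
    ("/tmp/a,b.txt", "var1A", "var1B"),
    ('/tmp/say "hi".txt', "var2A", "var2B"),
    ("/tmp/line\nbreak.txt", "var3A", "var3B"),
]

# newline="" stops the file layer from translating line endings,
# which matters for fields that contain embedded line breaks.
buf = io.StringIO(newline="")
dump_records(records, buf)
buf.seek(0)
restored = load_records(buf)
```

    With a real file, open it with newline="" as well. Filenames that are
    not valid in the file's encoding would additionally need something
    like errors="surrogateescape" on open(); that part is an assumption
    and is not exercised here.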

    Stefan
     
    Stefan Schwarzer, Aug 31, 2010
    #7
  8. Alan Meyer

    Alan Meyer Guest

    On 8/31/2010 2:33 PM, Nobody wrote:

    ....
    > FWIW, my usual solution is URL-encoding (i.e. replacing any "awkward"
    > character by a "%" followed by two hex digits representing the byte's
    > value). It has the advantage that you can extend the set of bytes which
    > need encoding as needed without having to change the code (e.g. you can
    > provide a command-line argument or configuration file setting which
    > specifies which bytes need to be encoded).


    I like that one.

    A similar solution is to use an escape character such as a backslash, e.g.,
    "This is a backslash\\ and this is a comma\,."

    However, because the comma won't appear at all in the URL-encoded
    version, it has the virtue of still allowing you to split on commas.

    You must of course also URL-encode the '%' as %25, e.g.,
    "Here is a comma (%2C) and this (%25) is a percent sign."
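
    To illustrate the split-on-commas point, here is a small Python 3
    sketch using urllib.parse.quote and unquote. The helper names pack and
    unpack and the sample path are hypothetical. Because quote() escapes
    both "," and "%" (and line breaks), a plain split(",") on each line is
    safe.

```python
from urllib.parse import quote, unquote


def pack(fields):
    """Join URL-encoded fields with commas; "/" is left readable."""
    return ",".join(quote(field, safe="/") for field in fields)


def unpack(line):
    """Split one record on commas and decode each field."""
    return [unquote(field) for field in line.rstrip("\n").split(",")]


line = pack(["/home/me/a,b 100%.txt", "var1A", "var1B"])
print(line)  # /home/me/a%2Cb%20100%25.txt,var1A,var1B
```

    Note that quote() on a str encodes it as UTF-8 first, so this sketch
    assumes the paths are valid Unicode strings.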

    Alan
     
    Alan Meyer, Sep 1, 2010
    #8
  9. Nobody

    Nobody Guest

    On Tue, 31 Aug 2010 18:49:33 +0000, Grant Edwards wrote:

    >> How many filenames contain control characters?

    >
    > How many filenames contain ","? Not many,


    Unless you only ever deal with "Unix folk", it's not /that/ uncommon to
    encounter filenames which are essentially complete sentences, punctuation
    included.

    FWIW, I've found that a significant proportion of "why can't I burn this
    file to a CD" queries are because the Joliet extension to ISO-9660 "only"
    allows 64 characters in a filename.
     
    Nobody, Sep 1, 2010
    #9
  10. In article <i5jirs$4ae$>,
    Grant Edwards <> wrote:
    >On 2010-08-31, MRAB <> wrote:
    >> On 31/08/2010 17:58, Grant Edwards wrote:
    >>> On 2010-08-31, MRAB<> wrote:
    >>>> You could use a tab character '\t' instead.
    >>>
    >>> That just breaks with a different set of filenames.
    >>>

    >> How many filenames contain control characters?

    >
    >How many filenames contain ","? Not many, but the OP wants his
    >program to be bulletproof. Can't fault him for that.


    As appending ",v" is the convention for RCS / CVS archives, I would
    say: a lot. Enough to guarantee that all my backup tars contain at
    least a few.

    >
    >If I had a nickel for every Unix program or shell-script that failed
    >when a filename had a space in it....


    I'd rather have it fail for spaces than for commas.

    >
    >> Surely that's a bad idea.

    >
    >Of course it's a bad idea. That doesn't stop people from doing it.
    >
    >--
    >Grant Edwards               grant.b.edwards at gmail.com
    >Yow! Now I understand advanced MICROBIOLOGY and th' new TAX REFORM laws!!



    --
    Albert van der Horst, UTRECHT,THE NETHERLANDS
    Economic growth -- being exponential -- ultimately falters.
    albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
     
    Albert van der Horst, Sep 1, 2010
    #10