check html file size

Discussion in 'Perl Misc' started by Xah Lee, Oct 5, 2005.

  1. Xah Lee

    Xah Lee Guest

    would anyone like to translate the following perl script to Python or
    Scheme (scsh)?

    the script takes an inpath, and reports all html files in it above a
    certain size. (counting inline images)
    it also prints a sorted report of html files and their sizes.

    (a copy of the script is here:
    http://xahlee.org/_scripts/check_file_size.pl
    )

    Xah

    ∑ http://xahlee.org/


    # perl

    # Tue Oct 4 14:36:48 PDT 2005
    # given a dir, report all html file's size. (counting inline images)
    # XahLee.org

    use Data::Dumper;
    use File::Find;
    use File::Basename;

    $inpath = '/Users/t/web/mydirectory/';
    $sizeLimit = 800 * 1000;

    # $inpath = $ARGV[0]; # should give a full path; else the $File::Find::dir won't give full path.
    while ($inpath =~ m@^(.+)/$@) { $inpath = $1;} # get rid of trailing slash

    die "dir $inpath doesn't exist! $!" unless -e $inpath;


    ##################################################
    # subroutines


    # getInlineImg($file_full_path) returns a array that is a list of
    # inline images. For example, it may return ('xx.jpg','../image.png')
    sub getInlineImg ($) { $full_file_name= $_[0];
        @linx =(); open (FF, "<$full_file_name") or die "error: can not open $full_file_name $!";
        while (<FF>) { @txt_segs = split(m/img/, $_); shift @txt_segs;
            for $lin (@txt_segs) { if ($lin =~ m@ src\s*=\s*\"([^\"]+)\"@i) { push @linx, $1; }}
        } close FF;
        return @linx;
    }

    # linkFullPath($dir,$locallink) returns a string that is the full path
    # to the local link. $dir must end with a slash (as the dir returned by
    # fileparse does), because the two strings are simply concatenated.
    # For example,
    # linkFullPath('/Users/t/public_html/a/b/', '../image/t.png') returns
    # '/Users/t/public_html/a/image/t.png'. The returned result will not
    # contain double slash or '../' string.
    sub linkFullPath($$){ $result=$_[0] . $_[1]; while ($result =~ s@\/\/@\/@) {}; while ($result =~ s@/[^\/]+\/\.\.@@) {}; return $result;}


    # listLocalLinks($html_file_full_path) returns a array where each
    # element is a full path of local links in the html.
    sub listLocalLinks($) {
    my $htmlfile= $_[0];

    my ($name, $dir, $suffix) = fileparse($htmlfile, ('\.html') );
    my @aa = getlinks($htmlfile);
    @aa = grep(!m/\#/, @aa);
    @aa = grep (!m/^mailto:/, @aa);
    @aa = grep (!m/^http:/, @aa);

    my @linkedFiles=();
    foreach my $lix (@aa) { push @linkedFiles, linkFullPath($dir,$lix);}
    return @linkedFiles;
    }


    # listInlineImg($html_file_full_path) returns a array where each
    # element is a full path to inline images in the html.
    sub listInlineImg($) {
    my $htmlfile= $_[0];

    my ($name, $dir, $suffix) = fileparse($htmlfile, ('\.html') );
    my @aa = getInlineImg($htmlfile);

    my @result=();
    foreach my $ele (@aa) { push @result, linkFullPath($dir,$ele);}
    return @result;
    }

    ##################################################
    sub checkLink {
        if (
            -T $File::Find::name
            && $File::Find::name =~ m@\.html$@
        ) {
            $total= -s $File::Find::name;
            @h2 = listInlineImg($File::Find::name);
            for my $ln (@h2) {$total += -s $ln;};
            if ( $total > $sizeLimit) {print "problem: file: $File::Find::name, size: $total\n";}

            push (@result, [$total, $File::Find::name]);
        };
    }

    find(\&checkLink, $inpath);

    @result = sort { $b->[0] <=> $a->[0]} @result;

    print Dumper(\@result);
    print "done reporting. (any file above size are printed above.)";

    __END__
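For anyone attempting the translation, the path clean-up done by linkFullPath may be the easiest part to carry over. A minimal sketch in Python 3, assuming the standard-library `posixpath` module is acceptable in place of the regex loops; the name `link_full_path` is made up for this illustration and is not part of the posted script:

```python
import posixpath


def link_full_path(directory, local_link):
    """Join a directory and a relative link, then collapse '//' and '..' parts.

    Hypothetical helper for illustration; not part of the original script.
    """
    # posixpath.join inserts the separating slash if it is missing,
    # and posixpath.normpath removes doubled slashes and 'x/..' pairs.
    return posixpath.normpath(posixpath.join(directory, local_link))


print(link_full_path('/Users/t/public_html/a/b', '../image/t.png'))
# prints /Users/t/public_html/a/image/t.png
```

One behavioural difference to keep in mind: if `local_link` starts with a slash, `posixpath.join` discards `directory` entirely, whereas the Perl sub concatenates the two strings regardless.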
     
    Xah Lee, Oct 5, 2005
    #1

  2. Matt Garrish

    Matt Garrish Guest

    "Xah Lee" <> wrote in message
    news:...
    > would anyone like to translate the following perl script to Python or
    > Scheme (scsh)?


    Even if you weren't an incredibly offensive and petulant poster, what makes
    you think anyone would write a script from you?

    Matt
     
    Matt Garrish, Oct 5, 2005
    #2

  3. Matt Garrish wrote:

    > Even if you weren't an incredibly offensive and petulant poster, what makes
    > you think anyone would write a script from you?


    Because in addition to being offensive and petulant, he's also an idiot.

    --
    Erik Max Francis && && http://www.alcyone.com/max/
    San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
    There is no fate that cannot be surmounted by scorn.
    -- Albert Camus
     
    Erik Max Francis, Oct 5, 2005
    #3
  4. Xah Lee <> wrote:

    > would anyone like to translate the following perl script to Python or
    > Scheme (scsh)?



    Yes, I would.


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Oct 5, 2005
    #4
  5. On Tue, 04 Oct 2005 17:44:02 -0700, Xah Lee wrote:

    > would anyone like to translate the following perl script to Python or
    > Scheme (scsh)?


    Are you fucking seriously fucking expecting some fucking moron to
    translate your tech geeking fucking code moronicity? Fucking try writing
    it fucking properly in fucking Perl first.

    --
    I guess everybody's the same: Gotta be good at your job before you can enjoy the rest of your life
    -- Cole Trickle
     
    Richard Gration, Oct 5, 2005
    #5
  6. Richard Gration wrote:
    > Are you fucking seriously fucking expecting some fucking moron to
    > translate your tech geeking fucking code moronicity? Fucking try writing
    > it fucking properly in fucking Perl first.


    Fucking excuse me?

    Fucking maybe you should fucking go fucking **** your fucking self...

    Seriously, Xah might be a troll, but this is just pathetic.

    --
    We're glad that graduates already know Java,
    so we only have to teach them how to program.
    somewhere in a German company
    (credit to M. Felleisen and M. Sperber)
     
    Ulrich Hobelmann, Oct 5, 2005
    #6
  7. Richard Gration <> writes:

    > Are you fucking seriously fucking expecting some fucking moron to
    > translate your tech geeking fucking code moronicity? Fucking try writing
    > it fucking properly in fucking Perl first.


    Good fucking job! That's the funniest fucking response I've ever fucking seen
    to Xah's fucking moronistic fucking nonsense.

    Lenny Bruce would be so fucking proud.

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
     
    Sherm Pendley, Oct 6, 2005
    #7
  8. Ulrich Hobelmann <> writes:

    > Richard Gration wrote:
    >> Are you fucking seriously fucking expecting some fucking moron to
    >> translate your tech geeking fucking code moronicity? Fucking try writing
    >> it fucking properly in fucking Perl first.

    >
    > Fucking excuse me?
    >
    > Fucking maybe you should fucking go fucking **** your fucking self...
    >
    > Seriously, Xah might be a troll, but this is just pathetic.


    I'm guessing you didn't get the joke then. I think Richard's response was a
    parody of Xah's "style" - a funny parody, at that.

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
     
    Sherm Pendley, Oct 6, 2005
    #8
  9. Richard Gration wrote:

    > ... fucking ... fucking ... fucking ... fucking ... Fucking ... fucking
    > ... fucking


    My friend, you can learn to use a far richer vocabulary of
    obscenities. If your creative flow is blocked by the fear
    that you can't spell more dirty words correctly, you can
    dispel this fear with a few evenings of study and preparation.

    Amaze your friends! Amuse your enemies! Enrich your
    vocabulary! You can learn the joys of cussing seven
    times in the same sentence without resorting to repetition!
    For extra points, and with suitable study, you can even
    learn to write entire paragraphs of _original_ obscenity!

    Just imagine how much clearer your point would have been if
    you'd called him a jizz-licking dogcock grabber! Why insult
    his code with a vague word like "moronicity" when you could
    use "steaming pile of entrails" or better yet, "bucket of
    fermented ballsweat?" wouldn't that have made your technical
    point much clearer?

    Now go, and don't attempt obscenity in public again until
    you learn how.

    Bear
     
    Ray Dillinger, Oct 6, 2005
    #9
  10. Sherm Pendley wrote:
    > I'm guessing you didn't get the joke then. I think Richard's response was a
    > parody of Xah's "style" - a funny parody, at that.


    If you take all the line noise in Perl as swearing ;)
    I suppose I'm lucky I can't read it.

    --
    We're glad that graduates already know Java,
    so we only have to teach them how to program.
    somewhere in a German company
    (credit to M. Felleisen and M. Sperber)
     
    Ulrich Hobelmann, Oct 6, 2005
    #10
  11. On Wed, 05 Oct 2005 20:39:18 -0400, Sherm Pendley wrote:

    > Richard Gration <> writes:
    >
    >> Are you fucking seriously fucking expecting some fucking moron to
    >> translate your tech geeking fucking code moronicity? Fucking try writing
    >> it fucking properly in fucking Perl first.

    >
    > Good fucking job! That's the funniest fucking response I've ever fucking seen
    > to Xah's fucking moronistic fucking nonsense.


    Thanks, Sherm. I knew someone would get it. I think Bear and Ulrich
    haven't yet been exposed to Xah "in full effect" ;-) They're probably
    denizens of the Scheme group which seems to be a new entry on Xah's "this
    newsgroup needs spamming" list ;-)

    > Lenny Bruce would be so fucking proud.


    LOL
     
    Richard Gration, Oct 6, 2005
    #11
  12. Xah Lee

    Xah Lee Guest

    Xah Lee wrote: « would anyone like to translate the following perl
    script to Python or Scheme (scsh)?»

    Here's the Python version.

    # -*- coding: utf-8 -*-
    # Python


    # Wed Oct 5 15:50:31 PDT 2005
    # given a dir, report all html file's size. (counting inline images)
    # XahLee.org

    import re, os.path, sys

    inpath= '/Users/t/web/'

    while inpath[-1] == '/': inpath = inpath[0:-1] # get rid of trailing slash

    if (not os.path.exists(inpath)):
        print "dir " + inpath + " doesn't exist!"
        sys.exit(1)

    ##################################################
    # subroutines


    def getInlineImg(file_full_path):
        '''getInlineImg($file_full_path) returns a array that is a list of
        inline images. For example, it may return ['xx.jpg','../image.png']'''

        FF = open(file_full_path,'rb')
        txt_segs = re.split( r'src', unicode(FF.read(),'utf-8'))
        txt_segs.pop(0)
        FF.close()
        linx=[]
        for linkBlock in txt_segs:
            matchResult = re.search(r'\s*=\s*\"([^\"]+)\"', linkBlock)
            if matchResult: linx.append( matchResult.group(1) )
        return linx


    def linkFullPath(dir,locallink):
        '''linkFullPath(dir, locallink) returns a string that is the full
        path to the local link. For example,
        linkFullPath('/Users/t/public_html/a/b', '../image/t.png') returns
        '/Users/t/public_html/a/image/t.png'. The returned result will not
        contain double slash or '../' string.'''
        result = dir + '/' + locallink
        result = re.sub(r'//+', r'/', result)
        while re.search(r'/[^\/]+\/\.\.', result):
            result = re.sub(r'/[^\/]+\/\.\.', '', result)
        return result

    def listInlineImg(htmlfile):
        '''listInlineImg($html_file_full_path) returns a array where each
        element is a full path to inline images in the html.'''
        dir=os.path.dirname(htmlfile)
        imgPaths = getInlineImg(htmlfile)
        result = []
        for aPath in imgPaths:
            result.append(linkFullPath( dir, aPath))
        return result


    ##################################################
    # main

    fileSizeList=[]
    def checkLink(dummy, dirPath, fileList):
        for fileName in fileList:
            if '.html' == os.path.splitext(fileName)[1] and os.path.isfile(dirPath+'/'+fileName):
                totalSize = os.path.getsize(dirPath+'/'+fileName)
                imagePathList = listInlineImg(dirPath+'/'+fileName)
                for imgPath in imagePathList: totalSize += os.path.getsize(imgPath)
                fileSizeList.append([totalSize, dirPath+'/'+fileName])


    os.path.walk(inpath, checkLink, 'dummy')

    fileSizeList.sort(key=lambda x:x[0],reverse=True)

    for it in fileSizeList: print it
    print "done reporting."
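Note that getInlineImg in this translation splits the text on 'src', where the Perl version splits on 'img', so a `<script src="...">` would also be picked up. A rough sketch of one way to collect only image sources, assuming Python 3 and its standard `html.parser` module; `ImgSrcCollector` and `get_inline_img` are names invented for this illustration, not part of either posted script:

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased;
        # attrs is a list of (name, value) pairs.
        if tag == 'img':
            for name, value in attrs:
                if name == 'src' and value:
                    self.sources.append(value)


def get_inline_img(html_text):
    """Return the list of img src values found in html_text."""
    collector = ImgSrcCollector()
    collector.feed(html_text)
    collector.close()
    return collector.sources


sample = ('<p><IMG SRC="xx.jpg"> <script src="a.js"></script>'
          '<img alt="" src=\'../image.png\'/></p>')
print(get_inline_img(sample))
# prints ['xx.jpg', '../image.png']
```

Unlike the regex, this also accepts single-quoted attribute values and upper-case tag names.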



    -------------------------------------------------
    This Python version is a direct translation of the Perl version. They
    match pretty much line by line.

    for both the Python version and the Perl version, see:
    http://xahlee.org/perl-python/check_html_size.html
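For readers on Python 3, where `os.path.walk` and the `print` statement no longer exist, the same report could be sketched roughly as follows. This is an illustration under stated assumptions, not the posted script: `page_size`, `report` and `IMG_SRC` are invented names, the regex only handles double-quoted `src` values (as the original does), and images that are remote or missing on disk are skipped instead of raising an error.

```python
import os
import re

SIZE_LIMIT = 800 * 1000  # bytes; same threshold as $sizeLimit in the Perl script
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc\s*=\s*"([^"]+)"', re.IGNORECASE)


def page_size(html_path):
    """Size of one html file plus the local images it references inline."""
    with open(html_path, encoding='utf-8', errors='replace') as f:
        text = f.read()
    total = os.path.getsize(html_path)
    base = os.path.dirname(html_path)
    for src in IMG_SRC.findall(text):
        img_path = os.path.normpath(os.path.join(base, src))
        # Assumption: remote or missing images are skipped rather than fatal.
        if os.path.isfile(img_path):
            total += os.path.getsize(img_path)
    return total


def report(inpath, size_limit=SIZE_LIMIT):
    """Return [(size, path), ...] for every .html file, largest first."""
    sizes = []
    for dir_path, _dir_names, file_names in os.walk(inpath):
        for name in file_names:
            if name.endswith('.html'):
                path = os.path.join(dir_path, name)
                sizes.append((page_size(path), path))
    sizes.sort(reverse=True)
    # Flag every page whose total exceeds the limit, as the Perl version does.
    for size, path in sizes:
        if size > size_limit:
            print("problem: file: %s, size: %d" % (path, size))
    return sizes
```

Calling `report('/Users/t/web')` would print the oversized pages and return the sorted list, which can then be printed one entry per line as in the version above.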

    Would any lisper provide a Scheme version? i don't think i'll do a
    Scheme version anytime soon. Please, Schemers, show us some fanfare.

    Xah

    ∑ http://xahlee.org/
     
    Xah Lee, Oct 7, 2005
    #12
