Web Page Parsing/Downloading



TheRandomPast

Hi. I'm self-taught in Python; I used http://www.codecademy.com/ to learn, which was a great help I must say, but now I'm attempting it all on my own and need a little help.

I have three scripts and this is what I'm trying to do with them:


Download from webpage
Parse Links from Page
Output summary of total links
Format a list of matched links
Parse and Print Email addresses
Crack Hashed Passwords
Exception Handling
Parse and Print links to image files/.doc files
Save file into specified folder and alert when files don't save

Can anyone help? I've become a little stuck. None of the scripts are running for me and I can't see where I'm having issues.


WebPage script:
import sys, urllib

def getWebpage(url):
    print '[*] getWebpage()'
    url_file = urllib.urlopen(url)
    page = url_file.read()
    return page

def main():
    sys.argv.append('http://www.funeralformyfat.tumblr.com')
    if len(sys.argv) != 2:
        print '[-] Usage: webpage_get URL'
        return

    print getWebpage(sys.argv[1])

if __name__ == '__main__':
    main()

getLinks

def print_links(page):
    print '[*] print_links()'
    links = re.findall(r'\<a.*href\=.*http\:.+', page)
    links.sort()
    print '[+]', str(len(links)), 'HyperLinks Found:'

    for link in links:
        print link

def main():
    sys.argv.append('http://www.funeralformyfat.tumblr.com')
    if len(sys.argv) != 2:
        print '[-] Usage: webpage_getlinks URL'
        return
    page = webpage_get.wget(sys.argv[1])
    print_links(page)

from os.path import join

directory = join('/home/', y, '/newdir/')
file_name = url.split('/')[-1]
file_name = join(directory, file_name)

if __name__ == '__main__':
    main()

getParser

import md5

oldpasswd_byuser=str("tom")
oldpasswd_db="sha1$c60da$1835a9c3ccb1cc436ccaa577679b5d0321234c6f"
opw= md5.new(oldpasswd_byuser)
#opw= md5.new(oldpasswd_byuser).hexdigest()
if(opw == oldpasswd_db):
    print "same password"
else:
    print "Invalid password"

from email.parser import Parser


#headers = Parser().parse(open(messagefile, 'r'))


headers = Parser().parsestr('From: <[email protected]>\n'
        'To: <[email protected]>\n'
        'Subject: Test message\n'
        '\n'
        'Body would go here\n')
print 'To: %s' % headers['to']
print 'From: %s' % headers['from']
print 'Subject: %s' % headers['subject']



Thanks for any help!
 


Chris Angelico

Can anyone help because I've become a little stuck? None of the scripts are running for me and I can't see where I'm having issues

I'm rather lost in what you're trying to accomplish here. The first
thing to do would be to separate out your three scripts and just look
at one at a time; then cut each one down to just what it really needs
to be doing. Once you've done that, you'll have a simple example - see
http://sscce.org/ for tips on that - and you can figure out what it's
doing wrong. If you can't figure it out on your own, the short example
will be far more suitable for posting here, along with its error
backtrace (if it's throwing one), than a more verbose program listing.
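
As an illustration of what a cut-down example might look like, here is a minimal sketch of just the link-listing step, run against a fixed string instead of a live download. It is written for Python 3 and uses the standard library's html.parser rather than the regular expression from the original post; the names LinkCollector, extract_links and SAMPLE_PAGE, and the example.com URLs, are made up for this sketch.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(page):
    """Return a sorted list of the href values found in an HTML string."""
    collector = LinkCollector()
    collector.feed(page)
    return sorted(collector.links)


# Fixed sample input (hypothetical) so the example needs no network access.
SAMPLE_PAGE = """
<html><body>
<a href="http://example.com/b.html">B</a> <a href="http://example.com/a.html">A</a>
<a name="anchor-without-href">no link here</a>
</body></html>
"""

if __name__ == "__main__":
    links = extract_links(SAMPLE_PAGE)
    print("[+]", len(links), "HyperLinks Found:")
    for link in links:
        print(link)
```

Because the input is a literal string, anything that goes wrong has to be in the parsing step, which makes the problem much easier to isolate and to post.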

Two general points of advice. Firstly, if you're just starting out, I
strongly recommend you use Python 3 instead of Python 2. All sorts of
things have been improved, and it's far easier to learn on the new
version than to learn on the old and then have to change your habits
later.
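
One concrete difference: the md5 module used in the password check does not exist in Python 3, where hashlib takes its place, and print is a function. The sketch below shows a hashlib-based check in Python 3. It assumes the stored value is laid out as algorithm$salt$hexdigest and that the digest is taken over the salt followed by the password; the original post does not say how its value was produced, so that layout is only a guess, and make_stored and check_password are hypothetical helper names.

```python
import hashlib
import hmac


def make_stored(password, salt, algorithm="sha1"):
    """Build an 'algorithm$salt$hexdigest' string (assumed layout)."""
    digest = hashlib.new(algorithm, (salt + password).encode("utf-8")).hexdigest()
    return "$".join([algorithm, salt, digest])


def check_password(candidate, stored):
    """Return True if candidate hashes to the digest held in stored."""
    algorithm, salt, expected = stored.split("$")
    actual = hashlib.new(algorithm, (salt + candidate).encode("utf-8")).hexdigest()
    # Compare hex strings, not a hash object against a string.
    return hmac.compare_digest(actual, expected)


if __name__ == "__main__":
    stored = make_stored("tom", "c60da")
    print("same password" if check_password("tom", stored) else "Invalid password")
```

Two things worth noticing relative to the posted code: the comparison is between two hex strings rather than between a hash object and a string, and the algorithm is read from the stored value instead of being fixed to MD5.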

And secondly, please read this and take note:
https://wiki.python.org/moin/GoogleGroupsPython - otherwise, you'll
find that a lot of people don't want to see your post. Best would be
to avoid Google Groups altogether, as it's very approximately the
worst newsgroup client I've ever seen posts from.

ChrisA
 
