How to grab a number from inside a .html file using regex

Νίκος

Hello guys! Need your precious help again!

Every HTML file I have carries a page_id on its very first line, for
counting purposes, in the form of a comment like this:

<!-- 1 -->
<!-- 2 -->
<!-- 3 -->

and so on. Every HTML file has its own page_id.

How can I grab that string representation of a number from inside
the .html file using regex and convert it to an integer value?

# ==============================
# open current html template and get the page ID number
# ==============================

f = open( '/home/webville/public_html/' + page )

#read first line of the file
firstline = f.readline()

page_id = re.match( '<!-- \d -->', firstline )
print ( page_id )
 
Νίκος

I also don't know what's wrong with this line:

host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0]

hostmatch = re.search('cyta', host)

if cookie.has_key('visitor') != 'nikos' or hostmatch is None:
    # do stuff

The 'stuff' never gets executed, while I want it to be as long as I
don't have a regex match!
 
MRAB

Νίκος said:
How can I grab that string representation of a number from inside
the .html file using regex and convert it to an integer value?

# ==============================
# open current html template and get the page ID number
# ==============================

f = open( '/home/webville/public_html/' + page )

#read first line of the file
firstline = f.readline()

page_id = re.match( '<!-- \d -->', firstline )
print ( page_id )

Use group capture:

page_id = re.match(r'<!-- (\d+) -->', firstline).group(1)
print(page_id)
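
For what it's worth, group(1) always hands back a string, so turning it into an integer is a separate int() call. Here is a minimal sketch, assuming the first line looks like the samples above. The names PAGE_ID_RE and parse_page_id are made up for illustration, and the None check covers a file whose first line has no ID comment:

```python
import re

# Hypothetical helper names, not from the original posts.
# The parentheses capture only the digits inside the comment.
PAGE_ID_RE = re.compile(r'<!-- (\d+) -->')

def parse_page_id(firstline):
    """Return the page ID as an int, or None if the line has no ID comment."""
    match = PAGE_ID_RE.match(firstline)
    if match is None:
        # re.match returns None when the pattern does not match at the start.
        return None
    # group(1) is a str such as '7'; int() does the actual conversion.
    return int(match.group(1))

print(parse_page_id('<!-- 7 -->\n'))   # 7
print(parse_page_id('<html>\n'))       # None
```

Calling .group(1) directly on the result of re.match, as in the one-liner, raises AttributeError when the line does not match, which is why the sketch checks for None first.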
 
MRAB

Νίκος said:
I also don't know what's wrong with this line:

host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0]

hostmatch = re.search('cyta', host)

if cookie.has_key('visitor') != 'nikos' or hostmatch is None:
    # do stuff

The 'stuff' never gets executed, while I want it to be as long as I
don't have a regex match!

Try printing out repr(host). Does it contain "cyta"?
 
Νίκος

Νίκος said:
I also don't know what's wrong with this line:
host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0]
hostmatch = re.search('cyta', host)
if cookie.has_key('visitor') != 'nikos' or hostmatch is None:
    # do stuff
The 'stuff' never gets executed, while I want it to be as long as I
don't have a regex match!

Try printing out repr(host). Does it contain "cyta"?

Yes, it does contain it, as print showed!

Is something wrong with this line in logic or syntax?

if cookie.has_key('visitor') != 'nikos' or re.search('cyta', host) is None:
    # do database stuff
 
Νίκος

Use group capture:

     page_id = re.match(r'<!-- (\d+) -->', firstline).group(1)
     print(page_id)

Worked like a charm! Thanks a lot!

So the match method here not only searched for the string representation
of the number but also converted it to an integer as well?

Does r stand for "retrieve the string" here?

And group?

When a regex searches a .txt file and retrieves something from it, does it
always retrieve it as a string, or can it return a number as well?
 
Thomas Jollans

Worked like a charm! Thanks a lot!

So the match method here not only searched for the string representation
of the number but also converted it to an integer as well?

Does r stand for "retrieve the string" here?

r"xyz" is a raw string literal. That means that backslash escapes are
turned off -- r'\n' == '\\n'
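
A quick way to see the difference for yourself is to compare the literals and their lengths. This is only a small sketch; the variable names are made up:

```python
# Hypothetical variable names, chosen for illustration.
raw = r'\n'       # raw literal: a backslash followed by the letter n
escaped = '\\n'   # same two characters, written with an escaped backslash
newline = '\n'    # ordinary literal: a single newline character

print(raw == escaped)   # True
print(len(raw))         # 2
print(len(newline))     # 1
```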
 
MRAB

Νίκος said:
Try printing out repr(host). Does it contain "cyta"?

Yes, it does contain it, as print showed!

Is something wrong with this line in logic or syntax?

if cookie.has_key('visitor') != 'nikos' or re.search('cyta', host) is None:
    # do database stuff

You said "i want them to be as long as i dont have regex match".

re.search('cyta', host) will return None if there's no match, but you
said "Yes it does contain it", so there _is_ a match, therefore:

hostmatch is None

is False.
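
To make that concrete, here is a small sketch of how the `is None` test behaves for a host that contains "cyta" and one that does not. The hostnames and the helper name lacks_cyta are made up for illustration:

```python
import re

def lacks_cyta(host):
    """True when 'cyta' does NOT appear anywhere in the hostname."""
    # re.search returns a match object on success and None on failure.
    return re.search('cyta', host) is None

# Made-up hostnames, not taken from the original posts.
print(lacks_cyta('adsl-1.cyta.gr'))   # False: there is a match
print(lacks_cyta('example.com'))      # True: no match, search gave None
```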
 
MRAB

Νίκος said:
Worked like a charm! Thanks a lot!

Does r stand for "retrieve the string" here?

The 'r' prefix makes it a 'raw string literal'. That means that the
string literal won't treat backslashes as special. Before raw string
literals were added to the Python language I would have needed to write:

'<!-- (\\d+) -->'

instead.

(Actually, that's not strictly true in this case, because \d doesn't
have a special meaning in Python strings, but it's a good idea to use raw
string literals habitually when writing regexes in order to reduce the
chance of forgetting them when they _are_ necessary. Well, that's what I
think, anyway. :))
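
A case where the prefix does matter is \b, which is a backspace character in an ordinary Python string but a word boundary to the regex engine. A minimal sketch, with a made-up sample string:

```python
import re

text = 'a cat sat'  # made-up sample text

# Non-raw: Python turns '\b' into a backspace character before the regex
# engine ever sees it, so the pattern looks for backspaces and finds none.
print(re.search('\bcat\b', text))            # None

# Raw: the backslash survives, and the regex engine reads \b as a word boundary.
print(re.search(r'\bcat\b', text).group())   # cat
```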
 
Νίκος

re.search('cyta', host) will return None if there's no match, but you
said "Yes it does contain it", so there _is_ a match, therefore:

     hostmatch is None

is False.

The code block inside the if statement must be executed ONLY if the
'visitor' cookie is not set in the client's browser or the client's
hostname doesn't contain the string 'cyta'.

# ======================================
# do not increment the counter if a cookie is already set in the visitor's browser
# ======================================

if cookie.has_key('visitor') != 'nikos' or re.search('cyta', host) is None:

I still don't get it :)
 
Νίκος

The 'r' prefix makes it a 'raw string literal'. That means that the
string literal won't treat backslashes as special. Before raw string
literals were added to the Python language I would have needed to write:

     '<!-- (\\d+) -->'

instead.

(Actually, that's not strictly true in this case, because \d doesn't
have a special meaning in Python strings, but it's a good idea to use raw
string literals habitually when writing regexes in order to reduce the
chance of forgetting them when they _are_ necessary. Well, that's what I
think, anyway. :))

Couldn't agree more!

As the saying goes, better safe than sorry! :)
 
MRAB

Thomas said:
This is always True. has_key returns a bool, which is never equal to any
string, even 'nikos'.

I missed that bit! :)

Anyway, the OP said "the 'stuff' never gets executed". Kinda puzzling...
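
The point quoted from Thomas can be checked directly. The sketch below assumes Python 3, where has_key() no longer exists and the `in` operator takes its place, and uses http.cookies.SimpleCookie with a made-up Cookie header:

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie.load('visitor=nikos')   # made-up Cookie header for illustration

# 'in' is the Python 3 spelling of has_key(); the result is a bool.
has_visitor = 'visitor' in cookie

# A bool is never equal to a str, so this is True whether or not the cookie exists.
print(has_visitor != 'nikos')    # True
print(False != 'nikos')          # True

# To compare the cookie's actual value, read it from the Morsel object.
print(has_visitor and cookie['visitor'].value == 'nikos')   # True
```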
 
Νίκος

This is always True. has_key returns a bool, which is never equal to any
string, even 'nikos'.

if cookie.has_key('visitor') or re.search('cyta', host) is None:

addresses the problem :)

Thanks a lot, Thomas and MRAB, for ALL your help!
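
One thing worth double-checking: the requirement stated earlier in the thread was that the block runs only when the 'visitor' cookie is not set or the hostname lacks 'cyta'. Read that way, the cookie test would need a `not` in front of it. A minimal sketch of that reading, assuming Python 3 and http.cookies.SimpleCookie; the function name should_count and its parameters are hypothetical:

```python
import re
from http.cookies import SimpleCookie

def should_count(cookie_header, host):
    """Hypothetical helper: True when the visit should be counted.

    Follows the rule as stated in the thread: count when the 'visitor'
    cookie is missing OR the hostname does not contain 'cyta'.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    cookie_missing = 'visitor' not in cookie
    host_lacks_cyta = re.search('cyta', host) is None
    return cookie_missing or host_lacks_cyta

# Made-up headers and hostnames for illustration.
print(should_count('', 'example.com'))                  # True
print(should_count('visitor=nikos', 'adsl-1.cyta.gr'))  # False
```

If the goal is instead to skip counting whenever either sign of the site owner is present, `and` would replace `or`; which one is right depends on the intent.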
 
