Look for a string in a file and get its line number


Horacius ReX

Hi,

I have to search for a string in a big file. Once this string is
found, I need to get the number of the line that contains it. Do
you know if this is possible to do in Python?

Thanks
 

Jeroen Ruigrok van der Werven

-On [20080108 09:21], Horacius ReX said:
> I have to search for a string in a big file. Once this string is
> found, I need to get the number of the line that contains it. Do
> you know if this is possible to do in Python?

(Assuming ASCII, otherwise check out codecs.open().)

big_file = open('bigfile.txt', 'r')

line_nr = 0
for line in big_file:
    line_nr += 1
    has_match = line.find('my-string')
    if has_match > 0:
        print 'Found in line %d' % (line_nr)

Something to this effect.
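On Python 3, the built-in open() accepts an encoding argument directly, so codecs.open() is no longer needed for the non-ASCII case. The sketch below does the same search and assumes the file is UTF-8; change `encoding` to whatever the real file uses. `search_file` is a hypothetical helper name, and the temporary file exists only to make the example runnable:

```python
import os
import tempfile


def search_file(path, needle, encoding="utf-8"):
    """Return the 1-based line numbers of lines in path that contain needle."""
    hits = []
    # Assumption: the file is text in the given encoding; lines are read lazily,
    # and the with block closes the file when the loop is done.
    with open(path, "r", encoding=encoding) as big_file:
        for line_nr, line in enumerate(big_file, start=1):
            if needle in line:
                hits.append(line_nr)
    return hits


# Usage: build a small UTF-8 file containing non-ASCII text and search it.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write("first line\nsecond: café my-string\nthird\n")
    path = tmp.name

found = search_file(path, "my-string")
print(found)  # [2]
os.remove(path)
```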
 

John Machin

> -On [20080108 09:21], Horacius ReX said:
>> I have to search for a string in a big file. Once this string is
>> found, I need to get the number of the line that contains it. Do
>> you know if this is possible to do in Python?
>
> (Assuming ASCII, otherwise check out codecs.open().)
>
> big_file = open('bigfile.txt', 'r')
>
> line_nr = 0
> for line in big_file:
>     line_nr += 1
>     has_match = line.find('my-string')
>     if has_match > 0:

Make that >=

| >>> 'fubar'.find('fu')
| 0
| >>>
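The `>=` correction matters because str.find() returns the position of the match. That position is 0 when the string starts the line, and find() returns -1 only when there is no match, so `> 0` silently skips lines that begin with the search string. Below is a minimal Python 3 sketch of the corrected loop. `find_line_numbers` is a hypothetical helper, and it accepts any iterable of lines (an open file works):

```python
def find_line_numbers(lines, needle):
    """Return the 1-based numbers of the lines that contain needle."""
    hits = []
    line_nr = 0
    for line in lines:
        line_nr += 1
        # find() returns -1 when needle is absent and 0 when it starts the line,
        # so the test has to be >= 0 (or != -1), not > 0.
        if line.find(needle) >= 0:
            hits.append(line_nr)
    return hits


sample = ["fubar\n", "no match here\n", "xx fubar\n"]
print("fubar".find("fu"))               # 0
print("fubar".find("zz"))               # -1
print(find_line_numbers(sample, "fu"))  # [1, 3]
```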
 

Jeroen Ruigrok van der Werven

-On [20080108 09:51], John Machin said:
> Make that >=

Right you are. Sorry, I was doing it quickly from work. ;)

And I guess find() will also be less precise if the word you are looking for
is part of a bigger word: searching for 'door' will also match a line that
only has 'doorway' in it.

So 't is merely for inspiration. ;)
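If whole-word matches are what you want, one way around the 'door'/'doorway' problem is a regular expression with \b word boundaries. The sketch below is in Python 3. `whole_word_lines` is a hypothetical helper, and re.escape() guards against the search word containing regex metacharacters. Note that \b only behaves intuitively when the search word starts and ends with letters, digits or underscores:

```python
import re


def whole_word_lines(lines, word):
    """Return the 1-based line numbers where word occurs as a whole word."""
    # \b matches at a word/non-word boundary, so 'door' does not match inside 'doorway'.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return [n for n, line in enumerate(lines, start=1) if pattern.search(line)]


text = ["open the door\n", "a wide doorway\n", "door-to-door\n"]
print(whole_word_lines(text, "door"))  # [1, 3]
```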

--
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/
From morning to night I stayed out of sight / Didn't recognise I'd become
No more than alive I'd barely survive / In a word, overrun...
 

Jeroen Ruigrok van der Werven

-On [20080108 09:51], John Machin said:
> Make that >=
>
> | >>> 'fubar'.find('fu')

Or even just:

if 'my-string' in line:
    ...

Same caveat emptor applies though.
 

Martin Marcher

Jeroen said:
> -On [20080108 09:21], Horacius ReX said:
>> I have to search for a string in a big file. Once this string is
>> found, I need to get the number of the line that contains it. Do
>> you know if this is possible to do in Python?
>
> (Assuming ASCII, otherwise check out codecs.open().)
>
> big_file = open('bigfile.txt', 'r')
>
> line_nr = 0
> for line in big_file:
>     line_nr += 1
>     has_match = line.find('my-string')
>     if has_match > 0:
>         print 'Found in line %d' % (line_nr)
>
> Something to this effect.

Apart from that, look at the linecache module. If it's a big file, it could
help you with subsequent access to the line in question.
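Here is a small Python 3 sketch of the linecache idea. It assumes the line number is already known, for example from a search loop. linecache.getline() uses 1-based line numbers and keeps the trailing newline. It returns an empty string, rather than raising, when the line does not exist. It reads and caches the whole file on the first lookup, so later lookups in the same file are cheap, but memory use grows with file size. The temporary file exists only to make the example self-contained:

```python
import linecache
import os
import tempfile

# Create a throwaway file so the example runs on its own.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as fh:
    fh.write("alpha\nbeta my-string\ngamma\n")

line_nr = 2  # assumed: found earlier by a search loop
line = linecache.getline(path, line_nr)   # 1-based; the newline is kept
print(repr(line))                         # 'beta my-string\n'
missing = linecache.getline(path, 99)     # past the end of the file -> ''
print(repr(missing))                      # ''

linecache.clearcache()  # drop the cached copy of the file
os.remove(path)
```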

hth
martin

--
http://noneisyours.marcher.name
http://feeds.feedburner.com/NoneIsYours

You are not free to read this message,
by doing so, you have violated my licence
and are required to urinate publicly. Thank you.
 

Ryan Ginstrom

On Behalf Of Horacius ReX
> I have to search for a string in a big file. Once this string is
> found, I need to get the number of the line that contains it. Do
> you know if this is possible to do in Python?

This should be reasonable:
if "Guido" in line:
    print "Found Guido on line", num
    break


Found Guido on line 1296
Regards,
Ryan Ginstrom
 

Wildemar Wildenburger

Jeroen said:
> line_nr = 0
> for line in big_file:
>     line_nr += 1
>     has_match = line.find('my-string')
>     if has_match > 0:
>         print 'Found in line %d' % (line_nr)

Style note:
May I suggest enumerate (I find the explicit counting somewhat clunky)
and maybe turning it into a generator (I like generators):

def lines(big_file, pattern="my string"):
    for n, line in enumerate(big_file):
        if pattern in line:
            print 'Found in line %d' % (n)
            yield n

or for direct use, how about a simple list comprehension:

lines = [n for (n, line) in enumerate(big_file) if "my string" in line]

(If you're just going to iterate over the result, that is, if you do not
need indexing, replace the brackets with parentheses. That way you get a
generator and don't have to build a complete list. This is especially
useful if you expect many hits.)
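The sketch below shows the difference between brackets and parentheses in Python 3 syntax. The list comprehension builds every hit up front, while the generator expression produces hits only as they are consumed. `io.StringIO` stands in for the open file so the example is self-contained. As in the code above, the numbers come straight from enumerate() and are therefore zero-based:

```python
import io

big_file = io.StringIO("x\nmy string\ny\nmy string again\n")

# List comprehension: all matching (zero-based) line numbers, built at once.
hits_list = [n for (n, line) in enumerate(big_file) if "my string" in line]
print(hits_list)  # [1, 3]

big_file.seek(0)  # rewind, since the first pass consumed the file

# Generator expression: the same values, produced lazily one at a time.
hits_gen = (n for (n, line) in enumerate(big_file) if "my string" in line)
found = []
for n in hits_gen:
    print("Found in line %d" % n)
    found.append(n)
```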

Just a note.

regards
/W
 

Jeroen Ruigrok van der Werven

-On [20080108 12:59], Wildemar Wildenburger said:
> Style note:
> May I suggest enumerate (I find the explicit counting somewhat clunky)
> and maybe turning it into a generator (I like generators):

Sure, I still have a lot to discover myself with Python.

I'll study your examples, thanks. :)
 

Tim Chase

>> I have to search for a string in a big file. Once this string
> This should be reasonable:
>
> if "Guido" in line:
>     print "Found Guido on line", num
>     break
>
>
> Found Guido on line 1296

Just a small caveat here: enumerate() is zero-based, so you may
actually want to add one to the resulting number:

s = "Guido"
for num, line in enumerate(open("file.txt")):
    if s in line:
        print "Found %s on line %i" % (s, num + 1)
        break # optionally stop looking
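Since Python 2.6, enumerate() also accepts a start argument, which removes the need for the `+ 1`. Combining it with next() and a default value gives the "first match only" behaviour of the `break` (or of sed's `q`) without an explicit loop. The sketch below is in Python 3. `first_match` is a hypothetical helper, and io.StringIO stands in for the file:

```python
import io


def first_match(lines, s):
    """Return the 1-based number of the first line containing s, or None."""
    # start=1 makes enumerate count from 1, so no "+ 1" is needed;
    # next() stops at the first hit and returns None if there is none.
    return next((num for num, line in enumerate(lines, start=1) if s in line),
                None)


text = io.StringIO("spam\nGuido van Rossum\neggs\nGuido again\n")
num = first_match(text, "Guido")
if num is not None:
    print("Found Guido on line %i" % num)  # Found Guido on line 2
```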

Or one could use a tool made for the job:

grep -n Guido file.txt

or if you only want the first match:

sed -n '/Guido/{=;p;q}' file.txt

-tkc
 

Horacius ReX

Hi, thanks for the help. I now have the following code running:

#!/usr/bin/env python

import os, sys, re, string, array, linecache, math

nlach = 12532

lach_list = sys.argv[1]
lach_list_file = open(lach_list,"r")
lach_mol2 = sys.argv[2] # name of the lachand mol2 file
lach_mol2_file = open(lach_mol2,"r")
n_lach_read=int(sys.argv[3])

# Do the following for the total number of lachands

# 1. read the list with the ranked lachands
for i in range(1,n_lach_read+1):
    line = lach_list_file.readline()
    ll = string.split (line)
    #print i, ll[0]
    lach = int(ll[0])
    # 2. for each lachand, print mol2 file
    # 2a. find lachand header in lachand mol2 file (example; kanaka)
    # and return line number
    line_nr = 0
    for line in lach_mol2_file:
        line_nr += 1
        has_match = line.find('kanaka')
        if has_match >= 0:
            print 'Found in line %d' % (line_nr)
            # 2b. print on screen all the info for this lachand
            # (but first need to read natoms and nbonds info)
            # go to line line_nr + 1
            ltr=linecache.getline(lach_mol2, line_nr + 1)
            ll=ltr.split()
            #print ll[0],ll[1]
            nat=int(ll[0])
            nb=int(ll[1])
            # total lines to print:
            # header, 8
            # at, na
            # b header, 1
            # n
            # lastheaders, 2
            # so; nat + nb + 11
            ntotal_lines = nat + nb + 11
            # now we go to the beginning of the lachand
            # and print ntotal_lines
            for j in range(0,ntotal_lines):
                print linecache.getline(lach_mol2, line_nr - 1 + j )


which almost works. In the last "for j" loop, I expected to get
output like:

sdsdsdsdsdsd
sdsdsfdgdgdgdg
hdfgdgdgdg

but instead I get:

sdsdsdsdsdsd

sdsdsfdgdgdgdg

hdfgdgdgdg

Also, the program is very slow. Do you know how I could solve
this?

thanks
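Two things in the code above may explain the symptoms. This is a reading of the code, not a tested diagnosis.

First, linecache.getline() returns each line with its newline still attached, and the print statement adds another newline, which produces the blank lines. In Python 2, sys.stdout.write(line) or stripping the newline avoids this.

Second, the inner `for line in lach_mol2_file` loop exhausts the file during the first pass of the outer loop. Later lachands are therefore never found there unless the file is rewound.

One way to restructure the code is to read the mol2 file into memory once and slice it. This might also help with speed, though that is a guess without profiling. The Python 3 sketch below assumes the layout described in the comments above: a header line containing the name, a following line whose first two fields are nat and nb, and nat + nb + 11 lines in total starting one line before the header. `read_block` is a hypothetical helper:

```python
import sys


def read_block(all_lines, name):
    """Return the lines of the first block whose header line contains name, or []."""
    for idx, line in enumerate(all_lines):  # idx is 0-based
        if name in line:
            # Assumed layout: the line after the header starts with nat and nb.
            fields = all_lines[idx + 1].split()
            nat, nb = int(fields[0]), int(fields[1])
            ntotal_lines = nat + nb + 11
            # The original printed 1-based lines line_nr - 1, line_nr, ...;
            # with line_nr == idx + 1 that starts at 0-based index idx - 1
            # (clamped so a header on the very first line does not wrap around).
            start = max(idx - 1, 0)
            return all_lines[start:start + ntotal_lines]
    return []


# For the real file, read it once, e.g.:
#     with open(lach_mol2) as f:
#         all_lines = f.readlines()
# Here a small made-up sample stands in for the mol2 file.
all_lines = ["@<TRIPOS>MOLECULE\n", "kanaka\n", "2 1\n"]
all_lines += ["line %d\n" % k for k in range(20)]

block = read_block(all_lines, "kanaka")
# Each line already ends in "\n", so write it as-is; a plain print(line)
# would add a second newline.
sys.stdout.writelines(block)
```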
 

jcvanelst

> Hi,
>
> I have to search for a string in a big file. Once this string is
> found, I need to get the number of the line that contains it. Do
> you know if this is possible to do in Python?
>
> Thanks

Hi, I'm no Python whizzkid, but you can do a lot with the .index()
method. If you do something like this, it'll return the index of the
first character of your string as it is found in your file. Note that
if you read a file like this, the string will also contain the escaped
newlines ('\n') and the brackets, quotes and commas of the list that
str() produces, so if you want to know the exact line you will have to
work it out by experimenting with a small file. You can always use a
command like print s[12] to see the character at index 12 (the 13th
character).
--------------------------------------------------------
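infile = open("C:\\Users\\yourname\\Desktop\\", 'r')

f = infile.readlines()            # list of lines, each ending in '\n'
s = str(f)                        # repr of that list: brackets, quotes, escaped '\n'
yourstring = s.index('mystring')  # offset of the first match in s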
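The character offset can be turned into a line number directly, without calling str() on the list of lines. Read the whole file as one string, find the offset, and count the newlines before it. find() is used here because it returns -1 instead of raising ValueError like index(). The sketch below is in Python 3. `line_of_offset` is a hypothetical helper, and the sample text stands in for the file's contents. Like readlines(), this approach holds the whole file in memory:

```python
def line_of_offset(text, needle):
    """Return the 1-based line number of the first occurrence of needle, or None."""
    offset = text.find(needle)  # -1 if needle does not occur
    if offset == -1:
        return None
    # Every '\n' before the match ends one earlier line.
    return text.count("\n", 0, offset) + 1


# In practice: with open(path) as infile: text = infile.read()
text = "first line\nsecond line\nthird has mystring\n"
print(line_of_offset(text, "mystring"))  # 3
```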
 
