kevingloveruk
Hi
I have downloaded and unzipped the XML dump of Wikipedia (40+ GB). I want to use Python and the SAX module (running under Windows 7) to carry out offline phrase searches of Wikipedia and to return a count of the number of hits for each search. Typical phrase searches might be "of the dog" and "dog's".
I have some limited prior programming experience (from many years ago) and I am currently learning Python from a course of YouTube tutorials. Before I get much further, I wanted to ask:
Is what I am trying to do actually feasible?
Are there any example programs or code snippets that would help me?
Any advice or guidance would be gratefully received.
Best regards,
Kevin Glover
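For illustration, here is a minimal sketch of the approach described above. It assumes the article bodies sit inside `<text>` elements, as in the MediaWiki export format. The names `PhraseCounter` and `count_phrases` are made up for this example, and the matching is a plain case-insensitive substring count, so "of the dog" would also match inside "of the doghouse".

```python
import xml.sax
from collections import Counter


class PhraseCounter(xml.sax.ContentHandler):
    """Counts case-insensitive phrase occurrences inside <text> elements."""

    def __init__(self, phrases):
        super().__init__()
        self.phrases = [p.lower() for p in phrases]
        self.counts = Counter({p: 0 for p in self.phrases})
        self._in_text = False
        self._chunks = []

    def startElement(self, name, attrs):
        # Assumption: article bodies are in <text> elements.
        if name == "text":
            self._in_text = True
            self._chunks = []

    def characters(self, content):
        # SAX may deliver one element's text in several pieces.
        if self._in_text:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "text":
            # Join the pieces, lower-case, and collapse runs of whitespace
            # so a phrase split across a line break still matches.
            body = " ".join("".join(self._chunks).lower().split())
            for phrase in self.phrases:
                self.counts[phrase] += body.count(phrase)
            self._in_text = False
            self._chunks = []


def count_phrases(source, phrases):
    """Stream-parse `source` (a filename or binary file object)."""
    handler = PhraseCounter(phrases)
    xml.sax.parse(source, handler)
    return dict(handler.counts)
```

A call might look like `count_phrases("wikipedia-dump.xml", ["of the dog", "dog's"])`, where the filename is a placeholder. Because SAX streams the file, only one article body is held in memory at a time, although a single pass over 40+ GB will still take a while.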