extract Infobox contents

Anish Chapagain · Apr 6, 2009

Hi,
I am trying to extract the contents of a Wikipedia Infobox, in the format
shown below, from a page I have opened by URL in Python.

{{ Infobox Software
| name = Bash
| logo = [[Image:bash-org.png|165px]]
| screenshot = [[Image:Bash demo.png|250px]]
| caption = Screenshot of bash and [[Bourne shell|sh]] sessions demonstrating some features
| developer = [[Chet Ramey]]
| latest release version = 4.0
| latest release date = {{release date|mf=yes|2009|02|20}}
| programming language = [[C (programming language)|C]]
| operating system = [[Cross-platform]]
| platform = [[GNU]]
| language = English, multilingual ([[gettext]])
| status = Active
| genre = [[Unix shell]]
| source model = [[Free software]]
| license = [[GNU General Public License]]
| website = [http://tiswww.case.edu/php/chet/bash/bashtop.html Home page]
}}   // up to this line
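A minimal sketch of how the "| field = value" lines in a template like the one above could be split into a Python dict. It assumes well-formed wikitext and splits on "|" only when not inside [[...]] or {{...}}, so piped links such as [[Bourne shell|sh]] and nested templates such as {{release date|...}} stay in one value. The name parse_infobox_fields is an illustrative choice, not a library API.

```python
def parse_infobox_fields(template):
    """Split Infobox wikitext ("{{ Infobox ... }}") into a {field: value} dict.

    Assumes well-formed wikitext. Values are returned as raw wikitext with
    runs of whitespace (including line breaks) collapsed to single spaces.
    """
    inner = template.strip()
    # Drop the outer "{{" and "}}" of the Infobox itself, if present.
    if inner.startswith("{{") and inner.endswith("}}"):
        inner = inner[2:-2]

    # Split on "|" only at nesting depth 0. One counter tracks both
    # [[links]] and {{templates}}, which is enough for balanced input.
    parts, current, depth, i = [], [], 0, 0
    while i < len(inner):
        two = inner[i:i + 2]
        if two in ("{{", "[["):
            depth += 1
            current.append(two)
            i += 2
        elif two in ("}}", "]]"):
            depth -= 1
            current.append(two)
            i += 2
        elif inner[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(inner[i])
            i += 1
    parts.append("".join(current))

    fields = {}
    for part in parts[1:]:  # parts[0] is the template name, e.g. "Infobox Software"
        key, sep, value = part.partition("=")  # split on the first "=" only
        if sep:
            fields[key.strip()] = " ".join(value.split())
    return fields


box = (
    "{{ Infobox Software\n"
    "| name = Bash\n"
    "| logo = [[Image:bash-org.png|165px]]\n"
    "| caption = Screenshot of bash and [[Bourne shell|sh]]\n"
    "sessions demonstrating some features\n"
    "| latest release date = {{release date|mf=yes|2009|02|20}}\n"
    "}}"
)
for key, value in parse_infobox_fields(box).items():
    print(key, "=>", value)
```

Splitting on the first "=" keeps values like "mf=yes" inside the nested template intact; the values are left as raw wikitext rather than rendered.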

I need to extract everything from "{{ Infobox" up to its closing "}}".
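A minimal sketch of one way to do this, assuming well-formed wikitext where every "{{" has a matching "}}": scan forward from the marker, count template nesting depth, and stop when it returns to zero. The counter matters because the Infobox contains nested templates (e.g. {{release date|...}}), so the first "}}" is not necessarily its end. The function name extract_infobox is hypothetical; triple-brace parameter syntax ({{{...}}}) is not handled, and since many articles write "{{Infobox" without the space, the marker may need adjusting.

```python
def extract_infobox(text, marker="{{ Infobox"):
    """Return the Infobox template from text, from its opening "{{" to the
    matching closing "}}", or None if the marker or a matching close is missing.

    Assumes balanced "{{" / "}}" pairs in the wikitext.
    """
    start = text.find(marker)
    if start == -1:
        return None
    depth = 0
    i = start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":      # a template (the Infobox or a nested one) opens
            depth += 1
            i += 2
        elif pair == "}}":    # a template closes
            depth -= 1
            i += 2
            if depth == 0:    # back to the level of the Infobox itself
                return text[start:i]
        else:
            i += 1
    return None  # ran out of text before the Infobox was closed


page = (
    "Intro text.\n"
    "{{ Infobox Software\n"
    "| name = Bash\n"
    "| latest release date = {{release date|mf=yes|2009|02|20}}\n"
    "}}\n"
    "Rest of the article."
)
print(extract_infobox(page))
```

Applied to the page text (the data variable in the post), extract_infobox(data) would return the whole template including its closing "}}".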

Thanks if anyone can help. I am trying this:

s1 = '{{ Infobox'
s2 = len(s1)
pos1 = data.find(s1)             # index where "{{ Infobox" starts
pos2 = data.find("\n", pos1)     # end of that first line

pat1 = data.find("}}")           # first "}}" anywhere in data

but I only end up getting the first line.
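A small reconstruction of what the attempt above does, assuming the result is taken with a slice such as data[pos1:pos2] (the slice itself isn't shown in the post). The newline search stops at the end of the first line. Slicing to the first "}}" instead does not help either: here it is searched from pos1, and it ends inside the nested {{release date|...}} template rather than at the Infobox's own closing braces. The original searches from the start of data, which on a full page may hit an even earlier template.

```python
# A short excerpt of the Infobox from the post.
sample = (
    "{{ Infobox Software\n"
    "| name = Bash\n"
    "| latest release date = {{release date|mf=yes|2009|02|20}}\n"
    "| genre = [[Unix shell]]\n"
    "}}"
)

pos1 = sample.find("{{ Infobox")
pos2 = sample.find("\n", pos1)   # end of the first line only
first_line = sample[pos1:pos2]

pat1 = sample.find("}}", pos1)   # first "}}" after the start...
naive = sample[pos1:pat1 + 2]    # ...closes {{release date}}, not the Infobox

print(first_line)                # {{ Infobox Software
print(naive.splitlines()[-1])    # | latest release date = {{release date|mf=yes|2009|02|20}}
```

Either way the "| genre = ..." line is lost, which is why a nesting-aware scan is needed.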

Thank you,
