extract Infobox contents

  • Thread starter Anish Chapagain

Hi,
I am trying to extract the contents of a Wikipedia Infobox, which has the
format shown below, from a page opened from a URL in Python.

{{ Infobox Software
| name = Bash
| logo = [[Image:bash-org.png|165px]]
| screenshot = [[Image:Bash demo.png|250px]]
| caption = Screenshot of bash and [[Bourne shell|sh]] sessions demonstrating some features
| developer = [[Chet Ramey]]
| latest release version = 4.0
| latest release date = {{release date|mf=yes|2009|02|20}}
| programming language = [[C (programming language)|C]]
| operating system = [[Cross-platform]]
| platform = [[GNU]]
| language = English, multilingual ([[gettext]])
| status = Active
| genre = [[Unix shell]]
| source model = [[Free software]]
| license = [[GNU General Public License]]
| website = [http://tiswww.case.edu/php/chet/bash/bashtop.html Home page]
}} //up to this line

I need to extract everything from {{ Infobox ... through the closing }}.

Thanks if anyone can help.
I am trying with:

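s1='{{ Infobox'
s2=len(s1)
pos1=data.find(s1)          # index where the infobox starts
pos2=data.find("\n",pos1)   # end of that first line only (was pos2, which is undefined here)

pat1=data.find("}}")        # first "}}" anywhere in the page, not the infobox's own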

but I end up getting only the first line.
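
Searching for the newline after the start marker stops at the end of the first line, and a plain search for "}}" can match a nested template such as {{release date|...}} before the infobox's own closing braces. One possible way around both, sketched here under assumptions, is to count brace pairs from the start marker until they balance. The function name extract_infobox and the sample string are made up for illustration, and the marker text is taken from the post (real pages may write it as "{{Infobox" without the space).

```python
def extract_infobox(data, marker="{{ Infobox"):
    """Return the text from marker through its matching '}}', or None."""
    start = data.find(marker)
    if start == -1:
        return None  # marker not present in the page text
    depth = 0
    i = start
    while i < len(data) - 1:
        pair = data[i:i + 2]
        if pair == "{{":        # a template opens (the infobox or a nested one)
            depth += 1
            i += 2
        elif pair == "}}":      # a template closes
            depth -= 1
            i += 2
            if depth == 0:      # back at the infobox's own closing braces
                return data[start:i]
        else:
            i += 1
    return None  # braces never balanced


# Hypothetical page text, shortened from the example in the post.
sample = (
    "intro text\n"
    "{{ Infobox Software\n"
    "| name = Bash\n"
    "| latest release date = {{release date|mf=yes|2009|02|20}}\n"
    "| status = Active\n"
    "}}\n"
    "article body {{cite}}\n"
)
infobox = extract_infobox(sample)
print(infobox)
```

Counting depth rather than searching for the first "}}" is what lets the nested release date template pass through without ending the match early.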

thank you,
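
If the individual fields are wanted rather than the raw block, the infobox text can be split into name/value pairs. The sketch below is only an illustration: parse_infobox_fields is a hypothetical helper, and it assumes each field starts on its own line with "|", that wrapped lines continue the previous field, and that nested templates stay on a single line as in the example above.

```python
def parse_infobox_fields(infobox):
    """Split infobox text into a dict of field name -> raw wikitext value."""
    fields = {}
    key = None
    # Skip the first line, which holds "{{ Infobox Software".
    for line in infobox.splitlines()[1:]:
        stripped = line.strip()
        if stripped.startswith("}}"):
            break  # closing braces of the infobox
        if stripped.startswith("|") and "=" in stripped:
            # New field: text before the first "=" is the name.
            key, _, value = stripped[1:].partition("=")
            key = key.strip()
            fields[key] = value.strip()
        elif key is not None and stripped:
            # Continuation of a value that was wrapped onto the next line.
            fields[key] += " " + stripped
    return fields


# Hypothetical input, shortened from the example in the post.
infobox_text = """{{ Infobox Software
| name = Bash
| caption = Screenshot of bash and [[Bourne shell|sh]]
sessions demonstrating some features
| latest release date = {{release date|mf=yes|2009|02|20}}
| status = Active
}} //upto this line"""

fields = parse_infobox_fields(infobox_text)
for name, value in fields.items():
    print(name, "->", value)
```

The values are left as raw wikitext, so links like [[Chet Ramey]] would still need their brackets stripped in a later step.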
 
