Convert text documents to XML

ddddddd

Hello sir,

I am Shaji from Kerala, India. I am a programmer. I have one doubt,
which I will explain below.

We have to develop a literature database. I need to include various
features like search by author, search by title, search by publication
year, etc.

What we normally do is first store all the literature details in the
database (including information like author (AU), title (TI),
abstract (AB), publication year, etc.). To store this information we
need to create a literature entry form and store the details via the
form. This method is possible, but it is very tedious since we have
around 6000 literature records.

So is there any other method?
We are downloading these records from some external websites, and all
of them are in the same format. I am sending the literature record
format below; please check it.

[In the record format below, TI means title, AU means author, AB means
abstract, KW means keywords, PY means publication year, etc.]

Please check the records below and suggest any new methods.
Can we convert it into XML directly?


Literature record format

Record: 1

TI- A Survey of Phytophthora Species on Hainan Island of South China.
AU- Hui-cai Zeng1
AU- Hon-hing Ho2 (e-mail address removed)
AU- Fuy-Cong Zheng3
JN- Journal of Phytopathology
PD- Jan2009, Vol. 157 Issue 1, p33-39
PG- 7p
DT- 20090101
PT- Article
AB- During the period 1997–2007, a comprehensive study of the
occurrence and distribution of Phytophthora species was conducted on
Hainan Island of South China. To date, 14 species of Phytophthora have
been recovered and their distribution determined. Phytophthora
nicotianae ( =P. parasitica) is the most important species attacking a
wide variety of crops, followed by Phytophthora capsici and
Phytophthora citrophthora. In contrast to Phytophthora colocasiae
attacking taro leaves throughout the entire island, Phytophthora
cyperi was found only once on Digitaria ciliaris in Danzhou. It is of
interest to note that Phytophthora heveae, Phytophthora katsurae and
Phytophthora insolita are commonly found in forest soil/water of
protected mountains without causing any plant diseases. Although
Phytophthora species are usually terrestrial or found in fresh water,
one isolate of Phytophthora resembling closely the asexual isolates of
P. insolita in Hainan was obtained from decaying Rhizophora leaves
submerged in seawater. An unidentified Phytophthora species producing
non-papillate; internally proliferating sporangia was isolated from
the soil in which Ceriops tagel and Bruguiera serangula were growing
in a salt water shrimp farm. [ABSTRACT FROM AUTHOR]
AB- Copyright of Journal of Phytopathology is the property of
Blackwell Publishing Limited and its content may not be copied or
emailed to multiple sites or posted to a listserv without the
copyright holder's express written permission. However, users may
print, download, or email articles for individual use. This abstract
may be abridged. No warranty is given about the accuracy of the copy.
Users should refer to the original published version of the material
for the full abstract. (Copyright applies to all Abstracts.)
DE- PHYTOPHTHORA
DE- CROPS
DE- PLANTS -- Wounds & injuries
DE- PLANT quarantine
GE- HAINAN Island (China)
GE- CHINA
KW- marine isolates
KW- Phytophthora capsici
KW- Phytophthora cinnamomi
KW- Phytophthora citrophthora
KW- Phytophthora heveae
KW- Phytophthora insolita
KW- Phytophthora katsurae
KW- Phytophthora nicotianae
AD- 1The Institute of Bioscience and Biotechnology, Chinese Academy of
Tropical Agricultural Sciences, Haikou 571101, Hainan, China
AD- 2Department of Biology, State University of New York, New Paltz,
New York 12561, USA
AD- 3College of Environment and Plant Protection, Hainan University,
Baodoa Xincun, Danzhou City, Hainan 571737, China
IS- 09311785
DI- 10.1111/j.1439-0434.2008.01441.x
AN- 35655828
UR- http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=35655828&site=ehost-live
UR- http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=35655828&site=ehost-live

Record: 2

TI- Effects of Some Plant Materials on Phytophthora Blight
(Phytophthora capsici Leon.) of Pepper.
AT- Bazi Bitkisel Materyallerin Biberde Phytophthora Yanikligi
( Phytophthora capsici Leon.)'na Etkileri.
AU- Demirci, Fikret1 (e-mail address removed)
AU- Dolar, F. Sara1
JN- Turkish Journal of Agriculture & Forestry
PD- 2006, Vol. 30 Issue 4, p247-252
PG- 6p
DT- 20060801
PT- Article
AB- Effects of dried garlic, peppermint, cabbage, lentil, alfalfa,
onion, radish, and garden cress plant materials on Phytophthora blight
(Phytophthora capsici Leon.) of pepper were determined, in both in
vitro and in vivo conditions. Extracts of the plant materials were
used in vitro. The plant materials were extracted in ethanol and were
added to corn meal agar (CMA) at 5 and 10 µg ml<sup>-1</sup>. The
extracts of alfalfa, garlic, cabbage, and peppermint reduced colony
diameter of P. capsici on corn meal agar between 3.46% and 13.73%,
whereas mycelial growth of P. capsici was increased by onion, radish,
garden cress, and lentil extracts. The plant materials inhibitory to
mycelial growth of P. capsici were incorporated into soil inoculated
with P. capsici, in pots and also in the field, in order to determine
their effects on Phytophthora blight severity. The severity of
Phytophthora blight of pepper was markedly reduced by cabbage, garlic,
and alfalfa materials by 15.3%, 39.8% and 46.9%, respectively, in pot
trials. No significant effect of peppermint on disease severity was
found. In the field infested with P. capsici, disease severity
decreased with cabbage, garlic, and alfalfa by 89.5%, 40%, and 10.7%,
respectively. Peppermint slightly increased the disease severity
(3.4%). In this study, dry cabbage, garlic, and alfalfa materials were
effective in reducing the severity of disease caused by P. capsici, in
both in vitro and in vivo conditions. (English) [ABSTRACT FROM
AUTHOR]
AB- Kurutulmus sar?msak, nane, lahana, mercimek, yonca, sogan, turp ve
tere bitki art?klar?n?n biberde Phytophthora yan?kl?g? (Phytophthora
capsici Leon.)'na etkileri in vitro ve in vivo kosullarda
belirlenmistir, in vitro kosullardaki çal?smalarda bitki
materyallerinin ekstraktlar? kullan?lm?st?r. Bitki materyalleri etil
alkolde ekstrakte edilmis ve m?s?r unu agara 5 ve 10 ug mi4 dozlar?nda
ilave edilmistir. Yonca, sar?msak, lahana ve nane, P. capsici misel
gelisimini % 3.46 ila % 13.73 oran?nda azalt?rken, sogan, turp, tere
ve mercimek ekstraktlar?, misel gelisimini art?rm?st?r. P. capsici nin
misel gelisimine engelleyici etkisi olan bitki materyalleri, biberde
Phytophthora yan?kl?g? hastal?g?n?n siddetine etkilerinin belirlenmesi
amac?yla, içinde P. capsici ile inokule edilmis toprak bulunan saks?
lara ve tarla toprag?na ilave edilmistir. Saks? denemelerinde lahana,
sar?msak ve yonca art?klar? biberde Phytophthora yan?kl?g? hastal?g?n?
n siddetini s?ras?yla %15.3, %39.8 ve %46.9 oran?nda azaltm?st?r.
Nanenin ise hastal?k siddetine önemli bir etkisi olmam?st?r. Tarla
kosullar?nda lahana, sar?msak ve yonca P. capsici nin hastal?k
siddetini s?ras?yla %89.5, % 40 ve %10.7 oran?nda azaltm?s, nane ise
%3.4 oran?nda art?rm?st?r. Bu çal?smada kuru lahana, sar?msak ve yonca
materyalleri in vitro ve in vivo kosullarda P. capsici ye kars? etkili
bulunmustur (Turkish) [ABSTRACT FROM AUTHOR]
AB- Copyright of Turkish Journal of Agriculture & Forestry is the
property of Scientific and Technical Research Council of Turkey and
its content may not be copied or emailed to multiple sites or posted
to a listserv without the copyright holder's express written
permission. However, users may print, download, or email articles for
individual use. This abstract may be abridged. No warranty is given
about the accuracy of the copy. Users should refer to the original
published version of the material for the full abstract. (Copyright
applies to all Abstracts.)
DE- PHYTOPHTHORA diseases
DE- FUNGAL diseases of plants
DE- GARLIC
DE- ALFALFA
SU- PEPPERMINT
KW- pepper
KW- Phytophthora capsici
KW- Phytophthora blight
KW- plant materials
KW- biber
KW- bitkisel materyaller
KW- Phytophthora capsici
KW- Phytophthora yan?kl?g?
LK- English; Turkish
AD- 1University of Ankara, Faculty of Agriculture, Plant Protection
Department, Ankara -- Turkey
IS- 1300011X
AN- 22865585
UR- http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=22865585&site=ehost-live
UR- http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=22865585&site=ehost-live


please reply
 
Joe Kesselman

can we convert it into xml directly?

You could certainly write code to parse this document format and produce
XML output. Of course, you have to decide what XML markup you're going
to use to represent the data, but after doing that it's not
significantly harder to produce XML than to put the data into a database.
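
As a rough illustration of the parsing step, here is a minimal Python sketch. It assumes what the two sample records suggest: each record starts with a "Record: N" line, each field starts with a two-letter tag followed by "- ", and any other non-empty line continues the previous field. The function name parse_records and the SAMPLE text are made up for this example.

```python
import re
from collections import defaultdict

# Assumed line shapes, inferred from the sample records in the question.
TAG_LINE = re.compile(r"^([A-Z]{2})-\s?(.*)$")
RECORD_START = re.compile(r"^Record:\s*\d+\s*$")

SAMPLE = """Record: 1

TI- A Survey of Phytophthora Species on Hainan Island of South China.
AU- Hui-cai Zeng1
AU- Fuy-Cong Zheng3
AB- During the period 1997-2007, a comprehensive study of the
occurrence and distribution of Phytophthora species was conducted.

Record: 2

TI- Effects of Some Plant Materials on Phytophthora Blight
(Phytophthora capsici Leon.) of Pepper.
AU- Dolar, F. Sara1
"""


def parse_records(text):
    """Return one dict per record, mapping each tag to a list of values."""
    records = []
    current = None   # fields of the record being read
    last_tag = None  # tag of the field a continuation line belongs to
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if RECORD_START.match(line):
            current = defaultdict(list)
            records.append(current)
            last_tag = None
            continue
        if current is None:
            continue  # ignore anything before the first record
        match = TAG_LINE.match(line)
        if match:
            last_tag = match.group(1)
            current[last_tag].append(match.group(2).strip())
        elif last_tag is not None:
            # Wrapped line: join it onto the most recent value.
            current[last_tag][-1] += " " + line
    return [dict(record) for record in records]


if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        print(record["TI"][0], "|", "; ".join(record["AU"]))
```

Repeated tags such as AU and KW are kept as lists, so the same dicts could feed either a database insert or an XML writer. A wrapped line that happens to begin with two capital letters and a hyphen would be misread as a new field; the sketch does not guard against that.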

I would recommend taking advantage of existing XML APIs -- DOM or SAX --
which will take care of the details of XML syntax and let you focus on
the structure of the XML document. DOM is probably easier for a beginner
to learn, since it's a tree structure; SAX is event-based and requires
that you maintain more state in your own code, but would be more
efficient for this sort of flow-through format conversion.
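
To make the SAX-style, flow-through idea concrete, here is a small Python sketch that writes XML events as it reads lines, using the standard library's xml.sax.saxutils.XMLGenerator. The element names (records, record, field) and the tag attribute are invented for illustration, not a recommended schema, and the line format is assumed from the sample records in the question.

```python
import io
import re
from xml.sax.saxutils import XMLGenerator

# Assumed field shape: two capital letters, a hyphen, then the value.
TAG_LINE = re.compile(r"^([A-Z]{2})-\s?(.*)$")

SAMPLE = """Record: 2

TI- Effects of Some Plant Materials on Phytophthora Blight
(Phytophthora capsici Leon.) of Pepper.
AU- Dolar, F. Sara1
JN- Turkish Journal of Agriculture & Forestry
"""


def convert(lines, out):
    """Stream tagged text lines to XML on the text stream `out`."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("records", {})
    in_record = False
    tag, parts = None, []  # the only state we carry between lines

    def flush_field():
        # Emit the buffered field, if any, then reset the buffer.
        nonlocal tag, parts
        if tag is not None:
            gen.startElement("field", {"tag": tag})
            gen.characters(" ".join(parts))  # escapes &, < and >
            gen.endElement("field")
        tag, parts = None, []

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Record:"):
            flush_field()
            if in_record:
                gen.endElement("record")
            gen.startElement("record", {})
            in_record = True
            continue
        if not in_record:
            continue  # skip anything before the first record
        match = TAG_LINE.match(line)
        if match:
            flush_field()
            tag, parts = match.group(1), [match.group(2).strip()]
        elif tag is not None:
            parts.append(line)  # wrapped continuation line

    flush_field()
    if in_record:
        gen.endElement("record")
    gen.endElement("records")
    gen.endDocument()


if __name__ == "__main__":
    buffer = io.StringIO()
    convert(SAMPLE.splitlines(), buffer)
    print(buffer.getvalue())
```

Only one field is buffered at a time, so memory use stays flat however many records the file holds. The generator also handles escaping, which matters here because journal names and URLs in the records contain "&".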

There may be off-the-shelf tools which can do the conversion for you,
but I haven't used any... and I'd consider this a fairly trivial bit of
programming so I wouldn't bother looking for one.

You might also want to contact whoever maintains the database you're
trying to access via the web, and see if they already offer a service
which returns the data in XML form rather than text form.
 
Daniel

can we convert it into xml directly?

Have a look at the open source project ServingXML at http://servingxml.sourceforge.net/,
and check out some of the examples in the Examples link. You should
be able to define a resources script to convert these records to XML
easily.

-- Daniel
 
Peter Flynn

ddddddd said:

so is there any other method?

Yes. Use a bibliographic database package like JabRef (free), or EndNote
/ ProCite / ReferenceManager (expensive), or even Zotero (plugin for
Firefox). All of these can export the data in many different formats.

We are downloading these records from some external websites, and all
of them are in the same format. I am sending the literature record
format below; please check it. [...]
TI- A Survey of Phytophthora Species on Hainan Island of South China.
AU- Hui-cai Zeng1
[...]

That looks like RIS format or one of its derivatives. All the
bibliographic databases above should be able to open files of that
format, and you can then export to something like MODS (XML).
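
To give an idea of what a MODS target looks like, here is a hand-written Python sketch that maps one already-parsed record onto a few MODS elements. It is not what JabRef, EndNote or Zotero do internally. The tag-to-element mapping, the function name record_to_mods, and stripping the trailing affiliation digit from author names are all assumptions made for this example; only the MODS namespace and element names come from the MODS schema.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# One record as tag -> list of values, taken from the first sample record.
RECORD = {
    "TI": ["A Survey of Phytophthora Species on Hainan Island of South China."],
    "AU": ["Hui-cai Zeng1", "Fuy-Cong Zheng3"],
    "KW": ["marine isolates", "Phytophthora capsici"],
    "DI": ["10.1111/j.1439-0434.2008.01441.x"],
}


def q(name):
    """Qualify an element name with the MODS namespace."""
    return "{%s}%s" % (MODS_NS, name)


def record_to_mods(record):
    """Build a minimal <mods> element from one parsed record."""
    mods = ET.Element(q("mods"))
    for title in record.get("TI", []):
        info = ET.SubElement(mods, q("titleInfo"))
        ET.SubElement(info, q("title")).text = title
    for author in record.get("AU", []):
        name = ET.SubElement(mods, q("name"), {"type": "personal"})
        # Assumption: a trailing digit is an affiliation marker, not part
        # of the name, so drop it.
        ET.SubElement(name, q("namePart")).text = author.rstrip("0123456789")
    for keyword in record.get("KW", []):
        subject = ET.SubElement(mods, q("subject"))
        ET.SubElement(subject, q("topic")).text = keyword
    for doi in record.get("DI", []):
        ET.SubElement(mods, q("identifier"), {"type": "doi"}).text = doi
    return mods


if __name__ == "__main__":
    print(ET.tostring(record_to_mods(RECORD), encoding="unicode"))
```

A real export would also need to carry the journal, date and abstract, and should be checked against the MODS schema before loading it anywhere.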

///Peter
 
