What is the best HTML to LaTeX program on the market or the internet?

Discussion in 'HTML' started by vasan999@hotmail.com, Oct 22, 2007.

  1. Guest

    Basically, it should do everything that any of the tools below do and,
    in addition:

    1/
    Human-readable output that maintains the text lines of the source, i.e.
    does not scramble the text lines, insert newlines unnecessarily, or
    remove them. It should insert minimal LaTeX elements.

    2/
    Maintains cross-links, i.e. converts href= to \ref and name= to \label.

    But if the set of HTML files is incomplete, it should proceed on the
    assumption that the reference exists, i.e. not delete the links or try
    to modify them or their addresses. One of the tools I tested is too
    smart in this respect and actually ruins the result.

    3/
    Proper conversion of images, tables, etc. There is no math in the HTML,
    so no math mode is involved.

    4/
    Even an Emacs Lisp function would do, if a guru could write one that
    does the job.

    5/
    Is there any commercial WYSIWYG tool?


    LaTeX etc

    * html2latex is a program based on the NCSA HTML parser. Contact: .
    * Another html2latex can combine several HTML files into a single
    LaTeX file, converting links between the files to references. External
    URLs can be converted into footnotes or into a bibliography sorted by
    URL. Contact: (Frans J. Faase)
    * Another html2latex, implemented on Linux with yacc+lex+C. Also
    available from the TSX-11 Linux FTP site as nc-html2latex-0.97.tar.gz.
    Contact: (Naoya Tozuka)
    * htmlatex.pl is a Perl script to do the conversion (may be moving
    soon). Contact: (Jake Kesinger)
    * There is also a sed script to convert HTML into LaTeX.
     
    , Oct 22, 2007
    #1

  2. Guest

    The site says that this will convert HTML to LaTeX. Can anyone explain
    this code to me? I am not familiar with such difficult commands,
    especially since there are no comments giving a line-by-line
    explanation or describing the overall operation.

    1i\
    \\documentstyle{article}
    1i\
    \\begin{document}
    $a\
    \\end{document}
    # Too bad there's no way to make sed ignore case!
    /<[Xx][Mm][Pp]>/,/<.[Xx][Mm][Pp]>/b lit
    /<.[Xx][Mm][Pp]>/b lit
    /<[Ll][Ii][Ss][Tt][Ii][Nn][Gg]>/,/<.[Ll][Ii][Ss][Tt][Ii][Nn][Gg]>/b lit
    /<.[Ll][Ii][Ss][Tt][Ii][Nn][Gg]>/b lit
    /<[Pp][Rr][Ee]>/,/<.[Pp][Rr][Ee]>/b pre
    /<.[Pp][Rr][Ee]>/b pre
    # Stuff to ignore
    s?<[Ii][Ss][Ii][Nn][Dd][Ee][Xx]>??
    s?</[Aa][Dd][Dd][Rr][Ee][Ss][Ss]>??g
    s?<[Nn][Ee][Xx][Tt][Ii][Dd][^>]*>??g
    # character set translations for LaTex special chars
    s?&gt.?>?g
    s?&lt.?<?g
    s?\\?\\backslash ?g
    s?{?\\{?g
    s?}?\\}?g
    s?%?\\%?g
    s?\$?\\$?g
    s?&?\\&?g
    s?#?\\#?g
    s?_?\\_?g
    s?~?\\~?g
    s?\^?\\^?g
    # Paragraph borders
    s?<[Pp]>?\\par ?g
    s?</[Pp]>??g
    # Headings
    s?<[Tt][Ii][Tt][Ll][Ee]>\([^<]*\)</[Tt][Ii][Tt][Ll][Ee]>?\
    \\section*{\1}?g
    s?<[Hh]n>?\\part{?g
    s?</[Hh]n>?}?g
    s?<[Hh]1>?\\section*{?g
    s?</[Hh][0-9]>?}?g
    s?<[Hh]2>?\\subsection*{?g
    s?<[Hh]3>?\\subsubsection*{?g
    s?<[Hh]4>?\\subsubsection*{?g
    s?<[Hh]5>?\\paragraph{?g
    s?<[Hh]6>?\\subparagraph{?g
    # UL is itemize
    s?<[Uu][Ll]>?\\begin{itemize}?g
    s?</[Uu][Ll]>?\\end{itemize}?g
    s?<[Ll][Ii]>?\\item ?g
    # DL is description
    s?<[Dd][Ll]>?\\begin{description}?g
    s?</[Dd][Ll]>?\\end{description}?g
    # closing delimiter for DT is first < or end of line whichever comes first NO
    #s?<[Dd][Tt]>\([^<]*\)<?\\item[\1]<?g
    #s?<[Dd][Tt]>\([^<]*\)$?\\item[\1]?g
    #s?<[Dd][Dd]>??g
    s?<[Dd][Tt]>?\\item[<?g
    s?<[Dd][Dd]>?]?g
    # Other common SGML markup. this is ad-hoc
    s?<sec[ab]>??
    s?</sec[ab]>??g
    # Italics
    s?<it>\([^<]*\)</it>?{\\it \1 }?g
    # Get rid of Anchors
    :pre
    s?<[Aa][^>]*>??g
    s?</[Aa]>??g
    # This is a subroutine in sed, in case you are not a sed guru
    : lit
    s?<[Xx][Mm][Pp]>?\\begin{verbatim}?g
    s?</[Xx][Mm][Pp]>?\\end{verbatim}?
    s?<[Ll][Ii][Ss][Tt][Ii][Nn][Gg]>?\\begin{verbatim}?g
    s?</[Ll][Ii][Ss][Tt][Ii][Nn][Gg]>?\\end{verbatim}?


     
    , Oct 23, 2007
    #2

  3. Guest

    Maybe I should also post in the European TeX groups.

     
    , Oct 23, 2007
    #3
  4. Edd Barrett Guest

    On Oct 23, 2:26 am, wrote:
    > maybe I should post in european tex groups also

    Hi,

    I don't know if this can be of help:
    http://openwetware.org/wiki/User:Austin_J._Che/Extensions/LatexDoc

    This is something that we are looking into to allow researchers to
    distribute documents in both PDF and web-based form (we hope).

    Thanks

    Edd
     
    Edd Barrett, Oct 23, 2007
    #4
  5. metaperl.com Guest

    I like PlasTeX.SF.Net

    > Basically, it should do all that any of the tools below and in
    > addition,
     
    metaperl.com, Oct 23, 2007
    #5
  6. Guest

    On Oct 23, 11:13 am, "metaperl.com" <> wrote:
    > I like PlasTeX.SF.Net
    >
    > > Basically, it should do all that any of the tools below and in
    > > addition,


    I think the OP wanted HTML -> LaTeX, not the reverse.

    http://plastex.sourceforge.net/

    SAS is currently using plasTeX to generate HTML and DocBook for
    10,000+ pages of scientific documentation nightly.
     
    , Oct 23, 2007
    #6
  7. Peter Flynn Guest

    wrote:
    > The site says, that this will convert html to latex. Can anyone
    > explain me this code? I am not familiar with such difficult commands
    > especially there are no comments line by line explanation and overall
    > operation.
    >
    > 1i\
    > \\documentstyle{article}

    [snip]

    This is a sed(1) script. sed is a stream editor, available on most
    platforms.
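
    As a rough illustration of what the "character set translations" part
    of that script is doing, here is a Python sketch of the same idea. It is
    not a port of the script: the name escape_latex is made up for this
    example, and it swaps in \textbackslash{}, \textasciitilde{} and
    \textasciicircum{} where the sed script writes \backslash, \~ and \^
    (the latter two are accent commands in LaTeX). It also does the escaping
    in a single pass, so backslashes added by one replacement are never
    escaped again by a later one.

```python
import html
import re

# Characters LaTeX treats specially, mapped to text-mode replacements.
# (Assumption: the text is ordinary running text, not math or verbatim.)
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "%": r"\%",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "_": r"\_",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}
_SPECIALS_RE = re.compile("[" + re.escape("".join(LATEX_SPECIALS)) + "]")


def escape_latex(text: str) -> str:
    """Decode HTML entities, then escape LaTeX specials in one pass."""
    decoded = html.unescape(text)  # &lt; -> <, &amp; -> &, and so on
    return _SPECIALS_RE.sub(lambda m: LATEX_SPECIALS[m.group(0)], decoded)


print(escape_latex("50% of A&amp;B_c"))  # 50\% of A\&B\_c
```

    The rest of the sed script follows the same pattern: one substitution
    per HTML tag, replacing it with the nearest LaTeX command.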

    ///Peter
     
    Peter Flynn, Oct 23, 2007
    #7
  8. Peter Flynn Guest

    wrote:
    > Basically, it should do all that any of the tools below and in
    > addition,


    You've already asked this, and been given the answer, but in case you
    didn't see it...

    XSLT.

    Run your HTML through Tidy to produce XHTML.
    Then write an XSLT script to transform it to LaTeX.
    This gives you 100% control and ensures robustness.

    However, handling all the stupid things HTML authors do may make it
    long-winded if you want to cope with them all. On the other hand, if
    you are dealing with a reasonably consistent subset, it's probably the
    most reliable method.
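
    To give a feel for the second step, here is a minimal sketch of the
    "one rule per element" idea. It is written in Python with the standard
    xml.etree.ElementTree module rather than in XSLT, so it is a stand-in
    for the technique, not an XSLT stylesheet. It assumes the input is
    already well-formed XHTML (for example, Tidy output) with no named
    entities such as &nbsp;. The RULES table and the function names are
    invented for this example, and LaTeX special characters are not escaped.

```python
import xml.etree.ElementTree as ET

# One rule per element, in the spirit of one XSLT template per element:
# (text emitted before the children, text emitted after them).
RULES = {
    "h1": ("\\section*{", "}\n"),
    "h2": ("\\subsection*{", "}\n"),
    "p": ("", "\n\n"),
    "em": ("\\emph{", "}"),
    "ul": ("\\begin{itemize}\n", "\\end{itemize}\n"),
    "li": ("\\item ", "\n"),
}


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix such as the XHTML one."""
    return tag.rsplit("}", 1)[-1]


def to_latex(elem: ET.Element) -> str:
    """Render an element recursively; unknown elements pass their content through."""
    before, after = RULES.get(local_name(elem.tag), ("", ""))
    parts = [before, elem.text or ""]
    for child in elem:
        parts.append(to_latex(child))
        parts.append(child.tail or "")  # text that follows the child element
    parts.append(after)
    return "".join(parts)


xhtml = (
    '<body xmlns="http://www.w3.org/1999/xhtml">'
    "<h1>Title</h1><p>Some <em>marked</em> text.</p>"
    "<ul><li>one</li><li>two</li></ul></body>"
)
print(to_latex(ET.fromstring(xhtml)))
```

    Elements without a rule simply pass their text through, which is one
    way to keep the output readable when the HTML contains markup you have
    not written a rule for yet.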

    ///Peter
     
    Peter Flynn, Oct 23, 2007
    #8
  9. Guest

    forgot to cc to myself.
    Janusz
     
    , Oct 24, 2007
    #9
  10. Victor Ivrii Guest

    On Oct 23, 6:27 pm, Peter Flynn <> wrote:
    > wrote:
    > > Basically, it should do all that any of the tools below and in
    > > addition,

    >
    > You've already asked this, and been given the answer, but in case you
    > didn't see it...
    >
    > XSLT.
    >
    > Run your HTML through Tidy to produce XHTML.
    > Then write an XSLT script to transform it to LaTeX.
    > This gives you 100% control and ensures robustness.
    >
    > However, handling all the stupid things HTML authors do may make it
    > long-winded if you want to cope with them all. On the other hand, if
    > you are dealing with a reasonably consistent subset, it's probably the
    > most reliable method.


    One should remember that while a TeX engine (tex, latex, ...) can run
    in quiet mode, that is not the default, so a finished TeX document
    normally contains no TeX errors. By contrast, few HTML parsers (web
    browsers) even report errors. As a result, the vast majority of HTML
    sources contain errors, from a few to a few hundred (the latter is
    usually the case with commercial web pages produced by community
    college graduates, who check their pages only against a specific
    version of MSIE). Converting such HTML sources into error-free TeX
    ones therefore seems really daunting.




     
    Victor Ivrii, Oct 24, 2007
    #10
  11. tsy Guest

    On Oct 24, 5:27 am, Peter Flynn <> wrote:
    > wrote:
    > Run your HTML through Tidy to produce XHTML.
    > Then write an XSLT script to transform it to LaTeX.
    > This gives you 100% control and ensures robustness.

    Is XSLT way easier than using a decent scripting language with a SAX
    library?
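
    For comparison, a minimal sketch of what the SAX route might look like,
    using Python's standard xml.sax module. It assumes well-formed XHTML
    input (such as Tidy output). The class name LatexHandler and the
    function xhtml_to_latex are made up for this example, only <a> and
    <em> are handled, and LaTeX special characters are not escaped.
    Internal links become \ref and named anchors become \label, whether or
    not the target is present in the input.

```python
import xml.sax


class LatexHandler(xml.sax.ContentHandler):
    """Collect LaTeX output from SAX events, one callback per tag or text run."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.pending = []  # text to emit when the matching end tag arrives

    def startElement(self, name, attrs):
        closing = ""
        if name == "a":
            if "name" in attrs:
                self.out.append("\\label{%s}" % attrs["name"])
            href = attrs.get("href", "")
            if href.startswith("#"):
                # Internal link: always emit \ref, even if the target is elsewhere.
                closing = " (\\ref{%s})" % href[1:]
        elif name == "em":
            self.out.append("\\emph{")
            closing = "}"
        self.pending.append(closing)

    def endElement(self, name):
        self.out.append(self.pending.pop())

    def characters(self, content):
        self.out.append(content)


def xhtml_to_latex(source: str) -> str:
    """Parse an XHTML string and return the LaTeX text collected by the handler."""
    handler = LatexHandler()
    xml.sax.parseString(source.encode("utf-8"), handler)
    return "".join(handler.out)


sample = '<p><a name="intro">Intro</a>. See <a href="#intro">the <em>intro</em></a>.</p>'
print(xhtml_to_latex(sample))
```

    The handler has to keep its own stack to know what to emit at each end
    tag, which is the bookkeeping a template-based language does for you.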
     
    tsy, Oct 24, 2007
    #11
  12. Peter Flynn Guest

    On Wed, 24 Oct 2007 08:21:29 -0700, tsy wrote:

    > On Oct 24, 5:27 am, Peter Flynn <> wrote:
    >> wrote:
    >> Run your HTML through Tidy to produce XHTML. Then write an XSLT script
    >> to transform it to LaTeX. This gives you 100% control and ensures
    >> robustness.

    > Is XSLT way easier than using a decent scripting language with a SAX
    > library?


    Yes. XSLT *is* a decent scripting (well, transformation-to-other-formats)
    language.

    ///Peter
     
    Peter Flynn, Oct 27, 2007
    #12