Regular expression code implementation review between Tcl vs Python

Discussion in 'Python' started by K_Lee, Nov 11, 2003.

  1. K_Lee

    K_Lee Guest

    I documented the regex internal implementation code for both Tcl and Python.

    As much as I like Tcl, I like Python's code much more.
    Tcl's Stub interface to the external commands is confusing to an
    outsider. I still don't get why the stub interface is needed.

    One aspect I don't understand about Python is that the Python
    language itself is object oriented, yet all the internals are implemented
    as C objects. Why not C++ objects?

    Regular Expression (Python)
    initregex - regexmodule.c:646
    regex_global_methods - regexmodule.c:635
    regex_compile - regexmodule.c:410
    regex_symcomp - regexmodule.c:637
    regex_match - regexmodule.c:638
    regex_search - regexmodule.c:639
    re_compile_initialize - regexpr.c:448
    re_compile_pattern - regexpr.h:83

    Regular Expression (Tcl)
    TclRegexp - tclRegexp.h:35
    Tcl_RegExpCompile - tclRegexp.c:141
    CompileRegexp - tclRegexp.c:147
    Tcl_RegExpExec - tclRegexp.c:173
    RegExpExecUniChar - tclRegexp.c:213
    RegExpExecUniChar - tclRegexp.c:291
    K_Lee, Nov 11, 2003

  2. K_Lee

    Ralf Fassel Guest

    Ralf Fassel, Nov 11, 2003

  3. K_Lee

    Helmut Giese Guest

    Simple. Let's create an example.
    If you don't use it, then you have to link your extension against the
    current version of Tcl, say, tcl84.lib. Easy, no problem.

    But tomorrow Tcl 8.5 comes out and you have a problem:
    tcl85.dll is running (used by tclsh or wish) and your extension needs
    tcl84.dll, since (during its linking) you created an unbreakable
    connection between the two. That leaves three options:
    - Stick with the older version of Tcl.
    - Re-compile the extension now linking against Tcl85.lib (and repeat
    for Tcl 8.6, 8.7, etc.)
    - Don't link against Tcl8.x lib but use the 'stub interface'. This
    avoids creating this fixed connection between your extension and a
    particular version of Tcl, and you can use the extension with any
    future version of Tcl and be happy ever after (unless the stub
    interface itself changes, but this will be in a completely different
    time frame- if it should ever happen at all).
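
A minimal sketch of the third option, in plain C with no Tcl headers. Every name here (`core_stubs`, `core84_table`, `core85_table`, `ext_init`, `ext_length_via_core`) is hypothetical and only mimics the shape of the mechanism, not Tcl's actual stub table: the extension never links against a core symbol; it is handed a table of function pointers at load time and calls through it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stub table: the only contract between core and extension. */
typedef struct core_stubs {
    int magic;                                     /* sanity marker checked at init */
    int ver_major, ver_minor;                      /* version of the owning core    */
    size_t (*str_length)(const char *);            /* slot 0 */
    int    (*str_equal)(const char *, const char *); /* slot 1 */
} core_stubs;

#define CORE_STUB_MAGIC 0x5743

static size_t impl_length(const char *s) { return strlen(s); }
static int impl_equal(const char *a, const char *b) { return strcmp(a, b) == 0; }

/* Two "core versions" that export the same table layout. */
const core_stubs core84_table = { CORE_STUB_MAGIC, 8, 4, impl_length, impl_equal };
const core_stubs core85_table = { CORE_STUB_MAGIC, 8, 5, impl_length, impl_equal };

/* Extension side: a single pointer, filled in at load time. */
static const core_stubs *ext_stubs = NULL;

/* Returns 0 on success, -1 if the table is unusable or the core is too old. */
int ext_init(const core_stubs *table, int need_major, int need_minor)
{
    if (table == NULL || table->magic != CORE_STUB_MAGIC) return -1;
    if (table->ver_major != need_major || table->ver_minor < need_minor) return -1;
    ext_stubs = table;
    return 0;
}

/* An extension function that only ever calls through the table. */
size_t ext_length_via_core(const char *s)
{
    assert(ext_stubs != NULL);   /* ext_init must have succeeded first */
    return ext_stubs->str_length(s);
}
```

The same compiled extension code runs against either table, which is the point of the third option: the binding happens when the extension is loaded, not when it is linked.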
    Just my 0.02: I suppose that C++ compilers still differ a lot more on
    different platforms (concerning their conformance to the standard)
    than C compilers do. So, if portability is high on your check list, C
    still is the language of choice - but in the future C++ will catch up.
    Best regards
    Helmut Giese
    Helmut Giese, Nov 11, 2003
  4. K_Lee

    3seas Guest

    hehe, I hit that wall too, as the subject line of the two recent posts
    by them caught my attention. Anyway I went to the main or base of the link..

    the posts are examples of use of their software and really not python
    specific. But if they wrote it in python, it'd likely be more cross
    platform compatible, as far as running the server application.
    3seas, Nov 11, 2003
  5. K_Lee

    lvirden Guest

    According to K_Lee <>:
    :Tcl's Stub interface to the external commands is confusing to
    :outsider. I still don't get why the stub interface is needed.

    Check out the explanation at to understand
    why the stub interface is needed.
    lvirden, Nov 11, 2003
  6. K_Lee wrote:
    Using an OO language doesn't necessarily make implementing another
    OO language any easier, when the object models are mismatched (and
    they almost invariably are). Look at other opensource OO languages,
    such as Ruby, and you'll see they also use C, not C++.

    Actually, a small and carefully selected subset of C++ might surely help,
    _if_ there was consensus on what that subset should be and net of
    the work of hashing that consensus out. But C++ just isn't very much
    in the opensource culture -- C is just much more popular. Eric Raymond's
    book "The Art of Unix Programming" doesn't address the issue directly but
    much of what it says about "Unix culture" extends directly to opensource
    (as he notes, too). There are such values as an admiration for simplicity
    (cfr. C's principle, as per the Rationale of the C Standard, "provide only
    one way to perform an operation", and Python's corresponding "there ought
    to be one, and preferably only one, obvious way to do it") which militate in
    favour of C (which may not have reached simplicity everywhere but surely
    did and does always strive for it) and against C++ (which never gave
    language-simplicity a high priority level among its many design goals); even
    in subcultures that overtly reject such principles (e.g., Perl's) they may
    still have some subconscious-level effect.

    I used to bemoan this back when I was a C++ not-quite-but-close-guru.
    3 years later, with little use of C++ in the meantime, I have forgotten more
    about C++ than most of its practitioners will ever learn... but C is just
    never forgotten, like how to swim or how to ride a bicycle. So, I am not
    any longer so sure if my "bemoaning" was warranted.

    Alex Martelli, Nov 11, 2003
  7. K_Lee

    Rainer Deyke Guest

    Using RAII to automate reference count management would certainly help in
    implementing Python, even if it is a feature that Python itself doesn't
    Rainer Deyke, Nov 11, 2003
  8. Hello Alex,

    you may find this rather unusual but I wanted to say that
    I often read your postings in c.l.python in admiration.

    They usually are very instructive, explanatory, and contain
    a lot of knowledge about Python and things related. A lot can
    be learned from them indeed. I sometimes wonder how you find
    the time to write so many in c.l.p (and usually quite long
    and detailed too)...

    But I am also jealous of the quality of the language used
    in them... Though I'm not a native English speaker (like you,
    right?) I think I master the language fairly well, but
    not nearly as well as you ;-)

    Just wondering and wanted to let you know.

    Irmen de Jong.
    Irmen de Jong, Nov 11, 2003
  9. K_Lee

    K_Lee Guest

    Sorry about that.

    Will work on that. Here's the reason why we need the cookie at
    this point.

    The SDoc is a web document (like blogging) where any number
    of users (currently we have ~800 users at our website) can write on
    any topic in the document. The topics are organized like
    a directory tree. Each user can collapse and expand his/her own
    topics without changing the view of other users.

    For example, we have a lot of internal web-based documents with
    thousands of topics.

    But at this moment, for the TCL document, I really
    only care about regex and not the network subtree. My current view
    has the network part in the collapsed state and the regular expression
    part in the fully expanded state.

    We use cookies to keep track of each topic's state in every document
    for every user, not to keep track of what they view but to reduce
    the overload of information when the topic tree is shown.

    I really didn't want to implement that feature with cookies initially,
    but I had a tough time finding another "right" method to do it. Now I
    think we can implement it by embedding the user id as part of the URL, or
    with ?uid=12345 appended at the end of every URL.
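
A rough sketch of the `?uid=12345` variant in C. The function name `append_uid` and the example paths are assumptions made for illustration; nothing here is taken from the SDoc code. It only shows the one detail that is easy to get wrong, which is choosing `?` or `&` depending on whether the URL already carries a query string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append uid=<id> to url, using '?' when the URL has no query string yet
   and '&' when it does. Returns 0 on success, -1 if out is too small. */
int append_uid(const char *url, unsigned long uid, char *out, size_t out_size)
{
    assert(url != NULL && out != NULL);
    char sep = (strchr(url, '?') != NULL) ? '&' : '?';
    int n = snprintf(out, out_size, "%s%cuid=%lu", url, sep, uid);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

This sketch ignores URL fragments (`#...`) and does no escaping, so it is only suitable for numeric ids on URLs the server generated itself.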

    This will allow users who turn off cookies to browse the page. But if
    the reason for turning off cookies is so the website can't track you,
    it doesn't really solve the problem.

    Either way, any dynamically generated webpage can still track you with
    the id embedded into the URL in every response, even when you turn off
    cookies.

    BTW, I see a lot of other info in the weblog that is far more
    interesting than cookies. Cookies are kind of boring from a web traffic
    analysis point of view.

    Sorry again for the "obstacles".


    K_Lee, Nov 12, 2003
  10. K_Lee

    K_Lee Guest

    Helmut, Thanks for the 0.02. :)

    The "normal" OS dll/.so systems use the dlsym() call to resolve the
    function "string_name" to function pointers. That seems to work for
    upward compatibility in most cases.

    But I kind of understand the argument from the static link library
    point of view; I just think the price is too high for
    such "features". I guess Tcl's original goal was also to support
    Win16, DOS, etc.

    For example, here's Python's code for the regexp method functions.
    They are cleaner and more modular than Tcl's stub table. (My 0.005)

    static struct PyMethodDef regex_global_methods[] = {
        {"compile", regex_compile, METH_VARARGS},
        {"symcomp", regex_symcomp, METH_VARARGS},
        {"match", regex_match, METH_VARARGS},
        {"search", regex_search, METH_VARARGS},
        {"set_syntax", regex_set_syntax, METH_VARARGS},
        {"get_syntax", (PyCFunction)regex_get_syntax, METH_NOARGS},
        {NULL, NULL} /* sentinel */
    };

    Here's the code for TCL Stub:

    Tcl_RegExpCompile, /* 212 */
    Tcl_RegExpExec, /* 213 */
    Tcl_RegExpMatch, /* 214 */
    Tcl_RegExpRange, /* 215 */
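
The two listings differ in what identifies a function: a name in the first, a position in the second. A small self-contained sketch of both shapes follows. The types and functions (`method_def`, `find_method`, `demo_slots`, and so on) are hypothetical stand-ins, not the Python or Tcl definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*method_fn)(const char *arg);

/* Hypothetical entry, shaped like the {name, function, ...} rows above. */
typedef struct method_def {
    const char *name;
    method_fn   fn;
} method_def;

static int demo_compile(const char *arg) { return (int)strlen(arg); }
static int demo_match(const char *arg)   { return arg[0] == 'a'; }

/* Name-keyed table, terminated by a sentinel row. */
const method_def demo_methods[] = {
    { "compile", demo_compile },
    { "match",   demo_match },
    { NULL, NULL }                       /* sentinel */
};

/* Walk the table until the sentinel; return NULL if the name is unknown. */
method_fn find_method(const method_def *table, const char *name)
{
    assert(table != NULL && name != NULL);
    for (; table->name != NULL; table++) {
        if (strcmp(table->name, name) == 0) return table->fn;
    }
    return NULL;
}

/* Slot-indexed table: the position itself is the contract, so existing
   slots can never be reordered, only appended to. */
enum { SLOT_COMPILE = 0, SLOT_MATCH = 1, SLOT_COUNT };
const method_fn demo_slots[SLOT_COUNT] = { demo_compile, demo_match };
```

The name-keyed form is easier to read and extend; the slot form needs no string comparison at call time but depends on the numbering staying fixed, which is presumably why the stub listing carries its slot numbers in comments.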
    K_Lee, Nov 12, 2003
  11. K_Lee

    K_Lee Guest


    It is sad but true.

    One C++ project (was commercial, now open source) that I found
    very interesting to read/browse is Chorus. It is a micro-kernel
    OS where everything is a class/object, including threads, page tables,
    and the scheduler. Very interesting. Unfortunately, it seems to be
    dying when compared to BSD and Linux.

    What's nice about Python's internals is that they are very clean
    and well structured, with an object-oriented mindset.

    K Lee
    K_Lee, Nov 12, 2003
  12. No it wasn't. Given whatever core an extension is loaded into, the
    extension grabs the function table from the core it is loaded into.

    Say for example I build all of Tcl statically into my application (kinda
    dumb, but let's just say). If I try to load an extension into it, where is
    it supposed to get the Tcl functions it needs? Do you see the issue Stubs

    PS. try that setup with BLT and see why Stubs is a life saver for those in
    the know.
    David Gravereaux, Nov 12, 2003
  13. Yes, C++ surely has plenty of features -- including RAII, templates,
    etc -- that might come in useful, quite apart from Python and/or C++
    "being OO". You can see that in extensions written with "Boost
    Python" -- you end up using _FAR_ less boilerplate, both for reference
    counting and other housekeeping and interfacing tasks.
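
To make the "boilerplate" concrete, here is a sketch of manual reference counting in plain C. The `obj` type and the `obj_new`, `obj_decref` and `add_checked` functions are invented for this illustration and are not the CPython API. Each early return has to release whatever was acquired so far, which is the bookkeeping that a C++ destructor (RAII) would perform automatically.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object, not the real PyObject. */
typedef struct obj {
    int refcount;
    int value;
} obj;

int live_objects = 0;              /* lets the sketch detect leaks */

obj *obj_new(int value)
{
    obj *o = malloc(sizeof *o);
    if (o == NULL) return NULL;
    o->refcount = 1;
    o->value = value;
    live_objects++;
    return o;
}

void obj_decref(obj *o)
{
    assert(o != NULL && o->refcount > 0);
    if (--o->refcount == 0) {
        free(o);
        live_objects--;
    }
}

/* Returns a new object holding a + b, or NULL when b is negative.
   Every exit path must release the temporaries by hand. */
obj *add_checked(int a, int b)
{
    obj *left = obj_new(a);
    if (left == NULL) return NULL;

    obj *right = obj_new(b);
    if (right == NULL) {
        obj_decref(left);          /* easy to forget on this path */
        return NULL;
    }

    if (right->value < 0) {
        obj_decref(right);
        obj_decref(left);
        return NULL;
    }

    obj *sum = obj_new(left->value + right->value);
    obj_decref(right);
    obj_decref(left);
    return sum;                    /* caller owns the single reference */
}
```

Three of the function's four exits carry cleanup calls that have nothing to do with the addition itself.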

    However, for the size and speed of the resulting code you end up
    somewhat at the mercy of your C++ compiler, and I'm not sure they're
    anywhere near as optimized as C compilers, yet -- it IS a MUCH larger
    and more complicated language, after all.

    The problems that C++'s size and complication give to its compilers
    (and may, by now, be overcome in the very best compilers -- I do
    believe that after these many years there ARE compilers which do
    claim 100% compliance with the standard, for example, though I don't
    know about quality of optimization) pair up with those given to its
    users -- the price to pay for that wonderful plenty of features, of
    course, and the awesome _potential_ for speed (even though there may
    be yet some time to wait until the full extent of that potential is
    actualized by super optimizers).

    There's a long-standing debate on whether big languages can be
    effectively subsetted, identifying a set of features that can be
    counted on to be solidly and optimally implemented on all the
    relevant compilers and ensuring no features outside that set are
    ever used -- thus allowing programmers to only learn "half" the
    language, or some such fraction. I believe this doesn't work,
    except perhaps in a tightly knit group of Extreme Programmers who
    are really keen to enforce the group-mandated subsetting rules --
    and even then, slight mistakes may end up in "correct" code that
    however strays from the blessed subset and produces symptoms which
    you need complete knowledge of the language to understand.

    So, my opinion is that if an open-source project adopted C++, it
    would basically require contributors to make the effort to learn
    the whole C++ language, not just half of it. _Some_ OS projects,
    such as Mozilla or KDE, appear to thrive on C++, I believe, and
    enforce SOME subsetting effectively. I'm not sure exactly what
    dynamics are at play there, since I'm not a part of those projects.
    Still, C still dominates the opensource scene -- C++ has a much
    smaller presence there.

    Alex Martelli, Nov 12, 2003
  14. K_Lee

    K_Lee Guest

    Ok, David, I understand better now.

    (Putting on my monday morning quarterback hat.)

    If the original design specified that
    * If you load an extension from a .so/dll, you are required to
    load Tcl from a .so/dll as well,
    then we could get rid of the stub interface, right?

    That doesn't sound like an unreasonable requirement for
    people who use Tcl + any extension.

    I still think the Tcl folks did a great job; I'd just like to
    understand the trade-offs vs. features better.

    K Lee
    K_Lee, Nov 12, 2003
  15. Yes, I think so. But still extensions would get locked to a named shared
    library. For example, loading a BLT extension which was linked against
    tcl83.lib (implicit) into tclsh84 will breach and crash. Either way with
    or without Stubs, implicit linking is still available. The EXTERN macro in
    the prototypes exports them all.
    It's real easy to use, though.
    1) compile with -DUSE_TCL_STUBS and link with tclstub8X.(lib|a)
    2) Use this in the extension's exported *_Init as the first call:

    #ifdef USE_TCL_STUBS
    if (Tcl_InitStubs(interp, TCL_VERSION, 0) == NULL) {
        return TCL_ERROR;
    }
    #endif

    Beyond that, you don't need to think about it. You can even extend the
    Stubs concept for applications that embed Tcl. Steps would be:

    0) same as #1 above.
    1) Knowing the path to a certain Tcl dll/so at or greater than the version
    you require, dlopen/LoadLibrary it.
    2) Get the address for Tcl_CreateInterp from the outside with
    3) make the first interp
    4) call Tcl_InitStubs (code from tclstub8x.(lib|a) in us)
    5) call Tcl_FindExecutable (very important!)
    6) Done, function table loaded. Tcl_DeleteInterp or use it... You have
    avoided 880 (or more) GetProcAddress calls to fill your own tables. Before
    Stubs came around, I was guilty of doing that :)
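
The saving described in step 6 can be sketched without Tcl at all. In the code below, `lookup_symbol` is a stand-in for dlsym/GetProcAddress, and `api_table`, `get_api_table`, `load_by_name` and `load_by_table` are hypothetical names chosen for the illustration. The first loader resolves every function by name; the second resolves one bootstrap function and receives the whole table from it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int  (*generic_fn)(int);
typedef void (*any_fn)(void);      /* generic function pointer for lookups */

static int add_one(int x) { return x + 1; }
static int twice(int x)   { return x * 2; }
static int negate(int x)  { return -x; }

/* Hypothetical function table exported by the "library". */
typedef struct api_table {
    generic_fn add_one;
    generic_fn twice;
    generic_fn negate;
} api_table;

static const api_table library_table = { add_one, twice, negate };

/* Bootstrap entry point: the one symbol a host has to find by name. */
static const api_table *get_api_table(void) { return &library_table; }

int lookup_count = 0;              /* counts name resolutions */

/* Stand-in for dlsym/GetProcAddress: resolve an exported name. */
any_fn lookup_symbol(const char *name)
{
    static const struct { const char *name; any_fn fn; } exports[] = {
        { "add_one",       (any_fn)add_one },
        { "twice",         (any_fn)twice },
        { "negate",        (any_fn)negate },
        { "get_api_table", (any_fn)get_api_table },
    };
    lookup_count++;
    for (size_t i = 0; i < sizeof exports / sizeof exports[0]; i++) {
        if (strcmp(exports[i].name, name) == 0) return exports[i].fn;
    }
    return NULL;
}

/* Approach 1: the host fills its own table, one lookup per function. */
int load_by_name(api_table *out)
{
    assert(out != NULL);
    out->add_one = (generic_fn)lookup_symbol("add_one");
    out->twice   = (generic_fn)lookup_symbol("twice");
    out->negate  = (generic_fn)lookup_symbol("negate");
    return (out->add_one && out->twice && out->negate) ? 0 : -1;
}

/* Approach 2: one lookup for the bootstrap function, which returns the table. */
const api_table *load_by_table(void)
{
    typedef const api_table *(*bootstrap_fn)(void);
    bootstrap_fn boot = (bootstrap_fn)lookup_symbol("get_api_table");
    return boot ? boot() : NULL;
}
```

With three functions the difference is three lookups against one; with a table of several hundred entries the by-name approach becomes the long list of GetProcAddress calls mentioned above.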

    Now you have an application that is upgradable from the outside, given that
    the path to the Tcl dll/so is settable or auto-discovered in some manner.
    It gets even better with extensions that export a Stubs table. You can
    build extensions for extensions :) Witness:


    EXTERN int
    Gui_irc_Init (Tcl_Interp *interp)
    {
    #ifdef USE_TCL_STUBS
        if (Tcl_InitStubs(interp, "8.3", 0) == NULL) {
            return TCL_ERROR;
        }
    #endif
    #ifdef USE_ITCL_STUBS
        if (Itcl_InitStubs(interp, "3.1", 0) == NULL) {
            return TCL_ERROR;
        }
    #endif
        new IRCWindowsItclAdapter(interp);
        return TCL_OK;
    }

    Using the services of [Incr Tcl] as well as Tcl, that extension declares
    class methods for this script:

    itcl::class IRC::ui {
    constructor {args} {
    destructor {
    public {
    method destroy {} {itcl::delete object $this}
    method echo {} @irc-ui-echo
    method window {} @irc-ui-window
    method menu {} @irc-ui-menu
    method hotkey {} @irc-ui-hotkey
    method alias {} @irc-ui-alias
    method channel {} @irc-ui-channel
    method query {} @irc-ui-query
    method chat {} @irc-ui-chat
    method queries {} @irc-ui-queries
    method chats {} @irc-ui-chats
    method say {} @irc-ui-say
    method input {} @irc-ui-input
    private {
    method _initUI {} @irc-ui-construct
    method _destroyUI {} @irc-ui-destruct

    The '@irc-ui-*' names refer to Tcl_CmdObjProcs set by the extension with
    Itcl_RegisterObjC(). IMO, Stubs does not stop at the core.
    David Gravereaux, Nov 12, 2003

  16. i thought 2003 was the year of the cookie

    oh wait that was 96
    Ro the Muppet, Nov 12, 2003
  17. Not dumb, sorry. Whoop.

    Starpacks do this and for good reason: to be a single file application
    distribution without dependencies.
    David Gravereaux, Nov 13, 2003
  18. _Some_ OS projects, such as Mozilla or KDE, appear to thrive on C++,
    In the case of Mozilla, some guidelines are available at
    I don't know whether this is still enforced, since this document is 5
    years old. And it shows: no templates allowed (hence no STL), no
    exceptions, no RTTI... Phew! Must be very frustrating to follow, but if
    such is the price of portability...

    Also, using C++ for implementing scripting languages complicates
    interfacing to extensions in the form of dynamic libraries, because of
    the various name-mangling schemes different compilers use. How does
    boost solve that problem, btw?
    Bernard Delmée, Nov 13, 2003
  19. Right -- still, it froze me solid of any inkling of joining the
    mozilla team (I wouldn't mind "no RTTI", but...:).

    A Python extension module just exposes "initmyname", it's not hard
    to define that one as extern "C". And the Python headers are careful
    to always wrap extern "C" { ... } around everything if you use C++.
    So, NP.
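
The header pattern being referred to looks roughly like the sketch below. The function name `initmyname_demo` is made up for the example. A C compiler skips the guarded lines entirely, while a C++ compiler sees the declaration inside `extern "C"` and so emits the symbol without name mangling.

```c
/* Header-style guard: only a C++ compiler sees the extern "C" lines. */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical module entry point; the name is illustrative only. */
int initmyname_demo(void);

#ifdef __cplusplus
}
#endif

static int init_calls = 0;         /* counts calls, for demonstration */

int initmyname_demo(void)
{
    init_calls++;
    return 0;
}
```

Since only this one entry point has to be found by name, the rest of the extension can use whatever mangled C++ symbols its compiler produces.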

    Alex Martelli, Nov 13, 2003