Regular expression implementation code review: Tcl vs. Python

K_Lee

I documented the regex internal implementation code for both Tcl and Python.

As much as I like Tcl, I like Python's code much more.
Tcl's Stub interface to the external commands is confusing to an
outsider. I still don't get why the stub interface is needed.

One aspect I don't understand about Python is that the
language itself is object oriented, yet all the internals are implemented
as C objects. Why not C++ objects?



http://www.slink-software.com/W/SrcDoc_Top/Python-2.3/Python-2.3.sdoc/N_97

Regular Expression
initregex - regexmodule.c:646
regex_global_methods - regexmodule.c:635
regex_compile - regexmodule.c:410
regex_symcomp - regexmodule.c:637
regex_match - regexmodule.c:638
regex_search - regexmodule.c:639
re_compile_initialize - regexpr.c:448
re_compile_pattern - regexpr.h:83


http://www.slink-software.com/W/SrcDoc_Top/tcl8.4.4/tcl8.4.4.sdoc/N_92

Regular Expression
TclRegexp - tclRegexp.h:35
Tcl_RegExpCompile - tclRegexp.c:141
CompileRegexp - tclRegexp.c:147
Tcl_RegExpExec - tclRegexp.c:173
RegExpExecUniChar - tclRegexp.c:213
RegExpExecUniChar - tclRegexp.c:291
 
Helmut Giese

K_Lee said:
I documented the regex internal implementation code for both Tcl and Python.

As much as I like Tcl, I like Python's code much more.
Tcl's Stub interface to the external commands is confusing to an
outsider. I still don't get why the stub interface is needed.

Simple. Let's create an example.
If you don't use it, then you have to link your extension against the
current version of Tcl, say, tcl84.lib. Easy, no problem.

But tomorrow Tcl 8.5 comes out and you have a problem:
tcl85.dll is running (used by tclsh or wish) and your extension needs
tcl84.dll, since (during its linking) you created an unbreakable
connection between the two.
Solutions:
- Stick with the older version of Tcl.
- Re-compile the extension, now linking against tcl85.lib (and repeat
for Tcl 8.6, 8.7, etc.)
- Don't link against the Tcl 8.x lib but use the 'stub interface'. This
avoids creating this fixed connection between your extension and a
particular version of Tcl, and you can use the extension with any
future version of Tcl and be happy ever after (unless the stub
interface itself changes, but this will be in a completely different
time frame, if it should ever happen at all).
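
To make the third option concrete, here is a minimal sketch in plain C of the idea behind a stub table. It does not use the real Tcl headers: `CoreStubs`, `core_get_stubs`, `ext_init`, and `ext_name_length` are hypothetical names invented for illustration, and the version numbers are made up. The assumption is only that the running core hands the extension a table of function pointers at load time, so the extension never binds by name to one particular Tcl library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a core's stub table: a versioned struct of
 * function pointers. In this sketch new entries would only be appended,
 * so an older extension keeps finding the slots it knows. */
typedef struct CoreStubs {
    int      version;                                   /* e.g. 85 for "8.5" */
    size_t (*str_length)(const char *s);                /* slot 0 */
    int    (*str_equal)(const char *a, const char *b);  /* slot 1 */
} CoreStubs;

static size_t core_str_length(const char *s) { return strlen(s); }
static int core_str_equal(const char *a, const char *b) { return strcmp(a, b) == 0; }

/* The "core" side: whichever core is running exposes its own table. */
static const CoreStubs core_table = { 85, core_str_length, core_str_equal };
const CoreStubs *core_get_stubs(void) { return &core_table; }

/* The "extension" side: it keeps the pointer it was given at load time. */
static const CoreStubs *stubs = NULL;

/* Accept any core that is at least min_version.
 * Returns 0 on success, -1 on failure. */
int ext_init(const CoreStubs *from_core, int min_version)
{
    if (from_core == NULL || from_core->version < min_version) {
        return -1;
    }
    stubs = from_core;
    return 0;
}

/* Every call into the core goes through the table, not a linked symbol. */
size_t ext_name_length(const char *name)
{
    assert(stubs != NULL); /* ext_init must have succeeded first */
    return stubs->str_length(name);
}
```

An extension built against "8.4" in this sketch would call `ext_init(core_get_stubs(), 84)` and keep working when the core reports 85, which is the upgrade scenario described above.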

K_Lee said:
One aspect I don't understand about Python is that the
language itself is object oriented, yet all the internals are implemented
as C objects. Why not C++ objects?

Just my 0.02: I suppose that C++ compilers still differ a lot more on
different platforms (concerning their conformance to the standard)
than C compilers do. So, if portability is high on your checklist, C
still is the language of choice - but in the future C++ will catch up
(IMHO).
Best regards
Helmut Giese
 
3seas

Ralf said:
* (e-mail address removed) (K_Lee)
| http://www.slink-software.com/W/SrcDoc_Top/tcl8.4.4/tcl8.4.4.sdoc/N_92

<quote>
If you still like to use this webbase application, please enable
the cookie in your browser and try again.
</quote>

You bet I won't. If you want people to read your stuff, let them read
it with no obstacles.

R'

Hehe, I hit that wall too, as the subject lines of their two recent posts
caught my attention. Anyway, I went to the base of the link.

The posts are examples of use of their software and not really Python
specific. But if they wrote it in Python, it'd likely be more cross-platform
compatible, as far as running the server application goes.
 
Alex Martelli

K_Lee wrote:
...
One aspect I don't understand about Python is that the
language itself is object oriented, yet all the internals are implemented
as C objects. Why not C++ objects?

Using an OO language doesn't necessarily make implementing another
OO language any easier, when the object models are mismatched (and
they almost invariably are). Look at other opensource OO languages,
such as Ruby, and you'll see they also use C, not C++.

Actually, a small and carefully selected subset of C++ might surely help,
_if_ there was consensus on what that subset should be and net of
the work of hashing that consensus out. But C++ just isn't very much
in the opensource culture -- C is just much more popular. Eric Raymond's
book "The Art of Unix Programming" doesn't address the issue directly but
much of what it says about "Unix culture" extends directly to opensource
(as he notes, too). There are such values as an admiration for simplicity
(cfr. C's principle, as per the Rationale of the C Standard, "provide only
one way to perform an operation", and Python's corresponding "there ought
to be one, and preferably only one, obvious way to do it") which militate in
favour of C (which may not have reached simplicity everywhere but surely
did and does always strive for it) and against C++ (which never gave
language-simplicity a high priority level among its many design goals); even
in subcultures that overtly reject such principles (e.g., Perl's) they may
still have some subconscious-level effect.

I used to bemoan this back when I was a C++ not-quite-but-close-guru.
3 years later, with little use of C++ in the meantime, I have forgotten more
about C++ than most of its practitioners will ever learn... but C is just
never forgotten, like how to swim or how to ride a bicycle. So, I am not
any longer so sure if my "bemoaning" was warranted.


Alex
 
Rainer Deyke

Alex said:
Using an OO language doesn't necessarily make implementing another
OO language any easier, when the object models are mismatched (and
they almost invariably are).

Using RAII to automate reference count management would certainly help in
implementing Python, even if it is a feature that Python itself doesn't
have.
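
For readers who have not written C against a reference-counted runtime, here is a rough sketch of the bookkeeping that RAII would automate. It is plain C with a hypothetical `Obj` type and hypothetical `obj_new`/`obj_incref`/`obj_decref` helpers, not CPython's actual API. The point is that every exit path has to release what it acquired by hand, which a C++ smart-pointer destructor would do implicitly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
typedef struct Obj {
    int refcount;
    int value;
} Obj;

static int live_objects = 0; /* leak detector for the sketch */

Obj *obj_new(int value)
{
    Obj *o = malloc(sizeof *o);
    if (o == NULL) return NULL;
    o->refcount = 1;
    o->value = value;
    live_objects++;
    return o;
}

void obj_incref(Obj *o) { o->refcount++; }

void obj_decref(Obj *o)
{
    assert(o->refcount > 0);
    if (--o->refcount == 0) {
        live_objects--;
        free(o);
    }
}

int obj_live_count(void) { return live_objects; }

/* Returns a new reference holding a + b, or NULL on error.
 * Negative b is treated as an error purely to show an early exit. */
Obj *obj_add_checked(int a, int b)
{
    Obj *x = NULL, *y = NULL, *result = NULL;

    x = obj_new(a);
    if (x == NULL) goto done;
    y = obj_new(b);
    if (y == NULL) goto done;
    if (b < 0) goto done;              /* illustrative error path */
    result = obj_new(x->value + y->value);

done:
    /* Manual cleanup: forgetting either line leaks a reference. */
    if (x != NULL) obj_decref(x);
    if (y != NULL) obj_decref(y);
    return result;
}
```

The `goto done` pattern is one common C idiom for this; with RAII the two `obj_decref` calls at the end would disappear into destructors.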
 
Irmen de Jong

Hello Alex,

you may find this rather unusual but I wanted to say that
I often read your postings in c.l.python in admiration.

They usually are very instructive, explanatory, and contain
a lot of knowledge about Python and things related. A lot can
be learned from them indeed. I sometimes wonder how you find
the time to write so many in c.l.p (and usually quite long
and detailed too)...

But I am also jealous of the quality of the language used
in them... Though I'm not a native English speaker (like you,
right?) I think I master the language fairly well, but
not nearly as well as you ;-)

Just wondering and wanted to let you know.

Regards,
Irmen de Jong.
(NL)
 
K_Lee

Ralf Fassel said:
* (e-mail address removed) (K_Lee)
| http://www.slink-software.com/W/SrcDoc_Top/tcl8.4.4/tcl8.4.4.sdoc/N_92

<quote>
If you still like to use this webbase application, please enable
the cookie in your browser and try again.
</quote>

You bet I won't. If you want people to read your stuff, let them read
it with no obstacles.

R'

Sorry about that.

Will work on that. Here's the reason why we need the cookie at
this point.

The SDoc is a web document (like blogging) where any number
of users (currently we have ~800 users at our website) can write and
read any topics in the document. The topics are organized like
a directory tree. Each user can collapse and expand his/her own
topics without changing the view of other users.

For example, we have a lot of internal web-based documents with
thousands of topics.

But at this moment, for the Tcl document, I really
only care about regex and not the network subtree. My current view
has the network part in a collapsed state and the regular expression part
in a fully expanded state.

We use the cookie to keep track of each topic's state in every document
for every user, not to keep track of what they view but to help
minimize the overload of information when the topic tree is shown.


I really didn't want to implement that feature with cookies initially,
but I had a tough time finding another "right" method to do it. Now I
believe we can implement it by embedding the user id as part of the URL, or
by appending ?uid=12345 at the end of every URL.
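
A rough sketch of the URL-rewriting idea, in C. `append_uid` is a hypothetical helper written for this illustration, not code from SDoc. It appends `uid=<id>` using `?` or `&` depending on whether the URL already carries a query string; it assumes the URL has no `#fragment` part.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes url plus a uid parameter into out (capacity out_size).
 * Returns 0 on success, -1 if the result would not fit. */
int append_uid(const char *url, unsigned long uid, char *out, size_t out_size)
{
    assert(url != NULL && out != NULL);

    /* Use '&' if a query string is already present, otherwise start one. */
    char sep = (strchr(url, '?') != NULL) ? '&' : '?';

    int n = snprintf(out, out_size, "%s%cuid=%lu", url, sep, uid);
    if (n < 0 || (size_t)n >= out_size) {
        return -1; /* encoding error or truncated output */
    }
    return 0;
}
```

Every link the server emits would have to pass through such a helper, which is why the id ends up in every response.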

This will allow users who turn off cookies to browse the page. But if
the reason for turning off cookies is so the website can't track you,
it doesn't really solve the problem.

Either way, any dynamically generated web page can still track you with
the id embedded into the URL in every response, even when you turn off
cookies.

BTW, I see a lot of other info in the web log that is far more
interesting than cookies. Cookies are kind of boring from a
web traffic analysis point of view.


Sorry again for the "obstacles".

:)

K_Lee
 
K_Lee

Helmut Giese said:
Simple. Let's create an example.
If you don't use it, then you have to link your extension against the
current version of Tcl, say, tcl84.lib. Easy, no problem.

But tomorrow Tcl 8.5 comes out and you have the problem, that
tcl85.dll is running (used by tclsh or wish) and your extension needs
tcl84.dll, since (during its linking) you created an un-breakable
connection between the two.
Solutions:
- Stick with the older version of Tcl.
- Re-compile the extension now linking against Tcl85.lib (and repeat
for Tcl 8.6, 8.7, etc.)
- Don't link against Tcl8.x lib but use the 'stub interface'. This
avoids creating this fixed connection between your extension and a
particular version of Tcl, and you can use the extension with any
future version of Tcl and be happy ever after (unless the stub
interface itself changes, but this will be in a completely different
time frame- if it should ever happen at all).

Just my 0.02: I suppose that C++ compilers still differ a lot more on
different platforms (concerning their conformance to the standard)
than C compilers do. So, if portability is high on your check list, C
still is the language of choice - but in the future C++ will catch up
(IMHO).
Best regards
Helmut Giese

Helmut, thanks for the 0.02. :)

The "normal" OS DLL/.so systems use a dlsym() call to resolve the
function "string_name" to a function pointer. That seems to work for
upward compatibility in most cases.

But I kind of understand the argument from the static link library
point of view; I just think the price is too high for
such "features". I guess the TCL original goal was also to support
Win16, DOS, etc.

For example, here's Python's code for the regex method function
pointers.
They are cleaner and more modular than Tcl's stub table. (My 0.005)

http://www.slink-software.com/W/SL_...FILE_Modules/regexmodule.c/L_635/LN_635#L_632


static struct PyMethodDef regex_global_methods[] = {
    {"compile", regex_compile, METH_VARARGS},
    {"symcomp", regex_symcomp, METH_VARARGS},
    {"match", regex_match, METH_VARARGS},
    {"search", regex_search, METH_VARARGS},
    {"set_syntax", regex_set_syntax, METH_VARARGS},
    {"get_syntax", (PyCFunction)regex_get_syntax, METH_NOARGS},
    {NULL, NULL} /* sentinel */
};
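
To show how a table like this can be consumed, here is a minimal sketch that does not depend on the Python headers. `MethodDef`, `MethodFn`, `find_method`, and the two toy handlers are hypothetical stand-ins, not Python's implementation. The only idea carried over is that a NULL-terminated array of name/function pairs can be searched by name at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified analogue of a name -> function table. */
typedef int (*MethodFn)(const char *pattern, const char *text);

typedef struct MethodDef {
    const char *name;
    MethodFn    fn;
} MethodDef;

/* Toy handlers: "match" anchors at the start, "search" looks anywhere.
 * Both treat the pattern as a literal string, not a real regex. */
static int toy_match(const char *pattern, const char *text)
{
    return strncmp(text, pattern, strlen(pattern)) == 0;
}

static int toy_search(const char *pattern, const char *text)
{
    return strstr(text, pattern) != NULL;
}

static const MethodDef toy_methods[] = {
    {"match",  toy_match},
    {"search", toy_search},
    {NULL,     NULL}            /* sentinel */
};

const MethodDef *toy_method_table(void) { return toy_methods; }

/* Walk the table until the sentinel; return NULL if the name is unknown. */
MethodFn find_method(const MethodDef *table, const char *name)
{
    assert(table != NULL && name != NULL);
    for (; table->name != NULL; table++) {
        if (strcmp(table->name, name) == 0) {
            return table->fn;
        }
    }
    return NULL;
}
```

Because lookup is by name, entries can be added or reordered without disturbing callers, at the cost of a string comparison per lookup.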


Here's the code for TCL Stub:
http://www.slink-software.com/W/SL_...FILE_generic/tclStubInit.c/L_650/LN_647#L_644

....
Tcl_RegExpCompile, /* 212 */
Tcl_RegExpExec, /* 213 */
Tcl_RegExpMatch, /* 214 */
Tcl_RegExpRange, /* 215 */
....
 
K_Lee

Alex Martelli said:
K_Lee wrote:
...

Using an OO language doesn't necessarily make implementing another
OO language any easier, when the object models are mismatched (and
they almost invariably are). Look at other opensource OO languages,
such as Ruby, and you'll see they also use C, not C++.

Actually, a small and carefully selected subset of C++ might surely help,
_if_ there was consensus on what that subset should be and net of
the work of hashing that consensus out. But C++ just isn't very much
in the opensource culture -- C is just much more popular. Eric Raymond's
book "The Art of Unix Programming" doesn't address the issue directly but
much of what it says about "Unix culture" extends directly to opensource
(as he notes, too). There are such values as an admiration for simplicity
(cfr. C's principle, as per the Rationale of the C Standard, "provide only
one way to perform an operation", and Python's corresponding "there ought
to be one, and preferably only one, obvious way to do it") which militate in
favour of C (which may not have reached simplicity everywhere but surely
did and does always strive for it) and against C++ (which never gave
language-simplicity a high priority level among its many design goals); even
in subcultures that overtly reject such principles (e.g., Perl's) they may
still have some subconscious-level effect.

I used to bemoan this back when I was a C++ not-quite-but-close-guru.
3 years later, with little use of C++ in the meantime, I have forgotten more
about C++ than most of its practitioners will ever learn... but C is just
never forgotten, like how to swim or how to ride a bicycle. So, I am not
any longer so sure if my "bemoaning" was warranted.


Alex

Alex,

It is sad but true.

One C++ project (formerly commercial, now open source) that I found
very interesting to read/browse is Chorus. It is a micro-kernel
OS where everything is a class/object, including threads, page tables, and the scheduler.
Unfortunately, it seems to be dying when compared to
BSD and Linux.

http://www.slink-software.com/W/SrcDoc_Top/chorus_c5/chorus_c5.sdoc/N_59


What's nice about Python's internals is that they are very clean
and well structured, with an object-oriented mindset.

K Lee
 
David Gravereaux

I guess the TCL original goal was also to support
Win16, DOS, etc.

No it wasn't. Given whatever core an extension is loaded into, the
extension grabs the function table from the core it is loaded into.

Say for example I build all of Tcl statically into my application (kinda
dumb, but let's just say). If I try to load an extension into it, where is
it supposed to get the Tcl functions it needs? Do you see the issue Stubs
solves?

PS. try that setup with BLT and see why Stubs is a life saver for those in
the know.
 
Alex Martelli

Rainer said:
Using RAII to automate reference count management would certainly help in
implementing Python, even if it is a feature that Python itself doesn't
have.

Yes, C++ surely has plenty of features -- including RAII, templates,
etc -- that might come in useful, quite apart from Python and/or C++
"being OO". You can see that in extensions written with "Boost
Python" -- you end up using _FAR_ less boilerplate, both for reference
counting and other housekeeping and interfacing tasks.

However, for the size and speed of the resulting code you end up
somewhat at the mercy of your C++ compiler, and I'm not sure they're
anywhere near as optimized as C compilers, yet -- it IS a MUCH larger
and more complicated language, after all.

The problems that C++'s size and complication give to its compilers
(and may, by now, be overcome in the very best compilers -- I do
believe that after these many years there ARE compilers which do
claim 100% compliance with the standard, for example, though I don't
know about quality of optimization) pair up with those given to its
users -- the price to pay for that wonderful plenty of features, of
course, and the awesome _potential_ for speed (even though there may
be yet some time to wait until the full extent of that potential is
actualized by super optimizers).

There's a long-standing debate on whether big languages can be
effectively subsetted, identifying a set of features that can be
counted on to be solidly and optimally implemented on all the
relevant compilers and ensuring no features outside that set are
ever used -- thus allowing programmers to only learn "half" the
language, or some such fraction. I believe this doesn't work,
except perhaps in a tightly knit group of Extreme Programmers who
are really keen to enforce the group-mandated subsetting rules --
and even then, slight mistakes may end up in "correct" code that
however strays from the blessed subset and produces symptoms which
you need complete knowledge of the language to understand.

So, my opinion is that if an open-source project adopted C++, it
would basically require contributors to make the effort to learn
the whole C++ language, not just half of it. _Some_ OS projects,
such as Mozilla or KDE, appear to thrive on C++, I believe, and
enforce SOME subsetting effectively. I'm not sure exactly what
dynamics are at play there, since I'm not a part of those projects.
Even so, C still dominates the opensource scene -- C++ has a much
smaller presence there.


Alex
 
K_Lee

David Gravereaux said:
No it wasn't. Given whatever core an extension is loaded into, the
extension grabs the function table from the core it is loaded into.

Say for example I build all of Tcl statically into my application (kinda
dumb, but let's just say). If I try to load an extension into it, where is
it supposed to get the Tcl functions it needs? Do you see the issue Stubs
solves?


Ok, David, I understand better now.

(Putting on my monday morning quarterback hat.)

If the original design specified that
* if you load an extension from a .so/dll, you are required to
load Tcl from a .so/dll,
then we could get rid of the stub interface, right?

That doesn't sound like an unreasonable requirement for
people who use any Tcl + extension.


I still think the Tcl folks did a great job; I'd just like to understand the
trade-offs vs. features better.


K Lee
 
David Gravereaux

Ok, David, I understand better now.

(Putting on my monday morning quarterback hat.)

If the original design specifies that
* If you load extension with .so/dll, you are required
load tcl from .so/dll.
then we can get rid the stub interface, right?

Yes, I think so. But still extensions would get locked to a named shared
library. For example, loading a BLT extension which was linked against
tcl83.lib (implicit) into tclsh84 will breach and crash. Either way with
or without Stubs, implicit linking is still available. The EXTERN macro in
the prototypes exports them all.
That doesn't sound like an unreasonable requirement for
people who use any tcl + extension.

It's real easy to use, though.
1) compile with -DUSE_TCL_STUBS and link with tclstub8X.(lib|a)
2) Use this in the extension's exported *_Init as the first call:

#ifdef USE_TCL_STUBS
    if (Tcl_InitStubs(interp, TCL_VERSION, 0) == NULL) {
        return TCL_ERROR;
    }
#endif

Beyond that, you don't need to think about it. You can even extend the
Stubs concept for applications that embed Tcl. Steps would be:

0) same as #1 above.
1) Knowing the path to a certain Tcl dll/so at or greater than the version
you require, dlopen/LoadLibrary it.
2) Get the address for Tcl_CreateInterp from the outside with
dlsym/GetProcAddress
3) make the first interp
4) call Tcl_InitStubs (code from tclstub8x.(lib|a) in us)
5) call Tcl_FindExecutable (very important!)
6) Done, function table loaded. Tcl_DeleteInterp or use it... You have
avoided 880 (or more) GetProcAddress calls to fill your own tables. Before
Stubs came around, I was guilty of doing that :)

Now you have an application that is upgradable from the outside, given that
the path to the Tcl dll/so is settable or auto-discovered in some manner.
I still think Tcl'folks did a great jobs, just like to know the
trade off vs. features better.

It gets even better with extensions that export a Stubs table. You can
build extensions for extensions :) Witness:

#undef TCL_STORAGE_CLASS
#define TCL_STORAGE_CLASS DLLEXPORT

EXTERN int
Gui_irc_Init (Tcl_Interp *interp)
{
#ifdef USE_TCL_STUBS
    if (Tcl_InitStubs(interp, "8.3", 0) == NULL) {
        return TCL_ERROR;
    }
#endif
#ifdef USE_ITCL_STUBS
    if (Itcl_InitStubs(interp, "3.1", 0) == NULL) {
        return TCL_ERROR;
    }
#endif
    new IRCWindowsItclAdapter(interp);
    return TCL_OK;
}

Using the services of [Incr Tcl] as well as Tcl, that extension declares
class methods for this script:

itcl::class IRC::ui {
    constructor {args} {
        _initUI
    }
    destructor {
        _destroyUI
    }
    public {
        method destroy {} {itcl::delete object $this}
        method echo {} @irc-ui-echo
        method window {} @irc-ui-window
        method menu {} @irc-ui-menu
        method hotkey {} @irc-ui-hotkey
        method alias {} @irc-ui-alias
        method channel {} @irc-ui-channel
        method query {} @irc-ui-query
        method chat {} @irc-ui-chat
        method queries {} @irc-ui-queries
        method chats {} @irc-ui-chats
        method say {} @irc-ui-say
        method input {} @irc-ui-input
    }
    private {
        method _initUI {} @irc-ui-construct
        method _destroyUI {} @irc-ui-destruct
    }
}

The '@irc-ui-*' names refer to Tcl_CmdObjProcs set by the extension with
Itcl_RegisterObjC(). IMO, Stubs does not stop at the core.
 
David Gravereaux

David Gravereaux said:
Say for example I build all of Tcl statically into my application (kinda
dumb, but let's just say).

Not dumb, sorry. Whoop.

Starpacks do this, and for good reason: to be a single-file application
distribution without dependencies.
 
Bernard Delmée

Alex Martelli said:
_Some_ OS projects, such as Mozilla or KDE, appear to thrive on C++,
I believe, and enforce SOME subsetting effectively.

In the case of Mozilla, some guidelines are available at
http://www.mozilla.org/hacking/portable-cpp.html
I don't know whether this is still enforced, since this document is 5
years old. And it shows: no templates allowed (hence no STL), no
exceptions, no RTTI... Phew! Must be very frustrating to follow, but if
such is the price of portability...

Also, using C++ for implementing scripting languages complicates
interfacing to extensions in the form of dynamic libraries, because of
the various name-mangling schemes different compilers use. How does
boost solve that problem, btw?
 
Alex Martelli

Bernard said:
In the case of Mozilla, some guidelines are available at
http://www.mozilla.org/hacking/portable-cpp.html
I don't know whether this is still enforced, since this document is 5
years old. And it shows: no templates allowed (hence no STL), no
exceptions, no RTTI... Phew! Must be very frustrating to follow, but if
such is the price of portability...

Right -- still, it froze me solid of any inkling of joining the
mozilla team (I wouldn't mind "no RTTI", but...:).

Also, using C++ for implementing scripting languages complicates
interfacing to extensions in the form of dynamic libraries, because of
the various name-mangling schemes different compilers use. How does
boost solve that problem, btw?

A Python extension module just exposes "initmyname", it's not hard
to define that one as extern "C". And the Python headers are careful
to always wrap extern "C" { ... } around everything if you use C++.
So, NP.
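
A sketch of the guard pattern being described, written so that it compiles as C and would also compile as C++. The names `initmyname` and `myname_initialized` are hypothetical, and nothing here is copied from the Python headers. It only illustrates how wrapping declarations in `extern "C"` gives them C linkage, so the exported symbol name is not mangled.

```c
#include <assert.h>

/* Header part: when a C++ compiler sees this, the declarations get C
 * linkage, so the exported name stays "initmyname" regardless of the
 * compiler's name-mangling scheme. A C compiler skips the guard lines. */
#ifdef __cplusplus
extern "C" {
#endif

void initmyname(void);
int  myname_initialized(void);

#ifdef __cplusplus
}
#endif

/* Implementation part: a stand-in for real module setup. */
static int init_count = 0;

void initmyname(void)
{
    init_count++;
}

int myname_initialized(void)
{
    assert(init_count >= 0);
    return init_count > 0;
}
```

A loader only needs to look up the single unmangled entry point by name; everything else the module does can use whatever C++ features it likes internally.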


Alex
 
