PHP script communicating with C/C++ program

Nimit

Hi, I wasn't sure which forum this post belongs to, so I've posted it
to a couple forums that I thought may be appropriate.

In giving me advice, please consider me a beginner. Below is a synopsis
of my problem/question:

SOME BACKGROUND:
- I am writing a php based web application.
- There is a very data intensive task I need to do that requires
reading and lookup of a lot of data.
- This data is all stored in a database. If I did the computation
directly from php, then the load on the database is too intensive, and
takes an unacceptable amount of time (over 20 secs) to calculate (I've
already tried this).
- So I thought what I need to do is load all this data into memory,
and then do this operation from there, and this should be a lot faster
than doing lookups on a database.

WHAT I WANT TO DO:
- Write a C/C++ program that is constantly running that is dedicated
to doing this data intensive operation.
- This program would read all the data from the dbase into memory upon
load. Then it would take requests from a PHP script to run the
operation on that data and return the results to the PHP script.

THINGS I'VE LOOKED INTO:
- I know you can write extensions to php in C++
(http://bugs.tutorbuddy.com/php5cpp/php5cpp/) but I don't know if this
is what I want to do because (as far as I understand) an extension is
loaded and unloaded with the script. The C++ program I need has to be
constantly running in the background, taking requests and returning
results.

If anyone has any advice of how I can go about writing this c/c++
program that communicates with a php script or doing something similar,
I would appreciate any help. (even if you think I am using the wrong
technology to accomplish this, I'd like to know) Thank you!

PS- my current apache/php/mysql setup is in Windows, however when this
project goes live, I expect it to be running in unix.
 
NC

Nimit said:
In giving me advice, please consider me a beginner. Below is a synopsis
of my problem/question:

SOME BACKGROUND:
- I am writing a php based web application.
- There is a very data intensive task I need to do that requires
reading and lookup of a lot of data.
- This data is all stored in a database. If I did the computation
directly from php, then the load on the database is too intensive, and
takes an unacceptable amount of time (over 20 secs) to calculate (I've
already tried this).
- So I thought what I need to do is load all this data into memory,
and then do this operation from there, and this should be a lot faster
than doing lookups on a database.

WHAT I WANT TO DO:
- Write a C/C++ program that is constantly running that is dedicated
to doing this data intensive operation.
- This program would read all the data from the dbase into memory upon
load. Then it would take requests from a PHP script to run the
operation on that data and return the results to the PHP script.

This program was written a long time ago; it's called MySQL.
MySQL supports HEAP tables (called MEMORY tables in current versions),
which, unlike other table types, are stored in memory, not on disk.
See the MySQL Manual for details:

http://dev.mysql.com/doc/mysql/en/memory-storage-engine.html

Also, there is a good chance that your task can be sped up without
using HEAP tables. Reduce the number of queries (i.e., use one
complicated query instead of a series of simple queries), think
if your indexing is helping as much as it can, and perhaps
consider if a change in data architecture can contribute to
better performance...

Cheers,
NC
 
Csaba Gabor

Nimit said:
WHAT I WANT TO DO:
- Write a C/C++ program that is constantly running that is dedicated
to doing this data intensive operation.
- This program would read all the data from the dbase into memory upon
load. Then it would take requests from a PHP script to run the
operation on that data and return the results to the PHP script.

Here are two ideas that should work on Windows, but not on unix.
However, perhaps it will give you some ideas. Before you try these,
though, I would really double check, maybe even post a question about,
whether you are doing the most complicated computations in the world
that just can't be sped up. The last guy who asked a question like this
turned out to have mixed up references...

On to the meat: What you are asking for is to write a server, something
that will respond to a client's (PHP's) request. This isn't the only
possible approach. You could use polling, for example, or your app could
prepare a file-based cache of relevant data that your PHP either picks
up directly or gets by calling a second app.

COM technology is useful here. The most straightforward thing to do is
to create a COM object which will act as a server. Probably you can
already do this with your C/C++ knowledge. For those who only know VB,
you can use the wonderful, free, and hard to locate Microsoft provided
VB5CCE to create an .OCX that can be used. The OCX or DLL you create
will act as your server - once you've learned how to do it, it's very
easy. Then (since this COM object will be alive in the background),
you will need to get ahold of it with
http://php.net/com_get_active_object. Mind the warning on that page
since you want to use this in exactly the fashion that they warn you
about. But this is my recommended option - I think it has less
learning overhead than option 2 and should be more efficient since it
will have less code.

There is a second path using COM. Create a CLI (command line) php
program that will sit in the background. This php program will bring
up a hidden version of IE (during development you'd better use a
visible version or you'll go nuts) and stay running by means of
com_message_pump. This instance of IE is globally locatable (because
you will give it a unique title). Your server PHP script will use
$oShell=new COM("WScript.Shell") to loop through all the IE windows and
check for a matching title. At that point you could use this IE to
communicate back and forth with your CLI php. You can add your own
custom variables/values from PHP onto IE. But more importantly, you
can use IE to reach into PHP classes and execute PHP code there.
Although I haven't tried it across PHP scripts, I imagine you could use
your server script to execute an IE function which will call into your
CLI php script. Pretty exciting stuff, but such hookup is beyond the
scope of this post and of course it carries the same warning as the
prior method.

I'll be curious to know your final solution,
Csaba Gabor from Vienna
 
ECRIA Public Mail Buffer

Some tasks are simply not suited for development in the form of web
applications. However, with all due respect it is much more likely that you
have not designed your application properly. For example, data lookup is not
a procedure that is limited by the processing speed of PHP; it's the
responsibility of the database (that's what SQL is for).

You may want to read "Advanced PHP Programming" by George Schlossnagle and
"High Performance MySQL" by Jeremy D. Zawodny & Derek J. Balling, both
available at B&N & online at Amazon.com. They deal with optimization
techniques, programming methods and database architecture/query design for
enterprise-level applications. If this is deep water for you, it may be time
to sub-contract. A large web development project is almost guaranteed to
require at least one expert who has experience developing systems on the
same scale.

PHP and MySQL are quite capable of providing a platform for scalable
on-demand high-volume applications, with the right hosting infrastructure,
optimizers and (crucially) forward-thinking application design.

Good luck,

ECRIA
http://www.ecria.com
 
Chung Leong

Nimit said:
THINGS I'VE LOOKED INTO:
- I know you can write extensions to php in C++
(http://bugs.tutorbuddy.com/php5cpp/php5cpp/) but I don't know if this
is what I want to do because (as far as I understand) an extension is
loaded and unloaded with the script. The C++ program I need has to be
constantly running in the background, taking requests and returning
results.

That's not true. An extension is loaded once (when the server starts)
and stays in memory. It's not the place where you'd put your processing
code though, as Apache would spawn multiple copies of itself on Unix.

Nimit said:
If anyone has any advice of how I can go about writing this c/c++
program that communicates with a php script or doing something similar,
I would appreciate any help. (even if you think I am using the wrong
technology to accomplish this, I'd like to know) Thank you!

Write a daemon that listens on a socket for requests and have PHP
communicate with it.
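
A minimal sketch of such a daemon, assuming a POSIX (unix) host, which is where Nimit expects the project to run. The one-line text protocol, the PING command, and all function names are hypothetical, made up for illustration. A real daemon would load its data into memory once at startup and answer lookups against it. The loop serves one client at a time, with no threads, timeouts, or request size limit.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical protocol: the request is one text line. "PING" gets "PONG";
// anything else gets an error line. Real lookups would go here.
std::string handle_request(const std::string& line) {
    std::string cmd = line.substr(0, line.find('\n'));  // keep the first line only
    while (!cmd.empty() && cmd.back() == '\r') cmd.pop_back();
    if (cmd == "PING") return "PONG\n";
    return "ERR unknown command\n";
}

// Create a TCP socket listening on 127.0.0.1:port. Returns -1 on failure.
int make_listener(std::uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Accept connections forever, answering one request per connection.
void serve_forever(int listener) {
    assert(listener >= 0);
    for (;;) {
        int client = accept(listener, nullptr, nullptr);
        if (client < 0) continue;
        std::string line;
        char buf[512];
        while (line.find('\n') == std::string::npos) {
            ssize_t n = recv(client, buf, sizeof buf, 0);
            if (n <= 0) break;  // client closed the connection, or an error
            line.append(buf, static_cast<std::size_t>(n));
        }
        std::string reply = handle_request(line);
        send(client, reply.data(), reply.size(), 0);  // short writes not retried
        close(client);
    }
}
```

A main() would call serve_forever(make_listener(9000)), where 9000 is an arbitrary port. On the PHP side, fsockopen() to connect, fwrite() to send the request line, and fgets() to read the reply should be enough for a protocol this simple.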
 
Nimit

Hi NC. You've helped me with this problem a few months back if you
remember. The post is at: http://tinyurl.com/8d2ro. The query I am
using is pretty much the one you suggested, where the knows table joins
on itself once for each degree of separation we want to calculate.

To fill everyone else in, what I have is a social network (like
friendster) stored in a database. The database has a "knows" table that
has a row for each relation between two people. If you go to the post
I've mentioned above, you will probably get a better understanding of
what exactly my problem is. I didn't mention it in my original post
because I didn't want to make it too long or go off topic, but I guess
it was necessary.

I took your suggestion to try out Heap tables and it slashed my time a
LOT. I can do the 3rd degree search in about 3.75 seconds now, versus
about 17-18 seconds using the normal tables. That is awesome! I think
that is the answer I was looking for.

However, for that 3.75 seconds, my system totally freezes...like,
winamp stops playing, and everything stops responding. This scares me a
bit, because how many of these queries could this system handle at the
same time? Also, my laptop has pretty good specs, pentium M 1.7ghz,
with 1GB of RAM, so I don't think it's the hardware. Any thoughts on
this?

Thanks again to all of you for your help and responses.
 
Csaba Gabor

Nimit said:
Hi NC. You've helped me with this problem a few months back if you
remember. The post is at: http://tinyurl.com/8d2ro. The query I am
using is pretty much the one you suggested, where the knows table joins
on itself once for each degree of separation we want to calculate.

To fill everyone else in, what I have is a social network (like
friendster) stored in a database. The database has a "knows" table that
has a row for each relation between two people. If you go to the post
I've mentioned above, you will probably get a better understanding of
what exactly my problem is. I didn't mention it in my original post
because I didn't want to make it too long or go off topic, but I guess
it was necessary.

I took your suggestion to try out Heap tables and it slashed my time a
LOT. I can do the 3rd degree search in about 3.75 seconds now, versus
about 17-18 seconds using the normal tables. That is awesome! I think
that is the answer I was looking for.

However, for that 3.75 seconds, my system totally freezes...like,
winamp stops playing, and everything stops responding. This scares me a
bit, because how many of these queries could this system handle at the
same time? Also, my laptop has pretty good specs, pentium M 1.7ghz,
with 1GB of RAM, so I don't think it's the hardware. Any thoughts on
this?

As suspected, the real problem lay elsewhere.
Here is an approach that should significantly reduce your load.
People are not going to be updating their contacts that frequently, and
the amount of updating is (in the grand scheme of things) relatively
minor. I.e., once established, the table does not change quickly.

Therefore, have a separate table where for each pair of people you make
a one time computation of their degree of separation (or whatever is
important) and put it in as: Person1, Person2, separationInfo (degree,
otherDetails) indexed on Person1 and Person2, of course. otherDetails
could be a string: a minimal chain to get to the person or whatever
might be interesting to you. At 10^8 pairs, that is a lot of entries,
so you can be happy that you live in an era where 100 GB hard drives can
be had.

But the important point is that updating can be done in the background
with an appropriate message pump so other apps can get their slices of
time (By the way, I'm sure you've come across Dijkstra's algorithm for
these types of computations). At the same time, extraction of the info
you want is now fast because MySQL just has to return a contiguous
block of 10K entries (if you wanted them all), since the info for each
person is grouped in one block. The central idea here is that you've
cached all your possible responses.

But beware that the memory requirements for this solution grow
quadratically with the number of people.

Csaba Gabor from Vienna
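
One way the background computation for the pair table could look, as a sketch only. It assumes the "knows" table has been loaded into a hypothetical adjacency list with people numbered 0..n-1, and that every link counts as one step. Under that assumption a plain breadth-first search finds the same shortest distances Dijkstra's algorithm would, so that is what is used here. The names Graph and degrees_from are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical in-memory form of the "knows" table:
// graph[p] lists everyone that person p knows.
using Graph = std::vector<std::vector<int>>;

// Breadth-first search from `source`. Returns, for every person, the degree
// of separation from `source`, or -1 if not reachable within `max_degree`.
std::vector<int> degrees_from(const Graph& graph, int source, int max_degree) {
    assert(source >= 0 && static_cast<std::size_t>(source) < graph.size());
    std::vector<int> degree(graph.size(), -1);
    std::queue<int> frontier;
    degree[source] = 0;
    frontier.push(source);
    while (!frontier.empty()) {
        int person = frontier.front();
        frontier.pop();
        if (degree[person] == max_degree) continue;  // do not expand further
        for (int other : graph[person]) {
            if (degree[other] == -1) {  // first visit is the shortest path
                degree[other] = degree[person] + 1;
                frontier.push(other);
            }
        }
    }
    return degree;
}
```

Running degrees_from once per person and storing each non-negative entry as a (Person1, Person2, degree) row would fill the table described above.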
 
Steve

Nimit said:
Hi, I wasn't sure which forum this post belongs to, so I've posted it
to a couple forums that I thought may be appropriate.

In giving me advice, please consider me a beginner. Below is a synopsis
of my problem/question:

SOME BACKGROUND:
- I am writing a php based web application.
- There is a very data intensive task I need to do that requires
reading and lookup of a lot of data.
- This data is all stored in a database. If I did the computation
directly from php, then the load on the database is too intensive, and
takes an unacceptable amount of time (over 20 secs) to calculate (I've
already tried this).
- So I thought what I need to do is load all this data into memory,
and then do this operation from there, and this should be a lot faster
than doing lookups on a database.

I think you need to look into how a proper RDBMS manages its data. You'll
find that regularly used data *will* automatically be stored in memory[1].

Try transferring a data set into PostgreSQL (it's available for Windows
if you must), and see whether the performance becomes acceptable.

Failing that, use Oracle :)

Steve
[1] Assuming you've got enough spare!
 
Lisa Pearlson

Everyone here is telling you how to optimize your queries:
look into indexing, caching, stored procedures (MySQL 5?), optimizing
queries and code.

But your question was how to let your script communicate with your c++ code.
The way I understand your question, your options would be:

Execute the program from PHP like this:
exec(), shell_exec(), or ` ` (backtick) quotes.

Alternatively, you can communicate with your C++ code via sockets.

But as suggested by others, I don't think programming C++ will help, because
the bottleneck is not your PHP code, but rather running the queries against
the database.
 
Nimit

Lisa said:
Everyone here is telling you how to optimize your queries:
look into indexing, caching, stored procedures (MySQL 5?), optimizing
queries and code.

But your question was how to let your script communicate with your c++ code.
The way I understand your question, your options would be:

Execute the program from PHP like this:
exec(), shell_exec(), or ` ` (backtick) quotes.

Alternatively, you can communicate with your C++ code via sockets.

But as suggested by others, I don't think programming C++ will help, because
the bottleneck is not your PHP code, but rather running the queries against
the database.

Hi Lisa, you're right, the bottleneck is not PHP. What I was planning
to do with C++ however was to have a dedicated program on a server that
would first load the entire person network into memory, and then do the
calculations on that network in memory, which would be a lot faster
than making thousands of SQL queries for every calculation.

I kind of solved this problem with NC's suggestion of using Heap tables,
as those are stored in memory. The performance in these tables wasn't
optimal either, but it was workable. I will be looking into caching
schemes of some sort (such as perhaps caching all 2nd degree friends)
or something to speed it up more. I will post about those when I come
up with something. Thanks for your response :)
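
A rough sketch of what building a cache of 2nd degree friends could involve, assuming the network is already in memory as a hypothetical adjacency list with people numbered 0..n-1. The names Graph and second_degree_friends are invented for illustration and do not come from anything in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory form of the "knows" table:
// graph[p] lists everyone that person p knows.
using Graph = std::vector<std::vector<int>>;

// Returns, in ascending order, everyone exactly two steps away from
// `person`: friends of friends who are neither the person nor a direct friend.
std::vector<int> second_degree_friends(const Graph& graph, int person) {
    assert(person >= 0 && static_cast<std::size_t>(person) < graph.size());
    std::vector<char> excluded(graph.size(), 0);
    excluded[person] = 1;
    for (int direct : graph[person]) excluded[direct] = 1;

    std::vector<int> result;
    for (int direct : graph[person]) {
        for (int candidate : graph[direct]) {
            if (!excluded[candidate]) {
                excluded[candidate] = 1;  // record each person only once
                result.push_back(candidate);
            }
        }
    }
    std::sort(result.begin(), result.end());
    return result;
}
```

Storing this list per person would let a 3rd degree search start from the cached 2nd degree set and expand only one more step.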
 
