python daemon - compress data and load data into MySQL with pyodbc

MacRules

Hi,

I installed the python-daemon and pyodbc modules to access the back-end
DB server.

My setup is like this:

load data job -> Python Daemon A, port 6000 -> Python Daemon B, port
7000 -> MySQL

Daemon A will compress the data (e.g., with gzip) and send it over to
Daemon B. Daemon B will decompress the data (gunzip) and insert the
records into MySQL, MSSQL, or Oracle.
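
Roughly, the receive-and-load half could start as something like this
minimal sketch. Everything concrete here is an assumption made for
illustration: a 4-byte length prefix followed by a zlib-compressed blob
of tab-separated rows, and a hypothetical ODBC DSN named "MyDSN".

    # Minimal sketch of Daemon B: receive compressed rows, decompress,
    # insert. The framing protocol, DSN, table, and columns are placeholders.
    import socketserver
    import struct
    import zlib

    import pyodbc


    class LoadHandler(socketserver.StreamRequestHandler):
        def handle(self):
            header = self.rfile.read(4)              # 4-byte length prefix
            if len(header) < 4:
                return
            (size,) = struct.unpack("!I", header)
            blob = self.rfile.read(size)             # compressed payload
            text = zlib.decompress(blob).decode("utf-8")
            rows = [line.split("\t") for line in text.splitlines() if line]
            conn = pyodbc.connect("DSN=MyDSN;UID=user;PWD=secret")
            try:
                cur = conn.cursor()
                cur.executemany(
                    "INSERT INTO mytable (col1, col2) VALUES (?, ?)", rows)
                conn.commit()
            finally:
                conn.close()


    if __name__ == "__main__":
        # Daemon B listens on port 7000, as in the diagram above.
        server = socketserver.TCPServer(("0.0.0.0", 7000), LoadHandler)
        server.serve_forever()

Daemon A's side would be the mirror image: zlib.compress() the rows,
struct.pack("!I", len(blob)) for the prefix, and write both to a socket
connected to port 7000.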

Where should I start coding this?
Can someone give me a hint, in as much detail as possible?

I am a python newbie.

Thanks for all the help I can get,
 
Sean DiZazzo

MacRules said:
Where should I start coding this? Can someone give me a hint, in as
much detail as possible? I am a python newbie.

Start by reading the tutorial. http://docs.python.org/tutorial/

~Sean
 
David Smith

MacRules said:
Are you a Python expert?
Can you show me the basic code to get a daemon (pass-through insert
data) up and inserting into the back-end MySQL?

You've asked a rather large and non-specific question. What avenues
have you explored so far? Can you describe the problem this is designed
to solve?

--David
 
MacRules

Ben said:
This group has many Python experts, and even more people who can no
doubt help you if you are able to articulate what you need help with.

To help dispel a possible misunderstanding: Don't expect to have a
one-on-one conversation with a single devoted correspondent here. We
prefer to have these discussions in a manner open for contributions from
everyone, to maximise the benefit for future readers and the chance of
correcting errors.


This doesn't make a whole lot of sense to me; it sounds like a
conflation of two separate problems that would be best tackled
separately. Can you re-phrase it, perhaps by describing the problem
you're actually trying to solve?

On the topic of implementing a daemon process in Python, there is the
‘python-daemon’ library <URL:http://pypi.python.org/pypi/python-daemon>
which I hope you will find useful. If you can describe what you're
actually trying to do, perhaps it will be clearer whether this library
is a good fit.

What I am looking for is this.

Oracle DB in data center 1 (LA, west coast)
MSSQL DB in data center 2 (DC, east coast)
Network bandwidth is an issue, so I prefer to gzip first and then
deliver the data.

I need two Python daemons, or a web service, here in the future.
I will enter the Oracle table name, user ID, and password.
So the task is to dump out the Oracle data (on Linux) and insert it
into MSSQL.


I can try first with one Python daemon: take the Oracle data file, let
the daemon connect to MSSQL (with pyodbc), and load the data in.
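
A first cut of that single-daemon loader might look like this sketch.
The DSN, file name, table, and column names are hypothetical, and it
assumes the dump is tab-separated text:

    # Minimal sketch: read a dumped Oracle data file and load it into MSSQL
    # through pyodbc. All concrete names here are placeholders.
    import csv

    import pyodbc

    conn = pyodbc.connect("DSN=MssqlDsn;UID=loader;PWD=secret")
    cur = conn.cursor()

    with open("oracle_dump.tsv", newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) >= 1000:              # insert in chunks
                cur.executemany(
                    "INSERT INTO target_table (c1, c2, c3) VALUES (?, ?, ?)",
                    batch)
                batch = []
        if batch:
            cur.executemany(
                "INSERT INTO target_table (c1, c2, c3) VALUES (?, ?, ?)",
                batch)

    conn.commit()
    conn.close()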
 
MacRules

Martin said:
MacRules wrote:


If bandwidth is really an issue, you should send compressed deltas.

That is assuming the tables are the same and the columns have the same
types with the same restrictions; and don't even get me started on
encoding differences.

I think you are underestimating your task. I would recommend
'designing' your application first using a UML-like approach; whether
you need one, two, or a bazillion daemons should not be a design start
but a consequence of the design.

Anyway, here is a summary and some further pointers to take into
account in your design:
- Can I do deltas? (If yes, how do I calculate them? See the sketch
after this list.)
- Are the tables comparable in design? Specifically, if there is a
limit on character width, do Oracle and MS-SQL think the same about
newlines (one character versus two)?
- How about encoding; are you sure it works out right?
- Is atomicity an issue?
- Is latency an issue, i.e. how long may the tables be out of sync?
- Can it be manual, or is it preferably automatic? How about a trigger
in the database that sets the sync going; if that is too much burden,
how about a trigger that sets a flag on disk and a scheduled job that
reads that flag first?
- Is security an issue; do I need an encrypted channel?
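
On the delta question, one common approach is to keep a small index of
row hashes from the last sync and compare against it. A hypothetical
sketch (the key layout and hashing scheme are illustrative only):

    # Hypothetical sketch of row-level delta detection: compare a saved
    # snapshot of (primary key -> row digest) against the current rows.
    import hashlib


    def row_digest(row):
        """Stable hash of one row (a tuple of column values)."""
        joined = "\x1f".join("" if v is None else str(v) for v in row)
        return hashlib.sha1(joined.encode("utf-8")).hexdigest()


    def compute_delta(old_index, current_rows, key_column=0):
        """old_index maps primary key -> digest from the last sync.
        Returns (changed_rows, deleted_keys, new_index)."""
        new_index = {}
        changed = []
        for row in current_rows:
            key = row[key_column]
            digest = row_digest(row)
            new_index[key] = digest
            if old_index.get(key) != digest:
                changed.append(row)     # new or modified since last sync
        deleted = [k for k in old_index if k not in new_index]
        return changed, deleted, new_index

Only the changed rows (and deletions) then need to be compressed and
shipped across the wire.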

In the past I wrote a program that had the same starting point as
yours. Over time it grew into a data-warehouse push/pull central server
with all the other databases as agencies. The only reason I wrote it
was that the more standard approaches, like Business Objects Data
Integrator, were just way too expensive, and Oracle's solutions didn't
play nice with PostgreSQL (though over time this issue seems to have
been resolved).

You understand my issue clearly. For now, I am only looking at a
one-time event; I do not need push/pull yet.

Something with good push/pull + compression + security + real time in
mind could sell for decent money.
 
Dennis Lee Bieber

MacRules said:
Oracle DB in data center 1 (LA, west coast)
MSSQL DB in data center 2 (DC, east coast)

Note that your thread subject line states MySQL... There is a big
difference between MySQL and M$ SQL-Server.

MacRules said:
Network bandwidth is an issue, so I prefer to gzip first and then
deliver the data.

I need two Python daemons, or a web service, here in the future.
I will enter the Oracle table name, user ID, and password.
So the task is to dump out the Oracle data (on Linux) and insert it
into MSSQL.
One: daemon implies something running in the background, stand-alone,
and not associated with any "user" type account. IOW, something like a
server program...

Do you really intend to create a server application that sits around
waiting for anyone to connect to it and send a properly formatted
request? What safeguards do you intend to put in place to keep this
daemon from responding to hackers (after all, you're going to have to
have an open port to accept requests from outside)?

Second: said daemon will have to be running ON the end with the Oracle
DBMS (or, at least, within its LAN, which presumably has higher
bandwidth than the long-distance connection).

MacRules said:
I can try first with one Python daemon: take the Oracle data file, let
the daemon connect to MSSQL (with pyodbc), and load the data in.

I see three operations here:

1. dumping the data from Oracle
2. transferring the data from LA to DC
3. importing the data into whatever DBMS you are really using

I believe, in a later post, you mention this is a one-time task. If
so, WHY worry about network bandwidth? Just let it run overnight...

Question: does the Oracle server permit remote connections?
(dangerous unless behind a VPN type connection)

If it does, I'd suggest the initial program should just open a
connection to both the Oracle server and the local server. Request the
contents of whatever needs to be transferred, collect the data, massage
it into whatever format is needed for the local one, and insert it
there... Anything else will require being able to log into the remote
machine and run programs ON it, in which case you could just FTP the
data from one end to the other (assuming there is an FTP server active
on one end or the other, or a web-server address you can stuff the
files into and use HTTP to fetch them).
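
In code, that direct fetch-and-insert approach might look roughly like
this sketch. The connection strings, table, and column names are
hypothetical, and cx_Oracle is just one common choice of Oracle driver:

    # Sketch of the direct approach: one connection to each end, fetch in
    # batches, massage, insert locally. All concrete names are placeholders.
    import cx_Oracle
    import pyodbc

    src = cx_Oracle.connect("scott/tiger@la_datacenter_dsn")  # remote Oracle
    dst = pyodbc.connect("DSN=LocalServer;UID=loader;PWD=secret")

    read = src.cursor()
    write = dst.cursor()
    read.execute("SELECT c1, c2, c3 FROM source_table")

    while True:
        rows = read.fetchmany(1000)         # batch to keep memory bounded
        if not rows:
            break
        # Any "massage" step (types, encodings, newlines) goes here.
        write.executemany(
            "INSERT INTO target_table (c1, c2, c3) VALUES (?, ?, ?)", rows)

    dst.commit()
    src.close()
    dst.close()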
 
MacRules

Dennis said:
If it does, I'd suggest the initial program should just open a
connection to both the Oracle server and the local server. Request the
contents of whatever needs to be transferred, collect the data, massage
it into whatever format is needed for the local one, and insert it
there...

I have 1000 servers, one time each, so it is 1000 runs. What I am
trying to say is that I do not need real-time push/pull here.

I know the manual operations; I just want Python to do them
automatically for me.

I would like to see good examples of how to code a daemon, signal
handling, and maybe multi-threading.

I wrote something like that in C way back, and I am good at shell
scripts.
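
For the daemon and signal parts, the python-daemon library Ben pointed
to covers both. A minimal sketch, where run_sync() is a hypothetical
stand-in for the actual transfer work and the paths are made up:

    # Minimal sketch of daemonizing with the python-daemon library and
    # mapping signals to handlers. run_sync() is a hypothetical worker.
    import signal

    import daemon


    def handle_sigterm(signum, frame):
        raise SystemExit(0)                  # clean shutdown on SIGTERM


    def run_sync():
        ...                                  # the actual transfer work


    context = daemon.DaemonContext(
        working_directory="/var/lib/mysync",     # hypothetical path
        signal_map={
            signal.SIGTERM: handle_sigterm,
            signal.SIGHUP: None,                 # ignore SIGHUP (SIG_IGN)
        },
    )

    with context:
        run_sync()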
 
 
Dennis Lee Bieber

MacRules said:
I have 1000 servers, one time each, so it is 1000 runs. What I am
trying to say is that I do not need real-time push/pull here.

So you are now saying you want access to install a "one time use"
daemon on EACH of 1000 servers (all in LA? What is this -- a
consolidation of systems installed in individual 7-11 stores?), just so
some remote host can connect to the daemon's open port, request data,
wait while the daemon queries the DBMS and packets the data for
transfer [and you are going to have to write a protocol to transfer
this data too], and then go back and delete the daemon process and code
from each of the 1000 servers...

Somehow I think THAT is going to take more time than just doing a
direct connection to the database itself: fetch the records,
disconnect, insert into the local DBMS.

MacRules said:
I know the manual operations; I just want Python to do them
automatically for me.

Well... First, itemize exactly what those steps are... then code an
algorithm that does just that... then modify it to take a list of
servers and repeat the process on each.
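
Skeleton-wise, that last step is just a driver loop. A sketch, where
transfer_one() is the hypothetical per-server routine and servers.txt a
hypothetical file of connection strings, one per line:

    # Sketch: itemized steps wrapped in one function, driven from a list
    # of servers. All names and the file format are placeholders.
    def transfer_one(conn_string):
        """Dump from one source server and load into the local DBMS."""
        ...                              # connect, SELECT, massage, INSERT


    with open("servers.txt") as f:
        servers = [line.strip() for line in f if line.strip()]

    for conn_string in servers:
        try:
            transfer_one(conn_string)
        except Exception as exc:
            # Log and keep going; with 1000 servers, some runs will fail.
            print("FAILED %s: %s" % (conn_string, exc))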

If you're hoping to get multiple processes pulling data in parallel,
and inserting it all into a single DBMS, you're going to run into the
possibility of transactional conflicts and the related need to do
rollbacks and restarts of transactions.
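
A hedged sketch of that retry pattern, reusing the hypothetical
transfer_one() and servers from the previous sketch; real code would
check the driver's specific deadlock or serialization error codes
instead of retrying on any exception:

    # Sketch: parallel transfers with rollback-and-retry on transactional
    # conflicts. transfer_one() and servers are the placeholders defined
    # in the previous sketch.
    import time
    from concurrent.futures import ThreadPoolExecutor

    MAX_RETRIES = 3


    def transfer_with_retry(conn_string):
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                transfer_one(conn_string)    # commits, or raises on conflict
                return
            except Exception:
                if attempt == MAX_RETRIES:
                    raise
                time.sleep(2 ** attempt)     # back off before retrying


    with ThreadPoolExecutor(max_workers=8) as pool:
        for conn_string in servers:
            pool.submit(transfer_with_retry, conn_string)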
 
