Python, MS SQL, and batch inserts

ericwoodworth

Hi,
I have a Python script I'm writing that grabs some data from a
COM object, does a little formatting, and then inserts that data into
a MS SQL 2003 DB. Because I'm using COM objects I'm importing
win32com.client. That also allows me to use ADODB.Connection and
ADODB.Command objects for working with SQL.

The program works fine but it's a little slow. Inserting ~5500
rows of data takes about 10 seconds using a DB that is on the same
machine running the script.

I've done some general searches on how to speed this up, and in
other languages people suggest sending batches of inserts off at a
time instead of executing one insert at a time. For Java and .NET,
people recommend using a StringBuilder class to create strings
quickly. I don't know of such a function in Python, so I tried grouping
my inserts into a single string using string += syntax. I knew that
would be terrible but I wanted to see how terrible. Final results: it
was pretty terrible. The script went from taking ~18 sec to taking
240 sec. The overhead of recreating the strings was monstrous. No real
surprise there.

So I then loaded the commands into a list and at the end I
used the string join() method to create the string. This was far faster
than using += to create my strings, but it still took twice as long as
just running my inserts one at a time. So I'm looking for
suggestions.
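For reference, the append-then-join idiom looks roughly like this (the table, columns, and 'rows' variable are invented for illustration):

    # += copies the whole string on every pass (quadratic cost);
    # appending to a list and joining once at the end is linear.
    statements = []
    for a, b in rows:
        statements.append(
            "INSERT INTO mytable (a, b) VALUES ('%s', '%s')" % (a, b))
    big_sql = ";\n".join(statements)    # one join at the end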

Basically I have 5000 SQL inserts that I want to do as quickly as
possible. This is purely academic, as I can live with the 18 seconds
the script needs to run (9 to talk to the COM object and format the
data and 10 to write to SQL), but I'm still curious how to improve on
what I have running.

Thanks in advance for any help,
Eric
 
Philip Semanchuk

Are you sure your logjam is in Python? Inserting 5500 rows can take a
few seconds if you're COMMITting after each INSERT. Wrap the whole
thing in an explicit transaction and see if that helps.
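With raw ADO through win32com, that would look something like this (the connection string and variable names are guesses, not the OP's actual code):

    import win32com.client

    conn = win32com.client.Dispatch("ADODB.Connection")
    conn.Open("Provider=SQLOLEDB;Data Source=.;"
              "Initial Catalog=mydb;Integrated Security=SSPI")

    conn.BeginTrans()                  # one transaction for the whole batch
    try:
        for sql in insert_statements:  # the ~5500 INSERT strings
            conn.Execute(sql)
        conn.CommitTrans()             # single commit at the end
    except Exception:
        conn.RollbackTrans()           # undo everything on any failure
        raise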

Also, toss in a few print statements containing timestamps so you know
more about where the script is spending time.
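Something as simple as this works (a sketch, with the interesting steps stubbed out):

    import time

    t0 = time.time()
    # ... fetch and format the COM data ...
    print("formatted: %.3f sec" % (time.time() - t0))
    # ... run the INSERTs ...
    print("inserted:  %.3f sec" % (time.time() - t0))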


bye
Philip
 
ericwoodworth

I'm not 100% sure it's Python and not SQL, but I do suspect there's a
better way to do this than just serial inserts. I could be wrong
about that, which is what I'm trying to explore.

I already use timestamps, and this is what I see:
at 9 secs in, I've gotten my data, formatted it, and placed it on the
list
at 9.047 secs in, the string.join() is done and I have my command
string
at 35 secs, the program ends. So somehow my SQL is taking a lot longer
when I format it as a single string.

How would I make the whole thing one transaction? Just insert BEGIN
TRANSACTION at the start and COMMIT at the end? Is that enough to do
it?
 
Philip Semanchuk

> I'm not 100% sure it's Python and not SQL, but I do suspect there's a
> better way to do this than just serial inserts. I could be wrong
> about that, which is what I'm trying to explore.

I don't think the SQL standard provides a way to do bulk inserts and
as a result most DB vendors have extended the standard to address this
common need (e.g. the COPY command in Postgres).

If you're doing a mass insert to populate a blank table it also often
helps to postpone index creation until after the table is populated.
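A sketch of that, reusing the ADO connection from the earlier sketch (the index and table names are invented):

    # Drop (or skip creating) secondary indexes, load the rows,
    # then build the indexes once over the full table.
    conn.Execute("DROP INDEX ix_mytable_b ON mytable")
    for sql in insert_statements:
        conn.Execute(sql)
    conn.Execute("CREATE INDEX ix_mytable_b ON mytable (b)")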

> I already use timestamps, and this is what I see:
> at 9 secs in, I've gotten my data, formatted it, and placed it on the
> list
> at 9.047 secs in, the string.join() is done and I have my command
> string

You said you're inserting ~5500 rows, so are you calling .join() on a
list of 5500 items? If so, 9 seconds seems painfully slow unless
you're on old hardware.

> at 35 secs, the program ends. So somehow my SQL is taking a lot longer
> when I format it as a single string.
>
> How would I make the whole thing one transaction? Just insert BEGIN
> TRANSACTION at the start and COMMIT at the end? Is that enough to do
> it?

That's the basic idea, but your database adapter (i.e. the Python
wrapper that talks to SQL Server) might want you to do it another way.
For instance, the connection object has a .commit() method, and using
that is probably a better idea than calling cursor.execute("COMMIT").

Per the Python DB API, your connection should be opened in
transactional mode by default. ("Note that if the database supports an
auto-commit feature, this must be initially off.")
http://www.python.org/dev/peps/pep-0249/

In other words, if your adapter is DB-API compliant, then the only
reason you're not using a transaction is that you're explicitly
turning transactions off.
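For example, with adodbapi, the DB-API wrapper over ADO that ships with pywin32 (the connection string and table are guesses):

    import adodbapi

    conn = adodbapi.connect("Provider=SQLOLEDB;Data Source=.;"
                            "Initial Catalog=mydb;Integrated Security=SSPI")
    cur = conn.cursor()

    # One parameterized statement, many parameter sets, one commit.
    cur.executemany("INSERT INTO mytable (a, b) VALUES (?, ?)", rows)
    conn.commit()    # the single commit for all ~5500 rows
    conn.close()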



bye
P
 
ericwoodworth

> I forget the name of the SQL Server bulk loader, but for large loads, I
> used to populate a fresh table with the bulk data, then do UPDATEs and
> INSERTs to get the data spread out into the main tables. You (the OP)
> might try a scheme like that.
>
> --Scott David Daniels
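(A rough sketch of that staging-table scheme; every name here is invented, and 'conn' is an open ADODB.Connection:)

    # Create an empty staging table with the same shape, load it,
    # then move the rows across with two set-based statements.
    conn.Execute("SELECT a, b INTO staging FROM maintable WHERE 1 = 0")
    for sql in staging_inserts:            # or bcp / BULK INSERT into it
        conn.Execute(sql)
    conn.Execute("UPDATE m SET m.b = s.b "
                 "FROM maintable m JOIN staging s ON m.a = s.a")
    conn.Execute("INSERT INTO maintable (a, b) "
                 "SELECT s.a, s.b FROM staging s "
                 "WHERE NOT EXISTS (SELECT 1 FROM maintable m "
                 "WHERE m.a = s.a)")
    conn.Execute("DROP TABLE staging")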

Hmm... I think I'm going to move my question over to a SQL forum, because
this is starting to feel like a SQL rather than a Python issue to me.

Three times now, after letting the system "rest" while I go off and do
other things and then run my script, it completes in 10 seconds. If I
drop the tables and start fresh immediately after that, it takes 35
seconds. If I drop the tables and wait an hour and then run the
script, it'll finish in 10 seconds again.

That makes me think it's a SQL config or optimization issue more than
a Python issue.

Oh, and the times I listed above were totals from the start of
execution, so the string.join() was taking 0.047 seconds to run. It
was taking 9 seconds to get my data from the COM object and format it,
but the join was quite fast.
 
ericwoodworth

Also, if I restart SQL it will respond very quickly as well... down to
10 secs again. Not sure why.
 
pruebauno

> bcp (bulk copy) was the name of the bulk loader. I just remembered.
>
> Sorry, no more exciting advice. It does sound like a good idea to go
> to a SQL Server group.
>
> --Scott David Daniels

For MS SQL try:

BULK INSERT AdventureWorks.Sales.SalesOrderDetail
FROM '<drive>:\<path>\<filename>'
WITH (ROWTERMINATOR = '\n')
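One way to drive that from Python is to dump the rows to a file the SQL Server machine can read and hand the whole load off in a single statement (the path, table, and layout are invented):

    import csv
    import win32com.client

    path = r"C:\temp\rows.txt"
    f = open(path, "wb")    # binary mode for Python 2's csv module
    csv.writer(f, delimiter="\t", lineterminator="\n").writerows(rows)
    f.close()

    conn = win32com.client.Dispatch("ADODB.Connection")
    conn.Open("Provider=SQLOLEDB;Data Source=.;"
              "Initial Catalog=mydb;Integrated Security=SSPI")
    conn.Execute("BULK INSERT mytable FROM '%s' "
                 "WITH (FIELDTERMINATOR = '\\t', ROWTERMINATOR = '\\n')"
                 % path)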
 
Dennis Lee Bieber

> Because I'm using COM objects I'm importing
> win32com.client. That also allows me to use ADODB.Connection and
> ADODB.Command objects for working with SQL.

Unfortunately, ADO DB isn't a Python DB-API interface, so the common
"standard" recommendations won't apply...
> ...other languages people suggest sending batches of inserts off at a
> time instead of executing 1 insert at a time. For Java and .NET,
> people recommend using a StringBuilder class to create strings
> quickly. I don't know of such a function in Python, so I tried grouping
> my inserts into a single string using string += syntax. I knew that
> would be terrible but I wanted to see how terrible. Final results: it
> was pretty terrible. The script went from taking ~18 sec to taking
> 240 sec. The overhead of recreating the strings was monstrous.

You don't mean to say you are creating full SQL insert statements
WITH THE DATA!

Surely ADO DB has a means of doing parameterized SQL!

How about...
http://msdn.microsoft.com/en-us/library/ms677502(VS.85).aspx

If I understand the cryptic M$ documentation, you'd create a
parameterized SQL statement which you feed to a command object; have the
statement "prepared" (basically, preparsed and prepped for use, but
without any real data), then fill in a parameter object with the data of
the fields, and execute the command object with those parameters; refill
the parameters and repeat the execute as needed.
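A sketch of that through win32com (the constants are the standard ADO values; the connection string, table, and parameter types are guesses):

    import win32com.client

    adCmdText, adParamInput = 1, 1         # standard ADO constants
    adInteger, adVarChar = 3, 200

    conn = win32com.client.Dispatch("ADODB.Connection")
    conn.Open("Provider=SQLOLEDB;Data Source=.;"
              "Initial Catalog=mydb;Integrated Security=SSPI")

    cmd = win32com.client.Dispatch("ADODB.Command")
    cmd.ActiveConnection = conn
    cmd.CommandType = adCmdText
    cmd.CommandText = "INSERT INTO mytable (a, b) VALUES (?, ?)"
    cmd.Prepared = True                    # parse and plan the statement once

    p_a = cmd.CreateParameter("a", adVarChar, adParamInput, 50)
    p_b = cmd.CreateParameter("b", adInteger, adParamInput)
    cmd.Parameters.Append(p_a)
    cmd.Parameters.Append(p_b)

    conn.BeginTrans()
    for a, b in rows:                      # the formatted COM data
        p_a.Value = a
        p_b.Value = b
        cmd.Execute()                      # refill and re-execute per row
    conn.CommitTrans()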
--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff)
HTTP://www.bestiaria.com/
 
Lawrence D'Oliveiro

Dennis said:
> How about...
> http://msdn.microsoft.com/en-us/library/ms677502(VS.85).aspx

Eeeek! That whole page seems to take pride in violating the common
recommendations for the use of DB-API compliant database adapters.

Let's see you come up with better solutions.
 
