data transformation, Perl and Microsoft

ccc31807

My apology to those of you who think this post is OT.

Here is an extended quotation from a Microsoft white paper on data
transformation services, or Integration Services in MS SQL 2008. This
really hit me hard -- I'm a database guy who uses Perl extensively to
do my job. SQL Server is a commercial product, which MS sells to
generate revenue. Ordinarily, people buy stuff that they perceive has
value over and above the purchase price. Microsoft obviously wants to
persuade people that SSIS adds value, and they may be right. I'm not
going to prejudge SSIS, particularly since I don't know anything about
it. I'm wondering how others may respond, hence this post. I will
follow up this post with my response. (I don't want to do it now lest
I poison the well.)

In case you are wondering, this post is directly related to certain
events that have transpired in my workplace in the past couple of
days. I'll leave it to you to guess what those events are.

Here is the article, and yes, I know it violates the terms of use,
according to which you need written permission from Microsoft
Corporation to even discover that the document exists, much less to
read it!

<quote>
Challenges of Data Integration

At one level, the problem of data integration in our real-world
scenario is extraordinarily simple. Get data from multiple sources,
cleanse and transform the data, and load the data into appropriate
data stores for analysis and reporting. Unfortunately, in a typical
data warehouse or business intelligence project, enterprises spend
60–80% of the available resources in the data integration stage. Why
is it so difficult?

Technology Challenges

Technology challenges start with source systems. We are moving from
collecting data on transactions (where customers commit to getting,
buying, or otherwise acquiring something) to collecting data on
pre-transactions (where mechanisms such as Web clicks or RFID tags
track customer intentions). Data is now not only acquired via traditional
sources and formats, such as databases and text files, but is
increasingly available in a variety of different formats (ranging from
proprietary files to Microsoft Office documents to XML-based files)
and from Internet-based sources such as Web services and RSS (Really
Simple Syndication) streams.

The most pertinent challenges are:
• Multiple sources with different formats.
• Structured, semi-structured, and unstructured data.
• Data feeds from source systems arriving at different times.
• Huge data volumes.

In an ideal world, even if you somehow manage to get all the data we
need in one place, new challenges start to surface, including:
• Data quality.
• Making sense of different data formats.
• Transforming the data into a format that is meaningful to business
analysts.

Suppose that you can magically get all of the data that you need and
that you can cleanse, transform, and map the data into a useful
format. There is still another shift away from traditional data
movement and integration. That is the shift from fixed long
batch-oriented processes to fluid and shorter on-demand processes. Most
organizations perform batch-oriented processes during “downtimes” when
users do not place heavy demands on the system. This is usually at
night during a predefined batch window of 6-8 hours, when no one is
supposed to be in the office. With the increasing globalization of
businesses of every size and type, this is no longer true. There is
very little (if any) downtime and someone is always in the office
somewhere in the world.

As a result you have:
• Increasing pressure to load the data as quickly as possible.
• The need to load multiple destinations at the same time.
• Diverse destinations.

Not only do you need to achieve all of these results, but also you
need to achieve them as fast as possible. In extreme cases, such as
online businesses, you must integrate data on a continuous basis.
There are no real batch windows and latencies cannot exceed minutes.
In many of these scenarios, the decision-making process is automated
with continuously running software.

Scalability and performance become more and more important as you face
business needs that cannot tolerate any downtime.

Without the right technology, systems require staging at almost every
step of the warehousing and integration process. As different
(especially nonstandard) data sources need to be included in the
Extract, Transform, and Load (ETL) process and as more complex
operations (such as data and text mining) need to be performed on the
data, the need to stage the data increases. As illustrated in Figure
1, with increased staging the time taken to “close the loop,” (i.e.,
to analyze, and take action on new data) increases as well. These
traditional ELT architectures (as opposed to value-added ETL processes
that occur prior to loading) impose severe restrictions on the ability
of systems to respond to emerging business needs.

Finally, the question of how data integration ties into the overall
integration architecture of the organization is becoming more
important when you need both the real-time transactional technology of
application integration and the batch-oriented high-volume world of
data integration technology to solve the business problems of the
enterprise.
</quote>
 
sln

> My apology to those of you who think this post is OT.
>
> <quote>
> Challenges of Data Integration
>
> Technology Challenges
>
> Technology challenges start with source systems. We are moving from
> collecting data on transactions (where customers commit to getting,
> buying, or otherwise acquiring something) to collecting data on
> pre-transactions (where mechanisms such as Web clicks or RFID tags
> track customer intentions).

You know what I think about these Technology Challenges?
I think they can fucking go to fucking hell !!!

This joins the ranks with "adver-blasting", invasion of privacy,
and more of its ilk. Together, a thing to be shunned, spit on,
and kicked in the balls!

As far as I'm concerned, the Anti-Technology Technology is far
more profitable if your customers are the working people of the
world. That's a business that will ultimately come out on top.

**** this shit!

-sln
 
ccc31807

> You know what I think about these Technology Challenges?
> I think they can fucking go to fucking hell !!!
>
> This joins the ranks with "adver-blasting", invasion of privacy,
> and more of its ilk. Together, a thing to be shunned, spit on,
> and kicked in the balls!
>
> As far as I'm concerned, the Anti-Technology Technology is far
> more profitable if your customers are the working people of the
> world. That's a business that will ultimately come out on top.
>
> **** this shit!

I've gotten more into MSSQL Integration Services in the last few
days, and (God help me) have begun thinking about converting a rather
substantial set of Perl scripts to run entirely inside of MSSQL with
SSIS.

This turns out to be a serious effort on Microsoft's part, and quite
frankly has some impressive features. Not as impressive as writing a
clever regular expression to transform a file, but getting there.

CC.
 
Charlton Wilbur

cc> Ordinarily, people buy stuff that they perceive has value over
cc> and above the purchase price.

Yes. However, you seem to be assuming that the *perception* of value
actually adds value, and that the added value will be in an area that
you find useful. Those are both unwarranted leaps.

Charlton
 
ccc31807

> Yes.  However, you seem to be assuming that the *perception* of value
> actually adds value, and that the added value will be in an area that
> you find useful.  Those are both unwarranted leaps.

I've now spent about a week playing with this. Microsoft has done a
typical Microsoft thing: produced the ability to 'program' an SSIS
application by pulling components off the shelf and connecting them by
drawing on a GUI. IOW, managers can 'program' an application without
actually knowing anything about the language or technology they use.

From the manager's POV, I can see the value -- it allows them to build
apps without requiring the services of a software guy and without
knowing much about software. In this respect, I work like this all the
time building physical things, using premanufactured components bought
at the building supply or hardware store.

I agree that the perception of value is important. Not as important as
the reality, but still important. In the long haul, the market will
determine either the reality or the fantasy of the perception, and I
suspect that in the case of Microsoft SSIS, the market will validate
the product.

I can do the same thing, and more, with less effort in less time using
Perl, but the downside is that I have spent years learning how.
Managers will spend a LOT of money to be able to have some of this
ability without learning Perl (or any technology for that matter).

CC.
 
Charlton Wilbur

cc> I've now spent about a week playing with this. Microsoft has
cc> done a typical Microsoft thing: produced the ability to
cc> 'program' an SSIS application by pulling components off the shelf
cc> and connecting them by drawing on a GUI. IOW, managers can
cc> 'program' an application without actually knowing anything about
cc> the language or technology they use.

Wow, revolutionary!

That concept, as it happens, predates Windows.

http://en.wikipedia.org/wiki/Fourth-generation_programming_language

Charlton
 
