Capturing instant messages

Ed Leafe · Jul 17, 2006

I've been approached by a local business that has been advised that
they need to start capturing and archiving their instant messaging in
order to comply with Sarbanes-Oxley. The company is largely PC, but
has a significant number of Macs running OS X, too.

Googling around quickly turns up IM Grabber for the PC, which would
seem to be just what they need. But there is no equivalent to be
found for OS X. So if anyone knows of any such product, please let me
know and there will be no need for the rest of this post.

But assuming that there is no such product, would it be possible to
create something in Python, using the socket or a similar module?
They have a number of servers that provide NAT for each group of
machines; I was thinking that something on those servers could
capture all traffic on port 5190 and write it to disk. Is this
reasonable, or am I being too simplistic in my approach?

-- Ed Leafe
-- http://leafe.com
-- http://dabodev.com

Paul Rubin · Jul 18, 2006

Ed Leafe said:
But assuming that there is no such product, would it be
possible to create something in Python, using the socket or a similar
module? They have a number of servers that provide NAT for each group
of machines; I was thinking that something on those servers could
capture all traffic on port 5190 and write it to disk. Is this
reasonable, or am I being too simplistic in my approach?

Are you talking about IM's within the company, or over the internet?
According to http://en.wikipedia.org/wiki/Instant_Message there are
about a bazillion different protocols in use. You have to
make sure everyone's using the same thing. Some of these protocols
use end-to-end encryption, which means logging at the network side
would just record ciphertext. You need to get these issues figured out.

Also ask them if they're sure they need to log the conversations.
According to some legal decisions, in some states, logging chats
without both parties' permission is illegal, like recording a phone
call. IM's are supposed to be transient communications, unlike email.
As such, logging all the company's IM's is sort of like audio
recording all the company's phone calls. I think the company should
get a lawyer to look at this question if it hasn't already.

Nick Vatamaniuc · Jul 18, 2006

Ed,

It depends on what IM protocol the company is using. If there is more
than one, your job might end up being quite complicated. You indicated
port 5190 in your post, does it mean that the company is using only AOL
IM? In general it seems like you would have to:

1) Capture the traffic
2) Decode the IM protocol
3) Record the captured text

1) As far as capturing the traffic, I would use a specific tool like
tcpick (a cousin of tcpdump, but it actually dumps the data to the console,
not just the headers, and recreates the TCP streams -- good stuff!). Again,
if you know the exact port number and the exact protocol this might be
very easy, because you will set up your capturing program to capture
traffic from only 1 port. Let's assume that for now. Here is my quick
and dirty attempt. First install tcpick http://tcpick.sourceforge.net/
if you don't have it, then become root and open a Python prompt. (Use
ipython... because my mom says it's better.)

In [1]: from subprocess import *  # don't do this in your final script, always use 'import subprocess'
In [2]: cmd = '/usr/sbin/tcpick -i eth0 -bR tcp port 80'  # use your IM port here instead of 80
        # -bR means reconstruct TCP stream and dump data in raw mode to console (good for ASCII stuff).
In [3]: p = Popen(cmd, shell=True, bufsize=0, stdout=PIPE, stderr=PIPE)  # start a subprocess w/ NO_WAIT
In [4]: p.pid  # check the process pid, can use this to issue a 'kill' command later...
Out[4]: 7100
In [5]: p.poll()
In [6]: # Actually it is None, which means process is not finished
In [7]: # Read some lines one by one from output
In [8]: p.stdout.readline()  # Might block here, if so start a browser and load a page
Out[8]: 'Starting tcpick 0.2.1 at 2006-XX-XX XX:XX EDT\n'
In [9]: #
In [10]: # Print some lines from the output, one by one:
In [11]: p.stdout.readline()
Out[11]: 'Timeout for connections is 600\n'  # first line, tcpick prompt stuff
In [12]: p.stdout.readline()
Out[12]: 'tcpick: listening on eth0\n'
In [13]: p.stdout.readline()
Out[13]: 'setting filter: "tcp"\n'
In [14]: p.stdout.readline()
Out[14]: '1 SYN-SENT 192.168.0.106:53498 > 64.233.167.104:www\n'
In [15]: p.stdout.readline()
Out[15]: '1 SYN-RECEIVED 192.168.0.106:53498 > 64.233.167.104:www\n'
In [16]: p.stdout.readline()
Out[16]: '1 ESTABLISHED 192.168.0.106:53498 > 64.233.167.104:www\n'
In [17]: p.stdout.readline()  # the good stuff should start right here
Out[17]: 'GET /search?hl=en&q=42&btnG=Google+Search HTTP/1.1\r\n'
In [18]: p.stdout.readline()
Out[18]: 'Host: www.google.com\r\n'
In [19]: p.stdout.readline()
Out[19]: 'User-Agent: blah blah...\r\n'
In [20]: p.stdout.read()  # try a read() -- will block, press Ctrl-C
exceptions.KeyboardInterrupt
In [21]: p.poll()
Out[21]: 0  # process is finished, return errcode = 0
In [22]: p.stderr.read()
Out[22]: ''  # no error messages
In [23]: p.stdout.read()
Out[23]: '\n257 packets captured\n7 tcp sessions detected\n'
In [24]: # those were the last stats before tcpick was terminated.

Well anyway, your readline() calls will block on process I/O whenever
tcpick has no data to supply. You might have to start a separate thread
in Python to manage the capture process. But in the end the
readlines will get you the raw data from the network (in this case it
was just one way, from my IP to Google; of course you will need it both
ways).
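
To make the threading idea concrete, here is a minimal sketch, not a tested capture tool. The names start_capture and read_available are made up for illustration. A worker thread does the blocking readline() calls and pushes lines onto a queue, so the main program can poll with a timeout instead of hanging.

```python
import queue
import subprocess
import threading


def start_capture(cmd):
    """Start cmd (an argv list) and return (process, queue of output lines)."""
    # stderr is discarded here: an unread stderr pipe could fill up and stall the child.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    lines = queue.Queue()

    def pump():
        # The blocking readline() happens only in this worker thread.
        for raw in iter(proc.stdout.readline, b""):
            lines.put(raw)
        lines.put(None)  # sentinel: the capture process closed its stdout

    threading.Thread(target=pump, daemon=True).start()
    return proc, lines


def read_available(lines, timeout=0.5):
    """Collect lines until `timeout` seconds pass with no output, or the process ends."""
    collected = []
    while True:
        try:
            item = lines.get(timeout=timeout)
        except queue.Empty:
            return collected
        if item is None:
            return collected
        collected.append(item)
```

With the command from the session above split into an argument list, that would be called as start_capture(['/usr/sbin/tcpick', '-i', 'eth0', '-bR', 'tcp', 'port', '5190']), followed by read_available(lines) in a loop.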

2) The decoding will depend on your protocol, if you have more than one
IM protocol then the capture idea from above won't work too well, you
will have to capture all the traffic then decode each stream, for each
side, for each protocol.

3) Recording or replay is easy. Save to files or dump to a MySQL table
indexed by user id, timestamp, IP etc. Because of buffering issues you
will probably not get a very accurate real-time monitoring system with
this setup.
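
As a rough sketch of the recording step, here is one possible table layout. It uses sqlite3 from the standard library as a stand-in for MySQL so that it runs without a server. The column names (user_id, peer_id, source_ip) and the helper names are assumptions for illustration, not anything the protocol or a real product dictates.

```python
import sqlite3
import time


def open_archive(path=":memory:"):
    """Create the archive table if needed and return the connection."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id        INTEGER PRIMARY KEY,
               ts        REAL NOT NULL,   -- capture time, seconds since the epoch
               user_id   TEXT NOT NULL,   -- local screen name (hypothetical field)
               peer_id   TEXT NOT NULL,   -- remote screen name (hypothetical field)
               source_ip TEXT NOT NULL,
               body      TEXT NOT NULL
           )"""
    )
    db.execute("CREATE INDEX IF NOT EXISTS by_user_ts ON messages (user_id, ts)")
    return db


def record(db, user_id, peer_id, source_ip, body, ts=None):
    """Append one decoded message to the archive."""
    if ts is None:
        ts = time.time()
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO messages (ts, user_id, peer_id, source_ip, body) "
            "VALUES (?, ?, ?, ?, ?)",
            (ts, user_id, peer_id, source_ip, body),
        )


def history(db, user_id):
    """Return (ts, peer_id, body) rows for one user, oldest first."""
    cur = db.execute(
        "SELECT ts, peer_id, body FROM messages WHERE user_id = ? ORDER BY ts",
        (user_id,),
    )
    return cur.fetchall()
```

The index on (user_id, ts) is there because an audit would most likely ask for one person's messages over a date range.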

Hope this helps,
Nick Vatamaniuc

Ed Leafe · Jul 18, 2006

It depends on what IM protocol the company is using. If there is more
than one, your job might end up being quite complicated. You indicated
port 5190 in your post, does it mean that the company is using only
AOL IM?

Yes, they've told me that the users routinely use AIM to contact
clients and each other. I don't believe that their firewalls permit
other IM protocols.

1) As far as capturing the traffic, I would use a specific tool like
tcpick (a cousin of tcpdump, but it actually dumps the data to the console,
not just the headers, and recreates the TCP streams -- good stuff!). Again,
if you know the exact port number and the exact protocol this might be
very easy, because you will set up your capturing program to capture
traffic from only 1 port.

Thanks; I'll have to play around with tcpick today.

2) The decoding will depend on your protocol, if you have more than
one IM protocol then the capture idea from above won't work too well, you
will have to capture all the traffic then decode each stream, for each
side, for each protocol.

I guess I'll have to start googling for AIM decoding information.
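
As a starting point, and going only by unofficial descriptions of AIM's OSCAR protocol (AOL has not published a full specification, so treat the layout as an assumption to verify against real captures): each message on the wire is wrapped in a FLAP frame with a 6-byte header consisting of the byte 0x2A, a channel byte, a 2-byte sequence number, and a 2-byte payload length, all big-endian. A minimal sketch that splits a reassembled stream into such frames, with the made-up function name split_flap_frames, might look like this. It does not decode the SNAC data inside the frames, which is where the actual message text lives.

```python
import struct

# marker byte, channel, sequence number, payload length (big-endian)
FLAP_HEADER = struct.Struct(">BBHH")
FLAP_MARKER = 0x2A


def split_flap_frames(buffer):
    """Split a reassembled byte stream into FLAP frames.

    Returns (frames, leftover). frames is a list of (channel, sequence, payload)
    tuples; leftover holds the bytes of a trailing, still incomplete frame.
    """
    frames = []
    offset = 0
    while len(buffer) - offset >= FLAP_HEADER.size:
        marker, channel, sequence, length = FLAP_HEADER.unpack_from(buffer, offset)
        if marker != FLAP_MARKER:
            raise ValueError("not a FLAP frame at offset %d" % offset)
        start = offset + FLAP_HEADER.size
        end = start + length
        if end > len(buffer):
            break  # the rest of this frame has not been captured yet
        frames.append((channel, sequence, bytes(buffer[start:end])))
        offset = end
    return frames, bytes(buffer[offset:])
```

The leftover bytes are meant to be prepended to the next chunk of captured data before calling the function again.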

3) Recording or replay is easy. Save to files or dump to a MySQL table
indexed by user id, timestamp, IP etc. Because of buffering issues
you will probably not get a very accurate real-time monitoring system with
this setup.

They aren't interested in real-time monitoring; their main concern
is Sarb-ox compliance.

Thanks for your help!

-- Ed Leafe
-- http://leafe.com
-- http://dabodev.com

Yu-Xi Lim · Jul 18, 2006

Ed said:
I've been approached by a local business that has been advised that
they need to start capturing and archiving their instant messaging in
order to comply with Sarbanes-Oxley. The company is largely PC, but has
a significant number of Macs running OS X, too.

This is going to be quite off-topic.

I'm not entirely familiar with SOX regulations. Is it necessary to
capture it at the gateway? The best solution would be to provide logging
at the individual chat clients. Piecing together conversation threads
from individual packets while filtering out other non-chat junk can be
extremely tedious.

I understand the standard AIM client doesn't provide logging. Probably
won't any time soon, since it wasn't made for enterprise. There are
enterprise gateways for AIM, but I'm not sure of the cost or other
deployment issues. (Try looking at Jabber) You should consider those. Or
a switch to a more enterprise-friendly protocol if that's possible.

Other alternatives would be to use a better client. Multi-protocol
clients like GAIM, Trillian, Miranda, and Adium X generally provide
logging. Most provide the ability to toggle logging for specific
sessions, thus reducing privacy issues.

Nick Vatamaniuc · Jul 18, 2006

Assuming one person per machine per chat protocol, it might be
possible to recreate the TCP streams (a lot of packet capturing devices
already do that). So the gateway would have to have some kind of a
dispatch that would recognize the initialization of a chat logon and
start a capture process for each such connection. I imagine with 1000
employees he will end up with 1000 processes running at the same
time. Another way is to capture all the streams at once that deal with
the chat protocol and ports and then replay them later and somehow
re-create the TCP streams and chat messages in a cron batch job (at
night or on the weekend).
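
A single process can also do the dispatching itself rather than spawning one process per connection. Here is a minimal sketch, assuming the capture layer hands over payloads already in order along with the connection's addresses and ports. The class name StreamDemux and its methods are made up for illustration.

```python
from collections import defaultdict


class StreamDemux:
    """Keep one byte buffer per TCP connection direction inside a single process."""

    def __init__(self):
        self.streams = defaultdict(bytearray)

    def feed(self, src, sport, dst, dport, payload):
        """Append payload to the buffer for this direction of this connection."""
        key = (src, sport, dst, dport)
        self.streams[key] += payload
        return key

    def close(self, src, sport, dst, dport):
        """Remove and return everything captured for a finished connection."""
        return bytes(self.streams.pop((src, sport, dst, dport), b""))
```

Each direction of a connection gets its own key, so the two sides of a chat end up in two buffers that still have to be decoded and merged by timestamp.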

Nick V.

Yu-Xi Lim · Jul 18, 2006

Nick said:
Assuming one person per machine per chat protocol, it might be
possible to recreate the TCP streams (a lot of packet capturing devices
already do that). So the gateway would have to have some kind of a
dispatch that would recognize the initialization of a chat logon and
start a capture process for each such connection. I imagine with 1000
employees he will end up with 1000 processes running at the same
time. Another way is to capture all the streams at once that deal with
the chat protocol and ports and then replay them later and somehow
re-create the TCP streams and chat messages in a cron batch job (at
night or on the weekend).

As I said, it's tedious, not impossible.

The AIM Sniff project (Perl, not Python) does most of what you describe,
but has bugs because of the approach.

You're also ignoring the fact that each person may chat with more than
one person. Some protocols route all messages through a central server,
making it impossible to use the IP of the other party as a unique
identifier (not that it's a good idea to use the IP anyway, since the
assumption of one unique and consistent IP per person is weak).
Furthermore, you have to deal with failed messages, resends, etc at the
application layer. And there are also other non-trivial (but thankfully
rarely occurring) issues with TCP stream reconstruction.
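
To give a feel for the TCP-level part alone, here is a minimal sketch of rebuilding one direction of a stream from (sequence number, payload) pairs. The function name reassemble is made up for illustration, and the sketch ignores sequence-number wraparound, lost segments that never get resent, and everything at the application layer.

```python
def reassemble(segments, initial_seq):
    """Rebuild one direction of a TCP stream from (seq, payload) pairs.

    Retransmitted and overlapping bytes are dropped, and reassembly stops at
    the first gap. Sequence-number wraparound is not handled in this sketch.
    """
    stream = bytearray()
    expected = initial_seq
    for seq, payload in sorted(segments, key=lambda segment: segment[0]):
        end = seq + len(payload)
        if end <= expected:
            continue  # pure retransmission of bytes we already have
        if seq > expected:
            break  # a segment is missing; stop rather than guess
        stream += payload[expected - seq:]  # keep only the bytes that are new
        expected = end
    return bytes(stream)
```

Even this toy version has to decide what to do about gaps and overlaps, which is the kind of detail a logging chat client never has to think about.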

Basically, it's looking at the wrong OSI layer. An application layer
protocol is best handled at the application where all the necessary
semantics are easily available. It /is/ a business/organization trying
to conform to SOX, so something as minor as switching and standardizing
IM clients (not necessarily protocols) would probably be the least of
their problems. And probably more manageable than a custom script for a
non-trivial activity.

There are definitely enterprise solutions available. And if you want to
get Python involved in this discussion, consider GAIM, which can be
scripted using Python via a plugin.

Ed Leafe · Jul 19, 2006

This is going to be quite off-topic.

But helpful nonetheless.

I'm not entirely familiar with SOX regulations. Is it necessary to
capture it at the gateway?

I'm no lawyer either, so I probably know as much about this as you
do. It was the client who proposed this type of solution; I'm in the
process of figuring out if it's at all possible.

The best solution would be to provide logging
at the individual chat clients. Piecing together conversation threads
from individual packets while filtering out other non-chat junk can be
extremely tedious.

I got the impression that they want to capture the IM traffic and
record it somewhere JIC they are ever audited or subpoenaed, but that
most of it would never get looked at again.

I understand the standard AIM client doesn't provide logging. Probably
won't any time soon, since it wasn't made for enterprise. There are
enterprise gateways for AIM, but I'm not sure of the cost or other
deployment issues. (Try looking at Jabber) You should consider
those. Or a switch to a more enterprise-friendly protocol if that's possible.

Other alternatives would be to use a better client. Multi-protocol
clients like GAIM, Trillian, Miranda, and Adium X generally provide
logging. Most provide the ability to toggle logging for specific
sessions, thus reducing privacy issues.

Thanks for the suggestions; I'll run them by the client. They don't
want to do it at the individual desktop level; they want a central
location to ensure that someone doesn't have the capability to
disable the logging, so perhaps an enterprise gateway might be a
better solution.

-- Ed Leafe
-- http://leafe.com
-- http://dabodev.com
