Extracting Rich Text data formats from win32clipboard

Discussion in 'Python' started by Trader, Aug 26, 2003.

  1. Trader


    Hi,

    I'm trying to use Mark Hammond's win32clipboard module to extract more
    complex data than just plain ASCII text from the Windows clipboard.
    For instance, when you select all the content on a web page, you can
    paste it into an app like FrontPage, or something Rich Text-aware, and
    it will preserve all the formatting, HTML, etc. I'd like to include
    that behavior in the application I'm writing.

    In the interactive session below, before I run the clipboard_grab()
    function, I've selected all of the www.google.com homepage in IE and
    hit Control-C. The function cycles through all the formats stored on
    the clipboard and loads up a data list with each type it finds.

    Here's where it gets interesting: while data[2] is the textual data
    that I would expect to see if I pasted the clipboard in a Notepad
    file, data[0] and data[1] are in a weird, non-ASCII (binary?) format.
    Are these pointers to (or metadata for) the actual HTML or rich text?
    How do I use this data? Is there a reference I can use that will help
    me decipher this information? Any help would be greatly appreciated.

    Thanks!

    ----

    Python 2.3 (#46, Jul 29 2003, 18:54:32) [MSC v.1200 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys, traceback
    >>> import win32clipboard
    >>>
    >>> def clipboard_grab():
    ...     global format, formats, data
    ...     win32clipboard.OpenClipboard()
    ...     format = 1
    ...     formats = []
    ...     data = []
    ...     while 1:
    ...         format = win32clipboard.EnumClipboardFormats(format)
    ...         print "FORMAT:", format
    ...         if not format:
    ...             break
    ...         try:
    ...             datum = win32clipboard.GetClipboardData(format)
    ...             formats.append(format)
    ...             data.append(datum)
    ...         except:
    ...             print format, traceback.format_exception(sys.exc_type,
    ...                 sys.exc_value, sys.exc_traceback)
    ...     win32clipboard.EmptyClipboard()  # note: this discards the clipboard contents
    ...     win32clipboard.CloseClipboard()
    ...
    >>>
    >>>
    >>> clipboard_grab()
    FORMAT: 49171
    FORMAT: 16
    FORMAT: 7
    FORMAT: 0
    >>> len(data)
    3
    >>> data[0]
    '\x00\x00\x00\x00\x18\x01\x00\x00\x01\x00\x00\x00\x06\x00\x00\x00\x00\x00\x00
    \x00\x00\x00\x00\x00\xe3\xc0\xc2w\x00\x00\x00\x00\x01\x00\x00\x00\xff\xff\xff\xff
    \x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xa2\xc0\xe9\x02
    \x00\x00\x00\x00\x01\x00\x00\x00\xff\xff\xff\xff\x01\x00\x00\x00\x01\x00\x00\x00
    \x00\x00\x00\x00\x00\x00\x00\x00K\xc1\xc2w\x00\x00\x00\x00\x01\x00\x00\x00\xff\xff
    \xff\xff\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00L\xc1
    \xc2w\x00\x00\x00\x00\x01\x00\x00\x00\xff\xff\xff\xff\x01\x00\x00\x00\x01\x00\x00
    \x00\x00\x00\x00\x00\x00\x00\x00\x00\r\x00\xc2w\x00\x00\x00\x00\x01\x00\x00\x00
    \xff\xff\xff\xff\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
    \x01\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\xff\xff\xff\xff\x01\x00\x00\x00
    \x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
    \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
    \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
    \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
    >>> data[1]
    '\t\x04\x00\x00'
    >>> data[2]
    '\r\n \tWeb\t \tImages\t \tGroups\t \tDirectory\t \tNews\t \r\n\r\n
    \t\r\n\t \x07 Advanced Search\r\n \x07 Preferences\r\n \x07 Language
    Tools\r\n\r\n\r\nAdvertise with Us - Business Solutions - Services & Tools - Jobs, Press, &
    Help\r\n\r\nc2003 Google - Searching 3,307,998,701 web pages'
    >>>
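
    A minimal sketch of the same loop, assuming Python 3 and a current pywin32 (the session above is Python 2.3). It differs in three ways: the Win32 documentation for EnumClipboardFormats says to pass 0 to start an enumeration, whereas the session passes 1 (CF_TEXT), so it only sees the formats that follow CF_TEXT and never fetches CF_TEXT itself; the clipboard is always closed in a finally block; and the clipboard is left intact instead of being emptied. The function name grab_clipboard and its clip parameter are hypothetical; clip exists only so the loop can be exercised with a stand-in object off Windows.

```python
def grab_clipboard(clip=None):
    """Return (format_id, data) pairs for every format currently on the clipboard.

    If a format cannot be fetched, the exception is stored in place of its data.
    `clip` is a hypothetical hook: it defaults to pywin32's win32clipboard module,
    but any object with the same four functions can be passed in for testing.
    """
    if clip is None:
        import win32clipboard as clip  # pywin32, Windows only
    items = []
    clip.OpenClipboard()
    try:
        fmt = 0  # 0 asks EnumClipboardFormats for the first available format
        while True:
            fmt = clip.EnumClipboardFormats(fmt)
            if fmt == 0:  # 0 means the enumeration is finished
                break
            try:
                items.append((fmt, clip.GetClipboardData(fmt)))
            except Exception as exc:  # some formats may not be retrievable as data
                items.append((fmt, exc))
    finally:
        clip.CloseClipboard()  # release the clipboard without emptying it
    return items
```

    On Windows, `for fmt, datum in grab_clipboard(): print(fmt, type(datum))` lists every format on the clipboard, including CF_TEXT.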
    Trader, Aug 26, 2003

  2. Michael Geary

    > >>> clipboard_grab()
    > FORMAT: 49171
    > FORMAT: 16
    > FORMAT: 7


    7 = CF_OEMTEXT
    16 = CF_LOCALE
    49171 = 0xC013 = apparently OLE private data

    That should help you with some searches. Basically, CF_OEMTEXT is the
    only one that's going to be useful for you, unless you can figure out what
    to do with the OLE private data.
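
    The IDs in the list above can be checked, and the CF_LOCALE payload decoded, with a little pure Python. This is a sketch assuming Python 3: the table holds only a subset of the predefined CF_* constants from the Windows SDK, IDs from 0xC000 through 0xFFFF are treated as registered formats whose names have to be looked up at run time (for example with win32clipboard.GetClipboardFormatName), and CF_LOCALE data is read as a little-endian 32-bit locale identifier (LCID). The helper names describe_format and decode_cf_locale are hypothetical.

```python
import struct

# A subset of the predefined clipboard format IDs (CF_* constants in winuser.h).
STANDARD_FORMATS = {
    1: "CF_TEXT",
    2: "CF_BITMAP",
    7: "CF_OEMTEXT",
    8: "CF_DIB",
    13: "CF_UNICODETEXT",
    15: "CF_HDROP",
    16: "CF_LOCALE",
}


def describe_format(fmt, get_name=None):
    """Return a readable name for a clipboard format ID.

    get_name is an optional lookup for registered formats, for example
    win32clipboard.GetClipboardFormatName on Windows.
    """
    if fmt in STANDARD_FORMATS:
        return STANDARD_FORMATS[fmt]
    if 0xC000 <= fmt <= 0xFFFF:  # range used for RegisterClipboardFormat IDs
        if get_name is not None:
            return get_name(fmt)
        return "registered format 0x%04X" % fmt
    return "format %d" % fmt


def decode_cf_locale(raw):
    """CF_LOCALE data holds a locale identifier (LCID) as a little-endian DWORD."""
    (lcid,) = struct.unpack("<I", raw[:4])
    return lcid


print(describe_format(49171))                    # registered format 0xC013
print(hex(decode_cf_locale(b"\t\x04\x00\x00")))  # 0x409
```

    Applied to data[1] from the session, '\t\x04\x00\x00', this gives 0x409, the LCID for English (United States). CF_LOCALE therefore only records which locale the plain text belongs to; it carries no formatting.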

    -Mike
    Michael Geary, Aug 26, 2003

  3. Trader


    Thanks for your help, Neil! Your example code gave me an idea of what I
    should be seeing when the HTML/RTF stuff is working properly. I'd
    been using a non-IE browser (Firebird) for testing, and it wasn't
    giving me those results. Thanks for getting me on track!

    Trader

    "Neil Hodgson" <> wrote in message news:<rlH2b.64505$>...
    > Trader:
    > > >>> clipboard_grab()
    > > FORMAT: 49171
    > > FORMAT: 16
    > > FORMAT: 7
    > > FORMAT: 0
    >
    > Now add in:
    >
    > for f in formats:
    >     if f >= 0xC000:
    >         print win32clipboard.GetClipboardFormatName(f)
    >
    > Formats at or above 0xC000 are dynamically registered clipboard types. I get:
    >
    > FORMAT: 13
    > FORMAT: 49278
    > FORMAT: 49245
    > FORMAT: 49171
    > FORMAT: 16
    > FORMAT: 7
    > FORMAT: 0
    >
    > HTML Format
    > Rich Text Format
    > Ole Private Data
    >
    > The HTML has a prologue and then some HTML:
    >
    > Version:1.0
    > StartHTML:000000195
    > EndHTML:000001891
    > StartFragment:000001597
    > EndFragment:000001710
    > StartSelection:000001597
    > EndSelection:000001710
    > SourceURL:http://sydney.citysearch.com.au/
    > <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    >
    > <HTML><HEAD><TITLE>CitySearch.com.au Australia - Your guide to the city of
    > Sydney</TITLE>
    > ...
    >
    > The RTF looks normal:
    >
    > {\rtf1\ansi\ansicpg-1\deff0\deflang3081{\fonttbl{\f0\froman\fcharset0 Times
    > New Roman;}{\f1\ftech\fcharset0 Symbol;}{\f2\fswiss\fcharset0
    > Arial;}{\f3\fswiss\fcharset0 Courier New;}{\f4\ftech\fcharset0
    > Wingdings;}}{\colortbl\red0\green0\blue0;\red0\green0\blue255;\red0\green255
    > \blue255;\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;
    > \red255\green255\blue0;\red255\green255\blue255;\
    > ...
    >
    > Neil
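
    The "HTML Format" prologue quoted above is a set of Key:Value lines followed by HTML, with StartFragment and EndFragment giving byte offsets, counted from the start of the clipboard data, of the copied fragment. The sketch below, assuming Python 3 and UTF-8 encoded data (the encoding the CF_HTML format specifies), parses that header and slices out the fragment. get_clipboard_html is a hypothetical Windows-only wrapper: it looks the format up by name with win32clipboard.RegisterClipboardFormat, which returns the existing ID when the name is already registered, and it assumes GetClipboardData returns raw bytes for a registered format. The same name lookup can fetch the "Rich Text Format" entry, whose bytes are plain RTF as shown above.

```python
import re

# One "Key:Value" header line, e.g. b"StartFragment:000001597" or b"SourceURL:http://..."
_HEADER_LINE = re.compile(rb"([A-Za-z]+):(.*)")


def parse_cf_html(raw):
    """Split CF_HTML bytes into (header dict, fragment text).

    The header ends at the first line that is not "Key:Value" (normally the
    line where the HTML starts). Offsets are byte positions, so the fragment
    is sliced from the undecoded bytes and only then decoded as UTF-8.
    """
    header = {}
    for line in raw.splitlines():
        match = _HEADER_LINE.fullmatch(line)
        if match is None:
            break
        header[match.group(1).decode("ascii")] = match.group(2).decode("utf-8")
    start = int(header["StartFragment"])
    end = int(header["EndFragment"])
    return header, raw[start:end].decode("utf-8")


def get_clipboard_html():
    """Windows only (hypothetical helper): parse the "HTML Format" entry, or return None."""
    import win32clipboard  # pywin32

    cf_html = win32clipboard.RegisterClipboardFormat("HTML Format")
    win32clipboard.OpenClipboard()
    try:
        if not win32clipboard.IsClipboardFormatAvailable(cf_html):
            return None
        raw = win32clipboard.GetClipboardData(cf_html)  # assumed to be bytes
    finally:
        win32clipboard.CloseClipboard()
    return parse_cf_html(raw)
```

    Slicing before decoding matters: if the page contains non-ASCII text, character positions and byte offsets no longer agree.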
    Trader, Aug 26, 2003

Similar Threads
  1. aurora
    Replies: 2, Views: 839
  2. LeeRisq

    win32clipboard operation

    LeeRisq, Jul 23, 2009, in forum: Python
    Replies: 4, Views: 278
    LeeRisq
    Jul 23, 2009
  3. Alfredo Agosti
    Replies: 3, Views: 354
    Aaron Bertrand - MVP
    Sep 19, 2003
  4. Hollow Quincy

    rich:dataTable - rich:dataScroller

    Hollow Quincy, Dec 30, 2011, in forum: Java
    Replies: 5, Views: 4,576
    Arved Sandstrom
    Jan 2, 2012
  5. Replies: 16, Views: 198
    Dave Angel
    Sep 19, 2013