Library for simple content type identification

C.Ullenboom

Hello friends of Java.

For my chat application I'm looking for a simple way to guess the
likely content type of a String so that I can pick an appropriate
renderer. If the sender posts something like

public static void main(String...args[] ) { int i; }

the chat should format it as Java because it looks like typical Java
source. If it looks like

Is this <b>HTML</b>?

it should guess HTML.

Parsing and validating are not required. Because a chat only exchanges
fragments, I don't have a content/MIME type to go by either.

Do you know a library for guessing a type? The types of interest are
standard (wiki) text, Java, HTML, XML, SQL, or a general listing type,
e.g. for property files.

What do you think are typical patterns to look for? My first idea is to
look for Java keywords, or regular expressions like

</?(\w+)(\s*\w*\s*=\s*("[^"]*"|'[^']*'|[^>]*))*\s*/?>

for html/xml or a string matching

(SELECT\s[\w\*\)\(\,\s]+\sFROM\s[\w]+)|
(UPDATE\s[\w]+\sSET\s[\w\,\'\=]+)|
(INSERT\sINTO\s[\d\w]+[\s\w\d\)\(\,]*\sVALUES\s\([\d\w\'\,\)]+)|
(DELETE\sFROM\s[\d\w\'\=]+)

for SQL.
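
To make the idea concrete, here is a minimal sketch of such a guesser
using only java.util.regex. Everything in it is an assumption for
illustration: the class name ContentTypeGuesser, the ContentType enum,
the keyword list, the thresholds, and the order of the checks are all
hypothetical, and the patterns are simplified variants of the ones
above. It is a heuristic and will misclassify some input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical heuristic guesser; all names and thresholds are illustrative. */
final class ContentTypeGuesser {

    enum ContentType { JAVA, XML, HTML, SQL, LISTING, TEXT }

    // An opening, closing, or self-closing tag with optional attributes.
    private static final Pattern TAG = Pattern.compile(
        "</?\\w+(\\s+[\\w:-]+(\\s*=\\s*(\"[^\"]*\"|'[^']*'|[^\\s>]+))?)*\\s*/?>");

    // The start of a common SQL statement (case-insensitive by assumption).
    private static final Pattern SQL = Pattern.compile(
        "\\b(SELECT\\s.+?\\sFROM\\s+\\w+|UPDATE\\s+\\w+\\s+SET\\s"
        + "|INSERT\\s+INTO\\s+\\w+|DELETE\\s+FROM\\s+\\w+)",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // A small, arbitrary selection of Java keywords.
    private static final Pattern JAVA_KEYWORD = Pattern.compile(
        "\\b(public|private|protected|static|void|class|interface|extends"
        + "|implements|import|package|return|final|int|boolean)\\b");

    // A "key=value" or "key: value" line, as found in property files.
    private static final Pattern PROPERTY_LINE = Pattern.compile(
        "\\s*[\\w.-]+\\s*[=:]\\s*\\S.*");

    static ContentType guess(String text) {
        // Java is checked first because generics such as List<String> look like tags.
        boolean hasCodePunctuation = text.contains(";") || text.contains("{");
        if (countMatches(JAVA_KEYWORD, text) >= 2 && hasCodePunctuation) {
            return ContentType.JAVA;
        }
        if (text.trim().startsWith("<?xml")) {
            return ContentType.XML;
        }
        if (TAG.matcher(text).find()) {
            return ContentType.HTML;
        }
        if (SQL.matcher(text).find()) {
            return ContentType.SQL;
        }
        if (looksLikeListing(text)) {
            return ContentType.LISTING;
        }
        return ContentType.TEXT;
    }

    private static int countMatches(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // True if there are at least two non-blank lines and all of them look like key/value pairs.
    private static boolean looksLikeListing(String text) {
        int lines = 0;
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            if (!PROPERTY_LINE.matcher(line).matches()) {
                return false;
            }
            lines++;
        }
        return lines >= 2;
    }

    public static void main(String[] args) {
        System.out.println(guess("public static void main(String...args[] ) { int i; }"));
        System.out.println(guess("Is this <b>HTML</b>?"));
        System.out.println(guess("SELECT name, age FROM users WHERE age > 30"));
    }
}
```

The order of the checks is a design choice, not a rule: a scoring
approach that rates every type and picks the highest score would cope
better with mixed fragments, at the cost of more tuning.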

Christian
 
