Strange Socket problem

S

Steven Simpson

new Socket() can return null (it says, but not why) but I don't think
that is happening or you'd have seen an NPE.

Where are you seeing this? How can a constructor 'return' null?

<http://docs.oracle.com/javase/7/docs/api/java/net/Socket.html>

In another post, you also said (about InetAddress.getByName()):
All I wanted to do was to flag up that its odd that the Javadocs say that
sometimes it returns null without (apparently) throwing an exception but
don't say explicitly what governs each behaviour, and that it would be
useful to call it in such a way that that you force it to make that
distinction explicit.

Again, I don't see where this is said.

<http://docs.oracle.com/javase/7/docs/api/java/net/InetAddress.html#getByName(java.lang.String)>

Am I looking at the wrong docs?

Cheers!
 
S

Steven Simpson

It cycles fine until it hangs.


I'm trying to understand what 'fine' is, and how much of the failing
cycle appears fine before it fails. The previous 'fine' cycle must end
with a "Disconnected" message. The failing cycle might get as far as
printing "Connected", before jumping to the "Disconnected" message.

It's just that when you say "it just prints 'Disconnected'", you're
obviously truncating the entire trace to something manageable, but I
can't distinguish these cases:

* You're just showing the last message.
* You're showing all messages in the final cycle, and they happen to
include only one message.

These would be distinguishable by truncating just before the last
message of the previous cycle. The first case would then be indicated by:

Disconnected
Disconnected

....and the second case by:

Disconnected
Connected
Disconnected

I suppose you mean that it's the first case by this:

********* I know that the line below is not being executed **********

fireConnectionEvent(ConnectionEvent.CONNECTED);
socket.setSoTimeout(3600000); // one hour timeout
System.out.println("SportsWinClient: Connected");

....but I'm not clear on how you know this.

Perhaps you have a listener for the ConnectionEvent, and you can see
that it doesn't execute. But that could also mean that it or an earlier
listener in the chain has thrown an unchecked exception.
fireConnectionEvent doesn't protect itself against this.
 
X

x

Lew pisze:
The issue would be what's used inside the 'run()' method, not just what
fields the class has.

I agree that the 'thread' variable's scope is wrong - it should be local
to 'start()', not an instance member. But at the moment that's not
causing any outright harm.


His class spawns one thread per instance. The main thread doesn't use
the mutable fields. The only common access of note is the 'runFlag'
mechanism, which is bog-standard in his code. The 'thread' member is not
referenced within 'run()'. I'm not sure why all his other instance
variables are 'volatile', but at first blush I don't see a problem with
his synchronization safety.

You need to be specific about the trouble you claim you found. Show us
where, because I don't see danger coming from the areas you cited.

Consider the following scenario:

SportsWinClient swc = new SportsWinClient();
swc.start();
new Thread(swc).start();

(this exact code is unlikely to be present, but if the swc variable is
passed around, someone, in a code far far away, may start the swc in a
new thread. Simply because it's a Runnable.)

That is why I'm asking about the usage scenario.
It's also bog standard, and bog simple. Its purpose is to start the
socket thread.

Is it standard... well, I'm not sure. As far as I know, the most typical
thing to do would be to create a Runnable object (containing ONLY the
task logic), and a separate Thread object:

Runnable myRunnable = new MyRunnable(...);
Threat t = new Thread(myRunnable);
t.start();


public class Client
{
public static void main(String[] args)
{
new SportsWinClient().start();
}
}
 
X

x

Knute Johnson pisze:
It's so I can start and interrupt the thread from a method call.

The volatiles exist because the methods that access them can be called
from other threads. I could have synchronized the start() stop()

Other threads... What are other threads allowed to do ?
You see, I'm really troubled by the public methods:

public void disconnect()
public void stop()
public boolean isConnected()

methods but not easily the socket variable in the run() method. I
thought it was cleaner to just use volatile.

But, the volatile keyword does not provide any kind of access control.
It just tells the JVM "hey, if this variable is changed by thread A, the
changed value must be visible for thread B, so don't cache it too
aggressively'.

Marcin Jaskolski
 
P

Paka Small

I'm having a problem in some production code that I can't figure out.
I'll post the complete actual code below.  This code is running in three
places and has the same problem in two of them at the same time.  The
other I'm not sure, it may be that the personnel operating it are
restarting the program and so don't complain.  This piece of code is a
simple client that connects via a Socket to a server.  The server
supplies some data and the client reads that data and files it away.  It
is supposed to restart itself if there is a connection failure or fault
for whatever reason.  The problem is that at some random point in time
the Socket disconnects, the code logs the disconnect but never restarts.
  It does print the "SportsWinClient Disconnected" message but never
executes the "fireconnectionEvent()" method after creating a new Socket.
  It doesn't print any Exception message.  I'm not sure how it gets out
of the try block without printing the "End of Stream" message or an
exception message.

The crazy part is that all night long when there is no activity from the
server it times out and restarts with no problems.

I'm hoping that somebody will see a fault in my code that could cause
the failure.  It is not a compile problem so I left the formatting as it is.

Thanks for looking.

package com.knutejohnson.xyzcasinos.translux;

import java.io.*;
import java.net.*;
import java.util.*;

import com.knutejohnson.classes.*;

import static com.knutejohnson.xyzcasinos.translux.Constants.*;

public class SportsWinClient implements Runnable {
     private final Thread thread;

     private volatile boolean isConnected;
     private volatile boolean runFlag = true;

     private volatile Socket socket;

     public SportsWinClient() {
         thread = new Thread(this,"SportsWinClient");
     }

     public void start() {
         thread.start();
     }

     public void run() {
//        boolean serverFlag = true;

         System.out.println("SportsWinClient: Started");
         while (runFlag) {
//            String serverAddress = serverFlag ? SPORTS_WIN_IP_PRIMARY :
//             SPORTS_WIN_IP_SECONDARY;
             try {
//                socket = new Socket(serverAddress,SPORTS_WIN_PORT,
                 socket = new Socket(SPORTS_WIN_IP,SPORTS_WIN_PORT,
                  InetAddress.getByName(REMOTE_IP),0);
                 socket.setKeepAlive(true);
                 isConnected = true;

********* I know that the line below is not being executed **********

                 fireConnectionEvent(ConnectionEvent.CONNECTED);
                 socket.setSoTimeout(3600000);  // one hour timeout
                 System.out.println("SportsWinClient: Connected");
                 InputStream is = socket.getInputStream();
                 InputStreamReader isr = new InputStreamReader(is);
                 BufferedReader br = new BufferedReader(isr);

                 String str;
                 while ((str = br.readLine()) != null) {
                     if (!str.matches("\\d+.*"))  // not a sports record
                         continue;
                     SportsBet sb = new SportsBet(str);
                     SPORTS_BET_MAP.put(sb.betNumber,sb);
                 }

                 System.out.println("SportsWinClient: End of Stream");
             } catch (IOException ioe) {
                 System.out.println("SportsWinClient: "+ ioe.toString());
             } finally {
                 isConnected = false;
                 if (socket != null)
                     try {
                         socket.close();
                     } catch (IOException ioe) {
                         ioe.printStackTrace();
                     }
                 fireConnectionEvent(ConnectionEvent.DISCONNECTED);
//                serverFlag = !serverFlag;

*********** I know that the line below is being executed *************

                 System.out.println("SportsWinClient: Disconnected");
             }
             // stop interrupts this thread so this will bebypassed on
a stop
             try {
                 Thread.sleep(10000);
             } catch (InterruptedException ie) { }
         }
         System.out.println("SportsWinClient: Stopping");
     }

     public void disconnect() {
         if (isConnected())
             if (socket != null)
                 try {
                     socket.close();
                 } catch (IOException ioe) {
                     ioe.printStackTrace();
                 }
     }

     public void stop() {
         runFlag = false;
         thread.interrupt();
         if (socket != null)
             try {
                 socket.close();
             } catch (IOException ioe) {
                 ioe.printStackTrace();
             }
     }

     public boolean isConnected() {
         return isConnected;
     }

     private final java.util.List<ConnectionListener> connectionListeners =
      new ArrayList<ConnectionListener>();

     public synchronized void addConnectionListener(ConnectionListener cl) {
         connectionListeners.add(cl);
     }

     public synchronized void
removeConnectionListener(ConnectionListener cl) {
         connectionListeners.remove(cl);
     }

     private synchronized void fireConnectionEvent(int id) {
         ConnectionEvent ce = new ConnectionEvent(this,id);

         for (ConnectionListener listener : connectionListeners)
             listener.connState(ce);
     }

}

Like Steven I would suggest you add Catch Throwable to log the
information about any unexpected happening and make sure that you get
it from the log when it actually occurs. It should give you a clear
indication what to look for to solve the issue.

Kind regards, Paka
 
X

x

Knute Johnson pisze:
I'm having a problem in some production code that I can't figure out.

Hi Knute,

I was trying to analyze your code, and currently it might be too
complicated. The class controls thread starting, stopping, network
operations, parsing the received lines, notyfying other components, and
so on... Too many concerns.

If I were you, I would start with refactoring it into manageable chunks.
For example, in the most general terms, your class has to read lines
from a server and process them somehow. Why don't you write a small
class, which does EXACTLY that, and push the remaining issues to other
classes ?

If you do that, you will be able to unit test each of the parts
separately. Currently it's rather hard...

See this class for example:


public class Worker implements Runnable {

private final String serverAddress;
private volatile boolean running;

//we will get lines from this fellow...
private LineSource lineSource;

//...and pass them to this guy
private final LineProcessor lineProcessor;

public Worker(String serverAddress, LineProcessor lineProcessor) {
this.running = true;
this.serverAddress = serverAddress;
this.lineProcessor = lineProcessor;
}

@Override
public void run() {
// does not connect to server, just creates the object and saves //
the server address
lineSource = new LineSource(serverAddress);

while (running) {
String line = null;
try {
line = lineSource.getLine();
} catch (Exception e) {
// TODO LOG IT. MUST... HAVE... LOGS...
}

if (line != null) {
try {
lineProcessor.processLine(line);
} catch (Exception e) {
// TODO LOG IT. MUST... HAVE... LOGS...
}
}
}

// ok, we're finished, just close the lineSource
try {
lineSource.close();
} catch (Exception e) {
// TODO LOG IT. MUST... HAVE... LOGS...
}
}

public void stop() {
running = false;
}
}

It is just a starting point, still too messy to be used in production.
And I did't really look into the subject of blocking operations.

I realize that my solution to your problem is an indirect one, but hey,
it just took five minutes to write this code.

Have a nice day,
Marcin Jaskolski
 
M

markspace

I can't run it here because I can't get to the inside of the network
that these live on.

This is a serious problem, imo. You must be able to test code in order
to drive a bug to root cause, and then be able to test a fix. There's
no other way. At minimum you need a test environment that can duplicate
the issue. And not having access to real equipment that can test in a
live environment makes it very very difficult to verify any result.
I have no way to know what was being sent but all of
the relevant code is here. Bad data shouldn't cause any problems but
that does give me an idea of where to place another trap (thanks).

Again, pressing a code listing to your forehead and murmuring words that
your teachers taught you doesn't fix code. The whole apparatus must be
examined in order to find errant behavior. The idea that the customer
can wall off part of their system is ridiculous. Especially because
it's for their benefit that the system be examined and the problem found.
I have no idea what OS is running on the server and I can't get any data
from that end. I do know that the subprocess that I connect to is there
to keep me away from the actual server machine.

I hope you're getting paid by the hour and not for any particular result.
 
K

Knute Johnson

I'm trying to understand what 'fine' is, and how much of the failing
cycle appears fine before it fails. The previous 'fine' cycle must end
with a "Disconnected" message. The failing cycle might get as far as
printing "Connected", before jumping to the "Disconnected" message.

It's just that when you say "it just prints 'Disconnected'", you're
obviously truncating the entire trace to something manageable, but I
can't distinguish these cases:

* You're just showing the last message.
* You're showing all messages in the final cycle, and they happen to
include only one message.

These would be distinguishable by truncating just before the last
message of the previous cycle. The first case would then be indicated by:

Disconnected
Disconnected

...and the second case by:

Disconnected
Connected
Disconnected

I suppose you mean that it's the first case by this:

********* I know that the line below is not being executed **********

fireConnectionEvent(ConnectionEvent.CONNECTED);
socket.setSoTimeout(3600000); // one hour timeout
System.out.println("SportsWinClient: Connected");

...but I'm not clear on how you know this.

Perhaps you have a listener for the ConnectionEvent, and you can see
that it doesn't execute. But that could also mean that it or an earlier
listener in the chain has thrown an unchecked exception.
fireConnectionEvent doesn't protect itself against this.

Yes I do have a listener and it doesn't hear. Once it has stopped the
listening component says it is not connected. The connect/disconnect
messages are in the appropriate order and the last message received is a
Disconnect message.
 
K

Knute Johnson

This is a serious problem, imo. You must be able to test code in order
to drive a bug to root cause, and then be able to test a fix. There's no
other way. At minimum you need a test environment that can duplicate the
issue. And not having access to real equipment that can test in a live
environment makes it very very difficult to verify any result.

They can't risk having me have access to the server from my location
because for several reasons not the least of which it would be illegal.
All of the clients connect to the servers through Cisco hardware VPNs
which is another reason they don't want me to be able to get in from the
outside. I can however travel to the client's location and hook up, I
just don't want to for several other reasons.
Again, pressing a code listing to your forehead and murmuring words that
your teachers taught you doesn't fix code. The whole apparatus must be
examined in order to find errant behavior. The idea that the customer
can wall off part of their system is ridiculous. Especially because it's
for their benefit that the system be examined and the problem found.

I don't think the problem can be caused by bad data. If you see
someplace where you think that could be, please feel free to point it out.
I hope you're getting paid by the hour and not for any particular result.

That would be really nice wouldn't it :).

One of the sites' personnel are pretty savvy and just restart the
program when this happens. The other two call in house tech support,
who then calls my boss, who then calls me and I restart it. What I
would like to do is figure out why it quits so I don't have to hear from
them.
 
K

Knute Johnson

Like Steven I would suggest you add Catch Throwable to log the
information about any unexpected happening and make sure that you get
it from the log when it actually occurs. It should give you a clear
indication what to look for to solve the issue.

Kind regards, Paka

Thanks, that's going in tonight if they stop using it early enough.
 
K

Knute Johnson

Knute Johnson pisze:

Hi Knute,

I was trying to analyze your code, and currently it might be too
complicated. The class controls thread starting, stopping, network
operations, parsing the received lines, notyfying other components, and
so on... Too many concerns.

If I were you, I would start with refactoring it into manageable chunks.
For example, in the most general terms, your class has to read lines
from a server and process them somehow. Why don't you write a small
class, which does EXACTLY that, and push the remaining issues to other
classes ?

If you do that, you will be able to unit test each of the parts
separately. Currently it's rather hard...

See this class for example:


public class Worker implements Runnable {

private final String serverAddress;
private volatile boolean running;

//we will get lines from this fellow...
private LineSource lineSource;

//...and pass them to this guy
private final LineProcessor lineProcessor;

public Worker(String serverAddress, LineProcessor lineProcessor) {
this.running = true;
this.serverAddress = serverAddress;
this.lineProcessor = lineProcessor;
}

@Override
public void run() {
// does not connect to server, just creates the object and saves // the
server address
lineSource = new LineSource(serverAddress);

while (running) {
String line = null;
try {
line = lineSource.getLine();
} catch (Exception e) {
// TODO LOG IT. MUST... HAVE... LOGS...
}

if (line != null) {
try {
lineProcessor.processLine(line);
} catch (Exception e) {
// TODO LOG IT. MUST... HAVE... LOGS...
}
}
}

// ok, we're finished, just close the lineSource
try {
lineSource.close();
} catch (Exception e) {
// TODO LOG IT. MUST... HAVE... LOGS...
}
}

public void stop() {
running = false;
}
}

It is just a starting point, still too messy to be used in production.
And I did't really look into the subject of blocking operations.

I realize that my solution to your problem is an indirect one, but hey,
it just took five minutes to write this code.

Have a nice day,
Marcin Jaskolski

Thanks Marcin. All of this has already been tested many times as
individual pieces. It has been running in this condition for almost a
year now in three sites.
 
K

Knute Johnson

Lew pisze:
The issue would be what's used inside the 'run()' method, not just
what fields the class has.

I agree that the 'thread' variable's scope is wrong - it should be
local to 'start()', not an instance member. But at the moment that's
not causing any outright harm.


His class spawns one thread per instance. The main thread doesn't use
the mutable fields. The only common access of note is the 'runFlag'
mechanism, which is bog-standard in his code. The 'thread' member is
not referenced within 'run()'. I'm not sure why all his other instance
variables are 'volatile', but at first blush I don't see a problem
with his synchronization safety.

You need to be specific about the trouble you claim you found. Show us
where, because I don't see danger coming from the areas you cited.

Consider the following scenario:

SportsWinClient swc = new SportsWinClient();
swc.start();
new Thread(swc).start();

(this exact code is unlikely to be present, but if the swc variable is
passed around, someone, in a code far far away, may start the swc in a
new thread. Simply because it's a Runnable.)

That is why I'm asking about the usage scenario.
It's also bog standard, and bog simple. Its purpose is to start the
socket thread.

Is it standard... well, I'm not sure. As far as I know, the most typical
thing to do would be to create a Runnable object (containing ONLY the
task logic), and a separate Thread object:

Runnable myRunnable = new MyRunnable(...);
Threat t = new Thread(myRunnable);
t.start();


public class Client
{
public static void main(String[] args)
{
new SportsWinClient().start();
}
}

I'm not sure what your argument is. It is easy enough either way to
create or start numerous examples. I don't and I won't and your
scenario wouldn't prevent that either. And if the Thread is started
more than once it throws and exception.
 
M

markspace

One of the sites' personnel are pretty savvy and just restart the
program when this happens. The other two call in house tech support, who
then calls my boss, who then calls me and I restart it. What I would
like to do is figure out why it quits so I don't have to hear from them.


You're going to have to camp out in their machine room until you find
it. There probably isn't any other way. So far I think we can add
"Cisco VPN" to the list of things that may be giving up the ghost.

BTW, why do you keep referring to bad data? I never mentioned bad data
that I can recollect. I mentioned hardware, operating systems,
debugging, diagnostics, and the need to collect diagnostic information
from the systems involved. I never mentioned data.

My guess is it isn't in the Java code at all, it's somewhere else.
Cisco pretty famous for ignoring specs and RFCs and just doing their own
thing. It causes lots of weird incompatibilities with just about
everything.
 
K

Knute Johnson

You're going to have to camp out in their machine room until you find
it. There probably isn't any other way. So far I think we can add "Cisco
VPN" to the list of things that may be giving up the ghost.

BTW, why do you keep referring to bad data? I never mentioned bad data
that I can recollect. I mentioned hardware, operating systems,
debugging, diagnostics, and the need to collect diagnostic information
from the systems involved. I never mentioned data.

My guess is it isn't in the Java code at all, it's somewhere else. Cisco
pretty famous for ignoring specs and RFCs and just doing their own
thing. It causes lots of weird incompatibilities with just about
everything.

Because the only thing two things I can think of right now are bad data
causing an exception that I'm not seeing for some reason or something in
the TCP/IP connection sequence that is blocking the socket creation and
hanging the code at that point.
 
L

Lew

x said:
But, the volatile keyword does not provide any kind of access control. It just
tells the JVM "hey, if this variable is changed by thread A, the changed value
must be visible for thread B, so don't cache it too aggressively'.

It does more than that.

All writes from thread A prior to the write to the volatile variable are
visible from thread B after the latter reads that variable.

However I agree that multiple views of the same volatile variable introduce
danger, as in the idiom

if (volaVar != null)
{
volarVar.doSomething();
}

because there's a threat window between the conditional check for non-'null'
and the dereference to the method call where another thread could set the
variable to a different value.

There's also a threat when you use two volatile variables to control a code block:

if (volaVar1.equals(SOME_VALUE) && volaVar2.equals(DIFFERENT_VALUE)) ...

because 'volaVar1' can change by the time you test the second clause.
 
J

John B. Matthews

Knute Johnson said:
Because the only thing two things I can think of right now are bad
data causing an exception that I'm not seeing for some reason or
something in the TCP/IP connection sequence that is blocking the
socket creation and hanging the code at that point.

Any chance they started preemptively rebooting errant hardware at a time
that correlates with your observation?
 
K

Knute Johnson

Any chance they started preemptively rebooting errant hardware at a time
that correlates with your observation?

I think that is likely but I have no way to get that information. The
company holds tight to information like that.

I've put in code to trap any Throwable and also to look for the
possibility of bad data. Now I just have to wait and see if it happens
again.

Thanks for looking.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,776
Messages
2,569,603
Members
45,189
Latest member
CryptoTaxSoftware

Latest Threads

Top