troubleshooting urouter problems: webex presentation


Upload: uniface

Post on 10-May-2015


DESCRIPTION

• Overview of exclusive and shared architecture • Do’s and don’ts • Under the hood – the Urouter log • Troubleshooting techniques

TRANSCRIPT

Troubleshooting Urouter Problems Oct. 31, 2013

Chris Breemer, Compuware Technical Support

Table of contents

1. Overview of exclusive and shared architecture
2. Client-side troubleshooting
3. Under the hood – the urouter log
4. Server-side troubleshooting
5. Troubleshooting runtime problems
6. Troubleshooting web connection problems

What NOT to expect

• A comprehensive discussion of all possible urouter and userver features.
• A configuration and performance tuning guide.
• A Tomcat/IIS/Web configuration guide.
• A document for all platforms. Only Windows and Unix are discussed.
• A list of all possible problems and their solution

What we’ll do

• Take you through the entire connection process
• Examine the urouter trace file to get the big picture of what happens under the hood
• Highlight the usual spots where things can go wrong
• Present some of the tools that can be used for monitoring and troubleshooting

1 – Overview of exclusive and shared architecture

Traditionally, the Uniface client-server architecture gave every client its own dedicated polyserver. A simple and robust protocol, it was entirely satisfactory in the time when it was designed. However over the years, with installations growing bigger and bigger, the limitations of this approach became clear:

• Too many processes created on the server – 300 clients meant 300 polyserver processes on the server, using up a lot of memory and other resources. Most platforms also have certain restrictions and quotas that limit the number of polyservers.
• Too many database connections, because each polyserver process needs to connect to the database, causing strain on the database
• High licensing costs - well, not with the old SEK-based licensing but with the new DLM licensing which checks on numbers.

Therefore, in Uniface 8, we introduced the shared architecture. The key to the shared architecture is a new middleware process called the urouter. Clients now communicate only with the urouter, which maintains a pool of uservers that can serve any client as long as that client does not ask for a specific server. With this approach it is possible to have far fewer userver (the new name for polyserver) processes than there are client processes. This can greatly reduce the Uniface footprint on the server.

It is possible to assign servers a specific role, like being a database-only server, application-only server, file-only server, or even a server specific for one database. This feature is seldom if ever used however, and we will not discuss it here. In this presentation, any server is a database server, application server, and file server simultaneously.

By default, a database request from a client has no preference for a specific server. In that case the urouter can pick one of the currently running uservers that are in status Idle and assign it the client’s request. Typically, that will be the most recently used userver.

This is a good moment to discuss the specific states that shared uservers can be in :

Idle, no state

The userver is not busy executing a request, it is not locked to a client, and has no state (i.e. instances). It is available to serve any client’s request as long as that client does not have a preference. In a web environment, where stateless components are used, this is the state we will expect to see for most uservers.

Idle, has state

The userver is not busy executing a request, it is not locked to a client, but has state (i.e. one or more open instances). It is available to serve any client’s request as long as that client does not have a preference, but a client who does an activate of the open instance will use only this userver.

Locked

The userver is locked to a specific client when it is in a transaction. This typically happens when the client modifies an occurrence and thus locks a row in the database on the server. This userver is not available to serve requests from other clients until the transaction is committed (i.e. the modifications stored). The client in question will during this time ONLY use this same userver. Note that this userver can also have state for the same or any other client. This you can see in the Router Monitor, but not in the urouter log.

Busy

The userver is currently executing a request for a client and is obviously not available for anything until the request is completed and the response sent back to urouter.
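The four states can be summarised as a small decision rule. The sketch below is a toy model in Python, not urouter code: the names Userver, state_label and can_serve are invented for illustration, and the rules simply restate the descriptions above.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Userver:
    """Toy model of one shared userver (hypothetical, for illustration only)."""
    sid: int
    busy: bool = False                     # currently executing a request
    locked_to: Optional[str] = None        # client that owns an open transaction
    instance_owners: Set[str] = field(default_factory=set)  # clients with open instances


def state_label(srv: Userver) -> str:
    """Return the state name used in the text above."""
    if srv.busy:
        return "Busy"
    if srv.locked_to is not None:
        return "Locked"
    return "Idle, has state" if srv.instance_owners else "Idle, no state"


def can_serve(srv: Userver, client: str) -> bool:
    """True if this userver may take a new request from the given client."""
    if srv.busy:
        return False
    if srv.locked_to is not None:
        return srv.locked_to == client     # a locked userver serves only its owner
    return True


idle = Userver(sid=1)
stateful = Userver(sid=2, instance_owners={"client1"})
locked = Userver(sid=3, locked_to="client2", instance_owners={"client1"})

for srv in (idle, stateful, locked):
    print(srv.sid, state_label(srv), can_serve(srv, "client1"))
```

Note that the locked userver in this model still holds state for client1, which is exactly the combination that the Router Monitor can show but the urouter log cannot.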

Locked uservers often cause a problem in old applications that were migrated from Uniface 7 or earlier. These applications often keep transactions open for a long time, so that new uservers have to be started all the time. In the worst case, one could end up with as many uservers as there are clients, plus the added overhead of the urouter juggling all that data traffic. This is clearly not good for performance. For this reason, customers with old applications are often advised to use exclusive uservers only. The shared architecture is best suited for applications that were specifically designed for it.

Another potential pitfall is the fact that shared uservers can hold state for multiple clients, thus enabling one client to overwrite or access the other client’s data. For example consider this case.

Client 1 creates an instance of service A and activates an operation in A that sets a component variable in the service. This is all handled by the same userver, say server with sid=1, which now is in status Idle, has state.

Client 2 also creates an instance of service A and activates the same operation in A, setting the component variable to another value. Because server 1 is available, it will handle the request, using the same instance.

Client 1 now wants to read back the value it had set, but gets the value that was set by client 2.

In combination with the Locked status, this can also lead to unexpected wait situations. Consider this example:

Client 1 retrieves a row and creates an instance of service A. This is all handled by the same userver, say server with sid=1, which now is in status Idle, has state.

Client 2 also retrieves a row and modifies it. Because server 1 was available, both requests are handled by server 1, and the modification causes server 1 to be locked to client 2.

Client 1 now wants to activate an operation in the instance it had created. This can obviously only be handled by server 1, but server 1 is now locked, causing client 1 to wait until client 2 completes its transaction.

Both scenarios can happen in a migrated client-server application that does not manage its instances and transactions carefully. In such cases, using exclusive servers may be the better choice.
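Both scenarios can be replayed with a few lines of Python. This is a deliberately simplified model, assuming that a shared userver looks up an instance by name only and does not check which client created it. The class and method names are hypothetical and do not correspond to anything in Uniface.

```python
class ServiceInstance:
    """Stand-in for an instance of service A holding one component variable."""

    def __init__(self):
        self.component_variable = None


class SharedUserver:
    """Toy shared userver: instances are keyed by name, not by client (assumption)."""

    def __init__(self, sid):
        self.sid = sid
        self.instances = {}
        self.locked_to = None   # client owning an open transaction, if any

    def activate_set(self, client, instance_name, value):
        # The client name is deliberately ignored: any client reaches the same instance.
        inst = self.instances.setdefault(instance_name, ServiceInstance())
        inst.component_variable = value

    def activate_get(self, client, instance_name):
        if self.locked_to not in (None, client):
            return "WAIT"       # must wait until the lock owner commits
        return self.instances[instance_name].component_variable


server = SharedUserver(sid=1)

# Scenario 1: client 2 overwrites the value set by client 1.
server.activate_set("client1", "A", "red")
server.activate_set("client2", "A", "blue")
print(server.activate_get("client1", "A"))

# Scenario 2: client 2 holds a lock, so client 1 cannot reach its instance.
server.locked_to = "client2"
print(server.activate_get("client1", "A"))
```

In this model client 1 first reads back the value written by client 2, and then has to wait as soon as the userver is locked to client 2.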

The Router Monitor

To find out if shared uservers are locked and/or have state, use the Router Monitor (urmon.exe). First connect to the urouter in question using Urouter->Connect, then choose Servers->Show from the pulldown menu. By clicking the small icons to the left of the Server ID fields you can bring up a Server state form for each userver. In the example shown in the presentation, one of our two uservers is Locked and the other isn’t. Both are Idle; however, Server 1 is not available to clients other than the one owning the lock.

The three fields together labeled Current State can have the respective values Idle/Busy, Locked/Has context and Starting/Exiting. Unfortunately, there is no way to find out exactly why a userver is locked, or what context a userver is holding. The urouter does not keep this kind of information. It is worth noting that you cannot see exclusive uservers in the Router Monitor (although you may see them briefly during the connection process).

2 - Client side troubleshooting

Where to start investigating when things don’t work depends on what information you are given. If the customer or end user rings saying “I can’t get into the application”, and you know the application involves an urouter, it is probably best to start right from the beginning, verifying all steps one by one. A common mistake is to take certain things for granted, for example that the client is connecting to the urouter that you think it should be, whereas maybe it is going somewhere completely different. It is good practice not to take ANYTHING for granted, even the things that you are sure should not be wrong. Remember Murphy !

As an example, let’s investigate a classic client/server scenario (we’ll get to web later), where a client application on Windows is supposed to start an exclusive userver on Solaris to retrieve data from Oracle. Many user applications have an application-specific logon screen that verifies the user credentials in the server database, and it is often at that point that something fails. The failure can be anywhere between client and database, but the user’s perception is only that they “can’t log on”. Typical error symptoms here can be

• an hourglass, and/or the application no longer responds
• an error message Logon to database failed
• Logon (TCP:violet+10094|chris|***|userver) failed with status -21, Network logon error

The most important thing is to get the EXACT error symptoms from the user. If they say there was an error on the screen, have them do it again and take a screenshot before closing the application. Always ask for a message frame with $ioprint = 255. Always ask for the customer’s assignment file(s), but do not take for granted that they send the right file(s). Whenever you look at an assignment file, ask yourself if the application is indeed using this file. The quickest way to find out is to insert a deliberate error in the file and verify that the application now refuses to start up. Uniface will always give a clear message about assignment file errors (except in a server process, which is very inconvenient; we’ll get to that). For example, insert this line in your assignment file

[error]

and start the application. This results in a transcript window with the exact error and error location.

If you don’t see that error, your application is NOT using this asn-file. You will see such an error on the screen even if the transcript is redirected to a file elsewhere in the asn-file(s). I have found this little trick extremely useful over the years.

Having made 100% sure what assignment file(s) the application uses, we can now start putting things in there to troubleshoot. If the symptom was an hourglass, it is best to apply a client connect timeout :

[settings]
$net_timeout cct=20s

This will cause the client to give up after 20 seconds when the network path can’t be connected.

Next step is to verify which path is causing the problem. In this case this will be a network path. If you locate the path in the asn-file, do verify the application indeed tries to open this path, by replacing the password in the logon string by a question mark, e.g.

$def tcp:violet+10094|chris|?|userver -ex

This should cause Uniface to display a logon form for path $DEF:

If you don’t see such a logon form, you are definitely not using this path. Keep in mind that the same path could be defined elsewhere, maybe in usys.asn or an included asn-file. It is best to avoid any redundancy in your assignments, and not rely on Uniface’s rules about which assignment takes preference (which depends on the kind of assignment and is not always what you’d expect).

This is a good place to give some general tips about maintaining assignment files so that they are easier to read and maintain. White space is allowed in many places; do take advantage of it. Keeping your asn-files tidy, clear, and unambiguous is a good practice which pays off when there’s a problem. Some tips that can be useful (though you may not want to follow all of them):

• Use spaces (not tabs) to create a tabular layout
• Equal signs can generally be omitted (except when part of the right-side value)
• Work alphabetically
• Use uppercase for section names, lowercase elsewhere (unless it is case sensitive like filenames on Unix)
• Avoid unnecessary comments and commented-out lines
• Avoid empty sections
• Avoid redundancy
• Do not use any settings or flags unless you know why
• Specify full pathnames for files
• Do not rely on Uniface’s precedence rules (avoid duplication across asn-files)

To illustrate some of these points, see how much clearer and tidier an average asn-file can become:

A typical jumble…

[SETTINGS]
$trace_is_true
$variation=CUS
$keyboard=MSWINX
$enhanced_edit=all
$active_field=col=21
$curocc_video=col=21
;$def_curocc_video=col=21
;;; Initialize Booleans as false
$STORE_BOOLEANS_AS_FALSE
;search in DICT first !
$search_object=DBMS_FIRST
$search_descriptor=DBMS_FIRST
;search only in DOL and URR
;$search_object=FILE_ONLY
;$search_descriptor=FILE_ONLY
;;; Uniface License File
;;$license_options LM_LICENSE_FILE="P:\UNIFACE8\Uniface_License.xml"
;;; Uniface 9 License file
$license_options LM_LICENSE_FILE=7188@licserver
[USER_3GL]
;;; demandload=KERNEL32.DLL,ADVAPI32.DLL,CusMail32.dll, ScreenPrint.dll
;;; Load User DLL's
..\ dll\CusMail32.dll /preload
ADVAPI32.DLL /preload
..\ dll\GAPI32.DLL /preload
;;; Load local kernel32.dll laden
KERNEL32.DLL /preload
[LOGICALS]
[PATHS]
$DICT ORA:ORA10G|oradict|oradict
$IDF = $DICT
$SYS = $DICT
$UUU = $DICT
[ENTITIES]
*.TEXT $DICT:*.*
*.DICT $DICT:*.*
*.UVCS $DICT:*.*

… made nice and tidy !

[SETTINGS]
$active_field            col=21
$curocc_video            col=21
$enhanced_edit           all
$keyboard                mswinx
$license_options         lm_license_file=7188@licserver
$search_descriptor       dbms_first
$search_object           dbms_first
$store_booleans_as_false
$trace_is_true
$variation               cus

[PATHS]
$dict                    ora:ora10g|oradict|oradict
$idf                     $dict
$sys                     $dict
$uuu                     $dict

[ENTITIES]
*.UVCS                   $dict:*.*

[USER_3GL]
..\dll\cusmail32.dll     /preload
..\dll\gapi32.dll        /preload
advapi32.dll             /preload
kernel32.dll             /preload
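Some of these tips can be checked mechanically. The following is a rough sketch of such a check in Python, not a Uniface tool. It assumes only what the examples above show: sections are written as [NAME], comment lines start with a semicolon, and the first word of a line is the thing being assigned. The function name lint_asn is invented for this example.

```python
from typing import List, Tuple


def lint_asn(text: str) -> List[Tuple[int, str]]:
    """Return (line number, finding) pairs for a few of the tidiness tips."""
    findings = []
    seen = set()
    section, section_line, entries = None, 0, 0
    # A sentinel section header at the end makes the loop close the last real section.
    lines = text.splitlines() + ["[END]"]
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(";"):
            findings.append((number, "comment line"))
            continue
        if line.startswith("[") and line.endswith("]"):
            if section is not None and entries == 0:
                findings.append((section_line, "empty section " + section))
            section, section_line, entries = line.upper(), number, 0
            continue
        entries += 1
        if "\t" in raw:
            findings.append((number, "tab character"))
        # Treat an optional equal sign after the name as white space.
        tokens = line.replace("=", " ", 1).split()
        if not tokens:
            continue
        key = (section, tokens[0].lower())
        if key in seen:
            findings.append((number, "duplicate assignment " + tokens[0].lower()))
        seen.add(key)
    return findings


sample = """[SETTINGS]
$variation=CUS
;old setting
$variation cus
[LOGICALS]
[PATHS]
$dict ora:ora10g|oradict|oradict
"""

for finding in lint_asn(sample):
    print(finding)
```

The sketch looks at one file at a time. Duplicates across usys.asn and included asn-files, which matter just as much, would need the files to be combined first.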

Back to troubleshooting the connection. Now that we know beyond doubt which path is causing the problem, let’s look at the components of the path, e.g.

$tcp = violet+13001|john|smith|orsv

This simple path assumes quite some things:

• We can reach the host violet on the network
• That machine has an urouter running on port 13001
• That machine has an account for user john (password smith)
• The urouter assignment file on that machine has the UST orsv defined
• The server described in the UST orsv can be started under the account john
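To make those assumptions explicit, it can help to split the path value into its parts before checking them one by one. The helper below is a small Python sketch, not part of Uniface: the layout host+port|user|password|ust is inferred from the examples in this presentation, and the names NetPath and parse_net_path are made up.

```python
from typing import NamedTuple, Optional


class NetPath(NamedTuple):
    host: str
    port: Optional[int]
    user: str
    password: str
    ust: str


def parse_net_path(value: str) -> NetPath:
    """Split a path value such as 'violet+13001|john|smith|orsv' into its parts."""
    value = value.strip()
    if value.lower().startswith("tcp:"):
        value = value[4:]                  # drop the network driver prefix
    parts = [part.strip() for part in value.split("|")]
    if len(parts) != 4:
        raise ValueError("expected host+port|user|password|ust, got: " + value)
    hostport, user, password, ust = parts
    host, _, port = hostport.partition("+")
    return NetPath(host, int(port) if port else None, user, password, ust)


path = parse_net_path("violet+13001|john|smith|orsv")
print(path.host, path.port, path.user, path.ust)
```

Each field then maps onto one of the checks: host for ping, port for telnet, user and password for the account, and ust for the urouter assignment file.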

Let’s check all this step by step. Is the network accessible and can we actually reach that host ? On the commandline, do

$ ping violet

for a sanity check. The expected output will look like this

C:\>ping violet
Pinging violet.emea.cpwr.corp [172.16.32.135] with 32 bytes of data:
Reply from 172.16.32.135: bytes=32 time<1ms TTL=255
Reply from 172.16.32.135: bytes=32 time<1ms TTL=255
Reply from 172.16.32.135: bytes=32 time<1ms TTL=255
Reply from 172.16.32.135: bytes=32 time<1ms TTL=255
Ping statistics for 172.16.32.135:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms

The ping command is available on Windows, Unix and Linux. It can already diagnose certain connection issues. For example:

C:\>ping violet
Pinging violet.emea.cpwr.corp [172.16.33.217] with 32 bytes of data:
Reply from 172.16.43.135: Destination host unreachable.
Reply from 172.16.43.135: Destination host unreachable.
Reply from 172.16.43.135: Destination host unreachable.
Reply from 172.16.43.135: Destination host unreachable.
Ping statistics for 172.16.33.217:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),

The host name was resolved to an IP address (172.16.33.217), so it is known to the DNS server, but the host itself could not be reached. The replies come from 172.16.43.135, which is the machine reporting the failure (in this presentation, the local workstation), not from violet. Most often this means the machine is turned off, does not have a working network connection, or was disposed of without the DNS tables being updated. When you get such an error, consult your network administrator. In Uniface, an unreachable host will cause the client to wait forever (showing the hourglass) unless you have set a client connect timeout, in which case it will eventually show TCP error 10060 (on Windows; on Unix the number will be different).

TCP error [10060]: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (10060)
Logon (TCP:violet+13001|chris|***|userver -ex) failed with status -18, Failed to connect to URouter

This is on Windows. If the client is on Unix, this situation will produce error 111 (Connection refused) :

TCP error [111]: Connection refused (111)
Logon (TCP:violet+10094|chris|***|userver -ex) failed with status -18, Failed to connect to URouter

The message Connection refused sometimes confuses people. It does not indicate a network problem or a rights/access/permission problem; it means that the host could be reached but no process is listening on the specified port.

Another typical ping response is this

C:\>ping violet
Ping request could not find host violet. Please check the name and try again.

meaning that you have the host name wrong or misspelled. In Uniface, a wrong hostname will produce the following error in the message frame:

TCP: No such host (-4)
Logon (TCP:violet+13001|chris|***|userver -ex) failed with status -18, Failed to connect to URouter

If the ping is successful, we know the host is alive and reachable. So how do we know it’s listening on port 13001 ? You can of course go and log on to that machine and check, which we will do, but you can already get some info right here using the telnet command, which is available on all Unix/Linux and most Windows versions. Telnet actually tries to connect, rather than do just a network sanity test. It is important to use both ping (which works on the IP level) and telnet (which works on the TCP level).

While you cannot use telnet to actually converse with the urouter, you can use it to test the connection, specifying the port you want to reach :

C:\>telnet violet 13001
Connecting To violet...Could not open connection to the host, on port 13001: Connect failed

Note that telnet does not tell you WHY it can’t connect. In this case we already made sure with ping that the host was alive and kicking, so the conclusion is that no process is listening on violet on port 13001. This typically means that urouter is not running, or that we have the port number wrong.

The other possible response of telnet is … nothing at all! That usually means it has connected ! On Windows, the screen goes blank, and you have to press Ctrl-] (the Telnet escape character) to get the prompt:

Welcome to Microsoft Telnet Client
Escape Character is 'CTRL+]'
Microsoft Telnet>

On Unix, telnet will display the telnet banner:

Trying 172.16.32.135...
Connected to cwnl-violet.emea.cpwr.corp (172.16.32.135).
Escape character is '^]'.

and also here you press Ctrl-] to get the prompt:

^]
telnet>

Now is a good time to verify you are indeed connected to port 13001 on violet. Use TcpView (a tool presented later in this section):

Or else use the netstat command:

C:\>netstat
Active Connections
  Proto  Local Address          Foreign Address               State
  TCP    0.0.0.0:80             AMS090861D1:0                 LISTENING
  TCP    0.0.0.0:135            AMS090861D1:0                 LISTENING
  TCP    0.0.0.0:445            AMS090861D1:0                 LISTENING
  TCP    0.0.0.0:623            AMS090861D1:0                 LISTENING
  TCP    127.0.0.1:51635        AMS090861D1:51549             ESTABLISHED
  TCP    127.0.0.1:51938        AMS090861D1:0                 LISTENING
  TCP    127.0.0.1:56213        AMS090861D1:56214             ESTABLISHED
  ...
  TCP    172.16.43.135:56623    emea-ams-fs101:microsoft-ds   CLOSE_WAIT
  TCP    172.16.43.135:56643    65.55.246.20:https            ESTABLISHED
  TCP    172.16.43.135:56651    lhr08s02-in-f4:https          TIME_WAIT
  TCP    172.16.43.135:56655    lhr08s02-in-f0:https          TIME_WAIT
  TCP    172.16.43.135:56656    emea-ams-ps002:microsoft-ds   ESTABLISHED
  TCP    172.16.43.135:56659    cwnl-violet:13001             ESTABLISHED
  TCP    172.16.43.135:56660    db3msgr6011506:https          ESTABLISHED

You should find your connection on the specified host and port with the status ESTABLISHED. Now it should definitely be possible to make a connection to a userver on the other end.
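If telnet is not installed on the client, the same TCP-level test can be done with a few lines of Python. This is only a sketch of the idea: check_port and describe_failure are names invented here, and the three failure texts are chosen to line up with the cases discussed above (wrong host name, unreachable host, nothing listening on the port).

```python
import socket


def describe_failure(exc: OSError) -> str:
    """Translate a connection error into the situations discussed above."""
    if isinstance(exc, socket.gaierror):
        return "no such host: check the host name"
    if isinstance(exc, socket.timeout):
        return "timeout: host unreachable or traffic is being dropped"
    if isinstance(exc, ConnectionRefusedError):
        return "refused: host is up but nothing listens on this port"
    return "other error: " + str(exc)


def check_port(host: str, port: int, timeout: float = 5.0) -> str:
    """Try to open a TCP connection, as telnet would, and report the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected: a process is listening"
    except OSError as exc:
        return describe_failure(exc)


# Example call (host and port taken from the path discussed above):
# print(check_port("violet", 13001, timeout=20.0))
```

Like telnet, this only proves that some process accepts connections on the port. It does not prove that the process is a urouter.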

Before moving on to server-side troubleshooting, let’s mention some very useful (and free!) tools I regularly use on Windows client. We’ll see some of these tools in action later on.

TcpView from http://technet.microsoft.com/en-us/sysinternals/bb897437.aspx

See all network connections as well as services listening on ports. TcpView is basically a graphical interface on top of netstat, but is easier to use. Also you can see process properties, and kill specific processes and sockets. A good way to see if your local service is listening or your client has been connected to a service.

Process Monitor from http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx

Monitor system activity of all or selected processes. This can include file I/O, registry activity, network activity and process activity. A great way to find out which file is not being found, or why an “access denied” error is given. You can also see details about sockets and process and thread creation. Generally this is the first tool I deploy when troubleshooting any file-related problem, on Windows.

Process Explorer from http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx

See all running processes and their relations and properties. You can see DLL’s and all other types of resources in use by a specific process, as well as many properties of the process like threads and stack traces. Ideal to check which files, pipes, etc are being used, and which DLL’s are loaded and if they are of the correct version. A very handy option of Process Explorer is the Find->Handle or DLL function in the pulldown menu, which shows you which process has a specific file or DLL open.

Dependency Walker from http://www.dependencywalker.com/

Examine DLL dependencies and exports/entrypoints/ordinals of .exe and .dll files. The profiling option for .exe files is like a debugger, a great way to investigate problems related to loading and initializing dll’s.

Network Monitor from http://www.microsoft.com/en-us/download/details.aspx?id=4865

The tool for monitoring of network traffic in selected processes and conversations. The sure way to see what really goes over the network line, and see details about several network layers. The tool has knowledge of nearly all network-related protocols. Network Monitor is a step up from earlier tools like WireShark (formerly Ethereal) and tcpdump (Unix only). One disadvantage is that (AFAIK) you cannot monitor network traffic within the machine, i.e. between a client and a locally running urouter.

Redmond Path from http://download.cnet.com/Redmond-Path/3000-2094_4-10811594.html

Maintaining the Windows PATH variable is an arduous and error-prone task if you use the standard Windows interface. This GUI path manager makes life a lot easier, and is great to remove unwanted entries from the PATH and move things around.

3 - Under the hood – the urouter log

In most cases the key to troubleshooting connection problems is the urouter log.

By default, the logfile generated by urouter only displays top-level errors such as the dreaded -25, -17 and -16 errors. These are of little use except for alerting you to the fact that there IS a problem, so you always need to go back to the customer to reproduce the problem with full logging. To see the details, you must use $ioprint = 255, and to get the maximum info, use tracing.

The recommended settings are these

[settings]
$ioprint    255
$trc_start  urouter.trc
$trc_levels 9A-Za-z6c5s0R5t0z0N
$trc_info   cat,lvl,dtt

which will be the settings used for this presentation. The term "urouter log" will henceforth mean the urouter.trc file generated by these settings.

TIP While troubleshooting, put your logging where you can SEE it. Don’t stuff it away in files with hard to remember names and locations. Nothing beats seeing things happen in real time. Once you succeed in writing a log- or tracefile, download a program like WinTail or BareTail so you can follow the log and see stuff rolling over the screen as you progress. This is on Windows. On Unix, you can use the tail command to follow a file being written to. Also handy on Unix is directing your logfile to a terminal window. E.g.

[settings]
$ioprint = 255
$putmess_logfile = /dev/pts/2
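Where neither tail nor a tool like WinTail or BareTail is at hand, following a log takes only a few lines of Python. The sketch below is an illustration only. The function names are invented, and it polls the file rather than using any operating-system notification.

```python
import time


def read_new_lines(path, offset):
    """Return (complete new lines, new offset) for data appended after offset."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    head, sep, _partial = data.rpartition(b"\n")
    if not sep:
        return [], offset          # no complete line yet, try again later
    lines = [raw.decode("utf-8", "replace").rstrip("\r") for raw in head.split(b"\n")]
    return lines, offset + len(head) + 1


def follow(path, interval=0.5):
    """Print lines as they are appended to the file; stop with Ctrl-C."""
    offset = 0
    while True:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            print(line)
        time.sleep(interval)


# Example call: follow("urouter.trc")
```

A line that is only partly written is left alone until its newline arrives, so half-written trace lines never show up on the screen.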

Before using the urouter log for troubleshooting, let's go through it following a successful connection. This gives a pretty good idea of the flow of events, and it tells you what you should expect to see (never examine a log without knowing what it SHOULD look like). We will list all lines from the log, interspersed with explanations of what is about to happen.

The following is what you see when an urouter is started on Windows. The startup banner contains valuable information about the environment. On Windows, it will display the full command line, unless it is a service we are starting, in which case it displays the service name (like in this case: Uniface 96 Development URouter). On other platforms, the command line is not available in this banner.

[startup] ======================================================================
[startup] Date/time : 2013-09-20 13:09:29.27
[startup] Uniface : [MSW] 9.6.03.02, X301 (Sep 18 2013), $ioprint=255
[startup] Command : Uniface 96 Development URouter, pid=3620
[startup] Directory : d:\uf\96\common\bin\
[startup] OS : Windows 7 Service Pack 1 (Build 7601)
[startup] Processor : Intel64 Family 6 Model 42 Stepping 7, Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 8149 Mb
[startup] Hostname : AMS090861D1.clients.emea.cpwr.corp
[startup] User : system
[startup] ======================================================================

Urouter.exe, which is not much more than a startup shell, loads urout.dll, which contains most or all of the urouter code. Always check that the DLL version is the one expected (i.e. the same as the version of urouter.exe). This check is of course specific for Windows, on other platforms we don't have the luxury of DLL versions. Also displayed is the source version number of rout.c, which is important as that file contains most of the urouter code.

9 1F 1379682569 Loaded 'urout' from d:\uf\96\common\bin\urout.dll, version: 9.6.03 X301
9 2F 1379682569 CONT_ID=%fv: rout.c-163 % %dc: Mon Mar 04 16:01:33 2013 %

Urouter declares itself started and creates a listen thread:

9 3F 1379682569 URouter started at 20-sep-2013 13:09:29
9 4F 1379682569 URouter pid=3620;rid=E689D463-26F2-4930-BADC-F2B7D5DFD2B3
9 5F 1379682569 started thread to listen to TCP:+13001
9 6F 1379682569 UROUTERSTART: waiting for listening threads
9 7F 1379682569 listen_net: new thread active, cnt=2, lst=0, pmq=0
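The trace lines shown so far share a fixed prefix, which makes them easy to filter with a script. The Python sketch below splits that prefix into fields. The interpretation is inferred from the lines in this presentation, not from Uniface documentation: the first number looks like the trace level, the second a running counter followed by a one-letter category, and the third a timestamp in seconds since 1970. Reading that timestamp as UTC reproduces the 13:09:29 of the startup banner for this particular log. Whether that holds on every system is an assumption.

```python
import re
from datetime import datetime, timezone

# level, counter + category letter, timestamp, message
TRACE_LINE = re.compile(r"^\s*(\d+)\s+(\d+)([A-Za-z])\s+(\d+)\s+(.*)$")


def parse_trace_line(line):
    """Return a dict with the prefix fields, or None for lines without the prefix."""
    match = TRACE_LINE.match(line)
    if match is None:
        return None
    level, counter, category, stamp, message = match.groups()
    return {
        "level": int(level),
        "counter": int(counter),
        "category": category,
        "time": datetime.fromtimestamp(int(stamp), tz=timezone.utc),
        "message": message,
    }


record = parse_trace_line("9 5F 1379682569 started thread to listen to TCP:+13001")
print(record["category"], record["time"].isoformat(), record["message"])
```

With the lines parsed, it becomes simple to show, for example, only the F category, or only the messages that mention a given chn or sid.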

The listen thread loads the PSV middleware DLL umwpsv10.dll. This DLL implements the polyserver protocol, like handshaking and building of specific client-server messages. Again, do verify the DLL version.

9 8F 1379682569 Loaded 'umwpsv10' from d:\uf\96\common\bin\umwpsv10.dll, version: 9.6.03 X301

The listen thread loads the TCP driver, utcp10.dll, so that it can start listening for connection requests. This DLL is the actual interface to TCP/IP, i.e. the network, using sockets for connection. As always, do check the DLL version.

9 9F 1379682569 Loaded 'utcp10' from d:\uf\96\common\bin\utcp10.dll, version: 9.6.03 X301

Some internal housekeeping…

5 1s 1379682569 UNWTCP: enter TCP(6424304) call=NETINFO, chn=0, lst=0
5 2s 1379682569 UNWTCP: exit TCP(6424304) call=NETINFO, chn=0, lst=0, result=NET_SUCCESS, err=0
5 3s 1379682569 UNWTCP: enter TCP(6424304) call=NETCREATE_SHARED, chn=0, lst=0

Urouter now calls bind() (this is a function in the TCP socket API) to associate the name of the host with the socket.

3 4s 1379682569 UNWTCP: TCP6create : bind(): chn=613 hst=AMS090861D1.clients.emea.cpwr.corp on TCP4

Some more housekeeping. Unfortunately, the listen() call that will put the socket in listen mode is not displayed here, unless it fails.

5 5s 1379682569 UNWTCP: exit TCP(6424304) call=NETCREATE_SHARED, chn=0, lst=613, result=NET_SUCCESS, err=0
5 6s 1379682569 UNWTCP: enter TCP(6459368) call=NETINSTANCE, chn=0, lst=613
5 7s 1379682569 UNWTCP: exit TCP(6459368) call=NETINSTANCE, chn=0, lst=613, result=NET_SUCCESS, err=0
5 8s 1379682569 UNWTCP: enter TCP(6459368) call=NETCONNECT, chn=0, lst=613
9 10F 1379682569 UROUTERSTART: All listening threads started

This is where the log pauses when urouter is successfully started up. It is now listening for connection requests on its designated port 13001. It is always a good idea to verify, using a command like netstat or TcpView, that the socket is in LISTEN or LISTENING mode. Here is how that looks in TcpView :

On Unix or Linux (or on Windows, if you prefer the command line to the GUI) you use the netstat command:

$ netstat -a | grep 13001
*.13001 *.* 0 0 49152 0 LISTEN
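When this check has to be repeated often, for instance from a monitoring script, the netstat output can be tested for a listening socket with a small helper. The Python sketch below works on text that has already been captured. It assumes only the two layouts shown in this presentation (Windows: protocol column first, address as host:port; Unix: address first, written as host.port). The function name is invented.

```python
def is_listening(netstat_output: str, port: int) -> bool:
    """True if some line shows a LISTEN/LISTENING socket on the given local port."""
    suffixes = (":" + str(port), "." + str(port))
    for line in netstat_output.splitlines():
        fields = line.split()
        if not fields or fields[-1].upper() not in ("LISTEN", "LISTENING"):
            continue
        # The local address is the first column on Unix, the second on Windows.
        for candidate in fields[:2]:
            if candidate.endswith(suffixes):
                return True
    return False


unix_sample = "*.13001 *.* 0 0 49152 0 LISTEN"
windows_sample = """TCP 0.0.0.0:80 AMS090861D1:0 LISTENING
TCP 172.16.43.135:56659 cwnl-violet:13001 ESTABLISHED"""

print(is_listening(unix_sample, 13001))
print(is_listening(windows_sample, 13001))
```

The second sample contains port 13001 only as the remote end of an established connection, so it does not count as a listener on the local machine.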

So now, our urouter is all set up to go, waiting to get to work. From now on, the flow of events depends on whether the client requested an exclusive or shared userver. Let's first look at an exclusive connection.

The request from a client comes in:

9 10F 1379682041 accepted new connection on TCP:+13001
5 11s 1379682041 UNWTCP: enter TCP(3512392) call=NETINSTANCE, chn=0, lst=609
5 12s 1379682041 UNWTCP: exit TCP(3512392) call=NETINSTANCE, chn=0, lst=609, result=NET_SUCCESS, err=0
5 13s 1379682041 UNWTCP: enter TCP(3512392) call=NETCONNECT, chn=0, lst=609

A thread is created to handle the request, so that urouter has its hands free to accept new requests :

9 2Z 1379682041 thpsv: new thread received u=3511008, thp=7798664, net=3510248, upsv=0, rmth=0, tha=0
1 3Z 1379682041 thpsv: new thread active chn=621, cnt=3, lst=1, pmq=0, cc=1

The urouter reads the socket to obtain the client's connection request

5 14s 1379682041 UNWTCP: enter TCP(3510248) call=NETGET, chn=621, lst=609
5 15s 1379682041 UNWTCP: exit TCP(3510248) call=NETGET, chn=621, lst=609, result=NET_SUCCESS, err=0

The request was for an exclusive userver (EXCLTCON). You see full details about the requesting client (clt=) and the logon information (log=).

9 11F 1379682041 From Client:chn=621;len=151: EXCLTCON;
9 12F 1379682041 clt=(hst=172.16.43.135,AMS090861D1.clients.emea.cpwr.corp;pid=14968;tid=13548;sid=0;usr=cwnl-chris;ust=)
9 13F 1379682041 log=(hst=TCP:localhost+13001;usr=emea\cwnl-chris;ust=userver -ex)
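The clt= and log= groups are plain key=value lists separated by semicolons, so they can be pulled apart mechanically when many connection requests have to be compared. A minimal Python sketch follows. The function name is hypothetical and the format is assumed to be exactly as in the two lines above.

```python
import re

# name=( ... ) groups such as clt=(...) and log=(...)
GROUP = re.compile(r"(\w+)=\(([^)]*)\)")


def parse_groups(message):
    """Return {group name: {key: value}} for every name=(k=v;k=v) group found."""
    result = {}
    for name, body in GROUP.findall(message):
        fields = {}
        for item in body.split(";"):
            key, _, value = item.partition("=")
            fields[key.strip()] = value.strip()
        result[name] = fields
    return result


line = r"log=(hst=TCP:localhost+13001;usr=emea\cwnl-chris;ust=userver -ex)"
print(parse_groups(line)["log"]["ust"])
```

Comparing the ust and usr values of the log group with what the client's assignment file was supposed to send is a quick way to spot a client that uses a different path than expected.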

Next is some housekeeping :

9 14F 1379682041 reguser: nid=172.16.43.135, node=AMS090861D1.clients.emea.cpwr.corp, pid=14968, ust=
1 1a 1379682041 claimsrv: want sid=0 usr=cwnl-chris ust=userver mnem=ANY ex=1 strt=1
5 2a 1379682041 prepare_to_wait: for sid=1; ust=userver
5 3a 1379682041 prepare_to_wait: Queued client entry #1 in server queue
5 16s 1379682041 UNWTCP: enter TCP(3510248) call=NET_GETOSCHAN, chn=621, lst=609
5 17s 1379682041 UNWTCP: exit TCP(3510248) call=NET_GETOSCHAN, chn=621, lst=609, result=NET_SUCCESS, err=0

The big moment, urouter is going to start our userver. Note that urouter assigns a server id (srvid or sid) to each userver it starts. For an exclusive server this is not so relevant but for shared servers this number is all-important.

9 15F 1379682041 svstart: starting server: user=emea\cwnl-chris; pgm=d:\uf\96\common\bin\userver.exe -srvid=1 -dnp=TCP:+13001||DA0210E1-A6DC-44C5-89E1-58627D9ABCAF| -drv=ANY -ust=userver -chn=620 -ex /adm=d:\uf\96\common\adm -dir=D:\uf\96
1 1S 1379682041 useCreatePAU: Inheriting handle=620

This is a very important line, the one you will always be looking for first. Here, control is effectively passed to the operating system to start the userver process. This is obviously a point where many things can go wrong. If so, an error message will usually be reported immediately. Note that userver is passed a handle to the open connection with the client (chn=620). This is known as connection inheritance. The userver will use this same channel (after urouter has closed its copy of it) to talk directly with the client.

In this example there is no error, so urouter reports that userver was successfully started, and proceeds to wait for the userver to report back. This is again a point where things can go wrong, if userver has some startup problem or aborts prematurely. Note the process id (pid), which may come in handy when you go looking for the userver process or for a log or trace file that contains the pid in its name.

9 16F 1379682041 svstart: Succesfully launched server, new pid=672
5 4a 1379682041 handle_wait: wait for server sid=1; ust=userver
5 5a 1379682041 handle_wait: wait for client entry #1 in server queue
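Since the svstart lines are the first thing to look for, a small script can pull them out of a large log. This is a rough Python sketch under stated assumptions: the function name find_server_starts is hypothetical, the patterns are derived only from the two log lines shown above (including the log's own spelling "Succesfully"), and the program path is assumed to contain no spaces.

```python
import re

# Patterns taken from the two svstart lines shown above; "Succesfully" is
# spelled the way the log spells it.
START = re.compile(r"svstart: starting server: user=(?P<user>[^;]+); pgm=(?P<pgm>\S+)")
LAUNCHED = re.compile(r"svstart: Succesfully launched server, new pid=(?P<pid>\d+)")

def find_server_starts(lines):
    """Pair every 'starting server' line with the pid reported after it."""
    starts = []
    for line in lines:
        m = START.search(line)
        if m:
            starts.append({"user": m.group("user"), "pgm": m.group("pgm"), "pid": None})
            continue
        m = LAUNCHED.search(line)
        if m and starts and starts[-1]["pid"] is None:
            starts[-1]["pid"] = int(m.group("pid"))
    return starts

sample = [
    r"9 15F 1379682041 svstart: starting server: user=emea\cwnl-chris; pgm=d:\uf\96\common\bin\userver.exe -srvid=1",
    r"9 16F 1379682041 svstart: Succesfully launched server, new pid=672",
]
for start in find_server_starts(sample):
    print(start)
```

An entry whose pid is still None is a start attempt that was never confirmed, which is exactly the spot in the log to read more closely.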

Next, we see the newly created userver establishing a network connection to urouter (it has just been started so it has none yet). At this stage, the userver is just another client to urouter, with a request that needs to be handled, so a new thread is created, which will be terminated once the userver has successfully registered.

3 18s 1379682041 UNWTCP: TCPaccept: chn=665 got host=AMS090861D1.clients.emea.cpwr.corp on TCP4
5 19s 1379682041 UNWTCP: exit TCP(3512392) call=NETCONNECT, chn=665, lst=609, result=NET_SUCCESS, err=0
9 17F 1379682041 accepted new connection on TCP:+13001
5 20s 1379682041 UNWTCP: enter TCP(3524160) call=NETINSTANCE, chn=0, lst=609
5 21s 1379682041 UNWTCP: exit TCP(3524160) call=NETINSTANCE, chn=0, lst=609, result=NET_SUCCESS, err=0
5 22s 1379682041 UNWTCP: enter TCP(3524160) call=NETCONNECT, chn=0, lst=609
9 4Z 1379682041 thpsv: new thread received u=3523320, thp=7798664, net=3512392, upsv=0, rmth=0, tha=0
1 5Z 1379682041 thpsv: new thread active chn=665, cnt=4, lst=1, pmq=0, cc=1
5 23s 1379682041 UNWTCP: enter TCP(3512392) call=NETGET, chn=665, lst=609

The server registration comes in (SRVCON). Only now can urouter be sure the userver is alive and kicking, and include it in its internal administration.

5 24s 1379682041 UNWTCP: exit TCP(3512392) call=NETGET, chn=665, lst=609, result=NET_SUCCESS, err=0
9 18F 1379682041 From Server:chn=665;len=219: SRVCON;
9 19F 1379682041 clt=(hst=172.16.43.135,AMS090861D1.clients.emea.cpwr.corp;pid=672;tid=7216;sid=1;usr=cwnl-chris;ust=userver)
9 20F 1379682041 log=(hst=TCP:AMS090861D1.clients.emea.cpwr.corp+13001;usr=cwnl-chris;ust=userver -drv=ANY -oschn=620;rid=DA0210E1-A6DC-44C5-89E1-58627D9ABCAF)
5 1c 1379682041 srvload: local server registering sid=1;rid=DA0210E1-A6DC-44C5-89E1-58627D9ABCAF
9 21F 1379682041 reguser: nid=172.16.43.135, node=AMS090861D1.clients.emea.cpwr.corp, pid=672, ust=userver
9 22F 1379682041 srvload: this is server sid=1

Urouter tells userver it has been successfully registered (CONANS) and can start communicating directly with the client.

9 23F 1379682041 To Server:chn=665;len=3: CONANS; continue:sid=1:
5 25s 1379682041 UNWTCP: enter TCP(3512392) call=NETPUT, chn=665, lst=609
5 26s 1379682041 UNWTCP: exit TCP(3512392) call=NETPUT, chn=665, lst=609, result=NET_SUCCESS, err=0

The following trace lines indicate that the client and userver are now connected to each other.

5 6a 1379682041 notify_next_client: finding next client for sid=1
5 7a 1379682041 notify_next_client: client #1 in server queue can use sid=1
1 6Z 1379682041 thpsv: thread exit, cnt=3, lst=1, pmq=0, cc=0
9 24F 1379682041 handle_wait: Queued client #1 continues with sid=1, ust=userver
5 8a 1379682041 exclusive match reserved

Here, urouter closes its end of the inherited socket:

5 27s 1379682041 UNWTCP: enter TCP(3510248) call=NETCLOSE, chn=621, lst=609
5 28s 1379682041 UNWTCP: exit TCP(3510248) call=NETCLOSE, chn=621, lst=609, result=NET_SUCCESS, err=0

You may ask why it shows chn=621 here, and not chn=620 as we saw above. It’s really the same thing. Internally, Uniface uses the actual handle plus one, and is not always consistent in which value it displays.

The following lines look alarming: stopping server? We’ve just started it! But really, all this means is that the entries for this userver, as well as its client, are being removed from urouter’s internal administration. The userver and client of course remain running.

9 25F 1379682041 Stopping server sid=1; shut=0 mode=normal
9 26F 1379682041 Reason for stop: Serv entry is given free
5 29s 1379682041 UNWTCP: enter TCP(3512392) call=NETDISCONNECT, chn=665, lst=609
5 30s 1379682041 UNWTCP: exit TCP(3512392) call=NETDISCONNECT, chn=0, lst=0, result=NET_SUCCESS, err=0
5 9a 1379682041 notify_next_client: finding next client for sid=1
5 10a 1379682041 notify_next_client: no clients found for sid=1

Lastly, urouter removes the thread it had created to handle the server registration.

1 7Z 1379682041 thpsv: thread exit, cnt=2, lst=1, pmq=0, cc=0

At this point, urouter effectively forgets all about what has just happened. Client and userver are connected and can do without the urouter. This also means that exclusive connections are not visible in the Router Monitor (URMON).

So, for problems between a client and an exclusive server, once they have connected to each other, the urouter log is not the place to look. We will get back to that.

Now, let’s see how a shared connection looks under the hood. The urouter log will be virtually the same until the moment when a client connects. The difference starts once urouter discovers what connection the client wants. Whereas with the exclusive connection we saw

9 11F 1379682041 From Client:chn=621;len=151: EXCLTCON;

with a shared connection we now see

9 11F 1380822143 From Client:chn=621;len=147: CLTCON;
9 12F 1380822143 clt=(hst=172.16.43.135,AMS090861D1.clients.emea.cpwr.corp;pid=13288;tid=2884;sid=0;usr=cwnl-chris;ust=)
9 13F 1380822143 log=(hst=TCP:localhost+13001;usr=emea\cwnl-chris;ust=userver)
9 14F 1380822143 reguser: nid=172.16.43.135, node=AMS090861D1.clients.emea.cpwr.corp, pid=13288, ust=

and from here on things go a bit differently. Instead of starting a server right away, urouter ‘shakes hands’ with the client, then sends back a message asking what exactly it wants to be done.

9 15F 1380822143 To Client:chn=621;len=2: CONANS; continue:
5 16s 1380822143 UNWTCP: enter TCP(7704552) call=NETPUT, chn=621, lst=613
5 17s 1380822143 UNWTCP: exit TCP(7704552) call=NETPUT, chn=621, lst=613, result=NET_SUCCESS, err=0
5 18s 1380822143 UNWTCP: enter TCP(7704552) call=NETGET, chn=621, lst=613
5 19s 1380822143 UNWTCP: exit TCP(7704552) call=NETGET, chn=621, lst=613, result=NET_SUCCESS, err=0
9 16F 1380822143 From Client:chn=621;len=43: HANDSHAKE; pv=9:max=4096:ver=9.6~007F
9 17F 1380822143 To Client:chn=621;len=46: HANDSHAKE; pv=9:max=4096:ver=9.6~007F
5 20s 1380822143 UNWTCP: enter TCP(7704552) call=NETPUT, chn=621, lst=613
5 21s 1380822143 UNWTCP: exit TCP(7704552) call=NETPUT, chn=621, lst=613, result=NET_SUCCESS, err=0
5 22s 1380822143 UNWTCP: enter TCP(7704552) call=NETGET, chn=621, lst=613
5 23s 1380822143 UNWTCP: exit TCP(7704552) call=NETGET, chn=621, lst=613, result=NET_SUCCESS, err=0

Note the handshaking details. The client and urouter exchange their major Uniface version (9.6) to be sure they are compatible. Contrary to what you might expect, no handshaking is done between urouter and userver. The urouter expects the userver to be the same version (which is reasonable, as it has started the userver).

The handshaking is more or less informational. A handshaking error is reported only when the client and server have a different major release. An exception here is for web requests. In that case, the client is the WRD which for some reason reports version 8.1 in the handshake.
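Looking back at the very first client message, which is where the exclusive and shared flows part ways, a quick way to see what kind of connections a log contains is to scan for the EXCLTCON and CLTCON requests. The following Python sketch does just that; the function name connection_requests is an invention for this example, and the pattern is based only on the two "From Client" lines shown earlier.

```python
import re

# EXCLTCON = request for an exclusive userver, CLTCON = shared connection,
# as seen in the "From Client" lines earlier in this section.
REQUEST = re.compile(r"From Client:chn=(\d+);len=\d+: (EXCLTCON|CLTCON);")

def connection_requests(lines):
    """Return (channel, kind) for every client connection request in the log."""
    result = []
    for line in lines:
        m = REQUEST.search(line)
        if m:
            kind = "exclusive" if m.group(2) == "EXCLTCON" else "shared"
            result.append((int(m.group(1)), kind))
    return result

sample = [
    "9 11F 1379682041 From Client:chn=621;len=151: EXCLTCON;",
    "9 11F 1380822143 From Client:chn=621;len=147: CLTCON;",
    "9 16F 1380822143 From Client:chn=621;len=43: HANDSHAKE; pv=9:max=4096:ver=9.6~007F",
]
print(connection_requests(sample))
```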

Next, the message from the client is received. In this case it is a database request (DBREQ) caused by the client doing a retrieve:

9 18F 1380822143 From Client:chn=621;len=160: DBREQ; typ=D;av=I;op=I;mod=129;iop=255;ign=0;
9 19F 1380822143 hop=0;dbg=0;pid=13288;tid=2884;qid=0;ins=0;
1 1a 1380822143 claimsrv: want sid=0 usr=cwnl-chris ust=userver mnem=ANY ex=0 strt=1
5 2a 1380822143 prepare_to_wait: for sid=1; ust=userver
5 3a 1380822143 prepare_to_wait: Queued client entry #1 in server queue

The “claimsrv” line above is important. The item want sid=0 signals that this client request is not for a specific userver. This client evidently has no transaction, state, or instances open in a userver, or else it would ask to be served by that specific userver (want sid=N). Instead, the client specifies that this request can be served by any userver that is available (i.e. in Idle state). This is typically what we would expect to see in a web application using stateless requests. Urouter, seeing that no uservers are running yet, decides that this request will be handled by the userver with sid=1, and queues it in anticipation of the userver becoming available.
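To spot which requests are tied to a specific userver, the claimsrv lines can be decoded with a few lines of Python. This is only a sketch: parse_claimsrv is a hypothetical helper, and reading ex=1 as "exclusive" is an inference from the two claimsrv lines in this section (ex=1 in the exclusive log, ex=0 in the shared one), not documented behaviour.

```python
import re

def parse_claimsrv(line):
    """Decode a 'claimsrv: want ...' trace line into a small dictionary.

    Assumption: ex=1 marks an exclusive request, inferred from the sample logs.
    """
    _, _, tail = line.partition("claimsrv: want ")
    if not tail:
        return None
    fields = dict(re.findall(r"(\w+)=(\S*)", tail))
    return {
        "sid": int(fields["sid"]),
        "any_server": fields["sid"] == "0",   # sid=0: no specific userver wanted
        "exclusive": fields.get("ex") == "1",
        "ust": fields.get("ust"),
    }

shared = "1 1a 1380822143 claimsrv: want sid=0 usr=cwnl-chris ust=userver mnem=ANY ex=0 strt=1"
print(parse_claimsrv(shared))
```

A request with any_server set to False is one that has to wait for one particular userver, which is where queueing problems tend to show up.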

Next we see our userver being started:

9 20F 1380822143 svstart: starting server: user=emea\cwnl-chris; pgm=d:\uf\96\common\bin\userver.exe -srvid=1 -dnp=TCP:+13001||CD24AB7D-470A-43F8-9FA1-EFE98719820E| -drv=ANY -ust=userver /adm=d:\uf\96\common\adm -dir=D:\uf\96
9 21F 1380822143 svstart: Succesfully launched server, new pid=4040
5 4a 1380822143 handle_wait: wait for server sid=1; ust=userver
5 5a 1380822143 handle_wait: wait for client entry #1 in server queue

This is almost the same line as we saw for an exclusive userver, except for the absence of the -ex flag (obviously) and the -chn=NNN argument (this userver will communicate with urouter, not with the client). Now urouter will wait for the userver to connect back, which we see happening here, followed by the creation of a new thread and some housekeeping:

3 24s 1380822143 UNWTCP: TCPaccept: chn=665 got host=AMS090861D1.clients.emea.cpwr.corp on TCP4
5 25s 1380822143 UNWTCP: exit TCP(7706152) call=NETCONNECT, chn=665, lst=613, result=NET_SUCCESS, err=0
9 22F 1380822143 accepted new connection on TCP:+13001
5 26s 1380822143 UNWTCP: enter TCP(7720504) call=NETINSTANCE, chn=0, lst=613
5 27s 1380822143 UNWTCP: exit TCP(7720504) call=NETINSTANCE, chn=0, lst=613, result=NET_SUCCESS, err=0
5 28s 1380822143 UNWTCP: enter TCP(7720504) call=NETCONNECT, chn=0, lst=613
9 4Z 1380822143 thpsv: new thread received u=7719120, thp=7576760, net=7706152, upsv=0, rmth=0, tha=0
1 5Z 1380822143 thpsv: new thread active chn=665, cnt=4, lst=1, pmq=0, cc=1
5 29s 1380822143 UNWTCP: enter TCP(7706152) call=NETGET, chn=665, lst=613
5 30s 1380822143 UNWTCP: exit TCP(7706152) call=NETGET, chn=665, lst=613, result=NET_SUCCESS, err=0

Next we see the message from the userver coming in, asking to be registered as a shared server (SRVCON):

9 23F 1380822143 From Server:chn=665;len=208: SRVCON;
9 24F 1380822143 clt=(hst=172.16.43.135,AMS090861D1.clients.emea.cpwr.corp;pid=4040;tid=11328;sid=1;usr=cwnl-chris;ust=userver)
9 25F 1380822143 log=(hst=TCP:AMS090861D1.clients.emea.cpwr.corp+13001;usr=cwnl-chris;ust=userver -drv=ANY;rid=CD24AB7D-470A-43F8-9FA1-EFE98719820E)
5 1c 1380822143 srvload: local server registering sid=1;rid=CD24AB7D-470A-43F8-9FA1-EFE98719820E
9 26F 1380822143 reguser: nid=172.16.43.135, node=AMS090861D1.clients.emea.cpwr.corp, pid=4040, ust=userver
9 27F 1380822143 srvload: this is server sid=1

The server id (sid=1) is the id for this server. These IDs are handed out sequentially by urouter and are the main keys in the urouter’s administration. With the registration done, urouter sends a confirmation answer (CONANS) back to the userver:

9 28F 1380822143 To Server:chn=665;len=3: CONANS; continue:sid=1:
5 31s 1380822143 UNWTCP: enter TCP(7706152) call=NETPUT, chn=665, lst=613
5 32s 1380822143 UNWTCP: exit TCP(7706152) call=NETPUT, chn=665, lst=613, result=NET_SUCCESS, err=0

and searches its administration for a client that can be served by this userver (this will typically be the client that just posted the request):

5 6a 1380822143 notify_next_client: finding next client for sid=1
5 7a 1380822143 notify_next_client: client #1 in server queue can use sid=1
1 6Z 1380822143 thpsv: thread exit, cnt=3, lst=1, pmq=0, cc=0
9 29F 1380822143 handle_wait: Queued client #1 continues with sid=1, ust=userver
5 8a 1380822143 capable match reserved

Having decided that queued client 1 and userver 1 are a match, urouter forwards the client’s database request (DBREQ) to the userver:

9 30F 1380822143 To Server:chn=665;len=160: DBREQ; typ=D;av=I;op=I;mod=129;iop=255;ign=0;
9 31F 1380822143 hop=0;dbg=0;pid=13288;tid=2884;qid=0;ins=0;
5 33s 1380822143 UNWTCP: enter TCP(7706152) call=NETPUT, chn=665, lst=613
5 34s 1380822143 UNWTCP: exit TCP(7706152) call=NETPUT, chn=665, lst=613, result=NET_SUCCESS, err=0
5 35s 1380822143 UNWTCP: enter TCP(7706152) call=NETGET, chn=665, lst=613
5 36s 1380822143 UNWTCP: exit TCP(7706152) call=NETGET, chn=665, lst=613, result=NET_SUCCESS, err=0

You need to realize that a simple retrieve done by the client will result in a number of database requests being sent to the database driver via the urouter (Logon, Open Table, and Select/Fetch). The one you see here is actually the Logon request (never mind all those letter codes). The answer from the server promptly follows and is passed back to the client:

9 32F 1380822143 From Server:chn=665;len=118: ANSWER; typ=Z;av=I;op=M;ret=0,0;
9 33F 1380822143 hop=0;dbg=0;pid=13288;tid=2884;qid=0;ins=0;
9 34F 1380822143 To Client:chn=621;len=118: ANSWER; typ=Z;av=I;op=M;ret=0,0;
9 35F 1380822143 hop=0;dbg=0;pid=13288;tid=2884;qid=0;ins=0;
5 37s 1380822143 UNWTCP: enter TCP(7704552) call=NETPUT, chn=621, lst=613
5 38s 1380822143 UNWTCP: exit TCP(7704552) call=NETPUT, chn=621, lst=613, result=NET_SUCCESS, err=0
5 39s 1380822143 UNWTCP: enter TCP(7706152) call=NETGET, chn=665, lst=613
5 40s 1380822143 UNWTCP: exit TCP(7706152) call=NETGET, chn=665, lst=613, result=NET_SUCCESS, err=0
9 36F 1380822143 From Server:chn=665;len=1150: ANSWER; typ=Z;av=I;op=Z;ret=0,0;
9 37F 1380822143 hop=0;dbg=0;pid=13288;tid=2884;qid=0;ins=0;
9 38F 1380822143 To Client:chn=621;len=1150: ANSWER; typ=Z;av=I;op=Z;ret=0,0;
9 39F 1380822143 hop=0;dbg=0;pid=13288;tid=2884;qid=0;ins=0;
5 41s 1380822143 UNWTCP: enter TCP(7704552) call=NETPUT, chn=621, lst=613
5 42s 1380822143 UNWTCP: exit TCP(7704552) call=NETPUT, chn=621, lst=613, result=NET_SUCCESS, err=0
9 40F 1380822143 sid=1 ready: not locked and no state
5 9a 1380822143 notify_next_client: finding next client for sid=1
5 10a 1380822143 notify_next_client: no clients found for sid=1
5 43s 1380822143 UNWTCP: enter TCP(7704552) call=NETGET, chn=621, lst=613
5 44s 1380822143 UNWTCP: exit TCP(7704552) call=NETGET, chn=621, lst=613, result=NET_SUCCESS, err=0

Note that there are actually TWO responses from the server, both passed back to the client. One of these is the server’s message frame information, which is sent back to the client because we specified $ioprint=255. Without $ioprint, you’ll see only one answer here. Depending on the $ioprint level, there can be more server responses. Note that after completely processing this request, the urouter reports the state of this userver:

sid=1 ready: not locked and no state

and will check if there are any other pending requests in the queue for this specific userver. There are none at this moment, so urouter goes and listens for the next client request. We see a similar exchange of data for the two other driver requests (Open Table and Select/Fetch), which we’ll not include here; it’s just more of the same.

Finally, when this client exits, we see urouter disconnecting the client (though NOT the userver) from its administration, and terminating the thread that was created for this client:

9 103F 1380822145 From Client:chn=621;len=36: CLSNETREQ; typ=X;av=I;op=Z;mod=0;iop=255;ign=0;
9 104F 1380822145 hop=0;dbg=0;pid=13288;tid=2884;qid=0;ins=0;
9 105F 1380822145 To Client:chn=621;len=32: ANSWER; typ=Z;av=X;op=Z;ret=0,0;
9 106F 1380822145 hop=0;dbg=0;pid=13288;tid=2884;qid=0;ins=0;
5 105s 1380822145 UNWTCP: enter TCP(7704552) call=NETPUT, chn=621, lst=613
5 106s 1380822145 UNWTCP: exit TCP(7704552) call=NETPUT, chn=621, lst=613, result=NET_SUCCESS, err=0
5 107s 1380822145 UNWTCP: enter TCP(7704552) call=NETGET, chn=621, lst=613
5 108s 1380822145 UNWTCP: exit TCP(7704552) call=NETGET, chn=0, lst=0, result=Ignoring error, err=-12
9 107F 1380822145 client gone, searching for servers to stop for client: [(AMS090861D1.clients.emea.cpwr.corp/172.16.43.135) cltpid=13288]
5 109s 1380822145 UNWTCP: enter TCP(7704552) call=NETDISCONNECT, chn=0, lst=0
5 110s 1380822145 UNWTCP: exit TCP(7704552) call=NETDISCONNECT, chn=0, lst=0, result=NET_SUCCESS, err=0
7Z 1380822145 thpsv: thread exit, cnt=2, lst=1, pmq=0, cc=0

This concludes our tour of the urouter log, in situations where everything goes well. This knowledge will come in handy when things don’t go well.
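To get the big picture out of a long trace, it helps to strip away the UNWTCP noise and keep only the protocol messages. The Python sketch below does that for the From/To lines seen in this tour. It is an illustration only: message_flow is a made-up name, and the pattern is derived from the log lines in this section rather than from any formal description of the log format.

```python
import re

MESSAGE = re.compile(r"(From|To) (Client|Server):chn=(\d+);len=\d+: (\w+);")

def message_flow(lines):
    """Reduce a urouter trace to one readable line per protocol message."""
    flow = []
    for line in lines:
        m = MESSAGE.search(line)
        if not m:
            continue
        direction, peer, chn, msg = m.groups()
        peer = peer.lower()
        route = f"{peer} -> urouter" if direction == "From" else f"urouter -> {peer}"
        flow.append(f"{route}: {msg} (chn={chn})")
    return flow

sample = [
    "9 11F 1380822143 From Client:chn=621;len=147: CLTCON;",
    "9 15F 1380822143 To Client:chn=621;len=2: CONANS; continue:",
    "5 16s 1380822143 UNWTCP: enter TCP(7704552) call=NETPUT, chn=621, lst=613",
    "9 23F 1380822143 From Server:chn=665;len=208: SRVCON;",
    "9 30F 1380822143 To Server:chn=665;len=160: DBREQ; typ=D;av=I;op=I;mod=129;iop=255;ign=0;",
]
for step in message_flow(sample):
    print(step)
```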

You may have been wondering what the numbers mean at the beginning of each line, like for example

5 108s 1380822145 UNWTCP: exit TCP(7704552) call=NETGET, chn=0, lst=0, result=Ignoring error, err=-12

The first two fields are the message level, and the sequence number followed by a one-letter category (here: level 5, sequence number 108, category s). These are of practical use to Compuware Technical Support only. The large number is the timestamp, in the standard format of seconds since the Epoch (i.e. since Jan. 1, 1970, 00:00:00, or on Windows since Dec. 30, 1899, 00:00:00). It was displayed in this format because I mistakenly used this setting for the tracing:

$trc_info cat,lvl,dtm

where dtm means “datetime” (number of seconds since the Epoch), instead of the recommended

$trc_info cat,lvl,dtt

where dtt means “delta time and thread id” (elapsed time since the start of the log). Then the lines would have looked like this

9 F 0:00.664.41 t=1: URouter started at 24-oct-2013 14:16:41

The delta time is in the following format:

minutes:seconds.milliseconds.microseconds

Both formats make it hard to calculate the absolute datetime for a specific line, even though the starting time of the log is printed near the top:

9 F 0:00.664.41 t=1: URouter started at 24-oct-2013 14:16:41

To make life easier, we also print the absolute datetime whenever a real error is logged, e.g.

9 F 0:16.313.74 t=3: [Thu Oct 24 14:43:59 2013] err=-25: thpsv: Problems handling request
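For lines that carry no absolute datetime, the conversion is easy to script. The Python sketch below makes a few assumptions that should be stated loudly: the dtm values are treated as seconds since Jan. 1, 1970 (the sample values in this section decode to dates in September and October 2013 under that reading), the last part of a delta time is taken as a plain count of microseconds, and the delta "2:05.120.30" is a made-up value used only for illustration. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def epoch_to_utc(seconds):
    """Interpret a dtm timestamp as seconds since 1 Jan 1970 (assumption), in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def parse_delta(text):
    """Convert 'minutes:seconds.milliseconds.microseconds' into a timedelta."""
    minutes, rest = text.split(":")
    seconds, millis, micros = rest.split(".")
    return timedelta(minutes=int(minutes), seconds=int(seconds),
                     milliseconds=int(millis), microseconds=int(micros))

def absolute_time(started_at, started_delta, line_delta):
    """Estimate the wall-clock time of a dtt line, using the
    'URouter started at ...' line as the reference point."""
    return started_at + (parse_delta(line_delta) - parse_delta(started_delta))

# A dtm value taken from the exclusive-connection log above.
print(epoch_to_utc(1379682041).isoformat())

# The start line reads: 0:00.664.41 ... URouter started at 24-oct-2013 14:16:41
started_at = datetime(2013, 10, 24, 14, 16, 41)
print(absolute_time(started_at, "0:00.664.41", "2:05.120.30"))  # hypothetical delta
```

The result is only as precise as the start line, which is printed to the second, so treat the computed time as an estimate.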

4 - Server-side troubleshooting

So, we have been able to verify that we’ve done everything right on the client, and we know what to expect from urouter when all goes well. Yet, there will usually be some problem that we need to find. Let’s examine the checks that can be done, using the one-step-at-a-time approach, and the tools to use. For Windows, the tools were already mentioned. On Unix/Linux, the most important tool to know about is truss (which is called strace on Linux). As a rule, these programs come with the operating system. If they are not there, ask your system administrator to install them. We’ll talk more about truss/strace later.

The first check is whether the urouter process is indeed running. On Windows, use Task Manager or Process Explorer; on Unix, use the ps command. If it is running, shut it down because we are going to start from scratch. From the urouter shortcut, service definition, script, or whatever it is that starts the urouter, get the name of the assignment file. This will usually be uniface/adm/urouter.asn, but this could have been overridden on the command line, for example. It is important to be 100% sure about which assignment files are being used. If you don’t know exactly, find out using Process Monitor (on Windows) or truss/strace (on Unix/Linux). An example of doing this on Linux with strace:

$ strace -o strace.log common/bin/urouter
^C
$ grep '.asn' strace.log
open("/h/chris/uf/96/lia/uniface/adm/usys.asn", O_RDONLY) = 3
open("/h/chris/uf/96/lia/common/adm/usys.asn", O_RDONLY) = 4
open("urouter.asn", O_RDONLY) = -1 ENOENT (No such file or directory)
open("urouter.asn", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/h/chris/uf/96/lia/uniface/adm/urouter.asn", O_RDONLY) = 3

This tells you exactly what asn-files are being opened, and which are tried but were not found (the urouter.asn in the working directory). On Windows, you get similar information from Process Monitor if you filter on “Path ends with .asn”.
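If the strace log is large, the same check can be scripted. The following Python sketch sorts the .asn files into "opened" and "tried but not found". It assumes strace output in the Linux format shown above (it also accepts the openat() form that newer strace versions print); asn_files is a hypothetical helper name, and truss output on other Unix flavours, which reports errors as Err#2 ENOENT, would need a different pattern.

```python
import re

# Matches open("x.asn", ...) = N and openat(AT_FDCWD, "x.asn", ...) = N
OPEN_CALL = re.compile(
    r'open(?:at)?\((?:\w+,\s*)?"(?P<path>[^"]+\.asn)"[^)]*\)\s*=\s*(?P<ret>-?\d+)'
)

def asn_files(strace_lines):
    """Return (found, missing) lists of assignment files seen in strace output."""
    found, missing = [], []
    for line in strace_lines:
        m = OPEN_CALL.search(line)
        if not m:
            continue
        target = missing if int(m.group("ret")) < 0 else found
        if m.group("path") not in target:
            target.append(m.group("path"))
    return found, missing

sample = [
    'open("/h/chris/uf/96/lia/uniface/adm/usys.asn", O_RDONLY) = 3',
    'open("urouter.asn", O_RDONLY) = -1 ENOENT (No such file or directory)',
    'open("urouter.asn", O_RDONLY) = -1 ENOENT (No such file or directory)',
    'open("/h/chris/uf/96/lia/uniface/adm/urouter.asn", O_RDONLY) = 3',
]
found, missing = asn_files(sample)
print("opened:", found)
print("not found:", missing)
```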

If you THINK you know what assignment files urouter uses, and can’t or don’t want to run a tool to verify it, then insert a syntax error in the file and verify that urouter now refuses to start. Unfortunately, urouter will not report the error the way a client application would.

Although urouter does open and read uniface/adm/usys.asn, like any Uniface process does, there is usually nothing much here relevant to urouter. It is good practice to write the urouter assignment file so it is self-contained, and will not need assignments in usys.asn. A typical urouter assignment file is simple and small, e.g.

[SETTINGS]
$ioprint 255
$putmess_logfile urouter.log
$default_net TCP:+13001|||
[SERVERS]
userver /h/chris/uf/96/lia/common/bin/userver /dir=/h/chris/uf/96/lia

As a rule, you don’t need anything else besides the [SETTINGS] and [SERVERS] sections. The above is usually enough for a first successful test.
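To double-check what such a file actually defines, for instance which USTs exist and which port is set, you can list its sections with a few lines of Python. This is a deliberately naive sketch and not how Uniface itself reads assignment files: read_asn is a hypothetical helper, and it assumes one assignment per line with the name separated from the value by the first space, as in the example above.

```python
def read_asn(text):
    """Group the lines of an assignment file by [SECTION].

    Naive assumption: each non-empty line is 'name value', split at the first space.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].upper()
            sections.setdefault(current, [])
        elif current is not None:
            name, _, value = line.partition(" ")
            sections[current].append((name, value.strip()))
    return sections

sample = """[SETTINGS]
$ioprint 255
$putmess_logfile urouter.log
$default_net TCP:+13001|||
[SERVERS]
userver /h/chris/uf/96/lia/common/bin/userver /dir=/h/chris/uf/96/lia
"""
for section, entries in read_asn(sample).items():
    print(section, [name for name, _ in entries])
```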

Note that there are different places where urouter’s port number can be defined: in the asn, on the command line, or in /etc/services. I find it most useful to keep this information in the asn-file:

$default_net TCP:+13001|||

That way it can be easily changed and you don’t need to look in different places. When you have started the urouter, verify in the tracefile that urouter is indeed using this port number by looking for this line:

started thread to listen to TCP:+13001

and then use netstat to check if it is indeed listening:

$ netstat -a|grep 13001
tcp 0 0 *:13001 *:* LISTEN
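When netstat is not at hand, or you want to test from another machine, a plain TCP connect tells you the same thing. The following Python sketch only checks that something accepts connections on the port; it does not speak the urouter protocol, and is_listening is a name chosen for this example. Expect the probe to show up in the urouter log as an accepted connection that is closed again right away.

```python
import socket

def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# 13001 is the port used in the examples of this presentation.
print(is_listening("localhost", 13001))
```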

What if urouter does not start? There could be several reasons for this:

1) Errors with loading DLLs or shared libraries (e.g. LD_LIBRARY_PATH not set, or on Windows the Uniface bin folder is not in PATH).

Tools like Process Monitor, Dependency Walker, truss/strace will usually show what is wrong.

2) Invalid image type (e.g. trying to start a LIA executable on LIB).

Make sure you have installed the correct platform.

3) Assignment statement errors. No message is given for this (unlike in a client).

Check your urouter/userver assignment files for syntax errors by using them with IDF, e.g.

$ $idf /asn=uniface/adm/urouter.asn
8008 - Assignment error: '[SETINGS]' in uniface/adm/urouter.asn:2

4) Logfile in use. No message is given of this. If you need to run multiple instances of urouter, make sure the logfile names specified in the urouter asn are unique by using one or more of the special tokens in the file name:

Token   expanded to
-----   -----------
%p      process id
%u      username
%t      timestamp
%h      hostname

5) Port in use (another urouter already running on the same port number). No message is given of this but you can find this error in the urouter log:

1 s UNWTCP: TCP6create : bind(): chn=6 hst=0.0.0.0 failed ret=98 Address already in use
8 s UNWTCP: TCPclose: chn=6 success
1 s UNWTCP: TCP6create : failed ret=98
1 s UNWTCP: exit TCP(506652288) call=NETCREATE_SHARED, chn=0, lst=0, result=NETERR_UNKNOWN, err=98
9 F can't create listen channel at TCP:+13001
5 s UNWTCP: enter TCP(506652288) call=NETMSG, chn=0, lst=0
5 s UNWTCP: exit TCP(506652288) call=NETMSG, chn=0, lst=0, result=NET_SUCCESS, err=0
9 F TCP (98) TCP error [98]: Address already in use
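Because urouter gives no message on the screen for this, a startup script could scan the log for these lines after launching urouter. Below is a minimal Python sketch of such a check. The patterns are copied from the log excerpt above and may differ in other Uniface versions or on other platforms; listen_problems is a hypothetical name.

```python
import re

# Patterns copied from the log excerpt above.
BIND_FAILED = re.compile(r"bind\(\): chn=\d+ hst=(\S+) failed ret=(\d+) (.*)")
NO_LISTEN = re.compile(r"can.t create listen channel at (\S+)")

def listen_problems(lines):
    """Collect readable messages for bind and listen failures in a urouter log."""
    problems = []
    for line in lines:
        m = BIND_FAILED.search(line)
        if m:
            host, ret, reason = m.groups()
            problems.append(f"bind failed on {host}: {reason.strip()} (ret={ret})")
        m = NO_LISTEN.search(line)
        if m:
            problems.append(f"no listen channel at {m.group(1)}")
    return problems

sample = [
    "1 s UNWTCP: TCP6create : bind(): chn=6 hst=0.0.0.0 failed ret=98 Address already in use",
    "8 s UNWTCP: TCPclose: chn=6 success",
    "9 F can't create listen channel at TCP:+13001",
]
for problem in listen_problems(sample):
    print(problem)
```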

This is a good moment to talk about the main troubleshooting tool on Unix: truss (an acronym for Trace Utility for System Calls and Signals). On Linux, this program is called strace. On older versions of HP-UX, it used to be called tusc, but current versions have truss also. On most machines truss/strace is available by default, but if it isn’t, and you have to troubleshoot, insist on getting it installed. It’s the first tool I turn to when troubleshooting any file-related problem on Unix. From here on we use the name truss to denote either strace or truss. The programs are largely the same, but for specific options you always need to consult the local man page.

You can use truss in one of two ways:

1) Start a program under truss, using the full command line as argument. For example

$ truss common/bin/urouter /pri=255

2) Hook truss up to an already running process, using the process id (pid) as argument. For example

$ truss -p pid

In this most basic scenario, truss outputs on the screen all Unix system calls made by the process, with their arguments and return values, and signals. Be prepared for a lot of output: even the simplest one-liner C program can produce a page or more. There are many command line parameters to control what you want to see and what you don’t. If you are interested only in seeing which files are accessed, you can add the -topen argument. For example:

$ truss -topen,write $idf /who

produces output like this:

open("/var/ld/64/ld.config", O_RDONLY) = 3
open("/usr/lib/64/libc.so.1", O_RDONLY) = 3
open("/usr/lib/64/libdl.so.1", O_RDONLY) = 3
open("/usr/platform/SUNW,Sun-Fire-V490/lib/sparcv9/libc_psr.so.1", O_RDONLY) = 3
open("/.machine", O_RDONLY) Err#2 ENOENT
open("/var/ld/64/ld.config", O_RDONLY) = 3
open("/usr/lib/secure/64/s9_preload.so.1", O_RDONLY) = 3
open("/lib/sparcv9/libCrun.so.1", O_RDONLY) = 3
open("/lib/sparcv9/libm.so.1", O_RDONLY) = 3
open("/lib/sparcv9/libCstd.so.1", O_RDONLY) = 3
open("/lib/sparcv9/libc.so.1", O_RDONLY) = 3
open("/h/chris/uf/96/so9/common/lib/libucall.so", O_RDONLY) = 3
open("/lib/sparcv9/libnsl.so.1", O_RDONLY) = 3
open("/lib/sparcv9/libsocket.so.1", O_RDONLY) = 3
open("/lib/sparcv9/libdl.so.1", O_RDONLY) = 3
open("/lib/sparcv9/librt.so.1", O_RDONLY) = 3
open("/lib/sparcv9/libmalloc.so.1", O_RDONLY) = 3
open("/lib/sparcv9/libthread.so.1", O_RDONLY) = 3
open("/usr/lib/64/libmp.so.2", O_RDONLY) = 3
open("/usr/lib/64/libaio.so.1", O_RDONLY) = 3
open("/usr/lib/64/libmd5.so.1", O_RDONLY) = 3
open("/usr/platform/SUNW,Sun-Fire-V490/lib/sparcv9/libc_psr.so.1", O_RDONLY) = 3
open("/h/chris/uf/96/so9/common/lib/libulib.so", O_RDONLY) = 3
open("/h/chris/uf/96/so9/common/lib/libuenc.so", O_RDONLY) = 3
open("/h/chris/uf/96/so9/common/lib/libdlm64.so", O_RDONLY) = 3
open("/h/chris/uf/96/so9/common/lib/liburtl.so", O_RDONLY) = 3
open("/var/run/name_service_door", O_RDONLY) = 3
open("/h/chris/uf/96/so9/common/adm/usys.ini", O_RDONLY) Err#2 ENOENT
open("/h/chris/uf/96/so9/common/adm/usys.ini", O_RDONLY) Err#2 ENOENT
open("/H/CHRIS/UF/96/SO9/COMMON/ADM/USYS.INI", O_RDONLY) Err#2 ENOENT
open("/h/chris/uf/96/so9/uniface/adm/usys.asn", O_RDONLY) = 4
open("/h/chris/uf/96/so9/common/adm/usys.asn", O_RDONLY) = 5
open("idf.asn", O_RDONLY) = 4
open("/usr/share/lib/zoneinfo/MET", O_RDONLY) = 4
open("/h/chris/uf/96/so9/common/adm/usys.ini", O_RDONLY) Err#2 ENOENT
open("/h/chris/uf/96/so9/common/adm/usys.ini", O_RDONLY) Err#2 ENOENT
open("/H/CHRIS/UF/96/SO9/COMMON/ADM/USYS.INI", O_RDONLY) Err#2 ENOENT
open("/h/chris/uf/96/so9/common/usys/usys.urr", O_RDONLY) = 5
open("/h/chris/uf/96/so9/common/usys/udesc.urr", O_RDONLY) = 5
open("/h/chris/uf/96/so9/common/usys/uobj.dol", O_RDONLY) Err#2 ENOENT
open("/h/chris/uf/96/so9/common/usys/uobj.dol", O_RDONLY) Err#2 ENOENT
open("/H/CHRIS/UF/96/SO9/COMMON/USYS/UOBJ.DOL", O_RDONLY) Err#2 ENOENT
open("/h/chris/uf/96/so9/common/usys/usys.dol", O_RDONLY) = 5

I find truss most useful for finding errors on opening files: what files does the program try to open, where does it look for them, and what is the result. In particular, the loading of shared libraries is interesting, because this is the only way to make sure from which directory in LD_LIBRARY_PATH a library is actually loaded. For this purpose you need to look at the open and stat system calls. For example, if you do

$ truss -o truss.log -topen,stat common/bin/urouter

you can see in the output only the calls used to locate and open files:

stat("/h/chris/uf/96/so9/common/lib/libsocket.so.1", 0xFFFFFFFF7FFFE2F0) Err#2 ENOENT
stat("/h/chris/uf/dlm41/Linux/64/libsocket.so.1", 0xFFFFFFFF7FFFE2F0) Err#2 ENOENT
stat("/local/products/dbms/oracle1020/lib/libsocket.so.1", 0xFFFFFFFF7FFFE2F0) Err#2 ENOENT
stat("/local/products/dbms/oracle1020/rdbms/lib/libsocket.so.1", 0xFFFFFFFF7FFFE2F0) Err#2 ENOENT
stat("/cwnl/solaris/compilers/cc57CC57/SUNWspro/lib/rw7/v9/libsocket.so.1", 0xFFFFFFFF7FFFE2F0) Err#2 ENOENT
stat("/cwnl/solaris/compilers/cc57CC57/SUNWspro/lib/v9/libsocket.so.1", 0xFFFFFFFF7FFFE2F0) Err#2 ENOENT
stat("/opt/SUNWspro/lib/v9/libsocket.so.1", 0xFFFFFFFF7FFFE2F0) Err#2 ENOENT
stat("/usr/ccs/lib/sparcv9/libsocket.so.1", 0xFFFFFFFF7FFFE2F0) Err#2 ENOENT
stat("/lib/sparcv9/libsocket.so.1", 0xFFFFFFFF7FFFE2F0) = 0
open("/lib/sparcv9/libsocket.so.1", O_RDONLY) = 3

showing the multiple locations where the system tries to find a shared library. The above is not exceptional but sometimes the list of failed attempts gets really long which is of course not good for performance. It is good practice to review your LD_LIBRARY_PATH (or LD_LIBRARY_PATH64, or LIBPATH, or SHLIB_PATH, depending on your platform) and make sure there are no unwanted, non-existing or duplicate directories, and that the most used directories (like the Uniface lib directory) are not at the very bottom of a long list.
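Reviewing a long library path by eye is error-prone, so here is a small Python sketch that flags empty, duplicate and non-existing entries. It reads LD_LIBRARY_PATH by default; substitute LIBPATH or SHLIB_PATH as appropriate for your platform. The function name audit_search_path is made up for this example, and the script does not judge whether the order of the directories is sensible.

```python
import os

def audit_search_path(value):
    """Return (position, directory, problem) for each suspicious path entry."""
    if not value:
        return []
    seen = set()
    findings = []
    for position, directory in enumerate(value.split(os.pathsep), start=1):
        if not directory:
            findings.append((position, directory, "empty entry"))
        elif directory in seen:
            findings.append((position, directory, "duplicate"))
        elif not os.path.isdir(directory):
            findings.append((position, directory, "does not exist"))
        seen.add(directory)
    return findings

for finding in audit_search_path(os.environ.get("LD_LIBRARY_PATH", "")):
    print(finding)
```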

Some truss flags you need to know about:

-o file

Directs the output to a file, which is usually preferable over getting tons of stuff on your screen. Truss does not work well with I/O redirection.

-t func1,func2,… (truss)
-e trace=func1,func2,… (strace)

Directs truss only to trace the specified functions.

-t !func3,func4,… (truss)
-e trace=!func3,func4,… (strace)

Directs truss NOT to trace the specified functions.

-t !nanosleep (truss)
-e trace=!nanosleep (strace)

Prevents the output from filling up with nanosleep() calls. Calling this function is what urouter does when it has nothing better to do. You’ll want to use this especially when you hook up to a running urouter.

-w 2

Trace full I/O buffers for file descriptor 2 (this is the Unix standard error channel, stderr). Very handy to see system-level error messages that would otherwise be lost. We’ll see an example of this later on. Other file descriptors could be used here too, obviously, like 1 for standard output, stdout.

-f

Follow child processes. Use this when you also want to trace all executables spawned by your program. When tracing urouter, you generally do not want to trace all the uservers. When tracing a userver, you generally DO want to trace spawned programs. However, be aware that this option can lead to enormous amounts of output, especially when shell scripts are being executed.

Back to the troubleshooting trip. Having ascertained that we can reach port 13001 on the server, and that urouter is indeed running on that port, it should be possible to make a connection. If you start the client, a line like this should appear in the urouter log

9 10F 1379682041 accepted new connection on TCP:+13001

and in netstat, you should now see an additional connection with status ESTABLISHED:

$ netstat -a | grep 13001
*.13001 *.* 0 0 49152 0 LISTEN
cwnl-violet.13001 AMS090861D1.clients.emea.cpwr.corp.63508 65024 0 49640 0 ESTABLISHED

The urouter now receives the network logon path provided by the client. As an example, suppose the client logon path is

$psv = tcp:violet+13001|chris|urouter2013|prod001 -ex

which means the client requested to start an exclusive userver on the machine violet under the account chris with password urouter2013 and with UST prod001. The urouter must translate the UST prod001 into an actual userver command line that can be started. This information is found in the urouter’s assignment file.
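To make the anatomy of the logon path concrete, here is a minimal shell sketch that splits such a path into its parts. The function name split_logon_path is hypothetical (it is not a Uniface tool), and it assumes the layout connector:host+port|user|password|UST followed by optional switches, as in the example above.

```shell
#!/bin/sh
# split_logon_path: hypothetical helper, not part of Uniface.
# Assumes the layout  connector:host+port|user|password|ust [switches]
split_logon_path() {
    path=${1%% *}                 # drop trailing switches such as " -ex"
    connector=${path%%:*}         # text before the first ':'
    rest=${path#*:}               # host+port|user|password|ust
    old_ifs=$IFS
    IFS='|'
    set -f                        # no filename expansion while splitting
    set -- $rest                  # split the remainder on '|'
    set +f
    IFS=$old_ifs
    hostport=$1
    user=$2                       # $3 is the password; deliberately not printed
    ust=$4
    echo "connector=$connector"
    echo "host=${hostport%%+*}"
    echo "port=${hostport#*+}"
    echo "user=$user"
    echo "ust=$ust"
}

split_logon_path 'tcp:violet+13001|chris|urouter2013|prod001 -ex'
```

The last field, the UST, is the key that urouter looks up in its assignment file.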

The UST definition.

The urouter assignment file contains the parameters for each known userver (known as the UST definition) in the [SERVERS] section, e.g.

[SERVERS]
prod001 /h/chris/uf/94/so9/common/bin/userver /dir=/home/prod001 /pri=255 /max=1

In the above example the UST is prod001, which is what the client requested in the UST part of the network logon path. The definition for this UST is thus

/h/chris/uf/94/so9/common/bin/userver /dir=/home/prod001 /pri=255 /max=1

This looks like a full userver command line but it isn’t – not yet, anyway. Certain parts, like /dir and /max, will not be passed to userver but processed by urouter before starting the userver. Urouter will add command line arguments of its own to the userver command. It is important to realize that there is NO SYNTAX CHECKING here. Everything that is not recognized by urouter is passed verbatim to the userver. That could be command line switches (starting with / or - ) or program arguments. They will be put in the correct order by urouter (i.e. switches first, arguments last). The full userver command line can be found in the urouter log, as we have already seen, in the line starting with svstart. The full command line for the above UST will be

9 F 0:03.905.09 t=3: svstart: starting server: user=chris; pgm=/h/chris/uf/94/so9/common/bin/userver/h/chris/uf/94/so9/common/bin/userver -srvid=1 -pri=255 -dnp=TCP:+13001||0AF1C028-3B14-11E3-B6E9-A25E8C4312FA| -drv=ANY -ust=userver -chn=8 -ex -dir=/h/chris/uf/94/so9

The /dir switch needs special mention. When specified, urouter attempts to make this the working directory for the userver process. However, there is NO ERROR CHECKING. When this fails, because the directory does not exist, is misspelled, or is not accessible, urouter will proceed to start the userver in the current directory. Additionally, it also passes the /dir switch to the userver (this is a bug…), which will silently exit because it cannot set the directory.

TIP – A nice undocumented feature, available as from R122, E102 and 9.6.01, is the possibility to run the userver under a debugger-like program without the need for a wrapper script or program. This is a great way to diagnose problems encountered during userver startup. This can be done in the UST definition by prepending the debugger command and a + sign before the name of the userver. The debugger program in question will usually be truss (on Unix) or Dependency Walker (on Windows). For example:

userver = /usr/bin/truss -f -o /tmp/truss.log + /uf/bin/userver /dir=...

or

userver = c:\tools\depends.exe /pb /od:c:\userver.dwi + D:\uf\bin\userver.exe /dir=...

The plus sign is essential to prevent urouter from mashing up the command line and getting confused about which switches belong to which program.

Please note that full path names must be used for truss and depends.exe. At this stage, there isn’t a command interpreter that will go and look for a command in a list of locations. It has to be just right. When in doubt about where a command is located, use the command which (on Unix) or where (on Windows). For example:

$ which truss
/usr/bin/truss

or

C:\Users\cwnl-chris>where depends
D:\Tools\depends.exe

Another trick here is to have the UST refer to a wrapper script instead of the userver executable:

userver = /uf/bin/userver.sh /dir=...

which gives you complete flexibility about what to do before the userver is launched. In its most basic form, the wrapper script looks like this

#!/bin/ksh
. /h/chris/uf/94/so9/common/adm/insunis
/h/chris/uf/94/so9/common/bin/userver $*

simply passing all arguments on to the userver. The nice thing is that here you have an opportunity to

1) Set any environment variables the userver needs (think of ORACLE_HOME, ORACLE_SID, DSQUERY, etc.)

2) Do initializing, logging, etc.

3) Check userver exit code and handle possible core dumps

4) Use truss if your version is older than R122 or E102 so you can’t use the above trick.

This is described in more detail in this article http://frontline.compuware.com/products/uf/tech/22752.aspx (note: the exec command used in this article is no longer needed in recent versions). Don’t forget to set the executable bit on the script. I have never attempted a wrapper script on Windows and don’t know if it is possible.
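As an illustration of points 2 and 3, here is a sketch of a helper that a wrapper script could use to capture the userver's standard output and standard error and record its exit status. The function name run_logged and the log file location are my own assumptions, not something Uniface provides; the insunis and userver paths in the comments are the ones from the example above.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with stdout and stderr appended to LOGFILE,
# records the exit status in LOGFILE, and returns that status to the caller.
run_logged() {
    log=$1
    shift
    echo "$(date) starting: $*" >> "$log"
    "$@" >> "$log" 2>&1
    status=$?
    echo "$(date) exit status: $status" >> "$log"
    return $status
}

# In a real wrapper you would first set up the environment, for example:
#   . /h/chris/uf/94/so9/common/adm/insunis
#   ORACLE_SID=mydb ; export ORACLE_SID        # hypothetical value
# and then launch the userver with the arguments that urouter passed in:
#   run_logged /tmp/userver.$$.log /h/chris/uf/94/so9/common/bin/userver "$@"
```

Because the helper returns the userver's own exit status, the wrapper can still react to it afterwards, for instance by looking for a core file when the status is non-zero.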

Having parsed the UST definition, urouter will handle the security aspects of the request. It is important to understand what exactly is about to happen now. The keywords are authentication and impersonation.

Authentication

First of all the username and password provided by the client must be checked for validity on the target system. This validation consists of 3 checks:

• Username and password are both required. Neither of them can be empty.

• The user must exist (i.e. has an active account) on the target system.

• The password for this username must be valid.

If one of these checks fails, urouter rejects the connection request and returns error -21 (Authentication error), causing the client to report a logon failure:

Client connect error: -21: cretpsv: Authentication of user/password failed for user foobar
8061 - Network error detected ( (0)).
Logon (TCP:violet+13001|foobar|***|userver -ex) failed with status -21, Network logon error

Don’t be worried about the message “Network error detected”. This is Uniface speak for “I could not get connected to the userver”. If there really is a network error you would see the details of it.

It is worth noting that in most cases, authentication does not require root rights. But in some circumstances it does, e.g. on a Unix system that uses shadow passwords, because /etc/shadow is readable only by root.

On Windows, processes require a set of User Rights. Discussing these is outside the scope of this presentation. A Uniface installation sets the correct rights for urouter and userver. If, however, Windows still reports “A required privilege is not held by the client” when trying to start a urouter or userver, it is hard (or maybe impossible) to find out what specific user privilege it is missing. When in doubt it usually helps to assign these rights to all users involved:

• Act as part of the Operating System

• Create a token object

• Log on as a batch job

• Log on as a service

• Replace a process level token

Impersonation

Before starting an userver, the urouter must create a subprocess with the user credentials of the specified user. In Unix terms, it needs to change the userid. For obvious reasons, this action always requires root permission. However, if the desired userid is equal to the current userid, there is no need to change it, and urouter will skip this action. In this case the urouter does not need to run as root, but it means that you can use only one user for all your uservers, i.e. the user as which the urouter is running.

Things work slightly differently on Windows. Here, authentication and impersonation are not separate actions, but are both implemented by the CreateProcessAsUser() function. This function is called regardless of whether the urouter and userver have the same user id. Typically on Windows, urouter will be running as a service using the system account NT AUTHORITY\SYSTEM.

Uniface provides a mechanism to implement custom authentication, called a Security Driver. The idea is that customers can develop their own 3GL code in C to make urouter perform whatever authentication they desire. This C code must use the macros defined in the include file uniface\3gl\include\zsecint.h and the code must export the function usecappl() which can be called by urouter. Because this is implemented by a C call-out, a security driver must be included in the [USER_3GL] section of the urouter’s assignment file:

[user_3gl]
mysecdriver(usecappl)

Note that it is ONLY the authentication you can customize here, not the impersonation. For this reason security drivers are mainly used on Unix. On Windows, a security driver could be used to impose additional authentication for an userver. This step is then performed before calling CreateProcessAsUser(), which will in turn still do the standard Windows authentication. As far as I know, no security drivers are being deployed on other platforms than Unix.

Besides custom authentication, a security driver can also optionally implement encryption of network logon strings and/or postmessage headers. The sample security driver provided with Uniface does just this, and can be activated in the asn like this

[user_3gl]
zsecdrv(usecappl)

Note that there is however no way in Uniface to encrypt all data traffic passing from client to userver. Such functionality is not currently on the roadmap.

• A security driver that does only authentication needs to be included in the urouter assignment only.

• A security driver that does encryption of logon strings and/or message headers must be included in ALL assignment files (client, urouter and userver).

In practice, few if any customers write their own security driver, but over the years a couple of security drivers have been provided by Technical Support to various customers:

NAME / PURPOSE

hpux

Support for shadow passwords on HP-UX and Trusted HP-UX (both are not handled by the default authentication).

dummy

A security driver that does NO authentication, i.e. does not check the password (but SOME bogus password must be supplied because it is mandatory). Note that the user must still be a valid user, because it will be used in the impersonate step.

pam

On Solaris, authenticates using the PAM (Pluggable Authentication Module). By default, Uniface only uses the PAM on Linux.

upass

A combination of a security driver and setuid-root program that allows urouter to run as a normal user on a system using shadow passwords. Recall that validating a shadow password requires root access. Some customers object to an urouter being root-owned. With this implementation, the actual validation is done by executing the setuid-root program upass. Available from ftp://ftp.compuware.com/pub/uniface/outgoing/cbr/upass-3.06.tar

While the sample code and include file of the security driver look rather complicated, the actual code can be very succinct. For example the source code of the dummy security driver reads like this, nicely illustrating the concept:

#include "zsecdrv.h" /* Include header from uniface/3gl/include */

long usecappl (USecDrv *Sec) /* Entry point to be called by urouter */
{
    if ( Sec->Function == USEC_DRVINFO ) /* Urouter to secdriver: What functions do you implement ? */
    {
        USecSetUserPassVal(Sec); /* Secdriver to urouter: I do user/password validation only */
    }
    return USEC_SUCCESS; /* Actual validation simply returns OK always */
}

Setting the userver’s environment.

For an userver to work properly, it will typically need a bunch of environment variables – at least on Unix, less so on Windows. Think of the variables USYS etc. set by insunis, the variables needed to access a database like ORACLE_SID or DSQUERY, and the variables needed to locate shared libraries (LD_LIBRARY_PATH and friends).

There are several ways an userver on Unix can obtain its necessary environment:

• By inheritance from its parent, the urouter. If you set all variables in a script or terminal session before starting urouter, all uservers get them too.

• By using the /su option in the UST definition. This will cause urouter to run su to execute the target user’s logon profile. This is, however, a complicated process and not all user profiles are suitable to be executed in a server process (because they may do something with the screen or keyboard). Also, the use of /su may be restricted on some systems. This is why I never recommend using /su but instead the 3rd option:

• By using a wrapper script as described earlier. This is an efficient and convenient way to make sure each userver gets exactly what it needs.

Starting the userver.

Having successfully authenticated the username and password, (on Unix) forked a subprocess with the target user’s credentials, and (if necessary) having set the working directory and user environment, urouter is now ready to start the actual userver process. This is a point where a lot of things can go wrong, and it is the most problematic stage to troubleshoot, as we give control to the operating system and have to wait until an userver is up and running.

The first class of problems that can be encountered here consists of issues with the userver executable itself, preventing the OS from loading it. These include:

File not found

The file may have been moved, or you could have misspelled the name

Insufficient permission

The file does not have read and/or execute permission for the user in question

Wrong type of executable

Possibly the file is for another platform (e.g. LIA instead of LIB)
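The first two of these conditions are easy to check up front. The sketch below is a hypothetical pre-flight check (the name check_exe is mine, not a Uniface tool) that you could run against the executable named in a UST definition. It tests access for the user running it, so run it as the user the userver will be started under. It does not cover the third case, for which comparing the output of the file command with that of a known-good binary is the usual approach.

```shell
#!/bin/sh
# check_exe PATH
# Hypothetical pre-flight check: prints one diagnostic word and returns
# non-zero when the current user would not be able to start the file.
check_exe() {
    if [ ! -e "$1" ]; then
        echo "missing"
        return 1
    elif [ ! -f "$1" ]; then
        echo "not-a-regular-file"
        return 1
    elif [ ! -r "$1" ] || [ ! -x "$1" ]; then
        echo "no-permission"
        return 1
    fi
    echo "ok"
}

check_exe /bin/sh
```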

In all these cases, the urouter does not actually detect the problem. Instead it suggests that the userver has been successfully launched - but that only means, at this point, that a new process has been created which still needs to execute the userver. The failure to execute is not trapped, which I believe to be a bug, and instead the urouter times out waiting for the userver to respond. You find in the urouter log (some lines left out for brevity) :

9 F 4:17.968.17 t=3: svstart: starting server: user=chris; pgm=/this/is/a/bogus/path//userver -srvid=1 -dnp=TCP:+13001||BA3CE7A2-3AFF-11E3-95CF-BC93302E3648| -drv=ANY -ust=userver -chn=8 -ex -dir=/foo/bar
9 F 4:17.971.48 t=3: svstart: Succesfully launched server, new pid=18243
5 a 4:17.971.60 t=3: handle_wait: wait for server sid=1; ust=userver
9 F 5:20.726.65 t=1: clean_sweep: Server startup timed out after 63 seconds, sid=1
9 F 5:20.727.29 t=1: Stopping server sid=1; shut=0 mode=normal
9 F 5:20.728.26 t=3: [Tue Oct 22 11:58:16 2013] err=-25: getsrv: handle_wait wait failed

And the client only reports a -25 error:

8061 - Network error detected ( (0)). Logon (TCP:violet+13001|chris|***|userver -ex) failed with status -25, UServer unexpectedly gone

The client error text “UServer unexpectedly gone” is also a little misleading, suggesting that there has been an userver process when in fact there never was one.
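Since the client message gives so little away, it can pay to scan the urouter log for startup timeouts directly. A minimal sketch, assuming the clean_sweep message has exactly the wording shown in the log excerpt above (the function name is hypothetical):

```shell
#!/bin/sh
# list_startup_timeouts LOGFILE
# Hypothetical helper: prints the sid of every server whose startup timed out,
# one per line. Relies on the clean_sweep message text seen in the urouter log.
list_startup_timeouts() {
    sed -n 's/.*clean_sweep: Server startup timed out after [0-9]* seconds, sid=\([0-9]*\).*/\1/p' "$1"
}

# Usage (the log file name is whatever you configured for urouter):
#   list_startup_timeouts /path/to/urouter.log
```

Each sid printed can then be matched against the svstart line with the same -srvid value to see which command line failed to come up.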

Problems like this are best tackled by running Process Monitor (on Windows) or running the userver under truss. That will reveal the actual problem. Or, you can use a wrapper script which picks up the standard output channels of userver (which by default get lost for a server).

The next class of problems are userver startup problems. That is, the OS has successfully started the userver process, but it exits more or less immediately (in any case before reporting back to urouter, and before being able to produce a log- or trace file) because of some initialization error. Some common problems :

1) Assignment statement error.

For example, the userver asn-file contains some unrecognized word, for example the first line is

[BLAAAAAAA]

This will also cause a server timeout and -25 error. As said earlier, userver does not log this kind of error anywhere, which is a bit of a pain. It is therefore good practice to sanity-check the userver assignment file. Two ways to do this:

• Run idf with the userver’s assignment file. You don’t expect idf to start, only report the error:

$ $idf /asn=common/adm/userver.asn
8008 - Assignment error: '[BLAAAAAAA]' in uniface/adm/userver.asn:1

• Run the userver under truss in a command window:

$ truss common/bin/userver
. . .
open("/h/chris/uf/94/so9/uniface/adm/userver.asn", O_RDONLY) = 4
read(4, " [ B L A", 4) = 4
read(4, " A A A A A A ]\n\n [ S E".., 1020) = 808
close(4) = 0
lseek(1, 0, SEEK_CUR) = 1723552
lseek(2, 0, SEEK_CUR) = 1723587
lseek(2, 0, SEEK_CUR) = 1723622
lseek(1, 0, SEEK_CUR) = 1723657
lseek(2, 0, SEEK_CUR) = 1723692
lseek(2, 0, SEEK_CUR) = 1723727
_exit(1)

If you see userver exiting immediately after reading some assignment line, you can be sure that line is wrong.

In earlier Uniface versions, you could also get an “Assignment statement error” when the log or trace file could not be written. In recent versions, this is no longer a fatal error. Userver will now create a file userverNNNNN.log in the current directory and write the error(s) in it, e.g.

ULOG Error: Failed to open log /foo/userver.log
ULOG Error: Failed to write to /foo/userver.log

There are also assignment statement errors which are not detected at startup, only when the specific assignment is being used. Think for example of database connector parameters. These will be parsed and checked by the connector in question when the path is first accessed, and as a rule, a clear message is given in the Uniface message frame.

2) Failure to load a shared library.

Suppose someone has deleted the DLM installation directory. The userver will then not be able to start as it statically depends on libdlm64.so. In a terminal window this is easily detected by simply running userver from the command line:

$ common/bin/userver
ld.so.1: userver: fatal: libdlm64.so: open failed: No such file or directory

In a server environment, this is harder to detect as the standard error channel is not preserved, nor does an error like this end up in the urouter log (and an userver log is not created because the process cannot load). Also, the environment may not be the same as in a terminal window. Best thing is to run userver under truss. Let’s use some custom truss flags:

[SERVERS]
userver /usr/bin/truss -t write -w 2 -o truss.log + /h/chris/uf/94/so9/common/bin/userver

specifying to trace only write calls, and display the full I/O buffer for file descriptor 2 (stderr). This produces the output that nicely shows the problem:

write(2, 0xFFFFFFFF7F332790, 77) = 77
l d . s o . 1 : u s e r v e r : f a t a l : l i b d l m 6 4 . s o : o p e n f a i l e d : N o s u c h f i l e o r d i r e c t o r y\n

3) Userver is of wrong architecture or linkage.

For example, if you try to run Uniface 9.6 for Redhat EL Linux 6.x on Redhat EL 4.x you get the error

$ common/bin/userver
common/bin/userver: error while loading shared libraries: requires glibc 2.5 or later dynamic linker

As you see this is easily diagnosed from the command line, but for good measure it can also be shown with truss, as in the previous example.

From here on, the userver should be able to produce a logfile, so that we no longer have to grope in the grey area between urouter and userver. You can still get network errors in the client, though, particularly if the userver dies prematurely, but at least you will now have some logging available.

Before moving on to all the problems that can still occur, some more tips.

Manage your shared library paths.

On Windows all executable files (.exe and .dll files) are found via one environment variable, PATH. Typically, any software installation puts its directories in this system-wide variable, so that programs will mostly pick up the correct files. On Unix this is different and more complex.

On Unix, separate environment variables are used: the PATH variable to locate commands, and the LD_LIBRARY_PATH variable (LD_LIBRARY_PATH_64 on Solaris) to locate shared libraries. Furthermore, on Unix, environment variables are local to the process that defines them, unless you export them, in which case they become visible to all child processes. A variable defined in a certain process is never visible in other processes except the defining process and its children.

This is why you need to execute the insunis script, which defines a lot of environment variables for Uniface, with the dot prefix:

$ . common/adm/insunis

so that it is executed by the current shell. If you forget the dot, the Unix shell will execute the commands in a subshell, and the variables will only be valid for that subshell, which exits immediately after. Upon returning to the current shell, they are gone and forgotten.
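The difference is easy to demonstrate. In the sketch below, setvars.sh and DEMO_USYS are made-up stand-ins for insunis and one of the variables it exports; the script is first executed in a child shell and then sourced with the dot prefix.

```shell
#!/bin/sh
# Demonstrates why insunis must be sourced with the dot prefix.
# setvars.sh and DEMO_USYS are illustrative stand-ins, not Uniface names.
demo_dir=$(mktemp -d)
cat > "$demo_dir/setvars.sh" <<'EOF'
DEMO_USYS=/opt/demo ; export DEMO_USYS
EOF

unset DEMO_USYS
sh "$demo_dir/setvars.sh"            # executed in a child process
after_exec=${DEMO_USYS:-unset}

. "$demo_dir/setvars.sh"             # executed by the current shell
after_source=${DEMO_USYS:-unset}

echo "after executing: $after_exec"
echo "after sourcing:  $after_source"
rm -rf "$demo_dir"
```

Only after the second call is the variable set in the current shell, and therefore inherited by anything started from it, such as urouter and its uservers.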

The variable exported in insunis that concerns us is LD_LIBRARY_PATH(_64). For example this one from Solaris:

LD_LIBRARY_PATH_64=$USYSLIB:/h/chris/uf/dlm41/SunOS/64:$LD_LIBRARY_PATH_64 ; export LD_LIBRARY_PATH_64

By default, this takes care of the Uniface and DLM libraries. As a rule, you will need to add the directories for any databases you use, e.g. the Oracle bin directory.

As seen earlier, long paths can lead to needlessly long search trips for an executable or shared library. It is good practice to keep your paths organized and free from duplicate, unused, or wrong directories. If the list is still long, it can help performance to order the list according to usage, i.e. the most often used directories first.
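A quick way to spot duplicate and non-existent entries is a small shell function like the sketch below. The name audit_path is hypothetical; it works on any colon-separated list, so it can be fed PATH as well as LD_LIBRARY_PATH or LD_LIBRARY_PATH_64. It cannot tell you which directories are unused, only which ones are listed twice or do not exist.

```shell
#!/bin/sh
# audit_path 'dir1:dir2:...'
# Hypothetical helper: reports duplicate and non-existent directories
# in a colon-separated search path. Empty entries are skipped.
audit_path() {
    seen=:
    old_ifs=$IFS
    IFS=:
    set -f                           # no filename expansion while splitting
    set -- $1                        # split the list on ':'
    set +f
    IFS=$old_ifs
    for dir in "$@"; do
        [ -n "$dir" ] || continue
        case $seen in
            *":$dir:"*)
                echo "duplicate: $dir"
                ;;
            *)
                seen="$seen$dir:"
                [ -d "$dir" ] || echo "missing: $dir"
                ;;
        esac
    done
}

audit_path "$PATH"
```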

On Solaris, you can also manage the locating and loading of shared libraries with the crle (configure runtime linking environment) command. It’s beyond the scope of this paper, but worth keeping in mind.

Pre-starting servers.

A good way to troubleshoot uservers in advance of testing the application is to use the [PRE_START] section in the urouter asn-file. Surprisingly, this is possible for exclusive as well as shared uservers. A pre-started exclusive userver is visible in the Urouter Monitor, showing the /ex flag, until such time as a client connects to it. At that moment, it disappears from view. To pre-start an userver, include the complete logon path (as specified on the client) in the [PRE_START] section, for example

[PRE_START]
tcp:localhost+13001|chris|bla|userver

Urouter and DLM.

Urouter does not do anything with licensing, in the sense that it does not require a license file or server. It does not check out any features. However, on Unix, you cannot start urouter unless DLM is installed and the DLM directory has been added to LD_LIBRARY_PATH. This is because urouter is statically dependent on libulib.so, which in turn is statically dependent on libdlm64.so. So if that dependency cannot be resolved, you may see an error like this when starting urouter:

$ urouter
ld.so.1: urouter: fatal: libdlm64.so: open failed: No such file or directory
Killed

6 - Troubleshooting runtime problems.

We have covered just about all situations that can prevent an userver from being started and going about its job. Some things can still go wrong in the early stage.

License error.

Unlike urouter, which does not use DLM, an userver will typically want to check out a license feature (unless it uses only the Sequential Driver, $SEQ, which is free of license). A license error is passed back to the client message frame, e.g.

Server: Using license option LM_LICENSE_FILE [email protected]
Server: Checkout USRVORA: -1
Server: The licensed number of concurrent users has been reached; try again later.
Borrowed :: The application that was requested is not licensed.
[email protected] :: A connection could not be established between this client and the license server.
[email protected] :: The licensed number of concurrent users has been reached; try again later.
compulock :: The application that was requested is not licensed.
Fatal error: 8011 - License not available.
Server: Fatal error: 8011 - License not available.
Server: 2013-10-25 10:23:24.88 - Uniface session stopped

Failure to load a shared library / Database environment not set up

We have already discussed shared library loading problems, but that was for libraries which were statically linked. Most shared libraries, in particular the database connectors, are loaded dynamically upon first use. As with static dependencies, this requires LD_LIBRARY_PATH to be set. If that is not the case, Uniface cannot load the database connector. This is not apparent on the client side:

Server: Using license option LM_LICENSE_FILE [email protected]
Server: Feature USRVORA expires in 74 days
Server: Checkout USRVORA: 1
[-2](_read) READ:2
[-2] done<end of module>

as it just reports error -2 (Occurrence not found). The userver trace file reveals the real problem:

9 8z SYS_I010: dlopen: ld.so.1: userver: fatal: uora62.so: open failed: No such file or directory
9 16F Unable to open uora62; error ld.so.1: userver: fatal: uora62.so: open failed: No such file or directory
9 9z SYS_I011: udllvec: UDBORA00 not found in uora62

This is also misleading, though. It suggests that libuora62.so could not be found, whereas the actual problem is that this file is there but it cannot locate the Oracle libraries. Recent versions of Uniface do a better job of reporting the exact problem:

Could not load /h/chris/uf/94/so9/common/lib/libuora64.so. dlerror: ld.so.1: uniface: fatal: libclntsh.so.11.1: open failed: No such file or directory

From this point, an userver will usually be communicating with its client, and, at least initially, working normally. The common things that can still go wrong are

• Crashes

• Hangs

• Memory and CPU usage

• Urouter refuses to start more uservers

Troubleshooting crashes.

Terminology first. Some people call every problem a crash, for instance when userver reports an unexpected/fatal error to the client, or if urouter reports a dead server. At Compuware we use the stricter meaning of crash, which is that the program has terminated and this has been reported by the operating system. On Unix, you know a program has crashed when it suddenly ends with

Segmentation Fault(coredump)

or

Bus Error (coredump)

On Windows, you know a program has crashed when you get one of the various Windows popups saying it “has experienced a problem and needs to be shut down”, “terminated unexpectedly”, “has stopped working” or something similar.

Uniface fatal errors like “9010 Out of memory” or “9024 Logon error” are not crashes – just Uniface fatal errors.

Typically, crashes are “handled” by the operating system in one way or another. On Unix, a crash usually results in a core dump being created in the program’s current directory. Depending on the type of system and configuration, a crash can also be logged in the local syslog.

Traditionally the name of the core dump file is core, but different systems have different rules. Linux sensibly adds the process id to the name, e.g. core.2525. On some systems, core files can also be disabled, and/or specific naming rules assigned (see e.g. the coreadm command on Solaris). When a crash is reported or suspected, Compuware Support will always ask for the core file, which contains valuable information about the state of the system at that moment, and in particular the stack trace. Some useful commands to look for and examine core files:

$ ls -l core*
-rw------- 1 chris chris 41490316 Oct 25 17:55 core
$ file core
core: ELF 64-bit MSB core file SPARCV9 Version 1, from 'userver'
$ strings core | head
CORE
userver
/h/chris/uf/94/so9/common/bin/userver -srvid=1 -dnp=TCP:+13001||39A3B9EA-3D84-1
. . .

or

$ strings core | grep userver

As you see core files can get quite big, especially if the process was eating memory before it crashed. It can also be hard to examine a core file on another system than it was generated on. For these reasons Compuware will usually ask for a stack trace generated from the core, rather than the file itself. For this you need to have the name of the core file, the name of the executable that dumped it (as shown above you can see that with the strings command), and a debugger like dbx, gdb, or wdb. For our purpose, all 3 debuggers work the same.

Invoke your debugger, for example dbx on Solaris, with the userver executable and core file names as arguments:

$ dbx /h/chris/uf/94/so9/common/bin/userver core
. . .
t@1 (l@1) terminated by signal SEGV (Segmentation Fault)
0xffffffff7e1a5aac: _so_recv+0x000c: bcc,pt %icc,_so_recv+0x28 ! 0xffffffff7e1a5ac8
(dbx)

This already displays the type of crash and location. Enter the command where to display the stack trace:

(dbx) where
current thread: t@1
=>[1] _so_recv(0x4, 0xffffffff7fffbede, 0x2, 0x0, 0xffffffff7de0e318, 0x73), at 0xffffffff7e1a5aac
[2] do_recv(0x100171580, 0x6, 0xffffffff7fffbede, 0x2, 0xffffffff7de265b8, 0xffffffff7bc08490), at 0xffffffff7bc0347c
[3] TCPreceive(0x100171580, 0x6, 0xffffffff7fffc280, 0x1000, 0xffffffff7fffc1b8, 0x10589c), at 0xffffffff7bc03ba4
[4] UNWTCP(0x100171580, 0x73, 0x10012a0f0, 0x0, 0x0, 0x0), at 0xffffffff7bc07278
[5] dorcv(0x100171580, 0x200, 0xffffffff7fffe678, 0x0, 0x47, 0x0), at 0xffffffff7be0196c
[6] recmsg(0x100171580, 0x0, 0xffffffffffffffff, 0xffffffff7fffeac8, 0x1, 0xe0), at 0xffffffff7be01ec8
[7] umwgo(0xffffffff7fffe980, 0x0, 0xffffffffffffffff, 0x10012a0f0, 0xffffffff7be03f68, 0xffffffff7c2645d0), at 0xffffffff7c0c5f30
[8] urecmsg(0x100171580, 0x0, 0x200, 0xffffffff7fffeac8, 0x1, 0x0), at 0xffffffff7c0c66c4
[9] srvloop(0x10012a0f0, 0x0, 0x7ffd, 0xffffffff7fffeac8, 0xffffffff7b70a020, 0x1), at 0xffffffff7b60868c
[10] USERVERSTART(0x10012a0f0, 0x50, 0x100129190, 0x100bec, 0x1, 0xffffffff7b70a020), at 0xffffffff7b609630
[11] USRVMAIN(0x10012d7b0, 0x10012d8cc, 0x5, 0x10012a0f0, 0x100101440, 0x0), at 0x100001168
[12] UMAIN(0x5, 0xffffffff7fffee58, 0xffffffff7fffed98, 0x0, 0x100101ac8, 0x100000ee8), at 0xffffffff7df03d20
[13] main(0x5, 0x40, 0x0, 0x10070c, 0x0, 0x100101440), at 0x100000d68
(dbx)

For debuggable binaries (sometimes provided by Support to help troubleshooting) this would also show the source and line number.

When you have none of these debuggers installed, good old adb (Absolute Debugger - should be present on all Unix systems) will do the job, albeit without the ability to display source information. The command line syntax is the same as for dbx/gdb/wdb. Note that adb does not display a prompt; it just sits there waiting for your input. The command you enter to display the stack trace is $c :

$ adb /h/chris/uf/94/so9/common/bin/userver core
core file = core -- program ``/h/chris/uf/94/so9/common/bin/userver'' on platform SUNW,Sun-Fire-V490
SIGSEGV: Segmentation Fault
adb: warning: core file is from SunOS 5.10 Generic_142909-17; shared text mappings may not match installed libraries
$c
libc.so.1`_so_recv+0xc(100171580, 6, ffffffff7fffbede, 2, ffffffff7de265b8, ffffffff7bc08490)
libutcp10.so`TCPreceive+0x2c(100171580, 6, ffffffff7fffc280, 1000, ffffffff7fffc1b8, 10589c)
libutcp10.so`UNWTCP+0x670(100171580, 73, 10012a0f0, 0, 0, 0)
libumwpsv10.so`dorcv+0x84(100171580, 200, ffffffff7fffe678, 0, 47, 0)
libumwpsv10.so`recmsg+0x130(100171580, 0, ffffffffffffffff, ffffffff7fffeac8, 1, e0)
liburtl.so`umwgo+0xf8(ffffffff7fffe980, 0, ffffffffffffffff, 10012a0f0, ffffffff7be03f68, ffffffff7c2645d0)
liburtl.so`urecmsg+0x34(100171580, 0, 200, ffffffff7fffeac8, 1, 0)
libuserv.so`srvloop+0x1e4(10012a0f0, 0, 7ffd, ffffffff7fffeac8, ffffffff7b70a020, 1)
libuserv.so`USERVERSTART+0x200(10012a0f0, 50, 100129190, 100bec, 1, ffffffff7b70a020)
USRVMAIN+0x280(10012d7b0, 10012d8cc, 5, 10012a0f0, 100101440, 0)
libucall.so`UMAIN+0x38(5, ffffffff7fffee58, ffffffff7fffed98, 0, 100101ac8, 100000ee8)
main+0x38(5, 40, 0, 10070c, 0, 100101440)
_start+0x17c(0, ffffffff7fffee58, ffffffff7f60e640, ffffffff7f1bf588, ffffff00, ffffffff7f710000)

To exit adb, press Control-D. On Solaris and HP-UX, you can also use the pstack command to produce the stack trace:

$ pstack core

Producing output like this:

$ pstack core
core 'core' of 28762: /h/chris/uf/94/so9/common/bin/userver -srvid=1 -dnp=TCP:+13001||39A3B9
----------------- lwp# 1 / thread# 1 --------------------
ffffffff7e1a5aac _so_recv (100171580, 6, ffffffff7fffbede, 2, ffffffff7de265b8, ffffffff7bc08490) + c
ffffffff7bc03ba4 TCPreceive (100171580, 6, ffffffff7fffc280, 1000, ffffffff7fffc1b8, 10589c) + 2c
ffffffff7bc07278 UNWTCP (100171580, 73, 10012a0f0, 0, 0, 0) + 670
ffffffff7be0196c dorcv (100171580, 200, ffffffff7fffe678, 0, 47, 0) + 84
ffffffff7be01ec8 recmsg (100171580, 0, ffffffffffffffff, ffffffff7fffeac8, 1, e0) + 130
ffffffff7c0c5f30 umwgo (ffffffff7fffe980, 0, ffffffffffffffff, 10012a0f0, ffffffff7be03f68, ffffffff7c2645d0) + f8
ffffffff7c0c66c4 urecmsg (100171580, 0, 200, ffffffff7fffeac8, 1, 0) + 34
ffffffff7b60868c srvloop (10012a0f0, 0, 7ffd, ffffffff7fffeac8, ffffffff7b70a020, 1) + 1e4
ffffffff7b609630 USERVERSTART (10012a0f0, 50, 100129190, 100bec, 1, ffffffff7b70a020) + 200
0000000100001168 USRVMAIN (10012d7b0, 10012d8cc, 5, 10012a0f0, 100101440, 0) + 280
ffffffff7df03d20 UMAIN (5, ffffffff7fffee58, ffffffff7fffed98, 0, 100101ac8, 100000ee8) + 38
0000000100000d68 main (5, 40, 0, 10070c, 0, 100101440) + 38
0000000100000cfc _start (0, ffffffff7fffee58, ffffffff7f60e640, ffffffff7f1bf588, ffffff00, ffffffff7f710000) + 17c
----------------- lwp# 2 / thread# 2 --------------------
ffffffff7e1a5818 _libc_nanosleep (7a120, ffffffff7d2106f4, 52cb3570, 111954, ffffffff7cbbc8ac, ffffffff7cb60b48) + 8
ffffffff7cbbc8ac l101001111 (1f4, cf, 526a9453, ffffffff7ccfae70, 16b280, 7a120) + a4
ffffffff7cb64558 l010011111 (2ec8, 0, 0, 0, 0, 0) + 428
ffffffff7d2181a8 _lwp_start (0, 0, 0, 0, 0, 0)

Crash analysis on Windows is a bit less straightforward though not essentially different. First of all, any crash should have been logged in the Windows Application Event Log:

The Event Logs are very useful to search for the history of problems. Whenever you have (or suspect) a crash, always look there first; it already displays some useful information.

NB - We have seen cases where userver crashes were logged in the event log but none of the users had ever noticed any problem! This can happen in a web environment, where after a (random) crash a new userver is started, the request is repeated and succeeds.

Traditionally, Windows is shipped with the “postmortem” debugger Dr. Watson. You can install Dr.Watson as the default exit debugger with the command

drwtsn32 -i

which will put the Dr.Watson command line in the AeDebug subkey of registry key HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion:

This value specifies the debugger program that is invoked on a process when it crashes. Other possibilities here are the Visual Studio debugger, WinDbg, or the JIT debugger:

msdev.exe -p %ld -e %ld
C:\debuggers\windbg.exe -p %ld -e %ld -g
C:\Windows\system32\VSjitdebugger.exe -p %ld -e %ld

Dr. Watson will, for each crash, log an entry in its log file drwtsn32.log. The location of that file can be seen/configured by running Dr.Watson interactively:

C:\> drwtsn32

The Dr.Watson log can be useful input for Compuware support to analyze crashes. As crashes are appended in the log, it can get quite large over the years, and it makes sense to clear it every now and then. As you see above, Dr. Watson also offers a possibility to create user mode dumps which can be useful for Compuware to analyze crashes.

In recent Windows versions (as from Windows Vista), Microsoft has discontinued Dr.Watson, although it can still be downloaded and it still works. Instead of Dr. Watson we now have Windows Error Reporting (WER), which enables you to collect a user mode dump (sometimes referred to as minidump or crash dump) when an application crashes. This is also configured in the registry, in key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps

The name of the created dump consists of the executable, the process id (pid) and the suffix .dmp, e.g.

uniface.exe.4332.dmp

Dump files can be analyzed by Compuware support.

Troubleshooting hangs.

Uservers can become unresponsive or appear to ‘hang’. This usually manifests in the client by the hourglass or *busy* indicator not going away, or by urouter reporting a timeout. When this happens we must try to find out what userver is doing, or else waiting for.

The most obvious thing to check is whether the userver may be waiting on a database lock. How to find that out depends on the database in question and goes beyond the scope of this presentation. You might be able to deduce it from examining the current state of the process.

To examine the state of a process on Windows, the best tools are Process Monitor (to see if it is still doing something) and Process Explorer (to see what resources and DLL’s it has open, and what the process’s threads are doing).

After starting Process Monitor, use the Filter function in the pulldown menu to restrict output to userver.exe:

Make sure the buttons for File, Registry, Network and Process/Thread activity are all pressed (those are the 4 buttons toward the right of the button bar, I’ve never used the rightmost one). For a running userver you may be seeing output like this:

Use File->Save to save the filtered events to a .PML file. This file gives a lot of great information about what goes on inside the process.

When you see nothing appearing on the screen (and are sure the process exists and your filter is correct) you can be pretty sure the userver hangs. Time to examine its state with Process Explorer.

After starting Process Explorer, you first locate your executable in the process tree in the top pane. You can choose your columns of choice here, of which the ones shown here are the most useful. In View->Lower Pane View you can choose whether to see DLL’s or Handles in the lower pane. Both views can be customized and are very useful. Let’s first see the DLL view:

This is a great way to check that no DLL’s have been loaded from unexpected locations (for example, Uniface 9 loading a Uniface 8 dll or vice versa). With Lower Pane View set to Handles you’ll see all the Windows objects currently open by the process. The Files are usually the most interesting:

Back in the upper pane, right mouse click on the process name, and choose Properties. The Properties screen reveals a wealth of information about the process. Especially useful are the threads, for each of which you can request a stack trace:

Here we see the stack trace of the main thread of userver, showing it to be blocked on a network receive. It should be noted that in most cases, Windows stack traces only make sense when all the symbols have been installed. The symbols of the Uniface binaries can be requested from Technical Support; the symbol files for Windows can be downloaded from the Windows Symbol Store. The same applies to debugging an application or its crash dump: you really need the symbols to go the whole way.

The tools we have on Unix are not so fancy, but they are hardly less powerful. Truss is the alternative to Process Monitor. You hook your process up to truss like this:

$ truss -p NNNNN

where NNNNN is the pid of the userver (or other Uniface program), and see what the output gives. For example, running truss on a userver that is idle shows that it is waiting to receive a network message:

$ truss -t !time,nanosleep -p 2060
/1: recv(6, 0xFFFFFFFF7FFFBEDE, 2, 0) (sleeping...)

Note that we filtered out the time and nanosleep calls, as userver has a separate thread calling these continuously and we don’t want to see them all the time.

And here is what a userver shows that is waiting for a lock in Oracle:

$ truss -t!time,nanosleep -p 18065
/1: read(13, 0x10024C296, 2064) (sleeping...)

Note that here we see a read rather than a recv, which proves little except that the process is waiting for something other than a network socket.

The pstack command, available on Solaris, HP-UX and Linux, can be used to take a snapshot of a process’ execution stack. By repeatedly doing this you may get an idea of what a userver or urouter is doing. If a process is hanging (or waiting/sleeping) pstack will indicate in which function this is. For example here is a pstack from a running urouter on Linux:

$ pstack 27181
#0 0x000000341a00ce45 in recv () from /lib64/libpthread.so.0
#1 0x00002aaaabe8b177 in do_recv () from /h/chris/uf/95/lia/common/lib/libutcp10.so
#2 0x00002aaaabe8e116 in UNWTCP () from /h/chris/uf/95/lia/common/lib/libutcp10.so
#3 0x00002aaaabd8594a in recmsg () from /h/chris/uf/95/lia/common/lib/libumwpsv10.so
#4 0x00002aaaabd8724c in UMWPSV10 () from /h/chris/uf/95/lia/common/lib/libumwpsv10.so
#5 0x00002aaaab7a1e55 in umwgo () from /h/chris/uf/95/lia/common/lib/liburtl.so
#6 0x00002aaaab7a2630 in urecmsg () from /h/chris/uf/95/lia/common/lib/liburtl.so
#7 0x00002aaaabf994cd in srvloop () from /h/chris/uf/95/lia/common/lib/libuserv.so
#8 0x00002aaaabf9a392 in USERVERSTART () from /h/chris/uf/95/lia/common/lib/libuserv.so
#9 0x0000000000400acb in USRVMAIN ()
#10 0x00002aaaaaaafda5 in UMAIN () from /h/chris/uf/95/lia/common/lib/libucall.so
#11 0x000000000040086e in main ()

showing the urouter is waiting on network input. This is the normal state for a urouter, unless you happen to catch it in the middle of handling a request.
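Taking repeated snapshots, as suggested above, is easy to script. The following is only a minimal sketch and not a Uniface tool: the function name snapshot_stacks and the STACK_CMD variable are made up for this illustration, and pstack is assumed to be on the PATH.

```shell
# Take COUNT stack snapshots of process PID, INTERVAL seconds apart.
# STACK_CMD is a hypothetical override so that another tool can be
# substituted for pstack (or so that the function can be dry-run).
snapshot_stacks() {
    pid=$1
    count=${2:-5}
    interval=${3:-2}
    i=1
    while [ "$i" -le "$count" ]
    do
        echo "=== snapshot $i of $count, pid $pid ==="
        ${STACK_CMD:-pstack} "$pid"
        if [ "$i" -lt "$count" ]
        then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

For example, snapshot_stacks 27181 10 1 > stacks.txt collects ten snapshots one second apart. If the top frames are identical in every snapshot, the process is most likely blocked in that function.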

The Unix command that comes closest to Process Explorer, as far as resources are concerned, is probably lsof (list open files), sometimes referred to as “The Swiss Army knife of tools”. It has a zillion options, and is worth asking your sysadmin to install if you don’t have it (I think it comes standard on Linux). You will usually run lsof with a process id as argument:

$ lsof -p 13955

which will produce output like this :

As you see this lists loaded shared libraries with their actual Unix versions, as well as files and network sockets with their file descriptors. For example you see file descriptors 0, 1, and 2 (Unix stdin, stdout, and stderr) pointing to the terminal device /dev/pts/3 which is the window in which I was running this urouter and userver. Note that you can redirect userver’s stdout and stderr in a wrapper script, in case it would try to output some unexpected error message. As file descriptors 3, 4, and 5 you see the connection with urouter, the userver log, and the sequential database file ENT.UDF.

To examine various characteristics of a process, use the ps (process status) command. This also has a zillion options (consult the man page for your platform). For example, to see the status of this same userver you enter

$ ps -lf -p 13955
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 S chris 13955 13952 0 81 4 - 13687 - 16:20 pts/3 00:00:00 /h/chris/uf/96/lia/common/bin/userver -srvid=1 . . .

This is what you also use to see the process’s memory footprint. The SZ column is the size, in 4Kb pages, of the core image of the process. The columns you can choose depend on the version of Unix.

To see a running process’s stack trace, some Unix flavors (Linux, Solaris, HP-UX, not AIX though) provide the pstack command, which is invoked with the pid as argument (without the –p keyword). For example to see the stack of this same userver on Linux:

$ pstack 13955
#0 0x000000341a00ce45 in recv () from /lib64/libpthread.so.0
#1 0x00002aaaac2cf621 in do_recv () from /h/chris/uf/96/lia/common/lib/libutcp10.so
#2 0x00002aaaac2cfb7d in TCPreceive () from /h/chris/uf/96/lia/common/lib/libutcp10.so
#3 0x00002aaaac2d1c2b in UNWTCP () from /h/chris/uf/96/lia/common/lib/libutcp10.so
#4 0x00002aaaac0ca060 in recmsg () from /h/chris/uf/96/lia/common/lib/libumwpsv10.so
#5 0x00002aaaac0cb229 in UMWPSV10 () from /h/chris/uf/96/lia/common/lib/libumwpsv10.so
#6 0x00002aaaab9defeb in umwgo () from /h/chris/uf/96/lia/common/lib/liburtl.so
#7 0x00002aaaab9dfa50 in urecmsg () from /h/chris/uf/96/lia/common/lib/liburtl.so
#8 0x00002aaaac4dce4e in srvloop () from /h/chris/uf/96/lia/common/lib/libuserv.so
#9 0x00002aaaac4de14d in USERVERSTART () from /h/chris/uf/96/lia/common/lib/libuserv.so
#10 0x0000000000400bd8 in USRVMAIN ()
#11 0x00002aaaaaaafa6e in UMAIN () from /h/chris/uf/96/lia/common/lib/libucall.so
#12 0x00000000004007e0 in main ()

On Solaris, pstack displays the stack of all running threads:

$ pstack 18117
18117: /h/chris/uf/94/so9/common/bin/userver -srvid=5 -dnp=TCP:+13001||706A27
----------------- lwp# 1 / thread# 1 --------------------
ffffffff7e0a5aa8 recv (8, ffffffff7fffbe4e, 2, 0)
ffffffff7bb0347c do_recv (100171580, 8, ffffffff7fffbe4e, 2, ffffffff7dd265b8, ffffffff7bb08490) + 94
ffffffff7bb03ba4 TCPreceive (100171580, 8, ffffffff7fffc1f0, 1000, ffffffff7fffc128, 10589c) + 2c
ffffffff7bb07278 UNWTCP (100171580, 73, 10012a0f0, 0, 1cbae4, 0) + 670
ffffffff7bd0196c dorcv (100171580, 200, ffffffff7fffe5e8, 0, 47, ff) + 84
ffffffff7bd01ec8 recmsg (100171580, 0, ffffffffffffffff, ffffffff7fffea38, 1, e0) + 130
ffffffff7bfc5f30 umwgo (ffffffff7fffe8f0, 0, ffffffffffffffff, 10012a0f0, ffffffff7bd03f68, ffffffff7c1645d0) + f8
ffffffff7bfc66c4 urecmsg (100171580, 0, 200, ffffffff7fffea38, 1, 0) + 34
ffffffff7b50868c srvloop (10012a0f0, 0, 7ffd, ffffffff7fffea38, ffffffff7b60a020, 1) + 1e4
ffffffff7b509630 USERVERSTART (10012a0f0, 50, 100129190, 100bec, 1, ffffffff7b60a020) + 200
0000000100001168 USRVMAIN (10012d7b0, 10012d8cc, 7, 10012a0f0, 100101440, 0) + 280
ffffffff7de03d20 UMAIN (7, ffffffff7fffedc8, ffffffff7fffed08, 0, 100101ac8, 100000ee8) + 38
0000000100000d68 main (7, 40, 0, 10070c, 0, 100101440) + 38
0000000100000cfc _start (0, ffffffff7fffedc8, ffffffff7f60e640, ffffffff7f1bf588, ffffff00, ffffffff7f710000) + 17c
----------------- lwp# 2 / thread# 2 --------------------
ffffffff7e0a5814 nanosleep (ffffffff7b4fbd70, 0)
ffffffff7d1107a0 usleep (7a120, ffffffff7d1106f4, 52cb3570, 111954, ffffffff7cabc8ac, ffffffff7ca60b48) + ac
ffffffff7cabc8ac l101001111 (1f4, 90, 526e9201, ffffffff7cbfae70, 16b280, 7a120) + a4
ffffffff7ca64558 l010011111 (2ec8, 0, 0, 0, 0, 0) + 428
ffffffff7d1181a8 _lwp_start (0, 0, 0, 0, 0, 0)

Solaris has a couple of other interesting commands (pldd, psig, pfiles) which together offer as many possibilities as lsof, if not more. Check them out with the command man pstat.

Troubleshooting resource usage.

A common cause of problems is userver’s (or urouter’s) uncontrolled consumption of resources. By resources we mainly mean memory, CPU, handles and threads. You can monitor a process’s resource usage in several ways.

On Windows, use Task Manager. One of the columns is Memory (Private Working Set) which represents the amount of memory allocated by the process. If you see this number rising continually, the process usually has a memory leak.

Process Explorer is more sophisticated. The memory column is here called Private Bytes. You can also bring up the Properties sheet for the userver process and click the Performance tab, which brings up a screen that is continuously refreshed and where you can see various resources being consumed :

Another great option is the Performance Monitor where you can choose your process and add one or more counters. In the below example I’ve monitored urouter for a while during startup and handling some clients starting shared uservers:

The Performance Monitor takes a bit of getting used to, and defining suitable counters is a bit of a task, but it is well worth investigating. Process Explorer also has some facilities to investigate CPU usage.

On Unix, you will have to make do with ps (for memory/cpu) and lsof (for files/sockets). For example here is a very simple shell script to monitor the memory of your userver(s):

ps -l | grep ' SZ '
while :
do
    ps -el | grep userver
    sleep 2
done

which shows output like this

F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
0 S 1421 21920 2055 0 52 24 ? 4238 ? pts/2 0:01 userver
0 S 1421 21920 2055 0 52 24 ? 4724 ? pts/2 0:01 userver
0 S 1421 21920 2055 0 52 24 ? 4724 ? pts/2 0:01 userver
0 S 1421 21920 2055 0 52 24 ? 4724 ? pts/2 0:01 userver
0 S 1421 21920 2055 0 52 24 ? 4820 ? pts/2 0:01 userver

The SZ column is the size, in 4Kb blocks, of the process’s memory.
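To turn a series of such samples into a single number, the output of the loop can be summarized with awk. This is a sketch under assumptions: the function name sz_growth is invented here, the input is expected to start with the ps header line (as the script above prints it), and the 4 KB page size is simply taken from the text above.

```shell
# Read "ps -l" style lines (header first) on stdin and report how much
# the SZ column grew between the first and the last sample.
# Optional argument: page size in KB (default 4, as assumed in the text).
sz_growth() {
    awk -v page_kb="${1:-4}" '
        col == 0 {                      # header line: locate the SZ column
            for (i = 1; i <= NF; i++) if ($i == "SZ") col = i
            next
        }
        { if (first == "") first = $col; last = $col }
        END {
            if (first == "") { print "no samples"; exit 1 }
            printf "first=%d last=%d growth=%dKB\n", first, last, (last - first) * page_kb
        }'
}
```

Redirect the monitoring loop to a file, interrupt it after a while, and run sz_growth < samples.txt. Growth that never levels off over a long run is what points to a leak; a one-time increase after startup is normal.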

Most Unix systems also have the top command which lets you see the top CPU consumers in real time:

The most common reason for memory leakage is mismanaging instances. Uniface programmers often happily create new instances but forget to delete them. Uniface does not delete unused instances automatically.

On Windows, there is an (undocumented) trace option mem to display the process’ private bytes in each trace line. This is the same number you will also see in Task Manager. It can be added to the $trc_info line in the assignment file :

[SETTINGS]
$ioprint 255
$trc_start d:\uf\96\urtrace.log
$trc_levels 9A-Za-z6c5s0R5t0z0N
$trc_info cat,lvl,dtt,mem

and will produce output like this:

9 F 0:00.000.00 t=8224: 5812Kb Loaded 'urout' from d:\uf\96\common\bin\urout.dll, version: 9.6.03 X301
9 F 0:00.022.72 t=8224: 6004Kb CONT_ID=%fv: rout.c-163 % %dc: Mon Mar 04 16:01:33 2013 %
9 F 0:00.022.75 t=8224: 6004Kb URouter started at 29-oct-2013 12:37:27
9 F 0:00.022.82 t=8224: 6024Kb URouter pid=12340;rid=854469A1-E963-43F9-9E3A-2E1183A1A1AB
9 F 0:00.022.98 t=8224: 6064Kb started thread to listen to TCP:+10096
9 F 0:00.023.02 t=8224: 6072Kb UROUTERSTART: waiting for listening threads
1 Z 0:00.023.06 t=5244: 6080Kb listen_net: new thread active, cnt=2, lst=1, pmq=0
9 F 0:00.041.86 t=5244: 6128Kb Loaded 'umwpsv10' from d:\uf\96\common\bin\umwpsv10.dll, version: 9.6.03 X301
9 F 0:00.076.04 t=5244: 6288Kb Loaded 'utcp10' from d:\uf\96\common\bin\utcp10.dll, version: 9.6.03 X301
5 s 0:00.076.10 t=5244: 6292Kb UNWTCP: enter TCP(6302200) call=NETINFO, chn=0, lst=0
5 s 0:00.279.33 t=5244: 8036Kb UNWTCP: exit TCP(6302200) call=NETINFO, chn=0, lst=0, result=NET_SUCCESS, err=0
5 s 0:00.279.37 t=5244: 8036Kb UNWTCP: enter TCP(6302200) call=NETCREATE_SHARED, chn=0, lst=0
3 s 0:00.280.31 t=5244: 8048Kb UNWTCP: TCP6create : bind(): chn=613 hst=AMS090861D1.clients.emea.cpwr.corp on TCP4
5 s 0:00.281.04 t=5244: 8056Kb UNWTCP: exit TCP(6302200) call=NETCREATE_SHARED, chn=0, lst=613, result=NET_SUCCESS, err=0
5 s 0:00.281.07 t=5244: 8056Kb UNWTCP: enter TCP(6206848) call=NETINSTANCE, chn=0, lst=613
5 s 0:00.281.09 t=5244: 8056Kb UNWTCP: exit TCP(6206848) call=NETINSTANCE, chn=0, lst=613, result=NET_SUCCESS, err=0
5 s 0:00.281.16 t=5244: 8056Kb UNWTCP: enter TCP(6206848) call=NETCONNECT, chn=0, lst=613
9 F 0:00.281.09 t=8224: 8056Kb UROUTERSTART: All listening threads started

For a userver, this setting is extremely useful in combination with $proc_tracing = true, because you can directly see the effect on memory of each executed proc statement. The mem setting is currently ignored on non-Windows platforms.

The most common cause of extremely high CPU usage of a userver is the proc code being stuck in a loop or endless recursion. With standard means this is difficult to diagnose, as it often happens after the userver has been running for a long time. Having $proc_tracing enabled from the beginning is just not practical. But there is an undocumented tool for this, on Unix and Windows.

What you need to do is have your uservers create a putmess logfile with ioprint zero:

[settings]
$ioprint 0
$putmess_logfile mess%p.log
$proc_tracing false
$proc_tracing_addition [%%$status]

Initially, this will generate (almost) no output. Now when you suspect a userver of improperly executing proc code, you can switch proc tracing on and off from the command line.

On Unix, you send a SIGUSR2 signal to the userver process, using the kill command:

$ kill -s USR2 process_id

This will switch proc tracing of that process on when it was off, and off when it was on.
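Because the signal only toggles the state, it is easy to forget to switch tracing off again. The sketch below wraps the two kill calls around a pause. The name trace_window and the KILL_CMD variable are invented for this example, and it assumes tracing is off when the function is called.

```shell
# Switch proc tracing on, wait SECONDS, and switch it off again.
# Assumes tracing is currently off, since the signal only toggles it.
# KILL_CMD is a hypothetical override, handy for a dry run with echo.
trace_window() {
    pid=$1
    seconds=${2:-10}
    ${KILL_CMD:-kill} -s USR2 "$pid" || return 1
    sleep "$seconds"
    ${KILL_CMD:-kill} -s USR2 "$pid"
}
```

For example, trace_window 18065 30 captures thirty seconds of proc tracing in that userver’s putmess logfile and then stops, which keeps the log small enough to read.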

On Windows, there are no signals available. Instead there is a small commandline program called utoggle which you can use in a similar way:

C:\> utoggle process_id

The utoggle program is not (yet) integrated with Uniface but can be downloaded here:

ftp://ftp.compuware.com/pub/uniface/outgoing/cbr/utoggle.exe

Urouter stops functioning.

Under very heavy load, urouter may start refusing to accept new connections and/or start new uservers. Typically this is caused by urouter having too many files or threads. Process Monitor or truss will usually show what the problem is.

On Unix, every process has a number of limitations imposed on it. These can be shown with the command

ulimit -a

For example on Solaris, that output looks like this:

time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 8192
coredump(blocks) unlimited
nofiles(descriptors) 254
vmemory(kbytes) unlimited

These limits can be defined globally (for example in /etc/system) or locally for a process.

Note that the number of open files here is limited to 254. That seems like enough for a urouter, until you realize that for Unix a network socket is also a file, and that urouter creates a socket for each client and each userver. For a userver, this number could be too small when it opens many files (or maybe uses a file-based database).
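A quick back-of-the-envelope check of that limit can be scripted. This is a rough sketch, not a sizing rule: the function name fd_headroom is made up, and the default of 10 reserved descriptors (for stdin/stdout/stderr, log files and listening sockets) is a guess rather than a documented figure.

```shell
# Estimate how many descriptors a urouter has left, given the limit
# and the expected number of clients and uservers (one socket each).
# The fourth argument (reserved descriptors) defaults to a guessed 10.
fd_headroom() {
    limit=$1
    clients=$2
    uservers=$3
    reserved=${4:-10}
    echo $((limit - clients - uservers - reserved))
}
```

With the Solaris limit shown above, fd_headroom 254 200 50 prints -6, meaning 200 clients and 50 uservers would not fit. In practice you would pass "$(ulimit -n)" as the first argument, provided it prints a number and not "unlimited".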

On some platforms, the number of threads per process can be limited (this used to be the case on HP-UX) causing urouter to report the error ‘Unable to start listening thread’.

7. Troubleshooting Web connection problems.

We’ll conclude with a very brief discussion of Web connection issues. Suppose our web URL is

http://localhost:8082/uniface/wrd/testwrd

The first error you can get is a browser error:

This usually means you have not started Tomcat, or maybe it is not running on the port you thought (8082). Check with TcpView or netstat:

Depending on how you started Tomcat, you may see java.exe, tomcat.exe, or tomcat7.exe. The important thing is that it’s listening on the expected port. For debugging purposes it is useful to start Tomcat in a window, by running the startup.bat or startup.sh script in the tomcat bin directory. This will produce a screen like this:

which immediately shows you it is running and listening on its ports.
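The netstat check mentioned above can be reduced to a yes/no answer. The filter below is a sketch: port_listening is a name invented here, and it simply looks for a LISTEN line whose address field ends in the port number, which covers the ":8082" style (Linux, Windows) and the ".8082" style (Solaris).

```shell
# Read "netstat -an" output on stdin and succeed if something is
# listening on the given TCP port.
port_listening() {
    awk -v port="$1" '
        /LISTEN/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ ("[:.]" port "$")) found = 1
        }
        END { exit (found ? 0 : 1) }'
}
```

Typical use would be netstat -an | port_listening 8082 && echo "something is listening on 8082". It does not tell you which process owns the port; for that you still need TcpView or lsof.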

Once Tomcat is up and running, you will usually hit the next obstacle, the infamous Red Screen :

Despite the text, this message means that the WRD cannot connect to the urouter. More specific information can be found in the WRD logfile. But let’s first have a look in the wrd.xml. During troubleshooting, it is good practice to have WRD generate debug output by means of these two parameters:

<init-param><param-name>URDDEBUG</param-name><param-value>0xffff</param-value></init-param>
<init-param><param-name>ERRORLOGFILE</param-name><param-value>d:\temp\wrd95.log</param-value></init-param>

The generated WRD debug log is the key to troubleshooting WRD connection problems. The above red screen corresponds with these entries in the WRD log:

DBG_RESOURCE: addRef, new reference count: <1>.
DBG_REQUEST: begin handling GET request
DBG_REQUEST: path info = /testwrd
DBG_REQUEST: context path = /uniface
DBG_REQUEST: Need to forward USP request to server
DBG_REQUEST: HttpHeader host=localhost:8082
DBG_REQUEST: HttpHeader user-agent=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0
DBG_REQUEST: HttpHeader accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
DBG_REQUEST: HttpHeader accept-language=en-US,en;q=0.5
DBG_REQUEST: HttpHeader accept-encoding=gzip, deflate
DBG_REQUEST: HttpHeader connection=keep-alive
DBG_MW: trying to connect to middleware UV8:localhost+10095|emea\cwnl-chris|***|userver
DBG_MW: created middleware UV8
DBG_CONNECTION: opening connection to ASV.
DBG_REQUEST: Exception on handling HTTP request: com.compuware.uniface.urd.URDMWException: Cannot connect to the UNIFACE process on the remote address/port.
DBG_ERROR: reading error file err300.htm
DBG_ERROR: Warning: error page err300.htm could not be located.
DBG_ERROR: Error Text: Cannot connect to the UNIFACE process on the remote address/port.
DBG_ERROR: Cause: Connection refused: connect
DBG_REQUEST: end handling GET request

which makes it clear that no urouter is running on port 10095.
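With URDDEBUG set to 0xffff the WRD log grows quickly, so it helps to pull out only the lines that matter. The filter below is a small sketch: wrd_errors is an invented name, and the two patterns are taken from the sample log above, so other WRD versions may need different ones.

```shell
# Print only the error-related lines of a WRD debug log read from stdin.
# Patterns are based on the sample log shown above.
wrd_errors() {
    grep -E 'DBG_ERROR:|Exception'
}
```

On a Unix host that would be wrd_errors < wrd95.log, assuming the log file name configured in ERRORLOGFILE. For the log above it prints the exception line and the four DBG_ERROR lines, including the telling "Connection refused: connect".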

Having started urouter, you can now get a different red screen:

This is not very helpful! The Uniface Message Guide does not mention a -24 error. The WRD log only says this:

DBG_MW: trying to connect to middleware UV8:localhost+10095|emea\cwnl-chris|Respighi1879|userver
DBG_MW: created middleware UV8
DBG_CONNECTION: opening connection to ASV.
Connect SENDING:

and does not indicate any error.

You can however find codes like this in … the Uniface Library page for $procerror ! The middleware codes that can be returned by the WRD are listed here:

-16 UNETERR_UNKNOWN Network error.
-17 UNETERR_PIPE_BROKEN Connection lost.
-18 UNETERR_CONNECTION Application failed to connect to the Uniface Router, or failed to start an exclusive Uniface Server.
-19 UNETERR_FATAL Uniface Server exited with fatal error.
-20 UNETERR_MAX_CLIENTS Uniface Router could not accept new client, $MAX_CLIENTS exceeded.
-21 UNETERR_LOGON_ERROR Network logon error.
-22 UNETERR_NO_REGISTRATION Application failed to register with the Uniface Router.
-23 UNETERR_DOUBLE_UST Registration with Uniface Router specified UST that is already in use.
-24 UNETERR_START_SERVER Uniface Router could not start Uniface Server process (executable not found).
-25 UNETERR_SERVER_GONE Uniface Router could not route request to specific Uniface Server process.
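If you meet these codes often, a small lookup saves a trip to the Uniface Library. The helper below is only an illustration: net_error_name is a made-up function name, and the mapping is copied from the list above.

```shell
# Translate a middleware error code into the mnemonic listed above.
net_error_name() {
    case "$1" in
        -16) echo UNETERR_UNKNOWN ;;
        -17) echo UNETERR_PIPE_BROKEN ;;
        -18) echo UNETERR_CONNECTION ;;
        -19) echo UNETERR_FATAL ;;
        -20) echo UNETERR_MAX_CLIENTS ;;
        -21) echo UNETERR_LOGON_ERROR ;;
        -22) echo UNETERR_NO_REGISTRATION ;;
        -23) echo UNETERR_DOUBLE_UST ;;
        -24) echo UNETERR_START_SERVER ;;
        -25) echo UNETERR_SERVER_GONE ;;
        *)   echo "unknown code $1"; return 1 ;;
    esac
}
```

For the red screen above, net_error_name -24 prints UNETERR_START_SERVER, which already points to the urouter being unable to start the userver executable.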

The -25 error is quite common; it is given when a userver was started but timed out. It is always necessary to look in the urouter log to find out what really happened. In this case (the -24 error) the specific error was:

9 41F 4:07.586.01 t=15164: svstart: starting server: user=emea\cwnl-chris; pgm=D:\uf\95\bin\xuserver.exe -srvid=2 -dnp=TCP:+10095||73F0CDBD-13DF-41C1-979E-2B8035B1039F| -drv=ANY -ust=userver -dir=D:\uf\95 -asn=d:\uf\95\adm\userver.asn
9 42F 4:07.620.78 t=15164: [Tue Oct 29 17:04:11 2013] err=-1: svstart: Failed to start (len=154) D:\uf\95\bin\xuserver.exe -srvid=2 -dnp=TCP:+10095||73F0CDBD-13DF-41C1-979E-2B8035B1039F| -drv=ANY -ust=userver -dir=D:\uf\95 -asn=d:\uf\95\adm\userver.asn
9 43F 4:07.620.84 t=15164: [Tue Oct 29 17:04:11 2013] err=-1: svstart: The system cannot find the file specified.

We have seen how to troubleshoot various issues with starting up uservers so will not go through that again here. Finally, when a userver is successfully started you may be rewarded by a Yellow Screen, a very informative screen which you do not normally want end users to see but which is great for debugging. The most useful information is usually near the end, in this case

Status: -50
ProcerrorContext: ERROR=-50 MNEM=<UACTERR_NO_SIGNATURE> DESCRIPTION=Signature not found COMPONENT=USYSHTTP PROCNAME=WRDEXEC TRIGGER=OPER LINE=169 ADDITIONAL=COMPONENTNAME=TESTWRD·;INSTANCENAME=testwrd

showing that the server page we wanted to execute was not found (or to be precise, its signature was not found).

Because such a detailed screen can be considered a security hazard, you can customize it by putting some logicals in the userver’s assignment file:

[LOGICALS]
USYSHTTPTITLE Yikes, an error !
USYSHTTPHEADER Oops, something went wrong.
USYSHTTPBODY <pre>We were unable to process your request. Please try again later.<br>We apologize for any inconvenience.<hr></pre>
USYSHTTPBGCOLOR wheat

which makes it look like a standard web apology screen:

The End !