OpenSAF 4.0 Troubleshooting

October 28, 2010

Here is a list of problems and solutions encountered when bringing up OpenSAF 4.0.

DISCARDING IMPLEMENTER

Upon startup do you get the following errors in /var/log/messages and then OpenSAF quits?

Oct 28 18:35:19 my-laptop osafimmnd[32053]: Create implementer:
Oct 28 18:35:19 my-laptop osafimmnd[32053]: Implementer 3 disconnected. Marking it as doomed
Oct 28 18:35:19 my-laptop osafimmnd[32053]: DISCARDING IMPLEMENTER 3 (safPlmService)
Oct 28 18:35:34 my-laptop opensafd[31997]: Starting the PLMD service
Oct 28 18:35:34 my-laptop osafimmnd[32053]: Create implementer:
Oct 28 18:35:34 my-laptop osafimmnd[32053]: Implementer 4 disconnected. Marking it as doomed
Oct 28 18:35:34 my-laptop osafimmnd[32053]: DISCARDING IMPLEMENTER 4 (safPlmService)
Oct 28 18:35:49 my-laptop opensafd[31997]: Starting the PLMD service
Oct 28 18:35:49 my-laptop osafimmnd[32053]: Create implementer:
Oct 28 18:35:49 my-laptop osafimmnd[32053]: Implementer 5 disconnected. Marking it as doomed
Oct 28 18:35:49 my-laptop osafimmnd[32053]: DISCARDING IMPLEMENTER 5 (safPlmService)
Oct 28 18:35:49 my-laptop osafimmnd[32053]: Implementer 1 disconnected. Marking it as doomed
Oct 28 18:35:49 my-laptop osafimmnd[32053]: DISCARDING IMPLEMENTER 1 (OpenSAFDtsvService)
Oct 28 18:35:49 my-laptop osafimmnd[32053]: Director Service in NOACTIVE state
Oct 28 18:35:49 my-laptop osafrded[32018]: Connection closed by client (orderly shutdown)
Oct 28 18:35:49 my-laptop osafimmnd[32053]: Director Service is down
Oct 28 18:35:49 my-laptop osafrded[32018]: Connection closed by client (orderly shutdown)
Oct 28 18:35:49 my-laptop osafrded[32018]: Connection closed by client (orderly shutdown)
Oct 28 18:35:49 my-laptop kernel: [14221.183751] TIPC: Disabling bearer
Oct 28 18:35:49 my-laptop kernel: [14221.183762] TIPC: Left network mode
Oct 28 18:35:49 my-laptop kernel: [14221.199712] NET: Unregistered protocol family 30
Oct 28 18:35:49 my-laptop kernel: [14221.199718] TIPC: Deactivated

Answer

Either your machine has no “platform management” support or OpenHPI is not correctly installed. Platform management is the ability for the system to detect hardware issues like temperature alarms or component failures. It is optional. Reconfigure, rebuild, and reinstall OpenSAF, but disable PLM in the configure step as follows:

./configure --disable-ais-plm
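
Before rebuilding, it can be worth confirming that PLM really is the service being thrown out. The sketch below counts the relevant syslog lines. The function name count_plm_discards is made up for this post, and it assumes the log lines look like the ones quoted above.

```shell
# Hypothetical helper (not part of OpenSAF): count how many times the
# PLM implementer was discarded in a syslog file.
# Usage: count_plm_discards /var/log/messages
count_plm_discards() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so "|| true" keeps the function's status clean.
    grep -c 'DISCARDING IMPLEMENTER .*(safPlmService)' "$1" || true
}
```

A non-zero count that keeps growing while opensafd retries "Starting the PLMD service" matches the failure described here.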

Startup seems to work without error and then quits

Answer 1

Make sure that the node name defined in /etc/opensaf/node_name is one of the nodes defined in your imm.xml file. By default, the installation fills this file with the Linux node name.
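
A rough way to check this from the command line is sketched below. The function name node_name_in_imm is hypothetical, and the check is only a heuristic: it assumes AMF node DNs in imm.xml contain the text "safAmfNode=<name>," as in the sample configuration shown elsewhere on this blog.

```shell
# Hypothetical helper: succeed if the name stored in the node_name file
# appears as an AMF node in the given imm.xml.
# Usage: node_name_in_imm /etc/opensaf/node_name /etc/opensaf/imm.xml
node_name_in_imm() {
    # The node name is expected on the first line of the file.
    node=$(head -n 1 "$1")
    # Assumption: node DNs look like "safAmfNode=<name>,safAmfCluster=...".
    # The trailing comma avoids matching a longer name with the same prefix.
    grep -qF "safAmfNode=${node}," "$2"
}
```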

Answer 2

Make sure that your node name (/etc/opensaf/node_name) and slot number (/etc/opensaf/slot_id) are unique in the cluster.

Debugging Components

April 5, 2010

It can be tricky to debug any program, and doubly so for a program that is started within the purview of a framework (like browser plugins for example) since it can be hard to use “printfs” when there is no clear console output, and it can be hard to connect a debugger. OpenSAF components are no exception.

ARRG! It reboots!

The first thing to do is remove that pesky reboot! Edit the file:
/usr/local/lib/opensaf/opensaf_reboot
and comment out the /sbin/shutdown and /sbin/reboot lines.
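
If you would rather not edit by hand, something like the following sed sketch could do it. The function name disable_reboot_lines is made up, and I am assuming the script calls /sbin/shutdown and /sbin/reboot on lines of their own. Check the result before trusting it.

```shell
# Hypothetical helper: comment out every line of a file that invokes
# /sbin/shutdown or /sbin/reboot, keeping the original as <file>.bak.
# Usage: disable_reboot_lines /usr/local/lib/opensaf/opensaf_reboot
disable_reboot_lines() {
    # Lines that are already comments are skipped; the rest get a "# " prefix.
    sed -i.bak -E '/^[[:space:]]*#/!s@^(.*/sbin/(shutdown|reboot).*)$@# \1@' "$1"
}
```

Remember to restore the .bak copy once you are done debugging, since a cluster that never reboots a failed node is not really highly available.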

Logging

The first thing to remember is that the stdout and stderr of your programs ARE going somewhere — you just have to figure out where that is!
The default “distribution” of OpenSAF logs to three places:

/var/log/messages — this is where you would look for errors relating to the OpenSAF system itself. I recommend running “tail -f /var/log/messages” in a separate xterm.

/var/lib/opensaf/stdouts — this directory contains the console printouts of each OpenSAF program and the programs that OpenSAF spawns. Note that this is not the place for YOUR program’s output because OpenSAF actually starts a helper script which then starts your program.

/var/opt/ — Your program’s output. The default helper script sticks your program’s logs here. However, note that this destination is very easy to modify so it may be different in your system.

This command will search all of these areas for any logs relating to your component. Of course you must replace YourCompName with the name of your component (as defined in the imm.xml file).
grep -i "YourCompName" /var/lib/opensaf/stdouts/* /var/log/messages /var/opt/*
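
A slightly more forgiving variant is sketched below: it descends into subdirectories (useful when your logs sit a level or two below /var/opt) and only lists the files that mention the component. The function name find_comp_logs is hypothetical, and the default locations are simply the three from the list above.

```shell
# Hypothetical helper: list the files that mention a component name.
# Usage: find_comp_logs YourCompName
#        find_comp_logs YourCompName /some/other/logdir
find_comp_logs() {
    comp=$1
    shift
    # With no explicit locations, fall back to the three default ones.
    [ "$#" -gt 0 ] || set -- /var/lib/opensaf/stdouts /var/log/messages /var/opt
    # -r: recurse into directories, -i: ignore case, -l: print file names only.
    grep -ril -- "$comp" "$@" 2>/dev/null
}
```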

Debugging startup issues

There are 3 possible locations for the problem:
1. AMF may not be starting your program. For example, the executable in the imm.xml file may not exist.
2. Your program’s startup script.
3. The initialization code for your program itself.
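
The first case is the quickest to rule out. Here is a small sketch that checks whether a binary or script named in imm.xml is actually present and executable. The function name check_component_exec is made up, and the path in the usage line is just the demo component's.

```shell
# Hypothetical helper: report whether a component binary or script can be run.
# Usage: check_component_exec /opt/amf_demo/amf_demo
check_component_exec() {
    if [ ! -e "$1" ]; then
        echo "missing: $1"
        return 1
    elif [ ! -x "$1" ]; then
        echo "not executable: $1"
        return 1
    fi
    echo "ok: $1"
}
```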

Run outside OpenSAF AMF

Another option is to run your program outside of the OpenSAF AMF high availability framework. As long as OpenSAF is running, a program that is linked with the appropriate libraries can be run on the command line and still utilise all the SAF services except AMF. This is very convenient during the development phase because it means that you can run your program under GDB using normal development/debug methodologies.

To do this, you need to rip out all the saAmf* calls, the active/standby callback handlers, etc and just start your stuff going from the “main” routine. In fact, it might make more sense to make “fake” calls to your active/standby handlers directly from the “main” routine (or another thread) rather than rip all this out!

OpenSAF AMF Control

March 9, 2010

Now that the new component is running, we need to do some basic operations such as assigning it to be active (i.e. “unlocking” it in SAF terminology), assigning some configuration data (SAF terminology: create a “component service instance”) and so forth.

To do so, you need to modify the configuration in the IMM database. There is a set of binaries that can manipulate the IMM:
immadm, immcfg, immdump, immfind, immlist

and a set of wrapper scripts around these called:
amf-adm, amf-find, amf-state

To start/stop components I’ve been using the amf wrapper scripts. SAF uses a strange and unusual state machine and nomenclature to start and stop components, so if you are unfamiliar with it, what is going on below will be very confusing. In that case, please stop here and read the appropriate section of the AMF spec (www.saforum.org). Dig through it for the terms “lock”, “unlock”, “lock-assignment”, and “lock-instantiation”.

What follows is an example using my component “AmfDemo”.

amf-adm and amf-find examples

Search for the full “DN” name of my component:
root@eceni:/code/opensaf# amf-find su | grep AmfDemo
safSu=SU1,safSg=AmfDemo,safApp=AmfDemo
safSu=SU2,safSg=AmfDemo,safApp=AmfDemo

“Lock” just puts it in a quiescent state:

root@eceni:/code/opensaf# amf-adm lock "safSu=SU1,safSg=AmfDemo,safApp=AmfDemo"
root@eceni:/code/opensaf# ps -efwww | grep AmfDemo
root 11359 1494 0 15:10 pts/0 00:00:00 grep AmfDemo

root@eceni:/code/opensaf# amf-adm unlock "safSu=SU1,safSg=AmfDemo,safApp=AmfDemo"
root@eceni:/code/opensaf# ps -efwww | grep AmfDemo
root 11374 1494 0 15:10 pts/0 00:00:00 grep AmfDemo

The component’s log (in /var/log/messages):

Mar 9 15:10:24 eceni opensaf_scap: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo' UNLOCKED => LOCKED
Mar 9 15:10:47 eceni opensaf_scap: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo' LOCKED => UNLOCKED

Now I will lock, and then “lock-instantiate” the component, which will have the effect of shutting it down:

root@eceni:/code/opensaf# amf-adm lock "safSu=SU1,safSg=AmfDemo,safApp=AmfDemo"
root@eceni:/code/opensaf# ps -efwww | grep amf_demo
root 5589 1 0 12:13 ? 00:00:00 /opt/amf_demo/amf_demo instantiate comparg1 SC_2_1
root 11430 1494 0 15:11 pts/0 00:00:00 grep amf_demo
root@eceni:/code/opensaf# amf-adm lock-in "safSu=SU1,safSg=AmfDemo,safApp=AmfDemo"
root@eceni:/code/opensaf# ps -efwww | grep amf_demo
root 11445 1494 0 15:12 pts/0 00:00:00 grep amf_demo

And here are the relevant logs:
Mar 9 15:11:45 eceni opensaf_scap: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo' UNLOCKED => LOCKED
Mar 9 15:12:02 eceni opensaf_scap: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo' LOCKED => LOCKED_INSTANTIATION
Mar 9 15:12:02 eceni amf_demo[5589]: Dispatched 'Component Terminate' in 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo'
Mar 9 15:12:02 eceni opensaf_scap: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo' INSTANTIATED => TERMINATING
Mar 9 15:12:02 eceni opensaf_scap: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo' TERMINATING => UNINSTANTIATED
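
Since the two steps always go together, they can be wrapped up. This is only a sketch: "lock" and "lock-in" are the amf-adm operations used in the transcript above, while the su_terminate name and the AMF_ADM override variable are my own inventions.

```shell
# Hypothetical wrapper around amf-adm. AMF_ADM can be overridden
# (for example AMF_ADM=echo) to preview the commands without running them.
AMF_ADM=${AMF_ADM:-amf-adm}

# Lock the SU, then lock-instantiate it, stopping at the first failure.
# Usage: su_terminate "safSu=SU1,safSg=AmfDemo,safApp=AmfDemo"
su_terminate() {
    "$AMF_ADM" lock "$1" && "$AMF_ADM" lock-in "$1"
}
```

The && matters: if the lock fails, going straight to lock-in would only produce a second error.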

amf-adm Errors

Here are a bunch of typical errors and what they mean:

SA_AIS_ERR_NO_OP: You are already in the state that you told it to go to
root@eceni:/code/opensaf# amf-adm unlock safSg=AmfDemo,safApp=AmfDemo
error - saImmOmAdminOperationInvoke_2 FAILED: SA_AIS_ERR_NO_OP (28)

SA_AIS_ERR_NOT_SUPPORTED: OpenSAF does not support this SAF feature
root@eceni:/code/opensaf# amf-adm lock-in safSg=AmfDemo,safApp=AmfDemo
error - saImmOmAdminOperationInvoke_2 FAILED: SA_AIS_ERR_NOT_SUPPORTED (19)

“does not exist”: Oops you gave a bad DN
root@eceni:/code/opensaf# amf-adm lock safSu=SC_2_2,safSg=AmfDemo?,safApp=AmfDemo?
error - saImmOmAdminOwnerSet - object 'safSu=SC_2_2,safSg=AmfDemo?,safApp=AmfDemo?' does not exist

SA_AIS_ERR_BAD_OPERATION: You can’t get there from here!
root@eceni:/code/opensaf# amf-adm lock-in "safSu=SU1,safSg=AmfDemo,safApp=AmfDemo"
error - saImmOmAdminOperationInvoke_2 FAILED: SA_AIS_ERR_BAD_OPERATION (20)

In other words, you tried to move from state A to state C without going through intermediate state B.
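
If you script around amf-adm, the same cheat sheet can be kept in a small lookup function. This sketch just restates the meanings listed above. The function name explain_amf_error is hypothetical and it knows nothing beyond these four cases.

```shell
# Hypothetical helper: map an amf-adm error line (or just the error name)
# to the plain-English meaning given in this post.
# Usage: explain_amf_error "error - ... FAILED: SA_AIS_ERR_NO_OP (28)"
explain_amf_error() {
    case "$1" in
        *SA_AIS_ERR_NO_OP*)         echo "already in the requested state" ;;
        *SA_AIS_ERR_NOT_SUPPORTED*) echo "operation not supported by OpenSAF" ;;
        *SA_AIS_ERR_BAD_OPERATION*) echo "invalid state transition, go through the intermediate state first" ;;
        *"does not exist"*)         echo "bad DN" ;;
        *)                          echo "unknown error"; return 1 ;;
    esac
}
```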

Writing your own OpenSAF Component

March 9, 2010

Writing an OpenSAF component is easy if you know the SAF AMF APIs, but probably hard otherwise! Luckily I know them 🙂 and also OpenSAF provides a template “main” file called amf_comp_template.c in the avsv sample directory. I copied this and the Makefile into a separate directory, renamed it to “main.c”, modified the makefile, built, copied the binary over the amf_demo component (so I wouldn’t have to hack imm.xml) & I was up and running!

OpenSAF: Trying the Samples

March 2, 2010

Sample apps exist over in the “opensaf-4.0.M4/samples” directory. I started with the 1st one “avsv”.

AVSV Sample

This sample is located in the opensaf-4.0.M4/samples/avsv directory.

Building

First off, the compilation fails because it can’t find the default installed libraries, like this (for search engines):

root@eceni:/code/opensaf/opensaf-4.0.M4/samples/avsv# make
gcc -g -O2 -Wall -fPIC -I. -I/usr/include/opensaf -o amf_demo amf_demo.o -lSaAmf -lavsv_common -lopensaf_core
/usr/bin/ld: cannot find -lavsv_common
collect2: ld returned 1 exit status
make: *** [amf_demo] Error 1

So you really need to jam a -L option into the link line. As discussed in prior postings, on Ubuntu the libs were put in “/usr/local/lib/opensaf”. The cleanest way to get this to work is to edit the makefile in the avsv directory and fix it there. But we can also jam the value into a make variable that happens to be unused in this makefile, namely “CPPFLAGS”:


root@eceni:/code/opensaf/opensaf-4.0.M4/samples/avsv# make CPPFLAGS="-L /usr/local/lib/opensaf"
gcc -L /usr/local/lib/opensaf -g -O2 -Wall -fPIC -I. -I/usr/include/opensaf -o amf_demo amf_demo.o -lSaAmf -lavsv_common -lopensaf_core
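
If you are not sure where the libraries ended up on your system, a small search helper like this sketch can save some guessing. The function name find_lib_dir is made up, and any directory other than /usr/local/lib/opensaf in the usage line is just a plausible candidate, not something I have verified.

```shell
# Hypothetical helper: print the first directory that contains lib<NAME>.*
# Usage: find_lib_dir avsv_common /usr/local/lib/opensaf /usr/lib/opensaf /usr/lib
find_lib_dir() {
    name=$1
    shift
    for dir in "$@"; do
        for f in "$dir"/lib"$name".*; do
            # An unmatched glob stays literal, so test that the file exists.
            if [ -e "$f" ]; then
                echo "$dir"
                return 0
            fi
        done
    done
    return 1
}
```

The result can then be fed to make, for example: make CPPFLAGS="-L $(find_lib_dir avsv_common /usr/local/lib/opensaf /usr/lib)"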

Running

Running it is a bit tricky. How do you integrate this model into the existing default OpenSAF model so that the SAF AMF knows what to run? The model file (opensaf-4.0.M4/samples/avsv/AppConfig-2N.xml) suggests using the immxml-merge tool, which did not exist anywhere on my machine. But that’s ok, merging by hand is really quite simple. Load /etc/opensaf/imm.xml and AppConfig-2N.xml into an editor, then copy every “object” tag and all of its children into imm.xml. In other words, copy the entire file except the top-level “imm” tag, and paste it at the bottom of imm.xml, but INSIDE the “imm” tag there.
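
For the impatient, the hand merge can be approximated with awk. This is a rough sketch, not a replacement for immxml-merge: the function names are made up, it assumes the application file holds only “object” elements inside its top-level tag, and it assumes the closing top-level tag of imm.xml starts with "</imm" and sits on a line of its own. It writes the merged result to stdout and does not touch either input.

```shell
# Hypothetical helper: print the <object> elements of an IMM XML file,
# from the first <object ...> line through the last </object> line.
extract_imm_objects() {
    awk '
        /<object/    { inside = 1 }
        inside       { buf = buf $0 "\n" }
        /<\/object>/ { if (inside) { printf "%s", buf; buf = "" } }
    ' "$1"
}

# Print BASE with the objects of APP inserted just before the line that
# closes the top-level "imm" tag.
# Usage: merge_imm /etc/opensaf/imm.xml AppConfig-2N.xml > /tmp/imm-merged.xml
merge_imm() {
    objs=$(mktemp) || return 1
    extract_imm_objects "$2" > "$objs"
    awk -v objs="$objs" '
        /^[ \t]*<\/imm/ {
            # Copy the extracted objects in before the closing tag.
            while ((getline line < objs) > 0) print line
            close(objs)
        }
        { print }
    ' "$1"
    rm -f "$objs"
}
```

Look over the merged file (and keep a copy of the original imm.xml) before installing it as /etc/opensaf/imm.xml.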

Also, if you look at the xml file, you’ll see that the application’s directory is hard-coded to /opt/amf_demo:
<object class="SaAmfNodeSwBundle">
<dn>safInstalledSwBundle=safBundle=AmfDemo,safAmfNode=SC_2_2,safAmfCluster=myAmfCluster</dn>
<attr>
<name>saAmfNodeSwBundlePathPrefix</name>
<value>/opt/amf_demo</value>
</attr>
</object>

So go ahead and create /opt/amf_demo and copy the executables over:

root@eceni:/# mkdir /opt/amf_demo
root@eceni:/code/opensaf/opensaf-4.0.M4/samples/avsv# cp amf_demo /opt/amf_demo/
root@eceni:/code/opensaf/opensaf-4.0.M4/samples/avsv# cp amf_demo_script /opt/amf_demo/
root@eceni:/code/opensaf/opensaf-4.0.M4/samples/avsv# cp amf_demo_package /opt/amf_demo/

Now you can start up OpenSAF in the usual fashion and the example should work!


root@eceni:/code/opensaf/opensaf-4.0.M4/samples/avsv# /etc/init.d/opensafd start
Mon Mar 1 16:11:16 EST 2010 - Starting OpenSAF
Mon Mar 1 16:11:16 EST 2010 - Starting Node Initialization Daemon: /usr/local/lib/opensaf/ncs_nid
Starting TIPC service... Done.
Starting RDF service... Done.
RDF-ROLE for this System Controller is: 0, ACTIVE
Starting DTSV service... Done.
Starting HLFM service... Done.
Starting IMMD service... Done.
Starting IMMND service... Done.
Starting LOGD service... Done.
Starting NTFD service... Done.
Starting EDSV service... Done.
Starting SCAP service... Done.
Node Initialization Successful.
SUCCESSFULLY SPAWNED ALL SERVICES!!!
Mon Mar 1 16:11:44 EST 2010 - OpenSAF Service Initialization Success

root@eceni:/code/opensaf/opensaf-4.0.M4/samples/avsv# ps -efww
root 2018 1 0 16:11 ? 00:00:00 /bin/bash /usr/local/lib/opensaf/nid_tipc start eth1 1234 NID_SVC_NAME=TIPC
root 2063 1 0 16:11 ? 00:00:00 /usr/local/lib/opensaf/ncs_rde
root 2089 1 0 16:11 ? 00:00:00 /usr/local/lib/opensaf/ncs_dts ROLE=1 NID_SVC_NAME=DTSV
root 2105 1 0 16:11 ? 00:00:00 /usr/local/lib/opensaf/opensaf_fmsd ROLE=1 NID_SVC_NAME=HLFM
root 2120 1 0 16:11 ? 00:00:00 /usr/local/lib/opensaf/opensaf_immd
root 2135 1 0 16:11 ? 00:00:00 /usr/local/lib/opensaf/opensaf_immnd
root 2155 1 0 16:11 ? 00:00:00 /usr/local/lib/opensaf/opensaf_saflogd
root 2178 1 0 16:11 ? 00:00:00 /usr/local/lib/opensaf/opensaf_ntfd
root 2191 1 0 16:11 ? 00:00:00 /usr/local/lib/opensaf/ncs_eds ROLE=1 NID_SVC_NAME=EDSV
root 2203 1 0 16:11 ? 00:00:00 /usr/local/lib/opensaf/opensaf_scap ROLE=1 NID_SVC_NAME=SCAP
root 2232 1 0 16:11 ? 00:00:00 /usr/local/lib/opensaf/opensaf_smfnd
root 2233 1 0 16:11 ? 00:00:00 /usr/local/lib/opensaf/opensaf_smfd
root 2252 1 0 16:11 ? 00:00:00 /usr/local/lib/opensaf/ncs_mqnd
root 2272 1 0 16:11 ? 00:00:00 /usr/local/lib/opensaf/ncs_mqd
root 2296 1 0 16:11 ? 00:00:00 /usr/local/lib/opensaf/ncs_glnd
root 2325 1 0 16:11 ? 00:00:00 /usr/local/lib/opensaf/ncs_cpnd
root 2337 1 0 16:11 ? 00:00:00 /usr/local/lib/opensaf/ncs_gld
root 2376 1 0 16:11 ? 00:00:00 /usr/local/lib/opensaf/ncs_cpd
root 2480 2018 0 16:15 ? 00:00:00 sleep 15
root 2491 1 0 16:15 ? 00:00:00 /opt/amf_demo/amf_demo instantiate comparg1 SC_2_1

As you can see by the “ps” the amf_demo program is running. And if you take a peek at the code you’ll see that it goes to syslog, so if you tail /var/log/messages you’ll see something like:

Mar 1 16:18:22 eceni amf_demo: #012#012 ##############################################
Mar 1 16:18:22 eceni amf_demo: # #
Mar 1 16:18:22 eceni amf_demo: # You are about to witness AvSv Demo !!! #
Mar 1 16:18:22 eceni amf_demo: # #
Mar 1 16:18:22 eceni amf_demo: ##############################################
Mar 1 16:18:22 eceni amf_demo: AMF thread entered
Mar 1 16:18:22 eceni amf_demo: AMF Initialization Done !!!
Mar 1 16:18:22 eceni amf_demo: #011AmfHandle: ff900001
Mar 1 16:18:22 eceni amf_demo: AMF Selection Object Get Successful !!!
Mar 1 16:18:22 eceni amf_demo: Component Name Get Successful !!!
Mar 1 16:18:22 eceni amf_demo: #011CompName: safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo
Mar 1 16:18:22 eceni amf_demo: Component Registered !!!
Mar 1 16:18:22 eceni opensaf_immnd: Create runtime object 'safSISU=safSu=SU1\#safSg=AmfDemo\#safApp=AmfDemo,safSi=AmfDemo,safApp=AmfDemo' by Impl id: 4
Mar 1 16:18:22 eceni opensaf_immnd: Create runtime object 'safCSIComp=safComp=AmfDemo\#safSu=SU1\#safSg=AmfDemo\#safApp=AmfDemo,safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo' by Impl id: 4
Mar 1 16:18:22 eceni opensaf_scap: 'safSu=SU1,safSg=AmfDemo,safApp=AmfDemo' TERMINATING => INSTANTIATED
Mar 1 16:18:22 eceni amf_demo: Dispatched 'CSI Set' Callback for Component: 'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo'
Mar 1 16:18:22 eceni amf_demo: #011CSIName: safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo #012 HAState: Active #012 CSIFlags: Add One
Mar 1 16:18:22 eceni amf_demo: INVOKING saAmfHAStateGet() API !!!
Mar 1 16:18:24 eceni amf_demo: CompName: safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo #012 CSIName: safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo #012 HAState: Active
Mar 1 16:18:24 eceni amf_demo: DEMONSTRATING AMF-INITIATED HEALTHCHECK !!!
Mar 1 16:18:26 eceni amf_demo: Started AMF-Initiated HealthCheck (with Component Failover Recommended Recovery) #012 Comp: safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo #012 HealthCheckKey: AmfDemo
Mar 1 16:18:26 eceni amf_demo: #012 Dispatched 'HealthCheck' Callback #012 Component: safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo #012 HealthCheckKey: AmfDemo
Mar 1 16:18:36 eceni amf_demo: #012 Dispatched 'HealthCheck' Callback #012 Component: safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo #012 HealthCheckKey: AmfDemo
Mar 1 16:18:46 eceni amf_demo: #012 Dispatched 'HealthCheck' Callback #012 Component: safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo #012 HealthCheckKey: AmfDemo
Mar 1 16:18:56 eceni amf_demo: #012 Dispatched 'HealthCheck' Callback #012 Component: safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo #012 HealthCheckKey: AmfDemo
Mar 1 16:19:07 eceni amf_demo: #012 Dispatched 'HealthCheck' Callback #012 Component: safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo #012 HealthCheckKey: AmfDemo
Mar 1 16:19:17 eceni amf_demo: #012 Dispatched 'HealthCheck' Callback #012 Component: safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo #012 HealthCheckKey: AmfDemo
Mar 1 16:19:27 eceni amf_demo: #012 Dispatched 'HealthCheck' Callback #012 Component: safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo #012 HealthCheckKey: AmfDemo
Mar 1 16:19:37 eceni amf_demo: #012 Dispatched 'HealthCheck' Callback #012 Component: safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo #012 HealthCheckKey: AmfDemo
Mar 1 16:19:47 eceni amf_demo: #012 Dispatched 'HealthCheck' Callback #012 Component: safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo #012 HealthCheckKey: AmfDemo
Mar 1 16:19:57 eceni amf_demo: #012 Dispatched 'HealthCheck' Callback #012 Component: safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo #012 HealthCheckKey: AmfDemo
Mar 1 16:19:57 eceni amf_demo: #012 Stopped HealthCheck for Comp: safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo with HealthCheckKey: AmfDemo
Mar 1 16:19:57 eceni amf_demo: #012#012 DEMONSTRATING COMPONENT FAILOVER THROUGH ERROR REPORT !!!

Great! Good job! You’ve gotten the demo app running on 1 node! Try doing 2 nodes. It should be the same process on the other node.

Running and Basic Debug of OpenSAF on Ubuntu

March 1, 2010

So now you’ve installed OpenSAF, probably tried to run it, and found that it didn’t work. I had to reverse engineer the startup scripts to figure out basic debugging, so let me share it here so you do not have to.

OpenSAF config files are located at /etc/opensaf.

All programs write basic logging to stdout and stderr which is redirected to /var/lib/opensaf/stdouts.

I set my rde.conf and nodeinit.conf correctly (see the README file in the OpenSAF distro) and got the following error:


/etc/init.d/opensafd start
Thu Feb 25 14:21:18 EST 2010 - Starting Node Initialization Daemon: /usr/local/lib/opensaf/ncs_nid
Starting TIPC service... Done.
Starting RDF service... Failed
Timed-out for response from:RDF

Going for recovery
Starting RDF service... Failed
Timed-out for response from:RDF

Going for recovery
Starting RDF service... Failed
Timed-out for response from:RDF

Starting RDF service... Failed
Timed-out for response from:RDF

Starting RDF service... Failed
Timed-out for response from:RDF

Looking in the rde console dump file gave something useful:
cat /var/lib/opensaf/stdouts/ncs_rde
/usr/local/lib/opensaf/ncs_rde: error while loading shared libraries: libSaAmf.so.0: cannot open shared object file: No such file or directory

Ok, well OpenSAF was installed to the standard location so I did:
export LD_LIBRARY_PATH=/usr/local/lib
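
Rather than discovering missing libraries one failed start at a time, you can ask the dynamic loader up front. ldd marks unresolved libraries with "=> not found", and the sketch below just filters for that. The function name missing_libs is my own.

```shell
# Hypothetical helper: read "ldd" output on stdin and print the names of
# shared libraries that the loader could not find.
# Usage: ldd /usr/local/lib/opensaf/ncs_rde | missing_libs
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

Run it once before and once after setting LD_LIBRARY_PATH. No output the second time means the loader can now resolve everything.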

Then I shut down TIPC, since OpenSAF likes to start it itself:
rmmod tipc

So I tried again, still no luck. This time it just hangs at Starting RDF, but the stdout file was pretty clear:
cat /var/lib/opensaf/stdouts/ncs_rde
PID file : /var/run/opensaf/rde.pid
Shelf number : 2
Slot number : 1
Site number : 1
Log level : 5
Interactive mode : FALSE
(null): pidfile /var/run/opensaf/rde.pid open failed

So I did:
mkdir /var/run/opensaf/

And tried again.

root@tormalin:/# /etc/init.d/opensafd start
Thu Feb 25 15:05:19 EST 2010 - Starting Node Initialization Daemon: /usr/local/lib/opensaf/ncs_nid
Starting TIPC service... Done.
Starting RDF service... Done.
RDF-ROLE for this System Controller is: 0, ACTIVE
Starting DTSV service... Done.
Starting HLFM service... Done.
Starting IMMD service... Done.
Starting IMMND service... Done.
Starting LOGD service... Done.
Starting NTFD service... Done.
Starting EDSV service... Done.
Starting SCAP service... Done.
Node Initialization Successful.
SUCCESSFULLY SPAWNED ALL SERVICES!!!
Thu Feb 25 15:05:54 EST 2010 - OpenSAF Service Initialization Success

And it worked!

Installing OpenSAF

February 25, 2010

I did an installation in Ubuntu of OpenSAF Rel 4.0.M4 and these are my notes:

URL: http://download.opensaf.org/releases/opensaf-4.0.0.tar.gz

Prerequisites:

These are common packages that must be installed to build OpenSAF:
On Ubuntu:
apt-get -y install libxml2-dev flex bison build-essential libtool autoconf automake sqlite3

net-snmp-5.4 — optional

URL: http://net-snmp.sourceforge.net/download.html
wget http://voxel.dl.sourceforge.net/project/net-snmp/net-snmp/5.4.2.1/net-snmp-5.4.2.1.tar.gz

Prerequisites

apt-get -y install libperl-dev

Installation:

./configure; make; make install

Xerces C++ version 2.7.0 (edit — possibly unnecessary for OpenSAF 4.0 release):

(Note: please use only this version; later versions do not work with this OpenSAF release.)
URL: http://xml.apache.org/xerces-c/

prereq:

apt-get -y install autoconf

Installation:

wget http://archive.apache.org/dist/xml/xerces-c/Xerces-C_2_7_0/source/xerces-c-src_2_7_0.tar.gz
export XERCESCROOT={Where you detarred it}
cd $XERCESCROOT/src/xercesc
autoconf
./runConfigure -plinux -cgcc -xg++ -minmem -nsocket -tnative -rpthread
make; make install

TIPC:

You need to get the tipc config tools that match the tipc .ko provided with your kernel
Ubuntu: sudo apt-get install tipcutils
From Source:
wget http://downloads.sourceforge.net/project/tipc/tipc-linux-extras/tipc-utils-1.0.4/tipcutils-1.0.4.tar.gz?use_mirror=iweb
URL: http://tipc.sourceforge.net/download.html

OpenHPI:

URL: http://openhpi.org/
apt-get -y install openhpi libopenhpi-dev

prereq:

apt-get -y install libglib2.0-dev
apt-get -y install libltdl-dev
apt-get -y install e2fslibs-dev
apt-get -y install uuid-dev

Install

wget http://superb-sea2.dl.sourceforge.net/project/openhpi/openhpi-stable/2.14.1/openhpi-2.14.1.tar.gz

./configure;make;make install

TETWARE (edit: possibly optional for OpenSAF 4.0.0)

Download:

wget http://tetworks.opengroup.org/tet/tet3.7a-unsup.src.tar.gz
mkdir tet
cd tet; tar xvfz ../tet*.gz

Installation:

export TET_ROOT=`pwd`; export PATH=$PATH:$TET_ROOT/bin
sh ./configure -t lite; cd src; make install

Building OpenSAF:

Installation

./configure; make; make install

What is this?

February 24, 2010

This blog will be a place where I chronicle my experiences and discoveries about OpenSAF.  I hope that this will help others who are using the software, as I have found that there is very little searchable information on the web.  OpenSAF is an implementation of the Service Availability Forum’s High Availability and middleware specification.  You can find more info about it at http://www.saforum.org, and find out about OpenSAF at http://www.opensaf.org.