~shefty/librdmacm.git
5 years agoPerform completion event acknowledgments in batches instead
Sreedhar Kodali [Thu, 18 Sep 2014 08:59:48 +0000 (14:29 +0530)]
Perform completion event acknowledgments in batches instead
of individually to minimize locking overheads.  Size of the
completion queue decides the size of a batch.
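
As a rough illustration of the batching idea (not the code from this patch), the sketch below counts pending events and acknowledges them only once a full batch has built up.  The struct and the model_* functions are hypothetical stand-ins; in libibverbs the acknowledgment would be a single ibv_ack_cq_events(cq, nevents) call covering the whole batch.

```c
/* Hypothetical stand-in for a completion queue; not the rsocket struct. */
struct model_cq {
    unsigned int size;         /* CQ depth, used here as the batch size */
    unsigned int unacked;      /* events received but not yet acknowledged */
    unsigned int ack_calls;    /* how many times the costly ack ran */
    unsigned int acked_total;  /* total events acknowledged so far */
};

/* Stand-in for the real acknowledgment call, which takes a lock. */
static void model_ack_events(struct model_cq *cq, unsigned int nevents)
{
    cq->ack_calls++;
    cq->acked_total += nevents;
}

/* Record one completion event; acknowledge only when a batch is full. */
static void model_on_cq_event(struct model_cq *cq)
{
    if (++cq->unacked >= cq->size) {
        model_ack_events(cq, cq->unacked);
        cq->unacked = 0;
    }
}

/* Acknowledge any remainder, for example before destroying the CQ. */
static void model_flush_events(struct model_cq *cq)
{
    if (cq->unacked) {
        model_ack_events(cq, cq->unacked);
        cq->unacked = 0;
    }
}
```

With a batch size of 4, ten events cost two acknowledgment calls plus one flush, rather than ten individual calls.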

Signed-off-by: Sreedhar Kodali <srkodali@linux.vnet.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
5 years agorsockets: Add fine grained interception mechanism for preload library
Sreedhar Kodali [Thu, 18 Sep 2014 06:03:21 +0000 (11:33 +0530)]
rsockets: Add fine grained interception mechanism for preload library

By default the R-Sockets pre-loading library intercepts all
the stream and datagram sockets belonging to the processes
and threads of a launched program.

However, distributed application and database servers may
require fine grained interception, so that only the processes
listening for remote connections on the RDMA transport are
enabled for RDMA while the remaining processes continue to
use TCP as before.  This allows the various server components
to keep communicating with each other locally.

A configuration file based mechanism is introduced to facilitate
this fine grained interception mechanism.  As part of preload
initialization, the configuration file is scanned and an
in-memory record store is created with all the entries found.
When a request is made to intercept a socket, its attributes
are cross checked with stored records to see whether we
should proceed with rsocket switch over.

Note: Right now, the fine grained interception mechanism is
enabled only for newly created sockets.  Going forward,
this can be extended to select connections based on the
specified host/IP addresses and ports as well.

"preload_config" is the name of the configuration file which
should exist in the default configuration location of an installed
rsocket library.  The full path to this configuration file is usually
<install-root>/etc/rdma/rsocket/preload_config.

The sample format for this configuration file is shown below:

# Sample config file for preloading in a program specific way
#
# Each line entry should have the following format:
#
#   program domain type protocol
#
# where,
#
# program    - program or command name (string without spaces)
# domain     - the socket domain: AF_INET / AF_INET6 / AF_IB
# type       - the socket type: SOCK_STREAM / SOCK_DGRAM
# protocol   - the socket protocol: IPPROTO_TCP / IPPROTO_UDP
#
# The wildcard value of '*' is supported for any
#
# Note:
#  Lines beginning with '#' character are treated as comments.
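
The format above could be parsed and matched along the following lines.  This is only a sketch under assumptions: the struct layout, field sizes, and function names are hypothetical and are not taken from the preload library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory record for one config entry. */
struct preload_rule {
    char program[64];
    char domain[16];
    char type[16];
    char protocol[16];
};

/* Parse one line.  Returns 1 if a rule was read, 0 for comment lines,
 * blank lines, and lines with fewer than four fields. */
static int parse_rule(const char *line, struct preload_rule *r)
{
    if (line[0] == '#')
        return 0;
    return sscanf(line, "%63s %15s %15s %15s",
                  r->program, r->domain, r->type, r->protocol) == 4;
}

/* A field matches if the rule holds the wildcard or the same string. */
static int field_matches(const char *rule, const char *value)
{
    return strcmp(rule, "*") == 0 || strcmp(rule, value) == 0;
}

/* Check the attributes of a socket request against one stored rule. */
static int rule_matches(const struct preload_rule *r, const char *program,
                        const char *domain, const char *type,
                        const char *protocol)
{
    return field_matches(r->program, program) &&
           field_matches(r->domain, domain) &&
           field_matches(r->type, type) &&
           field_matches(r->protocol, protocol);
}
```

For example, the entry "myserver AF_INET SOCK_STREAM *" (a made-up program name) would match any AF_INET stream socket created by myserver, regardless of protocol.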

Signed-off-by: Sreedhar Kodali <srkodali@linux.vnet.ibm.com>
Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agorsockets: Support calling listen multiple times on same rsocket
Sean Hefty [Thu, 4 Sep 2014 18:19:28 +0000 (11:19 -0700)]
rsockets: Support calling listen multiple times on same rsocket

Standard sockets allow an application to call listen() multiple
times on the same socket without error.  This allows a multi-threaded
app to call listen from all threads.

rsockets will fail the second listen call.  Modify the behavior to
match standard sockets.

Problem reported by: Sreedhar Kodali <srkodali@linux.vnet.ibm.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agorsocket: Index map item is cleaned before it is used in iomapping cleanup
Sasha Kotchubievsky [Thu, 4 Sep 2014 16:16:21 +0000 (09:16 -0700)]
rsocket: Index map item is cleaned before it is used in iomapping cleanup

The rs_free function clears the index map item corresponding to the
rsocket (in idm_clear, called from rs_remove) and then uses it in
iomapping cleanup (in riounmap, called from rs_free_iomappings).

Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agoRenumber 1.0.19-1 release to 1.0.19.1 v1.0.19.1
Sean Hefty [Mon, 18 Aug 2014 20:52:22 +0000 (13:52 -0700)]
Renumber 1.0.19-1 release to 1.0.19.1

6 years agoMerge branch 'dev' v1.0.19-1
Sean Hefty [Fri, 8 Aug 2014 18:53:16 +0000 (11:53 -0700)]
Merge branch 'dev'

6 years agoRelease 1.0.19-1 hotfix
Sean Hefty [Thu, 7 Aug 2014 22:43:00 +0000 (15:43 -0700)]
Release 1.0.19-1 hotfix

6 years agoindexer: Include errno.h directly
Sean Hefty [Mon, 4 Aug 2014 17:01:31 +0000 (10:01 -0700)]
indexer: Include errno.h directly

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agorsocket: Segmentation fault fix in case of multiple connections
Ilya Nelkenbaum [Mon, 28 Jul 2014 12:48:09 +0000 (15:48 +0300)]
rsocket: Segmentation fault fix in case of multiple connections

When more than 16 rsocket connections are established, the
"svc->rss" buffer is reallocated with more memory.  Index 0 is
reserved for the service's communication socket, and this was
not taken into account when data was copied from the old buffer
location to the new one.
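
The off-by-one can be illustrated with a simplified model.  The struct and function below are hypothetical and use plain ints in place of rsocket pointers; the point is only that the copy must cover cnt + 1 elements when slot 0 is reserved.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a service's slot array. */
struct grow_model {
    int *slots;  /* slots[0] is reserved; entries live in slots[1..cnt] */
    int cnt;     /* number of entries, excluding the reserved slot */
    int size;    /* allocated capacity, including the reserved slot */
};

/* Grow the array by 'grow' slots.  Assumes slots is already allocated. */
static int grow_model_resize(struct grow_model *m, int grow)
{
    int new_size = m->size + grow;
    int *slots = malloc(sizeof(*slots) * new_size);

    if (!slots)
        return -1;

    /* Copy the reserved slot 0 plus cnt entries.  Copying only cnt
     * elements would silently drop the last entry. */
    memcpy(slots, m->slots, sizeof(*slots) * (m->cnt + 1));
    free(m->slots);
    m->slots = slots;
    m->size = new_size;
    return 0;
}
```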

Signed-off-by: Ilya Nelkenbaum <ilyan@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agoudpong: Fix client_recv error check
Sean Hefty [Wed, 23 Jul 2014 06:24:53 +0000 (23:24 -0700)]
udpong: Fix client_recv error check

We only want to report an error if it's not EAGAIN.  The if
statement is reversed.  Correct it.
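
The intended condition can be written as a small predicate.  The function name is hypothetical and this is not the udpong source; it only shows which way round the test should go.

```c
#include <errno.h>
#include <sys/types.h>

/* Returns 1 if a failed receive should be reported, 0 otherwise.
 * 'ret' is the return value of the receive call and 'err' is the
 * errno value captured right after it. */
static int should_report_recv_error(ssize_t ret, int err)
{
    return ret < 0 && err != EAGAIN;
}
```

A caller would typically invoke perror() only when this returns 1, and simply retry when the receive failed with EAGAIN.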

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agoRelease 1.0.19 v1.0.19
Sean Hefty [Wed, 16 Jul 2014 22:49:16 +0000 (15:49 -0700)]
Release 1.0.19

6 years agoriostream: Only verify last data transfer
Sean Hefty [Wed, 16 Jul 2014 20:44:56 +0000 (13:44 -0700)]
riostream: Only verify last data transfer

Data verification will fail when running the bandwidth
tests or the transfer count is > 1.  The issue is that
subsequent writes by the initiator side will overwrite
the data in the target buffer before the receiver can
verify that it is correct.

To fix this, only verify that the data in the buffer
is correct after the last transfer has completed.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agoRevert "Revert "rsocket: Change keepalive to 0-byte RDMA write""
Sean Hefty [Mon, 7 Jul 2014 15:40:44 +0000 (08:40 -0700)]
Revert "Revert "rsocket: Change keepalive to 0-byte RDMA write""

This reverts commit a34703c53259845dd20450a87eb6747030e23e8b.

0-byte RDMA writes appear to be working correctly with
HCAs from 2 different vendors.  The original problem that
was reported turned out to be a user error.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agorsocket: Update correct rsocket keepalive time
Sean Hefty [Thu, 3 Jul 2014 20:45:52 +0000 (13:45 -0700)]
rsocket: Update correct rsocket keepalive time

When the keepalive time of an rsocket is updated, the
updated information is forwarded to the keepalive service
thread.  However, the thread updates the time for the
wrong service as shown:

tcp_svc_timeouts[svc->cnt] = rs_get_time() + msg.rs->keepalive_time;

The index into tcp_svc_timeouts should correspond to the
rsocket being updated, not the last one in the list.
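
A simplified model of the corrected lookup is sketched here.  The struct, the array size, and the function name are assumptions made for illustration; only the idea of indexing by the rsocket's own position comes from the description above.

```c
#include <stdint.h>

#define KA_MODEL_MAX 8

/* Hypothetical model of the keepalive service's parallel arrays. */
struct ka_model {
    const void *rss[KA_MODEL_MAX];    /* index 0 unused; entries in 1..cnt */
    uint64_t timeouts[KA_MODEL_MAX];  /* keepalive deadline per entry */
    int cnt;
};

/* Update the deadline of 'rs' itself.  Returns its index, or -1 if
 * the rsocket is not on this service's list. */
static int ka_model_update(struct ka_model *svc, const void *rs,
                           uint64_t now, uint64_t keepalive_time)
{
    for (int i = 1; i <= svc->cnt; i++) {
        if (svc->rss[i] == rs) {
            svc->timeouts[i] = now + keepalive_time;
            return i;
        }
    }
    return -1;
}
```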

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agorsocket: Fix removing rsocket from service thread
Sean Hefty [Thu, 3 Jul 2014 20:55:39 +0000 (13:55 -0700)]
rsocket: Fix removing rsocket from service thread

When removing an rsocket from a service thread, we replace
the removed service with the one at the end of the service list.
This keeps the array tightly packed.  However, rs_svc_rm_rs
decrements the rsocket count before doing the swap.  The result
is that the entry at the end of the list gets dropped off.
Defer decrementing the count until the swap has been made.

In this case, the cnt value is a valid index into the array,
because we start at index 1.  Index 0 is used internally by
the service thread.
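
The ordering issue is easy to see in a small model of a packed array whose entries start at index 1.  The names below are hypothetical and ints stand in for rsockets.

```c
#define RM_MODEL_MAX 8

/* Hypothetical packed list; items[0] is reserved, entries in 1..cnt. */
struct rm_model {
    int items[RM_MODEL_MAX];
    int cnt;
};

/* Remove 'value' by moving the last entry into its slot.
 * Returns 0 on success, -1 if the value is not on the list. */
static int rm_model_remove(struct rm_model *l, int value)
{
    for (int i = 1; i <= l->cnt; i++) {
        if (l->items[i] == value) {
            /* cnt is still the index of the last entry at this point. */
            l->items[i] = l->items[l->cnt];
            /* Decrement only after the swap.  Decrementing first would
             * copy the second-to-last entry and lose the last one. */
            l->cnt--;
            return 0;
        }
    }
    return -1;
}
```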

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agorsocket: Fix crash resulting from keepalive timeout
Sean Hefty [Wed, 2 Jul 2014 22:37:10 +0000 (15:37 -0700)]
rsocket: Fix crash resulting from keepalive timeout

The following crash was reported by Hal Rosenstock,
<hal@mellanox.com>, with keepalive enabled.  The crash
occurs in the keepalive thread attempting to send a
keepalive message.

report:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffecf08700 (LWP 6013)]
rs_post_write (rs=<value optimized out>, sgl=0x0, nsge=0, wr_data=3758096385,
    flags=0, addr=0, rkey=0) at src/rsocket.c:1660
1660            return rdma_seterrno(ibv_post_send(rs->cm_id->qp, &wr, &bad));
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
(gdb)
(gdb) p/x rs
$1 = value has been optimized out

So I added in the following to debug:
1660    if (rs == NULL)
1661    abort();
1662    if (rs->cm_id == NULL)
1663    abort();
1664    if (rs->cm_id->qp == NULL)
1665    abort();
1666            return rdma_seterrno(ibv_post_send(rs->cm_id->qp, &wr, &bad));
1667    }

And saw in gdb:

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffecf08700 (LWP 8096)]
0x00000030d50328a5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
(gdb)
(gdb) bt
#0  0x00000030d50328a5 in raise () from /lib64/libc.so.6
#1  0x00000030d5034085 in abort () from /lib64/libc.so.6
#2  0x00007ffff057fe23 in rs_post_write (rs=<value optimized out>, sgl=0x1fa0,
    nsge=6, wr_data=4294967295, flags=0, addr=0, rkey=0) at src/rsocket.c:1665
#3  0x00007ffff058193d in tcp_svc_send_keepalive (arg=0x7ffff0789f20)
    at src/rsocket.c:4245
#4  tcp_svc_run (arg=0x7ffff0789f20) at src/rsocket.c:4279
#5  0x00000030d5807851 in start_thread () from /lib64/libpthread.so.0
#6  0x00000030d50e890d in clone () from /lib64/libc.so.6
(gdb) fr 2
#2  0x00007ffff057fe23 in rs_post_write (rs=<value optimized out>, sgl=0x1fa0,
    nsge=6, wr_data=4294967295, flags=0, addr=0, rkey=0) at src/rsocket.c:1665
1665    abort();

So qp is NULL somehow...
:end report

There is an issue if an rsocket is closed without going through
the rshutdown.

int rshutdown(int socket, int how)
{
	...
	if (rs->opts & RS_OPT_SVC_ACTIVE)
		rs_notify_svc(&tcp_svc, rs, RS_SVC_REM_KEEPALIVE);

We remove the rsocket from the keepalive thread in rshutdown.

int rclose(int socket)
{
	...
	if (rs->state & rs_connected)
		rshutdown(socket, SHUT_RDWR);
	...
	rs_free(rs);

rclose will call rshutdown only if we're connected.  However, if the
keepalive failed, the socket will be in an error state.  So there is
no call to rshutdown, which leaves the freed rsocket on the keepalive
thread's list.

The fix is to have rclose remove an rsocket from being processed
by a service thread if it is still active.
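
The control flow of the fix can be modelled as follows.  All names are hypothetical stand-ins, and the assumption that removal clears the active flag belongs to this sketch rather than to the rsocket source.

```c
enum { MODEL_CONNECTED = 1 << 0, MODEL_ERROR = 1 << 1 };  /* state bits */
enum { MODEL_OPT_SVC_ACTIVE = 1 << 0 };                   /* option bits */

struct close_model {
    int state;
    int opts;
};

static int model_svc_removals;  /* counts removals from the service list */

/* Stand-in for notifying the service thread to drop this rsocket. */
static void model_svc_remove(struct close_model *rs)
{
    rs->opts &= ~MODEL_OPT_SVC_ACTIVE;
    model_svc_removals++;
}

static void model_shutdown(struct close_model *rs)
{
    if (rs->opts & MODEL_OPT_SVC_ACTIVE)
        model_svc_remove(rs);
    rs->state &= ~MODEL_CONNECTED;
}

static void model_close(struct close_model *rs)
{
    if (rs->state & MODEL_CONNECTED)
        model_shutdown(rs);

    /* The fix: an rsocket in an error state skipped the shutdown path
     * above, so remove it here if it is still registered. */
    if (rs->opts & MODEL_OPT_SVC_ACTIVE)
        model_svc_remove(rs);

    /* The rsocket would be freed here. */
}
```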

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agoexample/rdma_xclient/server: Update XRC support in sample programs
Sean Hefty [Wed, 2 Jul 2014 05:52:40 +0000 (22:52 -0700)]
example/rdma_xclient/server: Update XRC support in sample programs

Update rdma_xclient and rdma_xserver sample programs to test
XRC data transfers.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agordmacm: Update addrinfo with XRC support
Sean Hefty [Wed, 2 Jul 2014 05:56:43 +0000 (22:56 -0700)]
rdmacm: Update addrinfo with XRC support

Remove internal defines, and use libibverbs exported values
instead.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agordmacm: Add support for XRC QPs
Sean Hefty [Wed, 2 Jul 2014 00:47:22 +0000 (17:47 -0700)]
rdmacm: Add support for XRC QPs

Export a new extended create QP call.  Add support for XRC
QPs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agordmacm: Add support for allocating XRC SRQs
Sean Hefty [Wed, 2 Jul 2014 00:14:13 +0000 (17:14 -0700)]
rdmacm: Add support for allocating XRC SRQs

Add extended SRQ creation call, to support allocating
XRC SRQs.  Use the rdma_cm_id qp type field to
determine which type of SRQ should be allocated.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agordmacm: Add functionality to allocate an XRCD
Sean Hefty [Tue, 1 Jul 2014 23:46:34 +0000 (16:46 -0700)]
rdmacm: Add functionality to allocate an XRCD

XRC QPs and SRQs are associated by an XRC domain.  Provide a
call to allocate an XRCD, similar to how the rdmacm allocates
a PD for the user.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agobuild: Add build support for XRC
Sean Hefty [Tue, 1 Jul 2014 23:17:30 +0000 (16:17 -0700)]
build: Add build support for XRC

Modify autotools to check for and require a libibverbs
version that includes XRC and extension support.

Remove any code used to support older versions of
libibverbs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agolibrdmacm: Use SRQ in rdma_create_qp
Sean Hefty [Tue, 1 Jul 2014 20:30:42 +0000 (13:30 -0700)]
librdmacm: Use SRQ in rdma_create_qp

If an application has allocated an SRQ on an rdma_cm_id, use
it when creating a QP.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agolibrdmacm: Remove NULL checks after calling alloca
Sean Hefty [Wed, 25 Jun 2014 19:56:18 +0000 (12:56 -0700)]
librdmacm: Remove NULL checks after calling alloca

alloca doesn't return a NULL pointer on failure.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agoRevert "rsocket: Change keepalive to 0-byte RDMA write"
Sean Hefty [Sat, 21 Jun 2014 00:44:26 +0000 (17:44 -0700)]
Revert "rsocket: Change keepalive to 0-byte RDMA write"

This reverts commit 0f2c76e81ecf1470cf152600c08c421e7e82b00e.

Testing has shown that this does not always result in the
keep-alive message working correctly, such that a broken
connection is reported as having failed.  The reason for this
behavior is unknown, but revert the patch until the issue has
been resolved.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agolibrdmacm: In ucma_convert_path, fix selector values
Hal Rosenstock [Thu, 19 Jun 2014 17:08:02 +0000 (13:08 -0400)]
librdmacm: In ucma_convert_path, fix selector values

Intent is for the selectors to be equal to (exactly) rather than less than.
Selector for exactly is value of 2 rather than 1.

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agorsocket: Add support for RDMA_ROUTE option in rgetsockopt
Hal Rosenstock [Thu, 19 Jun 2014 15:54:11 +0000 (11:54 -0400)]
rsocket: Add support for RDMA_ROUTE option in rgetsockopt

Create ibv_path_data structs from the ibv_sa_path_rec structs
of the rsocket's RDMA route, as many as fit into the supplied
buffer.

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agoMerge branch 'dev'
Sean Hefty [Wed, 18 Jun 2014 18:56:42 +0000 (11:56 -0700)]
Merge branch 'dev'

6 years agorsocket: Change keepalive to 0-byte RDMA write
Susan K. Coulter [Mon, 16 Jun 2014 17:28:08 +0000 (10:28 -0700)]
rsocket: Change keepalive to 0-byte RDMA write

Signed-off-by: Susan K. Coulter <markus@cj-fe1.lanl.gov>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agordma_server: handle IBV_SEND_INLINE correctly
Doug Ledford [Wed, 18 Jun 2014 17:45:23 +0000 (10:45 -0700)]
rdma_server: handle IBV_SEND_INLINE correctly

Not all RDMA devices support IBV_SEND_INLINE.  At least some of those
that don't will ignore the flag passed to rdma_post_send and attempt to
send the command by using an sge entry instead.  Because we don't
register the send memory, this fails.  The proper way to deal with the
fact that IBV_SEND_INLINE is not guaranteed is to check the returned
value in our cap struct to see if we have support for inline data, and
if not, fall back to non-inline sends and to register the send memory
region.
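
The decision described above can be reduced to a small helper.  This is a sketch with hypothetical names: MODEL_SEND_INLINE stands in for IBV_SEND_INLINE, and max_inline_data is assumed to be the value read back from the QP capabilities after creation.

```c
#define MODEL_SEND_INLINE 0x1  /* stand-in for IBV_SEND_INLINE */

struct model_send_plan {
    int send_flags;        /* flags to pass on each send */
    int register_send_mr;  /* nonzero if the send buffer must be registered */
};

/* Choose between inline sends and registered-memory sends based on
 * the inline capacity the device actually granted. */
static struct model_send_plan model_plan_send(unsigned int max_inline_data,
                                              unsigned int msg_len)
{
    struct model_send_plan plan = { 0, 0 };

    if (max_inline_data >= msg_len)
        plan.send_flags = MODEL_SEND_INLINE;
    else
        plan.register_send_mr = 1;
    return plan;
}
```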

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agordma_client: handle IBV_SEND_INLINE correctly
Doug Ledford [Wed, 18 Jun 2014 17:44:49 +0000 (10:44 -0700)]
rdma_client: handle IBV_SEND_INLINE correctly

Not all RDMA devices support IBV_SEND_INLINE.  At least some of those
that don't will ignore the flag passed to rdma_post_send and attempt to
send the command by using an sge entry instead.  Because we don't
register the send memory, this fails.  The proper way to deal with the
fact that IBV_SEND_INLINE is not guaranteed is to check the returned
value in our cap struct to see if we have support for inline data, and
if not, fall back to non-inline sends and to register the send memory
region.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agordma_server: use perror, unwind allocs on failure
Doug Ledford [Wed, 18 Jun 2014 17:44:28 +0000 (10:44 -0700)]
rdma_server: use perror, unwind allocs on failure

Our main test function prints out errno directly, which is hard to read
as it's not decoded at all.  Instead, use perror() to make failures more
readable.  Also redo the failure flow so that we can do a simple unwind
at the end of the function and just jump to the right unwind spot on
error.
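
The general shape of such an unwind flow is sketched here with ordinary libc resources instead of RDMA objects.  The function name and the resources are hypothetical; only the perror() plus goto pattern reflects the description.

```c
#include <stdio.h>
#include <stdlib.h>

/* Acquire two resources, report failures with perror(), and release
 * whatever was acquired by jumping to the matching unwind label. */
static int model_open_and_alloc(const char *path)
{
    char *buf;
    FILE *fp;
    int ret = -1;

    buf = malloc(4096);
    if (!buf) {
        perror("malloc");
        goto out;
    }

    fp = fopen(path, "r");
    if (!fp) {
        perror("fopen");
        goto out_free_buf;
    }

    /* Real work using buf and fp would go here. */
    ret = 0;

    fclose(fp);
out_free_buf:
    free(buf);
out:
    return ret;
}
```

Each failure point jumps to the label that releases only what has been acquired so far, so the cleanup code is written once and in reverse order of acquisition.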

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agordma_client: use perror, unwind allocs on failure
Doug Ledford [Wed, 18 Jun 2014 17:44:13 +0000 (10:44 -0700)]
rdma_client: use perror, unwind allocs on failure

Our main test function prints out errno directly, which is hard to read
as it's not decoded at all.  Instead, use perror() to make failures more
readable.  Also redo the failure flow so that we can do a simple unwind
at the end of the function and just jump to the right unwind spot on
error.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agocmtime: rework program to be multithread
Doug Ledford [Wed, 18 Jun 2014 17:43:04 +0000 (10:43 -0700)]
cmtime: rework program to be multithread

When testing with very large numbers of connections (10,000 was in use
here), resolving a performance problem in the kernel cma.c code exposed
a new problem.  With the underlying kernel issue resolved, 10,000
connect requests would flood the server side of the test, and the
cmtime application would respond as quickly as possible.  However, the
client side would not check any of the returns until after it had sent
all 10,000 connect requests.  While the kernel had a serializing
performance problem, this was OK.  Once it was fixed, this caused a
general slowdown in connect operations due to overruns in the event
processing.  This patch causes the client side to fire off threads
that handle responses to connect requests as they come in, instead of
allowing them to backlog uncontrollably.  Times for a 10,000 connect
run changed from this:

[root@rdma-dev-01 ~]# more 3.12.0-rc1.cached_gids+optimized_connect+trimmed_cache+.output
ib1:
step              total ms     max ms     min us  us / conn
create id    :       46.64       0.10       1.00       4.66
bind addr    :       89.61       0.04       7.00       8.96
resolve addr :       50.63      26.18   23976.00       5.06
resolve route:      565.44     538.77   26736.00      56.54
create qp    :     4028.31       5.70     326.00     402.83
connect      :    50077.42   49990.49   90734.00    5007.74
disconnect   :     5277.25    4850.35  380017.00     527.72
destroy      :       42.15       0.04       2.00       4.21

ib0:
step              total ms     max ms     min us  us / conn
create id    :       34.82       0.04       1.00       3.48
bind addr    :       25.94       0.02       1.00       2.59
resolve addr :       48.18      25.01   22779.00       4.82
resolve route:      501.28     476.26   25071.00      50.13
create qp    :     3274.12       6.05     257.00     327.41
connect      :    55549.64   55490.32   62150.00    5554.96
disconnect   :     5263.64    4851.18  375628.00     526.36
destroy      :       47.20       0.07       2.00       4.72

to this:

[root@rdma-dev-01 ~]# more 3.12.0-rc1.cached_gids+optimized_connect+trimmed_cache+-fixed-cmtime.output
ib1:
step              total ms     max ms     min us  us / conn
create id    :       34.45       0.08       1.00       3.44
bind addr    :       88.41       0.04       7.00       8.84
resolve addr :       33.59       4.65     612.00       3.36
resolve route:      618.68       0.61      97.00      61.87
create qp    :     4024.03       6.30     341.00     402.40
connect      :     6983.35    6886.33    8509.00     698.33
disconnect   :     5066.47     230.34     831.00     506.65
destroy      :       37.02       0.03       2.00       3.70

ib0:
step              total ms     max ms     min us  us / conn
create id    :       42.61       0.14       1.00       4.26
bind addr    :       27.05       0.03       2.00       2.70
resolve addr :       40.65      10.73     869.00       4.06
resolve route:      626.75       0.60     103.00      62.68
create qp    :     3334.50       6.48     273.00     333.45
connect      :     6310.29    6251.59   13298.00     631.03
disconnect   :     5111.12     365.87     867.00     511.11
destroy      :       36.57       0.02       2.00       3.66

with this patch.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agorsocket: Use malloc instead of calloc
Hal Rosenstock [Wed, 18 Jun 2014 16:55:06 +0000 (09:55 -0700)]
rsocket: Use malloc instead of calloc

There is no need to clear the allocated memory, because the
allocation is immediately followed by a memcpy that covers it.

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agolibrdmacm: Update rdma_accept man page
Sean Hefty [Tue, 27 May 2014 18:43:05 +0000 (11:43 -0700)]
librdmacm: Update rdma_accept man page

Document NULL conn_param parameter for rdma_accept.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agoindexer: Free index_map resources when cleared
Sean Hefty [Thu, 22 May 2014 23:13:08 +0000 (16:13 -0700)]
indexer: Free index_map resources when cleared

Free memory allocated for index map entries when they are no
longer in use.  To handle this, count the number of entries
stored by the index map item arrays and release the arrays when
no items are being tracked.

This reduces valgrind noise.
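
The counting scheme can be shown with a single-level model.  The struct and function names are hypothetical and this is not the indexer's actual layout; the sketch only demonstrates freeing the backing array once the last item is cleared.

```c
#include <stdlib.h>

#define MAP_MODEL_ENTRIES 8

/* Hypothetical single-array map with a usage count. */
struct map_model {
    void **array;  /* allocated on first use, freed when empty */
    int count;     /* number of non-NULL entries in array */
};

/* Store 'item' at 'index'.  Returns the index, or -1 on error. */
static int map_model_set(struct map_model *m, int index, void *item)
{
    if (index < 0 || index >= MAP_MODEL_ENTRIES || !item)
        return -1;
    if (!m->array) {
        m->array = calloc(MAP_MODEL_ENTRIES, sizeof(*m->array));
        if (!m->array)
            return -1;
    }
    if (!m->array[index])
        m->count++;
    m->array[index] = item;
    return index;
}

/* Clear 'index' and return the old item; release the array when the
 * last tracked item goes away. */
static void *map_model_clear(struct map_model *m, int index)
{
    void *item;

    if (index < 0 || index >= MAP_MODEL_ENTRIES || !m->array)
        return NULL;
    item = m->array[index];
    if (item) {
        m->array[index] = NULL;
        if (--m->count == 0) {
            free(m->array);
            m->array = NULL;
        }
    }
    return item;
}
```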

Problem reported by: Hannes Weisbach <hannes_weisbach@gmx.net>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agorstream: fix "-T resolve" detection
Patrick MacArthur [Wed, 30 Apr 2014 04:30:08 +0000 (21:30 -0700)]
rstream: fix "-T resolve" detection

Signed-off-by: Patrick MacArthur <pmacarth@iol.unh.edu>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agolibrdmacm: Fix verbs leak due to reentrancy issue
shamir rabinovitch [Wed, 30 Apr 2014 02:57:36 +0000 (19:57 -0700)]
librdmacm: Fix verbs leak due to reentrancy issue

Any call to ucma_init_device must be done under lock.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agorsocket: Relax requirement for minimal inline data
Sean Hefty [Thu, 17 Apr 2014 05:01:51 +0000 (22:01 -0700)]
rsocket: Relax requirement for minimal inline data

Inline data support is optional.  Allow rsockets to work
with devices that do not support inline data, provided
that they do support RDMA writes with immediate data.
This allows rsockets to work over Intel TrueScale HCA.

Patch derived from work by: Amir Hanania

Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agorsocket: Modify when control messages are available
Sean Hefty [Thu, 17 Apr 2014 05:33:38 +0000 (22:33 -0700)]
rsocket: Modify when control messages are available

Rsockets currently tracks how many control messages (i.e.
entries in the send queue) that are available using a
single ctrl_avail counter.  Seems simple enough.

However, control messages currently require the use of
inline data.  In order to support control messages that
do not use inline data, we need to associate each
control message with a specific data buffer.  This will
become easier to manage if we modify how we track when
control messages are available.

We replace the single ctrl_avail counter with two new
counters.  The new counters conceptually treat control
messages as if each message had its own sequence number.
The sequence number will then be able to correspond to
a specific data buffer in a follow up patch.

ctrl_seqno will be used to indicate the current control
message being sent.  ctrl_max_seqno will track the
highest control message that may be sent.

A side effect of this change is that we will be able to
see how many control messages have been sent.  This also
separates the updating of the control count on the
sending side, versus the receiving side.
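
One way to picture the two counters is the model below.  The field names follow the description, but the struct, the helper names, and the initial window are assumptions of this sketch rather than the patch itself.

```c
/* Hypothetical model of the two control-message counters. */
struct ctrl_model {
    unsigned int ctrl_seqno;      /* next control message to send */
    unsigned int ctrl_max_seqno;  /* highest control message allowed */
};

/* A control message may be sent while the two counters differ. */
static int ctrl_model_avail(const struct ctrl_model *c)
{
    return c->ctrl_seqno != c->ctrl_max_seqno;
}

/* Sending side: consume one sequence number if one is available. */
static int ctrl_model_send(struct ctrl_model *c)
{
    if (!ctrl_model_avail(c))
        return -1;
    c->ctrl_seqno++;
    return 0;
}

/* Completion side: a finished send opens up one more sequence number. */
static void ctrl_model_complete(struct ctrl_model *c)
{
    c->ctrl_max_seqno++;
}
```

In this model ctrl_seqno doubles as a running count of control messages sent, and the sequence number modulo the number of control slots could select a per-message data buffer.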

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agorsocket: Dedicate a fixed number of SQEs for control messages
Sean Hefty [Thu, 17 Apr 2014 15:37:47 +0000 (08:37 -0700)]
rsocket: Dedicate a fixed number of SQEs for control messages

The number of SQEs allocated for control messages is set
to 1 of 2 constant values (either 4 or 2).  A default
value is used unless the size of the SQ is below a certain
threshold (16 entries).  This results in additional code
complexity, and it is highly unlikely that the SQ would
ever be allocated smaller than 16 entries.

Simplify the code to use a single constant value for the
number of SQEs allocated for control messages.  This will
also help in subsequent patches that will need to deal
with HCAs that do not support inline data.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agorsocket: Check max inline data after creating QP
Sean Hefty [Thu, 17 Apr 2014 04:42:06 +0000 (21:42 -0700)]
rsocket: Check max inline data after creating QP

The ipath provider will ignore the max_inline_size
specified as input into ibv_create_qp and instead
return the size that it supports (which is 0) on
output.

Update the actual inline size returned from create QP,
and check that it meets the minimum requirement for
rsockets.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agolibrdmacm: Make ucma_init_all static
Sean Hefty [Wed, 30 Apr 2014 03:11:35 +0000 (20:11 -0700)]
librdmacm: Make ucma_init_all static

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agolibrdmacm: Support lazy initialization
Sean Hefty [Wed, 9 Apr 2014 19:19:25 +0000 (12:19 -0700)]
librdmacm: Support lazy initialization

librdmacm currently opens a device context per configured HCA. This is
usually done in rdma_create_event_channel() or the first time
ucma_init() is called. If a process is only going to use one of the
configured HCAs/RDMA IPs, then the remaining device contexts are not
used or required. Opening a device context on each device a priori limits
the maximum number of processes that can be supported on a node to the
maximum number of open contexts supported per HCA, regardless of the
number of HCAs present in the system.

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agorsocket: Fix sbuf_bytes_avail counter 'overrun' with iwarp
Sean Hefty [Thu, 6 Mar 2014 21:42:31 +0000 (13:42 -0800)]
rsocket: Fix sbuf_bytes_avail counter 'overrun' with iwarp

Reported-by: Jonas Pfefferle1 <JPF@zurich.ibm.com>
"The problem is that on the client side sbuf_bytes_avail overflows
in rs_poll_cq.  And from what I debugged so far there are 2
completions for every send and this is because I use iWarp hardware
which does not support write with immediate so there is one completion
for the write and one for the send (both go into the default case
and add the length to sbuf_bytes_avail)."

To avoid the issue, we flag send message operations that are used
in place of immediate data.  Other send message operations are
not affected.  The completion code can then check whether the
completion is for a send message which was paired with an RDMA
write transaction and adjust the behavior accordingly.  Additionally,
such send messages only carry the opcode in their WR_ID, with the
data portion zeroed.  This avoids adding the length value twice.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agoriostream: Add AF_IB support
Hal Rosenstock [Wed, 5 Mar 2014 20:51:54 +0000 (12:51 -0800)]
riostream: Add AF_IB support

Allow the user to specify GID addresses (AF_IB) with riostream

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agorsocket: Return EBADF on bad rsocket fd
Hal Rosenstock [Wed, 5 Mar 2014 01:06:47 +0000 (17:06 -0800)]
rsocket: Return EBADF on bad rsocket fd

Eliminates potential seg faults when passed an invalid rsocket.
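
The check amounts to validating the lookup result before dereferencing it.  The table and function names below are hypothetical; the sketch only shows the EBADF convention.

```c
#include <errno.h>
#include <stddef.h>

#define FD_MODEL_MAX 16

/* Hypothetical fd-to-object table; NULL means no rsocket at that fd. */
static void *fd_model_table[FD_MODEL_MAX];

static void *fd_model_lookup(int fd)
{
    return (fd >= 0 && fd < FD_MODEL_MAX) ? fd_model_table[fd] : NULL;
}

/* Fail with EBADF instead of dereferencing a missing rsocket. */
static int fd_model_op(int fd)
{
    void *rs = fd_model_lookup(fd);

    if (!rs) {
        errno = EBADF;
        return -1;
    }
    /* The real operation on rs would go here. */
    return 0;
}
```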

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agoman/rsocket: Enhance riomap documentation
Sean Hefty [Wed, 5 Mar 2014 00:59:20 +0000 (16:59 -0800)]
man/rsocket: Enhance riomap documentation

Document that the user must set IOMAPSIZE in order to
use the riomap call.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agolibrdmacm 1.0.18 v1.0.18
Sean Hefty [Mon, 27 Jan 2014 20:10:55 +0000 (12:10 -0800)]
librdmacm 1.0.18

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agoudaddy: Remove support for port space IB
Sean Hefty [Mon, 27 Jan 2014 19:30:34 +0000 (11:30 -0800)]
udaddy: Remove support for port space IB

UD support for the IB port space requires that the application
use rdma_create_ep, rather than rdma_create_id.  However, using
rdma_create_ep results in address and route resolution being
performed synchronously as part of the rdma_create_ep call.
Since udaddy is an example, we want to show how it can be used
with asynchronous events.  So, rather than update udaddy to
use rdma_create_ep in order to support the IB port space, it
would be better to remove that support.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agorsocket: Add keepalive logic
Susan K. Coulter [Fri, 17 Jan 2014 22:31:42 +0000 (14:31 -0800)]
rsocket: Add keepalive logic

Actually send and receive keepalive messages if keepalive is
enabled on an rsocket.

Signed-off-by: Susan K. Coulter <markus@cj-fe2.lanl.gov>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agolibrdmacm: Add directives on binding to IPv6 any address to man pages
Or Gerlitz [Wed, 4 Dec 2013 00:51:07 +0000 (16:51 -0800)]
librdmacm: Add directives on binding to IPv6 any address to man pages

Explain how to bind to IPv6 any address in the man pages for the examples

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agolibrdmacm: Check 'init' under mutex
Sean Hefty [Tue, 26 Nov 2013 21:16:19 +0000 (13:16 -0800)]
librdmacm: Check 'init' under mutex

ucma_ib_init() does a quick check that access to ibacm has
been initialized.  This check is done outside of the
acm_lock mutex.  We need to check init again while holding
the mutex to ensure that we don't run the initialization
code twice.
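
The pattern is a quick check followed by a second check under the lock.  The names below are hypothetical; note also that the unlocked read is kept only to mirror the description, and strictly portable code would use atomics or pthread_once() instead.

```c
#include <pthread.h>

static pthread_mutex_t init_model_lock = PTHREAD_MUTEX_INITIALIZER;
static int init_model_done;  /* set once initialization has completed */
static int init_model_runs;  /* counts how often the init body ran */

static void init_model_once(void)
{
    /* Quick check without the lock, as in the description. */
    if (init_model_done)
        return;

    pthread_mutex_lock(&init_model_lock);
    /* Check again while holding the mutex: another thread may have
     * finished initializing between the first check and the lock. */
    if (!init_model_done) {
        init_model_runs++;  /* the real initialization would go here */
        init_model_done = 1;
    }
    pthread_mutex_unlock(&init_model_lock);
}
```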

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agorping: Fix server reporting error on exit
Sean Hefty [Mon, 18 Nov 2013 21:12:04 +0000 (13:12 -0800)]
rping: Fix server reporting error on exit

Commit e57196c71ddd850e14f3e66355f02786e4914f72
rping: added checks to the return values functions
resulted in the rping server always reporting that
it failed.  Fix this by only failing in the case of
an unexpected termination, and not the result of
the client completing.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agoRetrieve SGID after calling rdma_bind_addr
Sean Hefty [Mon, 11 Nov 2013 18:24:54 +0000 (10:24 -0800)]
Retrieve SGID after calling rdma_bind_addr

A change was made to rdma_bind_addr when AF_IB is enabled
to only retrieve the resulting bound address.  Previously,
rdma_bind_addr would retrieve the corresponding SGID as
well.  This breaks some apps which were checking the
SGID after binding to an IP address.  Revert to the
previous behavior of also retrieving the SGID after
calling rdma_bind_addr.

Tested-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agolibrdmacm: Some fixes to man pages
Guy Shapiro [Tue, 5 Nov 2013 17:52:20 +0000 (19:52 +0200)]
librdmacm: Some fixes to man pages

Fix the man pages of rdma_destroy_ep & rdma_destroy_qp to the correct return value (void).

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years ago[librdmacm] Makefile.am: Add missing riostream man page to man_MANS
Hal Rosenstock [Mon, 4 Nov 2013 12:56:08 +0000 (07:56 -0500)]
[librdmacm] Makefile.am: Add missing riostream man page to man_MANS

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
6 years agorsockets: Handle race between rshutdown and rpoll
Sean Hefty [Fri, 16 Aug 2013 22:15:12 +0000 (15:15 -0700)]
rsockets: Handle race between rshutdown and rpoll

Multi-threaded applications which call rpoll and rshutdown
simultaneously can hang.  Ceph developers reported an issue
with the rsocket implementation.  Ceph calls rpoll in
one thread, and while that thread is blocked in rpoll,
a second thread may call rshutdown on the socket.  In
normal sockets, this results in the poll call unblocking
(since a call to read on the socket will no longer block).
However, rsockets does not free the thread blocked on the
rpoll call.

To fix this, we add some additional state checking to
protect against threads calling rpoll and rshutdown
simultaneously.  We also have the rshutdown call
transition the QP into an error state.  This causes all
posted receives to complete as flushed, which results
in unblocking the thread in rpoll (to process the flushed
receives).

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years ago[librdmacm] man/rstream.1: Update man page to be consistent with rstream -h
Hal Rosenstock [Wed, 11 Sep 2013 19:37:11 +0000 (15:37 -0400)]
[librdmacm] man/rstream.1: Update man page to be consistent with rstream -h

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years ago[librdmacm] rstream.c: Indicate when specified address family is unknown
Hal Rosenstock [Wed, 11 Sep 2013 18:44:32 +0000 (14:44 -0400)]
[librdmacm] rstream.c: Indicate when specified address family is unknown

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years ago[librdmacm] man/rdma_create_id.3: Add RDMA_PS_IB port space description
Hal Rosenstock [Wed, 11 Sep 2013 18:44:28 +0000 (14:44 -0400)]
[librdmacm] man/rdma_create_id.3: Add RDMA_PS_IB port space description

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agoexamples: Add cmtime to .gitignore
Yann Droneaud [Tue, 27 Aug 2013 18:37:54 +0000 (11:37 -0700)]
examples: Add cmtime to .gitignore

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agorsocket: Update rsocket man page
Sean Hefty [Thu, 22 Aug 2013 22:29:15 +0000 (15:29 -0700)]
rsocket: Update rsocket man page

Update fork support and RDMA_ROUTE socket option.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agocmtime: Add retry support for address and route resolution
Sean Hefty [Thu, 22 Aug 2013 19:00:54 +0000 (12:00 -0700)]
cmtime: Add retry support for address and route resolution

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agocmtime: Allow user to specify timeout values
Sean Hefty [Thu, 22 Aug 2013 18:54:56 +0000 (11:54 -0700)]
cmtime: Allow user to specify timeout values

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agocmtime: Add ability to time rdma_bind_addr calls
Sean Hefty [Thu, 22 Aug 2013 18:30:33 +0000 (11:30 -0700)]
cmtime: Add ability to time rdma_bind_addr calls

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agocmtime: Add example program that times rdma cm calls
Sean Hefty [Mon, 5 Aug 2013 17:57:43 +0000 (10:57 -0700)]
cmtime: Add example program that times rdma cm calls

cmtime is a new sample program that measures how long it
takes for each step in the connection process to complete.
It can be used to analyze the performance of the various
CM steps.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agorstream: Use rsocket option to set route directly
Sean Hefty [Fri, 26 Jul 2013 16:52:55 +0000 (09:52 -0700)]
rstream: Use rsocket option to set route directly

If we're using GID addressing, rdma_getaddrinfo can return
routing data directly.  Add an option for the user to
indicate that rdma_getaddrinfo should be called in place of
getaddrinfo.  And if routing data is available, call
rsetsockopt to set the route.

This helps test rsockets when ibacm and AF_IB support are
available.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agorsocket: Return 0 on success for SOL_RDMA options
Sean Hefty [Fri, 2 Aug 2013 21:18:06 +0000 (14:18 -0700)]
rsocket: Return 0 on success for SOL_RDMA options

The processing of SOL_RDMA does not set the return value in
the case of successfully handled options.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agorsockets: Add ability to set the IB route directly
Sean Hefty [Mon, 10 Jun 2013 19:33:20 +0000 (12:33 -0700)]
rsockets: Add ability to set the IB route directly

Add an RDMA specific rsocket option that allows the user
to program the RDMA route directly.  This is useful
for apps that have path record data available, e.g. from
ibacm.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agoexamples: Add support for native IB addressing to samples
Sean Hefty [Sun, 21 Jul 2013 02:22:55 +0000 (19:22 -0700)]
examples: Add support for native IB addressing to samples

Allow the user to specify GID addresses (AF_IB) into
udaddy and rstream.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agorsockets: Support native IB addressing on connected rsockets
Sean Hefty [Thu, 18 Jul 2013 20:26:15 +0000 (13:26 -0700)]
rsockets: Support native IB addressing on connected rsockets

Update rsockets to support AF_IB addresses on connected rsockets.
Support for datagram rsockets is more difficult as a result of
using real UDP sockets for QP resolution, so that support is
deferred.  For connected sockets, we need to update internal
checks to handle AF_IB.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years ago[4/4] Declare 'server_port' as an unsigned variable
Bart Van Assche [Sun, 28 Jul 2013 09:20:54 +0000 (11:20 +0200)]
[4/4] Declare 'server_port' as an unsigned variable

Change the data type of the 'server_port' variable from signed to
unsigned such that the cast in the fscanf() call can be removed.
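
A minimal sketch of why the cast becomes unnecessary, assuming the
variable ends up as an unsigned short read with "%hu".  parse_port()
is a hypothetical helper, and sscanf() is used in place of fscanf()
only so the example needs no input file.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parses a decimal port number from text.
 * Returns 0 on success, -1 if no number could be read.
 */
int parse_port(const char *text, unsigned short *port)
{
	assert(text && port);

	/* "%hu" expects a pointer to unsigned short, so no cast is needed. */
	return sscanf(text, "%hu", port) == 1 ? 0 : -1;
}
```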

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
7 years ago[3/4] rsocket: Remove the unused variable 'ret'
Bart Van Assche [Sun, 28 Jul 2013 09:19:48 +0000 (11:19 +0200)]
[3/4] rsocket: Remove the unused variable 'ret'

The variable 'ret' is assigned a value but that value is never used.
This triggers the following compiler warning:

src/rsocket.c:3720:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable]

Hence remove this variable.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
7 years ago[2/4] cma: Remove the unused variable 'id_priv'
Bart Van Assche [Sun, 28 Jul 2013 09:19:15 +0000 (11:19 +0200)]
[2/4] cma: Remove the unused variable 'id_priv'

The variable 'id_priv' is assigned a value but is never used.
This triggers the following compiler warning:

src/cma.c:1178:25: warning: variable 'id_priv' set but not used [-Wunused-but-set-variable]

Hence remove this variable.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
7 years ago[1/4] acm: Remove the unused variable 'pri_path'
Bart Van Assche [Sun, 28 Jul 2013 09:18:36 +0000 (11:18 +0200)]
[1/4] acm: Remove the unused variable 'pri_path'

The variable 'pri_path' is assigned a value but is never used.
This triggers the following compiler warning:

src/acm.c:301:26: warning: variable 'pri_path' set but not used [-Wunused-but-set-variable]

Hence remove this variable.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
7 years agoinit: Remove USE_IB_ACM configuration option
Sean Hefty [Mon, 10 Jun 2013 17:57:56 +0000 (10:57 -0700)]
init: Remove USE_IB_ACM configuration option

When the librdmacm is configured, it sets the USE_IB_ACM option
if infiniband/acm.h is found.  We can remove this option with
very little overhead, which would allow a user to install
ACM after installing the librdmacm, and the librdmacm would be
able to make use of ACM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agoacm: Define needed ACM protocol messages
Sean Hefty [Mon, 10 Jun 2013 18:07:12 +0000 (11:07 -0700)]
acm: Define needed ACM protocol messages

The librdmacm needs message definitions used to communicate
with the ibacm.  It currently pulls these from infiniband/acm.h,
which is installed by ibacm.  This creates an install order
dependency on ibacm.  However, work on the scalable SA has
the ibacm using the librdmacm (via rsockets) for communication
between the different SSA components.

To resolve this issue, have the librdmacm define the message
structures that it needs to communicate with ibacm.  The
librdmacm already defines some ACM messages through configuration
checks.  We just expand that capability, which isolates the librdmacm
package from the ibacm package.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agocmatose: Allow user to specify address format
Sean Hefty [Wed, 29 Aug 2012 22:02:54 +0000 (15:02 -0700)]
cmatose: Allow user to specify address format

Provide an option for the user to indicate the type of
addresses used as input.  Support hostname, IPv4, IPv6,
and GIDs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agoRemove executable mode bit on text files
Yann Droneaud [Tue, 16 Jul 2013 23:03:42 +0000 (16:03 -0700)]
Remove executable mode bit on text files

Source code and man page should not be executable.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agoOpen files with "close on exec" flag
Yann Droneaud [Tue, 16 Jul 2013 21:59:52 +0000 (23:59 +0200)]
Open files with "close on exec" flag

Files opened by librdmacm are not supposed to be inherited across
exec*(), most of the files are of no use for another program, and
others cannot be used without the associated memory mapping.

This patch changes fopen(), open() and socket() to always set the
close on exec flag.

This patch also adds checks to configure to guess if fopen() supports the
"e" flag. If O_CLOEXEC and SOCK_CLOEXEC are supported, fopen() should
support "e". If not supported, it is discarded according to POSIX. Many
operating systems have support for fopen("e").
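
The three calls named above can be sketched as follows.  This is an
illustration under assumptions, not the patch itself: the helper
names are hypothetical, AF_UNIX is chosen only to keep the example
self-contained, and the "e" mode and SOCK_CLOEXEC are assumed to be
available (as on Linux with glibc).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* expose O_CLOEXEC and SOCK_CLOEXEC on older libcs */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if it is clear, -1 on error. */
int fd_is_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	return flags < 0 ? -1 : !!(flags & FD_CLOEXEC);
}

/* open() with the flag set atomically at creation time. */
int open_cloexec(const char *path)
{
	assert(path);
	return open(path, O_RDONLY | O_CLOEXEC);
}

/* socket() with the flag OR-ed into the type argument. */
int socket_cloexec(void)
{
	return socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
}

/* fopen() with the "e" mode character requesting close on exec. */
FILE *fopen_cloexec(const char *path)
{
	assert(path);
	return fopen(path, "re");
}
```

Setting the flag at creation time avoids the window that a separate
fcntl(fd, F_SETFD, FD_CLOEXEC) call would leave open if another
thread calls fork() and exec*() in between.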

You might find more information about close on exec in the following articles:

- "Excuse me son, but your code is leaking !!!" by Dan Walsh
  http://danwalsh.livejournal.com/53603.html

- "Secure File Descriptor Handling" by Ulrich Drepper
  http://udrepper.livejournal.com/20407.html

Note: this patch won't set close on exec flag on file descriptors
created by the kernel for completion channel and such.
This is addressed by another kernel patch.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agoAdd .gitignore rules
Yann Droneaud [Tue, 16 Jul 2013 21:59:50 +0000 (23:59 +0200)]
Add .gitignore rules

Add the list of files/patterns to be excluded from git status output.
Additionally it will prevent such files/patterns from being added and committed.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agoconfigure: Use automake's option "subdir-objects"
Yann Droneaud [Tue, 16 Jul 2013 21:59:49 +0000 (23:59 +0200)]
configure: Use automake's option "subdir-objects"

Following advice in "Autotool Mythbuster" [1], option subdir-objects
can be used to have Makefiles create object files in the same
directory as their source files.

It reduces clobbering in the build directory.

[1] "Autotool Mythbuster", by Diego Elio "Flameeyes" Pettenò
http://www.flameeyes.eu/autotools-mythbuster/automake/nonrecursive.html

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agoconfigure: Apply updates proposed by autoupdate
Yann Droneaud [Tue, 16 Jul 2013 21:59:48 +0000 (23:59 +0200)]
configure: Apply updates proposed by autoupdate

'autoupdate' is a tool that helps developers update configure.ac.

This patch applies a few fixes as suggested by autoupdate.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agoautogen.sh: Use autoreconf in autogen.sh
Jeff Squyres [Tue, 16 Jul 2013 21:59:47 +0000 (23:59 +0200)]
autogen.sh: Use autoreconf in autogen.sh

The old sequence of Autotools commands listed in autogen.sh is no
longer correct.  Instead, just use the single "autoreconf" command,
which will invoke all the Right Autotools commands in the correct
order.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agoMakefile.am: Fix an automake warning
Bart Van Assche [Tue, 16 Jul 2013 21:59:46 +0000 (23:59 +0200)]
Makefile.am: Fix an automake warning

Fix the following automake warning message:

    Makefile.am:1: `INCLUDES' is the old name for `AM_CPPFLAGS' (or `*_CPPFLAGS')

A quote from the automake manual:

    INCLUDES
        This does the same job as AM_CPPFLAGS (or any per-target _CPPFLAGS variable
        if it is used). It is an older name for the same functionality. This
        variable is deprecated; we suggest using AM_CPPFLAGS and per-target
        _CPPFLAGS instead.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agoAdd "foreign" option to AM_INIT_AUTOMAKE
Bart Van Assche [Tue, 16 Jul 2013 21:59:45 +0000 (23:59 +0200)]
Add "foreign" option to AM_INIT_AUTOMAKE

Switch to the modern form of the AM_INIT_AUTOMAKE macro and tell
automake that the librdmacm package does not follow the GNU
standards. This change makes it possible to use 'autoreconf' for the
librdmacm package.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agolib: Rename configure.in to configure.ac
Sean Hefty [Thu, 2 May 2013 20:47:51 +0000 (13:47 -0700)]
lib: Rename configure.in to configure.ac

Update to latest autotools naming.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agorsocket: Add support for iWarp
Sean Hefty [Thu, 11 Apr 2013 17:05:29 +0000 (10:05 -0700)]
rsocket: Add support for iWarp

iWarp does not support RDMA writes with immediate data.
Instead of sending messages using immediate data, allow
the rsocket protocol to exchange messages using sends.

The rsocket protocol remains the same.  RDMA writes are
used for data transfers, with send messages used to transfer
rsocket protocol messages.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agorsocket: Merge usage of wr_id between stream and datagram svcs
Sean Hefty [Fri, 12 Apr 2013 21:41:52 +0000 (14:41 -0700)]
rsocket: Merge usage of wr_id between stream and datagram svcs

The rsocket data streaming and datagram services use different
formats for the wr_id.  Although some differences are needed,
we can make them more similar.  This will be useful when the
wr_id is used for iwarp support, plus eliminates use of wr_id
bits that aren't actually needed.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agolibrdmacm: Release 1.0.17 v1.0.17
Sean Hefty [Wed, 6 Mar 2013 01:18:11 +0000 (17:18 -0800)]
librdmacm: Release 1.0.17

7 years agolibrdmacm/rsocket: Fix resetting O_NONBLOCK after calling shutdown
Sean Hefty [Wed, 20 Feb 2013 04:03:58 +0000 (20:03 -0800)]
librdmacm/rsocket: Fix resetting O_NONBLOCK after calling shutdown

Shutdown switches an rsocket from nonblocking to blocking to
ensure that all data has been sent.  After completing all
transfers, it should switch back to nonblocking; this handles
partial shutdown situations, where only half the connection
is shut down.  However, the code uses the value of '1' to
set the nonblocking flag, rather than O_NONBLOCK.  Fix this.
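
The distinction can be illustrated with plain fcntl() on an ordinary
file descriptor.  This is a sketch, not the rsocket code:
set_nonblocking() and is_nonblocking() are hypothetical helpers.  On
Linux, for example, the value 1 is O_WRONLY, which F_SETFL ignores,
so passing 1 leaves the descriptor blocking.

```c
#include <assert.h>
#include <fcntl.h>

/*
 * Sets or clears O_NONBLOCK on fd while preserving the other
 * file status flags.  Returns 0 on success, -1 on error.
 */
int set_nonblocking(int fd, int enable)
{
	int flags;

	assert(fd >= 0);
	flags = fcntl(fd, F_GETFL);
	if (flags < 0)
		return -1;

	/* Use the O_NONBLOCK constant, never a literal such as 1. */
	flags = enable ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
	return fcntl(fd, F_SETFL, flags) < 0 ? -1 : 0;
}

/* Returns 1 if fd is nonblocking, 0 if blocking, -1 on error. */
int is_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	return flags < 0 ? -1 : !!(flags & O_NONBLOCK);
}
```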

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agolibrdmacm/rstream: Reduce default transfer count
Sean Hefty [Tue, 5 Feb 2013 00:52:18 +0000 (16:52 -0800)]
librdmacm/rstream: Reduce default transfer count

1 million ping-pong transfers takes over 3 seconds to
complete, and I'm impatient.  Reduce the default number of
transfers for small messages to speed up running
performance tests, especially when running over slower
connections, like TCP sockets or over a WAN.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agolibrdmacm: Work-around kernel bug returning uid = 0
Sean Hefty [Sat, 2 Feb 2013 01:17:34 +0000 (17:17 -0800)]
librdmacm: Work-around kernel bug returning uid = 0

Older kernels have a bug where they can report an event with the
uid set to 0.  The librdmacm crashes when casting the uid to
an rdma_cm_id and dereferencing the NULL pointer.

There are a limited number of events where this can occur and
in most cases it's safe to simply discard the event.  (This is
what the kernel does anyway.)  However, it's possible for us
to process an RDMA_CM_EVENT_ESTABLISHED event with the uid
set to 0.  (See kernel commit 418edaaba96e58112b15c82b4907084e2a9caf42.)

Although it's rare for this to occur, it does in fact happen
in practice.  To work-around the kernel bug, when the uid of an
established event is set to 0, we first try to locate the correct
user space id based on related data before discarding the event.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agolibrdmacm: Define ucma_ib_init when IB_ACM is disabled
Sean Hefty [Mon, 28 Jan 2013 22:56:25 +0000 (14:56 -0800)]
librdmacm: Define ucma_ib_init when IB_ACM is disabled

ucma_ib_init is only defined if IB_ACM is enabled, which is
determined by looking for the infiniband/acm.h header file.
Define ucma_ib_init when IB_ACM is disabled.

Problem reported by Suresh Shelvapille <suri@baymicrosystems.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agorsockets: Update rsocket man page
Sean Hefty [Mon, 21 Jan 2013 23:28:39 +0000 (15:28 -0800)]
rsockets: Update rsocket man page

Update man page to include recently added rsocket options
and undocumented configuration file.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agorsockets: Add support for existing UDP apps
Sean Hefty [Wed, 9 Jan 2013 22:54:47 +0000 (14:54 -0800)]
rsockets: Add support for existing UDP apps

Support for existing UDP applications is done via the rspreload
library.  However, when the preload library is loaded, socket
calls used by rsockets get intercepted and converted into
rsocket calls.

The preload library was able to handle this for TCP rsockets
by using a per thread variable and checking for recursive calls
coming from rsockets back into the preload library.  The preload
library would direct such calls to the real socket calls.
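
The per-thread check described above can be sketched like this.  It
is a simplified model with hypothetical names (in_library,
intercept_socket_call(), library_call()), not the rspreload code; it
only shows how a thread-local marker lets an interceptor recognize
calls that come back from the library on the same thread.

```c
#include <assert.h>

/* Nonzero while the current thread is executing inside the library. */
static _Thread_local int in_library;

enum route {
	ROUTE_REAL_SOCKET,	/* pass through to the real socket call */
	ROUTE_RSOCKET		/* convert into an rsocket call */
};

/* Interceptor decision: recursive calls go to the real socket call. */
enum route intercept_socket_call(void)
{
	return in_library ? ROUTE_REAL_SOCKET : ROUTE_RSOCKET;
}

/* Models a library routine that itself needs a real socket. */
enum route library_call(void)
{
	enum route r;

	in_library++;			/* mark this thread as inside the library */
	r = intercept_socket_call();	/* the recursive socket call */
	in_library--;
	assert(in_library >= 0);
	return r;
}
```

Because the marker is per thread, a socket call made from a separate
internal thread is not seen as recursive.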

The problem is more complex for UDP rsockets, which can invoke
socket calls from an internal rsocket thread.  The result is that
the preload library intercepts socket calls that originate from
the rsocket library which are not recursive.

Although this is really a problem with the preload library,
the simplest solution is for rsockets to fully initialize the
library when allocating the first rsocket, versus deferring
initialization until required.  The preload library can then
detect the recursive calls.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agoexamples/udpong: Add test program for rsocket datagrams
Sean Hefty [Wed, 5 Dec 2012 23:58:03 +0000 (15:58 -0800)]
examples/udpong: Add test program for rsocket datagrams

Add a sample test program to test datagram rsockets.  Move
common routines used by udpong and other test programs into
a common source file.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
7 years agorsocket: Add datagram support
Sean Hefty [Fri, 9 Nov 2012 18:26:38 +0000 (10:26 -0800)]
rsocket: Add datagram support

Add datagram support through the rsocket API.

Datagram support is handled through an entirely different protocol and
internal implementation than streaming sockets.  Unlike connected rsockets,
datagram rsockets are not necessarily bound to a network (IP) address.
A datagram socket may use any number of network (IP) addresses, including
those which map to different RDMA devices.  As a result, a single datagram
rsocket must support using multiple RDMA devices and ports, and a datagram
rsocket references a single UDP socket, plus zero or more UD QPs.

Rsockets uses headers inserted before user data sent over UDP sockets to
resolve remote UD QP numbers.  When a user first attempts to send a datagram
to a remote address (IP and UDP port), rsockets will take the following steps:

1. Store the destination address into a lookup table.
2. Resolve which local network address should be used when sending
   to the specified destination.
3. Allocate a UD QP on the RDMA device associated with the local address.
4. Send the user's datagram to the remote UDP socket.

A header is inserted before the user's datagram.  The header specifies the
UD QP number associated with the local network address (IP and UDP port) of
the send.
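
A rough sketch of prepending such a header to a datagram.  The
layout here (a single 32-bit QP number in network byte order) is an
assumption made for illustration; the actual rsocket header format
is not given in this message, and struct dgram_hdr, dgram_pack() and
dgram_unpack() are hypothetical names.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire header; the real rsocket layout may differ. */
struct dgram_hdr {
	uint32_t qpn;	/* UD QP number, network byte order */
};

/*
 * Writes header plus payload into buf.
 * Returns the total length, or 0 if buf is too small.
 */
size_t dgram_pack(uint8_t *buf, size_t buflen, uint32_t qpn,
		  const void *data, size_t len)
{
	struct dgram_hdr hdr = { .qpn = htonl(qpn) };

	assert(buf && (data || len == 0));
	if (buflen < sizeof(hdr) || len > buflen - sizeof(hdr))
		return 0;

	memcpy(buf, &hdr, sizeof(hdr));
	if (len)
		memcpy(buf + sizeof(hdr), data, len);
	return sizeof(hdr) + len;
}

/*
 * Extracts the QP number and locates the payload.
 * Returns a pointer to the payload, or NULL if buf is too short.
 */
const uint8_t *dgram_unpack(const uint8_t *buf, size_t buflen,
			    uint32_t *qpn, size_t *len)
{
	struct dgram_hdr hdr;

	assert(buf && qpn && len);
	if (buflen < sizeof(hdr))
		return NULL;

	memcpy(&hdr, buf, sizeof(hdr));
	*qpn = ntohl(hdr.qpn);
	*len = buflen - sizeof(hdr);
	return buf + sizeof(hdr);
}
```

The receiver would record the extracted QP number in its lookup
table for the sender's address, then hand the payload to the user.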

A service thread is used to process messages received on the UDP socket.  This
thread updates the rsocket lookup tables with the remote QPN and path record
data.  The service thread forwards data received on the UDP socket to an
rsocket QP.  After the remote QPN and path records have been resolved, datagram
communication between two nodes is done over the UD QP.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>