[Spread-users] Increasing MAX_GROUP_NAME leads to message header corruption?

Sebastian Wrede swrede at techfak.uni-bielefeld.de
Mon Apr 28 07:01:54 EDT 2008


Hi,

while trying to figure out the cause of the group name problem I
described in my previous mail, I ran into another phenomenon that leads
to a segmentation fault in the Spread daemon. I can reproduce it with
the following steps:

     1. start a Java listener that joins a group whose name is a
        32-character MD5 hash
     2. start a Java informer that sends a message to several such groups
        (including the one the listener joined previously)
     3. nothing happens (no message is delivered to the listener)
     4. restarting the Java listener leads to a segmentation fault in the
        Spread daemon

All this happens without any modification of the group name length
limits. Below are the log excerpts that I assume to be relevant.

Is this behavior "normal", or is it possibly a bug: does the Java lib
send the full 32 chars of a group name to the Spread daemon for
registration even though the daemon cannot process a name of that
length? Or am I missing something obvious?

If I reduce the group name length in Java by one, to 31 characters,
everything works perfectly.
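
This observation is consistent with an off-by-one on the terminating
zero byte. The following is only a minimal C sketch, assuming that group
names are held in fixed char[MAX_GROUP_NAME] fields and are treated as
zero-terminated strings. The helper names group_name_fits and
field_name_len are made up for illustration and are not part of the
Spread API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Default limit mentioned in the mail; assumed here to include the '\0'. */
#define MAX_GROUP_NAME 32

/* Hypothetical helper: true if the name AND its terminating '\0'
 * fit into a field of MAX_GROUP_NAME bytes. */
static bool group_name_fits(const char *name)
{
    assert(name != NULL);
    return strlen(name) < MAX_GROUP_NAME;
}

/* Hypothetical helper: length of a name stored in a fixed-size field
 * that may lack a terminator. Never reads past the field. */
static size_t field_name_len(const char field[MAX_GROUP_NAME])
{
    assert(field != NULL);
    const char *end = memchr(field, '\0', MAX_GROUP_NAME);
    return end ? (size_t)(end - field) : (size_t)MAX_GROUP_NAME;
}
```

Under these assumptions, group_name_fits returns false for the
32-character MD5 name from the logs and true once the name is shortened
to 31 characters. field_name_len stays within the field even when no
terminator is present, whereas a plain string read of such a field may
run on into whatever data follows it.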

Best regards,

Sebastian

Logs:
// 1. Joining MD5 group from Java client (with MAX_GROUP_NAME==32)
G_handle_join: #m-118638#localhost joins group 937d8b7012763ca9d7d0756afe285525
G_handle_join in GOP
new: creating pointer 0x81ad0d8 to object type 31 named group
G_handle_join: New group added with group id:
G_handle_join: Group_id {Proc ID: 127.0.1.1, Time: 1209375380, Index: 0}

// 2. Sending Message from Java to some (here 6) MD5 groups
Sess_read: queueing message of type 2 with len 0 to the protocol
Handle_hurry: sending token now
new: reusing pointer 0x81ad8c8 to object type 35 named time_event
dispose: disposing pointer 0x81abe58 to object type 35 named time_event
E_queue: dequeued a (first) simillar event
E_queue: (first) event queued func 0x8049f20 code 0 data 0x0 in future (2:0)
DL_send: sent a message of 32 bytes to (127.0.1.1,4804) on channel 5
Prot_token_hurry: retransmiting token 7 1
E_handle_events: next event 
E_handle_events: poll select
E_handle_events: exec handler for fd 4, fd_type 0, priority 1
DL_recv: received 32 bytes on channel 4
Received Token
new: reusing pointer 0x81ad058 to object type 2 named pack_head_obj
dispose: disposing pointer 0x81abea8 to object type 20 named scatter
dispose: disposing pointer 0x816e990 to object type 27 named down_link
Send_new_packets: packet 5 sent and inserted 
Net_flush_bcast: Flushing with Queued_bytes = 493; num_elements in scat = 2; size of scat0,1 = 36 457
DL_send: sent a message of 32 bytes to (127.0.1.1,4804) on channel 5
new: reusing pointer 0x81abe58 to object type 35 named time_event
dispose: disposing pointer 0x81ad8c8 to object type 35 named time_event
E_queue: dequeued a (first) simillar event
E_queue: (first) event queued func 0x8049f20 code 0 data 0x0 in future (2:0)
new: reusing pointer 0x81ad8c8 to object type 35 named time_event
dispose: disposing pointer 0x81abe80 to object type 35 named time_event
E_queue: dequeued a simillar event

// 3. a) Restarted listener process tries to join MD5 groups, which leads to the segfault
G_handle_join: #m-926480#localhost joins group 937d8b7012763ca9d7d0756afe2855250918d71640504079870a7153360e2b518793e0a896faac8318f8a745d7d3be93937d8b7012763ca9d7d0756afe285525d9f9684cd89e977a0a8a527ae00adc424a6e2cc2383ffad1bbaafc59e078d9b1<?xml version="1.0"?>< [...remaining data of previous call to multicast from different process]
G_handle_join in GOP
new: reusing pointer 0x81ad0d8 to object type 31 named group

// 3. b) Stack trace of the Spread daemon's segmentation fault as displayed on the terminal
Successfully configured Segment 0 [127.0.0.255:4803] with 1 procs:
                   localhost: 127.0.1.1
Set Alarm mask to: ffffffff
        *** glibc detected *** /vol/xcf/sbin/spread: malloc(): memory corruption (fast): 0x081ad218 ***
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6[0xb7dd1c42]
/lib/tls/i686/cmov/libc.so.6(__libc_malloc+0x90)[0xb7dd2fc0]
/vol/xcf/sbin/spread[0x8065e31]
/vol/xcf/sbin/spread[0x8066729]
/vol/xcf/sbin/spread[0x8053c2b]
/vol/xcf/sbin/spread[0x804d29b]
/vol/xcf/sbin/spread[0x804a2d9]
/vol/xcf/sbin/spread[0x804a9a7]
/vol/xcf/sbin/spread[0x804ba47]
/vol/xcf/sbin/spread[0x8055bd9]
/vol/xcf/sbin/spread[0x8049c8a]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe0)[0xb7d7d050]
/vol/xcf/sbin/spread[0x8049551]
======= Memory map: ========
08048000-08077000 r-xp 00000000 08:02 1406630    /vol/xcf/sbin/spread
08077000-08078000 rw-p 0002e000 08:02 1406630    /vol/xcf/sbin/spread
08078000-081cc000 rw-p 08078000 00:00 0          [heap]
b7c00000-b7c21000 rw-p b7c00000 00:00 0 
b7c21000-b7d00000 ---p b7c21000 00:00 0 
b7d50000-b7d5a000 r-xp 00000000 08:02 959110     /lib/libgcc_s.so.1
b7d5a000-b7d5b000 rw-p 0000a000 08:02 959110     /lib/libgcc_s.so.1
b7d5b000-b7d64000 r-xp 00000000 08:02 991630     /lib/tls/i686/cmov/libnss_files-2.6.1.so
b7d64000-b7d66000 rw-p 00008000 08:02 991630     /lib/tls/i686/cmov/libnss_files-2.6.1.so
b7d66000-b7d67000 rw-p b7d66000 00:00 0 
b7d67000-b7eab000 r-xp 00000000 08:02 991620     /lib/tls/i686/cmov/libc-2.6.1.so
b7eab000-b7eac000 r--p 00143000 08:02 991620     /lib/tls/i686/cmov/libc-2.6.1.so
b7eac000-b7eae000 rw-p 00144000 08:02 991620     /lib/tls/i686/cmov/libc-2.6.1.so
b7eae000-b7eb2000 rw-p b7eae000 00:00 0 
b7eb2000-b7ec6000 r-xp 00000000 08:02 991626     /lib/tls/i686/cmov/libnsl-2.6.1.so
b7ec6000-b7ec8000 rw-p 00013000 08:02 991626     /lib/tls/i686/cmov/libnsl-2.6.1.so
b7ec8000-b7eca000 rw-p b7ec8000 00:00 0 
b7eca000-b7eed000 r-xp 00000000 08:02 991624     /lib/tls/i686/cmov/libm-2.6.1.so
b7eed000-b7eef000 rw-p 00023000 08:02 991624     /lib/tls/i686/cmov/libm-2.6.1.so
b7f01000-b7f03000 rw-p b7f01000 00:00 0 
b7f03000-b7f1d000 r-xp 00000000 08:02 959118     /lib/ld-2.6.1.so
b7f1d000-b7f1f000 rw-p 00019000 08:02 959118     /lib/ld-2.6.1.so
bff2b000-bff41000 rw-p bff2b000 00:00 0          [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]
Aborted (core dumped)



On Sunday, 27.04.2008, at 18:11 +0200, Sebastian Wrede wrote:
> Hi,
> 
> in order to allow for URIs that map to group names with a length that is
> significantly longer than 32 characters, we encode these as MD5 hashes. 
> 
> In order to do so, we increased the value of MAX_GROUP_NAME by 1 to 33
> characters, because the default of 32 chars seems to include the
> terminating zero byte. Since spuser now allows us to join, send to and
> receive messages from an "MD5" group, which it did not before this
> modification, I assume that the zero termination assumption is
> correct, isn't it?
> 
> The issue now is that once we also change the group name length in
> SpreadConnection.java to 33 characters, it is no longer possible to
> send any messages from a Java client to the Spread daemon. As soon as
> the message is written to the socket, an exception (broken pipe) is
> thrown. The Spread daemon warns about an "Invalid Hint" message field,
> probably indicating that the message protocol is broken.
> 
> I've already scanned the mail archives for similar problems, but could
> not find an appropriate answer. It would be great if anyone could point
> us in the right direction to solve this problem.
> 
> BTW: We use spread-4.0.0 on an Ubuntu 7.10 Linux distribution,
> configured for testing purposes with a segment containing only the
> local host. We applied the source changes in /include/sp.h,
> /daemon/spread_params.h and SpreadConnection.java.
> 
> 
> Best regards,
> 
> Sebastian
> 
> 




