From jschultz at spreadconcepts.com Fri Feb 3 20:21:27 2012
From: jschultz at spreadconcepts.com (John Schultz)
Date: Fri, 3 Feb 2012 20:21:27 -0500
Subject: [Spread-users] sporadic latencies with SP_receive
In-Reply-To: <4F26D349.2060602@techfak.uni-bielefeld.de>
References: <4E2D57C5.3040201@techfak.uni-bielefeld.de> <02AF4C40-A45C-483D-980E-8C2A609769FC@spreadconcepts.com> <4F26D349.2060602@techfak.uni-bielefeld.de>
Message-ID: <813DC5CF-8B30-4408-A880-C0491481F78C@spreadconcepts.com>

Johannes,

In the new version of Spread that we are currently developing, we believe we've solved this issue. Spread will now use the old badger timeout semantics and also monitor client sockets for writability. So the issue you saw should no longer exist in the next version of Spread, which is due out soon.

Cheers!

-----
John Lane Schultz
Spread Concepts LLC
Phn: 301 830 8100
Cell: 443 838 2200

On Jan 30, 2012, at 12:28 PM, Johannes Wienke wrote:

Hey,

On 01/09/2012 10:04 PM, John Schultz wrote:
> I've looked into this report (thanks for the demonstration app!) and
> have figured out both complaints.
>
> The first complaint is that a receiver can sometimes see latencies of
> up to 100ms from SP_receive even when lots of traffic has been sent
> to them. [...]

Thanks, the proposed workaround already saved our lives once. Of course we would be interested in a real solution for this. ;)

> The second complaint was that you got a stack corruption bug when you
> passed a privateGroup array of only MAX_PRIVATE_NAME characters.
> This is expected as the privateGroup array is expected to be
> MAX_GROUP_NAME characters long.

Ok, thanks.

Kind regards,
Johannes

_______________________________________________
Spread-users mailing list
Spread-users at lists.spread.org
http://lists.spread.org/mailman/listinfo/spread-users

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 3805 bytes
Desc: not available
Url : http://lists.spread.org/pipermail/spread-users/attachments/20120203/0b0a511f/attachment.bin

From gorhas at gmail.com Sat Feb 4 01:07:20 2012
From: gorhas at gmail.com (=?UTF-8?B?R8O2cmFuIEhhc3Nl?=)
Date: Sat, 4 Feb 2012 07:07:20 +0100
Subject: [Spread-users] usage of gethostname
Message-ID:

I gathered that there is a new release in the pipeline and want to discuss a small issue.

In configuration.c there is a usage of gethostname:

    /* Match my IP address to entry in configuration file */
    if( my_name == NULL ){
        gethostname(machine_name,sizeof(machine_name));
        host_ptr = gethostbyname(machine_name);

But I will argue that the hostname has nothing to do with *interface names*.

Line of reasoning... (There are so many names around in a Unix system!)

First we have the KERNELNAME! This is the currently compiled kernel running in the system. We get this with uname -a, which typically gives us key information on the kernel:

    Linux fs487 2.6.38-13-generic #53-Ubuntu SMP Mon Nov 28 19:33:45 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

Then we have INTERFACE NAMES! A Unix system could have a lot of interfaces:

    eth0
    eth1
    eth2

We give them IP numbers:

    ifconfig eth0 inet 123.123.123.123 ...
    ifconfig eth0:1 inet 234.234.234.234 ...
    ifconfig eth1 inet 123.123.123.123 ....

And so on! To handle this long list of interface IDs we make a database (DNS) or, for convenience, a small database in /etc/hosts. In this database we hold short aliases for the interfaces.

    /etc/hosts
    123.123.123.123 road.to.perdition.org perdition
    234.234.234.234 road.to.paradise.org paradise

Then we have the HOSTNAME. This is the name of the *hardware* on which we are running. This name should NOT! be used in any way in association with the interface names. I know that this is often so - but it should not.

So I argue that the function *getaddrinfo* should be used instead.
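
For readers who have not used getaddrinfo, here is a minimal sketch of the kind of lookup being proposed. It is not taken from Spread's configuration.c: the helper name resolve_ipv4 is hypothetical, restricting the lookup to IPv4 is an assumption, and which name should be passed in (the output of gethostname or a name taken from the configuration file) is exactly the open question raised above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose getaddrinfo() under strict -std modes */
#endif

#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper, not part of Spread: resolve `name` to its first
 * IPv4 address and write the dotted-quad text into `out`.
 * Returns 0 on success, or a getaddrinfo() error code (EAI_*) on failure. */
int resolve_ipv4(const char *name, char *out, size_t out_len)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;       /* assumption: IPv4 only */
    hints.ai_socktype = SOCK_DGRAM;  /* one result per address, not one per socket type */

    rc = getaddrinfo(name, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    /* With ai_family = AF_INET every result carries a struct sockaddr_in. */
    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    if (inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)out_len) == NULL)
        rc = EAI_SYSTEM;

    freeaddrinfo(res);
    return rc;
}
```

A caller would pass a buffer of at least INET_ADDRSTRLEN bytes and could turn a non-zero return value into a readable message with gai_strerror().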
Note that the manpage of this function also mixes the terms "node" and "host" in a confusing manner.

Cordially yours,
Göran Hasse

--
gorhas at gmail.com
Göran Hasse
Boo 229
715 91 ODENSBACKEN
Mob: 070-5530148

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.spread.org/pipermail/spread-users/attachments/20120204/f90b9d5b/attachment.html

From hisham at gobolinux.org Thu Feb 2 14:57:50 2012
From: hisham at gobolinux.org (Hisham H M)
Date: Thu, 2 Feb 2012 17:57:50 -0200
Subject: [Spread-users] [PATCH] pkgconfig file for libspread
Message-ID:

Hello,

I'm just getting started with Spread and I'm enjoying it so far. As I approached integrating the client library with my autoconf scripts, I found it simpler to add a .pc file to the Spread distribution than to dive into the autoconf macros for library search.

The attached patch adds a libspread.pc to the Spread distribution, which is generated at build time and installed along with libspread. I put it in the libspread directory since I figured that, for installations that build only the client library, I could do "make -C libspread install" instead of installing the whole thing. This way I get only the libraries, headers and the .pc file installed.

With this patch, using the client library in other autoconf-based projects can be done somewhat like this:

#------------------------------------
AC_ARG_WITH([libspread],
  [ --with-libspread=PREFIX Location of libspread, the Spread client library],
  [PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$with_libspread/lib/pkgconfig],[])
export PKG_CONFIG_PATH
LIBSPREADFOUND=no
PKG_CHECK_MODULES(LIBSPREAD,libspread >= 4.1,LIBSPREADFOUND=yes)
if ! test x$LIBSPREADFOUND = xyes; then
    echo
    echo "Failed."
    echo "Libspread and its devel headers are required."
    exit 1
fi
#------------------------------------

Thanks,

-- Hisham

ps: I'm not currently subscribed to the list, so please cc: me on any replies.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: spread-4.1.0-pkgconfig_for_libspread.diff
Type: text/x-patch
Size: 2148 bytes
Desc: not available
Url : http://lists.spread.org/pipermail/spread-users/attachments/20120202/9788aa33/attachment.bin

From Andreas.Koehler at cassidian.com Fri Feb 10 05:11:30 2012
From: Andreas.Koehler at cassidian.com (Koehler, Andreas)
Date: Fri, 10 Feb 2012 11:11:30 +0100
Subject: [Spread-users] network priority
Message-ID: <4AF8778B5747C44783C6D046C9819B6C021B86F5@GSX703A.mxchg.m.corp>

Hi,

is there a way to configure a higher priority for Spread's network packets (e.g. as specified in RFC 2597 http://tools.ietf.org/html/rfc2597 and/or RFC 3260 http://tools.ietf.org/html/rfc3260)?

Thanks,
Andreas

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.spread.org/pipermail/spread-users/attachments/20120210/6fff4715/attachment.html

From matthew.garman at gmail.com Wed Feb 22 13:40:10 2012
From: matthew.garman at gmail.com (Matt Garman)
Date: Wed, 22 Feb 2012 12:40:10 -0600
Subject: [Spread-users] Send_new_packets: created packet 203 already exist 2
Message-ID:

Hi,

I asked about this back in May 2008 [1], but we never really came to any resolution.

As a refresher, we're getting regular spread daemon crashes (the problem went away for a while, but has recently become a very regular occurrence, as in several times/day). We're using spread version 4.00.00, self-compiled on CentOS 5.6.
The log leading up to the crash looks like this:

[Wed 22 Feb 2012 12:04:58] Prot_handle_token: BUG WORKAROUND: Too many rounds in EVS state; swallowing token; state:
[Wed 22 Feb 2012 12:04:58] Aru: 241
[Wed 22 Feb 2012 12:04:58] My_aru: 241
[Wed 22 Feb 2012 12:04:58] Highest_seq: 200
[Wed 22 Feb 2012 12:04:58] Highest_fifo_seq: 103
[Wed 22 Feb 2012 12:04:58] Last_discarded: 0
[Wed 22 Feb 2012 12:04:58] Last_delivered: 241
[Wed 22 Feb 2012 12:04:58] Last_seq: 3533
[Wed 22 Feb 2012 12:04:58] Token_rounds: 501
[Wed 22 Feb 2012 12:04:58] Last Token:
[Wed 22 Feb 2012 12:04:58] type: 0x80040080
[Wed 22 Feb 2012 12:04:58] transmiter_id: -1407973572
[Wed 22 Feb 2012 12:04:58] seq: 0
[Wed 22 Feb 2012 12:04:58] proc_id: -1407973572
[Wed 22 Feb 2012 12:04:58] aru: 241
[Wed 22 Feb 2012 12:04:58] aru_last_id: -1407973572
[Wed 22 Feb 2012 12:04:58] flow_control: 0
[Wed 22 Feb 2012 12:04:58] rtr_len: 0
[Wed 22 Feb 2012 12:04:58] conf_hash: 1007608523
Membership id is ( -1407973572, 1329934005)
[Wed 22 Feb 2012 12:04:58] --------------------
[Wed 22 Feb 2012 12:04:58] Configuration at lnxsvr1 is:
[Wed 22 Feb 2012 12:04:58] Num Segments 1
[Wed 22 Feb 2012 12:04:58] 4 172.20.7.63 4803
[Wed 22 Feb 2012 12:04:58] lnxsvr1 172.20.7.60
[Wed 22 Feb 2012 12:04:58] lnxsvr2 172.20.7.61
[Wed 22 Feb 2012 12:04:58] lnxsvr6 172.20.7.62
[Wed 22 Feb 2012 12:04:58] lnxsvr5 172.20.7.58
[Wed 22 Feb 2012 12:04:58] ====================
[Wed 22 Feb 2012 12:04:58] Send_new_packets: created packet 203 already exist 2
Exit caused by Alarm(EXIT)

Any thoughts?

Thanks,
Matt

[1] http://lists.spread.org/pipermail/spread-users/2008-May/003824.html

From jschultz at spreadconcepts.com Wed Feb 22 14:04:28 2012
From: jschultz at spreadconcepts.com (John Schultz)
Date: Wed, 22 Feb 2012 14:04:28 -0500
Subject: [Spread-users] Send_new_packets: created packet 203 already exist 2
In-Reply-To:
References:
Message-ID:

Hi Matt,

These kinds of issues have been lingering for some time now as you know.
We suspect that the internal state of the daemons is somehow being corrupted or reaching an illegal state through a bug of some sort. This issue eventually manifests itself when the daemon(s) later detect some invariant being violated and commit suicide. It shows up in different places for different people at different times, but the underlying cause is likely the same for many of these kinds of reports. It seems to occur more often for people who have "flaky" networks (e.g., higher loss than normal, asymmetric communication, etc.) and high message rates.

Unfortunately, using something like valgrind to try to catch invalid memory accesses is unlikely to help, because the overhead is too high and pushes the performance too far away from native running behavior (i.e., it is a Heisenbug). We believe the only way we will ultimately squash this bug (or bugs) is to add more internal validity checking and internal state dumps, and/or to do code reviews to try to spot them.

We are currently wrapping up a release candidate for the next version of Spread (4.2). After that release is officially out, we intend to turn our attention to this issue to try to nail down what is happening here.

Since you are regularly running into this issue, and it has proven very hard for us to reproduce in our test environment, it might be helpful if we could deploy test versions into your environment. Would you be open to that kind of arrangement?

Cheers!

-----
John Lane Schultz
Spread Concepts LLC
Phn: 301 830 8100
Cell: 443 838 2200

On Feb 22, 2012, at 1:40 PM, Matt Garman wrote:

Hi,

I asked about this back in May, 2008 [1], but never really came to any resolution.

As a refresher, we're getting regular spread daemon crashes (it went away for a while, but has recently become a very regular occurrence, as in several times/day). We're using spread version 4.00.00, self-compiled on CentOS 5.6.