[Spread-users] spread connection error

John Lane Schultz jschultz at spreadconcepts.com
Tue May 23 11:27:51 EDT 2017


Hi Emmanuel,

I took a quick look at the logs and monitor output, and I think there is something wrong with the networking.  In the monitor outputs, the segment retransmissions by GTC_T_PE1_PC3 keep climbing quickly, but no messages are being delivered.

A segment retransmission happens when multiple daemons in the same segment miss a message.  What I think is happening is that broadcast in your second segment isn’t working.  So, when any daemon in that segment tries to broadcast a message, the other two daemons in that segment don’t hear it.

I assume you are sending through GTC_T_PE1_PC3 and/or GTC_T_PA1_MAI? 

Those two daemons will receive each other’s messages because they send them to each other via unicast.  GTC_T_PA1_MAI then broadcasts them for the rest of its segment, but the other daemons in that segment never hear the broadcast.  They both mark the messages as missed, which causes a segment retransmission.  The problem is that the retransmission is also broadcast from GTC_T_PA1_MAI, so they never hear that either.  Round and round it goes: they are never able to recover the messages, and progress halts.

It only truly breaks when you add the third daemon to the segment because with only two daemons in the segment they will use unicast recovery to recover the messages, which does work.
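
Since, per the explanation above, recovery only starts to depend on broadcast once a segment has three or more daemons, it can help to list which configured segments fall into that category. What follows is a minimal Python sketch, assuming the simple "name address" layout used in the spread.conf files in this thread. The function name and the regular expression are illustrative and not part of Spread, and the sketch counts configured daemons, not the ones actually running.

```python
import re

# Matches "Spread_Segment <addr> { ... }" with no nested braces (an assumption
# that holds for the simple configs in this thread).
SEGMENT_RE = re.compile(r"Spread_Segment\s+(\S+)\s*\{([^}]*)\}")


def segments_needing_broadcast(conf_text):
    """Return [(segment_addr, [daemon names])] for segments with 3+ daemons."""
    # Drop comments first so commented-out daemons and segments are ignored.
    text = "\n".join(line.split("#", 1)[0] for line in conf_text.splitlines())
    result = []
    for addr, body in SEGMENT_RE.findall(text):
        # The first token on each non-empty line is taken as the daemon name.
        names = [line.split()[0] for line in body.splitlines() if line.split()]
        if len(names) >= 3:
            result.append((addr, names))
    return result
```

Calling segments_needing_broadcast(open("spread.conf").read()) returns the segments where a broken broadcast path would be expected to stall recovery as described above.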

I don’t know whether the cause is a change we made to the broadcast networking code that broke it on Windows, a change in how Windows handles broadcast, incorrect network addresses in your configuration, or a firewall somewhere blocking broadcast.

I also don’t know why GTC_T_PE1_PC3 would complain about “name not unique” on the absolute FIRST client connection to Spread.  Once it gets into the above condition, it may block input from clients (as it isn’t making progress), but then I’d expect your client connections to be refused, time out, or hang in some other manner.

You can try to build the spsend.exe and sprecv.exe programs and see if you can get broadcast or multicast to work in your segments.
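
As a rough, independent check in the same spirit, here is a minimal Python sketch that sends numbered UDP datagrams to a segment's broadcast address and counts what arrives on the other machines. The payload format, the function names, and the test port 4899 are assumptions made for illustration and are not part of Spread. The port is deliberately different from 4803 so the probe does not collide with a running daemon.

```python
import socket
import struct

PROBE_MAGIC = b"BCST"   # hypothetical payload tag, not a Spread format
PROBE_PORT = 4899       # assumed free test port (not Spread's 4803)


def make_probe(seq: int) -> bytes:
    """Build a probe datagram: 4-byte tag + 32-bit sequence number."""
    return PROBE_MAGIC + struct.pack("!I", seq)


def parse_probe(data: bytes):
    """Return the sequence number, or None if this is not one of our probes."""
    if len(data) != 8 or data[:4] != PROBE_MAGIC:
        return None
    return struct.unpack("!I", data[4:])[0]


def send_probes(broadcast_addr: str, count: int = 10, port: int = PROBE_PORT) -> None:
    """Send `count` numbered probes to the segment broadcast address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Without SO_BROADCAST the OS refuses to send to a broadcast address.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        for seq in range(count):
            sock.sendto(make_probe(seq), (broadcast_addr, port))


def receive_probes(timeout: float = 10.0, port: int = PROBE_PORT) -> list:
    """Collect (sender_ip, seq) pairs until `timeout` seconds pass with no probe."""
    seen = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", port))
        sock.settimeout(timeout)
        try:
            while True:
                data, (ip, _port) = sock.recvfrom(64)
                seq = parse_probe(data)
                if seq is not None:
                    seen.append((ip, seq))
        except socket.timeout:
            pass
    return seen
```

Run receive_probes() on two machines of the segment, then send_probes("172.23.164.255") on the third. If the receivers come back with an empty list while unicast traffic between the same machines works, broadcast is being dropped somewhere (host firewall, switch, or addressing).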

Another alternative is to put every daemon in its own segment.  That will likely work, but the daemons will then “broadcast” each message to all the other daemons by sending it unicast N times, once for each of the other daemons.  If your application doesn’t require high throughput, that might be acceptable.
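
A minimal Python sketch of what such a one-daemon-per-segment configuration could look like when generated from a host list. It assumes /24 subnets and reuses each daemon's subnet broadcast address as its segment address, as the single-daemon segments elsewhere in this thread do. Both choices and the function names are assumptions for illustration, not something Spread requires.

```python
import ipaddress


def one_segment_per_daemon(daemons, port=4803, prefix_len=24):
    """Render spread.conf text with one Spread_Segment per (name, ip) pair.

    The segment address is the broadcast address of the daemon's subnet,
    computed from an assumed prefix length (default /24).
    """
    blocks = []
    for name, ip in daemons:
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        blocks.append(
            f"Spread_Segment {net.broadcast_address}:{port} {{\n"
            f"        {name}    {ip}\n"
            f"}}\n"
        )
    return "\n".join(blocks)


def unicast_sends_per_message(daemon_count):
    """With every daemon alone in its segment, each message is sent once
    to every other daemon."""
    return max(daemon_count - 1, 0)
```

For example, one_segment_per_daemon([("GTC_T_ACQ_Q3", "172.23.164.10"), ("GTC_T_ARC", "172.23.164.24")]) yields two separate segment blocks, and unicast_sends_per_message(4) gives the 3 unicast sends each message would cost with four daemons.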

Cheers!

-----
John Lane Schultz
Spread Concepts LLC
Cell: 443 838 2200

On May 23, 2017, at 10:06 AM, MAGNIER Emmanuel [EIFFAGE ENERGIE] <Emmanuel.MAGNIER at eiffage.com> wrote:

Hi John,

Were you able to look at the logs we provided?

The "name not unique" error appears when starting spuser for the first time on the given computer, when only the daemon was running.

Thank you.

Best Regards

Emmanuel MAGNIER
Engineering Studies Manager

Tel.: +33 (0)9 53 43 79 29

-----Original Message-----
From: John Lane Schultz [mailto:jschultz at spreadconcepts.com]
Sent: Monday, May 15, 2017 18:40
To: MAGNIER Emmanuel [EIFFAGE ENERGIE] <Emmanuel.MAGNIER at eiffage.com>
Subject: Re: [Spread-users] spread connection error

Hi Emmanuel,

I've received your reports and will look into them as soon as I can, but probably not before Wednesday.

The "name not unique" error means that Spread thinks a client is already connected with the specified user name.  You can specify a different user name to spuser with the -u option (I believe, check the usage --help).
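
A minimal Python sketch of one way to pick a user name that is unlikely to clash with a stale connection: append a short random suffix to a fixed prefix. The 10-character limit is an assumption (check MAX_PRIVATE_NAME in the sp.h of your Spread version), and the function name is hypothetical.

```python
import secrets

MAX_PRIVATE_NAME = 10  # assumed limit on private names; verify against sp.h


def unique_private_name(prefix: str = "usr") -> str:
    """Return prefix + 6 random hex characters, truncating the prefix so the
    result never exceeds MAX_PRIVATE_NAME characters."""
    suffix = secrets.token_hex(3)  # 3 random bytes -> 6 hex characters
    return prefix[: MAX_PRIVATE_NAME - len(suffix)] + suffix
```

The resulting name can then be handed to spuser through the user-name option mentioned above, or passed as the private name when an application connects.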

We have had reports in the past of Spread not "releasing" a client connection (and, therefore, its unique user name) even when the connection is broken.  That might explain why subsequent attempts fail.  If your first ever attempt to the daemon fails with that error, then that's a different bug.

Cheers!

-----
John Lane Schultz
Spread Concepts LLC
Cell: 443 838 2200

On May 15, 2017, at 3:05 AM, MAGNIER Emmanuel [EIFFAGE ENERGIE] <Emmanuel.MAGNIER at eiffage.com> wrote:

Hi John,

Thank you for the explanations.

Our customer's network gear may not allow multicast.

We added the requested DebugFlags to the "first" spread.conf file sent previously.

Please find attached log files from 3 computers:
- *_traces.txt are log files made by spread.exe
- *_sptmonitor_log.txt are log files with the output of sptmonitor

Here's what we did:
- all daemons shut down, then GTC_T_PE1_PC3 and GTC_T_PA1_MAI started => GTC_T_PE1_PC3_Traces.txt up to line 192
- spuser started on both computers and messages sent => GTC_T_PE1_PC3_Traces.txt up to line 213
- daemon started on GTC_T_ARC => GTC_T_PE1_PC3_Traces.txt up to line 339
- daemon started on GTC_T_ACQ_Q3 => GTC_T_PE1_PC3_Traces.txt up to line 481

GTC_T_ACQ_Q3 is the 3rd daemon started in Spread_Segment 172.23.164.255:4803; from that point on, Spread no longer works.
On GTC_T_PE1_PC3, I tried to start spuser and got this error:

=========================================================
D:\Specifique\Spread>spuser.exe
Spread library version is 4.4.0
SP_error: (-6) Connection rejected, name not unique

Bye.
=========================================================

I don't understand this "name not unique" error.
In the past, we also got a "connection refused" error.

Please help us find out what is happening and how to make it work.

Thank you

Best Regards

Emmanuel MAGNIER
Engineering Studies Manager

Tel.: +33 (0)9 53 43 79 29

-----Original Message-----
From: John Lane Schultz [mailto:jschultz at spreadconcepts.com]
Sent: Tuesday, May 2, 2017 16:43
To: MAGNIER Emmanuel [EIFFAGE ENERGIE] <Emmanuel.MAGNIER at eiffage.com>
Cc: spread-users at lists.spread.org; DELFOSSE Benjamin [EIFFAGE ENERGIE] <Benjamin.DELFOSSE at eiffage.com>
Subject: Re: [Spread-users] spread connection error

Hi Emmanuel, 

> How many segments should we use?

Your segments should correspond to your LANs that support single-hop broadcast, or preferably, multicast transmission between all of the daemons in the segment.

> Which network masks/range?

If your network gear supports it, then you should prefer to use multicast addresses for your segment addresses instead of broadcast addresses.  That is more efficient, in terms of interrupts for uninvolved machines on the network, and you don't have to worry about getting your network masking / addressing exactly correct.
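
To check the masking and addressing mentioned above, a minimal Python sketch using the standard ipaddress module can classify a segment address as multicast or verify that it is the broadcast address of every daemon's subnet. The /24 prefix length is an assumption about the networks in this thread, and the function name is illustrative.

```python
import ipaddress


def classify_segment_address(segment_addr, daemon_ips, prefix_len=24):
    """Return 'multicast', 'broadcast-ok', or a description of the mismatch."""
    addr = ipaddress.ip_address(segment_addr)
    if addr.is_multicast:
        # Multicast segment addresses do not depend on the subnet mask.
        return "multicast"
    for ip in daemon_ips:
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        if net.broadcast_address != addr:
            return f"mismatch: {ip}/{prefix_len} broadcasts on {net.broadcast_address}"
    return "broadcast-ok"
```

With the assumed /24 mask, classify_segment_address("172.23.164.255", ["172.23.164.10", "172.23.164.15"]) returns "broadcast-ok". If the machines were actually configured with a /16 mask, the same call with prefix_len=16 would report that they broadcast on 172.23.255.255 instead.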

> Are there any limits in spread use hit by our config? (max segments,
> max computers per segment, max computers in the whole config...)

In 4.4, there should be no new limitations on the number of daemons you can have in a segment or in a configuration.  It is possible, however, that some Windows-specific regressions may have been introduced at some point because we don't have a great environment in which to test large Windows deployments.

To better debug your problems, I'd need you to send some log file outputs from the daemons with the following debug flags enabled:

DebugFlags = { PRINT EXIT CONFIGURATION MEMBERSHIP }

If you could also run an sp_monitor at the same time and have it query the status of all the involved daemons and give me some of that output, that might help too.

Cheers!

-----
John Lane Schultz
Spread Concepts LLC
Cell: 443 838 2200

On Apr 28, 2017, at 8:34 AM, MAGNIER Emmanuel [EIFFAGE ENERGIE] <Emmanuel.MAGNIER at eiffage.com> wrote:

Dear spread users,

We're upgrading an old setup built with:
	. Spread 3
	. Windows XP computers
	. Windows 2003 computers

To this new setup:
	. Spread 4.4
	. Windows 10 computers
	. Windows 2016 computers

First spread.conf:

Spread_Segment 172.21.164.255:4803 {
        GTC_T_ACQ_Q1    172.21.164.10
        GTC_T_PE1_Q1    172.21.164.11
}

Spread_Segment 172.22.164.255:4803 {
        GTC_T_ACQ_Q2    172.22.164.10
        GTC_T_PE1_Q2    172.22.164.11
        GTC_T_PE1_PC3   172.22.164.15
        GTC_T_PE2_PC3   172.22.164.16
        GTC_T_BDD       172.22.164.19
        GTC_T_ACQ_GTE   172.22.164.22
        GTC_T_PE3_CLI   172.22.164.23
        GTC_T_PE1_GTE   172.22.164.25
}

Spread_Segment 172.23.164.255:4803 {
        GTC_T_ACQ_Q3    172.23.164.10
        GTC_T_PE1_Q3    172.23.164.11
        GTC_T_PA1_MAI   172.23.164.15
        GTC_T_ARC       172.23.164.24
}

Spread_Segment 172.24.164.255:4803 {
        GTC_T_ACQ_Q4    172.24.164.10
        GTC_T_PE1_Q4    172.24.164.11
        GTC_T_PA1_ADM   172.24.164.15
#       GTC_T_PA1_BDT   172.24.164.16
        GTC_T_PE1_CLI   172.24.164.18
        GTC_T_PE2_CLI   172.24.164.19
        GTC_T_AS1_Q4    172.24.164.25
        GTC_T_AS2_Q4    172.24.164.27
        GTC_T_PE4_CLI   172.24.164.28
        GTC_T_PE1_SI1   172.24.164.50
        VERDI           172.24.164.240
}

Spread_Segment 172.27.112.255:4803 {
        GTC_B_ACQ       172.27.112.10
        GTC_B_PE1_PCS   172.27.112.13
        GTC_B_PA1_ADM   172.27.112.16
}

Spread_Segment 172.26.146.255:4803 {
        GTC_R_ACQ       172.26.146.10
        GTC_R_PE1_CLI   172.26.146.12
        GTC_R_PA1_ADM   172.26.146.14
        GTC_R_PE1_GTE   172.26.146.18
}

Spread_Segment 172.28.116.255:4803 {
        GTC_A_PE1_PCS   172.28.116.12
        GTC_A_ACQ       172.28.116.14
}

#Spread_Segment 172.20.16.255:4803 {
#       VERDI           172.20.16.87
#}

DangerousMonitor = true


This doesn't work:
	. when starting spread.exe on more than 2 computers in the same segment, OR
	. when starting spread.exe on more than 8 computers

We get a "spread connection refused" error.

We then tried this second spread.conf:

Spread_Segment 172.21.164.255:4803 {
        GTC_T_ACQ_Q1    172.21.164.10
        GTC_T_PE1_Q1    172.21.164.11
}

Spread_Segment 172.22.164.255:4803 {
        GTC_T_ACQ_Q2    172.22.164.10
        GTC_T_PE1_Q2    172.22.164.11
}

Spread_Segment 172.22.164.255:4803 {
        GTC_T_PE1_PC3   172.22.164.15
        GTC_T_PE2_PC3   172.22.164.16
}

Spread_Segment 172.22.164.255:4803 {
        GTC_T_BDD       172.22.164.19
        GTC_T_ACQ_GTE   172.22.164.22
}

Spread_Segment 172.22.164.255:4803 {
        GTC_T_PE3_CLI   172.22.164.23
        GTC_T_PE1_GTE   172.22.164.25
}

Spread_Segment 172.23.164.255:4803 {
        GTC_T_ACQ_Q3    172.23.164.10
        GTC_T_PE1_Q3    172.23.164.11
}

Spread_Segment 172.23.164.255:4803 {
        GTC_T_PA1_MAI   172.23.164.15
        GTC_T_ARC       172.23.164.24
}

Spread_Segment 172.24.164.255:4803 {
        GTC_T_ACQ_Q4    172.24.164.10
        GTC_T_PE1_Q4    172.24.164.11
}

Spread_Segment 172.24.164.255:4803 {
        GTC_T_PA1_ADM   172.24.164.15
#       GTC_T_PA1_BDT   172.24.164.16
}

Spread_Segment 172.24.164.255:4803 {
        GTC_T_PE1_CLI   172.24.164.18
        GTC_T_PE2_CLI   172.24.164.19
}

Spread_Segment 172.24.164.255:4803 {
        GTC_T_AS1_Q4    172.24.164.25
        GTC_T_AS2_Q4    172.24.164.27
}

Spread_Segment 172.24.164.255:4803 {
        GTC_T_PE4_CLI   172.24.164.28
        GTC_T_PE1_SI1   172.24.164.50
}

Spread_Segment 172.24.164.255:4803 {
        VERDI           172.24.164.240
}

Spread_Segment 172.27.112.255:4803 {
        GTC_B_ACQ       172.27.112.10
        GTC_B_PE1_PCS   172.27.112.13
}

Spread_Segment 172.27.112.255:4803 {
        GTC_B_PA1_ADM   172.27.112.16
}

Spread_Segment 172.26.146.255:4803 {
        GTC_R_ACQ       172.26.146.10
        GTC_R_PE1_CLI   172.26.146.12
}

Spread_Segment 172.26.146.255:4803 {
        GTC_R_PA1_ADM   172.26.146.14
        GTC_R_PE1_GTE   172.26.146.18
}

Spread_Segment 172.28.116.255:4803 {
        GTC_A_PE1_PCS   172.28.116.12
        GTC_A_ACQ       172.28.116.14
}

#Spread_Segment 172.20.16.255:4803 {
#       VERDI           172.20.16.87
#}

DangerousMonitor = true


It seems to work better with the few computers running so far (the others aren't installed yet), but the behavior still looks strange to us.

Please advise us on:
	. How many segments should we use?
	. Which network masks/range?
	. Are there any limits in Spread that our config hits? (max segments, max computers per segment, max computers in the whole config...)

Thank you for your help

Best Regards

Emmanuel MAGNIER
Engineering Studies Manager

Tel.: +33 (0)9 53 43 79 29


_______________________________________________
Spread-users mailing list
Spread-users at lists.spread.org
http://lists.spread.org/mailman/listinfo/spread-users

<files.zip>_______________________________________________