[Spread-users] crash bug report

Jonathan Stanton jonathan at cnds.jhu.edu
Wed Feb 11 17:30:17 EST 2004


I'm a bit concerned about this patch as it seems to be working around a 
compiler bug by making the code less clear (char instead of int when the 
value is an integer) -- and it isn't guaranteed to work if gcc changes how 
the algorithm by which it removes memcpys. I'd be happier with a flag that 
forced gcc not to remove the memcpys if there is no better solution.

I'll admit I do not have the C standard memorized. Does anyone else know 
if this is valid compiler behavior on platforms where the memcpy is 
required for alignment reasons?

On platforms without the alignment requirements, I'd be happy to hear if 
gcc could optimize away unnecessary memcpys, but it seems like that 
behavior on systems with alignment requirements is wrong as the memcpy is 
necessary.

Thanks,

Jonathan

On Wed, Feb 11, 2004 at 03:07:20PM -0500, Mikhail Terekhov wrote:
> It still gives a bus error. Attached patch fixes it for me.
> The explanation is that gcc takes definition "int32 *num_vs_ptr"
> as a hint and eliminates call to memcpy.
> 
> Regards,
> Mikhail
> 
> 
> Jonathan Stanton wrote:
> 
> >I have fixed all of this type of bus-error causing bugs in the groups code
> >that I can find. The fixes have been committed to CVS. If you want to try
> >it out, since I don't really have any Sparcs or Alphas I can test on, I'd
> >appreciate it.
> >
> >If you want to wait for a release to test, I hope to package up a test 
> >version of 3.17.2 soon, and that will include this fix. I just have a bit 
> >more I want to do first.
> >
> >Cheers,
> >
> >Jonathan
> >
> >On Wed, Dec 24, 2003 at 12:34:04PM -0500, Mikhail Terekhov wrote:
> > 
> >
> >>The same thing could happen on lines 866,1121,1314,1512,1829.
> >>Am I correct?
> >>
> >>Mikhail
> >>
> >>Theo Schlossnagle wrote:
> >>
> >>   
> >>
> >>>On Tue, 2003-12-23 at 17:04, Greg Shebert wrote:
> >>>
> >>>
> >>>     
> >>>
> >>>>line 797:
> >>>>
> >>>>num_vs_ptr = (int32 *)&Mess_buf[ num_bytes ];
> >>>>num_bytes += sizeof( int32 );
> >>>>*num_vs_ptr = 1;
> >>>>
> >>>>the last statement (line 799) can cause a bus error... basically, if
> >>>>num_bytes is not a multiple of the host systems word size then the last
> >>>>statement tries to write a word length value into a location that is not
> >>>>word aligned... 
> >>>>
> >>>>the resulting bus error brings down the daemon :(
> >>>>
> >>>>i corrected this using a memcpy instead of the assignment and this seems
> >>>>to correct the problem...
> >>>>
> >>>>i experienced the problem on solaris2.8
> >>>> 
> >>>>
> >>>>       
> >>>>
> >>>You are not mislead.  That will certainly cause a bus error on any
> >>>architecture that requires word aligned copies like that.  It'll likely
> >>>bus error on alpha as well.
> >>>
> >>>
> >>>
> >>>     
> >>>
> >>_______________________________________________
> >>Spread-users mailing list
> >>Spread-users at lists.spread.org
> >>http://lists.spread.org/mailman/listinfo/spread-users
> >>   
> >>
> >
> > 
> >

> Copyright (C) 2003 Mikhail Terekhov
> 
> Permission is hereby granted, free of charge, to any person obtaining
> a copy of this software and associated documentation files (the
> "Software"), to deal in the Software without restriction, including
> without limitation the rights to use, copy, modify, merge, publish,
> distribute, sublicense, and/or sell copies of the Software, and to
> permit persons to whom the Software is furnished to do so, subject to
> the following conditions:
>     
> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> NONINFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE FOR
> ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
> CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
> WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
> 
> 
> Index: groups.c
> ===================================================================
> RCS file: /storage/cvsroot/spread/daemon/groups.c,v
> retrieving revision 1.13
> diff -C3 -r1.13 groups.c
> *** groups.c	8 Feb 2004 15:07:23 -0000	1.13
> --- groups.c	11 Feb 2004 19:42:44 -0000
> ***************
> *** 684,690 ****
>   	group		*grp, *new_grp;
>   	member		*mbr, *new_mbr;
>   	int		needed;
> ! 	int32		*num_vs_ptr; /* num members in virtual-synchrony/failure-atomicity set */
>   	int		num_bytes;
>   	char		proc_name[MAX_PROC_NAME];
>   	char		private_name[MAX_PRIVATE_NAME+1];
> --- 684,690 ----
>   	group		*grp, *new_grp;
>   	member		*mbr, *new_mbr;
>   	int		needed;
> ! 	char		*num_vs_ptr; /* num members in virtual-synchrony/failure-atomicity set */
>   	int		num_bytes;
>   	char		proc_name[MAX_PROC_NAME];
>   	char		private_name[MAX_PRIVATE_NAME+1];
> ***************
> *** 824,830 ****
>   					head_ptr->type |= CAUSED_BY_JOIN ;
>   
>   					/* notify all local members */
> ! 					num_vs_ptr = (int32 *)&Mess_buf[ num_bytes ];
>   					num_bytes += sizeof( int32 );
>                                           temp = 1;
>                                           memcpy( num_vs_ptr, &temp, sizeof( int32 ) ); /* *num_vs_ptr = 1; */
> --- 824,830 ----
>   					head_ptr->type |= CAUSED_BY_JOIN ;
>   
>   					/* notify all local members */
> ! 					num_vs_ptr = &Mess_buf[ num_bytes ];
>   					num_bytes += sizeof( int32 );
>                                           temp = 1;
>                                           memcpy( num_vs_ptr, &temp, sizeof( int32 ) ); /* *num_vs_ptr = 1; */
> ***************
> *** 892,898 ****
>   						head_ptr = Message_get_message_header(joiner_msg);
>   						head_ptr->type |= CAUSED_BY_NETWORK ;
>   						/* build a self vs set */
> ! 						num_vs_ptr = (int32 *)&Mess_buf[ num_bytes ];
>   						num_bytes += sizeof( int32 );
>                                                   temp = 1;
>                                                   memcpy( num_vs_ptr, &temp, sizeof( int32 ) ); /* *num_vs_ptr = 1; */
> --- 892,898 ----
>   						head_ptr = Message_get_message_header(joiner_msg);
>   						head_ptr->type |= CAUSED_BY_NETWORK ;
>   						/* build a self vs set */
> ! 						num_vs_ptr = &Mess_buf[ num_bytes ];
>   						num_bytes += sizeof( int32 );
>                                                   temp = 1;
>                                                   memcpy( num_vs_ptr, &temp, sizeof( int32 ) ); /* *num_vs_ptr = 1; */
> ***************
> *** 1022,1028 ****
>   	proc		p, p1;
>   	group		*grp;
>   	member		*mbr;
> ! 	int32		*num_vs_ptr; /* num members in vs set */
>   	char		*vs_ptr;     /* the virtual synchrony set */
>   	message_link	*mess_link;
>   	message_header	*head_ptr;
> --- 1022,1028 ----
>   	proc		p, p1;
>   	group		*grp;
>   	member		*mbr;
> ! 	char		*num_vs_ptr; /* num members in vs set */
>   	char		*vs_ptr;     /* the virtual synchrony set */
>   	message_link	*mess_link;
>   	message_header	*head_ptr;
> ***************
> *** 1147,1153 ****
>   			head_ptr->type |= CAUSED_BY_LEAVE ;
>   
>   			/* notify all local members */
> ! 			num_vs_ptr = (int32 *)&Mess_buf[ num_bytes ];
>   			num_bytes += sizeof( int32 );
>                           temp = 1;
>                           memcpy( num_vs_ptr, &temp, sizeof( int32 ) ); /* *num_vs_ptr = 1; */
> --- 1147,1153 ----
>   			head_ptr->type |= CAUSED_BY_LEAVE ;
>   
>   			/* notify all local members */
> ! 			num_vs_ptr = &Mess_buf[ num_bytes ];
>   			num_bytes += sizeof( int32 );
>                           temp = 1;
>                           memcpy( num_vs_ptr, &temp, sizeof( int32 ) ); /* *num_vs_ptr = 1; */
> ***************
> *** 1219,1225 ****
>   	proc		p, p1;
>   	group		*grp, *nextgroup;
>   	member		*mbr;
> ! 	int32		*num_vs_ptr; /* num members in vs set */
>   	char		*vs_ptr;     /* the virtual synchrony set */
>   	message_link	*mess_link;
>   	message_header	*head_ptr;
> --- 1219,1225 ----
>   	proc		p, p1;
>   	group		*grp, *nextgroup;
>   	member		*mbr;
> ! 	char		*num_vs_ptr; /* num members in vs set */
>   	char		*vs_ptr;     /* the virtual synchrony set */
>   	message_link	*mess_link;
>   	message_header	*head_ptr;
> ***************
> *** 1341,1347 ****
>   
>   				head_ptr->type |= CAUSED_BY_DISCONNECT ;
>   
> ! 				num_vs_ptr = (int32 *)&Mess_buf[ num_bytes ];
>   				num_bytes += sizeof( int32 );
>                                   temp = 1;
>                                   memcpy( num_vs_ptr, &temp, sizeof( int32 ) ); /* *num_vs_ptr = 1; */
> --- 1341,1347 ----
>   
>   				head_ptr->type |= CAUSED_BY_DISCONNECT ;
>   
> ! 				num_vs_ptr = &Mess_buf[ num_bytes ];
>   				num_bytes += sizeof( int32 );
>                                   temp = 1;
>                                   memcpy( num_vs_ptr, &temp, sizeof( int32 ) ); /* *num_vs_ptr = 1; */
> ***************
> *** 1484,1490 ****
>   	int		changed;
>   	int		ret;
>   	int		vs_bytes;
> ! 	int32		*num_vs_ptr; /* num members in virtual-synchrony/failure-atomicity set */
>           int32           num_vs;
>   	int		num_exist;
>   	struct worklist *indices[MAX_PROCS_RING];
> --- 1484,1490 ----
>   	int		changed;
>   	int		ret;
>   	int		vs_bytes;
> ! 	char		*num_vs_ptr; /* num members in virtual-synchrony/failure-atomicity set */
>           int32           num_vs;
>   	int		num_exist;
>   	struct worklist *indices[MAX_PROCS_RING];
> ***************
> *** 1544,1550 ****
>   	    group *this_group;
>   	    /* prepare vs set */
>   	    vs_bytes = 0;
> ! 	    num_vs_ptr = (int32 *)&Temp_buf[0];
>   	    vs_bytes+= sizeof( int32 );
>               num_vs = 0;
>   	    
> --- 1544,1550 ----
>   	    group *this_group;
>   	    /* prepare vs set */
>   	    vs_bytes = 0;
> ! 	    num_vs_ptr = &Temp_buf[0];
>   	    vs_bytes+= sizeof( int32 );
>               num_vs = 0;
>   	    
> ***************
> *** 1859,1865 ****
>   
>   	int		num_bytes;
>   	message_header	*head_ptr;
> ! 	int32		*num_vs_ptr; /* num members in virtual-synchrony/failure-atomicity set */
>   	struct skiplistnode *iter;
>           member          *mbr;
>   	char		*membs_ptr;
> --- 1859,1865 ----
>   
>   	int		num_bytes;
>   	message_header	*head_ptr;
> ! 	char		*num_vs_ptr; /* num members in virtual-synchrony/failure-atomicity set */
>   	struct skiplistnode *iter;
>           member          *mbr;
>   	char		*membs_ptr;
> ***************
> *** 1870,1876 ****
>   
>   	head_ptr->type = head_ptr->type | caused;
>   
> ! 	num_vs_ptr = (int32 *)&buf[num_bytes];
>   	num_bytes += sizeof( int32 );
>           head_ptr->data_len += sizeof( int32 );
>           num_vs = 0;
> --- 1870,1876 ----
>   
>   	head_ptr->type = head_ptr->type | caused;
>   
> ! 	num_vs_ptr = &buf[num_bytes];
>   	num_bytes += sizeof( int32 );
>           head_ptr->data_len += sizeof( int32 );
>           num_vs = 0;
> ***************
> *** 1925,1931 ****
>   	char		*gid_ptr;
>   	member		*mbr;
>   	struct skiplistnode *giter, *iter;
> ! 	int16		*num_memb_ptr;
>           int16           num_memb;
>   	char		*memb_ptr;
>           int     size_for_this_group;
> --- 1925,1931 ----
>   	char		*gid_ptr;
>   	member		*mbr;
>   	struct skiplistnode *giter, *iter;
> ! 	char		*num_memb_ptr;
>           int16           num_memb;
>   	char		*memb_ptr;
>           int     size_for_this_group;
> ***************
> *** 1955,1961 ****
>   		num_bytes += sizeof( group_id );
>   		memcpy( gid_ptr, &grp->grp_id, sizeof(group_id) );
>   
> ! 		num_memb_ptr = (int16 *)&buf[num_bytes];
>   		num_bytes += sizeof( int16 );
>   		num_memb  = 0;
>   
> --- 1955,1961 ----
>   		num_bytes += sizeof( group_id );
>   		memcpy( gid_ptr, &grp->grp_id, sizeof(group_id) );
>   
> ! 		num_memb_ptr = &buf[num_bytes];
>   		num_bytes += sizeof( int16 );
>   		num_memb  = 0;
>   
> ***************
> *** 1992,1998 ****
>   	group		*grp;
>   	char		*gid_ptr;
>   	member		*mbr;
> ! 	int16		*num_memb_ptr;
>           int16           num_memb;
>   	int		i;
>   
> --- 1992,1998 ----
>   	group		*grp;
>   	char		*gid_ptr;
>   	member		*mbr;
> ! 	char		*num_memb_ptr;
>           int16           num_memb;
>   	int		i;
>   
> ***************
> *** 2041,2047 ****
>   		num_bytes += sizeof( group_id );
>   		memcpy( &grp->grp_id, gid_ptr, sizeof(group_id) );
>   
> ! 		num_memb_ptr = (int16 *)&Temp_buf[num_bytes];
>   		num_bytes += sizeof( int16 );
>                   memcpy( &num_memb, num_memb_ptr, sizeof( int16 ) );
>   
> --- 2041,2047 ----
>   		num_bytes += sizeof( group_id );
>   		memcpy( &grp->grp_id, gid_ptr, sizeof(group_id) );
>   
> ! 		num_memb_ptr = &Temp_buf[num_bytes];
>   		num_bytes += sizeof( int16 );
>                   memcpy( &num_memb, num_memb_ptr, sizeof( int16 ) );
>   


-- 
-------------------------------------------------------
Jonathan R. Stanton         jonathan at cs.jhu.edu
Dept. of Computer Science   
Johns Hopkins University    
-------------------------------------------------------




More information about the Spread-users mailing list