This notice in the man page is for the benefit of people trying to write portable programs.
Since there has been speculation about what glibc itself does in this case, I decided to check.
The glibc source code actually avoids signed-overflow UB, at least in the conversion function scanf("%d") uses. At worst you could say the conversion result is undefined with glibc, but not the behaviour of the whole program. int on GNU systems doesn't have trap values (it's 2's complement) so this can't make your program crash or misbehave, other than perhaps not having a numeric value that matches what you might get from other ways of parsing the string. e.g. if your code looked at the last decimal digit as well as using sscanf to convert, you could have -1 even though the last decimal digit was even.
You get errno == ERANGE after a glibc scanf integer conversion that overflowed long or unsigned long, for conversions of long or narrower. (%lld on a 32-bit system would only check for overflow of long long.)
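For portable code, one option is to skip scanf for integers and call strtol directly, since the C standard defines what strtol does on overflow (it returns LONG_MAX or LONG_MIN and sets errno to ERANGE). A minimal sketch of that approach; parse_int and its return codes are hypothetical names made up for illustration, not anything from glibc:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not part of glibc): parse a base-10 int using strtol.
   Returns 0 on success, -1 if no digits were found,
   -2 if the value does not fit in an int. */
static int parse_int(const char *s, int *out)
{
    assert(s != NULL && out != NULL);
    char *end;
    errno = 0;                      /* strtol only sets errno on failure */
    long v = strtol(s, &end, 10);
    if (end == s)
        return -1;                  /* no conversion was performed */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -2;                  /* overflowed long, or fits long but not int */
    *out = (int)v;                  /* safe: range was checked above */
    return 0;
}
```

With this, an input like 1111111111111111111111111111111 is reported as out of range instead of being converted to some value, and so is 1111111111111111111, which fits in a 64-bit long but not in an int.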
I checked with this test program:
#include <stdio.h>
int main(){
    int tmp = 0xcccccccc;
    int conv_result = scanf("%d", &tmp);
    printf("successful conversions = %d, result = %d = %#x\n",
           conv_result, tmp, (unsigned)tmp);
}
With input that fits in a long (64-bit on x86-64 GNU/Linux), we get that value truncated to int.
With larger input, glibc detects overflow and produces -1 (actually LONG_MIN or LONG_MAX according to the sign, in this case LONG_MAX which gets truncated to -1 when narrowing to int).
For example it converts 1111111111111111111111111111111 as -1, but 1111111111111111111 as 734294471 = 0x2bc471c7. See it on Godbolt with 2 executors that feed stdin with those inputs. It treats this as a successful conversion either way, scanf returning 1, e.g.:
successful conversions = 1, result = -1 = 0xffffffff
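Both results are just what you get from keeping the low 32 bits of a 64-bit long. A small sketch of that narrowing, written with fixed-width types so it doesn't depend on the width of the platform's long; low32_as_signed is a hypothetical helper name, not something from glibc:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: keep only the low 32 bits of a 64-bit value and
   reinterpret them as a two's complement int32_t. */
static int32_t low32_as_signed(int64_t wide)
{
    uint32_t low = (uint32_t)wide;   /* well-defined: reduction modulo 2^32 */
    int32_t out;
    assert(sizeof out == sizeof low);
    memcpy(&out, &low, sizeof out);  /* reinterpret the bits, no signed conversion */
    return out;
}
```

low32_as_signed(INT64_MAX) gives -1 and low32_as_signed(1111111111111111111) gives 734294471, matching what the test program printed on x86-64.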
I used GDB to single-step into scanf with glibc 2.38-7 on my Arch GNU/Linux system (letting debuginfod fetch the library source code, very helpful). It eventually reached __strtol_l (https://codebrowser.dev/glibc/glibc/stdlib/strtol_l.c.html#215) after a bunch of stdio overhead and copying characters one at a time into a tmp buffer, checking the base each time to see if it should be checking for hex or base-10 digits. Yikes, not efficient.
https://codebrowser.dev/glibc/glibc/stdlib/strtol_l.c.html#466 is the actual part of that function which checks for overflow, by comparing the running total against something like ULONG_MAX/10 and the new digit being converted against the trailing decimal digit of ULONG_MAX, before doing the total = total*base + digit.
// glibc/stdlib/strtol_l.c
INT
INTERNAL (__strtol_l) (const STRING_TYPE *nptr, STRING_TYPE **endptr,
                       int base, int group, locale_t loc)
{
  ...
      if (c >= L_('0') && c <= L_('9'))
        c -= L_('0');
      ... // check for grouping characters like ' if enabled
      else if (ISALPHA (c))
        c = TOUPPER (c) - L_('A') + 10;
      else
        break;
      // my comments added:
      // c is the new digit converted to integer in the [0,base) range
      // i is the total to be returned
      if ((int) c >= base)
        break;
      /* Check for overflow. */
      if (i > cutoff || (i == cutoff && c > cutlim)) // cutoff and cutlim were set from a lookup table according to base
        overflow = 1;
      else
        {
        use_long: // goto label from a loop using narrower types, if LONG isn't the same size as long
          i *= (unsigned LONG int) base;
          i += c;
        }
    }
  ...
  if (__glibc_unlikely (overflow))
    {
      __set_errno (ERANGE);
#if UNSIGNED
      return STRTOL_ULONG_MAX;
#else
      return negative ? STRTOL_LONG_MIN : STRTOL_LONG_MAX;
#endif
    }
  ...
(Yes, the loop could skip overflowing digits and still process a later smaller digit, but the later code doesn't use i at all if overflow is set.)
man 3 sscanf does not indicate deprecation for my toolchain. You should cite yours.