5
\$\begingroup\$

I was practicing question 1a of the 2022 British Informatics Olympiad past paper.

Each letter in the alphabet can be given a value based on its position in the alphabet, A being 1 through to Z being 26. Two letters can produce a third letter by adding together their values, deducting 26 from the sum if it is greater than 26, and finding the letter whose value equals this result. A simple encryption scheme for a string is as follows. Working from left to right, after leaving the left-most character alone, each character is replaced with the character produced by adding it to thecharacter to its immediate left. For example: The string ENCRYPT goes through the steps ENCRYPT -> ESCRYPT -> ESVRYPT -> ESVNYPT -> ESVNMPT -> ESVNMCT -> ESVNMCW

Write a program that reads in an encrypted string, between 1 and 10 uppercase letters inclusive, and outputs the original version of that string.

This is the code I wrote:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_SIZE 11  // 10 characters, plus null.

#define ALPHABET "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
#define ALPHABET_LENGTH 26

int indexof(char *str, char c) {
    return strchr(str, c) - str;
}

void decrypt(char *str) {
    int len = strlen(str);
    for (int i = len - 1; i > 0; --i) {
        int index = indexof(ALPHABET, str[i]) - indexof(ALPHABET, str[i - 1]) - 1;

        if (index < 0) {
            index += ALPHABET_LENGTH;
        }

        str[i] = ALPHABET[index % ALPHABET_LENGTH];
    }
}

int main() {
    char *str = malloc(MAX_STRING_SIZE);

    printf("Enter the encrypted str: ");
    scanf("%s", str);

    decrypt(str);

    printf("\n%s\n", str);

    free(str);
}

Example run:

Enter the encrypted str: ESVNMCW

ENCRYPT

(You can find more test cases in the mark scheme)

I would be grateful to hear about any improvements that could be made to this code.

\$\endgroup\$
2
  • 2
    \$\begingroup\$ Why char *str = malloc(MAX_STRING_SIZE); instead of char str[MAX_STRING_SIZE];? \$\endgroup\$ Commented Jul 5 at 17:00
  • \$\begingroup\$ @chux-ReinstateMonica yes I figured that. Sorry I'm still not that experienced with C. \$\endgroup\$ Commented 2 days ago

1 Answer 1

4
\$\begingroup\$

First and foremost, you've tested this code with the sample data provided, right? It works... That's fundamental, and something to be proud of!

Next is that the code is nicely presented (very readable), with meaningful tokens and good use of whitespace to show structure. More marks...

The logic uses heap memory, and frees the region before terminating.
Here's the first tip: System calls (eg: malloc()) can fail (returning an error code).
Programs that disregard returned values may continue running until crashing for mysterious reasons (making especially difficult bugs to locate and squelch!) Make it a habit to always verify important operations (file open/close, read/write, memory allocation) have succeeded before moving on to do more processing.

Now for some suggestions:


scanf("%s", str); will not limit itself to only the size of the buffer. Had you entered "STRAWBERRIES", the buffer would have been overrun resulting in undefined behaviour (likely crashing, but maybe not... who can say? It's undefined.)
The minimal fix would be scanf("%10s", str); that unfortunately duplicates a constant.

The next issue is that there's no rejection of the string "Mr.&Mrs.". Users will find any/every weakness in a program. It's up to the coder to do their best to outwit malicious users and ensure the program deals appropriately with bad data.

Since the challenge is to accept only uppercase alphabetic, you can use a scanset to restrict the input: scanf("%10[A-Z]", str);. Now, entering the name of the current COVID variant "FLiRT" will be handled as if it were "FL"... Better than nothing, eh?

The code should verify that scanf() successfully assigned one value, but scanf() alone is good for a book as thick as any Dickens' novel...


Moving on... Using heap storage for this small program is definitely overkill.

char str[ 10 + 1 ]; // input buffer + '\0'

would have been enough, meaning <stdlib.h> is not needed. Don't overdo things, adding complexities where they're unnecessary. "KISS" means "Keep It Simple Stupid"..


The for() loop of decrypt() is curious, and, for many coders, prone to "off-by-one" errors in calculating array indices properly.

Consider: the encryption proceeds from left-to-right via addition (with modulo 'wrapping' around the alphabet). Decrypting would be more straightforward if it, too, processed left-to-right but with subtraction (and modulo fix-ups). In this way, the code could work until the end of the string is reached (and not have to first measure the length of the string.) Poof! #include <string.h> is no longer needed...

Next (and subject to pedantry, sometimes), it's probably safe to assume that we're dealing with ordinary 7-bit ASCII characters whose character codes are well-known and usable. (This I'm going to presume. Other's may wish to provide answers that deal with EBCDIC, or extended alphabets. "KISS", remember?) As such, simple numerical calculations replace look-up scans to match characters and return an offset.


"grateful to hear about any improvements that could be made to this code"

Below puts some of this together, along with providing 2 equivalent buffers (hardwired test cases) to decrypt()...

It's up to you to decide if this is an improvement...

#include <stdio.h>

void decrypt( char *d, char *e ) {
    d[0] = e[0]; // should return here if e is an empty (zero length) string
    int i = 0;
    while( e[ ++i ] )
        d[i] = ( ( (e[i] - 'A') - (e[i-1] - 'A') + 25 ) % 26 ) + 'A';
    d[i] = '\0';
}

int main( void ) {
//  char estr[ 10 + 1 ] = "ESVNMCW";
    char estr[ 10 + 1 ] = "STRAWBERRY";
    char dstr[ 10 + 1 ] = {0};

    printf("\n%s\n", estr);
//  printf("Enter the encrypted str: ");
//  scanf("%10[A-Z]", estr);

    decrypt( dstr, estr );

    printf("\n%s\n", dstr);
}

Output:

STRAWBERRY

SAXIVECMZG

Not under any time pressure (like a competition), the code above seemed less than optimal with its source and destination buffers.

Still processing the string left-to-right, the following is a small improvement on the above code...

void decrypt( char *cp ) {
    if( *cp ) {
        char nch, och = *cp++;
        do {
            nch = ( *cp - och + 25 ) % 26 + 'A';
            och = *cp;
            *cp = nch;
        } while( *++cp );
    }
}

This requires the caller to pass only the enciphered string in a mutable buffer. och is the old character and nch is the new character. And, empty strings are handled by code instead of a wishful comment.


Final Destination: (with bug fix)

Fixing my own bug in that code, and applying helpful suggestions, lead to what might be the most expressive version of this operation:

void decrypt( char *cp ) {
    if( !cp[0] || !cp[1] ) return; // strlen < 2 => unaltered
    const unsigned N_ALPHA = 26;
    for( char nch, och = *cp; *++cp; och = *cp, *cp = nch )
        nch = ( *cp - och + N_ALPHA - 1 ) % N_ALPHA + 'A';
}

Function receives a mutable string pointer. If the string length is 0 or 1, nothing happens. Otherwise, 2x one char buffers appear; one filled with a copy of the string's first character. The string is traversed to its end. A byte is copied out of the string, and replaced by another byte... Then, some iterative jiggery-pokery with subtraction and modulo involving 3 distinct bytes happens.

To some readers, this appears as "Code Golf" (seeking the fewest bytes of code possible.) I prefer to read/write code like this. As a reader, I can pause for a few seconds, work out what's happening in these 2-3 lines, and not miss the "twice increment" that I'd allowed into the previous (not fully tested) version. To each their own. (The functional description of operations would be in the code, not in this description.)

Challenge:
Now that this seems to be deciphering given strings, write the enciphering version of the function for your own pleasure.

Bonus round:
Ampersand ('&') was once, in some circles, the 27th letter of the English alphabet. See here.

Modify the code to handle, for instance, "COKE&PEPSI" as a valid input string.

Better to revert to a lookup table? Or, is it better to use special handling (branching) when a few special characters are allowed?

\$\endgroup\$
5
  • 1
    \$\begingroup\$ Aside from maybe 0 and 1, I advocate no naked magic numbers like 25, 26. Perhaps #define AZ_SIZE 26 and nch = ( *cp - och + AZ_SIZE - 1) % AZ_SIZE + 'A';? And since ( *cp - och + AZ_SIZE - 1) % AZ_SIZE may be negative with a rogue cp, consider #define AZ_SIZE 26u to prevent any negative remainder from % \$\endgroup\$ Commented Jul 5 at 16:56
  • 1
    \$\begingroup\$ #define AZ_SIZE ('Z' - 'A' + 1)? \$\endgroup\$
    – lungj
    Commented Jul 5 at 20:25
  • \$\begingroup\$ @chux-ReinstateMonica Thanks. :-) Good idea! I've gone with the above to "limit the scope" of the token (without extra #undef), but considered, too, a simple enum... Would value your insights on taking the enum approach! Thanks! :-) \$\endgroup\$
    – Fe2O3
    Commented Jul 6 at 0:41
  • \$\begingroup\$ @lungj Yes. I've stuck with 26 because it should be recognisable in this context... Here's some fun for a rainy afternoon! Thanks... :-)... \$\endgroup\$
    – Fe2O3
    Commented Jul 6 at 0:44
  • 1
    \$\begingroup\$ @Fe2O3, enum vs. macro: if code needs pre-processing like #if AZ_SIZE != 26 ... then macro is the way to go. It simply depends on the larger code needs. \$\endgroup\$ Commented 2 days ago

Not the answer you're looking for? Browse other questions tagged or ask your own question.