1372: Add suffixes for matching glossary terms #1373

Merged on Jan 10, 2024 (79 commits)


pedro-mendonca
Member

@pedro-mendonca pedro-mendonca commented Mar 31, 2022

Add suffixes to matching glossary terms.

Nouns

https://www.thefreedictionary.com/Forming-Plurals.htm
https://preply.com/en/blog/simple-rules-for-the-formation-of-plural-nouns-in-english/

Plurals of singular nouns

  • Ending in a sibilant (-ss, -z, -x, -sh, -ch). Suffix: '-es'.
    E.g.: Kiss and kiss-es, waltz and waltz-es, box and box-es, dish and dish-es, coach and coach-es.
  • Ending with '-y' preceded by vowel. Suffix: '-s'.
    E.g.: Delay and delay-s, key and key-s, toy and toy-s, guy and guy-s.
  • Ending with '-o' or '-y' preceded by a consonant. Suffix: '-es'.
    E.g.: Lady and ladi-es, hero and hero-es, tomato and tomato-es.
  • Ending with '-an'. Suffix: '-en'.
    E.g.: Woman and wom-en.
  • Ending with '-f', '-fe' or '-s'. Suffix: '-es'.
    E.g.: Wife and wiv-es, leaf and leav-es, wolf and wolv-es, bus and bus-es.
  • Fallback suffix for most nouns not ending with '-s'. Suffix: '-s'.
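
The noun rules above could be sketched as an ordered chain of regex checks. This is only an illustration: `example_noun_plural()` is a hypothetical helper, not the function added by this PR, and it returns a single plural form where real glossary matching may need several candidate suffixes per term. The '-an' rule is left out because it is removed later in this conversation.

```php
<?php
// Hypothetical helper for illustration only; not the function added by this PR.
function example_noun_plural( string $noun ): string {
	// Sibilant endings (-ss, -z, -x, -sh, -ch) take '-es': box, boxes.
	if ( preg_match( '/(ss|z|x|sh|ch)$/', $noun ) ) {
		return $noun . 'es';
	}
	// '-y' preceded by a vowel takes '-s': key, keys.
	if ( preg_match( '/[aeiou]y$/', $noun ) ) {
		return $noun . 's';
	}
	// '-y' preceded by a consonant changes to 'i' and takes '-es': lady, ladies.
	if ( preg_match( '/[b-df-hj-np-tv-xz]y$/', $noun ) ) {
		return substr( $noun, 0, -1 ) . 'ies';
	}
	// '-o' preceded by a consonant takes '-es': hero, heroes.
	if ( preg_match( '/[b-df-hj-np-tv-xz]o$/', $noun ) ) {
		return $noun . 'es';
	}
	// '-fe' and '-f' change to 'v' and take '-es': wife, wives; leaf, leaves.
	if ( preg_match( '/fe$/', $noun ) ) {
		return substr( $noun, 0, -2 ) . 'ves';
	}
	if ( preg_match( '/f$/', $noun ) ) {
		return substr( $noun, 0, -1 ) . 'ves';
	}
	// A single '-s' takes '-es': bus, buses.
	if ( preg_match( '/s$/', $noun ) ) {
		return $noun . 'es';
	}
	// Fallback: add '-s'.
	return $noun . 's';
}
```

The order of the checks matters in this sketch: the more specific endings are tested first, and the plain '-s' fallback only applies when nothing else matched.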

Verbs

Third-person singular

  • Ending in a sibilant (-ss, -z, -x, -sh, -ch). Suffix: '-es'.
    E.g.: Pass and pass-es, quiz and quiz-es, fix and fix-es, push and push-es, watch and watch-es.
  • Ending with '-y' preceded by vowel. Suffix: '-s'.
    E.g.: Play and play-s.
  • Ending with '-o' and '-y' preceded by consonant. Suffix: '-es'.
    E.g.: Try and tri-es, Go and go-es, do and do-es.
  • Fallback suffix for most verbs. Suffix: '-s'.
    E.g.: Format and format-s, make and make-s, pull and pull-s.

Past simple tense and past participle. Suffix '-ed'.

  • Not ending with '-e'.
    E.g.: Fix and fix-ed, push and push-ed.
  • Ending with '-e'.
    E.g.: Contribute and contribut-ed, delete and delet-ed.

Present participle and gerund. Suffix '-ing'.

  • Not ending with '-e'.
    E.g.: Fix and fix-ing, push and push-ing.
  • Ending with '-ee', '-ye' or '-oe'.
    E.g.: Agree and agree-ing, see and see-ing, dye and dye-ing, tiptoe and tiptoe-ing.
  • Ending with single '-e'.
    E.g.: Contribute and contribut-ing, delete and delet-ing, care and car-ing. The final '-e' is dropped before '-ing'.
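
As a rough sketch of the '-ed' and '-ing' rules above, the handling of a final '-e' could look like the following. `example_verb_forms()` is a hypothetical name used only for this illustration, and the PR's actual implementation may be structured differently.

```php
<?php
// Hypothetical helper for illustration only; not the function added by this PR.
// Returns array( past form, present participle ) for a regular verb.
function example_verb_forms( string $verb ): array {
	if ( preg_match( '/(ee|ye|oe)$/', $verb ) ) {
		// agree: agreed, agreeing (the final 'e' is kept before '-ing').
		return array( $verb . 'd', $verb . 'ing' );
	}
	if ( preg_match( '/e$/', $verb ) ) {
		// delete: deleted, deleting (the final 'e' is dropped before both suffixes).
		$stem = substr( $verb, 0, -1 );
		return array( $stem . 'ed', $stem . 'ing' );
	}
	// fix: fixed, fixing.
	return array( $verb . 'ed', $verb . 'ing' );
}
```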

Nouns formed by Verbs

https://www.thefreedictionary.com/Commonly-Confused-Suffixes-tion-vs-sion.htm

  • Verbs that form nouns ending with Suffix '-tion'.
    Check the complete list below.
// General.
'ate'    => 'a',     // Abbreviate and abbrevia-tion. Change to 'a-tion'.
'ize'    => 'iza',   // Authorize and authoriza-tion. Change to 'iza-tion'.
'ify'    => 'ifica', // Specify and specifica-tion. Change to 'ifica-tion'.
'efy'    => 'efac',  // Liquefy and liquefac-tion. Change to 'efac-tion'.
'aim'    => 'ama',   // Exclaim and exclama-tion. Change to 'ama-tion'.
'pt'     => 'p',     // Encrypt and encryp-tion. Change to 'p-tion'.
'scribe' => 'scrip', // Subscribe and subscrip-tion. Change to 'scrip-tion'.
'ceive'  => 'cep',   // Perceive and percep-tion. Change to 'cep-tion'.
'sume'   => 'sump',  // Resume and resump-tion. Change to 'sump-tion'.
'ct'     => 'c',     // Correct and correc-tion. Change to 'c-tion'.
'ete'    => 'e',     // Delete and dele-tion. Change to 'e-tion'.
'it'     => 'i',     // Edit and edi-tion. Change to 'i-tion'.
'ite'    => 'i',     // Ignite and igni-tion. Change to 'i-tion'.
'ute'    => 'u',     // Contribute and contribu-tion. Change to 'u-tion'.
'olve'   => 'olu',   // Resolve and resolu-tion. Change to 'olu-tion'.
'ose'    => 'osi',   // Compose and composi-tion. Change to 'osi-tion'.
// After 'n' cases.
'tain'   => 'ten',   // Abstain and absten-tion. Change to 'ten-tion'.
'vene'   => 'ven',   // Contravene and contraven-tion. Change to 'ven-tion'.
'vent'   => 'ven',   // Prevent and preven-tion. Change to 'ven-tion'.
// After 'r' cases.
'rt'     => 'r',     // Insert and inser-tion. Change to 'r-tion'.
  • Verbs that form nouns ending with Suffix '-sion'.
    Check the complete list below.
// General.
'ade'  => 'a',   // Invade and inva-sion. Change to 'a-sion'.
'cede' => 'ces', // Precede and preces-sion. Change to 'ces-sion'.
'ide'  => 'i',   // Decide and deci-sion. Change to 'i-sion'.
'ode'  => 'o',   // Explode and explo-sion. Change to 'o-sion'.
'ude'  => 'u',   // Exclude and exclu-sion. Change to 'u-sion'.
'ise'  => 'i',   // Supervise and supervi-sion. Change to 'i-sion'.
'use'  => 'u',   // Confuse and confu-sion. Change to 'u-sion'.
'pel'  => 'pul', // Expel and expul-sion. Change to 'pul-sion'.
'mit'  => 'mis', // Submit and submis-sion. Change to 'mis-sion'.
'ss'   => 's',   // Compress and compres-sion. Change to 's-sion'.
// After 'n' cases.
'end'  => 'en',  // Extend and exten-sion. Change to 'en-sion'.
// After 'r' cases.
'vert' => 'ver', // Convert and conver-sion. Change to 'ver-sion'.
'erse' => 'er',  // Disperse and disper-sion. Change to 'er-sion'.
'ur'   => 'ur',  // Recur and recur-sion. Change to 'ur-sion'.
'erge' => 'er',  // Emerge and emer-sion. Change to 'er-sion'.
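
One way such an ending map could be applied is sketched below, using a few entries copied from the lists above. `example_verb_to_noun()` is a hypothetical helper, and trying longer endings first is an assumption made for this sketch, not something the PR description states.

```php
<?php
// Hypothetical helper for illustration only; not the function added by this PR.
function example_verb_to_noun( string $verb, array $endings, string $suffix ): ?string {
	// Try longer endings first so that a longer ending wins over a shorter overlapping one.
	uksort(
		$endings,
		function ( $a, $b ) {
			return strlen( $b ) - strlen( $a );
		}
	);
	foreach ( $endings as $ending => $change ) {
		if ( substr( $verb, -strlen( $ending ) ) === $ending ) {
			// Replace the verb ending with its changed form, then add the suffix.
			return substr( $verb, 0, -strlen( $ending ) ) . $change . $suffix;
		}
	}
	return null; // No listed ending matches.
}

// A few entries taken from the '-tion' and '-sion' lists.
$tion = array( 'ate' => 'a', 'ize' => 'iza', 'ete' => 'e', 'ct' => 'c' );
$sion = array( 'ide' => 'i', 'mit' => 'mis', 'end' => 'en' );

echo example_verb_to_noun( 'authorize', $tion, 'tion' ), "\n"; // authorization
echo example_verb_to_noun( 'submit', $sion, 'sion' ), "\n";    // submission
```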

Hooks

  • Add filter gp_glossary_match_suffixes() to allow customization.

Fixes #1384, #1386 and #1372

@ocean90 ocean90 added this to the 3.1 milestone Mar 31, 2022
gp-templates/helper-functions.php (review thread: outdated, resolved)
@pedro-mendonca
Member Author

@ocean90 I've made a big change by separating terms by type, with more clear lists that we can keep or comment out, for testing.

Let me know your thoughts on this please.

@ocean90
Member

ocean90 commented Apr 13, 2022

I like the direction where this goes! But please make sure you update your branch with the latest changes from #1395 as it changes how gp_sort_glossary_entries_terms(), now gp_glossary_add_suffixes(), works.

@ocean90
Member

ocean90 commented May 4, 2022

@pedro-mendonca Do you plan to continue working on this? Do you need any help?

@pedro-mendonca
Member Author

Hi, I do, just had no time yet. I'll get back to it.
Thanks!

@pedro-mendonca pedro-mendonca changed the title 1372: Add suffixes for matching glossary terms ending with 'e' Jun 9, 2022
@pedro-mendonca
Member Author

@ocean90 I have a .po with a set of originals and a .csv for the glossary to test this.
I can share here if you want.
I'll do some unit tests for all these cases.

@pedro-mendonca
Member Author

After looking at most rules, I've made an array of rules with the same structure:

array(
  'endings'  => array(
    'y',  // Lady.
  ),
  'preceded' => '[b-df-hj-np-tv-xz]',  // Preceded by any consonant.
  'change'   => 'i',                   // Change to 'i'.
  'add'      => 'es',                  // Add 'es'.
),

Then I made a single logic that runs on every set of rules, for every part_of_speech type.

We can add rules or remove them by adding or removing these arrays.

Feedback please :)
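
For readers following along, a single rule array of this shape might be applied roughly as follows. `example_apply_rule()` is a hypothetical name for this sketch, and the PR's real loop over rules and part_of_speech types may differ.

```php
<?php
// Hypothetical helper for illustration only; the PR's actual logic may differ.
function example_apply_rule( string $term, array $rule ): ?string {
	foreach ( $rule['endings'] as $ending ) {
		// The term must end with the optional 'preceded' class followed by the ending.
		$pattern = '/' . ( $rule['preceded'] ?? '' ) . preg_quote( $ending, '/' ) . '$/';
		if ( preg_match( $pattern, $term ) ) {
			// Drop the ending, then append the changed ending and the suffix.
			return substr( $term, 0, -strlen( $ending ) ) . $rule['change'] . $rule['add'];
		}
	}
	return null; // The rule does not apply to this term.
}

$rule = array(
	'endings'  => array( 'y' ),
	'preceded' => '[b-df-hj-np-tv-xz]', // Preceded by any consonant.
	'change'   => 'i',                  // Change to 'i'.
	'add'      => 'es',                 // Add 'es'.
);

var_dump( example_apply_rule( 'lady', $rule ) ); // "ladies"
var_dump( example_apply_rule( 'key', $rule ) );  // NULL, 'e' is not a consonant.
```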

@pedro-mendonca
Member Author

@ocean90 I've added all the Unit Tests for the suffixes, for nouns and verbs.
Feel free to review when you can.

@ocean90 ocean90 requested review from akirk and amieiro July 18, 2022 07:20
@pedro-mendonca
Member Author

pedro-mendonca commented Oct 22, 2023

@amieiro I've completed the full update related to your above suggestions.

NOUNS

Here is a summary:

4 ❌ Incorrect. Ending with '-an'. Suffix: '-en'. Words ending with "an" add an "s" to make the plural. Some examples:
Plan -> Plans, Can -> Cans, Pan -> Pans, Fan -> Fans, Clan -> Clans, Span -> Spans, Human -> Humans.
In English we have some words with irregular plurals (Woman -> Women, Man -> Men, Child -> Children, Foot -> Feet, Tooth -> Teeth). I think it doesn't make sense to add a full list of irregular plurals.
We should remove the unit test test_map_glossary_entries_to_translation_originals_with_nouns_ending_with_an_in_glossary (Test 5)

Fixed, removed all the entries and tests, old and new, about the irregular noun plurals for 'man' and 'woman'.


6 ❌ Incorrect. Fallback. We have this regex: "\w(?<!z|x|sh|ch|s|y|an|fe)". We need to remove the "an" (point 4) and add the "f" (point 5). This is the proposed regex: "\w(?<!z|x|sh|ch|s|y|f|fe)". Adds "s".
Visual examples:
chef -> chefs.
Unit test: test_map_glossary_entries_to_translation_originals_with_nouns_fallback_not_ending_with_s_in_glossary (Test 8)

Fixed, removed the exception 'an'. No need to add 'f', as it's matched by the fallback default '-s'.
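
The effect of dropping 'an' from the fallback lookbehind can be checked with a small script. The two patterns below are the one quoted in the review and the same pattern with only 'an' removed. The trailing `$` anchor and the way the pattern is used here are assumptions for illustration, not the PR's code.

```php
<?php
// Fallback lookbehind quoted in the review, before and after dropping 'an'.
// The '$' anchor is an assumption added so that only the end of the word is tested.
$before = '/\w(?<!z|x|sh|ch|s|y|an|fe)$/';
$after  = '/\w(?<!z|x|sh|ch|s|y|fe)$/';

foreach ( array( 'plan', 'chef', 'box', 'wife' ) as $noun ) {
	// 1 means the fallback '-s' suffix applies, 0 means it is excluded.
	printf(
		"%-5s before: %d after: %d\n",
		$noun,
		preg_match( $before, $noun ),
		preg_match( $after, $noun )
	);
}
```

With 'an' removed, 'plan' is picked up by the fallback, while 'chef' matches in both versions because a bare 'f' was never excluded.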


❓Comment. Have you considered the plural of some "-s" and "-z" endings? For some nouns that end in "-s" or "-z", you have to double the "-s" or "-z" and add "-es". For example:
fez –> fezzes
gas –> gasses
As it is not a rule for all endings, we can avoid it.

Not added, as it's not a regular case.


VERBS

1 ❌ Incorrect. Third-person singular. Ending in "-ss," "-z," "-x," "-sh," "-ch," or "-tch." -> Add "es". "-tch" is treated the same as "-ch". We need to add the "-s" case (e.g. bias -> biases).
Visual examples:
Pass -> passes.
Quiz -> quizes. (The correct plural is quizzes).
Fix -> fixes.
Push -> pushes.
Watch -> watches.
Unit test: test_map_glossary_entries_to_translation_originals_with_verbs_3rdperson_ending_with_sibilant_in_glossary (Test 9)
Example to add:
focus -> focuses.

Fixed, removed the '-ss' and added '-s' to match both situations. Added case bias and focus.


❓Comment. Here we have some examples not checked:
Base forms which end in vowel + single consonant. This depends on the stressed syllable, so we can create both, as we won't be able to know the stressed syllable.
commit. -> committed, commited (incorrect), committing, commiting (incorrect)
Vowel + l. The consonant is doubled if the base form ends in a vowel + l, whether the last syllable is stressed or not. I think we can't double the consonant with the current approach.
travel -> travelling, travelled

Fixed: the ending values no longer use null when there is no change. They now support placeholders (%s or %1$s) that stand for the current ending, which also allows repeating it to double the final consonant (committed, committing, etc.). The vowel + l case is matched by the added rules. As you mentioned, it's not simple to detect the stressed syllable, so there are two matches, one for the single and one for the doubled final consonant.
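
The placeholder idea can be illustrated with `sprintf()`, where '%1$s' repeated twice doubles the matched ending. `example_apply_ending()` and its parameters are hypothetical names for this sketch; how the PR stores and evaluates these values may differ.

```php
<?php
// Hypothetical helper for illustration only; the PR's actual logic may differ.
function example_apply_ending( string $term, string $ending, string $change, string $add ): ?string {
	if ( substr( $term, -strlen( $ending ) ) !== $ending ) {
		return null; // The term does not end with this ending.
	}
	// '%s' or '%1$s' in $change is replaced by the matched ending.
	$changed = sprintf( $change, $ending );
	return substr( $term, 0, -strlen( $ending ) ) . $changed . $add;
}

echo example_apply_ending( 'commit', 't', '%1$s%1$s', 'ed' ), "\n";  // committed
echo example_apply_ending( 'commit', 't', '%s', 'ed' ), "\n";        // commited
echo example_apply_ending( 'travel', 'l', '%1$s%1$s', 'ing' ), "\n"; // travelling
```

Both the single and the doubled variant are produced here on purpose, mirroring the two matches described above for cases where the stressed syllable cannot be detected.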


TESTING

Created the function check_map_glossary() to simplify the testing for the functions map_glossary_entries_to_translation_originals......().


Finally, do you think it would be useful to add a testing .pot and a glossary .csv to tests/phpunit/data, even though they are not currently used by any PHPUnit tests?

Thank you for your comprehensive review!

Member

@amieiro amieiro left a comment

Regarding the .pot and .csv files in tests/phpunit/data: as we don't use them in the automated tests, I suggest removing them from the PR and adding them to a GitHub Gist or to a separate repo. I think a GitHub Gist is a good option, so we can update them easily and discuss. But I think the best option would be to read both files in each test and use their data instead of hard-coding it (we can do this in a different PR).

tests/phpunit/testcases/test_template_helper_functions.php (review thread: outdated, resolved)
@pedro-mendonca
Member Author

pedro-mendonca commented Jan 10, 2024

Regarding the .pot and .csv files in tests/phpunit/data: as we don't use them in the automated tests, I suggest removing them from the PR and adding them to a GitHub Gist or to a separate repo. I think a GitHub Gist is a good option, so we can update them easily and discuss. But I think the best option would be to read both files in each test and use their data instead of hard-coding it (we can do this in a different PR).

Hi @amieiro, thanks for your review!!
I've removed both glossary testing example files, .pot and .csv.
I've also added the missing types from check_map_glossary().

@amieiro amieiro self-requested a review January 10, 2024 12:07
@amieiro amieiro enabled auto-merge (squash) January 10, 2024 12:07
@amieiro amieiro dismissed ocean90’s stale review January 10, 2024 12:10

The requested changes have been made in later commits.

@amieiro amieiro merged commit 18f1205 into GlotPress:develop Jan 10, 2024
6 checks passed
3 participants