Computing desk


November 19

Disguised characters?

This morning I came across some disruptive editing involving a user copying and pasting content to "new" titles that, to the naked eye, look identical to the titles of existing articles. I assume that some of the characters differ at the encoding level. What are the differences? I'd appreciate some tips on how I could examine them to see the differences for myself in the future. Below are the blue links to our actual articles, and the red links to the identical-seeming page titles that were created:

Thanks--Fuhghettaboutit (talk) 01:33, 19 November 2015 (UTC)[reply]

cf Homoglyph -- Finlay McWalterᚠTalk 01:39, 19 November 2015 (UTC)[reply]

As to comparing the two - ideally there'd be some web service, but I'm not aware of one. I wrote this little Python3 script for this job:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

# paste the two strings you want to show into these two variables:
s="The Game (rapper)"
t="Тhе Gаmе (rарреr)"

import unicodedata

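# walk both strings in step (zip stops at the end of the shorter string);
# a "!" marks each position where the two characters differ
for c, d in zip(s, t):
    print("{:s} {:s} U+{:04x} U+{:04x} {:s} {:30s} {:30s}".format(
        c, d, ord(c), ord(d),
        "!" if c != d else " ",
        # the second argument is a fallback, so unnamed characters
        # (e.g. control characters) don't raise ValueError
        unicodedata.name(c, "<unnamed>"),
        unicodedata.name(d, "<unnamed>"),
    ))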

which for the first two titles you mentioned will show:

T Т U+0054 U+0422 ! LATIN CAPITAL LETTER T         CYRILLIC CAPITAL LETTER TE    
h h U+0068 U+0068   LATIN SMALL LETTER H           LATIN SMALL LETTER H          
e е U+0065 U+0435 ! LATIN SMALL LETTER E           CYRILLIC SMALL LETTER IE      
    U+0020 U+0020   SPACE                          SPACE                         
G G U+0047 U+0047   LATIN CAPITAL LETTER G         LATIN CAPITAL LETTER G        
a а U+0061 U+0430 ! LATIN SMALL LETTER A           CYRILLIC SMALL LETTER A       
m m U+006d U+006d   LATIN SMALL LETTER M           LATIN SMALL LETTER M          
e е U+0065 U+0435 ! LATIN SMALL LETTER E           CYRILLIC SMALL LETTER IE      
    U+0020 U+0020   SPACE                          SPACE                         
( ( U+0028 U+0028   LEFT PARENTHESIS               LEFT PARENTHESIS              
r r U+0072 U+0072   LATIN SMALL LETTER R           LATIN SMALL LETTER R          
a а U+0061 U+0430 ! LATIN SMALL LETTER A           CYRILLIC SMALL LETTER A       
p р U+0070 U+0440 ! LATIN SMALL LETTER P           CYRILLIC SMALL LETTER ER      
p р U+0070 U+0440 ! LATIN SMALL LETTER P           CYRILLIC SMALL LETTER ER      
e е U+0065 U+0435 ! LATIN SMALL LETTER E           CYRILLIC SMALL LETTER IE      
r r U+0072 U+0072   LATIN SMALL LETTER R           LATIN SMALL LETTER R          
) ) U+0029 U+0029   RIGHT PARENTHESIS              RIGHT PARENTHESIS

So that shows the first letter of the two strings is different (with a !) and you see that while the "good" one is a normal ASCII/Latin 'T', the other is a Cyrillic letter that, in many fonts, resembles a T but isn't one. -- Finlay McWalterᚠTalk 02:04, 19 November 2015 (UTC)[reply]
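For checking a single title without a known-good copy to compare against, a rough sketch along the same lines is to look at which scripts its letters come from. This is only a heuristic under stated assumptions: Python's `unicodedata` module does not expose the Unicode Script property, so the first word of each character name is used as a stand-in, and the function names `scripts_used` and `looks_spoofed` are made up for this illustration. Legitimate mixed-script titles would be flagged too.

```python
import unicodedata

def scripts_used(text):
    """Return the set of script names (first word of each Unicode character
    name) for the alphabetic characters in text."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # e.g. "CYRILLIC SMALL LETTER IE" -> "CYRILLIC"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def looks_spoofed(text):
    """Heuristic only: flag text whose letters come from more than one script."""
    return len(scripts_used(text)) > 1

genuine = "The Game (rapper)"
# The lookalike title, with its Cyrillic TE, IE, A and ER written as escapes
# so that the substitutions are visible in the source.
lookalike = "\u0422h\u0435 G\u0430m\u0435 (r\u0430\u0440\u0440\u0435r)"

print(sorted(scripts_used(genuine)), looks_spoofed(genuine))
print(sorted(scripts_used(lookalike)), looks_spoofed(lookalike))
```

The genuine title uses only Latin letters, while the lookalike mixes Latin and Cyrillic, which is what gives it away.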

Thanks Finlay. I'm surprised there's no common and easy resource to examine these – which leaves people like me, who are quite unable to say "I just whipped up a python script", in the dark!--Fuhghettaboutit (talk) 03:52, 19 November 2015 (UTC)[reply]

This Cisco technical document, Homoglyph Advanced Phishing Attacks, notes that these kinds of characters are used as part of a very modern and increasingly common security attack: the IDN homograph attack. The Cisco page also has links to some additional resources, including scripts and detailed explanations. Nimur (talk) 06:38, 19 November 2015 (UTC)[reply]
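To illustrate why the homograph trick works for domain names, here is a minimal sketch using Python's built-in "idna" codec (which implements the older IDNA 2003 rules). The domain below is a made-up lookalike of example.com, chosen only for illustration; it is not taken from the Cisco document.

```python
# Hypothetical lookalike of "example.com": the first letter is
# U+0435 CYRILLIC SMALL LETTER IE rather than Latin "e".
lookalike_domain = "\u0435xample.com"
genuine_domain = "example.com"

# The "idna" codec converts each non-ASCII label to its ASCII-compatible
# "xn--" (Punycode) form; pure-ASCII labels pass through unchanged.
encoded_genuine = genuine_domain.encode("idna")
encoded_lookalike = lookalike_domain.encode("idna")

print(encoded_genuine)
print(encoded_lookalike)
```

The two names may render identically on screen, but the lookalike encodes to an "xn--" name that is plainly different from the genuine one.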

If you copy the URLs that link to the false titles and paste them in a text editor, you'll find that Тhе Gаmе (rарреr) is actually %D0%A2h%D0%B5_G%D0%B0m%D0%B5_(r%D0%B0%D1%80%D1%80%D0%B5r) and List оf dесеаsеd hiр hор аrtists is List_%D0%BEf_d%D0%B5%D1%81%D0%B5%D0%B0s%D0%B5d_hi%D1%80_h%D0%BE%D1%80_%D0%B0rtists in percent encoding. IE and Edge will display the URLs as such. --Paul_012 (talk) 15:57, 19 November 2015 (UTC)[reply]
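The percent-encoded form can also be decoded programmatically. As a small sketch, assuming the escapes are UTF-8 (which `urllib.parse.unquote` uses by default), this lists the non-ASCII characters hidden in the first of the two URLs above:

```python
from urllib.parse import unquote
import unicodedata

encoded = "%D0%A2h%D0%B5_G%D0%B0m%D0%B5_(r%D0%B0%D1%80%D1%80%D0%B5r)"
decoded = unquote(encoded)  # percent escapes are decoded as UTF-8 by default

# Keep only the characters outside the ASCII range and name each one.
non_ascii = [ch for ch in decoded if ord(ch) > 0x7F]
for ch in non_ascii:
    print("U+{:04X} {}".format(ord(ch), unicodedata.name(ch, "<unnamed>")))
```

Every escape of the form %D0%xx or %D1%xx here is a two-byte UTF-8 sequence for a Cyrillic letter; the plain letters in between are ordinary ASCII.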

I use a Firefox extension, Character Identifier, which gives the Unicode (numbers and names) for the selected text. —Tamfang (talk) 05:19, 22 November 2015 (UTC)[reply]

Why don't FOLLOW sets contain epsilon?

Could anyone give an intuitive explanation of why FOLLOW sets don't contain epsilon? It would be even better if you could also provide an example. JUSTIN JOHNS (talk) 09:10, 19 November 2015 (UTC)[reply]

I haven't studied this stuff in a long time, but it's not clear to me what ε in a FOLLOW set would mean. A nonterminal X can either expand to 0 terminals or to 1 or more terminals; in the former case, ε is in FIRST(X), and in the latter case, the leftmost terminal is in FIRST(X). The substring generated by a nonterminal X can either be at the end of the string or not; in the former case $ is in FOLLOW(X), and in the latter case the leftmost following terminal is in FOLLOW(X). Maybe the role you're imagining for ε in FOLLOW sets is actually filled by $. -- BenRG (talk) 14:24, 19 November 2015 (UTC)[reply]

This was pretty incomprehensible to someone not familiar with the terminology. I'm hazarding a guess that the relevant article is LL parser. --NorwegianBlue^talk 13:27, 20 November 2015 (UTC)[reply]

Does the LL parser article say anything about why epsilon is not included in FOLLOW sets? I don't think so. JUSTIN JOHNS (talk) 06:30, 23 November 2015 (UTC)[reply]
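As a concrete example of BenRG's point, here is a minimal sketch of the usual fixed-point computation of FIRST and FOLLOW. The grammar (S → A b A, A → a | ε) is made up for illustration, and the helper names are assumptions of this sketch. FOLLOW(X) collects the terminals that can appear immediately after X in the input; ε is not a token that can appear in the input, so whenever "nothing follows", the end marker $ is recorded instead.

```python
EPS, END = "ε", "$"

# Hypothetical grammar: S -> A b A ; A -> a | ε   (() is the empty right-hand side)
GRAMMAR = {
    "S": [("A", "b", "A")],
    "A": [("a",), ()],
}
START = "S"

def first_of_sequence(symbols, first):
    """FIRST of a symbol sequence; contains EPS only if every symbol can vanish."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for nt, productions in GRAMMAR.items():
            for rhs in productions:
                new = first_of_sequence(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)  # the start symbol can be followed by end of input
    changed = True
    while changed:
        changed = False
        for nt, productions in GRAMMAR.items():
            for rhs in productions:
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue  # FOLLOW is only defined for nonterminals
                    rest = first_of_sequence(rhs[i + 1:], first)
                    new = rest - {EPS}  # ε itself is never copied into FOLLOW
                    if EPS in rest:     # everything after sym can vanish,
                        new |= follow[nt]  # so inherit what follows the parent
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

first = compute_first()
follow = compute_follow(first)
print("FIRST:", first)
print("FOLLOW:", follow)
```

In this grammar A can expand to nothing, so ε is in FIRST(A). The first A in S → A b A is followed by b, and the second A sits at the end of the string, so FOLLOW(A) is {b, $}: the "nothing comes next" case shows up as $, not as ε.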

Can a string be accepted during parsing if the input is exhausted but productions are yet to derive?

I would like to know whether a string can be accepted if the input is exhausted but there are productions still to derive. I would also like to know whether such a situation can occur in parsing. JUSTIN JOHNS (talk) 10:12, 19 November 2015 (UTC)[reply]

A string is accepted if and only if it can be generated by the grammar. If you're talking about a situation where the string is "abc", and the parser's state looks something like "a b c . X Y", then the string will be accepted iff both X and Y can generate the empty string. -- BenRG (talk) 14:26, 19 November 2015 (UTC)[reply]
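The condition in the reply above (every remaining symbol must be able to generate the empty string) can be sketched as a "nullable" computation. The grammar and the function names here are hypothetical, chosen only to show one pending symbol that can vanish directly, one that can vanish indirectly, and one that cannot.

```python
# Hypothetical grammar: X -> ε ; Y -> y | X X ; Z -> z
GRAMMAR = {
    "X": [()],
    "Y": [("y",), ("X", "X")],
    "Z": [("z",)],
}

def nullable_nonterminals(grammar):
    """Return the nonterminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:  # repeat until no new nullable nonterminal is found
        changed = False
        for nt, productions in grammar.items():
            if nt in nullable:
                continue
            # nt is nullable if some right-hand side consists only of
            # nullable symbols (an empty right-hand side trivially does)
            if any(all(sym in nullable for sym in rhs) for rhs in productions):
                nullable.add(nt)
                changed = True
    return nullable

def can_accept_at_end(pending, grammar):
    """True if every still-pending symbol can derive the empty string."""
    nullable = nullable_nonterminals(grammar)
    return all(sym in nullable for sym in pending)

print(can_accept_at_end(["X", "Y"], GRAMMAR))  # both can vanish
print(can_accept_at_end(["X", "Z"], GRAMMAR))  # Z must still produce "z"
```

So with the input exhausted at "a b c . X Y", the string is accepted for this grammar, whereas "a b c . X Z" is rejected because Z still needs a terminal that the input no longer has.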

Swelling of old lithium-ion batteries

I've always regarded it to be common knowledge that swelling of old lithium-ion batteries is a normal effect resulting from degradation as they age. This effect has been consistently exhibited in all the old batteries I've discarded over the past ten years, and using the "spin test" to determine that a battery has passed its prime also appears to be common knowledge among acquaintances. However, a Google search today only gives results that say swollen batteries are a dangerous situation resulting either from misuse or malfunction, and that they should be discarded immediately, with no mention of it being part of the normal ageing process. What gives? --Paul_012 (talk) 16:11, 19 November 2015 (UTC)[reply]

Spin test? [1] 220 of ^Borg 17:48, 19 November 2015 (UTC)[reply]

"Malfunction" may just be another word for normal "degradation as they age". This also applies to people. :-) StuRat (talk) 18:44, 19 November 2015 (UTC)[reply]

Spin test: a battery that is a cylinder...

  -
 | |
  -

...on its side will not spin well, but a battery that is slightly barrel shaped...

  -
 ( )
  -

...will spin very well.

What the spin test does is to identify batteries that are only slightly swollen -- not enough to see with your eyes.

A tiny bit of swelling is not exactly "normal" but it also isn't all that uncommon and it isn't worth worrying about.

What a tiny bit of swelling usually means is that your system either has a tiny bit of overcharging or a tiny bit of overheating.

--Guy Macon (talk) 19:31, 19 November 2015 (UTC)[reply]

• Lithium-ion battery convenience link.
That link I added above seemed to apply more to the little flat batteries in mobile phones, but I can see how it would still work. There are of course cases where the battery is grossly swollen, and can explode, or catch fire. Apparently on some laptops battery swelling causes the trackpad button/s to misbehave. [2]
Paul 012, it may be helpful if you can give the exact Google search you did. 220 of ^Borg 00:00, 20 November 2015 (UTC)[reply]

Search Google for "lithium batteries lose lithium aging". How lithium batteries "lose" their lithium as they age is still being researched. --Hans Haase (有问题吗) 11:29, 21 November 2015 (UTC)[reply]