
There are two elements: <div class="abc def"> and <div class="abc">.

I want to select the latter.

My code is

soup.find('div', {'class':'abc'})

However, it selects the former.

What is the correct way to do it?

4 Answers

The former element has two classes, abc and def (see e.g. How to assign multiple classes to an HTML container?), so BeautifulSoup correctly matches it when you use find(), which returns only the first match.

To get the second element, use findAll (spelled find_all in bs4), which returns a list, and take the second item:

soup.findAll('div', {'class':'abc'})[1]
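
A minimal sketch of the indexing approach, run against the two-element sample from the question. The HTML string and the variable names are illustrative assumptions, not the asker's real page. It depends on document order, so it only helps when the target is reliably the second match.

```python
from bs4 import BeautifulSoup

# Sample markup mirroring the two elements in the question (assumed, not the real page).
html = '<div class="abc def"></div><div class="abc"></div>'
soup = BeautifulSoup(html, "html.parser")

# A class filter matches any element that carries the class, so both divs are returned.
matches = soup.find_all("div", {"class": "abc"})

# Guard against pages that contain fewer matches than expected.
second = matches[1] if len(matches) > 1 else None
print(second)
```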
  • If you mean the last element of a list, you just type listname[-1]
    – cap.py
    Commented Oct 22, 2019 at 15:01

From the official documentation:

You can also search for the exact string value of the class attribute:

css_soup.find_all("p", class_="body strikeout")
# [<p class="body strikeout"></p>]
soup.find_all("div", class_="abc")
  • It will select both.
    – Chan
    Commented Oct 22, 2019 at 14:37

Try :nth-of-type(2) or :nth-child(2) with a CSS selector.

print(soup.select_one('.abc:nth-of-type(2)'))

Example:

from bs4 import BeautifulSoup

html = '''<div class="abc def"></div>
        <div class="abc"></div>'''

soup = BeautifulSoup(html, 'html.parser')
print(soup.select_one('.abc:nth-of-type(2)'))

Edited:

print(soup.select_one('.abc:not(.def)'))
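
A self-contained sketch of the :not() selector on the sample markup (the HTML string and variable names are assumptions for illustration). If the same selector yields None on a real page, the target element there probably carries additional classes or sits in a different structure than this sample.

```python
from bs4 import BeautifulSoup

# Sample markup mirroring the question (assumed, not the asker's real page).
html = '<div class="abc def"></div><div class="abc"></div>'
soup = BeautifulSoup(html, "html.parser")

# Select a div that has class "abc" but does not also have class "def".
only_abc = soup.select_one("div.abc:not(.def)")
print(only_abc)
```

Note that :not(.def) excludes only that one class; an element with class="abc xyz" would still match.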
  • It returns None.
    – Chan
    Commented Oct 22, 2019 at 14:50
  • @Chan: Please post your HTML structure. If your HTML looks like my example, this should work. If not, can you post the HTML?
    – KunduK
    Commented Oct 22, 2019 at 14:52
  • @Chan: Please check the edited answer. Hope this helps.
    – KunduK
    Commented Oct 22, 2019 at 14:56
  • It still returns None.
    – Chan
    Commented Oct 23, 2019 at 5:53

To get an exact class match, you can use the following lambda expression as a filter.

soup.find_all(lambda x: x.name == 'div' and ''.join(x.get('class', list())) == 'abc')

You can also wrap this in a function if you want. ''.join(x.get('class', list())) == 'abc' joins the classes (if there are any) and checks whether the result is equal to 'abc'.

Example

from bs4 import BeautifulSoup
html = """
<div class = "abc def"></div>
<div class = "abc"></div>
<div></div>
"""
soup = BeautifulSoup(html, 'html.parser')
print(
    soup.find_all(
        lambda x: x.name == 'div' and ''.join(x.get('class', list())) == 'abc'
    )
)

Output

[<div class="abc"></div>]
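
One caveat with joining: because the classes are concatenated with no separator, an element with class="a bc" also produces 'abc'. A possible variant, sketched here under the assumption that the default parser settings return class as a list, compares the list directly. The extra "a bc" element and the variable names are hypothetical, added only to show the difference.

```python
from bs4 import BeautifulSoup

# Sample markup; the "a bc" div is a hypothetical edge case added for illustration.
html = """
<div class="abc def"></div>
<div class="abc"></div>
<div class="a bc"></div>
<div></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Joining without a separator: "a" + "bc" also equals "abc".
joined = soup.find_all(
    lambda x: x.name == "div" and "".join(x.get("class", [])) == "abc"
)

# Comparing the class list itself matches only elements whose sole class is "abc".
exact = soup.find_all(lambda x: x.name == "div" and x.get("class") == ["abc"])

print(len(joined), len(exact))
```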
