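Here is what the webpage looks like (see the HTML outline in the first code block below).
So I tried to parse it with the recipe code in the second code block below.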
But it turned out that although all the articles were fetched correctly, they all ended up in the first section (that is, the other section names were not fetched). I am at a loss what to do: every section name is in an "h3", but unlike the webpages of the built-in recipes, <div class='module'> appears only once, before the first section, rather than before every section, which I think explains the failure. Can anyone help me out? Even a quick answer would be appreciated.
Code:
<div class='module'>
<h3>Section1</h3>
……
<li>
<h4>articles links and article titles</h4>
</li>
……
<h3>Section2</h3>
……
<li>
<h4>articles links and article titles</h4>
</li>
……
Code:
for section in soup.findAll('div', attrs={'class':['module']}):
    h3 = section.find('h3')
    section_title = self.tag_to_string(h3)
    self.log('Found section:', section_title)
    articles = []
    for post in section.findAll('li'):
        h4 = post.findAll(['h4'])
        a = post.find('a', href=True)
        title = self.tag_to_string(a)
        url = a['href']
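One direction that might work is sketched below. It is untested against the real page: instead of looping over each <div class='module'>, walk the tags in document order, start a new section at every "h3", and attach each "li" link to the most recent section. The sketch uses only Python's standard html.parser so it runs on its own. The sample HTML, its URLs and titles, and the names SectionParser and group_by_section are all made up for illustration, and it assumes the link sits inside the "h4" of each "li".

```python
from html.parser import HTMLParser


class SectionParser(HTMLParser):
    """Group the first link of each <li> under the most recent <h3>."""

    def __init__(self):
        super().__init__()
        self.sections = []   # list of (section_title, [(title, url), ...])
        self._in_h3 = False
        self._in_li = False
        self._in_a = False
        self._text = []      # text pieces of the current <h3> or <a>
        self._url = None     # href of the first link in the current <li>

    def handle_starttag(self, tag, attrs):
        if tag == 'h3':
            self._in_h3 = True
            self._text = []
        elif tag == 'li':
            self._in_li = True
        elif tag == 'a' and self._in_li and self._url is None:
            href = dict(attrs).get('href')
            if href:
                self._in_a = True
                self._url = href
                self._text = []

    def handle_data(self, data):
        if self._in_h3 or self._in_a:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'h3' and self._in_h3:
            # Every <h3> opens a new section, even inside the same <div>.
            self._in_h3 = False
            self.sections.append((''.join(self._text).strip(), []))
        elif tag == 'a' and self._in_a:
            self._in_a = False
            title = ''.join(self._text).strip()
            # Links that appear before any <h3> are ignored in this sketch.
            if self.sections:
                self.sections[-1][1].append((title, self._url))
        elif tag == 'li':
            self._in_li = False
            self._url = None


# Hypothetical sample mirroring the page outline above; not the real page.
SAMPLE = """
<div class='module'>
<h3>Section1</h3>
<ul><li><h4><a href="/a1">Article 1</a></h4></li>
<li><h4><a href="/a2">Article 2</a></h4></li></ul>
<h3>Section2</h3>
<ul><li><h4><a href="/b1">Article 3</a></h4></li></ul>
</div>
"""


def group_by_section(html):
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    return parser.sections


if __name__ == '__main__':
    for section_title, articles in group_by_section(SAMPLE):
        print(section_title, articles)
```

Inside the recipe itself, the same idea could probably be expressed by iterating over section.findAll(['h3', 'li']), since BeautifulSoup returns the matching tags in document order: when the tag is an "h3", store the previous section and start a new one; when it is an "li", add its link to the current article list.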