Channel: MobileRead Forums - Calibre

Sometimes the browser seems to temporarily lose its session

I'm working on a recipe to get at the subscriber-only content for the Boston Globe (the existing recipes all seem to use the fairly limited RSS feeds the Globe offers for non-subscribers, and as a subscriber I'd like everything). It logs in, then scrapes the "Today's Paper" page for sections and articles, sets them up as feeds, and then does some processing on the actual articles to clean them up.
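For illustration only, the "sets them up as feeds" step might look roughly like the sketch below. This is not the actual recipe: `build_index` and the `(section, title, url)` input tuples are hypothetical names, and the example URLs are placeholders. The only thing taken from calibre is the shape that a recipe's `parse_index` is expected to return, a list of `(section title, list of article dicts)` pairs.

```python
from collections import OrderedDict


def build_index(scraped):
    """Group scraped (section, title, url) tuples into the structure a
    calibre recipe returns from parse_index: a list of
    (section_title, [article_dict, ...]) pairs, in first-seen order.

    `scraped` is a hypothetical intermediate format; the real recipe may
    collect its links differently.
    """
    sections = OrderedDict()
    for section, title, url in scraped:
        sections.setdefault(section, []).append({
            'title': title,
            'url': url,
            'date': '',
            'description': '',
        })
    return list(sections.items())


if __name__ == '__main__':
    # Placeholder data standing in for links scraped from "Today's Paper".
    demo = [
        ('Metro', 'Article A', 'http://example.com/a'),
        ('Sports', 'Article B', 'http://example.com/b'),
        ('Metro', 'Article C', 'http://example.com/c'),
    ]
    for section, articles in build_index(demo):
        print(section, [a['title'] for a in articles])
```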

Seems like pretty standard stuff, from what I've seen in the existing recipes and here on the forum.

However, I've run into a very bizarre situation. For some articles, instead of getting the subscriber content for the article, calibre ends up with the non-subscriber content (that is, the soup passed into post_process_html contains the non-subscriber content). What's weird is that if I open my browser, log in, and go to the article's URL (the one that calibre is using), I get the subscriber content.

So the URL itself isn't the problem; rather, it looks like in these cases calibre's Python browser has lost its login session, or something like that.

I've managed to hack around the problem by having post_process_html recognize the two flavors of page and do the right thing. In the non-subscriber page case, that involves finding a "Next" link and doing another page fetch, since one thing they do is split the articles across multiple pages.
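The idea of that workaround can be sketched as follows. This is a simplified illustration, not the code from the recipe: the marker string, the "Next" link pattern (which assumes the `class` attribute comes before `href`), and the names `is_non_subscriber` and `collect_pages` are all assumptions, and the real Globe markup will differ. The `fetch` argument is any callable that takes a URL and returns HTML; in a recipe it would presumably wrap the recipe's own browser.

```python
import re

# Hypothetical markers: assumptions for illustration, not real Globe markup.
NON_SUBSCRIBER_MARKER = 'class="paywall-teaser"'
NEXT_LINK = re.compile(r'<a[^>]*class="next"[^>]*href="([^"]+)"')


def is_non_subscriber(html):
    """Guess which flavor of page came back."""
    return NON_SUBSCRIBER_MARKER in html


def collect_pages(first_html, fetch, max_pages=10):
    """Return the list of HTML pages that make up one article.

    A subscriber page is returned as-is. A non-subscriber page is split
    across several pages, so follow "Next" links until none is left,
    a link repeats, or max_pages is reached.
    """
    pages = [first_html]
    if not is_non_subscriber(first_html):
        return pages
    html = first_html
    seen = set()
    while len(pages) < max_pages:
        match = NEXT_LINK.search(html)
        if match is None or match.group(1) in seen:
            break
        seen.add(match.group(1))
        html = fetch(match.group(1))
        pages.append(html)
    return pages
```

The `seen` set and the `max_pages` cap are only there so that a page linking back to itself cannot cause an endless fetch loop.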

However, this is not a very palatable solution. And in at least one case, it fails - if the editorial cartoon ends up in non-subscriber mode, the cartoon itself isn't fetched because they don't let non-subscribers see it.

What I'd really like to do is to figure out why this is happening and put in some magic to avoid it. Has anyone seen anything like this?

Oh, and by the way, if there are other Boston Globe subscribers out there, I'll be happy to share the recipe once it's ready.
