Web Articles Scraping

Post by Semper » Sat Oct 26, 2013 10:39 am

Hi,

Would someone be kind enough to point me in the right direction, or just give me a hint?
How would I scrape all the articles from a site like http://www.jugantor.com (which has no RSS feeds) for later translation?

I've used Macro Scheduler for a lot of things, but never for the web, so I'm a little confused.

I started with this but got stuck:

Code:

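// Pause length in seconds used by Wait>delay below (assumed value)
Let>delay=1
IECreate>IE[0]
IENavigate>%IE[0]%,http://www.jugantor.com/,r
IEWait>%IE[0]%,r
Wait>delay
// Collect the outer HTML of every DIV into divArr_1 .. divArr_N
IEGetTags>www.jugantor.com,DIV,O,divArr
// Start from a clean output file
DeleteFile>C:\temp\jugantor.txt
If>divArr_count>0
  Let>k=0
  Repeat>k
    Let>k=k+1
    Let>this_div_html=divArr_%k%
    // Dump each DIV to the file for inspection
    WriteLn>C:\temp\jugantor.txt,resvar,%this_div_html%
  Until>k=divArr_count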
Endif

The WriteLn was just to see what I'm dealing with.
Or maybe there is a whole different approach that I'm not aware of?

Thanks.

Post by Marcus Tettmar » Mon Nov 11, 2013 2:25 pm

You are on the right lines.

This will help you identify the elements:
http://www.mjtnet.com/blog/2012/10/02/f ... web-sites/

Often I will use the extracttag function(s) to extract a container element and then use RegEx for the rest.
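
A minimal sketch of that container-then-RegEx idea, continuing from the script in the question. It assumes IEGetTags has already filled divArr, that the article links sit in ordinary href="..." attributes, and that the container of interest is the first DIV. The names container_index, pattern, links, and the output path C:\temp\jugantor_links.txt are hypothetical and only for illustration.

Code:

// Continues from the question's script: divArr_1 .. divArr_N hold DIV outer HTML
// container_index is hypothetical; pick the DIV that wraps the article list
Let>container_index=1
Let>container_html=divArr_%container_index%

// Keep the pattern in a variable; the lookbehind returns only the URL after href="
Let>pattern=(?<=href=")[^"]+
RegEx>pattern,container_html,0,links,num_links,0

If>num_links>0
  Let>k=0
  Repeat>k
    Let>k=k+1
    Let>this_link=links_%k%
    // Hypothetical output file, one link per line
    WriteLn>C:\temp\jugantor_links.txt,resvar,%this_link%
  Until>k=num_links
Endif

Running the RegEx against a single container rather than every DIV avoids matching the same link several times, since nested DIVs repeat their children's HTML. Which DIV is the right container depends on the page markup and has to be found by inspecting the dump first.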
