Need to find specific string on changing webpage

Technical support and scripting issues

Moderators: Dorian (MJT support), JRL

Post Reply
randybass
Newbie
Posts: 5
Joined: Tue Jun 21, 2005 4:52 pm
Location: Dallas, Texas

Need to find specific string on changing webpage

Post by randybass » Fri Nov 10, 2006 8:58 pm

This webpage changes from time to time, but the specific table I am looking for always stays the same.

How would I program a script to find a specific column or columns in a webpage. I am trying to post data to Excel located in this string DATA HERE

Here is the table I am trying to extract data from:



7.125%


7.451%




Any help would be appreciated.
Randy Bass
Dallas, Texas

User avatar
pgriffin
Automation Wizard
Posts: 460
Joined: Wed Apr 06, 2005 5:56 pm
Location: US and Europe

Post by pgriffin » Fri Nov 10, 2006 9:05 pm

Randy,

What's the URL? I am curious about how it operates. Maybe I can create something if I have the site to test with.

randybass
Newbie
Posts: 5
Joined: Tue Jun 21, 2005 4:52 pm
Location: Dallas, Texas

Post by randybass » Fri Nov 10, 2006 9:47 pm

The URL is http://www.mortgage.com/C3/application. ... el=default You will need to go through the items to get to the rates.

I am trying to extract the rate:



7.250%



I have been running WebRecorder (which works great!!) and then a script Marcus helped me with to extract data. The problem is, the webmaster adds or deletes a column from time to time which moves the information I am trying to extract. The location of the data stays the same. I am looking to modify the script below so that it will go and find one specific location of data even if the webmaster adjusts the Table by a column or two.

Here is what I do now:


//Open Excel Sheet
ExecuteFile>xlfilename
WaitWindowOpen>Microsoft Excel - RETAILRATES*
SetFocus>Microsoft Excel - RETAILRATES*

//Empty the sheet
//Press CTRL
//Send>a
//Release CTRL
//Press Del
Wait>1

//This bit loops through the table and populates Excel cells
VBSTART
VBEND

Let>k=274
Let>col=1
Let>row=1
Repeat>k
Let>TD%K%_SIZE=9999
LibFunc>hIE,ExtractTag,r,%IE[0]%,main,TD,k,0,TD%K%
MidStr>r_6,1,r,TD%K%
Let>v=TD%K%
StringReplace>v,CRLF,,v
StringReplace>v,",,v
VBEval>Trim("%v%"),v
If>{%v%=""}
Let>row=row+1
Let>col=1
ELSE
//MessageModal>v
DDEPoke>Excel,RETAILRATES.xls,R%row%C%col%,v
Let>col=col+1
Endif
Let>k=k+1
Until>k=293

Wait>1

//Save Excel
SetFocus>Microsoft Excel -*
Press CTRL
Send>s
Release CTRL

//Exit Excel
Press ALT
Send>fx
Release ALT

//Close IE
LibFunc>hIE,KillIE,r,%IE[0]%
Randy Bass
Dallas, Texas

User avatar
Marcus Tettmar
Site Admin
Posts: 7380
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK
Contact:

Post by Marcus Tettmar » Tue Nov 14, 2006 11:21 am

Hi Randy,

Apologies for not getting back to you sooner.

The problem with this kind of thing is that the script either needs to use an absolute indexing system .. i.e. start at the 52nd table cell, or scan the source for something unique that will identify the data. The problem with the latter is that often there is nothing unique to look for. Looking at your HTML snippet that looks like it may be the case here. So we instead have been using an indexing method. But that suffers when the page changes frequently and tables/columns are added/removed all the time.

Even if we scan through the table looking for a heading, it won't necessarily help us if a column is removed. E.g. on this page we would search for "30 Year Fixed" and then we know the subsequent cells have our data. We then scan cell by cell. But we could count the number of columns in the table, and look at the first row for column headings and make some decision based on that. But if the headings aren't consistent this will fail. This is also getting quite complicated. And if the webmaster decides to add a column, how do we know beforehand what it is?

This is an inherent problem with scraping data from websites - especially third party ones we have no control over. You are at the mercy of the website's developer and there's no possible way of knowing what the website will look like tomorrow. Often we just have to be prepared to tweak the script every time something changes.

My best suggestion is that we try to build the script so that it looks for the table column headings and based on those determines which columns contain the data you want, and then extracts the data. This could take a little time to develop but should be possible.
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?

Post Reply
Sign up to our newsletter for free automation tips, tricks & discounts