StringReplace parsing

GalaxyMan
Junior Coder
Posts: 40
Joined: Sat Jun 27, 2009 7:21 pm

Post by GalaxyMan » Wed Jul 15, 2009 4:02 pm

I have an HTML document with several tables in it. The tables are of different size in terms of both rows and columns and they vary every time. They are never the same twice.

I need to change the <td> tag in each cell of two adjacent columns of each table. Even though there are a different number of columns in each table, the two column/cells I need to change are ALWAYS the 2nd & 3rd <td> tags after a <tr> tag.

Right now I've created a macro that does what I need, but I need to set it for the specific number of rows & columns each time and manually run it on each table.

Any suggestions on how to accomplish this in one shot with some sort of adaptive parsing would be greatly appreciated.

Thank you,

Ronen

Bob Hansen
Automation Wizard
Posts: 2475
Joined: Tue Sep 24, 2002 3:47 am
Location: Salem, New Hampshire, US

Post by Bob Hansen » Wed Jul 15, 2009 6:19 pm

I suspect we can use RegEx to do this.

Could you provide some lines of the HTML document, showing the "before" and desired "after" results? Could use that to do some RegEx testing.
Hope this was helpful..................good luck,
Bob
A humble man and PROUD of it!

Post by GalaxyMan » Wed Jul 15, 2009 6:46 pm

Hi, Bob.

Here is a bit, showing the starting <table> tag and then going down two rows. Keep in mind, each table has a different number of <td> tags, but no matter how many it has, I'm only interested in the 3rd & 4th ones. I've deliberately cut out the header row, as this particular exercise does not apply to the header row, but there is one in the actual code.

Here's before:

Code:

<table>
<tr>
<td>1</td>
<td>R3003808</td>
<td>214 Via Palacio</td>
<td>Palacio</td>
<td>5</td>
<td>6.3</td>
<td>Y</td>
<td>$3,875,000</td>
<td>GOLF</td>
</tr>
<tr>
<td>2</td>
<td>R2997138</td>
<td>124 Via Palacio</td>
<td>Mirasol PALACIO</td>
<td>5</td>
<td>6.2</td>
<td>Y</td>
<td>$3,650,000</td>
<td>LAKE</td>
</tr>
Here's after:

Code:

<table>
<tr>
<td>1</td>
<td>R3003808</td>
<td>214 Via Palacio</td> class="tdLeft" after the td
<td>Palacio</td> class="tdLeft" after the td
<td>5</td>
<td>6.3</td>
<td>Y</td>
<td>$3,875,000</td>
<td>GOLF</td>
</tr>
<tr>
<td>2</td>
<td>R2997138</td>
<td>124 Via Palacio</td> class="tdLeft" after the td
<td>Mirasol PALACIO</td> class="tdLeft" after the td
<td>5</td>
<td>6.2</td>
<td>Y</td>
<td>$3,650,000</td>
<td>LAKE</td>
</tr>
That 'class="tdLeft" after the td' is because the text input window won't accurately output what I've input...

Thanks for your help.

Ronen

Post by Bob Hansen » Wed Jul 15, 2009 6:55 pm

Not clear what the extra class info is supposed to look like:
<td>214 Via Palacio</td> class="tdLeft" after the td

Have you tried to disable the HTML in the posting to get the text format that you want?

Post by GalaxyMan » Wed Jul 15, 2009 7:07 pm

Bob Hansen wrote:Have you tried to disable the HTML in the posting to get the text format that you want?
Never occurred to me... :)

<td class="tdLeft">214 Via Palacio</td>

That is what the 3rd and 4th <td> tags are supposed to look like after the change.

Thanks...

Post by Bob Hansen » Wed Jul 15, 2009 9:40 pm

OK.

Now that I see it, I should have understood.

I have done some prelim RegEx tests and feel good this can be done. But I am having some overheating problems with computer right now, may be another day before I have the solution.

Post by Bob Hansen » Wed Jul 15, 2009 9:44 pm

Wow, that was a quick day.......I was able to get this done before the system crashed, had the prelim done already, just needed the final detail that you provided.

This seems to work for me:

Code:

Let>vSample=<table><tr><td>1</td><td>R3003808</td><td>214 Via Palacio</td><td>Palacio</td><td>5</td><td>6.3</td><td>Y</td><td>$3,875,000</td><td>GOLF</td></tr><tr><td>2</td><td>R2997138</td><td>124 Via Palacio</td><td>Mirasol PALACIO</td><td>5</td><td>6.2</td><td>Y</td><td>$3,650,000</td><td>LAKE</td></tr>

Let>vNeedle=(<tr>.*?</td>.*?</td>)(<td>)(.*?)(<td>)(.*?</tr>)
Let>vHaystack=%vSample%
Let>vReplacement=$1<td class="tdLeft">$3<td class="tdLeft">$5

RegEx>%vNeedle%,%vHaystack%,0,matches,matchnum,1,%vReplacement%,vNewData

MessageModal>New data is %vNewData%
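
For anyone who wants to try the same pattern outside Macro Scheduler, here is a rough Python sketch of the replacement above. It is an illustration only, not Macro Scheduler code: Python's `re` module writes backreferences as `\g<1>` rather than `$1`, the sample is a shortened copy of the rows posted earlier, and the helper name `add_td_class` is made up for this example.

```python
import re

# Shortened copy of the sample rows from the thread, on one line like vSample.
SAMPLE = (
    "<table>"
    "<tr><td>1</td><td>R3003808</td><td>214 Via Palacio</td>"
    "<td>Palacio</td><td>5</td></tr>"
    "<tr><td>2</td><td>R2997138</td><td>124 Via Palacio</td>"
    "<td>Mirasol PALACIO</td><td>5</td></tr>"
)

# Same needle as vNeedle: row start plus two cells, tag 3, cell 3, tag 4, rest of row.
NEEDLE = re.compile(r"(<tr>.*?</td>.*?</td>)(<td>)(.*?)(<td>)(.*?</tr>)")

# Same idea as vReplacement: keep groups 1, 3 and 5, swap in new tags for 2 and 4.
REPLACEMENT = r'\g<1><td class="tdLeft">\g<3><td class="tdLeft">\g<5>'


def add_td_class(html: str) -> str:
    """Return html with class="tdLeft" added to the 3rd and 4th cell of each row."""
    return NEEDLE.sub(REPLACEMENT, html)


if __name__ == "__main__":
    print(add_td_class(SAMPLE))
```

Because the substitution is applied to every non-overlapping match, both sample rows are rewritten in a single call, which mirrors how the script handles every row at once.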

Post by GalaxyMan » Thu Jul 16, 2009 4:55 am

Hi, Bob.

I must admit, this code above is so far over my head, I have no clue what it is about or does. :(

On another note, sometimes my tables are several hundred rows long, and there are several tables in each document that I need to deal with. It strikes me as VERY unwieldy to have to put the entire table each time into a Let>vSample statement. Then it becomes faster and easier to just use the macro I have and run it on each table.

Since I don't know, I was imagining something like searching for a <tr> tag, then searching for the 3rd and 4th <td> tags after the <tr> tag, modifying them, stopping at the first </tr> tag and then reloading to search for the next <tr>, something like that.

Since I don't know or understand this stuff, I'm allowed to let my imagination run away with itself as I try to figure out how things might possibly work. :)

Thanks again...

gdyvig
Automation Wizard
Posts: 447
Joined: Fri Jun 27, 2008 7:57 pm
Location: Seattle, WA

Subroutine

Post by gdyvig » Thu Jul 16, 2009 6:01 am

Hi Ronen,

You wrote:
I need to change the <td> tag in each cell of two adjacent columns of each table. Even though there are a different number of columns in each table, the two column/cells I need to change are ALWAYS the 2nd & 3rd <td> tags after a <tr> tag.
and
It strikes me as VERY unwieldy to have to put the entire table each time into a Let>vSample statement.
Bob was not finished yet. He was demonstrating how RegEx changes what is in vSample. You don't have to load it in with Let statements. The next step is to have the script find the value of vSample for you for each table. Then run Bob's code against it.

I'm still a beginner so far as the RegEx command is concerned. However, I know you can make it do a lot with very few statements, and it runs very fast. Macro Scheduler has other string commands, but your script would become longer and more complex.

I'm not ready yet to finish the script, but want to let you know it can be done and probably in not very many lines of code.

Gale

Post by Bob Hansen » Thu Jul 16, 2009 12:55 pm

Actually, the script I provided is complete.

What you need to do is replace vSample with your real info. My script used your text as the vSample to be processed. It just looks different without the visible line breaks.

If you have a web page with many tables on it, saved as a file named MyTables1.htm, just read the file into a variable and run the rest of the script, like this:

Code:

//This line replaces the Let>vSample= line from original sample script.
ReadFile>Drive\Path\MyTables1.htm,vSample

Let>vNeedle=(<tr>.*?</td>.*?</td>)(<td>)(.*?)(<td>)(.*?</tr>)
Let>vHaystack=%vSample%
Let>vReplacement=$1<td class="tdLeft">$3<td class="tdLeft">$5

RegEx>%vNeedle%,%vHaystack%,0,matches,matchnum,1,%vReplacement%,vNewData

MessageModal>New data is %vNewData%
The script above will go to every table in the file, and insert the class text into the third and fourth cell in every row of every table.
The final message that pops up will be the original file with all the replacements as the variable vNewData.

The RegEx is looking for a Needle in a Haystack and Replacing the Needle as you define it.

The Haystack is the file you are searching through, MyTables1.htm
The Needle is each row of a table, but grouping the different cells for replacement.
The Replacement is keeping all of the original groups, but inserting "class text" into the third and fourth cell tags.

If the file MyTables1.htm has 300 tables, they will all be processed immediately, inserting the text into the third and fourth cell of every row.
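
The script above only shows the result in a message box. As a hedged illustration of the full round trip (read the file, replace, write the result to a new file), here is a Python sketch. It is not Macro Scheduler code, and it makes two assumptions that the thread does not confirm: that the real files put each cell on its own line, as in the sample rows posted earlier, and that the output should go to a separate file. The pattern in the script expects each `</td>` to be followed directly by `<td>`, so this sketch adds `\s*` between cells and lets `.` cross line breaks. The function name `convert_file` is made up for the example.

```python
import re
from pathlib import Path

# The thread's pattern, loosened so cells may sit on separate lines.
# Group 1: row start plus the first two cells; group 2: cell 3 content
# through its closing tag; group 3: the rest of the row.
ROW = re.compile(
    r"(<tr>.*?</td>.*?</td>\s*)<td>(.*?</td>\s*)<td>(.*?</tr>)",
    re.DOTALL,  # allow "." to match line breaks
)
REPL = r'\g<1><td class="tdLeft">\g<2><td class="tdLeft">\g<3>'


def convert_file(src: Path, dst: Path) -> int:
    """Read src, tag the 3rd and 4th cell of each row, write dst.

    Returns the number of rows that were changed.
    """
    html = src.read_text(encoding="utf-8")
    new_html, changed = ROW.subn(REPL, html)
    dst.write_text(new_html, encoding="utf-8")
    return changed
```

A call such as `convert_file(Path("MyTables1.htm"), Path("MyTables1_new.htm"))` would leave the original untouched, which makes it easy to compare the two files before replacing anything. The second file name is a placeholder.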

===========================
Explanation of RegEx syntax:
Needle.....= (<tr>.*?</td>.*?</td>)(<td>)(.*?)(<td>)(.*?</tr>) (Find this: each row in a table)
(<tr>.*?</td>.*?</td>) = Group1, beginning of a row and all text in the first two cells
(<td>) = Group2, the tag for cell3
(.*?) = Group3, all the info inside cell3 (through its closing </td>)
(<td>) = Group4, the tag for cell4
(.*?</tr>) = Group5, all of the remaining text for that row

Replacement.....= $1<td class="tdLeft">$3<td class="tdLeft">$5 (Replace what you find with this)
$1 = Everything from Group1
<td class="tdLeft"> = new text to replace Group2
$3 = Everything from Group3
<td class="tdLeft"> = new text to replace Group4
$5 = Everything from Group5
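
To see the grouping in action, the pattern can be run against a single row and each captured group printed. The following is a small Python sketch for experimenting, not Macro Scheduler code; the row is a shortened copy of the first sample row.

```python
import re

# One shortened sample row, written on a single line.
ROW_TEXT = (
    "<tr><td>1</td><td>R3003808</td><td>214 Via Palacio</td>"
    "<td>Palacio</td><td>5</td></tr>"
)

# The same five-group needle used in the script.
NEEDLE = re.compile(r"(<tr>.*?</td>.*?</td>)(<td>)(.*?)(<td>)(.*?</tr>)")

match = NEEDLE.search(ROW_TEXT)
groups = match.groups() if match else ()

for number, text in enumerate(groups, start=1):
    print(f"Group{number}: {text}")

# Prints:
# Group1: <tr><td>1</td><td>R3003808</td>
# Group2: <td>
# Group3: 214 Via Palacio</td>
# Group4: <td>
# Group5: Palacio</td><td>5</td></tr>
```

Note that Group3 carries the closing `</td>` of cell 3 along with its text, which is why the replacement only has to supply the two new opening tags.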

Post by GalaxyMan » Thu Jul 16, 2009 1:38 pm

Amazing. I can feel myself actually beginning to grok what you've done.

One problem is that I have many tables in my document, but only 4 of them are relevant to this process. They all have a unique tag:



Is there any way to limit this action to only tables displaying this particular table tag?

Will this also affect the rows with headers or is it only going to deal with non-header rows?

Example:

Smile!

Also, is there any reason to NOT put this code after all of my StringReplace code (approx. 30 different StringReplace), so that it is the last thing that runs, or would I be better off running it independently?

Thanks again...

Ronen

Post by Bob Hansen » Thu Jul 16, 2009 4:41 pm

1. Is there any way to limit this action to only tables displaying this particular table tag?
I am pretty sure the answer is Yes, this can be restricted to those tables only. I have no time to do the code now, but you need to add to the front of Group1 and to the end of group5. They both need some extra qualifiers, not just this text alone. They need to allow for other text and make sure only a single table is processed at a time (non greedy).
-------------------
2. Will this also affect the rows with headers or is it only going to deal with non-header rows?
Yes, EVERY TABLE, and EVERY ROW that has 4 or more cells would be changed, as it is now written.
-------------------
3. Also, is there any reason to NOT put this code after all of my StringReplace code (approx. 30 different StringReplace), so that it is the last thing that runs, or would I be better off running it independently?
Probably makes no difference. But without knowing what you are searching for or replacing with, I cannot say for sure. As long as you are not modifying the cell tags, it most likely does not matter. This use of RegEx is like a StringReplace on steroids: in addition to searching for specific strings, it can also search for patterns. When it finds them, they can be replaced with anything else, including rearranging what is found.
---------------------------------
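
One possible way to handle points 1 and 2 together is to work in two steps: first pick out only the tables of interest, then rewrite rows inside them one at a time. The Python sketch below illustrates that idea; it is not Macro Scheduler code and not the single-pattern approach described above. The identifying table tag did not survive in the thread, so `class="listing"` is a stand-in, and all function names are made up. Header rows are assumed to use `<th>` cells; if they use `<td>`, they would be changed as well. Nested tables are not handled.

```python
import re

# ASSUMPTION: the real identifying table tag is unknown, so
# class="listing" is a placeholder. Swap in the actual attribute.
TABLE = re.compile(
    r'(<table class="listing"[^>]*>)(.*?)(</table>)',
    re.DOTALL | re.IGNORECASE,
)
ROW = re.compile(r"<tr\b[^>]*>.*?</tr>", re.DOTALL | re.IGNORECASE)
CELL_OPEN = re.compile(r"<td>", re.IGNORECASE)


def tag_row(row_match: re.Match) -> str:
    """Add the class to the 3rd and 4th <td> of a single row."""
    row = row_match.group(0)
    # Rows with fewer than four <td> cells are left alone. That includes
    # header rows made of <th>, which contain no <td> at all.
    if len(CELL_OPEN.findall(row)) < 4:
        return row
    seen = 0

    def tag_cell(cell_match: re.Match) -> str:
        nonlocal seen
        seen += 1
        return '<td class="tdLeft">' if seen in (3, 4) else cell_match.group(0)

    return CELL_OPEN.sub(tag_cell, row)


def tag_table(table_match: re.Match) -> str:
    """Rewrite the rows of one matching table, keeping its own tags."""
    opening, body, closing = table_match.groups()
    return opening + ROW.sub(tag_row, body) + closing


def tag_selected_tables(html: str) -> str:
    """Apply the row rewrite only inside tables matching TABLE."""
    return TABLE.sub(tag_table, html)
```

Counting cells inside each row, instead of matching the whole row with one pattern, also keeps a short row from borrowing cells from the row after it.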
Last edited by Bob Hansen on Thu Jul 16, 2009 4:55 pm, edited 4 times in total.

Post by GalaxyMan » Thu Jul 16, 2009 4:50 pm

Deep, very deep....I'll do some reading on RegEx and see what I can learn. I was just getting comfortable with StringReplace, too. :(

Thanks again...

Post by Bob Hansen » Thu Jul 16, 2009 5:01 pm

Reference books:
Regular Expressions Cookbook, Jan Goyvaerts & Steven Levithan, O'Reilly publications.

Mastering Regular Expressions (3rd edition), Jeffrey Friedl, O'Reilly publications.

Free online RegEx tester: http://www.gskinner.com/RegExr/
(You can actually save your own RegEx strings in a library for later usage).

Many other resources are available, but I can recommend these from personal use.