raven77 Groupie
Joined: January 02 2007 Location: United States
Posts: 44
Posted: August 08 2011 at 09:58
Can someone give me a quick explanation of how to implement this? I would like to use this new feature to pull temp data from the CAI Networks WebControl.
Thanks!
dhoward Admin Group
Joined: June 29 2001 Location: United States
Posts: 4447
Posted: August 09 2011 at 23:05
Raven,
I had totally forgotten that I had included the URL Scraper plugin with the beta and meant to document it sooner. In the meantime, here are some quick basics to get going:
In the PowerHome Explorer under PowerHome|Setup|Plugins, create a new plugin line. Give it a suitable ID. Under "Launch Data (ActiveX classname)", use:
PH_URLScraper.phurlscraper
For "Initialization Data", enter the path and filename for the urlscraper.ini file. This will typically be:
c:\powerhome\plugins\urlscraper.ini
Edit the urlscraper.ini file for the URL and data you're trying to get. The default file is currently set to scrape weather data for zipcode 32712. The regular expression syntax is the standard VBScript version that is best documented here: http://www.regular-expressions.info/vbscript.html.
When the plugin scans the URL and finds a regex match, it will fire the appropriate trigger. The regex search for section [URL_1_1] will fire a generic plugin trigger for the ID, Command 1, Option 1; [URL_1_2] will be Command 1, Option 2; [URL_2_3] will be Command 2, Option 3; etc. The scraped data will be concatenated together into the system variable [TEMP5], with the individual elements separated by "<|>". If you have 10 or fewer elements in a single regex expression, the individual elements will also be returned in the [LOCAL1] thru [LOCAL10] variables.
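For reference, here is a minimal sketch of what an urlscraper.ini could look like for a single URL with one regex search. The URL and regex below are placeholders for illustration (not the shipped zipcode 32712 defaults), but the parameter names (urlcount, freq, scrapecount, regexsearch, regexoccur, regexflags) are the ones the plugin reads:
Code:
[config]
urlcount=1
[URL_1]
url=http://www.example.com/weather.html
freq=5
scrapecount=1
[URL_1_1]
regexsearch=Temperature:\s*(-?\d+\.?\d*)
regexoccur=1
regexflags=0
With a layout like this, a match on [URL_1_1] fires the generic plugin trigger for Command 1, Option 1, and the captured value ends up in [TEMP5] (and [LOCAL1]).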
I hope this gives you enough info to get started.
Dave.
Edited by dhoward - August 09 2011 at 23:06
GadgetGuy Super User
Joined: June 01 2008 Location: United States
Posts: 942
Posted: November 08 2014 at 08:38
Dave-
I am about to attempt use of this URLScraper, but find that more info on its "care & feeding" is needed.
Specifically, I see in the .ini file the following parameters that I'm not sure how to configure.
Can you clarify ...
freq=
scrapecount=
regexoccur=
regexflags=
Thanks.
__________________ Ken B - Live every day like it's your last. Eventually, you'll get it right!
dhoward Admin Group
Joined: June 29 2001 Location: United States
Posts: 4447
Posted: November 08 2014 at 12:33
Ken,
Probably easiest to explain in terms of the actual sample posted below:
Code:
[config]
urlcount=1
[URL_1]
url=http://www.wund.com/cgi-bin/findweather/getForecast?query=Eindhoven
freq=0.5
scrapecount=2
[URL_1_1]
regexsearch=<div id="main">[\s\S]*?<span>(.+)</span>[\s\S]*?<h4>(.+)</h4>[\s\S]*?<label>Wind:</label>[\s\S]*?<span>[\s\S]*?<span>(.+)</span>[\s\S]*?from[\s\S]*?<span>(.+)</span>[\s\S]*?<label>Dew Point:</label>[\s\S]*?<span>(.+)</span>
regexoccur=1
regexflags=0
[URL_1_2]
regexsearch=<label>Pressure:</label>[\s\S]*?<b>(.+)</b>[\s\S]*?<label>Windchill:</label>[\s\S]*?<span>(.+)</span>[\s\S]*?<label>Humidity:</label>[\s\S]*?<div class="b">(.+)</div>[\s\S]*?<label>Visibility:</label>[\s\S]*?<span>(.+)</span>
regexoccur=1
regexflags=0
You'll start with the urlcount under the [config] section. This determines how many unique URLs will be retrieved (a single instance of the plugin can retrieve multiple different URLs). For each URL in the count, you'll have a URL section: with a urlcount of 1, you'll have a [URL_1] section; with a count of 2, you'll have both a [URL_1] and a [URL_2] section.
Within a [URL_X] section, you'll have the url, the freq (how often, in minutes, to retrieve the URL), and the scrapecount. The scrapecount is how many regex searches will be made against the retrieved URL HTML data. For [URL_1] with a scrapecount of 2, you'll have both a [URL_1_1] and a [URL_1_2] section. If you have a [URL_2] section with a scrapecount of 1, then you'll have a [URL_2_1] section.
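To make the nesting concrete, here is a hypothetical skeleton (placeholder URLs and regexes) for a urlcount of 2, where [URL_1] has a scrapecount of 2 and [URL_2] has a scrapecount of 1:
Code:
[config]
urlcount=2
[URL_1]
url=http://www.example.com/page1.html
freq=5
scrapecount=2
[URL_1_1]
regexsearch=<span id="temp">(.+?)</span>
regexoccur=1
regexflags=0
[URL_1_2]
regexsearch=<span id="humidity">(.+?)</span>
regexoccur=1
regexflags=0
[URL_2]
url=http://www.example.com/page2.html
freq=15
scrapecount=1
[URL_2_1]
regexsearch=<title>(.+?)</title>
regexoccur=1
regexflags=0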
The [URL_X_Y] section defines a regex search for the URL and fires a generic plugin trigger. The "X" value corresponds to the Trigger ID column (Command 1 for an X value of 1) and the "Y" value corresponds to the Trigger Value column (Option 1 for a Y value of 1). The regex search uses the VBScript regular expression engine (the same one used in the new ph_regex???2 functions); full details on the syntax can be found here: http://www.regular-expressions.info/vb.html. The regexoccur and regexflags parameters correspond to the occur and flags parameters, respectively, as documented in the PH help file for the ph_regex2 function.
Hope this helps,
Dave.