Web Scraper Provider
The Web Scraper Provider finds and extracts data from an HTML web page.
Use the Web Scraper Provider whenever you need to extract values from websites (fuel prices, headlines, playlist titles, statistics, game results, measurement or surveillance data) or to monitor status pages.
Configuration
- URL
  The website URL to query.
- HTTP Method
  Method  Description
  Get     HTTP GET method (default)
  Post    HTTP POST method
  Put     HTTP PUT method
- Request Body
  POST or PUT body data. This can contain a JSON string to request certain API data.
- Request MIME Type
  MIME type of the POST or PUT body content, such as "text/plain" or "application/json".
- Authentication
  The following fields set up HTTP authentication.
  - Authentication Method
    Selects the HTTP authentication method:
    Method  Description
    None    No authentication required
    Basic   Basic access authentication
    Token   Bearer token authentication
  - Username
    Username if basic authentication is required; leave empty if HTTP authentication is not necessary.
  - Password
    Password if basic authentication is required.
  - Token
    A token string for bearer authentication.
- Confidential URL Parameter
  The following parameters can be used in the URL:
  - Username
    Username for the URL, used as $USER in the URL, like …/xyz.html?user=$USER.
  - Password
    Password for the URL, used as $PASS in the URL, like …/xyz.html?pw=$PASS.
  - Optional Parameter
    Optional confidential parameter, such as a code, for the URL, used as $PARA in the URL, like …/xyz.html?code=$PARA.
- Interval
  The query interval in seconds.
- Ignore Sleep
  True to ignore sleep, false to pause requests during sleep. False is the default, to save power.
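To make the interplay of these settings more concrete, here is a minimal Python sketch of how such a configuration could be turned into an HTTP request. It is not the provider's actual implementation: the function `build_request`, its parameter names, the upper-case method names, the `example.com` URL, and the percent-encoding of the substituted `$USER`, `$PASS` and `$PARA` values are all assumptions made for illustration. The request is only built, never sent.

```python
import base64
from urllib.parse import quote
from urllib.request import Request


def build_request(url, method="GET", body=None, mime_type=None,
                  auth="None", username="", password="", token="",
                  url_user="", url_pass="", url_para=""):
    """Build (but do not send) a request from provider-style settings."""
    # Substitute the confidential URL parameters. Percent-encoding the
    # values is an assumption of this sketch, not documented behavior.
    for placeholder, value in (("$USER", url_user),
                               ("$PASS", url_pass),
                               ("$PARA", url_para)):
        url = url.replace(placeholder, quote(value, safe=""))

    headers = {}
    if auth == "Basic":
        # Basic access authentication: base64("username:password")
        credentials = f"{username}:{password}".encode("utf-8")
        headers["Authorization"] = (
            "Basic " + base64.b64encode(credentials).decode("ascii"))
    elif auth == "Token":
        # Bearer token authentication
        headers["Authorization"] = "Bearer " + token

    data = None
    if method in ("POST", "PUT") and body is not None:
        # Request Body and Request MIME Type only apply to POST and PUT
        data = body.encode("utf-8")
        if mime_type:
            headers["Content-Type"] = mime_type

    return Request(url, data=data, headers=headers, method=method)


# Hypothetical configuration values, for illustration only
req = build_request(
    "https://example.com/xyz.html?user=$USER&code=$PARA",
    method="POST",
    body='{"station": 42}',
    mime_type="application/json",
    auth="Basic", username="alice", password="secret",
    url_user="alice", url_para="a b",
)
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Keeping the confidential values out of the URL template and substituting them only when the request is built means the stored URL itself never contains the secrets.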
Query
- Value
  Returns the response value.
  - Param 1
    A valid selector expression, see below.
    Tip: To get a CSS selector expression for an element within an HTML page:
    - Open the page in question and copy its URL to the configuration URL.
    - Using Firefox: Open the Web Developer Tools and select the element using the inspector. Right-click the HTML element text and select Copy > CSS Selector. Copy the selector to Param 1.
    - Using Chrome: Open More tools > Developer tools and select the element using the inspector. Right-click the HTML element text and select Copy > Copy selector. Copy the selector to Param 1.
- Status
  Get the result status:
  Text       Numeric  Description
  N/A        0        No result available.
  Excellent  1        Result answer available.
  Fail       5        Result parsing or format error.
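If the numeric status is processed further, it can help to give the three codes names. The following Python sketch shows one possible mapping; the enum `Status` and the helper `describe_status` are hypothetical names, and only the codes 0, 1 and 5 come from the table above.

```python
from enum import IntEnum


class Status(IntEnum):
    NA = 0         # "N/A": no result available
    EXCELLENT = 1  # result answer available
    FAIL = 5       # result parsing or format error


def describe_status(code):
    """Return the status name for a numeric code, or 'unknown'."""
    try:
        return Status(code).name
    except ValueError:
        # Any code not listed in the table is treated as unknown here
        return "unknown"


print(describe_status(5))
```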
Selector expression overview
Matching elements are found using CSS (or jQuery-like) selector syntax, which allows very powerful and robust queries.
tyckr uses the jsoup engine for data extraction. The following overview is taken from the jsoup documentation:
- tagname
  find elements by tag, e.g. a
- ns|tag
  find elements by tag in a namespace, e.g. fb|name finds <fb:name> elements
- #id
  find elements by ID, e.g. #logo
- .class
  find elements by class name, e.g. .masthead
- [attribute]
  elements with attribute, e.g. [href]
- [^attr]
  elements with an attribute name prefix, e.g. [^data-] finds elements with HTML5 dataset attributes
- [attr=value]
  elements with attribute value, e.g. [width=500] (also quotable, like [data-name='launch sequence'])
- [attr^=value], [attr$=value], [attr*=value]
  elements with attributes that start with, end with, or contain the value, e.g. [href*=/path/]
- [attr~=regex]
  elements with attribute values that match the regular expression, e.g. img[src~=(?i)\.(png|jpe?g)]
- *
  all elements, e.g. *
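The attribute value operators in the list above can be read as simple string tests. The Python sketch below mirrors them on plain strings to show what each operator accepts. It is an illustration only, not jsoup code: the helper `attr_matches` and the sample values are hypothetical, case handling is simplified to exact comparison, and the regular expression is assumed to match anywhere in the value.

```python
import re


def attr_matches(value, operator, operand):
    """Test one attribute value against a selector-style operator."""
    if operator == "=":
        return value == operand
    if operator == "^=":
        return value.startswith(operand)
    if operator == "$=":
        return value.endswith(operand)
    if operator == "*=":
        return operand in value
    if operator == "~=":
        # Assumed semantics: the regex may match anywhere in the value
        return re.search(operand, value) is not None
    raise ValueError(f"unsupported operator: {operator}")


# Hypothetical src attribute values
sources = ["/img/logo.PNG", "/img/photo.jpeg", "/img/icon.gif"]

# Same pattern as in img[src~=(?i)\.(png|jpe?g)]
images = [s for s in sources if attr_matches(s, "~=", r"(?i)\.(png|jpe?g)")]
print(images)
```

The `(?i)` prefix makes the pattern case-insensitive, which is why a value ending in `.PNG` is accepted as well.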
Selector combinations
- el#id
  elements with ID, e.g. div#logo
- el.class
  elements with class, e.g. div.masthead
- el[attr]
  elements with attribute, e.g. a[href]
- Any combination, e.g. a[href].highlight
- ancestor child
  child elements that descend from ancestor, e.g. .body p finds p elements anywhere under a block with class "body"
- parent > child
  child elements that descend directly from parent, e.g. div.content > p finds p elements; and body > * finds the direct children of the body tag
- siblingA + siblingB
  finds sibling B element immediately preceded by sibling A, e.g. div.head + div
- siblingA ~ siblingX
  finds sibling X element preceded by sibling A, e.g. h1 ~ p
- el, el, el
  group multiple selectors, find unique elements that match any of the selectors; e.g. div.masthead, div.logo
Pseudo selectors
- :lt(n)
  find elements whose sibling index (i.e. its position in the DOM tree relative to its parent) is less than n; e.g. td:lt(3)
- :gt(n)
  find elements whose sibling index is greater than n; e.g. div p:gt(2)
- :eq(n)
  find elements whose sibling index is equal to n; e.g. form input:eq(1)
- :has(selector)
  find elements that contain elements matching the selector; e.g. div:has(p)
- :not(selector)
  find elements that do not match the selector; e.g. div:not(.logo)
- :contains(text)
  find elements that contain the given text. The search is case-insensitive; e.g. p:contains(jsoup)
- :containsOwn(text)
  find elements that directly contain the given text
- :matches(regex)
  find elements whose text matches the specified regular expression; e.g. div:matches((?i)login)
- :matchesOwn(regex)
  find elements whose own text matches the specified regular expression
Note that the above indexed pseudo-selectors are 0-based, that is, the first element is at index 0, the second at 1, etc.
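Because the 0-based indexing is easy to get wrong by one, the following Python sketch applies the three indexed pseudo-selectors to a plain list. The list `cells` stands in for the child elements of a single parent in document order; the list and the helper functions are hypothetical and only model the index arithmetic, not jsoup itself.

```python
# Stand-in for five sibling elements of one parent, in document order
cells = ["A", "B", "C", "D", "E"]


def lt(items, n):
    """Like :lt(n): keep items whose 0-based index is less than n."""
    return [item for index, item in enumerate(items) if index < n]


def gt(items, n):
    """Like :gt(n): keep items whose 0-based index is greater than n."""
    return [item for index, item in enumerate(items) if index > n]


def eq(items, n):
    """Like :eq(n): keep the item whose 0-based index equals n."""
    return [item for index, item in enumerate(items) if index == n]


print(lt(cells, 3))  # first three siblings
print(gt(cells, 2))  # everything after the third sibling
print(eq(cells, 1))  # the second sibling
```

In this model, `:eq(1)` selects the second sibling, and `:gt(2)` starts at the fourth.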
See the Selector API reference for the full supported list and details.