Page Capture Tutorial

For cases where the price of an asset cannot be retrieved directly from the quote provider (Yahoo! Finance), InvestControl provides an update method called "Page Capture", which lets it extract asset prices from web pages using a "capture expression".

In its simplest form, a capture expression is a piece of the page's HTML code that contains the price, with the actual price replaced by a {price} tag. In more advanced cases, regular expressions can be used to build the capture expression. At every update, InvestControl looks for that code inside the page and reads the value found at the position of the {price} tag, updating the asset price.

The following is a common scenario where the asset price is located in a table with multiple assets: 

1. Using your desktop browser, locate the web page where the asset price can be found. In this example, we're trying to extract the price for the fund "KD BRIC".

2. Right-click some point in the page body and choose "View Source" (or something similar, depending on the browser). 


3. Locate where the fund name appears along with its price within the HTML markup, and copy a piece of the code surrounding the data.

KD BRIC</td><td align="center">10.04.2012</td><td align="center">09.04.2012</td><td align="right">144.4347</td> 

4. Now we must tell the program where to extract the price from, by replacing the actual price with the {price} tag:

KD BRIC</td><td align="center">10.04.2012</td><td align="center">09.04.2012</td><td align="right">{price}</td> 

Here, {price} is a special tag which InvestControl interprets as (.+?). This is a regular expression that captures any text in its place. You could use any valid regular expression that produces a capture group here. 

There is one more detail: between the fund name and the price there are two dates that will always change. So we must tell InvestControl to match the expression regardless of what it finds in those places. 

KD BRIC</td><td align="center">{any}</td><td align="center">{any}</td><td align="right">{price}</td> 

Here, we use {any} to tell InvestControl to accept any text in that position. We could also use the expression .+? (without parentheses) to achieve the same effect.

Other special tags available are "{symbol}" and "{name}", which are replaced by the asset's symbol and name as entered in InvestControl (you can use them in the URL or in the capture expression). 

5. When the expression is ready, enter the webpage URL and the capture expression in InvestControl. You can optionally send them to your phone via e-mail or an app like Evernote, and then copy the text from there. To ensure that the price is recognized, use the Test button. 
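The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not InvestControl's actual code: the helper names `expand` and `extract_price` are hypothetical, the tag expansions are the ones described in this tutorial, and escaping the symbol and name before substituting them is an assumption made here for safety.

```python
import re

# Tag expansions as described in the tutorial. How InvestControl applies
# them internally is not documented; this is an assumed equivalent.
TAGS = {
    "{price}": "(.+?)",  # capture group that holds the price
    "{any}": ".+?",      # accepts any text, without capturing it
}


def expand(expression, symbol="", name=""):
    """Turn a capture expression into a regular expression (hypothetical helper)."""
    pattern = expression
    for tag, regex in TAGS.items():
        pattern = pattern.replace(tag, regex)
    # {symbol} and {name} stand for the asset's own values. Escaping them
    # is an assumption of this sketch, so that names containing characters
    # such as "(" or "." are matched literally.
    pattern = pattern.replace("{symbol}", re.escape(symbol))
    pattern = pattern.replace("{name}", re.escape(name))
    return pattern


def extract_price(html, expression, symbol="", name=""):
    """Return the captured price text, or None if the expression does not match."""
    match = re.search(expand(expression, symbol, name), html)
    return match.group(1) if match else None


html = ('<tr><td>KD BRIC</td><td align="center">10.04.2012</td>'
        '<td align="center">09.04.2012</td><td align="right">144.4347</td></tr>')
expression = ('KD BRIC</td><td align="center">{any}</td>'
              '<td align="center">{any}</td><td align="right">{price}</td>')

print(extract_price(html, expression))  # prints 144.4347
```

Because the literal parts of the expression are passed to the regular expression engine unchanged, any regular expression syntax typed into the capture expression keeps its meaning, which matches the behavior described in step 4.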


Since this method involves loading a complete web page in order to extract the price, you can save bandwidth by enabling the "Update only once a day" option for assets whose price does not change during the day, like most mutual funds. This option is only available in Page Capture mode.

Note: InvestControl will reuse the page data downloaded from a URL if the exact same URL is used by multiple assets. In the example above, you could extract prices for the other funds in the page without having to download it again. 
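The reuse described in this note amounts to a cache keyed by the exact URL. The following Python sketch shows the idea under stated assumptions: the `PageCache` class and the `fake_fetch` function are hypothetical names invented for this illustration, and the real download is replaced by a stand-in so the example runs without network access.

```python
from typing import Callable, Dict


class PageCache:
    """Downloads each distinct URL once and reuses the page afterwards (hypothetical)."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch              # function that really downloads a page
        self._pages: Dict[str, str] = {}

    def get(self, url: str) -> str:
        # Only an exact URL match reuses the cached page.
        if url not in self._pages:
            self._pages[url] = self._fetch(url)
        return self._pages[url]


downloads = []


def fake_fetch(url):
    """Stand-in for a real download; records every URL that was requested."""
    downloads.append(url)
    return "<html>page for %s</html>" % url


cache = PageCache(fake_fetch)
cache.get("http://example.com/funds")         # first asset: downloads the page
cache.get("http://example.com/funds")         # second asset, same URL: reused
cache.get("http://example.com/funds?page=2")  # different URL: downloaded again

print(len(downloads))  # prints 2
```

In practice this means that assets listed on the same page should be given character-for-character identical URLs, so that the page is downloaded only once per update.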

If necessary, you can use a regular expression tester (many are available online) to validate your expression before entering it in InvestControl. The following example shows that the expression entered at the top properly matches the HTML code at the bottom. 
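The same check can be made offline with a short Python script instead of an online tester. This is a minimal sketch: the tags are expanded by hand ({any} becomes .+? and {price} becomes (.+?)), as a generic regular expression tester would require, and the `changed` sample is a made-up variation of the page used to show what a layout change looks like.

```python
import re

# Capture expression from the tutorial with the tags expanded by hand.
pattern = ('KD BRIC</td><td align="center">.+?</td>'
           '<td align="center">.+?</td><td align="right">(.+?)</td>')

# HTML fragment copied from the page source.
sample = ('<td>KD BRIC</td><td align="center">10.04.2012</td>'
          '<td align="center">09.04.2012</td><td align="right">144.4347</td>')

matches = re.findall(pattern, sample)
print(matches)  # prints ['144.4347']

# The captured text should also be usable as a number.
price = float(matches[0])

# Hypothetical layout change: the price cell no longer uses align="right",
# so the expression stops matching and would need to be reviewed.
changed = sample.replace('align="right"', 'class="price"')
print(re.findall(pattern, changed))  # prints []
```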


Note that Page Capture mode will not work if a login is required to access the web site. This method also depends heavily on the page layout, which can change often. If you notice that the price of an asset is no longer being updated, you may have to review the capture expression.

Click here for sample price sources submitted by our users.