This is the final installment in our three-part series on how to tell data-driven stories using import.io. In this one, we show you how to query multiple sources at once by recording an action on a website.
The video above shows how to create a connector for Tesco's website and then combine that data with similar data from other sites using an import.io mix. The steps are as follows:
1. You should already have the import.io browser installed on your computer, but if you don't, you can download it for free.
Open the browser, navigate to Tesco's website and click the pink 'io' button in the top right corner. Then, click 'let's get cracking' and select the C button (on the right).
For this example, you don't need to log in to see the data, so click 'No' and then click 'I'm there'.
2. To access data from this site, you need to perform a search.
Click the small red record button, wait for the page to reload and then do a search on the Tesco website. For this example we used 'onions', but you can use anything you like.
Once Tesco has processed your search and you can see the results on the page, press the black stop button. import.io will then automatically detect the optimal settings; if you can still see your data, as you can in this example, press 'Yes'.
Next, you need to label your input. Click the 'make input' button and type your input label into the box. The label can be anything you want, but if you plan on combining multiple connectors, as you will in this example, you must use the same input label for each one. Then click 'Take me to the next step'.
In this example the search returns multiple results, so click the multiple results button in the middle.
Next, see if you can locate a total results counter on the page. If you can (as in this example), press 'Yes', then do the same for multiple pages.
The final step in creating a connector is to test it. Click on the 'make input' button and do another search. You will see your search played back on the site. If the search plays back correctly click 'yes'.
3. Now it is time to begin extracting the data. Because you chose multiple results, you will need to start by training the rows.
To do this, highlight all of the data for one full result (i.e. one product listing) and click the 'train rows' button. You will need to train a few examples (in this case, two) before the tool can recognize the pattern of data on the page.
Next, you will need to train the exact pieces of data you want from the page by adding columns. To do this, click the 'add columns' button, type the column name into the box and select the data type: text, number, currency, link, etc.
Again, it is very important that you keep your column names and types exactly the same if you plan to combine multiple connectors into a mix.
Then highlight an example of the data you want in that column and click train. In some cases you may need to train multiple examples of data for a column. For this use case, you will need to map the product name as text, price as currency and weight as text.
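To picture what mapping a column to a data type involves, here is a minimal Python sketch. It assumes a scraped row arrives as plain strings keyed by column name; the column names `name`, `price` and `weight` and the sample values are hypothetical, and the conversion logic is an illustration of the idea rather than how import.io itself handles types.

```python
import re
from decimal import Decimal

# Hypothetical column names and types, mirroring this example:
# product name as text, price as currency, weight as text.
SCHEMA = {"name": "text", "price": "currency", "weight": "text"}


def to_currency(raw: str) -> Decimal:
    """Keep only digits and the decimal point, e.g. '£1,250.50' -> Decimal('1250.50')."""
    return Decimal(re.sub(r"[^\d.]", "", raw))


# One converter per data type; text values are simply trimmed.
CONVERTERS = {"text": str.strip, "currency": to_currency}


def type_row(raw_row: dict, schema: dict = SCHEMA) -> dict:
    """Convert each trained column of a scraped row to its declared type."""
    return {column: CONVERTERS[kind](raw_row[column]) for column, kind in schema.items()}


if __name__ == "__main__":
    # Hypothetical scraped values, for illustration only.
    print(type_row({"name": " Brown onions ", "price": "£0.85", "weight": "1kg"}))
```

Note that this naive currency parser assumes prices are written in pounds; a price shown in pence, such as '85p', would need extra handling.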
Once you have trained all your columns, press 'I've got what I need'.
Next you will need to train the total results by highlighting it with your cursor and clicking 'train total results'.
To train pagination simply click the 'next' or 'page 2' button on the site. import.io should automatically detect that you are on the next page and pull in the data. All you need to do is check that the rows, columns and total results are correct. In some cases you may need to do a bit of additional training.
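Why a total results counter and pagination go together can be sketched in a few lines of Python. This is an assumption about the general technique, not a description of import.io's internals: `fetch_page` is a hypothetical stand-in for "go to page N and extract its rows", and the total tells the loop when it has everything. It also mirrors the manual check above, where the number of rows collected should match the total results.

```python
import math
from typing import Callable, Dict, List

Row = Dict[str, str]


def pages_needed(total_results: int, rows_per_page: int) -> int:
    """How many pages a result set spans, e.g. 45 results at 20 per page -> 3 pages."""
    return math.ceil(total_results / rows_per_page)


def collect_all_pages(fetch_page: Callable[[int], List[Row]], total_results: int) -> List[Row]:
    """Follow pages 1, 2, 3, ... until the expected total has been collected."""
    rows: List[Row] = []
    page = 1
    while len(rows) < total_results:
        batch = fetch_page(page)
        if not batch:  # an empty page means there is nothing left to follow
            break
        rows.extend(batch)
        page += 1
    return rows


if __name__ == "__main__":
    # A fake site with 45 hypothetical results, 20 per page.
    results = [{"name": f"item {i}"} for i in range(45)]

    def fake_fetch(page: int) -> List[Row]:
        return results[(page - 1) * 20 : page * 20]

    collected = collect_all_pages(fake_fetch, total_results=45)
    print(pages_needed(45, 20), len(collected))  # 3 45
```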
The final step is to add another example query. Press the 'add example' button and perform another search. import.io will perform the search on the site and bring all of the resulting data into your table automatically.
Again, you will need to check that it has done this correctly, then run another pagination test by going to the second page and checking the data one final time.
Once you have checked all your data is correct, you can click 'I'm done creating tests' and 'upload to import.io'. Then click 'show me the data' to be taken to the dataset page.
On the dataset page you will see a search box with your input in it. If you type any search term into this box and press query, import.io will perform that search on the website and bring you back live data straight into your table.
4. Now you will want to combine multiple connectors into a mix so that you can search for one term across multiple websites. First, repeat steps 1-3 for every site you need data from. Remember to use the same input label and the same column names and types.
To create a mix open a new tab in the import.io browser and click 'create data set'. This will open a blank dataset. Click on 'add data' and then 'create new mix'.
Next you will need to choose the connectors you want to mix together. Click on 'pick data' and then choose the connectors you want.
Now you can type a single search term into the box and press 'query', and import.io will return live data from each website.
Clicking on the small plus icon in the top left of the table will show you which source each row of data is from. You can also sort columns by clicking on them.
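As a rough mental model (an assumption, not import.io's implementation), a mix behaves like sending one query to several connectors and stacking their rows in a single table. The Python sketch below uses two hypothetical connectors, `shop_a` and `shop_b`, with made-up rows. It shows why the input label and column names must match across connectors, adds a `source` column like the one revealed by the plus icon, and optionally sorts by a column.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
# A connector takes a query keyed by its input label and returns rows.
Connector = Callable[[Dict[str, str]], List[Row]]

INPUT_LABEL = "search"                # hypothetical shared input label
COLUMNS = {"name", "price", "weight"}  # hypothetical shared column names


def run_mix(connectors: Dict[str, Connector], term: str, sort_by: Optional[str] = None) -> List[Row]:
    """Send one search term to every connector and combine rows, tagging each with its source."""
    query = {INPUT_LABEL: term}
    combined: List[Row] = []
    for source, connector in connectors.items():
        for row in connector(query):
            if set(row) != COLUMNS:
                # Rows with mismatched column names cannot be lined up in one table.
                raise ValueError(f"{source} returned columns {sorted(row)}, expected {sorted(COLUMNS)}")
            combined.append({**row, "source": source})
    if sort_by is not None:
        combined.sort(key=lambda r: r[sort_by])
    return combined


# Two hypothetical connectors returning made-up rows, for illustration only.
def shop_a(query: Dict[str, str]) -> List[Row]:
    return [{"name": f"{query[INPUT_LABEL]} 1kg", "price": 0.85, "weight": "1kg"}]


def shop_b(query: Dict[str, str]) -> List[Row]:
    return [{"name": f"{query[INPUT_LABEL]} 500g", "price": 0.60, "weight": "500g"}]


if __name__ == "__main__":
    for row in run_mix({"shop_a": shop_a, "shop_b": shop_b}, "onions", sort_by="price"):
        print(row)
```

If one connector had been trained with a column called `title` instead of `name`, `run_mix` would refuse to combine it, which is the same reason the tutorial insists on identical column names.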
The first screencast showed how to extract data about the richest people in America using the import.io extractor.
And in our second screencast we showed you how to get data about food prices in Africa from behind the M-Farm login using an authenticated API.