
Adhoc Web Page Data Scraper

A simple tool to scrape elements from a web page and show their attributes in a table.

What is This?

A simple Python application which, given:

a URL
a CSS Selector to match elements on the page
a list of attributes

will fetch the URL, scrape the elements matched by the CSS selector, and show the requested attributes in a table which can be sorted, filtered, navigated and exported to CSV.

A ‘special’ attribute named contents retrieves the text contents of the element rather than one of its HTML attributes.

For example:

URL: https://talotics.com/archive/
CSS Selector: article ul li a
Attributes: contents, href

This would scrape the archive page at https://talotics.com/archive/, retrieve all the anchor elements within the list in the article, and show a table containing the text of each link and the URL it links to:

Scraper Results
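As a rough illustration of the example above, the sketch below scrapes elements and exports the rows to CSV using only the Python standard library. It is not the application's actual code, and the source does not say which libraries the app uses for scraping. The names DescendantScraper, scrape_html, scrape_url and rows_to_csv are hypothetical, and selector support is deliberately limited to descendant chains of tag names such as article ul li a (no classes, ids or attribute selectors).

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen

# Elements that never have a closing tag, so they must not stay on the open-tag stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DescendantScraper(HTMLParser):
    """Collect one row per element matching a chain of tag names (assumption:
    only simple descendant selectors like "article ul li a" are supported)."""

    def __init__(self, selector, attributes):
        super().__init__(convert_charrefs=True)
        self.chain = selector.lower().split()
        if not self.chain:
            raise ValueError("selector must contain at least one tag name")
        self.attributes = list(attributes)
        self.stack = []      # tag names of the currently open elements
        self.open_rows = []  # (depth, row, text_parts) for matched, still-open elements
        self.rows = []       # one dict per matched element, in document order

    def _matches(self):
        # The innermost tag must equal the last selector part, and the earlier
        # parts must appear, in order, among its ancestors.
        if self.stack[-1] != self.chain[-1]:
            return False
        ancestors = iter(self.stack[:-1])
        return all(part in ancestors for part in self.chain[:-1])

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        if self._matches():
            row = dict.fromkeys(self.attributes, "")
            for name, value in attrs:
                if name in row and name != "contents":
                    row[name] = value or ""
            self.rows.append(row)
            if tag not in VOID_TAGS:
                self.open_rows.append((len(self.stack), row, []))
        if tag in VOID_TAGS:
            self.stack.pop()

    def handle_data(self, data):
        # Text belongs to every matched element that is still open.
        for _, _, parts in self.open_rows:
            parts.append(data)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag, ignore it
        while self.stack.pop() != tag:
            pass  # also drops children that were left unclosed
        self._close_rows()

    def _close_rows(self):
        # Finalise matched elements that are deeper than the current stack.
        while self.open_rows and self.open_rows[-1][0] > len(self.stack):
            _, row, parts = self.open_rows.pop()
            if "contents" in row:
                # The special "contents" attribute: element text, whitespace collapsed.
                row["contents"] = " ".join("".join(parts).split())


def scrape_html(html, selector, attributes):
    parser = DescendantScraper(selector, attributes)
    parser.feed(html)
    parser.close()
    parser.stack.clear()
    parser._close_rows()  # finish any elements the page never closed
    return parser.rows


def scrape_url(url, selector, attributes):
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return scrape_html(html, selector, attributes)


def rows_to_csv(rows, attributes):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(attributes))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Under these assumptions, calling scrape_url("https://talotics.com/archive/", "article ul li a", ["contents", "href"]) would return one dictionary per matched link, and rows_to_csv would turn those rows into CSV text. A real implementation would more likely rely on a full CSS selector engine than on this hand-rolled matcher.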

Local Execution

The code is open source and can be found on GitHub.

Check Python is installed
In the src folder of the repo code, run: pip3 install -r requirements.txt

The command below will run the streamlit GUI version:

streamlit run doStreamlitAdhocWebScraper.py

The application uses Streamlit and the Streamlit AG Grid plugin to render the grid of data. AG Grid is an open source JavaScript data grid that makes rendering interactive tabular data easy.