Adhoc Web Page Data Scraper
A simple tool to scrape elements from a web page by CSS selector and show their attributes in a sortable, exportable table.
What is This?
A simple Python application which, given:
- a URL
- a CSS Selector to match elements on the page
- a list of attributes
will fetch the URL, scrape the elements matched by the CSS Selector, and show their attributes in a table which can be sorted, filtered, navigated and exported to CSV.
A ‘special’ attribute, contents, will retrieve the contents of the element.
For example:
- URL: https://talotics.com/archive/
- CSS Selector: article ul li a
- Attributes: contents, href
This would scrape the archive page at https://talotics.com/archive, retrieve all the anchor elements within the list in the article, and show a table containing the text of each link and the URL it links to.
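The scraping step can be illustrated with a minimal sketch. This is not the application's own code: it uses only the Python standard library, supports only selectors made of space-separated tag names (such as article ul li a), and the names SelectorScraper and scrape_html are hypothetical. A full CSS selector engine would need a dedicated parsing library.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the open-tag stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SelectorScraper(HTMLParser):
    """Hypothetical helper: collects one row per element matching the selector."""

    def __init__(self, selector, attributes):
        super().__init__()
        self.parts = selector.lower().split()  # e.g. ["article", "ul", "li", "a"]
        self.attributes = list(attributes)
        self.stack = []          # names of currently open tags, outermost first
        self.open_matches = []   # (depth, row, text_chunks) for matches still open
        self.rows = []           # finished and in-progress rows, in document order

    def _matches(self):
        # The innermost open tag must equal the last selector part, and the
        # earlier parts must appear, in order, among its ancestors.
        if self.stack[-1] != self.parts[-1]:
            return False
        ancestors = iter(self.stack[:-1])
        return all(part in ancestors for part in self.parts[:-1])

    def _close_top(self):
        depth = len(self.stack)
        self.stack.pop()
        if self.open_matches and self.open_matches[-1][0] == depth:
            _, row, chunks = self.open_matches.pop()
            if "contents" in row:
                # The 'special' attribute: the text inside the element.
                row["contents"] = "".join(chunks).strip()

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        if self._matches():
            found = dict(attrs)
            row = {name: (found.get(name) or "") for name in self.attributes}
            self.rows.append(row)
            self.open_matches.append((len(self.stack), row, []))
        if tag in VOID_TAGS:
            self._close_top()

    def handle_data(self, data):
        # Text belongs to every matched element that is still open.
        for _, _, chunks in self.open_matches:
            chunks.append(data)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Implicitly close any unclosed children first.
            while self.stack[-1] != tag:
                self._close_top()
            self._close_top()


def scrape_html(html, selector, attributes):
    """Return a list of dicts, one per matched element, keyed by attribute name."""
    scraper = SelectorScraper(selector, attributes)
    scraper.feed(html)
    scraper.close()
    return scraper.rows


# Made-up sample markup, for illustration only.
sample = """
<article><ul>
  <li><a href="/first-post/">First post</a></li>
  <li><a href="/second-post/">Second <b>post</b></a></li>
</ul></article>
<p><a href="/elsewhere/">Not in the article list</a></p>
"""
for row in scrape_html(sample, "article ul li a", ["contents", "href"]):
    print(row)
```

To run the sketch against a live page, the HTML string could be fetched first, for example with urllib.request.urlopen(url).read().decode(), and then passed to scrape_html.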
Local Execution
The code is open source and can be found on GitHub.
- Check Python is installed
- In the src folder of the repo code, run pip3 install -r requirements.txt
The command below will run the streamlit GUI version:
streamlit run doStreamlitAdhocWebScraper.py
The application uses Streamlit and the Streamlit AG Grid plugin to render the grid of data. AG Grid is an open source JavaScript data grid that makes rendering interactive tabular data easy.