class WebPage:
Simple representation of a web page, decoupled from any particular HTTP library's API, except for the class methods that use requests or aiohttp to create the WebPage.
This object is designed to be created for each website scanned by python-Wappalyzer. It parses the HTML with BeautifulSoup to find <script> and <meta> tags.
You can create it manually from HTML with the WebPage() constructor, or from the class methods.
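To illustrate what finding <script> and <meta> tags can look like, here is a minimal sketch. It deliberately swaps BeautifulSoup for the standard library's html.parser so that it runs without extra dependencies; the TagCollector class and the sample HTML are hypothetical and are not part of python-Wappalyzer, and the exact values WebPage stores in scripts and meta may differ.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: collects <script src> URLs and <meta name/content> pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.scripts = []  # values of the src attribute of <script> tags
        self.meta = {}     # lower-cased meta name -> content

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script" and attributes.get("src"):
            self.scripts.append(attributes["src"])
        elif tag == "meta" and attributes.get("name") and attributes.get("content") is not None:
            self.meta[attributes["name"].lower()] = attributes["content"]


# Sample document, invented for this sketch.
html = (
    '<html><head>'
    '<meta name="Generator" content="WordPress 5.8">'
    '<script src="/static/jquery.min.js"></script>'
    '<script>console.log("inline")</script>'
    '</head><body><strong>Hello World</strong></body></html>'
)

collector = TagCollector()
collector.feed(html)
collector.close()

print(collector.scripts)
print(collector.meta)
```

Inline scripts without a src attribute are skipped in this sketch, since only external script URLs are collected.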
Method | __init__ | Initialize a new WebPage object manually. |
Instance Variable | url | Undocumented |
Instance Variable | html | Undocumented |
Instance Variable | headers | Undocumented |
Instance Variable | scripts | Undocumented |
Instance Variable | parsed_html | Undocumented |
Instance Variable | meta | Undocumented |
Class Method | new_from_url | Constructs a new WebPage object for the URL, using the requests module to fetch the HTML. |
Class Method | new_from_response | Constructs a new WebPage object for the response, using the BeautifulSoup module to parse the HTML. |
Async Class Method | new_from_url_async | Same as new_from_url, only async. |
Async Class Method | new_from_response_async | Constructs a new WebPage object for the response, using the BeautifulSoup module to parse the HTML. |
Method | _parse_html | Parse the HTML with BeautifulSoup to find <script> and <meta> tags. |
Initialize a new WebPage object manually.
>>> from Wappalyzer import WebPage
>>> w = WebPage('exemple.com', html='<strong>Hello World</strong>', headers={'Server': 'Apache', })
Parameters | url | The web page URL. (type: str) |
| html | The web page content (HTML). (type: str) |
| headers | The HTTP response headers. (type: Mapping[str, Any]) |
Constructs a new WebPage object for the URL, using the requests module to fetch the HTML.
>>> from Wappalyzer import WebPage
>>> page = WebPage.new_from_url('exemple.com', timeout=5)
Parameters | url | URL (type: str) |
| kwargs | Any other arguments are passed to the requests.get method as well. (type: Any) |
| headers | (optional) Dictionary of HTTP headers to send. |
| cookies | (optional) Dict or CookieJar object to send. |
| timeout | (optional) How many seconds to wait for the server to send data before giving up. |
| proxies | (optional) Dictionary mapping protocol to the URL of the proxy. |
| verify | (optional) Boolean that controls whether the SSL certificate is verified. |
Returns | Undocumented (type: WebPage) |
Constructs a new WebPage object for the response, using the BeautifulSoup module to parse the HTML.
Parameters | response | requests.Response object (type: requests.Response) |
Returns | Undocumented (type: WebPage) |
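The decoupling described in the class summary can be sketched as follows: the object itself holds only plain data, and a class method acts as the single adapter that touches the HTTP library's response. This is a sketch under stated assumptions, not the library's implementation. SketchPage is a hypothetical stand-in for WebPage, and the fake response only mimics the url, text and headers attributes that a requests.Response exposes.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any, Mapping


@dataclass
class SketchPage:
    """Hypothetical stand-in for WebPage: plain data, no HTTP library involved."""

    url: str
    html: str
    headers: Mapping[str, Any] = field(default_factory=dict)

    @classmethod
    def new_from_response(cls, response: Any) -> "SketchPage":
        # Only this adapter reads the HTTP library's response object.
        return cls(response.url, html=response.text, headers=response.headers)


# A fake response with the same attribute names as requests.Response (assumption:
# only url, text and headers are needed to build the page).
fake_response = SimpleNamespace(
    url="http://example.com",
    text="<strong>Hello World</strong>",
    headers={"Server": "Apache"},
)

page = SketchPage.new_from_response(fake_response)
print(page.url, page.headers["Server"])
```

Because the constructor takes only a URL, an HTML string and a header mapping, the same object can be built from any HTTP client, or from HTML saved on disk.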
Same as new_from_url, only async.
Constructs a new WebPage object for the URL, using the aiohttp module to fetch the HTML.
>>> from Wappalyzer import WebPage
>>> from aiohttp import ClientSession
>>> async with ClientSession() as session:
...     page = await WebPage.new_from_url_async('http://example.com', aiohttp_client_session=session)
Parameters | url | URL (type: str) |
| verify | (optional) Boolean that controls whether the SSL certificate is verified. (type: bool) |
| aiohttp_client_session | (optional) aiohttp.ClientSession instance to use. (type: aiohttp.ClientSession) |
| kwargs | Any other arguments are passed to the aiohttp.ClientSession.get method as well. (type: Any) |
| headers | Dict. HTTP headers to send with the request (optional). |
| cookies | Dict. HTTP cookies to send with the request (optional). |
| timeout | Int. Overrides the session's timeout (optional). |
| proxy | Proxy URL, str or yarl.URL (optional). |
Returns | Undocumented (type: WebPage) |
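Since aiohttp_client_session is optional, one plausible reading is that a caller-supplied session is reused, while a temporary one is created and closed otherwise. The sketch below illustrates that pattern only; it is an assumption about the behaviour, not the library's code. FakeSession and fetch are hypothetical names, and FakeSession performs no network I/O so the example runs without aiohttp.

```python
import asyncio
from typing import Any, Optional


class FakeSession:
    """Hypothetical stand-in for aiohttp.ClientSession; performs no network I/O."""

    def __init__(self) -> None:
        self.closed = False

    async def get(self, url: str, **kwargs: Any) -> dict:
        # Echo the request back so the forwarded keyword arguments are visible.
        return {"url": url, "kwargs": kwargs}

    async def close(self) -> None:
        self.closed = True


async def fetch(url: str, session: Optional[FakeSession] = None, **kwargs: Any) -> dict:
    """Use the caller's session if given; otherwise create one and close it afterwards."""
    owns_session = session is None
    if session is None:
        session = FakeSession()
    try:
        # Extra keyword arguments (timeout, headers, ...) are passed straight through.
        return await session.get(url, **kwargs)
    finally:
        if owns_session:
            await session.close()


shared = FakeSession()
result = asyncio.run(fetch("http://example.com", session=shared, timeout=5))
print(result)
print(shared.closed)
```

Leaving a caller-supplied session open lets one session be reused across many pages, which is the usual reason for accepting it as a parameter.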
Constructs a new WebPage object for the response, using the BeautifulSoup module to parse the HTML.
>>> from aiohttp import ClientSession
>>> from Wappalyzer import Wappalyzer, WebPage
>>> wappalyzer = Wappalyzer.latest()
>>> async with ClientSession() as session:
...     page = await session.get("http://example.com")
...     webpage = await WebPage.new_from_response_async(page)
Parameters | response | aiohttp.ClientResponse object (type: aiohttp.ClientResponse) |
Returns | Undocumented (type: WebPage) |