Wappalyzer.WebPage : API documentation

class documentation

class WebPage: (source)

Simple representation of a web page, decoupled from any particular HTTP library's API.

The exception is the class methods, which use requests or aiohttp to fetch the page and create the WebPage.

This object is designed to be created for each website scanned by python-Wappalyzer. It will parse the HTML with BeautifulSoup to find <script> and <meta> tags.

You can create it manually from HTML with the WebPage() constructor, or with one of the class methods.

Method	`__init__`	Initialize a new WebPage object manually.
Instance Variable	`url`	The web page URL.
Instance Variable	`html`	The web page content (HTML).
Instance Variable	`headers`	The HTTP response headers.
Instance Variable	`scripts`	Undocumented
Instance Variable	`parsed_html`	Undocumented
Instance Variable	`meta`	Undocumented
Class Method	`new_from_url`	Constructs a new WebPage object for the URL, using the `requests` module to fetch the HTML.
Class Method	`new_from_response`	Constructs a new WebPage object for the response, using the `BeautifulSoup` module to parse the HTML.
Async Class Method	`new_from_url_async`	Same as `new_from_url`, but asynchronous: uses `aiohttp` to fetch the HTML.
Async Class Method	`new_from_response_async`	Constructs a new WebPage object for the response, using the `BeautifulSoup` module to parse the HTML.
Method	`_parse_html`	Parse the HTML with BeautifulSoup to find <script> and <meta> tags.

def __init__(self, url, html, headers): (source)

Initialize a new WebPage object manually.

>>> from Wappalyzer import WebPage
>>> w = WebPage('https://example.com', html='<strong>Hello World</strong>', headers={'Server': 'Apache'})

Parameters	url	The web page URL. (type: `str`)
	html	The web page content (HTML) (type: `str`)
	headers	The HTTP response headers (type: `Mapping[str, Any]`)
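Because the constructor only needs three plain values, any HTTP client can supply them. The sketch below uses the standard library's `urllib.request` (swapped in here for illustration; python-Wappalyzer's own class methods use `requests` or `aiohttp`) to collect the final URL, the decoded body, and the headers as a dict. The helper name `fetch_webpage_args` is hypothetical and not part of python-Wappalyzer.

```python
from typing import Dict, Tuple
from urllib.request import urlopen


def fetch_webpage_args(url: str, timeout: float = 5.0) -> Tuple[str, str, Dict[str, str]]:
    """Fetch `url` and return (final_url, html, headers) suitable for WebPage(...).

    Hypothetical helper: a stdlib stand-in for what the class methods do
    with requests/aiohttp.
    """
    with urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
        # Flatten the header object into a plain dict; if a header is
        # repeated (e.g. Set-Cookie), only the last value is kept.
        headers = dict(resp.headers.items())
        # geturl() reflects any redirects, i.e. the page actually served.
        return resp.geturl(), html, headers
```

With python-Wappalyzer installed, the result can be passed straight to the constructor, e.g. `WebPage(url, html=html, headers=headers)`.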

url = (source)

The web page URL.

html = (source)

The web page content (HTML).

headers = (source)

The HTTP response headers.

scripts = (source)

Undocumented

(type: List[str])

def _parse_html(self): (source)

Parse the HTML with BeautifulSoup to find <script> and <meta> tags.
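The library does this step with BeautifulSoup. As a rough, dependency-free illustration of what ends up in `scripts` and `meta`, the sketch below uses the standard library's `html.parser` instead. It assumes that `scripts` holds the `src` values of `<script>` tags and that `meta` maps lowercased `<meta name>` values to their `content`; the names `TagCollector` and `parse_html` are hypothetical, and BeautifulSoup may treat edge cases (empty attributes, malformed markup) differently.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class TagCollector(HTMLParser):
    """Collect <script src> values and <meta name/content> pairs (hypothetical stand-in)."""

    def __init__(self) -> None:
        super().__init__()
        self.scripts: List[str] = []
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # Self-closing tags such as <meta ... /> also arrive here: HTMLParser's
        # default handle_startendtag() calls handle_starttag().
        # Tag and attribute names are already lowercased; values are not.
        attributes = dict(attrs)
        if tag == "script" and attributes.get("src"):
            # Only external scripts are recorded; inline <script> bodies are ignored.
            self.scripts.append(attributes["src"])
        elif tag == "meta" and attributes.get("name") and attributes.get("content") is not None:
            # Key on the lowercased name; a repeated name keeps the last content.
            self.meta[attributes["name"].lower()] = attributes["content"]


def parse_html(html: str) -> Tuple[List[str], Dict[str, str]]:
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.scripts, collector.meta
```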

parsed_html = (source)

Undocumented

meta = (source)

Undocumented

@classmethod
def new_from_url(cls, url, **kwargs): (source)

Constructs a new WebPage object for the URL, using the requests module to fetch the HTML.

>>> from Wappalyzer import WebPage
>>> page = WebPage.new_from_url('https://example.com', timeout=5)

Parameters	url	URL (type: `str`)
	kwargs	Any other keyword arguments are passed through to `requests.get`; common ones are listed below. (type: `Any`)
	headers	(optional) Dictionary of HTTP Headers to send.
	cookies	(optional) Dict or CookieJar object to send.
	timeout	(optional) How many seconds to wait for the server to send data before giving up.
	proxies	(optional) Dictionary mapping protocol to the URL of the proxy.
	verify	(optional) Boolean controlling whether the server's SSL certificate is verified.
Returns	The newly constructed WebPage. (type: `WebPage`)

@classmethod
def new_from_response(cls, response): (source)

Constructs a new WebPage object for the response, using the BeautifulSoup module to parse the HTML.

Parameters	response	`requests.Response` object (type: `requests.Response`)
Returns	The newly constructed WebPage. (type: `WebPage`)

@classmethod
async def new_from_url_async(cls, url, verify=True, aiohttp_client_session=None, **kwargs): (source)

Same as `new_from_url`, but asynchronous.

Constructs a new WebPage object for the URL, using the aiohttp module to fetch the HTML.

>>> from Wappalyzer import WebPage
>>> from aiohttp import ClientSession
>>> async with ClientSession() as session:
...     page = await WebPage.new_from_url_async('https://example.com', aiohttp_client_session=session)

Parameters	url	URL (type: `str`)
	verify	(optional) Boolean controlling whether the server's SSL certificate is verified. (type: `bool`)
	aiohttp_client_session	`aiohttp.ClientSession` instance to use, optional. (type: `aiohttp.ClientSession`)
	kwargs	Any other keyword arguments are passed through to `aiohttp.ClientSession.get`; common ones are listed below. (type: `Any`)
	headers	Dict. HTTP headers to send with the request (optional).
	cookies	Dict. HTTP cookies to send with the request (optional).
	timeout	Int. Overrides the session's timeout (optional).
	proxy	Proxy URL, `str` or `yarl.URL` (optional).
Returns	The newly constructed WebPage. (type: `WebPage`)
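Passing one `aiohttp_client_session` to many calls lets them share the session's connection pool. The sketch below scans several URLs concurrently with `asyncio.gather`, using a semaphore to bound how many requests are in flight. To keep it runnable without `aiohttp`, the fetching coroutine is injected as the `fetch` parameter; in real use it would presumably wrap `WebPage.new_from_url_async(url, aiohttp_client_session=session)`. The names `scan_all` and `limit` are hypothetical and not part of python-Wappalyzer.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


async def scan_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[T]],
    limit: int = 10,
) -> Dict[str, T]:
    """Run fetch(url) for every URL concurrently, at most `limit` at a time."""
    # Created inside the coroutine so it binds to the running event loop.
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> T:
        async with semaphore:
            return await fetch(url)

    url_list = list(urls)
    # gather() returns results in input order, so they line up with url_list.
    # The first exception raised by any fetch propagates to the caller.
    results = await asyncio.gather(*(bounded(u) for u in url_list))
    return dict(zip(url_list, results))
```

With aiohttp installed, `fetch` could be `lambda u: WebPage.new_from_url_async(u, aiohttp_client_session=session)` inside an `async with ClientSession() as session:` block.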

@classmethod
async def new_from_response_async(cls, response): (source)

Constructs a new WebPage object for the response, using the BeautifulSoup module to parse the HTML.

>>> from aiohttp import ClientSession
>>> from Wappalyzer import WebPage
>>> async with ClientSession() as session:
...     response = await session.get("http://example.com")
...     webpage = await WebPage.new_from_response_async(response)
...

Parameters	response	`aiohttp.ClientResponse` object (type: `aiohttp.ClientResponse`)
Returns	The newly constructed WebPage. (type: `WebPage`)
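Reading an aiohttp response body is itself asynchronous (`ClientResponse.text()` is a coroutine, and `ClientResponse.url` is a `yarl.URL` rather than a `str`), which is presumably why this class method is async too. This is also why the body should be read before the `ClientSession` is closed. The sketch below shows that step against a stand-in response so it runs without aiohttp; `FakeClientResponse` and `webpage_args_from_response_async` are hypothetical names, and converting the URL with `str()` is an assumption about what the library does.

```python
import asyncio
from typing import Any, Mapping, Tuple


class FakeClientResponse:
    """Hypothetical stand-in exposing the aiohttp.ClientResponse members used below."""

    def __init__(self, url: str, body: str, headers: Mapping[str, str]) -> None:
        self.url = url  # aiohttp stores a yarl.URL here; str() works on both
        self.headers = headers
        self._body = body

    async def text(self) -> str:
        # In aiohttp, text() is a coroutine that reads and decodes the body.
        return self._body


async def webpage_args_from_response_async(response: Any) -> Tuple[str, str, Mapping[str, Any]]:
    """Return (url, html, headers) for WebPage(...) from an aiohttp-like response."""
    # Await the body while the owning session is still open.
    html = await response.text()
    return str(response.url), html, response.headers
```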

API Documentation for python-Wappalyzer, generated by pydoctor 21.2.2 at 2021-06-15 15:04:17.