class documentation

class WebPage: (source)

Simple representation of a web page, decoupled from any particular HTTP library's API.

The exception is the class methods, which use requests or aiohttp to create the WebPage.

This object is designed to be created for each website scanned by python-Wappalyzer. It will parse the HTML with BeautifulSoup to find <script> and <meta> tags.

You can create it manually from HTML with the WebPage() constructor, or with one of the class methods.

Method __init__ Initialize a new WebPage object manually.
Instance Variable url Undocumented
Instance Variable html Undocumented
Instance Variable headers Undocumented
Instance Variable scripts Undocumented
Instance Variable parsed_html Undocumented
Instance Variable meta Undocumented
Class Method new_from_url Constructs a new WebPage object for the URL, using the requests module to fetch the HTML.
Class Method new_from_response Constructs a new WebPage object for the response, using the BeautifulSoup module to parse the HTML.
Async Class Method new_from_url_async Asynchronous version of new_from_url.
Async Class Method new_from_response_async Constructs a new WebPage object for the response, using the BeautifulSoup module to parse the HTML.
Method _parse_html Parse the HTML with BeautifulSoup to find <script> and <meta> tags.
def __init__(self, url, html, headers): (source)

Initialize a new WebPage object manually.

>>> from Wappalyzer import WebPage
>>> w = WebPage('http://example.com', html='<strong>Hello World</strong>', headers={'Server': 'Apache'})

Parameters
url: The web page URL. (type: str)
html: The web page content (HTML). (type: str)
headers: The HTTP response headers. (type: Mapping[str, Any])

url = (source)

Undocumented

html = (source)

Undocumented

headers = (source)

Undocumented

scripts = (source)

Undocumented

(type: List[str])
def _parse_html(self): (source)
Parse the HTML with BeautifulSoup to find <script> and <meta> tags.
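
As a rough illustration of what collecting <script> and <meta> tags involves, here is a minimal sketch that uses the standard library's html.parser instead of BeautifulSoup. The class name ScriptMetaCollector is hypothetical, and the sketch assumes that scripts holds the src URLs and that meta maps lower-cased names to their content; the actual WebPage implementation may differ.

```python
from html.parser import HTMLParser
from typing import Dict, List


class ScriptMetaCollector(HTMLParser):
    """Hypothetical helper: collects script src URLs and named meta tags."""

    def __init__(self) -> None:
        super().__init__()
        self.scripts: List[str] = []
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        attr_map = dict(attrs)
        if tag == "script" and attr_map.get("src"):
            self.scripts.append(attr_map["src"])
        elif tag == "meta" and attr_map.get("name") and attr_map.get("content") is not None:
            # Assumption: meta names are stored lower-cased.
            self.meta[attr_map["name"].lower()] = attr_map["content"]


html = (
    '<html><head>'
    '<meta name="Generator" content="WordPress 5.7">'
    '<script src="/js/jquery.min.js"></script>'
    '<script>var inline = 1;</script>'
    '</head><body><strong>Hello World</strong></body></html>'
)

collector = ScriptMetaCollector()
collector.feed(html)
collector.close()

print(collector.scripts)
print(collector.meta)
```

Inline scripts without a src attribute are skipped in this sketch, since there is no URL to record for them.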
parsed_html = (source)

Undocumented

meta = (source)

Undocumented

@classmethod
def new_from_url(cls, url, **kwargs): (source)

Constructs a new WebPage object for the URL, using the requests module to fetch the HTML.

>>> from Wappalyzer import WebPage
>>> page = WebPage.new_from_url('http://example.com', timeout=5)

Parameters
url: URL (type: str)
kwargs: Any other arguments are passed to the requests.get method as well. (type: Any)
headers: (optional) Dictionary of HTTP headers to send.
cookies: (optional) Dict or CookieJar object to send.
timeout: (optional) How many seconds to wait for the server to send data before giving up.
proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
verify: (optional) Boolean; controls whether the SSL certificate is verified.

Returns
Undocumented (type: WebPage)

@classmethod
def new_from_response(cls, response): (source)
Constructs a new WebPage object for the response, using the BeautifulSoup module to parse the HTML.

Parameters
response: requests.Response object (type: requests.Response)

Returns
Undocumented (type: WebPage)

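
The alternate-constructor pattern behind a method like new_from_response can be sketched as follows. This is not the library's code: PageSketch and FakeResponse are hypothetical stand-ins, and the sketch only assumes that the response object exposes url, text, and headers, which a requests.Response does.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class FakeResponse:
    """Hypothetical stand-in with the attributes read from a requests.Response."""
    url: str
    text: str
    headers: Mapping[str, Any]


@dataclass
class PageSketch:
    """Hypothetical, simplified page object (no HTML parsing)."""
    url: str
    html: str
    headers: Mapping[str, Any]

    @classmethod
    def new_from_response(cls, response) -> "PageSketch":
        # Pull only the pieces the page needs out of the response object.
        return cls(response.url, html=response.text, headers=response.headers)


resp = FakeResponse(
    url="http://example.com",
    text="<strong>Hello World</strong>",
    headers={"Server": "Apache"},
)
page = PageSketch.new_from_response(resp)
print(page.url, page.headers["Server"])
```

Keeping the HTTP-specific attribute access inside the class method is what lets the plain constructor stay independent of any HTTP library.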
@classmethod
async def new_from_url_async(cls, url, verify=True, aiohttp_client_session=None, **kwargs): (source)

Asynchronous version of new_from_url.

Constructs a new WebPage object for the URL, using the aiohttp module to fetch the HTML.

>>> from Wappalyzer import WebPage
>>> from aiohttp import ClientSession
>>> async with ClientSession() as session:
...     page = await WebPage.new_from_url_async('http://example.com', aiohttp_client_session=session)

Parameters
url: URL (type: str)
verify: (optional) Boolean; controls whether the SSL certificate is verified. (type: bool)
aiohttp_client_session: (optional) aiohttp.ClientSession instance to use. (type: aiohttp.ClientSession)
kwargs: Any other arguments are passed to the aiohttp.ClientSession.get method as well. (type: Any)
headers: (optional) Dict. HTTP headers to send with the request.
cookies: (optional) Dict. HTTP cookies to send with the request.
timeout: (optional) Int. Overrides the session's timeout.
proxy: (optional) Proxy URL, str or yarl.URL.

Returns
Undocumented (type: WebPage)

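
An asynchronous constructor is mostly useful when several pages are fetched concurrently. The sketch below shows that pattern with asyncio.gather. To stay runnable without network access it uses a hypothetical stub_fetch coroutine; in real use the fetch argument might instead wrap WebPage.new_from_url_async with a shared aiohttp session.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def stub_fetch(url: str) -> str:
    """Hypothetical stand-in for a coroutine that fetches one URL."""
    await asyncio.sleep(0)  # yield to the event loop, as a real request would
    return f"<title>{url}</title>"


async def fetch_all(
    urls: List[str], fetch: Callable[[str], Awaitable[str]]
) -> Dict[str, str]:
    # gather() runs the coroutines concurrently and keeps the input order.
    results = await asyncio.gather(*(fetch(url) for url in urls))
    return dict(zip(urls, results))


pages = asyncio.run(
    fetch_all(["http://example.com", "http://example.org"], stub_fetch)
)
print(pages)
```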
@classmethod
async def new_from_response_async(cls, response): (source)

Constructs a new WebPage object for the response, using the BeautifulSoup module to parse the HTML.

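>>> from aiohttp import ClientSession
>>> from Wappalyzer import Wappalyzer, WebPage
>>> wappalyzer = Wappalyzer.latest()
>>> async with ClientSession() as session:
...     response = await session.get("http://example.com")
...     # Build the WebPage while the session is still open so the body can be read.
...     webpage = await WebPage.new_from_response_async(response)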

Parameters
response: aiohttp.ClientResponse object (type: aiohttp.ClientResponse)

Returns
Undocumented (type: WebPage)

API Documentation for python-Wappalyzer, generated by pydoctor 21.2.2 at 2021-06-15 15:04:17.