API
PyWhat has its own API, which returns a JSON object like:
{
"File Signatures": null,
"Language": null,
"Regexes": [
{
"Matched": "https://google.com/",
"Regex Pattern": {
"Name": "Uniform Resource Locator (URL)",
"Regex": "(https?:\\/\\/(?:www\\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\\.[^\\s]{2,}|www\\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\\.[^\\s]{2,}|https?:\\/\\/(?:www\\.|(?!www))[a-zA-Z0-9]+\\.[^\\s]{2,}|www\\.[a-zA-Z0-9]+\\.[^\\s]{2,})",
"Description": "A Uniform Resource Location (URL) pointing to a web address.",
"Rarity": 1,
"Tags": [
"Identifiers"
]
}
}
]
}
To use this API, run this code:
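from pywhat import identifier
# Example input; the sample output above matched this URL.
text = "https://google.com/"
ident = identifier.Identifier()  # named `ident` to avoid shadowing the built-in id()
result = ident.identify(text, api=True)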
The results are sorted from most probable to least probable.
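As a rough sketch of how the returned object might be consumed, the snippet below pulls the pattern name and matched text out of each entry. It assumes the result is a plain dict shaped like the sample above, and that "Regexes" can be null when nothing matches; `summarize_matches` and `sample` are hypothetical names used only for illustration, and the sample is hand-written rather than produced by PyWhat.

```python
from typing import Any


def summarize_matches(result: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (pattern name, matched text) pairs from a PyWhat-style result."""
    # Assumption: "Regexes" may be None (null) when nothing matched.
    matches = result.get("Regexes") or []
    return [(m["Regex Pattern"]["Name"], m["Matched"]) for m in matches]


# Hand-written sample mirroring the output above (the long regex is omitted).
sample = {
    "File Signatures": None,
    "Language": None,
    "Regexes": [
        {
            "Matched": "https://google.com/",
            "Regex Pattern": {
                "Name": "Uniform Resource Locator (URL)",
                "Rarity": 1,
                "Tags": ["Identifiers"],
            },
        }
    ],
}

for name, matched in summarize_matches(sample):
    print(f"{name}: {matched}")
```

Because the entries are already ordered, the first pair in the list is the most probable match.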
To choose which regexes are used or shown, we can use distributions. A distribution is just a list of regexes with a filter applied to it.
A nice use case is WannaCry. Using distributions, you can extract only the domains from the malware (no crypto addresses) and use them to auto-buy those domains if possible, potentially stopping the malware if it has a built-in kill switch!
We start by importing the necessary libraries:
from pywhat import pywhat_tags, Distribution
from pywhat.helper import CaseInsensitiveSet, InvalidTag, load_regexes
Now we can make a filter:
filter1 = {"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]}
Only the following filter keys are supported:
- MinRarity: Rarity measures how unlikely it is for a match to be a false positive. A rarity of 1 means it can't be a false positive; a rarity of 0.1 means it is very likely to be one. MinRarity is the absolute minimum rarity you'll want to see. Raise it to avoid false positives!
- MaxRarity: The absolute maximum rarity you want to see.
- Tags: Every regex is tagged. To only use AWS-specific regexes, use AWS as the tag. To see all tags, run what --tags 😄
- ExcludeTags: The tags you do not want to see.
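To make the four keys concrete, here is a minimal sketch of how a single regex entry could be checked against a filter. This is not PyWhat's implementation: `passes_filter` is a hypothetical helper, and the defaults (no rarity bounds, all tags allowed) and the case-insensitive tag comparison are assumptions.

```python
def passes_filter(regex_entry: dict, flt: dict) -> bool:
    """Illustrative check of one regex entry against a filter dict."""
    rarity = regex_entry["Rarity"]
    tags = {t.lower() for t in regex_entry["Tags"]}

    # Assumed defaults: rarity may range over 0..1 when a bound is missing.
    if rarity < flt.get("MinRarity", 0):
        return False
    if rarity > flt.get("MaxRarity", 1):
        return False

    # Assumption: a missing or empty "Tags" key means "allow every tag".
    wanted = {t.lower() for t in flt.get("Tags", [])}
    if wanted and not (tags & wanted):
        return False

    # Any overlap with the excluded tags rejects the entry.
    excluded = {t.lower() for t in flt.get("ExcludeTags", [])}
    return not (tags & excluded)


# The URL entry from the sample output, reduced to the fields used here.
url_entry = {"Name": "Uniform Resource Locator (URL)", "Rarity": 1, "Tags": ["Identifiers"]}
filter1 = {"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]}

print(passes_filter(url_entry, filter1))
```

Under these assumptions, the URL entry is rejected by filter1 because it carries the excluded Identifiers tag and lacks the Networking tag.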
Let's make another filter:
from pywhat import pywhat_tags, Distribution
from pywhat.helper import CaseInsensitiveSet, InvalidTag, load_regexes
filter1 = {"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]}
filter2 = {"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]}
Distributions support logical operators! Want every tag that's in both filter1 and filter2?
from pywhat import pywhat_tags, Distribution
from pywhat.helper import CaseInsensitiveSet, InvalidTag, load_regexes
filter1 = {"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]}
filter2 = {"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]}
dist = Distribution(filter1) & Distribution(filter2)
Or:
from pywhat import pywhat_tags, Distribution
from pywhat.helper import CaseInsensitiveSet, InvalidTag, load_regexes
filter1 = {"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]}
filter2 = {"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]}
dist = Distribution(filter1)
dist &= Distribution(filter2)
We also support logical or! Get all the items in distribution1 or distribution2!
from pywhat import pywhat_tags, Distribution
from pywhat.helper import CaseInsensitiveSet, InvalidTag, load_regexes
filter1 = {"MinRarity": 0.3, "Tags": ["Networking"], "ExcludeTags": ["Identifiers"]}
filter2 = {"MinRarity": 0.4, "MaxRarity": 0.8, "ExcludeTags": ["Media"]}
filter3 = {"ExcludeTags": ["AWS"]}
dist = Distribution(filter1) | Distribution(filter2)
dist |= Distribution(filter3)
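One way to picture these operators is with plain Python sets, where each set stands in for the regexes a distribution would contain. This is intuition only: the regex names below are made up, and PyWhat may well combine the filter fields themselves rather than the resulting regex lists.

```python
# Hypothetical regex names standing in for the contents of two distributions.
dist1 = {"URL", "IPv4", "MAC Address"}
dist2 = {"URL", "Email Address"}

both = dist1 & dist2    # names present in dist1 AND dist2
either = dist1 | dist2  # names present in dist1 OR dist2

# The in-place forms mirror `&=` and `|=` used above.
combined = set(dist1)
combined &= dist2

print(sorted(both))
print(sorted(either))
```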