This Python package lets you integrate the StartupRadar API directly into your own data or machine learning pipelines. Starting from only a list of domains, you can build large pandas DataFrames filled with all the data available on StartupRadar.
- Creates a human-readable DataFrame for usage in Excel or Google Spreadsheets (through CSV).
Transformers in this module create data from API functionality. All transformers in this module require API access.
LinkTransformer
: Create columns for all the domains a given domain links to
BacklinkTransformer
: Create columns for all the domains that link to the given domain
DomainTextTransformer
: Create a text column with the homepage text of the given domain
BacklinkTypeCounter
: Count the types of pages that link to a specific domain
Transformers in this module work with DataFrames and provide useful feature generation on domains. The transformers in this module don't require the API and can be used by anyone.
DomainNameTransformer
: Extract features from a domain name; currently only the top-level domain, e.g. com or io
CommonStringTransformer
: Application of a CountVectorizer to find common strings among passed inputs
ColumnPrefixTransformer
: Create a DataFrame with the same column names, but prefixed with e.g. prefix_
CounterTransformer
: Create row-wise Counter objects and distribute keys as columns
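To make the "row-wise Counter objects distributed as columns" idea concrete, here is a minimal sketch in plain pandas. It is not the package's implementation of CounterTransformer; the input labels and the `.example` domains are made up for illustration.

```python
from collections import Counter

import pandas as pd

# Hypothetical input: one list of labels per domain (not real StartupRadar data).
rows = pd.Series(
    [["blog", "news", "blog"], ["news"], []],
    index=["a.example", "b.example", "c.example"],
)

# Build one Counter per row, then spread the counted keys out as columns.
counters = [dict(Counter(labels)) for labels in rows]
counts = pd.DataFrame(counters, index=rows.index).fillna(0).astype(int)

print(counts)
```

Keys that are missing in a row are filled with 0, so every row ends up with the same set of columns.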
Transformers that re-implement scikit-learn transformers, to also output Pandas DataFrames. These transformers can be used by anyone, no API key necessary.
OneHotEncoderDF
: One Hot Encoder outputting a dense and consistent DataFrame
FeatureUnionDF
: Create a FeatureUnion with pd.DataFrames as input and output
PipelineDF
: Create a pipeline that retains DataFrames and their column names
TfidfVectorizerDF
: Adaptation of the sklearn transformer
CountVectorizerDF
: Adaptation of the sklearn transformer
Transformers we're thinking about that may be coming soon:
- something to leverage the similar domains endpoint
- TF-IDF of all backlinks or (forward) links combined (domain- or URL-level)
For most transformers, you can simply pass a series of domain names as input. In the case of the DomainNameTransformer, it could look like this:
> import pandas as pd
> from startupradar.transformers.util import DomainNameTransformer
>
> domains = ["loreyventures.com", "startupradar.co", "karllorey.com"]
> domains_series = pd.Series(domains)
> t = DomainNameTransformer()
> t.fit_transform(domains_series)
tld
loreyventures.com com
startupradar.co co
karllorey.com com
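For intuition, the tld column shown above can be approximated in plain pandas by taking the text after the last dot of each domain. This is a rough sketch, not the transformer's actual logic, and it assumes simple single-part suffixes (it would return `uk` rather than `co.uk`).

```python
import pandas as pd

domains = pd.Series(["loreyventures.com", "startupradar.co", "karllorey.com"])

# Split once from the right and keep the last piece as the top-level domain.
tld = domains.str.rsplit(".", n=1).str[-1]

# Index the result by domain so it mirrors the layout of the output above.
features = pd.DataFrame({"tld": tld.values}, index=domains.values)

print(features)
```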
This is a work in progress, and you should expect things to change on a daily basis. If you'd still like to try it, you can get the latest version by installing it as a git-based dependency:
> pip install git+https://github.com/startupradar/transformers.git