kahnvex/hidi


HiDi: Pipelines for Latent Factor Modeling


HiDi is a library for high-dimensional latent factor modeling for collaborative filtering applications.

Read the full documentation.

How Do I Use It?

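The following example builds a pipeline that reads user-item interactions from a CSV file, reduces them to latent factors, and writes the result to disk.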

from hidi import inout, clean, matrix, pipeline


# CSV file with link_id and item_id columns
in_files = ['hidi/examples/data/user-item.csv']

# File to write output data to
outfile = 'latent-factors.csv'

transforms = [
    inout.ReadTransform(in_files),      # Read data from disk
    clean.DedupeTransform(),            # Dedupe it
    matrix.SparseTransform(),           # Make a sparse user*item matrix
    matrix.SimilarityTransform(),       # To item*item similarity matrix
    matrix.SVDTransform(),              # Perform SVD dimensionality reduction
    matrix.ItemsMatrixToDFTransform(),  # Make a DataFrame with an index
    inout.WriteTransform(outfile)       # Write results to csv
]

pl = pipeline.Pipeline(transforms)
pl.run()
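To make the chaining idea concrete, here is a minimal standard-library sketch of the first few stages (dedupe, sparse grouping, item-item similarity). It is not HiDi's implementation: the class names `DedupeStep`, `SparseStep`, and `SimilarityStep`, the `run_pipeline` helper, and the choice of Jaccard similarity are all assumptions made for illustration, and the SVD stage is left out.

```python
from collections import defaultdict
from itertools import combinations


class DedupeStep:
    """Hypothetical step: drop repeated (link_id, item_id) pairs, keeping order."""

    def transform(self, rows):
        seen = set()
        unique = []
        for row in rows:
            if row not in seen:
                seen.add(row)
                unique.append(row)
        return unique


class SparseStep:
    """Hypothetical step: store only observed pairs as item_id -> set of link_ids."""

    def transform(self, rows):
        item_users = defaultdict(set)
        for link_id, item_id in rows:
            item_users[item_id].add(link_id)
        return dict(item_users)


class SimilarityStep:
    """Hypothetical step: Jaccard similarity of user sets for each item pair.

    Jaccard is an assumed stand-in, not necessarily the measure HiDi uses.
    """

    def transform(self, item_users):
        sims = {}
        for a, b in combinations(sorted(item_users), 2):
            users_a, users_b = item_users[a], item_users[b]
            sims[(a, b)] = len(users_a & users_b) / len(users_a | users_b)
        return sims


def run_pipeline(steps, data):
    """Feed the output of each step into the next one."""
    for step in steps:
        data = step.transform(data)
    return data


# Toy interactions; ("u2", "b") appears twice on purpose.
rows = [
    ("u1", "a"), ("u1", "b"), ("u2", "a"),
    ("u2", "b"), ("u2", "b"), ("u3", "c"),
]
similarities = run_pipeline(
    [DedupeStep(), SparseStep(), SimilarityStep()], rows
)
```

Items `a` and `b` share exactly the same users here, so their similarity is the maximum, while `c` shares no users with either. A real pipeline would follow this with a dimensionality reduction stage such as the `SVDTransform` shown above.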

Setup

Requirements

HiDi is tested against CPython 2.7, 3.4, 3.5, and 3.6. It may work with other versions of CPython, but they are not tested.

Installation

To install HiDi, run:

$ pip install hidi

Run the Tests

$ pip install tox
$ tox
