Merge pull request #6 from RubixML/0.3.0-beta
0.3.0 beta
andrewdalpino authored Jan 1, 2021
2 parents 7f4fde2 + 72f3c73 commit f9d7371
Showing 38 changed files with 1,166 additions and 760 deletions.
1 change: 1 addition & 0 deletions .github/FUNDING.yml
@@ -0,0 +1 @@
github: [RubixML, andrewdalpino]
2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
@@ -19,6 +19,8 @@ jobs:
uses: shivammathur/setup-php@v2
with:
php-version: ${{ matrix.php-versions }}
tools: pecl
extensions: bz2
ini-values: memory_limit=-1

- name: Validate composer.json
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,11 @@
- 0.3.0-beta
- Added Vantage Point Tree for spatial queries
- Added Bzip2 serializers
- Added Levenshtein distance kernel
- Moved K Best Selector to main repository
- Added custom exceptions from the main repo
- Moved Flysystem Persister over to main repo

- 0.2.1-beta
- Implemented K Best feature selector

2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2020 The Rubix ML Community
Copyright (c) 2020 Rubix ML
Copyright (c) 2020 Andrew DalPino

Permission is hereby granted, free of charge, to any person obtaining a copy
5 changes: 4 additions & 1 deletion README.md
@@ -10,5 +10,8 @@ $ composer require rubix/extras
### Requirements
- [PHP](https://php.net/manual/en/install.php) 7.2 or above

##### Optional
- [Bzip2 extension](https://www.php.net/manual/en/book.bzip2.php) for Bzip2 compression

## License
[MIT](https://github.com/RubixML/Extras/blob/master/LICENSE.md)
The code is licensed [MIT](LICENSE) and the documentation is licensed [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
Original file line number Diff line number Diff line change
@@ -1,28 +1,28 @@
<?php

namespace Rubix\ML\Benchmarks\Transformers;
namespace Rubix\ML\Benchmarks\Graph\Trees;

use Rubix\ML\Graph\Trees\VPTree;
use Rubix\ML\Datasets\Generators\Blob;
use Rubix\ML\Transformers\KBestSelector;
use Rubix\ML\Datasets\Generators\Agglomerate;

/**
* @Groups({"Transformers"})
* @Groups({"Trees"})
* @BeforeMethods({"setUp"})
*/
class KBestSelectorBench
class VPTreeBench
{
protected const DATASET_SIZE = 10000;

/**
* @var \Rubix\ML\Datasets\Labeled
* @var \Rubix\ML\Datasets\Labeled;
*/
public $dataset;
protected $dataset;

/**
* @var \Rubix\ML\Transformers\KBestSelector
* @var \Rubix\ML\Graph\Trees\VPTree
*/
protected $transformer;
protected $tree;

public function setUp() : void
{
@@ -34,16 +34,16 @@ public function setUp() : void

$this->dataset = $generator->generate(self::DATASET_SIZE);

$this->transformer = new KBestSelector(2);
$this->tree = new VPTree(30);
}

/**
* @Subject
* @Iterations(3)
* @OutputTimeUnit("milliseconds", precision=3)
* @OutputTimeUnit("seconds", precision=3)
*/
public function apply() : void
public function grow() : void
{
$this->dataset->apply($this->transformer);
$this->tree->grow($this->dataset);
}
}
11 changes: 7 additions & 4 deletions composer.json
@@ -10,21 +10,24 @@
"ai", "rubixml", "rubix ml"
],
"authors": [
{
"name": "Andrew DalPino",
"homepage": "https://github.com/andrewdalpino",
"role": "Lead Engineer"
},
{
"name": "Contributors",
"homepage": "https://github.com/RubixML/Extras/graphs/contributors"
}
],
"require": {
"php": ">=7.2",
"rubix/ml": "^0.2.0",
"rubix/ml": "0.3.0",
"rubix/tensor": "^2.0.4",
"wamania/php-stemmer": "^2.0",
"league/flysystem": "2.0.0-beta.3"
"wamania/php-stemmer": "^2.0"
},
"require-dev": {
"friendsofphp/php-cs-fixer": "2.16.*",
"league/flysystem-memory": "2.0.0-beta.3",
"phpbench/phpbench": "0.17.*",
"phpstan/extension-installer": "^1.0",
"phpstan/phpstan": "0.12.*",
28 changes: 28 additions & 0 deletions docs/graph/trees/vp-tree.md
@@ -0,0 +1,28 @@
<span style="float:right;"><a href="https://github.com/RubixML/Extras/blob/master/src/Graph/Trees/VPTree.php">[source]</a></span>

# VP Tree
A Vantage Point Tree is a binary spatial tree that divides samples by their distance from the center of a cluster called the *vantage point*. Samples that are closer to the vantage point are placed in one branch of the tree, while samples that are farther away are placed in the other.

**Interfaces:** Binary Tree, Spatial

**Data Type Compatibility:** Depends on distance kernel

## Parameters
| # | Param | Default | Type | Description |
|---|---|---|---|---|
| 1 | max leaf size | 30 | int | The maximum number of samples that each leaf node can contain. |
| 2 | kernel | Euclidean | Distance | The distance kernel used to compute the distance between sample points. |

## Example
```php
use Rubix\ML\Graph\Trees\VPTree;
use Rubix\ML\Kernels\Distance\Euclidean;

$tree = new VPTree(30, new Euclidean());
```
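
The split described above can be sketched in plain PHP without the library. This is a minimal illustration, not the `VPTree` implementation: the helper names `euclidean()` and `splitByVantagePoint()` are hypothetical, and using the centroid as the vantage point with the median distance as the threshold is an assumption made for the example.

```php
<?php

declare(strict_types=1);

/**
 * Euclidean distance between two equal-length numeric vectors.
 */
function euclidean(array $a, array $b) : float
{
    $sum = 0.0;

    foreach ($a as $i => $value) {
        $sum += ($value - $b[$i]) ** 2;
    }

    return sqrt($sum);
}

/**
 * Hypothetical helper (not part of Rubix ML): split samples into a near
 * group and a far group around a vantage point.
 *
 * @return array{0: array, 1: array, 2: float} [near, far, radius]
 */
function splitByVantagePoint(array $samples, array $vantagePoint) : array
{
    $distances = [];

    foreach ($samples as $sample) {
        $distances[] = euclidean($sample, $vantagePoint);
    }

    // Assumption: use the median distance as the splitting radius.
    $sorted = $distances;
    sort($sorted);
    $radius = $sorted[intdiv(count($sorted), 2)];

    $near = $far = [];

    foreach ($samples as $i => $sample) {
        if ($distances[$i] < $radius) {
            $near[] = $sample;
        } else {
            $far[] = $sample;
        }
    }

    return [$near, $far, $radius];
}

$samples = [
    [0.0, 0.0], [1.0, 0.0], [0.0, 2.0],
    [5.0, 5.0], [6.0, 5.0], [9.0, 9.0],
];

// Assumption: the vantage point is the centroid of the samples.
$n = count($samples);

$centroid = [
    array_sum(array_column($samples, 0)) / $n,
    array_sum(array_column($samples, 1)) / $n,
];

[$near, $far, $radius] = splitByVantagePoint($samples, $centroid);

echo count($near) . ' near, ' . count($far) . ' far' . PHP_EOL;
```

Applying the same split recursively to each group until a group holds no more than the max leaf size would yield a tree of the shape described above.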

## Additional Methods
This tree does not have any additional methods.

### References
>- P. N. Yianilos. (1993). Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces.
16 changes: 16 additions & 0 deletions docs/kernels/distance/levenshtein.md
@@ -0,0 +1,16 @@
<span style="float:right;"><a href="https://github.com/RubixML/Extras/blob/master/src/Kernels/Distance/Levenshtein.php">[source]</a></span>

# Levenshtein
Levenshtein distance is defined as the minimum number of single-character edits (insertions, deletions, or substitutions) needed to change one word into another.

**Data Type Compatibility:** Categorical

## Parameters
This kernel does not have any parameters.

## Example
```php
use Rubix\ML\Kernels\Distance\Levenshtein;

$kernel = new Levenshtein();
```
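
To make the definition concrete, the sketch below uses PHP's built-in `levenshtein()` function on a pair of words. The `sampleDistance()` helper is hypothetical, and summing the per-feature distances across a sample is an assumption for illustration. The kernel itself may combine features differently.

```php
<?php

declare(strict_types=1);

// "kitten" -> "sitting" takes two substitutions and one insertion.
$distance = levenshtein('kitten', 'sitting');

echo $distance . PHP_EOL; // prints 3

/**
 * Hypothetical helper (not part of Rubix ML): total edit distance
 * between two samples made of categorical (string) features.
 */
function sampleDistance(array $a, array $b) : int
{
    $total = 0;

    foreach ($a as $i => $value) {
        $total += levenshtein((string) $value, (string) $b[$i]);
    }

    return $total;
}

// 3 for kitten/sitting plus 2 for flaw/lawn.
$total = sampleDistance(['kitten', 'flaw'], ['sitting', 'lawn']);

echo $total . PHP_EOL; // prints 5
```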
24 changes: 24 additions & 0 deletions docs/persisters/Serializers/bzip2.md
@@ -0,0 +1,24 @@
<span style="float:right;"><a href="https://github.com/RubixML/Extras/blob/master/src/Persisters/Serializers/Bzip2.php">[source]</a></span>

# Bzip2
A compression format based on the Burrows–Wheeler transform. Bzip2 typically produces slightly smaller output than Gzip but is slower and requires more memory.

> **Note:** This serializer requires the Bzip2 PHP extension.

## Parameters
| # | Param | Default | Type | Description |
|---|---|---|---|---|
| 1 | block size | 4 | int | The size of each block between 1 and 9 where 9 gives the best compression. |
| 2 | work factor | 0 | int | Controls how the compression phase behaves when the input is highly repetitive. |
| 3 | serializer | Native | Serializer | The base serializer whose output is compressed. |

## Example
```php
use Rubix\ML\Persisters\Serializers\Bzip2;
use Rubix\ML\Persisters\Serializers\Native;

$serializer = new Bzip2(4, 125, new Native());
```
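
As a rough picture of what wrapping a base serializer in Bzip2 compression involves, the sketch below pairs PHP's native `serialize()` with `bzcompress()` from the Bzip2 extension. The function names `bzip2Encode()` and `bzip2Decode()` are hypothetical, and the sketch assumes the serializer simply compresses the base serializer's output. It is not the library's code.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: serialize a value natively, then compress it.
 */
function bzip2Encode($data, int $blockSize = 4, int $workFactor = 0) : string
{
    $compressed = bzcompress(serialize($data), $blockSize, $workFactor);

    // bzcompress() returns an integer error code on failure.
    if (!is_string($compressed)) {
        throw new RuntimeException("Compression failed with code $compressed.");
    }

    return $compressed;
}

/**
 * Hypothetical helper: decompress, then unserialize.
 */
function bzip2Decode(string $encoded)
{
    $serialized = bzdecompress($encoded);

    // bzdecompress() returns an error code or false on failure.
    if (!is_string($serialized)) {
        throw new RuntimeException('Decompression failed.');
    }

    return unserialize($serialized);
}

if (extension_loaded('bz2')) {
    $payload = ['weights' => [0.1, 0.2, 0.3], 'bias' => 0.5];

    $encoded = bzip2Encode($payload, 4, 125);

    $restored = bzip2Decode($encoded);

    var_dump($restored === $payload);
} else {
    echo 'The bz2 extension is not loaded.' . PHP_EOL;
}
```

A larger block size trades memory for a better compression ratio, which is why it is exposed as a parameter rather than fixed.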

### References
>- J. Tsai. (2006). Bzip2: Format Specification.
29 changes: 0 additions & 29 deletions docs/persisters/flysystem.md

This file was deleted.

30 changes: 0 additions & 30 deletions docs/transformers/k-best-selector.md

This file was deleted.

6 changes: 6 additions & 0 deletions phpunit.xml
@@ -13,12 +13,18 @@
<testsuite name="Base">
<directory>tests</directory>
</testsuite>
<testsuite name="Graph">
<directory>tests/Graph</directory>
</testsuite>
<testsuite name="NeuralNet">
<directory>tests/NeuralNet</directory>
</testsuite>
<testsuite name="Other">
<directory>tests/Other</directory>
</testsuite>
<testsuite name="Persisters">
<directory>tests/Persisters</directory>
</testsuite>
<testsuite name="Transformers">
<directory>tests/Transformers</directory>
</testsuite>