This repository has been archived by the owner on Jul 1, 2023. It is now read-only.

Implement more layers that are available in Keras #54

Open
rxwei opened this issue Mar 11, 2019 · 34 comments
Labels: enhancement, help wanted

Comments

@rxwei
Contributor

rxwei commented Mar 11, 2019

No description provided.

@rxwei added the enhancement and help wanted labels on Mar 11, 2019
@tanmayb123
Contributor

@rxwei Quick question on this: Do we also want to add layers like Add, Subtract, Multiply, Concatenate, etc.? Two things to note:

  1. If we do, we could just sequence these operations instead of having to run the operations on the Tensors.
  2. How would a layer handle an arbitrary number of inputs? I'm aware of how to send multiple pre-defined inputs using a struct (see the sketch below), but could you just pass an array of Tensors (i.e. [Tensor<Scalar>]) for many inputs?
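For reference, a minimal sketch of the fixed-arity, struct-based approach mentioned in point 2 (the type and property names are made up for illustration, and it follows the applied(to:) style used elsewhere in this thread; an array-based input would need Array to conform to Differentiable first):

struct AddInput<Scalar: TensorFlowFloatingPoint>: Differentiable {
    // Two pre-defined tensor inputs wrapped in a single Differentiable value.
    var first: Tensor<Scalar>
    var second: Tensor<Scalar>
}

struct Add<Scalar: TensorFlowFloatingPoint>: Layer {
    // Element-wise addition of the two wrapped inputs.
    func applied(to input: AddInput<Scalar>) -> Tensor<Scalar> {
        return input.first + input.second
    }
}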

@rxwei
Contributor Author

rxwei commented Mar 18, 2019

Hi @tanmayb123, let’s not add those operator-like layers for now and defer them to further discussion. Ideally, we would want functions to be able to conform to the layer protocol, but that’s currently impossible in Swift, so we’ll probably need a single wrapper type that can turn any differentiable function into a layer.

As for arbitrary-length inputs/outputs, there’s ongoing work on making Array conform to Differentiable. It should land within the next week or so.

@rxwei
Contributor Author

rxwei commented Mar 20, 2019

Random thought: We can define layer wrappers for common arities (unary and binary), and define a layer(_:) factory function for turning any Differentiable-to-Differentiable function into a layer.

Rough sketch:

public struct ClosureLayer<T: Differentiable, U: Differentiable>: Layer {
    @noDerivative
    public var function: @differentiable (T) -> U
    public func applied(to input: T) -> U {
        return function(input)
    }
}

public struct ClosureLayer2<T: Differentiable, U: Differentiable, V: Differentiable>: Layer {
    ...
}

func layer<T: Differentiable, U: Differentiable>(_ function: @differentiable (T) -> U) -> ClosureLayer<T, U> {
    return ClosureLayer(function: function)
}

...

Then, you'll be able to use functions in sequenced(in:through:).

input.sequenced(in: context, through: conv, maxPool, layer(sin), ...)

What's better: You can now use a trailing closure to create anonymous layers!

let myLayer = layer { (input: Tensor<Float>) in
    sin(cos(input)) + someParameter
}

We have plans and designs for differentiating w.r.t. closure captures. Therefore, you will even be able to differentiate through this layer and optimize someParameter.

@Shashi456
Contributor

@rxwei could we go ahead and make a list of what is available and what's to be done, so that we get a clearer picture of which layers need to be added? #53 and #52 are also layer-adding issues, so making one issue that references all of them might make the task easier.

@Shashi456
Contributor

Shashi456 commented Mar 28, 2019

The implementations include:

Convolution Layer :

  • Conv 1D Layer
  • Conv 2D Layer
  • Conv 3D Layer
  • SeparableConv 1D
  • SeparableConv 2D
  • Depthwise Convolution 2D
  • Convolution 2D Transpose
  • Convolution 3D Transpose
  • Deconvolution 2D
  • Deconvolution 3D
  • UpSampling 1D
  • UpSampling 2D
  • UpSampling 3D
  • ZeroPadding 1D
  • ZeroPadding 2D
  • ZeroPadding 3D
  • Cropping 1D
  • Cropping 2D
  • Cropping 3D

Pooling :

  • MaxPool 1D
  • MaxPool 2D
  • AvgPool 1D
  • AvgPool 2D
  • AvgPool 3D
  • MaxPool 3D
  • GlobalMaxPool 1D
  • GlobalMaxPool 2D
  • GlobalMaxPool 3D
  • GlobalAvgPool 1D
  • GlobalAvgPool 2D
  • GlobalAvgPool 3D

Normalization :

  • BatchNorm
  • LayerNorm

Embedding :

  • Embedding

Recurrent :

  • RNN
  • LSTM
  • LSTMCell
  • GRU
  • GRUCell
  • SimpleRNN
  • SimpleRNNCell
  • StackedRNNCell

Core :

  • Dense
  • Flatten
  • Reshape
  • Dropout
  • Masking
  • SpatialDropout 1D
  • SpatialDropout 2D
  • SpatialDropout 3D

Recursive Neural Networks #68

Activation Layers :

  • ReLU
  • ELU
  • Leaky ReLU

There are a few more layers in Core and Activations, and a few more classes such as the merge classes (Add, Concatenate, Dot, Minimum, Maximum, etc.); there are also convolutional recurrent layers and the noise and locally-connected classes. The ones above are the important ones for now IMO, and we can focus on implementing those.

Activation layers will be added as functions; refer to the discussion above.

@rxwei is this okay for a start? I'll make a more comprehensive list in a while.
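For a sense of what many of these look like, here is a minimal, illustrative sketch of a global average pooling layer (assuming an NHWC input layout and the callAsFunction-based Layer API used later in this thread; the name is provisional):

public struct GlobalAvgPool2D<Scalar: TensorFlowFloatingPoint>: Layer {
    public init() {}

    @differentiable
    public func callAsFunction(_ input: Tensor<Scalar>) -> Tensor<Scalar> {
        // Average over the height and width axes, keeping batch and channels.
        return input.mean(squeezingAxes: 1, 2)
    }
}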

@tanmayb123
Contributor

@Shashi456 that’s a great list - thanks for compiling it. Just two things:

  1. Dropout has been implemented, but it’s not checked on your list.
  2. According to what Richard and I discussed above, I thought we weren’t planning on creating layers like activation layers? Rather, we’d just pass values through the functions, or pass the functions to layers as activation arguments (as you can do right now).

@Shashi456
Contributor

@tanmayb123 Alright, I will remove the activation layers. But is that for sure? Weren't they made layers to make the process more intuitive in the first place?

@tanmayb123
Contributor

@rxwei what do you think?

@rxwei
Contributor Author

rxwei commented Mar 28, 2019

@Shashi456 Thanks a lot for listing these! Looks good to me. I'd suggest starting with the non-recurrent ones first.

@aman-bhu

@rxwei, I am willing to contribute. Can I implement one of the above-listed layers?

@rxwei
Contributor Author

rxwei commented Apr 20, 2019

Absolutely! What would you like to implement?

@aman-bhu

I am planning to take the Conv 3D layer.

@rxwei
Contributor Author

rxwei commented Apr 20, 2019

Sounds great. Look forward to your PR.

@Shashi456
Contributor

@rxwei @dan-zheng I wanted to ask if it'd be possible to add more aliases for the different kinds of layers in the repo?
For example GlobalAvgPooling = GlobalAveragePooling, etc.,
and maybe also for the losses,
like meanSquaredError = MSE and sigmoidCrossEntropy = XENT.

@rxwei
Contributor Author

rxwei commented Apr 22, 2019

IMO it is ideal to stick with one set of names for consistency in all our models and example code. Currently we are leaning towards consistency with Keras. This will ensure we have overall consistency in our recommendations, while the user has the full freedom to define any aliases they want in their libraries.
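For example, a user-side alias can be a one-liner (purely illustrative; MaxPool2D is the existing layer name, while the alias name is made up):

typealias MaxPooling2D<Scalar: TensorFlowFloatingPoint> = MaxPool2D<Scalar>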

@Shashi456
Contributor

Shashi456 commented May 23, 2019

#130 shows that UpSampling3D doesn't work. We are currently looking at ways to fix it. One way to do it is to take an approach inspired by the Keras implementation of the same.

Solved.

@dan-zheng
Member

#130 shows that upsampling doesn't work. We are currently looking at ways to fix it. One way to do it is, to take an approach inspired by the Keras Implementation of the same.

To be precise, only UpSampling3D doesn't work, because its implementation creates 8-D tensors, which are too high-dimensional for broadcasting.
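One rank-friendly alternative (a sketch only, mirroring the repeat-elements idea from the Keras implementation; the helper name is hypothetical) is to repeat along one axis at a time by stacking copies and reshaping, which peaks at rank 6 for a 5-D input:

func repeatedElements<Scalar: TensorFlowFloatingPoint>(
    _ x: Tensor<Scalar>, count: Int, alongAxis axis: Int
) -> Tensor<Scalar> {
    // Stack `count` copies on a new axis right after `axis`, then flatten
    // that new axis back into `axis`, repeating each element `count` times.
    let stacked = Tensor(stacking: Array(repeating: x, count: count), alongAxis: axis + 1)
    var newShape = x.shape
    newShape[axis] *= count
    return stacked.reshaped(to: newShape)
}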

@lakshya-sky
Contributor

Hi @Shashi456,
when will SeparableConv2D be available? I'd like to implement MobileNet using S4TF.

@Shashi456
Contributor

@Dash2507, sometime next week. I'm working on it locally right now, I'll push it once I'm done with the other PRs.

@Shashi456
Contributor

@rxwei just a quick question: the convolutional layers also include a zero-padding layer, but we already have a padded function. Do I write the layers anyway? I'm just trying to avoid redundancy, since they would be wrappers that just call this function.

@rxwei
Contributor Author

rxwei commented Jun 16, 2019

We already have such layers; Reshape, for example. Adding a layer wrapper for each function is definitely not ideal and would complicate our API surface. Instead of putting a lot of work into implementing those wrapper layers, I'd suggest trying to define a Function (or Lambda) layer that takes an arbitrary differentiable function and uses it inside callAsFunction(_:). Essentially, it's going to look like this:

public struct Function<InputScalar: TensorFlowFloatingPoint, OutputScalar: TensorFlowFloatingPoint>: Layer {
    public typealias Input = Tensor<InputScalar>
    public typealias Output = Tensor<OutputScalar>
    public var body: @differentiable (Input) -> Output
    public init(body: @differentiable (Input) -> Output) {
        self.body = body
    }
    public func callAsFunction(_ input: Input) -> Output {
        body(input)
    }
}

With this, you can turn any closure to a layer:

let tanhLayer = Function<Float, Float>(body: tanh)
let reshapeLayer = Function<Float, Float> { x in x.reshaped(to: [10, 10]) }
let paddingLayer = Function<Float, Float> { x in x.padded(forSizes: [(0, 1)], with: 0) }

Would you like to prototype this?

@Shashi456
Contributor

Alright, I'll get a PR up later today.

@jon-tow
Contributor

jon-tow commented Jun 17, 2019

I've attempted an implementation of an Embedding layer but am running into problems with the Layer protocol's input type requirements. Given that an Embedding layer consumes tensors of indices (UInt/Int), there's no way to satisfy the differentiability of callAsFunction(_:). Is there a workaround for this?

@dan-zheng I've noticed an implementation of a Differentiable Embedding struct in the GPT-2 model found in the swift-models repo (GPT-2 Transformer). This doesn't conform to the Layer protocol but could we bring it into the API since it's quite useful for NLP tasks?

@Shashi456
Contributor

@jon-tow did you also define a VJP for your embedding layer?

@rxwei
Contributor Author

rxwei commented Jun 17, 2019

I've attempted an implementation of an Embedding layer but am running into problems with the Layer protocol's input type requirements. Given that an Embedding layer consumes tensors of indices (UInt/Int), there's no way to satisfy the differentiability of callAsFunction(_:). Is there a workaround for this?

For now, you can define a nested Input structure and mark the vocabulary property as @noDerivative. Something like:

struct Embedding<Scalar: TensorFlowFloatingPoint> {
    struct Input: Differentiable {
        @noDerivative var vocabulary: Tensor<Int32>
    }
    func callAsFunction(_ input: Input) -> Tensor<Scalar> {
        ...
    }
}

@jon-tow
Contributor

jon-tow commented Jun 17, 2019

Hey @Shashi456. Yup. It just wouldn't compile as it relied on the Raw.gather(params:, atIndices:) function which requires a BinaryInteger for the second argument. Thanks @rxwei I'll give it a try.

@eaplatanios
Contributor

eaplatanios commented Jun 17, 2019 via email

@rxwei
Contributor Author

rxwei commented Jun 17, 2019

Specifying @differentiable(wrt: self) is not possible yet because the Layer protocol requires both input and self to be differentiable. There are definitely a lot of ways to resolve this, e.g. defining a separate protocol that Layer inherits from and making that protocol only require self to be differentiable. However, that requires some non-trivial thunking-related engineering right now.
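A rough sketch of what such a parent protocol could look like (the name and exact shape are assumptions, not a committed design):

// Hypothetical: only `self` is required to be differentiable; `Layer` could
// then refine this by additionally requiring `Input: Differentiable`.
public protocol Module: Differentiable {
    associatedtype Input
    associatedtype Output: Differentiable

    @differentiable(wrt: self)
    func callAsFunction(_ input: Input) -> Output
}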

@rxwei
Contributor Author

rxwei commented Jun 17, 2019

It just wouldn't compile as it relied on the Raw.gather(params:, atIndices:) function which requires a BinaryInteger for the second argument.

Hope we can merge #151 so that you can use gathering(atIndices:alongAxis:).
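Once that's available, a sketch of the embedding forward pass (illustrative only; the names and initializer here are not a final API) could look like:

struct Embedding<Scalar: TensorFlowFloatingPoint>: Layer {
    struct Input: Differentiable {
        @noDerivative var indices: Tensor<Int32>
    }

    // The trainable embedding matrix, one row per vocabulary entry.
    var weight: Tensor<Scalar>

    init(embeddings: Tensor<Scalar>) {
        weight = embeddings
    }

    @differentiable
    func callAsFunction(_ input: Input) -> Tensor<Scalar> {
        // Look up one row of `weight` per index.
        return weight.gathering(atIndices: input.indices, alongAxis: 0)
    }
}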

@jon-tow
Contributor

jon-tow commented Jun 17, 2019

Richard's advice resolved the compiler issues I had before regarding input types. Thanks for the suggestion @eaplatanios.
The only issue left seems to be differentiating gathering. I'll keep an eye out for that merge. Appreciate the help folks!

@bartchr808
Contributor

Hey @jon-tow! Actually, we were mistaken: #156 already added gathering(atIndices:alongAxis:), so you should have access to it! 😄

@jon-tow
Contributor

jon-tow commented Jun 17, 2019

@bartchr808 I had some tests passing and everything seemed okay. I was wondering what was going on! Thanks for letting me know :). I'll submit a PR sometime today.

@Shashi456
Contributor

@rxwei So I've been working on the Function layer we were talking about the other day:

public struct Function<InputScalar: TensorFlowFloatingPoint, OutputScalar: TensorFlowFloatingPoint>: Layer {
    public typealias Input = Tensor<InputScalar>
    public typealias Output = Tensor<OutputScalar>
    public typealias Body = @differentiable (Input) -> Output

    @noDerivative public let body: Body

    public init(body: @escaping Body) {
        self.body = body
    }

    @differentiable
    public func callAsFunction(_ input: Input) -> Output {
        return body(input)
    }
}

Does this look right? I run into an error that the type doesn't conform to the Layer protocol and that a call function is needed. As far as I understand, for a struct to conform to a protocol you need to define all of the protocol's requirements, somewhat like abstract classes. Any thoughts on where I might be going wrong?

@tanmayb123
Contributor

public struct Function<InputScalar: TensorFlowFloatingPoint, OutputScalar: TensorFlowFloatingPoint>: Layer {
    public typealias Input = Tensor<InputScalar>
    public typealias Output = Tensor<OutputScalar>
    public typealias Body = @differentiable (Input) -> Output

    @noDerivative public let body: Body

    public init(body: @escaping Body) {
        self.body = body
    }

    @differentiable
    public func callAsFunction(_ input: Input) -> Output {
        return body(input)
    }

    @differentiable
    public func call(_ input: Input) -> Output {
        return callAsFunction(input)
    }
}

That compiles for me.
