Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FR] ipfs add -w --name when adding from stdin #4986

Closed
ToxicFrog opened this issue Apr 28, 2018 · 27 comments
Closed

[FR] ipfs add -w --name when adding from stdin #4986

ToxicFrog opened this issue Apr 28, 2018 · 27 comments
Labels
help wanted Seeking public contribution on this issue kind/feature A new feature

Comments

@ToxicFrog
Copy link

Version information:

go-ipfs version: 0.4.13-
Repo version: 6
System version: amd64/linux
Golang version: go1.9.4

Type:

Feature Request

Description:

ipfs add supports -w for filename preservation, e.g. ipfs add -w foo.zip will let you access the result as /ipfs/<dir hash>/foo.zip instead of just /ipfs/<file hash>. However, it doesn't work very well when reading data from stdin:

$ cat test.txt | ipfs add -w
added Qme9Z5gJLNsQzVELtHh2oQZfjDugHGvLGuNKVtttGAJYAE 
Qme9Z5gJLNsQzVELtHh2oQZfjDugHGvLGuNKVtttGAJYAE
added Qmaq9QpAYfjF5sPAeEytKHk9rC3Vcgt6J8PRbCsNk7ypqY

And now test.txt can only be accessed as /ipfs/Qmaq9QpAYfjF5sPAeEytKHk9rC3Vcgt6J8PRbCsNk7ypqY/Qme9Z5gJLNsQzVELtHh2oQZfjDugHGvLGuNKVtttGAJYAE

This can be fixed post hoc by creating a new parent object with the right link name (using ipfs object or, more conveniently, ipfs files), but this is awkward, and (if you need the resulting object pinned) slow.

It would be much more convenient if you could specify a name for the wrapped object on the command line, and have ipfs add create the link with that name.

@Stebalien Stebalien added the kind/feature A new feature label Apr 28, 2018
@magik6k magik6k added the help wanted Seeking public contribution on this issue label May 19, 2018
@kjzz
Copy link
Contributor

kjzz commented May 29, 2018

@Stebalien @magik6k if someone use command "cat",the request cannot get the file name.so can i add a command such as "-w test.txt",the “test.txt” is the file name that user can specify,so we can get the file name instead of a hash string?

@Stebalien
Copy link
Member

Currently, that command would add the file test.txt from the current directory. We'd probably need to add a new flag. However, I'm not sure the best way to do this.

@kjzz
Copy link
Contributor

kjzz commented May 29, 2018

@Stebalien i understand you.And i want solve this problem and i am coding about this feature.In my opinion,i design a new command "-n",it means the filename.we can use this command such as "cat text.txt | ipfs add -w -n "text.txt" ",so i can get the filename from the command.it is ok?

@kjzz
Copy link
Contributor

kjzz commented May 29, 2018

@Stebalien the reason i design about this is that if i use "cat" to get the file,we cannot get the filename in cmd.request.

@schomatis
Copy link
Contributor

Hey @kjzz, sorry for the delay, are you still interested in fixing this? It would seem to me that the real fix should happen in go-ipfs-cmdkit (not sure though), but we could definitely add a flag in the adder to explicit a file name that would overwrite the original file name or provide one in case it wasn't specified (as is the case if reading from stdin).

This is the function in the adder that checks file names,

https://github.com/ipfs/go-ipfs/blob/afda4ca04f4ce5b9ffe7ada05a05aaaec5eb0cfb/core/coreunix/add.go#L414

and probably where the --name flag should be applied (the value passed with the flag should be stored in the Adder).

@kjzz
Copy link
Contributor

kjzz commented Aug 22, 2018

hello @schomatis ,sorry for delay,i am still interested in fix this issue,and i am working for this.i plan to add --name flag to get file name.do you agree with me?

@schomatis
Copy link
Contributor

Yes, go ahead! Ping me if you have any questions.

@kjzz
Copy link
Contributor

kjzz commented Aug 23, 2018

hello,@schomatis i have a question that if we want add two or more files to ipfs,such as ipfs add -w test.txt test1.txt.And how to use --name to explicit the files name.For example,if we have only one file , we can use ipfs add -w test.txt --name myfile.txt.However,if we have more than one file,can we use ipfs add -w test1.txt test2.txt --name myfile1,myfile2?(I plan use a comma or oher flag to split files name)maybe it is not a good solution,if you have some advice,you can ping me.

@schomatis
Copy link
Contributor

@kjzz I would start with the simplest solution (at least for this first iteration) of only one name, especially to avoid the scenario of having to check in ipfs add <dir> --name <many-names> that we're passing as many names as files in the directory hierarchy. With only one name we should support:

  1. Passing content through stdin (where FileName() is "").

  2. (Optional) Passing content through a single file and replace its name with --name.

We should ask @keks what's the best way to check that req.Files is actually a single file (for case 2, I think checking for FileName() == "" is enough for case 1).

@kjzz
Copy link
Contributor

kjzz commented Aug 23, 2018

@schomatis hello, i have finish the simplest solution of one name .You can review in #5399

@schomatis
Copy link
Contributor

Hey, sorry, I should have been more clear (I forgot how this issue got started), there are two different problems:

  1. I pass data from stdin so that is added as a file without a name (actually, with a name equal to its hash).

  2. I add data with a directory wrapper, which is anonymous, I would like to name that directory.

Those two problems (although somewhat similar) are not related:

echo "IPFS" | ipfs add
# added QmdFyxZXsFiP4csgfM5uPu99AvFiKH62CSPDw5TP92nr7w QmdFyxZXsFiP4csgfM5uPu99AvFiKH62CSPDw5TP92nr7w
# The content added through `stdin` has the hash (QmdFyxZ) as its name.

# Now wrap `stdin` inside a directory (`-w`).
echo "IPFS" | ipfs add -w
added QmdFyxZXsFiP4csgfM5uPu99AvFiKH62CSPDw5TP92nr7w QmdFyxZXsFiP4csgfM5uPu99AvFiKH62CSPDw5TP92nr7w
added QmTv9PEwAsEJefjVQwo1gCtfemcTTaDFfg9Wwq7eVs7Yxj 
# Two unnamed files:
#   1. The "IPFS" string from `stdin` as before (same hash).
#   2. The directory that wraps it (QmTv9PE).

I should have made that distinction from the beginning, I was talking about the 1st problem, which of those do you want to fix? The PR seems more oriented to the 2nd.

@schomatis
Copy link
Contributor

Also, the second problem is a bit more difficult, as it involves the MFS layer in another repo, so I would suggest you start with the first one.

@keks
Copy link
Contributor

keks commented Aug 23, 2018

@schomatis

what's the best way to check that req.Files is actually a single file

Phew, good question. The problem is that usually the request will come through HTTP which uses multipart encoding, and looking through the specs there doesn't seem to be a way to check how many files are sent without reading them all. Looks like there aren't even Content-Length headers for the individual files, so we can't see whether it only is one file by checking whether the whole size matches the first file's size plus a bunch of headers and boundaries.
What we could do is check that if we set the name there only is one file (and maybe even require that it is stdin) in PreRun. Even for HTTP calls these are executed on the client side before making the request and here we can check that if we set a name (a) f.IsDirectory() returns false and if not (b) we type assert the cmdkit.Fileto a *cmdkit.SliceFile and use sf.Length() to get the number of files (though subdirectories that contain multiple files will only count as one, of course).

Does that help?

@schomatis
Copy link
Contributor

@keks

The problem is that usually the request will come through HTTP which uses multipart encoding,

Thanks, I was only thinking about the CLI case and not HTTP, so maybe we should concentrate on problem 1 for now (data from stdin),

What we could do is check that if we set the name there only is one file (and maybe even require that it is stdin) in PreRun.

How could we achieve just the stdin check in PreRun?

@kjzz
Copy link
Contributor

kjzz commented Aug 31, 2018

Hello @schomatis ,i want to fix the 2nd problem that you said.

# Now wrap `stdin` inside a directory (`-w`).
echo "IPFS" | ipfs add -w
added QmdFyxZXsFiP4csgfM5uPu99AvFiKH62CSPDw5TP92nr7w QmdFyxZXsFiP4csgfM5uPu99AvFiKH62CSPDw5TP92nr7w
added QmTv9PEwAsEJefjVQwo1gCtfemcTTaDFfg9Wwq7eVs7Yxj 

Now if we use the command -w ,we can not define the path name.

$ cat test.txt | ipfs add -w
added Qme9Z5gJLNsQzVELtHh2oQZfjDugHGvLGuNKVtttGAJYAE 
Qme9Z5gJLNsQzVELtHh2oQZfjDugHGvLGuNKVtttGAJYAE
added Qmaq9QpAYfjF5sPAeEytKHk9rC3Vcgt6J8PRbCsNk7ypqY

In my PR #5399 ,we can use command like this:

cat test1.txt | ipfs add -w --name myfile.txt
added Qmf3xdREx34pewgYZjyvHQsrZxuk1TLUFdzoF95dsXfv3k myfile.txt
added QmSLquBbWRXs3VjTCJBXYv4BHn7NFNZxh5iU3zGNp5aMcf

@keks
Copy link
Contributor

keks commented Aug 31, 2018

@schomatis

How could we achieve just the stdin check in PreRun?

I just looked into this and it looks like I was wrong and there is no straightforward way to do that, sorry. We could add a helper the "cmdkit/files" package - I'm just a bit weary that at some point we might have a lot of helpers in there.

@schomatis
Copy link
Contributor

Hey @kjzz, I think we're still mixing two different issues. The second problem is about naming the anonymous directory included by the flag -w, but you seem to be interested in naming content from stdin, that's the first problem, and it's independent from the first one, it will happen whether or not you include the -w flag.

In the 1st problem that you're trying to fix, we need to detect that the file came from stdin, but as @keks was explaining that may not be so easy. @keks Do you see a simple (even if not complete) solution here that would take into account the HTTP as well as the CLI case?

Checking that the filename is an empty string "" would work only if we're sending a single file (otherwise, if there are multiple files without names, we can't rename them all to the same name). A really dirty approach would be to just rename the first unnamed file (and leave the rest unnamed), that would work for stdin in CLI, and in HTTP, in the worst case where we're receiving many file, we just rename the first one and leave the rest as is, which doesn't sound so bad, @keks WDYT?

@kjzz
Copy link
Contributor

kjzz commented Aug 31, 2018

hello @schomatis , thanks for explain.My pr #5399 aimed at solving the second problem.And i think it is the simplest way to solve problem.if we add many files with --name,it will return error.And i think --name ,should only support one file add.WDYT?

@schomatis
Copy link
Contributor

i think --name ,should only support one file

Agreed.

My pr #5399 aimed at solving the second problem.

The second problem is about the directory name, but you're PR is adding a name to the file, e.g.,

echo "FILE" | ipfs add -w
added QmeMMd4ou96A66KZPswpdsjjk4BtNhKk9Ap9kWCVFh3tZh QmeMMd4ou96A66KZPswpdsjjk4BtNhKk9Ap9kWCVFh3tZh
# ^^^ "FILE" contents without name: 1st problem.

added QmQ7iFAvmU57VTnjptj9h2nEmXQnQnd96F6ygFb3zdLEKP
# ^^^ Directory (included because of the `-w` argument) *also* without name: 2nd problem.

So, forgetting about the 1st/2nd terminology that may be confusing, there are two things without a name:

  1. The file passed through stdin.
  2. The directory wrapper created because of -w.

Which of those would you like to name? Your PR right now seems to be trying to name the file but is adding a check for the -w flag (which is unrelated to the fact that a stdin file has no name), that is a bit confusing to me. Once we understand what we're trying to accomplish we can focus more on the PR/code.

@kjzz
Copy link
Contributor

kjzz commented Sep 1, 2018

hello,@schomatis,let us review the issue.
As we all known,if we use the command cat

$ cat test.txt | ipfs add -w
added Qme9Z5gJLNsQzVELtHh2oQZfjDugHGvLGuNKVtttGAJYAE 
Qme9Z5gJLNsQzVELtHh2oQZfjDugHGvLGuNKVtttGAJYAE
added Qmaq9QpAYfjF5sPAeEytKHk9rC3Vcgt6J8PRbCsNk7ypqY

And now test.txt can only be accessed as

/ipfs/Qmaq9QpAYfjF5sPAeEytKHk9rC3Vcgt6J8PRbCsNk7ypqY/Qme9Z5gJLNsQzVELtHh2oQZfjDugHGvLGuNKVtttGAJYAE

but if we did not use command cat,such as

$ ipfs add test.txt -w
added Qme9Z5gJLNsQzVELtHh2oQZfjDugHGvLGuNKVtttGAJYAE  test.txt
added Qmaq9QpAYfjF5sPAeEytKHk9rC3Vcgt6J8PRbCsNk7ypqY

and use this command,we can access text.txt by

/ipfs/Qmaq9QpAYfjF5sPAeEytKHk9rC3Vcgt6J8PRbCsNk7ypqY/test.txt

so the difference between two command is the test.txt accessed path.

1./Qmaq9QpAYfjF5sPAeEytKHk9rC3Vcgt6J8PRbCsNk7ypqY/Qme9Z5gJLNsQzVELtHh2oQZfjDugHGvLGuNKVtttGAJYAE(hash + hash)
2./ipfs/Qmaq9QpAYfjF5sPAeEytKHk9rC3Vcgt6J8PRbCsNk7ypqY/test.txt (hash + filename)

In my pr #5399 ,
we can use new command --name and cat

cat test.txt | ipfs add -w --name test.txt
added Qme9Z5gJLNsQzVELtHh2oQZfjDugHGvLGuNKVtttGAJYAE test.txt
added Qmaq9QpAYfjF5sPAeEytKHk9rC3Vcgt6J8PRbCsNk7ypqY

the accessed path is the same as ipfs add test.txt -w

ipfs/Qmaq9QpAYfjF5sPAeEytKHk9rC3Vcgt6J8PRbCsNk7ypqY/test.txt

of course,if we use command such as cat test.txt | ipfs add -w --name myfile.txt.the accessed path will change:

ipfs/Qmaq9QpAYfjF5sPAeEytKHk9rC3Vcgt6J8PRbCsNk7ypqY/myfile.txt

so my pr is aimed at to solve problem that when we use -w command,we can define is file name by ourselves.Do i explain clearly?

@schomatis
Copy link
Contributor

schomatis commented Sep 2, 2018

Yes, I think I get it now, thanks for taking the time to re-explain it to me.

My confusion was always related to the use of the -w flag which is not related to the unnamed file issue, but I know that from my understanding of the underlying MFS/UnixFS layers, but from the perspective of the user it makes perfect sense to associate the two. We (mostly I) need to work more on closing this knowledge gap. I'll take another look at the PR.

To give you some brief (and incomplete) background: all of the IPFS files you manipulate (e.g., with ipfs add) belong to a hierarchical file system named MFS. When you create a file it get's added to a root directory which the -w flag exposes. That is, this implicit root directory (Qme9Z5 in your example) always existed, even if you don't use the -w flag. The difference, for you as a user (I now realize), is that if you have access to that root directory then you can reference the added file with the typical Unix path /ipfs/<hash-dir>/<file>, and that's where your --name flag comes in, to help you reference the file with a human-readable name instead of a hash. But, if you don't use the -w flag, the name of the file (created with --name) still would exist in this root directory, just that you wouldn't be able to access the file by name because you don't have the hash of this root directory. (You don't need to understand all this, but I felt I owed you some explanation regarding the context of my confusion.)

Sorry for the back and forth in this issue, but this has been a great feedback for me on what are some of the obscure points of how do files in IPFS work. One issue that I'm identifying right now is this contradiction between the content-addressable philosophy of IPFS and the location-addressed system of any file system such as Unix. In IPFS everything is first accessed by a hash and then you can resolve links with names in a path, but that path is meaningless unless you have a hash as an anchor to that path (the hash defines in which filesystem you're in, as there is no unique root path in IPFS).

@kjzz
Copy link
Contributor

kjzz commented Sep 3, 2018

@schomatis thanks very much for giving some ipfs background,I also very interested MFS.And i have mofidy code by your advice. And i have understand why should drop wrap check.Thank you again.Please help me review my pr #5399 . And I also want to contribute more for ipfs,can you give some task to do?My email is 601172892@qq.com,if you can give some some advice or task,you can emal to me or tell me here(in github).Thanks a lot!

@schomatis
Copy link
Contributor

That's great @kjzz! Why don't you take a look at #5388? It's kind of the entry point (WIP) for community contributions to how are files handled in IPFS. That may give you a general view of what can you do in that area, and we can take it from there.

@kjzz
Copy link
Contributor

kjzz commented Sep 4, 2018

@schomatis thx.And you mean we can write some file in this pr #5052 ?

@schomatis
Copy link
Contributor

And you mean we can write some file in this pr #5052 ?

Sure! I mean, I don't know if I can give you permissions to push to that PR (if there is please let me know), but feel free to create your own PR based on that one (or just create a new PR from scratch), #5052 is rather stale at the moment.

@kjzz
Copy link
Contributor

kjzz commented Sep 5, 2018

hey @schomatis ,so Should i write some doc about file api following your template in #5052 ?

@schomatis
Copy link
Contributor

Hey @kjzz, #5052 is not much of a useful template at the moment, I would start by opening an issue in the Files API milestone and start documenting there information you find interesting (from a new user perspective) and think that other users would benefit to know (or what part of the code should be better documented), so we can later discuss how to incorporate that in a doc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Seeking public contribution on this issue kind/feature A new feature
Projects
None yet
Development

No branches or pull requests

6 participants