Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Initial commit of System.IO.Packaging #1818

Closed
wants to merge 20 commits into from
Closed

Initial commit of System.IO.Packaging #1818

wants to merge 20 commits into from

Conversation

EricWhiteDev
Copy link
Contributor

New version of System.IO.Packaging, rewritten to use System.IO.Compression. This assembly is used by the Open XML SDK. The code was written to work with both .NET 4.5.* and COREFX/CORECLR. I will be adding additional tests in the near future. In addition, I have a massive amount of tests that will go into "outer-loop" testing after some cleanup.

@dnfclas
Copy link

dnfclas commented May 21, 2015

Hi @EricWhiteDev, I'm your friendly neighborhood .NET Foundation Pull Request Bot (You can call me DNFBOT). Thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes. I promise there's no faxing. https://cla2.dotnetfoundation.org.

TTYL, DNFBOT;

@dnfclas
Copy link

dnfclas commented May 21, 2015

@EricWhiteDev, Thanks for signing the contribution license agreement so quickly! Actual humans will now validate the agreement and then evaluate the PR.

Thanks, DNFBOT;

@stephentoub
Copy link
Member

@dotnet-bot test this please

/// accompanied by a '\n' in the string, else its an error.
/// </summary>
/// <param name="contentType"></param>
private static void ValidateCarriageReturns(string contentType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to be correct on platforms where newlines are just "\n"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very interesting issue. RFC-2616 (definition of content types, as referred to by OPC) does not allow for CRLF (or CR or LF) in a content type. Further, I did a check in my sample documents, and no Open XML documents contain a CR, LF, or CRLF in a content type. I suspect that this code has never been executed in a real scenario.

However, I am loath to change this code, as changing it could potentially impact existing applications. There may be some scenario where programs are setting content types with CRLF in them, although I can't think of what that would be.

One of my goals with this module is to implement as closely as possible the exact same functionality as the existing implementation. There are certainly aspects of System.IO.Packaging that could be improved upon, such as adding new functionality to make it more LINQ friendly, and so on. However, I propose that this is a separate project from the project of getting an implementation that is as compatible as possible as the existing one.

I propose that we leave the code as-is for now, and if there are issues that crop up in the Linux or Mac world, we address those as a separate work item.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@stephentoub
Copy link
Member

@EricWhiteDev, just following up... what's the status of this?

@stephentoub
Copy link
Member

Two tests are also failing on Unix:

System.IO.Packaging.Tests.Tests.T035_ModifyAllPackageProperties [FAIL]
08:06:16       Assert.Equal() Failure
08:06:16                                        ��� (pos 21)
08:06:16       Expected: ������p.Category: >(null)<\npp.ContentStatus: >(null)<\npp.ContentTyp������
08:06:16       Actual:   ������p.Category: >(null)<\r\npp.ContentStatus: >(null)<\r\npp.ContentT������
08:06:16                                        ��� (pos 21)
08:06:16       Stack Trace:
08:06:16             at System.IO.Packaging.Tests.Tests.T035_ModifyAllPackageProperties()
08:06:16    System.IO.Packaging.Tests.Tests.T029_DeletePackageRelationship [FAIL]
08:06:16       Assert.Equal() Failure
08:06:16                   ��� (pos 2)
08:06:16       Expected: #0\nUri: /docProps/app.xml\nContentType: appl������
08:06:16       Actual:   #0\r\nUri: /docProps/app.xml\r\nContentType: ap������
08:06:16                   ��� (pos 2)
08:06:16       Stack Trace:
08:06:16             at System.IO.Packaging.Tests.Tests.T029_DeletePackageRelationship()
08:06:16 Finished:    System.IO.Packaging.Tests
08:06:16 
08:06:16 === TEST EXECUTION SUMMARY ===
08:06:16    System.IO.Packaging.Tests  Total: 89, Errors: 0, Failed: 2, Skipped: 0, Time: 2.187s

The relevant files aren't showing up in the diff, but I believe the issue is that the source is using verbatim string literals that span lines, such that the newline on the platform on which compilation is happening is being embedded in the string, rather than the newline on which the test is running. The code should be changed to avoid verbatim strings spanning newlines.

@stephentoub
Copy link
Member

@EricWhiteDev, any updates on this?

@EricWhiteDev
Copy link
Contributor Author

@stephentoub I've been on vacation for the past few days. I have made progress on making appropriate changes, but have not finished. I have some questions on things, and will contact you or Matt regarding those.

@stephentoub
Copy link
Member

Ok, thanks for the update.

dotnet-bot and others added 7 commits June 18, 2015 21:54
@EricWhiteDev
Copy link
Contributor Author

@stephentoub Fixed the issue with tests that have strings that span newlines, as those tests failed on Unix.

return true;
else
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could just be:

return !s.CanRead && !s.CanSeek && !s.CanWrite;

@stephentoub
Copy link
Member

Thanks, Eric. I left some additional comments. As they're mostly minor issues, I'd be ok with us merging this and then addressing such issues subsequently; up to you. I do think some/most of the commits should be squashed down before it's merged, e.g. the initial commit from dotnet-bot and then one or a few commits from you addressing feedback. We try to avoid commits in the history that don't build or pass tests, that are just undoing changes from previous commits in the same PR, etc.


// Flush all the parts that are currently open.
// This will flush part relationships.
DoOperationOnEachPart(DoFlush);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DoFlush is an instance method; the delegate allocation can't be avoided with the current signatures (or it would need to be manually cached on the instance, which comes with its own tradeoffs), so addressing that would be a larger change. Using p => DoFlush(p) would actually be worse, as it'd force a closure allocation as well in order to capture the implicit this reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeap, I was wrong initially but we can
DoOperationOnEachPart((thisRef, packagePart) => thisRef.DoFlush(packagePart));
and pass this as parameter inside DoOperationOnEachPart.
All these changes related to private/internal API.

@stephentoub
Copy link
Member

One more thing: I just noticed the test files in the PR (GitHub was hiding them from view). It appears you've Base64-encoded various test assets (e.g. .docx files) and included them in a .cs file for use by the tests. That might be the right approach, but it's not how we've handled test assets elsewhere, e.g. in System.IO.Compression, where test data has been provided in a separate NuGet package:
https://github.com/dotnet/corefx/blob/master/src/System.IO.Compression/tests/project.json#L18 . Please sync with @FiveTimesTheFun to decide which approach we want to follow here. I'm ok with either as long as you guys agree. Thanks!

@EricWhiteDev
Copy link
Contributor Author

@stephentoub Thanks for doing this detailed review. Good points all. To a large extent, these code issues are in code that I inherited from the original System.IO.Packaging. In general, I opted to make as few changes as possible to existing code, given that this code is used in mission critical applications where stability of the application that uses this library is of the highest importance.

Regarding performance issues, the majority of processing time occurs in System.IO.Compression, and no matter how much we optimize the code in System.IO.Packaging, we can't impact the performance of System.IO.Compression. In the process of developing the new System.IO.Packaging, I did some measurements on performance, and our biggest issues are in the Open XML SDK proper, not System.IO.Packaging, so it seems good to address those first. In any case, I have metrics that show that the new System.IO.Packaging has better performance than the old one (mainly due to the better perf of System.IO.Compression when compared to MS.Internal.IO.Compression). Performance in this area is not a concern of the users of the library.

I would like to go with your suggested plan of merging now, and then addressing code issues such as these in a subsequent phase of work. The code in its current state is the code that I "hardened" by testing on 370,000 Open XML files, and making even simple changes requires that I again run the code through the my massive test. This test is run using OxRunner (https://www.youtube.com/watch?v=AcZFexDxL0U), which requires setting up 8 quad-core machines, all properly updated, with correct software installed, git repos properly installed on all machines, and so on. It is not possible to test efficiently on such a massive document library without taking an approach that uses parallelism.

Further, I have a super-high priority project of making the Open XML SDK ready for Office 2016 (time is ticking on this one), so I propose that I address these issues after I make headway on the Open XML SDK.

@stephentoub
Copy link
Member

I would like to go with your suggested plan of merging now, and then addressing code issues such as these in a subsequent phase of work.

@EricWhiteDev: that's fine, thanks. Please do just verify about the test assets with @FiveTimesTheFun, and squash as mentioned.

@stephentoub
Copy link
Member

@EricWhiteDev, just checking in on this. I think this is good to go except for the test assets and then squashing. Any ETA for when you plan to update this accordingly?

@stephentoub
Copy link
Member

@EricWhiteDev, @FiveTimesTheFun, how are we doing on this?

@EricWhiteDev
Copy link
Contributor Author

@stephentoub We are meeting in a couple of hours so that I can understand the details of how to put together the test assets. I'll put together the changes tomorrow. Should be good to go in short order.

@stephentoub
Copy link
Member

Great! Thanks.

@EricWhiteDev
Copy link
Contributor Author

@stephentoub We rescheduled our meeting for Thursday, so I expect to have this in good shape early next week.

@stephentoub
Copy link
Member

@EricWhiteDev, checking in again 😄 Any updates?

@EricWhiteDev
Copy link
Contributor Author

@stephentoub Yes! Matt and I got the test assets moved to the corefx-testdata repo, and I have most of the tests working using those test assets. I have a small amount of cleanup left, and then will push latest commit, and in theory, should be good to go. I'll ping you hopefully before the end of week, you can take a final look. I have been buried last couple of days on another project, but should have time to finalize this today or tomorrow.

@stephentoub
Copy link
Member

Thanks, Eric 😄

@EricWhiteDev
Copy link
Contributor Author

@stephentoub Ok, it is ready once again for you to look at. All tests now pull from the corefx-testdata repo. TestFileLib.cs (which had the base64 encoded test assets) has been deleted. Further, some tests were leaving behind temporary files. Now all tests clean up after themselves. The temporary files were named with Guids, so no chance of conflicting with other tests, but in any case, was messy and wrong to leave them around.

So take a look, and let me know if there is anything else I need to do.

if (contentType == null)
throw new ArgumentNullException("contentType");

if (String.CompareOrdinal(contentType, String.Empty) == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (contentType.Length == 0) ?

@stephentoub
Copy link
Member

Thanks, Eric. I took a quick skim through and left a few more minor comments. At this point, here's what I think you should do:

  • There are a bunch of issues not addressed in the code. That's ok for this PR, but I want to make sure it doesn't get lost. Please open an issue to track fixing up those and other issues.
  • Please squash this down to a smaller number of commits. In generally we try to keep history clean, which means no merge commits as part of a PR's commit history, commits that compile and pass all tests at the time, etc. I think it'd make sense to have two (maybe three) commits here: one for the initial dotnet-bot commit of the source, and then one for the subsequent changes you made.

Once that's in, I can do a final pass through, and we can get this merged.

Thanks!

@EricWhiteDev
Copy link
Contributor Author

@stephentoub I have consolidated the code cleanup into a single issue:

https://github.com/dotnet/corefx/issues/2699

@EricWhiteDev
Copy link
Contributor Author

@stephentoub Sorry, I am out of my depth here with regards to squashing commits in this scenario. (I am able to do this with my own super-simple repos, but worry about doing this one.) Don't want to mess it up, so I will get with @FiveTimesTheFun to do the squashing. I will schedule a short meeting with him.

@stephentoub
Copy link
Member

Thanks, @EricWhiteDev. I can help; I'll squash things down now, make sure it all builds/tests appropriately, and put up a replacement PR that contains your commits.

@EricWhiteDev
Copy link
Contributor Author

@stephentoub thank you, appreciate it. As far as I am concerned, don't need any of the commit history.

@stephentoub
Copy link
Member

Replaced by #2701

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants