Importer for 0.8.9 data via the CLI #3502

Merged: corylanou merged 11 commits into master from the import branch on Aug 6, 2015

corylanou
Contributor

This PR allows you to import an exported file from 0.8.9 (#3477).

Caveats

For the export and import to work, all prerequisites have to be met. For export, every series name in 0.8 must use the following format:

<tagName>.<tagValue>.<tagName>.<tagValue>.<measurement>

for example:

az.us-west-1.host.serverA.cpu

or with any number of tags:

building.2.temperature
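
To make the naming rule concrete, here is a minimal Go sketch of how a name in this format could be split into a measurement and tags. `parseSeriesName` is a hypothetical helper written for illustration, not code from this PR, and it assumes that tag names and values never contain a dot.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSeriesName is a hypothetical helper (not part of this PR). It splits a
// 0.8 series name of the form <tagName>.<tagValue>...<measurement> into the
// measurement and its tags. Assumption: no segment contains a dot.
func parseSeriesName(name string) (string, map[string]string, error) {
	parts := strings.Split(name, ".")
	// Valid names have an odd number of segments: tag pairs plus a measurement.
	if len(parts)%2 == 0 {
		return "", nil, fmt.Errorf("series %q: expected tag pairs followed by a measurement", name)
	}
	for _, p := range parts {
		if p == "" {
			return "", nil, fmt.Errorf("series %q: empty segment", name)
		}
	}
	tags := make(map[string]string, len(parts)/2)
	for i := 0; i+1 < len(parts); i += 2 {
		tags[parts[i]] = parts[i+1]
	}
	return parts[len(parts)-1], tags, nil
}

func main() {
	for _, name := range []string{"az.us-west-1.host.serverA.cpu", "building.2.temperature", "host.cpu"} {
		measurement, tags, err := parseSeriesName(name)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Println(measurement, tags)
	}
}
```

A name such as `host.cpu` has an even number of segments, so the sketch rejects it rather than guessing which part is the measurement.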

Additionally, each field needs a consistent type (all float64, all int64, etc.) across every write in 0.8. Otherwise, writes may fail during the import. See below for more information.

Running the import command

To import via the CLI, run the following command:

influx -import -path=metrics-default.gz -compressed

If the file is not compressed, omit the -compressed flag:

influx -import -path=metrics-default

To redirect failed import lines to another file, run this command:

influx -import -path=metrics-default.gz -compressed > failures

The import uses the line protocol and writes in batches of 5,000 lines.
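
As a rough illustration of that batching, the Go sketch below reads lines from a reader and hands them to a callback in fixed-size groups. The names `forEachBatch` and `batchSizes` are hypothetical and this is not the importer's actual code; the demo uses a batch size of 5 rather than 5,000 to keep the run small.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// forEachBatch (hypothetical) reads lines from r and calls flush with up to
// size lines at a time. The batch slice is reused between calls, so flush
// must not keep a reference to it after returning.
func forEachBatch(r io.Reader, size int, flush func(batch []string) error) error {
	if size < 1 {
		return fmt.Errorf("batch size must be positive, got %d", size)
	}
	scanner := bufio.NewScanner(r)
	batch := make([]string, 0, size)
	for scanner.Scan() {
		batch = append(batch, scanner.Text())
		if len(batch) == size {
			if err := flush(batch); err != nil {
				return err
			}
			batch = batch[:0]
		}
	}
	if err := scanner.Err(); err != nil {
		return err
	}
	// Flush the final, possibly short, batch.
	if len(batch) > 0 {
		return flush(batch)
	}
	return nil
}

// batchSizes (hypothetical) reports how many lines each batch would carry.
func batchSizes(input string, size int) ([]int, error) {
	var sizes []int
	err := forEachBatch(strings.NewReader(input), size, func(batch []string) error {
		sizes = append(sizes, len(batch))
		return nil
	})
	return sizes, err
}

func main() {
	input := strings.Repeat("cpu,host=serverA value=1\n", 12)
	sizes, err := batchSizes(input, 5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sizes)
}
```

In a real importer the callback would send the batch to the server; here it only counts lines so the grouping is easy to see.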

Understanding the results of the import

The import prints some basic stats when it finishes:

2015/07/29 23:15:20 Processed 2 commands
2015/07/29 23:15:20 Processed 70207923 inserts
2015/07/29 23:15:20 Failed 29785000 inserts

Most failed inserts are caused by this type of error:

2015/07/29 22:18:28 error writing batch:  write failed: field type conflict: input field "value" on measurement "metric" is type float64, already exists as type integer

This happens because 0.8 allowed independent writes to create and save the same field as either an int or a float. In 0.9, a field must have a consistent type.
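
The consistency rule can be sketched in Go as a check that remembers the first type seen for each field and rejects later values of a different type. This is only an illustration of the constraint, assuming values have already been decoded into Go types; `fieldTypes` and `check` are hypothetical names, and the server's real validation lives elsewhere.

```go
package main

import "fmt"

// fieldTypes (hypothetical) remembers the first Go type seen for each
// measurement/field pair.
type fieldTypes map[string]string

// check records the type of value on first sight and returns an error when a
// later value for the same field has a different type.
func (ft fieldTypes) check(measurement, field string, value interface{}) error {
	key := measurement + "\x00" + field
	got := fmt.Sprintf("%T", value)
	if want, ok := ft[key]; ok && want != got {
		return fmt.Errorf("field %q on measurement %q: got %s, first seen as %s",
			field, measurement, got, want)
	}
	ft[key] = got
	return nil
}

func main() {
	seen := fieldTypes{}
	values := []interface{}{int64(1), float64(1.5), int64(2)}
	for _, v := range values {
		if err := seen.check("metric", "value", v); err != nil {
			fmt.Println("would fail:", err)
			continue
		}
		fmt.Println("ok:", v)
	}
}
```

Running a check like this over 0.8 data before exporting is one way to find the fields that would trigger the conflict during import.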

@beckettsean
Contributor

@corylanou how are errors reported? STDOUT?

@corylanou
Contributor Author

@beckettsean no, STDERR... get it? errors ... :-)

@beckettsean
Contributor

but of course...

:dunce_hat: should be an emoji, I need one.

@corylanou changed the title from "WIP - Importer for 0.8.9 data via the CLI" to "Importer for 0.8.9 data via the CLI" on Jul 30, 2015
@corylanou force-pushed the import branch 2 times, most recently from 884b269 to fe05a53, on July 31, 2015 18:18

// Query is used to send a command to the server. Both Command and Database are required.
type Query struct {
	Command  string
	Database string
}

// ParseConnectionString will parse a string to create a valid connection URL
func ParseConnectionString(path string, ssl bool) (url.URL, error) {
Contributor

Could the path just require http://... or https://... and then no need for ssl bool?

Contributor Author

It could, but then you have to parse and validate, and if they don't give you either, you aren't sure what they wanted (you could default to http, but that might not be right). Making them explicitly choose makes it their responsibility to get right. Otherwise, if they don't provide either, it's now on us to try to make the right decision without enough information.

Contributor

I was thinking that http or https would be required but no strong feeling here.
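
For readers weighing the two options in this thread, the Go sketch below contrasts them using the standard `net/url` package. Both `parseWithScheme` and `buildWithFlag` are hypothetical names for illustration; neither is the `ParseConnectionString` implementation from this PR.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseWithScheme (hypothetical) is the reviewer's option: the caller must
// pass a full http:// or https:// URL, and anything else is rejected.
func parseWithScheme(raw string) (url.URL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return url.URL{}, err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return url.URL{}, fmt.Errorf("%q: scheme must be http or https", raw)
	}
	if u.Host == "" {
		return url.URL{}, fmt.Errorf("%q: missing host", raw)
	}
	return *u, nil
}

// buildWithFlag (hypothetical) is the author's option: the caller passes
// host:port and states explicitly whether to use SSL.
func buildWithFlag(hostPort string, ssl bool) url.URL {
	scheme := "http"
	if ssl {
		scheme = "https"
	}
	return url.URL{Scheme: scheme, Host: hostPort}
}

func main() {
	for _, raw := range []string{"https://localhost:8086", "localhost:8086"} {
		u, err := parseWithScheme(raw)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", u.String())
	}
	u := buildWithFlag("localhost:8086", true)
	fmt.Println("built:", u.String())
}
```

Either way the caller has to state the scheme; the difference is whether it travels inside the string or as a separate argument.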

@pauldix
Member

pauldix commented Jul 31, 2015

Looks mostly good. The one other thing I think would be good to add is to output any of the lines that fail to parse into a new file. That way the developer can write their own custom script to handle just those data points.

And if there are write timeouts that just can't be overcome, to output all the timed out data to a different file.


// Process the scanner
v8.processDDL(scanner)
v8.processDML(scanner)
Contributor

Do the DDL and DML need to be processed in order? Right now it sends them to goroutines that run concurrently, so it seems they could be run out of order.

Contributor Author

Yeah, I might have over-engineered that a little. I might refactor that to straight method calls instead after seeing the final product.
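
A minimal sketch of the "straight method calls" idea, assuming a stand-in `importer` type that only records what it applies. The type, its fields, and the `run` method are hypothetical and do not mirror the PR's actual code; the point is only that the second call cannot start before the first returns.

```go
package main

import "fmt"

// importer is a hypothetical stand-in that records the order of applied lines.
type importer struct {
	applied []string
}

// processDDL applies schema statements one by one.
func (i *importer) processDDL(stmts []string) error {
	for _, s := range stmts {
		if s == "" {
			return fmt.Errorf("empty DDL statement")
		}
		i.applied = append(i.applied, "DDL: "+s)
	}
	return nil
}

// processDML applies data lines one by one.
func (i *importer) processDML(lines []string) error {
	for _, l := range lines {
		if l == "" {
			return fmt.Errorf("empty DML line")
		}
		i.applied = append(i.applied, "DML: "+l)
	}
	return nil
}

// run uses plain sequential calls, so every DDL statement is applied before
// any DML line, with no goroutines or channels involved.
func (i *importer) run(ddl, dml []string) error {
	if err := i.processDDL(ddl); err != nil {
		return fmt.Errorf("ddl: %v", err)
	}
	return i.processDML(dml)
}

func main() {
	imp := &importer{}
	err := imp.run([]string{"CREATE DATABASE metrics"}, []string{"cpu,host=serverA value=1"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, a := range imp.applied {
		fmt.Println(a)
	}
}
```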

@corylanou
Contributor Author

@pauldix none of the lines can fail to parse as they were written out by the same line protocol code we use to ingest. I could check for it, but imo that would be the equivalent of a corrupt input file at that point.

Do you want them to specify a file for "failed writes" that we write to? Otherwise I can write failed writes to STDOUT and all other info to STDERR and they can just pipe failed writes to any file they want.

@pauldix
Member

pauldix commented Jul 31, 2015

@corylanou oh right, never mind what I said about the parse thing. For failed writes I think let them specify a file to output to? That way they can just try again and point it at that file.

} else {
	c.Line.Close()
	os.Exit(0)
}
Contributor

You can drop the else since the if block will exit.

@corylanou
Contributor Author

@benbjohnson ok, I think I have your comments addressed. Please take another look when you get a chance. Thanks!

@corylanou
Contributor Author

@pauldix this now writes all failed lines to STDOUT, and messages to STDERR so the end users can do what they want depending on what they care about.
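
The split described here can be sketched in Go as follows: failed lines go to one writer (STDOUT in practice) and diagnostics go to another (STDERR), so a shell redirect such as `> failures` captures only the data that can be retried. `reportFailedBatch` and `capture` are hypothetical names, and the message text is illustrative rather than the importer's exact output.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"os"
)

// reportFailedBatch (hypothetical) writes a diagnostic to diag and each line
// of the failed batch to out, so out can be redirected to a file and
// re-imported later.
func reportFailedBatch(out, diag io.Writer, batch []string, cause error) {
	fmt.Fprintf(diag, "error writing batch: %v\n", cause)
	for _, line := range batch {
		fmt.Fprintln(out, line)
	}
}

// capture (hypothetical) runs reportFailedBatch against in-memory buffers and
// returns what each stream received.
func capture(batch []string, msg string) (string, string) {
	var out, diag bytes.Buffer
	reportFailedBatch(&out, &diag, batch, errors.New(msg))
	return out.String(), diag.String()
}

func main() {
	batch := []string{
		"cpu,host=serverA value=1 1438200000000000000",
		"cpu,host=serverA value=2 1438200001000000000",
	}
	// Data lines reach STDOUT; the message reaches STDERR.
	reportFailedBatch(os.Stdout, os.Stderr, batch, errors.New("write failed"))
}
```

Because the two streams are separate, the redirected file contains nothing but line protocol and can be fed straight back to the import command.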

@pauldix
Member

pauldix commented Aug 6, 2015

+1, just give @beckettsean the invocation example that outputs the failed writes to another file.

@corylanou added a commit that referenced this pull request on Aug 6, 2015: Importer for 0.8.9 data via the CLI
@corylanou merged commit 08f84a2 into master on Aug 6, 2015
@corylanou deleted the import branch on August 6, 2015 15:46
@beckettsean
Contributor

@corylanou is there any way to throttle or back off on the import? Most users will be running the import against a live 0.9.3 server.
