Add support for archive folder being on SMB share #340

Open
wants to merge 24 commits into
base: develop
Commits
b73f12a
Add smbprotocol to requirements
robintw Apr 21, 2020
3d9a62c
Set useful derived variable from config file to say whether SMB is us…
robintw Apr 21, 2020
abed84b
Added first few functions to smb_and_local_file_operations - makedirs…
robintw Apr 21, 2020
cca62d9
Modified start of FileProcessor class to use new methods for SMB/loca…
robintw Apr 21, 2020
8755178
Add passing auth params
robintw Apr 21, 2020
74da433
Moved to import as smblocal in FileProcessor
robintw Apr 21, 2020
dba0867
Moved to just implement simple functions, rather than pepys-specific …
robintw Apr 21, 2020
90aef15
Updated FileProcessor to us new smblocal functions for moving files t…
robintw Apr 21, 2020
5508bf7
Add pass-through tests for smblocal module, to check smbclient or os …
robintw Apr 21, 2020
735babd
Fix _set_file_basic_info to add extra compulsory argument
robintw Apr 21, 2020
e640a6d
Add open_file wrapper, and replace all usages of open() in FileProces…
robintw Apr 21, 2020
54ab429
Fix tests to cope with new argument
robintw Apr 21, 2020
19bcc21
Wrapped all SMB operations in try/except blocks
robintw Apr 22, 2020
2461b0e
Tidy up error message
robintw Apr 22, 2020
a9598b1
Add output of full exception too
robintw Apr 22, 2020
1e1e14a
Update docs for new SMB share options in archive path
robintw Apr 22, 2020
8c51c33
Merge branch 'develop' into archive-via-smb
IanMayo Apr 23, 2020
6217815
Removed duplication of try-except blocks, replacing with a contextman…
robintw Apr 24, 2020
d9b70fc
Merge branch 'archive-via-smb' of github.com:debrief/pepys-import int…
robintw Apr 24, 2020
4413c5e
Merge branch 'develop' into archive-via-smb
IanMayo Apr 29, 2020
b19662b
Merge branch 'develop' of github.com:debrief/pepys-import into archiv…
robintw May 4, 2020
5708020
Merge branch 'archive-via-smb' of github.com:debrief/pepys-import int…
robintw May 4, 2020
ebdd597
Merge branch 'develop' of github.com:debrief/pepys-import into archiv…
robintw Jul 7, 2020
222225f
isort changes
robintw Jul 7, 2020
2 changes: 1 addition & 1 deletion .isort.cfg
@@ -4,4 +4,4 @@ include_trailing_comma = true
force_grid_wrap = 0
use_parentheses = true
line_length = 100
known_third_party =alembic,dateutil,geoalchemy2,geopy,iterfzf,lxml,pg8000,pint,prompt_toolkit,pyfiglet,pygments,pytest,setuptools,shapely,sqlalchemy,tabulate,testing,tqdm
known_third_party =alembic,dateutil,geoalchemy2,geopy,iterfzf,lxml,pg8000,pint,prompt_toolkit,pyfiglet,pygments,pytest,setuptools,shapely,smbclient,smbprotocol,sqlalchemy,tabulate,testing,tqdm
2 changes: 2 additions & 0 deletions config.py
@@ -42,6 +42,8 @@
ARCHIVE_USER = config.get("archive", "user", fallback="")
ARCHIVE_PASSWORD = config.get("archive", "password", fallback="")
ARCHIVE_PATH = config.get("archive", "path", fallback=None)
# Set useful extra config variable based on the archive path.
# ARCHIVE_PATH falls back to None, so guard before calling startswith.
ARCHIVE_ON_SMB = bool(ARCHIVE_PATH) and ARCHIVE_PATH.startswith("\\\\")

# Process user and password if necessary
if ARCHIVE_USER.startswith("_") and ARCHIVE_USER.endswith("_"):
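The derived flag has to cope with three kinds of configured value: a share path, a local path, and no path at all (the `fallback=None` case). The following is a minimal sketch of that check, not the project's code: the helper name `archive_on_smb`, the in-memory config text, and the server and share names are all assumptions made for illustration.

```python
from configparser import ConfigParser


def archive_on_smb(archive_path):
    """Return True when the archive path starts with two backslashes (a UNC path)."""
    # bool(...) guards against None (option absent) and the empty string.
    return bool(archive_path) and archive_path.startswith("\\\\")


# Illustrative config content; the server and share names are made up.
config = ConfigParser()
config.read_string("[archive]\npath = \\\\SERVER\\share\\pepys\n")

smb_path = config.get("archive", "path", fallback=None)
missing_path = config.get("archive", "no_such_option", fallback=None)

print(archive_on_smb(smb_path))         # True
print(archive_on_smb("/data/archive"))  # False
print(archive_on_smb(missing_path))     # False
```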
21 changes: 16 additions & 5 deletions docs/configuration.rst
@@ -37,15 +37,26 @@ variables are:

:code:`[archive]` section
##########################
These settings control how pepys-import archives files after importing them.
The specific variables are:

- :code:`user`: Username used to connect to the archiving location (default: none). Can be encrypted.
- :code:`password`: Password used to connect to the archiving location (default: none). Can be encrypted.
- :code:`path`: Full path to folder used to archive input files and store output logs (default: none)
These settings control where Pepys stores the archived input files, alongside the output and error
logs for each input file. The specific variables are:

- :code:`user`: Username used to connect to the archiving location. Only used when the :code:`path` is a
Windows shared folder (default: none).
- :code:`password`: Password used to connect to the archiving location. Only used when the :code:`path` is a
Windows shared folder (default: none).
- :code:`path`: Full path to folder used to archive input files and store output logs (default: none).
This should be either a local path to a folder (it will be created if it doesn't exist), or a path to a
Windows shared network folder (also called an *SMB share*) on another computer. For the latter, the
path must start with :code:`\\` and be structured as :code:`\\SERVER\share\path\to\folder`. When running
on Windows, the :code:`SERVER` part of the path can be specified as either a Windows hostname or an IP
address. For all other platforms, the :code:`SERVER` part must be specified as an IP address.
The username and password configured in the other two variables in this section will be used to connect
to the shared folder.
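The path structure described above can be illustrated with a small parser. This is a sketch under stated assumptions: `split_unc_path` is a hypothetical helper that is not part of this pull request, and the IP address and folder names are invented.

```python
def split_unc_path(path):
    r"""Split \\SERVER\share\path\to\folder into (server, share, folder)."""
    if not path.startswith("\\\\"):
        raise ValueError(f"Not a shared-folder path: {path!r}")
    # Drop the leading two backslashes, then split on single backslashes.
    parts = path[2:].split("\\")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"Expected \\\\SERVER\\share\\..., got: {path!r}")
    server, share = parts[0], parts[1]
    folder = "\\".join(parts[2:])
    return server, share, folder


# An IP address is used for SERVER, which the text above requires off Windows.
example = split_unc_path(r"\\192.168.1.10\archive\pepys\2020")
print(example)
```

A local path such as `/data/archive` raises `ValueError` here, which mirrors the rule that only paths starting with two backslashes are treated as shared folders.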

:code:`[local]` section
##########################

These settings control how pepys-import finds custom locally-installed parsers and validation tests.
The specific variables are:

45 changes: 24 additions & 21 deletions pepys_import/file/file_processor.py
@@ -1,15 +1,14 @@
import inspect
import json
import os
import shutil
from datetime import datetime
from getpass import getuser
from stat import S_IREAD

from config import ARCHIVE_PATH, LOCAL_PARSERS
from paths import IMPORTERS_DIRECTORY
from pepys_import.core.store.data_store import DataStore
from pepys_import.core.store.table_summary import TableSummary, TableSummarySet
from pepys_import.file import smb_and_local_file_operations as smblocal
from pepys_import.file.highlighter.highlighter import HighlightedFile
from pepys_import.file.importer import Importer
from pepys_import.utils.datafile_utils import hash_file
@@ -37,14 +36,15 @@ def __init__(self, filename=None, archive=False):
self.filename = filename
self.output_path = None
self.input_files_path = None
self.directory_path = None
self.output_files_path = None
self.archive = archive

# Check if ARCHIVE_PATH is given in the config file
if ARCHIVE_PATH:
# Create the path if it doesn't exist
if not os.path.exists(ARCHIVE_PATH):
os.makedirs(ARCHIVE_PATH)
if not smblocal.exists(ARCHIVE_PATH):
smblocal.makedirs(ARCHIVE_PATH)
self.output_path = ARCHIVE_PATH
self.archive = archive

def process(self, path: str, data_store: DataStore = None, descend_tree: bool = True):
"""Process the data in the given path
@@ -58,6 +58,9 @@ def process(self, path: str, data_store: DataStore = None, descend_tree: bool =
"""
dir_path = os.path.dirname(path)
# create output folder if not exists
# We will never get to this bit of code if the ARCHIVE_PATH is set
# (as self.output_path is already set), so we can just use standard
# Python os functions
if not self.output_path:
self.output_path = os.path.join(dir_path, "output")
if not os.path.exists(self.output_path):
@@ -81,22 +84,22 @@ def process(self, path: str, data_store: DataStore = None, descend_tree: bool =
str(now.minute).zfill(2),
str(now.second).zfill(2),
)
if not os.path.isdir(self.output_path):
os.makedirs(self.output_path)
if not smblocal.isdir(self.output_path):
smblocal.makedirs(self.output_path)
else:
self.output_path = os.path.join(
self.output_path + "_" + str(now.microsecond).zfill(3)[:3]
)
os.makedirs(self.output_path)
smblocal.makedirs(self.output_path)

# create input_files folder if not exists
self.input_files_path = os.path.join(self.output_path, "sources")
if not os.path.exists(self.input_files_path):
os.makedirs(self.input_files_path)
if not smblocal.exists(self.input_files_path):
smblocal.makedirs(self.input_files_path)

self.directory_path = os.path.join(self.output_path, "reports")
if not os.path.isdir(self.directory_path):
os.makedirs(self.directory_path)
self.output_files_path = os.path.join(self.output_path, "reports")
if not smblocal.isdir(self.output_files_path):
smblocal.makedirs(self.output_files_path)

processed_ctr = 0

@@ -274,7 +277,7 @@ def process_file(self, file_object, current_path, data_store, processed_ctr, imp

# Write highlighted output to file
highlighted_output_path = os.path.join(
self.directory_path, f"{filename}_highlighted.html"
self.output_files_path, f"{filename}_highlighted.html"
)

highlighted_file.export(highlighted_output_path, include_key=True)
@@ -315,25 +318,25 @@ def process_file(self, file_object, current_path, data_store, processed_ctr, imp
summary_details["filename"] = basename

# write extraction log to output folder
with open(
os.path.join(self.directory_path, f"{filename}_output.log"), "w",
with smblocal.open_file(
os.path.join(self.output_files_path, f"{filename}_output.log"), "w",
) as file:
file.write("\n".join(log))
if self.archive is True:
# move original file to output folder
new_path = os.path.join(self.input_files_path, basename)
shutil.move(full_path, new_path)
smblocal.move(full_path, new_path)
# make it read-only
os.chmod(new_path, S_IREAD)
smblocal.set_read_only(new_path)
summary_details["archived_location"] = new_path
import_summary["succeeded"].append(summary_details)

else:
failure_report_filename = os.path.join(
self.directory_path, f"{filename}_errors.log"
self.output_files_path, f"{filename}_errors.log"
)
# write error log to the output folder
with open(failure_report_filename, "w") as file:
with smblocal.open_file(failure_report_filename, "w") as file:
json.dump(errors, file, ensure_ascii=False, indent=4)
import_summary["failed"].append(
{
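The archive step in the hunk above (move the original file into the `sources` folder, then make it read-only) can be tried out for the local, non-SMB case with the standard library alone. This is a sketch: `archive_file` and the file name `track.rep` are hypothetical, and the SMB branch is deliberately left out.

```python
import os
import shutil
import stat
import tempfile


def archive_file(full_path, input_files_path):
    """Move a processed file into the archive folder and mark it read-only."""
    new_path = os.path.join(input_files_path, os.path.basename(full_path))
    shutil.move(full_path, new_path)
    os.chmod(new_path, stat.S_IREAD)  # owner read-only, as in the local branch
    return new_path


with tempfile.TemporaryDirectory() as tmp:
    sources = os.path.join(tmp, "sources")
    os.makedirs(sources)
    original = os.path.join(tmp, "track.rep")
    with open(original, "w") as f:
        f.write("example data\n")

    archived = archive_file(original, sources)
    moved = os.path.exists(archived) and not os.path.exists(original)
    writable = bool(os.stat(archived).st_mode & stat.S_IWUSR)

    # Restore write permission so the temporary directory can be cleaned up.
    os.chmod(archived, stat.S_IWRITE | stat.S_IREAD)
```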
4 changes: 3 additions & 1 deletion pepys_import/file/highlighter/support/export.py
@@ -1,3 +1,5 @@
from pepys_import.file import smb_and_local_file_operations as smblocal

from .color_picker import color_for, hex_color_for, mean_color_for


@@ -93,5 +95,5 @@ def export_report(filename, chars, dict_colors, include_key=False):

output_strings.append(html_footer)

with open(filename, "w") as f:
with smblocal.open_file(filename, "w") as f:
f.write("".join(output_strings))
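One general Python property is worth keeping in mind when a wrapper returns a file handle from inside an error-handling `with` block: the handler only covers code that runs inside that block, so errors raised later, while the caller writes to the handle, are not intercepted. The sketch below demonstrates this with stand-ins only. `ShareError`, `FlakyFile`, and `handle_share_errors` are hypothetical, and nothing here claims to describe how smbclient itself fails.

```python
import io
from contextlib import contextmanager


class ShareError(Exception):
    """Hypothetical stand-in for an SMB failure."""


handled = []  # records every error the handler actually saw


@contextmanager
def handle_share_errors():
    try:
        yield
    except ShareError as e:
        handled.append(str(e))


class FlakyFile(io.StringIO):
    """A file-like object whose writes always fail."""

    def write(self, text):
        raise ShareError("connection dropped during write")


def open_file():
    # The handler is only active while this function body runs.
    with handle_share_errors():
        return FlakyFile()


escaped = False
try:
    with open_file() as f:
        f.write("log line")  # raised here, outside the handler
except ShareError:
    escaped = True

print(escaped, handled)
```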
80 changes: 80 additions & 0 deletions pepys_import/file/smb_and_local_file_operations.py
@@ -0,0 +1,80 @@
import os
import shutil
import sys
from contextlib import contextmanager
from stat import S_IREAD

import smbclient
import smbclient.path
import smbclient.shutil
from smbprotocol.exceptions import SMBAuthenticationError, SMBResponseException

from config import ARCHIVE_ON_SMB, ARCHIVE_PASSWORD, ARCHIVE_USER


@contextmanager
def handle_smb_errors():
    try:
        yield
    except (SMBAuthenticationError, SMBResponseException, ValueError) as e:
        print(e)
        print(
            "Error connecting to archive location on Windows shared folder (SMB share). "
            + "Check config file details are correct and server is accessible. See full error above."
        )
        sys.exit()


auth = {"username": ARCHIVE_USER, "password": ARCHIVE_PASSWORD}


def exists(path):
    if ARCHIVE_ON_SMB:
        with handle_smb_errors():
            return smbclient.path.exists(path, **auth)
    else:
        return os.path.exists(path)


def isdir(path):
    if ARCHIVE_ON_SMB:
        with handle_smb_errors():
            return smbclient.path.isdir(path, **auth)
    else:
        return os.path.isdir(path)


def makedirs(path):
    if ARCHIVE_ON_SMB:
        with handle_smb_errors():
            return smbclient.makedirs(path, **auth)
    else:
        return os.makedirs(path)


def move(from_path, to_path):
    if ARCHIVE_ON_SMB:
        with handle_smb_errors():
            # No move function in smbclient, so copy then delete original copy
            smbclient.shutil.copy(from_path, to_path, **auth)
            os.remove(from_path)
    else:
        shutil.move(from_path, to_path)


def set_read_only(path):
    if ARCHIVE_ON_SMB:
        with handle_smb_errors():
            smbclient.shutil._set_file_basic_info(
                path, follow_symlinks=False, read_only=True, **auth
            )
    else:
        os.chmod(path, S_IREAD)


def open_file(*args, **kwargs):
    if ARCHIVE_ON_SMB:
        with handle_smb_errors():
            return smbclient.open_file(*args, **kwargs, **auth)
    else:
        return open(*args, **kwargs)
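The module above follows one pattern throughout: pick the SMB or local backend from a flag, and wrap the SMB call in a shared error handler. The commit list also mentions pass-through tests for it. The sketch below shows what such a test might look like without needing smbprotocol installed. Every name in it is an assumption: `ShareError` and `fake_smb_path` stand in for the smbprotocol exceptions and `smbclient.path`, the flag and credentials are passed as arguments rather than read from `config`, and the non-zero exit code is a choice made for this sketch.

```python
import os
import sys
from contextlib import contextmanager
from types import SimpleNamespace
from unittest import mock


class ShareError(Exception):
    """Hypothetical stand-in for the smbprotocol exception classes."""


# Hypothetical stand-in for the smbclient.path module.
fake_smb_path = SimpleNamespace(exists=lambda path, **auth: False)


@contextmanager
def handle_share_errors():
    try:
        yield
    except (ShareError, ValueError) as e:
        print(e)
        print("Error connecting to the shared folder. Check the config file details.")
        sys.exit(1)


def exists(path, on_smb, auth):
    """Dispatch to the share backend or to os.path, returning the result either way."""
    if on_smb:
        with handle_share_errors():
            return fake_smb_path.exists(path, **auth)
    return os.path.exists(path)


auth = {"username": "user", "password": "secret"}

# Pass-through check: the share backend gets the path and credentials unchanged.
with mock.patch.object(fake_smb_path, "exists", return_value=True) as smb_exists:
    share_result = exists(r"\\SERVER\share\archive", True, auth)
    smb_exists.assert_called_once_with(
        r"\\SERVER\share\archive", username="user", password="secret"
    )

# Local paths go to os.path.exists instead.
with mock.patch.object(os.path, "exists", return_value=True) as local_exists:
    local_result = exists("/data/archive", False, auth)
    local_exists.assert_called_once_with("/data/archive")

# A backend failure ends the program via SystemExit.
with mock.patch.object(fake_smb_path, "exists", side_effect=ShareError("logon failure")):
    try:
        exists(r"\\SERVER\share\archive", True, auth)
        exit_code = None
    except SystemExit as exc:
        exit_code = exc.code
```

Asserting on the returned value, and not only on the call, is what catches a wrapper branch that calls the backend but forgets to return its result.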
3 changes: 2 additions & 1 deletion requirements.txt
@@ -14,4 +14,5 @@ alembic>=1.4.2
pg8000>=1.14.1
setuptools>=40.8.0
Pygments>=2.6.1
geopy>=1.22
geopy>=1.22
smbprotocol