Release 0.10.1 #235

Merged
merged 43 commits into from
Jun 10, 2020
Changes from 39 commits
6b5a9a5
fix off by one error in random value selection
MaximMoinat Jan 25, 2020
8411e07
refactoring: class for storing value counts from scan report
MaximMoinat Jan 25, 2020
888c744
refactor valueCounts to use arraylist as property
Apr 14, 2020
7d17cdb
option to force first field to be the primary key. plus refactorings
Apr 14, 2020
b68f4f6
add option for uniform sampling from valuelist
Apr 14, 2020
e6793f6
fix primary key force option
Apr 14, 2020
141c6bd
Merge branch 'develop' into fake-data-fix
Apr 21, 2020
5ebc92f
add reference to latest version in docs
May 20, 2020
949bc26
bump snapshot version
May 22, 2020
7e7428d
fix rows checkecd count set
May 22, 2020
6e98c05
set nRows to -1 if not provided
May 22, 2020
4eabbac
wip: show fields mapping panel upon double click of table box
Aug 19, 2019
13ffa58
clean up code, hide target table
May 22, 2020
31c2813
fix issue with creating mapping for source field view
May 22, 2020
cc8b76c
Merge pull request #229 from thehyve/riah-fix-rowcount-read
May 26, 2020
b8f593b
Merge pull request #230 from thehyve/riah-source-table-details
May 26, 2020
77e1306
display percentage empty and unique count
Sep 23, 2019
3ce13a3
unique count and percentage empty on one line
Sep 23, 2019
a8c90b5
change empty percentage formatting
Sep 24, 2019
3c8db28
fix reading in of inexact counts (<=)
May 27, 2020
afc8439
read overview sheet by name
May 27, 2020
fe783de
apply html decoding to all cell values
May 27, 2020
bb71f78
update all table and field information when reloading scan report
Jun 2, 2020
7a1d0ea
also add new fields and tables upon reloading scan report
Jun 2, 2020
a2db213
add language_concep_ids and remove all invalid concepts
Jun 2, 2020
53d9b76
Merge pull request #231 from thehyve/riah-source-details
Jun 9, 2020
f03cf65
Merge pull request #233 from thehyve/riah-concept-id-hint-language
Jun 9, 2020
b63e1e1
Merge branch 'develop' into fake-data-fix
Jun 9, 2020
be57e51
make unique fields from scan also unique in fake data
Jun 9, 2020
eeabc87
add UI option to set uniform sampling
Jun 9, 2020
b2d7ca9
Merge branch 'fake-data-fix' of https://github.com/thehyve/OHDSI-Whit…
Jun 9, 2020
1f8e278
code cleanup
Jun 9, 2020
339c098
Merge pull request #203 from thehyve/fake-data-fix
Jun 9, 2020
b95a7d2
Merge branch 'master' into release-0.10.1
Jun 9, 2020
fedb8b3
bump version in docs
Jun 9, 2020
5d97a04
fix reading status of uniform sampling option
Jun 9, 2020
487139f
fix pk_cursor going out of range
Jun 9, 2020
982e353
fix markdown and html export of empty table cells
Jun 9, 2020
c6582df
update examples
Jun 9, 2020
e1ba80e
refactorings for reading scanreport
Jun 10, 2020
3a7fca6
refactor database connection tooltips
Jun 10, 2020
03bdd77
do not read value if count could not be read from scanreport
Jun 10, 2020
37fdf22
read count as floored double
Jun 10, 2020
2 changes: 1 addition & 1 deletion docs/index.html
Expand Up @@ -357,7 +357,7 @@ <h1>Features</h1>
</div>
<div id="current-version" class="section level1">
<h1>Current version</h1>
<p><a href="https://github.com/OHDSI/WhiteRabbit/releases/tag/v0.9.0"><strong>v0.9.0</strong></a></p>
<p><a href="https://github.com/OHDSI/WhiteRabbit/releases/latest"><strong>v0.10.1</strong></a></p>
</div>


Expand Down
2 changes: 1 addition & 1 deletion docs/index.md
Expand Up @@ -21,4 +21,4 @@ It comes with **RabbitInAHat**, an application for interactive design of an ETL
- Rabbit in a Hat generates ETL specification document according to OMOP template

# Current version
[**v0.9.0**](https://github.com/OHDSI/WhiteRabbit/releases/tag/v0.9.0)
[**v0.10.1**](https://github.com/OHDSI/WhiteRabbit/releases/latest)
Binary file modified examples.zip
Binary file not shown.
2 changes: 1 addition & 1 deletion iniFileExamples/WhiteRabbit.ini
Expand Up @@ -7,7 +7,7 @@ PASSWORD = supersecret # Password for the database
DATABASE_NAME = schema_name # Name of the data schema used
DELIMITER = , # The delimiter that separates values
TABLES_TO_SCAN = * # Comma-delimited list of table names to scan. Use "*" (asterisk) to include all tables in the database
SCAN_FIELD_VALUES = yes # Include a frequency count of field values in the scan report? "yes" or "no"
SCAN_FIELD_VALUES = yes # Include the frequency of field values in the scan report? "yes" or "no"
MIN_CELL_COUNT = 5 # Minimum frequency for a field value to be included in the report
MAX_DISTINCT_VALUES = 1000 # Maximum number of distinct values per field to be reported
ROWS_PER_TABLE = 100000 # Maximum number of rows per table to be scanned for field values
Expand Down
2 changes: 1 addition & 1 deletion pom.xml
Expand Up @@ -6,7 +6,7 @@
<groupId>org.ohdsi</groupId>
<artifactId>leporidae</artifactId>
<packaging>pom</packaging>
<version>0.10.0</version>
<version>0.10.1</version>
<modules>
<module>rabbitinahat</module>
<module>whiterabbit</module>
Expand Down
2 changes: 1 addition & 1 deletion rabbit-core/pom.xml
Expand Up @@ -5,7 +5,7 @@
<parent>
<artifactId>leporidae</artifactId>
<groupId>org.ohdsi</groupId>
<version>0.10.0</version>
<version>0.10.1</version>
</parent>
<modelVersion>4.0.0</modelVersion>

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,10 @@ public void setTables(List<Table> tables) {
this.tables = tables;
}

public void addTable(Table table) {
this.tables.add(table);
}

public String getDbName() {
return dbName;
}
Expand Down Expand Up @@ -138,11 +142,14 @@ public static Database generateModelFromScanReport(String filename) {
Database database = new Database();
QuickAndDirtyXlsxReader workbook = new QuickAndDirtyXlsxReader(filename);

// Create table lookup from tables overview, if exists
// Create table lookup from tables overview, if it exists
Map<String, Table> nameToTable = createTablesFromTableOverview(workbook, database);

// Field overview is the first sheet
Sheet overviewSheet = workbook.get(0);
Sheet overviewSheet = workbook.getByName(ScanSheetName.FIELD_OVERVIEW);
if (overviewSheet == null) {
overviewSheet = workbook.get(0);
}
Iterator<QuickAndDirtyXlsxReader.Row> overviewRows = overviewSheet.iterator();

overviewRows.next(); // Skip header
Expand All @@ -168,11 +175,12 @@ public static Database generateModelFromScanReport(String filename) {
String fieldName = row.getStringByHeaderName(ScanFieldName.FIELD);
Field field = new Field(fieldName.toLowerCase(), table);

String fractionEmpty = row.getByHeaderName(ScanFieldName.FRACTION_EMPTY);
field.setNullable(fractionEmpty == null || !fractionEmpty.equals("0"));
field.setType(row.getByHeaderName(ScanFieldName.TYPE));
field.setMaxLength(row.getIntByHeaderName(ScanFieldName.MAX_LENGTH));
field.setDescription(row.getStringByHeaderName(ScanFieldName.DESCRIPTION));
field.setFractionEmpty(row.getDoubleByHeaderName(ScanFieldName.FRACTION_EMPTY));
field.setUniqueCount(row.getIntByHeaderName(ScanFieldName.UNIQUE_COUNT));
field.setFractionUnique(row.getDoubleByHeaderName(ScanFieldName.FRACTION_UNIQUE));
field.setValueCounts(getValueCounts(workbook, tableName, fieldName));

table.getFields().add(field);
Expand All @@ -186,18 +194,13 @@ public static Table createTable(String name, String description, Integer nRows,
Table table = new Table();
table.setName(name.toLowerCase());
table.setDescription(description);
table.setRowCount((nRows == null || nRows == -1) ? nRowsChecked : nRows);
table.setRowCount(nRows == null ? -1 : nRows);
table.setRowsCheckedCount(nRowsChecked == null ? -1 : nRowsChecked);
return table;
}

public static Map<String, Table> createTablesFromTableOverview(QuickAndDirtyXlsxReader workbook, Database database) {
Sheet tableOverviewSheet = null;
for (Sheet sheet : workbook) {
if (sheet.getName().equals(ScanSheetName.TABLE_OVERVIEW)) {
tableOverviewSheet = sheet;
break;
}
}
Sheet tableOverviewSheet = workbook.getByName(ScanSheetName.TABLE_OVERVIEW);

if (tableOverviewSheet == null) { // No table overview sheet, empty nameToTable
return new HashMap<>();
Expand All @@ -224,7 +227,7 @@ public static Map<String, Table> createTablesFromTableOverview(QuickAndDirtyXlsx
return nameToTable;
}

private static String[][] getValueCounts(QuickAndDirtyXlsxReader workbook, String tableName, String fieldName) {
private static ValueCounts getValueCounts(QuickAndDirtyXlsxReader workbook, String tableName, String fieldName) {
Sheet tableSheet = null;
String targetSheetName = Table.createSheetNameFromTableName(tableName);
for (Sheet sheet : workbook) {
Expand All @@ -233,29 +236,43 @@ private static String[][] getValueCounts(QuickAndDirtyXlsxReader workbook, Strin
break;
}
}
if (tableSheet == null) // Sheet not found for table, return empty array
return new String[0][0];

// Sheet not found for table, return empty
if (tableSheet == null) {
return new ValueCounts();
}

Iterator<org.ohdsi.utilities.files.QuickAndDirtyXlsxReader.Row> iterator = tableSheet.iterator();
org.ohdsi.utilities.files.QuickAndDirtyXlsxReader.Row header = iterator.next();
int index = header.indexOf(fieldName);
List<String[]> list = new ArrayList<String[]>();

ValueCounts valueCounts = new ValueCounts();
if (index != -1) // Could happen when people manually delete columns
while (iterator.hasNext()) {
org.ohdsi.utilities.files.QuickAndDirtyXlsxReader.Row row = iterator.next();
if (row.size() > index) {
String value = row.get(index);
String count;
if (row.size() > index + 1)

if (row.size() > index + 1) {
count = row.get(index + 1);
else
} else {
count = "";
if (value.equals("") && count.equals(""))
}

if (value.equals("") && count.equals("")) {
break;
list.add(new String[] { value, count });
}

// If the count is not a number, ignore this row
try {
valueCounts.add(value, (int) (Double.parseDouble(count)));
} catch (NumberFormatException e) {
// System.out.println("Count could not be parsed for value: " + value);
}
}
}
return list.toArray(new String[list.size()][2]);
return valueCounts;
}

}
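
The count handling in getValueCounts above (parse the cell as a double, cast to int, skip the row when parsing fails) can be sketched in isolation. This is a minimal sketch, not the code from this PR: the class CountParsingSketch and the helper parseCount are hypothetical names, and returning an OptionalInt instead of silently skipping is an assumption made for illustration.

```java
import java.util.OptionalInt;

public class CountParsingSketch {

    // Hypothetical helper (not part of WhiteRabbit): returns the count as an int,
    // or an empty OptionalInt when the cell cannot be parsed as a number.
    public static OptionalInt parseCount(String rawCount) {
        if (rawCount == null) {
            return OptionalInt.empty();
        }
        try {
            // Counts may be stored as doubles such as "12.0". The int cast
            // truncates toward zero, which equals flooring for non-negative counts.
            return OptionalInt.of((int) Double.parseDouble(rawCount.trim()));
        } catch (NumberFormatException e) {
            // Same policy as the diff: a non-numeric count means the row is ignored.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseCount("12.0"));
        System.out.println(parseCount("n/a"));
    }
}
```

Returning an empty value keeps the "ignore this row" decision with the caller, which is one possible alternative to the empty catch block in the diff.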
Original file line number Diff line number Diff line change
Expand Up @@ -25,13 +25,16 @@ public class Field implements MappableItem {
private Table table;
private String name;
private String comment = "";
private String[][] valueCounts;
private ValueCounts valueCounts = new ValueCounts();
private boolean isNullable;
private String type;
private String description = "";
private Integer maxLength;
private boolean isStem;
private List<ConceptsMap.Concept> conceptIdHints;
private Double fractionEmpty;
private Integer uniqueCount;
private Double fractionUnique;

public Field(String name, Table table) {
this.table = table;
Expand Down Expand Up @@ -66,11 +69,11 @@ public void setName(String name) {
this.name = name;
}

public String[][] getValueCounts() {
public ValueCounts getValueCounts() {
return valueCounts;
}

public void setValueCounts(String[][] valueCounts) {
public void setValueCounts(ValueCounts valueCounts) {
this.valueCounts = valueCounts;
}

Expand Down Expand Up @@ -137,4 +140,29 @@ public List<ConceptsMap.Concept> getConceptIdHints() {
public void setConceptIdHints(List<ConceptsMap.Concept> conceptIdHints) {
this.conceptIdHints = conceptIdHints;
}

public Double getFractionEmpty() {
return fractionEmpty;
}

public void setFractionEmpty(Double fractionEmpty) {
this.fractionEmpty = fractionEmpty;
this.setNullable(fractionEmpty == null || fractionEmpty != 0);
}

public Integer getUniqueCount() {
return uniqueCount;
}

public void setUniqueCount(Integer uniqueCount) {
this.uniqueCount = uniqueCount;
}

public Double getFractionUnique() {
return fractionUnique;
}

public void setFractionUnique(Double fractionUnique) {
this.fractionUnique = fractionUnique;
}
}
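
The nullable rule that setFractionEmpty applies above can be checked on its own. A minimal sketch, assuming only the expression shown in the diff; the class NullableSketch and the method isNullable are hypothetical names used for illustration.

```java
public class NullableSketch {

    // Hypothetical stand-alone copy of the rule in Field.setFractionEmpty:
    // a field counts as nullable unless the scan reported exactly 0 empty values.
    public static boolean isNullable(Double fractionEmpty) {
        return fractionEmpty == null || fractionEmpty != 0;
    }

    public static void main(String[] args) {
        System.out.println(isNullable(null));
        System.out.println(isNullable(0.0));
        System.out.println(isNullable(0.15));
    }
}
```

A missing fraction is treated as nullable, so fields from scan reports without that column keep the permissive default.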
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,10 @@ public void setRowsCheckedCount(int rowsCheckedCount) {
this.rowsCheckedCount = rowsCheckedCount;
}

public void addField(Field field) {
this.fields.add(field);
}

public List<Field> getFields() {
return fields;
}
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
/*******************************************************************************
* Copyright 2020 Observational Health Data Sciences and Informatics
*
* This file is part of WhiteRabbit
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
******************************************************************************/
package org.ohdsi.rabbitInAHat.dataModel;

import java.util.ArrayList;

public class ValueCounts {
private ArrayList<ValueCounts.ValueCount> valueCounts = new ArrayList<>();
private int totalFrequency = 0;

public class ValueCount {
private String value;
private int frequency;

public ValueCount(String value, int frequency) {
this.value = value;
this.frequency = frequency;
}

public String getValue() {
return value;
}

public void setValue(String value) {
this.value = value;
}

public int getFrequency() {
return frequency;
}
}

public boolean add(String value, int frequency) {
totalFrequency += frequency;
return valueCounts.add(new ValueCount(value, frequency));
}

public ArrayList<ValueCounts.ValueCount> getAll() {
return valueCounts;
}

public ValueCounts.ValueCount get(int i) {
return valueCounts.get(i);
}

public String getMostFrequentValue() {
// Assumption: first added value is the most frequent one (that is how the scan report is structured)
if (valueCounts.size() > 0) {
return valueCounts.get(0).getValue();
}
return null;
}

public int getTotalFrequency() {
return totalFrequency;
}

public int size() {
return valueCounts.size();
}

public boolean isEmpty() {
return size() == 0;
}

}
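
The commit list mentions frequency-based and uniform sampling from the value list. The sketch below shows one way both could be drawn from value/frequency pairs like those held by ValueCounts. It is an assumption for illustration, not the fake data generator from this PR: ValueSamplingSketch, Entry, sampleWeighted and sampleUniform are hypothetical names, and frequencies are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ValueSamplingSketch {

    // Hypothetical stand-in for ValueCounts.ValueCount.
    public static final class Entry {
        final String value;
        final int frequency;

        public Entry(String value, int frequency) {
            this.value = value;
            this.frequency = frequency;
        }
    }

    // Picks a value with probability proportional to its frequency.
    public static String sampleWeighted(List<Entry> entries, Random random) {
        int total = 0;
        for (Entry entry : entries) {
            total += entry.frequency;
        }
        if (total <= 0) {
            return null; // nothing to sample from
        }
        int target = random.nextInt(total); // 0 .. total - 1
        int cumulative = 0;
        for (Entry entry : entries) {
            cumulative += entry.frequency;
            // Strict comparison: an entry with frequency f covers exactly f targets.
            if (target < cumulative) {
                return entry.value;
            }
        }
        return null; // not reached when all frequencies are non-negative
    }

    // Picks every listed value with equal probability, ignoring frequencies.
    public static String sampleUniform(List<Entry> entries, Random random) {
        if (entries.isEmpty()) {
            return null;
        }
        return entries.get(random.nextInt(entries.size())).value;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("M", 60));
        entries.add(new Entry("F", 40));
        Random random = new Random(42);
        System.out.println(sampleWeighted(entries, random));
        System.out.println(sampleUniform(entries, random));
    }
}
```

The strict `target < cumulative` comparison is the place where an off-by-one error typically creeps into this kind of selection, since `<=` would give the first entry one extra slot.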
Original file line number Diff line number Diff line change
Expand Up @@ -40,10 +40,11 @@ public class QuickAndDirtyXlsxReader extends ArrayList<Sheet> {

private static final long serialVersionUID = 25124428448185386L;

private List<String> sharedStrings = new ArrayList<String>();
private List<String> sharedStrings = new ArrayList<>();

private Map<String, Sheet> rIdToSheet = new HashMap<String, Sheet>();
private Map<String, Sheet> filenameToSheet = new HashMap<String, Sheet>();
private Map<String, Sheet> rIdToSheet = new HashMap<>();
private Map<String, Sheet> nameToSheet = new HashMap<>();
private Map<String, Sheet> filenameToSheet = new HashMap<>();

public QuickAndDirtyXlsxReader(String filename) {
try {
Expand All @@ -56,17 +57,10 @@ public QuickAndDirtyXlsxReader(String filename) {
readFromStream(inputStream);

// Step 3: order the sheets:
Collections.sort(this, new Comparator<Sheet>() {

@Override
public int compare(Sheet o1, Sheet o2) {
return IntegerComparator.compare(o1.order, o2.order);
}
});
Collections.sort(this, (o1, o2) -> IntegerComparator.compare(o1.order, o2.order));
} catch (FileNotFoundException e) {
e.printStackTrace();
}

}

private void loadSharedStringsAndRels(FileInputStream inputStream) {
Expand Down Expand Up @@ -191,7 +185,7 @@ else if (tag.equals("/v") || tag.equals("/t")) {
result.add("");
if (sharedString) {
int index = Integer.parseInt(string.substring(stringStart, tagStart - 1));
result.set(column, sharedStrings.get(index));
result.set(column, decode(sharedStrings.get(index)));
} else
result.set(column, decode(string.substring(stringStart, tagStart - 1)));
}
Expand Down Expand Up @@ -521,10 +515,18 @@ private void processWorkBook(InputStream inputStream) throws NumberFormatExcepti
Sheet sheet = rIdToSheet.get(rId);
sheet.setName(name);
sheet.order = Integer.parseInt(order);
nameToSheet.put(name, sheet);
}
}
}

public Sheet getByName(String sheetName) {
if (nameToSheet.containsKey(sheetName)) {
return nameToSheet.get(sheetName);
}
return null;
}

public class Sheet extends ArrayList<Row> {
private static final long serialVersionUID = -8597151681911998153L;
private String name;
Expand Down Expand Up @@ -590,6 +592,7 @@ public String getStringByHeaderName(String fieldName) {
public Double getDoubleByHeaderName(String fieldName) {
String value = getStringByHeaderName(fieldName);
if (value != null) {
value = value.replace("<=","").trim();
return Double.parseDouble(value);
} else {
return null;
Expand Down
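
The change to getDoubleByHeaderName above strips a "<=" prefix before parsing, so inexact values from the scan report are read as their upper bound. A minimal sketch of that behaviour in isolation; InexactNumberSketch and parseInexact are hypothetical names, and only the replace/trim/parse steps come from the diff.

```java
public class InexactNumberSketch {

    // Hypothetical helper mirroring the diff: drop the "<=" marker, trim,
    // then parse. A null cell stays null.
    public static Double parseInexact(String cell) {
        if (cell == null) {
            return null;
        }
        String cleaned = cell.replace("<=", "").trim();
        // Throws NumberFormatException for non-numeric input, as in the diff.
        return Double.parseDouble(cleaned);
    }

    public static void main(String[] args) {
        System.out.println(parseInexact("<= 5"));
        System.out.println(parseInexact("0.25"));
    }
}
```

Note that the parsed number no longer carries the information that it was an upper bound; a caller that needs that distinction would have to check for the marker before parsing.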