Datasource options field in DB too short #78

Closed
Michael2222 opened this issue Dec 19, 2020 · 25 comments

@Michael2222

Hello,
do you perhaps have an example for the "HTML grabber"?
I am trying to import the table values from this site every day:
https://www.sozialministerium.at/Informationen-zum-Coronavirus/Neuartiges-Coronavirus-(2019-nCov).html

But I can't find the right parameters.
Thanks!

@Rello
Owner

Rello commented Dec 19, 2020

Hello,
I will have a look. Nice idea.
But: Analytics does not support multiple dimensions yet(!), which means we can only use one of them here (the other is reserved for the timestamp of the load, so that the history can be shown):

dimension1          dimension2   value
Bestätigte Fälle    timestamp    Ö-gesamt
Todesfälle          timestamp    Ö-gesamt
...

Do you know what I mean?
So it is either the totals per key figure OR the numbers per Bundesland.
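
To make the layout above concrete, here is a minimal Python sketch of what one stored row could look like. It is only an illustration: `Row` and `make_row` are hypothetical names, not part of Analytics, and the single figure used is the one that appears further down in this thread.

```python
from datetime import date
from typing import NamedTuple


class Row(NamedTuple):
    dimension1: str  # key figure, e.g. "Bestätigte Fälle"
    dimension2: str  # timestamp of the load, reserved for the history
    value: int       # Austria-wide total ("Ö-gesamt")


def make_row(key_figure: str, load_date: date, total: int) -> Row:
    """Build one row; the load date is stored as an ISO string."""
    return Row(key_figure, load_date.isoformat(), total)


# The one figure quoted in this thread: 335.485 confirmed cases on 19.12.2020.
row = make_row("Bestätigte Fälle", date(2020, 12, 19), 335485)
print(row)
```

With one such row per daily load, the second dimension carries the history, which is why only one "real" dimension remains for the key figure.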

@Michael2222
Author

Michael2222 commented Dec 19, 2020

Hello,

do you know what I mean?

I hope so.
I want to store the value in the row "Bestätigte Fälle" and the column "Österreich gesamt" every day at 10am.
So I have set these fields:
Data source (Datenquelle) = HTML grabber
URL = https://www.sozialministerium.at/Informationen-zum-Coronavirus/Neuartiges-Coronavirus-(2019-nCov).html
Display (Anzeige) = Chart & table (Diagramm & Tabelle)
Chart type (Diagrammtyp) = Line (Linie)

  1. But what should I enter in the field "valid regex" so that I address only this one cell in the table on the Covid site?
  2. And how can I set the time, e.g. 10am, at which the value is stored?

Thanks for your hint.

@Michael2222
Author

Hello,

I have copied the HTML file, and this is the HTML structure:

 <table class="table"> 
  <thead> 
   <tr> 
    <th scope="col">Bundesland</th> 
    <th scope="col">Bgld.</th> 
    <th scope="col">Ktn.</th> 
    <th scope="col">NÖ</th> 
    <th scope="col">OÖ</th> 
    <th scope="col">Sbg.</th> 
    <th scope="col">Stmk.</th> 
    <th scope="col">T</th> 
    <th scope="col">Vbg.</th> 
    <th scope="col">W</th> 
    <th scope="col">Österreich<br> gesamt</th> 
   </tr> 
  </thead> 
  <tbody> 
   <tr> 
    <th scope="row">Bestätigte Fälle<br> (Stand 19.12.2020, 15:00 Uhr)</th> 
    <td>8.629</td> 
    <td>19.430</td> 
    <td>49.617</td> 
    <td>67.909</td> 
    <td>25.773</td> 
    <td>39.245</td> 
    <td>38.119</td> 
    <td>18.034</td> 
    <td>68.729</td> 
    <td>335.485</td> 
   </tr> 

How can I extract the value 335.485 with regex?
Thank you!

@Rello
Owner

Rello commented Dec 20, 2020

Hey,
I will hopefully check tomorrow. I am just finishing the testing of a bigger change for the Analytics app.
I will get back to you.

@Michael2222
Author

Thank you!

@Rello
Owner

Rello commented Dec 22, 2020

[Screenshot, 2020-12-22 14:17:17]

I am not a regex specialist, but something like this should work.
I am not sure how to get the last cell per row, though.

(<th scope="row">)(?<dimension>.*?)(<br>)(.*?)(<td>)(?<value>.*?)(<\/td)

https://regex101.com
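
A minimal sketch of the behaviour described above, written with Python's `re` module rather than the regex engine the app itself uses. Two things are assumptions here: Python spells named groups `(?P<name>...)`, and `re.DOTALL` is set so that `.` can cross the line breaks in the copied HTML. The sample is a shortened copy of the row posted earlier; `FIRST_CELL` and `LAST_CELL` are hypothetical names. The lazy pattern stops at the first `<td>`, while requiring `</td>` to be followed directly by `</tr>` moves the capture to the last cell of the row.

```python
import re

# Shortened copy of the row posted earlier in this thread.
HTML = """<tr>
 <th scope="row">Bestätigte Fälle<br> (Stand 19.12.2020, 15:00 Uhr)</th>
 <td>8.629</td>
 <td>19.430</td>
 <td>335.485</td>
</tr>"""

# Pattern from the comment above, with Python-style named groups.
FIRST_CELL = re.compile(
    r'(<th scope="row">)(?P<dimension>.*?)(<br>)(.*?)(<td>)(?P<value>.*?)(</td)',
    re.DOTALL,
)

# Variation: the captured cell must be directly followed by the end of the row.
LAST_CELL = re.compile(
    r'(<th scope="row">)(?P<dimension>.*?)(<br>)(.*?)(<td>)'
    r'(?P<value>[^<]*)(</td>\s*</tr>)',
    re.DOTALL,
)

first = FIRST_CELL.search(HTML)
last = LAST_CELL.search(HTML)
print(first.group("dimension"), first.group("value"))  # first cell of the row
print(last.group("dimension"), last.group("value"))    # last cell of the row
```

Whether the same variation behaves identically inside the HTML grabber depends on the flags the app applies, so it is worth checking on regex101 with the PCRE flavour selected.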

@Rello
Owner

Rello commented Jan 2, 2021

did you find a way?

Rello self-assigned this on Jan 2, 2021
Rello added the "needs info" label (feedback from requester required) on Jan 2, 2021
@Michael2222
Author

Hello,
no sorry, I didn't get it to work.

@AleksovAnry
Contributor

Hello! Try this RegEx:

/(<th scope="row">)(?<dimension>.*)(<\/th>)(.*)([\w\W\d]+>)(?<value>.*)(<\/td>[\n\s]*<\/tr)/

[Screenshot, 2021-01-03 14:32:27]

@Rello
Owner

Rello commented Jan 3, 2021

Thank you for your input

@AleksovAnry
Contributor

Trying to search within a portion of the page is a bad idea. I analyzed the URL and changed the RegEx:

(<th scope="row">)(?<dimension>.*)(<\/th>)(.*)([\w\W\d]+>)(?<value>.*)(<\/td>[\n\s]*<\/tr>[\n\s]*<tr>[\n\s]*<th scope="row">Todesfälle)

[Screenshot, 2021-01-03 15:01:49]
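
One plausible reading of why the pattern had to change, sketched in Python's `re` module (an assumption: the app itself does not run Python, and named groups are written `(?P<name>...)` here). The greedy `[\w\W\d]+` runs to the end of the input, so on a page with several rows the first pattern picks up the last cell of the last row. Anchoring on the start of the "Todesfälle" row pins the match to the intended row. The first row reuses values from this thread; the "Todesfälle" cells are made-up placeholders, and `HEAD`, `ANY_ROW_END` and `BEFORE_NEXT_ROW` are hypothetical names.

```python
import re

# Two rows in the shape of the posted table. The "Todesfälle" cells are
# placeholders for illustration only, not real figures.
HTML = """<tr>
<th scope="row">Bestätigte Fälle<br> (Stand 19.12.2020, 15:00 Uhr)</th>
<td>8.629</td>
<td>335.485</td>
</tr>
<tr>
<th scope="row">Todesfälle</th>
<td>1</td>
<td>2</td>
</tr>"""

# Shared first part of both suggestions ("/" needs no escaping in Python).
HEAD = r'(<th scope="row">)(?P<dimension>.*)(</th>)(.*)([\w\W\d]+>)(?P<value>.*)'

# First suggestion: stop at any "</td> ... </tr".
ANY_ROW_END = re.compile(HEAD + r'(</td>[\n\s]*</tr)')

# Second suggestion: stop only where the "Todesfälle" row begins.
BEFORE_NEXT_ROW = re.compile(
    HEAD + r'(</td>[\n\s]*</tr>[\n\s]*<tr>[\n\s]*<th scope="row">Todesfälle)'
)

loose = ANY_ROW_END.search(HTML)
anchored = BEFORE_NEXT_ROW.search(HTML)
print(loose.group("value"))     # last cell of the last row (placeholder value)
print(anchored.group("value"))  # last cell of the "Bestätigte Fälle" row
```

The trade-off of the anchored version is that it depends on "Todesfälle" staying the next row on the page.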

@Michael2222
Author

Hello,

I tried to put the string "//(<th scope="row">)(?<dimension>.*)(<\/th>)(.*)([\w\W\d]+>)(?<value>.*)(<\/td>[\n\s]*<\/tr>[\n\s]*<tr>[\n\s]*<th scope="row">Todesfälle)//" into the field "valid regex" and clicked "report store", but it does not store this value:

[Screenshots: grafik, report]
If I click on another report and then go back to this report, the "valid regex" field is empty.

@Rello
Owner

Rello commented Jan 4, 2021

Let me check

@AleksovAnry
Contributor

Hello! That is because you entered the wrong value: you put double slashes before, and I think also after, the RegEx.
Valid value:
/(<th scope="row">)(?<dimension>.*)(<\/th>)(.*)([\w\W\d]+>)(?<value>.*)(<\/td>[\n\s]*<\/tr>[\n\s]*<tr>[\n\s]*<th scope="row">Todesfälle)/

@Rello
Owner

Rello commented Jan 4, 2021

Hello,
first: yes, only one "/".

But I think there is an issue in Analytics. I am not sure; I need to check.
When I create a realtime report as you did above, I do not get any data.

When I create a report of type "internal database" and get the data via a dataload, it shows up:
[Screenshot, 2021-01-04 22:31:44]

[Screenshot, 2021-01-04 22:32:11]

So the question is what you initially wanted: a realtime report for the current number,
or a daily dataload with a timestamp, to actually show a graph with a trend? In the latter case, you should remove the date after "Bestätigte Fälle" in the regex.
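
A minimal sketch of that last suggestion, again in Python's `re` module (an assumption; the named-group syntax `(?P<name>...)` and the name `PATTERN` belong to this illustration only). The dimension group is limited to the text before `<br>`, so the "(Stand ...)" date no longer ends up in the dimension and the key figure keeps the same name on every load.

```python
import re

# Row copied (shortened) from this thread; the next row's header is included
# only because the pattern anchors on it.
HTML = """<tr>
<th scope="row">Bestätigte Fälle<br> (Stand 19.12.2020, 15:00 Uhr)</th>
<td>8.629</td>
<td>335.485</td>
</tr>
<tr>
<th scope="row">Todesfälle</th>"""

# The dimension group now ends at "<br>", so the date is skipped.
PATTERN = re.compile(
    r'(<th scope="row">)(?P<dimension>[^<]*)(<br>.*</th>)(.*)([\w\W\d]+>)'
    r'(?P<value>.*)(</td>[\n\s]*</tr>[\n\s]*<tr>[\n\s]*<th scope="row">Todesfälle)'
)

match = PATTERN.search(HTML)
print(match.group("dimension"), match.group("value"))
```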

@AleksovAnry
Contributor

AleksovAnry commented Jan 5, 2021

Got data & graph using type 'column'. @Rello, remind me: can we set a zero value for the axis?

[Screenshot, 2021-01-05 03:28:34]

@Michael2222
Author

So the question is, what you initially wanted. A realtime for the current number?
or a daily dataload with timestamp to actually show a graph with a trend?

The second. In Austria, the government publishes the new Covid values every day at 9am and at 3pm. I want to store the value every day at 10am and draw a trend over the last weeks/months.

I have now updated Analytics to 3.2.0, but the behavior is still the same.
I put the regex in the field, press "store", go to another report, come back, and the field is empty.
I don't get any feedback from the UI on whether the value was really stored!?

@Rello
Owner

Rello commented Jan 9, 2021

I put the regex expression in the field, press "store" go to another report, come back and the field is empty.
I don't get feedback from the UI that the value are really stored!?

Hello,
I could not find anything here; it is working for me.
When you press the "update report" button, can you send me the details from the console of the request that is sent to the server?

There is a PUT request to analytics/dataset/226.
It should contain all the values in the request:
[Screenshot, 2021-01-09 10:41:19]

@Michael2222
Author

Hello,

here is the output from my console (I have only changed my domain to domain.at):

Uncaught TypeError: data is undefined
    createWidgetContent https://domain.at/apps/analytics/js/dashboard.js?v=1dfba11f-20:106
    onreadystatechange https://domain.at/apps/analytics/js/dashboard.js?v=1dfba11f-20:96
dashboard.js:106:19
    createWidgetContent https://domain.at/apps/analytics/js/dashboard.js?v=1dfba11f-20:106
    onreadystatechange https://domain.at/apps/analytics/js/dashboard.js?v=1dfba11f-20:96
XHR PUT https://domain.at/index.php/apps/analytics/dataset/1

PUT https://domain.at/index.php/apps/analytics/dataset/1
Status: 500 Internal Server Error
Version: HTTP/2
Transferred: 2.40 KB (5.21 KB size)
Referrer Policy: no-referrer

	
cache-control
	no-store, no-cache, must-revalidate
content-encoding
	gzip
content-length
	1510
content-security-policy
	default-src 'self'; script-src 'self' 'nonce-alhJMXh5Mm1VYlNIR1FGODdLUzBGcmQ1TFRRSEhzV2dNa0NkTXcvSndyTT06MVNGdzgxM3RCNFRoU2s4ZHcvMzdSY1VTVlhGTWRyYlFZQ1RUQVVxc2tQaz0='; style-src 'self' 'unsafe-inline'; frame-src *; img-src * data: blob:; font-src 'self' data:; media-src *; connect-src *; object-src 'none'; base-uri 'self';
content-type
	text/html; charset=UTF-8
date
	Sat, 09 Jan 2021 16:31:42 GMT
expires
	Thu, 19 Nov 1981 08:52:00 GMT
pragma
	no-cache
referrer-policy
	no-referrer
server
	Apache
strict-transport-security
	max-age=15768000
vary
	Accept-Encoding,User-Agent
x-content-type-options
	nosniff
x-download-options
	noopen
X-Firefox-Spdy
	h2
x-frame-options
	SAMEORIGIN
x-permitted-cross-domain-policies
	none
x-robots-tag
	none
x-xss-protection
	1; mode=block
	
Accept
	*/*
Accept-Encoding
	gzip, deflate, br
Accept-Language
	de,en-US;q=0.7,en;q=0.3
Cache-Control
	no-cache
Connection
	keep-alive
Content-Length
	658
Content-Type
	application/x-www-form-urlencoded; charset=UTF-8
Cookie
	__Host-nc_sameSiteCookielax=true; __Host-nc_sameSiteCookiestrict=true; _pk_id.1.5398=606cd27dd58ac81d.1594664372.41.1595491840.1595487924.; oc_music_volume=99; nc_username=michael.domain; nc_token=9m5cZrITs9xNnirIY3hReXsD9xBhW%2BQw; nc_session_id=jagu44omielipavbkm5fbbffi3; oc_sessionPassphrase=CeW8ShSXW6fWtSYWYaSy%2BDlEgcdeBb3fTDfa%2BmQC55LkOCzPnblLy%2Fh8GbdNna4zQ0kgBButEg73eMqMo9UvSwpbsTIGDwNvvqYfJMBLa%2F8xPRIYkPHWvtPU22H6jvyd; ocvobv4fb23o=jagu44omielipavbkm5fbbffi3
Host
	domain.at
OCS-APIREQUEST
	true
Origin
	https://domain.at
Pragma
	no-cache
requesttoken
	ILsdkri/y6x1TGsZS8ayNVf0s+EEZ+Trq5TUILRkttQ=:eOhYpsj0nZwTHyV4ZJ/9ZiWfy6RPD5eb+fCaEvEB5J4=
User-Agent
	Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0
X-Requested-With
	XMLHttpRequest

@Rello
Owner

Rello commented Jan 9, 2021

The variables that are sent with the request are missing here.

@Rello
Owner

Rello commented Jan 9, 2021

It shows an error 500. Can you check the Nextcloud log?

If you are familiar with debugging, please check line 658 of analytics/js/sidebar.js.
Are the options recognized correctly?
[Screenshot, 2021-01-09 21:25:56]

@Michael2222
Author

Michael2222 commented Jan 10, 2021

Hello,

Yes, I have an entry in the log file. Perhaps the database column is too small for my parameters?

Error | index | Doctrine\DBAL\Exception\DriverException:
An  exception occurred while executing 'UPDATE `oc_analytics_dataset` SET  `name` = ?, `subheader` = ?, `type` = ?, `link` = ?, 
`visualization` =  ?, `chart` = ?, `chartoptions` = ?, `dataoptions` = ?, `parent` = ?,  `dimension1` = ?, `dimension2` = ?, `value` = ? 
WHERE (`user_id` = ?)  AND (`id` = ?)' with params ["Covid19 Zahlen", "", 5,  "{\"url\":\"https:\/\/www.sozialministerium.at
\/Informationen-zum-Coronavirus\/Neuartiges-Coronavirus-(2019-nCov).html\",\"regex\":\"\/(<th   scope=\\\"row\\\">)
(?<dimension>.*)(<\\\\\/th>)(.*)([\\\\w\\\\W\\\\d]+>)(?<value>.*)(<\\\\\/td>[\\\\n\\\\s]*<\\\\\/tr>[\\\\n\\\\s]*<tr>[\\\\n\\\\s]*
<th   scope=\\\"row\\\">Todesf\u00e4lle)\/\",\"limit\":\"\",\"timestamp\":\"true\"}",  "ct", "line", "", "", 0, "Objekt", "Datum", 
"Wert", "michael",  1]:  SQLSTATE[22001]: String data, right truncated: 1406 Data too long for  column 'link' at row 1


@Rello
Owner

Rello commented Jan 10, 2021

Thank you!
This helps! In fact, it really is a field of length 256. It definitely makes sense to extend it. I will take care of it.
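
A rough way to see why the options no longer fit, sketched in Python. The option keys and values are taken from the log entry above; the limit of 256 is the field size mentioned in this comment. Python's `json.dumps` is only a stand-in for whatever encoder the app uses, so the exact length will differ from what was actually sent, and `LINK_COLUMN_LIMIT` is a hypothetical name.

```python
import json

LINK_COLUMN_LIMIT = 256  # field size mentioned in the comment above

# Datasource options as they appear in the logged UPDATE statement.
options = {
    "url": "https://www.sozialministerium.at/Informationen-zum-Coronavirus/"
           "Neuartiges-Coronavirus-(2019-nCov).html",
    "regex": r'/(<th scope="row">)(?<dimension>.*)(<\/th>)(.*)([\w\W\d]+>)'
             r'(?<value>.*)(<\/td>[\n\s]*<\/tr>[\n\s]*<tr>[\n\s]*'
             r'<th scope="row">Todesfälle)/',
    "limit": "",
    "timestamp": "true",
}

# Encoding adds quotes and backslash escapes on top of the raw text.
encoded = json.dumps(options)
print(len(encoded), "characters; fits:", len(encoded) <= LINK_COLUMN_LIMIT)
```

The URL and the regex together already come close to the limit before any escaping, so a long pattern like this one cannot be stored until the column is extended.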

Rello changed the title from "Example for "HTML grabber"" to "Datasource options field in DB too short" on Jan 10, 2021
Rello added the "datasource" and "in progress" (development in progress) labels and removed the "needs info" (feedback from requester required) label on Jan 10, 2021
Rello added this to the 3.3.0 milestone on Jan 29, 2021
Rello added the "pending release" (part of the next release version) label and removed the "in progress" (development in progress) label on Jan 29, 2021
@Rello
Owner

Rello commented Jan 29, 2021

solved with the next release

@Michael2222
Author

When will we get the new release?

Rello removed the "pending release" (part of the next release version) label on Feb 13, 2021
Rello closed this as completed on Feb 13, 2021