Todo.xml

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="XSL/Todo.xsl" ?> 

<topics xmlns:r="http://www.r-project.org">
<title>CodeDepends</title>

<topic>
<title>CodeDepends</title>
<items>
<title>Issues</title>

<item>
sourceVariable() needs to find all of the functions
that are called in any function that is defined in the dependencies.
</item>

<item status="verify">
Fix getInputs so that it doesn't add the elements on the RHS to updates.
See ~/Projects/CaseStudies/BML/BML.Rdb. and sc[[45]]@outputs and specifically
adding grid to the updates..
</item>

<item status="verify">
Fix getInputs so that if it is just an update, then don't put in the outputs
See ~/Projects/CaseStudies/BML/BML.Rdb. and sc[[45]]@outputs.
</item>

<item>
getDepends

getVariables() also includes @updates, so found 
then includes updates, but not

<r:code>
sc = readScript("~/Projects/CaseStudies/BML/BML.Rdb")
inputs = getInputs(sc)
getDepends("createGrid", sc)
</r:code>
</item>

<item>
Handle calls to substitute.
Also quote.
</item>


<item>
Don't seem to identify x as an output in 
<r:code>
  ans = numeric(length(x))
  for(i in seq(along =x)[-1])
      x[i] = x[i-1]*2
  ans
</r:code>
Should be for the second expression - the for loop where body updates x.
</item>

<item>
Get getInputs() from the body of a function, an expression, etc. 
to behave the same as reading from a script.
See RLLVMCompile/explorations/deadVariables.R
</item>
<item>
Is getInputs () method for function correct in having the names of the formals
be outputs and not inputs?
</item>

<item>
findWhenUnneeded probably need to query the updates also.
If an update is done as he last use of a variable, this is currently missed.
(See RLLVMCompile explorations/script.R)
</item>
<item>
findWhenUnneeded returns NA for a variable that is never used, just defined.
Od we want 0 or 1 or where it is defined?
</item>

<item>
<r:expr>if (as.character(fn) == "close")</r:expr>.
fn can be of the form x$foo and so have length greater than 1.
</item>

<item> Back again:<br/>
sourceVariable doesn't seem to run the function definition expressions.
<br/>
I've added facilities to do this, identifying functions as 
either locally defined within the script or not, or unknown.
<br/>
In CaseStudies/PairsTrading/
<r:code>
sc = readScript("duncanSol.Rdb")
sourceVariable("r", , sc)
</r:code>
</item>


<item status="check">
sourceVariable seems to run up to but not including the expression to create c
<br/>
Actually it seems we just need force = TRUE.
<r:code>
sc = readScript("duncanSol.Rdb")
sourceVariable("c", , sc)
</r:code>
</item>

<item status="check">
getVariables(sc[[3]]) doesn't seem to work. No method for an expression.
<br/>
But if we use a ScriptNodeInfo rather than an expression, this is fine.
<br/>
Method added for expression.
</item>

<item status="done">
Names on a script have a preceeding space. 
Is this makeTaskNames with names(x)[1] being empty? Yes.
<br/>
Now we trim the names.
</item>

<item>
Add in locally defined functions in  plot(getDetailedTimelines(.)).
<br/>
Can use functions = TRUE. However, functions = NULL doesn't turn all
of the functions off.  
<br/>
[Fixed] When we include functions, c shows up define many places. 
<br/>
getDetailedTimeline now uses the functions argument in the call to getVariables.
</item>


<item status="done">
Allow suppressing the display of the expressions in readScript.
<br/>
Added a ... and this is passed on to readXMLScript and any other fragment readers.
Use  verbose = FALSE for XML documents.
</item>
<item>
Show functions being used in time lines,
i.e. color them differently.
<br/>
They seem to now appear as being redefined!
</item>
<item>
Make an interactive, SVG plot of the timelines.
Show the code when we mouseover or click on a point.
</item>
<item>
Connect the SVG display to an HTML document with the code parts.
Allow the viewer to navigate the document via the plot.
</item>
<item status="ok">
duncalSol.Rdb in Projects/CaseStudies/PairsTrading/
shows a red before a green in getProfit.K (?)
<br/>
This is fine as we do reference the function before we
define it.
<r:code>
sc = readScript("duncanSol.Rdb")
plot(getDetailedTimelines(sc))
</r:code>
</item>
<item status="verify">
Seems to be working:
<br/>
Recognize  TRUE/FALSE, literals, @slotname, $name, not as regular variables.
<br/>
I think we probably don't want to include these literals as variables.
They are literals/constants. And we don't seem to include them in ScriptNodeInfo.
See 
<r:code>
sc = readScript("inst/samples/literal.R")
as(sc, "ScriptInfo")
</r:code>
</item>
<item>
Variables in loops - when they are local to the loop, recognize this.
See inst/samples/loop.R. i is not defined until the loop.
We seem to need a new concept to introduced here.
</item>

<item status="done">
highlightCode( inline = TRUE)
doesn't include the CSS and the JS doesn't work.
<br/>
Was using <xml:tag>link</xml:tag> for inline version when I
should have been using <xml:tag>style</xml:tag>.
</item>
<item>
Change the CSS for highlight.css to get better colors and fonts.
</item>

<item>
Display code in HTML and have a plot
of the task graph, or the detailed timeline 
and link the two. 
Allow highlighting a variable in the plot and
then show the variables in the code.
Two modes - where variables are assigned and when they are used.
</item>

<item>
How are getExpressionThread and getVariableDepends different?
</item>

<item>
findWhenUnneeded misses variables used in a formula. See example in Rd file for x.
Use cleanVars1.R.
Also, we remove the data frame which the fit object may reference.
</item>

<item>
Update functions and methods to allow a DynScript where a regular Script can be.
Just coerce. and/or method for getInputs()
</item>

<item>
Add support for include formula variables in  analysis
of getPropagateChanges(), etc.
</item>

<item>
Determine when a code chunk redefines a variable. 
This changes the dependencies 
<r:code>
p = foo()

p  = 1
bar(p)
</r:code>

<r:code>
getInputs(quote({p = foo();p  = 1; bar(p)}))
</r:code>
gives p as an output and no input.
We could say updates p.

<br/>
What about 
<r:code>
getInputs(quote({p  = p + 1; bar(p)}))
</r:code>
This has p as inputs and outputs. This is correct.
</item>

<item>
Would like to be able to update the script, see what expressions have changed
and then  use that information when using sourceVariable().
See if Gabe has done something on this.
</item>

<item status="done">
If a variable is named missing, we don't count it in the dependencies.
<br/>
Yes we do. See inst/samples/missing.R
</item>

<item status="check">
Identify as a dependency a variable that is used
as the default value of a function that is called in our 
expressions.
See fgets.Rdb in Rllvm/explorations.
<br/>
<r:code>
sc = readScript(system.file("samples", "defaultValue.R", package = "CodeDepends"))
k = as(sc, "ScriptInfo")
k[[3]]@inputs

k[[4]]@inputs
</r:code>
</item>
<item>
(Recursively) find global variables in functions.
See fgets.Rdb in Rllvm/explorations.
</item>

<item>
Come up with a mechanism that identifies
side effects via closures.
See for example the RClangSimple/RCIndex examples (notDone.R)
where we call parseTU(file,  col$fun) and col$fun is a closure.
<br/>
This is slightly tricky as we want to mark col as being updated,
not col$fun.  We could identify col as a reference type or mutable
and hence if it is passed to a function, we can consider it mutated
or possibly so. We can also analyze the function to which it is passed
and see if it uses the mutable parts, and do this recursively to the functions
it calls and so on.
</item>


<item>
Function to determine if  a function (potentially) modifies its arguments.
Put another way, determine which parameters are const.
This can be part of a type inference package I am working on
(github.com:duncantl/RTypeInference).
</item>

<item>
Function that just revaluates the command to define a variable,
e.g. <r:expr>sc$reval('foo')</r:expr>
So this is for when we don't want to find the command and
have the script.

</item>

<item status="done">
getSectionDepends does'nt seem to recognize need for hout in RClangSimple/inst/TU/routinese.R
when source'ing <r:var>done</r:var>.
</item>

<item>
Add more checks for sideEffects. See R/sideEffects.R
Just <r:expr>cat(, file = sym)</r:expr> here.
These pretty much need to be done manually.
</item>

<item status="done">
In sourceVariable and getVariableDepends, recognize
side effects such as <r:expr>cat(, file = var)</r:expr>.
<br/>
For an example, see routines.R and hout in RClangSimple/inst/TU (Apr 23 '13)
<br/>
Should we capture these as sideEffects or in the updates slot?
</item>

<item>
Make the Script object something we can do something like
<r:expr>  sc[["foo"]] </r:expr>
to run the code to get foo and the return foo.
This is just syntax for sourceVariable.
<br/>
Doesn't sc$name do this already.
<br/>
FIX: But it doesn't handle function definitions.
</item>

<item>
For tasks/nodes that have a single output, put the name of the variable 
as the name of the task.
</item>

<item>
Allow sourceVariable()  to check what variables
already exist and just run the code for those that don't.
This may not always be what we want, but it can be useful
when we have different tasks some of which share the same inputs.
When we run one task and then another, we don't want to necessarily run
the shared code again.
</item>

<item>
Make a reference Script object that checks to see if the file has been updated
and changes its information when it has.
</item>

<item status="extend">
sourceVariable() should load packages.
Ideally determine which functions/variables/symbols come from
which packages.
If the functions don't exist when we go to evaluate the call,
then load the packages in earlier.
</item>

<item status="done">
Make sourceVariable recognize if it is given a script as the second argument.
</item>
<item status="done">
sourceVariable is repeating the same commands. Need
to ensure getVariableDepends() returns unique expressions.
<br/>
<r:func>sapply</r:func> rather than <r:func>lapply</r:func>
led to sort on columns.
</item>


<item status="high">
<r:code><![CDATA[
getInputs(quote(x[i] <- x[i] + 2))
]]></r:code>
says that i is an output!  It is on the 
LHS, but it is not an output. Need to be more precise.
</item>
<item>
Handle dget, .readRDS, etc. as potentially reading  files.
</item>
<item status="verify">
When an assignment is not a simple name, take the name
of the first variable mentioned and use that as an input.
<br/>
Test to see if it is working. Changed the is.name(e[[2]][[2]]) condition.
Need to be more general.
<br/>
<r:code><![CDATA[
getInputs(quote( x[[3]]$foo <- k + g(z)))
]]></r:code>
</item>

<item status="done">
Computing inputs is not getting covs in 
sitepairs.R, specifically the 28th expression.
<r:code>
s = readScript("inst/samples/sitepairs.R")
getInputs(s[[28]])
</r:code>
<br/>
For some silly reason, I was excluding variables that were also in the outputs.
</item>

<item status="verify">
Seems to be working - April 1, 2014
<br/>
Names of assignment functions not being added to functions in
getInputs. E.g. in 
<r:code><![CDATA[
getInputs(quote(colnames(covs) <- paste(xcolnames(covs), "c", sep = ".")))
]]></r:code>
should be catching the colnames.


</item>

<item>
Get the graphs working properly.
<r:code>
sc = readScript("inst/samples/parallel.R")
g = makeVariableGraph(,sc)
library(Rgraphviz)
plot(g)
</r:code>
</item>

<item status="check">
Don't include in the variables the right hand side of a $ call,
i.e. x$foo should only reference x.
See getInputs.language in codeDepends.
See
<r:code>
sc = readScript("inst/samples/dollar.R")
i = getInputs(sc)
</r:code>
or more simply
<r:code><![CDATA[
getInputs(expression(x$a <- 1))
]]></r:code>
The assignment includes the rhs, i.e. i[[3]]
<br/>
We may want to introduce a new class of "inputs" and "outputs"
named, e.g., "modified" to reflect inline modifications
via x$foo and x[i] and x[i, j]
</item>


<item status="verify">
Document the undocumented functions.
<r:code>
library(tools)
undoc("CodeDepends", ".")
</r:code>
</item>

<item status="check">
Handle complex assignments (i.e. left hand sides that are not simple names)
where we add them to the outputs.
</item>

<item>
getExpressionThread() and redefined variables and infinite loops.
</item>
</items>

<items>
<item>
Follow load() and source() commands to identify variables they provide.
Recursively process source() commands.
Get the table of contents for a load() file.
</item>
<item>
Introduce a SourceNodeInfo that extends ScriptNodeInfo and
indicates this is a source command.
Then introduce other important concepts that are not generic.
</item>


<item>
Determine parallel/non-dependent blocks.
</item>

<item status="done">
Determine the point at which a variable is no longer needed
and be able to add code to remove it, e.g. a cleanup section
for each code block.
<br/>
This is as simple as looking at the info for later code blocks
and see if a variable is in the inputs of any of these.
<br/>
See addRemoveIntermediates.
</item>

<item status="done">
Create a graph connecting the expressions with edges going
from an expression block that is an  input to another task.
<br/>
makeGraph
</item>


<item>
Add support for going to individual expressions and not just blocks.
i.e. so that we can extract subsets of a block when updating.
<br/>
The existing code will do all of the hard work. The issue
is to arrange the results within a code block/expression
based on the individual calls.
This is not hard, just a change to our setup.
So we will want to introduce new  classes to preserve the
existing material while introducing new approaches.
</item>

<item status="done">
getInputs() on an R function (e.g. arima0) seems
to have a lot of redefinitions.
<r:code>
info = getInputs(arima0)
dtm = getDetailedTimelines(, ff)
plot(dtm)
</r:code>
<br/>
Problem was I was reusing the same collector() for
all expressions in case the caller gave us a collector.
As a result, we were accumulating results across expressions.
</item>

<item status="low">
Plot of the timeline of variables.
<br/>
See getDetailedTimelines() and its plot method.
<br/>
We should make this more polished (fix up lines that go to the end), 
and think about what else we should really be plotting.
</item>

<item status="done">
Find rm() commands.
</item>
</items>
</topic>
<topic>
<title>Alternatives and Threads</title>
<items>
<item>
Deal with alternatives in the document.
</item>
<item>
Draw the tasks as a flow.
<br/>
Think about alternatives at this point, but of course they don't really belong here
necessarily.
</item>
</items>
</topic>


</topics>