All About Scripts
What is a Script?
Scripts are specific elements that are part of a LOST annotation pipeline. A script element is implemented as a python3 module. The listing below shows an example of such a script. This script will request image annotations for all images of a dataset. You can find the script here.
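The following is a minimal sketch in the spirit of that script (the class name is illustrative and details of the original script may differ; the pyapi calls it uses are explained in the sections below):

from lost.pyapi import script
import os

class RequestAnnos(script.Script):
    '''Request image annotations for all images of the connected datasources.'''
    def main(self):
        # Iterate over all Datasource elements connected to this script
        for ds in self.inp.datasources:
            for img_file in os.listdir(ds.path):
                img_path = os.path.join(ds.path, img_file)
                # Send an annotation request for this image to the
                # connected annotation task(s)
                self.outp.request_annos(img_path)

if __name__ == "__main__":
    RequestAnnos()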
In order to implement a script you need to create a python class that
inherits from lost.pyapi.script.Script. Your class needs to implement a main method
and needs to be instantiated within your python script. The
listing below shows a minimal example of a script.
from lost.pyapi import script

class MyScript(script.Script):
    def main(self):
        self.logger.info('Hello World!')

if __name__ == "__main__":
    MyScript()
Example Scripts
More script examples can be found here:
lost/backend/lost/pyapi/examples/pipes
The LOST PyAPI Script Model
Like all pipeline elements, a script has an input and an output object. Via these objects, it is connected to other elements in a pipeline (see also here).
Inside a script you can exchange information with the connected elements
by using the self.inp object and the
self.outp object.
Reading Imagesets
It is a common pattern to read a path to an imageset from a
Datasource element in your annotation pipeline. See
the listing below for
a code example. Since multiple Datasources could be connected to our
script, we iterate over all connected Datasources of the input with
self.inp.datasources. For each Datasource
element we can read the
path attribute to get the filesystem path to a folder with images.
from lost.pyapi import script
import os

class MyScript(script.Script):
    def main(self):
        # Iterate over all Datasource elements connected to the script input
        for ds in self.inp.datasources:
            for img_file in os.listdir(ds.path):
                # Build the full path to each image in the datasource folder
                img_path = os.path.join(ds.path, img_file)

if __name__ == "__main__":
    MyScript()
Requesting Annotations
The most important feature of the LOST PyAPI is the ability to request
annotations for a connected AnnotationTask element. Inside a
Script you can access the output element and call the
self.outp.request_annos method (see the listing below).
self.outp.request_annos(img_path)
Sometimes you also want to send annotation proposals to an AnnotationTask in order to support your annotator. In most cases these proposals will be generated by an AI, like an object detector. The listing below shows a simple example to send a dummy box and a dummy point to an annotation tool.
self.outp.request_annos(img_path,
                        annos=[[0.1, 0.1, 0.2, 0.2], [0.1, 0.2]],
                        anno_types=['bbox', 'point'])
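In practice the proposals would come from a model instead of being hard coded. The following is a hedged sketch, where detect_objects is a hypothetical helper that returns boxes as four relative coordinates in the same format as the dummy box above:

from lost.pyapi import script
import os

def detect_objects(img_path):
    '''Hypothetical detector stub.

    A real implementation would run an object detection model and return
    a list of boxes, each given as four relative coordinates (the same
    format as the dummy box in the listing above).
    '''
    return [[0.1, 0.1, 0.2, 0.2]]

class ProposalScript(script.Script):
    def main(self):
        for ds in self.inp.datasources:
            for img_file in os.listdir(ds.path):
                img_path = os.path.join(ds.path, img_file)
                boxes = detect_objects(img_path)
                # Request annotations together with the generated proposals
                self.outp.request_annos(img_path,
                                        annos=boxes,
                                        anno_types=['bbox'] * len(boxes))

if __name__ == "__main__":
    ProposalScript()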
Annotation Broadcasting
If multiple AnnoTask elements are connected to your
ScriptOutput and you call
self.outp.request_annos,
the annotation request will be broadcast to all connected AnnoTasks. So each AnnoTask will get its own copy of
your annotation request. Technically, for each annotation request an
empty ImageAnno will be created for each AnnoTask. During the
annotation process this
ImageAnno
will be filled with information.
Reading Annotations
Another important task is to read annotations from previous pipeline elements. In most cases these will be AnnoTask elements.
If you would like to read all annotations at the
script input in a vectorized way, you can use
self.inp.to_df()
to get a pandas DataFrame
or self.inp.to_vec() to get a list of lists.
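As a small sketch of the vectorized variant inside the main method of a script (only the logging of the DataFrame size is added here for illustration):

# Read all annotations at the script input into a pandas DataFrame
df = self.inp.to_df()
self.logger.info('Received {} annotation rows'.format(len(df)))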
If you prefer to iterate over all
ImageAnnos you can use the respective iterator
self.inp.img_annos. See the
listing below for
an example.
for img_anno in self.inp.img_annos:
    for twod_anno in img_anno.twod_annos:
        self.logger.info('image path: {}, 2d_anno_data: {}'.format(img_anno.img_path, twod_anno.data))
Contexts to Store Files
There are three different contexts that can be used to store files that should be handled by your script. Each context is modeled as a specific folder in the LOST filesystem. In order to get the path to a context, call self.get_path.
The listing below shows an application of self.get_path in order to get the path to the instance context. The full example script can be found at lost/backend/lost/pyapi/examples/pipes/sia/export_csv.py.
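A minimal sketch of this pattern, assuming the script wants to write a CSV export into its instance context (the file name export.csv and the DataFrame export are illustrative):

# Get a path inside the instance context of this script instance
csv_path = self.get_path('export.csv', context='instance')
# Write all incoming annotations to that path
self.inp.to_df().to_csv(csv_path)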
There are three types of contexts that can be accessed: instance, pipe, and static.
The instance context is only accessible by the current instance of your script. Each time a pipeline is started, each script gets its own instance folder in the LOST filesystem. No other script in the same pipeline has access to this folder.
If you'd like to exchange files among the script instances of a started
pipeline, you can choose the pipe context. When calling
self.get_path
with context = 'pipe' you will get a path to a
folder that is available to all script instances of a pipeline instance.
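For example (the file name shared_stats.json is just an illustrative assumption):

# Path to a file in the pipe context, shared by all scripts of this pipeline instance
shared_path = self.get_path('shared_stats.json', context='pipe')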
The static context is a path to the pipeline project folder, to which all script instances have access. In this way you can access files that you have provided inside the pipeline project. For example, if you'd like to load a pretrained machine learning model inside of your script, you can put it into the pipeline project folder and access it via the static context:
path_to_model = self.get_path('pretrained_model.md5', context='static')
Logging
Each Script has its own logger.
This logger is an instance of the standard python
logger. The
example below shows
how to log an info message, a warning and an error. All logs are
redirected to a pipeline log file that can be downloaded via the
pipeline view inside the web gui.
self.logger.info('I am an info message')
self.logger.warning('I am a warning')
self.logger.error('An error occurred!')
Script Errors and Exceptions
If an error occurs in your script, the traceback of the exception will be visible in the web gui, when clicking on the respective script in your pipeline. The error will also be automatically logged to the pipeline log file.
Script ARGUMENTS
The ARGUMENTS variable is used to provide script arguments that can be set during the start of a pipeline within the web gui. ARGUMENTS are defined as a dictionary of dictionaries. Each argument dictionary has the keys value and help. As you can see in the listing below, the first argument is called my_arg. Its value is 'true' and its help text is 'A boolean argument.'.
ARGUMENTS = {'my_arg': {'value': 'true',
                        'help': 'A boolean argument.'}
            }
Within your script you can access the value of an argument with the get_arg(...) method as shown below.
if self.get_arg('my_arg').lower() == 'true':
    self.logger.info('my_arg was true')
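Since the value in the listing above is a string, other argument types typically need an explicit conversion inside the script. A small sketch, assuming a hypothetical my_threshold argument whose value was defined in ARGUMENTS as a numeric string:

# Convert the string value of a (hypothetical) numeric argument
threshold = float(self.get_arg('my_threshold'))
if threshold > 0.5:
    self.logger.info('Using a strict threshold: {}'.format(threshold))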
Script ENVS
The ENVS variable provides meta information for the pipeline engine by defining a list of environments (similar to conda environments) in which this script may be executed. In this way you can assure that a script will only be executed in environments where all of its dependencies are installed. The environments are installed in workers that may execute your script. If multiple environments are defined within the ENVS list of a script, the pipeline engine will try to assign the script to a worker in the same order as defined within the ENVS list: if a worker is online that provides the first environment in the list, the pipeline engine will assign the script to this worker; if no worker with the first environment is online, it will try to assign the script to a worker with the second environment in the list, and so on. The listing below shows an example of an ENVS definition for a script that may be executed in two different environments.
ENVS = ['lost', 'lost-cv']
Script RESOURCES
Sometimes a script will require all resources of a worker. Therefore, no other script should be executed in parallel by the worker that executes your script. This is often the case if you train an AI model and you need all GPU memory to do this. In those cases, you can define a RESOURCES variable inside your python script and assign a list containing the string lock_all to it. See the listing below for an example:
RESOURCES = ['lock_all']
Debugging a Script
Most likely, when you import your pipeline and run it for the first time, some scripts will not work, since some tiny bug has sneaked into your code :-)
Inside the web GUI, all exceptions and errors of your script will be visualized when clicking on the respective script element in the pipeline visualization. This way, you get a first hint at what's wrong.
In order to debug your code you need to log in to the docker container and find the instance folder that is created for each script instance. Inside this folder, there is a bash script called debug.sh that needs to be executed in order to start the pudb debugger. You will find your script by its unique pipeline element id. The path to the script instance folder will be /home/lost/app/debug/i-<pipe_element_id>. You can find the ID when inspecting the pipeline in the LOST web GUI.
# Log in to docker
docker exec -it lost bash
# Change directory to the instance path of your script
cd /home/lost/app/debug/i-<pipe_element_id>
# Start debugging
bash debug.sh
If your script requires a special ENV to be executed, you need to log in to a container that has this environment installed for debugging.