ray.tune: Tracked actor is not managed

Asked 7 months ago

Modified 7 months ago

Viewed 120 times

import numpy as np  # needed by the hidden_dim2 sampler in search_space
from ray import tune
from ray.tune.search.optuna import OptunaSearch
from ray.tune.integration.pytorch_lightning import TuneReportCheckpointCallback
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping
from data_module import DataModule
from fcnn_regressor import FCNNRegressor
from typing import Any

# X_train, X_test, y_train, y_test are torch.Tensor objects defined earlier (not shown)
dm = DataModule(X_train, X_test, y_train, y_test, batch_size=2028)

def train_model(config: dict[str, Any]) -> None:
    model = FCNNRegressor(
        input_dim=X_train.shape[1],
        hidden_dim1=config["hidden_dim1"],
        hidden_dim2=config["hidden_dim2"],
        dropout_prob=config["dropout_prob"],
        lr=config["lr"],
        weight_decay=config["weight_decay"],
    )

    trainer = Trainer(
        max_epochs=100,
        callbacks=[
            EarlyStopping(monitor="val_loss", patience=5, mode="min"),
            TuneReportCheckpointCallback({"val_loss": "val_loss"},
                                         on="validation_end")
        ],
    )
    trainer.fit(model, datamodule=dm)

search_space = {
    "hidden_dim1": tune.randint(64, 257),
    "hidden_dim2": tune.sample_from(
        lambda cfg: np.random.randint(16, cfg["hidden_dim1"] // 2 + 1)),
    "dropout_prob": tune.uniform(0.1, 0.4),
    "lr": tune.loguniform(1e-5, 1e-2),
    "weight_decay": tune.loguniform(1e-6, 1e-2),
}

optuna = OptunaSearch(metric="val_loss", mode="min")

tuner = tune.Tuner(
    train_model,
    param_space=search_space,
    tune_config=tune.TuneConfig(num_samples=20, search_alg=optuna),
)

results = tuner.fit()
print(results.get_best_result("val_loss", "min").metrics)

Why do I get the following error?

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/pyenvs/base/lib/python3.12/site-packages/ray/tune/tune.py:994, in run(run_or_experiment, name, metric, mode, stop, time_budget_s, config, resources_per_trial, num_samples, storage_path, storage_filesystem, search_alg, scheduler, checkpoint_config, verbose, progress_reporter, log_to_file, trial_name_creator, trial_dirname_creator, sync_config, export_formats, max_failures, fail_fast, restore, resume, resume_config, reuse_actors, raise_on_failed_trial, callbacks, max_concurrent_trials, keep_checkpoints_num, checkpoint_score_attr, checkpoint_freq, checkpoint_at_end, chdir_to_trial_dir, local_dir, _remote, _remote_string_queue, _entrypoint)
    993 while not runner.is_finished() and not experiment_interrupted_event.is_set():
--> 994     runner.step()
    995     if has_verbosity(Verbosity.V1_EXPERIMENT):

File ~/pyenvs/base/lib/python3.12/site-packages/ray/tune/execution/tune_controller.py:685, in TuneController.step(self)
    684 # Handle one event
--> 685 if not self._actor_manager.next(timeout=0.1):
    686     # If there are no actors running, warn about potentially
    687     # insufficient resources
    688     if not self._actor_manager.num_live_actors:

File ~/pyenvs/base/lib/python3.12/site-packages/ray/air/execution/_internal/actor_manager.py:225, in RayActorManager.next(self, timeout)
    224 else:
--> 225     self._handle_ready_resource_future()
    226     # Ready resource futures don't count as one event as they don't trigger
    227     # any callbacks. So we repeat until we hit anything that is not a resource
    228     # future.

File ~/pyenvs/base/lib/python3.12/site-packages/ray/air/execution/_internal/actor_manager.py:310, in RayActorManager._handle_ready_resource_future(self)
    309 # We handle resource futures one by one, so only try to start 1 actor at a time
--> 310 self._try_start_actors(max_actors=1)
...
    732     self._pending_actors_to_enqueued_actor_tasks[tracked_actor].append(
    733         (tracked_actor_task, method_name, args, kwargs)
    734     )

ValueError: Tracked actor is not managed by this event manager: <TrackedActor 34183756754259754017506083767126328506>
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...

If necessary, you can check the code of the modules:

fcnn_regressor - https://pastebin.com/NGqCVNji
data_module - https://pastebin.com/9x167drN

I tried two workarounds: using ray.put, and moving the DataModule object into the train_model function. The first led to errors such as "Invalid type of object refs, <class 'numpy.ndarray'>, is given" or "Invalid type of object refs, <class 'torch.Tensor'>, is given". The second didn't resolve the issue.

The data is a torch.Tensor of size [463715, 90] (numerical columns only).
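
For reference, the ranges implied by the two dependent integer parameters in the search space above can be checked without Ray. This is only a minimal sketch: it assumes that tune.randint and np.random.randint both exclude the upper bound (as their documentation states), it swaps in the standard-library random module for both, and sample_config is a hypothetical helper name that does not appear in the original code.

```python
import random
from typing import Any


def sample_config(rng: random.Random) -> dict[str, Any]:
    """Hypothetical stand-in for the hidden_dim1 / hidden_dim2 samplers."""
    # Mirrors tune.randint(64, 257): 64 is inclusive, 257 is exclusive.
    hidden_dim1 = rng.randrange(64, 257)
    # Mirrors np.random.randint(16, hidden_dim1 // 2 + 1): upper bound exclusive,
    # so hidden_dim2 never exceeds half of hidden_dim1.
    hidden_dim2 = rng.randrange(16, hidden_dim1 // 2 + 1)
    return {"hidden_dim1": hidden_dim1, "hidden_dim2": hidden_dim2}


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_config(rng) for _ in range(1000)]

# Print the observed ranges of both parameters.
print(min(s["hidden_dim1"] for s in samples), max(s["hidden_dim1"] for s in samples))
print(min(s["hidden_dim2"] for s in samples), max(s["hidden_dim2"] for s in samples))
```

Under these assumptions, hidden_dim1 stays within 64 to 256 and hidden_dim2 within 16 to hidden_dim1 // 2, so the lower bound of 16 is always reachable even for the smallest hidden_dim1.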

edited May 2 at 10:45

asked May 2 at 8:40

blnk.off

Without example data it is hard to test this and see what could be changed. Maybe you should put the code with the data on GitHub instead of Pastebin.
– furas, 2025-05-02 10:02:32 +00:00

I added a sentence about the data.
– blnk.off, 2025-05-02 10:47:04 +00:00
