I have a dataset in JSON format with ‘id’ and ‘text’ columns. Currently, I’m using the following pipeline configuration in AWS:

from sagemaker.huggingface import HuggingFaceModel

# role and output_s3_path are defined elsewhere
hub = {
    'HF_MODEL_ID': 'distilbert-base-uncased',
    'HF_TASK': 'feature-extraction'
}
# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,                      # configuration for loading model from Hub
    role=role,                    # IAM role with permissions to create an endpoint
    transformers_version="4.26",  # Transformers version used
    pytorch_version="1.13",       # PyTorch version used
    py_version='py39',            # Python version used
)
# create Transformer to run our batch job
batch_job = huggingface_model.transformer(
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path=output_s3_path,   # we are using the same s3 path to save the output with the input
    strategy='SingleRecord',
)

I’m using a batch transform job to generate the output, which currently contains only the extracted text. However, I also want to include the ‘id’ associated with each text in the output file. Is there a way to achieve this, and if so, how can I modify my configuration to include the ‘id’ in the output file? Any guidance or examples would be greatly appreciated!

1 Answer

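Yes, you can associate each input record with its output in a Batch Transform job. See the "Example: Output Inferences Joined with Input Data" section of the SageMaker documentation: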

https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform-data-processing.html#batch-transform-data-processing-workflow:~:text=%3A%20%22%24%22%0A%20%20%20%20%7D%0A%7D-,Example%3A%20Output%20Inferences%20Joined%20with%20Input%20Data,-If%20you%27re%20using

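In your .transform() call, set join_source="Input" to join each input record with its prediction. input_filter selects which part of the input record is passed to the model, and output_filter selects which part of the joined record is written to the output file. With "$" for both filters, the whole record is used in each case:

sm_transformer.transform(…, input_filter="$", join_source="Input", output_filter="$")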
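
To make the effect of the join concrete, here is a small local sketch that does not call SageMaker at all. It assumes JSON input, where the joined record is the input object with the prediction added under a SageMakerOutput key. The helper simulate_join and the sample record and embedding are hypothetical and only illustrate the shape of the output.

```python
import json

# Arguments taken from the answer; the elided ones ("…") are left out.
join_kwargs = {
    "input_filter": "$",     # pass the whole input record to the model
    "join_source": "Input",  # attach each prediction to its input record
    "output_filter": "$",    # keep the whole joined record in the output
}


def simulate_join(input_line, prediction):
    """Hypothetical local helper (not part of the SageMaker SDK).

    Mimics the joined record for JSON input when join_source="Input":
    the input object plus the prediction under "SageMakerOutput".
    """
    record = json.loads(input_line)
    record["SageMakerOutput"] = prediction
    return json.dumps(record)


# Made-up input record and a made-up, shortened embedding.
line = '{"id": 7, "text": "hello world"}'
fake_embedding = [[0.1, 0.2]]

joined = json.loads(simulate_join(line, fake_embedding))
print(joined)
```

In the real job, if you only want the id next to the prediction, a narrower output filter such as "$['id','SageMakerOutput']" should drop the text field. Treat that as something to verify against the linked documentation for your content type.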
