
I get incorrect results when sending a wildcarded query from my Java project. In my MarkLogic database I have saved multiple JSON files with the same structure. I want to retrieve the JSON documents whose field named "icsList" (a list of strings) contains an entry that starts with a given string.

Example icsList values in the JSON documents look like:

  • "icsList": ["11.040.40", "12.50.80"]
  • "icsList": ["12.50.60"]
  • "icsList": ["50.010.10"]

My example results:

  • Request "50": returns all JSON documents (even those that do not have "50" in their icsList).
  • Request "50.": returns the documents that have "50." anywhere inside icsList (for example "50.010.10" and "50.010.20", but also "12.50.60").

As I mentioned earlier, my primary goal is to get all JSON documents whose "icsList" field has an entry that STARTS with the given string. My second goal is to get rid of the need for a trailing dot at the end of the request string.
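To make the target semantics concrete, here is a small plain-Java sketch of the "starts with" test applied to the three example icsList values. It runs locally, not in MarkLogic, and the IcsPrefixFilter class and matchesPrefix method are hypothetical names used only to pin down the expected results; it is not a replacement for the database query.

```java
import java.util.List;

public class IcsPrefixFilter {
    // Desired semantics: a document matches when at least one of its
    // icsList entries starts with the requested prefix.
    static boolean matchesPrefix(List<String> icsList, String prefix) {
        return icsList.stream().anyMatch(code -> code.startsWith(prefix));
    }

    public static void main(String[] args) {
        // The three example icsList values from the question
        List<List<String>> docs = List.of(
                List.of("11.040.40", "12.50.80"),
                List.of("12.50.60"),
                List.of("50.010.10"));
        for (List<String> icsList : docs) {
            // prints false, false, true (in that order)
            System.out.println(icsList + " -> " + matchesPrefix(icsList, "50"));
        }
    }
}
```

With this predicate only ["50.010.10"] matches, whether the prefix is "50" or "50.". A bare "50" would also accept a code such as "500.1" if one existed, which is worth keeping in mind when dropping the trailing dot.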

My code is:

StructuredQueryBuilder sqb = new StructuredQueryBuilder();
String[] wordOptions = {"wildcarded"};
// Wildcarded word query on the icsList JSON property, e.g. "50*"
StructuredQueryDefinition queryDefinitionIcs = sqb.word(sqb.jsonProperty("icsList"),
        null, wordOptions, 1, searchText + "*");

StructuredQueryDefinition query = sqb.and(queryDefinitionIcs);
query.setCollections(DocumentCollection.ATTRIBUTES.getName());

// Fetch the first page of matching documents (start position 1)
try (DocumentPage search = jsonDocumentManager.search(query, 1L)) {
    JacksonHandle handle = new JacksonHandle();
    List<DocumentAttributes> documentAttributes = StreamSupport.stream(search.spliterator(), false)
            .map(v -> mapToDocumentAttributes(handle, v))
            .toList();
}

// Converts each returned JSON document into a DocumentAttributes object
private DocumentAttributes mapToDocumentAttributes(JacksonHandle handle, DocumentRecord v) {
    try {
        return objectMapper.treeToValue(v.getContent(handle).get(), DocumentAttributes.class);
    } catch (JsonProcessingException e) {
        throw new RuntimeException(e);
    }
}

In pom.xml I have:

<dependency>
    <groupId>com.marklogic</groupId>
    <artifactId>marklogic-client-api</artifactId>
    <version>5.5.3</version>
</dependency>

Answer


If you enable Two Character Searches on the database, then you could search for "50*" instead of "50.*", but that could dramatically affect the size of your indexes and ingestion performance, so that may not be advisable.

You might need to enable Three Character Searches or Trailing Wildcard Searches on your database in order to be able to search efficiently with such a short wildcarded value as "50.*" or "50.* *".

https://docs.marklogic.com/guide/search-dev/wildcard#id_39731

If you used value() to construct a cts:json-property-value-query() instead of a word query, and included the "." in the wildcarded value, then it would find only the last document, whose value starts with "50.".

For example, this search:

cts:search(doc(), cts:json-property-value-query("icsList", "50.*"))

or:

cts:search(doc(), cts:json-property-value-query("icsList", "50* *"))

Note that the text content for the value in a cts:json-property-value-query is treated the same as a phrase in a cts:word-query, where the phrase is the property value. Therefore, any wildcard and/or stemming rules are treated like a phrase. For example, if you have a property value of "hello friend" with wildcarding enabled for a query, a cts:json-property-value-query for "he*" will not match because the wildcard matches do not span word boundaries, but a cts:json-property-value-query for "hello *" will match. A search for "*" will match, because a "*" wildcard by itself is defined to match the value. Similarly, stemming rules are applied to each term, so a search for "hello friends" would match when stemming is enabled for the query because "friends" matches "friend".

StructuredQueryBuilder sqb = new StructuredQueryBuilder();
String[] options = {"wildcarded"};
StructuredQueryDefinition queryDefinitionIcs = sqb.value(sqb.jsonProperty("icsList"),
    null, options, 1, searchText + "*");
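The phrase rule quoted above (each wildcard matches within a single word, and the value as a whole is treated as a phrase) can be illustrated with a deliberately simplified, hypothetical model in plain Java. PhraseWildcardModel is an illustrative name, not part of any MarkLogic API. The model splits values only on whitespace and does not model case sensitivity, stemming, or MarkLogic's real tokenization of punctuation, so it only reproduces the "hello friend" examples and is not how MarkLogic evaluates queries.

```java
import java.util.regex.Pattern;

public class PhraseWildcardModel {
    // Match one pattern word against one value word; '*' = any run of
    // characters and '?' = one character, both confined to that word.
    static boolean tokenMatches(String patternToken, String valueToken) {
        StringBuilder regex = new StringBuilder();
        for (char c : patternToken.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return valueToken.matches(regex.toString());
    }

    // Treat the whole value as a phrase: same number of words, and each
    // pattern word must match the value word in the same position.
    static boolean valueMatches(String pattern, String value) {
        if (pattern.trim().equals("*")) {
            return true; // a lone "*" is defined to match any value
        }
        String[] p = pattern.trim().split("\\s+");
        String[] v = value.trim().split("\\s+");
        if (p.length != v.length) {
            return false; // wildcards never span word boundaries
        }
        for (int i = 0; i < p.length; i++) {
            if (!tokenMatches(p[i], v[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(valueMatches("he*", "hello friend"));     // false
        System.out.println(valueMatches("hello *", "hello friend")); // true
        System.out.println(valueMatches("*", "hello friend"));       // true
    }
}
```

This is why a pattern such as "he*" fails against a two-word value while "hello *" succeeds: the pattern must account for every word of the value.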

An alternative to making database-wide changes would be to create a field for icsList with the index settings needed to support a two-character wildcard search on just that field:

  • field value searches
  • trailing wildcard searches
  • two character searches

Then you could search the field with trailing wildcard:

cts:search(doc(), cts:field-value-query("icsList", "50* *"))

Search against the field instead of a jsonProperty:

StructuredQueryDefinition queryDefinitionIcs = sqb.value(sqb.field("icsList"),
        null, options, 1, searchText + "* *");