How to merge multiple Arrays containing the same value in Java

Question

I'm struggling to find the best way to merge arrays (or create new ones) by looking at their shared value.

List<String[]> dictionary = new ArrayList<String[]>();

this is my "dictionary" filled with arrays of 2 words, for example it contains arrays:

["A","B"]
["B","C"]
["D","E"]
["F","C"]
["G","H"]
["T","D"]

I need to merge them by values they share, so for example the finished "dictionary" (or completely new list) would look like this:

["A","B","C","F"];
["D","E","T"];
["G","H"];

Also, the old arrays don't have to be removed they can stay in "dictionary" but I need the merged ones and I have hard time figuring it out.

Arrays don't have to be sorted at anyhow.

This is what i have so far and it is not working

    public static void SynonymsMerge(List<String[]> dictionary){
    ArrayList<ArrayList<String>> newDictionary = new ArrayList<ArrayList<String>>();
    for(int i=0;i < dictionary.size(); i++){
        ArrayList<String> synonyms = new ArrayList<String>();
        for(int j=0; j < dictionary.get(i).length; j++){
            synonyms.add(dictionary.get(i)[j]);
        }
        newDictionary.add(synonyms);
    }
    for(int i=0;i< newDictionary.size();i++){
        for(int j=0; j < newDictionary.size();j++){
            for (int k=0; k < newDictionary.get(j).size() ;k++) {
                    if (newDictionary.get(i).equals(newDictionary.get(j)))
                        continue;
                    if (newDictionary.get(i).contains(newDictionary.get(j).get(k)))
                        newDictionary.get(i).addAll(newDictionary.get(j));

you will always have to create new arrays, as Java doesn't support to resize existing arrays. — cyberbrain
– cyberbrain, Commented Apr 25, 2022 at 16:39
...and therefore: is it possible to work just with lists, at least as an intermediate step? It would be so much easier to merge them. — cyberbrain
– cyberbrain, Commented Apr 25, 2022 at 16:47
as we know in java static array can not be modified , so instead of using List<String[]> use can use List<ArrayList<String>> dictionary = new ArrayList<>() it will give more flexibility — Vasim Hayat
– Vasim Hayat, Commented Apr 25, 2022 at 17:01
@Bahij.Mik "I'm struggling to find the best way to merge arrays" — dani-vta
– dani-vta, Commented Apr 25, 2022 at 17:18

Florian Hartung · Accepted Answer · 2022-04-25 19:23:56Z

2

First of all, here is the code. I changed the input type from List<String[]> to List<List<String>> as it does not really make sense to mix up both Lists and Arrays. This also applies to the output type.

The code

public static List<List<String>> merge(List<List<String>> dictionary) {
        List<List<String>> newDictionary = new ArrayList<>();

        for (List<String> stringPair : dictionary) {

            List<Integer> matchIndices = new ArrayList<>();
            for (int i = 0; i < newDictionary.size(); i++) {
                List<String> newStrings = newDictionary.get(i);

                for (String str : stringPair) {
                    if (newStrings.contains(str)) {
                        matchIndices.add(i);
                    }
                }
            }
            if (matchIndices.size() == 0) {
                newDictionary.addAll(new ArrayList<List<String>>(Collections.singleton(new ArrayList<>(stringPair))));
                continue;
            }

            matchIndices.sort(Integer::compareTo);

            if (matchIndices.size() == 1) {
                newDictionary.get(matchIndices.get(0)).addAll(new ArrayList<>(stringPair));
            } else {
                int last = matchIndices.remove(0);
                while (matchIndices.size() > 0) {
                    int i = matchIndices.get(0);
                    newDictionary.get(last).addAll(newDictionary.get(i));
                    newDictionary.remove(i);
                    matchIndices.remove(0);
                    matchIndices = new ArrayList<>(matchIndices.stream().map(a -> a - 1).toList());
                }
            }
        }
        newDictionary = newDictionary.stream()
                .map(strings -> strings.stream().distinct().toList())
                .toList();

        return newDictionary;
    }

How does it work?

dictionary the input of type List<List<String>> (inner List has max size of 2, even though the function would work with even more strings in theory)
newDictionary the output of the function of type List<List<String>>

The following code is executed for every input pair/List of strings in directory

Get all the existing different "groups" (their indicies) in newDictionary in which the strings from the par are already present. This List of indices is called matchIndices
Example: stringPair=["A","E"] newDictionary:[["I", "A", "O"], ["P", "D"]] would result in matchIndices=[0] because only "A" is present one time in the first element of newDictionary
If matchIndices.size() is 0, create a new group in newDictionary with the string pair. Back to 1.
If matchIndices.size() is 1, append the strings from the pair to the specific newDictionary group with the index specified in matchIndices. Back to 1.
If matchIndices.size() is greater than 1, that means that multiple groups from newDictionary with the indices specified in matchIndices will have to be merged together in the for-loop. Back to 1.

In the end we have to make sure there are no duplicates in the Lists in newDictionary.

Main method

    public static void main(String[] args) {
        List<List<String>> dictionary = new ArrayList<>(List.of(
                List.of("A", "B"),
                List.of("B", "C"),
                List.of("D", "E"),
                List.of("F", "C"),
                List.of("G", "H"),
                List.of("T", "D")));
        
        System.out.println(merge(dictionary));
    }

Why do we need step 4?

In your specific example we don't have to merge multiple groups.
But with input data like this

List<List<String>> dictionary = new ArrayList<>(List.of(
                List.of("A", "B"),
                List.of("B", "C"),
                List.of("D", "E"),
                List.of("F", "E"),
                List.of("E", "A")));

we eventually come to the point where newDictionary=[[A, B, B, C], [D, E, F, E]] and we have to try to insert [E, A]. Here both groups from newDictionary will have to be merged together.
This then results in the output of [[A, B, C, D, E, F]], where both groups are merged and removed duplicates.

P.s.

I am not really happy with this solution as it is not really clear what is actually going on, but I'm still posting this, because you said you would be happy with any solution. :)

edited Apr 25, 2022 at 19:23

answered Apr 25, 2022 at 19:08

Florian Hartung

1531 silver badge6 bronze badges

Sign up to request clarification or add additional context in comments.

9 Comments

Florian Hartung Over a year ago

Multiple occurrences of strings are only filtered out in the end, which results in this intermediate representation.

Florian Hartung Over a year ago

I'm pretty new to Stackoverflow and don't see why I would post the output, which OP already specified again. Also the output [[A, B, C, D, E, F]] is for the specific input case [[A, B], [B, C], [D, E], [F, E], [E, A]], which was not asked for, but I felt like I would have to include it to properly explain why step 4 was needed. Just to make sure. Is it common practise to post the output wanted by OP again?

kpcttobee Over a year ago

HI so i tried your solution and on first attempt it works fine but when i use it again it misses some words and i have no idea why. Is there any way we could chat ?

Alexander Ivanchenko Over a year ago

Your code requires an improvement both in terms of stile and performance. 1. It's not a very bright idea to cram all the logic into a single method. It'll be nicer to split it into several narrow-focused methods. 2. I have to think more carefully when you are choosing data structures (and their implementations, i.e. collections). Don't use ArrayList everywhere. The way you are using matchIndices clearly indicates that it has to be a stack and not a list (class Stack is legacy` don't use it, you need implementation of the Deque interface).

kpcttobee Over a year ago

With this values the word "wood" doesnt get included in the merged dictionary imgur.com/boVpOfb @FlorianHartung

|

Alexander Ivanchenko · Accepted Answer · 2022-04-25 19:03:59Z

For this problem you need to connect A with B (and vice versa), then C with B and C with F and so on. A lot of intersections and a lot of cycles.

And that is the perfect case to utilize a Graph as data structure.

To be precise, this dataset could be represented as a cyclic undirected disjointed graph, that contains several connected components. And in terms of graph theory, this task boils down to discovering all components in the graph.

For that, we need to take the following steps:

Create and initialize the graph by parsing the input data.
Iterate over the collection of vertices of the graph, and traverse every encountered connected component. Each vertex that wasn't seen previously indicates that a component to which it belongs also hasn't been yet discovered. As traversal algorithm I've chosen Depth first search (but for the purpose of this task Breadth first search algorithm will also work fine).

Implementation:

public class Graph {
    private Map<String, Vertex> vertexByName = new HashMap<>();
    
    private Graph() {} // no way and no need to invoke this constractor outside the class
    
    public static Graph getInstance(List<List<String>> dictionary) { // method responsible for instantiation and initialization of the graph
        Graph graph = new Graph();
        graph.init(dictionary);
        return graph;
    }
    
    private void init(List<List<String>> dictionary) {
        for (List<String> list: dictionary) {
            for (String name: list) {
                addVertex(name, list);
            }
        }
    }
    
    private void addVertex(String name, List<String> neighbours) {
        
        Vertex cur = vertexByName.computeIfAbsent(name, Vertex::new);
    
        for (String neighbourName: neighbours) {
            if (neighbourName.equals(name)) continue;
        
            Vertex neighbour = vertexByName.computeIfAbsent(neighbourName, Vertex::new);
            cur.addNeighbour(neighbour);
            neighbour.addNeighbour(cur); // this graph is undirectional, i.e. both vertices in each connected pair should hold a reference to one another
        }
    }
    
    public List<List<String>> getComponents() {
        List<List<String>> components = new ArrayList<>();
        
        Set<Vertex> seen = new HashSet<>();
        for (Vertex vertex: vertexByName.values()) {
            if (seen.contains(vertex)) continue;
            
            components.add(getComponentNames(vertex, seen));
        }
        return components;
    }

    // Depth first search implementation
    private List<String> getComponentNames(Vertex vertex, Set<Vertex> seen) {
        Deque<Vertex> stack = new ArrayDeque<>();
        List<String> names = new ArrayList<>();
        stack.push(vertex);
        seen.add(vertex);
        
        while(!stack.isEmpty()) {
            Vertex current = stack.pop();
            names.add(current.getName());
            
            for (Vertex neighbour: current.getNeighbours()) {
                if (seen.contains(neighbour)) continue;
                
                seen.add(neighbour);
                stack.push(neighbour);
            }
        }
        return names;
    }
    
    private class Vertex {
        private String name;
        private Set<Vertex> neighbours = new HashSet<>();
        
        public Vertex(String name) {
            this.name = name;
        }
    
        public Vertex(String name) {
            this.name = name;
        }
        
        public boolean addNeighbour(Vertex neighbour) {
            return neighbours.add(neighbour);
        }
    
        public String getName() {
            return name;
        }
    
        public Set<Vertex> getNeighbours() {
            return neighbours;
        }
    }
}

main() - demo

public static void main(String[] args) {
    List<List<String>> dictionary =
        List.of(List.of("A","B"), List.of("B","C"),
                List.of("D","E"), List.of("F","C"),
                List.of("G","H"), List.of("T","D"));
    
    Graph graph = Graph.getInstance(dictionary);
    List<List<String>> componentNames = graph.getComponents();

    System.out.println(componentNames);
}

Output

[[A, B, C, F], [T, D, E], [G, H]]

displayName · Accepted Answer · 2022-04-25 19:50:18Z

Un-refined first thought - Index the arrays, and then merge them. Repeat.

Iterate over your arrays in the ArrayList;
Index the items in the array;
Merge the items that are overlapping;
Repeat the same process over the merged result, until there is no overlap.

Using your example:

[A, B] (Call this #1 array)
[B, C] (Call this #2 array)
[D, E] (Call this #3 array)
[F, C] (Call this #4 array)
[G, H] (Call this #5 array)
[T, D] (Call this #6 array)

Now, prepare the index like:

A -> 1   (because A occurs only in array 1)
B -> 1,2 (because B occurs in array 1 and 2)
C -> 2,4 ...
D -> 3,6 ...
E -> 3   ...
F -> 4   ...
G -> 5   ...
T -> 6   ...

Looking at the above index, we know that we should merge 1 and 2, 2 and 4, and, 3 and 6. This will give us:

[A, B, C] (This is our new #1)
[B, C, F] (This is our new #2)
[D, E, T] (This is our new #3)
[G, H]    (This is our new #4)

Repeat the steps with your new ArrayList of arrays. Re-indexing gives...

A -> 1
B -> 1,2
C -> 1,2
D -> 3
E -> 3
F -> 2
G -> 4
H -> 4
T -> 3

Merge the overlaps again. This time only 1 and 2 overlap. Merging it will give you:

[A, B, C, F] (This is our new #1)
[D, E, T]    (This is our new #2)
[G, H]       (This is our new #3)

Once again, re-index,

A -> 1
B -> 1
C -> 1
D -> 2
E -> 3
F -> 1
G -> 3
H -> 3
T -> 2

Since there are no overlapping arrays this time, there is nothing more to merge and that's the final answer.

Bahij.Mik · Accepted Answer · 2022-04-25 20:35:07Z

Adding a solution using Union find, so the goal here is to iterate over all the Strings and while doing so finding the common "leader".

After doing that we will iterate over the dictionary again but this time each String will have a leader, we will bind them to a common leader and then create the merged dictionary

public class UnionFind
{
    private Map<String, String> graph;

    public UnionFind()
    {
        graph = new HashMap<>();
    }

    public String find(String str)
    {
        if (str == null) throw new IllegalArgumentException("Invalid String");

        if (graph.getOrDefault(str, str).equals(str))
            graph.put(str, str);
        else 
            graph.put(str, find(graph.get(str)));

        return graph.get(str);
    }

    public void union(String str1, String str2)
    {
        String root1 = find(str1);
        String root2 = find(str2);


        if (!root1.equals(root2))
        {
            if (root1.equals(str1)) graph.put(graph.get(root1), graph.get(root2));
            else graph.put(graph.get(root2), graph.get(root1));
        }
    }

    public static void main(String[] args)
    {
        List<List<String>> dictionary = prepareDictionary();

        UnionFind unionFind = new UnionFind();

        for (List<String> list : dictionary)
        {
            for (int i = 1; i < list.size(); i++)
            {
                unionFind.union(list.get(i - 1), list.get(i));
            }
        }

        Map<String, Set<String>> map = new HashMap<>();

        for (List<String> list : dictionary)
        {
            for (String str : list)
            {
                String parent = unionFind.find(str);
                if (!map.containsKey(parent))
                    map.put(parent, new LinkedHashSet<>());

                map.get(parent).add(str);
            }
        }

        List<List<String>> result = new ArrayList<>();
        for(Map.Entry<String, Set<String>> entry : map.entrySet())
        {
            result.add(new ArrayList<>(entry.getValue()));
        }

        System.out.println(result);
    }

    private static List<List<String>> prepareDictionary()
    {
        List<List<String>> dictionary = new ArrayList<>();

        dictionary.add(Arrays.asList("A", "B"));
        dictionary.add(Arrays.asList("B", "C"));
        dictionary.add(Arrays.asList("D", "E"));
        dictionary.add(Arrays.asList("F", "C"));
        dictionary.add(Arrays.asList("G", "H"));
        dictionary.add(Arrays.asList("T", "D"));

        return dictionary;
    }

result:

[[A, B, C, F], [D, E, T], [G, H]]

iroli · Accepted Answer · 2022-04-25 22:30:26Z

Here another solution with set of strings for your structure

public void merge(List<String[]> dictionary) {
    List<Set<String>> dictionaryList = dictionary.stream()
            .map(x -> new HashSet<> (Arrays.asList(x))).collect(Collectors.toList());

    for (int i = 0; i < dictionaryList.size() ; i++){
        Set list = dictionaryList.get(i);

        for (int j = i + 1; j < dictionaryList.size() ; j++){
            Set otherList = dictionaryList.get(j);
            Set result = (Set) list.stream().filter(otherList::contains).collect(Collectors.toSet());

            if (!result.isEmpty()) {
                list.addAll(otherList);
                dictionaryList.remove(j);
            }
        }
    }
    System.out.println(dictionaryList);
}

Result

[[A, B, C, F], [D, T, E], [G, H]]

kpcttobee · Accepted Answer · 2022-04-26 00:51:01Z

0

All answers were great thank you! However i couldn't use them because maybe i will need to explain the code and i was not understanding them fully (firts time coding in java yaaay!). What i did in the end was: i saved all the input into list of hashsets because hashsets cant duplicate values, then i ran all the input through the 2 for(nested) loops with if statment

if(return !Collections.disjoint(hashset1,hashset2);)

after that i merged them using set.addAll(hashset1); set.addAll(hashset2);

however it was still not complete and there were a few sets that should have been merged but weren't. so i ran it through the 2for(nested) loops again with same if statement and it worked(i hope). It worked on input with 2000words and i hope it works on input with more words :D Thank everybody for being so helpful.

answered Apr 26, 2022 at 0:51

kpcttobee

676 bronze badges

2 Comments

Community Over a year ago

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.

Basil Bourque Over a year ago

Please respect your readers enough to use proper case and punctuation. This site is meant to be more like Wikipedia, and less like a casual chat room.

Collectives™ on Stack Overflow

How to merge multiple Arrays containing the same value in Java

6 Answers 6

The code

How does it work?

Main method

Why do we need step 4?

P.s.

9 Comments

Comments

Comments

Comments

Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

6 Answers 6

The code

How does it work?

Main method

Why do we need step 4?

P.s.

9 Comments

Comments

Comments

Comments

Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related