WeSearch

arrayJoin in ClickHouse: Why Your Rows Are Duplicating (and How to Control It)

·2 min read · 0 reactions · 0 comments · 1 view
arrayJoin in ClickHouse: Why Your Rows Are Duplicating (and How to Control It)

When working with arrays in ClickHouse, arrayJoin feels straightforward. Until your query suddenly...

Original article
DEV Community
Read full at DEV Community →
Full article excerpt tap to expand

try { if(localStorage) { let currentUser = localStorage.getItem('current_user'); if (currentUser) { currentUser = JSON.parse(currentUser); if (currentUser.id === 3228500) { document.getElementById('article-show-container').classList.add('current-user-is-article-author'); } } } } catch (e) { console.error(e); } Mohamed Hussain S Posted on Apr 28 arrayJoin in ClickHouse: Why Your Rows Are Duplicating (and How to Control It) #clickhouse #sql #database #performance When working with arrays in ClickHouse, arrayJoin feels straightforward. Until your query suddenly returns far more rows than expected. The Use Case Let’s say you have a table like this: CREATE TABLE events ( user_id UInt32, actions Array(String) ) ENGINE = MergeTree ORDER BY user_id; Enter fullscreen mode Exit fullscreen mode Example row: user_id: 1 actions: ['click', 'scroll', 'purchase'] Enter fullscreen mode Exit fullscreen mode Now you want each action as a separate row. The Tool: arrayJoin SELECT user_id, arrayJoin(actions) AS action FROM events; Enter fullscreen mode Exit fullscreen mode Output: 1 click 1 scroll 1 purchase Enter fullscreen mode Exit fullscreen mode So far, everything looks correct. Where Things Go Wrong Now let’s say you write: SELECT user_id, arrayJoin(actions) AS action, arrayJoin(actions) AS action2 FROM events; Enter fullscreen mode Exit fullscreen mode You might expect: 3 rows But you actually get: 9 rows Why This Happens arrayJoin doesn’t just flatten arrays. It expands rows. Each element in the array creates a new row. So when you use it multiple times: First arrayJoin → expands rows Second arrayJoin → expands again Result: 3 elements → 3 × 3 = 9 rows This is effectively a cartesian multiplication of rows. The Hidden Impact This becomes a real problem when: Arrays are large Multiple arrayJoins are used You don’t expect row multiplication Result: Incorrect output Sudden increase in row count Slower queries The Better Approach 1. Use a single arrayJoin when possible SELECT user_id, arrayJoin(actions) AS action FROM events; Enter fullscreen mode Exit fullscreen mode 2. Use ARRAY JOIN syntax (cleaner and explicit) SELECT user_id, action FROM events ARRAY JOIN actions AS action; Enter fullscreen mode Exit fullscreen mode 3. Use arrayZip to avoid unintended multiplication If you’re working with multiple arrays: SELECT user_id, arrayJoin(arrayZip(actions, actions)) AS zipped FROM events; Enter fullscreen mode Exit fullscreen mode This ensures elements are paired instead of multiplied. Why This Matters arrayJoin is powerful-but easy to misuse. If used without understanding: Row count can explode Queries become expensive Results can be misleading Real-World Use Cases Event tracking pipelines Flattening nested JSON Working with semi-structured logs Exploding arrays into rows for analysis One Important Gotcha Every arrayJoin multiplies rows. If your result size looks unexpectedly large, this is one of the first things to check. Final Thoughts arrayJoin is one of the most useful tools in ClickHouse. But its behavior is not always intuitive. In many cases, the issue is not the data itself-but how the query expands it. Understanding this early can save a lot of debugging time. Top comments (0) Subscribe Personal Trusted User Create template Templates let you quickly answer FAQs or store snippets for re-use. Submit Preview Dismiss Code of Conduct • Report abuse Are you sure you want to hide this comment? It will become hidden in your post, but will still be…

This excerpt is published under fair use for community discussion. Read the full article at DEV Community.

Anonymous · no account needed
Share 𝕏 Facebook Reddit LinkedIn Email

Discussion

0 comments

More from DEV Community