Motivation behind this
Hugo's Elixir Radar newsletter recently featured this very useful article on batch updates with Ecto by Fabian Becker, which you should definitely read.
This reminded me of some cool stuff we used to do with Ecto while working at V7 in the early days of the Darwin product. We actually no longer do this now, because it's a much larger system, so it's all about caching rather than over-optimising, but back then, it fixed a bottleneck, got us out of some serious downtime, and improved performance by an order of magnitude.
We still do cool stuff, of course, but just different kinds of cool stuff.
Anyway, this will be a brief look at two of those cool things.
Using SQL to specify values, avoiding roundtrips during inserts
The UNNEST method Fabian uses in his article to pass in arrays of values for batch updates can also be used with inserts. In fact, you can Repo.insert_all a query that selects the relevant data, and it works great.
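To make that concrete, here's a minimal sketch of the insert variant. The Post schema, the title/body fields, and the text[] casts are illustrative assumptions, not something from Fabian's article or our codebase:

import Ecto.Query

titles = ["First post", "Second post"]
bodies = ["Hello", "World"]

# Build a query whose FROM clause is an UNNEST over the two arrays
# (illustrative fields and casts), then hand that query straight to
# insert_all as the source of rows.
source =
  from v in fragment(
         "SELECT * FROM unnest(?::text[], ?::text[]) AS v(title, body)",
         ^titles,
         ^bodies
       ),
       select: %{title: v.title, body: v.body}

Repo.insert_all(Post, source)

Everything happens in a single statement: Postgres expands the arrays into rows and inserts them directly, with no per-row round trips.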
Inserting from a query like this is really useful for aggregates, or for any sort of data transformation.
For a simple example, say you want to track the number of posts and comments on a daily basis. You will have a cron job or something similar.
Basic solution
post_count = Repo.aggregate(Post, :count, :id)
comment_count = Repo.aggregate(Comment, :count, :id)

Repo.insert_all(Metric, [
  %{post_count: post_count, comment_count: comment_count, date: Date.utc_today()}
])
This works great, but it requires n extra queries for n fields of the metric. Here, it’s just two, so that’s fine, but in reality, it will be more, and each could be expensive.
So the temptation is to try and offload that to the DB by doing just one complex query.
Advanced solution
source =
  Post
  |> join(:left, [p], c in Comment, on: true)
  |> select([p, c], %{
    post_count: count(p.id),
    comment_count: count(c.id),
    date: fragment("CURRENT_DATE")
  })

Repo.insert_all(Metric, source)
Ok, so now it’s just a single trip to the database, but I can almost guarantee in this basic example, it’s overall slower.
In other examples, it often WILL be faster, but not as fast as it could be.
Sidenote: Our Scenario at V7
In our scenario, we were recording a metric every 30 seconds and it was 7 fields, across 3 or 4 tables. So the join was more efficient than querying separately for each field, but it was still a very expensive join.
Eventually, we got to a point where it could not finish in 30 seconds, resulting in resource starvation and downtime.
Expert Solution
We solved it by making use of an insert feature most people aren't aware of: your source can be a list of maps, but the values for the map keys can themselves be queries, each returning a single value.
In the simple example
Repo.insert_all(Metric, [
  %{
    post_count: Post |> select([p], count(p.id)),
    comment_count: Comment |> select([c], count(c.id)),
    date: Date.utc_today()
  }
])
We've eliminated the join, we still make just one trip to the DB, and it's way faster.
Now, it’s not always going to be faster. It really all depends on how expensive your join is. But at some point, as the join gets expensive enough, this approach definitely wins out.
Master Solution
Of course, as your system becomes more complex, you just won’t be dealing with these kinds of optimisations. Instead, you’ll have some caching for these counts within Elixir, and simply insert a record using counts in the cache. You’ll probably also be recording these metrics in some sort of queue, so that if you get to a point where they get too expensive, you don’t end up in downtime.
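As a rough sketch only (the MetricCache module and its functions below are hypothetical, invented for illustration), the idea looks something like this: counters live in ETS, the write paths bump them, and the periodic job just reads them when inserting the metric.

# Hypothetical counter cache backed by ETS. Post/comment creation code
# would call MetricCache.increment/1; the cron job only reads the totals.
defmodule MetricCache do
  @table :metric_cache

  def init do
    :ets.new(@table, [:named_table, :public, write_concurrency: true])
    :ets.insert(@table, [{:post_count, 0}, {:comment_count, 0}])
  end

  def increment(key), do: :ets.update_counter(@table, key, 1)

  def get(key), do: :ets.lookup_element(@table, key, 2)
end

# The periodic job becomes a couple of cheap reads plus a single insert.
Repo.insert_all(Metric, [
  %{
    post_count: MetricCache.get(:post_count),
    comment_count: MetricCache.get(:comment_count),
    date: Date.utc_today()
  }
])

The expensive counting work never happens on the hot path, and the insert itself is trivial.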
Using placeholders in inserts
Let’s have a look at the expert solution one more time.
Repo.insert_all(Metric, [
  %{
    post_count: Post |> select([p], count(p.id)),
    comment_count: Comment |> select([c], count(c.id)),
    date: Date.utc_today()
  }
])
This query only inserts one record, and as part of that, it requires passing one argument, Date.utc_today(), from Elixir into Postgres. That's fine here, but what if we're in a scenario where we're inserting more records?
For example, say we're importing posts from a CSV; repeated values such as timestamps are a common case.
now = NaiveDateTime.utc_now()

post_data =
  file
  |> String.split("\n", trim: true)
  |> Enum.map(&String.split(&1, ","))
  |> Enum.map(fn [title, body] ->
    %{title: title, body: body, inserted_at: now, updated_at: now}
  end)

Repo.insert_all(Post, post_data)
This will work, but if there are 1000 entries, we are sending 2000 copies of the value now into the database. That's inefficient.
The Ecto team thought of this, though, and added support for placeholders.
now = NaiveDateTime.utc_now()
placeholders = %{now: now}

post_data =
  file
  |> String.split("\n", trim: true)
  |> Enum.map(&String.split(&1, ","))
  |> Enum.map(fn [title, body] ->
    %{
      title: title,
      body: body,
      inserted_at: {:placeholder, :now},
      updated_at: {:placeholder, :now}
    }
  end)

Repo.insert_all(Post, post_data, placeholders: placeholders)
With this approach, we get the same result, but only send a single copy of the value now.
Isn’t that awesome!?
TL;DR
- We propose UniVG-R1, a reasoning guided MLLM for universal visual grounding, which employs GRPO training combined with a cold-start initialization to effectively enhance reasoning capabilities across multimodal contexts.
- A high-quality CoT grounding dataset is introduced, encompassing diverse tasks, each meticulously annotated with detailed reasoning chains to facilitate advanced reasoning-based grounding.
- We identify a difficulty bias in GRPO training, and propose a difficulty-aware weight adjustment strategy. Experiments validate that GRPO equipped with this strategy consistently enhances model performance.
- Extensive experiments demonstrate that our model achieves state-of-the-art performance across multiple grounding benchmarks, showcasing its versatility and generalizability.
UniVG-R1 tackles a wide range of visual grounding tasks with complex and implicit instructions. By combining GRPO training with a cold-start initialization, it effectively reasons over instructions and visual inputs, significantly improving grounding performance. Our model achieves state-of-the-art results on MIG-Bench and exhibits superior zero-shot performance on four reasoning-guided grounding benchmarks with an average 23.4% improvement.
Abstract
Traditional visual grounding methods primarily focus on single-image scenarios with simple textual references. However, extending these methods to real-world scenarios that involve implicit and complex instructions, particularly in conjunction with multiple images, poses significant challenges, mainly due to the lack of advanced reasoning ability across diverse multimodal contexts. In this work, we aim to address the more practical universal grounding task, and propose UniVG-R1, a reasoning guided multimodal large language model (MLLM) for universal visual grounding, which enhances reasoning capabilities through reinforcement learning (RL) combined with cold-start data. Specifically, we first construct a high-quality Chain-of-Thought (CoT) grounding dataset, annotated with detailed reasoning chains, to guide the model towards correct reasoning paths via supervised fine-tuning. Subsequently, we perform rule-based reinforcement learning to encourage the model to identify correct reasoning chains, thereby incentivizing its reasoning capabilities. In addition, we identify a difficulty bias arising from the prevalence of easy samples as RL training progresses, and we propose a difficulty-aware weight adjustment strategy to further strengthen performance. Experimental results demonstrate the effectiveness of UniVG-R1, which achieves state-of-the-art performance on MIG-Bench with a 9.1% improvement over the previous method. Furthermore, our model exhibits strong generalizability, achieving an average improvement of 23.4% in zero-shot performance across four image and video reasoning grounding benchmarks.
Pipeline
We adopt a two-stage training process. The first stage employs CoT-SFT, with the training data construction shown in (a). The second stage utilizes GRPO equipped with a difficulty-aware weight adjustment strategy in (b). The GRPO training process is illustrated in (c), where the policy model generates multiple responses, and each is assigned a distinct reward.
Results
Difficulty-Aware Weight Adjustment Strategy
During the stage 2 reinforcement learning process, we observe that most samples progressively become easier for the model, with the proportion of easy samples increasing and the proportion of hard samples steadily decreasing. Since the GRPO algorithm normalizes rewards to calculate the relative advantage within each group, easy samples (e.g., mIoU = 0.8) receive the same policy-gradient update as hard samples (e.g., mIoU = 0.2). This leads to a difficulty-bias issue. In particular, during the later stages of training, as easy samples become predominant, most updates are derived from these easier instances, making it difficult for the model to focus on hard samples.
To address this problem, we propose a difficulty-aware weight adjustment strategy, which dynamically adjusts the weight of each sample based on its difficulty. Specifically, we introduce a difficulty coefficient \( \phi \propto -\mathrm{mIoU} \) to quantify the difficulty level of each sample, where \( \phi \) is negatively correlated with mIoU. This coefficient dynamically adjusts the sample weights based on the average accuracy reward of each sample's responses. The detailed formula is provided below.
\[
\mathcal{J}_{GRPO}(\theta) = \mathbb{E}_{q \sim P(Q),\, \{o_i\}_{i=1}^{G} \sim \pi_{\theta_{old}}(O|q)} \left[ \frac{1}{G}\sum_{i=1}^{G} {\color{blue} \phi(\mathrm{mIoU})}\, \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)} A_i - \beta\, \mathbb{D}_{KL}\!\left(\pi_{\theta}\,\|\,\pi_{ref}\right) \right]
\]
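Purely as an illustration (the exact form of \( \phi \) below is an illustrative assumption, not necessarily the function used in the paper), any mapping that shrinks the weight as a sample's average reward grows satisfies the stated requirement, for example:

\[
\phi(\mathrm{mIoU}) = 1 + \gamma \left( 1 - \frac{1}{G}\sum_{i=1}^{G} \mathrm{mIoU}_i \right), \qquad \gamma > 0,
\]

so a sample whose responses average mIoU = 0.8 receives a smaller update weight than one averaging mIoU = 0.2, shifting gradient mass toward the harder samples.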
Visualization
Acknowledgement
Our work is primarily based on Migician, VLM-R1, LLaMA-Factory, and lmms-eval. We are sincerely grateful for their excellent work.
BibTeX
`@article{bai2025univg,
title={UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning},
author={Bai, Sule and Li, Mingxing and Liu, Yong and Tang, Jing and Zhang, Haoji and Sun, Lei and Chu, Xiangxiang and Tang, Yansong},
journal={arXiv preprint arXiv:2505.14231},
year={2025}
}`