Explore

These are public items saved by our community.

The latest on everything AppSignal | AppSignal Blog
Structs and Embedded Schemas in Elixir: Beyond Maps | AppSignal Blog
Polymorphism in Elixir | Joseph Koski’s Blog
Polymorphism and Behavior Injection | Joseph Koski’s Blog
Joseph Koski’s Blog | Blog by Joseph Koski, author of Advanced Functional Programming with Elixir (PragProg) and creator of the Funx library.
Blog | Bego.dev

Motivation behind this

Hugo's Elixir Radar newsletter recently featured a very useful article on batch updates with Ecto by Fabian Becker, which you should definitely read.

This reminded me of some cool stuff we used to do with Ecto while working at V7 in the early days of the Darwin product. We no longer do this, because it's now a much larger system where the focus is on caching rather than over-optimising queries. Back then, though, it fixed a bottleneck, got us out of some serious downtime and improved performance by an order of magnitude.

We still do cool stuff, of course, but just different kinds of cool stuff.

Anyway, this will be a brief write-up of two of these cool things.

Using SQL to specify values, avoiding roundtrips during inserts

The UNNEST method Fabian uses in his article to pass in arrays of values for batch updates can also be used with inserts. In fact, you can pass Repo.insert_all a query that selects the relevant data, and it works great.

This is especially useful for aggregates, or for any other sort of data transformation.

For a simple example, say you want to track the number of posts and comments on a daily basis. You will have a cron job or something similar.

Basic solution

    # One query per metric field, then one more for the insert.
    post_count = Repo.aggregate(Post, :count, :id)
    comment_count = Repo.aggregate(Comment, :count, :id)

    Repo.insert_all(Metric, [
      %{
        post_count: post_count,
        comment_count: comment_count,
        date: Date.utc_today()
      }
    ])

This works great, but it requires n extra queries for n fields of the metric. Here, it’s just two, so that’s fine, but in reality, it will be more, and each could be expensive.

So the temptation is to try to offload that to the db too, by doing just one complex query.

Advanced solution

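    import Ecto.Query

    # Joining on `true` pairs every post with every comment,
    # so each count has to be distinct to stay correct.
    source =
      Post
      |> join(:left, [p], c in Comment, on: true)
      |> select([p, c], %{
        post_count: count(p.id, :distinct),
        comment_count: count(c.id, :distinct),
        date: fragment("CURRENT_DATE")
      })

    Repo.insert_all(Metric, source)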

Ok, so now it’s just a single trip to the database, but I can almost guarantee in this basic example, it’s overall slower.

In other examples, it often WILL be faster, but not as fast as it could be.

Sidenote: Our Scenario at V7

In our scenario, we were recording a metric every 30 seconds and it was 7 fields, across 3 or 4 tables. So the join was more efficient than querying separately for each field, but it was still a very expensive join.

Eventually, we got to a point where it could not finish in 30 seconds, resulting in resource starvation and downtime.

Expert Solution

We solved it by making use of an insert feature most people aren't aware of. Your source can be a list of maps, and the value for any map key can be a query that returns a single value.

In the simple example

    import Ecto.Query

    Repo.insert_all(Metric, [
      %{
        # Each query value must return a single row with a single column.
        post_count: Post |> select([p], count(p.id)),
        comment_count: Comment |> select([c], count(c.id)),
        date: Date.utc_today()
      }
    ])

We’ve eliminated a join, we still do just one trip to the db and it’s way faster.

Now, it’s not always going to be faster. It really all depends on how expensive your join is. But at some point, as the join gets expensive enough, this approach definitely wins out.

Master Solution

Of course, as your system becomes more complex, you just won’t be dealing with these kinds of optimisations. Instead, you’ll have some caching for these counts within Elixir, and simply insert a record using counts in the cache. You’ll probably also be recording these metrics in some sort of queue, so that if you get to a point where they get too expensive, you don’t end up in downtime.

Using placeholders in inserts

Let’s have a look at the expert solution one more time.

    Repo.insert_all(Metric, [
      %{
        post_count: Post |> select([p], count(p.id)),
        comment_count: Comment |> select([c], count(c.id)),
        date: Date.utc_today()
      }
    ])

This query only inserts one record, and as part of that, it requires passing in one argument, Date.utc_today(), from Elixir into Postgres. That's fine here, but what if we're in a scenario where we're inserting more records?

For example, say we're importing posts from a CSV file and need to set timestamps on every row.

    # `file` is assumed to hold the CSV contents as a single string.
    # Truncated to seconds, assuming the timestamps are :naive_datetime
    # fields, which reject microsecond precision.
    now = NaiveDateTime.utc_now() |> NaiveDateTime.truncate(:second)

    post_data =
      file
      |> String.split("\n", trim: true)
      |> Enum.map(&String.split(&1, ","))
      |> Enum.map(fn [title, body] ->
        %{title: title, body: body, inserted_at: now, updated_at: now}
      end)

    Repo.insert_all(Post, post_data)

This will work, but if there are 1000 entries, we are sending 2000 copies of the same now value (two per row) to the database. That's inefficient.

The Ecto team thought of this, though, by supporting placeholders.

    # `file` is assumed to hold the CSV contents as a single string.
    now = NaiveDateTime.utc_now() |> NaiveDateTime.truncate(:second)
    placeholders = %{now: now}

    post_data =
      file
      |> String.split("\n", trim: true)
      |> Enum.map(&String.split(&1, ","))
      |> Enum.map(fn [title, body] ->
        %{
          title: title,
          body: body,
          # Refer to the shared value by key instead of repeating it.
          inserted_at: {:placeholder, :now},
          updated_at: {:placeholder, :now}
        }
      end)

    Repo.insert_all(Post, post_data, placeholders: placeholders)

With this approach, we get the same result, but only send a single copy of the value now.
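To make the shape of the data concrete, here is a minimal sketch in plain Elixir that needs no database. The module name PlaceholderRows and the sample CSV are made up for illustration, and the sketch splits each line into at most two parts so that a comma inside the body does not break the pattern match, which is an assumption about the CSV rather than something the import above does.

```elixir
defmodule PlaceholderRows do
  # Hypothetical helper, not part of Ecto: turns "title,body" lines into
  # insert_all-style row maps that point at a shared :now placeholder.
  def build(csv) do
    csv
    |> String.split("\n", trim: true)
    |> Enum.map(&String.split(&1, ",", parts: 2))
    |> Enum.map(fn [title, body] ->
      %{
        title: title,
        body: body,
        inserted_at: {:placeholder, :now},
        updated_at: {:placeholder, :now}
      }
    end)
  end
end

csv = "First post,Hello\nSecond post,World\n"
rows = PlaceholderRows.build(csv)

# The real timestamp lives in exactly one place.
placeholders = %{now: NaiveDateTime.truncate(NaiveDateTime.utc_now(), :second)}

IO.inspect(rows, label: "rows")
IO.inspect(placeholders, label: "placeholders")
```

In a real application these two values would be handed to Repo.insert_all(Post, rows, placeholders: placeholders). Note that every row only carries the {:placeholder, :now} tuple, never the timestamp itself.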

Isn’t that awesome!?

Advanced inserts with Ecto | Bego.dev
agentjido/jido: 🤖 Autonomous agent framework for Elixir. Built for distributed, autonomous behavior and dynamic workflows.
just-every/code: Fast, effective, mind-blowing, coding CLI. Browser integration, multi-agents, theming, and reasoning control. Orchestrate agents from OpenAI, Claude, Gemini or any provider.
9001/copyparty: Portable file server with accelerated resumable uploads, dedup, WebDAV, FTP, TFTP, zeroconf, media indexer, thumbnails++ all in one file, no deps
awesome-selfhosted/awesome-selfhosted: A list of Free Software network services and web applications which can be hosted on your own servers
Web Application Firewall | SafePoint
bunkerity/bunkerweb: 🛡️ Open-source and next-generation Web Application Firewall (WAF)
deepbeepmeep/Wan2GP: A fast AI Video Generator for the GPU Poor. Supports Wan 2.1/2.2, Hunyuan Video, LTX Video and Flux.
Polar — Payment infrastructure for the 21st century | Polar
tranek/GASDocumentation: My understanding of Unreal Engine 5's GameplayAbilitySystem plugin with a simple multiplayer sample project.
Unreal Engine 5 - The truth of the Gameplay Ability System - Devtricks

TL;DR

  1. We propose UniVG-R1, a reasoning guided MLLM for universal visual grounding, which employs GRPO training combined with a cold-start initialization to effectively enhance reasoning capabilities across multimodal contexts.
  2. A high-quality CoT grounding dataset is introduced, encompassing diverse tasks, each meticulously annotated with detailed reasoning chains to facilitate advanced reasoning-based grounding.
  3. We identify a difficulty bias in GRPO training, and propose a difficulty-aware weight adjustment strategy. Experiments validate that GRPO equipped with this strategy consistently enhances model performance.
  4. Extensive experiments demonstrate that our model achieves state-of-the-art performance across multiple grounding benchmarks, showcasing its versatility and generalizability.

UniVG-R1 tackles a wide range of visual grounding tasks with complex and implicit instructions. By combining GRPO training with a cold-start initialization, it effectively reasons over instructions and visual inputs, significantly improving grounding performance. Our model achieves state-of-the-art results on MIG-Bench and exhibits superior zero-shot performance on four reasoning-guided grounding benchmarks with an average 23.4% improvement.

Abstract

Traditional visual grounding methods primarily focus on single-image scenarios with simple textual references. However, extending these methods to real-world scenarios that involve implicit and complex instructions, particularly in conjunction with multiple images, poses significant challenges, which is mainly due to the lack of advanced reasoning ability across diverse multi-modal contexts. In this work, we aim to address the more practical universal grounding task, and propose UniVG-R1, a reasoning guided multimodal large language model (MLLM) for universal visual grounding, which enhances reasoning capabilities through reinforcement learning (RL) combined with cold-start data. Specifically, we first construct a high-quality Chain-of-Thought (CoT) grounding dataset, annotated with detailed reasoning chains, to guide the model towards correct reasoning paths via supervised fine-tuning. Subsequently, we perform rule-based reinforcement learning to encourage the model to identify correct reasoning chains, thereby incentivizing its reasoning capabilities. In addition, we identify a difficulty bias arising from the prevalence of easy samples as RL training progresses, and we propose a difficulty-aware weight adjustment strategy to further strengthen the performance. Experimental results demonstrate the effectiveness of UniVG-R1, which achieves state-of-the-art performance on MIG-Bench with a 9.1% improvement over the previous method. Furthermore, our model exhibits strong generalizability, achieving an average improvement of 23.4% in zero-shot performance across four image and video reasoning grounding benchmarks.

Pipeline

We adopt a two-stage training process. The first stage employs CoT-SFT, with the training data construction shown in (a). The second stage utilizes GRPO equipped with a difficulty-aware weight adjustment strategy in (b). The GRPO training process is illustrated in (c), where the policy model generates multiple responses, and each is assigned a distinct reward.

Results

Difficulty-Aware Weight Adjustment Strategy

During the stage 2 reinforcement learning process, we observe that most samples progressively become easier for the model, with the proportion of easy samples increasing and the proportion of hard samples steadily decreasing. Since the GRPO algorithm normalizes rewards to calculate the relative advantage within each group, easy samples (e.g., \(\textit{mIoU} = 0.8\)) receive the same policy gradient update as hard samples (e.g., \(\textit{mIoU} = 0.2\)). This leads to a difficulty-bias issue. In particular, during the later stages of training, as easy samples become predominant, most updates are derived from these easier instances, making it difficult for the model to focus on hard samples.
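The effect described above can be checked with a small worked example. It assumes the standard GRPO advantage, where each reward is standardised within its own group of responses. The reward values are made up for illustration and are not taken from the experiments.

```latex
% Group-relative advantage (standard GRPO form, assumed here):
A_i = \frac{r_i - \operatorname{mean}(r_1, \dots, r_G)}{\operatorname{std}(r_1, \dots, r_G)}

% Easy sample, rewards (0.7, 0.8, 0.9): mean = 0.8, std \approx 0.0816
(A_1, A_2, A_3) \approx (-1.22,\ 0,\ 1.22)

% Hard sample, rewards (0.1, 0.2, 0.3): mean = 0.2, std \approx 0.0816
(A_1, A_2, A_3) \approx (-1.22,\ 0,\ 1.22)
```

Both groups end up with identical advantages even though their mean mIoU differs by 0.6, so normalisation alone gives the update no signal about how hard a sample is.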

To address this problem, we propose a difficulty-aware weight adjustment strategy, which dynamically adjusts the weight of each sample based on its difficulty. Specifically, we introduce a difficulty coefficient \( \phi \propto -\textit{mIoU} \) to quantify the difficulty level of each sample, where the function \( \phi \) is negatively correlated with \(\textit{mIoU}\). This coefficient dynamically adjusts the sample weights by computing the average accuracy reward of different responses for each sample. The detailed formula is provided below.

\[ \mathcal{J}_{GRPO}(\theta) = \mathbb{E}_{q \sim P(Q),\, \{o_i\}_{i=1}^{G} \sim \pi_{\theta_{old}}(O|q)} \left[ \frac{1}{G}\sum_{i=1}^{G} \phi(\textit{mIoU}) \frac{\pi_{\theta}(o_i|q)}{\pi_{\theta_{old}}(o_i|q)} A_i - \beta\, \mathbb{D}_{KL}(\pi_{\theta} \,\|\, \pi_{ref}) \right] \]
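As a rough illustration of how such a coefficient reweights updates, consider one possible decreasing function of mIoU. The exponential form below is a hypothetical choice made only for this example. The text here does not state the exact function, so it should not be read as the paper's definition.

```latex
% Hypothetical difficulty coefficient, for illustration only:
\phi(\textit{mIoU}) = \exp(1 - \textit{mIoU})

% Easy sample, mean mIoU = 0.8:
\phi(0.8) = e^{0.2} \approx 1.22

% Hard sample, mean mIoU = 0.2:
\phi(0.2) = e^{0.8} \approx 2.23

% Relative weight of the hard sample's term in the objective:
\frac{\phi(0.2)}{\phi(0.8)} = e^{0.6} \approx 1.82
```

Under this assumed form, a hard sample contributes roughly 1.8 times as much to the objective as an easy one with the same normalised advantage, which is the kind of rebalancing the strategy aims for.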

Visualization

Acknowledgement

Our work is primarily based on Migician, VLM-R1, LLaMA-Factory, and lmms-eval. We are sincerely grateful for their excellent work.

BibTeX

`@article{bai2025univg,
      title={UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning},
      author={Bai, Sule and Li, Mingxing and Liu, Yong and Tang, Jing and Zhang, Haoji and Sun, Lei and Chu, Xiangxiang and Tang, Yansong},
      journal={arXiv preprint arXiv:2505.14231},
      year={2025}
}`
visual grounding
reinforcement learning
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
yanboding/MTVCrafter · Hugging Face
Kraigie/nostrum: Elixir Discord Library
clemcer/LoggiFly: Get Alerts from your Docker Container Logs
MAZANOKE | Online Image Optimizer That Runs Privately in Your Browser
Colanode - Open-source & local-first Slack and Notion alternative