To accelerate and refine decision-making in a fast-paced, global marketplace, enterprises may deploy generative artificial intelligence models to help summarize and interpret the charts that often fill market summaries and financial reports.

But even the latest vision-language models sometimes struggle with this task, since it requires a model to integrate visual, numerical, and linguistic understanding. A company that invests in a state-of-the-art model might still receive inaccurate or incomplete information.

To fill this performance gap, researchers from MIT and the MIT-IBM Computing Research Lab developed a multifaceted resource for AI users that is specifically designed to teach vision-language models (VLMs) how to effectively interpret charts. 

They used a novel data generation method to build a state-of-the-art dataset that includes more than a million varied charts. The dataset also encodes many visual, linguistic, and numerical components of each chart image, which enable models to robustly reason about the information in a chart.

The researchers used this dataset, called ChartNet, to train a series of open-source VLMs. Many of these smaller models significantly outperformed commercial models that are orders of magnitude larger on tasks like data extraction and chart summarization.

By enabling open-source models to outperform their commercial counterparts, ChartNet could allow small firms with limited budgets to more readily utilize AI. The open-source dataset can be used to improve the capabilities of AI models for tasks like business trend analysis and scientific figure interpretation.

“We developed ChartNet to be a one-stop shop for chart understanding, covering basically anything that an AI model and a practitioner who is training that model might need. We hope our work motivates researchers to achieve state-of-the-art performance with smaller models that don’t require infinite amounts of computation,” says Jovana Kondic, an MIT electrical engineering and computer science (EECS) graduate student and lead author of a paper on ChartNet.

Researchers have made great strides developing generative AI models that excel at natural language processing and reasoning about natural images. But less work has focused on interpreting complex multimodal data contained within charts, Kondic says.

Yet for large and small businesses in nearly every industry, chart understanding is a critical task.

“The finance industry thrives on charts. If vision-language models can extract information out of charts, like descriptions of trends, that facilitates a lot of workflows that happen downstream,” Joshi says.

The lack of high-quality training data is a major bottleneck holding back the development of VLMs that can accurately interpret charts. Many datasets contain limited chart images pulled from the internet and often lack the necessary scale and additional information to help a model interpret the underlying data.

“A vision-language model, unlike our brains, may need to see thousands of examples during training to reliably recognize something as a line chart,” Kondic says.

The researchers sought to overcome those shortcomings by generating synthetic data. Synthetic data are artificially generated by algorithms to mimic the statistical properties of actual data. 

The ChartNet dataset holds more than a million high-quality chart images, along with the corresponding code used to generate each chart, a textual description, and a table that contains its numerical information. In addition, each datapoint includes question-and-answer pairs to teach the model how to correctly answer questions about the chart image.

“These additional modes of data guide the model to connect and align the different pieces of information that the chart image encodes,” Kondic says.
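As a rough illustration of what one such datapoint might hold, the sketch below bundles the components described above into a single record. The class and field names are assumptions made for this example; they are not ChartNet's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class QAPair:
    question: str
    answer: str


@dataclass
class ChartRecord:
    """One hypothetical chart datapoint; all field names are assumptions."""
    image_path: str    # rendered chart image
    plot_code: str     # code used to generate the chart
    description: str   # textual description of the chart
    table: list        # numerical data, one dict per row
    qa_pairs: list = field(default_factory=list)  # QAPair items


record = ChartRecord(
    image_path="charts/000001.png",
    plot_code="plt.bar(['Q1', 'Q2'], [10.0, 12.5])",
    description="Bar chart of revenue by quarter; Q2 is higher than Q1.",
    table=[
        {"quarter": "Q1", "revenue": 10.0},
        {"quarter": "Q2", "revenue": 12.5},
    ],
    qa_pairs=[QAPair("Which quarter has the higher revenue?", "Q2")],
)
```

Keeping the image, code, table, and text side by side is what would let a training pipeline pair any one view of the chart with any other.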

To build ChartNet, the researchers created a two-step, synthetic data generation pipeline.

First, their automated system translates any pre-existing set of chart images into code. Then the system iteratively augments that code to change different aspects of each chart, such as chart type, data values, topic, and colors.

“We can start from a single chart that we use as a seed and come up with hundreds of augmentations of it. This is how we were able to build a dataset with more than a million diverse images,” Kondic explains.
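The seed-and-augment idea can be sketched in a few lines. The article does not describe how the researchers' system rewrites chart code, so this example substitutes simple random variation of a small chart specification; the `augment` function, the option lists, and the spec keys are all hypothetical.

```python
import random

# Hypothetical option lists; the real pipeline's choices are not described.
CHART_TYPES = ["bar", "line", "scatter"]
COLORS = ["tab:blue", "tab:orange", "tab:green"]


def augment(seed: dict, n: int, rng: random.Random) -> list:
    """Return n variants of a seed chart spec with changed type, color, and values."""
    variants = []
    for _ in range(n):
        variant = dict(seed)  # shallow copy; the seed itself is left unchanged
        variant["chart_type"] = rng.choice(CHART_TYPES)
        variant["color"] = rng.choice(COLORS)
        # Scale each value by a random factor between 0.8 and 1.2.
        variant["values"] = [
            round(v * rng.uniform(0.8, 1.2), 2) for v in seed["values"]
        ]
        variants.append(variant)
    return variants


seed_chart = {
    "chart_type": "bar",
    "color": "tab:blue",
    "labels": ["Q1", "Q2", "Q3"],
    "values": [10.0, 12.5, 9.0],
}
variants = augment(seed_chart, n=100, rng=random.Random(0))
```

Passing in a seeded `random.Random` keeps the generated variants reproducible, which helps when a dataset needs to be rebuilt.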

They also incorporated an automated quality check process to ensure the synthetic data are high quality. This process verifies that the code is executable and that the rendered chart images are accurate and clean.
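One part of such a check, confirming that generated code actually runs, might look like the sketch below. It is an assumption about how this could be done, not the researchers' implementation: the `is_executable` helper is hypothetical, and it does not attempt the harder step of judging whether a rendered image is accurate and clean.

```python
import subprocess
import sys


def is_executable(code: str, timeout_s: float = 10.0) -> bool:
    """Return True if the code runs to completion in a fresh interpreter."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # treat code that hangs as a failure
    return result.returncode == 0


candidates = [
    "values = [1, 2, 3]\ntotal = sum(values)",  # valid code
    "values = [1, 2\n",                         # syntax error
]
kept = [code for code in candidates if is_executable(code)]
```

Running each candidate in a separate process means a crash or an infinite loop in generated code cannot take down the filtering job itself.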

“We don’t want to just be generating diverse samples. We also want the information to be presented in a meaningful way,” she says.

ChartNet also includes a selection of chart datapoints annotated by human experts. This provides access to additional types of charts and supporting data that carry validity guarantees.

A practitioner could use the annotated data to fine-tune an existing VLM, further boosting performance for a specific application, Joshi adds.

The researchers tested ChartNet by training IBM's Granite Vision series of models, as well as several other open-source models of various sizes, and evaluating them on a range of chart interpretation tasks. The dataset improved the accuracy of all models in chart reconstruction, chart data extraction, chart summarization, and chart question answering.

With ChartNet, small open-source models consistently outperformed much larger commercial models.

“A lot of prior training datasets only focused on answering simple questions about a chart. We tried to go beyond that with ChartNet by generating data that support all aspects of robust chart understanding,” Kondic says.

In the future, the researchers plan to continue expanding ChartNet by incorporating data with added levels of complexity. They also want to draw on feedback from the research community. 

This research was funded, in part, by the MIT-IBM Computing Research Lab.