What Is It Like to Have a Powerful Embedding Transform Plugin?



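The Apache SeaTunnel Embedding transform plugin converts text data into vector representations, so that the data can be used in machine learning and data analysis tasks. The plugin supports multiple model providers and can be integrated with different APIs. This document walks through the plugin's configuration options, including how to specify the model provider, API key, and custom configuration, and gives complete example configurations to show how these options apply in a real project. Whether you want to use a pretrained model or a custom one, the sections below provide the guidance and reference you need.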



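The Embedding transform plugin uses an embedding model to convert text data into vector representations. The transform can be applied to any number of fields. The plugin supports multiple model providers and can be integrated with different APIs.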




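  • embedding_model_provider
    The model provider used to generate embeddings. Common options include DOUBAO, QIANFAN, and OPENAI. Choose CUSTOM to define the request and response handling for your own embedding model.
  • api_key
    The API key used to authenticate requests to the embedding service. The model provider usually issues it when you register for their service.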
  • secret_key
  • single_vectorized_input_number
  • vectorization_fields
    vectorization_fields {
       book_intro_vector = book_intro
       author_biography_vector  = author_biography
    }
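  • model
    The specific embedding model to use. This depends on embedding_model_provider. For example, with OPENAI you can specify text-embedding-3-small.
  • api_path
    The API endpoint to which embedding requests are sent. It may vary with the provider and the model in use, and is usually supplied by the model provider.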
  • oauth_path
  • custom_config
  • custom_response_parse
    The custom_response_parse option lets you specify how to parse the model's response. You can use JsonPath to extract the specific data you need from the response. For example, $.data[*].embedding extracts the value of the embedding field from the JSON below, yielding a list of lists. For JsonPath usage, see the JsonPath getting-started guide (https://github.com/json-path/JsonPath?tab=readme-ov-file#getting-started).
    {
      "object": "list",
      "data": [
        {
          "object": "embedding",
          "index": 0,
          "embedding": [ ... ]
        }
      ],
      "model": "text-embedding-3-small",
      "usage": {
        "prompt_tokens": 5,
        "total_tokens": 5
      }
    }
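  • custom_request_headers
    The custom_request_headers option lets you define custom headers to include in requests sent to the model API. It is useful when the API requires headers beyond the standard ones, such as an authorization token or a content type.
  • custom_request_body
    • ${model}: placeholder for the model name.
    • ${input}: placeholder for the input value. The type of the body value also determines the type sent in the request body. For example: ["${input}"] -> ["input"] (list)
  • common options
    Common parameters of transform plugins; see Transform Plugin.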
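
To make the custom_response_parse description more concrete, here is a minimal Python sketch of what the JsonPath $.data[*].embedding selects from a response shaped like the JSON above. It uses only the standard library instead of a JsonPath engine, the vector values are invented for illustration, and extract_embeddings is a hypothetical helper name, not part of SeaTunnel.

```python
import json

# Sample response shaped like the JSON above.
# Assumption: the vector values are made up purely for illustration.
response_text = """
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [0.1, 0.2, 0.3]},
    {"object": "embedding", "index": 1, "embedding": [0.4, 0.5, 0.6]}
  ],
  "model": "text-embedding-3-small",
  "usage": {"prompt_tokens": 5, "total_tokens": 5}
}
"""


def extract_embeddings(text):
    """Plain-Python equivalent of the JsonPath $.data[*].embedding."""
    payload = json.loads(text)
    # One inner list per element of "data": the result is a list of lists.
    return [item["embedding"] for item in payload["data"]]


vectors = extract_embeddings(response_text)
print(vectors)
```

Each inner list corresponds to one input text, which is why the parsed result is a list nested inside a list.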



env {
 job.mode = "BATCH"
}

source {
 FakeSource {
   row.num = 5
   schema = {
     fields {
       book_id = "int"
       book_name = "string"
       book_intro = "string"
       author_biography = "string"
     }
   }
   rows = [
     {fields = [1, "To Kill a Mockingbird",
     "Set in the American South during the 1930s, To Kill a Mockingbird tells the story of young Scout Finch and her brother, Jem, who are growing up in a world of racial inequality and injustice. Their father, Atticus Finch, is a lawyer who defends a black man falsely accused of raping a white woman, teaching his children valuable lessons about morality, courage, and empathy.",
     "Harper Lee (1926–2016) was an American novelist best known for To Kill a Mockingbird, which won the Pulitzer Prize in 1961. Lee was born in Monroeville, Alabama, and the town served as inspiration for the fictional Maycomb in her novel. Despite the success of her book, Lee remained a private person and published only one other novel, Go Set a Watchman, which was written before To Kill a Mockingbird but released in 2015 as a sequel."
     ], kind = INSERT}
     {fields = [2, "1984",
     "1984 is a dystopian novel set in a totalitarian society governed by Big Brother. The story follows Winston Smith, a man who works for the Party rewriting history. Winston begins to question the Party’s control and seeks truth and freedom in a society where individuality is crushed. The novel explores themes of surveillance, propaganda, and the loss of personal autonomy.",
     "George Orwell (1903–1950) was the pen name of Eric Arthur Blair, an English novelist, essayist, journalist, and critic. Orwell is best known for his works 1984 and Animal Farm, both of which are critiques of totalitarian regimes. His writing is characterized by lucid prose, awareness of social injustice, opposition to totalitarianism, and support of democratic socialism. Orwell’s work remains influential, and his ideas have shaped contemporary discussions on politics and society."
     ], kind = INSERT}
     {fields = [3, "Pride and Prejudice",
     "Pride and Prejudice is a romantic novel that explores the complex relationships between different social classes in early 19th century England. The story centers on Elizabeth Bennet, a young woman with strong opinions, and Mr. Darcy, a wealthy but reserved gentleman. The novel deals with themes of love, marriage, and societal expectations, offering keen insights into human behavior.",
     "Jane Austen (1775–1817) was an English novelist known for her sharp social commentary and keen observations of the British landed gentry. Her works, including Sense and Sensibility, Emma, and Pride and Prejudice, are celebrated for their wit, realism, and biting critique of the social class structure of her time. Despite her relatively modest life, Austen’s novels have gained immense popularity, and she is considered one of the greatest novelists in the English language."
     ], kind = INSERT}
     {fields = [4, "The Great Gatsby",
     "The Great Gatsby is a novel about the American Dream and the disillusionment that can come with it. Set in the 1920s, the story follows Nick Carraway as he becomes entangled in the lives of his mysterious neighbor, Jay Gatsby, and the wealthy elite of Long Island. Gatsby's obsession with the beautiful Daisy Buchanan drives the narrative, exploring themes of wealth, love, and the decay of the American Dream.",
     "F. Scott Fitzgerald (1896–1940) was an American novelist and short story writer, widely regarded as one of the greatest American writers of the 20th century. Born in St. Paul, Minnesota, Fitzgerald is best known for his novel The Great Gatsby, which is often considered the quintessential work of the Jazz Age. His works often explore themes of youth, wealth, and the American Dream, reflecting the turbulence and excesses of the 1920s."
     ], kind = INSERT}
     {fields = [5, "Moby-Dick",
     "Moby-Dick is an epic tale of obsession and revenge. The novel follows the journey of Captain Ahab, who is on a relentless quest to kill the white whale, Moby Dick, that once maimed him. Narrated by Ishmael, a sailor aboard Ahab’s ship, the story delves into themes of fate, humanity, and the struggle between man and nature. The novel is also rich with symbolism and philosophical musings.",
     "Herman Melville (1819–1891) was an American novelist, short story writer, and poet of the American Renaissance period. Born in New York City, Melville gained initial fame with novels such as Typee and Omoo, but it was Moby-Dick, published in 1851, that would later be recognized as his masterpiece. Melville’s work is known for its complexity, symbolism, and exploration of themes such as man’s place in the universe, the nature of evil, and the quest for meaning. Despite facing financial difficulties and critical neglect during his lifetime, Melville’s reputation soared posthumously, and he is now considered one of the great American authors."
     ], kind = INSERT}
   ]
   plugin_output = "fake"
 }
}

transform {
 Embedding {
   plugin_input = "fake"
   embedding_model_provider = QIANFAN
   model = bge_large_en
   api_key = xxxxxxxxxx
   secret_key = xxxxxxxxxx
   api_path = xxxxxxxxxx
   vectorization_fields {
       book_intro_vector = book_intro
       author_biography_vector  = author_biography
   }
   plugin_output = "embedding_output"
 }
}

sink {
 Assert {
     plugin_input = "embedding_output"

     rules =
       {
         field_rules = [
           {
             field_name = book_id
             field_type = int
             field_value = [
               { rule_type = NOT_NULL }
             ]
           },
           {
             field_name = book_name
             field_type = string
             field_value = [
               { rule_type = NOT_NULL }
             ]
           },
           {
             field_name = book_intro
             field_type = string
             field_value = [
               { rule_type = NOT_NULL }
             ]
           },
           {
             field_name = author_biography
             field_type = string
             field_value = [
               { rule_type = NOT_NULL }
             ]
           },
           {
             field_name = book_intro_vector
             field_type = float_vector
             field_value = [
               { rule_type = NOT_NULL }
             ]
           },
           {
             field_name = author_biography_vector
             field_type = float_vector
             field_value = [
               { rule_type = NOT_NULL }
             ]
           }
         ]
       }
 }
}
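
The effect of vectorization_fields in the configuration above can be pictured with a small Python sketch: each entry maps a new output field to the source field whose text is embedded, and the original fields are kept. This is only an illustration of the mapping, not SeaTunnel's implementation. fake_embed is a hypothetical stand-in that returns a tiny deterministic vector instead of calling a real model, and apply_vectorization is likewise an invented helper name.

```python
def fake_embed(text):
    """Hypothetical stand-in for an embedding model (not a real model call)."""
    # Returns [character count, word count] so the output is deterministic.
    return [float(len(text)), float(len(text.split()))]


def apply_vectorization(row, vectorization_fields, embed=fake_embed):
    """Return a copy of the row with one new vector field per mapping entry."""
    out = dict(row)
    for target_field, source_field in vectorization_fields.items():
        out[target_field] = embed(row[source_field])
    return out


# Same mapping as in the config: output field name -> source field name.
mapping = {
    "book_intro_vector": "book_intro",
    "author_biography_vector": "author_biography",
}
row = {
    "book_id": 2,
    "book_name": "1984",
    "book_intro": "A dystopian novel",
    "author_biography": "George Orwell was an English novelist",
}

result = apply_vectorization(row, mapping)
print(sorted(result))
```

The output row carries the four original fields plus the two vector fields, which matches the six fields that the Assert sink checks for NOT_NULL.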

  • Custom embedding model
env {
 job.mode = "BATCH"
}

source {
 FakeSource {
   row.num = 5
   schema = {
     fields {
       book_id = "int"
       book_name = "string"
       book_intro = "string"
       author_biography = "string"
     }
   }
   rows = [
     {fields = [1, "To Kill a Mockingbird",
     "Set in the American South during the 1930s, To Kill a Mockingbird tells the story of young Scout Finch and her brother, Jem, who are growing up in a world of racial inequality and injustice. Their father, Atticus Finch, is a lawyer who defends a black man falsely accused of raping a white woman, teaching his children valuable lessons about morality, courage, and empathy.",
     "Harper Lee (1926–2016) was an American novelist best known for To Kill a Mockingbird, which won the Pulitzer Prize in 1961. Lee was born in Monroeville, Alabama, and the town served as inspiration for the fictional Maycomb in her novel. Despite the success of her book, Lee remained a private person and published only one other novel, Go Set a Watchman, which was written before To Kill a Mockingbird but released in 2015 as a sequel."
     ], kind = INSERT}
     {fields = [2, "1984",
     "1984 is a dystopian novel set in a totalitarian society governed by Big Brother. The story follows Winston Smith, a man who works for the Party rewriting history. Winston begins to question the Party’s control and seeks truth and freedom in a society where individuality is crushed. The novel explores themes of surveillance, propaganda, and the loss of personal autonomy.",
     "George Orwell (1903–1950) was the pen name of Eric Arthur Blair, an English novelist, essayist, journalist, and critic. Orwell is best known for his works 1984 and Animal Farm, both of which are critiques of totalitarian regimes. His writing is characterized by lucid prose, awareness of social injustice, opposition to totalitarianism, and support of democratic socialism. Orwell’s work remains influential, and his ideas have shaped contemporary discussions on politics and society."
     ], kind = INSERT}
     {fields = [3, "Pride and Prejudice",
     "Pride and Prejudice is a romantic novel that explores the complex relationships between different social classes in early 19th century England. The story centers on Elizabeth Bennet, a young woman with strong opinions, and Mr. Darcy, a wealthy but reserved gentleman. The novel deals with themes of love, marriage, and societal expectations, offering keen insights into human behavior.",
     "Jane Austen (1775–1817) was an English novelist known for her sharp social commentary and keen observations of the British landed gentry. Her works, including Sense and Sensibility, Emma, and Pride and Prejudice, are celebrated for their wit, realism, and biting critique of the social class structure of her time. Despite her relatively modest life, Austen’s novels have gained immense popularity, and she is considered one of the greatest novelists in the English language."
     ], kind = INSERT}
     {fields = [4, "The Great Gatsby",
     "The Great Gatsby is a novel about the American Dream and the disillusionment that can come with it. Set in the 1920s, the story follows Nick Carraway as he becomes entangled in the lives of his mysterious neighbor, Jay Gatsby, and the wealthy elite of Long Island. Gatsby's obsession with the beautiful Daisy Buchanan drives the narrative, exploring themes of wealth, love, and the decay of the American Dream.",
     "F. Scott Fitzgerald (1896–1940) was an American novelist and short story writer, widely regarded as one of the greatest American writers of the 20th century. Born in St. Paul, Minnesota, Fitzgerald is best known for his novel The Great Gatsby, which is often considered the quintessential work of the Jazz Age. His works often explore themes of youth, wealth, and the American Dream, reflecting the turbulence and excesses of the 1920s."
     ], kind = INSERT}
     {fields = [5, "Moby-Dick",
     "Moby-Dick is an epic tale of obsession and revenge. The novel follows the journey of Captain Ahab, who is on a relentless quest to kill the white whale, Moby Dick, that once maimed him. Narrated by Ishmael, a sailor aboard Ahab’s ship, the story delves into themes of fate, humanity, and the struggle between man and nature. The novel is also rich with symbolism and philosophical musings.",
     "Herman Melville (1819–1891) was an American novelist, short story writer, and poet of the American Renaissance period. Born in New York City, Melville gained initial fame with novels such as Typee and Omoo, but it was Moby-Dick, published in 1851, that would later be recognized as his masterpiece. Melville’s work is known for its complexity, symbolism, and exploration of themes such as man’s place in the universe, the nature of evil, and the quest for meaning. Despite facing financial difficulties and critical neglect during his lifetime, Melville’s reputation soared posthumously, and he is now considered one of the great American authors."
     ], kind = INSERT}
   ]
   plugin_output = "fake"
 }
}

transform {
 Embedding {
   plugin_input = "fake"
   model_provider = CUSTOM
   model = text-embedding-3-small
   api_key = xxxxxxxx
   api_path = "http://mockserver:1080/v1/doubao/embedding"
   single_vectorized_input_number = 2
   vectorization_fields {
       book_intro_vector = book_intro
       author_biography_vector  = author_biography
   }
   custom_config = {
       custom_response_parse = "$.data[*].embedding"
       custom_request_headers = {
           "Content-Type" = "application/json"
           "Authorization" = "Bearer xxxxxxx"
       }
       custom_request_body = {
           modelx = "${model}"
           inputx = ["${input}"]
       }
   }
   plugin_output = "embedding_output_1"
 }
}

sink {
 Assert {
     plugin_input = "embedding_output_1"
     rules =
       {
         field_rules = [
           {
             field_name = book_id
             field_type = int
             field_value = [
               { rule_type = NOT_NULL }
             ]
           },
           {
             field_name = book_name
             field_type = string
             field_value = [
               { rule_type = NOT_NULL }
             ]
           },
           {
             field_name = book_intro
             field_type = string
             field_value = [
               { rule_type = NOT_NULL }
             ]
           },
           {
             field_name = author_biography
             field_type = string
             field_value = [
               { rule_type = NOT_NULL }
             ]
           },
           {
             field_name = book_intro_vector
             field_type = float_vector
             field_value = [
               { rule_type = NOT_NULL }
             ]
           },
           {
             field_name = author_biography_vector
             field_type = float_vector
             field_value = [
               { rule_type = NOT_NULL }
             ]
           }
         ]
       }
 }
}
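
As a rough illustration of how the ${model} and ${input} placeholders in custom_request_body might be filled in, the Python sketch below renders the template from the custom example. It is written under stated assumptions and is not SeaTunnel's code: render is a hypothetical helper, a list containing only "${input}" is assumed to expand to the whole batch of inputs, and a bare "${input}" string is assumed to take a single value.

```python
def render(template, model, inputs):
    """Recursively replace ${model} and ${input} in a request-body template."""
    if isinstance(template, dict):
        return {key: render(value, model, inputs) for key, value in template.items()}
    if isinstance(template, list):
        # Assumption: a list holding only the input placeholder expands to the batch.
        if template == ["${input}"]:
            return list(inputs)
        return [render(item, model, inputs) for item in template]
    if template == "${model}":
        return model
    if template == "${input}":
        # Assumption: a bare string placeholder takes a single input value.
        return inputs[0]
    return template


# Template taken from the custom example: the key names are user-defined.
body_template = {"modelx": "${model}", "inputx": ["${input}"]}
batch = ["first book intro", "second book intro"]

body = render(body_template, "text-embedding-3-small", batch)
print(body)
```

Because the template value for inputx is a list, the rendered request carries the inputs as a list, in line with the ["${input}"] -> ["input"] (list) note in the option description.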

Apache SeaTunnel

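Apache SeaTunnel is a distributed, high-performance, easily extensible data integration platform for synchronizing and transforming massive volumes of data, both offline and in real time.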


