Google launches Gemini 2.0: it can answer with audio and codes better than Gemini 1.5 Pro

By: lew

on 11 December 2024 - 20:19

Topics: Gemini, Google, LLM, Artificial Intelligence

Google has launched an experimental version of Gemini 2.0 Flash, the first model in the Gemini 2.0 family, with others expected to follow. Its key capability is answering with images, text, and audio natively, without relying on other models to generate the images.

Gemini 2.0 Flash improves on many key benchmarks, especially coding, math, and general knowledge, where it scores higher than even Gemini 1.5 Pro. A few scores regressed, however: long-context understanding (MRCR) falls below Gemini 1.5 Flash, 69.2% against 71.9%, and speech translation (CoVoST2) falls below Gemini 1.5 Pro, 39.2 against 40.1 BLEU, although it still beats Gemini 1.5 Flash on that test. Both of those gaps are small. For developers, Gemini 2.0 can search Google, run code, and call external functions natively.
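To illustrate what "calling external functions" means on the application side, here is a minimal, library-free sketch. It does not use the real Gemini SDK. The JSON shape of the request, the `dispatch` helper, and the `get_weather` function are all hypothetical, chosen only to show the general pattern: the model names a function and its arguments, and the application runs it and returns the result.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str) -> Dict[str, Any]:
    """Hypothetical external function; returns canned data for illustration."""
    return {"city": city, "forecast": "sunny"}


# Registry of functions the application is willing to expose to the model.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch(call_json: str) -> str:
    """Run the function named in a model-issued request and return JSON.

    Assumed (hypothetical) request shape: {"name": "...", "args": {...}}.
    """
    call = json.loads(call_json)
    name = call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown function: {name}"})
    result = TOOLS[name](**call.get("args", {}))
    return json.dumps({"name": name, "result": result})


if __name__ == "__main__":
    print(dispatch('{"name": "get_weather", "args": {"city": "Bangkok"}}'))
```

In a real integration the result string would be sent back to the model so it can compose its final answer. Only functions listed in the registry can be invoked, which keeps the model from calling arbitrary code.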

Google is testing Gemini 2.0's capabilities through several demo projects:

Project Astra, which holds a continuous video call with Gemini and can recall details from the video going back up to 10 minutes
Project Mariner, a Chrome extension that understands web pages and acts as an assistant that carries out the user's instructions
Jules, an AI assistant for developers that can read an issue, write code, and fix it on its own
Agent in games, an AI for game control that suggests what the user should do next, paving the way for using Gemini to control robots in the future

Developers can call Gemini 2.0 Flash through a new API, the Multimodal Live API, which lets them stream audio and video into the model continuously. It is available through both Google AI Studio and Google Cloud Vertex AI.
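Streaming audio continuously usually means cutting the recording into small fixed-duration chunks and sending them one after another. The sketch below shows only that chunking step and makes several assumptions that the article does not state: 16 kHz, 16-bit mono PCM input, 100 ms chunks, and a `send` callable that stands in for whatever send method the real client exposes.

```python
from typing import Callable, Iterator

SAMPLE_RATE_HZ = 16_000  # assumption: 16 kHz mono input
BYTES_PER_SAMPLE = 2     # assumption: 16-bit PCM samples


def pcm_chunks(pcm: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM, each covering chunk_ms of audio."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        # The final slice may be shorter than chunk_bytes.
        yield pcm[start:start + chunk_bytes]


def stream(pcm: bytes, send: Callable[[bytes], None]) -> int:
    """Pass every chunk to send (a hypothetical stand-in) and return the count."""
    count = 0
    for chunk in pcm_chunks(pcm):
        send(chunk)
        count += 1
    return count


if __name__ == "__main__":
    one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    print(stream(one_second_of_silence, lambda chunk: None))
```

Smaller chunks reduce latency at the cost of more messages. The right size depends on the real API's limits, which should be checked in its documentation.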

General users can try Gemini 2.0 Flash through the Gemini app.

Source - Google Blog

CAPABILITY	BENCHMARK	DESCRIPTION	Gemini 1.5 Flash 002	Gemini 1.5 Pro 002	Gemini 2.0 Experimental Flash
General	MMLU-Pro	Enhanced version of popular MMLU dataset with questions across multiple subjects with higher difficulty tasks	67.3%	75.8%	76.4%
Code	Natural2Code	Code generation across Python, Java, C++, JS, Go. Held out dataset HumanEval-like, not leaked on the web	79.8%	85.4%	92.9%
	Bird-SQL (Dev)	Benchmark evaluating converting natural language questions into executable SQL	45.6%	54.4%	56.9%
	LiveCodeBench (Code Generation)	Code generation in Python. Code Generation subset covering more recent examples: 06/01/2024-10/05/2024	30.0%	34.3%	35.1%
Factuality	FACTS Grounding	Ability to provide factually correct responses given documents and diverse user requests. Held out internal dataset	82.9%	80.0%	83.6%
Math	MATH	Challenging math problems (incl. algebra, geometry, pre-calculus, and others)	77.9%	86.5%	89.7%
	HiddenMath	Competition-level math problems, Held out dataset AIME/AMC-like, crafted by experts and not leaked on the web	47.2%	52.0%	63.0%
Reasoning	GPQA (diamond)	Challenging dataset of questions written by domain experts in biology, physics, and chemistry	51.0%	59.1%	62.1%
Long context	MRCR (1M)	Novel, diagnostic long-context understanding evaluation	71.9%	82.6%	69.2%
Image	MMMU	Multi-discipline college-level multimodal understanding and reasoning problems	62.3%	65.9%	70.7%
	Vibe-Eval (Reka)	Visual understanding in chat models with challenging everyday examples. Evaluated with a Gemini Flash model as a rater	48.9%	53.9%	56.3%
Audio	CoVoST2 (21 lang)	Automatic speech translation (BLEU score)	37.4	40.1	39.2
Video	EgoSchema (test)	Video analysis across multiple domains	66.8%	71.2%	71.5%
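
To make the comparison in the table easier to check, the following sketch computes how far Gemini 2.0 Flash sits above or below each 1.5 model on every benchmark. The scores are copied from the table. The helper names `deltas` and `regressions` are illustrative only.

```python
# Scores from the table: (1.5 Flash 002, 1.5 Pro 002, 2.0 Flash Experimental).
SCORES = {
    "MMLU-Pro": (67.3, 75.8, 76.4),
    "Natural2Code": (79.8, 85.4, 92.9),
    "Bird-SQL (Dev)": (45.6, 54.4, 56.9),
    "LiveCodeBench": (30.0, 34.3, 35.1),
    "FACTS Grounding": (82.9, 80.0, 83.6),
    "MATH": (77.9, 86.5, 89.7),
    "HiddenMath": (47.2, 52.0, 63.0),
    "GPQA (diamond)": (51.0, 59.1, 62.1),
    "MRCR (1M)": (71.9, 82.6, 69.2),
    "MMMU": (62.3, 65.9, 70.7),
    "Vibe-Eval (Reka)": (48.9, 53.9, 56.3),
    "CoVoST2 (21 lang)": (37.4, 40.1, 39.2),  # BLEU, not a percentage
    "EgoSchema (test)": (66.8, 71.2, 71.5),
}


def deltas(scores):
    """Map benchmark -> (2.0 minus 1.5 Flash, 2.0 minus 1.5 Pro), in points."""
    return {
        name: (round(new - flash, 1), round(new - pro, 1))
        for name, (flash, pro, new) in scores.items()
    }


def regressions(scores):
    """Benchmarks where 2.0 Flash scores below at least one 1.5 model."""
    return sorted(
        name
        for name, (vs_flash, vs_pro) in deltas(scores).items()
        if vs_flash < 0 or vs_pro < 0
    )


if __name__ == "__main__":
    for name, (vs_flash, vs_pro) in deltas(SCORES).items():
        print(f"{name:20s} vs Flash {vs_flash:+6.1f}   vs Pro {vs_pro:+6.1f}")
    print("Regressions:", regressions(SCORES))
```

Only two rows come out negative: MRCR (1M), which is 2.7 points below 1.5 Flash and 13.4 below 1.5 Pro, and CoVoST2, which is 0.9 BLEU below 1.5 Pro while remaining 1.8 above 1.5 Flash.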