AI Genesis: Weekly News Roundup

🤖 Grok-2, Gemini Live, OpenAI Voice, Rise of Flux AI and more

Hey Genesis Residents!

AI is blowing up this week! 🔥

Google has taken the lead in the voice mode race, leaving OpenAI in the dust.

Meanwhile, FLUX AI is making waves, stealing the spotlight from Midjourney V6.1 with its new image-generation capabilities.

There's a lot to catch up on.

Let’s get scrolling and rolling

📝 Today’s Menu

  • 🤖 xAI Unveils Grok-2 and Grok-2 Mini Models

  • 🗣️ Google Beats OpenAI in Voice Mode Race (Again)

  • 🔎 Meta’s Segment Anything Model 2 (SAM 2)

  • 🖼️ Is Flux AI the New King of Image Generation

  • 🦾 Figure AI Unveils Next-Gen Humanoid Robot

  • 📹 ByteDance Challenges Sora with Jimeng AI

  • 🔊 OpenAI Begins ChatGPT Voice Rollout

  • 🚀 Trending AI Tools

  • 🔥 Quick AI Developments

Read time: 15 minutes

MAIN AI UPDATES

X GROK
🤖 xAI Unveils Grok-2 and Grok-2 Mini Models

Elon Musk’s xAI has released an early preview of Grok-2, a significant upgrade from Grok-1.5, and Grok-2 mini, a smaller yet capable version now available on X.

The Details:     

  • The updated models bring improvements over the previous Grok-1.5, with the Grok-2 mini available to users in X’s paid tiers.

  • The new models show substantial advancements in areas such as graduate-level science knowledge (GPQA), general knowledge (MMLU, MMLU-Pro), and math competition problems (MATH).

  • The new Grok models incorporate image generation using Flux, an AI image generation model known for its realistic imagery.

  • Grok-2 is reported to surpass OpenAI’s GPT-4-Turbo and Anthropic’s Claude 3.5 Sonnet on the LMSYS leaderboard, though this is not independently verified.

The introduction of realistic image generation as a core feature and the redesigned chat interface for Grok-2 and Grok-2 mini represent a significant leap forward.

Despite ongoing controversy over data usage for training AI models, xAI continues to push boundaries in AI development.

GOOGLE GEMINI
🗣️ Google Beats OpenAI in Voice Mode Race

Google has introduced Gemini Live, a cutting-edge mobile conversational AI with advanced voice capabilities, outpacing OpenAI’s ChatGPT voice mode, which remains in its “limited alpha phase” and is not yet universally available.

The Details:  

  • Google’s new voice AI offers “in-depth” hands-free conversations with 10 human-like voice options.

  • Users can interrupt and pose follow-up questions mid-response, simulating a natural conversation flow. Future updates will enable Gemini Live to respond to camera views.

  • Similar to Apple’s forthcoming Intelligence features, Gemini integrates seamlessly with Google, delivering context-aware responses without the need to switch apps.

  • Gemini Live is the default assistant on Google’s Pixel 9 and is now available to all Gemini Advanced subscribers on Android, with an iOS release coming soon.

The progression of real-time voice interaction is transforming AI from a text-based tool into an intuitive partner for collaboration, learning, and consultation.

With OpenAI’s advancements eagerly awaited, Google’s early launch sets a new benchmark in advanced AI voice technology.

META
🔎 Meta’s Segment Anything Model 2 (SAM 2)

Meta just introduced Segment Anything Model 2 (SAM 2), an advanced AI model that can identify and track objects across video frames in real-time, marking a significant leap in video AI.

The Details:     

  • SAM 2 extends Meta's previous image segmentation capabilities to video, addressing challenges like fast movement and object occlusion.

  • The model can segment any object in a video and create cutouts in a few clicks — with a free demo available to try.

  • Meta is also open-sourcing the model and releasing a large, annotated database of 50,000 videos used for training.

With Llama 3.1 last week and now SAM 2, Meta is continuing its strategy of developing massive AI breakthroughs — while making everything open and free to use.

BLACK FOREST LABS
🖼️ Is Flux the New King of Image Generation

Flux AI, developed by Black Forest Labs, is setting new benchmarks in AI image generation, surpassing popular tools like Midjourney.

Flux offers exceptional realism, particularly in rendering people and text, making it a strong contender in the AI space.

  • Flux is an open-source AI image generation model from Black Forest Labs.

  • It is available in three versions: Pro (for commercial use), Dev (mid-weight), and Schnell (fast).

  • Flux surpasses competitors like Midjourney, especially in rendering tasks involving people and text.

  • Future developments include a state-of-the-art text-to-video model.

As Flux AI continues to develop, it could become the go-to tool for AI image generation, potentially disrupting the market and challenging established players like Midjourney and DALL-E.

Tweet of the Day

LATEST DEVELOPMENTS

ROBOTICS
🦾 Figure Unveils Next-Gen Humanoid Robot

Figure AI, a startup backed by OpenAI, has introduced Figure 02, its next-generation AI-powered humanoid robot.

Designed to operate autonomously in complex environments, Figure 02 has already demonstrated its capabilities by working in a BMW factory.‌

The Details:    

  • Figure 02 is equipped with OpenAI’s AI models for speech-to-speech reasoning, allowing it to engage in full conversations with humans.

  • It utilizes a Vision Language Model (VLM) to make quick decisions based on visual input and self-correct errors.

  • The robot features six RGB cameras, providing 360-degree vision for navigating real-world environments.

  • Standing at 5'6" and weighing 132 lbs, Figure 02 has a lifting capacity of 44 lbs and can operate for up to 20 hours on a custom 2.25 KWh battery pack.

This development directly challenges Elon Musk’s Tesla Optimus, setting the stage for a new era in robotics. The partnership with OpenAI gives Figure AI a significant advantage in leveraging cutting-edge AI technology.

BYTEDANCE
📹 ByteDance Challenges Sora with Jimeng AI

ByteDance, the parent company of TikTok, has launched Jimeng AI, a text-to-video AI app for Chinese users.

This move positions ByteDance as a direct competitor to OpenAI’s yet-to-be-released Sora AI video model.

The Details:    

  • Jimeng AI is now available on the Apple App Store and Android for users in China.

  • The app allows users to create up to 2,050 images or 168 AI videos per month, with a subscription priced at 79 yuan ($11) monthly or 659 yuan ($92) annually.

  • Unlike OpenAI’s Sora, which remains unreleased, Jimeng AI is already accessible to the public.

The AI video generation market is rapidly expanding, with Chinese companies leading the charge.

With TikTok’s vast user base and access to extensive data, Jimeng AI is well-positioned to challenge global AI leaders like OpenAI.

OPENAI
🔊 OpenAI Begins ChatGPT Voice Rollout

OpenAI has begun a limited rollout of its hotly anticipated ‘Advanced Voice Mode’ for paying ChatGPT Plus users, offering natural, real-time conversations and the ability for the AI to detect and respond to emotions.

The Details:    

  • The feature will initially be available to a small group of ChatGPT Plus users, with plans to give all Plus users access by fall 2024.

  • Advanced Voice Mode uses GPT-4o and can sense emotions in users' voices, including sadness, excitement, or singing.

  • Video and screen-sharing capabilities, previously showcased in OpenAI’s early demo, will launch at a ‘later’ date.

Advanced Voice Mode’s ability to understand and respond to emotions in real-time convos could also have huge use cases in everything from customer service to mental health support.

🚀 Trending AI Tools

 ElevenLabs -Transform text into lifelike speech with a click.

🤖 Bardeen - AI agent that turns complex workflows into one click

📽️ Vidyo.AI - long videos into viral clips with a tap, saving you hours of editing

🎨 FLUX - Access FLUX, the new AI image generation model on Freepik premium

🎬 Meta SAM 2 - Segment anything, track an object across any videos

QUICK AI DEVELOPMENTS

💎 Meta has launched AI Studio, a platform powered by Llama 3.1, enabling users to build personalized AI chatbots for social media platforms like Instagram, Messenger, and WhatsApp.

👝 Upwork added OpenAI’s ChatGPT Enterprise to its platform, resulting in higher client spending and more frequent freelancer success in finding work

 Midjourney has released version 6.1 of its AI image generation platform, enhancing skin textures and text rendering for more realistic images and improved legibility — generally more beautiful.

🎨 Canva announced the acquisition of Leonardo AI, a startup known for its advanced image-generative AI platform

💥 Stability AI has launched Stable Fast3D, a new model that generates 3D images in just half a second, achieving a 1200 times speed increase over previous models.

Thanks for reading!

If you enjoyed this, please help spread the love by forwarding this Newsletter to a friend or colleague.

SPONSOR US

Get your product in front of over 4000+ AI enthusiasts

Our newsletter is read by thousands of tech professionals, investors, engineers, managers, and business owners around the world. 

FEEDBACK

How would you rate today's newsletter?

Vote below to help us improve the newsletter for you.

Login or Subscribe to participate in polls.

I hope to see you in the next one!

In Case You Missed The Latest Video