
How to Build an AI-Powered React Native App in 2026

Paweł Karniej·March 2026


There are three ways to build a React Native AI app right now: call AI APIs through edge functions, run models on-device with ExecuTorch, or combine both. After shipping AI features in 10+ apps over the past year, I can tell you that option one -- API calls through edge functions -- is the only approach that reliably works in production today.

On-device inference sounds great in theory. In practice, the models that fit on a phone are too limited for anything users actually want. ExecuTorch is improving fast, but for image generation, speech synthesis, and GPT-level chat, you still need server-side models. The hybrid approach (on-device for simple tasks, API for heavy lifting) is where things are heading, but in 2026, routing your AI calls through edge functions is the architecture that ships.

This guide covers six AI capabilities I have built into production React Native apps, with the actual patterns that work.

Table of Contents

  • The Edge Function Pattern
  • Image Generation (DALL-E 3, Flux Pro, SDXL)
  • Video Generation (Minimax, Stable Video Diffusion)
  • Text-to-Speech (ElevenLabs)
  • Speech-to-Text (Whisper)
  • Image Analysis (GPT-4 Vision)
  • AI Chat (GPT-4o)
  • Cost Breakdown: What AI Actually Costs
  • What Actually Works in Production
  • Common Mistakes
  • Getting Started Fast
    The Edge Function Pattern

    Before diving into specific AI features, you need to understand the one rule that matters: never put API keys in your mobile app.

    I cannot stress this enough. React Native apps can be decompiled. JavaScript bundles can be extracted. If your OpenAI API key is in your client code, someone will find it, and you will wake up to a $2,000 bill from someone generating images on your dime.

    The pattern that works:

    React Native App
        -> (authenticated request)
    Convex Function (edge)
        -> (API key stored server-side)
    OpenAI / Replicate / ElevenLabs
        -> (response)
    Convex Function
        -> (store result, deduct credits)
    React Native App

    Every AI call goes through a backend function. I use Convex for this because it gives me auth, database, and functions in a single service. No separate auth provider, no separate database, no separate hosting for functions. One service, one bill, and it handles the complexity of real-time subscriptions so your UI updates instantly when an AI job completes.

    The Convex function validates the user is authenticated, checks they have enough credits, makes the API call with a securely stored key, saves the result, deducts credits, and returns the response. All in one place.

    Image Generation (DALL-E 3, Flux Pro, SDXL)

    Image generation is the most popular AI feature in my apps. The pattern is straightforward: user sends a prompt, your edge function calls the API, you get back a URL or base64 image.

    DALL-E 3 is the simplest to integrate. One API call, one response, done. Quality is excellent for general-purpose images. Cost is $0.04-0.08 per image depending on resolution.

    Flux Pro (via Replicate) gives better results for photorealistic images and costs roughly $0.05 per generation. The catch is that Replicate uses a prediction model -- you submit a job and poll for results, which adds complexity.

    SDXL is cheaper ($0.01-0.02 per image) but quality varies more. Good for high-volume use cases where you need to keep costs down.

    The edge function pattern for image generation:

    // convex/ai/generateImage.ts
    import { v } from "convex/values";
    import { action } from "../_generated/server";
    import { internal } from "../_generated/api";
    import { getAuthUserId } from "@convex-dev/auth/server";
    export const generate = action({
      args: {
        prompt: v.string(),
        model: v.union(
          v.literal("dall-e-3"),
          v.literal("flux-pro")
        ),
      },
      handler: async (ctx, args) => {
        const userId = await getAuthUserId(ctx);
        if (!userId) throw new Error("Not authenticated");
    
        // Check credits before making expensive API call
        const user = await ctx.runQuery(
          internal.users.getCredits, { userId }
        );
        if (user.credits < 1) {
          throw new Error("Insufficient credits");
        }
    
        // Call AI API with server-side key
        const imageUrl = await callImageAPI(
          args.prompt, args.model
        );
    
        // Store result and deduct credits atomically
        await ctx.runMutation(internal.images.store, {
          userId, imageUrl, prompt: args.prompt,
        });
        await ctx.runMutation(internal.users.deductCredits, {
          userId, amount: 1,
        });
    
        return { imageUrl };
      },
    });

    The important detail: check credits before making the API call. AI APIs are not free, and you do not want to generate an image for a user who cannot pay for it.

    Video Generation (Minimax, Stable Video Diffusion)

    Video generation is a different beast. Unlike image generation, which returns in 5-10 seconds, video generation takes 30 seconds to several minutes. You cannot hold an HTTP connection open that long on mobile.

    The pattern that works is async polling:

  • User submits a video prompt
  • Edge function creates a job record in the database with status "processing"
  • Edge function kicks off the generation via a scheduled action
  • The scheduled action polls the AI provider until the video is ready
  • When complete, update the database record with the video URL
  • Client is subscribed to that record via Convex real-time -- UI updates automatically
    // Simplified async pattern
    export const startVideoGeneration = action({
      args: { prompt: v.string() },
      handler: async (ctx, args) => {
        const userId = await getAuthUserId(ctx);
        if (!userId) throw new Error("Not authenticated");

        // Create a pending job
        const jobId = await ctx.runMutation(
          internal.jobs.create,
          { userId, status: "processing", prompt: args.prompt }
        );
        // Schedule the actual generation
        await ctx.scheduler.runAfter(0, internal.ai.processVideo, {
          jobId, prompt: args.prompt,
        });
        return { jobId };
      },
    });

    On the React Native side, you subscribe to the job record. When the status changes from "processing" to "complete," the UI updates and shows the video. No manual polling from the client, no WebSocket setup. Convex handles the real-time sync.
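The poll loop inside the scheduled action can be sketched as a generic helper. This is an illustration, not the kit's actual code; `check` would wrap a provider call such as fetching a Replicate prediction's status:

```typescript
// Generic polling helper: call `check` until it returns a non-null
// result or the attempt budget is exhausted. Delays grow exponentially
// so a slow provider is not hammered with requests.
async function pollUntilDone<T>(
  check: () => Promise<T | null>,
  { maxAttempts = 20, baseDelayMs = 2000 } = {}
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await check();
    if (result !== null) return result;
    // 2s, 4s, 8s, ... capped at 30s between checks
    await new Promise((resolve) =>
      setTimeout(resolve, Math.min(baseDelayMs * 2 ** attempt, 30_000))
    );
  }
  throw new Error("Video generation timed out");
}
```

When the helper resolves, the scheduled action writes the video URL into the job record and the subscribed client updates on its own.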

    Minimax is my current go-to for video generation. Roughly $0.10-0.30 per video depending on length and resolution. Stable Video Diffusion through Replicate is cheaper but slower and lower quality.

    Text-to-Speech (ElevenLabs)

    ElevenLabs produces the most natural-sounding speech of any TTS API right now. The integration pattern is streaming audio back to the client.

    Your edge function sends text to the ElevenLabs API and gets back an audio stream. You have two options: stream the audio chunks directly to the client, or wait for the full audio file and return it as a URL.

    For short text (under 500 characters), waiting for the full response is fine. For longer content, you want to store the audio file and return a URL the client can stream from.

    // Edge function returns an audio storage URL.
    // `elevenLabsAPI` is a thin wrapper around the ElevenLabs REST API.
    const audioResponse = await elevenLabsAPI.textToSpeech({
      text: args.text,
      voice_id: args.voiceId,
      model_id: "eleven_multilingual_v2",
    });
    
    // Store the audio buffer
    const storageId = await ctx.storage.store(
      new Blob([audioResponse])
    );
    const audioUrl = await ctx.storage.getUrl(storageId);

    On the React Native side, you play the audio URL with expo-av. Cost is roughly $0.30 per 1,000 characters with ElevenLabs, which adds up fast. This is one of those features that absolutely needs a credit system.

    Speech-to-Text (Whisper)

    Whisper transcription follows a file upload pattern. The user records audio on their phone, you upload the audio file to your backend, and the edge function sends it to the Whisper API.

    The key detail: audio files from mobile recordings can be large. You want to record in a compressed format (m4a or webm) rather than raw wav. Expo's audio API lets you configure this.

    // React Native: record in compressed format (expo-av)
    import { Audio } from 'expo-av';

    const recording = new Audio.Recording();
    await recording.prepareToRecordAsync({
      // Start from the preset so the required fields
      // (sample rate, channels, bit rate) are filled in
      ...Audio.RecordingOptionsPresets.HIGH_QUALITY,
      android: {
        ...Audio.RecordingOptionsPresets.HIGH_QUALITY.android,
        extension: '.m4a',
        outputFormat: Audio.AndroidOutputFormat.MPEG_4,
        audioEncoder: Audio.AndroidAudioEncoder.AAC,
      },
      ios: {
        ...Audio.RecordingOptionsPresets.HIGH_QUALITY.ios,
        extension: '.m4a',
        outputFormat: Audio.IOSOutputFormat.MPEG4AAC,
      },
    });

    Upload the file to Convex storage, then pass it to a function that calls the Whisper API. Whisper accepts files up to 25MB, and a 5-minute m4a recording is typically under 2MB, so you are well within limits.

    Cost is $0.006 per minute of audio. Extremely cheap. This is one of the few AI features where cost is almost negligible.
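Those limits are easy to encode as guards in the upload path. The constants come from the figures above; the helper names are my own:

```typescript
// Whisper accepts files up to 25 MB and charges $0.006 per minute.
const WHISPER_MAX_BYTES = 25 * 1024 * 1024;
const WHISPER_USD_PER_MINUTE = 0.006;

// Reject oversized uploads before they ever reach the API.
function fitsWhisperLimit(fileSizeBytes: number): boolean {
  return fileSizeBytes > 0 && fileSizeBytes <= WHISPER_MAX_BYTES;
}

// Estimate the transcription cost for a recording.
function whisperCostUsd(durationMinutes: number): number {
  return durationMinutes * WHISPER_USD_PER_MINUTE;
}
```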

    Image Analysis (GPT-4 Vision)

    GPT-4 Vision lets users take a photo and get AI analysis. I used this in AIVidly for analyzing video thumbnails and in other apps for receipt scanning and document analysis.

    The pattern: capture or select an image on the client, convert to base64, send to your edge function.

    // React Native: convert image to base64
    import * as FileSystem from 'expo-file-system';
    
    const base64Image = await FileSystem.readAsStringAsync(
      imageUri,
      { encoding: FileSystem.EncodingType.Base64 }
    );
    
    // Send to your backend
    await analyzeImage({ imageBase64: base64Image });

    On the backend, you pass the base64 string to the OpenAI API with a system prompt that defines what kind of analysis you want.

    One gotcha: base64 images are roughly 33% larger than the original file. A 3MB photo becomes a 4MB base64 string. This matters for upload times on slow connections. Resize images on the client before encoding -- 1024x1024 is usually sufficient for analysis and keeps the payload manageable.
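The 33% figure follows directly from how base64 works: every 3 bytes of input become 4 output characters. A quick way to estimate payload size before uploading (the helper name is illustrative):

```typescript
// Base64 encodes each 3-byte group as 4 ASCII characters,
// so the encoded string is ~4/3 the size of the raw file.
function base64SizeBytes(rawBytes: number): number {
  return Math.ceil(rawBytes / 3) * 4;
}
```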

    Cost is based on token count, which depends on image resolution. A typical analysis runs $0.01-0.03 per image.

    AI Chat (GPT-4o)

    AI chat is the feature users expect most, and it is the trickiest to get right in a mobile app. The key challenge is streaming responses so the user sees text appearing in real-time, not waiting 5-10 seconds for a complete response.

    The pattern I use: the client calls a Convex action that streams the OpenAI response and progressively updates a message record in the database. The client subscribes to that record and sees the text appear token by token.

    // Convex action: stream and save progressively
    const stream = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: conversationHistory,
      stream: true,
    });
    
    let fullResponse = "";
    let lastSaved = 0;
    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content || "";
      fullResponse += content;
    
      // Update the message roughly every 50 characters.
      // (Checking length % 50 === 0 would skip updates, since
      // chunks usually carry several characters at a time.)
      if (fullResponse.length - lastSaved >= 50) {
        lastSaved = fullResponse.length;
        await ctx.runMutation(internal.messages.update, {
          messageId,
          content: fullResponse,
          status: "streaming",
        });
      }
    }
    
    // Final write so the last partial batch is never lost
    await ctx.runMutation(internal.messages.update, {
      messageId,
      content: fullResponse,
      status: "complete",
    });

    Conversation history matters. You need to store previous messages and send them with each new request so the AI has context. But be careful -- GPT-4o charges per token, and sending 50 previous messages with every request gets expensive fast. I limit conversation history to the last 20 messages and summarize older context.
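The trimming step can be as simple as keeping the system prompt plus the most recent messages. A sketch; summarizing older context is a separate call and is omitted here:

```typescript
type ChatMessage = {
  role: "system" | "user" | "assistant";
  content: string;
};

// Keep any system prompt plus the last `limit` conversation messages.
function trimHistory(messages: ChatMessage[], limit = 20): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const conversation = messages.filter((m) => m.role !== "system");
  return [...system, ...conversation.slice(-limit)];
}
```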

    Cost for GPT-4o is roughly $2.50 per million input tokens and $10 per million output tokens. For a typical chat session (10 messages back and forth), that is about $0.01-0.03. Cheap per session, but it adds up with active users.

    Cost Breakdown: What AI Actually Costs

    Here is what AI API costs look like for a small app with 100-500 monthly active users:

    | Feature | Cost per use | Monthly estimate |
    |---------|--------------|------------------|
    | Image Generation (DALL-E 3) | $0.04-0.08 | $4-8 |
    | Video Generation (Minimax) | $0.10-0.30 | $2-5 |
    | Text-to-Speech (ElevenLabs) | ~$0.01-0.05 per request | $2-4 |
    | Speech-to-Text (Whisper) | $0.006/min | $0.50-1 |
    | Image Analysis (GPT-4V) | $0.01-0.03 | $1-2 |
    | AI Chat (GPT-4o) | $0.01-0.03/session | $1-3 |

    Total: roughly $5-15/month for a small app. That is manageable, but it scales linearly with users. At 5,000 MAU, you are looking at $50-150/month. At 50,000 MAU, $500-1,500/month.

    This is why you need two things: a credit system and a paywall.

    The credit system limits how much any single user can consume. Give free users 10-20 credits, charge for more. The paywall ensures you have revenue to cover API costs before they spiral.

    I learned this the hard way. One of my early apps had unlimited AI generation for paying users. A few power users ran up $200 in API costs in a single month on a $4.99 subscription. Credits solve this.
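The core of a credit system is a small, boring check. A sketch with illustrative per-action prices; the real version lives in a mutation so the read and the deduct happen atomically:

```typescript
// Per-action credit prices (illustrative numbers, not the kit's).
const CREDIT_COSTS = { image: 1, video: 5, tts: 1, chat: 1 } as const;

type Feature = keyof typeof CREDIT_COSTS;

// Returns the new balance, or null if the user cannot afford the action.
function tryDeduct(balance: number, feature: Feature): number | null {
  const cost = CREDIT_COSTS[feature];
  return balance >= cost ? balance - cost : null;
}
```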

    What Actually Works in Production

    After building AIVidly and several other AI-powered React Native apps, here is what I know works:

    Edge functions are non-negotiable. Every AI call goes through a backend function. No exceptions. This is not just about security -- it is about control. You need to track usage, enforce limits, and handle errors gracefully. You cannot do any of that from the client.

    Convex as the backend simplifies everything. Before Convex, I was stitching together Supabase for the database, a separate auth provider, and Vercel functions for AI calls. Three services, three sets of docs, three potential failure points. Convex handles auth, database, file storage, and server functions in one place. Real-time subscriptions mean the UI updates instantly when an AI job completes -- no manual polling.

    Credit systems are essential. Every AI feature costs money. Users will abuse unlimited plans. A credit-based system with clear pricing per action is the fairest model and the easiest to maintain financially.

    Timeouts need explicit handling. Mobile networks are unreliable. AI APIs are sometimes slow. Your app needs to handle both gracefully. I set a 30-second timeout on most AI calls and show clear UI states for loading, success, and failure. For longer operations like video generation, the async pattern with database-backed job tracking is the only reliable approach.
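The 30-second guard is a small wrapper around Promise.race. A sketch; in production you would also abort the underlying fetch via an AbortController so the request itself stops:

```typescript
// Reject if `promise` has not settled within `ms` milliseconds.
async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("AI call timed out")), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```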

    Users care about speed more than quality. I tested DALL-E 3 (higher quality, slower) versus SDXL (lower quality, faster) in an A/B test. Users preferred the faster option. Response time under 5 seconds is the target for any synchronous AI feature.

    Caching saves money. If two users generate the same prompt, serve the cached result. Simple to implement, and it cuts costs significantly for popular prompts.
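A cache lookup only needs a deterministic key. Hashing a normalized prompt together with the model name works well (a sketch using Node's crypto module; the normalization rules are up to you):

```typescript
import { createHash } from "crypto";

// Identical model + prompt pairs map to the same key, so the second
// request can be served from storage instead of a fresh API call.
function cacheKey(model: string, prompt: string): string {
  return createHash("sha256")
    .update(`${model}:${prompt.trim().toLowerCase()}`)
    .digest("hex");
}
```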

    Common Mistakes

    I have made most of these mistakes myself. Save yourself the trouble.

    Putting API keys in client code. I already covered this, but it bears repeating. React Native JavaScript bundles are not secure. Anyone with basic tooling can extract strings from your app binary. Put keys in environment variables on your backend, never in your React Native code.

    Not handling timeouts. AI APIs do not always respond in 2 seconds. Sometimes they take 30 seconds. Sometimes they time out entirely. If your app does not handle this, users see a frozen screen and think the app is broken. Always implement timeouts, loading states, and retry logic.

    Ignoring rate limits. OpenAI, Replicate, and ElevenLabs all have rate limits. If you hit them, your app breaks for everyone. Implement queuing on your backend so requests are spaced out. Convex's scheduler makes this straightforward -- you can throttle concurrent AI calls per user.
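The throttling idea can be sketched as a tiny in-memory concurrency limiter. Convex's scheduler gives you the durable, server-side version of this; the sketch just shows the mechanism:

```typescript
// Allow at most `max` tasks to run at once; extra tasks wait FIFO.
function createLimiter(max: number) {
  let active = 0;
  const waiting: (() => void)[] = [];
  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= max) {
      // Park this caller until a running task finishes
      await new Promise<void>((resolve) => waiting.push(resolve));
    }
    active++;
    try {
      return await task();
    } finally {
      active--;
      waiting.shift()?.(); // wake the next waiter, if any
    }
  };
}
```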

    Not implementing a credit system from day one. "I will add monetization later" is how you end up with a $500 API bill and no revenue. Build the credit system before you launch. Every AI action costs credits. Free tier gets a limited number. Paid users get more. This is not optional.

    Sending raw, uncompressed media. A 10MB photo sent as base64 is 13MB over the wire. Resize images before sending. Compress audio before uploading. Your users on cellular data will thank you, and your API costs will be lower because of smaller input sizes.

    No error handling for content moderation. OpenAI and other providers reject certain prompts. Your app needs to handle rejection responses gracefully with a clear message to the user, not a cryptic error or crash.
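A small error mapper keeps the moderation case from surfacing as a crash. The error codes below illustrate the shape of the pattern; verify the exact values against each provider's error documentation:

```typescript
// Map provider error codes to messages a user can act on.
// The codes are illustrative; check your provider's docs.
function friendlyAiError(err: { code?: string }): string {
  switch (err.code) {
    case "content_policy_violation":
      return "That prompt isn't allowed. Try rephrasing it.";
    case "rate_limit_exceeded":
      return "We're handling a lot of requests. Try again in a moment.";
    default:
      return "Something went wrong. Please try again.";
  }
}
```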

    Getting Started Fast

    Building all of these AI integrations from scratch takes weeks. I know because I did it, multiple times, across multiple apps. The auth flow, the credit system, the edge function patterns, the error handling, the streaming infrastructure -- it is a lot of work before you even get to the AI features themselves.

    That is why I built Ship React Native. It includes all 8 AI integrations pre-built and production-tested: image generation with multiple providers, video generation with async job handling, text-to-speech, speech-to-text, image analysis, AI chat with streaming, plus AI-powered coaching and newsletter analysis patterns. The credit system, paywall, and Convex backend are all wired up and ready to go.

    You can tear it apart and learn from the code, or use it as a foundation to ship your AI-powered React Native app in days instead of months. Either way, the patterns in this guide are exactly what is running in production inside the kit.

    The best time to build a React Native AI app was six months ago. The second best time is now. The APIs are mature, the patterns are proven, and mobile users are hungry for AI features that actually work well on their phones.