PDFTextExtractor API

The PDFTextExtractor module provides a JavaScript wrapper for native PDF text extraction functionality, enabling you to extract, search, and analyze text from PDF documents.

Import

import PDFTextExtractor from 'react-native-pdf-jsi/src/utils/PDFTextExtractor';

Methods

`isTextExtractionAvailable()`

Check if text extraction is available on the current platform.

const available = await PDFTextExtractor.isTextExtractionAvailable();
console.log('Text extraction available:', available);

Returns: Promise<boolean>

true if text extraction is supported
false if native module is not available

Example:

if (await PDFTextExtractor.isTextExtractionAvailable()) {
  // Proceed with text extraction
  const text = await PDFTextExtractor.extractAllText(pdfPath);
} else {
  console.warn('Text extraction not available on this platform');
}

`extractAllText(filePath)`

Extract text from all pages in a PDF document.

const textMap = await PDFTextExtractor.extractAllText('/path/to/document.pdf');
console.log(textMap);
// Output: { 0: "Page 1 text content...", 1: "Page 2 text content...", ... }

Parameters:

filePath (string): Absolute path to the PDF file

Returns: Promise<Object>

Object mapping page numbers (0-indexed) to extracted text

Example:

try {
  const textMap = await PDFTextExtractor.extractAllText(pdfPath);
  
  // Process each page
  Object.entries(textMap).forEach(([pageNum, text]) => {
    console.log(`Page ${parseInt(pageNum) + 1}: ${text.length} characters`);
  });
} catch (error) {
  console.error('Text extraction failed:', error);
}

`extractTextFromPages(filePath, pageNumbers)`

Extract text from specific pages.

const pageTexts = await PDFTextExtractor.extractTextFromPages(
  '/path/to/document.pdf',
  [0, 1, 2]  // Extract first 3 pages (0-indexed)
);

Parameters:

filePath (string): Absolute path to the PDF file
pageNumbers (number[]): Array of page indices (0-based)

Returns: Promise<Object>

Object mapping page numbers to extracted text

Example:

// Extract pages 1, 5, and 10 (convert to 0-indexed)
const pagesToExtract = [0, 4, 9];
const textMap = await PDFTextExtractor.extractTextFromPages(pdfPath, pagesToExtract);

console.log('Page 1:', textMap[0]);
console.log('Page 5:', textMap[4]);
console.log('Page 10:', textMap[9]);

`extractTextFromPage(filePath, pageNumber)`

Extract text from a single page.

const pageText = await PDFTextExtractor.extractTextFromPage(
  '/path/to/document.pdf',
  0  // First page (0-indexed)
);

Parameters:

filePath (string): Absolute path to the PDF file
pageNumber (number): Page index (0-based)

Returns: Promise<string>

Extracted text from the specified page

Example:

// Extract text from page 5 (0-indexed as 4)
const page5Text = await PDFTextExtractor.extractTextFromPage(pdfPath, 4);
console.log('Page 5 text:', page5Text);

// Word count
const wordCount = page5Text.split(/\s+/).filter(w => w.length > 0).length;
console.log('Word count:', wordCount);

`searchText(filePath, searchQuery, options)`

Search for text within a PDF document.

const results = await PDFTextExtractor.searchText(
  '/path/to/document.pdf',
  'search term',
  { caseSensitive: false }
);

Parameters:

filePath (string): Absolute path to the PDF file
searchQuery (string): Text to search for
options (Object, optional):
- caseSensitive (boolean): Whether search is case-sensitive (default: false)

Returns: Promise<Array>

Array of search result objects

Result Object:

{
  page: number,      // Page number (0-indexed)
  position: number,  // Character position in page text
  context: string    // Surrounding context (~50 chars before/after)
}

Example:

// Case-insensitive search
const results = await PDFTextExtractor.searchText(pdfPath, 'react native', {
  caseSensitive: false
});

console.log(`Found ${results.length} results`);

results.forEach((result, index) => {
  console.log(`\nResult ${index + 1}:`);
  console.log(`  Page: ${result.page + 1}`);
  console.log(`  Position: ${result.position}`);
  console.log(`  Context: "${result.context}"`);
});

// Case-sensitive search
const exactResults = await PDFTextExtractor.searchText(pdfPath, 'ReactNative', {
  caseSensitive: true
});

`getTextStatistics(filePath)`

Get text statistics for a PDF document.

const stats = await PDFTextExtractor.getTextStatistics('/path/to/document.pdf');
console.log(stats);

Parameters:

filePath (string): Absolute path to the PDF file

Returns: Promise<Object>

Statistics Object:

{
  totalPages: number,              // Total number of pages
  totalCharacters: number,         // Total character count
  totalWords: number,              // Total word count
  averageWordsPerPage: number,     // Average words per page
  averageCharactersPerPage: number // Average characters per page
}

Example:

const stats = await PDFTextExtractor.getTextStatistics(pdfPath);

console.log('Document Statistics:');
console.log(`  Pages: ${stats.totalPages}`);
console.log(`  Characters: ${stats.totalCharacters.toLocaleString()}`);
console.log(`  Words: ${stats.totalWords.toLocaleString()}`);
console.log(`  Avg. Words/Page: ${stats.averageWordsPerPage}`);
console.log(`  Avg. Chars/Page: ${stats.averageCharactersPerPage}`);

// Estimate reading time (assuming 200 words/minute)
const readingMinutes = Math.ceil(stats.totalWords / 200);
console.log(`  Est. Reading Time: ${readingMinutes} minutes`);

Complete Examples

Example 1: Extract and Display Text

import React, { useState } from 'react';
import { View, Button, Text, ScrollView } from 'react-native';
import PDFTextExtractor from 'react-native-pdf-jsi/src/utils/PDFTextExtractor';

const TextExtractorDemo = ({ pdfPath }) => {
  const [extractedText, setExtractedText] = useState('');
  const [loading, setLoading] = useState(false);

  const extractText = async () => {
    setLoading(true);
    try {
      const textMap = await PDFTextExtractor.extractAllText(pdfPath);
      
      // Combine all pages
      const allText = Object.values(textMap).join('\n\n--- Page Break ---\n\n');
      setExtractedText(allText);
    } catch (error) {
      console.error('Extraction failed:', error);
      setExtractedText('Error: ' + error.message);
    }
    setLoading(false);
  };

  return (
    <View style={{ flex: 1, padding: 20 }}>
      <Button 
        title={loading ? 'Extracting...' : 'Extract Text'} 
        onPress={extractText}
        disabled={loading}
      />
      
      {extractedText && (
        <ScrollView style={{ marginTop: 20, flex: 1 }}>
          <Text>{extractedText}</Text>
        </ScrollView>
      )}
    </View>
  );
};

export default TextExtractorDemo;

Example 2: Search with Highlighting

import React, { useState } from 'react';
import { View, TextInput, Button, FlatList, Text } from 'react-native';
import PDFTextExtractor from 'react-native-pdf-jsi/src/utils/PDFTextExtractor';

const SearchDemo = ({ pdfPath }) => {
  const [searchQuery, setSearchQuery] = useState('');
  const [results, setResults] = useState([]);
  const [searching, setSearching] = useState(false);

  const performSearch = async () => {
    if (!searchQuery.trim()) return;
    
    setSearching(true);
    try {
      const searchResults = await PDFTextExtractor.searchText(
        pdfPath,
        searchQuery,
        { caseSensitive: false }
      );
      
      setResults(searchResults);
    } catch (error) {
      console.error('Search failed:', error);
    }
    setSearching(false);
  };

  return (
    <View style={{ flex: 1, padding: 20 }}>
      <TextInput
        value={searchQuery}
        onChangeText={setSearchQuery}
        placeholder="Search in PDF..."
        style={{
          borderWidth: 1,
          borderColor: '#ccc',
          padding: 10,
          borderRadius: 5
        }}
      />
      
      <Button 
        title={searching ? 'Searching...' : 'Search'}
        onPress={performSearch}
        disabled={searching || !searchQuery.trim()}
      />
      
      <Text style={{ marginTop: 20, fontWeight: 'bold' }}>
        {results.length} results found
      </Text>
      
      <FlatList
        data={results}
        keyExtractor={(item, index) => index.toString()}
        renderItem={({ item, index }) => (
          <View style={{ 
            padding: 10, 
            marginVertical: 5, 
            backgroundColor: '#f0f0f0',
            borderRadius: 5 
          }}>
            <Text style={{ fontWeight: 'bold' }}>
              Result {index + 1} - Page {item.page + 1}
            </Text>
            <Text style={{ marginTop: 5, fontStyle: 'italic' }}>
              "{item.context}"
            </Text>
          </View>
        )}
      />
    </View>
  );
};

export default SearchDemo;

Example 3: Document Statistics Dashboard

import React, { useState, useEffect } from 'react';
import { View, Text, ActivityIndicator } from 'react-native';
import PDFTextExtractor from 'react-native-pdf-jsi/src/utils/PDFTextExtractor';

const StatsDashboard = ({ pdfPath }) => {
  const [stats, setStats] = useState(null);
  const [loading, setLoading] = useState(true);

  useEffect(() => {
    loadStats();
  }, [pdfPath]);

  const loadStats = async () => {
    try {
      const statistics = await PDFTextExtractor.getTextStatistics(pdfPath);
      setStats(statistics);
    } catch (error) {
      console.error('Failed to load stats:', error);
    }
    setLoading(false);
  };

  if (loading) {
    return <ActivityIndicator size="large" />;
  }

  if (!stats) {
    return <Text>No statistics available</Text>;
  }

  const readingTime = Math.ceil(stats.totalWords / 200);

  return (
    <View style={{ padding: 20 }}>
      <Text style={{ fontSize: 20, fontWeight: 'bold', marginBottom: 20 }}>
        Document Statistics
      </Text>
      
      <StatItem 
        label="Total Pages" 
        value={stats.totalPages} 
      />
      <StatItem 
        label="Total Words" 
        value={stats.totalWords.toLocaleString()} 
      />
      <StatItem 
        label="Total Characters" 
        value={stats.totalCharacters.toLocaleString()} 
      />
      <StatItem 
        label="Avg. Words/Page" 
        value={stats.averageWordsPerPage} 
      />
      <StatItem 
        label="Estimated Reading Time" 
        value={`${readingTime} minutes`} 
      />
    </View>
  );
};

const StatItem = ({ label, value }) => (
  <View style={{ 
    flexDirection: 'row', 
    justifyContent: 'space-between',
    paddingVertical: 10,
    borderBottomWidth: 1,
    borderBottomColor: '#eee'
  }}>
    <Text style={{ fontSize: 16 }}>{label}:</Text>
    <Text style={{ fontSize: 16, fontWeight: 'bold' }}>{value}</Text>
  </View>
);

export default StatsDashboard;

Error Handling

All methods may throw errors. Always use try-catch blocks:

try {
  const text = await PDFTextExtractor.extractAllText(pdfPath);
} catch (error) {
  if (error.message.includes('not available')) {
    // Native module not available
    console.log('Text extraction not supported');
  } else if (error.message.includes('file not found')) {
    // File doesn't exist
    console.log('PDF file not found');
  } else {
    // Other errors
    console.error('Extraction error:', error);
  }
}

TypeScript

interface PDFTextExtractorStatic {
  isTextExtractionAvailable(): Promise<boolean>;
  extractAllText(filePath: string): Promise<{[page: number]: string}>;
  extractTextFromPages(filePath: string, pageNumbers: number[]): Promise<{[page: number]: string}>;
  extractTextFromPage(filePath: string, pageNumber: number): Promise<string>;
  searchText(
    filePath: string,
    searchQuery: string,
    options?: { caseSensitive?: boolean }
  ): Promise<Array<{
    page: number;
    position: number;
    context: string;
  }>>;
  getTextStatistics(filePath: string): Promise<{
    totalPages: number;
    totalCharacters: number;
    totalWords: number;
    averageWordsPerPage: number;
    averageCharactersPerPage: number;
  }>;
}

Platform Support

Platform	Support	Notes
Android	✅ Full	Native text extraction
iOS	✅ Full	Native text extraction
Windows	⚠️ Limited	Check availability

Performance Tips

Cache Results: Text extraction can be slow for large documents. Cache results when possible.

const textCache = new Map();

async function getCachedText(pdfPath) {
  if (textCache.has(pdfPath)) {
    return textCache.get(pdfPath);
  }
  
  const text = await PDFTextExtractor.extractAllText(pdfPath);
  textCache.set(pdfPath, text);
  return text;
}

Extract Only What You Need: Use extractTextFromPages() instead of extractAllText() when you only need specific pages.
Debounce Search: Debounce search queries to avoid excessive extraction operations.

import { debounce } from 'lodash';

const debouncedSearch = debounce(async (query) => {
  const results = await PDFTextExtractor.searchText(pdfPath, query);
  setResults(results);
}, 300);

Import​

Methods​

isTextExtractionAvailable()​

extractAllText(filePath)​

extractTextFromPages(filePath, pageNumbers)​

extractTextFromPage(filePath, pageNumber)​

searchText(filePath, searchQuery, options)​

getTextStatistics(filePath)​

Complete Examples​

Example 1: Extract and Display Text​

Example 2: Search with Highlighting​

Example 3: Document Statistics Dashboard​

Error Handling​

TypeScript​

Platform Support​

Performance Tips​

See Also​

Import

Methods

`isTextExtractionAvailable()`

`extractAllText(filePath)`

`extractTextFromPages(filePath, pageNumbers)`

`extractTextFromPage(filePath, pageNumber)`

`searchText(filePath, searchQuery, options)`

`getTextStatistics(filePath)`

Complete Examples

Example 1: Extract and Display Text

Example 2: Search with Highlighting

Example 3: Document Statistics Dashboard

Error Handling

TypeScript

Platform Support

Performance Tips

See Also