Supported Formats and Limitations
- Supported formats: mp3, wav, m4a, flac, ogg, opus, mp4, mov, avi, mkv. Sending files by URL is also supported by the API.
- Maximum file size: 1 GB. If you need more, write to Support.
- Minimum audio length: 0.3 seconds.
- Maximum audio length: 10 hours.
- Rate limit: 10 requests per second.
To convert a video file to a supported audio format, you can use ffmpeg:

ffmpeg -i input.mp4 -vn -c:a aac -b:a 192k output.m4a

In this example, the video file input.mp4 is converted to output.m4a with an audio bitrate of 192 kbps.

Examples
Click the picker on the right (verbose_json_example, json_example, or diarization_example) to view the example responses.
Authorizations
Use your API key as a Bearer token in the Authorization header. Example: Authorization: Bearer nx-yourkey
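A minimal sketch in Python (using the requests library) of attaching the key as a Bearer token; nx-yourkey is the placeholder from the example above:

```python
import requests

# Reuse one session so every request carries the Authorization header.
# Replace the placeholder key with your real API key.
session = requests.Session()
session.headers.update({"Authorization": "Bearer nx-yourkey"})
```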
Body
The audio file object (not filename) to transcribe, in one of the supported formats. Either file or url must be sent.
The URL of the audio file to transcribe, in one of the supported formats. This option is unsupported by the OpenAI SDK. Either file or url must be sent.
"https://upload.wikimedia.org/wikipedia/commons/a/a1/Gettysburg_by_Britton.ogg"
The task to perform. Currently only 'transcribe' and 'diarize' are supported. transcribe just transcribes the audio, while diarize also identifies different speakers in the audio and attributes transcribed segments to each one.
Available options: transcribe, diarize. Example: "transcribe"
ID of the model to use. Only whisper-1 is currently available.
"whisper-1"
The language of the input audio (ISO-639-1 format). Auto-detected if omitted.
"ru"
The format of the transcript output. srt and vtt formats will return ready-to-use formatted subtitles. If the diarize task is used, the response will always be a json object.
Available options: json, text, srt, verbose_json, vtt. Example: "verbose_json"
The number of speakers to detect. If not provided, the model will detect the number of speakers automatically. Note that this parameter does not guarantee the exact number of speakers, but it can guide the model. This parameter is ignored if the task is not diarize.
Example: 2
The config for the diarization model. general is the default setting, meeting is optimized for meetings, and telephonic is optimized for telephone conversations. This parameter is ignored if the task is not diarize.
Available options: general, meeting, telephonic. Example: "telephonic"
Timestamp granularities to include. word requires response_format to be verbose_json.
Available options: segment, word. Example: "segment"
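A sketch of a full transcription request with these body fields, using Python's requests library. The endpoint URL below is a placeholder assumption, not taken from this page; the field names mirror the parameters described above (file, url, model, task, language, response_format):

```python
import requests

# Placeholder endpoint: substitute the actual URL from this API reference.
API_URL = "https://api.example.com/v1/audio/transcriptions"

with open("meeting.mp3", "rb") as audio:
    response = requests.post(
        API_URL,
        headers={"Authorization": "Bearer nx-yourkey"},
        files={"file": audio},                  # the audio file object, not a filename
        data={
            "model": "whisper-1",               # only whisper-1 is currently available
            "task": "transcribe",               # or "diarize" to attribute segments to speakers
            "language": "ru",                   # ISO-639-1; auto-detected if omitted
            "response_format": "verbose_json",  # json, text, srt, verbose_json, or vtt
            # To transcribe a remote file instead, omit files= and send "url" here.
        },
    )

response.raise_for_status()
result = response.json()
print(result["text"])
```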
Response
Successful transcription or diarization response. The format depends on the 'response_format' parameter.
The full transcribed text.
The task that was performed. Currently, always returns transcribe.
The language of the input audio.
The duration of the input audio.
Segments of the transcribed text and their details.
Extracted words and their corresponding timestamps.
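As a sketch, assuming the standard verbose_json field names (text, task, language, duration, segments, words) and continuing from the result object in the request example above, the response fields can be read like this:

```python
print(result["text"])        # the full transcribed text
print(result["task"])        # the task that was performed
print(result["language"])    # the language of the input audio
print(result["duration"])    # the duration of the input audio

# Segments of the transcribed text and their details.
for segment in result.get("segments", []):
    print(segment)

# Extracted words and their timestamps (word-level granularity, verbose_json only).
for word in result.get("words", []):
    print(word)
```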