package bio_io

In_channel for FASTA records. For more general info, see the Record_in_channel module mli file.

Examples

Return all records in a list

Simplest way. May raise exceptions.

let records = Fasta.In_channel.with_file_records_exn fname 

A bit more involved, but you won't get exceptions. Instead, you have to handle the Or_error.t.

let records =
  match Fasta.In_channel.with_file_records name with
  | Error err ->
      eprintf "Problem reading records: %s\n" (Error.to_string_hum err);
      exit 1
  | Ok records -> records
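Most functions in this module come in pairs: an _exn version that raises, and a version that returns an Or_error.t. Here's a small standard-library sketch of that relationship. It uses a plain (_, string) result as a stand-in for Base's Or_error.t, and the helper name to_result is hypothetical, not part of bio_io.

```ocaml
(* Hypothetical helper: turn an exception-raising function into one that
   returns a result. This mirrors how an _exn function relates to its
   Or_error.t counterpart, but uses a plain string error instead of
   Base's Error.t. *)
let to_result (f : 'a -> 'b) (x : 'a) : ('b, string) result =
  match f x with
  | v -> Ok v
  | exception e -> Error (Printexc.to_string e)

(* Usage: the caller handles the error case once, at the end. *)
let parse_count s =
  match to_result int_of_string s with
  | Ok n -> n
  | Error msg ->
      prerr_endline ("Problem parsing: " ^ msg);
      0
```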

Iterating over records

Use the iter functions when you need to go over each record and perform some side effects with it.

Print sequence IDs and sequence lengths

let () =
  Fasta.In_channel.with_file_iter_records_exn "sequences.fasta"
    ~f:(fun record ->
      let open Fasta.Record in
      printf "%s => %d\n" (id record) (seq_length record))

Print sequence index, IDs, and sequence lengths.

This is like the last example except that we also want to print the index. The index passed to f is 0-based (the first record is 0, the second is 1, etc.), so the example adds 1 to print a 1-based count.

let () =
  Fasta.In_channel.with_file_iteri_records_exn "sequences.fasta"
    ~f:(fun index record ->
      let open Fasta.Record in
      printf "%d: %s => %d\n" (index + 1) (id record)
        (seq_length record))

Folding over records

If you need to reduce all the records down to a single value, use the fold functions.

Get total length of all sequences in the file.

Watch out: as the _exn suffix indicates, this may raise exceptions.

let total_length =
  Fasta.In_channel.with_file_fold_records_exn "sequences.fasta" ~init:0
    ~f:(fun length record -> length + Fasta.Record.seq_length record)

Same thing, but this won't raise exceptions. You do have to handle Or_error.t to get the final value. Note that within the fold function, you get Fasta.Record.t and not Fasta.Record.t Or_error.t.

let total_length =
  match
    Fasta.In_channel.with_file_fold_records name ~init:0
      ~f:(fun length record -> length + Fasta.Record.seq_length record)
  with
  | Error err ->
      eprintf "Problem reading records: %s\n" (Error.to_string_hum err);
      exit 1
  | Ok total_length -> total_length

Pipelines with records

Sometimes you have a "pipeline" of computations that you need to run one after the other on records. In that case, you could use the sequence functions. Here's a silly example.

let () =
  Fasta.In_channel.with_file_exn name ~f:(fun chan ->
      Fasta.In_channel.record_sequence_exn chan
      (* Add sequence index to record description *)
      |> Sequence.mapi ~f:(fun i record ->
             let new_desc =
               match Fasta.Record.desc record with
               | None -> Some (sprintf "sequence %d" i)
               | Some old_desc ->
                   Some (sprintf "%s -- sequence %d" old_desc i)
             in
             Fasta.Record.with_desc new_desc record)
      (* Convert all sequence chars to lowercase *)
      |> Sequence.map ~f:(fun record ->
             let new_seq = String.lowercase (Fasta.Record.seq record) in
             Fasta.Record.with_seq new_seq record)
      (* Print sequences *)
      |> Sequence.iter ~f:(fun record ->
             print_endline @@ Fasta.Record.serialize record))

One thing to watch out for, though: if an exception is raised partway through and your pipeline has side effects (as this one does), some of those side effects will already have occurred and the rest will not.
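To make the partial side-effect problem concrete, here's a standard-library sketch (it does not use bio_io). A lazily evaluated Seq pipeline writes into a buffer and raises on the third item. By the time the exception escapes, the first two items have already been written, and nothing rolls them back. The item names and the failure condition are made up for illustration.

```ocaml
(* Side effects go into a buffer so we can inspect what happened. *)
let out = Buffer.create 64

let run () =
  List.to_seq [ "seq_a"; "seq_b"; "seq_c"; "seq_d" ]
  |> Seq.mapi (fun i id ->
         (* Simulate a bad record at index 2. *)
         if i = 2 then failwith "bad record";
         id)
  |> Seq.iter (fun id -> Buffer.add_string out (id ^ "\n"))

let () =
  match run () with
  | () -> ()
  | exception Failure msg ->
      (* The first two lines were already written before the failure. *)
      prerr_endline ("Pipeline failed: " ^ msg)
```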

There are also Or_error.t flavors of the sequence functions. Just watch out: with these, you do have to deal with an Or_error.t for each Fasta.Record.t in the sequence.

As an alternative, you could use the record_sequence_exn function, but wrap that in the with_file function. That way you don't have to deal with the Or_error.t inside your pipeline. Instead you deal with it at the end.

let total_length =
  match
    Fasta.In_channel.with_file name ~f:(fun chan ->
        Fasta.In_channel.record_sequence_exn chan
        (* Blow up pipeline on second sequence. *)
        |> Sequence.mapi ~f:(fun i record ->
               if i = 1 then assert false;
               record)
        |> Sequence.fold ~init:0 ~f:(fun length record ->
               length + String.length (Fasta.Record.seq record)))
  with
  | Error err ->
      eprintf "Problem in parsing pipeline: %s\n"
        (Error.to_string_hum err);
      exit 1
  | Ok total_length -> total_length

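As you can see, if that FASTA file has more than one sequence, the pipeline will hit the assert false and blow up. Because the whole pipeline runs inside with_file, that failure comes back as the Error case rather than as an uncaught exception.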
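The key point is that the whole pipeline, including the final fold that forces the lazy sequence, runs inside the callback, so any exception is captured in the result instead of escaping. Here's a standard-library sketch of that pattern. with_source is a hypothetical stand-in for with_file that catches exceptions into a (_, string) result, and a list of strings stands in for the records in a file.

```ocaml
(* Hypothetical stand-in for with_file: run [f] on a source and capture
   any exception as an [Error]. *)
let with_source (source : string list) ~(f : string Seq.t -> 'a) :
    ('a, string) result =
  match f (List.to_seq source) with
  | v -> Ok v
  | exception e -> Error (Printexc.to_string e)

(* Same shape as the pipeline above: blow up on the second record,
   otherwise sum the record lengths. The fold forces the lazy sequence,
   so the exception (if any) is raised inside [with_source]. *)
let total_length source =
  with_source source ~f:(fun records ->
      records
      |> Seq.mapi (fun i record ->
             if i = 1 then assert false;
             record)
      |> Seq.fold_left (fun length record -> length + String.length record) 0)
```

With one record you get Ok and its length; with two or more you get Error. In the real with_file the channel is closed once f returns, so the sequence has to be fully consumed inside f.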

include Record_in_channel.S with type record := Record.t

API

type t
val stdin : t

stdin is a t that reads from the standard input channel.

val create_exn : Base.string -> t

create_exn file_name opens an input channel on the file specified by file_name.

val create : Base.string -> t Base.Or_error.t

create file_name is like create_exn file_name except that it returns an Or_error.t rather than raising.

val close_exn : t -> Base.unit

close_exn t closes t. Raises if the call fails.

val close : t -> Base.unit Base.Or_error.t

close t is like close_exn t except that it shouldn't raise.

val with_file_exn : Base.string -> f:(t -> 'a) -> 'a

with_file_exn file_name ~f executes ~f on the channel created from file_name and closes it afterwards.

val with_file : Base.string -> f:(t -> 'a) -> 'a Base.Or_error.t

with_file file_name ~f is like with_file_exn file_name ~f except that it shouldn't raise.
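The open, run, close contract of with_file_exn can be sketched with the standard library's Fun.protect, which runs a cleanup function whether f returns normally or raises. This illustrates the behavior described above using a plain in_channel in place of t; it is not bio_io's implementation.

```ocaml
(* Sketch: open [file_name], run [f], and always close the channel,
   even if [f] raises. Any exception from [f] is re-raised afterwards. *)
let with_file_exn file_name ~(f : in_channel -> 'a) : 'a =
  let chan = open_in file_name in
  Fun.protect ~finally:(fun () -> close_in_noerr chan) (fun () -> f chan)

(* Usage: read the first line (e.g. a FASTA header) of a file. *)
let first_line file_name = with_file_exn file_name ~f:input_line
```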

val equal : t -> t -> Base.bool

equal t1 t2 compares t1 and t2 for equality.

val input_record_exn : t -> Record.t Base.option

input_record_exn t returns Some record if there is a record to return. If there are no more records, None is returned. Raises an exception on bad input.

val input_record : t -> Record.t Base.option Base.Or_error.t

input_record t is like input_record_exn t except that it should not raise exceptions.
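To make the Some/None contract of input_record_exn concrete, here's a deliberately simplified FASTA reader written against the standard library. Everything in it is an assumption for illustration: the record type, the reader type, and the parsing rules (a header line starts with '>', the ID is the first space-separated word, the rest is the description, and the following lines are concatenated into the sequence). It does not reproduce bio_io's parser.

```ocaml
(* Hypothetical, simplified record type. *)
type record = { id : string; desc : string option; seq : string }

(* A reader over the remaining lines of input. *)
type reader = { mutable lines : string list }

let parse_header line =
  (* Drop the leading '>' and split off the first word as the ID. *)
  let body = String.sub line 1 (String.length line - 1) in
  match String.index_opt body ' ' with
  | None -> (body, None)
  | Some i ->
      ( String.sub body 0 i,
        Some (String.sub body (i + 1) (String.length body - i - 1)) )

(* Returns [Some record] while records remain, [None] at the end.
   Raises [Failure] if a record does not start with a header line. *)
let input_record_exn (r : reader) : record option =
  match r.lines with
  | [] -> None
  | header :: rest ->
      if String.length header = 0 || header.[0] <> '>' then
        failwith ("expected a header line, got: " ^ header);
      let id, desc = parse_header header in
      let buf = Buffer.create 80 in
      (* Consume sequence lines until the next header or end of input. *)
      let rec loop = function
        | line :: more when String.length line = 0 || line.[0] <> '>' ->
            Buffer.add_string buf (String.trim line);
            loop more
        | remaining -> remaining
      in
      r.lines <- loop rest;
      Some { id; desc; seq = Buffer.contents buf }
```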

Folding over records

val fold_records_exn : t -> init:'a -> f:('a -> Record.t -> 'a) -> 'a

fold_records_exn t ~init ~f reduces all records from a t down to a single value of type 'a.

val fold_records : t -> init:'a -> f:('a -> Record.t -> 'a) -> 'a Base.Or_error.t

fold_records t ~init ~f is like fold_records_exn t ~init ~f except that it should not raise exceptions. Rather than deal with exceptions inside the reducing function, you must deal with them at the end when handling the return value.

val foldi_records_exn : t -> init:'a -> f:(Base.int -> 'a -> Record.t -> 'a) -> 'a

Like fold_records_exn t ~init ~f except that f is provided the 0-based record index as its first argument. See fold_records_exn.

val foldi_records : t -> init:'a -> f:(Base.int -> 'a -> Record.t -> 'a) -> 'a Base.Or_error.t

Like foldi_records_exn t ~init ~f except that it shouldn't raise. See foldi_records_exn.
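You can think of the fold functions as a loop that keeps calling input_record_exn until it returns None. The sketch below assumes only a hypothetical next function with that Some/None contract (backed here by a list) and shows fold and foldi side by side: foldi just threads a 0-based counter alongside the accumulator.

```ocaml
(* Keep pulling records until [next ()] returns [None]. *)
let fold_records_exn (next : unit -> 'r option) ~(init : 'a)
    ~(f : 'a -> 'r -> 'a) : 'a =
  let rec loop acc =
    match next () with
    | None -> acc
    | Some record -> loop (f acc record)
  in
  loop init

(* Same loop, but [f] also receives the 0-based record index. *)
let foldi_records_exn (next : unit -> 'r option) ~(init : 'a)
    ~(f : int -> 'a -> 'r -> 'a) : 'a =
  let rec loop i acc =
    match next () with
    | None -> acc
    | Some record -> loop (i + 1) (f i acc record)
  in
  loop 0 init

(* Hypothetical record source: hands out list items one by one. *)
let source_of_list (items : 'r list) : unit -> 'r option =
  let remaining = ref items in
  fun () ->
    match !remaining with
    | [] -> None
    | x :: rest ->
        remaining := rest;
        Some x
```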

Folding with file name

val with_file_fold_records_exn : Base.string -> init:'a -> f:('a -> Record.t -> 'a) -> 'a

with_file_fold_records_exn file_name ~init ~f is like fold_records_exn t ~init ~f except that it is passed a file name, and it manages t automatically. See with_file.

val with_file_fold_records : Base.string -> init:'a -> f:('a -> Record.t -> 'a) -> 'a Base.Or_error.t

with_file_fold_records file_name ~init ~f is like fold_records t ~init ~f except that it is passed a file name, and it manages t automatically. See with_file.

val with_file_foldi_records_exn : Base.string -> init:'a -> f:(Base.int -> 'a -> Record.t -> 'a) -> 'a

with_file_foldi_records_exn file_name ~init ~f is like foldi_records_exn t ~init ~f except that it is passed a file name, and it manages t automatically. See with_file.

val with_file_foldi_records : Base.string -> init:'a -> f:(Base.int -> 'a -> Record.t -> 'a) -> 'a Base.Or_error.t

with_file_foldi_records file_name ~init ~f is like foldi_records t ~init ~f except that it is passed a file name, and it manages t automatically. See with_file.

Iterating over records

The iter functions are like the fold functions except that they do not take an init value, and the f function returns unit instead of some value 'a. As a result, they return unit rather than a value 'a.

They are mainly called for side effects.
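The relationship can be shown directly: an iter is a fold whose accumulator is unit. This standard-library sketch assumes a hypothetical fold_records_exn with the same shape as the real one (here just List.fold_left over a list of record IDs) and builds iter and iteri on top of it.

```ocaml
(* Stand-in for fold_records_exn: folds over a list of records. *)
let fold_records_exn records ~init ~f = List.fold_left f init records

(* iter: no init value, and [f] returns unit, so the result is unit. *)
let iter_records_exn records ~(f : 'r -> unit) : unit =
  fold_records_exn records ~init:() ~f:(fun () record -> f record)

(* iteri: thread a 0-based index through the fold. *)
let iteri_records_exn records ~(f : int -> 'r -> unit) : unit =
  let (_ : int) =
    fold_records_exn records ~init:0 ~f:(fun i record ->
        f i record;
        i + 1)
  in
  ()
```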

val iter_records_exn : t -> f:(Record.t -> Base.unit) -> Base.unit

iter_records_exn t ~f calls f on each record in t. As f returns unit this is generally used for side effects.

val iter_records : t -> f:(Record.t -> Base.unit) -> Base.unit Base.Or_error.t

iter_records t ~f is like iter_records_exn t ~f except that it shouldn't raise.

val iteri_records_exn : t -> f:(Base.int -> Record.t -> Base.unit) -> Base.unit

iteri_records_exn t ~f is like iter_records_exn t ~f except that f is passed the 0-based record index as its first argument.

val iteri_records : t -> f:(Base.int -> Record.t -> Base.unit) -> Base.unit Base.Or_error.t

iteri_records t ~f is like iteri_records_exn t ~f except that it shouldn't raise.

Iterating with file name

val with_file_iter_records_exn : Base.string -> f:(Record.t -> Base.unit) -> Base.unit

with_file_iter_records_exn file_name ~f is like iter_records_exn t ~f except that it is passed a file name, and it manages t automatically. See with_file.

val with_file_iter_records : Base.string -> f:(Record.t -> Base.unit) -> Base.unit Base.Or_error.t

with_file_iter_records file_name ~f is like iter_records t ~f except that it is passed a file name, and it manages t automatically. See with_file.

val with_file_iteri_records_exn : Base.string -> f:(Base.int -> Record.t -> Base.unit) -> Base.unit

with_file_iteri_records_exn file_name ~f is like iteri_records_exn t ~f except that it is passed a file name, and it manages t automatically. See with_file.

val with_file_iteri_records : Base.string -> f:(Base.int -> Record.t -> Base.unit) -> Base.unit Base.Or_error.t

with_file_iteri_records file_name ~f is like iteri_records t ~f except that it is passed a file name, and it manages t automatically. See with_file.

Getting records as a list

These functions return lists of records.

val records_exn : t -> Record.t Base.List.t

With file name

val with_file_records_exn : Base.string -> Record.t Base.List.t
val with_file_records : Base.string -> Record.t Base.List.t Base.Or_error.t
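Getting all records as a list amounts to a fold that conses each record onto an accumulator and reverses at the end, so the list keeps the file order. Here's a sketch assuming a hypothetical next function with the same Some/None contract as input_record_exn. Note that the whole file ends up in memory, so for large inputs the iter or fold functions may be a better fit.

```ocaml
(* Collect records from [next] until it returns [None], preserving order. *)
let records_exn (next : unit -> 'r option) : 'r list =
  let rec loop acc =
    match next () with
    | None -> List.rev acc
    | Some record -> loop (record :: acc)
  in
  loop []
```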

Getting records as a sequence

These are a bit different:

* There are no with_file versions as you would have to do some fiddly things to keep the channel open, making them not so nice to use.

* Each record that is yielded is wrapped in an Or_error.t. This differs from the non-_exn iter, fold, and other functions, where the entire result is wrapped in a single Or_error.t, letting you ignore errors in the ~f function you pass in and deal with failure once.
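The difference between the two styles can be sketched with the standard library alone, using (_, string) result in place of Or_error.t. per_item yields one result per record, the way record_sequence does, so the consumer has to inspect every element. all_or_first_error collapses such a sequence into one result, which is roughly the shape the non-_exn fold and iter functions hand back. Both names and the toy validation rule are hypothetical.

```ocaml
(* Toy validation: a "record" is valid if it is non-empty. *)
let check record =
  if record = "" then Error "empty record" else Ok record

(* Style 1: one result per item, like record_sequence. *)
let per_item (records : string list) : (string, string) result Seq.t =
  Seq.map check (List.to_seq records)

(* Style 2: a single result for the whole run. Stops at the first error. *)
let all_or_first_error (items : ('a, string) result Seq.t) :
    ('a list, string) result =
  let rec loop acc seq =
    match seq () with
    | Seq.Nil -> Ok (List.rev acc)
    | Seq.Cons (Ok x, rest) -> loop (x :: acc) rest
    | Seq.Cons (Error e, _) -> Error e
  in
  loop [] items
```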

val record_sequence_exn : t -> Record.t Base.Sequence.t

record_sequence_exn t returns a Sequence.t of records. May raise exceptions.

val record_sequence : t -> Record.t Base.Or_error.t Base.Sequence.t

record_sequence t is like record_sequence_exn t except that instead of raising exceptions, each item of the sequence is a record Or_error.t rather than an "unwrapped" record. This could make things annoying to deal with. If you don't want exceptions, you could instead wrap your entire sequence processing pipeline in a call to with_file and handle the Or_error.t in that way. See the pipelines usage examples for more info.