Limited-memory BFGS

Limited-memory BFGS (L-BFGS or LM-BFGS) is an optimization algorithm in the family of quasi-Newton methods that approximates the Broyden–Fletcher–Goldfarb–Shanno algorithm (BFGS) using a limited amount of computer memory.[1] It is a popular algorithm for parameter estimation in machine learning.[2][3] The algorithm's target problem is to minimize over unconstrained values of the real-vector where is a differentiable scalar function.

Like the original BFGS, L-BFGS uses an estimate of the inverse Hessian matrix to steer its search through variable space, but where BFGS stores a dense approximation to the inverse Hessian (n being the number of variables in the problem), L-BFGS stores only a few vectors that represent the approximation implicitly. Due to its resulting linear memory requirement, the L-BFGS method is particularly well suited for optimization problems with many variables. Instead of the inverse Hessian Hk, L-BFGS maintains a history of the past m updates of the position x and gradient ∇f(x), where generally the history size m can be small (often ). These updates are used to implicitly do operations requiring the Hk-vector product.

Algorithm

The algorithm starts with an initial estimate of the optimal value, , and proceeds iteratively to refine that estimate with a sequence of better estimates . The derivatives of the function are used as a key driver of the algorithm to identify the direction of steepest descent, and also to form an estimate of the Hessian matrix (second derivative) of .

L-BFGS shares many features with other quasi-Newton algorithms, but is very different in how the matrix-vector multiplication is carried out, where is the approximate Newton's direction, is the current gradient, and is the inverse of the Hessian matrix. There are multiple published approaches using a history of updates to form this direction vector. Here, we give a common approach, the so-called "two loop recursion."[4][5]

We take as given , the position at the k-th iteration, and where is the function being minimized, and all vectors are column vectors. We also assume that we have stored the last m updates of the form

.

We define , and will be the 'initial' approximate of the inverse Hessian that our estimate at iteration k begins with.

The algorithm is based on the BFGS recursion for the inverse Hessian as

For a fixed k we define a sequence of vectors as and . Then a recursive algorithm for calculating from is to define and . We also define another sequence of vectors as . There is another recursive algorithm for calculating these vectors which is to define and then recursively define and . The value of is then our ascent direction.

Thus we can compute the descent direction as follows:


This formulation gives the search direction for the minimization problem, i.e., . For maximization problems, one should thus take -z instead. Note that the initial approximate inverse Hessian is chosen as a diagonal matrix or even a multiple of the identity matrix since this is numerically efficient.

The scaling of the initial matrix ensures that the search direction is well scaled and therefore the unit step length is accepted in most iterations. A Wolfe line search is used to ensure that the curvature condition is satisfied and the BFGS updating is stable. Note that some software implementations use an Armijo backtracking line search, but cannot guarantee that the curvature condition will be satisfied by the chosen step since a step length greater than may be needed to satisfy this condition. Some implementations address this by skipping the BFGS update when is negative or too close to zero, but this approach is not generally recommended since the updates may be skipped too often to allow the Hessian approximation to capture important curvature information. Some solvers employ so called damped (L)BFGS update which modifies quantities and in order to satisfy the curvature condition.

The two-loop recursion formula is widely used by unconstrained optimizers due to its efficiency in multiplying by the inverse Hessian. However, it does not allow for the explicit formation of either the direct or inverse Hessian and is incompatible with non-box constraints. An alternative approach involves using a low-rank representation for the direct and/or inverse Hessian.[6] This represents the Hessian as a sum of a diagonal matrix and a low-rank update. Such a representation enables the use of L-BFGS in constrained settings, for example, as part of the SQP method.

Applications

L-BFGS has been called "the algorithm of choice" for fitting log-linear (MaxEnt) models and conditional random fields with -regularization.[2][3]

Variants

Since BFGS (and hence L-BFGS) is designed to minimize smooth functions without constraints, the L-BFGS algorithm must be modified to handle functions that include non-differentiable components or constraints. A popular class of modifications are called active-set methods, based on the concept of the active set. The idea is that when restricted to a small neighborhood of the current iterate, the function and constraints can be simplified.

L-BFGS-B

The L-BFGS-B algorithm extends L-BFGS to handle simple box constraints (aka bound constraints) on variables; that is, constraints of the form lixiui where li and ui are per-variable constant lower and upper bounds, respectively (for each xi, either or both bounds may be omitted).[7][8] The method works by identifying fixed and free variables at every step (using a simple gradient method), and then using the L-BFGS method on the free variables only to get higher accuracy, and then repeating the process.

OWL-QN

Orthant-wise limited-memory quasi-Newton (OWL-QN) is an L-BFGS variant for fitting -regularized models, exploiting the inherent sparsity of such models.[3] It minimizes functions of the form

where is a differentiable convex loss function. The method is an active-set type method: at each iterate, it estimates the sign of each component of the variable, and restricts the subsequent step to have the same sign. Once the sign is fixed, the non-differentiable term becomes a smooth linear term which can be handled by L-BFGS. After an L-BFGS step, the method allows some variables to change sign, and repeats the process.

O-LBFGS

Schraudolph et al. present an online approximation to both BFGS and L-BFGS.[9] Similar to stochastic gradient descent, this can be used to reduce the computational complexity by evaluating the error function and gradient on a randomly drawn subset of the overall dataset in each iteration. It has been shown that O-LBFGS has a global almost sure convergence [10] while the online approximation of BFGS (O-BFGS) is not necessarily convergent.[11]

Implementation of variants

Notable open source implementations include:

  • ALGLIB implements L-BFGS in C++ and C# as well as a separate box/linearly constrained version, BLEIC.
  • R's optim general-purpose optimizer routine uses the L-BFGS-B method.
  • SciPy's optimization module's minimize method also includes an option to use L-BFGS-B.

Notable non open source implementations include:

  • The L-BFGS-B variant also exists as ACM TOMS algorithm 778.[8][12] In February 2011, some of the authors of the original L-BFGS-B code posted a major update (version 3.0).
  • A reference implementation in Fortran 77 (and with a Fortran 90 interface).[13][14] This version, as well as older versions, has been converted to many other languages.
  • An OWL-QN C++ implementation by its designers.[3][15]

Works cited

  1. ^ Liu, D. C.; Nocedal, J. (1989). "On the Limited Memory Method for Large Scale Optimization". Mathematical Programming B. 45 (3): 503–528. CiteSeerX 10.1.1.110.6443. doi:10.1007/BF01589116. S2CID 5681609.
  2. ^ a b Malouf, Robert (2002). "A comparison of algorithms for maximum entropy parameter estimation". Proceedings of the Sixth Conference on Natural Language Learning (CoNLL-2002). pp. 49–55. doi:10.3115/1118853.1118871.
  3. ^ a b c d Andrew, Galen; Gao, Jianfeng (2007). "Scalable training of L₁-regularized log-linear models". Proceedings of the 24th International Conference on Machine Learning. doi:10.1145/1273496.1273501. ISBN 9781595937933. S2CID 5853259.
  4. ^ Matthies, H.; Strang, G. (1979). "The solution of non linear finite element equations". International Journal for Numerical Methods in Engineering. 14 (11): 1613–1626. Bibcode:1979IJNME..14.1613M. doi:10.1002/nme.1620141104.
  5. ^ Nocedal, J. (1980). "Updating Quasi-Newton Matrices with Limited Storage". Mathematics of Computation. 35 (151): 773–782. doi:10.1090/S0025-5718-1980-0572855-7.
  6. ^ Byrd, R. H.; Nocedal, J.; Schnabel, R. B. (1994). "Representations of Quasi-Newton Matrices and their use in Limited Memory Methods". Mathematical Programming. 63 (4): 129–156. doi:10.1007/BF01582063. S2CID 5581219.
  7. ^ Byrd, R. H.; Lu, P.; Nocedal, J.; Zhu, C. (1995). "A Limited Memory Algorithm for Bound Constrained Optimization". SIAM J. Sci. Comput. 16 (5): 1190–1208. doi:10.1137/0916069. S2CID 6398414.
  8. ^ a b Zhu, C.; Byrd, Richard H.; Lu, Peihuang; Nocedal, Jorge (1997). "L-BFGS-B: Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization". ACM Transactions on Mathematical Software. 23 (4): 550–560. doi:10.1145/279232.279236. S2CID 207228122.
  9. ^ Schraudolph, N.; Yu, J.; Günter, S. (2007). A stochastic quasi-Newton method for online convex optimization. AISTATS.
  10. ^ Mokhtari, A.; Ribeiro, A. (2015). "Global convergence of online limited memory BFGS" (PDF). Journal of Machine Learning Research. 16: 3151–3181. arXiv:1409.2045.
  11. ^ Mokhtari, A.; Ribeiro, A. (2014). "RES: Regularized Stochastic BFGS Algorithm". IEEE Transactions on Signal Processing. 62 (23): 6089–6104. arXiv:1401.7625. Bibcode:2014ITSP...62.6089M. CiteSeerX 10.1.1.756.3003. doi:10.1109/TSP.2014.2357775. S2CID 15214938.
  12. ^ "TOMS Home". toms.acm.org.
  13. ^ Morales, J. L.; Nocedal, J. (2011). "Remark on "algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound constrained optimization"". ACM Transactions on Mathematical Software. 38: 1–4. doi:10.1145/2049662.2049669. S2CID 16742561.
  14. ^ "L-BFGS-B Nonlinear Optimization Code". users.iems.northwestern.edu.
  15. ^ "Orthant-Wise Limited-memory Quasi-Newton Optimizer for L1-regularized Objectives". Microsoft Download Center.

Further reading

Read other articles:

Penggergajian kayu Amerika, 1920 Penggergajian kayu yang berdiri pada awal abad ke 20 yang masih bertahan, di Jerome, Arizona Penggergajian kayu adalah fasilitas di mana kayu yang telah ditebang dipotong-potong menjadi kayu untuk bahan bangunan atau keperluan lainnya. Penggergajian kayu adalah tahap awal kayu bulat diolah menjadi kayu gergajian. Kegiatan utama dalam penggergajian adalah membelah dan memotong kayu menggunakan gergaji sehingga hasil yang diperoleh disebut kayu gergajian. Proses Se…

Papan Penanda Isi Hati(Message On A Placard)(心のプラカード Kokoro No Placard)Sampul edisi reguler yang ditampilkan oleh Yupi, Haruka, Shania, dan MelodySingel oleh JKT48Sisi-AKokoro no Placard (Papan Penanda Isi Hati) / SenbatsuSisi-BKurumi to Dialogue (Dialog Dengan Kenari) / Team JLucky Seven / Team KIIIIiwake Maybe / Siswi Pelatihan Generasi KetigaMessage on a Placard / Senbatsu (hanya edisi reguler)Dirilis 27 Agustus 2014FormatCD+DVDDirekam2014 IndonesiaGenrePopLabelIndonesia Mu…

Artikel ini tidak memiliki referensi atau sumber tepercaya sehingga isinya tidak bisa dipastikan. Tolong bantu perbaiki artikel ini dengan menambahkan referensi yang layak. Tulisan tanpa sumber dapat dipertanyakan dan dihapus sewaktu-waktu.Cari sumber: Cati Bilang Pandai – berita · surat kabar · buku · cendekiawan · JSTOR Cati Bilang Pandai adalah nama seorang tokoh legendaris yang banyak disebut di dalam Tambo Minangkabau. Keluarga Menurut Tambo Minangka…

Patria AMV (Bahasa Inggris: Armored Modular Vehicle) ialah kendaraan militer multi-guna beroda 8x8 yang diproduksi oleh perusahaan industri pertahanan Finlandia Patria. Fitur utama dari AMV ialah rancangan modular, yang memudahkan bongkar-pasang berbagai kubah meriam, senjata, sensor, atau sistem komunikasi di platform yang sama. Ada rancangan-rancangan untuk berbagai versi kendaraan angkut personel dan kendaraan penempur infanteri, versi komunikasi, ambulans dan berbagai versi bantuan tembakan …

New Zealand priest The ReverendFrancis DouglasS.S.C.M.E.ChurchRoman Catholic ChurchOrdersOrdination29 October 1934by Archbishop Thomas O'SheaPersonal detailsBornFrancis Vernon Dougls(1910-05-22)May 22, 1910Johnsonville, Wellington, New ZealandDiedJuly 27, 1943(1943-07-27) (aged 33)Longos, Kalayaan, Laguna, PhilippinesNationalityNew ZealanderParentsGeorge Charles Douglas (father) Kathleen Gaffney (mother)OccupationMissionary PriestAlma materMarist Brothers' School (1921–1922)Johnsonvi…

Superfamily of lizards PygopodoideaTemporal range: Albian – Present, 110–0 Ma PreꞒ Ꞓ O S D C P T J K Pg N Eastern stone gecko (Diplodactylus vittatus) Pink-tailed worm-lizard (Aprasia parapulchella) Scientific classification Domain: Eukaryota Kingdom: Animalia Phylum: Chordata Class: Reptilia Order: Squamata Clade: Gekkonomorpha Infraorder: Gekkota Clade: PygopodomorphaVidal & Hedges, 2009 Superfamily: PygopodoideaGray, 1845 Families Diplodactylidae Carphodactylidae Pygopodidae …

Elections 2016 United States House of Representatives elections in Arkansas ← 2014 November 8, 2016 (2016-11-08) 2018 → All 4 Arkansas seats to the United States House of Representatives   Majority party Minority party Third party   Party Republican Libertarian Democratic Last election 4 0 0 Seats won 4 0 0 Seat change Popular vote 760,415 196,512 111,347 Percentage 71.16% 18.39% 10.42% Swing 9.83% 10.42% 20.22% Republican  …

Election in New Jersey Main article: 1920 United States presidential election 1920 United States presidential election in New Jersey ← 1916 November 2, 1920 1924 →   Nominee Warren G. Harding James M. Cox Party Republican Democratic Home state Ohio Ohio Running mate Calvin Coolidge Franklin D. Roosevelt Electoral vote 14 0 Popular vote 611,541 256,887 Percentage 67.65% 28.42% County Results Harding  50-60%  60-70%  70-8…

Racing ClubNama lengkapRacing Club de AvellanedaJulukanLa AcademiaBerdiri25 Maret, 1903StadionStadion Juan Domingo Perón,El Cilindro de Avellaneda (Silinder Avellaneda),Avellaneda, Provinsi Buenos Aires(Kapasitas: 64,161)Ketua Víctor BlancoManajer Gustavo CostasLigaLiga Utama ArgentinaCampeonato 202412 Klasemen Kostum kandang Kostum tandang Kostum ketiga Racing Club adalah klub sepak bola yang berbasis di kota Avellaneda, Provinsi Buenos Aires, Argentina. Prestasi terbaik dari Racing Club adal…

Voce principale: Brescia Calcio. Associazione Calcio BresciaStagione 1964-1965Sport calcio Squadra Brescia Allenatore Renato Gei Presidente Giacomo Ghidini Serie B1º posto (in Serie A) Coppa ItaliaSecondo turno Maggiori presenzeCampionato: Brotto (38)Totale: Brotto (39) Miglior marcatoreCampionato: De Paoli (20)Totale: De Paoli (21) 1963-1964 1965-1966 Si invita a seguire il modello di voce Questa pagina raccoglie i dati riguardanti l'Associazione Calcio Brescia nelle competizioni ufficial…

Science and engineering of interacting surfaces in relative motion Tribology is the science and engineering of understanding friction, lubrication and wear phenomena for interacting surfaces in relative motion. It is highly interdisciplinary, drawing on many academic fields, including physics, chemistry, materials science, mathematics, biology and engineering.[1] The fundamental objects of study in tribology are tribosystems, which are physical systems of contacting surfaces. Subfields o…

Untuk pemain bola tangan, lihat Bent Larsen (pemain bola tangan). Bent Larsen (lahir 4 Maret 1935 - Buenos Aires, 9 September 2010) adalah seorang pecatur Denmark dengan peringkat FIDE 2461 (2006). Larsen belajar catur pada usia 6 tahun dan pada tahun 1954 ia menjadi Master Internasional dari Denmark. Ia kemudian mengulang kesuksesan pada tahun 1955, 1956, 1959, 1963, dan 1964. Pada tahun 1956 Larsen menjadi Grandmaster. Larsen biasa menjalankan pembukaan 1.b3. Dengan pembukaan ia ia memainkan 1…

DisturbedDisturbed tampil pada tahun 2016 di Rock im ParkInformasi latar belakangAsalChicago, Illinois, Amerika SerikatGenre Heavy metal hard rock metal alternatif nu metal (awal) Tahun aktif 1994–2011 2015–Sekarang LabelGiant / Warner Bros.Reprise / Warner Bros.Situs webwww.disturbed1.comAnggotaDavid DraimanDan DoneganJohn MoyerMike WengrenMantan anggotaSteve Fuzz Kmak Disturbed adalah grup musik heavy metal Amerika dari Chicago, dibentuk pada tahun 1994. Grup ini termasuk vokalis David Dra…

L'allégation antisémite est une accusation délibérément fabriquée, présentée comme vraie, dans le seul but d'inciter à l'antisémitisme. Bien qu'elles aient été largement réfutées, les calomnies antisémites font souvent partie d'une théorie plus vaste de complot juif. Selon l'historien Kenneth Stern : « Historiquement, les Juifs ont beaucoup souffert des théories du complot. Les différents mythes, que les Juifs ont tué le Christ, ou qu'ils empoisonnent les puits,…

Monarchy in Central Asia from 1823 to 1926 Not to be confused with Islamic Emirate of Afghanistan, Islamic Emirate of Afghanistan (1996–2001), or Emirate of Afghanistan (1929). Emirate of AfghanistanEmirate of Kabul (1823–1855)امارت افغانستان (Persian)Imārat-i Afğānistān1823–1926 Flag (1919–1926) Emblem (1919–1926) Map of the Emirate of Afghanistan after the Durand Line AgreementAfghanistan before the 1893 Durand Line AgreementMap of the Emirate of Afghanistan in…

Pour les articles homonymes, voir Côte d'Azur (homonymie). Si ce bandeau n'est plus pertinent, retirez-le. Cliquez ici pour en savoir plus. Cet article ne cite pas suffisamment ses sources (avril 2023). Si vous disposez d'ouvrages ou d'articles de référence ou si vous connaissez des sites web de qualité traitant du thème abordé ici, merci de compléter l'article en donnant les références utiles à sa vérifiabilité et en les liant à la section « Notes et références ». En …

Jāmī Jāmī (Torbat-e-Jam, 1414 – Herat, 1492) è stato un poeta persiano. Indice 1 Biografia 2 Note 3 Altri progetti 4 Collegamenti esterni Biografia Vissuto alla corte dei Timuridi a Herat, Jāmī – il cui nome completo era ʿAbd al-Raḥmān ibn Aḥmad Nūr al-Dīn Jāmī (in persiano نورالدین عبدالرحمن جامی) – si dimostrò profondissimo conoscitore della lingua araba ed esponente di spicco del movimento letterario timuride. Meritò, dai critici del suo tempo, la…

この項目には、一部のコンピュータや閲覧ソフトで表示できない文字が含まれています(詳細)。 数字の大字(だいじ)は、漢数字の一種。通常用いる単純な字形の漢数字(小字)の代わりに同じ音の別の漢字を用いるものである。 概要 壱万円日本銀行券(「壱」が大字) 弐千円日本銀行券(「弐」が大字) 漢数字には「一」「二」「三」と続く小字と、「壱」「弐」…

Submarine of the Royal Navy HMS Tantalus in Plymouth Sound in August 1948 History United Kingdom NameHMS Tantalus BuilderVickers Armstrong, Barrow Laid down6 June 1942 Launched24 February 1943 Commissioned2 June 1943 IdentificationPennant number P318 FateScrapped in November 1950 Badge General characteristics Class and typeBritish T class submarine Displacement 1,290 tons surfaced 1,560 tons submerged Length276 ft 6 in (84.28 m) Beam25 ft 6 in (7.77 m) Draught 12…

American foreign service fraternity This article is about the professional fraternity. For the social sorority, see Delta Phi Epsilon (social). Delta Phi EpsilonΔΦΕFoundedJanuary 25, 1920; 104 years ago (January 25, 1920), Washington, D.C.Georgetown UniversityTypeProfessionalAffiliationIndependentEmphasisForeign serviceScopeNationalMottoλατρεύω LatreuoGreek: (I Serve)Colors  Black and   GoldFlowerMorning gloryHeadquarters3401 Prospect Street, NWPO Box 25401Washin…

Kembali kehalaman sebelumnya